There will also be a community dinner (Tuesday, September 26, 2017), so definitely watch the announcement that will be on the blog, and I’m sure on Twitter. Besides being fun, the Lightning Talks will happen during that time.Releases
The annual db tech showcase Tokyo 2017 took place this week from 5-7 September. It was a fun event as always, with capacity for 800 people per day. The event grows larger each year, and reminds me of the heyday of the MySQL Conference & Expo.
The db tech showcase is a five-parallel-track show, with each talk approximately 50 minutes. The event started with a keynote by Richard Hipp, creator of SQLite (if you were a Percona Live Santa Clara 2017 attendee, you’d have also seen him there). The rest of the event is a mix between Japanese language content and English language content. The sponsor list is lengthy, and if you walk the floor you could collect a lot of datasheets.
One thing I really liked? At some talks, you’d get a clear folder with a contact form as well as the printed slide deck. This is a great way to let the speaker’s company contact you. It’s a common issue that I (and others) speak to large amounts of people and have no idea who’s in the talk. I can only imagine our marketing and sales teams being much happier if they could get access to an attendee list! I wonder if this will work in other markets?
It’s interesting to see that there is a Japan MariaDB User Group now. It’s clear the MySQL user group needs a revival! I saw a talk from Toshiba on performance tests using MariaDB Server, but not with MySQL (a little odd?). The MongoDB content was pretty latent, which is unsurprising because we don’t see a huge MongoDB uptake or community in Japan (or South Korea for that matter).
Will I go back? Absolutely. I’ve been going for a few years, and it’s a great place for people who are crazy about database technology. You really get a spectrum of database presentations, and I expect most people go back with many ideas of what they might want to evaluate for production.
I spoke about the Engineering that goes into Percona Server for MySQL 5.6 and 5.7, with a hint of MongoDB. The slides are in a mix of Japanese and English. The Japanese translation: Percona ServerをMySQL 5.6と5.7用に作るエンジニアリング（そしてMongoDBのヒント）.Upcoming Appearances
Percona’s website keeps track of community events, so check there to see where to listen to a Perconian speak. My upcoming appearances are:
Did you try replication-manager last week? Guillaume Lefranc, the lead developer, writes in to talk about the new features such as support for MySQL 5.7, Binlog Flashback, multi-cluster mode and various stability fixes.
I came across a post from Hans-Juergen Schoenig, a Postgres consultant at Cybertec. In it, he dismissed MySQL and showed Postgres as better. While his post ignores most of the reasons why MySQL is better, I will focus on where his post is less than accurate. Testing for MySQL was done with Percona Server 5.7, defaults.
Mr. Schoenig complains that MySQL changes data types automatically. He claims inserting 1234.5678 into a numeric(4, 2) column on Postgres produces an error, and that MySQL just rounds the number to fit. In my testing I found this to be a false claim:mysql> CREATE TABLE data ( -> id integer NOT NULL, -> data numeric(4, 2)); Query OK, 0 rows affected (0.07 sec) mysql> INSERT INTO data VALUES (1, 1234.5678); ERROR 1264 (22003): Out of range value for column 'data' at row 1
His next claim is that MySQL allows updating a key column to NULL and silently changes it to 0. This is also false:mysql> INSERT INTO data VALUES (1, 12); Query OK, 1 row affected (0.00 sec) mysql> UPDATE data SET id = NULL WHERE id = 1; ERROR 1048 (23000): Column 'id' cannot be null
In the original post, we never see the warnings and so don’t have the full details of his environment. Since he didn’t specify which version he was testing on, I will point out that MySQL 5.7 does a far better job out-of-the-box handling your data than 5.6 does, and SQL Mode has existed in MySQL for ages. Any user could set it to STRICT_ALL|TRANS_TABLES and get the behavior that is now default in 5.7.
The author is also focusing on a narrow issue, using it to say Postgres is better. I feel this is misleading. I could point out factors in MySQL that are better than in Postgres as well.
This is another case of “don’t necessarily take our word for it”. A simple test of what you see on a blog can help you understand how things work in your environment and why.
Join Percona’s Product Manager, Michael Coburn as he presents Using Percona Monitoring and Management to Troubleshoot MySQL Performance Issues on Thursday, September 7, 2017, at 10:00 am PDT / 1:00 pm EDT (UTC-7).Reserve Your Spot
Successful applications often become limited by MySQL performance. Michael will show you how to get great MySQL performance using Percona Monitoring and Management (PMM). There will be a demonstration of how to leverage the combination of the query analytics and metrics monitor when troubleshooting MySQL performance issues. We’ll review the essential components of PMM, and use some of the most common database slowness cases as examples of where to look and what to do.
By the end of the webinar you will have a better understanding of:
Percona, in collaboration with Facebook, is proud to announce the first experimental release of MyRocks in Percona Server for MySQL 5.7, with packages.
Now in September 2017, we are pleased to announce the first full experimental release of MyRocks with packages for Percona Server for MySQL 5.7.19-17.
The basis of the MyRocks storage engine is the RocksDB key-value store, which is a log-structured merge-tree (or LSM). It uses much less space, and has a much smaller write volume (write amplification) compared to a B+ tree database implementation such as InnoDB. As a result, MyRocks has the following advantages compared to other storage engines, if your workload uses fast storage (such as SSD):
Percona MyRocks is distributed as a separate package that can be enabled as a plugin for Percona Server for MySQL 5.7.19-17.WARNING: Percona MyRocks is currently considered experimental and is not yet recommended for production use.
We are providing packages for most popular 64-bit Linux distributions:
Installation instructions can be found here.
Due to the differences between Facebook MySQL 5.6.35 and Percona Server for MySQL 5.7, there are some behavioral differences and additional limitations. Some of these are documented here.
We encourage you to install and experiment with MyRocks for Percona Server for MySQL and join the discussion here.
Any issues that you might find can be searched for and reported here.
We thank the RocksDB and MyRocks development teams at Facebook for providing the foundation and assistance in developing MyRocks for Percona Server for MySQL. Without their efforts, this would not have been possible.
A new set of MongoDB attacks and data breaches struck businesses this weekend, mirroring the attacks that hit back in January and putting MongoDB security back into the spotlight.
Like the last set, this new attack strategy focused on ransomware that demanded a paid ransom to unlock hijacked data. As with many security breaches, the attack was preventable – if you correctly configured your MongoDB databases to prevent security vulnerabilities.
From the ZDNet article above:
“So these attackers simply scan the entire IPv4 internet for a MongoDB running on port 271017,” Gevers told ZDNet. “When they detect these, they then simply try to get access to it with a script that automatically deletes the database and creates a similar one with only one record holding the ransom note.
“The databases that get hacked were running with default settings and were completely exposed to the internet.”
If you rely on databases to run your business, you need to guarantee database security and performance. Your administrators must protect you from situations like the one mentioned above.
Ultimately, database security comes down to two things: identifying core areas of MongoDB security and knowing exactly what to monitor. For MongoDB to work as expected, you need to correctly setup up your databases and monitor them regularly.
Percona experts have addressed many of these issues already on our blog and in our webinars. Here are some links to existing resources that can help you secure your MongoDB databases, and prevent security disasters:
During this webinar Peter will talk about newly released features in Percona software, show a few quick demos and share with you highlights from the Percona open source software roadmap. This discussion will cover Percona Server for MySQL and MongoDB, Percona XtraBackup, Percona Toolkit, Percona XtraDB Cluster and Percona Monitoring and Management.
Peter will also talk about new developments in Percona commercial services and finish with a Q&A.
Register for the webinar before seats fill up for this exciting webinar Wednesday, September 6, 2017 at 10am PT / 1pm ET (UTC-7).Peter Zaitsev, Percona CEO and Co-Founder Peter Zaitsev co-founded Percona and assumed the role of CEO in 2006. As one of the foremost experts on MySQL strategy and optimization, Peter leveraged both his technical vision and entrepreneurial skills to grow Percona from a two-person shop to one of the most respected open source companies in the business. With over 140 professionals in 30+ countries, Peter’s venture now serves over 3000 customers – including the “who’s who” of internet giants, large enterprises and many exciting startups. Percona was named to the Inc. 5000 in 2013, 2014, 2015 and 2016. Peter was an early employee at MySQL AB, eventually leading the company’s High Performance Group. A serial entrepreneur, Peter co-founded his first startup while attending Moscow State University, where he majored in Computer Science. Peter is a co-author of High Performance MySQL: Optimization, Backups, and Replication, one of the most popular books on MySQL performance. Peter frequently speaks as an expert lecturer at MySQL and related conferences, and regularly posts on the Percona Database Performance Blog. Fortune and DZone have both tapped Peter as a contributor, and his recent ebook Practical MySQL Performance Optimization is one of percona.com’s most popular downloads.
I’ve blogged before about one of our regular clients, Life360. One of the issues they recently had was the PING command taking about 30%-40% of total queries per second across their database infrastructure. This is a non-trivial amount and was easily tens of thousands of pings per second. This added a significant amount of latency to real queries.
A large number of pings is due to the use of PHP PDO with persistent connections. Persistence, or pooling, is necessary to reduce time spent on connecting, disconnecting and reconnecting.
Unfortunately, in PHP (and other) implementations, the driver checks if the database is still alive with a PING before sending the actual command. Logic dictates that you could use the actual command as the PING, and if it fails it could return the same error it would have if the ping itself failed. Baron Schwartz has a lot to say about how unwise the use of a PING is within the drivers.
Barring rewriting PHP PDO, we thought up another solution: ProxySQL.
Before testing ProxySQL, we didn’t know how much gets forwarded to the actual hosts (including these com_admin commands). We wanted to test a quick PoC to discover what the actual behavior of ProxySQL was with respect to these commands. We had two hypothesis that we wanted to check:
In the event that ProxySQL forwarded everything, we set up a “decoy” MySQL instance to respond to the pings using ProxySQL query filtering. As it turned out, we found that ProxySQL quickly and silently replies to PINGs and doesn’t forward it onto the underlying database server backend. This is the case for other commands as well, as ProxySQL isn’t strictly a forwarding proxy (but more of a reverse proxy).
By placing ProxySQL on the application servers, Life360 was able to reduce QPS significantly. The other advantage of introducing ProxySQL is that it does connection pooling and multiplexing for you.
Here is the graph of com_ping on the day of the deployment:
Overall, we are talking about hundreds of millions of pings per day down to 0.
We can see that the vanilla install of ProxySQL also reduced active threads significantly:
This change has enabled Life360 to put off some of their scaling plans and provided other operational gains.
In conclusion, you can use ProxySQL as a simple (or advanced) firewall between your application and database. It consumes very little resources, but provides an immense performance gain.
While ProxySQL can be used as a more advanced firewall, these features are beyond the scope of this post. There are very specific ways to configure it as an advanced firewall. We plan to blog more on this soon.
Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.Percona Live Europe Dublin
Have you registered for Percona Live Europe Dublin? We’ve more or less finalized the schedule, and the conference grid looks 100% full. We’re four weeks away, so I suggest you register ASAP!
I should also mention that no event can be pulled off without sponsors, so thank you sponsors of Percona Live Europe 2017. I sincerely hope to see more sign up. Feel free to ask me more about it, or just check out our sponsor prospectus.Releases
Percona’s web site tracks community events, so check that out and see where to listen to Perconians speak. My upcoming appearances are:
I’ve been spending time on writing my db tech showcase talk. Will you be in Tokyo, Japan next week? Want to meet up? Don’t hesitate to drop me an email: firstname.lastname@example.org.Feedback
bet365 now purchases Basho assets. The good news for Riak users? “It is our intention to open source all of Basho’s products and all of the source code that they have been working on.” The Register covers this, too.
Welcome to another post our series of interview blogs for the upcoming Percona Live Europe 2017 in Dublin. This series highlights a number of talks that will be at the conference and gives a short preview of what attendees can expect to learn from the presenter.
This blog post is with Avraham Apelbaum, DBA and DevOps at Wix.com His talk is titled Orchestrating ProxySQL with Orchestrator and Consul. The combination of ProxySQL and Orchestrator solves many problems, but still requires some manual labor when the configuration changes when there is a network split (and other scenarios). In our conversation, we discussed using Consul to solve some of these issues:
Percona: How did you get into database technology? What do you love about it?
Avraham: On my first day as a soldier in a technology unit of the IDF, I received a HUGE Oracle 8 book and a very low-level design of a DB-based system. “You have one month,” they told me. I finished it all within ten days. Before that, I didn’t even know what a DB was. Today, I’m at Wix managing hundreds of databases that support 100M users!
Percona: You’re presenting a session called “Orchestrating ProxySQL with Orchestrator and Consul”. How do these technologies work together to help provide a high availability solution?
Avraham: ProxySQL is supposed to help you out with high availability (HA) and disaster recovery (DR) for MySQL servers, but it still requires some manual labor when the configuration changes – as a result of a network split, for example. Somehow all ProxySQL servers need to get the new MySQL cluster topology. So to automate all that, I added two more parts: a Consul KV store and a Consul template, which are responsible for updating ProxySQL on every architecture change in the MySQL cluster.
Percona: What is special about this combination of products that works better than other solutions? Is it right all the time, or does it depend on the workload?
Avraham: As DevOps I prefer not to do anything manually. What’s more, no one wants to wake up in the middle of the night because any one of our DB servers can fail. Most everyone, I guess, will have more than one ProxySQL server in their system at some point, so this solution can help them use ProxySql and Orchestrator.
Percona: What do you want attendees to take away from your session? Why should they attend?
Avraham: I am hoping to help people automate their HA and DR solutions. If as a result of my talk someone will earn even one minute off downtime, I’ll be happy.
Percona: What are you most looking forward to at Percona Live Europe 2017?
Avraham: In the DevOps and open source world, it’s all about sharing ideas. It was actually when I attended the talks by ProxySQL and Orchestrator’s creators that I thought of assembling it all up to solve our own problem. So I am looking forward to sharing my idea with others, and getting input from the audience so that everyone can benefit.
Want to find out more about Avraham and RDS migration? Register for Percona Live Europe 2017, and see his talk Orchestrating ProxySQL with Orchestrator and Consul. Register now to get the best price! Use discount code SeeMeSpeakPLE17 to get 10% off your registration.
Percona Live Open Source Database Conference Europe 2017 in Dublin is the premier European open source event for the data performance ecosystem. It is the place to be for the open source community as well as businesses that thrive in the MySQL, MariaDB, MongoDB, time series database, cloud, big data and Internet of Things (IoT) marketplaces. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.
The Percona Live Open Source Database Conference Europe will be September 25-27, 2017 at the Radisson Blu Royal Hotel, Dublin.
Percona announces the release of Percona Server for MySQL 5.7.19-17 on August 31, 2017. Download the latest version from the Percona web site or the Percona Software Repositories. You can also run Docker containers from the images in the Docker Hub repository.
Based on MySQL 5.7.19, and including all the bug fixes in it, Percona Server for MySQL 5.7.19-17 is now the current GA release in the Percona Server for MySQL 5.7 series. Percona Server for MySQL is open-source and free – this is the latest release of our enhanced, drop-in replacement for MySQL. Complete details of this release are available in the 5.7.19-17 milestone on Launchpad.
NOTE: Percona software no longer supports Red Hat Enterprise Linux 5 (including CentOS 5 and other derivatives), Ubuntu 12.04 and older versions. These platforms have reached end of life, won’t be updated and are not recommended for use in production.
Included the Percona MyRocks storage engine
NOTE: MyRocks for Percona Server is currently experimental and not recommended for production deployments until further notice. You are encouraged to try it in a testing environment and provide feedback or report bugs.
Nested structures are not common in Relational Database Management Systems. Usually, it’s just flat tables. Sometimes it would be convenient to store unstructured information in structured databases.
We are working to adapt ClickHouse as a long term storage for Percona Monitoring and Management (PMM), and particularly to store detailed information about queries. One of the problems we are trying to solve is to count the different errors that cause a particular query to fail.
For example, for date 2017-08-17 the query:"SELECT foo FROM bar WHERE id=?"
was executed 1000 times. 25 times it failed with error code “1212”, and eight times it failed with error code “1250”. Of course, the traditional way to store this in relational data would be to have a table "Date, QueryID, ErrorCode, ErrorCnt" and then perform a JOIN to this table. Unfortunately, columnar databases don’t perform well with multiple joins, and often the recommendation is to have de-normalized tables.
We can create a column for each possible ErrorCode, but this is not an optimal solution. There could be thousands of them, and most of the time they would be empty.
In this case, ClickHouse proposes Nested data structures. For our case, these can be defined as:CREATE TABLE queries ( Period Date, QueryID UInt32, Fingerprint String, Errors Nested ( ErrorCode String, ErrorCnt UInt32 ) )Engine=MergeTree(Period,QueryID,8192);
This solution has obvious questions: How do we insert data into this table? How do we extract it?
Let’s start with INSERT. Insert can look like:INSERT INTO queries VALUES ('2017-08-17',5,'SELECT foo FROM bar WHERE id=?',['1220','1230','1212'],[5,6,2])
which means that the inserted query during 2017-08-17 gave error 1220 five times, error 1230 six times and error 1212 two times.
Now, during a different date, it might produce different errors:INSERT INTO queries VALUES ('2017-08-18',5,'SELECT foo FROM bar WHERE id=?',['1220','1240','1258'],[3,2,1])
Let’s take a look at ways to SELECT data. A very basic SELECT:SELECT * FROM queries ┌─────Period─┬─QueryID─┬─Fingerprint─┬─Errors.ErrorCode───────┬─Errors.ErrorCnt─┐ │ 2017-08-17 │ 5 │ SELECT foo │ ['1220','1230','1212'] │ [5,6,2] │ │ 2017-08-18 │ 5 │ SELECT foo │ ['1220','1240','1260'] │ [3,16,12] │ └────────────┴─────────┴─────────────┴────────────────────────┴─────────────────┘
If we want to use a more familiar tabular output, we can use the ARRAY JOIN extension:SELECT * FROM queries ARRAY JOIN Errors ┌─────Period─┬─QueryID─┬─Fingerprint─┬─Errors.ErrorCode─┬─Errors.ErrorCnt─┐ │ 2017-08-17 │ 5 │ SELECT foo │ 1220 │ 5 │ │ 2017-08-17 │ 5 │ SELECT foo │ 1230 │ 6 │ │ 2017-08-17 │ 5 │ SELECT foo │ 1212 │ 2 │ │ 2017-08-18 │ 5 │ SELECT foo │ 1220 │ 3 │ │ 2017-08-18 │ 5 │ SELECT foo │ 1240 │ 16 │ │ 2017-08-18 │ 5 │ SELECT foo │ 1260 │ 12 │ └────────────┴─────────┴─────────────┴──────────────────┴─────────────────┘
However, usually we want to see the aggregation over multiple periods, which can be done with traditional aggregation functions:SELECT QueryID, Errors.ErrorCode, SUM(Errors.ErrorCnt) FROM queries ARRAY JOIN Errors GROUP BY QueryID, Errors.ErrorCode ┌─QueryID─┬─Errors.ErrorCode─┬─SUM(Errors.ErrorCnt)─┐ │ 5 │ 1212 │ 2 │ │ 5 │ 1230 │ 6 │ │ 5 │ 1260 │ 12 │ │ 5 │ 1240 │ 16 │ │ 5 │ 1220 │ 8 │ └─────────┴──────────────────┴──────────────────────┘
If we want to get really creative and return only one row per QueryID, we can do that as well:SELECT QueryID, groupArray((ecode, cnt)) FROM ( SELECT QueryID, ecode, sum(ecnt) AS cnt FROM queries ARRAY JOIN Errors.ErrorCode AS ecode, Errors.ErrorCnt AS ecnt GROUP BY QueryID, ecode ) GROUP BY QueryID ┌─QueryID─┬─groupArray(tuple(ecode, cnt))──────────────────────────────┐ │ 5 │ [('1230',6),('1212',2),('1260',12),('1220',8),('1240',16)] │ └─────────┴────────────────────────────────────────────────────────────┘ Conclusion
ClickHouse provides flexible ways to store data in a less structured manner and variety of functions to extract and aggregate it – despite being a columnar database.
Happy data warehousing!
In this blog post, I will look at disk utilization and saturation.
In my previous blog post, I wrote about CPU utilization and saturation, the practical difference between them and how different CPU utilization and saturation impact response times. Now we will look at another critical component of database performance: the storage subsystem. In this post, I will refer to the storage subsystem as “disk” (as a casual catch-all).
The most common tool for command line IO performance monitoring is iostat, which shows information like this:root@ts140i:~# iostat -x nvme0n1 5 Linux 4.4.0-89-generic (ts140i) 08/05/2017 _x86_64_ (4 CPU) avg-cpu: %user %nice %system %iowait %steal %idle 0.51 0.00 2.00 9.45 0.00 88.04 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util nvme0n1 0.00 0.00 3555.57 5887.81 52804.15 87440.73 29.70 0.53 0.06 0.13 0.01 0.05 50.71 avg-cpu: %user %nice %system %iowait %steal %idle 0.60 0.00 1.06 20.77 0.00 77.57 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util nvme0n1 0.00 0.00 7612.80 0.00 113507.20 0.00 29.82 0.97 0.13 0.13 0.00 0.12 93.68 avg-cpu: %user %nice %system %iowait %steal %idle 0.50 0.00 1.26 6.08 0.00 92.16 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util nvme0n1 0.00 0.00 7653.20 0.00 113497.60 0.00 29.66 0.99 0.13 0.13 0.00 0.12 93.52
The first line shows the average performance since system start. In some cases, it is useful to compare the current load to the long term average. In this case, as it is a test system, it can be safely ignored. The next line shows the current performance metrics over five seconds intervals (as specified in the command line).
The iostat command reports utilization information in the %util column, and you can look at saturation by either looking at the average request queue size (the avgqu-sz column) or looking at the r_await and w_await columns (which show the average wait for read and write operations). If it goes well above “normal” then the device is over-saturated.
As in my previous blog post, we’ll perform some system Sysbench runs and observe how the iostat command line tool and Percona Monitoring and Management graphs behave.
To focus specifically on the disk, we’re using the Sysbench fileio test. I’m using just one 100GB file, as I’m using DirectIO so all requests hit the disk directly. I’m also using “sync” request submission mode so I can get better control of request concurrency.
I’m using an Intel 750 NVME SSD in this test (though it does not really matter).Sysbench FileIO 1 Thread root@ts140i:/mnt/data# sysbench --threads=1 --time=600 --max-requests=0 fileio --file-num=1 --file-total-size=100G --file-io-mode=sync --file-extra-flags=direct --file-test-mode=rndrd run File operations: reads/s: 7113.16 writes/s: 0.00 fsyncs/s: 0.00 Throughput: read, MiB/s: 111.14 written, MiB/s: 0.00 General statistics: total time: 600.0001s total number of events: 4267910 Latency (ms): min: 0.07 avg: 0.14 max: 6.18 95th percentile: 0.17
A single thread run is always great as a baseline, as with only one request in flight we should expect the best response time possible (though typically not the best throughput possible).Iostat Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util nvme0n1 0.00 0.00 7612.80 0.00 113507.20 0.00 29.82 0.97 0.13 0.13 0.00 0.12 93.68
The Disk Latency graph confirms the disk IO latency we saw in the iostat command, and it will be highly device-specific. We use it as a baseline to compare changes to with higher concurrency.
Disk IO utilization is close to 100% even though we have just one outstanding IO request (queue depth). This is the problem with Linux disk utilization reporting: unlike CPUs, Linux does not have direct visibility on how the IO device is designed. How many “execution units” does it really have? How are they utilized? Single spinning disks can be seen as a single execution unit while RAID, SSDs and cloud storage (such as EBS) are more than one.
This graph shows the disk load (or request queue size), which roughly matches the number of threads that are hitting disk as hard as possible.
The IO load on the Saturation Metrics graph shows pretty much the same numbers. The only difference is that unlike Disk IO statistics, it shows the summary for the whole system.Sysbench FileIO 4 Threads
Now let’s increase IO to four concurrent threads and see how disk responds:sysbench --threads=4 --time=600 --max-requests=0 fileio --file-num=1 --file-total-size=100G --file-io-mode=sync --file-extra-flags=direct --file-test-mode=rndrd run File operations: reads/s: 26248.44 writes/s: 0.00 fsyncs/s: 0.00 Throughput: read, MiB/s: 410.13 written, MiB/s: 0.00 General statistics: total time: 600.0002s total number of events: 15749205 Latency (ms): min: 0.06 avg: 0.15 max: 8.73 95th percentile: 0.21
We can see the number of requests scales almost linearly, while request latency changes very little: 0.14ms vs. 0.15ms. This shows the device has enough execution units internally to handle the load in parallel, and there are no other bottlenecks (such as the connection interface).Iostat Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util nvme0n1 0.00 0.00 28808.60 0.00 427668.00 0.00 29.69 4.05 0.14 0.14 0.00 0.03 99.92
Saturation Metrics (IO Load)
These stats and graphs show interesting picture: we barely see a response time increase for IO requests, while utilization inches closer to 100% (with four threads submitting requests all the time, it is hard to catch the time when the disk does not have any requests in flight). The load is near four (showing the disk has to handle four requests at the time on average).Sysbench FileIO 16 Threads root@ts140i:/mnt/data# sysbench --threads=16 --time=600 --max-requests=0 fileio --file-num=1 --file-total-size=100G --file-io-mode=sync --file-extra-flags=direct --file-test-mode=rndrd run File operations: reads/s: 76845.96 writes/s: 0.00 fsyncs/s: 0.00 Throughput: read, MiB/s: 1200.72 written, MiB/s: 0.00 General statistics: total time: 600.0003s total number of events: 46107727 Latency (ms): min: 0.07 avg: 0.21 max: 9.72 95th percentile: 0.36
Going from four to 16 threads, we again see a good throughput increase with a mild response time increase. If you look at the results closely, you will notice one more interesting thing: the average response time has increased from 0.15ms to 0.21ms (which is a 40% increase), while the 95% response time has increased from 0.21ms to 0.36ms (which is 71%). I also ran a separate test measuring 99% response time, and the difference is even larger: 0.26ms vs. 0.48ms (or 84%).
This is an important observation to make: once saturation starts to happen, the variance is likely to increase and some of the requests will be disproportionately affected (beyond what the average response time shows).Iostat Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util nvme0n1 0.00 0.00 82862.20 0.00 1230567.20 0.00 29.70 16.33 0.20 0.20 0.00 0.01 100.00
Disk IO Latency
Disk IO Utilization
Saturation Metrics IO Load
The graphs show an expected figure: the disk load and IO load from saturation are up to about 16, and utilization remains at 100%.
One thing to notice is increased jitter in the graphs. IO utilization jumps to over 100% and disk IO load spikes to 18, when there should not be as many requests in flight. This comes from how this information is gathered. An attempt is made to sample this data every second, but with the loaded system it takes time for this process to work: sometimes when we try to get the data for a one-second interval but really get data for 1.05- or 0.95-second intervals. When the math is applied to the data, it creates the spikes and dips in the graph when there should be none. You can just ignore them if you’re looking at the big picture.Sysbench FileIO 64 Threads
Finally, let’s run sysbench with 64 concurrent threads hitting the disk:root@ts140i:/mnt/data# sysbench --threads=64 --time=600 --max-requests=0 fileio --file-num=1 --file-total-size=100G --file-io-mode=sync --file-extra-flags=direct --file-test-mode=rndrd run File operations: reads/s: 127840.59 writes/s: 0.00 fsyncs/s: 0.00 Throughput: read, MiB/s: 1997.51 written, MiB/s: 0.00 General statistics: total time: 600.0014s total number of events: 76704744 Latency (ms): min: 0.08 avg: 0.50 max: 9.34 95th percentile: 1.25
We can see the average has risen from 0.21ms to 0.50 (more than two times), and 95% almost tripped from 0.36ms to 1.25ms. From a practical standpoint, we can see some saturation starting to happen, but we’re still not seeing a linear response time increase with increasing numbers of parallel operations as we have seen with CPU saturation. I guess this points to the fact that this IO device has a lot of parallel capacity inside and can process requests more effectively (even going from 16 to 64 concurrent threads).
Over the series of tests, as we increased concurrency from one to 64, we saw response times increase from 0.14ms to 0.5ms (or approximately three times). The 95% response time at this time grew from 0.17ms to 1.25ms (or about seven times). For practical purposes, this is where we see the IO device saturation start to show.Iostat Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util nvme0n1 0.00 0.00 138090.20 0.00 2049791.20 0.00 29.69 65.99 0.48 0.48 0.00 0.01 100.24
We’ll skip the rest of the graphs as they basically look the same, just with higher latency and 64 requests in flight.Sysbench FileIO 256 Threads root@ts140i:/mnt/data# sysbench --threads=256 --time=600 --max-requests=0 fileio --file-num=1 --file-total-size=100G --file-io-mode=sync --file-extra-flags=direct --file-test-mode=rndrd run File operations: reads/s: 131558.79 writes/s: 0.00 fsyncs/s: 0.00 Throughput: read, MiB/s: 2055.61 written, MiB/s: 0.00 General statistics: total time: 600.0026s total number of events: 78935828 Latency (ms): min: 0.10 avg: 1.95 max: 17.08 95th percentile: 3.89Iostat Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util nvme0n1 0.00 0.00 142227.60 0.00 2112719.20 0.00 29.71 268.30 1.89 1.89 0.00 0.01 100.00
With 256 threads, finally we’re seeing the linear growth of the average response time that indicates overload and queueing to process requests. There is no easy way to tell if it is due to the IO bus saturation (we’re reading 2GB/sec here) or if it is the internal device processing ability.
As we’ve seen a less than linear increase in response time going from 16 to 64 connections, and a linear increase going from 64 to 256, we can see the “optimal” concurrency for this device: somewhere between 16 and 64 connections. This allows for peak throughput without a lot of queuing.
Before we get to the summary, I want to make an important note about this particular test. The test is a random reads test, which is a very important pattern for many database workloads, but it might not be the dominant load for your environment. You might be write-bound as well, or have mainly sequential IO access patterns (which could behave differently). For those other workloads, I hope this gives you some ideas on how to also analyze them.
Another Way to Think About Saturation
When I asked the Percona staff for feedback on this blog post by, my colleague Yves Trudeau provided another way to think about saturation: measure saturation as percent increase in the average response time compared to the single user. Like this:Threads Avg Response Time Saturation 1 0.14 – 4 0.15 1.07x or 7% 16 0.21 1.5x or 50% 64 0.50 3.6x or 260% 256 1.95 13.9x or 1290%
We can see how understanding disk utilization and saturation is much more complicated than for the CPU:
We are five weeks out to the conference! The tutorials and the sessions have been released, and there’s an added bonus – you can now look at all this in a grid view: tutorials, day one and day two. Now that you can visualize what’s being offered, don’t forget to register.
If you want a discount code, feel free to email me at email@example.com.
We have some exciting keynotes as well. Some highlights:
Let’s not forget the usual State of the Dolphin, an update from Oracle’s MySQL team (representative: Geir Høydalsvik), as well as a keynote by Peter Zaitsev (CEO, Percona) and Continuent. There will also be a couple of Percona customers keynoting, so expect information-packed fun mornings! You can see more details about the keynotes here: day one and day two.Releases
Percona’s website keeps track of community events, so check out where to listen to a Perconian speak. My upcoming appearances are:
Bill Bogasky (MariaDB Corporation) says that if you’re looking for commercial support for Riak now that Basho has gone under, you could get it from Erlang Solutions or TI Tokyo. See their announcement: Riak commercial support now available post-Basho. Thanks, Bill!
Percona announces the release of Percona Server for MySQL 5.6.37-82.2 on August 25, 2017. Download the latest version from the Percona web site or the Percona Software Repositories. You can also run Docker containers from the images in the Docker Hub repository.
Based on MySQL 5.6.37, and including all the bug fixes in it, Percona Server for MySQL 5.6.37-82.2 is now the current GA release in the Percona Server for MySQL 5.6 series. Percona Server for MySQL is open-source and free – this is the latest release of our enhanced, drop-in replacement for MySQL. Complete details of this release are available in the 5.6.37-82.2 milestone on Launchpad.
NOTE: Red Hat Enterprise Linux 5 (including CentOS 5 and other derivatives), Ubuntu 12.04 and older versions are no longer supported by Percona software. The reason for this is that these platforms reached end of life, will not receive updates and are not recommended for use in production.
For general inquiries, please send us your question and someone will contact you.