Feed aggregator

You are here

Systemtap solves phantom MySQLd SIGTERM / SIGKILL issue

Latest MySQL Performance Blog posts - July 18, 2014 - 7:38am

The Percona Managed Services team recently faced a somewhat peculiar client issue. We’d receive pages about their MySQL service being unreachable. However, studying the logs showed nothing out of the ordinary…. for the most part it appeared to be a normal shutdown and there was nothing in anyone’s command history nor a cron task to speak of that was suspicious.

This is one of those obscure and peculiar (read: unique) issues that triggered an old memory; I’d seen this behavior before and I had just the tool to catch the culprit in the act.

Systemtap made diagnostics of this issue possible and I can’t state enough how much of a powerful and often under-utilized tool set systemtap really is.

cat > signals.stp << EOF
probe signal.send {
if (sig_name == “SIGKILL” || sig_name == “SIGTERM”)
printf(“[%s] %s was sent to %s (pid:%d) by %s uid:%dn”,
ctime(gettimeofday_s()), sig_name, pid_name, sig_pid, execname(), uid())

sudo stap ./signals.stp > signals.log 2>signals.err

grep mysqld signals.log
[Wed Jun 11 19:03:23 2014] SIGKILL was sent to mysqld (pid:8707) by cfagent uid:0
[Fri Jun 13 21:37:27 2014] SIGKILL was sent to mysqld (pid:6583) by cfagent uid:0
[Sun Jun 15 05:05:34 2014] SIGKILL was sent to mysqld (pid:19818) by cfagent uid:0
[Wed Jul 9 07:03:47 2014] SIGKILL was sent to mysqld (pid:4802) by cfagent uid:0

Addendum: It had been so long since I had used this tooling that I could not remember the original source from which I derived the module above; some cursory searching to rectify this issue for this blog post found this original source by Eugene Teo of Red Hat made available under GPLv2.

From this we were able to show that cfagent was killing the mysqld process presumably via a misconfigured job; this information was returned to the client and this has continued to be run in production for two months now at the client’s request with no issues to speak of.

This is by no means the limit to what systemtap can be used to achieve; you can hook into functions though whilst you may need to install the debug packages to find what functions are available run for example:

sudo stap -L 'process("/usr/sbin/mysqld").function("*")' > /tmp/mysql_stapfunc
head /tmp/mysql_stapfunc

This is also true of the kernel using sudo stap -L 'kernel.function("*")' > /tmp/kernel_stapfunc however you must be booted into a debug kernel for this to function.

Systemtap is more than a worthy tool to have at your disposal with plenty of examples available.

Finally I invite you to join me July 23 at 10 a.m. Pacific time for my webinar, “What Every DBA Needs to Know About MySQL Security.” This detailed technical webinar provides insight into best security practices for either setting up a new MySQL environment or upgrading the security of an existing one. I hope to see you there!

The post Systemtap solves phantom MySQLd SIGTERM / SIGKILL issue appeared first on MySQL Performance Blog.

Q&A: Even More Deadly Mistakes of MySQL Development

Latest MySQL Performance Blog posts - July 17, 2014 - 9:57am

On Wednesday I gave a presentation on “How to Avoid Even More Common (but Deadly) MySQL Development Mistakes” for Percona MySQL Webinars.  If you missed it, you can still register to view the recording and my slides.

Thanks to everyone who attended, and especially to folks who asked the great questions.  I answered as many as we had time for  during the session, but here are all the questions with my complete answers:

Q: Disk bandwidth also not infinite

Indeed, you’re right!

We discussed in the webinar the impact on network bandwidth from using column wildcards in queries like SELECT *, but it’s also possible that using SELECT * can impact disk operations. Varchar, Blob, or Text columns can be stored on extra pages in the database, and if you include those columns in your query needlessly, it can cause the storage engine to do a lot of seeks and page reads unnecessarily.

For more details on string storage in InnoDB, see Peter Zaitsev’s blog on Blob Storage in Innodb.

Q: How many tables can be joined in a single query? What is the optimal number of joins?

MySQL has a limit of 63 table references in a given query. This limits how many JOIN operations you can do, and also limits the number of UNIONs. Actually you can go over this limit if your JOIN or UNION don’t reference any tables, that is, create a derived table of one row of expressions.

If you do join a lot of tables (or even self-join the same table many times), you’re likely to hit a practical scaling limit long before you reach 63 table references. The practical limit in your case depends on many factors, including the length of the tables, the data types, the type of join expressions in your queries, and your physical server’s capabilities. It’s not a fixed limit I can cite for you.

If you think you need dozens of table references in a single query, you should probably step back and reconsider your database design or your query design.

I often see this type of question (“what is the limit on the number of joins?”) when people try to use key/value tables, also called Entity-Attribute-Value, and they’re trying to pivot attributes from rows into columns, as if the table were stored in a conventional way with one column per attribute. This is a broken design for many reasons, and the scalability of many-way joins is just one problem with it.

Q: How many indexes can be created in a single table? Any limitation? What is the optimal number of indexes?

All MySQL storage engines support at least 16 indexes per table.

As far as the optimal number of indexes, I don’t pay attention to the number of indexes (as long as it remains lower than the max of 16). I try to make sure I have the right indexes for my queries. If you put an arbitrary cap of for example 8 or 10 indexes on a given table, then you might be running queries that lack a needed index, and the unnecessary extra cost of running that query is probably greater than the cost of maintaining the one extra index it needs.

That said, there are cases where you have such variation in query types that there’s no way to have optimal indexes to cover every possible case. Given that you can have multi-column indexes, and multi-column indexes with columns in different orders, there are n-factorial possible indexes on a table with n columns.

Q: There is a table with 3 columns: id(int), user_id(int), day(date). There is a high chance same user_id will ‘exist’ for every day. I read data by “where user_id = some_id” (very high throuhput) and delete all entries once a day by cron using “where sent_date = ’2014-01-01′ “. Have approx 6M rows per day deletion is pretty painfull. Will partitioning by column ‘day’ help me deleting those bulks faster? If yes – how much faster? How much will it slow down SELECTs? – not all entries are deleted, but only entries for some specific old day, e.g. ‘ WHERE day = ’1 week ago’

Range partitioning by date would give you the opportunity to ALTER TABLE…DROP PARTITION, so you could remove all data for a given date very quickly, much faster than deleting millions of rows. The performance of DROP PARTITION is like that of DROP TABLE, because each partition is physically stored like a separate table.

Searching for “where user_id = ?” would not be able to take advantage of partition pruning, but it would still be able to use an index on user_id. And if you drop old partitions, the benefit of searching a smaller table could be a good tradeoff.

Q: Regarding 20% selectivity as a threshold for the optimizer preferring a table-scan to an index lookup – is that a tunable?

No, it’s not tunable, it’s a fixed behavior of the query optimizer. If you search for a value and the optimizer estimates that > 20% of rows contain the value you search for, it will bypass the index and just do a table-scan.

For the same reason that the index of a book doesn’t contain very common words, because the list of pages that word appears on would be too long, and flipping back and forth from the back of the book to each listed page would actually be more work than just reading the book.

Also keep in mind my figure of 20% is approximate. Your results may vary. This is not a magic threshold in the source code, it’s just a tendency I have observed.

Q: Regarding generating synthetic test data, it sounds like a pretty easy perl script to write.

Yes, it might be easy to do that for one given table. But every table is different, and you might have hundreds of tables in dozens of applications to generate test data for. You might also want to vary the distribution of data values from one test to another.

Writing a test-data generator for one particular case is easy, so you might reasonably do it as a one-off task. Writing a general-purpose test-data generator that you can use for many cases is more work.

Q: Would love to have the set of URLs cited in the presentation without having to go back and mine them out of the presentation.

Open source message queues:

MySQL Performance Blog articles:

Open source test-data generator:

Load-testing tools for web applications:

Load-testing tools to replay query logs:

Further reading for implementing business rules:

Q: How to best use mysql query cache?

Any cache is best used if you read from it many times for each time you write to it. So we’d like to estimate the average ratio of query cache reads to writes, to estimate how much leverage it’s giving us.


Check the values for QCache_hits (which are cases when a query result was read from the query cache) over QCache_inserts (which are cases when the desired query result was not in the cache, and had to be run and then the result stored in the cache). I like to see a ratio of 1000% or more (i.e. 10:1 hits to inserts).

If you have a poor ratio, for example less than 1:1 or less than 100%, then consider disabling the query cache, because it may be costing more to maintain it than the performance benefit it’s giving you.

Keep in mind that this is only a guideline, because the calculation I described is only an average. It could be that the queries served by the query cache are very expensive, so using the cached result is a great benefit even if it accounts for a small number of hits. The only way to be certain is to load-test your application under your load, and compare overall performance results with the query cache enabled or disabled, and at different sizes.

Q: How to detect when too much indexes start to affect performance?

Some people are reluctant to create indexes because they have been warned that indexes require synchronous updates when you INSERT, UPDATE, or DELETE rows. Some people also make the generalization that indexes harm writes but benefit reads. Bot of these are not true.

Your DML operations aren’t really updating indexes in real time. InnoDB includes a feature called change buffering, which defers index updates. The change buffer is gradually merged into the index over time. That way, InnoDB can handle a big spike in traffic without it hurting throughput as much. You can monitor how much content in the change buffer remains to be merged:
mysql> SHOW GLOBAL STATUS LIKE 'Innodb_ibuf_size';

It’s also not accurate that indexes hurt writes. UPDATE and DELETE statements usually have a WHERE clause, to apply the changes to particular rows. These conditions use indexes to reduce the examined rows, just like in SELECT statements. But in UPDATE and DELETE statements, it’s even more important to use indexes, because otherwise the statement has to lock a lot of rows to ensure it locks the rows you’re changing.

So I generally say, don’t avoid indexes based only on the number of indexes you have, just make sure your indexes are being employed by the queries you run, and drop indexes that aren’t used. Here are a couple of past blog posts that show how to do this:

Thanks again for attending my webinar!  Here are some more tips:

  • Check out upcoming Percona Training classes in North America and Europe.
  • Join Percona and the MySQL community at our Percona Live.
  • Watch more webinars from Percona in the future!

The post Q&A: Even More Deadly Mistakes of MySQL Development appeared first on MySQL Performance Blog.

High Availability with mysqlnd_ms on Percona XtraDB Cluster

Latest MySQL Performance Blog posts - July 16, 2014 - 7:11am

This is the second part of my series on High Availability with mysqlnd_ms. In my first post, “Simple MySQL Master HA with mysqlnd_ms,” I showed a simple HA solution using asynchronous MySQL replication. This time we will see how to leverage an all-primary cluster where you can write to all nodes. In this post I used Percona XtraDB Cluster, but you should also be able to do the same with MySQL NDB Cluster or Tungsten Replicator.

To start with, here is the mysqlnd_ms configuration I used:mysqlnd_ms_mm.ini.  All of these files are available from my Github repository. Below, I have three Percona XtraDB Cluster nodes, all defined as masters and no slaves. I’ve configured a roundrobin filter where all connections will happen on the first node, in this case192.168.56.44 . In case the first node fails, the second node will be used and so forth until no more nodes are available. Another interesting configuration option here is the loop_before_master strategy whereby if connection or a statement to the current server fails, it will be retried silently on the remaining nodes before returning an error to the user, more on this below.

{ "primary": { "master": { "master_1": { "host": "", "port": "3306" }, "master_2": { "host": "", "port": "3306" }, "master_3": { "host": "", "port": "3306" } }, "slave": { }, "filters": { "roundrobin": [ ] }, "failover": { "strategy": "loop_before_master", "remember_failed": true } } }

Similar to my previous post, I also used a custom INI file for PHP to use, this time aptly namedmaster-master.ini :

mysqlnd_ms.enable = 1 mysqlnd_ms.disable_rw_split = 1 mysqlnd_ms.multi_master = 1 mysqlnd_ms.force_config_usage = 1 mysqlnd_ms.config_file = /home/revin/git/demo-me/phpugph201407/mysqlnd_ms_mm.ini

A new addition to this configuration ismysqlnd_ms.multi_master , when enabled it would allow you to use all nodes or just one and treat the others as passive. The PHP script I used this time is calledmaster-master.php , it is largely similar tomaster-slave-ng.phpwith a few differences:

  1. There is no need for /tmp/PRIMARY_HAS_FAILED  sentinel as all nodes were writable.
  2. There is no need for /*ms=master*/  SQL hint when validating a connection from connect_mysql function since all nodes acts as master.

So here is a quick test, first with roundrobin filter, after 4 INSERTs, I shutdown  which sends my connection to the next server in the configuration, . When I started back  again, the script resumed connections there. Pretty cool right?

[revin@forge phpugph201407]$ php -c master-master.ini master-master.php Last value 3564 from host via TCP/IP and thread id 19 Last value 3565 from host via TCP/IP and thread id 20 Last value 3566 from host via TCP/IP and thread id 21 Last value 3567 from host via TCP/IP and thread id 22 Warning: connect_mysql(): MySQL server has gone away in /home/revin/git/demo-me/phpugph201407/master-master.php on line 63 Warning: connect_mysql(): Error while reading greeting packet. PID=23464 in /home/revin/git/demo-me/phpugph201407/master-master.php on line 63 ERRROR: via TCP/IP [2006] MySQL server has gone away on line 30 Last value 0 from host and thread id 0 Last value 3568 from host via TCP/IP and thread id 1552 Last value 3569 from host via TCP/IP and thread id 1553 [...] Last value 3584 from host via TCP/IP and thread id 1568 Last value 3585 from host via TCP/IP and thread id 18

Here’s another test using the random filter which allows you to write to all nodes, on my mysqlnd_ms_mm.ini above, I just changed roundrobin  torandom . As you can see, all three nodes were being used, of course in random, at the same time you will also see when I shutdown  around where the connect_mysql  errors and then the server was used again near the bottom after a started it back up. Still pretty cool right?

[revin@forge phpugph201407]$ php -c master-master.ini master-master.php Last value 3590 from host via TCP/IP and thread id 2060 Last value 3591 from host via TCP/IP and thread id 1569 Last value 3592 from host via TCP/IP and thread id 1570 Warning: connect_mysql(): MySQL server has gone away in /home/revin/git/demo-me/phpugph201407/master-master.php on line 63 Warning: connect_mysql(): Error while reading greeting packet. PID=23919 in /home/revin/git/demo-me/phpugph201407/master-master.php on line 63 ERRROR: via TCP/IP [2006] MySQL server has gone away on line 30 Last value 0 from host and thread id 0 Last value 3593 from host via TCP/IP and thread id 2061 Last value 3594 from host via TCP/IP and thread id 2062 Last value 3595 from host via TCP/IP and thread id 2063 Last value 3596 from host via TCP/IP and thread id 2064 Last value 3597 from host via TCP/IP and thread id 1576 Last value 3598 from host via TCP/IP and thread id 1577 Last value 3599 from host via TCP/IP and thread id 1578 Last value 3600 from host via TCP/IP and thread id 1579 Last value 3601 from host via TCP/IP and thread id 2065 Last value 3602 from host via TCP/IP and thread id 1581 Last value 3603 from host via TCP/IP and thread id 1582 Last value 3604 from host via TCP/IP and thread id 2066 Last value 3605 from host via TCP/IP and thread id 19 Last value 3606 from host via TCP/IP and thread id 1583 Last value 3607 from host via TCP/IP and thread id 21

So here are some issues I’ve observed during these tests:

  1. remember_failed  during failover does not work as advertised. Supposedly, a failed node should not be used again for every connection request but in my test, this is not the case. See more from this bug. This means that if you have 2 out of 3 failed nodes in this scenario the overhead would be too big when testing both connections. Perhaps some sort of in memory shared TTL can be used to overcome this? I’m not sure.
  2. If you look closely around line 7 on my last output above the error displayed is kind of misleading. In particular it saysERRROR: via TCP/IP , whereby it was not  that failed, it was192.168.56.43 . This is because under the hood, immediately after failure the next node will be cycled to, this is especially true since we have loop_before_master configured. I sure do have a bug on the script that should capture the host_info  properly, but this is something to always keep in mind so you don’t keep scratching your head.

So we’ve seen these two forms of possibilities and they definitely have use cases and advantages. On the other hand because of the issues we have found so far (I’ve reported 4 bugs on the PHP bugs database during the course of these tests including one crashing), I recommend to make sure you test seriously before putting this on production.

The post High Availability with mysqlnd_ms on Percona XtraDB Cluster appeared first on MySQL Performance Blog.

TokuDB tips: MySQL backups

Latest MySQL Performance Blog posts - July 15, 2014 - 3:00am

In my recent post, “TokuDB gotchas: slow INFORMATION_SCHEMA TABLES,” I saw a couple questions and tweets asking if we use TokuDB in production. Actually I mentioned it in that post and we also blogged about it in a couple of other recent posts:

So, yes, we are using Percona Server + TokuDB as a main storage engine in Percona Cloud Tools to store timeseries data.

And, yes, Percona Server + TokuDB is available GA Percona Server 5.6.19-67.0 with TokuDB (GA).

Just having good performance is not enough to make it into production; there are also operational questions and one such question is about backups. I want to explain how we do backups for Percona Server + TokuDB in Percona Cloud Tools.

I should say up front, that we DO NOT have support for TokuDB in Percona XtraBackup. TokuDB internals are significantly different from InnoDB/XtraDB, so it will be a major project to add this to Percona XtraBackup and we do not have any plans at the moment to work on this.

It does not mean that TokuDB users do not have options for backups. There is Tokutek Hot back-up, included in the Tokutek Enterpise Subscription. And there is a method we use in Percona Cloud Tools: LVM Backups. We use mylvmbackup scripts for this task and it works fairly well for us.

There is however some gotchas to be aware. If you understand an LVM backups mechanic, this is basically a managed crash recovery process when you restore from a backup.

Now we need to go in a little detail for TokuDB. To support transactions that involve both TokuDB and InnoDB engines, TokuDB uses a two-phase commit mechanism in MySQL. When involved, the two-phase commit requires binary logs presented for a proper recovery procedures.

But now we need to take a look at how we setup a binary log in Percona Cloud Tools. We used SSD for the main data storage (LVM partition is here) and we use a Hardware RAID1 over two hard-drives for binary logs. We choose this setup as we care about SSD lifetime. In write-intensive workloads, binary logs will produce a lot of write operations and in our calculation we will just burn these SSDs, so we have to store them on something less expensive.

So the problem there is that when we take an LVM snapshot over main storage, we do not have a consistent view of binary logs (although it is possible to modify backup scripts to copy the current binary log under FLUSH TABLES WITH READ LOCK operation, this is probably what we will do next). But binary logs are needed for recovery, without them we face these kind of errors during restoring from backup:

2014-DD-MM 02:15:16 16414 [Note] Found 1 prepared transaction(s) in TokuDB 2014-DD-MM 02:15:16 16414 [ERROR] Found 1 prepared transactions! It means that mysqld was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start mysqld with --tc-heuristic-recover switch to commit or rollback pending transactions. 2014-DD-MM 02:15:16 16414 [ERROR] Aborting

The error message actually hints a way out. Unfortunately it seems that we are the first ones to have ever tried this option, as tc-heuristic-recover is totally broken in current MySQL and not supposed to work… and it would be noticed if someone really tried it before us (which gives me an impression that Oracle/MySQL never properly tested it, but that is a different story).

We will fix this in Percona Server soon.

So the way to handle a recovery from LVM backup without binary logs is to start mysqld with –tc-heuristic-recover switch (unfortunately I did not figure out yet, should it be COMMIT or ROLLBACK value, hehe).

The proper way to use LVM backup is to have a corresponding binary log file, like I said it will require a modification to mylvmbackup script.

I should say this is not the only way we do backups in Percona Cloud Tools. In this project we use Percona Backup Service provided by the Percona Managed Services team, and our team also uses mydumper to perform a logical backup of data.
While it works acceptably to backup hundreds of gigabytes worth of data (it is just a sequential scan, which should be easy for TokuDB), the full recovery is painful and takes unacceptably long. So mydumper backup (recovery) will be used if we ever need to perform a fine-grained recovery (i.e only small amount of specific tables).

So I hope this tip is useful if you are looking for info about how to do backups for TokuDB.

The post TokuDB tips: MySQL backups appeared first on MySQL Performance Blog.

Simple MySQL Master HA with mysqlnd_ms

Latest MySQL Performance Blog posts - July 14, 2014 - 7:51am

I had the pleasure of presenting to the PHP Users Group Philippines a few days ago about mysqlnd_ms. The mysqlnd plugin, MySQL Master Slave, is a transparent layer on top of mysqlnd extension. This allows you to do read-write splitting and slave reads load balancing without needing to change anything from your application. But do you know you can also achieve a form of high availability with this plugin? I shared 2 forms on my presentation, using async MySQL replication either in master-slave configuration or master-master configuration, while the second form is having an all primary cluster where you can write to all nodes.

This first part is to demonstrate how you can achieve a simple HA solution using the first form. First, all the sample code here can be found on my GitHub repository. So, to use the mysqlnd_ms plugin, it uses an additional external configuration file in JSON format. This configuration file, will define your master and slave nodes, failover properties and any filters (connection selection method) you want to dictate how the algorithm will provide you the connection.

Let’s start with the mysqlnd_ms configuration I used,mysqlnd_ms_ms.ini :

{ "primary": { "master": { "master_1": { "host": "", "port": "33001" } }, "slave": { } }, "standby": { "master": { "master_1": { "host": "", "port": "33002" } }, "slave": { } } }

Here, I have two applications defined, one called “primary” and another called “standby”, I have not defined any slaves for simplicity. The two MySQL instances running on port 33001 and 33002 are in master-master configuration.

mysqlnd_ms.enable = 1 mysqlnd_ms.disable_rw_split = 1 mysqlnd_ms.force_config_usage = 1 mysqlnd_ms.config_file = /home/revin/git/demo-me/phpugph201407/mysqlnd_ms_ms.ini

This is the custom INI file I used for the tests,master-slave.ini . The first line simply enables the plugin for use. The second line, mysqlnd_ms.disable_rw_split instructs the plugin that I should only send all queries to the master because I only have masters for this test.

As for the PHP script, the full copy can be found here, as it is a bit lengthy I will just explain the logic on what it does.

  1. To start the test, it bootstraps the test table via DROP and then CREATE queries.
  2. It then enters a for loop where it will execute an INSERT followed by a SELECT to validate the newly inserted row and additional information like the current active server id and the connection id.
  3. For every iteration of the loop, a new mysqli object is created to simulate non-persistent connections to the database server.
  4. To create the new connection, a call to the function connect_mysql  is made which returns a mysqli object when successful. An important thing to remember here is that mysqlnd_ms uses lazy connections by default, this means that when the mysqli object is created, it is not really connected yet to the server. One has to issue a statement like 'SELECT 1'  to start the connection manually or callmysqli::real_connect . Not even mysqli::ping  does not work without the former, I’ve opened this bug.
  5. After the mysqli object is returned, the INSERT statement will trigger mysqlnd_ms to actually establish the connection and then execute the statement. This is where the good part is, if the connection cannot be made, the query_write_mysql function will know and will re-request the connection from connect_mysql, this time within the connect_mysql function, connection to the primary will be retried at least 10 times if the type of error from the previous failure is something related to a connection like error numbers 2002  and2003 . If the connection cannot be established after 10 retries, the application creates a sentinel file as /tmp/PRIMARY_HAS_FAILED  and will retry the connection to the secondary (slave or passive-master).

Here is an example run, my primary has a server id or 101 while my standby is 102:

[revin@forge phpugph201407]$ php -c master-slave.ini master-slave-ng.php Last value 0001 from server id 101 thread id 7 Last value 0003 from server id 101 thread id 8 37: [2002] Connection refused Connection to host 'primary' failed: [0] Connection refused, retrying (1 of 10) in 3 seconds Connection to host 'primary' failed: [0] Connection refused, retrying (2 of 10) in 3 seconds Connection to host 'primary' failed: [0] Connection refused, retrying (3 of 10) in 3 seconds Connection to host 'primary' failed: [0] Connection refused, retrying (4 of 10) in 3 seconds Connection to host 'primary' failed: [0] Connection refused, retrying (5 of 10) in 3 seconds Connection to host 'primary' failed: [0] Connection refused, retrying (6 of 10) in 3 seconds Connection to host 'primary' failed: [0] Connection refused, retrying (7 of 10) in 3 seconds Connection to host 'primary' failed: [0] Connection refused, retrying (8 of 10) in 3 seconds Connection to host 'primary' failed: [0] Connection refused, retrying (9 of 10) in 3 seconds Connection to host 'primary' failed: [0] Connection refused, retrying (10 of 10) in 3 seconds The primary host 'primary' has failed after 30 seconds, failing over to standby! 52: [2002] Connection refused Last value 0004 from server id 102 thread id 635 Last value 0006 from server id 102 thread id 636 Last value 0008 from server id 102 thread id 637 [...]

This is not the perfect setup and there are a number of limitations, however it tells us that if you have a simple HA requirement like if you’re not running a very critical application but still do not want to be waken up at night but rather deal with issues in the morning, this might just fit. So here are some more notes:

    • If you have a master-slave configuration, you just basically shot your primary (master) in the foot during the failover. You may need to rebuild its data in the morning.
    • If instead you have master-master, you might just be able to bring the primary master back online, get it caught up in replication and then delete /tmp/PRIMARY_HAS_FAILED  file to switch your application back to it.
    • The use of /tmp/PRIMARY_HAS_FAILED  sentinel file is rudimentary, its not the only way. You should consider sending notifications to yourself when failover happens because this method requires human intervention to put back the primary master back in rotation.

The same effect can be achieved with a little more coding, but you can already take advantage of the plugin with less.

I’ve also tested the plugin on the second form where you can write to multiple masters using Percona XtraDB Cluster. I’ve found a few interesting issues there so stay tuned.

The post Simple MySQL Master HA with mysqlnd_ms appeared first on MySQL Performance Blog.

Managing shards of MySQL databases with MySQL Fabric

Latest MySQL Performance Blog posts - July 11, 2014 - 2:00am

This is the fourth post in our MySQL Fabric series. In case you’re joining us now, we started with an introductory post, and then discussed High Availability (HA) using MySQL Fabric here (Part 1) and here (Part 2). Today we will talk about how MySQL Fabric can help you scale out MySQL databases with sharding.


At the time of writing, MySQL Fabric includes support for range- and hash-based sharding. As with HA, the functionality is split between client, through a MySQL Fabric-aware connector; and server, through the mysqlfabric utility and the XML-RPC server we’ve talked about before.

In this post, we’ll go through the process of setting up a sharded table for use with MySQL Fabric, and then go through some usage examples, again using the Python connector.

In our next post, we’ll talk about shard management operations, and go into more detail about how we can combine the Sharding and HA features of MySQL Fabric.

The architecture

For our examples, we’ll be using a sharding branch from our vagrant-fabric repository. If you have been following previous posts and already have a local copy of the repo, you can get this one just by running the following command:

git checkout sharding

from the root of your copy. Bear in mind that the node names are the same in the Vagrantfile, so while in theory  just running vagrant provision should be enough, you may have to run vagrant destroy and vagrant up again, if you hit unexpected behavior.

The only difference between this branch and the original one is that you’ll have two mysqld instances per node: one on port 3306 and one on port 13306. This will let us achieve high availability for our shard groups. But don’t worry about that for now, it’s something we’ll discuss more in depth in our next post.

In today’s examples, we’ll be using the three group architecture described by this diagram:

The blue boxes represent shard-groups and the green box represent the global-group. The red arrows indicate the flow of replication and the violet arrows represent client connections.

Setting up sharding

The official documentation about sharding with MySQL Fabric can be found here. We’ll be using the same example employees database and shard the salaries table.

As we said, to keep things simple for the introduction, we’ll create all the groups but only add one instance to each one of them. In our next post, we’ll use two instances per group to evaluate how MySQL Fabric can make our shards highly available, and how it can rearrange replication topologies automatically after a failure.

To start, let’s create three groups:

[vagrant@store ~]$ mysqlfabric group create salaries-global Procedure : { uuid = 390aa6c0-acda-40e2-ad52-8c0869613635, finished = True, success = True, return = True, activities = } [vagrant@store ~]$ for i in 1 2; do mysqlfabric group create salaries-$i; done Procedure : { uuid = 274742a2-5e84-49b8-8446-5a8fc55f1899, finished = True, success = True, return = True, activities = } Procedure : { uuid = 408cfd6a-ff3a-493e-b39b-a3241d83fda6, finished = True, success = True, return = True, activities = }


The global group will be used to propagate schema changes and to store unpartitioned data. Think of configuration tables that don’t need to be sharded, for example.

The other two groups will host shards, that is, tables that will have the same structure across all the nodes, but not the same data (and that will be empty in the global group’s nodes).

Now, let’s add one instance to each group:

[vagrant@store ~]$ mysqlfabric group add salaries-global node1:3306 Procedure : { uuid = 0d0f657c-9304-4e3f-bf5b-a63a5e2e4390, finished = True, success = True, return = True, activities = } [vagrant@store ~]$ mysqlfabric group add salaries-1 node2:3306 Procedure : { uuid = b0ee9a52-49a2-416e-bdfd-eda9a384f308, finished = True, success = True, return = True, activities = } [vagrant@store ~]$ mysqlfabric group add salaries-2 node3:3306 Procedure : { uuid = ea5d8fc5-d4f9-48b1-b349-49520aa74e41, finished = True, success = True, return = True, activities = }

We also need to promote the groups. Even though each group has a single node, MySQL Fabric sets up that node as SECONDARY, which means it can’t take writes.

[vagrant@store ~]$ mysqlfabric group promote salaries-global Procedure : { uuid = 5e764b97-281a-49f0-b486-25088a96d96b, finished = True, success = True, return = True, activities = } [vagrant@store ~]$ for i in 1 2; do mysqlfabric group promote salaries-$i; done Procedure : { uuid = 7814e96f-71d7-4865-a278-cb6ed32a2d11, finished = True, success = True, return = True, activities = } Procedure : { uuid = cd30e9a9-b9ea-4b2d-a8ae-5e70f22363d6, finished = True, success = True, return = True, activities = }

Finally, we are ready to create a shard definition and associate ranges to groups:

[vagrant@store ~]$ mysqlfabric sharding create_definition RANGE salaries-global Procedure : { uuid = fffcbb5f-24c6-47a2-9348-f1d810c8ef2f, finished = True, success = True, return = 1, activities = } [vagrant@store ~]$ mysqlfabric sharding add_table 1 employees.salaries emp_no Procedure : { uuid = 8d0a3c51-d543-49a6-b47a-36a4ab499ab4, finished = True, success = True, return = True, activities = } [vagrant@store ~]$ mysqlfabric sharding add_shard 1 "salaries-1/1, salaries-2/25000" --state=ENABLED Procedure : { uuid = 2585a5ea-a097-44a4-89fa-a948298d0595, finished = True, success = True, return = True, activities =

The integer after each shard group is the lower bound for emp_no values found on that shard.

After the last command, the shard groups should be replicating off the global one. We can verify that this is the case by checking salaries-3:

[vagrant@node3 ~]$ mysql -uroot -e 'show slave statusG' *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: node1 Master_User: fabric Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin.000001 Read_Master_Log_Pos: 151 Relay_Log_File: mysqld-relay-bin.000002 Relay_Log_Pos: 361 Relay_Master_Log_File: mysql-bin.000001 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 151 Relay_Log_Space: 566 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 870101 Master_UUID: e34ab4cd-00b9-11e4-8ced-0800274fb806 Master_Info_File: /var/lib/mysql/master.info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: Executed_Gtid_Set: Auto_Position: 1

Looks good. Let’s go ahead and create the database schema. To avoid being too verbose, we’re only including the create statement for the salaries table in this example. Notice we run this on the PRIMARY node for the global group:

[vagrant@node1 ~]$ mysql -uroot Welcome to the MySQL monitor. Commands end with ; or g. Your MySQL connection id is 8 Server version: 5.6.19-log MySQL Community Server (GPL) Copyright (c) 2000, 2014, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or 'h' for help. Type 'c' to clear the current input statement. mysql> CREATE DATABASE IF NOT EXISTS employees; Query OK, 1 row affected (0.01 sec) mysql> USE employees; Database changed mysql> CREATE TABLE salaries ( -> emp_no INT NOT NULL, -> salary INT NOT NULL, -> from_date DATE NOT NULL, -> to_date DATE NOT NULL, -> KEY (emp_no)); Query OK, 0 rows affected (0.06 sec)

And again, check that it made it to the shard groups:

[vagrant@node2 ~]$ mysql -uroot -e 'show databases' +--------------------+ | Database | +--------------------+ | information_schema | | employees | | mysql | | performance_schema | +--------------------+

Good. We’re now ready to use the Python connector and load some data into this table. We’ll be using the following script:

import mysql.connector from mysql.connector import fabric from mysql.connector import errors import time import random import datetime config = { 'fabric': { 'host': 'store', 'port': 8080, 'username': 'admin', 'password': 'admin', 'report_errors': True }, 'user': 'fabric', 'password': 'f4bric', 'database': 'employees', 'autocommit': 'true' } from_min = datetime.datetime(1980,1,1,00,00,00) to_max = datetime.datetime(2014,1,1,00,00,00) fcnx = None print "starting loop" while 1: if fcnx == None: print "connecting" fcnx = mysql.connector.connect(**config) fcnx.reset_cache() try: print "will run query" emp_no = random.randint(1,50000) salary = random.randint(1,200000) from_date = from_min + datetime.timedelta(seconds=random.randint(0, int((to_max - from_min).total_seconds()))) to_date = from_min + datetime.timedelta(seconds=random.randint(0, int((to_max - from_min).total_seconds()))) fcnx.set_property(tables=["employees.salaries"], key=emp_no, mode=fabric.MODE_READWRITE) cur = fcnx.cursor() cur.execute("insert into employees.salaries (emp_no,salary,from_date,to_date) values (%s, %s, %s, %s)",(emp_no,salary,from_date,to_date)) print "inserted", emp_no, ", will now sleep 1 second" time.sleep(1) except (errors.DatabaseError, errors.InterfaceError): print "sleeping 1 second and reconnecting" time.sleep(1) del fcnx fcnx = None

This is similar to the script we used in our HA post. It inserts rows with random data in an endless loop. The sleep on every iteration is there just to make it easier to cancel the script, and to keep row insert rate under control.

If you leave this running for a while, you should then be able to check the global server and individual shards, and confirm they have different data:

[vagrant@store ~]$ for i in 1 2 3; do mysql -ufabric -pf4bric -hnode$i -e "select count(emp_no),max(emp_no) from employees.salaries"; done Warning: Using a password on the command line interface can be insecure. +---------------+-------------+ | count(emp_no) | max(emp_no) | +---------------+-------------+ | 0 | NULL | +---------------+-------------+ Warning: Using a password on the command line interface can be insecure. +---------------+-------------+ | count(emp_no) | max(emp_no) | +---------------+-------------+ | 36 | 24982 | +---------------+-------------+ Warning: Using a password on the command line interface can be insecure. +---------------+-------------+ | count(emp_no) | max(emp_no) | +---------------+-------------+ | 43 | 49423 | +---------------+-------------+

As you can see, the global group’s server has no data for this table, and each shard’s server has data within the defined boundaries.

Querying data is done similarly (though with a READ_ONLY connection), and we can also lookup the group a row belongs to using the mysqlfabric utility directly:

[vagrant@store ~]$ mysqlfabric sharding lookup_servers employees.salaries 2045 Command : { success = True return = [['ecab7dd2-00b9-11e4-8cee-0800274fb806', 'node2:3306', True]] activities = } [vagrant@store ~]$ mysqlfabric sharding lookup_servers employees.salaries 142045 Command : { success = True return = [['f8a90096-00b9-11e4-8cee-0800274fb806', 'node3:3306', True]] activities = }

Bear in mind that this lookups only use the fabric store, which means they can tell you on which servers a given row may be, but can’t confirm if the row exists or not. You need to actually query the given servers for that. If you use the connector, both steps are done for you when you issue the query.

The following code snippets illustrate the point:

>>> fcnx = mysql.connector.connect(**config) >>> emp_no = random.randint(1,50000) >>> fcnx.set_property(tables=["employees.salaries"], key=emp_no, mode=fabric.MODE_READONLY) >>> >>> cur = fcnx.cursor() >>> cur.execute("select count(*) as cnt from employees.salaries where emp_no = %s", (emp_no,)) >>> >>> for row in cur: ... print row ... (0,) >>> fcnx.set_property(tables=["employees.salaries"], key=20734, mode=fabric.MODE_READONLY) >>> cur = fcnx.cursor() >>> cur.execute("select count(*) as cnt from employees.salaries where emp_no = 20734") >>> for row in cur: ... print row ... ... (1,) >>>

In our examples, we connected directly to the PRIMARY node of the global group in order to execute DDL statements, but the same can be done requesting a global connection to MySQL Fabric, like so:

fcnx.set_property(group="salaries-global",scope=fabric.SCOPE_GLOBAL,mode=fabric.MODE_READWRITE) cur = fcnx.cursor() cur.execute("USE employees") cur.execute("CREATE TABLE test (id int not null primary key)")

We can see that the table gets replicated as expected:

[vagrant@node3 ~]$ mysql -uroot -e 'show tables from employees' +---------------------+ | Tables_in_employees | +---------------------+ | salaries | | test | +---------------------+

Note that we’re explicitly indicating we want to connect to the global group here. When establishing a MySQL Fabric connection, we need to specify either a group name or a key and table pair (as in the insert example).


Today we’ve presented the basics of how MySQL Fabric can help you scale out by sharding, but we’ve intentionally left a few things out of the picture to keep this example simple.

In our next post, we’ll see how we can combine MySQL Fabric’s HA and sharding features, what support we have for shard operations and how HASH sharding works in MySQL Fabric.

The post Managing shards of MySQL databases with MySQL Fabric appeared first on MySQL Performance Blog.

Percona Toolkit 2.2.9 is now available

Latest MySQL Performance Blog posts - July 10, 2014 - 4:05am

Percona is glad to announce the release of Percona Toolkit 2.2.9 on July 10, 2014 (downloads are available here and from the Percona Software Repositories). This release is the current GA (Generally Available) stable release in the 2.2 series.

Bugs Fixed:

  • Fixed bug 1335960: pt-query-digest could not parse the binlogs from MySQL 5.6 because the binlog format was changed.
  • Fixed bug 1315130: pt-online-schema-change did not find child tables as expected. It could incorrectly locate tables – tables which reference a table with the same name in a different schema, and could miss tables referencing the altered table – if they were in a different schema..
  • Fixed bug 1335322: pt-stalk would fail when variable or threshold was a non-integer.
  • Fixed bug 1258135: pt-deadlock-logger was inserting older deadlocks into the deadlock table even if it was already there, therby creating unnecessary noise. For example, if the deadlock happened 1 year ago, and MySQL keeps it in the memory, pt-deadlock-logger would INSERT it into percona.deadlocks table every minute until server was restarted. This was fixed by comparing with the last deadlock fingerprint before issuing the INSERT query.
  • Fixed bug 1329422: pt-online-schema-change foreign-keys-method=none can break FK constraints in a way that is hard to recover from. Although this method of handling foreign key constraints is provided so that the database administrator can disable the tool’s built-in functionality if desired, a warning and confirmation request when using alter-foreign-keys-method “none” has been added to warn users when using this option.

Percona Toolkit is free and open-source. Details of the release can be found in the release notes and the 2.2.9 milestone at Launchpad. Bugs can be reported on the Percona Toolkit launchpad bug tracker.

The post Percona Toolkit 2.2.9 is now available appeared first on MySQL Performance Blog.


Latest MySQL Performance Blog posts - July 9, 2014 - 9:01am

We are using Percona Server + TokuDB engine extensively in Percona Cloud Tools and getting real usage operational experience with this engine. So I want to share some findings we came across, in hope it may help someone in their work with TokuDB.

So, one problem I faced is that SELECT * FROM INFORMATION_SCHEMA.TABLES is quite slow when I have thousands tables in TokuDB. How slow? For example…

select * from information_schema.tables limit 1000; ... 1000 rows in set (18 min 31.93 sec)

This is very similar to what InnoDB faced a couple years back. InnoDB solved it by adding variable innodb_stats_on_metadata.

So what happens with TokuDB? There is an explanation from Rich Prohaska at Tokutek: “Tokudb has too much overhead for table opens. TokuDB does a calculation on the table when it is opened to see if it is empty. This calculation can be disabled when ‘tokudb_empty_scan=disabled‘. ”

So let’s see what we have with tokudb_empty_scan=disabled

select * from information_schema.tables limit 1000; ... 1000 rows in set (3 min 4.59 sec)

An impressive improvement, but still somewhat slow. Tokutek promises a fix to improve it in the next TokuDB 7.2 release.

The post TokuDB gotchas: slow INFORMATION_SCHEMA TABLES appeared first on MySQL Performance Blog.

Check for MySQL slave lag with Percona Toolkit plugin for Tungsten Replicator

Latest MySQL Performance Blog posts - July 9, 2014 - 6:28am

A while back, I made some changes to the plugin interface for pt-online-schema-change which allows custom replication checks to be written. As I was adding this functionality, I also added the --plugin option to pt-table-checksum. This was released in Percona Toolkit 2.2.8.

With these additions, I spent some time writing a plugin that allows Percona Toolkit tools to use Tungsten Replicator to check for slave lag, you can find the code at https://github.com/grypyrg/percona-toolkit-plugin-tungsten-replicator


The plugin uses the perl JSON::XS module (perl-JSON-XS rpm package, http://search.cpan.org/dist/JSON-XS/XS.pm), make sure it’s available or the plugin will not work.


We need to use the --recursion-method=dsns as the Percona Toolkit tools are not able to automatically find the tungsten replicator slaves that are connected to the master database. (I did add a blueprint on launchpad to make this possible https://blueprints.launchpad.net/percona-toolkit/+spec/plugin-custom-recursion-method)

The dsns recursion-method gets the list of slaves from a database table you specify:

CREATE TABLE `percona`.`dsns` ( `id` int(11) NOT NULL AUTO_INCREMENT, `parent_id` int(11) DEFAULT NULL, `dsn` varchar(255) NOT NULL, PRIMARY KEY (`id`) );

Here one slave node3 is replicating from the master:

node1 mysql> select * from percona.dsns; +----+-----------+---------+ | id | parent_id | dsn | +----+-----------+---------+ | 2 | NULL | h=node3 | +----+-----------+---------+


Currently, it is not possible to specify extra options for the plugin with Percona Toolkit, so some manual editing of the perl file is still necessary to configure it.

So before we can run a checksum, we need to configure the plugin:

## CONFIGURATION # trepctl command to run my $trepctl="/opt/tungsten/installs/cookbook/tungsten/tungsten-replicator/bin/trepctl"; # what tungsten replicator service to check my $service="bravo"; # what user does tungsten replicator use to perform the writes? # See Binlog Format for more information my $tungstenusername = 'tungsten';

Running A Checksum

Here I did a checksum of a table with pt-table-checksum. During the checksum process, I brought the slave node offline and brought it back online again:

# pt-table-checksum -u checksum --no-check-binlog-format --recursion-method=dsn=D=percona,t=dsns --plugin=/vagrant/pt-plugin-tungsten_replicator.pl --databases app --check-interval=5 --max-lag=10 Created plugin from /vagrant/pt-plugin-tungsten_replicator.pl. PLUGIN get_slave_lag: Using Tungsten Replicator to check replication lag Tungsten Replicator status of host node3 is OFFLINE:NORMAL, waiting Tungsten Replicator status of host node3 is OFFLINE:NORMAL, waiting Replica node3 is stopped. Waiting. Tungsten Replicator status of host node3 is OFFLINE:NORMAL, waiting Replica lag is 125 seconds on node3. Waiting. Replica lag is 119 seconds on node3. Waiting. Checksumming app.large_table: 22% 00:12 remain TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE 07-03T10:49:54 0 0 2097152 7 0 213.238 app.large_table

I recommend to change the check-interval higher than the default 1 second as running trepctl takes a while. This could slow down the process quite a lot.

Making Schema Changes

The plugin also works with pt-online-schema-change:

# pt-online-schema-change -u schemachange --recursion-method=dsn=D=percona,t=dsns --plugin=/vagrant/pt-plugin-tungsten_replicator.pl --check-interval=5 --max-lag=10 --alter "add index (column1) " --execute D=app,t=large_table Created plugin from /vagrant/pt-plugin-tungsten_replicator.pl. Found 1 slaves: node3 Will check slave lag on: node3 PLUGIN get_slave_lag: Using Tungsten Replicator to check replication lag Operation, tries, wait: copy_rows, 10, 0.25 create_triggers, 10, 1 drop_triggers, 10, 1 swap_tables, 10, 1 update_foreign_keys, 10, 1 Altering `app`.`large_table`... Creating new table... Created new table app._large_table_new OK. Waiting forever for new table `app`.`_large_table_new` to replicate to node3... Altering new table... Altered `app`.`_large_table_new` OK. 2014-07-03T13:02:33 Creating triggers... 2014-07-03T13:02:33 Created triggers OK. 2014-07-03T13:02:33 Copying approximately 8774670 rows... Copying `app`.`large_table`: 26% 01:21 remain Copying `app`.`large_table`: 50% 00:59 remain Replica lag is 12 seconds on node3. Waiting. Replica lag is 12 seconds on node3. Waiting. Copying `app`.`large_table`: 53% 02:22 remain Copying `app`.`large_table`: 82% 00:39 remain 2014-07-03T13:06:06 Copied rows OK. 2014-07-03T13:06:06 Swapping tables... 2014-07-03T13:06:06 Swapped original and new tables OK. 2014-07-03T13:06:06 Dropping old table... 2014-07-03T13:06:06 Dropped old table `app`.`_large_table_old` OK. 2014-07-03T13:06:06 Dropping triggers... 2014-07-03T13:06:06 Dropped triggers OK. Successfully altered `app`.`large_table`.

As you can see, there was some slave lag during the schema changes.

Binlog Format & pt-online-schema-change

pt-online-schema-change uses triggers in order to do the schema changes. Tungsten Replicator has some limitations with different binary log formats and triggers (https://code.google.com/p/tungsten-replicator/wiki/TRCAdministration#Triggers_and_Row_Replication).

In Tungsten Replicator, ROW based binlog events will be converted to SQL statements, which causes triggers to be executed on the slave as well, this does not happen with traditional replication.

Different settings:

  • STATEMENT based binary logging works by default
  • ROW based binary logging works, the plugin recreates the triggers and uses the technique documented at https://code.google.com/p/tungsten-replicator/wiki/TRCAdministration#Triggers_and_Row_Replication
  • MIXED binary logging does not work, as there is currently no way to determine whether an event was written to the binary log in statement or row based format, so it’s not possible to know if triggers should be run or not. The tool will exit and and error will be returned:
    Error creating --plugin: The master it's binlog_format=MIXED, pt-online-schema change does not work well with Tungsten Replicator and binlog_format=MIXED.
Be Warned

The binlog_format can be overriden on a per session basis, make sure that this does NOT happen when using pt-online-schema-change.


The documentation on the Continuent website already mentions how you can compare data with pt-table-checksum.

I believe this plugin is a good addition to it. The features in Percona Toolkit that monitor replication lag can now be used with Tungsten Replicator and therefore gives you control on how much replication lag is tolerated while using those tools.

The post Check for MySQL slave lag with Percona Toolkit plugin for Tungsten Replicator appeared first on MySQL Performance Blog.

TIMESTAMP Columns, Amazon RDS 5.6, and You

Latest MySQL Performance Blog posts - July 8, 2014 - 7:18am

This comes from an issue that I worked on recently, wherein a customer reported that their application was working fine under stock MySQL 5.6 but producing erroneous results when they tried running it on Amazon RDS 5.6. They had a table which, on the working server, contained two TIMESTAMP columns, one which defaulted to CURRENT_TIMESTAMP and the other which defaulted to ’0000-00-00 00:00:00′, like so:


However, under Amazon RDS, the same table looked like this:


They mentioned that their schema contains TIMESTAMP column definitions without any modifiers for nullability or default values. In other words, they were doing something like this:


It’s a known issue (or change, or difference, whatever we choose to call it) that MySQL is deprecating defaults for TIMESTAMP columns that don’t have any nullability or default-value specifiers; this is covered in the 5.6 documentation. However, the docs also mention that the default value for this setting is OFF – i.e., if you create a table with TIMESTAMP columns without any defaults, it will fill them in for you, similarly to what I’ve described above.

As it turns out, the RDS default for this setting is ON, hence the “NULL DEFAULT NULL” modifiers when creating the table under RDS. We changed the parameter group, restarted the instance (note that this variable is NOT dynamic), and their schema-creation script created the tables in the proper way.

So, what have we learned here?
  • Migrating from standalone MySQL to Amazon RDS sometimes has hidden pitfalls that aren’t always readily apparent. Many times it will “just work” – but sometimes it doesn’t. Percona is, of course, happy to help review your configurations and assist with any Amazon RDS implementation plans you might have.
  • When in doubt, fully-specify your TIMESTAMP columns. If you want them NOT NULL, say so. If you want a default value or an on-updated value, set it. Even the configuration variable explicit_defaults_for_timestamp is deprecated and slated for removal in a future version, so eventually it won’t be possible to get the old pre-5.6 behavior at all.

The post TIMESTAMP Columns, Amazon RDS 5.6, and You appeared first on MySQL Performance Blog.

Looking out for max values in integer-based columns in MySQL

Latest MySQL Performance Blog posts - July 7, 2014 - 3:00am

Yay! My first blog post! As long as at least 1 person finds it useful, I’ve done my job.

Recently, one of my long-term clients was noticing that while their INSERTs were succeeding, a particular column counter was not incrementing. A quick investigation determined the column was of type int(11) and they had reached the maximum value of 2147483647. We fixed this by using pt-online-schema-change to change the column to int(10) unsigned, thus allowing values up to 4294967295.

My client was now concerned about all his other integer-based columns and wanted me to check them all. So I wrote a quick-n-dirty script in Go to check all integer-based columns on their current value compared to the maximum allowed for that column type.

You can find the full source code in my git repo.

Here’s a quick overview; the code is pretty simple.

First we connect to MySQL and verify the connection:

db, err := sql.Open("mysql", fmt.Sprintf("%s:%s@tcp(%s:3306)/%s", mysqlUn, mysqlPw, hostToCheck, dbToCheck)) if err != nil { fmt.Printf("Error connecting to MySQL on '%s': n", hostToCheck, err) db.Close() os.Exit(1) } // Check connection is alive. err = db.Ping() if err != nil { fmt.Printf("Unable to ping mysql at '%s': %sn", hostToCheck, err) db.Close() os.Exit(1) }

Next, we query the information_schema.columns table for the names of all integer-based columns and calculate what their maximum value can be (credit for the clever SQL goes to Peter Boros).

// Construct our base i_s query var tableExtraSql string if tableToCheck != "" { tableExtraSql = fmt.Sprintf("AND TABLE_NAME = '%s'", tableToCheck) } baseSql := fmt.Sprintf(` SELECT TABLE_NAME, COLUMN_NAME, COLUMN_TYPE, (CASE DATA_TYPE WHEN 'tinyint' THEN 255 WHEN 'smallint' THEN 65535 WHEN 'mediumint' THEN 16777215 WHEN 'int' THEN 4294967295 WHEN 'bigint' THEN 18446744073709551615 END >> IF(LOCATE('unsigned', COLUMN_TYPE) > 0, 0, 1)) AS MAX_VALUE FROM information_schema.columns WHERE TABLE_SCHEMA = '%s' %s AND DATA_TYPE IN ('tinyint', 'int', 'mediumint', 'bigint')`, dbToCheck, tableExtraSql)

Now that we have this list of columns to check, we simply loop over this result set, get the MAX() of each column and print a pretty report.

// Loop over rows received from i_s query above. for columnsToCheck.Next() { err := columnsToCheck.Scan(&tableName, &columnName, &columnType, &maxValue) if err != nil { log.Fatal("Scanning Row Error: ", err) } // Check this column query := fmt.Sprintf("SELECT MAX(%s), ROUND((MAX(%s)/%d)*100, 2) AS ratio FROM %s.%s", columnName, columnName, maxValue, dbToCheck, tableName) err = db.QueryRow(query).Scan(&currentValue, &ratio) if err != nil { fmt.Printf("Couldn't get MAX(%s.%s): %sn", tableName, columnName, err) fmt.Println("SQL: ", query) continue } // Print report if ratio.Valid && ratio.Float64 >= float64(reportPct) { fmt.Printf("'%s'.'%s' - Type: '%s' - ", tableName, columnName, columnType) fmt.Printf("ColumMax: '%d'", maxValue) fmt.Printf(" - CurVal: '%d'", currentValue.Int64) fmt.Printf(" - FillRatio: '%.2f'n", ratio.Float64) } }

There are more options to the app that allow you to silence some of the verbosity and to only print report lines where the value-to-max ratio is > a user-defined threshold. If you have frequently changing schemas, this should allow you to cron the app and only receive email reports when there is a potential problem. Otherwise, this tool could be useful to run once a month/quarter, just to verify things are in good standing.

Like I said before, hopefully this helps at least 1 person catch a potential problem sooner rather than later.

The post Looking out for max values in integer-based columns in MySQL appeared first on MySQL Performance Blog.

Failover with the MySQL Utilities: Part 2 – mysqlfailover

Latest MySQL Performance Blog posts - July 3, 2014 - 12:00am

In the previous post of this series we saw how you could use mysqlrpladmin to perform manual failover/switchover when GTID replication is enabled in MySQL 5.6. Now we will review mysqlfailover (version 1.4.3), another tool from the MySQL Utilities that can be used for automatic failover.

  • mysqlfailover can perform automatic failover if MySQL 5.6′s GTID-replication is enabled.
  • All slaves must use --master-info-repository=TABLE.
  • The monitoring node is a single point of failure: don’t forget to monitor it!
  • Detection of errant transactions works well, but you have to use the --pedantic option to make sure failover will never happen if there is an errant transaction.
  • There are a few limitations such as the inability to only fail over once, or excessive CPU utilization, but they are probably not showstoppers for most setups.

We will use the same setup as last time: one master and two slaves, all using GTID replication. We can see the topology using mysqlfailover with the health command:

$ mysqlfailover --master=root@localhost:13001 --discover-slaves-login=root health [...] MySQL Replication Failover Utility Failover Mode = auto Next Interval = Tue Jul 1 10:01:22 2014 Master Information ------------------ Binary Log File Position Binlog_Do_DB Binlog_Ignore_DB mysql-bin.000003 700 GTID Executed Set a9a396c6-00f3-11e4-8e66-9cebe8067a3f:1-3 Replication Health Status +------------+--------+---------+--------+------------+---------+ | host | port | role | state | gtid_mode | health | +------------+--------+---------+--------+------------+---------+ | localhost | 13001 | MASTER | UP | ON | OK | | localhost | 13002 | SLAVE | UP | ON | OK | | localhost | 13003 | SLAVE | UP | ON | OK | +------------+--------+---------+--------+------------+---------+

Note that --master-info-repository=TABLE needs to be configured on all slaves or the tool will exit with an error message:

2014-07-01 10:18:55 AM CRITICAL Failover requires --master-info-repository=TABLE for all slaves. ERROR: Failover requires --master-info-repository=TABLE for all slaves.


You can use 2 commands to trigger automatic failover:

  • auto: the tool tries to find a candidate in the list of servers specified with --candidates, and if no good server is found in this list, it will look at the other slaves to see if one can be a good candidate. This is the default command
  • elect: same as auto, but if no good candidate is found in the list of candidates, other slaves will not be checked and the tool will exit with an error.

Let’s start the tool with auto:

$ mysqlfailover --master=root@localhost:13001 --discover-slaves-login=root auto

The monitoring console is visible and is refreshed every --interval seconds (default: 15). Its output is similar to what you get when using the health command.

Then let’s kill -9 the master to see what happens once the master is detected as down:

Failed to reconnect to the master after 3 attemps. Failover starting in 'auto' mode... # Candidate slave localhost:13002 will become the new master. # Checking slaves status (before failover). # Preparing candidate for failover. # Creating replication user if it does not exist. # Stopping slaves. # Performing STOP on all slaves. # Switching slaves to new master. # Disconnecting new master as slave. # Starting slaves. # Performing START on all slaves. # Checking slaves for errors. # Failover complete. # Discovering slaves for master at localhost:13002 Failover console will restart in 5 seconds. MySQL Replication Failover Utility Failover Mode = auto Next Interval = Tue Jul 1 10:59:47 2014 Master Information ------------------ Binary Log File Position Binlog_Do_DB Binlog_Ignore_DB mysql-bin.000005 191 GTID Executed Set a9a396c6-00f3-11e4-8e66-9cebe8067a3f:1-3 Replication Health Status +------------+--------+---------+--------+------------+---------+ | host | port | role | state | gtid_mode | health | +------------+--------+---------+--------+------------+---------+ | localhost | 13002 | MASTER | UP | ON | OK | | localhost | 13003 | SLAVE | UP | ON | OK | +------------+--------+---------+--------+------------+---------+

Looks good! The tool is then ready to fail over to another slave if the new master becomes unavailable.

You can also run custom scripts at several points of execution with the --exec-before, --exec-after, --exec-fail-check, --exec-post-failover options.

However it would be great to have a --failover-and-exit option to avoid flapping: the tool would detect master failure, promote one of the slaves, reconfigure replication and then exit (this is what MHA does for instance).

Tool registration

When the tool is started, it registers itself on the master by writing a few things in the specific table:

mysql> SELECT * FROM mysql.failover_console; +-----------+-------+ | host | port | +-----------+-------+ | localhost | 13001 | +-----------+-------+

This is nice as it avoids that you start several instances of mysqlfailover to monitor the same master. If we try, this is what we get:

$ mysqlfailover --master=root@localhost:13001 --discover-slaves-login=root auto [...] Multiple instances of failover console found for master localhost:13001. If this is an error, restart the console with --force. Failover mode changed to 'FAIL' for this instance. Console will start in 10 seconds..........starting Console.

With the fail command, mysqlfailover will monitor replication health and exit in the case of a master failure, without actually performing failover.

Running in the background

In all previous examples, mysqlfailover was running in the foreground. This is very good for demo, but in a production environment you are likely to prefer running it in the background. This can be done with the --daemon option:

$ mysqlfailover --master=root@localhost:13001 --discover-slaves-login=root auto --daemon=start --log=/var/log/mysqlfailover.log

and it can be stopped with:

$ mysqlfailover --daemon=stop

Errant transactions

If we create an errant transaction on one of the slaves, it will be detected:

MySQL Replication Failover Utility Failover Mode = auto Next Interval = Tue Jul 1 16:29:44 2014 [...] WARNING: Errant transaction(s) found on slave(s). Replication Health Status [...]

However this does not prevent failover from occurring! You have to use --pedantic:

$ mysqlfailover --master=root@localhost:13001 --discover-slaves-login=root --pedantic auto [...] # WARNING: Errant transaction(s) found on slave(s). # - For slave 'localhost@13003': db906eee-012d-11e4-8fe1-9cebe8067a3f:1 2014-07-01 16:44:49 PM CRITICAL Errant transaction(s) found on slave(s). Note: If you want to ignore this issue, please do not use the --pedantic option. ERROR: Errant transaction(s) found on slave(s). Note: If you want to ignore this issue, please do not use the --pedantic option.

  • Like for mysqlrpladmin, the slave election process is not very sophisticated and it cannot be tuned.
  • The server on which mysqlfailover is running is a single point of failure.
  • Excessive CPU utilization: once it is running, mysqlfailover hogs one core. This is quite surprising.

mysqlfailover is a good tool to automate failover in clusters using GTID replication. It is flexible and looks reliable. Its main drawback is that there is no easy way to make it highly available itself: if mysqlfailover crashes, you will have to manually restart it.

The post Failover with the MySQL Utilities: Part 2 – mysqlfailover appeared first on MySQL Performance Blog.

Percona Server 5.5.38-35.2 is now available

Latest MySQL Performance Blog posts - July 2, 2014 - 7:36am

Percona is glad to announce the release of Percona Server 5.5.38-35.2 on July 2, 2014 (Downloads are available here and from the Percona Software Repositories). Based on MySQL 5.5.38, including all the bug fixes in it, Percona Server 5.5.38-35.2 is now the current stable release in the 5.5 series. All of Percona‘s software is open-source and free, all the details of the release can be found in the 5.5.38-35.2 milestone at Launchpad.

Bugs Fixed:

Release notes for Percona Server 5.5.38-35.2 are available in our online documentation. Bugs can be reported on the launchpad bug tracker.

The post Percona Server 5.5.38-35.2 is now available appeared first on MySQL Performance Blog.

Using MySQL triggers and views in Amazon RDS

Latest MySQL Performance Blog posts - July 2, 2014 - 3:00am

I recently had an opportunity to migrate a customer from a physical server into Amazon’s RDS environment. In this particular case the customers’ platform makes extensive use of MySQL triggers and views.  I came across two significant issues that prevented me from following Amazon’s documentation, which basically states “use mysqldump” but doesn’t call out a specific method of dealing with MySQL triggers and views.

Amazon Relational Database Service (Amazon RDS) is a great platform if you’re looking for complete hands-off management of your MySQL environment, but comes at a cost in the area of flexibility, i.e. you don’t have SUPER privilege and this brings up additional challenges.

  1. You need to ensure you set log_bin_trust_function_creators=1 ( by default this is off, 0).
  2. You need to clean up your mysqldump syntax.

#1 is easy, you simply make a configuration change within the Amazon RDS GUI on the node’s Parameter Group to set log_bin_trust_function_creators=1 and then a restart of your Amazon RDS node.  The restart is required since without the SUPER privilege you lose access to changing DYNAMIC variables on the fly.
#2 is a little more complex.  If you go with vanilla mysqldump (from say a 5.5 mysqldump binary) on a schema that has triggers and views, you will see error 1227, something like this:

ERROR 1227 (42000) at line 27311: Access denied; you need (at least one of) the SUPER privilege(s) for this operation

You’re seeing this message because MySQL in Amazon RDS doesn’t provide the SUPER privilege, and thus you cannot set up a trigger or view to run as a different user — only a user with SUPER can do that.

mysqldump will generate syntax for a trigger like this:

DELIMITER ;; /*!50003 CREATE*/ /*!50017 DEFINER=`root`@`%`*/ /*!50003 TRIGGER `after_insert_lead` AFTER INSERT ON `leads` FOR EACH ROW BEGIN UPDATE analytics.mapping SET id_lead = NEW.id_lead WHERE mc_email = NEW.email; END */;; DELIMITER ;

and for a view like this:

/*!50001 CREATE ALGORITHM=UNDEFINED */ /*!50013 DEFINER=`web`@`%` SQL SECURITY DEFINER */ /*!50001 VIEW `admin_user_view` AS SELECT ...

The problem is in the “DEFINER” lines.

Here’s one method that worked for me:

  1. Identify all the DEFINER lines in your schema. I found it helpful to dump out a –no-data and then weed through that to get a unique list of the DEFINER lines
  2. Create a sed line for each unique DEFINER line (see my example in a moment)
  3. Include this sed line in your dump/load script

Here’s what my sed matches looked like:

sed -e 's//*!50017 DEFINER=`root`@`localhost`*///' -e 's//*!50017 DEFINER=`root`@`%`*///' -e 's//*!50017 DEFINER=`web`@`%`*///' -e 's//*!50017 DEFINER=`cron`@`%`*///' -e 's//*!50013 DEFINER=`cron`@`%` SQL SECURITY DEFINER *///' -e 's//*!50013 DEFINER=`root`@`localhost` SQL SECURITY DEFINER *///' -e 's//*!50013 DEFINER=`root`@`%` SQL SECURITY DEFINER *///' -e 's//*!50013 DEFINER=`web`@`%` SQL SECURITY DEFINER *///'

Note: the example above won’t directly work due to WordPress “helpfully” stripping my text… you need to escape the forward slashes and asterisks.

A big caveat: this method is akin to a brute force method of getting your data into Amazon RDS — you’ve lost the elegance & security of running your triggers and views as separate defined users within the database — they are all now going to run as the user you loaded them in as. If this is a show-stopper for you, contact Percona and I’d be happy to take on your case and develop a more comprehensive solution. 

Now all that’s left is to integrate this into your dump flow.  Something like this should work:

mysqldump --host=source | sed -e ... lots of lines | mysql --host=destination

I hope this helps someone!

The post Using MySQL triggers and views in Amazon RDS appeared first on MySQL Performance Blog.

Percona Server 5.6.19-67.0 with TokuDB (GA) now available

Latest MySQL Performance Blog posts - July 1, 2014 - 8:15am

Percona is glad to announce the release of Percona Server 5.6.19-67.0 on July 1, 2014. Download the latest version from the Percona web site or from the Percona Software Repositories.

Based on MySQL 5.6.19, including all the bug fixes in it, Percona Server 5.6.19-67.0 is the current GA release in the Percona Server 5.6 series. All of Percona’s software is open-source and free. Complete details of this release can be found in the 5.6.19-67.0 milestone on Launchpad.

New Features:

  • Percona has merged a contributed patch by Kostja Osipov implementing the Multiple user level locks per connection feature. This feature fixes the upstream bugs: #1118 and #67806.
  • TokuDB storage engine support is now considered general availability (GA) quality. The TokuDB storage engine from Tokutek improves scalability and the operational efficiency of MySQL with faster performance and increased compression. It is available as a separate package and can be installed along with the Percona Server by following the instructions in the release documentation.
  • Percona Server now supports the MTR --valgrind option for a server that is either statically or dynamically linked with jemalloc.

Bugs Fixed:

  • The libperconaserverclient18.1 package was missing the library files. Bug fixed #1329911.
  • Percona Server introduced a regression in 5.6.17-66.0 when support for TokuDB storage engine was initially introduced. This regression caused spurious “wrong table structure” errors for PERFORMANCE_SCHEMA tables. Bug fixed #1329772.
  • Race condition in group commit code could lead to a race condition in PFS instrumentation code resulting in a server crash. Bug fixed #1309026 (upstream #72681).

Other bugs fixed: #1326348 and #1167486.

NOTE: There was no Percona Server 5.6.18 release because there was no MySQL Community Server 5.6.18 release. That version number was used for a MySQL Enterprise Edition release to address the OpenSSL “Heartbleed” issue.

Release notes for Percona Server 5.6.19-67.0 are available in the online documentation. Please report any bugs on the launchpad bug tracker.

The post Percona Server 5.6.19-67.0 with TokuDB (GA) now available appeared first on MySQL Performance Blog.

How to avoid even more of the common (but deadly) MySQL development mistakes

Latest MySQL Performance Blog posts - June 30, 2014 - 9:08am

On July 16 I’ll be presenting my next webinar focusing on common mistakes committed by MySQL users.

How to Avoid Even More of the Common (but Deadly) MySQL Development Mistakes

“Why can’t I just save my data to a file?”

Using an SQL database seems so complex to get right, and for good reason. The variety of data-driven applications is practically limitless, and as project requirements change, we find ourselves taking shortcuts and adopting bad habits. But there are proven methods to understanding how to develop and manage data in a scalable and reliable way. This talk shows you some of these methods, including:

  • How to optimize a database application with partitioning and sharding.
  • How to avoid the secret security vulnerability that you’re probably guilty of.
  • How to use optimizer hints effectively.

At the end of this webinar, you’ll be more productive and confident as you develop database-driven applications.

Please register for this webinar and join me on July 16!

You might also like to view recordings of my past “deadly mistakes” webinars:

The post How to avoid even more of the common (but deadly) MySQL development mistakes appeared first on MySQL Performance Blog.

Failover with the MySQL Utilities – Part 1: mysqlrpladmin

Latest MySQL Performance Blog posts - June 27, 2014 - 8:41am

MySQL Utilities are a set of tools provided by Oracle to perform many kinds of administrative tasks. When GTID-replication is enabled, 2 tools can be used for slave promotion: mysqlrpladmin and mysqlfailover. We will review mysqlrpladmin (version 1.4.3) in this post.

  • mysqlrpladmin can perform manual failover/switchover when GTID-replication is enabled.
  • You need to have your servers configured with --master-info-repository = TABLE or to add the --rpl-user option for the tool to work properly.
  • The check for errant transactions is failing in the current GA version (1.4.3) so be extra careful when using it or watch bug #73110 to see when a fix is committed.
  • There are some limitations, for instance the inability to pre-configure the list of slaves in a configuration file or the inability to check that the tool will work well without actually doing a failover or switchover.
Failover vs switchover

mysqlrpladmin can help you promote a slave to be the new master when the master goes down and then automate replication reconfiguration after this slave promotion. There are 2 separate scenarios: unplanned promotion (failover) and planned promotion (switchover). Beyond the words, it has implications on the way you have to execute the tool.

Setup for this test

To test the tool, our setup will be a master with 2 slaves, all using GTID replication. mysqlrpladmin can show us the current replication topology with the health command:

$ mysqlrpladmin --master=root@localhost:13001 --discover-slaves-login=root health # Discovering slaves for master at localhost:13001 # Discovering slave at localhost:13002 # Found slave: localhost:13002 # Discovering slave at localhost:13003 # Found slave: localhost:13003 # Checking privileges. # # Replication Topology Health: +------------+--------+---------+--------+------------+---------+ | host | port | role | state | gtid_mode | health | +------------+--------+---------+--------+------------+---------+ | localhost | 13001 | MASTER | UP | ON | OK | | localhost | 13002 | SLAVE | UP | ON | OK | | localhost | 13003 | SLAVE | UP | ON | OK | +------------+--------+---------+--------+------------+---------+ # ...done.

As you can see, we have to specify how to connect to the master (no surprise) but instead of listing all the slaves, we can let the tool discover them.

Simple failover scenario

What will the tool do when performing failover? Essentially we will give it the list of slaves and the list of candidates and it will:

  • Run a few sanity checks
  • Elect a candidate to be the new master
  • Make the candidate as up-to-date as possible by making it a slave of all the other slaves
  • Configure replication on all the other slaves to make them replicate from the new master

After killing -9 the master, let’s try failover:

$ mysqlrpladmin --slaves=root:@localhost:13002,root:@localhost:13003 --candidates=root@localhost:13002 failover

This time, the master is down so the tool has no way to automatically discover the slaves. Thus we have to specify them with the --slaves option.

However we get an error:

# Checking privileges. # Checking privileges on candidates. ERROR: You must specify either the --rpl-user or set all slaves to use --master-info-repository=TABLE.

The error message is clear, but it would have been nice to have such details when running the health command (maybe a warning instead of an error). That would allow you to check beforehand that the tool can run smoothly rather than to discover in the middle of an emergency that you have to look at the documentation to find which option is missing.

Let’s choose to specify the replication user:

$ mysqlrpladmin --slaves=root:@localhost:13002,root:@localhost:13003 --candidates=root@localhost:13002 --rpl-user=repl:repl failover # Checking privileges. # Checking privileges on candidates. # Performing failover. # Candidate slave localhost:13002 will become the new master. # Checking slaves status (before failover). # Preparing candidate for failover. # Creating replication user if it does not exist. # Stopping slaves. # Performing STOP on all slaves. # Switching slaves to new master. # Disconnecting new master as slave. # Starting slaves. # Performing START on all slaves. # Checking slaves for errors. # Failover complete. # # Replication Topology Health: +------------+--------+---------+--------+------------+---------+ | host | port | role | state | gtid_mode | health | +------------+--------+---------+--------+------------+---------+ | localhost | 13002 | MASTER | UP | ON | OK | | localhost | 13003 | SLAVE | UP | ON | OK | +------------+--------+---------+--------+------------+---------+ # ...done.

Simple switchover scenario

Let’s now restart the old master and configure it as a slave of the new master (by the way, this can be done with mysqlreplicate, another tool from the MySQL Utilities). If we want to promote the old master, we can run:

$ mysqlrpladmin --master=root@localhost:13002 --new-master=root@localhost:13001 --discover-slaves-login=root --demote-master --rpl-user=repl:repl --quiet switchover # Discovering slave at localhost:13001 # Found slave: localhost:13001 # Discovering slave at localhost:13003 # Found slave: localhost:13003 +------------+--------+---------+--------+------------+---------+ | host | port | role | state | gtid_mode | health | +------------+--------+---------+--------+------------+---------+ | localhost | 13001 | MASTER | UP | ON | OK | | localhost | 13002 | SLAVE | UP | ON | OK | | localhost | 13003 | SLAVE | UP | ON | OK | +------------+--------+---------+--------+------------+---------+

Notice that the master is available in this case so we can use the discover-slaves-login option. Also notice that we can tune the verbosity of the tool by using --quiet or --verbose or even log the output in a file with --log.

We also used --demote-master to make the old master a slave of the new master. Without this option, the old master will be isolated from the other nodes.

Extension points

In general doing switchover/failover at the database level is one thing but informing the other components of the application that something has changed is most often necessary for the application to keep on working correctly.

This is where the extension points are handy: you can execute a script before switchover/failover with --exec-before and after switchover/failover with --exec-after.

For instance with these simple scripts:

# cat /usr/local/bin/check_before #!/bin/bash /usr/local/mysql5619/bin/mysql -uroot -S /tmp/node1.sock -Ee 'SHOW SLAVE STATUS' > /tmp/before # cat /usr/local/bin/check_after #!/bin/bash /usr/local/mysql5619/bin/mysql -uroot -S /tmp/node1.sock -Ee 'SHOW SLAVE STATUS' > /tmp/after

We can execute:

$ mysqlrpladmin --master=root@localhost:13001 --new-master=root@localhost:13002 --discover-slaves-login=root --demote-master --rpl-user=repl:repl --quiet --exec-before=/usr/local/bin/check_before --exec-after=/usr/local/bin/check_after switchover

And looking the /tmp/before and /tmp/after, we can see that our scripts have been executed:

# cat /tmp/before # cat /tmp/after *************************** 1. row *************************** Slave_IO_State: Queueing master event to the relay log Master_Host: localhost Master_User: repl Master_Port: 13002 [...]

If the external script does not seem to work, using –verbose can be useful to diagnose the issue.

What about errant transactions?

We already mentioned that errant transactions can create lots of issues when a new master is promoted in a cluster running GTIDs. So the question is: how mysqlrpladmin behaves when there is an errant transaction?

Let’s create an errant transaction:

# On localhost:13003 mysql> CREATE DATABASE test2; mysql> FLUSH LOGS; mysql> SHOW BINARY LOGS; +------------------+-----------+ | Log_name | File_size | +------------------+-----------+ | mysql-bin.000001 | 69309 | | mysql-bin.000002 | 1237667 | | mysql-bin.000003 | 617 | | mysql-bin.000004 | 231 | +------------------+-----------+ mysql> PURGE BINARY LOGS TO 'mysql-bin.000004';

and let’s try to promote localhost:13003 as the new master:

$ mysqlrpladmin --master=root@localhost:13001 --new-master=root@localhost:13003 --discover-slaves-login=root --demote-master --rpl-user=repl:repl --quiet switchover [...] +------------+--------+---------+--------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | host | port | role | state | gtid_mode | health | +------------+--------+---------+--------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | localhost | 13003 | MASTER | UP | ON | OK | | localhost | 13001 | SLAVE | UP | ON | IO thread is not running., Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.', Slave has 1 transactions behind master. | | localhost | 13002 | SLAVE | UP | ON | IO thread is not running., Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.', Slave has 1 transactions behind master. | +------------+--------+---------+--------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Oops! Although it is suggested by the documentation, the tool does not check errant transactions. This is a major issue as you cannot run failover/switchover reliably with GTID replication if errant transactions are not correctly detected.

The documentation suggests errant transactions should be checked and a quick look at the code confirms that, but it does not work! So it has been reported.

Some limitations

Apart from the missing errant transaction check, I also noticed a few limitations:

  • You cannot use a configuration file listing all the slaves. This becomes boring once you have a large amount of slaves. In such a case, you should write a wrapper script around mysqlrpladmin to generate the right command for you
  • The slave election process is either automatic or it relies on the order of the servers given in the --candidates option. This is not very sophisticated.
  • It would be useful to have a –dry-run mode which would validate that everything is configured correctly but without actually failing/switching over. This is something MHA does for instance.

mysqlrpladmin is a very good tool to help you perform manual failover/switchover in a cluster using GTID replication. The main caveat at this point is the failing check for errant transactions, which requires a lot of care before executing the tool.

The post Failover with the MySQL Utilities – Part 1: mysqlrpladmin appeared first on MySQL Performance Blog.

Percona Server with TokuDB (beta): Installation, configuration

Latest MySQL Performance Blog posts - June 26, 2014 - 9:18am

My previous post was an introduction to the TokuDB storage engine and aimed at explaining the basics of its design and how it differentiates from InnoDB/XtraDB. This post is all about motivating you to give it a try and have a look for yourself. Percona Server is not officially supporting TokuDB as of today, though the guys in the development team are working hard on this and the first GA release of Percona Server with TokuDB is looming on the horizon. However, there’s a beta version available now. For the installation tests in this post I’ve used the latest version of Percona Server alongside the accompanying TokuDB complement, which was published last week.

Installing Percona Server with TokuDB on a sandbox

One of the tools Percona Support Engineers really love is Giuseppe Maxia’s MySQL Sandbox. It allows us to setup a sandbox running a MySQL instance of our choice and makes executing multiple ones for testing purposes very easily. Whenever a customer reaches us with a problem happening on a particular version of MySQL or Percona Server that we can reproduce, we quickly spin off a new sandbox and test it ourselves, so it’s very handy. I’ll use one here to explore this beta version of Percona Server with TokuDB but if you prefer you can install it the regular way using a package from our apt experimental or yum testing repositories.

We start by downloading the tarballs from here: TokuDB’s plugin has been packaged in its own tarball, so there are two to download. Once you get them let’s decompress both and create a unified working directory, compressing it again to create a single tarball we’ll use as source to create our sandbox:

[nando@test01 ~]# tar zxvf Percona-Server-5.6.17-rel66.0-608.Linux.x86_64.tar.gz [nando@test01 ~]# tar zxvf Percona-Server-5.6.17-rel66.0-608.TokuDB.Linux.x86_64.tar.gz [nando@test01 ~]# tar cfa Percona-Server-5.6.17-rel66.0-608.Linux.x86_64.tar.gz Percona-Server-5.6.17-rel66.0-608.Linux.x86_64/

Before going ahead, verify if you have transparent huge pages enabled as TokuDB won’t run if it is set. See this documentation page for explanation on what this is and how to disable it on Ubuntu. In my CentOS test server it was defined in a slightly different place and I’ve done the following to temporarily disable it:

[nando@test01]# echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled [nando@test01]# echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag

We’re now ready to create our sandbox. The following command should be enough (I’ve chosen to run Percona Server on port 5617, you can use any other available one):

[nando@test01 ~]# make_sandbox Percona-Server-5.6.17-rel66.0-608.Linux.x86_64.tar.gz -- --sandbox_directory=tokudb --sandbox_port=5617

If the creation process goes well you will see something like the following at the end:

.... sandbox server started Your sandbox server was installed in $HOME/sandboxes/tokudb

You should now be able to access the MySQL console on the sandbox with the default credentials; if you cannot, verify the log-in $HOME/sandboxes/tokudb/data/msandbox.err:

[nando@test01 ~]# mysql --socket=/tmp/mysql_sandbox5617.sock -umsandbox -pmsandbox

Alternatively, you can make use of the “use” script located inside the sandbox directory, which employs the same credentials (configured in the client section of the configuration file my.sandbox.cnf):

[nando@test01 ~]# cd sandboxes/tokudb/ [nando@test01 tokudb]# ./use

First thing to check is if TokuDB is being listed as an available storage engine:

mysql> SHOW ENGINES; +--------------------+---------+----------------------------------------------------------------------------+--------------+------+------------+ | Engine | Support | Comment | Transactions | XA | Savepoints | +--------------------+---------+----------------------------------------------------------------------------+--------------+------+------------+ | (...) | (...) | (...) | (...) | (...)| (...) | | TokuDB | YES | Tokutek TokuDB Storage Engine with Fractal Tree(tm) Technology | YES | YES | YES | | InnoDB | DEFAULT | Percona-XtraDB, Supports transactions, row-level locking, and foreign keys | YES | YES | YES | | (...) | (...) | (...) | NO | (...)| (...) | +--------------------+---------+----------------------------------------------------------------------------+--------------+------+------------+

If that’s not the case, you may need to load the plugins manually – I had to do so in my sandbox; you may not need if you’re installing it from a package in a fresh setup:

mysql> INSTALL PLUGIN tokudb SONAME 'ha_tokudb.so';

TokuDB should now figure in the list of supported ENGINES but you still need to activate the related plugins:

mysql> INSTALL PLUGIN tokudb_file_map SONAME 'ha_tokudb.so'; mysql> INSTALL PLUGIN tokudb_fractal_tree_info SONAME 'ha_tokudb.so'; mysql> INSTALL PLUGIN tokudb_fractal_tree_block_map SONAME 'ha_tokudb.so'; mysql> INSTALL PLUGIN tokudb_trx SONAME 'ha_tokudb.so'; mysql> INSTALL PLUGIN tokudb_locks SONAME 'ha_tokudb.so'; mysql> INSTALL PLUGIN tokudb_lock_waits SONAME 'ha_tokudb.so';

Please note the INSTALL PLUGIN action results in permanent changes and thus is required only once. No modifications to MySQL’s configuration file are required to have those plugins load in subsequent server restarts.

Now you should see not only the main TokuDB plugin but also the add-ons to the INFORMATION SCHEMA:

mysql> SHOW PLUGINS; +-------------------------------+----------+--------------------+--------------+---------+ | Name | Status | Type | Library | License | +-------------------------------+----------+--------------------+--------------+---------+ | (...) | (...) | (...) | (...) | (...) | | TokuDB | ACTIVE | STORAGE ENGINE | ha_tokudb.so | GPL | | TokuDB_trx | ACTIVE | INFORMATION SCHEMA | ha_tokudb.so | GPL | | TokuDB_locks | ACTIVE | INFORMATION SCHEMA | ha_tokudb.so | GPL | | TokuDB_lock_waits | ACTIVE | INFORMATION SCHEMA | ha_tokudb.so | GPL | | TokuDB_file_map | ACTIVE | INFORMATION SCHEMA | ha_tokudb.so | GPL | | TokuDB_fractal_tree_info | ACTIVE | INFORMATION SCHEMA | ha_tokudb.so | GPL | | TokuDB_fractal_tree_block_map | ACTIVE | INFORMATION SCHEMA | ha_tokudb.so | GPL | +-------------------------------+----------+--------------------+--------------+---------+

We are now ready to create our first TokuDB table – the only different thing to do here is to specify TokuDB as the storage engine to use:

mysql> CREATE TABLE test.Numbers (id INT PRIMARY KEY, number VARCHAR(20)) ENGINE=TokuDB;

Note some unfamiliar files lying in the datadir; the details surrounding those is certainly good material for future posts:

[nando@test01]# ls ~/sandboxes/tokudb/data auto.cnf _test_Numbers_main_3_2_19.tokudb ibdata1 _test_Numbers_status_3_1_19.tokudb ib_logfile0 tokudb.directory ib_logfile1 tokudb.environment log000000000000.tokulog25 __tokudb_lock_dont_delete_me_data msandbox.err __tokudb_lock_dont_delete_me_environment mysql __tokudb_lock_dont_delete_me_logs mysql_sandbox5617.pid __tokudb_lock_dont_delete_me_recovery performance_schema __tokudb_lock_dont_delete_me_temp tc.log tokudb.rollback test

Configuration: what’s really important

As noted by Vadim long ago, “Tuning of TokuDB is much easier than InnoDB, there’re only a few parameters to change, and actually out-of-box things running pretty well“:

mysql> show variables like 'tokudb_%'; +---------------------------------+------------------+ | Variable_name | Value | +---------------------------------+------------------+ | tokudb_alter_print_error | OFF | | tokudb_analyze_time | 5 | | tokudb_block_size | 4194304 | | tokudb_cache_size | 522651648 | | tokudb_check_jemalloc | 1 | | tokudb_checkpoint_lock | OFF | | tokudb_checkpoint_on_flush_logs | OFF | | tokudb_checkpointing_period | 60 | | tokudb_cleaner_iterations | 5 | | tokudb_cleaner_period | 1 | | tokudb_commit_sync | ON | | tokudb_create_index_online | ON | | tokudb_data_dir | | | tokudb_debug | 0 | | tokudb_directio | OFF | | tokudb_disable_hot_alter | OFF | | tokudb_disable_prefetching | OFF | | tokudb_disable_slow_alter | OFF | | tokudb_disable_slow_update | OFF | | tokudb_disable_slow_upsert | OFF | | tokudb_empty_scan | rl | | tokudb_fs_reserve_percent | 5 | | tokudb_fsync_log_period | 0 | | tokudb_hide_default_row_format | ON | | tokudb_init_flags | 11403457 | | tokudb_killed_time | 4000 | | tokudb_last_lock_timeout | | | tokudb_load_save_space | ON | | tokudb_loader_memory_size | 100000000 | | tokudb_lock_timeout | 4000 | | tokudb_lock_timeout_debug | 1 | | tokudb_log_dir | | | tokudb_max_lock_memory | 65331456 | | tokudb_pk_insert_mode | 1 | | tokudb_prelock_empty | ON | | tokudb_read_block_size | 65536 | | tokudb_read_buf_size | 131072 | | tokudb_read_status_frequency | 10000 | | tokudb_row_format | tokudb_zlib | | tokudb_tmp_dir | | | tokudb_version | tokudb-7.1.7-rc7 | | tokudb_write_status_frequency | 1000 | +---------------------------------+------------------+ 42 rows in set (0.00 sec)

The most important of the tokudb_ variables is arguably tokudb_cache_size. The test server where I ran those tests (test01) have a little less than 1G of memory and as you can see above TokuDB is “reserving” half (50%) of them to itself. That’s the default behavior but, of course, you can change it. And you must do it if you are also going to have InnoDB tables on your server – you should not overcommit memory between InnoDB and TokuDB engines. Shlomi Noach wrote a good post explaining the main TokuDB-specific variables and what they do. It’s definitely a worth read.

I hope you have fun testing Percona Server with TokuDB! If you run into any problems worth reporting, please let us know.

The post Percona Server with TokuDB (beta): Installation, configuration appeared first on MySQL Performance Blog.

Why %util number from iostat is meaningless for MySQL capacity planning

Latest MySQL Performance Blog posts - June 25, 2014 - 3:00am

Earlier this month I wrote about vmstat iowait cpu numbers and some of the comments I got were advertising the use of util% as reported by the iostat tool instead. I find this number even more useless for MySQL performance tuning and capacity planning.

Now let me start by saying this is a really tricky and deceptive number. Many DBAs who report instances of their systems having a very busy IO subsystem said the util% in vmstat was above 99% and therefore they believe this number is a good indicator of an overloaded IO subsystem.

Indeed – when your IO subsystem is busy, up to its full capacity, the utilization should be very close to 100%. However, it is perfectly possible for the IO subsystem and MySQL with it to have plenty more capacity than when utilization is showing 100% – as I will show in an example.

Before that though lets see what the iostat manual page has to say on this topic – from this main page we can read:


Percentage of CPU time during which I/O requests were issued to the device (bandwidth utilization for the device). Device saturation occurs when this value is close to 100% for devices serving requests serially. But for devices serving requests in parallel, such as RAID arrays and modern SSDs, this number does not reflect their performance limits.

Which says right here that the number is useless for pretty much any production database server that is likely to be running RAID, Flash/SSD, SAN or cloud storage (such as EBS) capable of handling multiple requests in parallel.

Let’s look at the following illustration. I will run sysbench on a system with a rather slow storage data size larger than buffer pool and uniform distribution to put pressure on the IO subsystem. I will use a read-only benchmark here as it keeps things more simple…

sysbench –num-threads=1 –max-requests=0 –max-time=6000000 –report-interval=10 –test=oltp –oltp-read-only=on –db-driver=mysql –oltp-table-size=100000000 –oltp-dist-type=uniform –init-rng=on –mysql-user=root –mysql-password= run

I’m seeing some 9 transactions per second, while disk utilization from iostat is at nearly 96%:

[ 80s] threads: 1, tps: 9.30, reads/s: 130.20, writes/s: 0.00 response time: 171.82ms (95%)
[ 90s] threads: 1, tps: 9.20, reads/s: 128.80, writes/s: 0.00 response time: 157.72ms (95%)
[ 100s] threads: 1, tps: 9.00, reads/s: 126.00, writes/s: 0.00 response time: 215.38ms (95%)
[ 110s] threads: 1, tps: 9.30, reads/s: 130.20, writes/s: 0.00 response time: 141.39ms (95%)

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
dm-0 0.00 0.00 127.90 0.70 4070.40 28.00 31.87 1.01 7.83 7.52 96.68

This makes a lot of sense – with read single thread read workload the drive should be only used getting data needed by the query, which will not be 100% as there is some extra time needed to process the query on the MySQL side as well as passing the result set back to sysbench.

So 96% utilization; 9 transactions per second, this is a close to full-system capacity with less than 5% of device time to spare, right?

Let’s run a benchmark with more concurrency – 4 threads at the time; we’ll see…

[ 110s] threads: 4, tps: 21.10, reads/s: 295.40, writes/s: 0.00 response time: 312.09ms (95%)
[ 120s] threads: 4, tps: 22.00, reads/s: 308.00, writes/s: 0.00 response time: 297.05ms (95%)
[ 130s] threads: 4, tps: 22.40, reads/s: 313.60, writes/s: 0.00 response time: 335.34ms (95%)

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
dm-0 0.00 0.00 295.40 0.90 9372.80 35.20 31.75 4.06 13.69 3.38 100.01

So we’re seeing 100% utilization now, but what is interesting – we’re able to reclaim much more than less than 5% which was left if we look at utilization – throughput of the system increased about 2.5x

Finally let’s do the test with 64 threads – this is more concurrency than exists at storage level which is conventional hard drives in RAID on this system…

[ 70s] threads: 64, tps: 42.90, reads/s: 600.60, writes/s: 0.00 response time: 2516.43ms (95%)
[ 80s] threads: 64, tps: 42.40, reads/s: 593.60, writes/s: 0.00 response time: 2613.15ms (95%)
[ 90s] threads: 64, tps: 44.80, reads/s: 627.20, writes/s: 0.00 response time: 2404.49ms (95%)

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
dm-0 0.00 0.00 601.20 0.80 19065.60 33.60 31.73 65.98 108.72 1.66 100.00

In this case we’re getting 4.5x of throughput compared to single thread and 100% utilization. We’re also getting almost double throughput of the run with 4 thread where 100% utilization was reported. This makes sense – there are 4 drives which can work in parallel and with many outstanding requests they are able to optimize their seeks better hence giving a bit more than 4x.

So what have we so ? The system which was 96% capacity but which could have driven still to provide 4.5x throughput – so it had plenty of extra IO capacity. More powerful storage might have significantly more ability to run requests in parallel so it is quite possible to see 10x or more room after utilization% starts to be reported close to 100%

So if utilization% is not very helpful what can we use to understand our database IO capacity better ? First lets look at the performance reported from those sysbench runs. If we look at 95% response time you can see 1 thread and 4 threads had relatively close 95% time growing just from 150ms to 250-300ms. This is the number I really like to look at- if system is able to respond to the queries with response time not significantly higher than it has with concurrency of 1 it is not overloaded. I like using 3x as multiplier – ie when 95% spikes to be more than 3x of the single concurrency the system might be getting to the overload.

With 64 threads the 95% response time is 15-20x of the one we see with single thread so it is surely overloaded.

Do we have anything reported by iostat which we can use in a similar way? It turns out we do! Check out the “await” column which tells us how much the requester had to wait for the IO request to be serviced. With single concurrency it is 7.8ms which is what this drives can do for random IO and is as good as it gets. With 4 threads it is 13.7ms – less than double of best possible, so also good enough… with concurrency of 64 it is however 108ms which is over 10x of what this device could produce with no waiting and which is surely sign of overload.

A couple words of caution. First, do not look at svctm which is not designed with parallel processing in mind. You can see in our case it actually gets better with high concurrency while really database had to wait a lot longer for requests submitted. Second, iostat mixes together reads and writes in single statistics which specifically for databases and especially on RAID with BBU might be like mixing apples and oranges together – writes should go to writeback cache and be acknowledged essentially instantly while reads only complete when actual data can be delivered. The tool pt-diskstats from Percona Tookit breaks them apart and so can be much more for storage tuning for database workload.

Final note – I used a read-only workload on purpose – when it comes to writes things can be even more complicated – MySQL buffer pool can be more efficient with more intensive writes plus group commit might be able to commit a lot of transactions with single disk write. Still, the same base logic will apply.

Summary: The take away is simple – util% only shows if a device has at least one operation to deal with or is completely busy, which does not reflect actual utilization for a majority of modern IO subsystems. So you may have a lot of storage IO capacity left even when utilization is close to 100%.

The post Why %util number from iostat is meaningless for MySQL capacity planning appeared first on MySQL Performance Blog.

Sysbench Benchmarking of Tesora’s Database Virtualization Engine

Latest MySQL Performance Blog posts - June 24, 2014 - 5:00am

Tesora, previously called Parelastic, asked Percona to do a sysbench benchmark evaluation of its Database Virtualization Engine on specific architectures on Amazon EC2.

The focus of Tesora is to provide a scalable Database As A Service platform for OpenStack. The Database Virtualization Engine (DVE) plays a part in this as it aims at allowing databases to scale transparently across multiple MySQL shards.

DVE was open sourced last week. Downloads and source are already available on tesora.com

Some of the features include:

  • Transparent Sharding of data accross multiple storage nodes.
  • Applications can connect to DVE directly, using the MySQL Protocol, no code changes required.
  • Transactional and full ACID compliance with multiple storage nodes.
  • Storage Nodes can be added on an existing cluster.
Benchmarking Setup

Synthetic benchmarks were run using sysbench on different environments and different DVE architectures, as provided by Tesora. Architectures with 1 and 3 DVE nodes were benchmarked using up to 5 storage nodes.

The environments include a disk-bound dataset, the purpose of which is to create more insight into how large datasets might scale across multiple nodes. This type of use case is a common reason for companies to look into solutions like sharding.

A memory-bound dataset was also benchmarked to find out how DVE performs.

The memory-bound dataset was also used to compare a standalone MySQL Instance with a single DVE node using a single storage node. This provides more insight into the amount of overhead DVE creates at its most basic setup.

The remainder of this blog post gives a general overview on the findings of these benchmarks.
The full report, which includes more information on the configuration and goes deeper in it’s analysis of the results, can be downloaded here (PDF).


OLTP – Disk Bound – Throughput:

OLTP – Disk Bound – Response Time:

In terms of scalability DVE lives up to its expectations as long as there is enough CPU available on the DVE nodes.

More complex transactions that query across multiple shards scale quite well. The explanation for that scalability, which is beyond linear, is that the available memory grows as storage nodes are added. The environment becomes less disk bound and performs better.

SELECT, INSERT, UPDATE – Disk Bound – Throughput:

SELECT, INSERT, UPDATE – Disk Bound – Response Time:

Single row reads and writes scale even better. This demonstrates that sharding has to be tailored towards both the data and the queries that will be executed across those tables.

Especially when using storage nodes with default EBS storage, disk performance is bad which makes the difference even larger.

The good thing about this solution is that storage and DVE nodes can be added, and the capacity of the whole system increases. No application code changes are necessary.

Running such an environment could come at a higher cost, but it could save a lot of resources and thus money on development. Of course, as a sharded architecture such as this are more complex compared to a non sharded architecture, the operational cost should not be ignored.


Memory Bound Throughput:

Memory Bound Response Time:

The overhead of using DVE is noticeable, but much of it is related to the added network hop in the database layer and the CPU requirements of the DVE nodes.

DVE suffers as we go above 64 threads as you can see in the big increase in response time.

CPU Intensive

OLTP – Disk Bound:

DVE nodes are very CPU intensive. Throughout the analysis, the bottleneck is often caused by the CPU-constrained DVE nodes, this is very visible when looking at the benchmark results of p1_s3 and p1_s5, which use only one DVE node.

Even with simpler single-row, primary key-based SELECT statements, the overhead is noticeable.

Keep in mind that the hardware specifications of the DVE nodes (8 cores) were much higher than the database nodes (2 cores) itself. This makes the issue even more apparent.


Overall DVE looks very promising. When looking at the sysbench results, DVE seems to scale very well, provided that there are enough CPU resources available for the DVE nodes. You can download the full report here (PDF).

It will be interesting to see how much DVE scales on larger instance types. Other types of benchmarks should also be performed, such as linkbench. It is also important to understand the impact on the operational side, such as adding storage nodes, backups & recovery, high availability and how well it works transactions in various scenarios. Stay tuned.

The post Sysbench Benchmarking of Tesora’s Database Virtualization Engine appeared first on MySQL Performance Blog.


Subscribe to Percona aggregator
Contact Us 24 Hours A Day
Support Contact us 24×7
Emergency? Contact us for help now!
Sales North America (888) 316-9775 or
(208) 473-2904
+44-208-133-0309 (UK)
0-800-051-8984 (UK Toll Free)
0-800-181-0665 (GER Toll Free)
More Numbers
Training (855) 55TRAIN or
(925) 271-5054


Share This