
InnoDB vs TokuDB in LinkBench benchmark

July 24, 2015 - 7:12am

Previously I tested Tokutek’s Fractal Trees (TokuMX & TokuMXse) as MongoDB storage engines – today let’s look into the MySQL area.

I am going to use a modified LinkBench under a heavy IO load.

I compared InnoDB without compression, InnoDB with 8k compression, and TokuDB with quicklz compression.
The uncompressed data size is 115GiB; the cache size is 12GiB for InnoDB and 8GiB plus 4GiB of OS cache for TokuDB.

It is important to note that I used tokudb_fanout=128, which is only available in our latest Percona Server release.
I will write more on Fractal Tree internals and what tokudb_fanout means later. For now let’s just say it changes the shape of the fractal tree (compared to the default tokudb_fanout=16).
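For a concrete picture of the compression settings compared, here is a minimal sketch of how such tables could be declared (the exact engine options used in the benchmark are in the results linked below; the table names here are hypothetical):

# InnoDB with 8k compressed pages (needs innodb_file_per_table=1 and the Barracuda file format on 5.6):
mysql test -e "CREATE TABLE lb_innodb_8k (id BIGINT PRIMARY KEY, payload VARCHAR(255))
               ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8"
# TokuDB with quicklz row compression:
mysql test -e "CREATE TABLE lb_tokudb (id BIGINT PRIMARY KEY, payload VARCHAR(255))
               ENGINE=TokuDB ROW_FORMAT=TOKUDB_QUICKLZ"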

I am using two storage options:

  • Intel P3600 PCIe SSD 1.6TB (marked as “i3600” on charts) – as a high end performance option
  • Crucial M500 SATA SSD 900GB (marked as “M500” on charts) – as a low end SATA SSD

The full results and engine options are available here.

Results on Crucial M500 (throughput, more is better)

    Engine Throughput [ADD_LINK/10sec]

  • InnoDB: 6029
  • InnoDB 8K: 6911
  • TokuDB: 14633

There TokuDB outperforms InnoDB by almost two times, but it also shows great variance in results, which I attribute to checkpoint activity.

Results on Intel P3600 (throughput, more is better)

    Engine Throughput [ADD_LINK/10sec]

  • InnoDB: 27739
  • InnoDB 8K: 9853
  • TokuDB: 20594

To understand why InnoDB shines on fast storage, let’s review the IO usage of all engines.
The following chart shows the reads in KiB that each engine performs, on average, for a single client request.

The following chart shows the writes in KiB that each engine performs, on average, for a single client request.

There we can make an interesting observation: TokuDB on average performs half as many writes as InnoDB, and this is what allows TokuDB to be better on slow storage. On fast storage, where there is no performance penalty for many writes, InnoDB is able to get ahead, as InnoDB is still better at using CPUs.

Though it is worth remembering that:

  • On fast, expensive storage, TokuDB provides better compression, which allows you to store more data in a limited capacity
  • TokuDB still writes two times less than InnoDB, which means twice the lifetime for an SSD (still expensive).

Also, looking at the results, I can conclude that InnoDB compression is inefficient in its implementation, as it is not able to get benefits: first, from doing fewer reads (well, it helps it do better than uncompressed InnoDB, but not by much); and, second, from fast storage.

The post InnoDB vs TokuDB in LinkBench benchmark appeared first on MySQL Performance Blog.

The Q&A: Creating best-in-class backup solutions for your MySQL environment

July 23, 2015 - 6:55am

Thank you for attending my July 15 webinar, “Creating Best in Class Backup Solutions for Your MySQL Environment.” Due to the amount of content we discussed and some minor technical difficulties near the end of the webinar, we decided to cover the final two slides of the presentation, along with the questions asked by attendees, in this blog post.

The slides are available for download, and you can watch the webinar in its entirety here.

The final two slides were about our tips for a good backup and recovery strategy. Let’s go through the bullet points along with what would have been their explanation during the webinar:

  • Use the three types of backups
    • Binary for full restores, new slaves
      • Binary backups are easy to restore and take the least amount of time to restore. The mean time to recover is mostly bound by the time needed to transfer the backup to the appropriate target server.
    • Logical for partial restores
      • Logical backups, especially when done table-wise, come in handy when you want to restore one or a few smaller tables.
    • Binlog for point in time recovery
      • Very often the need is for point-in-time recovery. A full backup of any type (logical or binary) is only half the story; we still need the DML statements processed on the server in order to bring it to the latest state, and that is where binary log (binlog) backups come into the picture.
  • Store on more than one server and off-site
    • Store your backups in more than one location; what if the backup server goes down? Offsite storage like Amazon S3 or Glacier, with weekly or monthly backup retention, can be a cheaper option.
  • Test your backups!!!!
    • Testing your backups is very important; it is always great to know backups are recoverable and not corrupted. Spin up an EC2 instance if you want, copy and restore the backup there, and roll forward a day’s worth of binlogs just to be sure.
  • Document restore procedures, script them and test them!!!
    • Also, when you test your backups, make sure to document the steps to restore the backup, to avoid a last-minute hassle over which commands to use.
  • If taking backups from a slave, run pt-table-checksum
    • Backups are mostly taken from slaves, so make sure to checksum them regularly; you don’t want to back up inconsistent data.
  • Configuration files, scripts
    • Data is not the only thing you should be backing up: back up your config files, scripts and user access in a secure location.
  • Do you need to back up everything every day?
    • For very large instances, doing a logical backup is tough. In such cases, evaluate your backup needs: do you want to back up all the tables? Most of the time the smaller tables are the more important ones and need partial restores, so back up only those.
  • Hardlinking backups can save a lot of disk space in some circumstances
    • Some schemas contain only a few high-activity tables; the rest are updated maybe once a week, or by an archiver job that runs monthly. Make sure to hardlink those files against the previous backup; it can save a good amount of space in such scenarios (see the sketch after this list).
  • Monitor your Backups
    • Lastly, monitor your backups. You do not want to find out that your backups have been failing the whole time. Even a simple email notification from your backup scripts can help reduce the chance of failure.
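Here is the hardlinking sketch promised above, assuming date-named backup directories (the paths are hypothetical). rsync compares the new backup against the previous one and hardlinks every unchanged file, so only the few active tables cost new disk space:

# Yesterday's backup is the hardlink reference; unchanged files are linked, not copied
rsync -a --link-dest=/backups/2015-07-22 /backups/staging/ /backups/2015-07-23/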

Now let’s try to answer some of the questions asked during the webinar:

Q: --use-memory=2G, is that pretty standard? If we have more memory, should we use a higher value?
Usually we would evaluate the value based on the size of xtrabackup_logfile (the amount of transactions to apply). If you have more free memory, feel free to provide it to --use-memory; you don’t want memory to be a bottleneck in the restore process.
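For illustration, a hedged one-liner (the 16G figure is arbitrary; size it to your free memory and the xtrabackup_logfile):

innobackupex --apply-log --use-memory=16G /backups/full   # bigger buffer, faster prepare phase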

Q: Which is the best backup option for an 8TB DB?
Usually it would depend on what type of data you have and the business requirements for the backups. For example, a full xtrabackup with incrementals on the weekdays could be a good idea. The time required for backups plays an important role here: backing up to a slow NAS share can be time consuming, and it will make xtrabackup record a lot of transactions, which will further increase your restore time. Also look into backing up very important medium and small tables via logical backups.

Q: I’m not sure if this will be covered, but if you have a 3-node master-master-master cluster using HAProxy, is it recommended to run the backup through the HAProxy server or directly on a specific server? Would it be wise to have a 4th server which would be part of the cluster, but not read from, to perform the backups?
I am assuming this is a Galera cluster setup, in which case you can run backups locally on any of the nodes using tools like Percona XtraBackup; however, the best solution would be spinning up a slave from one of the nodes and running backups there.

Q: With Mydumper, can we stream the data over SSH or netcat to another server? Or would one have to use something like NFS? I’ve used mysqldump and piped it over netcat before… curious if we can do that with Mydumper?
Mydumper is similar in nature to other MySQL client tools: it can be run remotely (the --host option), which means you can run mydumper from another server to back up the master or a slave. Mydumper can be piped as well.
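For illustration, a minimal sketch of a remote run (the host and credentials are hypothetical):

# Run mydumper from a backup host against a remote slave, multi-threaded and compressed
mydumper --host=slave1.example.com --user=backup --password=secret \
         --outputdir=/backups/logical --threads=4 --compress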

Q: Is Mydumper still maintained? It hasn’t had a release since March of last year.
Indeed, Max Bubenick from Percona is currently maintaining the project. He has added new features that make the tool more comprehensive and feature rich, and he is planning the next release soon; stay tuned for the blog post.

Q: Is MyDumper open source? Are prepare and restore the same?
Absolutely. Right now you need to download the source and compile it, but very soon we will have packages built for it too. Prepare and restore are common terms in backup lingo. In the webinar, restore meant copying the backup files back from their storage location to the destination location, whereas prepare meant applying the transactions to the backup and making it ready to restore.

Q: Is binlog mirroring needed on Galera (PXC)?
It is a good idea to keep binlog mirroring. Even though IST and SST will do their job of joining the node, the binlogs can play a role in case you want to roll forward a particular schema on a slave or QA instance.

Q: We know that Percona XtraBackup takes full and incremental backups. Does MyDumper help in taking incremental backups?
At this moment we do not have the ability to take incremental backups with mydumper or with any other logical backup tool. However, weekly full (logical) backups plus daily binlog backups can serve as the same strategy as other incremental backup solutions, and they are easy to restore.

Q: Is it possible to encrypt the output file? What would be the best methodology to back up a database of 7 to 8GB that increases 25% each day? What is the difference between innobackupex and mydumper?
Indeed it is possible to encrypt the backup files; as a matter of fact, we encrypt backup files with GPG keys before uploading them to offsite storage. The best method to back up a 7 to 8GB instance would be implementing all three types of backups we discussed in the webinar; your scenario requires planning for the future, so it is always best to have different solutions available as the data grows. Innobackupex is part of the Percona XtraBackup toolkit and is a script that takes binary backups of databases, whereas MyDumper is a logical backup tool that creates backups as text files.

Q: How can I optimize a mysqldump of a large database? The main bottleneck when dumping a large database is that if any table is found to be corrupted, the dump never gets past it by temporarily skipping the corrupted table. Can we take a backup of a large database without a locking mechanism, i.e. does someone know how to make the backup without locking the tables? Are there any tools that are faster at backup and restore, or how can we use mysqldump to work around this kind of issue during crash recovery?
Mysqldump is a logical backup tool, and as such it executes full table scans to back up the tables and write them to the output file, so it is very difficult to improve the performance of mysqldump (query-wise). Assuming that the corruption refers to MyISAM tables, it is highly recommended that you repair them before backing up; also, to make sure mysqldump doesn’t fail with an error on such a corrupt table, try the --force option. If you are using MyISAM tables, the first recommendation is to switch to InnoDB; with most of the tables on InnoDB, locking can be greatly reduced, to the point where it is negligible (look into --single-transaction). Faster backup and recovery can be achieved with binary backups; look into the Percona XtraBackup tool, and we have comprehensive documentation to get you started.
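To make those flags concrete, a minimal sketch (the paths are hypothetical; assumes mostly InnoDB tables):

# --single-transaction gives a consistent dump without locking InnoDB tables;
# --force carries on past errors such as a corrupt table instead of aborting
mysqldump --single-transaction --force --all-databases | gzip > /backups/full-$(date +%F).sql.gz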

Hope this was a good webinar and we have answered most of your questions. Stay tuned for more such webinars from Percona.

The post The Q&A: Creating best-in-class backup solutions for your MySQL environment appeared first on MySQL Performance Blog.

MySQL QA Episode 8: Reducing Testcases for Engineers: tuning reducer.sh

July 23, 2015 - 3:00am

Welcome to MySQL QA Episode 8: Reducing Testcases for Engineers: tuning reducer.sh

  1. Advanced configurable variables & their default/vanilla reducer.sh settings
  2. Some examples

Full-screen viewing @ 720p resolution recommended.

The post MySQL QA Episode 8: Reducing Testcases for Engineers: tuning reducer.sh appeared first on MySQL Performance Blog.

SELinux and the MySQL init script

July 22, 2015 - 10:09am

I recently worked with a customer who had a weird issue: when their MySQL server (Percona Server 5.5) was already started, running service mysql start a second time did not detect that an instance was already running. As a result, the init script tried to start a second instance with the same settings as the first one. Of course this fails, and it creates a mess. What was the issue? A missing rule in SELinux. At least that is what it looks like.


If SELinux is set to enforcing and you are using Percona Server on CentOS/RHEL 6 (other versions could be affected), service mysql start doesn’t work properly, and the fix is simple to run:

# grep mysqld_safe /var/log/audit/audit.log | audit2allow -M mysqld_safe
# semodule -i mysqld_safe.pp
# service mysql restart

Other options are:

  • Set SELinux to permissive
  • Use the CentOS/RHEL standard MySQL init script (note I didn’t extensively check if that could trigger other errors)
How did we see the issue?

That was pretty easy: if an instance is already running and if you run service mysql start again, you should see something like this in the MySQL error log:

150717 08:47:58 mysqld_safe A mysqld process already exists

But if you instead see tons of error messages like:

2015-07-17 08:47:05 27065 [ERROR] InnoDB: Unable to lock ./ibdata1, error: 11
2015-07-17 08:47:05 27065 [Note] InnoDB: Check that you do not already have another mysqld process using the same InnoDB data or log files.

it means that the init script is broken somewhere.


When the issue was brought to my attention, I tried to reproduce it on my local box, but with no luck. What was so special in the configuration used by the customer?

The only thing that was slightly out of the ordinary was SELinux which was set to enforcing. Then we set SELinux to permissive, and guess what? service mysql start was now working properly and it didn’t allow 2 concurrent instances to be run!

Next step was to look at the SELinux logs to find any error related to MySQL, and we discovered messages like:

type=SYSCALL msg=audit(1437121845.464:739): arch=c000003e syscall=62 success=no exit=-13 a0=475 a1=0 a2=0 a3=7fff0e954130 items=0 ppid=1 pid=5732 auid=500 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=5 comm="mysqld_safe" exe="/bin/bash" subj=unconfined_u:system_r:mysqld_safe_t:s0 key=(null)

At this point, we knew that a rule was missing for mysqld_safe; we needed to add a new one.

Deeper investigation

Actually, what happens is that SELinux prevents this condition from being executed in mysqld_safe:

if kill -0 $PID > /dev/null 2> /dev/null

and when that check fails, the script assumes the mysqld process is not running. That’s why a second mysqld is started.
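Schematically, the logic amounts to this (a simplified sketch, not the literal mysqld_safe code; start_new_mysqld stands in for the real startup logic):

if kill -0 $PID > /dev/null 2> /dev/null
then
  echo "A mysqld process already exists"    # healthy case: refuse to start a second one
else
  start_new_mysqld                          # placeholder: SELinux denying kill(2) lands us here
fi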

However users of Oracle MySQL will probably never experience that issue, simply because the init script is a bit different: before calling mysqld_safe, the init script tries to ping a potential mysqld instance and if it gets a positive reply or an Access denied error, it concludes that mysqld is already running and it doesn’t invoke mysqld_safe.

The fix

Fortunately, this is quite simple. You can generate the corresponding rule with audit2allow:

grep mysqld_safe /var/log/audit/audit.log | audit2allow -M mysqld_safe

And after checking the corresponding .te file, we were able to load that new module:

semodule -i mysqld_safe.pp

After stopping MySQL, you can now use service mysql start normally.


This issue was quite interesting to work on because finding the culprit was not that easy. Also it only triggers when SELinux is enabled and Percona Server is used. Now should the init script of Percona Server be fixed? I’m not sure of the potential problems that could occur if we did so, but of course feel free to leave your feedback in the comments.

The post SELinux and the MySQL init script appeared first on MySQL Performance Blog.

Percona now offering 24/7 support for MongoDB and TokuMX

July 21, 2015 - 12:22pm

Today Percona announced the immediate availability of 24/7, enterprise-class support for MongoDB and TokuMX. The new support service helps organizations achieve maximum application performance without database bloat. Customers have round-the-clock access (365 days a year) to the most trusted team of database experts in the open source community.

The news means that Percona now offers support across the entire open-source database ecosystem, including the entire LAMP stack (Linux, Apache, MySQL, and PHP/Python/Perl), providing a single, expert, proven service provider for companies to turn to in good times (always best to be proactive) – and during emergencies, too.

Today’s support announcement follows Percona’s acquisition of Tokutek, which included the Tokutek distribution of MongoDB – making Percona the first vendor to offer both MySQL and MongoDB software and solutions.

Like Percona’s other support services, support for MongoDB and TokuMX enables organizations to talk directly with Percona’s support experts at any time, day or night.

The Percona Support team is always ready to help resolve database and server instability, initiate data recovery, optimize performance, deal with response and outage issues – and ensure proactive system monitoring and alert responses. Percona also offers support across on-premises, cloud, and hybrid deployments.

The post Percona now offering 24/7 support for MongoDB and TokuMX appeared first on MySQL Performance Blog.

MySQL QA Episode 7: Reducing Testcases for Beginners – single-threaded reducer.sh!

July 21, 2015 - 3:00am

Welcome to MySQL QA Episode #7 – Reducing Testcases for Beginners: single-threaded reducer.sh!

In this episode we’ll learn how to use reducer.sh. Topics discussed:

  1. reducer.sh introduction/concepts
  2. Basic configurable variables & their default reducer.sh settings (an illustrative header follows this list)
    1. INPUTFILE options
    2. MODE=x
    3. TEXT=”text”
    4. MYEXTRA
    5. MYBASE
    6. How to learn more about each of the settings
  3. Manual example
  4. Introduction to the script’s self-recursion concept – subreducer
  5. Quick setup re-cap, details of an already executed QA run
  6. Examples from pquery-prep-red.sh (including some issue reviews)
  7. Gotchas
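As a preview of the basic variables in point 2, here is an illustrative reducer.sh header (the values are examples, not the defaults):

INPUTFILE=/path/to/testcase.sql      # the SQL trace to reduce
MODE=3                               # reduce based on a text string in the error log
TEXT="Assertion .* failed"           # the string MODE=3 searches for
MYEXTRA="--log-error=error.log"      # extra mysqld startup options
MYBASE=/path/to/percona-server-base  # base directory of the server under test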

Full-screen viewing @ 720p resolution recommended.

If the speed is too slow for you, consider setting YouTube to 1.25 playback speed.

The post MySQL QA Episode 7: Reducing Testcases for Beginners – single-threaded reducer.sh! appeared first on MySQL Performance Blog.

Percona Live Amsterdam discounted pricing ends July 26!

July 20, 2015 - 2:32pm

The Percona Live Data Performance Conference in Amsterdam is just two months away and it’s going to be an incredible event. With a new expanded focus on MySQL, NoSQL, and Data in the Cloud, this conference will be jam-packed with talks from some of the industry’s leading experts from MongoDB, VMware, Oracle, MariaDB, Facebook, Booking.com, Pythian, Google, Rackspace, Yelp (and many more, including of course Percona).

Early Bird pricing ends this Sunday (July 26)! So if you want to save €25, then you’d better register now. And for all of my readers, you can take an additional 10% off the entire registration price by using the promo code “10off” at checkout.

It’s also important to book your room at the Mövenpick Hotel for a special rate – but hurry because that deal ends July 27 and the rooms are disappearing fast due to some other big events going on in Amsterdam that week.

Sponsorship opportunities are also still available for Percona Live Amsterdam. Event sponsors become part of a dynamic and fast-growing ecosystem and interact with hundreds of DBAs, sysadmins, developers, CTOs, CEOs, business managers, technology evangelists, solution vendors and entrepreneurs who typically attend the event. This year’s conference will feature expanded accommodations and turnkey kiosks.

We’ve got a fantastic conference schedule on tap. Sessions, which will follow each morning’s keynote addresses, feature a variety of topics related to MySQL and NoSQL, High Availability, DevOps, Programming, Performance Optimization, Replication and Backup, MySQL in the Cloud, MySQL Case Studies, Security, and What’s New in MySQL and MongoDB.

Sessions Include:
  • “InnoDB: A Journey to the Core,” Jeremy Cole, Sr. Systems Engineer, Google, Inc. and Davi Arnaut, Software Engineer, LinkedIn
  • “MongoDB Patterns and Antipatterns for Dev and Ops,” Steffan Mejia, Principal Consulting Engineer, MongoDB, Inc.
  • “NoSQL’s Biggest Lie: SQL Never Went Away,” Matthew Revell, Lead Developer Advocate, Couchbase
  • “The Future of Replication is Today: New Features in Practice,” Giuseppe Maxia, Quality Assurance Architect, VMware
  • “What’s New in MySQL 5.7,” Geir Høydalsvik, Senior Software Development Director, Oracle
Tutorials include:
  • “Best Practices for MySQL High Availability,” Colin Charles, Chief Evangelist, MariaDB
  • “Mongo Sharding from the Trench: A Veterans Field Guide,” David Murphy, Lead DBA, Rackspace Data Stores
  • “Advanced Percona XtraDB Cluster in a Nutshell, La Suite: Hands on Tutorial Not for Beginners!,” Frederic Descamps, Senior Architect, Percona

The conference’s evening events will be a perfect way to network, relax and have FUN while seeing the beautiful city of Amsterdam!

Monday night, September 21, after the tutorial sessions conclude, attendees are invited to the Delirium Cafe located across the street from the conference venue. With more than 500 beers on tap and great food, this will be the perfect way to kick off the Conference.

Tuesday night, September 22, Booking.com will be hosting the Community dinner of the year at their very own headquarters located in historic Rembrandt Square in the heart of the city. Hop on one of the sponsored canal boats that will pick you up right outside of the Mövenpick for your chance to see the city from the water on the way to the community dinner! You’ll be dropped off right next to Booking.com’s offices!

Wednesday night, September 23, there will be a closing reception taking place at the Mövenpick for your last chance to visit with our exhibitors and to wrap up what promises to be an amazing conference!

See you in Amsterdam!

The post Percona Live Amsterdam discounted pricing ends July 26! appeared first on MySQL Performance Blog.

Fractal Tree library as a Key-Value store

July 20, 2015 - 6:50am

As you may know, Tokutek is now part of Percona and I would like to explain some internals of TokuDB and TokuMX – what performance benefits they bring, along with further optimizations we are working on.

However, before going into deep details, I feel it is needed to explain the fundamentals of Key-Value store, and how Fractal Tree handles it.

Before that, allow me to say that I have heard opinions that the “Fractal Tree” name does not reflect the internal structure and looks more like a marketing term than a technical one. I will not go into this discussion and will keep using the name “Fractal Tree” simply out of respect for its inventors. I think they are in a position to give their invention any name they want.

So, with that said, the Fractal Tree library implements a new data structure for more efficient handling (with the main focus on insertion, but more on this later) of a Key-Value store.

You may question how Key-Value relates to SQL transactional databases, as this is more from the NoSQL world. Partially this is true, and the Fractal Tree Key-Value library is successfully used in Percona TokuMX (based on MongoDB 2.4) and Percona TokuMXse (a storage engine for MongoDB 3.0).

But if we look at a Key-Value store in general, it may actually be a good fit for structured databases too. To explain this, let’s look at Key-Value in detail.

So what is a Key-Value data structure?

We will use the notation (k,v), or key=>val, which basically means we associate some value “v” with a key “k”. For software developers, the following analogies may be close:
key-value access is implemented as a dictionary in Python, an associative array in PHP or a map in C++.
(More details on Wikipedia.)

I will define a key-value structure as a list of pairs (k,v).

It is important to note that both key and value need not be just scalars (single values); they can be compound.
That is "k1, k2, k3 => v1, v2", which we can read as "give me two values by a 3-part key".

This brings us closer to a database table structure.
If we apply the additional requirement that all keys (k) in the list (k,v) must be unique, this represents
a PRIMARY KEY for a traditional database table.
To understand this better, let’s look at the following table:
CREATE TABLE metrics (
  ts timestamp,
  device_id int,
  metric_id int,
  cnt int,
  val double,
  PRIMARY KEY (ts, device_id, metric_id),
  KEY metric_id (metric_id, ts),
  KEY device_id (device_id, ts)
);

We can state that the Key-Value structure (ts, device_id, metric_id => cnt, val), with the requirement
that "ts, device_id, metric_id" be unique, represents the PRIMARY KEY for this table. In fact, this is how InnoDB (and TokuDB, for that matter) stores data internally.

Secondary indexes can also be represented in key=>value notation, for example, again, as used in TokuDB and InnoDB:
(secondary_index_key=>primary_key), where the key of a secondary index points to a primary key (so later we can get values by looking up the primary key). Please note that secondary_index_key may not be unique (unless we add a UNIQUE constraint to the secondary index).

Or, taking our table again, the secondary keys are defined as
(metric_id, ts => ts, device_id, metric_id)
(device_id, ts => ts, device_id, metric_id)

It is expected that a Key-Value store supports basic data manipulation and extraction operations, such as:

  • Add or Insert: add a (key => value) pair to the collection
  • Update: from (key => value1) to (key => value2), that is, update the value assigned to the key
  • Delete: remove(key), that is, delete the pair (key => value) from the collection
  • Lookup (select): give the value assigned to the key

and I want to add a fifth operation:

  • Range lookup: give all values for keys defined by a range, such as "key > 5" or "key >= 10 and key < 15"

The way software implements the internal structure of a Key-Value store defines the performance of the mentioned operations, especially when the data size of the store grows beyond memory capacity.

For decades, the most popular data structure for representing a Key-Value store on disk has been the B-Tree, and with good reason. I won’t go into B-Tree details (see for example https://en.wikipedia.org/wiki/B-tree), but it provides probably the best possible time for Lookup operations. However, it has challenges when it comes to Insert operations.

And this is the area where the newcomers, Fractal Tree and LSM-tree (https://en.wikipedia.org/wiki/Log-structured_merge-tree), propose structures that provide better performance for Insert operations (often at the expense of Lookup/Select operations, which may become slower).

To get familiar with LSM-trees (this is the structure used by RocksDB) I recommend http://www.benstopford.com/2015/02/14/log-structured-merge-trees/. As for Fractal Trees, I am going to cover the details in following posts.

The post Fractal Tree library as a Key-Value store appeared first on MySQL Performance Blog.

Bypassing SST in Percona XtraDB Cluster with binary logs

July 16, 2015 - 3:00am

In my previous post, I used incremental backups in Percona XtraBackup as a method for rebuilding a Percona XtraDB Cluster (PXC) node without triggering an actual SST. Practically this reproduces the SST steps, but it can be handy if you already have backups available to use.

In this post, I want to present another methodology for this that also uses a full backup, but instead of incrementals uses any binary logs that the cluster may be producing.

Binary logs on PXC

Binary logs are not strictly needed in PXC for replication, but you may be using them for backups or for asynchronous slaves of the cluster.  To set them up properly, we need the following settings added to our config:

server-id=1
log-bin
log-slave-updates

As I stated, none of these are strictly needed for PXC.

  • server-id=1 — We recommend PXC nodes share the same server-id.
  • log-bin — actually enable the binary log
  • log-slave-updates — log ALL updates to the cluster to this server’s binary log

This doesn’t need to be set on every node, but likely you would set these on at least two nodes in the cluster for redundancy.

Note that this strategy should work with or without 5.6 asynchronous GTIDs.

Recovering data with backups and binary logs

This methodology is conventional point-in-time backup recovery for MySQL.  We have a full backup that was taken at a specific binary log position:

... backup created in the past ...
# innobackupex --no-timestamp /backups/full
# cat /backups/full/xtrabackup_binlog_info
node3-bin.000002	735622700

We have this binary log and all binary logs since:

-rw-r-----. 1 root root 1.1G Jul 14 18:53 node3-bin.000002
-rw-r-----. 1 root root 1.1G Jul 14 18:53 node3-bin.000003
-rw-r-----. 1 root root 321M Jul 14 18:53 node3-bin.000004

Recover the full backup

We start by preparing the backup with --apply-log:

# innobackupex --apply-log --use-memory=1G /backups/full
...
xtrabackup: Recovered WSREP position: 1663c027-2a29-11e5-85da-aa5ca45f600f:60072936
...
InnoDB: Last MySQL binlog file position 0 735622700, file name node3-bin.000002
...
# innobackupex --copy-back /backups/full
# chown -R mysql.mysql /var/lib/mysql

The output confirms the same binary log file and position that we knew from before.

Start MySQL without Galera

We need to start mysql, but without Galera so we can apply the binary log changes before trying to join the cluster. We can do this simply by commenting out all the wsrep settings in the MySQL config.

# grep wsrep /etc/my.cnf
#wsrep_cluster_address = gcomm://pxc.service.consul
#wsrep_cluster_name = mycluster
#wsrep_node_name = node3
#wsrep_node_address =
#wsrep_provider = /usr/lib64/libgalera_smm.so
#wsrep_provider_options = "gcache.size=8G; gcs.fc_limit=1024"
#wsrep_slave_threads = 4
#wsrep_sst_method = xtrabackup-v2
#wsrep_sst_auth = sst:secret
# systemctl start mysql

Apply the binary logs

We now check our binary log starting position:

# mysqlbinlog -j 735622700 node3-bin.000002 | grep Xid | head -n 1
#150714 18:38:36 server id 1  end_log_pos 735623273 CRC32 0x8426c6bc  Xid = 60072937

We can compare the Xid on this binary log position to that of the backup. The Xid in a binary log produced by PXC will be the seqno of the GTID of that transaction. The starting position in the binary log shows us the next Xid is one increment higher, so this makes sense: we can start at this position in the binary log and apply all changes as high as we can go to get the datadir up to a more current position.

# mysqlbinlog -j 735622700 node3-bin.000002 | mysql
# mysqlbinlog node3-bin.000003 | mysql
# mysqlbinlog node3-bin.000004 | mysql

This action isn’t particularly fast as binlog events are being applied by a single connection thread. Remember that if the cluster is taking writes while this is happening, the amount of time you have is limited by the size of gcache and the rate at which it is being filled up.
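On PXC 5.6 you can watch that window shrink while the binlogs apply; a quick sketch (the donor host name is hypothetical):

# Oldest seqno still present in the donor's gcache; our grastate seqno must stay above it
mysql -h donor1 -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_cached_downto'"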

Prime the grastate

Once the binary logs are applied, we can check the final log’s last position to get the seqno we need:

[root@node3 backups]# mysqlbinlog node3-bin.000004 | tail -n 500
...
#150714 18:52:52 server id 1  end_log_pos 335782932 CRC32 0xb983e3b3  Xid = 63105191
...

This gives us the seqno to put in our grastate.dat. As in the last post, we can copy a grastate.dat from another node to get the proper format. However, this time we must put the proper seqno into place:

# cat grastate.dat
# GALERA saved state
version: 2.1
uuid:    1663c027-2a29-11e5-85da-aa5ca45f600f
seqno:   63105191
cert_index:

Be sure the grastate.dat has the proper permissions, uncomment the wsrep settings and restart mysql on the node:

# chown mysql.mysql /var/lib/mysql/grastate.dat
# grep wsrep /etc/my.cnf
wsrep_cluster_address = gcomm://pxc.service.consul
wsrep_cluster_name = mycluster
wsrep_node_name = node3
wsrep_node_address =
wsrep_provider = /usr/lib64/libgalera_smm.so
wsrep_provider_options = "gcache.size=8G; gcs.fc_limit=1024"
wsrep_slave_threads = 4
wsrep_sst_method = xtrabackup-v2
wsrep_sst_auth = sst:secret
# systemctl restart mysql

The node should now attempt to join the cluster with the proper GTID:

2015-07-14 19:28:50 4234 [Note] WSREP: Found saved state: 1663c027-2a29-11e5-85da-aa5ca45f600f:63105191

This, of course, still does not guarantee an IST. See my previous post for more details on the conditions needed for that to happen.

The post Bypassing SST in Percona XtraDB Cluster with binary logs appeared first on MySQL Performance Blog.

Bypassing SST in Percona XtraDB Cluster with incremental backups

July 16, 2015 - 2:00am
Beware the SST

In Percona XtraDB Cluster (PXC) I often run across users who are fearful of SSTs on their clusters. I’ve always maintained that if you can’t cope with an SST, PXC may not be right for you, but that doesn’t change the fact that SSTs with multiple terabytes of data can be quite costly.

SST, by current definition, is a full backup from a Donor to a Joiner. The most popular method is Percona XtraBackup, so we’re talking about a donor node that must:

  1. Run a full XtraBackup that reads its entire datadir
  2. Keep up with Galera replication to it as much as possible (though laggy donors don’t send flow control)
  3. Possibly still be serving application traffic if you don’t remove Donors from rotation.

So, I’ve been interested in alternative ways to work around state transfers and I want to present one way I’ve found that may be useful to someone out there.

Percona XtraBackup and Incrementals

It is possible to use Percona XtraBackup full and incremental backups to build a datadir that might possibly IST. First we’ll focus on the mechanics of the backups, preparing them and getting the Galera GTID, and then later discuss when it may be viable for IST.

Suppose I have a fairly recent full Xtrabackup backup and one or more incremental backups that I can apply on top of it to get VERY close to realtime on my cluster (more on that ‘VERY’ later).

# innobackupex --no-timestamp /backups/full
... sometime later ...
# innobackupex --incremental /backups/inc1 --no-timestamp --incremental-basedir /backups/full
... sometime later ...
# innobackupex --incremental /backups/inc2 --no-timestamp --incremental-basedir /backups/inc1

In my proof of concept test, I now have a full and two incrementals:

# du -shc /backups/*
909M	full
665M	inc1
812M	inc2
2.4G	total

To recover this data, I follow the normal Xtrabackup incremental apply process:

# cp -av /backups/full /backups/restore
# innobackupex --apply-log --redo-only --use-memory=1G /backups/restore
...
xtrabackup: Recovered WSREP position: 1663c027-2a29-11e5-85da-aa5ca45f600f:35694784
...
# innobackupex --apply-log --redo-only /backups/restore --incremental-dir /backups/inc1 --use-memory=1G
# innobackupex --apply-log --redo-only /backups/restore --incremental-dir /backups/inc2 --use-memory=1G
...
xtrabackup: Recovered WSREP position: 1663c027-2a29-11e5-85da-aa5ca45f600f:46469942
...
# innobackupex --apply-log /backups/restore --use-memory=1G

I can see that as I roll forward through my incrementals, I get a higher and higher GTID. Galera’s GTID is stored in the InnoDB recovery information, so Xtrabackup extracts it after every batch it applies to the datadir we’re restoring.

We now have a datadir that is ready to go; we need to copy it into the datadir of our joiner node and set up a grastate.dat. Without a grastate, starting the node would force an SST no matter what.

# innobackupex --copy-back /backups/restore
# ... copy a grastate.dat from another running node ...
# cat /var/lib/mysql/grastate.dat
# GALERA saved state
version: 2.1
uuid:    1663c027-2a29-11e5-85da-aa5ca45f600f
seqno:   -1
cert_index:
# chown -R mysql.mysql /var/lib/mysql/

If I start the node now, it will see the grastate.dat with the -1 seqno and run --wsrep_recover to extract the GTID from InnoDB (I could have also just put that GTID directly into my grastate.dat).

This will allow the node to startup from merged Xtrabackup incrementals with a known Galera GTID.
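If you would rather fill in the seqno yourself, a sketch of the manual extraction (the error log location varies by distribution):

# Recover and print the Galera GTID stored in InnoDB, without joining the cluster
mysqld_safe --wsrep-recover
grep 'Recovered position' /var/lib/mysql/error.log | tail -n 1
# ... WSREP: Recovered position: 1663c027-2a29-11e5-85da-aa5ca45f600f:46469942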

But will it IST?

That’s the question.  IST happens when the selected donor has all the transactions the joiner needs to get it fully caught up inside of the donor’s gcache.  There are several implications of this:

  • A gcache is mmap allocated and does not persist across restarts on the donor.  A restart essentially purges the mmap.
  • You can query the oldest GTID seqno on a donor by checking the status variable ‘wsrep_local_cached_downto’. This variable is not available on 5.5, so there you are forced to guess whether you can IST or not.
  • Most PXC 5.6 versions will auto-select a donor based on IST candidacy. Prior to that (i.e., 5.5), donor selection was not based on IST candidacy at all, meaning you had to be much more careful and select the donor manually.
  • There’s no direct mapping from the earliest GTID in a gcache to a specific time, so knowing at a glance whether a given incremental will be enough to IST is difficult.
  • It’s also difficult to know how big to make your gcache (set in MB/GB/etc.) with respect to your backup schedule (daily/hourly/etc.).

All that being said, we’re still talking about backups here. The above method will work if and only if:

  • You do frequent incremental backups
  • You have a large gcache (hopefully more on this in a future blog post)
  • You can restore a backup faster than it takes for your gcache to overflow

The post Bypassing SST in Percona XtraDB Cluster with incremental backups appeared first on MySQL Performance Blog.

MySQL QA Episode 6: Analyzing and Filtering

July 15, 2015 - 12:00am

Welcome to MySQL QA Episode #6!

Today we will look into analyzing and filtering our QA run. We’ll use tools like pquery-prep-red.sh, pquery-clean-known.sh & pquery-results.sh

1. Quick re-cap and QA run setup
2. pquery-prep-red.sh
3. pquery-clean-known.sh
4. pquery-results.sh
5. Bonus: pquery reach – pquery-reach++.sh

We’ll also introduce the text_string.sh tool, which extracts the most specific text string from the error log in order to classify a bug/issue.

Full-screen viewing @ 720p resolution recommended

The post MySQL QA Episode 6: Analyzing and Filtering appeared first on MySQL Performance Blog.

MongoDB benchmark: sysbench-mongodb IO-bound workload comparison

July 14, 2015 - 7:26am

In this post I’ll share the results of a sysbench-mongodb benchmark I performed on my server. I compared MMAP, WiredTiger, RocksDB, TokuMXse (based on MongoDB 3.0) and TokuMX (based on MongoDB 2.4) under an IO-intensive workload.

The full results are available here, and below I’ll just share the summary chart:

I would like to highlight that this benchmark was designed to emulate a heavy IO load on a (relatively) slow IO subsystem. This use case, I believe, is totally valid and represents frequently used “cloud” setups with limited memory and slow IO.

The WiredTiger engine, being B-Tree based, is expected to perform worse compared to RocksDB and Toku Fractal Trees, which are designed to handle IO-intensive workloads. My assumption is that WiredTiger will perform better (or even outperform the others) on CPU-intensive, in-memory workloads (see for example Mark Callaghan’s results). WiredTiger is also expected to perform better with faster storage.

The post MongoDB benchmark: sysbench-mongodb IO-bound workload comparison appeared first on MySQL Performance Blog.

MySQL QA Episode 5: Preparing Your QA Run with pquery

July 13, 2015 - 12:00am

Welcome to MySQL QA Episode #5! In this episode we’ll be setting up and running pquery for the first time… and I’ll also trigger some actual bugs (fun guaranteed)! I’ll also introduce you to mtr_to_sql.sh and pquery-run.sh.

pquery-run.sh (the main wrapper around pquery) is capable of generating 80-120 MySQL Server crashes – per hour! See how it all works in this episode…

Full-screen viewing @ 720p resolution recommended

The post MySQL QA Episode 5: Preparing Your QA Run with pquery appeared first on MySQL Performance Blog.

How to create a rock-solid MySQL database backup & recovery strategy

July 10, 2015 - 8:05am

Have you ever wondered what could happen if your MySQL database goes down?

Although it’s evident such a crash will cause downtime – and surely some business impact in terms of revenue – can you do something to reduce this impact?

The simple answer is “yes”: by doing regular backups, of course. But are you 100% sure that your current backup strategy will really come through when an outage occurs? And how much precious time will pass (and how much revenue will be lost) before you get your business back online?

I usually think of backups as the step after HA fails. Let’s say we’re running master-master (M<>M) replication and something occurs that kills the database, but HA can’t save the day. Let’s pretend the UPS fails and those servers are completely out. You can’t fail over; you have to restore data. Backups are a key piece of “business continuity.” Also factor in the frequent need to restore data that’s been altered by mistake: a missing WHERE clause, or a DROP TABLE run in prod instead of dev. These are the instances where backups are invaluable.

Let’s take some time to discuss the possible backup strategies with MySQL… how to make backups efficiently, and also examine the different tools that are available. We’ll cover these topics and more during my July 15 webinar: “Creating a Best-in-Class Backup and Recovery System for Your MySQL Environment,” starting at 10 a.m. Pacific time.

On a related note, did you know that most online backups are possible with mysqldump, and that you can save some space on backups by using simple Linux tools? I’ll also cover this, so be sure to join me next Wednesday. Oh, and it’s a free webinar, too!

Stay tuned!

The post How to create a rock-solid MySQL database backup & recovery strategy appeared first on MySQL Performance Blog.

Percona Live Europe 2015 conference, tutorials schedule now available

July 9, 2015 - 8:10am

The conference and tutorial schedule for Percona Live Europe 2015, September 21-23 in Amsterdam, was published this morning and this year’s event will focus on MySQL, NoSQL and Data in the Cloud.

Conference sessions, which will follow each morning’s keynote addresses, feature a variety of formal tracks and sessions. Topic areas include: high availability (HA), DevOps, programming, performance optimization, replication and backup, MySQL in the cloud, and MySQL and NoSQL. There will also be MySQL case studies, sessions on security and talks about “What’s new in MySQL.”

Technology experts from the world’s leading MySQL and NoSQL vendors and users – including Oracle, MariaDB, Percona, Facebook, Google, LinkedIn and Yelp – will deliver the sessions. Sessions will include:

  • “InnoDB: A Journey to the Core,” Jeremy Cole, Systems Engineer, Google and Davi Arnaut, Software Engineer, LinkedIn
  • “MongoDB Patterns and Antipatterns for Dev and Ops,” Steffan Mejia, Principal Consulting Engineer, MongoDB, Inc.
  • “NoSQL’s Biggest Lie: SQL Never Went Away,” Matthew Revell, Lead Developer Advocate, Couchbase
  • “The Future of Replication is Today: New Features in Practice,” Giuseppe Maxia, Quality Assurance Architect, VMware
  • “What’s New in MySQL 5.7,” Geir Høydalsvik, Senior Software Development Director, Oracle
Tutorial Schedule

Tutorials provide practical, in-depth knowledge of critical MySQL issues. Topics will include:

  • “Best Practices for MySQL High Availability,” Colin Charles, Chief Evangelist, MariaDB
  • “Mongo Sharding from the Trench: A Veterans Field Guide,” David Murphy, Lead DBA, Rackspace Data Stores
  • “Advanced Percona XtraDB Cluster in a Nutshell, La Suite: Hands on Tutorial Not for Beginners!,” Frederic Descamps, Senior Architect, Percona
Featured Events
  • On Monday, September 21 at 5 p.m., Percona will host an opening reception at the Delirium Café in Amsterdam.
  • On Tuesday, September 22 at 7 p.m., the Community Dinner will take place at the offices of Booking.com.
  • On Wednesday September 23 at 6 p.m., the closing reception will be held at the Mövenpick Hotel, giving attendees one last chance to visit the sponsor kiosks.

Sponsorship opportunities for Percona Live Europe 2015 are still available but they are selling out fast. Event sponsors become part of a dynamic and fast-growing ecosystem and interact with hundreds of DBAs, sysadmins, developers, CTOs, CEOs, business managers, technology evangelists, solution vendors and entrepreneurs who typically attend the event. This year’s conference will feature expanded accommodations and turnkey kiosks. Current sponsors include:

  • Diamond: VMware
  • Exhibitors: MariaDB, Severalnines
  • Media: Business Cloud News, Computerworld UK, TechWorld
Planning to Attend?

Early Bird registration discounts for Percona Live Europe 2015 are available through July 26, 2015 at 11:30 p.m. CEST.

The post Percona Live Europe 2015 conference, tutorials schedule now available appeared first on MySQL Performance Blog.

Percona Server 5.6.25-73.1 is now available

July 9, 2015 - 4:48am

Percona is glad to announce the release of Percona Server 5.6.25-73.1 on July 9, 2015. Download the latest version from the Percona web site or from the Percona Software Repositories.

Based on MySQL 5.6.25, including all the bug fixes in it, Percona Server 5.6.25-73.1 is the current GA release in the Percona Server 5.6 series. Percona Server is open-source and free – and this is the latest release of our enhanced, drop-in replacement for MySQL. Complete details of this release can be found in the 5.6.25-73.1 milestone on Launchpad.

New Features:

  • TokuDB storage engine package has been updated to version 7.5.8

New TokuDB Features:

  • Exposed the ft-index fanout as the TokuDB option tokudb_fanout (default=16, range=2-16384).
  • Tokuftdump can now provide summary info with the new --summary option.
  • The fanout has been serialized in the ft-header, and the ft_header.fanout value has been exposed in tokuftdump.
  • New checkpoint status variables have been implemented:
    • CP_END_TIME – checkpoint end time, the time spent in the checkpoint end operation, in seconds,
    • CP_LONG_END_COUNT – long checkpoint end count, the count of end_checkpoint operations that exceeded 1 minute,
    • CP_LONG_END_TIME – long checkpoint end time, the total time of long checkpoints, in seconds.
  • “Engine” status variables are now visible as “GLOBAL” status variables.

TokuDB Bugs Fixed:

  • Fixed assertion with big transaction in toku_txn_complete_txn.
  • Fixed assertion that was caused when a transaction had rollback log nodes orphaned in the blocktable.
  • Fixed ftcxx tests failures that were happening when it was run in parallel.
  • Fixed multiple test failures for Debian/Ubuntu caused by assertion on setlocale().
  • Status has been refactored into its own file/subsystem within the ft-index code to make it more accessible.

Release notes for Percona Server 5.6.25-73.1 are available in the online documentation. Please report any bugs on the launchpad bug tracker.

The post Percona Server 5.6.25-73.1 is now available appeared first on MySQL Performance Blog.

How to obtain the MySQL version from an FRM file

July 9, 2015 - 12:00am

I recently helped a customer figure out why a minor version MySQL upgrade was indicating that some tables needed to be rebuilt. The mysql_upgrade program should be run for every upgrade, no matter how big or small the version difference is, but when only the minor version changes, I would normally not expect it to require tables to be rebuilt.

Turns out some of their tables were still marked with an older MySQL version, which could mean a few things… most likely that something went wrong with a previous upgrade, or that the tables were copied from a server with an older version.

In cases like this, did you know there is a fast, safe and simple way to check the version associated with a table? You can do this by reading the FRM file, following the format specification found here.

If you look at that page, you’ll see that the version is 4 bytes long and starts at offset 0x33. Since it is stored in little endian format, you can get the version just by reading the first two bytes.

This means you can use hexdump to read 2 bytes starting at offset 0x33 and print their decimal representation to obtain the MySQL version, like so:

telecaster:test fernandoipar$ hexdump -s 0x33 -n 2 -v -d 55_test.frm
0000033 50532
telecaster:test fernandoipar$ hexdump -s 0x33 -n 2 -v -d 51_test.frm
0000033 50173

The first example corresponds to a table created on MySQL version 5.5.32, while the second one corresponds to 5.1.73.
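The decimal value is the standard MySQL version id, MAJOR*10000 + MINOR*100 + PATCH, so decoding it is simple shell arithmetic:

v=50532                                                        # value printed by hexdump above
echo "$(( v / 10000 )).$(( v % 10000 / 100 )).$(( v % 100 ))"  # prints 5.5.32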

Does that mean the 51_test table was created on 5.1.73? Not necessarily, as MySQL will update the version in the FRM whenever the table is rebuilt or altered.

The manual page says the details may change with the transition to the new text-based format, but I was able to get the version using this command up to MySQL 5.7.7.

Hope you found that useful!

The post How to obtain the MySQL version from an FRM file appeared first on MySQL Performance Blog.

MySQL QA Episode 4: QA Framework Setup Time!

July 8, 2015 - 12:00am

Welcome to MySQL QA Episode 4! In this episode we’ll look into setting up our QA Framework: percona-qa, pquery, reducer & more.

1. All about percona-qa
2. pquery

$ cd ~; bzr branch lp:percona-qa

3. reducer.sh

$ cd ~; bzr branch lp:randgen
$ vi ~/randgen/util/reducer/reducer.sh

4. Short introduction to pquery framework tools

The tools introduced in this episode will be covered further in next two episodes.

Full-screen viewing @ 720p resolution recommended

The post MySQL QA Episode 4: QA Framework Setup Time! appeared first on MySQL Performance Blog.

MySQL QA Episode 3: How to use the debugging tool GDB

July 7, 2015 - 12:00am

Welcome to MySQL QA Episode 3: “Debugging: GDB, Backtraces, Frames and Library Dependencies”

In this episode you’ll learn how to use the debugging tool GDB (a short sample session follows the topic list). The following debugging topics are covered:


1. GDB Introduction
2. Backtrace, Stack trace
3. Frames
4. Commands & Logging
5. Variables
6. Library dependencies
7. c++filt
8. Handy references
– GDB Cheat sheet (page #2): https://goo.gl/rrmB9i
– From Crash to testcase: https://goo.gl/1o5MzM
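As a taster, a minimal session of the kind shown in the episode (the binary and core file paths are hypothetical):

$ gdb /usr/sbin/mysqld /var/lib/mysql/core.12345
(gdb) set logging on             # log everything to gdb.txt
(gdb) bt                         # backtrace of the crashing thread
(gdb) thread apply all bt        # stack traces for every thread
(gdb) frame 2                    # jump to a specific frame
(gdb) info locals                # inspect that frame's variables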

It also expands on live debugging & more. In HD quality (set your player to 720p!)

The post MySQL QA Episode 3: How to use the debugging tool GDB appeared first on MySQL Performance Blog.

TOI wsrep_RSU_method in PXC 5.6.24 and up

July 6, 2015 - 8:28am

I noticed that in the latest release of Percona XtraDB Cluster (PXC), the behavior of wsrep_RSU_method changed somewhat. Prior to this release, the variable was GLOBAL only, meaning that to use it you would:

mysql> set GLOBAL wsrep_RSU_method='RSU';
mysql> ALTER TABLE ...
mysql> set GLOBAL wsrep_RSU_method='TOI';

This had the (possibly negative) side effect that ALL DDLs issued on this node would be affected by the setting while in RSU mode.

So, in this latest release, the variable was given a SESSION scope as well, while retaining GLOBAL. This has a couple of side effects that are common to MySQL variables that are both GLOBAL and SESSION:

  • The SESSION copy is made from whatever the GLOBAL’s value is when a new connection (session) is established.
  • SET GLOBAL does not affect existing connection’s SESSION values.

Therefore, our above workflow would only set the GLOBAL value to RSU and not the SESSION value for the existing local connection. As a result, our ALTER TABLE will be TOI and NOT RSU!
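You can see the trap from a single connection (a quick sketch, using the variable as named in this post):

mysql <<'SQL'
SET GLOBAL wsrep_RSU_method='RSU';
SELECT @@global.wsrep_RSU_method AS global_copy,
       @@session.wsrep_RSU_method AS session_copy;
SQL
# global_copy reads RSU, but session_copy still shows the value copied when this
# connection opened (TOI by default): exactly why the ALTER above runs as TOI.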

So, for those using RSU, the proper workflow would be to make your connection, set the SESSION copy of the variable and then issue your DDL:

mysql> set SESSION wsrep_RSU_method='RSU';
mysql> ALTER TABLE ...
... disconnect ...

The advantage here is that ONLY your session’s DDLs will be affected by RSU (handy if you run DDLs automatically from your application).

The post TOI wsrep_RSU_method in PXC 5.6.24 and up appeared first on MySQL Performance Blog.