Buy Percona ServicesBuy Now!
Subscribe to Latest MySQL Performance Blog posts feed
Updated: 1 day 9 hours ago

Percona Blog Poll: What Database Engine Are You Using to Store Time Series Data?

February 10, 2017 - 7:19am

Take Percona’s blog poll on what database engine you are using to store time series data.

Time series data is some of the most actionable data available when it comes to analyzing trends and making predictions. Simply put, time series data is data that is indexed not just by value, but by time as well – allowing you to view value changes over time as they occur. Obvious uses include the stock market, web traffic, user behavior, etc.

With the increasing number of smart devices in the Internet of Things (IoT), being able to track data over time is more and more important. With time series data, you can measure and make predictions on things like energy consumption, pH values, water consumption, data from environment-aware machines like smart cars, etc. The sensors used in IoT devices and systems generate huge amounts of time-series data.

How is all of this data collected, segmented and stored? We’d like to hear from you: what database engine are you using to store time series data? Please take a few seconds and answer the following poll. Which are you using? Help the community learn what database engines help solve critical database issues. Please select from one to three database engines as they apply to your environment. Feel free to add comments below if your engine isn’t listed.

Note: There is a poll embedded within this post, please visit the site to participate in this post's poll.

Using NVME Command Line Tools to Check NVMe Flash Health

February 9, 2017 - 11:50am

In this blog post, I’ll look at the types of NVMe flash health information you can get from using the NVMe command line tools.

Checking SATA-based drive health is easy. Whether it’s an SSD or older spinning drive, you can use the smartctl command to get a wealth of information about the device’s performance and health. As an example:

root@blinky:/var/lib/mysql# smartctl -A /dev/sda smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-62-generic] (local build) Copyright (C) 2002-16, Bruce Allen, Christian Franke, === START OF READ SMART DATA SECTION === SMART Attributes Data Structure revision number: 16 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0  5 Reallocated_Sector_Ct   0x0032   100   100   010    Old_age   Always       -       0  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       41 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       2 171 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0 172 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0 173 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       1 174 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0 183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0 184 End-to-End_Error        0x0032   100   100   000    Old_age   Always       -       0 187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0 194 Temperature_Celsius     0x0022   065   059   000    Old_age   Always       -       35 (Min/Max 21/41) 196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0 197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0 198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0 199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0 202 Unknown_SSD_Attribute   0x0030   100   100   001    Old_age   Offline      -       0 206 Unknown_SSD_Attribute   0x000e   100   100   000    Old_age   Always       -       0 246 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       145599393 247 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       4550280 248 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       582524 180 Unused_Rsvd_Blk_Cnt_Tot 0x0033   000   000   000    Pre-fail  Always       -       1260 210 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0

While smartctl might not know all vendor-specific smart values, typically you can Google the drive model along with “smart attributes” and find documents like this to get more details.

If you move to newer generation NVMe-based flash storage, smartctl won’t work anymore – at least it doesn’t work for the packages available for Ubuntu 16.04 (what I’m running). It looks like support for NVMe in Smartmontools is coming, and it would be great to get a single tool that supports both  SATA and NVMe flash storage.

In the meantime, you can use the nvme tool available from the nvme-cli package. It provides some basic information for NVMe devices.

To get information about the NVMe devices installed:

root@alex:~# nvme list Node             SN                   Model                                    Version  Namespace Usage                      Format           FW Rev ---------------- -------------------- ---------------------------------------- -------- --------- -------------------------- ---------------- -------- /dev/nvme0n1     S3EVNCAHB01861F      Samsung SSD 960 PRO 1TB                  1.2      1         689.63  GB /   1.02  TB    512   B +  0 B   1B6QCXP7

To get SMART information:

root@alex:~# nvme smart-log /dev/nvme0 Smart Log for NVME device:nvme0 namespace-id:ffffffff critical_warning                    : 0 temperature                         : 34 C available_spare                     : 100% available_spare_threshold           : 10% percentage_used                     : 0% data_units_read                     : 3,465,389 data_units_written                  : 9,014,689 host_read_commands                  : 89,719,366 host_write_commands                 : 134,671,295 controller_busy_time                : 310 power_cycles                        : 11 power_on_hours                      : 21 unsafe_shutdowns                    : 8 media_errors                        : 0 num_err_log_entries                 : 1 Warning Temperature Time            : 0 Critical Composite Temperature Time : 0 Temperature Sensor 1                : 34 C Temperature Sensor 2                : 47 C Temperature Sensor 3                : 0 C Temperature Sensor 4                : 0 C Temperature Sensor 5                : 0 C Temperature Sensor 6                : 0 C

To get additional SMART information (not all devices support it):

root@ts140i:/home/pz/workloads/1m# nvme smart-log-add /dev/nvme0 Additional Smart Log for NVME device:nvme0 namespace-id:ffffffff key                               normalized raw program_fail_count              : 100%       0 erase_fail_count                : 100%       0 wear_leveling                   :  62%       min: 1114, max: 1161, avg: 1134 end_to_end_error_detection_count: 100%       0 crc_error_count                 : 100%       0 timed_workload_media_wear       : 100%       37.941% timed_workload_host_reads       : 100%       51% timed_workload_timer            : 100%       446008 min thermal_throttle_status         : 100%       0%, cnt: 0 retry_buffer_overflow_count     : 100%       0 pll_lock_loss_count             : 100%       0 nand_bytes_written              : 100%       sectors: 16185227 host_bytes_written              : 100%       sectors: 6405605

Some of this information is self-explanatory, and some of it isn’t. After looking at the NVME specification document, here is my read on some of the data:

Available Spare. Contains a normalized percentage (0 to 100%) of the remaining spare capacity that is available.

Available Spare Threshold. When the Available Spare capacity falls below the threshold indicated in this field, an asynchronous event completion can occur. The value is indicated as a normalized percentage (0 to 100%).

(Note: I’m not quite sure what the practical meaning of “asynchronous event completion” is, but it looks like something to avoid!)

Percentage Used. Contains a vendor specific estimate of the percentage of the NVM subsystem life used, based on actual usage and the manufacturer’s prediction of NVM life.

(Note: the number can be more than 100% if you’re using storage for longer than its planned life.)

Data Units Read/Data Units Written. This is the number of 512-byte data units that are read/written, but it is measured in an unusual way. The first value corresponds to 1000 of the 512-byte units. So you can multiply this value by 512000 to get value in bytes. It does not include meta-data accesses.

Host Read/Write Commands. The number of commands of the appropriate type issued. Using this value, as well as one below, you can compute the average IO size for “physical” reads and writes.

Controller Busy Time. Time in minutes that the controller was busy servicing commands. This can be used to gauge long-term storage load trends.

Unsafe Shutdowns. The number of times a power loss happened without a shutdown notification being sent. Depending on the NVMe device you’re using, an unsafe shutdown might corrupt user data.

Warning Temperature Time/Critical Temperature Time. The time in minutes a device operated above a warning or critical temperature. It should be zeroes.

Wear_Leveling. This shows how much of the rated cell life was used, as well as the min/max/avg write count for different cells. In this case, it looks like the cells are rated for 1800 writes and about 1100 on average were used

Timed Workload Media Wear. The media wear by the current “workload.” This device allows you to measure some statistics from the time you reset them (called the “workload”) in addition to showing the device lifetime values.

Timed Workload Host Reads. The percentage of IO operations that were reads (since the workload timer was reset).

Thermal Throttle Status. This shows if the device is throttled due to overheating, and when there were throttling events in the past.

Nand Bytes Written. The bytes written to NAND cells. For this device, the measured unit seems to be in 32MB values. It might be different for other devices.

Host Bytes Written. The bytes written to the NVMe storage from the system. This unit also is in 32MB values. The scale of these values is not very important, as they are the most helpful for finding the write amplification of your workload. This ratio is measured in writes to NAND and writes to HOST. For this example, the Write Amplification Factor (WAF) is 16185227 / 6405605 = 2.53  

As you can see, the NVMe command line tools provide a lot of good information for understanding the health and performance of NVMe devices. You don’t need to use vendor specific tools (like isdct).

Percona Server 5.6.35-80.0 is Now Available

February 8, 2017 - 10:09am

Percona announces the release of Percona Server 5.6.35-80.0 on February 8, 2017. Download the latest version from the Percona web site or the Percona Software Repositories.

Based on MySQL 5.6.35, and including all the bug fixes in it, Percona Server 5.6.35-80.0 is the current GA release in the Percona Server 5.6 series. Percona Server is open-source and free – this is the latest release of our enhanced, drop-in replacement for MySQL. Complete details of this release are available in the 5.6.35-80.0 milestone on Launchpad.

New Features:
  • Kill Idle Transactions feature has been re-implemented by setting a connection socket read timeout value instead of periodically scanning the internal InnoDB transaction list. This makes the feature applicable to any transactional storage engine, such as TokuDB, and, in future, MyRocks. This re-implementation is also addressing some existing bugs, including server crashes: #1166744, #1179136, #907719, and #1369373.
Bugs Fixed:
  • Logical row counts for TokuDB tables could get inaccurate over time. Bug fixed #1651844 (#732).
  • Repeated execution of SET STATEMENT ... FOR SELECT FROM view could lead to a server crash. Bug fixed #1392375.
  • CREATE TEMPORARY TABLE would create a transaction in binary log on a read-only server. Bug fixed #1539504 (upstream #83003).
  • If temporary tables from CREATE TABLE ... AS SELECT contained compressed attributes it could lead to a server crash. Bug fixed #1633957.
  • Using the per-query variable statement with subquery temporary tables could cause a memory leak. Bug fixed #1635927.
  • Fixed new compilation warnings with GCC 6. Bugs fixed #1641612 and #1644183.
  • A server could crash if a bitmap write I/O error happens in the background log tracking thread while a FLUSH CHANGED_PAGE_BITMAPS is executing concurrently. Bug fixed #1651656.
  • TokuDB was using the wrong function to calculate free space in data files. Bug fixed #1656022 (#1033).
  • CONCURRENT_CONNECTIONS column in the USER_STATISTICS table was showing incorrect values. Bug fixed #728082.
  • InnoDB index dives did not detect some of the concurrent tree changes, which could return bogus estimates. Bug fixed #1625151 (upstream #84366).
  • INFORMATION_SCHEMA.INNODB_CHANGED_PAGES queries would needlessly read potentially incomplete bitmap data past the needed LSN range. Bug fixed #1625466.
  • Percona Server cmake compiler would always attempt to build RocksDB even if -DWITHOUT_ROCKSDB=1 argument was specified. Bug fixed #1638455.
  • Adding COMPRESSED attributes to InnoDB special tables fields (like mysql.innodb_index_stats and mysql.innodb_table_stats) could lead to server crashes. Bug fixed #1640810.
  • Lack of free pages in the buffer pool is not diagnosed with innodb_empty_free_list_algorithm set to backoff (which is the default). Bug fixed #1657026.
  • mysqld_safe now limits the use of rm and chown to avoid privilege escalation. chown can now be used only for /var/log directory. Bug fixed #1660265. Thanks to Dawid Golunski (
  • Renaming a TokuDB table to a non-existent database with tokudb_dir_per_db enabled would lead to a server crash. Bug fixed #1030.
  • Read Free Replication optimization could not be used for TokuDB partition tables. Bug fixed #1012.

Other bugs fixed: #1486747 (upstream #76872), #1633988, #1638198 (upstream #82823), #1638897, #1646384, #1647530, #1647741, #1651121, #1156772, #1644569, #1644583, #1648389, #1648737, #1650247, #1650256, #1650324, #1650450, #1655587, and #1647723.

Release notes for Percona Server 5.6.35-80.0 are available in the online documentation. Please report any bugs on the launchpad bug tracker.

MySQL super_read_only Bugs

February 8, 2017 - 7:23am

This blog we describe an issue with MySQL 5.7’s super_read_only feature when used alongside with GTID in chained slave instances.


In MySQL 5.7.5 and onward introduced the gtid_executed table in the MySQL database to store every GTID. This allows slave instances to use the GTID feature regardless whether the binlog option is set or not. Here is an example of the rows in the gtid_executed table:

mysql> SELECT * FROM mysql.gtid_executed; +--------------------------------------+----------------+--------------+ | source_uuid | interval_start | interval_end | +--------------------------------------+----------------+--------------+ | 00005730-1111-1111-1111-111111111111 | 1 | 1 | | 00005730-1111-1111-1111-111111111111 | 2 | 2 | | 00005730-1111-1111-1111-111111111111 | 3 | 3 | | 00005730-1111-1111-1111-111111111111 | 4 | 4 | | 00005730-1111-1111-1111-111111111111 | 5 | 5 | | 00005730-1111-1111-1111-111111111111 | 6 | 6 | | 00005730-1111-1111-1111-111111111111 | 7 | 7 | | 00005730-1111-1111-1111-111111111111 | 8 | 8 | | 00005730-1111-1111-1111-111111111111 | 9 | 9 | | 00005730-1111-1111-1111-111111111111 | 10 | 10 | ...

To save space, this table needs to be compressed periodically by replacing GTIDs rows with a single row that represents that interval of identifiers. For example, the above GTIDs can be represented with the following row:

mysql> SELECT * FROM mysql.gtid_executed; +--------------------------------------+----------------+--------------+ | source_uuid | interval_start | interval_end | +--------------------------------------+----------------+--------------+ | 00005730-1111-1111-1111-111111111111 | 1 | 10 | ...

On the other hand, we have the super_read_only feature, if this option is set to ON, MySQL won’t allow any updates – even from users that have SUPER privileges. It was first implemented on WebscaleSQL and later ported to Percona Server 5.6. MySQL mainstream code implemented a similar feature in version 5.7.8.

The Issue [1]

MySQL’s super_read_only feature won’t allow the compression of the mysql.gtid_executed table. If a high number of transactions run on the master instance, it causes the gtid_executed table to grow to a considerable size. Let’s see an example.

I’m going to use the MySQL Sandbox to quickly setup a Master/Slave configuration, and sysbench to simulate a high number of transactions on master instance.

First, set up replication using GTID:

make_replication_sandbox --sandbox_base_port=5730 /opt/mysql/5.7.17 --how_many_nodes=1 --gtid

Next, set up the variables for a chained slave instance:

echo "super_read_only=ON" >> node1/my.sandbox.cnf echo "log_slave_updates=ON" >> node1/my.sandbox.cnf node1/restart

Now, generate a high number of transactions:

sysbench --test=oltp.lua --mysql-socket=/tmp/mysql_sandbox5730.sock --report-interval=1 --oltp-tables-count=100000 --oltp-table-size=100 --max-time=1800 --oltp-read-only=off --max-requests=0 --num-threads=8 --rand-type=uniform --db-driver=mysql --mysql-user=msandbox --mysql-password=msandbox --mysql-db=test prepare

After running sysbench for awhile, we check that the number of rows in the gtid_executed table is increasing faster:

slave1 [localhost] {msandbox} ((none)) > select count(*) from mysql.gtid_executed ; +----------+ | count(*) | +----------+ | 300038 | +----------+ 1 row in set (0.00 sec)

By reviewing SHOW ENGINE INNODB STATUS, we can find a compression thread running and trying to compress the gtid_executed table.

---TRANSACTION 4192571, ACTIVE 0 sec fetching rows mysql tables in use 1, locked 1 9 lock struct(s), heap size 1136, 1533 row lock(s), undo log entries 1525 MySQL thread id 4, OS thread handle 139671027824384, query id 0 Compressing gtid_executed table

This thread runs and takes ages to complete (or may never complete). It has been reported as #84332.

The Issue [2]

What happens if you have to stop MySQL while the thread compressing the gtid_executed table is running? In this special case, if you run the flush-logs command before or at the same time as mysqladmin shutdown, MySQL will actually stop accepting connections (all new connections hang waiting for the server) and will start to wait for the thread compressing the gtid_executed table to complete its work. Below is an example.

First, execute the flush logs command and obtain ERROR 1290:

$ mysql -h -P 5731 -u msandbox -pmsandbox -e "flush logs ;" ERROR 1290 (HY000): The MySQL server is running with the --super-read-only option so it cannot execute this statement

We’ve tried to shutdown the instance, but it hangs:

$ mysqladmin -h -P 5731 -u msandbox -pmsandbox shutdown ^CWarning; Aborted waiting on pid file: '' after 175 seconds

This bug has been reported and verified as #84597.

The Workaround

If you already have an established connection to your database with SUPER privileges, you can disable the super_read_only feature dynamically. Once that is done, the pending thread compressing the gtid_executed table completes its work and the shutdown finishes successfully. Below is an example.

We check rows in the gtid_executed table:

$ mysql -h -P 5731 -u msandbox -pmsandbox -e "select count(*) from mysql.gtid_executed ;" +----------+ | count(*) | +----------+ | 300038 | +----------+

We disable the super_read_only feature on an already established connection:

$ mysql> set global super_read_only=OFF ;

We check the rows in the gtid_executed table again, verifying that the compress thread ran successfully.

$ mysql -h -P 5731 -u msandbox -pmsandbox -e "select count(*) from mysql.gtid_executed ;" +----------+ | count(*) | +----------+ | 1 | +----------+

Now we can shutdown the instance without issues:

$ mysqladmin -h -P 5731 -u msandbox -pmsandbox shutdown

You can disable the super_read_only feature before you shutdown the instance to compress the gtid_executed table. If you ran into bug above, and don’t have any established connections to your database, the only way to shutdown the server is by issuing a kill -9 on the mysqld process.


As shown in this blog post, some of the mechanics of MySQL 5.7’s super_read_only command are not working as expected. This can prevent some administrative operations, like shutdown, from happening.

If you are using the super_read_only feature on MySQL 5.7.17 or older, including Percona Server 5.7.16 or older (which ports the mainstream implementation – unlike Percona Server 5.6, which ported Webscale’s super_read_only implementation) don’t use FLUSH LOGS.

Percona Monitoring and Management 1.1.0 Beta is Now Available

February 7, 2017 - 2:31pm

Percona announces the release of Percona Monitoring and Management 1.1.0 Beta on February 7, 2017. This is the first beta in the PMM 1.1 series with a focus on providing alternative deployment options for PMM Server:

The instructions for installing Percona Monitoring and Management 1.1.0 Beta are available in the documentation. Detailed release notes are available here.

New in PMM Server:

  • Grafana 4.1.1
  • Prometheus 1.5.0
  • Consul 0.7.3
  • Updated the MongoDB ReplSet dashboard to show the storage engine used by the instance
  • PMM-551: Fixed QAN changing query format when a time-based filter was applied to the digest

New in PMM Client:

  • PMM-530: Fixed pmm-admin to support special characters in passwords
  • Added displaying of original error message in pmm-admin config output

Known Issues:

  • Several of the MongoDB RocksDB metrics do not display correctly. This issue will be resolved in the production release.

A live demo of PMM is available at

We welcome your feedback and questions on our PMM forum.

About Percona Monitoring and Management
Percona Monitoring and Management is an open-source platform for managing and monitoring MySQL and MongoDB performance. Percona developed it in collaboration with experts in the field of managed database services, support and consulting.

PMM is a free and open-source solution that you can run in your own environment for maximum security and reliability. It provides thorough time-based analysis for MySQL and MongoDB servers to ensure that your data works as efficiently as possible.

Overview of Different MySQL Replication Solutions

February 7, 2017 - 7:04am

In this blog post, I will review some of the MySQL replication concepts that are part of the MySQL environment (and Percona Server for MySQL specifically). I will also try to clarify some of the misconceptions people have about replication.

Since I’ve been working on the Solution Engineering team, I’ve noticed that – although information is plentiful – replication is often misunderstood or incompletely understood.

So What is Replication?

Replication guarantees information gets copied and purposely populated into another environment, instead of only stored in one location (based on the transactions of the source environment).

The idea is to use secondary servers on your infrastructure for either reads or other administrative solutions. The below diagram shows an example of a MySQL replication environment.


Fine, But What Choices Do I Have in MySQL?

You actually have several different choices:

Standard asynchronous replication

Asynchronous replication means that the transaction is completed on the local environment completely, and is not influenced by the replication slaves themselves.

After completion of its changes, the master populates the binary log with the data modification or the actual statement (the difference between row-based replication or statement-based replication – more on this later). This dump thread reads the binary log and sends it to the slave IO thread. The slave places it in its own preprocessing queue (called a relay log) using its IO thread.

The slave executes each change on the slave’s database using the SQL thread.


Semi-synchronous replication

Semi-synchronous replication means that the slave and the master communicate with each other to guarantee the correct transfer of the transaction. The master only populates the binlog and continues its session if one of the slaves provides confirmation that the transaction was properly placed in one of the slave’s relay log.

Semi-synchronous replication guarantees that a transaction is correctly copied, but it does not guarantee that the commit on the slave actually takes place.

Important to note is that semi-sync replication makes sure that the master waits to continue processing transactions in a specific session until at least one of the slaves has ACKed the reception of the transaction (or reaches a timeout). This differs from asynchronous replication, as semi-sync allows for additional data integrity.

Keep in mind that semi-synchronous replication impacts performance because it needs to wait for the round trip of the actual ACK from the slave.

Group Replication

This is a new concept introduced in the MySQL Community Edition 5.7, and was GA’ed in MySQL 5.7.17. It’s a rather new plugin build for virtual synchronous replication.

Whenever a transaction is executed on a node, the plugin tries to get consensus with the other nodes before returning it completed back to the client. Although the solution is a completely different concept compared to standard MySQL replication, it is based on the generation and handling of log events using the binlog.

Below is an example architecture for Group Replication.

If Group Replication interests you, read the following blog posts:

There will be a tutorial at the Percona Live Open Source Database Conference in Santa Clara in April, 2017.

Percona XtraDB Cluster / Galera Cluster

Another solution that allows you to replicate information to other nodes is Percona XtraDB Cluster. This solution focuses on delivering consistency, and also uses a certification process to guarantee that transactions avoid conflicts and are performed correctly.

In this case, we are talking about a clustered solution. Each environment is subject to the same data, and there is communication in-between nodes to guarantee consistency.

Percona XtraDB Cluster has multiple components:

  • Percona Server for MySQL
  • Percona XtraBackup for performing snapshots of the running cluster (if recovering or adding a node).
  • wsrep patches / Galera Library

This solution is virtually synchronous, which is comparable to Group Replication. However, it also has the capability to use multi-master replication. Solutions like Percona XtraDB Cluster are a component to improve the availability of your database infrastructure.

A tutorial on Percona XtraDB Cluster will be given at the Percona Live Open Source Database Conference in Santa Clara in April 2017.

Row-Based Replication Vs. Statement-Based Replication

With statement-based replication, the SQL query itself is written to the binary log. For example, the exact same INSERT/UPDATE/DELETE statements are executed by the slave.

There are many advantages and disadvantages to this system:

  • Auditing the database is much easier as the actual statements are logged in the binary log
  • Less data is transfered over the wire
  • Non-deterministic queries can create actual havoc in the slave environment
  • There might be a performance disadvantage, with some queries using statement-based replication (INSERT based on SELECT)
  • Statement-based replication is slower due to SQL optimizing and execution

Row-based replication is the default choice since MySQL 5.7.7, and it has many advantages. The row changes are logged in the binary log, and it does not require context information. This removes the impact of non-deterministic queries.

Some additional advantages are:

  • Performance improvements with high concurrency queries containing few row changes
  • Significant data-consistency improvement

And, of course, some disadvantages:

  • Network traffic can be significantly larger if you have queries that modify a large number of rows
  • It’s more difficult to audit the changes on the database
  • Row-based replication can be slower than statement-based replication in some cases
Some Misconceptions About Replication Replication is a cluster.

Standard asynchronous replication is not a synchronous cluster. Keep in mind that standard and semi-synchronous replication do not guarantee that the environments are serving the same dataset. This is different when using Percona XtraDB Cluster, where every server actually needs to process each change. If not, the impacted node is removed from the cluster. Asynchronous replication does not have this fail safe. It still accepts reads while in an inconsistent state.

Replication sounds perfect, I can use this as a manual failover solution.

Theoretically, the environments should be comparable. However, there are many parameters influencing the efficiency and consistency of the data transfer. As long as you use asynchronous replication, there is no guarantee that the transaction correctly took place. You can circumvent this by enhancing the durability of the configuration, but this comes at a performance cost. You can verify the consistency of your master and slaves using the pt-table-checksum tool.

I have replication, so I actually don’t need backups.

Replication is a great solution for having an accessible copy of the dataset (e.g., reporting issues, read queries, generating backups). This is not a backup solution, however. Having an offsite backup provides you with the certainty that you can rebuild your environment in the case of any major disasters, user error or other reasons (remember the Bobby Tables comic). Some people use delayed slaves. However, even delayed slaves are not a replacement for proper disaster recovery procedures.

I have replication, so the environment will now load balance the transactions.

Although you’ve potentially improved the availability of your environment by having a secondary instance running with the same dataset, you still might need to point the read queries towards the slaves and the write queries to the master. You can use proxy tools, or define this functionality in your own application.

Replication will slow down my master significantly.

Replication has only minor performance impacts on your master. Peter Zaitsev has an interesting post on this here, which discusses the potential impact of slaves on the master. Keep in mind that writing to the binary log can potentially impact performance, especially if you have a lot of small transactions that are then dumped and received by multiple slaves.

There are, of course, many other parameters that might impact the performance of the actual master and slave setup.

Percona Server for MongoDB 3.4.1-1.1 Release Candidate is Now Available

February 6, 2017 - 11:41am

Percona announces the release of Percona Server for MongoDB 3.4.1-1.1rc on February 6, 2017. It is the first release candidate in the 3.4 series. Download the latest version from the Percona web site or the Percona Software Repositories.

NOTE: Release candidate packages are available from the testing repository.

Percona Server for MongoDB is an enhanced, open source, fully compatible, highly scalable, zero-maintenance downtime database supporting the MongoDB v3.4 protocol and drivers. It extends MongoDB with Percona Memory Engine and MongoRocks storage engine, as well as several enterprise-grade features:

Percona Server for MongoDB requires no changes to MongoDB applications or code.

This release candidate is based on MongoDB 3.4.1, and includes the following additional changes:

Percona Server for MongoDB 3.4.1-1.1rc release notes are available in the official documentation.

Percona Toolkit 3.0.0 Release Candidate is Now Available

February 6, 2017 - 11:37am

Percona announces the availability of Percona Toolkit 3.0.0rc-2 with new MongoDB tools on February 6, 2017. This is a release candidate.

Percona Toolkit is a collection of advanced command-line tools to perform a variety of MySQL and MongoDB server and system tasks that are too difficult or complex for DBAs to perform manually. Percona Toolkit, like all Percona software, is free and open source.

This is the first release candidate in the 3.0 series. It includes new features and bug fixes. Downloads are available from the Percona Software Testing Repositories.

New features:

Bug fixes:

  • 1402776: Updated MySQLProtocolParser to fix error when parsing tcpdump capture with pt-query-digest
  • 1632522: Fixed failure of pt-online-schema-change when altering a table with a self-referencing foreign key (Thanks, Amiel Marqeta)
  • 1654668: Fixed failure of pt-summary on Red Hat and derivatives (Thanks, Marcelo Altmann)

You can find release details in the release notes. Bugs can be reported on the Percona Toolkit launchpad bug tracker.

Percona Server for MySQL 5.7.17-11 is now available

February 3, 2017 - 12:32pm

Percona announces the GA release of Percona Server for MySQL 5.7.17-11 on February 3, 2017. Download the latest version from the Percona web site or the Percona Software Repositories.

Based on MySQL 5.7.17, including all the bug fixes in it, Percona Server for MySQL 5.7.17-11 is the current GA release in the Percona Server for MySQL 5.7 series. Percona’s provides completely open-source and free software. Find release details in the 5.7.17-11 milestone at Launchpad.

New Features:
  • Percona Server for MySQL has implemented support for per-column VARCHAR/BLOB compression for the XtraDB storage engine. This also features compression dictionary support, to improve compression ratio for relatively short individual rows, such as JSON data.
  • Kill Idle Transactions feature has been re-implemented by setting a connection socket read timeout value instead of periodically scanning the internal InnoDB transaction list. This makes the feature applicable to any transactional storage engine, such as TokuDB, and, in future, MyRocks. This re-implementation is also addressing some existing bugs, including server crashes: #1166744, #1179136, #907719, and #1369373.
Bugs Fixed:
  • Logical row counts for TokuDB tables could get inaccurate over time. Bug fixed #1651844 (#732).
  • Repeated execution of SET STATEMENT ... FOR SELECT FROM view could lead to a server crash. Bug fixed #1392375.
  • CREATE TEMPORARY TABLE would create a transaction in binary log on a read-only server. Bug fixed #1539504 (upstream #83003).
  • Using per-query variable statement with subquery temporary tables could cause a memory leak. Bug fixed #1635927.
  • Fixed new compilation warnings with GCC 6. Bugs fixed #1641612 and #1644183.
  • A server could crash if a bitmap write I/O error happens in the background log tracking thread while a FLUSH CHANGED_PAGE_BITMAPS is executing concurrently. Bug fixed #1651656.
  • TokuDB was using wrong function to calculate free space in data files. Bug fixed #1656022 (#1033).
  • CONCURRENT_CONNECTIONS column in the USER_STATISTICS table was showing incorrect values. Bug fixed #728082.
  • Audit Log Plugin when set to JSON format was not escaping characters properly. Bug fixed #1548745.
  • InnoDB index dives did not detect some of the concurrent tree changes, which could return bogus estimates. Bug fixed #1625151 (upstream #84366).
  • INFORMATION_SCHEMA.INNODB_CHANGED_PAGES queries would needlessly read potentially incomplete bitmap data past the needed LSN range. Bug fixed #1625466.
  • Percona Server cmake compiler would always attempt to build RocksDB even if -DWITHOUT_ROCKSDB=1 argument was specified. Bug fixed #1638455.
  • Lack of free pages in the buffer pool is not diagnosed with innodb_empty_free_list_algorithm set to backoff (which is the default). Bug fixed #1657026.
  • mysqld_safe now limits the use of rm and chown to avoid privilege escalation. chown can now be used only for /var/log directory. Bug fixed #1660265. Thanks to Dawid Golunski (
  • Renaming a TokuDB table to a non-existent database with tokudb_dir_per_db enabled would lead to a server crash. Bug fixed #1030.
  • Read Free Replication optimization could not be used for TokuDB partition tables. Bug fixed #1012.

Other bugs fixed: #1486747, #1617715, #1633988, #1638198 (upstream #82823), #1642230, #1646384, #1640810, #1647530, #1651121, #1658843, #1156772, #1644583, #1648389, #1648737, #1650256, and #1647723.

The release notes for Percona Server for MySQL 5.7.17-11 are available in the online documentation. Please report any bugs on the launchpad bug tracker.

Percona Live Featured Tutorial with Derek Downey, David Turner and René Cannaò — ProxySQL Tutorial

February 3, 2017 - 9:22am

Welcome to another post in the series of Percona Live featured tutorial speakers blogs! In these blogs, we’ll highlight some of the tutorial speakers that will be at this year’s Percona Live conference. We’ll also discuss how these tutorials can help you improve your database environment. Make sure to read to the end to get a special Percona Live 2017 registration bonus!

In this Percona Live featured tutorial, we’ll meet Derek Downey (OSDB Practice Advocate, Pythian), David Turner (Storage SRE, Uber) and René Cannaò (MySQL SRE, Dropbox / ProxySQL). Their session is ProxySQL Tutorial. There is a stigma attached to database proxies when it comes to MySQL. This tutorial hopes to blow away that stigma by showing you what can be done with a proxy designed from the ground up to perform. I had a chance to speak with Derek, David and René and learn a bit more about ProxySQL:

Percona: How did you get into database technology? What do you love about it?

Derek Downey

Derek: I took a relational database course in college based on Oracle. Set theory and the relational model made a lot of sense to me. After a few years as a web developer at a small company, I transitioned to a hybrid SysAdmin/DBA role and got my first taste of the potential of “the cloud” (and some of the drawbacks).

I really came to understand that data is the lifeblood of any organization, and making sure it is always available through any disaster – from human error to hurricanes – is a unique and exciting challenge.

You should never notice the DBA if they’re doing their job right. There’s not much praise for a DBA on a job well done. But it’s a vital position to keep a company running. And that suits me just fine.

David: I started working for the Advanced Projects Group at the University of Missouri, now known as MOREnet. They were largely responsible for connecting all of the libraries and schools in the state to the Internet. I was initially helping them with their Internet presence as a webmaster. Later they needed help with their databases. I got very excited about working with Oracle at the time, and decided to join that team.

My relationship with MySQL started primarily because the cost of sharding Oracle was so high. Additionally, MySQL’s replication allowed us to use slaves. Oracle’s Dataguard/standby options wouldn’t allow reads from the slaves at that time. Lastly, MySQL was sort of “wild west” fun, since it lacked so many other features that Oracle had long ago. You had to get creative. It has been humbling to see how much innovation has come from the community and how far MySQL has come. And this is only the beginning!

René Cannaò

René: My career followed the classic path of a system administrator that ends up becoming a DBA. I used to work for a few companies as webmaster,  and finally as SysAdmin for a web hosting company. I always saw a similar pattern: “the bottleneck is in the database.” Nobody ever knew why the database was the bottleneck. I volunteered to improve the performance of this “unknown system.” Learning was a fun experience, and the result was extremely rewarding. I love understanding how databases operate and their internals. This is the only way to be able to get the maximum performance: “scientia potentia est”!

Percona: Your tutorial is called “ProxySQL Tutorial.” What exactly is ProxySQL, and what does it do?

Derek: I’ll leave it to René, the creator of ProxySQL, to give more detail on exactly what it is. But for a DBA trying to ensure their data is always available, it is a new and exciting tool in our toolbox.

René: ProxySQL is the MySQL data gateway. It’s the Stargate that can inspect, control, transform, manage, and route all traffic between clients and database servers. It builds reliable and fault-tolerant networks. It is a software bridge that empowers MySQL DBAs, built by DBAs for DBAs, allowing them to control all MySQL traffic where previously such traffic could not be controlled either on the client side (normally the developers’ realm) or server side (where there are not enough tools).

David Turner

David: Architecturally, ProxySQL is a separate process between the client and the database. Because traffic passes through it, ProxySQL can become many things (three of which got my attention). It can be a multiplexer, a filter, and a replicator.

Multiplexers reduce many signals down to a few. Web servers often open many static connections to MySQL. Since MySQL can only support a limited number of connections before performance suffers, ProxySQL’s ability to transparently manage tens of thousands of connections while only opening a few to the database is a great feature.

Administrators can update ProxySQL to filter and even rewrite queries based on patterns they decide on. As someone that has worked in operations and seen how long it can take to disable misbehaving applications, this is a very compelling feature. With ProxySQL in place, I can completely block a query from the database in no time.

ProxySQL’s replication or mirroring capability means that all of the queries sent to one database can now be sent to N databases. As someone that has to roll out new versions of MySQL, test index changes, and benchmark hardware this is also a compelling feature.

Percona: What are the advantages of using ProxySQL in a database environment?

René: ProxySQL is the bridge between the clients and the servers. It creates two layers, and controls all the communication between the two. Sitting in the middle, ProxySQL provides a lot of advantages normally impossible to achieve in a standard database environment, such as throttling or blocking queries, rewriting queries, implementing sharding, read/write splitting, caching, duplicating traffic, handling backend failures, failovers, integration with HA solutions, generating real-time statistics, etc. All this, without any application change, and completely transparent to the application.

Derek: For me, ProxySQL decouples the application from the data layer. This provides more control over the backend database environment to the DBA in regards to queries, maintenance and failovers, without impact to the application.

David: In addition to the roles noted above, ProxySQL can be part of a failover solution, routing queries to a new master when the current master fails. Other advantages are splitting queries over multiple databases to distribute the load, provide additional metrics, etc.

Percona: What do you want attendees to take away from your tutorial session? Why should they attend?

Derek: This tutorial highlights what ProxySQL is trying to achieve, and discusses how to add ProxySQL to common architectures environments. Attendees will get hands-on experience with the technology, and learn how to install and configure ProxySQL to achieve query rewriting, seamless failover, and query mirroring.

David: Not only hands-on experience with ProxySQL, but an understanding of how much they can leverage with it. The more I use ProxySQL, the more advantages I see to it. For example, I did not realize that by clustering ProxySQL processes I can distribute the query matching and rewrites over many hosts, as well as use them as a caching layer.

René: ProxySQL is built upon very innovative technologies. Certain architectural concepts like hostgroups, chaining of query rules, granular routing and sharding, query retries and the very powerful Admin interface are concepts not always intuitive for DBAs with experience using other proxies. This tutorial helps understand these concepts, and attendees get hand-on experience in the configuration of ProxySQL in various scenarios.

Percona: What are you most looking forward to at Percona Live?

David: First, the people. Next, everything everyone is working on. We’re really lucky to work in such an innovative and collaborative industry. As databases evolve, we are on the ground floor of their evolution. What an exciting place to be.

Derek: I am mostly looking forward to reconnecting with my peers in the MySQL community. Both ones I’ve formerly worked with or previously met at Percona Live, as well as meeting new open source database professionals and hearing how they are providing solutions for their companies.

René: I am looking forward to attending sessions regarding new features in MySQL 8 and other new technologies. But moreover, I am excited to interact with MySQL users and get more input on how to improve ProxySQL so that it can become an indispensable tool in any MySQL environment.

Register for Percona Live Data Performance Conference 2017, and see Derek, David and René present their ProxySQL Tutorial. Use the code FeaturedTutorial and receive $30 off the current registration price!

Percona Live Data Performance Conference 2017 is the premier open source event for the data performance ecosystem. It is the place to be for the open source community as well as businesses that thrive in the MySQL, NoSQL, cloud, big data and Internet of Things (IoT) marketplaces. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Data Performance Conference will be April 24-27, 2017 at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.

PMM Alerting with Grafana: Working with Templated Dashboards

February 2, 2017 - 3:54pm

In this blog post, we will look into more intricate details of PMM alerting. More specifically, we’ll look at how to set up alerting based on templated dashboards.

Percona Monitoring and Management (PMM) 1.0.7 includes Grafana 4.0, which comes with the Alerting feature. Barrett Chambers shared how to enable alerting in general. This blog post looks at the specifics of setting up alerting based on the templated dashboards. Grafana 4.0 does not support basic alerting out-of-the-box.

This means if I try to set up an alert on the number of MySQL threads running, I get the error “Template variables are not supported in alert queries.”

What is the solution?

Until Grafana provides a better option, you need to do alerting based on graphs (which don’t use templating). This is how to do it.

Click on “Create New” in the Dashboards list to create a basic dashboard for your alerts:

Click on “Add Panel” and select “Graph”:

Click on the panel title of the related panel on the menu sign, and then click on “Panel JSON”.

This shows you the JSON of the panel, which will look like something like this:

Now you need to go back to the other browser window, and the dashboard with the graph you want to alert on. Show the JSON panel for it. In our case, we go to “MySQL Overview” and show the JSON for “MySQL Active Threads” panel.

Copy the JSON from the “MySQL Active Threads” panel and paste it into the new panel in the dashboard created for alerting.

Once we have done the copy/paste, click on the green Update button, and we’ll see the broken panel:

It’s broken because we’re using templating variables in dashboard expressions. None of them are set up in this dashboard. Expressions won’t work. We must replace the template variables in the formulas with actual hosts, instances, mount points, etc., for we want to alert on:

We need to change $host to the name of the host we want to alert on, and the $interval should align with the data capture interval (here we’ll set it to 5 seconds):

If correctly set up, you should see the graph showing the data.

Finally, we can go to edit the graph. Click on the “Alert” and “Create Alert”.

Specify Evaluate Every to create an alert. This sets up the evaluation interval for the alert rule. Obviously, the more often the alert evaluates the condition, the more quickly you get alerted if something goes wrong (as well as alert conditions).

In our case, we want to get an alert if the number of running threads are sustained at a high rate. To do this, look at the minimum number of threads for last minute to be above 30:

Note that our query has two parameters: “A” is the number of threads connected, and “B” is the number of threads running. We’re choosing to Alert on “B”. 

The beautiful thing Grafana does is show the alert threshold clearly on the graph, and allows you to edit the alert just by moving this alert line with a mouse:

You may want to click on the floppy drive at the top to save dashboard (giving it whatever identifying name you want).

At this point, you should see the alert working. A little heart sign appears by the graph title, colored green (indicating it is not active) or red (indicating it is active). Additionally, you will see the red and green vertical lines in the alert history. These show when this alert gets triggered and when the system went back to normal.

You probably want to set up notifications as well as see alerts on the graphs. 

To set up notifications, go to the Grafana Configuration menu and configure Alerting. There are Grafana Support Email, Slack, Pagerduty and general Webhook notification options (with more on the way, I’m sure).

The same way you added the “Graph” panel to set up an alert, you can add the “Alert List” panel to see all the alerts you have set up (and their status):


As you can see, it is possible to set up alerts in PMM using the new Grafana 4.0 alerting feature. It is not very convenient or easy to do. This is first alerting support release for Grafana and PMM. As such, I’m sure it will become much easier and more convenient over time.

Vote Percona in Members Choice Awards 2016

February 2, 2017 - 12:40pm

Percona is calling on you! Vote for Percona for Database of the Year in Members Choice Awards 2016. Help Percona get recognized as one of the best database options for data performance. Percona provides free, fully compatible, enhanced, open source drop-in replacement database software with superior performance, scalability and instrumentation., or LQ for short, is a community-driven, self-help website for Linux users. Each year, holds an annual competition to recognize the year’s best-in-breed technologies. The online Linux community determines the winners of each category!

You can vote now for your favorite database of 2016 (Percona, of course!). This is your chance to be heard!

Voting ends on February 7, 2017. You must be a registered member of with at least one post on their forums to vote.

Percona Server for MySQL 5.5.54-38.6 is now available

February 1, 2017 - 12:46pm

Percona announces the release of Percona Server for MySQL 5.5.54-38.6 on February 1, 2017. Based on MySQL 5.5.54, including all the bug fixes in it, Percona Server for MySQL 5.5.54-38.6 is now the current stable release in the 5.5 series.

Percona Server for MySQL is open-source and free. You can find release details in the 5.5.54-38.6 milestone on Launchpad. Downloads are available here and from the Percona Software Repositories.

Bugs Fixed:
  • Fixed new compilation warnings with GCC 6. Bugs fixed #1641612 and #1644183.
  • CONCURRENT_CONNECTIONS column in the USER_STATISTICS table was showing incorrect values. Bug fixed #728082.
  • Audit Log Plugin when set to JSON format was not escaping characters properly. Bug fixed #1548745.
  • mysqld_safe now limits the use of rm and chown to avoid privilege escalation. chown can now be used only for /var/log directory. Bug fixed #1660265.

Other bugs fixed: #1638897, #1644174, #1644547, and #1644558.

Find the release notes for Percona Server for MySQL 5.5.54-38.6 in our online documentation. Report bugs on the launchpad bug tracker.

Docker Security Vulnerability CVE-2016-9962

January 31, 2017 - 9:39am

Docker 1.12.6 was released to address CVE-2016-9962. CVE-2016-9962 is a serious vulnerability with RunC.

Quoting the coreos page (linked above):

“RunC allowed additional container processes via runc exec to be ptraced by the pid 1 of the container. This allows the main processes of the container, if running as root, to gain access to file-descriptors of these new processes during the initialization and can lead to container escapes or modification of runC state before the process is fully placed inside the container.”

In short, IF processes run as root inside a container they could potentially break out of the container and gain access over the host.

My recommendation at this time is to apply the same basic security tenants for containers as you would (I hope) for VM and baremetal installs. In other words, ensure you are adhering to a Path of Least Privilege as a best practice and not running as root for conevience’s sake.

Prior to this, we made changes to PMM prior to version 1.0.4 to reduce the number of processes within the container that ran as root. As such, only the processes required to do so run as root. All other processes run as a lower privilege user.

Check here for documentation on PMM, and use the JIRA project to raise bugs (JIRA requires registration).

To comment on running a database within docker, I’ve reviewed the following images

  • percona-server image: I have verified it does not run as root, and runs as a mysql user (for 5.7.16 at least)
  • percona-server-mongodb: I have worked with our teams internally and can confirm that the latest image no longer runs as root (you will to run the latest image, however, to see this change via docker pull)

Please comment below with any questions.

MySQL Sharding Models for SaaS Applications

January 30, 2017 - 3:16pm

In this blog post, I’ll discuss MySQL sharding models, and how they apply to SaaS application environments.

MySQL is one of the most popular database technologies used to build many modern SaaS applications, ranging from simple productivity tools to business-critical applications for the financial and healthcare industries.

Pretty much any large scale SaaS application powered by MySQL uses sharding to scale. In this blog post, we will discuss sharding choices as they apply to these kinds of applications.

In MySQL, unlike in some more modern technologies such as MongoDB, there is no standard sharding implementation that the vast majority of applications use. In fact, if anything “no standard” is the standard. The common practice is to roll your own sharding framework, as famous MySQL deployments such as Facebook and Twitter have done. MySQL Cluster – the MySQL software that has built-in Automatic Sharding functionality – is rarely deployed (for a variety of reasons). MySQL Fabric, which has been the official sharding framework, has no traction either.

When sharding today, you have a choice of rolling your own system from scratch, using comprehensive sharding platform such as Vitess or using a proxy solution to assist you with sharding. For proxy solutions, MySQL Router is the official solution. But in reality, third party solutions such as open source ProxySQL, commercial ScaleArc and semi-commercial (BSL)  MariaDB MaxScale are widely used. Keep in mind, however, that traffic routing is only one of the problems that exist in large scale sharding implementations.

Beneath all these “front end” choices for sharding on the application database connection framework or database proxy, there are some lower level decisions that you’ve got to make. Namely, around how your data is going to be led out and organized on the MySQL nodes.

When it comes to SaaS applications, at least one answer is simple. It typically makes sense to shard your data by “customer” or “organization” using some sort of mapping tables. In the vast majority of cases, single node (or replicated cluster) should be powerful enough to handle all the data and load coming from each customer.

What Should I Ask Myself Now?

The next set questions you should ask yourself are around your SaaS applications:

  • How much revenue per customer are you generating?
  • Do your customers (or regulations) require data segregation?
  • Are all the customers about the same, or are there outliers?
  • Are all your customers running the same database schema?

I address the answers in the sections below.

How Much Revenue?

How much revenue per customer you’re generating is an important number. It defines how much infrastructure costs per customer you can afford. In the case of “freemium” models, and customers generating less than $1 a month an average, you might need to ensure low overhead per customer (even if you have to compromise on customer isolation).

How much revenue per customer you’re generating is an important number. It defines how much infrastructure costs per customer you can afford. In the case of “freemium” models, and customers generating less than $1 a month an average, you might need to ensure low overhead per customer (even if you have to compromise on customer isolation).

Typically with low revenue customers, you have to co-locate the data inside the same MySQL instance (potentially even same tables). In the case of high revenue customers, isolation in separate MySQL instances (or even containers or virtualized OS instances) might be possible.

Data Segregation?

Isolation is another important area of consideration. Some enterprise customers might require that their data is physically separate from others. There could also be government regulations in play that require customer data to be stored in a specific physical location. If this is the case, you’re looking at completely dedicated customer environments. Or at the very least, separate database instances (which come with additional costs).

Customer Types?

Customer size and requirements are also important. A system designed to handle all customers of approximately the same scale (for example, personal accounting) is going to be different than if you are in the business of blog hosting. Some blogs might be 10,000 times more popular than the average.

Same Database Schema?

Finally, there is a there is the big question of whether all your customers are running the same database schema and same software version. If you want to support different software versions (if your customers require a negotiated maintenance window for software upgrades, for example) or different database schemas (if the schema is dependent on the custom functionality and modules customers might use, for example), keeping such customers in different MySQL schemas make sense.

Sharding Models

This gets us to the following sharding isolation models, ranging from lowest to highest:

  • Customers Share Schemas. This is the best choice when you have very large numbers of low-revenue customers. In this case, you would map multiple customers to the same set of tables, and include something like a customer_id field in them to filter customer data. This approach minimizes customer overhead and reduces customer isolation. It’s harder to backup/restore data for individual customers, and it is easier to introduce coding mistakes that can access other customers data. This method does not mean there is only one schema, but that there is a one-to-many relationship between schemas and customers.  For example, you might have 100 schema’s per MySQL instance, each handling 1000 to 10000 customers (depending on the application). Note that with a well-designed sharding implementation, you should be able to map customers individually to schemas. This allows you to have key customer data stored in dedicated schemas, or even on dedicated nodes.
  • Schema per Customer. This is probably the most common sharding approach in MySQL powered SaaS applications. Especially ones that have substantial revenue ($10+ per month / per customer). In this model, each customer’s data is stored in its own schema (database). This makes it very easy to backup/restore individual customers. It allows customers to have different schemas (i.e., add custom tables). It also allows them to run different versions of the application if desired. This approach allows the application server to use different MySQL users connecting on behalf of different customers, which adds an extra level of protection from accidental (or intentional) access of data that belongs to different customers. The schema per customer approach also makes it easier to move the shards around, and limits maintenance impact. The downside of this approach is higher overhead. It also results in a large number of tables per instance, and potentially larger numbers of files (which can be hard to manage).
  • Database Instance per Customer. You achieve even better isolation by having a MySQL instance per customer. This approach, however, increases overhead even further. The recent rise of light virtualization technologies and containers has reduced its usage.
  • OS Instance/Container per Customer. This approach allows you to improve isolation even further. It can be used for any customer, but can also be applied to selected customers in a model that uses Schema per Customer model for a majority of them.  Dedicated OS Instance, with improved isolation and better performance SLAs, might be a feature of some premium customer tiers. This method not only allows better isolation, but it also let’s you handle outliers better. You might chose to run a majority of your customers on the hardware (or cloud instance) that has best price/performance numbers, and also place some of the larger customers on the highest performance nodes.
  • Environment per customer. Finally, if you take this all the way you can build completely separate environments for customers. This includes databases, application servers and other required components. This is especially useful if you need to deploy the application close to the customer – which includes the appliance model, or deployment in the customer’s data center or cloud provider. This also allows you to accommodate customers if their data must be stored in a specific location. This is often due to government regulations. It is worth noting that many SaaS applications, even if they do not quite have one environment per customer, have multiple independent environments. These are often hosted in different locations or availability zones. Such setups allow you to reduce the impact of large-scale failures to only a portion of your customers. This avoids overloading your customer service group and allowing the operational organization to focus on repairing smaller environments.

The farther you go down this route – from the shared schema to an environment per customer – the more important is to have a high level of automation. With a shared schema, you often can get by with little automation (and some environments manually set up) and all the schema’s pre-created. If customer sign up requires setting up dedicated database instance or the whole environment, manual implementation doesn’t scale. For this type of setup, you need state-of-the-art automation and orchestration.


I hope this helps you to understand your options for MySQL sharding models. Each of the different sharding models for SaaS applications powered by MySQL have benefits and drawbacks. As you can see, many of these approaches require you to work with a large number of tables in the MySQL – this will be the topic of one of my next posts!

MariaDB ColumnStore

January 30, 2017 - 9:29am

Last month, MariaDB officially released MariaDB ColumnStore, their column store engine for MySQL. This post discusses what it is (and isn’t), why it matters and how you can approach a test of it.

What is ColumnStore?

ColumnStore is a storage engine that turns traditional MySQL storage concepts on their head. Instead of storing the data by row, a column store stores the data by column (obviously). This provides advantages for certain types of data, and certain types of queries run against that data. See my previous post for more details on column-based storage systems.

ColumnStore is a fork of InfiniDB and carries forward many of the concepts behind that product. InfiniDB ceased operations in 2014. With the front end managed through MariaDB, you get access to all of the expected security and audit options of MariaDB. MariaDB designed ColumnStore as a massively parallel database, working best in an environment with multiple servers. This is somewhat different than a traditional row store database.

ColumnStore stores columnar data in a concept called an “extent.” An extent contains a range of values for a single column. Each extent contains no more than 8 million values. It stores additional values in a new extent. The extents for a single column get distributed across the database nodes, known as “Performance Modules” in ColumnStore. It stores each unique extent on more than one node, thus providing data redundancy and removing the need for replication. If a node is down, and it contains an extent needed for a query, that same extent is found on another node in the environment. This data redundancy also provides a high availability environment.

The query engine determines which extents process query requests. Since the data in an extent is often preordered (time series data, for example), many queries can ignore individual extents since they cannot contain any data needed for the query. If we are only looking for data from February 2017, for example, extents containing data outside of that range get ignored. However, if a query requires data from many or all extents on a single column, the query takes much longer to complete.

Unlike some traditional column store vendors, that take an all or nothing approach to storage, MariaDB decided to go with a mixed concept. In a MariaDB MySQL database, you can mix traditional InnoDB storage with the new ColumnStore storage, just like you used to mix InnoDB and MyISAM. This presents some nice options, not the least of which is that it provides a way to “dip your toe” into the world of column stores. On the other hand, it could lead to people using the ColumnStore engine in non-optimal ways. Also, the differences in what is considered optimal architecture between these two storage options make it hard to see how this plays out in the real world.

Data Definition

As discussed in the earlier post, column storage works great for specific types of data and queries. It is important that your data definitions are as tight as possible, and that your queries are appropriate for column-based data.

Many people set their field definition as VARCHAR(256) when setting up a new database. They might not know what type of data gets stored in the new field. This broad definition allows you to store whatever you throw at the database. The negative effect for row store is that it can cause over-allocation of storage – but it only has a minimal effect on queries.

In a column store, the field definition can drive decisions about the compression methods for storing the data, along with sorting implications. Columnar data can use storage more efficiently than a row store, since the data for a single column is well-defined. This leads to selecting the best compression algorithm for the data. If that data is poorly defined, the selected compression algorithm might not be the best for the data.

Sorting is also a problem in a column store when the data types are not well-defined. We’ve all seen integer or date data that is sorted alphabetically. While it can be annoying, we can still adjust to that sorting method to find what we need. Since a column store is often used to perform analytical queries over a range of data, this poorly-sorted data can present a bigger problem. If you specify a column to be VARCHAR and only include date information, that data is sorted alphabetically. The same column defined as DATE causes the data to be sorted by date. This chart shows the difference (date format is mm/dd/yy)

Alphabetic Sort Date Sort 01/01/17 01/01/17 01/11/17 01/02/17 01/02/17 01/04/17 01/21/17 01/11/17 01/4/17 01/21/17 11/01/17 02/01/17 11/02/17 11/01/17 02/01/17 11/02/17


Imagine running a query over a range of dates (requesting all activity in the months of January and February 2017, for example). In the alphabetic sort, this requires working through the whole file, since the data for November shows up between the data for January and February. In the date sort, the query only reads the until the end of February. We know there can be no more matching data after that. The alphabetic sort leads to more I/O, more query time and less happiness on the part of the user.

Why Should You Care About ColumnStore?

The first reason is that it allows you to try out column storage without doing a massive shift in technology and with minimal effort. By setting up some tables in a MariaDB database to use the ColumnStore engine, you can benefit from the storage efficiencies and faster query capabilities, provided that the data you’ve selected for this purpose is sound. This means that the data definitions should be tight (always a good plan anyway), and the queries should be more analytical than transactional. For a purely transactional workflow, a row store is the logical choice. For a purely analytical workflow, a column store is the logical choice. ColumnStore allows you to easily mix the two storage options so that you can have the best match possible. It is still important to know what type of workflow you’re dealing with, and match the storage engine to that need.

Another solid reason is that it is a great fit if you are already doing analysis over massive amounts of data. Any column store shines when asked to look at relatively few columns of data (ideally the column or two that are being aggregated and other columns to locate and group the data). If you are already running these types of queries in MySQL, ColumnStore would likely be a good fit.

But There Be Dragons!

As with any new technology, ColumnStore might not be a good fit for everyone. Given that you can mix and match your storage engines, with ColumnStore for some tables and InnoDB for others, it can be tempting to just go ahead with a ColumnStore test doing things the same way you always did in the past. While this still yields results, those results might not be a true test of the technology. It’s like trying to drive your minivan the same way you used to drive your sports car. “Hey, my Alfa Romeo never flipped over taking these turns at high speed!”

To effectively use ColumnStore, it’s important to match it to a proper analytical workload. This means that you will likely do more bulk loading into these tables, since there is additional overhead in writing the data out into the column files. The overall workflow should be more read-intensive. The queries should only look for data from a small set of fields, but can span massive amounts of data within a single column. In my earlier post, there’s also a discussion about normalization of data and how denormalizing data is more common in columnar databases.

You should address these issues in your testing for a valid conclusion.

The minimum specifications for ColumnStore also point to a need for a more advanced infrastructure than is often seen for transactional data. This is to support batch loading, read intensive workloads and possibly different ETL processes for each type of data. In fact, MariaDB states in the installation documentation for ColumnStore that it must be completed as a new installation of MariaDB. You must remove any existing installations of MariaDB or MySQL before installing the ColumnStore-enabled RPM on a system.

Is It Right for Me?

ColumnStore might fit well into your organization. But like haggis, it’s not for everyone. If you need analytical queries, it is a great option. If your workload is more read-intensive, it could still work for you. As we move to a more Internet of Things (IoT) world, we’re likely to see a need for more of this type of query work. In order to accurately present each user with the best possible Internet experience, we might want to analyze their activities over spans of time and come up with the best match for future needs.

Seriously consider if making the move to ColumnStore is right for you. It is newer software (version 1.0.6, the first GA version, was released on December 14, 2016, and 1.0.7 was released on January 23, 2017), so it might go through changes as it matures. Though a new product, it is based on InfiniDB concepts (which are somewhat dated). MariaDB has lots of plans for additional integrations and support for ancillary products that are absent in the current release.

MariaDB took a huge step forward with ColumnStore. But do yourself a favor and consider whether it is right for you before testing it out. Also, make sure that you are not trying to force your current workflow into the column store box. Kids know that we cannot put a square peg in a round hole, but we adults often try to do just that. Finding the right peg saves you lots of time, hassle and annoyance.

Visit Percona Store

General Inquiries

For general inquiries, please send us your question and someone will contact you.