Percona Server 5.5.39-36.0 is now available

Latest MySQL Performance Blog posts - August 29, 2014 - 7:48am

Percona is glad to announce the release of Percona Server 5.5.39-36.0 on August 29, 2014 (downloads are available here and from the Percona Software Repositories). Based on MySQL 5.5.39, including all of its bug fixes, Percona Server 5.5.39-36.0 is now the current stable release in the 5.5 series. All of Percona's software is open source and free; all the details of the release can be found in the 5.5.39-36.0 milestone at Launchpad.

New Features:

Bugs Fixed:

  • Querying INNODB_CHANGED_PAGES with a range condition START_LSN > x AND END_LSN < y would lead to a server crash if the range was empty, i.e. with x greater than y. Bug fixed #1202252 (Jan Lindström and Sergei Petrunia).
  • With the XtraDB changed page tracking feature enabled, queries against INNODB_CHANGED_PAGES could read bitmap data whose write was still in progress. This would cause the query to fail with an ER_CANT_FIND_SYSTEM_REC error and a warning printed to the server error log. The workaround has been to add an appropriate END_LSN-limiting condition to the query (see the example query after this list). Bug fixed #1346122.
  • mysqld-debug was missing from Debian packages. This regression was introduced in Percona Server 5.5.36-34.0. Bug fixed #1290087.
  • Fixed a memory leak in Slow Query Log Rotation and Expiration. Bug fixed #1314138.
  • The audit log plugin would write logs with XML syntax errors when the OLD and NEW formats were used. Bug fixed #1320879.
  • A server built with system OpenSSL support, such as the distributed Percona Server binaries, had SSL-related memory leaks. Bug fixed #1334743 (upstream #73126).
  • If the bitmap directory contained a bitmap file sequence in which the start LSN of one file was less than the start LSN of the previous file, a debug build would assert when queries were run against the INNODB_CHANGED_PAGES table. Bug fixed #1342494.
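
For reference, the kind of LSN-bounded query being discussed looks roughly like the following; the LSN values are hypothetical and only illustrate the START_LSN/END_LSN conditions mentioned above:

mysql> SELECT space_id, page_id, start_lsn, end_lsn FROM INFORMATION_SCHEMA.INNODB_CHANGED_PAGES WHERE start_lsn > 1000000 AND end_lsn < 2000000;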

Other bugs fixed: #1337324, #1151723, #1182050, #1182072, #1280875, #1182046, #1328482 (upstream #73418), and #1334317 (upstream #73111).

Release notes for Percona Server 5.5.39-36.0 are available in our online documentation. Bugs can be reported on the Launchpad bug tracker.

The post Percona Server 5.5.39-36.0 is now available appeared first on MySQL Performance Blog.

OpenStack Trove Day 2014 Recap: MySQL and DBaaS

Latest MySQL Performance Blog posts - August 28, 2014 - 1:00am

OpenStack Trove Day

I just returned from a week in Cambridge, Massachusetts where I was attending the OpenStack Trove Day and the Trove mid-cycle meetup, both sponsored by the great folks at Tesora.

I am relatively new to the OpenStack and Trove arenas so this was a fantastic opportunity for me to learn more about the communities, the various components within OpenStack, and what part Trove plays. I found the entire event very worthwhile – I met a lot of key people in the community, learned more about Trove and its potential, and in general felt a great energy and excitement surrounding Trove and OpenStack as a whole.

There were more than 120 attendees at Trove Day; that is almost four times the initial estimate! I think I would call that a success. There were seven high-quality talks that covered material ranging from new and upcoming features in Trove, to deep inspection of how it is currently used at several big-name companies, to an investor's perspective on the OpenStack market. There were also two panel-style discussions that covered a lot of ground, with all participants being 'guys on the ground' actively working with OpenStack deployments, including one of my fellow Perconians, Mr. Tim Sharp.

One of the main takeaways for me from the entire day was the forward-looking adoption estimates for Trove. This came up over and over through the various talks and panels. There seems to be a tremendous amount of interest in Trove deployments for late 2014/2015 but very few actual live users today. There also seems to be a bit of a messaging issue and confusion among potential users as to what Trove really is and is not. Simply reading the Trove mission statement should quickly clarify:

The OpenStack Open Source Database as a Service Mission: To provide scalable and reliable Cloud Database as a Service provisioning functionality for both relational and non-relational database engines, and to continue to improve its fully-featured and extensible open source framework.

So allow me to expand on that a bit based on some specific comments or questions that I overheard:
- Trove is NOT a database abstraction layer nor any sort of database unification tool; all applications still communicate with their respective datastores directly through their native APIs.
- Trove is NOT a database monitoring, management or analysis tool; all of your favorite debugging and monitoring tools like Percona Toolkit will still work exactly as advertised, and yes, you do need a monitoring tool.
- Although Trove does have some useful backup scheduling options, Trove is NOT a complete backup and recovery tool that can accommodate every backup strategy; you may still use third-party options, such as scripting your own around Percona XtraBackup, or make your life a lot easier and sign up for the Percona Backup Service.
- Trove IS a very nice way to add resource provisioning for many disparate datastores and has some ‘smarts’ built in for each. This ensures a common user experience when provisioning and managing datastore instances.

To that final point, our friends at Tesora introduced their new Database Certification Program at Trove Day. This new program will ensure a high level of compatibility between the various participating database vendors and the Trove project. Of course, Percona Server has already been certified.

I see the future of Trove as being very bright with a huge potential for expansion into other areas, once it is stabilized. I am very excited to begin contributing to this project and watch it grow.

Until next time…

The post OpenStack Trove Day 2014 Recap: MySQL and DBaaS appeared first on MySQL Performance Blog.

Trawling the binlog with FlexCDC and new FlexCDC plugins for MySQL

Latest MySQL Performance Blog posts - August 27, 2014 - 8:15am

Swanhart-Tools includes FlexCDC, a change data capture tool for MySQL. FlexCDC follows a server’s binary log and usually writes “changelogs” that track the changes to tables in the database. I say usually because the latest version of Swanhart-Tools (only on GitHub for now) supports FlexCDC plugins, which allow you to send the updates to a remote data source, or to any other place of your liking. You can find out more about FlexCDC basics in a previous blog post.

Please note that FlexCDC still needs to have source and destination instances defined in the configuration, even when using plugins. This is because the FlexCDC state (how far into which binary log FlexCDC has progressed, and what tables are being captured) is stored in the “dest”. Normally, when using a plugin, the source and destination instances will be the same. FlexCDC will create a ‘flexviews’ database with a number of state tables in the destination instance. This also means you still have to use the create_mvlog.php script or Flexviews’ create_mvlog(…) procedure to mark which tables to capture! See the previous blog post about FlexCDC.

When you create the mvlog, a changelog table will still be created in the dest, just like when not using a plugin. This is because the INFORMATION_SCHEMA is used to get column datatypes and additional information (such as whether an int is signed or unsigned), and this lookup is done against the table in the dest. The reason this is needed is that mysqlbinlog, the utility used to scrape the binlog, produces ambiguous output for large signed integers (it provides both the signed and unsigned version), so FlexCDC must figure out the right one to choose from the actual DDL of the changelog table. FlexCDC can’t look at the DDL of the source table, though, because the consumer may be behind, and the current structure may not match the structure of the rows in the log.

The new plugin system allows you to do a lot of nifty things like:

  • Replicate to external databases
  • Publish changes to a message queue (this is like Facebook’s Wormhole)
  • Keep a remote cache server in sync
  • and more…

The latest version of Swanhart-Tools includes an Example plugin (in flexviews/consumer/include/example_plugin.php) that simply prints the events that come through it, not logging them into the changelog table at all. There is an example of the output at the end of the post.

The example plugin looks like this:

<?php
class FlexCDC_Plugin {
    static function begin_trx($uow_id, $gsn) {
        echo "START TRANSACTION: trx_id: $uow_id, Prev GSN: $gsn";
    }
    static function commit_trx($uow_id, $gsn) {
        echo "COMMIT: trx_id: $uow_id, Last GSN: $gsn";
    }
    static function rollback_trx($uow_id) {
        echo "ROLLBACK: trx_id: $uow_id";
    }
    static function insert($row, $db, $table, $trx_id, $gsn) {
        echo "TRX_ID: $trx_id, Schema:$db, Table: $table, DML: INSERT, AT: $gsn";
        print_r($row);
    }
    static function delete($row, $db, $table, $trx_id, $gsn) {
        echo "TRX_ID: $trx_id, Schema:$db, Table: $table, DML: DELETE, AT: $gsn";
        print_r($row);
    }
    static function update_before($row, $db, $table, $trx_id, $gsn) {
        echo "TRX_ID: $trx_id, Schema:$db, Table: $table, DML: UPDATE (OLD), AT: $gsn";
        print_r($row);
    }
    static function update_after($row, $db, $table, $trx_id, $gsn) {
        echo "TRX_ID: $trx_id, Schema:$db, Table: $table, DML: UPDATE (NEW), AT: $gsn";
        print_r($row);
    }
}

Important note: You must define all seven of these functions in your plugin, even if you do not plan to have actions for each of the callbacks – just leave the function body empty to do no action (the call is simply a no-op in that case). Note that the plugin functions must be declared static. This is due to the way that FlexCDC calls the functions.

Transaction state callbacks
There are three callback functions which notify the plugin of changes in transaction state. Before I go into what they do, I want to note the $trx_id and $gsn parameters which are present in every callback. Each transaction is assigned a monotonically increasing transaction identifier. This sequence uniquely identifies each transaction that FlexCDC processes. In addition, each row change is assigned a unique sequence number which FlexCDC calls the Generic Sequence Number (or GSN).

As you can see, the begin_trx(…) callback (called when a transaction starts) is passed both the new transaction number and the highest GSN used in the previous transaction. This is called the GSN high water mark (GSNHWM). At transaction commit, the commit_trx(…) callback is called and the transaction id and the last GSN assigned in the transaction are passed into the callback. This same value will appear as the GSNHWM in the next begin_trx(…) callback. Finally, at rollback any sequence numbers assigned in that transaction will be re-used, so no GSN is passed to the rollback callback, but a transaction id is, which lets you determine exactly which transaction is rolling back.

Row change callbacks

Each of the four row change callback functions captures a particular change to the data set. Each of the functions takes five parameters. The first ($row) is an array which contains the row being acted upon. The second ($db) is the schema which contains the row. The third ($table) is the table that contains the row. Each callback also receives the transaction identifier, and of course, each row change is assigned a unique GSN.

For example:
An update will fire both update_before(…) and update_after(…) callbacks with the row images before and after the change, respectively. There is an example of this at the end of the post.

Configuring FlexCDC to use a plugin
FlexCDC uses a configuration file called consumer.ini by default.  To the [flexcdc] section add:
plugin=plugin_file.php

The plugin must be in the FlexCDC include/ directory.  You will find example_plugin.php in this directory, to serve as an example.
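
As a rough sketch, a minimal consumer.ini might then look like the following. The plugin= line comes from the instructions above; the [source] and [dest] section names and their options are assumptions made for illustration, not taken from this post, so check the consumer.ini shipped with Swanhart-Tools for the exact layout:

[flexcdc]
plugin=example_plugin.php

[source]
host=127.0.0.1
port=3306
user=flexcdc
password=secret

[dest]
host=127.0.0.1
port=3306
user=flexcdc
password=secret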

How it works
Flexviews uses mysqlbinlog to decode the binary log from the source server. It uses mysqlbinlog’s row-decoding output (the --base64-output=DECODE-ROWS option) to decode RBR events into a format which can be parsed by an external utility. FlexCDC collects information about each transaction and the row changes that happen in the database (which means it requires ROW-based binary logging to be used). When a plugin is defined, the normal actions used by FlexCDC are overridden with the callback functions.
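
For context, the mysqlbinlog invocation involved is roughly of this form (the binary log file name is just a placeholder):

$ mysqlbinlog --base64-output=DECODE-ROWS -v mysql-bin.000001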

Here is the output from the example plugin, for an update that affected 3 rows (update test.t3 set c1 = c1 - 1):

START TRANSACTION: trx_id: 44, Prev GSN: 107
TRX_ID: 44, Schema:test, Table: t3, DML: UPDATE (OLD), AT: 108
Array ( [c1] => -3 [c2] => 1 )
TRX_ID: 44, Schema:test, Table: t3, DML: UPDATE (NEW), AT: 109
Array ( [c1] => -4 [c2] => 1 )
TRX_ID: 44, Schema:test, Table: t3, DML: UPDATE (OLD), AT: 110
Array ( [c1] => -5 [c2] => 2 )
TRX_ID: 44, Schema:test, Table: t3, DML: UPDATE (NEW), AT: 111
Array ( [c1] => -6 [c2] => 2 )
TRX_ID: 44, Schema:test, Table: t3, DML: UPDATE (OLD), AT: 112
Array ( [c1] => -5 [c2] => 2 )
TRX_ID: 44, Schema:test, Table: t3, DML: UPDATE (NEW), AT: 113
Array ( [c1] => -6 [c2] => 2 )
COMMIT: trx_id: 44, Last GSN: 113

One thing you should notice is that FlexCDC provides column names for the data coming from the binary log. This is because the log table exists in the dest instance and FlexCDC can get the list of column names from there. When you use other CDC tools, like the C binlog API, you don’t get column names.

The post Trawling the binlog with FlexCDC and new FlexCDC plugins for MySQL appeared first on MySQL Performance Blog.

mysqld_multi: How to run multiple instances of MySQL

Latest MySQL Performance Blog posts - August 26, 2014 - 7:42am

The need to have multiple instances of MySQL (the well-known mysqld process) running in the same server concurrently in a transparent way, instead of having them executed in separate containers/virtual machines, is not very common. Yet from time to time the Percona Support team receives a request from a customer to assist in the configuration of such an environment. MySQL provides a tool to facilitate the execution of multiple instances called mysqld_multi:

“mysqld_multi is designed to manage several mysqld processes that listen for connections on different Unix socket files and TCP/IP ports. It can start or stop servers, or report their current status.”

For tests and development purposes, MySQL Sandbox might be more practical, and I personally prefer to use it for my own tests. Both tools revolve around launching and managing multiple mysqld processes, but Sandbox has, as the name suggests, a “sandbox” approach, making it easy to both create and dispose of a new instance (including all the data inside it). It is more usual to see mysqld_multi being used on production servers: it’s provided with the server package and uses the same single configuration file that people are used to looking for when setting up MySQL. So, how does it work? How do we configure and manage the instances? And just as importantly, how do we back up all the instances we create?

Understanding the concept of groups in my.cnf

You may have noticed already that MySQL’s main configuration file (or “option file“), my.cnf, is arranged under what is called group structures: Sections defining configuration options specific to a given program or purpose. Usually, the program itself gives name to the group, which appears enclosed by brackets. Here’s a basic my.cnf showing three such groups:

[client]
port = 3306
socket = /var/run/mysqld/mysqld.sock
user = john
password = p455w0rd

[mysqld]
user = mysql
pid-file = /var/run/mysqld/mysqld.pid
socket = /var/run/mysqld/mysqld.sock
port = 3306
datadir = /var/lib/mysql

[xtrabackup]
target_dir = /backups/mysql/

The options defined in the group [client] above are used by the mysql command-line tool. As such, if you don’t specify any other option when executing mysql it will attempt to connect to the local MySQL server through the socket in /var/run/mysqld/mysqld.sock and using the credentials stated in that group. Similarly, mysqld will look for the options defined under its section at startup, and the same happens with Percona XtraBackup when you run a backup with that tool. However, the operating parameters defined by the above groups may also be given as command-line options when the program is executed, in which case they replace the ones defined in my.cnf.
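
For example (a hypothetical illustration, not from the original post), the command-line options below take precedence over the socket and user values defined in the [client] group above:

$ mysql --socket=/tmp/other.sock --user=appuser -p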

Getting started with multiple instances

To have multiple instances of MySQL running we must replace the [mysqld] group in the my.cnf configuration file with as many [mysqldN] groups as we want instances running, with “N” being a positive integer, also called the option group number. This number is used by mysqld_multi to identify each instance, so it must be unique across the server. Apart from the distinct group name, the same options that are valid for [mysqld] apply to [mysqldN] groups. The difference is that while stating them is optional for [mysqld] (it’s possible to start MySQL with an empty my.cnf, as default values are used if not explicitly provided), some of them (like socket, port, pid-file, and datadir) are mandatory when defining multiple instances, so that they don’t step on each other’s feet. Here’s a simple modified my.cnf showing the original [mysqld] group plus two other instances:

[mysqld]
user = mysql
pid-file = /var/run/mysqld/mysqld.pid
socket = /var/run/mysqld/mysqld.sock
port = 3306
datadir = /var/lib/mysql

[mysqld1]
user = mysql
pid-file = /var/run/mysqld/mysqld1.pid
socket = /var/run/mysqld/mysqld1.sock
port = 3307
datadir = /data/mysql/mysql1

[mysqld7]
user = mysql
pid-file = /var/run/mysqld/mysqld7.pid
socket = /var/run/mysqld/mysqld7.sock
port = 3308
datadir = /data/mysql/mysql7

Besides using different pid files, ports and sockets for the new instances I’ve also defined a different datadir for each – it’s very important that the instances do not share the same datadir. Chances are you’re importing the data from a backup but if that’s not the case you can simply use mysql_install_db to create each additional datadir (but make sure the parent directory exists and that the mysql user has write access on it):

mysql_install_db --user=mysql --datadir=/data/mysql/mysql7

Note that if /data/mysql/mysql7 doesn’t exist and you start this instance anyway then mysqld_multi will call mysql_install_db itself to have the datadir created and the system tables installed inside it. As an alternative to restoring a backup or creating a new datadir, you can make a physical copy of the existing one from the main instance – just make sure to stop that instance first with a clean shutdown, so any pending changes are flushed to disk.
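
A rough sketch of that copy approach, using the paths from the example groups above (adjust them to your own layout), could be:

$ mysqladmin -S /var/run/mysqld/mysqld.sock shutdown
$ cp -rp /var/lib/mysql /data/mysql/mysql7
$ chown -R mysql:mysql /data/mysql/mysql7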

Now, you may have noted I wrote above that you need to replace your original MySQL instance group ([mysqld]) with one with an option group number ([mysqldN]). That’s not entirely true, as they can co-exist in harmony. However, the usual start/stop script used to manage MySQL won’t work with the additional instances, nor does mysqld_multi really manage [mysqld]. The simple solution here is to rename the [mysqld] group with a suffix integer, say [mysqld0] (you don’t need to make any changes to its current options), and let mysqld_multi manage all instances.

Two commands you might find useful when configuring multiple instances are:

$ mysqld_multi --example

…which provides an example of a my.cnf file configured with multiple instances and showing the use of different options, and:

$ my_print_defaults --defaults-file=/etc/my.cnf mysqld7

…which shows how a given group (“mysqld7” in the example above) was defined within my.cnf.

Managing multiple instances

mysqld_multi allows you to start, stop, reload (which is effectively a restart) and report the current status of a given instance, all instances, or a subset of them. The most important observation here is that the “stop” action is managed through mysqladmin, and internally that happens on an individual basis, with one “mysqladmin … stop” call per instance, even if you have mysqld_multi stop all of them. For this to work properly you need to set up a MySQL account with the SHUTDOWN privilege, defined with the same user name and password in all instances. Yes, it will work out of the box if you run mysqld_multi as root in a freshly installed server where the root user can access MySQL passwordless in all instances. But as the manual suggests, it’s better to have a specific account created for this purpose:

mysql> GRANT SHUTDOWN ON *.* TO 'multi_admin'@'localhost' IDENTIFIED BY 'multipass'; mysql> FLUSH PRIVILEGES;

If you plan on replicating the datadir of the main server across your other instances you can have that account created before you make copies of it; otherwise you just need to connect to each instance and create a similar account (remember, the privileged account is only needed by mysqld_multi to stop the instances, not to start them). There’s a special group that can be used in my.cnf to define options for mysqld_multi, which should be used to store these credentials. You might also indicate there the path for the mysqladmin and mysqld (or mysqld_safe) binaries to use, though you might have a specific mysqld binary defined for each instance inside its respective group. Here’s one example:

[mysqld_multi]
mysqld = /usr/bin/mysqld_safe
mysqladmin = /usr/bin/mysqladmin
user = multi_admin
password = multipass

You can use mysqld_multi to start, stop, restart or report the status of a particular instance, all instances or a subset of them. Here are a few examples that speak for themselves:

$ mysqld_multi report
Reporting MySQL (Percona Server) servers
MySQL (Percona Server) from group: mysqld0 is not running
MySQL (Percona Server) from group: mysqld1 is not running
MySQL (Percona Server) from group: mysqld7 is not running
$ mysqld_multi start
$ mysqld_multi report
Reporting MySQL (Percona Server) servers
MySQL (Percona Server) from group: mysqld0 is running
MySQL (Percona Server) from group: mysqld1 is running
MySQL (Percona Server) from group: mysqld7 is running
$ mysqld_multi stop 7,0
$ mysqld_multi report 7
Reporting MySQL (Percona Server) servers
MySQL (Percona Server) from group: mysqld7 is not running
$ mysqld_multi report
Reporting MySQL (Percona Server) servers
MySQL (Percona Server) from group: mysqld0 is not running
MySQL (Percona Server) from group: mysqld1 is running
MySQL (Percona Server) from group: mysqld7 is not running

Managing the MySQL daemon

What is missing here is an init script to automate the start/stop of all instances upon server initialization/shutdown; now that we use mysqld_multi to control the instances, the usual /etc/init.d/mysql won’t work anymore. But a similar startup script (though much simpler and less robust) relying on mysqld_multi is provided alongside MySQL/Percona Server, which can be found in /usr/share/<mysql|percona-server>/mysqld_multi.server. You can simply copy it over as /etc/init.d/mysql, effectively replacing the original script while maintaining its name. Please note: you may need to edit it first and modify the first two lines defining “basedir” and “bindir”, as this script was not designed to figure out good working values for these variables itself, which the original single-instance /etc/init.d/mysql does. Considering you probably have mysqld_multi installed in /usr/bin, setting these variables as follows is enough:

basedir=/usr
bindir=/usr/bin
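
The copy itself could be done roughly like this; the exact package path and the command to register the service vary by distribution and init system:

$ cp /usr/share/mysql/mysqld_multi.server /etc/init.d/mysql
$ chmod +x /etc/init.d/mysql
$ chkconfig mysql on    # or: update-rc.d mysql defaults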

Configuring an instance with a different version of MySQL

If you’re planning to have multiple instances of MySQL running concurrently, chances are you want to use a mix of different versions for each of them, such as during a development cycle to test application compatibility. This is a common use for mysqld_multi, and simple enough to achieve. To showcase its use I downloaded the latest version of MySQL 5.6 available and extracted the TAR file in /opt:

$ tar -zxvf mysql-5.6.20-linux-glibc2.5-x86_64.tar.gz -C /opt

Then I made a cold copy of the datadir from one of the existing instances to /data/mysql/mysql5620:

$ mysqld_multi stop 0
$ cp -r /data/mysql/mysql1 /data/mysql/mysql5620
$ chown mysql:mysql -R /data/mysql/mysql5620

and added a new group to my.cnf as follows:

[mysqld5620]
user = mysql
pid-file = /var/run/mysqld/mysqld5620.pid
socket = /var/run/mysqld/mysqld5620.sock
port = 3309
datadir = /data/mysql/mysql5620
basedir = /opt/mysql-5.6.20-linux-glibc2.5-x86_64
mysqld = /opt/mysql-5.6.20-linux-glibc2.5-x86_64/bin/mysqld_safe

Note the use of basedir, pointing to the path where the binaries for MySQL 5.6.20 were extracted, as well as a specific mysqld to be used with this instance. If you have made a copy of the datadir from an instance running a previous version of MySQL/Percona Server you will need to follow the same approach used when upgrading and run mysql_upgrade.
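
That upgrade step would look roughly like this, pointing mysql_upgrade from the new binaries at the new instance’s socket (socket path taken from the [mysqld5620] group above):

$ /opt/mysql-5.6.20-linux-glibc2.5-x86_64/bin/mysql_upgrade --socket=/var/run/mysqld/mysqld5620.sock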

* I did try to use the latest experimental release of MySQL 5.7 (mysql-5.7.4-m14-linux-glibc2.5-x86_64.tar.gz) but it crashed with:

*** glibc detected *** bin/mysqld: double free or corruption (!prev): 0x0000000003627650 ***

Using the conventional tools to start and stop an instance

Even though mysqld_multi makes things easier to control in general, let’s not forget it is a wrapper; you can still rely (though not always, as shown below) on the conventional tools directly to start and stop an instance: mysqld* and mysqladmin. Just make sure to use the --defaults-group-suffix parameter to identify which instance you want to start:

mysqld --defaults-group-suffix=5620

and --socket to indicate the one you want to stop:

$ mysqladmin -S /var/run/mysqld/mysqld5620.sock shutdown

* However, mysqld won’t work to start an instance if you have redefined the option ‘mysqld’ in the configuration group, as I did for [mysqld5620] above; it fails with:

[ERROR] mysqld: unknown variable 'mysqld=/opt/mysql-5.6.20-linux-glibc2.5-x86_64/bin/mysqld_safe'

I’ve tested using “ledir” to indicate the path to the directory containing the binaries for MySQL 5.6.20 instead of “mysqld” but it also failed with a similar error. If nothing else, that shows you need to stick with mysqld_multi when starting instances in a mixed-version environment.

Backups

The backup of multiple instances must be done on an individual basis, just as you would if each instance were located on a different server. You just need to provide the appropriate parameters to identify the instance you’re targeting. For example, we can simply use the socket with mysqldump when running it locally:

$ mysqldump --socket=/var/run/mysqld/mysqld7.sock --all-databases > mysqld7.sql

In Percona XtraBackup there’s an option named --defaults-group that should be used in environments running multiple instances to indicate which one you want to back up:

$ innobackupex --defaults-file=/etc/my.cnf --defaults-group=mysqld7 --socket=/var/run/mysqld/mysqld7.sock /root/Backup/

Yes, you also need to provide a path to the socket (when running the command locally), even though that information is already available in “--defaults-group=mysqld7”; as it turns out, only the Percona XtraBackup tool (which is called by innobackupex during the backup process) makes use of the information available in the group option. You may need to provide credentials as well (“--user” and “--password”), and don’t forget you’ll need to prepare the backup afterwards. The option “defaults-group” is not available in all versions of Percona XtraBackup so make sure to use the latest one.
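
The prepare step is the usual innobackupex --apply-log run against the directory the backup was written to; the timestamped directory name below is just a placeholder:

$ innobackupex --apply-log /root/Backup/2014-08-26_12-00-00/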

Summary

Running multiple instances of MySQL concurrently on the same server, transparently and without any containerization or virtualization layer, is possible with both mysqld_multi and MySQL Sandbox. We have been using the latter at Percona Support to quickly spin up new disposable instances (though you might just as easily keep them running indefinitely). In this post, though, I’ve looked at mysqld_multi, which is provided with the MySQL server package and remains the official solution for providing an environment with multiple instances.

The key aspect when configuring multiple instances in my.cnf is the notion of option groups, as you replace a single [mysqld] section with as many [mysqldN] sections as you want instances running. It’s important, though, to pay attention to certain details when defining the options for each one of these groups, especially when mixing instances from different MySQL/Percona Server versions. Unlike MySQL Sandbox, where each instance relies on its own configuration file, you should be careful each time you edit the shared my.cnf file, as a syntax error when configuring a single option group will prevent all instances from starting upon the server’s (re)initialization.

I hope to have covered the major points about mysqld_multi here but feel free to leave us a note below if you have something else to add or any comment to contribute.

The post mysqld_multi: How to run multiple instances of MySQL appeared first on MySQL Performance Blog.

mysqld_multi: How to run multiple instances of MySQL

Latest MySQL Performance Blog posts - August 26, 2014 - 7:42am

The need to have multiple instances of MySQL (the well-known mysqld process) running in the same server concurrently in a transparent way, instead of having them executed in separate containers/virtual machines, is not very common. Yet from time to time the Percona Support team receives a request from a customer to assist in the configuration of such an environment. MySQL provides a tool to facilitate the execution of multiple instances called mysqld_multi:

“mysqld_multi is designed to manage several mysqld processes that listen for connections on different Unix socket files and TCP/IP ports. It can start or stop servers, or report their current status.”

For tests and development purposes, MySQL Sandbox might be more practical and I personally prefer to use it for my own tests. Both tools work around launching and managing multiple mysqld processes but Sandbox has, as the name suggests, a “sandbox” approach, making it easy to both create and dispose a new instance (including all data inside it). It is more usual to see mysqld_multi being used in production servers: It’s provided with the server package and uses the same single configuration file that people are used to look for when setting up MySQL. So, how does it work? How do we configure and manage the instances? And as importantly, how do we backup all the instances we create?

Understanding the concept of groups in my.cnf

You may have noticed already that MySQL’s main configuration file (or “option file“), my.cnf, is arranged under what is called group structures: Sections defining configuration options specific to a given program or purpose. Usually, the program itself gives name to the group, which appears enclosed by brackets. Here’s a basic my.cnf showing three such groups:

[client] port = 3306 socket = /var/run/mysqld/mysqld.sock user = john password = p455w0rd [mysqld] user = mysql pid-file = /var/run/mysqld/mysqld.pid socket = /var/run/mysqld/mysqld.sock port = 3306 datadir = /var/lib/mysql [xtrabackup] target_dir = /backups/mysql/

The options defined in the group [client] above are used by the mysql command-line tool. As such, if you don’t specify any other option when executing mysql it will attempt to connect to the local MySQL server through the socket in /var/run/mysqld/mysqld.sock and using the credentials stated in that group. Similarly, mysqld will look for the options defined under its section at startup, and the same happens with Percona XtraBackup when you run a backup with that tool. However, the operating parameters defined by the above groups may also be stated as command-line options during the execution of the program, in which case they they replace the ones defined in my.cnf.

Getting started with multiple instances

To have multiple instances of MySQL running we must replace the [mysqld] group in the my.cnf configuration file by as many [mysqlN] groups as we want instances running, with “N” being a positive integer, also called option group number. This number is used by mysqld_multi to identify each instance, so it must be unique across the server. Apart from the distinct group name, the same options that are valid for [mysqld] applies on [mysqldN] groups, the difference being that while stating them is optional for [mysqld] (it’s possible to start MySQL with an empty my.cnf as default values are used if not explicitly provided) some of them (like socket, port, pid-file, and datadir) are mandatory when defining multiple instances – so they don’t step on each other’s feet. Here’s a simple modified my.cnf showing the original [mysqld] group plus two other instances:

[mysqld] user = mysql pid-file = /var/run/mysqld/mysqld.pid socket = /var/run/mysqld/mysqld.sock port = 3306 datadir = /var/lib/mysql [mysqld1] user = mysql pid-file = /var/run/mysqld/mysqld1.pid socket = /var/run/mysqld/mysqld1.sock port = 3307 datadir = /data/mysql/mysql1 [mysqld7] user = mysql pid-file = /var/run/mysqld/mysqld7.pid socket = /var/run/mysqld/mysqld7.sock port = 3308 datadir = /data/mysql/mysql7

Besides using different pid files, ports and sockets for the new instances I’ve also defined a different datadir for each – it’s very important that the instances do not share the same datadir. Chances are you’re importing the data from a backup but if that’s not the case you can simply use mysql_install_db to create each additional datadir (but make sure the parent directory exists and that the mysql user has write access on it):

mysql_install_db --user=mysql --datadir=/data/mysql/mysql7

Note that if /data/mysql/mysql7 doesn’t exist and you start this instance anyway then myqld_multi will call mysqld_install_db itself to have the datadir created and the system tables installed inside it. Alternatively from restoring a backup or having a new datadir created you can make a physical copy of the existing one from the main instance – just make sure to stop it first with a clean shutdown, so any pending changes are flushed to disk first.

Now, you may have noted I wrote above that you need to replace your original MySQL instance group ([mysqld]) by one with an option group number ([mysqlN]). That’s not entirely true, as they can co-exist in harmony. However, the usual start/stop script used to manage MySQL won’t work with the additional instances, nor mysqld_multi really manages [mysqld]. The simple solution here is to have the group [mysqld] renamed with a suffix integer, say [mysqld0] (you don’t need to make any changes to it’s current options though), and let mysqld_multi manage all instances.

Two commands you might find useful when configuring multiple instances are:

$ mysqld_multi --example

…which provides an example of a my.cnf file configured with multiple instances and showing the use of different options, and:

$ my_print_defaults --defaults-file=/etc/my.cnf mysqld7

…which shows how a given group (“mysqld7″ in the example above) was defined within my.cnf.

Managing multiple instances

mysqld_multi allows you to start, stop, reload (which is effectively a restart) and report the current status of a given instance, all instances or a subset of them. The most important observation here is that the “stop” action is managed through mysqladmin – and internally that happens on an individual basis, with one “mysqladmin … stop” call per instance, even if you have mysqld_multi stop all of them. For this to work properly you need to setup a MySQL account with the SHUTDOWN privilege and defined with the same user name and password in all instances. Yes, it will work out of the box if you run mysqld_multi as root in a freshly installed server where the root user can access MySQL passwordless in all instances. But as the manual suggests, it’s better to have an specific account created for this purpose:

mysql> GRANT SHUTDOWN ON *.* TO 'multi_admin'@'localhost' IDENTIFIED BY 'multipass'; mysql> FLUSH PRIVILEGES;

If you plan on replicating the datadir of the main server across your other instances you can have that account created before you make copies of it, otherwise you just need to connect to each instance and create a similar account (remember, the privileged account is only needed by mysqld_multi to stop the instances, not to start them). There’s a special group that can be used on my.cnf to define options for mysqld_multi, which should be used to store these credentials. You might also indicate in there the path for the mysqladmin and mysqld (or mysqld_safe) binaries to use, though you might have a specific mysqld binary defined for each instance inside it’s respective group. Here’s one example:

[mysqld_multi] mysqld = /usr/bin/mysqld_safe mysqladmin = /usr/bin/mysqladmin user = multi_admin password = multipass

You can use mysqld_multi to start, stop, restart or report the status of a particular instance, all instances or a subset of them. Here’s a few examples that speak for themselves:

$ mysqld_multi report Reporting MySQL (Percona Server) servers MySQL (Percona Server) from group: mysqld0 is not running MySQL (Percona Server) from group: mysqld1 is not running MySQL (Percona Server) from group: mysqld7 is not running $ mysqld_multi start $ mysqld_multi report Reporting MySQL (Percona Server) servers MySQL (Percona Server) from group: mysqld0 is running MySQL (Percona Server) from group: mysqld1 is running MySQL (Percona Server) from group: mysqld7 is running $ mysqld_multi stop 7,0 $ mysqld_multi report 7 Reporting MySQL (Percona Server) servers MySQL (Percona Server) from group: mysqld7 is not running $ mysqld_multi report Reporting MySQL (Percona Server) servers MySQL (Percona Server) from group: mysqld0 is not running MySQL (Percona Server) from group: mysqld1 is running MySQL (Percona Server) from group: mysqld7 is not running

Managing the MySQL daemon

What is missing here is an init script to automate the start/stop of all instances upon server initialization/shutdown; now that we use mysqld_multi to control the instances, the usual /etc/init.d/mysql won’t work anymore. But a similar startup script (though much simpler and less robust) relying on mysqld_multi is provided alongside MySQL/Percona Server, which can be found in /usr/share/<mysql|percona-server>/mysqld_multi.server. You can simply copy it over as /etc/init.d/mysql, effectively replacing the original script while maintaining it’s name. Please note: You may need to edit it first and modify the first two lines defining “basedir” and “bindir” as this script was not designed to find out the good working values for these variables itself, which the original single-instance /etc/init.d/mysql does. Considering you probably have mysqld_multi installed in /usr/bin, setting these variables as follows is enough:

basedir=/usr bindir=/usr/bin

Configuring an instance with a different version of MySQL

If you’re planning to have multiple instances of MySQL running concurrently chances are you want to use a mix of different versions for each of them, such as during a development cycle to test an application compatibility. This is a common use for mysqld_multi, and simple enough to achieve. To showcase its use I downloaded the latest version of MySQL 5.6 available and extracted the TAR file in /opt:

$ tar -zxvf mysql-5.6.20-linux-glibc2.5-x86_64.tar.gz -C /opt

Then I made a cold copy of the datadir from one of the existing instances to /data/mysql/mysqld574:

$ mysqld_multi stop 0 $ cp -r /data/mysql/mysql1 /data/mysql/mysql5620 $ chown mysql:mysql -R /data/mysql/mysql5620

and added a new group to my.cnf as follows:

[mysqld5620] user = mysql pid-file = /var/run/mysqld/mysqld5620.pid socket = /var/run/mysqld/mysqld5620.sock port = 3309 datadir = /data/mysql/mysql5620 basedir = /opt/mysql-5.6.20-linux-glibc2.5-x86_64 mysqld = /opt/mysql-5.6.20-linux-glibc2.5-x86_64/bin/mysqld_safe

Note the use of basedir, pointing to the path were the binaries for MySQL 5.6.20 were extracted, as well as an specific mysqld to be used with this instance. If you have made a copy of the datadir from an instance running a previous version of MySQL/Percona Server you will need to consider the same approach use when upgrading and run mysql_upgrade.

* I did try to use the latest experimental release of MySQL 5.7 (mysql-5.7.4-m14-linux-glibc2.5-x86_64.tar.gz) but it crashed with:

*** glibc detected *** bin/mysqld: double free or corruption (!prev): 0x0000000003627650 ***

Using the conventional tools to start and stop an instance

Even though mysqld_multi makes things easier to control in general let’s not forget it is a wrapper; you can still rely (though not always, as shown below) on the conventional tools directly to start and stop an instance: mysqld* and mysqladmin. Just make sure to use the parameter –defaults-group-suffix to identify which instance you want to start:

mysqld --defaults-group-suffix=5620

and –socket to indicate the one you want to stop:

$mysqladmin -S /var/run/mysqld/mysqld5620.sock shutdown

* However, mysqld won’t work to start an instance if you have redefined the option ‘mysqld’ on the configuration group, as I did for [mysqld5620] above, stating:

[ERROR] mysqld: unknown variable 'mysqld=/opt/mysql-5.6.20-linux-glibc2.5-x86_64/bin/mysqld_safe'

I’ve tested using “ledir” to indicate the path to the directory containing the binaries for MySQL 5.6.20 instead of “mysqld” but it also failed with a similar error. If nothing else, that shows you need to stick with mysqld_multi when starting instances in a mixed-version environment.

Backups

The backup of multiple instances must be done in an individual basis, like you would if each instance was located in a different server. You just need to provide the appropriate parameters to identify the instance you’re targeting. For example, we can simply use socket with mysqldump when running it locally:

$ mysqldump --socket=/var/run/mysqld/mysqld7.sock --all-databases > mysqld7.sql

In Percona XtraBackup there’s an option named  –defaults-group that should be used in environments running multiple instances to indicate which one you want to backup :

$ innobackupex --defaults-file=/etc/my.cnf --defaults-group=mysqld7 --socket=/var/run/mysqld/mysqld7.sock /root/Backup/

Yes, you also need to provide a path to the socket (when running the command locally), even though that information is already available in “–defaults-group=mysqld7″; as it turns out, only the Percona XtraBackup tool (which is called by innobackupex during the backup process) makes use of the information available in the group option. You may need to provide credentials as well (“–user” & “–password”), and don’t forget you’ll need to prepare the backup afterwards. The option “defaults-group” is not available in all versions of Percona XtraBackup so make sure to use the latest one.

Summary

Running multiple instances of MySQL concurrently in the same server transparently and without any contextualization or a virtualization layer is possible with both mysqld_multi and MySQL Sandbox. We have been using the later at Percona Support to quickly spin on new disposable instances (though you might as easily keep them running indefinitely). In this post though I’ve looked at mysqld_multi, which is provided with MySQL server and remains the official solution for providing an environment with multiple instances.

The key aspect when configuring multiple instances in my.cnf is the notion of option group names, as you replace a single [mysqld] section with as many [mysqldN] sections as you want instances running. It’s important though to pay attention to certain details when defining the options for each one of these groups, especially when mixing instances from different MySQL/Percona Server versions. Unlike MySQL Sandbox, where each instance relies on its own configuration file, you should be careful each time you edit the shared my.cnf file as a syntax error when configuring a single group option will prevent all instances from starting upon the server’s (re)initialization.
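
One way to reduce that risk is to check what the server will actually read for a given group before restarting anything. The my_print_defaults utility shipped with MySQL can do that; for the example group used in this post:

$ my_print_defaults mysqld5620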

I hope to have covered the major points about mysqld_multi here but feel free to leave us a note below if you have something else to add or any comment to contribute.

The post mysqld_multi: How to run multiple instances of MySQL appeared first on MySQL Performance Blog.

OpenStack’s Trove: The benefits of this database as a service (DBaaS)

Latest MySQL Performance Blog posts - August 25, 2014 - 5:00am

In a previous post, my colleague Dimitri Vanoverbeke discussed at a high level the concepts of database as a service (DBaaS), OpenStack and OpenStack’s implementation of a DBaaS, Trove. Today I’d like to delve a bit further into Trove and discuss where it fits in, and who benefits.

Just to recap, Trove is OpenStack’s implementation of a database as a service for its cloud infrastructure as a service (IaaS). And as the mission statement declares, the Trove project seeks to provide a scalable and reliable cloud database service providing functionality for both relational and non-relational database engines. With the current release of Icehouse, the technology has begun to show maturity providing both stability and a rich feature set.

In my opinion, there are two primary markets that will benefit from Trove: the first being service providers such as RackSpace who provide cloud-based services similar to Amazon’s AWS. These are companies that wish to expand beyond the basic cloud services of storage and networking and provide their customer base with a richer cloud experience by providing higher level services such as DBaaS functionality. The other players are those companies that wish to “cloudify” their own internal systems. The reasons for this decision are varied, ranging from the desire to maintain complete control over all the architecture and the cloud components to legal constraints limiting the use of public cloud infrastructures.

With Trove, much of the management of your database system is taken care of by automating a significant portion of the configuration and initial setup steps necessitated when launching a new server. This includes deployment, configuration, patching, backups, restores, and monitoring, all of which can be administered from either a CLI, RESTful APIs or OpenStack’s Horizon dashboard. At this point, what Trove doesn’t provide is failover, replication and clustering. This functionality is slated to be implemented in the Kilo release of OpenStack, due out in April 2015.

The process flow is relatively simple. The OpenStack Administrator first configures the basic infrastructure by installing the database service. He or she would then create an image for each type of database they wish to support such as MySQL or MongoDB. They would then import the images and offer them to their tenants. From the end user’s perspective only a few commands are necessary to get up and running: first issuing the <trove create> command to create a database service instance, followed by the <trove list> command to get the ID of the instance, and finally the <trove show> command to get its IP address.

For example to create a database, you first start off by creating a database instance. This is an isolated database environment with compute and storage resources in a single tenant environment on a shared physical host machine. You can run a database instance with a variety of database engines such as MySQL or MongoDB.

From the Trove client I can issue the following command to create a database instance called PS_troveinstance, with a volume size of 2 GB, a user called PS_user, a password PS_password and the MySQL datastore (or database engine):

$ trove create --size 2 --users PS_user:PS_password --datastore MySQL PS_troveinstance

Next I issue the following command to get the ID of the database instance:

$ trove list PS_troveinstance

And finally, to create a database called PS_trovedb, I execute:

$ trove database-create PS_troveinstance PS_trovedb

Alternatively, I could have just combined the above commands as:

$ trove create --size 2 --database PS_trovedb --users PS_user:PS_password --datastore MySQL PS_troveinstance

And thus we now have a MySQL database server containing a database called PS_trovedb.

In our next post on OpenStack/Trove, we’ll dig even further and discuss the software and hardware requirements, and how to actually set up Trove.

On a related note, Percona has several experts attending this week’s OpenStack Operations Summit in San Antonio, Texas. One of them is Matt Griffin, director of product management, who pointed out in a recent post that many OpenStack operators use Percona open source software including the MySQL drop-in compatible Percona Server and Galera-based Percona XtraDB Cluster as well as tools such as Percona XtraBackup and Percona Toolkit. “We see a need in the community to understand how to improve MySQL performance in OpenStack. As a result, Percona submitted 16 presentations for November’s Paris OpenStack Summit,” Matt said. So stay tuned for related news from him, too, on that front.

The post OpenStack’s Trove: The benefits of this database as a service (DBaaS) appeared first on MySQL Performance Blog.

When (and how) to move an InnoDB table outside the shared tablespace

Latest MySQL Performance Blog posts - August 22, 2014 - 7:29am

In my last post, “A closer look at the MySQL ibdata1 disk space issue and big tables,” I looked at the growing ibdata1 problem from the perspective of big tables residing inside the so-called shared tablespace. In the particular case that motivated that post, we had a customer running out of disk space on his server who was looking for a way to make the ibdata1 file shrink. As you may know, that file (or, as explained there, the set of ibdata files composing the shared tablespace) stores all InnoDB tables created when innodb_file_per_table is disabled, but also other InnoDB structures, such as undo logs and data dictionary.

For example, when you run a transaction involving InnoDB tables, MySQL will first write all the changes it triggers in an undo log, in case you later decide to “roll them back”. Long-standing, uncommitted transactions are one of the causes of a growing ibdata file. Of course, if you have innodb_file_per_table disabled then all your InnoDB tables live inside it. That was what happened in that case.

So, how do you move a table outside the shared tablespace and change the storage engine it relies on? As importantly, how does that affect disk space use? I’ve explored some of the options presented in the previous post and now share my findings with you below.

The experiment

I created a very simple InnoDB table inside the shared tablespace of a freshly installed Percona Server 5.5.37-rel35.1 with support for TokuDB and configured it with a 1GB buffer pool. I’ve used a 50G partition to host the ‘datadir’ and another one for ‘tmpdir’:

Filesystem                   Size  Used Avail Use% Mounted on
/dev/mapper/vg0-lvFernando1   50G  110M   47G   1% /media/lvFernando1 # datadir
/dev/mapper/vg0-lvFernando2   50G   52M   47G   1% /media/lvFernando2 # tmpdir

Here’s the table structure:

CREATE TABLE `joinit` (
  `i` int(11) NOT NULL AUTO_INCREMENT,
  `s` varchar(64) DEFAULT NULL,
  `t` time NOT NULL,
  `g` int(11) NOT NULL,
  PRIMARY KEY (`i`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

I populated it with 134 million rows using the following routine:

INSERT INTO joinit VALUES (NULL, uuid(), time(now()), (FLOOR( 1 + RAND( ) *60 )));
INSERT INTO joinit SELECT NULL, uuid(), time(now()), (FLOOR( 1 + RAND( ) *60 )) FROM joinit; # repeat until you reach your target number of rows

which resulted in the table below:

mysql> show table status from test like 'joinit'\G
*************************** 1. row ***************************
           Name: joinit
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 134217909
 Avg_row_length: 72
    Data_length: 9783214080
Max_data_length: 0
   Index_length: 0
      Data_free: 1013972992
 Auto_increment: 134872552
    Create_time: 2014-07-30 20:42:42
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment:
1 row in set (0.00 sec)

The resulting ibdata1 file was 11G, which accounted in practice for 100% of the datadir partition use at that point. What follows is a set of experiments I did by converting that table to a different storage engine, moving it outside the shared tablespace, compressing it, and dumping and restoring the database to see the effects on disk space use. I haven’t timed how long each command took to run and focused mostly on the size of the generated files. As a bonus, I’ve also looked at how to extend the shared tablespace by adding an extra ibdata file.

#1) Converting to MyISAM

Technical characteristics and features apart, MyISAM tables are known to occupy less disk space than InnoDB ones. How much less depends on the actual table structure. Here I made the conversion in the simplest way:

mysql> ALTER TABLE test.joinit ENGINE=MYISAM;

which created the following files (the .frm file already existed):

$ ls -lh /media/lvFernando1/data/test/
total 8.3G
-rw-rw---- 1 fernando.laudares admin 8.5K Jul 31 16:21 joinit.frm
-rw-rw---- 1 fernando.laudares admin 7.0G Jul 31 16:27 joinit.MYD
-rw-rw---- 1 fernando.laudares admin 1.3G Jul 31 16:27 joinit.MYI

The resulting MyISAM files amounted to an additional 8.3G of disk space use:

/dev/mapper/vg0-lvFernando1 50G 19G 29G 40% /media/lvFernando1

I was expecting smaller files but, of course, the result depends largely on the data types of the columns composing the table. The problem (or the consequence) is that we end up with close to double the initial disk space being used.

As it happens with the other solutions presented in this section that migrate the target table outside the shared tablespace, the common/safest way to reclaim the freed (unused) space inside the ibdata1 file back to the operating system is by doing a dump & restore of the full database.

There’s an alternative approach with MyISAM though, which doesn’t involve dump & restore and only requires a MySQL restart. However, you need to convert all InnoDB tables to MyISAM, stop MySQL, delete all ib* files (there should be no remaining .ibd files after you’ve converted all InnoDB tables to MyISAM), and then restart MySQL again. Upon MySQL restart, ibdata1 will be re-created with its default initial size (more on this below). You can then convert the MyISAM tables back to InnoDB and if you have innodb_file_per_table enabled this time then the tables will be created with their own private tablespace files.
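
If you go down that route, a sketch of how the conversion statements for every InnoDB table could be generated (adjust the schema filter to your environment; the output can then be executed manually or piped back into mysql) is:

mysql> SELECT CONCAT('ALTER TABLE `', table_schema, '`.`', table_name, '` ENGINE=MyISAM;')
    -> FROM information_schema.tables
    -> WHERE engine = 'InnoDB'
    -> AND table_schema NOT IN ('mysql', 'information_schema', 'performance_schema');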

#2) Exporting the table to a private tablespace

Once you have innodb_file_per_table enabled you can move a table residing inside ibdata1 to its private tablespace (its own .ibd file) by either running ALTER TABLE or OPTIMIZE TABLE. Both commands create a “temporary” (though InnoDB, not MyISAM) table with its own tablespace file inside the database directory (and not in the tmpdir, as I believed would happen), the rows from the target table being copied over there.
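
For the test table used here, that translates into something like this minimal sketch (OPTIMIZE TABLE test.joinit would have a similar effect):

mysql> SET GLOBAL innodb_file_per_table = 1;
mysql> ALTER TABLE test.joinit ENGINE=InnoDB;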

Here’s the temporary table (#sql-4f10_1) that was created while the process was still ongoing:

$ ls -lh /media/lvFernando1/data/test
total 2.2G
-rw-rw---- 1 fernando.laudares admin 8.5K Jul 30 20:42 joinit.frm
-rw-rw---- 1 fernando.laudares admin 8.5K Jul 30 23:05 #sql-4f10_1.frm
-rw-rw---- 1 fernando.laudares admin 2.2G Jul 30 23:12 #sql-4f10_1.ibd

and the resulting .ibd file from when the process completed:

$ ls -lh /media/lvFernando1/data/test
total 9.3G
-rw-rw---- 1 fernando.laudares admin 8.5K Jul 30 23:05 joinit.frm
-rw-rw---- 1 fernando.laudares admin 9.3G Jul 30 23:35 joinit.ibd

Note that the new joinit.ibd file is about 9.3G. Again, the process resulted in the use of extra disk space:

/dev/mapper/vg0-lvFernando1 50G 20G 28G 42% /media/lvFernando1

#3) Dump and restore: looking at the disk space use

As pointed out, the way to reclaim unused disk space inside ibdata1 back to the file system (and consequently make it shrink) is by dumping the full database to a text file and then restoring it back. I started by doing a simple full mysqldump encompassing all databases:

$ mysqldump -S /tmp/mysql_sandbox5537.sock --user=msandbox --password=msandbox --all-databases > dump.sql

which created the following file:

-rw-r--r-- 1 fernando.laudares admin 8.1G Jul 31 00:02 dump.sql

I’ve then stopped MySQL, wiped out the full datadir and used the script mysql_install_db to (re-)create the system tables (though that’s not needed, I did it to level the comparisons; once you have all InnoDB tables out of the system tablespace you can simply delete all ib* files along with any .ibd and .frm files for related InnoDB tables), started MySQL again, and finally restored the backup:

$ mysql -S /tmp/mysql_sandbox5537.sock --user=msandbox --password=msandbox < dump.sql

This resulted in:

/dev/mapper/vg0-lvFernando1 50G 9.4G 38G 21% /media/lvFernando1

and:

-rw-rw---- 1 fernando.laudares admin 18M Jul 31 05:05 /media/lvFernando1/data/ibdata1

ibdata1 did shrink in the process (returning to the default* initial size) and we recovered back a couple of gigabytes that were not being used by the single InnoDB table that used to live inside it.

* The default value for innodb_data_file_path for the Percona Server release we used in the tests is “ibdata1:10M:autoextend”. However, the manual states that “the default behavior is to create a single auto-extending data file, slightly larger than 10MB”. Given that innodb_autoextend_increment is set to 8M by default, what we get in practice is an initialized ibdata1 file of 18M, as seen above.
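
Both settings can be checked on a running server with:

mysql> SHOW GLOBAL VARIABLES LIKE 'innodb_data_file_path';
mysql> SHOW GLOBAL VARIABLES LIKE 'innodb_autoextend_increment';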

#4) Compressing the table

Back to the original scenario, with the test table still living in the shared tablespace, I’ve enabled innodb_file_per_table, set Barracuda as the innodb_file_format and then ran:

mysql> ALTER TABLE test.joinit ENGINE=INNODB ROW_FORMAT=Compressed;

This resulted in an .ibd file with half the size of an uncompressed one:

-rw-rw---- 1 fernando.laudares admin 4.7G Jul 31 13:59 joinit.ibd

The process was a lengthy one but resulted in considerably less disk space use than converting the table to MyISAM:

/dev/mapper/vg0-lvFernando1 50G 20G 28G 42% /media/lvFernando1
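
For reference, the two settings enabled at the start of this section can be applied dynamically; a minimal sketch (they also need to go into my.cnf to survive a restart):

mysql> SET GLOBAL innodb_file_per_table = 1;
mysql> SET GLOBAL innodb_file_format = Barracuda;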

#5) Converting the table to TokuDB

I’ve used TokuDB version 7.1.6 (not the latest one), which was bundled in Percona Server 5.5.37-rel35.1.

mysql> ALTER TABLE test.joinit ENGINE=TOKUDB;

As was the case with converting the InnoDB table to its own tablespace, a temporary table was created to intermediate the process, with the table definition residing in the test database directory:

$ ls -lh /media/lvFernando1/data/test/
total 24K
-rw-r----- 1 fernando.laudares admin 8.5K Jul 31 14:22 joinit.frm
-rw-rw---- 1 fernando.laudares admin 8.5K Jul 31 14:28 #sql-7f51_1.frm

However, TokuDB created a temporary file in the main datadir to copy the data into, shown here at some point during the process:

-rwxrwx--x 1 fernando.laudares admin  32K Jul 31 14:28 _test_sql_7f51_1_main_a_2_19.tokudb
-rwxrwx--x 1 fernando.laudares admin  16K Jul 31 14:28 _test_sql_7f51_1_status_a_1_19.tokudb
-rw------- 1 fernando.laudares admin 528M Jul 31 14:29 tokuldNk5W4v

Once the process completed the file tokuldNk5W4v disappeared and the data ended up in the _test_sql_7f51_1_main_a_2_19.tokudb file:

-rwxrwx--x 1 fernando.laudares admin 1.1G Jul 31 14:32 _test_sql_7f51_1_main_d_1_19_B_0.tokudb

The new .tokudb file is so small compared to the other files obtained in the previous approaches that it almost goes unnoticed when looking at the disk space use:

/dev/mapper/vg0-lvFernando1 50G 12G 36G 25% /media/lvFernando1

Curiously, that file retained the name/reference of the temporary table it used, even though the temporary table definition file (#sql-7f51_1.frm) disappeared as well; only joinit.frm remained under the test database directory.

I decided to do a dump & restore of the whole database to see what the result would be in this case:

$ mysqldump -S /tmp/mysql_sandbox5537.sock --user=msandbox --password=msandbox --all-databases > dump2.sql
$ ls -lh dump2.sql
-rw-r--r-- 1 fernando.laudares admin 8.1G Jul 31 15:07 /home/fernando.laudares/dump2.sql

Once it was restored we found the ibdata1 file back to its initial default size and the disk space used by the datadir was down to a scant 1.1G. Curiously (again), the dump & restore procedure did fix the name of the TokuDB table file:

-rwxrwx--x 1 fernando.laudares admin 980M Jul 31 15:36 _test_joinit_main_21_1_19_B_0.tokudb

The process of converting the resident InnoDB table to TokuDB was faster than compressing it with ROW_FORMAT=Compressed and resulted in a much smaller file. This is not to say that using TokuDB is the best solution but to point out that the process can be accomplished in less time and uses less extra disk space than InnoDB does, which can come in handy if you don’t have much space left. Also, remember the test was done with a not very large (and structurally simple) test table; you may need to check whether yours can be converted to TokuDB and what changes to indexes you might need to make (if any).

#6) Expanding the shared tablespace

As previously mentioned, I’ve been using the default value of “ibdata1:10M:autoextend” for innodb_data_file_path for my tests. If you have another partition with unused space available and you are only looking for a solution to the “running out of disk space” problem and don’t mind keeping your big table inside the shared tablespace, then you can expand it. That can be done by means of adding a second ibdata file to the tablespace definition. Of course, that only works if you create it outside the partition already hosting ibdata1.

To do so, you need to stop MySQL and verify the size of the ibdata1 file you have (list the size in bytes, as “ls -lh” would round it here):

$ ls -l /media/lvFernando1/data/ibdata1
-rw-r----- 1 fernando.laudares admin 10957619200 Aug 1 16:20 /media/lvFernando1/data/ibdata1

I had available space on the /media/lvFernando2 partition so I decided to have ibdata2 created there. To do so, I’ve added the following 2 lines to my.cnf:

innodb_data_home_dir =
innodb_data_file_path=/media/lvFernando1/data/ibdata1:10957619200;/media/lvFernando2/tmp/ibdata2:10M:autoextend

Then I restarted MySQL and confirmed that ibdata2 was created there upon initialization:

$ ls -lh /media/lvFernando2/tmp/
total 10M
-rw-rw---- 1 fernando.laudares admin 10M Aug 1 16:30 ibdata2

Three important things to note on this approach:

  1. There can be only one ibdata file listed in innodb_data_file_path configured with “autoextend” - the last one in the list.
  2. You need to redefine ibdata1 in innodb_data_file_path using its current size. Any other size won’t work.
  3. From the moment you redefine ibdata1 with its current size and add ibdata2 configured with “autoextend”, ibdata1 won’t grow any bigger; the common tablespace will be increased through ibdata2.

What the 3rd point really means in practice is that you need to plan this maneuver accordingly: if you still have space available in the partition that hosts ibdata1 and you want to use it first then you need to delay these changes until you get to that point. From the moment you add ibdata2 to the common tablespace definition new data will start to be added into it.

Conclusion

The little experiment I did helped explain how some of the solutions proposed in my previous post for moving an InnoDB table outside the shared tablespace would work in practice and, most importantly, how much more disk space would be needed along the process. It was interesting to see that none of them made use of the tmpdir when they created temporary tables to intermediate the table conversion process; those were always created on the main datadir. The test table I’ve used had only a bit more than 10G, far away from a 1TB-sized table, so the results I’ve got may differ from what you would find in a larger environment.

As complementary information, MySQL 5.6 allows for the creation of InnoDB tables having their private tablespaces living outside the datadir (but that must be defined at table creation time and can’t be changed by means of an ALTER TABLE) as well as transportable tablespaces (which allow for the copy of private tablespaces from one database server to another).
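
As a brief sketch of what those two MySQL 5.6 features look like (table name and path below are placeholders, not part of the test above):

mysql> CREATE TABLE mydb.big_table (id INT PRIMARY KEY) ENGINE=InnoDB DATA DIRECTORY = '/mnt/big_disk/mysql-data';
mysql> FLUSH TABLES mydb.big_table FOR EXPORT;  # prepares the .ibd file for copying to another server
mysql> UNLOCK TABLES;                           # release the lock once the .ibd and .cfg files have been copied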

The post When (and how) to move an InnoDB table outside the shared tablespace appeared first on MySQL Performance Blog.

Trove and OpenStack

Latest MySQL Performance Blog posts - August 21, 2014 - 3:18pm

In a previous blog, my colleague Dimitri Vanoverbeke discussed at a high level the concepts of database as a service, OpenStack and OpenStack’s implementation of a DBaaS, Trove. Today I’d like to delve a bit further into Trove and discuss where it fits in, and who benefits.

Just to recap, Trove is OpenStack’s implementation of a database as a service (DBaaS) for its cloud infrastructure as a service (IaaS). And as the mission statement declares, the Trove project seeks to provide a scalable and reliable cloud database service providing functionality for both relational and non-relational database engines. With the current release of Icehouse, the technology has begun to show maturity providing both stability and a rich feature set.

In my opinion, there are two primary markets that will benefit from Trove: the first being service providers such as RackSpace who provide cloud-based services similar to Amazon’s AWS. These are companies that wish to expand beyond the basic cloud services of storage and networking and provide their customer base with a richer cloud experience by providing higher level services such as DBaaS functionality.

The other players are those companies that wish to “cloudify” their own internal systems. The reasons for this decision are varied, ranging from the desire to maintain complete control over all the architecture and the cloud components to legal constraints limiting the use of public cloud infrastructures.

With Trove, much of the management of your database system is taken care of for you by automating a significant portion of the configuration and initial setup steps necessitated when launching a new server. This includes deployment, configuration, patching, backups, restores, and monitoring, all of which can be administered from either a CLI, RESTful APIs or OpenStack’s Horizon dashboard. At this point, what Trove doesn’t provide is failover, replication and clustering. This functionality is slated to be implemented in the Kilo release of OpenStack, due out in April 2015.

The process flow is relatively simple. The OpenStack Administrator first configures the basic infrastructure by installing the database service. He or she would then create an image for each type of database they wish to support such as MySQL or MongoDB. They would then import the images and offer them to their tenants. From the end user’s perspective only a few commands are necessary to get up and running: first issuing the <trove create> command to create a database service instance, followed by the <trove list> command to get the ID of the instance, and finally the <trove show> command to get its IP address.

For example to create a database, you first start off by creating a database instance. This is an isolated database environment with compute and storage resources in a single tenant environment on a shared physical host machine. You can run a database instance with a variety of database engines such as MySQL or MongoDB.

From the Trove client I can issue the following command to create a database instance called PS_troveinstance, with a volume size of 2 GB, a user called PS_user, a password PS_password and the MySQL datastore (or database engine):

$ trove create --size 2 --users PS_user:PS_password --datastore MySQL PS_troveinstance

Next I issue the following command to get the ID of the database instance:

$ trove list PS_troveinstance

And finally, to create a database called PS_trovedb, I execute:

$ trove database-create PS_troveinstance PS_trovedb

Alternatively, I could have just combined the above commands as:

$ trove create --size 2 --database PS_trovedb --users PS_user:PS_password --datastore MySQL PS_troveinstance

And thus we now have a MySQL database server containing a database called PS_trovedb.

In our next blog on OpenStack/Trove, we’ll dig even further and discuss the software and hardware requirements, and how to actually set up Trove.

The post Trove and OpenStack appeared first on MySQL Performance Blog.

A closer look at the MySQL ibdata1 disk space issue and big tables

Latest MySQL Performance Blog posts - August 21, 2014 - 8:24am

A recurring and very common customer issue seen here at the Percona Support team involves how to make the ibdata1 file “shrink” within MySQL. I can only imagine there’s a degree of regret by some of the InnoDB architects on their design decisions regarding disk-space management by the shared tablespace* because this has been a big frustration for many MySQL users over the years.

There’s a very old bug (“InnoDB ibdata1 never shrinks after data is removed,” Sept. 8 2003) documenting user dissatisfaction. Shortly before that issue celebrated its 10th anniversary, James Day, MySQL senior principal support engineer at Oracle, posted a comment explaining why things haven’t changed and he also offered possible alternative solutions. Maybe we’ll see it fixed in a future release of MySQL. We can only be sure that any new storage engine aiming to warrant the sympathy of MySQL users can’t make that same mistake again.

One general misunderstanding that exacerbates the problem is the belief that if we enable innodb_file_per_table then all InnoDB tables will live in their own tablespace (a “private” .ibd file), even though the manual is clear about the role this variable plays. The truth is that when you enable innodb_file_per_table it will only immediately affect how new InnoDB tables are created – it won’t magically export tables currently living in the shared tablespace into their own separate .ibd files. That can be manually accomplished at any time afterwards by running ALTER TABLE or OPTIMIZE TABLE on the target table. The “gotcha” here is that by doing so you’ll actually be using additional disk space – as much as the table size,  for the newly created .ibd file. ibdata1 won’t automatically shrink: the space previously used by that table will be marked as being “free” (for internal InnoDB use) but won’t be returned to the file system.
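
A minimal sketch of that manual step, with a placeholder table name (and keeping in mind the extra disk space it temporarily requires):

mysql> SET GLOBAL innodb_file_per_table = 1;
mysql> ALTER TABLE mydb.big_table ENGINE=InnoDB;  # rebuilds the table into its own .ibd file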

Note: Throughout this post I make a common reference between ibdata1 and the shared (also called system) tablespace. In fact, the latter can be composed of a list of file definitions (ibdata1;ibdata2, …). Each file can have a fixed size or be specified with the autoextend parameter and have a cap limiting how big they can grow (well, actually only the last file defined in that list can). The default setting for the variable ruling how the shared tablespace is defined has it living on a file located in the datadir, named ibdata1, and configured with autoextend, hence the popular reference to a “growing ibdata1”.

A big table scenario

The shared tablespace contains more than data and indexes from InnoDB tables created inside it, such as the rollback segment for running transactions (a.k.a. “undo logs”). My colleague Miguel Angel Nieto wrote a blog post last year that explains in detail what is stored inside ibdata files, why they can grow bigger than the sum of data from the tables it hosts and won’t “shrink,” and explains the only real way to reclaim unused disk space to the file system.

But there’s one scenario where the ibdata1 file grows in a “legitimate” way: When it’s storing a big table. If there’s a big table living inside the shared tablespace accounting for most of its size then there’s no “shrinking” it. We can argue this would run counter to best practices (or not) but let’s remember that innodb_file_per_table used to be disabled by default. The first releases of MySQL 5.5 had it enabled by default but then the later ones had it disabled. We find the exact opposite happening with MySQL 5.6 - it is enabled in the later versions.

If a developer who is not a database specialist much less a MySQL expert creates a successful product around a single table and was “unlucky” enough to be using a MySQL release that had innodb_file_per_table disabled by default, he or she might soon find a Terabyte table living inside ibdata1 (BTW, I recommend Bill Karwin’s excellent “How to Avoid Common (but Deadly) MySQL Development Mistakes” recorded webinar and slides for all developers using MySQL).

What to do then?

We recently helped a customer with such a problem. They found themselves running out of disk space with a steadily growing ibdata1 file containing data from a 1.5 Terabyte table. The replicas weren’t able to keep up either, and binary logs were piling up, thus contributing to more disk space use. They had access to a network-mounted partition on their main master server but the storage was quite slow. We started looking for possible solutions to their case and the team came up with the following list of “solutions”:

1) Add one or more disks to the system. Fortunately they had the datadir lying inside an LVM partition using the XFS filesystem, so the prospect of simply increasing its size by adding more disk space was a good one to contemplate.

2) Re-arranging files in different partitions. The binary logs were stored inside the datadir and got to use a considerable amount of disk space so moving them to another partition would free a few hundred gigabytes already. The complicated part here was that the only available alternative storage was the slow network one, and that would contribute to make the replicas delay yet more. Later we contemplated adding a second file (ibdata2) to the shared tablespace to be located in a different partition but, again, the only one with space available was the slow network mount.

3) Archive data. Delete a large amount of old data from the table (possibly using pt-archiver; a sketch of such a run follows this list) and then dump and restore the table with less data; or simply leave it as is – even though the freed space won’t be reclaimed by the filesystem it will be made available internally for InnoDB to re-use. Be aware that deleting data is a slow process; if you have a lot of old unused data then it might be more efficient to dump the data you want to preserve, truncate the table, and then import the data back to it.

4) Convert the table to MyISAM. This could be seen as a polemical solution: it certainly shouldn’t be taken lightly as a change of storage engine is a big deal. A given application that works with InnoDB might simply not work with MyISAM. While we do not normally recommend MyISAM, it does take less space to store data than InnoDB. Be aware that MyISAM is not a transactional storage engine, and if you crash, you will have to perform a lengthy repair process on the table. Also keep in mind that after converting the table to MyISAM you would have to either do a dump & restore of the full database to have ibdata1 shrink OR make sure you have all InnoDB tables converted to MyISAM, stop MySQL, remove all ib* files, and start MySQL again to have it recreate ibdata1 with its default size.

5) Try compressing the table. Rebuild the InnoDB table with file per table enabled and ROW_FORMAT=Compressed. If replicas are already lagging behind, replication may not be able to keep up when compressed tables are used. Also, if you have a very high write volume, it probably won’t perform well enough. If that is the case, this solution is not a viable one. However, if you would consider switching to the new TokuDB storage engine you can count on much better compression than InnoDB overall, and also better performance for write-intensive workloads.
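
Regarding the archiving option (3), a sketch of what a pt-archiver run could look like, with table name and retention condition as placeholders:

$ pt-archiver --source h=localhost,D=mydb,t=big_table --purge --where "created_at < NOW() - INTERVAL 1 YEAR" --limit 1000 --commit-each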

Of course, from all the outlined solutions the first 2 look to be by far the least invasive ones. The problem is that the server doesn’t always have free slots waiting for the addition of an extra disk, plus the arrangement around RAID setups brings extra constraints. Moving the binary logs to a different partition could be handy but it’s not always practical or possible at all, as is the case of making more space available by deleting/archiving data and extending the shared tablespace by adding a new ibdata2 file residing in a different partition. Finally, the other solutions involving converting the table to a different storage engine and compressing were not an option in the case of this particular customer because they required the use of additional disk space in the process, which they didn’t have.

What the customer ended up doing was dumping all the data and importing it into another server, compressing the table in the process, and then re-importing it back into the main server. It took time to complete and it wasn’t an option they were keen to try at first. But sometimes there’s just no easy way out of a problem like this. They made changes on the replicas to allow them to keep up with replication even when using compressed tables and the problem was solved for now, even though they’re aware the disk space they recovered won’t be enough in the long term.

Conclusion

The most important take-away here is to never get yourself into the situation of getting too close to running out of disk space. The best solutions available to minimize disk space being used by InnoDB tables inside the shared tablespace usually require the (temporary, if you intend to do a dump & restore afterwards) use of yet more disk space. And the bigger the table the longer it will take for dump & restore and table conversions, so if you’re running out of time it only makes things more complicated.

This case made me curious to explore some of the other options mentioned above a little further. In a followup post I’ll share some of my findings.

The post A closer look at the MySQL ibdata1 disk space issue and big tables appeared first on MySQL Performance Blog.

A closer look at the MySQL ibdata1 disk space issue and big tables

Latest MySQL Performance Blog posts - August 21, 2014 - 8:24am

A recurring and very common customer issue seen here at the Percona Support team involves how to make the ibdata1 file “shrink” within MySQL. I can only imagine there’s a degree of regret by some of the InnoDB architects on their design decisions regarding disk-space management by the shared tablespace* because this has been a big frustration for many MySQL users over the years.

There’s a very old bug (“InnoDB ibdata1 never shrinks after data is removed,” Sept. 8 2003) documenting user dissatisfaction. Shortly before that issue celebrated its 10th anniversary, James Day, MySQL senior principal support engineer at Oracle, posted a comment explaining why things haven’t changed and he also offered possible alternative solutions. Maybe we’ll see it fixed in a future release of MySQL. We can only be sure that any new storage engine aiming to warrant the sympathy of MySQL users can’t make that same mistake again.

One general misunderstanding that exacerbates the problem is the belief that if we enable innodb_file_per_table then all InnoDB tables will live in their own tablespace (a “private” .ibd file), even though the manual is clear about the role this variable plays. The truth is that when you enable innodb_file_per_table it will only immediately affect how new InnoDB tables are created – it won’t magically export tables currently living in the shared tablespace into their own separate .ibd files. That can be manually accomplished at any time afterwards by running ALTER TABLE or OPTIMIZE TABLE on the target table. The “gotcha” here is that by doing so you’ll actually be using additional disk space – as much as the table size,  for the newly created .ibd file. ibdata1 won’t automatically shrink: the space previously used by that table will be marked as being “free” (for internal InnoDB use) but won’t be returned to the file system.

Note: Throughout this post I make a common reference between ibdata1 and the shared (also called system) tablespace. In fact, the latter can be composed by a list of file definitions (ibdata1;ibdata2, …). Each file can have a fixed size or be specified with the autoextend parameter and have a cap limiting how big they can grow (well, actually only the last file defined in that list can). The default setting for the variable ruling how the shared tablespace is defined has it living on a file located in the datadir, named ibdata1, and configured with autoextend, hence the popular reference of a “growing ibdata1″.

A big table scenario

The shared tablespace contains more than data and indexes from InnoDB tables created inside it, such as the rollback segment for running transactions (a.k.a. “undo logs”). My colleague Miguel Angel Nieto wrote a blog post last year that explains in detail what is stored inside ibdata files, why they can grow bigger than the sum of data from the tables it hosts and won’t “shrink,” and explains the only real way to reclaim unused disk space to the file system.

But there’s one scenario where the ibdata1 file grows in a “legitimate” way: When it’s storing a big table. If there’s a big table living inside the shared tablespace accounting for most of its size then there’s no “shrinking” it. We can argue this would run counter to best practices (or not) but let’s remember that innodb_file_per_table used to be disabled by default. The first releases of MySQL 5.5 had it enabled by default but then the later ones had it disabled. We find the exact opposite happening with MySQL 5.6 - it is enabled in the later versions.

If a developer who is not a database specialist much less a MySQL expert creates a successful product around a single table and was “unlucky” enough to be using a MySQL release that had innodb_file_per_table disabled by default, he or she might soon find a Terabyte table living inside ibdata1 (BTW, I recommend Bill Karwin’s excellent “How to Avoid Common (but Deadly) MySQL Development Mistakes” recorded webinar and slides for all developers using MySQL).

What to do then ?

We recently helped a customer with such a problem. They found themselves running out of disk space with a steadily growing ibdata1 file containing data from a 1.5 Terabyte table. The replicas weren’t able to keep up either, and binary logs where pilling up, thus contributing to more disk space use. They had access to a network mounted partition in their main master server but the storage was quite slow. We started looking for possible solutions to their case and the team came up with the following list of “solutions”:

1) Add one or more disks to the system. Hopefully they had the datadir lying inside a LVM partition using the XFS filesystem so the perspective of simply increasing it’s size by adding more disk space was a good one to contemplate.

2) Re-arranging files in different partitions. The binary logs were stored inside the datadir and got to use a considerable amount of disk space so moving them to another partition would free a few hundreds gigabytes already. The complicated part here was that the only available alternative storage was the slow network one, and that would contribute to make the replicas delay yet more. Later we contemplated adding a second file (ibdata2) to the shared tablespace to be located in a different partition but, again, the only one with space available was the slow network mount.

3) Archive data. Delete a large amount of old data from the table (possibly using pt-archiver) and then do a mysqldump dump and restore the table with less data; or simply leave it as is – even though the freed space won’t be reclaimed by the filesystem it will be made available internally for InnoDB to re-use it. Be aware that deleting data is a slow process; if you have a lot of old unused data then it might be more efficient to dump the data you want to preserve, truncate the table, and then import the data back to it.

4) Convert the table to MyISAM. This could be seem as a polemic solution: it certainly shouldn’t be taken lightly as a change of storage engine is a big deal. A giving application that works with InnoDB might simply not do with MyISAM. While we do not normally recommend MyISAM, it does take less space to store data than InnoDB. Be aware that MyISAM is not a transactional storage engine, and if you crash, you will have to perform a lengthy repair process on the table. Also keep in mind that after converting the table to MyISAM you would have to either do a dump & restore of the full database to have ibdata1 shrink OR make sure you have all InnoDB tables converted to MyISAM, stop MySQL, remove all ib* files, and start MySQL again to have it recreate ibdata1 with its default size.

5) Try compressing the table. Rebuild the InnoDB table with file per table enabled and ROW_FORMAT=Compressed. If replicas are already lagging behind, replication may not be able to keep up when compressed tables are used. Also, if you have a very high write volume, it probably won’t perform well enough. If that is the case, this solution is not a viable one. However, if you would consider switching to the new TokuDB storage engine you will count with much better compression than InnoDB overall, and also better performance for write intensive workloads.

Of course, of all the outlined solutions, the first two look to be by far the least invasive ones. The problem is that the server doesn’t always have free slots waiting for an extra disk, and RAID arrangements bring extra constraints. Moving the binary logs to a different partition can be handy, but it’s not always practical or possible at all, as is the case with making more space available by deleting/archiving data or extending the shared tablespace by adding a new ibdata2 file residing in a different partition. Finally, the other solutions, converting the table to a different storage engine and compressing it, were not an option for this particular customer because they required the use of additional disk space in the process, which they didn’t have.

What the customer ended up doing was dumping all the data and importing it into another server, compressing the table in the process, and then re-importing it back into the main server. It took time to complete and it wasn’t an option they were keen to try at first. But sometimes there’s just no easy way out of a problem like this. They made changes on the replicas to allow them to keep up with replication even when using compressed tables, and the problem was solved for now, even though they’re aware the disk space they recovered won’t be enough in the long term.

Conclusion

The most important take-away here is to never get yourself into the situation of getting too close to running out of disk space. The best solutions available to minimize the disk space used by InnoDB tables inside the shared tablespace usually require the (temporary, if you intend to do a dump & restore afterwards) use of yet more disk space. And the bigger the table, the longer dump & restore and table conversions will take, so if you’re running out of time it only makes things more complicated.

This case made me curious to explore some of the other options mentioned above a little further. In a followup post I’ll share some of my findings.

The post A closer look at the MySQL ibdata1 disk space issue and big tables appeared first on MySQL Performance Blog.

How to use MySQL Global Transaction IDs (GTIDs) in production

Latest MySQL Performance Blog posts - August 20, 2014 - 1:00am

Reconfiguring replication has always been a challenge with MySQL. Each time the replication topology has to be changed, the process is tedious and error-prone because finding the correct binlog position is not straightforward at all. Global Transaction IDs (GTIDs) introduced in MySQL 5.6 aim at solving this annoying issue.

The idea is quite simple: each transaction is associated with a unique identifier shared by all servers in a given replication topology. Now reconfiguring replication is easy as the correct binlog position can be automatically calculated by the server.

Awesome? Yes it is! However GTIDs are also changing a lot of things in how we can perform operations on replication. For instance, skipping transactions is a bit more difficult. Or you can get bitten by errant transactions, a concept that did not exist before.
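To give a taste of the difference: with GTID_MODE=ON you can no longer use SQL_SLAVE_SKIP_COUNTER; instead, you inject an empty transaction carrying the offending GTID. A minimal sketch, run on the replica, with a placeholder UUID and sequence number:

mysql <<'EOF'
STOP SLAVE;
SET GTID_NEXT='aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:42';
BEGIN; COMMIT;
SET GTID_NEXT='AUTOMATIC';
START SLAVE;
EOF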

This is why I will be presenting a webinar on Aug. 27 at 10 a.m. PDT: Using MySQL Global Transaction IDs in Production.

You will learn what you need to operate a replication cluster using GTIDs: how to monitor replication status and recover from replication errors, which tools can help you and which you should avoid, and the main issues that can occur with GTIDs.

This webinar is free but you can register today to reserve your seat. And a recording will be available afterwards. See you next week!

The post How to use MySQL Global Transaction IDs (GTIDs) in production appeared first on MySQL Performance Blog.

5 great new features from Percona Cloud Tools for MySQL

Latest MySQL Performance Blog posts - August 19, 2014 - 6:00am

It’s been three months since we announced anything for Percona Cloud Tools, not because we’ve been idle but because we’ve been so busy the time flew by!  Here’s the TL;DR to pique your interest:

  • EXPLAIN queries in real-time through the web app
  • Query Analytics for Performance Schema
  • Dashboards: customizable, shared groups of charts
  • Install and upgrade the agent with 1 command line
  • Unified UI: same time range, same host wherever you go

Percona Cloud Tools for MySQL is a hosted service providing access to query performance insights for all MySQL users. After a brief setup, unlock new information about your database and how to improve your applications. There’s a lot more, but let’s just look at these five new features…

EXPLAIN queries in real-time through the web app

Like many people, to get a query’s EXPLAIN plan you probably copy the query, ssh to the server, log in to MySQL, then paste the query after typing “EXPLAIN”.  With Percona Cloud Tools’ new real-time EXPLAIN feature you can simply click a button.  It’s a real timesaver.

The EXPLAIN plan is a vital part of diagnosing the query.  Now with Percona Cloud Tools you have a lot of powerful information in one place: the query, its metrics, its EXPLAIN plan, and more–and more to come, too!

Query Analytics for Performance Schema

The MySQL slow log is a wealth of indispensable data about queries that you cannot get anywhere else.  That’s why it’s the default for Percona Cloud Tools Query Analytics.  Like most things, however, it has tradeoffs: for one, it can be time-consuming to parse, especially on very busy servers.  Or, in the case of Amazon RDS, the slow log may simply not be available.  That’s ok now because with MySQL 5.6 or newer (including Percona Server 5.6 or newer) you can parse queries from the Performance Schema.  It’s not as data-rich as the slow log, but it has the basics and it’s a great alternative (and sometimes the only alternative) to the slow log.
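The query digest data in Performance Schema lives in tables such as events_statements_summary_by_digest. As a rough illustration of the kind of information available there (timers are in picoseconds, hence the division), you could run something like:

mysql -e "SELECT DIGEST_TEXT, COUNT_STAR, SUM_TIMER_WAIT/1e12 AS total_time_sec FROM performance_schema.events_statements_summary_by_digest ORDER BY SUM_TIMER_WAIT DESC LIMIT 5"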

Dashboards: customizable, shared groups of charts

Metrics Monitor has a default dashboard (a collection of charts) for MySQL.  The default dashboard is a great start because it’s created by us (Vadim, actually) so you know it’s relevant and meaningful for MySQL.  However, it presents only a fraction of the data that percona-agent collects, so we need more dashboards to organize and present other data.  Now you can create new dashboards which are accessible to everyone in the organization.  For example, Peter was recently benchmarking TokuDB, so he created a TokuDB-specific dashboard.

Install and upgrade the agent with 1 command line

As of percona-agent 1.0.6, you can install, upgrade, and uninstall the agent with a single command line, run as root, like:

# curl -s https://cloud.percona.com/install | bash /dev/stdin -api-key <API KEY>

For many environments this is all you need for a first-time install of a new agent.  The install will auto-detect MySQL, configure, and run all services by default.  You can tweak things later in the web app.  This also means you can install percona-agent in an automated environment.

Unified UI: same time range, same host wherever you go

Like most projects, Percona Cloud Tools has evolved over time.  Consequently, certain parts of the web app were different than other parts.  These differences had workarounds, but now the system is unified. Pick a MySQL instance, pick a time range, then view whatever part of the app you want and these selections will stay the same.  This is, of course, a natural expectation because it allows you to easily examine a specific system at a specific time range from different perspectives.

There’s a lot more, but we don’t want to take up too much of your time!

Percona Cloud Tools is still in free beta, but not for much longer, so be sure to sign up today!

The post 5 great new features from Percona Cloud Tools for MySQL appeared first on MySQL Performance Blog.

Getting my hands dirty on an OpenStack lab

Latest MySQL Performance Blog posts - August 18, 2014 - 3:00am

As you all may know, OpenStack is currently one of the coolest open source projects, so I was thrilled when I was asked to manage the deployment of an OpenStack lab for internal Percona use. Starting from basically zero, I created tasks in our Jira and assigned them to a pool of volunteer consultants. As usual in a service company, billing is the priority, so I ended up losing the two senior guys, but fortunately most of my time was with a customer that wasn’t very demanding, so I could easily multitask between that engagement and the project to fill the gap. So, here it goes…

Hardware

To deploy the OpenStack lab we were given 8 similar servers in our Durham, N.C. offices. The specs are:

  • CPU: 12 physical cores (24 with HT)
  • Disks: 1x 4TB SATA drive and 1x 480GB SSD drive
  • NICs: 2x GbE
  • OS: CentOS 6

The hardware is recent and decent, a good start.

Deployment choices

Given the hardware we had, I picked the first server to be the controller and jumphost, the second to be the network node (a bit overkill) and the remaining six to become the compute nodes. I also wanted to use Ceph and RBD with two types of volumes: the default using SATA, and an SSD type using the SSD drives. The servers only have a single GbE interface to use with Ceph; that’s not ideal but sufficient.

So, we basically followed the OpenStack documentation for CentOS and had our share of learning and fun. Overall, it went relatively well with only a few hiccups with Neutron and Ceph.

Neutron

Neutron is probably the most difficult part to understand. You need to be somewhat familiar with all the networking tools and protocols to find your way around. Neutron relies on network namespaces, virtual network switches, GRE tunnels, iptables, dnsmasq, etc. For my part, I discovered network namespaces and virtual network switches.

The task of providing isolated networks to different tenants, each with its own set of IPs and firewalling rules, on the same infrastructure is a challenging one. I enjoyed reading Networking in too much detail a lot; from there, things just started to make sense.

A first issue we encountered was that the iproute package on CentOS is old and does not support network namespaces. It just needs to be replaced by a newer version. It took me some time to understand how things are connected: each tenant has its own GRE tunnel ID and VLAN. I can only recommend you read the above document and look at your setup. Don’t forget, you have one set of iptables rules per network namespace… The network node mainly deals with NAT (SNAT and DNAT), while the security group rules are set on the compute nodes. The “ip netns exec …” and the “ovs-ofctl dump-flows …” commands have been my best friends for debugging.
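For the record, a typical debugging session along those lines looks roughly like this (the qrouter-<uuid> namespace name and the br-int integration bridge are the usual Neutron/OVS defaults; the UUID is a placeholder):

ip netns list
ip netns exec qrouter-<uuid> ip addr
ip netns exec qrouter-<uuid> iptables -t nat -L -n
ovs-ofctl dump-flows br-int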

Once I got things working, I realized the network performance was, to say the least, pretty bad. I switched GRO off on the NICs but it made very little difference. Finally, I found that the MTU of the VMs was too large, an easy fix by adding a configuration file for dnsmasq with “dhcp-option-force=26,1400”. With an MTU of 1400, small enough that a packet plus the GRE header still fits within the NIC MTU, packets were no longer split and performance went back to normal.
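In practice that means dropping the option into an extra dnsmasq configuration file and pointing the DHCP agent at it. A minimal sketch, assuming the dnsmasq_config_file option in /etc/neutron/dhcp_agent.ini and an arbitrary file name:

echo "dhcp-option-force=26,1400" > /etc/neutron/dnsmasq-neutron.conf
# in /etc/neutron/dhcp_agent.ini: dnsmasq_config_file = /etc/neutron/dnsmasq-neutron.conf
# then restart the Neutron DHCP agent so dnsmasq picks up the new option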

More on Ceph

The integration of Ceph happened to be more challenging than I first thought. First, let’s say the documentation is not as polished as the OpenStack one; there are some rough edges, but nothing unmanageable. The latest version, at the time, had no RPMs for CentOS, but that was easy to work around if you know rpmbuild. The same went for the CentOS RPM for qemu-kvm, and nova required a patch to support rbd devices. I succeeded in deploying Ceph over the SATA drives, configured Glance and Cinder to use it, and that was all good. The patch for nova allows launching instances on clones of an image. While normally the image has to be copied to the local drive, an operation that can take some time if the image is large, with a clone the VM is actually starting barely a few seconds after you launched it. Very impressive. I haven’t tested it, but I suppose the same can be accomplished with btrfs or ZFS shared over iSCSI.

Since the servers all have an SSD drive, I also tackled the task of setting up an SSD volume type. That was a bit tricky: you need to set up a rule so that a given storage pool uses the SSD drives. The SSD drives must all be placed in their own branch of the ceph osd tree; then you need to decompile the current rule set (crush map), modify it by adding a rule that uses the SSD drives, recompile it, and define a storage pool that uses the SSD rule. Having done this, I modified the Cinder configuration for the “volume-ssd” type and finally I could mount an SSD-backed, replicated volume to a VM. Quite cool.
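The rough workflow looks like the sketch below. The pool name, placement group count and rule id are illustrative, and the actual SSD rule has to be written by hand into the decompiled map:

ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt    # edit crushmap.txt: add a rule that takes from the ssd branch
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new
ceph osd pool create volumes-ssd 128 128
ceph osd pool set volumes-ssd crush_ruleset 3    # the id of the new ssd rule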

The only drawback I found using Ceph is when you want to create a snapshot. Ideally, Ceph should handle the snapshot and it should be kept there as is. The way it works is less interesting, a snapshot is created but it is then copied to /tmp of the compute node, uncompressed…, and then copied back to a destination volume. I don’t know why it is done like that, maybe some workaround for limitations of other Cinder backends. The compute nodes have only 30GB available for /tmp and with Ceph, you must use raw images so that’s quite limiting. I already started to look at the code, maybe this could be my first contribution to the Openstack project.

Overall impressions

My first impression is that OpenStack is a big project with many moving parts. Installing OpenStack is not a beginner’s project; my experience saved me quite a few times. Logging, when you activate verbose and even debug in the configuration files, is abundant and if you take your time, you can figure out what is going on and why something is failing.

Regarding the distribution, I am not sure CentOS was the right fit; I think we had a rougher ride than we would have had using Ubuntu. My feeling is that the packages, not necessarily the OpenStack ones, were not as up to date as they should have been compared to Ubuntu.

The Ceph rbd backend is certainly a good point in my experience. I always wanted to touch Ceph and this project has been a very nice opportunity. The rbd backend works very well and the ability to launch instances almost instantaneously is just awesome. Another great plus is the storage redundancy (2 replicas), like EBS, and the storage saving of working only on a clone of the original image. Also, the integration of the SSD-backed pool adds a lot of flexibility. There’s only the instance snapshot issue that will need to be worked on.

So, that’s the current status of our OpenStack lab. I hope to be able to add a few more servers that are currently idle, allowing me to replace the overpowered network node and recycle it as a compute node. Another thing I would really like to do is mix hypervisor types; I’d like to be able to use a lightweight container technology like LXC or Docker. Also, although this is just a lab for the consultants, I’d like to see how to improve its availability, which is currently quite low. So, more fun to come!

The post Getting my hands dirty on an OpenStack lab appeared first on MySQL Performance Blog.

Make a difference! See the world! Speak at Percona Live London; Santa Clara!

Latest MySQL Performance Blog posts - August 15, 2014 - 12:00am

Twice each year members of the global open-source community converge on London in November and Santa Clara in April to network with, and learn from, some of the world’s most experienced and accomplished system architects, developers and DBAs.

And now it’s time for YOU to give back to this diverse and growing MySQL community. The Call for Speakers for Percona Live London (Nov. 3-4) closes Aug. 17 and the deadline for submitting talks for the ever-growing Percona Live MySQL Conference and Expo (April 13-16, 2015) expires Nov. 9.

If you like putting things off until the last minute, then it’s time to get to work! Aug. 17 is just two days away and November will be here before you know it (think of how fast summer has gone by already).

Close your eyes and visualize the beauty of London in November (well, usually)…. And who doesn’t enjoy spring in Silicon Valley? Think BIG: Why not submit papers for BOTH conferences? Be a MySQL jetsetter! If your proposal(s) is approved by the Conference Committee then you’ll receive a complimentary full-conference pass. (You’ll have to pay for London’s fish ‘n’ chips and Santa Clara’s famed fish tacos, though. Sorry!)

I’ve said it before (yes, just up above, but it’s important): Your participation in Percona Live London and the Percona Live MySQL Conference and Expo is an opportunity to make a difference… to give back to the MySQL community. Don’t worry about being fancy. Some of the most popular talks have been on simple topics. Many people come to the conferences to learn – and we need content for MySQL newcomers as well as the veterans.

Struggling to come up with an idea? Share what you know; share what you’ve learned… the good, the bad and the ugly. These stories are gold and you owe it to the MySQL community to share them, because doing so could save the day for those who hear your tale.

It’s GO time….

  • Click here to submit your speaker proposal (yes, more than one submission is OK) for Percona Live London (November 3-4 at the Millennium Gloucester Conference Center).
  • Click here to submit your speaker proposal (again, the more the merrier) for the Percona Live MySQL Conference & Expo 2015 (April 13-16, 2015 at The Hyatt Regency Santa Clara and Santa Clara Convention Center).

Questions? Need some advice or help with your submission? Let me know in the comments section and we can connect via email. I look forward to hearing from you! I also have five limited-edition Percona t-shirts to send to the first five people who submit a talk and let me know about it here in the comments section (and yes, I will check to confirm!).

One more (OK, more than one) thing:

  • Phone your colleagues, friends and family alerting them that Early Bird pricing for Percona Live London ends Aug. 31!
  • Super Saver registration discounts for the Percona Live MySQL Conference & Expo 2015 are available through Nov. 16 (at 11:30 p.m. PST)!
  • Sponsorship opportunities (this is HUGE!) for next April’s Percona Live MySQL Conference & Expo 2015 are available! Sponsors become a part of a dynamic and growing ecosystem and interact with more than 1,000 DBAs, sysadmins, developers, CTOs, CEOs, business managers, technology evangelists, solutions vendors, and entrepreneurs who typically attend the event. Current (very cool) sponsors include:
    • Diamond Plus: Continuent
    • Gold: Pythian
    • Silver: Box and Yelp
    • Exhibit Only: Blackbird
    • Media Sponsor: Database Trends & Applications, Datanami, Linux Journal, and O’Reilly Media

Pssst! One more thing (yes, yet another!): These are the people you must impress when submitting a speaker proposal…. (the more you know!)

Meet the Conference Committees!

Percona Live London and the Percona Live MySQL Conference & Expo 2015 each feature a dedicated Conference Committee.

Percona Live London’s Conference Committee:

  • Cédric Peintre, DBA at Dailymotion
  • David Busby, Remote DBA EMEA Regional Lead/RDBA Security Lead at Percona
  • Colin Charles, Chief Evangelist, MariaDB
  • Luis Motta Campos, DBA at ebay Classifieds Group
  • Nicolai Plum, Senior Systems Architect at Booking.com
  • Morgan Tocker, MySQL Community Manager at Oracle
  • Art van Scheppingen, Head of database engineering at Spil Games

The Percona Live MySQL Conference & Expo 2015 Conference Committee:

  • Tamar Bercovici, Senior Engineering Manager at Box
  • Colin Charles, Chief Evangelist for MariaDB
  • Sean Chighizola, Senior Director of IT Operations at Big Fish Games
  • Jeremy Cole, Sr. Systems Engineer at Google
  • Harrison Fisk, MySQL Performance Engineer at Facebook
  • Patrick Galbraith, Principal Engineer at HP
  • Jay Janssen, Consulting Lead at Percona
  • Shlomi Noach, Senior Software Engineer at Outbrain (Conference Chairman)
  • Chris Schneider, Database Architect at Groupon
  • John Scott, Lead Database Administrator at Wellcentive
  • Gwen Shapira, Solutions Architect at Cloudera
  • Shivinder Singh, Database Architect and Engineer at Verizon Wireless
  • Calvin Sun, Senior Engineering Manager at Twitter
  • Morgan Tocker, MySQL Community Manager at Oracle
  • Peter Zaitsev, Co-founder and CEO of Percona

The post Make a difference! See the world! Speak at Percona Live London; Santa Clara! appeared first on MySQL Performance Blog.

Benchmarking IBM eXFlash™ DIMM with sysbench fileio

Latest MySQL Performance Blog posts - August 12, 2014 - 5:00am

Diablo Technologies engaged Percona to benchmark IBM eXFlash™ DIMMs in various aspects. An eXFlash™ DIMM itself is quite an interesting piece of technology. In a nutshell, it’s flash storage, which you can put in the memory DIMM slots. Enabled by Diablo’s Memory Channel Storage™ technology, this practically means low latency and some unique performance characteristics.

These devices are special, because their full performance potential is unlocked by using multiple devices to leverage the parallelism of the memory subsystem. In a modern CPU there is more than one memory controller (4 is typical nowadays) and spreading eXFlash™ DIMMs across them will provide maximum performance. There are quite a few details about the device that are omitted in this post; they can be found in this redbook.

Diablo Technologies also provided us with a 785 GB FusionIO ioDrive 2 card (MLC). We will compare its performance to 4 and 8 of the eXFlash DIMMs. The card itself is a good choice for comparison because of its popularity and because, as you will see, it provides a good baseline.

Environment

CPU: 2x Intel Xeon E5-2690 (hyper threading enabled)
FusionIO driver version: 3.2.6 build 1212
Operating system: CentOS 6.5
Kernel version: 2.6.32-431.el6.x86_64

Sysbench command used:

sysbench --test=fileio --file-total-size=${size}G --file-test-mode=${test_mode} --max-time=${rtime} --max-requests=0 --num-threads=$numthreads --rand-init=on --file-num=64 --file-io-mode=${mode} --file-extra-flags=direct --file-fsync-freq=0 --file-block-size=16384 --report-interval=1 run

The variables above mean the following (an expanded example invocation follows below):

  • Size: the devices always had 50% free space.
  • Test_mode: rndwr, rndrw or rndrd for the different types of workloads (write-only, mixed, read-only).
  • Rtime: 4 hours for asynchronous tests, 30 minutes for synchronous ones.
  • Mode: sync or async depending on which test we are talking about.
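As an illustration, the asynchronous random-write run would expand to something like the following (the 50%-free-space file size shown here is assumed; adjust it to the device under test):

sysbench --test=fileio --file-total-size=360G --file-test-mode=rndwr --max-time=14400 --max-requests=0 --num-threads=16 --rand-init=on --file-num=64 --file-io-mode=async --file-extra-flags=direct --file-fsync-freq=0 --file-block-size=16384 --report-interval=1 run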

From the tdimm devices, we created RAID0 arrays with linux md (this is the recommended way to use them).
In case of 4 devices:

mdadm --create /dev/md0 --level=0 --chunk=4 --raid-devices=4 /dev/td{a,c,e,g}

In case of 8 devices:

mdadm --create /dev/md0 --level=0 --chunk=4 --raid-devices=8 /dev/td{a,b,c,d,e,f,g,h}

The filesystem used for the tests was ext4; it was created on the whole device (md0 or fioa, no partitions), with a block size of 4k.
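For reference, creating such a filesystem amounts to something like this (device name as used above):

mkfs.ext4 -b 4096 /dev/md0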

We tested these with read-only, write-only and mixed workloads with sysbench fileio. The read/write ratio with the mixed test was 1.5. Apart from the workload itself, we varied the IO mode as well (we did tests with synchronous IO as well as asynchronous IO). We did the following tests.

  • Asynchronous read workload (16 threads, 16k block size, 4 hour long)
  • Asynchronous write workload (16 threads, 16k block size, 4 hour long)
  • Asynchronous mixed workload (16 threads, 16k block size, 4 hour long)
  • Synchronous read workload (1..1024 threads, 16k block size, 0.5 hour long)
  • Synchronous write workload (1..1024 threads, 16k block size, 0.5 hour long)
  • Synchronous mixed workload (1..1024 threads, 16k block size, 0.5 hour long)

The asynchronous tests lasted 4 hours because we used them to test the consistency of the devices’ performance over time.

Asynchronous IO

Reads

The throughput is fairly consistent. With 8 eXFlash DIMMs, the peak throughput goes above 3 GB/s.

At the beginning of the testing we expected that latency would drop significantly in case of eXFlash DIMMs. We based this assumption on the fact that the memory is even “closer” to the processor than the PCI-E bus. In case of 8 eXFlash DIMMs, it’s visible that we are able to utilize the memory channels completely, which gives a boost in throughput as well as lower latency.

Writes

The consistent asynchronous write throughput is around 1GB/sec in case of eXFlash DIMM_8. Each case shows a spike in throughput at the beginning: that’s the effect of caching. In the eXFlash DIMM itself, data can arrive on the bus much faster than it can be written to the flash devices.

In case of write latency, the PCI-E based device is somewhat better, and the performance is also more consistent compared to the case where we were using 4 eXFlash DIMMs. When using 8, similarly to the case where reads were tested in isolation, the throughput increases and the latency drops.

Mixed

In the mixed workload (which is quite relevant to some databases), the eXFlash DIMM shines. The throughput in case of 8 DIMMs can go as high as 1.1 GB/s for reads, while it’s doing roughly 750 MB/s writes.

In case of the mixed workload, the latency of the eXFlash DIMMs is lower and far more consistent (the more DIMMs we use, the more consistent it will be).

Synchronous IO

Reads

In case of synchronous IO, we reached peak throughput at 32 threads with the PCI-E device, but needed 64 and 128 threads for the eXFlash DIMM_4 and eXFlash DIMM_8 configurations respectively (in both cases, the throughput at 32 threads was already higher than the PCI-E device’s).

The read latency degrades much more gradually in case of the eXFlash DIMMs, which makes them very suitable for workloads like linkbench, where buffer pool misses are more frequent than writes (many large web applications have this kind of read-mostly, database-doesn’t-fit-in-memory profile). In order to see the latency differences at a lower number of threads, this graph has to be zoomed in a bit. The following graph is the same (synchronous read latency), but the y axis has a maximum value of 10 ms.

The maximum value is hit at 256 threads in case of the PCI-E device, and performance is already severely degraded at 128 threads (which is the optimal degree of parallelism for the eXFlash DIMM_8 configuration). Massively parallel, synchronous reads is a workload where the eXFlash DIMMs are exceptionally good.

Writes

Because of caching mechanisms, we don’t need too many writer threads in order to utilize the devices. This makes sense: writes are easy to cache. After a write, the device can tell the operating system that the write is completed, when in reality it is only completed later. Reads are different: when the application requests a block, the device can’t hand the data over until it has actually read it. Because of mechanisms like this, consistency is very important in case of write tests. The PCI-E device delivers the most consistent, but the lowest, throughput. The reason for this is NUMA’s effect on MCS performance. Because sysbench was allowed to use any CPU core, it may have gotten lucky by hitting a DIMM in the memory bank of the CPU it was currently running on, or unlucky, in which case QPI was involved in getting the data. Neither case is rarer than the other, hence the wider bars on the graphs.
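One hypothetical way to isolate that NUMA effect (not something done in this benchmark) would be to pin sysbench and its memory to a single socket with numactl and compare the sockets against each other, for example:

numactl --cpunodebind=0 --membind=0 sysbench --test=fileio --file-total-size=360G --file-test-mode=rndwr --max-time=1800 --max-requests=0 --num-threads=16 --rand-init=on --file-num=64 --file-io-mode=sync --file-extra-flags=direct --file-fsync-freq=0 --file-block-size=16384 --report-interval=1 run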

The latency is better in case of eXFlash DIMMs, but not as much as in case of isolated reads. It only seems to be slightly lower in case of eXFlash DIMM_4, so let’s do a similar zoom here as we did in case of reads.

It’s visible on the graph that at lower concurrency the eXFlash DIMM_4 and FusionIO measurements are roughly the same, but as the workload becomes more parallel, the eXFlash DIMM degrades more gradually. This more gradual degradation is also present in the eXFlash DIMM_8 case, but it starts to degrade later.

Mixed

In case of mixed throughput, the PCI-E device similarly shows more consistent, but lower, performance. Using 8 DIMMs means a significant performance boost: the 1.1 GB/s reads and 750 MB/s writes seen with asynchronous IO are doable with 256 threads and above.

Similarly to the previous cases, the response times are slightly better when using eXFlash DIMMs. Because the latency at 1024 threads is quite high in each case, a similar zoom to the previous ones will show more detail.

At low concurrency, the latency is better on the PCI-E device, at least the maximum latency. We can see some fluctuation in case of the eXFlash DIMMs. The reason for that is NUMA: with a single thread, which CPU owns the storage the benchmark is accessing matters a lot. As the concurrency gets higher, the response time degrades in case of the PCI-E flash device, but in the case of the eXFlash DIMMs, the maximum remains the same and the response time becomes more consistent. At 32 threads, the mean latencies of the devices are very close, with the PCI-E device being slightly better. From there on, the eXFlash DIMMs degrade more gracefully.

Conclusion

Overall, I am impressed, especially with the read latency at massive parallelism. So far this is the fastest MLC device we have tested. I would be curious how 16 or 32 DIMMs would perform in a machine with enough sockets and memory channels for that kind of configuration. Just as PCI-E based flash moved storage one step closer to the CPU, MCS moves it one step closer still.

The post Benchmarking IBM eXFlash™ DIMM with sysbench fileio appeared first on MySQL Performance Blog.
