Latest MySQL Performance Blog posts
Percona Server and Percona XtraDB Cluster provide high-performance alternatives for MySQL users. We have also seen rapidly growing interest in these solutions in the OpenStack community where higher performance and high availability are crucial. Many OpenStack users are adopting these solutions but we’ve also seen demand from companies creating OpenStack distros. For example, Percona XtraDB Cluster is now certified for the RHEL OSP (OpenStack Platform) and is included in the Ubuntu 14.04 LTS release. Tesora recently joined this rapidly growing list when they announced the Tesora DBaaS Platform Enterprise Edition which includes Percona Server.
The Tesora platform is an enterprise-ready database as a service (DBaaS) platform based on the OpenStack Trove project. Percona Server was certified by Tesora in August as part of their Tesora DBaaS Platform certification program.
What does this mean for you if you are one of the users who have downloaded Percona Server more than 1,000,000 times?
Many enterprises are evaluating how to deliver robust, high-performance MySQL-as-a-service. OpenStack Trove is an open-source platform that addresses this challenge. OpenStack operators can build and offer multiple databases as a service (such as MySQL, MongoDB, or Redis). Their users can create, operate, maintain and delete different kinds of SQL or NoSQL databases on demand. Users don’t need to worry about the administrative details of the database technology or complexities such as availability, resiliency, backups, recovery and security.
The Tesora DBaaS Platform Enterprise Edition builds upon OpenStack Trove and makes the “provisioning, deployment, configuration, tuning, monitoring and administration” simpler for operators. The platform includes support for multiple database backends including Percona Server which is certified on the Tesora DBaaS platform. Administrators and DBAs looking to build and offer a database as a service using the Tesora DBaaS Platform Enterprise Edition can rest assured that Percona Server is ready to meet their needs.
OpenStack is a rapidly evolving open-source platform that depends heavily on MySQL for optimal performance. Percona is participating in the Trove project and is sharing our knowledge on OpenStack through webinars (such as “OpenStack: A MySQL DBA Perspective“), blog posts (such as “OpenStack users shed light on Percona XtraDB Cluster deadlock issues“), professional services (such as Percona Consulting Services) and the recently announced OpenStack Live Conference which will be held April 13-14 2015 at the Santa Clara Convention Center in Silicon Valley (the Call for Speakers is open until November 9!).
We look forward to working with Tesora as they build out their platform as well as the rest of the OpenStack community as this exciting technology continues to mature. I hope you can join us in Santa Clara for the OpenStack Live Conference this April – submit your speaker proposal now or purchase your ticket at Super Saver prices!
The post Percona Server, OpenStack and the Tesora DBaaS Platform appeared first on MySQL Performance Blog.
This Wednesday I’ll be discussing two common types of big data: machine-generated data and user-generated content. These types of big data are amenable to sharding, a commonly used technique for spreading data over more than one database server.
I’ll be discussing this in-depth during a live webinar at 10 a.m. Pacific time on Sept. 24. I’ll also talk about two major sharding frameworks: MySQL Fabric and Shard-Query for OLTP or OLAP workloads, respectively. Following the webinar there will be a brief Q/A session.
Find the webinar link here: “How to Scale Big Data Applications Using MySQL Sharding Frameworks” for more information or register directly here.
The post How to scale big data applications using MySQL sharding frameworks appeared first on MySQL Performance Blog.
Interest in Percona XtraDB Cluster / Galera has been high ever since we introduced the product in 2012. I typically have a conversation about Galera and Percona XtraDB Cluster (PXC) at least once a week with a consulting customer who wants to know if it will be a good fit for their application. Last week I gave a webinar entitled “Migrating to Percona XtraDB Cluster.”
In the webinar I covered everything I feel is important for someone considering Galera to know, and I’d suggest that anyone who wants a brief overview of PXC/Galera spend an hour watching the recording. Many questions were asked during the webinar, and I answered all of those regarding Percona XtraDB Cluster. Access to the webinar is free, along with a download of the accompanying slides.
* * *
What is Percona XtraDB Cluster? PXC is a replacement for conventional MySQL master/slave architectures to eliminate replication lag and achieve a highly-available masterless cluster of MySQL servers. Like all Percona software, PXC is open source and free.
MySQL upgrades are necessary tasks and we field a variety of questions here at Percona Support regarding MySQL upgrade best practices. This post highlights recommended ways to upgrade MySQL in different scenarios.
Why are MySQL upgrades needed? The reasons are many and include access to new features, performance benefits and bug fixes. However, MySQL upgrades can be risky if not tested extensively beforehand with your application, because the upgrade might break the application, prevent it from functioning properly, or introduce performance issues. Moreover, I suggest keeping an eye on new releases of MySQL and Percona Server – check what has changed in the most recent version. Perhaps the latest release has a fix for an issue that you have been experiencing.
Upgrading one major version via SQL Dump:
Upgrading by one major version covers upgrading from Percona Server 5.1 to 5.5 or Percona Server 5.5 to 5.6, and the same applies to Oracle MySQL.
First of all, upgrading by one major version is neither straightforward nor risk-free. Start by reading the “Upgrading from Previous Series” documentation here and here. In that documentation, please pay special attention to all of the sections marked “Incompatible Change” and check whether you may be affected by those changes. There might be configuration changes as well: variables renamed, a few older variables made obsolete and new variables introduced – so make sure that you adjust your my.cnf accordingly. For Percona Server specific changes please refer here and here for Percona Server 5.5 & Percona Server 5.6, respectively.
Now there are several possible approaches you may take, where one may be more feasible than the other depending on the current replication topology and total data size – and one might also be safer than another. Let me show you an upgrade procedure… an example upgrading from Percona Server 5.5 to Percona Server 5.6.
In general, there are two types of MySQL upgrades:
- In place, where you use existing datadir against the new MySQL major version, with just running mysql_upgrade after binaries are upgraded,
- SQL dump on an old version and then restore it on a new version (using mysqldump utility or alternatives, like mydumper).
In general the second type is safer but, as you may expect, it is a much slower MySQL upgrade process.
Theoretically, the safest scenario is:
- Dump all user grants (using http://www.percona.com/doc/percona-toolkit/2.2/pt-show-grants.html)
- Dump all data (except the mysql database) from MySQL 5.5 into SQL dump and restore on MySQL 5.6
- Restore the user grants
Here’s a basic procedure (you should stop application writes before starting).
1) Capture users and permissions information. This will back up all your existing user privileges.
$ wget percona.com/get/pt-show-grants
$ perl pt-show-grants --user=root --ask-pass --flush > /root/grants.sql
2) Produce a logical dump of the 5.5 instance, excluding the mysql, information_schema and performance_schema databases:
$ mysql -BNe "SELECT SCHEMA_NAME FROM INFORMATION_SCHEMA.SCHEMATA WHERE SCHEMA_NAME NOT IN ('mysql', 'performance_schema', 'information_schema')" | tr '\n' ' ' > /root/dbs-to-dump.sql
$ mysqldump --routines --events --single-transaction --databases $(cat /root/dbs-to-dump.sql) > /root/full-data-dump.sql
3) Stop the 5.5 instance.
$ service mysql stop
or
$ /etc/init.d/mysql stop
4) Move the old datadir (assuming /var/lib/mysql; adjust to your setup):
$ mv /var/lib/mysql/ /var/lib/mysql-55
5) Install 5.6 (simply as you would do when not upgrading). If you don’t use a package manager (yum/apt-get) then it is likely that you need to run mysql_install_db and mysql_upgrade.
6) Load the users back into the newly upgraded version of MySQL.
$ mysql -uroot < /root/grants.sql
7) Load the data back into the new version of MySQL.
$ mysql -e "SET GLOBAL max_allowed_packet=1024*1024*1024"
$ mysql -uroot -p --max-allowed-packet=1G < /root/full-data-dump.sql
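The schema filtering in step 2 can be sketched in a few lines of Python, which is handy when the dump is driven from a script rather than typed by hand. This is only an illustration: the function names and the sample schema list are hypothetical, and the mysqldump options are the ones used in the procedure above.

```python
import shlex

# Schemas that the procedure above leaves out of the logical dump.
SYSTEM_SCHEMAS = {"mysql", "information_schema", "performance_schema"}


def user_databases(all_schemas):
    """Return the schemas to dump, preserving their original order."""
    return [name for name in all_schemas if name not in SYSTEM_SCHEMAS]


def mysqldump_command(all_schemas, outfile="/root/full-data-dump.sql"):
    """Build the mysqldump command line from step 2 as a single string."""
    databases = user_databases(all_schemas)
    if not databases:
        raise ValueError("no user databases to dump")
    args = ["mysqldump", "--routines", "--events",
            "--single-transaction", "--databases", *databases]
    # shlex.quote protects schema names that contain shell metacharacters.
    return " ".join(shlex.quote(a) for a in args) + " > " + shlex.quote(outfile)


if __name__ == "__main__":
    # Hypothetical schema list, as SELECT SCHEMA_NAME ... might return it.
    print(mysqldump_command(["information_schema", "mysql", "shop", "blog"]))
```

Building the list in code rather than with tr and cat makes it easier to log exactly which databases went into the dump.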
At this point all tables have been re-created and loaded in MySQL 5.6 so every binary representation is native to MySQL 5.6. You’ve also completed the cleanest, most stable upgrade path and your application can resume service. It’s worth mentioning that this upgrade path is the same whether you are upgrading vanilla MySQL or Percona Server. Further, you may upgrade from Oracle MySQL to Percona Server, for example from Oracle MySQL 5.5 to Percona Server 5.6. Again, the upgrade path described is the same, because Percona Server is a drop-in replacement for Oracle MySQL.
“SQL dump” is also known as a logical backup. It is safer in the sense that when restoring, all tables will be created using the format of the new MySQL binaries you’re using, which bypasses compatibility issues in general. Still, for large data sets, measured in gigabytes or terabytes, this may be a very time-consuming approach. On the other hand, by dumping and reloading such large data sets, it is possible that you will be able to recover a lot of free space on the disk as the InnoDB tablespaces will be re-created from scratch, thus optimized and defragmented. If the data was often updated or deleted, the benefits may be significant.
Minor version MySQL upgrade within the same major version via In-Place Upgrade:
This applies to upgrading within the same series, e.g. MySQL 5.5.35 to MySQL 5.5.38 or Percona Server 5.6.14 to the latest Percona Server 5.6.20.
This is known as an in-place upgrade, where you just install a newer binary package and then run mysql_upgrade script, which checks and updates system tables if necessary. Still, with the in-place upgrade we highly recommend checking release notes for new features, bug fixes, etc. For Percona Server 5.5 and Percona Server 5.6, release notes can be found here and here respectively.
For Percona Server we have additional documents describing some details when it comes to upgrading Percona Server with a focus on Percona-specific features that can be found here and here. This also covers complete In-Place Upgrade procedure with the yum/apt package manager.
Also, to be on the safe side, you can do the upgrade with a logical dump using the procedure described earlier, via the mysqldump or mydumper program. The latter performs parallel backup and restore. A logical backup is the safest approach for the upgrade.
MySQL Upgrade directly to the latest version by skipping one major version in between:
This includes upgrading from MySQL 5.0 to MySQL 5.5 by skipping version 5.1 in between or upgrading MySQL 5.1 to MySQL 5.6 by skipping version 5.5 in between. Further, this also includes upgrading to MySQL 5.6 directly from MySQL 5.0, although there should be very few users still using MySQL version 5.0. This also applies to Percona Server.
For the topic, we would assume upgrading from Oracle MySQL or Percona Server 5.1 directly to version 5.6 by skipping one major version 5.5 in between.
Before anything, this is a serious upgrade, and a huge step over one major MySQL version. That is, it’s risky. Upgrading by just updating the binaries is not supported, and it’s not safe to skip major versions in between, so you should never do this for 5.0->5.5, 5.1->5.6, and surely not for 5.0->5.6. One problem is that not all changes in MySQL versions are backwards compatible. Some differences were introduced that may affect both how the data is handled and how the server behaves, including the SQL language and the internals of the MySQL server and storage engines. Another is that between MySQL 5.0 and 5.6 a number of default settings were changed, which may result in completely different, unexpected behavior. For example, since MySQL 5.5 the default storage engine is InnoDB, and since MySQL 5.6 InnoDB by default uses a separate tablespace for each table; GTID replication was also introduced in 5.6. But there are many more details which I won’t list here. All of those changes are described in the “Upgrading from Previous Series” documentation mentioned above.
It’s worth repeating that upgrading by skipping one major version is strongly discouraged. Upgrading from MySQL 5.1 to 5.6 shouldn’t be done in one shot. Instead, I would suggest upgrading from version 5.1 to 5.5 and then from version 5.5 to 5.6 and running mysql_upgrade at each step. That will cope with the changes in formats as explained in the manual.
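The step-by-step rule can be expressed as a small planning helper. The sketch below is illustrative only: the series list covers just the versions named in this post, and the function name is made up for the example.

```python
# Major series discussed in this post, in release order (assumed list).
MAJOR_SERIES = ["5.0", "5.1", "5.5", "5.6"]


def upgrade_steps(current, target):
    """Return (from, to) pairs that never skip a major series.

    Raises ValueError for an unknown series or a downgrade request.
    """
    start = MAJOR_SERIES.index(current)  # ValueError if the series is unknown
    end = MAJOR_SERIES.index(target)
    if end < start:
        raise ValueError("target series is older than the current one")
    path = MAJOR_SERIES[start:end + 1]
    # Pair each series with its successor: 5.1 -> 5.5, then 5.5 -> 5.6, ...
    return list(zip(path, path[1:]))


if __name__ == "__main__":
    for old, new in upgrade_steps("5.1", "5.6"):
        print(f"upgrade {old} -> {new}, then run mysql_upgrade")
```

Each pair is one full upgrade, with mysql_upgrade run before moving on to the next.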
MySQL Upgrade Precautions:
MySQL upgrade precautions are an essential part of the upgrade itself. Before you upgrade make sure you have thoroughly tested all application parts with the desired version of MySQL. This is especially needed for an upgrade between major versions or if you are upgrading by skipping one major version in-between (e.g. upgrade from MySQL 5.1 to MySQL 5.6).
Make sure you read release notes carefully and that you are aware of all the changes. You can find Oracle MySQL 5.5 and 5.6 release notes as follows:
While Percona Server specific release notes can be found below for same versions as described above.
If you are planning to upgrade to Oracle MySQL 5.6 or Percona Server 5.6 I would recommend first checking for existing critical bugs. Bugs you should be aware of:
This is yet another important aspect of any MySQL upgrade. You should plan your upgrade along with an upgrade hierarchy. This is always recommended: upgrade your dev/QA servers first, then staging servers, before moving to production. In fact, you can set aside spare instances running the desired upgraded version of MySQL and then test your application extensively.
Once you are happy with the MySQL upgrade on your test servers, staging servers, etc., then you can begin the MySQL upgrade on your production servers. In replication environments we highly recommend upgrading the MySQL slaves first (one by one) and then finally upgrading the MySQL master. In reality, you can upgrade one of the slaves first and run it for a few days to be on the safe side – all the while closely monitoring its performance. If you don’t have a replication setup it may be worth creating a replica to test the newer version of MySQL on it first. Once you are happy with the results you can upgrade the remaining slaves and finally the master.
How Percona software helps you in a MySQL upgrade:
In any MySQL upgrade, Percona Toolkit comes to the rescue. Percona Toolkit contains a number of tools that help a great deal.
pt-upgrade is one such tool. It allows you to test whether the new MySQL instance handles specific queries at least as fast as the old version. There may be some substantial differences, as the MySQL query optimizer has changed a lot between versions 5.1 and 5.6 and data statistics may also be refreshed, hence the query plan may change. You can check further in the manual about optimizer changes.
pt-query-digest is another great tool that might help you in the upgrade. You can replay your slow query log against existing and new desired MySQL versions for before and after query performance validation.
You can also benefit from Percona Cloud Tools for MySQL, which is a hosted service providing access to query performance insights for all MySQL users. You can sign up for free now because this service is in public beta. Percona Cloud Tools, among other things, allows you to visually check your queries’ performance after a MySQL upgrade.
It’s highly recommended to back up your data before your MySQL upgrade. Percona XtraBackup is free and open source (like all Percona software). It’s a hot backup tool which backs up your data online without sacrificing the database’s read/write availability, and it does so with minor impact.
Last but not least, you will find this post pretty useful, too: “Upgrading MySQL.” It’s a few years old but still very relevant. And also take a look at this informative webinar, “Upgrading to MySQL 5.6: Best Practices.” Both are from Percona CEO Peter Zaitsev.
A MySQL upgrade might look like a simple task – but actually it’s not. I’ve tried to cover most of the MySQL upgrade scenarios that you will encounter in this post. Again, I recommend testing your application before pushing the upgrade “live,” otherwise the upgrade may break your application or part of it – or may degrade performance rather than improve it. Finally, I recommend having a downgrade plan in place before the MySQL upgrade just in case something goes wrong. Planning a proper downgrade procedure will minimize your app downtime when things go wrong. I’m looking forward to your comments and questions below.
I need to store exponentially increasing amounts of data and analyze all of it in real-time.
This is also known simply as: “We have big data.” Typically, this data is used for user interaction analysis, ad tracking, or other common click stream applications. However, it can also be seen in threat assessment (DDoS mitigation, etc.), financial forecasting, and other applications as well. While MySQL (and other OLTP systems) can handle this to a degree, it is by no means their forte. Some of the pain points include:
- Cost of rapidly increasing, expensive disk storage (OLTP disks need to be fast == $$)
- Performance decrease as the data size increases
- Wasted hardware resources (excess I/O, etc)
- Impact against other time-sensitive transactions (i.e. OLTP workload)
While there are many approaches to this problem – and often times, the solution is actually a hybrid of many individually tailored components – a solution that I have seen more frequently in recent work is HP Vertica.
At the 30,000 foot overview, Vertica is built around the following principles:
- Columnar data store
- Highly compressed data
- Clustered solution for both availability and scalability
Over the next few weeks, I’ll discuss several aspects of Vertica including:
- Underlying architecture and concepts
- Basic installation and use
- Different data loading techniques
- Some various maintenance/operational procedures
- Some comparisons vs. traditional OLTP (MySQL) performance
- Some potential use-cases
- Integration with other tools (such as Hadoop)
While Vertica is by no means the silver bullet that will solve all of your needs, it may prove to be a very valuable tool in your overall approach to managing big data.
I recently encountered a situation in which, after running Percona Toolkit’s pt-table-checksum on a customer system, 95% of a table on the MySQL master was different on the MySQL slave. Although this table was not a critical part of the infrastructure, from time to time writes to the table from the master would break replication. Additionally, this table has about 6 million rows, and running pt-table-sync would take some time. Rebuilding the slave from a backup of the master would not be an easy option either, since the slave acts as an archive and holds a lot more data than the master.
So how did we solve it? With pt-online-schema-change and a NOOP ALTER.
pt-online-schema-change --alter 'ENGINE=INNODB' D=dbname,t=tblname
How is it possible? pt-online-schema-change works by creating a shadow copy of the original table and then copying the rows from the original into the shadow copy. An additional set of TRIGGERs also ensures that any changes made to rows after they have been copied will propagate to the shadow version.
There is one small caveat, though: the binlog_format on the master has to be ROW so that the actual row images from the master are copied to the slave. If your binlog_format is set to STATEMENT, you’d still end up with the same inconsistency: since the changes are logged as statements, the NOOP ALTER will operate on the slave’s own table instead of copying rows from the master. To make this easier, you can configure the binlog_format for the alter process via the --set-vars option, i.e. --set-vars 'binlog_format=ROW'.
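Putting the NOOP ALTER and the binlog_format override together, the full command line might be assembled as in the sketch below. The helper name is hypothetical. The --dry-run and --execute switches are not part of the command shown in the post; they are assumed here from the pt-online-schema-change documentation, so check them against your Percona Toolkit version before relying on them.

```python
import shlex


def noop_alter_command(database, table, engine="INNODB", execute=False):
    """Build a pt-online-schema-change command that rebuilds a table in place.

    binlog_format is forced to ROW for the tool's session so that row images,
    not statements, reach the slave.
    """
    args = [
        "pt-online-schema-change",
        "--alter", f"ENGINE={engine}",
        "--set-vars", "binlog_format=ROW",
        # Assumed switches: rehearse with --dry-run, run for real with --execute.
        "--execute" if execute else "--dry-run",
        f"D={database},t={table}",
    ]
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    print(noop_alter_command("dbname", "tblname"))
    print(noop_alter_command("dbname", "tblname", execute=True))
```

Defaulting to the dry run is a deliberate choice here: the rebuild rewrites every row, so it is worth a rehearsal first.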
The post Syncing MySQL slave table with pt-online-schema-change appeared first on MySQL Performance Blog.
Percona Live London 2014 is fast approaching – November is just around the corner. This year’s conference, November 3-4, will be even bigger and better than last year thanks to the participation of leading MySQL experts the world over (including you!).
The Percona Live London MySQL Conference is a great event for users of any level using any of the major MySQL branches: MySQL, MariaDB or Percona Server. And this year we once again host a star-studded group of keynote speakers from industry-leading companies in the MySQL space.
We’ll also be welcoming leading MySQL practitioners from across the industry (and from all corners of the world) who will speak on topics that matter to you now – see the full conference schedule here:
Monday starts early with a full day of tutorials and a fun evening at the community dinner. Attendees will be arriving in true London style on a double-decker bus! Tuesday morning will kick-off with a series of keynotes followed by interactive breakout sessions – wrapping things up at the end of the day with a fun post-conference reception (a great chance to make new friends and reconnect with old ones).
Here’s a sneak peek at some of the must-see events this year:
- Oracle’s Tomas Ulin will speak on “MySQL: It Just Keeps Getting Better”
- Facebook’s Shlomo Priymak will explain “MySQL Automation at Facebook Scale”
- I’ll update everyone on the “Ever Evolving MySQL Landscape”
- Booking.com’s Jean-François Gagné will discuss “High Availability, Disaster Recovery and Extreme Read Scaling using Binlog Servers”
- Continuent’s Robert Hodges will talk about how “MySQL Comes of Age”
- Codership’s Alexey Yurchenko will make a “Synchronous Revelation”
- Tim Callaghan of Tokutek will share “Performance Benchmarking: Tips, Tricks, and Lessons Learned”
- Baron Schwartz of VividCortex will lay out a blueprint for “Building a Time-Series Database on MySQL”
- OlinData’s Walter Heck will walk you through “Moving a MySQL infrastructure with 130k QPS to Galera”
To recap, here are the Top Ten reasons to attend Percona Live London this November 3-4:
10. Advanced Rate Pricing ends October 5th
9. Hear about the hottest current topics and trends.
8. Network! Meet face-to-face in the “hallway track” and make lasting connections.
7. Learn how to make MySQL work better for you – regardless of your expertise.
6. Have a blast at the community dinner!
5. Discuss your unique challenges with experts and discover options for solving them.
4. Engage with the sponsors at their tabletop exhibits.
3. Listen to top industry leaders describe the future of the MySQL ecosystem
2. Learn what works, and what doesn’t, from leading companies using MySQL
1. And the Number 1 reason to attend Percona Live London 2014: ALL of the above!
I look forward to seeing you in London this November, and don’t forget that Advanced Rate Pricing ends October 5, so be sure to register now!
The post Percona Live London: Top Ten reasons to attend Nov. 3-4 appeared first on MySQL Performance Blog.
I am proud to announce OpenStack Live, a new annual conference that will run in parallel with the Percona Live MySQL Conference & Expo at the Santa Clara Convention Center in Silicon Valley. The inaugural event, OpenStack Live 2015, is April 13-14, 2015. We are lining up a strong Conference Committee and are now accepting tutorial and breakout session speaking proposals through November 9.
OpenStack Live will emphasize the essential elements of making OpenStack work better with emphasis on the critical role of MySQL and the value of Trove. You’ll hear about the hottest current topics, learn about operating a high-performing OpenStack deployment, and listen to top industry leaders describe the future of the OpenStack ecosystem. We are seeking speaking proposals on the following topics:
- Performance Optimization of OpenStack
- OpenStack Operations
- OpenStack Trove
- Replication and Backup for OpenStack
- High Availability for OpenStack
- OpenStack User Stories
- Monitoring and Tools for OpenStack
The conference features a full day of keynotes, breakout sessions, and Birds of a Feather sessions on April 14, preceded by an optional day of tutorials on April 13. A Monday reception will be held on the exhibit floor, and lunches shared between the two conferences offer you the opportunity to network with both the OpenStack and MySQL communities. The OpenStack Live conference is a great event for users of any level.
As a bonus, OpenStack Live attendees may attend any Percona Live MySQL Conference session during the days of the OpenStack event. Conference only passes are available for April 14 and conference and tutorial passes are available for both April 13 and 14.
If you are using OpenStack and have a story to share – or a skill to teach – then now is the time to put pen to paper (or fingers to keyboard) and write your speaking proposal for either breakout or tutorial sessions (or both). Submissions will be reviewed by the OpenStack Live Conference Committee, which includes:
- Mark Atwood: Director – Open Source Evangelism for HP Cloud Services
- Rich Bowen: OpenStack Community Liaison at Red Hat
- Jason Rouault: Senior Director OpenStack Cloud at Time Warner Cable
- Peter Boros: Principal Architect at Percona
Presenting at OpenStack Live 2015 is your chance to put your ideas, case studies, best practices and technical knowledge in front of an intelligent, engaged audience of OpenStack users. If selected as a speaker by our Conference Committee, you will receive a complimentary full conference pass.
Public speaking not your thing or just want to learn about the latest and greatest OpenStack technologies, deployments and projects? Then register now and save big with our early bird discount. OpenStack Live 2015 is an ideal opportunity for organizations to connect with the community of OpenStack enthusiasts from Silicon Valley and around the world. The Percona Live MySQL Conference this past April had over 1,100 registered attendees from 40 countries and the OpenStack Open Source Appreciation Day on the Monday before the conference was fully booked so don’t delay, register today to save your seat!
We are currently accepting sponsors. You can learn more about sponsorship opportunities here.
I hope to see you at OpenStack Live 2015 next April! And speakers, remember the deadline to submit your proposals is November 9. In the meantime you can learn more by visiting the official OpenStack Live 2015 website.
The post OpenStack Live 2015: Call for speakers open through November 9 appeared first on MySQL Performance Blog.
Percona XtraBackup enables backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, Percona XtraBackup drives down backup costs while providing unique features for MySQL backup.
- Percona XtraBackup has implemented support for Galera GTID auto-recovery. Percona XtraBackup retrieves the GTID information, after backing up a server with backup locks support, from the InnoDB trx header on recovery and creates the xtrabackup_galera_info during that stage.
- Percona XtraBackup is now built with system zlib library instead of the older bundled one. Bug fixed #1108016.
- apt-get source was downloading older version of Percona XtraBackup. Bug fixed #1363259.
- innobackupex would ignore the --databases option when the --stream option was not used, and back up all the databases. Bug fixed #569387.
- rsync package wasn’t a dependency although it is required for the innobackupex --rsync option. Bug fixed #1259436.
- innobackupex --galera-info was checking only for non-capitalized wsrep_* status variables which was incompatible with MariaDB Galera Cluster 10.0. Bug fixed #1306875.
- Percona XtraBackup would crash trying to remove absent table from InnoDB data dictionary while preparing a partial backup. Bug fixed #1340717.
- Percona XtraBackup now supports MariaDB GTID. Bugs fixed #1329539 and #1326967 (Nirbhay Choubey).
- MariaDB 10.1 is now added to the list of supported servers. Bug fixed #1364398.
- Percona XtraBackup would fail to restore (copy-back) tables that have partitions with their own tablespace location. Bug fixed #1322658.
I’ll have the pleasure to present, next Wednesday, September 17 at 10 a.m. PDT (1 p.m. EDT) a webinar titled “OpenStack: A MySQL DBA Perspective.” Everyone is invited.
The webinar will be divided into two parts. The first part will cover how MySQL can be used by the OpenStack infrastructure including the expected load, high-availability solutions and geo-DR.
The second part will focus on the use of MySQL within an OpenStack cloud. We’ll look into the various options that are available, the traditional ones and Trove. We’ll also discuss the block device options with regard to MySQL performance and, finally, we’ll review the high-availability implications of running MySQL in an OpenStack cloud.
Register here. I look forward to your questions, and if you have any related to OpenStack that I can help with in advance of the webinar please feel free to post those in the comments section below. I’ll write a followup post after the webinar to recap all related questions and answers. I’ll also provide the slides.
See you next Wednesday!
The post OpenStack: A MySQL DBA Perspective – Sept. 17 webinar appeared first on MySQL Performance Blog.
I was fortunate to attend an Ops discussion about databases at the OpenStack Summit Atlanta this past May as one of the panelists. The discussion was about deadlock issues OpenStack operators see with Percona XtraDB Cluster (of course this is applicable to any Galera-based solution). I asked them to describe what they were seeing, and as it turned out, nova and neutron use the SELECT … FOR UPDATE SQL construct quite heavily. This is a topic I thought was worth writing about.
Write set replication in a nutshell (with oversimplification)
Any node is writable, and replication happens in write sets. A write set is practically a row based binary log event or events and “some additional stuff.” The “some additional stuff” is good for 2 things.
- Two write sets can be compared and told if they are conflicting or not.
- A write set can be checked against a database if it’s applicable.
Before committing on the originating node, the write set is transferred to all other nodes in the cluster. The originating node checks that the transaction is not conflicting with any of the transactions in the receive queue and checks if it’s applicable to the database. This process is called certification. After the write set is certified the transaction is committed. The remote nodes will do certification asynchronously compared to the local node. Since the certification is deterministic, they will get the same result. Also the write set on the remote nodes can be applied later because of this reason. This kind of replication is called virtually synchronous, which means that the data transfer is synchronous, but the actual apply is not.
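To make the certification idea concrete, here is a deliberately simplified toy model in Python. It is not Galera’s implementation: the class names, the use of plain sets of row keys and the single global sequence counter are all assumptions made for the illustration. It shows only the property described above: the check depends on nothing but the ordered stream of write sets, so every node that processes the same stream reaches the same verdicts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteSet:
    keys: frozenset   # rows the transaction modified, e.g. ("t", 1)
    last_seen: int    # last sequence number visible when the transaction ran


class Certifier:
    """Toy certification: reject a write set that touches rows changed by
    another write set the transaction could not have seen."""

    def __init__(self):
        self.seqno = 0
        self.certified = []  # (seqno, keys) of write sets that passed

    def certify(self, ws):
        self.seqno += 1  # every write set takes a slot in the global order
        for seqno, keys in self.certified:
            if seqno > ws.last_seen and keys & ws.keys:
                return False  # conflict: the transaction must roll back
        self.certified.append((self.seqno, ws.keys))
        return True


if __name__ == "__main__":
    stream = [
        WriteSet(frozenset({("t", 1)}), last_seen=0),
        WriteSet(frozenset({("t", 1)}), last_seen=0),  # same row, same snapshot
        WriteSet(frozenset({("t", 4)}), last_seen=0),
    ]
    node1, node2 = Certifier(), Certifier()
    print([node1.certify(ws) for ws in stream])
    print([node2.certify(ws) for ws in stream])
```

Because both nodes process the same stream in the same order, they print identical verdicts, which is why the remote nodes can certify and apply later without another round trip.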
We have a nice flowchat about this.
Since the write set is only transferred before commit, InnoDB row level locks, which are held locally, are not held on remote nodes (if these were escalated, each row lock would take a network round trip to acquire). This also means that by default if multiple nodes are used, the ability to read your own writes is not guaranteed. In that case, a certified transaction, which is already committed on the originating node can still sit in the receive queue of the node the application is reading from, waiting to be applied.
SELECT … FOR UPDATE
The SELECT … FOR UPDATE construct reads the given records in InnoDB, and locks the rows that are read from the index the query used, not only the rows that it returns. Given how write set replication works, the row locks of SELECT … FOR UPDATE are not replicated.
Putting it together
Let’s create a test table.
    CREATE TABLE `t` (
      `id` int(11) NOT NULL AUTO_INCREMENT,
      `ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1
And some records we can lock.
    pxc1> insert into t values();
    Query OK, 1 row affected (0.01 sec)
    pxc1> insert into t values();
    Query OK, 1 row affected (0.01 sec)
    pxc1> insert into t values();
    Query OK, 1 row affected (0.01 sec)
    pxc1> insert into t values();
    Query OK, 1 row affected (0.00 sec)
    pxc1> insert into t values();
    Query OK, 1 row affected (0.01 sec)
    pxc1> select * from t;
    +----+---------------------+
    | id | ts                  |
    +----+---------------------+
    |  1 | 2014-06-26 21:37:01 |
    |  4 | 2014-06-26 21:37:02 |
    |  7 | 2014-06-26 21:37:02 |
    | 10 | 2014-06-26 21:37:03 |
    | 13 | 2014-06-26 21:37:03 |
    +----+---------------------+
    5 rows in set (0.00 sec)
On the first node, lock the record.
    pxc1> start transaction;
    Query OK, 0 rows affected (0.00 sec)
    pxc1> select * from t where id=1 for update;
    +----+---------------------+
    | id | ts                  |
    +----+---------------------+
    |  1 | 2014-06-26 21:37:01 |
    +----+---------------------+
    1 row in set (0.00 sec)
On the second, update it with an autocommit transaction.
    pxc2> update t set ts=now() where id=1;
    Query OK, 1 row affected (0.01 sec)
    Rows matched: 1 Changed: 1 Warnings: 0
    pxc1> select * from t;
    ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
Let’s examine what happened here. The local record lock held by the started transaction on pxc1 didn’t play any part in replication or certification (replication happens at commit time, and there was no commit there yet). Once the node received the write set from pxc2, that write set had a conflict with a transaction still in-flight locally. In this case, our transaction on pxc1 has to be rolled back. This is a type of conflict as well, but here the conflict is not caught at certification time. This is called a brute force abort. This happens when a transaction applied by a slave thread conflicts with a transaction that’s in-flight on the node. In this case the first commit wins (which is the already replicated one) and the original transaction is aborted. Jay Janssen discusses multi-node writing conflicts in detail in this post.
The same thing happens when 2 of the nodes are holding record locks via SELECT … FOR UPDATE. Whichever node commits first will win; the other transaction will hit the deadlock error and will be rolled back. The behavior is correct.
Here is the same SELECT … FOR UPDATE transaction overlapping on the 2 nodes.
    pxc1> start transaction;
    Query OK, 0 rows affected (0.00 sec)
    pxc2> start transaction;
    Query OK, 0 rows affected (0.00 sec)
    pxc1> select * from t where id=1 for update;
    +----+---------------------+
    | id | ts                  |
    +----+---------------------+
    |  1 | 2014-06-26 21:37:48 |
    +----+---------------------+
    1 row in set (0.00 sec)
    pxc2> select * from t where id=1 for update;
    +----+---------------------+
    | id | ts                  |
    +----+---------------------+
    |  1 | 2014-06-26 21:37:48 |
    +----+---------------------+
    1 row in set (0.00 sec)
    pxc1> update t set ts=now() where id=1;
    Query OK, 1 row affected (0.01 sec)
    Rows matched: 1 Changed: 1 Warnings: 0
    pxc2> update t set ts=now() where id=1;
    Query OK, 1 row affected (0.00 sec)
    Rows matched: 1 Changed: 1 Warnings: 0
    pxc1> commit;
    Query OK, 0 rows affected (0.00 sec)
    pxc2> commit;
    ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
Where does this happen in OpenStack?
For example in OpenStack Nova (the compute project in OpenStack), tracking the quota usage uses the SELECT … FOR UPDATE construct.
    # User@Host: nova[nova] @ [10.10.10.11] Id: 147
    # Schema: nova Last_errno: 0 Killed: 0
    # Query_time: 0.001712 Lock_time: 0.000000 Rows_sent: 4 Rows_examined: 4 Rows_affected: 0
    # Bytes_sent: 1461 Tmp_tables: 0 Tmp_disk_tables: 0 Tmp_table_sizes: 0
    # InnoDB_trx_id: C698
    # QC_Hit: No Full_scan: Yes Full_join: No Tmp_table: No Tmp_table_on_disk: No
    # Filesort: No Filesort_on_disk: No Merge_passes: 0
    # InnoDB_IO_r_ops: 0 InnoDB_IO_r_bytes: 0 InnoDB_IO_r_wait: 0.000000
    # InnoDB_rec_lock_wait: 0.000000 InnoDB_queue_wait: 0.000000
    # InnoDB_pages_distinct: 2
    SET timestamp=1409074305;
    SELECT quota_usages.created_at AS quota_usages_created_at,
           quota_usages.updated_at AS quota_usages_updated_at,
           quota_usages.deleted_at AS quota_usages_deleted_at,
           quota_usages.deleted AS quota_usages_deleted,
           quota_usages.id AS quota_usages_id,
           quota_usages.project_id AS quota_usages_project_id,
           quota_usages.user_id AS quota_usages_user_id,
           quota_usages.resource AS quota_usages_resource,
           quota_usages.in_use AS quota_usages_in_use,
           quota_usages.reserved AS quota_usages_reserved,
           quota_usages.until_refresh AS quota_usages_until_refresh
    FROM quota_usages
    WHERE quota_usages.deleted = 0
      AND quota_usages.project_id = '12ce401aa7e14446a9f0c996240fd8cb'
    FOR UPDATE;
So where does it come from?
These constructs are generated by SQLAlchemy using with_lockmode('update'). Even in nova’s pydoc, it’s recommended to avoid with_lockmode('update') whenever possible. Galera replication is not mentioned among the reasons to avoid this construct, but knowing how many OpenStack deployments are using Galera for high availability (either Percona XtraDB Cluster, MariaDB Galera Cluster, or Codership’s own mysql-wsrep), it can be a very good reason to avoid it. The solution proposed in the linked pydoc above is also a good one: an INSERT INTO … ON DUPLICATE KEY UPDATE is a single atomic write, which will be replicated as expected, and it will also keep correct track of quota usage.
The simplest way to overcome this issue from the operator’s point of view is to use only one writer node for these types of transactions. This usually involves configuration change at the load-balancer level. See this post for possible load-balancer configurations.
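From the application's point of view, the error text itself ("try restarting transaction") points at the other option: re-run the whole transaction when it is aborted. The following is a minimal sketch of that idea. DeadlockError is a hypothetical stand-in for whatever exception your database driver raises for MySQL error 1213, and run_with_retry, attempts and backoff_seconds are illustrative names, not part of nova or SQLAlchemy.

```python
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for a driver exception carrying MySQL errno 1213."""
    errno = 1213


def run_with_retry(transaction, attempts=3, backoff_seconds=0.0):
    """Call transaction() again from the start if it is aborted with a deadlock error."""
    for attempt in range(1, attempts + 1):
        try:
            return transaction()
        except DeadlockError:
            if attempt == attempts:
                raise  # give up and let the caller see the error
            time.sleep(backoff_seconds * attempt)


# Demo: a transaction that is aborted once (as pxc2's commit was) and then succeeds.
calls = {"count": 0}


def flaky_transaction():
    calls["count"] += 1
    if calls["count"] == 1:
        raise DeadlockError("Deadlock found when trying to get lock; try restarting transaction")
    return "committed"


result = run_with_retry(flaky_transaction)
print(result, "after", calls["count"], "attempts")
```

The important detail is that the callable wraps the entire transaction, including the SELECT … FOR UPDATE, because the aborted transaction has been rolled back and its earlier reads can no longer be trusted.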
The post OpenStack users shed light on Percona XtraDB Cluster deadlock issues appeared first on MySQL Performance Blog.
There are a lot of tools that generate test data. Many of them have complex XML scripts or GUI interfaces that let you identify characteristics about the data. For testing query performance and many other applications, however, a simple quick and dirty data generator which can be constructed at the MySQL command line is useful.
First, let’s talk about what kind of data you can easily create with MySQL function calls:
You can generate a decimal number between zero and another number using the MySQL RAND() function like the following query (here between 0 and 10000):
    SELECT RAND() * 10000;
Similarly, you can generate a random integer by adding the FLOOR() function:
    SELECT FLOOR(RAND() * 10000);
You can generate a random string of 32 characters using MD5():
    SELECT MD5(RAND() * 10000);
You can return a random integer between 500 and 1000 with the following:
    SELECT FLOOR(500 + RAND() * (1000 - 500));
You can return a random string from a list of strings by using a table to hold the list. A subselect can select a random name from the list of names.
    create table names(id int auto_increment primary key, name varchar(20));
    -- one row per name, so each name gets its own id
    insert into names (name) values ('Justin'),('Jerry'),('James'),('Josh'),('Julien');
    -- ORDER BY RAND() LIMIT 1 always returns exactly one row
    select (select name from names order by rand() limit 1);
Now we can generate a “fact” table with many rows using fairly simple SQL statements.
First create a table to generate data into:
    CREATE TABLE fact (
      dim1 int,
      dim2 int,
      name varchar(20),
      hash varchar(32),
      measure1 double
    );
Seed the table with one initial row:
    INSERT INTO fact VALUES (1,1,'Justin',md5(''), .1);
Now grow the table by selecting from the table but providing new random values for the inserted rows:
    INSERT INTO fact
    SELECT FLOOR(1 + rand()*9999),
           FLOOR(1 + rand()*499),
           (select name from names order by rand() limit 1),
           MD5(1 + rand()*9999),
           rand()
    FROM fact;
As you repeat the INSERT … SELECT, the table will grow exponentially. You may want to add a LIMIT clause to the INSERT … SELECT to reduce the amount of data generated as the table grows.
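Since each unrestricted INSERT … SELECT doubles the row count, it is easy to estimate how many repetitions you need before reaching for LIMIT. A small sketch of that arithmetic follows; the function name repeats_needed is made up for illustration, and it assumes no LIMIT clause and the single seed row from above.

```python
def repeats_needed(target_rows, seed_rows=1):
    """Count how many doubling INSERT ... SELECT passes reach at least target_rows."""
    rows, passes = seed_rows, 0
    while rows < target_rows:
        rows *= 2      # every pass inserts one new row per existing row
        passes += 1
    return passes


passes_for_million = repeats_needed(1_000_000)
print(passes_for_million)
```

For one million rows this comes to 20 passes, which leaves the table at 1,048,576 rows; a LIMIT on the final pass trims the overshoot.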
You will create a table with an even data distribution for each column. You can then add some queries to add skew, either using INSERT … SELECT or UPDATE, for example:
    INSERT INTO fact SELECT 1,1,'Justin',md5(''), .1 FROM fact LIMIT 10000;
That will skew the values by creating many rows with the same data as our initial row.
Using these simple tools, you can generate a data set that is great for testing purposes. For example, dim1 might be a customer_id and dim2 a product_id, and you would populate those tables with 10000 and 500 rows, respectively.
A common migration path from standalone MySQL/Percona Server to a Percona XtraDB Cluster (PXC) environment involves some measure of time where one node in the new cluster has been configured as a slave of the production master that the cluster is slated to replace. In this way, the new cluster acts as a slave of the production environment – traditional replication takes care of getting the data into the cluster, and then Galera replication handles the intra-cluster traffic. This often works without issue, although there is one case that I’ve encountered recently where special care must be taken to properly configure the stream to ensure that replication does not break. If you use multi-valued inserts with auto-increment columns, then this post is for you.
For purposes of our discussion, assume that we have a basic 3-node PXC cluster that we’ve set up using the PXC Reference Architecture document, and that we’re replicating from an asynchronous master (call it “server A”) into one of the PXC nodes. Without loss of generality, we’ll pick PXC03. Also, for purposes of our discussion, we’ll be working with the following table definition:
    serverA> show create table auto_inc_test;
    CREATE TABLE `auto_inc_test` (
      `i` int(11) NOT NULL AUTO_INCREMENT,
      `stuff` varchar(20) DEFAULT NULL,
      PRIMARY KEY (`i`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1
    1 row in set (0.00 sec)
    serverA> SELECT * FROM auto_inc_test;
    Empty set (0.00 sec)
If we insert rows into this table one at a time, we have no issues.
    serverA> INSERT INTO auto_inc_test(stuff) VALUES ('first row');
    serverA> INSERT INTO auto_inc_test(stuff) VALUES ('second row');
    serverA> INSERT INTO auto_inc_test(stuff) VALUES ('third row');
    serverA> SELECT * FROM auto_inc_test;
    +---+------------+
    | i | stuff      |
    +---+------------+
    | 1 | first row  |
    | 2 | second row |
    | 3 | third row  |
    +---+------------+
    PXC03> SELECT * FROM auto_inc_test;
    +---+------------+
    | i | stuff      |
    +---+------------+
    | 1 | first row  |
    | 2 | second row |
    | 3 | third row  |
    +---+------------+
But if we start doing multi-valued inserts, we can run into a problem.
    serverA> INSERT INTO auto_inc_test(stuff) VALUES('first row'),('second row'),('third row');
    serverA> INSERT INTO auto_inc_test(stuff) VALUES('fourth row'),('fifth row');
    serverA> SELECT * FROM auto_inc_test;
    +---+------------+
    | i | stuff      |
    +---+------------+
    | 1 | first row  |
    | 2 | second row |
    | 3 | third row  |
    | 4 | fourth row |
    | 5 | fifth row  |
    +---+------------+
    PXC03> SELECT * FROM auto_inc_test;
    +---+------------+
    | i | stuff      |
    +---+------------+
    | 1 | first row  |
    | 2 | second row |
    | 5 | third row  |
    +---+------------+
    PXC03> SHOW SLAVE STATUS;
    ... output elided ...
    Last_SQL_Errno: 1062
    Last_SQL_Error: Error 'Duplicate entry '5' for key 'PRIMARY'' on query. Default database: 'test'. Query: 'INSERT INTO auto_inc_test (stuff) VALUES ('fourth row'),('fifth row')'
    ... output elided ...
Uh oh. Replication is broken and our data is now inconsistent. So why does this happen and how can we prevent it?
binlog_format
The astute observer will note that I have not yet said anything about the binary log format on the master. If the binary log format on the master is already set to ROW, then the above error will not occur. RBR will properly replicate multi-valued INSERTs to the PXC cluster without issue, and the data will be consistent. Problem solved. However, there may be reasons that the master is not using or cannot use RBR, such as disk space or IOPS limitations, and thus it’s running in MIXED or STATEMENT mode. In that case, we need to look elsewhere…
wsrep_auto_increment_control
When set to ON (the default), this variable has the effect of automatically specifying values for auto_increment_increment and auto_increment_offset based on the cluster size. The idea behind it is to help prevent auto-increment value conflicts when writing to multiple nodes. However, what it also means is that in a multi-node cluster, the auto-increment values generated by any given node will never be consecutive and the “next” auto-increment value on the slave cluster node will always be higher than what the master believes it should be. For example:
    serverA> INSERT INTO auto_inc_test (stuff) VALUES ('first row'),('second row'),('third row');
    serverA> SHOW CREATE TABLE auto_inc_test;
    CREATE TABLE `auto_inc_test` (
      `i` int(11) NOT NULL AUTO_INCREMENT,
      `stuff` varchar(20) DEFAULT NULL,
      PRIMARY KEY (`i`)
    ) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=latin1
    PXC03> SHOW CREATE TABLE auto_inc_test;
    CREATE TABLE `auto_inc_test` (
      `i` int(11) NOT NULL AUTO_INCREMENT,
      `stuff` varchar(20) DEFAULT NULL,
      PRIMARY KEY (`i`)
    ) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=latin1
Hmm, that’s not good.
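To see how the divergence arises, here is a toy model of a multi-row INSERT replicated in statement format. It is a simplified sketch, not InnoDB's actual allocator. It assumes that only the first auto-increment value of the statement is pinned by the master, that the remaining values are generated on the slave from its own auto_increment_increment and auto_increment_offset, and that PXC03 is running with increment 3 and offset 2. Those two values are an assumption picked because they reproduce the output shown earlier in this post; the function names are illustrative.

```python
def next_auto_increment(after, increment, offset):
    """Smallest value greater than `after` in the series offset, offset+increment, ..."""
    if after < offset:
        return offset
    return offset + ((after - offset) // increment + 1) * increment


def apply_insert(table_ids, first_id, row_count, increment, offset):
    """Apply a statement-replicated multi-row INSERT where only the first id is pinned."""
    new_ids = [first_id]
    for _ in range(row_count - 1):
        new_ids.append(next_auto_increment(new_ids[-1], increment, offset))
    for new_id in new_ids:
        if new_id in table_ids:
            # The whole statement fails, so nothing from it is kept.
            raise ValueError(f"Duplicate entry '{new_id}' for key 'PRIMARY'")
    table_ids.extend(new_ids)
    return table_ids


# Master: increment 1, offset 1.
master = []
apply_insert(master, first_id=1, row_count=3, increment=1, offset=1)
apply_insert(master, first_id=4, row_count=2, increment=1, offset=1)

# PXC03: assumed increment 3, offset 2 (set by wsrep_auto_increment_control).
slave = []
apply_insert(slave, first_id=1, row_count=3, increment=3, offset=2)
try:
    apply_insert(slave, first_id=4, row_count=2, increment=3, offset=2)
    slave_error = None
except ValueError as exc:
    slave_error = str(exc)

print(master, slave, slave_error)
```

Under these assumptions the model ends with ids 1 to 5 on the master, ids 1, 2 and 5 on the slave, and a duplicate-key failure on 5 for the second statement, matching what SHOW SLAVE STATUS reported.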
The above scenario can be avoided by setting wsrep_auto_increment_control to OFF on the node that’s acting as a slave while the cluster is still receiving asynchronous replication traffic. This can be configured in /etc/my.cnf or set dynamically from the MySQL command line with:
    SET GLOBAL wsrep_auto_increment_control='OFF';
In the testing that I’ve done, this appears to be sufficient to deal with the problem, even if the original master has non-standard values specified for the auto_increment_* variables (e.g., because it’s part of a master-master pair where the odd ID numbers are generated by one server and the even ID numbers from the other one).
In fact, if the cluster is always going to be used in single-writer mode, there’s a compelling argument to be made for setting this variable to OFF on all of the cluster nodes even when going into production: it will prevent the cluster from burning through N auto-increment IDs every time a single row is inserted.
Fixing it when it’s broken
There’s no secret magic here. Percona Toolkit’s pt-table-checksum and pt-table-sync can be used to check and repair the data divergence between the master and the slave cluster node, just as if PXC were not involved, although you may find that it’s just faster to rebuild the cluster, depending upon how many diffs are found. Sometimes this issue happens right away when the master-to-cluster replication is started, and fixing it involves just one or two rows; other times I have seen it not occur for days with lots of rows out of sync once it finally breaks.
The tl;dr version
In sum, if you’re doing traditional replication into a PXC cluster (or any other flavor of MySQL/MariaDB + Galera), you may have issues with multi-valued INSERT statements, and this can be prevented with a configuration change on either side of the replication stream:
- On the master, set binlog_format=ROW
- On the PXC slave node, set wsrep_auto_increment_control=OFF
The post Multi-Valued INSERTs, AUTO_INCREMENT & Percona XtraDB Cluster appeared first on MySQL Performance Blog.
When a write query is sent to Percona XtraDB Cluster, all the nodes store the writeset in a file called gcache. By default the name of that file is galera.cache and it is stored in the MySQL datadir. This is a very important file, and as usual with the most important variables in MySQL, the default value is not good for heavily loaded servers. Let’s see why it’s important and how we can calculate a correct value for the workload of our cluster.
What’s the gcache?
When a node leaves the cluster (crash or maintenance) it obviously stops receiving changes. When you try to reconnect the node to the cluster, its data will be outdated. The joiner node needs to ask a donor to send the changes that happened during the downtime.
The donor will first try an incremental state transfer (IST), that is, sending the writesets the cluster received while the node was down. The donor checks the last writeset received by the joiner and then checks its local gcache file. If all the needed writesets are in that cache, the donor sends them to the joiner. The joiner applies them and that’s all: it is up to date and ready to join the cluster. Therefore, IST can only be achieved if all changes missed by the node that went away are still in the gcache file of the donor.
On the other hand, if the writesets are not there, a full state snapshot transfer (SST) is needed, using one of the supported methods: XtraBackup, rsync or mysqldump.
In summary, the difference between an IST and an SST is the time that a node needs to join the cluster. The difference could be from seconds to hours, and in the case of WAN connections and large datasets, maybe days.
That’s why having a correctly sized gcache is important. It works as a circular log, so when it is full it starts to overwrite the writesets at the beginning. With a larger gcache a node can be out of the cluster for longer without requiring an SST. My colleague Jay Janssen explains in more detail how IST works and how to find the right server to use as donor.
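The IST-or-SST decision described above can be modelled in a few lines. This is a minimal sketch of the circular-log idea and not Galera's code: the class GCacheModel, its byte-based eviction and the sequence numbers are illustrative assumptions.

```python
from collections import deque


class GCacheModel:
    """Toy circular writeset cache: the oldest writesets are dropped when it is full."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.entries = deque()  # (seqno, size) pairs, oldest first
        self.last_seqno = 0

    def append(self, seqno, size):
        """Store a writeset, evicting from the oldest end until it fits."""
        self.entries.append((seqno, size))
        self.used_bytes += size
        self.last_seqno = seqno
        while self.used_bytes > self.capacity_bytes:
            _, old_size = self.entries.popleft()
            self.used_bytes -= old_size

    def transfer_for(self, joiner_last_seqno):
        """Return 'IST' if every writeset the joiner missed is still cached, else 'SST'."""
        if joiner_last_seqno >= self.last_seqno:
            return "IST"  # the joiner missed nothing
        first_needed = joiner_last_seqno + 1
        if self.entries and self.entries[0][0] <= first_needed:
            return "IST"
        return "SST"


donor = GCacheModel(capacity_bytes=300)
for seqno in range(1, 6):
    donor.append(seqno, size=100)  # only seqnos 3, 4 and 5 still fit

short_outage = donor.transfer_for(joiner_last_seqno=2)
long_outage = donor.transfer_for(joiner_last_seqno=1)
print(short_outage, long_outage)
```

The joiner that stopped at seqno 2 can be served incrementally, while the one that stopped at seqno 1 needs a writeset that has already been overwritten and falls back to a full transfer.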
Calculating the correct size
The trick is pretty similar to the one used to calculate the correct InnoDB log file size. We need to check how many bytes are written every minute. The variables to check are:
wsrep_replicated_bytes: Total size (in bytes) of writesets sent to other nodes.
wsrep_received_bytes: Total size (in bytes) of writesets received from other nodes.
    mysql> show global status like 'wsrep_received_bytes'; show global status like 'wsrep_replicated_bytes'; select sleep(60); show global status like 'wsrep_received_bytes'; show global status like 'wsrep_replicated_bytes';
    +----------------------+----------+
    | Variable_name        | Value    |
    +----------------------+----------+
    | wsrep_received_bytes | 83976571 |
    +----------------------+----------+
    +------------------------+-------+
    | Variable_name          | Value |
    +------------------------+-------+
    | wsrep_replicated_bytes | 0     |
    +------------------------+-------+
    [...]
    +----------------------+----------+
    | Variable_name        | Value    |
    +----------------------+----------+
    | wsrep_received_bytes | 90576957 |
    +----------------------+----------+
    +------------------------+-------+
    | Variable_name          | Value |
    +------------------------+-------+
    | wsrep_replicated_bytes | 800   |
    +------------------------+-------+
Bytes per minute:
(second wsrep_received_bytes - first wsrep_received_bytes) + (second wsrep_replicated_bytes - first wsrep_replicated_bytes)
(90576957 - 83976571) + (800 - 0) = 6601186 bytes, or roughly 6 MB per minute.
Bytes per hour:
6MB * 60 minutes = 360 MB per hour of writesets received by the cluster.
If you want to allow one hour of maintenance (or downtime) of a node, you need to increase the gcache to that size. If you want more time, just make it bigger.
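The same arithmetic as a short script, using the two samples shown above. This is only a sketch: the function names are made up, and it assumes the write rate during the sampled minute is representative of the rate during an outage, which you should verify by sampling at peak times.

```python
def bytes_per_minute(first, second):
    """Writeset bytes seen by this node between two status samples taken 60 seconds apart."""
    received = second["wsrep_received_bytes"] - first["wsrep_received_bytes"]
    replicated = second["wsrep_replicated_bytes"] - first["wsrep_replicated_bytes"]
    return received + replicated


def gcache_bytes_for(downtime_minutes, rate_per_minute):
    """Bytes of writesets the gcache must hold to cover the given downtime."""
    return downtime_minutes * rate_per_minute


first_sample = {"wsrep_received_bytes": 83976571, "wsrep_replicated_bytes": 0}
second_sample = {"wsrep_received_bytes": 90576957, "wsrep_replicated_bytes": 800}

rate = bytes_per_minute(first_sample, second_sample)
one_hour = gcache_bytes_for(60, rate)
print(rate, one_hour, round(one_hour / 1024 / 1024), "MiB")
```

Without rounding the per-minute figure down to 6 MB, one hour comes to 396071160 bytes, about 378 MiB, so it is worth rounding up rather than down when choosing the size.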
The post How to calculate the correct size of Percona XtraDB Cluster’s gcache appeared first on MySQL Performance Blog.
Based on Percona Server 5.5.39-36.0 including all the bug fixes in it, Galera Replicator 2.11, and on Codership wsrep API 25.11, Percona XtraDB Cluster 5.5.39-25.11 is now the current 5.5 General Availability release. All of Percona’s software is open-source and free, and all the details of the release can be found in the 5.5.39-25.11 milestone at Launchpad.
- New session variable wsrep_sync_wait has been implemented to control causality check. The old session variable wsrep_causal_reads is deprecated but is kept for backward compatibility (#1277053).
- systemd integration with RHEL/CentOS 7 is now available for Percona XtraDB Cluster from our testing repository (#1342223).
- Percona XtraDB Cluster has implemented threadpool scheduling fixes. Bug fixed #1333348.
- When gmcast.listen_addr was configured to a certain address, the local connection point for outgoing connections was not bound to the listen address. If the OS had multiple interfaces with IP addresses in the same subnet, it could pick the wrong IP for the local connection point, and other nodes would see connections originating from an IP address that was not being listened to. Bug fixed #1240964.
- Client connections were closed unconditionally before generating SST request. Fixed by avoiding closing connections when wsrep is initialized before storage engines. Bug fixed #1258658.
- Issue with re-setting galera provider (in wsrep_provider_options) has been fixed. Bug fixed #1260283.
- Variable wsrep_provider_options couldn’t be set in runtime if no provider was loaded. Bug fixed #1260290.
- Node consistency issues with foreign keys have been fixed. This fix introduces two new variables: wsrep_slave_FK_checks and wsrep_slave_UK_checks. These variables are set to TRUE and FALSE respectively by default. They control whether Foreign Key and Unique Key checking is done for applier threads. Bug fixed #1260713.
- When FLUSH TABLES WITH READ LOCK was used on a node with wsrep_causal_reads set to 1 while there was DML on other nodes, subsequent SELECT and SHOW STATUS statements previously did not hang and returned non-causal output. This has been fixed. Bug fixed #1271177.
- The lowest group communication layer (evs) would fail to handle the situation properly when a large number of nodes suddenly started to see each other. Bugs fixed #1271918 and #1249805.
- Updating a unique key value could cause a server hang if the slave node had parallel slave threads enabled. Bug fixed #1280896.
- Fixed the events replication inconsistencies. Bug fixed #1312618.
- Truncating the sorted version of multi-byte character conversion could lead to wsrep certification failures. Bug fixed #1314854.
- wsrep_slave_threads was counted towards max_connections which could cause ERROR 1040 (HY000): Too many connections error. Bug fixed #1315588.
- Leaving node was not set nonoperational if processed leave message originated from different view than the current one, which could cause other nodes to crash. Bug fixed #1323412 (#41).
- garbd couldn’t be started with init script on RHEL 6.5. Bug fixed #1323652.
- SST would fail when binlogs were in dedicated directory that’s located inside datadir. This bug was a regression introduced by bug fix for #1273368. Bug fixed #1326012.
- GTID of TOI operations is now also synced to InnoDB tablespace in order to get consistent backups. Bug fixed #1329055.
- mysql-debug (UNIV_DEBUG) is now distributed with binary tar.gz along with RPM and DEB packages. Bug fixed #1332073.
- The restart sequence in scripts/mysql.server would fail to capture and return if the start call failed to start the server, so a restart could occur that failed upon start-up, and the script would still return 0 as if it worked without any issues. Bug fixed #1339894.
- wsrep consistency check is now enabled for REPLACE ... SELECT as well. This was implemented because pt-table-checksum uses REPLACE ... SELECT during checksumming. Bug fixed #1343209.
- A memory leak in wsrep_mysql_parse function has been fixed. Bug fixed #1345023.
- SHOW STATUS was generating debug output in the error log. Bug fixed #1347818.
- percona-xtradb-cluster-garbd-3.x package was installed incorrectly on Debian/Ubuntu. Bugs fixed #1360633 and #1334530.
Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!
The post Percona XtraDB Cluster 5.5.39-25.11 is now available appeared first on MySQL Performance Blog.
Thank you to all of you who attended my webinar last week about Global Transaction IDs (GTIDs), which were introduced in MySQL 5.6 to make the reconfiguration of replication straightforward. If you missed my webinar, you can still listen to the recording and download the slides (free). We had a lot of questions during the webinar, so let me try to answer them here. Please let me know in the comments if additional clarification is needed.
Q: Does GTID provide any benefit to master-master replication? If yes, how?
Q: Is ACTIVE ACTIVE MASTER MASTER successful in MySQL with GTID?
A: GTIDs don’t change the nature of MySQL replication: for instance it is still asynchronous and if you write on both masters in a master-master setup, there is still no write conflict detection mechanism. The main benefit of GTIDs is that any change of the replication topology is easy because you no longer need to run complex calculations to find the right binlog positions when connecting a slave to a new master.
So although master-master replication can be configured with GTIDs, it does not provide a lot of benefits compared to position-based replication, as you will never change the replication topology.
And having a setup where both masters receive writes is still not recommended with GTIDs.
Q: Will GTIDs work well with master:standby master? How quick would the failover be?
A: Yes, GTIDs work well with this kind of setup (which is one master and one slave). GTIDs do not provide failover, so you will have to use an external tool. The speed of failover then depends on the tool you use.
Q: For already set up MASTER-MASTER/MASTER-SLAVE Replication, after getting GTID set up, we need to rebuild replication again using AUTO POS=1, correct?
A: Yes, using MASTER_AUTO_POSITION=1 is necessary to indicate that you will use GTID replication. So you will have to run: STOP SLAVE; CHANGE MASTER TO … MASTER_AUTO_POSITION = 1; START SLAVE;
Q: For an application with tables in different engines (InnoDB and MyISAM), how will that be handled with GTIDs?
A: Transactions using both MyISAM and InnoDB tables are not allowed; please refer to the documentation.
Q: In a master-slave replication topology (with GTID enabled), how does slave get data from the master if the master’s binary logs are purged given that AUTO_POSITION=1 is used as part of the change master command?
A: This will break replication with error 1236.
Q: Which value in SHOW SLAVE STATUS tells whether there is lag on the slave?
A: This is Seconds_Behind_Master. It’s not always reliable though. For instance, if you have a replication setup like A -> B -> C, Seconds_Behind_Master on C will show the lag relative to B, not A.
Q: What is the value of saving the history of previous master’s GTIDs executed in the show slave status -> Executed_Gtid_Set?
A: The new replication protocol makes sure that when the slave connects to its master, it sends the range of GTIDs it has already executed. Then the master sends back all other transactions. That’s why Executed_Gtid_Set contains the history of all executed transactions.
Q: We use DB Master and Slave VIPs on our servers, can the mysqlfailover tool also switch the VIP to the new master? Is it scriptable on the event of a failover?
A: Yes, you can use extension points to add your own custom scripts with mysqlfailover. See the documentation for --exec-before and --exec-after.
Q: How does mysqlfailover handle brief network instability between the Master and Slaves?
A: mysqlfailover only triggers failover when it suspects the master is no longer alive. So network instability between the master and its slaves won’t affect it for master crash detection. However it can prevent the tool from reconfiguring replication correctly during failover/switchover if one or several slaves are not reachable.
Q: Does Facebook use MySQL with GTID? if yes, which module or all together for everything?
A: I can’t speak for Facebook, but this talk at the MySQL Conference this year suggests that they’re using GTIDs in production. They have added custom code to make GTIDs easier to use.
Q: is GTID_SUBSET function part of MySQL utilities? or we should set a script to regularly detect it? is GTID_SUBSET beneficial in case of an ACTIVE ACTIVE MASTER MASTER setup?
Q: Can you please confirm how to get gtid set in order to use gtid functions?
A: GTID_SUBSET() is a built-in function in MySQL 5.6, you don’t need to install MySQL Utilities to use it. It can be used to easily know whether Executed_Gtid_Set on a given server is a subset of Executed_Gtid_Set on another server, so it can be beneficial to use it in any replication topology.
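To make the subset check concrete, here is a small Python model of what GTID_SUBSET() computes. It is an illustrative sketch rather than MySQL's implementation: it expands every interval into individual transaction ids, which is fine for a demo but wasteful for real GTID sets, and the UUIDs below are made-up examples.

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-5:7,uuid2:3' into {uuid: set of transaction ids}."""
    result = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        ids = result.setdefault(uuid.lower(), set())
        for interval in intervals:
            start, _, end = interval.partition("-")
            ids.update(range(int(start), int(end or start) + 1))
    return result


def gtid_subset(subset, superset):
    """True if every transaction in `subset` also appears in `superset`."""
    small, big = parse_gtid_set(subset), parse_gtid_set(superset)
    return all(ids <= big.get(uuid, set()) for uuid, ids in small.items())


# Made-up UUIDs for illustration only.
MASTER_UUID = "3e11fa47-71ca-11e1-9e33-c80aa9429562"
SLAVE_UUID = "9f2c1b7a-71ca-11e1-9e33-c80aa9429562"

master_executed = f"{MASTER_UUID}:1-100"
healthy_slave = f"{MASTER_UUID}:1-90"
errant_slave = f"{MASTER_UUID}:1-90,{SLAVE_UUID}:1"

healthy_ok = gtid_subset(healthy_slave, master_executed)
errant_ok = gtid_subset(errant_slave, master_executed)
print(healthy_ok, errant_ok)
```

A slave whose executed set is not a subset of its master's, like errant_slave here, holds at least one transaction that was written locally, which is exactly the errant transaction case.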
Q: What is difference between HOLE and BUGS?
A: Holes are not allowed in MySQL 5.6 implementation of GTIDs. So if you see a hole in a GTID sequence, you’re hitting a bug!
Q: Using MySQL Utilities, we can also set up replication using a Python script with just one command. Does it automatically take a dump from the master to the slave and start replication? If yes, with a 300 GB data directory, will it run in the background if executed from a shell script? Or does it just start replication from the current position without taking a dump?
A: I think you are talking about mysqlreplicate. This tool only runs CHANGE MASTER TO for you so it doesn’t take a backup of any kind.
Q: Is it possible to use mysqlfailover script at any node(like slave) any time to know which is its MASTER and other SLAVE options also? If no, is this available by some other means?
A: You should probably use mysqlrplshow instead.
Q: As told during limitations of MySQL Utilities on automatic failover, so how can I achieve AUTOMATIC failover if I want this as primary option?
A: The node running mysqlfailover is not highly available so if it is down you lose the ability of doing automatic failover. There are several options if you want to achieve automatic failover: carefully monitor the monitoring node or use solutions like Percona Replication Manager which relies on Pacemaker or Percona XtraDB Cluster which relies on Galera replication.
Q: mysqlrpladmin failover/switchover: Can we execute this command on a slave or any other monitoring node?
A: Yes, as long as mysqlrpladmin is installed on a server and if it can connect to the database servers, the command can be executed from anywhere.
Q: If we set slaves to read_only…is that recommended to eliminate the errant transaction?
A: Yes, it is recommended. However it doesn’t prevent users with the SUPER privilege from accidentally writing on a slave.
Q: Can this errant transaction issue be prevented with (active-passive) master-master replication?
A: Master-master replication will make sure that any transaction written on one server will automatically end up being written on the other server. So it is like avoiding errant transactions. However, writing on both masters is not recommended as you can have write conflicts.
Q: Is there any practical advantage of using GTID with MHA?
A: GTIDs provide no way to perform failover, they only simplify how you can reconfigure replication. So using MHA to leverage GTIDs makes sense.
Q: What is the difference between GUID and GTID? When do we have to use GUID and GTID?
A: I’m not sure I correctly understand the question. A GTID is made of a source id and a transaction id. The source id is the master’s server_uuid, which is a GUID that is automatically generated when MySQL is started for the first time.
Q: How do we check the slave database tables and records? Do we have to sync data from the master to the slave database, or will it happen automatically?
A: You can use pt-table-checksum and pt-table-sync from Percona Toolkit.
Thanks again for attending the webinar! You can replay it, download the slide – and also access Percona’s vast library of other MySQL webinar recordings here.
The post Using MySQL 5.6 Global Transaction IDs (GTIDs) in production: Q&A appeared first on MySQL Performance Blog.
Here on the Percona Support team we often ask customers to retrieve disk stats so we can monitor disk IO and measure block device IOPS and latency. There are a number of tools available to monitor IO on Linux. iostat is one of the most popular, and Percona Toolkit, which is free, includes the pt-diskstats tool for this purpose. pt-diskstats is similar to iostat but it is more interactive and provides extended information. It collects samples of /proc/diskstats, reports disk activity for each sampling interval (1 second by default), and continues until interrupted.
In this post, I will share some examples of how to check whether the IO subsystem is performing properly or whether any disks are a limiting factor, all by using the pt-diskstats tool.
pt-diskstats output consists of a number of columns, and to interpret it we need to know what each column represents.
- rd_s is the number of reads per second, while wr_s is the number of writes per second.
- rd_rt and wr_rt show the average response time in milliseconds for reads and writes respectively. This is similar to the await column in iostat output, but pt-diskstats shows separate response times for reads and writes at the disk level. Note that modern iostat also splits out read and write latency, but most distros don’t ship the latest iostat in their sysstat (or equivalent) package.
- rd_mrg and wr_mrg are two other important columns. *_mrg tells us what percentage of the original operations the IO elevator (disk scheduler) was able to merge to reduce IOPS, that is, whether the scheduler could consolidate many operations or only a few. A high rd_mrg/wr_mrg percentage indicates a sequential IO workload; a low percentage indicates a random one. Binary logs, redo logs (aka ib_logfile*), the undo log and the doublewrite buffer all need sequential writes.
- qtime and stime are the last two columns. qtime is the average time a request spends in the disk scheduler queue before being sent to the physical device, and stime is the average service time, i.e. the time the physical device takes to process the request. Note that qtime does not distinguish between reads and writes; if qtime accounts for most of the response time, that points toward the disk scheduler. Also note that service time (the stime field in pt-diskstats and the svctm field in iostat) is not reliable on Linux. If you read the iostat manual you will see it is deprecated.
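To make the column descriptions concrete, here is a small Python sketch that maps one pt-diskstats data row onto the column names. The function name parse_row is hypothetical, and the sketch assumes the 19-column layout with a leading timestamp; the sample row is in the same format as the output shown in this post.

```python
COLUMNS = (
    "ts device rd_s rd_avkb rd_mb_s rd_mrg rd_cnc rd_rt "
    "wr_s wr_avkb wr_mb_s wr_mrg wr_cnc wr_rt busy in_prg io_s qtime stime"
).split()


def parse_row(line: str) -> dict:
    """Map one whitespace-separated pt-diskstats data row onto the column names."""
    values = line.split()
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(values)}")
    row = {}
    for name, raw in zip(COLUMNS, values):
        if name == "device":
            row[name] = raw
        else:
            # rd_mrg, wr_mrg and busy are printed with a trailing % sign
            row[name] = float(raw.rstrip("%"))
    return row


row = parse_row(
    "1.1 sda 21.6 22.8 0.5 45% 1.2 29.4 275.5 4.0 1.1 0% 40.0 145.1 65% 158 297.1 155.0 2.1"
)
print(row["device"], row["rd_rt"], row["wr_rt"], row["qtime"])
```

Working with named fields rather than positions makes it harder to mix up, say, rd_rt and wr_rt when scripting checks against captured output.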
Along with these, pt-diskstats has many other options; you can find the full documentation here. Below is an example of pt-diskstats in action. I used the --devices-regex option, which prints only the devices whose names match the given Perl regex.
$ pt-diskstats --devices-regex=sd --interval 5
#ts device rd_s rd_avkb rd_mb_s rd_mrg rd_cnc rd_rt wr_s wr_avkb wr_mb_s wr_mrg wr_cnc wr_rt busy in_prg io_s qtime stime
1.1 sda 21.6 22.8 0.5 45% 1.2 29.4 275.5 4.0 1.1 0% 40.0 145.1 65% 158 297.1 155.0 2.1
1.1 sdb 15.0 21.0 0.3 33% 0.1 5.2 0.0 0.0 0.0 0% 0.0 0.0 11% 1 15.0 0.5 4.7
1.1 sdc 5.6 10.0 0.1 0% 0.0 5.2 1.9 6.0 0.0 33% 0.0 2.0 3% 0 7.5 0.4 3.6
1.1 sdd 0.0 0.0 0.0 0% 0.0 0.0 0.0 0.0 0.0 0% 0.0 0.0 0% 0 0.0 0.0 0.0
5.0 sda 17.0 14.8 0.2 64% 3.1 66.7 404.9 4.6 1.8 14% 140.9 298.5 100% 111 421.9 277.6 1.9
5.0 sdb 14.0 19.9 0.3 48% 0.1 5.5 0.4 174.0 0.1 98% 0.0 0.0 11% 0 14.4 0.9 2.4
5.0 sdc 3.6 27.1 0.1 61% 0.0 3.5 2.8 5.7 0.0 30% 0.0 2.0 3% 0 6.4 0.7 2.4
5.0 sdd 0.0 0.0 0.0 0% 0.0 0.0 0.0 0.0 0.0 0% 0.0 0.0 0% 0 0.0 0.0 0.0
These are the stats from 7200 RPM SATA disks. As you can see, the write response time on sda is very high, and most of it is IO queue time. This shows the problem exactly: the IO subsystem cannot handle the write workload because the number of writes being issued is far beyond what it can sustain, so requests pile up in the queue instead of being serviced as they arrive. The workload depends a lot on where the hot data is stored, and in this particular case it hits only a single disk out of the 4. A single 7.2K RPM disk can only do about 100 random writes per second, which is not a lot for a heavy workload.
This is not a hardware fault but a hardware capacity issue. The kind of workload and the number of writes performed per second are more than this IO subsystem can handle efficiently. As the disk stats show, this server’s workload consists mostly of writes.
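As a rough back-of-the-envelope check on the capacity argument, the Python sketch below estimates the random IOPS ceiling of one spinning disk as one average seek plus half a rotation per request. The function name and the 8.5 ms seek time are assumptions for illustration, not measured values; the result lands in the same ballpark as the "about 100" figure quoted above.

```python
def random_iops_estimate(rpm: float, avg_seek_ms: float) -> float:
    """Rough random-IOPS ceiling for one spinning disk.

    Each random request is assumed to cost one average seek plus half a rotation.
    """
    half_rotation_ms = 0.5 * 60_000.0 / rpm
    return 1000.0 / (avg_seek_ms + half_rotation_ms)


# 8.5 ms is an assumed seek time for a 7200 RPM SATA disk, not a measurement.
capacity = random_iops_estimate(rpm=7200, avg_seek_ms=8.5)

# wr_s values reported for sda in the two samples of the first example.
for wr_s in (275.5, 404.9):
    print(f"{wr_s:6.1f} writes/s is {wr_s / capacity:.1f}x the estimated capacity")
```

Under these assumptions the requested write rate is several times what a single disk can absorb, which is consistent with the long queue times in the output.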
Let me show you a second example. Here you can see read latency: rd_rt is consistently between roughly 10ms and 30ms. It depends on how fast the disks are spinning and on the number of disks. Possible solutions are to optimize queries to avoid table scans, use memcached where possible, or use SSDs, which provide good I/O performance at high concurrency. You will find this post on SSDs from our CEO, Peter Zaitsev, useful.
#ts device rd_s rd_avkb rd_mb_s rd_mrg rd_cnc rd_rt wr_s wr_avkb wr_mb_s wr_mrg wr_cnc wr_rt busy in_prg io_s qtime stime
1.0 sdb 33.0 29.1 0.9 0% 1.1 34.7 7.0 10.3 0.1 61% 0.0 0.4 99% 1 40.0 2.2 19.5
1.0 sdb1 0.0 0.0 0.0 0% 0.0 0.0 7.0 10.3 0.1 61% 0.0 0.4 1% 0 7.0 0.0 0.4
1.0 sdb2 33.0 29.1 0.9 0% 1.1 34.7 0.0 0.0 0.0 0% 0.0 0.0 99% 1 33.0 3.5 30.2
1.0 sdb 81.9 28.5 2.3 0% 1.1 14.0 0.0 0.0 0.0 0% 0.0 0.0 99% 1 81.9 2.0 12.0
1.0 sdb1 0.0 0.0 0.0 0% 0.0 0.0 0.0 0.0 0.0 0% 0.0 0.0 0% 0 0.0 0.0 0.0
1.0 sdb2 81.9 28.5 2.3 0% 1.1 14.0 0.0 0.0 0.0 0% 0.0 0.0 99% 1 81.9 2.0 12.0
1.0 sdb 50.0 25.7 1.3 0% 1.3 25.1 13.0 11.7 0.1 66% 0.0 0.7 99% 1 63.0 3.4 11.3
1.0 sdb1 25.0 21.3 0.5 0% 0.6 25.2 13.0 11.7 0.1 66% 0.0 0.7 46% 1 38.0 3.2 7.3
1.0 sdb2 25.0 30.1 0.7 0% 0.6 25.0 0.0 0.0 0.0 0% 0.0 0.0 56% 0 25.0 3.6 22.2
From the diskstats output below, it seems that IO is saturated by both reads and writes, as shown by the high values in the rd_s and wr_s columns. In this particular case, consider putting the disks in a RAID 5 array (better for read-only workloads) or a RAID 10 array, along with a battery-backed write cache (BBWC), since a single disk can be really bad for performance when you are IO bound.
device rd_s rd_avkb rd_mb_s rd_mrg rd_cnc rd_rt wr_s wr_avkb wr_mb_s wr_mrg wr_cnc wr_rt busy in_prg io_s qtime stime
sdb1 362.0 27.4 9.7 0% 2.7 7.5 525.2 20.2 10.3 35% 6.4 8.0 100% 0 887.2 7.0 0.9
sdb1 439.9 26.5 11.4 0% 3.4 7.7 545.7 20.8 11.1 34% 9.8 11.9 100% 0 985.6 9.6 0.8
sdb1 576.6 26.5 14.9 0% 4.5 7.8 400.2 19.9 7.8 34% 6.7 10.9 100% 0 976.8 8.6 0.8
sdb1 410.8 24.2 9.7 0% 2.9 7.1 403.1 18.3 7.2 34% 10.8 17.7 100% 0 813.9 12.5 1.0
sdb1 378.4 24.6 9.1 0% 2.7 7.3 506.1 16.5 8.2 33% 5.7 7.6 100% 0 884.4 6.6 0.9
sdb1 572.8 26.1 14.6 0% 4.8 8.4 422.6 17.2 7.1 30% 1.7 2.8 100% 0 995.4 4.7 0.8
sdb1 429.2 23.0 9.6 0% 3.2 7.4 511.9 14.5 7.2 31% 1.2 1.7 100% 0 941.2 3.6 0.9
The following example shows write-heavy activity, but write response time is very good, under 1ms, which shows the disks are healthy and capable of handling a high number of IOPS.
#ts device rd_s rd_avkb rd_mb_s rd_mrg rd_cnc rd_rt wr_s wr_avkb wr_mb_s wr_mrg wr_cnc wr_rt busy in_prg io_s qtime stime
1.0 dm-0 530.8 16.0 8.3 0% 0.3 0.5 6124.0 5.1 30.7 0% 1.7 0.3 86% 2 6654.8 0.2 0.1
2.0 dm-0 633.1 16.1 10.0 0% 0.3 0.5 6173.0 6.1 36.6 0% 1.7 0.3 88% 1 6806.1 0.2 0.1
3.0 dm-0 731.8 16.0 11.5 0% 0.4 0.5 6064.2 5.8 34.1 0% 1.9 0.3 90% 2 6795.9 0.2 0.1
4.0 dm-0 711.1 16.0 11.1 0% 0.3 0.5 6448.5 5.4 34.3 0% 1.8 0.3 92% 2 7159.6 0.2 0.1
5.0 dm-0 700.1 16.0 10.9 0% 0.4 0.5 5689.4 5.8 32.2 0% 1.9 0.3 88% 0 6389.5 0.2 0.1
6.0 dm-0 774.1 16.0 12.1 0% 0.3 0.4 6409.5 5.5 34.2 0% 1.7 0.3 86% 0 7183.5 0.2 0.1
7.0 dm-0 849.6 16.0 13.3 0% 0.4 0.5 6151.2 5.4 32.3 0% 1.9 0.3 88% 3 7000.8 0.2 0.1
8.0 dm-0 664.2 16.0 10.4 0% 0.3 0.5 6349.2 5.7 35.1 0% 2.0 0.3 90% 2 7013.4 0.2 0.1
9.0 dm-0 951.0 16.0 14.9 0% 0.4 0.4 5807.0 5.3 29.9 0% 1.8 0.3 90% 3 6758.0 0.2 0.1
10.0 dm-0 742.0 16.0 11.6 0% 0.3 0.5 6461.1 5.1 32.2 0% 1.7 0.3 87% 1 7203.2 0.2 0.1
Let me show you a final example. I used the --interval and --iterations parameters for pt-diskstats, which tell it how many seconds to wait before printing the next sample and how many samples to print, respectively. In the 3rd iteration you will see high latency (rd_rt, wr_rt), mostly for reads. You will also notice high values for queue time (qtime) and service time (stime), where qtime is related to the disk IO scheduler settings. For MySQL database servers we usually recommend noop or deadline instead of the default cfq.
$ pt-diskstats --interval=20 --iterations=3
#ts device rd_s rd_avkb rd_mb_s rd_mrg rd_cnc rd_rt wr_s wr_avkb wr_mb_s wr_mrg wr_cnc wr_rt busy in_prg io_s qtime stime
10.4 hda 11.7 4.0 0.0 0% 0.0 1.1 40.7 11.7 0.5 26% 0.1 2.1 10% 0 52.5 0.4 1.5
10.4 hda2 0.0 0.0 0.0 0% 0.0 0.0 0.4 7.0 0.0 43% 0.0 0.1 0% 0 0.4 0.0 0.1
10.4 hda3 0.0 0.0 0.0 0% 0.0 0.0 0.4 107.0 0.0 96% 0.0 0.2 0% 0 0.4 0.0 0.2
10.4 hda5 0.0 0.0 0.0 0% 0.0 0.0 0.7 20.0 0.0 80% 0.0 0.3 0% 0 0.7 0.1 0.2
10.4 hda6 0.0 0.0 0.0 0% 0.0 0.0 0.1 4.0 0.0 0% 0.0 4.0 0% 0 0.1 0.0 4.0
10.4 hda9 11.7 4.0 0.0 0% 0.0 1.1 39.2 10.7 0.4 3% 0.1 2.7 9% 0 50.9 0.5 1.8
10.4 drbd1 11.7 4.0 0.0 0% 0.0 1.1 39.1 10.7 0.4 0% 0.1 2.8 9% 0 50.8 0.5 1.7
20.0 hda 14.6 4.0 0.1 0% 0.0 1.4 39.5 12.3 0.5 26% 0.3 6.4 18% 0 54.1 2.6 2.7
20.0 hda2 0.0 0.0 0.0 0% 0.0 0.0 0.4 9.1 0.0 56% 0.0 42.0 3% 0 0.4 0.0 42.0
20.0 hda3 0.0 0.0 0.0 0% 0.0 0.0 1.5 22.3 0.0 82% 0.0 1.5 0% 0 1.5 1.2 0.3
20.0 hda5 0.0 0.0 0.0 0% 0.0 0.0 1.1 18.9 0.0 79% 0.1 21.4 11% 0 1.1 0.1 21.3
20.0 hda6 0.0 0.0 0.0 0% 0.0 0.0 0.8 10.4 0.0 62% 0.0 1.5 0% 0 0.8 1.3 0.2
20.0 hda9 14.6 4.0 0.1 0% 0.0 1.4 35.8 11.7 0.4 3% 0.2 4.9 18% 0 50.4 0.5 3.5
20.0 drbd1 14.6 4.0 0.1 0% 0.0 1.4 36.4 11.6 0.4 0% 0.2 5.1 17% 0 51.0 0.5 3.4
20.0 hda 0.9 4.0 0.0 0% 0.2 251.9 28.8 61.8 1.7 92% 4.5 13.1 31% 2 29.6 12.8 0.9
20.0 hda2 0.0 0.0 0.0 0% 0.0 0.0 0.6 8.3 0.0 52% 0.1 98.2 6% 0 0.6 48.9 49.3
20.0 hda3 0.0 0.0 0.0 0% 0.0 0.0 2.0 23.2 0.0 83% 0.0 1.4 0% 0 2.0 1.2 0.3
20.0 hda5 0.0 0.0 0.0 0% 0.0 0.0 4.9 249.4 1.2 98% 4.0 13.2 9% 0 4.9 12.9 0.3
20.0 hda6 0.0 0.0 0.0 0% 0.0 0.0 0.0 0.0 0.0 0% 0.0 0.0 0% 0 0.0 0.0 0.0
20.0 hda9 0.9 4.0 0.0 0% 0.2 251.9 21.3 24.2 0.5 32% 0.4 12.9 31% 2 22.2 10.2 9.7
20.0 drbd1 0.9 4.0 0.0 0% 0.2 251.9 30.6 17.0 0.5 0% 0.7 24.1 30% 5 31.4 21.0 9.5
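Since qtime is tied to the IO scheduler, it helps to know which scheduler a device is using. On Linux the file /sys/block/<device>/queue/scheduler lists the available schedulers with the active one in square brackets. The Python sketch below parses such a line; the function names are hypothetical and the sample strings are illustrative rather than taken from the servers in this post.

```python
import re
from pathlib import Path


def active_scheduler(sysfs_text: str) -> str:
    """Return the scheduler shown in [brackets] in a queue/scheduler line."""
    match = re.search(r"\[([^\]]+)\]", sysfs_text)
    if match is None:
        raise ValueError(f"no active scheduler marked in {sysfs_text!r}")
    return match.group(1)


def read_scheduler(device: str) -> str:
    """Read and parse the scheduler for one block device (Linux only)."""
    text = Path(f"/sys/block/{device}/queue/scheduler").read_text()
    return active_scheduler(text)


# Illustrative contents of a scheduler file where cfq is still active.
print(active_scheduler("noop deadline [cfq]"))
```

Calling read_scheduler("sda") on a database server would show whether the device is still on cfq; the device name here is only an example.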
You can see the busy column in pt-diskstats output, which is the same as the %util column in iostat and indicates utilization. As noted earlier, pt-diskstats is quite similar to iostat but is more interactive and provides more information. The busy percentage only tells us how long the IO subsystem was busy; it does not indicate capacity. So the only time you care about %busy is when it’s 100% and, at the same time, latency (await in iostat, rd_rt/wr_rt in pt-diskstats output) rises above, say, 5ms. You can estimate the capacity of your IO subsystem and then look at the IOPS being consumed (the r/s + w/s columns in iostat, or rd_s + wr_s in pt-diskstats). Also, the system can process more than one request in parallel (in the case of RAID), so %busy can go beyond 100% in pt-diskstats output.
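The rule of thumb above can be written down as a tiny check. This is only a sketch: the function name is made up, and the 5ms limit is the "say, 5ms" figure from the text, exposed as a parameter so it can be tuned. The two samples are rows from the examples earlier in this post.

```python
def worth_investigating(busy_pct: float, rd_rt_ms: float, wr_rt_ms: float,
                        latency_limit_ms: float = 5.0) -> bool:
    """True when the device is fully busy AND read or write latency exceeds the limit."""
    return busy_pct >= 100.0 and max(rd_rt_ms, wr_rt_ms) > latency_limit_ms


# (busy %, rd_rt, wr_rt) taken from two of the outputs shown in this post.
samples = {
    "sda, first example, second sample": (100.0, 66.7, 298.5),
    "dm-0, write-heavy example": (86.0, 0.5, 0.3),
}
flags = {name: worth_investigating(*values) for name, values in samples.items()}
for name, flagged in flags.items():
    print(f"{name}: {'investigate' if flagged else 'ok'}")
```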
If you need to check disk throughput and block device IOPS, run the following to capture metrics from your IO subsystem and see whether utilization matches other worrisome symptoms. I would suggest capturing disk stats during peak load. Output can be grouped by sample or by disk using the --group-by option. You can also use the sysbench benchmark tool to generate load and measure database server performance. You will find this link useful for sysbench tool details.
$ pt-diskstats --group-by=all --iterations=7200 > /tmp/pt-diskstats.out
pt-diskstats is one of the finest tools in Percona Toolkit. With it you can easily spot disk bottlenecks, measure the IO subsystem and identify how many IOPS your drives can handle (i.e. disk capacity).
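Once a capture like this exists, a short script can summarize it. The Python sketch below is one possible approach, assuming the file contains "#ts" header lines followed by data rows in the format shown earlier; the function name is hypothetical and the inline sample reuses rows from the first example instead of reading /tmp/pt-diskstats.out.

```python
from collections import defaultdict


def worst_latency(lines):
    """Return {device: (max rd_rt, max wr_rt)} from pt-diskstats text output."""
    header = None
    worst = defaultdict(lambda: [0.0, 0.0])
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "#ts":
            header = fields          # remember column names from the header line
            continue
        if header is None or len(fields) != len(header):
            continue                 # skip anything that is not a data row
        row = dict(zip(header, fields))
        entry = worst[row["device"]]
        entry[0] = max(entry[0], float(row["rd_rt"]))
        entry[1] = max(entry[1], float(row["wr_rt"]))
    return {device: tuple(values) for device, values in worst.items()}


sample = """\
#ts device rd_s rd_avkb rd_mb_s rd_mrg rd_cnc rd_rt wr_s wr_avkb wr_mb_s wr_mrg wr_cnc wr_rt busy in_prg io_s qtime stime
1.1 sda 21.6 22.8 0.5 45% 1.2 29.4 275.5 4.0 1.1 0% 40.0 145.1 65% 158 297.1 155.0 2.1
1.1 sdb 15.0 21.0 0.3 33% 0.1 5.2 0.0 0.0 0.0 0% 0.0 0.0 11% 1 15.0 0.5 4.7
5.0 sda 17.0 14.8 0.2 64% 3.1 66.7 404.9 4.6 1.8 14% 140.9 298.5 100% 111 421.9 277.6 1.9
5.0 sdb 14.0 19.9 0.3 48% 0.1 5.5 0.4 174.0 0.1 98% 0.0 0.0 11% 0 14.4 0.9 2.4
"""
summary = worst_latency(sample.splitlines())
for device, (rd_rt, wr_rt) in sorted(summary.items()):
    print(f"{device}: worst rd_rt={rd_rt}ms, worst wr_rt={wr_rt}ms")
```

To run it against a real capture, pass an open file object instead of the sample lines.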
Join me online next week (September 10 at 10 a.m. PDT) for my live webinar on Migrating to Percona XtraDB Cluster. This was a popular webinar that I gave a few years ago, so I’m doing it again with updates for Percona XtraDB Cluster 5.6 (PXC) and all the latest in the Galera world.
This webinar will be really good for people interested in getting an overview of what PXC/Galera is, what it would take to adopt it for your application, and some of the differences and challenges it brings compared with a conventional MySQL Master/slave setup. I’d highly suggest attending if you are considering Galera in your environment and want to get a better understanding of its uses and antipatterns.
Additionally, I’ll cover such questions as:
- What are the requirements for running Percona XtraDB Cluster?
- Will I have to reload all my tables?
- How does configuration for the cluster differ from configuring a stand-alone InnoDB server?
- How should my application interact with the Cluster?
- Can I use Percona XtraDB Cluster if I only have two MySQL servers currently?
- How can I move to the Cluster and keep downtime to a minimum?
- How can I migrate to Percona XtraDB Cluster gradually?
I hope to see you next Wednesday. And please feel free to ask questions in advance in the comments section below. Next week’s live event, like all of our MySQL webinars, is free. Register here!
The post Migrating to Percona XtraDB Cluster 2014 edition: Sept. 10 MySQL webinar appeared first on MySQL Performance Blog.
Let me start by saying a big “thank you” to the staff at Oracle for deciding to open source reducer.sh. It’s a tool I developed whilst I was working for them several years ago. Its sole purpose is to do one thing, and do it well: test-case simplification.
So, let’s say some customer just sent you 120,000 lines of SQL code and affirms that “it definitely causes a crash.” Or maybe you ran RQG (the Random Query Generator) for a while (with the general query log turned on) and now you have a nice SQL trace which may just lead to that crash the run resulted in. Or you’re a DBA testing the company’s usual queries with Valgrind, and noticed that 2 in 1000 queries give a Valgrind warning in the mysqld error log – you’re just not sure which one. Or maybe you’re a developer, and during testing you saw that a SELECT query output did not look the way it should – the output was “7” where it should have been “5” – the only problem is that you have 1000 lines of INSERT statements and are not sure which one caused it. In all of these cases reducer can help.
Here are some of its benefits/features:
- It can reduce large amounts of SQL quickly. 40K lines to just a few can usually be done in around 1 hour.
- Larger files scale even better – the chunking elimination method automatically adapts to file size.
- It can reduce crashes/asserts, Valgrind testcases, mysqld error log messages, and mysql CLI output testcases
- Also working (but with a complex setup at the moment) is multi-threaded SQL test-case simplification (ALPHA)
- It can reduce sporadic testcases for all of the above (i.e. testcases where the issue does not reproduce every time)
- It can reduce sporadic testcases using multiple threads which significantly improves reduction time
- It can establish whether or not a testcase is sporadic (and will report this), and will change its behavior accordingly
- It is capable (turned on by default) of reducing actual DML/DDL query code after completing line-based reduction
- It is capable (turned on by default) of reducing testcases by eliminating columns from tables and INSERT queries
- By default reducer.sh uses tmpfs (highly recommended) to ensure testcases are “as reproducible as possible” (disk I/O)
- Additional options for mysqld (necessary to reproduce an issue) can easily be listed/added
- Regex syntax can be used in search strings (where applicable)
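To give a feel for what chunk-based line elimination means in general, here is a minimal Python sketch of the idea: try removing a block of lines, keep the removal if the issue still reproduces, and shrink the block size over time. This is not reducer.sh’s actual implementation; the function name, the halving strategy and the toy "still fails" check are all assumptions made for illustration.

```python
def reduce_lines(lines, still_fails):
    """Shrink `lines` while `still_fails(candidate)` keeps returning True."""
    lines = list(lines)
    chunk = max(1, len(lines) // 2)
    while True:
        start = 0
        removed_any = False
        while start < len(lines):
            candidate = lines[:start] + lines[start + chunk:]
            if still_fails(candidate):
                lines = candidate      # the chunk was not needed; drop it
                removed_any = True
            else:
                start += chunk         # the chunk is needed; move past it
        if chunk == 1:
            if not removed_any:
                return lines           # nothing more can be removed line by line
        else:
            chunk //= 2                # retry with smaller chunks


# Toy stand-in for "does mysqld still crash?": the issue needs these two statements.
needed = ["CREATE TABLE t1 (id INT);", "INSERT INTO t1 VALUES (7);"]
trace = [f"SELECT {n};" for n in range(40)]
trace.insert(5, needed[0])
trace.insert(30, needed[1])

reduced = reduce_lines(trace, lambda sql: all(stmt in sql for stmt in needed))
print(reduced)
```

In a real setup the check would replay the candidate SQL against a server and look for the crash, Valgrind warning or output difference, which is where most of the time goes.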
So, without further ado, let’s have a look at how to get it to do your simplification job.
To get reducer.sh today, use these commands (yum example used, but this can easily be adapted to apt-get):
$ sudo yum install bzr
$ cd ~
$ bzr branch lp:randgen
$ cd randgen/util/reducer/
$ ls *
(You may also want to check out ./status.sh in this directory, which is a handy tool for seeing what reducer.sh is up to when it is doing its first/original attempt to reproduce a given issue.)
And you can get percona-qa (for parse_general_log.sh [and the prepare_reducer.sh code bit if you need it] as shown in the video):
$ cd ~
$ bzr branch lp:percona-qa
$ cd percona-qa
$ ls *
The post Reducer.sh – A powerful MySQL test-case simplification/reducer tool appeared first on MySQL Performance Blog.
Given the recent excitement & interest around OpenStack I wanted to make sure I was ready to conduct appropriate evaluations of system performance. I generally turn to sysbench since it comes with a variety of different tests (accessed via the --test= option), including:
- fileio – File I/O test
- cpu – CPU performance test
- memory – Memory functions speed test
- threads – Threads subsystem performance test
- mutex – Mutex performance test
As you can see, sysbench lets you stress many of the fundamental components of your hardware and infrastructure, such as your disk subsystem, along with your CPUs and memory. An additional option exists that is designed to perform synthetic stress testing of MySQL, and I was surprised when I didn’t see it in the above list on version 0.5, as it used to show up as “oltp – OLTP test”. What happened to --test=oltp?
This list is from the latest release of sysbench, which is 0.5. You’re only going to be on this version if you build it yourself or if you use the package provided courtesy of Frederic Descamps (thanks lefred!). If you’re using the version from EPEL, Ubuntu 14.04, or Debian 7 you’re still using version 0.4.12 (check with sysbench --version). One thing you’ll notice is that the OLTP test type doesn’t show up anymore. What gives? I was scratching my head until I asked on Percona IRC and found out that in 0.5 the standard OLTP test type was replaced with a different syntax: instead of passing parameters to sysbench, you reference scripts written in Lua. The advantage is that you now have an interface for writing your own specific load tests (provided you know Lua, but it isn’t hard). For those of you looking to run the pre-canned load tests, they still exist, but you have to have them as part of the RPM install or otherwise copied to your system.
Fortunately, if you use the package provided by lefred you’ll find these Lua scripts here (this is using an Amazon AMI as of August 4th, 2014):
[root@pxc-control ~]# ls -l /usr/share/doc/sysbench/tests/db/
total 44
-rw-r--r-- 1 root root 3585 Sep 7 2012 common.lua
-rw-r--r-- 1 root root 340 Sep 7 2012 delete.lua
-rw-r--r-- 1 root root 830 Sep 7 2012 insert.lua
-rw-r--r-- 1 root root 2925 Sep 7 2012 oltp.lua
-rw-r--r-- 1 root root 342 Sep 7 2012 oltp_simple.lua
-rw-r--r-- 1 root root 425 Sep 7 2012 parallel_prepare.lua
-rw-r--r-- 1 root root 343 Sep 7 2012 select.lua
-rw-r--r-- 1 root root 3964 Sep 7 2012 select_random_points.lua
-rw-r--r-- 1 root root 4066 Sep 7 2012 select_random_ranges.lua
-rw-r--r-- 1 root root 343 Sep 7 2012 update_index.lua
-rw-r--r-- 1 root root 552 Sep 7 2012 update_non_index.lua
So the trick (if you want to call it that) is that instead of passing a single word to the --test directive, you pass the full path to the Lua script.
This is the old way (sysbench 0.4.12 from EPEL repo):
--test=oltp --oltp-test-mode=complex
This is the new way (sysbench 0.5):
--test=/usr/share/doc/sysbench/tests/db/insert.lua
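If you drive sysbench from scripts across machines with different versions, the old and new syntax can be wrapped in a small helper. The Python sketch below is illustrative only: the function name and its parameters are assumptions, and it simply returns the two forms of the --test argument shown above.

```python
import posixpath


def sysbench_test_arg(version: str, script_dir: str, script: str = "oltp.lua") -> str:
    """Build the --test argument for a given sysbench version string."""
    if version.startswith("0.4"):
        return "--test=oltp"  # 0.4.x: built-in test name
    # 0.5: full path to a Lua script shipped with (or copied to) the system
    return "--test=" + posixpath.join(script_dir, script)


script_dir = "/usr/share/doc/sysbench/tests/db/"
print(sysbench_test_arg("0.4.12", script_dir))
print(sysbench_test_arg("0.5", script_dir, "insert.lua"))
```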
Here is an example of a test I’m running through haproxy against a 3-node PXC cluster doing the INSERT-only test type so you can see the full syntax I pass to sysbench:
[root@pxc-control ~]# cat sys_haproxy.sh
#!/bin/bash
sysbench \
  --test=/usr/share/doc/sysbench/tests/db/insert.lua \
  --mysql-host=pxc-control \
  --mysql-port=9999 \
  --mysql-user=sysbench-haproxy \
  --mysql-password=sysbench-haproxy \
  --mysql-db=sbtest \
  --mysql-table-type=innodb \
  --oltp-test-mode=complex \
  --oltp-read-only=off \
  --oltp-reconnect=on \
  --oltp-table-size=1000000 \
  --max-requests=100000000 \
  --num-threads=3 \
  --report-interval=1 \
  --report-checkpoints=10 \
  --tx-rate=24 $1
And here’s what the insert.lua script looks like:
[root@pxc-control ~]# cat /usr/share/doc/sysbench/tests/db/insert.lua
pathtest = string.match(test, "(.*/)") or ""

dofile(pathtest .. "common.lua")

function thread_init(thread_id)
   set_vars()
end

function event(thread_id)
   local table_name
   local i
   local c_val
   local k_val
   local pad_val

   table_name = "sbtest".. sb_rand_uniform(1, oltp_tables_count)

   if (oltp_auto_inc) then
      i = 0
   else
      i = sb_rand_uniq(1, oltp_table_size)
   end
   k_val = sb_rand(1, oltp_table_size)
   c_val = sb_rand_str([[
###########-###########-###########-###########-###########-###########-###########-###########-###########-###########]])
   pad_val = sb_rand_str([[
###########-###########-###########-###########-###########]])

   rs = db_query("INSERT INTO " .. table_name .. " (id, k, c, pad) VALUES " .. string.format("(%d, %d, '%s', '%s')",i, k_val, c_val, pad_val))
end
The thing that I like most about sysbench 0.5 (beyond the Lua interface, of course!) is that it now comes with a --report-interval option (which I generally set to 1) so that you get output while the script is running. No more waiting until the end of the test to get feedback! Here’s a sample of sysbench 0.5 in action running the INSERT test through a local haproxy instance and writing to three nodes in a PXC cluster such as OpenStack Trove might do:
[root@pxc-control ~]# ./sys_haproxy.sh run
sysbench 0.5: multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 3
Report intermediate results every 1 second(s)
Random number generator seed is 0 and will be ignored
Threads started!
[ 1s] threads: 3, tps: 0.00, reads/s: 0.00, writes/s: 1099.28, response time: 9.86ms (95%)
[ 2s] threads: 3, tps: 0.00, reads/s: 0.00, writes/s: 973.02, response time: 10.77ms (95%)
[ 3s] threads: 3, tps: 0.00, reads/s: 0.00, writes/s: 1181.01, response time: 6.23ms (95%)
[ 4s] threads: 3, tps: 0.00, reads/s: 0.00, writes/s: 1103.00, response time: 6.77ms (95%)
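Because the interval reports are plain text, they are easy to post-process. The Python sketch below parses lines in the format shown above and computes the mean writes/s and the worst 95th percentile response time. The regular expression and function name are assumptions based only on this sample output; other sysbench versions may print a different format.

```python
import re

INTERVAL_LINE = re.compile(
    r"\[\s*(?P<sec>\d+)s\]\s+threads:\s*(?P<threads>\d+),\s*tps:\s*(?P<tps>[\d.]+),"
    r"\s*reads/s:\s*(?P<reads>[\d.]+),\s*writes/s:\s*(?P<writes>[\d.]+),"
    r"\s*response time:\s*(?P<p95_ms>[\d.]+)ms \(95%\)"
)


def parse_intervals(text: str) -> list:
    """Return one dict of floats per interval report line found in `text`."""
    return [
        {key: float(value) for key, value in match.groupdict().items()}
        for match in INTERVAL_LINE.finditer(text)
    ]


output = """\
[ 1s] threads: 3, tps: 0.00, reads/s: 0.00, writes/s: 1099.28, response time: 9.86ms (95%)
[ 2s] threads: 3, tps: 0.00, reads/s: 0.00, writes/s: 973.02, response time: 10.77ms (95%)
[ 3s] threads: 3, tps: 0.00, reads/s: 0.00, writes/s: 1181.01, response time: 6.23ms (95%)
[ 4s] threads: 3, tps: 0.00, reads/s: 0.00, writes/s: 1103.00, response time: 6.77ms (95%)
"""
rows = parse_intervals(output)
mean_writes = sum(r["writes"] for r in rows) / len(rows)
worst_p95 = max(r["p95_ms"] for r in rows)
print(f"mean writes/s: {mean_writes:.2f}, worst 95% response time: {worst_p95}ms")
```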
I would also like to call your attention to a blog post by Nilnandan Joshi from Percona’s Support team where he describes a method to build sysbench 0.5 on Debian 7. Thanks Nil for pointing this out!
I hope that helps others out there who upgrade to sysbench 0.5 and then have questions about where --test=oltp went to. I’d love to hear your own sysbench use cases, and whether anyone else is publishing Lua scripts for their own load testing!
The post Using sysbench 0.5 for performing MySQL benchmarks appeared first on MySQL Performance Blog.