
Latest MySQL Performance Blog posts

Percona's MySQL & InnoDB performance and scalability blog

Get a handle on your HA at Percona Live London 2014

October 30, 2014 - 10:00pm

From left: Liz van Dijk, Frédéric Descamps and Kenny Gryp

If you’re following this blog, it’s quite likely you’re already aware of the Percona Live London 2014 conference coming up in just a few days. Just in case, though (you know, if you’re still looking for an excuse to sign up), I wanted to put a spotlight on the tutorial to be delivered by my esteemed colleagues Frédéric Descamps (@lefred) and Kenny Gryp (@gryp), and myself.

The past two years at Percona we’ve been spending a substantial amount of time working with customers taking their first steps into creating Highly Available MySQL environments built on Galera. Percona XtraDB Cluster allows you to get it up and running very fast, but as any weathered “HA” DBA will tell you, building the cluster is only the beginning. (Percona XtraDB Cluster is an open source (free) high-availability and high-scalability solution for MySQL clustering.)

Any cluster technology is likely to introduce a great amount of complexity to your environment, and in our tutorial we want to show you not only how to get started, but also how to avoid many of the operational pitfalls we’ve encountered. Our tutorial, Percona XtraDB Cluster in a nutshell, will be taking place on Monday 3 November and is a full-day (6 hours) session, with an intense hands-on approach.

We’ll be covering a great deal of practical topics, such as:

  • Things to keep in mind when migrating an existing environment over to PXC
  • How to manage and maintain the cluster, keeping it in good shape
  • Load balancing requests across the cluster
  • Considerations for deploying PXC in the cloud

Planning on attending? Be sure to come prepared! Given the hands-on approach of the tutorial, make sure you bring your laptop with enough disk space (~20GB) and processing power to run at least 4 small VirtualBox VM’s.

We look forward to seeing you there!

The post Get a handle on your HA at Percona Live London 2014 appeared first on MySQL Performance Blog.

Facebook MySQL database engineers ready for Percona Live London 2014

October 29, 2014 - 10:00pm

With 1.28 billion active users, Facebook MySQL database engineers are active and extremely valuable contributors to the global MySQL community. So naturally they are also active participants of Percona Live MySQL conferences! And next week’s Percona Live London 2014 (Nov. 3-4) is no exception. (Register now and use the promotional code “Facebook” to save £30!)

I spoke with Facebook database engineers Yoshinori “Yoshi” Matsunobu and Shlomo Priymak about their upcoming sessions along with what’s new at Facebook since our last conversation back in April.

Tom: Yoshi, last year Facebook deployed MySQL 5.6 on all production environments – what have you and your team learned since doing that? And do you have a few best practices you could share? I realize you’ll be going into detail during your session in London (MySQL 5.6 and WebScaleSQL at Facebook), but maybe a few words on a couple of the bigger ones?

Yoshi: MySQL 5.6 has excellent replication enhancements to use in large-scale deployments. For example, crash safe slave makes it possible to recover without rebuilding a slave instance on server crash. This can greatly minimize slave downtime, especially if your database size is large. There are many other new features such as GTID, multi-threaded slave, streaming mysqlbinlog and we actively use them in production.

For InnoDB, Online DDL is a good example to ease operations. Many MySQL users are doing schema changes by switching masters. This can minimize downtime but requires operational efforts. Online DDL made things much easier.

Tom: Facebook is an active and extremely valuable part of the overall MySQL community and ecosystem – what are some of the key features and improvements you’ve contributed in the past year since moving to MySQL 5.6?

Yoshi: For InnoDB, I think online defragmentation and faster full table scan are the most valuable contributions from Facebook in 5.6. I have received very positive feedback about faster InnoDB full table scan (Logical ReadAhead). My colleague Rongrong will speak about something interesting regarding online defragmentation at Percona Live London. For Replication, we have done many optimizations to make GTID and MTS work without pain. Semi-Synchronous mysqlbinlog and backported Loss-Less semisync from MySQL 5.7 are very useful when you use Semi-Synchronous replication.

Tom: Shlomo, your session, “MySQL Automation at Facebook Scale,” will be of great interest to DBAs at large and growing organizations considering that Facebook has one of the world’s largest MySQL database clusters. What are the two or three most significant things that you’ve learned as a database engineer operating a cluster of this size? And has anything surprised you along the way (so far)?

Shlomo: This is a great question! We like to speak of “10x” at Facebook when thinking of scaling. For example, what would you do differently if the number of servers you had was 10x more than what it is? This type of mental exercise is surprisingly useful when working with systems at scale. If you, or any of the readers, try to extrapolate this about systems you manage, there will be things you’ll be imagining about how a system like this would be – and you won’t be too far from our reality in many aspects.

You’d imagine that we automate much of the single units of work, like master/slave failover, upgrades and schema changes. You’d suspect we have automated fault detection, self managing systems, good alarming and self remediation. You’d presume that if you’re used to running a command on 100 machines, you’ll now be running it on 1000. At least that’s what I thought to myself, so these are not the things that surprised me. There are a few fundamental shifts in one’s thinking when you get to these sizes, which I didn’t foresee.

The first one is that there is absolutely no such thing as a “one-off.” If there is a server somewhere that hits a problem every three years, and you have 1000 servers, this will be happening daily! Take it to 10,000 servers, and you can see absolutely nothing is a “one-off.” We can’t write things off as “worst case, I’ll get an SMS.” Whatever it is, we have to chase it down and fix it. Not just that – deploying a fix at scale can require writing a fairly large amount of code for something that a DBA in a smaller environment could simply fix by hand.

The second one is adapting to constraints which are very pragmatic and tangible. If you’re on AWS, you’re pretty much isolated from things like worrying where your servers are physically located, when they go over their lifetime, and if the firmware on the switch in the rack needs to be upgraded. If you’re a small shop and have a few racks up in a co-lo, hardware maintenance is just not as frequent, but it becomes more painful as you grow.

At Facebook, we run our own datacenters! We need to work around interesting challenges, such as running datacenters that have highly variable compositions of server hardware. Since we have so many servers, something is always going on. Racks of servers need to be moved. Whole clusters need to be rebuilt or refreshed, to be made better, faster, stronger.  New datacenters are constructed, others decommissioned.

Tom: And this is where automation comes into the picture, right?

Shlomo: We have had to build a lot of automation to make these operations seamless, and we work closely with the Site Ops teams on the ground to coordinate these logistically complicated processes.

Another thing my team does in this space is planning capacity and hardware purchases. Since we build our own servers, the turnaround time between ordering and getting machines is quite long, so proper planning is paramount. Buy too much, and you’ve wasted millions of dollars. Buy too few servers, and there won’t be space for user growth and upcoming projects. The sheer scale makes these decisions more complicated and involved.

These things have actually made my job much more interesting, and I think I’d find it hard to adjust to a smaller environment.

Tom: Last April Facebook announced a move to the newly created WebScaleSQL. Yoshi, do you have an update on where WebScaleSQL is today? And I know it’s early, but has there been any impact on Facebook yet?

Yoshi: WebscaleSQL is a collaboration among engineers from several companies that face similar challenges in running MySQL at scale. Collaboration is nothing new to the MySQL community. The intent is to make this collaboration more efficient.

We are based on the latest upstream (currently MySQL-5.6.21), and added many features. We added patches to improve InnoDB performance around compression LRU flushing, locking, NUMA Support, and doublewrite. We statically link Semi-Sync based on lessons learned at Facebook environments (plugin-lock caused hot mutex contentions). We have many upcoming features such as async clients.

We will continue to track the upstream branch that is the latest, production-ready release (currently MySQL 5.6). We are continuing to push the generally useful changes we have from all of the participants.  If you think you have something to contribute, get in touch!

Tom: I remember being surprised earlier this year when you told me there was usually just one MySQL Operations team member on call at any given time thanks to “robots.” How many robots did your team build and what do they do? Oh, and should rank-and-file DBAs around the world be worried about losing their day jobs to these robots?

Shlomo: Instead of becoming obsolete as some fear, our team is shifting its focus from smaller to larger problems, as we rise higher in the levels of abstraction. Our team has progressed with the requirements of the role. From being a team of DBAs that automate some of their work, we have become more like Production Engineers. We design, write and maintain MySQL/Facebook-specific automation that does our work for us.

While we build these software “robots” to do our work, we also have to maintain them. The job of the oncall is to fix these robots when they malfunction, and that can sometimes be difficult due to the size of our codebase.

In regards to employment concerns, I’d say our work has become more interesting, and the amount has increased. It definitely did not decrease, so if Facebook is indicative of other companies, jobs are not at risk just yet. Speaking of jobs – if what we’re doing sounds interesting, we’re hiring!

Oh, and as for details about these “robots” – that’s the topic of my talk next week in London. Come and hear me speak if you want to know more!

Tom: Yoshi, you also will host a session titled “Fast Master Failover without Data Loss.” I don’t want to give too much away, but how did you get failover to work at scale – across vast datacenters?

Yoshi: Master failure is the norm at Facebook because of the large number of servers. Without automation, it is not realistic for a limited number of people to manage. We have a very interesting infrastructure to automate failure handling at Facebook scale. To automate things, reliability is important. Unreliable automation makes engineers spend lots of time fixing things manually, and that increases downtime. It is also important to define what to automate and what we shouldn’t automate, to define failure scenarios, and to write good test cases and continuously integrate. There are multiple failure scenarios like the ones below and you’ll hear about each in detail at my session:

– mysqld crash
– mysqld stalls
– kernel panic and reboot
– error spikes caused by H/W failure
– error spikes caused by bad application logic
– rack switch down
– multiple rack switches down
– datacenter down

Tom: What other sessions, keynotes or events are you looking forward to at Percona Live London 2014? And are you guys planning on attending the MySQL Community Dinner?

Yoshi: “MySQL 5.7: Performance and Scalability Benchmark” (led by Oracle MySQL performance architect Dimitri Kravtchuk). And yes, we’re looking forward to meeting with people at the MySQL Community Dinner!

Tom:  Thanks again Yoshi and Shlomo for taking the time to speak with me and I look forward to seeing you both in London next week!

And readers, I invite you to register now for Percona Live London using the promotional code “Facebook” to save £30. I also hope to see you at the MySQL Community Dinner next Monday (Nov. 3). Space is limited so be sure to reserve your spot now and join us aboard our private double-decker bus to the restaurant.

I’d also like to thank the Percona Live London 2014 Conference Committee for putting together a terrific event this year! The conference committee includes:

  • Dailymotion’s Cédric Peintre, conference chairman
  • Percona’s David Busby
  • MariaDB’s Colin Charles
  • ebay Classifieds Group’s Luis Motta Campos
  • Booking.com’s Nicolai Plum
  • Oracle’s Morgan Tocker
  • Spil Games’ Art van Scheppingen

The post Facebook MySQL database engineers ready for Percona Live London 2014 appeared first on MySQL Performance Blog.

MySQL and Openstack deep dive talk at OpenStack Paris Summit (and more!)

October 29, 2014 - 5:59am

I will present a benchmarking talk next week (Nov. 4) at the OpenStack Paris Summit with Jay Pipes from Mirantis. In order to be able to talk about benchmarking, we had to be able to set up and tear down OpenStack environments really quickly. For the benchmarks, we are using a deployment on AWS (ironically) where the instances aren’t actually started and the tenant network is not reachable but all the backend operations still happen.

The first performance bottleneck we hit wasn’t at the MySQL level. We used Rally to benchmark the environment and, for a first pass, booted 1,000 fake instances with it.

The first bottleneck we saw was neutron-server eating up a single CPU core: a deeper look confirmed it was completely saturating that one core because, by default, neutron does everything in a single process. After configuring the API workers and the RPC workers, performance became significantly better.

api_workers = 64
rpc_workers = 32
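These options normally live in the [DEFAULT] section of neutron.conf – a hedged sketch, since the file path and layout can vary by distribution and release:

# /etc/neutron/neutron.conf (assumed location)
[DEFAULT]
api_workers = 64
rpc_workers = 32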

Before adding the options:

u'runner': {u'concurrency': 24, u'times': 1000, u'type': u'constant'}}
+------------------+-----------+-----------+-----------+---------------+---------------+---------+-------+
| action           | min (sec) | avg (sec) | max (sec) | 90 percentile | 95 percentile | success | count |
+------------------+-----------+-----------+-----------+---------------+---------------+---------+-------+
| nova.boot_server | 4.125     | 9.336     | 15.547    | 11.795        | 12.362        | 100.0%  | 1000  |
| total            | 4.126     | 9.336     | 15.547    | 11.795        | 12.362        | 100.0%  | 1000  |
+------------------+-----------+-----------+-----------+---------------+---------------+---------+-------+
Whole scenario time without context preparation: 391.359671831

After adding the options:

u'runner': {u'concurrency': 24, u'times': 1000, u'type': u'constant'}}
+------------------+-----------+-----------+-----------+---------------+---------------+---------+-------+
| action           | min (sec) | avg (sec) | max (sec) | 90 percentile | 95 percentile | success | count |
+------------------+-----------+-----------+-----------+---------------+---------------+---------+-------+
| nova.boot_server | 2.821     | 6.958     | 36.826    | 8.165         | 10.49         | 100.0%  | 1000  |
| total            | 2.821     | 6.958     | 36.826    | 8.165         | 10.49         | 100.0%  | 1000  |
+------------------+-----------+-----------+-----------+---------------+---------------+---------+-------+
Whole scenario time without context preparation: 292.163493156

Stop by our talk at the OpenStack Paris Summit for more details!

In addition to our talk, Percona has two additional speakers at the OpenStack Paris Summit. George Lorch, Percona software engineer, will speak with Vipul Sabhaya of the HP Cloud Platform Services team on “Percona Server Features for OpenStack and Trove Ops.” Tushar Katarki, Percona director of product management, will present a vBrownBag Tech Talk entitled “MySQL High Availability Options for OpenStack.” Percona is exhibiting at the OpenStack Paris Summit conference, as well – stop by booth E20 and say hello!

At Percona, we’re pleased to see the adoption of our open source software by the OpenStack community and we are working actively to develop more solutions for OpenStack users. We also provide Consulting assistance to organizations that are adopting OpenStack internally or are creating commercial services on top of OpenStack.

We are also pleased to introduce the first annual OpenStack Live, a conference focused on OpenStack and Trove, which is April 13 & 14, 2015 in Santa Clara, California. The call for speaking proposals is now open for submissions which will be reviewed by our OpenStack Live Conference Committee (including me!).

The post MySQL and Openstack deep dive talk at OpenStack Paris Summit (and more!) appeared first on MySQL Performance Blog.

How to deal with MySQL deadlocks

October 28, 2014 - 12:00am

A deadlock in MySQL happens when two or more transactions mutually hold and request locks, creating a cycle of dependencies. In a transactional system, deadlocks are a fact of life and not completely avoidable. InnoDB automatically detects transaction deadlocks, rolls back a transaction immediately and returns an error. It uses a metric to pick the easiest transaction to roll back. Though an occasional deadlock is not something to worry about, frequent occurrences call for attention.

Before MySQL 5.6, only the latest deadlock could be reviewed, using the SHOW ENGINE INNODB STATUS command. But with Percona Toolkit’s pt-deadlock-logger you can have deadlock information retrieved from SHOW ENGINE INNODB STATUS at a given interval and saved to a file or table for later diagnosis. For more information on using pt-deadlock-logger, see this post. With MySQL 5.6, you can enable the new variable innodb_print_all_deadlocks to have all deadlocks in InnoDB recorded in the mysqld error log.
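For example, the two approaches could be set up roughly like this (a hedged sketch: the connection credentials and the destination database/table passed to pt-deadlock-logger are placeholders, so check the tool’s documentation for the exact option names in your version):

# MySQL 5.6+: record every InnoDB deadlock in the mysqld error log
$ mysql -e "SET GLOBAL innodb_print_all_deadlocks = ON;"

# Percona Toolkit: poll SHOW ENGINE INNODB STATUS every 30 seconds for an hour
# and store any deadlocks found into the percona.deadlocks table
$ pt-deadlock-logger h=localhost,u=monitor,p=secret --dest D=percona,t=deadlocks --interval 30 --run-time 1h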

Before and above all diagnosis, it is always an important practice to have the application catch deadlock errors (MySQL error no. 1213) and handle them by retrying the transaction.
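As an illustration of that retry pattern, here is a minimal shell sketch – a real application would do this in its data-access layer, and the database, table and statement used here are made-up placeholders (credentials are assumed to be in ~/.my.cnf):

for attempt in 1 2 3; do
    if mysql mydb -e "START TRANSACTION; UPDATE t1 SET col2 = col2 + 1 WHERE col1 = 10; COMMIT;" 2>/tmp/trx.err; then
        break                                # transaction committed
    fi
    grep -q "1213" /tmp/trx.err || break     # give up on any error other than a deadlock
    sleep 1                                  # short back-off before retrying
done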

How to diagnose a MySQL deadlock

A MySQL deadlock could involve more than two transactions, but the LATEST DETECTED DEADLOCK section only shows the last two. It also only shows the last statement executed in each of the two transactions, and the locks from the two transactions that created the cycle. What is missing are the earlier statements that might have actually acquired the locks. I will show some tips on how to collect those missing statements.

Let’s look at two examples to see what information is given. Example 1:

1 141013 6:06:22
2 *** (1) TRANSACTION:
3 TRANSACTION 876726B90, ACTIVE 7 sec setting auto-inc lock
4 mysql tables in use 1, locked 1
5 LOCK WAIT 9 lock struct(s), heap size 1248, 4 row lock(s), undo log entries 4
6 MySQL thread id 155118366, OS thread handle 0x7f59e638a700, query id 87987781416 localhost msandbox update
7 INSERT INTO t1 (col1, col2, col3, col4) values (10, 20, 30, 'hello')
8 *** (1) WAITING FOR THIS LOCK TO BE GRANTED:
9 TABLE LOCK table `mydb`.`t1` trx id 876726B90 lock mode AUTO-INC waiting
10 *** (2) TRANSACTION:
11 TRANSACTION 876725B2D, ACTIVE 9 sec inserting
12 mysql tables in use 1, locked 1
13 876 lock struct(s), heap size 80312, 1022 row lock(s), undo log entries 1002
14 MySQL thread id 155097580, OS thread handle 0x7f585be79700, query id 87987761732 localhost msandbox update
15 INSERT INTO t1 (col1, col2, col3, col4) values (7, 86, 62, "a lot of things"), (7, 76, 62, "many more")
16 *** (2) HOLDS THE LOCK(S):
17 TABLE LOCK table `mydb`.`t1` trx id 876725B2D lock mode AUTO-INC
18 *** (2) WAITING FOR THIS LOCK TO BE GRANTED:
19 RECORD LOCKS space id 44917 page no 529635 n bits 112 index `PRIMARY` of table `mydb`.`t2` trx id 876725B2D lock mode S locks rec but not gap waiting
20 *** WE ROLL BACK TRANSACTION (1)

Line 1 gives the time when the deadlock happened. If your application code catches and logs deadlock errors, which it should, then you can match this timestamp with the timestamps of deadlock errors in the application log. You would then have the transaction that got rolled back. From there, retrieve all statements from that transaction.

Line 3 & 11, take note of the transaction number and ACTIVE time. If you log SHOW ENGINE INNODB STATUS output periodically (which is a good practice), then you can search previous outputs for the transaction number to hopefully see more statements from the same transaction. The ACTIVE sec gives a hint on whether the transaction is a single-statement or multi-statement one.
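A simple way to get that periodic logging is a cron entry like the one below (assuming the cron user can read client credentials from its ~/.my.cnf):

* * * * * mysql -e 'SHOW ENGINE INNODB STATUS\G' >> /var/log/innodb_status.log 2>&1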

Line 4 & 12, the tables in use and locked are only with respect to the current statement. So having 1 table in use does not necessarily mean that the transaction involves 1 table only.

Line 5 & 13, these are worth attention as they tell how many changes the transaction had made (the “undo log entries”) and how many row locks it held (the “row lock(s)”). This information hints at the complexity of the transaction.

Line 6 & 14, take note of the thread id, connecting host and connecting user. If you use different MySQL users for different application functions, which is another good practice, then you can tell which application area the transaction comes from based on the connecting host and user.

Line 9, for the first transaction, it only shows the lock it was waiting for, in this case the AUTO-INC lock on table t1. Other possible values are S for shared lock and X for exclusive with or without gap locks.

Line 16 & 17, for the second transaction, it shows the lock(s) it held, in this case the AUTO-INC lock which was what TRANSACTION (1) was waiting for.

Line 18 & 19 shows which lock TRANSACTION (2) was waiting for. In this case, it was a shared not gap record lock on another table’s primary key. There are only a few sources for a shared record lock in InnoDB:
1) use of SELECT … LOCK IN SHARE MODE
2) on foreign key referenced record(s)
3) with INSERT INTO… SELECT, shared locks on source table
The current statement of trx(2) is a simple insert to table t1, so 1 and 3 are eliminated. By checking SHOW CREATE TABLE t1, you could confirm that the S lock was due to a foreign key constraint to the parent table t2.

Example 2: With MySQL community version, each record lock has the record content printed:

1 2014-10-11 10:41:12 7f6f912d7700
2 *** (1) TRANSACTION:
3 TRANSACTION 2164000, ACTIVE 27 sec starting index read
4 mysql tables in use 1, locked 1
5 LOCK WAIT 3 lock struct(s), heap size 360, 2 row lock(s), undo log entries 1
6 MySQL thread id 9, OS thread handle 0x7f6f91296700, query id 87 localhost root updating
7 update t1 set name = 'b' where id = 3
8 *** (1) WAITING FOR THIS LOCK TO BE GRANTED:
9 RECORD LOCKS space id 1704 page no 3 n bits 72 index `PRIMARY` of table `test`.`t1` trx id 2164000 lock_mode X locks rec but not gap waiting
10 Record lock, heap no 4 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
11 0: len 4; hex 80000003; asc ;;
12 1: len 6; hex 000000210521; asc ! !;;
13 2: len 7; hex 180000122117cb; asc ! ;;
14 3: len 4; hex 80000008; asc ;;
15 4: len 1; hex 63; asc c;;
16
17 *** (2) TRANSACTION:
18 TRANSACTION 2164001, ACTIVE 18 sec starting index read
19 mysql tables in use 1, locked 1
20 3 lock struct(s), heap size 360, 2 row lock(s), undo log entries 1
21 MySQL thread id 10, OS thread handle 0x7f6f912d7700, query id 88 localhost root updating
22 update t1 set name = 'c' where id = 2
23 *** (2) HOLDS THE LOCK(S):
24 RECORD LOCKS space id 1704 page no 3 n bits 72 index `PRIMARY` of table `test`.`t1` trx id 2164001 lock_mode X locks rec but not gap
25 Record lock, heap no 4 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
26 0: len 4; hex 80000003; asc ;;
27 1: len 6; hex 000000210521; asc ! !;;
28 2: len 7; hex 180000122117cb; asc ! ;;
29 3: len 4; hex 80000008; asc ;;
30 4: len 1; hex 63; asc c;;
31
32 *** (2) WAITING FOR THIS LOCK TO BE GRANTED:
33 RECORD LOCKS space id 1704 page no 3 n bits 72 index `PRIMARY` of table `test`.`t1` trx id 2164001 lock_mode X locks rec but not gap waiting
34 Record lock, heap no 3 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
35 0: len 4; hex 80000002; asc ;;
36 1: len 6; hex 000000210520; asc ! ;;
37 2: len 7; hex 17000001c510f5; asc ;;
38 3: len 4; hex 80000009; asc ;;
39 4: len 1; hex 62; asc b;;

Line 9 & 10: The ‘space id’ is the tablespace id, and ‘page no’ gives which page the record lock is on inside the tablespace. The ‘n bits’ is not the page offset but the number of bits in the lock bitmap. The page offset is the ‘heap no’ on line 10.

Line 11~15: It shows the record data in hex numbers. Field 0 is the clustered index (primary key); ignoring the highest bit, its value is 3. Field 1 is the transaction id of the transaction which last modified this record; its decimal value is 2164001, which is TRANSACTION (2). Field 2 is the rollback pointer. Starting from field 3 is the rest of the row data: field 3 is an integer column with value 8, and field 4 is a string column containing the character ‘c’. By reading the data, we know exactly which row is locked and what its current value is.

What else can we learn from the analysis? Since most MySQL deadlocks happen between two transactions, we can start the analysis based on that assumption. In Example 1, trx (2) was waiting on a shared lock, so trx (1) either held a shared or an exclusive lock on that primary key record of table t2. Let’s say col2 is the foreign key column; by checking the current statement of trx (1), we know it did not require the same record lock, so it must be some previous statement in trx (1) that required S or X lock(s) on t2’s PK record(s). Trx (1) only made 4 row changes in 7 seconds. So you have learned a few characteristics of trx (1): it does a lot of processing but makes few changes; the changes involve tables t1 and t2, including a single record insertion into t2. This information combined with other data could help developers locate the transaction.

Where else can we find previous statements of the transactions? Besides the application log and previous SHOW ENGINE INNODB STATUS output, you may also leverage the binlog, the slow log and/or the general query log. With the binlog, if binlog_format=statement, each binlog event has the thread_id. Only committed transactions are logged into the binlog, so we can only look for trx (2) there. In the case of Example 1, we know when the deadlock happened and that trx (2) started 9 seconds earlier. We can run mysqlbinlog on the right binlog file and look for statements with thread_id = 155097580. It is always good to then cross-reference the statements with the application code to confirm.

$ mysqlbinlog -vvv --start-datetime="2014-10-13 6:06:12" --stop-datetime="2014-10-13 6:06:22" mysql-bin.000010 > binlog_1013_0606.out
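Since statement-based binlog events carry the thread id in their header comments, you could then filter the decoded output for the session in question, for example:

$ grep -n "thread_id=155097580" binlog_1013_0606.out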

With Percona Server 5.5 and above, you can set log_slow_verbosity to include the InnoDB transaction id in the slow log. Then, if you have long_query_time = 0, you will be able to catch all statements, including those rolled back, in the slow log file. With the general query log, the thread id is included and can be used to look for related statements.
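A hedged my.cnf sketch for Percona Server (option names as documented for 5.5/5.6; logging every query this way is expensive, so treat it as a temporary diagnostic setting):

[mysqld]
slow_query_log     = 1
long_query_time    = 0
log_slow_verbosity = full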

How to avoid a MySQL deadlock

There are things we could do to eliminate a deadlock after we understand it.

– Make changes to the application. In some cases, you can greatly reduce the frequency of deadlocks by splitting a long transaction into smaller ones, so locks are released sooner. In other cases, the deadlock arises because two transactions touch the same sets of data, either in one or more tables, in different orders. Change them to access the data in the same order – in other words, serialize the access. That way you would have a lock wait instead of a deadlock when the transactions happen concurrently.

– Make changes to the table schema, such as removing foreign key constraint to detach two tables, or adding indexes to minimize the rows scanned and locked.

– In case of gap locking, you may change the transaction isolation level to READ COMMITTED for the session or transaction to avoid it. But then the binlog format for the session or transaction would have to be ROW or MIXED.
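For example, for a single session (variable names as in MySQL 5.6; changing binlog_format at the session level typically requires the SUPER privilege):

mysql> SET SESSION binlog_format = 'ROW';
mysql> SET SESSION tx_isolation = 'READ-COMMITTED';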

The post How to deal with MySQL deadlocks appeared first on MySQL Performance Blog.

MySQL & Friends Devroom FOSDEM 2015

October 27, 2014 - 6:50am

You can already feel the cold of February coming slowly… you can also smell waffles and fries and see a large number of beards walking around with laptops… you are right, FOSDEM is coming! And as every year, the MySQL Community will also be present! For the 4th year in a row, I’ll be organizing the MySQL & Friends Devroom.

FOSDEM 2015 edition will be held January 31 and February 1 here in Brussels. The MySQL & Friends Devroom is back on Sunday from 9 a.m. What is FOSDEM? It stands for the “Free and Open Source Software Developers’ European Meeting.” It’s a free event that offers open-source communities a place to meet, share ideas and collaborate.

As every year, the “Call for Papers” has been announced on the MySQL mailing list, and you can still read it here. CfP is open until December 7th!

This year the committee responsible for selecting the talks is composed of:

* Dimitri Kravtchuk, representing Oracle
* Daniël van Eeden for the Community
* Roland Bouman for the Community
* Cédric Peintre for the Community
* Liz van Dijk, representing Percona
* Serge Frezefond, representing MariaDB
* René Cannaò, representing Blackbird IT

Thanks to all who have accepted this role, and I wish them well as they work hard to make the best schedule possible.

Don’t forget to submit your sessions (submit here, don’t forget to select MySQL track) in time and see you soon in Brussels to discover amazing stuff related to MySQL and have some beers with Friends!

The post MySQL & Friends Devroom FOSDEM 2015 appeared first on MySQL Performance Blog.

MySQL 5.6 Full Text Search Throwdown: Webinar Q&A

October 23, 2014 - 6:01am

Yesterday (Oct. 22) I gave a presentation titled “MySQL 5.6 Full Text Search Throwdown.” If you missed it, you can still register to view the recording and my slides.

Thanks to everyone who attended, and especially to folks who asked the great questions. I answered as many as we had time for during the session, but here are all the questions with my complete answers:

Q: Does Solr automatically maintain its index against MySQL? Do you have to hit the Solr server with a specific query to keep the index ‘warm’?

There are several strategies for updating a Solr index. In my examples, I showed only a “full import” which is what you would do to create an index by reading all the source data.

You can also perform a “delta import” periodically, to add a subset of the source data to an existing index, for example to add data that has changed since the last time you updated the Solr index. See the documentation for Using delta-import command and also Using query attribute for both full and delta import.

The delta import would typically be something you would invoke from a cron job, perhaps every hour. But that means that a Solr search might not find data that has changed in MySQL more recently than the last delta import. Depending on the application, a delay of up to 60 minutes might be acceptable, or else maybe you have strict requirements that all data must be in sync instantly.
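For instance, with the DataImportHandler configured, a cron job could trigger the delta import over HTTP – a hedged sketch, where the host, port and core name (“mycore”) are placeholders:

0 * * * * curl -s 'http://localhost:8983/solr/mycore/dataimport?command=delta-import&commit=true'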

You could also update the Solr index one document at a time using its Java API or web service API. This would require you to write code in your application. Every time you INSERT or UPDATE or DELETE a document in MySQL that you want to be kept in sync with the Solr index, you would write more code to do a similar operation in the Solr index. That way every single text change would be searchable nearly immediately.

Q: Did you test Elasticsearch? (several people asked about this)

I did not test Elasticsearch, but according to their technology overview: “Elasticsearch uses Lucene under the covers.” So I expect that this part of Elasticsearch performs similarly to what I saw from Apache Solr, which also uses Lucene internally.

Q: One thing I could not understand: how do you maintain the Sphinx index in sync with the data? Can it be done in real time?

The Sphinx Search index does not automatically refresh as your MySQL data changes. You would have to write application code to invoke the indexing process. There’s a page in the Sphinx Search documentation about Live Index Updates, that gives an overview of the two methods, and links to further reading.

This is definitely the most inconvenient aspect of Sphinx Search. Queries are very fast, but it’s expensive to do incremental updates to an index. So it’s ideal for indexing an archive of text that doesn’t change very frequently, but not as easy to use it for indexing rapidly-changing content.

Q: I have over 800,000 PDF documents to index (in several languages), any recommendations?

I said during the webinar that I recalled there exist tools to extract searchable text from a PDF file. I found one such project, Apache PDFBox, which includes this capability, and they have a page describing a helper class for doing PDF parsing and extraction combined with Lucene indexing. I haven’t used it myself, so I can’t comment on its performance for indexing 800,000 PDF documents, but it seems like you could write a Java program to iterate over your collection of PDFs and index them using this class.

Q: What is your suggestion to use Sphinx Search for single column searches?

You can use any SQL query in the sphinx.conf to define the source data to index. You can select one column, multiple columns, or even multiple columns from joined tables. The result from any SQL query you write can be used as the data source.
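A minimal sphinx.conf sketch of such a single-column source is below; the table, column, credential and path names are all placeholders:

source posts_src
{
    type      = mysql
    sql_host  = localhost
    sql_user  = sphinx
    sql_pass  = secret
    sql_db    = mydb
    sql_query = SELECT id, title FROM posts
}

index posts_title
{
    source = posts_src
    path   = /var/lib/sphinx/posts_title
}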

Q: Which modules did you use with Sphinx Search? Did you use its built-in stemmers and metaphone package, etc.?

I installed the default modules. I don’t know if there is a significant performance difference from using optional packages.

Q: What about quality of results from each solution? I remember reading an article on percona.com several months ago comparing MyISAM fulltext vs InnoDB fulltext, and there were concerns about the results from InnoDB. Did you do any testing on this?

Indeed, here’s a link to the excellent blog post by my colleague Ernie Souhrada in which he found some surprises in the results from InnoDB FTS: InnoDB Full-text Search in MySQL 5.6: Part 2, The Queries!

I was just doing some comparison for performance in the current MySQL 5.7 milestone. I didn’t compare the query results this time.

Q: Is there any full text search in Percona Server with XtraDB?

Percona Server is based on the upstream MySQL Community Edition of the respective version number. So Percona Server has the builtin FULLTEXT index types for MyISAM and InnoDB, and we have not changed this part of the code. Percona Server does not bundle Sphinx Search, but it’s not too difficult to install Sphinx Search as a complementary technology, just as you would install other packages that are commonly used parts of an application infrastructure, for example Memcached or HA-proxy.

Q: Is MySQL going to improve the built-in InnoDB FTS in the near future?

They are continuing to add features that improve FTS, for example:

  • You can now write your own plugins for fulltext parsing (that is, parsing the input data to identify “words” to index; you may have your own idea about how to split text into words).
  • Both B-tree and full-text index types now use bulk loading to make building the index faster and more efficient.

I’m not aware of any work to improve the performance of fulltext queries significantly.

Q: What is the performance comparison between MyISAM and InnoDB for inline index updating?

I didn’t test performance of incremental index updates this time. I only populated my tables from the StackOverflow data using LOAD XML, and then I created fulltext indexes on the populated tables. But I generally favor moving all important data to InnoDB, and not using MyISAM tables. It’s hard to imagine that the performance of index updates would be so much better that would convince me to use MyISAM. It’s more likely that the accuracy of search results would be a good reason to use MyISAM. Even then, I’d keep the original data in InnoDB and use MyISAM only as a copy of the data, to create a disposable fulltext index.

Thanks again for attending my webinar! For more great content, please join Percona and the MySQL community at our conference events. The next one is Percona Live London 2014 on November 3-4. We also look forward to OpenStack Live 2015 in Santa Clara, California on April 13-14, in the same venue as the Percona Live MySQL Conference and Expo 2015, April 13-16.

Also watch more webinars from Percona in the future!

The post MySQL 5.6 Full Text Search Throwdown: Webinar Q&A appeared first on MySQL Performance Blog.

MySQL community set to meet at Percona Live London 2014

October 22, 2014 - 12:00am

The countdown is on for Europe’s largest annual MySQL event, Percona Live London 2014. The two days of technical tutorials and sessions, November 3-4, will focus on the latest MySQL industry trends, news, best practices – and a look at what’s on the near- and long-term horizon within the global MySQL ecosystem.

Percona Live London 2014 will bring attendees up to date on key areas including MySQL 5.7, database security, database as a service (DBaaS), Hadoop and high availability (HA), disaster recovery, replication and backup, performance and scalability, WebScaleSQL and much, much more.

Team Oracle will be in London, led by Tomas Ulin, vice president of Oracle’s MySQL engineering team, who will explain why MySQL just keeps getting better with the latest news for the MySQL Database, MySQL Cluster, MySQL Workbench… and more. Oracle’s Luis Soares, principal software engineer, and Andrew Morgan, MySQL HA product management, will provide insight into what’s in the latest 5.7 development milestone release and also what’s going on in the labs… particularly around MySQL replication. Seize the opportunity to learn how to leverage MySQL 5.7 replication to grow your business from their session, “MySQL Replication: What’s New in 5.7 and Beyond.”

If anything keeps DBAs up at night it’s database security – especially with recent revelations of vulnerabilities like the POODLE SSLv3 security flaw (CVE-2014-3566) and “Bash Bug,” also known as Shellshock (CVE-2014-6271). Attendees will have the opportunity to talk face-to-face with database security expert David Busby of Percona, who will also lead a session titled, “Security it’s more than just your database you should worry about.”

The official Percona Live London 2014 t-shirt!

Observe how to incorporate semi-synchronous replication to achieve failover – without data loss. Facebook’s Yoshinori Matsunobu and Santosh Banda will share how they did it at scale (across data centers) by extending MySQL internals along with some handy new self-made tools.

Meet the next-generation C connector for MySQL: libAttachSQL. It’s a new lightweight async C connector library for MySQL being developed from scratch by HP’s Advanced Technology Group. Andrew Hutchings, principal software engineer at Hewlett-Packard, will be on hand to share the latest on libAttachSQL.

Successful applications often become limited by MySQL performance. But tracking down and fixing those issues can be a huge drain on time and resources. Unless you think smart – spending time on what gives you the best return. Percona CEO Peter Zaitsev will explain how in his session, “Practical MySQL Performance Optimization.”

Percona Live London attendees will also learn from the real-life experiences of MySQL experts who share case studies. Shake hands with Art van Scheppingen, head of database engineering at Spil Games, who will explain how to serve out any page with an HA Sphinx environment.

Save yourself a quarter century by absorbing Tim Callaghan’s MySQL performance benchmarking tips, tricks and lessons learned. Tim, vice president of engineering at Tokutek, will share what he’s learned in the past 25 years maintaining the performance of database applications.

And of course there will be a MySQL community dinner! But be sure to register now for the dinner because space is limited – especially if you want to enjoy a ride to the restaurant on a vintage double-decker London bus (you do not need to attend the conference to join the dinner).

Register now for Percona Live London 2014 and save £30 with discount code “MPB30“. See you in London!

The post MySQL community set to meet at Percona Live London 2014 appeared first on MySQL Performance Blog.

Percona XtraDB Cluster: How to run a 2-node cluster on a single server

October 21, 2014 - 6:53am
I reckon there’s little sense in running 2 or more Percona XtraDB Cluster (PXC) nodes in a single physical server other than for educational and testing purposes – but doing so is still useful in those cases. The most popular way of achieving this seems to be with server virtualization, such as making use of Vagrant boxes. But in the same way you can have multiple instances of MySQL running in parallel on the OS level in the form of concurrent mysqld processes, so too can you have multiple Percona XtraDB Cluster nodes. And the way to achieve this is precisely the same: using dedicated datadirs and different ports for each node.

 

Which ports?

Four TCP ports are used by Percona XtraDB Cluster:
  • the regular MySQL port (default 3306)
  • port for group (Galera) communication (default 4567)
  • port for State Transfer (default 4444)
  • port for Incremental State Transfer (default is: port for group communication (4567) + 1 = 4568)
Of course, when you have multiple instances on the same server the default values won’t work for all of them, so we need to define new ports for the additional instances and make sure the local firewall is open to them, if there is one active (iptables, SELinux, …).
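For example, with iptables you might open the extra ports a second local instance will use – a hedged sketch using the ports configured for the second node later in this post (3307 for clients, 5020 for group communication, 5021 for IST):

$ sudo iptables -I INPUT -p tcp -m multiport --dports 3307,5020,5021 -j ACCEPT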


Installing Percona XtraDB Cluster, configuring and starting the first node

My test server was a fresh CentOS 6.5 configured with the Percona yum repository, from which I installed the latest Percona XtraDB Cluster (5.6.20-25.7.888.el6); note that you’ll need the EPEL repository as well to install socat, which is a dependency (see this bug). To avoid confusion, I’ve prevented the mysql service from starting automatically:

chkconfig --level 3 mysql off
chkconfig --del mysql
I could have installed PXC from the tarball but I decided to do it from the repositories to have all dependencies covered by yum. This is what my initial /etc/my.cnf looked like (note the use of default values):

[mysqld]
datadir = /var/lib/mysql
port=3306
socket=/var/lib/mysql/mysql-node1.sock
pid-file=/var/lib/mysql/mysql-node1.pid
log-error=/var/lib/mysql/mysql-node1.err
binlog_format=ROW
innodb_autoinc_lock_mode=2
wsrep_provider=/usr/lib64/libgalera_smm.so
wsrep_cluster_name = singlebox
wsrep_node_name = node1
wsrep_cluster_address=gcomm://

I started by manually bootstrapping the cluster with this single node using the command:

$ mysqld_safe --defaults-file=/etc/my.cnf --wsrep-new-cluster
You should then be able to access this node through the local socket:

$ mysql -S /var/lib/mysql/mysql-node1.sock

 

Configuring and starting the second node

Then I created a similar configuration file for the second instance, which I named /etc/my2.cnf, with the following modifications:

[mysqld]
datadir = /var/lib/mysql2
port=3307
socket=/var/lib/mysql2/mysql-node2.sock
pid-file=/var/lib/mysql2/mysql-node2.pid
log-error=/var/lib/mysql2/mysql-node2.err
binlog_format=ROW
innodb_autoinc_lock_mode=2
wsrep_provider=/usr/lib64/libgalera_smm.so
wsrep_cluster_name = singlebox
wsrep_node_name = node2
wsrep_cluster_address=gcomm://127.0.0.1:4567,127.0.0.1:5020
wsrep_provider_options = "base_port=5020;"
Note the use of base_port: by having it defined, port 5020 is used for group communication and 5021 (the one above it) is reserved for IST (it’s the same as using gmcast.listen_addr=tcp://127.0.0.1:5021, just simpler).

You need to create the datadir for this second instance and set the right permissions on it, otherwise MySQL won’t be able to create some files (like .pid and .err), though you don’t need to run the mysql_install_db script:

$ chown -R mysql:mysql /var/lib/mysql2

You can then start this second instance with the following command:

$ mysqld_safe --defaults-file=/etc/my2.cnf

While it starts, watch the log to observe how this second node starts, communicates with the primary node and joins the cluster. On a different terminal from the one where you started the instance, execute:

$ tail -f /var/lib/mysql2/mysql-node2.err

Remember that at any time you can use mysqladmin to stop the nodes; you only need to provide the right socket as an argument, as follows:

$ mysqladmin -S /var/lib/mysql/mysql-node1.sock shutdown

Finally, once you have the whole cluster up you should edit the my.cnf of the first node with a complete wsrep_cluster_address, as shown in /etc/my2.cnf above.
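To confirm that the second node has in fact joined, you can check the cluster size from either node, for example:

$ mysql -S /var/lib/mysql/mysql-node1.sock -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';"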

 

Using mysqld_multi

My last blog post was on running multiple instances of MySQL with mysqld_multi. It applies here as well; the only exception is that you need to make sure to use “wsrep_cluster_address=gcomm://” in the first node whenever you bootstrap the cluster – and pay attention to start it before the other nodes.

The only advantage I see in using mysqld_multi is facilitating the management (start/stop) of the nodes and concentrating all configuration in a single my.cnf file. In any case, you shouldn’t be running a PXC cluster in a single box for any purpose other than an educational one.

 

Adding a second Percona XtraDB Cluster node to a production server

What if you have a production cluster composed of multiple physical servers and you want to add a second node to one of them? It works the same way – you’ll just need to use the server’s IP address when configuring it instead of the loopback network interface. Here’s an example of a PXC cluster composed initially of three nodes: 192.168.70.1, 192.168.70.2, and 192.168.70.3. I’ve added a 4th node running on the server that is already hosting the 3rd – the wsrep_cluster_address line looks as follows after the changes:

wsrep_cluster_address = gcomm://192.168.70.1,192.168.70.2,192.168.70.3:4567,192.168.70.3:5020

 

Additional resources

We have a documentation page on “How to setup 3 node cluster on single box” that contains more details of what I’ve covered above, with a slightly different approach.

 

The post Percona XtraDB Cluster: How to run a 2-node cluster on a single server appeared first on MySQL Performance Blog.

Autumn: A season of MySQL-related conferences. Here’s my list

October 20, 2014 - 7:58am

Autumn is a season of MySQL-related conferences and I’m about to hit the road to speak and attend quite a  few of them.

This week I’ll participate in All Things Open, a local conference for me here in Raleigh, N.C. and therefore one I do not have to travel for. All Things Open explores open source, open tech and the open web in the enterprise. I’ll be speaking on SSDs for Databases at 3:15 p.m. on Thursday, Oct. 23 and I’ll also be participating in a book signing for the High Performance MySQL Book at 11:45 p.m. at the “Meet the Speaker” table. We are also proud to be sponsor of this show so please stop by and say “Hi” at our booth in the expo hall.

Following this show I go to Moscow, Russia for the Highload++ conference. This is a wonderful show for people interested in high-performance solutions for Internet applications and I attend almost every year. It has a great lineup of speakers from leading Russian companies as well as many top international speakers covering a lot of diverse technologies. I have 3 talks at this show, on application architecture, on using indexes in MySQL, and on SSD and flash storage for databases. I’m looking forward to reconnecting with my many Russian friends at this show.

From Highload I go directly to Percona Live London 2014 (Nov. 3-4) which is the show we’re putting together – which of course means it is filled with great in-depth information about MySQL and its variants. I think this year we have a good balance of talks from MySQL users such as Facebook, Github, Booking.com, Ebay, Spil Games, IE Domain registry as well as vendors with in-depth information about products and having experiences with many customer environments – MySQL @ Oracle, HP, HGST, Percona, MariaDB, Pythian, Codership, Continuent, Tokutek, FromDual, OlinData. It looks like it is going to be a great show (though of course I’m biased) so do not forget to get registered if you have not already. (On Twitter use hashtag #PerconaLive)

The show I’m sorry to miss is the OpenStack Paris Summit. Even though it is so close to London, the additional visa logistics make it unfeasible for me to visit. There are going to be a fair number of Perconians at the show, though. Our guys will be speaking about a MySQL and OpenStack deep dive as well as Percona Server features for OpenStack and Trove Ops. We’re also exhibiting at this show so please stop by our booth and say “hi.”

Finally there is AWS re:Invent in Las Vegas Nov. 11-14. I have not submitted any talks for this one but I’ll drop in for a day to check it out. We’re also exhibiting at this show so if you’re around please stop by and say “hi.”

This is going to be quite a busy month with a lot of events! There are actually more where we’re speaking or attending. If you’re interested in the events we’re participating in, there is a page on our website that tells you just that! I also invite you to submit papers to speak at the new OpenStack Live 2015 conference April 13-14, which runs parallel to the annual Percona Live MySQL Conference and Expo 2015 April 13-16 – both at the Hyatt Regency Santa Clara & The Santa Clara Convention Center in Silicon Valley.

The post Autumn: A season of MySQL-related conferences. Here’s my list appeared first on MySQL Performance Blog.

Innodb transaction history often hides dangerous ‘debt’

October 17, 2014 - 7:02am

In many write-intensive workloads on the InnoDB/XtraDB storage engines you may see hidden and dangerous “debt” being accumulated – unpurged transaction “history” which, if not kept in check, will over time cause severe performance regressions or will take up all free space and cause an outage. Let’s talk about where it comes from and what you can do to avoid running into trouble.

Technical Background: InnoDB is an MVCC engine which means it keeps multiple versions of the rows in the database, and when rows are deleted or updated they are not immediately removed from the database but kept for some time – until they can be removed. For a majority of OLTP workloads they can be removed seconds after the change actually took place. In some cases though they might need to be kept for a long period of time – if there are some old transactions running in the system that might still need to look at an old database state. As of MySQL 5.6 Innodb has one or several “purge threads” which remove the old data that can be removed, though they might not be doing it fast enough for workloads with very intensive writes.

Does it really happen? I started looking into this problem based on some customer concerns and to my surprise I could very easily get the history to grow rapidly using basic sysbench “update” workload. It is especially easy with default innodb_purge_threads=1 setting but even with innodb_purge_threads=8 it grows rather rapidly.

If we take a look at the purging speed (which comes from the innodb-metrics table) we can see that purge is very much starved by the active concurrent sysbench process and that it speeds up greatly when that process is finished:

Now, to be frank, this is not an easy situation to get into for the majority of workloads with short transactions: when the undo space is kept in memory, purge is able to keep up. If the undo space happens to be evicted from the buffer pool, however, the purge speed can slow down drastically and the system might not be able to keep up anymore. How could that happen? There are 2 common variants…

Long Running Transaction: If you have some long-running transaction, for example mysqldump on a larger table, purging has to pause while that transaction is running and a lot of history will be accumulated. If there is enough IO pressure, a portion of the undo space will be removed from the buffer pool.

MySQL Restart: Even with a modest history length, restarting MySQL will wash the undo space away from memory and will cause purge to be IO bound. This is of course if you’re not using InnoDB Buffer Pool save and reload.

How do you check if your UNDO space is well cached? In Percona Server I can use those commands:

mysql> select sum(curr_size)*16/1024 undo_space_MB from XTRADB_RSEG;
+---------------+
| undo_space_MB |
+---------------+
|     1688.4531 |
+---------------+
1 row in set (0.00 sec)

mysql> select count(*) cnt, count(*)*16/1024 size_MB, page_type from INNODB_BUFFER_PAGE group by page_type;
+--------+-----------+-------------------+
| cnt    | size_MB   | page_type         |
+--------+-----------+-------------------+
|     55 |    0.8594 | EXTENT_DESCRIPTOR |
|      2 |    0.0313 | FILE_SPACE_HEADER |
|    108 |    1.6875 | IBUF_BITMAP       |
|  17186 |  268.5313 | IBUF_INDEX        |
| 352671 | 5510.4844 | INDEX             |
|     69 |    1.0781 | INODE             |
|    128 |    2.0000 | SYSTEM            |
|      1 |    0.0156 | TRX_SYSTEM        |
|   6029 |   94.2031 | UNDO_LOG          |
|  16959 |  264.9844 | UNKNOWN           |
+--------+-----------+-------------------+
10 rows in set (1.65 sec)

This shows that the total undo space size is now 1.7GB, with less than 100MB of it cached in the buffer pool…

Here are a few graphs from running a heavy concurrent query during a lighter workload where purging could otherwise keep up. In this case I used the “injection” benchmark in sysbench, setting --trx-rate to 50% of what the system showed as its peak.

mysql> select count(distinct k+ length(pad)) from sbtest1;
+--------------------------------+
| count(distinct k+ length(pad)) |
+--------------------------------+
|                       30916851 |
+--------------------------------+
1 row in set (28 min 32.38 sec)

What we can see from those graphs is that InnoDB purging initially progresses at a speed fast enough to keep up with the inflow of transactions; however, as we kick off the complicated query, purging stops, and when the query is done the purge speed settles at a new, much lower level where it is not able to keep up with the workload anymore.

Now, there is recognition of this problem and there are options with innodb_max_purge_lag and innodb_max_purge_lag_delay to set the maximum length of the history after reaching which delay will be injected for DML statements up to a specified amount of microseconds.

Unfortunately it is not designed very well for use with real applications. The problems I see with its design are twofold…

Looking at Total History: If you think about it, there are 2 kinds of records within the history – records that can be purged and records that can’t be purged because they are still needed by some active transaction. It is perfectly fine to have a lot of records in history if some long transaction is running – it is not the cause of the problem or overload – while we expect the “purgable history” to be low most of the time.

Looking at the Size rather than the Rate of Change: Even worse, the history blowout prevention looks only at the current value when deciding to inject a delay, not at whether the history is still growing or already shrinking.

Together these mean that the case of long-running transactions concurrent with OLTP workloads is handled very poorly – as soon as the history reaches the specified maximum amount the system kicks into overdrive, delaying all statements to the maximum extent possible, until the history falls back below the threshold. Here is how it looks on graphs:

As you see on the last graph, we got purge_dml_delay_usec spiking to 10000us (the max I set) even though no purging can be done (see the blue line at zero). It only actually starts to work on the history when the heavy query completes, and it really releases the brakes only when the purge is complete. In this case the throughput of the system was reduced by more than 5 times while the delay was active – which would not work for most real-world systems.

Design Thoughts: So what would I change in the purging design and configuration? I would like to see a better default configuration that includes multiple purge threads and an (improved) purge delay. I would find some way to measure not only the history size but the purgable history size, and base the purge delay on that. I would also base it on the rate of change rather than a threshold – injecting just enough delay so the history is gradually shrinking. Basing it on the undo space size instead of the number of transactions (which can vary in size) might also be more practical and easier to auto-tune. We could probably also do better in terms of undo space caching – similar to the insert buffer, I’d like to keep it in memory up to, say, 10% of the buffer pool size, as removing from the cache something you know you will need very soon is bad business – and consider whether some form of read-ahead could work to pre-read the undo space that is needed. Right now I’ve tested and neither linear nor random read-ahead seems to help pick it up from disk with less random IO.

Practical Thoughts: Whatever improvements to purging we get in the future, we have MySQL and Percona Server 5.6 systems to run for some years to come. So what practical steps can we take to manage purge history better?

Monitor: Make sure you are monitoring and graphing innodb_history_list_length. If you use large transactions, set alerts pretty high but do not leave it unchecked.
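For example, a quick way to check the current value (a sketch: Percona Server exposes it as a status variable, while on stock MySQL 5.6 the same counter is available from INNODB_METRICS):

mysql> SHOW GLOBAL STATUS LIKE 'Innodb_history_list_length';   -- Percona Server / XtraDB
mysql> SELECT NAME, `COUNT` FROM information_schema.INNODB_METRICS
    ->  WHERE NAME = 'trx_rseg_history_len';                   -- stock MySQL 5.6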

Configure: Set innodb_purge_threads=8 or some other value if you have a write-intensive workload. Consider experimenting with innodb_max_purge_lag and innodb_max_purge_lag_delay, but be careful: as currently designed they can really bring the server to its knees. You may prefer to use them interactively instead, changing them as run-time options if you spot the history list growing unchecked, balancing current workload demands against the resources allocated to purging.
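A rough my.cnf sketch of that advice (the numbers are examples to adapt, not recommendations):

[mysqld]
innodb_purge_threads       = 8
# inject delay into DML once the history list exceeds this many records -- use with care
innodb_max_purge_lag       = 1000000
innodb_max_purge_lag_delay = 10000     # cap the injected delay at 10000 microseconds

# the two lag options are dynamic, so they can also be applied interactively:
# mysql> SET GLOBAL innodb_max_purge_lag = 1000000;
# mysql> SET GLOBAL innodb_max_purge_lag_delay = 10000;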

Let it purge before shutdown: In many cases I find purge performance much worse after I restart the MySQL server because of caching. So a good approach may be to remove the workload from the MySQL server before shutting it down, let the purge of outstanding history complete, and only then shut it down. If the server has crashed, you might consider letting it complete purging before routing traffic back to it.

Use InnoDB Buffer Pool Preload: Use innodb_buffer_pool_dump_at_shutdown=on and innodb_buffer_pool_load_at_startup=on to ensure the undo space is preloaded back into the buffer pool on startup.

P.S. If you wonder where the graphs I have used came from: they are from Percona Cloud Tools, a very convenient way to run analyses like these, with access to all MySQL status variables, InnoDB metrics, tons of OS metrics and more.

The post Innodb transaction history often hides dangerous ‘debt’ appeared first on MySQL Performance Blog.

Percona Toolkit for MySQL with MySQL-SSL Connections

October 16, 2014 - 6:06am

I recently had a client ask me how to use Percona Toolkit tools with an SSL connection to MySQL (MySQL-SSL). SSL connections aren’t widely used in MySQL due to most installations being within an internal network. Still, there are cases where you could be accessing MySQL over public internet or even over a public “private” network (ex: WAN between two colo datacenters). In order to keep packet sniffers at bay, the connection to MySQL should be encrypted.

If you are connecting to Amazon RDS from home or office (i.e., not within the AWS network) you had better be encrypted!

As there is already a MySQL Performance Blog post on how to set up MySQL SSL connections, we can skip that and dive right in.

As you probably know, the mysql client can read multiple configuration files, the primary one being /etc/my.cnf. You probably also know that the client reads a config file in your $HOME directory: .my.cnf (that's dot-my-dot-cnf). It is inside this file that we can set parameters for our shell-user account when connecting to MySQL hosts.

Percona Toolkit uses Perl's DBI with the DBD::mysql driver to make connections to MySQL hosts. This driver is linked against the libmysqlclient C library, which is responsible for reading and parsing the global config file as well as your $HOME config file. Let's set some options here that are not directly available in the toolkit scripts. Using $MY_FAVORITE_EDITOR, edit your $HOME/.my.cnf as such:

[client]
user = myuser
password = foobar
ssl-ca = /Users/drmac/ca-cert.pem

You must use the absolute path to the CA file. Relative paths won’t cut it:

ERROR 2026 (HY000): SSL connection error: SSL_CTX_set_default_verify_paths failed

Test your connection first using the mysql client:

asura:~ drmac$ mysql -h 74.13.19.17 -e "SHOW STATUS LIKE 'Ssl_cipher'"
+---------------+--------------------+
| Variable_name | Value              |
+---------------+--------------------+
| Ssl_cipher    | DHE-RSA-AES256-SHA |
+---------------+--------------------+

Excellent! Now we can use any Percona Toolkit script and connect via SSL:

asura:~ drmac$ pt-table-checksum -h 74.13.19.17 -d foo -t zipcodes
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
10-13T14:10:02      0      0    45358       7       0   5.959 foo.myzipcodes

Sweet!

Unfortunately, Percona Toolkit scripts are hard-coded to read the [client] section of your .my.cnf. If you don't want to overwrite any existing configuration that may be present, you can create a new configuration file and pass it to any toolkit script using -F. Again, relative paths won't work here; use the absolute path, even if you are in the same directory.

asura:~ drmac$ cp .my.cnf mytestconfig.cnf
asura:~ drmac$ rm .my.cnf
asura:~ drmac$ pt-table-checksum -h 74.13.19.17 -d foo -t zipcodes -F /Users/drmac/mytestconfig.cnf

Now you can continue using our awesome tools in a secure manner.

Cheers!
-Matthew

The post Percona Toolkit for MySQL with MySQL-SSL Connections appeared first on MySQL Performance Blog.

How to close POODLE SSLv3 security flaw (CVE-2014-3566)

October 15, 2014 - 10:55am
Padding Oracle On Downgraded Legacy Encryption

First off, the naming “convention” as of late for security issues has been terrible. The newest vulnerability (CVE­-2014-3566) is nicknamed POODLE, which at least is an acronym and as per the header above has some meaning.

The summary of this issue is that it is much the same as the earlier B.E.A.S.T. (Browser Exploit Against SSL/TLS); however, there is no known mitigation method in this case other than entirely disabling SSLv3 support. In short, an attacker has a vector by which they can retrieve the plaintext from your encrypted streams.

So let's talk mitigation. The Mozilla Security Wiki's Server Side TLS page has for some time made strict recommendations of ciphers and protocols, and is certainly worth your attention.

Apache

Disable SSLv2 and SSLv3 in your Apache SSL configuration by setting:
SSLProtocol all -SSLv2 -SSLv3

Nginx

Allow support only for TLS in Nginx with the following:
ssl_protocols TLSv1 TLSv1.1 TLSv1.2;

MySQL

It is worth noting that unless you deploy the sha256_password plugin for MySQL 5.6, the authentication handshake is completed BEFORE the SSL/TLS connection is negotiated; therefore this attack vector only becomes an issue for valid logins which already have access to the data in the stream anyway. (Where sha256_password is deployed, there is the option to have authentication take place over SSL/TLS.)

This is where things get far more interesting: unlike Apache and Nginx, there is no way to allow or disallow entire protocols of the SSL/TLS spec within MySQL; there is, however, the ability to specify the cipher spec to be used for SSL communication.

As such, to remove SSLv3 support from MySQL you need only ensure that none of the SSLv3 ciphers are in use within your configuration.

As per the information in this bug, you can find a list of SSLv3 ciphers by simply running:
openssl ciphers -v 'DEFAULT' | awk '/SSLv3 Kx=(RSA|DH|DH(512))/ { print $1 }'
DHE-RSA-AES256-SHA
DHE-DSS-AES256-SHA
DHE-RSA-CAMELLIA256-SHA
DHE-DSS-CAMELLIA256-SHA
AES256-SHA
CAMELLIA256-SHA
EDH-RSA-DES-CBC3-SHA
EDH-DSS-DES-CBC3-SHA
DES-CBC3-SHA
DHE-RSA-AES128-SHA
DHE-DSS-AES128-SHA
DHE-RSA-SEED-SHA
DHE-DSS-SEED-SHA
DHE-RSA-CAMELLIA128-SHA
DHE-DSS-CAMELLIA128-SHA
AES128-SHA
SEED-SHA
CAMELLIA128-SHA
RC4-SHA
RC4-MD5
EDH-RSA-DES-CBC-SHA
EDH-DSS-DES-CBC-SHA
DES-CBC-SHA
EXP-EDH-RSA-DES-CBC-SHA
EXP-EDH-DSS-DES-CBC-SHA
EXP-DES-CBC-SHA
EXP-RC2-CBC-MD5
EXP-RC4-MD5

Removing the above from your ssl-cipher configuration should disable SSLv3 support; of course, ensuring your MySQL service is NOT generally accessible is by far one of the most important steps you can take in securing your MySQL deployment against CVE-2014-3566.
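For illustration, here is a my.cnf sketch that pins the server to one explicitly chosen non-SSLv3 cipher. The paths and the cipher name are examples only, and as the script later in this post shows, builds linked against yaSSL may not support any non-SSLv3 cipher at all, so verify against your own build before relying on this:

[mysqld]
ssl-ca     = /etc/mysql/certs/ca-cert.pem
ssl-cert   = /etc/mysql/certs/server-cert.pem
ssl-key    = /etc/mysql/certs/server-key.pem
ssl-cipher = AES128-GCM-SHA256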

You can read more about POODLE here.

The following script will help to identify support for any non-SSLv3 ciphers; unfortunately, in my limited testing I have yet to find a supported non-SSLv3 cipher.

mysql -se "SHOW STATUS LIKE 'Ssl_cipher_list'" | sed 's/:/\n/g' | sed 's/Ssl_cipher_list\s\s//g' |
while read sspec;
do SPEC=`openssl ciphers -v "$sspec" 2>/dev/null | grep -v SSLv3 | awk '{print $1}'`;
[[ "$sspec" == "$SPEC" ]] && mysql --ssl-cipher=$sspec -e QUIT 2>/dev/null && echo "$sspec OK";
done

The post How to close POODLE SSLv3 security flaw (CVE-2014-3566) appeared first on MySQL Performance Blog.

Rackspace doubling-down on OpenStack Trove and Percona Server

October 15, 2014 - 12:00am

Founded in 1998, Rackspace has evolved over the years to address the way customers are using data – and more specifically, databases. The San Antonio-based company is fueling the adoption of cloud computing among organizations large and small.

Today Rackspace is doubling down on open source database technologies. Why? Because that’s where the industry is heading, according to Sean Anderson, Manager of Data Services at Rackspace. The company, he said, created a separate business unit of 100+ employees focused solely on database workloads.

The key technologies under the hood include both relational databases (e.g., MySQL, Percona Server, and MariaDB) and NoSQL databases (e.g., MongoDB, Redis, and Apache Hadoop).

Last July Rackspace added support for Percona Server and MariaDB to their Cloud Databases DBaaS (Database-as-a-Service) product, primarily at the request of application developers who had been requesting more open source database support options.

Matt Griffin, Percona director of product management, and I recently sat down with Sean and his colleague Neha Verma, product manager of Cloud Databases. Our discussion focused on the shift to DBaaS as well as what to expect in the future from Rackspace in terms of Cloud Databases, OpenStack Trove and more.

* * *

Matt: Why did you expand the Cloud Databases product this past summer?
Sean: We launched Cloud Databases about a year and a half ago. Since then we've rolled out feature after feature (backups, monitoring, configuration management, etc.) focused on simplifying our customers' lives; this, backed by Fanatical Support, has made the product easier to use and more production-ready than ever. We understand that features aren't enough, so in addition to all the features we have also made significant improvements to the hardware and network infrastructure. All this means that we've been very busy not just expanding the offering, but also making it simpler to use, more complete and more scalable.

Our vision is to offer a robust platform with the most popular Big Data, SQL, and NoSQL databases on dedicated, bare metal, and public cloud infrastructure.

Matt: What type of customer is your Cloud Databases offering aimed at?
Sean: Currently we have a variety of customers running multiple Cloud Database instances, ranging from customers running a two-month marketing campaign to customers running web applications and ecommerce applications with highly transactional database workloads. Our customers prefer the simplicity and reliability of the service, which allows them to focus on their business and not worry about the heavy lifting associated with scaling and managing databases.

Matt: How is your Cloud Databases offering backed-up?
Neha: We use Percona XtraBackup to perform a hot copy of all databases on an instance and then stream the backups to Cloud Files for storage. A customer can restore a backup to a new instance at any time. Percona XtraBackup is the only option we offer customers right now.

Tom: In terms of security, how do you address customer concerns? Are cloud-based open source databases more secure?
Sean: Data security concerns are at an all-time high and we have a number of upcoming features that continue to address those concerns. Today we offer a number of unique features; specifically, Cloud Databases can only be accessed on the private network, so the database can only be reached by systems on your private network. Additionally, we support SSL for communication between the user application and the database instance so that any data transfer is encrypted in transit. These features, along with the built-in user controls and authentication mechanisms, go a long way toward addressing customers' security concerns. Ultimately, cloud-based open source databases are no more or less secure than any other database; security is about more than features, it is about the process and people that build and manage your database, and we have those more than covered.

Matt: Is this for production applications or pre-production?
Sean: It’s very much production capable. While there’s a perception that this type of offering would only fit for use cases around test or dev, the truth is we are running hundreds of very large, fully managed instances of MySQL on the cloud. We don’t make any delineation between production or pre-production. However, we’re definitely seeing more and more production workloads come onto the service as people are getting educated on the development work that we’ve done around adding these new features. Replication and monitoring are the two most popular right now.

Matt: How are people accessing and using it?
Sean: A majority of our users access the database via the Control Panel, the API or a command-line utility.

Matt: Since the launch, how has the reaction been?
Sean: The reaction from the press standpoint has been very positive. When we talk with industry analysts they see our commitment to open source and where we are going with this.

Tom: How committed is Rackspace to OpenStack?
Sean: We all live in OpenStack. We have tons of Rackers heading to the upcoming OpenStack Paris Summit in November. We’re looking forward to many years of contributing to the OpenStack community.

Tom: Last April, Rackspace hosted several sessions on OpenStack and Trove at the Percona Live MySQL Conference and Expo 2014 in Santa Clara, Calif. What are you looking forward to most at Percona Live 2015?
Sean: For us, Percona Live is about listening to the MySQL community. It’s our best opportunity each year to actually setup shop and get to learn what’s top of mind for them. We then can take that information and develop more towards that direction.

Tom: And as you know we're also launching "OpenStack Live" to run parallel to the Percona Live MySQL conference. OpenStack Live 2015 runs April 13-14 and will focus on the essential elements of making OpenStack work better, with emphasis on the critical role of MySQL and the value of Trove. I look forward to hearing the latest news from Rackspace at both events.

Thanks Sean and Neha for speaking with us and I look forward to seeing you this coming April in Santa Clara at Percona Live and OpenStack Live!

On a related note, I’ll also be attending Percona Live London (Nov. 3-4) where we’ll have sessions on OpenStack Trove and everything MySQL. If you plan on attending, please join me at the 2014 MySQL Community Dinner (pay-your-own-way) on Nov. 3. (Register here to reserve your spot at the Community Dinner because space will be limited. You do not need to attend Percona Live London to join the dinner).

The post Rackspace doubling-down on OpenStack Trove and Percona Server appeared first on MySQL Performance Blog.

Recover orphaned InnoDB partition tablespaces in MySQL

October 14, 2014 - 8:25am

A few months back, Michael wrote about reconnecting orphaned *.ibd files using MySQL 5.6. I will show you the same procedure, this time for partitioned tables. An InnoDB partition is also a self-contained tablespace, so you can use the same method described in the previous post.

To begin with, I have an example table with a few orphaned partitions and we will reconnect each partition one by one to the original table.

mysql [localhost] {msandbox} (recovery) > SHOW CREATE TABLE t1 G *************************** 1. row *************************** Table: t1 Create Table: CREATE TABLE `t1` ( [...] KEY `h_date` (`h_date`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 /*!50100 PARTITION BY RANGE (year(h_date)) (PARTITION p0 VALUES LESS THAN (2006) ENGINE = InnoDB, PARTITION p1 VALUES LESS THAN (2010) ENGINE = InnoDB, PARTITION px VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */ 1 row in set (0.00 sec) mysql [localhost] {msandbox} (recovery) > SELECT COUNT(*) FROM t1; +----------+ | COUNT(*) | +----------+ | 0 | +----------+ 1 row in set (0.00 sec) -rw-rw----. 1 revin revin 176M Oct 8 08:41 t1#P#p0.ibd -rw-rw----. 1 revin revin 612M Oct 8 08:41 t1#P#p1.ibd -rw-rw----. 1 revin revin 932M Oct 8 08:42 t1#P#px.ibd

The first step is to create a dummy table and remove partitioning so that we can reattach individual partitions to this table.

mysql [localhost] {msandbox} (recovery) > CREATE TABLE t1_t LIKE t1; Query OK, 0 rows affected (0.02 sec) mysql [localhost] {msandbox} (recovery) > ALTER TABLE t1_t REMOVE PARTITIONING; Query OK, 0 rows affected (0.01 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql [localhost] {msandbox} (recovery) > ALTER TABLE t1_t DISCARD TABLESPACE; Query OK, 0 rows affected (0.05 sec)
Once the tablespace for our dummy table has been discarded, we copy one of the partitions to take the place of the dummy table's tablespace. For example, we copy t1#P#p0.ibd as t1_t.ibd into the MySQL data directory, of course taking care of the file permissions afterward (a rough sketch of this step follows below). The next step is to import the tablespace into the dummy table.
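The copy step might look roughly like this from the shell (the paths and the mysql:mysql owner are assumptions; adjust them to your own datadir and layout):

$ cp /path/to/orphaned/t1#P#p0.ibd /var/lib/mysql/recovery/t1_t.ibd
$ chown mysql:mysql /var/lib/mysql/recovery/t1_t.ibd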

mysql [localhost] {msandbox} (recovery) > ALTER TABLE t1_t IMPORT TABLESPACE; Query OK, 0 rows affected, 1 warning (7.34 sec)
And for the secret sauce, we will exchange our dummy table recently imported tablespace to replace the target partition in our original table.

mysql [localhost] {msandbox} (recovery) > ALTER TABLE t1 EXCHANGE PARTITION px WITH TABLE t1_t; Query OK, 0 rows affected (6.42 sec) mysql [localhost] {msandbox} (recovery) > SELECT COUNT(*) FROM t1; +----------+ | COUNT(*) | +----------+ | 8523686 | +----------+ 1 row in set (2.50 sec)
You can do the same with subpartitions, too! Here's my slightly different table with subpartitions, where I reconnect one of the orphaned tablespaces, t2#P#px#SP#pxsp1.ibd.

mysql [localhost] {msandbox} (recovery) > SHOW CREATE TABLE t2 G *************************** 1. row *************************** Table: t2 Create Table: CREATE TABLE `t2` ( [...] KEY `h_date` (`h_date`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 /*!50100 PARTITION BY RANGE (year(h_date)) SUBPARTITION BY HASH (u_id) SUBPARTITIONS 2 (PARTITION p0 VALUES LESS THAN (2006) ENGINE = InnoDB, PARTITION p1 VALUES LESS THAN (2010) ENGINE = InnoDB, PARTITION px VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */ 1 row in set (0.00 sec) mysql [localhost] {msandbox} (recovery) > SELECT COUNT(*) FROM t2; +----------+ | COUNT(*) | +----------+ | 0 | +----------+ 1 row in set (0.94 sec) -rw-rw----. 1 revin revin 92M Oct 8 08:44 t2#P#p0#SP#p0sp0.ibd -rw-rw----. 1 revin revin 92M Oct 8 08:44 t2#P#p0#SP#p0sp1.ibd -rw-rw----. 1 revin revin 304M Oct 8 08:44 t2#P#p1#SP#p1sp0.ibd -rw-rw----. 1 revin revin 316M Oct 8 08:44 t2#P#p1#SP#p1sp1.ibd -rw-rw----. 1 revin revin 480M Oct 8 08:45 t2#P#px#SP#pxsp0.ibd -rw-rw----. 1 revin revin 460M Oct 8 08:45 t2#P#px#SP#pxsp1.ibd mysql [localhost] {msandbox} (recovery) > CREATE TABLE t2_t LIKE t2; Query OK, 0 rows affected (0.02 sec) mysql [localhost] {msandbox} (recovery) > ALTER TABLE t2_t REMOVE PARTITIONING; Query OK, 0 rows affected (0.02 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql [localhost] {msandbox} (recovery) > ALTER TABLE t2_t DISCARD TABLESPACE; Query OK, 0 rows affected (0.04 sec)
Once again, after copying t2#P#px#SP#pxsp1.ibd to replace t2_t.ibd in the MySQL data directory and setting the correct permissions, we can import it into our dummy table and exchange it into our original table. In this case, on a table with subpartitions, a tablespace is a combined unit of partition and subpartition; hence in our partition name above, P#px is the main partition and SP#pxsp1 the subpartition name. For our exchange command below, we will only need the latter portion.

mysql [localhost] {msandbox} (recovery) > ALTER TABLE t2_t IMPORT TABLESPACE; Query OK, 0 rows affected, 1 warning (2.49 sec) mysql [localhost] {msandbox} (recovery) > ALTER TABLE t2 EXCHANGE PARTITION pxsp1 WITH TABLE t2_t; Query OK, 0 rows affected (3.11 sec) mysql [localhost] {msandbox} (recovery) > SELECT COUNT(*) FROM t2; +----------+ | COUNT(*) | +----------+ | 4546036 | +----------+ 1 row in set (0.94 sec)
But wait, there's more: did you know that in MySQL 5.7 you can take the full shortcut and directly import all partitions back to the original table? Sweet! A quick example below on MySQL 5.7.5: I created the same t2 table as above, ran DISCARD TABLESPACE, copied the partition tablespaces from my test 5.6 instance, and ran IMPORT TABLESPACE. And done!
mysql [localhost] {msandbox} (test) > SELECT COUNT(*) FROM t2; +----------+ | COUNT(*) | +----------+ | 0 | +----------+ 1 row in set (0.00 sec) mysql [localhost] {msandbox} (test) > ALTER TABLE t2 DISCARD TABLESPACE; Query OK, 0 rows affected (0.02 sec) mysql [localhost] {msandbox} (test) > ! cp -v /sbx/msb/msb_5_6_210/data/test/t2#P#* /sbx/msb/msb_5_7_5/data/test/ `/sbx/msb/msb_5_6_210/data/test/t2#P#p0#SP#p0sp0.ibd' -> `/sbx/msb/msb_5_7_5/data/test/t2#P#p0#SP#p0sp0.ibd' `/sbx/msb/msb_5_6_210/data/test/t2#P#p0#SP#p0sp1.ibd' -> `/sbx/msb/msb_5_7_5/data/test/t2#P#p0#SP#p0sp1.ibd' `/sbx/msb/msb_5_6_210/data/test/t2#P#p1#SP#p1sp0.ibd' -> `/sbx/msb/msb_5_7_5/data/test/t2#P#p1#SP#p1sp0.ibd' `/sbx/msb/msb_5_6_210/data/test/t2#P#p1#SP#p1sp1.ibd' -> `/sbx/msb/msb_5_7_5/data/test/t2#P#p1#SP#p1sp1.ibd' `/sbx/msb/msb_5_6_210/data/test/t2#P#px#SP#pxsp0.ibd' -> `/sbx/msb/msb_5_7_5/data/test/t2#P#px#SP#pxsp0.ibd' `/sbx/msb/msb_5_6_210/data/test/t2#P#px#SP#pxsp1.ibd' -> `/sbx/msb/msb_5_7_5/data/test/t2#P#px#SP#pxsp1.ibd' mysql [localhost] {msandbox} (test) > ALTER TABLE t2 IMPORT TABLESPACE; Query OK, 0 rows affected, 6 warnings (11.36 sec) mysql [localhost] {msandbox} (test) > SHOW WARNINGS G *************************** 1. row *************************** Level: Warning Code: 1810 Message: InnoDB: IO Read error: (2, No such file or directory) Error opening './test/t2#P#p0#SP#p0sp0.cfg', will attempt to import without schema verification [...]
 

The post Recover orphaned InnoDB partition tablespaces in MySQL appeared first on MySQL Performance Blog.

How to avoid hash collisions when using MySQL’s CRC32 function

October 13, 2014 - 7:43am

Percona Toolkit’s  pt-table-checksum performs an online replication consistency check by executing checksum queries on the master, which produces different results on replicas that are inconsistent with the master – and the tool pt-table-sync synchronizes data efficiently between MySQL tables.

The tools use CRC32 by default. Other good choices include MD5 and SHA1. If you have installed the FNV_64 user-defined function, pt-table-sync will detect it and prefer to use it, because it is much faster than the built-ins. You can also use MURMUR_HASH if you've installed that user-defined function. Both of these are distributed with Maatkit. For details please see the tool's documentation.

Below are test cases similar to what you might have encountered. By using a table checksum we can confirm that two tables are identical, which is useful for verifying that a slave server is in sync with its master. The following test cases with pt-table-checksum and pt-table-sync will help you use the tools more accurately.

For example, in a master-slave setup we have a table with a primary key on column "a" and a unique key on column "b". Here the master and slave tables are not in sync: the tables share two identical rows and differ in two rows. The pt-table-checksum tool should be able to identify the difference between master and slave, and pt-table-sync in this case should sync the tables with two REPLACE queries.

+-----+-----+ +-----+-----+ | a | b | | a | b | +-----+-----+ +-----+-----+ | 2 | 1 | | 2 | 1 | | 1 | 2 | | 1 | 2 | | 4 | 3 | | 3 | 3 | | 3 | 4 | | 4 | 4 | +-----+-----+ +-----+-----+

Case 1:  Non-cryptographic Hash function (CRC32) and the Hash collision.

The source and target tables contain two differing rows, and in the general way of thinking the tools should identify the difference. But the scenarios below explain how the tools can be wrongly used and how to avoid that, making things more consistent and reliable when using the tools in production.

The tools use CRC32 checksums by default, and CRC32 is prone to hash collisions. In the case below, the non-cryptographic function (CRC32) is not able to identify the two distinct values: it generates the same checksum even though the tables contain distinct values.

CREATE TABLE `t1` ( `a` int(11) NOT NULL, `b` int(11) NOT NULL, PRIMARY KEY (`a`), UNIQUE KEY `b` (`b`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Master Slave +-----+-----+ +-----+-----+ | a | b | | a | b | +-----+-----+ +-----+-----+ | 2 | 1 | | 2 | 1 | | 1 | 2 | | 1 | 2 | | 4 | 3 | | 3 | 3 | | 3 | 4 | | 4 | 4 | +-----+-----+ +-----+-----+

Master: [root@localhost mysql]# pt-table-checksum --replicate=percona.checksum --create-replicate-table --databases=db1 --tables=t1 localhost --user=root --password=*** --no-check-binlog-format TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE 09-17T00:59:45 0 0 4 1 0 1.081 db1.t1 Slave: [root@localhost bin]# ./pt-table-sync --print --execute --replicate=percona.checksum --tables db1.t1 --user=root --password=*** --verbose --sync-to-master 192.**.**.** # Syncing via replication h=192.**.**.**,p=...,u=root # DELETE REPLACE INSERT UPDATE ALGORITHM START END EXIT DATABASE.TABLE

Narrowed down to BIT_XOR:

Master: mysql> SELECT BIT_XOR(CAST(CRC32(CONCAT_WS('#', `a`, `b`)) AS UNSIGNED)) FROM `db1`.`t1`; +------------------------------------------------------------+ | BIT_XOR(CAST(CRC32(CONCAT_WS('#', `a`, `b`)) AS UNSIGNED)) | +------------------------------------------------------------+ | 6581445 | +------------------------------------------------------------+ 1 row in set (0.00 sec) Slave: mysql> SELECT BIT_XOR(CAST(CRC32(CONCAT_WS('#', `a`, `b`)) AS UNSIGNED)) FROM `db1`.`t1`; +------------------------------------------------------------+ | BIT_XOR(CAST(CRC32(CONCAT_WS('#', `a`, `b`)) AS UNSIGNED)) | +------------------------------------------------------------+ | 6581445 | +------------------------------------------------------------+ 1 row in set (0.16 sec)
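Why do the checksums match even though the rows differ? CRC32 behaves linearly with respect to XOR for equal-length strings, so the two row pairs that differ between master and slave, (4,3)/(3,4) versus (3,3)/(4,4), XOR to the same value and cancel out inside the BIT_XOR aggregate. You can verify the cancellation by hand with a quick query built the same way the tool builds its row hashes:

mysql> SELECT CRC32(CONCAT_WS('#', 4, 3)) ^ CRC32(CONCAT_WS('#', 3, 4)) AS master_pair,
    ->        CRC32(CONCAT_WS('#', 3, 3)) ^ CRC32(CONCAT_WS('#', 4, 4)) AS slave_pair;

Both columns return the same value, which is exactly why the table-level checksums collide here.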

Case 2: As the tools are not able to identify the difference, let us add a new row to the slave and check whether the tools are able to identify the distinct values. So I am adding a new row (5,5) to the slave.

mysql> insert into db1.t1 values(5,5); Query OK, 1 row affected (0.05 sec) Master Slave +-----+-----+ +-----+-----+ | a | b | | a | b | +-----+-----+ +-----+-----+ | 2 | 1 | | 2 | 1 | | 1 | 2 | | 1 | 2 | | 4 | 3 | | 3 | 3 | | 3 | 4 | | 4 | 4 | +-----+-----+ | 5 | 5 | +-----+-----+

[root@localhost mysql]# pt-table-checksum --replicate=percona.checksum --create-replicate-table --databases=db1 --tables=t1 localhost --user=root --password=*** --no-check-binlog-format TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE 09-17T01:01:13 0 1 4 1 0 1.054 db1.t1 [root@localhost bin]# ./pt-table-sync --print --execute --replicate=percona.checksum --tables db1.t1 --user=root --password=*** --verbose --sync-to-master 192.**.**.** # Syncing via replication h=192.**.**.**,p=...,u=root # DELETE REPLACE INSERT UPDATE ALGORITHM START END EXIT DATABASE.TABLE DELETE FROM `db1`.`t1` WHERE `a`='5' LIMIT 1 /*percona-toolkit src_db:db1 src_tbl:t1 src_dsn:P=3306,h=192.**.**.**. 10,p=...,u=root dst_db:db1 dst_tbl:t1 dst_dsn:h=192.**.**.**,p=...,u=root lock:1 transaction:1 changing_src:percona.checksum replicate:percona.checksum bidirectional:0 pid:5205 user:root host:localhost.localdomain*/; REPLACE INTO `db1`.`t1`(`a`, `b`) VALUES ('3', '4') /*percona-toolkit src_db:db1 src_tbl:t1 src_dsn:P=3306,h=192.**.**.**, p=...,u=root dst_db:db1 dst_tbl:t1 dst_dsn:h=192.**.**.**,p=...,u=root lock:1 transaction:1 changing_src:percona.checksum replicate:percona.checksum bidirectional:0 pid:5205 user:root host:localhost.localdomain*/; REPLACE INTO `db1`.`t1`(`a`, `b`) VALUES ('4', '3') /*percona-toolkit src_db:db1 src_tbl:t1 src_dsn:P=3306,h=192.**.**.**, p=...,u=root dst_db:db1 dst_tbl:t1 dst_dsn:h=192.**.**.**,p=...,u=root lock:1 transaction:1 changing_src:percona.checksum replicate:percona.checksum bidirectional:0 pid:5205 user:root host:localhost.localdomain*/; # 1 2 0 0 Chunk 01:01:43 01:01:43 2 db1.t1

Well, apparently the tools are now able to identify the newly added row on the slave as well as the two other rows that differ.

Case 3: Advantage of Cryptographic Hash functions (Ex: Secure MD5)

As such, let us make the tables as in case 1 and ask the tools to use a cryptographic (secure MD5) hash function instead of the usual non-cryptographic function. The default CRC32 function provides no security due to its simple mathematical structure and is too prone to hash collisions, but MD5 provides a better level of integrity. So let us try with --function=md5 and see the result.

Master Slave +-----+-----+ +-----+-----+ | a | b | | a | b | +-----+-----+ +-----+-----+ | 2 | 1 | | 2 | 1 | | 1 | 2 | | 1 | 2 | | 4 | 3 | | 3 | 3 | | 3 | 4 | | 4 | 4 | +-----+-----+ +-----+-----+

Narrowed down to BIT_XOR:

Master: mysql> SELECT 'test', 't2', '1', NULL, NULL, NULL, COUNT(*) AS cnt, COALESCE(LOWER(CONCAT(LPAD(CONV(BIT_XOR(CAST(CONV(SUBSTRING (@crc, 1, 16), 16, 10) AS UNSIGNED)), 10, 16), 16, '0'), LPAD(CONV(BIT_XOR(CAST(CONV(SUBSTRING(@crc := md5(CONCAT_WS('#', `a`, `b`)) , 17, 16), 16, 10) AS UNSIGNED)), 10, 16), 16, '0'))), 0) AS crc FROM `db1`.`t1`; +------+----+---+------+------+------+-----+----------------------------------+ | test | t2 | 1 | NULL | NULL | NULL | cnt | crc | +------+----+---+------+------+------+-----+----------------------------------+ | test | t2 | 1 | NULL | NULL | NULL | 4 | 000000000000000063f65b71e539df48 | +------+----+---+------+------+------+-----+----------------------------------+ 1 row in set (0.00 sec) Slave: mysql> SELECT 'test', 't2', '1', NULL, NULL, NULL, COUNT(*) AS cnt, COALESCE(LOWER(CONCAT(LPAD(CONV(BIT_XOR(CAST(CONV(SUBSTRING (@crc, 1, 16), 16, 10) AS UNSIGNED)), 10, 16), 16, '0'), LPAD(CONV(BIT_XOR(CAST(CONV(SUBSTRING(@crc := md5(CONCAT_WS('#', `a`, `b`)) , 17, 16), 16, 10) AS UNSIGNED)), 10, 16), 16, '0'))), 0) AS crc FROM `db1`.`t1`; +------+----+---+------+------+------+-----+----------------------------------+ | test | t2 | 1 | NULL | NULL | NULL | cnt | crc | +------+----+---+------+------+------+-----+----------------------------------+ | test | t2 | 1 | NULL | NULL | NULL | 4 | 0000000000000000df024e1a4a32c31f | +------+----+---+------+------+------+-----+----------------------------------+ 1 row in set (0.00 sec)

[root@localhost mysql]# pt-table-checksum --replicate=percona.checksum --create-replicate-table --function=md5 --databases=db1 --tables=t1 localhost --user=root --password=*** --no-check-binlog-format TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE 09-23T23:57:52 0 1 12 1 0 0.292 db1.t1 [root@localhost bin]# ./pt-table-sync --print --execute --replicate=percona.checksum --tables db1.t1 --user=root --password=amma --verbose --function=md5 --sync-to-master 192.***.***.*** # Syncing via replication h=192.168.56.102,p=...,u=root # DELETE REPLACE INSERT UPDATE ALGORITHM START END EXIT DATABASE.TABLE REPLACE INTO `db1`.`t1`(`a`, `b`) VALUES ('3', '4') /*percona-toolkit src_db:db1 src_tbl:t1 src_dsn:P=3306,h=192.168.56.101,p=..., u=root dst_db:db1 dst_tbl:t1 dst_dsn:h=192.***.***.***,p=...,u=root lock:1 transaction:1 changing_src:percona.checksum replicate:percona.checksum bidirectional:0 pid:5608 user:root host:localhost.localdomain*/; REPLACE INTO `db1`.`t1`(`a`, `b`) VALUES ('4', '3') /*percona-toolkit src_db:db1 src_tbl:t1 src_dsn:P=3306,h=192.168.56.101,p=..., u=root dst_db:db1 dst_tbl:t1 dst_dsn:h=192.***.**.***,p=...,u=root lock:1 transaction:1 changing_src:percona.checksum replicate:percona.checksum bidirectional:0 pid:5608 user:root host:localhost.localdomain*/; # 0 2 0 0 Chunk 04:46:04 04:46:04 2 db1.t1

Master Slave +-----+-----+ +-----+-----+ | a | b | | a | b | +-----+-----+ +-----+-----+ | 2 | 1 | | 2 | 1 | | 1 | 2 | | 1 | 2 | | 4 | 3 | | 4 | 3 | | 3 | 4 | | 3 | 4 | +-----+-----+ +-----+-----+

MD5 did the trick and solved the problem. See the BIT_XOR result for MD5 given above: the function is able to identify the distinct values in the tables and produces different crc values for master and slave. MD5 (Message-Digest algorithm 5) is a well-known cryptographic hash function with a 128-bit hash value. MD5 is widely used in security-related applications and is also frequently used to check data integrity, but MD5() and SHA1() are very CPU-intensive, so checksumming is slower if chunk-time is included.

 

The post How to avoid hash collisions when using MySQL’s CRC32 function appeared first on MySQL Performance Blog.

MySQL compression: Compressed and Uncompressed data size

October 10, 2014 - 7:34am

MySQL has information_schema.tables, which contains information such as "data_length" or "avg_row_length." Documentation on this table, however, is quite poor, making the assumption that those fields are self-explanatory; they are not when it comes to tables that employ compression. And this is where inconsistency is born. Let's take a look at the same table, containing some highly compressible data, using different storage engines that support MySQL compression:

TokuDB:

mysql> select * from information_schema.tables where table_schema='test' G *************************** 1. row *************************** TABLE_CATALOG: def TABLE_SCHEMA: test TABLE_NAME: comp TABLE_TYPE: BASE TABLE ENGINE: TokuDB VERSION: 10 ROW_FORMAT: tokudb_zlib TABLE_ROWS: 40960 AVG_ROW_LENGTH: 10003 DATA_LENGTH: 409722880 MAX_DATA_LENGTH: 9223372036854775807 INDEX_LENGTH: 0 DATA_FREE: 421888 AUTO_INCREMENT: NULL CREATE_TIME: 2014-10-10 07:59:05 UPDATE_TIME: 2014-10-10 08:01:20 CHECK_TIME: NULL TABLE_COLLATION: latin1_swedish_ci CHECKSUM: NULL CREATE_OPTIONS: TABLE_COMMENT: 1 row in set (0.00 sec)

Archive:

mysql> select * from information_schema.tables where table_schema='test' G *************************** 1. row *************************** TABLE_CATALOG: def TABLE_SCHEMA: test TABLE_NAME: comp TABLE_TYPE: BASE TABLE ENGINE: ARCHIVE VERSION: 10 ROW_FORMAT: Compressed TABLE_ROWS: 40960 AVG_ROW_LENGTH: 12 DATA_LENGTH: 501651 MAX_DATA_LENGTH: 0 INDEX_LENGTH: 0 DATA_FREE: 0 AUTO_INCREMENT: NULL CREATE_TIME: NULL UPDATE_TIME: 2014-10-10 08:08:24 CHECK_TIME: NULL TABLE_COLLATION: latin1_swedish_ci CHECKSUM: NULL CREATE_OPTIONS: TABLE_COMMENT: 1 row in set (0.01 sec)

InnoDB:

mysql> select * from information_schema.tables where table_schema='test' G *************************** 1. row *************************** TABLE_CATALOG: def TABLE_SCHEMA: test TABLE_NAME: comp TABLE_TYPE: BASE TABLE ENGINE: InnoDB VERSION: 10 ROW_FORMAT: Compressed TABLE_ROWS: 40660 AVG_ROW_LENGTH: 4168 DATA_LENGTH: 169480192 MAX_DATA_LENGTH: 0 INDEX_LENGTH: 0 DATA_FREE: 1572864 AUTO_INCREMENT: NULL CREATE_TIME: 2014-10-10 08:33:22 UPDATE_TIME: NULL CHECK_TIME: NULL TABLE_COLLATION: latin1_swedish_ci CHECKSUM: NULL CREATE_OPTIONS: row_format=COMPRESSED KEY_BLOCK_SIZE=4 TABLE_COMMENT: 1 row in set (0.00 sec)

From this we can see that Archive and InnoDB show the COMPRESSED values for DATA_LENGTH and AVG_ROW_LENGTH, while TokuDB shows the uncompressed ones (as of TokuDB 7.5.1, shipped with Percona Server 5.6.21-69.0).

The problem here is not only the lack of consistency but also that we need to know BOTH numbers. We often need the uncompressed value to understand how much data there really is in the table, and the compressed value to know how much space it takes on disk. Comparing the two also helps us understand the compression ratio (or, on the contrary, the expansion due to storage overhead and indexes).

Looking at the Information_Schema tables available, I can't find any way to see how much uncompressed data is stored in an InnoDB (or Archive) table. The simple trick I can use is running a query along these lines: SELECT SUM(LENGTH(col1)+LENGTH(col2)…) FROM T. This slightly overestimates the length, as it converts numbers and dates to strings, but it is good enough for most purposes.
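For the test table used above, such a query might look like the following (the column names are hypothetical; substitute your real columns, and wrap LENGTH() in IFNULL(..., 0) for nullable columns):

mysql> SELECT SUM(LENGTH(id) + LENGTH(k) + LENGTH(c) + LENGTH(pad)) AS uncompressed_bytes
    ->   FROM test.comp;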

TokuDB, though, while providing uncompressed information in the Information_Schema TABLES table, lets you get the real on-disk storage information from its own information schema tables:

mysql> select * from information_schema.TokuDB_fractal_tree_info where table_schema='test' and table_name='comp' limit 10 G *************************** 1. row *************************** dictionary_name: ./test/comp-main internal_file_name: ./_test_sql_147e_292e_main_2c20c08_1_1b_B_0.tokudb bt_num_blocks_allocated: 125 bt_num_blocks_in_use: 125 bt_size_allocated: 1880088 bt_size_in_use: 1502232 table_schema: test table_name: comp table_dictionary_name: main *************************** 2. row *************************** dictionary_name: ./test/comp-status internal_file_name: ./_test_sql_147e_292e_status_2c20bdd_1_1b.tokudb bt_num_blocks_allocated: 4 bt_num_blocks_in_use: 4 bt_size_allocated: 8704 bt_size_in_use: 512 table_schema: test table_name: comp table_dictionary_name: status 2 rows in set (0.01 sec)

This shows us that this table really consists of 2 files, each "bt_size_allocated" bytes in length. These numbers are close to what you will see on disk, but not exactly. I see these files taking 1886208 and 16384 bytes, respectively. I wish there were an exact length available to query, so we would not have to wonder how much difference there is and whether it can get large enough in some cases to matter.

If you're just looking for information about how much space has been allocated and how much is currently used for a given TokuDB table, you can use a query like this:

mysql> select sum(bt_size_allocated) total, sum(bt_size_in_use) used,
    ->        sum(bt_size_allocated)-sum(bt_size_in_use) free
    ->   from information_schema.TokuDB_fractal_tree_info
    ->  where table_schema='test' and table_name='comp';
+---------+---------+--------+
| total   | used    | free   |
+---------+---------+--------+
| 1888792 | 1502744 | 386048 |
+---------+---------+--------+
1 row in set (0.01 sec)

To sum it up: there is some room to improve how MySQL reports compressed and uncompressed data length, both in terms of consistency and in terms of the information available. It is great to see that TokuDB found a way to report both compressed and uncompressed data size information, yet I would really love to see the actual size on disk that a given table is taking. And it would be great if there were some consistent way to query it from inside MySQL without having to go to the file-system level and deal with the different ways different storage engines place data on the file system. This becomes especially important with the coming of full tablespace support in MySQL 5.7, which will make it hard to find all matching files for a table on the filesystem.

The post MySQL compression: Compressed and Uncompressed data size appeared first on MySQL Performance Blog.

MySQL Replication: ‘Got fatal error 1236′ causes and cures

October 8, 2014 - 12:00am

MySQL replication is a core process for maintaining multiple copies of data, and replication is a very important aspect of database administration. In order to synchronize data between master and slaves you need to make sure that data transfers smoothly, and to do so you need to act promptly on replication errors to keep data synchronization running. Here on the Percona Support team, we often help customers with broken-replication issues. In this post I'll highlight the most critical replication error code, 1236, along with its causes and cures. MySQL replication error "Got fatal error 1236" can be triggered by multiple causes and I will try to cover all of them.

Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: ‘log event entry exceeded max_allowed_packet; Increase max_allowed_packet on master; the first event ‘binlog.000201′ at 5480571

This is a typical error on the slave server(s). It reflects a problem with the max_allowed_packet size. max_allowed_packet limits the size of a single SQL statement sent to the MySQL server as a binary log event from master to slave. This error usually occurs when you have different max_allowed_packet sizes on the master and slave (i.e. the master's max_allowed_packet is greater than the slave's). When the MySQL master server tries to send a packet bigger than what is defined on the slave server, the slave fails to accept it, hence the error. In order to alleviate this issue, please make sure to have the same value for max_allowed_packet on both slave and master. You can read more about max_allowed_packet here.

This error usually occurs when updating a huge number of rows on the master in a statement that does not fit into the slave's max_allowed_packet size because the slave's max_allowed_packet is lower than the master's. This typically happens with "LOAD DATA INFILE" or "INSERT .. SELECT" queries. In my experience, it can also be caused by application logic that generates a huge INSERT with junk data. Take into account that a new variable was introduced in MySQL 5.6.6 and later, slave_max_allowed_packet, which controls the maximum packet size for the replication threads. It overrides the max_allowed_packet variable on the slave and its default value is 1 GB. In the post "max_allowed_packet and binary log corruption in MySQL," my colleague Miguel Angel Nieto explains this error in detail.
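A quick way to compare and align the setting on both sides (64MB below is just an example value; also persist it in my.cnf so it survives a restart):

mysql> SHOW GLOBAL VARIABLES LIKE 'max_allowed_packet';   -- run on both master and slave
mysql> SET GLOBAL max_allowed_packet = 64 * 1024 * 1024;  -- example: 64MB, use the same value on both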

Got fatal error 1236 from master when reading data from binary log: ‘Could not find first log file name in binary log index file’

This error occurs when a binary log that the slave server requires for replication no longer exists on the master database server. In one scenario, your slave server is stopped for some reason for a few hours or days, and when you resume replication on the slave it fails with the above error.

When you investigate, you will find that the master server no longer has the binary logs which the slave server needs to pull in order to synchronize data. Possible reasons for this include: the master server expired binary logs via the system variable expire_logs_days, someone manually deleted binary logs from the master via the PURGE BINARY LOGS command or via 'rm -f', or maybe you have some cronjob which archives older binary logs to reclaim disk space, etc. So, make sure the required binary logs always exist on the master server; you can update your procedures to keep the binary logs the slave server requires by monitoring the "Relay_Master_Log_File" field from SHOW SLAVE STATUS output. Moreover, if you have set expire_logs_days in my.cnf, old binlogs expire automatically and are removed. This means that when MySQL opens a new binlog file, it checks the older binlogs and purges any that are older than the value of expire_logs_days (in days). Percona Server added a feature to expire logs based on the total number of binlog files instead of their age. So in that configuration, if you get a spike of traffic, it could cause binlogs to disappear sooner than you expect. For more information check Restricting the number of binlog files.
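To see how far apart the two sides are, compare what the slave still needs against what the master still has:

-- On the slave: which binlog does replication still need?
mysql> SHOW SLAVE STATUS\G        -- check Master_Log_File and Relay_Master_Log_File
-- On the master: which binlogs are still available?
mysql> SHOW BINARY LOGS;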

In order to resolve this problem, the only clean solution I can think of is to re-create the slave server from a master server backup or from another slave in the replication topology.

– Got fatal error 1236 from master when reading data from binary log: ‘binlog truncated in the middle of event; consider out of disk space on master; the first event ‘mysql-bin.000525′ at 175770780, the last event read from ‘/data/mysql/repl/mysql-bin.000525′ at 175770780, the last byte read from ‘/data/mysql/repl/mysql-bin.000525′ at 175771648.’

Usually, this is caused by sync_binlog <> 1 on the master server, which means binary log events may not have been synchronized to disk. There might be a committed SQL statement or row change (depending on your replication format) on the master that did not make it to the slave because the event was truncated. The solution is to move the slave thread to the next available binary log and initialize it at the first available position, as below:

mysql> CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000526', MASTER_LOG_POS=4;

– [ERROR] Slave I/O: Got fatal error 1236 from master when reading data from binary log: ‘Client requested master to start replication from impossible position; the first event ‘mysql-bin.010711′ at 55212580, the last event read from ‘/var/lib/mysql/log/mysql-bin.000711′ at 4, the last byte read from ‘/var/lib/mysql/log/mysql-bin.010711′ at 4.’, Error_code: 1236

I suspect the master server crashed or rebooted and hence binary log events were not synchronized to disk. This usually happens when sync_binlog != 1 on the master. You can investigate by inspecting the binary log contents as below:

$ mysqlbinlog --base64-output=decode-rows --verbose --verbose --start-position=55212580 mysql-bin.010711

You will find that this is the last position of the binary log and the end of the binary log file. This issue can usually be fixed by moving the slave to the next binary log. In this case it would be:

mysql> CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000712', MASTER_LOG_POS=4;

This will resume replication.

To avoid corrupted binlogs on the master, enabling sync_binlog=1 on the master helps in most cases. sync_binlog=1 synchronizes the binary log to disk after every commit. sync_binlog makes MySQL perform an fsync on the binary log in addition to the fsync done by InnoDB. As a reminder, it has some cost impact as it synchronizes the write to the binary log on disk after every commit. On the other hand, the overhead of sync_binlog=1 can be very minimal or negligible if the disk subsystem is SSD along with a battery-backed cache (BBU). You can read more about this here in the manual.

sync_binlog is a dynamic option that you can enable on the fly. Here’s how:

mysql-master> SET GLOBAL sync_binlog=1;

To make the change persistent across reboots, you can add this parameter to my.cnf.
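For example, a minimal my.cnf fragment:

[mysqld]
sync_binlog = 1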

As a side note, along with replication fixes, it is always a good idea to make sure your replica is consistent with the master and to validate data between master and slaves. Fortunately, Percona Toolkit has tools for this purpose: pt-table-checksum and pt-table-sync. Before checking for replication consistency, be sure to check the replication environment and then, later, sync any differences.

The post MySQL Replication: ‘Got fatal error 1236′ causes and cures appeared first on MySQL Performance Blog.

Percona Server 5.6.21-69.0 is now available

October 7, 2014 - 9:04am

Percona is glad to announce the release of Percona Server 5.6.21-69.0 on October 7, 2014. Download the latest version from the Percona web site or from the Percona Software Repositories.

Based on MySQL 5.6.21, including all the bug fixes in it, Percona Server 5.6.21-69.0 is the current GA release in the Percona Server 5.6 series. Percona Server is open-source and free. Complete details of this release can be found in the 5.6.21-69.0 milestone on Launchpad.

New Features:

Bugs Fixed:

  • Backup Locks did not guarantee consistent SHOW SLAVE STATUS information with binary log disabled. Bug fixed #1358836.
  • Audit Log Plugin would rotate the audit log in middle of an audit message. Bug fixed #1363370.
  • When the binary log is enabled on a replication slave, SHOW SLAVE STATUS performed under an active BINLOG lock could lead to a deadlock. Bug fixed #1372806.
  • Fixed a memory leak in Metrics for scalability measurement. Bug fixed #1334570.
  • Fixed a memory leak if secure-file-priv option was used with no argument. Bug fixed #1334719.
  • LOCK TABLES FOR BACKUP is now incompatible with LOCK TABLES, FLUSH TABLES WITH READ LOCK, and FLUSH TABLES FOR EXPORT in the same connection. Bug fixed #1360064.

Other bugs fixed: #1361568.

Release notes for Percona Server 5.6.21-69.0 are available in the online documentation. Please report any bugs on the launchpad bug tracker.

The post Percona Server 5.6.21-69.0 is now available appeared first on MySQL Performance Blog.

MySQL ring replication: Why it is a bad option

October 7, 2014 - 8:43am

I’ve recently worked with customers using replication rings with 4+ servers; several servers accepting writes. The idea behind this design is always the same: by having multiple servers, you get high availability and by having multiple writer nodes, you get write scalability. Alas, this is simply not true. Here is why.

High Availability

Having several servers is a necessary condition for high availability, but it's far from sufficient. What happens if, for instance, server C suddenly disappears?

  • The replication ring is broken, so updates from A and B will never reach D. D will then quickly become so out-of-date that it's not going to be usable. But wait! A will no longer receive the updates from B, so A will quickly become unusable as well. Same thing for B. So unless you are very quick to configure a smaller ring with the remaining nodes, the whole chain will soon stop working.
  • If an event from C is still being executed on one of the other servers, it will go into an infinite loop, simply because C is the only server able to prevent an event originally coming from C from cycling through the ring.

Conclusion: each time a server goes down, the whole system goes down. In other words, availability is poorer than with a single server.

Write Scalability

You might think that if you are able to run 1000 writes/s on a single server, writing on 4 servers in parallel will allow you to run 4000 writes/s over the whole cluster. However, reality is quite different.

Don’t forget that ALL writes will be executed on ALL servers. So we have 2 separate scenarios:

  • Scenario #1: 1000 writes/s is the point where you’re hitting a bottleneck (for instance disk saturation). Then you’ll never be able to handle the extra load coming from replication. What is going to happen is simply that the servers will become slow because of overload and they’ll never be able to go beyond the 1000 writes/s mark.
  • Scenario #2: a single server could handle 5000 writes/s. Then writing on all servers will indeed allow you to claim that your cluster can absorb 4000 writes/s. But you would achieve the same result by running 4000 writes/s on a single server. This has nothing to do with write scalability.

Conclusion: As all writes are run on all servers, writing on multiple nodes doesn’t magically create extra write capacity. You’re still bound by the capacity of a single server.

Other concerns

Another concern when allowing multiple writers is write conflicts. MySQL doesn’t have any mechanism to detect or solve write conflicts.

So lots of “funny” things can happen when writes are conflicting:

  • Duplicate key errors that will cause replication to halt. And no, setting auto_increment_increment and auto_increment_offset cannot resolve all possible situations when duplicate key errors can happen.
  • An even funnier situation is when conflicting writes do not generate a replication error, but instead create hidden data inconsistencies. Say you have value=100 in a field, A runs value=value+2 and B runs value=value*2. You can end up with one server having value=202 and another server having value=204. Which one is the right value? Impossible to know…

If you’re interested in learning more on the risk of writing on multiple nodes while using regular MySQL replication, you can check out this webinar.

Conclusion

A ring is one of the worst MySQL replication topologies, as it dramatically increases the complexity of all operations on the ring while providing no benefit.

If you need an HA solution, it is not an easy choice as there are many of them and all have tradeoffs, but a ring is definitely not the right option. This post can help you find the right candidate(s).

If you need write scalability, the options are limited, but again, MySQL ring replication is not a good fit. The main question to answer is how many writes do you want to be able to run? For instance, if you want 10x write scalability but your current workload is 100 writes/s, that’s easy: just make sure you have a decent schema, decent indexes and decent hardware. If you want 10x write scalability but you’re already running 5000 writes/s, it’s probably time to explore sharding.

The post MySQL ring replication: Why it is a bad option appeared first on Percona Performance Blog.

Percona Server 5.5.40-36.1 is now available

October 7, 2014 - 8:20am


Percona is glad to announce the release of Percona Server 5.5.40-36.1 on October 7, 2014. Based on MySQL 5.5.40, including all the bug fixes in it, Percona Server 5.5.40-36.1 is now the current stable release in the 5.5 series.

Percona Server is open-source and free. Details of the release can be found in the 5.5.40-36.1 milestone on Launchpad. Downloads are available here and from the Percona Software Repositories.

Bugs Fixed:

Release notes for Percona Server 5.5.40-36.1 are available in our online documentation. Bugs can be reported on the launchpad bug tracker.

(Please also note that Percona Server 5.6 series is the latest General Availability series and current GA release is 5.6.21-69.0.)

The post Percona Server 5.5.40-36.1 is now available appeared first on Percona Performance Blog.
