September 16, 2014

The story of one MySQL Upgrade

I recently worked on upgrading MySQL from one of very early MySQL 5.0 versions to Percona Server 5.1. This was a classical upgrade scenario which can cause surprises. Master and few slaves need to be upgraded. It is a shared database used by tons of applications written by many people over more than 5 years timeframe. It did not have any extensive test suite we could use for validation. As you might guess in such cases some of the original authors have moved on and nobody is exactly sure what application does or does not do with the database. Database is production critical with serious role in serious company so we can’t just do a reckless upgrade

First we needed to do a sanity check on existing replication setup. As we’re checking replication consistency down the road we need to make sure replication is in sync to begin with to avoid false positives. mk-table-checksum is a tool to do it. It turned out replication indeed had an issue replicating
triggers. The problem should be fixed by upgrade so we just have to keep this into account.

We move database to MySQL 5.1 As the database size is relatively small we do mysqldump and load which is a safest way, considering we’re speaking about 4 years worth of changes in versions. We also ran mysql_fix_privilege_tables to ensure all new privileges are added, which is something I frequently see forgotten

Next step is setup MySQL 5.0 to 5.1 replication to see if it runs properly. It turns out it does not because of the old bug which I’ve also seen causing upgrade problems in number of other environments. INSERT ON DUPLICATE KEY UPDATE had a unfair share of replication issues in MySQL 5.0. There are number of ways the problem can be solved but first we decide to see how broad is it. We let Slave to replicate with skip-slave-errors=1105 to see if we get any other problems spotted and in the meanwhile we go over binary logs for the last month to see how frequently this functionality is used. Happily there are only few INSERT ON DUPLICATE KEY UPDATE query instances, and only one of them into table with AUTO_INCREMENT column (and so affected by this bug). It was easy enough to change the single application not to use INSERT ON DUPLICATE KEY UPDATE in this instance so it was done.

So replication was running properly but does data match ? (This also would cover data improperly loaded with mysqldump if there is such). We stopped 5.0 and 5.1 slave at the same position and used mk-table-checksum to ensure the data is in sync. mk-table-checksum can use replication to check consistency but comparing 2 servers directly is faster and we had a spare capacity which we could use. First we ran the check using default CHECKSUM TABLE algorithm. We got number of tables reporting wrong checksums while running SELECT INTO OUTFILE and diffing these files reported no changes. It turns out there are some subtle changes to CHECKSUM TABLE over the years which could report different checksum in some cases. Rerunning check using BIT_XOR algorithm eliminated those false positives. Another table remained though. We used mk-table-sync –print
as a diff tool for MySQL to see what is different in the tables. It turned out one of the float columns stored “-0″ in MySQL 5.0 but it was displayed as “0″ when data loaded to Percona Server 5.1. This was not the issue for application and could be ignored.

So at this point we were sure the write traffic replicates properly to the new setup. It was the time to check how read traffic behaves. We stopped both slaves at the same position again and used tcpdump and mk-query-digest to get sample read traffic from both master and slave. –sample=50 (or similar) option is important to check only limited number of samples for each query type – otherwise it can take a lot of time. Running mk-upgrade with these queries showed some results differences which turned out to be false positives too – thanks to TABLE CHECKSUM mk-upgrade uses by default to check result sets. –compare-results-method rows helped to remove them and we were down to only query time differences. In most cases query time differences were not significant or Percona Server 5.1 did better but there were couple of queries where optimizer plan changed to significantly worse one and they were flagged to be fixed.

At this point we were confident enough Slaves can handle the traffic and we could put them in production. Before upgrading Master however we had to think about rollback plan if something goes wrong and we need to go back to MySQL 5.0 on the master. To do this we set up replication from Percona Server 5.1 back to MySQL 5.0 and performed the same checks again – happily replication worked and there were no “drift”. This allows us to simply to hook up old MySQL 5.0 and all it slaves as a slave off new master and keep it for some time, with rollback to old setup being trivial. This was the best choice as with New MySQL version upgrade involves new Operating System and hardware and any of them could be potential cause of rollback.

MySQL Upgrade in my opinion is the process where hiring external consultant it especially makes sense. The team, even if it includes skilled MySQL DBA typically does not need to go through major version upgrades more frequently than 3-5 years, so unless there are a lot of applications being upgraded by the same team it is hard to archive experience. Also problems you encounter during upgrade are very different depending on the upgrade version – upgrade from MySQL 4.1 to 5.0 had a lot of different issues than upgrade from MySQL 5.0 to 5.1

Also, Maatkit is Awesome, though I believe you know already.

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. chris says:

    Peter,

    Great article but I have one small question. When you stated:

    “It turned out one of the float columns stored “-0″ in MySQL 5.0 but it was displayed as “0″ when data loaded to Percona Server 5.1. This was not the issue for application and could be ignored,”

    I was wondering if this was a bug in 5.1 or something else even though it was benign to the application?

    Thanks,

    Chris

  2. Rob Wultsch says:

    “set up replication from Percona Server 5.1 back to MySQL 5.0″
    I have never heard of anyone setting up replication from one major version to an older major version. How stable is such a system, assuming one avoids syntax that does not work on the older version slave?

    IIRC this is very much not supported per the fine manual.

  3. peter says:

    Chris,

    Honestly we did not dig into it. It may be mysqldump bug too. I checked MySQL 5.1 does store both 0 and -0 in float columns and mysqldump dumps them properly. We did not do it especially as for data stored 0 would be more correct than -0 :)

  4. peter says:

    Rob,

    Indeed it is not officially supported and your success would depend a lot on versions and features you’re using. For example if you remember Row based replication in 5.1 it obviously would not work with MySQL 5.0 slave. I would not leave it running forever but it can serve very well during rollback process as well as when not all applications can be upgraded to MySQL 5.1 immediately.

  5. @Rob
    Tungsten can replicate from 5.1 down to 5.0 including row updates. It works as long as you don’t use 5.1-only features (e.g., CREATE TABLE … PARTITION BY …).

  6. howard says:

    I have three questions
    â‘ skip-slave-errors=1105,consider the old bug,it may produce different data by INSERT ON DUPLICATE KEY UPDATE in slave and master,but does it cause an error in the slave?
    â‘¡I check the mysql document,the option name is slave-skip-errors both in 5.1 and 5.0
    â‘¢according to mysql test,”To do this we set up replication from Percona Server 5.1 back to MySQL 5.0″,when I set up this,start slave,it report errors,said ‘log as be corrupt’like this.

  7. Conrad Strydom says:

    Hi,

    I am also running Percona 5.1 and am seeing this “Old Bug” appearing breaking my slaves, is that really still the bug we are talking about from before 2007? Do you know what the situation is there, I guess it was never properly fixed or reintroduced?

    Thanks for the great article btw.

  8. peter says:

    Conrad,

    What bug do you refer to ? The bug I mentioned happens replicating old MySQL to new MySQL (or Percona Server). When you upgrade both Master and Slave it should not happen any more.
    Do you mean something else ?

Speak Your Mind

*