Recently I was tasked with investigating slippage between master and slave in a standard replication setup.
The client was using Maatkit’s mk-table-checksum to check his slave data was indeed a fair copy of that of the master.
1 mk-table-checksum --algorithm=BIT_XOR h=hostname.local,u=root,p=xxx --replicate=checksum.checksum --emptyrepltbl --chunksize=500000 --databases mydb --sleep 1
He could then examine the checksum.checksum table and see all was well, however there were various tables with different crc values.
123456789 db: mydbtbl: Foo_Historychunk: 0boundaries: 1=1this_crc: 30627c76fe658fd9b77eaddf1ea8c03athis_cnt: 2593master_crc: bdbadd7dae2636a8cf515bb886fb1295master_cnt: 2593ts: 2008-09-24 04:50:05
So, now I needed to find out what was updating the table. Here is where tools like mysqlsla and Maatkit’s mk-log-parser come into their own as they both allow you to quickly parse the binary log files, extracting the relevant statements.
1 mysqlbinlog bin_log.000001 | mysqlsla -lt binary -
Check out http://hackmysql.com/mysqlsla_filters for how to filter by statement.
Looking through the binary logs I could see this table is actually an audit table for changes to the Foo table. The trail is kept using two triggers on that table.
12345678910111213 select trigger_name,event_object_table,Event_Manipulationfrom information_schema.triggers where trigger_schema = 'mydb'and action_statement like '%Foo_History%'G*************************** 1. row ***************************trigger_name: Foo_Updateevent_object_table: FooEvent_Manipulation: UPDATE*************************** 2. row ***************************trigger_name: Foo_Deleteevent_object_table: FooEvent_Manipulation: DELETE
So whats the problem with that?, well there is a situation where two overlapping transactions updating the Foo table can be reordered once serialized on the slave.
Here is an example:
I recreated the tables and triggers, populating the Foo table with a handful of rows and then ran the following.
Here is the update trigger:
123 Create Trigger Foo_Update After UPDATE on FooFor Each Row INSERT into Foo_History (Foo_History_ID, Name, Value, Field_Id)Values (Old.Foo_History_ID, Old.Name, Old.Value, Old.Field_Id);
12 Start Transaction;Update Foo set Value=6 Where Field_ID = 3;
123 Start Transaction;Update Foo set Value=6 Where Field_Id = 51;Commit;
1 Commit;
Now when these statements get run on the slave they will be serialized, thus changing the order of the inserts made by the trigger. The Foo_History table is now out of sync.
Master:
12345678910 *************************** 1. row ***************************Foo_History_Id: 1Name: maxlengthValue: 7Field_Id: 3*************************** 2. row ***************************Foo_History_Id: 2Name: maxlengthValue: 7Field_Id: 51
Slave:
12345678910 *************************** 1. row ***************************Foo_History_Id: 1Name: maxlengthValue: 7Field_Id: 51*************************** 2. row ***************************Foo_History_Id: 2Name: maxlengthValue: 7Field_Id: 3
As you can see from the above, the updates were performed in a different order, with the inserts being assigned a different Foo_History_Id. This is because the statements are written to the binary log in commit order.