Recently I was tasked with investigating slippage between master and slave in a standard replication setup.
The client was using Maatkit’s mk-table-checksum to check his slave data was indeed a fair copy of that of the master.
1 mk-table-checksum --algorithm=BIT_XOR h=hostname.local,u=root,p=xxx --replicate=checksum.checksum --emptyrepltbl --chunksize=500000 --databases mydb --sleep 1
He could then examine the checksum.checksum table and see all was well, however there were various tables with different crc values.
1 <br>db: mydb <br>tbl: Foo_History<br>chunk: 0<br>boundaries: 1=1<br>this_crc: 30627c76fe658fd9b77eaddf1ea8c03a<br>this_cnt: 2593<br>master_crc: bdbadd7dae2636a8cf515bb886fb1295<br>master_cnt: 2593<br>ts: 2008-09-24 04:50:05<br>
So, now I needed to find out what was updating the table. Here is where tools like mysqlsla and Maatkit’s mk-log-parser come into their own as they both allow you to quickly parse the binary log files, extracting the relevant statements.
1 mysqlbinlog bin_log.000001 | mysqlsla -lt binary -
Check out http://hackmysql.com/mysqlsla_filters for how to filter by statement.
Looking through the binary logs I could see this table is actually an audit table for changes to the Foo table. The trail is kept using two triggers on that table.
1 <br>select trigger_name,<br>event_object_table,<br>Event_Manipulation <br>from information_schema.triggers where trigger_schema = 'mydb' <br>and action_statement like '%Foo_History%'G<br>*************************** 1. row ***************************<br> trigger_name: Foo_Update<br>event_object_table: Foo<br>Event_Manipulation: UPDATE<br>*************************** 2. row ***************************<br> trigger_name: Foo_Delete<br>event_object_table: Foo<br>Event_Manipulation: DELETE<br>
So whats the problem with that?, well there is a situation where two overlapping transactions updating the Foo table can be reordered once serialized on the slave.
Here is an example:
I recreated the tables and triggers, populating the Foo table with a handful of rows and then ran the following.
Here is the update trigger:
1 <br>Create Trigger Foo_Update After UPDATE on Foo <br>For Each Row INSERT into Foo_History (Foo_History_ID, Name, Value, Field_Id) <br>Values (Old.Foo_History_ID, Old.Name, Old.Value, Old.Field_Id);<br>
1 <br>Start Transaction;<br>Update Foo set Value=6 Where Field_ID = 3;<br>
1 <br>Start Transaction;<br>Update Foo set Value=6 Where Field_Id = 51;<br>Commit;<br>
1 <br>Commit;<br>
Now when these statements get run on the slave they will be serialized, thus changing the order of the inserts made by the trigger. The Foo_History table is now out of sync.
Master:
1 <br>*************************** 1. row ***************************<br> Foo_History_Id: 1<br> Name: maxlength<br> Value: 7<br> Field_Id: 3<br>*************************** 2. row ***************************<br> Foo_History_Id: 2<br> Name: maxlength<br> Value: 7<br> Field_Id: 51<br>
Slave:
1 <br>*************************** 1. row ***************************<br> Foo_History_Id: 1<br>Name: maxlength<br> Value: 7<br> Field_Id: 51<br>*************************** 2. row ***************************<br> Foo_History_Id: 2<br>Name: maxlength<br> Value: 7<br> Field_Id: 3<br>
As you can see from the above, the updates were performed in a different order, with the inserts being assigned a different Foo_History_Id. This is because the statements are written to the binary log in commit order.