EmergencyEMERGENCY? Get 24/7 Help Now!

Identifying useful info from MySQL row-based binary logs

 | January 20, 2015 |  Posted In: Insight for DBAs, MySQL

PREVIOUS POST
NEXT POST

As a MySQL DBA/consultant, it is part of my job to decode the MySQL binary logs – and there are a number of reasons for doing that. In this post, I’ll explain how you can get the important information about your write workload using MySQL row-based binary logs and a simple awk script.

First, it is important to understand that row-based binary logs contain the actual changes done by a query. For example, if I run a delete query against a table, the binary log will contain the rows that were deleted. MySQL provides the mysqlbinlog utility to decode the events stored in MySQL binary logs. You can read more about mysqlbinlog in detail in the reference manual here.

The following example illustrates how mysqlbinlog displays row events that specify data modifications. These correspond to events with the WRITE_ROWS_EVENT, UPDATE_ROWS_EVENT, and DELETE_ROWS_EVENT type codes.

We will use following options of mysqlbinlog.
–base64-output=decode-rows
–verbose, -v
–start-datetime=”datetime”
–stop-datetime=”datetime”

We have a server running with row based binary logging.

We created a test table and executed the following sequence of statements under a transaction.

Now let’s see how it is represented in binary logs.

  • Row with “Table_map: test.t” defines the table name used by query.
  • Rows with “Write_rows/Update_rows/Delete_rows” defines the event type.
  • Lines that start with “###” defines the actual row that got changed.
  • Columns are represented as @1, @2 and so on.

Now have a look at our simple awk script that will use the mysqlbinlog output and print a beautiful summary for INSERT/UPDATE/DELETE events from row-based binary logs. Just replace the “mysqld-bin.000023” with your binary log file. The string “#15” in third line is for year 2015. If you are decoding a binary log file from 2014, just change it to “#14”. It is also recommended to use “–start-datetime” and ” –stop-datetime” options to decode binary logs of a specific time range instead of decoding a large binary log file.

Script :

Output :

This awk script will return following columns as output for every transaction in binary logs.

Timestamp : timestamp of event as logged in binary log.
Table : database.tablename
Query Type : Type of query executed on table and number of rows affected by query.

[Transaction total : 3 Insert(s) : 1 Update(s) : 1 Delete(s) : 1]
Above line print the summary of transaction, it displays the total number of rows affected by transaction, total number of rows affected by each type of query in transaction.

Let’s execute some more queries in sakila database and then we will summarize them using our script. We assume that we executed these queries between “2015-01-16 13:30:00” and “2015-01-16 14:00:00”

We will save our script in a file named “summarize_binlogs.sh” and summarize the queries we just executed.

Now we run our script to get the summary of information logged in binary log.

See how nicely our script has summarized the information logged in binary log. For the last transaction, it shows 3 rows that corresponds to each statement executed within transaction and then the final summary of transaction i.e. transaction affected 4 rows in total and all of them were deleted.

This was how we can summarize MySQL row-based binary logs in a more readable format using a simple awk script. This will save you time whenever you need to decode the binary logs for troubleshooting. Now I’ll show you how this summarized information can help answer some basic but important questions.

Q1 : Which tables received highest number of insert/update/delete statements?

Q2 : Which table received the highest number of DELETE queries?

Q3: How many insert/update/delete queries executed against sakila.country table?

Q4: Give me the top 3 statements which affected maximum number of rows.

See how easy it is to identify the large transactions.

Q5 : Find DELETE queries that affected more than 1000 rows.

If we want to get all queries that affected more than 1000 rows.

Conclusion

The information logged in binary logs is really helpful to understand our write workload. We used mysqlbinlog + awk to present the binary log contents in more readable format. Further we used the summarized information to answer some of the important questions.

PREVIOUS POST
NEXT POST
Alok Pathak

Alok joined Percona as a Consultant in May-2014, prior to joining Percona, was giving MySQL database support to various product and service based companies in India. His career interests include High availability of databases, storage engines specially Innodb, server tuning, query optimization, and automating DBA tasks using shell scripts.

9 Comments

  • Nice post! I think the same information (workload) can be easily seen using performance schema:

    mysql> select table_schema, table_name, rows_fetched,rows_inserted,rows_updated,
    rows_deleted from x$schema_table_statistics where table_schema=”test”\G
    *************************** 1. row ***************************
    table_schema: test
    table_name: tbl_partition
    rows_fetched: 846933340
    rows_inserted: 588251
    rows_updated: 488251
    rows_deleted: 100000
    1 row in set (0.03 sec)

  • If you are not using row based binary then something like this would work.

    mysqlbinlog ${1} | grep -i -e “^update” -e “^insert” -e “^delete” -e “^replace” -e “^alter” | cut -c1-100 | tr ‘[A-Z]’ ‘[a-z]’ | sed -e “s/\t/ /g;s/`//g;s/(.*$//;s/ set .*$//;s/ as .*$//” | sed -e “s/ where .*$//” | sort | uniq -c | sort -nr

  • Alok,

    Suppose, I have a binlog in ROW format. I want to extract SQL statements for particular query. How can I do that?

    Thanks
    Sanju.

  • Hi Sanju,

    When using row based binary logging, actual changes will be logged instead of query which made the change. If you’re using MySQL 5.6, you can enable “binlog_rows_query_log_events” variable to log the actual query (for informational purpose only) along with the row data. You can read more about this in the link : http://dev.mysql.com/doc/refman/5.6/en/replication-options-binary-log.html#sysvar_binlog_rows_query_log_events.

    Regards,
    Alok

  • Is there a way to find information regarding which column actually changed in the table! I want to find the columns that are frequently updated

    • I have also tried to create triggers on the replica but it is not triggered by binary log events I don’t know why

Leave a Reply