MySQL Table Fragmentation: Beware of Bulk INSERT with FAILURE or ROLLBACK

MySQL Table Fragmentation InsertUsually, database people are familiar with table fragmentation with DELETE statements. Whenever doing a huge delete, in most cases, they are always rebuilding the table to reclaim the disk space. But, are you thinking only DELETEs can cause table fragmentation? (Answer: NO).

In this blog post, I am going to explain how table fragmentation is happening with the INSERT statement.

Before going into the topic, we need to know that with MySQL, there are two kinds of fragmentation:

  • Fragmentation where some of the InnoDB pages are completely free inside the table.
  • Fragmentation where some of the InnoDB pages are not completely filled (the page has some free space).

There are three major cases of table fragmentation with INSERTs :

  • INSERT with ROLLBACK
  • Failed INSERT statement
  • Fragmentation with page-splits

Test Environment

I have created my own test environment to experiment with those cases.

  • DB: percona
  • Tables : frag, ins_frag, frag_page_spl
  • Table Size: 2G

Case 1: INSERT with ROLLBACK

At first, I have created a new table “ins_frag”. Then I have created a transaction (with BEGIN) and started to copy the data from table “frag” to table “ins_frag” as shown below.

From the above, you can see the INSERT was executed, but still, I did not commit/rollback the INSERT. You can note that both tables have occupied 2 GB of disk space. 

Now, I am going to ROLLBACK the INSERT.

 

Yes, after rollback the INSERT, the table “ins_frag” is still occupying the same 2 GB of disk space. Let’s find the fragmented space through the MySQL client.

 

 

So, this clears the rolling back the INSERT will create the fragmentation.  We need to rebuild the table to reclaim the disk space.

 

 

Case 2: Failed INSERT Statement

To test this case, I have created two MySQL client sessions (session 1 and session 2). 

In session 1, I am running the same INSERT statement within the transaction. But this time I have interrupted and killed the INSERT query at session 2.

Session 1

Session 2

The INSERT is interrupted and failed.

Again, at Session 1:

 

The INSERT is not completed and there is no data in the table. But still, the table .ibd file has grown up to 212M.  Let’s see the fragmented space through the MySQL client.

 

It shows the table has fragmented space and has to rebuild the table to reclaim the space.

Case 3: Fragmentation with Page-Splits

We know that internally, InnoDB records are stored in the InnoDB pages. Each page size is 16K by default, but you have the option to change the page size.

If the InnoDB page doesn’t have enough space to accommodate the new record or index entry, it will be split in two pages, which will be about 50% full each. This means even for insert only workload, with no rollbacks or deletes, you may end up with only 75% avg page utilization – and so a 25% loss for this kind of internal page fragmentation.

When the indexes are built by sort, they will have more congestion, and if the table has a lot of inserts that go to the random location in the index, it will cause the page-split.

Check out this excellent blog written by Marco Tusa, InnoDB Page Merging and Page Splitting, that has the complete internals about the page-split and InnoDB page structure/operations.

For an experiment, I have created a table with a sorted index (descending),

 

We can monitor the page split activity from the table INFORMATION_SCHEMA.INNODB_METRICS . For this, you need to enable the InnoDB monitor.

Then I have created the script to trigger the INSERTs randomly with 6 parallel threads. After the end of the scripts:

Yes, from the metrics, we can see the page-split counter has increased. The output shows that there are 52186 page-splits operations that occurred, which created 127.92 MB of fragmentation.

Once the split page is created, the only way to move back is to have the created page drop below the merge threshold. When that happens, InnoDB moves the data from the split page with a merge operation. MERGE_THRESHOLD is configurable for table and specific indexes.

The other way is to reorganize the data is to OPTIMIZE the table. This can be a very heavy and long process, but often is the only way to recover from a situation where too many pages are located in sparse extents.

Conclusion

  • The first two cases are rare. Because most of the applications are not designed to write huge data in the table. 
  • You need to be aware of these issues, whenever doing bulk INSERTs (INSERT INTO SELECT * FROM, Loading data from Mysqldump, INSERT with huge data, etc.)
  • Remember that your fragmented disk space is always re-usable.

Share this post

Comments (2)

  • Jeeva Reply

    Really helpful.
    This helps me to understand how should we should do bulk insertion and how should not!
    And rectifying frags is explained clearer

    June 24, 2020 at 10:33 pm
  • KS Reply

    Helpful in getting grasp on internals. But open question is how to remove fragmentation from index & tables, once it gets there.

    June 26, 2020 at 8:15 am

Leave a Reply