October 23, 2014

Learning about MySQL Table Fragmentation

Recently I was working with the customer who need quick warmup – to get Innodb table fetched in memory as fast as possible to get good in memory access performance.

To do it I run the query: “SELECT count(*) FROM tbl WHERE non_idx_col=0″ I use this particular form of query because it will do full table scan – running count(*) without where clause may pick to scan some small index instead.

If your table is not fragmented one of two things should happen – either you should be reading at your hard drive sequential read rate or you would see MySQL becoming CPU bound if IO subsystem is too fast.

In this case however I saw neither – The vmstat showed read speed less than 10MB/sec which is very low for this system which had 6 15K SAS hard drives in RAID10.

Another indication of bad fragmentation was average IO size seen in SHOW INNODB STATUS output. It was around 20KB which means most reads are single page (16K reads). In case of non fragmented table you would see Innodb sequential read-ahead kick in which does reads in 1MB blocks and so you would see average IO size in hundreds of KB.

Now it is worth to notice you can see poor sequential scan performance even if table is not logically fragmented and Innodb is reading data in large blocks – this can happen in case Innodb table file is itself fragmented.

To check if this is the case I usually do “cat table.ibd > /dev/null” and watch IO statistics. If you see small IO request sizes in iostat and simply read speed. Like for the customer in question I saw file read speed of about 50MB/sec which is of course much better than 10MB/sec but well below RAID array capacity.

To check if file fragmentation is the issue or it is poor or miss configured IO subsystem I do another check by running cat /dev/sdb1 > /dev/null – Physical hard drive should never suffer fragmentation so you can get as much sequential IO as you can get (using IO pattern “cat” uses). In this case I got about 300MB/sec which confirmed file fragmentation is also the issue.

Interesting enough the “cure” for both fragmentation issues is the same – OPTIMIZE TABLE tbl – this command recreates the table by writing the new .ibd file (if you’re using innodb_file_per_table=1) which normally would be much less fragmented because it is written at once. Too bad however it requires table to be locked while it is being rebuilt and also it really only defragments clustered key but not the index.

P.S It would be cool to get Innodb objects (data and Index) fragmentation statistics which actually should not be that hard to implement.

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. Would also be nice to get stats about whether and why readahead isn’t working.

    But these might be the same stats….. hm

  2. peter says:

    Kevin,

    I think non working read-ahead is result not the cause – if you have requests coming in a way read-ahead is not triggered it does not work :)

  3. just a note about “Optimize table” fixing file fragmentation and table space fragmentation: For ext-3 this is only true if there is enough contiguous blocks left on the disk for the new file created by optimize table-while the table space fragmentation is still solved with Optimize table.

    http://mysqldba.blogspot.com/2008/01/filefrag-dbas-best-friend.html

  4. peter says:

    Dathan,

    Thanks – Indeed optimize/copy may not succeed defragmenting but this is as much as we can succeed trying to defragment something from application level.
    Even if you do not have enough contiguos blocks in single piece the new file often would still have smaller number of fragments than old one.

  5. You may want to check out filefrag, this will tell you (rather quickly) how many fragments your file has. This works on just about any file system (on linux).

    If running XFS, xfs_bmap gives you much more detailed output. xfs_fsr will also (online) defragment the file. You can also use xfs_io to reserve space (resvspc) for files if you want to do this from the start.

  6. I have been using a similar method to ‘warm up the cache’ so to speak: SELECT 1 FROM (SELECT * FROM tablename) x LIMIT 1;
    Is there any benefit from doing it your way over this way or are they pretty much the same thing?

  7. peter says:

    Jay,

    Your method will require temporary table creation in MySQL which can be very large so it is better to use different query patterns as highlighted.

  8. I posted this to MySQL:
    http://bugs.mysql.com/bug.php?id=36588

    Originally we had this discussion in our Forums:
    http://www.webyog.com/forums//index.php?showtopic=3925

    Is this related, you think. Or am I overlooking something basic?

    The culprit is that also SHOW TABLE STATUS can be very slow. Same for queries to Information_Schema fetching the same information. But it seems not always so – I have tried on my laptop and here it is not the case. Framentation makes the difference?

  9. peter says:

    Peter,

    Good catch though it does not have anything to do with table fragmentation. Simply if you’re querying number of rows in table for Innodb it performs estimation – this estimation requires several random dives in the clustered index, which causes IO for large table sizes.

  10. Forget optimize table — better to use alter table order by xxx and force an arrangement by columns that you most often query against.

  11. peter says:

    Steven,

    It really depends. In some cases when there is prevailing access method which is different from physical row layout it can make a huge difference. In other cases one may not know prevailing access or may not have time to run alter table order by regularly. Running ALTER TABLE … ORDER BY … is often much slower than OPTIMIZE TABLE.

  12. I’m I overlooking something here. What is the point of running table optimize on innodb tables if it doesn’t get rid of free data? And how might I go about doing this without changing the table to MyISAM first?

  13. peter says:

    Galen,

    OPTIMIZE TABLE will free disk space if innodb_file_per_table is used. Otherwise it will normally increase free space in innodb tablespace by having pages being more full as well as make table primary key index less fragmented – so full table scan and PK ranges will be faster.

    For Innodb optimize table does not sort indexes (as it does for MyISAM) so the benefit is less.

  14. I am not familiar with innodb_file_per_table. Is this something I can turn on now and start utilizing with currently fragmented innodb tables?

  15. peter says:

    Yes. This is the option you configure in my.cnf

    Note however you will need to recreate tables to start using it. And you main tablespace ibdata1 will never shrink – typically you would reload database when you start using innodb_file_per_table.

  16. @Peter, thank you for all of you insight! ;)

    Sorry for the newbie questions, but I have a few more.

    This actually just worked. I add the line to mysqld within my.cnf. I then optimized the fragmented table and the free data went away. However, I did not need to recreate the table unless optimize took care of this already? Also is having innodb_file_per_table the better way to go? Or should I disable this once my fragmentation is repaired?

    Thanks again!

  17. peter says:

    Free space in the table went away but I believe your ibdata1 has not shrunk.

    innodb_file_per_table is a good way to go to ease space management.

  18. So I will have to do the following, please correct me if I am wrong:

    1. Add innodb_file_per_table to my.cnf
    2. Startup mysql and optimize all tables
    3. mysqldump all DBs
    4. Stop mysql
    5. delete ibdata1 and logfiles
    6. Startup mysql and import from mysqldump

    This should take care of all fragmentation with innodb tables and this will create a new ibdata1 file on startup

    Thanks again for everything! :D

  19. kuldeep Sirohi says:

    I have innodb tables in my production database and i have set files_per_table variable.

    i executed the following query to know the fragmentation

    SELECT TABLE_SCHEMA, TABLE_NAME, CONCAT(ROUND(data_length / ( 1024 * 1024 ), 2), ‘MB’) DATA, CONCAT(ROUND(data_free / ( 1024 * 1024 ), 2), ‘MB’)FREE from information_schema.TABLES where TABLE_SCHEMA=’xyz’ and table_name=’abc’ and Data_free > 0 limit 10;

    and it returned the following result

    xyz abc 1.52MB 4.00MB(free space)

    after this I executed

    alter table abc engine=innodb

    and re-executed

    SELECT TABLE_SCHEMA, TABLE_NAME, CONCAT(ROUND(data_length / ( 1024 * 1024 ), 2), ‘MB’) DATA, CONCAT(ROUND(data_free / ( 1024 * 1024 ), 2), ‘MB’)FREE from information_schema.TABLES where TABLE_SCHEMA=’xyz’ and table_name=’abc’ and Data_free > 0 limit 10;

    still the free space remains the same. that means that table could not be defragmented.

    is there any other way to defragment innodb tables?

    thanks
    Kuldeep

  20. Adam C. Scott says:

    @kuldeep
    If the table was not created while innodb_file_per_table was ON, this will happen. You have to recreate the instance using mysqldump. See (http://dev.mysql.com/doc/refman/5.5/en/innodb-multiple-tablespaces.html) the Note section.

Speak Your Mind

*