October 21, 2014

Statistics of InnoDB tables and indexes available in xtrabackup

If you have ever wondered how big this or that index is in InnoDB, until now you had to calculate it yourself by multiplying the row size (which, I should add, is harder in the case of a VARCHAR, since you need to estimate the average length) by the count of records. And it would still be quite inaccurate, as secondary indexes tend to take more space. So we added more detailed index statistics to our xtrabackup utility. Thanks for this feature go to a well-known Social Network that sponsored the development.


We chose to put this into xtrabackup for a couple of reasons: first, gathering statistics on your backup copy does not hurt the production server, and second, running statistics on a stopped database is more accurate than running them online (online is also supported, but the results may be inexact).

Let’s see how it works. I have one table about 13GB in size, which was filled over about 2.5 years.
The table is:

And the size of the file is about 12.88 GB.

So to get statistics we run:
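A minimal sketch of the invocation (the datadir path here is illustrative, not the one from the post; --stats reads the InnoDB data files directly, so point it at the backup copy or at a stopped server’s datadir):

```shell
# Sketch: run xtrabackup in statistics mode against a stopped datadir
# or a backup copy. The path is an assumption for illustration.
xtrabackup --stats --datadir=/data/mysql
```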

which will show something like this:

The output is extensive, so let me highlight some points:

It says that the PRIMARY key (which is the table itself, as InnoDB clusters data by primary key) takes 497839 pages (16KB each), and the size of the data is 7492026403 bytes, or 6.98 GB. The density (how well the data fits into pages) is quite good: 91%. But that was expected, as rows are mostly inserted into this table; updates and deletes are rare.

And let’s take the index domain_id:

You can see that the allocated pages (43255 pages, or 708689920 bytes) are filled only to 76% (the data takes 545031333 bytes). That means about 150MB is just wasted space. It is even worse for the key revert_domain:

For this key, about 600MB is empty.
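As a sanity check, the density figures quoted above for PRIMARY and domain_id can be recomputed from the page counts and byte sizes; a small shell sketch:

```shell
# Recompute index density: data bytes / (allocated pages * 16KB per page).
# The numbers are the ones quoted above for PRIMARY and domain_id.
awk 'BEGIN {
  page = 16384
  printf "PRIMARY fill:   %.1f%%\n", 100 * 7492026403 / (497839 * page)
  printf "domain_id fill: %.1f%% (waste: %.0f MB)\n",
         100 * 545031333 / (43255 * page),
         (43255 * page - 545031333) / 1048576
}'
```

This reproduces the roughly 91% and 76% densities and the roughly 150MB of wasted space mentioned above.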

This needs a bit of explaining: these indexes are not as efficient as the primary key, but much of that is to be expected. In many cases we insert into the primary key in order, which makes things very predictable, but inserts into a secondary index arrive in random order, which leads to a lot of page splits.

One helpful new feature in the XtraDB/InnoDB plugin addresses this: fast index creation. With this feature, InnoDB builds indexes by sorting, so the page fill factor should be quite good.
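As a sketch (the table and index names here are illustrative, not from the post), taking advantage of fast index creation amounts to dropping and re-adding a secondary index, which lets InnoDB rebuild it by sort rather than by row-by-row insertion:

```shell
# Hypothetical names: with the InnoDB plugin and fast index creation,
# ADD INDEX builds the secondary index by sorting, so pages come out dense.
mysql mydb <<'SQL'
ALTER TABLE history DROP INDEX domain_id;
ALTER TABLE history ADD INDEX domain_id (domain_id);
SQL
```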

To check that, here is the xtrabackup --stats output for the index domain_id, created for a table in Barracuda format with the fast index creation method:

As you see, this time it takes 34383 pages (compared to 43255 in the previous statistics).
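The improvement can be quantified from the two page counts:

```shell
# 43255 pages allocated before fast index creation vs 34383 after.
awk 'BEGIN { printf "page reduction: %.1f%%\n", 100 * (1 - 34383 / 43255) }'
```

That is roughly a 20% reduction in allocated pages.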

Though it would be interesting to see how it grows with further inserts, and I also suspect that random INSERTs into such densely packed pages are going to be slower than in the previous case.

The --stats mode is not in an xtrabackup release yet, only in the source code repository, but it should be released quite soon.

And the last point of the post: if you are badly missing some features in MySQL, InnoDB, the InnoDB plugin, XtraDB, or XtraBackup, you know whom to ask!

About Vadim Tkachenko

Vadim leads Percona's development group, which produces Percona Cloud Tools, Percona Server, Percona XtraDB Cluster and Percona XtraBackup. He is an expert in solid-state storage, and has helped many hardware and software providers succeed in the MySQL market.

Comments

  1. wow this is probably the best feature added “evar”! Thanks for the hard work!

  2. peter says:

    Vadim,

What is leaf pages vs size pages? “key vals: 1204830, leaf pages 133869, size pages 153984” It looks like size pages could be larger than the sum of the number of pages on all levels. Is this because there could be some pages which are allocated to the index because of extent allocation but never used, or something else?

This actually tells us about two different issues with full space utilization: partially filled pages and empty pages in allocated extents.

Another thing: I think it would be nice to see values per page as well. This can be computed from the rest of the data, but it is good if it can be seen directly. I would use it to understand how efficient range scans could be.

Another thing: how are these stats computed, by doing a file scan or by scanning the indexes in order? It may be good to see information similar to what “filefrag” gives: how badly each index leaf space is fragmented. In a perfect world we would prefer to see each of them sequential, if we do not use SSDs.

  3. Mark Callaghan says:

I uploaded an awk script to flatten the output from ‘xtrabackup --stats’. It is at http://launchpad.net/mysqlatfacebook/other/files/+download/xtrabackup_flatten.awk. It generates one line of output per index, which is suitable for further processing with Unix command-line utilities.

  4. Yasufumi says:

    Peter,

    Hmm…
Simply, the “estimated statistics in dictionary:” section shows the same values as the InnoDB Table Monitor.
(“appr.key vals %lu, leaf pages %lu, size pages %lu” : http://dev.mysql.com/doc/refman/5.0/en/innodb-monitors.html#innodb-table-monitor )
So, honestly, I don’t grasp what “size pages” is exactly yet…

    And, I will add information about “recs/page” and “contiguousness”.
    (contiguousness like http://www.percona.com/docs/wiki/patches:innodb_check_fragmentation ?)

  5. peter says:

    Yasufumi,

Thanks for the explanation. So how will this tool count pages allocated in extents to a given index? Will it simply consider these pages empty? Though I’m not sure the “level” for such pages would be known yet.
