MySQL Users Conference – Innodb


It might look like it is too late to write about what happened at the Users Conference, but I’m only now finding bits of time to work through my accumulated backlog. The theme of this Users Conference was surely storage engines, whether you look at the number of third-party storage engines presented, the main marketing message – the storage engine partnership with IBM – or the “Clash of the Database Egos” storage engine developers showcase as one of the keynotes.

Today let me start with the most popular transactional storage engine for MySQL – Innodb.

The Innodb storage engine was covered in a lot of talks, many of them given by Innodb users. I found the following items worth noting.

Innodb Zip Page Compression This feature has been in development for a while. I believe this is at least the 3rd Users Conference at which it has been talked about, but now it works well enough to do some tests. There still seems to be a fair amount of work required to make it work well, such as having IO and decompression happen in parallel and improving buffer pool usage (currently, in many cases, you may need to store both the compressed and uncompressed versions of a page in the buffer pool).

I should also mention this feature is not that easy to use as it is implemented now – you have to specify the compressed block size, which is the size to which you expect each page to compress. If you choose this number too high, the compression ratio will not be good; if it is too low, there will be a lot of pages which cannot be compressed to the specified page size and will have to consume more than one page, reducing performance. This is especially a problem because you have to specify a single page size for an Innodb table, while data pages and pages for different indexes may well have very different compression ratios. As Innodb indexes are not prefix compressed, they have very good compression potential.
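To make the tradeoff concrete, here is a sketch of what specifying the compressed page size looks like. The `ROW_FORMAT=COMPRESSED` and `KEY_BLOCK_SIZE` syntax shown is what eventually shipped with the InnoDB plugin – the preview discussed at the conference may have differed – and the table itself is made up for illustration:

```sql
-- Ask InnoDB to compress each 16KB page down to 8KB; pages that do not
-- compress to 8KB have to be split, hurting performance.
-- (Hypothetical table; syntax as later shipped in the InnoDB plugin.)
CREATE TABLE comments (
  id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  user_id INT UNSIGNED NOT NULL,
  body    TEXT,
  KEY (user_id)
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;
```

Note that the data pages and the `user_id` index pages here must share the same 8KB target even though they may compress very differently.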

I would also like to see some stats about compressed tables, such as the average size pages compress to for the data and for the various indexes, and how many “fragmented” pages the index has with the current compression settings. The same utility could be used to advise on the compressed page size setting – by looking at the compressed page size distribution you can tell which page size will be optimal: the one which gives maximum compression while limiting the number of pages which do not fit in the compressed space.
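Some statistics along these lines did later appear in the InnoDB plugin’s INFORMATION_SCHEMA tables, though only aggregated per page size rather than per table or per index as wished for above; a hedged sketch:

```sql
-- INNODB_CMP (from the later InnoDB plugin) counts compression attempts
-- per page size; failed attempts mean pages did not fit and had to be
-- split or reorganized.
SELECT page_size,
       compress_ops,
       compress_ops_ok,
       compress_ops - compress_ops_ok AS failed_ops
FROM information_schema.INNODB_CMP;
```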

Fast Index Creation This is another feature I have been asking Heikki about, probably since my first days as an Innodb user, when I found out that ALTERing Innodb tables may take a lot of time. Fast index creation will allow Innodb to build indexes by sorting rather than by row-by-row insertion as it currently does, which can be multiple orders of magnitude faster for large tables.

There is a fair number of questions about this feature which I do not yet know the answer to. For example, will LOAD DATA INFILE be optimized the same way as for MyISAM tables, with a separate phase for building the indexes? Will UNIQUE indexes be built by sorting in the case of ALTER TABLE (the big gotcha for MyISAM tables)? Are there any plans to be able to build indexes in parallel by running sorts in different threads? Using multiple cores becomes increasingly important as their number per CPU just keeps doubling. It is also rather interesting how the sort file used for building the indexes is structured – does it use a dynamic or static record format? (The static record format used for sorting by MyISAM may cause sort files to be much larger than the table itself.)

The index build speed is, however, not the only reason I can’t wait for this feature to be available. Building an index by sort allows it to be built physically sorted, so it should be much faster for IO-bound index scans and large range scans. We surely should benchmark this aspect separately.
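The usage pattern this enables can be sketched as follows: load the table without secondary indexes, then add them afterwards so they are built by sort instead of by insertion. The table and file names are made up for illustration:

```sql
-- Hypothetical example: with fast index creation, the ADD INDEX below
-- builds the index by sorting, instead of paying row-by-row index
-- insertion for every row during the load.
CREATE TABLE events (
  id      INT UNSIGNED NOT NULL PRIMARY KEY,
  ts      DATETIME NOT NULL,
  payload VARCHAR(255)
) ENGINE=InnoDB;

LOAD DATA INFILE '/tmp/events.csv' INTO TABLE events;

ALTER TABLE events ADD INDEX (ts);
```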

Improved Auto Increment handling Yet another known Innodb problem is the table lock which is taken for the duration of the statement for inserts into tables with an auto-increment column. This starts giving you problems if you have a lot of concurrent inserts happening to the same table.

This problem was born, same as a bunch of others, out of statement-level MySQL replication, which among other requirements needs all auto-increment values in a multiple-value insert to be sequential.

To be honest, Heikki probably took a bit of a shortcut and kept the code simple and the same for all insert cases – in fact, if the insert is a single-value insert, or the number of values in the bulk is known, you can allocate all values at once when the statement starts instead of holding the auto-increment lock for the full statement duration.

The problem with multi-value inserts, however, is the following: even if you know the number of values in the insert statement, you do not know whether all of them need an auto-increment value allocated – some may already have values specified. I’m not sure if Heikki will find a way to count the exact number of auto-increment values needed, or if we will get a behavior change by having potential “holes” when auto-increment values are assigned but never used. There are probably similar problems in the INSERT IGNORE and ON DUPLICATE KEY UPDATE cases.
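A made-up example of the counting problem: three rows in one statement, but only two of them actually need an auto-increment value, which the server cannot know without inspecting the whole value list:

```sql
-- Hypothetical table t(id INT AUTO_INCREMENT PRIMARY KEY, v CHAR(1));
-- the middle row supplies its own id, so pre-allocating three values
-- up front would leave a hole in the sequence.
INSERT INTO t (id, v) VALUES (NULL, 'a'), (100, 'b'), (NULL, 'c');
```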

Now, in MySQL 5.1+, it is also possible to fix this problem for the cases when the number of rows in an auto-increment batch insert is not known, as we do not have to allocate sequential auto-increment values if row-level replication is used. This, however, will again be a behavior change to watch out for.
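The fix eventually shipped in MySQL 5.1 as the `innodb_autoinc_lock_mode` startup option; here is a hedged configuration sketch (the option name and mode numbers are those of the released 5.1, not necessarily of the patch discussed here):

```ini
[mysqld]
# Mode 2 ("interleaved") drops the table-level AUTO-INC lock entirely,
# but is only safe with row-based replication because the values
# assigned within a multi-row insert are no longer guaranteed to be
# consecutive.
binlog_format            = ROW
innodb_autoinc_lock_mode = 2
```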

So these are all possibilities, but can we expect the problem to be fixed soon? Yes indeed – there is already a patch out there which may make it into MySQL 5.2 or even MySQL 5.1. I surely would like to see it sooner, at least in the more flexible community version.

Relaxing Locks Row-level replication allows relaxed locking in many other cases. Gap and next-key locks, which limit concurrency in many cases, can be removed for the READ-COMMITTED isolation mode, which will start offering more performance benefits in MySQL 5.1. It also should be possible to start unlocking rows if they do not match the WHERE clause (think, for example, about running a non-indexed UPDATE, which has to lock all rows in Innodb right now). I, however, did not get exact plans on this one.
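For illustration, the non-indexed UPDATE case looks like this (the table is hypothetical):

```sql
-- With no index on `created`, Innodb must scan the whole table and,
-- as of MySQL 5.0, keeps every scanned row locked until commit even
-- if the row does not match the WHERE clause. Early unlocking of
-- non-matching rows would make this far less painful.
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
UPDATE orders SET status = 'stale' WHERE created < '2007-01-01';
```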

In general, I should mention Innodb is blamed for a lot of “gotchas” while many of them come from MySQL limitations at the time when Innodb was first implemented. Many other transactional storage engines do not have these problems because they target at least MySQL 5.1, with row-level replication readily available.

Speaking about community items, the following were the most interesting:

Further Scalability improvements Yasufumi Kinoshita did a number of interesting benchmarks stress-testing areas of Innodb other than the buffer pool, which had its locking relaxed in the latest MySQL 5.0. Hopefully the patches he has provided will be merged soon; if not, we should test them and make binaries with them available. With current MySQL 5.0 I still see Innodb having serious scalability problems in some environments with 4+ cores, so this is still important.

IO Scalability patches Mark Callaghan from Google released so many MySQL patches that we can call the patched version GoogleSQL. Among them there are a number of patches to make Innodb IO scheduling more aggressive, which is important for serious IO subsystems. Many Innodb IO settings were optimized on systems with 1-2 SATA drives, so they are far from optimal for server-grade IO subsystems.

These are for MySQL 4.0 at this point, but we hope to port some of them to MySQL 5.0+.
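The kind of tuning these patches enable looks roughly like this; the multiple read/write IO thread variables shown come from the Google patches (and were later merged into mainline Innodb), so the exact names and defaults may vary by version:

```ini
[mysqld]
# More background IO threads let Innodb keep a server-grade IO
# subsystem busy, instead of assuming 1-2 SATA drives as the stock
# settings effectively do.
innodb_read_io_threads  = 8
innodb_write_io_threads = 8
```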

Heikki also mentioned there are more things inside Innodb which could unleash performance, such as multiple purge threads or parallel log recovery, which he however needs to test properly before really enabling them.

Innodb IO Tuning I should also mention Paul Tuckfield’s keynote about scaling Innodb at YouTube. There were many other interesting points in his talk besides Innodb, but he also described very well (with pictures) why RAID10 is better than RAID5, why large stripes are better, and why a battery-backed-up cache is so important. I also usually speak about these things in my talks, but I’m very bad with pictures.



  1. says

    Hello Peter,

    Thanks for the sum-up on upcoming InnoDB changes! Any performance improvements will be welcome :)

    About Google patches, you wrote “These are for MySQL 5.0 at this point”
    I suppose you meant 4.1, right?

  2. says

    I think you have overstated what was provided in the Google patch, but we don’t mind the nice words. The patches from Google were first available for MySQL 4.0.26. They are now available for 4.0.26 and 5.0.37 — see to get them. There are two patches for 5.0.37, one that only has the changes for semi-sync replication and another that has everything. The changes for 4.0 have gotten more use than the changes for 5.0 — I have yet to do IO performance testing for 5.0. In theory, the changes for more IO threads in InnoDB are not needed, as InnoDB has support for using async IO, but this is disabled because async IO didn’t work too well on Linux in the past. Now that Oracle runs on Linux and uses async IO, it should work well.

  3. peter says

    Thank you Mark,

    That is great. I see you’ve got a lot of interest in Google patches to port them to MySQL 5.0 quickly :)

I’ve checked the MySQL 5.0 patches page; it seems to only list one patch, which is for semi-synchronous replication. Where is the other big one?

Regarding the number of IO threads… well, if asynchronous IO is available and indeed well utilized, i.e. a thread would never block with a single outstanding IO, it is good enough.

  4. says

    The big patch for MySQL 5 and the semi-sync only patch for MySQL 5 are at

    Real async IO has one advantage over background IO threads that I am curious about. With async IO, if an extent is to be prefetched, then 1 or more async IO requests can be used. While a large IO request can be submitted for some of the pages, the prefix or suffix of the extent (depending on scan direction) should use one request per page so that the page is available before the rest of the pages. Will we ever see the work done by the DIKU researchers show up in InnoDB? Until then, or until async IO is proven to work with InnoDB, extra background IO threads might be the best alternative.
