Hot Column Addition and Deletion Part II: How it works

Hot Column Addition and Deletion (HCAD)

In the previous HCAD post, I described HCAD and showed that it can reduce the downtime of column addition (or deletion) from 18 hours to 3 seconds. In fact, the downtime of InnoDB is proportional to the size of the database, whereas the downtime for TokuDB 5.0 depends on the time it takes for MySQL to close and reopen a table — a time that’s independent of database size. Go ahead and build bigger tables. The HCAD downtime for TokuDB won’t increase.

You may be wondering how we do HCAD. Here goes:

Under the hood

TokuDB is based on Fractal Tree indexing, one of the cools features of which is that they replace random I/O with sequential I/O. The way this happens has an impact on how HCAD happens, so here’s the 20,000 foot view.

You can think of everything in a TokuDB table (or index) as a message. There can be messages to insert a row, or update a row, or delete a row. Rather than delivering these messages immediately, the messages are bundled up by common destination, and they progress towards the leaves of the Fractal Tree indexes when there are enough of them to make the disk-head movement worthwhile. They are applied in the right order, obviously, in order to keep the semantics of the SQL commands correct.

Even a query can be thought of as a message, except in the case of a query, it has to be delivered on the spot, even if that means moving the disk head. The query “sees” all the messages ahead of it, so it gets all the right answers, once again, according to the SQL semantics.

An HCAD command generates yet another type of message: it’s a broadcast message that needs to be applied to every row. Sticking the message into the Fractal Tree indexes is fast. In fact, the downtime associated with HCAD has nothing to do with the TokuDB work. Instead, MySQL closes and reopens a table on an alter table command. This causes dirty pages associated with that table to be flushed. The downtime is on the order of seconds, though we have seen extreme cases with many dirty pages and very large RAM where this can take a couple of minutes.

And since this is TokuDB, you also get to define clustering indexes, which reference all columns and therefore speed up lots of queries. The HCAD message gets injected into the primary table and into each clustering index.

The work of changing the rows does not happen when the HCAD message is injected. Rather, the broadcast message makes its way down to the leaves as other messages push it along. In this process, when an HCAD message reaches a row — either because other messages push it along or because of a query — the row gets rewritten to include the added column or exclude the delete column, as the case may be. Once the work is done to rewrite a row, the HCAD work is done for that row. The user can choose to have this work done immediately — say by a query that touches all rows — or lazily — as part of the normal operation of the database. Neither case involves downtime, and once the work is done to rewrite a row, the HCAD work is done for that row.

You can schedule the background work of updating the rows as follows. Recall that any query that touches a row, HCADs the row. A query of the form

where X is primary or one of the clustering indexes, will touch each row and finish all the HCAD work for that index. I should emphasize that you don’t need to do this. Everything works fine without it. This
is an option if you want to get the work done all at once.

Trying it out

This is a 3 step process:

  1. Download TokuDB (free for evaluation purposes or in production up to 50GB of user data).
  2. Read section 3.4 of the User’s Guide.
  3. Load your tables and go!

Learning more

I had the privilege of sitting down for the MySQL Community Podcast with Sheeri Cabral and Sarah Novotny where we spoke about the new TokuDB 5.0 features in depth. See Episodes 39 and 40.


Share this post

Comments (4)

  • Bruno Martinez Reply

    Why does touching a row flush all changes to it? Do you leave an empty space in the smaller COLA arrays?

    Can you implement range updates with the same mechanism you use for HCADs? For example, DELETE FROM T WHERE 3 < id AND id < 9.

    April 8, 2011 at 9:12 pm
    • bradley Reply

      Yes, reading a row causes all references to that row to be consolidated together. It’s just easier to run the MVCC algorithm if the various versions are all put together.

      Yes, in effect the smaller COLAs end up with spaces in them. Keep in mind that the COLA talks present a simplified version of the actual data structure we use. (Usually the talks don’t go into detail about what to do about many details including different-sized keys, transactions, MVCC, checkpointing, logging, and managing those holes. The real data structure doesn’t actually have holes per se.

      Yes, range updates and deletes could be implemented, but (mostly) don’t appear in TokuDB 5.0 because they didn’t fit into the schedule.

      Does this make things clearer, or just muddy the waters further?


      April 9, 2011 at 4:52 pm
  • Bruno Martinez Reply

    Much clearer, thank you.

    April 30, 2011 at 8:55 pm

Leave a Reply