October 25, 2014

Examining the TokuDB MySQL storage engine file structure

As we know different storage engines in MySQL have different file structures. Every table in MySQL 5.6 must have a .frm file in the database directory matching the table name. But where the rest of the data resides depends on the storage engine.

For MyISAM we have .MYI and .MYD files in the database directory (unless special settings are in place); for InnoDB we might have data stored in the single table space (typically ibdata1 in the database directory) or as file per table (or better said file per partition) producing a single file with .ibd extension for each table/partition. TokuDB as of this version (7.1.7) has its own innovative approach to storing the table contents.

I have created the table in the database test having the following structure:

No files appear in the “test” database directory besides mytable.frm, however few files are created in the database directory:

As you can see the table is presented by a series of files – the “status” file, the “main” table which contains clustered fractal tree index (primary key) plus each index is stored in its own file. Note how files are named – to include the database name, file name and the key name (the name you give to the key, not the columns involved). This is followed by something like “22f1818_1_19″ which I assume is kind of internal TokuDB object identifier.

Note also (at least in my system) files are created with executable bit set. I see no reason for this and this is probably just a minor bug.

Another minor bug (or intended design limitation?) seems to be TokuDB might loose the actual table name in its file name when you alter the table. For example as I altered the table to drop one of the keys and add another one named “superkey” I see the “mytable” name is replaced with “sql_390c_247″ which looks very much like the temporary table which was used to rebuild the table:

I like the approach of storing different indexes in the different files as this makes it much easier to drop the index as well as potentially allows the placement of indexes onto different storage if it is desired for some reason. However putting all tables in the database root is a bad idea – having substantial amount of tables, especially with few indexes, each producing huge amounts of files, makes it inconvenient to work with the database directory (which often contains other files – log files, binary logs etc.) plus it might push file systems to their limits or performance limits dealing with huge amounts of files in the single directory.

Many also like having files in the data directory as it allows, in basic configurations, to use simple Unix tools such as du to see how much space given database physically takes.

Same as InnoDB and MyISAM, TokuDB will create a separate set of files for each partition, placing it in the same directory, having same table partitioned by HASH on primary key with 4 partitions I observe:

As you can see “P_p1″ to “P_p2″ suffixes added to each of the files.

What other files exist beyond those which come from TokuDB tables?

There are few system files

Which as their name say contain “directory” – metadata about tables and indexes in the system, rollback – contains data which is needed for transaction rollback and environment contains some kind of information about environment. I did not dig into this – I just see this file is not changed after I have started the instance unlike other 2 which are changed when table structure is changed (tokudb.directory) or when database is modified (tokudb.rollback)

Next you will see

I see TokuDB really tries to lock the database instance preventing concurrent access to the directory with multiple files. I think InnoDB does it more clean way placing the lock on system tablespace which does not require extra files… though the TokuDB team might have some specific reason to do it this way. Might be Oracle holds the software patent on preventing concurrent database operation by locking the file?

Finally there is the transaction log file:

TokuDB transaction log files are not pre-allocated like InnoDB’s but they look more similar to MySQL binary logs – they have sequentially increasing file numbers which will increment as new files are created, file itself will grow as new data is written to the log. As I understand log rotation happens during checkpoint and you would typically see only one log file.

There is a lot more for me to learn when it comes to TokuDB file layout and purpose of individual files, yet I hope this provides you with good basic overview of the TokuDB MySQL storage engine.

Further Reading: Rich Prohaska from TokuDB team was kind enough to share some more good reading with me, for those extra curious: TokuDB Files and File Descriptors, Separate Database Directories Design

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. Antonio Fernandes says:

    For me, the worst part of it is placing ALL the files in the root datadir…

  2. @Antonio, we’re working on fixing that. :)

  3. A long time ago, we did not have any lock files. This would allow multiple MySQL servers to simultaneously use the same directories for the TokuDB data or the TokuDB recovery log and walk over each other. So, we just stuck lock files in all of the directories that store TokuDB files.

  4. The TokuDB recovery log consists of a sequence of files of about 100MB in size. We need to keep as many of these log files as necessary to contain a complete checkpoint. A checkpoint starts with a begin checkpoint log entry and completes with an end checkpoint log entry. While the checkpoint is in progress, log entries for current operations continue to be logged.
    Once a checkpoint is completed, we can trim the older log files that precede the log file that contains the checkpoint begin log entry.

  5. Please also see my own impressions: http://code.openark.org/blog/mysql/converting-an-olap-database-to-tokudb-part-2-the-process-of-migration

    The issue with one-file-per-index-per-partition leads to a situation where we have tens of thousands of files (all in the data directory). This causes MySQL/TokuDB to consistently crash, even if no data was present in any tables — just table definitions.

    Our solution was to modify our partitioning schema from daily/month to yearly. I don’t like it when I need to change business logic due to newly introduced system limitations.

  6. I just checked in a change that does NOT turn the execute permission bit on for the TokuDB data and log files.

  7. Rich,

    Thank you for prompt action!

  8. Peter (Stig) Edwards says:

    https://github.com/Tokutek/tokudb-engine/issues/59 is for “Put TokuDB table files within appropriate schema directories”

Speak Your Mind

*