Where the open source community meets: Secure your spot for Percona Live Amsterdam! - Register

Downloads

Blog

Examining the TokuDB MySQL storage engine file structure

July 30, 2014

Author

Peter Zaitsev

Insight for DBAs

MySQL

Share this Post:

As we know different storage engines in MySQL have different file structures. Every table in MySQL 5.6 must have a .frm file in the database directory matching the table name. But where the rest of the data resides depends on the storage engine.

For MyISAM we have .MYI and .MYD files in the database directory (unless special settings are in place); for InnoDB we might have data stored in the single table space (typically ibdata1 in the database directory) or as file per table (or better said file per partition) producing a single file with .ibd extension for each table/partition. TokuDB as of this version (7.1.7) has its own innovative approach to storing the table contents.

I have created the table in the database test having the following structure:

CREATE TABLE `mytable` (
  `id` int(10) unsigned NOT NULL,
  `c` varchar(15) NOT NULL,
  `d` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `c` (`c`),
  KEY `d` (`d`,`c`),
  KEY `d_2` (`d`)
) ENGINE=TokuDB DEFAULT CHARSET=latin1

CREATE TABLE `mytable` (

`id` int(10) unsigned NOT NULL,

`c` varchar(15) NOT NULL,

`d` int(11) NOT NULL,

PRIMARY KEY (`id`),

KEY `c` (`c`),

KEY `d` (`d`,`c`),

KEY `d_2` (`d`)

) ENGINE=TokuDB DEFAULT CHARSET=latin1

No files appear in the “test” database directory besides mytable.frm, however few files are created in the database directory:

-rwxrwx--x  1 mysql mysql       40960 Jul 29 21:01 _test_mytable_key_c_22f19b0_3_19_B_0.tokudb
-rwxrwx--x  1 mysql mysql       16896 Jul 29 21:02 _test_mytable_key_d_2_22f223b_3_19_B_0.tokudb
-rwxrwx--x  1 mysql mysql       16896 Jul 29 21:01 _test_mytable_key_d_22f1c9a_3_19_B_0.tokudb
-rwxrwx--x  1 mysql mysql       32768 Jul 29 21:01 _test_mytable_main_22f1818_2_19.tokudb
-rwxrwx--x  1 mysql mysql       65536 Jul 29 21:02 _test_mytable_status_22f1818_1_19.tokudb

-rwxrwx--x 1 mysql mysql 40960 Jul 29 21:01 _test_mytable_key_c_22f19b0_3_19_B_0.tokudb

-rwxrwx--x 1 mysql mysql 16896 Jul 29 21:02 _test_mytable_key_d_2_22f223b_3_19_B_0.tokudb

-rwxrwx--x 1 mysql mysql 16896 Jul 29 21:01 _test_mytable_key_d_22f1c9a_3_19_B_0.tokudb

-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:01 _test_mytable_main_22f1818_2_19.tokudb

-rwxrwx--x 1 mysql mysql 65536 Jul 29 21:02 _test_mytable_status_22f1818_1_19.tokudb

As you can see the table is presented by a series of files – the “status” file, the “main” table which contains clustered fractal tree index (primary key) plus each index is stored in its own file. Note how files are named – to include the database name, file name and the key name (the name you give to the key, not the columns involved). This is followed by something like “22f1818_1_19” which I assume is kind of internal TokuDB object identifier.

Note also (at least in my system) files are created with executable bit set. I see no reason for this and this is probably just a minor bug.

Another minor bug (or intended design limitation?) seems to be TokuDB might loose the actual table name in its file name when you alter the table. For example as I altered the table to drop one of the keys and add another one named “superkey” I see the “mytable” name is replaced with “sql_390c_247” which looks very much like the temporary table which was used to rebuild the table:

-rwxrwx--x  1 mysql mysql       32768 Jul 29 21:15 _test_sql_390c_247_key_c_22f6f7d_3_19.tokudb
-rwxrwx--x  1 mysql mysql       32768 Jul 29 21:15 _test_sql_390c_247_key_d_22f6f7d_5_19.tokudb
-rwxrwx--x  1 mysql mysql       32768 Jul 29 21:15 _test_sql_390c_247_key_superkey_22f6f7d_4_19.tokudb
-rwxrwx--x  1 mysql mysql       32768 Jul 29 21:14 _test_sql_390c_247_main_22f6f7d_2_19.tokudb
-rwxrwx--x  1 mysql mysql       32768 Jul 29 21:14 _test_sql_390c_247_status_22f6f7d_1_19.tokudb

-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:15 _test_sql_390c_247_key_c_22f6f7d_3_19.tokudb

-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:15 _test_sql_390c_247_key_d_22f6f7d_5_19.tokudb

-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:15 _test_sql_390c_247_key_superkey_22f6f7d_4_19.tokudb

-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:14 _test_sql_390c_247_main_22f6f7d_2_19.tokudb

-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:14 _test_sql_390c_247_status_22f6f7d_1_19.tokudb

I like the approach of storing different indexes in the different files as this makes it much easier to drop the index as well as potentially allows the placement of indexes onto different storage if it is desired for some reason. However putting all tables in the database root is a bad idea – having substantial amount of tables, especially with few indexes, each producing huge amounts of files, makes it inconvenient to work with the database directory (which often contains other files – log files, binary logs etc.) plus it might push file systems to their limits or performance limits dealing with huge amounts of files in the single directory.

Many also like having files in the data directory as it allows, in basic configurations, to use simple Unix tools such as du to see how much space given database physically takes.

Same as InnoDB and MyISAM, TokuDB will create a separate set of files for each partition, placing it in the same directory, having same table partitioned by HASH on primary key with 4 partitions I observe:

-rwxrwx--x  1 mysql mysql       32768 Jul 29 21:22 _test_sql_390c_25b_P_p0_key_c_22f9f35_3_19.tokudb
-rwxrwx--x  1 mysql mysql       32768 Jul 29 21:22 _test_sql_390c_25b_P_p0_key_d_22f9f35_5_19.tokudb
-rwxrwx--x  1 mysql mysql       32768 Jul 29 21:22 _test_sql_390c_25b_P_p0_key_superkey_22f9f35_4_19.tokudb
-rwxrwx--x  1 mysql mysql       32768 Jul 29 21:22 _test_sql_390c_25b_P_p0_main_22f9f35_2_19.tokudb
-rwxrwx--x  1 mysql mysql       32768 Jul 29 21:22 _test_sql_390c_25b_P_p0_status_22f9f35_1_19.tokudb
-rwxrwx--x  1 mysql mysql       32768 Jul 29 21:22 _test_sql_390c_25b_P_p1_key_c_22f9f8e_3_19.tokudb
-rwxrwx--x  1 mysql mysql       32768 Jul 29 21:22 _test_sql_390c_25b_P_p1_key_d_22f9f8e_5_19.tokudb
-rwxrwx--x  1 mysql mysql       32768 Jul 29 21:22 _test_sql_390c_25b_P_p1_key_superkey_22f9f8e_4_19.tokudb
-rwxrwx--x  1 mysql mysql       32768 Jul 29 21:22 _test_sql_390c_25b_P_p1_main_22f9f8e_2_19.tokudb
-rwxrwx--x  1 mysql mysql       32768 Jul 29 21:22 _test_sql_390c_25b_P_p1_status_22f9f8e_1_19.tokudb
-rwxrwx--x  1 mysql mysql       32768 Jul 29 21:22 _test_sql_390c_25b_P_p2_key_c_22f9fe1_3_19.tokudb
-rwxrwx--x  1 mysql mysql       32768 Jul 29 21:22 _test_sql_390c_25b_P_p2_key_d_22f9fe1_5_19.tokudb
-rwxrwx--x  1 mysql mysql       32768 Jul 29 21:22 _test_sql_390c_25b_P_p2_key_superkey_22f9fe1_4_19.tokudb
-rwxrwx--x  1 mysql mysql       32768 Jul 29 21:22 _test_sql_390c_25b_P_p2_main_22f9fe1_2_19.tokudb
-rwxrwx--x  1 mysql mysql       32768 Jul 29 21:22 _test_sql_390c_25b_P_p2_status_22f9fe1_1_19.tokudb
-rwxrwx--x  1 mysql mysql       32768 Jul 29 21:22 _test_sql_390c_25b_P_p3_key_c_22f9ffb_3_19.tokudb
-rwxrwx--x  1 mysql mysql       32768 Jul 29 21:22 _test_sql_390c_25b_P_p3_key_d_22f9ffb_5_19.tokudb
-rwxrwx--x  1 mysql mysql       32768 Jul 29 21:22 _test_sql_390c_25b_P_p3_key_superkey_22f9ffb_4_19.tokudb
-rwxrwx--x  1 mysql mysql       32768 Jul 29 21:22 _test_sql_390c_25b_P_p3_main_22f9ffb_2_19.tokudb
-rwxrwx--x  1 mysql mysql       32768 Jul 29 21:22 _test_sql_390c_25b_P_p3_status_22f9ffb_1_19.tokudb

-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p0_key_c_22f9f35_3_19.tokudb

-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p0_key_d_22f9f35_5_19.tokudb

-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p0_key_superkey_22f9f35_4_19.tokudb

-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p0_main_22f9f35_2_19.tokudb

-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p0_status_22f9f35_1_19.tokudb

-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p1_key_c_22f9f8e_3_19.tokudb

-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p1_key_d_22f9f8e_5_19.tokudb

-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p1_key_superkey_22f9f8e_4_19.tokudb

-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p1_main_22f9f8e_2_19.tokudb

-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p1_status_22f9f8e_1_19.tokudb

-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p2_key_c_22f9fe1_3_19.tokudb

-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p2_key_d_22f9fe1_5_19.tokudb

-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p2_key_superkey_22f9fe1_4_19.tokudb

-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p2_main_22f9fe1_2_19.tokudb

-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p2_status_22f9fe1_1_19.tokudb

-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p3_key_c_22f9ffb_3_19.tokudb

-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p3_key_d_22f9ffb_5_19.tokudb

-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p3_key_superkey_22f9ffb_4_19.tokudb

-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p3_main_22f9ffb_2_19.tokudb

-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p3_status_22f9ffb_1_19.tokudb

As you can see “P_p1” to “P_p2” suffixes added to each of the files.

What other files exist beyond those which come from TokuDB tables?

There are few system files

-rwxrwx--x  1 mysql mysql       32768 Jul 29 21:16 tokudb.directory
-rwxrwx--x  1 mysql mysql       16384 Jul 17 19:09 tokudb.environment
-rwxrwx--x  1 mysql mysql     1048576 Jul 29 21:22 tokudb.rollback

-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:16 tokudb.directory

-rwxrwx--x 1 mysql mysql 16384 Jul 17 19:09 tokudb.environment

-rwxrwx--x 1 mysql mysql 1048576 Jul 29 21:22 tokudb.rollback

Which as their name say contain “directory” – metadata about tables and indexes in the system, rollback – contains data which is needed for transaction rollback and environment contains some kind of information about environment. I did not dig into this – I just see this file is not changed after I have started the instance unlike other 2 which are changed when table structure is changed (tokudb.directory) or when database is modified (tokudb.rollback)

Next you will see

-rw-------  1 mysql mysql           0 Jul 17 19:09 __tokudb_lock_dont_delete_me_data
-rw-------  1 mysql mysql           0 Jul 17 19:09 __tokudb_lock_dont_delete_me_environment
-rw-------  1 mysql mysql           0 Jul 17 19:09 __tokudb_lock_dont_delete_me_logs
-rw-------  1 mysql mysql           0 Jul 17 19:09 __tokudb_lock_dont_delete_me_recovery
-rw-------  1 mysql mysql           0 Jul 17 19:09 __tokudb_lock_dont_delete_me_temp

-rw------- 1 mysql mysql 0 Jul 17 19:09 __tokudb_lock_dont_delete_me_data

-rw------- 1 mysql mysql 0 Jul 17 19:09 __tokudb_lock_dont_delete_me_environment

-rw------- 1 mysql mysql 0 Jul 17 19:09 __tokudb_lock_dont_delete_me_logs

-rw------- 1 mysql mysql 0 Jul 17 19:09 __tokudb_lock_dont_delete_me_recovery

-rw------- 1 mysql mysql 0 Jul 17 19:09 __tokudb_lock_dont_delete_me_temp

I see TokuDB really tries to lock the database instance preventing concurrent access to the directory with multiple files. I think InnoDB does it more clean way placing the lock on system tablespace which does not require extra files… though the TokuDB team might have some specific reason to do it this way. Might be Oracle holds the software patent on preventing concurrent database operation by locking the file?

Finally there is the transaction log file:

-rwx------  1 mysql mysql    37499084 Jul 29 21:34 log000000002593.tokulog25

1	-rwx------ 1 mysql mysql 37499084 Jul 29 21:34 log000000002593.tokulog25

TokuDB transaction log files are not pre-allocated like InnoDB’s but they look more similar to MySQL binary logs – they have sequentially increasing file numbers which will increment as new files are created, file itself will grow as new data is written to the log. As I understand log rotation happens during checkpoint and you would typically see only one log file.

There is a lot more for me to learn when it comes to TokuDB file layout and purpose of individual files, yet I hope this provides you with good basic overview of the TokuDB MySQL storage engine.

Further Reading: Rich Prohaska from TokuDB team was kind enough to share some more good reading with me, for those extra curious: TokuDB Files and File Descriptors, Separate Database Directories Design

0 0 votes

Article Rating

11 Comments

Oldest

Newest Most Voted

Antonio Fernandes

11 years ago

For me, the worst part of it is placing ALL the files in the root datadir…

Tim Callaghan

11 years ago

‘@Antonio, we’re working on fixing that. 🙂

Rich Prohaska

11 years ago

A long time ago, we did not have any lock files. This would allow multiple MySQL servers to simultaneously use the same directories for the TokuDB data or the TokuDB recovery log and walk over each other. So, we just stuck lock files in all of the directories that store TokuDB files.

Rich Prohaska

11 years ago

The TokuDB recovery log consists of a sequence of files of about 100MB in size. We need to keep as many of these log files as necessary to contain a complete checkpoint. A checkpoint starts with a begin checkpoint log entry and completes with an end checkpoint log entry. While the checkpoint is in progress, log entries for current operations continue to be logged.
Once a checkpoint is completed, we can trim the older log files that precede the log file that contains the checkpoint begin log entry.

Shlomi Noach

11 years ago

Please also see my own impressions: http://code.openark.org/blog/mysql/converting-an-olap-database-to-tokudb-part-2-the-process-of-migration

The issue with one-file-per-index-per-partition leads to a situation where we have tens of thousands of files (all in the data directory). This causes MySQL/TokuDB to consistently crash, even if no data was present in any tables — just table definitions.

Our solution was to modify our partitioning schema from daily/month to yearly. I don’t like it when I need to change business logic due to newly introduced system limitations.