Do not move or modify any TokuDB files. You will break the database, and need to recover the database from a backup.
Fast Insertions and Richer Indexes¶
TokuDB’s fast indexing enables fast queries through the use of rich indexes, such as covering and clustering indexes. It’s worth investing some time to optimize index definitions to get the best performance from MySQL and TokuDB. Here are some resources to get you started:
- “Understanding Indexing” by Zardosht Kasheff (video)
- Rule of Thumb for Choosing Column Order in Indexes
- Covering Indexes: Orders-of-Magnitude Improvements
- Introducing Multiple Clustering Indexes
- Clustering Indexes vs. Covering Indexes
- How Clustering Indexes Sometimes Helps UPDATE and DELETE Performance
- High Performance MySQL, 3rd Edition by Baron Schwartz, Peter Zaitsev, Vadim Tkachenko, Copyright 2012, O’Reilly Media. See Chapter 5, Indexing for High Performance.
Clustering Secondary Indexes¶
One of the keys to exploiting TokuDB’s strength in indexing is to make use of clustering secondary indexes.
TokuDB allows a secondary key to be defined as a clustering key. This means that all of the columns in the table are clustered with the secondary key. Percona Server for MySQL parser and query optimizer support Multiple Clustering Keys when TokuDB engine is used. This means that the query optimizer will avoid primary clustered index reads and replace them by secondary clustered index reads in certain scenarios.
The parser has been extended to support following syntax:
CREATE TABLE ... ( ..., CLUSTERING KEY identifier (column list), ... CREATE TABLE ... ( ..., UNIQUE CLUSTERING KEY identifier (column list), ... CREATE TABLE ... ( ..., CLUSTERING UNIQUE KEY identifier (column list), ... CREATE TABLE ... ( ..., CONSTRAINT identifier UNIQUE CLUSTERING KEY identifier (column list), ... CREATE TABLE ... ( ..., CONSTRAINT identifier CLUSTERING UNIQUE KEY identifier (column list), ... CREATE TABLE ... (... column type CLUSTERING [UNIQUE] [KEY], ...) CREATE TABLE ... (... column type [UNIQUE] CLUSTERING [KEY], ...) ALTER TABLE ..., ADD CLUSTERING INDEX identifier (column list), ... ALTER TABLE ..., ADD UNIQUE CLUSTERING INDEX identifier (column list), ... ALTER TABLE ..., ADD CLUSTERING UNIQUE INDEX identifier (column list), ... ALTER TABLE ..., ADD CONSTRAINT identifier UNIQUE CLUSTERING INDEX identifier (column list), ... ALTER TABLE ..., ADD CONSTRAINT identifier CLUSTERING UNIQUE INDEX identifier (column list), ... CREATE CLUSTERING INDEX identifier ON ...
To define a secondary index as clustering, simply add the word
before the key definition. For example:
CREATE TABLE foo ( column_a INT, column_b INT, column_c INT, PRIMARY KEY index_a (column_a), CLUSTERING KEY index_b (column_b)) ENGINE = TokuDB;
In the previous example, the primary table is indexed on column_a. Additionally, there is a secondary clustering index (named index_b) sorted on column_b. Unlike non-clustered indexes, clustering indexes include all the columns of a table and can be used as covering indexes. For example, the following query will run very fast using the clustering index_b:
SELECT column_c FROM foo WHERE column_b BETWEEN 10 AND 100;
This index is sorted on column_b, making the
WHERE clause fast, and
includes column_c, which avoids lookups in the primary table to satisfy the
TokuDB makes clustering indexes feasible because of its excellent compression and very high indexing rates. For more information about using clustering indexes, see Introducing Multiple Clustering Indexes.
Hot Index Creation¶
TokuDB enables you to add indexes to an existing table and still perform inserts and queries on that table while the index is being created.
ONLINE keyword is not used. Instead, the value of the
tokudb_create_index_online client session variable is examined.
Hot index creation is invoked using the
CREATE INDEX command after setting
on as follows:
mysql> SET tokudb_create_index_online=on; Query OK, 0 rows affected (0.00 sec) mysql> CREATE INDEX index ON foo (field_name);
Alternatively, using the
ALTER TABLE command for creating an index will
create the index offline (with the table unavailable for inserts or queries),
regardless of the value of
tokudb_create_index_online. The only way
to hot create an index is to use the
CREATE INDEX command.
Hot creating an index will be slower than creating the index offline, and
progress depends how busy the mysqld server is with other tasks. Progress of the
index creation can be seen by using the
SHOW PROCESSLIST command (in another
client). Once the index creation completes, the new index will be used in future
If more than one hot
CREATE INDEX is issued for a particular table, the
indexes will be created serially. An index creation that is waiting for another
to complete will be shown as Locked in
SHOW PROCESSLIST. We recommend that
CREATE INDEX be allowed to complete before the next one is started.
Hot Column Add, Delete, Expand, and Rename (HCADER)¶
TokuDB enables you to add or delete columns in an existing table, expand
integer type columns in an
existing table, or rename an existing column in a table with little blocking of
other updates and queries. HCADER typically blocks other queries with a table
lock for no more than a few seconds. After that initial short-term table
locking, the system modifies each row (when adding, deleting, or expanding
columns) later, when the row is next brought into main memory from disk. For
column rename, all the work is done during the seconds of downtime. On-disk rows
need not be modified.
To get good performance from HCADER, observe the following guidelines:
The work of altering the table for column addition, deletion, or expansion is performed as subsequent operations touch parts of the Fractal Tree, both in the primary index and secondary indexes.
You can force the column addition, deletion, or expansion work to be performed all at once using the standard syntax of
OPTIMIZE TABLE X, when a column has been added to, deleted from, or expanded in table X. It is important to note that as of TokuDB version 7.1.0,
OPTIMIZE TABLEis also hot, so that a table supports updates and queries without blocking while an
OPTIMIZE TABLEis being performed. Also, a hot
OPTIMIZE TABLEdoes not rebuild the indexes, since TokuDB indexes do not age. Rather, they flush all background work, such as that induced by a hot column addition, deletion, or expansion.
Each hot column addition, deletion, or expansion operation must be performed individually (with its own SQL statement). If you want to add, delete, or expand multiple columns use multiple statements.
Avoid adding, deleting, or expanding a column at the same time as adding or dropping an index.
The time that the table lock is held can vary. The table-locking time for HCADER is dominated by the time it takes to flush dirty pages, because MySQL closes the table after altering it. If a checkpoint has happened recently, this operation is fast (on the order of seconds). However, if the table has many dirty pages, then the flushing stage can take on the order of minutes.
Avoid dropping a column that is part of an index. If a column to be dropped is part of an index, then dropping that column is slow. To drop a column that is part of an index, first drop the indexes that reference the column in one alter table statement, and then drop the column in another statement.
Hot column expansion operations are only supported to
integerdata types. Hot column expansion is not supported if the given column is part of the primary key or any secondary keys.
Rename only one column per statement. Renaming more than one column will revert to the standard MySQL blocking behavior. The proper syntax is as follows:
ALTER TABLE table CHANGE column_old column_new DATA_TYPE REQUIRED_NESS DEFAULT
Here’s an example of how that might look:
ALTER TABLE table CHANGE column_old column_new INT(10) NOT NULL;
Notice that all of the column attributes must be specified.
ALTER TABLE table
CHANGE column_old column_new; induces a slow, blocking column rename.
- Hot column rename does not support the following data types:
LONGBLOB. Renaming columns of these types will revert to the standard MySQL blocking behavior.
- Temporary tables cannot take advantage of HCADER. Temporary tables are typically small anyway, so altering them using the standard method is usually fast.
TokuDB offers different levels of compression, which trade off between the amount of CPU used and the compression achieved. Standard compression uses less CPU but generally compresses at a lower level, high compression uses more CPU and generally compresses at a higher level. We have seen compression up to 25x on customer data.
Compression in TokuDB occurs on background threads, which means that high compression need not slow down your database. Indeed, in some settings, we’ve seen higher overall database performance with high compression.
We recommend that users use standard compression on machines with six or fewer cores, and high compression on machines with more than six cores.
The ultimate choice depends on the particulars of how a database is used, and we recommend that users use the default settings unless they have profiled their system with high compression in place.
tokudb_row_format variables accept
the following values:
|TOKUDB_DEFAULT||Sets the compression to the default behavior. As of TokuDB 7.1.0, the default behavior is to compress using the zlib library. In the future this behavior may change.|
|TOKUDB_FAST||Sets the compression to use the
|TOKUDB_SMALL||Sets the compression to use the
|TOKUDB_ZLIB||Compress using the zlib library, which provides mid-range compression and CPU utilization.|
|TOKUDB_QUICKLZ||Compress using the quicklz library, which provides light compression and low CPU utilization.|
|TOKUDB_LZMA||Compress using the lzma library, which provides the highest compression and high CPU utilization.|
|TOKUDB_SNAPPY||This compression is using snappy library and aims for very high speeds and reasonable compression.|
|TOKUDB_UNCOMPRESSED||This setting turns off compression and is useful for tables with data that cannot be compressed.|
Read Free Replication¶
TokuDB replicas can be configured to perform significantly less read IO in order to apply changes from the source. By utilizing the power of Fractal Tree indexes:
- insert/update/delete operations can be configured to eliminate read-modify-write behavior and simply inject messages into the appropriate Fractal Tree indexes
- update/delete operations can be configured to eliminate the IO required for uniqueness checking
To enable Read Free Replication, the servers must be configured as follows:
- On the replication source:
- Enable row based replication: set
- Enable row based replication: set
- On the replication replica(s):
- The replica must be in read-only mode: set
- Disable unique checks: set
- Disable lookups (read-modify-write): set
- The replica must be in read-only mode: set
You can modify one or both behaviors on the replica(s).
As long as the source is using row based replication, this optimization is available on a TokuDB replica. This means that it’s available even if the source is using InnoDB or MyISAM tables, or running non-TokuDB binaries.
TokuDB Read Free Replication will not propagate
events reliably if TokuDB table is missing the primary key which will
eventually lead to data inconsistency on the replica.
Transactions and ACID-compliant Recovery¶
By default, TokuDB checkpoints all open tables regularly and logs all changes between checkpoints, so that after a power failure or system crash, TokuDB will restore all tables into their fully ACID-compliant state. That is, all committed transactions will be reflected in the tables, and any transaction not committed at the time of failure will be rolled back.
The default checkpoint period is every 60 seconds, and this specifies the time
from the beginning of one checkpoint to the beginning of the next. If a
checkpoint requires more than the defined checkpoint period to complete, the
next checkpoint begins immediately. It is also related to the frequency with
which log files are trimmed, as described below. The user can induce a
checkpoint at any time by issuing the
FLUSH LOGS command. When a database is
shut down normally it is also checkpointed and all open transactions are
aborted. The logs are trimmed at startup.
Managing Log Size¶
TokuDB keeps log files back to the most recent checkpoint. Whenever a log file reaches 100 MB, a new log file is started. Whenever there is a checkpoint, all log files older than the checkpoint are discarded. If the checkpoint period is set to be a very large number, logs will get trimmed less frequently. This value is set to 60 seconds by default.
TokuDB also keeps rollback logs for each open transaction. The size of each log is proportional to the amount of work done by its transaction and is stored compressed on disk. Rollback logs are trimmed when the associated transaction completes.
Recovery is fully automatic with TokuDB. TokuDB uses both the log files and rollback logs to recover from a crash. The time to recover from a crash is proportional to the combined size of the log files and uncompressed size of rollback logs. Thus, if there were no long-standing transactions open at the time of the most recent checkpoint, recovery will take less than a minute.
Disabling the Write Cache¶
When using any transaction-safe database, it is essential that you understand the write-caching characteristics of your hardware. TokuDB provides transaction safe (ACID compliant) data storage for MySQL. However, if the underlying operating system or hardware does not actually write data to disk when it says it did, the system can corrupt your database when the machine crashes. For example, TokuDB can not guarantee proper recovery if it is mounted on an NFS volume. It is always safe to disable the write cache, but you may be giving up some performance.
For most configurations you must disable the write cache on your disk drives. On ATA/SATA drives, the following command should disable the write cache:
$ hdparm -W0 /dev/hda
There are some cases when you can keep the write cache, for example:
Write caching can remain enabled when using XFS, but only if XFS reports that disk write barriers work. If you see one of the following messages in /var/log/messages, then you must disable the write cache:
Disabling barriers, not supported with external log device
Disabling barriers, not supported by the underlying device
Disabling barriers, trial barrier write failed
XFS write barriers appear to succeed for single disks (with no LVM), or for very recent kernels (such as that provided by Fedora 12). For more information, see the XFS FAQ.
In the following cases, you must disable the write cache:
- If you use the ext3 filesystem
- If you use LVM (although recent Linux kernels, such as Fedora 12, have fixed this problem)
- If you use Linux’s software RAID
- If you use a RAID controller with battery-backed-up memory. This may seem counter-intuitive. For more information, see the XFS FAQ
In summary, you should disable the write cache, unless you have a very specific reason not to do so.
TokuDB has a system for tracking progress of long running statements, thereby removing the need to define triggers to track statement execution, as follows:
- Bulk Load: When loading large tables using
LOAD DATA INFILEcommands, doing a
SHOW PROCESSLISTcommand in a separate client session shows progress. There are two progress stages. The first will state something like
Inserted about 1000000 rows. After all rows are processed like this, the next stage tracks progress by showing what fraction of the work is done (e.g.
Loading of data about 45% done)
- Adding Indexes: When adding indexes via
CREATE INDEX, the command
SHOW PROCESSLISTshows progress. When adding indexes via
CREATE INDEX, the command
SHOW PROCESSLISTwill include an estimation of the number of rows processed. Use this information to verify progress is being made. Similar to bulk loading, the first stage shows how many rows have been processed, and the second stage shows progress with a fraction.
- Commits and Aborts: When committing or aborting a transaction, the command
SHOW PROCESSLISTwill include an estimate of the transactional operations processed.
Migrating to TokuDB¶
To convert an existing table to use the TokuDB engine, run
TABLE... ENGINE=TokuDB. If you wish to load from a file, use
INFILE and not
mysqldump will be much slower. To
create a file that can be loaded with
LOAD DATA INFILE, refer to the
OUTFILE option of the SELECT Syntax.
Creating this file does not save the schema of your table, so you may want to create a copy of that as well.