11. Tuning for Production

In most cases, the default options should be left in place when running Percona TokuMX; however, it is a good idea to review some of the configuration parameters.

In this section, we describe considerations for tuning Percona TokuMX for production workloads, in terms of resource usage and the configuration choices that yield optimal efficiency.

11.1. Resource Usage

Percona TokuMX is unlike most databases in that, even when data is much larger than RAM, it can still be mostly CPU-bound on some workloads that would make other databases bound by disk seeks on rotating media, or disk bandwidth on SSDs. In general, TokuMX’s resource usage is just different from that of other databases. This section serves as a starting point for understanding TokuMX’s resource usage, for the four most commonly constrained resources: CPU, RAM, disk, and network.

11.1.1. CPU

CPU usage is attributed to:

11.1.1.1. Compression work

When data blocks are written to disk (for checkpoint or when evicted while dirty, in the presence of memory pressure), they are first compressed, and compression is primarily CPU work. Decompression is generally much cheaper than compression, and is hardly noticeable next to the disk I/O done just before it, even on most SSDs.

11.1.1.2. Message application

High-volume update workloads tend to stress this subsystem, which updates the data in leaves with the result of applying deferred operations above them in the tree. Small updates to large documents can disproportionately stress this subsystem; in that case, it can help to break a large document up into smaller documents and rely on TokuMX’s multi-document transactional semantics to read the same data consistently later.
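
For example, a hot counter can live in its own small document instead of inside a large parent document. A minimal mongo shell sketch using TokuMX’s multi-statement transaction helpers (the collection and field names here are hypothetical):

    // Keep the frequently updated counter out of the large "profiles" document.
    // The third argument to update() enables upsert.
    db.profile_stats.update({ owner: 42 }, { $inc: { views: 1 } }, true)

    // Later, read both documents with a consistent view, inside one transaction:
    db.beginTransaction()
    var profile = db.profiles.findOne({ owner: 42 })
    var stats   = db.profile_stats.findOne({ owner: 42 })
    db.commitTransaction()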

11.1.1.3. Miscellaneous tasks

Tree searches, serialization and deserialization, sorting for aggregation, and other things common to most databases.

11.1.1.4. Building indexes

Compressing intermediate files during bulk load of indexes (foreground ensureIndex operations as well as loading collections with mongorestore).
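
As a point of reference, a foreground build (the default) goes through the bulk loader and does this compression work, while a background build does not; the collection and key here are hypothetical:

    // Foreground build: uses the bulk loader (compresses intermediate files).
    db.events.ensureIndex({ ts: 1 })

    // Background build: bypasses the bulk loader.
    db.events.ensureIndex({ ts: 1 }, { background: true })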

11.1.2. RAM

RAM usage is configurable with the cacheSize parameter, and is attributed to:

11.1.2.1. Data cache

Caching uncompressed user data in tree node blocks is the main use of RAM.

Percona TokuMX doesn’t distinguish here between data and indexes; data is simply stored in a clustering primary key index.

11.1.2.2. Document-level locks

Document-level locks are stored in a locktree. The locktree’s size is dependent on the number of concurrent writers and the keys modified by those writers. Its maximum size is controlled by locktreeMaxMemory.

11.1.2.3. Building indexes

Each running bulk load reserves an additional 100MB of RAM by default (loaderMaxMemory).
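
Both of these limits (locktreeMaxMemory, from the previous subsection, and loaderMaxMemory) can be set at server startup. A hedged sketch, assuming these parameters accept size suffixes the way cacheSize does; the values and data path are illustrative, not recommendations:

    # Cap the locktree at 512MB and give each bulk load 200MB to work with.
    mongod --dbpath /data/db --locktreeMaxMemory 512M --loaderMaxMemory 200M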

11.1.2.4. Miscellaneous data

Transient data for cursors, transactions, intermediate aggregation results, thread stacks, etc. This is typically negligible except on extremely memory-constrained systems.

11.1.3. Disk

Disk usage is attributed to:

11.1.3.1. Queries

Reading data not in the cache to answer queries or to find existing documents for updates.

This can be sequential or random, depending on the queries, but the basic unit of I/O is 64KB before compression (readPageSize in Collection and Index Options). In most workloads, especially read-heavy workloads, this is the primary source of disk utilization.

11.1.3.2. Checkpoints and evictions

Writing dirty blocks for checkpoint, or when the cache is too full and something dirty needs to be evicted to make room for other data.

These are large writes—4MB before compression (pageSize in Collection and Index Options)—that tend to appear as sequential I/O.
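
Both block sizes can be overridden per collection at creation time (see Collection and Index Options). A hedged mongo shell sketch, with illustrative byte values and a hypothetical collection name:

    // 32KB read blocks and 8MB write blocks, instead of the 64KB/4MB defaults.
    // Values are in bytes; the option names come from Collection and Index Options.
    db.createCollection("events", { readPageSize: 32768, pageSize: 8388608 })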

11.1.3.3. Logging

Writing and fsyncing the transaction log (similar to the journal in basic MongoDB), for any write operation.

These are frequent, small, sequential writes that are eligible for merging, and frequent fsyncs that are eligible for group commit. The fsyncs can be easily absorbed by a battery-backed disk controller, since the I/O is sequential, and the log can be placed on a different device with the logDir server parameter.

11.1.3.4. Building indexes

Writing and reading intermediate files to and from disk during bulk load. This I/O is all sequential, and can be placed on a different device with the tmpDir server parameter.
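
Both the transaction log (logDir, from the previous subsection) and the loader’s temporary files (tmpDir) can be redirected at startup; a sketch with hypothetical paths:

    # Keep the transaction log and the loader's temporary files off the data device.
    mongod --dbpath /data/db --logDir /var/lib/tokumx-log --tmpDir /var/lib/tokumx-tmp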

11.1.4. Network

Network usage is almost identical to basic MongoDB, and is attributed to:

11.1.4.1. Replication

Replicating the oplog to secondaries in a replica set.

Some oplog entries are larger than in basic MongoDB, to support faster application on secondaries; these will cost more bandwidth.

11.1.4.2. Sharding

Chunk migrations to other shards in a sharded cluster.

11.1.4.3. Clients

Sending and receiving data to and from applications and sharding routers.

11.2. Memory Allocation

By default, Percona TokuMX allocates 50% of the installed RAM for its own cache (cacheSize).

While this is optimal in most situations, there are cases where it may lead to memory over-allocation. If the system tries to allocate more memory than is available, the machine will begin swapping and run much slower than normal.

The two most frequent cases in which it is necessary to set cacheSize to a value other than the default are:

  • Using Direct I/O:

    The directio parameter enables Direct I/O for all data and index filesystem operations, which bypasses the kernel’s page cache. This removes the need to leave extra space available for the page cache, so we suggest increasing the cacheSize parameter to 80% or more of main memory. Using the directio flag and increasing cacheSize often improves the performance of TokuMX; see the sketch after this list.

  • Running other memory heavy processes on the same server as TokuMX:

    In many cases, the database process needs to share the system with other server processes, like additional database instances, an HTTP server, an application server, an email server, monitoring systems, and others. In order to properly configure TokuMX’s memory consumption, it’s important to understand how much free memory will be left and assign a sensible value. There is no fixed rule, but a conservative choice would be 50% of available RAM while all the other processes are running. If the result is under 2GB, you should consider moving some of the other processes to a different system or using a dedicated database server. As above, if you are using Direct I/O, this could instead be 80% or more of available RAM while other processes are running.
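
As a concrete sketch, on a dedicated machine with 64GB of RAM, the two I/O modes might look like this (values and path are illustrative):

    # Default buffered I/O: leave roughly half of RAM to the kernel page cache.
    mongod --dbpath /data/db --cacheSize 32G

    # Direct I/O: the page cache is bypassed, so the cache can take ~80% of RAM.
    mongod --dbpath /data/db --directio --cacheSize 51G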

Note

cacheSize needs to be set before starting the server and cannot be changed while the server is running.

11.3. Bulk Loader

Percona TokuMX includes a bulk loader that increases the throughput of initial loads into empty or non-existent collections.

The bulk loader is used automatically in the following scenarios:

  • All foreground index builds.

  • mongorestore and mongoimport operations that create collections (rather than insert into existing collections).

    For these tools, either the target collection must not exist beforehand, or the --drop option must be passed to the tool.

    In addition, for mongorestore, the bulk loader is disabled if the --w option is specified with a value greater than 1 (see the example after this list).
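
For example, with a hypothetical dump directory:

    # --drop removes each target collection first, so the bulk loader is used.
    mongorestore --drop /backup/dump

    # A write concern greater than 1 disables the bulk loader.
    mongorestore --drop --w 2 /backup/dump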

Note

When loading large data sets, try to always load on a standalone server and then convert that server to a replica set after the load is finished (using a cold or hot backup to seed secondaries). This technique will avoid filling the oplog with the data from the load and eliminate the initial replication lag caused by these large loading activities.
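
A sketch of that sequence, with hypothetical paths and hostnames:

    # 1. Load into a standalone server (started without --replSet, so no oplog).
    mongod --dbpath /data/db
    mongorestore --drop /backup/dump

    # 2. Restart mongod with replication enabled, then initiate the set.
    mongod --dbpath /data/db --replSet rs0
    mongo --eval 'rs.initiate()'

    # 3. Seed each secondary from a backup of this node, then add it.
    mongo --eval 'rs.add("host2:27017")'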