6. Server Status

In Percona TokuMX, there are many changes to the information provided by db.serverStatus() to help users monitor running systems and diagnose issues.

6.1. New Sections

Most of the new information in db.serverStatus() for Percona TokuMX is in a new section named ft; the Fast Updates metrics described below are the exception and appear under metrics.

Unless otherwise noted, all counter-type metrics are reset to zero on server startup.

Example:

To see all of the new information below, use:

> db.serverStatus().ft

Each new subsection’s information is described below.

6.1.1. Fsyncs

The Percona TokuMX transaction log normally receives many small sequential writes, and periodic fsyncs. Percona TokuMX tracks the number of fsyncs and the time spent performing them (across all files, though the transaction log is by far the most frequently synced file).

Example:

> db.serverStatus().ft.fsync
{
    "count" : 772,
    "time" : 68707786
}
variable fsync.count

The total number of times the database has flushed the operating system’s file buffers to disk.

variable fsync.time

The total time, in microseconds, used to fsync to disk.
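
As a quick sanity check, the average fsync latency can be derived from these two counters in the shell (with the sample values above, roughly 89,000 microseconds, or about 89ms per fsync):

> var f = db.serverStatus().ft.fsync
> f.time / f.count    // average microseconds per fsync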

6.1.2. Transaction Log

The Percona TokuMX transaction log normally receives many small sequential writes, and periodic fsyncs. Percona TokuMX tracks the bytes written and number of writes to the log, as well as the time spent doing those writes.

Example:

> db.serverStatus().ft.log
{
    "count" : 2496,
    "time" : 0.17602924388888888,
    "bytes" : 254716357
}
variable log.count

The number of individual log writes since the last server startup.

variable log.time

The total time, in seconds, used to perform log writes since the last server startup.

variable log.bytes

The number of bytes the logger has written to disk since the last server startup.
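
The average log write size can be estimated the same way (with the sample values above, roughly 102,000 bytes per write):

> var l = db.serverStatus().ft.log
> l.bytes / l.count    // average bytes per log write
> l.time / l.count     // average seconds spent per log write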

6.1.3. Cachetable

Unlike basic MongoDB, Percona TokuMX manages the memory it uses directly. You should not rely on the mem section; instead, the Cachetable section describes TokuMX’s memory usage in detail.

In the following, it is important to note the distinction between “full” and “partial” cachetable operations. Index tree nodes in Percona TokuMX have two parameters: pageSize and readPageSize. All writes are done in pageSize chunks, but reads from disk can be done in readPageSize chunks, which are smaller (individual segments of the whole node). “Full” evictions and misses correspond to evicting or paging in the entire node, while “partial” evictions and misses correspond to evicting or paging in just one of the node’s segments; partial operations are generally preferred as they are less expensive.

Percona TokuMX tracks the work done to evict old data from the cachetable when it reaches its memory limit (cacheSize), as “evictions”. Clean evictions are usually cheap, but dirty evictions are much more expensive, as they require that data be written out before it can be released. If a page is clean, it can be partially evicted, which evicts only the less recently used parts of the page and does not cause a full miss later on. Furthermore, Percona TokuMX makes a distinction between leaves and nonleaves (internal tree nodes) when presenting eviction data. If too many nonleaf nodes are getting evicted, this is a sign that the workload can benefit from more memory.

The cachetable.evictions subdocument breaks down the evictions into these categories: partial vs. full misses, and nonleaf vs. leaf nodes. TokuMX tracks both the number of evictions of each type, and the amount of space reclaimed by those evictions. In addition, full evictions also track the time taken to serialize, compress, and write those nodes to disk.

Example:

> db.serverStatus().ft.cachetable
{
    "size" : {
        "current" : 1516207470,
        "writing" : 0,
        "limit" : 1771674009
    },
    "miss" : {
        "count" : 65470,
        "time" : 8.309003743333333,
        "full" : {
            "count" : 1018,
            "time" : 0.0017150680555555555
        },
        "partial" : {
            "count" : 64452,
            "time" : 8.307288675277778
        }
    },
    "evictions" : {
        "partial" : {
            "nonleaf" : {
                "clean" : {
                    "count" : 1280,
                    "bytes" : 334317512
                }
            },
            "leaf" : {
                "clean" : {
                    "count" : 76871,
                    "bytes" : 6956659264
                }
            }
        },
        "full" : {
            "nonleaf" : {
                "clean" : {
                    "count" : 6,
                    "bytes" : 15362782
                },
                "dirty" : {
                    "count" : 0,
                    "bytes" : 0,
                    "time" : 0
                }
            },
            "leaf" : {
                "clean" : {
                    "count" : 452,
                    "bytes" : 387796936
                },
                "dirty" : {
                    "count" : 0,
                    "bytes" : 0,
                    "time" : 0
                }
            }
        }
    }
}
variable cachetable.size.current

Total size, in bytes, of the uncompressed data currently held in the database’s internal cache.

variable cachetable.size.writing

Total size, in bytes, of nodes that are currently queued up to be written to disk for eviction.

variable cachetable.size.limit

Total size, in bytes, of the uncompressed data that will fit in TokuMX’s internal cache (cacheSize).

variable cachetable.miss.count

This is a count of how many times the application was unable to access your data in the internal cache (i.e., a cache miss). This metric is similar to MongoDB’s btree misses and page faults.

variable cachetable.miss.time

This is the total time, in seconds, that the database has had to wait for a disk read to complete for a cache miss.

variable cachetable.miss.full.{count,time}

The “count” and “time” breakdown for “full misses”, which are more expensive than partial misses.

variable cachetable.miss.partial.{count,time}

The “count” and “time” breakdown for “partial misses”, which are less expensive than full misses.

variable cachetable.evictions.partial.{nonleaf,leaf}.clean.{count,bytes}

Number of partial evictions (of nonleaf and leaf nodes), and their total size in bytes.

variable cachetable.evictions.full.{nonleaf,leaf}.clean.{count,bytes}

Number of clean, full evictions (of nonleaf and leaf nodes), and their total size in bytes.

variable cachetable.evictions.full.{nonleaf,leaf}.dirty.{count,bytes,time}

Number of dirty, full evictions (of nonleaf and leaf nodes), and their total size in bytes, as well as the time spent serializing, compressing, and writing those nodes to disk.
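
A few derived numbers are often more useful than the raw counters when assessing cachetable health; for example, using only the fields documented above:

> var c = db.serverStatus().ft.cachetable
> c.size.current / c.size.limit    // cache utilization (about 0.86 in the sample above)
> c.miss.time / c.miss.count       // average seconds per cache miss
> var n = c.evictions.full.nonleaf
> n.clean.count + n.dirty.count    // full nonleaf evictions; a large, growing value suggests more memory would help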

6.1.4. Checkpoint

TokuMX’s main data structure, the Fractal Tree, usually writes most user data to disk during checkpoints, which are triggered once every 60 seconds by default (checkpointPeriod, similar to --syncdelay or storage.syncPeriodSecs in MongoDB 2.6.0).

During a checkpoint, all tree data that has changed is written durably to disk, and this allows the system to trim old data from the tail of the transaction log. The system reports timing information about checkpoints as well as the number of bytes written, in the Checkpoint section of db.serverStatus().

A checkpoint is triggered 60 seconds (checkpointPeriod) after the previous checkpoint was triggered, or immediately after the previous checkpoint completes if that checkpoint took longer than 60 seconds. For example, if every checkpoint takes 6 seconds, there should be 54 seconds between checkpoints, and checkpoint.time should be about 10% of the total system uptime. Extremely long checkpoints can cause a system to back up over time; if checkpoints are taking too long it may mean that your system needs more I/O bandwidth for node writes and/or more CPU power for compression. Disk writes for checkpoints are tracked in checkpoint.write.
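
Since checkpoint.time is reported in seconds, the fraction of uptime spent checkpointing can be checked directly in the shell (uptime is the standard db.serverStatus() field, also in seconds):

> var s = db.serverStatus()
> s.ft.checkpoint.time / s.uptime    // about 0.1 in the 6-seconds-per-checkpoint scenario above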

Example:

> db.serverStatus().ft.checkpoint
{
    "count" : 13,
    "time" : 232,
    "lastBegin" : ISODate("2014-06-17T20:51:13Z"),
    "lastComplete" : {
        "begin" : ISODate("2014-06-17T20:51:13Z"),
        "end" : ISODate("2014-06-17T20:51:32Z"),
        "time" : 19
    },
    "begin" : {
        "time" : 1432
    },
    "write" : {
        "nonleaf" : {
            "count" : 682,
            "time" : 0.06575672722222221,
            "bytes" : {
                "uncompressed" : 880481988,
                "compressed" : 300794368
            }
        },
        "leaf" : {
            "count" : 942,
            "time" : 0.2565985083333333,
            "bytes" : {
                "uncompressed" : 2014680388,
                "compressed" : 857306624
            }
        }
    }
}
variable checkpoint.count

Number of completed checkpoints.

variable checkpoint.time

Time (in seconds) spent doing checkpoints.

variable checkpoint.lastBegin

The begin timestamp of the most recently started (possibly in progress) checkpoint.

variable checkpoint.lastComplete.begin

The begin timestamp of the most recently completed checkpoint.

variable checkpoint.lastComplete.end

The end timestamp of the most recently completed checkpoint.

variable checkpoint.lastComplete.time

The time spent, in seconds, by the most recently completed checkpoint.

variable checkpoint.write.{nonleaf,leaf}.count

Number of nonleaf and leaf nodes written to disk during checkpoints.

variable checkpoint.write.{nonleaf,leaf}.time

Time spent, in seconds, writing nonleaf and leaf nodes to disk during checkpoints.

variable checkpoint.write.{nonleaf,leaf}.bytes.uncompressed

Total size of nonleaf and leaf nodes written to disk during checkpoints, before compression.

variable checkpoint.write.{nonleaf,leaf}.bytes.compressed

Total size of nonleaf and leaf nodes written to disk during checkpoints, after compression.

6.1.5. Serialize Time

For writes, the primary consumer of CPU time is usually compression. For in-memory queries it is usually tree searches, but for larger-than-RAM queries, decompression and deserialization can begin to impact performance. Serialization/deserialization and compression/decompression times are reported in Serialize Time.

Example:

> db.serverStatus().ft.serializeTime
{
    "nonleaf" : {
        "serialize" : 1.650712856111111,
        "compress" : 67.41081527,
        "decompress" : 0.797666756111111,
        "deserialize" : 4.092646362777778
    },
    "leaf" : {
        "serialize" : 2.524351571111111,
        "compress" : 176.64170658944442,
        "decompress" : 8.835384991666666,
        "deserialize" : 1.52323294
    }
}
variable serializeTime.{nonleaf,leaf}.serialize

Total time, in seconds, spent serializing nonleaf and leaf nodes before writing them to disk (for checkpoint or when evicted while dirty).

variable serializeTime.{nonleaf,leaf}.compress

Total time, in seconds, spent compressing nonleaf and leaf nodes before writing them to disk (for checkpoint or when evicted while dirty).

variable serializeTime.{nonleaf,leaf}.decompress

Total time, in seconds, spent decompressing nonleaf and leaf nodes and their partitions after reading them off disk.

variable serializeTime.{nonleaf,leaf}.deserialize

Total time, in seconds, spent deserializing nonleaf and leaf nodes and their partitions after reading them off disk.
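
To see where this CPU time is going, the totals can be grouped into the write path (serialize and compress) and the read path (decompress and deserialize); with the sample values above this works out to roughly 248 seconds versus 15 seconds:

> var t = db.serverStatus().ft.serializeTime
> t.leaf.serialize + t.leaf.compress + t.nonleaf.serialize + t.nonleaf.compress              // write path
> t.leaf.decompress + t.leaf.deserialize + t.nonleaf.decompress + t.nonleaf.deserialize      // read path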

6.1.6. Locktree

Percona TokuMX uses a locktree to implement Document-level Locks for SERIALIZABLE transactions. The locktree’s size is limited by locktreeMaxMemory, and some statistics are reported in Locktree.

Example:

> db.serverStatus().ft.locktree
{
    "size" : {
        "current" : 2660,
        "limit" : 161061273
    }
}
variable locktree.size.current

Total size, in bytes, of memory the locktree is currently using.

variable locktree.size.limit

Maximum number of bytes that the locktree is allowed to use.
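
The fraction of locktreeMaxMemory currently in use can be computed from these two values (negligible in the sample above, about 0.0000165):

> var l = db.serverStatus().ft.locktree
> l.size.current / l.size.limit    // fraction of locktreeMaxMemory in use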

6.1.7. Compression Ratio

One of TokuMX’s biggest features is compression. For a quick estimate of compression, Percona TokuMX tracks all node writes that have been done while the server has been up, and reports the effective compression ratio for those writes in Compression Ratio.

Example:

> db.serverStatus().ft.compressionRatio
{
    "leaf" : 9.303496342310673,
    "nonleaf" : 10.9380145184542696,
    "overall" : 9.5766049110996824
}
variable compressionRatio.{leaf,nonleaf,overall}

For every node that is written out for checkpoint or eviction, Percona TokuMX records the size of the node before and after compression. The reported values in Compression Ratio are the ratios of those sizes, segregated into just leaves, just nonleaves, and the total ratio for all node types.

This may be different from the actual compression ratio, because it only tracks recently written nodes. However, it can be more accurate than what’s provided in db.collection.stats(), because that uses estimated values for the uncompressed size, whereas db.serverStatus().ft.compressionRatio uses exact values.
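
Because the ratio is simply the uncompressed size divided by the compressed size of the written nodes, a similar (but narrower) figure can be computed by hand from the checkpoint.write counters described earlier. It covers only checkpoint writes (not evictions) and a different sample window, so it will not match compressionRatio exactly:

> var w = db.serverStatus().ft.checkpoint.write
> (w.leaf.bytes.uncompressed + w.nonleaf.bytes.uncompressed) / (w.leaf.bytes.compressed + w.nonleaf.bytes.compressed)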

6.1.8. Fast Updates

Percona TokuMX tracks information about the possible usage of Fast Updates so users can learn whether fast updates can benefit or are benefiting their application.

Example:

> db.serverStatus().metrics.fastUpdates
{
    "errors" : NumberLong(0),
    "eligible" : {
        "primaryKey" : NumberLong(1000),
        "secondaryKey" : NumberLong(100)
    },
    "performed" : {
        "primaryKey" : NumberLong(0),
        "secondaryKey" : NumberLong(0),
        "slowOnSecondary" : NumberLong(0)
    }
}
variable metrics.fastUpdates.errors

Number of fast updates performed that were later found to have an error and therefore resulted in a no-op. Note that a single update can be counted multiple times, so this number is not entirely accurate. An example that causes this number to increase is trying to increment a text field with a fast update.

variable metrics.fastUpdates.eligible.primaryKey

Number of updates performed with fast updates disabled where the query uniquely identified the primary key and the update was eligible to be fast. If fast updates were enabled, these updates would have occurred with no reads, resulting in possibly large performance gains. If this value represents a large portion of your updates, you may want to consider enabling fast updates for better performance.

variable metrics.fastUpdates.eligible.secondaryKey

Number of updates performed with fast updates disabled where a query was required to find the primary key, but the update was eligible to be fast without fetching the entire document. If fast updates were enabled, these updates may have occurred with fewer reads, possibly resulting in performance gains. If this value represents a large portion of your updates, you may want to consider enabling fast updates for better performance (see the example at the end of this section). Some applications will see a performance improvement and others will not, so experiment with fast updates enabled to see how your application is affected.

variable metrics.fastUpdates.performed.primaryKey

Number of updates performed with fast updates enabled where the query uniquely identified the primary key. These updates occurred with no reads, resulting in possibly large performance gains.

variable metrics.fastUpdates.performed.secondaryKey

Number of updates performed with fast updates enabled where a query was required to find the primary key, but fetching the entire document was not required. These updates may have occurred with fewer reads, possibly resulting in performance gains.

variable metrics.fastUpdates.performed.slowOnSecondary

Number of updates performed on a secondary that required a query to find the primary key, before applying the update. Generally, this should not happen unless there is a mismatch in indexes between the primary and the secondary. If these updates are happening, the user ought to investigate why, because this may lead to secondary lag.
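
As a rough way to gauge whether fast updates are worth enabling, the eligible counters can be compared against the total number of update operations; opcounters.update is the standard db.serverStatus() counter, so this is only an approximation:

> var s = db.serverStatus()
> var f = s.metrics.fastUpdates
> (f.eligible.primaryKey + f.eligible.secondaryKey) / s.opcounters.update    // fraction of updates that could have been fast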

6.1.9. Alerts

Percona TokuMX also tracks some anomalous events, which will appear in the Alerts section if any such events are detected.

Example:

> db.serverStatus().ft.alerts
{
    "panic code" : NumberLong(0),
    "panic string" : "",
    "filesystem status" : "OK",
    "locktreeRequestsPending" : 0,
    "checkpointFailures" : 0,
    "longWaitEvents" : {
        "logBufferWait" : 12,
        "fsync" : {
            "count" : 1,
            "time" : 1334887
        },
        "cachePressure" : {
            "count" : 0,
            "time" : 0
        },
        "checkpointBegin" : {
            "count" : 0,
            "time" : 0
        },
        "locktreeWait" : {
            "count" : 0,
            "time" : 0
        },
        "locktreeWaitEscalation" : {
            "count" : 0,
            "time" : 0
        }
    }
}
variable alerts["panic code"] / alerts["panic string"]

Integer error code and error string if the engine is panicked. Usually indicates an impending crash.

variable alerts["filesystem status"]

Current status of the filesystem’s free space, with respect to fsRedzone.

variable alerts.locktreeRequestsPending

The number of requests for Document-level Locks in the locktree that are waiting for other requests to release their locks.

During normal operation of most workloads, there should be no requests pending. This is a good field to monitor, because if it is large, you may be doing large multi-document update or findAndModify operations, which tend to cause Document-level Lock Conflicts and drastically reduce the potential concurrency of write operations. A simple way to poll this counter from the shell is sketched at the end of this section.

variable alerts.checkpointFailures

Number of checkpoints that have failed for any reason.

variable alerts.longWaitEvents.logBufferWait

Number of times a writing client had to wait more than 100ms for access to the log buffer.

variable alerts.longWaitEvents.fsync.{count,time}

Same information as fsync.count and fsync.time, but only for fsync operations that took more than 1 second.

variable alerts.longWaitEvents.cachePressure.{count,time}

Number of times and the time spent (in microseconds) that a thread had to wait more than 1 second for evictions to create space in the cachetable for it to page in data it needed.

variable alerts.longWaitEvents.locktreeWait.{count,time}

Number of times and the time spent (in microseconds) that a thread had to wait more than 1 second to acquire a document-level lock in the locktree.

variable alerts.longWaitEvents.locktreeWaitEscalation.{count,time}

Number of times and the time spent (in microseconds) that a thread had to wait more than 1 second to acquire a document-level lock because the locktree was at the memory limit (locktreeMaxMemory) and needed to run escalation.
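
As mentioned above, locktreeRequestsPending is a useful counter to watch. A minimal sketch of polling it from the shell is shown below; any monitoring system that periodically runs db.serverStatus() works just as well:

> while (true) {
...     var alerts = db.serverStatus().ft.alerts;
...     print(new Date() + "  locktree requests pending: " + alerts.locktreeRequestsPending);
...     sleep(5000);    // mongo shell helper; interval in milliseconds
... }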

6.2. Deprecated Entries

The following entries in db.serverStatus() do not make sense for TokuMX’s storage engine and have been removed.

  • backgroundFlushing
  • dur
  • indexCounters
  • recordStats
  • mem.mapped
  • mem.mappedWithJournal
  • record
  • repl.buffer.maxSizeBytes
  • preload