
Fast Updates : Coming Soon in TokuMX v2.0

 | September 29, 2014 |  Posted In: Tokutek, TokuView


Coming in TokuMX v2.0 is a feature we’re calling “Fast Updates”. Fast Updates permit certain update operations to bypass the read-modify-write behavior that most databases require (including MongoDB and the current release of TokuMX). In this post I’ll cover how Fast Updates work using a simple schema and workload, then measure the performance gains with a simple benchmark.

Fast Updates – Overview

In MongoDB and TokuMX v1.5, updates using $ operators, like $inc, force the server to read the document as part of the update operation. Reading the document allows the server to get the before image of the document to construct the updated version. It also allows the server to report the number of documents updated back to the caller and to check if there was an issue with the update (like reporting an error if the caller tried to $inc a character field).
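To make the read-modify-write pattern concrete, here is a minimal sketch in plain JavaScript. It is a toy in-memory model, not server code: the `Map`-based collection, the `readModifyWriteInc` function, and the `nModified` field are all names I made up for illustration.

```javascript
// Toy in-memory model of a read-modify-write $inc (illustrative, not TokuMX code).
const collection = new Map([
  ["p1", { _id: "p1", name: "Tim Callaghan", gold: 10005 }],
]);

function readModifyWriteInc(coll, id, field, amount) {
  const doc = coll.get(id); // the read: fetch the before image
  if (doc === undefined) {
    return { nModified: 0 }; // exact count: nothing matched
  }
  // A missing field is treated as 0, any other non-number is an error.
  const current = doc[field] === undefined ? 0 : doc[field];
  if (typeof current !== "number") {
    // Only possible because we read the document first.
    throw new TypeError(`cannot $inc non-numeric field '${field}'`);
  }
  coll.set(id, { ...doc, [field]: current + amount }); // the write
  return { nModified: 1 }; // exact count: one document changed
}

const okResult = readModifyWriteInc(collection, "p1", "gold", 5000);
const missingResult = readModifyWriteInc(collection, "nobody", "gold", 5000);
let incError = null;
try {
  readModifyWriteInc(collection, "p1", "name", 1); // $inc on a string field
} catch (e) {
  incError = e;
}
console.log(okResult, missingResult, incError.message);
```

Both the exact count and the type error depend on the read, which is the step Fast Updates skip.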

TokuMX’s Fractal Tree indexes have always been capable of handling complex update operations, and in TokuMX v2.0 we are adding support for two types of fast updates: unindexed point updates by primary key and unindexed updates by secondary key.

Non-indexed Point Updates by Primary Key

This is an update where the primary key is specified in the operation, so just a single document is updated. The update must not attempt to change any indexed fields (otherwise we’d need to read the before image of those indexed values as part of the update). Lastly, all updates must be $ operations.

Non-indexed Updates by Secondary Key

This is an update where a secondary key is used to determine which documents are updated. During the scan of the secondary index TokuMX collects the primary key values of the documents to update, at which point the unindexed point update by primary key is used for each one. All conditions of that optimization must be satisfied.
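The two-step flow (scan the secondary index, then issue one point update per primary key) can be sketched as follows. This is only a model under my own assumptions: `teamIndex`, `pointUpdateByPrimaryKey`, and `updateBySecondaryKey` are hypothetical names, and the secondary index is represented as a simple `Map` from key to primary keys.

```javascript
// Hypothetical secondary index: team_id -> primary keys of matching players.
const teamIndex = new Map([
  [75, ["p1", "p2"]],
  [76, ["p3"]],
]);

// Record of the point updates issued, so the flow is easy to inspect.
const issuedPointUpdates = [];

// Stand-in for the unindexed point update by primary key.
function pointUpdateByPrimaryKey(id, update) {
  issuedPointUpdates.push({ _id: id, update });
}

function updateBySecondaryKey(index, key, update) {
  // Step 1: scan the secondary index and collect primary keys.
  const primaryKeys = index.get(key) || [];
  // Step 2: one point update per primary key, with no document reads.
  for (const id of primaryKeys) {
    pointUpdateByPrimaryKey(id, update);
  }
  // The count comes from the index scan, not from the documents themselves.
  return { n: primaryKeys.length };
}

const teamResult = updateBySecondaryKey(teamIndex, 75, { $inc: { gold: 5000 } });
const emptyResult = updateBySecondaryKey(teamIndex, 99, { $inc: { gold: 5000 } });
console.log(teamResult, emptyResult, issuedPointUpdates);
```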

The Fine Print

  • As mentioned above, your update cannot modify any indexed columns as there is no getting around reading the original document for that scenario.
  • Also, you are giving up the ability to know how many documents were updated: we return the maximum number of documents your update might have affected, and if you $inc a character field the update will fail silently. Since this is a behavioral change, we added a new dynamic server parameter to enable the feature (it defaults to off).
  • Also, we maintain db.serverStatus() counters so you can check your server to see how many updates were eligible for fast updates versus how many actually took advantage of them.
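The trade-offs above are easier to see in a toy model. The sketch below queues each $inc without reading the document and applies it later, when the document is finally read. The `BlindStore` class, its pending queue, and the `n` field are assumptions of mine for illustration; this models the visible behavior only and is not how Fractal Tree indexes are implemented.

```javascript
// Toy model of a "blind" update: the write is queued without reading the document.
class BlindStore {
  constructor() {
    this.docs = new Map();    // stored documents by primary key
    this.pending = new Map(); // queued $inc messages by primary key
    this.reads = 0;           // number of document reads performed
  }

  insert(doc) {
    this.docs.set(doc._id, doc);
  }

  // Fast path: no read, so existence and field type cannot be checked here.
  fastInc(id, field, amount) {
    const queue = this.pending.get(id) || [];
    queue.push({ field, amount });
    this.pending.set(id, queue);
    return { n: 1 }; // an upper bound, not a confirmed count
  }

  // Queued messages are applied when the document is eventually read.
  get(id) {
    this.reads += 1;
    const doc = this.docs.get(id);
    const queue = this.pending.get(id) || [];
    this.pending.delete(id);
    if (doc === undefined) {
      return undefined; // messages for a missing document are dropped
    }
    for (const { field, amount } of queue) {
      const current = doc[field] === undefined ? 0 : doc[field];
      if (typeof current === "number") {
        doc[field] = current + amount;
      }
      // A non-numeric field is skipped: the silent failure described above.
    }
    return doc;
  }
}

const store = new BlindStore();
store.insert({ _id: "p1", name: "Tim Callaghan", gold: 10005 });
const goldResult = store.fastInc("p1", "gold", 5000);
const nameResult = store.fastInc("p1", "name", 1);      // $inc on a string
const ghostResult = store.fastInc("nobody", "gold", 1); // no such player
const readsDuringUpdates = store.reads;
const player = store.get("p1");
console.log(goldResult, nameResult, ghostResult, readsDuringUpdates, player);
```

All three updates report `n: 1` and none of them reads a document, yet only the gold update actually changes anything.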

Example Schema and workload

Imagine a schema for an online role-playing game. The main player collection might be modeled as follows:

{
_id: ,
name: "Tim Callaghan",
hit_points: 55,
gold: 10005,
team_id: 75,
experience: 77123
}

We would likely put an index on team_id so we can efficiently locate all players on a particular team, plus another index on experience so we can efficiently create a top-100 list of players at any time. The following 4 update operations are possible:

Decrease a single player’s hit_points

db.player.update({_id: ObjectId("54299e290d9a202bab65d60b")}, {$inc: {hit_points:-7}})

You’d probably not want this update to be done on the fast path, as knowing the player’s hit points after the $inc operation is critical to knowing their current health level.

Increase a player’s gold

db.player.update({_id: ObjectId("54299e290d9a202bab65d60b")}, {$inc: {gold:5000}})

Fast path for the win. You don’t need to know how much gold a player has until they get to the virtual store, which might not be for quite some time.

Increase a team’s gold

db.player.update({team_id: 75}, {$inc: {gold:5000}}, {multi:true})

Fast path for the bigger win (saving 1 look-up for each player on team_id 75). You don’t need to know how much gold the players have until they get to the virtual store, which might not be for quite some time.

Increase a player’s experience

db.player.update({_id: ObjectId("54299e290d9a202bab65d60b")}, {$inc: {experience:3000}})

Since we are maintaining an index on experience, this update cannot be optimized and requires a look-up to get the existing value of the player’s experience to maintain the secondary index.
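The four cases can be checked mechanically against the rules described earlier. Below is a minimal sketch of such a check in plain JavaScript. The `classifyUpdate` function is hypothetical and simplified (it only recognizes single-field equality queries and compares top-level field names); it is not the logic TokuMX uses. `ObjectId` is a stand-in so the snippet runs outside the mongo shell.

```javascript
// Stand-in so the example runs outside the mongo shell.
const ObjectId = (hex) => hex;

// Hypothetical eligibility check based on the rules in this post.
function classifyUpdate(query, update, indexedFields) {
  const operators = Object.keys(update);
  // Rule: all updates must be $ operations.
  if (operators.length === 0 || !operators.every((op) => op.startsWith("$"))) {
    return "slow: not all $ operations";
  }
  // Rule: no indexed field may be modified ("a.b" counts as touching "a").
  const modified = operators.flatMap((op) => Object.keys(update[op]));
  if (modified.some((f) => indexedFields.includes(f.split(".")[0]))) {
    return "slow: modifies an indexed field";
  }
  // Rule: the query must go through the primary key or a secondary key.
  const queryFields = Object.keys(query);
  if (queryFields.length === 1 && queryFields[0] === "_id") {
    return "fast: primary key";
  }
  if (queryFields.length === 1 && indexedFields.includes(queryFields[0])) {
    return "fast: secondary key";
  }
  return "slow: query is not a single key lookup";
}

const indexed = ["team_id", "experience"];
const id = ObjectId("54299e290d9a202bab65d60b");
const results = [
  classifyUpdate({ _id: id }, { $inc: { hit_points: -7 } }, indexed),
  classifyUpdate({ _id: id }, { $inc: { gold: 5000 } }, indexed),
  classifyUpdate({ team_id: 75 }, { $inc: { gold: 5000 } }, indexed),
  classifyUpdate({ _id: id }, { $inc: { experience: 3000 } }, indexed),
];
console.log(results);
```

Note that the hit_points update is classified as eligible. Whether you want it on the fast path is an application decision, not something a check like this can answer.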

Fast Updates: The Benchmarks

The benchmark was performed on a Dell R710 server (2 x Xeon E540, 48GB RAM, 8 x 10K SAS in RAID 10). TokuMX was set to use zlib compression with a 4GB cache and directIO. The workload was Sysbench using 16 tables, 1 million documents per table (data was far larger than cache as I added a 4KB field to every document) and 64 concurrent threads. The workload was run for 10 minutes with the optimization disabled, then for another 10 minutes with the optimization enabled.
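As a rough check on "far larger than cache", the arithmetic can be sketched as below. It counts only the added 4KB field, assumes 4KB means 4,096 bytes and 4GB means 4 GiB, and ignores zlib compression and the other Sysbench fields, so treat the result as an uncompressed estimate rather than a measured size.

```javascript
// Rough, uncompressed size of the padding field alone (assumptions noted above).
const tables = 16;
const docsPerTable = 1000000;
const paddingBytes = 4096;             // assuming 4KB = 4,096 bytes
const cacheBytes = 4 * 1024 ** 3;      // assuming 4GB = 4 GiB

const totalPaddingBytes = tables * docsPerTable * paddingBytes;
const totalPaddingGiB = totalPaddingBytes / 1024 ** 3;
const dataToCacheRatio = totalPaddingBytes / cacheBytes;

console.log(totalPaddingGiB.toFixed(1) + " GiB of padding");
console.log(dataToCacheRatio.toFixed(1) + "x the cache size");
```

Under these assumptions the padding alone comes to roughly 61 GiB, about 15 times the cache.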

Fast Updates By Primary Key

Using the existing Sysbench schema and benchmark, I removed all operations except for a single unindexed update by primary key. The benchmark begins with the optimization disabled, then it is enabled for the final 10 minutes. Throughput went from ~2,500 updates per second to ~37,000.

[Figure: Fast Updates by primary key benchmark, tokumx20-fast-updates-lex4-primary-cps.png]

Fast Updates By Secondary Keys

Again using the existing Sysbench schema and benchmark, I removed all operations except for a single unindexed update by secondary key. The performance difference gets larger as the number of updates in the secondary index increases. The benchmark begins with the optimization disabled, then it is enabled for the final 10 minutes. Throughput went from ~2,400 updates per second to ~31,000.

[Figure: Fast Updates by secondary key benchmark, tokumx20-fast-updates-lex4-secondary-cps.png]
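For a quick sense of scale, the speedups implied by the approximate throughput numbers quoted in the two benchmark sections can be computed directly. The inputs are the rounded figures from the text, so the ratios are approximate too.

```javascript
// Speedup = throughput with Fast Updates enabled / throughput with it disabled.
function speedup(before, after) {
  return after / before;
}

const primarySpeedup = speedup(2500, 37000);   // update by primary key
const secondarySpeedup = speedup(2400, 31000); // update by secondary key

console.log("primary key:   " + primarySpeedup.toFixed(1) + "x");
console.log("secondary key: " + secondarySpeedup.toFixed(1) + "x");
```

That works out to roughly 14.8x for updates by primary key and roughly 12.9x for updates by secondary key.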

To learn more about TokuMX:

Download it or check out the documentation.
