EmergencyEMERGENCY? Get 24/7 Help Now!

Making Deletions Fast, by Avoiding Disk Seeks

 | June 8, 2010 |  Posted In: Tokutek, TokuView


In my last post, I discussed how fractal tree data structures can be up to two orders of magnitude faster on deletions over B-trees. I focused on the deletions where the row entry is known (the storage engine API handler::delete_row), but I did not fully analyze how MySQL delete statements can be fast. In this post, I do. Here I show how one can use TokuDB, a storage engine that uses fractal tree data structures, to make MySQL deletions run fast.

Let’s take a step back and analyze the work needed to be done to execute a MySQL delete statement. Suppose we have the table:

Say we wish to perform the following operation that deletes 100,000 rows:

In MySQL, this statement executes in two steps. First, MySQL finds all the rows where a=1, via a query. The query is equivalent to “select * from foo where a=1;” Then for each row found, MySQL deletes the row by calling handler::delete_row(row). So the time to execute deletions is equivalent to the time to find the rows (T_query) plus the time to delete the found rows (T_change). Or, stated another way:

The previous post shows why T_change is faster for fractal tree data structures than for B-trees. Let’s use that fact to see how we can make deletions fast under MySQL.

Back to the original problem at hand:

For this statement, T_query is a table scan. Because no index on ‘a’ exists, every element must be processed to find where a=1. For large tables, this can be expensive. The (minor) advantage here is T_change, the cost of removing the 100,000 rows, requires no disk seeks, because rows where a=1 stay in memory as MySQL deletes them. The problem remains, we process too much data in T_query to delete some rows.

So how can we speed up a query? Indexing! Or specifically, add an index on ‘a’. The schema is now:

Now what is the cost of “delete from foo where a=1;”? Well, for TokuDB, InnoDB, and MyISAM, T_query requires roughly 100,000 disk seeks, because point queries are needed to retrieve the entire row. The key (a) is not a covering index. The (minor) advantage here (again) is that T_change has no additional disk seeks, because the query does the disk seeks necessary to bring the rows where a=1 into memory, both in the primary index (or .MYD file for MyISAM) and secondary index.

The problem remains that we are still doing about 100,000 disk seeks!

So how can we further speed up T_query? Remember, we want to make the query “select * from foo where a=1;” faster. Clustering indexes! The schema looks like:

Suppose MyISAM and InnoDB supported clustering indexes (they don’t). What is their cost for performing the deletion? Well, T_query becomes much faster, because it would be a range query as opposed to 100,000 point queries. But T_change is still expensive. As explained in my last post, B-trees require disk seeks for deletion when the row is not in memory. The rows that need to be deleted in the primary key are not in memory and require disk seeks. So, for MyISAM and InnoDB, we are still stuck with at least 100,000 disk seeks. For this reason, some erroneously think “clustering keys do not help here”.

Now what about TokuDB, a storage engine that uses fractal tree data structures? T_query is also fast, because a range query is performed. But T_change is ALSO fast, because the deletions in the primary index do NOT always require disk seeks. This is where the advantage of fractal tree data structures over B-trees comes into play, allowing clustering keys to speed up a deletion procedure.

So, the moral of the story is this. Deletions are a combination of queries and value changes. To make deletions fast, both the query and the value changes need to be fast. With B-tree based storage engines, users may be hesitant to add indexes to speed up the query, due to the added cost of value changes in the index. With TokuDB, this tradeoff does not exist, because TokuDB uses fractal tree data structures. Using a clustering key to drive down the cost of T_query, and using fractal tree indexes to keep the cost of T_change down, leads to very fast deletes.



Leave a Reply