EmergencyEMERGENCY? Get 24/7 Help Now!


 | August 14, 2012 |  Posted In: Tokutek, TokuView


Next week I (Bradley) will be traveling to FROSCON near Bonn, Germany, and then on to VLDB in Istanbul.

At FROSCON I’ll be talking about fast data structures for maintaining indexes. The talk will share some content with my upcoming MySQL Connect talk.

At VLDB, Dzejla Medjedovic will be presenting a talk on our paper on SSD-friendly Bloom-filter-like data structures. The paper is

Michael A. Bender, Martin Farach-Colton, Rob Johnson, Russell Kraner, Bradley C. Kuszmaul, Dzejla Medjedovic, Pablo Montes, Pradeep Shetty, Richard P. Spillane, and Erez Zadok.
Don’t Thrash: How to Cache Your Hash on Flash. PVLDB 5(11):1627-1637, 2012.

An earlier version of the paper appeared at HotStorage’11 in Portland, Oregon.

The Trash-Cache-Hash-Flash paper describes several data structures for performing approximate membership queries (AMQ). The classic example of an AMQ data structure is the Bloom filter. You maintain a set of objects: you can insert into the set, and you can ask if an object is in the set. If the query says “no”, then you know that the object isn’t in the set. But if the query says “yes”, then it might be wrong. Bloom filters are used in many contexts. In the database world, log-structured merge trees (LSM-trees) often use Bloom filters to avoid doing extra disk I/Os.

One of data structure we describe is called a cascade filter. Like a fractal-tree index (which is used, for example in TokuDB for MySQL), cascade filters can ingest data with very few I/Os: in fact, much less than one I/O per insertion on average. This low I/O cost per insertion makes the cascade filter a good match for SSD-resident data because it dramatically reduces the write amplification induced by typical Bloom filter implementations. Furthermore, unlike a Bloom filter, the cascade filter can remove objects from the set being maintained. (There are other data structures that can remove elements, such as counting Bloom filters). The cascade filter is a cache-oblivious data structure, which means it achieves good performance on a wide variety of memory hierarchies, without tuning for block size or memory size or other cache parameters.

Of the authors, I (Bradley), Rob, and Dzejla are attending. Dzejla is a student who will be looking for a job in the future, so if you might want to hire her, take the opportunity to see her talk!

Hope to see people in Germany and Istanbul. In Istanbul, I plan to cross the bridge: I haven’t been to Asia before.


Leave a Reply