EmergencyEMERGENCY? Get 24/7 Help Now!


 | August 14, 2012 |  Posted In: Tokutek, TokuView


Next week I (Bradley) will be traveling to FROSCON near Bonn, Germany, and then on to VLDB in Istanbul.

At FROSCON I’ll be talking about fast data structures for maintaining indexes. The talk will share some content with my upcoming MySQL Connect talk.

At VLDB, Dzejla Medjedovic will be presenting a talk on our paper on SSD-friendly Bloom-filter-like data structures. The paper is

Michael A. Bender, Martin Farach-Colton, Rob Johnson, Russell Kraner, Bradley C. Kuszmaul, Dzejla Medjedovic, Pablo Montes, Pradeep Shetty, Richard P. Spillane, and Erez Zadok.
Don’t Thrash: How to Cache Your Hash on Flash. PVLDB 5(11):1627-1637, 2012.

An earlier version of the paper appeared at HotStorage’11 in Portland, Oregon.

The Trash-Cache-Hash-Flash paper describes several data structures for performing approximate membership queries (AMQ). The classic example of an AMQ data structure is the Bloom filter. You maintain a set of objects: you can insert into the set, and you can ask if an object is in the set. If the query says “no”, then you know that the object isn’t in the set. But if the query says “yes”, then it might be wrong. Bloom filters are used in many contexts. In the database world, log-structured merge trees (LSM-trees) often use Bloom filters to avoid doing extra disk I/Os.

One of data structure we describe is called a cascade filter. Like a fractal-tree index (which is used, for example in TokuDB for MySQL), cascade filters can ingest data with very few I/Os: in fact, much less than one I/O per insertion on average. This low I/O cost per insertion makes the cascade filter a good match for SSD-resident data because it dramatically reduces the write amplification induced by typical Bloom filter implementations. Furthermore, unlike a Bloom filter, the cascade filter can remove objects from the set being maintained. (There are other data structures that can remove elements, such as counting Bloom filters). The cascade filter is a cache-oblivious data structure, which means it achieves good performance on a wide variety of memory hierarchies, without tuning for block size or memory size or other cache parameters.

Of the authors, I (Bradley), Rob, and Dzejla are attending. Dzejla is a student who will be looking for a job in the future, so if you might want to hire her, take the opportunity to see her talk!

Hope to see people in Germany and Istanbul. In Istanbul, I plan to cross the bridge: I haven’t been to Asia before.


Leave a Reply


Percona’s widely read Percona Data Performance blog highlights our expertise in enterprise-class software, support, consulting and managed services solutions for both MySQL® and MongoDB® across traditional and cloud-based platforms. The decades of experience represented by our consultants is found daily in numerous and relevant blog posts.

Besides specific database help, the blog also provides notices on upcoming events and webinars.
Want to get weekly updates listing the latest blog posts? Subscribe to our blog now! Submit your email address below and we’ll send you an update every Friday at 1pm ET.

No, thank you. Please do not ask me again.