Report on XLDB Tutorial on Data Structures and Algorithms
Bradley and I (Michael) gave the tutorial on Data Structures and Algorithms for Big Databases at the 6th XLDB Conference last month.
The tutorial was organized as follows:
- Module 0: Tutorial overview and introductions. We describe an observed (but not necessary) tradeoff in ingestion, querying, and freshness in traditional database.
- Module 1: I/O model and cache-oblivious analysis.
- Module 2: Write-optimized data structures. We give the optimal trade-off between inserts and point queries. We show how to build data structures that lie on this tradeoff curve.
- Module 2 continued: Write-optimized data structures perform writes much faster than point queries; this asymmetry affects the design of an ACID compliant database.
- Module 3: Case study – TokuFS. How to design and build a write-optimized file systems.
- Module 4: Page-replacement algorithms. We give relevant theorems on the performance of page-replacement strategies such as LRU.
- Module 5: Index design, including covering indexes.
- Module 6: Log-structured merge trees and fractional cascading.
- Module 7: Bloom filters.
These algorithms and data structures are used both in NoSQL implementations such as MongoDB, HBase and in SQL-oriented implementations such as MySQL and TokuDB.
The slides are available here.
Here are impressions from the experience. First, we feel grateful to have spoken before such an engaged and high-powered group of attendees. We welcome followup interactions with the colleagues that we met. We hope to have an opportunity to give a similar tutorial again in the not-too-distant future.
Second, we appreciated the longer format of a four-hour tutorial, rather than a short talk. It’s not easy to prepare a tutorial comprising four hours of advanced data structures; it must be even harder to sit through one. Nonetheless, it was refreshing to delve into topics the way we like to do in the classroom. JRR Tolkien said about The Lord of the Rings that, in retrospect, he felt that the book was too short. Intense as a four-hour tutorial may be, I feel the same way about our tutorial. Another couple of hours (after an appropriate break) would have been great.
An aside: JRR Tolkien was right. The Lord of the Rings is too short.
Conference, Fractal Tree™ indexes, MySQL, NewSQL, NoSQL, Storage Engine, TokuDB, Tokutek