The Language of Compression Benchmarking
Whether your data lives in MySQL, a NoSQL store, Hadoop, or somewhere in the cloud, you're likely paying decent money for storage and IOPS. With ever-growing data volumes, SSDs needed to cut latency, and replication needed to provide insurance, your storage footprint is an important place to look for savings. It makes sense, then, that so many storage vendors tout compression as a key metric and differentiator. Yet the language vendors and users employ to reason about storage footprint and compression is embarrassingly vague, if not meaningless or downright deceptive. We can do better, and we must do better. In this talk, we'll discuss each part of the durable storage stack, from the hardware on up, and how usage numbers can take on different meanings at each layer. We'll talk about what's important to know at each layer, and how to think and talk about concepts like compression, fragmentation, write amplification, and wear leveling. Finally, we'll see different ways benchmarketers present data to lie to you, and learn some techniques for identifying and cutting through those kinds of lies.
Engineer, Two Sigma
Leif Walsh worked on TokuMX at Tokutek. He also worked on performance-critical software at Google and Microsoft, and helped start RethinkDB. Leif studied math and computer science at Stony Brook University. In his spare time, he is an amateur lithography assistant.