Less is More: Novel Approaches to MySQL Compression for Modern Data Sets
In the age of social networking, mobile apps, and the unstoppable panopticon of government surveillance, the SQL verbs "DELETE" and "TRUNCATE" often feel like forbidden four-letter words. Between our phones and tablets, computers and cars, and countless other "smart" devices, exabytes of content and clickstreams are generated every day. These data points are collected and correlated to drive behaviors and derive conclusions, all the while producing the next set of observations to feed back into the machine.

Dystopian cyberpunk visions aside, what's a MySQL DBA surrounded by petabytes to do? At a very high level, the technical answer is typically some combination of scale up, scale out, and/or shard: just add MOAR DATABASE!! Yet each of these axes of expansion has its own associated costs and limitations, and with each step in any dimension, the probability of encountering multimillion-dollar resource waste, particularly for cloud-based services, rapidly approaches 1. Compression provides a way to introduce significant friction into the death slide of inefficiency.

This talk will start off with a brief overview of traditional methods of compressing MySQL data:
- The built-in compress/decompress functions.
- InnoDB compression prior to 5.7.
- Compression/decompression at the application layer.

From there, we'll look at new developments in MySQL compression technology:
- InnoDB compression changes in 5.7.
- Compression-enabled storage engines, such as TokuDB and RocksDB.
- Natively-compressed column types.

Along the way, we'll also provide some real-world examples of how different MySQL compression technologies help Pinterest more efficiently store trillions of rows: the what and the why around the choices we've made and the good, bad, and sometimes surprising results we've encountered, with benchmarks and performance data aplenty.
Database Engineer and Bit Wrangler, Pinterest
Ernie is a database engineer on the SRE team at Pinterest, where his current focus is on improving the performance and operational efficiency of a petabyte-scale hybrid deployment of MySQL, HBase, and Redis. Ernie has worked in almost every aspect of information technology, from network engineering and software development to systems administration and information security. Ernie's current areas of interest include artificial intelligence, data analytics, and neuroscience. He holds a BS in mathematics and a BA in political science from Arizona State University.