Nov 07, 2018
Percona Live Open Source Database Conference Europe 2018
Sveta Smirnova

Billion Goods in Few Categories: how Histograms Save a Life?

We store data with the intention of using it: searching, retrieving, grouping, sorting... To perform these actions effectively, MySQL storage engines index the data and pass statistics to the Optimizer when it compiles a query execution plan. This approach works perfectly well unless your data is unevenly distributed.

Last year I worked on several tickets where the data followed the same pattern: millions of popular products fit into a couple of categories, and the remaining products were spread across the rest. We had a hard time finding a way to retrieve goods quickly. Workarounds for version 5.7 were offered. However, a new MySQL 8.0 feature, histograms, would work better, cleaner, and faster. This is how the idea for the talk was born.

I will discuss
- how index statistics are physically stored
- which data is exchanged with the Optimizer
- why this is not enough to make the correct index choice

In the end, I will explain which issues histograms resolve and why index statistics alone are insufficient for fast retrieval of unevenly distributed data.
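
The gap between the two kinds of statistics can be illustrated with a small Python sketch. It is not how MySQL implements either mechanism. It assumes that an estimate based on index statistics behaves roughly like "total rows divided by the number of distinct values", and it models a histogram as a simple per-value frequency table. The category names and row counts are invented for illustration.

```python
from collections import Counter


def uniform_estimate(total_rows, distinct_values):
    """Rows expected per value if the data were evenly distributed.

    Assumption: this stands in for an estimate derived from index
    cardinality; it is a simplification, not MySQL's actual code path.
    """
    return total_rows / distinct_values


def frequency_histogram(values):
    """Map each distinct value to the fraction of rows that hold it."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


# Hypothetical skewed data: two popular categories hold almost all goods,
# while one hundred niche categories hold a single product each.
categories = (
    ["phones"] * 6000
    + ["laptops"] * 3900
    + [f"niche_{i}" for i in range(100)]
)

total_rows = len(categories)
distinct = len(set(categories))
histogram = frequency_histogram(categories)
flat = uniform_estimate(total_rows, distinct)

# Compare both estimates with the real row count for a popular
# category and for a rare one.
for name in ("phones", "niche_7"):
    actual = categories.count(name)
    by_histogram = histogram[name] * total_rows
    print(
        f"{name}: actual={actual}, "
        f"uniform estimate={flat:.0f}, "
        f"histogram estimate={by_histogram:.0f}"
    )
```

Under the even-distribution assumption every category gets the same estimate, so a popular category is heavily underestimated and a rare one is overestimated. The per-value frequencies track the real counts, which is the kind of information a histogram can give the Optimizer.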



About the Author

Sveta Smirnova

Sveta joined Percona in 2015. Her main professional interests are problem solving, working with tricky issues and bugs, finding patterns that can solve typical issues more quickly, and teaching others how to deal with MySQL issues, bugs, and gotchas effectively. Before joining Percona, Sveta worked as a Support Engineer in the MySQL Bugs Analysis Support Group at MySQL AB-Sun-Oracle. She is the author of the book "MySQL Troubleshooting" and of JSON UDF functions for MySQL.