Billion Goods in Few Categories: How Histograms Save a Life?

We store data with the intention of using it: to search, retrieve, group, and sort. To perform these actions efficiently, MySQL storage engines index the data and pass statistics to the Optimizer when it builds a query execution plan. This approach works well unless your data is unevenly distributed.

Last year I worked on several tickets where the data followed the same pattern: millions of popular products fell into a couple of categories, while the remaining products were spread across all the other categories. We had a hard time finding a way to retrieve goods quickly. Workarounds for version 5.7 were offered. However, we learned that a new MySQL 8.0 feature, histograms, would work better, cleaner, and faster. Thus, the idea for our talk was born.

In this webinar, we will discuss:

– How index statistics are physically stored
– Which data is exchanged with the Optimizer
– Why it is not enough to make a correct index choice

In the end, I will explain which issues are resolved by histograms and why index statistics alone are insufficient for the fast retrieval of unevenly distributed data.
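To make the problem concrete, here is a minimal Python sketch of why a single cardinality figure misleads on skewed data. It is not how MySQL implements its estimates: the category names, row counts, and function names are all hypothetical, and the real Optimizer takes many more inputs into account. The sketch only contrasts a "rows divided by distinct values" estimate with a per-value frequency table, which is the idea behind a singleton histogram.

```python
from collections import Counter

# Hypothetical skewed table: two categories hold 99% of 100,000 products,
# and 1,000 rare categories hold one product each.
categories = (
    ["toys"] * 60_000
    + ["books"] * 39_000
    + [f"cat_{i}" for i in range(1_000)]
)


def uniform_estimate(values):
    """Rows expected per value if every distinct value were equally frequent."""
    return len(values) / len(set(values))


def build_singleton_histogram(values):
    """Map each distinct value to the fraction of rows that carry it."""
    total = len(values)
    return {value: count / total for value, count in Counter(values).items()}


def histogram_estimate(histogram, total_rows, wanted):
    """Rows expected for `wanted`, based on its recorded frequency."""
    return histogram.get(wanted, 0.0) * total_rows


histogram = build_singleton_histogram(categories)
total = len(categories)

# The uniform guess is the same for every category, popular or rare.
print("uniform guess per category:", round(uniform_estimate(categories)))
print("histogram guess for 'toys':", round(histogram_estimate(histogram, total, "toys")))
print("histogram guess for 'cat_7':", round(histogram_estimate(histogram, total, "cat_7")))
```

With the uniform assumption, a lookup on a popular category and a lookup on a rare one appear equally cheap, which is the kind of mismatch that can lead to a poor plan. In MySQL 8.0 itself, histograms are created with the `ANALYZE TABLE ... UPDATE HISTOGRAM ON ...` statement rather than by application code.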


Resources