Billion Goods in Few Categories: How Histograms Save a Life?
We store data with the intention to use it: search, retrieve, group, sort... To perform these actions efficiently, MySQL storage engines index the data and pass statistics to the Optimizer when it compiles a query execution plan. This approach works perfectly well unless your data distribution is uneven.
Last year I worked on several tickets where the data followed the same pattern: millions of popular products fit into a couple of categories, while the remaining products were spread across all the other categories. We had a hard time finding a way to retrieve goods fast. Workarounds for version 5.7 were offered. However, we learned that a new MySQL 8.0 feature, histograms, would work better, cleaner, and faster. Thus, the idea for this talk was born.
In this webinar, we will discuss:
- How index statistics are physically stored
- Which data is exchanged with the Optimizer
- Why this is not enough to make the correct index choice
In the end, I will explain which issues histograms resolve and why index statistics alone are insufficient for fast retrieval of unevenly distributed data.
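As a rough illustration of the problem, here is a toy model in Python. It is a sketch, not MySQL's actual estimator: it assumes an optimizer that knows only the total row count and the number of distinct values, and so has to guess that rows are spread evenly among the values. A per-value frequency count, which is the idea behind a histogram, keeps the real distribution. The category names and row counts are invented for the example.

```python
from collections import Counter

# Invented data: two huge categories plus 100 tiny ones (assumption for illustration).
rows = ["toys"] * 60_000 + ["books"] * 39_000
rows += [f"niche_{i}" for i in range(100) for _ in range(10)]

total_rows = len(rows)
distinct_values = len(set(rows))


def estimate_from_cardinality(value):
    """Uniform guess: every distinct value is assumed to match the same number of rows."""
    return total_rows / distinct_values


# A per-value frequency table, standing in for a histogram.
histogram = Counter(rows)


def estimate_from_histogram(value):
    """Row estimate taken from the stored per-value frequency."""
    return histogram[value]


for category in ("toys", "niche_7"):
    uniform = round(estimate_from_cardinality(category))
    actual = estimate_from_histogram(category)
    print(f"{category}: uniform guess {uniform}, histogram {actual}")
```

In this toy data the uniform guess is the same for every category, so it underestimates the huge category and overestimates the tiny one. In MySQL 8.0 itself, histogram statistics are built with the ANALYZE TABLE ... UPDATE HISTOGRAM ON ... statement.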