We store data with an intention to use it: search, retrieve, group, sort... To do it effectively, the MySQL Optimizer uses index statistics when it compiles the query execution plan. This approach works excellently unless your data distribution is not even.
Last year I worked on several support tickets where data follows the same pattern: millions of popular products fit into a couple of categories and the rest used the rest. We had a hard time finding a solution for retrieving goods fast. We offered workarounds for version 5.7. However, a new MariaDB and MySQL 8.0 feature - histograms - would work better, cleaner and faster. The idea of the talk was born.
Of course, histograms are not a panacea and do not help in all situations.
I will discuss
- how index statistics physically stored by the storage engine
- which data exchanged with the Optimizer
- why it is not enough to make correct index choice
- when histograms can help and when they cannot
- differences between MySQL and MariaDB histograms
Sveta Smirnova works as a MySQL Support engineer since the year 2006. She is the author of book "MySQL Troubleshooting" (http://shop.oreilly.com/product/0636920021964.do) and MySQL Labs project "JSON UDFs for MySQL": design prototype of today JSON support in MySQL. After starting her career as a web developer, Sveta moved to a role in Bugs Analysis MySQL Support Group in MySQL AB/Sun/Oracle, then, in March 2015, joined the Support Team in Percona. She works on MySQL software bugs, escalations, and tricky Support issues on a daily basis. Her primary professional interest in recent years is finding better ways to solve DBA problems effectively.