In MySQL 8, Information Schema was basically re-written to utilize the all-New Data Dictionary which made it faster and better in many ways. Yet it also introduces some very counterintuitive behaviors when it comes to how actual and consistent data is.
Here is a simple test. Let’s create a table and insert a bunch of rows in it… and then check table size:
|
1 |
mysql> show table status G<br>*************************** 1. row ***************************<br> Name: t<br> Engine: InnoDB<br> Version: 10<br> Row_format: Dynamic<br> Rows: 0<br> Avg_row_length: 0<br> Data_length: 16384<br>Max_data_length: 0<br> Index_length: 0<br> Data_free: 0<br> Auto_increment: NULL<br> Create_time: 2019-12-27 01:04:37<br> Update_time: NULL<br> Check_time: NULL<br> Collation: utf8mb4_0900_ai_ci<br> Checksum: NULL<br> Create_options:<br> Comment:<br>1 row in set (0.02 sec)<br><br>mysql> select count(*) from t;<br>+----------+<br>| count(*) |<br>+----------+<br>| 524288 |<br>+----------+<br>1 row in set (0.21 sec)<br> |
So we have a bunch of rows in the table but the information schema (queried for simplicity with SHOW TABLE STATUS command) tells us the table is empty!
Any reasonable person would go ahead and file a bug at this point, but if you instead decide to spend some time googling you will find it is intended behavior. There is information_schema_stats_expiry variable that controls caching for these statistics, which defaults to 86400 seconds – or one whole day!
If caching is indeed in order, would not be something like 5 minutes be a much more reasonable value, showing the “big picture” correctly while also helping to improve performance? Such an approach would avoid having TB-sized tables on disk which report they have zero size, at least not for long.
In general, this looks like a classic case of premature optimization to me – with a moderate number of tables most users will have queries run pretty fast without caching (setting information_schema_stats_expiry to 0), and in my not-very-scientific-tests, I do not think I saw a performance difference more than 2x with and without caching. So why report confusing cached data when you can get the actual data fast enough? I do not know and a MySQL Engineering blog post on this topic (MySQL 8.0: Improvements to Information_schema) does not explain the reasoning of caching by default or the choice of such high expiration time.
If you read the MySQL manual you may also be under the impression you can just access mysql.index_stats and mysql.table_stats tables if you want the actual data from its source. But unfortunately, access to those tables is not available even to the MySQL “root” user.
|
1 |
mysql> select * from mysql.index_stats limit 5;<br>ERROR 3554 (HY000): Access to data dictionary table 'mysql.index_stats' is rejected.<br> |
The good news, though, is that information_schema_stats_expiry is session-variable so you can easily set it to zero for the applications which need up to date table information!
Resources
RELATED POSTS