Why InnoDB index cardinality varies strangely

September 28, 2009

Author

Baron Schwartz

Insight for DBAs

MySQL

This is a very old draft, from early 2007 in fact. At that time I started to look into something interesting with the index cardinality statistics reported by InnoDB tables. The cardinality varies because it’s derived from estimates, and I know a decent amount about that. The interesting thing I wanted to look into was why the cardinality varies in a particular pattern.

Here I’ll grab a bunch of cardinality estimates from sakila.film on MySQL 5.0.45 and put them into a file:

baron@kanga:~$ while true; do mysql sakila -N -e 'show index from film' | head -n 2 | tail -n 1 | awk '{print $7}'; done > sizes

After a while I cancel it and then sort and aggregate them with counts:

baron@kanga:~$ sort sizes | uniq -c
157 1022
156 1024
156 1058
156 1059
156 1131
313 951
312 952
312 953

Look at the distribution of the counts. The weighted average of these is 1000.53, so it’s close to the truth (1000 rows). But five of the eight distinct estimates are shown about one-half as often as the others; it looks like the random choice of which statistic to use is not evenly distributed.
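The arithmetic behind that observation can be reproduced with a short sketch. The pairs below are copied from the `uniq -c` output above; the function names `weighted_mean` and `split_by_frequency` are illustrative only and are not part of MySQL or InnoDB.

```python
# (count, estimate) pairs copied from the `sort sizes | uniq -c` output above.
observations = [
    (157, 1022), (156, 1024), (156, 1058), (156, 1059),
    (156, 1131), (313, 951), (312, 952), (312, 953),
]


def weighted_mean(pairs):
    """Average of the estimates, weighted by how often each one was seen."""
    total = sum(count for count, _ in pairs)
    return sum(count * estimate for count, estimate in pairs) / total


def split_by_frequency(pairs):
    """Split estimates into rare and common groups around the midpoint count."""
    counts = [count for count, _ in pairs]
    midpoint = (min(counts) + max(counts)) / 2
    rare = [est for count, est in pairs if count < midpoint]
    common = [est for count, est in pairs if count >= midpoint]
    return rare, common


if __name__ == "__main__":
    print(round(weighted_mean(observations), 2))  # 1000.53
    rare, common = split_by_frequency(observations)
    print("seen about half as often:", rare)
    print("seen about twice as often:", common)
```

In this sample the three frequent estimates (951, 952, 953) are the ones below the true row count, and the five rare ones are all above it.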

I mentioned this to Heikki and he pondered it for a bit — but neither of us really figured out what was going on. I know the code superficially, but not as well as he or Yasufumi or others do; and I was not able to find a cause.

More recently I saw that I’m not the only one who notices oddities in the random number generation. I waited. And indeed the fixes for that bug seemed to have fixed the skew in the statistics. Case solved, and all I had to do was wait. Truly, laziness is a virtue.

5 Comments

Morgan Tocker

16 years ago

Excellent! I had a training student ask me this question in San Francisco. It’s great to see such a thorough analysis in Vasil’s post.

Author

Baron Schwartz

16 years ago

Oh, I should have linked my bug report: http://bugs.mysql.com/bug.php?id=41133

Vadim

16 years ago

The story is much worse than it sounds. For single-column indexes we get variations. For multi-column indexes we have problems.

Let's consider:

CREATE TABLE T (
A int NOT NULL,
B int NOT NULL,
C int NOT NULL,
S varchar(80) NOT NULL,
KEY ix_a_b (A, B),
KEY ix_a_c (A, C)
) ENGINE=InnoDB

Our typical queries are

select S from T where A = ? and B = ?;
select S from T where A = ? and C = ?;

Well, here sampling comes into play. We will get these cardinalities:
ix_a_b (1) x1
ix_a_b (2) x2
ix_a_c (3) y1
ix_a_c (4) y2

Because of sampling, x1 is never equal to y1. So for our queries the optimizer will always prefer one index over the other. It means that one of the queries is always executed using the wrong index!

I talked to Percona people. In their opinion the behavior is inherent, because each index is sampled independently. I guess the optimizer's expectations are:
1) x1 = y1
2) x1 <= x2 <= rowcount && y1 <= y2 <= rowcount

Deepika Maddali

16 years ago

Hi,

I had the same problem with various indexes on our tables. Even with index hints, the query did not use the same execution plan on every execution, but after we analyze the table the execution plan follows the index hints. I am not sure why this happens; possibly it is due to data fragmentation in the datafile, or to estimates that are not up to date.