I’ve got an interesting comment the other day saying “I’ve heard Jay Pipes saying indexes with cardinality lower than 30% are worthless, true?”

That is interesting question and it has different answers depending on who is asking. A lot of people want to hear simple answers to the questions because they would be overwhelmed by complete answer, it is like children being taught math first taught simple division skills like you can divide 10 by 2 but not by 3. As we grow up we learn the truth is much more complex than that and if you Major in math you will find there is a lot of stuff out where far beyond basic stuff we use in every day life.

Answering such question you can say yes it is true and for clueless person it will be helpful answer because it will allow him to avoid creation indexes on the column containing just one value which is good idea.

If you start to explain it in details, a lot of variables add in – covering indexes, skewed distribution vs lookup frequency, IO bound or CPU bound workload which may be far too complicated.

It is also interesting the broad answer would not be helpful – there are workloads when even with 1% matching rows you’re better of doing full table scans, and there are ones with 99% same values in the column having it indexed helps.

The other questions which people like to get quick answers for are for example “Which MySQL Storage Engine to Use” or “How Much Memory Compared to Your Database Size Should you Have” , “What Key Cache Hit Ratio is Good One”

This often gives me head aches – I like correctness in the answers but it is often too much to deal with, so I’m trying to keep the answers tailored towards customer needs omitting details which should not be important and making sure all relevant stuff is covered.

This is one of the reasons of hiring MySQL Expert – you typically can find same information in the books, Internet, and even in this blog but selecting what is relevant for your case might be complicated.

4 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Jay Pipes

Uh, I would never say “indexes with cardinality lower than 30% are worthless”. 😉 What I have said in the past is that if you have a situation where the value you are searching for is contained in 30% of an index’s records, it is highly unlikely the index will be used, as the efficiency of a sequential scan will outweigh the numerous bookmark lookup operations needed with the index seek operations. Of course, having a covering index can influence the optimizer’s decision of course, as will having multiple levels of cardinality (multiple columns in the index). Bottom line: I *never* say there is a simple answer or a single answer to anything. Just take a look at my latest set of slides… on slide 2, I say “The answer to every question is… it depends.”

Cheers, and see you soon at the conference, Peter!

Bob Bankay

We have a large table (10M row) with about 12 indices. One of then indicates that state of the row in the table and there are about 30 states. The states change from 0 to 30 in about 14 days.

Would it be appropriate to create views for state with the highest population. Would it be faster than a lookup for the specific state wiht LIMIT n?