Where the open source community meets: Secure your spot for Percona Live Amsterdam! - Register

Downloads

Blog

How expensive is a WHERE clause in MySQL?

October 31, 2008

Author

Baron Schwartz

Benchmarks

Insight for DBAs

Share this Post:

This is a fun question I’ve been wanting to test for some time.Â How much overhead does a trivial WHERE clause add to a MySQL query?Â To find out, I set my InnoDB buffer pool to 256MB and created a table that’s large enough to test, but small enough to fit wholly in memory:

CREATE TABLE `t` (
`a` date NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

insert into t(a) values(current_date);
insert into t select * from t;

CREATE TABLE `t` (

`a` date NOT NULL

) ENGINE=InnoDB DEFAULT CHARSET=latin1;

insert into t(a) values(current_date);

insert into t select * from t;

I repeated the last statement until it got slow enough that I thought I could do a reasonable test; at this point there were 8388608 rows. I then optimized the table and restarted MySQL. The table ended up at 237MB.

Now let’s see how long a table scan with no WHERE clause takes:

mysql> select sql_no_cache count(*) from t;
+----------+
| count(*) |
+----------+
|  8388608 |
+----------+
1 row in set (5.23 sec)

mysql> select sql_no_cache count(*) from t;

+----------+

| count(*) |

+----------+

| 8388608 |

+----------+

1 row in set (5.23 sec)

I repeated this ten times and averaged the time: 5.01 seconds, ranging from 4.83 to 5.17 seconds. Next I ran it with a trivial WHERE clause:

mysql> select count(*) from t where a = current_date;

1	mysql> select count(*) from t where a = current_date;

(Pop quiz: do I need SQL_NO_CACHE on this query?) The average execution time for this query is 7.96, ranging from 7.65 to 8.15 seconds. So apparently the overhead of the WHERE clause is about 2.95 seconds, or 351 nanoseconds per row. It doesn’t seem like much per row, but it adds up to about an extra 60% cost for the query. If I add another WHERE clause,

mysql> select count(*) from t where a = current_date and left(a, 10) = '2008-10-29';

1	mysql> select count(*) from t where a = current_date and left(a, 10) = '2008-10-29';

The average time is 9.39 seconds, or an added 87% overhead.

It would be interesting, in an academic kind of way, to test different data types and various complexities of WHERE clauses, but I’m not really interested enough to spend much time on it. I just wondered whether the WHERE clause would even be noticeable. This test doesn’t really reveal anything you can do to make your queries faster — you usually can’t optimize the WHERE clause itself, only the way that the MySQL optimizer chooses to apply the clause to indexes. (However, as I found out some time ago, some date functions are faster than others. That would be interesting to look into more.)

0 0 votes

Article Rating

17 Comments

Oldest

Newest Most Voted

Martin

17 years ago

re Pop quiz: do I need SQL_NO_CACHE on this query?
I’m still new to this but I’m going to say ‘NO’ as you are using a variable (current_date) in the query instead of a fixed date (ie ‘2008-10-29’). Is that correct.

I am amazed at that overhead! Would using MyIsam and/or adding an INDEX on the date field make any difference?

Cheers
Martin

Author

Baron Schwartz

17 years ago

Martin, you’re right about current_date.

The WHERE is implemented at the server level so it ought not to depend on the engine or indexes.

Mark

17 years ago

I just ran some tests, and on my system the performance is improved by an index:
CREATE INDEX a_idx ON t(a);

Rows: 8388608
Time without WHERE: 4.8 seconds
Time with WHERE, without index: 5.9 seconds
Time with WHERE, with index: 5.2 seconds

My results are that the index reduces the additional overhead from the WHERE clause by half.

Author

Baron Schwartz

17 years ago

And with the index, but without WHERE?

Mark

17 years ago

That’s unchanged – 4.8 seconds.

Author

Baron Schwartz

17 years ago

Of course — because it’s using the primary key.

My point in asking those questions was to get you thinking about the query plan. You can’t compare the primary key to the secondary key and say that the difference is due to where clauses or not having where clauses. I think you’re misunderstanding how the storage engine api and InnoDB indexes work.

Martin

17 years ago

I’m confused. Where did the Primary Key come from in:
CREATE TABLE t (a date NOT NULL) ENGINE=InnoDB DEFAULT CHARSET=latin1;
?

Author

Baron Schwartz

17 years ago

InnoDB has a hidden 6-byte primary key.

Martin

17 years ago

Well that is three things I have learned in one post – awesome!
Thx
Marty

MC.Spring

17 years ago

I think you should use none function in the where case for the last statement, while that maybe took a little time down.

Sheeri K. Cabral

17 years ago

How much of this is actually overhead because of the comparison versus the where?

ie, can you compare with

WHERE TRUE
and
WHERE 1=1
and
WHERE ‘a’=’a’? That would be much more interesting, because the 60% or 87% are likely because of the access and comparison costs. But how much is the actual WHERE costing?

There’s also explain plan overhead.

Pep Pla

17 years ago

Sheri,

MySQL Optimizer does something called: Constant condition removal.
This means that a condition like this TRUE or 5=5 is going to be
removed by the optimizer prior to query execution.

Mark,
Is it possible that using a index is faster because counting index records is
faster than counting table records? It is important to realize that all the
index keys are the same so probably the table is larger than the index and
executing the query takes less io -> faster. (I’ll check table size vs. index size)

Pep

Pep Pla

17 years ago

Show table status gives half the size of the table to the index.

Also explain gives half (???) the rows if using the index.

I can understand that the index takes less space than the table as all the keys are the same but why the optimizer believes that using the index requieres half the rows? because is a b-tree?

Pep

Author

Baron Schwartz

17 years ago

Pep, yes, the index is half the physical size. The row estimate is only an estimate. The stats are inexact.

gizg

17 years ago

All well and good being a student with plenty of time on your hands to do such trivial (read: useless) tests!
HOWEVER, try and write a query in a real world web app that can do without a WHERE clause!?!?!?!?!

Cpt. Contrarian

15 years ago

‘@gizg
>HOWEVER, try and write a query in a real world web app that can do without a WHERE clause!?!?!?!?!

I’ve got one. I found this page because I was looking for an explanation for how my JOIN operation could get faster when I took out the WHERE clause even though that greatly multiplied the JOIN size.

Because the other side of the JOIN already limited the records as needed, the WHERE clause on the first subquery was unneeded. I assumed that having a smaller resultset would increase performance because less memory was copied and used; however, instead, having the WHERE clause on that one table tripled the overall runtime of the query. Upon reflection, I presume that’s because the first subquery requires the database engine to set up a temporary table to store the resultset, while a SELECT without a WHERE means that it can read directly from the source database (likely cached) when it creates the JOINed resultset.

So, this is interesting research; and I am keen enough to find out more that I’m going to do tests of my own.

Michael

13 years ago

If you use functions on the ‘left hand side’ of a where clause, you ignore any indexes. “and left(a, 10) = ‘2008-10-29′” should be “and ‘2008-10-29’ = left(a, 10)”