How much overhead is caused by on disk temporary tables

As you might know while running GROUP BY and some other kinds of queries MySQL needs to create temporary tables, which can be created in memory, using MEMORY storage engine or can be created on disk as MYISAM tables. Which one will be used depends on the allowed tmp_table_size and also by the data which needs to be put to temporary tables – BLOB/TEXT columns are not supported with MEMORY storage engine so must use on disk MyISAM temporary table.

What i decided to do is to see how much overhead do on disk temporary tables cause compared to MEMORY tables. To have things comparable I used medium size table and types which can be presented both in MEMORY and MyISAM storage engine.

The benchmarks are done with MySQL 5.0.45 on 32bit Fedora Core 6 on Pentium 4 box – something I had available for testing.

The table contains 1.000.000 rows with mostly unique “c” column.

As you can see I’m using ORDER BY NULL clause as otherwise MySQL will sort the data after performing group by which is overhead we’re not looking to measure. This is actually good trick to use for cases when you do not need GROUP BY results sorted – it can significantly improve performance in some cases.

First I run the query with default key_buffer_size which is not large enough to fit all key blocks from temporary table index, so we get a lot of key writes which kill performance.

The problem with writes is OS can delay writes only for fixed amount of time and when it has to perform them anyway which can cause IO bound load even with small data which can fully fit in OS cache as in this case. In perfect world it should not happen and we should have seen 50MB sequentially written once per certain amount of time which takes fraction of second to do, but it is not how it happens in practice.

So how long does it take:

I repeated the run a few times with very similar results.

OK so what if we set key buffer to 128M ?

As you can see we have no writes because data fits in key buffer. Note however there is still significant amount of reads (the stats were flushed between query runs) which does not make sense as there no data to be read from the disk as nothing was written. So I filled bug on this.

So what was performance