How much overhead is caused by on disk temporary tables

PREVIOUS POST
NEXT POST

As you might know while running GROUP BY and some other kinds of queries MySQL needs to create temporary tables, which can be created in memory, using MEMORY storage engine or can be created on disk as MYISAM tables. Which one will be used depends on the allowed tmp_table_size and also by the data which needs to be put to temporary tables – BLOB/TEXT columns are not supported with MEMORY storage engine so must use on disk MyISAM temporary table.

What i decided to do is to see how much overhead do on disk temporary tables cause compared to MEMORY tables. To have things comparable I used medium size table and types which can be presented both in MEMORY and MyISAM storage engine.

The benchmarks are done with MySQL 5.0.45 on 32bit Fedora Core 6 on Pentium 4 box – something I had available for testing.

The table contains 1.000.000 rows with mostly unique “c” column.

As you can see I’m using ORDER BY NULL clause as otherwise MySQL will sort the data after performing group by which is overhead we’re not looking to measure. This is actually good trick to use for cases when you do not need GROUP BY results sorted – it can significantly improve performance in some cases.

First I run the query with default key_buffer_size which is not large enough to fit all key blocks from temporary table index, so we get a lot of key writes which kill performance.

The problem with writes is OS can delay writes only for fixed amount of time and when it has to perform them anyway which can cause IO bound load even with small data which can fully fit in OS cache as in this case. In perfect world it should not happen and we should have seen 50MB sequentially written once per certain amount of time which takes fraction of second to do, but it is not how it happens in practice.

So how long does it take:

I repeated the run a few times with very similar results.

OK so what if we set key buffer to 128M ?

As you can see we have no writes because data fits in key buffer. Note however there is still significant amount of reads (the stats were flushed between query runs) which does not make sense as there no data to be read from the disk as nothing was written. So I filled bug on this.

So what was performance

Wow this is over 15 times better so you surely need to make sure your key_buffer_size is set to accommodate needs of your temporary table. Do not forget about it especially if running only Innodb tables so you may think there is no use for it.

Looking at this number I found it to be too good to be true as I’ve seen significantly worse performance for large data sets. So I decided to see what happens if I create a larger table – 4M rows. I sized key_buffer to 256M so it would fit everything needed and there was still enough memory for OS cache to keep temporary table. Results:

VMSTAT:

In this case as you can see there is a lot of dirty blocks flushing going on with pretty random IO. Clearly OS write caching is not effective for relatively large area (which still fits in memory) which is getting a lot of random writes.

OK. Now lets move to testing MEMORY tables for the same queries:

Ahh… I alway forget GLOBAL only sets global value but it does not affect current session. A little gotcha which still catches me.

For 1M rows we have

Wow. This is great difference even from our best MyISAM results.

For 4M of rows

So MEMORY table scales pretty well with query execution time being close to linear dependence of table size, and it can get 100 times faster than MyISAM on disk table even in case temporary table is small enough to fit in OS cache and key_buffer.

My next intension was to test placing table on tmpfs as this should avoid write overhead we’ve observing (and which is indeed very good production practice, if your temporary tables are moderately sized).

Unfortunately I ran into another bug which means I will have to postpone this part of test to another time.

Summary:
If this was too many details for you to read through here is the summary:

  • key_buffer_size is important for MyISAM temporary tables performance to avoid OS writes
  • OS Write cache is not as efficient as I would hope, at least on Linux
  • Performance of Disk MyISAM tables may not scale proportionally with table size, even when all data fits in memory
  • MEMORY temporary tables can be 10-100 times faster than disk based MyISAM tables

Hopefully MySQL will implement support of dynamic rows for MEMORY tables some time in the future and so we will be able to use MEMORY table for all cases when data set fits in memory as it is really worth it.

PREVIOUS POST
NEXT POST

Comments

  1. says

    You may also want to try tmpdir on ext2 (or a journaling file system with external journal on a ramdisk or /dev/null). This has the benefit of supporting larger than free memory files… and writes could be better than if in the case of tmpfs, being swapped out.

  2. says

    “As you can see I’m using ORDER BY NULL clause as otherwise MySQL will sort the data after performing group by which is overhead we’re not looking to measure. This is actually good trick to use for cases when you do not need GROUP BY results sorted – it can significantly improve performance in some cases.”

    I find this very interesting. When I got to know MySQL I learned how MySQL commits itself to delivering results ordered according to the GROUP BY clause (unless an explicit ORDER BY was included), and I always assumed that this was because MySQL implements GROUP BY by sorting the data.

    Apparently, I was dead wrong. ORDER BY NULL cannot help performance in any way if MySQL would use sorting to resolve GROUP BY…so, no I am puzzled:

    How is GROUP BY implemented, and why does MySQL commit itself to returning the results in the order specified by the GROUP BY clause?

    This latter commitment turns out to be something that is pretty common, but it hampers optimization…would it not be better if MySQL would not commit itself to returning the results in order, and let the user specify it explicitly with ORDER BY?

  3. Patrick Domack says

    Thanks. This helped alot. I speed up two of my greylist sql commands from around 15 min to under 2min, mainly from the order by null.

    I kind of agree with the previous comment, mysql seems to return everything else unordered unless you tell it to, seems strange it would default to some kind of order method for group by.

  4. peter says

    Ronald,

    This is indeed interesting. MySQL has 2 ways of doing group by via filesort and with temporary table. filesort returns result in sorted order just by nature.

    I’m not sure why Monty decided to sort results by default even in case temporary table was used. May be this was his interpretation of how it should behave but basically it was done this way. Later one of the clients performing group bys on tens of millions of rows complained and so ORDER BY NULL was added not to break users expectations as a lot of them could be already relying on GROUP BY being sorted.

    When this was being decided I was for making users writing bad SQL to suffer instead of forcing to use ORDER BY NULL if you want results unsorted but Monty took another decision :)

  5. peter says

    Stewart surely there are some more things to try. I’m not sure how wise is tmpfs in writes (I assume it manages swap io directly rater than being swapped out by OS) and I could try different file systems and options.

    These are just numbers for typical configuration.

  6. Marki says

    What is the benefit of storing temp tables on tmpfs instead of increasing tmp_table_size? Maybe that we can store tables with BLOB/TEXT in memory? But is there any limit on the max. size of disk based temp table? Because I don’t want big temp table to eat all my tmpfs & memory causing everything to be swapped out…

  7. says

    Hi Peter,

    thanks for your swift reply and explanation of the filesort GROUP BY implementation

    “Later one of the clients performing group bys on tens of millions of rows complained and so ORDER BY NULL was added not to break users expectations as a lot of them could be already relying on GROUP BY being sorted.”

    “When this was being decided I was for making users writing bad SQL to suffer instead of forcing to use ORDER BY NULL if you want results unsorted but Monty took another decision :)”

    mmm…what a pity…I was thinking about this when looking at a query like this:

    SELECT e.description
    , e.name
    , e.id
    , sum(q.score) score
    FROM exam e
    INNER JOIN question q
    ON e.id = q.exam_id
    GROUP BY e.description
    , e.name
    , e.id

    We want to calculate the SUM of the question score for the group of questions corresponding to each exam. Technically, we need to GROUP BY only on the primary key of exam which is defined only on the id column. However, the GROUP BY clause contains also the name and description columns because the person (or reporting tool!) that wrote the query does not know MySQL is one of the few databases that will allow you to SELECT name and description without including these also in the GROUP BY. So, we get a SQL92 compliant full group by clause. Unfortunately, the description column is included first in the group by clause….and MySQL is determined to sort on that column (and then on name should description not be unique). So ironically, although MySQL could in principle use the index on the id column to do a pretty fast GROUP BY, it is all for nothing because of an inefficient sort that needs to take place because of the non-standard commitment to returning the rows in the order implied by the GROUP BY clause.

    ORDER BY NULL is a nice trick…to work around this problem if you know you have it. Unfortunately, reporting tools probably don’t know this either.

    BTW Peter, does it have to be ORDER BY NULL? Or will any constant do instead of NULL?

    Roland Bouman.

  8. peter says

    Ronald,

    ORDER BY 1 means ordering by the first expression in SELECT list so it would not work. I have not tried with string constants.

  9. peter says

    Marki,

    Right if data fits in memory using in memory table is better in most cases.
    I mainly did this comparison keeping in mind cases when BLOB/TEXT can’t be used because I expect them to have similar overhead (though I can try converting this column to TEXT to see how long time changes)

    There is another reason – MySQL only allows you to set tmp_table_site per thread – there is no global limit which causes problems with resource management – as you do not know how many queries are running concurrently and what resources they are using you can’t say how many resources you can allocate to given query.

  10. Michael says

    Dear Peter,

    I’d never considered putting tmpdir on /dev/shm, but that seems to have made a small (

  11. Clint Byrum says

    Just wondering if you ever tried this again since they fixed the temp table storage bug that you pointed out in your article. We have our tmpdir set to a tmpfs partition on a big, busy server.. and it seems to have smoothed out some heavy query loads.. but I am curious to see what you’ve found.

  12. says

    Clint,

    Note the bug I described in the blog post was only added in 5.0.44 and fixed promptly in next versions so I just was not too lucky. In general using tmpfs indeed helps a lot especially for moderately sized temporary tables which fit in the cache.

    Another interesting development is Maria which would be very good for temporary tables as it caches data not only indexes.

  13. says

    The MySQL bug with tmpdir was fixed in v5.0.48 (if somebody didn’t know). The current (at the time of writing) stable Debian package has v5.0.32 so if you’re using tmpfs for MySQL’s tmpdir, keep an eye out for the Debian upgrades of the MySQL package (it might be possible that an upgrade will take you to a MySQL server version that has this bug).

    Btw. using tmpfs for tmpdir helped me a lot. Our server (used for a news portal) was totally overloaded with I/O when MySQL started to write out several n*100MB temp tables to disk (a tmp table was mostly less than 200MB, but several tmp tables were created in a row resulting in heavy I/O). Sometimes this took a minute or so and the portal came down to it’s knees (eg. it slowed down and started to give “Internal server error” messages in the end). Using a 500MB tmpfs fixed this and everything seems to be just fine since then. However this is just a quick fix, because those large tmp tables should not even be created in the first place.

  14. says

    Right. I should again check tmpfs vs heap table to see how it works.

    In general of course it would be great if MySQL would have the option to use tmpfs only up to certain space and when switch to the drive – this would allow to use it safer.

    Though I guess it is going in different direction – there is a patch from Ebay to support BLOB/TEXT in memory tables and there is Maria comming which will cache the data in pagecache avoiding disk IO for temp tables if they are small enough.

  15. says

    Hi!

    I’m using a tmpdir on the tmpfs for about 14 days and it performs very well (5.0.51a). The only problem I have run into is creating temporary InnoDB table (file per table setting), because tmpfs cannot do IO_DIRECT. It prints a warning to the error log every time I create a tmp innoDB table. However, despite these warnings, I think the result is OK and the table is created (see http://bugs.mysql.com/bug.php?id=26662)

    krteQ

  16. vijay says

    When i retrieve the data from mysql tables using PHP, the output is in a single XSL file. if my web page accessing for download the xsl file many users simultaneously, the little data retrieved . but many rows in my tables using joins. what is the problem? any memory problem or key buffer limit problem, plz let me know as soon as possible, im in hurry.

    thanking u

  17. vijay says

    In my webpage, I want to get a output in a XSL file. The data are retrieved from Mysql db. Retrieving the bulk records in a single file. When I access that download file page simultaneously multi systems in the remote, I couldn’t get the file also my data are crashed. What is the problem? PHP.ini or DB? Plz tell me clearly. And how to solve this problem?

    Urgent. Thanks in advance.

  18. says

    Great article here. A lot of MySQL optimization articles don’t even bother looking at disk i/o, so that was a refreshing change. Also the ORDER BY NULL was something I never knew about! Cheers.

  19. says

    Correction!

    You said, “Which one will be used depends on the allowed tmp_table_size and also by the data which needs to be put to temporary tables – BLOB/TEXT columns are not supported with MEMORY storage engine so must use on disk MyISAM temporary table.” but I guess temporary tables also depend upon MAX_HEAP_TABLE_SIZE.

Leave a Reply

Your email address will not be published. Required fields are marked *