Where the open source community meets: Secure your spot for Percona Live Amsterdam! - Register

Downloads

Blog

Quickly preloading Innodb tables in the buffer pool

May 2, 2008

Author

Peter Zaitsev

Insight for DBAs

Share this Post:

In the previous post I mentioned a way I use to preload Clustered Index (data) for Innodb tables. Though I thought this topic would benefit from a bit more information.

But lest first start with feature request for Innodb Team: All ways I mention here are hacks and they can’t be as efficient as native support. It would be great if Innodb would implement command to preload table to Innodb buffer pool, which would simply go through .ibd file sequentially and inject pages in the buffer pool. This would make preload done using sequential file scan even if indexed suffered a lot of page splits.

Now lets continue to the hacks 🙂

So As I mentioned you can load Innodb Table Clustered Index in the buffer pool pretty efficiently by using something like SELECT count(*) FROM tbl WHERE non_index_col=0 This works relatively well (though can be slow for fragmented tables) but it does not preload indexes in memory neither it does externally stored objects – BLOB and TEXT fields.

If you would like some non PRIMARY Indexes preloaded you can use something like SELECT count(*) from tbl WHERE index_col like “%0%” for each index. Only one such query per index is enough even if it is multiple column index.

To fetch BLOB/TEXT externally stored columns you can use similar query: SELECT count(*) from tbl WHERE blob_col like “%0%”. Note if you preloading BLOB/TEXT columns you do not need to use first query I mentioned because scanning potentially externally stored blobs will also scan Clustered key Anyway.

Now, say you have bunch of tables having few indexes – should you run multiple queries in parallel to get best preload speed ?
It depends – depending on key/clustered key fragmentation it may be faster to run queries one by one (keeping IO more sequential) or run multiple queries at once to get more outstanding requests at the same time – benchmark to find out.

If you just need to preload single large table you can chop it into several ranges and preload in parallel, such as SELECT count(*) FROM tbl WHERE id BETWEEN 1 and 10000000 AND non_index_col=0

0 0 votes

Article Rating

13 Comments

Oldest

Newest Most Voted

Keith Murphy

18 years ago

Thanks!! That is very helpful.

Kevin Burton

18 years ago

I used the BLACKHOLE storage engine for this. The only thing it’s really good for…

on bootup just run

CREATE TABLE BLACKHOLE_FOO LIKE FOO;
ALTER TABLE BLACKHOLE_FOO ENGINE = BLACKHOLE;
SELECT * FROM FOO ORDER BY ID INTO BLACKHOLE_FOO;

and repeat for every index.

If your tables all fit into memory you’ll be loaded in seconds.

Author

Peter Zaitsev

18 years ago

Kevin,

Too many statements 🙂 I’m also wondering if it is faster or slower than count() query as it has to go through insert code path at least on server side.

There are two other potential issues – it is possible for not innodb tables (or not primary keys) last query to pick to do filesort instead of table scan. Plus if you repeat the thing for bunch of indexes you will have considerable overhead by accessing row each time, not traversing index only.

venu

18 years ago

It will be nice if InnoDB/MySQL can implement something like …

innodb_pre_load_tables=List_of_tables

as configuration variable allowing frequently used tables to be loaded (provided it fits)

Along with something like ..

PRELOAD [FULL] TABLE foo

Kevin Burton

18 years ago

Venu,

This only works for tables that fit 100% in memory. Some people use the LRU semantics to cache their data.

It’s harder to get this working as you have to ‘warm up’ a box with new load.

One way to this could be to slowly expose it to more and more traffic.

Kevin

jim

18 years ago

couldn’t you use kevin’s blackhole idea, but by creating a temporary blackhole table using create/select?

CREATE TEMPORARY TABLE blackhole_foo ENGINE = BLACKHOLE SELECT * FROM foo ORDER BY id

it does seem a shame that ‘create table like’ doesn’t allow for specifying the engine.

Mark Rose

17 years ago

There’s an interesting patch to partially solve this problem: http://provenscaling.com/blog/2008/10/06/making-mysql-more-usable-innodb-saverestore-buffer-pool-patch/

Author

Peter Zaitsev

17 years ago

Mark,

This is nice patch but it is a patch. Plus also it is proof of concept so I’m not sure it is good enough to be used widely in production. Note as long as you start messing with data such a way you’ve got to deal with a lot of niche cases to make things work 100% – for example what happens if you crash during writing dump file ? How to deal with such file potential corruptions (store checkums and validate them) etc. But in general it is nice idea for many workloads.

Zonker Harris

18 years ago

The SELECT count(*) method relies on a side effect, that the tables are buffered during the count. The BLACKHOLE method relies on expected behavior of the system, that the table data has to be read to be inserted into the new table. Which is more likely to continue to work? The day a MySql developer optimizes “SELECT count(*)” (by recognizing that it is not necessary to load/buffer the data to count it) that method will stop working; the other will continue.

Webmaster Info

12 years ago

I do not write a great deal of remarks, but i did a few searching and wound up here Quickly preloading Innodb tables in the buffer
pool – MySQL Performance Blog. And I do have 2 questions for
you if you tend not to mind. Could it be just me or
does it give the impression like a few of the responses look as if
they are coming from brain dead people? 😛 And, if you are posting on additional places, I would
like to follow anything new you have to post.
Would you make a list of all of all your communal sites like your Facebook page, twitter feed, or linkedin profile?

Billy

11 years ago

This trick had no effect at all on query speeds when I tested it on MySQL 5.5.37. I used the “SELECT count(*) from tbl WHERE index_col like “%0%”” query for every index column.

Peter Hulstaert

11 years ago

At work we use databases that are to big to fit in memory, but we do want to preload after a restart of the service.
We usually do a
SELECT count(*) FROM TableA WHERE datecol >= curdate() – INTERVAL 366 DAY and (col1 like ‘%0%’ or col2 like ‘%0%’ or col3 ….. )

Like this we only load the current data in our memory.