Reality of Innodb Caching

April 22, 2011

Author

Peter Zaitsev

Insight for DBAs

MySQL

I have mentioned a few times that Innodb caches data in pages, so even if your working set consists of relatively few rows, your working set in terms of pages can be rather large. Now I have run a little benchmark to show it in practice. I’m using the standard “sbtest” table with 10 million rows and a data file of 2247098368 bytes, which gives us 224 bytes of gross storage per row, including all overhead. The actual row size in this table is smaller, but let's use this number for our math. For the benchmark I’m using a fixed number of random IDs which are repeatedly selected in random order, which illustrates a data set with some randomly distributed “hot” rows. I read every row in the set once before timing, so when there is enough memory to cache every single row there should not be any disk reads during the benchmark run itself.
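The procedure described above can be sketched roughly as follows. This is not the script used for the post, only a minimal illustration of the method: the `lookup` callable, the function name `run_benchmark`, and the figure of 100,000 lookups per run (inferred from the reported seconds and lookups per second) are all assumptions, and the demo uses an in-memory stand-in instead of a real `SELECT` by primary key against sbtest.

```python
import random
import time

TOTAL_KEYS = 10_000_000      # rows in sbtest, as in the post
LOOKUPS_PER_RUN = 100_000    # assumption inferred from the reported timings


def run_benchmark(lookup, total_keys, hot_count, lookups, seed=0):
    """Time repeated random lookups over a fixed set of randomly chosen IDs."""
    rng = random.Random(seed)
    # Pick the "hot" set: hot_count distinct IDs scattered over the whole table.
    hot_ids = rng.sample(range(1, total_keys + 1), hot_count)

    # Warm-up pass: read every hot row once so it can be cached before timing.
    for row_id in hot_ids:
        lookup(row_id)

    # Timed pass: select the same hot IDs repeatedly, in random order.
    start = time.perf_counter()
    for _ in range(lookups):
        lookup(rng.choice(hot_ids))
    elapsed = time.perf_counter() - start
    return hot_ids, elapsed


if __name__ == "__main__":
    # Hypothetical stand-in for a primary key SELECT against the real table.
    fake_table = {}

    def fake_lookup(row_id):
        return fake_table.get(row_id)

    hot = 100
    while hot <= 102400:
        _, elapsed = run_benchmark(fake_lookup, TOTAL_KEYS, hot, LOOKUPS_PER_RUN)
        rate = LOOKUPS_PER_RUN / elapsed
        print(f"Testing {hot} out of {TOTAL_KEYS} keys  "
              f"{elapsed:.2f} seconds,  {rate:.2f} lookups per second")
        hot *= 2
```

To run it against a real server, `fake_lookup` would be replaced by a function that executes the primary key query through whatever MySQL client library is in use.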

I’m using a 128M buffer pool for this test, which should fit roughly 500K rows of 224 bytes each. Let's see what the benchmark really shows:

Testing 100 out of 10000000 keys  24.79 seconds,  4034.53 lookups per second
Testing 200 out of 10000000 keys  25.66 seconds,  3896.96 lookups per second
Testing 400 out of 10000000 keys  25.03 seconds,  3995.65 lookups per second
Testing 800 out of 10000000 keys  24.40 seconds,  4097.73 lookups per second
Testing 1600 out of 10000000 keys  24.83 seconds,  4026.59 lookups per second
Testing 3200 out of 10000000 keys  25.47 seconds,  3926.65 lookups per second
Testing 6400 out of 10000000 keys  26.51 seconds,  3771.73 lookups per second
Testing 12800 out of 10000000 keys  386.20 seconds,  258.94 lookups per second
Testing 25600 out of 10000000 keys  640.12 seconds,  156.22 lookups per second
Testing 51200 out of 10000000 keys  775.38 seconds,  128.97 lookups per second
Testing 102400 out of 10000000 keys  841.65 seconds,  118.81 lookups per second

As we see, in this case the database can really cache only somewhere between 6400 and 12800 different rows, which is about 1/50 of the “projected size”. This number is very close to what I would have estimated: with 224 bytes per row we have some 70 rows per page, so with random distribution you would expect the database to fetch up to 70 times more data than you actually need.
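The estimate above can be written out as a few lines of arithmetic. This is a rough model, not a measurement. It assumes the default 16K Innodb page, reads "128M" as 128 MiB, treats the whole data file as evenly filled row pages (ignoring secondary index and non-leaf pages), and uses the usual expected-distinct-pages formula for uniformly random picks. All names are illustrative.

```python
PAGE_BYTES = 16 * 1024                  # assumption: default 16K page size
DATA_FILE_BYTES = 2_247_098_368         # data file size quoted in the post
BUFFER_POOL_BYTES = 128 * 1024 * 1024   # assumption: "128M" means 128 MiB
ROW_BYTES = 224                         # gross bytes per row from the post

# Row-level view: how many 224-byte rows the pool could hold if it cached rows.
naive_rows = BUFFER_POOL_BYTES // ROW_BYTES       # 599186

# Page-level view: what the pool actually caches.
rows_per_page = PAGE_BYTES // ROW_BYTES           # 73
table_pages = DATA_FILE_BYTES // PAGE_BYTES       # 137152
pool_pages = BUFFER_POOL_BYTES // PAGE_BYTES      # 8192


def expected_pages(hot_rows, pages=table_pages):
    """Expected number of distinct pages touched by hot_rows random rows."""
    return pages * (1.0 - (1.0 - 1.0 / pages) ** hot_rows)


print("naive row capacity:", naive_rows)
print("rows per page:", rows_per_page, "pages in pool:", pool_pages)
for hot_rows in (6400, 12800):
    need = expected_pages(hot_rows)
    verdict = "fits" if need <= pool_pages else "does not fit"
    print(f"{hot_rows} hot rows touch about {need:.0f} pages: {verdict}")
```

Under these assumptions the pool holds 8192 pages. A random set of 6400 rows is expected to touch somewhat fewer pages than that, while 12800 rows touch noticeably more, which is consistent with where the benchmark falls off.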

I’m wondering if any other storage engine can show better results in such a benchmark. Falcon, with its plans for a row cache, would fare better, and I would also expect better results with PBXT. I should also check the smaller page sizes available in Percona Server; my expectation is that with a 4K page size I can fit 4x more distinct rows in my cache.
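The 4x expectation can be checked on paper with a small sketch. It is only a model under stated assumptions: the data file keeps the same size at either page size, "128M" is 128 MiB, every page is an evenly filled row page, hot rows are uniformly random, and per-page overhead is ignored. The helper name `max_hot_rows` is made up for this illustration.

```python
import math

DATA_FILE_BYTES = 2_247_098_368         # data file size quoted in the post
BUFFER_POOL_BYTES = 128 * 1024 * 1024   # assumption: "128M" means 128 MiB


def max_hot_rows(page_bytes):
    """Largest random hot set whose expected distinct pages fit in the pool."""
    table_pages = DATA_FILE_BYTES // page_bytes
    pool_pages = BUFFER_POOL_BYTES // page_bytes
    # Expected distinct pages for k random rows is roughly
    # table_pages * (1 - exp(-k / table_pages)); solve that for k
    # with the left-hand side set to pool_pages.
    return int(-table_pages * math.log(1.0 - pool_pages / table_pages))


rows_16k = max_hot_rows(16 * 1024)
rows_4k = max_hot_rows(4 * 1024)
print("16K pages:", rows_16k, "hot rows")
print("4K pages: ", rows_4k, "hot rows")
print("ratio:", round(rows_4k / rows_16k, 2))
```

In this model the ratio comes out at essentially 4, because the pool covers the same fraction of the file while each cached page drags in a quarter as many cold rows. Real results would depend on the extra per-page overhead of smaller pages, which the model leaves out.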

8 Comments

tobi

15 years ago

A nice performance trick is to reassign the PK values of such a table from time to time in order to group hot rows together. That way the buffer cache is utilized highly.

Pavel Shevaev

15 years ago

Peter,

If it’s possible to fit 4x more distinct rows with 4K pages, what are the possible cons of a smaller page size?

Author

Peter Zaitsev

15 years ago

Well,

This test looks at the worst-case scenario. In the real world things are typically not that bad, and especially with a proper choice of primary key you can get much better physical data access locality.

Author

Peter Zaitsev

15 years ago

You’re right.

I’m running a set of benchmarks now, including compression and 4K page sizes.
4K pages have more overhead for LRU structures, the page hash and such, and there is more storage overhead because some data is stored per page. Finally, 4K pages can only store rows of up to about 2K on page, down from about 8K for 16K pages. See Vadim's post on different page sizes; there is really a good reason to support that.
