Where the open source community meets: Secure your spot for Percona Live Amsterdam! - Register

Downloads

Blog

How is join_buffer_size allocated?

July 5, 2010

Author

Baron Schwartz

Insight for DBAs

MySQL

Share this Post:

When examining MySQL configuration, we quite often want to know how various buffer sizes are used. This matters because some buffers (sort_buffer_size for example) are allocated to their full size immediately as soon as they are needed, but others are effectively a “max size” and the corresponding buffers are allocated only as big as needed (key_buffer_size). There are many examples of this. What about join_buffer_size?

I saw a my.cnf with a 128M join_buffer_size the other day and needed to research this quickly before I gave advice. The join buffer is a special case. Unlike many of the buffers that are allocated per-thread (i.e. per-connection), this one is allocated per-join-per-thread, in special cases. A join buffer is allocated to cache rows from each table in a join when the join can’t use an index. This is because we know that the nested loop is effectively going to do a table scan on the inner table — it has to, because there’s no index. If the query joins several tables this way, you’ll get several join buffers allocated, for example, this one will have two:

select * from a_table
join b_table on b.col1 = a.col1
join c_table on c.col2 = b.col2

select * from a_table

join b_table on b.col1 = a.col1

join c_table on c.col2 = b.col2

You can see these un-indexed queries in SHOW STATUS as Select_full_join, and you want zero of them. Anyway, back to the question about how the buffer is allocated. Its meaning is actually “minimum join buffer size.” Here’s the code, in sql/sql_select.cc in 5.1.47 source:

14176 /*****************************************************************************
14177   Fill join cache with packed records
14178   Records are stored in tab->cache.buffer and last record in
14179   last record is stored with pointers to blobs to support very big
14180   records
14181 ******************************************************************************/
14182
14183 static int
14184 join_init_cache(THD *thd,JOIN_TAB *tables,uint table_count)
14185 {
... snip ...
14268   cache->length=length+blobs*sizeof(char*);
14269   cache->blobs=blobs;
14270   *blob_ptr=0;             /* End sequentel */
14271   size=max(thd->variables.join_buff_size, cache->length);
14272   if (!(cache->buff=(uchar*) my_malloc(size,MYF(0))))
14273     DBUG_RETURN(1);           /* Don't use cache */ /* purecov: inspected */
14274   cache->end=cache->buff+size;
14275   reset_cache_write(cache);
14276   DBUG_RETURN(0);
14277 }

14176 /*****************************************************************************

14177 Fill join cache with packed records

14178 Records are stored in tab->cache.buffer and last record in

14179 last record is stored with pointers to blobs to support very big

14180 records

14181 ******************************************************************************/

14182

14183 static int

14184 join_init_cache(THD *thd,JOIN_TAB *tables,uint table_count)

14185 {

... snip ...

14268 cache->length=length+blobs*sizeof(char*);

14269 cache->blobs=blobs;

14270 *blob_ptr=0; /* End sequentel */

14271 size=max(thd->variables.join_buff_size, cache->length);

14272 if (!(cache->buff=(uchar*) my_malloc(size,MYF(0))))

14273 DBUG_RETURN(1); /* Don't use cache */ /* purecov: inspected */

14274 cache->end=cache->buff+size;

14275 reset_cache_write(cache);

14276 DBUG_RETURN(0);

14277 }

On line 14271 the server decides the size of the cache: it is the greater of the join_buffer_size or the size that’s been determined to be needed. And it’s allocated all at once. So a 128M join_buffer_size is indeed very bad!

And this leads me to an interesting possibility to run the server out of memory:

set session join_buffer_size = 1 << 30; # 1GB
select * from
(select 1 union select 1) as x1
join (select 1 union select 1) as x2
join....

set session join_buffer_size = 1 << 30; # 1GB

select * from

(select 1 union select 1) as x1

join (select 1 union select 1) as x2

join....

That should try to allocate 1GB of memory for each join. If you execute this and nothing bad happens, you might be seeing this bug: http://bugs.mysql.com/55002

0 0 votes

Article Rating

10 Comments

Oldest

Newest Most Voted

Admin

Peter Zaitsev

16 years ago

Baron,

First about allocation – it is checked but it is NOT reported if it failed which can cause someone to set large buffer and get lower performance because it is not used.
I think MySQL should warn about such failures – print stuff in the error log file or at least have a global variable counter for allocation failures (which should stay 0 for any well configured system)

A note on crashing the server – you would do a lot better by allocating very large number of smaller buffers, for example 128K or so. If Server failed to allocate 1GB it still can likely allocate few MB needed to run most of concurrent queries successfully. If however you can’t allocate even 128K any more nothing else probably will work.

Also do not forget about over commit behavior for a lot of systems – they can let you to allocate a lot of memory but may fail when you can actually use it (access it)

dalin

16 years ago

Then what is the purpose of a setting that can only be used to waste memory?

Geoff

16 years ago

Is it correct that the allocated cache size is then the maximum of *one* row of the join data and the size specified in the configuration? As I understand, that would allow it to degrade from a Block-Nested Loop Join to a Nested Loop Join should only one row fit in cache.

Shlomi Noach

16 years ago

Baron,

Very interesting! A buffer-per-join-per query is news to me. What about deallocation, then? Do all these per-join buffers live for the duration of the connection?

Thanks

Author

Baron Schwartz

16 years ago

It’s not accurate to say it can ONLY be used for that. I explained in the blog post what the server uses it for.

Author

Baron Schwartz

16 years ago

Shlomi, buffers are usually deallocated at the end in “cleaning up” state. I haven’t checked where this one is.

Geoff, I am not sure; I did not look that closely at that code to figure out its purpose.

Shahryar Ghazi

15 years ago

I disagree with “A join buffer is allocated to cache rows from each table in a join when the join canâ€™t use an index.” because even if MySQL decides to scan all index rows and/or uses range for SELECTion it prefers to use the join buffer. Also, there is another variable that determines the max size of join buffer and that is max_join_size (no. of rows to be joined not the buffer size).

Author

Baron Schwartz

15 years ago

Shahryar, you are right. The manual (http://dev.mysql.com/doc/refman/5.0/en/nested-loop-joins.html) has the following quote: “Join buffering can be used when the join is of type ALL or index (in other words, when no possible keys can be used, and a full scan is done, of either the data or index rows, respectively), or range.”

Thanks for correcting me!