How is join_buffer_size allocated?

July 5, 2010

Author

Baron Schwartz

Insight for DBAs

MySQL

Share this Post:

When examining MySQL configuration, we quite often want to know how various buffer sizes are used. This matters because some buffers (sort_buffer_size for example) are allocated to their full size immediately as soon as they are needed, but others are effectively a “max size” and the corresponding buffers are allocated only as big as needed (key_buffer_size). There are many examples of this. What about join_buffer_size?

I saw a my.cnf with a 128M join_buffer_size the other day and needed to research this quickly before I gave advice. The join buffer is a special case. Unlike many of the buffers that are allocated per-thread (i.e. per-connection), this one is allocated per-join-per-thread, in special cases. A join buffer is allocated to cache rows from each table in a join when the join can’t use an index. This is because we know that the nested loop is effectively going to do a table scan on the inner table — it has to, because there’s no index. If the query joins several tables this way, you’ll get several join buffers allocated, for example, this one will have two:

<br>select * from a_table<br>join b_table on b.col1 = a.col1<br>join c_table on c.col2 = b.col2<br>

1	<br>select * from a_table<br>join b_table on b.col1 = a.col1<br>join c_table on c.col2 = b.col2<br>

You can see these un-indexed queries in SHOW STATUS as Select_full_join, and you want zero of them. Anyway, back to the question about how the buffer is allocated. Its meaning is actually “minimum join buffer size.” Here’s the code, in sql/sql_select.cc in 5.1.47 source:

<br>14176 /*****************************************************************************<br>14177   Fill join cache with packed records<br>14178   Records are stored in tab-&gt;cache.buffer and last record in<br>14179   last record is stored with pointers to blobs to support very big<br>14180   records<br>14181 ******************************************************************************/<br>14182<br>14183 static int<br>14184 join_init_cache(THD *thd,JOIN_TAB *tables,uint table_count)<br>14185 {<br>... snip ...<br>14268   cache-&gt;length=length+blobs*sizeof(char*);<br>14269   cache-&gt;blobs=blobs;<br>14270   *blob_ptr=0;             /* End sequentel */<br>14271   size=max(thd-&gt;variables.join_buff_size, cache-&gt;length);<br>14272   if (!(cache-&gt;buff=(uchar*) my_malloc(size,MYF(0))))<br>14273     DBUG_RETURN(1);           /* Don't use cache */ /* purecov: inspected */<br>14274   cache-&gt;end=cache-&gt;buff+size;<br>14275   reset_cache_write(cache);<br>14276   DBUG_RETURN(0);<br>14277 }<br>

1

14176 /***************************************************************************** 14177 Fill join cache with packed records 14178 Records are stored in tab->cache.buffer and last record in 14179 last record is stored with pointers to blobs to support very big 14180 records 14181 ******************************************************************************/ 14182 14183 static int 14184 join_init_cache(THD *thd,JOIN_TAB *tables,uint table_count) 14185 { ... snip ... 14268 cache->length=length+blobs*sizeof(char*); 14269 cache->blobs=blobs; 14270 *blob_ptr=0; /* End sequentel */ 14271 size=max(thd->variables.join_buff_size, cache->length); 14272 if (!(cache->buff=(uchar*) my_malloc(size,MYF(0)))) 14273 DBUG_RETURN(1); /* Don't use cache */ /* purecov: inspected */ 14274 cache->end=cache->buff+size; 14275 reset_cache_write(cache); 14276 DBUG_RETURN(0); 14277 }

On line 14271 the server decides the size of the cache: it is the greater of the join_buffer_size or the size that’s been determined to be needed. And it’s allocated all at once. So a 128M join_buffer_size is indeed very bad!

And this leads me to an interesting possibility to run the server out of memory:

<br>set session join_buffer_size = 1 &lt;&lt; 30; # 1GB<br>select * from<br>(select 1 union select 1) as x1<br>join (select 1 union select 1) as x2<br>join....<br>

1	<br>set session join_buffer_size = 1 << 30; # 1GB<br>select * from<br>(select 1 union select 1) as x1<br>join (select 1 union select 1) as x2<br>join....<br>

That should try to allocate 1GB of memory for each join. If you execute this and nothing bad happens, you might be seeing this bug: http://bugs.mysql.com/55002