EmergencyEMERGENCY? Get 24/7 Help Now!

A Look at MariaDB Subquery Cache

 | March 2, 2017 |  Posted In: MariaDB, MySQL

PREVIOUS POST
NEXT POST

MariaDB Subquery CacheThe MariaDB subquery cache feature added in MariaDB 5.3 is not widely known. Let’s see what it is and how it works.

What is a subquery cache?

The MariaDB subquery cache optimizes the execution of correlated subqueries. Correlated subqueries refer to a value from the parent query. For example:

MariaDB only uses this optimization if the parent query is a SELECT, not an UPDATE or a DELETE. The subquery results get cached only for the duration of the parent query.

MariaDB added the subquery cache in v5.3. It is controlled by optimizer_switch, a dynamic variable that contains many flags that enable or disable several optimizations. To disable the subquery cache, run:

You can also do this at the session level.

How does subquery cache work?

Let’s see how it works. To make things clearer, we will use an example. Consider these tables:

Now, we issue this query:

The server decides to read t1 first (the bigger table, as expected), and then access t2 using the subquery cache. It creates a MEMORY temporary table to store the results of the subquery, with an index on c (it is used to match the rows). Then it reads the first row from t1, and checks if the search is cached. It is not, so it reads t1 looking for rows with c=1 and copies the results into the cache. The next time it will find the value 1, and it will not need to access t2 because the matches are already cached. If you look at the data, you may notice that the value “5” appears twice in t1 (and is absent in t2). But the search is cached anyway, so the server searches for 5 in t2 only once.

I hope that you aren’t blindly accepting what I wrote until now: good DBAs need facts and metrics. Let’s be scientific: we’ll make a prediction, conduct an experiment and check the status variables to verify the prediction. The prediction is the list of rows that will be read from t1, with the running total of hits and misses:

The totals match, and the number of writes to the cache is equal to the misses (after a miss, a table access is done and cached).

The maximum size of an individual table is the minimum of tmp_table_size and max_heap_table_size. If the table size grows over this limit, the table is written to disk. If the MEMORY table creation fails (perhaps because MEMORY does not support BLOB), the subquery is not cached.

The total of hits and misses can be seen by reading two status variables: subquery_cache_hit and subquery_cache_miss. After 200 misses, the server checks the hit ratio for that particular subquery. If it is < 20%, it disables the cache for that subquery. If the hit rate is < 70% the table cannot be written to disk in case it exceeds the size limit. These numbers (200, 0.2, 0.7) are hardcoded and cannot be changed. But if you really want to test how MariaDB behaves with different parameters, you can change these constants in sql/sql_expression_cache.cc and recompile the server.

Isn’t this subquery materialization?

Subquery materialization is another strategy that the optimizer can choose to execute a query. It might look similar, because some data from a subquery are written to a MEMORY table – but this is the only similarity. The purpose and implementation of subquery materialization is different.

Let’s try to explain this with pseudocode.

Subquery materialization is for uncorrelated IN subqueries. Therefore the subquery is executed and materialized altogether:

The subquery cache is for correlated subqueries. Thus the subquery gets executed only for non-cached values:

Some considerations

Despite the similarity in names, the MariaDB subquery cache is not a query cache for subqueries. These features are different, implemented for different purposes. Obviously, the subquery cache doesn’t have the scalability and performance problems of the query cache (global mutex, table invalidation). As mentioned, a subquery cache table only survives for the duration of a statement, so it should be considered an optimizer strategy. For example, in some cases you might use the subquery cache for a WHERE … NOT IN subquery, but not for the WHERE … IN version, because the optimizer prefers to rewrite it as a JOIN.

Of course, not all correlated subqueries automatically benefit from this feature. Consider the example above: it is built to show that the subquery cache is useful. But we can easily build an example to show that can have a negative impact on performance: add rows to t1, and delete all duplicate values of a. There will be no hits, a temporary table is created, 200 reads and writes are performed, but it won’t help. After 200 misses, the cache will be disabled, yes, but what if this happens for each subquery? The damage may not be huge in a realistic case, but it is still damage. That’s why you can disable the MariaDB subquery cache.

PREVIOUS POST
NEXT POST

4 Comments

  • I wonder why this is
    > SET GLOBAL optimizer_switch=’subquery_cache=OFF’;
    not
    SET GLOBAL subquery_cache=OFF
    ?

    This way we will end up with something like
    SET GLOBAL optimizer_switch='subquery_cache="feature1='feature2=true'"';

    • Hi FractalizeR,
      Fortunately, you don’t need to set all flags of optimizer_switch. You only specify the flags that you want to change.

  • Hi Federico,

    Thanks for the post.

    If I understand correctly where it says: “It is not, so it reads t1 looking for rows with c = 1 …” you should say: “It is not, so it reads t2 looking for rows with c = 1 …”, right?. Thank you.

Leave a Reply