September 22, 2014

A workaround for the performance problems of TEMPTABLE views

MySQL supports two different algorithms for views: the MERGE algorithm and the TEMPTABLE algorithm. These two algorithms differ greatly. A view which uses the MERGE algorithm can merge filter conditions into the view query itself. This has significant performance advantages over TEMPTABLE views. A view which uses the TEMPTABLE algorithm will have to compute the entire contents of the view before any filter is applied. Computing the entire contents requires a temporary table and many more rows may be accessed than otherwise would had the filter been merged into the view query.

As long as a view avoids aggregation, DISTINCT, GROUP BY, ORDER BY, etc, then it can use the MERGE algorithm. Unfortunately, this means that views of any significant complexity will almost always use the TEMPTABLE algorithm.

This blog post demonstrates a workaround that allows the convenience of complex views, including those with aggregation (that is views which use the TEMPTABLE algorithm) with some of the performance advantages of MERGE algorithm views.

As a demonstration, consider the following table which contains a combination of integer values. There are enough values in the table such that a scan of all rows takes a second or two.

If we define a view which uses no aggregation, then MySQL will let us use the MERGE algorithm. Notice that there are no warnings:

If we then write a SQL query to aggregate over the rows of the view and we add a filter condition on c1, the view is accessed quicky:

We can use EXPLAIN EXTENDED to see that MySQL rewrites the view query to include the restriction:

Notice the rewritten SQL query:

Now, lets try to use the MERGE algorithm with an aggregation query:

Notice that MySQL sets the ALGORITHM=UNDEFINED, and will chose to use TEMPTABLE when the view is actually used:

When you try to restrict the second view using a filter on c1, performance is not good. MySQL is scanning the entire table to produce the result:

Even a query with an impossible where clause takes a long time to process:

You can see that MySQL is accessing millions of rows:

Now to the actual workaround. MySQL won’t merge query filters into the view definition, but we can modify the physical definition of the view in a way which provides a different means of providing filter conditions to the query.

Lets start by creating a parameter table. This table might be view specific or you could construct it in such a way that it can be shared between views. This example is somewhere in between those two extremes:

Notice that MySQL still won’t let us use the MERGE algorithm (1 warning):

There are two important things to consider here. First, notice the use of connection_id() in the view definition. This makes sure that the only rows examined in the params table are those which belong to this connection. When the table is inserted too (see below) the connection_id() function will be used to generate the value for the connection_id column. Also notice the restriction that v1.c1 = p.param1_val. This restricts the query to only those rows where v1.c1 matches the value in the params table.

Place parameters into the params table using the REPLACE statement:

When MySQL executes the new view it will be able to restrict the rows examined based on the param1_val value in the params table:

Notice the vastly different plan over the old one. We no longer examine millions of rows:

In review, I like views for their convenient encapsulation of SQL logic that would usually have to be repeated in SQL statements throughout the application otherwise. The convenience sometimes comes with high cost though, particularly when the TEMPTABLE algorithm is used. The technique outlined in this post can be used to make a trade off between increased query complexity for improved view performance.

About Justin Swanhart

Justin is a Principal Support Engineer on the support team. In the past, he was a trainer at Percona and a consultant. Justin also created and maintains Shard-Query, a middleware tool for sharding and parallel query execution and Flexviews, a tool for materialized views for MySQL. Prior to working at Percona Justin consulted for Proven Scaling, was a backend engineer at Yahoo! and a database administrator at Smule and Gazillion games.

Comments

  1. tobi says:

    It’s unbelievable how bad the mysql query execution engine is. MS SQL, Oracle and even Postgres do not have problems like this at all. Of course the can optimize across views. And they have merge and hash join so you are not forced to join manually in PHP like some very big sites are forced to… I cannot believe that MySql still does not have a better query execution engine.

  2. peter says:

    Justin,

    Nice and creative workaround. I never thought about it. Though I’m not sure I would use it – VIEWs can simplify how things look and how developers can access data, in this case it however complicate things a lot. Why not to use direct query in this case ?

  3. Justin Swanhart says:

    Hi Peter,

    Mostly it was just a cool trick that I thought up and I thought I would share it. There are lots of limitations on versatility such as it wouldn’t really work if you had read_only=true on a slave for example.

    One place where I think it might be useful is in ETL or ELT processes. Views can be very convenient for ETL purposes as they abstract the SQL for sometimes complex relationships. Adding a parameter table for such an ETL process would be pretty straightforward as the ETL flow could simply insert parameters into the parameter table. Such parameters might be used to get the total order amount for a particular order number that has changed for example.

  4. Justin,

    Nice one. I’ve encountered a similar solution, please see Views: better performance with condition pushdown.
    Towards the end I present with a use case I’ve seen.

  5. Justin Swanhart says:

    Shlomi,

    Wow, nice post Shlomi. I didn’t think about the fact that stored functions or UDF could be called in the query and provide session state information.

  6. Justin Swanhart says:

    I have a UDF around somewhere called get_session_var(). I need to dig that up.

  7. Olaf says:

    Hi, good idea but what to you suggest in tables larger than 2 GB. In our case Mysql creates a temp table on the hard drive (we defined 1GB temp table size) so the query takes 7 sec.

    Any idea?

    Olaf

Speak Your Mind

*