Buy Percona ServicesBuy Now!

Using ProxySQL and VIRTUAL Columns to Solve ORM Issues

 | March 29, 2018 |  Posted In: MySQL, ProxySQL

PREVIOUS POST
NEXT POST

ProxySQL and VIRTUAL columnsIn this blog post, we’ll look at using ProxySQL and VIRTUAL columns to solve object-relational mapping (ORM) issues.

There are a lot of web frameworks all around. Programmers and web designers are using them to develop and deploy any website and web application. Just to cite some of the most famous names: Drupal, Ruby on Rails, Symfony, etc.

Web frameworks are very useful tools. But sometimes, as with many human artifacts, they have issues. Any framework has its own queries to manage its internal tables. While there is nothing wrong with that, but it often means these queries are not optimized.

Here is my case with Symfony 2 on MySQL 5.7, and how I solved it.

The sessions table issue

Symfony has a table to manage session data for users on the application. The table is defined as follow:

The expiration time of the user session is configurable. The developers decided to configure it to be one month.

Symfony was serving a high traffic website, and very soon that table became very big. After one month, I saw it had more than 14 million rows and was more than 3GB in size.

Developers noticed the web application sometimes stalling for a few seconds. First, I analyzed the slow queries on MySQL and I discovered that sometimes Symfony deletes inactive sessions. It issued the following query, which took several seconds to complete. This query was the cause of the stalls in the application:

The query is not optimized. Let’s have a look at the EXPLAIN:

Every DELETE query was a full table scan of more than 14 million rows. So, let’s try to improve it.

First workaround

Looking around on the web and discussing it with colleagues, we’ve found some workarounds. But none of them was the definitive solution:

  1. Reduce expiration time in Symfony configuration. Good idea. One month is probably too long for a high traffic website. But we kept the expiration time configured at one month because of an internal business policy. But even one week wouldn’t have solved the full table scan.
  2. Using a different database solution. Redis was proposed as an alternative to MySQL to manage session data. This might be a good solution, but it could involve a long deployment time. We planned a test, but the sysadmins suggested it was not a good solution to have another database system for such a simple task.
  3. Patching Symfony code. It was proposed to rewrite the query directly into the Symfony code. Discarded.
  4. Create indexes. It was proposed to create indexes on sess_time and sess_lifetime columns. The indexes wouldn’t get used because of the arithmetic addition on the where clause. This is the only condition we have on the query.

So, what do we do if everything must remain the same? Same configuration, same environment, same query issued and no indexes added?

Query optimization using a virtual column

I focused on how to optimize the query. Since I was using 5.7, I thought about a generated virtual column. I decided to add a virtual column in the sessions table, defined as sess_time+sess_lifetime (the same as the condition of the query):

Any virtual column can have an index on it. So, I created the index:

Note: I first checked that the INSERT queries were well written in Symfony (with an explicit list of the fields to insert), in make sure this modification wouldn’t cause more issues. Making a schema change on a table that is in use by any framework, where the queries against the table are generally outside of your control, can be a daunting task.

So, let’s EXPLAIN the query rewritten as follows, with the condition directly on the generated indexed column:

The query now can to use the index, and the number of rows selected are the exact number of the session that we have to delete.

So far, so good. But will Symfony execute that query if we don’t want to modify the source code?

Using ProxySQL to rewrite the query

Fortunately, we already had ProxySQL up and running in our environment. We were using it just to manage the master MySQL failover.

One of the very useful features of ProxySQL is the ability to rewrite any query it receives into another one based on rules you can define. You can create queries from very simple rules, like changing the name of a field, to very complex queries that use a chain of rules. It depends on how complex the translation is that you have to do. In our case, we just needed to translate sess_time + sess_lifetime into sess_delete. The rest of the query was the same. We needed to define a very simple rule.

Let’s see how to create the rewrite rules.

Connect to the proxy:

Define the rewrite rule by inserting a record into the mysql_query_rules table:

The two fields I want to focus on are:

  • match_pattern: it defines the query to be matched using the regular expression notation. The + symbol must be escaped using because it’s a special character for regular expressions
  • replace_pattern: it defines how to rewrite the matched query. 1 is the value of the parameter matched by match_pattern into (.*)

For the meaning of the other fields, have a look at https://github.com/sysown/proxysql/wiki/ProxySQL-Configuration.

Once created, we have to save the rule to disk and put it on runtime to let it run effectively.

After that, the proxy began to filter the query and rewrite it to have a better execution plan using the index on the virtual column.

Note: pay attention when you need to upgrade the framework. If it needs to rebuild the database tables, you will lose the virtual column you’ve created. Just remember to recreate it and check it after the upgrade.

Conclusion

Developers love using web frameworks because they are very powerful in simplifying development and deployment of complex web applications. But for DBAs, sometimes internal queries can cause a bit of a headache because it is not well optimized or because it was not supposed to run in your “huge” database. I solved my case using ProxySQL and VIRTUAL columns with a minimal impact on the architecture of the system we had and avoided any source code patching.

Take this post as a tip in case you face similar issues with your application framework.

PREVIOUS POST
NEXT POST
Corrado Pandiani

Prior to joining Percona as a Senior Consultant, Corrado spent more than 20 years in developing web sites and designing and administering MySQL. He is a MySQL enthusiast since version 3.23 and his skills are focused on performances and architectural design. He's also a trainer and a MongoDB consultant.

2 Comments

  • I don’t understand why you use ProxySQL to rewrite the query because the optimizer can directly use the index on the virtual column (but you inverted the two fields):

    ALTER TABLE sessions ADD COLUMN sess_delete INT UNSIGNED GENERATED ALWAYS AS ((sess_lifetime + sess_time)) VIRTUAL;

    ALTER TABLE sessions ADD INDEX(sess_delete);

    mysql> EXPLAIN DELETE FROM sessions WHERE sess_lifetime + sess_time < 1521025847\G
    *************************** 1. row ***************************
    id: 1
    select_type: DELETE
    table: sessions
    partitions: NULL
    type: range
    possible_keys: sess_delete
    key: sess_delete
    key_len: 5
    ref: const
    rows: 5
    filtered: 100.00
    Extra: Using where
    1 row in set (0.00 sec)

    https://dev.mysql.com/doc/refman/5.7/en/generated-column-index-optimizations.html

    • Yes, you’re right.
      Inverting columns was for sure an oversight in this case.

      Consider the post as a general hint instead. A hint on how to cope with similar issues when dealing with ORMs or some legacy software you are not allowed to modify or you don’t want to modify for many valid reasons.

      The primary aim was to outline the proxysql’s query rewriting capability.

      This was a trivial case but there could be more complex cases of queries in which the optimizer can’t use the index on the generated column, or even a generated column is not needed at all, but you still have to rewrite the query in order to achieve a better execution plan. Proxysql is a very helpful tool in such cases.

Leave a Reply