Percona Server: Thread Pool Improvements for Transactional WorkloadsAlexey Stroganov
In a previous thread pool post, I mentioned that in Percona Server we used an open source implementation of MariaDB’s thread pool, and enhanced/improved it further. Below I would like to describe some of these improvements for transactional workloads.
When we were evaluating MariaDB’s thread pool implementation, we observed that it improves scalability for
AUTOCOMMIT statements. However, it does not scale well with multi-statement transactions. The
UPDATE_NO_KEY test which was run as an
AUTOCOMMIT statement and inside a transaction gave the following results:
After analysis, we identified the major cause of that inefficiency: High latency between individual statements inside transactions. This looked pretty similar to the case when transactions are executed in parallel without thread pool. Latency there is high as well, though the cause of high latency in these two cases is different.
- In the “one-thread-per-connection” case, with 1000 connections, higher latency is caused by increased contention on accessing MySQL server shared resources like structures/locks/etc.
- In the case of “pool-of-threads”, 1000 client connections will be organized into
thread_pool_sizegroups (or to be more specific into
thread_pool_sizequeues), and latency here comes not from contention as we have much smaller number of parallel threads. It comes from the execution order of individual statements in transactions. Suppose that you have 100 identical transactions (each with 4 statements in it) in the thread group queue. As transactions are processed and executed sequentially, statements of transaction T1 will be placed at the following positions in the thread pool queue: 1…101…201…301…401. So in case of a uniform workload distances between statements in the transaction will be 100. This way transaction T1 may hold server resources during execution of all statements in the thread pool queue between 1 and 401, and that has a negative impact on performance.
In an ideal world, the number of concurrent transactions does not matter, as long as we keep the number of concurrent statements sufficiently low. Reality is different though. An open transaction which is not currently executing a statement may still block other connections by holding metadata or row-level locks. On top of that, any MVCC implementation should examine states of all open transactions and thus may perform less efficiently with large numbers of transactions (we blogged about InnoDB-specific problems here and here).
In order to help execute transactions as fast as possible we introduced high and low priority queues for thread pool. Now with default thread pool settings, we check every incoming statement, and if it is from an already started transaction we put it into the the high priority queue, otherwise it will go in the low priority queue.
Such reordering allowed to notably reduce latency and resulted in very good scalability up to a very high number of threads. You can find details about this optimization in our documentation.
Now let’s check how these introduced changes will affect the workload we mentioned earlier in this article.
The next graphs show the results for the
UPDATE_NO_KEY test that was run as an
AUTOCOMMIT statement, and inside a transaction for MariaDB with thread_pool and Percona Server with the
thread_pool_high_priority mode=statements – which is very similar to behavior of thread_pool in MariaDB and Percona Server with
thread_pool_high_priority mode=transactions – optimization that performs statements reordering of the transactions in the thread pool queue.
As seen, we get nearly flat throughput with thread_pool_high_prio_mode=transactions even with very high numbers of users connections.