MySQL Limitations Part 4: One thread per connection

This is the fourth in a series on what’s seriously limiting MySQL in core use cases (links: part 1, 2, 3). This post is about the way MySQL handles connections, allocating one thread per connection to the server.

MySQL is a single process with multiple threads. Not all databases are architected this way; some have multiple processes that communicate through shared memory or other means. It’s cheap to create a connection to MySQL, because it just requires creating a thread (or taking one from a cache). This is generally so fast that there isn’t really the need for connection pools as there is with other databases, at least not in the same way. Windows in particular has had excellent threading support practically forever; Linux has very good threading now, but that wasn’t always the case.

However, many development environments and programming languages really want a connection pool. They’re just built that way (I’m looking at you, Java). And many others use persistent connections by default, so that a connection isn’t really closed when the application closes it; it’s kind of like a connection pool, except that the connection is persisted from request to request within the same process, rather than being shared with whichever request happens to need a connection.

Connection pools and persistent connections combined with a large number of application servers can lead to a situation where the database server has a very large number of connections open to it, most of which are doing nothing. It’s not uncommon for me to see a server with 1000 to 5000 connections open, and maybe one to three are actually running queries on average. These connections originate from dozens to hundreds of application server instances. When you have a heavily sharded or otherwise horizontally scaled application, it’s not only easy to get into this pickle, it’s really hard or impossible to avoid it.
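
A quick way to see whether a server fits this pattern is to compare a couple of status counters; the numbers in the comments below are illustrative, not from a real server:

    -- How many connections (and therefore threads) exist, versus how many are doing work
    SHOW GLOBAL STATUS LIKE 'Threads_connected';   -- e.g. 4987
    SHOW GLOBAL STATUS LIKE 'Threads_running';     -- e.g. 3
    -- Or eyeball the process list; most connections will show Command = Sleep
    SHOW PROCESSLIST;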

And with 5000 connections open, you get 5000 threads in the server. That increases the overhead from thread scheduling, and potentially memory usage as well. I feel like I’m forgetting some reasons that this matters — please fill in whatever’s missing in the comments.

There is more than one possible solution to this problem, but the one that’s actually (partially) implemented is a pool of threads, which was originally coded for MySQL 6.0 and is now available in MariaDB.
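
You can check which connection-handling model a server is using by looking at the thread_handling variable; a stock MySQL build reports one-thread-per-connection, and a MariaDB server started with the thread pool enabled reports pool-of-threads:

    -- Which connection-handling scheduler is this server using?
    SHOW GLOBAL VARIABLES LIKE 'thread_handling';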

Unfortunately it isn’t a full solution, because it can cause undesirable lock-out or waiting, and the specific implementation has a scalability bottleneck on multicore servers. Mark Callaghan has done much more investigation of the pool of threads than I have. There are more details in this blog post by Mark, and two followup blog posts from Tim Cook (1, 2).

Thanks for the great comments on the last post. Some of them were good guesses. Remember that the context for this series isn’t micro-limitations or edge-case badness (even if they are serious in some cases), but rather a focus on shortcomings in the main use cases for the server.

There are a lot of things MySQL doesn’t do well, but it doesn’t matter that much, because that’s not what it’s designed for. Wrong tool, wrong use, NotABug. I’m thinking of the lack of sort-merge joins or intra-query parallelism, for example. It would be lovely to have those things if you’re running a data warehouse on MySQL, and in some cases for other uses too (note that most databases that do have these query plans usually try to use nested-loop joins whenever possible, because of things like the lower startup cost for the query). But MySQL isn’t a data warehouse DBMS first and foremost. It’s a general-purpose OLTP database server that runs well on affordable hardware and is great for Web usage. It’s so good, in fact, that it can be used for tons of other things such as… data warehousing. But it isn’t a Netezza or ParAccel, and if it were, it wouldn’t be a great OLTP web database too.

MySQL replication is one of the core, fundamental features — and it’s single-threaded and relies on the binary log, which are two major limitations. And it has subqueries, which are a core, fundamental part of SQL — but it’s bad at certain kinds of them. That’s why I listed those as major limitations. And because MySQL is a multi-threaded database for Web usage that tends to be used in sharded environments with tons of application servers, which creates a situation with many thousands of connections to the database, and because it doesn’t handle that very well, I list its one-thread-per-connection design as a serious limitation.

Comments

  1. Raine says

    It’s not just about scaling up through a connection pool on MySQL’s incoming/outgoing network I/O handling side.
    We need more asynchronous behavior in the client-server communication (something like what Postgres does).

    It would just be great to register callback functions on the client side to handle events: query completion, query handling exceptions, query processing status, and so on. Everything asynchronous. I bet this would make the MySQL server use its internal buffers more efficiently and keep its caches warm, thus improving performance and efficiency.

    Any future possibility of having this functionality/paradigm in the client libs?

    Regards,
    Raine

  2. says

    For the Java connection pool issue you mention, it’s considered bad form to keep many unused connections in the pool. The normal solution is to tune the pool to drop them after, say, 60 seconds, and also to put an upper limit on the number of connections the pool will hand out before blocking.
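
    A server-side complement (not a substitute for pool tuning) is MySQL’s wait_timeout, which disconnects sessions that have been idle longer than the threshold; the 60 seconds here is only an example value:

        -- Disconnect connections that have been idle for more than 60 seconds
        SET GLOBAL wait_timeout = 60;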

    This is a good set of articles, by the way. If it isn’t on your list already, I would add the fact that MySQL lacks good support for online backup. The pg_start_backup()/pg_stop_backup() functionality from PostgreSQL would be great in MySQL, though probably very hard to implement in a way that works with multiple engines.

  3. Patrick Casey says

    I think connection pooling has a use in high-performance applications, regardless of what the setup/teardown cost for a connection is going to be on the server. Even on a server with *very* cheap connection creation, it’s still going to be an extra round trip between the client and the server. Depending on your wire protocol implementation, it might even require a fresh TCP handshake, which is going to add another 3 round trips.

    If you’re on a loopback interface the round-trip overhead is minuscule, but even on my switched network, for example, I’m looking at maybe 100 microseconds of ping time between the app and database server. Add in 3 round trips for a handshake and 1 round trip to establish the connection, and we’ve added 400 microseconds to my database interaction.

    Doesn’t sound like much, but if most of your queries are fetches by primary key (server time 500 active connections against them and I haven’t noticed a correlation between connection count and performance (good or bad), but that doesn’t mean I’m not taking a performance hit, I just might not have noticed it.

  4. Mark Matthews says

    I agree with Robert… Connection pools aren’t the anti-pattern; in fact, I usually recommend they be used as a form of rate-limiting queue for a Java application. The anti-pattern is a connection pool configured to hold idle connections for any longer than it takes to smooth out arrival-rate spikes.

  5. says

    Robert, I forgot to mention: my imaginary scenario is 5000 connections, 4996 of them in Sleep status, but the maximum time any of them has been sleeping is 1 second :-) So they are used and reused quite often — the connections themselves are used a lot — but the utilization (busy time divided by wall-clock time) is obviously very low, and thus they should be perfectly shareable across a pool of threads, if properly designed.

    Patrick, you are right of course, there is a benefit to keeping a pool of connections open. I see now that I implied it’s bad, but I didn’t really mean to say that. I really only meant to point out that “connections are cheap and fast!” became a mindset among PHP developers early on.

  6. Mark Matthews says

    Baron,

    Of course one can’t easily fix the utilization of MySQL server-side threads from the client side, unless you start to re-architect your application to pool work, not connections, and keep all connections busy all of the time by filling them with work (i.e., make the interaction with persistence asynchronous from the client’s point of view). This does cause issues with most people’s notions of transactions, so I guess we’re all waiting for a workable server-side solution to increase per-thread utilization.

  7. says

    Heh, I didn’t realize it before, but it seems that we are actively working on all 4 of your major limitations in MariaDB (and you pointed that out in at least 3 of the posts). So maybe we are doing something right!

    The single-threaded replication is not actually being worked on yet, but Kristian’s work on a generic replication API should enable someone to work on that more easily.

  8. says

    Henrik, yes — a series like this could quickly turn into a MySQL-bashing party, so I am trying to be careful to point out solutions and work-in-progress, where I know of it. I want this to be informative but not negative :-)

  9. says

    Hi!

    Thread per connection is not really the issue; the issue is that the threads are not being scheduled on events. The framework in 6.0 never worked; all it ever did was allow you to swap a THD around on threads. At the beginning of each query it set some global pointers (yuck!) and then executed. It never released the context until it was done. What happened? Deadlocks galore.

    I deleted that scheduler out of Drizzle before we went beta. Under load testing it locked up constantly, especially if you did anything outside of autocommit. We fixed some of the issues with this, and while our sessions are not held strongly to a thread, I know that until locking understands the concept of a sleeping I/O, you will have a mess if you attempt to do much on this front.

    Anything MySQL-derived has a long way to go before it can achieve an event-based I/O and threading system like what memcached and others have.

    Cheers,
    -Brian

  10. says

    Reading the blog posts from Mark and Tim as well as Brian’s comment, it seems as if pool-of-threads does not solve the real problem, which is lock granularity. Plus, it’s a hard model to debug: you can’t track user sessions easily in a debugger, and it is also more difficult to reason about concurrency outside of a thread model. Why not get sessions to work properly and then fix whatever is left by improving applications?

  11. says

    Baron, well it depends if your glass is half empty or half full. What I see are suggestions of what should be on a roadmap. It seems these limitations are rather commonly accepted, so then it is a good list of what one should be working on.

    As for bashing, much of this is still vaporware for MariaDB too. Only pool-of-threads is actually in MariaDB 5.2, and we were already aware of the criticism presented in this thread as well. Actually, this reminds me of a MySQL saying, “Are you part of the problem or part of the solution?”, often used to indicate you shouldn’t talk about problems. In my opinion, it is important to first identify a problem, so that it can then be fixed. So no, I don’t see talking about limitations as bashing.

  12. says

    Connection pools and persistent connections share a weakness: they must fully reset the connection state between uses.

    Otherwise, state held from a previous unrelated session can affect behaviour, and that’s really bad.

    MySQL has no SQL-level “reset connection” command, and many applications like to do a few commands at startup (for example SET NAMES utf8 is common). Because of this, in a pooled application, I typically see either several commands to reset the connection, or bugs caused by unwanted state.

    Yes, you can use mysql_change_user, then re-issue any SET NAMES etc. Is this quicker than reconnecting? It will involve at least two round trips anyway.
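
    For illustration, the kind of reset sequence a pool ends up replaying on checkout looks something like this; the exact statements depend on the application:

        -- Re-establish the session state a fresh connection would have had
        ROLLBACK;                        -- discard anything the previous user left open
        SET autocommit = 1;
        SET NAMES utf8;
        SET SESSION sql_mode = DEFAULT;  -- reset to the global value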

    Connection pools / persistent connections also suffer from a “stale connection” problem, where a connection which has not been used for a while will become unusable, either because the server has disconnected it (wait_timeout etc) or because a stateful firewall has timed out its connection (or otherwise dropped it).

    So many apps do a “ping” on reuse of an old connection to make sure it’s still alive. This adds another round trip.
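
    The ping is usually just a trivial statement whose only purpose is to prove the connection is still alive, for example:

        SELECT 1;   -- cheap liveness check before reusing a pooled connection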

    Persistent connections often cause grief, and are mostly useless.

    That’s my opinion.

  13. Nickolay Ihalainen says

    The real problem with many connections is unused memory buffers (like the sort/join buffers) and the additional lock increments/decrements in the one_thread_per_connection_scheduler checks. Maybe the per-connection thread should just auto-release its buffers on a timeout? It could also be useful to take a lock (or locks) on the thread count only after all thread initialization completes successfully. If the priority of connection threads is lower than the priority of the system mysql threads, a thread storm isn’t really a problem on a modern OS.
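
    The buffers being referred to are per-session settings such as these, which you can inspect on any stock server:

        -- Per-connection / per-operation buffer sizes
        SHOW VARIABLES LIKE 'sort_buffer_size';
        SHOW VARIABLES LIKE 'join_buffer_size';
        SHOW VARIABLES LIKE 'read_buffer_size';
        SHOW VARIABLES LIKE 'read_rnd_buffer_size';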

  14. yakup says

    I think this post is very good; I was searching for information on MySQL. However, other writers have said different things about this topic. :)
