September 1, 2014

The Doom of Multiple Storage Engines

One of the big “Selling Points” of MySQL is support for Multiple Storage engines, and from the glance view it is indeed great to provide users with same top level SQL interface allowing them to store their data many different way. As nice as it sounds the in theory this benefit comes at very significant cost in performance, operational and development complexity. Let me touch just on the few complications it brings in

Conversions – Each storage engine has its own data storage format so MySQL has to do a lot of copying and conversions while accessing the data, which significantly limits performance compared to basically “zero copy” design one may have streaming data from memory when it fits there.
Optimizer and Execution Storage engines are not created equal especially if you look at In Memory storage engines vs Distributed ones. Trying to handle all different cases in Optimizer – such as in memory nature of MEMORY storage engine, clustering by Primary key in Innodb and distributed nature of NDB makes it more complicated when it should be and still because it has to serve so many use cases it is not tuned to exploit full performance potential of any given storage engine.
Synchronization The top level on MySQL side, such as .frm files and binary log files has to be synchronized with storage engine transactions which require a lot of complications and performance overhead. MySQL has to do several fsync() calls per transaction commit to be on the safe side and still number of operations are not fully safe – you can for example get .frm out of sync with internal data dictionary if you crash at wrong time.
Transaction Support We have to deal with mix of locks on high level as well as on storage engine internally. We have to deal with different locking strategies storage engines may have and if we venture into using cross storage engine transactions we’re up for a lot of surprises.
Backup Cross storage engine backup is a pain because they are so different. Some of them are distributed and some in memory so even OS level approaches such as LVM backup may not work in all cases. We also get into synchronization aspects making backup which prevents from having hot backup in many cases.
Replication Support for different storage engines is a pain in replication. To start we have to write second binary log file when all transaction engines have their own !
Quality Assurance Multiple storage engines are much more complicated to test. And I believe multi-storage engine use is really not tested very deeply. I’ve seen some tests being run for different storage engines, but really this is not enough as MySQL allows you using multiple storage engines in the query you need to test all kinds of combination such as running multi-table update joining PBXT and Innodb tables.
Operations Things like backup and monitoring becomes a lot more complicated plus you have to deal with balancing – allocating resources to different storage engines and performance tuning becomes more complicated.

Now what is interesting for probably 95% of applications single storage engine would be good enough. In fact people already do not love to mix multiple storage engines very actively because of potential complications involved.

Now lets think what we could have if we have a version of MySQL Server which drops everything but Innodb Storage engine (it could be NDB, PBXT etc, it does not matter).

We could save a lot of CPU cycles by having storage format same as processing format. We could tune Optimizer to handle Innodb specifics well. We could get rid of SQL level table locks and using Innodb internal data dictionary instead of Innodb files. We would use Innodb transactional log for replication (which could be extended a bit for this purpose). Finally backup can be done in truly hot way without nasty “FLUSH TABLE WITH READLOCK” and hoping nobody is touching “mysql” database any more. Single Storage Engine server would be also a lot easier to test and operate.

This also would not mean one has to give up flexibility completely, for example one can imagine having Innodb tables which do not log the changes, hence being faster for update operations. One could also lock them in memory to ensure predictable in memory performance.

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. I think this is exactly the reasoning why PostgreSQL only has one storage engine to rule them all.

    It seems like the Drizzle project is trying to address a lot of these types of performance issues, but they are keeping support for pluggable storage engines (MyISAM notably absent) and even extending it. I wonder what their take on those issues is, or if they are able to get around some of them (perhaps by rigidly defining how a storage engine must behave/store data to avoid the copying issues?). If anyone has some insight I’d love to hear it.

  2. Igor Babaev says:

    Yep. The only minor problem is to find developers to implement all this.

  3. peter says:

    Jay,

    I think Drizzle is going opposite direction with everything abstracted and modular.

  4. Arvids says:

    These are right thoughts on how to boost the InnoDB performance and MySQL as a whole. But still, MySQL is really good in cases like you have tables with lots of changing non essential data to move to MEMORY table or tables (like “online”, “peers” in Bittorrent trackers, etc). It gives a really good performance boost. NDB is second good thing to use, although it’s not used as wide as other things.
    Basicaly I thing than there shouldn’t be InnoDB only, but to leave some essential engines like InnoDB, MEMORY and NDB and make them work together better. It would be like performance boosted edition with limited capabilities on supported storage engines.

  5. Vladimir Kolesnikov says:

    Peter,

    I see your point but I think the current MySQL approach addresses the performance in a more wider scope – you cannot make generic code fit well one specific engine but as time passes and demands change this approach allows fast development of solutions (implemented in the form of engines) that address specific problems better than one highly optimized generic solution.

    I think in some sense this is the philosophy of MySQL :)

  6. peter says:

    Arvids,

    Yes. I do not question the fact there are gains you can get from using multiple storage engines. However as I stated before it is not needed for probably 95%+ of applications out there. These 95% could be served better by something more simple than that. Having 3 storage engines especially so different ones already brings most of the problems.

  7. peter says:

    Vladimir,

    Yes I understand it is philosophy of MySQL I also understand for a lot of companies this is a blessing – PBXT, TokuTek, probably would not exist if MySQL would not offer pluggable storage engines. Even Innodb itself could only be brought to users because MySQL had a storage engine interface Innodb could hook up to.

    I do not think MySQL has to change. I’m just wishing among the recent MySQL forks we would have someone focused on solid solution which focuses on simplicity and making small set of features which work really well rather than adding more and more complex features or having everything plugable.

  8. Isotopp says:

    PeterZ, most of these commercial storage engines do not exist – at least where I work. See http://mysqldump.azundris.com/archives/87-Why-commercial-storage-engines-do-not-excite-me.html

  9. peter says:

    Isotopp,

    Right. Though this is you. There are number of people using commercial storage engines and happy about it. Though of course the popularity of commercial storage engines is nowhere near popularity of GPL ones. Regarding Support Topics true Commercial Storage Engines limit third party and self support dramatically. However storage engine vendors would typically have their own support or some may make a partnership with Oracle or other third parties to support their solution.

  10. Hi Peter!

    Some of what you are referring too is just the cost of having much of the design of MySQL based around the limitations, and features, of MyISAM. You can’t entirely get rid of the concept of “multiple engines”. Why? Because you will want to have an in-memory engine for some actions (and Innodb treats standard and temporary tables as the same, which is non-optimum).

    As far as testing goes, there are many bugs in the multi-storage interface. It took us a number of months to make the test suite all work with Innodb (and fix the bugs along the way). Paul is still working on fixing issues that were found with PBXT after porting to Drizzle (since we can run a larger test suite against PBXT).

    In Drizzle we have taken some engines which cause performance and/or feature issues and made them temporary only for the time being. This solves a number of architecture issues (and we are just beginning to really make use of some of this, there is a lot of work that has to be done in order to make a number of things work). It is tough to really weigh the real cost in multiple engines until you understand how internally each is used (and how the cost adds up). At some point when we close in on our beta I will blog about some of these.

    Cheers,
    -Brian

  11. Hi Peter,

    I think Brian has a very good point when he refers to: “much of the design of MySQL based around the limitations, and features, of MyISAM”.

    It is still clear today that the storage engine API was based on a non-transactional, ISAM-like interface. There was no re-design done, when the first transactional engine was added.

    But, this is history, and I know that there were very good reasons for doing things this way at the time (for example, not wanting to destabilize the existing MyISAM implementation).

    What Drizzle has recognizing, for example, with its latest changes to the API is that the non-transactional case is simply a subset of the transactional case. For example: an engine->startTransaction() call is a no-op for a non-transactional engine.

    But this call does not exist in the standard MySQL engine API! Why not? Well because it is possible for the engine to hack in a begin transaction call by counting external_lock() calls…

    So what I am saying is this: all of the points you raised above are major sticking points. However, most of these problems can be solved by moving to a well designed API.

    And, I think it is worth it, because this is definitely something that sets MySQL/Drizzle apart from other SQL implementations.

  12. Percona has written its own storage engine, xtradb. Does this mean you are going to discontinue development on it? Or will Percona server only use xtradb in the future?

    I have no idea how to understand this post in this context; either you are going to radically change Percona Server / xtradb to be incompatible with MySQL, or you are saying one thing and doing another and being a hyprocrite.

  13. Rob Wultsch says:

    For the sake of argument why not just use Postgres?

    PG seems significantly less buggy and significantly more responsive to bugs. (#989 I am looking at you)
    PG only has a single storage engine. No multiple data dictionaries and hyper easy hot backups.
    PG seems to have fewer weird issues scaling up.
    BSD licence.
    No conference surprise features.
    No crippleware^H^H^H^H^H^H^H^H^H^H enterprise only features.

  14. peter says:

    Brian,

    Right. The bad interface for storage engine adds to the list of problems but I still believe the problems are general. How many of the items in my list does Drizzle design is posed to solve completely ?

    Now regarding temporary storage for resolving queries it does not have to be handled the same way as normal operations. At least one can argue it should not. Internal temporary tables are handled just by single thread which has created them which includes all kind of concurrently. Regarding treating In memory and on disk tables equally I think Innodb could be improved in a way it does it but again I argue for a lot of users simplicity and performance in general case can be a more important than getting last possible bits of performance in niche cases.

    Please do not get me wrong I’m not suggesting this as a path for Drizzle or MySQL my point is 95% of users would be served better by more tight knit solution.

  15. peter says:

    I see some people do not understand where this post comes from and what are Percona plans regarding the issues I mentioned. Here is example:
    http://www.pythian.com/news/12437/the-doom-of-xtradb-and-percona-server/

    First I came to write this post after serving on advisory to number of database engines, some not MySQL related at all as well as well as looking at design of NoSQL solutions as MongoDB. I was in many cases pleased how simple designs they can use because they do not have this multiple storage engine complication.

    Percona Server is not going to rip away support for other storage engines, there is no point doing that without significant changes to the architecture and result obviously will not be as compatible with MySQL as we want Percona Server to be.

    I’m simply wishing some thing like this would exist though we are not in position to jump into creating it right now.

  16. peter says:

    Paul,

    Transactional API is only one part of the story, a very small one. Think about Optimizer for example – it has to operate entirely different for local in memory data and distributed storage engine. I recognize this is major point for Drizzle and MySQL which at certain point attracted development and user community.

  17. Great post!

    I always like the idea behind the ability to use multiple storage engines. The idea that “not all data is created equal” makes sense, and it stands to reason that the is benefit in the ability to address specific needs with a specific storage solution. But over time, a few things made me feel uncomfortable:

    - if different storage engines cater to different use cases, then why is the SQL dialect, even DDL, just a fixed set? Shouldn’t we be able to configure tables backed by a particular storage with options that are specific to that engine? In the case of a storage engine plugin, shouldn’t the SQL dialect extend accordingly? Sure, there may be some table options today that apply only to a specific engine, but the syntax for these options is hard-wired in the code proper – not in any particular storage engine. Essentially this means that if you want to develop a storage engine yourself that needs specific table options, you can’t just write a plugin – you actually have to fork the server to allow the user to talk properly to your engine.

    - perhaps an extension of the previous point, shouldn’t data type support be dynamic too? It seems reasonable that one of the special advantages of a particular storage engine would be that it implements particular data types. Again, if we’d want to create an engine to support a particular specific data type, we have no option but to fork the server.

    - you already mentioned it, but indeed – what about the optimizer? Shouldn’t the optimizer be extensible and pluggable too? Column-oriented storage engine vendors like infobright and calpont all had to put in their optimizer next to the native one in order to benefit from their engine. Is that just a corner case? WOuld it even be possible to come up with an optimizer design that is so generic that it is even possible to fully benefit from different engines?

    (I am not pretending I have any answers – just voicing concerns about the current state of the ‘pluggable’ SE)

  18. Hi Roland

    The points you raise are actually being worked on:

    CREATE TABLE options exists in MariaDB 5.2: http://askmonty.org/wiki/Manual:Extending_CREATE_TABLE

    As part of Timour’s “Query Fragment Pushdown” work (this is a set of several worklogs) in the SE summit we also discussed the issue of “normalized cost model” for the optimizer, where an engine could give info to the optimizer on which operations will be efficient and which are costly. (A more detailed explanation would be beyond my ability to write it, and space available here…) The obvious alternative would be a pluggable optimizer, but it seems preferable to have the optimizer constant, and let the engine just influence the decision.

    Some other of Peter’s comments are also valid, but are not to my knowledge being addressed – for instance nobody is trying to get rid of the .frm files. Especially for “remote” engines like NDB, syncing the .frm is a hassle.

  19. waterguo says:

    well said. modularization comes with cost. in database world, the cost is too high to be ignored.

  20. Kevin says:

    Doesn’t MyISAM still offer some significant benefits in certain situations? If you have a database with a relatively small, constant, known INSERT/UPDATE load, but a very large SELECT load, are you not still better off using MyISAM with fixed-length rows over InnoDB?

    If so, I have to question the assumption that “for probably 95% of applications single storage engine would be good enough” if that storage engine is InnoDB. I find it hard to believe that the sort of use model I describe accounts for less than 5% of all MySQL uses.

    Has this situation changed?

  21. peter says:

    Kevin,

    Yes there are situation when MyISAM is faster. However it is not as general as “all selects are faster” – MyISAM has serve problems with contention on the key cache mutex limiting scalability dramatically as well as slow access to the row data if access is done by index because it is only cached in OS cache.

    What you seems to be missing is if application performance can be improved by using mix of storage engines it does not mean it is justified. Every complication has it costs and I believe for 95% of MySQL applications the performance advantages by such fine tuning in storage engines is not worth it.

    This is not to mention if MySQL would be highly optimized for transactional storage engine such as Innodb the gap would be a lot less.

  22. Simon says:

    Yea, I don’t use MySQL hence I don’t have the possibility to use different storage engines as all other database engines have only one storage engine. I’ve never suffered from that.

Speak Your Mind

*