About a week ago Marten send me email pointing to his article published on Jays Blog (Come on Marten, it is time for you to get your own blog). I should have replied much earlier but only found time to do that now. So here is my list
1. Be Pluggable
Unlike many OpenSource projects MySQL was single chunk of code and for years the only way you could officially extend it was using UDFs which was very limited. Compare this with other OpenSource projects such as PostgreSQL (plugable indexes etc), Apache, PHP or Linux Kernel. Yes in MySQL 5.1 the situation is changes – now there are plugable storage engines (something even PostgreSQL does not have) as well as Full Text Search parses but there is very long way to go before you could do any significant functionality ourside of storage engines as plugins.
Two aspects why I’m especially interested about it is Sphinx which is currently implemented as storage engine for MySQL but this role does not suite it well as well for our performance optimization works – we would like to get much more stats from MySQL and it would be great if we could write these as plugins instead of patches.
2. Be Scalable
I’d like to see MySQL Scaling to large amount of CPUs for all storage engines all operation systems and all workloads. Well this may be too much of the wish but at least it should happen for most common cases. Innodb scaling issues are the most common problem (and patches currently available only fix part of it) but far from being only problem. There are fair amount of issues with MyISAM and generally on MySQL level which needs to be taken care of. Many of these issues were known for years. The good thing is now with multi-cores coming even on laptops these issues can’t be under carpet any more and they are getting attention and gradually being fixed.
3 Be Distributed
One of the most common buzzwords used together with MySQL is Scale Out, but to say the truth MySQL does not provide much of the functionality for scale out itself, besides its fast and simple replication. You still would normally need to manually chop your data to the multiple databases and handle data merging on application together with high availability and all kinds of problems. The dream for many MySQLers is to be able to get bunch of boxes and use them same as single MySQL server with improved performance and high availability. Of course there is MySQL Cluster but it does not get where quite yet having very particular performance properties and operating requirements. In fact I do not think MySQL Cluster would ever be that thing as it provides too strong guarantees. It is same as with MySQL Replication – the reason it is fast and has low performance overhead on the master is at large extent because it is asynchronous. To offer great performance on distributed systems, especially if they are connected via not very fast network you need to come up with different consistence guaranty and semantics… and believe me most of Web applications could live with it. They anyway read stale data from the slave or memcached manually relaxing data consistence requirements.
But honestly most parts are OK – over time you develop practices and tool set to deal with multiple nodes nicely even if you do not have much support from MySQL to do so. What is really bad is Parallel query. Reporting with group by over 10.000.000 of rows in real time is one painful thing to do requiring complex application programming to do efficiently in distributed manner. Of course there are summary tables and other tricks we use when such need arises but we still can’t deny – MySQL does not handle complex queries well, being able to use only one CPU effectively on single node and no true federation support.
4. Be Solid
MySQL has very good stability track record, with exception of MySQL 5.0 it was typically safe to run it in production well before it was stable (I remember we had successful production on 3.23 alpha). However many of new releases score pretty poor on the scale of being Solid.
By by being Solid I mean Solid functionally complete features which work as uses intuitively expect them to work instead of having a lot of little known documented (sometimes) limitations. I’m saying about Prepared Statements in MySQL 4.1 which came without query cache support but even worse not all of statements could be prepared not to mention other issues. I’m saying about SubSelects which were added in MySQL 4.1 but you still could not run (and still can’t) most of your Oracle/DB2/MS SQL Queries on MySQL because subqueries optimizer weakness. I’m saying about XA which were released in MySQL 5.0 without consistent work with MySQL close Partner – Innobase to ensure Group Commit still works in 5.0 and so we do not get major regressions. I’m saying about Stored Procedures which were released without proper development and debugging support. These are just few examples.
Of course you could say “release early release often” is a good strategy and at large extent I would agree but MySQL is not released often any more. There is relatively long release cycle with great deal of planning and it looks like no features go in stable release just the day before it is released. With such release management I would expect MySQL to try to be a lot more consistent as such approach means if something is inconsistent it stays where for very long time. For example Query Cache support for Prepared Statements is only fixed in MySQL 5.1 and Most of other items are still open.
Well may be this is remainder of wild first VC-backed years of MySQL and not things are getting straightened up. I know on a lot of listed issues the work is going on and it just takes time to get them all fixed.
5. Do not forget about the roots.
MySQL has gained its popularity as database for Web but web users do not pay much because they are OK with GPL and often have smart guys inside who can figure things out and even patch MySQL if needed. So MySQL went to chase Enterprise market and develop enterprise features pushing back a lot of features which Web users would like. These features are often small, typically not SQL Standard compatible but help you if you’re web developer a lot. In my Early years in MySQL I have created more than hundred of such little projects suggested by MySQL customers and myself based on long MySQL usage for Web. Later I stopped adding these because I have not seen much traction for these tasks as well as of similar tasks created by other guys.
These would include support for query timeouts, parallel query execution, CONNECT BY support and various performance related features.
I’m not sure if MySQL got back to spend significant resources on taking care of Web guys needs but at least we have now Community tree and Web guys can take care of themselves if they want to. Jeremy Cole’s patches were accepted, our patches and I guess Google patches are in the queue which is very good sign. There is still fair amount of work to do to straighten up this process so good patches can quickly appear in community tree. Once this is done I think amount of contributions will increase a lot.