I’m not going to speak about disruption and commoditisation of the database market, leaving that for market talks; my interest is the market of web applications in general.
Clearly the web is not enterprise and has a lot of different properties – top web sites like Google, Yahoo and Facebook have to provide online service to tens and hundreds of millions of users, with quickly changing applications which have to be deployed on a relatively tight budget. Especially if you think about it as $/visitor or $revenue/server, the difference between most enterprises and most web applications is insane.
Not only that, but many of these systems were started by beginners, so traditional databases were way too complex to use as well.
All these requirements made traditional databases irrelevant for many web properties – too complex and too expensive to start with.
In the second half of the 90s, when MySQL appeared on the market, many web applications were simple and often did not use a database at all. MySQL did not have replication or transactions at that time, but it was easy to understand, fast and free, and this is what a lot of people were looking for in those days.
Over the last 10 years web applications have changed and so have their demands on data storage – now web applications have to handle much higher load and much larger data sets, they are also much more complicated, and the performance of MySQL is often a problem even for medium scale sites.
In response a few things happened. MySQL kept being “simple” and introduced relatively simple performance boosters – Query Cache and Replication, which were clearly not enough to solve all performance problems (like scaling writes with replication or dealing with large data sets). The web crowd valued simplicity and really had little choice, so most of them, instead of “upgrading” to Oracle or other systems which offer more performance features out of the box, started deploying their own custom solutions – sharding and caching. Indeed Memcached is the best known caching technology today, to a large extent because Brad open sourced it rather than keeping it in house. A lot of custom technologies were created for caching or scalability issues MySQL could not solve well. There are in-house and open source solutions for tasks such as large scale file storage, queuing, data processing etc.
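To make the sharding idea concrete, here is a minimal sketch of hash based shard routing. The shard names and the choice of user id as the sharding key are assumptions made for illustration only; real deployments often use a lookup directory instead of a plain hash so that users can be moved between shards.

```python
import hashlib

# Hypothetical shard names; in practice these would be MySQL hosts.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]


def shard_for_user(user_id, shards=SHARDS):
    """Return the shard that owns all rows for this user id."""
    # A stable hash keeps the mapping identical across processes and restarts.
    digest = hashlib.md5(str(user_id).encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


def shards_for_users(user_ids, shards=SHARDS):
    """Group user ids by owning shard.

    A query touching many users has to be sent to every shard in the
    returned plan and merged in the application.
    """
    plan = {}
    for uid in user_ids:
        plan.setdefault(shard_for_user(uid, shards), []).append(uid)
    return plan
```

A single user lookup goes to exactly one server, while anything that spans users turns into one query per shard plus application side merging.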
But seriously, if you look at this, people do not really enjoy adding memcached – cache handling with all its invalidation and consistency issues is not fun. Neither do they like sharding, with its requirement to split the data in a defined way so that cross shard queries become a pain. Dealing with lag in MySQL Replication is another issue which complicates application development. None of it is fun – developers had to do it because there is no product out there which would allow them to build their applications in an easy way without dealing with all those issues. And I’m not speaking about just the database here, but rather the whole stack which would allow building scalable web applications in an easy way.
Customers are constantly asking me if there is something which would help them scale MySQL and get some HA out of the box, even at the medium level. Seriously – MySQL Cluster, Continuent, Master-Master Replication, DRBD or SAN based HA architectures all have their limits, so none of them is usable for a very wide class of applications.
Another interesting trend is that the Web is getting more enterprise like. After a few years of geeky startup growth many companies get sold or otherwise become structured in a more enterprise fashion and start thinking in a more enterprise way – they may want more packaged solutions rather than custom architectures, and they may not enjoy running too many servers. For others space and power become the constraint.
It is also worth mentioning the Web vs Open Source component here. I think there is an interesting split – some companies I talk to are committed to a “no vendor lock in” policy and would not like proprietary solutions; others, typically the ones which already have some such systems in house, do not really care and would go with a commercial solution if it solves their pains.
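The invalidation burden is easier to see in code. Below is a minimal cache-aside sketch; `DictCache` is a hypothetical in-memory stand-in for a memcached client and a plain dict stands in for a MySQL table, so none of the names here are a real library API.

```python
class DictCache:
    """Hypothetical stand-in for a memcached client (get/set/delete only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


class UserStore:
    def __init__(self, cache, db):
        self.cache = cache
        self.db = db        # dict standing in for a MySQL table
        self.db_reads = 0   # counts how often we fall through to the database

    def get_name(self, user_id):
        key = f"user:{user_id}:name"
        cached = self.cache.get(key)
        if cached is not None:
            return cached
        # Cache miss: read from the database and populate the cache.
        self.db_reads += 1
        name = self.db[user_id]
        self.cache.set(key, name)
        return name

    def rename(self, user_id, new_name):
        self.db[user_id] = new_name
        # Every write path must remember this; forgetting it serves stale data.
        self.cache.delete(f"user:{user_id}:name")
```

Every code path that changes a user has to know which cache keys depend on it, and that bookkeeping is exactly the part developers would rather not own.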
So what do we see as a response to these requirements?
From the MySQL side we have further development of MySQL Cluster to make it more usable for web apps, as well as development of MySQL Proxy to help with sharding or with using slaves which are not fully up to date.
Some innovation is coming from third party vendors – InfoBright and NitroDB presented Storage Engines targeting certain workloads. PrimeBase is working on scalable BLOB streaming to make it possible to store large blobs such as images in the database efficiently.
The other wave is appliances – you can see the Violin Memory appliance, which can be used with MySQL to get very fast IO and so consolidate systems suffering from IO bound workloads. There is the KickFire appliance around the corner, which focuses more on CPU bound complex queries, and there are more in development.
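As an illustration of what dealing with not fully up to date slaves means for read routing, here is a small sketch of lag aware server selection. It is not MySQL Proxy's actual scripting interface; the function name, the lag map and the threshold are assumptions, with the lag numbers imagined as coming from something like `Seconds_Behind_Master` in `SHOW SLAVE STATUS`.

```python
import random


def pick_read_server(master, replica_lag, max_lag_seconds, rng=random):
    """Choose where to send a read.

    replica_lag maps replica name -> lag in seconds, or None when the
    lag is unknown (for example, replication is stopped).
    """
    fresh = [
        name
        for name, lag in replica_lag.items()
        if lag is not None and lag <= max_lag_seconds
    ]
    if not fresh:
        # No replica is fresh enough, so fall back to the master.
        return master
    return rng.choice(fresh)
```

For example, `pick_read_server("master", {"slave1": 0, "slave2": 30}, 5)` only ever returns `"slave1"`, and with every slave lagging the read goes to the master.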
Though I think the most interesting development when it comes to Web Apps comes from another side and is about non SQL and non relational data processing and storage systems – Google BigTable with MapReduce, Amazon Dynamo and SimpleDB, Hadoop, CouchDB – which come coupled with Cloud Computing and deal with geographically distributed, very large scale systems.
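For readers who have not seen the MapReduce model, here is a toy single process sketch of the idea. It is not the Hadoop or Google API; the function names and the log format are made up for illustration, and a real system would run the map and reduce phases across many machines.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run mapper over every record, group values by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def count_mapper(log_line):
    # Assumed log format: "<user> <url>"
    user, _url = log_line.split(" ", 1)
    yield user, 1


def sum_reducer(_key, values):
    return sum(values)
```

Calling `map_reduce(["alice /home", "bob /home", "alice /profile"], count_mapper, sum_reducer)` gives page views per user. The appeal is that the mapper and reducer contain no knowledge of where the data lives.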
This is the area where I would expect the next big innovation to happen when it comes to Web applications. Web applications operate with concepts which are not handled very efficiently with SQL and relational operations (think social graph or permissions).
I would expect MySQL to continue to drill into the Enterprise market during the next few years, while Web Applications start to rely more and more on alternative systems for data storage and management (well, Google does it already).
I also think the piece which is missing now is not the database but rather the concept and platform – developers do not want to care about the database and caching, they just want their application to be quickly developed and to scale well. So what we need is some kind of cross between Ruby on Rails in terms of getting up and running fast and scalability on the scale of Big Table with Map Reduce.
Indeed I think MySQL Replication usage will decline, but I would not expect Memcached to be the leading factor pushing it.
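The social graph case can be sketched as a plain graph traversal. With a simple friendship table, each extra hop typically means one more self-join, while in application code it is a short breadth first search. The graph below is a hypothetical in-memory adjacency map, used only to show the shape of the problem.

```python
from collections import deque


def within_hops(graph, start, max_hops):
    """Return {person: hops} for everyone reachable from start in <= max_hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if seen[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for friend in graph.get(person, ()):
            if friend not in seen:
                seen[friend] = seen[person] + 1
                queue.append(friend)
    del seen[start]  # the starting person is not their own friend
    return seen
```

With `max_hops=2` this answers the usual friends of friends question, and raising the limit does not change the code at all.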