September 18, 2014

Sharding and Time Base Partitioning

For large number of online applications once you implemented proper sharding you can consider your scaling problems solved – by getting more and more hardware you can grow. As I recently wrote it however does not mean it is the most optimal way by itself to do things.

The “classical” sharding involves partitioning by user_id,site_id or somethat similar. This allows to spread data more or less evenly across the boxes and use any number of boxes. However this may be not the most optimal approach by itself because not all data belonging to same user is equal.

Consider Blog or Forum as example – most likely few last posts will get majority of hits while things written year ago are accessed with much less frequency. You can often level off this significantly for reads by using caching (if things are accessed frequently they are served from cache) but you still have to deal with writes which can be significant depending on your design.

It does not only have to be active portions of data same way you can have active users and ones which are almost dead.

Another interesting type of data which I find often kept on the same “cluster” without good reason is some sort of logs or history data. Think about Wikipedia page version history for example which accumulates to huge volume which very few users need to read, various change logs etc are other type of this data.

Besides separating “cold” data from “hot” it often makes sense to separate data based on its importance for system operation – for example if page versions data is currently unavailable for Wikipedia it is still possible to serve 99% of reads and even possibly handle writes by queuing new version creation.

It may make sense to separate data on table (or partition) level to get better “clustering” – because data is usually cached by pages rather than by rows and index entries having all hot data in separate table from cold data is much more efficient for caching than having table with the mix, even though the total size remains same.

It also often makes sense to separate data on the server level. Keeping Hot and Production Critical dataset small you can both make system to perform faster as well as well as get plenty of operating benefits – small database takes less time to backup and restore it is easier to do ALTER TABLE and replication would not fall behind as easily.

You can also use different hardware for different parts of data – you can hold “Hot” data on fast RAID volume or even SSD while place archive data on slow but large SATA volumes (unless response time will not become show stopper)

Of course not all applications need to use this technique but there is significant class of application which can benefit from it dramatically.

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. mike says:

    It would be cool if MySQL itself had some sort of LRU type query cache method for re-organizing how data is physically stored. That would be cool.

  2. peter says:

    Well it would. Though I would like it to get online defragmentation first.

    Also note it is not as easy as last access time is the only important factor for data organization – on the contrary there is bunch of other things such as access paths which should be considered.

  3. mike says:

    Hmm. Even having the option would be nice. A configuration option (just like InnoDB recovery style) would probably be used by a lot of people.

    Trying to determine how to partition my data and all that myself seems a bit daunting. I’d be happy with an LRU-style thing. I mean, I will deploy memcached and use that for caching prior to the database anyway. But any further optimizations to MySQL are welcome.

  4. howa says:

    I think the golden rule for large site should be don’t let your visitor hit your MySQL directly?

    Such as using Squid?

  5. peter says:

    Of course there are layers of caching – squid, memcache etc but finally some load hits the database and you need to deal with it.

  6. Frank says:

    I honestly think some things are out of the scope of mysql, I think you’re application should be smart enough to pull from a cache. In fact frameworks (.net, cakephp, rails, etc) have built in cache (file, memory, mcache, etc) support you just need to use it. MySQL is very powerful and while I think it needs to do more beyond master/slave replicationing, determining how you use it’s data isn’t exactly it’s job. Being able to treat multiple servers, db’s and tables as a cluster and provide easy to use methods for moving data (tables and databases) for optimization of access might be within it’s scope. Knowing when and why to move your data rests firmly with you and your app.

  7. peter says:

    Frank,

    Databases are at large extent created to ease the development process and the more database can do to help it the better. Sure frameworks can have their cache but this cache is not good enough in all cases plus you may have different application accessing same data.

    Database may not need to guess how you’re using its data but at least follow your recommendation. If you think about it indexes are nothing more but hints to the database – I’m going to use this column for lookups and I want these to be fast. Clustering, partitioning or physical sorting is another ways to speedup certain access paths.

  8. Wes Mahler says:

    Interesting way to partition data!

  9. howa says:

    Let say, partitioning by user_id, the main difficulty I think is not partitioning, but load balacing, e.g. 80% of users are inactive, how can do spread the 20% of active users across all the database servers for max. performance?

  10. peter says:

    Howa,

    First I should say the problem is often exaggerated – indeed some users can be 100 times more than others, so what ? With big numbers of say 300000 users per box it all evens out pretty well anyway.

    There is however a different problem which is about user “groups” not users individual users. Consider for example you decided to put out box one by one and keep all registrations going to them until it gets 300.000 registrations.

    This may be suboptimal because you may find out users are more active first days after registration, it may be even more complicated like users which came through some marketing action may have sustained interest while from other may mostly leave after short period of time.

    It is also quite natural for some users to stop using service over time so proportion of active users goes down.

    So you need to think and see what kind of allocation policy makes sense in your case. There are many ways to organize it.

Speak Your Mind

*