I wrote a post a while back that said why you don’t want to shard. In that post that I tried to explain that hardware advances such as 128G of RAM being so cheap is changing the point at which you need to shard, and that the (often omitted) operational issues created by sharding can be painful.
What I didn’t mention was that if you’ve established that you will need to eventually shard, is it better to just get it out of the way early? My answer is almost always no. That is to say I disagree with a statement I’ve been hearing recently; “shard early, shard often”. Here’s why:
Or to phrase that another way:
I would never recommend sharding to a customer until I had at least reviewed their slow query log with mk-query-digest and understood exactly why each of the queries in that report were slow. While we have some customers who have managed to create their own tools for shard automation, it’s always easier to propose major changes to how data is stored before you have a cluster of 50+ servers.
Resources
RELATED POSTS