I wrote a post a while back explaining why you don’t want to shard. In that post I tried to explain that hardware advances, such as 128G of RAM becoming so cheap, are changing the point at which you need to shard, and that the (often omitted) operational issues created by sharding can be painful.
What I didn’t address was this: if you’ve established that you will eventually need to shard, is it better to just get it out of the way early? My answer is almost always no. That is to say, I disagree with a statement I’ve been hearing recently: “shard early, shard often”. Here’s why:
- Focusing on query/index/schema optimization can yield an order-of-magnitude improvement in performance. The gains from sharding are usually much lower.
- If you shard first, and then decide you want to tune query/index/schema to reduce server count, you find yourself in a more difficult position – since you have to apply your changes across all servers.
Or to phrase that another way:
I would never recommend sharding to a customer until I had at least reviewed their slow query log with mk-query-digest and understood exactly why each of the queries in that report was slow. While we have some customers who have managed to create their own tools for shard automation, it’s always easier to propose major changes to how data is stored before you have a cluster of 50+ servers.
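
To make the "review the slow query log first" step more concrete, here is a minimal Python sketch of the general idea behind a digest report: group queries by their shape and rank those shapes by total time. This is not how mk-query-digest is implemented. The `fingerprint` and `digest` functions, the simplified normalization rules, and the `(query_time, sql)` input format are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def fingerprint(sql):
    """Reduce a query to its general shape so similar queries group together.

    Hypothetical, simplified normalization; real tools handle many more cases.
    """
    sql = sql.strip().lower()
    sql = re.sub(r"'[^']*'|\"[^\"]*\"", "?", sql)  # quoted string literals
    sql = re.sub(r"\b\d+\b", "?", sql)             # standalone numeric literals
    sql = re.sub(r"\s+", " ", sql)                 # collapse whitespace
    return sql


def digest(entries):
    """Rank query shapes by total time.

    `entries` is an iterable of (query_time_seconds, sql) pairs, assumed to
    have been extracted from a slow query log already.
    Returns a list of (fingerprint, count, total_time), slowest first.
    """
    totals = defaultdict(lambda: [0, 0.0])  # fingerprint -> [count, total_time]
    for query_time, sql in entries:
        stats = totals[fingerprint(sql)]
        stats[0] += 1
        stats[1] += query_time
    return sorted(
        ((fp, count, total) for fp, (count, total) in totals.items()),
        key=lambda row: row[2],
        reverse=True,
    )


if __name__ == "__main__":
    # Made-up sample entries, purely for illustration.
    sample = [
        (2.0, "SELECT * FROM users WHERE id = 42"),
        (3.0, "select * from users  where id = 7"),
        (1.5, "SELECT name FROM orders WHERE status = 'open'"),
    ]
    for fp, count, total in digest(sample):
        print(f"{total:8.2f}s  {count:4d}x  {fp}")
```

The shapes at the top of a ranking like this are the ones worth understanding before anyone proposes sharding: a missing index or a schema change on one of them is a single fix now, and a fix repeated across every server later.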