Downloads

Blog

"Shard early, shard often"

November 16, 2009

Author

Morgan Tocker

Insight for Developers

MySQL

Share this Post:

I wrote a post a while back that said why you don’t want to shard.Â In that post that I tried to explain that hardware advances such as 128G of RAM being so cheap is changing the point at which you need to shard, and that the (often omitted) operational issues created by sharding can be painful.

What I didn’t mention was that if you’ve established that you will need to eventually shard, is it better to just get it out of the way early?Â My answer is almost always no. That is to say I disagree with a statement I’ve been hearing recently; “shard early, shard often”.Â Here’s why:

There’s an order of magnitude better performance that can be gained by focusing on query/index/schema optimization.Â The gains from sharding are usually much lower.

If you shard first, and then decide you want to tune query/index/schema to reduce server count, you find yourself in a more difficult position – since you have to apply your changes across all servers.

Or to phrase that another way:
I would never recommend sharding to a customer until I had at least reviewed their slow query log with mk-query-digest and understood exactly why each of the queries in that report were slow.Â While we have some customers who have managed to create their own tools for shard automation, it’s always easier to propose major changes to how data is stored before you have a cluster of 50+ servers.

0 0 votes

Article Rating

7 Comments

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

Arjen Lentz

16 years ago

It’s good to think about sharding from early on, that is – keep it in mind.
Doing it early on tends to also be a bad idea because you don’t yet know where the nasties will be when things grow.

For instance, originally independent components (good candidates for functional sharding!) may end up needing to be tightly integrated (joins) to deliver what users end up doing with your system.

Michael

16 years ago

I agree that sharding simply for the sake of it can create more problems and too much work early on when resources are scarce. However, there are at least three good reasons to shard early — or at least run your dev environments sharded, even if you have a single “shard” in production.

1) It’s too tempting to produce joins or otherwise assume all data is in the same place, especially if you’re using an ORM layer (developers tend not to look at the resulting queries too closely). This makes it difficult to shard later (or vertically partition), and almost impossible to do it quickly.

2) The tools you use for data access layer may not handle sharding correctly, or may need significant changes, or may result in different idioms. This also makes it difficult to shard later, and will involve significant retraining.

3) You may expect significant traffic bumps as your company gains publicity. Forget the cost for a moment, it’s simply less work to add DB servers into the mix than to upgrade to one with more RAM/CPU/disk or whatever it is you need.

But, truthfully, the best investment a new company can make is hire smart people with experience in both scaling systems and optimizing performance. 🙂

Mark Callaghan

16 years ago

Several things can delay the need to shard: affordable RAM, affordable flash storage, InnoDB plugin or XtraDB and smart people including expert consultants. Hopefully all of these are given proper consideration. Sharding is usually much easier than re-sharding (splitting) data on an overloaded shard. The plan to shard must include a plan to reshard.

Peter Sankauskas

16 years ago

Your first bullet point is mostly true… definitely nothing replaces good data design with effective use of indexes. However, there are some reasons why you want to shared (or at least build it) from the start. Your second bullet point is true, but misses the advantage. Having multiple servers allows you to experiments with different options, and if done properly, lets you choose the best optimization sooner. Depending on how the sharding (or combined with replication) is done, you may also be able to create an index (which locks a table) but not face bringing the application down to do it.

One more point that is somewhat overlooked is stress. It is much easier to think about and plan out a good sharding model when your do not have to deal with an application that has performance problems, all while the boss wants to add more features. If you do leave sharding for later, make sure everyone on the team knows about the potential technical debt, and it is accounted for. You don’t want to have to rush into sharding under pressure/stress.

Author

Morgan Tocker

16 years ago

Michael and Arjen: You raise good points about not introducing functionality that could block sharding if you’ve established you need to shard at some point. That can be a case where prolonging pain causes more pain – and I’ve certainly seen that happen.

My main message here is really in response to a repeating pattern on systems I’ve had to work on. If you are using a pretty standard box with 1-2 spindles and 4-8G of RAM, there are a lot more opportunities open to you before sharding. Most of the items in Mark’s list (comment #3) were not available 2-3 years ago.

Author

Morgan Tocker

16 years ago

Peter – Yes, having multiple machines certainly is helpful in A/B testing changes, but at the simplest level tuning queries with mk-query-digest is fairly safe unless you go too far with indexes with a write-heavy workload.

There’s nothing wrong with planning to shard early – just hold off implementing it while you can. As Arjen wrote, users can surprise you – the components you thought wouldn’t grow exponentially do.

Djamel Hamas

6 years ago

Hello,
Shard with hashed key of 32 digit of GUID.

I tested in our environnement of “perf” insertion of 1000000 documents per bulk continuously:
One shard with replica set took ~ 3 minutes per bulk
Two shard with replica set took ~ 1 minutes 40 secondes per bulk

After 18000000 documents inserted i noticed degradation of performance :
One shard with replica set took ~ 5 minutes per bulk
Two shard with replica set took ~ 2 minutes 40 secondes per bulk

We can conclure that with two shard we multiplied performance of insertion by 2.