The Etsy Shard Architecture: Starts with S and Ends With Hard

In the beginning, every website starts with a single data store, but the abundance of open source solutions today makes it apparent that read and write bottlenecks are quickly reached and some form of scaling is eventually necessary. The two extreme choices are to scale vertically and hope that hardware outpaces your capacity requirements, or to scale horizontally by sharding the data so that adding capacity is simply a matter of adding shards.

At Etsy, we decided to scale our data horizontally by sharding it across MySQL multi-master replicated shard pairs. This gives us horizontal scalability that grows with the number of shards, redundancy within each shard, excellent fault tolerance, and improved performance.
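To make the idea of a replicated shard pair concrete, here is a minimal sketch of choosing which side of a pair to query. It assumes each shard is two hosts that both accept writes, and that a client prefers one side based on the id and falls back to the other if the preferred side is down. The `Host` and `ShardPair` names, the health flag, and the even/odd preference rule are all hypothetical illustrations, not a description of Etsy's actual code.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical stand-in for one MySQL server in a pair."""
    name: str
    healthy: bool = True


@dataclass
class ShardPair:
    """Two hosts that replicate to each other (assumed multi-master)."""
    side_a: Host
    side_b: Host

    def pick_host(self, entity_id: int) -> Host:
        # Assumption: even ids prefer side A, odd ids prefer side B,
        # so load is spread across both sides of the pair.
        if entity_id % 2 == 0:
            preferred, fallback = self.side_a, self.side_b
        else:
            preferred, fallback = self.side_b, self.side_a
        if preferred.healthy:
            return preferred
        if fallback.healthy:
            return fallback
        raise RuntimeError("both sides of the shard pair are down")


pair = ShardPair(Host("shard1-a"), Host("shard1-b"))
print(pair.pick_host(42).name)
```

Pinning a given id to one side keeps reads and writes for that id on the same host in the normal case, while the fallback path is what provides redundancy within the shard.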

Scaling is not without its caveats, and some specific decisions must be made about the overall architecture. For example, choosing whether to select the shard by range, by key hash, or by lookup table can be one of the most important decisions about how the shards are initially set up. Other decisions, such as how to generate primary keys, also come into play very early on.
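The three shard-selection strategies mentioned above can be compared in a short sketch. The function names, the shard count, and the use of MD5 as the hash are assumptions chosen for illustration only.

```python
import bisect
import hashlib

NUM_SHARDS = 4  # hypothetical shard count


def shard_by_range(entity_id: int, upper_bounds: list) -> int:
    """Range-based: shard i holds ids up to and including upper_bounds[i]."""
    idx = bisect.bisect_left(upper_bounds, entity_id)
    if idx == len(upper_bounds):
        raise KeyError(f"id {entity_id} is beyond the last range")
    return idx


def shard_by_hash(entity_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Hash-based: a stable hash of the key, modulo the shard count."""
    digest = hashlib.md5(str(entity_id).encode()).hexdigest()
    return int(digest, 16) % num_shards


def shard_by_lookup(entity_id: int, table: dict) -> int:
    """Lookup-based: an explicit id -> shard mapping stored elsewhere."""
    return table[entity_id]


print(shard_by_range(150, [100, 200, 300]))
print(shard_by_hash(150))
print(shard_by_lookup(150, {150: 2}))
```

The trade-off is visible in the signatures: range and hash selection need no extra storage but tie an id to a shard by formula, so changing the ranges or the shard count relocates data. A lookup table costs an extra read but lets any single id be reassigned independently.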

Etsy uses a lookup-based approach in which index servers are queried first to find the shard the data resides on. This gives us the flexibility to easily move data around based on capacity requirements. This talk will cover how the multi-master pairs are set up and queried, how we find the data using index servers, and how we generate keys using ticket servers. It will also cover some of the less trivial aspects, such as schema changes and redistributing data using custom shard migration tools.
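The roles of the ticket server and the index server can be sketched with in-memory stand-ins. This is only an illustration under stated assumptions: the class names are hypothetical, and the scheme of running two ticket servers that hand out interleaved odd and even ids is one common way to avoid a single point of failure, not something the abstract confirms.

```python
import itertools


class TicketServer:
    """In-memory stand-in for a server that issues globally unique ids."""

    def __init__(self, offset: int, increment: int):
        # Assumption: each server starts at its own offset and steps by
        # the number of servers, so their id sequences never collide.
        self._counter = itertools.count(offset, increment)

    def next_id(self) -> int:
        return next(self._counter)


class ShardIndex:
    """In-memory stand-in for an index server: maps an id to a shard."""

    def __init__(self):
        self._shard_of = {}

    def assign(self, entity_id: int, shard: int) -> None:
        self._shard_of[entity_id] = shard

    def lookup(self, entity_id: int) -> int:
        return self._shard_of[entity_id]

    def move(self, entity_id: int, new_shard: int) -> None:
        # A real migration would copy the rows to the new shard first;
        # this sketch only flips the pointer in the index.
        self._shard_of[entity_id] = new_shard


tickets_odd = TicketServer(offset=1, increment=2)
tickets_even = TicketServer(offset=2, increment=2)
index = ShardIndex()

new_id = tickets_odd.next_id()
index.assign(new_id, shard=3)
print(new_id, index.lookup(new_id))

index.move(new_id, new_shard=5)
print(new_id, index.lookup(new_id))
```

Because every read goes through the index first, relocating a record only requires updating its entry there, which is what makes rebalancing by capacity practical.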

Horizontal scaling using a time-tested database such as MySQL is a great way to avoid the bottlenecks of a single data store. With a carefully thought-out architecture and the proper tooling, it will grow with your traffic and allow you to combat performance and storage limitations by simply adding hardware. This talk will show you exactly how Etsy does this using MySQL.

Trends in Architecture and Design

Schedule info

Time slot: 11 April 14:00 - 14:50
Ballroom C


John Goulah
Dev Infrastructure Tools Lead Engineer, Etsy

John Goulah has worked in New York City over the last several years for a number of web sites in both technical and management roles, and has co-founded several startups. Having spent much of his youth touring in rock bands and hacking from the road, he is no stranger to crowds, whether playing to a smoke-filled room or presenting to the company board. He strives for non-mundane tasks and has automated himself out of his last few endeavors, which has landed him in his current role as an Engineer at Etsy, the leading marketplace for handmade goods.

