
Don’t Spin Your Data, Use SSDs!

September 9, 2016 | Posted In: MySQL


This blog post discusses the advantages of SSDs over HDDs for database environments.

For years now, I’ve been telling audiences at my MySQL Performance talks the following: if you are running an I/O-intensive database on spinning disks, you’re doing it wrong. But there are still a surprising number of laggards who aren’t embracing SSD storage (whether for cost or reliability reasons).

Let’s look at cost first. As I write this (September 2016), high-performance server-grade spinning hard drives run about $240 for 600GB (or $0.40 per GB). Of course, you can get an 8TB archive drive at about the same price (about $0.03 per GB), but it isn’t likely you’d use something like that for your operational database. At the same time, you can get a Samsung 850 EVO drive for approximately $300 (or $0.30 per GB), which is cheaper than the server-grade spinning drive!

While it’s not the best drive money can buy, it is certainly an order of magnitude faster than any spinning disk drive!

(I’m focusing on the cost per GB rather than the cost per IOPS, as SSDs overtook HDDs years ago when it comes to IOPS/$.)
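The cost-per-GB comparison above can be sketched in a few lines. The prices are the September 2016 figures quoted above; the 850 EVO capacity is an assumption implied by the quoted $/GB, not stated in the text:

```python
# Cost-per-GB comparison using the September 2016 prices quoted above.
drives = {
    "server-grade 600GB HDD": (240.0, 600),    # (price USD, capacity GB)
    "8TB archive HDD":        (240.0, 8000),
    "Samsung 850 EVO":        (300.0, 1000),   # ~$0.30/GB implies a 1TB model (assumption)
}

cost_per_gb = {name: price / gb for name, (price, gb) in drives.items()}
for name, cost in cost_per_gb.items():
    print(f"{name}: ${cost:.2f}/GB")
```

Run it and the SSD lands below the server-grade spinning drive on $/GB, which is the whole point.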

If we take a look at Amazon EBS pricing, we find that Amazon has moved to SSD volumes by default for “General Purpose” (gp2) storage. Prices for this volume type run about 2x higher per GB than high-performance HDD-based volumes (st1), while provisioned IOPS volumes – likely the best choice for databases – will run you about 4x higher than HDD.

This appears to be a significant cost difference, but keep in mind that you get many more IOPS at much better latency from these volumes. They also handle I/O spikes better, which is very important for real workloads.

Whether we’re looking at a cloud or private environment, it is wrong to look at the cost of the storage alone – you must look at the whole server cost. When using SSDs, you might not need to buy a RAID card with a battery-backed-up (BBU) cache, as many SSDs have similar functionality built in.

(For some entry-level SSDs, there might be an advantage to purchasing a RAID with BBU, but it doesn’t affect performance nearly as much as for HDDs. This works out well, however, as entry level SSDs aren’t going to cost that much to begin with and won’t make this setup particularly costly, relative to a higher-end SSD.)  

Some vendors charge insane prices for SSDs, but this is where you should use your negotiating skills and your power to choose an alternative vendor.

Some folks are concerned they can’t get as much storage per server with SSDs because they are smaller. This was the case a few years back, but not anymore. You can easily find a 2TB 2.5” SSD, which is larger than the available 2.5” spinning drives. You can go as high as 13TB in the 2.5” form factor.

There is a bit of a challenge if you’re looking at NVMe (PCIe) cards, as you typically can’t have as many of those per server as you could spinning disks, but the situation is changing here as well with the 6.4TB SX300 from SanDisk/Fusion-io and the PM1725 from Samsung. Directly attached storage provides extremely high performance and 10TB-class sizes.

To aggregate multiple storage devices, you can use hardware RAID, software RAID or LVM striping, or let a file system (such as ZFS) take care of it for you.

Where do we stand with SSD reliability? In my experience, modern SSDs (even inexpensive ones) are pretty reliable, particularly for online data storage. The shelf life of unpowered SSDs is likely shorter than that of HDDs, but we do not really keep servers powered off for long periods of time when running database workloads. Most SSDs also do something like RAID internally (it’s called RAIN), in addition to error correction codes, which protects your data from the failure of a single flash chip.

In truth, focusing on storage-level redundancy is overrated for databases. We want to protect our most critical applications from complete database server failure, which means using some form of replication and storing several copies of the data. In this case, you don’t need bulletproof storage on a single server – just a replication setup where you won’t lose data and any server loss is easy to handle. For MySQL, solutions like Percona XtraDB Cluster come in handy, and you can use external tools such as Orchestrator or MHA to make MySQL replication work.

When it comes to comparing SSD vs. HDD performance, whatever you do with SSDs, they will likely still perform better than HDDs. RAID5 and RAID6 arrays made from SSDs will beat RAID10 and RAID0 arrays made from HDDs (unless your RAID card is doing something nasty).
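A rough way to see why an SSD RAID5 can outrun an HDD RAID10 is to apply the usual rule-of-thumb write penalties (2 for RAID10, 4 for RAID5) to per-drive random-write IOPS. The drive counts and IOPS figures below are illustrative assumptions, not measurements:

```python
def effective_write_iops(drives, iops_per_drive, write_penalty):
    """Rule-of-thumb effective random-write IOPS for a RAID array."""
    return drives * iops_per_drive / write_penalty

# Illustrative assumptions: a fast spinning drive handles ~200 random
# write IOPS, a modest SATA SSD ~10,000 sustained random write IOPS.
hdd_raid10 = effective_write_iops(drives=8, iops_per_drive=200, write_penalty=2)
ssd_raid5  = effective_write_iops(drives=4, iops_per_drive=10_000, write_penalty=4)

print(f"8-drive HDD RAID10: ~{hdd_raid10:.0f} write IOPS")
print(f"4-drive SSD RAID5:  ~{ssd_raid5:.0f} write IOPS")
```

Even carrying RAID5’s heavier write penalty and half the spindle count, the SSD array comes out an order of magnitude ahead in this back-of-the-envelope model.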

Another concern with SSD reliability is write endurance. SSDs indeed have a specified amount of writes they can handle (after which they are likely to fail). If you’re thinking about replacing HDDs with SSDs, examine how long SSDs would endure under a comparable write load.  

If we’re looking at a high HDD write workload, a single device is likely to handle 200 write IOPS of 16KB (when running InnoDB). Let’s double that: 400 IOPS of 16KB comes to 6.4MB/sec, which gives us about 553GB/day (doing this 24/7). Even with the inexpensive Samsung 850 Pro, we get 300TB of official write endurance – enough for about 1.5 years. And in reality, drives tend to last well beyond their official specs.
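The endurance arithmetic can be checked directly (using decimal units throughout, 1MB = 1000KB):

```python
# Endurance math: a doubled HDD-class write workload of 400 x 16KB
# write IOPS, run 24/7, against the Samsung 850 Pro's official 300TB
# write endurance rating.
iops = 400                      # 2x the ~200 write IOPS an HDD handles
io_size_kb = 16                 # typical InnoDB page write
mb_per_sec = iops * io_size_kb / 1000       # 6.4 MB/sec
gb_per_day = mb_per_sec * 86_400 / 1000     # ~553 GB written per day
days_to_wear_out = 300_000 / gb_per_day     # 300TB rating, in GB

print(f"{mb_per_sec} MB/sec, {gb_per_day:.0f} GB/day, "
      f"{days_to_wear_out / 365:.1f} years to rated endurance")
```

Swap in your own measured write IOPS and your drive’s rated endurance (TBW) to estimate lifetime for your workload.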

If you don’t like living on the edge, more expensive server-grade storage options have much better endurance. For example, 6.4TB SX300 offers almost 100x more endurance at 22 Petabytes written.

In my experience, people often overestimate how many writes their application performs on a sustained basis. The best approach is to do the math, but also to monitor drive status with a SMART utility or vendor tool. These tools can alert you in advance as a drive wears out.
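As a sketch of what such monitoring can look like, here is a parser for the wear-related line of `smartctl -A` output. The attribute name (`Wear_Leveling_Count`, used by Samsung drives; other vendors use different names) and the sample output below are illustrative assumptions – check what your vendor’s tool actually reports:

```python
# Hedged sketch: extract the normalized wear value (100 = new, counting
# down toward the failure threshold) from `smartctl -A` output.
# The sample text is illustrative, not captured from a real drive.
SAMPLE_SMARTCTL_OUTPUT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
177 Wear_Leveling_Count     0x0013   098   098   000    Pre-fail  Always       -       52
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       12345678
"""

def wear_percent_left(smartctl_output, attribute="Wear_Leveling_Count"):
    """Return the normalized VALUE column for the wear attribute, or None."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == attribute:
            return int(fields[3])
    return None

print(wear_percent_left(SAMPLE_SMARTCTL_OUTPUT))  # prints 98 for this sample
```

In practice you would feed this the output of `smartctl -A /dev/sdX` from a cron job and alert when the value drops below a comfortable margin above the threshold.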

Whatever your workload is, you will likely find an SSD solution that offers you enough endurance while significantly exceeding the performance of an HDD-based solution.

Finally, there is a third and very important component of SSD reliability for operational database workloads: not losing your data during a power failure. Many “consumer-grade” SSDs come with drive write cache enabled by default, but without proper power loss protection. This means you can lose some writes during a power failure, causing data loss or database corruption.

Disabling write cache is one option, though it can severely reduce write performance and does not guarantee data won’t be lost. Using enterprise-grade SSDs from a reputable vendor is another option, and testing SSDs yourself might be a good idea if you’re on a budget.  

Conclusion

When it comes to operational databases, whether your workload is on-premises or in the cloud, don’t spin your data – use SSDs. There are choices and options for almost any budget and every workload.

Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

14 Comments

  • Jouni,

    Nothing lasts forever. The fact that endurance is not published as an amount of data written does not mean there is no limit to how many writes they can handle. Micron has a nice article on this topic: https://www.micron.com/about/blogs/2016/february/the-myth-of-hdd-endurance Obviously, being in the SSD business, they might be a bit biased in terms of numbers, but the general idea stands.

  • Mark,

    I sense sarcasm here 🙂 But if you want an honest answer – I believe in giving the best advice to folks, whether I am giving it for free or being paid to give it. If this best advice leads to us getting less business, it is fine by me.

    In fact, it makes me very proud of our team when we come in to a customer who thinks they need 20 servers to do the job, but we help them do it with only 5 – saving the customer money on the servers while also reducing how much we charge for support.

    • Plenty of sarcasm. But performance problems on HDDs have been great for my career as a person who responds to perf problems. The problems from HDDs maybe weren’t fun for the people doing database oncall, though. SSD hides so many of the problems you hit with HDD – schema changes were too slow, logical backups consumed too much IO, restores were too slow, transient spikes in demand ran out of IOPS. I miss HDD. Most people won’t.

  • Who seriously buys disks from a retail website or store to run a serious database? Can’t be too many.

    Compare the costs of formal storage systems, SSD vs. HDD, with high availability, etc. Whether it is true enterprise storage or even tier 2 crap, the cost differences can be more significant in that space. Or at least compare the cost of, say, a server from the likes of HPE or Dell with built-in SSDs vs. HDDs.

    I use the Samsung 850 and 950 Pro in my laptop (Lenovo P50). HPE uses SanDisk, last I heard, for most of its 3PAR stuff, though Samsung may be getting into some of the newer stuff (7-15TB), not sure. 5-year unconditional warranty.

    All of my org’s critical mysql DBs run on flash on HPE 3PAR. The oldest flash we have is 22 months at this point and the built in endurance checker says the oldest SSDs have 98% of their write life left in them (they are read intensive SSDs).

  • Nate,

    I use website prices as they are easy enough to compare. When it comes to MySQL or MongoDB, we use internal storage for systems where maximum performance or price/performance is required. There is a lot of virtualization, and consolidated storage (SAN, etc.) is used by enterprises, but typically it is well understood that this is not the highest-performance or best price/performance option, but rather one offering more flexibility and high availability.

    Looks like your point with 3PAR is that, with 98% endurance left after about 2 years, this has not been a problem for you, right?

  • It’s not so easy to replace enterprise disks with EVO drives.
    My users always complain the system is slow on the 840 EVO RAID1 using a BBU. On the old 10k RPM 300GB disks it is much faster.
    It has to do with the RAID firmware and the random long pauses the SSD has. This workload is not IOPS-intensive though.

    • Hi, I can surely believe that. There have been a lot of compatibility issues between older RAID controllers and SSDs. Especially “Enterprise” RAID controllers might be rather picky even about the HDDs you put into them. I remember the ones Dell shipped at a certain time would refuse to work with anything but certified (read: overpriced) hard drives. The 840 EVO was also one of the cheapest consumer drives, so I’m not surprised they could choke on serious workloads.

  • Hi,
    For the last two years I have been using only SSDs with hardware RAID1 (Percona 5.6 handling around 1.5-2k writes per second). The performance is OK and I do not experience any issues. Of course I have replication, and this is enough. I am always trying to keep the size of the database down by running archive scripts.

    • Todd,

      This is a good point – keeping the database small and in memory is always best for performance. As much as SSDs are getting faster than HDDs, accessing data in memory is still significantly faster for most workloads.
