
ext4 vs xfs on SSD

March 15, 2012
Author
Vadim Tkachenko

As ext4 is the de facto standard filesystem for many modern Linux systems, I get a lot of questions about whether it is a good choice for SSD, or whether something else (e.g. xfs) should be used.
Traditionally our recommendation has been xfs, and that comes down to a known problem in ext3, where IO gets serialized per inode in O_DIRECT mode (see, for example, Domas’s post).

However, the results of my recent benchmarks made me feel that this should be revisited.
While I am still running experiments, I would like to share the early results I have.

I use a STEC 200GB SLC SATA SSD (my thanks to STEC for providing the drives).

What I see is that ext4 still has a problem with O_DIRECT. Here are the results for the “single file” O_DIRECT case (sysbench fileio, 16 KiB block size, random write workload):

    • ext4 1 thread: 87 MiB/sec
    • ext4 4 threads: 74 MiB/sec
    • xfs 4 threads: 97 MiB/sec

The drop in performance with 4 threads for ext4 is a signal that there are still contention issues.
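To put the size of that drop in numbers, here is a minimal sketch that computes the relative changes from the figures listed above. The helper name `percent_change` is only illustrative.

```python
# Throughput figures (MiB/sec) quoted above for the single-file O_DIRECT run.
results = {
    ("ext4", 1): 87,
    ("ext4", 4): 74,
    ("xfs", 4): 97,
}

def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0

# How much ext4 loses when going from 1 to 4 threads.
ext4_scaling = percent_change(results[("ext4", 1)], results[("ext4", 4)])
# How far xfs is ahead of ext4 at 4 threads.
xfs_vs_ext4 = percent_change(results[("ext4", 4)], results[("xfs", 4)])

print(f"ext4, 1 -> 4 threads: {ext4_scaling:+.1f}%")
print(f"xfs vs ext4 at 4 threads: {xfs_vs_ext4:+.1f}%")
```

So ext4 loses roughly 15% going from 1 to 4 threads, and xfs is roughly 31% ahead of ext4 at 4 threads in this synchronous O_DIRECT case.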

It was pointed out to me that ext4 has an option, dioread_nolock, which supposedly fixes that, but the option is not available on my CentOS 6.2, so I could not test it.

At this point we may decide that xfs is still preferable, but there is one more point to consider.

Starting with MySQL 5.1 + InnoDB-plugin and later MySQL 5.5 (or equally Percona Server 5.1 and 5.5), InnoDB uses “asynchronous” IO on Linux.

Let’s test “async” mode in sysbench. Now we get:

    • ext4 4 threads: 120 MiB/sec
    • xfs 4 threads: 97 MiB/sec

These sysbench numbers correspond to the results I see running MySQL benchmarks (to be published later) on ext4 vs xfs.
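For readers who want to try a comparable async run, here is a minimal sketch that assembles a sysbench fileio command line. It assumes the legacy sysbench 0.4-style options of that era (`--test=fileio`, `--file-io-mode`, `--file-extra-flags`, `--num-threads`). The file size and run time are placeholders, not the settings behind the numbers above, and the helper name `sysbench_fileio_cmd` is made up for illustration.

```python
import shlex

def sysbench_fileio_cmd(threads, io_mode="async", total_size="10G", seconds=60):
    """Build a sysbench 0.4-style fileio command as a list of arguments.

    total_size and seconds are placeholder values (assumptions), not the
    settings used for the published numbers.
    """
    return [
        "sysbench",
        "--test=fileio",
        "--file-num=1",                   # the "single file" case
        f"--file-total-size={total_size}",
        "--file-test-mode=rndwr",         # random write workload
        "--file-block-size=16384",        # 16 KiB blocks
        "--file-extra-flags=direct",      # open the file with O_DIRECT
        f"--file-io-mode={io_mode}",      # "sync" or "async"
        f"--num-threads={threads}",
        f"--max-time={seconds}",
        "--max-requests=0",               # no request limit, stop on time
        "run",
    ]

cmd = sysbench_fileio_cmd(threads=4)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The test file has to be created first by running the same options with `prepare` in place of `run`. Passing `io_mode="sync"` gives the synchronous variant.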

Actually, the number of threads does not affect the result significantly. This relates to another question I was asked, namely: “If MySQL 5.5 uses async IO, is innodb_write_io_threads still important?” It seems it is not: in my tests it does not affect the final result. I would still use a value of 2 or 4, to avoid scheduling overhead from a single thread, but it does not seem critical.

In conclusion, ext4 looks like a good option, providing 20% better throughput. I am still going to run more benchmarks to get a better picture.

The script for tests:



28 Comments
Rolf
14 years ago

Could you provide a comparison of this to rawfs? Also, I was under the impression that a journaled FS was a bad idea on SSD.

Aurimas Mikalauskas
14 years ago

Vadim, was that done with write-back cache enabled?

rj03hou
14 years ago

If we do a FileIO test for MySQL, do you think we need to test many files?
Such as --file-num=32

joe
14 years ago

What would be interesting is to see your data beyond 4 threads. The presentation http://www.youtube.com/watch?v=FegjLbCnoBw shows EXT4 did really well until scaling beyond 4 threads.

Mike
14 years ago

Are you using innodb file-per-table? ibdata getting huge would drive me nuts

XL
14 years ago

A few days ago I ran a very similar test: the sysbench OLTP RW benchmark on spinning (non-SS) disks. The file systems on this “inherited” machine default to ext4, but I had Domas’ claims in my mind that xfs is superior. So I recreated the benchmark fs as xfs and repeated the sysbench run.

Up to 8 threads xfs was a few percent faster (~10% on average).
At 16 threads it was a draw (2036 tps vs. 2070 tps).
At 32 threads ext4 was 28% faster (2345 tps vs. 1829 tps).
At 64 threads ext4 was even 47% faster (2362 tps vs. 1601 tps).
At higher concurrency ext4 lost its bite, but was still consistently better than xfs.

I did not look deeper into this, but used the fs modules as they come with openSuSE and with default mount options.

Mike Schueler
14 years ago

Useful results for when we move to SSDs..

Were sync_binlog=1 and innodb_flush_log_at_trx_commit=1?

Mike Schueler
14 years ago

Oh, of course, hah. Well, I’m eagerly awaiting your MySQL results.

Dave Chinner
14 years ago

Hi Vadim,

You’ve just found a known problem that we are working to fix. This is a regression resulting from cleaning up the IO path in the XFS code – it’s put a lot more pressure on an exclusive lock by removing other bottlenecks. Hence when we hit contention on it, it degrades more quickly than it used to. This was brought to my attention recently:

http://oss.sgi.com/archives/xfs/2012-01/msg00325.html

and this was the prototype patch I wrote to fix the overwrite DIO performance problem:

http://oss.sgi.com/archives/xfs/2012-02/msg00219.html

Now that the prerequisite fixes have been merged into 3.4, I can move forward with this fix.

I ran the sysbench tests to check this was the problem – I don’t have an SSD available right now, so I ran the tests on a 2GB ramdisk with a 1.8GB file. Unpatched results – the numbers are throughput/IOPS, with throughput being in GB/s:

threads   sync XFS    sync ext4   async XFS   async ext4
1         1.90/124k   1.41/92k    1.72/112k   1.41/92k
2         1.01/64k    1.65/108k   0.97/62k    1.65/108k
4         0.27/17k    1.55/102k   0.21/13k    1.55/102k
8         0.13/8k     1.45/95k    0.15/9k     1.45/95k
16        0.12/7k     1.45/95k    0.12/7k     1.45/95k

It’s pretty clear from these results that lock contention is killing XFS as the thread count grows. ext4 performance shows that it uses exclusive locking as well, but it is not degrading like XFS is due to different lock types being used. With the above patch forward ported to 3.4-pre-RC1, the XFS results are:

threads   sync vanilla   sync patched   async vanilla   async patched
1         1.90/124k      1.83/120k      1.72/112k       1.69/111k
2         1.01/64k       2.85/185k      0.97/62k        2.57/168k
4         0.27/17k       3.68/241k      0.21/13k        3.41/223k
8         0.13/8k        4.42/290k      0.15/9k         4.16/273k
16        0.12/7k        4.95/325k      0.12/7k         4.86/319k

Throughput scales with thread count – each thread runs at 100% CPU utilisation, and XFS gets up to 3x as much throughput as ext4 does. Other testing I’ve done on this machine with this patch has given close to a million 4k overwrite IOPS to a single file when completely CPU bound…

So, basically, XFS is still the filesystem you want for direct IO – but like any filesystem, bugs do creep in as we improve stuff. In future, you might want to report such problems to the XFS list, rather than just blogging about it and assuming that it’s expected behaviour. Someone who thought this was a suspect result pointed me at your blog; I would have never found it otherwise.

Cheers,

Dave.
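As a quick check on the ratios in the comment above, here is a small sketch that recomputes the synchronous speedups from the GB/s figures in the two tables. The variable names are illustrative, and only the sync columns are used.

```python
# Sync throughput in GB/s, copied from the two tables in the comment above.
threads     = [1, 2, 4, 8, 16]
xfs_vanilla = [1.90, 1.01, 0.27, 0.13, 0.12]
xfs_patched = [1.83, 2.85, 3.68, 4.42, 4.95]
ext4        = [1.41, 1.65, 1.55, 1.45, 1.45]

rows = []
for n, vanilla, patched, e4 in zip(threads, xfs_vanilla, xfs_patched, ext4):
    # (thread count, patched XFS vs vanilla XFS, patched XFS vs ext4)
    rows.append((n, patched / vanilla, patched / e4))

for n, vs_vanilla, vs_ext4 in rows:
    print(f"{n:>2} threads: patched/vanilla = {vs_vanilla:6.2f}x, "
          f"patched/ext4 = {vs_ext4:5.2f}x")
```

At 16 threads the patched XFS figure comes out at about 3.4x the ext4 figure, in line with the "up to 3x" statement, while at 1 thread the patched and vanilla numbers are within a few percent of each other.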

Dave Chinner
14 years ago

> I will be waiting for fix in RedHat, until that I will consider ext4 as good alternative.

That’s a very short-sighted response. If you report the bug to RH, then a fix should become available at some point in the not-too-distant future, too (and perhaps you should pay for RHEL so you can ask for bug fixes as a proactive service to your customers). At that point, the normal rule of thumb (use XFS) will prevail, and all that will have resulted is that you have a bunch of unhappy people left using ext4 because they’ve already deployed it to production on the back of your recommendation…

Recommend ext4 for the right reasons – XFS having a performance regression isn’t one of them, because the moment XFS is fixed (and it will be) your recommendation is invalid. Your blog post will continue to be found by google for years to come, so it’s not just your current readers that you are providing bad advice to….

Dave.

Jerry Westerby
14 years ago

Vadim,

Selecting something as important as an Operating System or a major component like a File System on the basis of ONE fairly small issue is short-sighted in the extreme. Take a look at the Red Hat bug list at any one point; the list is quite long. For that matter, look at the bug list or patch list for MySQL/InnoDB at any one point in time. The list will be long, and to the uninitiated it will look frightening.

The same is true for commercial software, if the list is known. With commercial software and fat, annual license agreements, customers can try to pressure companies to move one patch ahead of another. I’ve worked in that environment, and can tell you that I’ve almost never seen a patch moved up that way unless something is completely broken.

There used to be, and in some places still is, a developer culture of common decency. That’s where things like the support of ZFS outside of Solaris come from… people do technical work for the pleasure of it, not for a pay check.

So in the first count, you are reacting to a patch issue in a way that is “fussy” in the extreme. The systems I put on-line have ZFS and we rely on it. I can just imagine some manager with little technical skill throwing your post over the wall at me. Worse, someone electing to use ext3/4 because of your post.

In the second event I find your comments rather disingenuous. Do you pay license fees for your operating system?

Benoit Sigoure
14 years ago

I’ve also had to benchmark ext4 vs XFS, on a RAID10 of spinning drives. Many DBAs like to assert that XFS is the way to go for MySQL, but I’m not sure how frequently they benchmark XFS vs ext4, and how much of their recommendation comes from the days of ext2/ext3.

So I decided to use sysbench and plot some graphs to compare the O_DIRECT performance of ext4 and XFS on the hardware we bought. I found that ext4 worked really well out of the box, while XFS required poorly documented knobs to be turned and still couldn’t beat ext4:

http://blog.tsunanet.net/2011/09/ext4-2x-faster-than-xfs.html

So we built all our MySQL DBs with ext4, and it’s been working great for the past 7 months. I recently had an interesting conversation with someone building a large Ceph cluster on top of XFS instead of btrfs, and his feedback was that some recent developments in the XFS world have greatly enhanced the metadata performance of XFS (especially with regards to metadata fragmentation), so maybe it’s time to do another benchmark.

What I found with XFS is that, to my great surprise, changing the number of allocation groups, setting the correct sunit/swidth for the RAID array, or using nobarrier, all have no statistically significant impact on performance, which seems to indicate that XFS had some bottleneck internally, maybe the lock contention issue that Dave was referring to above.

Disclaimer: I’m just someone who runs their own benchmarks, and am not religious about filesystems.

Benoit Sigoure
14 years ago

I wrote a couple scripts to be able to visualize sysbench results, they’re available here: https://github.com/tsuna/sysbench-tools

I understand you need the fixes in your kernel, and many of Percona’s customers are in the same situation. That’s why we don’t use RHEL/CentOS, because they come with generally outdated software and it’s hard to use newer versions or upgrade.

At the time I ran the benchmark above, we were using Linux 2.6.32, as my blog says. We’re now on 2.6.38, and whenever is the next time our servers reboot, they’ll be on kernel 3.x (depending what ‘x’ we use then, we’re currently on 3.0).

But I agree with you that there’s a difference between when the developers consider the issue fixed, and when end users can actually get the fix, regardless of whether or not you’re on a slow track with RHEL or on a faster track with a distro that’s closer to upstream.

sandeep
14 years ago

quick question – the xfs_freeze feature is really killer when one wants to do consistent snapshots/backup of database directories, etc. Is there anything similar in ext4 or is the benchmark of ext4 +lvm (which I know allows this) on par with xfs_freeze (without lvm) ?

Justin Rovang
13 years ago

Benoit, nice little script you’ve got there; using it!

prl77
13 years ago

Hey guys, great discussion, thank you for your contributions.
I’m currently building a new database server and am to the point of choosing a filesystem. I was going to go with XFS, but the above findings are concerning. I’m glad I came across this discussion.
Question for Dave Chinner or anyone else that might know – is there a released kernel that has this XFS fix included?

Joe
13 years ago

I was just curious if you will test this again now that the xfs bug looks to be fixed in RHEL 6.4. I am assuming CentOS 6.4 will be released pretty soon.

Elias
11 years ago

How would one know if you are running a patched kernel or not?
I’m using Ubuntu Server 12.04 with their standard kernel 3.2.0.

danblack
11 years ago

From Dave’s patch I found it merged in 3.5+ kernels https://github.com/torvalds/linux/commit/507630b29f13a3d8689895618b12015308402e22

If you see the same sort of code in your kernel sources (apt-get source linux-image-3.2.0-4-amd64) I guess it’s fixed. linux_3.2.60-1+deb7u3 didn’t include it FWIW.
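A first-pass check along those lines can be sketched as a plain version comparison. This assumes the 3.5 mainline threshold mentioned in the comment above, and the function names are hypothetical. It says nothing about distribution backports (an earlier comment suggests RHEL 6.4 carries the fix on a 2.6.32-based kernel), so a negative answer still calls for inspecting the kernel sources as described.

```python
import re

# Mainline version the comment above reports as containing the fix (assumption
# taken from that comment, not independently verified here).
FIXED_IN_MAINLINE = (3, 5)

def kernel_version(release):
    """Extract (major, minor) from a release string such as '3.2.0-4-amd64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))

def mainline_has_fix(release):
    """True if the version number alone is at or past the mainline threshold."""
    return kernel_version(release) >= FIXED_IN_MAINLINE

for release in ("3.2.0-4-amd64", "3.5.0", "3.10.0-123.el7.x86_64"):
    print(release, mainline_has_fix(release))
```

On a running system the release string is what `uname -r` prints, which Python exposes as `platform.release()`.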
