Update on fsync Performance

Last year, I wrote a post focused on the performance of the fsync call on various storage devices. The fsync call is extremely important for a database when durability, the “D” of the ACID acronym, is a hard requirement. The call ensures the data is permanently stored on disk. The durability requirement forces every transaction to return only when the InnoDB log file and the binary log file have been flushed to disk.

In this post, instead of focusing on the performance of various devices, we’ll see what can be done to improve fsync performance using an Intel Optane card.

Intel Optane

A few years ago, Intel introduced a new type of storage device based on the 3D XPoint technology and sold under the Optane brand. Those devices outperform regular flash devices and have higher endurance. In the context of this post, I found they are also very good at handling the fsync call, something many flash devices are not great at doing.

I recently had access to an Intel Optane NVMe card, a DC P4800X card with a storage capacity of 375GB. Let’s see how it can be used to improve performance.

Optane used directly as a storage device

This is by far the simplest option if your dataset fits on the card. Just install the device, create a filesystem, mount it and go. Using the same Python script as in the first post, the results are:

| Options | Fsync rate | Latency |
|---|---|---|
| ext4, O_DIRECT | 21200/s | 0.047 ms |
| ext4 | 20000/s | 0.050 ms |
| ext4, data=journal | 9600/s | 0.100 ms |


The above results are pretty amazing. The fsync performance is on par with a RAID controller with a write cache, for which I got a rate of 23000/s, and is much better than a regular NAND-based NVMe card like the Intel PC-3700, able to deliver an fsync rate of 7300/s. Even with the full ext4 journal enabled, the rate is still excellent although, as expected, it is roughly cut in half.
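If you want a quick way to get a comparable number on your own hardware without the original Python script, a rough approximation is possible with plain dd (GNU coreutils assumed; the exact figures will not match the script, but the order of magnitude will):

```shell
# oflag=dsync forces every 4 KiB write to reach stable storage before the
# next one is issued, so the elapsed time dd reports approximates the cost
# of 1000 synchronous flushes on the target filesystem.
tmpfile=$(mktemp)
dd if=/dev/zero of="$tmpfile" bs=4k count=1000 oflag=dsync
rm -f "$tmpfile"
```

Run it on a file located on the device you want to measure; dividing the write count by the elapsed time gives a rate comparable to the tables in this post.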

Optane used as the cache block device in a hybrid volume

If you have a large dataset, you can still use the Optane card as a read/write cache to improve fsync performance significantly. I did some tests with two easily available solutions, dm-cache and bcache. In both cases, the Optane card was put in front of an external USB SATA disk and the cache layer was set to writeback.

| Options | Fsync rate | Latency |
|---|---|---|
| No cache | 13/s | 75 ms |
| dm-cache | 3100/s | 0.32 ms |
| bcache | 2500/s | 0.40 ms |


Both solutions improve the fsync rate by two orders of magnitude. That’s still much slower than the straight device but a very decent trade-off.
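For reference, a bcache hybrid volume of the kind tested above can be assembled along these lines. This is a sketch, not the exact commands used for the benchmark; the device names are illustrative and must be adapted to your system:

```shell
# Format the slow disk as the backing device and the Optane partition
# as the cache device (this destroys existing data on both).
make-bcache -B /dev/sdb           # slow USB SATA disk (backing)
make-bcache -C /dev/nvme0n1p1     # Optane partition (cache)

# Attach the cache set to the backing device using the cache set UUID
# printed by make-bcache -C, then switch the cache mode to writeback,
# which is what makes the fsync latency drop.
echo <cache-set-uuid> > /sys/block/bcache0/bcache/attach
echo writeback > /sys/block/bcache0/bcache/cache_mode
```

The resulting /dev/bcache0 device is then formatted and mounted like any other block device.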

Optane used as a ZFS SLOG

ZFS can also use a fast device for its write journal, the ZIL. Such a device in ZFS terminology is called a SLOG. With the ZFS logbias set to “latency”, here is the impact of using an Optane device as SLOG in front of the same slow USB SATA disk:

| Options | Fsync rate | Latency |
|---|---|---|
| ZFS, SLOG | 7400/s | 0.135 ms |
| ZFS, no SLOG | 28/s | 36 ms |


The addition of a SLOG device boosted the fsync rate by a factor of nearly 260. The rates are also more than double the ones reported using dm-cache and bcache, and about a third of the result obtained using the Optane device directly for storage.  Considering all the added benefits of ZFS, like compression and snapshots, that’s a really interesting result.
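A setup equivalent to the one benchmarked here can be sketched as follows. The pool name and device paths are illustrative, not the ones from the test machine:

```shell
# Create a pool on the slow USB SATA disk, then add the Optane
# partition as a dedicated log vdev (SLOG).
zpool create tank /dev/sdb
zpool add tank log /dev/nvme0n1p2

# logbias=latency routes synchronous writes through the ZIL, and
# therefore through the SLOG device, which is where the fsync gain
# comes from.
zfs set logbias=latency tank
```

With logbias=throughput instead, ZFS bypasses the SLOG for large synchronous writes, so the setting matters for this workload.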


If you are struggling with the commit latency of a large transactional database, 3D XPoint devices like the Intel Optane may offer you new options.

Comments (4)

  • Andy

    What about Optane used directly as a storage device but with ZFS instead of ext4 as the filesystem? Would ZFS be faster than ext4 in that case?

    September 20, 2019 at 1:07 am
    • Li Ben

      ZFS with an SSD ZIL will be fast at first, but it will slow down when the log is applied to the slow disks, unless all your fsync writes can be converted to append-style writes.
      In real cases, it will be a mix of random writes and sequential writes.

      October 14, 2019 at 6:09 am
  • BK

    I read your article with great interest.

    I’d like to take an Optane for SLOG such as yours.
    How do I configure the system?

    March 9, 2020 at 10:38 pm
  • Yves Trudeau

    My Optane card has 3 partitions, 2 of 10GB and one with the remaining space. Keep in mind, this is not prod, just my home server. For ZFS, I use the 2nd for SLOG and the 3rd for cache (l2arc):

    zpool add data log /dev/disk/by-id/nvme-INTEL_SSDPED1K375GA_PHKS750500FR375AGN-part2
    zpool add data cache /dev/disk/by-id/nvme-INTEL_SSDPED1K375GA_PHKS750500FR375AGN-part3

    My ZFS pool is named data. In any production environment, the log should ideally be a mirror of 2 Optane cards. Losing the SLOG is pretty bad. 10GB for the SLOG is quite a lot.
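    A mirrored log vdev of the kind suggested above would look like this (the second card's device path is hypothetical):

    ```shell
    # Add the matching partitions of two Optane cards as a mirrored SLOG,
    # so losing one card does not lose the in-flight ZIL data.
    zpool add data log mirror \
        /dev/disk/by-id/nvme-OPTANE_CARD_1-part2 \
        /dev/disk/by-id/nvme-OPTANE_CARD_2-part2
    ```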

    March 10, 2020 at 9:02 am
