Last year, I wrote a post focused on the performance of the fsync call on various storage devices. The fsync call is extremely important for a database when durability, the “D” of the ACID acronym is a hard requirement. The call ensures the data is permanently stored on disk. The durability requirement forces every transaction to return only when the InnoDB log file and they binary log file have been flushed to disk.
In this post, instead of focusing on the performance of various devices, we’ll see what can be done to improve fsync performance using an Intel Optane card.
A few years ago, Intel introduced a new type of storage devices based on the 3D_XPoint technology and sold under the Optane brand. Those devices are outperforming regular flash devices and have higher endurance. In the context of this post, I found they are also very good at handling the fsync call, something many flash devices are not great at doing.
I recently had access to an Intel Optane NVMe card, a DC P4800X card with a storage capacity of 375GB. Let’s see how it can be used to improve performance.
This is by far the simplest option if your dataset fits on the card. Just install the device, create a filesystem, mount it and go. Using the same python script as in the first post, the results are:
| Options | Fsync rate | Latency |
| ext4, O_DIRECT | 21200/s | 0.047 ms |
| ext4 | 20000/s | 0.050 ms |
| ext4, data=journal | 9600/s | 0.100 ms |
The above results are pretty amazing. The fsync performance is on par with a RAID controller with a write cache, for which I got a rate of 23000/s and is much better than a regular NAND based NVMe card like the Intel PC-3700, able to deliver a fsync rate of 7300/s. Even enabling the full ext4 journal, the rate is still excellent although, as expected, cut by about half.
If you have a large dataset, you can still use the Optane card as a read/write cache and improve fsync performance significantly. I did some tests with two easily available solutions, dm-cache and bcache. In both cases, the Optane card was put in front of an external USB Sata disk and the cache layer set to writeback.
| Options | Fsync rate | Latency |
| No cache | 13/s | 75 ms |
| dm-cache | 3100/s | 0.32 ms |
| bcache | 2500/s | 0.40 ms |
Both solutions improve the fsync rate by two orders of magnitude. That’s still much slower than the straight device but a very decent trade-off.
ZFS can also use a fast device for its write journal, the ZIL. Such a device in ZFS terminology is called a SLOG. With the ZFS logbias set to “latency”, here is the impact of using an Optane device as SLOG in front of the same slow USB SATA disk:
| Options | Fsync rate | Latency |
| ZFS, SLOG | 7400/s | 0.135 ms |
| ZFS, no SLOG | 28/s | 36 ms |
The addition of SLOG device boosted fsync rate by a factor of nearly 260. The rates are also twice as important as the ones reported using dm-cache and bcache and about a third of the result using the Optane device for storage. Considering all the added benefits of ZFS like compression and snapshots, that’s a really interesting result.
If you are struggling with the commit latency of a large transactional database, 3D_XPoint devices like the Intel Optane may offer you new options.
Resources
RELATED POSTS