SSD, XFS, LVM, fsync, write cache, barrier and lost transactions

We finally managed to get an Intel X25-E SSD drive into our lab. I attached it to our Dell PowerEdge R900. The story of making it run deserves a separate mention – along with the Intel X25-E I got a HighPoint 2300 controller, and CentOS 5.2 simply could not start with two RAID controllers (Perc/6i and HighPoint 2300). The problem was solved by installing Ubuntu 8.10, which is currently running the whole system. Originally I wanted to publish some nice benchmarks where InnoDB on SSD outperforms RAID 10, but recently I ran into an issue which may make those results inconsistent.

In short: using the Intel X25-E SSD with the write cache enabled (which is the default and the fastest mode) does not guarantee that all committed InnoDB transactions reach permanent storage.
I am having some déjà vu here, as Peter raised this five years ago regarding regular IDE disks, and I did not expect the question to pop up again.

The long story:
I started by putting XFS on the SSD and running a very primitive test: INSERT INTO fs VALUES(0) into an auto-increment field of an InnoDB table. The InnoDB parameters are:
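The full option file is omitted here; a minimal my.cnf sketch covering the two settings discussed below (everything else left at defaults) would be:

```ini
[mysqld]
# flush (fsync) the InnoDB log at every transaction commit
innodb_flush_log_at_trx_commit = 1
# bypass the OS page cache for data files
innodb_flush_method = O_DIRECT
```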

Actually the most interesting ones are innodb_flush_log_at_trx_commit=1 and innodb_flush_method=O_DIRECT (I also tried the default innodb_flush_method, with the same result). With innodb_flush_log_at_trx_commit=1 I expect to keep all committed transactions even in case of a system failure.

Running this test with default XFS settings I saw the SSD doing 50 writes/s – something so low it forced me to check the results several times. Come on, it's an SSD, we should get much more IO than that. Investigation led me to the barrier/nobarrier mount options, and with mount -o nobarrier I got 5300 writes/s. A nice difference, and this is what we want from an SSD.
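The gap is essentially the cost of waiting for the media on every commit. You can reproduce the same effect outside of MySQL with a plain dd test (file names here are arbitrary; run it on the filesystem under test):

```shell
# Each 512-byte write is forced to stable storage before dd continues,
# mimicking one fsync'ed transaction commit per write:
dd if=/dev/zero of=sync-test.tmp bs=512 count=1000 oflag=dsync

# The same amount of data without any syncing, for comparison:
dd if=/dev/zero of=nosync-test.tmp bs=512 count=1000

rm -f sync-test.tmp nosync-test.tmp
```

dd reports elapsed time and throughput for each run; on a device where barriers are honored, the dsync run is dramatically slower.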

Now, to test durability, I pull the power plug on the SSD and check how many transactions were really stored – and there is the second bummer: the last N committed transactions are missing.

So now it is time to turn off the write cache on the SSD. All transactions are in place now, but the write speed is only 1200 writes/s, which is comparable with RAID 10.
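For reference, the on-drive write cache can be toggled with hdparm (the device name /dev/sdb is an assumption; substitute your own):

```shell
# disable the drive's write cache (durable, but slower):
hdparm -W 0 /dev/sdb

# re-enable it (fast, but unsafe on power loss):
hdparm -W 1 /dev/sdb

# show the current write-caching setting:
hdparm -W /dev/sdb
```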

So, in conclusion: to guarantee durability with this SSD we have to disable the write cache, which can affect performance significantly (I have no results at hand, but it is to be tested).

What about LVM? Well, we often recommend LVM for backup purposes (even though recent results are bad, we have no good replacement yet), so I tried XFS on top of LVM. With the write cache ON and default mount options (i.e. with barriers) I get 5250 writes/s – this is because LVM ignores write barriers (see ) – but again, with the write cache enabled you may lose transactions.
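A quick way to check whether barriers survive the stack is the kernel log right after mounting: XFS of that era logs a warning when the underlying device (e.g. a device-mapper volume) rejects barrier writes. The exact message text varies by kernel version, so take this grep as a sketch:

```shell
# look for XFS / device-mapper complaints about barriers after mounting:
dmesg | grep -i barrier
```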

So, the final conclusions:
1. The Intel X25-E is NOT reliable in its default mode.
2. To have durability we need to disable the write cache (with a corresponding performance penalty; how much remains to be tested).
3. A possible solution could be to put the SSD behind a RAID controller with a battery-backed write cache, but I am not sure which ones are good – another area for research.
4. XFS without LVM applies the barrier option, which decreases write performance a lot.


Comments (33)

  • peter

    It would be very interesting to check with Intel what their stand is on this.

    These are supposed to be “Enterprise” drives, so it is very strange that they ship with such an unsafe option as the default. It would also be worth checking what floats on the SATA wire in this case – do these drives really ignore both the “do not cache” flag, which should be set for O_DSYNC/O_DIRECT writes, and the cache flush, which should be issued with fsync?

    March 2, 2009 at 11:28 pm
  • rocky

    What kind of tool did you use for testing writes? And what parameters?

    March 3, 2009 at 1:04 am
  • Vadim


    It’s a very simple PHP script.

    For writes/s you can watch iostat -dx 5

    March 3, 2009 at 1:15 am
  • Phil

    Did you try aligning the blocks in the FS to the blocks of the SSD?

    March 3, 2009 at 4:06 am
  • Sean

    Vadim, do you have any tests related to read-oriented queries, i.e. 90%+ reads, using InnoDB and MyISAM?


    March 3, 2009 at 3:17 pm
  • Vadim


    No, I did not try that – do you think it will help with fsync?

    March 4, 2009 at 10:23 am
  • Vadim


    Are you asking about reads on SSD or in general?
    We have a lot of numbers, just need to sort them out 🙂

    March 4, 2009 at 10:23 am
  • Sean

    Hi Vadim,

    Yes, I’m interested in reads on SSD for both InnoDB and MyISAM, though primarily MyISAM, given our environment houses static data (compressed tables). I’ve read the papers and had presentations from vendors, but have not had the opportunity to get some in-house. I’m curious to know how they perform in the described situation, but also about the TCO and how to save on server hardware. Given SSDs respond to reads within 0.2–0.4 ms, does a server need 32, 64, 128G of memory anymore?


    March 4, 2009 at 10:45 am
  • Theodore Tso

    FYI, starting in 2.6.29, LVM will start respecting write barriers. (finally!)

    March 4, 2009 at 3:03 pm
  • Mark Callaghan

    @Sean — there is a great paper on where to spend money (RAM, Flash, disk) and it addresses your question — spend more on Flash and less on RAM. I have not seen anyone publish benchmark results based on the ideas in the paper —

    March 9, 2009 at 8:57 am
  • TS


    2. To have durability we need to disable write cache. (Basically reducing random write IOPS from 5000+ to 1200ish.)

    Is this issue XFS-specific, MySQL-specific, or hardware? Does it happen with ext3 under Linux? If it is a file system or MySQL issue, then it is not too bad. If it is a hardware issue, Intel has basically done false advertising. The enterprise market requires write IOPS durability. If it is indeed a hardware issue, there is no reason why people would pay the ridiculous price of the X25-E if they are forced to disable its write cache and accept 1200 random write IOPS instead of 5K+. I can foresee a drive recall, or at least a huge price reduction, followed by a PCB revision with a supercapacitor-backed design.

    This issue must be broadcast to the entire SQL database community. I am sure a lot of people are indeed picking the X25-E on the DB servers right now and they might be risking their data. Preferably, Intel must give an official word on this issue too.

    March 9, 2009 at 10:00 am
  • Vadim


    Currently I believe this is a hardware issue, not a filesystem or MySQL one. It seems the Intel X25-E does not have a battery to protect its write cache and simply loses it on power failure. To claim this 100% we need to run some more experiments, but for now I would not put a database that requires 100% durability on it.

    March 9, 2009 at 10:50 am
  • Andi Kleen

    Sorry, but you realize that nobarrier is the likely cause for the data loss, right? With barriers
    XFS fsync (but not necessarily ext3 fsync) would wait for a write barrier on the log commit, and thus
    also for the data. O_SYNC might be different though.

    Basically you specified the “please go unsafe but faster” option and then complain that it is
    actually unsafe.

    I would recommend doing the power-off test without nobarrier but with the write cache on.


    March 10, 2009 at 1:12 pm
  • Vadim


    I wrote that in the post. With barriers and the write cache we get 50 writes/s, which I consider not just “slower” but a disaster I would not put on a production system.

    March 10, 2009 at 1:57 pm
  • Andi Kleen

    It’s the cost of hitting the media. Unsafety like you chose is always faster.

    BTW LVM has been fixed in 2.6.29: if you only have a single backing disk it will pass through barriers. Still not for
    the multiple disk case which is questionable.

    March 10, 2009 at 6:36 pm
  • Vadim


    This cost is way too big. In this case RAID 10 on 8 disks + BBU is cheaper and gives much better results.
    That simply means SSD can’t be used as media for high performance durable databases.

    March 10, 2009 at 6:46 pm
  • justmy2cents

    Why not just add a UPS to your system?

    March 21, 2009 at 3:33 am
  • B

    @ Justmy2cents,

    I totally agree. Who isn’t running UPSes on their DB servers? Even an old one that could hold the system up for 30 seconds would be enough, as long as further transactions didn’t keep coming in during that time.


    March 28, 2009 at 3:09 pm
  • Mark Callaghan

    When will Amazon provide a UPS on EC2 servers?

    March 28, 2009 at 6:22 pm
  • Baron Schwartz

    A UPS isn’t the whole solution. What happens when your power supply (the one inside the server) fails, for example? “Keep the power from going off” and “keep the server from crashing” are not the same thing as “I want this device not to lie when I ask for this data to be written to durable storage.”

    March 29, 2009 at 6:35 am
  • B

    True, but even lower-cost servers come with redundant power supplies as an option. I certainly wouldn’t specify a server for mission-critical database data without redundant PSUs.


    March 29, 2009 at 4:37 pm
  • Baron Schwartz

    Jignesh presented on a similar topic at the PostgreSQL conference this weekend. Slides are at

    April 6, 2009 at 5:20 am
  • Baron Schwartz

    Oh, and the broad consensus in the room was “these things are worth a lot less if they lie to the operating system about durability, and a UPS is not an acceptable workaround” 🙂

    April 6, 2009 at 5:21 am

  • Peter Eisentraut

    Somewhat after the fact I realized that this information relates closely to the SSD tests I did with PostgreSQL, reported here: . With a plain dd test I get about a 50% performance loss when turning the write cache off on the X25-E. That’s much better than the more-than-fourfold loss you are observing (if I parse your numbers right). I’m planning to do a pgbench test later, which might provide more insight.

    July 9, 2009 at 7:55 am
  • Baron Schwartz

    Peter [Eisentraut], I’ve been following your benchmark blog posts with a lot of interest, too 🙂

    July 9, 2009 at 6:41 pm
  • andor

    Hi Vadim,

    I know your post was published a long time ago, but I have a really basic question: how do you turn off the write cache on the X25-E? I’ve searched everywhere for how to do it, but could only find studies showing the performance drop after doing it. I used sdparm in the following fashion:

    sudo sdparm -s WCE=0 -S /dev/sdc

    but get an error like this:

    /dev/sdc: ATA SSDSA2SH032G1GN 045C
    change_mode_page: mode page indicates it is not savable but
    '--save' option given (try without it)

    The error seems to suggest we cannot switch the X25-E’s cache.. but we know that’s not the case.

    I am asking this question on the wrong forum I believe, but I thought you might have a ready solution for me..


    January 21, 2010 at 2:35 pm
  • Vadim


    hdparm -W 0 /dev/sdb

    works for me.


    setting drive write-caching to 0 (off)
    write-caching = 0 (off)

    January 21, 2010 at 2:41 pm
  • andor

    Vadim, that’s awesome – that did the trick for me! It says the cache is switched off. I was tinkering with the sdparm utility as I thought the device file ‘sdc’ starting with ‘s’ meant it needed sdparm (one of my friends suggested this; he said that since we connected the SSD through a SAS controller, we should be using sdparm. I have no idea what this SAS controller is). Thanks a lot for the input.

    January 21, 2010 at 2:51 pm
  • Robert

    I know this post is quite old, but it would be great to know the firmware version of the Intel X25-E SSD you tested.

    April 6, 2010 at 2:08 pm
  • Evan Jones

    Note that according to my tests, the Intel X25-M G2 loses data even when the write cache is disabled. I have not tested the X25-E, but I suspect this may be an issue there as well. See the following for more info:

    August 24, 2010 at 6:05 am
  • SAB

    Sorry for digging this post up, but did you try those tests with sysbench? I ran a few tests with the write cache on and off, but I don’t have an SSD to compare the results.

    January 9, 2012 at 3:06 pm
  • Angel Genchev

    It’s interesting how the “Professional” SSD from Samsung performs here – the Samsung 850 Pro with 3D V-NAND, which as of 2014–2015 is among the best SSDs.

    July 8, 2015 at 6:06 pm

Comments are closed.
