Storm on Demand + InnoDB Issues


  • Storm on Demand + InnoDB Issues

    Hey all,

    I have a pretty puzzling issue with InnoDB and LiquidWeb's Storm on Demand service. Basically, the issue revolves around backups. They said they can't do anything to change their backup system, so I thought maybe there were some MySQL settings that could help with the issue.

    From what I understand, Storm uses a Xen-based virtualization infrastructure and takes daily point-in-time snapshot backups. How exactly it does this, they won't disclose. All I know is that at exactly the time the backup starts, the MySQL database becomes completely unresponsive for around 2 minutes. Even simple selects go on hold. If I run "show processlist", a ton of processes are in "statistics", "sorting results", or similar states, and these are all very simple index-based queries. My entire dataset also easily fits in the buffer pool.

    So, here's what I think is happening. The DB does around 10-15 updates, deletes or inserts per second. In order to take a point-in-time backup, the Xen system freezes the state of the hard disks for a brief period of time, during which nothing can be written to disk. Since the database is unable to write anything to disk, it can't commit its inserts, updates or deletes, and everything is placed on hold. BTW, I have this setting:

    innodb_flush_log_at_trx_commit = 2

    So, does this sound about right to you guys? If so, is there any way you know of to stop the database from becoming unresponsive?


    P.S. I just changed innodb_flush_log_at_trx_commit to 0 as my data is not mission critical. We'll see if that makes any difference.

  • #2

    Just an FYI to all: setting innodb_flush_log_at_trx_commit to 0 seems to have solved the issue. If anyone wants to jump in as to why this is, I'd appreciate it. My thinking is that with =0, if the disk is unresponsive, InnoDB can wait a few minutes to write until it is responsive again. With =2, it can't; it still has to do some writing at each commit.
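
As a rough illustration of the reasoning above, the documented behaviour of the three innodb_flush_log_at_trx_commit values can be written down as a small lookup table. The function name may_stall_during_freeze is hypothetical, and the idea that the snapshot blocks file writes is only the poster's guess, not something the provider has confirmed.

```python
# Documented behaviour of innodb_flush_log_at_trx_commit:
#   0: log is written and flushed to disk roughly once per second
#   1: log is written and flushed to disk at every commit
#   2: log is written at every commit, flushed roughly once per second
FLUSH_MODES = {
    0: {"write_on_commit": False, "flush_on_commit": False},
    1: {"write_on_commit": True, "flush_on_commit": True},
    2: {"write_on_commit": True, "flush_on_commit": False},
}


def commit_touches_log_file(mode: int) -> bool:
    """Return True if the commit itself issues a write to the redo log file."""
    return FLUSH_MODES[mode]["write_on_commit"]


def may_stall_during_freeze(mode: int) -> bool:
    """Hypothetical helper: could a commit block while the disk is frozen?

    Assumption (unconfirmed): the snapshot blocks any write to the log file,
    so every commit that writes to it has to wait for the freeze to end.
    """
    return commit_touches_log_file(mode)


for mode in sorted(FLUSH_MODES):
    print(mode, may_stall_during_freeze(mode))
```

Under that assumption, only =0 keeps commits away from the log file, which would fit what was observed. The trade-off is that up to about a second of committed transactions can be lost if mysqld crashes.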



    • #3
      That setting just delays the flush operation. With it, the number of dirty pages in memory increases. This may cause freezing at a later point, when those pages have not been flushed and memory is required for new pages.


      • #4

        Thank you for the feedback. I see what you mean. Since the server has far more RAM allocated to the buffer pool than it needs, I don't think a few minutes of delayed flushes would be a big concern. When the disk is operating normally, the dirty page percentage is only about 0.6%:

        Total memory allocated 15112556002; in additional pool allocated 20804864
        Buffer pool size 832000
        Free buffers 339941
        Database pages 461223
        Modified db pages 2866
        Pending reads 0
        Pending writes: LRU 0, flush list 0, single page 0
        Pages read 528169, created 1436905, written 88782476
        0.00 reads/s, 0.00 creates/s, 12.09 writes/s
        Buffer pool hit rate 1000 / 1000
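
For anyone who wants to check the 0.6% figure, here is a minimal sketch that parses the counters above and computes the ratios. It assumes the percentage is taken relative to "Database pages", which is what matches the number quoted. The parse_counters helper is made up for this example.

```python
import re

# Counters copied from the status output above.
STATUS = """\
Buffer pool size 832000
Free buffers 339941
Database pages 461223
Modified db pages 2866
"""


def parse_counters(text: str) -> dict:
    """Parse lines of the form '<label> <integer>' into a dict."""
    counters = {}
    for line in text.splitlines():
        match = re.match(r"^\s*(.+?)\s+(\d+)\s*$", line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


counters = parse_counters(STATUS)

# Dirty pages as a share of the pages currently holding data.
dirty_pct = 100.0 * counters["Modified db pages"] / counters["Database pages"]
# Unused buffers as a share of the whole buffer pool.
free_pct = 100.0 * counters["Free buffers"] / counters["Buffer pool size"]

print(f"dirty pages:  {dirty_pct:.2f}% of database pages")
print(f"free buffers: {free_pct:.2f}% of the buffer pool")
```

This gives roughly 0.62% dirty pages and about 40.86% of the pool still free.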

        However, I could see how on a server doing more write operations or with less buffer pool, this could be a problem. If and when it becomes one for me, I could just increase the buffer pool further to give it time to smooth out any disk freezes. We're only talking about a couple of minutes a day here though.
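
A back-of-envelope sketch of that headroom argument, using the numbers from this thread. It makes a crude assumption: the "writes/s" figure is used as a stand-in for how fast pages get dirtied while the disk is frozen, which is only a rough proxy. The constant and function names are invented for the example.

```python
FREE_BUFFERS = 339941        # "Free buffers" from the status output
PAGE_WRITES_PER_SEC = 12.09  # "writes/s" from the status output
FREEZE_SECONDS = 120         # "around 2 minutes" from the first post


def pages_backlogged(rate_per_sec: float, seconds: float) -> float:
    """Estimate pages that pile up unflushed during a freeze.

    Assumption: pages are dirtied at about the normal page write rate.
    """
    return rate_per_sec * seconds


backlog = pages_backlogged(PAGE_WRITES_PER_SEC, FREEZE_SECONDS)
headroom_ratio = FREE_BUFFERS / backlog

print(f"estimated backlog: {backlog:.0f} pages")
print(f"free buffers cover that about {headroom_ratio:.0f} times over")
```

On these assumptions the backlog is on the order of 1,450 pages against roughly 340,000 free buffers, so a two-minute freeze is nowhere near exhausting the pool on this workload.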