Open Source Databases on Big Machines: Disk Speed and innodb_io_capacity


In this blog post, I'll look for the bottleneck that prevented the tests in my previous post from achieving better results.

The powerful machine I used for the tests in my previous post has a comparatively slow disk, so I expected the tests to hit a point where I could not increase performance further because of disk speed.
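The "about 3K IOPS" figure below is the kind of number you can sanity-check with a short random-write fio run. A minimal sketch (the target file path, block size, and queue depth are illustrative assumptions, not the exact commands from my tests):

$ # Rough estimate of random-write IOPS on the volume under test;
$ # /mnt/data/fio.test is a placeholder path.
$ fio --name=randwrite --filename=/mnt/data/fio.test --size=4G \
    --rw=randwrite --bs=16k --direct=1 --ioengine=libaio \
    --iodepth=32 --runtime=60 --time_based --group_reporting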

Hardware configuration:

Processors: physical = 4, cores = 72, virtual = 144, hyperthreading = yes
Memory: 3.0T
Disk speed: about 3K IOPS
OS: CentOS 7.1.1503
File system: XFS

Versions tested and configuration: same as in the first post of this series (check the post for specifics).

Even though I expected the tests to stop scaling because of disk speed, I did not observe high IO rates in the iostat output. I was already testing with a data set that fits entirely in memory, so write performance affected only data flushes and log writes. But we should still have seen a visible decrease in speed. So I decided to run the RW tests entirely in memory: I created a ramdisk and put the MySQL datadir on it. Surprisingly, the results on the SSD and on the ramdisk did not differ.
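For reference, here is a minimal sketch of how a tmpfs-backed datadir can be set up; the mount point, size, and paths are assumptions for illustration, not the exact commands I ran:

$ # Stop MySQL, mount a tmpfs ramdisk, and copy the datadir onto it.
$ sudo systemctl stop mysql
$ sudo mkdir -p /mnt/ramdisk
$ sudo mount -t tmpfs -o size=64G tmpfs /mnt/ramdisk
$ sudo cp -rp /var/lib/mysql /mnt/ramdisk/mysql
$ # Point datadir=/mnt/ramdisk/mysql in my.cnf, restart MySQL, and
$ # confirm with iostat -x that the physical device stays idle.
$ sudo systemctl start mysql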

I asked my colleagues from “Postgres Professional” to test PostgreSQL on the ramdisk. They got similar results:

Interestingly, the value of innodb_io_capacity has no effect in this situation. The data for the graph below was taken while I ran the tests on the ramdisk: I wanted to see whether I could use this variable to control the IO activity of a disk that is extremely fast by default.
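To illustrate, the check can be as simple as sweeping the variable at runtime while the benchmark runs and comparing throughput between settings. A sketch, assuming a sysbench-style RW workload as elsewhere in this series (the values are arbitrary examples, not my test script):

$ # Raise innodb_io_capacity step by step while the RW test is running;
$ # on the ramdisk, throughput did not change across these settings.
$ for cap in 200 2000 20000; do
>   mysql -e "SET GLOBAL innodb_io_capacity_max = $(( cap * 2 ))"
>   mysql -e "SET GLOBAL innodb_io_capacity = $cap"
>   sleep 300   # give each setting a full measurement interval
> done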

This totally contradicts all my past experience with smaller machines. Percona had re-purposed the machine with the faster disk that I used before (described in this post), so I used a similar one with a slower disk.

Hardware configuration:

Processors: physical = 2, cores = 12, virtual = 24, hyperthreading = yes
Memory: 47.2G
Disk speed: about 3K IOPS
OS: Ubuntu 14.04.5 LTS (trusty)
File system: ext4

Again, on this machine with a smaller number of CPU cores, the innodb_io_capacity benchmarks showed more predictable results.

Conclusion:

On a machine with a large number of CPU cores, both MySQL and PostgreSQL hit CPU resource limits before disk speed can start affecting performance. We tested only one scenario, however; with other scenarios, the results might be different.
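A quick way to see which limit you hit first is to watch CPU and disk saturation side by side during a run. A generic sketch, not the exact monitoring commands from these tests:

$ # If CPU %idle approaches zero while the device stays far below its
$ # IOPS ceiling (r/s + w/s) and %util, the workload is CPU-bound.
$ mpstat 10 &
$ iostat -x 10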



Comments (10)

  • Abbas Ahmed Reply

    Sveta, this is a very interesting observation indeed. However, I was hoping that you would have kept using the same filesystem in both scenarios; any reason for not keeping it the same? If they were actually different, do we risk this comparison becoming a filesystem benchmark?

    March 1, 2017 at 9:48 pm
    • Sveta Smirnova Reply

      Do you mean the Percona and Freematiq machines? I did not try to compare them. But the Percona machines are more like those I see when working on Support issues, and they demonstrate the behavior I am used to. So I use them as a baseline and to check that I am not doing something wrong with the test itself.

      This is certainly not a filesystem benchmark, because the workload is not disk-bound by nature. Still, I expected to hit disk performance issues on the Freematiq machine at some point, and I did not.

      March 2, 2017 at 5:23 am
  • Holger Thiel Reply

    The filesystem is different.

    March 2, 2017 at 4:23 am
    • Sveta Smirnova Reply

      Yes, but I am not comparing filesystems, or even the Freematiq and Percona machines. I used the smaller machine to show the behavior I expect based on my Support experience. See the more detailed reply to the previous comment.

      March 2, 2017 at 5:26 am
  • Jouni Järvinen Reply

    If hundreds or thousands of queries faster speed ain’t worth minding, what is?

    March 2, 2017 at 8:24 am
  • Mark Callaghan Reply

    Did you get PMP output to see where the CPU/mutex bottlenecks occur?

    March 2, 2017 at 3:08 pm
    • Sveta Smirnova Reply

      And the first lines of PMP output:

      $ head -n 10 pt-pmp.log.20170303
      Fri Mar 3 10:51:09 UTC 2017
      29 pthread_cond_wait,wait(os0event.cc:165),os_event::wait_low(os0event.cc:165),sync_array_wait_event(sync0arr.cc:475),TTASEventMutex::wait(ut0mutex.ic:89),spin_and_try_lock(ib0mutex.h:850),enter(ib0mutex.h:850),PolicyMutex(ib0mutex.h:850),lock_table(lock0lock.cc:4008),row_search_mvcc(row0sel.cc:4904),ha_innobase::index_read(ha_innodb.cc:8634),handler::ha_index_read_map(handler.cc:3031),handler::read_range_first(handler.cc:7297),handler::multi_range_read_next(handler.cc:6367),DsMrr_impl::dsmrr_next(handler.cc:6751),QUICK_RANGE_SELECT::get_next(opt_range.cc:11182),rr_quick(records.cc:398),mysql_update(sql_update.cc:803),Sql_cmd_update::try_single_table_update(sql_update.cc:2884),Sql_cmd_update::execute(sql_update.cc:3011),mysql_execute_command(sql_parse.cc:3616),Prepared_statement::execute(sql_prepare.cc:3935),Prepared_statement::execute_loop(sql_prepare.cc:3543),mysqld_stmt_execute(sql_prepare.cc:2549),dispatch_command(sql_parse.cc:1395),do_command(sql_parse.cc:999),handle_connection(connection_handler_per_thread.cc:300),pfs_spawn_thread(pfs.cc:2188),start_thread(libpthread.so.0),clone(libc.so.6)
      28 pthread_cond_wait,native_cond_wait(thr_cond.h:140),my_cond_wait(thr_cond.h:140),inline_mysql_cond_wait(thr_cond.h:140),Per_thread_connection_handler::block_until_new_connection(thr_cond.h:140),handle_connection(connection_handler_per_thread.cc:329),pfs_spawn_thread(pfs.cc:2188),start_thread(libpthread.so.0),clone(libc.so.6)
      20 pthread_cond_wait,wait(os0event.cc:165),os_event::wait_low(os0event.cc:165),sync_array_wait_event(sync0arr.cc:475),TTASEventMutex::wait(ut0mutex.ic:89),spin_and_try_lock(ib0mutex.h:850),enter(ib0mutex.h:850),PolicyMutex(ib0mutex.h:850),log_write_up_to(log0log.cc:1249),trx_flush_log_if_needed_low(trx0trx.cc:1789),trx_flush_log_if_needed(trx0trx.cc:1789),trx_commit_complete_for_mysql(trx0trx.cc:1789),innobase_commit(ha_innodb.cc:4345),ha_commit_low(handler.cc:1900),TC_LOG_DUMMY::commit(tc_log.cc:28),ha_commit_trans(handler.cc:1796),trans_commit(transaction.cc:239),mysql_execute_command(sql_parse.cc:4302),mysql_parse(sql_parse.cc:5611),dispatch_command(sql_parse.cc:1461),do_command(sql_parse.cc:999),handle_connection(connection_handler_per_thread.cc:300),pfs_spawn_thread(pfs.cc:2188),start_thread(libpthread.so.0),clone(libc.so.6)
      10 __io_getevents_0_4(libaio.so.1),LinuxAIOHandler::collect(os0file.cc:2492),LinuxAIOHandler::poll(os0file.cc:2638),os_aio_linux_handler(os0file.cc:2694),os_aio_handler(os0file.cc:2694),fil_aio_wait(fil0fil.cc:5817),io_handler_thread(srv0start.cc:308),start_thread(libpthread.so.0),clone(libc.so.6)
      4 pthread_cond_wait,wait(os0event.cc:165),os_event::wait_low(os0event.cc:165),sync_array_wait_event(sync0arr.cc:475),TTASEventMutex::wait(ut0mutex.ic:89),spin_and_try_lock(ib0mutex.h:850),enter(ib0mutex.h:850),PolicyMutex(ib0mutex.h:850),lock_clust_rec_read_check_and_lock(lock0lock.cc:6267),sel_set_rec_lock(row0sel.cc:1252),row_search_mvcc(row0sel.cc:5319),ha_innobase::index_read(ha_innodb.cc:8634),handler::ha_index_read_map(handler.cc:3031),handler::read_range_first(handler.cc:7297),handler::multi_range_read_next(handler.cc:6367),DsMrr_impl::dsmrr_next(handler.cc:6751),QUICK_RANGE_SELECT::get_next(opt_range.cc:11182),rr_quick(records.cc:398),mysql_update(sql_update.cc:803),Sql_cmd_update::try_single_table_update(sql_update.cc:2884),Sql_cmd_update::execute(sql_update.cc:3011),mysql_execute_command(sql_parse.cc:3616),Prepared_statement::execute(sql_prepare.cc:3935),Prepared_statement::execute_loop(sql_prepare.cc:3543),mysqld_stmt_execute(sql_prepare.cc:2549),dispatch_command(sql_parse.cc:1395),do_command(sql_parse.cc:999),handle_connection(connection_handler_per_thread.cc:300),pfs_spawn_thread(pfs.cc:2188),start_thread(libpthread.so.0),clone(libc.so.6)
      3 pthread_cond_wait,wait(os0event.cc:165),os_event::wait_low(os0event.cc:165),sync_array_wait_event(sync0arr.cc:475),TTASEventMutex::wait(ut0mutex.ic:89),spin_and_try_lock(ib0mutex.h:850),enter(ib0mutex.h:850),PolicyMutex(ib0mutex.h:850),lock_sec_rec_modify_check_and_lock(lock0lock.cc:6094),btr_cur_del_mark_set_sec_rec(btr0cur.cc:4967),row_upd_sec_index_entry(row0upd.cc:2264),row_upd_sec_step(row0upd.cc:2332),row_upd(row0upd.cc:2332),row_upd_step(row0upd.cc:3158),row_update_for_mysql_using_upd_graph(row0mysql.cc:2524),row_update_for_mysql(row0mysql.cc:2722),ha_innobase::delete_row(ha_innodb.cc:8252),handler::ha_delete_row(handler.cc:7997),Sql_cmd_delete::mysql_delete(sql_delete.cc:468),Sql_cmd_delete::execute(sql_delete.cc:1382),mysql_execute_command(sql_parse.cc:3616),Prepared_statement::execute(sql_prepare.cc:3935),Prepared_statement::execute_loop(sql_prepare.cc:3543),mysqld_stmt_execute(sql_prepare.cc:2549),dispatch_command(sql_parse.cc:1395),do_command(sql_parse.cc:999),handle_connection(connection_handler_per_thread.cc:300),pfs_spawn_thread(pfs.cc:2188),start_thread(libpthread.so.0),clone(libc.so.6)
      3 pthread_cond_wait,wait(os0event.cc:165),os_event::wait_low(os0event.cc:165),buf_flush_page_cleaner_worker(buf0flu.cc:3496),start_thread(libpthread.so.0),clone(libc.so.6)
      2 pthread_cond_wait,wait(os0event.cc:165),os_event::wait_low(os0event.cc:165),sync_array_wait_event(sync0arr.cc:475),TTASEventMutex::wait(ut0mutex.ic:89),spin_and_try_lock(ib0mutex.h:850),enter(ib0mutex.h:850),PolicyMutex(ib0mutex.h:850),lock_update_delete(lock0lock.cc:3561),btr_cur_optimistic_delete_func(btr0cur.cc:5128),row_purge_remove_sec_if_poss_leaf(row0purge.cc:541),row_purge_remove_sec_if_poss(row0purge.cc:587),row_purge_upd_exist_or_extern_func(row0purge.cc:587),row_purge_record_func(row0purge.cc:987),row_purge(row0purge.cc:987),row_purge_step(row0purge.cc:987),que_thr_step(que0que.cc:1056),que_run_threads_low(que0que.cc:1056),que_run_threads(que0que.cc:1056),srv_task_execute(srv0srv.cc:2458),srv_worker_thread(srv0srv.cc:2458),start_thread(libpthread.so.0),clone(libc.so.6)
      2 pthread_cond_wait,wait(os0event.cc:165),os_event::wait_low(os0event.cc:165),sync_array_wait_event(sync0arr.cc:475),TTASEventMutex::wait(ut0mutex.ic:89),spin_and_try_lock(ib0mutex.h:850),enter(ib0mutex.h:850),PolicyMutex(ib0mutex.h:850),lock_sec_rec_modify_check_and_lock(lock0lock.cc:6094),btr_cur_del_mark_set_sec_rec(btr0cur.cc:4967),row_upd_sec_index_entry(row0upd.cc:2264),row_upd_sec_step(row0upd.cc:2332),row_upd(row0upd.cc:2332),row_upd_step(row0upd.cc:3158),row_update_for_mysql_using_upd_graph(row0mysql.cc:2524),row_update_for_mysql(row0mysql.cc:2722),ha_innobase::update_row(ha_innodb.cc:8139),handler::ha_update_row(handler.cc:7964),mysql_update(sql_update.cc:880),Sql_cmd_update::try_single_table_update(sql_update.cc:2884),Sql_cmd_update::execute(sql_update.cc:3011),mysql_execute_command(sql_parse.cc:3616),Prepared_statement::execute(sql_prepare.cc:3935),Prepared_statement::execute_loop(sql_prepare.cc:3543),mysqld_stmt_execute(sql_prepare.cc:2549),dispatch_command(sql_parse.cc:1395),do_command(sql_parse.cc:999),handle_connection(connection_handler_per_thread.cc:300),pfs_spawn_thread(pfs.cc:2188),start_thread(libpthread.so.0),clone(libc.so.6)

      March 3, 2017 at 6:50 am
      • cyberjots Reply

        Sveta, I wonder if Percona Server was launched with numactl --interleave ALL for both cases? We could be getting cache lines thrashed in the bigger box if memory policy is on defaults.

        Also, does the above profiler output belong to the bigger box?

        March 3, 2017 at 11:16 am
  • Sveta Smirnova Reply

    Here is perf top output:

    2,42% libc-2.17.so [.] __memmove_ssse3_back
    2,36% mysqld [.] rec_get_offsets_func
    2,24% mysqld [.] buf_page_get_gen
    1,70% mysqld [.] row_search_mvcc
    1,57% [kernel] [k] call_function_interrupt
    1,35% mysqld [.] ut_delay

    March 3, 2017 at 6:49 am
