GET 24/7 LIVE HELP NOW

Announcement

Announcement Module
Collapse
No announcement yet.

simple update sometimes stuck for about 180s when stress test

Page Title Module
Move Remove Collapse
X
Conversation Detail Module
Collapse
  • Filter
  • Time
  • Show
Clear All
new posts

  • simple update sometimes stuck for about 180s when stress test

    Hi guys, I have a serious issue and need some help. Any advice is appreciated.

    CentOS release 6.4 (Final) Linux version 2.6.32-358.18.1.el6.centos.plus.x86_64
    Percona version: 5.5.33-rel31.1
    Server: 32 cores 64GB RAM


    Currently I'm doing stress test for 8 percona servers, while there's 4 proxy servers transmit the request to mysql (just like mysql-proxy)
    There are four mm group for 8 mysql. Each two is a mm.


    What the test do is just imitating sending messages and update the last receive time in mysql. So for mysql there's only one update sql
    like this:
    update private_message_status set last_read_message_cursor=1384915973595 where user_id=1352394 and from_user_id=1352394;

    Here's the schema
    CREATE TABLE `private_message_status` (
    `user_id` bigint(20) NOT NULL,
    `from_user_id` bigint(20) NOT NULL,
    `last_read_message_cursor` bigint(20) DEFAULT '0',
    `enable_notification` tinyint(1) DEFAULT '1',
    PRIMARY KEY (`user_id`,`from_user_id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8

    And for each server, there's about 460 updates /s

    There's occasionally slow sql which takes time more than 0.1s, sometimes a few hours, but sometimes a few hours, there's some slow sql.
    But sometimes there's a big problem that it can take 2-10s, or 20-30s, occasionally around 180s. Every server has this problem.
    .
    .
    .
    update private_message_status set last_read_message_cursor=1385047291697 where user_id=1344051 and from_user_id=1344051;
    # User@Host: cmmw[cmmw] @ [172.21.27.81]
    # Thread_id: 159395 Schema: shanliao5 Last_errno: 0 Killed: 0
    # Query_time: 180.108397 Lock_time: 0.000076 Rows_sent: 0 Rows_examined: 1 Rows_affected: 1 Rows_read: 1
    # Bytes_sent: 52
    SET timestamp=1385047472;
    update private_message_status set last_read_message_cursor=1385047292266 where user_id=1350252 and from_user_id=1350252;
    # User@Host: cmmw[cmmw] @ [172.21.27.81]
    # Thread_id: 159459 Schema: shanliao5 Last_errno: 0 Killed: 0
    # Query_time: 179.371043 Lock_time: 0.025446 Rows_sent: 0 Rows_examined: 1 Rows_affected: 1 Rows_read: 1
    # Bytes_sent: 52
    SET timestamp=1385047472;
    update private_message_status set last_read_message_cursor=1385047293027 where user_id=1351216 and from_user_id=1351216;
    # User@Host: cmmw[cmmw] @ [172.21.27.81]
    # Thread_id: 159258 Schema: shanliao5 Last_errno: 0 Killed: 0
    # Query_time: 180.901386 Lock_time: 0.000094 Rows_sent: 0 Rows_examined: 1 Rows_affected: 1 Rows_read: 1
    # Bytes_sent: 52
    SET timestamp=1385047472;
    update private_message_status set last_read_message_cursor=1385047291488 where user_id=1338937 and from_user_id=1338937;
    # User@Host: cmmw[cmmw] @ [172.21.28.131]
    # Thread_id: 159415 Schema: shanliao5 Last_errno: 0 Killed: 0
    # Query_time: 179.896060 Lock_time: 0.000087 Rows_sent: 0 Rows_examined: 1 Rows_affected: 1 Rows_read: 1
    # Bytes_sent: 52
    SET timestamp=1385047472;
    update private_message_status set last_read_message_cursor=1385047292453 where user_id=1351169 and from_user_id=1351169;
    # User@Host: cmmw[cmmw] @ [172.21.27.81]
    # Thread_id: 159291 Schema: shanliao5 Last_errno: 0 Killed: 0
    # Query_time: 180.739492 Lock_time: 0.000048 Rows_sent: 0 Rows_examined: 1 Rows_affected: 1 Rows_read: 1
    # Bytes_sent: 52
    SET timestamp=1385047472;
    update private_message_status set last_read_message_cursor=1385047291823 where user_id=1346144 and from_user_id=1346144;
    # User@Host: cmmw[cmmw] @ [172.21.28.131]
    # Thread_id: 159331 Schema: shanliao5 Last_errno: 0 Killed: 0
    # Query_time: 180.486669 Lock_time: 0.000048 Rows_sent: 0 Rows_examined: 1 Rows_affected: 1 Rows_read: 1
    # Bytes_sent: 52
    SET timestamp=1385047472;
    update private_message_status set last_read_message_cursor=1385047291902 where user_id=1336834 and from_user_id=1336834;
    .
    .
    .



    Then I did a crontab to collect the status, for every 5 seconds(I know in nomal situation, it should have a inteveral like 30s, but here 5 can see the problem clearly)
    when the problem comes, the 'show processlist' and 'show engine innodb status'
    shows abnormal status

    There's about 600 threads in show processlist, most of which hang in query end status;


    Query 4 query end update private_message_status set last_read_message_cursor=1384915973062 where user_id=1347239 and from_user_id=1347239 0 1 1
    Query 4 query end update private_message_status set last_read_message_cursor=1384915973035 where user_id=1340131 and from_user_id=1340131 0 1 1
    Query 4 query end update private_message_status set last_read_message_cursor=1384915973064 where user_id=1339030 and from_user_id=1339030 0 1 1
    Query 4 query end update private_message_status set last_read_message_cursor=1384915973009 where user_id=1346218 and from_user_id=1346218 0 1 1
    Query 4 query end update private_message_status set last_read_message_cursor=1384915973032 where user_id=1347227 and from_user_id=1347227 0 1 1
    .
    .
    .
    Query 1 Updating update private_message_status set last_read_message_cursor=1384915976882 where user_id=1336038 and from_user_id=1336038 0 0 0
    Query 1 Updating update private_message_status set last_read_message_cursor=1384915976865 where user_id=1350313 and from_user_id=1350313 0 0 0
    Query 1 query end update private_message_status set last_read_message_cursor=1384915977011 where user_id=1339042 and from_user_id=1339042 0 1 1
    Query 0 Updating update private_message_status set last_read_message_cursor=1384915977123 where user_id=1340127 and from_user_id=1340127 0 0 0
    Query 0 query end update private_message_status set last_read_message_cursor=1384915977147 where user_id=1349375 and from_user_id=1349375 0 1 1

    Threads: 583 Questions: 45797267 Slow queries: 46 Opens: 701 Flush tables: 1 Open tables: 579 Queries per second avg: 745.034

    The OS status
    top - 10:52:57 up 43 days, 27 min, 1 user, load average: 0.40, 0.13, 0.03
    Tasks: 586 total, 1 running, 585 sleeping, 0 stopped, 0 zombie
    Cpu(s): 0.2%us, 0.1%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
    Mem: 65937652k total, 28454740k used, 37482912k free, 225704k buffers
    Swap: 8388600k total, 0k used, 8388600k free, 14169812k cached


    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    467 root 39 19 0 0 0 S 1.9 0.0 666:30.75 kipmi0
    2026 mysql 20 0 57.3g 11g 5796 S 1.9 18.7 138:50.91 mysqld
    1 root 20 0 19228 1504 1216 S 0.0 0.0 0:02.51 init
    2 root 20 0 0 0 0 S 0.0 0.0 0:00.03 kthreadd

    while in normal status
    top - 10:52:52 up 43 days, 27 min, 1 user, load average: 0.09, 0.06, 0.01
    Tasks: 586 total, 1 running, 585 sleeping, 0 stopped, 0 zombie
    Cpu(s): 0.2%us, 0.1%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
    Mem: 65937652k total, 28394284k used, 37543368k free, 225700k buffers
    Swap: 8388600k total, 0k used, 8388600k free, 14177696k cached

    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    2026 mysql 20 0 44.3g 11g 5796 S 11.7 18.7 138:50.26 mysqld
    12552 root 20 0 15424 1504 828 R 3.9 0.0 0:00.02 top
    467 root 39 19 0 0 0 S 1.9 0.0 666:30.68 kipmi0
    1 root 20 0 19228 1504 1216 S 0.0 0.0 0:02.51 init
    2 root 20 0 0 0 0 S 0.0 0.0 0:00.03 kthreadd
    Last edited by johnsonma; 11-28-2013, 12:53 AM.

  • #2

    And the 'show engine innodb status'
    BACKGROUND THREAD
    -----------------
    srv_master_thread loops: 61339 1_second, 61339 sleeps, 6133 10_second, 4 background, 4 flush
    srv_master_thread log flush and writes: 61537
    ----------
    SEMAPHORES
    ----------
    OS WAIT ARRAY INFO: reservation count 22619, signal count 66413
    Mutex spin waits 3527105, rounds 19531558, OS waits 11825
    RW-shared spins 29266, rounds 272872, OS waits 6578
    RW-excl spins 73082, rounds 383646, OS waits 2684
    Spin rounds per wait: 5.54 mutex, 9.32 RW-shared, 5.25 RW-excl
    --------
    FILE I/O
    --------
    I/O thread 0 state: waiting for completed aio requests (insert buffer thread)
    I/O thread 1 state: waiting for completed aio requests (log thread)
    I/O thread 2 state: waiting for completed aio requests (read thread)
    I/O thread 3 state: waiting for completed aio requests (read thread)
    I/O thread 4 state: waiting for completed aio requests (read thread)
    I/O thread 5 state: waiting for completed aio requests (read thread)
    I/O thread 6 state: waiting for completed aio requests (write thread)
    I/O thread 7 state: waiting for completed aio requests (write thread)
    I/O thread 8 state: waiting for completed aio requests (write thread)
    I/O thread 9 state: waiting for completed aio requests (write thread)
    Pending normal aio reads: 0 [0, 0, 0, 0] , aio writes: 0 [0, 0, 0, 0] ,
    ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
    Pending flushes (fsync) log: 1; buffer pool: 0
    1180 OS file reads, 43169133 OS file writes, 88320 OS fsyncs
    0.00 reads/s, 0 avg bytes/read, 133.37 writes/s, 0.00 fsyncs/s
    -------------------------------------
    INSERT BUFFER AND ADAPTIVE HASH INDEX
    -------------------------------------
    Ibuf: size 1, free list len 0, seg size 2, 4 merges
    merged operations:
    insert 3, delete mark 4, delete 3
    discarded operations:
    insert 0, delete mark 0, delete 0
    Hash table size 80060237, node heap has 10 buffer(s)
    205.76 hash searches/s, 1.20 non-hash searches/s
    ---
    LOG
    ---
    Log sequence number 292140682848
    Log flushed up to 292140021164
    Last checkpoint at 292101169700
    Max checkpoint age 166798910
    Checkpoint age target 161586445
    Modified age 39513148
    Checkpoint age 39513148
    1 pending log writes, 0 pending chkp writes
    42850100 log i/o's done, 133.37 log i/o's/second
    ----------------------
    BUFFER POOL AND MEMORY
    ----------------------
    Total memory allocated 41540485120; in additional pool allocated 0
    Total memory allocated by read views 4216
    Internal hash tables (constant factor + variable factor)
    Adaptive hash index 640649952 (640481896 + 168056)
    Page hash 40031048 (buffer pool 0 only)
    Dictionary cache 160578044 (160122448 + 455596)
    File system 115504 (82672 + 32832)
    Lock system 100298272 (100076056 + 222216)
    Recovery system 0 (0 + 0)
    Dictionary memory allocated 455596
    Buffer pool size 2469119
    Buffer pool size, bytes 40454045696
    Free buffers 2467551
    Database pages 1558
    Old database pages 555
    Modified db pages 720
    Pending reads 0
    Pending writes: LRU 0, flush list 0, single page 0
    Pages made young 1, not young 0
    0.00 youngs/s, 0.00 non-youngs/s
    Pages read 1115, created 443, written 312879
    0.00 reads/s, 43.79 creates/s, 0.00 writes/s
    Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
    Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
    LRU len: 1558, unzip_LRU len: 0
    I/O sum[0]:cur[0], unzip sum[0]:cur[0]
    --------------
    ROW OPERATIONS
    --------------
    0 queries inside InnoDB, 0 queries in queue
    1 read views open inside InnoDB
    587 transactions active inside InnoDB
    587 out of 1000 descriptors used
    ---OLDEST VIEW---
    Normal read view
    Read view low limit trx n 487B8735
    Read view up limit trx id 487B86C1
    Read view low limit trx id 487B8735
    Read view individually stored trx ids:
    Read view trx id 487B8734
    Read view trx id 487B8733
    Read view trx id 487B8732
    Read view trx id 487B8731
    Read view trx id 487B8730
    Read view trx id 487B872F
    Read view trx id 487B8727
    .
    .
    .
    .
    .
    .
    Read view trx id 487B86C3
    Read view trx id 487B86C2
    Read view trx id 487B86C1
    -----------------
    Main thread process no. 2026, id 139971502806784, state: flushing log
    Number of rows inserted 42, updated 29529289, deleted 6, read 29550470
    0.00 inserts/s, 193.96 updates/s, 0.00 deletes/s, 193.96 reads/s

    TRANSACTIONS
    ------------
    Trx id counter 487B890D
    Purge done for trx's n < 487B86C1 undo n < 0
    History list length 2409
    LIST OF TRANSACTIONS FOR EACH SESSION:
    ---TRANSACTION 0, not started
    MySQL thread id 48637, OS thread handle 0x7f499ba69700, query id 45797309 localhost root
    SHOW ENGINE INNODB STATUS
    ---TRANSACTION 487B7D27, not started
    MySQL thread id 5459, OS thread handle 0x7f4da632f700, query id 45792517 172.21.28.131 cmmw
    ---TRANSACTION 487B77B9, not started
    MySQL thread id 5415, OS thread handle 0x7f57a4227700, query id 45791443 172.21.27.81 cmmw
    ---TRANSACTION 487B86B7, not started
    MySQL thread id 2, OS thread handle 0x7f4da66bd700, query id 45794398 Slave has read all relay log; waiting for the slave I/O thread to update it
    ---TRANSACTION 487B890C, ACTIVE (PREPARED) 0 sec preparing
    mysql tables in use 1, locked 1
    2 lock struct(s), heap size 376, 1 row lock(s), undo log entries 1
    MySQL thread id 48636, OS thread handle 0x7f499baaa700, query id 45797307 172.21.28.130 cmmw query end
    update private_message_status set last_read_message_cursor=1384915977654 where user_id=1346271 and from_user_id=1346271
    TABLE LOCK table `shanliao6`.`private_message_status` trx id 487B890C lock mode IX
    RECORD LOCKS space id 280 page no 22 n bits 304 index `PRIMARY` of table `shanliao6`.`private_message_status` trx id 487B890C lock_mode X locks rec but not gap
    ---TRANSACTION 487B890B, ACTIVE 0 sec starting index read
    mysql tables in use 1, locked 1
    LOCK WAIT 2 lock struct(s), heap size 376, 1 row lock(s)
    MySQL thread id 48635, OS thread handle 0x7f499baeb700, query id 45797302 172.21.28.130 cmmw Updating
    update private_message_status set last_read_message_cursor=1384915977649 where user_id=1356481 and from_user_id=1356481
    ------- TRX HAS BEEN WAITING 0 SEC FOR THIS LOCK TO BE GRANTED:
    RECORD LOCKS space id 280 page no 19 n bits 360 index `PRIMARY` of table `shanliao6`.`private_message_status` trx id 487B890B lock_mode X locks rec but not gap waiting
    ------------------
    TABLE LOCK table `shanliao6`.`private_message_status` trx id 487B890B lock mode IX
    RECORD LOCKS space id 280 page no 19 n bits 360 index `PRIMARY` of table `shanliao6`.`private_message_status` trx id 487B890B lock_mode X locks rec but not gap waiting
    ---TRANSACTION 487B890A, ACTIVE (PREPARED) 0 sec preparing
    mysql tables in use 1, locked 1
    2 lock struct(s), heap size 376, 1 row lock(s), undo log entries 1
    MySQL thread id 48634, OS thread handle 0x7f499bb2c700, query id 45797297 172.21.28.130 cmmw query end
    update private_message_status set last_read_message_cursor=1384915977638 where user_id=1354394 and from_user_id=1354394
    TABLE LOCK table `shanliao6`.`private_message_status` trx id 487B890A lock mode IX
    RECORD LOCKS space id 280 page no 14 n bits 376 index `PRIMARY` of table `shanliao6`.`private_message_status` trx id 487B890A lock_mode X locks rec but not gap
    ---TRANSACTION 487B8909, ACTIVE (PREPARED) 0 sec preparing
    mysql tables in use 1, locked 1
    2 lock struct(s), heap size 376, 1 row lock(s), undo log entries 1
    MySQL thread id 48633, OS thread handle 0x7f499bb6d700, query id 45797292 172.21.28.131 cmmw query end
    update private_message_status set last_read_message_cursor=1384915977449 where user_id=1339051 and from_user_id=1339051
    TABLE LOCK table `shanliao6`.`private_message_status` trx id 487B8909 lock mode IX
    RECORD LOCKS space id 280 page no 25 n bits 344 index `PRIMARY` of table `shanliao6`.`private_message_status` trx id 487B8909 lock_mode X locks rec but not gap
    ---TRANSACTION 487B8908, ACTIVE (PREPARED) 0 sec preparing
    mysql tables in use 1, locked 1
    2 lock struct(s), heap size 376, 1 row lock(s), undo log entries 1
    MySQL thread id 48632, OS thread handle 0x7f499bbae700, query id 45797287 172.21.28.131 cmmw query end
    update private_message_status set last_read_message_cursor=1384915977325 where user_id=1352440 and from_user_id=1352440
    TABLE LOCK table `shanliao6`.`private_message_status` trx id 487B8908 lock mode IX
    RECORD LOCKS space id 280 page no 14 n bits 376 index `PRIMARY` of table `shanliao6`.`private_message_status` trx id 487B8908 lock_mode X locks rec but not gap

    .
    .
    .
    .
    .
    .
    ---TRANSACTION 487B86C2, ACTIVE (PREPARED) 4 sec preparing
    mysql tables in use 1, locked 1
    2 lock struct(s), heap size 376, 1 row lock(s), undo log entries 1
    MySQL thread id 45434, OS thread handle 0x7f4da61ea700, query id 45794403 172.21.28.130 cmmw query end
    update private_message_status set last_read_message_cursor=1384915973009 where user_id=1346218 and from_user_id=1346218
    TABLE LOCK table `shanliao6`.`private_message_status` trx id 487B86C2 lock mode IX
    RECORD LOCKS space id 280 page no 22 n bits 304 index `PRIMARY` of table `shanliao6`.`private_message_status` trx id 487B86C2 lock_mode X locks rec but not gap
    ---TRANSACTION 487B86C1, ACTIVE (PREPARED) 4 sec preparing
    mysql tables in use 1, locked 1
    2 lock struct(s), heap size 376, 1 row lock(s), undo log entries 1
    MySQL thread id 45748, OS thread handle 0x7f57a4268700, query id 45794402 172.21.27.81 cmmw query end
    update private_message_status set last_read_message_cursor=1384915973032 where user_id=1347227 and from_user_id=1347227
    TABLE LOCK table `shanliao6`.`private_message_status` trx id 487B86C1 lock mode IX
    RECORD LOCKS space id 280 page no 22 n bits 304 index `PRIMARY` of table `shanliao6`.`private_message_status` trx id 487B86C1 lock_mode X locks rec but not gap
    ----------------------------
    END OF INNODB MONITOR OUTPUT
    ============================

    Comment


    • #3
      mysql my.cnf
      [mysqld]
      port = 3306
      socket = /tmp/mysql.sock
      skip-external-locking
      back_log = 2000

      key_buffer = 256M
      max_allowed_packet = 2M
      table_cache = 512
      join_buffer_size = 64M
      sort_buffer_size = 32M
      read_buffer_size = 32M
      read_rnd_buffer_size = 64M
      myisam_sort_buffer_size = 64M
      query_cache_size = 64M
      query_cache_limit = 8M
      max_tmp_tables = 128
      tmp_table_size = 256M
      thread_cache_size = 256
      thread_concurrency = 4
      wait_timeout = 3600
      interactive_timeout = 3600
      myisam_repair_threads = 1
      myisam-recover = DEFAULT

      slow_query_log = 1
      long_query_time = 0.1

      low_priority_updates
      delay_key_write = ALL

      max_connections = 1024
      max_connect_errors = 99999999

      skip-name-resolve
      relay-log=relay-bin

      default_storage_engine = InnoDB

      innodb_file_per_table
      innodb_file_format = "Barracuda"
      innodb_data_home_dir = /data/mysqldata/
      innodb_data_file_path = ibdata1:100M:autoextend
      innodb_log_group_home_dir = /data/mysqldata/
      innodb_buffer_pool_size = 38580M
      innodb_additional_mem_pool_size = 100M
      innodb_log_file_size = 1024M
      innodb_log_buffer_size = 128M
      innodb_flush_log_at_trx_commit = 2
      innodb_lock_wait_timeout = 50
      innodb_thread_concurrency = 40
      innodb_flush_method = O_DIRECT
      #innodb_force_recovery = 6
      #transaction-isolation = READ-COMMITTED
      innodb_io_capacity = 500
      auto_increment_offset=2
      auto_increment_increment=2


      [mysqldump]
      quick
      max_allowed_packet = 16M

      [mysql]
      no-auto-rehash

      [isamchk]
      key_buffer = 256M
      sort_buffer_size = 256M
      read_buffer = 2M
      write_buffer = 2M

      [myisamchk]
      key_buffer = 256M
      sort_buffer_size = 256M
      read_buffer = 8M
      write_buffer = 8M

      I know there's one setting may have some problem
      innodb_log_file_size = 100M
      According to this http://www.mysqlperformanceblog.com/...log-file-size/
      I cauculated that our server should be innodb_log_file_size = 1024M,
      I already choose one server to apply this setting and wait to see what will happen
      Ps: This doesn't fix the problem at all, the server still has this problem.


      This issue has scratched my head for a few weeks, hope someone can give me some idea, thks.
      Last edited by johnsonma; 11-27-2013, 02:24 AM.

      Comment


      • #4
        I forgot an important thing, table 'private_message_status' only has around 2500 records for every server.
        Last edited by johnsonma; 11-22-2013, 03:04 AM.

        Comment

        Working...
        X