GET 24/7 LIVE HELP NOW

Announcement

Announcement Module
Collapse
No announcement yet.

SELECT FOR UPDATE using secondary index => 50x times slower than forcing full table scan

Page Title Module
Move Remove Collapse
X
Conversation Detail Module
Collapse
  • Filter
  • Time
  • Show
Clear All
new posts

  • SELECT FOR UPDATE using secondary index => 50x times slower than forcing full table scan

    Hi all! (Please be kind with my poor english...)

    THANK YOU for this blog! )

    I am developing on my spare time a little CMS storing data in a hierarchical manner. After googling and testing a while, I end up using the "materialized path" solution despite its unefficient storage requirement.

    Here is the (simplified) tree table:


    create table site_node ( id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT, # --- path VARBINARY(32) NOT NULL, children SMALLINT UNSIGNED NOT NULL, parent MEDIUMINT UNSIGNED NOT NULL, alias VARBINARY(32) NOT NULL, # --- PRIMARY KEY (id), UNIQUE INDEX ak_path (path), UNIQUE INDEX ak_parent (parent, alias), FOREIGN KEY fk_parent (parent) REFERENCES site_node (id) ON UPDATE CASCADE ON DELETE RESTRICT)ENGINE=InnoDB;

    "parent" field is used to select direct children (faster than using path index).

    "children" field contains the number of direct children (0 => leaf node).

    "path" field contains the node path encoded in hexadecimal, each level is 4 bytes long:


    mysql> SELECT id, parent, children, path FROM site_node WHERE path >= "0001" AND path < "0002" ORDER BY path LIMIT 20;+--------+--------+----------+------------------+| id | parent | children | path |+--------+--------+----------+------------------+| 2 | 1 | 521 | 0001 || 102 | 2 | 348 | 00010001 || 5128 | 102 | 133 | 000100010001 || 258090 | 5128 | 0 | 0001000100010001 || 261033 | 5128 | 0 | 0001000100010002 || 263014 | 5128 | 0 | 0001000100010003 || 269425 | 5128 | 0 | 0001000100010004 || 272585 | 5128 | 0 | 0001000100010005 || 283558 | 5128 | 0 | 0001000100010006 || 284133 | 5128 | 0 | 0001000100010007 || 284925 | 5128 | 0 | 0001000100010008 || 293219 | 5128 | 0 | 0001000100010009 || 309769 | 5128 | 0 | 000100010001000A || 311042 | 5128 | 0 | 000100010001000B || 316655 | 5128 | 0 | 000100010001000C || 337223 | 5128 | 0 | 000100010001000D || 339034 | 5128 | 0 | 000100010001000E || 346686 | 5128 | 0 | 000100010001000F || 348222 | 5128 | 0 | 0001000100010010 || 363623 | 5128 | 0 | 0001000100010011 |+--------+--------+----------+------------------+20 rows in set (0.00 sec)

    As expected, retrieving or counting all children from a node is trivial:


    Cold:mysql> SELECT COUNT(*) FROM site_node WHERE path >= "0001" AND path < "0002" ORDER BY path;+----------+| COUNT(*) |+----------+| 601591 |+----------+1 row in set (13.07 sec)Warm:mysql> SELECT COUNT(*) FROM site_node WHERE path >= "0001" AND path < "0002" ORDER BY path;+----------+| COUNT(*) |+----------+| 601591 |+----------+1 row in set (0.55 sec)

    Here is my problem:

    when I need to move or shift a node, I must lock all affected nodes before updating them. For instance, before shifting node id #2 from rank 1 to rank 2 and updating 601591 rows, I lock the whole subtree :

    Note: should be path < "0003" but you get the idea.


    mysql> EXPLAIN SELECT COUNT(*) FROM site_node WHERE path >= "0001" AND path < "0002" ORDER BY path FOR UPDATE;+----+-------------+-----------+-------+---------------+---------+---------+------+--------+--------------------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+----+-------------+-----------+-------+---------------+---------+---------+------+--------+--------------------------+| 1 | SIMPLE | site_node | range | ak_path | ak_path | 32 | NULL | 902702 | Using where; Using index |+----+-------------+-----------+-------+---------------+---------+---------+------+--------+--------------------------+1 row in set (0.00 sec)

    Index range scan should be fast (Handler_read_next going up) but it is not :


    SELECT FOR UPDATE:mysql> SELECT COUNT(*) FROM site_node WHERE path >= "0001" AND path < "0002" ORDER BY path FOR UPDATE;+----------+| COUNT(*) |+----------+| 601591 |+----------+1 row in set (20 min 41.33 sec)Note: using SELECT COUNT(id) does not improve anything.Innodb output:LIST OF TRANSACTIONS FOR EACH SESSION:---TRANSACTION 0 36156404, ACTIVE 1208 sec, process no 20386, OS thread id 1448221616 fetching rows, thread declared inside InnoDB 136mysql tables in use 1, locked 126110 lock struct(s), heap size 1944896MySQL thread id 629, query id 30002552 localhost root Sending dataSELECT COUNT(*) FROM site_node WHERE path >= "0001" AND path < "0002" ORDER BY path FOR UPDATEBUFFER POOL AND MEMORY----------------------Total memory allocated 305426468; in additional pool allocated 1682688Buffer pool size 16384Free buffers 0Database pages 16289Modified db pages 1Pending reads 1Pending writes: LRU 0, flush list 0, single page 0Pages read 1367173, created 54631, written 605910228.55 reads/s, 0.00 creates/s, 0.00 writes/sBuffer pool hit rate 883 / 1000ROW OPERATIONS--------------1 queries inside InnoDB, 0 queries in queueMain thread process no. 20386, id 1439226800, state: waiting for server activityNumber of rows inserted 5000000, updated 10000000, deleted 0, read 725714210.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 598.03 reads/s

    What I do not understand is that using a read lock is much faster:


    SELECT LOCK IN SHARE MODE:mysql> SELECT COUNT(*) FROM site_node WHERE path >= "0001" AND path < "0002" ORDER BY path LOCK IN SHARE MODE;+----------+| COUNT(*) |+----------+| 601591 |+----------+1 row in set (13.57 sec)

    After scratching my head a while, I tried this:


    Full index scan:mysql> EXPLAIN SELECT COUNT(*) FROM site_node USE INDEX(PRIMARY) WHERE id != 0 AND path >= "0001" AND path < "0002" ORDER BY path FOR UPDATE;+----+-------------+-----------+-------+---------------+---------+---------+------+---------+-------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+----+-------------+-----------+-------+---------------+---------+---------+------+---------+-------------+| 1 | SIMPLE | site_node | range | PRIMARY | PRIMARY | 3 | NULL | 2501352 | Using where |+----+-------------+-----------+-------+---------------+---------+---------+------+---------+-------------+1 row in set (0.04 sec)mysql> SELECT COUNT(*) FROM site_node USE INDEX(PRIMARY) WHERE id != 0 AND path >= "0001" AND path < "0002" ORDER BY path FOR UPDATE;+----------+| COUNT(*) |+----------+| 601591 |+----------+1 row in set (23.97 sec)Yep! Much better. |

    And this:


    Full table scan:mysql> EXPLAIN SELECT COUNT(*) FROM site_node USE INDEX(PRIMARY) WHERE path >= "0001" AND path < "0002" ORDER BY path FOR UPDATE;+----+-------------+-----------+------+---------------+------+---------+------+---------+-------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+----+-------------+-----------+------+---------------+------+---------+------+---------+-------------+| 1 | SIMPLE | site_node | ALL | NULL | NULL | NULL | NULL | 5002703 | Using where |+----+-------------+-----------+------+---------------+------+---------+------+---------+-------------+1 row in set (0.00 sec)mysql> SELECT COUNT(*) FROM site_node USE INDEX(PRIMARY) WHERE path >= "0001" AND path < "0002" ORDER BY path FOR UPDATE;+----------+| COUNT(*) |+----------+| 601591 |+----------+1 row in set (20.21 sec)Sigh... Better again...

    The "site_node" table contains 5 millions rows and is arround 1Go (Data 396,288 Mo, Index 525,392 Mo, Total 921,680 Mo).

    MySQL is running on a tiny box:

    - CPU 3Ghz
    - RAM 512 Mo
    - HDD 80Go (IDE, 55 Mo/s)


    skip-networkingtable_cache = 256innodb_file_per_table = 1innodb_buffer_pool_size = 256M (I cannot raise it up...)innodb_additional_mem_pool_size = 8Minnodb_log_file_size = 256Minnodb_log_buffer_size = 8Minnodb_flush_log_at_trx_commit = 2

    Could it be because ak_path index does not fit in buffer pool?

    Note: update speed is quite normal (15 000 - 20 000 rows/s).

    I have also tried to use "path" field as PRIMARY KEY hoping that it would be faster using a clustered index but in this case total index size grows up and update speed is slower because of B-TREE rebalancing.

    Thank you for your help!

    PS: No, I do not plan to use nested sets... ;o) (not yet!)

  • #2
    Interesting.

    Take a look at handler stats when you run this query with LOCK IN SHARE MODE and SELECT FOR UPDATE

    I do not see any difference in how locking is done - it should be about same in both cases.

    Comment


    • #3
      Hi again!

      Watching "mysqladmin extended-status -ri 10" did not help:

      - Handler_read_next goes up when using index (and index scan)
      - Handler_read_rnd_next goes up when forcing table scan

      The only difference is that SELECT ... FOR UPDATE when using ak_path index is very slow while SELECT ... LOCK IN SHARE MODE runs "normaly".

      The buffer pool is small so these three queries are all IO bounded: the HDD led confirms that the hard drive is really having a bad time... ;o)


      Running "iostat -xk 10":SELECT ... LOCK IN SHARE MODEDevice: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %utilhda 0.00 3.50 119.68 0.80 3931.47 12.19 1965.73 6.09 32.73 1.16 9.19 7.67 92.37hda 0.00 2.00 117.12 1.30 3781.38 10.81 1890.69 5.41 32.02 1.22 10.71 7.87 93.17hda 0.00 4.09 120.96 1.50 3877.05 16.57 1938.52 8.28 31.80 1.39 11.39 7.56 92.58hda 0.00 3.60 136.40 1.30 4444.80 15.20 2222.40 7.60 32.39 1.24 8.98 6.70 92.28hda 0.00 3.60 129.90 1.20 4236.80 14.40 2118.40 7.20 32.43 1.19 9.08 6.88 90.20InnoDB output (50 000 reads/s) <= index scan, sounds good1 pending preads, 0 pending pwrites120.93 reads/s, 16858 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s118.93 reads/s, 17029 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s126.62 reads/s, 17694 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s# -----SELECT ... FOR UPDATEDevice: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %utilhda 0.00 3.70 240.54 1.40 8760.76 16.82 4380.38 8.41 36.28 1.75 7.22 4.07 98.47hda 0.00 3.60 199.70 1.20 6818.78 14.39 3409.39 7.19 34.01 1.34 6.66 4.89 98.16hda 0.00 3.60 234.43 1.30 8798.40 14.61 4399.20 7.31 37.39 1.53 6.48 4.18 98.51hda 0.00 3.60 238.36 1.40 9356.24 15.38 4678.12 7.69 39.09 1.58 6.59 4.10 98.31hda 0.00 3.60 237.10 1.20 8820.80 15.20 4410.40 7.60 37.08 1.63 6.68 4.11 98.01InnoDB output (500 reads/s) <= index scan that sucks, but why?1 pending preads, 0 pending pwrites215.17 reads/s, 20876 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s179.11 reads/s, 18670 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s217.67 reads/s, 21741 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s235.24 reads/s, 20179 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s# -----SELECT [INDEX & TABLE SCAN] ... FOR UPDATE => InnoDB output: 250 000 reads/sDevice: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %utilhda 7.69 3.60 185.51 1.70 43442.96 17.18 21721.48 8.59 232.14 0.61 3.23 2.96 55.34hda 6.80 3.60 177.20 1.20 42112.00 14.40 21056.00 7.20 236.13 0.57 3.19 2.87 51.20hda 6.41 3.60 171.57 1.70 39933.53 16.02 19966.77 8.01 230.56 0.54 3.09 2.87 49.78hda 8.30 3.60 190.70 1.40 44837.40 16.00 22418.70 8.00 233.49 0.65 3.38 2.94 56.53hda 6.09 3.60 160.64 1.40 38005.19 15.98 19002.60 7.99 234.64 0.51 3.18 2.84 45.97InnoDB output (500 000 reads/s) <= at least a good news, table scan rocks... ;o)(I forgot to copy and paste the data...)# -----hdparm /dev/hda: multcount = 16 (on) IO_support = 3 (32-bit w/sync) unmaskirq = 0 (off) using_dma = 1 (on) keepsettings = 0 (off) readonly = 0 (off) readahead = 256 (on) geometry = 65535/16/63, sectors = 156301488, start = 0hdparm -Tt /dev/hda: Timing cached reads: 2504 MB in 2.00 seconds = 1249.69 MB/sec Timing buffered disk reads: 158 MB in 3.00 seconds = 52.60 MB/sec Timing cached reads: 2524 MB in 2.00 seconds = 1261.56 MB/sec Timing buffered disk reads: 166 MB in 3.02 seconds = 54.96 MB/sec Timing cached reads: 2516 MB in 2.00 seconds = 1257.56 MB/sec Timing buffered disk reads: 154 MB in 3.02 seconds = 51.02 MB/sec


      The query cache is disabled and I ran each query after restarting MySQL. I will try ASAP using ak_path as PRIMARY KEY or/and with less rows in order to get the whole table in the buffer pool.

      There is no speed difference between PRIMARY KEY Index scan and full table scan. I think this is because rows are physically ordered by id.

      Is there some pages in MySQL documention that explains how InnoDB row locking internally works? Does it need to read the whole row or just the index?

      Did I miss something or is it just "normal"?

      Comment


      • #4
        I asked Heikki about it and he filed the bug report

        http://bugs.mysql.com/bug.php?id=25914

        Basically SELECT FOR UPDATE can't be "using index" because it needs to locks rows as well. Standard select as well as LOCK IN SHARE MODE can.

        Comment


        • #5
          Grmph... (

          Note: if it helps, I use MySQL 4.1 (stable) on Debian (Sarge).

          This is getting a little "technical" for me but I would like to be sure I really understand what is going on.

          The optimizer says it will use an index (PRIMARY, secondary, whatever) when SELECT ... FOR UPDATE locking is requeted but in reality, MySQL do not use it. Right?

          > Suggested fix:
          > The optimizer should be aware that a SELECT ... FOR UPDATE query does not use a 'key read'.

          Sorry for the stupid question but what does this means "in real life"? I mean, what is doing MySQL if it does not use an index?

          Does the ORDER BY clause still works or are rows locked in another/unknown order?

          -----

          If SELECT ... FOR UPDATE is "unusable", when shifting/moving a node, can/should I try this instead:


          1) BEGIN2) SELECT COUNT(*) WHERE ORDER BY path LOCK IN SHARE MODE3) do some PHP stuff (but do it fast...)4) UPDATE SET path = XXX WHERE 5) COMMIT

          Is it "cheap" to upgrade from a READ lock to a WRITE lock? Is there some gotchas to watch for when not using directly a WRITE lock?

          Rows are always locked in the same order (ORDER BY path ASC) but is it the right way to avoid/minimize deadlocks?

          Sorry for all these questions and many thanks for taking the time to help me! )

          Comment


          • #6
            Sorry. It should be "using index"

            It does uses the index but with other selects it only can use index without touching the row data - this is why it is fast.

            With SELECT FOR UPDATE it has to read the row data as well this is what slows it down.

            At this stage full table scan becomes faster because it is sequential IO.

            Comment


            • #7
              (Newbie in progress) still trying to understand: please do not laugh... ;o)

              MySQL: Innodb isolation mode affect on query performance
              Quote:

              Basically Innodb always need to perform row read if it needs to lock the row (even if covered index is used).



              14.2.11.8. Locks Set by Different SQL Statements in InnoDB
              Quote:

              A locking read, an UPDATE, or a DELETE generally set record locks on every index record that is scanned in the processing of the SQL statement. It does not matter if there are WHERE conditions in the statement that would exclude the row. InnoDB does not remember the exact WHERE condition, but only knows which index ranges were scanned. The record locks are normally next-key locks that also block inserts to the “gap” immediately before the record.

              If the locks to be set are exclusive, InnoDB always retrieves also the clustered index record and sets a lock on it.



              So, let me rephrase what happens:


              - SELECT ... LOCK IN SHARE MODE: can use a secondary index AND "stay" within it => secondary index range scan + read lock directly on (secondary) index- SELECT ... FOR UPDATE: can use a secondary index BUT also "jump" to the PRIMARY index containing the row in order to lock it => secondary index range scan + clustered index seek + write lock on row

              Is that right or do I need to RTFM again? )

              Comment


              • #8
                Right. And this means a lot of random IO for primary index.

                Comment


                • #9
                  Great! Now I can die in peace... )

                  Many, many thanks for your help!

                  [Going back "playing" with this awesome "toy"...]

                  Comment

                  Working...
                  X