Deadlocks when running pt-online-schema-change on XtraDB cluster

  • Deadlocks when running pt-online-schema-change on XtraDB cluster

    Hi,

    We have fairly large databases (100GB+) and when running pt-online-schema-change on a table that is 50GB+ I see deadlocks on regular transactions against the table I'm running the migration against.

    For example:
    pt-online-schema-change --user $user --password $pw --recursion-method none --execute --progress time,30 --execute --alter 'MODIFY COLUMN state VARCHAR(10) NOT NULL','DROP INDEX bleh' D=name_production,t=invoices

    The app pointing to the invoices table gets deadlocks but it looks like the underlying error comes from the invoices_new (temp) table:

    140221 14:13:13
    *** (1) TRANSACTION:
    TRANSACTION E943A14E, ACTIVE 0 sec inserting
    mysql tables in use 2, locked 2
    LOCK WAIT 7 lock struct(s), heap size 1248, 5 row lock(s), undo log entries 3
    MySQL thread id 6231227, OS thread handle 0x7eff8cc8c700, query id 158584959 worker.hostname.com ip.add.ress name update
    REPLACE INTO `name_production`.`_invoices_new` schema VALUES (NEW.<snip>
    *** (1) WAITING FOR THIS LOCK TO BE GRANTED:
    RECORD LOCKS space id 685 page no 279991 n bits 120 index `index_invoices_on_site_id_and_invoice_number` of table `database_production`.`_invoices_new` trx id E943A14E lock_mode X waiting
    *** (2) TRANSACTION:
    TRANSACTION E943A151, ACTIVE 0 sec inserting
    mysql tables in use 2, locked 2
    6 lock struct(s), heap size 1248, 4 row lock(s), undo log entries 3
    MySQL thread id 6231228, OS thread handle 0x7eff8e169700, query id 158584964 worker.hostname.com 10.128.2.42 recurly update
    REPLACE INTO `database_production`.`_invoices_new` (schema) VALUES (NEW.<snip>
    *** (2) HOLDS THE LOCK(S):
    RECORD LOCKS space id 685 page no 279991 n bits 120 index `index_invoices_on_site_id_and_invoice_number` of table `database_production`.`_invoices_new` trx id E943A151 lock_mode X locks rec but not gap
    *** (2) WAITING FOR THIS LOCK TO BE GRANTED:
    RECORD LOCKS space id 685 page no 279991 n bits 120 index `index_invoices_on_site_id_and_invoice_number` of table `database_production`.`_invoices_new` trx id E943A151 lock_mode X waiting
    *** WE ROLL BACK TRANSACTION (2)

    Any help would be appreciated! Thanks

  • #2
    Deadlocks between the trigger updates and the table copy can happen. These conflicts are more likely on a busy server, and even more likely when writes go to more than one node. In PXC, all applications must be able to deal with deadlocks in order to handle write conflicts, so such deadlocks should be a nuisance but not a problem.
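
One way an application could cope with such deadlocks is to retry the whole transaction after a short pause. The following is a minimal sketch, not a definitive implementation. It assumes the database driver puts the numeric MySQL error code in the first exception argument (PyMySQL and MySQLdb do this), and the names `run_with_deadlock_retry`, `is_deadlock`, and the `transaction` callable are hypothetical, chosen only for illustration.

```python
import random
import time

# MySQL error code for "Deadlock found when trying to get lock".
ER_LOCK_DEADLOCK = 1213


def is_deadlock(exc):
    """Return True if the exception carries error code 1213 as its first arg.

    Assumption: the driver stores the numeric error code in exc.args[0].
    Adjust this check for drivers that expose the code differently.
    """
    return bool(exc.args) and exc.args[0] == ER_LOCK_DEADLOCK


def run_with_deadlock_retry(transaction, max_attempts=5, base_delay=0.05,
                            sleep=time.sleep):
    """Call transaction() and retry it when it fails with a deadlock.

    `transaction` is a hypothetical callable that runs one complete
    transaction (begin, statements, commit) and rolls back on error.
    Any other error, or a deadlock on the last attempt, is re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except Exception as exc:
            if not is_deadlock(exc) or attempt == max_attempts:
                raise
            # Exponential backoff with jitter so retries are less likely
            # to collide with the same competing transaction again.
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay * random.uniform(0.5, 1.5))
```

The retry wraps the whole transaction rather than a single statement, because the server has already rolled back the losing transaction by the time the error reaches the application.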


    • #3
      I'm running a 3 node cluster, all writes and migrations are done on the first node. The other two nodes are used for read-only traffic and failover.

      This didn't seem to happen with LHM. Is this a problem specific to the percona online schema change tool?


      • #4
        Which version of pt-osc are you using? You may be affected by this bug: https://bugs.launchpad.net/percona-toolkit/+bug/988036
        I would suggest trying the latest version. Also, pt-osc copies rows in chunks, and it may be using a chunk size so large that it cannot get locks on all of those rows. Try a smaller value of --chunk-size so the tool does not select too many rows at once. Details are here: http://www.percona.com/doc/percona-t...ge--chunk-size
        Another option is to run the migration during off-peak hours.
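
To make the smaller chunk size suggestion concrete, here is a rough sketch that assembles the invocation from the first post with an explicit `--chunk-size`. The helper `build_pt_osc_command` is hypothetical, and the value 500 is only an illustrative assumption, not a recommendation (the tool's default is 1000 rows). Credentials are left out and would be passed with `--user` and `--password` as in the original command. The sketch defaults to `--dry-run` so nothing is changed until `execute=True` is passed.

```python
import shlex


def build_pt_osc_command(database, table, alter, chunk_size=500, execute=False):
    """Assemble a pt-online-schema-change argv list with an explicit chunk size.

    chunk_size=500 is an illustrative assumption; pick a value that suits
    the table and the write load.
    """
    return [
        "pt-online-schema-change",
        "--recursion-method", "none",
        "--chunk-size", str(chunk_size),
        "--progress", "time,30",
        "--alter", alter,
        # --dry-run and --execute are mutually exclusive.
        "--execute" if execute else "--dry-run",
        f"D={database},t={table}",
    ]


cmd = build_pt_osc_command(
    "name_production",
    "invoices",
    "MODIFY COLUMN state VARCHAR(10) NOT NULL, DROP INDEX bleh",
)
# Print a shell-quoted version for review before running it by hand.
print(shlex.join(cmd))
```

Building the command as a list keeps the `--alter` clause as a single argument, which avoids shell quoting mistakes when the list is later handed to something like `subprocess.run`.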


        Hope it helps.
