Announcement

Announcement Module
Collapse
No announcement yet.

performanceproblem string compare in huge tables

Page Title Module
Move Remove Collapse
X
Conversation Detail Module
Collapse
  • Filter
  • Time
  • Show
Clear All
new posts

  • performanceproblem string compare in huge tables

    hi!

    i have the following problem:

    i got 2 tables with much data (table1: 170K records, table2: 7K records), both tables have email fields i want to compare with each other.

    for better results i added to both tables a field "email_compare" and put an index on it.

    now i want to delete (or mark) all records from table1 that are in table2 (compare by email).

    everything i tried so far ended up in waiting for HOURS until i killed the process.

    i tried:

    UPDATE table1 t1
    LEFT JOIN table2 t2 ON t1.email_compare=t2.email_compare
    SET t1.delete_marker=1
    WHERE t2.email_compare IS NOT NULL

    or

    UPDATE table1
    SET delete_marker=1
    WHERE email_compare IN (SELECT email_compare FROM table2)

    or

    UPDATE table1 SET delete_marker=1 WHERE (SELECT 1 FROM table2


    if its really impossible to speed the above ones up,
    i think it would be better to begin with table2. but how can i execute an update to table1 for each entry in table2?

    or any other suggestions?


    thank you in advance )

  • #2
    Your tables really aren't that big, so there's probably something else going wrong. Try this:


    UPDATE table1 t1, table2 t2SET t1.delete_marker=1WHERE t1.email_compare = t2.email_compare


    If that is still slow, can you provide the output of this:


    EXPLAIN SELECT t1.email_compare, t2.email_compareFROM table1 AS t1, table2 AS t2WHERE t1.email_compare = t2.email_compare


    As well as


    SHOW INDEXES FROM table1;SHOW INDEXES FROM table2;

    Comment


    • #3
      i am trying your first suggestion at the moment, i'm waiting for the result...

      as well i'm waiting for the result of the explain at the moment...

      show indexes result is for both tables primary key (id) and email_compare

      Comment


      • #4
        oh, in the processlist i realized that explain is locked while updating. i killed the process again and the result of your explain command is:

        1 SIMPLE t1 index email_compare email_compare 82 178836 Using index

        1 SIMPLE t2 index email_compare email_compare 257 24612 Using where; Using index

        Comment


        • #5
          Ok, so that EXPLAIN result is very bad. It's basically saying for every row in t1, it will try to match against every row in t2. This is why things are taking so long. If you check SHOW CREATE TABLE for t1 and t2, are the definitions of email_compare exactly the same? If you have something like:

          CREATE TABLE t1 (
          email_compare VARCHAR(255)
          );

          CREATE TABLE t2 (
          email_compare VARCHAR(64)
          );

          You won't get good join performance. I'm guessing that's the case because you have a different key length for each table in the output of the EXPLAIN.

          Comment


          • #6
            you are right, the key lengths are different. will it be much better if the varchar length is equal on both tables?

            yesterday i solved the problem by writing a php script, but i will definitely try this with equal key length. i will report if the result is better.

            thank you for your help )

            Comment

            Working...
            X