huge table + function based comparison optimization possible?

  • huge table + function based comparison optimization possible?

    Hi,
    I have a huge table (2 million records) whose fields need to be compared against fields in another table.

    These fields must be compared using a function (which I coded as a MySQL UDF extension). The comparison function itself is quite fast, but I cannot create an index on the function.

    table1:

    id, staticdata, data


    table2:

    id, staticdata, data

    query:
    SELECT table1.id, table2.id FROM table1 LEFT JOIN table2 ON ( table1.staticdata = table2.staticdata ) WHERE mycompare(table1.data, table2.data) > 90;


    id is the primary key and staticdata is indexed

    1 result takes 0.1s
    2 results take 50s
    100 results take days
    etc


    If I code this outside MySQL, it runs 10000x faster. However, I would like to use the database's functionality, and taking days instead of minutes is a show stopper, of course.

    Or should I switch to PostgreSQL, which can create an index on the "mycompare" function?
    Due to the size of the table, I'm unable to run mycompare separately and store the results: it would take too much space (and the table wouldn't fit in memory anyway).

  • #2
    If the WHERE clause function can't be resolved to a constant, MySQL has to evaluate it for every combination of rows.
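
    A minimal illustration of the difference, using the tables from the original post ("mycompare" is the poster's UDF):

    ```sql
    -- A function of constants can be folded once, before execution:
    SELECT id FROM table1 WHERE UPPER('x') = 'X';

    -- A function of row data cannot, so it is evaluated for every
    -- candidate row combination produced by the join:
    SELECT table1.id, table2.id
    FROM table1 LEFT JOIN table2 ON table1.staticdata = table2.staticdata
    WHERE mycompare(table1.data, table2.data) > 90;
    ```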

    This link may be of use to you:
    http://www.mysqlperformanceblog.com/2007/10/17/mysql-performance-eliminating-order-by-function/



    • #3
      Just noting this for anyone who runs into the same problem:

      - split each WHERE/JOIN condition into its own query
      - insert each result into a new table
      - index those tables
      - run the comparisons between those tables

      This is much faster.
      The compare function itself is still slow, of course, but done this way the query is as fast as one would expect (unlike the single complex query).
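
      The steps above can be sketched in SQL. Table and column names are taken from the original post; the intermediate table name ("pairs") and its column types are made up for illustration:

      ```sql
      -- 1. Split the join condition into its own query and materialize it:
      CREATE TABLE pairs (
          t1_id   INT NOT NULL,
          t2_id   INT NOT NULL,
          t1_data TEXT,
          t2_data TEXT
      );
      INSERT INTO pairs (t1_id, t2_id, t1_data, t2_data)
      SELECT table1.id, table2.id, table1.data, table2.data
      FROM table1
      JOIN table2 ON table1.staticdata = table2.staticdata;

      -- 2. Index the intermediate table:
      CREATE INDEX idx_pairs_ids ON pairs (t1_id, t2_id);

      -- 3. Run the expensive UDF over the materialized pairs only:
      SELECT t1_id, t2_id
      FROM pairs
      WHERE mycompare(t1_data, t2_data) > 90;
      ```

      The UDF now runs once per matching pair from a small indexed table, instead of inside a join whose comparison the optimizer cannot index.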

      other ways to look into:

      - split the function comparison into sets of rows and run them as separate queries; good if you have multiple CPUs, disks, etc. (useless on a single-CPU system)

      - use a MERGE table if the table is very big? It seems to me that MySQL gets extremely slow when your table is > 4 GB, at least on 32-bit systems
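
      One way the row-set split might look (a hedged sketch: the id range per worker is arbitrary, and each connection would run the same query with a different BETWEEN range):

      ```sql
      -- worker 1 of 4, covering the first quarter of table1's ids:
      SELECT table1.id, table2.id
      FROM table1
      JOIN table2 ON table1.staticdata = table2.staticdata
      WHERE table1.id BETWEEN 1 AND 500000
        AND mycompare(table1.data, table2.data) > 90;
      ```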
