GET 24/7 LIVE HELP NOW

Announcement

Announcement Module
Collapse
No announcement yet.

Long Primary Keys in MySQL

Page Title Module
Move Remove Collapse
X
Conversation Detail Module
Collapse
  • Filter
  • Time
  • Show
Clear All
new posts

  • Long Primary Keys in MySQL

    Hi,

    I'm trying to find a solution to a very common problem in databases - enforcing a primary key constraint. Problem is, the column I'm trying to enforce it on is more than 768 bytes in length, which is the maximum size of an index column in InnoDB. Lets say the data there is a URL, longer than 768 bytes.

    This means I can't use the native primary key enforcement that the database supplies, and have to develop something by myself.
    I've tried different combinations of selects & inserts with different locking modes. The only way I could find to perform this is to lock the entire table:

    SELECT url FROM t WHERE url=x;

    then if we don't find a row

    LOCK TABLE t WRITE;SELECT url FROM t WHERE url=x;

    If we don't find a row again, we are certain it's not there, and insert it.

    INSERT INTO t VALUES (x)

    Problem with this solution is the table lock. When the database performs regular locks to enforce primary keys, it lock records in place, and ranges in the index. Here, I have no B-Tree structure to do this, so I have to lock the entire table.

    Can anyone think of any solution that avoids locking the entire table? It's really horrible for performance and concurrency...

    I'd appreciate any input! )
    Shlomo
    http://blog.shlomoid.com

  • #2
    The primary key should be as short as possible - especially when you use InnoDB.

    My Idea: Use a MD5 of the long string. This limits the length to 32 bytes (or even 16 if you store it as binary).

    Comment


    • #3
      Yes, the fact a primary key should be short as possible is clear )

      We decided that MD5/Other hash is not good enough since we might have collisions AFAIK. And rare as they may be, it's still possible and false positives are a big no-no in this scenario.

      MD5s by the way can be used to make searches faster. You index only the MD5 column, and do the select on the URL+MD5. Then the index minimizes the results to a really small number, and duplicates are filtered. Even a simple checksum does the same effect.

      Comment

      Working...
      X