Why Unique Indexes are Bad

Before creating a unique index in TokuMX or TokuDB, ask yourself, “does my application really depend on the database enforcing uniqueness of this key?” If the answer is ANYTHING other than yes, do not declare the index to be unique. Why? Because unique indexes may kill your write performance. In this post, I’ll explain why.

Unique indexes are strange beasts: they have essentially no impact on standard databases that use B-Trees, such as MongoDB and MySQL, but they may be horribly painful for databases that use write-optimized data structures, like TokuMX’s Fractal Tree(R) indexes. How? They essentially drag the Fractal Tree index down to the B-Tree’s level of performance.

When a user declares a unique index, the user tells the database, “please enforce uniqueness on this index for me.” So, before doing any insertion into a unique index, the database must first verify that the key being inserted does not already exist. If the possible location of the key is not in memory, which may happen if the working set does not fit in memory, then the database MUST perform an I/O to bring the contents of that potential location into memory (be it a leaf node in a tree, or an offset into a memory-mapped file), in order to check whether the key exists there.
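To make that cost concrete, here is a toy model in Python. This is not TokuMX or TokuDB code; the class, the page-hashing scheme, and the cache are all illustrative assumptions. The point it sketches is the one above: a uniqueness check must read the page where the key would live, and if that page is not cached, the read costs a disk I/O.

```python
# Toy model (illustrative only, not TokuMX/TokuDB internals):
# a unique index must read a key's page before every insert.

class UniqueIndex:
    """Keys live on 'pages'; reading an uncached page costs one I/O."""

    def __init__(self):
        self.pages = {}     # page_id -> set of keys ("on disk")
        self.cache = set()  # page_ids currently in memory
        self.io_count = 0   # disk reads forced by uniqueness checks

    def _page_of(self, key):
        # Which page the key would live on (hypothetical layout).
        return hash(key) % 1024

    def insert(self, key):
        page = self._page_of(key)
        # Uniqueness check: the page that would hold the key MUST be read.
        if page not in self.cache:
            self.io_count += 1      # cache miss -> one disk I/O
            self.cache.add(page)
        if key in self.pages.setdefault(page, set()):
            raise ValueError("duplicate key")
        self.pages[page].add(key)

idx = UniqueIndex()
for k in range(10):
    idx.insert(k)
print(idx.io_count)  # one I/O per cold page touched by a check
```

With a working set larger than the cache, nearly every insert into a randomly distributed key space pays one of these reads.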

I/Os are expensive. A hard disk can perform only so many I/Os per second, so we should be wary of spending one when we don’t need to.

This is where unique indexes affect B-Trees and write-optimized data structures differently. For a B-Tree, the extra I/O does not matter, because the subsequent insertion would have performed that I/O anyway to insert the key. But a Fractal Tree index does not require an I/O on insertion. For Fractal Tree indexes, this I/O is a big cost!
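The contrast can be sketched as follows. Again, this is an illustrative model, not Fractal Tree internals: a write-optimized index can buffer a non-unique (“blind”) insert in memory and apply it lazily, needing no read at insert time, while the unique variant must check for a duplicate first.

```python
# Illustrative sketch (not Fractal Tree internals): blind inserts need
# no read I/O; a unique index forces a read before every insert.

class FractalishIndex:
    def __init__(self, unique=False):
        self.unique = unique
        self.buffer = []    # in-memory message buffer, flushed lazily
        self.leaves = {}    # key -> value, "on disk"
        self.read_ios = 0   # reads forced at insert time

    def insert(self, key, value):
        if self.unique:
            self.read_ios += 1  # must fetch the leaf to check for a dup
            if key in self.leaves or any(k == key for k, _ in self.buffer):
                raise ValueError("duplicate key")
        self.buffer.append((key, value))  # blind insert: just buffer it

non_unique = FractalishIndex()
unique = FractalishIndex(unique=True)
for k in range(1000):
    non_unique.insert(k, None)
    unique.insert(k, None)
print(non_unique.read_ios, unique.read_ios)  # 0 vs 1000
```

The non-unique index absorbed 1000 inserts with zero read I/Os; declaring it unique turned every insert into a potential disk read, which is exactly the B-Tree behavior.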

So essentially, unique keys drag the performance of Fractal Tree indexes down to a B-Tree’s level, eliminating one of the biggest innovations of TokuMX and TokuDB for MySQL.

So, the moral of the story is this: don’t create a unique index unless you REALLY REALLY have to. Your performance may suffer greatly. Don’t create unique indexes because you happen to know an index will be unique. Only do so if your application depends on the database enforcing uniqueness. And if it really does, you may want to try to find a way to change your application.
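For reference, the declaration difference is a single keyword. The example below uses SQLite purely because it is self-contained; the DDL is similar in MySQL/TokuDB, and the table and index names are made up for illustration.

```python
import sqlite3

# Illustration with SQLite (similar DDL in MySQL/TokuDB): only declare
# UNIQUE when the application depends on the database enforcing it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (k INTEGER, v TEXT)")
conn.execute("CREATE INDEX idx_k ON t (k)")          # non-unique: no check on write
conn.execute("CREATE UNIQUE INDEX uidx_v ON t (v)")  # unique: checked on every write

conn.execute("INSERT INTO t VALUES (1, 'a')")
conn.execute("INSERT INTO t VALUES (1, 'b')")        # duplicate k is fine
try:
    conn.execute("INSERT INTO t VALUES (2, 'a')")    # duplicate v is rejected
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```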

Comments (12)

  • Mark Callaghan

    They have an impact on InnoDB performance – the insert buffer does not support them.

    July 15, 2013 at 4:32 pm
  • Shlomi Noach

    Unique keys are actually quite encouraged. They have positive impact on SELECT performance because MySQL has much more insight into the possible query execution plans.

    I suspect your recommendation of “don’t use UNIQUE keys unless you REALLY REALLY have to, or else TokuDB is dragged down into B-Tree performance” is off-putting. It’s like the fine print one never reads until something goes wrong.

    As disclosure, I’m now evaluating TokuDB as a replacement for InnoDB on a fast-growing DWH, and so far the results are good, making for high expectations.
    Data compression is exceptionally good in our DWH, and I suspect much of it will fit in memory once done. This may ease the problem you mentioned of having to perform I/O to bring the relevant page into memory.

    July 15, 2013 at 5:03 pm
    • Zardosht Kasheff

      Shlomi, thank you for the feedback. I have a question on the impact of SELECT performance. Don’t query execution plans get most of this insight by analyzing the key distribution of the table, as done with “analyze table”? I am not an expert in the query optimizer, but I hope that declaring the index as non-unique and relying on the key distribution statistics as well as range estimates will get many queries similar benefits.

      July 15, 2013 at 5:29 pm
      • Baron

        In MySQL, cardinality indexes are not as effective as you might hope. In some cases MySQL will do “get next, get next…” until it hits something analogous to EOF. If it knows the index is unique, then in some cases it stops after one row, and doesn’t do the “get next” that would have (if the index were unique but not declared as such) returned EOF. I’ve benchmarked this actually mattering quite a bit, although I forget where – maybe in High Performance MySQL.

        July 16, 2013 at 12:15 am
        • Baron

          s/cardinality indexes/cardinality estimates/

          July 16, 2013 at 12:16 am
        • Zardosht Kasheff

          Baron, were these benchmarks done on I/O bound workloads? I imagine if getting the first row requires an I/O, then the subsequent “get next” is quite cheap. I can see this making a big difference on in-memory queries.

          I now see I should have done a better job in discussing the workloads I was thinking of: data > RAM. If data < RAM, then the uniqueness check is an in-memory check and is not as expensive as cases where the uniqueness check induces an I/O.

          July 16, 2013 at 1:51 am
  • Benjamin Darfler

    How does this intersect with the _id field in MongoDB, since it is unique by default?

    July 16, 2013 at 12:33 am
    • Zardosht Kasheff

      Similar to an auto-increment in MySQL, auto-generated _id fields in MongoDB are sequential. So while a uniqueness check is still required, the check will be done on the rightmost path of the _id index, making it an in-memory check. If the _id field is user-generated and random, and data > RAM, then the uniqueness check will induce I/O and severely hurt performance.

      July 16, 2013 at 1:55 am
  • NPSF3000

    Is this post badly worded?

    As far as I can see, the actual problem is using a unique index constraint on a random (or distributed) field in a high-insert workload.

    If so, there are plenty of cases where using a unique index has minimal impact (e.g. any use case where reads or updates are more common), and it is far easier to use the built-in functionality than to try to rework your data.

    July 26, 2013 at 4:14 am
    • Zardosht Kasheff

      As I said in the post: if your application depends on the database enforcing uniqueness, then use it. That dependence can be a result of ease of use.

      July 26, 2013 at 6:07 pm
  • Jehad Keriaki

    I think it is not a matter of “have to” or “really really have to”. It is either required or not required. If uniqueness is required, then it should be enforced at the DB level.
    Handling this at the application level is risky, especially in a high-traffic environment, or in a more complex architecture where replication is set up.
    Additionally, checking this at the application level involves reading (selecting) from the table using a regular key, which I assume MySQL would do internally in a more efficient way. So, I don’t think there is any saving in enforcing uniqueness at the application level.

    April 23, 2015 at 12:06 pm
  • leafonsword

    Even if tables have unique indexes, if index_size < memory, insertion speed will not be impacted?

    August 23, 2015 at 7:10 am
