How to Speed Up Pattern Matching Queries

March 19, 2018
Author
Tibor Korocz
Share this Post:

pattern matching queriesFrom time to time I see pattern matching queries with conditions that look like this: “where fieldname like ‘%something%’ “. MySQL cannot use indexes for these kinds of queries, which means it has to do a table scan every single time.

(That’s really only half true — there are the FullText indexes. In another blog post I will cover FullText indexes as well.)

I recently was trying to find a solution, and my friend Charles Nagy reminded me of Trigrams. Let me show you the Trigram of the name Daniel:

But how is this useful?

Let me show you an example. You have the following email schema:

With data like these:

And we are looking for email addresses like ‘%n.pierre%’:

There are 11 email addresses, but it has to scan the whole index (318458 rows). That’s not good! Let’s try and make it better.

Trigram table

I created a table like this:

As we can see, there is an index called “trigram“.

The plan is to create a trigram for every single email addresses. I wrote the following trigger:

When there is an insert, it creates and inserts the trigrams into the email_trigram table. Trigrams for anderson.pierre:

With the following query, we can find all the email addresses with n.pierre:

It does not have to read the whole table, but still needs to read a lot of rows and even using filesort. I did not want to create trigrams manually, so I wrote the following procedure:

Since with trigrams we are looking for parts of the words (like err or ier), there can be many matches. If we are using a longer condition like derson.pierre, the procedure needed to read 65722 rows. This is also a lot.

Let’s have a look at the selectivity a bit:

There are parts that give back many rows. As I mentioned, more parts mean more rows.

I was hoping for a bigger improvement, so I wondered what else we could do. MySQL cannot use an index because of the leading %. How can we avoid that? Let’s save all the possible versions of the email address that we could be looking for.

(I don’t know if there is any official name of this method — if someone knows it, please write a comment.)

Shorting method

Hmm.. could this work? Let’s test it. I created the following table and trigger:

Let’s find the email addresses that contain n.pierre:

Wow, that is much better than the previous one! It is more than 100 times faster! Now you can have a beer because you deserve it. 🙂

Selectivity

There are parts that result in many readings as well, but it helps a lot now that we are using a longer pattern:

Using more than six characters gives us a much better selectivity.

Table statistics

In this test, I used 318458 random email addresses, and both methods created 2749000 additional rows.

Size on the disk:

As we expected they will use more space than the original table.

Cons

  • Both solutions require an extra table
  • That table contains millions of short rows, and it could use a few gigs of space
  • Requires three triggers (insert, update and delete, which could affect the write performance on the table) or the application has to keep that table up-to-date

Pros

  • Finding an email address is going to be much faster and require fewer reads.
  • Users will be much more satisfied.

Conclusion

If there is no built in solution or index in MySQL that can help or solve your problem, do not give up. Many times, with small modifications, you can create your own index table or use other tricks.

In this specific case, if you are willing to sacrifice some additional disk space you can speed up your queries a lot with the right method. Trigram was not the best option, but I can see use cases where it could be better.

This is just one possible solution, there could be an even better one. If you know one, please let me know.

Subscribe
Notify of
guest
0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments

Far
Enough.

Said no pioneer ever.
MySQL, PostgreSQL, InnoDB, MariaDB, MongoDB and Kubernetes are trademarks for their respective owners.
© 2026 Percona All Rights Reserved