InnoDB Full-text Search in MySQL 5.6: Part 3, Performance

This is part 3 of a 3 part series covering the new InnoDB full-text search features in MySQL 5.6. To catch up on the previous parts, see part 1 or part 2

Some of you may recall a few months ago that I promised a third part in my InnoDB full-text search (FTS) series, in which I’d actually take a look at the performance of InnoDB FTS in MySQL 5.6 versus traditional MyISAM FTS. I hadn’t planned on quite such a gap between part 2 and part 3, but as they say, better late than never. Recall that we have been working with two data sets, one which I call SEO (8000-keyword-stuffed web pages) and the other which I call DIR (800K directory records), and we are comparing MyISAM FTS in MySQL 5.5.30 versus InnoDB FTS in MySQL 5.6.10.

For reference, although this is not really what I would call a benchmark run, the platform I’m using here is a Core i7-2600 3.4GHz, 32GiB of RAM, and 2 Samsung 256GB 830 SSDs in RAID-0. The OS is CentOS 6.4, and the filesystem is XFS with dm-crypt/LUKS. All MySQL settings are their respective defaults, except for innodb_ft_min_token_size, which is set to 4 (instead of the default of 3) to match MyISAM’s default ft_min_word_len.

Also, recall that the table definition for the DIR data set is:

The table definition for the SEO data set is:

Table Load / Index Creation

First, let’s try loading data and creating our FT indexes in one pass – i.e., we’ll create the FT indexes as part of the original table definition itself. In particular, this means adding “FULLTEXT KEY (full_name, details)” to our DIR tables and adding “FULLTEXT KEY (title, body)” to the SEO tables. We’ll then drop these tables, drop our file cache, restart MySQL, and try the same process in two passes: first we’ll load the table, and then we’ll do an ALTER to add the FT indexes. All times in seconds.

Engine Data Set one-pass (load) two-pass (load, alter)
MyISAM SEO 3.91 3.96 (0.76, 3.20)
InnoDB SEO 3.777 7.32 (1.53, 5.79)
MyISAM DIR 43.159 44.93 (6.99, 37.94)
InnoDB DIR 330.76 56.99 (12.70, 44.29)

Interesting. For MyISAM, we might say that it really doesn’t make too much difference which way you proceed, as the numbers from the one-pass load and the two-pass load are within a few percent of each other, but for InnoDB, we have mixed behavior. With the smaller SEO data set, it makes more sense to do it in a one-pass process, but with the larger DIR data set, the two-pass load is much faster.

Recall that when adding the first FT index to an InnoDB table, the table itself has to be rebuilt to add the FTS_DOC_ID column, so I suspect that the size of the table when it gets rebuilt has a lot to do with the performance difference on the smaller data set. The SEO data set fits completely into the buffer pool, the DIR data set does not. That also suggests that it’s worth comparing the time required to add a second FT index (this time we will just index each table’s TEXT/MEDIUMTEXT field). While we’re at it, let’s look at the time required to drop the second FT index as well. Again, all times in seconds.

Engine Data Set FT Index Create Time FT Index Drop Time
MyISAM SEO 6.34 3.17
InnoDB SEO 3.26 0.01
MyISAM DIR 74.96 37.82
InnoDB DIR 24.59 0.01

InnoDB wins this second test all around. I’d attribute InnoDB’s win here partially to not having to rebuild the whole table with second (and subsequent) indexes, but also to the fact that at least some the InnoDB data was already in the buffer pool from when the first FT index was created. Also, we know that InnoDB generally drops indexes extremely quickly, whereas MyISAM requires a rebuild of the .MYI file, so InnoDB’s win on the drop test isn’t surprising.

Query Performance

Recall the queries that were used in the previous post from this series:

The queries were run consecutively from top to bottom, a total of 10 times each. Here are the results in tabular format:

Query # Engine Min. Execution Time Avg. Execution Time Max. Execution Time
1 MyISAM 0.007953 0.008102 0.008409
1 InnoDB 0.014986 0.015331 0.016243
2 MyISAM 0.001815 0.001893 0.001998
2 InnoDB 0.001987 0.002077 0.002156
3 MyISAM 0.000748 0.000817 0.000871
3 InnoDB 0.670110 0.676540 0.684837
4 MyISAM 0.001199 0.001283 0.001372
4 InnoDB 0.055479 0.056256 0.060985
5 MyISAM 0.008471 0.008597 0.008817
5 InnoDB 0.624305 0.630959 0.641415

Not a lot of variance in execution times for a given query, so that’s good, but InnoDB is always coming back slower than MyISAM. In general, I’m not that surprised that MyISAM tends to be faster; this is a simple single-threaded, read-only test, so none of the areas where InnoDB shines (e.g., concurrent read/write access) are being exercised here, but I am quite surprised by queries #3 and #5, where InnoDB is just getting smoked.

I ran both versions of query 5 with profiling enabled, and for the most part, the time spent in each query state was identical between the InnoDB and MyISAM versions of the query, with one exception.

InnoDB: | Creating sort index | 0.626529 |
MyISAM: | Creating sort index | 0.014588 |

That’s where the bulk of the execution time is. According to the docs, this thread state means that the thread is processing a SELECT which required an internal temporary table. Ok, sure, that makes sense, but it doesn’t really explain why InnoDB is taking so much longer, and here’s where things get a bit interesting. If you recall part 2 in this series, query 5 actually returned 0 results when run against InnoDB with the default configuration because of the middle initial “B”, and I had to set innodb_ft_min_token_size to 1 in order to get results back. For the sake of completeness, I did that again here, then restarted the server and recreated my FT index. The results? Execution time dropped by 50% and ‘Creating sort index’ didn’t even appear in the query profile:

Hm. It’s still slower than MyISAM by quite a bit, but much faster than before. The reason it’s faster is because it found an exact match and I only asked for one row, but if I change LIMIT 1 to LIMIT 2 (or limit N>1), then ‘Creating sort index’ returns to the tune of roughly 0.5 to 0.6 seconds, and ‘FULLTEXT initialization’ remains at 0.3 seconds. So this answers another lingering question: there is a significant performance impact to using a lower innodb_ft_min_token_size (ifmts), and it can work for you or against you, depending upon your queries and how many rows you’re searching for. The time spent in “Creating sort index” doesn’t vary too much (maybe 0.05s) between ifmts=1 and ifmts=4, but the time spent in FULLTEXT initialization with ifmts=4 was typically only a few milliseconds, as opposed to the 300ms seen here.

Finally, I tried experimenting with different buffer pool sizes, temporary table sizes, per-thread buffer sizes, and I also tried changing from Antelope (ROW_FORMAT=COMPACT) to Barracuda (ROW_FORMAT=DYNAMIC) and switching character sets from utf8 to latin1, but none of these made any difference. The only thing which seemed to provide a bit of a performance improvement was upgrading to 5.6.12. The execution times for the InnoDB FTS queries under 5.6.12 were about 5-10 percent faster than with 5.6.10, and query #2 actually performed a bit better under InnoDB than MyISAM (average execution time 0.00075 seconds faster), but other than that, MyISAM still wins on raw SELECT performance.

Three blog posts later, then, what’s my overall take on InnoDB FTS in MySQL 5.6? I don’t think it’s great, but it’s serviceable. The performance for BOOLEAN MODE queries definitely leaves something to be desired, but I think InnoDB FTS fills a need for those people who want the features and capabilities of InnoDB but can’t modify their existing applications or who just don’t have enough FTS traffic to justify building out a Sphinx/Solr/Lucene-based solution.

Share this post

Comments (7)

  • Bill Karwin

    It’s also worth noting that “serviceable” is orders of magnitude better than the solution that too many naive MySQL developers still use — “LIKE ‘%word%'”. It’s nice that we now have a reasonably efficient FTS feature that requires neither additional technology, nor feeling bound to MyISAM tables.

    July 31, 2013 at 12:51 pm
  • Jimmy

    On a similar test performed on “select order by desc limit” on a wikidatabase (5.4 million rows, 16G table), InnoDB got far better performance for such test. (See We will look at your case.

    July 31, 2013 at 9:53 pm
  • Ronald Bradford

    Just reading the 5.6.13 release notes and I see “InnoDB’s new FULLTEXT now mirrors that of MyISAM with respect to leading wildcards in search terms, as a result of Zoltan Fedor’s report in Bug#68949.”

    I know from your earlier post there was mention of differences in comparison. Does this make any improvement?

    August 1, 2013 at 3:48 pm
  • Ernie Souhrada

    @Ronald: I just took a quick look at 5.6.13 with my test queries, and for the most part, everything is the same. The raw scores are a bit different, but the order and content of the search results are unchanged – with two interesting exceptions.

    The first of these is what Jimmy pointed out in a recent comment on part 2 in this series, which is that the issue with “Thomas B Smith” and the boolean mode query which required setting innodb_ft_min_token_size to 1 is now resolved. With 5.6.13 (and, apparently, also 5.6.12), this query returns the proper results without having to adjust the minimum token size and rebuild your indexes. So I’d call this a bug fix.

    I’m not sure what I’d call this next one. Incomplete, maybe. Under 5.5, as noted in part 2, this query returns 5 rows with MyISAM, and it also returns 5 rows with InnoDB, even if rows 2 through 5 are different: “SELECT id, full_name, MATCH(full_name, details) AGAINST (‘+james +peterson +arizona’ IN BOOLEAN MODE) AS score FROM dir_test_myisam ORDER BY 3 DESC LIMIT 5”

    However, under 5.6.10 and later (I have not checked every previous release of 5.6 to see where this changed), this same query, which still returns 5 rows if run against InnoDB, now only returns ONE row from MyISAM. Technically, that’s correct – there is only one row in the DB that matches all three terms, but I wonder why it is “fixed” for MyISAM and still “broken” for InnoDB.

    August 1, 2013 at 8:58 pm
  • Jimmy


    > However, under 5.6.10 and later (I have not checked every previous
    > release of 5.6 to see where this changed), this same query, which still
    > returns 5 rows if run against InnoDB, now only returns ONE row from
    > MyISAM. Technically, that’s correct – there is only one row in the DB
    > that matches all three terms, but I wonder why it is “fixed” for MyISAM
    > and still “broken” for InnoDB.

    in fact, MyISAM is broken here, so it is MyISAM technically incorrect. And it is being fixed. The correct SQL semantics requires the query to consider all rows in the table, and sort on the rank. So MyISAM is behave incorrect here.

    Another suggestion is that you might want to do tests on queries with where clause:

    SELECT .. against(..) as rel FROM .. WHERE match(…) against(‘..) ORDER BY rel desc LIMIT 10;

    In such way, only the matching records are considered, and this do away with dominating QP layer and scanning unmatching records cost, which then a more fair comparison between InnoDB and MyISAM FTS.

    August 1, 2013 at 9:58 pm