October 24, 2014

InnoDB Full-text Search in MySQL 5.6: Part 3, Performance

This is part 3 of a 3 part series covering the new InnoDB full-text search features in MySQL 5.6. To catch up on the previous parts, see part 1 or part 2

Some of you may recall a few months ago that I promised a third part in my InnoDB full-text search (FTS) series, in which I’d actually take a look at the performance of InnoDB FTS in MySQL 5.6 versus traditional MyISAM FTS. I hadn’t planned on quite such a gap between part 2 and part 3, but as they say, better late than never. Recall that we have been working with two data sets, one which I call SEO (8000-keyword-stuffed web pages) and the other which I call DIR (800K directory records), and we are comparing MyISAM FTS in MySQL 5.5.30 versus InnoDB FTS in MySQL 5.6.10.

For reference, although this is not really what I would call a benchmark run, the platform I’m using here is a Core i7-2600 3.4GHz, 32GiB of RAM, and 2 Samsung 256GB 830 SSDs in RAID-0. The OS is CentOS 6.4, and the filesystem is XFS with dm-crypt/LUKS. All MySQL settings are their respective defaults, except for innodb_ft_min_token_size, which is set to 4 (instead of the default of 3) to match MyISAM’s default ft_min_word_len.

Also, recall that the table definition for the DIR data set is:

The table definition for the SEO data set is:

Table Load / Index Creation

First, let’s try loading data and creating our FT indexes in one pass – i.e., we’ll create the FT indexes as part of the original table definition itself. In particular, this means adding “FULLTEXT KEY (full_name, details)” to our DIR tables and adding “FULLTEXT KEY (title, body)” to the SEO tables. We’ll then drop these tables, drop our file cache, restart MySQL, and try the same process in two passes: first we’ll load the table, and then we’ll do an ALTER to add the FT indexes. All times in seconds.

EngineData Setone-pass (load)two-pass (load, alter)
MyISAMSEO3.913.96 (0.76, 3.20)
InnoDBSEO3.7777.32 (1.53, 5.79)
MyISAMDIR43.15944.93 (6.99, 37.94)
InnoDBDIR330.7656.99 (12.70, 44.29)

Interesting. For MyISAM, we might say that it really doesn’t make too much difference which way you proceed, as the numbers from the one-pass load and the two-pass load are within a few percent of each other, but for InnoDB, we have mixed behavior. With the smaller SEO data set, it makes more sense to do it in a one-pass process, but with the larger DIR data set, the two-pass load is much faster.

Recall that when adding the first FT index to an InnoDB table, the table itself has to be rebuilt to add the FTS_DOC_ID column, so I suspect that the size of the table when it gets rebuilt has a lot to do with the performance difference on the smaller data set. The SEO data set fits completely into the buffer pool, the DIR data set does not. That also suggests that it’s worth comparing the time required to add a second FT index (this time we will just index each table’s TEXT/MEDIUMTEXT field). While we’re at it, let’s look at the time required to drop the second FT index as well. Again, all times in seconds.

EngineData SetFT Index Create TimeFT Index Drop Time
MyISAMSEO6.343.17
InnoDBSEO3.260.01
MyISAMDIR74.9637.82
InnoDBDIR24.590.01

InnoDB wins this second test all around. I’d attribute InnoDB’s win here partially to not having to rebuild the whole table with second (and subsequent) indexes, but also to the fact that at least some the InnoDB data was already in the buffer pool from when the first FT index was created. Also, we know that InnoDB generally drops indexes extremely quickly, whereas MyISAM requires a rebuild of the .MYI file, so InnoDB’s win on the drop test isn’t surprising.

Query Performance

Recall the queries that were used in the previous post from this series:

The queries were run consecutively from top to bottom, a total of 10 times each. Here are the results in tabular format:

Query #EngineMin. Execution TimeAvg. Execution TimeMax. Execution Time
1MyISAM0.0079530.0081020.008409
1InnoDB0.0149860.0153310.016243
2MyISAM0.0018150.0018930.001998
2InnoDB0.0019870.0020770.002156
3MyISAM0.0007480.0008170.000871
3InnoDB0.6701100.6765400.684837
4MyISAM0.0011990.0012830.001372
4InnoDB0.0554790.0562560.060985
5MyISAM0.0084710.0085970.008817
5InnoDB0.6243050.6309590.641415

Not a lot of variance in execution times for a given query, so that’s good, but InnoDB is always coming back slower than MyISAM. In general, I’m not that surprised that MyISAM tends to be faster; this is a simple single-threaded, read-only test, so none of the areas where InnoDB shines (e.g., concurrent read/write access) are being exercised here, but I am quite surprised by queries #3 and #5, where InnoDB is just getting smoked.

I ran both versions of query 5 with profiling enabled, and for the most part, the time spent in each query state was identical between the InnoDB and MyISAM versions of the query, with one exception.

InnoDB: | Creating sort index | 0.626529 |
MyISAM: | Creating sort index | 0.014588 |

That’s where the bulk of the execution time is. According to the docs, this thread state means that the thread is processing a SELECT which required an internal temporary table. Ok, sure, that makes sense, but it doesn’t really explain why InnoDB is taking so much longer, and here’s where things get a bit interesting. If you recall part 2 in this series, query 5 actually returned 0 results when run against InnoDB with the default configuration because of the middle initial “B”, and I had to set innodb_ft_min_token_size to 1 in order to get results back. For the sake of completeness, I did that again here, then restarted the server and recreated my FT index. The results? Execution time dropped by 50% and ‘Creating sort index’ didn’t even appear in the query profile:

Hm. It’s still slower than MyISAM by quite a bit, but much faster than before. The reason it’s faster is because it found an exact match and I only asked for one row, but if I change LIMIT 1 to LIMIT 2 (or limit N>1), then ‘Creating sort index’ returns to the tune of roughly 0.5 to 0.6 seconds, and ‘FULLTEXT initialization’ remains at 0.3 seconds. So this answers another lingering question: there is a significant performance impact to using a lower innodb_ft_min_token_size (ifmts), and it can work for you or against you, depending upon your queries and how many rows you’re searching for. The time spent in “Creating sort index” doesn’t vary too much (maybe 0.05s) between ifmts=1 and ifmts=4, but the time spent in FULLTEXT initialization with ifmts=4 was typically only a few milliseconds, as opposed to the 300ms seen here.

Finally, I tried experimenting with different buffer pool sizes, temporary table sizes, per-thread buffer sizes, and I also tried changing from Antelope (ROW_FORMAT=COMPACT) to Barracuda (ROW_FORMAT=DYNAMIC) and switching character sets from utf8 to latin1, but none of these made any difference. The only thing which seemed to provide a bit of a performance improvement was upgrading to 5.6.12. The execution times for the InnoDB FTS queries under 5.6.12 were about 5-10 percent faster than with 5.6.10, and query #2 actually performed a bit better under InnoDB than MyISAM (average execution time 0.00075 seconds faster), but other than that, MyISAM still wins on raw SELECT performance.

Three blog posts later, then, what’s my overall take on InnoDB FTS in MySQL 5.6? I don’t think it’s great, but it’s serviceable. The performance for BOOLEAN MODE queries definitely leaves something to be desired, but I think InnoDB FTS fills a need for those people who want the features and capabilities of InnoDB but can’t modify their existing applications or who just don’t have enough FTS traffic to justify building out a Sphinx/Solr/Lucene-based solution.

About Ernie Souhrada

Ernie joined Percona in April 2012 as a Senior Consultant. In his previous lives, he has been everything from a Perl/Java developer to a Linux sysadmin, a MySQL DBA to a Cisco network engineer, and a security auditor to an IT engineering manager, many of these things all at the same time. When not working on MySQL, he might be found on the ski slope, at a psytrance festival, or at the nearest sushi bar.

Comments

  1. It’s also worth noting that “serviceable” is orders of magnitude better than the solution that too many naive MySQL developers still use — “LIKE ‘%word%'”. It’s nice that we now have a reasonably efficient FTS feature that requires neither additional technology, nor feeling bound to MyISAM tables.

  2. Jimmy says:

    On a similar test performed on “select order by desc limit” on a wikidatabase (5.4 million rows, 16G table), InnoDB got far better performance for such test. (See https://blogs.oracle.com/mysqlinnodb/entry/performance_enhancement_in_full_text). We will look at your case.

  3. Just reading the 5.6.13 release notes and I see “InnoDB’s new FULLTEXT now mirrors that of MyISAM with respect to leading wildcards in search terms, as a result of Zoltan Fedor’s report in Bug#68949.”

    I know from your earlier post there was mention of differences in comparison. Does this make any improvement?

  4. @Ronald: I just took a quick look at 5.6.13 with my test queries, and for the most part, everything is the same. The raw scores are a bit different, but the order and content of the search results are unchanged – with two interesting exceptions.

    The first of these is what Jimmy pointed out in a recent comment on part 2 in this series, which is that the issue with “Thomas B Smith” and the boolean mode query which required setting innodb_ft_min_token_size to 1 is now resolved. With 5.6.13 (and, apparently, also 5.6.12), this query returns the proper results without having to adjust the minimum token size and rebuild your indexes. So I’d call this a bug fix.

    I’m not sure what I’d call this next one. Incomplete, maybe. Under 5.5, as noted in part 2, this query returns 5 rows with MyISAM, and it also returns 5 rows with InnoDB, even if rows 2 through 5 are different: “SELECT id, full_name, MATCH(full_name, details) AGAINST (‘+james +peterson +arizona’ IN BOOLEAN MODE) AS score FROM dir_test_myisam ORDER BY 3 DESC LIMIT 5″

    However, under 5.6.10 and later (I have not checked every previous release of 5.6 to see where this changed), this same query, which still returns 5 rows if run against InnoDB, now only returns ONE row from MyISAM. Technically, that’s correct – there is only one row in the DB that matches all three terms, but I wonder why it is “fixed” for MyISAM and still “broken” for InnoDB.

  5. Jimmy says:

    Ernie,

    > However, under 5.6.10 and later (I have not checked every previous
    > release of 5.6 to see where this changed), this same query, which still
    > returns 5 rows if run against InnoDB, now only returns ONE row from
    > MyISAM. Technically, that’s correct – there is only one row in the DB
    > that matches all three terms, but I wonder why it is “fixed” for MyISAM
    > and still “broken” for InnoDB.

    in fact, MyISAM is broken here, so it is MyISAM technically incorrect. And it is being fixed. The correct SQL semantics requires the query to consider all rows in the table, and sort on the rank. So MyISAM is behave incorrect here.

    Another suggestion is that you might want to do tests on queries with where clause:

    SELECT .. against(..) as rel FROM .. WHERE match(…) against(‘..) ORDER BY rel desc LIMIT 10;

    In such way, only the matching records are considered, and this do away with dominating QP layer and scanning unmatching records cost, which then a more fair comparison between InnoDB and MyISAM FTS.

  6. Lazuardi Nasution says:

    Is it possible to specify data and index storage location of InnoDB FTS files? I can do that for InnoDB table file by specifiying DATA DIRECTORY and INDEX DIRECTORY options.

  7. Ernie S says:

    @Lazuardi-

    No, that’s not possible. InnoDB does not support the INDEX DIRECTORY option. That’s a MyISAM-only feature.

Speak Your Mind

*