As you have probably already seen in a post by Baron, Sphinx 0.9.8 is finally out, just in time for OSCON 2008. Even though it is a “minor release” if you look at the number, it is a major release in practice (and you can view the snapshots as minor releases). The changes since 0.9.7 are dramatic, with over 70 new features corresponding to over 15 months of work. With a zero in front it still looks like a “beta” release, though it is very stable and widely used.
Myself, I would have already named it 1.3.0 or something like that (with the third number used for minor releases) and used version 2.0.0 as the target for full live updates. It looks like Andrew has set his mind on naming it 1.0 only when dynamic updates work, though, and starting from 0.9.1 did not leave much room for version flexibility.
Sphinx will be presented at OSCON this year as a .ORG Exhibitor, with me running the show. It was too expensive for Andrew to come from Russia, especially as he did not get a session at OSCON.
It is also worth noting that Sphinx is a SourceForge Community Choice Awards finalist in 3 categories (Best Project, Best Project for Enterprise, Most Likely to Be the Next $1B Acquisition), which is pretty cool.
At Percona we actively support Sphinx because in our opinion it is a great complement to MySQL when it comes to full-text search tasks and other real-time information processing applications. It integrates with MySQL and scripting languages very well, it is simple, it performs well, and it is easily clustered, allowing you to scale out to multiple cores and multiple nodes with close to linear scalability.
Because of this we included a Sphinx chapter in the High Performance MySQL book; check out Appendix C. This should be the best printed material about Sphinx out there, though by now Sphinx has surely grown big enough to justify a book of its own.
If you’re hungry for some numbers, I’d be happy to share a couple of benchmark results for the new version. The first is about the “EXTENDED2” matching mode, which is a faster and more feature-rich search mode than the previous “EXTENDED” one that originally introduced a query language. It can be 10-30% faster when it comes to rare word combinations, while if you search for frequent words the difference can be as large as 2-3 times.
For 15 million documents, in a single-client run on an Intel Core Duo @ 2.2GHz, we got the following:
Extended2 mode also offers a choice of “ranking modes”: if you would like to use BM25 ranking (similar to what MySQL’s built-in full-text search uses) you can get performance another 20-100% better, though search result quality will be reduced. Or, if you’re not interested in full-text ranking at all, for example when you’re sorting by price, you can just disable ranking.
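To give a feel for why BM25-only ranking is cheaper but lower quality, here is a minimal sketch of the textbook Okapi BM25 formula in Python. This is not Sphinx's actual implementation; the function name, the parameters `k1` and `b`, and the toy documents are all my own assumptions for illustration. The point it shows is that BM25 only looks at word counts, so two documents with the same words in a different order get the same score, whereas a ranker that also considers where the words sit relative to each other has more work to do.

```python
import math

def bm25_score(query_terms, doc_terms, doc_freq, num_docs, avg_doc_len,
               k1=1.2, b=0.75):
    """Textbook Okapi BM25 score of one document for a bag-of-words query.

    doc_freq maps a term to the number of documents containing it.
    k1 and b are the usual tuning constants (values here are common defaults,
    not anything taken from Sphinx).
    """
    score = 0.0
    doc_len = len(doc_terms)
    for term in query_terms:
        tf = doc_terms.count(term)  # term frequency in this document
        if tf == 0:
            continue
        df = doc_freq.get(term, 0)
        # Inverse document frequency: rare terms weigh more.
        idf = math.log((num_docs - df + 0.5) / (df + 0.5) + 1.0)
        # Term frequency saturation with document length normalization.
        norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
        score += idf * norm
    return score

# Hypothetical toy collection statistics, for illustration only.
doc_freq = {"cheap": 100, "red": 500, "car": 50}
num_docs = 1000
avg_doc_len = 3.0

doc_a = ["cheap", "red", "car"]   # words in query order
doc_b = ["car", "red", "cheap"]   # same words, reversed

query = ["cheap", "red", "car"]
score_a = bm25_score(query, doc_a, doc_freq, num_docs, avg_doc_len)
score_b = bm25_score(query, doc_b, doc_freq, num_docs, avg_doc_len)
```

Here `score_a` and `score_b` come out equal, because word order never enters the formula. That is the trade-off in a nutshell: less information used, less work per match.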
Another interesting point is Sphinx grouping performance. For example, on the same 15M document collection, counting the number of documents per site_id takes 3.6 seconds using Sphinx, compared to 7.5 seconds using MySQL with the best covering index (so no temporary table or sorting is needed for the GROUP BY). Note that with Sphinx you can easily run the process on multiple cores or multiple nodes.
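To make the grouping numbers concrete, here is a small Python sketch of what "counting documents per site_id" means, plus the speedup implied by the two timings above. The toy documents and the helper name `count_per_site` are made up for illustration; only the 3.6 and 7.5 second figures come from the benchmark.

```python
from collections import Counter

def count_per_site(docs):
    """Count documents per site_id, the same result a
    SELECT site_id, COUNT(*) ... GROUP BY site_id query would give."""
    return Counter(doc["site_id"] for doc in docs)

# Hypothetical toy data standing in for the 15M document collection.
docs = [
    {"id": 1, "site_id": 10},
    {"id": 2, "site_id": 20},
    {"id": 3, "site_id": 10},
    {"id": 4, "site_id": 30},
]
counts = count_per_site(docs)

# Timings reported in the benchmark above, in seconds.
sphinx_seconds = 3.6
mysql_seconds = 7.5
speedup = mysql_seconds / sphinx_seconds  # a bit over 2x in favor of Sphinx
```

Since each node can count its own share of the documents and the per-site counts can simply be added together afterwards, this kind of query splits naturally across cores and nodes.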
Anyway, I’m excited about this new Sphinx milestone, and if you’re using Sphinx, be sure to try this new release.