I’ve run into the following thread couple of days ago:
Basically someone is using sphinx to perform search simply on attributes (date, group etc) and get sorted result set and claiming it is way faster than getting it with MySQL. Honestly I can well believe it for cases when you want to know number of matching rows as well as if you can’t build efficient indexes so selectivity is done by index and index used to resolve order by.
Funny enough to filter by attributes or sort sphinx does not use indexes – indexes are only used for full text search matching, but it is still extremely fast doing data crunching.
I just tested right now performing search of “the” which matched 100.000.000 of documents out of 200.000.000 collection (200GB) completed in 0.7 second. This is system we’re building for one of our clients which uses cluster of 3 nodes to handle search. In this case no shortcuts are taken all 100.000.000 of matching document are traversed and priority queue sorting is performed to generate 1.000 best matching results. Quite impressive
Yeah I know it should be stop word but I currently have index without stop words for testing purposes.
Now what I’m hoping for as developments:
Andrew to continue improving sphinx so it would have more advanced filtering clauses and types of attributes, plus there would be an option to retrieve by filters only with no full text query. Sphinx should not be replacement for Database Server but for many data retrieval needs it will work great. Especially as it can be used with other databases which may be slower than MySQL.
MySQL Make it so one would not need to use sphinx to get great performance for such kind of queries. This includes parallel processing, fast count(*) and bitmap indexes to help non selective clauses. Also some form of fast sort like priority queue could be used if only few first elements are needed.
Resources
RELATED POSTS