October 1, 2014

EuroOSCON 2006 – High Performance FullText Search

I’m now back from EuroOSCON 2006, which is why I have not been posting for a while. It was a pretty interesting event, even though it seems to be getting less geeky compared to the OSCON in the US I visited two years ago: a lot of presentations have now shifted to philosophical, political and business issues. I do not know, however; this might be just a Europe thing.

I gave a talk on High Performance FullText Search for Database Content, which is now available for download from the MySQL Performance Presentations page.

We plan to turn this presentation into an article to publish here, with some extra information and the source code for all the tests we’ve done, so it will be helpful for running full text search benchmarks with your own data.

It was great to see a lot of MySQL guys at the conference: Monty, David, Marten, Lars, Russell, and ex-MySQL guys such as Zak. We had a good chat.

There was a MySQL Best Practices BOF at the conference which surprisingly attracted very few people: there were more current and ex-MySQL employees than attendees. It might have been due to the size of the conference, which was not as big as in the US, or it might have been scheduled too late at night. I do not know.

This year there were quite a few talks related to the mobile market. It does not seem to have the same open source penetration yet as the server and even desktop markets, but it seems to be coming.

With recent developments such as SIP (Voice over IP) enabled phones and the active growth of the smartphone market, we might soon end up with the main function of mobile operators being to provide connectivity rather than anything else, which will surely hurt their margins but would be great for users. Not sure when that will happen, though.

There were also (of course) a lot of presentations on AJAX and Web 2.0, including a presentation from Zimbra on best practices and one from the Yahoo team on their UI library (looks pretty cool).

In general, the conference was worth the trip, especially as I combined it with visiting friends and Brussels sightseeing.

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. yeah…. turn it into an article here…….

    I’ve worked with both Lucene and MySQL full-text search and both are fairly decent. MySQL really doesn’t scale as well as Lucene, but it’s really easy to implement and support. Much better to just use MySQL if you’re already a MySQL shop than to have to deal with Lucene.

    Kevin

  2. peter says:

    Kevin,

    I can’t agree with your final conclusion. First you have to identify your performance goals and then check the solutions. For many data sizes or search requirements, MySQL FullText search performance is unfortunately a showstopper.

    There are some examples and benchmarks in the presentation :)

    For a tiny data size, i.e. searching your personal DVD collection, MySQL FullText search should of course be used, as it is the simplest to use.

    You can also check out Sphinx, which sits between Lucene and MySQL FullText Search in features and ease of use but is the fastest.

    Note: It might sound as if I’m too negative about MySQL Full Text Search, but I have just seen too many people get stuck trying to scale it when their data size grew.

  3. Apachez says:

    I agree with that, Peter.

    The reason I created TBGsearch was not to build a complete standalone search engine solution but to find a replacement for MySQL fulltext for the forum at http://www.tbg.nu

    In my case the problem with MySQL fulltext was that my searches were performed:

    * Using boolean search.
    * With no stopword list.
    * Sorting hits by date and not “relevance”, since relevance was irrelevant in this case :-)

    All three reasons above made MySQL fulltext virtually puke when a client performed a search which contained several commonly occurring words.

    I assume the reason is that MySQL fulltext internally first finds all hits for each word (since it was a boolean search), then sorts the result by date and finally returns the top 200 matches (instead of having some optimization to loop over one hit at a time until 200 hits are collected, or something similar). This sometimes ended up taking 10 minutes or more to complete a search…
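
    The difference between those two strategies can be sketched roughly in Python. This is only an illustration of the idea, not how MySQL fulltext or TBGsearch actually works internally; the tiny in-memory `posts` data and both function names are made up for the example, and it assumes the posts are available in newest-first order.

```python
from itertools import islice

# Hypothetical data: doc_id -> (date, text). Not real forum content.
posts = {
    1: ("2006-01-10", "mysql fulltext search is slow"),
    2: ("2006-03-02", "search with sphinx is fast"),
    3: ("2006-05-21", "mysql search and sphinx search compared"),
    4: ("2006-07-30", "perl api for sphinx"),
    5: ("2006-09-15", "mysql boolean search without stopwords"),
}


def matches(text, words):
    """Boolean AND: every search word must occur in the text."""
    tokens = set(text.split())
    return all(word in tokens for word in words)


def search_collect_then_sort(words, limit):
    """Find every hit first, sort all of them by date, then cut to `limit`."""
    hits = [doc_id for doc_id, (_, text) in posts.items() if matches(text, words)]
    hits.sort(key=lambda doc_id: posts[doc_id][0], reverse=True)
    return hits[:limit]


# Assumption: doc ids are kept newest first, as a date-ordered index would give.
by_date_desc = sorted(posts, key=lambda doc_id: posts[doc_id][0], reverse=True)


def search_early_stop(words, limit):
    """Walk posts newest first and stop as soon as `limit` hits are collected."""
    hits = (doc_id for doc_id in by_date_desc if matches(posts[doc_id][1], words))
    return list(islice(hits, limit))


print(search_collect_then_sort(["mysql", "search"], 2))
print(search_early_stop(["mysql", "search"], 2))
```

    Both functions return the same newest hits, but the second one never looks at older posts once it has enough, which matters most when the search words are very common.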

    So I started to look for a fix or a replacement and quickly found Sphinx, for which I created a Perl API (since I’m using Perl on my site). However, Sphinx had some limitations, such as no support for wildcards and no live updates, and when a new Sphinx version was released the old Perl API was no longer compatible.

    So instead of putting time into updating the Perl API, I tried to see how large the difference would be if I created a search engine written only in Perl with MySQL as the storage backend. This ended up as TBGsearch, which I have released as open source at http://www.tbg.nu/tbgsearch (for download: http://www.tbg.nu/tbgsearch/tbgsearch.zip).

    In case someone has tips or questions regarding TBGsearch, you can email me at the address given in http://www.tbg.nu/tbgsearch/readme.txt
