Using UDFs for geo-distance search in MySQL

In my previous post about geo-spatial search in MySQL I described (along with other things) how to use geo-distance functions. In this post I will describe the geo-spatial distance functions in more details.

If you need to calculate an exact distance between 2 points on Earth in MySQL (very common for geo-enabled applications) you have at least 3 choices.

  • Use stored function and implement haversine formula
  • Use UDF (user defined function) for haversine (see below)
  • In MySQL 5.6 you can use st_distance function (newly documented), however, you will get the distance on plane and not on earth; the value returned will be good for sorting by distance but will not represent actual miles or kilometers.

MySQL stored function for calculating distance on Earth

I previously gave an example for a MySQL-stored function which implements the haversine formula. However, the approach I used was not very precise: it was optimized for speed. If you need a more precise haversine formula implementation you can use this function (result will be in miles):

(the algorithm is based on the standard formula, I’ve used the well-known Movable Type scripts calculator)

This is a slower implementation as it uses arctangent, however it is more precise.  

MySQL UDF for Haversine distance

Another approach, which will give you much more performance is to use UDF. There are a number of implementations, I’ve used lib_mysqludf_haversine.

Here is the simple steps to install it in MySQL 5.6 (will also work with earlier versions):

Please note:

  • Make sure you have the mysql-devel or percona-server-devel package installed (MySQL development libraries) before installing.
  • You will need to specify the last parameter to be “mi” if you want to get the results in miles, otherwise it will give you kilometers.

MySQL ST_distance function

In MySQL 5.6 you can use ST_distance function:

As we can see it does not give us an actual distance in mile or kilometers as it does not take into account that we have latitude and longitude, rather than X and Y on plane.

Geo Distance Functions Performance

The stored procedures and functions in MySQL are known to be slower, especially with trigonometrical functions. I’ve did a quick test, using MySQL function benchmark.

First I set 2 points (10 miles from SFO airport)

Next I use 4 function to benchmark:

  • Less precise stored function (haversine)
  • More precise stored function (haversine)
  • UDF for haversine
  • MySQL 5.6 native ST_distance (plane)

The benchmark function will execute the above function 100000 times.

Here are the results:

As we can see the UDF gives much faster response time (which is comparable to built-in function).

Benchmark chart (smaller the better)

Conclusion

The lib_mysqludf_haversine UDF provides a good function for geo-distance search in MySQL. Please let me know in the comments what geo-distance functions or approaches do you use in your applications.

Share this post

Comments (4)

  • Eliah

    Incorrect benchmark for the ST_distance function. In fact, you’re measuring not only the speed of the function itself, but two conversions from text to point each iteration as well.

    select benchmark(100000, st_distance(point(@rlat1, @rlon1), point(@rlat2, @rlon1))) as mysql_builtin_st_distance;

    1 row in set (0.08 sec)

    set @p1 = point(@rlat1, @rlon1), @p2 = point(@rlat2, @rlon1); select benchmark(100000, st_distance(@p1, @p2)) as mysql_builtin_st_distance;

    1 row in set (0.01 sec)

    June 22, 2014 at 4:53 pm
  • Alexander Rubin

    Eliah,

    Yes, that is a good point, you are right! I will update the post.
    Thank you for pointing it out!
    PS: I wish the built-in function st_distance support the SRID and can calculate distance on earth.

    June 23, 2014 at 10:13 am
  • Prabhat Nath

    Hello,

    Thank you for this article and the benchmarks.
    Suppose I have to deploy my MySQL on Amazon RDS so what should be the approach for Installing lib_mysqludf_haversine ?

    July 22, 2015 at 4:48 am
  • Alastair Knowles

    Helpful post – I did a bit of my own research for a project this morning and found something quite interesting. The main performance bottleneck seems to be stored procedures / functions themselves – probably interpretation. Without installing any kind of third party library I was able to produce comparable performance to the UDF Haversine result, just by inlining the algebra directly into the SQL. Would guess that’s going to be an adequate approach in most scenarios, e.g. where statements are issued from another language that supports templating / prepared statements.

    November 3, 2016 at 5:48 am

Comments are closed.

Use Percona's Technical Forum to ask any follow-up questions on this blog topic.