EmergencyEMERGENCY? Get 24/7 Help Now!

New GIS Features in MySQL 5.7

 | February 3, 2016 |  Posted In: MySQL

PREVIOUS POST
NEXT POST

MySQL 5.7MySQL 5.7 has been released, and there are some exciting new features now available that I’m going to discuss in this blog — specifically around geographic information system (GIS).

I’ve used GIS features in MySQL for a long time. In my previous blog entries I’ve shown how to create geo-enabled applications with MySQL 5.6 and use MySQL 5.6 geo-spatial functions. In this blog post, I’ll look into what is new in MySQL 5.7 and how we can use those new features in practice for geo-enabled applications.

New in MySQL 5.7

MySQL 5.7 introduces the following major improvements and features for GIS:

  1. Spatial indexes for InnoDB. Finally it is here! This was a long overdue feature, which also prevented many companies from converting all tables to InnoDB.
  2. st_distance_sphere: native function to calculate a distance between two points on earth. Finally it is here as well! Like many others, I’ve created my stored procedure to calculate the distance between points on earth using haversine formula. The native function is ~20x faster than the stored procedure (in an artificial benchmark, see below). This is not surprising, as stored procedures are slow computationally – especially for trigonometrical functions.
  3. New functions: GeoHash and GeoJSON. With GeoJSON we can generate the results that are ready for visualizing on Google Maps.
  4. New GIS implementation based on Boost.Geometry library. This is great news, as originally GIS was implemented independently from scratch with a very limited set of features. Manyi Lu from MySQL server team provides more reasoning behind the choice of Boost.Geometry.

This is the great news. The bad news is that except for the st_distance_sphere, all other functions use planar geometry (no change since MySQL 5.6) and do not support Spatial Reference System Identifier (SRID). That means that if I want to calculate the distance of my favorite bike path in miles or kilometers, I’ll still have to use a stored function (see below for an example) or write an application code for that. Native function st_distance will ignore SRID for now and return a value which represents a distance on a planar – not very useful for our purposes (may be useful for order by / compare).

Distance on Sphere

MySQL 5.7 introduces the function st_distance_sphere, which uses a haversine formula to calculate distance. He is the example:

The distance is in meters by default (you can also change the radius of the earth to meters using the 3rd optional parameter, default: 6,370,986). Although our earth is represented as an oblate spheroid, all practical applications use the distance on a sphere. The difference between the haversine formula and more precise (and much slower) functions is negligible for our purposes.

The st_distance_sphere is much faster than using stored routines. Here is the artificial benchmark:

haversine_distance_sp is a stored routine implementation of the same algorithm.

InnoDB GIS example: find 10 restaurants near me 

In my previous blog post I’ve demonstrated how to use st_within function to find restaurants inside my zipcode (US postal code) and sort by distance. In MySQL 5.7 there will be 2 changes:

  1. We can use InnoDB table
  2. We can use st_distance_sphere function

For this example, I’ve converted Open Street Map data to MySQL and then created a new InnoDB table:

SHAPE is declared as geometry (and stores points in this table). We also have SPATIAL KEY SHAPE in the InnoDB table.

The following query will find all cafe or restaurants in Durham, NC (zipcode: 27701):

Table tl_2013_us_zcta510 stores the shapes of polygons for all US zipcodes. (It needs to be converted to MySQL.) In this example I’m using st_within to filter only the POIs I need, and st_distance_sphere to get the distance from my location (-78.9064543 35.9975194 are the coordinates of Percona’s office in Durham) to the restaurants.

Explain plan:

That looks pretty good: MySQL is using and index on the SHAPE field (even with the subquery, btw).

Results:

0.13 seconds response time on AWS t2.medium box sounds reasonable to me. The same query on the MyISAM table shows ~same response time: 0.14 seconds.

GeoJSON feature and Google Maps

Another nice feature of MySQL 5.7 GIS is GeoJSON function: you can convert your result set to GeoJSON, which can be used with other applications (for example Google Maps API).

Let’s say I want to visualize the above result set on Google Map. As the API requires a specific format, I can use concat / group_concat to apply the format inside the SQL:

I will get all the restaurants and cafes in zipcode 27701. Here I’m using ST_AsGeoJSON(shape) to convert to GeoJSON, and concat/group_concat to “nest” the whole result into the format suitable for Google Maps.

Result:

I did not include the full result set for the lack of space; I also had to change the group concat max length, otherwise MySQL will cut the result of the group_concat function.

Now I can visualize it:

 

 

 

 

 

 

 

 

 

 

Example: Find the longest bike path

MySQL 5.7 (as well as the older versions) supports st_length function to calculate a length of a linestring. However, even in MySQL 5.7 st_length can’t calculate the distance on earth. To find the distance of a linestring I’ve created a very simple stored procedure:

As the Open Street Map data has the information about roads in North America, we can use this function to calculate the length (in meters) for every road it stores:

Index the polygon/area distance using MySQL 5.7 virtual fields

To really answer the question “what is the longest bikepath (cyclepath) in North America?” we will have to order by stored function result. This will cause a full table scan and a filestort, which will be extremely slow for 30 millions of rows. The standard way to fix this is to materialize this road distance: add an additional field to the table and store the distance there.

In MySQL 5.7 we can actually use Generated (Virtual) Columns feature:

Unfortunately, MySQL 5.7 does not support non-native functions (stored procedures or UDF) in generated columns, so I have to use st_length in this example. Ordering by value of st_length may be OK though:

The query is very fast as it uses an index on both highway and linestring:

Conclusion

MySQL 5.7 contains a great set of features to work with geospatial data. Finally, spatial indexes are supported in InnoDB; st_distance_sphere as a native function is very useful. Unfortunately, other spatial functions only work with planar coordinates and do not support SRID. I hope this will be fixed in the new releases.

PREVIOUS POST
NEXT POST
Alexander Rubin

Alexander joined Percona in 2013. Alexander worked with MySQL since 2000 as DBA and Application Developer. Before joining Percona he was doing MySQL consulting as a principal consultant for over 7 years (started with MySQL AB in 2006, then Sun Microsystems and then Oracle). He helped many customers design large, scalable and highly available MySQL systems and optimize MySQL performance. Alexander also helped customers design Big Data stores with Apache Hadoop and related technologies.

11 Comments

  • Alex,

    I wonder if we can get some GIS benchmark script for Sysbench, might be do some benchmarks.

    One thing I’m missing for GIS is some easy to reference numbers. If I’m to do GIS lookup to find 50 restaurants within 10 mile distance, how many such lookups we can handle ?

  • Have you tried with coordinates close to the limits?

    I have tried the following and got error:

    -12.2730911,179.9946179
    -12.273021,-179.9963346
    +- 1000m

    mysql> select st_distance_sphere(point(-12.2730911, 179.9946179), point(-12.273021,-179.9963346));
    ERROR 1210 (HY000): Incorrect arguments to st_distance_sphere

    • Sorry, after reading this: https://dev.mysql.com/doc/refman/5.7/en/spatial-convenience-functions.html
      I realized I’ve inverted lon and lat.

      Now it worked perfectly:

      mysql> select st_distance_sphere(point(179.9946179,-12.2730911), point(-179.9963346,-12.273021));
      +————————————————————————————+
      | st_distance_sphere(point(179.9946179,-12.2730911), point(-179.9963346,-12.273021)) |
      +————————————————————————————+
      | 983.0725362589525 |
      +————————————————————————————+
      1 row in set (0.00 sec)

      Thank you.

  • Thanks for this update. The google street maps integration is particularily interesting. I also appreciate your older post showing how to get the data into MySQL using ogr2ogr.

    Have you tried using ogr2ogr to load the data to a MySQL 5.7 instance? I keep getting segmentation fault: 11 when loading the Canadian or U.S. postal code data. Here are the commands I’m trying:

    ogr2ogr -progress -f “MySQL” MySQL:”dbname,host=localhost,user=usr,password=pw,port=3306″ gfsa000b11a_e/gfsa000b11a_e.shp -nln “GEO_CANADA” -update -overwrite -lco engine=InnoDB

    ogr2ogr -progress -f “MySQL” MySQL:”dbname,host=localhost,user=usr,password=pw,port=3306″ tl_2016_us_zcta510/tl_2016_us_zcta510.shp -nln “GEO_US_POSTAL_CD” -update -overwrite -lco engine=InnoDB

    I’m on OS X 10.12.3, running gdal 1.11.5_1, installed with “brew install gdal –enable-unsupported –with-mysql”

    • I was able to get this working on ubuntu without any problems. Looks like spatial operations might be broken for the mysql driver in the gdal on osx.

Leave a Reply