MySQL 5.7 has been released, and there are some exciting new features now available that I’m going to discuss in this blog — specifically around geographic information system (GIS).
I’ve used GIS features in MySQL for a long time. In my previous blog entries I’ve shown how to create geo-enabled applications with MySQL 5.6 and use MySQL 5.6 geo-spatial functions. In this blog post, I’ll look into what is new in MySQL 5.7 and how we can use those new features in practice for geo-enabled applications.
New in MySQL 5.7
MySQL 5.7 introduces the following major improvements and features for GIS:
This is the great news. The bad news is that except for the st_distance_sphere, all other functions use planar geometry (no change since MySQL 5.6) and do not support Spatial Reference System Identifier (SRID). That means that if I want to calculate the distance of my favorite bike path in miles or kilometers, I’ll still have to use a stored function (see below for an example) or write an application code for that. Native function st_distance will ignore SRID for now and return a value which represents a distance on a planar – not very useful for our purposes (may be useful for order by / compare).
Distance on Sphere
MySQL 5.7 introduces the function st_distance_sphere, which uses a haversine formula to calculate distance. He is the example:
|
1 |
mysql> select st_distance_sphere(point(-78.7698947, 35.890334), point(-122.38657, 37.60954));<br>+--------------------------------------------------------------------------------+<br>| st_distance_sphere(point(-78.7698947, 35.890334), point(-122.38657, 37.60954)) |<br>+--------------------------------------------------------------------------------+<br>| 3855600.7928957273 |<br>+--------------------------------------------------------------------------------+<br>1 row in set (0.00 sec) |
The distance is in meters by default (you can also change the radius of the earth to meters using the 3rd optional parameter, default: 6,370,986). Although our earth is represented as an oblate spheroid, all practical applications use the distance on a sphere. The difference between the haversine formula and more precise (and much slower) functions is negligible for our purposes.
The st_distance_sphere is much faster than using stored routines. Here is the artificial benchmark:
|
1 |
mysql> select benchmark(1000000, haversine_distance_sp(37.60954, -122.38657, 35.890334, -78.7698947));<br>+-----------------------------------------------------------------------------------------+<br>| benchmark(1000000, haversine_distance_sp(37.60954, -122.38657, 35.890334, -78.7698947)) |<br>+-----------------------------------------------------------------------------------------+<br>| 0 |<br>+-----------------------------------------------------------------------------------------+<br>1 row in set (22.55 sec)<br><br><br>mysql> select benchmark(1000000, st_distance_sphere(point(-78.7698947, 35.890334), point(-122.38657, 37.60954)));<br>+----------------------------------------------------------------------------------------------------+<br>| benchmark(1000000, st_distance_sphere(point(-78.7698947, 35.890334), point(-122.38657, 37.60954))) |<br>+----------------------------------------------------------------------------------------------------+<br>| 0 |<br>+----------------------------------------------------------------------------------------------------+<br>1 row in set (0.77 sec) |
haversine_distance_sp is a stored routine implementation of the same algorithm.
InnoDB GIS example: find 10 restaurants near me
In my previous blog post I’ve demonstrated how to use st_within function to find restaurants inside my zipcode (US postal code) and sort by distance. In MySQL 5.7 there will be 2 changes:
For this example, I’ve converted Open Street Map data to MySQL and then created a new InnoDB table:
|
1 |
CREATE TABLE `points_new` (<br> `OGR_FID` int(11) NOT NULL AUTO_INCREMENT,<br> `SHAPE` geometry NOT NULL,<br> `osm_id` text,<br> `name` text,<br> `barrier` text,<br> `highway` text,<br> `ref` text,<br> `address` text,<br> `is_in` text,<br> `place` text,<br> `man_made` text,<br> `other_tags` text,<br> UNIQUE KEY `OGR_FID` (`OGR_FID`),<br> SPATIAL KEY `SHAPE` (`SHAPE`)<br>) ENGINE=InnoDB AUTO_INCREMENT=13660668 DEFAULT CHARSET=latin1 |
SHAPE is declared as geometry (and stores points in this table). We also have SPATIAL KEY SHAPE in the InnoDB table.
The following query will find all cafe or restaurants in Durham, NC (zipcode: 27701):
|
1 |
SELECT osm_id, name, round(st_distance_sphere(shape, st_geomfromtext('POINT (-78.9064543 35.9975194)', 1) ), 2) as dist <br>FROM points_new <br>WHERE st_within(shape, <br> (select shape from zcta.tl_2013_us_zcta510 where zcta5ce10='27701') ) <br> and (other_tags like '%"amenity"=>"cafe"%' or other_tags like '%"amenity"=>"restaurant"%') <br> and name is not null <br>ORDER BY dist asc LIMIT 10; |
Table tl_2013_us_zcta510 stores the shapes of polygons for all US zipcodes. (It needs to be converted to MySQL.) In this example I’m using st_within to filter only the POIs I need, and st_distance_sphere to get the distance from my location (-78.9064543 35.9975194 are the coordinates of Percona’s office in Durham) to the restaurants.
Explain plan:
|
1 |
mysql> EXPLAIN<br> -> SELECT osm_id, name, round(st_distance_sphere(shape, st_geomfromtext('POINT (-78.9064543 35.9975194)', 1) ), 2) as dist <br> -> FROM points_new <br> -> WHERE st_within(shape, <br> -> (select shape from zcta.tl_2013_us_zcta510 where zcta5ce10='27701') ) <br> -> and (other_tags like '%"amenity"=>"cafe"%' or other_tags like '%"amenity"=>"restaurant"%') <br> -> and name is not null <br> -> ORDER BY dist asc LIMIT 10G<br>*************************** 1. row ***************************<br> id: 1<br> select_type: PRIMARY<br> table: points_new<br> partitions: NULL<br> type: range<br>possible_keys: SHAPE<br> key: SHAPE<br> key_len: 34<br> ref: NULL<br> rows: 21<br> filtered: 18.89<br> Extra: Using where; Using filesort<br>*************************** 2. row ***************************<br> id: 2<br> select_type: SUBQUERY<br> table: tl_2013_us_zcta510<br> partitions: NULL<br> type: ref<br>possible_keys: zcta5ce10<br> key: zcta5ce10<br> key_len: 8<br> ref: const<br> rows: 1<br> filtered: 100.00<br> Extra: NULL<br>2 rows in set, 1 warning (0.00 sec) |
That looks pretty good: MySQL is using and index on the SHAPE field (even with the subquery, btw).
Results:
|
1 |
mysql> SELECT osm_id, name, round(st_distance_sphere(shape, st_geomfromtext('POINT (-78.9064543 35.9975194)', 1) ), 2) as dist, st_astext(shape) <br> -> FROM points_new <br> -> WHERE st_within(shape, <br> -> (select shape from zcta.tl_2013_us_zcta510 where zcta5ce10='27701') ) <br> -> and (other_tags like '%"amenity"=>"cafe"%' or other_tags like '%"amenity"=>"restaurant"%') <br> -> and name is not null <br> -> ORDER BY dist asc LIMIT 10;<br>+------------+----------------------------+--------+--------------------------------------+<br>| osm_id | name | dist | st_astext(shape) |<br>+------------+----------------------------+--------+--------------------------------------+<br>| 880747417 | Pop's | 127.16 | POINT(-78.9071795 35.998501) |<br>| 1520441350 | toast | 240.55 | POINT(-78.9039761 35.9967069) |<br>| 2012463902 | Pizzeria Toro | 256.44 | POINT(-78.9036457 35.997125) |<br>| 398941519 | Parker & Otis | 273.39 | POINT(-78.9088833 35.998997) |<br>| 881029843 | Torero's | 279.96 | POINT(-78.90829140000001 35.9995516) |<br>| 299540833 | Fishmonger's | 300.01 | POINT(-78.90850250000001 35.9996487) |<br>| 1801595418 | Lilly's Pizza | 319.83 | POINT(-78.9094462 35.9990732) |<br>| 1598401100 | Dame's Chicken and Waffles | 323.82 | POINT(-78.9031929 35.9962871) |<br>| 685493947 | El Rodeo | 379.18 | POINT(-78.909865 35.999523) |<br>| 685504784 | Piazza Italia | 389.06 | POINT(-78.9096472 35.9998794) |<br>+------------+----------------------------+--------+--------------------------------------+<br>10 rows in set (0.13 sec) |
0.13 seconds response time on AWS t2.medium box sounds reasonable to me. The same query on the MyISAM table shows ~same response time: 0.14 seconds.
GeoJSON feature and Google Maps
Another nice feature of MySQL 5.7 GIS is GeoJSON function: you can convert your result set to GeoJSON, which can be used with other applications (for example Google Maps API).
Let’s say I want to visualize the above result set on Google Map. As the API requires a specific format, I can use concat / group_concat to apply the format inside the SQL:
|
1 |
SELECT CONCAT('{<br> "type": "FeatureCollection",<br> "features": [<br> ',<br> GROUP_CONCAT('{ <br> "type": "Feature",<br> "geometry": ', ST_AsGeoJSON(shape), ',<br> "properties": {}<br> }'),<br> ']<br>}') as j <br>FROM points_new <br>WHERE st_within(shape, <br> (select shape from zcta.tl_2013_us_zcta510 where zcta5ce10='27701') ) <br> and (other_tags like '%"amenity"=>"cafe"%' or other_tags like '%"amenity"=>"restaurant"%') <br> and name is not null |
I will get all the restaurants and cafes in zipcode 27701. Here I’m using ST_AsGeoJSON(shape) to convert to GeoJSON, and concat/group_concat to “nest” the whole result into the format suitable for Google Maps.
Result:
|
1 |
mysql> set group_concat_max_len = 1000000;<br>Query OK, 0 rows affected (0.00 sec)<br><br>mysql> SELECT CONCAT('{<br> '> "type": "FeatureCollection",<br> '> "features": [<br> '> ',<br> -> GROUP_CONCAT('{ <br> '> "type": "Feature",<br> '> "geometry": ', ST_AsGeoJSON(shape), ',<br> '> "properties": {}<br> '> }'),<br> -> ']<br> '> }') as j <br> -> FROM points_new <br> -> WHERE st_within(shape, <br> -> (select shape from zcta.tl_2013_us_zcta510 where zcta5ce10='27701') ) <br> -> and (other_tags like '%"amenity"=>"cafe"%' or other_tags like '%"amenity"=>"restaurant"%') <br> -> and name is not null <br>*************************** 1. row ***************************<br>j: {<br> "type": "FeatureCollection",<br> "features": [<br> { <br> "type": "Feature",<br> "geometry": {"type": "Point", "coordinates": [-78.890852, 35.9903403]},<br> "properties": {}<br> },{ <br> "type": "Feature",<br> "geometry": {"type": "Point", "coordinates": [-78.8980807, 35.9933562]},<br> "properties": {}<br> },{ <br> "type": "Feature",<br> "geometry": {"type": "Point", "coordinates": [-78.89972490000001, 35.995879]},<br> "properties": {}<br> } ... ,{ <br> "type": "Feature",<br> "geometry": {"type": "Point", "coordinates": [-78.9103211, 35.9998494]},<br> "properties": {}<br> },{ <br> "type": "Feature",<br> "geometry": {"type": "Point", "coordinates": [-78.9158326, 35.9967114]},<br> "properties": {}<br> }]<br>}<br>1 row in set (0.14 sec)<br> |
I did not include the full result set for the lack of space; I also had to change the group concat max length, otherwise MySQL will cut the result of the group_concat function.
Now I can visualize it:
Example: Find the longest bike path
MySQL 5.7 (as well as the older versions) supports st_length function to calculate a length of a linestring. However, even in MySQL 5.7 st_length can’t calculate the distance on earth. To find the distance of a linestring I’ve created a very simple stored procedure:
|
1 |
DELIMITER //<br><br>CREATE DEFINER=CURRENT_USER() FUNCTION `ls_distance_sphere`(ls GEOMETRY) RETURNS DECIMAL(20,8)<br> DETERMINISTIC<br>BEGIN <br>DECLARE i, n INT DEFAULT 0;<br>DECLARE len DECIMAL(20,8) DEFAULT 0;<br>SET i = 1;<br>SET n = ST_NumPoints(ls);<br> <br> WHILE i < n DO<br> SET len = len + st_distance_sphere(st_pointN(ls, i), st_pointN(ls, i+1));<br>SET i = i + 2;<br> END WHILE;<br> RETURN len;<br><br>END //<br><br>DELIMITER ; |
As the Open Street Map data has the information about roads in North America, we can use this function to calculate the length (in meters) for every road it stores:
|
1 |
mysql> select name, ls_distance_sphere(shape) from lines_new where highway = 'cycleway' and name is not null limit 10;<br>+---------------------------------------+---------------------------+<br>| name | ls_distance_sphere(shape) |<br>+---------------------------------------+---------------------------+<br>| Highbury Park Drive Bypass | 0.97386664 |<br>| Ygnacio Canal Trail | 0.86093199 |<br>| South Marion Parkway | 1.06723424 |<br>| New River Greenway | 1.65705401 |<br>| Northern Diversion Trail | 2.08269808 |<br>| Gary L. Haller Trail;Mill Creek Trail | 2.09988209 |<br>| Loop 1 | 2.05297129 |<br>| Bay Farm Island Bicycle Bridge | 2.51141623 |<br>| Burrard Street | 1.84810259 |<br>| West 8th Avenue | 1.76338236 |<br>+---------------------------------------+---------------------------+<br>10 rows in set (0.00 sec) |
Index the polygon/area distance using MySQL 5.7 virtual fields
To really answer the question “what is the longest bikepath (cyclepath) in North America?” we will have to order by stored function result. This will cause a full table scan and a filestort, which will be extremely slow for 30 millions of rows. The standard way to fix this is to materialize this road distance: add an additional field to the table and store the distance there.
In MySQL 5.7 we can actually use Generated (Virtual) Columns feature:
|
1 |
CREATE TABLE `lines_new` (<br> `OGR_FID` int(11) NOT NULL AUTO_INCREMENT,<br> `SHAPE` geometry NOT NULL,<br> `osm_id` int(11) DEFAULT NULL,<br> `name` varchar(255) DEFAULT NULL,<br> `highway` varchar(60) DEFAULT NULL,<br> `waterway` text,<br> `aerialway` text,<br> `barrier` text,<br> `man_made` text,<br> `other_tags` text,<br> `linestring_length` decimal(15,8) GENERATED ALWAYS AS (st_length(shape)) VIRTUAL,<br> PRIMARY KEY (`OGR_FID`),<br> SPATIAL KEY `SHAPE` (`SHAPE`),<br> KEY `linestring_length` (`linestring_length`),<br> KEY `highway_len` (`highway`,`linestring_length`)<br>) ENGINE=InnoDB AUTO_INCREMENT=27077492 DEFAULT CHARSET=latin1 |
Unfortunately, MySQL 5.7 does not support non-native functions (stored procedures or UDF) in generated columns, so I have to use st_length in this example. Ordering by value of st_length may be OK though:
|
1 |
mysql> select name, ls_distance_sphere(shape) from lines_new where highway = 'cycleway' and name is not null order by linestring_length desc limit 10;<br>+-----------------------------+---------------------------+<br>| name | ls_distance_sphere(shape) |<br>+-----------------------------+---------------------------+<br>| Confederation Trail | 55086.92572725 |<br>| Cowboy Trail | 43432.06768706 |<br>| Down East Sunrise Trail | 42347.39791330 |<br>| Confederation Trail | 29844.91038542 |<br>| Confederation Trail | 26141.04655981 |<br>| Longleaf Trace | 29527.66063726 |<br>| Cardinal Greenway | 30613.24487294 |<br>| Lincoln Prairie Grass Trail | 19648.26787218 |<br>| Ghost Town Trail | 25610.52158647 |<br>| Confederation Trail | 27086.54829531 |<br>+-----------------------------+---------------------------+<br>10 rows in set (0.02 sec) |
The query is very fast as it uses an index on both highway and linestring:
|
1 |
mysql> explain select name, ls_distance_sphere(shape) from lines_new where highway = 'cycleway' and name is not null order by linestring_length desc limit 10G<br>*************************** 1. row ***************************<br> id: 1<br> select_type: SIMPLE<br> table: lines_new<br> partitions: NULL<br> type: ref<br>possible_keys: highway_len<br> key: highway_len<br> key_len: 63<br> ref: const<br> rows: 119392<br> filtered: 90.00<br> Extra: Using where<br>1 row in set, 1 warning (0.00 sec) |
Conclusion
MySQL 5.7 contains a great set of features to work with geospatial data. Finally, spatial indexes are supported in InnoDB; st_distance_sphere as a native function is very useful. Unfortunately, other spatial functions only work with planar coordinates and do not support SRID. I hope this will be fixed in the new releases.