In this blog post, we’ll discuss JSON document fast lookup with MySQL 5.7.
Recently I attended Morgan Tocker’s talk on MySQL 5.7 and JSON at FOSDEM, and I found it awesome.
I learned a lot from the talk. Let me share one tip here: a very useful trick if you plan to store JSON documents in your MySQL database and want to retrieve the documents by some attribute's values. So let's look at how to do JSON document fast lookup with MySQL 5.7!
In this short example, I show you how we can speed up this type of search using JSON functions and virtual columns.
This is our test table:
```sql
Table: test_features
Create Table: CREATE TABLE `test_features` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `feature` json NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=206561 DEFAULT CHARSET=latin1

mysql> show table status like 'test_features'\G
*************************** 1. row ***************************
           Name: test_features
         Engine: InnoDB
        Version: 10
     Row_format: Dynamic
           Rows: 171828
 Avg_row_length: 1340
    Data_length: 230326272
Max_data_length: 0
   Index_length: 0
      Data_free: 3145728
 Auto_increment: 206561
    Create_time: 2016-03-01 15:22:34
    Update_time: 2016-03-01 15:23:20
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment:
```
We can see the data length is almost 230M:
```
+--------------------+--------+-------+-------+-------+------------+---------+
| TABLE              | ENGINE | ROWS  | DATA  | IDX   | TOTAL SIZE | IDXFRAC |
+--------------------+--------+-------+-------+-------+------------+---------+
| json.test_features | InnoDB | 0.17M | 0.21G | 0.00G | 0.21G      |    0.00 |
+--------------------+--------+-------+-------+-------+------------+---------+

-rw-r----- 1 mysql mysql 228M Mar  1 15:23 /var/lib/mysql/json/test_features.ibd
```
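The size summary above comes from a query against `information_schema.tables`. A version you can run yourself (a sketch; the exact script that produced the output above is not shown in the post):

```sql
-- Approximate per-table size from information_schema
-- (raw byte counts rather than the formatted M/G units above).
SELECT CONCAT(table_schema, '.', table_name) AS `TABLE`,
       engine AS `ENGINE`,
       table_rows,
       data_length,
       index_length
FROM information_schema.tables
WHERE table_schema = 'json'
  AND table_name = 'test_features';
```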
As an example, here is one record (the data comes from https://github.com/zemirco/sf-city-lots-json):
```json
{
  "type": "Feature",
  "geometry": {
    "type": "Polygon",
    "coordinates": [
      [
        [
          -122.41983177253881,
          37.80720512387136,
          0
        ],
        ...
        [
          -122.41983177253881,
          37.80720512387136,
          0
        ]
      ]
    ]
  },
  "properties": {
    "TO_ST": "600",
    "BLKLOT": "0010001",
    "STREET": "BEACH",
    "FROM_ST": "600",
    "LOT_NUM": "001",
    "ST_TYPE": "ST",
    "ODD_EVEN": "E",
    "BLOCK_NUM": "0010",
    "MAPBLKLOT": "0010001"
  }
}
```
Now let’s try to find all the records where the street is “BEACH”. `STREET` is an attribute nested inside the `properties` object.
```sql
mysql> SELECT count(*) FROM test_features WHERE feature->"$.properties.STREET" = 'BEACH';
+----------+
| count(*) |
+----------+
|      208 |
+----------+
1 row in set (0.21 sec)

mysql> explain SELECT count(*) FROM test_features WHERE feature->"$.properties.STREET" = 'BEACH'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: test_features
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 171828
     filtered: 100.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)
```
As you can see, we perform a full table scan to achieve this.
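As an aside, the `->` operator used in the query above is just MySQL 5.7 shorthand for the `JSON_EXTRACT()` function, so the same search can be written more verbosely as:

```sql
-- Equivalent to feature->"$.properties.STREET" = 'BEACH':
-- -> is syntactic sugar for JSON_EXTRACT().
SELECT count(*)
FROM test_features
WHERE JSON_EXTRACT(feature, '$.properties.STREET') = 'BEACH';
```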
With MySQL 5.7, we have the possibility of using virtual generated columns. Let’s create one for the streets:
```sql
mysql> ALTER TABLE test_features ADD COLUMN street VARCHAR(30) GENERATED ALWAYS AS (json_unquote(json_extract(`feature`,'$.properties.STREET'))) VIRTUAL;
```
I use `json_unquote()` to avoid storing the JSON string quotes in the column, and later in the index.
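Note that later 5.7 releases (5.7.13 and up) also added the `->>` operator as a shorthand for exactly this `JSON_UNQUOTE(JSON_EXTRACT(...))` combination:

```sql
-- ->> (MySQL 5.7.13+) is shorthand for JSON_UNQUOTE(JSON_EXTRACT(...)),
-- returning the bare string without the surrounding double quotes.
SELECT feature->>'$.properties.STREET' AS street
FROM test_features
LIMIT 1;
```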
You can verify the size of the table on disk: it doesn’t increase, since a virtual column is not materialized.
Even though we can now use the “street” column in our search, that alone won’t help. We still need to add an index on it:
```sql
mysql> ALTER TABLE test_features ADD KEY `street` (`street`);
```
And now we can see that the file on disk is larger, because it includes the new index:
```
-rw-r----- 1 mysql mysql 232M Mar  1 15:48 /var/lib/mysql/json/test_features.ibd
```
Now we can try to run the query like this:
```sql
mysql> SELECT count(*) FROM test_features WHERE street = 'BEACH';
+----------+
| count(*) |
+----------+
|      208 |
+----------+
1 row in set (0.00 sec)
```
```sql
mysql> explain SELECT count(*) FROM test_features WHERE street = 'BEACH'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: test_features
   partitions: NULL
         type: ref
possible_keys: street
          key: street
      key_len: 33
          ref: const
         rows: 208
     filtered: 100.00
        Extra: Using index
```
And finally, we can verify this in the statistics available in the sys schema:
```sql
mysql> select * from sys.schema_index_statistics where table_name='test_features'\G
*************************** 1. row ***************************
  table_schema: json
    table_name: test_features
    index_name: street
 rows_selected: 208
select_latency: 72.59 us
 rows_inserted: 0
insert_latency: 0 ps
  rows_updated: 0
update_latency: 0 ps
  rows_deleted: 0
delete_latency: 0 ps
*************************** 2. row ***************************
  table_schema: json
    table_name: test_features
    index_name: PRIMARY
 rows_selected: 0
select_latency: 0 ps
 rows_inserted: 0
insert_latency: 0 ps
  rows_updated: 0
update_latency: 0 ps
  rows_deleted: 0
delete_latency: 0 ps
2 rows in set (0.00 sec)
```
As you can see, this is very fast. If you already know how you want to retrieve data out of your JSON document, it’s very easy to add such indexes in MySQL.
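As a final sketch, the generated column and its index can also be added in a single ALTER statement. For example, to index the `ST_TYPE` attribute from the same `properties` object (a hypothetical second lookup, following the same pattern as above):

```sql
-- Generated column and index created in one ALTER (a sketch;
-- ST_TYPE is another attribute from the sample document above).
ALTER TABLE test_features
  ADD COLUMN st_type VARCHAR(10)
    GENERATED ALWAYS AS (json_unquote(json_extract(`feature`,'$.properties.ST_TYPE'))) VIRTUAL,
  ADD KEY `st_type` (`st_type`);
```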