In this blog post, we’ll discuss JSON document fast lookup with MySQL 5.7.
Recently I attended Morgan Tocker’s talk on MySQL 5.7 and JSON at FOSDEM, and I found it awesome.
I learned a lot from the talk. Let me share one tip here: a very useful trick if you plan to store JSON documents in your MySQL database and want to retrieve the documents by some attribute's values. So let's look at how to do JSON document fast lookup with MySQL 5.7!
In this short example, I show you how we can speed up this type of search using JSON functions and virtual columns.
This is our test table:
```sql
Table: test_features
Create Table: CREATE TABLE `test_features` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `feature` json NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=206561 DEFAULT CHARSET=latin1

mysql> show table status like 'test_features'\G
*************************** 1. row ***************************
           Name: test_features
         Engine: InnoDB
        Version: 10
     Row_format: Dynamic
           Rows: 171828
 Avg_row_length: 1340
    Data_length: 230326272
Max_data_length: 0
   Index_length: 0
      Data_free: 3145728
 Auto_increment: 206561
    Create_time: 2016-03-01 15:22:34
    Update_time: 2016-03-01 15:23:20
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment:
```
We can see the data length is almost 230M:
```
+--------------------+--------+-------+-------+-------+------------+---------+
| TABLE              | ENGINE | ROWS  | DATA  | IDX   | TOTAL SIZE | IDXFRAC |
+--------------------+--------+-------+-------+-------+------------+---------+
| json.test_features | InnoDB | 0.17M | 0.21G | 0.00G | 0.21G      |    0.00 |
+--------------------+--------+-------+-------+-------+------------+---------+

-rw-r----- 1 mysql mysql 228M Mar  1 15:23 /var/lib/mysql/json/test_features.ibd
```
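The size summary above comes from a query against `information_schema.tables`. A version you can run yourself (a sketch; the exact script that produced the output above is not shown in the post):

```sql
-- Approximate per-table size from information_schema
-- (raw byte counts rather than the formatted M/G units above).
SELECT CONCAT(table_schema, '.', table_name) AS `TABLE`,
       engine AS `ENGINE`,
       table_rows,
       data_length,
       index_length
FROM information_schema.tables
WHERE table_schema = 'json'
  AND table_name = 'test_features';
```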
As an example, here is one record (the data comes from https://github.com/zemirco/sf-city-lots-json):
```json
{
  "type": "Feature",
  "geometry": {
    "type": "Polygon",
    "coordinates": [
      [
        [
          -122.41983177253881,
          37.80720512387136,
          0
        ],
        ...
        [
          -122.41983177253881,
          37.80720512387136,
          0
        ]
      ]
    ]
  },
  "properties": {
    "TO_ST": "600",
    "BLKLOT": "0010001",
    "STREET": "BEACH",
    "FROM_ST": "600",
    "LOT_NUM": "001",
    "ST_TYPE": "ST",
    "ODD_EVEN": "E",
    "BLOCK_NUM": "0010",
    "MAPBLKLOT": "0010001"
  }
}
```
Now let’s try to find all the records where the street is “BEACH”. `STREET` is an attribute nested inside the `properties` object.
```sql
mysql> SELECT count(*) FROM test_features WHERE feature->"$.properties.STREET" = 'BEACH';
+----------+
| count(*) |
+----------+
|      208 |
+----------+
1 row in set (0.21 sec)

mysql> explain SELECT count(*) FROM test_features WHERE feature->"$.properties.STREET" = 'BEACH'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: test_features
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 171828
     filtered: 100.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)
```
As you can see, we perform a full table scan to achieve this.
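As an aside, the `->` operator used in the query above is just MySQL 5.7 shorthand for the `JSON_EXTRACT()` function, so the same search can be written more verbosely as:

```sql
-- Equivalent to feature->"$.properties.STREET" = 'BEACH':
-- -> is syntactic sugar for JSON_EXTRACT().
SELECT count(*)
FROM test_features
WHERE JSON_EXTRACT(feature, '$.properties.STREET') = 'BEACH';
```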
With MySQL 5.7, we have the possibility of using virtual generated columns. Let’s create one for the streets:
```sql
mysql> ALTER TABLE test_features ADD COLUMN street VARCHAR(30) GENERATED ALWAYS AS (json_unquote(json_extract(`feature`,'$.properties.STREET'))) VIRTUAL;
```
I use `json_unquote()` to avoid storing the JSON string quotes in the column, and later in the index.
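Note that later 5.7 releases (5.7.13 and up) also added the `->>` operator as a shorthand for exactly this `JSON_UNQUOTE(JSON_EXTRACT(...))` combination:

```sql
-- ->> (MySQL 5.7.13+) is shorthand for JSON_UNQUOTE(JSON_EXTRACT(...)),
-- returning the bare string without the surrounding double quotes.
SELECT feature->>'$.properties.STREET' AS street
FROM test_features
LIMIT 1;
```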
You can verify the size of the table on disk: it doesn’t increase, since a virtual column is not materialized.
Even though we can now use the “street” column in our search, that alone won’t help. We still need to add an index on it:
```sql
mysql> ALTER TABLE test_features ADD KEY `street` (`street`);
```
And now we can see that the file on disk is larger, because it includes the new index:
```
-rw-r----- 1 mysql mysql 232M Mar  1 15:48 /var/lib/mysql/json/test_features.ibd
```
Now we can try to run the query like this:
```sql
mysql> SELECT count(*) FROM test_features WHERE street = 'BEACH';
+----------+
| count(*) |
+----------+
|      208 |
+----------+
1 row in set (0.00 sec)
```
```sql
mysql> explain SELECT count(*) FROM test_features WHERE street = 'BEACH'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: test_features
   partitions: NULL
         type: ref
possible_keys: street
          key: street
      key_len: 33
          ref: const
         rows: 208
     filtered: 100.00
        Extra: Using index
```
And finally, we can verify this in the statistics available in the sys schema:
```sql
mysql> select * from sys.schema_index_statistics where table_name='test_features'\G
*************************** 1. row ***************************
  table_schema: json
    table_name: test_features
    index_name: street
 rows_selected: 208
select_latency: 72.59 us
 rows_inserted: 0
insert_latency: 0 ps
  rows_updated: 0
update_latency: 0 ps
  rows_deleted: 0
delete_latency: 0 ps
*************************** 2. row ***************************
  table_schema: json
    table_name: test_features
    index_name: PRIMARY
 rows_selected: 0
select_latency: 0 ps
 rows_inserted: 0
insert_latency: 0 ps
  rows_updated: 0
update_latency: 0 ps
  rows_deleted: 0
delete_latency: 0 ps
2 rows in set (0.00 sec)
```
As you can see, this is very fast. If you already know how you want to retrieve data out of your JSON document, it’s very easy to add such indexes in MySQL.
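As a final sketch, the generated column and its index can also be added in a single ALTER statement. For example, to index the `ST_TYPE` attribute from the same `properties` object (a hypothetical second lookup, following the same pattern as above):

```sql
-- Generated column and index created in one ALTER (a sketch;
-- ST_TYPE is another attribute from the sample document above).
ALTER TABLE test_features
  ADD COLUMN st_type VARCHAR(10)
    GENERATED ALWAYS AS (json_unquote(json_extract(`feature`,'$.properties.ST_TYPE'))) VIRTUAL,
  ADD KEY `st_type` (`st_type`);
```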