Where the open source database community meets: Use code PERCONA75 and secure your spot for Percona Live. Register

Downloads

Blog

Looking inside the MySQL 5.7 document store

May 24, 2016

Author

Alexander Rubin

MySQL

Share this Post:

In this blog, we’ll look at the MySQL 5.7 document store feature, and how it is implemented.

Document Store

MySQL 5.7.12 is a major new release, as it contains quite a number of new features:

1. Document store and “MongoDB” like NoSQL interface to JSON storage

1. Protocol X / X Plugin, which can be used for asynchronous queries (I will write about it as well)

1. New MySQL shell

Peter already wrote the document store overview; in this post, I will look deeper into the document store implementation. In my next post, I will demonstrate how to use document store for Internet of Things (IoT) and event logging.

Older MySQL 5.7 versions already have a JSON data type, and an ability to create virtual columns that can be indexed. The new document store feature is based on the JSON datatype.

So what is the document store anyway? It is an add-on to a normal MySQL table with a JSON field. Let’s take a deep dive into it and see how it works.

First of all: one can interface with the document store’s collections using the X Plugin (default port: 33060). To do that:

1. Enable X Plugin and install MySQL shell.

1. Login to a shell:
  
  Shell
  
  mysqlsh --uri root@localhost
  
  1
  
  mysqlsh --uri root@localhost

Run commands (JavaScript mode, can be switched to SQL or Python):

mysqlsh --uri root@localhost
Creating an X Session to root@localhost:33060
Enter password: 
No default schema selected.

Welcome to MySQL Shell 1.0.3 Development Preview

Copyright (c) 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help', 'h' or '?' for help.

Currently in JavaScript mode. Use sql to switch to SQL mode and execute queries.

mysql-js> db = session.getSchema('world_x')                                                                                                                                                                 <Schema:world_x>
mysql-js> db.getCollections()
{
    "CountryInfo": <Collection:CountryInfo>
}

mysqlsh --uri root@localhost

Creating an X Session to root@localhost:33060

Enter password:

No default schema selected.

Welcome to MySQL Shell 1.0.3 Development Preview

Oracle is a registered trademark of Oracle Corporation and/or its

affiliates. Other names may be trademarks of their respective

owners.

Type 'help', 'h' or '?' for help.

Currently in JavaScript mode. Use sql to switch to SQL mode and execute queries.

mysql-js> db = session.getSchema('world_x') <Schema:world_x>

mysql-js> db.getCollections()

{

"CountryInfo": <Collection:CountryInfo>

}

Now, how is the document store’s collection different from a normal table? To find out, I’ve connected to a normal MySQL shell:

mysql world_x
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or g.
Your MySQL connection id is 2396
Server version: 5.7.12 MySQL Community Server (GPL)

Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or 'h' for help. Type 'c' to clear the current input statement.

mysql> show create table CountryInfo
*************************** 1. row ***************************
       Table: CountryInfo
Create Table: CREATE TABLE `CountryInfo` (
  `doc` json DEFAULT NULL,
  `_id` varchar(32) GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,'$._id'))) STORED NOT NULL,
  PRIMARY KEY (`_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

mysql> show tables;
+-------------------+
| Tables_in_world_x |
+-------------------+
| City              |
| Country           |
| CountryInfo       |
| CountryLanguage   |
+-------------------+
4 rows in set (0.00 sec)

mysql world_x

Reading table information for completion of table and column names

You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor. Commands end with ; or g.

Your MySQL connection id is 2396

Server version: 5.7.12 MySQL Community Server (GPL)

Oracle is a registered trademark of Oracle Corporation and/or its

affiliates. Other names may be trademarks of their respective

owners.

Type 'help;' or 'h' for help. Type 'c' to clear the current input statement.

mysql> show create table CountryInfo

*************************** 1. row ***************************

Table: CountryInfo

Create Table: CREATE TABLE `CountryInfo` (

`doc` json DEFAULT NULL,

`_id` varchar(32) GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,'$._id'))) STORED NOT NULL,

PRIMARY KEY (`_id`)

) ENGINE=InnoDB DEFAULT CHARSET=utf8

1 row in set (0.00 sec)

mysql> show tables;

+-------------------+

| Tables_in_world_x |

+-------------------+

| City |

| Country |

| CountryInfo |

| CountryLanguage |

+-------------------+

4 rows in set (0.00 sec)

So the document store is actually an InnoDB table with one field: doc json + Primary key, which is a generated column.

As we can also see, there are four tables in the world_x database, but db.getCollections() only shows one. So how does MySQL distinguish between a “normal” table and a “document store” table? To find out, we can enable the general query log and see which query is being executed:

$ mysql -e 'set global general_log=1'
$ tail /var/log/general.log
2016-05-17T20:53:12.772114Z  186 Query  SELECT table_name, COUNT(table_name) c FROM information_schema.columns WHERE ((column_name = 'doc' and data_type = 'json') OR (column_name = '_id' and generation_expression = 'json_unquote(json_extract(`doc`,''$._id''))')) AND table_schema = 'world_x' GROUP BY table_name HAVING c = 2
2016-05-17T20:53:12.773834Z  186 Query  SHOW FULL TABLES FROM `world_x`

$ mysql -e 'set global general_log=1'

$ tail /var/log/general.log

2016-05-17T20:53:12.772114Z 186 Query SELECT table_name, COUNT(table_name) c FROM information_schema.columns WHERE ((column_name = 'doc' and data_type = 'json') OR (column_name = '_id' and generation_expression = 'json_unquote(json_extract(`doc`,''$._id''))')) AND table_schema = 'world_x' GROUP BY table_name HAVING c = 2

2016-05-17T20:53:12.773834Z 186 Query SHOW FULL TABLES FROM `world_x`

As you can see, every table that has a specific structure (doc JSON or specific generation_expression) is considered to be a JSON store. Now, how does MySQL translate the .find or .add constructs to actual MySQL queries? Let’s run a sample query:

mysql-js> db.getCollection("CountryInfo").find('Name= "United States"').limit(1)
[
    {
        "GNP": 8510700,
        "IndepYear": 1776,
        "Name": "United States",
        "_id": "USA",
        "demographics": {
            "LifeExpectancy": 77.0999984741211,
            "Population": 278357000
        },
        "geography": {
            "Continent": "North America",
            "Region": "North America",
            "SurfaceArea": 9363520
        },
        "government": {
            "GovernmentForm": "Federal Republic",
            "HeadOfState": "George W. Bush",
            "HeadOfState_title": "President"
        }
    }
]
1 document in set (0.02 sec)

mysql-js> db.getCollection("CountryInfo").find('Name= "United States"').limit(1)

[

{

"GNP": 8510700,

"IndepYear": 1776,

"Name": "United States",

"_id": "USA",

"demographics": {

"LifeExpectancy": 77.0999984741211,

"Population": 278357000

"geography": {

"Continent": "North America",

"Region": "North America",

"SurfaceArea": 9363520

"government": {

"GovernmentForm": "Federal Republic",

"HeadOfState": "George W. Bush",

"HeadOfState_title": "President"

}

]

1 document in set (0.02 sec)

and now look at the slow query log again:

2016-05-17T21:02:21.213899Z  186 Query  SELECT doc FROM `world_x`.`CountryInfo` WHERE (JSON_EXTRACT(doc,'$.Name') = 'United States') LIMIT 1

1	2016-05-17T21:02:21.213899Z 186 Query SELECT doc FROM `world_x`.`CountryInfo` WHERE (JSON_EXTRACT(doc,'$.Name') = 'United States') LIMIT 1

We can verify that MySQL translates all document store commands to SQL. That also means that it is 100% transparent to the existing MySQL storage level and will work with other storage engines. Let’s verify that, just for fun:

mysql> alter table CountryInfo engine=MyISAM;
Query OK, 239 rows affected (0.06 sec)
Records: 239  Duplicates: 0  Warnings: 0

mysql-js> db.getCollection("CountryInfo").find('Name= "United States"').limit(1)
[
    {
        "GNP": 8510700,
        "IndepYear": 1776,
        "Name": "United States",
        "_id": "USA",
        "demographics": {
            "LifeExpectancy": 77.0999984741211,
            "Population": 278357000
        },
        "geography": {
            "Continent": "North America",
            "Region": "North America",
            "SurfaceArea": 9363520
        },
        "government": {
            "GovernmentForm": "Federal Republic",
            "HeadOfState": "George W. Bush",
            "HeadOfState_title": "President"
        }
    }
]
1 document in set (0.00 sec)

2016-05-17T21:09:21.074726Z 2399 Query  alter table CountryInfo engine=MyISAM
2016-05-17T21:09:41.037575Z 2399 Quit   
2016-05-17T21:09:43.014209Z  186 Query  SELECT doc FROM `world_x`.`CountryInfo` WHERE (JSON_EXTRACT(doc,'$.Name') = 'United States') LIMIT 1

mysql> alter table CountryInfo engine=MyISAM;

Query OK, 239 rows affected (0.06 sec)

Records: 239 Duplicates: 0 Warnings: 0

mysql-js> db.getCollection("CountryInfo").find('Name= "United States"').limit(1)

[

{

"GNP": 8510700,

"IndepYear": 1776,

"Name": "United States",

"_id": "USA",

"demographics": {

"LifeExpectancy": 77.0999984741211,

"Population": 278357000

"geography": {

"Continent": "North America",

"Region": "North America",

"SurfaceArea": 9363520

"government": {

"GovernmentForm": "Federal Republic",

"HeadOfState": "George W. Bush",

"HeadOfState_title": "President"

}

]

1 document in set (0.00 sec)

2016-05-17T21:09:21.074726Z 2399 Query alter table CountryInfo engine=MyISAM

2016-05-17T21:09:41.037575Z 2399 Quit

2016-05-17T21:09:43.014209Z 186 Query SELECT doc FROM `world_x`.`CountryInfo` WHERE (JSON_EXTRACT(doc,'$.Name') = 'United States') LIMIT 1

Worked fine!

Now, how about the performance? We can simply take the SQL query and run explain:

mysql> explain SELECT doc FROM `world_x`.`CountryInfo` WHERE (JSON_EXTRACT(doc,'$.Name') = 'United States') LIMIT 1
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: CountryInfo
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 239
     filtered: 100.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

mysql> explain SELECT doc FROM `world_x`.`CountryInfo` WHERE (JSON_EXTRACT(doc,'$.Name') = 'United States') LIMIT 1

*************************** 1. row ***************************

id: 1

select_type: SIMPLE

table: CountryInfo

partitions: NULL

type: ALL

possible_keys: NULL

key: NULL

key_len: NULL

ref: NULL

rows: 239

filtered: 100.00

Extra: Using where

1 row in set, 1 warning (0.00 sec)

Hmm, it looks like it is not using an index. That’s because there is no index on Name. Can we add one? Sure, we can add a virtual column and then index it:

mysql> alter table CountryInfo add column Name varchar(255)
    -> GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,'$.Name'))) VIRTUAL;
Query OK, 0 rows affected (0.12 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> alter table CountryInfo add key (Name);
Query OK, 0 rows affected (0.02 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> explain SELECT doc FROM `world_x`.`CountryInfo` WHERE (JSON_EXTRACT(doc,'$.Name') = 'United States') LIMIT 1
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: CountryInfo
   partitions: NULL
         type: ref
possible_keys: name
          key: name
      key_len: 768
          ref: const
         rows: 1
     filtered: 100.00
        Extra: NULL
1 row in set, 1 warning (0.00 sec)

mysql> alter table CountryInfo add column Name varchar(255)

-> GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,'$.Name'))) VIRTUAL;

Query OK, 0 rows affected (0.12 sec)

Records: 0 Duplicates: 0 Warnings: 0

mysql> alter table CountryInfo add key (Name);

Query OK, 0 rows affected (0.02 sec)

Records: 0 Duplicates: 0 Warnings: 0

mysql> explain SELECT doc FROM `world_x`.`CountryInfo` WHERE (JSON_EXTRACT(doc,'$.Name') = 'United States') LIMIT 1

*************************** 1. row ***************************

id: 1

select_type: SIMPLE

table: CountryInfo

partitions: NULL

type: ref

possible_keys: name

key: name

key_len: 768

ref: const

rows: 1

filtered: 100.00

Extra: NULL

1 row in set, 1 warning (0.00 sec)

That is really cool! We have added an index, and now the original query starts using it. Note that we do not have to reference the new field, the MySQL optimizer is smart enough to translate the (JSON_EXTRACT(doc,'$.Name') = 'United States' to an index scan on the virtual column.

But please note: JSON attributes are case-sensitive. If you will use (doc,'$.name') instead of (doc,'$.Name') it will not generate an error, but will simply break the search and all queries looking for “Name” will return 0 rows.

Finally, if you looked closely at the output of db.getCollection("CountryInfo").find('Name= "United States"').limit(1) , you noticed that the database has outdated info:

        "government": {
            "GovernmentForm": "Federal Republic",
            "HeadOfState": "George W. Bush",
            "HeadOfState_title": "President"
        }

"government": {

"GovernmentForm": "Federal Republic",

"HeadOfState": "George W. Bush",

"HeadOfState_title": "President"

}

Let’s change “George W. Bush” to “Barack Obama” using the .modify clause:

mysql-js> db.CountryInfo.modify("Name = 'United States'").set("government.HeadOfState", "Barack Obama" );
Query OK, 1 item affected (0.02 sec)

mysql-js> db.CountryInfo.find('Name= "United States"')
[
    {
        "GNP": 8510700,
        "IndepYear": 1776,
        "Name": "United States",
        "_id": "USA",
        "demographics": {
            "LifeExpectancy": 77.0999984741211,
            "Population": 278357000
        },
        "geography": {
            "Continent": "North America",
            "Region": "North America",
            "SurfaceArea": 9363520
        },
        "government": {
            "GovernmentForm": "Federal Republic",
            "HeadOfState": "Barack Obama",
            "HeadOfState_title": "President"
        }
    }
]
1 document in set (0.00 sec)

mysql-js> db.CountryInfo.modify("Name = 'United States'").set("government.HeadOfState", "Barack Obama" );

Query OK, 1 item affected (0.02 sec)

mysql-js> db.CountryInfo.find('Name= "United States"')

[

{

"GNP": 8510700,

"IndepYear": 1776,

"Name": "United States",

"_id": "USA",

"demographics": {

"LifeExpectancy": 77.0999984741211,

"Population": 278357000

"geography": {

"Continent": "North America",

"Region": "North America",

"SurfaceArea": 9363520

"government": {

"GovernmentForm": "Federal Republic",

"HeadOfState": "Barack Obama",

"HeadOfState_title": "President"

}

]

1 document in set (0.00 sec)

Conclusion

Document store is an interesting concept and a good add-on on top of the existing MySQL JSON feature. Using the new .find/.add/.modify methods instead of the original SQL statements can be convenient in some cases.

Some might ask, “why do you want to use document store and store information in JSON inside the database if it is relational anyway?” Storing data in JSON can be quite useful in some cases, for example:

- You already have a JSON (i.e., from external feeds) and need to store it anyway. Using the JSON datatype will be more convenient and more efficient.

- You have a flexible schema, typical for the Internet of Things for example, where some sensors might only send temperature data, some might send temperature/humidity/light (but light information is only recorded during the day), etc. Storing it in the JSON format can be more convenient so that you do not have to declare all possible fields in advance, and do not have to run “alter table” if a new sensor starts sending new types of data.

In the next two blog posts, I will show how to use document store for Internet of Things / event streaming, and how to use X Protocol for asynchronous queries in MySQL.

0 0 votes

Article Rating

3 Comments

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

Jonathan

10 years ago

Maybe this is too far ahead, but:
Is there a possibility to integrate stored procedures with this new JSON interface to simulate a database API layer?
Because that can possibly cut out a lot of work for developers.

Tanvi Shah

9 years ago

Hi,

I have to find data by id.. But my id does not exist.. so it is giving timeout error..
Any way to handle this and return message as “ID Does not exist”