In this blog, we’ll look at the MySQL 5.7 document store feature, and how it is implemented.
Document Store
MySQL 5.7.12 is a major new release, as it contains quite a number of new features:
- Document store and “MongoDB” like NoSQL interface to JSON storage
- Protocol X / X Plugin, which can be used for asynchronous queries (I will write about it as well)
- New MySQL shell
Peter already wrote the document store overview; in this post, I will look deeper into the document store implementation. In my next post, I will demonstrate how to use document store for Internet of Things (IoT) and event logging.
Older MySQL 5.7 versions already have a JSON data type, and an ability to create virtual columns that can be indexed. The new document store feature is based on the JSON datatype.
So what is the document store anyway? It is an add-on to a normal MySQL table with a JSON field. Let’s take a deep dive into it and see how it works.
First of all: one can interface with the document store’s collections using the X Plugin (default port: 33060). To do that:
- Enable X Plugin and install MySQL shell.
- Login to a shell:
1mysqlsh --uri root@localhost - Run commands (JavaScript mode, can be switched to SQL or Python):
12345678910111213141516171819202122mysqlsh --uri root@localhostCreating an X Session to root@localhost:33060Enter password:No default schema selected.Welcome to MySQL Shell 1.0.3 Development PreviewCopyright (c) 2016, Oracle and/or its affiliates. All rights reserved.Oracle is a registered trademark of Oracle Corporation and/or itsaffiliates. Other names may be trademarks of their respectiveowners.Type 'help', 'h' or '?' for help.Currently in JavaScript mode. Use sql to switch to SQL mode and execute queries.mysql-js> db = session.getSchema('world_x') <Schema:world_x>mysql-js> db.getCollections(){"CountryInfo": <Collection:CountryInfo>}
Now, how is the document store’s collection different from a normal table? To find out, I’ve connected to a normal MySQL shell:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 |
mysql world_x Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Welcome to the MySQL monitor. Commands end with ; or g. Your MySQL connection id is 2396 Server version: 5.7.12 MySQL Community Server (GPL) Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or 'h' for help. Type 'c' to clear the current input statement. mysql> show create table CountryInfo *************************** 1. row *************************** Table: CountryInfo Create Table: CREATE TABLE `CountryInfo` ( `doc` json DEFAULT NULL, `_id` varchar(32) GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,'$._id'))) STORED NOT NULL, PRIMARY KEY (`_id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 1 row in set (0.00 sec) mysql> show tables; +-------------------+ | Tables_in_world_x | +-------------------+ | City | | Country | | CountryInfo | | CountryLanguage | +-------------------+ 4 rows in set (0.00 sec) |
So the document store is actually an InnoDB table with one field: doc
json + Primary key, which is a generated column.
As we can also see, there are four tables in the world_x database, but db.getCollections() only shows one. So how does MySQL distinguish between a “normal” table and a “document store” table? To find out, we can enable the general query log and see which query is being executed:
1 2 3 4 |
$ mysql -e 'set global general_log=1' $ tail /var/log/general.log 2016-05-17T20:53:12.772114Z 186 Query SELECT table_name, COUNT(table_name) c FROM information_schema.columns WHERE ((column_name = 'doc' and data_type = 'json') OR (column_name = '_id' and generation_expression = 'json_unquote(json_extract(`doc`,''$._id''))')) AND table_schema = 'world_x' GROUP BY table_name HAVING c = 2 2016-05-17T20:53:12.773834Z 186 Query SHOW FULL TABLES FROM `world_x` |
As you can see, every table that has a specific structure (doc JSON or specific generation_expression) is considered to be a JSON store. Now, how does MySQL translate the .find or .add constructs to actual MySQL queries? Let’s run a sample query:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
mysql-js> db.getCollection("CountryInfo").find('Name= "United States"').limit(1) [ { "GNP": 8510700, "IndepYear": 1776, "Name": "United States", "_id": "USA", "demographics": { "LifeExpectancy": 77.0999984741211, "Population": 278357000 }, "geography": { "Continent": "North America", "Region": "North America", "SurfaceArea": 9363520 }, "government": { "GovernmentForm": "Federal Republic", "HeadOfState": "George W. Bush", "HeadOfState_title": "President" } } ] 1 document in set (0.02 sec) |
and now look at the slow query log again:
1 |
2016-05-17T21:02:21.213899Z 186 Query SELECT doc FROM `world_x`.`CountryInfo` WHERE (JSON_EXTRACT(doc,'$.Name') = 'United States') LIMIT 1 |
We can verify that MySQL translates all document store commands to SQL. That also means that it is 100% transparent to the existing MySQL storage level and will work with other storage engines. Let’s verify that, just for fun:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 |
mysql> alter table CountryInfo engine=MyISAM; Query OK, 239 rows affected (0.06 sec) Records: 239 Duplicates: 0 Warnings: 0 mysql-js> db.getCollection("CountryInfo").find('Name= "United States"').limit(1) [ { |