In this blog post, we'll look at what MySQL does under the hood to transform NoSQL requests into SQL (storing the data in the transactional InnoDB engine) when using the X Plugin.
The X Plugin allows MySQL to function as a document store: we don't need to define a schema or use the SQL language, while MySQL remains a fully ACID database. Sounds like magic – but we know the only thing that magic does is make planes fly! 🙂
Alexander already wrote a blog post exploring how the X Plugin works, with some examples. In this post, I am going to show some more query examples and how they are transformed.
I have enabled the slow query log to see what is actually being executed when I run NoSQL queries.
We start the MySQL shell and create our first collection:
```
$ mysqlsh -u root --py
Creating an X Session to root@localhost:33060
No default schema selected.
[...]
Currently in Python mode. Use sql to switch to SQL mode and execute queries.
mysql-py> db.createCollection("people")
```
What is a collection in SQL terms? A table. Let’s check what MySQL does by reading the slow query log:
```sql
CREATE TABLE `people` (
  `doc` json DEFAULT NULL,
  `_id` varchar(32) GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,'$._id'))) STORED NOT NULL,
  PRIMARY KEY (`_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
```
As we correctly guessed, it creates a table with two columns. One is called "doc", and it stores the JSON document. The second column, named "_id", is a stored generated column populated with data extracted from that JSON document. "_id" is used as the primary key, and if we don't specify a value, MySQL will generate a random UUID every time we write a document.
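As a rough illustration, this server-side behavior can be mimicked in Python. The `new_document` and `extract_id` helpers below are assumptions for illustration only, not the X Plugin's actual code (in particular, the server uses its own UUID scheme, not `uuid4`):

```python
import json
import uuid

def new_document(doc):
    """Mimic what MySQL does on insert: assign a random _id
    if the document does not already carry one."""
    doc = dict(doc)
    # hypothetical: uuid4().hex gives 32 hex chars, matching varchar(32)
    doc.setdefault("_id", uuid.uuid4().hex)
    return doc

def extract_id(doc_json):
    """Mimic the generated column expression
    json_unquote(json_extract(doc, '$._id'))."""
    return json.loads(doc_json)["_id"]

doc = new_document({"Name": "Miguel Angel", "Country": "Spain", "Age": 33})
stored = json.dumps(doc)          # what ends up in the `doc` column
primary_key = extract_id(stored)  # what ends up in the `_id` column
```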
So, the basics are clear.
Let's run getCollections(), which is similar to "SHOW TABLES" in the SQL world:
```
mysql-py> db.getCollections()
[
]
```
This is what MySQL actually runs:
```sql
SELECT C.table_name AS name,
  IF(ANY_VALUE(T.table_type)='VIEW', 'VIEW',
    IF(COUNT(*) =
         COUNT(CASE WHEN (column_name = 'doc' AND data_type = 'json') THEN 1 ELSE NULL END)
       + COUNT(CASE WHEN (column_name = '_id' AND generation_expression = 'json_unquote(json_extract(`doc`,''$._id''))') THEN 1 ELSE NULL END)
       + COUNT(CASE WHEN (column_name != '_id' AND generation_expression RLIKE '^(json_unquote[[.(.]])?json_extract[[.(.]]`doc`,''[[.$.]]([[...]][^[:space:][...]]+)+''[[.).]]{1,2}$') THEN 1 ELSE NULL END),
       'COLLECTION', 'TABLE')) AS type
FROM information_schema.columns AS C
LEFT JOIN information_schema.tables AS T USING (table_name)
WHERE C.table_schema = 'test'
GROUP BY C.table_name
ORDER BY C.table_name;
```
This time, the query is a bit more complex. It queries information_schema.columns, joining it with information_schema.tables, searching for tables that have the "doc" and "_id" columns.
I am going to start adding data to our collection. Let’s add our first document:
```
mysql-py> db.people.add(
      ... {
      ... "Name": "Miguel Angel",
      ... "Country": "Spain",
      ... "Age": 33
      ... }
      ... )
```
In the background, MySQL inserts a JSON object and auto-assigns a primary key value.
```sql
INSERT INTO `test`.`people` (doc) VALUES (JSON_OBJECT('Age',33,'Country','Spain','Name','Miguel Angel','_id','a45c69cd2074e611f11f62bf9ac407d7'));
```
OK, this is supposed to be schemaless, so let's add someone else using different fields:
```
mysql-py> db.people.add(
      ... {
      ... "Name": "Thrall",
      ... "Race": "Orc",
      ... "Faction": "Horde"
      ... }
      ... )
```
Same as before, MySQL just writes another JSON object (with different fields):
```sql
INSERT INTO `test`.`people` (doc) VALUES (JSON_OBJECT('Faction','Horde','Name','Thrall','Race','Orc','_id','7092776c2174e611f11f62bf9ac407d7'));
```
Now we are going to read the data we have just inserted. First, we are going to find all documents stored in the collection:
```
mysql-py> db.people.find()
```
MySQL translates it to a simple:
```sql
SELECT doc FROM `test`.`people`;
```
And this is how filters are transformed:
```
mysql-py> db.people.find("Name = 'Thrall'")
```
It uses a SELECT with the WHERE clause on data extracted from the JSON object.
```sql
SELECT doc FROM `test`.`people` WHERE (JSON_EXTRACT(doc,'$.Name') = 'Thrall');
```
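To illustrate the idea of this translation, a single equality filter like `"Name = 'Thrall'"` can be rewritten into the JSON_EXTRACT form with a few lines of Python. This is a toy sketch, not the plugin's real expression parser (which handles a full grammar of operators, paths, and bound parameters):

```python
import re

def filter_to_where(expr):
    """Toy translation of a single `field = 'value'` filter into the
    WHERE clause MySQL ends up running. Only handles one equality test
    on a top-level field."""
    m = re.fullmatch(r"\s*(\w+)\s*=\s*'([^']*)'\s*", expr)
    if not m:
        raise ValueError("unsupported filter: " + expr)
    field, value = m.groups()
    return "(JSON_EXTRACT(doc,'$.%s') = '%s')" % (field, value)

print("SELECT doc FROM `test`.`people` WHERE "
      + filter_to_where("Name = 'Thrall'") + ";")
```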
Thrall decided that he doesn’t want to belong to the Horde anymore. He wants to join the Alliance. We need to update the document:
```
mysql-py> db.people.modify("Name = 'Thrall'").set("Faction", "Alliance")
```
MySQL runs an UPDATE, again using a WHERE clause on the data extracted from the JSON. Then, it updates the “Faction”:
```sql
UPDATE `test`.`people` SET doc=JSON_SET(doc,'$.Faction','Alliance') WHERE (JSON_EXTRACT(doc,'$.Name') = 'Thrall');
```
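Conceptually, JSON_SET replaces one path inside the stored document and returns the modified document. In Python terms it looks like this (a deliberate simplification: MySQL's JSON_SET also supports nested paths and array positions, which this sketch does not):

```python
import json

def json_set(doc_json, path, value):
    """Simplified JSON_SET: handles only a top-level '$.field' path."""
    if not path.startswith("$."):
        raise ValueError("only top-level '$.field' paths supported here")
    doc = json.loads(doc_json)
    doc[path[2:]] = value
    return json.dumps(doc)

# The row before the UPDATE, as stored in the `doc` column:
row = '{"Faction": "Horde", "Name": "Thrall", "Race": "Orc", "_id": "7092776c2174e611f11f62bf9ac407d7"}'
row = json_set(row, "$.Faction", "Alliance")  # what SET doc=JSON_SET(...) produces
```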
Now I want to remove my own document:
```
mysql-py> db.people.remove("Name = 'Miguel Angel'");
```
As you can already imagine, it runs a DELETE, searching for my name on the data extracted from the JSON object:
```sql
DELETE FROM `test`.`people` WHERE (JSON_EXTRACT(doc,'$.Name') = 'Miguel Angel');
```
The magic that makes MySQL work like a document-store NoSQL database is the combination of the JSON data type, a generated column that exposes the document's "_id" as the primary key, and the X Plugin translating CRUD requests into regular SQL statements.
I would describe the solution as really clever, simple, and clean. Congrats to Oracle! 🙂