In this blog, we will discuss MySQL 5.7 asynchronous query execution using the X Plugin.
Overview
MySQL 5.7 supports X Plugin / X Protocol, which allows (if the library supports it) asynchronous query execution. In 2014, I published a blog on how to increase a slow query performance with the parallel query execution. There, I created a prototype in the bash shell. Here, I’ve tried a similar idea with NodeJS + mysqlx library (which uses MySQL X Plugin).
TL;DR version: By using the MySQL X Plugin with NodeJS I was able to increase query performance 10x (some query rewrite required).
X Protocol and NodeJS
Here are the steps required:
|
1 |
# node --version<br>v4.4.4<br># wget https://dev.mysql.com/get/Downloads/Connector-Nodejs/mysql-connector-nodejs-1.0.2.tar.gz<br># npm install mysql-connector-nodejs-1.0.2.tar.gz |
Test data
I’m using the same Wikipedia Page Counts dataset (wikistats) I’ve used for my Apache Spark and MySQL example. Let’s imagine we want to compare the popularity of MySQL versus PostgeSQL in January 2008 (comparing the total page views). Here are the sample queries:
|
1 |
mysql> select sum(tot_visits) from wikistats_by_day_spark where url like '%mysql%';<br>mysql> select sum(tot_visits) from wikistats_by_day_spark where url like '%postgresql%'; |
The table size only holds data for English Wikipedia for January 2008, but still has ~200M rows and ~16G in size. Both queries run for ~5 minutes each, and utilize only one CPU core (one connection = one CPU core). The box has 24 CPU cores, Intel(R) Xeon(R) CPU L5639 @ 2.13GHz. Can we run the query in parallel, utilizing all cores?
That is possible now with NodeJS and X Plugin, but require some preparation:
|
1 |
CREATE TABLE `wikistats_by_day_spark_part` (<br> `id` int(11) NOT NULL AUTO_INCREMENT,<br> `mydate` date NOT NULL,<br> `url` text,<br> `cnt` bigint(20) NOT NULL,<br> `tot_visits` bigint(20) DEFAULT NULL,<br> PRIMARY KEY (`id`)<br>) ENGINE=InnoDB AUTO_INCREMENT=239863472 DEFAULT CHARSET=latin1<br>/*!50100 PARTITION BY HASH (id)<br>PARTITIONS 24 */ |
|
1 |
select sum(tot_visits) from wikistats_by_day_spark_part partition (p<N>) where url like '%mysql%'; |
The code
|
1 |
var mysqlx = require('mysqlx');<br><br>var cs_pre = {<br> host: 'localhost',<br> port: 33060,<br> dbUser: 'root',<br> dbPassword: 'mysql'<br>};<br><br>var cs = {<br> host: 'localhost',<br> port: 33060,<br> dbUser: 'root',<br> dbPassword: 'mysql'<br>};<br><br>var partitions = [];<br>var res = [];<br>var total = 0;<br>mysqlx.getNodeSession( cs_pre ).then(session_pre => {<br> var sql="select partition_name from information_schema.partitions where table_name = 'wikistats_by_day_spark_part' and table_schema = 'wikistats' ";<br> session_pre.executeSql(sql)<br> .execute(function (row) {<br> partitions.push(row);<br> }).catch(err => {<br> console.log(err);<br> })<br> .then( function () {<br> partitions.forEach(function(p) {<br> mysqlx.getNodeSession( cs ).then(session => {<br> var sql="select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(" + p + ") where url like '%mysql%';"<br> console.log("Started SQL for partiton: " + p);<br> return Promise.all([<br> session.executeSql(sql)<br> .execute(function (row) {<br> console.log(p + ":" + row);<br> res.push(row);<br> total = Number(total) + Number(row);<br> }).catch(err => {<br> console.log(err);<br> }),<br> session.close()<br> ]);<br> }).catch(err => {<br> console.log(err + "partition: " + p);<br> }).then(function() {<br> // All done<br> if (res.length == partitions.length) {<br> console.log("All done! Total: " + total);<br> // can now sort "res" array if needed an display<br> }<br> });<br> });<br> });<br> session_pre.close();<br>});<br><br>console.log("Starting..."); |
The explanation
The idea here is rather simple:
|
1 |
.execute(function (row) {<br> console.log(p + ":" + row);<br> res.push(row);<br> total = Number(total) + Number(row); ... |
Asynchronous Salad: tomacucumtoes,bersmayonn,aise *
This may blow your mind: because everything is running asynchronously, the callback functions will return when ready. Here is the result of the above script:
|
1 |
$ time node async_wikistats.js <br>Starting...<br>Started SQL for partiton: p0<br>Started SQL for partiton: p1<br>Started SQL for partiton: p2<br>Started SQL for partiton: p3<br>Started SQL for partiton: p4<br>Started SQL for partiton: p5<br>Started SQL for partiton: p7<br>Started SQL for partiton: p8<br>Started SQL for partiton: p6<br>Started SQL for partiton: p9<br>Started SQL for partiton: p10<br>Started SQL for partiton: p12<br>Started SQL for partiton: p13<br>Started SQL for partiton: p11<br>Started SQL for partiton: p14<br>Started SQL for partiton: p15<br>Started SQL for partiton: p16<br>Started SQL for partiton: p17<br>Started SQL for partiton: p18<br>Started SQL for partiton: p19<br>Started SQL for partiton: p20<br>Started SQL for partiton: p21<br>Started SQL for partiton: p22<br>Started SQL for partiton: p23 |
… here the script will wait for the async calls to return, and they will return when ready – the order is not defined.
Meanwhile, we can watch MySQL processlist:
|
1 |
+------+------+-----------------+-----------+---------+-------+--------------+-------------------------------------------------------------------------------------------------------------------+<br>| Id | User | Host | db | Command | Time | State | Info |<br>+------+------+-----------------+-----------+---------+-------+--------------+-------------------------------------------------------------------------------------------------------------------+<br>| 186 | root | localhost:44750 | NULL | Sleep | 21391 | cleaning up | PLUGIN |<br>| 2290 | root | localhost | wikistats | Sleep | 1417 | | NULL |<br>| 2510 | root | localhost:41737 | NULL | Query | 2 | Sending data | PLUGIN: select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(p0) where url like '%mysql%' |<br>| 2511 | root | localhost:41738 | NULL | Query | 2 | Sending data | PLUGIN: select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(p1) where url like '%mysql%' |<br>| 2512 | root | localhost:41739 | NULL | Query | 2 | Sending data | PLUGIN: select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(p2) where url like '%mysql%' |<br>| 2513 | root | localhost:41741 | NULL | Query | 2 | Sending data | PLUGIN: select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(p4) where url like '%mysql%' |<br>| 2514 | root | localhost:41740 | NULL | Query | 2 | Sending data | PLUGIN: select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(p3) where url like '%mysql%' |<br>| 2515 | root | localhost:41742 | NULL | Query | 2 | Sending data | PLUGIN: select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(p5) where url like '%mysql%' |<br>| 2516 | root | localhost:41743 | NULL | Query | 2 | Sending data | PLUGIN: select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(p6) where url like '%mysql%' |<br>| 2517 | root | localhost:41744 | NULL | Query | 2 | Sending data | PLUGIN: select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(p7) where url like '%mysql%' |<br>| 2518 | root | localhost:41745 | NULL | Query | 2 | Sending data | PLUGIN: select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(p8) where url like '%mysql%' |<br>| 2519 | root | localhost:41746 | NULL | Query | 2 | Sending data | PLUGIN: select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(p9) where url like '%mysql%' |<br>| 2520 | root | localhost:41747 | NULL | Query | 2 | Sending data | PLUGIN: select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(p10) where url like '%mysql%' |<br>| 2521 | root | localhost:41748 | NULL | Query | 2 | Sending data | PLUGIN: select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(p11) where url like '%mysql%' |<br>| 2522 | root | localhost:41749 | NULL | Query | 2 | Sending data | PLUGIN: select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(p12) where url like '%mysql%' |<br>| 2523 | root | localhost:41750 | NULL | Query | 2 | Sending data | PLUGIN: select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(p13) where url like '%mysql%' |<br>| 2524 | root | localhost:41751 | NULL | Query | 2 | Sending data | PLUGIN: select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(p14) where url like '%mysql%' |<br>| 2525 | root | localhost:41752 | NULL | Query | 2 | Sending data | PLUGIN: select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(p15) where url like '%mysql%' |<br>| 2526 | root | localhost:41753 | NULL | Query | 2 | Sending data | PLUGIN: select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(p16) where url like '%mysql%' |<br>| 2527 | root | localhost:41754 | NULL | Query | 2 | Sending data | PLUGIN: select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(p17) where url like '%mysql%' |<br>| 2528 | root | localhost:41755 | NULL | Query | 2 | Sending data | PLUGIN: select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(p18) where url like '%mysql%' |<br>| 2529 | root | localhost:41756 | NULL | Query | 2 | Sending data | PLUGIN: select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(p19) where url like '%mysql%' |<br>| 2530 | root | localhost:41757 | NULL | Query | 2 | Sending data | PLUGIN: select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(p20) where url like '%mysql%' |<br>| 2531 | root | localhost:41758 | NULL | Query | 2 | Sending data | PLUGIN: select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(p21) where url like '%mysql%' |<br>| 2532 | root | localhost:41759 | NULL | Query | 2 | Sending data | PLUGIN: select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(p22) where url like '%mysql%' |<br>| 2533 | root | localhost:41760 | NULL | Query | 2 | Sending data | PLUGIN: select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(p23) where url like '%mysql%' |<br>| 2534 | root | localhost | NULL | Query | 0 | starting | show full processlist |<br>+------+------+-----------------+-----------+---------+-------+--------------+-------------------------------------------------------------------------------------------------------------------+<br> |
And CPU utilization:
|
1 |
Tasks: 41 total, 1 running, 33 sleeping, 7 stopped, 0 zombie<br>%Cpu0 : 91.9 us, 1.7 sy, 0.0 ni, 6.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st<br>%Cpu1 : 97.3 us, 2.7 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st<br>%Cpu2 : 97.0 us, 3.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st<br>%Cpu3 : 97.7 us, 2.3 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st<br>%Cpu4 : 95.7 us, 2.7 sy, 0.0 ni, 1.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st<br>%Cpu5 : 98.3 us, 1.7 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st<br>%Cpu6 : 98.3 us, 1.7 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st<br>%Cpu7 : 97.7 us, 2.3 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st<br>%Cpu8 : 96.7 us, 3.0 sy, 0.0 ni, 0.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st<br>%Cpu9 : 98.3 us, 1.7 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st<br>%Cpu10 : 95.7 us, 4.3 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st<br>%Cpu11 : 97.7 us, 2.3 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st<br>%Cpu12 : 98.0 us, 2.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st<br>%Cpu13 : 98.0 us, 1.7 sy, 0.0 ni, 0.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st<br>%Cpu14 : 97.7 us, 2.3 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st<br>%Cpu15 : 97.3 us, 2.7 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st<br>%Cpu16 : 98.0 us, 2.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st<br>%Cpu17 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st<br>%Cpu18 : 97.3 us, 2.7 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st<br>%Cpu19 : 98.7 us, 1.3 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st<br>%Cpu20 : 99.3 us, 0.7 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st<br>%Cpu21 : 97.3 us, 2.3 sy, 0.0 ni, 0.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st<br>%Cpu22 : 97.0 us, 3.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st<br>%Cpu23 : 96.0 us, 4.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st<br>...<br> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND <br>18901 mysql 20 0 25.843g 0.017t 7808 S 2386 37.0 295:34.05 mysqld |
Now, here is our “salad”:
|
1 |
p1:2499<br>p23:2366<br>p2:2297<br>p0:4735<br>p12:12349<br>p14:1412<br>p3:2045<br>p16:4157<br>p20:3160<br>p18:8717<br>p17:2967<br>p13:4519<br>p15:5462<br>p10:1312<br>p5:2815<br>p7:4644<br>p9:766<br>p4:3218<br>p6:4175<br>p21:2958<br>p8:929<br>p19:4182<br>p22:3231<br>p11:4020 |
As we can see, all partitions are in random order. If needed, we can even sort the result array (which isn’t needed for this example as we only care about the total). Finally our result and timing:
|
1 |
All done! Total: 88935<br><br>real 0m30.668s<br>user 0m0.256s<br>sys 0m0.028s<br> |
Timing and Results
If you are interested in the original question (MySQL versus PostgreSQL, Jan 2008):
Further Reading:
PS: Original Asynchronous Salad Joke, by Vlad @Crazy_Owl (in Russian)