Asynchronous Query Execution with MySQL 5.7 X Plugin

In this blog post, we will discuss asynchronous query execution in MySQL 5.7 using the X Plugin.

Overview

MySQL 5.7 supports the X Plugin / X Protocol, which allows asynchronous query execution (if the client library supports it). In 2014, I published a blog post on how to improve slow query performance with parallel query execution. There, I created a prototype in the bash shell. Here, I’ve tried a similar idea with NodeJS and the mysqlx library (which uses the MySQL X Plugin).

TL;DR version: By using the MySQL X Plugin with NodeJS I was able to increase query performance 10x (some query rewrite required).

X Protocol and NodeJS

Here are the steps required:

  1. First, enable the X Plugin in MySQL 5.7.12+. It listens on its own port (33060 by default), separate from the regular MySQL port.
  2. Second, download and install NodeJS (>4.2) and mysql-connector-nodejs-1.0.2.tar.gz (follow Getting Started with Connector/Node.JS guide).

    Please note: on older systems, you will probably need to upgrade the nodejs version. Follow the Installing Node.js via package manager guide.
  3. All set! Now we can use the asynchronous queries feature.

Test data 

I’m using the same Wikipedia Page Counts dataset (wikistats) I’ve used for my Apache Spark and MySQL example. Let’s imagine we want to compare the popularity of MySQL versus PostgreSQL in January 2008 (comparing the total page views). Here are the sample queries:

The table only holds data for English Wikipedia for January 2008, but it still has ~200M rows and is ~16G in size. Each query runs for ~5 minutes and utilizes only one CPU core (one connection = one CPU core). The box has 24 CPU cores, Intel(R) Xeon(R) CPU L5639 @ 2.13GHz. Can we run the query in parallel, utilizing all cores?

That is possible now with NodeJS and the X Plugin, but it requires some preparation:

  1. Partition the table using hash, 24 partitions:
  2. Rewrite the query to run over one connection (= one thread) per partition, with each thread reading only its own partition:
  3. Wrap it all up inside NodeJS callback functions / Promises.
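As a rough sketch of the first two steps (not the original listings), the statements could be generated like this. The table name `wikistats`, the hash column `id`, and the columns `url` and `tot_visits` are assumptions made for illustration only; the partition names rely on MySQL naming unnamed HASH partitions p0, p1, and so on.

```javascript
// Sketch only: table, column, and pattern values below are assumptions.
const TABLE = 'wikistats';    // hypothetical table name
const HASH_COLUMN = 'id';     // hypothetical integer column to hash on
const PARTITION_COUNT = 24;   // one partition per CPU core on the test box

// Step 1: statement that re-partitions the table by hash.
function buildPartitionDdl(table, column, count) {
  return `ALTER TABLE ${table} PARTITION BY HASH(${column}) PARTITIONS ${count}`;
}

// Step 2: one query per partition, each restricted to its own partition
// through explicit partition selection: FROM t PARTITION (p).
// Values are interpolated directly, which is only acceptable here because
// they are fixed constants and not user input.
function buildPartitionQueries(table, partitionNames, pattern) {
  return partitionNames.map(
    (p) =>
      `SELECT SUM(tot_visits) FROM ${table} PARTITION (${p}) WHERE url LIKE '${pattern}'`
  );
}

// Unnamed HASH partitions are called p0, p1, ..., p(N-1).
const partitionNames = Array.from({ length: PARTITION_COUNT }, (_, i) => `p${i}`);
const ddl = buildPartitionDdl(TABLE, HASH_COLUMN, PARTITION_COUNT);
const queries = buildPartitionQueries(TABLE, partitionNames, '%mysql%');

console.log(ddl);
console.log(queries[0]);
```

Each generated query touches one partition only, so 24 of them can run on 24 separate connections without scanning the same rows twice.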

The code

The explanation

The idea here is rather simple:

  1. Find all the partitions for the table by using “select partition_name from information_schema.partitions”
  2. For each partition, run the query in parallel: create a connection, run the query with a specific partition name, define the callback function, then close the connection.
  3. Because a callback function is used, the code does not block; it proceeds straight to the next iteration. When a query finishes, its callback function is executed.
  4. Inside the callback function, I’m saving the result into an array and also calculating the total (actually I only need a total in this example).
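The control flow described in these steps can be sketched without a database at all. In the sketch below, `runQuery` is a hypothetical stand-in for "open a connection, run one statement, close the connection, resolve with a number"; it is not part of the mysqlx API. A fake runner with artificial delays takes its place so the example runs on its own, and the table and column names are again assumptions.

```javascript
// Sketch only: runQuery is a hypothetical placeholder for the real
// connect / execute / close sequence against the X Plugin.

// Fan out one query per partition and collect results as they complete.
async function sumOverPartitions(partitionNames, buildSql, runQuery) {
  const results = []; // per-partition results, in completion order
  let total = 0;

  // Start every query without awaiting it: the loop itself never blocks.
  const pending = partitionNames.map((name) =>
    runQuery(buildSql(name)).then((value) => {
      // Runs whenever this partition's query finishes.
      results.push({ partition: name, value });
      total += value;
    })
  );

  await Promise.all(pending); // wait until every callback has run
  return { results, total };
}

// Fake runner for demonstration: higher-numbered partitions finish sooner.
function fakeRunQuery(sql) {
  const n = Number(sql.match(/PARTITION \(p(\d+)\)/)[1]);
  return new Promise((resolve) =>
    setTimeout(() => resolve(n + 1), (3 - n) * 20)
  );
}

const demo = sumOverPartitions(
  ['p0', 'p1', 'p2'],
  (p) => `SELECT SUM(tot_visits) FROM wikistats PARTITION (${p})`,
  fakeRunQuery
);

demo.then(({ results, total }) => {
  console.log(results.map((r) => r.partition), total);
});
```

Because the results array is filled from inside the callbacks, its order reflects when each query finished, not when it was started. The total is unaffected by that ordering, which is why a simple sum is a good fit for this approach.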

Asynchronous Salad: tomacucumtoes,bersmayonn,aise *

This may blow your mind: because everything is running asynchronously, the callback functions will return when ready. Here is the result of the above script: