How to Measure MySQL Performance in Kubernetes with Sysbench

As our Percona Kubernetes Operator for Percona XtraDB Cluster gains popularity, I am getting questions about its performance and how to measure it properly. Sysbench is the most popular tool for database performance evaluation, so let’s review how we can use it with the Percona XtraDB Cluster Operator.

Operator Setup

I will assume that you have an operator running (if not, this is the topic for a different post). We have the documentation on how to get it going, and we will start a three-node cluster using the following cr.yaml file:

If we are successful, we will have three pods running:

It’s important to note that the allocated IP addresses are internal to the Kubernetes cluster and are not routable from outside of Kubernetes.

Sysbench on a Host External to Kubernetes

In this part, let’s assume we want to run a client (sysbench) on a separate host, which is not a part of the Kubernetes system. How do we do it? We need to expose one of the pods (or several) to the external world, and for this, we use a Kubernetes Service of type NodePort:

So here we see that port 3306 (MySQL port) is exposed as port 30160 on node-3 (node where pod cluster1-pxc-0 is running). Please note this will invoke a kube-proxy process on node-3, which will handle incoming traffic on port 30160 and route it to the cluster1-pxc-0 pod. Kube-proxy by itself will introduce some networking overhead.
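As a rough illustration of such a Service, the sketch below builds a NodePort manifest in Python and prints it as JSON, which `kubectl apply -f -` accepts. The Service name is hypothetical, and the selector assumes the pod is managed by a StatefulSet, which labels each pod with `statefulset.kubernetes.io/pod-name`. The node port is left for Kubernetes to assign, so it will generally differ from the 30160 seen above.

```python
import json


def nodeport_service(pod_name: str, port: int = 3306) -> dict:
    """Build a NodePort Service manifest that targets one StatefulSet pod."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        # Hypothetical Service name, chosen only for this example.
        "metadata": {"name": f"{pod_name}-external"},
        "spec": {
            "type": "NodePort",
            # Assumption: the pod carries the per-pod StatefulSet label.
            "selector": {"statefulset.kubernetes.io/pod-name": pod_name},
            # nodePort is omitted so Kubernetes picks a free port itself.
            "ports": [{"protocol": "TCP", "port": port, "targetPort": port}],
        },
    }


manifest = nodeport_service("cluster1-pxc-0")
print(json.dumps(manifest, indent=2))
```

Saved as a script, the output can be piped straight into `kubectl apply -f -`, and `kubectl get service` then shows which node port was allocated.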

To find the IP address of node-3:

So now we can connect the dots and connect the mysql client to IP 147.75.56.103 port 30160 and create database sbtest, which we need to run sysbench:

And now we can prepare data for sysbench (never mind some of the parameters; we will come to them later).

Sysbench Running Inside Kubernetes

When we have sysbench running inside Kubernetes, it makes all these networking steps unnecessary and it simplifies a lot of things while also making one more complicated: how do you actually start a pod with sysbench?

To start, we need an image with sysbench, and conveniently we already have one available on Docker Hub as perconalab/sysbench, so we will use that one. With an image, you can prepare a yaml file and start a pod with kubectl create -f sysbench.yaml, or, as I prefer, invoke it directly from the command line (which is a little bit elaborate):

This way, Kubernetes will schedule sysbench-client pod on any available node, which may not be something we want. To schedule sysbench-client on a specific node, we can use:
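One way to express this, shown here as a sketch rather than the exact command used in this setup, is a Pod manifest with a `nodeSelector` on the well-known `kubernetes.io/hostname` label. The Python below builds that manifest and prints it as JSON for `kubectl apply -f -`. The container name and the `sleep` command that keeps the pod alive are assumptions made for the example, and the hostname label is assumed to match the node name.

```python
import json


def sysbench_pod(node_name: str, pod_name: str = "sysbench-client") -> dict:
    """Build a Pod manifest that pins a sysbench client to one node."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name},
        "spec": {
            "restartPolicy": "Never",
            # Assumption: the node's hostname label equals its node name.
            "nodeSelector": {"kubernetes.io/hostname": node_name},
            "containers": [
                {
                    "name": "sysbench",  # hypothetical container name
                    "image": "perconalab/sysbench",
                    # Assumption: keep the pod idle so we can exec into it.
                    "command": ["sleep", "86400"],
                }
            ],
        },
    }


print(json.dumps(sysbench_pod("node-3"), indent=2))
```

Once the pod is running, `kubectl exec -it sysbench-client -- bash` would open a shell in it, assuming the image ships bash.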

This will start sysbench-client on node-3. Now, from the pod’s command line, we can access mysql simply by using the cluster1-pxc-0 hostname:

A Quick Intro to Sysbench

Although we have covered sysbench multiple times, I was asked to provide a basic intro for different scenarios, so I would like to review some basic options for sysbench.

Prepare Data

Before running a benchmark, we need to prepare the data. From our previous example:

This will create ten tables with 1mln rows each. Each table is about 250MB in size, for a total of 2.5GB of data. This gives us an idea of what knobs we can use to generate less or more data.

If we want, say, 25GB of data, we can use either 100 tables with 1mln rows each or ten tables with 10mln rows. For 50GB of data, we can use 200 tables with 1mln rows or ten tables with 20mln rows, or any combination of tables and rows that will give 200mln rows in total.
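The arithmetic above can be sketched in a few lines of Python. The only input taken from this post is the estimate of roughly 250MB per 1mln rows. The function names are hypothetical, and 1GB is treated as 1000MB to match the 2.5GB figure.

```python
import math

# Estimate from the text: about 250MB of data per 1 million rows.
MB_PER_MILLION_ROWS = 250


def total_rows(target_gb: float) -> int:
    """Rows needed overall to reach roughly target_gb of data."""
    return math.ceil(target_gb * 1000 / MB_PER_MILLION_ROWS * 1_000_000)


def rows_per_table(target_gb: float, tables: int) -> int:
    """Rows per table when the data is spread evenly over `tables` tables."""
    return math.ceil(total_rows(target_gb) / tables)


for gb, tables in [(2.5, 10), (25, 100), (25, 10), (50, 200), (50, 10)]:
    print(f"{gb}GB over {tables} tables: {rows_per_table(gb, tables):,} rows per table")
```

This is only an estimate: the real on-disk size depends on the row format, indexes, and storage engine settings.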

Running Benchmark

Sysbench OLTP scenarios provide the oltp_read_only and oltp_read_write scripts. As you can guess from the names, oltp_read_only will generate only SELECT queries, while oltp_read_write will generate SELECT, UPDATE, INSERT and DELETE queries.

Examples:

Read-only

Read-write

Parameters to Play With

From our example, you can see some parameters you can play with:

  • --threads – how many user threads will connect to the database and generate queries. A value of one will generate a single-threaded load.
  • --time – how long to run a benchmark. It may vary from a very short period (60 sec or so) to a very long one (hours and hours) if we want to see the stability of long runs.
  • --report-interval=1 – how often to report results in progress. I often use one second to see the variance in performance with one-second resolution.
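To see how these parameters fit together, here is a minimal sketch that assembles a sysbench command line in Python. The flag names are standard sysbench options, but the default values (16 threads, 300 seconds, the root user, and the placeholder password) are assumptions chosen for illustration, not the settings used in this post.

```python
import shlex


def sysbench_command(script, action, host, *, port=3306, user="root",
                     password="CHANGE_ME", db="sbtest", tables=10,
                     table_size=1_000_000, threads=16, time_s=300,
                     report_interval=1):
    """Return the sysbench invocation as a list of arguments."""
    return [
        "sysbench", script,
        f"--mysql-host={host}",
        f"--mysql-port={port}",
        f"--mysql-user={user}",
        f"--mysql-password={password}",  # placeholder, replace with a real one
        f"--mysql-db={db}",
        f"--tables={tables}",
        f"--table-size={table_size}",
        f"--threads={threads}",
        f"--time={time_s}",
        f"--report-interval={report_interval}",
        action,  # "prepare", "run", or "cleanup"
    ]


cmd = sysbench_command("oltp_read_write", "run", "cluster1-pxc-0", threads=1)
print(shlex.join(cmd))
```

Returning a list rather than a string makes it easy to pass the result to `subprocess.run` without worrying about shell quoting.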

Results Interpretation

Running sysbench from one of the examples, you can see the following output:

The first part is the interval reports (every second, as we asked). There we can see how many threads are running, and the most interesting parts are the “tps” and “lat” columns, which report throughput and latency, respectively, for the given period of time.

In general, we want to see throughput higher and latency lower when we compare different experiments.
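To compare experiments, it can help to pull the “tps” and “lat” values out of the interval reports programmatically. The sketch below assumes the interval line layout printed by sysbench 1.0. The sample line uses made-up numbers for illustration and is not a result from this setup.

```python
import re

# Matches lines such as:
# [ 1s ] thds: 16 tps: 1234.50 qps: ... lat (ms,95%): 15.27 err/s: 0.00 ...
INTERVAL_RE = re.compile(
    r"\[\s*(?P<second>\d+)s\s*\]\s+"
    r"thds:\s+(?P<threads>\d+)\s+"
    r"tps:\s+(?P<tps>[\d.]+).*?"
    r"lat \(ms,(?P<percentile>\d+)%\):\s+(?P<lat_ms>[\d.]+)"
)


def parse_interval(line: str):
    """Return the key numbers from one interval report, or None if no match."""
    m = INTERVAL_RE.search(line)
    if m is None:
        return None
    return {
        "second": int(m.group("second")),
        "threads": int(m.group("threads")),
        "tps": float(m.group("tps")),
        "percentile": int(m.group("percentile")),
        "lat_ms": float(m.group("lat_ms")),
    }


# Illustrative line with invented numbers, not measured output.
sample = ("[ 1s ] thds: 16 tps: 1234.50 qps: 19752.00 "
          "(r/w/o: 17283.00/0.00/2469.00) lat (ms,95%): 15.27 "
          "err/s: 0.00 reconn/s: 0.00")
print(parse_interval(sample))
```

Collecting these dictionaries over a whole run gives a per-second series of throughput and latency that can be plotted or averaged when comparing configurations.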

And the last part is the total statistics. The part we usually pay attention to is:

And

More transactions and lower latency typically correspond to better performance.
