

Some fun with R visualization

February 23, 2012

Author

Vadim Tkachenko

Benchmarks

MySQL


I finished my previous post with a graph showing unstable results.

Here I won’t analyze the causes; instead, I want to show some different ways to present the results.

I enjoy working with R, and though I am not even close to being proficient in it, I want to share some graphs you can build with R + ggplot2.

The benchmark conditions are the same as in the previous post, except that here there are results for the 4-table and 16-table cases, running MySQL 5.5.20.

Let me remind you how I take measurements. I run the benchmark for 1 hour, taking a measurement every 10 seconds.
So we have 360 data points (3600 s / 10 s) for each case.
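If you want to try the snippets below without the original data, here is a minimal sketch of a data frame with the shape they appear to assume: one row per 10-second sample, with columns sec, Throughput and Tables. This is a hypothetical stand-in for dv.ver; the throughput values are simulated with rnorm() around an arbitrary level and are not benchmark results.

```r
# Hypothetical stand-in for dv.ver: simulated numbers, NOT benchmark results.
set.seed(42)

duration_sec <- 3600   # 1-hour run
interval_sec <- 10     # one measurement every 10 seconds
sec <- seq(interval_sec, duration_sec, by = interval_sec)  # 10, 20, ..., 3600
length(sec)            # 360 points per case

# One block of rows per table count (4 and 16 tables, as in the post)
dv.ver <- do.call(rbind, lapply(c(4, 16), function(tables) {
  data.frame(
    sec        = sec,
    Tables     = tables,
    # Made-up throughput: normal noise around an arbitrary level
    Throughput = rnorm(length(sec), mean = 1000, sd = 100)
  )
}))

nrow(dv.ver)           # 720 rows: 360 per case
```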

If we plot them all, it looks like this:

I will also show the R code I used to make each graph:

library(ggplot2)

# One point per 10-second sample, colored by table count
m <- ggplot(dv.ver,
            aes(x = sec, y = Throughput, color = factor(Tables)))
m + geom_point()


The previous graph is not very informative, so we can add lines to see the trend:

m + geom_point() + geom_line()


This looks better, but you may still have a hard time answering: which case shows better throughput? What number should we take as the final result?

A jitter graph may help:

m <- ggplot(dv.ver,
            aes(x = factor(Tables), y = Throughput, color = factor(Tables)))
m + geom_jitter(alpha = 0.75)


With jitter we can see some dense areas, which show the “most likely” throughput.

So let’s build density graphs:

m <- ggplot(dd,
            aes(x = Throughput,fill=factor(Tables)))
m+geom_density(alpha = 0.7)


m+geom_density(alpha = 0.7)+facet_wrap(~Tables,ncol=1)


In these graphs the X axis is Throughput and the Y axis is the density, i.e. how often a given throughput is observed.
This gives you an idea of how to compare both results, and shows that the highest density is around 3600-3800 tps.
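If you want the peak as a number rather than reading it off the graph, base R's density() returns the estimated curve as x and y vectors, so the location of the maximum can be taken with which.max(). This is a minimal sketch assuming a data frame with Throughput and Tables columns like dd in the post; the data here is simulated (hypothetical values), so the printed peaks only illustrate the mechanics. density() uses a Gaussian kernel with bandwidth bw.nrd0 by default, which is also geom_density's default, so the peak should line up with the curve in the graph.

```r
# Simulated stand-in for dd (hypothetical numbers, not benchmark results)
set.seed(1)
dd <- data.frame(
  Tables     = rep(c(4, 16), each = 360),
  Throughput = c(rnorm(360, mean = 1000, sd = 50),
                 rnorm(360, mean = 1200, sd = 80))
)

# Throughput value at the highest point of a kernel density estimate
density_peak <- function(x) {
  d <- density(x)          # Gaussian kernel, bw = "nrd0" by default
  d$x[which.max(d$y)]      # x where the estimated density is maximal
}

# One peak per table count
peaks <- tapply(dd$Throughput, dd$Tables, density_peak)
print(peaks)
```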

Moving on to numbers, we can build boxplots:

m <- ggplot(dd,
            aes(x = factor(Tables),y=Throughput,fill=factor(Tables)))
m+geom_boxplot()


This may not be easy to read if you have never seen boxplots before. There is good reading on this way of representing data. In short, the middle line inside a box is the median (the line that divides the top 50% and bottom 50% of results),
the line at the top of the box is the 75% quantile (it divides the bottom 75% from the top 25%), and correspondingly
the line at the bottom of the box is the 25% quantile (you should already have an idea what that means).
You can decide which measurement you want to use to compare the results: median, 75% quantile, etc.
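To get the numbers behind the box lines, you can compute the quantiles per group directly. This is a minimal sketch using base R's aggregate() and quantile() on a simulated stand-in for dd (hypothetical values). As far as I know, geom_boxplot computes its box from quantile() with default settings, so these values should match the drawn lines; base R's boxplot() uses Tukey's hinges via fivenum(), which can differ slightly.

```r
# Simulated stand-in for dd (hypothetical numbers, not benchmark results)
set.seed(2)
dd <- data.frame(
  Tables     = rep(c(4, 16), each = 360),
  Throughput = c(rnorm(360, mean = 1000, sd = 50),
                 rnorm(360, mean = 1200, sd = 80))
)

# 25%, 50% (median) and 75% quantiles of Throughput for each table count;
# the result has a matrix column Throughput with one column per quantile
box_lines <- aggregate(
  Throughput ~ Tables, data = dd,
  FUN = function(x) quantile(x, probs = c(0.25, 0.50, 0.75))
)
print(box_lines)
```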

Finally, we can combine jitter and boxplot:

m <- ggplot(dd,
            aes(x = factor(Tables),y=Throughput,color=factor(Tables)))
m+geom_boxplot()+geom_jitter()


That’s it for today.
You can get the full script, sysbench-4-16.R, along with the data, from the benchmarks project on Launchpad.

If you want more visualization ideas, check out Brendan’s blog:

- http://dtrace.org/blogs/brendan/2011/12/18/visualizing-device-utilization/

- http://dtrace.org/blogs/brendan/2012/02/06/visualizing-process-snapshots/

- http://dtrace.org/blogs/brendan/2012/02/12/visualizing-process-execution/

And yes, if you are wondering what to do about such unstable results in MySQL, stay tuned: there is a solution.


5 Comments


marcos

14 years ago

Hey Vadim, nice post, but reading through it I kept wondering why you didn’t plot a simple histogram with the hist() command, with a superimposed mean or median line. That would have shown you just about all you wanted.

Author

Vadim Tkachenko

14 years ago

marcos,

That’s true. As I said, I am not that proficient in R; there are probably better ways to do what I did.
I just liked these graphs and wanted to share them.
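The histogram suggested above can be sketched in base R: hist() draws the histogram and abline(v = ...) adds a vertical line at the median. The throughput vector is simulated (hypothetical values), and breaks = 30 is an arbitrary choice that hist() treats only as a suggestion.

```r
# Simulated throughput samples (hypothetical, not benchmark results)
set.seed(3)
throughput <- rnorm(360, mean = 1000, sd = 50)

# Histogram with a vertical line at the median, as suggested in the comment
h <- hist(throughput, breaks = 30,
          main = "Throughput histogram", xlab = "Throughput")
abline(v = median(throughput), col = "red", lwd = 2)
```

hist() also returns the bin breaks and counts (h$breaks, h$counts), so the same call gives you the numbers behind the picture.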

Will Gunty

14 years ago

Do you have any examples of how to actually extract the values out of that? The graphs are pretty and all, but it’s important to have the actual concrete numbers.

Author

Vadim Tkachenko

14 years ago

Will,

That is also not so hard, though some time is needed to figure it out.

You can use ddply and summarize, something like this:

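library(plyr)

# One row per (Server, Threads) group: median throughput, 90th percentile response time
d1 <- ddply(data,
            c("Server", "Threads"),
            summarize,
            Thrp50 = q50(Throughput),
            Resp90 = q90(ResponseTime)
          )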


where

q90 <- function(x) { quantile(x, probs = c(0.90)) }
q50 <- function(x) { quantile(x, probs = c(0.50)) }


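In SQL terms it is roughly equivalent to the following (q50 and q90 stand for the quantile functions above; they are not standard SQL):
SELECT Server, Threads, q50(Throughput), q90(ResponseTime) FROM data GROUP BY Server, Threads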
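For anyone who would rather not depend on plyr, here is a sketch of the same per-group summary using base R's aggregate() and merge(). The column names Server, Threads, Throughput and ResponseTime come from the example above; the data frame is a tiny hypothetical one with made-up values, only there to show the shape of the result.

```r
q90 <- function(x) quantile(x, probs = 0.90, names = FALSE)
q50 <- function(x) quantile(x, probs = 0.50, names = FALSE)

# Tiny hypothetical data set (made-up values, only to show the shape)
data <- data.frame(
  Server       = rep(c("A", "B"), each = 4),
  Threads      = rep(c(8, 16), times = 4),
  Throughput   = c(100, 200, 110, 210, 300, 400, 310, 410),
  ResponseTime = c(1.0, 2.0, 1.2, 2.2, 3.0, 4.0, 3.2, 4.2)
)

# Median throughput per (Server, Threads) group
thrp <- aggregate(Throughput ~ Server + Threads, data = data, FUN = q50)
# 90th percentile response time per group
resp <- aggregate(ResponseTime ~ Server + Threads, data = data, FUN = q90)

# Join the two summaries into one table, like d1 from ddply
d1 <- merge(thrp, resp, by = c("Server", "Threads"))
names(d1) <- c("Server", "Threads", "Thrp50", "Resp90")
print(d1)
```

For server A with 8 threads, the two throughput samples are 100 and 110, so Thrp50 is 105, and the 90th percentile of response times 1.0 and 1.2 is 1.18 with quantile()'s default interpolation.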

Kunal V

14 years ago

Vadim,

Been poking around “R”; your blog gave it a boost! Thanks for sharing it.
