Justin Swanhart, Author at Percona Database Performance Blog

Justin Swanhart
Justin is a former Principal Support Engineer on the support team. In the past, he was a trainer at Percona and a consultant. Justin also created and maintains Shard-Query, a middleware tool for sharding and parallel query execution, and Flexviews, a tool for materialized views for MySQL. Prior to working at Percona, Justin consulted for Proven Scaling, was a backend engineer at Yahoo!, and was a database administrator at Smule and Gazillion games.

Advanced JSON to MySQL indexing

Mar 10, 2015 | MySQL

This post will discuss some excellent methods for getting JSON-to-MySQL indexing to work smoothly. What is JSON? JSON is a text-based, human-readable format for transmitting data between systems, for serializing objects, and for storing data in document stores, where each document can have different attributes or a different schema. Popular document store databases use […]

‘Indexing’ JSON documents for efficient MySQL queries over JSON data

Feb 17, 2015 | MySQL

MySQL meets NoSQL with JSON UDF. I recently got back from FOSDEM in Brussels, Belgium. While I was there, I got to see a great talk by Sveta Smirnova about the JSON UDF functions in her MySQL 5.7 Labs release. It is important to note that while the UDFs come in a 5.7 release, it is absolutely […]

How to scale big data applications using MySQL sharding frameworks

Sep 23, 2014 | MySQL, Webinars

This Wednesday I’ll be discussing two common types of big data: machine-generated data and user-generated content. These types of big data are amenable to sharding, a commonly used technique for spreading data over more than one database server. I’ll be discussing this in depth during a live webinar at 10 a.m. Pacific time on Sept. 24. […]

Generating test data from the mysql> prompt

Sep 10, 2014 | Benchmarks, Insight for DBAs, MySQL

There are a lot of tools that generate test data. Many of them have complex XML scripts or GUI interfaces that let you describe the characteristics of the data. For testing query performance and many other applications, however, a simple, quick-and-dirty data generator that can be built at the MySQL command line is useful. […]
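As an illustration of the kind of generator described here (a sketch, not necessarily the technique the post uses), one common quick-and-dirty approach is to seed a single row and then repeatedly insert the table into itself, doubling the row count each time, while `RAND()` and `MD5()` fill the columns with varied values. The table `test_data` and its columns are hypothetical.

```sql
-- Hypothetical scratch table for generated rows.
CREATE TABLE test_data (
  id  INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  val INT NOT NULL,        -- random integer in the range 0..999
  c   CHAR(32) NOT NULL    -- random 32-character hex string
) ENGINE=InnoDB;

-- Seed a single row.
INSERT INTO test_data (val, c) VALUES (FLOOR(RAND() * 1000), MD5(RAND()));

-- Each run of this statement doubles the row count: 1, 2, 4, 8, ...
-- Repeat it (for example with the up-arrow at the mysql> prompt) until the
-- table is large enough; 20 runs after the seed give 2^20 = 1,048,576 rows.
INSERT INTO test_data (val, c)
SELECT FLOOR(RAND() * 1000), MD5(RAND()) FROM test_data;
```

Selecting from and inserting into the same table works because MySQL first buffers the selected rows in an internal temporary table, so each run reads the existing rows exactly once.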

Trawling the binlog with FlexCDC and new FlexCDC plugins for MySQL

Aug 27, 2014 | MySQL

Swanhart-Tools includes FlexCDC, a change data capture tool for MySQL. FlexCDC follows a server’s binary log and usually writes “changelogs” that track the changes to tables in the database. I say “usually” because the latest version of Swanhart-Tools (only on GitHub for now) supports FlexCDC plugins, which allow you to send the updates to a remote […]

Parallel Query for MySQL with Shard-Query

May 01, 2014 | Insight for DBAs, MySQL

While Shard-Query can work over multiple nodes, this blog post focuses on using Shard-Query with a single node. Shard-Query can add parallelism to queries that use partitioned tables. Very large tables can often be partitioned fairly easily. Shard-Query can leverage partitioning to add parallelism because each partition can be queried independently. Because MySQL 5.6 supports the […]
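Explicit partition selection, added in MySQL 5.6, is one way to issue independent per-partition queries by hand. The sketch below is not Shard-Query's own query rewriting; the table `sales` and its partitions are hypothetical. Each partial aggregate could run on its own connection, and the partial results are then merged. For brevity the merge is shown here as a single statement using UNION ALL.

```sql
-- Hypothetical fact table, range-partitioned by year.
CREATE TABLE sales (
  sale_date DATE NOT NULL,
  amount    DECIMAL(10,2) NOT NULL
) ENGINE=InnoDB
PARTITION BY RANGE (YEAR(sale_date)) (
  PARTITION p2012 VALUES LESS THAN (2013),
  PARTITION p2013 VALUES LESS THAN (2014),
  PARTITION pmax  VALUES LESS THAN MAXVALUE
);

-- One sample row per partition.
INSERT INTO sales VALUES
  ('2012-06-01', 10.00), ('2013-03-15', 20.00), ('2014-01-02', 30.00);

-- Each branch reads exactly one partition, so each could be sent on a
-- separate connection and run concurrently. The outer query merges the
-- partial aggregates: counts and sums combine by summing them.
SELECT SUM(cnt) AS total_rows, SUM(total) AS total_amount
FROM (
  SELECT COUNT(*) AS cnt, SUM(amount) AS total FROM sales PARTITION (p2012)
  UNION ALL
  SELECT COUNT(*), SUM(amount) FROM sales PARTITION (p2013)
  UNION ALL
  SELECT COUNT(*), SUM(amount) FROM sales PARTITION (pmax)
) AS partials;
```

Not every aggregate merges this directly: an average, for example, has to be rebuilt as `SUM(total) / SUM(cnt)` from the partials rather than by averaging the per-partition averages.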

MySQL webinar: ‘Introduction to open source column stores’

Sep 12, 2013 | MySQL, Webinars

Join me Wednesday, September 18, at 10 a.m. PDT for an hour-long webinar where I will introduce the basic concepts behind column store technology. The webinar’s title is “Introduction to open source column stores.” What will be discussed? This webinar will cover Infobright, LucidDB, MonetDB, Hadoop (Impala), and other column stores. I will compare […]

MySQL and the SSB – Part 2 – MyISAM vs InnoDB low concurrency

May 22, 2013 | Benchmarks, MySQL

This blog post is part two in what is now a continuing series on the Star Schema Benchmark. In my previous blog post I compared MySQL 5.5.30 to MySQL 5.6.10, both with default settings using only the InnoDB storage engine. In my testing I discovered that innodb_old_blocks_time had an effect on performance of the benchmark. There was […]

MySQL 5.6 vs MySQL 5.5 and the Star Schema Benchmark

Mar 11, 2013 | MySQL

So far, most of the benchmarks posted about MySQL 5.6 use the sysbench OLTP workload. I wanted to test a set of queries that, unlike sysbench, use joins. I also wanted an easily reproducible data set that is richer than the simple sysbench table. The Star Schema Benchmark (SSB) seems ideal for this. […]

Webinar: Building a highly scalable distributed row, document or column store with MySQL and Shard-Query

Feb 06, 2013 | MySQL, Percona Events

On Friday, February 15, 2013, at 10:00 a.m. Pacific Standard Time, I will be delivering a webinar entitled “Building a highly scalable distributed row, document or column store with MySQL and Shard-Query.” The first part of this webinar will focus on why distributed databases are needed, and on the techniques employed by Shard-Query to implement a distributed […]

Replication of the NOW() function (also, time travel)

Nov 28, 2012 | Insight for DBAs, MySQL

Notice the result of the NOW() function in the following query. The query was run on a real database server and I didn’t change the clock of the server or change anything in the database configuration settings.

mysql> SELECT NOW(),SYSDATE();
+---------------------+---------------------+
| NOW()               | SYSDATE()           |
+---------------------+---------------------+
| 1999-01-01 00:00:00 | 2012-11-29 05:50:03 |
+---------------------+---------------------+
1 row in set (0.00 sec)

You may proceed to party like it is 1999. How can the NOW() function return a value […]
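The excerpt stops before the answer, but one documented mechanism produces exactly this kind of output: the session variable `timestamp`. Statement-based binary logs record `SET TIMESTAMP=...` before each event so that a replica evaluates NOW() with the source's time, and a client can set the same variable by hand. SYSDATE() ignores it and reads the clock at the moment it executes (unless the server was started with `--sysdate-is-now`). A minimal sketch of reproducing the effect in a single session:

```sql
-- Pin this session's notion of "now". SET TIMESTAMP expects a Unix epoch
-- value; UNIX_TIMESTAMP() converts the literal using the session time zone.
SET TIMESTAMP = UNIX_TIMESTAMP('1999-01-01 00:00:00');

-- NOW() returns the pinned value; SYSDATE() still reads the system clock.
SELECT NOW(), SYSDATE();

-- Assigning DEFAULT makes NOW() follow the real clock again.
SET TIMESTAMP = DEFAULT;
```

This is also why SYSDATE() is considered unsafe for statement-based replication: a replica replaying the statement later gets its own clock time, not the source's.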

REPEATABLE-READ and READ-COMMITTED Transaction Isolation Levels

Aug 28, 2012 | Insight for DBAs, MySQL

As an instructor with Percona, I’m sometimes asked about the differences between the REPEATABLE-READ and READ-COMMITTED transaction isolation levels. There are a few differences between them, and they are all related to locking.
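To experiment with the two levels, the isolation level can be changed per session. A minimal sketch; note that the variable is named `transaction_isolation` in MySQL 5.7.20 and later (including 8.0), while older servers such as the 5.5 and 5.6 releases expose it as `tx_isolation`.

```sql
-- Affects transactions started later in this session only.
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT @@SESSION.transaction_isolation;   -- 'READ-COMMITTED'

-- Return to InnoDB's default level.
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT @@SESSION.transaction_isolation;   -- 'REPEATABLE-READ'
```

Running the same locking read or range UPDATE from two sessions under each level is a simple way to observe the locking differences: under REPEATABLE READ, InnoDB takes next-key (record plus gap) locks for locking reads and range scans, while under READ COMMITTED it generally locks only the matching index records.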

Flexviews is a working scalable database transactional memory example

May 19, 2011 | MySQL

http://Flexvie.ws fully implements a method for creating materialized views for MySQL data sets. The tool is for MySQL, but the methods are database agnostic. A materialized view is an analogue of software transactional memory. You can think of this as database transactional memory, or as database state distributed over time, but in an easy way […]

The case for getting rid of duplicate “sets”

May 17, 2011 | MySQL

The most useful feature of the relational database is that it allows us to easily process data in sets, which can be much faster than processing it serially. When the relational database was first implemented, write-ahead logging and other technologies did not exist. This made it difficult to implement the database in a way that matched […]

Checking the subset sum set problem with set processing

May 16, 2011 | MySQL

Hi, here is an easy way to run the subset sum check from SQL, which you can then distribute with Shard-Query:

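-- The list of values; the query below reads it as test.data.
CREATE TABLE `test`.`data` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `val` bigint(20) NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`),
  KEY `id` (`id`)
) ENGINE=MyISAM;

-- Count how many rows hold each value being checked. A value that is
-- missing from the output does not occur in the list at all.
SELECT val as `val`, COUNT(DISTINCT (id)) as `cd`, COUNT(*) as `CNT`
FROM  test.data as d  WHERE val in (-2,-3,-10,15,15,16)
GROUP BY val;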
+-----+----------+----------+
| val | cd       | CNT      |
+-----+----------+----------+
| -10 |        1 |        1 |
|  -3 |        1 |        1 |
|  -2 |        1 |        1 |
|  15 | 35417088 | 35417088 |
+-----+----------+----------+
5 rows in set (40.20 sec)

Notice that there is no 16 in the list, so we did not pass the check. There are enough 15s, though. The distinct value count for each item in the output set must at least […]
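The same check can also account for values that appear more than once in the candidate set, such as the two 15s above. The sketch below is an illustration, not the post's query: the temporary tables `list_values` (standing in for `test.data`) and `candidate` are hypothetical. It returns only the values that are missing or too scarce, so an empty result means the check passes.

```sql
-- Hypothetical sample list, standing in for test.data.
CREATE TEMPORARY TABLE list_values (val BIGINT NOT NULL);
INSERT INTO list_values VALUES (-10), (-3), (-2), (15), (15), (15);

-- Hypothetical multiset being checked; 15 is needed twice.
CREATE TEMPORARY TABLE candidate (val BIGINT NOT NULL);
INSERT INTO candidate VALUES (-2), (-3), (-10), (15), (15), (16);

-- Compare how often each candidate value is needed with how often it
-- occurs in the list. Each temporary table is referenced only once,
-- because MySQL cannot reopen a TEMPORARY table within one query.
-- Returns one row: val = 16, needed = 1, available = 0.
SELECT c.val, c.needed, COALESCE(a.available, 0) AS available
FROM (SELECT val, COUNT(*) AS needed FROM candidate GROUP BY val) AS c
LEFT JOIN (SELECT val, COUNT(*) AS available FROM list_values GROUP BY val) AS a
  ON a.val = c.val
WHERE COALESCE(a.available, 0) < c.needed;
```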

Using any general purpose computer as a special purpose SIMD computer

May 16, 2011 | MySQL

Often, from a computing perspective, one must run a function over a large amount of input. Frequently the same function must be run on many pieces of input, and this is a very expensive process unless the work can be done in parallel. Shard-Query introduces set-based processing, which on the surface appears […]

Distributed Set Processing with Shard-Query

May 14, 2011 | MySQL

Can Shard-Query scale to 20 nodes? Peter asked this question in the comments on my previous Shard-Query benchmark. Actually, he asked whether it could scale to 50, but testing 20 was all I could do due to EC2 and time limits. I think the results at 20 nodes are very useful for understanding the performance: […]

Distributed set processing performance analysis with ICE 3.5.2pl1 at 20 nodes.

May 14, 2011 | MySQL

Demonstrating distributed set processing performance: Shard-Query + ICE scales very well up to at least 20 nodes. This post is a detailed performance analysis of what I’ve coined “distributed set processing.” Please also read this post’s “sister post,” which describes the distributed set processing technique. Also, remember that Percona can help you get up and […]

Shard-Query EC2 images available

May 11, 2011 | MySQL

Infobright and InnoDB AMI images are now available. There are now demonstration AMI images for Shard-Query. Each image comes pre-loaded with the data used in the previous Shard-Query blog post. The data in each image is split into 20 “shards.” This blog post will refer to an EC2 instance as a node from here […]

Shard-Query turbo charges Infobright community edition (ICE)

May 06, 2011 | MySQL

Shard-Query is an open source toolkit that helps improve the performance of queries against a MySQL database by distributing the work over multiple machines and/or multiple cores. This is similar to the divide-and-conquer approach that Hive takes in combination with Hadoop. Shard-Query applies a clever approach to parallelism which allows it to […]
