Author - Justin Swanhart

Advanced JSON to MySQL indexing

This post will discuss some excellent methods of getting JSON to MySQL indexing to work smoothly.
What is JSON?
JSON is a text-based, human-readable format for transmitting data between systems, for serializing objects, and for storing documents in document stores, where each document may have different attributes or a different schema. Popular document store databases use JSON […]


‘Indexing’ JSON documents for efficient MySQL queries over JSON data

MySQL meets NoSQL with JSON UDF
I recently got back from FOSDEM in Brussels, Belgium. While I was there I got to see a great talk by Sveta Smirnova about her MySQL 5.7 Labs release of the JSON UDFs. It is important to note that while the UDFs come in a 5.7 release it is absolutely […]


How to scale big data applications using MySQL sharding frameworks

This Wednesday I’ll be discussing two common types of big data: machine-generated data and user-generated content. These types of big data are amenable to sharding, a commonly used technique for spreading data over more than one database server.
I’ll be discussing this in depth during a live webinar at 10 a.m. Pacific time on Sept. 24. […]
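As a rough illustration of what "spreading data over more than one database server" can look like, here is a minimal Python sketch of hash-based shard routing. It is not taken from any particular sharding framework: the shard names, `shard_for`, and `distribute` are hypothetical, and a real setup would map each shard name to an actual MySQL server.

```python
import zlib

# Hypothetical shard names; in practice each would map to a database server.
SHARDS = ["shard0", "shard1", "shard2", "shard3"]


def shard_for(key, shards=SHARDS):
    """Pick a shard by hashing the shard key.

    crc32 is used because it is stable across processes and runs,
    unlike Python's built-in hash() for strings.
    """
    return shards[zlib.crc32(str(key).encode("utf-8")) % len(shards)]


def distribute(rows, key_field, shards=SHARDS):
    """Group rows by the shard their key hashes to."""
    placement = {name: [] for name in shards}
    for row in rows:
        placement[shard_for(row[key_field], shards)].append(row)
    return placement


# Example: user-generated content sharded by user id (made-up rows).
rows = [{"user_id": i, "comment": "comment %d" % i} for i in range(100)]
placement = distribute(rows, "user_id")
```

All rows for a given key land on the same shard, so a lookup by that key only needs to touch one server. The trade-off of simple modulo hashing is that changing the number of shards moves most keys.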


Generating test data from the mysql> prompt

There are a lot of tools that generate test data.  Many of them have complex XML scripts or GUIs that let you specify characteristics of the data. For testing query performance and many other applications, however, a simple, quick-and-dirty data generator that can be constructed at the MySQL command line is […]


Trawling the binlog with FlexCDC and new FlexCDC plugins for MySQL

Swanhart-Tools includes FlexCDC, a change data capture tool for MySQL. FlexCDC follows a server’s binary log and usually writes “changelogs” that track the changes to tables in the database. I say usually because the latest version of Swanhart-Tools (only on GitHub for now) supports FlexCDC plugins, which allow you to send the updates to a […]


Parallel Query for MySQL with Shard-Query

While Shard-Query can work over multiple nodes, this blog post focuses on using Shard-Query with a single node.  Shard-Query can add parallelism to queries which use partitioned tables.  Very large tables can often be partitioned fairly easily. Shard-Query can leverage partitioning to add parallelism, because each partition can be queried independently. Because MySQL 5.6 supports […]
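The general idea of querying partitions independently and then combining the results can be sketched in a few lines of Python. This is only an illustration of the technique, not Shard-Query's actual implementation: the partitions here are plain in-memory lists, and `partial_sum` and `parallel_avg` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(partition):
    """Per-partition work: return (sum, row count) for one partition."""
    return sum(partition), len(partition)


def parallel_avg(partitions, workers=4):
    """Compute AVG over all partitions by combining partial aggregates.

    Each partition is aggregated independently (here in a thread pool),
    then the partial sums and counts are merged into the final answer.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, partitions))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


# Three made-up "partitions" of one logical table column.
result = parallel_avg([[1, 2, 3], [4, 5], [6]])
```

Note that the average is rebuilt from per-partition sums and counts rather than from per-partition averages, which would give the wrong answer when partitions differ in size.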


MySQL webinar: ‘Introduction to open source column stores’

Join me Wednesday, September 18 at 10 a.m. PDT for an hour-long webinar where I will introduce the basic concepts behind column store technology. The webinar’s title is: “Introduction to open source column stores.”
What will be discussed?
This webinar will cover Infobright, LucidDB, MonetDB, Hadoop (Impala) and other column stores.

I will compare features between […]


MySQL and the SSB – Part 2 – MyISAM vs InnoDB low concurrency

This blog post is part two in what is now a continuing series on the Star Schema Benchmark.
In my previous blog post I compared MySQL 5.5.30 to MySQL 5.6.10, both with default settings using only the InnoDB storage engine.  In my testing I discovered that innodb_old_blocks_time had an effect on performance of the benchmark.  There was […]


MySQL 5.6 vs MySQL 5.5 and the Star Schema Benchmark

So far most of the benchmarks posted about MySQL 5.6 use the sysbench OLTP workload.  I wanted to test a set of queries which, unlike sysbench, utilize joins.  I also wanted an easily reproducible set of data which is richer than the simple sysbench […]


Webinar: Building a highly scaleable distributed row, document or column store with MySQL and Shard-Query

On Friday, February 15, 2013, at 10:00 a.m. Pacific Standard Time, I will be delivering a webinar entitled “Building a highly scaleable distributed row, document or column store with MySQL and Shard-Query”.
The first part of this webinar will focus on why distributed databases are needed, and on the techniques employed by Shard-Query to implement a distributed […]
