September 21, 2014

The relationship between Innodb Log checkpointing and dirty Buffer pool pages

This is a time-honored topic, and there’s no shortage of articles on the topic on this blog. I wanted to write a post trying to condense and clarify those posts, as it has taken me a while to really understand this relationship. Some basic facts Most of us know that writing into Innodb updates buffer […]

Optimizing InnoDB for creating 30,000 tables (and nothing else)

Once upon a time, it would have been considered madness to even attempt to create 30,000 tables in InnoDB. That time is now a memory. We have customers with a lot more tables than a mere 30,000. There have historically been no tests for anything near this many tables in the MySQL test suite. So, […]

Aligning IO on a hard disk RAID – the Benchmarks

In the first part of this article I have showed how I align IO, now I want to share results of the benchmark that I have been running to see how much benefit can we get from a proper IO alignment on a 4-disk RAID1+0 with 64k stripe element. I haven’t been running any benchmarks […]

Using any general purpose computer as a special purpose SIMD computer

Often times, from a computing perspective, one must run a function on a large amount of input. Often times, the same function must be run on many pieces of input, and this is a very expensive process unless the work can be done in parallel. Shard-Query introduces set based processing, which on the surface appears […]

Distributed Set Processing with Shard-Query

Can Shard-Query scale to 20 nodes? Peter asked this question in comments to to my previous Shard-Query benchmark. Actually he asked if it could scale to 50, but testing 20 was all I could due to to EC2 and time limits. I think the results at 20 nodes are very useful to understand the performance: […]

Connecting orphaned .ibd files

There are two ways InnoDB can organize tablespaces. First is when all data, indexes and system buffers are stored in a single tablespace. This is typicaly one or several ibdata files. A well known innodb_file_per_table option brings the second one. Tables and system areas are split into different files. Usually system tablespace is located in […]

Shard-Query EC2 images available

Infobright and InnoDB AMI images are now available There are now demonstration AMI images for Shard-Query. Each image comes pre-loaded with the data used in the previous Shard-Query blog post. The data in the each image is split into 20 “shards”. This blog post will refer to an EC2 instances as a node from here […]

Innodb Caching (part 2)

Few weeks ago I wrote about Innodb Caching with main idea you might need more cache when you think you are because Innodb caches data in pages, not rows, and so the whole page needs to be in memory even if you need only one row from it. I have created the simple benchmark which […]

Shard-Query turbo charges Infobright community edition (ICE)

Shard-Query is an open source tool kit which helps improve the performance of queries against a MySQL database by distributing the work over multiple machines and/or multiple cores. This is similar to the divide and conquer approach that Hive takes in combination with Hadoop. Shard-Query applies a clever approach to parallelism which allows it to […]

The two even more fundamental performance metrics

In a recent blog post, I wrote about four fundamental metrics for system performance analysis. These are throughput, residence time, “weighted time” (the sum of all residence times in the observation period — the terminology is mine for lack of a better name), and concurrency. I derived all of these metrics from two “even more […]