November 28, 2014

Measuring the amount of writes in InnoDB redo logs

Choosing a good InnoDB log file size is key to InnoDB write performance. This can be done by measuring the amount of writes in the redo logs. You can find a detailed explanation in this post. To sum up, here are the main points: The redo logs should be large enough to store at most […]

Distributed Set Processing with Shard-Query

Can Shard-Query scale to 20 nodes? Peter asked this question in comments to to my previous Shard-Query benchmark. Actually he asked if it could scale to 50, but testing 20 was all I could due to to EC2 and time limits. I think the results at 20 nodes are very useful to understand the performance: […]

Shard-Query turbo charges Infobright community edition (ICE)

Shard-Query is an open source tool kit which helps improve the performance of queries against a MySQL database by distributing the work over multiple machines and/or multiple cores. This is similar to the divide and conquer approach that Hive takes in combination with Hadoop. Shard-Query applies a clever approach to parallelism which allows it to […]

Converting Character Sets

The web is going the way of utf8. Drizzle has chosen it as the default character set, most back-ends to websites use it to store text data, and those who are still using latin1 have begun to migrate their databases to utf8. Googling for “mysql convert charset to utf8″ results in a plethora of sites, […]

New SpecJAppServer results at MySQL and Sun.

As you likely have seen Sun has posted the new SpecJAppServer Results More information from Tom Daly can be found here These results are quite interesting for me as I worked on some of the previous SpecJAppServer Benchmarks several years ago while being employed by MySQL. These are great results, plus they can be relevant […]

Sphinx 0.9.8 is released just in time for OSCON 2008

As you probably already seen in a post by Baron, Sphinx Release 0.9.8 is finally out, just in time for OSCON 2008. Even though it is “minor release” if you look at the number, it is major release in practice (and you can view snapshots as minor releases). The changes since 0.9.7 are dramatic with […]

Missing Data – rows used to generate result set

As Baron writes it is not the number of rows returned by the query but number of rows accessed by the query will most likely be defining query performance. Of course not all row accessed are created equal (such as full table scan row accesses may be much faster than random index lookups row accesses […]

Updated msl (microslow) patch, installation walk-through!

For a couple of months there have been no updates to our msl patch, however recently I managed some time to change this. The functionality was extended a little bit and what’s even more important the patch is available for all the recent MySQL releases. To remind anyone who has not yet come across this […]

MySQL Optimizer team comments on TPC-H Results

Yesterday I had a chance to speak to Igor – head of MySQL optimizer team and Timur – both of them expressed concern with TPC-H run results I posted and notes about little gains in MySQL 6.0. Do not get this post wrong. I’m not saying MySQL 6.0 SubQuery optimizations are non existent or priorities […]

Using InfiniDB MySQL server with Hadoop cluster for data analytics

In my previous post about Hadoop and Impala I benchmarked performance of analytical queries in Impala. This time I’ve tried InfiniDB for Hadoop (open-source version) on the modern hardware with an 8-node Hadoop cluster. One of the main advantages (at least for me) of InifiniDB for Hadoop is that it stores the data inside the Hadoop cluster but uses the […]