September 30, 2014

Why do you need many apache children ?

I already wrote kind of about same topic a while ago and now interesting real life case makes me to write again Most Web applications we’re working with have single tier web architecture, meaning there is just single set of apache servers server requests and nothing else – no dedicated server for static content, no […]

Cache Performance Comparison

Jay Pipes continues cache experiements and has compared performance of MySQL Query Cache and File Cache. Jay uses Apache Benchmark to compare full full stack, cached or not which is realistic but could draw missleading picture as contribution of different components may be different depending on your unique applications. For example for application containing a […]

Using InfiniDB MySQL server with Hadoop cluster for data analytics

In my previous post about Hadoop and Impala I benchmarked performance of analytical queries in Impala. This time I’ve tried InfiniDB for Hadoop (open-source version) on the modern hardware with an 8-node Hadoop cluster. One of the main advantages (at least for me) of InifiniDB for Hadoop is that it stores the data inside the Hadoop cluster but uses the […]

Join my Oct. 2 webinar: ‘Implementing MySQL and Hadoop for Big Data’

MySQL DBAs know that integrating MySQL and a big data solution can be challenging. That’s why I invite you to join me this Wednesday (Oct. 2) at 10 a.m. Pacific time for a free webinar in which I’ll walk you through how to implement a successful big data strategy with Apache Hadoop and MySQL. This […]

Why do we care about MySQL Performance at High Concurrency?

In many MySQL Benchmarks we can see performance compared with rather high level of concurrency. In some cases reaching 4,000 or more concurrent threads which hammer databases as quickly as possible resulting in hundreds or even thousands concurrently active queries. The question is how common is it in production ? The typical metrics to use […]

Distributed Set Processing with Shard-Query

Can Shard-Query scale to 20 nodes? Peter asked this question in comments to to my previous Shard-Query benchmark. Actually he asked if it could scale to 50, but testing 20 was all I could due to to EC2 and time limits. I think the results at 20 nodes are very useful to understand the performance: […]

Shard-Query turbo charges Infobright community edition (ICE)

Shard-Query is an open source tool kit which helps improve the performance of queries against a MySQL database by distributing the work over multiple machines and/or multiple cores. This is similar to the divide and conquer approach that Hive takes in combination with Hadoop. Shard-Query applies a clever approach to parallelism which allows it to […]

Just do the math!

One of the most typical reasons for performance and scalability problems I encounter is simply failing to do the math. And these are typically bad one because it often leads to implementing architectures which are not up for job they are intended to solve. Let me start with example to make it clear. Lets say […]

Detailed review of Tokutek storage engine

(Note: Review was done as part of our consulting practice, but is totally independent and fully reflects our opinion) I had a chance to take look TokuDB (the name of the Tokutek storage engine), and run some benchmarks. Tuning of TokuDB is much easier than InnoDB, there only few parameters to change, and actually out-of-box […]