September 1, 2014

High-Performance Click Analysis with MySQL

We have a lot of customers who do click analysis, site analytics, search engine marketing, online advertising, user behavior analysis, and many similar types of work.  The first thing these have in common is that they’re generally some kind of loggable event. The next characteristic of a lot of these systems (real or planned) is […]

Computing 95 percentile in MySQL

When doing performance analyzes you often would want to see 95 percentile, 99 percentile and similar values. The “average” is the evil of performance optimization and often as helpful as “average patient temperature in the hospital”. Lets set you have 10000 page views or queries and have average response time of 1 second. What does […]

How adding another table to JOIN can improve performance ?

JOINs are expensive and it most typical the fewer tables (for the same database) you join the better performance you will get. As for any rules there are however exceptions The one I’m speaking about comes from the issue with MySQL optimizer stopping using further index key parts as soon as there is a range […]

Missing Data – rows used to generate result set

As Baron writes it is not the number of rows returned by the query but number of rows accessed by the query will most likely be defining query performance. Of course not all row accessed are created equal (such as full table scan row accesses may be much faster than random index lookups row accesses […]

The MySQL optimizer, the OS cache, and sequential versus random I/O

In my post on estimating query completion time, I wrote about how I measured the performance on a join between a few tables in a typical star schema data warehousing scenario. In short, a query that could take several days to run with one join order takes an hour with another, and the optimizer chose […]

MySQL 6.0 vs 5.1 in TPC-H queries

Last week I played with queries from TPC-H benchmarks, particularly comparing MySQL 6.0.4-alpha with 5.1. MySQL 6.0 is interesting here, as there is a lot of new changes in optimizer, which should affect execution plan of TPC-H queries. In reality only two queries (from 22) have significantly better execution time (about them in next posts), […]

MySQL File System Fragmentation Benchmarks

Few days ago I wrote about testing writing to many files and seeing how this affects sequential read performance. I was very interested to see how it shows itself with real tables so I’ve got the script and ran tests for MyISAM and Innodb tables on ext3 filesystem. Here is what I found:

How much overhead is caused by on disk temporary tables

As you might know while running GROUP BY and some other kinds of queries MySQL needs to create temporary tables, which can be created in memory, using MEMORY storage engine or can be created on disk as MYISAM tables. Which one will be used depends on the allowed tmp_table_size and also by the data which […]

Long PRIMARY KEY for Innodb tables

I’ve written and spoke a lot about using short PRIMARY KEYs with Innodb tables due to the fact all other key will refer to the rows by primary key. I also recommended to use sequential primary keys so you do not end up having random primary key BTREE updates which can be very expensive. Today […]

MySQL EXPLAIN limits and errors.

Running EXPLAIN for problematic queries is very powerful tool for MySQL Performance optimization. If you’ve been using this tool a lot you probably noticed it is not always provide adequate information. Here is list of things you may wish to watch out. EXPLAIN can be wrong – this does not happen very often but it […]