October 31, 2014

Faster MySQL failover with SELECT mirroring

One of my favorite MySQL configurations for high availability is master-master replication, which is just like normal master-slave replication except that you can fail over in both directions. Aside from MySQL Cluster, which is more special-purpose, this is probably the best general-purpose way to get fast failover and a bunch of other benefits (non-blocking ALTER […]

COUNT(*) vs COUNT(col)

Looking at how people are using COUNT(*) and COUNT(col) it looks like most of them think they are synonyms and just using what they happen to like, while there is substantial difference in performance and even query result. Lets look at the following series of examples:

Using delayed JOIN to optimize count(*) and LIMIT queries

In many Search/Browse applications you would see main (fact) table which contains search fields and dimension tables which contain more information about facts and which need to be joined to get query result. If you’re executing count(*) queries for such result sets MySQL will perform the join even if you use LEFT JOIN so it […]

COUNT(*) for Innodb Tables

I guess note number one about MyISAM to Innodb migration is warning what Innodb is very slow in COUNT(*) queries. The part which I often however see omitted is fact it only applies to COUNT(*) queries without WHERE clause. So if you have query like SELECT COUNT(*) FROM USER It will be much faster for […]

Innodb transaction history often hides dangerous ‘debt’

In many write-intensive workloads Innodb/XtraDB storage engines you may see hidden and dangerous “debt” being accumulated – unpurged transaction “history” which if not kept in check over time will cause serve performance regression or will take all free space and cause an outage. Let’s talk about where it comes from and what can you do […]

Flexviews – part 3 – improving query performance using materialized views

Combating “data drift” In my first post in this series, I described materialized views (MVs). An MV is essentially a cached result set at one point in time. The contents of the MV will become incorrect (out of sync) when the underlying data changes. This loss of synchronization is sometimes called drift. This is conceptually […]

Using flow control functions for performance monitoring queries

I’m not big fan on flow control functions like IF or CASE used in MySQL Queries as they are often abused used to create queries which are poorly readable as well as can hardly be optimized well by MySQL Optimizer. One way I find IF statement very useful is computing multiple aggregates over different set […]

Speeding up GROUP BY if you want aproximate results

Doing performance analyzes today I wanted to count how many hits come to the pages which get more than couple of visits per day. We had SQL logs in the database so It was pretty simple query:

Unfortunately this query ran for over half an hour badly overloaded server and I had to kill […]

Using GROUP BY WITH ROLLUP for Reporting Performance Optimization

Quite typical query for reporting applications is to find top X values. If you analyze Web Site logs you would look at most popular web pages or search engine keywords which bring you most of the traffic. If you’re looking at ecommerce reporting you may be interested in best selling product or top sales people. […]

Do you always need index on WHERE column ?

I believe we wrote about this before, but this topic popups again and again. Today I’ve read opinion that if we have clause WHERE has_something=1 we should have index on column has_something (the column has two values 0 and 1). In reality the right answer is not so simple.