October 21, 2014

Quickly finding unused indexes (and estimating their size)

I had a customer recently who needed to reduce their database size on disk quickly without a lot of messy schema redesign and application recoding.  They didn’t want to drop any actual data, and their index usage was fairly high, so we decided to look for unused indexes that could be removed. Collecting data It’s […]

SQL Injection Questions Followup

I presented a webinar today about SQL Injection, to try to clear up some of the misconceptions that many other blogs and articles have about this security risk.  You can register for the webinar even now that I’ve presented it, and you’ll be emailed a link to the recording, which will be available soon. During […]

Identifying the load with the help of pt-query-digest and Percona Server

Overview Profiling, analyzing and then fixing queries is likely the most oft-repeated part of a job of a DBA and one that keeps evolving, as new features are added to the application new queries pop up that need to be analyzed and fixed. And there are not too many tools out there that can make […]

The case for getting rid of duplicate “sets”

The most useful feature of the relational database is that it allows us to easily process data in sets, which can be much faster than processing it serially. When the relational database was first implemented, write-ahead-logging and other technologies did not exist. This made it difficult to implement the database in a way that matched […]

Distributed Set Processing with Shard-Query

Can Shard-Query scale to 20 nodes? Peter asked this question in comments to to my previous Shard-Query benchmark. Actually he asked if it could scale to 50, but testing 20 was all I could due to to EC2 and time limits. I think the results at 20 nodes are very useful to understand the performance: […]

Shard-Query turbo charges Infobright community edition (ICE)

Shard-Query is an open source tool kit which helps improve the performance of queries against a MySQL database by distributing the work over multiple machines and/or multiple cores. This is similar to the divide and conquer approach that Hive takes in combination with Hadoop. Shard-Query applies a clever approach to parallelism which allows it to […]

When the subselect runs faster

A few weeks ago, we had a query optimization request from one of our customer. The query was very simple like:

This column in the table is looks like this:

The table have 549252 rows and of course, there is an index on the col1. MySQL estimated the cardinality of that index as […]

Multi Column indexes vs Index Merge

The mistake I commonly see among MySQL users is how indexes are created. Quite commonly people just index individual columns as they are referenced in where clause thinking this is the optimal indexing strategy. For example if I would have something like AGE=18 AND STATE=’CA’ they would create 2 separate indexes on AGE and STATE […]

Multiple column index vs multiple indexes

(There is an updated version of the content in this post by Percona’s Stephane Combaudon available here.) After my previous post there were questions raised about Index Merge on Multiple Indexes vs Two Column Index efficiency. I mentioned in most cases when query can use both of the ways using multiple column index would be […]

Database access Optimization in Web Applications.

This is pretty simple approach I often use called to optimize web application performance if problem happens with few pages. If we have “everything is slow” problem looking at slow query logs may be better start. So what could you do ? Look at the information shown on the page which comes from database. This […]