Where the open source database community meets: Use code PERCONA75 and secure your spot for Percona Live.  Register

Speeding up GROUP BY if you want aproximate results

March 7, 2008
Author
Peter Zaitsev
Share this Post:

Doing performance analyzes today I wanted to count how many hits come to the pages which get more than couple of visits per day. We had SQL logs in the database so It was pretty simple query:

Unfortunately this query ran for over half an hour badly overloaded server and I had to kill it in the end.

The reason for slowness was of course huge temporary table was required (there were about 5 million of distinct pages visited during that day) which resulted in on disk temporary table which as we know quite slow.

Of course it would be possible to allocate more memory to the temporary table or switch to filesort method and get result faster.

I however picked another road which is quite helpful in similar cases – I did not need exact result but approximate figure so I could trick MySQL to do group by a hash of the page instead of page itself:

As you can see now it completes in about 30 seconds – quite handy.

Another trick I want to share which I use a lot when I want to analyze data distribution but table is to large is to just limit it to first number of rows:

Again this is not exact value but normally close enough to make a decision.

0 0 votes
Article Rating
Subscribe
Notify of
guest

0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments

Far
Enough.

Said no pioneer ever.
MySQL, PostgreSQL, InnoDB, MariaDB, MongoDB and Kubernetes are trademarks for their respective owners.
© 2026 Percona All Rights Reserved