A paginated display is one of the top optimization scenarios we see in the real world. Search results pages, leaderboards, and most-popular lists are good examples. You know the design pattern: display 20 results in some most-relevant order. Show a “next” and “previous” link. And usually, show how many items are in the whole list and how many pages of results there are.
Rendering such a display can consume more resources than the entire rest of the site!
As an example, I’m looking at slow log analysis results (with our microslow patches, set to log all queries) for one client; the slow log contains 6300 seconds’ worth of queries, and the two main queries for the paginated display consumed 2850 and 380 seconds, respectively.
Why is it so expensive? I typically see queries like this:
select .... from ... order by .... limit X, 20
If the ORDER BY can’t use an index (commonly the case), it uses a filesort. Suppose there are a million rows that meet any WHERE conditions. That means a million rows are retrieved, stored, filesorted, then most of them are discarded and only 20 retrieved. If the user clicks the “next” button the same process happens again, only a different 20 are retrieved. And to show the list of pages and the total count, you either a) use SQL_CALC_FOUND_ROWS (see our post on this) or b) execute a separate SELECT to count the rows.
There are ways to optimize so you don’t have to do quite so much offsetting and limiting. I wrote about this on O’Reilly’s website in an article on optimizing ranked data. But frankly it’s not that easy to do in the real world; you can usually optimize for one access method to the data at some significant cost in complexity and maintenance (which might be worth it) but not for many different ways of accessing the same data, which is more typical in websites we work on.
Beyond indexing, re-organizing data, or query optimizations, there are two big things you can do. One is caching aggressively to prevent these queries from running. The other is to rethink the paradigm. Just because everyone lays out such pages in the same way doesn’t mean you need to. Think about how you use such pages. Do you really go clicking directly to the Nth page of results, or the last page? “Hmm, it found 13928 results. Let me look at the least relevant search results for my query.” Generally not — you usually look at the most helpful stuff, which is supposed to be first in the list.
With that in mind, here are four suggestions for optimizing paginated displays that can give significantly better performance.
These suggestions can take a lot of work off the database server without impacting the user’s experience at all.