Four ways to optimize paginated displaysBaron Schwartz
A paginated display is one of the top optimization scenarios we see in the real world. Search results pages, leaderboards, and most-popular lists are good examples. You know the design pattern: display 20 results in some most-relevant order. Show a “next” and “previous” link. And usually, show how many items are in the whole list and how many pages of results there are.
Rendering such a display can consume more resources than the entire rest of the site!
As an example, I’m looking at slow log analysis results (with our microslow patches, set to log all queries) for one client; the slow log contains 6300 seconds’ worth of queries, and the two main queries for the paginated display consumed 2850 and 380 seconds, respectively.
Why is it so expensive? I typically see queries like this:
select .... from ... order by .... limit X, 20
If the ORDER BY can’t use an index (commonly the case), it uses a filesort. Suppose there are a million rows that meet any WHERE conditions. That means a million rows are retrieved, stored, filesorted, then most of them are discarded and only 20 retrieved. If the user clicks the “next” button the same process happens again, only a different 20 are retrieved. And to show the list of pages and the total count, you either a) use SQL_CALC_FOUND_ROWS (see our post on this) or b) execute a separate SELECT to count the rows.
There are ways to optimize so you don’t have to do quite so much offsetting and limiting. I wrote about this on O’Reilly’s website in an article on optimizing ranked data. But frankly it’s not that easy to do in the real world; you can usually optimize for one access method to the data at some significant cost in complexity and maintenance (which might be worth it) but not for many different ways of accessing the same data, which is more typical in websites we work on.
Beyond indexing, re-organizing data, or query optimizations, there are two big things you can do. One is caching aggressively to prevent these queries from running. The other is to rethink the paradigm. Just because everyone lays out such pages in the same way doesn’t mean you need to. Think about how you use such pages. Do you really go clicking directly to the Nth page of results, or the last page? “Hmm, it found 13928 results. Let me look at the least relevant search results for my query.” Generally not — you usually look at the most helpful stuff, which is supposed to be first in the list.
With that in mind, here are four suggestions for optimizing paginated displays that can give significantly better performance.
- On the first query, fetch and cache all the results. Now it’s easy to know how many results there are, and fetching subsequent pages is no extra work for the database. In this model, you get to keep your “found X rows, showing page N of M” display that many people cherish.
- Don’t show all results. Not even Google lets you see the millionth result. You get to see N results and after that, you’re done. Limit the results to 100 or 500 or something. Remember, the further you go into the list, the more rows you are scanning and discarding with that LIMIT. This technique works great in conjunction with the first one. If you want to show 500 results, maybe you can fetch 501 and if the 501st row exists, display “more than 500 results found.”
- Don’t show the total count or the intermediate links to other pages. Show only the “next” link. (If people want to see the “previous” results, they can use their browser’s back button.) You can do this by fetching one more result than you want to display — for example, fetch 21 rows and display only 20. If there’s a 21st row, render the “next” link; if not, you’re at the end of the list. This way you don’t have to calculate how many results there are, and if caching is difficult then this is a simple way to avoid some of the costs.
- Estimate how many results there are. Again, Google does this and nobody complains. Use EXPLAIN and look at the “rows” column — that’s a fine estimate for some scenarios. (This tip doesn’t work in as many scenarios as others, but it’s still acceptable in many.)
These suggestions can take a lot of work off the database server without impacting the user’s experience at all.