Database access Optimization in Web Applications.Peter Zaitsev
This is pretty simple approach I often use called to optimize web application performance if problem happens with few pages. If we have “everything is slow” problem looking at slow query logs may be better start.
So what could you do ?
Look at the information shown on the page which comes from database. This is important as this is information you need to have retrieved from database one way or another if you want to keep page look the same. Your task is simply to have this information delivered from the database most efficient way.
Investigate how dynamic is this information If information rarely changes or is static it may be better to pre-create or cache it. There are many cache and pre-creation techniques which you can use. Just remember avoiding accessing database is the best way you can optimize database access. This applies to anything else – if you can avoid dynamic page generation at all and have server cache to serve it it is even better.
Check if information retrieved from the database matches information you display Way to often much more information is retrieved from the database than used to generate the page. It can be mild as SELECT * FROM tbl instead of listing columns you actually need or it can be as bad as running SELECT * FROM tbl to count amount of rows in the table (no joking). Sometimes you would see 100 stories selected so one random of them will be displayed and similar things with filtering on application level. Sometimes you can do it more efficient on application level but generally you should try your queries to return you only information you need.
Check number of rows which are used to generate result set This is very important and often forgotten step. Some people think query is simple if it returns few rows while what really matters is amount of rows are analyzed by the query. For example SELECT COUNT(*) FROM links WHERE domain = ‘mysql.com’; will return only one row while may require to scan hundreds of thousands of rows (or index entries) to do so. Other common killer queries are GROUP BY Queries and Sort Queries – SELECT name,descr FROM titles ORDER BY rank DESC LIMIT 10 – If there is no proper index and query has to use “filesort” it may become a bit problem. Other thing to mention is JOINs – Joins are expensive (relatively of course) and for sure increase amount of rows which need to be used to generate result set – if you have to join 10 tables to compose the object it is much slower than getting the same data by single row read.
Check number of rows which actually need to be used to generate result sets Sometimes query needs to use many rows to provide result set just because schema is not optimally indexed – this is easy. For example our ORDER BY rank query is such – adding index on rank column maxes this query to use 10 rows to return 10 rows – exactly what we’re looking for. However our COUNT(*) query is different – even if you have index on domain it still can require a lot of rows scanned to provide result set. Such queries need to be redesigned rather than simply adjusted – for example summary table which holds number of links per domain would do in this case.
Check number of queries If you can retrieve same amount of data with one query it is better than doing so by several queries, assuming this query would not need to analyze much more rows because it is optimized differently. One of typical examples here would be SELECT * FROM tbl WHERE id=5 executed many times with different constants – it is worth to replace such queries with query using IN(5,7,4,56) instead. Do not however be obsessed with this. I’ve seen people trying to join all queries in single UNION (with padding to accommodate different types and number of columns) – this is not a good practice. However if you can reduce number of queries without a lot of complications to application architecture it is good thing to do.