Another look at improving TPC-H-like queries – Q17

Posted on:



Share Button

Summary: An alternate approach, offered in response to our original post, provides excellent improvements for smaller databases, but clustered indexes offer better performance as database size increases. (This posting is by Dave.)

Jay Pipes suggested an alternate approach to improving MySQL performance of Query 17 on a TPC-H-like database.

  1. Add the index (l_partkey, l_quantity) to the lineitem table.
  2. Re-write the query as:

I ran this suggestion on MySQL with Tokutek’s TokuDB storage engine (as before) against a 10G Scale Factor database. After waiting several hours for the EXPLAIN to complete, I punted and moved to a smaller, 1G Scale Factor database. This database nearly fits in memory, and allowed me to examine the query plan

Create the index:


Notice that although the planner sees the li_pkey_quan_idx, it does not use it.

Run the Query:

The resulting query time (66.73s) is an improvement on the default query (155.83s). Nonetheless, when you run this query against the larger, 10G Scale Factor (SF) database, it does not complete after several hours. I observed that the intermediate tables can be held in memory for SF = 1G, but are written to disk for SF = 10G. This is a well-known peril of intermediate tables.

Undeterred, I tried forcing the query planner to make use of the li_pkey_quan_idx. I found that telling the planner to use the index in both the 3rd and 4th line of plan, I could get substantial speed-ups:



Now we’re talking. 12.38s vs. 155.83s (12x improvement). I’m sure this is closer to what Jay was hoping for. Emboldened, I ran the modified query against the SF = 10G database:

So Jay’s ideas offer a solution that does not involve clustering keys and produces a result in a manageable timeframe. However, relative to clustering keys, it is still 4028 / 101 = 40x slower.

Share Button

Tokutek, TokuView


Leave a Reply