Divide and conquer in the cloud: one big server or many small ones?
Amazon EC2 and other cloud providers now offer large machines with SSD storage. But is one big server with lots of very fast storage the best option for queries that must access large volumes of data (OLAP)? The large server is limited to 64GB of memory, and MySQL executes each query on a single thread, so one query can use only one CPU core. Perhaps spreading your data over eight servers with 17.1GB of memory each might cost the same (or less) and perform significantly better?
This talk will introduce Shard-Query, which spreads data over many servers while letting you treat the set as one big server. The focus, however, is on performance rather than on how Shard-Query works. The talk will compare the price/performance of OLAP queries on one "Quadruple Extra Large High-IO" server with that of eight "Extra Large High Memory" servers. While eight servers increase operational complexity, the performance improvement may very well justify that trade-off.
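To make the idea of treating many servers as one more concrete, here is a minimal Python sketch of the scatter-gather pattern that MPP query engines rely on. It is a hypothetical illustration, not Shard-Query's implementation: the shards are in-memory lists standing in for separate MySQL servers, and the names `shard_rows`, `partial_aggregate`, and `scatter_gather_avg` are invented for this example. Each shard computes a partial SUM and COUNT concurrently, and a coordinator merges the partials into an AVG.

```python
"""Hypothetical scatter-gather sketch; NOT Shard-Query's actual code."""
from concurrent.futures import ThreadPoolExecutor


def shard_rows(rows, n):
    # Distribute rows across n shards by position modulo n
    # (a stand-in for spreading data over n servers).
    return [rows[i::n] for i in range(n)]


def partial_aggregate(rows):
    # Runs once per shard: the equivalent of
    # SELECT SUM(x), COUNT(x) FROM t on that shard's server.
    return sum(rows), len(rows)


def scatter_gather_avg(shards):
    if not shards:
        return None
    # Scatter: one task per shard, run concurrently. In a real system each
    # task would be a network call to a separate MySQL server, so the
    # threads mostly wait on I/O while the servers do the work in parallel.
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(partial_aggregate, shards))
    # Gather: merge the partial SUMs and COUNTs, then divide once.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


if __name__ == "__main__":
    shards = shard_rows(list(range(1, 101)), 8)
    print(scatter_gather_avg(shards))  # 50.5
```

The coordinator merges SUM and COUNT instead of averaging per-shard averages, because shards can hold different numbers of rows and an average of averages would then be wrong.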
Shard-Query is an open source MPP query engine for MySQL: