Search Results for: queries with group by and aggregate

Using Apache Hadoop and Impala together with MySQL for data analysis

Apache Hadoop is commonly used for data analysis. It is scalable and fast for data loads. In a previous post I showed how to integrate MySQL with Hadoop. In this post I will show how to export a table from MySQL to Hadoop, load the data into Cloudera Impala (columnar format) and run a reporting […]

How to Identify Bad Queries in MySQL

Finding bad queries is a big part of optimization. A scientific optimization process can be simplified to “can anything be improved for less than it costs not to improve it? – if not, we’re done.” In databases, we care most about the work the database is doing. That is, queries. There are other things we […]

Parallel Query for MySQL with Shard-Query

While Shard-Query can work over multiple nodes, this blog post focuses on using Shard-Query with a single node. Shard-Query can add parallelism to queries that use partitioned tables. Very large tables can often be partitioned fairly easily. Shard-Query can leverage partitioning to add parallelism, because each partition can be queried independently. Because MySQL 5.6 supports the […]

Introducing Percona Cloud Tools for MySQL

I am proud to announce Percona Cloud Tools, the next generation of tools for MySQL. I have been developing tools for MySQL for 10 years. For the last 5 years, I have been developing Percona Toolkit (formerly “Maatkit”). Almost 1 year ago, we began developing Percona Cloud Tools (PCT), first in-house, then in private beta, […]

Distributed Set Processing with Shard-Query

Can Shard-Query scale to 20 nodes? Peter asked this question in the comments to my previous Shard-Query benchmark. Actually he asked if it could scale to 50, but testing 20 was all I could do due to EC2 and time limits. I think the results at 20 nodes are very useful for understanding the performance: […]