Search Results for: queries with group by and aggregate

Auditing MySQL with McAfee and MongoDB

Greetings everyone! Let’s discuss a 3rd Party auditing solution to MySQL and how we can leverage MongoDB® to make sense out of all of that data. The McAfee MySQL Audit plugin does a great job of capturing, at low level, activities within a MySQL server. It does this through some non-standard APIs which is why installing […]

Using Apache Hadoop and Impala together with MySQL for data analysis

Apache Hadoop is commonly used for data analysis. It is fast for data loads and scalable. In a previous post I showed how to integrate MySQL with Hadoop. In this post I will show how to export a table from  MySQL to Hadoop, load the data to Cloudera Impala (columnar format) and run a reporting […]

How to Identify Bad Queries in MySQL

Finding bad queries is a big part of optimization. A scientific optimization process can be simplified to “can anything be improved for less than it costs not to improve it? – if not, we’re done.” In databases, we care most about the work the database is doing. That is, queries. There are other things we […]

Parallel Query for MySQL with Shard-Query

While Shard-Query can work over multiple nodes, this blog post focuses on using Shard-Query with a single node.  Shard-Query can add parallelism to queries which use partitioned tables.  Very large tables can often be partitioned fairly easily. Shard-Query can leverage partitioning to add paralellism, because each partition can be queried independently. Because MySQL 5.6 supports the […]

Introducing Percona Cloud Tools for MySQL

I am proud to announce Percona Cloud Tools, the next generation of tools for MySQL.  I have been developing tools for MySQL for 10 years.  For the last 5 years, I have been developing Percona Toolkit (formerly “Maatkit”).  Almost 1 year ago, we began developing Percona Cloud Tools (PCT), first in-house, then in private beta, […]

Distributed Set Processing with Shard-Query

Can Shard-Query scale to 20 nodes? Peter asked this question in comments to to my previous Shard-Query benchmark. Actually he asked if it could scale to 50, but testing 20 was all I could due to to EC2 and time limits. I think the results at 20 nodes are very useful to understand the performance: […]