October 23, 2014

Checking the subset sum set problem with set processing

Hi, Here is an easy way to run the subset sum check from SQL, which you can then distribute with Shard-Query:

Notice there is no 16 in the list. We did not pass the check. There are enough 15s though. The distinct value count for each item in the output set, must at least […]

Distributed set processing performance analysis with ICE 3.5.2pl1 at 20 nodes.

Demonstrating distributed set processing performance Shard-Query + ICE scales very well up to at least 20 nodes This post is a detailed performance analysis of what I’ve coined “distributed set processing”. Please also read this post’s “sister post” which describes the distributed set processing technique. Also, remember that Percona can help you get up and […]

Distributed Set Processing with Shard-Query

Can Shard-Query scale to 20 nodes? Peter asked this question in comments to to my previous Shard-Query benchmark. Actually he asked if it could scale to 50, but testing 20 was all I could due to to EC2 and time limits. I think the results at 20 nodes are very useful to understand the performance: […]

Percona XtraDB Cluster: Setting up a simple cluster

Percona XtraDB Cluster (PXC) is different enough from async replication that it can be a bit of a puzzle how to do things the Galera way.  This post will attempt to illustrate the basics of setting up 2 node PXC cluster from scratch. Requirements Two servers (could be VMs) that can talk to each other. […]

Preprocessing Data

There are many ways of improving response times for users. There are some people that spend a lot of time, energy and money on trying to have the application respond as fast as possible at the time when the users made the request. Those people may miss out on an opportunity to do some or […]

The case for getting rid of duplicate “sets”

The most useful feature of the relational database is that it allows us to easily process data in sets, which can be much faster than processing it serially. When the relational database was first implemented, write-ahead-logging and other technologies did not exist. This made it difficult to implement the database in a way that matched […]

Why message queues and offline processing are so important

If you read Percona’s whitepaper on Goal-Driven Performance Optimization, you will notice that we define performance using the combination of three separate terms. You really want to read the paper, but let me summarize it here: Response Time – This is the time required to complete a desired task. Throughput – Throughput is measured in […]

High availability for MySQL on Amazon EC2 – Part 2 – Setting up the initial instances

This post is the second of a series that started here. The first step to build the HA solution is to create two working instances, configure them to be EBS based and create a security group for them. A third instance, the client, will be discussed in part 7. Since this will be a proof […]

PHP Large result sets and summary tables.

We’re working with web site preparing for massive growth. To make sure it handles large data sets as part of the process we work on generation test database of significant size as testing your application on table with 1000 rows may well give you very dangerous false sense of security. One of the process web […]

Percona XtraDB Cluster 5.6.19-25.6 is now available

Percona is glad to announce the new release of Percona XtraDB Cluster 5.6 on July 21st 2014. Binaries are available from downloads area or from our software repositories. We’re also happy to announce that Ubuntu 14.04 LTS users can now download, install, and upgrade Percona XtraDB Cluster 5.6 from Percona’s software repositories. Based on Percona […]