September 1, 2014

Why message queues and offline processing are so important

If you read Percona’s whitepaper on Goal-Driven Performance Optimization, you will notice that we define performance as a combination of three separate terms. You really should read the whole paper, but let me summarize it here:

  1. Response Time – This is the time required to complete a desired task.
  2. Throughput – Throughput is measured in tasks completed per unit of time.
  3. Capacity – The system’s capacity is the point beyond which load cannot be increased without degrading response time past acceptable limits.

Setting and meeting your response time goal should always be your primary focus, but the closer throughput gets to capacity, the worse response time can be. It’s a trade-off! Cary Millsap reminds us to think of it the way traffic slows down as more cars crowd onto a highway:


Photo Credit: photoAtlas
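To put some rough numbers behind the traffic analogy, here is a small Python sketch using the textbook M/M/1 queueing approximation (the service time and throughput figures are invented for illustration; they are not taken from the whitepaper). It shows how average response time stays close to the bare service time at low utilization and then balloons as throughput approaches capacity.

    # Illustration of the response-time vs. throughput trade-off using the
    # classic M/M/1 queueing approximation: R = S / (1 - utilization),
    # where S is the service time of one task on an otherwise idle system.
    # The numbers are made up for effect.

    SERVICE_TIME_MS = 10.0                    # one task takes 10 ms when idle
    CAPACITY_TPS = 1000 / SERVICE_TIME_MS     # ~100 tasks/second at saturation

    for throughput_tps in (10, 50, 80, 90, 95, 99):
        utilization = throughput_tps / CAPACITY_TPS
        response_ms = SERVICE_TIME_MS / (1 - utilization)
        print(f"{throughput_tps:>3} tasks/s -> utilization {utilization:.0%}, "
              f"avg response time {response_ms:6.1f} ms")

At 10 tasks per second the average response time is barely above 11 ms; at 99 tasks per second it is roughly a full second, even though the work itself has not changed.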

Which brings me to my point.

You can actually choose to optimize a system in two different ways – for response time, or for throughput. When you optimize for throughput, you relax (but do not eliminate) your response time objectives in order to complete more tasks per unit of time.
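To make that relaxation concrete, here is a hedged sketch of one common way to trade per-task response time for throughput: batching writes into a single transaction. It uses a throwaway in-memory SQLite table purely so the example runs anywhere; the same idea applies to grouping INSERTs on a MySQL server, and the exact numbers will vary.

    import sqlite3
    import time

    # Illustrative only: an in-memory SQLite table stands in for a real table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER, payload TEXT)")
    rows = [(i, "payload-%d" % i) for i in range(10000)]

    # Optimized for response time: commit every row, so each caller's write
    # is durable as soon as possible.
    start = time.perf_counter()
    for row in rows:
        conn.execute("INSERT INTO events VALUES (?, ?)", row)
        conn.commit()
    per_row = time.perf_counter() - start

    # Optimized for throughput: group all rows into one transaction. Any single
    # row may wait longer before it is durable, but far more rows land per second.
    conn.execute("DELETE FROM events")
    start = time.perf_counter()
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    conn.commit()
    batched = time.perf_counter() - start

    print("commit-per-row: %.2fs, one batch: %.2fs" % (per_row, batched))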

It is much easier to relax response time objectives when the task is not user-facing, which is why, when I look at applications, I often suggest converting a task that currently happens in the foreground into a job that is sent to a message queue such as Gearman. Or in plain English: the same MySQL servers can get through much more work if you allow each individual task the potential to take a little bit longer.
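As a sketch of what that conversion looks like, the snippet below uses Python's in-process queue.Queue and a worker thread as a stand-in for an external job server such as Gearman, and send_welcome_email() is an invented placeholder for the slow, non-urgent task. The user-facing code path only pays for the enqueue; the task itself completes a little later in the background.

    import queue
    import threading
    import time

    jobs = queue.Queue()   # stand-in for an external broker such as Gearman

    def send_welcome_email(user_id):
        time.sleep(0.5)    # pretend this is slow, non-urgent work
        print("welcome email sent to user %d" % user_id)

    def worker():
        while True:
            user_id = jobs.get()
            send_welcome_email(user_id)   # response time here is relaxed
            jobs.task_done()

    threading.Thread(target=worker, daemon=True).start()

    def handle_signup(user_id):
        jobs.put(user_id)                 # the request only pays for the enqueue
        return "signup complete"          # returns to the user immediately

    print(handle_signup(42))
    jobs.join()                           # demo only: wait for background work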

About Morgan Tocker

Morgan is a former Percona employee.
He was the Director of Training at Percona. He was formerly a Technical Instructor for MySQL and Sun Microsystems. He also previously worked in the MySQL Support Team and provided DRBD support.

Comments

  1. Cary Millsap calls this technique “latency hiding.” I’m not sure that he invented the term, but it’s a good one. One of the key things about sipping from a queue is that arrival times don’t vary. Randomly distributed arrival times are one of the biggest reasons why query response times can be so irregular for user-facing systems.

  2. Andy says:

    A pertinent post, and a very interesting white paper. Thanks.

  3. Michael says:

    Great writeup — but it’s the visual (the photo) that really brings this concept out. I travel that darn bridge approach several times a week, and I’ve now got a whole new way to fully visualize/comprehend the meaning of the white paper :)
