Pipelining and transactions are two important features in Redis and Valkey. Both involve sending multiple commands together. But they solve completely different problems, make completely different guarantees, and combining them incorrectly is one of the most common sources of subtle bugs in production systems.
In this post we’ll unpack both from first principles, look at how Lua scripting and Valkey Functions fit into the picture, and cover what actually changes when you move to cluster mode.
Before we can really appreciate pipelining, we need to consider this uncomfortable truth: your Redis or Valkey server is almost certainly not the bottleneck in your system. The network is.
A typical Valkey instance can handle hundreds of thousands of operations per second. But if your network round-trip latency is 250ms, your maximum throughput from a single synchronous client is just four requests per second. The server sits idle most of the time, waiting for packets to arrive. With many concurrent clients the problem is smaller, but it is still worth optimising.
Every individual command follows a round trip: the client sends a request, waits for the server to process it, reads the response. This is the Round Trip Time, or RTT. And unless you do something about it, every single command pays this cost in full.
Pipelining is the solution to the RTT problem. Instead of sending commands one at a time and waiting for each response, you buffer a batch of commands and flush them all in a single write to the socket. The server processes them and sends back all the responses together.
```python
# Without pipelining — 3 round trips
client.set('key1', 'a')  # wait...
client.set('key2', 'b')  # wait...
client.set('key3', 'c')  # wait...

# With pipelining — 1 round trip regardless of batch size
pipe = client.pipeline()
pipe.set('key1', 'a')
pipe.set('key2', 'b')
pipe.set('key3', 'c')
results = pipe.execute()
```
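The round-trip arithmetic behind these numbers is easy to model without a server. A back-of-the-envelope sketch (the 250ms RTT and batch size of 1,000 are illustrative assumptions): sequential commands pay the RTT once per command, while a pipelined batch pays it once per flush.

```python
import math

RTT = 0.250  # assumed network round-trip time in seconds

def sequential_seconds(n_commands: int, rtt: float = RTT) -> float:
    """Each command waits out a full round trip before the next is sent."""
    return n_commands * rtt

def pipelined_seconds(n_commands: int, batch: int = 1000, rtt: float = RTT) -> float:
    """One round trip per flushed batch, regardless of batch contents."""
    return math.ceil(n_commands / batch) * rtt

# 10,000 writes at 250ms RTT:
#   sequential: 2500.0 seconds, pipelined in batches of 1,000: 2.5 seconds
```

The model ignores server processing time entirely, which is exactly the point: at this latency, processing time is noise next to the network cost.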
The performance gains are real and significant: in production systems, pipelining routinely delivers 10–20x throughput improvements for write-heavy workloads. Swiggy's migration to pipelined batch writes, covered in more detail in the case studies below, cut their Redis write time by 90%.
The throughput gain isn’t just about fewer round trips. There’s a second, less obvious benefit: system call reduction.
Every individual command requires the operating system to transition from user space to kernel space via read() and write() calls. These context switches are not free. When you pipeline, multiple commands are handled by a single read() and their responses sent via a single write(). The reduction in context switching is a meaningful contributor to the performance improvement at high throughput.
Pipelining is not a free lunch. There are a few things to keep in mind.
Commands in a pipeline can partially succeed. If three commands are sent and the second one fails, the first has already been applied and the third will still execute. If you need all-or-nothing semantics, you need transactions.
The server has to buffer all responses until the final command in the pipeline is processed. In practice, batches of around 1,000 commands strike a good balance between RTT reduction and memory pressure. Don’t pipeline everything blindly.
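A common way to honour that cap is to chunk large write sets and flush one pipeline per chunk. A minimal sketch, assuming a redis-py-style `client` object (the helper names are mine, not a library API):

```python
def chunked(items, size=1000):
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def batched_set(client, pairs, batch_size=1000):
    """Write (key, value) pairs in pipelined batches of `batch_size`.

    One round trip per batch keeps RTT savings while bounding the
    server-side response buffer to at most `batch_size` replies.
    """
    for batch in chunked(pairs, batch_size):
        pipe = client.pipeline(transaction=False)  # plain pipeline, no MULTI/EXEC
        for key, value in batch:
            pipe.set(key, value)
        pipe.execute()
```

Note `transaction=False`: for bulk loading you want a plain pipeline, not an atomic transaction per batch.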
Each client connection has a server-side output buffer. If a client can’t read responses fast enough, this buffer grows. The omem field in CLIENT LIST will tell you if you have slow consumers before they hit the hard limit and get disconnected.
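CLIENT LIST returns one line per connection of space-separated field=value pairs, so flagging slow consumers is a small parsing job. A sketch with a hypothetical 100 KB threshold (the sample line in the test is illustrative, not real server output):

```python
def parse_client_line(line: str) -> dict:
    """Parse one CLIENT LIST line of 'field=value' tokens into a dict."""
    fields = {}
    for token in line.split():
        if '=' in token:
            key, _, value = token.partition('=')
            fields[key] = value
    return fields

def slow_consumers(client_list_output: str, omem_threshold: int = 100_000):
    """Return (addr, omem) for clients whose output buffer exceeds the threshold."""
    offenders = []
    for line in client_list_output.splitlines():
        fields = parse_client_line(line)
        if int(fields.get('omem', 0)) > omem_threshold:
            offenders.append((fields.get('addr'), int(fields['omem'])))
    return offenders
```

Run this periodically against `CLIENT LIST` output and you catch growing buffers before clients hit the hard limit and get disconnected.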
Transactions in Redis and Valkey are initiated with MULTI and executed with EXEC. Commands issued between the two are queued rather than executed immediately. When EXEC is called, the server runs all queued commands sequentially and returns all results at once.
```python
pipe = client.pipeline(transaction=True)
pipe.multi()
pipe.decrby('balance:alice', 50)
pipe.incrby('balance:bob', 50)
pipe.set('last_tx', 'alice->bob')
results = pipe.execute()
# e.g. [50, 150, True] — all or nothing
```
Transactions provide two guarantees:

- **Serial, isolated execution:** all queued commands run as a single sequential unit; no command from another client is ever served in the middle of a transaction.
- **All-or-nothing queuing:** when EXEC is called, either every queued command runs, or (if a command was rejected while queuing) none of them do.

There are two fundamentally different types of errors in a transaction, and they behave very differently.
Queue-time errors occur before EXEC: things like syntax errors, wrong argument counts, or out-of-memory conditions. If any command is rejected at queue time, the entire transaction is discarded when EXEC is called. You get an EXECABORT error and nothing runs.
Exec-time errors occur during EXEC: for example, running LPOP on a key that holds a string. These errors apply only to that specific command. The rest of the transaction continues executing, and you get back a mixed result array containing both successful results and error objects.
```python
pipe = client.pipeline(transaction=True)
pipe.multi()
pipe.set('mystr', 'hello')   # OK
pipe.lpop('mystr')           # will fail — type mismatch
pipe.incr('counter')         # will still run!
results = pipe.execute(raise_on_error=False)
# e.g. [True, ResponseError('WRONGTYPE ...'), 1] — partial execution
```
The absence of rollbacks is intentional. Implementing rollbacks would add significant complexity, and the Redis/Valkey philosophy is that exec-time errors are almost always programming bugs: type mismatches that should have been caught in development, not handled in production. The performance trade-off for a rollback mechanism isn’t worth it for an in-memory datastore.
Transactions solve the isolation problem but they don’t protect you from race conditions in read-modify-write patterns. Consider this scenario: you read a balance, calculate a new value, another client does the same thing, and you both write back. The result: the second write silently overwrites the first, a classic lost update. In relational databases this is handled with pessimistic locking: you take a lock on the balance row, and no other transaction can change the balance until you release it.
WATCH solves this with optimistic locking. You watch one or more keys before starting a MULTI block. If any watched key is modified by another client before your EXEC is called, the transaction is aborted and nil is returned. You retry.
```python
from redis.exceptions import WatchError

def transfer(r, from_key, to_key, amount):
    while True:
        with r.pipeline() as pipe:
            try:
                pipe.watch(from_key, to_key)
                # after WATCH, the pipeline executes commands immediately
                balance = int(pipe.get(from_key))
                if balance < amount:
                    raise ValueError('insufficient funds')
                pipe.multi()
                pipe.decrby(from_key, amount)
                pipe.incrby(to_key, amount)
                pipe.execute()  # raises WatchError if a watched key changed
                return True
            except WatchError:
                continue  # another client touched a watched key — retry
```
WATCH is highly efficient in low-contention environments: the cost is only paid when a conflict actually occurs. In high-contention scenarios, where many clients contend for the same keys, the retry loop can become expensive. This is where Lua scripting is the better approach.
Lua scripts are executed atomically on the server. No other commands run while a script is executing. Unlike transactions, scripts can read data and make conditional decisions within the same atomic block, which eliminates the need for WATCH in many cases.
```lua
-- Atomic rate limiter
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local current = redis.call('INCR', key)
if current == 1 then
    redis.call('EXPIRE', key, 60)
end
if current > limit then
    return 0  -- denied
end
return 1  -- allowed
```
This script reads the current counter, increments it, sets the expiry on the first call, and returns the allow/deny decision, all in one atomic server-side operation. With MULTI/EXEC you’d need WATCH, multiple round trips, and a retry loop to achieve the same result.
The comparison between transactions and Lua is worth making explicit:
| Feature | MULTI/EXEC | Lua Script |
|---|---|---|
| Conditional logic | Not inside MULTI | Full if/else/loops |
| Network trips | Multiple | One |
| Atomicity | During EXEC only | Entire script |
| Read-then-write | Requires WATCH + retry | Implicit |
| Performance | Medium | High |
For high-frequency operations, you don’t want to send the full Lua script body over the network on every call. Load the script once:
```
SCRIPT LOAD "...lua script..."
# Returns a SHA-1 hash
```
Then reference it by hash:
```
EVALSHA <sha1> 1 user:42 100
```
This is the EVALSHA pattern — the script body is cached on the server, and you only send the lightweight hash on each call. It’s particularly valuable for operations like rate limiting and leaderboard updates that fire thousands of times per second.
EVALSHA has one significant operational weakness: the script cache is not persisted. It lives in memory only and is flushed on restart. After a failover, the new primary has no knowledge of any cached scripts. Your first EVALSHA call gets a NOSCRIPT error and you have to reload.
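The standard defence is the try-EVALSHA-then-EVAL fallback, which most client libraries implement for you (redis-py's register_script() wraps this same pattern). A minimal duck-typed sketch, assuming `client` exposes evalsha()/eval() and raises an error whose message contains NOSCRIPT on a cache miss:

```python
import hashlib

def cached_eval(client, script: str, keys=(), args=()):
    """Call a Lua script by SHA-1, reloading it if the server cache was flushed.

    SCRIPT LOAD returns the SHA-1 hex digest of the script body, so the
    hash can be computed client-side without a round trip.
    """
    sha = hashlib.sha1(script.encode()).hexdigest()
    try:
        return client.evalsha(sha, len(keys), *keys, *args)
    except Exception as exc:  # real clients raise a specific ResponseError subclass
        if 'NOSCRIPT' not in str(exc):
            raise
        # Cache miss (e.g. after a failover): EVAL both runs and caches the script.
        return client.eval(script, len(keys), *keys, *args)
```

The fallback costs one extra round trip exactly once per script per server lifetime, so the steady-state path is still the lightweight hash.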
Valkey Functions, introduced in Redis 7.0 and carried forward into Valkey, solve this properly. Functions are named, library-organised scripts that are stored in the keyspace and replicated through the normal replication stream. They survive restarts, and after a primary failure and failover the new primary already has your functions, so no reloading is required as it would be with an EVALSHA-based Lua workflow.
```
# Load the library once — it persists through restarts and replication
FUNCTION LOAD "#!lua name=ratelimit
redis.register_function('rl', function(keys, args) ... end)"

# Call by name — no script body on the wire
FCALL rl 1 user:42 100
```
The distinction is important for production systems: EVALSHA is a performance optimisation; Functions are an operational guarantee. For anything you’d have previously relied on EVALSHA for, Functions (FCALL) are now the right answer.
As data volumes grow or you need more throughput than a single instance can provide, you need to shard across multiple nodes. A Valkey cluster divides the keyspace into 16,384 hash slots using CRC16:
```
slot = CRC16(key) mod 16384
```
Each node owns a subset of these slots. This works transparently for single-key operations, but it introduces constraints for anything involving multiple keys.
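The slot computation is simple enough to reproduce client-side. A sketch of the algorithm from the cluster specification (CRC16-CCITT, XMODEM variant), including the hash-tag rule that only a non-empty `{...}` portion of the key is hashed:

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM variant): polynomial 0x1021, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Map a key to one of the 16,384 cluster hash slots."""
    start = key.find('{')
    if start != -1:
        end = key.find('}', start + 1)
        if end > start + 1:  # hash tag must be non-empty
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384

# key_slot('foo') == 12182, matching CLUSTER KEYSLOT foo
```

Real clients cache the slot-to-node map from CLUSTER SLOTS/SHARDS; this function is only the hashing half of that routing.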
Multi-key operations (transactions, Lua scripts, commands like MSET) need all involved keys to reside in the same slot. Attempt to span slots and you get a CROSSSLOT error. Hash tags solve this: by enclosing a portion of the key in curly braces, you force the cluster to hash only that portion.
```
# These will land on the same slot
{user:1001}:profile
{user:1001}:settings
{user:1001}:activity
```
Hash tags introduce a risk: if your tag has low cardinality, you’ll concentrate all traffic on a single node while others idle. Using status values like {PENDING} as hash tags means all pending tasks land on one node. The fix is to use high-cardinality identifiers as the tag, which maintains locality while keeping the distribution even.
Cluster mode creates a specific challenge for pipelining. A standard pipeline assumes all commands go to the same Valkey server. In a cluster, keys in the same pipeline batch may hash to different slots on different nodes.
Most client libraries handle this by grouping commands by destination slot and sending a sub-pipeline to each relevant node, typically done serially, paying one round trip per node. If your batch touches five nodes, you wait for five sequential round trips. Some libraries simply throw an error for cross-slot pipelines.
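The grouping step itself is straightforward. A sketch where `key_slot` and `slot_owner` (the slot-to-node lookup from the cluster's slot map) are passed in as callables, since real clients maintain both internally; the command tuple shape is my own convention:

```python
from collections import defaultdict

def group_by_node(commands, key_slot, slot_owner):
    """Split a pipeline batch into per-node sub-batches.

    `commands` is an iterable of (index, name, key, *args) tuples; the index
    is carried along so responses can be reassembled in submission order
    once the per-node sub-pipelines return.
    """
    groups = defaultdict(list)
    for cmd in commands:
        key = cmd[2]
        groups[slot_owner(key_slot(key))].append(cmd)
    return dict(groups)
```

Whether the resulting sub-pipelines are then sent serially or in parallel is exactly the difference described next.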
PhysicsWallah solved this by building a custom clusterPipeline library that fans sub-pipelines out to all relevant nodes in parallel, then reassembles results in the original command order. The latency is determined by the slowest node, not the sum of all nodes — a significant improvement at scale.
If you’re using Valkey, the official client library – Valkey Glide – does this natively. It maintains the slot map, routes sub-pipelines to each node in parallel, and reassembles results correctly. This is one of the concrete operational advantages of using Glide over legacy clients like Jedis or redis-py when running in cluster mode.
Uber uses Redis as the coordination layer for financial batch processing, handling over 150 million reads and managing real-time ledger updates. Their architecture uses pipelining to group account operations into 250ms time-bounded batches, and designates the Redis instance hosting each batch as the authoritative clock to avoid clock drift across availability zones. Atomic Lua scripts manage the state transitions between batch phases – creation, execution, completion – thus ensuring financial holds and credits are applied without race conditions.
Swiggy found that 50–70% of their ML feature job uptime was being consumed by Redis writes. By moving to pipelined batch writes using Jedis in cluster mode, they cut that write overhead by 90% and reduced overall AWS EC2 and Databricks infrastructure costs by around 60%. The gains came from eliminating CPU context switches and per-command RTT overhead.
PhysicsWallah needed bulk cluster operations but found that standard libraries couldn’t efficiently pipeline across multiple nodes. Their custom clusterPipeline library calculates hash slots for an entire batch, groups commands by responsible node, executes node-specific pipelines in parallel, and returns results in original submission order. The approach significantly improved API response times compared to individual command execution, and it’s essentially the same pattern that Valkey Glide now provides out of the box.
- **Batch size:** cap pipelines at around 1,000 commands. Larger batches queue responses server-side and can exhaust memory.
- **Lua over WATCH:** for read-modify-write flows, prefer Lua scripts or Valkey Functions over MULTI/EXEC with WATCH. Fewer round trips, no retry loop, implicitly atomic.
- **Hash tag cardinality:** use IDs as hash tags in cluster mode. Low-cardinality tags (status values, boolean flags) create hot slots that negate horizontal scaling.
- **Use Valkey Glide for clusters:** if you’re running cluster mode, Glide’s parallel sub-pipeline routing is meaningfully better than the serial approach in most legacy clients.
- **Disable THP:** Transparent Huge Pages cause multi-millisecond latency spikes during RDB fork operations. Add `echo never > /sys/kernel/mm/transparent_hugepage/enabled` to your server startup.
- **TCP_NODELAY:** Nagle’s algorithm buffers small packets for efficiency, the opposite of what you want for sub-millisecond latency. Ensure your client enables TCP_NODELAY.
- **Connection pooling:** persistent pools reduce handshake overhead and avoid port exhaustion under load.
Pipelining, transactions, and scripting each solve a different problem, and the right choice depends on what you’re actually trying to protect. For throughput, pipeline. For consistency, script. For both, design your keys carefully, understand your cluster topology, and use Valkey Glide.