Pipelining and transactions are two important features in Redis and Valkey. Both involve sending multiple commands together. But they solve completely different problems, make completely different guarantees, and combining them incorrectly is one of the most common sources of subtle bugs in production systems.
In this post we’ll unpack both from first principles, look at how Lua scripting and Valkey Functions fit into the picture, and cover what actually changes when you move to cluster mode.
Before we can really appreciate pipelining, we need to consider this uncomfortable truth: your Redis or Valkey server is almost certainly not the bottleneck in your system. The network is.
A typical Valkey instance can handle hundreds of thousands of operations per second. But if your network round-trip latency is 250ms, your maximum throughput from a single synchronous client is just four requests per second. The server sits idle most of the time, waiting for packets to arrive. With many concurrent clients the problem is smaller, but it is still worth optimising.
Every individual command follows a round trip: the client sends a request, waits for the server to process it, reads the response. This is the Round Trip Time, or RTT. And unless you do something about it, every single command pays this cost in full.
Pipelining is the solution to the RTT problem. Instead of sending commands one at a time and waiting for each response, you buffer a batch of commands and flush them all in a single write to the socket. The server processes them and sends back all the responses together.
```python
# Without pipelining — 3 round trips
client.set('key1', 'a')  # wait...
client.set('key2', 'b')  # wait...
client.set('key3', 'c')  # wait...

# With pipelining — 1 round trip regardless of batch size
pipe = client.pipeline()
pipe.set('key1', 'a')
pipe.set('key2', 'b')
pipe.set('key3', 'c')
results = pipe.execute()
```
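The round-trip arithmetic behind these numbers is easy to model without a server. A back-of-the-envelope sketch (the 250ms RTT and batch size of 1,000 are illustrative assumptions): sequential commands pay the RTT once per command, while a pipelined batch pays it once per flush.

```python
import math

RTT = 0.250  # assumed network round-trip time in seconds

def sequential_seconds(n_commands: int, rtt: float = RTT) -> float:
    """Each command waits out a full round trip before the next is sent."""
    return n_commands * rtt

def pipelined_seconds(n_commands: int, batch: int = 1000, rtt: float = RTT) -> float:
    """One round trip per flushed batch, regardless of batch contents."""
    return math.ceil(n_commands / batch) * rtt

# 10,000 writes at 250ms RTT:
#   sequential: 2500.0 seconds, pipelined in batches of 1,000: 2.5 seconds
```

The model ignores server processing time entirely, which is exactly the point: at this latency, processing time is noise next to the network cost.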
The performance gains are real and significant: in production systems, pipelining routinely delivers 10–20x throughput improvements for write-heavy workloads. Swiggy's migration to pipelined batch writes, covered in more detail in the case studies below, cut their Redis write time by 90%.
The throughput gain isn’t just about fewer round trips. There’s a second, less obvious benefit: system call reduction.
Every individual command requires the operating system to transition from user space to kernel space via read() and write() calls. These context switches are not free. When you pipeline, multiple commands are handled by a single read() and their responses sent via a single write(). The reduction in context switching is a meaningful contributor to the performance improvement at high throughput.
Pipelining is not a free lunch. There are a few things to keep in mind.
Commands in a pipeline can partially succeed. If three commands are sent and the second one fails, the first has already been applied and the third will still execute. If you need all-or-nothing semantics, you need transactions.
The server has to buffer all responses until the final command in the pipeline is processed. In practice, batches of around 1,000 commands strike a good balance between RTT reduction and memory pressure. Don’t pipeline everything blindly.
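A common way to honour that cap is to chunk large write sets and flush one pipeline per chunk. A minimal sketch, assuming a redis-py-style `client` object (the helper names are mine, not a library API):

```python
def chunked(items, size=1000):
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def batched_set(client, pairs, batch_size=1000):
    """Write (key, value) pairs in pipelined batches of `batch_size`.

    One round trip per batch keeps RTT savings while bounding the
    server-side response buffer to at most `batch_size` replies.
    """
    for batch in chunked(pairs, batch_size):
        pipe = client.pipeline(transaction=False)  # plain pipeline, no MULTI/EXEC
        for key, value in batch:
            pipe.set(key, value)
        pipe.execute()
```

Note `transaction=False`: for bulk loading you want a plain pipeline, not an atomic transaction per batch.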
Each client connection has a server-side output buffer. If a client can’t read responses fast enough, this buffer grows. The omem field in CLIENT LIST will tell you if you have slow consumers before they hit the hard limit and get disconnected.
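CLIENT LIST returns one line per connection of space-separated field=value pairs, so flagging slow consumers is a small parsing job. A sketch with a hypothetical 100 KB threshold (the sample line in the test is illustrative, not real server output):

```python
def parse_client_line(line: str) -> dict:
    """Parse one CLIENT LIST line of 'field=value' tokens into a dict."""
    fields = {}
    for token in line.split():
        if '=' in token:
            key, _, value = token.partition('=')
            fields[key] = value
    return fields

def slow_consumers(client_list_output: str, omem_threshold: int = 100_000):
    """Return (addr, omem) for clients whose output buffer exceeds the threshold."""
    offenders = []
    for line in client_list_output.splitlines():
        fields = parse_client_line(line)
        if int(fields.get('omem', 0)) > omem_threshold:
            offenders.append((fields.get('addr'), int(fields['omem'])))
    return offenders
```

Run this periodically against `CLIENT LIST` output and you catch growing buffers before clients hit the hard limit and get disconnected.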
Transactions in Redis and Valkey are initiated with MULTI and executed with EXEC. Commands issued between the two are queued rather than executed immediately. When EXEC is called, the server runs all queued commands sequentially and returns all results at once.
```python
pipe = client.pipeline(transaction=True)
pipe.multi()
pipe.decrby('balance:alice', 50)
pipe.incrby('balance:bob', 50)
pipe.set('last_tx', 'alice->bob')
results = pipe.execute()
# e.g. [50, 150, True] — all or nothing
```
Transactions provide two guarantees:

- **Serial, isolated execution:** all queued commands run as a single sequential unit; no command from another client is ever served in the middle of a transaction.
- **All-or-nothing queuing:** when EXEC is called, either every queued command runs, or (if a command was rejected while queuing) none of them do.

There are two fundamentally different types of errors in a transaction, and they behave very differently.
Queue-time errors occur before EXEC: things like syntax errors, wrong argument counts, or out-of-memory conditions. If any command is rejected at queue time, the entire transaction is discarded when EXEC is called. You get an EXECABORT error and nothing runs.
Exec-time errors occur during EXEC: for example, running LPOP on a key that holds a string. These errors apply only to that specific command. The rest of the transaction continues executing, and you get back a mixed result array containing both successful results and error objects.
```python
pipe = client.pipeline(transaction=True)
pipe.multi()
pipe.set('mystr', 'hello')   # OK
pipe.lpop('mystr')           # will fail — type mismatch
pipe.incr('counter')         # will still run!
results = pipe.execute(raise_on_error=False)
# e.g. [True, ResponseError('WRONGTYPE ...'), 1] — partial execution
```
The absence of rollbacks is intentional. Implementing rollbacks would add significant complexity, and the Redis/Valkey philosophy is that exec-time errors are almost always programming bugs: type mismatches that should have been caught in development, not handled in production. The performance trade-off for a rollback mechanism isn’t worth it for an in-memory datastore.
Transactions solve the isolation problem but they don’t protect you from race conditions in read-modify-write patterns. Consider this scenario: you read a balance, calculate a new value, another client does the same thing, and you both write back. The result: the second write silently overwrites the first, a classic lost update. In relational databases this is handled with pessimistic locking: you take a lock on the balance row, and no other transaction can change the balance until you release it.
WATCH solves this with optimistic locking. You watch one or more keys before starting a MULTI block. If any watched key is modified by another client before your EXEC is called, the transaction is aborted and nil is returned. You retry.
```python
from redis.exceptions import WatchError

def transfer(r, from_key, to_key, amount):
    while True:
        with r.pipeline() as pipe:
            try:
                pipe.watch(from_key, to_key)
                # after WATCH, the pipeline executes commands immediately
                balance = int(pipe.get(from_key))
                if balance < amount:
                    raise ValueError('insufficient funds')
                pipe.multi()
                pipe.decrby(from_key, amount)
                pipe.incrby(to_key, amount)
                pipe.execute()  # raises WatchError if a watched key changed
                return True
            except WatchError:
                continue  # another client touched a watched key — retry
```
WATCH is highly efficient in low-contention environments: the cost is only paid when a conflict actually occurs. In high-contention scenarios, where many clients contend for the same keys, the retry loop can become expensive. This is where Lua scripting is the better approach.
Lua scripts are executed atomically on the server. No other commands run while a script is executing. Unlike transactions, scripts can read data and make conditional decisions within the same atomic block, which eliminates the need for WATCH in many cases.
```lua
-- Atomic rate limiter
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local current = redis.call('INCR', key)
if current == 1 then
    redis.call('EXPIRE', key, 60)
end
if current > limit then
    return 0  -- denied
end
return 1  -- allowed
```
This script reads the current counter, increments it, sets the expiry on the first call, and returns the allow/deny decision, all in one atomic server-side operation. With MULTI/EXEC you’d need WATCH, multiple round trips, and a retry loop to achieve the same result.
The comparison between transactions and Lua is worth making explicit:
| Feature | MULTI/EXEC | Lua Script |
|---|---|---|
| Conditional logic | Not inside MULTI | Full if/else/loops |
| Network trips | Multiple | One |
| Atomicity | During EXEC only | Entire script |
| Read-then-write | Requires WATCH + retry | Implicit |
| Performance | Medium | High |
For high-frequency operations, you don’t want to send the full Lua script body over the network on every call. Load the script once:
```
SCRIPT LOAD "...lua script..."
# Returns a SHA-1 hash
```
Then reference it by hash:
```
EVALSHA <sha1> 1 user:42 100
```
This is the EVALSHA pattern — the script body is cached on the server, and you only send the lightweight hash on each call. It’s particularly valuable for operations like rate limiting and leaderboard updates that fire thousands of times per second.
EVALSHA has one significant operational weakness: the script cache is not persisted. It lives in memory only and is flushed on restart. After a failover, the new primary has no knowledge of any cached scripts. Your first EVALSHA call gets a NOSCRIPT error and you have to reload.
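The standard defence is the try-EVALSHA-then-EVAL fallback, which most client libraries implement for you (redis-py's register_script() wraps this same pattern). A minimal duck-typed sketch, assuming `client` exposes evalsha()/eval() and raises an error whose message contains NOSCRIPT on a cache miss:

```python
import hashlib

def cached_eval(client, script: str, keys=(), args=()):
    """Call a Lua script by SHA-1, reloading it if the server cache was flushed.

    SCRIPT LOAD returns the SHA-1 hex digest of the script body, so the
    hash can be computed client-side without a round trip.
    """
    sha = hashlib.sha1(script.encode()).hexdigest()
    try:
        return client.evalsha(sha, len(keys), *keys, *args)
    except Exception as exc:  # real clients raise a specific ResponseError subclass
        if 'NOSCRIPT' not in str(exc):
            raise
        # Cache miss (e.g. after a failover): EVAL both runs and caches the script.
        return client.eval(script, len(keys), *keys, *args)
```

The fallback costs one extra round trip exactly once per script per server lifetime, so the steady-state path is still the lightweight hash.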
Valkey Functions, introduced in Redis 7.0 and carried forward into Valkey, solve this properly. Functions are named, library-organised scripts that are stored in the keyspace and replicated through the normal replication stream. They survive restarts, and after a primary failure and failover the new primary already has your functions, so no reloading is required as it would be with an EVALSHA-based Lua workflow.
```
# Load the library once — it persists through restarts and replication
FUNCTION LOAD "#!lua name=ratelimit
redis.register_function('rl', function(keys, args) ... end)"

# Call by name — no script body on the wire
FCALL rl 1 user:42 100
```
The distinction is important for production systems: EVALSHA is a performance optimisation; Functions are an operational guarantee. For anything you’d have previously relied on EVALSHA for, Functions (FCALL) are now the right answer.
As data volumes grow or you need more throughput than a single instance can provide, you need to shard across multiple nodes. A Valkey cluster divides the keyspace into 16,384 hash slots using CRC16:
```
slot = CRC16(key) mod 16384
```
Each node owns a subset of these slots. This works transparently for single-key operations, but it introduces constraints for anything involving multiple keys.
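The slot computation is simple enough to reproduce client-side. A sketch of the algorithm from the cluster specification (CRC16-CCITT, XMODEM variant), including the hash-tag rule that only a non-empty `{...}` portion of the key is hashed:

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM variant): polynomial 0x1021, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Map a key to one of the 16,384 cluster hash slots."""
    start = key.find('{')
    if start != -1:
        end = key.find('}', start + 1)
        if end > start + 1:  # hash tag must be non-empty
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384

# key_slot('foo') == 12182, matching CLUSTER KEYSLOT foo
```

Real clients cache the slot-to-node map from CLUSTER SLOTS/SHARDS; this function is only the hashing half of that routing.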
Multi-key operations (transactions, Lua scripts, commands like MSET) need all involved keys to reside in the same slot. Attempt to span slots and you get a CROSSSLOT error. Hash tags solve this: by enclosing a portion of the key in curly braces, you force the cluster to hash only that portion.
```
# These will land on the same slot
{user:1001}:profile
{user:1001}:settings
{user:1001}:activity
```
Hash tags introduce a risk: if your tag has low cardinality, you’ll concentrate all traffic on a single node while others idle. Using status values like {PENDING} as hash tags means all pending tasks land on one node. The fix is to use high-cardinality identifiers as the tag, which maintains locality while keeping the distribution even.
Cluster mode creates a specific challenge for pipelining. A standard pipeline assumes all commands go to the same Valkey server. In a cluster, keys in the same pipeline batch may hash to different slots on different nodes.
Most client libraries handle this by grouping commands by destination slot and sending a sub-pipeline to each relevant node, typically done serially, paying one round trip per node. If your batch touches five nodes, you wait for five sequential round trips. Some libraries simply throw an error for cross-slot pipelines.
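The grouping step itself is straightforward. A sketch where `key_slot` and `slot_owner` (the slot-to-node lookup from the cluster's slot map) are passed in as callables, since real clients maintain both internally; the command tuple shape is my own convention:

```python
from collections import defaultdict

def group_by_node(commands, key_slot, slot_owner):
    """Split a pipeline batch into per-node sub-batches.

    `commands` is an iterable of (index, name, key, *args) tuples; the index
    is carried along so responses can be reassembled in submission order
    once the per-node sub-pipelines return.
    """
    groups = defaultdict(list)
    for cmd in commands:
        key = cmd[2]
        groups[slot_owner(key_slot(key))].append(cmd)
    return dict(groups)
```

Whether the resulting sub-pipelines are then sent serially or in parallel is exactly the difference described next.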
PhysicsWallah solved this by building a custom clusterPipeline library that fans sub-pipelines out to all relevant nodes in parallel, then reassembles results in the original command order. The latency is determined by the slowest node, not the sum of all nodes — a significant improvement at scale.
If you’re using Valkey, the official client library – Valkey Glide – does this natively. It maintains the slot map, routes sub-pipelines to each node in parallel, and reassembles results correctly. This is one of the concrete operational advantages of using Glide over legacy clients like Jedis or redis-py when running in cluster mode.
Uber uses Redis as the coordination layer for financial batch processing, handling over 150 million reads and managing real-time ledger updates. Their architecture uses pipelining to group account operations into 250ms time-bounded batches, and designates the Redis instance hosting each batch as the authoritative clock to avoid clock drift across availability zones. Atomic Lua scripts manage the state transitions between batch phases – creation, execution, completion – thus ensuring financial holds and credits are applied without race conditions.
Swiggy found that 50–70% of their ML feature job uptime was being consumed by Redis writes. By moving to pipelined batch writes using Jedis in cluster mode, they cut that write overhead by 90% and reduced overall AWS EC2 and Databricks infrastructure costs by around 60%. The gains came from eliminating CPU context switches and per-command RTT overhead.
PhysicsWallah needed bulk cluster operations but found that standard libraries couldn’t efficiently pipeline across multiple nodes. Their custom clusterPipeline library calculates hash slots for an entire batch, groups commands by responsible node, executes node-specific pipelines in parallel, and returns results in original submission order. The approach significantly improved API response times compared to individual command execution, and it’s essentially the same pattern that Valkey Glide now provides out of the box.
- **Batch size:** cap pipelines at around 1,000 commands. Larger batches queue responses server-side and can exhaust memory.
- **Lua over WATCH:** for read-modify-write flows, prefer Lua scripts or Valkey Functions over MULTI/EXEC with WATCH. Fewer round trips, no retry loop, implicitly atomic.
- **Hash tag cardinality:** use IDs as hash tags in cluster mode. Low-cardinality tags (status values, boolean flags) create hot slots that negate horizontal scaling.
- **Use Valkey Glide for clusters:** if you’re running cluster mode, Glide’s parallel sub-pipeline routing is meaningfully better than the serial approach in most legacy clients.
- **Disable THP:** Transparent Huge Pages cause multi-millisecond latency spikes during RDB fork operations. Add `echo never > /sys/kernel/mm/transparent_hugepage/enabled` to your server startup.
- **TCP_NODELAY:** Nagle’s algorithm buffers small packets for efficiency, the opposite of what you want for sub-millisecond latency. Ensure your client enables TCP_NODELAY.
- **Connection pooling:** persistent pools reduce handshake overhead and avoid port exhaustion under load.
Pipelining, transactions, and scripting each solve a different problem, and the right choice depends on what you’re actually trying to protect. For throughput, pipeline. For consistency, script. For both, design your keys carefully, understand your cluster topology, and use Valkey Glide.