When scaling Redis or its open source fork Valkey, a single instance can become a bottleneck. The solution is a sharded cluster, which partitions data across multiple nodes. Understanding how this partitioning works is crucial for designing efficient, scalable applications. This article explores the mechanics of key distribution, the use of hash tags for data locality, and the potential pitfalls of this powerful feature.
The Hash Slot Model
A Redis/Valkey cluster does not apply a consistent hashing ring directly to node identities. Instead, it uses a concept called hash slots: the system divides the entire keyspace into 16,384 slots (numbered 0 to 16383).
When a key needs to be stored, the cluster determines which slot it belongs to by applying a CRC16 hashing algorithm to the key’s name. The formula is simple:
slot = CRC16(key) % 16384
The cluster assigns each primary node a subset of the 16,384 slots. For example, in a three-node cluster:
- Node A holds slots 0 – 5460.
- Node B holds slots 5461 – 10922.
- Node C holds slots 10923 – 16383.
When you send a command like SET mykey "value" to any node in the cluster, that node first calculates CRC16("mykey") % 16384. If the resulting slot belongs to the node that received the command, it processes the command. If the slot belongs to a different node, it replies with a MOVED redirection, and the client retries the command against the correct node, which then stores the key. This distribution model ensures that the cluster spreads data relatively evenly across all nodes.
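To make the routing concrete, here is a minimal Python sketch of that slot calculation. It assumes the CRC16 variant documented in the cluster specification (XMODEM: polynomial 0x1021, initial value 0x0000); the function names are illustrative, not part of any client library.

```python
# Minimal sketch of the cluster's slot calculation (CRC16/XMODEM mod 16384).
# Illustrative only; real cluster clients ship an equivalent routine plus a
# slot-to-node map they learn from the cluster itself.

def crc16_xmodem(data: bytes) -> int:
    """Bit-by-bit CRC16 (XMODEM variant): poly 0x1021, initial value 0x0000."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Map a key name to one of the 16,384 hash slots."""
    return crc16_xmodem(key.encode()) % 16384

slot = key_slot("mykey")
print(slot)  # A value in 0..16383; compare with CLUSTER KEYSLOT mykey on a live cluster.
# In the three-node layout above, a slot <= 5460 lives on node A,
# 5461-10922 on node B, and 10923-16383 on node C.
```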
To learn more about the key distribution model, refer to the Redis/Valkey cluster specification documentation.
Controlling Key Placement with Hash Tags
The default distribution behavior spreads the load effectively but creates issues for operations that work on multiple keys simultaneously. If the keys involved in a multi-key operation map to different slots—and therefore different nodes—the cluster fails the operation with a CROSSSLOT error.
172.16.1.10:6379> CLUSTER KEYSLOT key1
(integer) 9189
172.16.1.10:6379> CLUSTER KEYSLOT key2
(integer) 4998
172.16.1.10:6379> MSET key1 "hello" key2 "world"
(error) CROSSSLOT Keys in request don't hash to the same slot
This is where hash tags come in. A hash tag is a part of the key name enclosed in curly braces, such as {abc123}. When a key contains a hash tag, the cluster's hashing algorithm considers only the portion of the key inside the braces when calculating the hash slot.
For example, consider these keys:
- user:{1001}:name
- user:{1001}:email
- user:{1001}:session
Without the {} braces, the cluster would hash these keys to different slots, placing them on different nodes. With the hash tag {1001}, the cluster calculates the slot based only on the string "1001".
slot = CRC16("1001") % 16384
Because the input to the hash function is the same for all three keys, the cluster always places them in the same hash slot, so they reside on the same physical node. This design allows you to perform atomic multi-key operations on all the data related to user 1001. This is an incredibly powerful feature for ensuring data locality and consistency for related information.
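Here is a small Python sketch of how that tag extraction works, reusing the crc16_xmodem() helper from the earlier snippet. The rule it follows, taken from the cluster specification, is that only the text between the first { and the first } after it is hashed, and only when that text is non-empty; the function name is illustrative.

```python
# Sketch of hash-tag handling layered on crc16_xmodem() from the previous snippet.
# If the key contains a non-empty "{...}" section, only its contents are hashed;
# otherwise the whole key is hashed.

def key_slot_with_tags(key: str) -> int:
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end > start + 1:   # first tag exists and is non-empty
            key = key[start + 1:end]        # hash only the tag content
    return crc16_xmodem(key.encode()) % 16384

keys = ["user:{1001}:name", "user:{1001}:email", "user:{1001}:session"]
print({k: key_slot_with_tags(k) for k in keys})
# All three keys resolve to the same slot, because only "1001" is hashed.
```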
To learn more about hash tags, refer to the Redis/Valkey cluster specification documentation.
The Challenge of ‘Hot Slots’
While hash tags solve the problem of data locality, they can introduce a new one: hot slots or hot spots.
A hash slot itself doesn’t have a “size limit” in terms of bytes. A single slot can hold millions of keys; the only practical limit is the hosting node’s available memory. The problem arises when a single hash tag becomes overwhelmingly popular, causing a single slot to accumulate a disproportionate amount of data and traffic.
What happens when a {keyname} group grows too large?
- Uneven Memory Distribution: The cluster stores all keys associated with a single tag on a single node. If this set of keys grows to 500 GB, that one node will be responsible for storing all of it, while other nodes in the cluster might have much less data. This negates the benefit of distributing data across the cluster.
- Unbalanced CPU and Network Load: Every read, write, and update for that popular set of keys will hit the same single node. This can max out the CPU and network bandwidth of that specific machine, making it a performance bottleneck for the entire cluster. Even if other nodes are idle, they cannot help process the traffic directed at this “hot slot.”
- Blocked Operations: High traffic to a single node can lead to increased latency for all commands processed by that node, not just those related to the hot slot.
In essence, by forcing a massive amount of related data into a single slot, you undermine the cluster’s shared-nothing architecture and create a miniature single point of failure.
To mitigate this, it’s essential to design your key schema carefully. Avoid using a single hash tag for a collection that you expect to grow indefinitely or receive extremely high traffic. Instead, consider breaking the data down further.
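As a rough illustration of what “breaking the data down further” can look like, the following Python sketch splits one user’s collection across a fixed number of sub-buckets. The key pattern, bucket count, and helper name are hypothetical, and the trade-off is that multi-key atomic operations only work within a single bucket.

```python
import zlib

# Hypothetical bucketing sketch: instead of one tag for an ever-growing
# collection, split it into a fixed number of sub-tags so the keys spread
# across several slots (and therefore potentially several nodes).

NUM_BUCKETS = 8  # assumed value; size it to the expected growth of the collection

def bucketed_key(user_id: str, item_id: str) -> str:
    # Deterministically route each item to one of NUM_BUCKETS sub-tags.
    bucket = zlib.crc32(item_id.encode()) % NUM_BUCKETS
    return f"user:{{{user_id}:{bucket}}}:item:{item_id}"

print(bucketed_key("1001", "order-42"))  # e.g. user:{1001:<bucket>}:item:order-42
```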
A successful sharding strategy relies on designing hash tags that maintain balanced levels of cardinality, frequency, and monotonicity. The goal is to group enough related data to enable efficient atomic operations, while avoiding excessive concentration that could create new bottlenecks and defeat the purpose of clustering.
Let’s see this principle in action by analyzing a common pitfall: using a low-cardinality attribute for a hash tag.
Imagine a system tracking millions of tasks for a project management tool. Each task has a status: PENDING, IN_PROGRESS, or COMPLETED.
Bad Hash Tag Selection (Low-Cardinality Grouping)
- Key Pattern: task:{PENDING}:id_123, task:{PENDING}:id_456, task:{IN_PROGRESS}:id_789
- Hash Tag: {PENDING}, {IN_PROGRESS}, {COMPLETED}
- Problem: This design is catastrophic because its cardinality is extremely low (only three unique values). The cluster’s partitioning logic only sees three possible inputs for its hash function.
This means that your entire multi-terabyte dataset, containing millions of tasks, will be concentrated in only 3 of the 16,384 available slots. As a result, at most three nodes in your cluster will store any of this data, while the rest remain idle. If 90% of the tasks are in a PENDING state, nearly all the data and traffic will be directed to a single node, creating a severe hot spot and effectively nullifying the cluster’s ability to balance the load.
Better Hash Tag Selection (Striking the Right Balance)
- Key Pattern: task:{project_A}:id_123, task:{project_A}:id_456, task:{project_B}:id_789
- Hash Tag: {project_A}, {project_B}, etc. (Assuming many projects)
- Benefit: This is a much better approach that strikes the required balance. The data is now sharded by the project ID, which has a much higher and more evenly distributed cardinality.
With this approach, the cluster stores all tasks for project_A on the same node, achieving data locality and enabling efficient atomic operations, such as querying all tasks for that project. Meanwhile, the data for project_B resides in a different slot, and the data for thousands of other projects is evenly distributed across all nodes in the cluster. This ensures proper data distribution and allows the system to scale horizontally without introducing new bottlenecks.
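To see the difference in distribution, here is a quick back-of-the-envelope simulation, reusing the key_slot_with_tags() helper from the earlier sketch. The task and project counts are assumptions chosen purely for illustration.

```python
import random

# Compare how many distinct slots each schema touches for 100,000 hypothetical tasks.
statuses = ["PENDING", "IN_PROGRESS", "COMPLETED"]
projects = [f"project_{i}" for i in range(5000)]   # assumed number of projects

by_status = {key_slot_with_tags(f"task:{{{random.choice(statuses)}}}:id_{i}")
             for i in range(100_000)}
by_project = {key_slot_with_tags(f"task:{{{random.choice(projects)}}}:id_{i}")
              for i in range(100_000)}

print(len(by_status))   # At most 3 distinct slots: the whole dataset piles onto them.
print(len(by_project))  # Typically thousands of slots, spread across the whole cluster.
```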
In conclusion, hash tags in a Redis/Valkey cluster are a double-edged sword. They enable data locality and powerful atomic operations, but a poorly chosen, low-cardinality tag can concentrate data and traffic on a single node, creating severe bottlenecks. Effective scaling isn’t just about adding nodes; it requires thoughtful key schema design that accounts for data cardinality and access patterns to ensure both consistency and scalability.