Consul Architecture
Federico Razzoli
I approached Consul recently, while looking for a service discovery and configuration automation solution for ProxySQL. My colleague Nik Vyzas wrote a great post on this topic, and I suggest you read it. I wrote this article to share my first impressions of Consul with anyone who might be interested.
Consul is a complete service discovery solution. In this respect it differs from one of its alternatives, etcd, which only provides a foundation to build such solutions.
Consul consists of a single, small binary (the Linux binary is 24MB). You just download it, edit the configuration file and start the program. No package installation is needed: the Consul binary does it all. You can start it as a server or as a client. It also provides a set of administrative commands, usable via the command line or the HTTP API.
But what is Consul about?
I mentioned service discovery, which is the primary purpose of Consul. But what is it?
Suppose that you have a Percona XtraDB Cluster. Applications query this cluster via ProxySQL (or another proxy), which distributes the workload among the running servers. However, the applications still need to know ProxySQL’s address and port, and what happens if that ProxySQL instance becomes unreachable? Service discovery solves this problem: it allows applications to find a running ProxySQL server. A service discovery server tells applications the IP address and port of a running instance of the service they need. It can also store information about service configuration.
Let’s continue with our Percona XtraDB Cluster and ProxySQL example. Here is what Consul can do for us:
- When a node is added, automatically discover other nodes in the cluster.
- When a proxy is added, automatically discover all cluster nodes.
- Automatically configure the proxy with users and other settings.
- Perform some basic monitoring, thanks to Consul health checks.
Now, let’s see how it does these things.
If you only want to test Consul’s interfaces from a developer’s point of view, you can start a stand-alone Consul instance in development mode (consul agent -dev). In this mode, Consul keeps everything in memory and does not write anything to disk.
Applications can query Consul in two ways. The first is DNS, which is the most lightweight option. For example, an application can send a request for mysql.service.dc1.consul, which means “please find a running MySQL service, in the datacenter called dc1.” Consul will reply with an A or SRV record, containing the IP address and (for SRV records) the port of a running server.
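Consul’s DNS names follow a fixed pattern: an optional tag, the service name, the literal label service, an optional datacenter, and the domain (consul by default). The small helper below is an illustrative sketch of how an application might build such a name; the function name consul_dns_name and the tag primary are hypothetical.

```python
def consul_dns_name(service, datacenter=None, tag=None, domain="consul"):
    """Build a Consul DNS name: [tag.]service.service[.datacenter].domain"""
    labels = []
    if tag:
        labels.append(tag)  # restrict the lookup to instances carrying this tag
    labels += [service, "service"]
    if datacenter:
        labels.append(datacenter)  # omit it to query the local datacenter
    labels.append(domain)
    return ".".join(labels)


if __name__ == "__main__":
    print(consul_dns_name("mysql", datacenter="dc1"))  # mysql.service.dc1.consul
    print(consul_dns_name("mysql", tag="primary"))  # primary.mysql.service.consul
```

With a local agent answering DNS on its default port 8600, a quick manual test could be dig @127.0.0.1 -p 8600 mysql.service.dc1.consul SRV; the SRV answer carries the port, while an A query returns only addresses.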
You can make the same request via a REST API. The API can register or unregister services, add health checks, and so on.
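The sketch below shows what such HTTP requests could look like with Python’s standard library, assuming a local agent listening on the default HTTP port 8500. The endpoint paths (/v1/health/service/<name>, /v1/agent/service/register, /v1/agent/service/deregister/<id>) belong to Consul’s HTTP API; the helper names, the service name proxysql, its port and the TCP check settings are illustrative assumptions.

```python
import json
import urllib.request

CONSUL_ADDR = "http://127.0.0.1:8500"  # assumption: local agent, default HTTP port


def healthy_instances_request(service):
    """GET the instances of `service` whose health checks are all passing."""
    return urllib.request.Request(
        f"{CONSUL_ADDR}/v1/health/service/{service}?passing", method="GET"
    )


def register_request(name, port, service_id=None, interval="10s"):
    """PUT a service definition with a TCP health check to the local agent."""
    payload = {
        "Name": name,
        "ID": service_id or name,
        "Port": port,
        # Consul marks the service critical if it cannot open a TCP connection
        "Check": {"TCP": f"127.0.0.1:{port}", "Interval": interval},
    }
    return urllib.request.Request(
        f"{CONSUL_ADDR}/v1/agent/service/register",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def deregister_request(service_id):
    """PUT request that removes a service from the local agent."""
    return urllib.request.Request(
        f"{CONSUL_ADDR}/v1/agent/service/deregister/{service_id}", method="PUT"
    )
```

The helpers only build Request objects, so they can be inspected without a running agent; passing one to urllib.request.urlopen performs the actual call, for example against an agent started in development mode.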
Consul performs health checks to find out which services are running. For script-based checks, Consul interprets the script’s exit code just like Nagios does: 0 means passing, 1 means warning, and any other value means critical. In fact, you can use Nagios plugins with Consul. You can even use Consul as the basis for a distributed monitoring system.
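To illustrate the exit code convention, here is a minimal Nagios-style check written in Python. The script name check_proxysql.py and its HOST PORT arguments are hypothetical. Such a script could be registered as a script check through the Args field of a check definition, which requires script checks to be enabled on the agent; a standalone script would end with sys.exit(main(sys.argv)).

```python
import socket
import sys

# Exit codes, as interpreted by Consul (and Nagios)
PASSING, WARNING, CRITICAL = 0, 1, 2


def status_from_exit_code(code):
    """Translate a check script's exit code into a Consul health status."""
    if code == PASSING:
        return "passing"
    if code == WARNING:
        return "warning"
    return "critical"  # any other exit code


def check_tcp(host, port, timeout=2.0):
    """Critical if nothing accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return PASSING
    except OSError:
        return CRITICAL


def main(argv):
    """Entry point for a hypothetical `check_proxysql.py HOST PORT` script."""
    if len(argv) != 3:
        print(f"usage: {argv[0]} HOST PORT", file=sys.stderr)
        return CRITICAL
    return check_tcp(argv[1], int(argv[2]))
```

A real check would probably also log in to ProxySQL and run a query; a TCP connection only proves that something is listening.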
The HTTP API also includes endpoints for a KV store. Under the hood, Consul includes BoltDB. This means you can use Consul for configuration automation. Endpoints are also provided to implement distributed semaphores and leader election.
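A minimal sketch of the KV endpoints, again using only Python’s standard library and assuming a local agent on port 8500. Values written with PUT /v1/kv/<key> come back base64-encoded in the Value field of a JSON list, and the ?acquire=<session> parameter is the building block Consul documents for locks and leader election. The session ID would come from PUT /v1/session/create, which is not shown; the key names are hypothetical.

```python
import base64
import json
import urllib.request

CONSUL_ADDR = "http://127.0.0.1:8500"  # assumption: local agent, default HTTP port


def kv_put_request(key, value):
    """PUT a raw value under `key`; Consul answers with true or false."""
    return urllib.request.Request(
        f"{CONSUL_ADDR}/v1/kv/{key}", data=value.encode(), method="PUT"
    )


def kv_acquire_request(key, session_id, value=""):
    """Try to lock `key` for a session; only one session can hold it at a time."""
    return urllib.request.Request(
        f"{CONSUL_ADDR}/v1/kv/{key}?acquire={session_id}",
        data=value.encode(),
        method="PUT",
    )


def decode_kv_response(body):
    """Turn a GET /v1/kv/<key> JSON body into {key: decoded value}."""
    entries = json.loads(body)
    return {
        # Value is base64-encoded, or null when the key holds no data
        e["Key"]: (base64.b64decode(e["Value"]).decode() if e["Value"] else None)
        for e in entries
    }
```

With this approach, a group of ProxySQL nodes could elect a leader by having each one try to acquire the same key: whichever request returns true holds the lock until its session is invalidated.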
The Consul binary also provides an easy command-line interface, mainly used for administrative tasks: registering or unregistering services, adding new nodes to Consul, and so on. It also provides good diagnostic commands.
In production, Consul runs as a cluster. As mentioned above, each instance can be a server or a client. Clients have fewer responsibilities: when they receive queries (reads) or transactions (writes), they act as proxies and forward them to a server. Each client also runs health checks against the services registered on its node, and informs the servers about their health status.
Servers play one of two roles: the elected leader or a follower. The leader can change at any moment. When a follower receives a request from a client, it forwards it to the leader. If the request is a transaction, the leader logs it locally and replicates it to the followers. Once a majority of the servers (a quorum) have accepted the change, the transaction is committed. The term “transaction” is a bit confusing here, especially since Consul 0.7 added a transaction endpoint for atomic multi-key operations; in this context, think of a “transaction” as anything that changes the state of the cluster.
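Reads support three consistency levels, and stricter levels are slower. In the default mode, followers forward queries to the leader, which answers on its own; this is strongly consistent in almost all cases, but during a leader change an old leader may briefly return stale data. The consistent mode removes that caveat: before answering, the leader contacts a quorum of the other servers to confirm that it is still the leader, so applications (the users) never receive stale data. However, this requires extra network round trips. Finally, the stale mode lets any server answer, which is the fastest option but may return outdated data. Which level to use depends on the use case.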
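In the HTTP API, the consistency level is chosen per request: no parameter selects the default mode, ?consistent selects the strict mode, and ?stale lets any server answer. The URL builder below is a hypothetical helper that illustrates this, assuming a local agent on port 8500.

```python
import urllib.parse

CONSUL_ADDR = "http://127.0.0.1:8500"  # assumption: local agent, default HTTP port

# Consul read modes, from strictest to fastest; "default" adds no parameter
READ_MODES = ("consistent", "default", "stale")


def read_url(path, mode="default", **params):
    """Build a read URL for a Consul HTTP endpoint with the chosen consistency mode."""
    if mode not in READ_MODES:
        raise ValueError(f"unknown consistency mode: {mode}")
    query = urllib.parse.urlencode(params)
    if mode != "default":
        # Flag-style parameter without a value, e.g. ?stale
        query = f"{query}&{mode}" if query else mode
    return f"{CONSUL_ADDR}{path}" + (f"?{query}" if query else "")
```

Responses to stale reads include an X-Consul-LastContact header, which tells the application how long ago the answering server last heard from the leader; an application can use it to reject answers it considers too old.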
Therefore, having more servers improves resilience when some nodes crash, but lowers performance, because every change must reach more servers over the network before it is committed. The recommended number of servers is five, which lets the cluster keep working if any two servers fail. Having a high number of clients, on the other hand, makes the system more scalable, because the health check and request forwarding work is distributed over all of them.
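This trade-off follows from the majority rule used by Raft, the consensus protocol behind Consul’s servers. A change is committed once a strict majority (a quorum) of the servers have it, so a cluster of n servers needs n // 2 + 1 of them and can lose the rest. The short sketch below tabulates this; it also shows why even server counts add little, since four servers tolerate one failure, just like three.

```python
def quorum(servers):
    """Votes needed to commit a change: a strict majority of the servers."""
    return servers // 2 + 1


def failure_tolerance(servers):
    """How many servers can fail while a quorum is still reachable."""
    return servers - quorum(servers)


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"{n} servers: quorum {quorum(n)}, tolerates {failure_tolerance(n)} failure(s)")
```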
Multi-cluster configurations are natively supported for geographically distributed environments; Consul calls each cluster a datacenter. Each datacenter serves data about its own services. Applications, however, can query any datacenter: if necessary, Consul forwards the request to the proper one to retrieve the required information.
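Over HTTP, the target datacenter is selected with the dc query parameter, and the local agent forwards the request. The sketch below uses that to look for a healthy service across an ordered list of datacenters; the helper names and the datacenter names are hypothetical. Consul also offers prepared queries with built-in failover between datacenters, so this hand-written loop is only meant to show the forwarding mechanism.

```python
import json
import urllib.parse
import urllib.request

CONSUL_ADDR = "http://127.0.0.1:8500"  # assumption: local agent, default HTTP port


def health_url(service, dc):
    """Passing instances of `service` in datacenter `dc` (forwarded by the agent)."""
    query = urllib.parse.urlencode({"dc": dc})
    return f"{CONSUL_ADDR}/v1/health/service/{urllib.parse.quote(service)}?{query}&passing"


def http_get_json(url):
    """Fetch a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def find_service(service, datacenters, fetch=http_get_json):
    """Return (dc, instances) for the first datacenter with a healthy instance."""
    for dc in datacenters:
        instances = fetch(health_url(service, dc))
        if instances:
            return dc, instances
    return None, []
```

The fetch parameter makes the lookup easy to test without a cluster, by passing a function that returns canned responses.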
At the time of writing, most Linux distributions do not include Consul. However, the package is present in some versions that are not yet stable (like Debian Testing and Ubuntu 16.10).
Some community packages also exist. Before using them, you should test them to be sure that they are production-ready.
Consul in Docker
Consul’s official Docker image is based on Alpine Linux, which makes it very small. Alpine Linux is a lightweight distribution designed for embedded environments, and it has recently become quite popular in the Docker world. It is based on BusyBox, a tiny reimplementation of the basic GNU tools.
The image also follows good security practice. Containers often run their daemon as root; in this image, Consul runs as the consul user, switching to it via gosu, a lightweight sudo alternative.
A Good Use Case
When we start a new container in a “dockerized” environment, we cannot predict its IP address. This is a major pain when setting up a cluster: each node must be configured with the other nodes’ addresses, and a proxy (like ProxySQL), if present, must know them too. The problem reappears every time we add a new node, a new slave, or a new proxy. Consul is a great way to solve this problem. We will see this in depth in a future post.
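As a rough illustration of the core step, here is a hypothetical sketch that turns the healthy instances Consul reports for a service into ProxySQL admin statements. It assumes the JSON shape returned by /v1/health/service/<name>?passing and ProxySQL’s mysql_servers table; the hostgroup number and the addresses in the usage below are arbitrary, and in practice a tool such as consul-template could drive this step instead.

```python
def backend_addresses(health_entries):
    """Extract sorted (host, port) pairs from a /v1/health/service/<name> response."""
    backends = []
    for entry in health_entries:
        service = entry["Service"]
        # Service.Address is empty when the service uses its node's address
        host = service.get("Address") or entry["Node"]["Address"]
        backends.append((host, service["Port"]))
    return sorted(backends)


def proxysql_statements(backends, hostgroup=0):
    """Build ProxySQL admin statements that replace one hostgroup's servers.

    Assumes hosts are plain IP addresses or hostnames (no quotes to escape).
    """
    stmts = [f"DELETE FROM mysql_servers WHERE hostgroup_id = {hostgroup};"]
    for host, port in backends:
        stmts.append(
            "INSERT INTO mysql_servers (hostgroup_id, hostname, port) "
            f"VALUES ({hostgroup}, '{host}', {port});"
        )
    # Apply the new configuration and persist it
    stmts += ["LOAD MYSQL SERVERS TO RUNTIME;", "SAVE MYSQL SERVERS TO DISK;"]
    return stmts
```

The statements would be sent to ProxySQL’s admin interface (port 6032 by default), for example whenever Consul reports a change in the set of healthy nodes.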