Scaling PostgreSQL using Connection Poolers and Load Balancers for an Enterprise Grade environment

Scaling PostgreSQL using Connection Poolers and Load Balancers for an Enterprise Grade environment

PREVIOUS POST
NEXT POST

PostgreSQL® logo

In the previous blog posts in this series we’ve covered some of the essential aspects of an Enterprise-grade solution: security, high availability, and backups. Another important aspect is the scalability of the solution: as our application grows how do we accommodate an increase in traffic while maintaining the quality of the service (response time)? The answer to this question depends on the nature of the workload at play but it is often shaped around:

(a) improving its efficiency and
(b) increasing the resources available.

Why connection pooling ?

When it comes to improving the efficiency of a database workload, one of the first places we start looking at is the list of slow queries; if the most popular ones can be optimized to run faster then we can easily gain some overall performance back. Arguably, we may look next at the number and frequency of client connections: is the workload composed of a high number of very frequent but short-lived connections? Or are clients connections of a more moderate number, and tend to stick around for longer ?

If we consider the first scenario further–a high number of short lived connections–and that each connection spawns a new OS process, the server may hit a practical limit as to the number of transactions—or connections—it can manage per second, considering the hardware available and the workload being processed. Remember that PostgreSQL is process-based, as opposed to thread-based, which is itself an expensive operation in terms of resources, both CPU and memory.

A possible remedy for this would be the use of a connection pooler, acting as a mediator between the application and the database. The connection pooler keeps a number of connections permanently opened with the database, and receives and manages all incoming requests from clients itself, allowing them to temporarily use one of the connections it already has established with PostgreSQL. This removes the burden of creating a new process each time a client establishes a connection with PostgreSQL, and allows it to employ the resources that it would otherwise use for this into serving more requests (or completing them faster).

Rule of thumb?

A general rule of thumb that we often hear is that you may need a connection pooler once you reach around 350 concurrent connections. However, the actual threshold is highly dependent on your database traffic and server configuration: as we find out recently, you may need one much sooner.

You may implement connection pooling using your native application connection pooler (if there is one available) or through an external connection pooler such as PgBouncer and pgPool-II. For the solution we have built, which we demonstrate in our webinar of October 10, we have used PgBouncer as our connection pooler.

PgBouncer

PgBouncer is a lightweight (thread-based) connection pooler that has been widely used in PostgreSQL based environments. It “understands” the PostgreSQL connection protocol and has been a stable project for over a decade.

PgBouncer allows you to configure the pool of connections to operate in three distinct modes: session, statement and transaction. Unless you’ve a good reason to reserve a connection in the pool to a single user for the duration of its session, or are operating with single-statements exclusively, transaction mode is the one you should investigate.

A feature that is central to our enterprise-grade solution is that you can add multiple connection strings using unique alias names (referred to as database names). This allows greater flexibility when mediating connections with multiple database servers. We can then have an alias named “master_db” that will route connections to a master/primary server and another alias named “slave_db” that will route connections to a slave/standby server.

Scaling up

Once efficiency is taken care of, we can then start working on increasing the resources, or computing power, available to process database requests. Scaling vertically means, in short, upgrading the server: more and faster cores, memory, storage. It’s a simple approach, but one that reaches a practical limitation rather quickly. It is not inline with other requirements of an enterprise grade solution, such as high availability. The alternative is scaling horizontally. As briefly introduced above, a common way for implementing horizontal scalability is to redirect reads to standby servers (replicas) with the help of a proxy, which can also act as a load balancer, such as HAProxy. We’ll be discussing these ideas further here, and showcase their integration in our webinar.

HAProxy 

HAProxy is a popular open source TCP/HTTP load balancer that can distribute the workload across multiple servers. It can be leveraged in a PostgreSQL replication cluster that has been built using streaming replication. When you build replication using streaming replication method standby replicas are open for reads. With the help of HAProxy you can efficiently utilize the computing power of all database servers, distributing read requests among the available replicas using algorithms such as Least Connection and Round Robin.

A combination of connection pooler and load balancer to scale PostgreSQL

The following diagram represents a simplified part of the architecture that composes the enterprise grade solution we’ve designed, where we employ PgBouncer and HAProxy to scale our PostgreSQL cluster:

Our PgBouncer contains two database (alias) names, one for redirecting writes to the master and another for balancing reads across standby replicas, as discussed above. Here is how the database section looks in the pgbouncer.ini, PgBouncer’s main configuration file:

Notice that both database entries redirect their connections to the HAProxy server, but each to a different port. The HAProxy, in turn, is configured to route the connections in functions of the incoming port they reach. Considering the above pgBouncer config file as a reference, writes (master connections) are redirected to port 5002 and reads (slave connections) to port 5003. Here is how the HAProxy config file looks:

As seen above:

  • connections to port 5002 are sent to server with role: “master”
  • connections to port 5003 are sent to server with role : “replica”

In a previous post, we discussed using Patroni in our high availability setup. HAProxy relies on Patroni to determine the role of the PostgreSQL server. Patroni is being used here for cluster management and automatic failover. By using Patroni’s REST API (on port 8008 in this scenario) we can obtain the role of a given PostgreSQL server. The example below shows this in practice, the IP addresses denoting the PostgreSQL servers in this setup:

HAProxy can thus rely on Patroni’s REST API to redirect connections from the master alias in PgBouncer to a server with role master. Similarly, HAProxy uses server role information to redirect connections from a slave alias to one of the servers with role replica, using the appropriate load balancer algorithm.

This way, we ensure that the application uses the advantage of a connection pooler to leverage connections to the database, and also of the load balancer which distributes the read load to multiple database servers as configured.

There are many other open source connection poolers and load balancers available to build a similar setup. You can choose the one that best suits your environment—just make sure to test your custom solution appropriately before bringing it to production.

PREVIOUS POST
NEXT POST

Share this post

Comments (5)

  • Alexander Reply

    > tcp-check expect string “role”:\ “master”

    This is considered to be a bad practice, you shouldn’t do that!
    In some situations it might happen that role is master (because postgres is not running in recovery), but the leader key is owned by some other node.

    Patroni provides a few REST API endpoints for master-replica health checks, which will return http status codes 200 or 503. Haproxy must rely only on http status codes.

    “/master” — will return http status code 200 only if the node is holding the leader key in DCS. In all other cases it will return 503.
    “/replica” — will return code 200 if node is running as replica and it is allows load balancing (noloadbalance tags is not set)

    Besides that it is better to use OPTIONS request method instead of GET:
    option httpchk OPTIONS /master
    http-check expect status 200

    In order to automate everything, one can use confd (https://github.com/kelseyhightower/confd), and Patroni provides you an example of haproxy template: https://github.com/zalando/patroni/blob/master/extras/confd/templates/haproxy.tmpl#L19-L32
    Confd will take care about generating the new haproxy config and reloading/restarting haproxy if the list of Patroni nodes registered in DCS is changing.

    October 3, 2018 at 5:14 am
    • Jobin Augustine Reply

      Thank you very much for this detailed expert comment. We hope all our readers will benefit from this information.
      The HAProxy configuration template which uses http-check rather than tcp-check is much better and simpler.
      confd can can make the configuration much more dynamic also.

      October 3, 2018 at 9:42 am
    • Avinash Vallarapu Reply

      Hi Alexander,

      Correct, this method is simpler too. Both the methods get the information from Patroni and should give same information. Do you see a situation where the leader key is with another node than the actual master ?

      October 3, 2018 at 10:27 am
  • Andy Reply

    Why would you put HAProxy behind PgBouncer? The point of PgBouncer is to maintain a pool of PG connections that can be reused. Now that you put HAProxy in there, PgBouncer can’t even connect directly with Postgresql. What’s the point?

    October 3, 2018 at 11:01 am
    • Avinash Vallarapu Reply

      Thank You for reading through the blog. pgBouncer still maintains persistent connections and behave no different. Many applications with native connection poolers use HA Proxy similar way. Using pgBouncer before HAProxy still does the job of a connection pooler as it does when it directly connects to DB

      October 3, 2018 at 11:14 am

Leave a Reply