Reference Architecture(s) for High Availability Solutions in Geographic Distributed Scenarios: Why Should I Care?
Marco Tusa
In this series of blog posts, I’m going to look at some high availability reference architecture solutions over geographically distributed areas.
Nowadays, when businesses plan a new service or application, it is very common for them to worry about ensuring a very high level of availability.
It doesn’t matter if we are talking about an online shop, online banking or the internal services of a large organization. We know users are going to expect access to services 24x7x365. They also expect to access data consistently and instantaneously. If we fail to meet their expectations, then they move to another provider and we lose money. Simple as that.
The other important aspect of providing online services and applications is that the amount of data produced, analyzed and stored is growing every day. We’ve moved from the few gigabytes of yesterday to terabytes today. Who knows how many petabytes we will need tomorrow?
What was once covered with a single LAMP stack can today require dozens of Ls, As and Ms, plus different letters instead of P (like J, R, Py, G). Our beloved MySQL, which used to be “enough” to cover our needs 12 years ago, does not fit well with all the needs of many modern applications.
It is very common to have an application using different types of “storage” at different levels and in different aspects of its activities. We can use a key-value store to cache in-flight operations, and a relational, fully ACID database for the “valuable” core data (the kind of data that must be consistent and durable). Large data gets stored in an eventually consistent column store, and long-term data in some “big data” approach.
On top of all this are reporting mechanisms that collect elements from each data store to provide the required, comprehensive data picture. The situation is very diversified and complex, and the number of possible variables is high. The ways we can combine them are so vast that nowadays developers have no limits, and often come up with creative solutions.
This is where we as architects can help: we can clarify how each tool can be used for the right job. We, at Percona, strongly believe that we must seek simplicity in the complexity and embrace the KISS approach. This starts with the initial identification of the right tool for the job.
Let’s start by looking at some good practices in the following examples:
- It is not a good idea to use key-value storage if you need to define the relationship between entities and rules between them.
- Avoid using an eventually consistent storage when you have to save monetary information about customer payments.
- It’s not a best practice to use a relational database to store HTML caching, page-tracking information, or game info in real time.
Use the right tool for the right job. Some tools scale writes better and keep an eventually consistent approach. Some others are designed to store an unbelievable amount of data, but cannot handle relations. As a result, they might take a long time when processing a typical OLTP request – if they can at all. Each tool has a different design and goal, each one scales differently, and each one has its way of handling and improving availability.
It is a crucial part of the architectural phase of your project not to mix the cards. Keep things clean and build the right architecture for each component. Then combine them in a way that harmonizes in the final result. We should optimize each block, solving a complex issue with simple answers.
How far are we from the old LAMP single stack? Ages. It is like turning your head and looking at our ancestors building the first tents. Tents are still a valid solution if you want to go camping. But only for fun, not for everyday life.
There is too often confusion around what a relational database should do and how it should do it. A relational database should not replace every other component of the wider architecture, and vice versa. They must coexist and work together with other options. Each one should maximize its characteristics and minimize its limitations.
In this series, we will focus on RDBMSs, and we will present a few possible reference architectures for the relational database layer. I will illustrate solutions that improve service availability, keeping a focus on the tool’s design and on the relational data approach with respect to the ACID paradigm.
This means employing the simple rules of:
- Atomicity -> All operations that are part of the same transaction either conclude successfully or are not applied at all.
- Consistency -> Any data written must be valid/validated against the defined rules and combination thereof.
- Isolation -> Guarantees that all transactions will occur in isolation. No transaction affects any other transaction.
- Durability -> Once a transaction is committed, it will remain in the system even if there’s a system crash immediately following the transaction. Transaction changes must be stored permanently.
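To make the first two rules concrete, here is a minimal sketch of atomicity and consistency at work. It uses Python’s built-in sqlite3 module purely for illustration (SQLite is not one of the RDBMSs covered in this series), and the `accounts` table, the `transfer` function and the sample balances are hypothetical names and values invented for this example. The table rule says a balance can never go negative, so a transfer that would break it is rejected as a whole, including the credit that was already applied.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst` inside a single transaction.

    Returns True if the transfer was committed, False if it was rolled back.
    """
    try:
        # Used as a context manager, the connection commits when the block
        # succeeds and rolls back when an exception escapes from it.
        with conn:
            # Credit first, so a later failure has something to undo.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # The debit violates the CHECK rule if funds are insufficient.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def demo():
    """Run one valid and one invalid transfer on an in-memory database."""
    conn = sqlite3.connect(":memory:")
    # Consistency rule (hypothetical schema): balances may never be negative.
    conn.execute(
        "CREATE TABLE accounts ("
        " name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    with conn:
        conn.executemany(
            "INSERT INTO accounts VALUES (?, ?)",
            [("alice", 100), ("bob", 50)],
        )
    ok = transfer(conn, "alice", "bob", 30)         # valid: committed
    rejected = transfer(conn, "alice", "bob", 500)  # invalid: rolled back
    balances = dict(conn.execute("SELECT name, balance FROM accounts"))
    conn.close()
    return ok, rejected, balances


if __name__ == "__main__":
    print(demo())
```

In the second transfer, the credit to `bob` is executed before the debit fails, yet it never becomes visible: the whole transaction is undone and the balances stay as they were after the first transfer. Isolation and durability depend much more on the engine and its configuration, so they are not shown here.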
We will discuss solutions involving the most common open source RDBMSs, covering both on-premises and cloud deployments.
The scenario will be common to all solutions, but the way we implement each solution will answer different needs. The first example is MySQL high availability on premises.