Managing farms of MySQL servers with MySQL FabricFernando Ipar
While built-in replication has been a major cause for MySQL’s wide adoption, official tools to help DBAs manage replication topologies have typically been missing from the picture. The community has produced many good products to fill in this gap, but recently, Oracle has been filling it too with the addition of MySQL Utilities to the mix.
One part of the Utilities that has been generating interest recently is MySQL Fabric, and we will be discussing this project in an upcoming series of blog posts.
According to the official documentation, MySQL Fabric is a system for managing farms of MySQL Servers. At the current stage, the system focuses on two areas of MySQL operations: High Availability and Sharding, relying on GTID based replication (available only on MySQL >= 5.6.5) for the former. Its development has been spearheaded by Mats Kindahl, senior principal software developer in MySQL at Oracle, who explains what it is in this post and again in this short video interview at Percona Live earlier this month.
We will start this series of posts by providing a broad overview of the project, and then we’ll dig deeper on each area on subsequent posts.
What it does
Currently, MySQL Fabric has two areas of server management in which it can help DBAs: High Availability and Sharding.
In terms of High Availability, MySQL Fabric will let you pool a group of MySQL servers and treat them as a single logical unit, with a Primary server that can take reads and writes, and Secondary servers that can take reads (and be used to scale those) as well as take over the Primary role in the event of a failure.
As we’ll see, MySQL Fabric relies on GTID based replication to work. By default, the tool won’t do any automatic failover, but it can be configured to do that, and it does work. In future posts of this series, we’ll spend some time explaining how to set up an HA group of MySQL servers using MySQL Fabric, and then we’ll set to try and break them in many ways. Stay tuned if you’re interested in the results 🙂
When it comes to Sharding, MySQL Fabric relies on special connectors to work. Without the connectors, you can still use the mysqlfabric command line tool to manage shards (and even migrate them from one server to another), but you will have to modify your application to include logic to decide which shard should be used for any given query. When using the connectors, the MySQL connection will actually be established with MySQL Fabric (with an XML RPC service we’ll talk about later). The connectors cache the needed information (this is a simplification, we’ll go into more detail on the next post) so that the client can make the right routing decision when reading or writing data to a sharded table.
We feel that the HA features are more mature now, but the Sharding ones look promising. Also, MySQL Fabric is extensible, and we think this is one area where it can grow a lot (for example, by using Shard Query with MySQL Fabric to support parallel distributed queries).
In order to understand MySQL Fabric we first need to present some terminology used by the project. We’ll start by listing basic definitions, and then we’ll go into more details when needed.
- Group. A collection of mysqld servers.
- Global group. Special groups that store updates that must be propagated to all shards.
- Node. A running instance of MySQL Fabric.
- Shard. A horizontal partition of data in a table.
- Primary. A group member that has been designated master.
- Secondary. A group member that is read only.
A server here is actually a mysqld instance, though ideally, all instances of a group should be on different servers. However, while testing, you can create multiple instances on the same host since you don’t really need HA.
A given server can only be part of a single group. This may seem confusing at first, but when you realize MySQL Fabric relies on replication (using GTID) for most of its work, it becomes clearer. A given MySQL server can only have one master and therefore it makes no sense for it to belong to multiple groups.
Groups have identifiers, which are just symbolic names that need to comply with some basic rules
Global groups are special groups involved in Sharding. They are used to propagate changes global to all shards (like schema changes, though not only that).
Note that a node is *not* a MySQL server that’s part of a group. It is a python program that, among other things, provides the XML-RPC server that is used by special connectors and by the ‘mysqlfabric’ command line client. A node will, however, need a mysqld instance. This instance is called the backend store and will be used by MySQL Fabric to save all the information it needs to manage servers.
We said a shard is an horizontal partition of data in a table, but this partition happens at the MySQL Fabric level. MySQL is completely unaware of it, and at the mysqld instance level, a shard is just a table. We’ll talk more about this soon, as it has some consequences.
The primary server is the only writable server in a group. This applies to HA, not to sharding, though you could define a group (and therefore a Primary) per shard and therefore use MySQL Fabric both for sharding, and to provide HA for each shard.
A Secondary server is a member of a group that is available to replace a Primary server on failover, and that is read only.It can also be used to scale out reads.
As with anything, MySQL Fabric has its good and bad points. Fortunately, the bulk of the bad points we identified are due to the project being still early in its lifecycle. Considering the latest release is a RC, we’re sure those will go away the future.
On the good side:
- It is developed by Oracle.
We think this is important, because MySQL did not have a standard tool to manage farms of servers until now. People will still be able to use tools from other providers (or cook their own), but we think it’s good to have a standard offering that’s part of the MySQL packages.
- It is extensible.
MySQL Fabric feels more like a framework than a closed product. In fact, even some official presentations introduce it as a framework. As a framework, it is implemented in python, a widely available and friendly interpreted language. We believe this means MySQL Fabric should be adaptable to specific needs with little hassle.
- It is focused on MySQL versions 5.6.10 and newer.
By not worrying about backwards compatibility, implementation should be simpler. Take Secondary server promotion after the Primary goes down as example. GTID makes this much simpler.
On the bad side:
- It is a bit rough around the edges.
This is expected for a new product, and to be honest, most problems we faced turned out being a documentation issue and not an actual problem with the software. The docs are still a bit green, but source code comments are good and plenty, so if you want to get your hands dirty and really understand how it works, that is the path we suggest.
- Some things require help from MySQL to be properly implemented, and MySQL does not provide it (yet?).
An example: it is currently impossible to make sure that writes can never go to the wrong shard. As we said earlier, at the individual mysqld server, a shard is just a table, period. Ideally, some future MySQL version should extend the @read_only variable functionality and allow you to selectively mark parts of the data as read only. If we could mark a specific table as read only, or as updatable only when a given condition is met (i.e. WHERE id between <lower bound> and <upper bound>), it would increase the safety of sharding a lot.
- It is focused on MySQL versions 5.6.10.
Yes, we said that’s good, but the obvious downside is you can’t consider using MySQL Fabric if you’re on an earlier MySQL version, and a lot of people still are.
- In HA setups, MySQL Fabric itself can become a single point of failure
MySQL Fabric (the XML-RPC ‘Node’) and its data store (the mysqld instance that stores MySQL Fabric’s data) are a single point of failure that needs to be addressed. In practical terms, the impact of MySQL Fabric going down will vary with your use case. If you’re only using the mysqlfabric utility to manage servers, nothing will happen as long as all servers in a Group continue to work. If, however, you’re using one of the special connectors to access the Group, then your application will be down. This is resolvable, and we will discuss some approaches in the HA posts, but we think the best solution going forward is for MySQL Fabric to address this in the future by letting you set up multiple Fabric Nodes and have them monitor each other and promote a new active one if needed.
What comes next
Over the next few days, we will blog more about this, describing how to set up HA and Sharded clusters using MySQL Fabric, walking you through the process, and also trying to see how and when it can fail. We had a lot of fun doing this and we hope you have a good time reading it and experimenting yourself too!