September 1, 2014

Managing farms of MySQL servers with MySQL Fabric

While built-in replication has been a major cause for MySQL’s wide adoption, official tools to help DBAs manage replication topologies have typically been missing from the picture. The community has produced many good products to fill in this gap, but recently, Oracle has been filling it too with the addition of MySQL Utilities to the mix.

One part of the Utilities that has been generating interest recently is MySQL Fabric, and we will be discussing this project in an upcoming series of blog posts.

According to the official documentation, MySQL Fabric is a system for managing farms of MySQL Servers. At the current stage, the system focuses on two areas of MySQL operations: High Availability and Sharding, relying on GTID based replication (available only on MySQL >= 5.6.5) for the former. Its development has been spearheaded by Mats Kindahl, senior principal software developer in MySQL at Oracle, who explains what it is in this post and again in this short video interview at Percona Live earlier this month.

We will start this series of posts by providing a broad overview of the project, and then we’ll dig deeper on each area on subsequent posts.

What it does

Currently, MySQL Fabric has two areas of server management in which it can help DBAs: High Availability and Sharding.

In terms of High Availability, MySQL Fabric will let you pool a group of MySQL servers and treat them as a single logical unit, with a Primary server that can take reads and writes, and Secondary servers that can take reads (and be used to scale those) as well as take over the Primary role in the event of a failure.

As we’ll see, MySQL Fabric relies on GTID based replication to work. By default, the tool won’t do any automatic failover, but it can be configured to do that, and it does work. In future posts of this series, we’ll spend some time explaining how to set up an HA group of MySQL servers using MySQL Fabric, and then we’ll set to try and break them in many ways. Stay tuned if you’re interested in the results :)

When it comes to Sharding, MySQL Fabric relies on special connectors to work. Without the connectors, you can still use the mysqlfabric command line tool to manage shards (and even migrate them from one server to another), but you will have to modify your application to include logic to decide which shard should be used for any given query. When using the connectors, the MySQL connection will actually be established with MySQL Fabric (with an XML RPC service we’ll talk about later). The connectors cache the needed information (this is a simplification, we’ll go into more detail on the next post) so that the client can make the right routing decision when reading or writing data to a sharded table.

We feel that the HA features are more mature now, but the Sharding ones look promising. Also, MySQL Fabric is extensible, and we think this is one area where it can grow a lot (for example, by using Shard Query with MySQL Fabric to support parallel distributed queries).

General Concepts

In order to understand MySQL Fabric we first need to present some terminology used by the project. We’ll start by listing basic definitions, and then we’ll go into more details when needed.

  • Group. A collection of mysqld servers.
  • Global group. Special groups that store updates that must be propagated to all shards.
  • Node. A running instance of MySQL Fabric.
  • Shard. A horizontal partition of data in a table.
  • Primary. A group member that has been designated master.
  • Secondary. A group member that is read only.

Group

A server here is actually a mysqld instance, though ideally, all instances of a group should be on different servers. However, while testing, you can create multiple instances on the same host since you don’t really need HA.

A given server can only be part of a single group. This may seem confusing at first, but when you realize MySQL Fabric relies on replication (using GTID) for most of its work, it becomes clearer. A given MySQL server can only have one master and therefore it makes no sense for it to belong to multiple groups.

Groups have identifiers, which are just symbolic names that need to comply with some basic rules

Global group

Global groups are special groups involved in Sharding. They are used to propagate changes global to all shards (like schema changes, though not only that).

Node

Note that a node is *not* a MySQL server that’s part of a group. It is a python program that, among other things, provides the XML-RPC server that is used by special connectors and by the ‘mysqlfabric’ command line client. A node will, however, need a mysqld instance. This instance is called the backend store and will be used by MySQL Fabric to save all the information it needs to manage servers.

Shard

We said a shard is an horizontal partition of data in a table, but this partition happens at the MySQL Fabric level. MySQL is completely unaware of it, and at the mysqld instance level, a shard is just a table. We’ll talk more about this soon, as it has some consequences.

Primary

The primary server is the only writable server in a group. This applies to HA, not to sharding, though you could define a group (and therefore a Primary) per shard and therefore use MySQL Fabric both for sharding, and to provide HA for each shard.

Secondary

A Secondary server is a member of a group that is available to replace a Primary server on failover, and that is read only.It can also be used to scale out reads. 

Highlights 

 As with anything, MySQL Fabric has its good and bad points. Fortunately, the bulk of the bad points we identified are due to the project being still early in its lifecycle. Considering the latest release is a RC, we’re sure those will go away the future.

On the good side:

  • It is developed by Oracle.

We think this is important, because MySQL did not have a standard tool to manage farms of servers until now. People will still be able to use tools from other providers (or cook their own), but we think it’s good to have a standard offering that’s part of the MySQL packages.

  • It is extensible.

MySQL Fabric feels more like a framework than a closed product. In fact, even some official presentations introduce it as a framework. As a framework, it is implemented in python, a widely available and friendly interpreted language. We believe this means MySQL Fabric should be adaptable to specific needs with little hassle.

  • It is focused on MySQL versions 5.6.10 and newer.

By not worrying about backwards compatibility, implementation should be simpler. Take Secondary server promotion after the Primary goes down as example. GTID makes this much simpler.

On the bad side:

  • It is a bit rough around the edges.

This is expected for a new product, and to be honest, most problems we faced turned out being a documentation issue and not an actual problem with the software. The docs are still a bit green, but source code comments are good and plenty, so if you want to get your hands dirty and really understand how it works, that is the path we suggest.

  • Some things require help from MySQL to be properly implemented, and MySQL does not provide it (yet?).

An example: it is currently impossible to make sure that writes can never go to the wrong shard. As we said earlier, at the individual mysqld server, a shard is just a table, period. Ideally, some future MySQL version should extend the @read_only variable functionality and allow you to selectively mark parts of the data as read only. If we could mark a specific table as read only, or as updatable only when a given condition is met (i.e. WHERE id between <lower bound> and <upper bound>), it would increase the safety of sharding a lot.

  • It is focused on MySQL versions 5.6.10.

Yes, we said that’s good, but the obvious downside is you can’t consider using MySQL Fabric if you’re on an earlier MySQL version, and a lot of people still are.

  • In HA setups, MySQL Fabric itself can become a single point of failure

MySQL Fabric (the XML-RPC ‘Node’) and its data store (the mysqld instance that stores MySQL Fabric’s data) are a single point of failure that needs to be addressed. In practical terms, the impact of MySQL Fabric going down will vary with your use case. If you’re only using the mysqlfabric utility to manage servers, nothing will happen as long as all servers in a Group continue to work. If, however, you’re using one of the special connectors to access the Group, then your application will be down. This is resolvable, and we will discuss some approaches in the HA posts, but we think the best solution going forward is for MySQL Fabric to address this in the future by letting you set up multiple Fabric Nodes and have them monitor each other and promote a new active one if needed.

What comes next

Over the next few days, we will blog more about this, describing how to set up HA and Sharded clusters using MySQL Fabric, walking you through the process, and also trying to see how and when it can fail. We had a lot of fun doing this and we hope you have a good time reading it and experimenting yourself too!

About Fernando Ipar

Fernando is part of Percona's team working as Senior Consultant. Prior to joining Percona, Fernando worked as a consultant for financial services institutions, telcos, and technology providers.

Comments

  1. Jeremy Tinley says:

    I had the same concern about the HA capacity of the Fabric node. In chatting with Mats Kindhal, it can be any MySQL, even MySQL Cluster, so you can roll your own HA (DRBD/Heartbeat/LoadBalancer/Whatever) on that node to ensure it’s up.

  2. Fernando says:

    Jeremy: Thanks for the comment. That makes sense because the docs say that the Fabric Store is storage engine agnostic, but our main concern is with HA for the xml-rpc service and not the store, as that becomes a single point of failure for new connections. It is not too difficult to resolve, but I think eventually it would be better if Fabric itself provides a solution to this. Perhaps it’s as simple as firing up multiple fabric nodes, and passing a list of them to the connector (something similar can already be done with actual mysql servers in connector/j for example).

  3. Ben Ustick says:

    Hey Guys,
    I just wanted to reach out and let you know that I included your article in the InterWorx roundup of April’s best SysAdmin content. http://www.interworx.com/community/april-round-best-administration-enterprise-hosting-security-content/ Thanks for the valuable resource.

    Ben

  4. MWM says:

    Dear Fernando,

    Finally, I able to establish a fabric backing store. Define the group of two nodes. Now I want to implement the other end (Client Side).

    In my scenario, my application is deployed on weblogic 12c server. I am not clear about what will be the configurations for creating the datasource. Default MySQL connector/j installed with weblogic is 5.1.22 which does not supports fabric. I downloaded the latest one i.e 5.1.30 and configured with weblogic too but fails :(.

    I think I am missing some basic config or something related to create a datasource.

    I searched a lot to find out the way of doing it but unfortunately fails.

    Kindly, send me any info, link or clue through which ill be able to achieve solution.

    Regards
    MWM

  5. @MWM:

    Are you hitting this error with 5.1.30?: http://bugs.mysql.com/bug.php?id=72713

  6. MWM says:

    Dear Fernando,

    No, I am behind this issue and did no faced yet.

    Actually, in every example I saw that connections made through Java source code. But I want to know that how fabric would be configure as a datasource.

    Assume, datasource defined as host:192.168.0.69 port:3306 lets say 192.168.0.69 is down so how my application deployed on weblogic would know the new master.

    So I am confuse about the datasource configuration in weblogic in case of Fabric setup.

    Actually I am searching on internet from last 8 continous hours … :) I hope i delivered the problem situation clearly.

    Regards
    M.Waqas

  7. @MWM:

    I have not tried to use it as a datasource within a container, but I would try to do it similarly to the example I put on that bug report, that is, establishing the connection with Fabric and not any specific mysql instance.

    As for how your application would know your new master, I was unable to do more testing at this stage due to that bug, but I expect it to work similarly to the Python connector. That would mean the JDBC driver gets a list of servers and roles and caches that, and that the application should attempt to establish a new connection (Again, with Fabric) if a query fails because one of the nodes becomes unavailable. This way, the drive will get an updated list of servers/roles.

    I have it on my list to blog more about using Fabric from Java but for the time being it seems I will wait for the next Connector/J release for this.

  8. MWM says:

    Dear Fernando,

    Your Statement:

    “I have not tried to use it as a datasource within a container, but I would try to do it similarly to the example I put on that bug report, that is, establishing the connection with Fabric and not any specific mysql instance.”

    Totally agreed. Lets see when we’ll get chance to move forward. :)

    Regards
    MWM

  9. samuel says:

    Hi Fernando,

    We are trying fabric to manage servers and make automatic failovers. It’s quite easy to promote a slave into master. But we also want to use mysql-proxy to make load balances, and rw splitting.

    The backend of mysql proxy is mysql servers( or nodes of fabric), if one of the master has gone down and a new master has been promoted, is there any way for mysql proxy to know the promoted master?

    Thanks very much in advance.

  10. @samuel: sorry for the delay, I’ve been on vacation for the last couple of weeks.

    There’s currently no way to do this (that I know of, at least), but while I’m no python expert, a cursory look at the code makes me think it should be easy to extend it to hook up other functions (such as notifying mysql-proxy) during promotion.

    Or you can file a bug report with a feature request at http://bugs.mysql.com/. I think adding support to call external scripts at the time of promotion/failover would be a great improvement to Fabric.

Speak Your Mind

*