September 22, 2014

Q&A: Putting MySQL Fabric to use

Percona MySQL webinar followup: Q&AMartin Arrieta and I gave an online presentation last week on “Putting MySQL Fabric To Use.” If you missed it, you can find a recording and the slides here, and the vagrant environment we used plus a transcript of the commands we ran here (be sure to check out the ‘sharding’ branch, as that’s what we used during the webinar).

Thank you all for attending and asking interesting questions. We were unable to answer all of them in the scheduled time, so here are our replies to all the questions.

What is GTID? And how does it relate to MySQL Fabric?
GTID stands for Global Transaction Identifier, and MySQL Fabric requires MySQL replication to be configured to used this in order to work. This means that it won’t work with MySQL versions previous to 5.6. You can find more information in the manual, and in the upcoming Percona webinar, “Using MySQL Global Transaction IDs in Production,” that our colleague Stephane Combaudon will present on August 27.

During any issue, does MySQL Fabric internally do a switch over or failover?
MySQL Fabric will only detect failures and initiate failovers for a group that has been activated. Otherwise, any changes must be made manually through the mysqlfabric command.

For an alter table, how do you avoid replication lag?
We think using pt-online-schema for the alter table is a good way to avoid or minimize this problem.

Are there benchmarks available for MySQL Fabric?
We’re not aware of any, but once we wrap up our current series of blog posts on the topic, we plan to do some benchmarks ourselves. Our interest lies in the overhead the connectors can place vs a standard connector accessing a single MySQL Server, but if you have other ideas, let us know in the comments and we’ll consider adding them to our list.

Can MySQL Fabric be used to handle Big Data?
We think the HA features are usable regardless of data set size, with the caveat that MySQL Fabric does not (currently) handle provisioning the data of the managed instances. It just configures and points replication as required, but you’re responsible for making sure all servers have the same set of data.

We feel that the sharding features, however, may not be a good fit for working with very large data sets yet, specially because of the use of mysqldump/mysql on the shard split/move operations. However, since sharding usually goes hand in hand with big data (a data set size too large is one of the reasons to shard after all) we’re sure this will get improved in the future. Or, someone with some python skills can adjust the scripts to use a more efficient mechanism to move data from one server to another, like Percona XtraBackup.

Does sharding require many changes to application coding (database code on shards etc) at the MySQL Fabric level? Should we sit with the developers/architects to help them understand tables, their usage, and to plan sharding?
Sharding with MySQL Fabric is not transparent (and we’d say sharding, in general, is typically not transparent) so some application coding and DBAs/Devs interaction will be needed. But you should be working close to your developers and architects (or DBAs and sysadmins, depending on which side of the counter you stand) anyway :)

Would you say that MySQL Fabric is a good HA replacement removing the need for VIPs?
We don’t see them as mutually exclusive. Simplifying, the HA features of MySQL Fabric are just another way to manage replication between a set of MySQL servers. It doesn’t force any way to make writable and readable nodes available to clients. With some coding, you can use MySQL Fabric together with VIPs, or dynamic DNS entries, or a load balancer like HA Proxy.

Does global group mean fabric node?
If you did not attend the webinar, we’d recommend you take a look at slide 7 to get the required context for this question.

Global group means a group that has the schema for all tables, all the data for global tables, and no data for sharded tables. Since it’s a group, it can be one or more nodes, but at any given time, there will only be one PRIMARY node in the group, and the PRIMARY nodes on all shard groups will be replication slaves of this node.

Is the config file at the application side or on the MySQL Fabric node?
There is some configuration at both ends. The MySQL Fabric node (the ‘store’ server in our Vagrant setup) has some configuration, but applications need their own too. The separation is rather clean though. Typically, the application only needs to know how to reach the MySQL Fabric server (host, port, credentials), while the MySQL Fabric node needs to know how to connect to the nodes it manages, how long to wait before starting a failover, etc. The configuration on MySQL Fabric’s side is done by the fabric.cfg file, while on the application side, it depends on the connector used.

When setting a faulty to secondary, does it also automatically come back in replication as a slave?
Yes, though remember you’ll first have to set it to SPARE :)

When we set primary and secondary and it sets the replication, it never copies data and builds replication? It only starts replication from that side and we will lose all the data which is not available on secondary
Yes to the first question. As to the second, we reiterate the fact that MySQL Fabric does not handle provisioning data for instances. When you’re using it to manage nodes, it (currently) assumes the nodes all have a copy of the data. It is your responsibility to make sure that is the case, though this may change in the future. To give you a specific example, suppose you run the following commands:


mysqlfabric group create ha
mysqlfabric group add ha node1
mysqlfabric group add ha node2

Then, on node1’s MySQL’s CLI:

create database test;

and finally run:

mysqlfabric group promote ha --slave_id=<uuid_for_node2>

You’ll end up with node2 being PRIMARY, node1 being SECONDARY, without the ‘test’ database on the PRIMARY node.

Are there any limitations around adding shards? Is it possible to re-configure the sharding keys to handle the additional shard(s)?
We don’t know about any limitations, and yes, it is possible to add more shards and configure the mapping. We did not have enough time to do this at the webinar, but the transcript on the repo includes examples for splitting and moving shards. What you want in this case is to add a new group (can be a single node) and then split a shard (optionally providing a pivot value), keeping one part on the original group and the rest on the new one. Fernando encountered one problem when doing this on a busy server, though the bug is not yet verified so that may have been a user error on his side, but we’d recommend you do similar tests yourself before attempting this in production.

Is the addition of Connectors at the MySQL Fabric level or even at application side? Is an application code change a huge effort?
It’s at the application side. It’s difficult to estimate the effort for the code changes, but if your database access is reasonably modularized/abstracted from your application logic, it shouldn’t be huge. In fact, for HA, if you don’t want to split read and writes, you can migrate an application to a MySQL Fabric-aware connector with just a couple of lines changed (changing the connection string and then requesting a MODE_READWRITE connection always).

What is the minimum server requirements to setup mysqlfabric and does it use any multithreading if i have more than one core for each node. Can we get the transcripts with examples..
We don’t know of the official minimum requirements, but it is a python script which in our experience has been lightweight. Remember it does *not* act as a middle man/proxy between your app and the servers, it is only involved when a client establishes a new connection or when a change in status happens in a server. As for multithreading, we know it’s concurrent (i.e. multiple clients can establish connections at the same time) but we don’t know if it’s parallel.

The transcript with examples can be found here.

How does MySQL Fabric handle latency issues? Does latency increase with mysql traffic?
MySQL Fabric uses regular MySQL connections so latency will affect it the same as any other app using MySQL, and yes, it will increase with traffic (but not due to Fabric).

What if MySQL Fabric is hung? Will connections going to the primary stop? And how can we deduce if there is an issue at the fabric side?
If MySQL Fabric is hung, the following scenarios will be problematic:

  • New connections come in. Clients will try to connect to Fabric before trying to connect any hosts (they won’t know what hosts to connect to otherwise!) so if MySQL Fabric is down, no new connections can be established.
  • A failure happens in the managed servers. MySQL Fabric is responsible for monitoring for failures and taking the appropriate actions (including promoting another server, repointing replicas, etc), so if something fails while MySQL Fabric is down, there will be nobody to take action.

Are there any scenarios where a data loss can happen when promoting/demoting nodes (or nodes failure happens) in a production environment?
Given the use of gtid-enforce-consistency and the fact that MySQL Fabric won’t promote a node to PRIMARY until it has caught up with any pending changes, we feel this is unlikely, but we’re planning a blog post specifically focused on evaluating the potential of data loss during failover.

How to configure MySQL Fabric to use synchronous / semi-synchronous mechanisms?
We have not attempted this, but from this manual page, we think currently only async replication is supported. I would expect sync to be in the roadmap though. If this is something you’d like to see, another section on the same page has some suggestions on how to make it happen.

Thanks again for attending the webinar, and feel free to post any further questions in the comments section.

About Fernando Ipar

Fernando is part of Percona's team working as Senior Consultant. Prior to joining Percona, Fernando worked as a consultant for financial services institutions, telcos, and technology providers.

Comments

  1. Moody says:

    Nice Q&A .. will any tests be made integrating PXC with Fabric? i.e replacing the replication topologies with xtradb cluster? I’d imagine it would give PXC the ability for write scaling and auto sharding..

  2. @Moody:

    I think trying to integrate Fabric’s HA parts with PXC would not add any benefit (that’s my personal opinion), and it would be complicated anyway, since PXC relies on Galera and not on MySQL replication.

    Sharding could work though, I don’t see a reason not to have each shard group be a PXC, with Fabric setting up async M/S replication between the global PXC and the shards. I would only use Fabric for shard routing though, and I’d never promote or activate any of the groups (as again, the HA features wouldn’t work with Galera anyway).

    Finally, to answer your specific question, I don’t have any short term plans to test this kind of integration, but of course that does not mean it may not happen if the needs comes up while working on a case :)

  3. Longxing says:

    Hi Fernando,
    I have a question about the fault-tolerant ability of fabric server, would you please do me a favor.

    I used to read a article about fabric which says:
    (http://www.clusterdb.com/mysql-fabric/mysql-fabric-adding-high-availability-and-scaling-to-mysql)

    The MySQL Fabric process itself is not fault-tolerant and must be restarted in the event of it failing.
    Note that this does not represent a single-point-of-failure for the server farm (HA and/or sharding) as
    the connectors are able to continue routing operations using their local caches while the MySQL
    Fabric process is unavailable.

    So I do some tests about the fault-tolerant ability. I stop the fabirc server and do some queries
    to the connector. But a error happens. The error message is as follows:

    [root@JSSIOEtest01 python]# python test_ddl.py
    Traceback (most recent call last):
    File “test_ddl.py”, line 7, in
    autocommit=True
    File “/usr/lib/python2.6/site-packages/mysql/connector/__init__.py”, line 144, in connect
    return mysql.connector.fabric.connect(*args, **kwargs)
    File “/usr/lib/python2.6/site-packages/mysql/connector/fabric/__init__.py”, line 53, in connect
    return MySQLFabricConnection(**kwargs)
    File “/usr/lib/python2.6/site-packages/mysql/connector/fabric/connection.py”, line 809, in __init__
    self.store_config(**kwargs)
    File “/usr/lib/python2.6/site-packages/mysql/connector/fabric/connection.py”, line 948, in store_config
    self._configure_fabric(config['fabric'])
    File “/usr/lib/python2.6/site-packages/mysql/connector/fabric/connection.py”, line 929, in _configure_fabric
    self._fabric.seed()
    File “/usr/lib/python2.6/site-packages/mysql/connector/fabric/connection.py”, line 332, in seed
    fabinst.connect()
    File “/usr/lib/python2.6/site-packages/mysql/connector/fabric/connection.py”, line 763, in connect
    self._proxy = self._xmlrpc_get_proxy()
    File “/usr/lib/python2.6/site-packages/mysql/connector/fabric/connection.py”, line 747, in _xmlrpc_get_proxy
    proxy._some_nonexisting_method() # pylint: disable=W0212
    File “/usr/lib64/python2.6/xmlrpclib.py”, line 1199, in __call__
    return self.__send(self.__name, args)
    File “/usr/lib64/python2.6/xmlrpclib.py”, line 1489, in __request
    verbose=self.__verbose
    File “/usr/lib/python2.6/site-packages/mysql/connector/fabric/connection.py”, line 272, in request
    raise InterfaceError(“Connection with Fabric failed: ” + msg)
    mysql.connector.errors.InterfaceError: Connection with Fabric failed:

    It seems that connector need to connect fabric server first rather than using the local cache.
    Am I right? Or something else?

    Regards
    Longxing

  4. Martin Arrieta says:

    @Longxing,

    Correct, the first time that you use the connector it will need to read the server information from the store server, however after that it will use the local cache. Try the following example:

    1- Start your script, open the connector and keep running queries.
    2- Stop the store server
    3- Check if the status of your script

    You will see that your script will keep running without problems using the local cache even with the store server down.

    Regards,

    Martin
    @MartinArrietaC

  5. Shen Longxing says:

    @Martin,

    Grateful for your reply, it does helpful to me.

    And I have another question, would you please do me a favor. Thank you!
    I want to reshard the shards, and it seems that there is no api to achieve that.

    I used to read an article from Oracle saying there are tools using to reshard, but I do not know what are they.
    Can you please give me some advice?

    Regards
    Longxing

  6. @Longxing,

    To my knowledge, the only currently existing operations that change shards are split and move. You can see an example of a split here: https://github.com/martinarrieta/vagrant-fabric/blob/sharding/webinar-commands/session-0.org#splitting-a-shard

    FYI, at this moment there are two open bugs that I know about that affect the split and move functionality: http://bugs.mysql.com/bug.php?id=73212 and http://bugs.mysql.com/bug.php?id=73546

    Regards,
    Fernando.

  7. Shen Longxing says:

    @Fernando

    Grateful for your reply, it does helpful to me.
    I do the split according to your example:https://github.com/martinarrieta/vagrant-fabric/blob/sharding/webinar-commands/session-0.org#splitting-a-shard.

    But, I got a error when execute “mysqlfabric sharding split_shard 1 salaries-4″.
    The log is as follows:
    [root@JSSIOEtest05 python]# mysqlfabric sharding split_shard 1 group_id-3 –split_v alue=10000
    Password for admin:
    Procedure :
    { uuid = 2cf79d32-feaa-4355-a6d4-b644d92704f1,
    finished = True,
    success = False,
    return = BackupError: (‘Error while restoring the backup using the mysql client\n, %s’, “ERROR 1840 (HY000) at line 24 in file: ‘MySQL_132.228.239.24_3317.sql’ : @@GLOBAL.GTID_PURGED can only be set when @@GLOBAL.GTID_EXECUTED is empty.\n”),
    activities =
    }

    And you told me that there is a open bug about the split:http://bugs.mysql.com/bug.php?id=73212.
    Is it the reason for the error that I met?

    Regards
    Longxing

  8. @Longxing,

    Yes, I believe that bug is the reason for the error you’re getting. In my test case for the bug, I have ongoing write activity to the shard while doing the split. Is that your case too? If you get this error even with no activity, please add that as a comment to the bug report to help the engineers working on it.

    Regards,
    Fernando.

  9. Shen Longxing says:

    @Fernando

    I just split the shard without any activity. And i will add this to the bug report. Thank you!

    Regards
    Longxing

Speak Your Mind

*