November 27, 2014

Managing shards of MySQL databases with MySQL Fabric

This is the fourth post in our MySQL Fabric series. In case you’re joining us now, we started with an introductory post, and then discussed High Availability (HA) using MySQL Fabric here (Part 1) and here (Part 2). Today we will talk about how MySQL Fabric can help you scale out MySQL databases with sharding.

Introduction

At the time of writing, MySQL Fabric includes support for range- and hash-based sharding. As with HA, the functionality is split between client, through a MySQL Fabric-aware connector; and server, through the mysqlfabric utility and the XML-RPC server we’ve talked about before.

In this post, we’ll go through the process of setting up a sharded table for use with MySQL Fabric, and then go through some usage examples, again using the Python connector.

In our next post, we’ll talk about shard management operations, and go into more detail about how we can combine the Sharding and HA features of MySQL Fabric.

The architecture

For our examples, we’ll be using a sharding branch from our vagrant-fabric repository. If you have been following previous posts and already have a local copy of the repo, you can get this one just by running the following command:

from the root of your copy. Bear in mind that the node names are the same in the Vagrantfile, so while in theory  just running vagrant provision should be enough, you may have to run vagrant destroy and vagrant up again, if you hit unexpected behavior.

The only difference between this branch and the original one is that you’ll have two mysqld instances per node: one on port 3306 and one on port 13306. This will let us achieve high availability for our shard groups. But don’t worry about that for now, it’s something we’ll discuss more in depth in our next post.

In today’s examples, we’ll be using the three group architecture described by this diagram:

Fabric Sharding Structure

The blue boxes represent shard-groups and the green box represent the global-group. The red arrows indicate the flow of replication and the violet arrows represent client connections.

Setting up sharding

The official documentation about sharding with MySQL Fabric can be found here. We’ll be using the same example employees database and shard the salaries table.

As we said, to keep things simple for the introduction, we’ll create all the groups but only add one instance to each one of them. In our next post, we’ll use two instances per group to evaluate how MySQL Fabric can make our shards highly available, and how it can rearrange replication topologies automatically after a failure.

To start, let’s create three groups:

 

The global group will be used to propagate schema changes and to store unpartitioned data. Think of configuration tables that don’t need to be sharded, for example.

The other two groups will host shards, that is, tables that will have the same structure across all the nodes, but not the same data (and that will be empty in the global group’s nodes).

Now, let’s add one instance to each group:

We also need to promote the groups. Even though each group has a single node, MySQL Fabric sets up that node as SECONDARY, which means it can’t take writes.

Finally, we are ready to create a shard definition and associate ranges to groups:

The integer after each shard group is the lower bound for emp_no values found on that shard.

After the last command, the shard groups should be replicating off the global one. We can verify that this is the case by checking salaries-3:

Looks good. Let’s go ahead and create the database schema. To avoid being too verbose, we’re only including the create statement for the salaries table in this example. Notice we run this on the PRIMARY node for the global group:

And again, check that it made it to the shard groups:

Good. We’re now ready to use the Python connector and load some data into this table. We’ll be using the following script:

This is similar to the script we used in our HA post. It inserts rows with random data in an endless loop. The sleep on every iteration is there just to make it easier to cancel the script, and to keep row insert rate under control.

If you leave this running for a while, you should then be able to check the global server and individual shards, and confirm they have different data:

As you can see, the global group’s server has no data for this table, and each shard’s server has data within the defined boundaries.

Querying data is done similarly (though with a READ_ONLY connection), and we can also lookup the group a row belongs to using the mysqlfabric utility directly:

Bear in mind that this lookups only use the fabric store, which means they can tell you on which servers a given row may be, but can’t confirm if the row exists or not. You need to actually query the given servers for that. If you use the connector, both steps are done for you when you issue the query.

The following code snippets illustrate the point:

In our examples, we connected directly to the PRIMARY node of the global group in order to execute DDL statements, but the same can be done requesting a global connection to MySQL Fabric, like so:

We can see that the table gets replicated as expected:

Note that we’re explicitly indicating we want to connect to the global group here. When establishing a MySQL Fabric connection, we need to specify either a group name or a key and table pair (as in the insert example).

Summary

Today we’ve presented the basics of how MySQL Fabric can help you scale out by sharding, but we’ve intentionally left a few things out of the picture to keep this example simple.

In our next post, we’ll see how we can combine MySQL Fabric’s HA and sharding features, what support we have for shard operations and how HASH sharding works in MySQL Fabric.

About Martin Arrieta

Martin joined Percona in January 2012. He has been using Linux and open source technologies since 1999. Martin has worked with Apache, DNS's, mail servers, iptables and MySQL servers.

Comments

  1. adam says:

    Excellent article. Is it possible to show few examples using PHP ?

  2. @adam: thank you for taking the time to comment!

    We have a follow up post which should go online sometime next week, and while we have been using python for the examples too, I’ll see if we can make it on time with a PHP example.

    If that’s not the case, we have a webinar coming up soon (http://www.percona.com/resources/mysql-webinars/putting-mysql-fabric-use) and I’ll make sure to include PHP examples there. We’ll be doing a post announcing it shortly.

  3. Tim says:

    I have watched the excellent webinar (Putting MySQL Fabric to Use) but there is no PHP examples.

    I also looking forward to examples about connect to a Fabric HA group with PHP.

  4. Shen says:

    I create 3 groups, global-group, group_id-1 and group_id-2, then built a shard. Till now, everything is OK. But when i split the shard with a new group named group_id-3, there is an error occurred. The message is as follows:
    # mysqlfabric sharding split_shard 2 group_id-3 –split_value=100000
    Password for admin:
    Procedure :
    { uuid = a0eaff93-f6fe-4a7d-9c44-e7e53882fbfe,
    finished = True,
    success = False,
    return = BackupError: (‘Error while restoring the backup using the mysql client\n, %s’, “ERROR 1840 (HY000) at line 24 in file: ‘MySQL_132.228.239.19_3316.sql’: @@GLOBAL.GTID_PURGED can only be set when @@GLOBAL.GTID_EXECUTED is empty.\n”),
    activities =
    }
    I run the “reset master” command first, but the error is still there.what is this error?

    Regards
    Shen

  5. @Tim: I have PHP examples on my list. Will do either a blog post or add examples to our vagrant repo. I’ve been away on vacation the last couple of weeks so I don’t yet have an ETA for when this will be published, but it should be soon :)

    @Shen: It seems you’re hitting http://bugs.mysql.com/bug.php?id=73212. If you think that’s the case, please mark the bug as “affects me”, to help Oracle prioritise it.

Speak Your Mind

*