Where the open source community meets: Secure your spot for Percona Live Amsterdam! - Register

Downloads

Blog

MySQL Sharding with ProxySQL

August 30, 2016

Author

Marco Tusa

MySQL

Share this Post:

This article demonstrates how MySQL sharding with ProxySQL works.

Recently a colleague of mine asked me to provide a simple example on how ProxySQL performs sharding.

In response, I’m writing this short tutorial in the hope it will illustrate ProxySQL’s sharding functionalities, and help people out there better understand how to use it.

ProxySQL is a very powerful platform that allows us to manipulate and manage our connections and queries in a simple but effective way. This article shows you how.

Before starting let’s clarify some basic concepts.

- ProxySQL organizes its internal set of servers in Host Groups (HG), and each HG can be associated with users and Query Rules (QR)

- Each QR can be final (apply = 1) or = let ProxySQL continue to parse other QRs

- A QR can be a rewrite action, be a simple match, have a specific target HG, or be generic

- QRs are defined using regex

You can see QRs as a sequence of filters and transformations that you can arrange as you like.

These simple basic rules give us enormous flexibility. They allow us to create very simple actions like a simple query re-write, or very complex chains that could see dozens of QR concatenated. Documentation can be found here.

The information related to HGs or QRs is easily accessible using the ProxySQL administrator interface, in the tables mysql_servers, mysql_query_rules and stats.stats_mysql_query_rules. The last one allows us to evaluate if and how the rule(s) is used.

With regards to sharding, what can ProxySQL do to help us achieve what we need (in a relatively easy way)? Some people/companies include sharding logic in the application, use multiple connections to reach the different targets, or have some logic to split the load across several schemas/tables. ProxySQL allows us to simplify the way connectivity and query distribution is supposed to work reading data in the query or accepting HINTS.

No matter what the requirements, the sharding exercise can be summarized in a few different categories.

- By splitting the data inside the same container (like having a shard by State where each State is a schema)

- By physical data location (this can have multiple MySQL servers in the same room, as well as having them geographically distributed)

- A combination of the two, where I do split by State using a dedicated server, and again split by schema/table by whatever (say by gender)

In the following examples, I show how to use ProxySQL to cover the three different scenarios defined above (and a bit more).

The example below will report text from the Admin ProxySQL interface and the MySQL console. I will mark each one as follows:

- Mc for MySQL console

- Pa for ProxySQL Admin

Please note that the MySQL console MUST use the -c flag to pass the comments in the query. This is because the default behavior in the MySQL console is to remove the comments.

I am going to illustrate procedures that you can replicate on your laptop, and when possible I will mention a real implementation. This because I want you to directly test the ProxySQL functionalities.

For the example described below I have a PrxySQL v1.2.2 that is going to become the master in few days. You can download it from:

git clone https://github.com/sysown/proxysql.git
git checkout v1.2.2

1 2	git clone https://github.com/sysown/proxysql.git git checkout v1.2.2

Then to compile:

cd <path to proxy source code>
make
make install

cd <path to proxy source code>

make

make install

If you need full instructions on how to install and configure ProxySQL, read here and here.

Finally, you need to have the WORLD test DB loaded. WORLD test DB can be found here.

Shard inside the same MySQL Server using three different schemas split by continent

Obviously, you can have any number of shards and relative schemas. What is relevant here is demonstrating how traffic gets redirected to different targets (schemas), maintaining the same structure (tables), by discriminating the target based on some relevant information in the data or pass by the application.

OK, let us roll the ball.

[Mc]
+---------------+-------------+
| Continent     | count(Code) |
+---------------+-------------+
| Asia          |          51 | <--
| Europe        |          46 | <--
| North America |          37 | 
| Africa        |          58 | <-- 
| Oceania       |          28 |
| Antarctica    |           5 |
| South America |          14 |
+---------------+-------------+

[Mc]

+---------------+-------------+

| Continent | count(Code) |

+---------------+-------------+

| Asia | 51 | <--

| Europe | 46 | <--

| North America | 37 |

| Africa | 58 | <--

| Oceania | 28 |

| Antarctica | 5 |

| South America | 14 |

+---------------+-------------+

For this exercise, I will use three hosts in replica.

To summarize, I will need:

- Three hosts: 192.168.1.[5-6-7]

- Three schemas: Continent X + world schema

- One user : user_shardRW

- Three hostgroups: 10, 20, 30 (for future use)

First, we will create the schemas Asia, Africa, Europe:

[Mc]
Create schema [Asia|Europe|North_America|Africa];
create table Asia.City as select a.* from  world.City a join Country on a.CountryCode = Country.code where Continent='Asia' ;
create table Europe.City as select a.* from  world.City a join Country on a.CountryCode = Country.code where Continent='Europe' ;
create table Africa.City as select a.* from  world.City a join Country on a.CountryCode = Country.code where Continent='Africa' ;
create table North_America.City as select a.* from  world.City a join Country on a.CountryCode = Country.code where Continent='North America' ;

create table Asia.Country as select * from  world.Country where Continent='Asia' ;
create table Europe.Country as select * from  world.Country where Continent='Europe' ;
create table Africa.Country as select * from  world.Country  where Continent='Africa' ;
create table North_America.Country as select * from  world.Country where Continent='North America' ;

[Mc]

Create schema [Asia|Europe|North_America|Africa];

create table Asia.City as select a.* from world.City a join Country on a.CountryCode = Country.code where Continent='Asia' ;

create table Europe.City as select a.* from world.City a join Country on a.CountryCode = Country.code where Continent='Europe' ;

create table Africa.City as select a.* from world.City a join Country on a.CountryCode = Country.code where Continent='Africa' ;

create table North_America.City as select a.* from world.City a join Country on a.CountryCode = Country.code where Continent='North America' ;

create table Asia.Country as select * from world.Country where Continent='Asia' ;

create table Europe.Country as select * from world.Country where Continent='Europe' ;

create table Africa.Country as select * from world.Country where Continent='Africa' ;

create table North_America.Country as select * from world.Country where Continent='North America' ;

Now, create the user

grant all on *.* to user_shardRW@'%' identified by 'test';

1	grant all on . to user_shardRW@'%' identified by 'test';

Now let us start to configure the ProxySQL:

[Pa]
insert into mysql_users (username,password,active,default_hostgroup,default_schema) values ('user_shardRW','test',1,10,'test_shard1');
LOAD MYSQL USERS TO RUNTIME;SAVE MYSQL USERS TO DISK;

INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.5',10,3306,100);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.6',20,3306,100);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.7',30,3306,100);
LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;

[Pa]

insert into mysql_users (username,password,active,default_hostgroup,default_schema) values ('user_shardRW','test',1,10,'test_shard1');

LOAD MYSQL USERS TO RUNTIME;SAVE MYSQL USERS TO DISK;

INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.5',10,3306,100);

INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.6',20,3306,100);

INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.7',30,3306,100);

LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;

With this we have defined the user, the servers and the host groups.

Let us start to define the logic with the query rules:

[Pa]
delete from mysql_query_rules where rule_id > 30;
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply) VALUES (31,1,'user_shardRW',"^SELECTs*(.*)s*froms*world.(S*)s(.*).*Continent='(S*)'s*(s*.*)$","SELECT 1 from 4.2 WHERE 1=1 5",1);
LOAD MYSQL QUERY RULES TO RUNTIME;SAVE MYSQL QUERY RULES TO DISK;

[Pa]

delete from mysql_query_rules where rule_id > 30;

INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply) VALUES (31,1,'user_shardRW',"^SELECTs*(.*)s*froms*world.(S*)s(.*).*Continent='(S*)'s*(s*.*)$","SELECT 1 from 4.2 WHERE 1=1 5",1);

LOAD MYSQL QUERY RULES TO RUNTIME;SAVE MYSQL QUERY RULES TO DISK;

I am now going to query the master (or a single node), but I am expecting ProxySQL to redirect the query to the right shard, catching the value of the continent:

[Mc]
 SELECT name,population from world.City  WHERE Continent='Europe' and CountryCode='ITA' order by population desc limit 1;
+------+------------+
| name | population |
+------+------------+
| Roma |    2643581 |
+------+------------+

[Mc]

SELECT name,population from world.City WHERE Continent='Europe' and CountryCode='ITA' order by population desc limit 1;

+------+------------+

| name | population |

+------+------------+

| Roma | 2643581 |

+------+------------+

You can say: “Hey! You are querying the schema World, of course you get back the correct data.”

This is not what really happened. ProxySQL did not query the schema World, but the schema Europe.

Let’s look at the details:

[Pa]
select * from stats_mysql_query_digest;
Original    :SELECT name,population from world.City  WHERE Continent='Europe' and CountryCode='ITA' order by population desc limit 1;
Transformed :SELECT name,population from Europe.City WHERE ?=? and CountryCode=? order by population desc limit ?

[Pa]

select * from stats_mysql_query_digest;

Original :SELECT name,population from world.City WHERE Continent='Europe' and CountryCode='ITA' order by population desc limit 1;

Transformed :SELECT name,population from Europe.City WHERE ?=? and CountryCode=? order by population desc limit ?

Let me explain what happened.

Rule 31 in ProxySQL will take all the FIELDS we will pass in the query. It will catch the CONTINENT in the WHERE clause, it will take any condition after WHERE and it will reorganize the queries all using the RegEx.

Does this work for any table in the sharded schemas? Of course it does.

A query like: SELECT name,population from world.Country WHERE Continent='Asia' ;
Will be transformed into: SELECT name,population from Asia.Country WHERE ?=?

[Mc]
+----------------------+------------+
| name                 | population |
+----------------------+------------+
| Afghanistan          |   22720000 |
| United Arab Emirates |    2441000 |
| Armenia              |    3520000 |
<snip ...>
| Vietnam              |   79832000 |
| Yemen                |   18112000 |
+----------------------+------------+

[Mc]

+----------------------+------------+

| name | population |

+----------------------+------------+

| Afghanistan | 22720000 |

| United Arab Emirates | 2441000 |

| Armenia | 3520000 |

| Vietnam | 79832000 |

| Yemen | 18112000 |

+----------------------+------------+

Another possible approach is to instruct ProxySQL to shard is to pass a hint inside a comment. Let see how.

First let me disable the rule I just inserted. This is not really needed, but we’ll do it so you can see how. 🙂

[Pa]
mysql> update mysql_query_rules set active=0 where rule_id=31;
Query OK, 1 row affected (0.00 sec)
mysql> LOAD MYSQL QUERY RULES TO RUNTIME;SAVE MYSQL QUERY RULES TO DISK;
Query OK, 0 rows affected (0.00 sec)

[Pa]

mysql> update mysql_query_rules set active=0 where rule_id=31;

Query OK, 1 row affected (0.00 sec)

mysql> LOAD MYSQL QUERY RULES TO RUNTIME;SAVE MYSQL QUERY RULES TO DISK;

Query OK, 0 rows affected (0.00 sec)

Done.

Now what I want is for *ANY* query that contains the comment /* continent=X */ to go to the continent X schema, same server.

To do so, I instruct ProxySQL to replace any reference to the world schema inside the query I am going to submit.

[Pa]
delete from mysql_query_rules where rule_id in (31,33,34,35,36);
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply,FlagOUT,FlagIN) VALUES (31,1,'user_shardRW',"S*s*/*s*continent=.*Asias**.*",null,0,23,0);
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply,FlagIN,FlagOUT) VALUES (32,1,'user_shardRW','world.','Asia.',0,23,23);
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply,FlagOUT,FlagIN) VALUES (33,1,'user_shardRW',"S*s*/*s*continent=.*Europes**.*",null,0,25,0);
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply,FlagIN,FlagOUT) VALUES (34,1,'user_shardRW','world.','Europe.',0,25,25);
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply,FlagOUT,FlagIN) VALUES (35,1,'user_shardRW',"S*s*/*s*continent=.*Africas**.*",null,0,24,0);
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply,FlagIN,FlagOUT) VALUES (36,1,'user_shardRW','world.','Africa.',0,24,24);
LOAD MYSQL QUERY RULES TO RUNTIME;SAVE MYSQL QUERY RULES TO DISK;

[Pa]

delete from mysql_query_rules where rule_id in (31,33,34,35,36);

INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply,FlagOUT,FlagIN) VALUES (31,1,'user_shardRW',"S*s*/*s*continent=.*Asias**.*",null,0,23,0);

INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply,FlagIN,FlagOUT) VALUES (32,1,'user_shardRW','world.','Asia.',0,23,23);

INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply,FlagOUT,FlagIN) VALUES (33,1,'user_shardRW',"S*s*/*s*continent=.*Europes**.*",null,0,25,0);

INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply,FlagIN,FlagOUT) VALUES (34,1,'user_shardRW','world.','Europe.',0,25,25);

INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply,FlagOUT,FlagIN) VALUES (35,1,'user_shardRW',"S*s*/*s*continent=.*Africas**.*",null,0,24,0);

INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply,FlagIN,FlagOUT) VALUES (36,1,'user_shardRW','world.','Africa.',0,24,24);

LOAD MYSQL QUERY RULES TO RUNTIME;SAVE MYSQL QUERY RULES TO DISK;

How does this work?

I have defined two concatenated rules. The first captures the incoming query containing the desired value (like continent = Asia). If the match is there, ProxySQL exits that action, but while doing so it will read the Apply field. If Apply is 0, it will read the FlagOUT value. At this point it will go to the first rule (in sequence) that has the value of FlagIN equal to the FlagOUT.

The second rule gets the request and will replace the value of world with the one I have defined. In short, it replaces whatever is in the match_pattern with the value that is in the replace_pattern.

Now ProxySQL implements the Re2 Google library for RegEx. Re2 is very fast but has some limitations, like it does NOT support (at the time of the writing) the flag option g. In other words, if I have a select with many tables, and as such several “world”, Re2 will replace ONLY the first instance.

As such, a query like:

Select /* continent=Europe */ * from world.Country join world.City on world.City.CountryCode=world.Country.Code where Country.code='ITA' ;

1	Select /* continent=Europe / from world.Country join world.City on world.City.CountryCode=world.Country.Code where Country.code='ITA' ;

Will be transformed into:

Select /* continent=Europe */ * from Europe.Country join world.City on world.City.CountryCode=world.Country.Code where Country.code='ITA' ;

1	Select /* continent=Europe / from Europe.Country join world.City on world.City.CountryCode=world.Country.Code where Country.code='ITA' ;

And fail.

The other day, Rene and I were discussing how to solve this given the lack of implementation in Re2. Finally, we opted for recursive actions.

What does this mean? It means that ProxySQL from v1.2.2 now has a new functionality that allows recursive calls to a Query Rule. The maximum number of iterations that ProxySQL can run is managed by the option (global variable) mysql-query_processor_iterations. Mysql-query_processor_iterations define how many operations a query process can execute as whole (from start to end).

This new implementation allows us to reference a Query Rule to itself to be executed multiple times.

If you go back you will notice that QR 34 has FlagIN and FlagOUT pointing to the same value of 25 and Apply =0. This brings ProxySQL to recursively call rule 34 until it changes ALL the values of the word world.

The result is the following:

[Mc]
Select /* continent=Europe */ Code, City.Name, City.population  from world.Country join world.City on world.City.CountryCode=world.Country.Code where City.population > 10000 group by Name order by City.Population desc limit 5;
+------+---------------+------------+
| Code | Name          | population |
+------+---------------+------------+
| RUS  | Moscow        |    8389200 |
| GBR  | London        |    7285000 |
| RUS  | St Petersburg |    4694000 |
| DEU  | Berlin        |    3386667 |
| ESP  | Madrid        |    2879052 |
+------+---------------+------------+

[Mc]

Select /* continent=Europe */ Code, City.Name, City.population from world.Country join world.City on world.City.CountryCode=world.Country.Code where City.population > 10000 group by Name order by City.Population desc limit 5;

+------+---------------+------------+

| Code | Name | population |

+------+---------------+------------+

| RUS | Moscow | 8389200 |

| GBR | London | 7285000 |

| RUS | St Petersburg | 4694000 |

| DEU | Berlin | 3386667 |

| ESP | Madrid | 2879052 |

+------+---------------+------------+

You can see ProxySQL internal information using the following queries:

[Pa]
 select active,hits, mysql_query_rules.rule_id, match_digest, match_pattern, replace_pattern, cache_ttl, apply,flagIn,flagOUT FROM mysql_query_rules NATURAL JOIN stats.stats_mysql_query_rules ORDER BY mysql_query_rules.rule_id;
+--------+------+---------+---------------------+----------------------------------------+-----------------+-----------+-------+--------+---------+
| active | hits | rule_id | match_digest        | match_pattern                          | replace_pattern | cache_ttl | apply | flagIN | flagOUT |
+--------+------+---------+---------------------+----------------------------------------+-----------------+-----------+-------+--------+---------+
| 1      | 1    | 33      | NULL                | S*s*/*s*continent=.*Europes**.* | NULL            | NULL      | 0     | 0      | 25      |
| 1      | 4    | 34      | NULL                | world.                                 | Europe.         | NULL      | 0     | 25     | 25      |
| 1      | 0    | 35      | NULL                | S*s*/*s*continent=.*Africas**.* | NULL            | NULL      | 0     | 0      | 24      |
| 1      | 0    | 36      | NULL                | world.                                 | Africa.         | NULL      | 0     | 24     | 24      |
+--------+------+---------+---------------------+----------------------------------------+-----------------+-----------+-------+--------+---------+

[Pa]

select active,hits, mysql_query_rules.rule_id, match_digest, match_pattern, replace_pattern, cache_ttl, apply,flagIn,flagOUT FROM mysql_query_rules NATURAL JOIN stats.stats_mysql_query_rules ORDER BY mysql_query_rules.rule_id;

+--------+------+---------+---------------------+----------------------------------------+-----------------+-----------+-------+--------+---------+

+--------+------+---------+---------------------+----------------------------------------+-----------------+-----------+-------+--------+---------+

| 1 | 1 | 33 | NULL | S*s*/*s*continent=.*Europes**.* | NULL | NULL | 0 | 0 | 25 |

| 1 | 4 | 34 | NULL | world. | Europe. | NULL | 0 | 25 | 25 |

| 1 | 0 | 35 | NULL | S*s*/*s*continent=.*Africas**.* | NULL | NULL | 0 | 0 | 24 |

| 1 | 0 | 36 | NULL | world. | Africa. | NULL | 0 | 24 | 24 |

+--------+------+---------+---------------------+----------------------------------------+-----------------+-----------+-------+--------+---------+

And:

[Pa]
select * from stats_mysql_query_digest;
<snip and taking only digest_text>
Select Code, City.Name, City.population from Europe.Country join Europe.City on Europe.City.CountryCode=Europe.Country.Code where City.population > ? group by Name order by City.Population desc limit ?

[Pa]

select * from stats_mysql_query_digest;

Select Code, City.Name, City.population from Europe.Country join Europe.City on Europe.City.CountryCode=Europe.Country.Code where City.population > ? group by Name order by City.Population desc limit ?

As you can see ProxySQL has nicely replaced the word world in the query to Europe, and it ran Query Rule 34 four times (hits).

This is obviously working for Insert/Update/Delete as well.

Queries like:

insert into  /* continent=Europe */  world.City values(999999,'AAAAAAA','ITA','ROMA',0) ;

1	insert into /* continent=Europe */ world.City values(999999,'AAAAAAA','ITA','ROMA',0) ;

Will be transformed into:

[Pa]
select digest_text from stats_mysql_query_digest;
+-------------------------------------------+
| digest_text                               |
+-------------------------------------------+
| insert into Europe.City values(?,?,?,?,?) |
+-------------------------------------------+

[Pa]

select digest_text from stats_mysql_query_digest;

+-------------------------------------------+

| digest_text |

+-------------------------------------------+

| insert into Europe.City values(?,?,?,?,?) |

+-------------------------------------------+

And executed only on the desired schema.

Sharding by host

Using hint

How can I shard and redirect the queries to a host (instead of a schema)? This is even easier! 🙂

The main point is that whatever matches the rule should go to a defined HG. No rewrite imply, which means less work.

So how this is done? As before, I have three NODES: 192.168.1.[5-6-7]. For this example, I will use world DB (no continent schema), distributed in each node, and I will retrieve the node bound to the IP to be sure I am going to the right place.

I instruct ProxySQL to send my query by using a HINT to a specific host. I choose the hint “shard_host_HG”, and I am going to inject it in the query as a comment.

As such the Query Rules will be:

[Pa]
delete from mysql_query_rules where rule_id in (40,41,42, 10,11,12);
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,destination_hostgroup,apply) VALUES (10,1,'user_shardRW',"/*s*shard_host_HG=.*Europes**.",10,0);
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,destination_hostgroup,apply) VALUES (11,1,'user_shardRW',"/*s*shard_host_HG=.*Asias**.",20,0);
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,destination_hostgroup,apply) VALUES (12,1,'user_shardRW',"/*s*shard_host_HG=.*Africas**.",30,0);
LOAD MYSQL QUERY RULES TO RUNTIME;SAVE MYSQL QUERY RULES TO DISK;

[Pa]

delete from mysql_query_rules where rule_id in (40,41,42, 10,11,12);

INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,destination_hostgroup,apply) VALUES (10,1,'user_shardRW',"/*s*shard_host_HG=.*Europes**.",10,0);

INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,destination_hostgroup,apply) VALUES (11,1,'user_shardRW',"/*s*shard_host_HG=.*Asias**.",20,0);

INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,destination_hostgroup,apply) VALUES (12,1,'user_shardRW',"/*s*shard_host_HG=.*Africas**.",30,0);

LOAD MYSQL QUERY RULES TO RUNTIME;SAVE MYSQL QUERY RULES TO DISK;

While the queries I am going to test are:

[Mc]
Select /* shard_host_HG=Europe */ City.Name, City.Population from world.Country join world.City on world.City.CountryCode=world.Country.Code where Country.code='ITA' limit 5; SELECT * /* shard_host_HG=Europe */ from information_schema.GLOBAL_VARIABLES where variable_name like 'bind%';
Select /* shard_host_HG=Asia */ City.Name, City.Population from world.Country join world.City on world.City.CountryCode=world.Country.Code where Country.code='IND' limit 5; SELECT * /* shard_host_HG=Asia */ from information_schema.GLOBAL_VARIABLES where variable_name like 'bind%';
Select /* shard_host_HG=Africa */ City.Name, City.Population from world.Country join world.City on world.City.CountryCode=world.Country.Code where Country.code='ETH' limit 5; SELECT * /* shard_host_HG=Africa */ from information_schema.GLOBAL_VARIABLES where variable_name like 'bind%';

[Mc]

Select /* shard_host_HG=Europe */ City.Name, City.Population from world.Country join world.City on world.City.CountryCode=world.Country.Code where Country.code='ITA' limit 5; SELECT * /* shard_host_HG=Europe */ from information_schema.GLOBAL_VARIABLES where variable_name like 'bind%';

Select /* shard_host_HG=Asia */ City.Name, City.Population from world.Country join world.City on world.City.CountryCode=world.Country.Code where Country.code='IND' limit 5; SELECT * /* shard_host_HG=Asia */ from information_schema.GLOBAL_VARIABLES where variable_name like 'bind%';

Select /* shard_host_HG=Africa */ City.Name, City.Population from world.Country join world.City on world.City.CountryCode=world.Country.Code where Country.code='ETH' limit 5; SELECT * /* shard_host_HG=Africa */ from information_schema.GLOBAL_VARIABLES where variable_name like 'bind%';

Running the query for Africa, I will get:

[Mc]
Select /* shard_host_HG=Africa */ City.Name, City.Population from world.Country join world.City on world.City.CountryCode=world.Country.Code where Country.code='ETH' limit 5; SELECT * /* shard_host_HG=Africa */ from information_schema.GLOBAL_VARIABLES where variable_name like 'bind%';
+-------------+------------+
| Name        | Population |
+-------------+------------+
| Addis Abeba |    2495000 |
| Dire Dawa   |     164851 |
| Nazret      |     127842 |
| Gonder      |     112249 |
| Dese        |      97314 |
+-------------+------------+
+---------------+----------------+
| VARIABLE_NAME | VARIABLE_VALUE |
+---------------+----------------+
| BIND_ADDRESS  | 192.168.1.7    |
+---------------+----------------+

[Mc]

+-------------+------------+

| Name | Population |

+-------------+------------+

| Addis Abeba | 2495000 |

| Dire Dawa | 164851 |

| Nazret | 127842 |

| Gonder | 112249 |

| Dese | 97314 |

+-------------+------------+

+---------------+----------------+

| VARIABLE_NAME | VARIABLE_VALUE |

+---------------+----------------+

| BIND_ADDRESS | 192.168.1.7 |

+---------------+----------------+

That will give me:

[Pa]
select active,hits, mysql_query_rules.rule_id, match_digest, match_pattern, replace_pattern, cache_ttl, apply,flagIn,flagOUT FROM mysql_query_rules NATURAL JOIN stats.stats_mysql_query_rules ORDER BY mysql_query_rules.rule_id;
+--------+------+---------+---------------------+----------------------------------------+-----------------+-----------+-------+--------+---------+
| active | hits | rule_id | match_digest        | match_pattern                          | replace_pattern | cache_ttl | apply | flagIN | flagOUT |
+--------+------+---------+---------------------+----------------------------------------+-----------------+-----------+-------+--------+---------+
| 1      | 0    | 40      | NULL                | /*s*shard_host_HG=.*Europes**.    | NULL            | NULL      | 0     | 0      | 0       |
| 1      | 0    | 41      | NULL                | /*s*shard_host_HG=.*Asias**.      | NULL            | NULL      | 0     | 0      | 0       |
| 1      | 2    | 42      | NULL                | /*s*shard_host_HG=.*Africas**.    | NULL            | NULL      | 0     | 0      | 0       | <-- Note the HITS (2 as the run queries)
+--------+------+---------+---------------------+----------------------------------------+-----------------+-----------+-------+--------+---------+

[Pa]

+--------+------+---------+---------------------+----------------------------------------+-----------------+-----------+-------+--------+---------+

+--------+------+---------+---------------------+----------------------------------------+-----------------+-----------+-------+--------+---------+

| 1 | 0 | 40 | NULL | /*s*shard_host_HG=.*Europes**. | NULL | NULL | 0 | 0 | 0 |

| 1 | 0 | 41 | NULL | /*s*shard_host_HG=.*Asias**. | NULL | NULL | 0 | 0 | 0 |

| 1 | 2 | 42 | NULL | /*s*shard_host_HG=.*Africas**. | NULL | NULL | 0 | 0 | 0 | <-- Note the HITS (2 as the run queries)

+--------+------+---------+---------------------+----------------------------------------+-----------------+-----------+-------+--------+---------+

In this example, we have NO replace_patter. This is only a matching and redirecting Rule, where the destination HG is defined in the value of the destination_hostgroup attribute while inserting. In the case for Africa, it is HG 30.

The server in HG 30 is:

[Pa]
select hostgroup_id,hostname,port,status from mysql_servers ;
+--------------+-------------+------+--------+
| hostgroup_id | hostname    | port | status |
+--------------+-------------+------+--------+
| 10           | 192.168.1.5 | 3306 | ONLINE |
| 20           | 192.168.1.6 | 3306 | ONLINE |
| 30           | 192.168.1.7 | 3306 | ONLINE | <---
+--------------+-------------+------+--------+

[Pa]

select hostgroup_id,hostname,port,status from mysql_servers ;

+--------------+-------------+------+--------+

+--------------+-------------+------+--------+

| 10 | 192.168.1.5 | 3306 | ONLINE |

| 20 | 192.168.1.6 | 3306 | ONLINE |

| 30 | 192.168.1.7 | 3306 | ONLINE | <---

+--------------+-------------+------+--------+

This perfectly matches our returned value.

You can try by your own the other two continents.

Using destination_hostgroup

Another way to assign a query’s final host is to use the the destination_hostgroup, set the Schema_name attribute and use the use schema syntax in the query.

For example:

[Pa]
INSERT INTO mysql_query_rules (active,schemaname,destination_hostgroup,apply) VALUES
(1, 'shard00', 1, 1), (1, 'shard01', 1, 1), (1, 'shard03', 1, 1),
(1, 'shard04', 2, 1), (1, 'shard06', 2, 1), (1, 'shard06', 2, 1),
(1, 'shard07', 3, 1), (1, 'shard08', 3, 1), (1, 'shard09', 3, 1);

[Pa]

INSERT INTO mysql_query_rules (active,schemaname,destination_hostgroup,apply) VALUES

(1, 'shard00', 1, 1), (1, 'shard01', 1, 1), (1, 'shard03', 1, 1),

(1, 'shard04', 2, 1), (1, 'shard06', 2, 1), (1, 'shard06', 2, 1),

(1, 'shard07', 3, 1), (1, 'shard08', 3, 1), (1, 'shard09', 3, 1);

And then in the query do something like :

use shard02; Select * from tablex;

1	use shard02; Select * from tablex;

I mention this method because it is currently one of the most common in large companies using SHARDING.

But it is not safe, because it relays on the fact the query will be executed in the desired HG. The risk of error is high.

Just think if a query doing join against a specified SHARD:

use shard01; Select * from tablex join shard03 on tablex.id = shard03.tabley.id;

1	use shard01; Select * from tablex join shard03 on tablex.id = shard03.tabley.id;

This will probably generate an error because shard03 is probably NOT present on the host containing shard01.

As such this approach can be used ONLY when you are 100% sure of what you are doing, and when you are sure NO query will have an explicit schema declaration.

Shard by host and by schema

Finally, is obviously possible to combine the two approaches and shard by host with only a subset of schemas.

To do so, let’s use all three nodes and have the schema distribution as follow:

- Europe on Server 192.168.1.5 -> HG 10

- Asia on Server 192.168.1.6 -> HG 20

- Africa on Server 192.168.1.7 -> HG 30

I have already set the query rules using HINT, so what I have to do is to use them BOTH to combine the operations:

[Mc]
Select /* shard_host_HG=Asia */ /* continent=Asia */  City.Name, City.Population from world.Country join world.City on world.City.CountryCode=world.Country.Code where Country.code='IND' limit 5; SELECT * /* shard_host_HG=Asia */ from information_schema.GLOBAL_VARIABLES where variable_name like 'bind%';
+--------------------+------------+
| Name               | Population |
+--------------------+------------+
| Mumbai (Bombay)    |   10500000 |
| Delhi              |    7206704 |
| Calcutta [Kolkata] |    4399819 |
| Chennai (Madras)   |    3841396 |
| Hyderabad          |    2964638 |
+--------------------+------------+
5 rows in set (0.00 sec)

+---------------+----------------+
| VARIABLE_NAME | VARIABLE_VALUE |
+---------------+----------------+
| BIND_ADDRESS  | 192.168.1.6    |
+---------------+----------------+
1 row in set (0.01 sec)

[Mc]

Select /* shard_host_HG=Asia */ /* continent=Asia */ City.Name, City.Population from world.Country join world.City on world.City.CountryCode=world.Country.Code where Country.code='IND' limit 5; SELECT * /* shard_host_HG=Asia */ from information_schema.GLOBAL_VARIABLES where variable_name like 'bind%';

+--------------------+------------+

| Name | Population |

+--------------------+------------+

| Mumbai (Bombay) | 10500000 |

| Delhi | 7206704 |

| Calcutta [Kolkata] | 4399819 |

| Chennai (Madras) | 3841396 |

| Hyderabad | 2964638 |

+--------------------+------------+

5 rows in set (0.00 sec)

+---------------+----------------+

| VARIABLE_NAME | VARIABLE_VALUE |

+---------------+----------------+

| BIND_ADDRESS | 192.168.1.6 |

+---------------+----------------+

1 row in set (0.01 sec)

[Pa]
mysql> select digest_text from stats_mysql_query_digest;
+--------------------------------------------------------------------------------------------------------------------------------------------+
| digest_text                                                                                                                                |
+--------------------------------------------------------------------------------------------------------------------------------------------+
| SELECT * from information_schema.GLOBAL_VARIABLES where variable_name like ?                                                               |
| Select City.Name, City.Population from Asia.Country join Asia.City on Asia.City.CountryCode=Asia.Country.Code where Country.code=? limit ? |
+--------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)

mysql> select active,hits, mysql_query_rules.rule_id, match_digest, match_pattern, replace_pattern, cache_ttl, apply,flagIn,flagOUT FROM mysql_query_rules NATURAL JOIN stats.stats_mysql_query_rules ORDER BY mysql_query_rules.rule_id;
+--------+------+---------+---------------------+----------------------------------------+-----------------+-----------+-------+--------+---------+
| active | hits | rule_id | match_digest        | match_pattern                          | replace_pattern | cache_ttl | apply | flagIN | flagOUT |
+--------+------+---------+---------------------+----------------------------------------+-----------------+-----------+-------+--------+---------+
| 1      | 0    | 10      | NULL                | /*s*shard_host_HG=.*Europes**.    | NULL            | NULL      | 0     | 0      | NULL    |
| 1      | 2    | 11      | NULL                | /*s*shard_host_HG=.*Asias**.      | NULL            | NULL      | 0     | 0      | NULL    |
| 1      | 0    | 12      | NULL                | /*s*shard_host_HG=.*Africas**.    | NULL            | NULL      | 0     | 0      | NULL    |
| 1      | 0    | 13      | NULL                | NULL                                   | NULL            | NULL      | 0     | 0      | 0       |
| 1      | 1    | 31      | NULL                | S*s*/*s*continent=.*Asias**.*   | NULL            | NULL      | 0     | 0      | 23      |
| 1      | 4    | 32      | NULL                | world.                                 | Asia.           | NULL      | 0     | 23     | 23      |
| 1      | 0    | 33      | NULL                | S*s*/*s*continent=.*Europes**.* | NULL            | NULL      | 0     | 0      | 25      |
| 1      | 0    | 34      | NULL                | world.                                 | Europe.         | NULL      | 0     | 25     | 25      |
| 1      | 0    | 35      | NULL                | S*s*/*s*continent=.*Africas**.* | NULL            | NULL      | 0     | 0      | 24      |
| 1      | 0    | 36      | NULL                | world.                                 | Africa.         | NULL      | 0     | 24     | 24      |
+--------+------+---------+---------------------+----------------------------------------+-----------------+-----------+-------+--------+---------+

[Pa]

mysql> select digest_text from stats_mysql_query_digest;

+--------------------------------------------------------------------------------------------------------------------------------------------+

| digest_text |

+--------------------------------------------------------------------------------------------------------------------------------------------+

| SELECT * from information_schema.GLOBAL_VARIABLES where variable_name like ? |

| Select City.Name, City.Population from Asia.Country join Asia.City on Asia.City.CountryCode=Asia.Country.Code where Country.code=? limit ? |

+--------------------------------------------------------------------------------------------------------------------------------------------+

2 rows in set (0.00 sec)

mysql> select active,hits, mysql_query_rules.rule_id, match_digest, match_pattern, replace_pattern, cache_ttl, apply,flagIn,flagOUT FROM mysql_query_rules NATURAL JOIN stats.stats_mysql_query_rules ORDER BY mysql_query_rules.rule_id;

+--------+------+---------+---------------------+----------------------------------------+-----------------+-----------+-------+--------+---------+

+--------+------+---------+---------------------+----------------------------------------+-----------------+-----------+-------+--------+---------+

| 1 | 0 | 10 | NULL | /*s*shard_host_HG=.*Europes**. | NULL | NULL | 0 | 0 | NULL |

| 1 | 2 | 11 | NULL | /*s*shard_host_HG=.*Asias**. | NULL | NULL | 0 | 0 | NULL |

| 1 | 0 | 12 | NULL | /*s*shard_host_HG=.*Africas**. | NULL | NULL | 0 | 0 | NULL |

| 1 | 0 | 13 | NULL | NULL | NULL | NULL | 0 | 0 | 0 |

| 1 | 1 | 31 | NULL | S*s*/*s*continent=.*Asias**.* | NULL | NULL | 0 | 0 | 23 |

| 1 | 4 | 32 | NULL | world. | Asia. | NULL | 0 | 23 | 23 |

| 1 | 0 | 33 | NULL | S*s*/*s*continent=.*Europes**.* | NULL | NULL | 0 | 0 | 25 |

| 1 | 0 | 34 | NULL | world. | Europe. | NULL | 0 | 25 | 25 |

| 1 | 0 | 35 | NULL | S*s*/*s*continent=.*Africas**.* | NULL | NULL | 0 | 0 | 24 |

| 1 | 0 | 36 | NULL | world. | Africa. | NULL | 0 | 24 | 24 |

+--------+------+---------+---------------------+----------------------------------------+-----------------+-----------+-------+--------+---------+

As you can see rule 11 has two HITS, which means my queries will go to the associated HG. But given that the Apply for rule 11 is =0, ProxySQL will first continue to process the Query Rules.

As such it will also transform the queries for rules 31 and 32, each one having the expected number of hits (one the first and the second four because of the loop).

That was all we had to do to perform a perfect two layer sharding in ProxySQL.

Conclusion

ProxySQL allows the user to access data distributed by shards in a very simple way. The query rules that follow the consolidate RegEx pattern, in conjunction with the possibility to concatenate rules and the Host Group approach definition give us huge flexibility with relative simplicity.

References:

https://github.com/sysown/proxysql/tree/v1.2.2/doc
https://github.com/google/re2/wiki/Syntax
http://www.proxysql.com/2015/09/proxysql-tutorial-setup-in-mysql.html
https://github.com/sysown/proxysql/blob/v1.2.2/doc/configuration_howto.md
https://github.com/sysown/proxysql/blob/v1.2.2/INSTALL.md
https://dev.mysql.com/doc/index-other.html

Credits

It is obvious that I need to acknowledge and kudos the work Rene’ Cannao is doing to make ProxySQL a solid, fast and flexible product. I also should mention that I work with him very often (more often than he likes), asking him for fixes and discussing optimization strategies. Requests that he satisfies with surprising speed and efficiency.

0 0 votes

Article Rating

8 Comments

Oldest

Newest Most Voted

Luke

8 years ago

There must be something wrong with the following QR:
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply) VALUES (31,1,’user_shardRW’,”^SELECTs*(.*)s*froms*world.(S*)s(.*).*Continent='(S*)’s*(s*.*)$”,”SELECT 1 from 4.2 WHERE 1=1 5″,1);

3 is missing…

Luke

8 years ago

Reply to Luke

Besides, I think the invasion to the SQL seems a bit too much, as the column Continent not even exists in the table city.

Author

Marco Tusa

8 years ago

Reply to Luke

Luke, thanks to take some time to read the article.
I will answer to both questions here.

In the example I had done above, I had explain that you need to create several schemas respecting the continent:
Create schema [Asia|Europe|North_America|Africa];
As such a query like :
SELECT name,population from world.City WHERE Continent='Europe' and CountryCode='ITA' order by population desc limit 1;
will point to the EUROPE schema, not to the table city.
To be clear the above become:

Original :SELECT name,population from world.City WHERE Continent=’Europe’ and CountryCode=’ITA’ order by population desc limit 1;
Transformed :SELECT name,population from Europe.City WHERE ?=? and CountryCode=? order by population desc limit ?

Finally if you read carefully the regex you will see that group 3 is not needed and not included, here the breakdown :Group Group 1. 7-23 name,population
Group 2. 34-38 City
Group 3. 39-46 WHERE
Group 4. 57-63 Europe
Group 5. 65-120 and CountryCode='ITA' order by population desc limit 1;

Group 3 is the WHERE , and I don’t need it given it is already included in the replace_pattern:
"SELECT 1 from 4.2 WHERE 1=1 5"

Hope this will help you to better understand the article.

Luke

8 years ago

Reply to Marco Tusa

Thanks for your quick response.
1, If you mean you just create this QR for this specified query,then missing 3 is OK.
2, I understand how ProxySQL handles the transformation and just think it’s a little weird if i need to write the original SQL in my application.

As i know, ProxySQL can only route one SQL to one backend MySQL instance. If I need to query multi shards in one SQL(e.g. Continent in (‘Asia’,’Europe’)), then ProxySQL can not push down the SQL and merge the results from each shard and finally return to user, is that right?

Author

Marco Tusa

8 years ago

Reply to Luke

Hi Luke,
I think you may find additional answers here https://www.slideshare.net/marcotusa/proxysql-sharding about limitations see slide 83
regards

Rajan

7 years ago

i followed step by step but not able to login

mysql -u user_shardRW -p -h hostname -P 6033 -e “SELECT name,population from world.City WHERE Continent=’Europe’ and CountryCode=’ITA’ order by population desc limit 1;”

scarlett

6 years ago

We use proxysql occured very serious accidents for several times, so we had a test, after that we found three main problems:
1. we killed the procedure of Back-end mysql, but the state of that machine was still SHUNNED for more than half a day, plus the real connecting request was dispached to that machine which was killed the mysql procedure
2. proxysql.Client_Connections_connected/stats=stats_mysql_global was more than default max value 2048, but our business concurrent was far from this value with php short links , why that happened?
3. We found no log printed for more than one hour and half, so we can’t analyze the accidents at certain times

business