This article demonstrates how MySQL sharding with ProxySQL works.
Recently a colleague of mine asked me to provide a simple example on how ProxySQL performs sharding.
In response, I’m writing this short tutorial in the hope it will illustrate ProxySQL’s sharding functionalities, and help people out there better understand how to use it.
ProxySQL is a very powerful platform that allows us to manipulate and manage our connections and queries in a simple but effective way. This article shows you how.
Before starting let’s clarify some basic concepts.
- ProxySQL organizes its internal set of servers in Host Groups (HG), and each HG can be associated with users and Query Rules (QR)
- Each QR can be final (apply = 1) or = let ProxySQL continue to parse other QRs
- A QR can be a rewrite action, be a simple match, have a specific target HG, or be generic
- QRs are defined using regex
You can see QRs as a sequence of filters and transformations that you can arrange as you like.
These simple basic rules give us enormous flexibility. They allow us to create very simple actions like a simple query re-write, or very complex chains that could see dozens of QR concatenated. Documentation can be found here.
The information related to HGs or QRs is easily accessible using the ProxySQL administrator interface, in the tables mysql_servers, mysql_query_rules and stats.stats_mysql_query_rules. The last one allows us to evaluate if and how the rule(s) is used.
With regards to sharding, what can ProxySQL do to help us achieve what we need (in a relatively easy way)? Some people/companies include sharding logic in the application, use multiple connections to reach the different targets, or have some logic to split the load across several schemas/tables. ProxySQL allows us to simplify the way connectivity and query distribution is supposed to work reading data in the query or accepting HINTS.
No matter what the requirements, the sharding exercise can be summarized in a few different categories.
- By splitting the data inside the same container (like having a shard by State where each State is a schema)
- By physical data location (this can have multiple MySQL servers in the same room, as well as having them geographically distributed)
- A combination of the two, where I do split by State using a dedicated server, and again split by schema/table by whatever (say by gender)
In the following examples, I show how to use ProxySQL to cover the three different scenarios defined above (and a bit more).
The example below will report text from the Admin ProxySQL interface and the MySQL console. I will mark each one as follows:
- Mc for MySQL console
- Pa for ProxySQL Admin
Please note that the MySQL console MUST use the -c flag to pass the comments in the query. This is because the default behavior in the MySQL console is to remove the comments.
I am going to illustrate procedures that you can replicate on your laptop, and when possible I will mention a real implementation. This because I want you to directly test the ProxySQL functionalities.
For the example described below I have a PrxySQL v1.2.2 that is going to become the master in few days. You can download it from:
1 2 |
git clone https://github.com/sysown/proxysql.git git checkout v1.2.2 |
Then to compile:
1 2 3 |
cd <path to proxy source code> make make install |
If you need full instructions on how to install and configure ProxySQL, read here and here.
Finally, you need to have the WORLD test DB loaded. WORLD test DB can be found here.
Shard inside the same MySQL Server using three different schemas split by continent
Obviously, you can have any number of shards and relative schemas. What is relevant here is demonstrating how traffic gets redirected to different targets (schemas), maintaining the same structure (tables), by discriminating the target based on some relevant information in the data or pass by the application.
OK, let us roll the ball.
1 2 3 4 5 6 7 8 9 10 11 12 |
[Mc] +---------------+-------------+ | Continent | count(Code) | +---------------+-------------+ | Asia | 51 | <-- | Europe | 46 | <-- | North America | 37 | | Africa | 58 | <-- | Oceania | 28 | | Antarctica | 5 | | South America | 14 | +---------------+-------------+ |
For this exercise, I will use three hosts in replica.
To summarize, I will need:
- Three hosts: 192.168.1.[5-6-7]
- Three schemas: Continent X + world schema
- One user : user_shardRW
- Three hostgroups: 10, 20, 30 (for future use)
First, we will create the schemas Asia, Africa, Europe:
1 2 3 4 5 6 7 8 9 10 11 |
[Mc] Create schema [Asia|Europe|North_America|Africa]; create table Asia.City as select a.* from world.City a join Country on a.CountryCode = Country.code where Continent='Asia' ; create table Europe.City as select a.* from world.City a join Country on a.CountryCode = Country.code where Continent='Europe' ; create table Africa.City as select a.* from world.City a join Country on a.CountryCode = Country.code where Continent='Africa' ; create table North_America.City as select a.* from world.City a join Country on a.CountryCode = Country.code where Continent='North America' ; create table Asia.Country as select * from world.Country where Continent='Asia' ; create table Europe.Country as select * from world.Country where Continent='Europe' ; create table Africa.Country as select * from world.Country where Continent='Africa' ; create table North_America.Country as select * from world.Country where Continent='North America' ; |
Now, create the user
1 |
grant all on *.* to user_shardRW@'%' identified by 'test'; |
Now let us start to configure the ProxySQL:
1 2 3 4 5 6 7 8 |
[Pa] insert into mysql_users (username,password,active,default_hostgroup,default_schema) values ('user_shardRW','test',1,10,'test_shard1'); LOAD MYSQL USERS TO RUNTIME;SAVE MYSQL USERS TO DISK; INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.5',10,3306,100); INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.6',20,3306,100); INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.7',30,3306,100); LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK; |
With this we have defined the user, the servers and the host groups.
Let us start to define the logic with the query rules:
1 2 3 4 |
[Pa] delete from mysql_query_rules where rule_id > 30; INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply) VALUES (31,1,'user_shardRW',"^SELECT\s*(.*)\s*from\s*world.(\S*)\s(.*).*Continent='(\S*)'\s*(\s*.*)$","SELECT \1 from \4.\2 WHERE 1=1 \5",1); LOAD MYSQL QUERY RULES TO RUNTIME;SAVE MYSQL QUERY RULES TO DISK; |
I am now going to query the master (or a single node), but I am expecting ProxySQL to redirect the query to the right shard, catching the value of the continent:
1 2 3 4 5 6 7 |
[Mc] SELECT name,population from world.City WHERE Continent='Europe' and CountryCode='ITA' order by population desc limit 1; +------+------------+ | name | population | +------+------------+ | Roma | 2643581 | +------+------------+ |
You can say: “Hey! You are querying the schema World, of course you get back the correct data.”
This is not what really happened. ProxySQL did not query the schema World, but the schema Europe.
Let’s look at the details:
1 2 3 4 |
[Pa] select * from stats_mysql_query_digest; Original :SELECT name,population from world.City WHERE Continent='Europe' and CountryCode='ITA' order by population desc limit 1; Transformed :SELECT name,population from Europe.City WHERE ?=? and CountryCode=? order by population desc limit ? |
Let me explain what happened.
Rule 31 in ProxySQL will take all the FIELDS we will pass in the query. It will catch the CONTINENT in the WHERE clause, it will take any condition after WHERE and it will reorganize the queries all using the RegEx.
Does this work for any table in the sharded schemas? Of course it does.
A query like:
SELECT name,population from world.Country WHERE Continent='Asia' ;
Will be transformed into:
SELECT name,population from Asia.Country WHERE ?=?
1 2 3 4 5 6 7 8 9 10 11 |
[Mc] +----------------------+------------+ | name | population | +----------------------+------------+ | Afghanistan | 22720000 | | United Arab Emirates | 2441000 | | Armenia | 3520000 | <snip ...> | Vietnam | 79832000 | | Yemen | 18112000 | +----------------------+------------+ |
Another possible approach is to instruct ProxySQL to shard is to pass a hint inside a comment. Let see how.
First let me disable the rule I just inserted. This is not really needed, but we’ll do it so you can see how. 🙂
1 2 3 4 5 |
[Pa] mysql> update mysql_query_rules set active=0 where rule_id=31; Query OK, 1 row |