In my previous blog post about millions of tables in MySQL 8, I created one million tables and tested MySQL performance with them. My next challenge is to create 40 million tables in MySQL 8 using shared tablespaces (one tablespace per schema). In this blog post I show how to do it and what challenges to expect.
Background
Once again: why would we need so many tables in MySQL, and what is the use case? The main reason is customer isolation. With the new focus on security and privacy (take GDPR, for example), it is much easier and more beneficial to create a separate schema (or "database" in MySQL terms) for each customer. That creates a new set of challenges we will need to solve. Here is the summary:
- Too many files. For each table, MySQL traditionally creates an FRM file. With MySQL 8.0 this is no longer the case for InnoDB tables (thanks to the new data dictionary): no FRM files are created, only an IBD file.
- Too much storage overhead. Just to create 40 million tables we will need ~4–5 TB of space. The ZFS filesystem can help a lot here, through compression – see below.
- MySQL does not work well with so many tables. We have observed a lot of overhead (MySQL needs to open/close table definition files) and contention (table definitions need to be stored in memory to avoid a performance penalty, which introduces mutex contention).
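To put the "too many files" point in perspective, here is a quick back-of-the-envelope sketch (the schema and table counts match the setup used later in this post):

```shell
#!/bin/sh
# 400,000 schemas x 100 tables = 40 million tables
schemas=400000
tables_per_schema=100

# innodb_file_per_table: one .ibd file per table
per_table_files=$((schemas * tables_per_schema))

# one shared tablespace per schema: one .ibd file per schema
shared_files=$schemas

echo "file-per-table   : $per_table_files files"
echo "shared per-schema: $shared_files files"
```

Dropping from 40 million files to 400 thousand is exactly why this post uses one shared tablespace per schema.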
Challenges
When I approached the task of creating 40 million tables, my first challenge was disk space. Just to create them, I needed at least 5 TB of fast disk storage. The good news is: we have the ZFS filesystem, which provides compression out of the box. With compression, I was able to use just a 250G drive with ZFS – the compression ratio is > 10x:
```
# du -sh --apparent-size /var/lib/mysql-data
4.7T    /var/lib/mysql-data
# du -sh /var/lib/mysql-data
131G    /var/lib/mysql-data
```
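For reference, those two du numbers imply a compression ratio of roughly 36x, comfortably above 10x. A sketch of the arithmetic, converting 4.7T to gigabytes:

```shell
#!/bin/sh
# apparent size: 4.7T ~= 4812G (4.7 * 1024); on-disk size: 131G
apparent_gb=$((47 * 1024 / 10))
ondisk_gb=131

echo "compression ratio: ~$((apparent_gb / ondisk_gb))x"
```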
The second challenge is how to create those tables in a reasonable amount of time. I created a script to "provision" the databases (create all 40 million tables). The good news is that the performance regression in "create table" speed and the scalability bug were fixed, so I was able to use this script to create 40 million tables using shared tablespaces (one tablespace per schema):
```
#!/bin/bash
function do_db {
   db_exist=$(mysql -A -s -Nbe "select 1 from information_schema.schemata where schema_name = '$db'")
   if [ "$db_exist" == "1" ]; then echo "Already exists $db"; return 0; fi;
   mysql -vvv -e "create database $db";
   mysql -vvv $db -e "CREATE TABLESPACE $db ADD DATAFILE '$db.ibd' engine=InnoDB;"
   for i in {1..100}
   do
      table="CREATE TABLE sbtest$i (
      id int(10) unsigned NOT NULL AUTO_INCREMENT,
      k int(10) unsigned NOT NULL DEFAULT '0',
      c varchar(120) NOT NULL DEFAULT '',
      pad varchar(60) NOT NULL DEFAULT '',
      PRIMARY KEY (id),
      KEY k_1 (k)
      ) ENGINE=InnoDB DEFAULT CHARSET=latin1 tablespace $db;"
      mysql $db -e "$table"
   done
}

c=0
for m in {1..4000000}
do
   for i in {1..40}
   do
      let c=$c+1
      echo $c
      db="sbtest_$c"
      do_db &
   done
   wait
   #if [ $c > 4000000 ]; then exit; fi
done
```
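The interesting part of the script is the concurrency pattern: it launches 40 background do_db jobs, then wait blocks until the whole batch finishes before the next 40 start. Here is the same fork-in-batches pattern in isolation, with a tiny no-op worker standing in for do_db (the worker is purely illustrative):

```shell
#!/bin/bash
# Same fork-in-batches pattern as the provisioning script,
# with a no-op worker instead of do_db
work() { echo "provisioned sbtest_$1"; }

c=0
for batch in 1 2
do
   for i in {1..4}
   do
      c=$((c + 1))
      work $c &      # launch in background, like do_db &
   done
   wait              # block until the whole batch completes
done
echo "done: $c workers launched"
```

This keeps a bounded number of concurrent mysql clients, so the server is never hit with more than one batch of CREATE statements at a time.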
40 million tables in MySQL 8
Now it’s time for a real test. I’m using the latest MySQL 8 version at the time of writing, 8.0.12, which implements the new data dictionary.
MySQL config file:
```
[mysqld]
datadir=/var/lib/mysql-data
socket=/var/lib/mysql-data/mysql.sock
log-error = /var/lib/mysql-log/error.log
server_id = 12345
log_bin = /var/lib/mysql-log/binlog
relay_log=/var/lib/mysql-log/relay-bin
skip-log-bin=1
innodb_log_group_home_dir = /var/lib/mysql-log
innodb_doublewrite = 0
innodb_flush_log_at_trx_commit=0
innodb_log_file_size=2G
innodb_buffer_pool_size=4G
tablespace_definition_cache = 524288
schema_definition_cache = 524288
table_definition_cache = 524288
table_open_cache=524288
open-files-limit=1000000
```
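Two settings deserve a sanity check. With one shared tablespace per schema, InnoDB needs roughly one file handle per schema, so open-files-limit=1000000 leaves plenty of headroom for 400,000 schemas; and the *_definition_cache values are set to 524288 (2^19) to keep as many definitions as possible in memory. A sketch of the arithmetic:

```shell
#!/bin/sh
schemas=400000
open_files_limit=1000000

# one shared .ibd per schema: handles needed ~= number of schemas
echo "file-handle headroom: $((open_files_limit - schemas))"

# definition caches in the config hold 2^19 entries
echo "cache size: $((1 << 19))"
```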
Sysbench shell script:
```
function run_sb() {
   conn=" --db-driver=mysql --mysql-socket=/var/lib/mysql-data/mysql.sock --mysql-db=sbtest_1 --mysql-user=sbtest --mysql-password=abc "
   sysbench $conn --oltp_db_count=$db_count --oltp_tables_count=$table_count \
      --oltp-table-size=10000 --report-interval=1 --num-threads=$num_threads \
      --max-requests=0 --max-time=$max_time ./select_custom.lua run | tee -a sysbench_2.txt
}

let db_count=400000
table_count=100
max_time=10000
num_threads=32

run_sb
```
Sysbench lua script:
```
pathtest = "/usr/share/sysbench/tests/include/oltp_legacy/"

if pathtest then
   dofile(pathtest .. "common.lua")
else
   require("common")
end

function thread_init(thread_id)
   set_vars()
end

function event()
   local table_name
   local i
   local c_val
   local k_val
   local pad_val

   oltp_db_count = tonumber(oltp_db_count) or 1
   -- local oltp_db_count = 4
   table_name = "sbtest_" .. sb_rand(1, oltp_db_count) .. ".sbtest" .. sb_rand(1, oltp_tables_count)
   k_val = sb_rand(1, oltp_table_size)
   c_val = sb_rand_str([[
###########-###########-###########-###########-###########-###########-###########-###########-###########-###########]])
   pad_val = sb_rand_str([[
###########-###########-###########-###########-###########]])

   rs = db_query("SELECT id FROM " .. table_name .. " LIMIT 1")
end
```
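The key line is the table_name construction: each event() picks a random schema (1..oltp_db_count) and a random table inside it (1..oltp_tables_count), so point selects are spread across all 40 million tables. The same name construction can be mimicked in shell for illustration (rand_between is a hypothetical helper; bash's $RANDOM only goes up to 32767, so it under-covers the 400,000-schema range, but it is good enough for a sketch):

```shell
#!/bin/bash
# Mimic the Lua table_name construction: sbtest_<1..db_count>.sbtest<1..table_count>
db_count=400000
table_count=100

# hypothetical helper; $RANDOM < 32768, so very large ranges are under-covered
rand_between() { echo $(( $1 + RANDOM % ($2 - $1 + 1) )); }

table_name="sbtest_$(rand_between 1 $db_count).sbtest$(rand_between 1 $table_count)"
echo "SELECT id FROM $table_name LIMIT 1"
```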
Please note that the tables are empty – no data.
Now we can run the benchmark. Unfortunately, we hit serious mutex contention in the data dictionary. Here are the results:
```
[ 453s ] thds: 32 tps: 1203.96 qps: 1203.96 (r/w/o: 1203.96/0.00/0.00) lat (ms,95%): 41.10 err/s: 0.00 reconn/s: 0.00
[ 454s ] thds: 32 tps: 1202.32 qps: 1202.32 (r/w/o: 1202.32/0.00/0.00) lat (ms,95%): 42.61 err/s: 0.00 reconn/s: 0.00
[ 455s ] thds: 32 tps: 1196.74 qps: 1196.74 (r/w/o: 1196.74/0.00/0.00) lat (ms,95%): 41.10 err/s: 0.00 reconn/s: 0.00
[ 456s ] thds: 32
```
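Averaging the complete one-second samples above gives about 1,201 tps across 32 threads, i.e. fewer than 40 single-row point selects per second per thread against empty tables, which points at server-side contention rather than query cost. The averaging itself:

```shell
#!/bin/sh
# Average tps over the three complete samples shown above
awk 'BEGIN {
    n = split("1203.96 1202.32 1196.74", tps, " ")
    for (i = 1; i <= n; i++) sum += tps[i]
    printf "avg tps: %.2f\n", sum / n
}'
```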