In my previous blog post about millions of tables in MySQL 8, I created one million tables and tested MySQL performance with them. My next challenge is to create 40 million tables in MySQL 8 using shared tablespaces (one tablespace per schema). In this blog post I show how to do it and what challenges to expect.
Background
Once again: why would we need so many tables in MySQL, and what is the use case? The main reason is customer isolation. With the new focus on security and privacy (take GDPR, for example), it is much easier and more beneficial to create a separate schema (or "database" in MySQL terms) for each customer. That creates a new set of challenges we will need to solve. Here is the summary:
- Too many files. For each table, MySQL traditionally creates an FRM file. With MySQL 8.0 this is no longer the case for InnoDB tables (thanks to the new data dictionary): no FRM files are created, only an IBD file.
- Too much storage overhead. Just to create 40 million tables we will need ~4–5 TB of space. The ZFS filesystem can help a lot here, through compression – see below.
- MySQL does not work well with so many tables. We have observed a lot of overhead (MySQL needs to open/close table definition files) and contention (table definitions need to be stored in memory to avoid a performance penalty, which introduces mutex contention).
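To put the "too many files" point in perspective, here is a quick back-of-the-envelope sketch (the schema and table counts match the setup used later in this post):

```shell
#!/bin/sh
# 400,000 schemas x 100 tables = 40 million tables
schemas=400000
tables_per_schema=100

# innodb_file_per_table: one .ibd file per table
per_table_files=$((schemas * tables_per_schema))

# one shared tablespace per schema: one .ibd file per schema
shared_files=$schemas

echo "file-per-table   : $per_table_files files"
echo "shared per-schema: $shared_files files"
```

Dropping from 40 million files to 400 thousand is exactly why this post uses one shared tablespace per schema.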
Challenges
When I approached the task of creating 40 million tables, my first challenge was disk space. Just to create them, I needed at least 5 TB of fast disk storage. The good news is: we have the ZFS filesystem, which provides compression out of the box. With compression, I was able to use just a 250G drive with ZFS – the compression ratio is > 10x:
```
# du -sh --apparent-size /var/lib/mysql-data
4.7T    /var/lib/mysql-data
# du -sh /var/lib/mysql-data
131G    /var/lib/mysql-data
```
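For reference, those two du numbers imply a compression ratio of roughly 36x, comfortably above 10x. A sketch of the arithmetic, converting 4.7T to gigabytes:

```shell
#!/bin/sh
# apparent size: 4.7T ~= 4812G (4.7 * 1024); on-disk size: 131G
apparent_gb=$((47 * 1024 / 10))
ondisk_gb=131

echo "compression ratio: ~$((apparent_gb / ondisk_gb))x"
```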
The second challenge is how to create those tables in a reasonable amount of time. I created a script to "provision" the databases (create all 40 million tables). The good news is that the performance regression in "create table" speed and the scalability bug were fixed, so I was able to use this script to create 40 million tables using shared tablespaces (one tablespace per schema):
```
#!/bin/bash
function do_db {
   db_exist=$(mysql -A -s -Nbe "select 1 from information_schema.schemata where schema_name = '$db'")
   if [ "$db_exist" == "1" ]; then echo "Already exists $db"; return 0; fi;
   mysql -vvv -e "create database $db";
   mysql -vvv $db -e "CREATE TABLESPACE $db ADD DATAFILE '$db.ibd' engine=InnoDB;"
   for i in {1..100}
   do
      table="CREATE TABLE sbtest$i (
      id int(10) unsigned NOT NULL AUTO_INCREMENT,
      k int(10) unsigned NOT NULL DEFAULT '0',
      c varchar(120) NOT NULL DEFAULT '',
      pad varchar(60) NOT NULL DEFAULT '',
      PRIMARY KEY (id),
      KEY k_1 (k)
      ) ENGINE=InnoDB DEFAULT CHARSET=latin1 tablespace $db;"
      mysql $db -e "$table"
   done
}

c=0
for m in {1..4000000}
do
   for i in {1..40}
   do
      let c=$c+1
      echo $c
      db="sbtest_$c"
      do_db &
   done
   wait
   #if [ $c > 4000000 ]; then exit; fi
done
```
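The interesting part of the script is the concurrency pattern: it launches 40 background do_db jobs, then wait blocks until the whole batch finishes before the next 40 start. Here is the same fork-in-batches pattern in isolation, with a tiny no-op worker standing in for do_db (the worker is purely illustrative):

```shell
#!/bin/bash
# Same fork-in-batches pattern as the provisioning script,
# with a no-op worker instead of do_db
work() { echo "provisioned sbtest_$1"; }

c=0
for batch in 1 2
do
   for i in {1..4}
   do
      c=$((c + 1))
      work $c &      # launch in background, like do_db &
   done
   wait              # block until the whole batch completes
done
echo "done: $c workers launched"
```

This keeps a bounded number of concurrent mysql clients, so the server is never hit with more than one batch of CREATE statements at a time.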
40 million tables in MySQL 8
Now it’s time for a real test. I’m using the latest MySQL 8 version at the time of writing, 8.0.12, which implements the new data dictionary.
MySQL config file:
```
[mysqld]
datadir=/var/lib/mysql-data
socket=/var/lib/mysql-data/mysql.sock
log-error = /var/lib/mysql-log/error.log
server_id = 12345
log_bin = /var/lib/mysql-log/binlog
relay_log=/var/lib/mysql-log/relay-bin
skip-log-bin=1
innodb_log_group_home_dir = /var/lib/mysql-log
innodb_doublewrite = 0
innodb_flush_log_at_trx_commit=0
innodb_log_file_size=2G
innodb_buffer_pool_size=4G
tablespace_definition_cache = 524288
schema_definition_cache = 524288
table_definition_cache = 524288
table_open_cache=524288
open-files-limit=1000000
```
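Two settings deserve a sanity check. With one shared tablespace per schema, InnoDB needs roughly one file handle per schema, so open-files-limit=1000000 leaves plenty of headroom for 400,000 schemas; and the *_definition_cache values are set to 524288 (2^19) to keep as many definitions as possible in memory. A sketch of the arithmetic:

```shell
#!/bin/sh
schemas=400000
open_files_limit=1000000

# one shared .ibd per schema: handles needed ~= number of schemas
echo "file-handle headroom: $((open_files_limit - schemas))"

# definition caches in the config hold 2^19 entries
echo "cache size: $((1 << 19))"
```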
Sysbench shell script:
```
function run_sb() {
   conn=" --db-driver=mysql --mysql-socket=/var/lib/mysql-data/mysql.sock --mysql-db=sbtest_1 --mysql-user=sbtest --mysql-password=abc "
   sysbench $conn --oltp_db_count=$db_count --oltp_tables_count=$table_count \
      --oltp-table-size=10000 --report-interval=1 --num-threads=$num_threads \
      --max-requests=0 --max-time=$max_time ./select_custom.lua run | tee -a sysbench_2.txt
}

let db_count=400000
table_count=100
max_time=10000
num_threads=32

run_sb
```
Sysbench lua script:
```
pathtest = "/usr/share/sysbench/tests/include/oltp_legacy/"

if pathtest then
   dofile(pathtest .. "common.lua")
else
   require("common")
end

function thread_init(thread_id)
   set_vars()
end

function event()
   local table_name
   local i
   local c_val
   local k_val
   local pad_val

   oltp_db_count = tonumber(oltp_db_count) or 1
   -- local oltp_db_count = 4
   table_name = "sbtest_" .. sb_rand(1, oltp_db_count) .. ".sbtest" .. sb_rand(1, oltp_tables_count)
   k_val = sb_rand(1, oltp_table_size)
   c_val = sb_rand_str([[
###########-###########-###########-###########-###########-###########-###########-###########-###########-###########]])
   pad_val = sb_rand_str([[
###########-###########-###########-###########-###########]])

   rs = db_query("SELECT id FROM " .. table_name .. " LIMIT 1")
end
```
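The key line is the table_name construction: each event() picks a random schema (1..oltp_db_count) and a random table inside it (1..oltp_tables_count), so point selects are spread across all 40 million tables. The same name construction can be mimicked in shell for illustration (rand_between is a hypothetical helper; bash's $RANDOM only goes up to 32767, so it under-covers the 400,000-schema range, but it is good enough for a sketch):

```shell
#!/bin/bash
# Mimic the Lua table_name construction: sbtest_<1..db_count>.sbtest<1..table_count>
db_count=400000
table_count=100

# hypothetical helper; $RANDOM < 32768, so very large ranges are under-covered
rand_between() { echo $(( $1 + RANDOM % ($2 - $1 + 1) )); }

table_name="sbtest_$(rand_between 1 $db_count).sbtest$(rand_between 1 $table_count)"
echo "SELECT id FROM $table_name LIMIT 1"
```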
Please note that the tables are empty – no data.
Now we can run the benchmark. Unfortunately, we hit serious mutex contention in the data dictionary. Here are the results:
```
[ 453s ] thds: 32 tps: 1203.96 qps: 1203.96 (r/w/o: 1203.96/0.00/0.00) lat (ms,95%): 41.10 err/s: 0.00 reconn/s: 0.00
[ 454s ] thds: 32 tps: 1202.32 qps: 1202.32 (r/w/o: 1202.32/0.00/0.00) lat (ms,95%): 42.61 err/s: 0.00 reconn/s: 0.00
[ 455s ] thds: 32 tps: 1196.74 qps: 1196.74 (r/w/o: 1196.74/0.00/0.00) lat (ms,95%): 41.10 err/s: 0.00 reconn/s: 0.00
[ 456s ] thds: 32
```
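Averaging the complete one-second samples above gives about 1,201 tps across 32 threads, i.e. fewer than 40 single-row point selects per second per thread against empty tables, which points at server-side contention rather than query cost. The averaging itself:

```shell
#!/bin/sh
# Average tps over the three complete samples shown above
awk 'BEGIN {
    n = split("1203.96 1202.32 1196.74", tps, " ")
    for (i = 1; i <= n; i++) sum += tps[i]
    printf "avg tps: %.2f\n", sum / n
}'
```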