One Billion Tables in MySQL 8.0 with ZFS

One Billion Tables in MySQL 8.0 with ZFS

PREVIOUS POST
NEXT POST

The short version

I created > one billion InnoDB tables in MySQL 8.0 (tables, not rows) just for fun. Here is the proof:

Yes, it took 6 hours and 57 minutes to count them all!

Why does anyone need one billion tables?

In my previous blog post, I created and tested MySQL 8.0 with 40 million tables (that was a real case study). The One Billion Tables project is not a real world scenario, however. I was challenged by Billion Tables Project (BTP) in PostgreSQL, and decided to repeat it with MySQL, creating 1 billion InnoDB tables.

As an aside: I think MySQL 8.0 is the first MySQL version where creating 1 billion InnoDB tables is even practically possible.

Challenges with one billion InnoDB tables

Disk space

The first and one of the most important challenges is disk space. InnoDB allocates data pages on disk when creating .ibd files. Without disk level compression we need > 25Tb of disk. The good news: we have ZFS which provides transparent disk compression. Here’s how the disk utilization looks:

Actual data (apparent-size):

Compressed data:

Compression ratio:

(Looks like the compression ratio reported is not 100% correct, we expect ~10x compression ratio.)

Too many tiny files

This is usually the big issue with databases that create a file per table. With MySQL 8.0 we can create a shared tablespace and “assign” a table to it. I created a tablespace per database, and created 1000 tables in each database.

The result:

Creating tables

Another big challenge is how to create tables fast enough so it will not take months. I have used three approaches:

  1. Disabled all possible consistency checks in MySQL, and decreased the innodb page size to 4K (these config options are NOT for production use)
  2. Created tables in parallel: as the mutex contention bug in MySQL 8.0 has been fixed, creating tables in parallel works fine.
  3. Use local NVMe cards on top of an AWS ec2 i3.8xlarge instance

my.cnf config file (I repeat: do not use this in production):

ZFS pool:

A simple “deploy” script to create tables in parallel (includes the sysbench table structure):

How fast did we create tables? Here are some stats:

So we created ~650 tables per second. The average, above, is per 10 seconds.

Counting the tables

It took > 6 hours to do “count(*) from information_schema.tables”! Here is why:

  1. MySQL 8.0 uses a new data dictionary (this is great as it avoids creating 1 billion frm files). Everything is stored in this file:
  2. The information_schema.tables is actually a view:

and the explain plan looks like this:

Conclusions

  1. I have created more than 1 billion real InnoDB tables with indexes in MySQL 8.0, just for fun, and it worked. It took ~2 weeks to create.
  2. Probably MySQL 8.0 is the first version where it is even practically possible to create billion InnoDB tables
  3. ZFS compression together with NVMe cards makes it reasonably cheap to do, for example, by using i3.4xlarge or i3.8xlarge instances on AWS.

one billion tables MySQL

PREVIOUS POST
NEXT POST

Share this post

Comments (8)

  • Alexander Rubin Reply

    Restart time for MySQL 8.0 with billion tables here (just FYI):

    # time service mysql restart

    real 11m16.531s
    user 0m0.006s
    sys 0m0.005s

    Which is also significantly faster than MySQL 5.6/5.7

    October 23, 2018 at 7:17 pm
  • Matt Reply

    Loved this post. Now what to do with those 1 billion empty tables?

    October 23, 2018 at 11:00 pm
  • McFlew van Landed on ya feet Reply

    What a great lab/test/mythbuster case thanks for sharing all the specs and steps involved!

    @Matt just rm -rf /var/lib/mysql/ that’s what you do with 1bi empty tables and then you reinstall mysql from scratch.

    October 26, 2018 at 5:31 am
  • Heidi Schmidt Reply

    What OS did you use ZFS with and what was the default settings for io capacity with respect to EBS SSD? (If you wouldn’t mind posting the show global variables — it would be interesting to see.) Thanks!

    November 1, 2018 at 9:30 am
    • Alexander Rubin Reply

      Ubuntu 18.04, I was not using the ebs, I was using the i3 instances with the local nvme cards.

      November 2, 2018 at 5:15 pm
  • Diego Reply

    Next challenge: what about MariaDB 10? 😉

    November 13, 2018 at 8:54 am
    • Alexander Rubin Reply

      MariaDB is based on MySQL 5.x, no new data dictionary, meaning that there is 1 FRM file per each table. I do not want to create 2 billion files… 🙂

      November 13, 2018 at 11:45 am
      • Diego Reply

        I think MariaDB 10.x is not based on MySQL 5.x but it’s a new development (and not compatible with MySQL 5.x), but I don’t now if it still uses one FRM file per table.

        November 14, 2018 at 7:49 am

Leave a Reply