Where the open source community meets: Secure your spot for Percona Live Amsterdam! - Register

Downloads

Blog

Contact Us

MyRocks Engine: Things to Know Before You Start

February 1, 2018

Author

Vadim Tkachenko

MySQL

Share this Post:

Percona recently released Percona Server with MyRocks as GA. You can see how Facebook explains wins they see in production with MyRocks. Now if you use Percona repositories, you can simply install MyRocks plugin and enable it with ps-admin --enable-rocksdb.

There are some major and minor differences when comparing it to typical InnoDB deployments, and I want to highlight them here. The first important difference is that MyRocks (based on RocksDB) uses Log Structured Merge Tree data structure, not a B+ tree like InnoDB.

You learn more about the LSM engine in my article for DZone.The summary is that an LSM data structure is good for write-intensive workloads, with the expense that reads might slow down (both point reads and especially range reads) and full table scans might be too heavy for the engine. This is important to keep in mind when designing applications for MyRocks. MyRocks is not an enhanced InnoDB, nor a one-size-fits-all replacement for InnoDB. It has its own pros/cons just like InnoDB. You need to decide which engine to use based on your applications data access patterns.

MyRocks Engine – What other differences should you be aware of?

- Let’s look at the directory layout. Right now, all tables and all databases are stored in a hidden .rocksdb directory inside mysqldir. The name and location can be changed, but still all tables from all databases are stored in just a series of .sst files. There is no per-table / per-database separation.

By default in Percona Server for MySQL, MyRocks will use LZ4 compression for all tables. You can change compression settings by changing the rocksdb_default_cf_options server variable. By default it set to compression=kLZ4Compression;bottommost_compression=kLZ4Compression. We chose LZ4 compression as it provides acceptable compression level with very little CPU overhead. Other possible compression methods are Zlib and ZSTD, or no compression at all. You can learn more about compression ratio vs. speed in Peter’s and my post.To compare the data size of a MyRocks table loaded with traffic statistic data from my homebrew router, I’ve used the following table created for pmacct collector:

CREATE TABLE `acct_v9` (
  `tag` int(4) unsigned NOT NULL,
  `class_id` char(16) NOT NULL,
  `class` varchar(255) DEFAULT NULL,
  `mac_src` char(17) NOT NULL,
  `mac_dst` char(17) NOT NULL,
  `vlan` int(2) unsigned NOT NULL,
  `as_src` int(4) unsigned NOT NULL,
  `as_dst` int(4) unsigned NOT NULL,
  `ip_src` char(15) NOT NULL,
  `ip_dst` char(15) NOT NULL,
  `port_src` int(2) unsigned NOT NULL,
  `port_dst` int(2) unsigned NOT NULL,
  `tcp_flags` int(4) unsigned NOT NULL,
  `ip_proto` char(6) NOT NULL,
  `tos` int(4) unsigned NOT NULL,
  `packets` int(10) unsigned NOT NULL,
  `bytes` bigint(20) unsigned NOT NULL,
  `flows` int(10) unsigned NOT NULL,
  `stamp_inserted` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `id` int(11) NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`id`)
) ENGINE=ROCKSDB AUTO_INCREMENT=20127562

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

CREATE TABLE `acct_v9` (

`tag` int(4) unsigned NOT NULL,

`class_id` char(16) NOT NULL,

`class` varchar(255) DEFAULT NULL,

`mac_src` char(17) NOT NULL,

`mac_dst` char(17) NOT NULL,

`vlan` int(2) unsigned NOT NULL,

`as_src` int(4) unsigned NOT NULL,

`as_dst` int(4) unsigned NOT NULL,

`ip_src` char(15) NOT NULL,

`ip_dst` char(15) NOT NULL,

`port_src` int(2) unsigned NOT NULL,

`port_dst` int(2) unsigned NOT NULL,

`tcp_flags` int(4) unsigned NOT NULL,

`ip_proto` char(6) NOT NULL,

`tos` int(4) unsigned NOT NULL,

`packets` int(10) unsigned NOT NULL,

`bytes` bigint(20) unsigned NOT NULL,

`flows` int(10) unsigned NOT NULL,

`stamp_inserted` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,

`id` int(11) NOT NULL AUTO_INCREMENT,

PRIMARY KEY (`id`)

) ENGINE=ROCKSDB AUTO_INCREMENT=20127562

As you can see, there are about 20mln records in this table. MyRocks (with default LZ4 compression) uses 828MB. InnoDB (uncompressed) uses 3760MB.

- You can find very verbose information about your RocksDB instance in the LOG file located in .rocksdb directory. Check this file for more diagnostics. You can also try the SHOW ENGINE ROCKSDB STATUS command, but it is even more cryptic than SHOW ENGINE INNODB STATUS. It takes time to parse and to understand it.

- Keep in mind that at this time MyRocks supports only READ-COMMITTED isolation levels. There is no REPEATABLE-READ isolation level and no gap locking like in InnoDB. In theory, RocksDB should support SNAPSHOT isolation level. However, there is no notion of SNAPSHOT isolation in MySQL so we have not implemented the special syntax to support this level. Please let us know if you would be interested in this.

- For bulk loads, you may face problems trying to load large amounts of data into MyRocks (and unfortunately this might be the very first operation when you start playing with MyRocks as you try to LOAD DATA, INSERT INTO myrocks_table SELECT * FROM innodb_table or ALTER TABLE innodb_table ENGINE=ROCKSDB). If your table is big enough and you do not have enough memory, RocksDB crashes. As a workaround, you should set rocksdb_bulk_load=1 for the session where you load data. See more on this page: https://github.com/facebook/mysql-5.6/wiki/data-loading.

- Block cache in MyRocks is somewhat similar to innodb_buffer_pool_size, however for MyRocks it’s mainly beneficial for reads. You may want to tune the rocksdb_block_cache_size setting. Also keep in mind it uses buffered reads by default, and in this case the OS cache contains cached compressed data and RockDB block cache will contain uncompressed data. You may keep this setup to have two levels of cache, or you can disable buffering by forcing block cache to use direct reads with rocksdb_use_direct_reads=ON.

- The nature of LSM trees requires that when a level becomes full, there is a merge process that pushes compacted data to the next level. This process can be quite intensive and affect user queries. It is possible to tune it to be less intensive.

- Right now there is no hot backup software like Percona XtraBackup to perform a hot backup of MyRocks tables (we are looking into this). At this time you can use mysqldump for logical backups, or use filesystem-level snapshots like LVM or ZFS.

You can find more MyRocks specifics and limitations in our docs at https://www.percona.com/doc/percona-server/LATEST/myrocks/limitations.html.

We are looking for feedback on your MyRocks experience!

UPDATES (12-Feb-2018)
Updates to the original post with the feedback provided by Facebook MyRocks team

1. Isolation Levels
MyRocks supports READ COMMITTED and REPEATABLE READ. MyRocks does not support SERIALIZABLE.

Please read https://github.com/facebook/mysql-5.6/wiki/Transaction-Isolation for details.
The way to implement REPETABLE READ was different from MyRocks and InnoDB — MyRocks used
PostgreSQL style snapshot isolation.
In Percona Server we allow REPEATABLE READ for MyRocks tables only for statements that are safe to use. For example statements like SELECT .. FOR UPDATE and INSERT .. SELECT will fail in MyRocks in REPEATABLE READ isolation mode

2. Online Binary Backup Tool
There is an open source online binary backup tool for MyRocks — myrocks_hotabackup
https://github.com/facebook/mysql-5.6/blob/fb-mysql-5.6.35/scripts/myrocks_hotbackup