EmergencyEMERGENCY? Get 24/7 Help Now!

MyRocks Engine: Things to Know Before You Start

 | February 1, 2018 |  Posted In: MySQL

PREVIOUS POST
NEXT POST

MyRocksPercona recently released Percona Server with MyRocks as GA. You can see how Facebook explains wins they see in production with MyRocks. Now if you use Percona repositories, you can simply install MyRocks plugin and enable it with ps-admin --enable-rocksdb.

There are some major and minor differences when comparing it to typical InnoDB deployments, and I want to highlight them here. The first important difference is that MyRocks (based on RocksDB) uses Log Structured Merge Tree data structure, not a B+ tree like InnoDB.

You learn more about the LSM engine in my article for DZone.The summary is that an LSM data structure is good for write-intensive workloads, with the expense that reads might slow down (both point reads and especially range reads) and full table scans might be too heavy for the engine. This is important to keep in mind when designing applications for MyRocks. MyRocks is not an enhanced InnoDB, nor a one-size-fits-all replacement for InnoDB. It has its own pros/cons just like InnoDB. You need to decide which engine to use based on your applications data access patterns.

What other differences should you be aware of?

  • Let’s look at the directory layout. Right now, all tables and all databases are stored in a hidden .rocksdb directory inside mysqldir. The name and location can be changed, but still all tables from all databases are stored in just a series of .sst files. There is no per-table / per-database separation.
  • By default in Percona Server for MySQL, MyRocks will use LZ4 compression for all tables. You can change compression settings by changing the rocksdb_default_cf_options server variable. By default it set to compression=kLZ4Compression;bottommost_compression=kLZ4Compression. We chose LZ4 compression as it provides acceptable compression level with very little CPU overhead. Other possible compression methods are Zlib and ZSTD, or no compression at all. You can learn more about compression ratio vs. speed in Peter’s and my post.To compare the data size of a MyRocks table loaded with traffic statistic data from my homebrew router, I’ve used the following table created for pmacct collector:

    As you can see, there are about 20mln records in this table. MyRocks (with default LZ4 compression) uses 828MB. InnoDB (uncompressed) uses 3760MB.
  • You can find very verbose information about your RocksDB instance in the LOG file located in .rocksdb directory. Check this file for more diagnostics. You can also try the SHOW ENGINE ROCKSDB STATUS command, but it is even more cryptic than SHOW ENGINE INNODB STATUS. It takes time to parse and to understand it.
  • Keep in mind that at this time MyRocks supports only READ-COMMITTED isolation levels. There is no REPEATABLE-READ isolation level and no gap locking like in InnoDB. In theory, RocksDB should support SNAPSHOT isolation level. However, there is no notion of SNAPSHOT isolation in MySQL so we have not implemented the special syntax to support this level. Please let us know if you would be interested in this.
  • For bulk loads, you may face problems trying to load large amounts of data into MyRocks (and unfortunately this might be the very first operation when you start playing with MyRocks as you try to LOAD DATA, INSERT INTO myrocks_table SELECT * FROM innodb_table or ALTER TABLE innodb_table ENGINE=ROCKSDB). If your table is big enough and you do not have enough memory, RocksDB crashes. As a workaround, you should set rocksdb_bulk_load=1 for the session where you load data.  See more on this page: https://github.com/facebook/mysql-5.6/wiki/data-loading.
  • Block cache in MyRocks is somewhat similar to innodb_buffer_pool_size, however for MyRocks it’s mainly beneficial for reads. You may want to tune the rocksdb_block_cache_size setting. Also keep in mind it uses buffered reads by default, and in this case the OS cache contains cached compressed data and RockDB block cache will contain uncompressed data. You may keep this setup to have two levels of cache, or you can disable buffering by forcing block cache to use direct reads with rocksdb_use_direct_reads=ON.
  • The nature of LSM trees requires that when a level becomes full, there is a merge process that pushes compacted data to the next level. This process can be quite intensive and affect user queries. It is possible to tune it to be less intensive.
  • Right now there is no hot backup software like Percona XtraBackup to perform a hot backup of MyRocks tables (we are looking into this). At this time you can use mysqldump for logical backups, or use filesystem-level snapshots like LVM or ZFS.

You can find more MyRocks specifics and limitations in our docs at https://www.percona.com/doc/percona-server/LATEST/myrocks/limitations.html.

We are looking for feedback on your MyRocks experience!

UPDATES (12-Feb-2018)
Updates to the original post with the feedback provided by Facebook MyRocks team

1. Isolation Levels
MyRocks supports READ COMMITTED and REPEATABLE READ. MyRocks does not support SERIALIZABLE.

Please read https://github.com/facebook/mysql-5.6/wiki/Transaction-Isolation for details.
The way to implement REPETABLE READ was different from MyRocks and InnoDB — MyRocks used
PostgreSQL style snapshot isolation.
In Percona Server we do not allow REPEATABLE READ for MyRocks tables, as the behavior will be different from InnoDB.

2. Online Binary Backup Tool
There is an open source online binary backup tool for MyRocks — myrocks_hotabackup
https://github.com/facebook/mysql-5.6/blob/fb-mysql-5.6.35/scripts/myrocks_hotbackup

PREVIOUS POST
NEXT POST
Vadim Tkachenko

Vadim Tkachenko co-founded Percona in 2006 and serves as its Chief Technology Officer. Vadim leads Percona Labs, which focuses on technology research and performance evaluations of Percona’s and third-party products. Percona Labs designs no-gimmick tests of hardware, filesystems, storage engines, and databases that surpass the standard performance and functionality scenario benchmarks. Vadim’s expertise in LAMP performance and multi-threaded programming help optimize MySQL and InnoDB internals to take full advantage of modern hardware. Oracle Corporation and its predecessors have incorporated Vadim’s source code patches into the mainstream MySQL and InnoDB products. He also co-authored the book High Performance MySQL: Optimization, Backups, and Replication 3rd Edition.

2 Comments

  • How does MyRocks handle secondary indexes?

    Let’s say I use auto-increment as the primary key. My understanding is that in innoDB updates are in-place so each row has an auto-increment id that doesn’t change. Secondary indexes simply point to this auto-increment id.

    In RocksDB updates are not in-place but result in new versions of the row. Do these different versions all share the same auto-increment id? If not does that mean each update would change the primary key of the newest version to a new auto-increment id? In that case would the secondary indexes need to be updated to point to the newest primary key? If I have 20 indexes that’s a lot of updates to be done for simply changing 1 row.

  • MyRocks, like most/all MySQL storage engines appends the PK (or columns needed to re-create the entire PK) to the secondary key. This is how the association of SK -> PK works. AUTO INC is a logical value assigned to a column when the column is inserted and does not change at any time without an explicit UPDATE on that column.

    SKs should only get changed if you re updating a portion of the key itself or a portion of the PK that it refers back to. In most tree oriented storage engines this results in a deletion of the record in the effected SKs and a re-insertion of the record as the position of the record within the tree will have changed.

Leave a Reply to Andy Cancel reply