October 22, 2014

Dynamic row format for MEMORY tables

The latest Percona Server release has one new feature: now MEMORY tables can have BLOB and TEXT columns, and VARCHAR columns will not waste space due to implicit extension to CHAR.

The lack of these features in the MEMORY storage engine (also known as HEAP) in MySQL has been one of the long-standing limitations caused by the fact that variable-length records were not supported by the engine. This is true for all MySQL versions, including the latest 5.6 development tree as of last week.

There have been efforts in the community to remove this limitation. In 2008 Igor Chernyshev of eBay implemented true-VARCHAR support for MEMORY tables for MySQL 5.0. Although this patch did not add the BLOB and TEXT support, it refactored the code to make it easier to add it. The code was proposed for inclusion into upstream sources but unfortunately never made it there, probably due to some limitations discussed below.

We at Percona took the eBay patch, ported it to 5.5, tested it extensively and fixed all the bugs we found.

There are a few points to keep in mind when using this new feature (and these points are probably the reason why it was not included in MySQL in 2008):

  1. The key columns must come before any non-key variable-length columns. For performance reasons, the key columns are stored in a fixed-length block at the beginning of the row. If there were any variable-length column in between them, it had to be stored in the fixed-length format, either wasting space (VARCHAR) or outright impossible (BLOB).
  2. Indexes on BLOB columns are not supported.

While we do not consider these limitations to be show-stopping, in the future we will work to remove them.

The full usage notes, including discussion when the dynamic row format is chosen over the fixed one, how to request it explicitly and how to set the data block size at CREATE TABLE time, are in the documentation.

What of course interests us is the MEMORY engine performance with this feature added. For that we run a few experiments, described next. These tests are not exhaustive and, of course, if you have a performance-critical application depending on MEMORY tables, they are no replacement for application-specific tests.

We start by checking if there is any performance regression when the variable-length records are not used, i.e. if you pay for what you don’t use. This is a valid question because the MEMORY implementation has changed significantly with this patch–for the fixed-length records too.

Default sysbench 0.4.12 schema, one client thread:
Fixed-row format (FRF, Percona Server without this patch applied):
Insertions in 900 seconds: 25962452.
VmPeak: 6267692 kB

Dynamic-row format (DRF):
Insertions in 900 seconds: 25949631 (0.05% slower).
VmPeak: 6267700 kB

Two client threads:
FRF:
Insertions in 900 seconds: 42946880.
VmPeak: 10069040 kB

DRF:
Insertions in 900 seconds: 43325887 (0.88% faster).
VmPeak: 10069048 kB

Here we can see that the performance and memory usage is about the same for both row formats. (Speaking very strictly, the VM Peak value is not comparable due to slightly different number of rows inserted).

In the second test we check if the dynamic row format delivers the memory savings for VARCHAR fields agains the fixed row format. For that we modify the default sysbench schema to be the following:

CREATE TABLE sbtest (id INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
c VARCHAR(4096) NOT NULL DEFAULT ”, PRIMARY KEY (id))
ENGINE=MEMORY ROW_FORMAT=DYNAMIC DEFAULT CHARSET=latin1;

Also we modify the sysbench non-transactional INSERT test to insert strings of random length between 0 and 4096 in uniform distribution, that is, 2048 on average.

The results of this test (two client threads):
FRF:
Insertions in 900 seconds: 9922468
VmPeak: 40871220 kB

DRF:
Insertions in 900 seconds: 9831946 (0.92% less)
VmPeak: 22520888 kB (45% reduction)

Thus the test confirms that the dynamic row format delivers the expected memory savings.

One quite important feature is missing from this release however: MyISAM tables are still used for temporary tables where the MEMORY tables with dynamic row format could be used instead: for schemas containing large VARCHARs, BLOB, TEXT. Unfortunately, enabling this feature is not trivial, but we are working on this and expect to deliver it in a future release.

About Laurynas Biveinis

Laurynas is a software engineer and Percona Server lead whose primary interest is InnoDB performance. In the past he worked in industry, interned in Google as a compiler software engineer, as well as academia where he researched physical database indexes, including large-scale spatial models of the brain.

Comments

  1. Lee says:

    Hi Laurynas Biveinis.

    You mean current percona server use dynamic format memory table even though it is internal temporary table.
    So, Would below query use dynamic format internal temporary table. ?
    (of course tab table doesn’t have index on fd2 columns)

    SELECT fd2, count(*) FROM tab GROUP BY fd2;

    Best regards,
    Lee.

  2. Laurynas Biveinis says:

    Lee –

    Unfortunately the dynamic format is not used by internal temporary tables yet. We are planning to remove this limitation.

  3. Lee says:

    Hi Laurynas Biveinis.

    Anyway, Thanks for you answer.

    Bye.

  4. Laurynas,

    Just checking whether MEMORY engine dynamic row format was already implemented for internal tables? I could not find this info anywhere. If not, is there a tentative release day?

    Gene

  5. Gene –

    It’s on our plans, but we haven’t implemented this feature to release quality yet, and we don’t have an ETA neither. As with any other development work, there is an option to contract Percona to prioritize this development.

  6. Hans-Joachim Kliemeck says:

    Has anything happened since the last post?

  7. Laurynas Biveinis says:

    Hans-Joachim –

    Sorry, we don’t have any updates yet.

Speak Your Mind

*