TokuMX tip: Create any field name you want

A common MongoDB tip is to use short field names to save storage space. Because MongoDB stores the field names in every document and does not compress its data on disk, longer field names mean bigger documents and more storage space used. The downside is that developers find short field names unintuitive and difficult to use.

TokuMX does not have this problem. With TokuMX, use whatever field name you want.

TokuMX uses standard compression algorithms like zlib to compress large chunks of data before writing them to disk. That is the secret to TokuMX’s great compression. As a result, each document on disk is compressed together with many other documents. Field names repeat in every document, which creates a lot of redundant data, and zlib is great at compressing redundant data. Therefore, long field names should have little impact on TokuMX’s storage usage.
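To get a feel for why compressing many documents together matters, here is a minimal Python sketch. It is not how TokuMX is implemented: it uses JSON text as a stand-in for the on-disk format, and the field names and values are made up for illustration. It compares compressing each document on its own with compressing 1,000 documents as one chunk.

```python
import json
import zlib

# Hypothetical documents: the field names repeat in every document,
# while the values differ from one document to the next.
docs = [
    json.dumps({"first_name": f"user{i}", "last_name": f"family{i}"}).encode("utf-8")
    for i in range(1000)
]

# Total size before any compression.
raw = sum(len(d) for d in docs)

# Compress each document on its own: zlib sees each field name only once per input.
per_document = sum(len(zlib.compress(d)) for d in docs)

# Compress all documents as one chunk: repeated field names can be
# encoded as short back-references to earlier occurrences.
one_chunk = len(zlib.compress(b"\n".join(docs)))

print(f"raw: {raw} B, per-document: {per_document} B, one chunk: {one_chunk} B")
```

Compressed one at a time, zlib never sees a field name twice, so the names cost full price in every document. In a single chunk, each repeat is cheap, which is the effect the paragraph above relies on.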

To show this, we ran a little experiment. We created a 100 million document collection that stores first names, last names, and email addresses. We used two schemas. The first schema had long field names.

The second schema had short field names:

We save 26 bytes per document with the second schema.

We generated random names and email addresses, so the actual data is not naturally compressible. The first names were 10 random characters; the last names and email addresses were each 20 random characters. If we had used actual names and email addresses, the data would have compressed more, but that is not what we wanted to test. We wanted to test the effect of field names.
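A scaled-down version of this setup can be sketched with Python's zlib module. Treat it as an approximation under stated assumptions rather than the benchmark itself: the field names below are hypothetical, documents are serialized as JSON instead of being stored in a database, and only 2,000 documents are compressed as a single chunk.

```python
import json
import random
import string
import zlib

# Hypothetical field names, chosen only for illustration.
LONG_NAMES = ("first_name", "last_name", "email_address")
SHORT_NAMES = ("fn", "ln", "e")


def random_text(rng, length):
    """Return `length` random lowercase letters (poorly compressible data)."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def chunk_sizes(names, num_docs=2000, seed=42):
    """Build num_docs documents and return (raw_bytes, zlib_bytes) for the chunk."""
    rng = random.Random(seed)  # same seed, so both schemas get identical values
    first, last, email = names
    lines = []
    for _ in range(num_docs):
        doc = {
            first: random_text(rng, 10),   # first name: 10 random characters
            last: random_text(rng, 20),    # last name: 20 random characters
            email: random_text(rng, 20),   # email: 20 random characters
        }
        lines.append(json.dumps(doc))
    raw = "\n".join(lines).encode("utf-8")
    return len(raw), len(zlib.compress(raw))


long_raw, long_zip = chunk_sizes(LONG_NAMES)
short_raw, short_zip = chunk_sizes(SHORT_NAMES)

# Fraction of space saved by switching to short names, before and after zlib.
raw_saving = 1 - short_raw / long_raw
zip_saving = 1 - short_zip / long_zip
print(f"uncompressed saving: {raw_saving:.1%}, zlib saving: {zip_saving:.1%}")
```

Because both runs use the same seed, the values are identical and only the field names differ, so any difference in size comes from the names alone.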

We ran this experiment on MongoDB 2.2.5, TokuMX uncompressed, and TokuMX with zlib compression (the compression settings are set when the collection is created). Below are the results of disk usage.


Here are some interesting observations:

  • With MongoDB, short names saved about 9% of disk space (from 22.47 GB to 20.42 GB).
  • With TokuMX uncompressed, short names saved about 17% of disk space, which makes sense because we are not getting the advantages of zlib.
  • With TokuMX with zlib (which is the default setting), short names saved about 2.5% of disk space (from 4.38 GB to 4.27 GB).
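The percentages can be rechecked from the sizes quoted in the list. The short Python sketch below does that arithmetic; the helper name is ours, and the last line assumes decimal gigabytes when converting the 26 bytes saved per document into a total for 100 million documents.

```python
def percent_saved(long_gb, short_gb):
    """Percentage of disk space saved by going from long to short field names."""
    return 100 * (long_gb - short_gb) / long_gb


# Sizes quoted in the observations above.
mongodb_pct = percent_saved(22.47, 20.42)      # roughly 9.1
tokumx_zlib_pct = percent_saved(4.38, 4.27)    # roughly 2.5

# Raw field-name bytes removed by the short schema across the collection,
# assuming 1 GB = 10**9 bytes.
raw_name_gb = 26 * 100_000_000 / 1e9

print(f"MongoDB: {mongodb_pct:.1f}%, TokuMX zlib: {tokumx_zlib_pct:.1f}%")
print(f"field-name bytes removed: {raw_name_gb:.1f} GB")
```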

So the conclusion is that short field names do not buy much with TokuMX. Therefore, create whatever field name you want. Don’t worry about the storage usage, because compression will handle it.

  • Dorian

    Yeah, but what about when documents are in ram? They are uncompressed in ram. So you get ~17% free ram.

    September 2, 2014 at 8:10 am

