Infinitely Scalable Storage with High Compression Feature

It is no secret that compute and storage costs are the main drivers of cloud bills. Migrating data from a legacy data center to the cloud looks appealing at first, as it significantly reduces capital expenses (CapEx) and keeps operational expenses (OpEx) under control. But once you see the bill, the lift-and-shift project does not look that promising anymore. Percona’s recent open source survey shows that many organizations saw unexpected growth around cloud and data.

Storage growth is an organic process for the expanding business: more customers store more data, and more data needs more backups and disaster recovery storage for low RTO.

Today, the Percona Innovation Team, which is part of the Engineering organization, is proud to announce a new feature – High Compression. With this feature enabled, your MySQL databases will have infinite storage at zero cost.

The Problem

Our research team dug into the problem of storage growth and found that the storage growth of a successful business inevitably leads to an increase in the cloud bill. After two years of research we have the data we need and the problem is now clear, as you can see on the chart below:

The correlation is clearly visible – the more data you have, the more you pay.

The Solution

Once our Innovation Team received the data, we started working day and night on the solution. The goal was to change the trend and break the correlation. That is how, after two years, we are proud to share the High Compression feature with the community. You can see the comparison of the storage costs with and without this new feature below:

Option                                          Annual run rate
100 TB AWS EBS                                  $120,000
100 TB AWS S3 for backups                       $25,200
100 TB AWS EBS + High Compression               $1.2
100 TB AWS S3 for backups + High Compression    < $1

As you can see, it is a 100,000x difference! What is more interesting, the cost of the storage with the High Compression feature enabled always stays flat, and the chart now looks like this:
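The 100,000x figure can be checked directly from the EBS numbers in the table. The snippet below is only a quick sanity check: the variable and function names are ours, and because the S3 cost with High Compression is given only as "< $1", we can state just a lower bound for that ratio.

```python
# Annual run rates from the table, converted to US cents so the
# arithmetic stays in exact integers.
EBS_PLAIN_CENTS = 120_000 * 100   # 100 TB AWS EBS: $120,000
EBS_COMPRESSED_CENTS = 120        # 100 TB AWS EBS + High Compression: $1.2
S3_PLAIN_CENTS = 25_200 * 100     # 100 TB AWS S3 for backups: $25,200
S3_COMPRESSED_MAX_CENTS = 100     # "< $1" is all we know, so use $1 as a ceiling


def cost_ratio(before_cents: int, after_cents: int) -> int:
    """Return how many times cheaper the 'after' option is (integer division)."""
    return before_cents // after_cents


ebs_ratio = cost_ratio(EBS_PLAIN_CENTS, EBS_COMPRESSED_CENTS)
s3_ratio_lower_bound = cost_ratio(S3_PLAIN_CENTS, S3_COMPRESSED_MAX_CENTS)

print(f"EBS: {ebs_ratio:,}x cheaper")
print(f"S3 backups: more than {s3_ratio_lower_bound:,}x cheaper")
```

Working in cents avoids floating-point rounding when dividing by $1.2.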

Theory

Not many people know this, but data on disks is stored as bits, which are 0s and 1s. They form the binary sequences which are translated into normal data.

After thorough research, we came to the conclusion that we can replace the 1s with 0s easily. The formula is simple:

f(1) = 0

So instead of storing all these 1s, our High Compression feature stores zeroes only:
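As a minimal sketch of the formula, here is what f(1) = 0 looks like when applied to every bit of a byte string. The `nullify` function is a hypothetical illustration, not the code from our Pull Request, and it assumes that 0 bits simply stay 0, since the formula only defines f(1).

```python
def nullify(data: bytes) -> bytes:
    """Apply f(1) = 0 to every bit; 0 bits are assumed to stay 0."""
    # Masking each byte with 0x00 clears all eight of its bits.
    return bytes(byte & 0x00 for byte in data)


original_a = b"INSERT INTO t VALUES (1)"
original_b = b"INSERT INTO t VALUES (2)"

stored_a = nullify(original_a)
stored_b = nullify(original_b)

# Every stored bit is zero, and the length is unchanged.
assert all(byte == 0 for byte in stored_a)
assert len(stored_a) == len(original_a)

# Two different inputs of the same length become identical once stored,
# so no function can map the stored bytes back to a unique original.
assert original_a != original_b
assert stored_a == stored_b
```

The last two assertions show why the mapping cannot be inverted: once everything is zero, only the length of the original data survives.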

 

Implementation

The component which does the conversion is called the Nullifier, and every bit of data goes through it. We are first implementing this feature in Percona Operator for Percona XtraDB Cluster and below is the technical view of how it is implemented in Kubernetes:

As you can see, all the data written by the user (all INSERT or UPDATE statements) goes through the Nullifier first, and only then is stored on the Persistent Volume Claim (PVC). With the High Compression feature enabled, the size of the PVC can always be 1 GB.

Percona is an open source company and we are thrilled to share our code with everyone. You can see the Pull Request for the High Compression feature here. As you see in the PR, our feature provides the Nullifier through the underestimated and very powerful Blackhole engine.
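For readers who have not met it before: MySQL's BLACKHOLE storage engine accepts writes and throws them away, and reads from it always return an empty result. The toy Python model below imitates only that behavior. It is not the code from the Pull Request and not how MySQL implements the engine; the `BlackholeTable` class and its methods are hypothetical names used purely for illustration.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


class BlackholeTable:
    """Toy model of a table that accepts writes but stores nothing."""

    def __init__(self) -> None:
        # Counter kept only so the demo can show that writes were accepted.
        self.rows_accepted = 0

    def insert(self, row: Row) -> None:
        """Accept the row, then discard it."""
        self.rows_accepted += 1

    def select_all(self) -> List[Row]:
        """Reads always come back empty, whatever was written."""
        return []


table = BlackholeTable()
for i in range(1000):
    table.insert({"id": i, "payload": "x" * 100})

print(table.rows_accepted)  # number of writes accepted
print(table.select_all())   # rows actually stored
```

Since nothing is retained, the storage footprint does not depend on how many rows are written.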

The High Compression feature will be enabled by default starting from PXC Operator version 1.8.0, but we have added a flag to cr.yaml, spec.pxc.highCompression: true, which can be set to false to disable the feature if needed.

Backups with the High Compression feature are blazing fast and take seconds with any amount of data. The challenge our Engineering team is working on now is recovery. The Nullifier does the job, but recovering the data is hard. We are confident that the De-Nullifier will be released in 1.8.0 as well.

Conclusion

Percona is spearheading innovation in the database technology field. The High Compression feature solves the storage growth problem and as a result, reduces the cloud bill significantly. The release of the Percona Kubernetes Operator for Percona XtraDB Cluster 1.8.0 is planned for mid-April, but this feature is already available in Tech Preview.

As a quick peek at our roadmap, we are glad to share that the Innovation Team has already started working on the High-Density feature, which will drastically reduce the compute footprint required to run MySQL databases.

Comments (6)

  • Kay Agahd

    Nullifier for president, LOL 🙂
    Great post, percona rocks! 🙂

    April 1, 2021 at 9:08 am
  • Vinicius M. Grippa

    I still prefer to use quantum storage disks since they can store both 0 and 1’s at the same time. This will increase the storage capacity to petabytes of data 😀

    April 1, 2021 at 11:16 am
  • Tate McDaniel

    Blackhole has always been my favorite engine. It is great to see it getting the love it deserves!

    April 1, 2021 at 12:21 pm
  • Bruno Cabral

    When I saw the PR I was very scared of Percona Future, now I noticed the date and everything makes sense. Well done!!

    April 1, 2021 at 2:51 pm
  • Ivan Baldo

    This is amazing tech! Applying to all my servers now!
    I also see dramatic reductions in IOPS, CPU, network throughput (especially outgoing traffic) and replica lag!
    Nothing short of revolutionary, well done!

    April 1, 2021 at 11:29 pm
  • Anna Tuen

    Has the team considered the second-level storage this approach offers? As zero is empty in the middle, that space can be utilized for storage of additional non-zero data.
    Double your capacity in a single action.

    April 6, 2021 at 6:10 am