MariaDB S3 Engine: Implementation and Benchmarking

MariaDB S3 EngineMariaDB 10.5 has an excellent engine plugin called “S3”. The S3 storage engine is based on the Aria code and the main feature is that you can directly move your table from a local device to S3 using ALTER. Still, your data is accessible from MariaDB client using the standard SQL commands. This is a great solution to those who are looking to archive data for future references at a low cost. The S3 engine is READ_ONLY so you can’t perform any write operations ( INSERT/UPDATE/DELETE ), but you can change the table structure.

In this blog, I am going to explain the details about the S3 engine’s implementation and aspects. And in the end, I compare the performance results from both Local and S3 engine tables.

S3 Engine Implementation

The S3 engine is alpha-level maturity and it will not load by default during MariaDB startup. You have to enable the S3 engine as follows:

You also need to configure your S3 credentials in the MariaDB config file so that MariaDB can authenticate the connection and communicate with the S3 bucket. My config file looks like this:

Note: From a security perspective, your AWS credentials are plaintext. A new key pair should be created specifically for this plugin and only the necessary IAM grants be given.

After configuring the parameters, you need to restart MariaDB to apply the settings. After the restart, you can install the plugin as follows.

Now the S3 engine is ready to use.

How Do I Move The Table to The S3 Engine?

You can move the table to the S3 engine by using the ALTER. For testing, I have created the table “percona_s3” at my lab.

Physically, you can see both .frm and .ibd files once the table is created (default InnoDB). I am going to convert the table “percona_s3” to the S3 engine.

Note: You will get the error “ERROR 3 (HY000):” if you enabled SELINUX, or if anything related to S3 access is misconfigured.

After converting to the S3 engine, you can see only the .frm file. The data has been migrated out of InnoDB and into the S3 engine storage format.

Note: S3 will split the data and index pages and store them separately in respective folders.

S3 Engine Operation

For testing, I created the below table on S3. Let’s test the commands one by one.

S3 Engine with INSERT/UPDATE DELETE:

With all three statements, the query will return the “ERROR 1036: read only”.

Sample output:

S3 Engine with SELECT:

Adding Index to S3 Engine Table:

Modifying the Column on S3 Engine Table:

S3 Engine with DROP:

Note: DROP TABLE will completely remove the data and index pages from S3 as well.

In short, the S3 will allow the read commands and the structure modification commands. Changing or adding any data into the S3 is restricted. MariaDB community is planning to allow the BATCH UPDATE (single user) on S3. Right now, if you need to change any data on S3 tables, you need to follow the below procedure:

  • Convert table from S3 to local (Engine = InnoDB)
  • Modify the data
  • Convert table from Local to S3 (Engine = S3)

You can also query the metadata from INFORMATION_SCHEMA and retrieve the metadata using the SHOW commands.

Comparing the Query Results on Both S3 and Local

In this section, I am going to compare the query results on both the S3 engine and Local. We need to consider the below points before going to the test results.

  • I have disabled the parameters “innodb_buffer_pool_dump_at_shutdown” and “innodb_buffer_pool_load_at_startup”.
  • MariaDB server has restarted before and after executing every single SQL query shown below.
  • MariaDB server and S3 are in the same zone.
  • The ping time between the MySQL and s3 is 1.18 ms

S3 vs Local ( Count(*) )

At S3:

At local:

Count(*) is faster on S3engine. S3 tables are read_only, and it might display the stored value like MyISAM.

S3 vs Local (Entire Table Data)

At S3:

At Local:

S3 vs Local (PRIMARY KEY based lookup)

At S3:

At Local:

S3 engine is pretty good with COUNT(*). And, if we retrieve the actual data from S3, we can see little delay compared to local.

I have conducted the above tests with the default S3 settings. As per the MariaDB document, we can consider the below things to increase the performance on S3:

  • Decreasing s3_block_size. This can be done both globally and per table.
  • Use COMPRESSION_ALGORITHM=zlib when creating the table. This will decrease the amount of data transferred from S3 to the local cache.
  • Increasing the size of the s3 page cache: s3_pagecache_buffer_size

I would say the performance also depends on the disk access speed and network health between server and S3. Consider the below points:

  • Having a low-performance disk and a good network between servers and S3 will favor S3.
  • Having a good performance disk and poor network between servers and S3 will favor Local.

Conclusion

  • This is a very good solution for data archival from MariaDB community. You can query the historical data without restoring.
  • The table is completely read-only.
  • COUNT(*) is pretty fast like MyISAM.
  • Pt-online-schema change will not work on both scenarios (S3 to Local & Local to S3). It will fail because of the INSERT (when copying the data) and CREATE (the S3 table options will not support for InnoDB).
  • CREATE TABLE, DROP TABLE, INFORMATION_SCHEMA tables are slower as those operations need to check the S3.
  • For copying the Aria tables, you need to use the tool aria_s3_copy

I am working with Agustin on our next blog on this, covering the S3 engine compression. Stay tuned!

Share this post

Comment (1)

  • Sebastiano Cala Reply

    Hi Sri, I had some issues using this plugin regarding the IAM policy owned by the IAM user.
    The bucket is not visible in the AWS console or not listed by the CLI command s3api list-buckets.
    I struggled then to clean up the environment, I couldn’t delete the bucket due to Access Denied.
    Did you have the same issue?

    July 28, 2020 at 11:27 am

Leave a Reply