Buy Percona ServicesBuy Now!

Keep Sensitive Data Secure in a Replication Setup

 | April 30, 2018 |  Posted In: MySQL, Replication, Security

PREVIOUS POST
NEXT POST

Keep sensitive data secureThis blog post describes how to keep sensitive data secure on slave servers in a MySQL async replication setup.

Almost every web application has a sensitive data: passwords, SNN, credit cards, emails, etc. Splitting the database to secure and “public” parts allows for restricting user and application parts access to sensitive data.

Field encryption

This is based on MySQL encryption functions or on client-side encryption when the authorized user knows a secret, but encrypted data is distributed to all slaves.

  • If possible, use hashes with a big enough salt, and do not store real sensitive data in the database. A good example is passwords. An end-user sends the login and password, application/SQL code calculates the hash with a salt value unique for each end-user and compares the hash with the value stored in the database. Even if the attacker gets the hashes, it’s still hard or even impossible to extract real passwords for all users. Make sure that you are using a good random number generator for the salt, application-side secret, and a good hash function (not MD5).
  • Encryption is not suitable if you are going to provide public access to your database (via slave dumps in sql/csv/xml/json format).
  • Encryption is a complex topic. Check here for a good blog post explaining hashing usage, and try to find a security consultant if you are inventing some “new” method of storing and encrypting data.

Field encryption example

I’m using a single server setup, because the most important part of data separation should be done on the application side. The secure part of the application has a secret passphrase. For example, you can place the code working with authentication, full profile and payments on a separate server and use a dedicated MySQL account.

Summary

  • GOOD: Master and slave servers have exactly the same data and no problems with replication.
  • GOOD: Even if two different end-users have exactly the same password, the stored values are different due to random bytes in the init vector for AES encryption.
  • GOOD: Both the encryption and random number generation uses an external library (openssl).
  • CONF: It’s important to have binlog_format=ROW to avoid sending the secret to slave servers.
  • CONF: Do not allow end-users to change data without changing the init_vector, especially for small strings without random padding. Each update should cause init_vector re-generation.
  • BAD: Encrypted data is still sent to slave servers. If the encryption algorithm or protocol is broken, it is possible to get access to data from an insecure part of the application.
  • BAD: The described protocol still could be insecure.

Replication filters

There are two types of replication filters: a master-side with binlog-*db and a slave-side with replicate-*.

Both could cause replication breakage. Replication filters were created for STATEMENT-based replication and are problematic with modern binlog_format=ROW + gtid_mode=on setup. You can find several cases related to database-level slave-side filters in this blog post. If you still need slave-side filtering, use per-table replicate-wild-*-table options.

Master-side

Even if binary logging is disabled for a specific database, the statement still could be stored in the binary log if it’s a DDL statement, or if the binlog_format is STATEMENT or MIXED and default database is not used by the statement. For details, see the reference manual for the binlog-do-db option. In order to avoid replication issues, you should use ROW-based replication and run SET SESSION sql_log_bin=0; before each DDL statement is executed against the ignored database. It’s not a good idea to use binlog-do-db, because you are losing control of what should be replicated.

Why is binary log filtering useful? Changing the sql_log_bin variable is prohibited inside transactions. The sql_log_bin is DANGEROUS, please do not use it instead of binlog-ignore-db in production on the application side. If you need it for database administration, make sure that you are always typing the “session” word before sql_log_bin. This makes problematic consistent updates of multiple entities inside database.

We still should have the ability to hide just one column from the table. But if we are ignoring the database, we should provide a method of reading non-secure data on slaves / by restricted MySQL accounts. This is possible with triggers and views:

Summary

  • BAD: The data is not the same on the master and slaves. It potentially breaks replication. It’s not possible to use a slave’s backup to restore the master or promote the slave as a new master.
  • BAD: Triggers could reduce DML statement performance.
  • GOOD: The sensitive data is not sent to slaves at all (and not written to binary log).
  • GOOD: It works with GTID
  • GOOD: It requires no application changes (or almost no application changes).
  • GOOD: binlog-ignore-db allows us to not use the dangerous sql_log_bin variable after initial table creation.
PREVIOUS POST
NEXT POST
Nickolay Ihalainen

Nickolay joined Percona in December 2010, after working for several years at what is now the most popular cinema site in Russia. During the time he was there, Nickolay and a small team of developers were responsible for scaling the site into one which now serves over a million unique visitors per day. Prior to that, he worked for several other companies, including NetUp, which provides ISP billing and IPTV solutions, and eHouse, the oldest Russian e-commerce company. Nickolay has a great deal of experience in both systems administration and programming. His experience includes extensive hands-on work with a broad range of technologies, including SQL, MySQL, PHP, C, C++, Python, Java, XML, OS parameter tuning (Linux, Solaris), caching techniques (e.g., memcached), RAID, file systems, SMTP, POP3, Apache, networking and network data formats, and many others. He is an expert in scalability, performance, and system reliability.

3 Comments

  • Hi Nikolay,
    Thank you for your article.
    In the “field encryption example” you have a line insert into t (c1,c2, rnd_pad) values (1, AES_ENCRYPT(‘Secret’, @key_str, @init_vector), @init_vector);
    AES_ENCRYPT returns a binary string containing the encrypted output. But c2 is varchar(255). I think c2 should have a different data type.

    • Hi Alex,

      Good point, the best storage for binary data with unknown size is varbinary(N) column type.
      For example in mysql 8.0, insert will cause an error even if both client and server encodings are the same (utf8mb4):
      mysql> create table t1(c varbinary(255));
      Query OK, 0 rows affected (0.74 sec)

      mysql> insert into t1 values(AES_ENCRYPT(‘Hi, Привет, 你好’,’secret’));
      Query OK, 1 row affected (0.39 sec)

      mysql> select AES_DECRYPT(‘secret’) from t1;
      ERROR 1582 (42000): Incorrect parameter count in the call to native function ‘AES_DECRYPT’
      mysql> select AES_DECRYPT(c,’secret’) from t1;
      +————————–+
      | AES_DECRYPT(c,’secret’) |
      +————————–+
      | Hi, Привет, 你好 |
      +————————–+
      1 row in set (0.00 sec)

      mysql> insert into t values(AES_ENCRYPT(‘Hi, Привет, 你好’,’secret’));
      ERROR 1366 (HY000): Incorrect string value: ‘\xA5\xF5\x8D\xB5o\x0F…’ for column ‘c’ at row 1

      In heterogeneous character set environment the problem will be also with result set, because AES_DECRYPT returns binary and you need additional conversion step:

      select cast(AES_DECRYPT(c,’secret’) AS CHAR CHARACTER SET utf8mb4) as c, @@character_set_client, @@character_set_server from t1;
      +—————-+————————+————————+
      | c | @@character_set_client | @@character_set_server |
      +—————-+————————+————————+
      | Hi, Привет, ?? | cp1251 | utf8mb4 |
      +—————-+————————+————————+

      Best regards,
      Nickolay

Leave a Reply