In this blog post, we will describe typical usage scenarios for dictionary operations in the Data Masking Component, which is available in Percona Server for MySQL as an open source alternative to Oracle’s enterprise version.

In particular, we will consider the following functions.

  • gen_dictionary() – a function that returns a random term from a dictionary.
  • gen_blocklist() – a function that replaces a term from one dictionary with a randomly selected term in another dictionary.

Generating random terms – gen_dictionary()

For instance, if you need to get a random word from a set of predefined values (a dictionary), you need to create this dictionary first. Just for simplicity, let’s create a dictionary called “colors” and fill it with the seven colors of the rainbow.

All the examples below assume that not only did you install the Data Masking Component but also created the ‘mysql.masking_dictionaries’ table, granted “MASKING_DICTIONARIES_ADMIN” privilege to your current MySQL user, and also granted required privileges to the “mysql.session@localhost” user as per instructions here and here.

Now, if we want to get a random color, we can execute the following query:

One of the possible outcomes could be as follows:

Or, in a more sophisticated scenario, when we need ten random colors, we can execute:

Replacing sensitive terms – gen_blocklist()

In the following set of examples, let’s create a table “tbl” and fill it with ids and random colors.

Then, modify “random_color” in rows with an even “id” with its value spelled backwards so that we would have a mixture of original colors and some other values.

Now, imagine a scenario when color names become a really sensitive piece of information, and they should never be revealed to the public when we query data from our “tbl” table.

In order to achieve this, let’s create another dictionary called “masked_colors.”

Please note that the number of unique “masked_colors” (3) differs from the number of original “colors” (7). In this example, we chose this approach on purpose.

Returning to our primary task of masking original colors, all we have to do now is use the “gen_blocklist()” function.

What happened here is that each term that was found in the “colors” dictionary was masked (substituted) with a random term from the “masked_colors” dictionary. At the same time, the unknown values (those that we reversed before) were not modified. Please also notice that there is no direct, unambiguous mapping between the terms in these dictionaries. Random factors play a significant role in this process, and the output of this query could be different on every execution. In this particular run, the same color “red” on the rows with “id” 7 and 9 got masked to different values “colorC” and “colorB”. On the other hand, the values with different colors “red” on the row with “id” 3 and “orange” on the row with “id” 5 got transformed into the same masked value “colorC”.

Conclusion

Although the demonstrated examples are, by their nature, a bit artificial, and real-life scenarios can be much more complicated, what we have shown in this blog post could be a good starting point for developing custom Data Masking solutions for real production environments. Moreover, taking into account that these dictionary-based operations became noticeably faster in Percona Server for MySQL 8.0.41 / 8.4.4, these techniques can be applied to data sets of a much larger scale. Check the “Dictionary term cache in Data Masking Component” blog post for more details.

Dictionary Operations in Data Masking

Subscribe
Notify of
guest

0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments