Data masking lets you hide sensitive fields (emails, credit-card numbers, job titles, etc.) while keeping data realistic for reporting, support, or testing. It is particularly useful when you collaborate with external entities and need to share your data for development reasons. You also need to protect your data and keep your customers’ privacy safe. Last but not least, data masking is required to achieve compliance with data privacy regulations like GDPR, HIPAA, and CCPA.
Percona Server for MySQL 8.4 includes a data masking component with ready-made masking and synthetic-data generation functions you can use directly in SQL. Below, I show a small, practical workflow: install/enable the component, create a sample table, create a view/stored procedure that presents masked data to a restricted user, and dump/export masked data.
For my tests, I’m using Percona Server for MySQL 8.4.
The examples use the Percona masking functions such as mask_inner(), mask_pan(), gen_rnd_email(), and dictionary helpers; these are part of the Percona masking functionality.
Install the data masking component
Here is the minimal install sequence for the component: create the internal masking_dictionaries table and then install the component. Also note, some dictionary functions require the MASKING_DICTIONARIES_ADMIN privilege. Example:
|
1 2 3 4 5 6 7 8 9 10 11 12 |
-- create internal dictionary table (run as a privileged admin) CREATE TABLE IF NOT EXISTS mysql.masking_dictionaries ( Dictionary VARCHAR(256) NOT NULL, Term VARCHAR(256) NOT NULL, UNIQUE INDEX dictionary_term_idx (Dictionary, Term) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4; -- install the component INSTALL COMPONENT 'file://component_masking_functions'; -- grant dictionary-admin if you plan to add/remove dictionary terms GRANT MASKING_DICTIONARIES_ADMIN ON *.* TO 'root'@'localhost'; |
Following MySQL best practices, many admins disable the root user, which previously caused these functions to stop working. The server now uses the built-in mysql.session user to execute dictionary queries.
For this to work, you need to grant the mysql.session user SELECT, INSERT, UPDATE, and DELETE privileges on the masking_dictionaries table.
|
1 2 |
-- grant dictionary-admin if you plan to add/remove dictionary terms GRANT SELECT, INSERT, UPDATE, DELETE ON mysql.masking_dictionaries TO 'mysql.session'@'localhost'; |
Example data: Create a table and insert some rows
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
CREATE DATABASE example; USE example; CREATE TABLE customers ( id INT AUTO_INCREMENT PRIMARY KEY, full_name VARCHAR(200), email VARCHAR(255), credit_card VARCHAR(32), job_title VARCHAR(100) ); INSERT INTO customers (full_name, email, credit_card, job_title) VALUES |
Simple masking examples
Let’s see how to mask common fields.
Percona Server for MySQL provides many functions. A few useful ones:
- mask_inner(str, margin_left, margin_right [, mask_char]) — masks the inner part of a string.
- mask_outer(str, left_mask, right_mask [, mask_char]) — masks both ends.
- mask_pan(str) / mask_pan_relaxed(str) — mask primary account numbers (credit cards).
- gen_rnd_email() — generates a randomized email (useful for synthetic exports)
- dictionary functions + masking_dictionary_term_add() — let you supply dictionaries for generated terms.
There are many more. See the functions reference for the full list here: https://docs.percona.com/percona-server/8.4/data-masking-function-list.html
Example: Mask email but keep the domain
We want to show something like a****@example.com. Use substring functions plus mask_inner():
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
-- View expression to keep domain and mask the local part except first/last char SELECT id, full_name, CONCAT( mask_inner(SUBSTRING_INDEX(email, '@', 1), 1, 1, 'X'), '@', SUBSTRING_INDEX(email, '@', -1) ) AS email_masked FROM customers; +----+-------------+-------------------------+ | id | full_name | email_masked | +----+-------------+-------------------------+ | 1 | Alice Smith | aXXXe@mydomain1.com | | 2 | Bob Rossi | bXXXXXXXi@mydomain2.com | | 3 | Caroline Yu | cXXXXXXe@mydomain3.com | +----+-------------+-------------------------+ |
If you don’t need to keep the domain, another function is available for generating a completely random email address: gen_rnd_email([name_size, surname_size, domain])
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
-- Generate a random email address SELECT id, full_name, gen_rnd_email(4, 5, 'mydomain.edu') AS email_masked FROM customers; +----+-------------+-------------------------+ | id | full_name | email_masked | +----+-------------+-------------------------+ | 1 | Alice Smith | yqgy.cflrn@mydomain.edu | | 2 | Bob Rossi | betf.cetqp@mydomain.edu | | 3 | Caroline Yu | qefm.jrgrw@mydomain.edu | +----+-------------+-------------------------+ |
Example: Mask credit card (PAN), leaving the last four digits
Use mask_pan():
|
1 2 3 4 5 6 7 8 9 10 |
-- produces XXXXXXXXXXXX1111 style output (if input length valid) SELECT id, mask_pan(credit_card) AS cc_masked FROM customers; +----+------------------+ | id | cc_masked | +----+------------------+ | 1 | XXXXXXXXXXXX1111 | | 2 | XXXXXXXXXXXX1234 | | 3 | XXXXXXXXXXXX9876 | +----+------------------+ |
Example: Mask job title with synthetic dictionary terms
You can populate a dictionary called jobs and then use gen_dictionary(‘jobs’):
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
-- add a few dictionary terms (requires MASKING_DICTIONARIES_ADMIN grant) SELECT masking_dictionary_term_add('jobs', 'Engineer'); SELECT masking_dictionary_term_add('jobs', 'Designer'); SELECT masking_dictionary_term_add('jobs', 'Account Manager'); SELECT masking_dictionary_term_add('jobs', 'Support Specialist'); -- let’s take a look what’s inside the dictionaries table mysql> select * from mysql.masking_dictionaries; +------------+--------------------+ | Dictionary | Term | +------------+--------------------+ | jobs | Account Manager | | jobs | Designer | | jobs | Engineer | | jobs | Support Specialist | +------------+--------------------+ -- use gen_dictionary to produce random job titles from that dictionary SELECT id, gen_dictionary('jobs') AS job_masked FROM customers; |
In the same way, you can define, for example, a dictionary of types of berries:
|
1 2 3 4 5 6 |
SELECT masking_dictionary_term_add('berries', 'Strawberry'); SELECT masking_dictionary_term_add('berries', 'Blueberry'); SELECT masking_dictionary_term_add('berries', 'Raspberry'); SELECT masking_dictionary_term_add('berries', 'Blackberry'); SELECT masking_dictionary_term_add('berries', 'Cranberry'); SELECT masking_dictionary_term_add('berries', 'Lingonberry'); |
You are free to create whatever dictionary you like. The only limit is your creativity.
Percona Server for MySQL adds dictionary support and helper functions. From version 8.4.4, an in-memory cache has been added to avoid querying the dictionary table constantly. So, if you modify the dictionary table directly, you need to run the following command to resync the cache.
|
1 |
SELECT masking_dictionaries_flush(); |
Create a view that shows masked data
Here is a common case when a user has access to the database, but they’re not granted direct access to the base tables. In such a case, we may create a view and grant the user access only to the view.
A view is a simple, robust way to let a low-privileged user query masked results while protecting the real table.
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
USE example; -- create a read-only user who should not see real data CREATE USER 'masked_user'@'%' IDENTIFIED BY 'ChangeMe123!'; -- create a view that applies masking functions: CREATE OR REPLACE VIEW v_customers_masked AS SELECT id, full_name, -- mask the local part of email, keep domain CONCAT( mask_inner(SUBSTRING_INDEX(email, '@', 1), 1, 1, 'X'), '@', SUBSTRING_INDEX(email, '@', -1) ) AS email_masked, -- mask PAN (payment card number) mask_pan(credit_card) AS credit_card_masked, -- generate a dictionary-based fake job title (stable-ish per query) gen_dictionary('jobs') AS job_masked FROM customers; -- grant only SELECT on the view (no access to real table) GRANT SELECT ON example.v_customers_masked TO 'masked_user'@'%'; -- (do NOT grant SELECT on example.customers !) |
Reconnect to the database using the new masked_user credentials.
|
1 |
# mysql -u masked_user -p'ChangeMe123!' |
Let’s run queries now against the created view and the base table. The user can see masked/synthetic values only. This pattern keeps the real table protected while letting downstream users run queries and reports.
|
1 2 3 4 5 6 7 8 9 10 11 12 13 |
mysql> select * from example.v_customers_masked; +----+-------------+-------------------------+--------------------+------------+ | id | full_name | email_masked | credit_card_masked | job_masked | +----+-------------+-------------------------+--------------------+------------+ | 1 | Alice Smith | aXXXe@mydomain1.com | XXXXXXXXXXXX1111 | Engineer | | 2 | Bob Rossi | bXXXXXXXi@mydomain2.com | XXXXXXXXXXXX0004 | Designer | | 3 | Caroline Yu | cXXXXXXe@mydomain3.com | XXXXXXXXXXX0009 | Designer | +----+-------------+-------------------------+--------------------+------------+ 3 rows in set (0.01 sec) mysql> select * from example.customers; ERROR 1142 (42000): SELECT command denied to user 'masked_user'@'localhost' for table 'customers' |
Can we use stored procedure for masking data?
Sure, you can. Similarly to the view, you can also rely on a stored procedure for masking your data. We can implement the same masking as before and provide only the EXECUTE grant for the stored procedure to the unprivileged user.
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
DELIMITER $$ CREATE PROCEDURE get_masked_customers() BEGIN SELECT id, full_name, CONCAT(mask_inner(SUBSTRING_INDEX(email, '@', 1), 1, 1, 'X'),'@',SUBSTRING_INDEX(email, '@', -1)) AS email_masked, mask_pan(credit_card) AS credit_card_masked, gen_dictionary('jobs') AS job_masked FROM customers; END$$ DELIMITER ; -- provide EXECUTE grant to masked_user GRANT EXECUTE ON PROCEDURE example.get_masked_customers TO 'masked_user'@'%'; |
Reconnect to the database using the masked_user credentials and call the procedure.
|
1 2 3 4 5 6 7 8 9 10 11 |
# mysql -u masked_user -p'ChangeMe123!' mysql> CALL example.get_masked_customers; +----+-------------+-------------------------+--------------------+--------------------+ | id | full_name | email_masked | credit_card_masked | job_masked | +----+-------------+-------------------------+--------------------+--------------------+ | 1 | Alice Smith | aXXXe@mydomain1.com | XXXXXXXXXXXX1111 | Support Specialist | | 2 | Bob Rossi | bXXXXXXXi@mydomain2.com | XXXXXXXXXXXX1234 | Designer | | 3 | Caroline Yu | cXXXXXXe@mydomain3.com | XXXXXXXXXXXX9876 | Designer | +----+-------------+-------------------------+--------------------+--------------------+ |
Is it preferable to use a view or a stored procedure? It depends on the goal you have.
Using a view, you can permit the user to run SELECT statements on top of the view, adding filter conditions or doing JOINs with other tables. Using the stored procedure instead, you are permitting the user to get only a fixed result, all the time, the full masked table, or whatever the result provided by the stored procedure’s code is. In order to provide more flexibility, you can eventually add input parameters to the stored procedure to implement filter conditions, sorting, and limiting the result, and so forth. You can put a lot of complexity in the procedure’s logic if you like, but in the end, the view should provide more flexibility.
Dump/export masked data
We can not use mysqldump directly for dumping the values from the view. Indeed, mysqldump can only dump the definition of the view (the CREATE VIEW statement). Data is not dumped from a VIEW
The right approach here is to use SELECT … INTO OUTFILE from the view.
If you prefer CSV or other flat files:
|
1 2 3 4 5 |
-- run as masked_user (or a user that has FILE privilege and access) SELECT * FROM example.v_customers_masked INTO OUTFILE '/var/lib/mysql-files/customers_masked.csv' FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY 'n'; |
Make sure secure_file_priv and file system permissions allow writing to that path, and the masked_user has the FILE grant. If not, add the grant to the user and retry:
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
GRANT FILE ON *.* TO 'masked_user'@'%'; mysql> SHOW GLOBAL VARIABLES LIKE 'secure_file_priv'; +------------------+-----------------------+ | Variable_name | Value | +------------------+-----------------------+ | secure_file_priv | /var/lib/mysql-files/ | +------------------+-----------------------+ Check now the content of the file /var/lib/mysql-files/customers_masked.csv and you should see all data properly masked. # cat /var/lib/mysql-files/customers_masked.csv 3,"Caroline Yu","<a href="mailto:cXXXXXXe@mydomain3.com">[email protected]</a>","XXXXXXXXXXX0009","Designer" |
Use mysqldump with proxysql
To be honest, there’s a way we can use mysqldump for dumping masked data. Instead of dumping directly from MySQL Server, you can point mysqldump to ProxySQL, which will perform the data masking on the fly.
ProxySQL can be configured to match incoming queries and replace them with modified or different SQL code. For example, we can replace the typical query executed by mysqldump SELECT /*!40001 SQL_NO_CACHE */ * FROM tablename with a custom one using the masking functions we need.
The goal of this article is not to explain ProxySQL; the assumption is that you know how it works and how to configure and use it. We also assume ProxySQL is already installed on our system.
For more details about ProxySQL, look at the official documentation on the website: http://www.proxysql.com
In the ProxySQL configuration, we can simply create a query rule for rewriting on-the-fly the mysqldump’s query for the customers table.
|
1 2 3 4 5 6 7 8 9 |
-- create the query rule in the ProxySQL configuration to match and replace the mysqldump query INSERT INTO mysql_query_rules (rule_id,active,schemaname,match_pattern,replace_pattern) VALUES (1,1,'example', '^SELECT /*!40001 SQL_NO_CACHE */ * FROM `customers`', 'SELECT id, full_name, CONCAT(mask_inner(SUBSTRING_INDEX(email, "@", 1), 1, 1, "X"),"@",SUBSTRING_INDEX(email, "@", -1)) AS email, mask_pan(credit_card) AS credit_card, gen_dictionary("jobs") AS job_title FROM customers' ); LOAD QUERY RULES TO RUNTIME; SAVE QUERY RULES TO DISK; |
Then we can run mysqldump connecting through the proxy as follows:
|
1 |
mysqldump -uroot -pmyrootpw -h 127.0.0.1 -P 6033 example customers --skip-extended-insert > customers.sql |
You can take a look now inside the dump file and see how INSERTs have been generated as masked.
|
1 2 3 4 5 6 7 8 9 10 11 12 |
-- -- Dumping data for table `customers` -- LOCK TABLES `customers` WRITE; /*!40000 ALTER TABLE `customers` DISABLE KEYS */; INSERT INTO `customers` VALUES (1,'Alice Smith','<a href="mailto:[email protected]">[email protected]</a>','XXXXXXXXXXXX1111','Designer'); INSERT INTO `customers` VALUES (2,'Bob Rossi','[email protected]','XXXXXXXXXXXX1234','Support Specialist'); INSERT INTO `customers` VALUES (3,'Caroline Yu','[email protected]','XXXXXXXXXXXX9876','Account Manager'); /*!40000 ALTER TABLE `customers` ENABLE KEYS */; UNLOCK TABLES; |
In this case, the dump is also completely transparent because we did not rename the fields returned for the table.
Additional tips & caveats
- Privilege model: grant only what’s needed. If you give masked_user access to the base table, masking via view becomes meaningless. The dictionary administration functions require MASKING_DICTIONARIES_ADMIN.
- Determinism: functions like gen_rnd_email() and gen_dictionary() may generate different results across queries; design accordingly when you need reproducible masked exports.
- Caching: Percona Server 8.4 introduced a dictionary term cache to speed dictionary lookups. If you change dictionary tables directly (not via provided functions), the cache can become out of sync — use the provided masking_dictionaries_flush() or configure the cache flush settings. The system variable dictionaries_flush_interval_seconds can be used to set the number of seconds between updates to the internal dictionary cache to match changes in the dictionaries table
- Static vs dynamic masking: the approach above is dynamic masking (real rows unchanged; masked on SELECT). For test environments, you may prefer static masking (create a masked copy of the dataset). In such a case, you need to rely on exporting data as described using the SELECT … INTO OUTFILE statement for all the tables you have to recreate as masked. Eventually consider using mysqldump with ProxySQL.
- Replication: if you have a replicated environment, the real data is replicated as usual and not masked on the replicas. Keep in mind that data masking only deals with SELECT statements; the real data is untouched.
- Challenges: If you have more tables to be masked, you have to define the masking view or stored procedure for every single table. In the case of the dump, you have to manage the export one table at a time, unless you’re using a ProxySQL based solution.
Another potential challenge is when a field used in JOIN conditions must be masked. In such a case, even the field in the joined table must be masked exactly the same way. Unfortunately, current functions do not support this case, and you have to implement custom code to manage it. Anyway, we hope it is not a very frequent case.
Conclusion
Data Masking is nowadays an important task for securing data and complying with privacy regulations. Percona Server for MySQL 8.4 provides a specific component with functions to implement data masking with minimal effort. You can safely share the data with your external partners to develop your business, securing your customers’ privacy.
For more details, look at the documentation:
- https://docs.percona.com/percona-server/8.4/
- https://docs.percona.com/percona-server/8.4/data-masking-overview.html
MySQL 8.0 reaching end of life in 2026 is a big moment for your database plans. If your priority is staying safe and stable on 8.0, see how Percona’s EOL support can keep your environment protected.
Expert Support for Your Approach to MySQL 8.0 EOL