Buy Percona ServicesBuy Now!

Replication from Percona Server for MySQL to PostgreSQL using pg_chameleon

 | August 17, 2018 |  Posted In: High-availability, MySQL, PostgreSQL, Replication

PREVIOUS POST
NEXT POST

postgres mysql replication using pg_chameleonReplication is one of the well-known features that allows us to build an identical copy of a database. It is supported in almost every RDBMS. The advantages of replication may be huge, especially HA (High Availability) and load balancing. But what if we need to build replication between 2 heterogeneous databases like MySQL and PostgreSQL? Can we continuously replicate changes from a MySQL database to a PostgreSQL database? The answer to this question is pg_chameleon.

For replicating continuous changes, pg_chameleon uses the mysql-replication library to pull the row images from MySQL, which are transformed into a jsonb object. A pl/pgsql function in postgres decodes the jsonb and replays the changes into the postgres database. In order to setup this type of replication, your mysql binlog_format must be “ROW”.

A few points you should know before setting up this tool :

  1. Tables that need to be replicated must have a primary key.
  2. Works for PostgreSQL versions > 9.5 and MySQL > 5.5
  3. binlog_format must be ROW in order to setup this replication.
  4. Python version must be > 3.3

When you initialize the replication, pg_chameleon pulls the data from MySQL using the CSV format in slices, to prevent memory overload. This data is flushed to postgres using the COPY command. If COPY fails, it tries INSERT, which may be slow. If INSERT fails, then the row is discarded.

To replicate changes from mysql, pg_chameleon mimics the behavior of a mysql slave. It creates the schema in postgres, performs the initial data load, connects to MySQL replication protocol, stores the row images into a table in postgres. Now, the respective functions in postgres decode those rows and apply the changes. This is similar to storing relay logs in postgres tables and applying them to a postgres schema. You do not have to create a postgres schema using any DDLs. This tool automatically does that for the tables configured for replication. If you need to specifically convert any types, you can specify this in the configuration file.

The following is just an exercise that you can experiment with and implement if it completely satisfies your requirement. We performed these tests on CentOS Linux release 7.4.

Prepare the environment

Set up Percona Server for MySQL

InstallMySQL 5.7 and add appropriate parameters for replication.

In this exercise, I have installed Percona Server for MySQL 5.7 using YUM repo.

pg_chameleon requires the following the parameters to be set in your my.cnf file (parameter file of your MySQL server). You may add the following parameters to /etc/my.cnf

Now start your MySQL server after adding the above parameters to your my.cnf file.

Fetch the temporary root password from mysqld.log, and reset the root password using mysqladmin

Now, connect to your MySQL instance and create sample schema/tables. I have also created an emp table for validation.

Create a user for configuring replication using pg_chameleon and give appropriate privileges to the user using the following steps.

While creating the user in your mysql server (‘usr_replica’@’%’), you may wish to replace % with the appropriate IP or hostname of the server on which pg_chameleon is running.

Set up PostgreSQL

Install PostgreSQL and start the database instance.

You may use the following steps to install PostgreSQL 10.x

As seen in the following logs, create a user in PostgreSQL using which pg_chameleon can write changed data to PostgreSQL. Also create the target database.

Steps to install and setup replication using pg_chameleon

Step 1: In this exercise, I installed Python 3.6 and pg_chameleon 2.0.8 using the following steps. You may skip the python install steps if you already have the desired python release. We can create a virtual environment if the OS does not include Python 3.x by default.

Step 2: This tool requires a configuration file to store the source/target server details, and a directory to store the logs. Use the following command to let pg_chameleon create the configuration file template and the respective directories for you.

The above command would produce the following output, which shows that it created some directories and a file in the location where you ran the command.

Copy the sample configuration file to another file, lets say, default.yml

Here is how my default.yml file looks after adding all the required parameters. In this file, we can optionally specify the data type conversions, tables to skipped from replication and the DML events those need to skipped for selected list of tables.

Step 3: Initialize the replica using this command:

The above command creates a schema and nine tables in the PostgreSQL database that you specified in the .pg_chameleon/configuration/default.yml file. These tables are needed to manage replication from source to destination. The same can be observed in the following log.

Step 4: Add the source details to pg_chameleon using the following command. Provide the name of the source as specified in the configuration file. In this example, the source name is mysql and the target is postgres database defined under pg_conn.

Once you run the above command, you should see that the source details are added to the t_sources table.

Step 5: Initialize the replica/slave using the following command. Specify the source from which you are replicating the changes to the PostgreSQL database.

Initialization involves the following tasks on the MySQL server (source).

1. Flush the tables with read lock
2. Get the master’s coordinates
3. Copy the data
4. Release the locks

The above command creates the target schema in your postgres database automatically.
In the default.yml file, we mentioned the following schema_mappings.

So, now it created the new schema scott in the target database db_replica.

Step 6: Now, start replication using the following command.

Step 7: Check replication status and any errors using the following commands.

This is how the status looks:

Now, you should see that the changes are continuously getting replicated from MySQL to PostgreSQL.

Step 8:  To validate, you may insert a record into the table in MySQL that we created for the purpose of validation and check that it is replicated to postgres.

In the above log, we see that the record that was inserted to the MySQL table was replicated to the PostgreSQL table.

You may also add multiple sources for replication to PostgreSQL (target).

Reference : http://www.pgchameleon.org/documents/

Please refer to the above documentation to find out about the many more options that are available with pg_chameleon

PREVIOUS POST
NEXT POST
Avinash Vallarapu

Avinash Vallarapu joined Percona in the month of May 2018. Before joining Percona, Avi worked as a Database Architect at OpenSCG for 2 Years and as a DBA Lead at Dell for 10 Years in Database technologies such as PostgreSQL, Oracle, MySQL and MongoDB. He has given several talks and trainings on PostgreSQL. He has good experience in performing Architectural Health Checks and Migrations to PostgreSQL Environments.

Leave a Reply