Where the open source community meets: Secure your spot for Percona Live Amsterdam! - Register

Downloads

Blog

Foreign Data Wrappers in PostgreSQL and a closer look at postgres_fdw

August 21, 2018

Author

Jobin Augustine

Open Source

PostgreSQL

Share this Post:

FDWs with PostgreSQL There are a few features in PostgreSQL that are very compelling, and that I rarely see in other RDBMSs. Some of these features are the driving force behind the growing popularity of PostgreSQL. This blog post is about one of my favorite features: FDW (Foreign Data Wrapper). As the name indicates, this feature allows a PostgreSQL database to treat tables in a remote PostgreSQL database as locally available tables.

The history of FDW began when SQL/MED came out as part of the ANSI SQL standard specification in 2003. MED stands for “Management of External Data”. By definition, “external data” is the data that the DBMS is able to access but does not manage. There are two parts of this specification:

Foreign Table: this is about how to access external data sources and present them as relational tables.
Datalink: this extends the functionality of database systems to include control over external files without the need to store their contents directly in the database, such as LOBs. A column of a table could directly refer to a file.

PostgreSQL’s FDW capabilities address foreign tables only. It was introduced in PostgreSQL 9.1 and has been receiving improvements ever since.

Today there is a variety of FDWs which allow PostgreSQL to talk to most of the data sources we can think of. However, most FDWs are independent open source projects implemented as Postgres Extensions, and not officially supported by the PostgreSQL Global Development Group.

FDWs with PostgreSQL

In this blog post, we will take a closer look at the postgres_fdw which can be considered as the “reference implementation” for other FDW development efforts, and showcases its capabilities. This is the one FDW which comes with PostgreSQL source as a contrib extension module. The only other FDW which is part of PostgreSQL source tree is file_fdw.

Let’s look into postgres_fdw with a use case. In many organizations, there could be multiple systems catering to different functionalities/departments. For example, while an HR database may be holding the employee information the finance and payroll systems may need to access that same data. A common—but bad—solution for this is to duplicate the data in both systems. Data duplication often leads to problems, starting with data maintenance and accuracy. A smarter option, to avoid duplication while providing access to foreign databases to only the required data, is through FDWs.

Installation postgres_fdw

The Postgres Development Group (PGDG) offers PostgreSQL packages for all major Linux distributions. postgres_fdw itself is provided as a module that is usually included in the contrib package. In the example below we install such package for PostgreSQL 10 running on Red Hat/CentOS:

$ sudo yum install postgresql10-contrib.x86_64

1	$ sudo yum install postgresql10-contrib.x86_64

Steps to setup

Let’s consider two PostgreSQL Instances, source instance and a destination instance

source is the remote postgres server from where the tables are accessed by the destination database server as foreign tables.
destination is another postgres server where the foreign tables are created which is referring tables in source database server.

We are going to use these definitions of source and destination in the rest of the post. Let’s assume that the current application connects to destination database using a user app_user.

Step 1: Create a user on the source

Create a user in the source server using the following syntax. This user account will be used by the destination server to access the source tables

postgres=# CREATE USER fdw_user WITH ENCRYPTED PASSWORD 'secret';

1	postgres=# CREATE USER fdw_user WITH ENCRYPTED PASSWORD 'secret';

Step 2: Create test tables (optional)

Let’s create a test table in the source server and insert a few records.

postgres=> create table employee (id int, first_name varchar(20), last_name varchar(20));
CREATE TABLE

postgres=# insert into employee values (1,'jobin','augustine'),(2,'avinash','vallarapu'),(3,'fernando','camargos');
INSERT 0 3

postgres=> create table employee (id int, first_name varchar(20), last_name varchar(20));

CREATE TABLE

postgres=# insert into employee values (1,'jobin','augustine'),(2,'avinash','vallarapu'),(3,'fernando','camargos');

INSERT 0 3

Step 3: Grant privileges to user in the source

Give appropriate privileges to the fdw_user on the source table. Always try to limit the scope of privilege to minimum to improve security.
An example syntax is as follows:

postgres=# GRANT SELECT, INSERT, UPDATE, DELETE ON TABLE employee TO fdw_user;
GRANT

1 2	postgres=# GRANT SELECT, INSERT, UPDATE, DELETE ON TABLE employee TO fdw_user; GRANT

Step 4: Modify ACL in pg_hba.conf

We need to ensure that the proper authentication is setup for accessing source server from the destination server.
Add an entry into pg_hba.conf as shown below, preferably at the beginning of the file.

host    all all     destination_server_ip/32          md5

1	host all all destination_server_ip/32 md5

Step 5: Test connectivity and privileges on source

Before proceeding further, It is a good idea to make sure that we are able to connect to the source machine from this destination machine using the newly created database user (fdw_user).

In order to validate, on the destination server, use psql to connect to the source server:

$ psql -h hr -U fdw_user postgres

1	$ psql -h hr -U fdw_user postgres

You could even validate all privileges on the tables which are to be presented as foreign tables using this connection.

Step 6: Create postgres_fdw extension on the destination

Connect to destination server, and create the postgres_fdw extension in the destination database from where you wish to access the tables of source server. You must be a superuser to create the extension.

No postgres_fdw extension is needed on the source server.

postgres=# create extension postgres_fdw;
CREATE EXTENSION

1 2	postgres=# create extension postgres_fdw; CREATE EXTENSION

Validate if the extension is created using dx. Following is an example validation log.

postgres=# dx postgres_fdw
                            List of installed extensions
    Name    | Version | Schema |                    Description
--------------+---------+--------+----------------------------------------------------
postgres_fdw | 1.0     | public | foreign-data wrapper for remote PostgreSQL servers
(1 row)

postgres=# dx postgres_fdw

List of installed extensions

Name | Version | Schema | Description

--------------+---------+--------+----------------------------------------------------

postgres_fdw | 1.0 | public | foreign-data wrapper for remote PostgreSQL servers

(1 row)

Step 7: Grant privileges to user in the destination

Always better to limit the scope of the server definition to an application user. If a regular user needs to define a server, that user needs to have USAGE permission on the foreign data wrapper. Superuser can grant the privilege

postgres=# grant usage on FOREIGN DATA WRAPPER postgres_fdw to app_user;
GRANT

1 2	postgres=# grant usage on FOREIGN DATA WRAPPER postgres_fdw to app_user; GRANT

Alternatively, superuser (postgres) can create a server definition and then grant USAGE permission on that server definition to the application user like this:

postgres=# GRANT USAGE ON FOREIGN SERVER hr TO app_user;
GRANT

1 2	postgres=# GRANT USAGE ON FOREIGN SERVER hr TO app_user; GRANT

Step 8: Create a server definition

Now we can create a server definition. This foreign server is created using the connection details of the source server running on host “hr”. Let’s name the foreign server as itself as “hr”

postgres=> CREATE SERVER hr 
 FOREIGN DATA WRAPPER postgres_fdw
 OPTIONS (dbname 'postgres', host 'hr', port '5432');
CREATE SERVER

postgres=> CREATE SERVER hr

FOREIGN DATA WRAPPER postgres_fdw

OPTIONS (dbname 'postgres', host 'hr', port '5432');

CREATE SERVER

Step 9: Create user mapping from destination user to source user

Create a mapping on the destination side for destination user (app_user) to remote source user (fdw_user)

postgres=> CREATE USER MAPPING for app_user
SERVER hr
OPTIONS (user 'fdw_user', password 'secret');
CREATE USER MAPPING

postgres=> CREATE USER MAPPING for app_user

SERVER hr

OPTIONS (user 'fdw_user', password 'secret');

CREATE USER MAPPING

Step 10: Create foreign table definition on the destination

Create a foreign table in the destination server with the same structure as the source table, but with OPTIONS specifying schema_name and table_name

postgres=# CREATE FOREIGN TABLE employee
(id int, first_name character varying(20), last_name character varying(20))
SERVER hr
OPTIONS (schema_name 'public', table_name 'employee');

postgres=# CREATE FOREIGN TABLE employee

(id int, first_name character varying(20), last_name character varying(20))

SERVER hr

OPTIONS (schema_name 'public', table_name 'employee');

Step 11: Test foreign table

Validate whether we can query the foreign table we just created in the destination server.

postgres=> select * from employee;
id | first_name | last_name
----+------------+-----------
1 | jobin | augustine
2 | avinash | vallarapu
3 | fernando | camargos
(3 rows)

postgres=> select * from employee;

id | first_name | last_name

----+------------+-----------

1 | jobin | augustine

2 | avinash | vallarapu

3 | fernando | camargos

(3 rows)

As we can see from the above example, data is been accessed from the source database.

Now you might be thinking: “creating foreign tables one by one like this on the destination server is painful. Is it possible to do it automatically?“. The answer is yes – there is an option to import a full schema.

On the destination server, you can use the following syntax to import a schema.

postgres=# IMPORT FOREIGN SCHEMA "public" FROM SERVER hr INTO public;

1	postgres=# IMPORT FOREIGN SCHEMA "public" FROM SERVER hr INTO public;

If you wish to choose a certain list of tables for import, you can use the following syntax.

postgres=# IMPORT FOREIGN SCHEMA "public" limit to (employee) FROM SERVER hr INTO public;

1	postgres=# IMPORT FOREIGN SCHEMA "public" limit to (employee) FROM SERVER hr INTO public;

In the above example, it will import the definition of only one table (employee).

Advantages of foreign tables

The main use case of the foreign tables is to make the data available to systems without actually duplicating/replicating it. There are even simple implementations of sharding using FDW because data in the other shards can be made available for queries though FDWs.

A person coming from an Oracle-like background might think: “I can get data from a remote database table using simple DBLinks so what is the difference?“. The main difference is that FDW will maintain the meta-data/table definition about the foreign table locally. This results in better decisions compared to sending a simple SELECT * FROM <TABLE> to pull all results. We are going to see some of these advantages.

Note: In the following section always pay special attention on those lines starting with “Remote SQL:”

Query optimization

Since the definition of the foreign table is held locally, all query optimizations are made for remote executions too. Let’s consider a slightly more complex example where we have EMP (employee) and DEPT (department) tables in the HR database and SALGRADE (salary grade) table in the finance database. Suppose we want to know how many employees there are with a particular salary grade:

SELECT COUNT(*)
FROM EMP
JOIN SALGRADE ON EMP.SAL > SALGRADE.LOSAL AND EMP.SAL < SALGRADE.HISAL
WHERE SALGRADE.GRADE = 4;

SELECT COUNT(*)

FROM EMP

JOIN SALGRADE ON EMP.SAL > SALGRADE.LOSAL AND EMP.SAL < SALGRADE.HISAL

WHERE SALGRADE.GRADE = 4;

Let’s see how PostgreSQL handles this:

                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=805.44..805.45 rows=1 width=8)
   Output: count(*)
   ->  Nested Loop  (cost=100.00..798.33 rows=2844 width=0)
         Join Filter: ((emp.sal > (salgrade.losal)::double precision) AND (emp.sal < (salgrade.hisal)::double precision)) ->  Foreign Scan on public.emp  (cost=100.00..186.80 rows=2560 width=8)
               Output: emp.empno, emp.ename, emp.job, emp.mgr, emp.hiredate, emp.sal, emp.comm, emp.deptno
               <b>Remote SQL: SELECT sal FROM public.emp</b>
         ->  Materialize  (cost=0.00..35.55 rows=10 width=8)
               Output: salgrade.losal, salgrade.hisal
               ->  Seq Scan on public.salgrade  (cost=0.00..35.50 rows=10 width=8)
                     Output: salgrade.losal, salgrade.hisal
                     Filter: (salgrade.grade = 4)

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------

Aggregate (cost=805.44..805.45 rows=1 width=8)

Output: count(*)

-> Nested Loop (cost=100.00..798.33 rows=2844 width=0)

Join Filter: ((emp.sal > (salgrade.losal)::double precision) AND (emp.sal < (salgrade.hisal)::double precision)) -> Foreign Scan on public.emp (cost=100.00..186.80 rows=2560 width=8)

Output: emp.empno, emp.ename, emp.job, emp.mgr, emp.hiredate, emp.sal, emp.comm, emp.deptno

<b>Remote SQL: SELECT sal FROM public.emp</b>

-> Materialize (cost=0.00..35.55 rows=10 width=8)

Output: salgrade.losal, salgrade.hisal

-> Seq Scan on public.salgrade (cost=0.00..35.50 rows=10 width=8)

Output: salgrade.losal, salgrade.hisal

Filter: (salgrade.grade = 4)

Please pay special attention to the line reading :

Remote SQL: SELECT sal FROM public.emp

1	Remote SQL: SELECT sal FROM public.emp

It knows that only the sal column need to be fetched from the remote database.
If we change the count(*) to ename (Employee Name) column, the remote SQL changes like:

Remote SQL: SELECT ename, sal FROM public.emp

1	Remote SQL: SELECT ename, sal FROM public.emp

PostgreSQL tries to pull only the absolutely necessary data from the remote server.

Writable foreign tables

In the beginning, foreign tables were just readable. But, with time, the community introduced writable foreign tables functionality in PostgreSQL. Let us consider the following situation where management wants to give a salary increase of 10% to grade 3 employees:

UPDATE emp
SET    sal = sal * 1.1
FROM   salgrade
WHERE  emp.sal > salgrade.losal
AND emp.sal < salgrade.hisal
AND salgrade.grade = 3;

UPDATE emp

SET sal = sal * 1.1

FROM salgrade

WHERE emp.sal > salgrade.losal

AND emp.sal < salgrade.hisal

AND salgrade.grade = 3;

In this case, we are updating data on a remote table using a join condition with a local table. As we can see in the explain plan, an UPDATE statement is more complex because it involves 2 steps. First, it needs to fetch the data from the remote table to complete the join operation. Then, it updates the rows in the foreign table.

                                                                        QUERY PLAN                                                                        
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on public.emp  (cost=100.00..300.71 rows=669 width=118)
   Remote SQL: UPDATE public.emp SET sal = $2 WHERE ctid = $1
   ->  Nested Loop  (cost=100.00..300.71 rows=669 width=118)
         Output: emp.empno, emp.ename, emp.job, emp.mgr, emp.hiredate, (emp.sal * '1.1'::double precision), emp.comm, emp.deptno, emp.ctid, salgrade.ctid
         Join Filter: ((emp.sal > (salgrade.losal)::double precision) AND (emp.sal < (salgrade.hisal)::double precision)) ->  Foreign Scan on public.emp  (cost=100.00..128.06 rows=602 width=112)
               Output: emp.empno, emp.ename, emp.job, emp.mgr, emp.hiredate, emp.sal, emp.comm, emp.deptno, emp.ctid
               Remote SQL: SELECT empno, ename, job, mgr, hiredate, sal, comm, deptno, ctid FROM public.emp FOR UPDATE
         ->  Materialize  (cost=0.00..35.55 rows=10 width=14)
               Output: salgrade.ctid, salgrade.losal, salgrade.hisal
               ->  Seq Scan on public.salgrade  (cost=0.00..35.50 rows=10 width=14)
                     Output: salgrade.ctid, salgrade.losal, salgrade.hisal
                     Filter: (salgrade.grade = 3)

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------

Update on public.emp (cost=100.00..300.71 rows=669 width=118)

Remote SQL: UPDATE public.emp SET sal = $2 WHERE ctid = $1

-> Nested Loop (cost=100.00..300.71 rows=669 width=118)

Output: emp.empno, emp.ename, emp.job, emp.mgr, emp.hiredate, (emp.sal * '1.1'::double precision), emp.comm, emp.deptno, emp.ctid, salgrade.ctid

Join Filter: ((emp.sal > (salgrade.losal)::double precision) AND (emp.sal < (salgrade.hisal)::double precision)) -> Foreign Scan on public.emp (cost=100.00..128.06 rows=602 width=112)

Output: emp.empno, emp.ename, emp.job, emp.mgr, emp.hiredate, emp.sal, emp.comm, emp.deptno, emp.ctid

Remote SQL: SELECT empno, ename, job, mgr, hiredate, sal, comm, deptno, ctid FROM public.emp FOR UPDATE

-> Materialize (cost=0.00..35.55 rows=10 width=14)

Output: salgrade.ctid, salgrade.losal, salgrade.hisal

-> Seq Scan on public.salgrade (cost=0.00..35.50 rows=10 width=14)

Output: salgrade.ctid, salgrade.losal, salgrade.hisal

Filter: (salgrade.grade = 3)

Operator and function pushdown

PostgreSQL 9.5 release included the capability to assess and decide on the safety of pushing a function execution to remote server. Built-in functions are good candidates for this:

SELECT avg(sal)
FROM EMP
WHERE EMP.SAL > (SELECT LOSAL FROM SALGRADE WHERE GRADE = 4);

SELECT avg(sal)

FROM EMP

WHERE EMP.SAL > (SELECT LOSAL FROM SALGRADE WHERE GRADE = 4);

This statement results in the following query plan

                                QUERY PLAN                                 
---------------------------------------------------------------------------
 Foreign Scan  (cost=137.63..186.06 rows=1 width=8)
   Output: (avg(emp.sal))
   Relations: Aggregate on (public.emp)
   Remote SQL: SELECT avg(sal) FROM public.emp WHERE ((sal > $1::integer))
   InitPlan 1 (returns $0)
     ->  Seq Scan on public.salgrade  (cost=0.00..35.50 rows=10 width=4)
           Output: salgrade.losal
           Filter: (salgrade.grade = 4)

QUERY PLAN

---------------------------------------------------------------------------

Foreign Scan (cost=137.63..186.06 rows=1 width=8)

Output: (avg(emp.sal))

Relations: Aggregate on (public.emp)

Remote SQL: SELECT avg(sal) FROM public.emp WHERE ((sal > $1::integer))

InitPlan 1 (returns $0)

-> Seq Scan on public.salgrade (cost=0.00..35.50 rows=10 width=4)

Output: salgrade.losal

Filter: (salgrade.grade = 4)

If the planner finds that the majority of records needs to be fetched from a remote server, it may not push the function execution to the remote server. For example:

SELECT avg(sal)
FROM EMP
JOIN SALGRADE ON EMP.SAL > SALGRADE.LOSAL AND EMP.SAL < SALGRADE.HISAL
WHERE SALGRADE.GRADE = 4;

SELECT avg(sal)

FROM EMP

JOIN SALGRADE ON EMP.SAL > SALGRADE.LOSAL AND EMP.SAL < SALGRADE.HISAL

WHERE SALGRADE.GRADE = 4;

In this case, the planner decides to do the function execution on the local server:

                                                        QUERY PLAN                                                        
--------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=805.44..805.45 rows=1 width=8)
   Output: avg(emp.sal)
   ->  Nested Loop  (cost=100.00..798.33 rows=2844 width=8)
         Output: emp.sal
         Join Filter: ((emp.sal > (salgrade.losal)::double precision) AND (emp.sal < (salgrade.hisal)::double precision)) ->  Foreign Scan on public.emp  (cost=100.00..186.80 rows=2560 width=8)
               Output: emp.empno, emp.ename, emp.job, emp.mgr, emp.hiredate, emp.sal, emp.comm, emp.deptno
               Remote SQL: SELECT sal FROM public.emp
         ->  Materialize  (cost=0.00..35.55 rows=10 width=8)
               Output: salgrade.losal, salgrade.hisal
               ->  Seq Scan on public.salgrade  (cost=0.00..35.50 rows=10 width=8)
                     Output: salgrade.losal, salgrade.hisal
                     Filter: (salgrade.grade = 4)
(13 rows)

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------

Aggregate (cost=805.44..805.45 rows=1 width=8)

Output: avg(emp.sal)

-> Nested Loop (cost=100.00..798.33 rows=2844 width=8)

Output: emp.sal

Join Filter: ((emp.sal > (salgrade.losal)::double precision) AND (emp.sal < (salgrade.hisal)::double precision)) -> Foreign Scan on public.emp (cost=100.00..186.80 rows=2560 width=8)

Output: emp.empno, emp.ename, emp.job, emp.mgr, emp.hiredate, emp.sal, emp.comm, emp.deptno

Remote SQL: SELECT sal FROM public.emp

-> Materialize (cost=0.00..35.55 rows=10 width=8)

Output: salgrade.losal, salgrade.hisal

-> Seq Scan on public.salgrade (cost=0.00..35.50 rows=10 width=8)

Output: salgrade.losal, salgrade.hisal

Filter: (salgrade.grade = 4)

(13 rows)

A great improvement in PostgreSQL 9.6 is that the function doesn’t need to be even a built-in function. If a user-defined function or operator is immutable it becomes a good candidate for being executed in the remote server.

Join push down

In many cases, it is worth pushing down the entire join operations to the remote server in such a way only the results need to be fetched to the local server. PostgreSQL handles this switching intelligently. Here’s an example:

postgres=# EXPLAIN VERBOSE SELECT COUNT(*)
FROM EMP JOIN  DEPT ON EMP.deptno = DEPT.deptno AND DEPT.deptno=10;

1 2	postgres=# EXPLAIN VERBOSE SELECT COUNT(*) FROM EMP JOIN DEPT ON EMP.deptno = DEPT.deptno AND DEPT.deptno=10;

                                                          QUERY PLAN                                                          
-----------------------------------------------------------------------------------------------------------------------------
 Foreign Scan  (cost=100.56..194.84 rows=1 width=8)
   Output: (count(*))
   Relations: Aggregate on ((public.emp) INNER JOIN (public.dept))
   Remote SQL: SELECT count(*) FROM (public.emp r1 INNER JOIN public.dept r2 ON (((r2.deptno = 10)) AND ((r1.deptno = 10))))
(4 rows)

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------

Foreign Scan (cost=100.56..194.84 rows=1 width=8)

Output: (count(*))

Relations: Aggregate on ((public.emp) INNER JOIN (public.dept))

Remote SQL: SELECT count(*) FROM (public.emp r1 INNER JOIN public.dept r2 ON (((r2.deptno = 10)) AND ((r1.deptno = 10))))

(4 rows)

Predicate push down

There are two options when executing a query against a foreign table:

Fetch the data locally and apply the predicates like filtering condition locally.
Send the filtering condition to the remote server and have it applied there.

The latter will can be the best option in many cases.

If you consider the previous example, we can see that the predicate specification like “DEPT.deptno=10;” is pushed down to the remote server through foreign tables and applied there separately like this:

Remote SQL: SELECT count(*) FROM (public.emp r1 INNER JOIN public.dept r2 ON (((r2.deptno = 10)) AND ((r1.deptno = 10))))

PostgreSQL not only pushed the predicate, but it also rewrote the query we sent to avoid one extra AND condition.

Aggregate push down

Just like predicate push down, here PostgreSQL also considers 2 options:

Execute the aggregates on the remote server and pull the result back to the local server
Do the aggregate calculations on the local database instance after collecting all required data from remote database

We’ve already seen an aggregate pushdown example as part of the function pushdown since we’ve used an aggregate function for that example. Here’s another simple example:

postgres=# explain verbose select deptno,count(*) from emp group by deptno;
                            QUERY PLAN                            
------------------------------------------------------------------
 Foreign Scan  (cost=114.62..159.88 rows=200 width=12)
   Output: deptno, (count(*))
   Relations: Aggregate on (public.emp)
   Remote SQL: SELECT deptno, count(*) FROM public.emp GROUP BY 1
(4 rows)

postgres=# explain verbose select deptno,count(*) from emp group by deptno;

QUERY PLAN

------------------------------------------------------------------

Foreign Scan (cost=114.62..159.88 rows=200 width=12)

Output: deptno, (count(*))

Relations: Aggregate on (public.emp)

Remote SQL: SELECT deptno, count(*) FROM public.emp GROUP BY 1

(4 rows)

In this case, all of the aggregate calculation happens on the remote server.

Triggers and Check constraints on Foreign tables

We have seen that foreign tables can be writable. PostgreSQL provides features to implement check constraints and triggers on the foreign table as well. This allows us to have powerful capabilities in the local database. For example, all validations and auditing can take place on the local server. The remote DMLs can be audited separately, or a different logic can be applied for local and remote triggers and constraint validations.

Conclusion

FDWs in PostgreSQL, postgres_fdw in particular, provides very powerful and useful features by which, in many cases, we can avoid the complex duplicating and replicating of data. It provides a mechanism for ACID compliant transactions between two database systems. postgres_fdw works as a reference implementation for the development of other fdw implementations. In the coming days we will be covering some of these.

More articles you might enjoy:

If you found this article useful, why not take a look at some of our other posts on PostgreSQL?

For example, tuning PostgreSQL for sysbench-tpcc continues Percona’s long tradition of performance investigation and benchmarking. The blog covers how to setup PostgreSQL to perform optimally for sysbench-tpcc. The post also highlights the settings our authors tuned the most to accomplish optimal performance.

Fsync call is essential to the durability of a database. Principal Architect, Yves Trudeau, evaluates fsync performance across different storage devices to see how it affects latency and MySQL performance.

0 0 votes

Article Rating

26 Comments

Oldest

Newest Most Voted

Vao Tsun

7 years ago

hi. shouldn’t you first create server and THEN grant usage on it?..

Author

Jobin Augustine

7 years ago

Reply to Vao Tsun

Yes, if superuser is creating server definition, super user should create it and grant privilege to app user.
Alternatively, application user can create server definition. In that case, the superuser can grant privilege to use FDW (postgres_fdw) first, then app user can create server definition in app user account.

Michael Vitale

7 years ago

What I would like to see is an example of aggregating data in parallel across multiple remote servers. Is that even possible now?

Author

Jobin Augustine

7 years ago

Reply to Michael Vitale

Parallel execution on remote servers is announced for PostgreSQL 11. currently it is in beta.

ladeburger

7 years ago

Hi,

Interesting article, especially as there are not so many fdw-related articles around. Would you mind to take a look at https://github.com/EnterpriseDB/mysql_fdw/issues/166 ? Maybe i make somthing fundamentally wrong.

Cheers, Jan

Author

Jobin Augustine

7 years ago

Reply to ladeburger

We are happy to look into special cases and bug reports are appreciated and welcomed.
May be a complete, but simple steps to reproduce problem can help development.

Raja Kt

7 years ago

Any benchmark with fdw implementation with Mariadb ( Spider+innodb) or (Spider+MyRocks)?

Vijay

7 years ago

Hey Jobin,

Awesome writeup.
I use a setup to simulate sharded architecture as following
server a
server b
server c
server d

each shard has FDW mapping talking to each other server.
i.e
server a -> (via fdw) -> can query (server b, server c, server d)
and so on

this worked great for our usecase where we had data in one shard server a has no direct relation to server b and so on.
now for queries we has a view which does a query across all the shards
create view from_all_shards as
select * from remote_server_a.tablea
union all
select * from remote_server_b.tablea
union all
select * from remote_server_c.tablea
union all
select * from remote_server_d.tablea

so even though the queries seem to be making isolated query to shards, i am not able to have parallel queries (even via parallel setup in pg10)
i mean if all query take 10s, if takes 40 s to return.

in simple table setups parallel workers come into play, but in FDW i did not see the case?
Am i doing things correctly?

Author

Jobin Augustine

7 years ago

Reply to Vijay

Hi Vijay,
Great to see your experiment in setting up a shared cluster using FDW.
Parallel scanning of all shards was a known limitation till PostgreSQL 10 and it is addressed in PostgreSQL 11. You may try the same with PostgreSQL 11.

Vijay

7 years ago

Reply to Jobin Augustine

FYI, the pg11 setup too does not parallelize the union all queries via FDW.
postgres_fdw operates through declared cursors, and declared cursors inhibit parallel query. This doesn’t change in v11, see https://www.postgresql.org/docs/11/when-can-parallel-query-be-used.html

https://postgrespro.com/list/id/CAE7uO5hVKdUGJHsBbV8KKXfEWJQ4SCynRMbgG3vgwjQh+HTz7A@mail.gmail.com

Author

Jobin Augustine

7 years ago

Reply to Vijay

Hi Raja,
FDW acts as a client for a remote server. So the performance mainly depends on how fast remote server can answer the queries from the PostgreSQL side. So the benchmarking of FDW layer alone may not be relevant because the overhead in that layer is very minimal.

Nam Le

7 years ago

Thank you for your greate article. I have a question:
Does the query performance decrease when we map one or more tables from one database to many other databases?
Ex:
Database A has some table:
– Employee
– Department
– Organization
– …
And I want to mapping some tables (Emp, Dept, Org) to database B, C, D, and more.
Is it decrease query performance? or have any trouble?

Author

Jobin Augustine

7 years ago

Reply to Nam Le

Since everything need to go though the fdw interface, A slight performance hit related to CPU is expected. However I don’t have numbers now. Thank you for giving me the need to have a benchmark.

Scott

7 years ago

Jobin, great article. I currently have a postgres RDS database in AWS. I have imported schemas and foreign tables from AWS Redshift. The RDS instance contains daily transnational data the is moved to Redshift monthly, but the reporting needs require the data from RDS and the data from Redshift to be merged. The schemas in RDS match the foreign schema from Redshift. How would you suggest queries be handled such that all joins both locally and foreign are pushed down to their respective database engine and the results then merged and returned? The number of joins and filters are completely dynamic in nature.

Author

Jobin Augustine

6 years ago

Reply to Scott

If I understand correctly, A query with Union can be best suitable for you.

Naushad Ahmad

6 years ago

Hi Sir
Great article !!!

But I am stuck at a point when the app database changes in source then how is it going to be updated in the destination schema. For example: if we add a table, or drop a column—our foreign tables will need updating too. And in case it is happening frequently then how can I trigger the change in my destination as well. Kindly explain how can I achieve this.

Author

Jobin Augustine

6 years ago

Reply to Naushad Ahmad

Both are independent databases. So databases won’t do it automatically. It is Architects and DBAs responsibility to ensure that all database schema are in sync in the appropriate change window. You may use external scripting to assist you.

inteligencia360

6 years ago

Great, Perfect, Thank so much

Manoj Jain

6 years ago

Thanks Jobin for such a nice article.

I am facing issue while creating DB link between two databases dev_abc_db (source db) and dev_xyz_db (destination db) present on the same machine. And the owner of both database is same user (dev_user).
Here is what I am doing.

1. CREATE SERVER IF NOT EXISTS resdb FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host ‘dev.abc.com’, dbname ‘dev_abc_db’);
2. CREATE USER MAPPING FOR dev_user SERVER resdb OPTIONS (user ‘dev_user’, password ‘abc’);
3. GRANT USAGE ON FOREIGN SERVER resdb TO dev_user;

I have created a view in my destination database which uses resdb (created in point 1) to fetch the data from source database. While executing view I am getting below error.

SQL Error [2F003]: ERROR: password is required
Detail: Non-superuser cannot connect if the server does not request a password.
Hint: Target server’s authentication method must be changed.

Is it must to have super user role to use DB link ?

Author

Jobin Augustine

6 years ago

Reply to Manoj Jain

Hi Manoj,
FDW is different and dblink is different. Hope you are discussing about FDW.
use regular psql utility in the sever machine and try to connect to foreign database with using the same credentials as you specified in the FDW. if that works, database sever on the same server should be able to connect.

Muhammad ali

6 years ago

Hello brother.
Can you guide me about oracle_fdw?
let consider i fetch a table “abc” which has 50 rows from oracle into Postgresql using oracle_fdw.
Now i insert into foreign table, it works smoothly and data is synced with oracle.
But when i insert a row in “abc” table from oracle, let say now there are 51 rows in oracle “abc” table,
It doesnt reflects in same postgresql foreign table i.e: the postgres foreign table still shows 50 rows.

Please correct me If I’m wrong or tell me if the above statements are true.
Thanks.

Rajni Baliyan

6 years ago

Reply to Muhammad ali

After inserting record in Oracle , execute commit command and select data in PG.
SQL> insert ;
SQL> commit;

Author

Jobin Augustine

6 years ago

Hi Muhammad,
Every record in the remote database will be available though FDW. oracle_fdw is widely used for huge migrations.
if you have any reproducible problem like you mentioned, appreciate filing a bug with oracle_fdw project : https://github.com/laurenz/oracle_fdw/issues

Rohit Tawde

6 years ago

Great article Jobin,

I am getting below error while importing foreign schema.

SQLSTATE[08001]: could not translate host name to address: Name or service not known.

I already tested below things :
– I can able to connect to both local and foreign db with psql command
– Can able to connect to my DB instance using ‘nc -v hosturl’

What else could be possible reasons ?

Florian

6 years ago

Hello Jobin,

thank you for this great article of yours! I already learned a lot from it but have a question.

My database I am trying to connect to has a SSL certificate. What do I have to chnage in the code to connect to the Server? Is it Step 8 I have to alter and if yes, what do I need to do?

Thank you for any help!

Hav a great week,
Florian