PostgreSQL Foreign Data Wrappers (hereinafter referred to as FDW) is a very useful feature in real database usage scenarios. PostgreSQL’s FDW is similar to Oracle’s dblink and DB2’s Federation, which allows you to establish connections between local and external databases so that you can manipulate external data as if it were local data.

FDW What’s the point

  • Data sharding uses FDW to distribute data across multiple databases to achieve data sharding (e.g. pg_shardman plug-in, which uses postgres_fdw and pg_pathman plug-ins to achieve data sharding).
  • Data synchronization uses FDW to establish a connection between local and external databases to synchronize external data to local at regular intervals.
  • Data Migration Use FDW to establish a connection between local database and external database to perform data migration.
  • ETL (Extract-Transform-Load) uses FDW to extract data from different types of databases into one data warehouse for easy and uniform access.

PostgreSQL FDW Development Overview

In 2003, SQL/MED (SQL Management of External Data) was added to the SQL standard, which provides specifications for external data management. In 2011, PostgreSQL 9.1 was released to support external data reads, and in 2013, PostgreSQL 9.3 was released to support external data writes.

Currently, PostgreSQL (PostgreSQL 14 at the time of writing) provides several extensions to support operations on various types of external databases or files (e.g., postgres_fdw for connecting to external PostgreSQL databases, oracle_fdw for connecting to external Oracle databases, mysql _fdw supports connecting to external MySQL databases, jdbc_fdw supports connecting to external relational databases using the JDBC protocol, file_fdw supports connecting to external files of a specific format, etc.).

PostgreSQL FDW Development Overview

This article focuses only on postgres_fdw, how the PostgreSQL database connects to the external PostgreSQL database and how it manages the external data.

Using postgres_fdw

To use postgres_fdw to access a remote database, there are several steps.

  • Install the postgres_fdw extension
  • Create an external server
  • Create a user map
  • Create an external table or import an external schema

This article uses a local PostgreSQL database to emulate both the remote and local databases. Before starting the formal steps, you need to do a little bit of preparation in advance.

  • Check the PostgreSQL version

    1
    2
    
    $ psql --version
    psql (PostgreSQL) 14.2
    
  • Creating a user in a remote PostgreSQL database

    Use superuser to execute the following statement in the remote PostgreSQL database to create a normal user fdw_user to be used later when the local database establishes an FDW connection.

    1
    
    CREATE USER fdw_user WITH ENCRYPTED PASSWORD 'secret';
    
  • Create a table in a remote PostgreSQL database

    Create the weather table weather for testing in the remote database, insert the test data, and grant the user fdw_user permission to add, delete, and check against the table.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    
    CREATE TABLE weather (
        city        varchar(80), -- city name (城市名)
        temp_low    int,  -- low temperature (最低温度)
        temp_high   int,  -- high temperature (最高温度)
        prcp        real, -- precipitation (降水量)
        date        date  -- date (日期)
    );
    
    INSERT INTO weather (city, temp_low, temp_high, prcp, date)
        VALUES ('Beijing', 18, 32, 0.25, '2021-05-19'),
            ('Beijing', 20, 30, 0.0, '2021-05-20'),
            ('Dalian', 16, 24, 0.0, '2021-05-21');
    
    1
    
    GRANT SELECT,INSERT,UPDATE,DELETE ON TABLE weather TO fdw_user;
    

    Connect locally with user fdw_user to the remote database (this article is special in that it uses both local and remote databases, so the remote host is also localhost) and verify the authorized privileges.

    1
    2
    3
    4
    5
    6
    7
    8
    9
    
    $ psql -h localhost -U fdw_user postgres
    
    postgres=> SELECT * FROM weather;
    city   | temp_low | temp_high | prcp |    date
    ---------+----------+-----------+------+------------
    Beijing |       18 |        32 | 0.25 | 2021-05-19
    Beijing |       20 |        30 |    0 | 2021-05-20
    Dalian  |       16 |        24 |    0 | 2021-05-21
    (3 rows)
    

    Note: For a real remote database, you need to add a record to the pg_hba.conf configuration file of the remote database to firewall the access IP in order to establish a connection locally.

  • Creating a user in the local PostgreSQL database

    Use superuser to create the normal user local_user in the local PostgreSQL database by executing the following statement.

    1
    
    CREATE USER local_user WITH ENCRYPTED PASSWORD 'secret';
    

All the preparations are done, now you can start the formal steps in the local database using superuser.

Installing the postgres_fdw extension

Use the CREATE EXTENSION statement to install the postgres_fdw extension.

1
CREATE EXTENSION postgres_fdw;

Grant access to postgres_fdw for user local_user.

1
GRANT USAGE ON FOREIGN DATA WRAPPER postgres_fdw TO local_user;

Creating an external server

To create an external server using the CREATE SERVER statement, specify the host, port and database name of the remote database.

1
2
3
CREATE SERVER foreign_server
        FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'localhost', port '5432', dbname 'postgres');

Grant access to the external server foreign_server for the user local_user.

1
GRANT USAGE ON FOREIGN SERVER foreign_server TO local_user;

Creating a User Mapping

Use the CREATE USER MAPPING statement to create a mapping of remote users to local users, you need to provide the user name and password of the remote user.

1
2
3
CREATE USER MAPPING FOR local_user
        SERVER foreign_server
        OPTIONS (user 'fdw_user', password 'secret');

Creating an External Table or Importing an External Schema

Use the CREATE FOREIGN TABLE statement to create a remote table. Note that the column types need to match the actual remote table and the column names should ideally be the same, otherwise you will need to specify the column names in the remote table separately for each column using the column_name parameter.

1
2
3
4
5
6
7
8
9
CREATE FOREIGN TABLE foreign_weather (
      city        varchar(80),
      temp_low    int,
      temp_high   int,
      prcp        real,
      date        date
  )
        SERVER foreign_server
        OPTIONS (schema_name 'public', table_name 'weather');

In most cases, you can just use the IMPORT FOREIGN SCHEMA statement to directly import all the tables in the external schema into the local schema.

Note: Since no user mapping is specified for super_user, the following statement needs to be executed with user local_user, otherwise it will report ERROR: user mapping not found for "super_user" error.

1
2
3
4
5
-- 导入外部模式下的所有表
IMPORT FOREIGN SCHEMA public FROM SERVER foreign_server INTO public;

-- 导入外部模式下的指定表
IMPORT FOREIGN SCHEMA public LIMIT TO (weather) FROM SERVER foreign_server INTO public;

Authorize CRUD permissions for all tables (including external tables) in public mode for local_user.

1
GRANT SELECT,INSERT,UPDATE,DELETE ON ALL TABLES IN SCHEMA public TO local_user;

This allows you to use the user local_user to connect to the local database and operate on external tables.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
$ psql -U local_user postgres

postgres=> SELECT * FROM foreign_weather;
  city   | temp_low | temp_high | prcp |    date
---------+----------+-----------+------+------------
 Beijing |       18 |        32 | 0.25 | 2021-05-19
 Beijing |       20 |        30 |    0 | 2021-05-20
 Dalian  |       16 |        24 |    0 | 2021-05-21
(3 rows)

postgres=> UPDATE foreign_weather SET prcp=0 WHERE city='Beijing' AND date='2021-05-19';
UPDATE 1

At this point, we have basically mastered the use of postgres_fdw. Next, we will look at the system tables and functions related to FDW, and finally look at FDW transaction management and performance optimization to get a deeper understanding of FDW.

Some important parameters when setting up postgres_fdw

  • updatable

    This option sets whether the external table can be updated, i.e. whether postgres_fdw allows external tables to be modified using the INSERT, UPDATE and DELETE commands, the default is true. It can be specified on an external table or on an external server, and the one specified on the table will override the one specified on the server. The specific statements to set or update this parameter are as follows.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    
    -- 创建外部服务器时指定
    CREATE SERVER foreign_server FOREIGN DATA WRAPPER postgres_fdw OPTIONS (..., updatable 'false', ...);
    
    -- 创建外部表时指定
    CREATE FOREIGN TABLE foreign_weather (
        ...
    )
        SERVER foreign_server
        OPTIONS (schema_name ..., table_name ..., updatable 'false', ...);
    
    -- 更新外部服务器参数选项
    ALTER SERVER foreign_server OPTIONS (updatable 'false');
    
    -- 更新外部表参数选项
    ALTER FOREIGN TABLE foreign_weather OPTIONS(updatable 'false');
    

    Of course, if the remote table is not actually updatable, then an error will occur anyway. Using this option mainly allows to throw errors locally without having to query the remote server.

  • truncatable

    This option sets whether external tables can be truncated, i.e. whether postgres_fdw allows external tables to be truncated using the TRUNCATE command, the default is true. This parameter can also be specified on the external table or on the external server, and specifying it on the table will override specifying it on the server.

    The specific statements to set or update this parameter are exactly the same as updatable above.

    Of course, if the remote table is not actually truncable, then an error will occur anyway. Again, the main point of using this option is that it allows errors to be thrown locally without having to query the remote server.

  • keep_connections

    This option sets whether postgres_fdw will keep connections to the remote server in the local session for reuse, the default is on (if set to off, all connections to the external server will be dropped at the end of each transaction). It can only be specified on external servers. The specific statements to set or update this parameter are as follows.

    1
    2
    3
    4
    5
    
    -- 创建外部服务器时指定
    CREATE SERVER foreign_server FOREIGN DATA WRAPPER postgres_fdw OPTIONS (..., keep_connections 'off', ...);
    
    -- 更新外部服务器参数选项
    ALTER SERVER foreign_server OPTIONS (keep_connections 'off');
    

system tables

FDW-related system tables are as follows (for the _pg_* table, only super_user has access to it)

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
information_schema._pg_foreign_data_wrappers
information_schema._pg_foreign_servers
information_schema._pg_foreign_tables
information_schema._pg_foreign_table_columns
information_schema._pg_user_mappings

information_schema.foreign_data_wrappers
information_schema.foreign_data_wrapper_options
information_schema.foreign_server_options
information_schema.foreign_servers
information_schema.foreign_tables
information_schema.foreign_table_options

Functions

  • postgres_fdw_get_connections()

    Calling this function returns the external server name of all open connections established by postgres_fdw from the local session to the external server and whether the connection is valid.

    Note: This function gets the current connection status of the local session to the external server, not the connection status of the local database to the external server. Therefore, remote table queries made by opening a separate Shell Tab will not be recorded by the current local session.

    This article uses the default option (on) for the keep_connections parameter when creating the external server, so the connection is kept.

    As you can see below psql connects to the local database and after doing an external table query, the query postgres_fdw_get_connections() function will return a row.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    
    $ psql -U local_user postgres
    
    postgres=> SELECT * FROM foreign_weather;
    ...
    
    postgres=> SELECT * FROM postgres_fdw_get_connections();
    server_name   | valid
    ----------------+-------
    foreign_server | t
    (1 row)
    
  • postgres_fdw_disconnect(server_name text)

    Disconnects all connections from postgres_fdw from the local session (local session) to the given external server, based on the name passed in.

    There can be multiple connections to the given server using different user mappings (when multiple users are used to access the external server, multiple user mappings are configured and postgres_fdw will create a connection for each user mapping). If the connection is being used in the current local transaction, it will not be disconnected and a warning message will be output. Returns true if at least one connection is disconnected, false otherwise, and an error is reported if no external server with the given name is found (ERROR: server "..." does not exist).

    Following the session, execute SELECT postgres_fdw_disconnect('foreign_server') and return true; query the postgres_fdw_get_connections() function again to find no more connections.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    
    postgres=> SELECT postgres_fdw_disconnect('foreign_server');
    postgres_fdw_disconnect
    -------------------------
    t
    (1 row)
    
    postgres=> SELECT * FROM postgres_fdw_get_connections();
    server_name | valid
    -------------+-------
    (0 rows)
    
  • postgres_fdw_disconnect_all()

    Disconnects postgres_fdw from the local session to the external server. Used in a similar way to postgres_fdw_disconnect(server_name text) and will not be repeated here.

FDW Transaction Management and Performance Optimization

Transaction Management

When querying a remote table, postgres_fdw opens a new transaction on the remote server if the transaction corresponding to the current local transaction is not already open. When the local transaction is committed or aborted, the remote transaction is also committed or aborted. Savepoints are also managed by creating the corresponding remote savepoints.

When the local transaction has a SERIALIZABLE isolation level, the remote transaction also uses that isolation level; otherwise, the REPEATABLE READ isolation level is used.

If a query performs multiple table scans on a remote server, this option ensures that it will get snapshot-consistent results for all scans. The result is that successive queries in a single transaction will see the same data from the remote server, even if other activities perform concurrent updates on the remote server. This behavior is expected if the local transaction uses the SERIALIZABLE or REPEATABLE READ isolation levels, but it may be surprising for local transactions with the READ COMMITTED isolation level. Future versions of PostgreSQL may modify these rules.

Performance Optimization

postgres_fdw will be more intelligent in determining whether a query statement (the query statements to be tested include SELECT, UPDATE, DELETE statements, statements involving operators, functions, joins, filter conditions and aggregate functions, etc.) should be moved down to the remote server for execution.

Ideally, the tables involved are on the remote server, and the operators, functions, etc. are all built-in types, so that postgres_fdw sends the entire query to the remote server for computation, and then just takes the results. In most cases, postgres_fdw needs to fetch the necessary data locally to perform operations such as joins, filters, and aggregate function processing. That is, postgres_fdw optimizes the queries sent to the remote server (optimizing the WHERE clause and not fetching unneeded columns) to reduce the data transfer from the remote server.

Let’s look at two examples.

  • Pure Remote Table Query

    Original query statement:

    1
    
    SELECT * FROM foreign_weather;
    

    Use EXPLAIN VERBOSE to view the actual query (Remote SQL) sent to the remote server as :

    1
    
    SELECT city, temp_low, temp_high, prcp, date FROM public.weather
    
    1
    2
    3
    4
    5
    6
    7
    8
    9
    
    $ psql -U local_user postgres
    
    postgres=> EXPLAIN VERBOSE SELECT * FROM foreign_weather;
                                        QUERY PLAN
    ----------------------------------------------------------------------------------
    Foreign Scan on public.foreign_weather  (cost=100.00..121.25 rows=375 width=194)
    Output: city, temp_low, temp_high, prcp, date
    Remote SQL: SELECT city, temp_low, temp_high, prcp, date FROM public.weather
    (3 rows)
    
  • Remote table and local table join query

    Create a new local table cities and insert test data.

    1
    2
    3
    4
    5
    6
    7
    8
    
    CREATE TABLE cities (
    name        varchar(80), -- city name (城市名)
    location    point -- point为PostgreSQL特有类型,该字段表示地理坐标(经度, 纬度)
    );
    
    INSERT INTO cities (name, location)
    VALUES ('Beijing', '(116.3, 39.9)'),
            ('Shanghai', '(121.3, 31.1)');
    

    For queries.

    1
    2
    
    SELECT * FROM cities c, foreign_weather w
    WHERE c.name = w.city;
    

    The SQL sent by postgres_fdw to the remote server is

    1
    2
    
    SELECT city, temp_low, temp_high, prcp, date
    FROM public.weather;
    
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    
    $ psql -U local_user postgres
    
    postgres=> EXPLAIN VERBOSE SELECT * FROM cities c, foreign_weather w WHERE c.name = w.city;
                                        QUERY PLAN
    ------------------------------------------------------------------------------------------
    Hash Join  (cost=118.10..163.91 rows=675 width=388)
    Output: c.name, c.location, w.city, w.temp_low, w.temp_high, w.prcp, w.date
    Hash Cond: ((w.city)::text = (c.name)::text)
    ->  Foreign Scan on public.foreign_weather w  (cost=100.00..121.25 rows=375 width=194)
            Output: w.city, w.temp_low, w.temp_high, w.prcp, w.date
            Remote SQL: SELECT city, temp_low, temp_high, prcp, date FROM public.weather
    ->  Hash  (cost=13.60..13.60 rows=360 width=194)
            Output: c.name, c.location
            ->  Seq Scan on public.cities c  (cost=0.00..13.60 rows=360 width=194)
                Output: c.name, c.location
    (10 rows)
    

    That is, postgres_fdw will fetch all the data of foreign_weather locally and join it with the table cities.

    And for the query.

    1
    2
    3
    
    SELECT c.name, max(w.temp_high)
    FROM cities c, foreign_weather w
        WHERE c.name = w.city AND w.temp_high <= 30 GROUP BY c.name;
    

    The SQL sent by postgres_fdw to the remote server is

    1
    2
    3
    
    SELECT city, temp_high
    FROM public.weather
        WHERE (temp_high <= 30)
    
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    
    $ psql -U local_user postgres
    
    postgres=> EXPLAIN VERBOSE SELECT c.name, max(w.temp_high) FROM cities c, foreign_weather w WHERE c.name = w.city AND w.temp_high <= 30 group by c.name;
                                                QUERY PLAN
    ------------------------------------------------------------------------------------------------------
    HashAggregate  (cost=143.17..145.17 rows=200 width=182)
    Output: c.name, max(w.temp_high)
    Group Key: c.name
    ->  Hash Join  (cost=119.25..141.98 rows=238 width=182)
            Output: c.name, w.temp_high
            Hash Cond: ((c.name)::text = (w.city)::text)
            ->  Seq Scan on public.cities c  (cost=0.00..13.60 rows=360 width=178)
                Output: c.name, c.location
            ->  Hash  (cost=117.60..117.60 rows=132 width=182)
                Output: w.temp_high, w.city
                ->  Foreign Scan on public.foreign_weather w  (cost=100.00..117.60 rows=132 width=182)
                        Output: w.temp_high, w.city
                        Remote SQL: SELECT city, temp_high FROM public.weather WHERE ((temp_high <= 30))
    (13 rows)
    

    That is, postgres_fdw optimizes the WHERE conditions sent to the remote server, gets only the data it needs from the remote table foreign_weather, and then computes it locally with the table cities for join, filtering, and aggregate function processing.

In summary, we have a more detailed understanding of the basic concepts of PostgreSQL external data wrappers and how to use postgres_fdw.