PostgreSQL Foreign Data Wrappers (hereinafter referred to as FDW) is a very useful feature in real database usage scenarios. PostgreSQL’s FDW is similar to Oracle’s dblink and DB2’s Federation, which allows you to establish connections between local and external databases so that you can manipulate external data as if it were local data.
FDW What’s the point
- Data sharding uses FDW to distribute data across multiple databases to achieve data sharding (e.g. pg_shardman plug-in, which uses postgres_fdw and pg_pathman plug-ins to achieve data sharding).
- Data synchronization uses FDW to establish a connection between local and external databases to synchronize external data to local at regular intervals.
- Data Migration Use FDW to establish a connection between local database and external database to perform data migration.
- ETL (Extract-Transform-Load) uses FDW to extract data from different types of databases into one data warehouse for easy and uniform access.
PostgreSQL FDW Development Overview
In 2003, SQL/MED (SQL Management of External Data) was added to the SQL standard, which provides specifications for external data management. In 2011, PostgreSQL 9.1 was released to support external data reads, and in 2013, PostgreSQL 9.3 was released to support external data writes.
Currently, PostgreSQL (PostgreSQL 14 at the time of writing) provides several extensions to support operations on various types of external databases or files (e.g., postgres_fdw for connecting to external PostgreSQL databases, oracle_fdw for connecting to external Oracle databases, mysql _fdw supports connecting to external MySQL databases, jdbc_fdw supports connecting to external relational databases using the JDBC protocol, file_fdw supports connecting to external files of a specific format, etc.).
This article focuses only on postgres_fdw, how the PostgreSQL database connects to the external PostgreSQL database and how it manages the external data.
To use postgres_fdw to access a remote database, there are several steps.
- Install the postgres_fdw extension
- Create an external server
- Create a user map
- Create an external table or import an external schema
This article uses a local PostgreSQL database to emulate both the remote and local databases. Before starting the formal steps, you need to do a little bit of preparation in advance.
Check the PostgreSQL version
Creating a user in a remote PostgreSQL database
Use superuser to execute the following statement in the remote PostgreSQL database to create a normal user
fdw_userto be used later when the local database establishes an FDW connection.
CREATE USER fdw_user WITH ENCRYPTED PASSWORD 'secret';
Create a table in a remote PostgreSQL database
Create the weather table
weatherfor testing in the remote database, insert the test data, and grant the user
fdw_userpermission to add, delete, and check against the table.
CREATE TABLE weather ( city varchar(80), -- city name (城市名) temp_low int, -- low temperature (最低温度) temp_high int, -- high temperature (最高温度) prcp real, -- precipitation (降水量) date date -- date (日期) ); INSERT INTO weather (city, temp_low, temp_high, prcp, date) VALUES ('Beijing', 18, 32, 0.25, '2021-05-19'), ('Beijing', 20, 30, 0.0, '2021-05-20'), ('Dalian', 16, 24, 0.0, '2021-05-21');
GRANT SELECT,INSERT,UPDATE,DELETE ON TABLE weather TO fdw_user;
Connect locally with user
fdw_userto the remote database (this article is special in that it uses both local and remote databases, so the remote host is also localhost) and verify the authorized privileges.
$ psql -h localhost -U fdw_user postgres postgres=> SELECT * FROM weather; city | temp_low | temp_high | prcp | date ---------+----------+-----------+------+------------ Beijing | 18 | 32 | 0.25 | 2021-05-19 Beijing | 20 | 30 | 0 | 2021-05-20 Dalian | 16 | 24 | 0 | 2021-05-21 (3 rows)
Note: For a real remote database, you need to add a record to the pg_hba.conf configuration file of the remote database to firewall the access IP in order to establish a connection locally.
Creating a user in the local PostgreSQL database
Use superuser to create the normal user
local_userin the local PostgreSQL database by executing the following statement.
CREATE USER local_user WITH ENCRYPTED PASSWORD 'secret';
All the preparations are done, now you can start the formal steps in the local database using superuser.
Installing the postgres_fdw extension
CREATE EXTENSION statement to install the
Grant access to
postgres_fdw for user
Creating an external server
To create an external server using the
CREATE SERVER statement, specify the host, port and database name of the remote database.
Grant access to the external server
foreign_server for the user
Creating a User Mapping
CREATE USER MAPPING statement to create a mapping of remote users to local users, you need to provide the user name and password of the remote user.
Creating an External Table or Importing an External Schema
CREATE FOREIGN TABLE statement to create a remote table. Note that the column types need to match the actual remote table and the column names should ideally be the same, otherwise you will need to specify the column names in the remote table separately for each column using the
In most cases, you can just use the
IMPORT FOREIGN SCHEMA statement to directly import all the tables in the external schema into the local schema.
Note: Since no user mapping is specified for super_user, the following statement needs to be executed with user
local_user, otherwise it will report
ERROR: user mapping not found for "super_user"error.
Authorize CRUD permissions for all tables (including external tables) in public mode for
This allows you to use the user
local_user to connect to the local database and operate on external tables.
At this point, we have basically mastered the use of
postgres_fdw. Next, we will look at the system tables and functions related to FDW, and finally look at FDW transaction management and performance optimization to get a deeper understanding of FDW.
Some important parameters when setting up postgres_fdw
This option sets whether the external table can be updated, i.e. whether postgres_fdw allows external tables to be modified using the
DELETEcommands, the default is
true. It can be specified on an external table or on an external server, and the one specified on the table will override the one specified on the server. The specific statements to set or update this parameter are as follows.
-- 创建外部服务器时指定 CREATE SERVER foreign_server FOREIGN DATA WRAPPER postgres_fdw OPTIONS (..., updatable 'false', ...); -- 创建外部表时指定 CREATE FOREIGN TABLE foreign_weather ( ... ) SERVER foreign_server OPTIONS (schema_name ..., table_name ..., updatable 'false', ...); -- 更新外部服务器参数选项 ALTER SERVER foreign_server OPTIONS (updatable 'false'); -- 更新外部表参数选项 ALTER FOREIGN TABLE foreign_weather OPTIONS(updatable 'false');
Of course, if the remote table is not actually updatable, then an error will occur anyway. Using this option mainly allows to throw errors locally without having to query the remote server.
This option sets whether external tables can be truncated, i.e. whether postgres_fdw allows external tables to be truncated using the
TRUNCATEcommand, the default is
true. This parameter can also be specified on the external table or on the external server, and specifying it on the table will override specifying it on the server.
The specific statements to set or update this parameter are exactly the same as
Of course, if the remote table is not actually truncable, then an error will occur anyway. Again, the main point of using this option is that it allows errors to be thrown locally without having to query the remote server.
This option sets whether postgres_fdw will keep connections to the remote server in the local session for reuse, the default is
on(if set to
off, all connections to the external server will be dropped at the end of each transaction). It can only be specified on external servers. The specific statements to set or update this parameter are as follows.
FDW related system tables and functions
FDW-related system tables are as follows (for the
_pg_* table, only super_user has access to it)
Calling this function returns the external server name of all open connections established by postgres_fdw from the local session to the external server and whether the connection is valid.
Note: This function gets the current connection status of the local session to the external server, not the connection status of the local database to the external server. Therefore, remote table queries made by opening a separate Shell Tab will not be recorded by the current local session.
This article uses the default option (
on) for the
keep_connectionsparameter when creating the external server, so the connection is kept.
As you can see below
psqlconnects to the local database and after doing an external table query, the query
postgres_fdw_get_connections()function will return a row.
Disconnects all connections from postgres_fdw from the local session (local session) to the given external server, based on the name passed in.
There can be multiple connections to the given server using different user mappings (when multiple users are used to access the external server, multiple user mappings are configured and postgres_fdw will create a connection for each user mapping). If the connection is being used in the current local transaction, it will not be disconnected and a warning message will be output. Returns true if at least one connection is disconnected, false otherwise, and an error is reported if no external server with the given name is found (
ERROR: server "..." does not exist).
Following the session, execute
SELECT postgres_fdw_disconnect('foreign_server')and return
true; query the
postgres_fdw_get_connections()function again to find no more connections.
Disconnects postgres_fdw from the local session to the external server. Used in a similar way to
postgres_fdw_disconnect(server_name text)and will not be repeated here.
FDW Transaction Management and Performance Optimization
When querying a remote table, postgres_fdw opens a new transaction on the remote server if the transaction corresponding to the current local transaction is not already open. When the local transaction is committed or aborted, the remote transaction is also committed or aborted. Savepoints are also managed by creating the corresponding remote savepoints.
When the local transaction has a SERIALIZABLE isolation level, the remote transaction also uses that isolation level; otherwise, the REPEATABLE READ isolation level is used.
If a query performs multiple table scans on a remote server, this option ensures that it will get snapshot-consistent results for all scans. The result is that successive queries in a single transaction will see the same data from the remote server, even if other activities perform concurrent updates on the remote server. This behavior is expected if the local transaction uses the SERIALIZABLE or REPEATABLE READ isolation levels, but it may be surprising for local transactions with the READ COMMITTED isolation level. Future versions of PostgreSQL may modify these rules.
postgres_fdw will be more intelligent in determining whether a query statement (the query statements to be tested include
DELETE statements, statements involving operators, functions, joins, filter conditions and aggregate functions, etc.) should be moved down to the remote server for execution.
Ideally, the tables involved are on the remote server, and the operators, functions, etc. are all built-in types, so that postgres_fdw sends the entire query to the remote server for computation, and then just takes the results. In most cases, postgres_fdw needs to fetch the necessary data locally to perform operations such as joins, filters, and aggregate function processing. That is, postgres_fdw optimizes the queries sent to the remote server (optimizing the WHERE clause and not fetching unneeded columns) to reduce the data transfer from the remote server.
Let’s look at two examples.
Pure Remote Table Query
Original query statement:
SELECT * FROM foreign_weather;
EXPLAIN VERBOSEto view the actual query (Remote SQL) sent to the remote server as :
SELECT city, temp_low, temp_high, prcp, date FROM public.weather
$ psql -U local_user postgres postgres=> EXPLAIN VERBOSE SELECT * FROM foreign_weather; QUERY PLAN ---------------------------------------------------------------------------------- Foreign Scan on public.foreign_weather (cost=100.00..121.25 rows=375 width=194) Output: city, temp_low, temp_high, prcp, date Remote SQL: SELECT city, temp_low, temp_high, prcp, date FROM public.weather (3 rows)
Remote table and local table join query
Create a new local table
citiesand insert test data.
The SQL sent by postgres_fdw to the remote server is
$ psql -U local_user postgres postgres=> EXPLAIN VERBOSE SELECT * FROM cities c, foreign_weather w WHERE c.name = w.city; QUERY PLAN ------------------------------------------------------------------------------------------ Hash Join (cost=118.10..163.91 rows=675 width=388) Output: c.name, c.location, w.city, w.temp_low, w.temp_high, w.prcp, w.date Hash Cond: ((w.city)::text = (c.name)::text) -> Foreign Scan on public.foreign_weather w (cost=100.00..121.25 rows=375 width=194) Output: w.city, w.temp_low, w.temp_high, w.prcp, w.date Remote SQL: SELECT city, temp_low, temp_high, prcp, date FROM public.weather -> Hash (cost=13.60..13.60 rows=360 width=194) Output: c.name, c.location -> Seq Scan on public.cities c (cost=0.00..13.60 rows=360 width=194) Output: c.name, c.location (10 rows)
That is, postgres_fdw will fetch all the data of
foreign_weatherlocally and join it with the table
And for the query.
The SQL sent by postgres_fdw to the remote server is
$ psql -U local_user postgres postgres=> EXPLAIN VERBOSE SELECT c.name, max(w.temp_high) FROM cities c, foreign_weather w WHERE c.name = w.city AND w.temp_high <= 30 group by c.name; QUERY PLAN ------------------------------------------------------------------------------------------------------ HashAggregate (cost=143.17..145.17 rows=200 width=182) Output: c.name, max(w.temp_high) Group Key: c.name -> Hash Join (cost=119.25..141.98 rows=238 width=182) Output: c.name, w.temp_high Hash Cond: ((c.name)::text = (w.city)::text) -> Seq Scan on public.cities c (cost=0.00..13.60 rows=360 width=178) Output: c.name, c.location -> Hash (cost=117.60..117.60 rows=132 width=182) Output: w.temp_high, w.city -> Foreign Scan on public.foreign_weather w (cost=100.00..117.60 rows=132 width=182) Output: w.temp_high, w.city Remote SQL: SELECT city, temp_high FROM public.weather WHERE ((temp_high <= 30)) (13 rows)
That is, postgres_fdw optimizes the
WHEREconditions sent to the remote server, gets only the data it needs from the remote table
foreign_weather, and then computes it locally with the table
citiesfor join, filtering, and aggregate function processing.
In summary, we have a more detailed understanding of the basic concepts of PostgreSQL external data wrappers and how to use postgres_fdw.