pglogical installation and configuration of logical streaming replication using pglogical
pglogical is a logical replication system implemented entirely as a PostgreSQL extension. Fully integrated, it requires no triggers or external programs. This alternative to physical replication is a highly efficient method of replicating data using a publish/subscribe model for selective replication.
Migrate and upgrade PostgreSQL with almost zero downtime
Accumulate changes from sharded database servers into a Data Warehouse
Copy all or a selection of database tables to other nodes in a cluster
Feed database changes in real-time to other systems
What benefits does pglogical provide?
Here I am listing important one
• Synchronous replication
• Delayed replication
• Configurable conflict resolution
• Ability to convert physical standby to logical replication
• Sequences can be replicated
• Subscriber isn’t in hot standby recovery, so you can use temp, unlogged, or normal tables
• Subscriber can have different users & security, different indexes, different parameter settings
• Replicate just one database, or a subset of tables, known as Replication Sets
• Replicate across PostgreSQL versions or architectures, allowing low- or zero-downtime upgrades
Refer document for more features.
pglogical works on a per-database level, not whole server level like physical streaming replication. To use pglogical the provider and subscriber must be running PostgreSQL 9.4 or newer.
Installation of pglogical:
We can install pglogical using two methods
1) Using rpms
2) Using source code
Here I am explain using the source code.
Download the source code from http://packages.2ndquadrant.com/pglogical/tarballs/
Untar the archive
export the PATH variable such that it includes the command pg_config.
export LD_LIBRARY_PATH variable which point to libraries of postgresql.
• Install the PostgreSQL using source code
• export the path variable which contain pg_config command
• make USE_PGXS=1 clean all
• make USE_PGXS=1 install
This will complete installation of pglogical.
pglogical terminology:
Nodes and Roles
A node simply refers to a PostgreSQL server. It can play the role of:
Provider (upstream): changes to data on this node are tracked and replicated to subscribing nodes.
Subscriber (downstream): subscribes to changes on a provider node and replicates changes.
These roles are not mutually exclusive. A node can be both a provider and a subscriber.
Configuration:
Make below changes to postgresql.conf file in both the nodes.
On primary node:
wal_level = ‘logical’
max_worker_processes = 10
max_replication_slots = 10
max_wal_senders = 10
shared_preload_libraries = ‘pglogical’
track_commit_timestamp = on # 9.5+ only
On slave node:
max_worker_processes = 10
shared_preload_libraries = ‘pglogical’
Replication User
On both the nodes create the user for pglogical.
CREATE ROLE replication_user WITH SUPERUSER REPLICATION LOGIN ENCRYPTED PASSWORD ‘password’;
Changes to pg_hba.conf:
Allow replication connections from localhost
host replication replicationuser 127.0.0.1/32 md5
host dbname replicationuser 127.0.0.1/32 md5
Allow connection to Primary from standby and standby to primary by replication user by adding below line in pg_hba.conf of all nodes according to your environment.
host replication_db_name replication_user_name host_or_ipaddress md5
host replication replication_user_name host_or_ipaddress md5
Restart both the instances (Primary and slave).
Extension
Create extension in both the nodes.
CREATE EXTENSION IF NOT EXISTS pglogical;
Node
We now need to create the node in pglogical. You should do this only once per database in each node.
SELECT pglogical.create_node(
node_name := ‘provider’,
dsn := ‘host=192.168.77.129 port=5432 dbname=pglogical user=replication_user password=password’);
SELECT pglogical.create_node(
node_name := ‘subscriber’,
dsn := ‘host=192.168.77.131 port=5433 dbname=pglogical user=replication_user password=password’);
Replication Sets
Replication sets provide a mechanism to control which tables in the database will be replicated and which actions on those tables will be replicated. The tables are not replicated until they are added into a replication set.
There are three pre-existing replication sets named “default”, “default_insert_only” and “ddl_sql”. The “default” replication set is defined to replicate all changes to tables in. The “default_insert_only” only replicates INSERTs and is meant for tables that don’t have primary key (see Limitations section for details). The “ddl_sql” replication set is defined to replicate schema changes specified by pglogical.replicate_ddl_command
pglogical.create_replication_set(set_name name, replicate_insert bool, replicate_update bool, replicate_delete bool, replicate_truncate bool)
This function creates a new replication set.
Parameters:
• set_name – name of the set, must be unique
• replicate_insert – specifies if INSERT is replicated, default true
• replicate_update – specifies if UPDATE is replicated, default true
• replicate_delete – specifies if DELETE is replicated, default true
• replicate_truncate – specifies if TRUNCATE is replicated, default true
EX:
select pglogical.create_replication_set(set_name := ‘APP_RSET’, replicate_insert := true, replicate_update := true, replicate_delete := true, replicate_truncate := true);
select pglogical.replication_set_add_all_tables(set_name := ‘APP_RSET’, schema_names := ARRAY[‘app_schema’], synchronize_data :=true);
select pglogical.replication_set_add_all_sequences(set_name := ‘APP_RSET’, schema_names :=ARRAY[‘app_schema’], synchronize_data :=true)
Subscriptions
Slave nodes will subscribe to the primary nodes to get the changes from the primary node. Replace the para
SELECT pglogical.create_subscription(
subscription_name := ‘subscription1’,
provider_dsn := ‘host=192.168.77.129 port=5432 dbname=pglogical user=replicati on_user password=password’,
replication_sets := ARRAY[‘APP_RSET’],
synchronize_structure := true,
synchronize_data := true);
Parameters:
• subscription_name – name of the subscription, must be unique
• provider_dsn – connection string to a provider
• replication_sets – array of replication sets to subscribe to, these must already exist, default is “{default,default_insert_only,ddl_sql}”
• synchronize_structure – specifies if to synchronize structure from provider to the subscriber, default false
• synchronize_data – specifies if to synchronize data from provider to the subscriber, default true
• forward_origins – array of origin names to forward, currently only supported values are empty array meaning don’t forward any changes that didn’t originate on provider node, or “{all}” which means replicate all changes no matter what is their origin, default is “{all}”
• apply_delay – how much to delay replication, default is 0 seconds
Functions to perform DDL:
pglogical do not replicate DDL commands, So we need to manually execute the commands on both the nodes(primary and slave) manually. We can use the below function to execute DDL commands on the both the nodes at a time.
select pglogical.replicate_ddl_command(command := ‘alter table app_schema.employee add address varchar(200)’, replication_sets := ARRAY[‘APP_RSET’]);
pglogical=# select pglogical.replicate_ddl_command(command := ‘create table app_schema.test_rep (id int primary key)’, replication_sets := ARRAY[‘APP_RSET’]);
replicate_ddl_command
———————–
t
(1 row)
pglogical=# select pglogical.replication_set_add_table(set_name := ‘APP_RSET’ , relation := ‘app_schema.test_rep’ , synchronize_data := true);
replication_set_add_table
—————————
t
(1 row)
Row Filtering
On the provider the row filtering can be done by specifying row_filter parameter for the pglogical.replication_set_add_table function. The row_filter is normal PostgreSQL expression which has the same limitations on what’s allowed as the CHECK constraint.
Conflicts
In case the node is subscribed to multiple providers, or when local writes happen on a subscriber, conflicts can arise for the incoming changes. These are automatically detected and can be acted on depending on the configuration.
The configuration of the conflicts resolver is done via the pglogical.conflict_resolution setting. The supported values for the pglogical.conflict_resolution are:
• error – the replication will stop on error if conflict is detected and manual action is needed for resolving
• apply_remote – always apply the change that’s conflicting with local data, this is the default
• keep_local – keep the local version of the data and ignore the conflicting change that is coming from the remote node
• last_update_wins – the version of data with newest commit timestamp will be be kept (this can be either local or remote version)
• first_update_wins – the version of the data with oldest timestamp will be kept (this can be either local or remote version)
Synchronous Replication
• Synchronous replication is supported using same standard mechanism provided by PostgreSQL for physical replication.
• The synchronous_commit and synchronous_standby_names settings will affect when COMMIT command reports success to client if pglogical subscription name is used in synchronous_standby_names. Refer to PostgreSQL documentation for more info about how to configure these two variables.
Limitations and Restrictions
• Currently pglogical replication and administration requires superuser privileges.
• UNLOGGED and TEMPORARY tables will not and cannot be replicated
• To replicate multiple databases you must set up individual provider/subscriber relationships for each. We can do only one database at a time.
• UPDATEs and DELETEs cannot be replicated for tables that lack a PRIMARY KEY or other valid replica identity such as aUNIQUE constraint.
• Automatic DDL replication is not supported. Managing DDL so that the provider and subscriber database(s) remain compatible is the responsibility of the user.
• pglogical provides the pglogical.replicate_ddl_command function to allow DDL to be run on the provider and subscriber at a consistent point.
• PGLogical does not support replication between databases with different encoding. We recommend using UTF-8 encoding in all replicated databases.
Check the status of the replication:-
To check the status of the replication use the pg_stat_replication view which is present in the master node. Where pglogical internally create a logical replication slot to perform the replication.
Overview on how to do up gradation with less down time using pglogical:- (9.4 to 9.6)
1) Install 9.6 software on same machine.
2) Initiate new cluster using 9.6 software.
3) Copy the global objects along with database (only db creation command) of 9.4 to 9.6.
4) Install pglogical as explain above. (install two times for two software versions by setting the path variables)
5) Set the required parameters in both the servers and restart the server to take effect of variable values.
6) Create provider and subscriber nodes in both 9.4 and 9.6
7) Create replication set in primary and add all the tables and sequences to replication set in 9.4.
8) Create subscription for the replication set created in primary in 9.6.
9) Once the primary and standby are in sync. Stop the application.
10) Verify the data and sequences are in sync
11) Stop the application
12) Remove pglogical configuration in both the nodes
13) start the 9.6 server with the port number used in 9.4.