pglogical installation and configuration of logical streaming replication using pglogical

pglogical is a logical replication system implemented entirely as a PostgreSQL extension. Fully integrated, it requires no triggers or external programs. This alternative to physical replication is a highly efficient method of replicating data using a publish/subscribe model for selective replication.

Migrate and upgrade PostgreSQL with almost zero downtime
Accumulate changes from sharded database servers into a Data Warehouse
Copy all or a selection of database tables to other nodes in a cluster
Feed database changes in real-time to other systems

What benefits does pglogical provide?

Here I am listing important one

Synchronous replication
Delayed replication
Configurable conflict resolution
Ability to convert physical standby to logical replication
Sequences can be replicated
Subscriber isn’t in hot standby recovery, so you can use temp, unlogged, or normal tables
Subscriber can have different users & security, different indexes, different parameter settings
Replicate just one database, or a subset of tables, known as Replication Sets
Replicate across PostgreSQL versions or architectures, allowing low- or zero-downtime upgrades
Refer document for more features.

pglogical works on a per-database level, not whole server level like physical streaming replication. To use pglogical the provider and subscriber must be running PostgreSQL 9.4 or newer.

Installation of pglogical:

We can install pglogical using two methods

1) Using rpms
2) Using source code

Here I am explain using the source code.


Untar the archive

export the PATH variable such that it includes the command pg_config.
export LD_LIBRARY_PATH variable which point to libraries of postgresql.

Install the PostgreSQL using source code
export the path variable which contain pg_config command
make USE_PGXS=1 clean all
make USE_PGXS=1 install

This will complete installation of pglogical.

pglogical terminology:

Nodes and Roles

A node simply refers to a PostgreSQL server. It can play the role of:
Provider (upstream):  changes to data on this node are tracked and replicated to subscribing nodes.
Subscriber (downstream): subscribes to changes on a provider node and replicates changes.
These roles are not mutually exclusive. A node can be both a provider and a subscriber.

Configuration:

Make below changes to postgresql.conf file in both the nodes.

On primary node:

wal_level = ‘logical’
max_worker_processes = 10
max_replication_slots = 10
max_wal_senders = 10
shared_preload_libraries = ‘pglogical’
track_commit_timestamp = on # 9.5+ only

On slave node:

max_worker_processes = 10
shared_preload_libraries = ‘pglogical’

Replication User

On both the nodes create the user for pglogical.
CREATE ROLE replication_user WITH SUPERUSER REPLICATION LOGIN ENCRYPTED PASSWORD ‘password’;

Changes to pg_hba.conf:

Allow replication connections from localhost
host replication replicationuser 127.0.0.1/32 md5
host  dbname replicationuser 127.0.0.1/32 md5

Allow connection to Primary from standby and standby to primary by replication user by adding below line in pg_hba.conf of all nodes according to your environment. 

host replication_db_name replication_user_name host_or_ipaddress md5
host replication replication_user_name host_or_ipaddress md5

Restart both the instances (Primary and slave).

Extension

Create extension in both the nodes.
CREATE EXTENSION IF NOT EXISTS pglogical; 

Node

We now need to create the node in pglogical. You should do this only once per database in each node.

SELECT pglogical.create_node(
    node_name := ‘provider’,
    dsn := ‘host=192.168.77.129 port=5432 dbname=pglogical user=replication_user password=password’);
SELECT pglogical.create_node(
    node_name := ‘subscriber’,
    dsn := ‘host=192.168.77.131 port=5433 dbname=pglogical user=replication_user password=password’);

Replication Sets

Replication sets provide a mechanism to control which tables in the database will be replicated and which actions on those tables will be replicated. The tables are not replicated until they are added into a replication set.

There are three pre-existing replication sets named “default”, “default_insert_only” and “ddl_sql”. The “default” replication set is defined to replicate all changes to tables in. The “default_insert_only” only replicates INSERTs and is meant for tables that don’t have primary key (see Limitations section for details). The “ddl_sql” replication set is defined to replicate schema changes specified by pglogical.replicate_ddl_command
pglogical.create_replication_set(set_name name, replicate_insert bool, replicate_update bool, replicate_delete bool, replicate_truncate bool)

This function creates a new replication set.

Parameters:
set_name – name of the set, must be unique
replicate_insert – specifies if INSERT is replicated, default true
replicate_update – specifies if UPDATE is replicated, default true
replicate_delete – specifies if DELETE is replicated, default true
replicate_truncate – specifies if TRUNCATE is replicated, default true

EX:
select pglogical.create_replication_set(set_name := ‘APP_RSET’, replicate_insert := true, replicate_update := true, replicate_delete := true, replicate_truncate := true);

select pglogical.replication_set_add_all_tables(set_name := ‘APP_RSET’, schema_names := ARRAY[‘app_schema’], synchronize_data :=true);

select pglogical.replication_set_add_all_sequences(set_name := ‘APP_RSET’, schema_names :=ARRAY[‘app_schema’], synchronize_data :=true)

Subscriptions

Slave nodes will subscribe to the primary nodes to get the changes from the primary node. Replace the para

SELECT pglogical.create_subscription(
    subscription_name := ‘subscription1’,
    provider_dsn := ‘host=192.168.77.129 port=5432 dbname=pglogical user=replicati  on_user password=password’,
    replication_sets := ARRAY[‘APP_RSET’],
    synchronize_structure := true,
    synchronize_data := true);

Parameters:

subscription_name – name of the subscription, must be unique
provider_dsn – connection string to a provider
replication_sets – array of replication sets to subscribe to, these must already exist, default is “{default,default_insert_only,ddl_sql}”
synchronize_structure – specifies if to synchronize structure from provider to the subscriber, default false
synchronize_data – specifies if to synchronize data from provider to the subscriber, default true
forward_origins – array of origin names to forward, currently only supported values are empty array meaning don’t forward any changes that didn’t originate on provider node, or “{all}” which means replicate all changes no matter what is their origin, default is “{all}”
apply_delay – how much to delay replication, default is 0 seconds


Functions to perform DDL:

pglogical do not replicate DDL commands, So we need to manually execute the commands on both the nodes(primary and slave) manually. We can use the below function to execute DDL commands on the both the nodes at a time.

select pglogical.replicate_ddl_command(command := ‘alter table app_schema.employee add address varchar(200)’, replication_sets := ARRAY[‘APP_RSET’]);

pglogical=# select pglogical.replicate_ddl_command(command := ‘create table app_schema.test_rep (id int primary key)’, replication_sets := ARRAY[‘APP_RSET’]);
 replicate_ddl_command
———————–
 t
(1 row)

pglogical=# select pglogical.replication_set_add_table(set_name := ‘APP_RSET’ , relation := ‘app_schema.test_rep’ , synchronize_data := true);
 replication_set_add_table
—————————
 t
(1 row)
Row Filtering

On the provider the row filtering can be done by specifying row_filter parameter for the pglogical.replication_set_add_table function. The row_filter is normal PostgreSQL expression which has the same limitations on what’s allowed as the CHECK constraint.

Conflicts

In case the node is subscribed to multiple providers, or when local writes happen on a subscriber, conflicts can arise for the incoming changes. These are automatically detected and can be acted on depending on the configuration.

The configuration of the conflicts resolver is done via the pglogical.conflict_resolution setting. The supported values for the pglogical.conflict_resolution are:
error – the replication will stop on error if conflict is detected and manual action is needed for resolving
apply_remote – always apply the change that’s conflicting with local data, this is the default
keep_local – keep the local version of the data and ignore the conflicting change that is coming from the remote node
last_update_wins – the version of data with newest commit timestamp will be be kept (this can be either local or remote version)
first_update_wins – the version of the data with oldest timestamp will be kept (this can be either local or remote version)

Synchronous Replication

Synchronous replication is supported using same standard mechanism provided by PostgreSQL for physical replication.
The synchronous_commit and synchronous_standby_names settings will affect when COMMIT command reports success to client if pglogical subscription name is used in synchronous_standby_names. Refer to PostgreSQL documentation for more info about how to configure these two variables.

Limitations and Restrictions

Currently pglogical replication and administration requires superuser privileges. 
UNLOGGED and TEMPORARY tables will not and cannot be replicated
To replicate multiple databases you must set up individual provider/subscriber relationships for each. We can do only one database at a time.
UPDATEs and DELETEs cannot be replicated for tables that lack a PRIMARY KEY or other valid replica identity such as aUNIQUE constraint.
Automatic DDL replication is not supported. Managing DDL so that the provider and subscriber database(s) remain compatible is the responsibility of the user.
pglogical provides the pglogical.replicate_ddl_command function to allow DDL to be run on the provider and subscriber at a consistent point.
PGLogical does not support replication between databases with different encoding. We recommend using UTF-8 encoding in all replicated databases.

Check the status of the replication:-

To check the status of the replication use the pg_stat_replication view which is present in the master node. Where pglogical internally create a logical replication slot to perform the replication.

Overview on how to do up gradation with less down time using pglogical:- (9.4 to 9.6)

1) Install 9.6 software on same machine.
2) Initiate new cluster using 9.6 software.
3) Copy the global objects along with database (only db creation command) of 9.4 to 9.6.
4) Install pglogical as explain above. (install two times for two software versions by setting the path variables)
5) Set the required parameters in both the servers and restart the server to take effect of variable values.
6) Create provider and subscriber nodes in both 9.4 and 9.6
7) Create replication set in primary and add all the tables and sequences to replication set in 9.4.
8) Create subscription for the replication set created in primary in 9.6.
9) Once the primary and standby are in sync. Stop the application.
10) Verify the data and sequences are in sync
11) Stop the application
12) Remove pglogical configuration in both the nodes
13) start the 9.6 server with the port number used in 9.4.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s