Category Archives: PostgreSQL

Load AWS metrics to PostgreSQL

AWS provides a lot of metrics about the resource utilization of EC2/RDS instances, which are useful for analyzing system utilization. There are different ways to read the metrics, such as the AWS command line or the Python boto3 library.

To know the resource utilization trend of an instance we need to keep these metrics for the long term. As it is costly to store them in CloudWatch for a long period, it is better to copy the data to a database, which helps to find resource utilization trends and is useful for capacity planning.

Below is a simple way to load AWS metrics of an RDS instance into PostgreSQL. PostgreSQL has a bulk loading command, COPY, which can load data directly from the output of other programs. Below is a snippet of code that loads metrics from AWS into PostgreSQL.

Here I am creating three tables for read IOPS, write IOPS and CPU utilization and loading the respective metrics into them. The script takes the number of days back to collect as its first argument ($1).

#!/bin/bash
export PGPASSWORD='**********'
psql -h 10.12.34.56 -U viswamitra -d postgres -c "create table if not exists write_iops(datapoint varchar,value decimal,gather_time timestamp,unit varchar)" -c "\copy write_iops from PROGRAM 'aws cloudwatch get-metric-statistics --metric-name WriteIOPS --start-time $(date +%Y-%m-%d --date="$1 days ago")T00:00:00Z --end-time $(date +%Y-%m-%d --date="$1 days ago")T23:59:59Z --period 60 --namespace AWS/RDS --statistics Average --dimensions Name=DBInstanceIdentifier,Value=testinstance1 --output text | tail -n +2'"
psql -h 10.12.34.56 -U viswamitra -d postgres -c "create table if not exists read_iops(datapoint varchar,value decimal,gather_time timestamp,unit varchar)" -c "\copy read_iops from PROGRAM 'aws cloudwatch get-metric-statistics --metric-name ReadIOPS --start-time $(date +%Y-%m-%d --date="$1 days ago")T00:00:00Z --end-time $(date +%Y-%m-%d --date="$1 days ago")T23:59:59Z --period 60 --namespace AWS/RDS --statistics Average --dimensions Name=DBInstanceIdentifier,Value=testinstance1 --output text | tail -n +2'"
psql -h 10.12.34.56 -U viswamitra -d postgres -c "create table if not exists cpu_utilization(datapoint varchar,value decimal,gather_time timestamp,unit varchar)" -c "\copy cpu_utilization from PROGRAM 'aws cloudwatch get-metric-statistics --metric-name CPUUtilization --start-time $(date +%Y-%m-%d --date="$1 days ago")T00:00:00Z --end-time $(date +%Y-%m-%d --date="$1 days ago")T23:59:59Z --period 60 --namespace AWS/RDS --statistics Average --dimensions Name=DBInstanceIdentifier,Value=testinstance1 --output text | tail -n +2'"
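
Once the metrics are in PostgreSQL, trend reports become plain SQL. A minimal sketch (assuming the cpu_utilization table created by the script above) showing the daily average and peak CPU:

-- Daily average and peak CPU utilization from the loaded CloudWatch datapoints
SELECT gather_time::date AS day,
       round(avg(value), 2) AS avg_cpu,
       max(value) AS peak_cpu
FROM cpu_utilization
GROUP BY gather_time::date
ORDER BY day;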

PostgreSQL idle sessions filled my disk space

We have an internal application that uses PostgreSQL as the backend with application-side connection pooling. After a few days we started seeing a peculiar issue: the disk allocated for the database was getting full and the application was running very slowly. After restarting the application the space was released and the application worked as expected.

After checking which tables were consuming the space, we found that temporary tables were taking most of it (to know more about temporary tables, see the post below). Temporary tables are dropped once the session is closed, but because the application connection pooling was not managed properly, the sessions (and their temporary tables) lived on, causing this issue.
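
A query along the following lines helped identify where the space was going; it is a simple sketch over pg_class and pg_namespace, relying on the fact that temporary schemas are named pg_temp_N (covered in the temporary tables post below):

-- On-disk size of temporary tables across all sessions
SELECT n.nspname, c.relname,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname LIKE 'pg_temp%'
ORDER BY pg_total_relation_size(c.oid) DESC;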

Before closing or establishing a connection it is recommended to execute "DISCARD ALL", which drops all temporary tables and releases the other resources allocated in that session.

test_temp=# create temporary table test_emp(id int, name varchar);
 CREATE TABLE
 test_temp=#
 test_temp=# insert into test_emp values (1,'Viswamitra');
 INSERT 0 1
 test_temp=#
 test_temp=# select * from test_emp ;
  id |    name
 ----+------------
   1 | Viswamitra
 (1 row)
 test_temp=#
 test_temp=# discard all;
 DISCARD ALL
 test_temp=# select * from test_emp ;
 ERROR:  relation "test_emp" does not exist
 LINE 1: select * from test_emp ;
                       ^
 test_temp=#

In the Tomcat connection pooling configuration, set the validationQuery to "DISCARD ALL" so that the resources occupied by the session are released before the connection is returned to the pool.

Reference:

PostgreSQL: Documentation: 11: DISCARD

Temporary tables in PostgreSQL

Temporary tables are short-lived tables which are dropped at the end of the session or transaction. They are used to store intermediate data within a session or transaction. Temporary tables are created the same way as permanent tables; we just add TEMP or TEMPORARY to the CREATE TABLE statement. Below is the syntax to create temporary tables.

CREATE [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } TABLE table_name
(
  col1 datatype,
  col2 datatype,
  .....
) [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
[ TABLESPACE tablespace_name ]

Databases like Oracle or SQL Server support global temporary tables, where a temporary table is created once and used in all sessions; PostgreSQL does not support this yet. The GLOBAL keyword is accepted only for SQL standard compatibility and actually behaves the same as LOCAL.

Temporary tables are visible only within the session that created them; sessions cannot see each other's temporary tables. A temporary table can be created with the same name as a permanent table, but it is not recommended. When you access the table name without a schema qualifier, the temporary table takes priority over the permanent one.

In the example below the temporary table named emp takes priority over the permanent emp table. To access the permanent table, use the fully qualified name (schema.table).

 postgres=# create database test_temp;
 CREATE DATABASE
 postgres=# \c test_temp
 psql (12.4, server 10.14)
 You are now connected to database "test_temp" as user "postgres".
 test_temp=# create table emp(id int, name varchar);
 CREATE TABLE
 test_temp=# insert into emp values (1,'Viswamitra');
 INSERT 0 1
 test_temp=# create temporary table emp (id int, name varchar);
 CREATE TABLE 
 test_temp=# \dt
           List of relations
   Schema   | Name | Type  |  Owner   
 -----------+------+-------+----------
  pg_temp_4 | emp  | table | postgres
 (1 row)
 test_temp=# show search_path ;
    search_path
 -----------------
  "$user", public
 (1 row)
 test_temp=# \dt public.*
         List of relations
  Schema | Name | Type  |  Owner   
 --------+------+-------+----------
  public | emp  | table | postgres
 (1 row)
 test_temp=# select * from emp;
  id | name 
 ----+------
 (0 rows)
 test_temp=# select * from public.emp;
  id |    name    
 ----+------------
   1 | Viswamitra
 (1 row)

Temporary tables are created in a special schema with the naming convention "pg_temp_nnn"; temporary tables cannot be created in non-temporary schemas. When a query is executed, PostgreSQL first looks for a matching table in the session's pg_temp_nnn schema and then in the schemas listed in the search_path variable.

postgres=# create temporary table public.test(id int);
 ERROR:  cannot create temporary relation in non-temporary schema
 LINE 1: create temporary table public.test(id int);
                                ^

By default the files backing temporary tables are created in the default tablespace pg_default; if a tablespace is created and set in the temp_tablespaces parameter, temporary tables use that tablespace instead.

 test_temp=# create temporary table test_size(id int, name varchar);
 CREATE TABLE
 test_temp=# select pg_relation_filepath('test_size');
  pg_relation_filepath
 -----------------------
  base/90113/t4_90140
 (1 row)
 test_temp=# \! du -sh /pgdata/10/data/base/90113/t4_90140
 0       /pgdata/10/data/base/90113/t4_90140
 test_temp=# insert into test_size select generate_series(1,10000),md5(generate_series(1,10000)::text);
 INSERT 0 10000
 test_temp=# \! du -sh /pgdata/10/data/base/90113/t4_90140
 960K    /pgdata/10/data/base/90113/t4_90140
 test_temp=#
 test_temp=# \dt+ test_size
                         List of relations
   Schema   |   Name    | Type  |  Owner   |  Size  | Description
 -----------+-----------+-------+----------+--------+-------------
  pg_temp_4 | test_size | table | postgres | 704 kB |
 (1 row)
 test_temp=# \! mkdir /tmp/test
 test_temp=# create tablespace test location '/tmp/test';
 CREATE TABLESPACE
 test_temp=# create temporary table test_size_1(id int, name varchar) tablespace test;
 CREATE TABLE
 test_temp=# select pg_relation_filepath('test_size_1');
               pg_relation_filepath
 -------------------------------------------------
  pg_tblspc/90147/PG_10_201707211/90113/t4_90148
 (1 row)
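
Instead of naming the tablespace on every CREATE, the temp_tablespaces parameter can be set for the session; a small sketch, reusing the test tablespace created above (test_size_2 is just an illustrative name):

SET temp_tablespaces = 'test';
-- No TABLESPACE clause needed now; the relation file lands under pg_tblspc
create temporary table test_size_2(id int, name varchar);
select pg_relation_filepath('test_size_2');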

Indexes can be created on temporary tables; they are temporary as well and are dropped along with the table. The autovacuum daemon cannot handle temporary tables, so it is highly recommended to ANALYZE them before using them in complex queries.

temp_buffers is a session-level memory area allocated in each session for accessing temporary tables; by default its size is 8 MB. If the temporary tables are much larger than temp_buffers, the query starts using disk to process the data.
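
If a session works with large temporary tables, temp_buffers can be raised for that session. Note that it must be set before any temporary table is accessed in the session; after that the value cannot be changed. A small sketch:

-- Run before touching any temporary table in this session
SET temp_buffers = '64MB';
create temporary table big_tmp(id int, payload text);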

PostgreSQL upgrade to 12 failed due to earth distance extension

While upgrading RDS PostgreSQL from 9.5 to 12 we got the error below, which is a little confusing because it is not listed in the prerequisites documentation.

"/rdsdbbin/postgres/bin/pg_restore" --host /rdsdbdata/log/upgrade --port 8193 --username rdsadmin --create --exit-on-error --verbose --dbname template1 "pg_upgrade_dump_1.custom" >> "pg_upgrade_dump_1.log" 2>&1

*failure*
There were problems executing ""/rdsdbbin/postgres/bin/pg_restore" --host /rdsdbdata/log/upgrade --port 8193 --username rdsadmin --create --exit-on-error --verbose --dbname template1 "pg_upgrade_dump_16395.custom" >> "pg_upgrade_dump_16395.log" 2>&1"
Consult the last few lines of "pg_upgrade_dump_16395.log" for
the probable cause of the failure.
"/rdsdbbin/postgres/bin/pg_ctl" -w -D "/rdsdbdata/db" -o "--config_file=/rdsdbdata/config_new/postgresql.conf" -m fast stop >> "pg_upgrade_server.log" 2>&1
child process exited abnormally: status 256
"/rdsdbbin/postgres/bin/pg_ctl" -w -D "/rdsdbdata/db" -o "--config_file=/rdsdbdata/config_new/postgresql.conf" -m fast stop >> "pg_upgrade_server.log" 2>&1
----------------------- END OF LOG ----------------------

Database instance is in a state that cannot be upgraded: pg_restore: from TOC entry 4241; 1259 62347 INDEX addresses_earthdistance_ix f0187642fc72b3a pg_restore: error: could not execute query: ERROR: type "earth" does not exist LINE 1: ...ians($1))*sin(radians($2))),earth()*sin(radians($1)))::earth ^ QUERY: SELECT cube(cube(cube(earth()*cos(radians($1))*cos(radians($2))),earth()*cos(radians($1))*sin(radians($2))),earth()*sin(radians($1)))::earth CONTEXT: SQL function "ll_to_earth" during inlining Command was: -- For binary upgrade, must preserve pg_class oids SELECT pg_catalog.binary_upgrade_set_next_index_pg_class_oid('62347'::pg_catalog.oid); CREATE INDEX "addresses_earthdistance_ix" ON "publicw9l8ylklm8dupgr4zedbfssfgjmnfzwf"."addresses" USING "gist" ("publicw9l8ylklm8dupgr4zedbfssfgjmnfzwf"."ll_to_earth"("latitude", "longitude"));

After digging further we found that there is a known bug causing this issue, so we dropped the index "addresses_earthdistance_ix" and recreated it after the upgrade.
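
For reference, the statements were roughly the following (index, table and column names taken from the failing TOC entry above; schema omitted):

-- Before starting the upgrade
DROP INDEX addresses_earthdistance_ix;
-- After the upgrade completed
CREATE INDEX addresses_earthdistance_ix ON addresses USING gist (ll_to_earth(latitude, longitude));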

PostgreSQL: Re: BUG #15112: Unable to run pg_upgrade with earthdistance extension

Aurora read replica option is not available for RDS PostgreSQL

We tried to migrate our existing RDS PostgreSQL 9.5 instance to Aurora PostgreSQL 11.8. Below are the steps we planned after checking the documentation. The maximum Aurora PostgreSQL version supported at the time was 11.8.

  1. Upgrade the existing RDS PostgreSQL instance to version 11.8
  2. Create an Aurora read replica for the existing RDS PostgreSQL instance
  3. Once both instances are in sync, promote the read replica

When we performed this on a dev instance we found that RDS does not support Aurora read replica creation for 11.8. After discussing with the AWS team, they confirmed that RDS does not support Aurora read replicas for every PostgreSQL version. Below are the details, which may help others who want to migrate to Aurora.

 RDS PostgreSQL Version | Aurora Read Replica Version Supported
------------------------+---------------------------------------
 9.6.1                  | 9.6.8, 9.6.9
 9.6.2                  | 9.6.8, 9.6.9
 9.6.3                  | 9.6.8, 9.6.9
 9.6.5                  | 9.6.8, 9.6.9
 9.6.6                  | 9.6.8, 9.6.9
 9.6.8                  | 9.6.8, 9.6.9
 9.6.9                  | 9.6.9
 9.6.10                 | 9.6.11, 9.6.12
 9.6.11                 | 9.6.11
 9.6.12                 | 9.6.12
 9.6.14                 | No Aurora Read Replica
 9.6.15                 | No Aurora Read Replica
 9.6.16                 | No Aurora Read Replica
 9.6.17                 | No Aurora Read Replica
 9.6.18                 | No Aurora Read Replica

 10.1                   | 10.4, 10.5, 10.6, 10.7, 10.12
 10.3                   | 10.4, 10.5, 10.6, 10.7, 10.12
 10.4                   | 10.4, 10.5, 10.6, 10.7, 10.12
 10.5                   | 10.5, 10.6, 10.7, 10.12
 10.6                   | 10.6, 10.7, 10.12
 10.7                   | 10.7, 10.12
 10.9                   | 10.11, 10.12
 10.10                  | 10.11, 10.12
 10.11                  | 10.11, 10.12
 10.12                  | 10.12
 10.13                  | No Aurora Read Replica

 11.1                   | 11.4, 11.7
 11.2                   | 11.4, 11.7
 11.4                   | 11.4, 11.7
 11.5                   | 11.6, 11.7
 11.6                   | 11.6, 11.7
 11.7                   | 11.7
 11.8                   | No Aurora Read Replica

 12.2                   | No Aurora Read Replica
 12.3                   | No Aurora Read Replica

PostgreSQL DECLARATIVE PARTITIONING

This is a series of posts on PostgreSQL declarative partitioning starting from PostgreSQL 10.

Until PostgreSQL 9.6, the traditional partitioning method was used, where inheritance creates the partitions and CHECK constraints and triggers maintain data integrity. This method has the following disadvantages.

  1. Complexity: we need to create child tables, triggers, and check constraints.
  2. Data consistency depends on the CHECK constraints defined on the individual tables.
  3. Insert statements are redirected to the child tables using triggers, which is slow.

In PostgreSQL 10 we don't need to define the triggers or constraints, and tuple routing is handled internally. I have done a small benchmark comparing partitioning performance in 9.6 and 10; you can find it at the end.

Types of partitioning in PostgreSQL 10

PostgreSQL 10 supports two types of declarative partitioning: list and range.

List

To create a list partitioned table, specify the name of the column to be used as the partition key in the CREATE TABLE statement along with the partition type LIST. Only one column can be specified.

postgres=# create table country (state varchar, code int) partition by list(state);
CREATE TABLE
postgres=#
postgres=# create table india_ap partition of country for values in ( 'Andhra Pradesh');
CREATE TABLE
postgres=#
postgres=# create table india_kerala partition of country for values in ( 'Kerla');
CREATE TABLE
postgres=#
postgres=# \d+ country
                                       Table "public.country"
 Column |       Type        | Collation | Nullable | Default | Storage  | Stats target | Description
--------+-------------------+-----------+----------+---------+----------+--------------+-------------
 state  | character varying |           |          |         | extended |              |
 code   | integer           |           |          |         | plain    |              |
Partition key: LIST (state)
Partitions: india_ap FOR VALUES IN ('Andhra Pradesh'),
            india_kerala FOR VALUES IN ('Kerla')

postgres=# insert into country values ('Andhra Pradesh',1);
INSERT 0 1
postgres=# select * from country ;
     state      | code
----------------+------
 Andhra Pradesh |    1
(1 row)
postgres=# insert into country values ('TAMIL NADU',3);
ERROR:  no partition of relation "country" found for row
DETAIL:  Partition key of the failing row contains (state) = (TAMIL NADU).

If we try to insert a row whose partition key value is not covered by any partition list, it throws an error.
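
Creating a partition that covers the value lets the insert succeed; for example:

-- Add a partition for the missing value, then retry the insert
create table india_tn partition of country for values in ('TAMIL NADU');
insert into country values ('TAMIL NADU',3);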

Obtaining the partition information

The pg_get_partkeydef function can be used to obtain the partition method and column information. Restrictions on each partition can be obtained with the pg_get_partition_constraintdef function.

postgres=# select pg_get_partkeydef('country'::regclass);
 pg_get_partkeydef
-------------------
 LIST (state)
(1 row)

postgres=# select pg_get_partition_constraintdef('india_kerala'::regclass);
                     pg_get_partition_constraintdef
------------------------------------------------------------------------
 ((state IS NOT NULL) AND ((state)::text = 'Kerla'::character varying))
(1 row)

Range

The table is partitioned into "ranges" defined by a key column or set of columns, with no overlap between the ranges of values assigned to different partitions. To create a range partitioned table, create the table with the column(s) to partition on and specify the partition type RANGE. Multiple columns can be specified, separated by commas. A not-null constraint is automatically applied to the partition key columns.

Below is an example of how declarative partitioning works; it is a simple example copied from the documentation.

Create the partitioned table, specifying the partition method in the PARTITION BY clause; here we are partitioning the table based on a date column (range partitioning).

CREATE TABLE measurement (
    city_id         int not null,
    logdate         date not null,
    peaktemp        int,
    unitsales       int
) PARTITION BY RANGE (logdate);

Create the partitions of the table, specifying for each one the range of dates whose data it will store.

CREATE TABLE measurement_y2006m02 PARTITION OF measurement
    FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');

CREATE TABLE measurement_y2006m03 PARTITION OF measurement
    FOR VALUES FROM ('2006-03-01') TO ('2006-04-01');

--- While creating a partition we can leave the beginning or ending value open so that it accepts all remaining values and acts like a default partition.

CREATE TABLE measurement_default PARTITION OF measurement FOR VALUES FROM ('2006-04-01') TO ('infinity');
--- 'infinity' works here because the key is a date; for numeric partition keys use UNBOUNDED in place of infinity.

We can also create partitions of a partition; these are called sub-partitions.

CREATE TABLE measurement_y2006m02 PARTITION OF measurement
    FOR VALUES FROM ('2006-02-01') TO ('2006-03-01')
    PARTITION BY RANGE (peaktemp);

Delete old partition

Data expires based on the SLA we have; once it has expired we need to delete it. We can simply drop the partition table using DROP TABLE.

DROP TABLE measurement_y2006m02;

The recommended option is to detach the partition, take a backup if required, and then drop the partition.

ALTER TABLE measurement DETACH PARTITION measurement_y2006m02;

DROP TABLE takes a table-level lock on the parent table.

Add a new partition to existing partitioned table

Sometimes we need to create a new table (or use an existing one), bulk load data into it, and then add it as a partition of the existing partitioned table. We can do it as below.

CREATE TABLE measurement_y2008m02
  (LIKE measurement INCLUDING DEFAULTS INCLUDING CONSTRAINTS);

ALTER TABLE measurement_y2008m02 ADD CONSTRAINT y2008m02
   CHECK (logdate >= DATE '2008-02-01' AND logdate < DATE '2008-03-01');

\copy measurement_y2008m02 from 'measurement_y2008m02'
-- possibly some other data preparation work

ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02
    FOR VALUES FROM ('2008-02-01') TO ('2008-03-01');

Before attaching the table as a partition we need to add the CHECK constraint; otherwise the complete table is scanned at ATTACH time to validate the partition bound.

Constraint exclusion

PostgreSQL uses constraint exclusion during the planning phase to find which partitions to scan. All constraints on all partitions of the parent table are examined during constraint exclusion, so a large number of partitions can increase query planning time considerably.

Partitioning using these techniques will work well with up to perhaps a hundred partitions; do not try to use many thousands of partitions.
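
A quick way to see constraint exclusion at work is EXPLAIN against the measurement table from above; with the default constraint_exclusion = partition, only the partition covering the predicate should appear in the plan:

-- Only measurement_y2006m02 should be scanned for this date range
EXPLAIN SELECT * FROM measurement WHERE logdate BETWEEN DATE '2006-02-01' AND DATE '2006-02-15';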

Limitations

There is no facility to create a global index (an index across all partitions); as a result, primary keys, unique keys, and exclusion constraints cannot be created on the partitioned table itself (covering all partitions).

Foreign keys cannot be created on a partitioned table, and a partitioned table cannot be referenced by a foreign key from another table.

Using the ON CONFLICT clause with partitioned tables will cause an error, because unique or exclusion constraints can only be created on individual partitions. There is no support for enforcing uniqueness (or an exclusion constraint) across an entire partitioning hierarchy.

An UPDATE that causes a row to move from one partition to another fails, because the new value of the row fails to satisfy the implicit partition constraint of the original partition.

Row triggers, if necessary, must be defined on individual partitions, not the partitioned table.

Mixing temporary and permanent relations in the same partition tree is not allowed. Hence, if the partitioned table is permanent, so must be its partitions and likewise if the partitioned table is temporary. When using temporary relations, all members of the partition tree must be from the same session.

NOTE

Choosing the right partition key is an important task; it decides the query performance.

Choosing the number of partitions is also important: a large number of partitions increases planning cost and memory consumption, while a small number of very large partitions can degrade query performance. The query planner is generally able to handle partition hierarchies with up to a few hundred partitions.

TRUNCATE on the parent table is propagated to all its partitions.

COPY into the parent table is routed to the partitions.

Dropping the parent table drops its partitions.

Adding or removing columns on the parent adds or removes them on the partitions, but the partition key column cannot be dropped.

Catalog tables to store partition information

The new catalog table pg_partitioned_table maintains the partitioned table information. The pg_class table records whether a table is a partition and its partition bound information.

postgres=# SELECT partrelid::regclass, * FROM pg_partitioned_table ;
  partrelid  | partrelid | partstrat | partnatts | partattrs | partclass | partcollation | partexprs
-------------+-----------+-----------+-----------+-----------+-----------+---------------+-----------
 measurement |     16384 | r         |         1 | 2         | 3122      | 0             |
(1 row)

postgres=#  SELECT relname, relispartition, relpartbound FROM pg_class WHERE
relname = 'measurement_y2006m02' ;
-[ RECORD 1 ]--+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
relname        | measurement_y2006m02
relispartition | t
relpartbound   | {PARTITIONBOUNDSPEC :strategy r :listdatums <> :lowerdatums ({PARTITIONRANGEDATUM :kind 0 :value {CONST :consttype 1082 :consttypmod -1 :constcollid 0 :constlen 4 :constbyval true :constisnull false :location 80 :constvalue 4 [ -81 8 0 0 0 0 0 0 ]} :location 80}) :upperdatums ({PARTITIONRANGEDATUM :kind 0 :value {CONST :consttype 1082 :consttypmod -1 :constcollid 0 :constlen 4 :constbyval true :constisnull false :location 98 :constvalue 4 [ -53 8 0 0 0 0 0 0 ]} :location 98}) :location 74}

--- Get the definition of the partition boundary

postgres=# SELECT relname, relispartition, pg_get_expr(relpartbound, oid)
postgres-# FROM pg_class WHERE relname = 'measurement_y2006m02' ;
-[ RECORD 1 ]--+-------------------------------------------------
relname        | measurement_y2006m02
relispartition | t
pg_get_expr    | FOR VALUES FROM ('2006-02-01') TO ('2006-03-01')

9.6 vs 10 Partitioning performance

I have done a small performance test on versions 9.6 and 10. I created a partitioned table in each and loaded the same data using COPY. Version 10 is much faster than 9.6.

---- PostgreSQL 10 ----

postgres=# create table part_test (id int not null, name varchar not null, address varchar not null) partition by range(id);
CREATE TABLE
postgres=# create table part_test_1 partition of part_test for values from (1) to (10000);
CREATE TABLE
postgres=# create table part_test_2 partition of part_test for values from (10000) to (20000);
CREATE TABLE
postgres=# create table part_test_3 partition of part_test for values from (20000) to (30000);
CREATE TABLE
postgres=# create table part_test_4 partition of part_test for values from (30000) to (40000);
CREATE TABLE
postgres=# create table part_test_5 partition of part_test for values from (40000) to (50000);
CREATE TABLE
postgres=# \copy part_test from 'test.csv' CSV;
COPY 49999
Time: 73.360 ms

---- PostgreSQL 9.6 ----

postgres=# create table part_test (id int not null, name varchar not null, address varchar not null);
CREATE TABLE
postgres=# create table part_test_1(CHECK ( id >= 1 and id < 10000)) inherits (part_test);
CREATE TABLE
postgres=# create table part_test_2(CHECK ( id >= 10000 and id < 20000)) inherits (part_test);
CREATE TABLE
postgres=# create table part_test_3(CHECK ( id >= 20000 and id < 30000)) inherits (part_test);
CREATE TABLE
postgres=# create table part_test_4(CHECK ( id >= 30000 and id < 40000)) inherits (part_test);
CREATE TABLE
postgres=# create table part_test_5(CHECK ( id >= 40000 and id < 50000)) inherits (part_test);
CREATE TABLE
postgres=#


CREATE OR REPLACE FUNCTION part_test_insert_trigger()
RETURNS TRIGGER AS $$
BEGIN
    IF ( NEW.id >= 1 AND NEW.id < 10000 ) THEN
        INSERT INTO part_test_1 VALUES (NEW.*);
    ELSIF ( NEW.id >= 10000 AND NEW.id < 20000 ) THEN
        INSERT INTO part_test_2 VALUES (NEW.*);
    ELSIF ( NEW.id >= 20000 AND NEW.id < 30000 ) THEN
        INSERT INTO part_test_3 VALUES (NEW.*);
    ELSIF ( NEW.id >= 30000 AND NEW.id < 40000 ) THEN
        INSERT INTO part_test_4 VALUES (NEW.*);
    ELSIF ( NEW.id >= 40000 AND NEW.id < 50000 ) THEN
        INSERT INTO part_test_5 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'id out of range.  Fix the part_test_insert_trigger() function!';
    END IF;
    RETURN NULL;
END;
$$
LANGUAGE plpgsql;


CREATE TRIGGER part_test_trigger
    BEFORE INSERT ON part_test
    FOR EACH ROW EXECUTE PROCEDURE part_test_insert_trigger();
	
postgres=# \copy part_test from test.csv CSV;
COPY 0
Time: 546.744 ms
postgres=#
postgres=# select count(*) from part_test;
 count
-------
 49999
(1 row)

Time: 4.994 ms

Upgrade RDS PostgreSQL to higher version with less downtime using bucardo

Downtime! When the business team hears this word they simply say no... To upgrade a PostgreSQL database using pg_upgrade we need a good amount of downtime, depending on the database size. Even if we are using PostgreSQL RDS we need downtime to upgrade to the latest versions.

If we are using PostgreSQL 9.4+ we can use the pglogical extension, but sometimes it might not be possible to install this extension and we need another way. For that, I found Bucardo very useful. We can install it on a separate machine and configure the replication.

Let’s start!

What is bucardo ?

Bucardo is a Perl-based tool that replicates a set of tables from a source to a destination using triggers. I am using it to migrate from RDS PostgreSQL 9.5 to 11.

How bucardo works ?

Bucardo creates a schema named bucardo in the source database and creates a trigger on each table that is part of the replication. Whenever there is a change (insert, update, delete) on a source table, it is recorded in delta tables in the bucardo schema. A notification is then sent to the Bucardo daemon process, which handles replicating the change to the destination.

My configuration

I have two rds instances and one EC2 instance with CentOS 7.

 host    | database   | version
---------+------------+--------------------
 pghost1 | compliance | 9.5
 pghost2 | compliance | 11
 bucardo | bucardo    | 10 (configuration)

Installation

The simplest way to install Bucardo is from the PostgreSQL yum repository. After installation, configure Bucardo to connect to all databases (source and destination) without prompting for a password. Create the users with rds_superuser permission and make entries for them in the pg_hba.conf of the bucardo instance. As this is my private setup I created the users with the same password and used * for all instances in the .pgpass file.

yum install epel-release
yum install bucardo_10
[root@bucardo usr]# systemctl enable postgresql-10
Created symlink from /etc/systemd/system/multi-user.target.wants/postgresql-10.service to /usr/lib/systemd/system/postgresql-10.service.
[root@bucardo usr]# /usr/pgsql-10/bin/postgresql-10-setup initdb
Initializing database ... OK
[root@bucardo usr]# systemctl start postgresql-10
# Set the configuration parameters
postgres=# alter system set listen_addresses to '*';
ALTER SYSTEM
postgres=# alter system set shared_buffers to '250MB';
ALTER SYSTEM
# Add a line to pg_hba.conf
host    all             all             0.0.0.0/0               md5
# Create a file ~/.pgpass file with below content add in all hosts
-bash-4.2$ cat ~/.pgpass
*:*:*:*:<password>
# I have set password for all users to postgres so I set it like that
systemctl restart postgresql-10

Create bucardo repository database

Bucardo uses a repository database to maintain metadata about the instances involved in the replication and other information. Log in to the bucardo instance and create the database.

postgres=# create user bucardo password 'postgres' superuser;
CREATE ROLE
postgres=# create database bucardo owner bucardo;
CREATE DATABASE

Install bucardo repository

This step creates the metadata tables in the bucardo database created in the previous step.

Create the folders required to store the log files and the process id (PID) files.

[root@bucardo usr]# mkdir -p /var/run/bucardo
[root@bucardo usr]# mkdir -p /var/log/bucardo
[root@bucardo usr]# chown -R postgres:postgres /var/run/bucardo
[root@bucardo usr]# chown -R postgres:postgres /var/log/bucardo

Install bucardo

-bash-4.2$ bucardo install
This will install the bucardo database into an existing Postgres cluster.
Postgres must have been compiled with Perl support,
and you must connect as a superuser

Current connection settings:
1. Host:           <none>
2. Port:           5432
3. User:           bucardo
4. Database:       bucardo
5. PID directory:  /var/run/bucardo
Enter a number to change it, P to proceed, or Q to quit: 1

# Change hostname to localhost and select P to proceed
Enter a number to change it, P to proceed, or Q to quit: P


Attempting to create and populate the bucardo database and schema
Database creation is complete

Updated configuration setting "piddir"
Installation is now complete.
If you see errors or need help, please email bucardo-general@bucardo.org

You may want to check over the configuration variables next, by running:
bucardo show all
Change any setting by using: bucardo set foo=bar

Now the bucardo installation is complete. Next we will configure the instances for replication.

Configure instances

Take a schema-only backup of the compliance database from the 9.5 instance and restore it on the 11 instance.

pg_dump -C -d compliance -h 192.168.159.137 --schema-only | psql
# Install the extension plperl in both the instances
# For 9.5,11 version install plperl
yum install postgresql11-plperl
compliance=# create extension plperl;
CREATE EXTENSION

As we are using AWS RDS (where session_replication_role cannot be set directly), create the below function in both the source and destination databases.

postgres=# CREATE OR REPLACE FUNCTION public.rds_session_replication_role(role text)
postgres-#  RETURNS text
postgres-#  LANGUAGE plpgsql
postgres-#  SECURITY DEFINER
postgres-# AS $function$
postgres$#         DECLARE
postgres$#                 curr_val text := 'unset';
postgres$#         BEGIN
postgres$#                 EXECUTE 'SET session_replication_role = ' || quote_literal(role);
postgres$#                 EXECUTE 'SHOW session_replication_role' INTO curr_val;
postgres$#                 RETURN curr_val;
postgres$#         END
postgres$# $function$;
CREATE FUNCTION
postgres=# revoke all on function rds_session_replication_role(text) from public;
REVOKE
Also, in the /usr/local/share/perl5/Bucardo.pm file (at lines 5397 and 5330), replace the line:
$dbh->do(q{SET session_replication_role = default}); ## Assumes a sane default !
with:
$dbh->do(q{select rds_session_replication_role('replica');}); ## Assumes a sane default !

Add databases to bucardo

Add the databases to bucardo so they can be part of the replication, and create a database group.

postgres@bucardo $ bucardo add db pgdb95 dbhost=192.168.159.137 dbport=5432 dbname=compliance dbuser=bucardo dbpass=postgres
Added database "pgdb95"
postgres@bucardo $ bucardo add db pgdb11 dbhost=192.168.159.130 dbport=5432 dbname=compliance dbuser=bucardo dbpass=postgres
Added database "pgdb11"
postgres@bucardo $
postgres@bucardo $ bucardo list db
Database: pgdb11  Status: active  Conn: psql -p 5432 -U bucardo -d compliance -h 192.168.159.130
Database: pgdb95  Status: active  Conn: psql -p 5432 -U bucardo -d compliance -h 192.168.159.137
postgres@bucardo $
postgres@bucardo $ bucardo add dbgroup migration_group
Created dbgroup "migration_group"
postgres@bucardo $
[root@bucardo ~]# bucardo add dbgroup migration_group pgdb95:source
Added database "pgdb95" to dbgroup "migration_group" as source
[root@bucardo ~]# bucardo add dbgroup migration_group pgdb11:target
Added database "pgdb11" to dbgroup "migration_group" as target
postgres@bucardo $

Add tables and sequences to be part of replication

We can add specific tables to be part of the replication process; here I am adding all tables because of my requirement. This does not start the replication, it just records them as a herd (relation group).

[root@bucardo ~]# bucardo add all tables db=pgdb95 relgroup=migration_rel_group
Creating relgroup: migration_rel_group
Added table public.pgbench_branches to relgroup migration_rel_group
Added table public.pgbench_tellers to relgroup migration_rel_group
Added table public.pgbench_history to relgroup migration_rel_group
Added table public.pgbench_accounts to relgroup migration_rel_group
New tables added: 0
Already added: 4
[root@bucardo ~]#
[root@bucardo ~]# bucardo add all sequences db=pgdb95 relgroup=migration_rel_group --verbose
Relgroup already exists: migration_rel_group
Added sequence public.pgbench_history_sid_seq to relgroup migration_rel_group
New sequences added: 0
Already added: 1

[root@bucardo ~]# bucardo list herd
Relgroup: migration_rel_group  DB: pgdb95  Members: public.pgbench_accounts, public.pgbench_branches, public.pgbench_history, public.pgbench_history_sid_seq, public.pgbench_tellers

Create sync

The sync is what starts the replication by creating the triggers and other objects. Here I am creating the sync with autokick=0, which means changes (insert, update, delete) are noted but not yet replicated.

postgres@bucardo $ bucardo add sync migration_sync dbs=migration_group herd=migration_rel_group onetimecopy=2 autokick=0
Added sync "migration_sync"

postgres@bucardo $ bucardo list sync
Sync "migration_sync"  Relgroup "migration" [Active]
  DB group "migration_group" pgdb11:target pgdb95:source

Now take a backup of the data from the source database and copy it to the destination. After the data copy is complete, start the replication process.

# Copy the data
pg_dump -v -U postgres -h 192.168.159.137 -d compliance --data-only --disable-triggers -N bucardo | PGOPTIONS='-c session_replication_role=replica' psql compliance
# Start replication
bucardo -U bucardo -d bucardo -P postgres update sync migration_sync autokick=1
# start bucardo
[root@bucardo bucardo]# bucardo start
Checking for existing processes
Removing file "/var/run/bucardo/fullstopbucardo"
Starting Bucardo

Check the bucardo status

[root@bucardo bucardo]# bucardo status
PID of Bucardo MCP: 11239
 Name             State    Last good    Time    Last I/D    Last bad    Time
================+========+============+=======+===========+===========+=======
 migration_sync | Good   | 06:55:33   | 52s   | 0/0       | none      |

# to check the status of the sync
postgres@bucardo $ bucardo status migration_sync
======================================================================
Sync name                : migration_sync
Current state            : No records found
Source relgroup/database : migration / pgdb95
Tables in sync           : 5
Status                   : Active
Check time               : None
Overdue time             : 00:00:00
Expired time             : 00:00:00
Stayalive/Kidsalive      : Yes / Yes
Rebuild index            : No
Autokick                 : Yes
Onetimecopy              : Yes
Post-copy analyze        : Yes
Last error:              :

Once the databases are in sync we can schedule the downtime: stop the application, check the replication status again, and stop and start bucardo to replicate the sequences. I observed that sequences are not replicated immediately; only after restarting bucardo did I see them replicated. After that, start the application pointing to the new instance.
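
Before the cutover it is worth comparing the sequence values on both sides; a quick sanity check, using the hosts added earlier and the pgbench_history_sid_seq sequence from the relgroup:

# Compare the sequence value on the 9.5 source and the 11 destination
psql -h 192.168.159.137 -U bucardo -d compliance -c "select last_value from public.pgbench_history_sid_seq"
psql -h 192.168.159.130 -U bucardo -d compliance -c "select last_value from public.pgbench_history_sid_seq"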

errors:

Can't locate Sys/Syslog.pm in @INC


[root@localhost perl5]# bucardo start
Checking for existing processes
Can't locate Sys/Syslog.pm in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at /usr/local/share/perl5/Bucardo.pm line 32.
BEGIN failed--compilation aborted at /usr/local/share/perl5/Bucardo.pm line 32.
Compilation failed in require at /usr/local/bin/bucardo line 836.

Solution:
yum install perl-Sys-Syslog

Reference:

https://aws.amazon.com/blogs/database/migrating-legacy-postgresql-databases-to-amazon-rds-or-aurora-postgresql-using-bucardo/

How to take backup of sequence’s in PostgreSQL

Sometimes we might need to copy sequences from one database to another. Below is a query that builds a pg_dump command to take a backup of all sequences in the database.

SELECT 'pg_dump -h <host> -d <database> -U <username> '||string_agg('-t ' || n.nspname||'.'||c.relname, ' ')
 FROM pg_catalog.pg_class c
      LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
 WHERE c.relkind IN ('S','')
       AND n.nspname <> 'pg_catalog'
       AND n.nspname <> 'information_schema'
       AND n.nspname !~ '^pg_toast'
   AND pg_catalog.pg_table_is_visible(c.oid);

Replace the placeholders host, database, and username with actual values.

Example

postgres=# SELECT 'pg_dump -h localhost -d postgres -U postgres -f sequence_dump.sql '||string_agg('-t ' || n.nspname||'.'||c.relname, ' ')
 as pg_dump_statement
 FROM pg_catalog.pg_class c
      LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
 WHERE c.relkind IN ('S','')
       AND n.nspname <> 'pg_catalog'
       AND n.nspname <> 'information_schema'
       AND n.nspname !~ '^pg_toast'
   AND pg_catalog.pg_table_is_visible(c.oid);
                                                                                       pg_dump_statement

-------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------
 pg_dump -h localhost -d postgres -U postgres -f sequence_dump.sql -t public.branch_id_seq -t public.test_sq -t public.master_id_seq -t public.test_seq -t public.pgbench_te
st_id_seq -t public.test_seq_1
(1 row)

Execute the output of the SQL statement (the pg_dump command) and it will back up all sequences into sequence_dump.sql, which can be restored into other databases.
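
Restoring on the target is just running the dump file with psql; for example (connection details are placeholders):

psql -h <target_host> -d <target_database> -U <username> -f sequence_dump.sql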

Reference:

https://www.postgresql.org/docs/9.6/app-pgdump.html

Build query to get all columns except some columns

This is one of the requirements we get in the DBA world: select all the columns of a table except one or two. There is no SQL standard syntax like "select * from table (except some columns)". If the table has many columns, it is tedious to write the full query listing every column except the excluded ones.

We can use the below query to build the required query

select 'select '||string_agg(attname, ',')||' from '||nspname||'.'||pg_class.relname||';' as query from pg_attribute join pg_class on (pg_class.oid=pg_attribute.attrelid) join pg_namespace on (pg_class.relnamespace=pg_namespace.oid) where pg_class.relname='<tablename>' and nspname='<schemaname>' and attnum > 0 and attname not in ('<exception column list>') group by nspname, pg_class.relname;

Example

I have a table pgbench_accounts in the d_schema schema, and I want to exclude the two columns abalance and filler and get the remaining ones.

compliance=# \d d_schema.pgbench_accounts
  Table "d_schema.pgbench_accounts"
  Column  |     Type      | Modifiers
----------+---------------+-----------
 aid      | integer       | not null
 bid      | integer       |
 abalance | integer       |
 filler   | character(84) |
Indexes:
    "pgbench_accounts_pkey" PRIMARY KEY, btree (aid)

compliance=#
compliance=#
compliance=# select 'select '||string_agg(attname, ',')||' from '||nspname||'.'||pg_class.relname||';' as query from pg_attribute join pg_class on (pg_class.oid=pg_attribute.attrelid) join pg_namespace on (pg_class.relnamespace=pg_namespace.oid) where pg_class.relname='pgbench_accounts' and nspname='d_schema' and attnum > 0 and attname not in ('abalance','filler') group by nspname, pg_class.relname;
                     query
------------------------------------------------
 select aid,bid from d_schema.pgbench_accounts;
(1 row)

Find parameters set for user and database

Sometimes we set parameters in PostgreSQL specific to a user and database, so the parameter takes effect only when that user connects to that database. To do that we can use the below command.

--- set the statement timeout to 1 minute for a user when connected to the postgres database
ALTER USER viswamitra IN DATABASE postgres SET statement_timeout TO 60000;

We can see this value only when we connect to that database as that user and execute "show <parameter>". To see all such configuration parameters, use the below query.

SELECT coalesce(role.rolname, 'database wide') as role, 
       coalesce(db.datname, 'cluster wide') as database, 
       setconfig as what_changed
FROM pg_db_role_setting role_setting
LEFT JOIN pg_roles role ON role.oid = role_setting.setrole
LEFT JOIN pg_database db ON db.oid = role_setting.setdatabase;

Example

postgres=# ALTER USER viswamitra IN DATABASE postgres SET statement_timeout TO 60000;
ALTER ROLE
postgres=#
postgres=# SELECT coalesce(role.rolname, 'database wide') as role,
postgres-#        coalesce(db.datname, 'cluster wide') as database,
postgres-#        setconfig as what_changed
postgres-# FROM pg_db_role_setting role_setting
postgres-# LEFT JOIN pg_roles role ON role.oid = role_setting.setrole
postgres-# LEFT JOIN pg_database db ON db.oid = role_setting.setdatabase;
    role    | database |       what_changed
------------+----------+---------------------------
 viswamitra | postgres | {statement_timeout=60000}
(1 row)
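
To remove such a setting later, the same ALTER syntax supports RESET:

--- remove the per-user, per-database override
ALTER USER viswamitra IN DATABASE postgres RESET statement_timeout;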