Tag Archives: POSTGRES

Query to kill idle connections in PostgreSQL

PostgreSQL has a process-oriented architecture: for every user connection, one backend process is created. If the user forgets to close the connection, that process still consumes some resources. Below is a query to close idle connections.

--- Close the connections which are idle for more than 15 mins

SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE datname = 'databasename'
   AND pid <> pg_backend_pid()
   AND state = 'idle'
   and extract(epoch from (current_timestamp - query_start)) > 15*60;
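Before running the kill query, it can be safer to preview which sessions would be terminated. Below is a hedged variant with the same filter; note that for idle sessions, state_change (the time the session last changed state, i.e. went idle) is often a more accurate idle measure than query_start:

```sql
--- Preview idle sessions before terminating them
SELECT pid, usename, query_start, state_change
FROM pg_stat_activity
WHERE datname = 'databasename'
   AND pid <> pg_backend_pid()
   AND state = 'idle'
   AND extract(epoch from (current_timestamp - state_change)) > 15*60;
```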

Identify whether a role has execute permission on a function in PostgreSQL

To identify whether a role has execute permission on a function, PostgreSQL has a catalog function, has_function_privilege(), which makes it easy. Below is a simple example.

I have created a simple schema test with a function row_count() which returns the row count of the employee table. I have granted execute permission on the function to the reader role.

postgres=# create schema test;
postgres=# revoke USAGE on SCHEMA test from PUBLIC ;
postgres=# revoke execute on all functions in schema test from public;
postgres=# set search_path to test, public;
postgres=# CREATE OR REPLACE FUNCTION row_count()
postgres-#  RETURNS integer AS $total$
postgres$#  declare
postgres$#  total integer;
postgres$#  BEGIN
postgres$#  SELECT count(1) into total FROM public.employee;
postgres$#  RETURN total;
postgres$#  END;
postgres$#  $total$
postgres-#  LANGUAGE plpgsql;
postgres=# create role reader;
postgres=# create role temp;
postgres=# grant usage on schema test to reader ;
postgres=# grant USAGE on SCHEMA test to temp;
postgres=# grant SELECT on TABLE public.employee to reader ;
postgres=# grant EXECUTE on FUNCTION test.row_count to reader ;

To check whether the reader and temp roles have execute permission on the row_count() function:

postgres=# SELECT has_function_privilege('reader', 'test.row_count()', 'execute');
 has_function_privilege
 t
(1 row)

postgres=# SELECT has_function_privilege('temp', 'test.row_count()', 'execute');
 has_function_privilege
 f
(1 row)

The reader role has permission to execute the function, but temp does not.

postgres=# set role reader ;
postgres=> select test.row_count();
(1 row)

postgres=> set role temp;
postgres=> select test.row_count();
ERROR:  permission denied for function row_count
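As a sketch of an alternative, the execute grants on all functions in a schema can also be listed from the information_schema (one row per grantee and function):

```sql
--- List execute grants on functions in schema test
SELECT grantee, routine_name, privilege_type
FROM information_schema.routine_privileges
WHERE routine_schema = 'test';
```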

Dynamic partitions in PostgreSQL

Up to PostgreSQL 9.6, to partition a table we use trigger functions and inheritance. With range partitioning (on a date or a sequence number) we need to create the partitions manually in advance and change the trigger function every time.

We had a similar situation: a transactions table partitioned quarter-wise, where we needed to create partitions manually and change the trigger function whenever the limit was reached.

To overcome this I wrote a function which creates the partition if it does not exist, and also creates the same indexes as on the master table. Below are the table and the function. The same function can be used for month-wise partitions with some changes.

Table Definition

Below is my table definition. We need to partition this table on the transaction_date column, quarter-wise. At any time we need to keep only the last two quarters of data and drop the remaining partitions.

create table client_transactions(
    id bigserial primary key,
    client_id varchar,
    transaction_date date not null,
    ISIN varchar,
    price decimal,
    description text
);

Trigger Function

To partition the table we need to create a trigger function on the master table, which will handle the creation of child table, creation of indexes on child table same as in the master table.

create or replace function client_transactions_function()
returns trigger as $$
declare
    table_name varchar;
    quarter int;
    ts_begin date;
    ts_end date;
    query text;
    rec RECORD;
    t_year int;
    n_year int;
begin
    select EXTRACT(QUARTER FROM new.transaction_date) into quarter;
    select EXTRACT(year from new.transaction_date) into t_year;
    table_name := format('client_transactions_%s_Q%s', t_year, quarter);
    perform 1 from pg_class where lower(relname) = lower(table_name) limit 1;
    if not found then
        CASE quarter
          WHEN 1 THEN select t_year||'-01-01' into ts_begin; select t_year||'-04-01' into ts_end;
          WHEN 2 THEN select t_year||'-04-01' into ts_begin; select t_year||'-07-01' into ts_end;
          WHEN 3 THEN select t_year||'-07-01' into ts_begin; select t_year||'-10-01' into ts_end;
          WHEN 4 THEN select t_year||'-10-01' into ts_begin; n_year := t_year+1; select n_year||'-01-01' into ts_end;
          ELSE RAISE EXCEPTION 'Wrong date range';
        END CASE;
        -- create the child table with the same columns, indexes and constraints as the master
        execute format('create table %s (like client_transactions including all)', table_name);
        execute format('alter table %s inherit client_transactions, add check (transaction_date >= ''%s'' and transaction_date < ''%s'')', table_name, ts_begin, ts_end);
    end if;
    execute 'insert into ' || table_name || ' values ( ($1).* )' using new;
    return null;
end;
$$ language plpgsql;

In this function we first verify the existence of the table "client_transactions_<year>_<quarter>". If it exists, we insert the data into it; otherwise we create the table with its indexes and then insert the data into it.
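As a side note, the quarter boundary computation in the CASE block can also be expressed with date_trunc, which avoids hard-coding the month values; a minimal sketch:

```sql
--- Quarter boundaries via date_trunc (equivalent to the CASE block)
SELECT date_trunc('quarter', date '2020-05-15')::date AS ts_begin,
       (date_trunc('quarter', date '2020-05-15') + interval '3 months')::date AS ts_end;
--- ts_begin = 2020-04-01, ts_end = 2020-07-01
```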

Trigger creation

Now we will proceed and create the trigger.

create trigger insert_client_transactions_trigger
before insert on client_transactions
for each row execute procedure client_transactions_function();

This creates the trigger on the client_transactions table and executes the function before each row is inserted into client_transactions.

Verify the trigger function

Now that the trigger creation is complete, we will verify it by inserting data into the table.

--- inserting some random data into the table
insert into client_transactions(client_id,transaction_date,ISIN,price,description) values ('EI1W34',now()::date,'INE001A01036',986282.1022,'HDFC BANK');

insert into client_transactions(client_id,transaction_date,ISIN,price,description) values ('EI1W34',now()::date,'INE001A01036',986282.1022,'HDFC BANK');

insert into client_transactions(client_id,transaction_date,ISIN,price,description) values ('EI1W34','2020-01-01'::date,'INE001A01036',986282.1022,'HDFC BANK');
--- checking new child tables created or not
partitioning=# \dt
                    List of relations
 Schema |            Name             | Type  |  Owner
 public | client_transactions         | table | postgres
 public | client_transactions_2020_q1 | table | postgres
 public | client_transactions_2020_q2 | table | postgres
(3 rows)
--- checking table definition
partitioning=# \d client_transactions_2020_q2
                               Table "public.client_transactions_2020_q2"
      Column      |       Type        |                            Modifiers
 id               | bigint            | not null default nextval('client_transactions_id_seq'::regclass)
 client_id        | character varying |
 transaction_date | date              | not null
 isin             | character varying |
 price            | numeric           |
 description      | text              |
Indexes:
    "client_transactions_2020_q2_pkey" PRIMARY KEY, btree (id)
Check constraints:
    "client_transactions_2020_q2_transaction_date_check" CHECK (transaction_date >= '2020-04-01'::date AND transaction_date < '2020-07-01'::date)
Inherits: client_transactions

As we can see, two tables for the two different quarters got created. Each child table also got all the indexes and constraints (primary key, not null, etc.) of the master table.

Now we will execute a query and verify the partition scanning.

partitioning=# explain analyze select * from client_transactions where transaction_date >= '2020-04-19';
                                                           QUERY PLAN
 Append  (cost=0.00..16.00 rows=161 width=140) (actual time=0.008..0.009 rows=2 loops=1)
   ->  Seq Scan on client_transactions  (cost=0.00..0.00 rows=1 width=140) (actual time=0.003..0.003 rows=0 loops=1)
         Filter: (transaction_date >= '2020-04-19'::date)
   ->  Seq Scan on client_transactions_2020_q2  (cost=0.00..16.00 rows=160 width=140) (actual time=0.004..0.005 rows=2 loops=1)
         Filter: (transaction_date >= '2020-04-19'::date)
 Planning time: 0.214 ms
 Execution time: 0.029 ms
(7 rows)

As we can see, only the child table "client_transactions_2020_q2" got scanned.
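Since only the last two quarters need to be kept, the older child tables have to be dropped. Below is a rough sketch of how that could be automated, assuming the naming scheme above; pg_inherits lists the children of the master table, and the name comparison works because the names sort chronologically (year, then quarter):

```sql
--- Drop child partitions older than the previous quarter (sketch)
DO $$
DECLARE
    child text;
BEGIN
    FOR child IN
        SELECT c.relname
        FROM pg_inherits i
        JOIN pg_class c ON c.oid = i.inhrelid
        WHERE i.inhparent = 'client_transactions'::regclass
          AND c.relname < format('client_transactions_%s_q%s',
                extract(year from current_date - interval '3 months'),
                extract(quarter from current_date - interval '3 months'))
    LOOP
        EXECUTE format('drop table %I', child);
    END LOOP;
END;
$$;
```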


Delete rows which are not present in another table

We have two tables, left and right. We need to delete the rows from left which are not present in the right table, based on particular column(s). Below are the details of the tables.
feeds=> \dt+ right
                  List of relations
 Schema | Name | Type  | Owner |  Size  | Description
 public | right | table | oguri | 175 MB |
(1 row)
feeds=> \dt+ left
                          List of relations
   Schema    |    Name     | Type  |  Owner   |  Size  | Description
 vendor_data | left    | table | oguri   | 641 MB |
(1 row)
When I executed the below query it was taking a lot of time (after 30 minutes the query was still running), so I cancelled it.

delete from left where (identifier, id_bb_global) NOT IN (SELECT identifier, id_bb_global FROM right)
AND region='asia';
After that I executed the below command to check the estimated execution plan for the query (I changed delete to select with the same condition).

feeds=> explain select * from left where (identifier, id_bb_global) NOT IN
(SELECT identifier, id_bb_global FROM right) AND region='asia';
                                 QUERY PLAN                               
 Seq Scan on left  (cost=0.00..3345120516.58 rows=87786 width=944)
   Filter: (((region)::text = 'asia'::text) AND (NOT (SubPlan 1)))
   SubPlan 1
     ->  Materialize  (cost=0.00..26237.23 rows=173882 width=31)
           ->  Seq Scan on right  (cost=0.00..24178.82 rows=173882 width=31)
(5 rows)
As you can see in the execution plan, it is materializing the right table. What is Materialize in an execution plan?

A Materialize node means that the output of whatever is below it in the tree (a scan, a full set of joins, or something like that) is materialized into memory before the upper node is executed. This is usually done when the outer node needs a source that it can re-scan for some reason or other.

So the right table will be scanned and materialized into memory (work_mem). Since PostgreSQL cannot flush a hashed subplan onto disk, it estimates the subquery size; if it decides the result will not fit into work_mem, it resorts to a plain subplan, which is either executed for each row from left, or materialized so that the rows can be searched in a loop.

The query will take a lot of time if work_mem is not enough to store the rows from the subquery. To overcome this we have two solutions.
1) Increase the work_mem value.
2) Use not exists instead of not in.
1) Increase the work_mem.
By looking at the estimated plan above, it needs about rows*width (173882*31) bytes of memory to store the output of the right table in work_mem. I set work_mem to 10MB and executed the query, and it completed successfully.

feeds=> set work_mem to '10MB';
feeds=> explain analyze select * from vendor_data.left where (identifier, id_bb_global) NOT IN (SELECT identifier, id_bb_global FROM right) AND region='asia';
                                                       QUERY PLAN                                                     
 Seq Scan on left  (cost=24613.53..111080.82 rows=87197 width=941) (actual time=679.007..679.007 rows=0 loops=1)
   Filter: (((region)::text = 'asia'::text) AND (NOT (hashed SubPlan 1)))
   Rows Removed by Filter: 250352
   SubPlan 1
     ->  Seq Scan on right  (cost=0.00..24178.82 rows=173882 width=31) (actual time=0.004..214.924 rows=173882 loops=1)
 Planning time: 0.941 ms
 Execution time: 680.191 ms
(7 rows)
2) Use not exists instead of not in.
When I removed the "NOT IN" clause and used "NOT EXISTS" it worked perfectly; below is the execution plan for the same. It is also a better execution plan than the first one.

feeds=> explain analyze delete from vendor_data.left t1 where not exists (SELECT 1 FROM right where identifier=t1.identifier and id_bb_global=t1.id_bb_global) AND region='asia';
                                                             QUERY PLAN                                                             
 Delete on left t1  (cost=28146.05..119478.96 rows=91197 width=12) (actual time=777.052..777.052 rows=0 loops=1)
   ->  Hash Anti Join  (cost=28146.05..119478.96 rows=91197 width=12) (actual time=304.233..776.662 rows=71 loops=1)
         Hash Cond: (((t1.identifier)::text = (right.identifier)::text) AND ((t1.id_bb_global)::text = (right.id_bb_global)::text))
         ->  Seq Scan on left t1  (cost=0.00..85206.35 rows=175571 width=37) (actual time=0.078..345.248 rows=173953 loops=1)
                Filter: ((region)::text = 'asia'::text)
               Rows Removed by Filter: 76489
         ->  Hash  (cost=24178.82..24178.82 rows=173882 width=37) (actual time=290.545..290.545 rows=173882 loops=1)
               Buckets: 65536  Batches: 4  Memory Usage: 3427kB
               ->  Seq Scan on right  (cost=0.00..24178.82 rows=173882 width=37) (actual time=0.003..210.958 rows=173882 loops=1)
 Planning time: 0.612 ms
 Trigger _td_bl_left_trigger: time=15.105 calls=71
 Execution time: 792.229 ms
(12 rows)
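Apart from performance, note that NOT IN and NOT EXISTS are not always interchangeable: if the subquery can return NULLs in identifier or id_bb_global, NOT IN never matches. A minimal illustration with hypothetical values:

```sql
SELECT 1 WHERE 10 NOT IN (SELECT unnest(ARRAY[20, NULL]));
--- no rows: 10 <> NULL is unknown, so NOT IN can never be true
SELECT 1 WHERE NOT EXISTS (SELECT 1 FROM unnest(ARRAY[20, NULL]) v WHERE v = 10);
--- one row: no matching row exists, and NULLs do not get in the way
```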

script to take base backup and delete old backup files

# Create backup of the master Postgres server and compress it

export PATH=/home/postgres/software/9.6/bin:$PATH
d=$(date +%Y-%m-%d)
export PGPASSWORD="password"
TO="user1@yahoo.com user2@yahoo.com user3@yahoo.com user4@yahoo.com"
# Set these for your environment (placeholder values)
MASTERD=master-host
BACKUP_LOC=/backup
LOG_LOC=/var/log/pgbackup
LOGFILE=backup_$d.log
DIRECTORY=$d
ARCHIVE_LOC=/archive

# Create a directory for today's backup
mkdir -p $BACKUP_LOC/$DIRECTORY

echo "Backup started at $(date) by $0" > $LOG_LOC/$LOGFILE
# Stream mode does not compress
pg_basebackup -h $MASTERD -U back_user -p 5432 -D $BACKUP_LOC/$DIRECTORY --xlog-method=stream -v --checkpoint=fast >> $LOG_LOC/$LOGFILE 2>&1
if [ $? -eq 0 ]; then
    echo "-----------------" >> $LOG_LOC/$LOGFILE 2>&1
    echo "Backup completed successfully. Please check the attached logfile" > /tmp/backup_status
else
    echo "-----------------" >> $LOG_LOC/$LOGFILE 2>&1
    echo "Backup failed with some errors. Please check the attached logfile" > /tmp/backup_status
fi

# compress
echo "Compress started at $(date)" >> $LOG_LOC/$LOGFILE
# Create tar file and compress it
tar -czvf $BACKUP_LOC/$d.tar.gz /backup/$d >> /dev/null 2>&1

# clean some old information
/bin/rm -rf /backup/$d >> $LOG_LOC/$LOGFILE 2>&1
echo "Finished at $(date)" >> $LOG_LOC/$LOGFILE

mail -a $LOG_LOC/$LOGFILE -s "Backup status" $TO < /tmp/backup_status
rm -f /tmp/backup_status

## Delete old backups
/usr/bin/find $BACKUP_LOC/*.tar.gz -mtime +28 -delete
/usr/bin/find $LOG_LOC/backup*.log* -mtime +28 -delete
## Delete old archive logs
/usr/bin/find $ARCHIVE_LOC/* -mtime +28 -delete

odd behavior of google cloud machine

I created a machine in google cloud with below.

CPU : 8 vCPUs
RAM : 30 GB
DISK : Standard persistent disk
DB VERSION : PostgreSQL 9.3

When I checked the time taken for a base backup of a 100GB database I got very different times. There were no processes running other than the database, and I was the only user using it.

Time for taking base backup first time

plain format : 31m55.348s 

[postgres@old-server bin]$ time pg_basebackup --pgdata=/tmp/BACKUP --format=p --xlog-method=stream

real    31m55.348s
user    0m12.418s
sys     1m45.688s

compressed format : 29m47.282s

[postgres@old-server tmp]$ time pg_basebackup -D /tmp/BACKUP --format=t --compress=9 --gzip
NOTICE:  pg_stop_backup complete, all required WAL segments have been archived

real    29m47.282s
user    27m44.429s
sys     0m38.565s

Normally the compressed format takes more time, but here it is the reverse: the compressed format took less time.

I have tried the same base backup with xlog-method=stream and without it. This time the backup times were almost double the first run.

Without stream:

[postgres@old-server ~]$ time pg_basebackup --pgdata=/tmp/BACKUP
NOTICE:  pg_stop_backup complete, all required WAL segments have been archived

real    61m2.175s
user    0m16.302s
sys     2m11.380s

With stream:

[postgres@old-server ~]$ time pg_basebackup --pgdata=/tmp/BACKUP --format=p --xlog-method=stream

real    54m38.801s
user    0m15.629s
sys     2m10.309s

query cancellations in postgresql standby server

After configuring streaming replication in PostgreSQL, we can set the hot_standby parameter to on in postgresql.conf on the standby server so that we can execute read-only queries on it.

There are mainly two situations where long running queries on the standby server get cancelled due to conflicting activity on the primary.

The first is vacuum cleanup. Due to PostgreSQL's MVCC implementation, a query on the standby may still need row versions that have already been deleted on the primary and cleaned up by vacuum. When the WAL records containing this vacuum cleanup reach the standby, the conflicting query currently running on the standby is cancelled.

On primary:
Delete from test_123 where id between 20 and 30;
On standby:
postgres=# begin;
postgres=#  set transaction isolation level REPEATABLE READ;
postgres=# select * from test_123 where id between 20 and 30;
FATAL:  terminating connection due to conflict with recovery
DETAIL:  User query might have needed to see row versions that must be removed.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

Here the query on the standby was trying to see old row versions already removed by vacuum on the primary.

To overcome this type of error we can set the hot_standby_feedback parameter, but this stops vacuum from cleaning the old row versions on the primary and ultimately bloats the tables.

The second situation is Access Exclusive locks taken on the primary server, including both explicit LOCK commands and various DDL actions, which conflict with table accesses in standby queries.



On primary:

lock TABLE test_123 in access exclusive mode ;

On standby (after a minute or so):

postgres=# select * from test_123 where id between 20 and 30;
FATAL:  terminating connection due to conflict with recovery
DETAIL:  User was holding a relation lock for too long.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

To overcome this we need to set the max_standby_archive_delay and max_standby_streaming_delay parameters based on how long the queries take. But these parameters pause the application of WAL data, so the standby can fall far behind the primary in terms of data.
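For reference, the relevant standby settings might look like this in postgresql.conf (values are illustrative, not recommendations):

```
hot_standby = on
hot_standby_feedback = on            # avoids vacuum conflicts, at the cost of bloat on the primary
max_standby_streaming_delay = 300s   # how long WAL replay waits for conflicting standby queries
max_standby_archive_delay = 300s
```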

Background process forked by postmaster performs authentication

Previously I thought it was the postmaster that performs client authentication. In fact, the postmaster simply forks a background process, and that child process actually performs the authentication of the client.
When a request message is received, we now fork() immediately. The child process performs authentication of the request, and then becomes a backend if successful. This allows the auth code to be written in a simple single-threaded style (as opposed to the crufty "poor man's multitasking" code that used to be needed). More importantly, it ensures that blockages in non-multi-threaded libraries like SSL or PAM cannot cause denial of service to other clients.



Memory and CPU utilization per session in PostgreSQL

Create the extension "plsh" by following the below method. It is used to execute shell scripts from Postgres functions.

wget https://github.com/petere/plsh/archive/1.20171014.zip
unzip 1.20171014.zip
cd plsh-1.20171014/
make install

postgres=# create extension plsh ;

Create the below functions, which take a process id (PID) and return the memory and CPU utilisation of that backend.

create or replace function mem_util(int) returns text as $$
#!/bin/sh
cat /proc/$1/smaps | grep -i Pss | awk '{SUM=SUM+$2} END {print SUM}'
$$ LANGUAGE plsh;

create or replace function cpu_util(int) returns text as $$
#!/bin/sh
ps --pid=$1 -o %cpu -h
$$ LANGUAGE plsh;

Execute the below command to list the sessions with memory and cpu utilisation.

postgres=# select datname,pid,usename,application_name,state,MEM_UTIL(pid) as MEM_UTIL_KB,CPU_UTIL(pid) as CPU_UTIL_PER from pg_stat_activity where datname is not null;
 datname  |  pid  | usename  | application_name | state  | mem_util_kb | cpu_util_per
 postgres |  9413 | postgres | psql             | active | 6620        |  0.0
 postgres | 16291 | postgres | psql             | idle   | 1059        |  0.0
 postgres | 16293 | postgres | psql             | idle   | 1059        |  0.0
 postgres | 16296 | postgres | psql             | idle   | 1061        |  0.0
 postgres | 16298 | postgres | psql             | idle   | 1060        |  0.0
 postgres | 16330 | postgres | psql             | idle   | 1059        |  0.0
(6 rows)


using stale statistics instead of current ones because stats collector is not responding

When I look at the PostgreSQL logfile I can see the below message many times:

LOG: using stale statistics instead of current ones because stats collector is not responding

PostgreSQL tracks 'runtime statistics' (number of scans of a table, tuples fetched from index/table, etc.) in a file maintained by a separate process (the stats collector). When a backend process requests some of the stats (e.g. when a monitoring tool selects from pg_stat_all_tables), it requests a recent snapshot of the file from the collector.

The log message means that the collector did not handle such requests fast enough, and the backend decided to read an older snapshot instead. So you may see some stale data in monitoring, for example.

This can easily happen if the I/O system is overloaded. The simplest solution is to move the statistics file to a RAM disk (e.g. a tmpfs mount on Linux) using stats_temp_directory in postgresql.conf. The space needed depends on the number of objects (databases, tables, indexes); usually it is a megabyte in total or so.
To improve performance, mount the pg_stat_tmp directory on a RAM disk using the below procedure. Do this as the root user.

echo "tmpfs $PGDATA/pg_stat_tmp tmpfs size=1G,uid=postgres,gid=postgres 0 0" >> /etc/fstab
mount $PGDATA/pg_stat_tmp
[root@s192-169-159-2 ~]# df -h
Filesystem         Size  Used Avail Use% Mounted on
tmpfs              1.0G     0  1.0G   0% /home/postgres/DATA_9.6/pg_stat_tmp
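After mounting, it is worth confirming that stats_temp_directory still points at the mounted path (pg_stat_tmp, relative to the data directory, is the default):

```
postgres=# show stats_temp_directory ;
 stats_temp_directory
 pg_stat_tmp
(1 row)
```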