Parallel query in Postgresql 9.6

Postgresql 9.6 has a new feature which can leverage multiple CPUs in order to answer queries faster. This feature is known as parallel query(It does not support the DML in current release). When the optimizer determines that parallel query is the fastest execution strategy for a particular query, it will create a query plan which includes a Gather node(like root node). 
EX 1 :-
postgres=#  Explain analyze select * from tbl_parallel_test where c1 < 10000
postgres-#  and c2 like ‘%bb%’;
                                                              QUERY PLAN
—————————————————————————————-
 Gather  (cost=1000.00..147608.01 rows=1 width=1008) (actual time=147.086..147.086 rows=0 loops=1) ===> Gather node
   Workers Planned: 4
   Workers Launched: 4
   ->  Parallel Seq Scan on tbl_parallel_test  (cost=0.00..146607.91 rows=1 width=1008) (actual time=141.886..141.886 rows=0 loops=5)
         Filter: ((c1 < 10000) AND (c2 ~~ '%bb%'::text))
         Rows Removed by Filter: 200000
 Planning time: 0.204 ms
 Execution time: 150.866 ms
(8 rows)
In all cases, the Gather node will have exactly one child plan, which is the portion of the plan that will be executed in parallel. If the Gather node is at the very top of the plan tree, then the entire query will execute in parallel. If it is somewhere else in the plan tree, then only that portion of the query will run in parallel.
Using Explain plan like the example above we can see the number of processes the main process which executing requires. When the Gather node in the plan is reached it will ask for that many background worker processes , the background processes will depend on the “max_worker_processes” parameter.
Every background worker process which is successfully started for a given parallel query will execute the portion of the plan which is a descendant of the Gather node.
For below queries the Planner will not use the parallel processes :-
*) The query writes any data or locks any database rows. If a query contains a data-modifying operation either at the top level or within a CTE, no parallel plans for that query will be generated. This is a limitation of the current implementation which could be lifted in a future release.
*) The query might be suspended during execution. In any situation in which the system thinks that partial or incremental execution might occur, no parallel plan is generated. Like in Cursor or in loops.
*) The query uses any function marked PARALLEL UNSAFE. Most system-defined functions are PARALLEL SAFE, but user-defined functions are marked PARALLEL UNSAFE by default.
*) The query is running inside of another query that is already parallel. For example, if a function called by a parallel query issues an SQL query itself, that query will never use a parallel plan. This is a limitation of the current implementation, but it may not be desirable to remove this limitation, since it could result in a single query using a very large number of processes.
*) The transaction isolation level is serializable. This is a limitation of the current implementation.
Some times the query planner shows the parallel but at the execution time it will not pick parallel processes :-
*) No background workers can be obtained because of the limitation that the total number of background workers cannot exceed max_worker_processes.
*) The transaction isolation level is serializable. This situation does not normally arise, because parallel query plans are not generated when the transaction isolation level is serializable. However, it can happen if the transaction isolation level is changed to serializable after the plan is generated and before it is executed.
We can create function with parallel enabled using the PARALLEL key word , there are three keywords we can attach to the function :-
Parallel safe : A parallel safe operation is one which does not conflict with the use of parallel query.
Parallel restricted : A parallel restricted operation is one which cannot be performed in a parallel worker, but which can be performed in the leader while parallel query is in use.
Parallel unsafe : A parallel unsafe operation is one which cannot be performed while parallel query is in use, not even in the leader.
The following operations are always parallel restricted :-
*) Scans of common table expressions (CTEs).
*) Scans of temporary tables.
*) Scans of foreign tables, unless the foreign data wrapper has an IsForeignScanParallelSafe API which indicates otherwise.
*) Access to an InitPlan or SubPlan.
Configuring the Parallelism :-
To enable the parallelism in Postgres, we need to enable the below parameters.
max_parallel_workers_per_gather : Set this to more than zero , setting this more than the cpu core is not so useful.
max_worker_processes : This directly controls the number of background process and indirectly controls the number of parallel process to be created. Set this parameter according to the environment.
“Parallel query works better when the input rows are very high and output rows is less.”
Some times we will see a drastic difference in the plans using the parallel if we are selecting more output rows like below.
EX 2:-
postgres=# set max_parallel_workers_per_gather=4;
SET
postgres=#  explain analyze select * from tbl_parallel_test t1 inner join tb2_parallel_test t2 on (t1.c1=t2.c1);
                                                                      QUERY PLAN
———————————————————————————–
 Gather  (cost=292331.48..730088.66 rows=999977 width=2016) (actual time=9354.606..39907.220 rows=1000000 loops=1)
   Workers Planned: 4
   Workers Launched: 4
   ->  Hash Join  (cost=291331.48..629090.96 rows=999977 width=2016) (actual time=9178.603..27657.812 rows=200000 loops=5)
         Hash Cond: (t2.c1 = t1.c1)
         ->  Parallel Seq Scan on tb2_parallel_test t2  (cost=0.00..145358.02 rows=250002 width=1008) (actual time=0.030..158.447 rows=200000 loops=5)
         ->  Hash  (cost=152857.77..152857.77 rows=999977 width=1008) (actual time=9175.104..9175.104 rows=1000000 loops=5)
               Buckets: 4096 (originally 4096)  Batches: 512 (originally 256)  Memory Usage: 4065kB
               ->  Seq Scan on tbl_parallel_test t1  (cost=0.00..152857.77 rows=999977 width=1008) (actual time=0.015..880.792 rows=1000000 loops=5)
 Planning time: 0.150 ms
 Execution time: 39971.560 ms
(11 rows)
postgres=# set max_parallel_workers_per_gather=0;
SET
postgres=# explain analyze select * from tbl_parallel_test t1 inner join tb2_parallel_test t2 on (t1.c1=t2.c1);
                                                                  QUERY PLAN
———————————————————————————-
 Hash Join  (cost=291331.48..835869.33 rows=999977 width=2016) (actual time=1339.144..7753.688 rows=1000000 loops=1)
   Hash Cond: (t2.c1 = t1.c1)
   ->  Seq Scan on tb2_parallel_test t2  (cost=0.00..152858.06 rows=1000006 width=1008) (actual time=0.039..489.541 rows=1000000 loops=1)
   ->  Hash  (cost=152857.77..152857.77 rows=999977 width=1008) (actual time=1335.402..1335.402 rows=1000000 loops=1)
         Buckets: 4096 (originally 4096)  Batches: 512 (originally 256)  Memory Usage: 4065kB
         ->  Seq Scan on tbl_parallel_test t1  (cost=0.00..152857.77 rows=999977 width=1008) (actual time=0.004..443.079 rows=1000000 loops=1)
 Planning time: 0.152 ms
 Execution time: 7798.724 ms
(8 rows)
Parallel query is taking time 5 times than normal query without parallel.
EX 3:-
postgres=# explain analyze select * from tbl_parallel_test t1 inner join tb2_parallel_test t2 on (t1.c1=t2.c1) where t1.c1 < 1000;
                                                                QUERY PLAN
——————————————————————————–
 Hash Join  (cost=155370.59..311988.97 rows=1030 width=2016) (actual time=942.237..943.025 rows=999 loops=1)
   Hash Cond: (t2.c1 = t1.c1)
   ->  Seq Scan on tb2_parallel_test t2  (cost=0.00..152858.06 rows=1000006 width=1008) (actual time=0.058..410.925 rows=1000000 loops=1)
   ->  Hash  (cost=155357.71..155357.71 rows=1030 width=1008) (actual time=420.606..420.606 rows=999 loops=1)
         Buckets: 2048  Batches: 1  Memory Usage: 1031kB
         ->  Seq Scan on tbl_parallel_test t1  (cost=0.00..155357.71 rows=1030 width=1008) (actual time=418.324..419.794 rows=999 loops=1)
               Filter: (c1 < 1000)
               Rows Removed by Filter: 999001
 Planning time: 0.372 ms
 Execution time: 943.257 ms
(10 rows)
postgres=# set max_parallel_workers_per_gather=4;
SET
postgres=# explain analyze select * from tbl_parallel_test t1 inner join tb2_parallel_test t2 on (t1.c1=t2.c1) where t1.c1 < 1000;
                                                                       QUERY PLAN
———————————————————————————-
 Hash Join  (cost=147098.80..303717.19 rows=1030 width=2016) (actual time=638.002..638.462 rows=999 loops=1)
   Hash Cond: (t2.c1 = t1.c1)
   ->  Seq Scan on tb2_parallel_test t2  (cost=0.00..152858.06 rows=1000006 width=1008) (actual time=0.039..390.565 rows=1000000 loops=1)
   ->  Hash  (cost=147085.93..147085.93 rows=1030 width=1008) (actual time=138.299..138.299 rows=999 loops=1)
         Buckets: 2048  Batches: 1  Memory Usage: 1031kB
         ->  Gather  (cost=1000.00..147085.93 rows=1030 width=1008) (actual time=136.893..137.604 rows=999 loops=1)
               Workers Planned: 4
               Workers Launched: 4
               ->  Parallel Seq Scan on tbl_parallel_test t1  (cost=0.00..145982.93 rows=258 width=1008) (actual time=131.796..132.132 rows=200 loops=5)
                     Filter: (c1 < 1000)
                     Rows Removed by Filter: 199800
 Planning time: 0.157 ms
 Execution time: 638.680 ms
(13 rows)
Parallel query is taking less time than the normal query when selecting less number of rows.
Parameters that effect the parallelism :-
*) max_parallel_workers_per_gather: the number of workers that can assist a sequential scan of a table;
*) min_parallel_relation_size: the minimum size that a relation must have for the planner to consider the use of additional workers;
*) parallel_setup_cost: the planner parameter that estimates the cost of instantiate a worker;
*) parallel_tuple_cost: the planner parameter that estimates the cost of transferring a tuple from one worker to another;
*) force_parallel_mode: parameter useful for testing, strong parallelism and also a query in which the planner would operate in other ways.
References :-

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s