This is a series of posts on PostgreSQL declarative partitioning starting from PostgreSQL 10.
Until PostgreSQL 9.6 traditional partitioning method is used, where inheritance is used to create partitions and to maintain the data integrity CHECK constraints and Triggers are used. This method has following disadvantages.
- Complexity: need to create child tables, triggers, and check constraints.
- Data consistency depend on CHECK constraints defined on individual tables.
- Insert statements are re-directed to the child tables using triggers so it is slow.
In PostgreSQL 10 we don’t need to define the triggers or constraints, also the tuple routing is handled internally. I have done a small benchmark on the performance of 9.6 and 10 partitioning; you can find it at the end.
Types of partitioning in PostgreSQL 10
PostgreSQL supports two types of partitioning
List
To create list partitioned table, specify the name of the column to be used for partition during the create table statement along with partition type as LIST. Only one column can be specified.
postgres=# create table country (state varchar, code int) partition by list(state);
CREATE TABLE
postgres=#
postgres=# create table india_ap partition of country for values in ( 'Andhra Pradesh');
CREATE TABLE
postgres=#
postgres=# create table india_kerala partition of country for values in ( 'Kerla');
CREATE TABLE
postgres=#
postgres=# \d+ country
Table "public.country"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+-------------------+-----------+----------+---------+----------+--------------+-------------
state | character varying | | | | extended | |
code | integer | | | | plain | |
Partition key: LIST (state)
Partitions: india_ap FOR VALUES IN ('Andhra Pradesh'),
india_kerala FOR VALUES IN ('Kerla')
postgres=# insert into country values ('Andhra Pradesh',1);
INSERT 0 1
postgres=# select * from country ;
state | code
----------------+------
Andhra Pradesh | 1
(1 row)
postgres=# insert into country values ('TAMIL NADU',3);
ERROR: no partition of relation "country" found for row
DETAIL: Partition key of the failing row contains (state) = (TAMIL NADU).
If we try to insert data into table which is not in the partition list, then it will through an error.
Obtaining the partition information
The pg_get_partkeydef function can be used to obtain the partition method and column information. Restrictions on each partition can be obtained with the pg_get_partition_constraintdef function.
postgres=# select pg_get_partkeydef('country'::regclass);
pg_get_partkeydef
-------------------
LIST (state)
(1 row)
postgres=# select pg_get_partition_constraintdef('india_kerala'::regclass);
pg_get_partition_constraintdef
------------------------------------------------------------------------
((state IS NOT NULL) AND ((state)::text = 'Kerla'::character varying))
(1 row)
Range
The table is partitioned into “ranges” defined by a key column or set of columns, with no overlap between the ranges of values assigned to different partitions. To create a range partitioned table create table with column name which can be partitioned based on ranges and specify the partition type as RANGE. Multiple columns can be specified by separating with a comma (,). Not null constraint is automatically applied for the partition columns.
Below is an example of how the declarative partitioning works, it is a simple example copied from documentation.
create partitioned table by specifying the partition by type of partition, here we are partitioning a table based on a date (range partitioning)
CREATE TABLE measurement (
city_id int not null,
logdate date not null,
peaktemp int,
unitsales int
) PARTITION BY RANGE (logdate);
Create partitions of the table, while creating mention the range for the date whose data need to be stored.
CREATE TABLE measurement_y2006m02 PARTITION OF measurement
FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');
CREATE TABLE measurement_y2006m03 PARTITION OF measurement
FOR VALUES FROM ('2006-03-01') TO ('2006-04-01');
--- While creating the table we can say “unbounded” for beginning or ending value so that it accept all values and make it as default partition.
CREATE TABLE measurement_default PARTITION OF measurement FOR VALUES FROM ('2006-04-01') TO ('infinity');
--- For partition key with data type of numbers we need to give “unbounded” in the place of infinity.
We can also create partitions for a partition called sub-partitions.
CREATE TABLE measurement_y2006m02 PARTITION OF measurement
FOR VALUES FROM ('2006-02-01') TO ('2006-03-01')
PARTITION BY RANGE (peaktemp);
Delete old partition
The data will be expired based on the SLA we have, once the data is expired we need to delete the data. We can simply drop the partition table using “DROP TABLE”
DROP TABLE measurement_y2006m02;
Recommended option is detaching the partition and take backup if required and delete the partition.
ALTER TABLE measurement DETACH PARTITION measurement_y2006m02;
Drop table take a table level lock on the parent table.
Add a new partition to existing partitioned table
Some times we need to add an existing or create a new one and do bulk load into the table and add it as a partition to the existing table. We can do it like below.
CREATE TABLE measurement_y2008m02
(LIKE measurement INCLUDING DEFAULTS INCLUDING CONSTRAINTS);
ALTER TABLE measurement_y2008m02 ADD CONSTRAINT y2008m02
CHECK (logdate >= DATE '2008-02-01' AND logdate < DATE '2008-03-01');
\copy measurement_y2008m02 from 'measurement_y2008m02'
-- possibly some other data preparation work
ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02
FOR VALUES FROM ('2008-02-01') TO ('2008-03-01');
Before attaching the table as a partition we need to add the constraint else it will scan the complete table during the time of attach.
Constraint exclusion
PostgreSQL use the constraint exclusion during planning phase to find which partitions to scan. All constraints on all partitions of the master table are examined during constraint exclusion, so large numbers of partitions are likely to increase query planning time considerably.
Partitioning using these techniques will work well with up to perhaps a hundred partitions; do not try to use many thousands of partitions.
Limitations
There is no facility to create global index (index across all partitions), in turn there is no facility to create primary key or unique key or exclusion constraint on partitioned tables (on all partitions).
No foreign key creation is possible on partitioned table and partition table cannot refer the key from other table.
Using the ON CONFLICT clause with partitioned tables will cause an error, because unique or exclusion constraints can only be created on individual partitions. There is no support for enforcing uniqueness (or an exclusion constraint) across an entire partitioning hierarchy.
An UPDATE that causes a row to move from one partition to another fails, because the new value of the row fails to satisfy the implicit partition constraint of the original partition.
Row triggers, if necessary, must be defined on individual partitions, not the partitioned table.
Mixing temporary and permanent relations in the same partition tree is not allowed. Hence, if the partitioned table is permanent, so must be its partitions and likewise if the partitioned table is temporary. When using temporary relations, all members of the partition tree must be from the same session.
NOTE
Choosing right partition key is an important task, it will decide the query performance.
Choosing the number of partitions is important task, large number of partitions increase the planning cost and memory consumption. Less number of partitions with large size can degrade query performance. The query planner is generally able to handle partition hierarchies with up to a few hundred partitions.
Truncate command is propagated to all partitions of the table
Copy statement is propagated to child tables
Dropping parent table drops child tables
Add or remove columns do add columns to partition tables but partition column can not be dropped
Catalog tables to store partition information
New catalog table pg_patitioned_table maintain the partitioned table information. pg_class table maintain the information about a table is partitioned or not and boundary information
postgres=# SELECT partrelid::regclass, * FROM pg_partitioned_table ;
partrelid | partrelid | partstrat | partnatts | partattrs | partclass | partcollation | partexprs
-------------+-----------+-----------+-----------+-----------+-----------+---------------+-----------
measurement | 16384 | r | 1 | 2 | 3122 | 0 |
(1 row)
postgres=# SELECT relname, relispartition, relpartbound FROM pg_class WHERE
relname = 'measurement_y2006m02' ;
-[ RECORD 1 ]--+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
relname | measurement_y2006m02
relispartition | t
relpartbound | {PARTITIONBOUNDSPEC :strategy r :listdatums <> :lowerdatums ({PARTITIONRANGEDATUM :kind 0 :value {CONST :consttype 1082 :consttypmod -1 :constcollid 0 :constlen 4 :constbyval true :constisnull false :location 80 :constvalue 4 [ -81 8 0 0 0 0 0 0 ]} :location 80}) :upperdatums ({PARTITIONRANGEDATUM :kind 0 :value {CONST :consttype 1082 :consttypmod -1 :constcollid 0 :constlen 4 :constbyval true :constisnull false :location 98 :constvalue 4 [ -53 8 0 0 0 0 0 0 ]} :location 98}) :location 74}
--- Get the definition of the partition boundary
postgres=# SELECT relname, relispartition, pg_get_expr(relpartbound, oid)
postgres-# FROM pg_class WHERE relname = 'measurement_y2006m02' ;
-[ RECORD 1 ]--+-------------------------------------------------
relname | measurement_y2006m02
relispartition | t
pg_get_expr | FOR VALUES FROM ('2006-02-01') TO ('2006-03-01')
9.6 vs 10 Partitioning performance
I have done a small performance test on 9.6 and 10 version. I created a new partitioned table and loaded data into the table using copy. 10 version is much faster than 9.6 version.
---- PostgreSQL 10 ----
postgres=# create table part_test (id int not null, name varchar not null, address varchar not null) partition by range(id);
CREATE TABLE
postgres=# create table part_test_1 partition of part_test for values from (1) to (10000);
CREATE TABLE
postgres=# create table part_test_2 partition of part_test for values from (10000) to (20000);
CREATE TABLE
postgres=# create table part_test_3 partition of part_test for values from (20000) to (30000);
CREATE TABLE
postgres=# create table part_test_4 partition of part_test for values from (30000) to (40000);
CREATE TABLE
postgres=# create table part_test_5 partition of part_test for values from (40000) to (50000);
CREATE TABLE
postgres=# \copy part_test from 'test.csv' CSV;
COPY 49999
Time: 73.360 ms
---- PostgreSQL 9.6 ----
postgres=# create table part_test (id int not null, name varchar not null, address varchar not null);
CREATE TABLE
postgres=# create table part_test_1(CHECK ( id >= 1 and id < 10000)) inherits (part_test);
CREATE TABLE
postgres=# create table part_test_2(CHECK ( id >= 10000 and id < 20000)) inherits (part_test);
CREATE TABLE
postgres=# create table part_test_3(CHECK ( id >= 20000 and id < 30000)) inherits (part_test);
CREATE TABLE
postgres=# create table part_test_4(CHECK ( id >= 30000 and id < 40000)) inherits (part_test);
CREATE TABLE
postgres=# create table part_test_5(CHECK ( id >= 40000 and id < 50000)) inherits (part_test);
CREATE TABLE
postgres=#
CREATE OR REPLACE FUNCTION part_test_insert_trigger()
RETURNS TRIGGER AS $$
BEGIN
IF ( NEW.id >= 1 AND NEW.id < 10000 )
THEN
INSERT INTO part_test_1 VALUES (NEW.*);
ELSIF ( NEW.id >= 10000 AND NEW.id < 20000 ) THEN
INSERT INTO part_test_2 VALUES (NEW.*);
ELSIF ( NEW.id >= 20000 AND NEW.id < 30000 ) THEN
INSERT INTO part_test_3 VALUES (NEW.*);
ELSIF ( NEW.id >= 30000 AND NEW.id < 40000 ) THEN
INSERT INTO part_test_4 VALUES (NEW.*);
ELSIF ( NEW.id >= 40000 AND NEW.id < 50000 ) THEN
INSERT INTO part_test_5 VALUES (NEW.*);
ELSE
RAISE EXCEPTION 'Date out of range. Fix the measurement_insert_trigger() function!';
END IF;
RETURN NULL;
END;
$$
LANGUAGE plpgsql;
CREATE TRIGGER part_test_trigger
BEFORE INSERT ON part_test
FOR EACH ROW EXECUTE PROCEDURE part_test_insert_trigger();
postgres=# \copy part_test from test.csv CSV;
COPY 0
Time: 546.744 ms
postgres=#
postgres=# select count(*) from part_test;
count
-------
49999
(1 row)
Time: 4.994 ms