PostgreSQL 17 - A Major Step Forward in Performance, Logical Replication and More
After a successful third beta in August 2024, the PostgreSQL development group released the GA version of PostgreSQL 17 on September 26th. I recently blogged about some of the key logical replication features that you'll see in PostgreSQL 17 (https://www.pgedge.com/blog/logical-replication-features-in-Postgres 17). In this blog, I'll describe a couple of new performance features that you'll find in Postgres 17, as well as another important logical replication feature that I didn't cover in my earlier blog in this series.
PostgreSQL has grown remarkably over the years, and with each major release has become a more robust, reliable, and responsive database for both mission-critical and non-mission-critical enterprise applications. The global and vibrant PostgreSQL community contributes to PostgreSQL's success, diligently ensuring that all changes are carefully scrutinized and reviewed before they are added to the project source code. It is also very encouraging to see big technology names like Microsoft, Google, and others investing in Postgres by developing in-house expertise and giving back to the open source community.
Improvements to logical replication are making it even more robust and reliable for enterprise use, while providing core capabilities that vendors like pgEdge can build on to deliver fully distributed PostgreSQL. Distributed PostgreSQL refers to the implementation of PostgreSQL in a distributed architecture, allowing for enhanced scalability, fault tolerance, and improved performance across multiple nodes. A pgEdge fully distributed PostgreSQL cluster already provides essential enterprise features like improved performance with low latency, ultra-high availability, data residency, and fault tolerance.
Now, without further ado, let's discuss some PostgreSQL 17 performance features:
Improved Query Performance with Materialized CTEs
Common Table Expressions (CTEs) in PostgreSQL are temporary result sets that can be referenced within a SELECT, INSERT, UPDATE, or DELETE statement. They enhance the readability and organization of complex queries and can be recursive, making them particularly useful for hierarchical data. The basic syntax of a CTE query is as follows:
WITH cte_names AS
(
    -- query here
)
SELECT * FROM cte_names;
Include the WITH keyword in a query to create the CTE; the parent query (which defines the result set) follows the AS clause after the CTE name. After defining the CTE, you can refer to it by name to reference its result set and carry out further operations on it within the same query.
PostgreSQL 17 continues to enhance performance and capabilities around CTEs, including improvements in query planning and execution. Older versions of Postgres treated every CTE as an optimization fence, meaning the planner could not push predicates down into it. From PostgreSQL 12 onward, however, a non-recursive, side-effect-free CTE that is referenced only once is inlined into the parent query by default, allowing the planner to produce more efficient execution plans. You should always analyze your queries and examine their execution plans when performance is critical.
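As a minimal sketch of that behavior (orders is a hypothetical table used only for illustration):
-- Referenced once and not marked MATERIALIZED, this CTE is inlined on
-- PostgreSQL 12+, so the planner can push the id = 42 predicate down
-- into the scan of orders (a hypothetical table):
EXPLAIN
WITH recent_orders AS (
    SELECT * FROM orders
)
SELECT * FROM recent_orders WHERE id = 42;

-- Adding MATERIALIZED would restore the pre-12 fencing behavior:
-- WITH recent_orders AS MATERIALIZED (SELECT * FROM orders) ...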
Performance tip: if you will be referring to the same result set multiple times, create the CTE with the MATERIALIZED keyword. When you create a materialized CTE, Postgres computes and stores the result of the parent query; subsequent references to the CTE then reuse that stored result instead of repeating the computation.
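For example, here is a minimal sketch (again using a hypothetical orders table) where materializing pays off because the same result set is referenced twice:
-- The aggregate below is computed once and reused by both branches:
WITH regional_totals AS MATERIALIZED (
    SELECT region, sum(amount) AS total
    FROM orders
    GROUP BY region
)
SELECT * FROM regional_totals WHERE total > 10000
UNION ALL
SELECT * FROM regional_totals WHERE total < 100;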
Extracting column statistics from CTE references: Postgres 17 improves materialized CTEs
A materialized CTE acts as an optimization fence: the outer query cannot influence the plan of the sub-query once that plan is chosen. The outer query does, however, have visibility into the estimated width and row counts of the CTE result set, so it makes sense to also propagate the sub-query's column statistics to the planner for the outer query. The outer query can then make use of whatever information is available; the statistical information propagates up to the outer query plan, but never down into the CTE plan, preserving the fence.
This bug report to the community includes a simple test case that demonstrates the effect of this improvement on the query planner:
https://www.postgresql.org/message-id/flat/18466-1d296028273322e2%40postgresql.org
Example - Comparing Postgres 16 behavior to Postgres 17
First, we create our workspace in Postgres 16 (two tables and an index) and run ANALYZE against it:
postgres=# create table t1(a int);
CREATE TABLE
postgres=# create table t2(b int);
CREATE TABLE
postgres=# create index my_index on t1 using btree (a);
CREATE INDEX
postgres=# insert into t1 select generate_series(1, 100000) from generate_series(1, 3);
INSERT 0 300000
postgres=# insert into t2 select generate_series(1, 100) from generate_series(1, 10);
INSERT 0 1000
postgres=# analyze t1;
ANALYZE
postgres=# analyze t2;
ANALYZE
Then, we run a query with a materialized CTE:
postgres=# explain analyze with my_cte as materialized (select b from t2)
select * from t1 where t1.a in (select b from my_cte);
The query plan from our Postgres 16 code sample contains:
QUERY PLAN
----------------------------------------------------------------------
Nested Loop (cost=37.92..856.50 rows=2966 width=4) (actual time=0.574..0.722 rows=300 loops=1)
CTE my_cte
-> Seq Scan on t2 (cost=0.00..15.00 rows=1000 width=4) (actual time=0.038..0.161 rows=1000 loops=1)
-> HashAggregate (cost=22.50..24.50 rows=200 width=4) (actual time=0.449..0.461 rows=100 loops=1)
Group Key: my_cte.b
Batches: 1 Memory Usage: 40kB
-> CTE Scan on my_cte (cost=0.00..20.00 rows=1000 width=4) (actual time=0.046..0.322 rows=1000 loops=1)
-> Index Only Scan using my_index on t1 (cost=0.42..4.06 rows=3 width=4) (actual time=0.002..0.002 rows=3 loops=100)
Index Cond: (a = my_cte.b)
Heap Fetches: 0
Planning Time: 1.242 ms
Execution Time: 1.051 ms
(12 rows)
As you can see in the query plan, the row estimate of 200 coming out of the sub-query is wrong (the HashAggregate actually produces 100 rows), and that misestimate affects the overall plan.
-> HashAggregate (cost=22.50..24.50 rows=200 width=4) (actual time=0.449..0.461 rows=100 loops=1)
Group Key: my_cte.b
Then, we test the same setup and query against PostgreSQL 17:
postgres=# explain analyze with my_cte as materialized (select b from t2)
select * from t1 where t1.a in (select b from my_cte);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=42.25..54.29 rows=302 width=4) (actual time=0.627..0.712 rows=300 loops=1)
Merge Cond: (t1.a = my_cte.b)
CTE my_cte
-> Seq Scan on t2 (cost=0.00..15.00 rows=1000 width=4) (actual time=0.031..0.134 rows=1000 loops=1)
-> Index Only Scan using my_index on t1 (cost=0.42..7800.42 rows=300000 width=4) (actual time=0.027..0.049 rows=301 loops=1)
Heap Fetches: 0
-> Sort (cost=26.82..27.07 rows=100 width=4) (actual time=0.598..0.604 rows=100 loops=1)
Sort Key: my_cte.b
Sort Method: quicksort Memory: 25kB
-> HashAggregate (cost=22.50..23.50 rows=100 width=4) (actual time=0.484..0.494 rows=100 loops=1)
Group Key: my_cte.b
Batches: 1 Memory Usage: 24kB
-> CTE Scan on my_cte (cost=0.00..20.00 rows=1000 width=4) (actual time=0.033..0.324 rows=1000 loops=1)
Planning Time: 1.066 ms
Execution Time: 0.946 ms
(15 rows)
As you can see in the query plan for Postgres 17, the column statistics from the sub-query now propagate correctly to the planner for the outer query. This helps PostgreSQL choose a better plan that improves the execution time of the query. This is a simple query, but with bigger, more complex queries this change can make a major performance difference.
Propagating pathkeys from a CTE to an Outer Query
Another interesting improvement to CTE functionality in Postgres 17 is the propagation of pathkeys from the sub-query to the outer query. In PostgreSQL, pathkeys are part of the query planning process, used primarily for sorting and ordering rows in queries that require ordered results, such as queries with an ORDER BY clause, or when sorting is needed for other operations like merge joins.
Prior to Postgres 17, the sort order of a materialized CTE sub-query was not shared with the outer query, even when that order was guaranteed by an index scan node or a sort node. Without a known sort order, the planner may settle for a less efficient plan; with the sort order available, it is more likely to choose an optimized one.
With PostgreSQL 17, if a CTE is materialized and has a specific sort order, the planner can reuse that information in the outer query, improving performance by avoiding redundant sorting or enabling more efficient join methods. As noted in the commit message by Tom Lane, "The code for hoisting pathkeys into the outer query already exists for regular RTE_SUBQUERY subqueries, but it wasn't getting used for CTEs, possibly out of concern for maintaining an optimization fence between the CTE and the outer query."
This simple modification to the Postgres source code should result in performance improvements for queries involving complex CTEs, especially those where sorting or merge joins can be optimized based on the inherent order of CTE results.
Here is an example using data from the PostgreSQL regression test suite:
postgres=# CREATE TABLE tenk1 (
postgres(# unique1 int4,
postgres(# unique2 int4,
postgres(# two int4,
postgres(# four int4,
postgres(# ten int4,
postgres(# twenty int4,
postgres(# hundred int4,
postgres(# thousand int4,
postgres(# twothousand int4,
postgres(# fivethous int4,
postgres(# tenthous int4,
postgres(# odd int4,
postgres(# even int4,
postgres(# stringu1 name,
postgres(# stringu2 name,
postgres(# string4 name
postgres(# );
CREATE TABLE
postgres=# CREATE INDEX tenk1_unique1 ON tenk1 USING btree(unique1 int4_ops);
CREATE INDEX
postgres=# \copy tenk1 FROM '~/projects/postgres/src/test/regress/data/tenk.data';
COPY 10000
postgres=# VACUUM ANALYZE tenk1;
VACUUM
The query plan from our Postgres 16 code sample contains:
postgres=# explain analyze with x as materialized (select unique1 from tenk1 b order by unique1)
select count(*) from tenk1 a where unique1 in (select * from x);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=764.29..764.30 rows=1 width=8) (actual time=21.592..21.593 rows=1 loops=1)
CTE x
-> Index Only Scan using tenk1_unique1 on tenk1 b (cost=0.29..306.29 rows=10000 width=4) (actual time=0.046..1.415 rows=10000 loops=1)
Heap Fetches: 0
-> Nested Loop (cost=225.28..445.50 rows=5000 width=0) (actual time=7.545..20.911 rows=10000 loops=1)
-> HashAggregate (cost=225.00..227.00 rows=200 width=4) (actual time=7.535..9.051 rows=10000 loops=1)
Group Key: x.unique1
Batches: 1 Memory Usage: 929kB
-> CTE Scan on x (cost=0.00..200.00 rows=10000 width=4) (actual time=0.070..3.933 rows=10000 loops=1)
-> Index Only Scan using tenk1_unique1 on tenk1 a (cost=0.29..1.08 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=10000)
Index Cond: (unique1 = x.unique1)
Heap Fetches: 0
Planning Time: 0.806 ms
Execution Time: 21.890 ms
(14 rows)
The query plan from our Postgres 17 code sample contains:
postgres=# explain analyze with x as materialized (select unique1 from tenk1 b order by unique1)
select count(*) from tenk1 a where unique1 in (select * from x);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Aggregate (cost=987.55..987.56 rows=1 width=8) (actual time=8.777..8.778 rows=1 loops=1)
CTE x
-> Index Only Scan using tenk1_unique1 on tenk1 b (cost=0.29..306.29 rows=10000 width=4) (actual time=0.010..1.095 rows=10000 loops=1)
Heap Fetches: 0
-> Merge Semi Join (cost=0.31..656.26 rows=10000 width=0) (actual time=0.037..8.024 rows=10000 loops=1)
Merge Cond: (a.unique1 = x.unique1)
-> Index Only Scan using tenk1_unique1 on tenk1 a (cost=0.29..306.29 rows=10000 width=4) (actual time=0.013..1.262 rows=10000 loops=1)
Heap Fetches: 0
-> CTE Scan on x (cost=0.00..200.00 rows=10000 width=4) (actual time=0.016..3.678 rows=10000 loops=1)
Planning Time: 0.800 ms
Execution Time: 8.899 ms
(11 rows)
The query plans in Postgres 16 and Postgres 17 are significantly different due to this version 17 enhancement. This is a small example, but the performance gain can be significant in larger queries. Please note that this improvement only takes effect if the CTE sub-query has an ORDER BY clause.
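You can check this yourself: drop the ORDER BY from the CTE and the pathkeys are no longer propagated, so the plan should fall back to a hash-based shape similar to the Postgres 16 plan above. A quick sketch against the same tenk1 table:
postgres=# explain analyze with x as materialized (select unique1 from tenk1 b)
select count(*) from tenk1 a where unique1 in (select * from x);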
Fast B-tree index scans for scalar arrays
In PostgreSQL, ScalarArrayOpExpr is a node type in the execution plan that handles queries involving operations like IN or ANY with arrays or lists of values. It's particularly useful for queries where you compare a column against a set of values, such as:
SELECT * FROM table WHERE column = ANY(ARRAY[1, 2, 3]);
ScalarArrayOpExpr allows PostgreSQL to optimize queries that involve multiple comparisons using IN or ANY. PostgreSQL 17 introduces new performance enhancements to make these operations even faster.
In PostgreSQL 17, significant improvements have been made to B-tree index scans that optimize performance, particularly for queries with large IN lists or ANY conditions. These enhancements reduce the number of index scans performed by the system, thereby decreasing CPU and buffer page contention and resulting in faster query execution.
One of the key improvements is in the handling of scalar array operation expressions (SAOPs), which allows more efficient traversal of B-tree indexes, particularly for multidimensional queries. For example, when you have multiple index columns (each with its own IN list), PostgreSQL 17 can now process these operations in a single index scan, rather than in multiple scans as in earlier versions. This can lead to performance gains of 20-30% in CPU-bound workloads where page accesses were previously a bottleneck.
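To illustrate the multi-column case, here is a hedged sketch that assumes a two-column index on tenk1's thousand and tenthous columns (the regression suite defines a similar tenk1_thous_tenthous index):
postgres=# CREATE INDEX tenk1_thous_tenthous ON tenk1 (thousand, tenthous);
postgres=# EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM tenk1
WHERE thousand IN (1, 2, 3) AND tenthous IN (1001, 2002, 3003);
-- On Postgres 17, both IN lists can be satisfied within a single B-tree
-- index scan; earlier versions may descend the index once per value
-- combination, which shows up as higher scan and buffer counts.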
Additionally, PostgreSQL 17 introduces better management of internal locks, further enhancing performance for high-concurrency workloads, especially when scanning multiple dimensions within a B-tree index.
We can demonstrate this with a simple example. We'll use the same tenk1 table and data that we used in the previous example from the Postgres regression suite.
Our example, first run on Postgres 16:
postgres=# CREATE TABLE tenk1 (
postgres(# unique1 int4,
postgres(# unique2 int4,
postgres(# two int4,
postgres(# four int4,
postgres(# ten int4,
postgres(# twenty int4,
postgres(# hundred int4,
postgres(# thousand int4,
postgres(# twothousand int4,
postgres(# fivethous int4,
postgres(# tenthous int4,
postgres(# odd int4,
postgres(# even int4,
postgres(# stringu1 name,
postgres(# stringu2 name,
postgres(# string4 name
postgres(# );
CREATE TABLE
postgres=# CREATE INDEX tenk1_unique1 ON tenk1 USING btree(unique1 int4_ops);
CREATE INDEX
postgres=# \copy tenk1 FROM '~/projects/postgres/src/test/regress/data/tenk.data';
COPY 10000
postgres=# EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM tenk1 WHERE unique1 IN (1, 2, 3);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on tenk1 (cost=14.20..330.12 rows=176 width=244) (actual time=0.138..0.153 rows=3 loops=1)
Recheck Cond: (unique1 = ANY ('{1,2,3}'::integer[]))
Heap Blocks: exact=3
Buffers: shared hit=9
-> Bitmap Index Scan on tenk1_unique1 (cost=0.00..14.16 rows=176 width=0) (actual time=0.102..0.102 rows=3 loops=1)
Index Cond: (unique1 = ANY ('{1,2,3}'::integer[]))
Buffers: shared hit=6
Planning:
Buffers: shared hit=2
Planning Time: 0.900 ms
Execution Time: 0.242 ms
(11 rows)
postgres=# SELECT idx_scan, idx_tup_fetch FROM pg_stat_user_tables WHERE relname = 'tenk1';
idx_scan | idx_tup_fetch
----------+---------------
3 | 3
(1 row)
In the previous query, you can see that the shared buffer hits for the IN query totaled 9 and that it took 3 index scans to get the results. In PostgreSQL, the term shared hit refers to a specific type of cache hit in buffer management: a shared hit occurs when PostgreSQL reads a data block or page from the shared buffer pool rather than from disk, improving query performance.
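As a side note, if you want cumulative buffer hit and read counts for the table (beyond what a single EXPLAIN reports), you can query the standard statistics view:
postgres=# SELECT heap_blks_read, heap_blks_hit, idx_blks_read, idx_blks_hit
FROM pg_statio_user_tables
WHERE relname = 'tenk1';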
The same example, this time run on Postgres 17:
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM tenk1 WHERE unique1 IN (1, 2, 3);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on tenk1 (cost=12.88..24.08 rows=3 width=244) (actual time=0.043..0.054 rows=3 loops=1)
Recheck Cond: (unique1 = ANY ('{1,2,3}'::integer[]))
Heap Blocks: exact=3
Buffers: shared hit=5
-> Bitmap Index Scan on tenk1_unique1 (cost=0.00..12.88 rows=3 width=0) (actual time=0.026..0.026 rows=3 loops=1)
Index Cond: (unique1 = ANY ('{1,2,3}'::integer[]))
Buffers: shared hit=2
Planning:
Buffers: shared hit=59
Planning Time: 0.479 ms
Execution Time: 0.116 ms
(11 rows)
postgres=# SELECT idx_scan, idx_tup_fetch FROM pg_stat_user_tables WHERE relname = 'tenk1';
idx_scan | idx_tup_fetch
----------+---------------
1 | 3
(1 row)
As you can see, with Postgres 17 the shared buffer hits are reduced to 5, and, most importantly, it performs only one index scan (as opposed to 3 scans in Postgres 16). With this improvement in Postgres 17, the performance of scalar array operations is greatly improved, and Postgres can choose better-optimized query plans.
Retention of logical replication slots and subscriptions during upgrade
The retention of logical replication slots and the migration of subscription dependencies during the major upgrade process is another logical replication feature added in PostgreSQL 17. Please note that this feature only helps when upgrading from PostgreSQL 17 to later versions; it is not supported for upgrades from versions prior to Postgres 17. Replication slots and replication origins are generated when a logical replication environment is built. However, this information is specific to each node (it records replication status, application status, and WAL transmission status), so it was not carried over by the upgrade process. Once the publisher node was upgraded, the user had to manually reconstruct these objects.
The pg_upgrade process is improved in PostgreSQL 17 to reference and rebuild these internal objects; this functionality enables replication to automatically resume when upgrading a node that has logical replication. Previously, when performing a major version upgrade, users had to drop logical replication slots, requiring them to re-synchronize data with the subscribers after the upgrade. This added complexity and increased downtime during upgrades.
You need to follow these steps when upgrading the publisher cluster (a sketch of the corresponding commands follows below):
- Ensure any subscriptions to the publisher are temporarily disabled by performing ALTER SUBSCRIPTION ... DISABLE; these are re-enabled after the upgrade process has completed.
- Set the new cluster's wal_level to logical.
- Set max_replication_slots on the new cluster to a value greater than or equal to the number of replication slots on the old cluster.
- Install the output plugins used by the slots in the new cluster.
- Verify that all the changes from the old cluster have already been replicated to the target cluster prior to the upgrade.
- Verify that all slots on the old cluster are usable by checking the conflicting column in the pg_replication_slots view; conflicting should be false for all slots on the old cluster.
- Verify that no slot on the new cluster has a value of false in the temporary column of the pg_replication_slots view; there should be no permanent logical replication slots on the new cluster.
pg_upgrade will report an error while upgrading replication slots if any of the above prerequisites are not met.
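Here is a minimal sketch of those commands, assuming a hypothetical subscription named my_sub; adjust the names and the connection string for your environment:
-- On each subscriber, before upgrading the publisher:
ALTER SUBSCRIPTION my_sub DISABLE;

-- On the old publisher, confirm every slot is usable (conflicting = false):
SELECT slot_name, conflicting, temporary FROM pg_replication_slots;

-- After pg_upgrade completes, update the connection if the publisher's
-- address changed, then re-enable the subscription:
ALTER SUBSCRIPTION my_sub CONNECTION 'host=upgraded-host dbname=postgres';
ALTER SUBSCRIPTION my_sub ENABLE;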
Conclusion
With PostgreSQL 17, the community's focus continues to be on making PostgreSQL more performant, scalable, secure, and enterprise-ready. Postgres 17 also improves the developer experience by adding new features for compatibility and making existing features more powerful and robust.
The logical replication feature in PostgreSQL has grown rapidly, becoming more mature and robust over the last few years. pgEdge provides distributed PostgreSQL by adding significant new capabilities on top of the Postgres logical replication base; that base enables pgEdge to provide low latency, ultra-high availability, and data residency for enterprise applications. These are critical, must-have capabilities for almost all enterprise applications today.
Beyond version 17, PostgreSQL will continue to grow, improve, and become more performant to cater to enterprise applications that require more scalable databases. Scalability (both horizontal and vertical) has improved over the years, but there is definitely room to improve horizontal scaling by adding sharding to PostgreSQL. We will see more logical replication improvements, with more to come in the areas of DDL replication, replication of missing objects (like sequences), and better node management. The community also recognizes the need to make PostgreSQL more compatible, hence the MERGE command improvements in Postgres 17 and plans for more compatibility features beyond Postgres 17.