PostgreSQL Multi-master Replication and upcoming Logical Replication Improvements in PostgreSQL 16
Replication is a process that reliably copies data from one database server to another database server in an automated fashion. Replication is a core part of an enterprise database solution that:
offers fault tolerance in-case of data mishaps
enables high availability in the event of a node failure
allows incoming traffic to be distributed across replicas for provide better performance
… and more.
This blog is the first of a series discussing the future of logical replication. In this post, I’ll focus on the improvements the community has added to logical replication for PostgreSQL 16. The next post will describe the in-flight PostgreSQL 16 logical replication improvements (those changes that are in progress, but not yet committed). The last post in the series will delve into a new PostgreSQL extension for logical replication called Spock. Spock is a replication solution recently released by pgEdge that leverages both the pgLogical and BDR2 open-source projects as a solid foundation for this enterprise-class extension. Please visit our official site to learn more about pgEdge and Spock.
Spock provides multi-master (multi-active) PostgreSQL replication optimized for the network edge of cloud-based systems (with the cloud provider of your choice) or for databases hosted on-prem. With its logical replication foundation, Spock offers fine-grained control for your data replication and security needs.
PostgreSQL Replication Methods
PostgreSQL supports two native methods of replication: logical replication and physical replication (also called streaming replication).
Logical replication uses a publisher/subscriber model to replicate changes between PostgreSQL servers. The primary node (where the database lives) is called the publisher, and the stand-by node (which receives copies of database transactions) is called the subscriber. Database changes are copied from the publisher node to one or more subscriber node(s) identified by the subscription.
When you set up logical replication, you take a snapshot of the data on the published database, and copy it to the subscriber. When you start the subscription, changes on the publisher are sent to the subscriber as they occur. Logical replication uses a transactional model to apply changes to the subscriber in the same order that they are applied to the publisher. This guarantees transactional consistency.
The other native method of PostgreSQL replication is physical (or streaming) replication. Streaming replication passes the data from the primary node to the stand-by node in WAL (write-ahead log) files. You can configure streaming replication to be either synchronous or asynchronous; by default, streaming replication is asynchronous.
Asynchronous replication ships each log file to the stand-by node after the transaction is committed on the primary server. If something happens to the primary server before the transaction is written to the stand-by, you can potentially lose data.
Synchronous replication writes each WAL record to the primary and stand-by node simultaneously. It is generally safer, but requires a more robust network connection with better bandwidth.
Both synchronous and asynchronous modes of streaming have their own pros and cons. As a rule, synchronous replication offers better data protection in the event of a server problem, while asynchronous replication is more cost effective in terms of required resources. Review the PostgreSQL documentation for more information about native replication methods.
Logical Replication Improvements in PostgreSQL 16
Let’s turn our attention to the main topic of this blog, and summarize the key logical replication improvements that are added to PostgreSQL 16 so far.
Applying changes to the subscriber with background workers
Currently, the changes for large, in-progress transactions are sent from the publisher to subscriber in multiple streams, with the changes divided into chunks based on the value of the logical_decoding_work_mem parameter. PostgreSQL version 16 adds a feature that improves performance by parallelizing the process of applying changes to the subscriber node by using multiple background workers.
The parallel application to the stand-by node begins while the transaction is still in-progress on the primary node. When the application starts, a single worker applies the top-level transaction, while parallel workers begin to apply the sub-transactions. If any of the parallel workers error out, the entire transaction is exited. This functionality provides transactional consistency to ensure that a partially completed bulk insert does not remain in your database.
Performance benchmarking shows that the patch offers a 30 to 40% performance improvement for bulk inserts. You can review the benchmarking as part of the patch history at https://commitfest.postgresql.org/42/3621.
Creating a subscription in binary format
In PostgreSQL version 16, when you create a subscription, you have the option to use binary format for the initial data transfer. Prior to version 16, the initial sync was performed in text format; you could change the format to binary only after logical replication was started. This new functionality allows you to perform the initial sync in the same format that you plan to use for replication.
The COPY command is used behind the scenes of the CREATE SUBSCRIPTION command to copy the data for the initial sync. Since the COPY command supports both binary and text formats, it makes perfect sense to support both. You can use the following clauses to specify the data transfer mode:
When you set binary=false (the default), data is sent in text format.
When you set binary=true, data is sent in binary format.
If your column type supports binary, copying tables in binary format may reduce your initial sync time.
Note that this feature is supported only when both the publisher and subscriber are version 16 or later. Please review the commit fest entry for more details https://commitfest.postgresql.org/42/3840/.
Improving performance by using indexes on the subscription node
The REPLICA IDENTITY attribute helps the server identify the correct row on the subscriber node to UPDATE or DELETE when a change occurs to the primary node. If your table does not have a key, specifying REPLICA IDENTITY FULL tells the server to use a combination of all of the columns in a row to identify the correct row on the subscriber to modify.
Specifying REPLICA IDENTITY FULL on the publication node, can trigger a full table scan on the subscriber node in the event of an UPDATE or DELETE to ensure that the correct row is updated. A full table scan can be time-consuming, and uses more resources than an index.
This commit improves performance by allowing you to specify which index will be used on the subscriber when applying UPDATES and DELETES. The index must be:
a btree index
a non-partial index
include at least one column that does not consist solely of expressions
If multiple indexes meet these requirements, the server will select the first valid index, instead of using a smart approach to select the best index. If you specify a REPLICA IDENTITY other than FULL, the subscriber must have a similar replica identity.
The functionality provided by this feature is only enabled when REPLICA IDENTITY FULL is specified. The functionality is skipped when the remote relation doesn’t contain the left most column of the index, primarily because a sequential scan provides better performance in such cases.
Please see the commit fest entry for more details https://commitfest.postgresql.org/42/3765/
Allow logical decoding on stand-by
Prior to PostgreSQL 16, logical decoding was supported only for the primary node; this commit allows minimal logical decoding on the stand-by node as well. To make use of this functionality, you need to set wal_level higher than replica (the default) on the primary node.
This feature allows you to:
create a logical replication slot on a stand-by node
create a subscription to a stand-by node
perform logical decoding on the stand-by node
Prior to this commit, those actions would result in the following error:
logical decoding cannot be used while in recovery
This commit also introduces the pg_log_standby_snapshot() function. The function takes a snapshot of a running transaction, and writes it into WAL files without requiring a checkpoint. This function makes the process of creating logical replication slots on a stand-by much faster; the function helps create the replication slot on the stand-by if the primary node is in an idle state.
For more information, please see the commit fest entry at: https://commitfest.postgresql.org/42/3740/.
Conclusion
PostgreSQL logical replication continues to improve and become more robust. Some of the features added in this release also lay the groundwork for more great features in future releases. This post summarizes some of the key logical replication features added to PostgreSQL 16. My next post will go over the improvements that are in progress and discuss the likelihood of those making it into the release.