PGD 5 overview

Architectural overview

EDB Postgres Distributed (PGD) is a distributed database solution that extends PostgreSQL's capabilities, enabling highly available and fault-tolerant database deployments across multiple nodes. PGD provides data distribution with advanced conflict management, data-loss protection, and throughput up to 5X faster than native logical replication.

PGD is built on a multi-master foundation, Bi-Directional Replication (BDR), which is then optimized for performance and availability through PGD Proxy. PGD Proxy lowers contention and the risk of conflicts by routing writes to a write leader, and each proxy instance exposes a single endpoint that addresses all the data nodes in a group, removing the need for clients to round-robin through multi-host connection strings. RAFT is implemented to help the system make important decisions, such as which node is the RAFT election leader and which node is the write leader.

High-level architecture

At the highest level, PGD comprises two main components: Bi-Directional Replication (BDR) and PGD Proxy. BDR is a Postgres extension that enables a multi-master replication mesh between different BDR-enabled Postgres instances (nodes). PGD Proxy sends requests to the write leader, ensuring a lower risk of conflicts between nodes.

Diagram showing three applications, three PGD Proxy instances, and three PGD data nodes, with traffic directed from each proxy instance to the write leader node.

Changes are replicated directly, row by row, between all nodes. Logical replication in PGD is asynchronous by default, so only eventual consistency is guaranteed (usually within seconds). However, commit scope options offer stronger consistency guarantees via two-phase commit, Group Commit, and Synchronous Commit.
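
For example, in PGD 5 a commit scope can be defined with SQL and then selected per transaction or per session. The following is a minimal sketch, assuming a top-level group named example_group; the scope name, the rule, and the accounts table are illustrative.

    -- Define a commit scope that waits for confirmation from any two nodes
    -- in example_group using Group Commit before reporting a commit.
    SELECT bdr.add_commit_scope(
        commit_scope_name := 'example_scope',
        origin_node_group := 'example_group',
        rule              := 'ANY 2 (example_group) GROUP COMMIT',
        wait_for_ready    := true
    );

    -- Apply the scope to a single transaction (accounts is a hypothetical table).
    BEGIN;
    SET LOCAL bdr.commit_scope = 'example_scope';
    INSERT INTO accounts (id, balance) VALUES (1, 100);
    COMMIT;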

The RAFT algorithm provides a mechanism for electing leaders (both the RAFT leader and the write leader), deciding which nodes are added to or removed from the cluster, and generally ensuring that the distributed system remains consistent and fault tolerant, even in the face of node failures.

Architectural elements

PGD comprises several key architectural elements that work together to provide its distributed database solution:

  • PGD nodes: These are individual PostgreSQL instances that store and manage data. They are the basic building blocks of a PGD cluster.

  • Groups: PGD nodes are organized into groups, which enhance manageability and high availability. Each group can contain multiple nodes, allowing for redundancy and failover within the group. Groups facilitate organized replication and data consistency among nodes within the same group and across different groups. Each group has its own write leader.

  • Replication mechanisms: PGD's replication mechanisms are built on Bi-Directional Replication (BDR), which enables efficient multi-master replication across nodes. BDR replicates asynchronously by default but can be configured for varying levels of synchronicity, such as Group Commit or Synchronous Commit, to enhance data durability and consistency. Logical replication is used to replicate changes at the row level, ensuring flexible data propagation without replicating the entire database state.

  • Transaction management systems: PGD's transaction management ensures data consistency and reliability across distributed nodes through several key mechanisms. PGD supports explicit two-phase commit (2PC) for atomic transactions, uses different commit scopes to balance performance with durability and consistency, and employs advanced conflict management to handle multi-master write conflicts. Additionally, PGD leverages logical replication for efficient data replication at the row level and includes wait functions to prevent stale reads by ensuring specific transactions are applied locally before proceeding.

  • Monitoring tools: To monitor performance, health, and usage with PGD, you can use its built-in command-line interface (CLI). For instance, the pgd show-nodes command provides a summary of all nodes in the cluster, including their state and status. The pgd check-health command checks the health of the cluster, reporting on node accessibility, replication slot health, and other critical metrics. The pgd show-events command lists significant events, such as background worker errors and node membership changes, which helps in tracking the operational status of the cluster. In addition, the BDR extension lets you monitor the cluster with SQL through the bdr.monitor role, as sketched after this list.
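
For example, a user granted the bdr.monitor role can check cluster state with SQL. This is a sketch based on the BDR extension's monitoring views and functions in PGD 5 (bdr.node_summary, bdr.monitor_group_raft, bdr.monitor_local_replslots); confirm the exact names against the reference for your version.

    -- Summary of all nodes known locally, including group membership and state.
    SELECT * FROM bdr.node_summary;

    -- Status of the RAFT consensus layer as seen from this node.
    SELECT * FROM bdr.monitor_group_raft();

    -- Health of the replication slots held on this node.
    SELECT * FROM bdr.monitor_local_replslots();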

Node types and roles

  • Data nodes: Store and manage data, handle read and write operations, and participate in replication.

  • Subscriber-only nodes: Subscribe to changes from data nodes for read-only purposes, used in reporting or analytics.

  • Witness nodes: Participate in consensus processes without storing data, aiding in achieving quorum and maintaining high availability.

  • Logical standby nodes: Act as standby nodes that can be promoted to data nodes if needed, ensuring high availability and disaster recovery.

  • Write leader: Receives all write operations from PGD Proxy.
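
To make these roles concrete, the sketch below shows how nodes of different kinds are typically registered and joined using the BDR SQL interface in PGD 5. Node names and DSNs are illustrative, and the node_kind parameter is an assumption about the PGD 5 signature; check the reference for your version.

    -- On the first data node: register it and create the top-level group.
    SELECT bdr.create_node(
        node_name := 'node-a',
        local_dsn := 'host=node-a dbname=appdb'  -- illustrative DSN
    );
    SELECT bdr.create_node_group(node_group_name := 'example_group');

    -- On a second data node: register it, then join the existing group.
    SELECT bdr.create_node(
        node_name := 'node-b',
        local_dsn := 'host=node-b dbname=appdb'
    );
    SELECT bdr.join_node_group(
        join_target_dsn := 'host=node-a dbname=appdb',
        node_group_name := 'example_group'
    );

    -- A witness node follows the same pattern but is created with
    -- node_kind := 'witness' (assumed PGD 5 parameter) so it takes part
    -- in consensus without storing user data.
    SELECT bdr.create_node(
        node_name := 'witness-a',
        local_dsn := 'host=witness-a dbname=appdb',
        node_kind := 'witness'
    );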

Architectural aims

High availability

PGD ensures high availability through multiple mechanisms. The architecture includes multiple master nodes, providing redundancy and maintaining service availability even if a node fails. Optional logical standby nodes can quickly replace any node that goes down, minimizing downtime in case of a failure. In fact, replication continues among connected nodes even if some are temporarily unavailable, and resumes seamlessly when a node recovers, ensuring no data loss. Additionally, PGD supports rolling upgrades of major versions, allowing nodes to run different release levels and perform updates without disrupting the cluster’s operation. These combined mechanisms ensure continuous availability and robust disaster recovery in a distributed database environment.

Connection management

In PGD, connection management aims to optimize performance by reducing potential data conflicts. PGD uses a RAFT-based consensus-driven quorum to determine the correct connection endpoint—the write leader. This approach reduces potential data conflicts by ensuring that writes are directed to a single node. The PGD Proxy manages application connections, routing read-heavy queries to replicas and writes to the write leaders, thereby optimizing performance and maintaining data consistency across the distributed environment.
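
As a sketch of how this is wired up in PGD 5: connection routing is enabled as a group option, each proxy is registered with SQL, and applications connect to a proxy endpoint instead of an individual data node. The option name, bdr.create_proxy, the proxy name, and the connection URI (including the port) are assumptions drawn from the PGD 5 interface; verify them against the reference for your version.

    -- Enable consensus-driven connection routing for the group.
    SELECT bdr.alter_node_group_option('example_group', 'enable_proxy_routing', 'true');

    -- Register a proxy instance for the group.
    SELECT bdr.create_proxy('example-proxy-1', 'example_group');

    -- Applications then use a single proxy endpoint rather than a multi-host
    -- connection string, for example:
    --   postgres://appuser@example-proxy-1:6432/appdb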

Architectural options and performance

Always-on architectures

A number of different architectures can be configured, each of which has different performance and scalability characteristics.

A group is the basic building block, consisting of two or more nodes (servers). In a group, each node is in a different availability zone, with a dedicated router and backup, giving immediate switchover and high availability. Each group has a dedicated replication set defined on it. If the group loses a node, you can easily repair or replace it by copying an existing node from the group.

The Always-on architectures are built from either one group in a single location or two groups in two separate locations. Each group provides high availability. When two groups are deployed in remote locations, together they also provide disaster recovery (DR).

Tables are created across both groups, so any change goes to all nodes, not just to nodes in the local group.
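
As a brief illustration, with default settings a table created on any node is replicated to every node in both groups, and so are the rows written to it. The table and values below are hypothetical.

    -- DDL is replicated by default, so this table appears on all nodes.
    CREATE TABLE orders (
        order_id   bigint PRIMARY KEY,
        customer   text NOT NULL,
        created_at timestamptz DEFAULT now()
    );

    -- A row inserted through the local group's write leader...
    INSERT INTO orders (order_id, customer) VALUES (1, 'example');
    -- ...becomes visible on every other node once replication has applied it.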

One node in each group is selected as the group's write leader. Proxies then direct application writes and queries to the write leader. The other nodes are replicas of the write leader. If, at any point, the write leader becomes unavailable, the remaining nodes in the group select a new write leader, and the proxies direct traffic to that node. Scalability isn't the goal of this architecture.

Since writes go mainly to only one node, the possibility of contention between nodes is reduced to almost zero. As a result, the performance impact of conflict handling is much reduced.

Secondary applications might execute against the shadow nodes, although these workloads can be reduced or interrupted if the main application begins using that node.

In the future, one node will be elected as the main replicator to other groups, limiting CPU overhead of replication as the cluster grows and minimizing the bandwidth to other groups.

Supported Postgres database servers

PGD is compatible with PostgreSQL, EDB Postgres Extended Server, and EDB Postgres Advanced Server and is deployed as a standard Postgres extension named BDR. See Compatibility for details about supported version combinations.

Some key PGD features depend on certain core capabilities being available in the target Postgres database server. Therefore, PGD users must also adopt the Postgres database server distribution that's best suited to their business needs. For example, if the PGD feature Commit At Most Once (CAMO) is mission critical to your use case, don't adopt the community PostgreSQL distribution, because it doesn't have the core capability required for CAMO. See the full feature compatibility matrix in Choosing a Postgres distribution.

PGD offers close-to-native Postgres compatibility. However, some access patterns don't necessarily work as well in a multi-node setup as they do on a single instance. There are also some limitations in what you can safely replicate in a multi-node setting. Application usage goes into detail about how PGD behaves from an application development perspective.

Characteristics affecting performance

By default, PGD keeps one copy of each table on each node in the group, and any changes propagate to all nodes in the group.

Since copies of the data are everywhere, SELECTs need only ever access the local node. On a read-only cluster, performance on any one node isn't affected by the number of nodes and is immune to replication conflicts on other nodes caused by long-running SELECT queries. Thus, adding nodes linearly increases the total possible SELECT throughput.

If an INSERT, UPDATE, or DELETE (DML) is performed locally, then the changes propagate to all nodes in the group. The overhead of applying DML is less than that of the original execution, so if you run a pure write workload on multiple nodes concurrently, a multi-node cluster can handle more TPS than a single node.

Conflict handling has a cost that acts to reduce the throughput. The throughput then depends on how much contention the application displays in practice. Applications with very low contention perform better than a single node. Applications with high contention can perform worse than a single node. These results are consistent with any multimaster technology and aren't particular to PGD.

Synchronous replication options can send changes concurrently to multiple nodes so that the replication lag is minimized. Adding more nodes means using more CPU for replication, so peak TPS reduces slightly as each node is added.

If the workload tries to use all CPU resources, then this resource constrains replication, which can then affect the replication lag.

In summary, adding more master nodes to a PGD group doesn't result in a significant write throughput increase when most tables are replicated, because all the writes are replayed on all nodes. However, because applying replicated PGD writes is generally more efficient than executing the original SQL sent by Postgres clients, some increase in write performance is still possible. Read throughput generally scales linearly with the number of nodes.