Deploying an EDB Postgres Distributed example cluster on Linux hosts v5

Introducing TPA and PGD

We created TPA to make installing and managing various Postgres configurations easily repeatable. TPA orchestrates creating and deploying Postgres. In this quick start, you install TPA first. If you already have TPA installed, you can skip those steps. You can use TPA to deploy various configurations of Postgres clusters.

PGD is a multi-master replicating implementation of Postgres designed for high performance and availability. The installation of PGD is orchestrated by TPA. You will use TPA to generate a configuration file for a PGD demonstration cluster.

The TPA Linux host option allows users of any cloud or VM platform to use TPA to configure EDB Postgres Distributed. All TPA needs is for each target system to run a Linux operating system and to be accessible using SSH. Unlike the other TPA platforms (Docker and AWS), the Linux host configuration doesn't provision the target machines. It's up to you to provision them wherever you decide to deploy.

This cluster uses Linux server instances to host its nodes: three replicating database nodes, three cohosted connection proxies, and one backup node. TPA can then provision, prepare, and deploy the required EDB Postgres Distributed software and configuration to each node.

On host compatibility

This set of steps is specifically for users running TPA from a machine with Ubuntu 22.04 LTS on Intel/AMD processors. The target hosts in this example run Rocky Linux.

Prerequisites

Configure your Linux hosts

You need to provision four hosts for this quick start. Each host should have a supported Linux operating system installed. To eliminate password prompts, each host also needs to be SSH-accessible using key pairs.
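
If your hosts don't yet have key-based access, one way to set it up is sketched below. This is a minimal sketch, assuming you can initially reach each host with a password and that the admin user is rocky; the key name id_quickstart is just an illustrative choice.

# Generate a key pair (no passphrase) for passwordless logins
ssh-keygen -t ed25519 -f ~/.ssh/id_quickstart -N ""
# Copy the public key to a host (repeat for each host)
ssh-copy-id -i ~/.ssh/id_quickstart.pub rocky@<host-ip>
# Confirm that you can now log in without a password prompt
ssh -i ~/.ssh/id_quickstart rocky@<host-ip> true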

On machine provisioning

Azure users can follow a Microsoft guide on how to provision Azure VMs loaded with Linux. Google Cloud Platform users can follow a Google guide on how to provision GCP VMs with Linux loaded. You can use any virtual machine technology to host a Linux instance, too. Refer to your virtualization platform's documentation for instructions on how to create instances with Linux loaded on them.

Whichever cloud or VM platform you use, you need to make sure that each instance is accessible by SSH and that each instance can connect to the other instances. They can connect through either the public network or over a VPC for the cloud platforms. You can connect through your local network for on-premises VMs.

If you can't do this, you might want to consider the Docker or AWS quick start. These configurations are easier to set up and quicker to tear down. The AWS quick start, for example, automatically provisions compute instances and creates a VPC for those instances.

In this quick start, you will install PGD nodes on four hosts configured in the cloud. In this example, each host runs Rocky Linux and has a public IP address to go with its private IP address.

Host name      Public IP      Private IP
linuxhost-1    172.19.16.27   192.168.2.247
linuxhost-2    172.19.16.26   192.168.2.41
linuxhost-3    172.19.16.25   192.168.2.254
linuxhost-4    172.19.16.15   192.168.2.30

These are example IP addresses. Substitute them with your own public and private IP addresses as you progress through the quick start.
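
Before moving on, it's worth confirming both kinds of connectivity. The following is a minimal sketch using the example addresses and the rocky admin user; substitute your own values.

# Check SSH access to each host from your workstation
for ip in 172.19.16.27 172.19.16.26 172.19.16.25 172.19.16.15; do
    ssh rocky@$ip hostname
done

# From one host, check that another is reachable over the private network
ssh rocky@172.19.16.27 ping -c 1 192.168.2.41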

Set up a host admin user

Each machine requires a user account to use for installation. For simplicity, use a user with the same name on all the hosts. On each host, also configure the user so that you can SSH into the host without being prompted for a password, and be sure to give that user sudo privileges. On the four example hosts, the user rocky is already configured with sudo privileges.
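
If you're provisioning hosts yourself, a minimal sketch for creating such a user follows. It assumes an RPM-based host where the wheel group grants sudo; on Debian-based systems, the group is typically sudo instead.

# Create the admin user and add it to the sudo-capable group
sudo useradd -m rocky
sudo usermod -aG wheel rocky
# Allow sudo without a password prompt
echo 'rocky ALL=(ALL) NOPASSWD:ALL' | sudo tee /etc/sudoers.d/rocky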

Preparation

EDB account

You'll need an EDB account to install both TPA and PGD.

Sign up for a free EDB account if you don't already have one. Signing up gives you a trial subscription to EDB's software repositories.

After you are registered, go to the EDB Repos 2.0 page, where you can obtain your repo token.

On your first visit to this page, select Request Access to generate your repo token. Copy the token using the Copy Token icon, and store it safely.

Setting environment variables

First, set the EDB_SUBSCRIPTION_TOKEN environment variable to the value of your EDB repo token, obtained in the EDB account step.

export EDB_SUBSCRIPTION_TOKEN=<your-repo-token>

You can add this to your .bashrc script or similar shell profile to ensure it's always set.
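
For example, with a Bash shell:

echo 'export EDB_SUBSCRIPTION_TOKEN=<your-repo-token>' >> ~/.bashrc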

Configure the repository

All the software needed for this example is available from the EDB Postgres Distributed package repository. Download and run a script to configure the EDB Postgres Distributed repository. This repository also contains the TPA packages.

curl -1sLf "https://downloads.enterprisedb.com/$EDB_SUBSCRIPTION_TOKEN/postgres_distributed/setup.deb.sh" | sudo -E bash
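
The setup.deb.sh script is for Debian-derived systems such as the Ubuntu 22.04 machine used in these examples. If the machine where you run TPA uses an RPM-based distribution instead, the repository also provides an equivalent setup.rpm.sh script:

curl -1sLf "https://downloads.enterprisedb.com/$EDB_SUBSCRIPTION_TOKEN/postgres_distributed/setup.rpm.sh" | sudo -E bash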

Installing Trusted Postgres Architect (TPA)

You'll use TPA to provision and deploy PGD. If you previously installed TPA, you can move on to the next step. You'll find full instructions for installing TPA in the Trusted Postgres Architect documentation, which we've also included here.

Linux environment

TPA supports several distributions of Linux as a host platform. These examples are written for Ubuntu 22.04, but steps are similar for other supported platforms.

Install the TPA package

sudo apt install tpaexec

Configuring TPA

You now need to configure TPA, which sets up its Python environment. Call tpaexec with the command setup:

sudo /opt/EDB/TPA/bin/tpaexec setup
export PATH=$PATH:/opt/EDB/TPA/bin

You can add the export command to your shell's profile.
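
For example, to make it persistent in Bash:

echo 'export PATH=$PATH:/opt/EDB/TPA/bin' >> ~/.bashrc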

Testing the TPA installation

You can verify TPA is correctly installed by running selftest:

tpaexec selftest

TPA is now installed.

Installing PGD using TPA

Generating a configuration file

Run the tpaexec configure command to generate a configuration directory:

tpaexec configure democluster \
    --architecture PGD-Always-ON \
    --platform bare \
    --edb-postgres-advanced 15 \
    --redwood \
    --no-git \
    --location-names dc1 \
    --pgd-proxy-routing local \
    --hostnames-unsorted

You specify the PGD-Always-ON architecture (--architecture PGD-Always-ON), which sets up the configuration for PGD 5's Always On architectures. As part of the default architecture, it configures your cluster with three data nodes cohosting three PGD Proxy servers, along with a Barman node for backup.

For Linux hosts, specify that you're targeting a "bare" platform (--platform bare). TPA will determine the Linux version running on each host during deployment. See the EDB Postgres Distributed compatibility table for details about the supported operating systems.

Specify that the data nodes will be running EDB Postgres Advanced Server v15 (--edb-postgres-advanced 15) with Oracle compatibility (--redwood).

You set the notional location of the nodes to dc1 using --location-names. You then set --pgd-proxy-routing to local so that proxy routing can route traffic to all nodes within each location.

By default, TPA commits configuration changes to a Git repository. For this example, you don't need to do that, so pass the --no-git flag.

Finally, you ask TPA to generate repeatable hostnames for the nodes by passing --hostnames-unsorted. Otherwise, it selects hostnames at random from a predefined list of suitable words.

This command creates a subdirectory in the current working directory called democluster. It contains the config.yml configuration file TPA uses to create the cluster. You can view it using:

less democluster/config.yml

You now need to edit the configuration file to add details related to your Linux hosts, such as admin user names and public and private IP addresses.

Editing your configuration

Using your preferred editor, open democluster/config.yml.

Search for the line containing ansible_user: root. Change root to the name of the user you configured with SSH access and sudo privileges. Follow that with this line:

    manage_ssh_hostkeys: yes

Your instance_defaults section now looks like this:

instance_defaults:
  platform: bare
  vars:
    ansible_user: rocky
    manage_ssh_hostkeys: yes

Next, search for node: 1, which marks the configuration settings for the first node, kaboom.

After the node: 1 line, add the public and private IP addresses of your node. Use linuxhost-1 as the host for this node. Add the following to the file, substituting your IP addresses. Align the start of each line with the start of the node: line.

  public_ip: 172.19.16.27
  private_ip: 192.168.2.247

The whole entry for kaboom looks like this but with your own IP addresses:

- Name: kaboom
  backup: kapok
  location: dc1
  node: 1
  public_ip: 172.19.16.27
  private_ip: 192.168.2.247 
  role:
  - bdr
  - pgd-proxy
  vars:
    bdr_child_group: dc1_subgroup
    bdr_node_options:
      route_priority: 100

Repeat this process for the three other nodes.

Search for node: 2, which marks the configuration settings for the node kaftan. Use linuxhost-2 for this node. Substituting your IP addresses, add:

  public_ip: 172.19.16.26
  private_ip: 192.168.2.41

Search for node: 3, which marks the configuration settings for the node kaolin. Use linuxhost-3 for this node. Substituting your IP addresses, add:

  public_ip: 172.19.16.25
  private_ip: 192.168.2.254

Finally, search for node: 4, which marks the configuration settings for the node kapok. Use linuxhost-4 for this node. Substituting your IP addresses, add:

  public_ip: 172.19.16.15
  private_ip: 192.168.2.30
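
With all four nodes edited, you can quickly check that each node entry has both addresses set. This optional check just filters the file with grep:

grep -E 'node:|public_ip:|private_ip:' democluster/config.yml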

Provisioning the cluster

You can now run:

tpaexec provision democluster

This command prepares for deploying the cluster. (On other platforms, such as Docker and AWS, this command also creates the required hosts. When using Linux hosts, your hosts should already be configured.)

One part of this process for Linux hosts is creating key pairs for SSH operations later. With those key pairs created, you need to copy the public key to each host. You can do this with ssh-copy-id, giving the democluster identity (-i) and the login for each host. For this example, these are the commands:

ssh-copy-id -i democluster/id_democluster rocky@172.19.16.27
ssh-copy-id -i democluster/id_democluster rocky@172.19.16.26
ssh-copy-id -i democluster/id_democluster rocky@172.19.16.25
ssh-copy-id -i democluster/id_democluster rocky@172.19.16.15

You can now create the tpa_known_hosts file, which allows the hosts to be verified. Run ssh-keyscan against each host, hashing the hostnames in its output (-H), and append the results to tpa_known_hosts:

ssh-keyscan -H 172.19.16.27 >> democluster/tpa_known_hosts
ssh-keyscan -H 172.19.16.26 >> democluster/tpa_known_hosts
ssh-keyscan -H 172.19.16.25 >> democluster/tpa_known_hosts
ssh-keyscan -H 172.19.16.15 >> democluster/tpa_known_hosts
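
Optionally, you can confirm that the copied key and the tpa_known_hosts entries work together before deploying. This sketch points SSH at the key and known-hosts file explicitly for the first host:

ssh -i democluster/id_democluster -o UserKnownHostsFile=democluster/tpa_known_hosts rocky@172.19.16.27 true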

Deploy your cluster

You now have everything ready to deploy your cluster. To deploy, run:

tpaexec deploy democluster

TPA applies the configuration, installing the needed packages and setting up the actual EDB Postgres Distributed cluster.

Connecting to the cluster

You're now ready to log into one of the nodes of the cluster with SSH and then connect to the database. Part of the configuration process set up SSH logins for all the nodes, complete with keys. To use the SSH configuration, you need to be in the democluster directory created by the tpaexec configure command earlier:

cd democluster

From there, you can run ssh -F ssh_config <hostname> to establish an SSH connection. Connect to kaboom, the first database node in the cluster:

ssh -F ssh_config kaboom
Output
[rocky@kaboom ~]# 

Notice that you're logged in to kaboom as rocky, the admin and Ansible user you configured earlier.

You now need to adopt the identity of the enterprisedb user. This user is preconfigured and authorized to connect to the cluster's nodes.

sudo -iu enterprisedb
Output
enterprisedb@kaboom:~ $

You can now run the psql command to access the bdrdb database:

psql bdrdb
Output
psql (15.2.0, server 15.2.0)
Type "help" for help.

bdrdb=#

You're directly connected to the Postgres database running on the kaboom node and can start issuing SQL commands.

To leave the SQL client, enter exit.

Using PGD CLI

The pgd utility, also known as the PGD CLI, lets you control and manage your EDB Postgres Distributed cluster. It's already installed on the node.

You can use it to check the cluster's health by running pgd check-health:

pgd check-health
Output
Check      Status Message
-----      ------ -------
ClockSkew  Ok     All BDR node pairs have clockskew within permissible limit
Connection Ok     All BDR nodes are accessible
Raft       Ok     Raft Consensus is working correctly
Replslots  Ok     All BDR replication slots are working correctly
Version    Ok     All nodes are running same BDR versions
enterprisedb@kaboom:~ $

Or, you can use pgd show-nodes to ask PGD to show you the data-bearing nodes in the cluster:

pgd show-nodes
Output
Node   Node ID    Group        Type Current State Target State Status Seq ID
----   -------    -----        ---- ------------- ------------ ------ ------
kaboom 2710197610 dc1_subgroup data ACTIVE        ACTIVE       Up     1
kaftan 3490219809 dc1_subgroup data ACTIVE        ACTIVE       Up     3
kaolin 2111777360 dc1_subgroup data ACTIVE        ACTIVE       Up     2
enterprisedb@kaboom:~ $

Similarly, use pgd show-proxies to display the proxy connection nodes:

pgd show-proxies
Output
Proxy  Group        Listen Addresses Listen Port
-----  -----        ---------------- -----------
kaboom dc1_subgroup [0.0.0.0]        6432
kaftan dc1_subgroup [0.0.0.0]        6432
kaolin dc1_subgroup [0.0.0.0]        6432

The proxies provide high-availability connections to the cluster of data nodes for applications. You can connect to the proxies and, in turn, to the database with the command psql -h kaboom,kaftan,kaolin -p 6432 bdrdb:

psql -h kaboom,kaftan,kaolin -p 6432 bdrdb
Output
psql (15.2.0, server 15.2.0)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.

bdrdb=#
