How To Set Up Physical Streaming Replication with PostgreSQL 12 on Ubuntu 20.04

Introduction

Streaming replication is a popular method you can use to horizontally scale your relational databases. It uses two or more copies of the same database cluster running on separate machines. One database cluster is referred to as the primary and serves both read and write operations; the others, referred to as the replicas, serve only read operations. You can also use streaming replication to provide high availability of a system. If the primary database cluster or server were to fail unexpectedly, the replicas could continue serving read operations, or one of the replicas could become the new primary cluster.

PostgreSQL is a widely used relational database that supports both logical and physical replication. Logical replication streams high-level changes from the primary database cluster to the replica databases. Using logical replication, you can stream changes to just a single database or table in a database. In physical replication, by contrast, the changes recorded in the write-ahead log (WAL) are streamed to the replica clusters and replayed there. As a result, you can’t replicate specific areas of a primary database cluster; instead, all changes to the primary are replicated.

In this tutorial, you will set up physical streaming replication with PostgreSQL 12 on Ubuntu 20.04 using two separate machines running two separate PostgreSQL 12 clusters. One machine will be the primary and the other, the replica.

Prerequisites

To complete this tutorial, you will need the following:

  • Two separate Ubuntu 20.04 machines, one referred to as the primary and the other referred to as the replica. You can set these up with our Initial Server Setup Guide, including non-root users with sudo permissions and a firewall.
  • Your firewalls configured to allow HTTP/HTTPS and traffic on port 5432—the default port used by PostgreSQL 12. You can follow How To Set Up a Firewall with ufw on Ubuntu 20.04 to configure these firewall settings.
  • PostgreSQL 12 running on both Ubuntu 20.04 Servers. Follow Step 1 of the How To Install and Use PostgreSQL on Ubuntu 20.04 tutorial that covers the installation and basic usage of PostgreSQL on Ubuntu 20.04.

Step 1 — Configuring the Primary Database to Accept Connections

In this first step, you’ll configure the primary database to allow your replica database(s) to connect. By default, PostgreSQL only listens on localhost (127.0.0.1) for connections. To change this, you’ll first edit the listen_addresses configuration parameter on the primary database.

On your primary server, run the following command to connect to the PostgreSQL cluster as the default postgres user:

 $ sudo -u postgres psql

Once you have connected to the database, you’ll modify the listen_addresses parameter using the ALTER SYSTEM command:

  • ALTER SYSTEM SET listen_addresses TO 'your_primary_IP_addr';

Replace 'your_primary_IP_addr' with the IP address of the primary machine that your replica will connect to.

You will receive the following output:

Output
ALTER SYSTEM

The command you just entered instructs the PostgreSQL database cluster to listen for connections on that address. Note that listen_addresses names the primary’s own network interfaces, not the clients that are allowed to connect; you will restrict access to your replica in pg_hba.conf in the next step. If the primary should listen on more than one of its addresses, list them separated by commas. You could also use '*' to listen on all of the server’s IP addresses; however, this isn’t recommended for security reasons. The new value takes effect when the cluster is restarted, which you will do at the end of Step 2.

Note: You can also run the command on the database from the terminal using psql -c as follows:

 $ sudo -u postgres psql -c "ALTER SYSTEM SET listen_addresses TO 'your_primary_IP_addr';"

Alternatively, you can change the value for listen_addresses by manually editing the postgresql.conf configuration file, which you can find in the /etc/postgresql/12/main/ directory by default. You can also get the location of the configuration file by running SHOW config_file; on the database cluster.

To open the file using nano, run:

 $ sudo nano /etc/postgresql/12/main/postgresql.conf

Once you’re done, your primary database is configured to accept connections from other machines; the change takes effect when you restart the cluster at the end of the next step. Next, you’ll create a role with the appropriate permissions that the replica will use when connecting to the primary.

Step 2 — Creating a Special Role with Replication Permissions

Now, you need to create a role in the primary database that has permission to replicate the database. Your replica will use this role when connecting to the primary. Creating a separate role just for replication also has security benefits. Your replica won’t be able to manipulate any data on the primary; it will only be able to replicate the data.

To create a role, you need to run the following command on the primary cluster:

  • CREATE ROLE test WITH REPLICATION PASSWORD 'testpassword' LOGIN;

You’ll receive the following output:

Output
CREATE ROLE

This command creates a role named test with the password 'testpassword', which has permission to replicate the database cluster.
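
If you want to confirm the role’s attributes, you can query the pg_roles catalog view from a terminal on the primary (outside the psql prompt). The following is a minimal sketch; the helper name show_replication_role is hypothetical and not part of PostgreSQL.

```shell
# Hypothetical helper: print the login and replication attributes of a role.
# -A and -t make psql print bare, unaligned rows without headers.
# The role name is interpolated directly into the SQL, so only pass trusted values.
show_replication_role() {
    sudo -u postgres psql -At -c \
        "SELECT rolname, rolcanlogin, rolreplication FROM pg_roles WHERE rolname = '$1';"
}
```

Running show_replication_role test should print a single row such as test|t|t, where the two t values indicate that the role can log in and has the REPLICATION attribute.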

PostgreSQL has a special replication pseudo-database that the replica connects to, but you first need to edit the /etc/postgresql/12/main/pg_hba.conf configuration file to allow your replica to access it. So, exit the PostgreSQL command prompt by running:

  • \q

Now that you’re back at your terminal command prompt, open the /etc/postgresql/12/main/pg_hba.conf configuration file using nano:

 $ sudo nano /etc/postgresql/12/main/pg_hba.conf

Append the following line to the end of the pg_hba.conf file:

/etc/postgresql/12/main/pg_hba.conf
. . .
host    replication     test    your-replica-IP/32   md5

This ensures that your primary allows your replica to connect to the replication pseudo-database using the role, test, you created earlier. The host value means to accept non-local connections via plain or SSL-encrypted TCP/IP sockets. replication is the name of the special pseudo-database that PostgreSQL uses for replication. Finally, the value md5 is the type of authentication used. If you want to have more than one replica, just add the same line again to the end of the file with the IP address of your other replica.
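
If you have several replicas, you may prefer to generate the repeated entries rather than type them by hand. The following is a minimal sketch, assuming a hypothetical helper named hba_replication_line and placeholder addresses from the 203.0.113.0/24 documentation range. It only prints the lines, so you can review them before adding them to pg_hba.conf.

```shell
# Hypothetical helper: print one pg_hba.conf entry that lets the given role
# connect to the replication pseudo-database from a single IPv4 address.
hba_replication_line() {
    role=$1
    replica_ip=$2
    printf 'host    replication     %s    %s/32   md5\n' "$role" "$replica_ip"
}

# Print one entry per replica address (placeholder addresses).
for ip in 203.0.113.10 203.0.113.11; do
    hba_replication_line test "$ip"
done
```

To append a reviewed entry to the file, you could pipe the output through sudo tee -a /etc/postgresql/12/main/pg_hba.conf.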

To ensure these changes to the configuration file are implemented, you need to restart the primary cluster using:

 $ sudo systemctl restart postgresql@12-main

If your primary cluster restarted successfully, it is correctly set up and ready to start streaming once your replica connects. Next, you’ll move on to setting up your replica cluster.
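
Before moving on, you may want to check from the replica machine that the primary is reachable on port 5432. The following is a minimal sketch that uses the pg_isready utility shipped with the PostgreSQL client tools; the helper name wait_for_primary and its retry count are assumptions made for illustration. Note that pg_isready only reports whether the server accepts connections, so it checks listen_addresses and your firewall but not the pg_hba.conf entry or the role’s password.

```shell
# Hypothetical helper: poll the primary with pg_isready until it accepts
# connections or the attempts run out. Run this on the replica.
wait_for_primary() {
    host=$1
    port=${2:-5432}
    attempts=${3:-10}
    while [ "$attempts" -gt 0 ]; do
        # -q suppresses the status message; only the exit status is used.
        if pg_isready -q -h "$host" -p "$port"; then
            return 0
        fi
        attempts=$((attempts - 1))
        sleep 1
    done
    return 1
}
```

For example, wait_for_primary primary-ip-addr && echo "primary is reachable" prints the message once the primary answers.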

Step 3 — Backing Up the Primary Cluster on the Replica

As you are setting up physical replication with PostgreSQL in this tutorial, you need to perform a physical backup of the primary cluster’s data files into the replica’s data directory. To do this, you’ll first clear out all the files in the replica’s data directory. The default data directory for PostgreSQL on Ubuntu is /var/lib/postgresql/12/main/.

You can also find PostgreSQL’s data directory by running the following command on the replica’s database:

  • SHOW data_directory;

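Once you have the location of the data directory, stop the replica’s PostgreSQL service so that it isn’t running while its files are replaced, then remove everything in the directory:

 $ sudo systemctl stop postgresql@12-main
 $ sudo -u postgres sh -c 'rm -r /var/lib/postgresql/12/main/*'

Since the default owner of the files in the directory is the postgres user, you will need to run the command as postgres using sudo -u postgres. Wrapping rm in sh -c lets the postgres user’s shell expand the * wildcard; your own user typically cannot read the data directory, so your shell would pass the * through unexpanded and rm would fail.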

Note: In the exceedingly rare case that a file in the directory is corrupted and the command does not work, remove the main directory altogether and recreate it with the appropriate permissions as follows:

 $ sudo -u postgres rm -r /var/lib/postgresql/12/main
 $ sudo -u postgres mkdir /var/lib/postgresql/12/main
 $ sudo -u postgres chmod 700 /var/lib/postgresql/12/main

Now that the replica’s data directory is empty, you can perform a physical backup of the primary’s data files. PostgreSQL conveniently has the utility pg_basebackup that simplifies the process. It even allows you to put the server into standby mode using the -R option.

Execute the pg_basebackup command on the replica as follows:

 $ sudo -u postgres pg_basebackup -h primary-ip-addr -p 5432 -U test -D /var/lib/postgresql/12/main/ -Fp -Xs -R

  • The -h option specifies a non-local host. Here, you need to enter the IP address of your server with the primary cluster.
  • The -p option specifies the port number it connects to on the primary server. By default, PostgreSQL uses port 5432.
  • The -U option allows you to specify the user you connect to the primary cluster as. This is the role you created in the previous step.
  • The -D option specifies the output directory of the backup. This is your replica’s data directory that you emptied just before.
  • The -Fp option specifies that the data be output in the plain format instead of as a tar file.
  • The -Xs option streams the contents of the WAL while the backup of the primary is performed.
  • Lastly, -R creates an empty file named standby.signal in the replica’s data directory. This file lets your replica cluster know that it should operate as a standby server. The -R option also adds the connection information about the primary server to the postgresql.auto.conf file. This is a special configuration file that is read whenever the regular postgresql.conf file is read, but the values in the .auto file override the values in the regular configuration file.

When the pg_basebackup command connects to the primary, you will be prompted to enter the password for the role you created in the previous step. Depending on the size of your primary database cluster, it may take some time to copy all the files.
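
Once the copy finishes, you can confirm that the -R option left the two pieces of standby configuration described above. The following is a minimal sketch; the helper name looks_like_standby is hypothetical, and it assumes pg_basebackup wrote a primary_conninfo setting into postgresql.auto.conf.

```shell
# Hypothetical helper: succeed only if the data directory contains the
# standby.signal file and a primary_conninfo setting in postgresql.auto.conf.
# The checks run as the postgres user because it owns the data directory.
looks_like_standby() {
    data_dir=$1
    sudo -u postgres test -f "$data_dir/standby.signal" &&
        sudo -u postgres grep -q '^primary_conninfo' "$data_dir/postgresql.auto.conf"
}
```

For example, looks_like_standby /var/lib/postgresql/12/main && echo "standby configuration found" prints the message only when both checks pass.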

Your replica will now have all the data files from the primary that it requires to begin replication. Next, you’ll put the replica into standby mode and start replicating.

Step 4 — Restarting and Testing the Clusters

Now that the primary cluster’s data files have been successfully backed up on the replica, the next step is to restart the replica database cluster to put it into standby mode. To restart the replica database, run the following command:

 $ sudo systemctl restart postgresql@12-main

If your replica cluster restarted in standby mode successfully, it should have already connected to the primary database cluster on your other machine. To check if the replica has connected to the primary and the primary is streaming, connect to the primary database cluster by running:

 $ sudo -u postgres psql

Now query the pg_stat_replication view on the primary database cluster as follows:

  • SELECT client_addr, state FROM pg_stat_replication;

Running this query on the primary cluster will output something similar to the following:

Output
   client_addr    |  state
------------------+-----------
 your_replica_IP | streaming

If you have similar output, then the primary is correctly streaming to the replica.
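
If you would like to run the same check from a script, for example as part of monitoring, the query can be wrapped in a small shell function. The following is a minimal sketch to run on the primary; the helper names replication_states and has_streaming_replica are hypothetical.

```shell
# Print the state of every connected replica, one per line. -A and -t make
# psql emit bare values without alignment or headers.
replication_states() {
    sudo -u postgres psql -At -c "SELECT state FROM pg_stat_replication;"
}

# Succeed only if at least one reported state is exactly "streaming".
has_streaming_replica() {
    replication_states | grep -qx 'streaming'
}
```

For example, has_streaming_replica && echo "replica is streaming" prints the message only while at least one replica is in the streaming state.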

Conclusion

You now have two Ubuntu 20.04 servers each with a PostgreSQL 12 database cluster running with physical streaming between them. Any changes now made to the primary database cluster will also appear in the replica cluster.

You can also add more replicas to your setup if your databases need to handle more traffic.

If you wish to learn more about physical streaming replication, including how to set up synchronous replication to further reduce the risk of losing mission-critical data, you can read the entry in the official PostgreSQL docs.

How To Install and Use PostgreSQL on Ubuntu 20.04

Introduction

Relational database management systems are a key component of many web sites and applications. They provide a structured way to store, organize, and access information.

PostgreSQL, or Postgres, is a relational database management system that provides an implementation of the SQL querying language. It’s standards-compliant and has many advanced features like reliable transactions and concurrency without read locks.

This guide demonstrates how to install Postgres on an Ubuntu 20.04 server. It also provides some instructions for general database administration.

Prerequisites

To follow along with this tutorial, you will need one Ubuntu 20.04 server that has been configured by following our Initial Server Setup for Ubuntu 20.04 guide. After completing this prerequisite tutorial, your server should have a non-root user with sudo permissions and a basic firewall.

Step 1 — Installing PostgreSQL

Ubuntu’s default repositories contain Postgres packages, so you can install these using the apt packaging system.

If you’ve not done so recently, refresh your server’s local package index:

 $ sudo apt update

Then, install the Postgres package along with a -contrib package that adds some additional utilities and functionality:

 $ sudo apt install postgresql postgresql-contrib

Now that the software is installed, we can go over how it works and how it may be different from other relational database management systems you may have used.
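
Before going further, you may want to confirm that the installation created a running cluster. The following is a minimal sketch, assuming the pg_lsclusters tool that Ubuntu’s PostgreSQL packaging provides, version 12, and the default cluster name main; the helper name cluster_status is hypothetical.

```shell
# Hypothetical helper: print the status column that pg_lsclusters reports
# for the given version and cluster name. The header line is skipped because
# its first column never equals a version number.
cluster_status() {
    pg_lsclusters | awk -v ver="$1" -v name="$2" \
        '$1 == ver && $2 == name { print $4 }'
}
```

Running cluster_status 12 main should print online when the cluster is up.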

Step 2 — Using PostgreSQL Roles and Databases

By default, Postgres uses a concept called “roles” to handle authentication and authorization. These are, in some ways, similar to regular Unix-style accounts, but Postgres does not distinguish between users and groups and instead prefers the more flexible term “role”.

Upon installation, Postgres is set up to use ident authentication, meaning that it associates Postgres roles with a matching Unix/Linux system account. If a role exists within Postgres, a Unix/Linux username with the same name is able to sign in as that role.

The installation procedure created a user account called postgres that is associated with the default Postgres role. In order to use Postgres, you can log into that account.

There are a few ways to utilize this account to access Postgres.

Switching Over to the postgres Account

Switch over to the postgres account on your server by typing:

 $ sudo -i -u postgres

You can now access the PostgreSQL prompt immediately by typing:

  • psql

From there you are free to interact with the database management system as necessary.

Exit out of the PostgreSQL prompt by typing:

  • \q

This will bring you back to the postgres Linux command prompt.

Accessing a Postgres Prompt Without Switching Accounts

You can also run the command you’d like with the postgres account directly with sudo.

For instance, in the last example, you were instructed to get to the Postgres prompt by first switching to the postgres user and then running psql to open the Postgres prompt. You could do this in one step by running the single command psql as the postgres user with sudo, like this:

 $ sudo -u postgres psql

This will log you directly into Postgres without the intermediary bash shell in between.

Again, you can exit the interactive Postgres session by typing:

  • \q

Many use cases require more than one Postgres role. Read on to learn how to configure these.

Step 3 — Creating a New Role

Currently, you just have the postgres role configured within the database. You can create new roles from the command line with the createuser command. The --interactive flag will prompt you for the name of the new role and also ask whether it should have superuser permissions.

If you are logged in as the postgres account, you can create a new user by typing:

  • createuser --interactive

If, instead, you prefer to use sudo for each command without switching from your normal account, type:

 $ sudo -u postgres createuser --interactive

The script will prompt you with some choices and, based on your responses, execute the correct Postgres commands to create a user to your specifications.

Enter name of role to add: sammy
Shall the new role be a superuser? (y/n) y

You can get more control by passing some additional flags. Check out the options by looking at the man page:

  • man createuser
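
As one illustration of those flags, the following minimal sketch creates a role without the interactive prompts. The helper name create_app_role and the role name appuser are hypothetical; the --no-superuser, --createdb, and --no-createrole flags are standard createuser options.

```shell
# Hypothetical helper: create a role that may create databases but is neither
# a superuser nor allowed to create other roles.
create_app_role() {
    sudo -u postgres createuser --no-superuser --createdb --no-createrole "$1"
}
```

For example, create_app_role appuser would add a role named appuser with those attributes.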

Your installation of Postgres now has a new user, but you have not yet added any databases. The next section describes this process.

Step 4 — Creating a New Database

Another assumption that the Postgres authentication system makes by default is that for any role used to log in, that role will have a database with the same name which it can access.

This means that if the user you created in the last section is called sammy, that role will attempt to connect to a database which is also called “sammy” by default. You can create the appropriate database with the createdb command.

If you are logged in as the postgres account, you would type something like:

  • createdb sammy

If, instead, you prefer to use sudo for each command without switching from your normal account, you would type:

 $ sudo -u postgres createdb sammy

This flexibility provides multiple paths for creating databases as needed.

Step 5 — Opening a Postgres Prompt with the New Role

To log in with ident-based authentication, you’ll need a Linux user with the same name as your Postgres role and database.

If you don’t have a matching Linux user available, you can create one with the adduser command. You will have to do this from your non-root account with sudo privileges (meaning, not logged in as the postgres user):

 $ sudo adduser sammy

Once this new account is available, you can either switch over and connect to the database by typing:

 $ sudo -i -u sammy
  • psql

Or, you can do this inline:

 $ sudo -u sammy psql

This command will log you in automatically, assuming that all of the components have been properly configured.

If you want your user to connect to a different database, you can do so by specifying the database like this:

  • psql -d postgres

Once logged in, you can check your current connection information by typing:

  • \conninfo
Output
You are connected to database "sammy" as user "sammy" via socket in "/var/run/postgresql" at port "5432".

This is useful if you are connecting to non-default databases or with non-default users.

Now that you know how to connect to the PostgreSQL database system, you can learn some basic Postgres management tasks.

Step 6 — Creating Tables

The basic syntax for creating tables is as follows:

CREATE TABLE table_name (
    column_name1 col_type (field_length) column_constraints,
    column_name2 col_type (field_length),
    column_name3 col_type (field_length)
);

As you can see, this command gives the table a name and then defines the columns, along with each column’s type and the maximum length of the field data. You can also optionally add constraints for each column.

You can learn more about how to create and manage tables in Postgres here.

For demonstration purposes, create the following table:

  • CREATE TABLE playground (
        equip_id serial PRIMARY KEY,
        type varchar (50) NOT NULL,
        color varchar (25) NOT NULL,
        location varchar(25) check (location in ('north', 'south', 'west', 'east', 'northeast', 'southeast', 'southwest', 'northwest')),
        install_date date
    );

This command will create a table that inventories playground equipment. The first column in the table will hold equipment ID numbers of the serial type, which is an auto-incrementing integer. This column also has the constraint of PRIMARY KEY which means that the values within it must be unique and not null.

The next two lines create columns for the equipment type and color respectively, neither of which can be empty. The line after these creates a location column as well as a constraint that requires the value to be one of eight possible values. The last line creates a date column that records the date on which you installed the equipment.

For two of the columns (equip_id and install_date), the command doesn’t specify a field length. The reason for this is that some data types don’t require a set length because the length or format is implied.

You can see your new table by typing:

  • \d
Output
                  List of relations
 Schema |          Name           |   Type   | Owner 
--------+-------------------------+----------+-------
 public | playground              | table    | sammy
 public | playground_equip_id_seq | sequence | sammy
(2 rows)

Your playground table is here, but there’s also something called playground_equip_id_seq that is of the type sequence. This is a representation of the serial type which you gave your equip_id column. This keeps track of the next number in the sequence and is created automatically for columns of this type.

If you want to see just the table without the sequence, you can type:

  • \dt
Output
          List of relations
 Schema |    Name    | Type  | Owner 
--------+------------+-------+-------
 public | playground | table | sammy
(1 row)

With a table at the ready, let’s use it to practice managing data.

Step 7 — Adding, Querying, and Deleting Data in a Table

Now that you have a table, you can insert some data into it. As an example, add a slide and a swing by calling the table you want to add to, naming the columns and then providing data for each column, like this:

  • INSERT INTO playground (type, color, location, install_date) VALUES ('slide', 'blue', 'south', '2017-04-28');
  • INSERT INTO playground (type, color, location, install_date) VALUES ('swing', 'yellow', 'northwest', '2018-08-16');

You should take care when entering the data to avoid a few common hangups. For one, do not wrap the column names in quotation marks, but the column values that you enter do need quotes.

Another thing to keep in mind is that you do not enter a value for the equip_id column. This is because this is automatically generated whenever you add a new row to the table.

Retrieve the information you’ve added by typing:

  • SELECT * FROM playground;
Output
 equip_id | type  | color  | location  | install_date 
----------+-------+--------+-----------+--------------
        1 | slide | blue   | south     | 2017-04-28
        2 | swing | yellow | northwest | 2018-08-16
(2 rows)

Here, you can see that your equip_id has been filled in successfully and that all of your other data has been organized correctly.

If the slide on the playground breaks and you have to remove it, you can also remove the row from your table by typing:

  • DELETE FROM playground WHERE type = 'slide';

Query the table again:

  • SELECT * FROM playground;
Output
 equip_id | type  | color  | location  | install_date 
----------+-------+--------+-----------+--------------
        2 | swing | yellow | northwest | 2018-08-16
(1 row)

Notice that the slide row is no longer a part of the table.

Step 8 — Adding and Deleting Columns from a Table

After creating a table, you can modify it by adding or removing columns. Add a column to show the last maintenance visit for each piece of equipment by typing:

  • ALTER TABLE playground ADD last_maint date;

If you view your table information again, you will see the new column has been added but no data has been entered:

  • SELECT * FROM playground;
Output
 equip_id | type  | color  | location  | install_date | last_maint 
----------+-------+--------+-----------+--------------+------------
        2 | swing | yellow | northwest | 2018-08-16   | 
(1 row)

If you find that your work crew uses a separate tool to keep track of maintenance history, you can delete the column by typing:

  • ALTER TABLE playground DROP last_maint;

This deletes the last_maint column and any values found within it, but leaves all the other data intact.

Step 9 — Updating Data in a Table

So far, you’ve learned how to add records to a table and how to delete them, but this tutorial hasn’t yet covered how to modify existing entries.

You can update the values of an existing entry by querying for the record you want and setting the column to the value you wish to use. You can query for the swing record (this will match every swing in your table) and change its color to red. This could be useful if you gave the swing set a paint job:

  • UPDATE playground SET color = 'red' WHERE type = 'swing';

You can verify that the operation was successful by querying the data again:

  • SELECT * FROM playground;
Output
 equip_id | type  | color | location  | install_date 
----------+-------+-------+-----------+--------------
        2 | swing | red   | northwest | 2018-08-16
(1 row)

As you can see, the swing is now registered as being red.

Conclusion

You are now set up with PostgreSQL on your Ubuntu 20.04 server. If you’d like to learn more about Postgres and how to use it, we encourage you to check out the following guides: