
    Testing the Patroni PostgreSQL Cluster

    This document covers the following scenarios to test the PostgreSQL cluster:

    • replication,
    • connectivity,
    • failover, and
    • manual switchover.

    Testing replication

    1. Connect to the cluster by establishing a psql session from a client machine that can reach the HAProxy node. Use the HAProxy-demo node’s public IP address:

      $ psql -U postgres -h 134.209.111.138 -p 5000
      
    2. Run the following commands to create a table and insert a few rows:

      CREATE TABLE customer(name text,age integer);
      INSERT INTO CUSTOMER VALUES('john',30);
      INSERT INTO CUSTOMER VALUES('dawson',35);
      
    3. To verify that replication is working, log in to each PostgreSQL node and run a simple SQL statement against the locally running instance:

      $ sudo psql -U postgres -c "SELECT * FROM CUSTOMER;"
      

      The results on each node should be the following:

        name  | age
      --------+-----
       john   |  30
       dawson |  35
      (2 rows)
      

    Testing failover

    In a properly configured setup, client applications can keep connecting to the cluster even if one or two of the nodes go down. We will test the cluster for failover in the following scenarios:

    Scenario 1. Intentionally stop PostgreSQL on the primary node and verify access to PostgreSQL

    1. Run the following command on any node to check the current cluster status:

      $ sudo patronictl -c /etc/patroni/patroni.yml list
      
      + Cluster: stampede1 (7011110722654005156) -----------+
      | Member | Host  | Role    | State   | TL | Lag in MB |
      +--------+-------+---------+---------+----+-----------+
      | node1  | node1 | Leader  | running |  1 |           |
      | node2  | node2 | Replica | running |  1 |         0 |
      | node3  | node3 | Replica | running |  1 |         0 |
      +--------+-------+---------+---------+----+-----------+
      
    2. node1 is the current leader. Stop Patroni on node1 to see how it changes the cluster:

      $ sudo systemctl stop patroni
      
    3. Once the service stops on node1, check the logs on node2 and node3 using the following command:

      $ sudo journalctl -u patroni.service -n 100 -f
      

      Output

      Sep 23 14:18:13 node03 patroni[10042]: 2021-09-23 14:18:13,905 INFO: no action. I am a secondary (node3) and following a leader (node1)
      Sep 23 14:18:20 node03 patroni[10042]: 2021-09-23 14:18:20,011 INFO: Got response from node2 http://node2:8008/patroni: {"state": "running", "postmaster_start_time": "2021-09-23 12:50:29.460027+00:00", "role": "replica", "server_version": 130003, "cluster_unlocked": true, "xlog": {"received_location": 67219152, "replayed_location": 67219152, "replayed_timestamp": "2021-09-23 13:19:50.329387+00:00", "paused": false}, "timeline": 1, "database_system_identifier": "7011110722654005156", "patroni": {"version": "2.1.0", "scope": "stampede1"}}
      Sep 23 14:18:20 node03 patroni[10042]: 2021-09-23 14:18:20,031 WARNING: Request failed to node1: GET http://node1:8008/patroni (HTTPConnectionPool(host='node1', port=8008): Max retries exceeded with url: /patroni (Caused by ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))))
      Sep 23 14:18:20 node03 patroni[10042]: 2021-09-23 14:18:20,038 INFO: Software Watchdog activated with 25 second timeout, timing slack 15 seconds
      Sep 23 14:18:20 node03 patroni[10042]: 2021-09-23 14:18:20,043 INFO: promoted self to leader by acquiring session lock
      Sep 23 14:18:20 node03 patroni[13641]: server promoting
      Sep 23 14:18:20 node03 patroni[10042]: 2021-09-23 14:18:20,049 INFO: cleared rewind state after becoming the leader
      Sep 23 14:18:21 node03 patroni[10042]: 2021-09-23 14:18:21,101 INFO: no action. I am (node3) the leader with the lock
      Sep 23 14:18:21 node03 patroni[10042]: 2021-09-23 14:18:21,117 INFO: no action. I am (node3) the leader with the lock
      Sep 23 14:18:31 node03 patroni[10042]: 2021-09-23 14:18:31,114 INFO: no action. I am (node3) the leader with the lock
      ...
      

      The logs on node3 show that the requests to node1 are failing, the watchdog comes into action, and node3 promotes itself as the leader.

    4. Verify that you can still access the cluster through the HAProxy instance and read data:

      $ psql -U postgres -h 10.104.0.6 -p 5000 -c "SELECT * FROM CUSTOMER;"
      
        name  | age
      --------+-----
       john   |  30
       dawson |  35
      (2 rows)
      
    5. Restart the Patroni service on node1:

      $ sudo systemctl start patroni
      
    6. Check the current cluster status:

      $ sudo patronictl -c /etc/patroni/patroni.yml list
      
      + Cluster: stampede1 (7011110722654005156) -----------+
      | Member | Host  | Role    | State   | TL | Lag in MB |
      +--------+-------+---------+---------+----+-----------+
      | node1  | node1 | Replica | running |  2 |         0 |
      | node2  | node2 | Replica | running |  2 |         0 |
      | node3  | node3 | Leader  | running |  2 |           |
      +--------+-------+---------+---------+----+-----------+
      

    As we see, node3 remains the leader and the rest are replicas.
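
    Note that the timeline (TL) column increased from 1 to 2 after the failover. If you want to review when and why the cluster switched timelines, patronictl also offers a history subcommand; the following is a sketch that reuses the same configuration file:

      $ sudo patronictl -c /etc/patroni/patroni.yml history

    The output lists each timeline switch together with the reason for it.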

    Scenario 2. Abrupt machine shutdown or power outage

    To emulate a power outage, let’s kill the Patroni service on node3 and see what happens on node1 and node2.

    1. Identify the process ID of Patroni and then kill it with the -9 signal:

      $ ps aux | grep -i patroni
      
      postgres   10042  0.1  2.1 647132 43948 ?        Ssl  12:50   0:09 /usr/bin/python3 /usr/bin/patroni /etc/patroni/patroni.yml
      
      $ sudo kill -9 10042
      
    2. Check the logs on node2:

      $ sudo journalctl -u patroni.service -n 100 -f
      
      Output
      Sep 23 14:40:41 node02 patroni[10577]: 2021-09-23 14:40:41,656 INFO: no action. I am a secondary (node2) and following a leader (node3)
      …
      Sep 23 14:41:01 node02 patroni[10577]: 2021-09-23 14:41:01,373 INFO: Got response from node1 http://node1:8008/patroni: {"state": "running", "postmaster_start_time": "2021-09-23 14:25:30.076762+00:00", "role": "replica", "server_version": 130003, "cluster_unlocked": true, "xlog": {"received_location": 67221352, "replayed_location": 67221352, "replayed_timestamp": null, "paused": false}, "timeline": 2, "database_system_identifier": "7011110722654005156", "patroni": {"version": "2.1.0", "scope": "stampede1"}}
      Sep 23 14:41:03 node02 patroni[10577]: 2021-09-23 14:41:03,364 WARNING: Request failed to node3: GET http://node3:8008/patroni (HTTPConnectionPool(host='node3', port=8008): Max retries exceeded with url: /patroni (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f57e06dffa0>, 'Connection to node3 timed out. (connect timeout=2)')))
      Sep 23 14:41:03 node02 patroni[10577]: 2021-09-23 14:41:03,373 INFO: Software Watchdog activated with 25 second timeout, timing slack 15 seconds
      Sep 23 14:41:03 node02 patroni[10577]: 2021-09-23 14:41:03,385 INFO: promoted self to leader by acquiring session lock
      Sep 23 14:41:03 node02 patroni[15478]: server promoting
      Sep 23 14:41:03 node02 patroni[10577]: 2021-09-23 14:41:03,397 INFO: cleared rewind state after becoming the leader
      Sep 23 14:41:04 node02 patroni[10577]: 2021-09-23 14:41:04,450 INFO: no action. I am (node2) the leader with the lock
      Sep 23 14:41:04 node02 patroni[10577]: 2021-09-23 14:41:04,475 INFO: no action. I am (node2) the leader with the lock
      …
      … 
      

      node2 realizes that the leader is dead and promotes itself as the new leader.

    3. Try accessing the cluster through the HAProxy endpoint at any point during these operations. The cluster still accepts connections, as the probe sketched below demonstrates.
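
    One simple way to observe this continuously is to keep a small probe loop running against the HAProxy endpoint from a client machine while you kill and restart Patroni. The sketch below reuses the HAProxy address from the earlier example (adjust it to your setup); pg_is_in_recovery() returns false when the connection lands on the node currently acting as primary:

      $ while true; do psql -U postgres -h 10.104.0.6 -p 5000 -t -A -c "SELECT now(), pg_is_in_recovery();" || echo "connection failed"; sleep 1; done

    During the failover itself you may briefly see connection errors; as soon as a new leader is promoted and HAProxy routes to it, the probe returns rows again.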

    Manual switchover

    Typically, a manual switchover is needed for planned downtime to perform maintenance activity on the leader node. Patroni provides the switchover command to manually switch over from the leader node.

    Run the following command on node2 (the current leader node):

    $ sudo patronictl -c /etc/patroni/patroni.yml switchover
    

    Patroni asks for the name of the current primary node and then for the node that should take over as the new primary. You can also specify the time at which the switchover should happen. To trigger the process immediately, specify the value now:

    primary [node2]: node2
    Candidate ['node1', 'node3'] []: node1
    When should the switchover take place (e.g. 2021-09-23T15:56 )  [now]: now
    Current cluster topology
    + Cluster: stampede1 (7011110722654005156) -----------+
    | Member | Host  | Role    | State   | TL | Lag in MB |
    +--------+-------+---------+---------+----+-----------+
    | node1  | node1 | Replica | running |  3 |         0 |
    | node2  | node2 | Leader  | running |  3 |           |
    | node3  | node3 | Replica | stopped |    |   unknown |
    +--------+-------+---------+---------+----+-----------+
    Are you sure you want to switchover cluster stampede1, demoting current primary node2? [y/N]: y
    2021-09-23 14:56:40.54009 Successfully switched over to "node1"
    + Cluster: stampede1 (7011110722654005156) -----------+
    | Member | Host  | Role    | State   | TL | Lag in MB |
    +--------+-------+---------+---------+----+-----------+
    | node1  | node1 | Leader  | running |  3 |           |
    | node2  | node2 | Replica | stopped |    |   unknown |
    | node3  | node3 | Replica | stopped |    |   unknown |
    +--------+-------+---------+---------+----+-----------+
    

    Restart the Patroni service on node2 (after the “planned maintenance”). The node rejoins the cluster as a replica.
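
    To confirm that the cluster has converged after the restart, list the members again with the command used throughout this document. node1 should appear as the leader, and every node on which Patroni is running should appear as a replica:

      $ sudo patronictl -c /etc/patroni/patroni.yml list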
