    Cluster failover

    Cluster membership is determined simply by which nodes are connected to the rest of the cluster; there is no configuration setting that explicitly defines the list of all possible cluster nodes. Therefore, every time a node joins the cluster, the total size of the cluster increases, and when a node leaves (gracefully) the size decreases.
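
    For instance, you can see the current membership from any connected node by checking the wsrep status variables; the values returned naturally depend on your own topology:

    SHOW STATUS LIKE 'wsrep_cluster_size';        -- number of nodes currently in the cluster
    SHOW STATUS LIKE 'wsrep_incoming_addresses';  -- comma-separated list of member addresses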

    The size of the cluster is used to determine the number of votes required to achieve quorum. A quorum vote is done when a node or nodes are suspected to no longer be part of the cluster (they do not respond). This no-response timeout is the evs.suspect_timeout setting in wsrep_provider_options (default 5 seconds), and when a node goes down ungracefully, write operations are blocked on the cluster for slightly longer than that timeout.
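
    As a quick illustration, you can inspect the current provider options and, assuming the option is dynamic in your version, adjust the suspect timeout at runtime. The PT10S value below is only an example; the format is an ISO 8601 duration:

    SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options';

    -- Example only: raise the no-response timeout from the default PT5S to 10 seconds
    SET GLOBAL wsrep_provider_options='evs.suspect_timeout=PT10S';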

    Once a node (or nodes) is determined to be disconnected, the remaining nodes cast a quorum vote. If a majority of the nodes from before the disconnect are still connected, that partition remains up. In the case of a network partition, some nodes will be alive and active on each side of the disconnect, but only the side that holds quorum continues to operate; the partition(s) without quorum change to the non-primary state.
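
    Each node reports which side of a partition it ended up on through the wsrep_cluster_status status variable:

    SHOW STATUS LIKE 'wsrep_cluster_status';  -- 'Primary' on the side with quorum, 'non-Primary' otherwise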

    As a consequence, it is not possible to have safe automatic failover in a two-node cluster, because the failure of one node causes the remaining node to become non-primary. Moreover, any cluster with an even number of nodes (say, two nodes on two different switches) has some possibility of a split brain situation: if the connection between the two halves is lost, neither partition is able to retain quorum, and so both become non-primary.

    Therefore, for automatic failover, the rule of 3s is recommended. It applies at various levels of your infrastructure, depending on how far the cluster is spread out to avoid single points of failure. For example:

    • A cluster on a single switch should have 3 nodes

    • A cluster spanning switches should be spread evenly across at least 3 switches

    • A cluster spanning networks should span at least 3 networks

    • A cluster spanning data centers should span at least 3 data centers

    These rules will prevent split brain situations and ensure automatic failover works correctly.

    Use an arbitrator

    If it is too expensive to add a third node, switch, network, or data center, you should use an arbitrator. An arbitrator is a voting member of the cluster that can receive and relay replication, but it does not persist any data and runs its own daemon instead of mysqld. Placing even a single arbitrator in a third location can add split brain protection to a cluster that is spread across only two nodes or locations.
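
    For example, a minimal arbitrator (garbd) invocation might look like the sketch below; the group name and node addresses are placeholders for your own settings, and the group name must match the cluster's wsrep_cluster_name (see Set up Galera arbitrator for full details):

    # Sketch only: join the cluster as a data-less voting member from a third location
    garbd --group=my_pxc_cluster \
          --address="gcomm://node1.example.com:4567,node2.example.com:4567" \
          --daemon --log=/var/log/garbd.log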

    Recover a non-primary cluster

    It is important to note that the rule of 3s applies only to automatic failover. In a two-node cluster (or after some other outage that leaves only a minority of nodes active), the failure of one node causes the other to become non-primary and refuse operations. However, you can recover the node from the non-primary state using the following command:

    SET GLOBAL wsrep_provider_options='pc.bootstrap=true';
    

    This tells the node (and all nodes still connected to its partition) that it can become a primary cluster. However, this is only safe to do when you are sure there is no other partition operating as primary as well; otherwise, Percona XtraDB Cluster will allow those two partitions to diverge, and you will end up with two databases that are impossible to re-merge automatically.
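
    Put together, a cautious recovery on the surviving partition might look like this sketch: confirm the node really is non-primary, bootstrap it, and then confirm it has returned to the primary state:

    SHOW STATUS LIKE 'wsrep_cluster_status';  -- expect 'non-Primary' before recovery

    SET GLOBAL wsrep_provider_options='pc.bootstrap=true';

    SHOW STATUS LIKE 'wsrep_cluster_status';  -- expect 'Primary' afterwards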

    For example, assume there are two data centers, where one is primary and one is for disaster recovery, with an even number of nodes in each. When an extra arbitrator node is run only in the primary data center, the following high availability features will be available:

    • Auto-failover of any single node or nodes within the primary or secondary data center

    • Failure of the secondary data center would not cause the primary to go down (because of the arbitrator)

    • Failure of the primary data center would leave the secondary in a non-primary state.

    • If a disaster-recovery failover is required, you can tell the secondary data center to bootstrap itself with a single command (the same pc.bootstrap option shown above), but that failover remains in your control rather than happening automatically.

    Other reading

    • PXC - Failure Scenarios with only 2 nodes
