
Measuring high availability

The need for high availability is determined by business requirements, potential risks, and operational limitations. For example, the more components you add to your infrastructure, the more complex and time-consuming it becomes to maintain. Moreover, every added component may introduce extra failure points. The recommendation is to follow the principle “the simpler, the better”.

The level of high availability depends on the following:

  • how frequently you may encounter an outage or downtime,
  • how much downtime per outage you can bear without negatively impacting your users, and
  • how much data loss you can tolerate during an outage.

When you evaluate high availability, consider these two aspects:

  • Expected level of availability.
  • Actual availability level of your infrastructure.

Expected level of availability

Availability is measured by establishing a measurement time frame and dividing the time the system was available by the total length of that time frame (see the sketch after the table below). This ratio will rarely equal one, which corresponds to 100% availability. At Percona, we don’t consider a solution to be highly available if it is not at least 99%, or “two nines”, available.

The following table shows the amount of downtime for each level of availability from two to five nines.

| Availability %              | Downtime per year | Downtime per month | Downtime per week | Downtime per day    |
|-----------------------------|-------------------|--------------------|-------------------|---------------------|
| 99% (“two nines”)           | 3.65 days         | 7.31 hours         | 1.68 hours        | 14.40 minutes       |
| 99.5% (“two nines five”)    | 1.83 days         | 3.65 hours         | 50.40 minutes     | 7.20 minutes        |
| 99.9% (“three nines”)       | 8.77 hours        | 43.83 minutes      | 10.08 minutes     | 1.44 minutes        |
| 99.95% (“three nines five”) | 4.38 hours        | 21.92 minutes      | 5.04 minutes      | 43.20 seconds       |
| 99.99% (“four nines”)       | 52.60 minutes     | 4.38 minutes       | 1.01 minutes      | 8.64 seconds        |
| 99.995% (“four nines five”) | 26.30 minutes     | 2.19 minutes       | 30.24 seconds     | 4.32 seconds        |
| 99.999% (“five nines”)      | 5.26 minutes      | 26.30 seconds      | 6.05 seconds      | 864.00 milliseconds |
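
To make the arithmetic behind this table concrete, here is a minimal Python sketch that computes the availability ratio and the downtime budget for a given level. The function names and sample figures are illustrative only, not part of any Percona tooling.

```python
# Minimal sketch of the availability calculation described above.

PERIODS = {
    "year": 365.25 * 24 * 3600,        # seconds, averaging leap years
    "month": 365.25 * 24 * 3600 / 12,
    "week": 7 * 24 * 3600,
    "day": 24 * 3600,
}

def availability(available_seconds: float, total_seconds: float) -> float:
    """Availability ratio: time the system was up divided by the time frame."""
    return available_seconds / total_seconds

def downtime_budget(availability_pct: float, period: str) -> float:
    """Maximum allowed downtime (in seconds) for a given availability level."""
    return PERIODS[period] * (1 - availability_pct / 100)

if __name__ == "__main__":
    # Reproduce one row of the table: 99.99% ("four nines") per year.
    seconds = downtime_budget(99.99, "year")
    print(f"99.99% allows {seconds / 60:.2f} minutes of downtime per year")
    # -> roughly 52.60 minutes, matching the table
```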

Actual level of availability

Measuring the real level of high availability (HA) in your system is key to making sure your investment in HA infrastructure pays off. Instead of relying on assumptions or expectations, you should base your availability insights on incident management data. This is the information collected during service disruptions, failures, or outages that affect the normal functioning of the setup. With this data, you can track metrics like uptime, Mean Time to Recovery (MTTR), and Mean Time Between Failures (MTBF).
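
As an illustration, the following Python sketch derives uptime, MTTR, and MTBF from a list of incident records. The record format, time window, and sample incidents are assumptions made for this example, not a prescribed data model.

```python
# Illustrative only: computes uptime, MTTR, and MTBF from incident
# management data recorded as (outage start, outage end) pairs.

from datetime import datetime, timedelta

incidents = [
    (datetime(2024, 3, 1, 2, 0), datetime(2024, 3, 1, 2, 40)),
    (datetime(2024, 9, 15, 11, 0), datetime(2024, 9, 15, 11, 25)),
]

window_start = datetime(2024, 1, 1)
window_end = datetime(2025, 1, 1)
total = window_end - window_start

downtime = sum((end - start for start, end in incidents), timedelta())
uptime_pct = 100 * (1 - downtime / total)

# MTTR: average time to restore service per incident.
mttr = downtime / len(incidents)
# MTBF: average operating time between failures.
mtbf = (total - downtime) / len(incidents)

# Steady-state availability estimate from these two metrics.
estimated_availability = mtbf / (mtbf + mttr)

print(f"Uptime: {uptime_pct:.4f}%  MTTR: {mttr}  MTBF: {mtbf}")
print(f"Estimated availability: {estimated_availability:.6f}")
```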

MTBF gives you a picture of how reliable your infrastructure really is. In a well-designed high-availability environment, incidents should be rare, typically occurring no more than once every two to four years. This assumes a robust infrastructure, as not all systems are equally suited to handling database load.

Recovery speed matters too. For example, a typical Patroni-based cluster can fail over to a new primary node within 30 to 50 seconds. However, note that database availability metrics typically don’t account for the application’s ability to detect the failover and reconnect. Some applications recover seamlessly, while others may require a restart.
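
As a rough illustration of application-side handling, the sketch below retries a PostgreSQL connection until a new primary accepts writes. It assumes libpq’s multi-host connection string with target_session_attrs=read-write; the DSN, host names, and retry parameters are placeholders, not recommendations.

```python
# Hedged sketch: reconnect to whichever cluster member is the writable
# primary after a failover. Requires psycopg2 and a libpq recent enough
# to support multi-host DSNs (libpq 10+).

import time

import psycopg2

# Placeholder DSN: libpq tries each host and keeps the one accepting writes.
DSN = "host=db1,db2,db3 dbname=app user=app target_session_attrs=read-write"

def connect_with_retry(retries: int = 30, delay: float = 2.0):
    """Retry until the new primary is reachable. Since a Patroni promotion
    typically completes within 30-50 seconds, the total retry budget
    (retries * delay) should comfortably cover that window."""
    for _ in range(retries):
        try:
            return psycopg2.connect(DSN)
        except psycopg2.OperationalError:
            time.sleep(delay)
    raise RuntimeError("primary did not become available in time")
```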