Binding Percona Distribution for PostgreSQL components to Specific Kubernetes/OpenShift Nodes¶
The Operator does a good job of automatically assigning new Pods to nodes with sufficient resources, achieving a balanced distribution across the cluster. Still, there are situations when it is worth making sure that Pods land on specific nodes: for example, to take advantage of an SSD-equipped machine, or to reduce network costs by choosing nodes in the same availability zone.
The appropriate sections of the deploy/cr.yaml file (such as pgPrimary or pgReplicas) contain keys which can be used to do this, depending on what is best for a particular situation.
Affinity and anti-affinity¶
Affinity makes a Pod eligible (or not eligible, the so-called "anti-affinity") to be scheduled on a node which already has Pods with specific labels, or which has specific labels itself (the so-called "node affinity"). In particular, Pod affinity is good for reducing costs by making sure that several Pods with intensive data exchange occupy the same availability zone or even the same node, while Pod anti-affinity makes them land on different nodes or even different availability zones for high availability and balancing purposes. Node affinity is useful for assigning PostgreSQL instances to specific Kubernetes nodes (ones with specific hardware, zone, etc.).
Pod anti-affinity is controlled by the antiAffinityType option, which can be put into the pgPrimary, pgBouncer, and backup sections of the deploy/cr.yaml configuration file. This option can be set to one of two values:
preferred
Pod anti-affinity of this type is a soft rule. It makes Kubernetes try to schedule Pods matching the anti-affinity rules to different Nodes. If that is not possible, one or more Pods are scheduled to the same Node. This variant is used by default.
required
Pod anti-affinity of this type is a hard rule. It forces Kubernetes to schedule each Pod matching the anti-affinity rules to a different Node. If that is not possible, a Pod will not be scheduled at all.
Node affinity can be controlled by the pgPrimary.affinity.nodeAffinityType option in the deploy/cr.yaml configuration file. This option can be set to either preferred or required, similarly to the antiAffinityType option.
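For illustration, a minimal sketch of a deploy/cr.yaml fragment setting both options might look as follows (assuming the pgPrimary section sits under spec, as in the other examples on this page, and omitting all unrelated keys):
spec:
  pgPrimary:
    antiAffinityType: preferred   # soft Pod anti-affinity rule (the default)
    affinity:
      nodeAffinityType: required  # hard Node affinity rule
The same antiAffinityType key can be set in the pgBouncer and backup sections in the same way.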
Simple approach - configure Node Affinity based on nodeLabel¶
The Operator provides the pgPrimary.affinity.nodeLabel option, which should contain one or more key-value pairs. If the node is not labeled with each key-value pair and nodeAffinityType is set to required, the Pod will not be able to land on it.
The following example forces the Operator to land Percona Distribution for PostgreSQL instances on Nodes having the kubernetes.io/region: us-central1 label:
affinity:
  nodeAffinityType: required
  nodeLabel:
    kubernetes.io/region: us-central1
Advanced approach - use standard Kubernetes constraints¶
The previous approach can be used with no special knowledge of the Kubernetes way of assigning Pods to specific Nodes. Still, in some cases more complex tuning may be needed. In this case, the pgPrimary.affinity.advanced option in the deploy/cr.yaml file turns off the effect of nodeLabel and allows you to use standard Kubernetes affinity constraints of any complexity:
affinity:
  advanced:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: security
            operator: In
            values:
            - S1
        topologyKey: failure-domain.beta.kubernetes.io/zone
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: security
              operator: In
              values:
              - S2
          topologyKey: kubernetes.io/hostname
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/e2e-az-name
            operator: In
            values:
            - e2e-az1
            - e2e-az2
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: another-node-label-key
            operator: In
            values:
            - another-node-label-value
You can see the explanation of these affinity options in the Kubernetes documentation.
Default Affinity rules¶
The following anti-affinity rules are applied to all Percona Distribution for PostgreSQL Pods:
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: vendor
            operator: In
            values:
            - crunchydata
          - key: pg-pod-anti-affinity
            operator: Exists
          - key: pg-cluster
            operator: In
            values:
            - cluster1
        topologyKey: kubernetes.io/hostname
      weight: 1
You can see the explanation of these affinity options in the Kubernetes documentation.
Note
Setting the required anti-affinity type will result in placing all Pods on separate nodes, so the default configuration will require 7 Kubernetes nodes to deploy the cluster, with separate nodes assigned to one PostgreSQL primary, two PostgreSQL replica instances, three pgBouncer Pods, and one pgBackrest Pod.
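For illustration, a sketch of the deploy/cr.yaml fragment that switches all three sections to the required type (and therefore to the 7-node layout mentioned in the note above) might look like this; all unrelated keys are omitted, and the section placement under spec is assumed to follow the other examples on this page:
spec:
  pgPrimary:
    antiAffinityType: required  # hard rule: matching PostgreSQL Pods are never co-scheduled
  pgBouncer:
    antiAffinityType: required  # hard rule for pgBouncer Pods
  backup:
    antiAffinityType: required  # hard rule for the pgBackrest Pod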
Tolerations¶
Tolerations allow Pods having them to land onto nodes with matching taints. A toleration is expressed as a key with an operator, which is either Exists or Equal (the latter variant also requires a value the key is equal to). Moreover, a toleration should have a specified effect, which may be the self-explanatory NoSchedule, the less strict PreferNoSchedule, or NoExecute. The last variant means that if a taint with NoExecute is assigned to a node, then any Pod not tolerating this taint will be removed from the node, immediately or after the tolerationSeconds interval, like in the following example.
You can use the pgPrimary.tolerations key in the deploy/cr.yaml configuration file as follows:
tolerations:
- key: "node.alpha.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 6000
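For comparison, a node carrying a taint matched by the above toleration could be described like this (a hypothetical Node manifest fragment with a made-up node name; in real clusters such taints are usually applied by the control plane or with the kubectl taint command rather than written by hand):
apiVersion: v1
kind: Node
metadata:
  name: worker-1                # hypothetical node name
spec:
  taints:
  - key: node.alpha.kubernetes.io/unreachable
    effect: NoExecute           # evicts Pods that do not tolerate this taint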
The Kubernetes Taints and Tolerations documentation contains more examples on this topic.