Providing Backups

The Operator usually stores Percona Server for MongoDB backups outside the Kubernetes cluster: on Amazon S3 or S3-compatible storage, or on Azure Blob Storage.

The Operator can back up the cluster in two ways. Scheduled backups are configured in the deploy/cr.yaml file and are executed automatically at the specified time. On-demand backups can be started manually at any moment. Both methods use the Percona Backup for MongoDB tool.

Warning

Backups made with Operator versions before 1.9.0 cannot be restored with the Operator version 1.9.0 and later. That is because Percona Backup for MongoDB 1.5.0, used by the newer Operator versions, processes the system Users and Roles collections differently. The recommended approach is to make a fresh backup after upgrading the Operator to version 1.9.0.

Making scheduled backups

The backup schedule is defined in the backup section of the deploy/cr.yaml file. This section contains the backup.enabled key (it should be set to true to enable backups) and the following subsections:

  • the storages subsection contains the data needed to access the S3-compatible cloud storage used for backups,
  • the tasks subsection schedules backups (the schedule is specified in crontab format).

Backups on Amazon S3 or S3-compatible storage

Since backups are stored separately on Amazon S3, a Secret with the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY keys should be present on the Kubernetes cluster. Create a secrets file with these base64-encoded keys: for example, a deploy/backup-s3.yaml file with the following contents.

apiVersion: v1
kind: Secret
metadata:
  name: my-cluster-name-backup-s3
type: Opaque
data:
  AWS_ACCESS_KEY_ID: UkVQTEFDRS1XSVRILUFXUy1BQ0NFU1MtS0VZ
  AWS_SECRET_ACCESS_KEY: UkVQTEFDRS1XSVRILUFXUy1TRUNSRVQtS0VZ

Note

The following command can be used to get a base64-encoded string from a plain text one: $ echo -n 'plain-text-string' | base64

The name value is the Kubernetes Secret name which will be referenced later, and AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are the keys used to access the S3 storage (they must contain valid values for this access to work). To take effect, the secrets file should be applied with the appropriate command to create the Secret object, e.g. kubectl apply -f deploy/backup-s3.yaml (for Kubernetes).
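
Alternatively, the same Secret can be created directly from plain-text values, letting kubectl do the base64 encoding for you. A minimal sketch (the key values are placeholders to replace with your own):

$ kubectl create secret generic my-cluster-name-backup-s3 \
    --from-literal=AWS_ACCESS_KEY_ID=REPLACE-WITH-AWS-ACCESS-KEY \
    --from-literal=AWS_SECRET_ACCESS_KEY=REPLACE-WITH-AWS-SECRET-KEY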

All the data needed to access the S3-compatible cloud storage should be put into the backup.storages subsection, and the backup.tasks subsection should schedule backups in crontab-compatible format. Here is an example of deploy/cr.yaml which uses Amazon S3 storage for backups:

...
backup:
  enabled: true
  ...
  storages:
    s3-us-west:
      type: s3
      s3:
        bucket: S3-BACKUP-BUCKET-NAME-HERE
        region: us-west-2
        credentialsSecret: my-cluster-name-backup-s3
  ...
  tasks:
   - name: "sat-night-backup"
     schedule: "0 0 * * 6"
     keep: 3
     storageName: s3-us-west
  ...

Using AWS EC2 instances for backups makes it possible to automate access to AWS S3 buckets based on IAM roles for Service Accounts with no need to specify the S3 credentials explicitly.

The following steps are needed to turn this feature on:

  • Create the IAM instance profile and the permission policy within it that specifies the access level granting access to S3 buckets.
  • Attach the IAM profile to an EC2 instance.
  • Configure an S3 storage bucket and verify the connection from the EC2 instance to it.
  • Do not provide s3.credentialsSecret for the storage in deploy/cr.yaml.
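
With such a role in place, the storage entry in deploy/cr.yaml simply omits the credentialsSecret key. A minimal sketch under these assumptions (the bucket name and region are placeholders to replace with your own):

backup:
  enabled: true
  ...
  storages:
    s3-us-west:
      type: s3
      s3:
        bucket: S3-BACKUP-BUCKET-NAME-HERE
        region: us-west-2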

If you use some S3-compatible storage instead of the original Amazon S3, add the endpointUrl key to the s3 subsection; it points to the actual cloud used for backups and is specific to the cloud provider. For example, using Google Cloud involves the following endpointUrl:

endpointUrl: https://storage.googleapis.com

You can also use the prefix option to specify the path (sub-folder) to the backups inside the S3 bucket. If prefix is not set, backups are stored in the root directory of the bucket.
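
Putting these together, a storage entry for an S3-compatible provider might look like the following sketch (the storage name, bucket, prefix, and region are illustrative placeholders; the endpointUrl is the Google Cloud one shown above):

backup:
  ...
  storages:
    s3-compatible-storage:
      type: s3
      s3:
        bucket: S3-BACKUP-BUCKET-NAME-HERE
        region: us-east-1
        prefix: psmdb
        endpointUrl: https://storage.googleapis.com
        credentialsSecret: my-cluster-name-backup-s3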

The options within these subsections are further explained in the Operator Custom Resource options.

One option which should be mentioned separately is credentialsSecret, which names the Kubernetes Secret used for backups. The value of this key should be the same as the name used to create the Secret object (my-cluster-name-backup-s3 in the last example).

Backups on Microsoft Azure Blob storage

Since backups are stored separately on Azure Blob Storage, a Secret with the AZURE_STORAGE_ACCOUNT_NAME and AZURE_STORAGE_ACCOUNT_KEY keys should be present on the Kubernetes cluster. Create a secrets file with these base64-encoded keys: for example, a deploy/backup-azure.yaml file with the following contents.

apiVersion: v1
kind: Secret
metadata:
  name: my-cluster-azure-secret
type: Opaque
data:
  AZURE_STORAGE_ACCOUNT_NAME: UkVQTEFDRS1XSVRILUFXUy1BQ0NFU1MtS0VZ
  AZURE_STORAGE_ACCOUNT_KEY: UkVQTEFDRS1XSVRILUFXUy1TRUNSRVQtS0VZ

Note

The following command can be used to get a base64-encoded string from a plain text one: $ echo -n 'plain-text-string' | base64

The name value is the Kubernetes Secret name which will be referenced later, and the AZURE_STORAGE_ACCOUNT_NAME and AZURE_STORAGE_ACCOUNT_KEY credentials will be used to access the storage (they must contain valid values for this access to work). To take effect, the secrets file should be applied with the appropriate command to create the Secret object, e.g. kubectl apply -f deploy/backup-azure.yaml (for Kubernetes).

All the data needed to access the Azure Blob storage should be put into the backup.storages subsection, and the backup.tasks subsection should schedule backups in crontab-compatible format. Here is an example of deploy/cr.yaml which uses Azure Blob storage for backups:

...
backup:
  enabled: true
  ...
  storages:
    azure-blob:
      type: azure
      azure:
        container: <your-container-name>
        prefix: psmdb
        credentialsSecret: my-cluster-azure-secret
  ...
  tasks:
   - name: "sat-night-backup"
     schedule: "0 0 * * 6"
     keep: 3
     storageName: azure-blob
  ...

The options within these subsections are further explained in the Operator Custom Resource options.

One option which should be mentioned separately is credentialsSecret, which names the Kubernetes Secret used for backups. The value of this key should be the same as the name used to create the Secret object (my-cluster-azure-secret in the last example).

You can use the prefix option to specify the path (sub-folder) to the backups inside the container. If prefix is not set, backups are stored in the root directory of the container.

Making on-demand backup

To make an on-demand backup, first edit the deploy/cr.yaml configuration file: set the backup.enabled key to true and configure the backup storage in the backup.storages subsection in the same way it was done for scheduled backups. When the deploy/cr.yaml file contains correctly configured keys and has been applied with the kubectl command, use a special backup configuration YAML file with the following contents:

  • the backup name in the metadata.name key,
  • the Percona Server for MongoDB cluster name in the spec.clusterName key (prior to the Operator version 1.12.0 this key was named spec.psmdbCluster),
  • the storage name from deploy/cr.yaml in the spec.storageName key.

An example of such a file is deploy/backup/backup.yaml.

When the backup destination is configured and applied with the kubectl apply -f deploy/cr.yaml command, start the actual backup as follows:

$ kubectl apply -f deploy/backup/backup.yaml
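
Once created, the backup object can be tracked until its status reports a successful state, for example with the following check (substitute the backup name from your backup configuration file):

$ kubectl get psmdb-backup <backup-name>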

Note

Instead of storing the backup settings in a separate file, you can pass its content to the kubectl apply command as follows:

$ cat <<EOF | kubectl apply -f-
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDBBackup
metadata:
  name: backup1
spec:
  clusterName: my-cluster-name
  storageName: s3-us-west
EOF

Storing operations logs for point-in-time recovery

Point-in-time recovery functionality allows users to roll back the cluster to a specific date and time. Technically, this feature involves saving operations log updates to the S3-compatible backup storage.

To use it, set the backup.pitr.enabled key in the deploy/cr.yaml configuration file:

backup:
  ...
  pitr:
    enabled: true
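
As with other deploy/cr.yaml changes, apply the updated configuration for this setting to take effect:

$ kubectl apply -f deploy/cr.yaml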

Note

It is necessary to have at least one full backup to use point-in-time recovery. Percona Backup for MongoDB will not upload operations logs if there is no full backup. This is true for new clusters and also for clusters which have just been recovered from a backup.

Percona Backup for MongoDB uploads operations logs to the same bucket where the full backup is stored. This makes point-in-time recovery available only if there is a single bucket in spec.backup.storages. Otherwise point-in-time recovery will not be enabled, and an error message will appear in the Operator logs.

Note

Adding a new bucket when point-in-time recovery is enabled will not break it, but an error message about the additional bucket will appear in the Operator logs as well.

Restore the cluster from a previously saved backup

A backup can be restored not only on the Kubernetes cluster where it was made, but also on any Kubernetes-based environment with the Operator installed.

Note

When restoring to a new Kubernetes-based environment, make sure it has a Secrets object with the same user passwords as in the original cluster. More details about secrets can be found in System Users. The name of the required Secrets object is specified by the spec.secrets key in deploy/cr.yaml (my-cluster-name-secrets by default).

The following things are needed to restore a previously saved backup:

  • Make sure that the cluster is running.
  • Find out the correct names for the backup and the cluster. Available backups can be listed with the following command:

    $ kubectl get psmdb-backup
    

    Note

    You can make this check only on the same cluster on which you have previously made the backup.

    And the following command will list available clusters:

    $ kubectl get psmdb
    

Restoring without point-in-time recovery

When the correct names for the backup and the cluster are known, backup restoration can be done in the following way.

  1. Set the appropriate keys in the deploy/backup/restore.yaml file.

    • set the spec.clusterName key to the name of the target cluster to restore the backup on,

    • if you are restoring the backup on the same Kubernetes-based cluster you used to save it, set the spec.backupName key to the name of your backup,

    • if you are restoring the backup on a Kubernetes-based cluster different from the one you used to save it, set the spec.backupSource subsection instead of the spec.backupName field to point to the appropriate S3-compatible storage. This backupSource subsection should contain a destination key, followed by the necessary storage configuration keys, same as in the deploy/cr.yaml file:

      ...
      backupSource:
        destination: s3://S3-BUCKET-NAME/BACKUP-NAME
        s3:
          credentialsSecret: my-cluster-name-backup-s3
          region: us-west-2
          endpointUrl: https://URL-OF-THE-S3-COMPATIBLE-STORAGE
      

    As you may have noticed, the destination value is composed of three parts in the case of S3-compatible storage: the s3:// prefix, the S3 bucket name, and the actual backup name (which you have already found out using the kubectl get psmdb-backup command). For Azure Blob storage, you don’t put the prefix, and you use your container name as the equivalent of a bucket.

    • you can also use a storageName key to specify the exact name of the storage (the actual storage should already be defined in the backup.storages subsection of the deploy/cr.yaml file):

      ...
      storageName: s3-us-west
      backupSource:
        destination: s3://S3-BUCKET-NAME/BACKUP-NAME

  2. After that, the actual restoration process can be started as follows:

    $ kubectl apply -f deploy/backup/restore.yaml
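
    The restoration progress can then be tracked by listing the restore objects (psmdb-restore is the short name for PerconaServerMongoDBRestore resources) until the state reports success:

    $ kubectl get psmdb-restore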
    

    Note

    Instead of storing the backup settings in a separate file, you can pass its content to the kubectl apply command as follows:

    $ cat <<EOF | kubectl apply -f-
    apiVersion: psmdb.percona.com/v1
    kind: PerconaServerMongoDBRestore
    metadata:
      name: restore1
    spec:
      clusterName: my-cluster-name
      backupName: backup1
    EOF
    

Restoring backup with point-in-time recovery

The following steps are needed to roll back the cluster to a specific date and time:

  1. Set the appropriate keys in the deploy/backup/restore.yaml file.

    • set the spec.clusterName key to the name of the target cluster to restore the backup on,

    • put the additional restoration parameters into the pitr section:

      ...
      spec:
        clusterName: my-cluster-name
        pitr:
          type: date
          date: YYYY-MM-DD hh:mm:ss
      
    • if you are restoring the backup on the same Kubernetes-based cluster you used to save it, set the spec.backupName key to the name of your backup,

    • if you are restoring the backup on a Kubernetes-based cluster different from the one you used to save it, set the spec.backupSource subsection instead of the spec.backupName field to point to the appropriate S3-compatible storage. This backupSource subsection should contain a destination key equal to the S3 bucket with a special s3:// prefix, followed by the necessary S3 configuration keys, same as in the deploy/cr.yaml file:

      ...
      backupSource:
        destination: s3://S3-BUCKET-NAME/BACKUP-NAME
        s3:
          credentialsSecret: my-cluster-name-backup-s3
          region: us-west-2
          endpointUrl: https://URL-OF-THE-S3-COMPATIBLE-STORAGE
      
    • you can also use a storageName key to specify the exact name of the storage (the actual storage should already be defined in the backup.storages subsection of the deploy/cr.yaml file):

      ...
      storageName: s3-us-west
      backupSource:
        destination: s3://S3-BUCKET-NAME/BACKUP-NAME
      
  2. Run the actual restoration process:

    $ kubectl apply -f deploy/backup/restore.yaml
    

    Note

    Instead of storing the backup settings in a separate file, you can pass its content to the kubectl apply command as follows:

    $ cat <<EOF | kubectl apply -f-
    apiVersion: psmdb.percona.com/v1
    kind: PerconaServerMongoDBRestore
    metadata:
      name: restore1
    spec:
      clusterName: my-cluster-name
      backupName: backup1
      pitr:
        type: date
        date: YYYY-MM-DD hh:mm:ss
    EOF
    

Delete the unneeded backup

The maximum number of stored backups is controlled by the backup.tasks.keep option (only successful backups are counted). Older backups are automatically deleted, so that the number of stored backups does not exceed this value. Setting keep=0 or removing this option from deploy/cr.yaml disables automatic deletion of backups.

Manually deleting a previously saved backup requires nothing more than the backup name. This name can be taken from the list of available backups returned by the following command:

$ kubectl get psmdb-backup

When the name is known, backup can be deleted as follows:

$ kubectl delete psmdb-backup/<backup-name>

Last update: 2022-09-21