Running Percona Backup for MongoDB

This document provides examples of using pbm commands to operate your backup system. For a detailed description of pbm commands, refer to pbm commands.

Listing backups

To view all completed backups, run the pbm list command.

$ pbm list

As of version 1.4.0, the pbm list output shows the completion time. This is the time to which the sharded cluster / non-sharded replica set will be returned after the restore.

Sample output

Backup snapshots:
  2021-01-13T15:50:54Z [complete: 2021-01-13T15:53:40Z]
  2021-01-13T16:10:20Z [complete: 2021-01-13T16:13:00Z]
  2021-01-20T17:09:46Z [complete: 2021-01-20T17:10:33Z]

In logical backups, the completion time almost coincides with the backup finish time. To define the completion time, Percona Backup for MongoDB waits for the backup snapshot to finish on all cluster nodes. Then it captures the oplog from the backup start time up to that time.

In physical backups, the completion time is only a few seconds after the backup start time. Holding the $backupCursor open guarantees that the checkpoint data won’t change during the backup, so Percona Backup for MongoDB can define the completion time ahead.
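Because the completion time is the point the data returns to after a restore, it can be handy to pull it out of the listing in a script. The following is a minimal sketch, assuming the 1.4.0+ output format shown above; the function name latest_snapshot is hypothetical and not part of pbm.

```shell
#!/bin/sh
# Print "<snapshot name> <completion time>" for the most recent snapshot
# found in `pbm list` output read from stdin.
# Assumes lines of the form:
#   2021-01-13T15:50:54Z [complete: 2021-01-13T15:53:40Z]
latest_snapshot() {
  awk '
    /\[complete: / {
      name = $1
      completed_at = $NF            # last field, e.g. "2021-01-13T15:53:40Z]"
      sub(/\]$/, "", completed_at)  # strip the trailing bracket
      # ISO 8601 UTC timestamps sort correctly as plain strings
      if (name > latest) { latest = name; completed = completed_at }
    }
    END { if (latest != "") print latest, completed }
  '
}

# Usage (requires a configured pbm CLI):
#   pbm list | latest_snapshot
```

The sketch relies only on the first and last fields of each line, so it should also cope with listings that carry an extra column between them.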

Starting a backup

$ pbm backup --type=TYPE

As of version 1.7.0, you can specify what type of backup you wish to make: physical or logical.

When physical backup is selected, Percona Backup for MongoDB copies the contents of the dbpath directory (data and metadata files, indexes, journal and logs) from every shard and config server replica set to the backup storage.

During logical backups, Percona Backup for MongoDB copies the actual data to the backup storage. When no --type flag is passed, Percona Backup for MongoDB makes a logical backup.

For more information about backup types, see Backup and restore types.

By default, Percona Backup for MongoDB uses s2 compression method when making a backup. You can start a backup with a different compression method by passing the --compression flag to the pbm backup command.

For example, to start a backup with gzip compression, use the following command:

$ pbm backup --compression=gzip

Supported compression types are: gzip, snappy, lz4, s2, pgzip, zstd. The none value means no compression is done during backup.

As of version 1.7.0, you can configure the compression level for backups. Specify the value for the --compression-level flag. Note that the higher the value you specify, the longer it takes to compress and retrieve the data.
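The flags described above can be combined in a single command. The following is a minimal sketch of a helper that assembles such a command line and rejects values not named in this section. The function name backup_cmd is hypothetical, and the helper only prints the command rather than running it.

```shell
#!/bin/sh
# Print a pbm backup command line for the given backup type, compression
# method and optional compression level. Nothing is executed.
backup_cmd() {
  btype=$1
  compression=$2
  level=${3:-}
  case $btype in
    logical|physical) ;;
    *) echo "unknown backup type: $btype" >&2; return 1 ;;
  esac
  case $compression in
    none|gzip|snappy|lz4|s2|pgzip|zstd) ;;
    *) echo "unknown compression method: $compression" >&2; return 1 ;;
  esac
  cmd="pbm backup --type=$btype --compression=$compression"
  # --compression-level is optional; valid ranges depend on the method
  # and are not checked in this sketch.
  if [ -n "$level" ]; then
    cmd="$cmd --compression-level=$level"
  fi
  echo "$cmd"
}

# Usage:
#   backup_cmd physical gzip
#   backup_cmd logical zstd 3
```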

Backup in sharded clusters

Important

For PBM v1.0 (only): before running pbm backup on a cluster, stop the balancer.

In sharded clusters, one of the pbm-agent processes for every shard and the config server replica set writes backup snapshots into the remote backup storage directly. For logical backups, pbm-agents also write oplog slices. To learn more about oplog slicing, see Point-in-Time Recovery.

The mongos nodes are not involved in the backup process.

The following diagram illustrates the backup flow.

_images/pbm-backup-shard.png

Important

If you reshard a collection in MongoDB 5.0 and higher versions, make a fresh backup to prevent data inconsistency and restore failure.

Adjust node priority for backups

In Percona Backup for MongoDB prior to version 1.5.0, the pbm-agent that makes a backup is elected randomly among secondary nodes in a replica set. In sharded cluster deployments, the pbm-agent is elected among the secondary nodes in every shard and the config server replica set. If no secondary node responds in a defined period, then the pbm-agent on the primary node is elected to do a backup.

As of version 1.5.0, you can influence the pbm-agent election by assigning a priority to mongod nodes in the Percona Backup for MongoDB configuration file.

backup:
  priority:
    "localhost:28019": 2.5
    "localhost:27018": 2.5
    "localhost:27020": 2.0
    "localhost:27017": 0.1

The format of the priority array is <hostname:port>:<priority>.

To define priority in a sharded cluster, you can either list all nodes or specify priority for one node in each shard and config server replica set. The hostname and port uniquely identify a node so that Percona Backup for MongoDB recognizes which replica set it belongs to and grants the priority accordingly.

Note that if you list only specific nodes, the remaining nodes are automatically assigned priority 1.0. For example, suppose you assign priority 2.5 to only one secondary node in every shard and config server replica set of the sharded cluster:

backup:
  priority:
    "localhost:27027": 2.5  # config server replica set
    "localhost:27018": 2.5  # shard 1
    "localhost:28018": 2.5  # shard 2

The remaining secondaries and the primary nodes in the cluster receive priority 1.0.

The mongod node with the highest priority makes the backup. If this node is unavailable, the node with the next highest priority is selected. If there are several nodes with the same priority, one of them is randomly elected to make the backup.

If you haven’t listed any nodes for the priority option in the config, the nodes have the default priority for making backups as follows:

  • hidden nodes - priority 2.0

  • secondary nodes - priority 1.0

  • primary node - priority 0.5

This ability to adjust node priority helps you manage your backup strategy by selecting specific nodes or nodes from preferred data centers. In geographically distributed infrastructures, you can reduce network latency by making backups from nodes in geographically closest locations.

Important

As soon as you adjust node priorities in the configuration file, it is assumed that you take manual control over them. The default rule to prefer secondary nodes over primary stops working.
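The priority rules above can be modelled in a few lines. The following is an illustrative sketch only, not the agent's actual election code. It reads one line per node of a single replica set in a hypothetical "host:port role [priority]" format and prints the nodes in the highest priority tier. Availability checks and the random pick among equal nodes are left out.

```shell
#!/bin/sh
# Print the node(s) with the highest effective backup priority.
# Input (stdin), one node per line: "<host:port> <role> [<priority>]"
# where role is primary, secondary or hidden. This input format is
# made up for the sketch.
elect_candidates() {
  awk '
    NF >= 2 {
      n++
      host[n] = $1; role[n] = $2
      # An explicit priority on any node switches to manual control
      if (NF >= 3) { prio[n] = $3 + 0; listed[n] = 1; manual = 1 }
    }
    END {
      best = -1
      for (i = 1; i <= n; i++) {
        if (!listed[i]) {
          if (manual) prio[i] = 1.0                        # unlisted node
          else if (role[i] == "hidden") prio[i] = 2.0      # default rules
          else if (role[i] == "primary") prio[i] = 0.5
          else prio[i] = 1.0
        }
        if (prio[i] > best) best = prio[i]
      }
      for (i = 1; i <= n; i++)
        if (prio[i] == best) print host[i]
    }
  '
}

# Usage:
#   printf "%s\n" "localhost:27017 primary" "localhost:27018 secondary 0.1" \
#     "localhost:27019 secondary" | elect_candidates
```

In the usage example, lowering one secondary to 0.1 leaves the primary and the other secondary tied at 1.0, which illustrates how the default preference for secondaries no longer applies once any priority is set.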

Checking an in-progress backup

Important

As of version 1.4.0, the information about running backups is not available in the pbm list output. Use the pbm status command instead to check for running backups. See Percona Backup for MongoDB status for more information.

For Percona Backup for MongoDB version 1.3.4 and earlier, run the pbm list command and you will see the running backup listed with an ‘In progress’ label. When that is absent, the backup is complete.

As of version 1.7.0, the pbm list output includes the type of backup.

$ pbm list

  Backup snapshots:
    2021-12-13T13:05:14Z <physical> [complete: 2021-12-13T13:05:17Z]

Restoring a backup

Warning

Backups made with Percona Backup for MongoDB prior to v1.5.0 cannot be restored with Percona Backup for MongoDB v1.5.0 and later. This is because processing of the system collections Users and Roles has changed: in v1.5.0, Users and Roles are copied to a temporary collection during backup and must be present in the backup during restore. In earlier versions of Percona Backup for MongoDB, Users and Roles are copied to a temporary collection during restore. Therefore, restoring from these backups with Percona Backup for MongoDB v1.5.0 isn’t possible.

The recommended approach is to make a fresh backup after upgrading Percona Backup for MongoDB to version 1.5.0.

To restore a backup that you have made using pbm backup, use the pbm restore command, supplying the timestamp of the backup that you intend to restore. Percona Backup for MongoDB identifies the type of the backup (physical or logical) and restores the database up to the backup completion time (available in pbm list output as of version 1.4.0).

Important

Consider these important notes on restore operation:

  1. Percona Backup for MongoDB is designed to be a full-database restore tool. In versions 1.x, it performs a full restore of all databases and all collections and does not offer an option to restore only a subset of collections in the backup, as MongoDB’s mongodump tool does. But to avoid surprising mongodump users, in versions 1.x Percona Backup for MongoDB replicates mongodump’s behavior and drops only the collections that are in the backup. It does not drop collections that were created after the time of the backup and before the restore. If you want to guarantee that the post-restore database includes only the collections that are in the backup, run db.dropDatabase() manually in all non-system databases (all databases except “local”, “config” and “admin”) before running pbm restore.

  2. Whilst the restore is running, prevent clients from accessing the database. The data will naturally be incomplete whilst the restore is in progress, and any writes that clients make cause the final restored data to differ from the backed-up data.

  3. If you enabled Point-in-Time Recovery, disable it before running pbm restore. This is because Point-in-Time Recovery incremental backups and restore are incompatible operations and cannot be run together.

$ pbm restore 2019-06-09T07:03:50Z

Adjust memory consumption

New in version 1.3.2: The Percona Backup for MongoDB config includes restore options to adjust the memory consumption of the pbm-agent in environments with tight memory bounds. This helps prevent out-of-memory errors during the restore operation.

restore:
  batchSize: 500
  numInsertionWorkers: 10

The default values were adjusted to fit setups with a memory allocation of 1 GB or less for the agent.

Note

The lower the values, the less memory is allocated for the restore. However, the performance decreases too.

Restoring a backup in sharded clusters

Important

As preconditions for restoring a backup in a sharded cluster, complete the following steps:

  1. Stop the balancer.

  2. Shut down all mongos nodes to stop clients from accessing the database while restore is in progress. This ensures that the final restored data doesn’t differ from the backed-up data.

  3. Disable point-in-time recovery if it is enabled. To learn more about point-in-time recovery, see Point-in-Time Recovery.

Note that you can restore a sharded backup only into a sharded environment. It can be your existing cluster or a new one. To learn how to restore a backup into a new environment, see Restoring a backup into a new environment.

During the restore, the pbm-agent processes write data to primary nodes in the cluster. The following diagram shows the restore flow.

_images/pbm-restore-shard.png

After a cluster’s restore is complete, restart all mongos nodes to reload the sharding metadata.

Physical restore known limitations

Tracking restore progress via pbm status is currently not available during physical restores. To check the restore status, the options are:

  • Check the stderr logs of the leader pbm-agent. The leader ID is printed once the restore has started.

  • Check the status in the metadata file created on the remote storage for the restore. This file is in the root of the storage path and has the format .pbm.restore/<restore_timestamp>.json

After the restore is complete, do the following:

  • Restart all mongod nodes

  • Restart all pbm-agents

  • Run the following command to resync the backup list with the storage:

    $ pbm config --force-resync
    

Restoring a backup into a new environment

To restore a backup from one environment to another, consider the following key points about the destination environment:

  • Replica set names (both the config servers and the shards) in your new destination cluster and in the cluster that was backed up must be exactly the same.

  • Percona Backup for MongoDB configuration in the new environment must point to the same remote storage that is defined for the original environment, including the authentication credentials if it is an object store. Once you run pbm list and see the backups made from the original environment, then you can run the pbm restore command.

    Of course, make sure not to run pbm backup from the new environment whilst the Percona Backup for MongoDB config is pointing to the remote storage location of the original environment.

Restoring into a cluster / replica set with a different name

Starting with version 1.8.0, you can restore logical backups into a new environment that has the same number of shards or more, even if these shards have different replica set names.

To restore data to the environment with different replica set names, configure the name mapping between the source and target environments. You can either set the PBM_REPLSET_REMAPPING environment variable for pbm CLI or use the --replset-remapping flag for PBM commands. The mapping format is <rsTarget>=<rsSource>.

Important

Configure replica set name mapping for all shards in your cluster. Otherwise, Percona Backup for MongoDB attempts to restore the unspecified shard to the target shard with the same name. If there is no shard with such a name or it is already mapped to another source shard, the restore fails.

Configure the replica set name mapping:

  • Using the environment variable for pbm CLI in your shell:

    $ export PBM_REPLSET_REMAPPING="rsX=rsA,rsY=rsB"
    
  • Using the command line:

    $ pbm restore <timestamp> --replset-remapping="rsX=rsA,rsY=rsB"
    

The --replset-remapping flag is available for the following commands: pbm restore, pbm list, pbm status, pbm oplog-replay.
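Since a malformed mapping only surfaces when the restore fails, it can help to assemble the string programmatically. The following is a minimal sketch, assuming the <rsTarget>=<rsSource> format described above. The function name build_remapping is hypothetical; it joins the pairs with commas and rejects a malformed pair or a target that is mapped twice.

```shell
#!/bin/sh
# Join "<rsTarget>=<rsSource>" pairs into one comma-separated mapping
# string. Fails on a malformed pair or a target name used more than once.
build_remapping() {
  printf '%s\n' "$@" | awk -F= '
    NF != 2 || $1 == "" || $2 == "" {
      print "bad pair: " $0 > "/dev/stderr"; bad = 1; exit
    }
    ($1 in targets) {
      print "target mapped twice: " $1 > "/dev/stderr"; bad = 1; exit
    }
    {
      targets[$1]
      out = (out == "" ? "" : out ",") $0
    }
    END { if (bad) exit 1; print out }
  '
}

# Usage:
#   export PBM_REPLSET_REMAPPING="$(build_remapping rsX=rsA rsY=rsB)"
#   pbm restore <timestamp> --replset-remapping="$(build_remapping rsX=rsA rsY=rsB)"
```

The helper checks syntax only. It cannot tell whether every shard in the backup has been mapped, so that still needs to be verified against the source cluster.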

Note

Don’t forget to make a fresh backup on the new environment after the restore is complete.

This ability to restore data to clusters with different replica set names and a different number of shards extends the set of environments compatible for the restore.

Canceling a backup

You can cancel a running backup if, for example, you want to perform other maintenance on a server and don’t want to wait for a large backup to finish first.

To cancel the backup, use the pbm cancel-backup command.

$ pbm cancel-backup
Backup cancellation has started

After the command execution, the backup is marked as canceled in the pbm status output:

$ pbm status
...
2020-04-30T18:05:26Z  Canceled at 2020-04-30T18:05:37Z

Deleting backups

Use the pbm delete-backup command to delete a specified backup or all backups older than the specified time.

The command deletes the backup regardless of the remote storage used: either S3-compatible or a filesystem-type remote storage.

Note

You can only delete a backup that is not running (has the “done” or the “error” state).

As of version 1.4.0, pbm list shows only successfully completed backups. To check for backups with other states, run pbm status.

To delete a backup, specify the <backup_name> as an argument.

$ pbm delete-backup 2020-12-20T13:45:59Z

By default, the pbm delete-backup command asks for your confirmation to proceed with the deletion. To bypass it, add the -f or --force flag.

$ pbm delete-backup --force 2020-04-20T13:45:59Z

To delete backups that were created before the specified time, pass the --older-than flag to the pbm delete-backup command. Specify the timestamp as an argument in one of the following formats:

  • %Y-%m-%dT%H:%M:%S (for example, 2020-04-20T13:13:20Z) or

  • %Y-%m-%d (for example, 2020-04-20).

$ #View backups
$ pbm list
Backup snapshots:
  2020-04-20T20:55:42Z
  2020-04-20T23:47:34Z
  2020-04-20T23:53:20Z
  2020-04-21T02:16:33Z
$ #Delete backups created before the specified timestamp
$ pbm delete-backup -f --older-than 2020-04-21
Backup snapshots:
  2020-04-21T02:16:33Z
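Because the deletion is permanent, you may want to preview which snapshots fall before a cutoff. The following is a minimal sketch of such a dry run, assuming the cutoff is compared against the snapshot name (the backup start time) and that snapshot names are the first field of each listing line. The function name older_than is hypothetical and the sketch does not call pbm delete-backup.

```shell
#!/bin/sh
# From `pbm list` output on stdin, print the snapshots whose name sorts
# before the cutoff given as $1 (YYYY-MM-DD or YYYY-MM-DDTHH:MM:SSZ).
older_than() {
  awk -v cutoff="$1" '
    # Only consider lines that start with a timestamp-like snapshot name
    $1 ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T/ {
      # Appending "" forces a string comparison; ISO 8601 UTC
      # timestamps sort chronologically as strings
      if (($1 "") < (cutoff "")) print $1
    }
  '
}

# Usage (requires a configured pbm CLI):
#   pbm list | older_than 2020-04-21
```

Run against the listing above with the cutoff 2020-04-21, the sketch selects the three snapshots from 2020-04-20 and leaves 2020-04-21T02:16:33Z, which matches the result of the delete command shown.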

Viewing backup logs

As of version 1.4.0, you can see the logs from all pbm-agents in your MongoDB environment using pbm CLI. This reduces time for finding required information when troubleshooting issues.

Note

The log information about restores from physical backups is not available in pbm logs.

To view pbm-agent logs, run the pbm logs command and pass one or several flags to narrow down the search.

The following flags are available:

  • -t, --tail - Show the last N rows of the log

  • -e, --event - Filter logs by all backups or a specific backup

  • -n, --node - Filter logs by a specific node or a replica set

  • -s, --severity - Filter logs by severity level. The following values are supported (from low to high):

      • D - Debug

      • I - Info

      • W - Warning

      • E - Error

      • F - Fatal

  • -o, --output - Show log information as text (default) or in JSON format.

  • -i, --opid - Filter logs by the operation ID

Examples

The following are some examples of filtering logs:

Show logs for all backups

$ pbm logs --event=backup

Show the last 100 lines of the log about a specific backup 2020-10-15T17:42:54Z

$ pbm logs --tail=100 --event=backup/2020-10-15T17:42:54Z

Include only errors from the specific replica set

$ pbm logs -n rs1 -s E

The output includes log messages of the specified severity type and all higher levels. Thus, when ERROR is specified, both ERROR and FATAL messages are shown in the output.
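The threshold behavior can be expressed as a simple ranking of the severity letters listed above. The following is a minimal sketch of that rule only. The function names severity_rank and is_shown are hypothetical, and the sketch does not parse actual pbm logs output.

```shell
#!/bin/sh
# Map a severity letter to its rank, from low (D) to high (F).
severity_rank() {
  case $1 in
    D) echo 0 ;;
    I) echo 1 ;;
    W) echo 2 ;;
    E) echo 3 ;;
    F) echo 4 ;;
    *) return 1 ;;
  esac
}

# Succeed if a message of severity $2 would appear when filtering
# with the threshold $1, that is, if $2 ranks at or above $1.
is_shown() {
  threshold=$(severity_rank "$1") || return 2
  level=$(severity_rank "$2") || return 2
  [ "$level" -ge "$threshold" ]
}

# Usage:
#   is_shown E F && echo "Fatal messages pass an Error filter"
#   is_shown E W || echo "Warnings are hidden by an Error filter"
```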

Implementation details

pbm-agents write log information into the pbmLog collection in the PBM Control collections. Every pbm-agent also writes log information to stderr so that you can retrieve it when there is no healthy mongod node in your cluster or replica set. For how to view an individual pbm-agent log, see How to see the pbm-agent log.

Note that log information from the pbmLog collection is shown in the UTC time zone, while log information from stderr is shown in the server’s time zone.