Percona Server for MongoDB Sharding
About sharding
Sharding provides horizontal database scaling by distributing data across multiple MongoDB Pods. It is useful for large data sets when a single machine's processing speed or storage capacity is not enough. Sharding splits data across several machines and routes each request to the subset of data (a so-called shard) that it needs.
MongoDB sharding involves the following components:
- shard: a replica set which contains a subset of the data stored in the database (similar to a traditional MongoDB replica set),
- mongos: a query router, which acts as an entry point for client applications,
- config servers: a replica set which stores metadata and configuration settings for the sharded database cluster.
Note
Percona Operator for MongoDB 1.6.0 supported only one shard per MongoDB cluster; still, this limited sharding support allowed using mongos as an entry point instead of provisioning a load balancer per replica set node. Multiple shards are supported starting from Operator 1.7.0. Also, before Operator 1.12.0 mongos instances were deployed by a Deployment object; starting from 1.12.0 they are deployed by a StatefulSet.
Turning sharding on and off
Sharding is controlled by the sharding section of the deploy/cr.yaml configuration file and is turned on by default.
To enable sharding, set the sharding.enabled key to true. This will turn existing MongoDB replica set nodes into sharded ones.
To disable sharding, set the sharding.enabled key to false.
If backups are disabled (the backup.enabled Custom Resource option is set to false), the Operator turns sharded MongoDB instances into unsharded ones one by one, so the database cluster keeps operating without downtime. If backups are enabled (the backup.enabled Custom Resource option is true), the Operator pauses the cluster (to avoid Percona Backup for MongoDB misconfiguration), updates the instances, and then unpauses it.
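The toggle described above can also be expressed as a patch against a running cluster rather than an edit to deploy/cr.yaml. The following is a minimal sketch, not the Operator's prescribed procedure. It assumes that the sharding section sits under spec in the Custom Resource (so the full key path is spec.sharding.enabled), that the cluster is named my-cluster-name, and that psmdb is the resource short name. sharding_patch is a hypothetical helper that only prints the JSON merge patch.

```shell
#!/usr/bin/env bash

# sharding_patch: hypothetical helper (not part of the Operator or kubectl).
# Prints a JSON merge patch that sets spec.sharding.enabled to true or false.
sharding_patch() {
  case "$1" in
    true|false)
      printf '{"spec":{"sharding":{"enabled":%s}}}\n' "$1"
      ;;
    *)
      echo "usage: sharding_patch true|false" >&2
      return 1
      ;;
  esac
}

# Print the patch that would turn sharding off.
sharding_patch false

# Applying it requires access to the cluster, so it is left commented out.
# The cluster name and the psmdb short name are assumptions:
# kubectl patch psmdb my-cluster-name --type=merge -p "$(sharding_patch false)"
```

Keeping the patch generation separate from the kubectl call makes it possible to inspect the patch before applying it.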
Configuring instances of a sharded cluster
When sharding is turned on, the Operator runs replica sets with config servers and mongos instances. Their number is controlled by the configsvrReplSet.size and mongos.size keys, respectively.
Config servers have the cfg replica set name by default, which the Operator uses in StatefulSet and Service names. If this name needs to be customized (for example, when migrating a MongoDB cluster from a barebone installation to Kubernetes), you can override the default cfg variant using the replsets.configuration Custom Resource option in deploy/cr.yaml as follows:
    ...
    configuration: |
      replication:
        replSetName: customCfgRS
    ...
Note
For now, config servers work properly only with the WiredTiger engine, while sharded MongoDB nodes can use either the WiredTiger or the InMemory engine.
By default, the replsets section of the deploy/cr.yaml configuration file contains only one replica set, rs0. You can add more replica sets with different names to the replsets section in a similar way. Note that having more than one replica set is possible only with sharding turned on.
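As an illustration of adding a second replica set, the sketch below prints a possible replsets fragment with two entries. The rs1 name and both size values are assumptions made for the example, and the remaining per-replica-set options are omitted; in a real deploy/cr.yaml they would be copied from the existing rs0 entry. replsets_fragment is a hypothetical helper.

```shell
#!/usr/bin/env bash

# replsets_fragment: hypothetical helper that prints an illustrative
# replsets section with two replica sets (two shards).
# rs1 and the size values are assumptions; other options are omitted.
replsets_fragment() {
  cat <<'EOF'
replsets:
- name: rs0
  size: 3
- name: rs1
  size: 3
EOF
}

# Show the fragment.
replsets_fragment

# Count the replica sets defined in the fragment.
replsets_fragment | grep -c -- '- name:'
```

Each entry needs a distinct name, since the text above requires additional replica sets to have different names.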
Note
The Operator will be able to remove a shard only when it contains no application (non-system) collections.
Checking connectivity to sharded and non-sharded cluster
With sharding turned on, the mongos service is the entry point for accessing your database. If you do not use sharding, you have to access the mongod processes of your replica set.
To connect to Percona Server for MongoDB you need to construct the MongoDB connection URI string. It includes the credentials of the admin user, which are stored in the Secrets object.
- List the Secrets objects:
      $ kubectl get secrets -n <namespace>
  The Secrets object you are interested in has the my-cluster-name-secrets name by default.
- View the Secret contents to retrieve the admin user credentials:
      $ kubectl get secret my-cluster-name-secrets -o yaml
  The command returns the YAML file with generated Secrets, including the MONGODB_DATABASE_ADMIN_USER and MONGODB_DATABASE_ADMIN_PASSWORD strings, which should look as follows:
  Sample output
      ...
      data:
        ...
        MONGODB_DATABASE_ADMIN_PASSWORD: aDAzQ0pCY3NSWEZ2ZUIzS1I=
        MONGODB_DATABASE_ADMIN_USER: ZGF0YWJhc2VBZG1pbg==
  The actual login name and password in the output are base64-encoded. To bring them back to a human-readable form, run the following commands, substituting the encoded values from the output for the MONGODB_DATABASE_ADMIN_USER and MONGODB_DATABASE_ADMIN_PASSWORD placeholders:
      $ echo 'MONGODB_DATABASE_ADMIN_USER' | base64 --decode
      $ echo 'MONGODB_DATABASE_ADMIN_PASSWORD' | base64 --decode
- Run a container with a MongoDB client and connect its console output to your terminal. The following command does this, naming the new Pod percona-client:
      $ kubectl run -i --rm --tty percona-client --image=percona/percona-server-mongodb:7.0.18-11 --restart=Never -- bash -il
  Executing it may require some time to deploy the corresponding Pod.
- Now run the mongosh tool inside the percona-client command shell, using the admin user credentials you obtained from the Secret and the proper namespace name instead of the <namespace name> placeholder. The command looks different depending on whether sharding is on (the default behavior) or off.
  If sharding is on:
      $ mongosh "mongodb://databaseAdmin:databaseAdminPassword@my-cluster-name-mongos.<namespace name>.svc.cluster.local/admin?ssl=false"
  If sharding is off:
      $ mongosh "mongodb+srv://databaseAdmin:databaseAdminPassword@my-cluster-name-rs0.<namespace name>.svc.cluster.local/admin?replicaSet=rs0&ssl=false"
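The decoding and URI construction steps above can be sketched as a short script. It decodes the sample base64 values shown in the Secret output and assembles both forms of the connection string. build_uri is a hypothetical helper, my-namespace is a placeholder namespace, and the sketch assumes the credentials contain no characters that would need percent-encoding in a URI.

```shell
#!/usr/bin/env bash

# Sample base64-encoded values copied from the Secret output above.
encoded_user='ZGF0YWJhc2VBZG1pbg=='
encoded_password='aDAzQ0pCY3NSWEZ2ZUIzS1I='

# Decode them into plain-text credentials.
admin_user=$(printf '%s' "$encoded_user" | base64 --decode)
admin_password=$(printf '%s' "$encoded_password" | base64 --decode)

# build_uri: hypothetical helper (not part of mongosh or the Operator).
# Arguments: user, password, namespace, sharding state ("on" or "off").
# Assumes the default cluster name my-cluster-name and replica set rs0,
# and that user and password need no percent-encoding.
build_uri() {
  local user=$1 password=$2 namespace=$3 sharding=$4
  if [ "$sharding" = "on" ]; then
    # Sharded cluster: connect through the mongos Service.
    printf 'mongodb://%s:%s@my-cluster-name-mongos.%s.svc.cluster.local/admin?ssl=false\n' \
      "$user" "$password" "$namespace"
  else
    # Non-sharded cluster: connect to the rs0 replica set.
    printf 'mongodb+srv://%s:%s@my-cluster-name-rs0.%s.svc.cluster.local/admin?replicaSet=rs0&ssl=false\n' \
      "$user" "$password" "$namespace"
  fi
}

echo "$admin_user"   # prints databaseAdmin
build_uri "$admin_user" "$admin_password" my-namespace on
build_uri "$admin_user" "$admin_password" my-namespace off
```

The resulting string can be passed in double quotes as the argument to mongosh inside the percona-client Pod.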