
Backing Up MongoDB (with Replica Sets)

MongoDB is one of the most popular NoSQL database engines: an open-source, document-oriented database known for being scalable, powerful, reliable and easy to use. Documents are created and stored in BSON, a binary form of JSON (JavaScript Object Notation). This NoSQL solution comes with embedded documents, auto-sharding and built-in replication for better scalability and high availability.

Backing Up a MongoDB Database

There are two ways to back up a MongoDB database: online backup and offline backup.

Online backup:
This backup is taken with the mongodump command, without locking the database. It can be considered when running a standalone mongo (without any replica sets).

mongodump command: ./mongodump --db newdb --out /opt/backup/mongo_backups/`date +"%F"`

If the --db flag is not used, all the databases are backed up (./mongodump on its own backs up all the databases at once).

--out specifies the directory where the dump is stored. If this flag is not specified, the dump is written to a dump directory under the current working directory (the bin directory when mongodump is run from there, as above).
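The dump command can be wrapped in a small function so that each run lands in its own dated directory. The following is a minimal sketch, assuming the paths used in this post; the names backup_target, dump_db, MONGODUMP and BACKUP_ROOT are hypothetical and exist only for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: paths follow the post; function and variable names are illustrative.
MONGODUMP="${MONGODUMP:-/opt/mongodb/bin/mongodump}"      # assumed location of mongodump
BACKUP_ROOT="${BACKUP_ROOT:-/opt/backup/mongo_backups}"   # root of the dated dump folders

# Print the dated directory that a dump for the given day (default: today) goes to.
backup_target() {
  local day="${1:-$(date +%F)}"
  printf '%s/%s\n' "$BACKUP_ROOT" "$day"
}

# Dump one database, or every database when no name is given.
dump_db() {
  local db="${1:-}" target
  target="$(backup_target)"
  mkdir -p "$target" || return 1
  if [ -n "$db" ]; then
    "$MONGODUMP" --db "$db" --out "$target"
  else
    "$MONGODUMP" --out "$target"
  fi
}
```

With these definitions, dump_db newdb backs up a single database and dump_db with no argument backs up all of them.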

To restore the database from the latest backup, use the mongorestore command.

mongorestore command: ./mongorestore --db newdb --drop /opt/backup/mongo_backups/2017-09-06/newdb/

--db specifies the name of the database to restore

--drop drops each collection from the target database before restoring it, so the restore starts clean
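Since the dump folders are named by date, picking the latest backup can be scripted. The sketch below assumes the layout used above (a root folder containing YYYY-MM-DD folders, each with one subfolder per database); latest_backup, restore_latest and MONGORESTORE are hypothetical names, not part of MongoDB.

```shell
#!/usr/bin/env bash
# Sketch only: names are illustrative; layout is <root>/<YYYY-MM-DD>/<db>/ as in the post.
MONGORESTORE="${MONGORESTORE:-/opt/mongodb/bin/mongorestore}"   # assumed location

# Print the newest dated directory under the backup root.
# ISO dates (YYYY-MM-DD) sort chronologically when sorted as plain text.
latest_backup() {
  local root="$1" dir
  dir="$(find "$root" -mindepth 1 -maxdepth 1 -type d -name '????-??-??' | sort | tail -n 1)"
  [ -n "$dir" ] || return 1
  printf '%s\n' "$dir"
}

# Restore one database from the newest dump, dropping existing collections first.
restore_latest() {
  local root="$1" db="$2" dir
  dir="$(latest_backup "$root")" || { echo "no backups under $root" >&2; return 1; }
  "$MONGORESTORE" --db "$db" --drop "$dir/$db/"
}
```

For example, restore_latest /opt/backup/mongo_backups newdb would restore newdb from the most recent dated folder.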

Offline backup:

Online backups can go wrong because they are taken on a running database (imagine write traffic reaching the DB while the dump is in progress). To make sure the backup is consistent, an offline backup can be used instead, where the database is locked against write traffic while the backup is being taken.

This is the suggested solution when your primary mongo is part of a replica set. In that case the lock is applied to one of the secondaries, so write traffic is not hampered (all write traffic always reaches the primary).

While the backup runs, only the other secondary stays in sync with the primary, and the locked secondary is used for taking the backup.

After locking the secondary, either the database folder on the server is rsynced to the backup directory, or a VM image (an AMI if mongo is running in AWS) is taken of the server.

Once the copy of the database is done, the DB is unlocked. The unlocked secondary then catches up with the primary and everything functions as usual.
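The lock, copy and unlock sequence described above can be sketched as a single function that unlocks even when the copy fails. This is an illustration under assumptions, not the script used later in this post: it assumes the legacy mongo shell (with its --quiet and --eval options) and rsync are available, and the names mongo_eval, offline_copy, MONGO and DATA_DIR are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: MONGO, DATA_DIR and the function names are assumptions for illustration.
MONGO="${MONGO:-/opt/mongodb/bin/mongo}"   # legacy mongo shell, path as in the post
DATA_DIR="${DATA_DIR:-/opt/mongodb}"       # folder that gets copied, as in the post

# Run one JavaScript expression against the local mongod.
mongo_eval() {
  "$MONGO" --quiet --eval "$1"
}

# Lock the local (secondary) mongod, copy the data folder, and always unlock.
offline_copy() {
  local dest="$1" status=0
  mongo_eval 'db.fsyncLock()' >/dev/null || return 1
  rsync -a "$DATA_DIR" "$dest" || status=$?
  # Unlock even if the copy failed, so the member can catch up with the primary.
  mongo_eval 'db.fsyncUnlock()' >/dev/null || return 2
  return "$status"
}
```

Capturing the rsync status before unlocking keeps the two concerns apart: the function reports whether the copy worked, but a failed copy never leaves the secondary locked.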

In the scenario below, the mongo instances are installed on AWS servers and two scripts are used: a main script, which can be placed on a control server, and a backup script, which is placed on the mongo servers.

The scripts are split into main and backup scripts because mongo has no fixed primary and secondaries. When the primary goes down, one of the secondaries takes over as the primary, and when the old primary is brought back up it rejoins the cluster as a secondary. Splitting the scripts this way avoids having to change them whenever the roles change.

Main script:

This script scans the IPs (a, b and c in this case) for a secondary mongo. As soon as a secondary is found, the backup script is executed on that server and the main script exits.

For example, if a is the primary among a, b and c, the loop goes on to check the state of b. If b is a secondary, the backup script on b is triggered and the loop exits.

Script location: place it on any control server other than the mongo servers.

for ip in a b c; do
  echo "$ip"
  # check whether the mongo on this server is a secondary
  secondary=$(ssh -i ~/test.pem -o StrictHostKeyChecking=no ec2-user@"$ip" "echo 'db.isMaster().secondary' | /opt/mongodb/bin/mongo")
  # field 8 of the mongo shell output holds the result of the command
  if [[ $(echo $secondary | cut -d" " -f8) == "true" ]]; then
    # first secondary found: run the backup script on that server and stop
    ssh -i ~/test.pem -o StrictHostKeyChecking=no ec2-user@"$ip" "sudo /bin/bash /opt/backup.sh"
    break
  fi
done
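The check above picks the result out of the mongo shell output with cut -d" " -f8, which depends on the exact text the shell prints when it starts. A possible alternative, sketched here under the assumption that the legacy mongo shell's --quiet and --eval options are available on the servers, is to have the shell print only the value. The names is_secondary, first_secondary, SSH_KEY and MONGO are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: host list, key path and function names are placeholders.
SSH_KEY="${SSH_KEY:-$HOME/test.pem}"
MONGO="${MONGO:-/opt/mongodb/bin/mongo}"

# Succeed when the mongod on the given host reports itself as a secondary.
is_secondary() {
  local host="$1" answer
  answer="$(ssh -i "$SSH_KEY" -o StrictHostKeyChecking=no "ec2-user@$host" \
    "$MONGO --quiet --eval 'print(db.isMaster().secondary)'")" || return 1
  [ "$answer" = "true" ]
}

# Print the first host in the argument list that is a secondary.
first_secondary() {
  local host
  for host in "$@"; do
    if is_secondary "$host"; then
      printf '%s\n' "$host"
      return 0
    fi
  done
  return 1
}
```

A call such as first_secondary a b c prints the first secondary it finds, and that host can then be passed to the ssh command that runs /opt/backup.sh.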


Backup script:

This script is located on all the mongo servers, primary and secondaries alike, and it exits if the current server is the primary (a second check that the script is never executed on the primary). It locks the secondary mongo and syncs the mongodb folder to the backup directory. Once the sync has succeeded the database is unlocked, and the backup directory is then tarballed and pushed to S3.

mongo bin path: /opt/mongodb/bin

backup directory path: /opt/backup/

DateStamp=$(date +%d%m%y)
# BackupDirectory was not defined in the original script; assumed to be the dated folder
BackupDirectory=/opt/backup/$DateStamp
mkdir -p "$BackupDirectory"
secondary=$(echo 'db.isMaster().secondary' | /opt/mongodb/bin/mongo)
# field 8 of the mongo shell output holds the result of the command
if [[ $(echo $secondary | cut -d" " -f8) == "true" ]]; then
  # lock the secondary (each piped command is its own shell session, so a separate 'use admin' has no effect)
  echo 'db.fsyncLock()' | /opt/mongodb/bin/mongo
  lock=$(echo 'db.currentOp().fsyncLock' | /opt/mongodb/bin/mongo)
  echo $lock
  if [[ $(echo $lock | cut -d" " -f8) == "true" ]]; then
    rsync -avz /opt/mongodb "$BackupDirectory"
    if [ $? -eq 0 ]; then
      echo "back up successful"
    else
      echo "back up not successful"
    fi
    # unlock whether or not the copy succeeded, so the secondary is never left locked
    echo 'db.fsyncUnlock()' | /opt/mongodb/bin/mongo
    unlock=$(echo 'db.currentOp().fsyncLock' | /opt/mongodb/bin/mongo)
    # once unlocked, nothing is printed for fsyncLock and field 8 is the shell's closing "bye"
    if [[ $(echo $unlock | cut -d" " -f8) == "bye" ]]; then
      echo "db is successfully unlocked"
    else
      echo "$(hostname) db is not unlocked" | mail -s "database not unlocked properly after backup" abc.xyz@techaspect.com,xyz.abc@techaspect.com
    fi
  else
    echo "db is not locked"
  fi
  # tar the dated folder, push it to S3 and clean up locally
  cd /opt/backup
  tar -czvf mongobkp_$DateStamp.tar.gz $DateStamp
  rm -rf /opt/backup/$DateStamp
  aws s3 sync /opt/backup s3://mongo-backup-uat/ ; rm -f /opt/backup/*.tar.gz
else
  echo "database is not a secondary to take a backup"
fi


To restore, copy the tarball for the required date from S3 to the local server, extract it and then start mongo.
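The download step can be sketched as below, assuming the bucket and the mongobkp_<DateStamp>.tar.gz naming from the backup script and an installed AWS CLI. The names tarball_name, fetch_backup and BUCKET are hypothetical, and moving the extracted files into place before starting mongo is left to the operator.

```shell
#!/usr/bin/env bash
# Sketch only: bucket and naming follow the backup script; function names are illustrative.
BUCKET="${BUCKET:-s3://mongo-backup-uat}"

# Name of the tarball the backup script produces for a given ddmmyy date stamp.
tarball_name() {
  printf 'mongobkp_%s.tar.gz\n' "$1"
}

# Download the tarball for one date stamp and unpack it into a work directory.
fetch_backup() {
  local stamp="$1" workdir="$2" tarball
  tarball="$(tarball_name "$stamp")"
  mkdir -p "$workdir" || return 1
  aws s3 cp "$BUCKET/$tarball" "$workdir/$tarball" || return 1
  tar -xzf "$workdir/$tarball" -C "$workdir"
}
```

For example, fetch_backup 060917 /opt/restore would download and unpack the backup taken on 6 September 2017 into /opt/restore (a placeholder path).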

Note: There may be other ways of doing the same, but this is the approach we followed.
