Tag Archives: Mongodb

Remove duplicate documents from collection in mongodb 3.X

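The "dropDups" option for creating a unique index has been deprecated since MongoDB 2.6 and was removed in MongoDB 3.0.

Use the command below to remove duplicate documents from a collection. For each value of the chosen attribute, it keeps the document with the lowest _id and removes the rest.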

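db.<collection name>.find({}, {"<attribute/column name>": 1}).sort({_id: 1}).forEach(
    function(doc) {
        // Keep this document; remove later ones (greater _id) that share the same value.
        db.<collection name>.remove({
            _id: {$gt: doc._id},
            "<attribute/column name>": doc["<attribute/column name>"]
        });
    }
)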
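
The loop above issues one remove per document in the collection. As an alternative sketch (not the method used above), the duplicates can be grouped first with the aggregation framework, so that a remove runs only for values that actually occur more than once. The helper name removeDuplicates and its parameters are assumptions made for illustration; coll is expected to behave like a mongo shell collection handle such as db.emp.

```javascript
// Hypothetical helper: remove duplicates of `field`, keeping the lowest _id per value.
// `coll` is assumed to expose aggregate() and remove() like a mongo shell collection.
function removeDuplicates(coll, field) {
    var pipeline = [
        // One group per distinct value: remember the smallest _id and every _id seen.
        { $group: { _id: "$" + field, keep: { $min: "$_id" }, ids: { $push: "$_id" }, count: { $sum: 1 } } },
        // Keep only the values that occur more than once.
        { $match: { count: { $gt: 1 } } }
    ];
    var groupsHandled = 0;
    coll.aggregate(pipeline).forEach(function (group) {
        // Delete every document of the group except the one being kept.
        coll.remove({ _id: { $in: group.ids, $ne: group.keep } });
        groupsHandled += 1;
    });
    return groupsHandled; // number of duplicated values that were cleaned up
}
```

In the shell this would be called as removeDuplicates(db.emp, "name"), where emp and name are placeholder names. Because $push collects every _id of a group into one document, a value with a very large number of duplicates could exceed the document size limit; the original loop does not have that restriction.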

How to take backups in mongodb

There are three common ways to take a backup in MongoDB.

  • Take the backup using the filesystem snapshot :

This is the easiest way to take a backup. A snapshot backup has two requirements:
a. The filesystem must support snapshots.
b. The database must have journaling enabled.
Restoring the backup:
Stop the mongod service if it is running.
Restore the snapshot.
Start the mongod service. It uses the journal files to recover the database to a consistent state.

  • Copying data files :


Another way to create a backup is to copy everything in the data directory. The files are copied one at a time, so one file may be modified while another is being copied, which leaves the copy inconsistent. To prevent this, lock the data files against writes before copying, using the command below.
db.fsyncLock()
O/P:
{
    "info" : "now locked against writes, use db.fsyncUnlock() to unlock",
    "ok" : NumberInt(1)
}
This command flushes the buffered data to the data files before locking them. While the lock is held, all write operations on the mongod instance are blocked and queued. Once the command returns, start copying the files.
cp -R /data/db/ /backup/
Once the copy has completed, run the command below to unlock the files.
db.fsyncUnlock()
O/P:
{
    "info" : "unlock completed",
    "ok" : NumberInt(1)
}
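
If the copy step fails halfway, the instance stays locked until someone runs db.fsyncUnlock(). A minimal sketch of guarding against that is shown below. The names withFsyncLock and copyStep are hypothetical, and how the copy itself is launched (for example the cp command above, run from another terminal or a script) is left to the caller.

```javascript
// Hypothetical helper: lock, run the copy step, and always unlock afterwards.
// `db` is assumed to expose fsyncLock() and fsyncUnlock() like the mongo shell db object.
function withFsyncLock(db, copyStep) {
    var res = db.fsyncLock();
    if (!res || !res.ok) {
        // Do not start copying if the lock was not acquired.
        throw new Error("fsyncLock failed: " + JSON.stringify(res));
    }
    try {
        return copyStep(); // copy the data files while writes are blocked
    } finally {
        db.fsyncUnlock();  // runs even when copyStep throws
    }
}
```

The try/finally is the point of the sketch: writes are released whether the copy succeeds or not.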

  • Mongodump :

mongodump is a tool that takes a backup in binary (BSON) format, similar to the dump utilities of other databases (expdp in Oracle, pg_dump in PostgreSQL). It is slower than the other methods for both backup and restore, but it can also take backups at the collection level.
mongodump captures only the documents in the database and does not include index data. mongorestore or mongod must then rebuild the indexes after restoring the data.
mongodump creates a directory named dump in the current directory, which contains a dump of all of your data.
To back up all the collections (when no database is specified, every database on the server is dumped):
C:\BACKUP>mongodump -h localhost:27017
2016-04-26T19:25:39.359+0530    writing test.employee to
2016-04-26T19:25:39.360+0530    writing test.emp to
2016-04-26T19:25:39.362+0530    done dumping test.emp (1 document)
2016-04-26T19:25:39.362+0530    done dumping test.employee (2 documents)
This command backed up all the collections (employee and emp) in the test database.
Some useful options of the command:
--db, -d
Back up this database (all collections).
--collection, -c
Back up only this collection.
C:\BACKUP>mongodump -h localhost:27017 -d test -c emp
2016-04-26T19:41:53.114+0530    writing test.emp to
2016-04-26T19:41:53.116+0530    done dumping test.emp (1 document)
--query, -q
Include only the documents that match this query in the backup.
--queryFile
File containing the query.
--forceTableScan
Forces mongodump to scan the data store directly. By default, mongodump saves entries as they appear in the index of the _id field.
--gzip
Compress the backup output with gzip.
--out, -o
Directory where the backup files will be created.
--archive
Write the data to a single archive file.
--repair
Write only valid data. Objects that are in an invalid state are not written.
--oplog
Creates a file named oplog.bson as part of the mongodump output. The oplog.bson file, located in the top level of the output directory, contains the oplog entries that occur during the mongodump operation. This file provides an effective point-in-time snapshot of the state of a mongod instance. To restore to a specific point-in-time backup, use the output created with this option in conjunction with mongorestore --oplogReplay.
Without --oplog, if there are write operations during the dump operation, the dump will not reflect a single moment in time. Changes made to the database during the dump can affect the output of the backup.
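--dumpDbUsersAndRoles
Include user and role definitions when dumping a specific database with --db.
--excludeCollection
Exclude the specified collection from the dump.
--excludeCollectionsWithPrefix
Exclude all collections whose names start with the specified prefix.
--numParallelCollections
Number of collections to dump in parallel (default: 4).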
The mongorestore command restores data from a backup created by mongodump.
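
Some of the mongodump options only work in certain combinations: --oplog is only supported when dumping the whole instance, --collection needs --db, and --query needs --collection. The sketch below builds a mongodump argument list and checks those combinations before anything is run. The function buildMongodumpArgs and the shape of its opts object are hypothetical; only the command-line flags themselves come from mongodump.

```javascript
// Hypothetical helper: turn an options object into mongodump command-line arguments.
function buildMongodumpArgs(opts) {
    if (opts.oplog && (opts.db || opts.collection)) {
        throw new Error("--oplog is only supported when dumping the whole instance");
    }
    if (opts.collection && !opts.db) {
        throw new Error("--collection requires --db");
    }
    if (opts.query && !opts.collection) {
        throw new Error("--query requires --collection");
    }
    var args = ["--host", opts.host || "localhost:27017"];
    if (opts.db) args.push("--db", opts.db);
    if (opts.collection) args.push("--collection", opts.collection);
    if (opts.query) args.push("--query", JSON.stringify(opts.query)); // query passed as JSON text
    if (opts.gzip) args.push("--gzip");
    if (opts.oplog) args.push("--oplog");
    if (opts.out) args.push("--out", opts.out);
    return args;
}

// Possible use from Node.js (assumes mongodump is on the PATH):
// require("child_process").spawnSync("mongodump",
//     buildMongodumpArgs({ db: "test", collection: "emp", gzip: true }),
//     { stdio: "inherit" });
```

Passing an argument array instead of a single command string avoids shell quoting problems with the --query value.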

Starting and stopping mongodb server

The "mongod" command starts the database server, and its options configure it. The basic configuration options are listed below.

Common configuration options :-

--port : port number (default 27017).
--bind_ip : comma-separated list of IP addresses on which mongod listens.
--logpath : log file to write to instead of stdout.
                 db.adminCommand({"setParameter" : 1, "logLevel" : 3}) -> sets the log level (0 to 5); higher levels give more detail about mongod operations for debugging.
                 db.setProfilingLevel(1, 500) -> logs the queries that take longer than 500 ms.
--logRotate : sets the log rotation behavior.
     rename : rename the existing log file with a timestamp and open a new log file.
     reopen : close and reopen the log file under the same name. This expects another process to rename the log file before it is reopened.
--config : use a configuration file for additional options not given on the command line.

Specifically for windows :-

———————————————–
--install : create the service
--remove : remove the service
--reinstall : reinstall the service
--serviceName : name of the service
--serviceDisplayName : display name of the service
--serviceUser : account under which the service runs
———————————————–

--storageEngine : storage engine to use (WiredTiger by default).
--dbpath : directory for data files. When a mongod instance starts, it creates a lock file in this directory to indicate that an instance is running on it.
--directoryperdb : store each database in a separate directory.
--noprealloc : disable data file preallocation; this hurts performance.
--quota : limit the size of each database by limiting its number of data files.
--nssize : size of the .ns files in MB.
--upgrade : upgrade the database.
--journal : enable journaling.
--nojournal : disable journaling.
Ex:-
Configuring mongodb service.

"C:\Program Files\MongoDB\Server\3.2\bin\mongod.exe" --dbpath C:\MYTEST\DB --port 12345 --bind_ip 192.9.1.101 --logpath C:\MONGOLOGS\back.log --logRotate rename --directoryperdb --install --serviceName "mymongodb"
This creates a service named "mymongodb".

net start mymongodb -> starts the service, which in turn starts the database instance. On the first start, this creates all the database files and log files in their respective folders.
Starting the database for the first time takes longer because it needs to create the database files.
Stopping mongodb server :-

db.shutdownServer() - This command stops the database server. Run it against the admin database (use admin).
To stop the server forcefully, use the force option:
db.adminCommand({"shutdown" : 1, "force" : true})
                OR
Use the Linux kill command to stop the server process. A plain kill sends SIGTERM, which shuts mongod down cleanly; avoid kill -9.
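
As a small sketch, the two shell variants above can be wrapped in one function that always targets the admin database, whichever database is currently selected. The name stopServer and its force parameter are hypothetical; db is assumed to behave like the mongo shell db object.

```javascript
// Hypothetical helper: send the shutdown command to the admin database.
function stopServer(db, force) {
    var cmd = { shutdown: 1 };
    if (force) {
        cmd.force = true; // ask the server to shut down forcefully
    }
    // shutdown must be issued against the admin database
    return db.getSiblingDB("admin").runCommand(cmd);
}
```

Because the server closes the connection while shutting down, the shell may report a connection error instead of returning a result; this sketch does not try to hide that.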