Course Objectives
Performing basic operations through the shell prompt
Performing aggregation functions in the shell
Creating and managing indexes
Importing and exporting data

Session Plan
Querying Mongo
Backup and Restore
Import and Export Data
Mongo tools

Querying Mongo

Querying Mongo : Selection


Example : db.trainees.find({stream: "Java", track: "fast track"})

This will return all documents where stream equals "Java"
and track equals "fast track".

Querying Mongo : Selection & Projection


db.trainees.find({stream: "Java", track: "fast track"}, {name: 1, emp_id: 1})
This will return the name, emp_id and document _id of all those
documents where stream equals "Java" and track equals "fast track".
db.trainees.find({}, {name: 1, emp_id: 1, _id: 0})
This will return the name and emp_id of all trainees, because no selection
criteria are given; _id: 0 suppresses the _id field. null can also be passed instead of {}.

Querying Mongo : Operators


db.collection_name.find({$or: [{k1: v1}, {k2: v2}]})

will select the documents if any one of the conditions is satisfied

Example: db.trainees.find({batch: "Jan12CS", $or: [{stream: "Java"},
{stream: "OS"}, {track: "intermediate"}]})
This will select the trainees from the Jan12CS batch who belong to either
the Java stream or the OS stream, or who are on the intermediate track.
db.collection_name.find({key1: {$gt: v1}})

will fetch all documents whose key1 value is greater than v1

Example: db.trainees.find({GPA: {$gt: 4, $lt: 4.9}})
This will return the details of all trainees whose GPA is greater than 4 and less than 4.9.
Similarly, $gte, $lt, $lte and $ne check for greater than or equal, less
than, less than or equal, and not equal, respectively.
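To make the operator rules concrete, here is a minimal sketch in plain JavaScript (runnable with Node.js, no MongoDB server involved) that filters an in-memory array the way the two example queries above filter the trainees collection. The helper names `matches`, `matchesField` and `comparators`, and the sample trainees, are hypothetical; this illustrates the semantics only and is not how MongoDB evaluates queries.

```javascript
// Minimal in-memory sketch of MongoDB-style filtering (not MongoDB's implementation).
// Supports plain equality, the comparison operators $gt, $gte, $lt, $lte, $ne,
// and a top-level $or.
const comparators = {
  $gt: (a, b) => a > b,
  $gte: (a, b) => a >= b,
  $lt: (a, b) => a < b,
  $lte: (a, b) => a <= b,
  $ne: (a, b) => a !== b,
};

function matchesField(value, condition) {
  // An object whose keys all start with "$" is treated as an operator document
  if (condition !== null && typeof condition === "object" &&
      Object.keys(condition).every((k) => k.startsWith("$"))) {
    // every operator must hold (implicit AND), e.g. {$gt: 4, $lt: 4.9}
    return Object.entries(condition).every(([op, operand]) => {
      if (value === undefined && op !== "$ne") return false; // missing field
      return comparators[op](value, operand);
    });
  }
  return value === condition; // plain equality
}

function matches(doc, query) {
  return Object.entries(query).every(([key, condition]) => {
    if (key === "$or") {
      return condition.some((sub) => matches(doc, sub)); // any one sub-query
    }
    return matchesField(doc[key], condition);
  });
}

const trainees = [
  { name: "Asha", batch: "Jan12CS", stream: "Java", track: "fast track", GPA: 4.5 },
  { name: "Ravi", batch: "Jan12CS", stream: ".NET", track: "intermediate", GPA: 3.8 },
  { name: "Meena", batch: "Feb12CS", stream: "OS", track: "fast track", GPA: 4.95 },
];

// db.trainees.find({GPA: {$gt: 4, $lt: 4.9}})
console.log(trainees.filter((t) => matches(t, { GPA: { $gt: 4, $lt: 4.9 } })).map((t) => t.name));
// [ 'Asha' ]

// db.trainees.find({batch: "Jan12CS", $or: [...]})
console.log(trainees.filter((t) => matches(t, {
  batch: "Jan12CS",
  $or: [{ stream: "Java" }, { stream: "OS" }, { track: "intermediate" }],
})).map((t) => t.name));
// [ 'Asha', 'Ravi' ]
```

Note how the batch condition and the $or are combined with AND: Meena is in the OS stream but is excluded because her batch differs.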

Querying Mongo : Operators

db.collection_name.find({key: {$in: [v1, v2]}})

will retrieve those documents whose key value is equal to either v1 or v2

Example: db.trainees.find({stream: {$in: ["Java", "OS"]}}) will retrieve
the details of the Java and OS stream trainees.

db.trainees.find({stream: {$nin: ["Java", "OS"]}}) will retrieve the details of
trainees who are in neither the Java nor the OS stream.
db.collection_name.find({key: {$all: [v1, v2]}})

will retrieve those documents whose key array contains all the values
passed in the argument
Example: db.trainees.find({module: {$all: ["JPA", "POJO"]}}) will retrieve
the details of those trainees whose module array contains both JPA and POJO
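The difference between $in, $nin and $all is easiest to see side by side. This is a hedged plain-JavaScript sketch over made-up trainee objects; `inOp`, `ninOp`, `allOp` and `asArray` are hypothetical helper names. Because MongoDB compares each element when a field holds an array, the sketch treats a scalar as a one-element array.

```javascript
// Sketch of $in, $nin and $all semantics (hypothetical helpers, not MongoDB code).
// A scalar field value is treated as a one-element array; a missing field as empty.
const asArray = (v) => (Array.isArray(v) ? v : v === undefined ? [] : [v]);

// $in: at least one of the field's values appears in the list
const inOp = (value, list) => asArray(value).some((v) => list.includes(v));
// $nin: none of the field's values appears in the list (a missing field also matches)
const ninOp = (value, list) => !inOp(value, list);
// $all: every value in the list appears in the field's array
const allOp = (value, list) => list.every((v) => asArray(value).includes(v));

const trainees = [
  { name: "Asha", stream: "Java", module: ["JPA", "POJO", "Servlets"] },
  { name: "Ravi", stream: "OS", module: ["POJO"] },
  { name: "Kiran", stream: ".NET", module: ["JPA", "POJO"] },
];

const names = (docs) => docs.map((t) => t.name);
console.log(names(trainees.filter((t) => inOp(t.stream, ["Java", "OS"]))));  // [ 'Asha', 'Ravi' ]
console.log(names(trainees.filter((t) => ninOp(t.stream, ["Java", "OS"])))); // [ 'Kiran' ]
console.log(names(trainees.filter((t) => allOp(t.module, ["JPA", "POJO"])))); // [ 'Asha', 'Kiran' ]
```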

Querying Mongo : Operators

db.collection_name.find({key: {$size: value}})

will retrieve the documents whose key array has exactly the number of elements given in value
Example: db.trainees.find({module: {$size: 4}}) will select the
trainees who have completed 4 modules
The $size operator cannot take a range as its value, i.e., $gt, $lt, $gte, $lte and
$ne cannot be used inside the $size operator's value; the value can only be an exact integer.
db.collection_name.find({}, {array_field: {$slice: n}})

n can be positive or negative

This will return only the first n values of the given array (if n is
positive) or the last |n| values of the array (if n is negative).
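A short sketch of the two array operators above, in plain JavaScript: `sizeMatches` mimics the $size query test and `slice` mimics the $slice projection. Both names and the sample module list are hypothetical; only the exact-length rule and the sign convention for n come from the slide.

```javascript
// Sketch of $size (query) and $slice (projection); hypothetical helpers, not MongoDB APIs.

// $size matches only an exact array length; there is no range form
const sizeMatches = (arr, n) => Array.isArray(arr) && arr.length === n;

// $slice: n >= 0 keeps the first n elements, n < 0 keeps the last |n| elements
function slice(arr, n) {
  return n >= 0 ? arr.slice(0, n) : arr.slice(n);
}

const modules = ["Java", "JPA", "POJO", "Spring"];
console.log(sizeMatches(modules, 4)); // true
console.log(slice(modules, 2));       // [ 'Java', 'JPA' ]
console.log(slice(modules, -2));      // [ 'POJO', 'Spring' ]
```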

Querying Mongo : Operators

db.collection_name.find({key: {$exists: true}})

is useful to retrieve only those documents that have an entry for a
particular key.
Example: db.trainees.find({certification: {$exists: true}}) will select
the details of those trainees who have done some certification.
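$exists tests whether the key is present, not whether its value is meaningful: a field explicitly set to null still exists. The sketch below shows this in plain JavaScript; `existsMatches` and the sample trainees are hypothetical.

```javascript
// Sketch of {key: {$exists: flag}}: checks whether the key is present in the document.
// existsMatches is a hypothetical helper, not a MongoDB API.
function existsMatches(doc, key, flag) {
  const present = Object.prototype.hasOwnProperty.call(doc, key);
  return present === flag;
}

const trainees = [
  { name: "Asha", certification: "OCJP" },
  { name: "Ravi" },
  { name: "Kiran", certification: null }, // present, even though the value is null
];

// Equivalent of db.trainees.find({certification: {$exists: true}})
console.log(trainees.filter((t) => existsMatches(t, "certification", true)).map((t) => t.name));
// [ 'Asha', 'Kiran' ]
```

So the example query above only returns exactly the certified trainees if the certification key is absent (rather than null) for the others.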

Querying Mongo : Operators

db.collection_name.find({key: perl_compatible_regex})
selects those documents whose key has a value that matches the
given Perl-compatible regular expression.
db.trainees.find({name: /an/i}) will retrieve all trainees whose name
contains the character sequence "an".
The i at the end specifies that the regular expression is case-insensitive.

db.trainees.find({emp_id: /^620/}) will retrieve all trainees whose
emp_id starts with 620 (this assumes emp_id is stored as a string; a regular
expression does not match numeric values).
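The shell's /pattern/flags notation is ordinary JavaScript regex syntax, so the two patterns above can be tried locally on sample strings. MongoDB itself uses a PCRE-based engine, but for simple patterns such as these the result is the same. The names and emp_id values below are made up for illustration.

```javascript
// Trying the slide's patterns with RegExp.prototype.test on sample data (made-up values).
const names = ["Andrew", "Priya", "Shantanu", "Meera"];
const empIds = ["620143", "710620", "620999"];

console.log(names.filter((n) => /an/i.test(n)));      // [ 'Andrew', 'Shantanu' ] (case-insensitive)
console.log(names.filter((n) => /an/.test(n)));       // [ 'Shantanu' ] (case-sensitive)
console.log(empIds.filter((id) => /^620/.test(id)));  // [ '620143', '620999' ] (anchored at start)
```

"710620" contains 620 but is not matched, because ^ anchors the pattern to the start of the string.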

Querying Mongo : Operators

db.collection_name.find({key: {$type: value}})
will select only those documents whose key's value has the
data type passed.
Assume that for the field certification, some documents hold the name of the
course (data type String), some hold the number of
certifications (data type Double) and some hold null. To select only
the documents with the course name, use the query
db.trainees.find({certification: {$type: 2}})
Data types and the values to be passed:
Double - 1; String - 2; Array - 4; ObjectId - 7;
Boolean - 8; Date - 9; Null - 10; Regular Expression - 11
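As a sketch of how the type numbers line up with values, the function below maps a plain JavaScript value to the BSON type number that $type expects. `bsonTypeNumber` is a hypothetical helper; ObjectId (7) is omitted because it cannot be built without a driver, and the function also returns 3, BSON's number for an embedded document, which the list above does not include. Plain shell numbers are stored as Double by default, hence 1 for every JavaScript number.

```javascript
// Sketch: map a plain JavaScript value to the BSON type number used with $type.
// bsonTypeNumber is a hypothetical helper; ObjectId (7) is not covered.
function bsonTypeNumber(value) {
  if (value === null) return 10;              // Null
  if (Array.isArray(value)) return 4;         // Array
  if (value instanceof Date) return 9;        // Date
  if (value instanceof RegExp) return 11;     // Regular Expression
  switch (typeof value) {
    case "number": return 1;                  // Double (the shell's default number type)
    case "string": return 2;                  // String
    case "boolean": return 8;                 // Boolean
    case "object": return 3;                  // Embedded document (Object)
    default: return undefined;                // not covered by this sketch
  }
}

const trainees = [
  { name: "Asha", certification: "OCJP" },
  { name: "Ravi", certification: 2 },
  { name: "Kiran", certification: null },
];

// Equivalent of db.trainees.find({certification: {$type: 2}})
console.log(trainees.filter((t) => bsonTypeNumber(t.certification) === 2).map((t) => t.name));
// [ 'Asha' ]
```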

Querying Mongo : Operators

Accessing embedded document

db.collection_name.find({"parent_key.embedded_key": value}) is
used to find those documents whose embedded document's
embedded_key is equal to the value passed. The dotted key must be quoted.
Eg: db.trainees.find({"project.IDE": "Eclipse"}) will retrieve the trainees
who use the Eclipse IDE for their project.
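A dotted key is resolved by walking into the document one segment at a time. The sketch below shows that walk in plain JavaScript; `getPath` and the sample documents are hypothetical, and unlike MongoDB the sketch does not descend into arrays.

```javascript
// Sketch of resolving a dotted key such as "project.IDE": split on "." and walk one
// level per segment. getPath is a hypothetical helper; it ignores arrays.
function getPath(doc, dottedKey) {
  return dottedKey.split(".").reduce(
    (current, part) =>
      current !== null && typeof current === "object" ? current[part] : undefined,
    doc
  );
}

const trainees = [
  { name: "Asha", project: { title: "LMS", IDE: "Eclipse" } },
  { name: "Ravi", project: { title: "HRMS", IDE: "NetBeans" } },
  { name: "Kiran" }, // no project sub-document at all
];

// Equivalent of db.trainees.find({"project.IDE": "Eclipse"})
console.log(trainees.filter((t) => getPath(t, "project.IDE") === "Eclipse").map((t) => t.name));
// [ 'Asha' ]
```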

Querying Mongo : Operators


Consider a collection CDP with the following documents

{ emp_id: 101, certification:
  [ { name: "Big Data", grade: "A" },
    { name: "AWS", grade: "B" } ] }

{ emp_id: 102, certification:
  [ { name: "Hadoop", grade: "B" },
    { name: "AWS", grade: "A" } ] }

Problem Statement: To find all the employees who are certified in AWS with grade A
Expected output: Only the second document must be returned

Querying Mongo : Operators

db.CDP.find({"certification.name": "AWS", "certification.grade": "A"})

This will return both documents because

In the first document, one array element has name AWS and
another array element has grade A; together they satisfy both
selection conditions.

The second document is returned because it has a single array element
that has both name AWS and grade A.

So to get the desired output (only the second document), a document
should be selected only when both conditions are satisfied by a single
element of the array.
For this we use the $elemMatch operator.
So the query will be
db.CDP.find({certification: {$elemMatch: {name: "AWS", grade: "A"}}})
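The contrast between the two queries can be reproduced with two plain JavaScript predicates over the CDP documents above. `dotted` and `elemMatch` are hypothetical names for this sketch: the first lets each condition be satisfied by any array element, the second requires one element to satisfy both.

```javascript
// Sketch contrasting dotted-key matching with $elemMatch (plain JavaScript, not MongoDB code).
const cdp = [
  { emp_id: 101, certification: [{ name: "Big Data", grade: "A" }, { name: "AWS", grade: "B" }] },
  { emp_id: 102, certification: [{ name: "Hadoop", grade: "B" }, { name: "AWS", grade: "A" }] },
];

// {"certification.name": "AWS", "certification.grade": "A"}
// each condition may be met by a different element
const dotted = (doc) =>
  doc.certification.some((c) => c.name === "AWS") &&
  doc.certification.some((c) => c.grade === "A");

// {certification: {$elemMatch: {name: "AWS", grade: "A"}}}
// one element must meet both conditions
const elemMatch = (doc) =>
  doc.certification.some((c) => c.name === "AWS" && c.grade === "A");

console.log(cdp.filter(dotted).map((d) => d.emp_id));    // [ 101, 102 ]
console.log(cdp.filter(elemMatch).map((d) => d.emp_id)); // [ 102 ]
```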

Querying Mongo : Operators


$where helps to use a JavaScript expression (as a string) or a JavaScript function
in a query.
The JavaScript expression or function is evaluated against each document.
Each document is referred to as this or obj in the JavaScript code.
db.trainees.find({$where: "this.currentCDP > this.previousCDP"})
db.trainees.find({$where: function() { return (this.currentCDP > this.previousCDP); }})

$where is always executed as the last filter during selection.
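The role of `this` in a $where function can be illustrated locally: the function is called once per document with `this` bound to that document. The sketch uses Function.prototype.call to do the binding; `whereFilter` and the sample data are hypothetical, and MongoDB runs the real JavaScript on the server.

```javascript
// Sketch of how a $where-style predicate sees each document as `this`.
// whereFilter is a hypothetical helper; the predicate must be a regular function
// (not an arrow function) so that `this` can be bound.
function whereFilter(docs, predicate) {
  return docs.filter((doc) => predicate.call(doc));
}

const trainees = [
  { name: "Asha", previousCDP: 3, currentCDP: 5 },
  { name: "Ravi", previousCDP: 4, currentCDP: 4 },
];

const improved = whereFilter(trainees, function () {
  return this.currentCDP > this.previousCDP; // same body as the $where function above
});
console.log(improved.map((t) => t.name)); // [ 'Asha' ]
```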

Querying Mongo : Functions


Number of documents in the given collection


Number of objects scanned, time taken to scan and other useful information


Returns an array of distinct values for the key

gives all the commands that can be performed on the collection

gives all the commands that can be performed on the database

Querying Mongo : Functions


gives information about the database, such as its name, the number of
collections and indexes, and the storage space it uses

gives the number of indexes on that collection, the total size of all indexes
and the individual size of each index, along with other information

gives the details of the last error, if any, that occurred during a write operation

Querying Mongo : Functions

db.serverStatus() gives details about the host server, the MongoDB version,
the process (mongod / mongos), the memory used by the server, the number of client
connections, the operations executed by the server, and cursor statistics.
db.currentOp() returns a document whose inprog array contains information (such as the
operation id, seconds running, operation type, namespace, the client that issued the
operation, and lock status) about each currently executing operation.

Querying Mongo : Functions

To copy a database between two server instances, the copyDatabase() function can be
run from the destination server instance:

db.copyDatabase("mysourcedb", "mydestdb", "MYSGEC240748D:27017")

This copies the database mysourcedb from the server running at
MYSGEC240748D:27017 to the destination server (the current server)
with the name mydestdb. (copyDatabase() was deprecated in MongoDB 4.0 and removed in 4.2.)

Querying Mongo : Limiting & Ordering


limits the results to n documents.

db.collection_name.find().sort({key: n})

will sort the result based on the field key

n can take either 1 or -1
1 for ascending, and -1 for descending

Querying Mongo : Skipping and Chaining

skips the first n documents of the result set of the find function

The limit(), sort() and skip() functions can be chained.

Example: db.trainees.find().sort({emp_id: 1}).limit(10).skip(5) will
display ten trainee details sorted by emp_id, after
skipping the first five in the result generated by find(). The order of chaining
does not matter: the sort is applied first, then skip, then limit.
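The fixed sort, skip, limit order can be sketched as a small pipeline over an array. `runCursor` is a hypothetical helper that takes the three settings as one options object, so chaining order cannot affect it, which mirrors the behaviour described above; it supports only a single-key sort, and, as in MongoDB, a limit of 0 means no limit.

```javascript
// Sketch of cursor modifier order: sort first, then skip, then limit.
// runCursor is a hypothetical helper, not a MongoDB API.
function runCursor(docs, { sort = null, skip = 0, limit = 0 } = {}) {
  let result = docs.slice(); // copy so the input array is not reordered
  if (sort) {
    const [[key, dir]] = Object.entries(sort); // single-key sort, e.g. {emp_id: 1}
    result.sort((a, b) => (a[key] < b[key] ? -dir : a[key] > b[key] ? dir : 0));
  }
  result = result.slice(skip);
  return limit > 0 ? result.slice(0, limit) : result; // limit 0 means "no limit"
}

// 20 trainees with emp_id 20, 19, ..., 1 (deliberately unsorted input)
const trainees = Array.from({ length: 20 }, (_, i) => ({ emp_id: 20 - i }));

// find().sort({emp_id: 1}).limit(10).skip(5) behaves like sort, then skip(5), then limit(10)
const page = runCursor(trainees, { sort: { emp_id: 1 }, skip: 5, limit: 10 });
console.log(page.map((t) => t.emp_id)); // [ 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 ]
```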

Quiz : Provide the Mongo equivalent


1. INSERT INTO users(user_id, age, status) VALUES ("bcd001", 45, "A")

Answer: db.users.insert( { user_id: "bcd001", age: 45, status: "A" } )

2. SELECT user_id, status FROM users

Answer: db.users.find( { }, { user_id: 1, status: 1, _id: 0 } )

3. SELECT user_id, status FROM users WHERE status = "A"

Answer: db.users.find( { status: "A" }, { user_id: 1, status: 1, _id: 0 } )

4. SELECT * FROM users WHERE status = "A" OR age = 50

Answer : db.users.find( { $or: [ { status: "A" } , { age: 50 } ] } )

5. SELECT * FROM users WHERE age > 25 AND age <= 50

Answer: db.users.find( { age: { $gt: 25, $lte: 50 } } )

Quiz : Provide the Mongo equivalent

6. SELECT * FROM users WHERE user_id like "%bc%"
Answer: db.users.find( { user_id: /bc/ } )

7. SELECT * FROM users WHERE status = "A" ORDER BY user_id DESC

Answer: db.users.find( { status: "A" } ).sort( { user_id: -1 } )

8. SELECT COUNT(*) FROM users
Answer: db.users.count() OR db.users.find().count()

9. SELECT COUNT(user_id) FROM users

Answer : db.users.count( { user_id: { $exists: true } } )

10. SELECT DISTINCT(status) FROM users

Answer: db.users.distinct( "status" )


Simple Aggregation Functions

db.collection_name.count() gives the number of documents present in the collection.
db.collection_name.count({JSON for where clause}) gives the number of documents
that meet the specified selection criteria.

db.collection_name.distinct("key") returns an array of the distinct values of the
passed key.
db.collection_name.distinct("key", {JSON for where clause}) returns the distinct values of the
passed key among the documents that meet the search criteria.
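A minimal plain-JavaScript sketch of what count() and distinct() return, using a predicate function in place of the query document. `count`, `distinct` and the sample users are hypothetical; note that MongoDB's distinct also treats each element of an array field as a separate value, which this sketch does not do.

```javascript
// Sketch of count() and distinct() semantics over an in-memory array (hypothetical helpers).
const count = (docs, pred = () => true) => docs.filter(pred).length;

// distinct returns the unique values of one key (not whole documents)
const distinct = (docs, key, pred = () => true) =>
  [...new Set(docs.filter(pred).map((d) => d[key]).filter((v) => v !== undefined))];

const users = [
  { user_id: "bcd001", age: 45, status: "A" },
  { user_id: "bcd002", age: 22, status: "D" },
  { user_id: "bcd003", age: 50, status: "A" },
];

console.log(count(users));                                 // 3
console.log(count(users, (u) => u.status === "A"));        // 2
console.log(distinct(users, "status"));                    // [ 'A', 'D' ]
console.log(distinct(users, "status", (u) => u.age > 30)); // [ 'A' ]
```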

Simple Aggregation Functions (Contd.)

Assume there is an employee_details relational database
table with fields emp_no, emp_name, role, experience and
resources_allocated.

The equivalent MongoDB document will look like

{emp_no: 6475, emp_name: "amit", role: "project lead", experience: 7,
resources_allocated: 5}

Simple Aggregation Functions (Contd.)

Now, to group the employee_details by role and calculate
the sum of resources allocated to each role, the SQL query is as follows
SELECT role, SUM(resources_allocated) AS total_resources
FROM employee_details
GROUP BY role

The MongoDB equivalent is

db.employee_details.group({
  key: {role: 1},
  reduce: function (cur, result) {
    result.total_resources += cur.resources_allocated;
  },
  initial: {total_resources: 0}
})

Simple Aggregation Functions (Contd.)

Now, to group the employee_details whose experience is less than 3 by role
and calculate the sum of resources allocated to each role, the SQL query is as follows
SELECT role, SUM(resources_allocated) AS total_resources
FROM employee_details
WHERE experience < 3
GROUP BY role

The MongoDB equivalent is

db.employee_details.group({
  key: {role: 1},
  cond: {experience: {$lt: 3}},
  reduce: function (cur, result) {
    result.total_resources += cur.resources_allocated;
  },
  initial: {total_resources: 0}
})

Simple Aggregation Functions (Contd.)

Now, to group the employee_details whose experience is less than 3 by role and
experience, and then calculate the sum of resources allocated to each group, the SQL query is as follows

SELECT role, experience, SUM(resources_allocated) AS total_resources
FROM employee_details
WHERE experience < 3
GROUP BY role, experience

The MongoDB equivalent is

db.employee_details.group({
  key: {role: 1, experience: 1},
  cond: {experience: {$lt: 3}},
  reduce: function (cur, result) {
    result.total_resources += cur.resources_allocated;
  },
  initial: {total_resources: 0}
})

Simple Aggregation Functions (Contd.)

Group syntax: db.collection_name.group({key, reduce, initial [, keyf] [, cond] [, finalize]})
key: specifies the field(s) on which grouping should be done.
reduce: a function that specifies the operation (such as count or sum) to be performed on the
grouped documents. It takes two parameters: the current document and the aggregated result
for that group so far.
initial: the value with which each group's result is initialized at the beginning
of the operation.
keyf: an alternative to key. This function is used when grouping has to be done
on derived values rather than on the fields themselves.
cond: specifies the selection criteria. Only the documents that satisfy this condition are
considered for grouping.
finalize: a function that specifies the changes to be made to the final result set.

The group function cannot be used with a sharded cluster. For a sharded cluster,
the aggregation framework has to be used. (group() was deprecated in MongoDB 3.4 and removed in 4.2.)
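The options of group() can be understood by re-creating the computation over an array. The sketch below is plain JavaScript; `groupDocs` is a hypothetical helper whose option names mirror the group() syntax, with cond written as a predicate function instead of a query document, and finalize limited to modifying each result in place. keyf is not covered.

```javascript
// Sketch of what group() computes (hypothetical helper, not MongoDB code).
function groupDocs(docs, { key, cond = () => true, reduce, initial, finalize }) {
  const fields = Object.keys(key);
  const groups = new Map();
  for (const doc of docs.filter(cond)) {
    const values = fields.map((f) => doc[f]);
    const id = JSON.stringify(values); // one bucket per distinct combination of key values
    if (!groups.has(id)) {
      const start = {};
      fields.forEach((f, i) => { start[f] = values[i]; }); // grouping fields appear in the result
      Object.assign(start, JSON.parse(JSON.stringify(initial))); // fresh copy of `initial` per group
      groups.set(id, start);
    }
    reduce(doc, groups.get(id)); // reduce(cur, result) updates the running result in place
  }
  const results = [...groups.values()];
  if (finalize) results.forEach((r) => finalize(r)); // in-place finalize only
  return results;
}

const employees = [
  { emp_name: "amit", role: "project lead", experience: 7, resources_allocated: 5 },
  { emp_name: "rita", role: "developer", experience: 2, resources_allocated: 1 },
  { emp_name: "john", role: "developer", experience: 1, resources_allocated: 2 },
];

// Mirrors the group() call with cond: {experience: {$lt: 3}}
console.log(groupDocs(employees, {
  key: { role: 1 },
  cond: (doc) => doc.experience < 3,
  reduce: (cur, result) => { result.total_resources += cur.resources_allocated; },
  initial: { total_resources: 0 },
}));
// [ { role: 'developer', total_resources: 3 } ]
```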

Aggregation framework
Used to calculate aggregated values without map-reduce; it also works on a sharded cluster
Provides functionality similar to

GROUP BY and related SQL operators

Provides simple forms of self joins

Has projection capabilities that reshape the result

Framework Components
Pipeline stages are the operations that the aggregation framework provides.
Stages can be chained, with the output of one stage becoming the input of the next.
The pipeline stages covered here are $project, $match, $limit, $skip, $unwind, $group and $sort.

Expressions are the operators that calculate values when the pipeline stages
are performed. The available expressions are classified into
Boolean, comparison, arithmetic, string, date and conditional operators.

Aggregation framework (Contd.)

$project helps to select particular fields.
{ $project : {
    role : 1,
    experience : 1 } }

This will retrieve the role, experience and _id from all the documents.

Aggregation framework (Contd.)

$match is used to filter documents using selection criteria
db.employee_details.aggregate([
  { $match : { role : "project lead" } }
])

This will return those documents whose role is project lead

Aggregation framework (Contd.)

$limit is used to limit the number of documents returned
{ $limit : 5 }

This will return only the first 5 documents.

Aggregation framework (Contd.)

$skip skips the specified number of documents from the result set
{ $skip : 5 }

This skips the first 5 documents in the result set.

Aggregation framework (Contd.)

$unwind: if an array field holds n values, $unwind on that field
creates n copies of the document, each copy holding one value from the array
{ $project : {
    emp_no : 1,
    emp_name : 1,
    specialization : 1 } },
{ $unwind : "$specialization" }

In the above example, specialization is assumed to be an array field. If a
document has 2 specializations, the output will have two documents with the same
emp_no and emp_name that differ only in the specialization.
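A sketch of the $unwind behaviour in plain JavaScript: one output document per array element, each a copy of the input with the array replaced by that element. `unwind` and the sample employees are hypothetical. The handling of edge cases follows $unwind's default (no preserveNullAndEmptyArrays): missing, null or empty-array fields produce no output, and, since MongoDB 3.2, a non-array value is treated as a single-element array.

```javascript
// Sketch of $unwind (hypothetical helper, not MongoDB code).
function unwind(docs, field) {
  return docs.flatMap((doc) => {
    const value = doc[field];
    if (value === undefined || value === null) return []; // dropped by default
    if (!Array.isArray(value)) return [doc];              // non-array: one output document
    return value.map((item) => ({ ...doc, [field]: item })); // empty array gives no output
  });
}

const employees = [
  { emp_no: 6475, emp_name: "amit", specialization: ["Java", "Cloud"] },
  { emp_no: 6480, emp_name: "rita", specialization: [] },
];

console.log(unwind(employees, "specialization").map((d) => `${d.emp_name}: ${d.specialization}`));
// [ 'amit: Java', 'amit: Cloud' ]
```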

Aggregation framework (Contd.)

$group performs the grouping operation
{ $group : {
    _id : "$role",
    tot_no_of_emp_in_this_role : { $sum : 1 },
    tot_resources : { $sum : "$resources_allocated" } } }

This groups the employees by role (given as the value of _id). It also gives
the total number of employees in each role, since { $sum : 1 } adds 1 to the group's count
each time it encounters a matching document, and the total resources allocated to
that role, since it adds up the individual resources of that group's employees.
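The $group stage above can be traced by hand with a small plain-JavaScript sketch. `groupStage` is a hypothetical helper hard-wired to this example's _id and accumulators; following $sum's documented behaviour, non-numeric resources_allocated values are skipped.

```javascript
// Sketch of the $group stage above (hypothetical helper, not MongoDB code).
function groupStage(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const id = doc.role; // _id : "$role"
    if (!groups.has(id)) {
      groups.set(id, { _id: id, tot_no_of_emp_in_this_role: 0, tot_resources: 0 });
    }
    const acc = groups.get(id);
    acc.tot_no_of_emp_in_this_role += 1; // { $sum : 1 }
    // { $sum : "$resources_allocated" }; non-numeric values are ignored
    if (typeof doc.resources_allocated === "number") acc.tot_resources += doc.resources_allocated;
  }
  return [...groups.values()];
}

const employees = [
  { emp_name: "amit", role: "project lead", resources_allocated: 5 },
  { emp_name: "rita", role: "developer", resources_allocated: 1 },
  { emp_name: "john", role: "developer", resources_allocated: 2 },
];

for (const g of groupStage(employees)) {
  console.log(g._id, g.tot_no_of_emp_in_this_role, g.tot_resources);
}
// project lead 1 5
// developer 2 3
```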

Aggregation framework (Contd.)

Each computed field in $group must use one of the following accumulator operators to build
its composite value:
$addToSet, $first, $last, $max, $min, $avg, $push, $sum

Aggregation framework (Contd.)

$sort sorts the result set
{ $sort : { experience : 1 } }

This sorts the result set in ascending order of experience.

Aggregation framework
More Examples

SQL:
SELECT SUM(resources_allocated) AS total_resources
FROM employee_details

MongoDB:
db.employee_details.aggregate([
  { $group: { _id: null,
              total_resources: { $sum: "$resources_allocated" } } }
])

Sum the resources_allocated field
from employee_details

Indexes


Performance can be increased by proper implementation of indexes

Indexes increase the speed of read operations

An index can be created on any field using the following syntax
db.collection_name.ensureIndex({key: 1})
1 represents an ascending index and -1 represents a descending index

Index can be dropped by

Indexes are updated automatically on every insert, update and delete

The ensureIndex() method (a deprecated alias of createIndex() since MongoDB 3.0) can take an
optional second parameter

Some of the values it can take:

{unique: true} : creates a unique index

{background: true} : builds the index in the background, so other database operations
are not blocked while the index is being created
{sparse: true} : creates an index containing entries only for those documents that have
the indexed field
{dropDups: true} : when building a unique index, deletes documents that have duplicate
values for the indexed field (removed in MongoDB 3.0)

db.collection_name.getIndexes() returns all the indexes created on the particular collection
db.collection_name.reIndex() rebuilds all the indexes on the particular collection
db.collection_name.totalIndexSize() gives the total size in bytes of all the indexes on the collection

Index Types
_id Index
The _id index is a unique index on the _id field
MongoDB creates this index by default on all collections
The index on _id cannot be dropped.

Secondary Indexes
All indexes in MongoDB are secondary indexes
Indexes can be created on any field within any document or sub-document
These include indexes on sub-documents, indexes on embedded fields, and compound indexes

Backup and Restore

Backup and Restore

Backups of databases can be created by running the
mongodump tool (present in the bin folder)
The syntax is

mongodump --out path_to_store_backup

It can also be restricted to back up a particular database or collection
mongodump --out path_to_store_backup --db database_name --collection collection_name

To restore a backup, run the mongorestore tool

mongorestore --collection collection_name --db database_name path_to_collection_bson_file


Backup and Restore - Cont

MongoDB export

To export a collection from the server to a JSON or CSV file on the local machine,
the mongoexport tool can be used
mongoexport --db database_name --collection collection_name --out path_to_json\file_name.json

For CSV output, pass --type=csv (older versions use --csv):
mongoexport --db database_name --collection collection_name --type=csv --out path_to_json\file_name.csv --fields


Backup and Restore Contd.

MongoDB import

To import data from a JSON or CSV file into a collection, the
mongoimport tool can be used
mongoimport --db dest_database_name --collection dest_collection_name path_to_input_json

mongoimport --type csv --db dest_database_name --collection dest_collection_name path_to_input_csv --fields




Thank You


