Usage Guidelines

Do not forward this document to any non-Infosys mail ID. Forwarding this document to a
non-Infosys mail ID may lead to disciplinary action against you, including termination of
employment.

Contents of this material cannot be used in any other internal or external document
without explicit permission from E&R@infosys.com.

Introduction to MongoDB
Education & Research
© 2012 Infosys Limited, Bangalore, India. All rights reserved. Infosys believes the information in this document is accurate as of its publication date; such
information is subject to change without notice. Infosys acknowledges the proprietary rights of other companies to the trademarks, product names and such
other intellectual property rights mentioned in this document. Except as expressly permitted, neither this document nor any part of it may be reproduced, stored
in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, printing, photocopying, recording or otherwise, without the prior
permission of Infosys Limited and/or any named intellectual property rights holders under this document.

ER/CORP/CRS/ERCLD0008/003

Confidential Information

This Document is confidential to Infosys Limited. This document contains information and data that Infosys considers confidential
and proprietary (Confidential Information).

Confidential Information includes, but is not limited to, the following:

Corporate and Infrastructure information about Infosys;

Infosys project management and quality processes;

Project experiences provided included as illustrative case studies.

Any disclosure of Confidential Information to, or use of it by a third party, will be damaging to Infosys.

Ownership of all Infosys Confidential Information, no matter in what media it resides, remains with Infosys.

Confidential information in this document shall not be disclosed, duplicated or used in whole or in part for any purpose without
specific written permission of an authorized representative of Infosys.

Course Objectives
Perform basic operations through the shell prompt

Perform aggregation functions in the shell

Create and manage indexes

Import and export data

Session Plan
Querying Mongo
Aggregation
Indexing
Backup and Restore
Import and Export Data
Mongo tools

Querying Mongo

Querying Mongo : Selection


db.collection_name.find({JSON_for_where_clause})

Example: db.trainees.find({stream: "Java", track: "fast track"})

This will return all those documents where the stream equals "Java"
and the track equals "fast track".

Querying Mongo : Selection & Projection


db.collection_name.find({JSON_for_where_clause}, {JSON_for_select_clause})

Examples
db.trainees.find({stream: "Java", track: "fast track"}, {name: 1, emp_id: 1})
This will return the name, emp_id and document _id of all those
documents where the stream equals "Java" and the track equals "fast track".
db.trainees.find({}, {name: 1, emp_id: 1, _id: 0})
This will return the name and emp_id of every trainee, as there is no where
clause. null can also be specified instead of {}.

Querying Mongo : Operators


db.collection_name.find({$or: [{key1: v1}, {key2: v2}]})

will select the documents that satisfy at least one of the conditions

Example: db.trainees.find({batch: "Jan12CS", $or: [{stream: "Java"}, {stream: "OS"}, {track: "intermediate"}]})
This will select the trainees from the Jan12CS batch who belong to the
Java stream, the OS stream or the intermediate track.

db.collection_name.find({key1: {$gt: v1}})

will fetch all documents in which the value of key1 is greater than v1

Example: db.trainees.find({GPA: {$gt: 4, $lt: 4.9}})
This will return the details of all trainees who have a GPA between 4 and 4.9 (both exclusive).
Similarly, $gte, $lt, $lte and $ne check greater than or equal, less
than, less than or equal and not equal respectively.

Querying Mongo : Operators


db.collection_name.find({key: {$in: [v1, v2]}})

will retrieve those documents whose key value is equal to either v1 or v2.
Example: db.trainees.find({stream: {$in: ["Java", "OS"]}}) will retrieve
the details of the Java and OS stream trainees.

{$nin: ["Java", "OS"]} will retrieve the details of trainees who are in
neither the Java nor the OS stream.

db.collection_name.find({key: {$all: [v1, v2]}})

will retrieve those documents whose key array contains all the values
passed in the argument
Example: db.trainees.find({module: {$all: ["JPA", "POJO"]}}) will retrieve
the details of those trainees whose module array contains both JPA and POJO.
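
The matching rules of $in, $nin and $all can be pictured with a small plain-JavaScript sketch. This is only an emulation for illustration, not MongoDB code; the helper names (matchIn, matchNin, matchAll) and the sample trainees are assumptions made for this example.

```javascript
// Plain-JavaScript emulation of $in, $nin and $all; not MongoDB code.
// The sample trainees and helper names are hypothetical.
const trainees = [
  { name: "Asha", stream: "Java", module: ["JPA", "POJO", "JDBC"] },
  { name: "Ravi", stream: "OS", module: ["POJO"] },
  { name: "Meena", stream: "Testing", module: ["JPA"] }
];

// $in: the field value equals at least one of the listed values.
const matchIn = (value, list) => list.includes(value);
// $nin: the field value equals none of the listed values.
const matchNin = (value, list) => !list.includes(value);
// $all: the array field contains every one of the listed values.
const matchAll = (array, list) => list.every(v => array.includes(v));

const inJavaOrOs = trainees
  .filter(t => matchIn(t.stream, ["Java", "OS"]))
  .map(t => t.name);
const notInJavaOrOs = trainees
  .filter(t => matchNin(t.stream, ["Java", "OS"]))
  .map(t => t.name);
const hasJpaAndPojo = trainees
  .filter(t => matchAll(t.module, ["JPA", "POJO"]))
  .map(t => t.name);

console.log(inJavaOrOs, notInJavaOrOs, hasJpaAndPojo);
```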

Querying Mongo : Operators


db.collection_name.find({key: {$size: value}})

will retrieve the documents whose key array has the size specified in
value
Example: db.trainees.find({module: {$size: 4}}) will select the
trainees who have completed 4 modules
The $size operator cannot take a range as its value, i.e., $gt, $lt, $gte, $lte
and $ne cannot be used in the $size operator's value; the value can only be an
integer.

db.collection_name.find({}, {array_field: {$slice: n}})

n can be positive or negative

This will return only the first n values of the given array (if n is
positive) or the last n values of the given array (if n is negative).
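
The effect of a positive or negative n in $slice can be sketched in plain JavaScript. The function sliceProjection and the sample array are hypothetical names used only for this illustration; the sketch mimics the behaviour described above and is not how MongoDB implements the projection.

```javascript
// Emulates the {$slice: n} projection on a single array (illustration only).
function sliceProjection(array, n) {
  // Positive n keeps the first n values, negative n keeps the last |n| values.
  return n >= 0 ? array.slice(0, n) : array.slice(n);
}

// Hypothetical module array of one trainee.
const modules = ["JPA", "POJO", "JDBC", "Servlets"];

const firstTwo = sliceProjection(modules, 2);
const lastTwo = sliceProjection(modules, -2);

console.log(firstTwo); // the first two modules
console.log(lastTwo);  // the last two modules
```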

Querying Mongo : Operators


db.collection_name.find({key: {$exists: true}})

useful to retrieve only those documents which have an entry for a
particular key.
Example: db.trainees.find({certification: {$exists: true}}) will select
the details of those trainees who have done some certification.

Querying Mongo : Operators


db.collection_name.find({key: perl_compatible_regex})
will select those documents whose key has a value that matches the
given perl_compatible_regex.
Example:
db.trainees.find({name: /an/i}) will retrieve all trainees whose name
contains the character sequence "an".
The i at the end specifies that the regular expression is case
insensitive.

db.trainees.find({emp_id: /^620/}) will retrieve all trainees whose
emp_id starts with 620.
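
The two regular expressions above can be tried out in plain JavaScript, since the shell uses the same literal syntax. The sample names and emp_id values below are hypothetical, and emp_id is assumed to be stored as a string, because a regular expression only matches string values.

```javascript
// Hypothetical trainees; emp_id is assumed to be stored as a string.
const trainees = [
  { name: "Anand", emp_id: "620145" },
  { name: "Kiran", emp_id: "621002" },
  { name: "Sneha", emp_id: "162099" }
];

// /an/i matches "an" anywhere in the name, ignoring case.
const nameHasAn = trainees.filter(t => /an/i.test(t.name)).map(t => t.name);

// /^620/ matches only when the value starts with 620.
const idStartsWith620 = trainees.filter(t => /^620/.test(t.emp_id)).map(t => t.name);

console.log(nameHasAn);
console.log(idStartsWith620);
```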

Querying Mongo : Operators


db.collection_name.find({key: {$type: value}})
will select only those documents in which the data type of the key's value
matches the data type passed.
Assume that, for the field certification, a few documents have the name of the
course (whose data type will be String), a few have the number of
certifications (type Double) and a few have null. If you want to select only
those documents with the course name, then use the query
db.trainees.find({certification: {$type: 2}})
Data types and the values to be passed:
Double - 1; String - 2; Array - 4; Object id - 7;
Boolean - 8; Date - 9; Null - 10; Regular Expression - 11

Querying Mongo : Operators

Accessing embedded document


db.collection_name.find({"parent_key.embedded_key": value}) is
used to find those documents in which the embedded document's
embedded_key is equal to the value passed. The dotted key must be quoted.
Eg: db.trainees.find({"project.IDE": "Eclipse"}) will retrieve the trainees
who use the Eclipse IDE for their project.

Querying Mongo : Operators

$elemMatch

Consider a collection CDP with the following documents

{ emp_id: 101, certification:
  [ { name: "Big Data", grade: "A" },
    { name: "AWS", grade: "B" }
  ]
}

{ emp_id: 102, certification:
  [ { name: "Hadoop", grade: "B" },
    { name: "AWS", grade: "A" }
  ]
}

Problem Statement: To find all the employees who are certified in AWS with grade A
Expected output: Only the second document must be returned

Querying Mongo : Operators


db.CDP.find({"certification.name": "AWS", "certification.grade": "A"})

This will return both the documents because

In the first document, there is an array element with name AWS and
there is also another array element with grade A, thus satisfying both
selection conditions

The second document is returned because it has an array element
that has both the name AWS and the grade A

So to get the desired output (only the second document), a document
should be selected only when both conditions are satisfied by a single
element of the array
For this we use the $elemMatch operator
So the query will be
db.CDP.find({ certification: { $elemMatch: { name: "AWS", grade: "A" } } })
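
The difference between the two queries can be reproduced in plain JavaScript on the two CDP documents. This is only a sketch of the matching logic, not MongoDB's implementation; the variable names dottedMatch and elemMatch are chosen for this illustration.

```javascript
// The two CDP documents from the slide.
const CDP = [
  { emp_id: 101, certification: [
      { name: "Big Data", grade: "A" },
      { name: "AWS", grade: "B" } ] },
  { emp_id: 102, certification: [
      { name: "Hadoop", grade: "B" },
      { name: "AWS", grade: "A" } ] }
];

// Dotted keys: each condition may be satisfied by a different array element.
const dottedMatch = CDP
  .filter(doc =>
    doc.certification.some(c => c.name === "AWS") &&
    doc.certification.some(c => c.grade === "A"))
  .map(doc => doc.emp_id);

// $elemMatch: one single array element must satisfy both conditions.
const elemMatch = CDP
  .filter(doc =>
    doc.certification.some(c => c.name === "AWS" && c.grade === "A"))
  .map(doc => doc.emp_id);

console.log(dottedMatch); // both employees
console.log(elemMatch);   // only employee 102
```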

Querying Mongo : Operators


$where

Helps to use a JavaScript expression (as a string) or a JavaScript function
in a query
The JavaScript expression or function is evaluated against each
document

Each document is referred to as this or obj in the JavaScript

Example:
db.trainees.find({$where: "this.currentCDP > this.previousCDP"})
db.trainees.find({$where: function() { return this.currentCDP > this.previousCDP; }})

$where is always executed as the last filter during selection
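
How a $where function sees each document as this can be sketched in plain JavaScript. The sample trainees and the function name improved are hypothetical, and the filter below only imitates the per-document evaluation; it is not how the server runs the function.

```javascript
// Hypothetical trainees with their current and previous CDP scores.
const trainees = [
  { name: "Asha", currentCDP: 8, previousCDP: 6 },
  { name: "Ravi", currentCDP: 5, previousCDP: 7 }
];

// The same function body as in the $where example; "this" is the document.
function improved() {
  return this.currentCDP > this.previousCDP;
}

// Evaluate the function once per document, binding "this" to that document.
const selected = trainees
  .filter(doc => improved.call(doc))
  .map(doc => doc.name);

console.log(selected);
```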

Querying Mongo : Functions


db.collection_name.find().count()

Number of documents in the given collection


db.collection_name.find().explain()

Number of objects scanned, time taken to scan and other useful
information
db.collection_name.distinct(key)

Returns an array of distinct values for the key


db.collection_name.help()

gives all the commands that can be performed on the collection


db.help()

gives all the commands that can be performed on the database

Querying Mongo : Functions


db.stats()

gives information about the database such as name, number of
collections and indexes, and the amount of memory used by it
db.collection_name.stats()

gives the number of indexes on that collection, total size of all indexes
and individual size of each index along with other information
db.getLastError()

gives the details of the last error that occurred during a write operation
if any

Querying Mongo : Functions


db.serverStatus() will give details about the host server, the mongodb version,
the process (mongod / mongos), the memory used by the server, no. of client
connections, the different operations executed by the server, and the cursor type
used.
db.currentOp() returns an array that contains various information (like
operationId, secs running, operation name, namespace, the client that issued the
operation, lock status) about all the currently executing operations.

Querying Mongo : Functions


To copy a database between two server instances, the copyDatabase() function can be
used from the destination server instance
Example:

db.copyDatabase("mysourcedb", "mydestdb", "MYSGEC240748D:27017")

This will copy the database mysourcedb from the server running at
MYSGEC240748D:27017 to the destination server (the current server)
under the name mydestdb

Querying Mongo : Limiting & Ordering


db.collection_name.find().limit(n)

limits the results to n documents.

db.collection_name.find().sort({key: n})

will sort the result based on the field key


n can take either 1 or -1
1 for ascending, and -1 for descending

Querying Mongo : Skipping and Chaining


db.collection_name.find().skip(n)
skips the first n documents of the result set of the find function

The limit(), sort() and skip() functions can be chained.

Example: db.trainees.find().sort({emp_id: 1}).limit(10).skip(5) will
display ten trainee details sorted by emp_id, after
skipping the first five in the sorted result. The sort is applied first,
then the skip, then the limit, regardless of the order in which they are chained.
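
The combined effect of sort(), skip() and limit() can be sketched in plain JavaScript. The function runCursor and the generated emp_id values are assumptions for this illustration; the sketch applies the three steps in the order sort, skip, limit, which is the order the example above relies on.

```javascript
// Emulates find().sort(...).skip(...).limit(...) on an in-memory array.
// runCursor is a hypothetical helper, not part of the mongo shell.
function runCursor(docs, { sort, skip = 0, limit = Infinity }) {
  const [key, direction] = Object.entries(sort)[0];
  // Sort a copy so that the input array is left untouched.
  const sorted = [...docs].sort(
    (a, b) => (a[key] < b[key] ? -1 : a[key] > b[key] ? 1 : 0) * direction
  );
  // Skip first, then take at most "limit" documents.
  return sorted.slice(skip, skip + limit);
}

// Twenty hypothetical trainees with emp_id 1..20 in scrambled order.
const trainees = Array.from({ length: 20 }, (_, i) => ({ emp_id: ((i * 7) % 20) + 1 }));

const page = runCursor(trainees, { sort: { emp_id: 1 }, skip: 5, limit: 10 });

console.log(page.map(t => t.emp_id)); // emp_id 6 through 15
```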

Quiz : Provide the Mongo equivalent


1. INSERT INTO users(user_id, age, status) VALUES ("bcd001", 45, "A")

Answer: db.users.insert( { user_id: "bcd001", age: 45, status: "A" } )

2. SELECT user_id, status FROM users


Answer: db.users.find( { }, { user_id: 1, status: 1, _id: 0 } )

3. SELECT user_id, status FROM users WHERE status = "A"


Answer: db.users.find( { status: "A" }, { user_id: 1, status: 1, _id: 0 } )

4. SELECT * FROM users WHERE status = "A" OR age = 50

Answer : db.users.find( { $or: [ { status: "A" } , { age: 50 } ] } )

5. SELECT * FROM users WHERE age > 25 AND age <= 50


Answer: db.users.find( { age: { $gt: 25, $lte: 50 } } )

Quiz : Provide the Mongo equivalent


6. SELECT * FROM users WHERE user_id like "%bc%"
Answer: db.users.find( { user_id: /bc/ } )

7. SELECT * FROM users WHERE status = "A" ORDER BY user_id DESC


Answer: db.users.find( { status: "A" } ).sort( { user_id: -1 } )

8. SELECT COUNT(*) FROM users


Answer: db.users.count() OR db.users.find().count()

9. SELECT COUNT(user_id) FROM users


Answer : db.users.count( { user_id: { $exists: true } } )

10. SELECT DISTINCT(status) FROM users


Answer: db.users.distinct( "status" )

Aggregation

Simple Aggregation Functions


Count
db.collection_name.count() gives the number of documents present in the collection.
db.collection_name.count({JSON for where clause}) will give the number of documents
with the specified selecting criteria.

Distinct
db.collection_name.distinct(key) will return an array of the distinct values of the
passed key
db.collection_name.distinct(key, {JSON for where clause}) will return the distinct values
of the passed key among the documents that meet the search criteria

Simple Aggregation Functions (Contd.)

Group
Assume there is an employee_details relational database
table with fields emp_no, emp_name, role, experience and
resources_allocated.

The MongoDB document equivalent will be like

{emp_no: 6475, emp_name: "amit", role: "project lead", experience: 7,
resources_allocated: 5}

Simple Aggregation Functions (Contd.)


Now if we have to group the employee_details based on the role and calculate
the sum of resources allocated to each role, the SQL query will be as follows
SELECT role, SUM(resources_allocated) AS total_resources
FROM employee_details
GROUP BY role

The MongoDB equivalent will be

db.employee_details.group({
  key: { role: 1 },
  reduce: function (cur, result) {
    result.total_resources += cur.resources_allocated;
  },
  initial: { total_resources: 0 }
})

Simple Aggregation Functions (Contd.)


Now if we have to group those employee_details whose experience is less than 3, based on the
role, and calculate the sum of resources allocated to each role, the SQL query will be as follows
SELECT role, SUM(resources_allocated) AS total_resources
FROM employee_details
WHERE experience < 3
GROUP BY role

The MongoDB equivalent will be

db.employee_details.group({
  key: { role: 1 },
  cond: { experience: { $lt: 3 } },
  reduce: function (cur, result) {
    result.total_resources += cur.resources_allocated;
  },
  initial: { total_resources: 0 }
})

Simple Aggregation Functions (Contd.)

Now if we have to group those employee_details whose experience is less than 3, based on the role and
experience, and then calculate the sum of resources allocated to each combination of role and experience,
the SQL query will be as follows

SELECT role, experience, SUM(resources_allocated) AS total_resources
FROM employee_details
WHERE experience < 3
GROUP BY role, experience

The MongoDB equivalent will be

db.employee_details.group({
  key: { role: 1, experience: 1 },
  cond: { experience: { $lt: 3 } },
  reduce: function (cur, result) {
    result.total_resources += cur.resources_allocated;
  },
  initial: { total_resources: 0 }
})

Simple Aggregation Functions (Contd.)


Group Syntax:
db.collection_name.group({key, reduce, initial, [keyf,] [cond,] [finalize]})
key: specifies the key based on which grouping should be done.
reduce: a function that specifies what operation (like count, sum) has to be performed on the
documents being grouped. The function takes two parameters: the current document and the result
aggregated up to the previous document.
initial: the result of the aggregation operation is initialized with this value at the beginning
of the operation.
keyf: an alternative to the key field. This function is defined when grouping has to be done
based on some derived values rather than on the fields themselves.
cond: specifies the selection criteria. Only the documents satisfying this condition will be
considered for grouping.
finalize: a function that specifies the changes that need to be made to the final result set.

The group function cannot be used with a sharded cluster. For a sharded cluster, the
aggregation framework has to be used.
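
The way key, cond, reduce and initial work together can be sketched in plain JavaScript. The function group below is a hypothetical emulation written for this illustration, not the shell's implementation; as a simplification, cond is taken as a JavaScript predicate instead of a query document, and keyf and finalize are left out. All employees other than the first are made-up sample data.

```javascript
// Minimal emulation of group({key, cond, reduce, initial}) on an array.
// Simplification: cond is a predicate function, not a query document.
function group(docs, { key, cond = () => true, reduce, initial }) {
  const groups = new Map();
  for (const doc of docs) {
    if (!cond(doc)) continue;                    // selection criteria
    const keyValues = {};
    for (const field of Object.keys(key)) keyValues[field] = doc[field];
    const id = JSON.stringify(keyValues);        // one entry per distinct key
    if (!groups.has(id)) groups.set(id, { ...keyValues, ...initial });
    reduce(doc, groups.get(id));                 // fold the document into the result
  }
  return [...groups.values()];
}

const employees = [
  { emp_no: 6475, emp_name: "amit", role: "project lead", experience: 7, resources_allocated: 5 },
  { emp_no: 6476, emp_name: "ravi", role: "project lead", experience: 2, resources_allocated: 3 },
  { emp_no: 6477, emp_name: "neha", role: "developer", experience: 1, resources_allocated: 2 },
  { emp_no: 6478, emp_name: "john", role: "developer", experience: 2, resources_allocated: 4 }
];

const sumResources = (cur, result) => {
  result.total_resources += cur.resources_allocated;
};

const allRoles = group(employees, {
  key: { role: 1 }, reduce: sumResources, initial: { total_resources: 0 }
});
const juniorRoles = group(employees, {
  key: { role: 1 }, cond: doc => doc.experience < 3,
  reduce: sumResources, initial: { total_resources: 0 }
});

console.log(allRoles);
console.log(juniorRoles);
```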

Aggregation framework
Used to calculate aggregated values without map-reduce, also on a sharded cluster
Provides similar functionality to GROUP BY and related SQL operators

Provides simple forms of self joins

Has projection capabilities that reshape the result

Framework Components
Pipeline operators are the different operations that the aggregation framework provides.
These operators can be chained, so that the output of one becomes the input of the next. The pipeline operators are
$project, $match, $limit, $skip, $unwind, $group, $sort

Expressions are the operators that calculate values when the pipeline operators
are executed. The available expressions are classified into
Boolean, comparison, arithmetic, string, date and conditional operators.

Aggregation framework (Contd.)


$project: helps to select particular fields.
db.employee_details.aggregate(
  { $project: {
      role: 1,
      experience: 1
  }}
);

This will retrieve the role, experience and _id from all the documents.

Aggregation framework (Contd.)


$match: used to filter documents using a selection criterion
db.employee_details.aggregate(
  { $match: { role: "project lead" } }
);

This will return those documents whose role is "project lead"

Aggregation framework (Contd.)


$limit: used to limit the number of documents displayed
db.employee_details.aggregate(
{ $limit : 5 }
);

This will display only 5 documents from the collection.

Aggregation framework (Contd.)


$skip: skips the specified number of documents from the result set
db.employee_details.aggregate(
{ $skip : 5 }
);

This skips the first 5 documents in the result set.

Aggregation framework (Contd.)


$unwind: if an array field holds n values and $unwind is applied to that field, it
creates n copies of the document, each copy having one value from the array
db.employee_details.aggregate(
  { $project: {
      emp_no: 1,
      emp_name: 1,
      specialization: 1
  }},
  { $unwind: "$specialization" }
);

In the above example, it is assumed that specialization is an array field. If a particular
document has 2 specializations, the output will have two entries with the same
emp_no and emp_name that differ only in the specialization.
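
The copying behaviour of $unwind can be sketched in plain JavaScript. The function unwind and the sample employees are hypothetical and serve only to illustrate the description above; documents without the array field are not handled in this sketch.

```javascript
// Emulates $unwind: one output document per value of the array field.
function unwind(docs, field) {
  return docs.flatMap(doc =>
    doc[field].map(value => ({ ...doc, [field]: value }))
  );
}

// Hypothetical employees; specialization is assumed to be an array field.
const employees = [
  { emp_no: 6475, emp_name: "amit", specialization: ["Java", "MongoDB"] },
  { emp_no: 6480, emp_name: "neha", specialization: ["Testing"] }
];

const unwound = unwind(employees, "specialization");

console.log(unwound);
```

The employee with two specializations appears twice in the output, once per specialization, with the same emp_no and emp_name.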

Aggregation framework (Contd.)


$group: performs the grouping operation
db.employee_details.aggregate(
  { $group: {
      _id: "$role",
      tot_no_of_emp_in_this_role: { $sum: 1 },
      tot_resources: { $sum: "$resources_allocated" }
  }}
);

This will group the employees based on role (given as the value of _id). It also displays
the total number of employees under each role, as it adds 1 to the group's count
each time it encounters a matching document. And it gives the total resources allocated to
that role, as it adds up the individual resources of that group's employees.

Aggregation framework (Contd.)


$group uses the following aggregate functions to compute the
composite values.
$addToSet, $first, $last, $max, $min, $avg, $push, $sum

Aggregation framework (Contd.)


$sort: sorts the result set
db.employee_details.aggregate(
{ $sort : { experience : 1 } }
);

This sorts the result set based on the experience.
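
The chaining of pipeline operators, where each one works on the output of the previous one, can be sketched in plain JavaScript. The helpers match, groupSum, sortBy and aggregate are hypothetical stand-ins for $match, $group with $sum, $sort and the pipeline itself; the employees other than emp_no 6475 are made-up sample data. This is an illustration of the idea, not the framework's implementation.

```javascript
// Each stage is a function from an array of documents to a new array.
const match = predicate => docs => docs.filter(predicate);

const groupSum = (keyField, sumField) => docs => {
  const groups = new Map();
  for (const doc of docs) {
    const entry = groups.get(doc[keyField]) ||
      { _id: doc[keyField], tot_no_of_emp_in_this_role: 0, tot_resources: 0 };
    entry.tot_no_of_emp_in_this_role += 1;       // like { $sum: 1 }
    entry.tot_resources += doc[sumField];        // like { $sum: "$field" }
    groups.set(doc[keyField], entry);
  }
  return [...groups.values()];
};

const sortBy = (field, direction) => docs =>
  [...docs].sort((a, b) => (a[field] - b[field]) * direction);

// The pipeline feeds the output of each stage into the next one.
const aggregate = (docs, stages) =>
  stages.reduce((current, stage) => stage(current), docs);

const employees = [
  { emp_no: 6475, role: "project lead", experience: 7, resources_allocated: 5 },
  { emp_no: 6476, role: "project lead", experience: 2, resources_allocated: 3 },
  { emp_no: 6477, role: "developer", experience: 1, resources_allocated: 2 },
  { emp_no: 6478, role: "developer", experience: 2, resources_allocated: 4 },
  { emp_no: 6479, role: "tester", experience: 1, resources_allocated: 1 }
];

const result = aggregate(employees, [
  match(doc => doc.resources_allocated > 1),
  groupSum("role", "resources_allocated"),
  sortBy("tot_resources", 1)
]);

console.log(result);
```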

Aggregation framework
More Examples

Sum the resources_allocated field from employee_details

SQL:
SELECT SUM(resources_allocated) AS total_resources
FROM employee_details

MongoDB:
db.employee_details.aggregate([
  { $group: { _id: null,
              total_resources: { $sum: "$resources_allocated" } } }
])

Indexing

Indexing
Performance can be increased by proper implementation of indexes

Indexes increase the speed of read operations

An index can be created on any field using the following syntax

db.collection_name.ensureIndex({key: 1})
1 represents an ascending index and -1 represents a descending index

An index can be dropped by

db.collection_name.dropIndex({key: 1})
Indexes are automatically updated after every insert

Indexing
The ensureIndex() method can take an optional second parameter

A few of the values it can take

{unique: true} : to create a unique index

{background: true} : the system does not wait for the index to be
created. The index will be created in the background
{sparse: true} : will index only those documents that have
the indexed field in them
{dropDups: true} : will delete those documents that have duplicate
values for the indexed fields

Indexing
db.collection_name.getIndexes() will get all the Indexes created on the particular
collection
db.collection_name.reIndex() rebuilds all the indexes on the particular collection
db.collection_name.totalIndexSize() will give the total size in bytes of all the
indexes

Index Types
_id Index
_id index is a unique index on the _id field
MongoDB creates this index by default on all collections
Cannot delete the index on _id.

Secondary Indexes
All indexes in MongoDB are secondary indexes
Indexes can be created on any field within any document or sub-document
They can be indexes on sub-documents, indexes on embedded fields or compound indexes

Backup and Restore

Backup and Restore


Backups
Backups of the databases can be created by running the
mongodump application (present in the bin folder)
The syntax is

mongodump --out path_to_store_backup

It can also be customized to back up a particular database or collection
mongodump --out path_to_store_backup --db database_name --collection collection_name

To restore a backup, the mongorestore application has to be run

mongorestore --collection collection_name --db database_name path_to_the_backup\collection_name.bson

Backup and Restore (Contd.)

MongoDB export

To export a collection from the server to a json or csv file on the
local machine, the mongoexport application can be used
mongoexport --db database_name --collection collection_name --out path_to_json\file_name.json

mongoexport --db database_name --collection collection_name --csv --out path_to_json\file_name.csv --fields field_name1,field_name2

Backup and Restore (Contd.)

MongoDB import

To import data from a json or csv file into a collection, the
mongoimport application can be used
mongoimport --db dest_database_name --collection dest_collection_name path_to_input_json

mongoimport --type csv --db dest_database_name --collection dest_collection_name path_to_input_csv --fields new_field_name1,new_field_name2

Summary
Querying Mongo

Aggregation
Indexing
Backup and Restore
Import and Export Data
Mongo tools

Thank You
