Nosql part1 8th December

NoSQL & MongoDB Part I
Arindam Chatterjee

Introduction to NoSQL
• NoSQL stands for “Not only SQL”
• NoSQL is
  – A class of non-relational database management systems
  – Different from traditional relational database systems
  – Designed for distributed data storage that
    • typically does not require a fixed schema,
    • avoids join operations and
    • scales horizontally
• Used by Facebook, Google and other applications that handle large volumes of unstructured Web application data

History of NoSQL
• RDBMS systems have limitations with respect to the following
  – Scalability
  – Parallelization
  – Cost
• Example: Google, which gets billions of requests a month across geographically distributed applications.
• The above led to research on the following concepts
  – GFS: Distributed file system
  – Chubby: Distributed coordination system
  – MapReduce: Parallel execution system
  – Bigtable: Column oriented database

NoSQL..Where to use
• NoSQL is useful in the following cases
  – Online stores and portals like Amazon, where the transaction of one individual should not “lock” a database or part of a database
  – Where a “committed” transaction is not critical (e.g. a buyer orders an item and someone else clicks on the same item at the same time; one of them may end up not getting the item if it is the last piece left. An “apology” mail and a refund can sort the matter out.)
  – Where cost is a constraint
• NoSQL SHOULD NOT be used in the following cases
  – Stock exchanges or banking, where transactions are critical and cached or stale data will not work
  – Other non-financial transactions where completion of the transaction is critical

Benefits of NoSQL
• Schemaless data representation:
  – Almost all NoSQL implementations offer schema-less data representation. This means that we do not have to think too far ahead to define a structure, and the structure can continue to evolve over time, including adding new fields or even nesting the data, for example, in the case of JSON representation.
• Development time:
  – Reduced development time, because one doesn’t have to deal with complex SQL queries and joins
• Speed:
  – NoSQL databases can be much faster than relational databases for simple reads and writes, because they avoid joins and relax consistency guarantees
• Ability to plan ahead for scalability:
  – The applications can be quite elastic and can handle sudden spikes of load.
  – Provides horizontal scalability and partitioning to new servers

List of NoSQL Databases
• Document based: MongoDB, CouchDB, RavenDB, Terrastore
• Key value: Redis, Membase, Voldemort
• XML based: BaseX, eXist
• Column based: BigTable, Hadoop/HBase, Cassandra, SimpleDB, Cloudera
• Graph based: Neo4j, FlockDB, InfiniteGraph

Storage Types for NoSQL Databases
• Column Oriented storage
  – Data is stored as columns, as opposed to rows (as in a traditional RDBMS)
  – Used for Online Analytical Processing (OLAP) type databases
• Example: We want to store the following information

  Employee ID   First Name   Last Name   Dept
  1234          Asim         Das         HR
  3242          Noel         David       Marketing
  5678          Raj          Malhotra    Production
  4543          Rohan        Singh       R&D

• Advantages:
  – A new column can be added without having to fill in default values for existing rows
  – Efficient for computing maxima, minima, averages and sums, especially on large datasets

Traditional RDBMS Approach
• Data serialized as follows
  1234, Asim, Das, HR
  3242, Noel, David, Marketing
  5678, Raj, Malhotra, Production
  4543, Rohan, Singh, R&D

Column Oriented Approach
• Data stored as follows
  1234, 3242, 5678, 4543
  Asim, Noel, Raj, Rohan
  Das, David, Malhotra, Singh
  HR, Marketing, Production, R&D
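
The two layouts above can be sketched in plain JavaScript. This is only an in-memory illustration, not how any particular column store lays out data on disk; the field names (id, firstName, lastName, dept) and the helper toColumns are assumptions made for the example.

```javascript
// Row-oriented layout: one object per employee record.
const rows = [
  { id: 1234, firstName: "Asim", lastName: "Das", dept: "HR" },
  { id: 3242, firstName: "Noel", lastName: "David", dept: "Marketing" },
  { id: 5678, firstName: "Raj", lastName: "Malhotra", dept: "Production" },
  { id: 4543, firstName: "Rohan", lastName: "Singh", dept: "R&D" },
];

// Column-oriented layout: one array per column (hypothetical helper).
function toColumns(records) {
  const columns = {};
  for (const record of records) {
    for (const [name, value] of Object.entries(record)) {
      if (!columns[name]) columns[name] = [];
      columns[name].push(value);
    }
  }
  return columns;
}

const columns = toColumns(rows);

// An aggregate over one column only needs that column's array.
const maxId = Math.max(...columns.id);

console.log(columns.id, maxId);
```

In the column layout, an aggregate such as the maximum reads a single array instead of visiting every full record.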

Storage Types for NoSQL Databases..2
• Document Oriented storage
  – Allows inserting, retrieving and manipulating semi-structured data
  – Documents themselves act as records (or rows)
  – Two records may have completely different sets of fields or columns
  – The records may or may not adhere to a specific schema
  – Most of the databases available under this category use XML, JSON or BSON data types
• Example: Different records contain different levels of Employee information, as follows

Record 1
{ "EmployeeID" : "SM1",
  "FirstName" : "Anuj",
  "LastName" : "Sharma",
  "Age" : 45,
  "Salary" : 10000000
}

Record 2
{ "EmployeeID" : "MM2",
  "FirstName" : "Anand",
  "Age" : 34,
  "Salary" : 5000000,
  "Address" : {
    "Line1" : "123, 4th Street",
    "City" : "Bangalore",
    "State" : "Karnataka"
  },
  "Projects" : [
    "nosql-migration",
    "top-secret-007"
  ]
}

Storage Types for NoSQL Databases..3
• Key value storage
  – Similar to document oriented storage, with the following differences
  – Unlike a document store, which can create a key when a new document is inserted, a key-value store requires the key to be specified
  – Unlike a document store, where the value can be indexed and queried, in a key-value store the value is opaque, so the key must be known to retrieve the value
• Advantages:
  – Key-value stores are optimized for querying against keys.
  – They work well as in-memory caches.
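
A minimal sketch of the key-value idea, using JavaScript's built-in Map as a stand-in for a real store. The key names and the session data are made up for illustration. The value is stored as an opaque string, so it can only be fetched by its exact key.

```javascript
// A Map stands in for the key-value store in this sketch.
const store = new Map();

// The caller must supply the key; the value is stored as an opaque string.
store.set("session:42", JSON.stringify({ user: "abc123", cart: ["card"] }));

// Retrieval works only through the exact key.
const raw = store.get("session:42");
const session = JSON.parse(raw);

// There is no way to ask the store for "the session whose user is abc123":
// the store does not look inside values. An unknown key yields nothing.
const missing = store.get("session:99");

console.log(session.user, missing);
```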

RDBMS vs NoSQL

RDBMS
• Structured and organized data
• Structured query language (SQL)
• Data and its relationships are stored in separate tables
• Follows ACID rules
• Data Manipulation Language, Data Definition Language
• Tight consistency

ACID: Atomic, Consistent, Isolated, Durable

NoSQL
• No declarative query language
• No predefined schema
• Key-value pair storage, column store, document store, graph databases
• Eventual consistency rather than ACID properties
• Unstructured and unpredictable data
• CAP theorem
• Prioritizes high performance, high availability and scalability

BASE: Basically Available, Soft State, Eventual Consistency

CAP: Consistency, Availability, Partition Tolerance

CAP
• CAP theorem (Brewer’s Theorem): Three basic requirements which exist in a special relation when designing applications for a distributed architecture.
  – Consistency: The data in the database remains consistent after the execution of an operation. For example, after an update operation all clients see the same data.
  – Availability: The system is always on (guaranteed service availability), with no downtime.
  – Partition Tolerance: The system continues to function even if the communication among the servers is unreliable, i.e. the servers may be partitioned into multiple groups that cannot communicate with one another.
• It is theoretically impossible for a distributed system to fulfill all 3 requirements C, A and P at the same time.
• According to CAP, a distributed system can guarantee at most 2 of the 3 requirements.
• Therefore current NoSQL databases follow different combinations of C, A and P from the CAP theorem.

BASE
• A BASE system gives up on Consistency
  – Basically Available indicates that the system does guarantee availability, in terms of the CAP theorem.
  – Soft state indicates that the state of the system may change over time, even without input. This is because of the eventual consistency model.
  – Eventual consistency indicates that the system will become consistent over time, given that the system doesn't receive input during that time.
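
The eventual consistency idea can be illustrated with a toy simulation. This is a sketch under simplifying assumptions (two in-memory replicas and a manually triggered replication step); the names Replica, write and replicate are invented for the example and do not correspond to any database's API.

```javascript
// Each replica holds its own copy of the data.
class Replica {
  constructor() { this.data = new Map(); }
  read(key) { return this.data.get(key); }
  apply(key, value) { this.data.set(key, value); }
}

const primary = new Replica();
const secondary = new Replica();
const pending = []; // writes not yet copied to the secondary

// A write is acknowledged once the primary has it.
function write(key, value) {
  primary.apply(key, value);
  pending.push([key, value]);
}

// Replication happens later, in the background in a real system.
function replicate() {
  while (pending.length > 0) {
    const [key, value] = pending.shift();
    secondary.apply(key, value);
  }
}

write("stock", 1);
const staleRead = secondary.read("stock"); // secondary has not caught up yet
replicate();
const freshRead = secondary.read("stock"); // replicas now agree

console.log(staleRead, freshRead);
```

Between the write and the replication step the two replicas disagree; once no new input arrives and replication completes, they converge.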

MongoDB
• Open Source database written in C++.
• Document Oriented database
  – Example format : FirstName="Arun", Address="St. Xavier's Road", Spouse=[{Name:"Kiran"}], Children=[{Name:"Rihit", Age:8}]
• Used to store data for very high performance applications with unforeseen growth in data
  – If load increases (more storage space, more processing power), it can be distributed to other nodes across computer networks (sharding)
• MongoDB supports the Map/Reduce framework for batch processing of data and aggregation operations
  – Map : A master node takes an input, splits it into smaller sections and sends them to the associated nodes. These nodes may in turn split their sections further and send them to other nodes. Each node processes its section of the problem and sends the result back to the master node.
  – Reduce : The master node aggregates those results to find the output.
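
The Map and Reduce steps described above can be sketched on a single machine. This is an illustration of the general pattern with a word count, not MongoDB's mapReduce command; the helpers chunk, mapCount and reduceCounts are hypothetical names, and each chunk stands in for the section a worker node would receive.

```javascript
// Split the input into sections of a fixed size (one per "node").
function chunk(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Map step: each node counts the words in its own section.
function mapCount(words) {
  const counts = {};
  for (const word of words) counts[word] = (counts[word] || 0) + 1;
  return counts;
}

// Reduce step: the master merges the partial counts.
function reduceCounts(partials) {
  const totals = {};
  for (const partial of partials) {
    for (const [word, n] of Object.entries(partial)) {
      totals[word] = (totals[word] || 0) + n;
    }
  }
  return totals;
}

const words = ["mongo", "redis", "mongo", "hbase", "redis", "mongo"];
const partials = chunk(words, 2).map(mapCount);
const totals = reduceCounts(partials);

console.log(totals);
```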

RDBMS vs. MongoDB

  RDBMS          MongoDB
  Record/Row     Document/Object
  Table          Collection
  Column         Key, field
  Value          Value
  Index          Index
  Table join     Embedded documents and linking

NoSQL operations in MongoDB
• Creating Table (Collections)

SQL Schema Statements
  CREATE TABLE users (
    id MEDIUMINT NOT NULL AUTO_INCREMENT,
    user_id Varchar(30),
    age Number,
    status char(1),
    PRIMARY KEY (id)
  )

MongoDB statements
  db.users.insert( {
    user_id: "abc123",
    age: 55,
    status: "A"
  } )

  Alternatively,
  db.createCollection("users")

In MongoDB, collections are implicitly created on the first insert() operation. The primary key _id is automatically added if the _id field is not specified.
Reference: See insert() and db.createCollection() for more information.

• Altering Table (Collections)

SQL Schema Statements
  Adding a column
    ALTER TABLE users
    ADD join_date DATETIME
  Dropping a column
    ALTER TABLE users
    DROP COLUMN join_date

MongoDB statements
  Adding a field
    db.users.update(
      { },
      { $set: { join_date: new Date() } },
      { multi: true }
    )
  Dropping a field
    db.users.update(
      { },
      { $unset: { join_date: "" } },
      { multi: true }
    )

Collections do not describe or enforce the structure of their documents; i.e. there is no structural alteration at the collection level for adding/removing fields.
Reference: See the Data Modeling Considerations for MongoDB Applications, update(), $set and $unset for more information.
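
What $set and $unset do to each document can be sketched on plain JavaScript objects. This is an in-memory approximation for illustration only, not MongoDB's implementation; applyUpdate is a hypothetical helper and the fixed date is an arbitrary sample value.

```javascript
// Return a copy of doc with the $set fields written and the $unset fields removed.
function applyUpdate(doc, update) {
  const result = { ...doc };
  for (const [field, value] of Object.entries(update.$set || {})) {
    result[field] = value;
  }
  for (const field of Object.keys(update.$unset || {})) {
    delete result[field];
  }
  return result;
}

const users = [
  { user_id: "abc123", age: 55, status: "A" },
  { user_id: "abc001", age: 35, status: "U" },
];

// Like { multi: true } with an empty query: every document is updated.
const withDate = users.map((u) =>
  applyUpdate(u, { $set: { join_date: new Date(0) } })
);
const withoutDate = withDate.map((u) =>
  applyUpdate(u, { $unset: { join_date: "" } })
);

console.log(withDate, withoutDate);
```

Only the documents change; nothing comparable to a table definition is altered.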

• INSERT and SELECT operations

SQL Statements
  Inserting data
    INSERT INTO users(user_id, age, status)
    VALUES ("abc001", 35, "U")
  SELECT operation
    SELECT *
    FROM users
    WHERE status = "A"

MongoDB statements
  Inserting data
    db.users.insert( {
      user_id: "abc001",
      age: 35,
      status: "U"
    } )
  Find operation
    db.users.find(
      { status: "A" }
    )

Reference: See insert() and find() for more information.
Use pretty() to display data in a formatted way:
  db.users.find().pretty();

• UPDATE and DELETE operations

SQL Statements
  UPDATE
    UPDATE users
    SET status = "C"
    WHERE age > 10

    UPDATE users
    SET age = age + 5
    WHERE status = "U"
  DELETE
    DELETE FROM users
    WHERE status = "D"

MongoDB statements
  UPDATE
    db.users.update(
      { age: { $gt: 10 } },
      { $set: { status: "C" } },
      { multi: true }
    )

    db.users.update(
      { status: "U" },
      { $inc: { age: 5 } },
      { multi: true }
    )
  REMOVE
    db.users.remove( { status: "D" } )

Reference: See update(), $gt, $inc, $set and remove() for more information.
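
The update and remove statements above can be approximated on an in-memory array. This is a sketch under assumptions: updateMany is a hypothetical helper that takes a predicate function instead of a MongoDB query document, and the sample users are invented.

```javascript
// Apply $set and $inc to every document accepted by the predicate
// (the equivalent of { multi: true }); return how many were modified.
function updateMany(collection, predicate, change) {
  let modified = 0;
  for (const doc of collection) {
    if (!predicate(doc)) continue;
    for (const [field, value] of Object.entries(change.$set || {})) {
      doc[field] = value;
    }
    for (const [field, delta] of Object.entries(change.$inc || {})) {
      doc[field] = (doc[field] || 0) + delta;
    }
    modified++;
  }
  return modified;
}

let users = [
  { user_id: "abc123", age: 55, status: "A" },
  { user_id: "abc001", age: 35, status: "U" },
  { user_id: "abc002", age: 8, status: "D" },
];

// UPDATE users SET age = age + 5 WHERE status = "U"
const bumped = updateMany(users, (u) => u.status === "U", { $inc: { age: 5 } });

// UPDATE users SET status = "C" WHERE age > 10
const closed = updateMany(users, (u) => u.age > 10, { $set: { status: "C" } });

// DELETE FROM users WHERE status = "D"
users = users.filter((u) => u.status !== "D");

console.log(bumped, closed, users);
```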

More examples in MongoDB
• Run the database (Windows):
  – Open Command prompt
  – Go to the bin folder of the MongoDB directory (e.g. mongodb-win32-x86_64-2008plus-v2.4-2013-1004/bin)
  – Run mongod.exe
• Connect to the database:
  – Open Command prompt
  – Go to the bin folder of the MongoDB directory (e.g. mongodb-win32-x86_64-2008plus-v2.4-2013-1004/bin)
  – Run mongo
  – A mongo “shell” will open
• Show databases: show dbs
• Select a database: use <database name>

More examples in MongoDB..2
• Switch to database testData (use testData;)
• Task 1: Insert data directly : The following operation inserts a row/document in the collection testData
  – db.testData.insert( { name : "OtherDB" } );
• Task 2: Insert data with JavaScript operations : The following operations insert 2 rows/documents in the collection testData
  j = { name : "mongo" }
  k = { x : 3 }
  db.testData.insert( j );
  db.testData.insert( k );
• Task 3: Check that the 3 records are inserted in the collection testData
  – db.testData.find();

Inserting multiple documents using a For loop
• Task : Use the following loop from the mongo shell
  – for (var i = 1; i <= 25; i++) db.testData.insert( { x : i } )
• Use find() to see the result. 25 records will be shown
  – db.testData.find()

Note: If the collection and database do not exist, MongoDB creates them implicitly before inserting documents.

Queries with conditions
• Task : In the above example, 25 rows were created. We want to show the rows where x is less than 15. We also want to limit the display to the first 5 rows.
  – db.testData.find( { "x": { $lt: 15 } } ).limit(5)
    • { "x": { $lt: 15 } } is the condition x < 15
    • limit(5) limits the result to 5 rows
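
The same condition and limit can be mimicked on a plain array, as a rough in-memory analogue of the shell query (filter plays the role of the $lt condition, slice the role of limit). It is an illustration only and does not talk to MongoDB.

```javascript
// Build the same 25 documents the insert loop creates.
const testData = [];
for (let i = 1; i <= 25; i++) {
  testData.push({ x: i });
}

// Condition x < 15, analogous to { "x": { $lt: 15 } }.
const matching = testData.filter((doc) => doc.x < 15);

// Keep only the first 5 matches, analogous to limit(5).
const firstFive = matching.slice(0, 5);

console.log(matching.length, firstFive);
```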

Inserting with explicit “id”
• Task : Insert a record in the collection named “inventory” with an explicit id, type and quantity.
  – db.inventory.insert( { _id: 10, type: "misc", item: "card", qty: 15 } );
    • _id: 10 is the explicit ID

Inserting with update() method
• Call the update() method with the upsert flag to create a new document if no document matches the update’s query criteria.
  – db.inventory.update(
      { type: "book", item : "journal" },
      { $set : { qty: 10 } },
      { upsert : true }
    );
  The above example creates a new document if no document in the inventory collection contains { type: "book", item : "journal" } and assigns a unique ID
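
The upsert behaviour described above can be sketched on an in-memory array. This is an approximation for illustration: upsert is a hypothetical helper, only equality queries are handled, and the counter nextId stands in for the ObjectId values MongoDB would generate.

```javascript
let nextId = 1; // hypothetical id generator, not how MongoDB assigns _id

// Update the first document matching the query, or insert a new one
// built from the query fields plus the $set fields.
function upsert(collection, query, setFields) {
  const match = collection.find((doc) =>
    Object.entries(query).every(([key, value]) => doc[key] === value)
  );
  if (match) {
    Object.assign(match, setFields);
    return "updated";
  }
  collection.push({ _id: nextId++, ...query, ...setFields });
  return "inserted";
}

const inventory = [{ _id: 10, type: "misc", item: "card", qty: 15 }];

// No matching document yet, so one is created.
const first = upsert(inventory, { type: "book", item: "journal" }, { qty: 10 });

// The same query now matches, so the existing document is changed.
const second = upsert(inventory, { type: "book", item: "journal" }, { qty: 12 });

console.log(first, second, inventory);
```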

Inserting with save() method
• To insert a document with the save() method, pass the method a document that does not contain the _id field, or a document that contains an _id value that does not exist in the collection.
  – db.inventory.save( { type: "book", item: "notebook", qty: 40 } )
  The above example creates a new document in the inventory collection, adds the _id field and assigns a unique ID

Conditional queries
• Task: Select all documents in the inventory collection where the value of the type field is either 'food' or 'snacks'
  – db.inventory.find( { type: { $in: [ 'food', 'snacks' ] } } );
• Task: “AND” condition - specifying an equality match on the field type (value 'food') AND a less than ($lt) comparison match on the field price
  – db.inventory.find( { type: 'food', price: { $lt: 9.95 } } );
• Task: “OR” condition - the query document selects all documents in the collection where the field qty has a value greater than ($gt) 100 OR the value of the price field is less than ($lt) 9.95
  – db.inventory.find(
      { $or: [ { qty: { $gt: 100 } }, { price: { $lt: 9.95 } } ] }
    );

Compound queries (using “AND” and “OR” both)
• Task: Select all documents in the collection where the value of the type field is 'food' and either the qty has a value greater than ($gt) 100 or the value of the price field is less than ($lt) 9.95:
  – db.inventory.find( { type: 'food', $or: [ { qty: { $gt: 100 } },
                                              { price: { $lt: 9.95 } } ]
    } );
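
How such a query document is evaluated can be sketched with a small matcher. This is a simplified model for illustration, not MongoDB's query engine: matches is a hypothetical function that supports only equality, $in, $lt, $gt and $or, and the sample inventory is invented.

```javascript
// Comparison operators supported by this sketch.
const operators = {
  $in: (value, list) => list.includes(value),
  $lt: (value, bound) => value < bound,
  $gt: (value, bound) => value > bound,
};

// Every top-level entry must hold (implicit AND); $or needs one branch to hold.
function matches(doc, query) {
  return Object.entries(query).every(([key, condition]) => {
    if (key === "$or") {
      return condition.some((branch) => matches(doc, branch));
    }
    const value = doc[key];
    if (condition !== null && typeof condition === "object") {
      return Object.entries(condition).every(([op, arg]) =>
        operators[op](value, arg)
      );
    }
    return value === condition;
  });
}

const inventory = [
  { item: "apple", type: "food", qty: 150, price: 12.5 },
  { item: "chips", type: "snacks", qty: 20, price: 4.0 },
  { item: "bread", type: "food", qty: 30, price: 3.5 },
  { item: "pen", type: "misc", qty: 500, price: 1.0 },
];

const foodOrSnacks = inventory.filter((doc) =>
  matches(doc, { type: { $in: ["food", "snacks"] } })
);

const compound = inventory.filter((doc) =>
  matches(doc, {
    type: "food",
    $or: [{ qty: { $gt: 100 } }, { price: { $lt: 9.95 } }],
  })
);

console.log(foodOrSnacks.map((d) => d.item), compound.map((d) => d.item));
```

The pen satisfies both $or branches but is excluded from the compound result because its type is not 'food'.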

Matching on “subdocuments”
• When the field holds an embedded document (i.e. subdocument), we can either specify the entire subdocument as the value of a field, or “reach into” the subdocument using “dot” notation, to specify values for individual fields in the subdocument.
• In the following example, the query matches all documents where the value of the field producer is a subdocument that contains only the field company with the value 'ABC123' and the field address with the value '123 Street', in the exact order:
  – db.inventory.find(
      { producer: { company: 'ABC123', address: '123 Street' } }
    );
• In the following example, the query uses the dot notation to match all documents where the value of the field producer is a subdocument that contains a field company with the value 'ABC123' and may contain other fields
  – db.inventory.find( { 'producer.company': 'ABC123' } );
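
The difference between the two styles can be shown on plain objects. This is an in-memory sketch, not MongoDB's matching code: exactMatch compares serialized forms so that field order matters, getPath is a hypothetical helper that follows a dotted path, and the sample documents are invented.

```javascript
// Whole-subdocument match: same fields, same values, same order.
const exactMatch = (value, expected) =>
  JSON.stringify(value) === JSON.stringify(expected);

// Dot notation: walk the path one key at a time.
function getPath(doc, path) {
  return path
    .split(".")
    .reduce((value, key) => (value == null ? undefined : value[key]), doc);
}

const inventory = [
  { item: "A", producer: { company: "ABC123", address: "123 Street" } },
  { item: "B", producer: { address: "123 Street", company: "ABC123" } },
  { item: "C", producer: { company: "ABC123", address: "123 Street", phone: "555" } },
];

const whole = inventory
  .filter((d) => exactMatch(d.producer, { company: "ABC123", address: "123 Street" }))
  .map((d) => d.item);

const dotted = inventory
  .filter((d) => getPath(d, "producer.company") === "ABC123")
  .map((d) => d.item);

console.log(whole, dotted);
```

Document B fails the whole-subdocument match because its fields are in a different order, and C fails because it has an extra field; both still match through dot notation.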

Matching on Arrays
• To specify an equality match on an array, use the query document { <field>: <value> } where <value> is the array to match. Equality matches on the array require that the array field match exactly the specified <value>, including the element order.
• Exact Match: In the following example, the query matches all documents where the value of the field tags is an array that holds exactly three elements, 'fruit', 'food', and 'citrus', in this order:
  – db.inventory.find( { tags: [ 'fruit', 'food', 'citrus' ] } );
• Matching Array Elements: In the following example, the query matches all documents where the value of the field tags is an array that contains 'fruit' as one of its elements:
  – db.inventory.find( { tags: 'fruit' } );
• In the following example, the query uses the dot notation to match all documents where the value of the tags field is an array whose first element equals 'fruit'.
  – db.inventory.find( { 'tags.0' : 'fruit' } )
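
The three kinds of array match can be compared side by side on sample data. This is an in-memory illustration with invented documents; sameArray is a hypothetical helper, and the filters mirror the meaning of the three queries rather than calling MongoDB.

```javascript
const inventory = [
  { item: "orange", tags: ["fruit", "food", "citrus"] },
  { item: "lemon", tags: ["citrus", "fruit", "food"] },
  { item: "rice", tags: ["food"] },
];

// Element-by-element comparison, so order and length both matter.
const sameArray = (a, b) =>
  a.length === b.length && a.every((value, i) => value === b[i]);

// { tags: [ 'fruit', 'food', 'citrus' ] }
const exact = inventory
  .filter((d) => sameArray(d.tags, ["fruit", "food", "citrus"]))
  .map((d) => d.item);

// { tags: 'fruit' }
const contains = inventory
  .filter((d) => d.tags.includes("fruit"))
  .map((d) => d.item);

// { 'tags.0': 'fruit' }
const firstIs = inventory
  .filter((d) => d.tags[0] === "fruit")
  .map((d) => d.item);

console.log(exact, contains, firstIs);
```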

Array of subdocuments
• Match a Field in the Subdocument Using the Array Index: The following example selects all documents where the memos field contains an array whose first element (i.e. index is 0) is a subdocument with the field by with the value 'shipping':
  – db.inventory.find( { 'memos.0.by': 'shipping' } )
• Match a Field without specifying Array Index: The following example selects all documents where the memos field contains an array that contains at least one subdocument with the field by with the value 'shipping':
  – db.inventory.find( { 'memos.by': 'shipping' } )
• Match multiple Fields: The following example uses dot notation to query for documents where the value of the memos field is an array in which at least one subdocument has the field memo equal to 'on time' and at least one subdocument (not necessarily the same one) has the field by equal to 'shipping'. To require both conditions on the same subdocument, use the $elemMatch operator.
  – db.inventory.find( { 'memos.memo': 'on time', 'memos.by': 'shipping' } )
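
The multiple-field case is easy to misread, so here is a sketch on invented sample data. Two dotted conditions on an array of subdocuments are checked independently, so different elements may satisfy them; requiring one element to satisfy both corresponds to MongoDB's $elemMatch operator. The filters below only model that behaviour in memory.

```javascript
const inventory = [
  { item: "A", memos: [{ memo: "on time", by: "shipping" }] },
  { item: "B", memos: [{ memo: "on time", by: "billing" },
                       { memo: "delayed", by: "shipping" }] },
  { item: "C", memos: [{ memo: "approved", by: "billing" }] },
];

// { 'memos.0.by': 'shipping' }: only the first element is inspected.
const firstBy = inventory
  .filter((d) => d.memos[0] && d.memos[0].by === "shipping")
  .map((d) => d.item);

// { 'memos.by': 'shipping' }: any element may match.
const anyBy = inventory
  .filter((d) => d.memos.some((m) => m.by === "shipping"))
  .map((d) => d.item);

// { 'memos.memo': 'on time', 'memos.by': 'shipping' }:
// each condition may be met by a different element.
const separate = inventory
  .filter((d) =>
    d.memos.some((m) => m.memo === "on time") &&
    d.memos.some((m) => m.by === "shipping"))
  .map((d) => d.item);

// Same-element requirement, as $elemMatch would express it.
const sameElement = inventory
  .filter((d) => d.memos.some((m) => m.memo === "on time" && m.by === "shipping"))
  .map((d) => d.item);

console.log(firstBy, anyBy, separate, sameElement);
```

Document B shows the difference: it satisfies the two dotted conditions through two different memos, but no single memo satisfies both.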

Using findOne()
db.collection.findOne(<criteria>, <projection>)
• The above returns one document that satisfies the specified query criteria. If multiple documents satisfy the query, this method returns the first document according to the natural order, which reflects the order of documents on the disk.
• The <projection> parameter takes a document in the following form
  – { field1: <boolean>, field2: <boolean> ... }
  – Boolean can be 1 (true, to include) or 0 (false, to exclude)
• Example: Create a collection named bios with multiple fields. Return the “name”, “contribs” and “_id” fields (the _id field is included by default):
  db.bios.findOne(
    { },
    { name: 1, contribs: 1 }
  )
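
The effect of an inclusion projection can be sketched on a plain object. This is a simplified model that handles only inclusion flags plus the _id exception; project is a hypothetical helper and the sample document is invented, not taken from a real bios collection.

```javascript
// Keep _id unless it is explicitly excluded, plus every field flagged with 1.
function project(doc, projection) {
  const result = {};
  if ("_id" in doc && projection._id !== 0) {
    result._id = doc._id;
  }
  for (const [field, flag] of Object.entries(projection)) {
    if (flag === 1 && field in doc) {
      result[field] = doc[field];
    }
  }
  return result;
}

// Hypothetical sample document.
const bio = {
  _id: 1,
  name: { first: "Asim", last: "Das" },
  title: "Engineer",
  contribs: ["reporting", "search"],
};

const withId = project(bio, { name: 1, contribs: 1 });
const withoutId = project(bio, { name: 1, contribs: 1, _id: 0 });

console.log(withId, withoutId);
```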

Exercise I
• Go to database “test”
• Insert data in a collection named userdetails with the following attributes
  – {"user_id" : "ABCDBWN", "password" : "ABCDBWN", "date_of_join" : "15/10/2010", "education" : "B.C.A.", "profession" : "DEVELOPER", "interest" : "MUSIC", "community_name" : ["MODERN MUSIC", "CLASSICAL MUSIC", "WESTERN MUSIC"], "community_moder_id" : ["MR. BBB", "MR. JJJ", "MR MMM"], "community_members" : [500, 200, 1500], "friends_id" : ["MMM123", "NNN123", "OOO123"], "ban_friends_id" : ["BAN123", "BAN456", "BAN789"]}
• View the inserted data using find() and pretty()
• Insert another set of data in the same collection with the following
  – {"user_id" : "testuser", "password" : "testpassword", "date_of_join" : "16/10/2010", "education" : "M.C.A.", "profession" : "CONSULTANT", "interest" : "MUSIC", "community_name" : ["MODERN MUSIC", "CLASSICAL MUSIC", "WESTERN MUSIC"], "community_moder_id" : ["MR. BBB", "MR. JJJ", "MR MMM"], "community_members" : [500, 200, 1500], "friends_id" : ["MMM123", "NNN123", "OOO123"], "ban_friends_id" : ["BAN123", "BAN456", "BAN789"]}

Exercise I..contd
• Use update() to change password to “Newpd” and date_of_join to "12/12/2010" for user_id "ABCDBWN"
• Fetch only the "user_id" for all documents from the collection 'userdetails' which hold the educational qualification "M.C.A."
• Fetch the "user_id", "password" and "date_of_join" for all documents from the collection 'userdetails' which hold the educational qualification "M.C.A."
• Remove one record from the collection userdetails where user_id is "testuser"
• Remove the entire collection userdetails using drop()
