Intro to NoSQL and MongoDB

NoSQL: Introduction

Asya Kamsky

• 1970's Relational Databases Invented
– Storage is expensive
– Data is normalized
– Data storage is abstracted away from app


• 1980's RDBMS commercialized
– Client/Server model
– SQL becomes the standard

• 1990's Things begin to change
– Client/Server => 3-tier architecture
– Rise of the Internet and the Web

• 2000's Web 2.0
– Rise of "Social Media"
– Acceptance of E-Commerce
– Constant decrease of HW prices
– Massive increase of collected data

• Result
– Constant need to scale dramatically
– How can we scale?

Computers in 1985
• x286 5-35 MHz
• 56 kbps
• 64 KB RAM
• 10 MB HDD

Computers in 1995
• Pentium 100 MHz
• 20-50 Mbps
• 16 MB RAM
• 200 MB HDD

Phone in 2012
• Dual core 1.2 GHz
• WiFi 802.11n - 300+ Mbps
• 1 GB RAM
• 48 GB SSD

Computers in 2012
• Dual core 1.8 GHz
• WiFi 802.11n - 300+ Mbps
• 180+ Gbps
• 8 GB RAM
• 512 GB SSD

• Agile Development Methodology
• Shorter development cycles
• Constant evolution of requirements
• Flexibility at design time


• Relational Schema
• Hard to evolve
• Long, painful migrations
• Must stay in sync with application
• Few developers interact directly


BI / reporting (fewer issues here)
+ ad hoc queries
+ SQL standard protocol between clients and servers
+ scales horizontally better than operational DBs
- some scale limits at massive scale
- schemas are rigid
- no real time; great at bulk nightly data loads
=> flat files, map/reduce

OLTP / operational (a lot more issues here)
+ complex transactions
+ tabular data
+ ad hoc queries
- O<->R mapping hard
- speed/scale problems
- not super agile
=> caching, app layer partitioning

• Horizontal scaling
• Run anywhere
• Flexible data model
• Faster development
• Low upfront cost
• Low cost of ownership


What is NoSQL?

Relational vs Non-Relational

scalable nonrelational ("nosql"), alongside BI / reporting and OLTP / operational
+ speed and scale
- ad hoc query limited
- not very transactional
- no sql/no standard
+ fits OO well
+ agile

Non-relational, next-generation operational data stores and databases

A collection of very different products
• Different data models (Not relational)
• Most are not using SQL for queries
• No predefined schema
• Some allow flexible data structures


Relational:
• ACID
• Two-phase commit
• Joins

Non-relational (Key-Value, Document, XML, Graph, Column):
• BASE
• Some ACID properties
• Atomic transactions on document level
• No Joins

• Fits your use case

• Reliability

• Maintainability

• Ease of Use

• Scalability

• Cost

MongoDB: Introduction


• Designed and developed by founders of DoubleClick, ShopWiki, Gilt Groupe, etc.
• GOAL: create a high performance, fully consistent, horizontally scalable, general purpose data store.

• Coding started fall 2007
• Open Source – AGPL, written in C++
• First production site March 2008 - businessinsider.com
• Currently version 2.2 – August 2012

MongoDB Design Goals

• Document-oriented Storage
• Based on JSON Documents
• Data serialized to BSON
• Flexible Schema
• Scalable Architecture
• Replication
• High availability
• Auto-sharding
• Extensive use of memory mapped files
• Durable
• Strong Consistency
• Key Features Include:
– Full featured indexes
– Ad-hoc Query Language
– Interactive shell
– Aggregation queries
– Map/Reduce

• Rich data models
• Seamlessly map to native programming language types
• Flexible for dynamic data
• Better data locality


Blogging website:
• Register users
• Users post blog entries
• Comment on others' entries
Considering:
• Tagging, Voting, ???


{
_id : ObjectId("4e2e3f92268cdda473b628f6"),
title : "My Very Important Thoughts",
published: ISODate("2011-07-26T19:49:00.147Z"),
author : { name:"Asya Kamsky", username:"asya" },
text : "It was a long and stormy night ..."
}


{
tags : ["business", "news", "north america"]
}

> db.posts.ensureIndex( { tags : 1 } )

> db.posts.find( { tags : "news" } )
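
An index on an array field such as `tags` is a multikey index: it holds one entry per array element, which is why a query for a single tag can use it. A minimal sketch of that idea in plain JavaScript follows. It is not MongoDB's actual B-tree implementation; `buildMultikeyIndex` and `findByTag` are hypothetical names, and the second post is made-up illustration data.

```javascript
// Plain-JavaScript sketch of a multikey index (illustration only, not MongoDB internals).
// An index on an array field stores one entry per array element,
// so a query for a single tag can find the post without scanning every document.

const posts = [
  { _id: 1, tags: ["business", "news", "north america"] }, // tags from the slide
  { _id: 2, tags: ["sports"] },                            // hypothetical extra post
];

// Build a map from each tag value to the _ids of the posts that contain it.
function buildMultikeyIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    for (const value of doc[field] || []) {
      if (!index.has(value)) index.set(value, []);
      index.get(value).push(doc._id);
    }
  }
  return index;
}

// Look up matching _ids through the index instead of scanning all documents.
function findByTag(index, tag) {
  return index.get(tag) || [];
}

const tagsIndex = buildMultikeyIndex(posts, "tags");
console.log(findByTag(tagsIndex, "news")); // prints [ 1 ]
```

The post with three tags contributes three index entries, one per tag, which is the behaviour the multikey name refers to.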

> db.posts.find( { tags : "news" } ).explain()
{ "cursor" : "BtreeCursor tags_1",
  "isMultiKey" : true,
  "n" : 1,
  "nscannedObjects" : 1,
  "scanAndOrder" : false,
  "indexOnly" : false,
  ... }

{
tags : ["business", "news", "north america"],
votes : 3,
voters : ["dmerr", "sj", "jane" ]
}

> db.posts.update( { },  // query for documents to update
                   { }   // update to perform
)


{
votes : 3,
voters : ["dmerr", "sj", "jane" ]
}

> db.posts.update( {_id:..., voters:{$ne:"asya"} },
{ $push: {voters:"asya"},
$inc : {votes: 1}
} )
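
The `$ne` guard in the query is what keeps a user from voting twice: the document only matches, and is only modified, when "asya" is not yet in `voters`, and both modifiers are applied to that single document together. A rough plain-JavaScript simulation of that behaviour is sketched below, assuming the update either matches and applies both modifiers or does nothing. `voteOnce` is a hypothetical helper, not a MongoDB API.

```javascript
// Simulates db.posts.update({_id: ..., voters: {$ne: user}},
//                           {$push: {voters: user}, $inc: {votes: 1}})
// on one in-memory document. Illustration only; voteOnce is a hypothetical helper.

function voteOnce(post, user) {
  // Query part: { voters: { $ne: user } } matches only if user is not in the array.
  if (post.voters.includes(user)) {
    return false; // no match, so nothing is modified
  }
  // Update part: both modifiers are applied to the same document together.
  post.voters.push(user); // $push
  post.votes += 1;        // $inc
  return true;
}

const post = { votes: 3, voters: ["dmerr", "sj", "jane"] };

console.log(voteOnce(post, "asya")); // true: first vote is counted
console.log(voteOnce(post, "asya")); // false: repeat vote is ignored
console.log(post.votes, post.voters.length); // 4 4
```

Because the check and the change travel in one update, the application does not need a separate read before it writes.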

{
votes : 4,
voters : ["dmerr", "sj", "jane", "asya" ],
comments : [
{ by : "tim157", text : "great story", ... },
{ by : "gora", text : "i don’t think so", ... },
{ by : "dmerr", text : "also check out..." }
]
}

{
votes : 4,
voters : ["dmerr", "sj", "jane","asya" ],
comments : [
{ by : "tim157", text : "great story" },
{ by : "gora", text : "i don’t think so" },
]
}

> db.posts.ensureIndex( { "comments.by" : 1 } )
> db.posts.find( { "comments.by" : "gora" } )
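
The dotted path `"comments.by"` reaches into each embedded document of the `comments` array, so the query matches a post when any one of its comments was written by "gora". A minimal plain-JavaScript sketch of that matching rule follows; the helper name `matchesCommentAuthor` and the `_id` values are hypothetical, and the second post is made up for contrast.

```javascript
// Sketch of how { "comments.by" : "gora" } matches: a post qualifies
// if at least one embedded comment has by === "gora". Illustration only.

const posts = [
  {
    _id: "post-1", // hypothetical id
    comments: [
      { by: "tim157", text: "great story" },
      { by: "gora", text: "i don't think so" },
    ],
  },
  // Hypothetical second post, added for contrast.
  { _id: "post-2", comments: [{ by: "dmerr", text: "also check out..." }] },
];

// True when any embedded comment was written by the given author.
function matchesCommentAuthor(post, author) {
  return (post.comments || []).some((comment) => comment.by === author);
}

const byGora = posts.filter((post) => matchesCommentAuthor(post, "gora"));
console.log(byGora.map((post) => post._id)); // prints [ 'post-1' ]
```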
Disk seeks and data locality
Seek = 5+ ms, Read = really really fast
[Diagram: on-disk layout of a Post with its Author and Comments]

• High Availability
• Data Redundancy
• Increase capacity with no downtime
• Transparent to the application


• A cluster of N servers
• Any (one) node can be primary
• All writes to primary
• Reads go to primary (default), optionally to a secondary
• Consensus election of primary
• Automatic failover
• Automatic recovery

[Diagram: Node 1, Node 2 and Node 3, one of them elected Primary]
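
As a rough illustration of "consensus election of primary", the sketch below assumes a simplified rule: a node can become primary only if a strict majority of all N configured members vote for it. MongoDB's real election protocol involves more conditions than this, so treat `canBecomePrimary` as a hypothetical helper rather than the server's algorithm.

```javascript
// Simplified majority rule for electing a primary (an assumption for illustration;
// the real replica set election protocol has more conditions).

function canBecomePrimary(votesReceived, totalMembers) {
  // A strict majority of ALL configured members is required,
  // which prevents two network partitions from each electing a primary.
  return votesReceived > Math.floor(totalMembers / 2);
}

// Three-node set as on the slide: Node 1, Node 2, Node 3.
console.log(canBecomePrimary(2, 3)); // true: 2 of 3 is a majority
console.log(canBecomePrimary(1, 3)); // false: an isolated node cannot elect itself
console.log(canBecomePrimary(2, 4)); // false: 2 of 4 is only half
```

Under this rule a three-node set keeps a primary after losing one node, which is the automatic failover case listed above.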

Replica Sets
• High Availability/Automatic Failover
• Data Redundancy
• Disaster Recovery
• Perform maintenance with no down time

Asynchronous Replication

Automatic Election

• Increase capacity with no downtime
• Range based partitioning
• Partitioning and balancing are automatic


Key Range    Key Range    Key Range    Key Range
min..25      26..50       51..75       76..max
Primary      Primary      Primary      Primary
Secondary    Secondary    Secondary    Secondary

[Diagram: Application instances connect through MongoS routers, backed by three Config servers, to shards covering the key ranges min..25, 26..50, 51..75 and 76..max]
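
To make range-based partitioning concrete, here is a small plain-JavaScript sketch of the routing decision a MongoS makes: given a shard key value, find the key range that contains it. The integer key, the `chunks` table and the shard names are assumptions that mirror the min..25 / 26..50 / 51..75 / 76..max ranges in the diagram, not real cluster metadata.

```javascript
// Range-based routing sketch (illustration only, not mongos internals).
// Assumes integer shard keys. Each entry covers keys from `min` to `max` inclusive.
const chunks = [
  { min: -Infinity, max: 25, shard: "shard1" }, // min..25
  { min: 26, max: 50, shard: "shard2" },        // 26..50
  { min: 51, max: 75, shard: "shard3" },        // 51..75
  { min: 76, max: Infinity, shard: "shard4" },  // 76..max
];

// Return the name of the shard whose key range contains the given key.
function routeToShard(key) {
  const chunk = chunks.find((c) => key >= c.min && key <= c.max);
  if (!chunk) throw new Error("no chunk covers key " + key);
  return chunk.shard;
}

console.log(routeToShard(10)); // shard1
console.log(routeToShard(50)); // shard2
console.log(routeToShard(99)); // shard4
```

The application only ever supplies the key; which shard holds the range is resolved by the router, which is what keeps sharding transparent to the application.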

• Few configuration options
• Does the right thing out of the box
• Easy to deploy and manage


Relational vs MongoDB:
• Better data locality
• In-Memory Caching
• Auto-Sharding: read scaling, write scaling

"We just can't get any faster than the way MongoDB handles our data."
Tony Tam, CTO, Wordnik

• Supported Platforms:
– Linux, Windows, Solaris, Mac OS X
– Packages available for all popular distributions
• No external/third party software dependencies
• 10gen maintains drivers for over a dozen languages

• Content Management
• Operational Intelligence
• E-Commerce
• User Data Management
• High Volume Data Feeds

Open source, high performance database
