4. • 1970s: Relational Databases Invented
– Storage is expensive
– Data is normalized
– Data storage is abstracted away from the app
• 1980s: RDBMS commercialized
– Client/Server model
– SQL becomes the standard
• 1990s: Things begin to change
– Client/Server => 3-tier architecture
– Rise of the Internet and the Web
6. • 2000s: Web 2.0
– Rise of "Social Media"
– Acceptance of E-Commerce
– Constant decrease of HW prices
– Massive increase of collected data
• Result
– Constant need to scale dramatically
– How can we scale?
11. [Diagram: strengths and weaknesses of the two classic RDBMS workloads]

OLTP / operational
+ complex transactions
+ tabular data
+ ad hoc queries
- O<->R mapping hard
- speed/scale problems
- not super agile
(a lot more issues here; typical workarounds: caching, app-layer partitioning, flat files, map/reduce – a sketch of partitioning follows below)

BI / reporting
+ ad hoc queries
+ SQL standard protocol between clients and servers
+ scales horizontally better than operational dbs
- some scale limits at massive scale
- schemas are rigid
- no real time; great at bulk nightly data loads
(fewer issues here)
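To make the last workaround concrete: a minimal sketch of app-layer partitioning, in which the application itself, not the database, hashes a key to pick one of several independent database servers. Everything here (the SHARDS list, shard_for) is an invented illustration, not something from the deck.

```python
# Sketch of app-layer partitioning (one of the workarounds above):
# the application decides which of N independent database servers
# owns a given key. All names here are invented.
import hashlib

SHARDS = ["db-server-0", "db-server-1", "db-server-2"]  # stand-ins for real connections

def shard_for(user_id: str) -> str:
    """Hash the key and map it onto one of the servers deterministically."""
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

print(shard_for("alice"))  # every lookup for "alice" lands on the same server
```

The pain point is rebalancing: adding a fourth server changes almost every key's mapping, which is exactly the operational burden the auto-sharding discussed later in the deck sets out to remove.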
13. • Agile Development Methodology
– Shorter development cycles
– Constant evolution of requirements
– Flexibility at design time
• Relational Schema
– Hard to evolve: long, painful migrations (sketched below)
– Must stay in sync with the application
– Few developers interact with it directly
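A minimal sketch of the "long, painful migrations" point, using SQLite purely as a stand-in RDBMS; the users table and twitter_handle column are invented names.

```python
# Sketch of the rigid-schema pain: adding a field to a relational table
# is a migration that must ship in lockstep with the application code.
# SQLite stands in for the RDBMS; table/column names are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

# New requirement arrives: store a twitter handle. Schema changes first...
conn.execute("ALTER TABLE users ADD COLUMN twitter_handle TEXT")
# ...and only then can the code that uses the field be deployed.
conn.execute("INSERT INTO users (name, twitter_handle) VALUES (?, ?)",
             ("alice", "@alice"))
```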
18. [Diagram: nonrelational ("nosql") stores added to the picture as the scalable + agile quadrant, alongside BI / reporting and OLTP / operational]
+ speed and scale
+ fits OO well
- ad hoc query limited
- not very transactional
- no sql / no standard
19. Non-relational, next-generation operational data stores and databases
A collection of very different products
• Different data models (not relational)
• Most do not use SQL for queries
• No predefined schema (example below)
• Some allow flexible data structures
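For contrast with the migration sketch above, a minimal sketch of "no predefined schema" as seen from the pymongo driver, assuming a local mongod on the default port; the demo database and field names are invented.

```python
# Sketch of "no predefined schema": documents of different shapes coexist
# in one collection with no migration step. Assumes a local mongod on the
# default port and the pymongo driver; database/field names are invented.
from pymongo import MongoClient

users = MongoClient()["demo"]["users"]
users.insert_one({"name": "alice"})
users.insert_one({"name": "bob",
                  "twitter_handle": "@bob",      # extra fields just appear;
                  "tags": ["early-adopter"]})    # no ALTER TABLE required
```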
27. • Designed and developed by founders of Doubleclick, ShopWiki, GILT groupe, etc.
• Coding started fall 2007
• First production site March 2008 – businessinsider.com
• Open Source – AGPL, written in C++
• Version 0.8 – first official release, February 2009
• Version 1.0 – August 2009
• Version 2.0 – September 2011
41. • Scale linearly
• High Availability
• Increase capacity with no downtime
• Transparent to the application (sharding sketch below)
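The bullets above presumably describe MongoDB's auto-sharding. A hedged sketch of turning it on through the pymongo driver, assuming a mongos router at mongos-host:27017; the demo.users namespace and the user_id shard key are invented for illustration.

```python
# Hedged sketch of enabling MongoDB auto-sharding via the pymongo driver.
# Assumes a mongos router at mongos-host:27017; the demo.users namespace
# and the user_id shard key are invented.
from pymongo import MongoClient

admin = MongoClient("mongodb://mongos-host:27017").admin
admin.command("enableSharding", "demo")            # shard the database
admin.command("shardCollection", "demo.users",
              key={"user_id": 1})                  # pick a shard key
# The application keeps talking to mongos as if it were one server; chunks
# of demo.users migrate between shards as capacity is added.
```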
42. Replica Sets
• High Availability/Automatic Failover
• Data Redundancy
• Disaster Recovery
• Transparent to the application (connection sketch below)
• Perform maintenance with no downtime
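A minimal sketch of why a replica set is "transparent to the application": the driver is handed a seed list and the set name, discovers the current primary, and re-routes writes after a failover without code changes. Host names and the set name rs0 are invented.

```python
# Sketch of replica-set transparency: the driver gets a seed list and the
# set name, discovers the current primary, and re-routes writes after a
# failover with no application changes. Hosts and "rs0" are invented.
from pymongo import MongoClient

client = MongoClient(
    "mongodb://node1:27017,node2:27017,node3:27017/?replicaSet=rs0"
)
client["demo"]["events"].insert_one({"msg": "hello"})  # always goes to the primary
```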
62. • Content Management
• Operational Intelligence
• E-Commerce
• User Data Management
• High Volume Data Feeds
63. Wordnik uses MongoDB as the foundation for its "live" dictionary that stores its entire text corpus – 3.5T of data in 20 billion records

Problem
• Analyze a staggering amount of data for a system built on a continuous stream of high-quality text pulled from online sources
• Adding too much data too quickly resulted in outages; tables locked for tens of seconds during inserts
• Initially launched entirely on MySQL but quickly hit performance roadblocks

Why MongoDB
• Migrated 5 billion records in a single day with zero downtime
• MongoDB powers every website request: 20m API calls per day
• Ability to eliminate the memcached layer, creating a simplified system that required fewer resources and was less prone to error

Impact
• Reduced code by 75% compared to MySQL
• Fetch time cut from 400ms to 60ms
• Sustained insert speed of 8k words per second, with frequent bursts of up to 50k per second
• Significant cost savings and 15% reduction in servers

Life with MongoDB has been good for Wordnik. Our code is faster, more flexible and dramatically smaller. Since we don't spend time worrying about the database, we can spend more time writing code for our application. – Tony Tam, Vice President of Engineering and Technical Co-founder
64. Intuit relies on a MongoDB-powered real-time analytics tool for small businesses to derive interesting and actionable patterns from their customers' website traffic

Problem
• Intuit hosts more than 500,000 websites
• Wanted to collect and analyze data to recommend conversion and lead generation improvements to customers
• With 10 years' worth of user data, it took several days to process the information using a relational database

Why MongoDB
• MongoDB's querying and Map/Reduce functionality could serve as a simpler, higher-performance solution than a complex Hadoop implementation
• The strength of the MongoDB community

Impact
• In one week Intuit was able to become proficient in MongoDB development
• Developed application features more quickly for MongoDB than for relational databases
• MongoDB was 2.5 times faster than MySQL

We did a prototype for one week, and within one week we had made big progress. Very big progress. It was so amazing that we decided, "Let's go with this." – Nirmala Ranganathan, Intuit
65. Shutterfly uses MongoDB to safeguard more than six billion images for millions of customers in the form of photos and videos, and turn everyday pictures into keepsakes

Problem
• Managing 20TB of data (six billion images for millions of customers), partitioned by function
• Home-grown key-value store on top of their Oracle database offered sub-par performance
• Codebase for this hybrid store became hard to manage
• High licensing and HW costs

Why MongoDB
• JSON-based data structure
• Provided Shutterfly with an agile, high-performance, scalable solution at a low cost
• Works seamlessly with Shutterfly's services-based architecture

Impact
• 500% cost reduction and 900% performance improvement compared to previous Oracle implementation
• Accelerated time-to-market for nearly a dozen projects on MongoDB
• Improved performance by reducing average latency for inserts from 400ms to 2ms

The "really killer reason" for using MongoDB is its rich JSON-based data structure, which offers Shutterfly an agile approach to develop software. With MongoDB, the Shutterfly team can quickly develop and deploy new applications, especially Web 2.0 and social features. – Kenny Gorman, Director of Data Services