2. Challenges / Opportunities
Challenges:
• Variably typed data
• Complex objects
• High transaction rates
• Large data size
• High availability
Opportunities:
• Agility
• Cost reduction
• Simplification
3. AGILE DEVELOPMENT
• Iterative and continuous
• New and emerging apps
VOLUME AND TYPE OF DATA
• Trillions of records
• 10’s of millions of queries per second
• Semi-structured and unstructured data
NEW ARCHITECTURES
• Systems scaling horizontally, not vertically
• Commodity servers
• Cloud computing
4. PROJECT START → +30 DAYS → +90 DAYS → +6 MONTHS → +1 YEAR → LAUNCH
(Timeline: denormalize data model, stop using joins, custom caching layer, custom sharding)
DEVELOPER PRODUCTIVITY DECREASES
• Needed to add new software layers of ORM, caching, sharding and message queues
• Polymorphic, semi-structured and unstructured data not well supported
COST OF DATABASE INCREASES
• Increased database licensing cost
• Vertical, not horizontal, scaling
• High cost of SAN
8. Entity Attribute Value (EAV)
Entity  Attribute     Value
1       Type          Book
1       Title         Inside Intel
1       Author        Andy Grove
2       Type          Laptop
2       Manufacturer  Dell
2       Model         Inspiron
2       RAM           8gb
3       Type          Cereal
• Difficult to query
• Difficult to index
• Expensive to query
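A quick sketch of why EAV is awkward: every entity has to be reassembled from attribute rows before it can be used. A minimal, illustrative pivot (the triples mirror the table above; the function name is ours):

```javascript
// Reassembling EAV rows into per-entity objects.
var rows = [
  [1, "Type", "Book"], [1, "Title", "Inside Intel"], [1, "Author", "Andy Grove"],
  [2, "Type", "Laptop"], [2, "Manufacturer", "Dell"], [2, "Model", "Inspiron"],
  [2, "RAM", "8gb"],
  [3, "Type", "Cereal"]
];

function pivot(rows) {
  var entities = {};
  rows.forEach(function (r) {
    var id = r[0], attr = r[1], value = r[2];
    if (!entities[id]) entities[id] = {};
    entities[id][attr] = value;  // one column per attribute, per entity
  });
  return entities;
}

console.log(pivot(rows)[2].Manufacturer); // → Dell
```

Even this simple lookup requires touching every row for the entity; with documents, the whole object is stored (and indexed) as one unit.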
9. Type    Title          Author      Manufacturer  Model     RAM  Screen Width
Book       Inside Intel   Andy Grove
Laptop                                Dell          Inspiron  8gb  12”
TV                                    Panasonic     Viera          52”
MP3        Margin Walker  Fugazi      Dischord
• Constant schema changes
• Space inefficient
• Overloaded fields
10. One table per type:
Type    Manufacturer  Model     RAM  Screen Width
Laptop  Dell          Inspiron  8gb  12”

Manufacturer  Model  Screen Width
Panasonic     Viera  52”

Title          Author  Manufacturer
Margin Walker  Fugazi  Dischord

• Querying is difficult
• Hard to add more types
12. Challenges
• Constant query load from clients
• Far in excess of single-server capacity
• Can never take the system down
(Diagram: many concurrent queries hitting a single database)
13. Challenges
• Adding more storage over time
• Aging out data that’s no longer needed
• Minimizing resource overhead of “cold” data
(Diagram: recent data on fast storage, old data on archival storage; add capacity over time)
16. Variably typed data
• Rigid schemas
Complex objects
• Normalization can be hard
• Dependent on joins
High transaction rate
• Vertical scaling
• Poor data locality
• Difficult to maintain consistency & HA
High availability
• HA a bolt-on to many RDBMSs
Agile development
• Schema changes
• Monolithic data model
18. > var post = { author: "Jared",
                   date: new Date(),
                   text: "NoSQL Now 2012",
                   tags: ["NoSQL", "MongoDB"] }
> db.posts.save(post)
19. > db.posts.find()
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author : "Jared",
  date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
  text : "NoSQL Now 2012",
  tags : [ "NoSQL", "MongoDB" ] }
Notes:
- _id is unique, but can be anything you’d like
20. Create an index on any field in a document
// 1 means ascending, -1 means descending
> db.posts.ensureIndex({author: 1})
> db.posts.find({author: 'Jared'})
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author : "Jared",
  ... }
40. Value          Meaning
<n:integer>        Replicate to n members of the replica set
"majority"         Replicate to a majority of replica set members
<m:modeName>       Use the custom error mode named m
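For the integer form of w, "majority" simply resolves to more than half the members. A small sketch of that arithmetic (illustrative only; the server computes this internally):

```javascript
// What w:"majority" means for a replica set of n voting members:
// strictly more than half must acknowledge the write.
function majority(n) {
  return Math.floor(n / 2) + 1;
}

console.log(majority(5)); // → 3
console.log(majority(4)); // → 3
```

In the 2012-era shell this was requested per write via getLastError, e.g. `db.runCommand({getLastError: 1, w: "majority"})`.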
41. { _id: "someSet",
  members: [
    { _id: 0, host: "A", tags: { dc: "ny" }},
    { _id: 1, host: "B", tags: { dc: "ny" }},
    { _id: 2, host: "C", tags: { dc: "sf" }},
    { _id: 3, host: "D", tags: { dc: "sf" }},
    { _id: 4, host: "E", tags: { dc: "cloud" }}
  ],
  settings: {
    getLastErrorModes: {
      veryImportant: { dc: 3 },   // these are the modes you can
      sortOfImportant: { dc: 2 }  // use in write concern
    }
  }
}
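A sketch of how a tagged mode like veryImportant: { dc: 3 } is evaluated: the write must be acknowledged by members spanning three distinct dc tag values. The member-to-tag mapping mirrors the config above; the checking function is ours, not MongoDB's API:

```javascript
// Host → dc tag, taken from the replica set config above.
var tags = { A: "ny", B: "ny", C: "sf", D: "sf", E: "cloud" };

// mode is e.g. { dc: 3 }: require acknowledgement from members
// covering at least 3 distinct values of the "dc" tag.
function satisfies(ackedHosts, mode) {
  var seen = {};
  ackedHosts.forEach(function (h) { seen[tags[h]] = true; });
  return Object.keys(seen).length >= mode.dc;
}

console.log(satisfies(["A", "C", "E"], { dc: 3 })); // → true (ny, sf, cloud)
console.log(satisfies(["A", "B", "C"], { dc: 3 })); // → false (only ny, sf)
```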
42. • Priority is between 0 and 1000
• The highest-priority member that is up to date wins
  – Up to date == within 10 seconds of primary
• If a higher-priority member catches up, it will force an election and win
Primary (priority = 3)   Secondary (priority = 2)   Secondary (priority = 1)
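The election rule above can be sketched as: filter out members more than 10 seconds behind, then pick the highest priority. Illustrative only; real elections also involve votes and network state:

```javascript
var members = [
  { host: "A", priority: 3, lagSecs: 0 },
  { host: "B", priority: 2, lagSecs: 4 },
  { host: "C", priority: 1, lagSecs: 30 } // too far behind to be eligible
];

function electionWinner(members) {
  // "up to date" == within 10 seconds of the primary
  var eligible = members.filter(function (m) { return m.lagSecs <= 10; });
  eligible.sort(function (a, b) { return b.priority - a.priority; });
  return eligible.length ? eligible[0].host : null;
}

console.log(electionWinner(members)); // → A
```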
43. • Lags behind the primary by a configurable time delay
• Automatically hidden from clients
• Protects against operator error
  – Accidentally deleting a database
  – Application corrupting data
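A delayed secondary is just a member with a delay and hidden flag in the replica set config. A hypothetical member entry (host name is illustrative), using the 2012-era slaveDelay option:

```javascript
// Hypothetical delayed, hidden secondary in a replica set config.
var delayedMember = {
  _id: 3,
  host: "backup.example.com:27017", // illustrative host
  priority: 0,                      // can never become primary
  hidden: true,                     // invisible to normal client reads
  slaveDelay: 3600                  // stays one hour behind the primary
};
```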
44. • Vote in elections
• Don’t store a copy of the data
• Used as a tie breaker
47. Single data center
Zone 1: Primary
Zone 2: Secondary
Zone 3: Secondary (hidden = true, used for backups)
48. Active Data Center
Zone 1: Primary (priority = 1)
Zone 2: Secondary (priority = 1)
Standby Data Center
Secondary (priority = 0)
49. West Coast DC
Zone 1: Primary (priority = 2)
Zone 2: Secondary (priority = 2)
Central DC
Arbiter
East Coast DC
Zone 1: Secondary (priority = 1)
Zone 2: Secondary (priority = 1)
60. Min Key          Max Key          Shard
-∞                   dan@10gen.com    1
dan@10gen.com        jsr@10gen.com    1
jsr@10gen.com        scott@10gen.com  1
scott@10gen.com      +∞               1
• Stored in the config servers
• Cached in MongoS
• Used to route requests and keep the cluster balanced
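The routing this table enables can be sketched as a range lookup (MIN/MAX stand in for -∞/+∞; the helper is ours, not MongoDB code):

```javascript
// Chunk table from the slide; ranges are min-inclusive, max-exclusive.
var MIN = "", MAX = "\uffff";
var chunks = [
  { min: MIN, max: "dan@10gen.com", shard: 1 },
  { min: "dan@10gen.com", max: "jsr@10gen.com", shard: 1 },
  { min: "jsr@10gen.com", max: "scott@10gen.com", shard: 1 },
  { min: "scott@10gen.com", max: MAX, shard: 1 }
];

function routeToShard(key) {
  for (var i = 0; i < chunks.length; i++) {
    if (key >= chunks[i].min && key < chunks[i].max) return chunks[i].shard;
  }
  return null;
}

console.log(routeToShard("jsr@10gen.com")); // → 1
```

Here every chunk still lives on shard 1; as data grows, the balancer moves chunks so the same lookup spreads requests across shards.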
66. By shard key           Routed                   db.users.find({email: "jsr@10gen.com"})
Sorted by shard key        Routed in order          db.users.find().sort({email: -1})
Find by non-shard key      Scatter gather           db.users.find({state: "CA"})
Sorted by non-shard key    Distributed merge sort   db.users.find().sort({state: 1})
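The distinction in the first column comes down to whether the query includes the shard key. A toy version of that decision (simplified: real mongos also targets ranges and handles sorts as shown above):

```javascript
// If the query filters on the shard key, mongos can target one shard;
// otherwise it must ask every shard and merge the results.
function routingStrategy(query, shardKey) {
  return query.hasOwnProperty(shardKey) ? "routed" : "scatter-gather";
}

console.log(routingStrategy({ email: "jsr@10gen.com" }, "email")); // → routed
console.log(routingStrategy({ state: "CA" }, "email"));            // → scatter-gather
```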
72. Content Management • Operational Intelligence • Meta Data Management • User Data Management • High Volume Data Feeds
73. Machine Generated Data
• More machines, more sensors, more data
• Variably structured
Stock Market Data
• High frequency trading
Social Media Firehose
• Multiple sources of data
• Each changes their format constantly
74. High-volume data feed architecture
• Flexible document model can adapt to changes in sensor format
• Asynchronous writes
• Write to memory with periodic disk flush
• Scale writes over multiple shards
(Diagram: multiple data sources feeding a sharded cluster)
75. Ad Targeting
• Large volume of state about users
• Very strict latency requirements
Real-time dashboards
• Expose report data to millions of customers
• Report on large volumes of data
• Reports that update in real time
Social Media Monitoring
• What are people talking about?
76. Real-time dashboard architecture
• Low-latency reads
• Parallelize queries across replicas and shards
• In-database aggregation
• Flexible schema adapts to changing input data
• Can use the same cluster to collect, store, and report on data
(Diagram: data flows through an API into dashboards)
77. Intuit relies on a MongoDB-powered real-time analytics tool for small businesses to derive interesting and actionable patterns from their customers’ website traffic
Problem
• Intuit hosts more than 500,000 websites
• Wanted to collect and analyze data to recommend conversion and lead-generation improvements to customers
• With 10 years’ worth of user data, it took several days to process the information using a relational database
Impact
• In one week Intuit was able to become proficient in MongoDB development
• Developed application features more quickly for MongoDB than for relational databases
• MongoDB was 2.5 times faster than MySQL
“We did a prototype for one week, and within one week we had made big progress. Very big progress. It was so amazing that we decided, ‘Let’s go with this.’” – Nirmala Ranganathan, Intuit
78. Rich profiles collecting multiple complex actions
• Scale out to support high throughput of activities tracked
• Dynamic schemas make it easy to track vendor-specific attributes
• Indexing and querying to support matching, frequency capping
Sequence: 1 See Ad → 2 See Ad → 3 Click → 4 Convert
{ cookie_id: "1234512413243",
  advertiser: {
    apple: {
      actions: [
        { impression: 'ad1', time: 123 },
        { impression: 'ad2', time: 232 },
        { click: 'ad2', time: 235 },
        { add_to_cart: 'laptop',
          sku: 'asdf23f',
          time: 254 },
        { purchase: 'laptop', time: 354 }
      ]
    }
  }
}
79. Data Archiving
• Meta-data about artifacts
• Content in the library
Information discovery
• Have data sources that you don’t have access to
• Stores meta-data on those stores and figures out which ones have the content
Biometrics
• Retina scans
• Fingerprints
80. Indexing and rich query API for easy searching and sorting:
db.archives.find({ "country": "Egypt" });
Flexible data model for similar, but different objects:
{ type: "Artefact",
  medium: "Ceramic",
  country: "Egypt",
  year: "3000 BC"
}
{ ISBN: "00e8da9b",
  type: "Book",
  country: "Egypt",
  title: "Ancient Egypt"
}
81. Shutterfly uses MongoDB to safeguard more than six billion images for millions of customers in the form of photos and videos, and turn everyday pictures into keepsakes
Problem
• Managing 20TB of data (six billion images for millions of customers), partitioning by function
• Home-grown key-value store on top of their Oracle database offered sub-par performance
• Codebase for this hybrid store became hard to manage
• High licensing, HW costs
Why MongoDB
• JSON-based data structure
• Provided Shutterfly with an agile, high performance, scalable solution at a low cost
• Works seamlessly with Shutterfly’s services-based architecture
Impact
• 500% cost reduction and 900% performance improvement compared to previous Oracle implementation
• Accelerated time-to-market for nearly a dozen projects on MongoDB
• Improved performance by reducing average latency for inserts from 400ms to 2ms
The “really killer reason” for using MongoDB is its rich JSON-based data structure, which offers Shutterfly an agile approach to develop software. With MongoDB, the Shutterfly team can quickly develop and deploy new applications, especially Web 2.0 and social features. – Kenny Gorman, Director of Data Services
82. News Site
• Comments and user-generated content
• Personalization of content, layout
Multi-Device rendering
• Generate layout on the fly for each device that connects
• No need to cache static pages
Sharing
• Store large objects
• Simple modeling of metadata
83. Geospatial indexing for location-based searches
GridFS for large object storage
Flexible data model for similar, but different objects
Horizontal scalability for large data sets
{ camera: "Nikon d4",
  location: [ -122.418333, 37.775 ]
}
{ camera: "Canon 5d mkII",
  people: [ "Jim", "Carol" ],
  taken_on: ISODate("2012-03-07T18:32:35.002Z")
}
{ origin: "facebook.com/photos/xwdf23fsdf",
  license: "Creative Commons CC0",
  size: {
    dimensions: [ 124, 52 ],
    units: "pixels"
  }
}
84. Wordnik uses MongoDB as the foundation for its “live” dictionary that stores its entire text corpus – 3.5T of data in 20 billion records
Problem
• Analyze a staggering amount of data for a system built on a continuous stream of high-quality text pulled from online sources
• Adding too much data too quickly resulted in outages; tables locked for tens of seconds during inserts
• Initially launched entirely on MySQL but quickly hit performance roadblocks
Why MongoDB
• Migrated 5 billion records in a single day with zero downtime
• MongoDB powers every website request: 20m API calls per day
• Ability to eliminate the memcached layer, creating a simplified system that required fewer resources and was less prone to error
Impact
• Reduced code by 75% compared to MySQL
• Fetch time cut from 400ms to 60ms
• Sustained insert speed of 8k words per second, with frequent bursts of up to 50k per second
• Significant cost savings and 15% reduction in servers
“Life with MongoDB has been good for Wordnik. Our code is faster, more flexible and dramatically smaller. Since we don’t spend time worrying about the database, we can spend more time writing code for our application.” – Tony Tam, Vice President of Engineering and Technical Co-founder
85. Social Graphs
• Scale out to large graphs
• Easy to search and process
Identity Management
• Authentication, Authorization and Accounting
86. Social Graph
• Native support for arrays makes it easy to store connections inside the user profile
• Documents enable disk locality of all profile data for a user
• Sharding partitions user profiles across available servers
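As a concrete illustration of arrays-as-edges (field names are our assumption, not from the deck):

```javascript
// A user profile storing its graph connections in embedded arrays,
// so the whole neighborhood loads with one document fetch.
var user = {
  _id: "jared",
  name: "Jared",
  follows: ["meghan", "scott", "dan"], // outgoing edges
  followers: ["scott"]                  // incoming edges
};

// In the shell, an index on the array supports reverse lookups:
//   db.users.ensureIndex({ follows: 1 })
//   db.users.find({ follows: "scott" })
console.log(user.follows.length); // → 3
```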