This presentation was given at the LDS Tech SORT Conference 2011 in Salt Lake City. The slides are quite comprehensive, covering many MongoDB topics. Rather than a traditional presentation, this was run as more of a Q&A session. Topics covered include an introduction to MongoDB, use cases, schema design, high availability (replication), and horizontal scaling (sharding).
22. Signs something needed
• DoubleClick - 400,000 ads/second
• people writing their own stores
• caching is de rigueur
• complex ORM frameworks
• computer architecture trends
• cloud computing
23. Requirements
• need a good degree of functionality to handle a large set of use cases
• sometimes need strong consistency / atomicity
• secondary indexes
• ad hoc queries
24. Trim unneeded features
• leave out a few things so we can scale
• no choice but to leave out relational
• distributed transactions are hard to scale
25. Needed a scalable data model
• some options:
  • key/value
  • columnar / tabular
  • document oriented (JSON inspired)
• opportunity to innovate -> agility
26. MongoDB philosophy
• No longer one-size-fits-all, but not 12 tools either.
• Non-relational (no joins) makes scaling horizontally practical
• Document data models are good
• Keep functionality when we can (key/value stores are great, but we need more)
• Database technology should run anywhere, being available both for running on your own servers or VMs, and also as a cloud pay-for-what-you-use service.
• Ideally open source...
27. MongoDB
• JSON Documents
• Querying/Indexing/Updating similar to relational databases
• Traditional Consistency
• Auto-Sharding
28. Under the hood
• Written in C++
• Available on most platforms
• Data serialized to BSON
• Extensive use of memory-mapped files
30. MongoDB is:
Application
Document Oriented
{ author: "steve",
  date: new Date(),
  text: "About MongoDB...",
  tags: ["tech", "database"]}
High Performance
Horizontally Scalable
31. This has led some to say
"MongoDB has the best features of key/value stores, document databases and relational databases in one."
John Nunemaker
33. Photo Meta-
Problem:
• Business needed more flexibility than Oracle could deliver
Solution:
• Used MongoDB instead of Oracle
Results:
• Developed application in one sprint cycle
• 500% cost reduction compared to Oracle
• 900% performance improvement compared to Oracle
34. Customer Analytics
Problem:
• Deal with massive data volume across all customer sites
Solution:
• Used MongoDB to replace Google Analytics / Omniture options
Results:
• Less than one week to build prototype and prove business case
• Rapid deployment of new features
35. Online
Problem:
• MySQL could not scale to handle their 5B+ documents
Solution:
• Switched from MySQL to MongoDB
Results:
• Massive simplification of code base
• Eliminated need for external caching system
• 20x performance improvement over MySQL
36. E-commerce
Problem:
• Multi-vertical E-commerce impossible to model (efficiently) in RDBMS
Solution:
• Switched from MySQL to MongoDB
Results:
• Massive simplification of code base
• Rapidly build, halving time to market (and cost)
• Eliminated need for external caching system
• 50x+ improvement over MySQL
49. Secondary Indexes
Create an index on any field in a document
// 1 means ascending, -1 means descending
> db.posts.ensureIndex({author: 1})
> db.posts.find({author: 'roger'})
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author : "roger",
  ... }
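To illustrate why an index on `author` helps, here is a minimal in-memory sketch in plain JavaScript. It is not how MongoDB implements indexes (it ignores ordering entirely), and the helper names `buildIndex` and `findByIndex` and the sample documents are made up for illustration.

```javascript
// Toy collection; these documents are invented for the example.
const posts = [
  { _id: 1, author: "roger", text: "first" },
  { _id: 2, author: "steve", text: "second" },
  { _id: 3, author: "roger", text: "third" },
];

// Build a map from each value of `field` to the positions of matching documents.
function buildIndex(docs, field) {
  const index = new Map();
  docs.forEach((doc, pos) => {
    const key = doc[field];
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(pos);
  });
  return index;
}

// Answer an equality query through the index instead of scanning every document.
function findByIndex(docs, index, value) {
  return (index.get(value) || []).map((pos) => docs[pos]);
}

const authorIndex = buildIndex(posts, "author");
const rogerPosts = findByIndex(posts, authorIndex, "roger");
console.log(rogerPosts.map((d) => d._id)); // [ 1, 3 ]
```

The lookup touches only the entries for "roger", which is the point of a secondary index: the query cost depends on the number of matches rather than the size of the collection.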
58. MongoDB Replication
• MongoDB replication like MySQL replication (kinda)
• Asynchronous master/slave
• Variations
  • Master / slave
  • Replica Sets
59. Replica Set features
• A cluster of N servers
• Any (one) node can be primary
• Consensus election of primary
• Automatic failover
• Automatic recovery
• All writes to primary
• Reads can be to primary (default) or a secondary
61. How MongoDB Replication works
Member 1    Member 2 (PRIMARY)    Member 3
Election establishes the PRIMARY
Data replication from PRIMARY to SECONDARY
62. How MongoDB Replication works
Member 1    Member 2 (DOWN)    Member 3
negotiate new master
PRIMARY may fail
Automatic election of new PRIMARY if majority exists
63. How MongoDB Replication works
Member 1 (PRIMARY)    Member 2 (DOWN)    Member 3
New PRIMARY elected
Replica set re-established
67. Replica Set Options
• {arbiterOnly: true}
  • Can vote in an election
  • Does not hold any data
• {hidden: true}
  • Not reported in isMaster()
  • Will not be sent slaveOk() reads
• {priority: n}
• {tags: }
68. Using Replicas for Reads
• slaveOk()
  - driver will send read requests to Secondaries
  - driver will always send writes to Primary
• Java examples
  - DB.slaveOk()
  - Collection.slaveOk()
  - find(q).addOption(Bytes.QUERYOPTION_SLAVEOK);
69. Safe Writes
• db.runCommand({getLastError: 1, w : 1})
  - ensure write is synchronous
  - command returns after primary has written to memory
• w=n or w='majority'
  • n is the number of nodes data must be replicated to
  • driver will always send writes to Primary
• w='myTag' [MongoDB 2.0]
  • Each member is "tagged" e.g. "US_EAST", "EMEA", "US_WEST"
  • Ensure that the write is executed in each tagged "region"
70. Safe Writes
• fsync:true
  • Ensures changed disk blocks are flushed to disk
• j:true
  • Ensures changes are flushed to the journal
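The meaning of `w` in the Safe Writes slides can be sketched as a small check in plain JavaScript. This is only a model of the rule, not driver or server code: `requiredAcks` and `isAcknowledged` are hypothetical names, "majority" is simplified to a majority of all members, and tag-based values such as w='myTag' are left out.

```javascript
// Resolve a write concern `w` to the number of members that must hold the write.
// Assumption: "majority" means more than half of all members in the set.
function requiredAcks(w, setSize) {
  if (w === "majority") return Math.floor(setSize / 2) + 1;
  if (Number.isInteger(w) && w >= 1) return w;
  throw new Error("unsupported w value in this sketch: " + w);
}

// A write counts as safe once at least that many members have acknowledged it.
function isAcknowledged(w, setSize, ackedMembers) {
  return ackedMembers >= requiredAcks(w, setSize);
}

console.log(requiredAcks("majority", 5)); // 3
console.log(isAcknowledged(1, 3, 1)); // true: the primary alone satisfies w=1
console.log(isAcknowledged(3, 3, 2)); // false: one member has not replicated yet
```

A larger `w` trades write latency for durability, since getLastError does not return until that many members have the write.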
71. When are elections triggered?
• When a given member sees that the Primary is not reachable
• The member is not an Arbiter
• Has a priority greater than other eligible members
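The conditions above can be sketched as a candidate-selection function in plain JavaScript. This is a simplified model, not MongoDB's election protocol: it skips the majority vote, `optime` is a made-up counter standing in for how up to date a member is, and the member list and the name `pickCandidate` are hypothetical.

```javascript
// Hypothetical members; higher `optime` means more recent data.
const members = [
  { name: "A", priority: 2, arbiterOnly: false, reachable: true, optime: 10 },
  { name: "B", priority: 1, arbiterOnly: false, reachable: true, optime: 10 },
  { name: "C", priority: 0, arbiterOnly: false, reachable: true, optime: 10 },
  { name: "ARB", priority: 1, arbiterOnly: true, reachable: true, optime: 0 },
];

// Pick the member that should stand for election, or null if none qualifies.
function pickCandidate(list) {
  // Arbiters, priority-0 members, and unreachable members can never be primary.
  const eligible = list.filter(
    (m) => m.reachable && !m.arbiterOnly && m.priority > 0
  );
  if (eligible.length === 0) return null;
  // Keep only the most up-to-date members, then prefer the highest priority.
  const newest = Math.max(...eligible.map((m) => m.optime));
  const freshest = eligible.filter((m) => m.optime === newest);
  freshest.sort((x, y) => y.priority - x.priority);
  return freshest[0].name;
}

// Helper: return a copy of the list with the named members marked unreachable.
function withDown(list, names) {
  return list.map((m) => ({ ...m, reachable: !names.includes(m.name) && m.reachable }));
}

console.log(pickCandidate(members)); // "A"
console.log(pickCandidate(withDown(members, ["A"]))); // "B"
console.log(pickCandidate(withDown(members, ["A", "B"]))); // null
```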
72. Typical Deployments
Use? | Set size | Data Protection | High Availability | Notes
X    | One      | No              | No                | Must use --journal to protect against crashes
     | Two      | Yes             | No                | On loss of one member, surviving member is read only
     | Three    | Yes             | Yes - 1 failure   | On loss of one member, surviving two members can elect a new primary
X    | Four     | Yes             | Yes - 1 failure*  | * On loss of two members, surviving two members are read only
     | Five     | Yes             | Yes - 2 failures  | On loss of two members, surviving three members can elect a new primary
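The High Availability column follows from simple majority arithmetic, sketched below in plain JavaScript. The function names are hypothetical, and the sketch assumes every member has a vote.

```javascript
// Votes needed to elect a primary in a set of n voting members.
function majority(n) {
  return Math.floor(n / 2) + 1;
}

// Members that can be lost while the survivors still form a majority.
function tolerableFailures(n) {
  return n - majority(n);
}

for (const n of [1, 2, 3, 4, 5]) {
  console.log(`${n} members: majority ${majority(n)}, tolerates ${tolerableFailures(n)} failure(s)`);
}
// 1 members: majority 1, tolerates 0 failure(s)
// 2 members: majority 2, tolerates 0 failure(s)
// 3 members: majority 2, tolerates 1 failure(s)
// 4 members: majority 3, tolerates 1 failure(s)
// 5 members: majority 3, tolerates 2 failure(s)
```

This is why a four-member set buys no more availability than a three-member set: the fourth member raises the majority threshold by one.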
73. Replication features
• Reads from Primary are always consistent
• Reads from Secondaries are eventually consistent
• Automatic failover if a Primary fails
• Automatic recovery when a node joins the set
• Control of where writes occur
75. What is Sharding
• Ad-hoc partitioning
• Consistent hashing
  • Amazon Dynamo
• Range based partitioning
  • Google BigTable
  • Yahoo! PNUTS
  • MongoDB
76. MongoDB Sharding
• Automatic partitioning and management
• Range based
• Convert to sharded system with no downtime
• Fully consistent
78. How MongoDB Sharding works
> db.runCommand( { addshard : "shard1" } );
> db.runCommand(
    { shardCollection : "mydb.blogs",
      key : { age : 1} } )
[-∞, +∞]
• Range keys from -∞ to +∞
• Ranges are stored as "chunks"
79. How MongoDB Sharding works
> db.posts.save( {age:40} )
[-∞, +∞]
[-∞, 40] [41, +∞]
• Data is inserted
• Ranges are split into more "chunks"
80. How MongoDB Sharding works
> db.posts.save( {age:40} )
> db.posts.save( {age:50} )
[-∞, +∞]
[-∞, 40] [41, +∞]
[41, 50] [51, +∞]
• More data is inserted
• Ranges are split into more "chunks"
81. How MongoDB Sharding works
> db.posts.save( {age:40} )
> db.posts.save( {age:50} )
> db.posts.save( {age:60} )
[-∞, +∞]
[-∞, 40] [41, +∞]
[41, 50] [51, +∞]
[51, 60] [61, +∞]
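The splitting shown on these slides can be mimicked with a short plain-JavaScript sketch. It is an illustration only: it assumes integer shard keys and splits right after every inserted key, as the slides draw it, whereas a real cluster decides when to split based on chunk size. `findChunk` and `splitAfter` are hypothetical names.

```javascript
// Start with one chunk covering the whole key range, as on slide 78.
let chunks = [{ min: -Infinity, max: Infinity }];

// Index of the chunk whose inclusive range contains `key`.
function findChunk(list, key) {
  return list.findIndex((c) => key >= c.min && key <= c.max);
}

// Split the chunk containing `key` into [min, key] and [key + 1, max].
function splitAfter(list, key) {
  const i = findChunk(list, key);
  const c = list[i];
  if (key >= c.max) return list; // key is already the upper bound, nothing to split
  return [
    ...list.slice(0, i),
    { min: c.min, max: key },
    { min: key + 1, max: c.max },
    ...list.slice(i + 1),
  ];
}

for (const age of [40, 50, 60]) {
  chunks = splitAfter(chunks, age);
}
console.log(chunks.map((c) => `${c.min}..${c.max}`).join(" | "));
// -Infinity..40 | 41..50 | 51..60 | 61..Infinity
```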
84. How MongoDB Sharding works
> db.runCommand( { addshard : "shard2" } );
[-∞, 40]
[41, 50]
[51, 60]
[61, +∞]
85. How MongoDB Sharding works
> db.runCommand( { addshard : "shard2" } );
shard1
[-∞, 40]
[41, 50]
[51, 60]
[61, +∞]
86. How MongoDB Sharding works
> db.runCommand( { addshard : "shard2" } );
shard1 shard2
[-∞, 40]
[41, 50]
[51, 60]
[61, +∞]
87. How MongoDB Sharding works
> db.runCommand( { addshard : "shard2" } );
> db.runCommand( { addshard : "shard3" } );
shard1 shard2 shard3
[-∞, 40]
[41, 50]
[51, 60]
[61, +∞]
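What happens to the four chunks once shard2 and shard3 join can be sketched as a toy balancer in plain JavaScript. This is a simplification under stated assumptions: it moves one chunk at a time from the fullest shard to the emptiest until the counts differ by at most one, which is not the real balancer's policy or thresholds. The function name `balance` and the chunk labels are made up.

```javascript
// All four chunks start on shard1; the new shards start empty.
const assignment = {
  shard1: ["-inf..40", "41..50", "51..60", "61..+inf"],
  shard2: [],
  shard3: [],
};

// Move chunks from the most loaded shard to the least loaded one
// until no two shards differ by more than one chunk.
function balance(byShard) {
  const shards = Object.keys(byShard);
  for (;;) {
    shards.sort((a, b) => byShard[b].length - byShard[a].length);
    const most = shards[0];
    const least = shards[shards.length - 1];
    if (byShard[most].length - byShard[least].length <= 1) return byShard;
    byShard[least].push(byShard[most].pop());
  }
}

balance(assignment);
console.log(Object.values(assignment).map((c) => c.length)); // [ 2, 1, 1 ]
```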
89. Sharding Features
• Shard data with no downtime
• Automatic balancing as data is written
• Commands routed (switched) to correct node
  • Inserts - must have the Shard Key
  • Updates - must have the Shard Key
  • Queries
    • With Shard Key - routed to nodes
    • Without Shard Key - scatter gather
  • Indexed Queries
    • With Shard Key - routed in order
    • Without Shard Key - distributed sort merge
92. Config Servers
• 3 of them
• changes are made with 2 phase commit
• if any are down, meta data goes read only
• system is online as long as 1/3 is up
94. Shards
• Can be master, master/slave or replica sets
• Replica sets give sharding + full auto-failover
• Regular mongod processes
96. Mongos
• Sharding Router
• Acts just like a mongod to clients
• Can have 1 or as many as you want
• Can run on appserver so no extra network traffic
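As a rough picture of what a sharding router does, the plain-JavaScript sketch below decides which shards a query must visit. It is not mongos code: it handles only equality matches on the shard key, and the chunk-to-shard layout and the name `targetShards` are assumptions made for the example.

```javascript
// Hypothetical chunk map: inclusive key ranges and the shard holding each one.
const chunkMap = [
  { min: -Infinity, max: 40, shard: "shard1" },
  { min: 41, max: 50, shard: "shard1" },
  { min: 51, max: 60, shard: "shard2" },
  { min: 61, max: Infinity, shard: "shard3" },
];

// Return the shards that have to receive the query.
function targetShards(query, shardKey, map) {
  if (!(shardKey in query)) {
    // No shard key in the query: every shard must be asked (scatter gather).
    return [...new Set(map.map((c) => c.shard))];
  }
  // Shard key present: route to the single shard owning the matching chunk.
  const value = query[shardKey];
  const chunk = map.find((c) => value >= c.min && value <= c.max);
  return [chunk.shard];
}

console.log(targetShards({ age: 45 }, "age", chunkMap)); // [ 'shard1' ]
console.log(targetShards({ author: "roger" }, "age", chunkMap)); // [ 'shard1', 'shard2', 'shard3' ]
```

Queries that include the shard key touch one shard; everything else fans out to all of them, which is why shard key choice matters for read performance.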
99. Priorities
• Prior to 2.0.0
  • {priority:0} // Never can be elected Primary
  • {priority:1} // Can be elected Primary
• New in 2.0.0
  • Priority, floating point number between 0 and 1000
• During an election
  • Most up to date
  • Highest priority
• Allows weighting of members during failover
100. Priorities - example
Members: A p:2, B p:2, C p:1, D p:1, E p:0
• Assuming all members are up to date
• Members A or B will be chosen first
  • Highest priority
• Members C or D will be chosen next if
  • A and B are unavailable
  • A and B are not up to date
• Member E is never chosen
  • priority:0 means it cannot be elected
101. Tagging
• New in 2.0.0
• Control over where data is written to
• Each member can have one or more tags e.g.
  • tags: {dc: "ny"}
  • tags: {dc: "ny",
           ip: "192.168",
           rack: "row3rk7"}
• Replica set defines rules for where data resides
• Rules can change without changing application code
103. Use Cases - Multi Data Center
• write to three data centers
  • allDCs : {"dc" : 3}
  • > db.runCommand({getLastError : 1, w : "allDCs"})
• write to two data centers and three availability zones
  • allDCsPlus : {"dc" : 2, "az": 3}
  • > db.runCommand({getLastError : 1, w : "allDCsPlus"})
US-EAST-1  tag : {dc: "JFK", az: "r1"}
US-EAST-2  tag : {dc: "JFK", az: "r2"}
US-WEST-1  tag : {dc: "SFO", az: "r3"}
US-WEST-2  tag : {dc: "SFO", az: "r4"}
LONDON-1   tag : {dc: "LHR", az: "r5"}
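The two rules above can be checked against the five tagged members with a small plain-JavaScript sketch. It assumes, as I understand tag-based write concerns, that each number in a rule is the count of distinct values of that tag among the members holding the write. The function name `satisfies` is hypothetical, and this is a model of the rule, not server code.

```javascript
// Member tags as drawn on the slide.
const memberTags = {
  "US-EAST-1": { dc: "JFK", az: "r1" },
  "US-EAST-2": { dc: "JFK", az: "r2" },
  "US-WEST-1": { dc: "SFO", az: "r3" },
  "US-WEST-2": { dc: "SFO", az: "r4" },
  "LONDON-1": { dc: "LHR", az: "r5" },
};

const allDCs = { dc: 3 };
const allDCsPlus = { dc: 2, az: 3 };

// True when the members that hold the write cover enough distinct values
// of every tag named in the rule.
function satisfies(rule, ackedMembers) {
  return Object.entries(rule).every(([tag, needed]) => {
    const distinct = new Set(ackedMembers.map((name) => memberTags[name][tag]));
    return distinct.size >= needed;
  });
}

const eastAndWest = ["US-EAST-1", "US-EAST-2", "US-WEST-1"];
const oneEach = ["US-EAST-1", "US-WEST-1", "LONDON-1"];

console.log(satisfies(allDCs, eastAndWest)); // false: only JFK and SFO are covered
console.log(satisfies(allDCs, oneEach)); // true: JFK, SFO and LHR
console.log(satisfies(allDCsPlus, eastAndWest)); // true: 2 data centers, 3 zones
```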
104. Use Cases - Data Protection & High Availability
• A and B will take priority during a failover
• C or D will become primary if A and B become unavailable
• E cannot be primary
• D and E cannot be read from with a slaveOk()
• D can be used for backups, feeding a Solr index, etc.
• E provides a safeguard against operational or application error
A  priority: 2
B  priority: 2
C  priority: 1
D  priority: 1, hidden: true
E  priority: 0, hidden: true, slaveDelay: 3600
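Written out as a replica set configuration document, the layout above might look like the plain-JavaScript sketch below. The set name and host names are made up; only the priority, hidden, and slaveDelay values come from the slide. One caveat: later MongoDB documentation states that hidden members must have priority 0, so D's combination of hidden with priority 1 should be checked against your server version before use.

```javascript
// Hypothetical set name and hosts; option values are taken from the slide.
const config = {
  _id: "rs0",
  members: [
    { _id: 0, host: "a.example.net:27017", priority: 2 }, // A
    { _id: 1, host: "b.example.net:27017", priority: 2 }, // B
    { _id: 2, host: "c.example.net:27017", priority: 1 }, // C
    { _id: 3, host: "d.example.net:27017", priority: 1, hidden: true }, // D
    // E stays one hour (3600 seconds) behind and can never be primary.
    { _id: 4, host: "e.example.net:27017", priority: 0, hidden: true, slaveDelay: 3600 },
  ],
};

// Members that may become primary: priority above 0.
const electable = config.members.filter((m) => m.priority > 0).map((m) => m.host);

// Members that slaveOk() reads may be sent to: not hidden.
const readable = config.members.filter((m) => !m.hidden).map((m) => m.host);

console.log(electable.length); // 4
console.log(readable.length); // 3
```

In the mongo shell a document of this shape would be passed to rs.initiate() or rs.reconfig(); here it is only inspected locally.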
114. http://spf13.com
http://github.com/spf13
@spf13
Questions?
download at mongodb.org
PS: We're hiring!! Contact us at jobs@10gen.com
Editor's Notes
Remember, in 1995 there were around 10,000 websites. Mosaic, Lynx, Mozilla (pre-Netscape) and IE 2.0 were the only web browsers. Apache (Dec ’95), Java (’96), PHP (June ’95), and .NET didn’t exist yet. Linux just barely (1.0 in ’94).
By reducing the transactional semantics the db provides, one can still solve an interesting set of problems where performance is very important, and horizontal scaling then becomes easier.
sharding isn’t new
write: add new paragraph. read: read through book.
don't go into indexes yet