The document provides an overview of MongoDB, detailing its advantages such as agility from a schemaless data model, scalability for handling large datasets, and cost-effectiveness. It discusses MongoDB's architecture, including features like replication, sharding, and its compatibility with various programming languages. Additionally, it highlights use cases demonstrating significant performance and cost improvements over traditional relational databases.
Signs something needed
• DoubleClick - 400,000 ads/second
• people writing their own stores
• caching is de rigueur
• complex ORM frameworks
• computer architecture trends
• cloud computing
23.
Requirements
• need a good degree of functionality to handle a large set of use cases
• sometimes need strong consistency / atomicity
• secondary indexes
• ad hoc queries
24.
Trim unneeded features
• leave out a few things so we can scale
• no choice but to leave out relational
• distributed transactions are hard to scale
25.
Needed a scalable data model
• some options:
• key/value
• columnar / tabular
• document oriented (JSON inspired)
• opportunity to innovate -> agility
26.
MongoDB philosophy
• No longer one-size-fits-all, but not 12 tools either.
• Non-relational (no joins) makes scaling horizontally practical
• Document data models are good
• Keep functionality when we can (key/value stores are great, but we need more)
• Database technology should run anywhere: on your own servers or VMs, and also as a cloud pay-for-what-you-use service.
• Ideally open source...
27.
MongoDB
• JSON Documents
• Querying/Indexing/Updating similar to relational databases
• Traditional Consistency
• Auto-Sharding
28.
Under the hood
• Written in C++
• Available on most platforms
• Data serialized to BSON
• Extensive use of memory-mapped files
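The "Data serialized to BSON" bullet can be made concrete with a minimal Python sketch that hand-encodes a flat document, following the public BSON specification (bsonspec.org). It is a toy under stated assumptions: only UTF-8 string (type 0x02) and 32-bit integer (type 0x10) values are handled, and the name `encode_bson` is hypothetical. MongoDB drivers ship their own complete BSON libraries.

```python
import struct


def _cstring(s: str) -> bytes:
    # BSON "cstring": UTF-8 bytes followed by a NUL terminator
    return s.encode("utf-8") + b"\x00"


def encode_bson(doc: dict) -> bytes:
    """Encode a flat dict of str/int values as a BSON document (sketch only)."""
    body = b""
    for key, value in doc.items():
        if isinstance(value, bool):
            raise TypeError("booleans are not handled in this sketch")
        if isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"
            # 0x02 = string: int32 byte length (including the NUL), then the bytes
            body += b"\x02" + _cstring(key) + struct.pack("<i", len(data)) + data
        elif isinstance(value, int) and -2**31 <= value < 2**31:
            # 0x10 = 32-bit signed integer, little-endian
            body += b"\x10" + _cstring(key) + struct.pack("<i", value)
        else:
            raise TypeError(f"unsupported value for {key!r}: {value!r}")
    # Total size = 4-byte length prefix + elements + trailing NUL
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"
```

`encode_bson({"hello": "world"})` produces the 22-byte (0x16) document used as the introductory example in the BSON specification: a little-endian length prefix, one typed element, and a trailing NUL byte.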
Photo Meta-
Problem:
• Business needed more flexibility than Oracle could deliver
Solution:
• Used MongoDB instead of Oracle
Results:
• Developed application in one sprint cycle
• 500% cost reduction compared to Oracle
• 900% performance improvement compared to Oracle
34.
Customer Analytics
Problem:
• Deal with massive data volume across all customer sites
Solution:
• Used MongoDB to replace Google Analytics / Omniture options
Results:
• Less than one week to build prototype and prove business case
• Rapid deployment of new features
35.
Online
Problem:
• MySQL could not scale to handle their 5B+ documents
Solution:
• Switched from MySQL to MongoDB
Results:
• Massive simplification of code base
• Eliminated need for external caching system
• 20x performance improvement over MySQL
36.
E-commerce
Problem:
• Multi-vertical e-commerce impossible to model (efficiently) in an RDBMS
Solution:
• Switched from MySQL to MongoDB
Results:
• Massive simplification of code base
• Rapid build, halving time to market (and cost)
• Eliminated need for external caching system
• 50x+ improvement over MySQL
MongoDB Replication
• MongoDB replication like MySQL replication (kinda)
• Asynchronous master/slave
• Variations
• Master / slave
• Replica Sets
59.
Replica Set features
• A cluster of N servers
• Any (one) node can be primary
• Consensus election of primary
• Automatic failover
• Automatic recovery
• All writes to primary
• Reads can be to primary (default) or a secondary
How MongoDB Replication works
Diagram: three members; Member 2 is PRIMARY, Members 1 and 3 are secondaries.
Election establishes the PRIMARY
Data replication from PRIMARY to SECONDARY
62.
How MongoDB Replication works
Diagram: Member 2 (the PRIMARY) is DOWN; Members 1 and 3 negotiate a new master.
PRIMARY may fail
Automatic election of new PRIMARY if majority exists
63.
How MongoDB Replication works
Diagram: Members 1, 2 and 3; Member 1 is the new PRIMARY and Member 2 is DOWN.
New PRIMARY elected
Replica Set re-established
Replica Set Options
• {arbiterOnly: true}
• Can vote in an election
• Does not hold any data
• {hidden: true}
• Not reported in isMaster()
• Will not be sent slaveOk() reads
• {priority: n}
• {tags: }
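The member options above can be captured in a small Python model. The `Member` class and its method names are hypothetical, not part of any MongoDB driver; they only encode the rules stated on the slides: arbiters vote but hold no data, hidden members receive no slaveOk() reads, and (from the Priorities slides) a priority-0 member is never elected.

```python
from dataclasses import dataclass


@dataclass
class Member:
    """One replica set member and the options from the slide (hypothetical model)."""
    name: str
    arbiter_only: bool = False   # {arbiterOnly: true}
    hidden: bool = False         # {hidden: true}
    priority: float = 1.0        # {priority: n}

    def holds_data(self) -> bool:
        # Arbiters vote in elections but never store data
        return not self.arbiter_only

    def electable(self) -> bool:
        # Arbiters and priority-0 members can never become primary
        return self.holds_data() and self.priority > 0

    def gets_slave_ok_reads(self) -> bool:
        # Hidden members are not reported in isMaster(), so drivers never
        # route slaveOk() reads to them; arbiters have no data to read
        return self.holds_data() and not self.hidden


members = [Member("a"), Member("arb", arbiter_only=True),
           Member("h", hidden=True, priority=0)]
readable = [m.name for m in members if m.gets_slave_ok_reads()]
```

Here only `"a"` ends up in `readable`: the arbiter has no data and the hidden member is invisible to drivers.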
68.
Using Replicas for Reads
• slaveOk()
• - driver will send read requests to Secondaries
• - driver will always send writes to Primary
• Java examples
• - DB.slaveOk()
• - Collection.slaveOk()
• find(q).addOption(Bytes.QUERYOPTION_SLAVEOK);
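The routing rule on this slide (writes always to the primary, reads to the primary unless slaveOk is set) can be sketched in a few lines of Python. The `route` function is hypothetical, and two behaviours are assumptions of this sketch rather than documented driver behaviour: a random choice among secondaries, and a fallback to the primary when no secondary is available. Real drivers apply their own selection policy.

```python
from __future__ import annotations

import random


def route(op: str, primary: str, secondaries: list[str],
          slave_ok: bool = False, rng: random.Random | None = None) -> str:
    """Return the member an operation is sent to (sketch of driver behaviour)."""
    if op not in ("read", "write"):
        raise ValueError(f"unknown operation: {op}")
    if op == "write":
        # Writes always go to the primary, slaveOk or not
        return primary
    if not slave_ok or not secondaries:
        # Reads default to the primary; this sketch also falls back to it
        # when no secondary is available
        return primary
    # With slaveOk, pick a secondary at random (an assumption of this sketch)
    return (rng or random.Random()).choice(secondaries)
```

For example, `route("read", "m1", ["m2", "m3"], slave_ok=True)` returns `"m2"` or `"m3"`, while the same call with `"write"` always returns `"m1"`.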
69.
Safe Writes
• db.runCommand({getLastError: 1, w: 1})
• - ensures the write is synchronous
• - command returns after the primary has written to memory
• w=n or w='majority'
• n is the number of nodes (including the primary) the data must be replicated to
• driver will always send writes to Primary
• w='myTag' [MongoDB 2.0]
• Each member is "tagged", e.g. "US_EAST", "EMEA", "US_WEST"
• Ensure that the write is executed in each tagged "region"
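The numeric and 'majority' forms of `w` amount to a simple counting rule, sketched below in Python. The function `write_acknowledged` is hypothetical. `w` counts the primary itself, so `w: 1` is satisfied by the primary alone. In this sketch 'majority' is a strict majority of whatever `members` list the caller passes in, which is an assumption. Tag-based modes such as `w='myTag'` are deliberately left out.

```python
def write_acknowledged(w, acked, members) -> bool:
    """Is a write with concern `w` satisfied? (sketch)

    w:       an int n, or the string "majority"
    acked:   names of members (primary included) that have applied the write
    members: names of all members of the replica set
    """
    if w == "majority":
        needed = len(members) // 2 + 1
    elif isinstance(w, int) and w >= 1:
        needed = w
    else:
        # Tag-based modes such as w='myTag' are not modelled here
        raise ValueError(f"unsupported write concern: {w!r}")
    return len(set(acked) & set(members)) >= needed
```

With three members, `write_acknowledged("majority", {"p", "s1"}, ["p", "s1", "s2"])` is satisfied, but only the primary having the write is not enough.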
70.
Safe Writes
• fsync: true
• Ensures changed disk blocks are flushed to disk
• j: true
• Ensures changes are flushed to the journal
71.
When are elections triggered?
• When a given member sees that the Primary is not reachable
• The member is not an Arbiter
• Has a priority greater than other eligible members
72.
Typical Deployments
Use? | Set size | Data Protection | High Availability | Notes
X    | One      | No              | No                | Must use --journal to protect against crashes
     | Two      | Yes             | No                | On loss of one member, surviving member is read only
     | Three    | Yes             | Yes - 1 failure   | On loss of one member, surviving two members can elect a new primary
X    | Four     | Yes             | Yes - 1 failure*  | * On loss of two members, surviving two members are read only
     | Five     | Yes             | Yes - 2 failures  | On loss of two members, surviving three members can elect a new primary
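The High Availability column follows from one rule: the surviving members can elect a primary only if they still form a strict majority of the set. The Python sketch below assumes every member casts one vote. The helper names are hypothetical.

```python
def majority(set_size: int) -> int:
    # Smallest number of members that is more than half of the set
    return set_size // 2 + 1


def can_elect_primary(set_size: int, failures: int) -> bool:
    # Survivors can elect (or keep) a primary only while they form a majority
    return set_size - failures >= majority(set_size)


def tolerated_failures(set_size: int) -> int:
    # Most members that can be lost while a primary remains electable
    return set_size - majority(set_size)


for n in range(1, 6):
    print(n, majority(n), tolerated_failures(n))
```

The loop prints 0, 0, 1, 1 and 2 tolerated failures for sets of one to five members, matching the table. It also shows why a fourth member adds no availability over three: a majority of four is three, so the set still survives only one failure.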
73.
Replication features
• Reads from Primary are always consistent
• Reads from Secondaries are eventually consistent
• Automatic failover if a Primary fails
• Automatic recovery when a node joins the set
• Control of where writes occur
Sharding Features
• Shard data with no downtime
• Automatic balancing as data is written
• Commands routed (switched) to correct node
• Inserts - must have the Shard Key
• Updates - must have the Shard Key
• Queries
• With Shard Key - routed to nodes
• Without Shard Key - scatter gather
• Indexed Queries
• With Shard Key - routed in order
• Without Shard Key - distributed sort merge
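The routing rules above can be illustrated with a toy in-memory router over range-partitioned shards. This Python sketch is not how mongos is implemented. It assumes a single shard-key field, fixed split points, and equality-only queries, and `Router` and `user_id` are hypothetical names. The update rule is not modelled. The sketch shows targeted versus scatter-gather queries, and why sorting on the shard key can read shards in order while any other sort needs a merge.

```python
from bisect import bisect_right
import heapq


class Router:
    """Toy mongos-style router over range-partitioned shards (sketch only)."""

    def __init__(self, shard_key, split_points, num_shards):
        # Shard i holds keys in [split_points[i-1], split_points[i]); the ends are open
        assert num_shards == len(split_points) + 1
        self.shard_key = shard_key
        self.split_points = sorted(split_points)
        self.shards = [[] for _ in range(num_shards)]

    def _shard_index(self, key):
        return bisect_right(self.split_points, key)

    def insert(self, doc):
        # Inserts must carry the shard key so they can be routed
        if self.shard_key not in doc:
            raise ValueError("insert must include the shard key")
        self.shards[self._shard_index(doc[self.shard_key])].append(doc)

    def find(self, query):
        """Equality match on every query field; returns (hits, shards_contacted)."""
        if self.shard_key in query:
            # Targeted: only the shard owning this key range is queried
            targets = [self.shards[self._shard_index(query[self.shard_key])]]
        else:
            # Scatter-gather: every shard is queried
            targets = self.shards
        hits = [d for s in targets for d in s
                if all(d.get(k) == v for k, v in query.items())]
        return hits, len(targets)

    def find_sorted(self, sort_field):
        """Return all documents ordered by sort_field."""
        def by(d):
            return d[sort_field]
        per_shard = [sorted(s, key=by) for s in self.shards]
        if sort_field == self.shard_key:
            # Key ranges are disjoint and ordered, so shards are read in order
            return [d for s in per_shard for d in s]
        # Otherwise the router merges the per-shard sorted streams
        return list(heapq.merge(*per_shard, key=by))


r = Router("user_id", [100, 200], 3)
for uid, age in [(5, 30), (150, 25), (250, 40), (120, 35)]:
    r.insert({"user_id": uid, "age": age})
```

`r.find({"user_id": 150})` contacts one shard, while `r.find({"age": 40})` has to contact all three.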
Priorities
• Prior to 2.0.0
• {priority:0} // Can never be elected Primary
• {priority:1} // Can be elected Primary
• New in 2.0.0
• Priority, floating point number between 0 and 1000
• During an election
• Most up to date
• Highest priority
• Allows weighting of members during failover
100.
Priorities - example
Diagram: members A (p:2), B (p:2), C (p:1), D (p:1), E (p:0)
• Assuming all members are up to date
• Members A or B will be chosen first
• Highest priority
• Members C or D will be chosen next if
• A and B are unavailable
• A and B are not up to date
• Member E is never chosen
• priority:0 means it cannot be elected
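The example can be replayed with a small Python sketch of the two election criteria from the previous slide: most up to date first, then highest priority. This is not MongoDB's real election protocol. It ignores the majority requirement and the catch-up of lagging members. `Candidate`, `optime` and `elect` are hypothetical names, and breaking ties by member name is an arbitrary assumption.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    priority: float
    optime: int             # hypothetical "how up to date" counter; higher is newer
    reachable: bool = True


def elect(members):
    """Choose a new primary using the slide's two criteria (sketch, not the real protocol)."""
    # Unreachable members and priority-0 members are never chosen
    eligible = [m for m in members if m.reachable and m.priority > 0]
    if not eligible:
        return None
    # 1. Most up to date: keep only members with the newest optime
    newest = max(m.optime for m in eligible)
    fresh = [m for m in eligible if m.optime == newest]
    # 2. Highest priority among those; ties broken by name (an assumption)
    top = max(m.priority for m in fresh)
    return min(m.name for m in fresh if m.priority == top)


members = [Candidate("A", 2, 10), Candidate("B", 2, 10), Candidate("C", 1, 10),
           Candidate("D", 1, 10), Candidate("E", 0, 10)]
```

With everyone up to date, `elect(members)` picks A. If A and B are unreachable or behind, C or D wins. E is filtered out before either criterion is applied.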
101.
Tagging
• New in 2.0.0
• Control over where data is written to
• Each member can have one or more tags, e.g.
• tags: {dc: "ny"}
• tags: {dc: "ny", ip: "192.168", rack: "row3rk7"}
• Replica set defines rules for where data resides
• Rules can change without changing application code
Use Cases - Multi Data Center
• write to three data centers
• allDCs : {"dc" : 3}
• > db.runCommand({getLastError : 1, w : "allDCs"})
• write to two data centers and three availability zones
• allDCsPlus : {"dc" : 2, "az": 3}
• > db.runCommand({getLastError : 1, w : "allDCsPlus"})
Diagram: five members tagged by data center (dc) and availability zone (az)
• US-EAST-1: tags {dc: "JFK", az: "r1"}
• US-EAST-2: tags {dc: "JFK", az: "r2"}
• US-WEST-1: tags {dc: "SFO", az: "r3"}
• US-WEST-2: tags {dc: "SFO", az: "r4"}
• LONDON-1: tags {dc: "LHR", az: "r5"}
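A rule such as `{"dc": 3}` is satisfied once the members that have the write cover at least three distinct values of the `dc` tag. In the replica set configuration these rules live under `settings.getLastErrorModes`. The Python sketch below checks the two rules from this slide against the diagram's members. The function `mode_satisfied` is hypothetical, and the member names and tags are taken from the diagram.

```python
MEMBERS = {
    "US-EAST-1": {"dc": "JFK", "az": "r1"},
    "US-EAST-2": {"dc": "JFK", "az": "r2"},
    "US-WEST-1": {"dc": "SFO", "az": "r3"},
    "US-WEST-2": {"dc": "SFO", "az": "r4"},
    "LONDON-1": {"dc": "LHR", "az": "r5"},
}

MODES = {
    "allDCs": {"dc": 3},
    "allDCsPlus": {"dc": 2, "az": 3},
}


def mode_satisfied(mode, acked, members=MEMBERS, modes=MODES):
    """True if the members in `acked` cover enough distinct tag values (sketch)."""
    for tag, required in modes[mode].items():
        # Count distinct values of this tag among members that have the write
        values = {members[m][tag] for m in acked if tag in members[m]}
        if len(values) < required:
            return False
    return True
```

A write on US-EAST-1, US-EAST-2 and US-WEST-1 satisfies "allDCsPlus" (two data centers, three zones) but not "allDCs", which also needs LONDON-1's data center.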
104.
Use Cases - Data Protection & High Availability
• A and B will take priority during a failover
• C or D will become primary if A and B become unavailable
• E cannot be primary
• D and E cannot be read from with a slaveOk()
• D can be used for backups, feeding a Solr index, etc.
• E provides a safeguard against operational or application error
Diagram: member configuration
• A: priority: 2
• B: priority: 2
• C: priority: 1
• D: priority: 1, hidden: true
• E: priority: 0, hidden: true, slaveDelay: 3600
#15 Remember that in 1995 there were around 10,000 websites. Mosaic, Lynx, Mozilla (pre-Netscape) and IE 2.0 were the only web browsers. Apache (Dec ’95), Java (’96), PHP (June ’95), and .NET didn’t exist yet. Linux just barely (1.0 in ’94).
#26 By reducing the transactional semantics the database provides, one can still solve an interesting set of problems where performance is very important, and horizontal scaling then becomes easier.