5. MongoDB Brings It All Together
Volume of Data
• Trillions of records
• Hundreds of millions of queries per second
Agile Development
• Iterative
• Continuous
Hardware Architectures
• Cloud Computing
• Commodity servers
6. MongoDB Use Cases
• User Data Management
• High Volume Data Feeds
• Content Management
• Operational Intelligence
• Product Data Management
7. 10gen: The Creators of MongoDB
• Set the direction & contribute code to MongoDB
• Foster community & ecosystem
• Provide MongoDB management services
• Provide commercial services
• Founded in 2007
– Dwight Merriman, Eliot Horowitz
– DoubleClick, Oracle, MarkLogic, HP
• $31M+ in funding
– Flybridge, Sequoia, Union Square
• Worldwide Expanding Team
– 150+ employees
– NY, CA and UK
8. Agenda
• Welcome and Introductions
• The World We Live In
• MongoDB Technical Overview
• Use Case Discussion
• Demo
11. Traditional Architecture
• Relational
  – Hard to map to the way we code
    • Complex ORM frameworks
  – Hard to evolve quickly
    • Rigid schema is hard to change, necessitates migrations
  – Hard to scale horizontally
    • Joins, transactions make scaling by adding servers hard
13. MongoDB
• Built from the start to solve the scaling problem
• Consistency, Availability, Partition tolerance (can’t have it all)
• Configurable to fit requirements
14. Theory of NoSQL: CAP
CAP Theorem: satisfying all three at the same time is impossible
[Diagram labels: C, A, P]
• Many nodes
• Nodes contain replicas of partitions of data
• Consistency
– all replicas contain the same version of data
• Availability
– system remains operational when nodes fail
• Partition tolerance
– multiple entry points
– system remains operational when the system is split
21. Schema design
MongoDB: embed and link
Embedding is the nesting of objects and arrays inside a BSON document (pre-joined). Links are references between documents (resolved by a client-side follow-up query).
Embed for "contains" and one-to-many relationships; link for many-to-many relationships or to avoid duplication of data.
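The difference between embedding and linking can be sketched with plain JavaScript objects. This is only an illustration: the collection names (posts, authors) and field names (comments, author_id) are assumptions, and the arrays stand in for collections rather than a live database.

```javascript
// Embedded ("pre-joined"): the comments live inside the post document.
const embeddedPost = {
  _id: 1,
  title: "Schema design",
  comments: [
    { author: "Fred", text: "Nice post" },
    { author: "Ann", text: "Agreed" }
  ]
};

// Linked: the post stores only the author's _id; the author is its own document.
const authors = [{ _id: 100, name: "Fred" }];
const linkedPost = { _id: 2, title: "Linking", author_id: 100 };

// Embedded data is available as soon as the post has been read.
const commentAuthors = embeddedPost.comments.map(c => c.author);

// Linked data needs a follow-up query issued by the client.
// In the mongo shell this would look roughly like:
//   var post = db.posts.findOne({ _id: 2 });
//   var author = db.authors.findOne({ _id: post.author_id });
const author = authors.find(a => a._id === linkedPost.author_id);
```

Embedding costs one read but duplicates data when the same item belongs to many parents; linking avoids the duplication at the price of a second query.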
32. Arrays
• $push – append a value
• $pushAll – append each value of an array
• $addToSet and $each – add if not already contained; with $each, add a list
• $pop – remove last (or first, with -1)
• $pull – remove all occurrences of a value, or all elements matching a criteria
• { $pull : { field : {$gt: 3} } }
• $pullAll – remove all occurrences of each listed value
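As a rough sketch of how these operators are used, the update documents below are the kind passed as the second argument to db.collection.update(query, update). The field names (tags, scores) are assumptions, and the helper functions are plain-JavaScript stand-ins that mimic the effect on an array; they are not MongoDB code.

```javascript
// Update documents for each array operator (field names are hypothetical).
const updates = {
  push:     { $push:     { tags: "manga" } },
  addToSet: { $addToSet: { tags: { $each: ["manga", "anime"] } } },
  pop:      { $pop:      { scores: 1 } },          // 1 removes the last element
  pull:     { $pull:     { scores: { $gt: 3 } } },
  pullAll:  { $pullAll:  { scores: [1, 2] } }
};

// In-memory stand-ins that imitate what the server does to the stored array.
const addToSetEach = (arr, values) =>
  values.reduce((acc, v) => (acc.includes(v) ? acc : [...acc, v]), arr);
const pullGreaterThan = (arr, limit) => arr.filter(x => !(x > limit));
const pullAll = (arr, values) => arr.filter(x => !values.includes(x));

const tags = addToSetEach(["manga"], ["manga", "anime"]);  // ["manga", "anime"]
const afterPull = pullGreaterThan([1, 5, 3, 4, 2], 3);     // [1, 3, 2]
const afterPullAll = pullAll([1, 2, 1, 3, 2], [1, 2]);     // [3]
```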
33. Indexes
// Index nested documents
> db.posts.ensureIndex({ "comments.author": 1 })
> db.posts.find({ "comments.author": "Fred" })
// Index on tags (array values)
> db.posts.ensureIndex({ tags: 1 })
> db.posts.find({ tags: "Manga" })
// Geospatial index
> db.posts.ensureIndex({ "author.location": "2d" })
> db.posts.find({ "author.location": { $near: [22, 42] } })
// Create an index on any field in a document
> db.posts.ensureIndex({ author: 1 })
34. Aggregation/Batch Data Processing
• Map/Reduce can be used for batch data processing
– Currently being used for totaling, averaging, etc.
– Map/Reduce is a big hammer
• Simple aggregate functions available
• (2.2) Aggregation Framework: Simple, Fast
– No JavaScript needed, runs natively on server
– Filter or select only matching sub-documents or arrays via new operators
• MongoDB Hadoop Connector
– Useful for Hadoop integration
– Massive batch processing jobs
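A minimal sketch of an Aggregation Framework pipeline, assuming a posts collection whose documents carry author and tags fields (both assumptions borrowed from the index examples). The pipeline counts tags across one author's posts; the in-memory loop underneath is a plain-JavaScript equivalent included only to show the shape of the result.

```javascript
// Pipeline stages run in order on the server, with no JavaScript evaluation.
const pipeline = [
  { $match: { author: "Fred" } },                       // keep Fred's posts
  { $unwind: "$tags" },                                 // one document per tag
  { $group: { _id: "$tags", count: { $sum: 1 } } },     // count per tag
  { $sort: { count: -1 } }                              // most frequent first
];
// In the mongo shell: db.posts.aggregate(pipeline)

// In-memory equivalent over sample data (illustration only).
const posts = [
  { author: "Fred", tags: ["manga", "anime"] },
  { author: "Fred", tags: ["manga"] },
  { author: "Ann", tags: ["manga"] }
];
const counts = {};
posts
  .filter(p => p.author === "Fred")
  .forEach(p => p.tags.forEach(t => { counts[t] = (counts[t] || 0) + 1; }));
const result = Object.keys(counts)
  .map(t => ({ _id: t, count: counts[t] }))
  .sort((a, b) => b.count - a.count);
```

For a simple count like this, the pipeline avoids writing map and reduce functions at all.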
38. Replica Sets
• One primary, many secondaries
  – Automatic replication to all secondaries
    • Different delays may be configured
  – Automatic election of new primary on failure
  – Writes go to the primary, reads can go to secondaries
• Priority of secondary can be set
  – Hidden for administration/back-ups
  – Lower score for less powerful machines
• Election of new primary is automatic
  – Majority of replica set must be available
  – Arbiters can be used
• Many configurations possible (based on use case)
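The options above map onto fields of the replica set configuration document. The sketch below is one possible layout, not a recommended one: the set name and host names are placeholders, and slaveDelay is the field name from the MongoDB versions this deck covers (newer releases renamed it).

```javascript
// Hypothetical five-member replica set configuration.
const replicaSetConfig = {
  _id: "rs0",
  members: [
    { _id: 0, host: "db1.example.net:27017", priority: 2 },    // preferred primary
    { _id: 1, host: "db2.example.net:27017", priority: 1 },
    { _id: 2, host: "db3.example.net:27017", priority: 0.5 },  // less powerful machine
    // Hidden, delayed member for back-ups; it can never become primary.
    { _id: 3, host: "db4.example.net:27017", priority: 0, hidden: true, slaveDelay: 3600 },
    // Arbiter: votes in elections but holds no data.
    { _id: 4, host: "arb.example.net:27017", arbiterOnly: true }
  ]
};
// In the mongo shell: rs.initiate(replicaSetConfig)

// An election needs a majority of voting members to be reachable.
const votingMembers = replicaSetConfig.members.length;
const majority = Math.floor(votingMembers / 2) + 1;
```

With five voting members the majority is three, so this set can still elect a primary with two members down.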
47. Sharding Administration
• Splitting data into chunks
– Automatic
– Existing data can be manually “pre-split”
• Migration of chunks/balancing between servers
– Automatic
– Can be turned off/chunks can be manually moved
• Shard key
– Must be selected by you
– Very important for performance!
• Each shard is really a replica set
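The administrative steps above correspond to a handful of admin commands. The sketch below lists them as command documents, with the matching shell helper in a comment. The database (test), collection (posts), shard key (author), split point and shard name are all assumptions chosen for illustration.

```javascript
// Command documents that would be run against the admin database through mongos,
// for example with db.adminCommand(...). Names and values are hypothetical.
const shardingCommands = [
  // sh.enableSharding("test")
  { enableSharding: "test" },
  // sh.shardCollection("test.posts", { author: 1 })  -> you choose the shard key
  { shardCollection: "test.posts", key: { author: 1 } },
  // sh.splitAt("test.posts", { author: "M" })        -> manual pre-split
  { split: "test.posts", middle: { author: "M" } },
  // sh.moveChunk("test.posts", { author: "M" }, "shard0001")  -> manual migration
  { moveChunk: "test.posts", find: { author: "M" }, to: "shard0001" }
];
// The balancer can be switched off from the shell with sh.setBalancerState(false).

// Every command names the target as its first key.
const commandNames = shardingCommands.map(cmd => Object.keys(cmd)[0]);
```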
48. Full Deployment
[Diagram: Write Scalability. Three MongoS routers sit in front of four shards. Each shard is a replica set of mongod processes (one Primary, two Secondaries) and owns one key range: 0..30, 31..60, 61..90, 91..100.]
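The routing idea in this deployment can be sketched in a few lines: a router holds the key ranges and forwards each operation to the shard that owns the key. This is a simplified illustration, not how mongos is implemented; the shard names are hypothetical and the ranges are treated as inclusive integers, as drawn on the slide.

```javascript
// Key ranges from the slide, each owned by one shard (shard names are made up).
const chunks = [
  { min: 0,  max: 30,  shard: "shard0" },
  { min: 31, max: 60,  shard: "shard1" },
  { min: 61, max: 90,  shard: "shard2" },
  { min: 91, max: 100, shard: "shard3" }
];

// Return the shard whose range contains the key, or null if no range matches.
function routeKey(key) {
  const chunk = chunks.find(c => key >= c.min && key <= c.max);
  return chunk ? chunk.shard : null;
}

const target = routeKey(42);     // "shard1"
const outOfRange = routeKey(150); // null
```

Because writes for different key ranges land on different primaries, adding shards spreads the write load.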
50. MMS: MongoDB Monitoring Service
• SaaS solution providing instrumentation and visibility
into your MongoDB systems
• 3,500+ customers signed up and using service
51. Agenda
• Welcome and Introductions
• MongoDB and the New Frontier
• MongoDB Technical Overview
• Use Case Discussion
• Demo
53. Queries
• Importing Data into MongoDB
– mongoimport --db test --collection restaurants --file dataset.json
• Exporting Data from MongoDB
– mongoexport --db test --collection newcolln --out myexport.json