MongoDB

This presentation was given at the LDS Tech SORT Conference 2011 in Salt Lake City. The slides are quite comprehensive, covering many MongoDB topics. Rather than a traditional presentation, this was presented as more of a Q & A session. Topics covered include an introduction to MongoDB, use cases, schema design, high availability (replication) and horizontal scaling (sharding).

  • Remember that in 1995 there were around 10,000 websites. Mosaic, Lynx, Mozilla (pre-Netscape) and IE 2.0 were the only web browsers. Apache (Dec ’95), Java (’96), PHP (June ’95) and .NET didn’t exist yet. Linux just barely did (1.0 in ’94).
  • By reducing the transactional semantics the db provides, one can still solve an interesting set of problems where performance is very important, and horizontal scaling then becomes easier.
  • Sharding isn’t new.
  • Write: add a new paragraph. Read: read through the book. Don't go into indexes yet.
  • Web app: recent data.

MongoDB Presentation Transcript

  • MongoDB
  • My name is Steve Francia (@spf13)
  • 15+ years building the internet; BYU alumnus; father, husband, skateboarder; Chief Solutions Architect @ 10gen
  • Introduction to MongoDB
  • Why MongoDB?
  • Agility
      • Easily model complex data
      • Database speaks your languages (Java, .NET, PHP, etc.)
      • Schemaless data model enables a faster development cycle
  • Scale
      • Easy and automatic scale-out
  • Cost
      • Cost-effectively manage abundant data (clickstreams, logs, etc.)
  • Company behind MongoDB
      • (A)GPL license, own copyrights, engineering team
      • Support, consulting, commercial license revenue
      • Management: Google/DoubleClick, Oracle, Apple, NetApp
      • Funding: Sequoia, Union Square, Flybridge
      • Offices in NYC and Redwood Shores, CA
      • 50+ employees
  • MongoDB Goals
      • Open source
      • Designed for today
          • Today’s hardware / environments
          • Today’s challenges
      • Easy development
      • Reliable
      • Scalable
  • A bit of history
  • 1974: The relational database is created
  • Timeline: 1979, 1982‐1996, 1995
  • Computers in 1995
      • Pentium 100 MHz
      • 10BASE-T
      • 16 MB RAM
      • 200 MB HD
  • Cell Phones in 2011
      • Dual-core 1.5 GHz
      • WiFi 802.11n (300+ Mbps)
      • 1 GB RAM
      • 64 GB solid state
  • How about a DB designed for today?
  • It started with DoubleClick
  • Signs something was needed
      • DoubleClick: 400,000 ads/second
      • People writing their own stores
      • Caching is de rigueur
      • Complex ORM frameworks
      • Computer architecture trends
      • Cloud computing
  • Requirements
      • Need a good degree of functionality to handle a large set of use cases
          • Sometimes need strong consistency / atomicity
          • Secondary indexes
          • Ad hoc queries
  • Trim unneeded features
      • Leave out a few things so we can scale
          • No choice but to leave out relational
          • Distributed transactions are hard to scale
  • Needed a scalable data model
      • Some options:
          • Key/value
          • Columnar / tabular
          • Document oriented (JSON inspired)
      • Opportunity to innovate -> agility
  • MongoDB philosophy
      • No longer one-size-fits-all, but not 12 tools either.
      • Non-relational (no joins) makes scaling horizontally practical
      • Document data models are good
      • Keep functionality when we can (key/value stores are great, but we need more)
      • Database technology should run anywhere: available both for running on your own servers or VMs, and as a pay-for-what-you-use cloud service.
      • Ideally open source...
  • MongoDB
      • JSON Documents
      • Querying/Indexing/Updating similar to relational databases
      • Traditional Consistency
      • Auto-Sharding
  • Under the hood
      • Written in C++
      • Available on most platforms
      • Data serialized to BSON
      • Extensive use of memory-mapped files
  • Database Landscape
  • MongoDB is:
      • Application
      • Document Oriented
          { author: "steve", date: new Date(), text: "About MongoDB...", tags: ["tech", "database"] }
      • High Performance
      • Horizontally Scalable
  • This has led some to say: “MongoDB has the best features of key/value stores, document databases and relational databases in one.” (John Nunemaker)
  • Use Cases
  • Photo Meta-
      • Problem:
          • Business needed more flexibility than Oracle could deliver
      • Solution:
          • Used MongoDB instead of Oracle
      • Results:
          • Developed application in one sprint cycle
          • 500% cost reduction compared to Oracle
          • 900% performance improvement compared to Oracle
  • Customer Analytics
      • Problem:
          • Deal with massive data volume across all customer sites
      • Solution:
          • Used MongoDB to replace Google Analytics / Omniture options
      • Results:
          • Less than one week to build prototype and prove business case
          • Rapid deployment of new features
  • Online
      • Problem:
          • MySQL could not scale to handle their 5B+ documents
      • Solution:
          • Switched from MySQL to MongoDB
      • Results:
          • Massive simplification of code base
          • Eliminated need for external caching system
          • 20x performance improvement over MySQL
  • E-commerce
      • Problem:
          • Multi-vertical e-commerce impossible to model (efficiently) in an RDBMS
      • Solution:
          • Switched from MySQL to MongoDB
      • Results:
          • Massive simplification of code base
          • Rapid builds, halving time to market (and cost)
          • Eliminated need for external caching system
          • 50x+ improvement over MySQL
  • Tons more
      • Pretty much anywhere you can use an RDBMS or a key/value store, MongoDB is a great fit
  • In Good Company
  • Schema Design
  • Relational made normalized data look like this
  • Document databases make normalized data look like this
  • Terminology
      RDBMS              Mongo
      Table, View    ➜   Collection
      Row            ➜   JSON Document
      Index          ➜   Index
      Join           ➜   Embedded Document
      Partition      ➜   Shard
      Partition Key  ➜   Shard Key
  • Tables to Documents
      { title: 'MongoDB',
        contributors: [
          { name: 'Eliot Horowitz', email: 'eh@10gen.com' },
          { name: 'Dwight Merriman', email: 'dm@10gen.com' }
        ],
        model: { relational: false, awesome: true }
      }
  • DEMO TIME
  • Documents: Blog Post Document
      > p = { author: "roger", date: new Date(), text: "about mongoDB...", tags: ["tech", "databases"] }
      > db.posts.save(p)
  • Querying
      > db.posts.find()
      { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "roger", date : "Sat Jul 24 2010 19:47:11", text : "About MongoDB...", tags : [ "tech", "databases" ] }
      Note: _id is unique, but can be anything you’d like
  • Secondary Indexes
      • Create an index on any field in a document
      // 1 means ascending, -1 means descending
      > db.posts.ensureIndex({ author: 1 })
      > db.posts.find({ author: "roger" })
      { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "roger", ... }
  • Conditional Query Operators
      • $all, $exists, $mod, $ne, $in, $nin, $nor, $or, $size, $type, $lt, $lte, $gt, $gte
      // find posts with any tags
      > db.posts.find({ tags: { $exists: true } })
      // find posts matching a regular expression
      > db.posts.find({ author: /^rog*/i })
      // count posts by author
      > db.posts.find({ author: "roger" }).count()
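To make the operator semantics on this slide concrete, here is a minimal sketch of an in-memory matcher written in plain JavaScript. It is not MongoDB's implementation: `matches` and `matchesCondition` are hypothetical helper names, only top-level fields are handled, and only a few of the listed operators are covered.

```javascript
// Hypothetical helper: test one field value against a query condition.
function matchesCondition(value, condition) {
  // A regular expression matches string values (or any string in an array).
  if (condition instanceof RegExp) {
    const values = Array.isArray(value) ? value : [value];
    return values.some((v) => typeof v === "string" && condition.test(v));
  }
  // A plain object whose keys all start with "$" is a set of operators.
  const isOperatorObject =
    condition !== null &&
    typeof condition === "object" &&
    Object.keys(condition).length > 0 &&
    Object.keys(condition).every((k) => k.startsWith("$"));
  if (!isOperatorObject) {
    // Plain equality; for array fields, equality with any element matches.
    return Array.isArray(value) ? value.includes(condition) : value === condition;
  }
  return Object.entries(condition).every(([op, arg]) => {
    switch (op) {
      case "$exists": return (value !== undefined) === arg;
      case "$ne": return value !== arg;
      case "$in": return arg.includes(value);
      case "$nin": return !arg.includes(value);
      case "$gt": return value > arg;
      case "$gte": return value >= arg;
      case "$lt": return value < arg;
      case "$lte": return value <= arg;
      case "$size": return Array.isArray(value) && value.length === arg;
      default: throw new Error("operator not covered by this sketch: " + op);
    }
  });
}

// Hypothetical helper: a document matches when every field condition holds.
function matches(doc, query) {
  return Object.entries(query).every(([field, condition]) =>
    matchesCondition(doc[field], condition)
  );
}

// Sample data invented for illustration.
const posts = [
  { author: "roger", tags: ["tech", "databases"], votes: 3 },
  { author: "fred", votes: 7 },
];
const withTags = posts.filter((p) => matches(p, { tags: { $exists: true } }));
const byRegex = posts.filter((p) => matches(p, { author: /^rog/i }));
const popular = posts.filter((p) => matches(p, { votes: { $gte: 5 } }));
```

Each query is just a description of the documents wanted, which is why the same object shape works for equality, regular expressions and operators.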
  • Update Operations
      • $set, $unset, $inc, $push, $pushAll, $pull, $pullAll, $bit
      > comment = { author: "fred", date: new Date(), text: "Best Movie Ever" }
      > db.posts.update({ _id: "..." }, { $push: { comments: comment } });
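As a rough illustration of what these update operators do to a document, the sketch below applies a small subset of them to a plain JavaScript object. `applyUpdate` is a hypothetical helper, not a MongoDB API; it handles only top-level fields and JSON-like values, and the real server applies updates in place and atomically per document.

```javascript
// Hypothetical helper: apply a few update operators to a copy of a document.
function applyUpdate(doc, update) {
  // Deep copy so the input is untouched; adequate for JSON-like data only.
  const result = JSON.parse(JSON.stringify(doc));
  for (const [op, fields] of Object.entries(update)) {
    for (const [field, arg] of Object.entries(fields)) {
      switch (op) {
        case "$set":
          result[field] = arg;
          break;
        case "$unset":
          delete result[field];
          break;
        case "$inc":
          result[field] = (result[field] || 0) + arg;
          break;
        case "$push":
          if (result[field] === undefined) result[field] = [];
          result[field].push(arg);
          break;
        case "$pull":
          result[field] = (result[field] || []).filter((v) => v !== arg);
          break;
        default:
          throw new Error("operator not covered by this sketch: " + op);
      }
    }
  }
  return result;
}

// Sample data invented for illustration.
const post = { author: "roger", views: 1, tags: ["tech", "databases"] };
const comment = { author: "fred", text: "Best Movie Ever" };
const updated = applyUpdate(post, {
  $push: { comments: comment },
  $inc: { views: 1 },
  $pull: { tags: "tech" },
});
```

The point of the operators is that the client sends only the change (push one comment) rather than rewriting the whole document.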
  • Nested Documents
      { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
        author : "roger",
        date : "Sat Apr 24 2011 19:47:11",
        text : "About MongoDB...",
        tags : [ "tech", "databases" ],
        comments : [
          { author : "Fred", date : "Sat Apr 25 2010 20:51:03 GMT-0700", text : "Best Post Ever!" }
        ]
      }
  • Secondary Indexes
      // Index nested documents
      > db.posts.ensureIndex({ "comments.author": 1 })
      > db.posts.find({ "comments.author": "Fred" })
      // Index on tags (multi-key index)
      > db.posts.ensureIndex({ tags: 1 })
      > db.posts.find({ tags: "tech" })
      // Geospatial index
      > db.posts.ensureIndex({ "author.location": "2d" })
      > db.posts.find({ "author.location": { $near : [22, 42] } })
  • Rich Documents
      { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
        line_items : [
          { sku: 'tt-123', name: 'Coltrane: Impressions' },
          { sku: 'tt-457', name: 'Davis: Kind of Blue' }
        ],
        address : { name: 'Banker', street: '111 Main', zip: 10010 },
        payment: { cc: 4567, exp: new Date(2011, 7, 7) },
        subtotal: 2355
      }
  • High Availability
  • MongoDB Replication
      • MongoDB replication is like MySQL replication (kinda)
      • Asynchronous master/slave
      • Variations
          • Master / slave
          • Replica Sets
  • Replica Set features
      • A cluster of N servers
      • Any (one) node can be primary
      • Consensus election of primary
      • Automatic failover
      • Automatic recovery
      • All writes to primary
      • Reads can be to primary (default) or a secondary
  • How MongoDB Replication works (diagram of Member 1, Member 2 and Member 3)
      • Set is made up of 2 or more nodes
      • Election establishes the PRIMARY (Member 2); data replicates from PRIMARY to SECONDARY
      • PRIMARY may fail (Member 2 DOWN); Members 1 and 3 negotiate a new master; automatic election of a new PRIMARY if a majority exists
      • New PRIMARY elected (Member 1); replica set re-established
      • Automatic recovery (Member 2 RECOVERING)
      • Replica set re-established
  • Creating a Replica Set
      > cfg = { _id : "acme_a",
                members : [
                  { _id : 0, host : "sf1.acme.com" },
                  { _id : 1, host : "sf2.acme.com" },
                  { _id : 2, host : "sf3.acme.com" } ] }
      > use admin
      > db.runCommand( { replSetInitiate : cfg } )
  • Replica Set Options
      • {arbiterOnly: true}
          • Can vote in an election
          • Does not hold any data
      • {hidden: true}
          • Not reported in isMaster()
          • Will not be sent slaveOk() reads
      • {priority: n}
      • {tags: }
  • Using Replicas for Reads
      • slaveOk()
          • Driver will send read requests to Secondaries
          • Driver will always send writes to Primary
      • Java examples
          • DB.slaveOk()
          • Collection.slaveOk()
          • find(q).addOption(Bytes.QUERYOPTION_SLAVEOK);
  • Safe Writes
      • db.runCommand({getLastError: 1, w : 1})
          • Ensures the write is synchronous
          • Command returns after the primary has written to memory
      • w=n or w=majority
          • n is the number of nodes data must be replicated to
          • Driver will always send writes to Primary
      • w=myTag [MongoDB 2.0]
          • Each member is "tagged", e.g. "US_EAST", "EMEA", "US_WEST"
          • Ensures that the write is executed in each tagged "region"
  • Safe Writes
      • fsync:true
          • Ensures changed disk blocks are flushed to disk
      • j:true
          • Ensures changes are flushed to the journal
  • When are elections triggered?
      • When a given member sees that the Primary is not reachable
      • The member is not an Arbiter
      • The member has a priority greater than other eligible members
  • Typical Deployments
      Use?  Set size  Data Protection  High Availability   Notes
      X     One       No               No                  Must use --journal to protect against crashes
            Two       Yes              No                  On loss of one member, surviving member is read only
            Three     Yes              Yes - 1 failure     On loss of one member, surviving two members can elect a new primary
      X     Four      Yes              Yes - 1 failure*    * On loss of two members, surviving two members are read only
            Five      Yes              Yes - 2 failures    On loss of two members, surviving three members can elect a new primary
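The High Availability column follows from the majority rule: a primary can only be elected while more than half of the set is reachable. Here is a minimal sketch of that arithmetic in plain JavaScript, assuming every member has one vote. The function names are hypothetical and nothing here is a MongoDB API.

```javascript
// Smallest number of members that forms a strict majority of the set.
function majority(setSize) {
  return Math.floor(setSize / 2) + 1;
}

// A primary can be elected only while a majority of the set is reachable.
function canElectPrimary(setSize, membersUp) {
  return membersUp >= majority(setSize);
}

// How many members can be lost while a majority still remains.
function tolerableFailures(setSize) {
  return setSize - majority(setSize);
}

const deployments = [1, 2, 3, 4, 5].map((setSize) => ({
  setSize,
  majority: majority(setSize),
  tolerableFailures: tolerableFailures(setSize),
}));
```

Under this rule a four-member set tolerates no more failures than a three-member set, which is consistent with the table showing "1 failure" for both.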
  • Replication features
      • Reads from Primary are always consistent
      • Reads from Secondaries are eventually consistent
      • Automatic failover if a Primary fails
      • Automatic recovery when a node joins the set
      • Control of where writes occur
  • Scaling: Sharding MongoDB
  • What is Sharding
      • Ad-hoc partitioning
      • Consistent hashing
          • Amazon Dynamo
      • Range based partitioning
          • Google BigTable
          • Yahoo! PNUTS
          • MongoDB
  • MongoDB Sharding
      • Automatic partitioning and management
      • Range based
      • Convert to sharded system with no downtime
      • Fully consistent
  • How MongoDB Sharding Works
  • How MongoDB Sharding works
      > db.runCommand( { addshard : "shard1" } );
      > db.runCommand( { shardCollection : "mydb.blogs", key : { age : 1 } } )
      Chunks: [-∞, +∞]
      • Range keys from -∞ to +∞
      • Ranges are stored as “chunks”
  • How MongoDB Sharding works
      > db.posts.save( {age:40} )
      Chunks: [-∞, 40] [41, +∞]
      • Data is inserted
      • Ranges are split into more “chunks”
  • How MongoDB Sharding works
      > db.posts.save( {age:50} )
      Chunks: [-∞, 40] [41, 50] [51, +∞]
      • More data is inserted
      • Ranges are split into more “chunks”
  • How MongoDB Sharding works
      > db.posts.save( {age:60} )
      Chunks: [-∞, 40] [41, 50] [51, 60] [61, +∞]
  • How MongoDB Sharding works
      shard1 holds all chunks: [-∞, 40] [41, 50] [51, 60] [61, +∞]
  • How MongoDB Sharding works
      > db.runCommand( { addshard : "shard2" } );
      Chunks [-∞, 40] [41, 50] [51, 60] [61, +∞] are spread across shard1 and shard2
  • How MongoDB Sharding works
      > db.runCommand( { addshard : "shard3" } );
      Chunks [-∞, 40] [41, 50] [51, 60] [61, +∞] are spread across shard1, shard2 and shard3
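The walkthrough above can be sketched in a few lines of plain JavaScript. This is only a toy model of the slides' picture, under several assumptions: chunks are inclusive integer ranges as drawn, a split happens on every insert (the slides do this for illustration, whereas a real cluster splits when chunks grow large), and rebalancing is a simple round-robin. `findChunk`, `splitAt` and `rebalance` are hypothetical names.

```javascript
// Find the chunk whose inclusive range contains the shard key value.
function findChunk(chunks, key) {
  return chunks.find((c) => key >= c.min && key <= c.max);
}

// Split the chunk containing `key` so that `key` ends the left-hand part.
function splitAt(chunks, key) {
  const result = [];
  for (const c of chunks) {
    if (key >= c.min && key < c.max) {
      result.push({ min: c.min, max: key, shard: c.shard });
      result.push({ min: key + 1, max: c.max, shard: c.shard });
    } else {
      result.push(c);
    }
  }
  return result;
}

// Assumption: spread chunks over the shards in round-robin order.
function rebalance(chunks, shards) {
  return chunks.map((c, i) => ({ ...c, shard: shards[i % shards.length] }));
}

// Start with one chunk covering the whole key space on shard1.
let chunks = [{ min: -Infinity, max: Infinity, shard: "shard1" }];
for (const age of [40, 50, 60]) {
  chunks = splitAt(chunks, age);
}
const balanced = rebalance(chunks, ["shard1", "shard2", "shard3"]);
```

After the three inserts the chunk list matches the slide: [-∞, 40], [41, 50], [51, 60], [61, +∞]. Adding shards changes only the `shard` label on each chunk, not the ranges.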
  • Sharding Features
      • Shard data with no downtime
      • Automatic balancing as data is written
      • Commands routed (switched) to correct node
          • Inserts - must have the Shard Key
          • Updates - must have the Shard Key
          • Queries
              • With Shard Key - routed to nodes
              • Without Shard Key - scatter gather
          • Indexed Queries
              • With Shard Key - routed in order
              • Without Shard Key - distributed sort merge
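The difference between a routed query and a scatter-gather query can be sketched as a lookup against the chunk table. This is a simplified model in plain JavaScript: `targetShards` is a hypothetical function, the chunk table is invented for illustration, and only equality matches on the shard key are considered.

```javascript
// Invented chunk table: inclusive ranges of the shard key `age`.
const chunkTable = [
  { min: -Infinity, max: 40, shard: "shard1" },
  { min: 41, max: 50, shard: "shard2" },
  { min: 51, max: 60, shard: "shard3" },
  { min: 61, max: Infinity, shard: "shard1" },
];

// Decide which shards a query must be sent to.
function targetShards(query, shardKey, table) {
  const allShards = [...new Set(table.map((c) => c.shard))];
  if (!(shardKey in query)) {
    return allShards; // no shard key: scatter to every shard, gather results
  }
  const value = query[shardKey];
  const chunk = table.find((c) => value >= c.min && value <= c.max);
  return chunk ? [chunk.shard] : allShards; // shard key present: one shard
}

const routed = targetShards({ age: 45 }, "age", chunkTable);
const scattered = targetShards({ author: "roger" }, "age", chunkTable);
```

A query that includes the shard key touches one shard, while a query without it has to ask all of them, which is why shard key choice matters for read patterns.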
  • Sharding Architecture
  • Architecture
  • Config Servers
      • 3 of them
      • Changes are made with 2 phase commit
      • If any are down, metadata goes read only
      • System is online as long as 1 of 3 is up
  • Shards
      • Can be master, master/slave or replica sets
      • Replica sets give sharding + full auto-failover
      • Regular mongod processes
  • Mongos
      • Sharding router
      • Acts just like a mongod to clients
      • Can have 1 or as many as you want
      • Can run on the app server so there is no extra network traffic
  • Advanced Replication
  • Priorities
      • Prior to 2.0.0
          • {priority:0} // Never can be elected Primary
          • {priority:1} // Can be elected Primary
      • New in 2.0.0
          • Priority is a floating point number between 0 and 1000
          • During an election
              • Most up to date
              • Highest priority
          • Allows weighting of members during failover
  • Priorities - example (A p:2, B p:2, C p:1, D p:1, E p:0)
      • Assuming all members are up to date
      • Members A or B will be chosen first
          • Highest priority
      • Members C or D will be chosen next if
          • A and B are unavailable
          • A and B are not up to date
      • Member E is never chosen
          • priority:0 means it cannot be elected
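The selection rule in this example (most up to date first, then highest priority, never priority 0) can be sketched as follows in plain JavaScript. This is an illustration of the rule as stated on the slide, not the server's election protocol: it ignores the majority requirement, and `electPrimary`, `up` and `optime` are hypothetical names, with `optime` standing in for how current a member's data is.

```javascript
// Pick a primary: eligible members only, freshest data first, then priority.
function electPrimary(members) {
  const eligible = members.filter((m) => m.up && m.priority > 0);
  if (eligible.length === 0) return null;
  const newest = Math.max(...eligible.map((m) => m.optime));
  const upToDate = eligible.filter((m) => m.optime === newest);
  // Highest priority wins; ties keep their original order (stable sort).
  upToDate.sort((a, b) => b.priority - a.priority);
  return upToDate[0].name;
}

// The five members from the slide, all up and equally current.
const members = [
  { name: "A", priority: 2, up: true, optime: 100 },
  { name: "B", priority: 2, up: true, optime: 100 },
  { name: "C", priority: 1, up: true, optime: 100 },
  { name: "D", priority: 1, up: true, optime: 100 },
  { name: "E", priority: 0, up: true, optime: 100 },
];

const allHealthy = electPrimary(members);
const abDown = electPrimary(
  members.map((m) => (m.name === "A" || m.name === "B" ? { ...m, up: false } : m))
);
const abStale = electPrimary(
  members.map((m) => (m.name === "A" || m.name === "B" ? { ...m, optime: 90 } : m))
);
const onlyE = electPrimary(members.map((m) => ({ ...m, up: m.name === "E" })));
```

With A and B either down or behind, the choice falls to C or D, and E is never returned because its priority is 0.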
  • Tagging
      • New in 2.0.0
      • Control over where data is written to
      • Each member can have one or more tags, e.g.
          • tags: {dc: "ny"}
          • tags: {dc: "ny", ip: "192.168", rack: "row3rk7"}
      • Replica set defines rules for where data resides
      • Rules can change without changing application code
  • Tagging - example
      { _id : "mySet",
        members : [
          {_id : 0, host : "A", tags : {"dc": "ny"}},
          {_id : 1, host : "B", tags : {"dc": "ny"}},
          {_id : 2, host : "C", tags : {"dc": "sf"}},
          {_id : 3, host : "D", tags : {"dc": "sf"}},
          {_id : 4, host : "E", tags : {"dc": "cloud"}} ],
        settings : {
          getLastErrorModes : {
            allDCs : {"dc" : 3},
            someDCs : {"dc" : 2} } } }
      > db.blogs.insert({...})
      > db.runCommand({getLastError : 1, w : "allDCs"})
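A rule such as `{"dc" : 3}` asks that the write reach members carrying at least three distinct values of the `dc` tag. Here is a minimal sketch of that check in plain JavaScript, using the member tags from the example above. `satisfiesMode` and `byHost` are hypothetical helpers, and the acknowledged-member lists are invented for illustration.

```javascript
// A mode is satisfied when, for each tag it names, the acknowledging
// members cover at least the required number of distinct tag values.
function satisfiesMode(mode, ackedMembers) {
  return Object.entries(mode).every(([tag, required]) => {
    const distinctValues = new Set(
      ackedMembers.map((m) => m.tags[tag]).filter((v) => v !== undefined)
    );
    return distinctValues.size >= required;
  });
}

const replicaMembers = [
  { host: "A", tags: { dc: "ny" } },
  { host: "B", tags: { dc: "ny" } },
  { host: "C", tags: { dc: "sf" } },
  { host: "D", tags: { dc: "sf" } },
  { host: "E", tags: { dc: "cloud" } },
];
const modes = { allDCs: { dc: 3 }, someDCs: { dc: 2 } };

// Hypothetical helper: look up members by host name.
const byHost = (...hosts) => replicaMembers.filter((m) => hosts.includes(m.host));

const nyOnly = satisfiesMode(modes.someDCs, byHost("A", "B"));
const nyAndSf = satisfiesMode(modes.someDCs, byHost("A", "C"));
const twoOfThree = satisfiesMode(modes.allDCs, byHost("A", "C"));
const everyDc = satisfiesMode(modes.allDCs, byHost("A", "C", "E"));
```

Two acknowledgements from the same data center count once, so replicating to A and B alone does not satisfy `someDCs`.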
  • Use Cases - Multi Data Center
      • Write to three data centers
          • allDCs : {"dc" : 3}
          • > db.runCommand({getLastError : 1, w : "allDCs"})
      • Write to two data centers and three availability zones
          • allDCsPlus : {"dc" : 2, "az": 3}
          • > db.runCommand({getLastError : 1, w : "allDCsPlus"})
      • Members (from the diagram)
          US‐EAST‐1   tag : {dc: "JFK", az: "r1"}
          US‐EAST‐2   tag : {dc: "JFK", az: "r2"}
          US‐WEST‐1   tag : {dc: "SFO", az: "r3"}
          US‐WEST‐2   tag : {dc: "SFO", az: "r4"}
          LONDON‐1    tag : {dc: "LHR", az: "r5"}
  • Use Cases - Data Protection & High Availability
      • A and B will take priority during a failover
      • C or D will become primary if A and B become unavailable
      • E cannot be primary
      • D and E cannot be read from with a slaveOk()
      • D can be used for backups, feeding a Solr index, etc.
      • E provides a safeguard against operational or application error
      • Members (from the diagram)
          A   priority: 2
          B   priority: 2
          C   priority: 1
          D   priority: 1, hidden: true
          E   priority: 0, hidden: true, slaveDelay: 3600
  • Optimizing app performance
  • RAM / Disk
  • Goal: Minimize memory turnover
  • What is your data access pattern?
  • 10 days of data (RAM / Disk)
  • Questions?
      • http://spf13.com
      • http://github.com/spf13
      • @spf13
      • Download at mongodb.org
      • PS: We’re hiring!! Contact us at jobs@10gen.com