4. • 1970s: Relational databases invented
– Storage is expensive
– Data is normalized
– Data storage is abstracted away from the app
• 1980s: RDBMS commercialized
– Client/Server model
– SQL becomes the standard
• 1990s: Things begin to change
– Client/Server => 3-tier architecture
– Rise of the Internet and the Web
6. • 2000s: Web 2.0
– Rise of "Social Media"
– Acceptance of E-Commerce
– Constant decrease of hardware prices
– Massive increase of collected data
• Result
– Constant need to scale dramatically
– How can we scale?
12. • Agile Development Methodology
– Shorter development cycles
– Constant evolution of requirements
– Flexibility at design time
• Relational Schema
– Hard to evolve
– Long, painful migrations
– Must stay in sync with application
– Few developers interact directly
13. BI / reporting (fewer issues here):
+ ad hoc queries
+ SQL standard protocol between clients and servers
+ scales horizontally better than operational DBs
- some scale limits at massive scale
- schemas are rigid
- no real time; great at bulk nightly data loads
OLTP / operational (a lot more issues here):
+ complex transactions
+ tabular data
+ ad hoc queries
- O<->R mapping hard
- speed/scale problems
- not super agile
14. BI / reporting:
+ ad hoc queries
+ SQL standard protocol between clients and servers
+ scales horizontally better than operational DBs
- some scale limits at massive scale
- schemas are rigid
- no real time; great at bulk nightly data loads
workarounds: flat files, map/reduce
OLTP / operational:
+ complex transactions
+ tabular data
+ ad hoc queries
- O<->R mapping hard
- speed/scale problems
- not super agile
workarounds: caching, app layer partitioning
21. Scalable nonrelational ("nosql"):
+ speed and scale
- ad hoc query limited
- not very transactional
- no sql/no standard
+ fits OO well
+ agile
[Diagram labels: BI / reporting, OLTP / operational]
22. Non-relational, next-generation
operational data stores and databases
A collection of very different products
• Different data models (not relational)
• Most do not use SQL for queries
• No predefined schema
• Some allow flexible data structures
31. • Designed and developed by founders of DoubleClick,
ShopWiki, Gilt Groupe, etc.
• Goal: create a high-performance, fully consistent,
horizontally scalable, general-purpose data store
• Coding started fall 2007
• Open source (AGPL), written in C++
• First production site March 2008: businessinsider.com
• Currently version 2.2 (August 2012)
38. {
_id : ObjectId("4e2e3f92268cdda473b628f6"),
title : "My Very Important Thoughts",
published: ISODate("2011-07-26T19:49:00.147Z"),
author : { name:"Asya Kamsky", username:"asya" },
text : "It was a long and stormy night ..."
}
39. {
_id : ObjectId("4e2e3f92268cdda473b628f6"),
title : "My Very Important Thoughts",
published: ISODate("2011-07-26T19:49:00.147Z"),
author : { name:"Asya Kamsky", username:"asya" },
text : "It was a long and stormy night ...",
tags : ["business", "news", "north america"]
}
> db.posts.ensureIndex( { tags : 1 } )
40. > db.posts.find( { tags : "news" } )
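One way to picture why the index on `tags` serves this query: an index on an array field holds one entry per array element, so a lookup for "news" leads straight to the post. The sketch below is plain JavaScript, not MongoDB's B-tree implementation; `buildMultikeyIndex`, `findByTag` and the second sample post are hypothetical and exist only for illustration.

```javascript
// Illustrative in-memory model of an index on an array field.
// This is NOT how MongoDB stores indexes; it only shows the idea.
const posts = [
  { _id: 1, title: "My Very Important Thoughts", tags: ["business", "news", "north america"] },
  { _id: 2, title: "Another Post", tags: ["sports"] }, // hypothetical second document
];

// One index entry per array element: value -> list of document _ids.
function buildMultikeyIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    const values = Array.isArray(doc[field]) ? doc[field] : [doc[field]];
    for (const value of values) {
      if (!index.has(value)) index.set(value, []);
      index.get(value).push(doc._id);
    }
  }
  return index;
}

// Look the tag up in the index, then fetch only the matching documents.
function findByTag(index, docs, tag) {
  const ids = index.get(tag) || [];
  return docs.filter((doc) => ids.includes(doc._id));
}

const tagsIndex = buildMultikeyIndex(posts, "tags");
console.log(findByTag(tagsIndex, posts, "news").map((doc) => doc.title));
```

The post with three tags contributes three index entries, which is the situation the shell reports as a multikey index.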
41. > db.posts.find( { tags : "news" } ).explain()
{ "cursor" : "BtreeCursor tags_1",
"isMultiKey" : true,
"n" : 1,
"nscannedObjects" : 1,
"scanAndOrder" : false,
"indexOnly" : false,
...
}
42. {
_id : ObjectId("4e2e3f92268cdda473b628f6"),
title : "My Very Important Thoughts",
published: ISODate("2011-07-26T19:49:00.147Z"),
author : { name:"Asya Kamsky", username:"asya" },
text : "It was a long and stormy night ...",
tags : ["business", "news", "north america"],
votes : 3,
voters : ["dmerr", "sj", "jane" ]
}
> db.posts.update( { }, // query for documents to update
{ } // update to perform
)
43. > db.posts.update( {_id:..., voters:{$ne:"asya"} },
{ $push: {voters:"asya"},
$inc : {votes: 1}
} )
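The effect of this update is "vote at most once per user": the query part only matches while "asya" is absent from `voters`, and the update part adds the voter and bumps the counter together. A minimal plain-JavaScript model of that logic follows; `voteOnce` is a hypothetical helper, and unlike this sketch the real update is applied by the server as a single operation on the document.

```javascript
// Plain-JavaScript model of the conditional update (illustration only).
function voteOnce(post, username) {
  // Query part: { voters: { $ne: username } } matches only when the
  // voters array does not already contain the username.
  if (post.voters.includes(username)) {
    return false; // no document matched, nothing changes
  }
  // Update part: $push and $inc are applied together.
  post.voters.push(username);
  post.votes += 1;
  return true;
}

const post = { votes: 3, voters: ["dmerr", "sj", "jane"] };
console.log(voteOnce(post, "asya")); // first vote is applied
console.log(voteOnce(post, "asya")); // repeat vote is ignored
console.log(post.votes, post.voters);
```

Putting the "has this user voted?" check into the query is what keeps a repeated request from counting twice.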
44. {
_id : ObjectId("4e2e3f92268cdda473b628f6"),
title : "My Very Important Thoughts",
published: ISODate("2011-07-26T19:49:00.147Z"),
author : { name:"Asya Kamsky", username:"asya" },
text : "It was a long and stormy night ...",
tags : ["business", "news", "north america"],
votes : 4,
voters : ["dmerr", "sj", "jane", "asya" ],
comments : [
{ by : "tim157", text : "great story", ... },
{ by : "gora", text : "i don’t think so", ... },
{ by : "dmerr", text : "also check out..." }
]
}
45. > db.posts.ensureIndex( { "comments.by" : 1 } )
46. > db.posts.find( { "comments.by" : "gora" } )
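The dotted name "comments.by" reaches into every element of the embedded `comments` array. The sketch below imitates that lookup in plain JavaScript for simple equality matches only; `valuesAtPath` and `matchesPath` are hypothetical helpers, and MongoDB's real matching rules cover more cases than this.

```javascript
// Collect every value reachable through a dotted path, descending
// into arrays along the way (simplified model, equality matches only).
function valuesAtPath(value, parts) {
  if (value === null || value === undefined) return [];
  if (Array.isArray(value)) {
    return value.flatMap((item) => valuesAtPath(item, parts));
  }
  if (parts.length === 0) return [value];
  return valuesAtPath(value[parts[0]], parts.slice(1));
}

// True when any value found at the path equals the expected value.
function matchesPath(doc, path, expected) {
  return valuesAtPath(doc, path.split(".")).includes(expected);
}

const post = {
  title: "My Very Important Thoughts",
  tags: ["business", "news", "north america"],
  comments: [
    { by: "tim157", text: "great story" },
    { by: "gora", text: "i don't think so" },
    { by: "dmerr", text: "also check out..." },
  ],
};

console.log(matchesPath(post, "comments.by", "gora"));
console.log(matchesPath(post, "tags", "news"));
```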
47. Seek = 5+ ms; read = really really fast
[Diagram: Post, Comment, Author]
48. Disk seeks and data locality
[Diagram: Post, Author, five Comments]
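A back-of-the-envelope calculation shows why locality matters. The only figure taken from the slides is the 5 ms seek time; the two storage layouts below (every record in a different place versus one contiguous document) are assumptions chosen to show the extremes, not measurements, and the function names are hypothetical.

```javascript
// Rough seek-time model; layouts are assumptions, not measurements.
const SEEK_MS = 5; // lower bound from the slide ("Seek = 5+ ms")

// Assumed worst case: the post, its author and every comment each live
// in a different place on disk, so each one costs a seek.
function seeksWhenSeparate(commentCount) {
  return 1 + 1 + commentCount;
}

// Assumed best case: the whole post document is stored contiguously.
function seeksWhenEmbedded() {
  return 1;
}

const commentCount = 5;
console.log(seeksWhenSeparate(commentCount) * SEEK_MS, "ms of seek time when stored separately");
console.log(seeksWhenEmbedded() * SEEK_MS, "ms of seek time when stored together");
```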
49. • High Availability
• Data Redundancy
• Increase capacity with no downtime
• Transparent to the application
50. • A cluster of N servers
• Any (one) node can be primary
• All writes to primary
• Reads go to primary (default), optionally to a secondary
• Consensus election of primary
• Automatic failover
• Automatic recovery
[Diagram: Node 1, Node 2, Node 3; one node is marked Primary; caption "Pick me!"]
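The read and write rules above can be modeled in a few lines. This is a toy model, not MongoDB's election protocol: the class `ReplicaSetModel`, its methods, and the rule "the first healthy node becomes primary if a majority of nodes is healthy" are assumptions made purely for illustration.

```javascript
// Toy model of a replica set (illustration only, not the real protocol).
class ReplicaSetModel {
  constructor(names) {
    this.nodes = names.map((name) => ({ name, healthy: true }));
    this.primary = names[0]; // arbitrary starting primary
  }

  // All writes go to the primary.
  writeTarget() {
    return this.primary;
  }

  // Reads go to the primary by default, optionally to a secondary.
  readTarget(preferSecondary = false) {
    if (preferSecondary) {
      const secondary = this.nodes.find((n) => n.healthy && n.name !== this.primary);
      if (secondary) return secondary.name;
    }
    return this.primary;
  }

  // Mark a node as failed and re-evaluate who is primary.
  fail(name) {
    const node = this.nodes.find((n) => n.name === name);
    if (!node) throw new Error(`unknown node: ${name}`);
    node.healthy = false;
    this.elect();
  }

  // Simplified election: requires a majority of healthy nodes.
  elect() {
    const healthy = this.nodes.filter((n) => n.healthy);
    if (healthy.length <= this.nodes.length / 2) {
      this.primary = null; // no majority, so no primary
      return;
    }
    const current = this.nodes.find((n) => n.name === this.primary);
    if (!current || !current.healthy) this.primary = healthy[0].name;
  }
}

const rs = new ReplicaSetModel(["node1", "node2", "node3"]);
console.log(rs.writeTarget(), rs.readTarget(true));
rs.fail("node1"); // automatic failover in the model
console.log(rs.writeTarget());
```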
51. Replica Sets
• High Availability/Automatic Failover
• Data Redundancy
• Disaster Recovery
• Transparent to the application
• Perform maintenance with no downtime
58. • Increase capacity with no downtime
• Transparent to the application
• Range based partitioning
• Partitioning and balancing is automatic
59. Key Range Key Range Key Range Key Range
min..25 26..50 51..75 76.. max
Primary Primary Primary Primary
Secondary Secondary Secondary Secondary
Secondary Secondary Secondary Secondary
60. Application
MongoS
Key Range Key Range Key Range Key Range
min..25 26..50 51..75 76.. max
Primary Primary Primary Primary
Secondary Secondary Secondary Secondary
Secondary Secondary Secondary Secondary
61. Application
MongoS MongoS MongoS
Key Range Key Range Key Range Key Range
min..25 26..50 51..75 76.. max
Primary Primary Primary Primary
Secondary Secondary Secondary Secondary
Secondary Secondary Secondary Secondary
62. Application Application Application Application
MongoS MongoS MongoS MongoS MongoS
Config Config Config
Key Range Key Range Key Range Key Range
min..25 26..50 51..75 76.. max
Primary Primary Primary Primary
Secondary Secondary Secondary Secondary
Secondary Secondary Secondary Secondary
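The routing step in these diagrams can be sketched as a lookup from shard key to key range. The ranges below are copied from the slides and treated as inclusive integer ranges; the shard names and the functions `routeByKey` and `shardsForRange` are hypothetical, and this is not the actual MongoS implementation.

```javascript
// Key ranges from the slides; shard names are hypothetical.
const chunks = [
  { min: -Infinity, max: 25, shard: "shard1" },
  { min: 26, max: 50, shard: "shard2" },
  { min: 51, max: 75, shard: "shard3" },
  { min: 76, max: Infinity, shard: "shard4" },
];

// Route a single integer key to the shard whose range contains it.
function routeByKey(key) {
  const chunk = chunks.find((c) => key >= c.min && key <= c.max);
  if (!chunk) throw new Error(`no range covers key ${key}`);
  return chunk.shard;
}

// A range query only needs the shards whose ranges overlap [low, high].
function shardsForRange(low, high) {
  return chunks.filter((c) => c.min <= high && c.max >= low).map((c) => c.shard);
}

console.log(routeByKey(42));
console.log(shardsForRange(20, 60));
```

Because the router owns this lookup, the application keeps issuing the same queries regardless of how many shards sit behind it.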
63. • Few configuration options
• Does the right thing out of the box
• Easy to deploy and manage
64. • Better data locality
• In-Memory Caching
• Auto-Sharding
• Read scaling
• Write scaling
[Chart labels: Relational, MongoDB]
"We just can't get any faster than the way MongoDB handles our data."
Tony Tam, CTO, Wordnik
65. • Supported platforms:
– Linux, Windows, Solaris, Mac OS X
– Packages available for all popular distributions
• No external/third-party software dependencies
• 10gen maintains drivers for over a dozen languages
66. • Content Management
• Operational Intelligence
• E-Commerce
• User Data Management
• High Volume Data Feeds