NoSQL: Introduction
Asya Kamsky
Intro to NoSQL and MongoDB

Published in: Technology

  1. NoSQL: Introduction (Asya Kamsky)
  4. • 1970s Relational Databases Invented
       – Storage is expensive
       – Data is normalized
       – Data storage is abstracted away from app
     • 1980s RDBMS commercialized
       – Client/Server model
       – SQL becomes the standard
     • 1990s Things begin to change
       – Client/Server => 3-tier architecture
       – Rise of the Internet and the Web
  6. • 2000s Web 2.0
       – Rise of "Social Media"
       – Acceptance of E-Commerce
       – Constant decrease of HW prices
       – Massive increase of collected data
     • Result
       – Constant need to scale dramatically
       – How can we scale?
  9.            Computers in 1985   Computers in 1995   Phone in 2012
     CPU        x286, 5-35 MHz      Pentium, 100 MHz    Dual core, 1.2 GHz
     Network    56 kbps             20-50 Mbps          WiFi 802.11n, 300+ Mbps
     RAM        64 KB               16 MB               1 GB
     Storage    10 MB HDD           200 MB HDD          48 GB SSD
  10.           Computers in 1985   Computers in 1995   Computers in 2012
      CPU       x286, 5-35 MHz      Pentium, 100 MHz    Dual core, 1.8 GHz
      Network   56 kbps             20-50 Mbps          WiFi 802.11n, 300+ Mbps; 180+ Gbps
      RAM       64 KB               16 MB               8 GB
      Storage   10 MB HDD           200 MB HDD          512 GB SSD
  12. • Agile Development Methodology
        • Shorter development cycles
        • Constant evolution of requirements
        • Flexibility at design time
      • Relational Schema
        • Hard to evolve
        • Long, painful migrations
        • Must stay in sync with application
        • Few developers interact directly
  13. BI / reporting (fewer issues here)
        + ad hoc queries
        + SQL standard protocol between clients and servers
        + scales horizontally better than oper dbs.
        - some scale limits at massive scale
        - schemas are rigid
        - no real time; great at bulk nightly data loads
      OLTP / operational (a lot more issues here)
        + complex transactions
        + tabular data
        + ad hoc queries
        - O<->R mapping hard
        - speed/scale problems
        - not super agile
  14. Same BI / reporting and OLTP / operational comparison as slide 13, overlaid with: caching, app layer partitioning, flat files, map/reduce
  19. • Horizontal scaling
      • Run anywhere
      • Flexible data model
      • Faster development
      • Low upfront cost
      • Low cost of ownership
  20. What is NoSQL? Relational vs. Non-Relational
  21. Scalable nonrelational ("nosql"), shown alongside BI / reporting and OLTP / operational:
        + speed and scale
        - ad hoc query limited
        - not very transactional
        - no sql/no standard
        + fits OO well
        + agile
  22. Non-relational, next-generation operational data stores and databases
      A collection of very different products
      • Different data models (not relational)
      • Most are not using SQL for queries
      • No predefined schema
      • Some allow flexible data structures
  26. Relational          Non-Relational
      Relational          Key-Value, Document, XML, Graph, Column
      ACID                BASE; some ACID properties
      Two-phase commit    Atomic transactions on document level
      Joins               No joins
  28. • Fits your use case
      • Reliability
      • Maintainability
      • Ease of Use
      • Scalability
      • Cost
  29. MongoDB: Introduction
  31. • Designed and developed by founders of DoubleClick, ShopWiki, Gilt Groupe, etc.
      • Goal: create a high-performance, fully consistent, horizontally scalable general-purpose data store
      • Coding started fall 2007
      • Open source: AGPL, written in C++
      • First production site March 2008: businessinsider.com
      • Currently version 2.2 (August 2012)
  32. MongoDB Design Goals
  34. • Document-oriented Storage
        • Based on JSON Documents
        • Data serialized to BSON
        • Flexible Schema
      • Scalable Architecture
        • Replication
        • High availability
        • Auto-sharding
      • Extensive use of memory mapped files
      • Durable
      • Strong Consistency
      • Key Features Include:
        • Full featured indexes
        • Ad-hoc Query Language
        • Interactive shell
        • Aggregation queries
        • Map/Reduce
  35. • Rich data models
      • Seamlessly map to native programming language types
      • Flexible for dynamic data
      • Better data locality
  36. Blogging website:
      • Register users
      • Users post blog entries
      • Comment on others' entries
      Considering: Tagging, Voting, ???
  37. [Diagram: relational schema with a join table]
  38. {
        _id : ObjectId("4e2e3f92268cdda473b628f6"),
        title : "My Very Important Thoughts",
        published : ISODate("2011-07-26T19:49:00.147Z"),
        author : { name : "Asya Kamsky", username : "asya" },
        text : "It was a long and stormy night ..."
      }
  39. {
        _id : ObjectId("4e2e3f92268cdda473b628f6"),
        title : "My Very Important Thoughts",
        published : ISODate("2011-07-26T19:49:00.147Z"),
        author : { name : "Asya Kamsky", username : "asya" },
        text : "It was a long and stormy night ...",
        tags : ["business", "news", "north america"]
      }
      > db.posts.ensureIndex( { tags : 1 } )
  40. Same document as slide 39, queried by tag:
      > db.posts.find( { tags : "news" } )
  41. Same query, with its plan (output cut off on the slide):
      > db.posts.find( { tags : "news" } ).explain()
      {
        "cursor" : "BtreeCursor tags_1",
        "isMultiKey" : true,
        "n" : 1,
        "nscannedObjects" : 1,
        "scanAndOrder" : false,
        "indexOnly" : false,
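The `"isMultiKey" : true` line in the explain output says the index on `tags` holds one entry per array element. A minimal sketch of that idea in plain JavaScript follows. It is an in-memory model for illustration only, not MongoDB's B-tree implementation; `buildMultikeyIndex`, `findByTag`, and the two extra sample posts are hypothetical.

```javascript
// Illustrative model of a multikey index: one index entry per array element.
// Function names and the extra sample posts are assumptions, not MongoDB internals.
const posts = [
  { _id: 1, title: "My Very Important Thoughts", tags: ["business", "news", "north america"] },
  { _id: 2, title: "Weekend Notes", tags: ["travel", "news"] },
  { _id: 3, title: "Untagged Draft", tags: [] },
];

// Map each distinct array value to the ids of the documents that contain it.
function buildMultikeyIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    for (const value of doc[field] ?? []) {
      if (!index.has(value)) index.set(value, []);
      index.get(value).push(doc._id);
    }
  }
  return index;
}

// Answer a query such as { tags : "news" } from the index, without scanning every document.
function findByTag(index, tag) {
  return index.get(tag) ?? [];
}

const tagsIndex = buildMultikeyIndex(posts, "tags");
console.log(findByTag(tagsIndex, "news"));
```

A post with three tags contributes three index entries, which is why a single-value query can match a document whose field is an array.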
  42. {
        _id : ObjectId("4e2e3f92268cdda473b628f6"),
        title : "My Very Important Thoughts",
        published : ISODate("2011-07-26T19:49:00.147Z"),
        author : { name : "Asya Kamsky", username : "asya" },
        text : "It was a long and stormy night ...",
        tags : ["business", "news", "north america"],
        votes : 3,
        voters : ["dmerr", "sj", "jane"]
      }
      > db.posts.update(
          { },   // query for documents to update
          { }    // update to perform
        )
  43. Same document as slide 42, with the query and update filled in:
      > db.posts.update(
          { _id : ..., voters : { $ne : "asya" } },
          { $push : { voters : "asya" }, $inc : { votes : 1 } }
        )
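The update on slide 43 only matches when "asya" is not yet in `voters`, then appends the voter and increments the count in the same single-document operation. Here is a rough plain-JavaScript stand-in for that logic on an in-memory object. `voteOnce` is a hypothetical helper, not a MongoDB API, and it does not model the server-side atomicity itself.

```javascript
// Hypothetical stand-in for:
//   update({ _id, voters: { $ne: user } }, { $push: { voters: user }, $inc: { votes: 1 } })
// It models the match-then-modify logic only, on a plain object.
function voteOnce(post, username) {
  // Query part: the document matches only if this user has not voted yet.
  if (post.voters.includes(username)) return false;
  // Update part: both changes belong to the same single-document update.
  post.voters.push(username);
  post.votes += 1;
  return true;
}

const post = { _id: "4e2e3f92268cdda473b628f6", votes: 3, voters: ["dmerr", "sj", "jane"] };
const firstAttempt = voteOnce(post, "asya");  // matches, so the vote is recorded
const secondAttempt = voteOnce(post, "asya"); // no match, so nothing changes
console.log(firstAttempt, secondAttempt, post.votes, post.voters);
```

Putting the "has not voted" check in the query rather than in application code is what keeps a repeated request from counting twice.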
  44. {
        _id : ObjectId("4e2e3f92268cdda473b628f6"),
        title : "My Very Important Thoughts",
        published : ISODate("2011-07-26T19:49:00.147Z"),
        author : { name : "Asya Kamsky", username : "asya" },
        text : "It was a long and stormy night ...",
        tags : ["business", "news", "north america"],
        votes : 4,
        voters : ["dmerr", "sj", "jane", "asya"],
        comments : [
          { by : "tim157", text : "great story", ... },
          { by : "gora", text : "i don’t think so", ... },
          { by : "dmerr", text : "also check out..." }
        ]
      }
  45. Same document as slide 44, with an index on the embedded comment authors:
      > db.posts.ensureIndex( { "comments.by" : 1 } )
  46. Same document, queried by comment author:
      > db.posts.find( { "comments.by" : "gora" } )
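The query `{ "comments.by" : "gora" }` uses dot notation to reach into an array of embedded documents. The sketch below shows how such a path can be resolved in plain JavaScript. `valuesAtPath` and `matches` are hypothetical helpers covering only equality matching, not the server's actual query engine.

```javascript
// Resolve a dotted path such as "comments.by", fanning out across arrays on the way.
// Hypothetical helpers for illustration; only simple equality matching is modeled.
function valuesAtPath(value, parts) {
  if (parts.length === 0) return Array.isArray(value) ? value : [value];
  if (Array.isArray(value)) return value.flatMap((item) => valuesAtPath(item, parts));
  if (value === null || typeof value !== "object") return [];
  return valuesAtPath(value[parts[0]], parts.slice(1));
}

// A document matches when any value reachable through the path equals the expected one.
function matches(doc, path, expected) {
  return valuesAtPath(doc, path.split(".")).includes(expected);
}

const blogPost = {
  tags: ["business", "news", "north america"],
  comments: [
    { by: "tim157", text: "great story" },
    { by: "gora", text: "i don't think so" },
    { by: "dmerr", text: "also check out..." },
  ],
};

console.log(matches(blogPost, "comments.by", "gora"));
console.log(matches(blogPost, "tags", "news"));
```

The same rule covers both the `tags` query and the `comments.by` query: an array field matches if any of its elements does.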
  47. Seek = 5+ ms; Read = really really fast
      [Diagram labels: Post, Comment, Author]
  48. Disk seeks and data locality
      [Diagram labels: Post, Author, and five Comments]
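The seek figure on slide 47 can be turned into a back-of-the-envelope comparison. The sketch assumes a worst case in which every separately stored record (the post, its author, each comment) costs one seek, while an embedded document costs a single seek. It ignores caching and read time, and the function names are made up for illustration.

```javascript
// Back-of-the-envelope model of the "Seek = 5+ ms" slide.
// Assumption: each separately stored record costs one disk seek; caching is ignored.
const SEEK_MS = 5; // lower bound quoted on the slide

// Post, author, and each comment stored as separate records.
function separateRecordsSeekMs(commentCount) {
  return (1 + 1 + commentCount) * SEEK_MS;
}

// Post with author and comments embedded in one document.
function embeddedDocumentSeekMs() {
  return 1 * SEEK_MS;
}

console.log(separateRecordsSeekMs(5), embeddedDocumentSeekMs());
```

With the five comments drawn on slide 48, that is seven seeks against one under these assumptions.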
  49. • High Availability
      • Data Redundancy
      • Increase capacity with no downtime
      • Transparent to the application
  50. • A cluster of N servers
      • Any (one) node can be primary
      • All writes to primary
      • Reads go to primary (default), optionally to a secondary
      • Consensus election of primary
      • Automatic failover
      • Automatic recovery
      [Diagram labels: Node 1, Node 2, Node 3, Primary, "Pick me!"]
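"Consensus election of primary" means a node needs votes from a majority of the set to become primary. A simplified sketch of that arithmetic follows. It assumes every member has one vote and ignores priorities and arbiters; `majority` and `canElectPrimary` are illustrative names, not MongoDB APIs.

```javascript
// Simplified majority rule for electing a primary.
// Assumption: every member has exactly one vote; priorities and arbiters are ignored.
function majority(memberCount) {
  return Math.floor(memberCount / 2) + 1;
}

// A primary can be elected only while a majority of members can reach each other.
function canElectPrimary(memberCount, reachableCount) {
  return reachableCount >= majority(memberCount);
}

console.log(majority(3), canElectPrimary(3, 2), canElectPrimary(3, 1));
```

This is why a three-node set survives the loss of one node, while a set cut down to a single reachable member cannot elect a primary.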
  51. Replica Sets
      • High Availability/Automatic Failover
      • Data Redundancy
      • Disaster Recovery
      • Transparent to the application
      • Perform maintenance with no down time
  52. Asynchronous Replication
  56. Automatic Election
  58. • Increase capacity with no downtime
      • Transparent to the application
      • Range based partitioning
      • Partitioning and balancing is automatic
  59. Four shards, each with one key range, one primary, and two secondaries:
      Key Range    Key Range    Key Range    Key Range
      min..25      26..50       51..75       76..max
      Primary      Primary      Primary      Primary
      Secondary    Secondary    Secondary    Secondary
      Secondary    Secondary    Secondary    Secondary
  60. Same four shards, with one Application connecting through one MongoS
  61. Same four shards, with one Application connecting through three MongoS processes
  62. Same four shards, with four Applications, multiple MongoS processes, and three Config servers
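Range based partitioning, as drawn on the shard diagrams, amounts to looking up which key range contains a given shard key. Here is a minimal routing sketch using the four ranges from the slides. It assumes integer keys, and the shard names and `routeKey` are hypothetical; a real MongoS consults chunk metadata held by the config servers.

```javascript
// Range table taken from the slides: min..25, 26..50, 51..75, 76..max.
// Shard names are hypothetical; integer shard keys are assumed.
const keyRanges = [
  { max: 25, shard: "shard0" },       // min..25
  { max: 50, shard: "shard1" },       // 26..50
  { max: 75, shard: "shard2" },       // 51..75
  { max: Infinity, shard: "shard3" }, // 76..max
];

// Return the shard whose range contains the key; ranges are sorted by upper bound.
function routeKey(key) {
  return keyRanges.find((range) => key <= range.max).shard;
}

console.log(routeKey(10), routeKey(26), routeKey(75), routeKey(1000));
```

The application never sees this table, which is what "transparent to the application" refers to on slide 58.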
  63. • Few configuration options
      • Does the right thing out of the box
      • Easy to deploy and manage
  64. [Diagram labels: Relational, MongoDB]
      • Better data locality
      • In-Memory Caching
      • Auto-Sharding
      • Read scaling
      • Write scaling
      "We just can't get any faster than the way MongoDB handles our data." Tony Tam, CTO, Wordnik
  65. • Supported Platforms:
        – Linux, Windows, Solaris, Mac OS X
        – Packages available for all popular distributions
      • No external/third party software dependencies
      • 10gen maintains drivers for over a dozen languages
  66. • Content Management
      • Operational Intelligence
      • E-Commerce
      • User Data Management
      • High Volume Data Feeds
  68. Open source, high performance database