Intro to NoSQL and MongoDB

Transcript

  • 1. NoSQL: Introduction (Asya Kamsky)
  • 2-4.
      • 1970s: Relational databases invented
          – Storage is expensive
          – Data is normalized
          – Data storage is abstracted away from the app
      • 1980s: RDBMS commercialized
          – Client/server model
          – SQL becomes the standard
      • 1990s: Things begin to change
          – Client/server => 3-tier architecture
          – Rise of the Internet and the Web
  • 5-6.
      • 2000s: Web 2.0
          – Rise of "Social Media"
          – Acceptance of E-Commerce
          – Constant decrease of HW prices
          – Massive increase of collected data
      • Result
          – Constant need to scale dramatically
          – How can we scale?
  • 7-10.
      • Computers in 1985: x286 at 5-35 MHz; 56 kbps; 64 KB RAM; 10 MB HDD
      • Computers in 1995: Pentium at 100 MHz; 20-50 Mbps; 16 MB RAM; 200 MB HDD
      • Phone in 2012: dual core at 1.2 GHz; WiFi 802.11n, 300+ Mbps; 1 GB RAM; 48 GB SSD
      • Computers in 2012: dual core at 1.8 GHz; WiFi 802.11n, 300+ Mbps; 180+ Gbps; 8 GB RAM; 512 GB SSD
  • 11-12.
      • Agile Development Methodology
          – Shorter development cycles
          – Constant evolution of requirements
          – Flexibility at design time
      • Relational Schema
          – Hard to evolve
          – Long, painful migrations
          – Must stay in sync with the application
          – Few developers interact directly
  • 13-14. Relational databases, by workload
      • BI / reporting
          + ad hoc queries
          + SQL standard protocol between clients and servers
          + scales horizontally better than operational DBs
          - some scale limits at massive scale
          - schemas are rigid
          - no real time; great at bulk nightly data loads
      • OLTP / operational
          + complex transactions
          + tabular data
          + ad hoc queries
          - O<->R mapping hard
          - speed/scale problems
          - not super agile
      • Annotations on slide 13: "a lot more issues here" and "fewer issues here"
      • Workarounds listed on slide 14: caching, app layer partitioning, flat files, map/reduce
  • 19.
      • Horizontal scaling
      • Run anywhere
      • Flexible data model
      • Faster development
      • Low upfront cost
      • Low cost of ownership
  • 20. What is NoSQL? Relational vs Non-Relational
  • 21. Scalable nonrelational ("nosql"), shown alongside BI / reporting and OLTP / operational
      + speed and scale
      + fits OO well
      + agile
      - ad hoc query limited
      - not very transactional
      - no SQL / no standard
  • 22. Non-relational, next generation operational data stores and databases
      A collection of very different products:
      • Different data models (not relational)
      • Most do not use SQL for queries
      • No predefined schema
      • Some allow flexible data structures
  • 23-26. Relational vs non-relational
      • Data model: Relational vs Key-Value, Document, XML, Graph, Column
      • Guarantees: ACID vs BASE, some ACID properties
      • Transactions: Two-phase commit vs atomic transactions on document level
      • Joins vs no joins
  • 28.
      • Fits your use case
      • Reliability
      • Maintainability
      • Ease of use
      • Scalability
      • Cost
  • 29. MongoDB: Introduction
  • 31.
      • Designed and developed by founders of Doubleclick, ShopWiki, GILT groupe, etc.
      • Goal: create a high performance, fully consistent, horizontally scalable, general purpose data store
      • Coding started in fall 2007
      • Open source: AGPL, written in C++
      • First production site: March 2008, businessinsider.com
      • Current version: 2.2 (August 2012)
  • 32. MongoDB Design Goals
  • 34.
      • Document-oriented Storage
          – Based on JSON documents
          – Data serialized to BSON
          – Flexible schema
      • Scalable Architecture
          – Replication
          – High availability
          – Auto-sharding
          – Extensive use of memory mapped files
          – Durable
          – Strong consistency
      • Key Features Include:
          – Full featured indexes
          – Ad-hoc query language
          – Interactive shell
          – Aggregation queries
          – Map/Reduce
  • 35.
      • Rich data models
      • Seamlessly map to native programming language types
      • Flexible for dynamic data
      • Better data locality
  • 36. Blogging website:
      • Register users
      • Users post blog entries
      • Comment on others' entries
      • Considering: tagging, voting, ???
  • 37. Diagram label: join table
  • 38.
      {
        _id : ObjectId("4e2e3f92268cdda473b628f6"),
        title : "My Very Important Thoughts",
        published : ISODate("2011-07-26T19:49:00.147Z"),
        author : { name : "Asya Kamsky", username : "asya" },
        text : "It was a long and stormy night ..."
      }
  • 39-41. Adding tags, indexing them, and querying by tag
      {
        _id : ObjectId("4e2e3f92268cdda473b628f6"),
        title : "My Very Important Thoughts",
        published : ISODate("2011-07-26T19:49:00.147Z"),
        author : { name : "Asya Kamsky", username : "asya" },
        text : "It was a long and stormy night ...",
        tags : ["business", "news", "north america"]
      }
      > db.posts.ensureIndex( { tags : 1 } )
      > db.posts.find( { tags : "news" } )
      > db.posts.find( { tags : "news" } ).explain()
      {
        "cursor" : "BtreeCursor tags_1",
        "isMultiKey" : true,
        "n" : 1,
        "nscannedObjects" : 1,
        "scanAndOrder" : false,
        "indexOnly" : false,
      (output truncated on the slide)
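The explain() output above reports "isMultiKey" : true, meaning the index on tags holds one entry per array element. A minimal sketch of that idea in plain JavaScript follows. It is not MongoDB's B-tree implementation; buildMultikeyIndex, the sample posts, and the choice to skip documents without the field are all assumptions made for illustration.

```javascript
// Hypothetical in-memory model of a multikey index (not MongoDB's B-tree).
// Each element of an array field gets its own index entry pointing at the document.
function buildMultikeyIndex(docs, field) {
  const index = new Map(); // indexed value -> array of matching _id values
  for (const doc of docs) {
    const value = doc[field];
    if (value === undefined) continue; // simplification: skip docs without the field
    const values = Array.isArray(value) ? value : [value];
    for (const v of values) {
      if (!index.has(v)) index.set(v, []);
      index.get(v).push(doc._id);
    }
  }
  return index;
}

// Sample data invented for this sketch.
const posts = [
  { _id: 1, title: "My Very Important Thoughts", tags: ["business", "news", "north america"] },
  { _id: 2, title: "Untagged draft" },
  { _id: 3, title: "Weekend recap", tags: ["sports", "news"] },
];

const tagsIndex = buildMultikeyIndex(posts, "tags");
const newsIds = tagsIndex.get("news") || [];
console.log(newsIds); // ids of the posts tagged "news"
```

A lookup by tag then touches only the entries for that tag instead of scanning every post, which is the effect the low "nscannedObjects" value reflects.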
  • 42-43. Voting: one update that records the voter and increments the count
      {
        _id : ObjectId("4e2e3f92268cdda473b628f6"),
        title : "My Very Important Thoughts",
        published : ISODate("2011-07-26T19:49:00.147Z"),
        author : { name : "Asya Kamsky", username : "asya" },
        text : "It was a long and stormy night ...",
        tags : ["business", "news", "north america"],
        votes : 3,
        voters : ["dmerr", "sj", "jane"]
      }
      > db.posts.update(
          { },   // query for documents to update
          { }    // update to perform
        )
      > db.posts.update(
          { _id : ..., voters : { $ne : "asya" } },
          { $push : { voters : "asya" }, $inc : { votes : 1 } }
        )
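The update on slide 43 only matches when "asya" is not yet in voters, so a user cannot vote twice. The sketch below models that guard in plain JavaScript on an in-memory object. voteOnce is a hypothetical helper, not a MongoDB API, and the sketch is single-threaded, so it illustrates the logic rather than the document-level atomicity the server provides.

```javascript
// Hypothetical model of the guarded update from slide 43.
function voteOnce(post, username) {
  // Mirrors the query condition voters: { $ne: username }.
  if (post.voters.includes(username)) {
    return false; // no document matched, so nothing changes
  }
  post.voters.push(username); // mirrors $push: { voters: username }
  post.votes += 1;            // mirrors $inc: { votes: 1 }
  return true;
}

const post = { votes: 3, voters: ["dmerr", "sj", "jane"] };
const firstAttempt = voteOnce(post, "asya");
const secondAttempt = voteOnce(post, "asya");
console.log(firstAttempt, secondAttempt, post.votes, post.voters);
```

Putting the guard in the query part of the update, rather than reading the document first and writing it back, keeps the check and the change in one operation.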
  • 44. After the vote, with embedded comments
      {
        _id : ObjectId("4e2e3f92268cdda473b628f6"),
        title : "My Very Important Thoughts",
        published : ISODate("2011-07-26T19:49:00.147Z"),
        author : { name : "Asya Kamsky", username : "asya" },
        text : "It was a long and stormy night ...",
        tags : ["business", "news", "north america"],
        votes : 4,
        voters : ["dmerr", "sj", "jane", "asya"],
        comments : [
          { by : "tim157", text : "great story", ... },
          { by : "gora", text : "i don’t think so", ... },
          { by : "dmerr", text : "also check out..." }
        ]
      }
  • 45-46. Indexing and querying a field inside embedded comments
      {
        _id : ObjectId("4e2e3f92268cdda473b628f6"),
        title : "My Very Important Thoughts",
        published : ISODate("2011-07-26T19:49:00.147Z"),
        author : { name : "Asya Kamsky", username : "asya" },
        text : "It was a long and stormy night ...",
        tags : ["business", "news", "north america"],
        votes : 4,
        voters : ["dmerr", "sj", "jane", "asya"],
        comments : [
          { by : "tim157", text : "great story" },
          { by : "gora", text : "i don’t think so" },
          { by : "dmerr", text : "also check out..." }
        ]
      }
      > db.posts.ensureIndex( { "comments.by" : 1 } )
      > db.posts.find( { "comments.by" : "gora" } )
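The query { "comments.by" : "gora" } reaches through the comments array and matches if any embedded comment has by equal to "gora". A minimal plain JavaScript sketch of that dotted-path matching is below. valuesAtPath and matchesEquality are hypothetical helpers, and the sketch covers only simple equality; the server's matching rules have more cases (for example numeric array positions) that are left out here.

```javascript
// Collect every value reachable through a dotted path, fanning out over arrays.
// Simplified illustration only; not MongoDB's query matcher.
function valuesAtPath(value, parts) {
  if (Array.isArray(value)) {
    return value.flatMap((item) => valuesAtPath(item, parts));
  }
  if (parts.length === 0) return [value];
  if (value === null || typeof value !== "object") return [];
  const [head, ...rest] = parts;
  return head in value ? valuesAtPath(value[head], rest) : [];
}

// True when any value found at the path equals the expected value.
function matchesEquality(doc, path, expected) {
  return valuesAtPath(doc, path.split(".")).includes(expected);
}

const post = {
  author: { name: "Asya Kamsky", username: "asya" },
  tags: ["business", "news", "north america"],
  comments: [
    { by: "tim157", text: "great story" },
    { by: "gora", text: "i don't think so" },
    { by: "dmerr", text: "also check out..." },
  ],
};

console.log(matchesEquality(post, "comments.by", "gora"));
console.log(matchesEquality(post, "author.username", "asya"));
console.log(matchesEquality(post, "tags", "news"));
```

The same helper handles the embedded author object, the tags array, and the array of comment objects, which is why one query syntax covers all three shapes.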
  • 47. Seek = 5+ ms; read = really really fast. Diagram: Post, Comment, Author
  • 48. Disk seeks and data locality. Diagram: Post, Author, and five Comments
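The arithmetic behind slides 47 and 48 can be sketched as follows. This is a rough worst-case model under stated assumptions: every separately stored record costs one seek, nothing is cached, and each seek costs the slide's lower bound of 5 ms. seekCostMs is a name made up for this sketch.

```javascript
// Lower bound taken from slide 47: "Seek = 5+ ms".
const SEEK_MS = 5;

// Assumption: each record stored at a separate disk location costs one seek.
function seekCostMs(recordsAtSeparateLocations) {
  return recordsAtSeparateLocations * SEEK_MS;
}

// Normalized layout from slide 48: 1 post + 1 author + 5 comments, stored apart.
const normalizedMs = seekCostMs(1 + 1 + 5);

// Embedded layout: the post, author, and comments live in one document.
const embeddedMs = seekCostMs(1);

console.log(normalizedMs, embeddedMs);
```

Under these assumptions the normalized layout pays at least 35 ms in seeks to assemble one page, while the single document pays at least 5 ms.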
  • 49.
      • High availability
      • Data redundancy
      • Increase capacity with no downtime
      • Transparent to the application
  • 50.
      • A cluster of N servers
      • Any (one) node can be primary
      • All writes go to the primary
      • Reads go to the primary (default), optionally to a secondary
      • Consensus election of primary
      • Automatic failover
      • Automatic recovery
      Diagram: Node 1, Node 2, Node 3, one of them marked Primary, with the caption "Pick me!"
  • 51. Replica Sets
      • High availability / automatic failover
      • Data redundancy
      • Disaster recovery
      • Transparent to the application
      • Perform maintenance with no downtime
  • 52-54. Asynchronous Replication
  • 56. Automatic Election
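The slides describe a "consensus election of primary" without giving the rule. A minimal sketch, assuming the usual majority rule (a primary needs votes from more than half of the voting members), is shown below. The function names are hypothetical, and the sketch ignores priorities, arbiters, and how up to date each member's data is.

```javascript
// Smallest number of votes that is more than half of the voting members.
function majority(totalVotingMembers) {
  return Math.floor(totalVotingMembers / 2) + 1;
}

// Assumption: a primary can be elected only if a majority of members can reach each other.
function canElectPrimary(totalVotingMembers, reachableMembers) {
  return reachableMembers >= majority(totalVotingMembers);
}

console.log(canElectPrimary(3, 2)); // three-node set with one node down
console.log(canElectPrimary(3, 1)); // three-node set with two nodes down
console.log(canElectPrimary(4, 2)); // four-node set split in half
```

Under this rule an even split cannot elect a primary on either side, which is one reason replica sets are commonly deployed with an odd number of voting members.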
  • 58.
      • Increase capacity with no downtime
      • Transparent to the application
      • Range-based partitioning
      • Partitioning and balancing are automatic
  • 59-62. Sharded cluster diagram (progressive build)
      • Slide 59: four shards by key range: min..25, 26..50, 51..75, 76..max; each shard has one Primary and two Secondaries
      • Slide 60: an Application connects through a MongoS
      • Slide 61: more MongoS instances are added
      • Slide 62: multiple Applications, multiple MongoS instances, and three Config servers
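Range-based partitioning, as drawn on slides 59 to 62, can be sketched as a lookup from a key to the shard that owns its range. The block below is a plain JavaScript illustration using the four ranges from the slides with integer keys and inclusive bounds. The shard names and routeToShard are invented for this sketch, and it does not reproduce how MongoS or the config servers actually store or balance ranges.

```javascript
// Key ranges copied from the slides; shard names are hypothetical.
const keyRanges = [
  { low: -Infinity, high: 25, shard: "shard0" },      // min..25
  { low: 26, high: 50, shard: "shard1" },             // 26..50
  { low: 51, high: 75, shard: "shard2" },             // 51..75
  { low: 76, high: Infinity, shard: "shard3" },       // 76..max
];

// Return the shard whose range contains the (integer) key.
function routeToShard(key) {
  const range = keyRanges.find((r) => key >= r.low && key <= r.high);
  if (!range) {
    throw new Error("no range covers key " + key);
  }
  return range.shard;
}

console.log(routeToShard(10), routeToShard(26), routeToShard(75), routeToShard(1000));
```

Because the application only asks the router where a key lives, ranges can be split or moved between shards without changing application code, which is the sense in which sharding is "transparent to the application".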
  • 63.
      • Few configuration options
      • Does the right thing out of the box
      • Easy to deploy and manage
  • 64. Relational vs MongoDB
      • Better data locality
      • In-memory caching
      • Auto-sharding
      • Read scaling
      • Write scaling
      "We just can't get any faster than the way MongoDB handles our data." (Tony Tam, CTO, Wordnik)
  • 65.
      • Supported platforms:
          – Linux, Windows, Solaris, Mac OS X
          – Packages available for all popular distributions
      • No external/third-party software dependencies
      • 10gen maintains drivers for over a dozen languages
  • 66.
      • Content Management
      • Operational Intelligence
      • E-Commerce
      • User Data Management
      • High Volume Data Feeds
  • 68. Open source, high performance database