Intro to NoSQL and MongoDB


  1. NoSQL: Introduction (Asya Kamsky)
  2. • 1970s: Relational databases invented – Storage is expensive – Data is normalized – Data storage is abstracted away from the app
  3. • 1970s: Relational databases invented – Storage is expensive – Data is normalized – Data storage is abstracted away from the app • 1980s: RDBMS commercialized – Client/server model – SQL becomes the standard
  4. • 1970s: Relational databases invented – Storage is expensive – Data is normalized – Data storage is abstracted away from the app • 1980s: RDBMS commercialized – Client/server model – SQL becomes the standard • 1990s: Things begin to change – Client/server => 3-tier architecture – Rise of the Internet and the Web
  5. • 2000s: Web 2.0 – Rise of "Social Media" – Acceptance of e-commerce – Constant decrease of HW prices – Massive increase of collected data
  6. • 2000s: Web 2.0 – Rise of "Social Media" – Acceptance of e-commerce – Constant decrease of HW prices – Massive increase of collected data • Result – Constant need to scale dramatically – How can we scale?
  7. Two database roles (diagram): BI/reporting and OLTP/operational. OLTP/operational: + complex transactions + tabular data + ad hoc queries – O<->R mapping hard – speed/scale problems – not super agile
  8. BI/reporting: + ad hoc queries + SQL standard protocol between clients and servers + scales horizontally better than operational DBs – some scale limits at massive scale – schemas are rigid – no real time; great at bulk nightly data loads
  9. Fewer issues on the BI/reporting side
  10. A lot more issues on the OLTP/operational side
  11. Common workarounds on the operational side: caching, app layer, flat files, partitioning, map/reduce
  12. • Agile development methodology • Shorter development cycles • Constant evolution of requirements • Flexibility at design time
  13. • Agile development methodology • Shorter development cycles • Constant evolution of requirements • Flexibility at design time • Relational schema • Hard to evolve • Long, painful migrations • Must stay in sync with the application • Few developers interact with it directly
  14-15. (image slides, no text)
  16. • Horizontal scaling • More real-time requirements • Faster development time • Flexible data model • Low upfront cost • Low cost of ownership
  17. What is NoSQL? Relational vs. Non-Relational
  18. Non-relational ("nosql") added to the picture, alongside BI/reporting and OLTP/operational: + speed and scale + fits OO well + scalable + agile – ad hoc queries limited – not very transactional – no SQL / no standard
  19. Non-relational, next-generation operational data stores and databases. A collection of very different products: • Different data models (not relational) • Most do not use SQL for queries • No predefined schema • Some allow flexible data structures
  20. Relational vs. non-relational: • Relational vs. Key-Value, Document, XML, Graph, Column
  21. • ACID vs. BASE
  22. • Two-phase commit vs. atomic transactions at the document level
  23. • Joins vs. no joins
  24. (image slide, no text)
  25. • Transaction rate • Reliability • Maintainability • Ease of use • Scalability • Cost
  26. MongoDB: Introduction
  27. • Designed and developed by founders of DoubleClick, ShopWiki, Gilt Groupe, etc. • Coding started fall 2007 • First production site March 2008: businessinsider.com • Open source (AGPL), written in C++ • Version 0.8: first official release, February 2009 • Version 1.0: August 2009 • Version 2.0: September 2011
  28. MongoDB: Design Goals
  29. (image slide, no text)
  30. • Document-oriented storage: based on JSON documents, flexible schema • Scalable architecture: auto-sharding, replication & high availability • Key features include: full-featured indexes, query language, map/reduce & aggregation
  31. • Rich data models • Seamlessly map to native programming-language types • Flexible for dynamic data • Better data locality
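A minimal mongo shell sketch of this flexibility (the events collection and its fields are hypothetical, not from the slides): documents with different shapes can live in the same collection, and nested values map directly to native types.

    // two differently shaped documents in one collection -- no schema migration needed
    db.events.insert({ type: "click", page: "/home", at: new Date() })
    db.events.insert({ type: "purchase", sku: "A-100", amount: 19.99,
                       coupon: { code: "SPRING", pct: 10 } })
    db.events.find({ type: "purchase" })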
  32. (image slide, no text)
  33. { _id: ObjectId("4e2e3f92268cdda473b628f6"), title: "Too Big to Fail", when: Date("2011-07-26"), author: "joe", text: "blah" }
  34. { _id: ObjectId("4e2e3f92268cdda473b628f6"), title: "Too Big to Fail", when: Date("2011-07-26"), author: "joe", text: "blah", tags: ["business", "news", "north america"] }
      > db.posts.find( { tags: "news" } )
  35. { _id: ObjectId("4e2e3f92268cdda473b628f6"), title: "Too Big to Fail", when: Date("2011-07-26"), author: "joe", text: "blah", tags: ["business", "news", "north america"], votes: 3, voters: ["dmerr", "sj", "jane"] }
  36. { _id: ObjectId("4e2e3f92268cdda473b628f6"), title: "Too Big to Fail", when: Date("2011-07-26"), author: "joe", text: "blah", tags: ["business", "news", "north america"], votes: 3, voters: ["dmerr", "sj", "jane"], comments: [ { by: "tim157", text: "great story" }, { by: "gora", text: "i don't think so" }, { by: "dmerr", text: "also check out..." } ] }
  37. { _id: ObjectId("4e2e3f92268cdda473b628f6"), title: "Too Big to Fail", when: Date("2011-07-26"), author: "joe", text: "blah", tags: ["business", "news", "north america"], votes: 3, voters: ["dmerr", "sj", "jane"], comments: [ { by: "tim157", text: "great story" }, { by: "gora", text: "i don't think so" }, { by: "dmerr", text: "also check out..." } ] }
      > db.posts.find( { "comments.by": "gora" } )
      > db.posts.ensureIndex( { "comments.by": 1 } )
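A small sketch of modifying this document in place (the voter name "kate" is made up): because an update applies atomically to a single document, the vote count and the voter list stay consistent without a transaction.

    // add a vote only if "kate" has not voted yet; $inc and $push apply together as one atomic document update
    db.posts.update(
      { _id: ObjectId("4e2e3f92268cdda473b628f6"), voters: { $ne: "kate" } },
      { $inc: { votes: 1 }, $push: { voters: "kate" } }
    )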
  38. Seek = 5+ ms; read = really, really fast (diagram: Post, Comment, Author stored in separate places)
  39. Disk seeks and data locality (diagram: a Post with its Author and Comments stored together)
  40. • Sophisticated secondary indexes • Dynamic queries • Sorting • Rich updates, upserts • Easy aggregation • Viable primary data store
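A brief mongo shell sketch of a few of these features, reusing the blog-post fields from the earlier slides (the exact queries are illustrative, not from the deck):

    // dynamic query with a sort and limit
    db.posts.find({ author: "joe" }).sort({ when: -1 }).limit(10)

    // upsert: modify the matching document, or insert one if none exists
    db.posts.update(
      { title: "Too Big to Fail" },
      { $set: { author: "joe" }, $inc: { votes: 1 } },
      { upsert: true }
    )

    // simple aggregations
    db.posts.count({ tags: "news" })
    db.posts.distinct("author")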
  41. • Scale linearly • High availability • Increase capacity with no downtime • Transparent to the application
  42. Replica Sets • High availability / automatic failover • Data redundancy • Disaster recovery • Transparent to the application • Perform maintenance with no downtime
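A minimal setup sketch, assuming three mongod processes are already running with --replSet rs0 on the hypothetical hosts db1-db3: initiate the set from the mongo shell and check its state.

    rs.initiate({
      _id: "rs0",                                  // replica set name, must match each mongod's --replSet
      members: [
        { _id: 0, host: "db1.example.net:27017" },
        { _id: 1, host: "db2.example.net:27017" },
        { _id: 2, host: "db3.example.net:27017" }
      ]
    })
    rs.status()   // shows member states and which node is currently primary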
  43-46. Asynchronous Replication (diagram sequence)
  47-48. Automatic Election (diagram)
  49. • Increase capacity with no downtime • Transparent to the application
  50. • Increase capacity with no downtime • Transparent to the application • Range-based partitioning • Partitioning and balancing are automatic
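A minimal sketch of turning this on from a mongos, assuming the shards are already running (the database name blog, the shard key, and the shard address are illustrative assumptions):

    sh.addShard("rs0/db1.example.net:27017")       // register a replica set as a shard
    sh.enableSharding("blog")                      // allow collections in this database to be sharded
    sh.shardCollection("blog.posts", { _id: 1 })   // range-partition blog.posts by _id
    sh.status()                                    // shows shards and how key ranges (chunks) are distributed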
  51. Write scalability (diagram): a single mongod owns key range 0..100
  52. Two mongods: key ranges 0..50 and 51..100
  53. Four mongods: key ranges 0..25, 26..50, 51..75, 76..100
  54. Each key range is served by a replica set (a primary and two secondaries)
  55. The application connects through a MongoS router, which directs requests to the right shard
  56. Multiple MongoS routers
  57. Three config servers added alongside the application, the MongoS routers, and the shards
  58. • Few configuration options • Does the right thing out of the box • Easy to deploy and manage
  59. MySQL vs. MongoDB:
      MySQL:
      START TRANSACTION;
      INSERT INTO contacts VALUES (NULL, 'joeblow');
      INSERT INTO contact_emails VALUES
        ( NULL, 'joe@blow.com', LAST_INSERT_ID() ),
        ( NULL, 'joseph@blow.com', LAST_INSERT_ID() );
      COMMIT;
      MongoDB:
      db.contacts.save( {
        userName: "joeblow",
        emailAddresses: [ "joe@blow.com", "joseph@blow.com" ] } );
  60. Same comparison, plus: • Native drivers for dozens of languages • Data maps naturally to OO data structures
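A small follow-up sketch (illustrative queries, not from the deck) showing how the embedded array is read back and indexed in the mongo shell:

    db.contacts.findOne({ userName: "joeblow" })
    db.contacts.find({ emailAddresses: "joseph@blow.com" })   // a query value matches any element of the array
    db.contacts.ensureIndex({ emailAddresses: 1 })            // multikey index: one index entry per address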
  61. MongoDB Usage Examples
  62. Content management, operational intelligence, e-commerce, user data management, high-volume data feeds
  63. Wordnik uses MongoDB as the foundation for its "live" dictionary, which stores its entire text corpus: 3.5T of data in 20 billion records.
      Problem: Analyze a staggering amount of data for a system built on a continuous stream of high-quality text pulled from online sources. Adding too much data too quickly resulted in outages; tables locked for tens of seconds during inserts. Initially launched entirely on MySQL but quickly hit performance roadblocks.
      Why MongoDB: Migrated 5 billion records in a single day with zero downtime. MongoDB powers every website request: 20M API calls per day. Ability to eliminate the memcached layer, creating a simplified system that required fewer resources and was less prone to error.
      Impact: Reduced code by 75% compared to MySQL. Fetch time cut from 400ms to 60ms. Sustained insert speed of 8k words per second, with frequent bursts of up to 50k per second. Significant cost savings and a 15% reduction in servers.
      "Life with MongoDB has been good for Wordnik. Our code is faster, more flexible and dramatically smaller. Since we don't spend time worrying about the database, we can spend more time writing code for our application." - Tony Tam, Vice President of Engineering and Technical Co-founder
  64. Intuit relies on a MongoDB-powered real-time analytics tool for small businesses to derive interesting and actionable patterns from their customers' website traffic.
      Problem: Intuit hosts more than 500,000 websites and wanted to collect and analyze data to recommend conversion and lead-generation improvements to customers. With 10 years' worth of user data, it took several days to process the information using a relational database.
      Why MongoDB: MongoDB's querying and map/reduce functionality could serve as a simpler, higher-performance solution than a complex Hadoop implementation. The strength of the MongoDB community.
      Impact: In one week Intuit was able to become proficient in MongoDB development. Developed application features more quickly for MongoDB than for relational databases. MongoDB was 2.5 times faster than MySQL.
      "We did a prototype for one week, and within one week we had made big progress. Very big progress. It was so amazing that we decided, 'Let's go with this.'" - Nirmala Ranganathan, Intuit
  65. Shutterfly uses MongoDB to safeguard more than six billion images for millions of customers in the form of photos and videos, and to turn everyday pictures into keepsakes.
      Problem: Managing 20TB of data (six billion images for millions of customers), partitioning by function. A home-grown key-value store on top of their Oracle database offered sub-par performance. The codebase for this hybrid store became hard to manage. High licensing and HW costs.
      Why MongoDB: JSON-based data structure. Provided Shutterfly with an agile, high-performance, scalable solution at a low cost. Works seamlessly with Shutterfly's services-based architecture.
      Impact: 500% cost reduction and 900% performance improvement compared to the previous Oracle implementation. Accelerated time-to-market for nearly a dozen projects on MongoDB. Improved performance by reducing average latency for inserts from 400ms to 2ms.
      The "really killer reason" for using MongoDB is its rich JSON-based data structure, which offers Shutterfly an agile approach to developing software. With MongoDB, the Shutterfly team can quickly develop and deploy new applications, especially Web 2.0 and social features. - Kenny Gorman, Director of Data Services
  66. (image slide, no text)
  67. Open source, high-performance database
