MongoDB Workshop

Workshop held at NYC Open Data Meetup

  1. MONGODB WORKSHOP { meetup: "NYC Open Data", presenters: ["Kannan Sankaran", "Roman Kubiak"], host: "Vivian", location: "ThoughtWorks", audience: "You guys" }
  2. MONGODB WORKSHOP { meetup: "NYC Open Data", presenters: ["Kannan Sankaran", "Roman Kubiak"], host: "Vivian is awesome, THANK YOU", location: "ThoughtWorks is awesome, THANK YOU", audience: "You guys are awesome, THANK YOU" }
  3. OUR TOPICS
      - OVERVIEW OF DATABASES
      - WHAT IS MONGODB?
      - MONGODB, NOSQL, AND RELATIONAL DATABASES
      - A PEEK AT MONGODB COMMANDS
      - SHARDING AND REPLICATION IN MONGODB
      - FUTURE OF MONGODB AND US
      - DEMO WORKSHOP
  4. MONGO PIE ARCHITECT
  5. OVERVIEW OF DATABASES
  6. ORGANIZING DATA: ROWS, COLUMNS, TABLES
  7. DATA SPREAD OUT IN VARIOUS TABLES
  8. DATA MAY BE RELATED
  9. DATABASES AND THEIR GROWTH
      - 1970s: RELATIONAL DATABASES (RDBMS) CREATED; STRUCTURED QUERY LANGUAGE (SQL) CREATED
      - 1980s: RDBMS CONTINUE TO BE POPULAR
      - 1990s: INTERNET ARRIVES; CLIENT/SERVER MODEL
      - 2000s: INTERNET GROWS; NoSQL DATABASES EMERGE
      - 2007: MONGODB CREATED
  10. WHAT IS NoSQL?
  11. A TWITTER HASHTAG: #nosql
  12. NOSQL GENERALLY REFERS TO DATABASES THAT DO NOT HAVE A FIXED ROW-COLUMN DATA ORGANIZATION STRUCTURE.
  13. WHAT IS MONGODB?
  14. A HUMONGOUS NoSQL DB
  15. A HUMONGOUS NoSQL DB WHERE DATA IS ORGANIZED INTO DOCUMENTS (NOT ROWS) AND COLLECTIONS (NOT TABLES)
  16. WHAT IS A DOCUMENT?
  17. A DOCUMENT IS LIKE A ROW… { _id: ObjectId("12AB34CD56EF"), name: "Ed Brown", orderDate: "2-1-2014" } (THE IDS ON THESE SLIDES ARE SHORTENED; A REAL OBJECTID IS 12 BYTES, SHOWN AS 24 HEX DIGITS.)
  18. …BUT IT IS MORE FLEXIBLE
      { _id: ObjectId("12AB34CD56EF"), name: "Ed Brown", orderDate: "2-1-2014", payments: { car: "100.50", hotel: "200" } }
      THAT LOOKS LIKE A DOCUMENT WITHIN ANOTHER DOCUMENT!
      { _id: ObjectId("12AB34CD56EF"), name: "Ed Brown", orderDate: "2-1-2014", payments: { car: "100.50", hotel: "200" }, tags: ["shirt", "tie"] }
      WHAT IS THIS? MULTIPLE VALUES WITHIN A COLUMN?
  19. HOW LARGE CAN THIS DOCUMENT BE? { _id: ObjectId("12AB34CD56EF"), name: "Ed Brown", orderDate: "2-1-2014", payments: { car: "100.50", hotel: "200" } … … … } UP TO 16 MB. LEO TOLSTOY'S 1,225-PAGE NOVEL, WAR AND PEACE, IS ONLY AROUND 3 MB, SO IT CAN FIT IN ONE DOCUMENT.
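
The 16 MB cap applies to a document's BSON encoding, not to its JSON text. The sketch below estimates that encoded size by following the BSON specification's layout, under stated assumptions: it handles only strings, numbers (counted as 8-byte doubles, which is how the mongo shell stores plain numbers by default), booleans, null, embedded documents, and arrays, and it leaves out ObjectId and Date values. `bsonSize`, `fitsInOneDocument`, and `MAX_BSON_BYTES` are hypothetical names, not a MongoDB API.

```javascript
// Hypothetical helper: BSON-encoded size in bytes of a plain JavaScript object,
// following the BSON spec layout for a subset of value types.
const MAX_BSON_BYTES = 16 * 1024 * 1024; // MongoDB's 16 MB per-document limit

const encoder = new TextEncoder();

// A BSON cstring is its UTF-8 bytes plus a 0x00 terminator.
function cstringBytes(s) {
  return encoder.encode(s).length + 1;
}

function valueBytes(value) {
  if (value === null) return 0; // null: type byte only, no payload
  if (typeof value === "boolean") return 1; // boolean: 1 byte
  if (typeof value === "number") return 8; // counted as a double: 8 bytes
  if (typeof value === "string") return 4 + cstringBytes(value); // int32 length + bytes + 0x00
  if (Array.isArray(value)) {
    // Arrays are encoded as documents whose keys are "0", "1", "2", ...
    const asDocument = {};
    value.forEach((item, i) => { asDocument[String(i)] = item; });
    return bsonSize(asDocument);
  }
  if (typeof value === "object" && Object.getPrototypeOf(value) === Object.prototype) {
    return bsonSize(value); // embedded document
  }
  throw new TypeError("value type not handled in this sketch");
}

function bsonSize(doc) {
  let size = 4 + 1; // int32 total length + trailing 0x00
  for (const [key, value] of Object.entries(doc)) {
    size += 1 + cstringBytes(key) + valueBytes(value); // type byte + key + value
  }
  return size;
}

function fitsInOneDocument(doc) {
  return bsonSize(doc) <= MAX_BSON_BYTES;
}

const order = { name: "Ed Brown", orderDate: "2-1-2014", tags: ["shirt", "tie"] };
console.log(bsonSize(order), fitsInOneDocument(order)); // 83 true
```

As a sanity check, the BSON specification's own example, `{"hello": "world"}`, comes out at 22 bytes with this layout.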
  20. ISN'T THAT JSON? WELL, ALMOST!
  21. WHAT IS JSON? JSON DOCUMENTS PASS BETWEEN A WEB SERVER AND A MONGODB DATABASE, E.G. { "make": "Chevy", "model": "Malibu", "year": 2014 } AND { "vehicle": "Chevy Malibu 2014", "price": { "min": 22340, "max": 29950 }, "citympg": 25 }
  22. WHAT IS JSON? JAVASCRIPT OBJECT NOTATION: NAME-VALUE PAIRS, E.G. { "vehicle": "car", "make": "Malibu", "color": "blue" } AND { "name": "Kannan", "gender": "male", "favorites": { "color": "blue" }, "interests": ["MongoDB", "R"] }
  23. MONGODB DOCUMENT { _id: ObjectId("12AB34CD56EF"), name: "Kannan", gender: "male", favorites: { color: "blue" }, interests: ["MongoDB", "R"], date: new Date() }
  24. WHAT IS A COLLECTION?
  25. A GROUP OF DOCUMENTS. THE DOCUMENTS IN A COLLECTION CAN BE:
      - SIMILAR: { _id: ObjectId("12AB34CD56EF"), name: "Ed Brown", orderDate: "2-1-2014" } { _id: ObjectId("78AB34CD56EF"), name: "Roman Ku", orderDate: "2-1-2014" } { _id: ObjectId("56AB34CD56EF"), name: "Eva Green", orderDate: "2-1-2014" }
      - DIFFERENT: { _id: ObjectId("34AB34CD56EF"), name: "Ed Brown", orderDate: "2-1-2014", tags: ["shirt", "tie"] } { _id: ObjectId("90AB34CD56EF"), name: "Roman Ku", orderDate: "2-1-2014", payments: { car: "100.50", hotel: "200" } } { _id: ObjectId("13AB34CD56EF"), name: "Eva Green", orderDate: "2-1-2014" }
      - VERY DIFFERENT: { _id: ObjectId("35AB34CD56EF"), name: "Ed Brown", orderDate: "2-1-2014" } { _id: ObjectId("79AB34CD56EF"), vehicle: "car", make: "Malibu", color: "blue" } { _id: ObjectId("57AB34CD56EF"), name: "Eva Green", orderDate: "2-1-2014", tags: ["shirt", "tie"] }
  26. MONGODB IS... A DOCUMENT-ORIENTED NOSQL DATABASE WHERE DATA CONSISTS OF DOCUMENTS STORED IN COLLECTIONS.
  27. MONGODB FEATURES
      - EASY TO LEARN
      - DYNAMIC QUERY LANGUAGE
          - SEARCH BY FIELDS, REGULAR EXPRESSIONS
          - USER-DEFINED JAVASCRIPT FUNCTIONS
          - AGGREGATION, INCLUDING MAP/REDUCE
      - INDEXING: SINGLE, COMPOUND, GEOSPATIAL
      - REPLICATION
      - LOAD BALANCING USING SHARDING
      - GRIDFS TO STORE FILES
  28. MONGODB USAGE
      - CONTENT MANAGEMENT SYSTEMS
      - E-COMMERCE WEBSITES
      - LOG DATA AND HIERARCHICAL AGGREGATION
      - REAL-TIME ANALYTICS
  29. MONGODB, NOSQL, AND RELATIONAL DATABASES
  30. DATABASE MANAGEMENT SYSTEMS (MOST SYSTEMS USE SOME FLAVOR OF SQL)
      - 1970s: BERKELEY INGRES, ORACLE
      - 1980s: INFORMIX, DB2, SYBASE, SQL SERVER
      - 1990s: MS ACCESS, POSTGRESQL, MYSQL
      - 2000s: NETEZZA, GREENPLUM, VERTICA, MARIADB
      - 2007: MONGODB
  31. RELATIONAL DATABASES WERE, AND STILL ARE, THE DE FACTO STANDARD IN MANY COMPANIES.
  32. RELATIONAL DATABASE FEATURES
      - C.R.U.D. OPERATIONS
      - STRUCTURED QUERY LANGUAGE (SQL)
      - FIXED DATABASE SCHEMAS
      - NORMALIZATION
      - REFERENTIAL INTEGRITY (E.G. FOREIGN KEYS, CONSTRAINTS)
      - JOINS
      - TRANSACTIONS: A.C.I.D. PROPERTIES
      - INDEXES
  33. IN THE LATE 90s/EARLY 2000s…
      - DOT-COM BUBBLE, THEN DOT-COM BUST
      - WEB SERVICES, SOCIAL NETWORKS, GOOGLE, AMAZON
      - GROWING NUMBERS OF COMPUTER OWNERS/USERS, MORE WEBSITE DATA COLLECTION, LARGER DATABASE SIZES
  34. COMPUTING/STORAGE RESOURCES BECAME A CHALLENGE FOR THEN-SMALLER COMPANIES, SUCH AS GOOGLE AND AMAZON, THAT HAD LOTS OF DATA.
  35. SCALE UP VS. SCALE OUT
      - SCALE UP: A BIGGER MACHINE WITH MORE DISK SPACE, MORE RAM, AND MORE PROCESSORS. MORE EXPENSIVE; A SINGLE POINT OF FAILURE; HARDWARE HAS LIMITS!
      - SCALE OUT: SMALLER MACHINES, EACH WITH LESS DISK SPACE, LESS RAM, AND FEWER PROCESSORS. LESS EXPENSIVE; NO SINGLE POINT OF FAILURE; HIGHER RELIABILITY DESPITE FAILURE OF INDIVIDUAL MACHINES.
  36. RELATIONAL DATABASES WERE DESIGNED TO OPERATE ON A SINGLE MACHINE, SO SCALING THEM OUT POSED MANY CHALLENGES.
  37. SPLITTING DATA TO SCALE OUT: BY COLUMNS OR BY ROWS
  38. WORDPRESS MYSQL SCHEMA WITH 2 TABLES
  39. A JOIN QUERY IN MYSQL ON WP_POSTS AND WP_COMMENTS: SELECT p.post_author, p.post_date, c.comment_author, c.comment_date FROM wp_posts AS p INNER JOIN wp_comments AS c ON p.ID = c.comment_post_ID WHERE p.ID = 1;
  40. A JOIN QUERY IN MYSQL ON WP_POSTS AND WP_COMMENTS: RESULT
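
To see what the join above actually computes, here is a minimal in-memory sketch of it written as a hash join, which is one common join strategy (MySQL may well choose a different plan). The rows reuse the post and comments that appear later in the deck on the "stacking the data" slide; the second post and the names `wpPosts`, `wpComments`, and `joinPostComments` are hypothetical. Note that the probe step needs both tables at hand: once rows live on different machines, each lookup can turn into a network round trip, which is the distributed-join problem the following slides describe.

```javascript
// Sample rows (illustrative), mirroring the wp_posts / wp_comments columns used in the query.
const wpPosts = [
  { ID: 1, post_author: "Amy W", post_date: "1/1/2014" },
  { ID: 2, post_author: "Bob K", post_date: "1/5/2014" }, // hypothetical second post
];
const wpComments = [
  { comment_post_ID: 1, comment_author: "bestguy", comment_date: "1/1/2014" },
  { comment_post_ID: 1, comment_author: "baddie", comment_date: "1/10/2014" },
  { comment_post_ID: 1, comment_author: "clever24", comment_date: "1/11/2014" },
];

// Hash join: build a Map from post ID to post, then probe it with each comment.
function joinPostComments(posts, comments, postId) {
  const postsById = new Map(posts.map((p) => [p.ID, p]));
  const rows = [];
  for (const c of comments) {
    const p = postsById.get(c.comment_post_ID); // ON p.ID = c.comment_post_ID
    if (p === undefined || p.ID !== postId) continue; // INNER JOIN + WHERE p.ID = postId
    rows.push({
      post_author: p.post_author,
      post_date: p.post_date,
      comment_author: c.comment_author,
      comment_date: c.comment_date,
    });
  }
  return rows;
}

console.log(joinPostComments(wpPosts, wpComments, 1)); // three rows, one per comment on post 1
```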
  41. SCALING OUT DATA BY ROWS: THE ROWS OF WP_POSTS AND WP_COMMENTS ARE SPLIT ACROSS SEVERAL MACHINES (A, B, C, D)
  42. HOW COMPLICATED WOULD SCALING THIS BE?
  43. JOINS MAY GET REALLY MESSY WITH MANY MACHINES (DISTRIBUTED JOINS)
  44. TRANSACTIONS MUST SATISFY A.C.I.D. PROPERTIES. DELETING A POST AND ITS COMMENTS (WP_POSTS, WP_COMMENTS) MUST SUCCEED OR FAIL AS A UNIT:
      START TRANSACTION;
      DELETE FROM wp_comments WHERE comment_post_ID = 1;
      DELETE FROM wp_posts WHERE ID = 1;
      COMMIT;  -- if either DELETE fails, issue ROLLBACK instead of COMMIT
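
To make "succeed or fail as a unit" concrete, here is a toy sketch in plain JavaScript: both deletes run against a private copy of the data, and that copy replaces the original only if every step succeeds. It illustrates atomicity under simplifying assumptions and is not how MySQL implements transactions (real engines use logs and locks rather than whole copies). `runTransaction`, `deleteComments`, `deletePost`, and `makeDb` are hypothetical.

```javascript
// Toy "database" holding the two WordPress tables as arrays of rows.
function makeDb() {
  return {
    wp_posts: [{ ID: 1, post_author: "Amy W" }],
    wp_comments: [
      { comment_post_ID: 1, comment_author: "bestguy" },
      { comment_post_ID: 1, comment_author: "baddie" },
    ],
  };
}

// Run every step on a deep copy; publish the copy only if no step throws.
function runTransaction(db, steps) {
  const draft = JSON.parse(JSON.stringify(db)); // rows are plain JSON data
  try {
    for (const step of steps) step(draft);
  } catch (err) {
    return { committed: false, db, error: err.message }; // ROLLBACK: original untouched
  }
  return { committed: true, db: draft }; // COMMIT
}

const deleteComments = (postId) => (db) => {
  db.wp_comments = db.wp_comments.filter((c) => c.comment_post_ID !== postId);
};
const deletePost = (postId) => (db) => {
  const before = db.wp_posts.length;
  db.wp_posts = db.wp_posts.filter((p) => p.ID !== postId);
  if (db.wp_posts.length === before) throw new Error("no post with ID " + postId);
};

const ok = runTransaction(makeDb(), [deleteComments(1), deletePost(1)]);
console.log(ok.committed, ok.db.wp_comments.length); // true 0

const failed = runTransaction(makeDb(), [deleteComments(1), deletePost(99)]);
console.log(failed.committed, failed.db.wp_comments.length); // false 2 (comments survive)
```

In the failing run, the comments were already deleted in the draft when the second step threw, yet the returned data still has both comments: that is the rollback.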
  45. TRANSACTIONS MAY TAKE A LONG TIME TO EXECUTE IF DATA IS ON DIFFERENT MACHINES (DISTRIBUTED TRANSACTIONS)
  46. SPLITTING THE DATA IN A RELATIONAL DATABASE FORCES A WHOLE BUNCH OF COMPROMISES
  47. THIS GAVE RISE TO NON-RELATIONAL SOLUTIONS
  48. GOOGLE, AMAZON
  49. NoSQL SYSTEM CHARACTERISTICS, COMPARED WITH RELATIONAL DATABASES
      - KEPT: C.R.U.D. OPERATIONS; INDEXES
      - RELAXED: TRANSACTIONS, WITH LIMITED A.C.I.D. PROPERTIES
      - TYPICALLY GIVEN UP: STRUCTURED QUERY LANGUAGE (SQL); FIXED DATABASE SCHEMAS; NORMALIZATION; REFERENTIAL INTEGRITY (E.G. FOREIGN KEYS, CONSTRAINTS); JOINS
      - ALSO: OPEN SOURCE
  50. HOW IS THIS SCALABILITY ACHIEVED IN MONGODB?
  51. STACKING THE DATA
  52. STACKING THE DATA: THE WP_COMMENTS ROWS GO INSIDE THE WP_POSTS DOCUMENT, SO THERE IS NO NEED TO JOIN. { _id: 1, post_author: "Amy W", post_date: "1/1/2014", comments: [{ comment_author: "bestguy", comment_date: "1/1/2014" }, { comment_author: "baddie", comment_date: "1/10/2014" }, { comment_author: "clever24", comment_date: "1/11/2014" }] }
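
One way to produce such "stacked" documents from the two relational tables is to group the comment rows by post and nest them. Here is a minimal sketch in plain JavaScript, assuming the rows are already loaded into arrays; `stackComments` is a hypothetical helper, and a real migration would then insert the resulting documents into a collection.

```javascript
const posts = [{ ID: 1, post_author: "Amy W", post_date: "1/1/2014" }];
const comments = [
  { comment_post_ID: 1, comment_author: "bestguy", comment_date: "1/1/2014" },
  { comment_post_ID: 1, comment_author: "baddie", comment_date: "1/10/2014" },
  { comment_post_ID: 1, comment_author: "clever24", comment_date: "1/11/2014" },
];

// Nest each post's comments inside the post, producing one document per post.
function stackComments(postRows, commentRows) {
  // Group comment rows by the post they belong to (the old foreign key).
  const byPost = new Map();
  for (const c of commentRows) {
    if (!byPost.has(c.comment_post_ID)) byPost.set(c.comment_post_ID, []);
    byPost.get(c.comment_post_ID).push({
      comment_author: c.comment_author,
      comment_date: c.comment_date,
    });
  }
  // The post's ID becomes the document's _id; the foreign-key column disappears.
  return postRows.map((p) => ({
    _id: p.ID,
    post_author: p.post_author,
    post_date: p.post_date,
    comments: byPost.get(p.ID) || [],
  }));
}

console.log(JSON.stringify(stackComments(posts, comments)));
```

Reading a post together with its comments is now a single document lookup, and because each document is self-contained it can be placed on any machine.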
  53. NOW, EACH DOCUMENT CAN BE ON A DIFFERENT MACHINE
  54. WHAT ABOUT TRANSACTIONS?
  55. MONGODB DOES NOT SUPPORT MULTI-DOCUMENT TRANSACTIONS (MULTI-DOCUMENT ACID TRANSACTIONS WERE ADDED LATER, IN MONGODB 4.0)
  56. BUT A SINGLE-DOCUMENT UPDATE IS ATOMIC: { _id: 1, post_author: "Amy W", post_date: "1/1/2014", comments: [{ comment_author: "bestguy", comment_date: "1/1/2014" }, { comment_author: "baddie", comment_date: "1/10/2014" }, { comment_author: "clever24", comment_date: "1/11/2014" }] }
  57. THE KEY IS TO FOCUS ON THE DATA MODEL
  58. MONGODB CHARACTERISTICS, COMPARED WITH RELATIONAL DATABASES
      - C.R.U.D. OPERATIONS
      - DYNAMIC QUERY LANGUAGE INSTEAD OF STRUCTURED QUERY LANGUAGE (SQL)
      - FLEXIBLE DATABASE SCHEMAS INSTEAD OF FIXED DATABASE SCHEMAS
      - NO JOINS, NO ENFORCED REFERENTIAL INTEGRITY (E.G. FOREIGN KEYS, CONSTRAINTS), AND LESS EMPHASIS ON NORMALIZATION
      - TRANSACTIONS: LIMITED A.C.I.D. PROPERTIES
      - INDEXES
      - OPEN SOURCE
  59. WHEN NOT TO USE MONGODB
      - IF TRANSACTIONS ARE A MUST
      - IF JOINS ARE ABSOLUTELY NECESSARY
      - FOR SOFTWARE PRODUCTS, LIKE WORDPRESS, THAT ALREADY HAVE EXTENSIVE SUPPORT FOR RELATIONAL DATABASES
  60. FOR MONGODB VS. MYSQL ARGUMENTS, WATCH… Source: http://www.youtube.com/watch?v=b2F-DItXtZs
  61. A PEEK AT MONGODB COMMANDS
  62. MONGODB IS A DOCUMENT-ORIENTED DATABASE { _id: ObjectId("A1234566789"), name: "Ed Brown", orderDate: "2-1-2014" } { _id: ObjectId("A1234566789"), name: "Roman Ku", orderDate: "1-1-2014" } { _id: ObjectId("A1234566789"), name: "Eva Green", orderDate: "10-12-2013" } DOCUMENTS ARE INTERNALLY STORED AS BSON (BINARY JSON).
  63. MONGODB FEATURES
      - EASY TO LEARN
      - DYNAMIC QUERY LANGUAGE
          - SEARCH BY FIELDS, REGULAR EXPRESSIONS
          - USER-DEFINED JAVASCRIPT FUNCTIONS
          - AGGREGATION, INCLUDING MAP/REDUCE
      - INDEXING: SINGLE, COMPOUND, GEOSPATIAL
      - REPLICATION
      - LOAD BALANCING USING SHARDING
      - GRIDFS TO STORE FILES
  64. MONGODB SYNTAX SEEMS TO BE BORROWED FROM…
      - MYSQL
      - JSON
      - JAVASCRIPT
      - UNIX
  65. MONGODB SUPPORTS SEVERAL LANGUAGES, WITH DRIVERS AND CONNECTORS FOR
      - PYTHON
      - NODE.JS
      - C#
      - HADOOP
      - R
      - AND MANY MORE
  66. MONGODB TERMINOLOGY
      RDBMS       MONGODB
      DATABASE    DATABASE
      TABLE       COLLECTION
      ROW         DOCUMENT
      A DATABASE CAN HAVE 1 OR MORE COLLECTIONS. A COLLECTION CAN HAVE 1 OR MORE DOCUMENTS. A DOCUMENT CAN HAVE 1 OR MORE NAME-VALUE PAIRS, AND/OR 1 OR MORE EMBEDDED DOCUMENTS.
  67. MONGODB SUPPORTS SEVERAL DATA TYPES: STRING, NUMBER, BOOLEAN, ARRAY, DATE, EMBEDDED DOCUMENT, NULL
  68. MONGODB OPERATIONS, C.R.U.D.: CREATE, READ, UPDATE, DELETE
  69. CONNECTING TO MONGODB: CLIENTS SUCH AS THE MONGO SHELL (MONGO) AND ROBOMONGO CONNECT TO THE MONGOD SERVER. THE MONGO SHELL IS A JAVASCRIPT INTERPRETER; ROBOMONGO HAS THE SAME JAVASCRIPT ENGINE AS THE MONGO SHELL.
  70. IMPORT JSON TO A MONGO COLLECTION: mongoimport -d tennis -c ParksNYC --type json --drop < ParksNYC.json
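  71. CREATE COLLECTION
      SQL (SQL SERVER DIALECT, AS IN THE FOLLOWING EXAMPLES): CREATE TABLE ParksNYC ( id int identity(1, 1), Prop_ID varchar(10), Name varchar(50) not null, Location varchar(20) not null, EstablishedOn datetime )
      MONGODB: NO SCHEMA TO DECLARE; THE COLLECTION IS CREATED AUTOMATICALLY ON THE FIRST INSERT (OR EXPLICITLY WITH db.createCollection("ParksNYC"))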
  72. CREATE DOCUMENT
      SQL: INSERT ParksNYC (Prop_ID, Name, Location, EstablishedOn) VALUES ('Q900', 'Ridge Park', '1843 Norman St.', '1/1/1970')
      RESULTING ROW: Prop_ID = Q900, Name = Ridge Park, Location = 1843 Norman St., EstablishedOn = 1/1/1970
      MONGODB: db.ParksNYC.insert( { Prop_ID: "Q900", Name: "Ridge Park", Location: "1843 Norman St.", EstablishedOn: "1/1/1970" } )
  73. READ ALL DOCUMENTS
      SQL: SELECT * FROM ParksNYC
      MONGODB: db.ParksNYC.find()
  74. READ SPECIFIC DOCUMENT
      SQL: SELECT * FROM ParksNYC WHERE Name = 'Ridge Park'
      MONGODB: db.ParksNYC.find( { Name: "Ridge Park" } )
  75. READ FIRST DOCUMENT
      SQL: SELECT TOP 1 * FROM ParksNYC
      MONGODB: db.ParksNYC.findOne()
  76. READ SPECIFIC FIELDS IN DOCUMENT
      SQL: SELECT id, Name FROM ParksNYC
      MONGODB: db.ParksNYC.find( { }, { _id: 1, Name: 1 } )
  77. READ DOCUMENTS WITH RANGE CRITERIA
      SQL: SELECT id, Name FROM ParksNYC WHERE Courts > 5 AND Courts <= 8
      MONGODB: db.ParksNYC.find( { Courts: { $gt: 5, $lte: 8 } }, { _id: 1, Name: 1 } )
  78. READ DOCUMENTS WHOSE NAME STARTS WITH A LETTER (REGULAR EXPRESSION)
      SQL: SELECT id, Name FROM ParksNYC WHERE Name LIKE 'F%'
      MONGODB: db.ParksNYC.find( { Name: /^F/ }, { _id: 1, Name: 1 } )
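
How does a filter document such as `{ Courts: { $gt: 5, $lte: 8 } }` select documents? A toy matcher in plain JavaScript makes the rules visible: every field condition must hold, every operator inside an operator document must hold, and a regular expression matches string values. This is a simplified sketch that supports only equality, `$gt`, `$gte`, `$lt`, `$lte`, and regexes; it is not MongoDB's implementation and ignores arrays, cross-type ordering, and dotted paths. `matches`, `findIn`, and the park data (apart from "Ridge Park") are hypothetical.

```javascript
// Comparison operators supported by this sketch.
const OPERATORS = {
  $gt: (value, arg) => value > arg,
  $gte: (value, arg) => value >= arg,
  $lt: (value, arg) => value < arg,
  $lte: (value, arg) => value <= arg,
};

// True if the document satisfies every condition in the filter (implicit AND).
function matches(doc, filter) {
  return Object.entries(filter).every(([field, condition]) => {
    const value = doc[field];
    if (condition instanceof RegExp) {
      return typeof value === "string" && condition.test(value);
    }
    if (condition !== null && typeof condition === "object") {
      // Operator document, e.g. { $gt: 5, $lte: 8 }: all operators must hold.
      return Object.entries(condition).every(([op, arg]) => {
        if (!Object.prototype.hasOwnProperty.call(OPERATORS, op)) {
          throw new Error("operator not supported in this sketch: " + op);
        }
        return value !== undefined && OPERATORS[op](value, arg);
      });
    }
    return value === condition; // plain equality, e.g. { Name: "Ridge Park" }
  });
}

function findIn(docs, filter = {}) {
  return docs.filter((doc) => matches(doc, filter));
}

// Hypothetical sample data; only "Ridge Park" comes from the slides.
const parks = [
  { _id: 1, Name: "Ridge Park", Courts: 6 },
  { _id: 2, Name: "Fern Park", Courts: 8 },
  { _id: 3, Name: "Falcon Field", Courts: 12 },
  { _id: 4, Name: "Oak Park", Courts: 5 },
];

// Range query: selects Ridge Park (6) and Fern Park (8).
console.log(findIn(parks, { Courts: { $gt: 5, $lte: 8 } }).map((p) => p.Name));
// Prefix regex: selects Fern Park and Falcon Field.
console.log(findIn(parks, { Name: /^F/ }).map((p) => p.Name));
```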
  79. UPDATE FIELD IN DOCUMENT
      SQL: UPDATE ParksNYC SET VisitDate = '1/1/2014'
      MONGODB: db.ParksNYC.update( { }, { $set: { VisitDate: "1/1/2014" } }, { multi: true } )
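
The `{ multi: true }` option matters here: SQL's UPDATE without a WHERE clause touches every row, but the legacy `update()` method modifies only the first matching document unless `multi` is set (newer shells also offer `updateOne()` and `updateMany()`). Below is a toy sketch of `$set` with and without `multi`, in plain JavaScript. `setFields` is hypothetical, and to stay short it takes a predicate function instead of a filter document.

```javascript
// Apply { $set: fields } to documents matching `predicate`.
// Mirrors the legacy update() default: stop after the first match unless multi is true.
function setFields(docs, predicate, fields, { multi = false } = {}) {
  let modified = 0;
  for (const doc of docs) {
    if (!predicate(doc)) continue;
    Object.assign(doc, fields); // $set adds missing fields and overwrites existing ones
    modified += 1;
    if (!multi) break;
  }
  return modified;
}

const makeParks = () => [{ Name: "Ridge Park" }, { Name: "Fern Park" }, { Name: "Oak Park" }];
const matchAll = () => true; // like the empty filter { }

const single = makeParks();
console.log(setFields(single, matchAll, { VisitDate: "1/1/2014" })); // 1

const many = makeParks();
console.log(setFields(many, matchAll, { VisitDate: "1/1/2014" }, { multi: true })); // 3
```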
  80. DELETE DOCUMENT
      SQL: DELETE FROM ParksNYC WHERE Name = 'Ridge Park'
      MONGODB: db.ParksNYC.remove( { Name: "Ridge Park" } )
  81. GROUP BY AND SUM
      SQL: SELECT Accessible, COUNT(*) AS Parks_Number, SUM(Courts) AS Courts_Number FROM ParksNYC GROUP BY Accessible
      MONGODB: db.ParksNYC.aggregate( [ { $group: { _id: "$Accessible", Parks_Number: { $sum: 1 }, Courts_Number: { $sum: "$Courts" } } } ] )
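
The `$group` stage can be spelled out in plain JavaScript: there is one accumulator per distinct `Accessible` value, `{ $sum: 1 }` adds one per document, and `{ $sum: "$Courts" }` adds each document's `Courts` value. The sketch below assumes documents shaped like the ParksNYC records; `groupByAccessible` and the sample data are hypothetical. It follows two MongoDB behaviours: documents missing the group field land under `null`, and the sum ignores non-numeric `Courts` values.

```javascript
function groupByAccessible(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const key = doc.Accessible ?? null; // _id: "$Accessible" (missing field -> null)
    if (!groups.has(key)) {
      groups.set(key, { _id: key, Parks_Number: 0, Courts_Number: 0 });
    }
    const group = groups.get(key);
    group.Parks_Number += 1; // { $sum: 1 } counts documents
    if (typeof doc.Courts === "number") group.Courts_Number += doc.Courts; // { $sum: "$Courts" }
  }
  return [...groups.values()];
}

const sample = [
  { Name: "Ridge Park", Accessible: "Y", Courts: 6 },
  { Name: "Fern Park", Accessible: "N", Courts: 8 },
  { Name: "Oak Park", Accessible: "Y", Courts: 4 },
];
console.log(groupByAccessible(sample));
```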
  82. SHARDING AND REPLICATION IN MONGODB
  83. EACH DOCUMENT CAN BE ON A DIFFERENT MACHINE
  84. HOW DOES MONGODB DO THIS?
  85. AUTO-SHARDING, PER COLLECTION
  86. MONGODB CLUSTER: CLIENTS CONNECT TO A MONGOS ROUTER, WHICH SENDS EACH REQUEST TO THE MONGOD INSTANCES (FOUR IN THE DIAGRAM) THAT HOLD THE DATA
  87. SHARDING STEPS
      1. ENABLE SHARDING ON THE DATABASE (sh.enableSharding() IN THE MONGO SHELL).
      2. PICK A SHARD KEY FROM THE COLLECTION AND SHARD ON IT (sh.shardCollection()). MAKE SURE THE KEY IS
         - INDEXED
         - SUFFICIENTLY UNIQUE, SO IT HAS A WIDE VARIETY OF DISTINCT VALUES
      3. SIT BACK AND RELAX. MONGODB WILL AUTOMATICALLY DO THE SHARDING.
  88. SHARDING THE WP_POSTS COLLECTION: ONE FIELD OF EACH DOCUMENT SERVES AS THE SHARD KEY. { _id: 1, post_author: "Amy W", post_date: "1/1/2014", comments: [{ comment_author: "bestguy", comment_date: "1/1/2014" }, { comment_author: "baddie", comment_date: "1/10/2014" }, { comment_author: "clever24", comment_date: "1/11/2014" }] }
  89. BREAKING THE USERS INTO CHUNKS BY SHARD-KEY RANGE: [$minKey, Abba1234], [Abba1235, CarlW], [CarlZ, FrankT], [FrankY, JackA], [JackB, LambV], [LambW, RobF], [RobG, TimA], [TimB, $maxKey]
  90. BREAKING THE RANGE INTO CHUNKS: THE CHUNKS ARE SPREAD ACROSS THREE SHARDS (SHARD0000, SHARD0001, SHARD0002), EACH A MONGOD. A MONGOS ROUTES EACH CLIENT REQUEST TO THE RIGHT SHARD; FOR EXAMPLE, SHARD0000 HOLDS [$minKey, Abba1234], [RobG, TimA], AND [LambW, RobF].
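
Finding the chunk for a given shard-key value is a range lookup over the sorted chunk boundaries, conceptually similar to how a mongos uses the cluster's chunk metadata. Below is a minimal sketch in plain JavaScript, under stated assumptions. The lower bounds come from the chunk list above, and each chunk runs from its lower bound up to the next chunk's lower bound. Only shard0000's three chunks follow the slide; the other shard assignments are made up for illustration. `routeToShard` is hypothetical, and plain string comparison stands in for MongoDB's BSON ordering.

```javascript
// Chunks sorted by lower bound. Each chunk covers [min, next chunk's min);
// the first chunk starts at $minKey and the last one runs to $maxKey.
// shard0000's chunks follow the slide; the other assignments are assumed.
const chunks = [
  { min: null, shard: "shard0000" }, // null stands in for $minKey
  { min: "Abba1235", shard: "shard0002" },
  { min: "CarlZ", shard: "shard0001" },
  { min: "FrankY", shard: "shard0002" },
  { min: "JackB", shard: "shard0002" },
  { min: "LambW", shard: "shard0000" },
  { min: "RobG", shard: "shard0000" },
  { min: "TimB", shard: "shard0001" },
];

// Binary search for the last chunk whose lower bound is <= key.
function routeToShard(key) {
  let lo = 0;
  let hi = chunks.length - 1;
  while (lo < hi) {
    const mid = Math.ceil((lo + hi) / 2); // always >= 1, so chunks[mid].min is a string
    if (chunks[mid].min <= key) lo = mid;
    else hi = mid - 1;
  }
  return chunks[lo].shard;
}

for (const user of ["Abba1234", "Dave", "Mary", "Zed"]) {
  console.log(user, "->", routeToShard(user));
}
```

Because the boundaries are sorted, each lookup costs O(log n) comparisons in the number of chunks, no matter how many documents the collection holds.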
  91. BENEFITS OF SHARDING
      1. INCREASES AVAILABLE MEMORY.
      2. REDUCES THE LOAD ON EACH SERVER.
      3. INCREASES HARD DISK SPACE.
      4. LOCATION-BASED SHARD KEYS CAN PUT DATA CLOSE TO THE USERS AND KEEP RELATED DATA TOGETHER.
  92. MASTER-SLAVE REPLICATION: A REPLICA SET OF THREE MONGOD INSTANCES, ONE MASTER AND TWO SLAVES (MONGODB CALLS THESE THE PRIMARY AND SECONDARIES); THE CLIENT TALKS TO THE MASTER.
  93. MASTER-SLAVE REPLICATION: IF THE MASTER FAILS, THE REMAINING MEMBERS HOLD AN ELECTION TO CHOOSE A NEW MASTER.
  94. MASTER-SLAVE REPLICATION: A REPLICA SET SHOULD HAVE A MINIMUM OF 3 MEMBERS.
  95. MASTER-SLAVE REPLICATION: A FORMER SLAVE IS NOW THE MASTER, AND THE CLIENT TALKS TO IT. REPLICATION SOLVES THE PROBLEM OF AVAILABILITY AND FAULT TOLERANCE.
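
Why a minimum of three members? A replica set can elect a new primary only while a strict majority of its voting members are up and can reach one another. The arithmetic is small enough to sketch in plain JavaScript (the function names are hypothetical): with 2 members, losing either one leaves no majority, while 3 members survive one failure.

```javascript
// Votes needed to elect a primary among n voting members: a strict majority.
function majority(n) {
  return Math.floor(n / 2) + 1;
}

// How many members can fail while the rest can still elect a primary.
function faultTolerance(n) {
  return n - majority(n);
}

for (const n of [2, 3, 4, 5]) {
  console.log(`${n} members: majority ${majority(n)}, tolerates ${faultTolerance(n)} failure(s)`);
}
```

A fourth member does not raise fault tolerance above that of three, which is why replica sets are usually given an odd number of voting members.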
  96. FUTURE OF MONGODB AND US
  97. COMPANIES USING MONGODB
  98. MONGODB WINS AWARD
  99. 36 MOST VALUABLE STARTUPS ON EARTH
  100. POLYGLOT PERSISTENCE: POSTGRESQL, RIAK, MONGODB, NEO4J, SQL SERVER, MYSQL, ORACLE, DREMEL, ? IT IS GOOD TO KNOW BOTH SQL AND NOSQL.
  101. WHAT WE DID NOT COVER: SECURITY, BACKUP/RECOVERY, DATA MODELING ARCHITECT
  102. THANK YOU VERY MUCH
  103. AND THANK YOU TO EVERYONE WHO HELPED US
      - DR. BILL HOWE, UNIVERSITY OF WASHINGTON
      - JASON CHEN, MONGODB RECRUITER
      - KRISTINA CHODOROW (DEFINITIVE GUIDE AUTHOR)
      - FRANCESCA KRIHELY (MONGODB COMMUNITY MANAGER)
      - DR. MARKUS SCHMIDBERGER, RMONGODB
      - JOHANNES BRANDSTETTER, MONGOSOUP (THE FIRST EUROPEAN PARTNER OF MONGODB TO PROVIDE MONGODB AS A SERVICE)
      - DR. RAMNATH VAIDYANATHAN, RCHARTS
  104. REFERENCES
      - MongoDB: http://www.mongodb.org
      - Book: MongoDB: The Definitive Guide, Kristina Chodorow
      - Book: NoSQL Distilled, Pramod J. Sadalage and Martin Fowler
      - NoSQL: http://en.wikipedia.org/wiki/NoSQL
      - MongoDB Use Cases: http://www.mongodb.com/use-cases
      - First NoSQL Meetup Notes: http://developer.yahoo.com/blogs/ydn/notes-nosql-meetup7663.html
      - Billion dollar club: http://graphics.wsj.com/billion-dollar-club/
      - Photos from Google
  105. DEMO
