MongoDB Workshop
 

Workshop held at NYC Open Data Meetup

    MongoDB Workshop: Presentation Transcript

    • MONGODB WORKSHOP { meetup: “NYC Open Data”, presenters: [“Kannan Sankaran”, “Roman Kubiak”], host: “Vivian”, location: “ThoughtWorks”, audience: “You guys” }
    • MONGODB WORKSHOP { meetup: “NYC Open Data”, presenters: [“Kannan Sankaran”, “Roman Kubiak”], host: “Vivian is awesome, THANK YOU”, location: “ThoughtWorks is awesome, THANK YOU”, audience: “You guys are awesome, THANK YOU” }
    • OUR TOPICS: OVERVIEW OF DATABASES; WHAT IS MONGODB?; MONGODB, NOSQL, AND RELATIONAL DATABASES; A PEEK AT MONGODB COMMANDS; SHARDING AND REPLICATION IN MONGODB; FUTURE OF MONGODB AND US; DEMO WORKSHOP
    • MONGO PIE ARCHITECT
    • OVERVIEW OF DATABASES
    • ORGANIZING DATA: ROWS, COLUMNS, TABLES
    • DATA SPREAD OUT IN VARIOUS TABLES
    • DATA MAY BE RELATED
    • DATABASES AND THEIR GROWTH: 1970s: RELATIONAL DATABASES (RDBMS) CREATED, STRUCTURED QUERY LANGUAGE (SQL) CREATED; 1980s: RDBMS CONTINUE TO BE POPULAR, CLIENT/SERVER MODEL; 1990s: INTERNET ARRIVES; 2000s: INTERNET GROWS, NoSQL DATABASES EMERGE; 2007: MONGODB CREATED
    • WHAT IS NoSQL?
    • A TWITTER HASHTAG #nosql
    • NOSQL GENERALLY REFERS TO DATABASES THAT DO NOT HAVE A FIXED ROW-COLUMN DATA ORGANIZATION STRUCTURE.
    • WHAT IS MONGODB?
    • A HUMONGOUS NoSQL DB
    • A HUMONGOUS NoSQL DB WHERE DATA IS ORGANIZED BY DOCUMENTS, NOT ROWS, AND COLLECTIONS, NOT TABLES
    • WHAT IS A DOCUMENT?
    • A DOCUMENT IS LIKE A ROW… { _id: ObjectID(“12AB34CD56EF”), name: “Ed Brown”, orderDate: “2-1-2014” }
    • …BUT IT IS MORE FLEXIBLE { _id: ObjectID(“12AB34CD56EF”), name: “Ed Brown”, orderDate: “2-1-2014”, payments: { car: “100.50”, hotel: “200” }, tags: [“shirt”, “tie”] } PAYMENTS: THAT LOOKS LIKE A DOCUMENT WITHIN ANOTHER DOCUMENT! TAGS: WHAT IS THIS? MULTIPLE VALUES WITHIN A COLUMN?
    • HOW LARGE CAN THIS DOCUMENT BE? { _id: ObjectID(“12AB34CD56EF”), name: “Ed Brown”, orderDate: “2-1-2014”, payments: { car: “100.50”, hotel: “200” }, … } UP TO 16 MB. LEO TOLSTOY’S 1,225-PAGE BOOK WAR AND PEACE CAN FIT IN 1 DOCUMENT, AS IT IS ONLY AROUND 3 MB.
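
The 16 MB limit can be checked roughly before a document is stored. The following is a minimal sketch in plain JavaScript (Node), not a MongoDB API: the helper names `approximateSizeInBytes` and `fitsInOneDocument` are hypothetical, and the JSON byte length is only a stand-in for the size of the stored binary form, which will differ somewhat.

```javascript
// Rough size check for a document before storing it.
// Assumption: the UTF-8 length of the JSON text approximates the stored size.
const MAX_DOCUMENT_BYTES = 16 * 1024 * 1024; // the 16 MB limit from the slide

function approximateSizeInBytes(doc) {
  // Buffer.byteLength counts UTF-8 bytes, not characters.
  return Buffer.byteLength(JSON.stringify(doc), "utf8");
}

function fitsInOneDocument(doc) {
  return approximateSizeInBytes(doc) <= MAX_DOCUMENT_BYTES;
}

const order = {
  name: "Ed Brown",
  orderDate: "2-1-2014",
  payments: { car: "100.50", hotel: "200" },
};

// Hypothetical 3 MB text field, standing in for a long book.
const book = { title: "War and Peace", text: "x".repeat(3 * 1024 * 1024) };

console.log(approximateSizeInBytes(order), fitsInOneDocument(order));
console.log(fitsInOneDocument(book));
```

A text field of about 3 MB stays well under the limit, which is the point the slide makes with the book example.
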
    • ISN’T THAT JSON? WELL, ALMOST!
    • WHAT IS JSON? DATA EXCHANGED BETWEEN A WEB SERVER AND A MONGODB DATABASE: { “make”: “Chevy”, “model”: “Malibu”, “year”: 2014 } AND { “vehicle”: “Chevy Malibu 2014”, “price”: { “min”: 22340, “max”: 29950 }, “citympg”: 25 }
    • WHAT IS JSON? JAVASCRIPT OBJECT NOTATION: NAME-VALUE PAIRS { vehicle: “car”, make: “Malibu”, color: “blue” } { name: “Kannan”, gender: “male”, favorites: { color: “blue” }, interests: [“MongoDB”, “R”] }
    • MONGODB DOCUMENT { _id: ObjectID(“12AB34CD56EF”), name: “Kannan”, gender: “male”, favorites: { color: “blue” }, interests: [“MongoDB”, “R”], date: new Date() }
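
One reason the answer is "almost" JSON is that a document can hold typed values such as dates, while JSON text has no date type. The sketch below is plain JavaScript with no database involved. It sends a document shaped like the one above through JSON and back to show the date turning into a string. The date value is a made-up example.

```javascript
// A document like the one on the slide, with a real Date value.
const doc = {
  name: "Kannan",
  gender: "male",
  favorites: { color: "blue" },
  interests: ["MongoDB", "R"],
  date: new Date("2014-02-01T00:00:00Z"), // hypothetical date, for illustration
};

// Serialize to JSON text and parse it back.
const roundTripped = JSON.parse(JSON.stringify(doc));

console.log(doc.date instanceof Date); // true
console.log(typeof roundTripped.date); // "string"
console.log(roundTripped.interests);   // arrays and nested objects survive
```

Embedded objects and arrays pass through JSON unchanged. Only the typed value loses its type.
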
    • WHAT IS A COLLECTION?
    • A GROUP OF DOCUMENTS THAT MAY BE SIMILAR, DIFFERENT, OR VERY DIFFERENT: { _id: ObjectID(“12AB34CD56EF”), name: “Ed Brown”, orderDate: “2-1-2014” } { _id: ObjectID(“78AB34CD56EF”), name: “Roman Ku”, orderDate: “2-1-2014” } { _id: ObjectID(“56AB34CD56EF”), name: “Eva Green”, orderDate: “2-1-2014” } { _id: ObjectID(“34AB34CD56EF”), name: “Ed Brown”, orderDate: “2-1-2014”, tags: [“shirt”, “tie”] } { _id: ObjectID(“90AB34CD56EF”), name: “Roman Ku”, orderDate: “2-1-2014”, payments: { car: “100.50”, hotel: “200” } } { _id: ObjectID(“13AB34CD56EF”), name: “Eva Green”, orderDate: “2-1-2014” } { _id: ObjectID(“35AB34CD56EF”), name: “Ed Brown”, orderDate: “2-1-2014” } { _id: ObjectID(“79AB34CD56EF”), vehicle: “car”, make: “Malibu”, color: “blue” } { _id: ObjectID(“57AB34CD56EF”), name: “Eva Green”, orderDate: “2-1-2014”, tags: [“shirt”, “tie”] }
    • MONGODB IS... A DOCUMENT-ORIENTED NOSQL DATABASE WHERE DATA CONSISTS OF DOCUMENTS STORED IN COLLECTIONS.
    • MONGODB FEATURES: EASY TO LEARN; DYNAMIC QUERY LANGUAGE (SEARCH BY FIELDS, REGULAR EXPRESSIONS; USER-DEFINED JAVASCRIPT FUNCTIONS; AGGREGATION, INCLUDING MAP/REDUCE); INDEXING (SINGLE, COMPOUND, GEOSPATIAL); REPLICATION; LOAD BALANCING USING SHARDING; GRIDFS TO STORE FILES
    • MONGODB USAGE: CONTENT MANAGEMENT SYSTEMS, E-COMMERCE WEBSITES, LOG DATA AND HIERARCHICAL AGGREGATION, REAL-TIME ANALYTICS
    • MONGODB, NOSQL, AND RELATIONAL DATABASES
    • DATABASE MANAGEMENT SYSTEMS, 1970s TO 2000s: BERKELEY INGRES, ORACLE, INFORMIX, DB2, SYBASE, SQL SERVER, MS ACCESS, POSTGRESQL, MYSQL, NETEZZA, GREENPLUM, VERTICA, MARIADB, AND MONGODB (2007). MOST SYSTEMS USE SOME FLAVOR OF SQL.
    • RELATIONAL DATABASES WERE, AND STILL ARE, THE DE FACTO STANDARD IN MANY COMPANIES.
    • RELATIONAL DATABASE FEATURES: C.R.U.D. OPERATIONS; STRUCTURED QUERY LANGUAGE (SQL); FIXED DATABASE SCHEMAS; NORMALIZATION; REFERENTIAL INTEGRITY (E.G. FOREIGN KEYS, CONSTRAINTS); JOINS; TRANSACTIONS WITH A.C.I.D. PROPERTIES; INDEXES
    • IN THE LATE 90s/EARLY 2000s… DOT COM BUBBLE, DOT COM BUST, WEB SERVICES, SOCIAL NETWORKS, GOOGLE, AMAZON, COMPUTER OWNERS/USERS, WEBSITE DATA COLLECTION, DATABASE SIZES
    • COMPUTING/STORAGE RESOURCES BECAME A CHALLENGE FOR THEN-SMALLER COMPANIES LIKE GOOGLE AND AMAZON THAT HAD LOTS OF DATA.
    • SCALE UP: BIGGER MACHINE, MORE DISK SPACE, MORE RAM, MORE PROCESSORS, MORE EXPENSIVE, SINGLE POINT OF FAILURE. HARDWARE HAS LIMITS! SCALE OUT: SMALLER MACHINES, LESS DISK SPACE, LESS RAM, FEWER PROCESSORS, LESS EXPENSIVE, NO SINGLE POINT OF FAILURE, HIGHER RELIABILITY DESPITE FAILURE OF INDIVIDUAL MACHINES.
    • RELATIONAL DATABASES WERE DESIGNED TO OPERATE ON A SINGLE MACHINE, AND SCALING OUT INTRODUCED MANY CHALLENGES.
    • SPLITTING DATA FOR SCALE OUT: BY COLUMNS, BY ROWS
    • WORDPRESS MYSQL SCHEMA WITH 2 TABLES
    • A JOIN QUERY IN MYSQL (WP_POSTS, WP_COMMENTS): SELECT p.post_author, p.post_date, c.comment_author, c.comment_date FROM wp_posts AS p INNER JOIN wp_comments AS c ON p.ID = c.comment_post_ID WHERE p.ID = 1;
    • A JOIN QUERY IN MYSQL (WP_POSTS, WP_COMMENTS): RESULT
    • SCALE OUT DATA BY ROWS: WP_POSTS AND WP_COMMENTS SPLIT ACROSS MACHINES A, B, C, D
    • HOW COMPLICATED WOULD SCALING THIS BE?
    • JOINS MAY GET REALLY MESSY WITH MANY MACHINES (DISTRIBUTED JOINS)
    • TRANSACTIONS MUST SATISFY A.C.I.D. PROPERTIES (WP_POSTS, WP_COMMENTS): BEGIN TRANSACTION; TRY: DELETE FROM wp_comments WHERE comment_post_ID = 1; DELETE FROM wp_posts WHERE ID = 1; COMMIT TRANSACTION; CATCH: IF ERROR THEN ROLLBACK TRANSACTION; END
    • TRANSACTIONS MAY TAKE A LONG TIME TO EXECUTE IF DATA IS ON DIFFERENT MACHINES (DISTRIBUTED TRANSACTIONS)
    • TO SPLIT THE DATA, A WHOLE BUNCH OF COMPROMISES MUST BE MADE IN RELATIONAL DATABASES
    • THIS GAVE RISE TO NON-RELATIONAL SOLUTIONS
    • GOOGLE AMAZON
    • NoSQL SYSTEM CHARACTERISTICS: C.R.U.D. OPERATIONS; TYPICALLY NO STRUCTURED QUERY LANGUAGE (SQL), FIXED DATABASE SCHEMAS, NORMALIZATION, REFERENTIAL INTEGRITY (E.G. FOREIGN KEYS, CONSTRAINTS), OR JOINS; TRANSACTIONS WITH LIMITED A.C.I.D. PROPERTIES; INDEXES; OPEN SOURCE
    • HOW IS THIS SCALABILITY ACHIEVED IN MONGODB?
    • STACKING THE DATA
    • STACKING THE DATA (WP_POSTS AND WP_COMMENTS IN ONE DOCUMENT, NO NEED TO JOIN): { _id: 1, post_author: “Amy W”, post_date: “1/1/2014”, comments: [{ comment_author: “bestguy”, comment_date: “1/1/2014” },{ comment_author: “baddie”, comment_date: “1/10/2014” },{ comment_author: “clever24”, comment_date: “1/11/2014” }] }
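
Stacking can be pictured as folding each post's comment rows into an array inside the post. This is a minimal sketch in plain JavaScript, assuming rows shaped like the two WordPress tables. The function name `stackPosts` is hypothetical, and no database is involved.

```javascript
// Rows shaped like the wp_posts and wp_comments tables (values from the slides).
const wpPosts = [{ ID: 1, post_author: "Amy W", post_date: "1/1/2014" }];
const wpComments = [
  { comment_post_ID: 1, comment_author: "bestguy", comment_date: "1/1/2014" },
  { comment_post_ID: 1, comment_author: "baddie", comment_date: "1/10/2014" },
  { comment_post_ID: 1, comment_author: "clever24", comment_date: "1/11/2014" },
];

// Build one document per post, embedding that post's comments as an array.
function stackPosts(posts, comments) {
  return posts.map((post) => ({
    _id: post.ID,
    post_author: post.post_author,
    post_date: post.post_date,
    comments: comments
      .filter((c) => c.comment_post_ID === post.ID)
      // The foreign key is dropped: the nesting now carries the relationship.
      .map(({ comment_author, comment_date }) => ({ comment_author, comment_date })),
  }));
}

const stacked = stackPosts(wpPosts, wpComments);
console.log(JSON.stringify(stacked[0], null, 2));
```

Reading a post with its comments then takes a single lookup by `_id`, which is why the join is no longer needed.
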
    • NOW, EACH DOCUMENT CAN BE ON A DIFFERENT MACHINE
    • WHAT ABOUT TRANSACTIONS?
    • MONGODB DOES NOT SUPPORT MULTI-DOCUMENT TRANSACTIONS (IN VERSIONS BEFORE 4.0)
    • BUT SINGLE DOCUMENT UPDATE IS ATOMIC { _id: 1, post_author: “Amy W”, post_date: “1/1/2014”, comments: [{ comment_author: “bestguy”, comment_date: “1/1/2014” },{ comment_author: “baddie”, comment_date: “1/10/2014” },{ comment_author: “clever24”, comment_date: “1/11/2014” }] }
    • THE KEY IS TO FOCUS ON THE DATA MODEL
    • MONGODB CHARACTERISTICS: C.R.U.D. OPERATIONS; DYNAMIC QUERY LANGUAGE INSTEAD OF STRUCTURED QUERY LANGUAGE (SQL); FLEXIBLE DATABASE SCHEMAS INSTEAD OF FIXED DATABASE SCHEMAS; TYPICALLY NO NORMALIZATION, REFERENTIAL INTEGRITY (E.G. FOREIGN KEYS, CONSTRAINTS), OR JOINS; TRANSACTIONS WITH LIMITED A.C.I.D. PROPERTIES; INDEXES; OPEN SOURCE
    • WHEN NOT TO USE MONGODB: IF TRANSACTIONS ARE A MUST; IF JOINS ARE ABSOLUTELY NECESSARY; FOR SOFTWARE PRODUCTS LIKE WORDPRESS THAT ALREADY HAVE TONS OF SUPPORT FOR RELATIONAL DATABASES
    • FOR MONGODB vs MYSQL ARGUMENTS, WATCH… Source: http://www.youtube.com/watch?v=b2F-DItXtZs
    • A PEEK AT MONGODB COMMANDS
    • MONGODB IS A DOCUMENT-ORIENTED DATABASE { _id: ObjectID(“A1234566789”), name: “Ed Brown”, orderDate: “2-1-2014” } { _id: ObjectID(“A1234566789”), name: “Roman Ku”, orderDate: “1-1-2014” } { _id: ObjectID(“A1234566789”), name: “Eva Green”, orderDate: “10-12-2013” } DOCUMENTS ARE INTERNALLY STORED AS BSON (BINARY JSON)
    • MONGODB FEATURES: EASY TO LEARN; DYNAMIC QUERY LANGUAGE (SEARCH BY FIELDS, REGULAR EXPRESSIONS; USER-DEFINED JAVASCRIPT FUNCTIONS; AGGREGATION, INCLUDING MAP/REDUCE); INDEXING (SINGLE, COMPOUND, GEOSPATIAL); REPLICATION; LOAD BALANCING USING SHARDING; GRIDFS TO STORE FILES
    • MONGODB SYNTAX SEEMS TO BE BORROWED FROM… MYSQL, JSON, JAVASCRIPT, UNIX
    • MONGODB SUPPORTS SEVERAL LANGUAGES: DRIVERS FOR PYTHON, NODE.JS, C#, HADOOP, R, AND MANY MORE
    • MONGODB TERMINOLOGY (RDBMS → MONGODB): DATABASE → DATABASE, TABLE → COLLECTION, ROW → DOCUMENT. A DATABASE CAN HAVE 1 OR MORE COLLECTIONS. A COLLECTION CAN HAVE 1 OR MORE DOCUMENTS. A DOCUMENT CAN HAVE 1 OR MORE NAME-VALUE PAIRS, AND/OR 1 OR MORE EMBEDDED DOCUMENTS.
    • MONGODB SUPPORTS SEVERAL DATA TYPES: STRING, NUMBER, BOOLEAN, ARRAY, DATE, EMBEDDED DOCUMENT, NULL
    • MONGODB OPERATIONS (C.R.U.D.): CREATE, READ, UPDATE, DELETE
    • CONNECTING TO MONGODB: THE MONGO SHELL AND ROBOMONGO ARE CLIENTS THAT CONNECT TO THE MONGOD SERVER. THE MONGO SHELL IS A JAVASCRIPT INTERPRETER. ROBOMONGO HAS THE SAME JAVASCRIPT ENGINE AS THE MONGO SHELL.
    • IMPORT JSON TO MONGO COLLECTION: mongoimport -d tennis -c ParksNYC --type json --drop < ParksNYC.json
    • CREATE COLLECTION SQL CREATE TABLE ParksNYC ( id int identity(1, 1), Prop_ID varchar(10), Name varchar(50) not null, Location varchar(20) not null, EstablishedOn datetime ) MONGODB
    • CREATE DOCUMENT SQL: INSERT INTO ParksNYC (Prop_ID, Name, Location, EstablishedOn) VALUES ('Q900', 'Ridge Park', '1843 Norman St.', '1/1/1970') (ROW: Prop_ID = Q900, Name = Ridge Park, Location = 1843 Norman St., EstablishedOn = 1/1/1970) MONGODB: db.ParksNYC.insert( { Prop_ID : "Q900", Name : "Ridge Park", Location : "1843 Norman St.", EstablishedOn: "1/1/1970" })
    • READ ALL DOCUMENTS SQL: SELECT * FROM ParksNYC MONGODB: db.ParksNYC.find()
    • READ SPECIFIC DOCUMENT SQL: SELECT * FROM ParksNYC WHERE Name = 'Ridge Park' MONGODB: db.ParksNYC.find( { Name : "Ridge Park" })
    • READ FIRST DOCUMENT SQL: SELECT TOP 1 * FROM ParksNYC MONGODB: db.ParksNYC.findOne()
    • READ SPECIFIC FIELDS IN DOCUMENT SQL: SELECT id, Name FROM ParksNYC MONGODB: db.ParksNYC.find( { }, { _id: 1, Name: 1 } )
    • READ DOCUMENTS WITH RANGE CRITERIA SQL: SELECT id, Name FROM ParksNYC WHERE Courts > 5 AND Courts <= 8 MONGODB: db.ParksNYC.find( { Courts: { $gt: 5, $lte: 8 } }, { _id: 1, Name: 1 } )
    • READ DOCUMENTS THAT START WITH A LETTER (REGULAR EXPRESSION) SQL: SELECT id, Name FROM ParksNYC WHERE Name LIKE 'F%' MONGODB: db.ParksNYC.find( { Name: /^F/ }, { _id: 1, Name: 1 } )
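
To make the range and regular-expression filters concrete, here is a small plain-JavaScript sketch of how such a query object can be read. It is an illustration, not MongoDB's implementation: `matches` and `find` are hypothetical helpers, only equality, `$gt`, `$lte` and regular expressions are handled, and the park names and `Courts` values are made up.

```javascript
// Hypothetical sample documents; names and Courts values are made up.
const parks = [
  { Name: "Ridge Park", Courts: 4 },
  { Name: "Forest Park", Courts: 7 },
  { Name: "Flushing Fields", Courts: 8 },
  { Name: "Fort Greene Park", Courts: 10 },
];

// Returns true when `doc` satisfies every condition in `query`.
function matches(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    const value = doc[field];
    if (condition instanceof RegExp) {
      return typeof value === "string" && condition.test(value);
    }
    if (condition !== null && typeof condition === "object") {
      // Operator object such as { $gt: 5, $lte: 8 }: all operators must hold.
      return Object.entries(condition).every(([op, operand]) => {
        if (op === "$gt") return value > operand;
        if (op === "$lte") return value <= operand;
        throw new Error("Unsupported operator in this sketch: " + op);
      });
    }
    return value === condition; // plain equality, e.g. { Name: "Ridge Park" }
  });
}

const find = (docs, query) => docs.filter((doc) => matches(doc, query));

console.log(find(parks, { Courts: { $gt: 5, $lte: 8 } }).map((p) => p.Name));
console.log(find(parks, { Name: /^F/ }).map((p) => p.Name));
```

With this sample data the first call selects the parks with 7 and 8 courts, and the second selects the three names that begin with F.
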
    • UPDATE FIELD IN DOCUMENT SQL: UPDATE ParksNYC SET VisitDate = '1/1/2014' MONGODB: db.ParksNYC.update( { }, { $set: { VisitDate: "1/1/2014" } }, { multi: true } )
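
The `multi: true` option matters because, without it, `update()` changes only the first matching document, while the SQL statement changes every row. The sketch below imitates that behavior in plain JavaScript over an in-memory array. `updateWithSet` is a hypothetical helper and the park names are made up.

```javascript
// Plain JavaScript sketch of update() with $set; not the MongoDB implementation.
function updateWithSet(docs, query, fieldsToSet, options = {}) {
  let modifiedCount = 0;
  for (const doc of docs) {
    // An empty query {} matches every document.
    const isMatch = Object.entries(query).every(([field, value]) => doc[field] === value);
    if (!isMatch) continue;
    Object.assign(doc, fieldsToSet); // add the field or overwrite its value
    modifiedCount += 1;
    if (!options.multi) break; // without multi, stop after the first match
  }
  return modifiedCount;
}

// Hypothetical sample documents.
const sampleParks = [{ Name: "Ridge Park" }, { Name: "Forest Park" }, { Name: "Fort Greene Park" }];

// Work on a copy for the single-document case so both calls start from clean data.
const single = updateWithSet(sampleParks.map((p) => ({ ...p })), {}, { VisitDate: "1/1/2014" });
const all = updateWithSet(sampleParks, {}, { VisitDate: "1/1/2014" }, { multi: true });

console.log(single, all); // 1 3
```
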
    • DELETE DOCUMENT SQL: DELETE FROM ParksNYC WHERE Name = 'Ridge Park' MONGODB: db.ParksNYC.remove( { Name : "Ridge Park" })
    • GROUP BY AND SUM SQL: SELECT COUNT(Name) AS Parks_Number, SUM(Courts) AS Courts_Number FROM ParksNYC GROUP BY Accessible MONGODB: db.ParksNYC.aggregate( [ { $group : { _id : "$Accessible", Parks_Number : { $sum : 1 }, Courts_Number : { $sum : "$Courts" } } } ] )
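
What the `$group` stage computes can be sketched with an ordinary loop: `{ $sum : 1 }` counts documents per group, and `{ $sum : "$Courts" }` adds up the `Courts` field. The code below is plain JavaScript, not the aggregation framework. `groupByAccessible` is a hypothetical helper and the sample values are made up.

```javascript
// Hypothetical sample documents; Accessible and Courts values are made up.
const parksForGrouping = [
  { Name: "Ridge Park", Accessible: "Y", Courts: 4 },
  { Name: "Forest Park", Accessible: "Y", Courts: 7 },
  { Name: "Fort Greene Park", Accessible: "N", Courts: 10 },
];

// One output document per distinct Accessible value, like $group with _id: "$Accessible".
function groupByAccessible(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const key = doc.Accessible;
    if (!groups.has(key)) {
      groups.set(key, { _id: key, Parks_Number: 0, Courts_Number: 0 });
    }
    const group = groups.get(key);
    group.Parks_Number += 1;           // { $sum : 1 }
    group.Courts_Number += doc.Courts; // { $sum : "$Courts" }
  }
  return [...groups.values()];
}

console.log(groupByAccessible(parksForGrouping));
```
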
    • SHARDING AND REPLICATION IN MONGODB
    • EACH DOCUMENT CAN BE ON A DIFFERENT MACHINE
    • HOW DOES MONGODB DO THIS?
    • AUTOSHARDING, FOR A COLLECTION
    • MONGODB CLUSTER: CLIENTS CONNECT TO MONGOS, WHICH ROUTES THEIR REQUESTS TO FOUR MONGOD INSTANCES
    • SHARDING STEPS: 1. ENABLE SHARDING ON DATABASE. 2. PICK A SHARD KEY FROM THE COLLECTION. MAKE SURE THE KEY IS INDEXED AND HAS ENOUGH DISTINCT VALUES TO SPREAD THE DOCUMENTS ACROSS SHARDS. 3. SIT BACK AND RELAX. MONGODB WILL AUTOMATICALLY DO THE SHARDING.
    • SHARDING WP_POSTS COLLECTION { _id: 1, post_author: “Amy W”, post_date: “1/1/2014”, comments: [{ comment_author: “bestguy”, comment_date: “1/1/2014” },{ comment_author: “baddie”, comment_date: “1/10/2014” },{ comment_author: “clever24”, comment_date: “1/11/2014” }] } SHARD KEY
    • BREAKING THE USERS INTO CHUNKS: [$minKey, Abba1234], [Abba1235, CarlW], [CarlZ, FrankT], [FrankY, JackA], [JackB, LambV], [LambW, RobF], [RobG, TimA], [TimB, $maxKey]
    • BREAKING THE RANGE INTO CHUNKS: [DIAGRAM: A CLIENT CONNECTS THROUGH MONGOS TO THREE MONGOD SHARDS (SHARD0000, SHARD0001, SHARD0002), WITH THE EIGHT CHUNKS DISTRIBUTED AMONG THEM]
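
Routing by shard key amounts to finding the chunk whose range contains the key, then looking up which shard holds that chunk. The plain-JavaScript sketch below uses the chunk boundaries from the slides. The round-robin assignment of chunks to shards is an assumption made for illustration, since the real placement is decided by MongoDB's balancer. The function names are hypothetical.

```javascript
// Lower bound of each chunk, taken from the chunk boundaries on the slides.
// The empty string stands in for $minKey; the last chunk runs up to $maxKey.
const chunkLowerBounds = ["", "Abba1235", "CarlZ", "FrankY", "JackB", "LambW", "RobG", "TimB"];

// Assumption: chunks are dealt out round robin. The real balancer decides placement.
const shards = ["shard0000", "shard0001", "shard0002"];

// Binary search for the last chunk whose lower bound is <= the key.
function findChunkIndex(shardKeyValue) {
  let lo = 0;
  let hi = chunkLowerBounds.length - 1;
  while (lo < hi) {
    const mid = Math.ceil((lo + hi) / 2);
    if (chunkLowerBounds[mid] <= shardKeyValue) {
      lo = mid;
    } else {
      hi = mid - 1;
    }
  }
  return lo;
}

function routeToShard(shardKeyValue) {
  return shards[findChunkIndex(shardKeyValue) % shards.length];
}

console.log(findChunkIndex("Amy W"), routeToShard("Amy W")); // 1 shard0001
console.log(findChunkIndex("JackA"), routeToShard("JackA")); // 3 shard0000
```

A query that includes the shard key can be sent to a single shard this way, while one without it has to be sent to all shards.
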
    • BENEFITS OF SHARDING: 1. INCREASES AVAILABLE MEMORY. 2. REDUCES LOAD ON THE SERVER. 3. INCREASES HARD DISK SPACE. 4. LOCATION-BASED SHARD KEYS CAN PUT DATA CLOSE TO THE USERS AND KEEP RELATED DATA TOGETHER.
    • MASTER-SLAVE REPLICATION: A REPLICA SET OF THREE MONGOD INSTANCES (ONE MASTER, TWO SLAVES) SERVING A CLIENT
    • MASTER-SLAVE REPLICATION: THE MEMBERS OF THE REPLICA SET HOLD AN ELECTION
    • MASTER-SLAVE REPLICATION: THE CLIENT TALKS TO THE MASTER OF THE REPLICA SET. MINIMUM 3 MEMBERS TO FORM REPLICA SET
    • MASTER-SLAVE REPLICATION: REPLICATION SOLVES THE PROBLEM OF AVAILABILITY AND FAULT TOLERANCE
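
The three-member minimum follows from how elections work: a new master can be elected only when a majority of the voting members can reach each other. The sketch below states just that majority rule in plain JavaScript. `canElectPrimary` is a hypothetical helper, and real elections also weigh priorities and how current each member's data is.

```javascript
// Majority rule only: more than half of the voting members must be reachable.
function canElectPrimary(votingMembers, reachableMembers) {
  return reachableMembers > votingMembers / 2;
}

// Three members, one fails: two of three is a majority, so an election succeeds.
console.log(canElectPrimary(3, 2)); // true

// Two members, one fails: one of two is not a majority, so no master is elected.
console.log(canElectPrimary(2, 1)); // false
```

With only two members, losing either one leaves the survivor unable to win an election, so the set cannot tolerate any failure.
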
    • FUTURE OF MONGODB AND US 
    • COMPANIES USING MONGODB
    • MONGODB WINS AWARD
    • 36 MOST VALUABLE STARTUPS ON EARTH
    • POLYGLOT PERSISTENCE: POSTGRESQL, RIAK, MONGODB, NEO4J, SQL SERVER, MYSQL, ORACLE, DREMEL, ? GOOD TO KNOW BOTH SQL AND NOSQL
    • WHAT WE DID NOT COVER: SECURITY, BACKUP/RECOVERY, DATA MODELING, ARCHITECT
    • THANK YOU VERY MUCH
    • AND THANK YOU TO EVERYONE WHO HELPED US: DR. BILL HOWE, UNIVERSITY OF WASHINGTON; JASON CHEN, MONGODB RECRUITER; KRISTINA CHODOROW (DEFINITIVE GUIDE AUTHOR); FRANCESCA KRIHELY (MONGODB COMMUNITY MANAGER); DR. MARKUS SCHMIDBERGER, RMONGODB; JOHANNES BRANDSTETTER, MONGOSOUP (THE FIRST EUROPEAN PARTNER OF MONGODB TO PROVIDE MONGODB AS A SERVICE); DR. RAMNATH VAIDYANATHAN, RCHARTS
    • REFERENCES: MongoDB: http://www.mongodb.org | Book: MongoDB, The Definitive Guide, by Kristina Chodorow | Book: NoSQL Distilled, by Pramod J. Sadalage and Martin Fowler | NoSQL: http://en.wikipedia.org/wiki/NoSQL | MongoDB Use Cases: http://www.mongodb.com/use-cases | First NoSQL Meetup Notes: http://developer.yahoo.com/blogs/ydn/notes-nosql-meetup7663.html | Billion dollar club: http://graphics.wsj.com/billion-dollar-club/ | Photos from Google
    • DEMO