MediaGlu and Mongo DB

  1. Big Data Tools Workshop: Introduction to MongoDB
  2. About Me: Sundar Nathikudi
     – Co-Founder & CTO, MLN Advertising
     – Former Principal Engineer, AOL Advertising
  3. About MLN Advertising
     • Baltimore-based online advertising startup
     • MediaGlu: cloud-based ad platform
     • Our engineers use
  4. Why Does MediaGlu Need Big Data?
     MediaGlu takes a holistic approach to marketing and advertising. By using real data and user path tracking, our state-of-the-art technology gives marketers the tools to find their customers wherever they are, in Search, Social & Content sites.
     • Data Management
       – Tag Management: One tag tracks all your digital assets (website, social media, etc.).
       – Media Attribution: Track user actions and attribute them, even from sources like Facebook and Pinterest.
       – Reporting: See how user actions connect across the digital space in one intuitive interface.
     • Discover Insights
       – Channel Analytics: Visualize the effectiveness of your ad campaigns across media by learning what interactions actually drive revenue.
       – User Analytics: Visualize the timeline in which users interact with your brand.
       – Predictive Analytics: Visualize how channel and user path metrics overlap into predictable behavior.
     • Action
       – Campaign Management: Let MediaGlu improve your media bidding and creative scheduling in real time.
       – Budget Optimization: Get reports detailing which channels are providing the most value.
       – Personalize Web/Social Experience: Custom-tailored brand interactions based on known and predicted behavior.
  5. Agenda
     • Big Data and NoSQL Basics
     • MongoDB Fundamentals
     • Running MongoDB
     • Lab #1 - Shell Commands
     • Lab #2 - MongoDB Map/Reduce and Aggregation Framework
     • Replication and Sharding
  6. Era of Big Data
  7. Era of Big Data
  8. Era of Big Data
     • Facebook
       – 2.7 billion likes made daily on and off the Facebook site
       – 300 million photos uploaded
       – 500+ terabytes of new data "ingested"
     • Twitter
       – 340 million tweets daily
       – 500 million users
  9. Big Data - Definition
     • Volumes and volumes of data
     • Data does not fit on one rack
     • Unstructured or semi-structured
  10. RDBMS & Big Data
     • Pros
       – Oracle, SQL Server, MySQL, etc.
       – Good for structured data and the relational model
       – Supports partitioning
       – ACID
       – Transactions
     • Cons
       – Joins make horizontal scaling difficult
       – Vertical scaling is limited by physics and cost
       – Hard to scale vertically in the cloud
  11. NoSQL – Not Only SQL
     • Not using the relational model (nor the SQL language)
     • Open source
     • Designed to run on large clusters
     • No schema, allowing fields to be added to any record without controls
     • Based on the needs of Web 2.0 properties
     • Rise of NoSQL = Polyglot Persistence
  12. NoSQL – Data Models
     • Key-Value Stores
       – Data is stored in key-value pairs
       – Support get, put, and delete operations based on a primary key
       – Couchbase (Membase), Redis, Riak
     • Document
       – Store data in structured "documents" such as JSON/XML, with no support for relationships/joins
       – MongoDB, CouchDB, SimpleDB
     • Column Family (Big Table)
       – Contains columns of related data
       – HBase, Cassandra
     • Graph
       – Organize data into node and edge graphs; they work best for data that has complex relationship structures
       – Facebook social graph
       – Neo4J
  13. NoSQL – CAP Theorem
     • Consistency – all nodes see the same data at the same time.
     • Availability – node failures do not prevent ongoing writes/reads.
     • Partition Tolerance – the system continues to operate even when messages between nodes are lost (a network partition).
     Eric Brewer – "distributed system can satisfy any two of these guarantees at the same time, but not all three"
  14. NoSQL – Triangle of compromise
  15. What's a document database
     • Composed of documents – self-describing
     • Schema free
     • Store arbitrary data – collections, trees
  16. What is JSON?
     • JavaScript Object Notation
     • Lightweight data-interchange format
     • Elements of JSON
       – Object: K/V pairs
       – Key: String
       – Value
         • Number
         • String
         • Boolean
         • Array
         • Object
     • Example
         { "id": 1,
           "name": "Foo",
           "price": 123,
           "tags": [ "Bar", "Eek" ],
           "stock": { "warehouse": 300, "retail": 20 } }
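The JSON example on slide 16 can be exercised directly in JavaScript. This is a minimal sketch using only the standard `JSON.parse` and `JSON.stringify`; the variable names (`text`, `product`, `totalStock`, `roundTrip`) are illustrative and not from the deck.

```javascript
// The example document from slide 16, as JSON text.
const text =
  '{ "id": 1, "name": "Foo", "price": 123, ' +
  '"tags": ["Bar", "Eek"], "stock": { "warehouse": 300, "retail": 20 } }';

// Parse the text into a JavaScript object.
const product = JSON.parse(text);

// Nested objects and arrays are reached with ordinary property access.
const totalStock = product.stock.warehouse + product.stock.retail;
const secondTag = product.tags[1];

// Serialize back to JSON text.
const roundTrip = JSON.stringify(product);
console.log(totalStock, secondTag, roundTrip);
```

The nested `stock` object and the `tags` array are the same kinds of values that a MongoDB document can embed.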
  17. MongoDB - Overview
     • BSON – binary-encoded serialization of JSON-like documents (more at http://bsonspec.org/)
     • Schema-less
     • Embedded documents and arrays reduce the need for joins
     • Scalable – replication and sharding
     • Best features of key/value store, document and relational databases in one
  18. MongoDB - Overview
     • Name stems from "humongous"
     • 10gen
     • Written in C++
     • Understands JavaScript
     • SpiderMonkey JavaScript engine for server-side JavaScript execution
     • Lots of language drivers available
  19. Languages Supported
  20. Blog Post - Relational Model
  21. Blog Post - Document Model
         { _id: 1234,
           author: { name: "Bob Davis", email: "bob@bob.com" },
           post: "In these troubled times I like to …",
           date: { $date: "2010-07-12 13:23UTC" },
           location: [ -121.2322, 42.1223222 ],
           rating: 2.2,
           comments: [
             { user: "jgs32@hotmail.com", upVotes: 22, downVotes: 14, text: "Great point! I agree" },
             { user: "holly.davidson@gmail.com", upVotes: 421, downVotes: 22, text: "You are an idiot" }
           ],
           tags: [ "Politics", "Virginia" ] }
  22. NoSQL vs. RDBMS terminology
         MySQL          NoSQL
         Database       Database
         Table          Collection
         Index          Index
         Row            Document
         Column         Field
         Join           Embedding and Linking
         Primary Key    _id field
         Group By       Aggregation
  23. Installing MongoDB
     • Mongo distributions
       – OS X, Linux, Windows, Solaris
       – Runs on commodity hardware
  24. Installing MongoDB
     • Download MongoDB server
       – http://www.mongodb.org/downloads
       – http://www.mlnsitelabs.com/mongodb
       – Extract the bin folder to C:\MongoDB
     • Create data folder - C:\MongoDbData
     • To start from the command line
       – Run Mongo.Bat
     • To install as a Windows service
       – Run MongoService.bat from the command line
  25. Installing MongoVue GUI
     • Download MongoVue GUI tool
       – http://www.mongovue.com/downloads/
       – http://www.mlnsitelabs.com/mongodb
  26. System components
     • mongod.exe – database server
     • mongo.exe – shell
     • mongos.exe – sharding router
  27. Learning MongoDB Shell
     • Interactive JavaScript shell
     • Use online browser shell
       – http://try.mongodb.org/
     • Or run from the command line
       – mongo localhost:27017
  28. Learning Shell Commands
     • Create database
       – use student;
       – db.student.scores.find();
     • Inserting a document into a collection
       – var student = {name: "Jim", scores: [75, 99, 87.2]};
       – db.scores.save(student);
       – var student = {name: "John", scores: [35, 45, 55]};
       – db.scores.save(student);
  29. Learning Shell Commands
     • Querying a collection
       – db.scores.find();
       – db.scores.find({scores: {$gt: 15}});
     • Updating a document
       – db.scores.update({name: "Jim"}, {name: "Jim", scores: [92, 34, 54]});
     • Deleting a document
       – db.scores.remove({name: "Jim"});
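One detail of the update on slide 29 is easy to miss: because the second argument is a whole document rather than an update operator, the matched document is replaced and only its `_id` is kept. The following is a plain JavaScript sketch of that behaviour over an in-memory array. `replaceFirst` and the extra `grade` field are hypothetical names introduced for illustration; they are not part of the MongoDB API or the deck.

```javascript
// Hypothetical in-memory stand-in for a collection; not the MongoDB API.
function replaceFirst(docs, matchName, replacement) {
  const i = docs.findIndex((d) => d.name === matchName);
  if (i === -1) return false;
  // Full replacement: keep only _id, drop every other old field.
  docs[i] = Object.assign({ _id: docs[i]._id }, replacement);
  return true;
}

const scores = [
  // "grade" is an invented field, used only to show that it disappears.
  { _id: 1, name: "Jim", scores: [75, 99, 87.2], grade: "A" },
  { _id: 2, name: "John", scores: [35, 45, 55] },
];

// Mirrors db.scores.update({name: "Jim"}, {name: "Jim", scores: [92, 34, 54]})
replaceFirst(scores, "Jim", { name: "Jim", scores: [92, 34, 54] });
console.log(scores[0]);
```

After the call, Jim's document has the new `scores` array and no `grade` field, while John's document is untouched.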
  30. Lab #1 - Shell Commands
     • http://www.mlnsitelabs.com/mongodb/Labs/Lab1
     Let's do it!
  31. Data Types
     • string
     • integer
     • boolean
     • double
     • null
     • array
     • object
     • binary data
     • regular expression
  32. Query Selectors
     • Selectors
       – $ne
       – $lt
       – $lte
       – $gt
       – $gte
       – $in
       – $nin
       – $all
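To make the selectors on slide 32 concrete, here is a simplified plain JavaScript matcher. It is a sketch, not MongoDB's implementation: it only handles queries of the form `{field: {$op: operand}}`, and it assumes that a selector applied to an array field is tested against the array's elements. `testSelector` and `matches` are hypothetical helper names.

```javascript
// Evaluate one selector against a field value (scalar or array).
function testSelector(op, value, operand) {
  const values = Array.isArray(value) ? value : [value];
  switch (op) {
    case "$lt": return values.some((v) => v < operand);
    case "$lte": return values.some((v) => v <= operand);
    case "$gt": return values.some((v) => v > operand);
    case "$gte": return values.some((v) => v >= operand);
    case "$in": return values.some((v) => operand.includes(v));
    case "$ne": return values.every((v) => v !== operand);
    case "$nin": return values.every((v) => !operand.includes(v));
    case "$all": return operand.every((x) => values.includes(x));
    default: throw new Error("unsupported selector: " + op);
  }
}

// A document matches when every selector on every field holds.
function matches(doc, query) {
  return Object.entries(query).every(([field, conds]) =>
    Object.entries(conds).every(([op, operand]) =>
      testSelector(op, doc[field], operand)));
}

const students = [
  { name: "Jim", scores: [75, 99, 87.2] },
  { name: "John", scores: [35, 45, 55] },
];

const over90 = students.filter((s) => matches(s, { scores: { $gt: 90 } }));
const has35and55 = students.filter((s) => matches(s, { scores: { $all: [35, 55] } }));
console.log(over90.map((s) => s.name), has35and55.map((s) => s.name));
```

With the two student documents from slide 28, the `$gt: 90` query selects Jim and the `$all: [35, 55]` query selects John.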
  33. Learning Shell Commands
     • Creating an index
       – db.scores.ensureIndex({ name: 1 })
  34. Indexes
     • What is an index?
       – A structure that allows you to quickly locate documents based on the values stored in certain specified fields.
     • Indexes enhance query performance
  35. Indexes
     • MongoDB indexes
       – Defined on a per-collection level
       – B-tree indexes
       – Compound indexes with multiple fields
         • db.scores.ensureIndex({ name: 1, id: 1 });
       – Unique index
         • db.addresses.ensureIndex( { "user_id": 1 }, { unique: true } )
       – Sparse index
         • db.addresses.ensureIndex( { "xmpp_id": 1 }, { sparse: true } )
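The idea behind slides 34 and 35 is that an ordered structure turns a full collection scan into a short search. The sketch below uses a sorted array with binary search as a stand-in for a B-tree; that substitution is an assumption made for brevity, and `buildIndex` and `lookup` are hypothetical names, not MongoDB functions.

```javascript
// Build a sorted list of [key, position] pairs for one field.
function buildIndex(docs, field) {
  return docs
    .map((doc, pos) => [doc[field], pos])
    .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
}

// Binary search for the first entry with the key, then collect equal keys.
function lookup(index, key) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid][0] < key) lo = mid + 1;
    else hi = mid;
  }
  const positions = [];
  for (let i = lo; i < index.length && index[i][0] === key; i++) {
    positions.push(index[i][1]);
  }
  return positions;
}

const docs = [
  { name: "John" }, { name: "Jim" }, { name: "Ann" }, { name: "Jim" },
];
const byName = buildIndex(docs, "name");
const jimPositions = lookup(byName, "Jim");
console.log(jimPositions);
```

A unique index, in these terms, would refuse to insert a key that `lookup` already finds.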
  36. Map Reduce
     • Pattern that allows computations to be parallelized over a cluster
     • Comparable to GROUP BY in SQL
  37. Map Reduce
     • Write two functions – Map and Reduce
     • Write them in JavaScript
     • Map function
       – Called once per document
       – Emits keys and values
     • Reduce function
       – Called once per key emitted, with an array of values
     • Finalize (optional)
       – Allows rounding up of the reduced data set
  38. Map Reduce
     • User Profile
         { "_id" : ObjectId("505e717a6794e396ac493e37"),
           "UserId" : NumberLong(5209704),
           "Browser" : "Microsoft Internet Explorer",
           "Gender" : "M",
           "CountryCode" : "US",
           "State" : "FL",
           "City" : "Spring Hill" }
     • Count the users from California by Browser and Gender
  39. Map Reduce
     • Map function
         function() {
           var key = { Browser: this.Browser, Gender: this.Gender };
           emit(key, { Count: 1 });
         }
     • Reduce function
         function(key, values) {
           var cnt = 0;
           values.forEach(function(value) { cnt += value.Count; });
           return { Count: cnt };
         }
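The map and reduce functions from slide 39 can be tried without a server by simulating the grouping step in plain JavaScript. In this sketch, `emit`, `groups`, and the sample `users` are stand-ins introduced for illustration. It calls reduce exactly once per key, which is a simplification: MongoDB may invoke reduce several times for the same key, which is why reduce returns the same shape that map emits.

```javascript
// Emitted values, grouped by a string form of the key.
const groups = new Map();

// Stand-in for the emit() that MongoDB provides to map functions.
function emit(key, value) {
  const id = JSON.stringify(key);
  if (!groups.has(id)) groups.set(id, { key: key, values: [] });
  groups.get(id).values.push(value);
}

// Map function from slide 39: runs with `this` bound to each document.
function map() {
  var key = { Browser: this.Browser, Gender: this.Gender };
  emit(key, { Count: 1 });
}

// Reduce function from slide 39: sums the counts for one key.
function reduce(key, values) {
  var cnt = 0;
  values.forEach(function (value) { cnt += value.Count; });
  return { Count: cnt };
}

// Invented sample data in the shape of the slide 38 user profile.
const users = [
  { Browser: "Microsoft Internet Explorer", Gender: "M" },
  { Browser: "Microsoft Internet Explorer", Gender: "M" },
  { Browser: "Chrome", Gender: "F" },
  { Browser: "Microsoft Internet Explorer", Gender: "F" },
];

users.forEach((u) => map.call(u));
const results = [...groups.values()].map((g) => ({
  _id: g.key,
  value: reduce(g.key, g.values),
}));
console.log(JSON.stringify(results));
```

Each result has an `_id` holding the emitted key and a `value` holding the reduced count, mirroring the shape of map/reduce output documents.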
  40. Lab #2 – Map/Reduce
     • CSV file – user profile information
     • Count the users by Browser and Gender
     • Download
       – http://www.mlnsitelabs.com/mongodb/Labs/Lab2
  41. Aggregation Framework
     • Map/Reduce is a big hammer
       – Sum, average
       – Avoid JavaScript overhead if you can
     • Aggregation Framework
       – Specify a pipeline
       – Pipeline = series of operations
       – Collections run through a pipeline to produce an aggregated result
  42. Aggregation Framework
     • $match
       – Uses a query predicate
     • $project
       – Uses a sample document to determine the result
     • $unwind
       – Hands out the array elements one at a time
     • $group
       – Aggregates items into groups defined by a key
  43. Aggregation Framework
     • $sort
       – Sort the result
     • $limit
       – Limit the number of documents to pass
     • $skip
       – Skip over the specified number of documents
  44. Lab #3 – Aggregation Framework
     • CSV file – user profile information
     • Command
         { aggregate: "UserProfileInfo",
           pipeline: [
             { $match: { State: "CA" } },
             { $group: { _id: { Browser: "$Browser", Gender: "$Gender" }, Count: { $sum: 1 } } },
             { $project: { _id: 0, Browser: "$_id.Browser", Gender: "$_id.Gender", Count: 1 } }
           ] }
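To see what each stage of the Lab #3 pipeline contributes, the same three steps can be written over an in-memory array. This is a plain JavaScript analogy, not the aggregation framework itself; the sample `users` and the names `matched`, `counts`, and `report` are invented for illustration.

```javascript
// Invented sample data with the fields used by the Lab #3 pipeline.
const users = [
  { Browser: "Chrome", Gender: "F", State: "CA" },
  { Browser: "Chrome", Gender: "F", State: "CA" },
  { Browser: "Firefox", Gender: "M", State: "CA" },
  { Browser: "Chrome", Gender: "F", State: "FL" },
];

// $match: keep only documents where State is "CA".
const matched = users.filter((u) => u.State === "CA");

// $group: count documents per (Browser, Gender) key, like { $sum: 1 }.
const counts = new Map();
for (const u of matched) {
  const id = JSON.stringify({ Browser: u.Browser, Gender: u.Gender });
  counts.set(id, (counts.get(id) || 0) + 1);
}

// $project: flatten the group key into top-level fields and drop _id.
const report = [...counts].map(([id, Count]) =>
  Object.assign({}, JSON.parse(id), { Count: Count }));
console.log(JSON.stringify(report));
```

The Florida user is removed by the match step, so the report contains one row for Chrome/F with a count of 2 and one row for Firefox/M with a count of 1.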
  45. Replication
     • Data redundancy
     • Automated failover / HA
     • Read scaling
     • Master–slave replication
       – Master handles writes
       – Slave handles reads
  46. Replication (diagram: the client sends writes to the master, which replicates to two slaves)
  47. Sharding
     • Partitioning of data among multiple machines
     • Enables horizontal scaling – writes per second
     • Partition a collection, specify a shard key
       – ex: _id, timestamp
  48. Sharding (diagram: the client talks to the mongos router, which uses the config server to route to Shard 1, Shard 2 and Shard 3)
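The routing role of mongos in slides 47 and 48 can be sketched as a lookup from a shard key value to the shard that owns its range. The chunk boundaries, shard names, and `routeToShard` below are all hypothetical; a real cluster keeps this metadata on the config servers and splits and moves ranges over time.

```javascript
// Hypothetical chunk table: each chunk covers [min, max) of the shard key.
const chunks = [
  { min: -Infinity, max: 1000, shard: "shard1" },
  { min: 1000, max: 5000, shard: "shard2" },
  { min: 5000, max: Infinity, shard: "shard3" },
];

// Pick the shard whose range contains the shard key value.
function routeToShard(shardKeyValue) {
  const chunk = chunks.find(
    (c) => shardKeyValue >= c.min && shardKeyValue < c.max);
  return chunk.shard;
}

console.log(routeToShard(42), routeToShard(1000), routeToShard(5209704));
```

Because every write for a given key range lands on one shard, adding shards and rebalancing ranges is what spreads write load across machines.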
  49. GridFS
     • Specification for storing large files in MongoDB
     • BSON objects in MongoDB are limited to 16 MB in size
     • GridFS – divides large files among multiple documents
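The "divide large files among multiple documents" step can be sketched with Node's `Buffer`. The chunk size of 255 KB is an assumption (the actual default depends on the driver and version), and `toChunks` is a hypothetical helper; the `files_id`, `n`, and `data` field names follow the GridFS chunk document layout.

```javascript
// Assumed chunk size; the real default depends on driver and version.
const CHUNK_SIZE = 255 * 1024;

// Split a file's bytes into chunk documents numbered from 0.
function toChunks(fileId, data) {
  const chunks = [];
  for (let n = 0; n * CHUNK_SIZE < data.length; n++) {
    chunks.push({
      files_id: fileId, // which file this chunk belongs to
      n: n,             // position of the chunk within the file
      data: data.subarray(n * CHUNK_SIZE, (n + 1) * CHUNK_SIZE),
    });
  }
  return chunks;
}

// A 600 KB dummy file, filled with the byte value 1.
const file = Buffer.alloc(600 * 1024, 1);
const parts = toChunks("demo-file", file);

// Reassemble by concatenating chunks in order of n.
const restored = Buffer.concat(parts.map((p) => p.data));
console.log(parts.length, restored.length);
```

Reading the file back means fetching the chunks for one `files_id` sorted by `n` and concatenating their `data`, so no single document has to approach the 16 MB limit.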
  50. References
     • Mongo Cookbook
       – http://cookbook.mongodb.org/
     • NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence
       – http://www.amazon.com/NoSQL-Distilled-Emerging-Polyglot-Persistence/dp/0321826620
     • Seven Databases in Seven Weeks
       – http://www.amazon.com/Seven-Databases-Weeks-Modern-Movement/dp/1934356921
  51. Contacts
         { name: "Sundar Nathikudi",
           mail: "mln@mlnadvertising.com",
           website: "http://www.mlnadvertising.com" }
  52. Thank You
