MediaGlu and Mongo DB

Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

MediaGlu and Mongo DB Presentation Transcript

  • Big Data Tools Workshop: Introduction to MongoDB
  • About Me • Sundar Nathikudi – Co-Founder & CTO, MLN Advertising – Former Principal Engineer, AOL Advertising
  • About MLN Advertising • Baltimore-based online advertising startup • MediaGlu - Cloud-based ad platform • Our engineers use
  • Why Does MediaGlu Need Big Data? MediaGlu takes a holistic approach to marketing and advertising: by using real data and user path tracking, our state-of-the-art technology gives marketers the tools to find their customers wherever they are, in Search, Social & Content sites. • Data Management – Tag Management: One tag tracks all your digital assets (website, social media, etc.). – Media Attribution: Track user actions and attribute them, even from sources like Facebook and Pinterest. – Reporting: See how user actions connect across the digital space in one intuitive interface. • Discover Insights – Channel Analytics: Visualize the effectiveness of your ad campaigns across media by learning what interactions actually drive revenue. – User Analytics: Visualize the timeline in which users interact with your brand. – Predictive Analytics: Visualize how channel and user path metrics overlap into predictable behavior. • Action – Campaign Management: Let MediaGlu improve your media bidding and creative scheduling in real time. – Budget Optimization: Get reports detailing which channels are providing the most value. – Personalize Web/Social Experience: Custom-tailored brand interactions based on known and predicted behavior.
  • Agenda • Big Data and NoSQL Basics • MongoDB Fundamentals • Running MongoDB • Lab #1 - Shell Commands • Lab #2 - MongoDB Map/Reduce and Aggregation Framework • Replication and Sharding
  • Era of Big Data
  • Era of Big Data • Facebook – 2.7 billion likes made daily on and off the Facebook site – 300 million photos uploaded – 500+ terabytes of new data "ingested" • Twitter – 340 million tweets daily – 500 million users
  • Big Data - Definition • Volumes and volumes of data • Data does not fit on one rack • Unstructured or semi-structured
  • RDBMS & Big Data • Pros – Oracle, SQL Server, MySQL, etc. – Good for structured data and the relational model – Supports partitioning – ACID – Transactions • Cons – Joins make horizontal scaling difficult – Vertical scaling is limited by physics and cost – Hard to scale vertically in the cloud
  • NoSQL – Not Only SQL • Does not use the relational model (nor the SQL language) • Open source • Designed to run on large clusters • No schema, allowing fields to be added to any record without controls • Based on the needs of Web 2.0 properties • Rise of NoSQL = Polyglot Persistence
  • NoSQL – Data Models • Key-Value Stores – data is stored in key-value pairs – support get, put, and delete operations based on a primary key – Couchbase (Membase), Redis, Riak • Document – store data in structured “documents” such as JSON/XML, with no support for relationships/joins – MongoDB, CouchDB, SimpleDB • Column Family (Bigtable) – contains columns of related data – HBase, Cassandra • Graph – organize data into node and edge graphs; they work best for data that has complex relationship structures – Facebook social graph – Neo4j
  • NoSQL – CAP Theorem • Consistency – all nodes see the same data at the same time. • Availability – node failures do not prevent ongoing writes/reads. • Partition Tolerance – the system continues to operate despite lost messages or a network partition between nodes. • Eric Brewer – “distributed system can satisfy any two of these guarantees at the same time, but not all three”
  • NoSQL – Triangle of compromise
  • What’s a document database? • Composed of documents – Self-describing • Schema-free • Store arbitrary data – Collections, trees
  • What is JSON? • JavaScript Object Notation • Lightweight data-interchange format • Elements of JSON – Object: K/V pairs – Key: String – Value: Number, String, Boolean, Array, Object • Example: { "id": 1, "name": "Foo", "price": 123, "tags": [ "Bar", "Eek" ], "stock": { "warehouse": 300, "retail": 20 } }
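As a small illustration of the JSON elements listed above, the following sketch parses the slide's example document in plain JavaScript and reports the element type of each value. The helper name `jsonType` is hypothetical and introduced only for this example.

```javascript
// The JSON text from the slide, held as a string.
const text =
  '{ "id": 1, "name": "Foo", "price": 123, "tags": ["Bar", "Eek"], ' +
  '"stock": { "warehouse": 300, "retail": 20 } }';

// JSON.parse turns the text into a plain JavaScript object.
const product = JSON.parse(text);

// Hypothetical helper: name the JSON element type of a parsed value.
function jsonType(value) {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  return typeof value; // "number", "string", "boolean" or "object"
}

// Record the element type of every top-level key.
const types = {};
for (const key of Object.keys(product)) {
  types[key] = jsonType(product[key]);
}

// Nested objects are reached with ordinary property access.
const totalStock = product.stock.warehouse + product.stock.retail;

console.log(types, totalStock);
```

The same object can be turned back into text with `JSON.stringify(product)`.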
  • MongoDB - Overview • BSON – Binary-encoded serialization of JSON-like documents (more at http://bsonspec.org/) • Schemaless • Embedded documents and arrays reduce the need for joins • Scalable – Replication and Sharding • Best features of key/value stores, document databases and relational databases in one
  • MongoDB - Overview • Name stems from "humongous" • 10gen • Written in C++ • Understands JavaScript • SpiderMonkey JavaScript engine for server-side JavaScript execution • Lots of language drivers available
  • Languages Supported
  • Blog Post - Relational Model
  • Blog Post - Document Model { _id: 1234, author: { name: "Bob Davis", email: "bob@bob.com" }, post: "In these troubled times I like to …", date: { $date: "2010-07-12 13:23UTC" }, location: [ -121.2322, 42.1223222 ], rating: 2.2, comments: [ { user: "jgs32@hotmail.com", upVotes: 22, downVotes: 14, text: "Great point! I agree" }, { user: "holly.davidson@gmail.com", upVotes: 421, downVotes: 22, text: "You are an idiot" } ], tags: [ "Politics", "Virginia" ] }
  • NoSQL vs. RDBMS terminology
      MySQL          NoSQL
      Database       Database
      Table          Collection
      Index          Index
      Row            Document
      Column         Field
      Join           Embedding and Linking
      Primary Key    _id field
      Group By       Aggregation
  • Installing Mongo DB • Mongo Distributions – OS X, Linux, Windows, Solaris – Runs on commodity hardware
  • Installing Mongo DB • Download Mongo DB server – http://www.mongodb.org/downloads – http://www.mlnsitelabs.com/mongodb – Extract the bin folder to C:\MongoDB • Create data folder - C:\MongoDbData • To start from the command line – Run Mongo.Bat • To install as a Windows service – Run MongoService.bat from the command line
  • Installing MongoVue GUI • Download MongoVue GUI tool – http://www.mongovue.com/downloads/ – http://www.mlnsitelabs.com/mongodb
  • System components • mongod.exe – database server • mongo.exe – shell • mongos.exe – sharding router
  • Learning MongoDB Shell • Interactive JavaScript shell • Use online browser shell – http://try.mongodb.org/ • Or run from command line – mongo localhost:27017
  • Learning Shell Commands • Create Database – use student; – db.scores.find(); • Inserting a document into a collection – var student = {name: "Jim", scores: [75, 99, 87.2]}; – db.scores.save(student); – var student = {name: "John", scores: [35, 45, 55]}; – db.scores.save(student);
  • Learning Shell Commands • Querying a collection – db.scores.find(); – db.scores.find({scores: {$gt: 15}}); • Updating a document – db.scores.update({name: "Jim"}, {name: "Jim", scores: [92, 34, 54]}); • Deleting a document – db.scores.remove({name: "Jim"});
  • Lab #1 - Shell Commands • http://www.mlnsitelabs.com/mongodb/Labs/Lab1 • Let's do it!
  • Data Types • string • integer • boolean • double • null • array • object • binary data • regular expression
  • Query Selectors • Selectors – $ne – $lt – $lte – $gt – $gte – $in – $nin – $all
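To make the selectors above concrete, here is a minimal sketch in plain JavaScript of how a query document could be checked against a stored document. It is not MongoDB's implementation; `matches` and `testOperator` are hypothetical names, and the only MongoDB behavior it tries to mirror is that a condition on an array field is tested against the array's elements.

```javascript
// Treat scalar fields and array fields uniformly as a list of candidates.
const asArray = (value) => (Array.isArray(value) ? value : [value]);

// Operators that succeed when at least one candidate value passes.
const anyOf = {
  $lt: (v, arg) => v < arg,
  $lte: (v, arg) => v <= arg,
  $gt: (v, arg) => v > arg,
  $gte: (v, arg) => v >= arg,
  $in: (v, arg) => arg.includes(v),
};

// Hypothetical helper: evaluate one operator against one field value.
function testOperator(op, fieldValue, arg) {
  const candidates = asArray(fieldValue);
  if (op === "$ne") return !candidates.some((v) => v === arg);
  if (op === "$nin") return !candidates.some((v) => arg.includes(v));
  if (op === "$all") return arg.every((wanted) => candidates.includes(wanted));
  const test = anyOf[op];
  if (typeof test !== "function") throw new Error("unsupported operator: " + op);
  return candidates.some((v) => test(v, arg));
}

// Hypothetical helper: does `doc` satisfy every condition in `query`?
function matches(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    const value = doc[field];
    const isOperatorObject =
      condition !== null && typeof condition === "object" && !Array.isArray(condition);
    if (!isOperatorObject) return asArray(value).includes(condition); // plain equality
    return Object.entries(condition).every(([op, arg]) => testOperator(op, value, arg));
  });
}

// The two students inserted in the shell slides.
const students = [
  { name: "Jim", scores: [75, 99, 87.2] },
  { name: "John", scores: [35, 45, 55] },
];

console.log(students.filter((s) => matches(s, { scores: { $gt: 90 } })));
```

With this sketch, `{ scores: { $gt: 15 } }` selects both students, while `{ scores: { $all: [35, 55] } }` selects only John.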
  • Learning Shell Commands • Creating an index – db.scores.ensureIndex({ name: 1 })
  • Indexes • What is an index? – A structure that allows you to quickly locate documents based on the values stored in certain specified fields. • Indexes enhance query performance
  • Indexes • Mongo DB Indexes – Defined on a per-collection level – B-Tree indexes – Compound indexes with multiple fields • db.scores.ensureIndex({ name: 1, id: 1 }); – Unique index • db.addresses.ensureIndex( { "user_id": 1 }, { unique: true } ) – Sparse index • db.addresses.ensureIndex( { "xmpp_id": 1 }, { sparse: true } )
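One way to picture what the unique and sparse options change is the plain-JavaScript sketch below. It is an illustration only: `FieldIndex` is a hypothetical class, and it uses a `Map` lookup table where MongoDB uses a B-tree, so it shows the lookup and constraint behavior but not ordered range scans.

```javascript
// Hypothetical single-field index: maps a field value to the documents holding it.
class FieldIndex {
  constructor(field, { unique = false, sparse = false } = {}) {
    this.field = field;
    this.unique = unique;
    this.sparse = sparse;
    this.entries = new Map(); // field value -> array of documents
  }

  // Returns true if the document was indexed, false if a sparse index skipped it.
  add(doc) {
    const hasField = Object.prototype.hasOwnProperty.call(doc, this.field);
    if (!hasField && this.sparse) return false; // sparse: ignore docs without the field
    const value = hasField ? doc[this.field] : null; // missing field is indexed as null
    const bucket = this.entries.get(value) || [];
    if (this.unique && bucket.length > 0) {
      throw new Error("duplicate key for " + this.field + ": " + value);
    }
    bucket.push(doc);
    this.entries.set(value, bucket);
    return true;
  }

  // Look up documents by value without scanning the whole collection.
  find(value) {
    return this.entries.get(value) || [];
  }
}

const byUser = new FieldIndex("user_id", { unique: true });
byUser.add({ user_id: 1, city: "Baltimore" });
byUser.add({ user_id: 2, city: "Spring Hill" });

const byXmpp = new FieldIndex("xmpp_id", { sparse: true });
const indexed = byXmpp.add({ user_id: 3 }); // no xmpp_id, so the sparse index skips it

console.log(byUser.find(2), indexed);
```

Adding a second document with `user_id: 1` to `byUser` throws, which is the effect the `unique: true` option is meant to have.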
  • Map Reduce • Pattern that allows computations to be parallelized over a cluster • Comparable to GROUP BY in SQL
  • Map Reduce • Write two functions – Map and Reduce • Write them in JavaScript • Map Function – Called once per document – Emits keys and values • Reduce Function – Called for each emitted key (possibly more than once), with an array of values • Finalize (optional) – Allows final processing, such as rounding, of the reduced data set
  • Map Reduce • User Profile { "_id" : ObjectId("505e717a6794e396ac493e37"), "UserId" : NumberLong(5209704), "Browser" : "Microsoft Internet Explorer", "Gender" : "M", "CountryCode" : "US", "State" : "FL", "City" : "Spring Hill" } • Count the users from California by Browser and Gender
  • Map Reduce • Map Function – function() { var key = { Browser: this.Browser, Gender: this.Gender }; emit(key, { Count: 1 }); } • Reduce Function – function(key, values) { var cnt = 0; values.forEach(function(value) { cnt += value.Count; }); return { Count: cnt }; }
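The map and reduce functions above can be tried without a server using the sketch below, which simulates the map, group and reduce phases in plain JavaScript. The driver `runMapReduce`, the local `emit` function and the sample profiles are assumptions made for illustration; MongoDB supplies its own `emit` and runs these phases inside the database.

```javascript
// Pairs collected during the map phase (stand-in for MongoDB's built-in emit).
const emitted = [];
function emit(key, value) {
  emitted.push({ key, value });
}

// Map and reduce functions as shown on the slide.
function mapFn() {
  var key = { Browser: this.Browser, Gender: this.Gender };
  emit(key, { Count: 1 });
}
function reduceFn(key, values) {
  var cnt = 0;
  values.forEach(function (value) { cnt += value.Count; });
  return { Count: cnt };
}

// Hypothetical driver: map every document, group by key, reduce each group.
function runMapReduce(docs, map, reduce) {
  emitted.length = 0;
  docs.forEach((doc) => map.call(doc)); // `this` is the current document
  const groups = new Map(); // serialized key -> { key, values }
  for (const { key, value } of emitted) {
    const id = JSON.stringify(key);
    if (!groups.has(id)) groups.set(id, { key, values: [] });
    groups.get(id).values.push(value);
  }
  return [...groups.values()].map((g) => ({ _id: g.key, value: reduce(g.key, g.values) }));
}

// Made-up sample profiles with the fields used by the map function.
const profiles = [
  { Browser: "Microsoft Internet Explorer", Gender: "M" },
  { Browser: "Chrome", Gender: "F" },
  { Browser: "Microsoft Internet Explorer", Gender: "M" },
  { Browser: "Microsoft Internet Explorer", Gender: "F" },
];

const counts = runMapReduce(profiles, mapFn, reduceFn);
console.log(counts);
```

Note that the reduce function returns a value of the same shape as the values it receives (`{ Count: n }`), which is what lets it be applied repeatedly to partial results for the same key.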
  • Lab #2 – Map/Reduce • CSV file – user profile information • Count the users by Browser and Gender • Download – http://www.mlnsitelabs.com/mongodb/Labs/Lab2
  • Aggregation Framework • Map/Reduce is a big hammer – Sum, Average – Avoid JavaScript overhead if you can • Aggregation Framework – Specify a pipeline – Pipeline = series of operations – Collections run through a pipeline to produce an aggregated result
  • Aggregation Framework • $match – Uses a query predicate • $project – Uses a sample document to determine the result • $unwind – Hands out the array elements one at a time • $group – Aggregates items into groups defined by a key
  • Aggregation Framework • $sort – Sort the result • $limit – Limit the number of documents to pass • $skip – Skip over the specified number of documents
  • Lab #3 – Aggregation framework • CSV file – user profile information • { aggregate: "UserProfileInfo", pipeline: [ { $match: { State: "CA" } }, { $group: { _id: { Browser: "$Browser", Gender: "$Gender" }, Count: { $sum: 1 } } }, { $project: { _id: 0, Browser: "$_id.Browser", Gender: "$_id.Gender", Count: 1 } } ] }
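To show what each stage of the Lab #3 pipeline contributes, the sketch below performs the same three steps with ordinary JavaScript array operations. It is a hand-written equivalent for illustration, not the aggregation framework itself, and the sample profiles are made up.

```javascript
// Made-up sample of user profile documents.
const userProfiles = [
  { Browser: "Chrome", Gender: "F", State: "CA" },
  { Browser: "Chrome", Gender: "F", State: "CA" },
  { Browser: "Firefox", Gender: "M", State: "CA" },
  { Browser: "Microsoft Internet Explorer", Gender: "M", State: "FL" },
];

// $match: keep only documents whose State is "CA".
const matched = userProfiles.filter((p) => p.State === "CA");

// $group: one group per (Browser, Gender) pair, counting documents ($sum: 1).
const groups = new Map();
for (const p of matched) {
  const id = { Browser: p.Browser, Gender: p.Gender };
  const groupKey = JSON.stringify(id);
  const group = groups.get(groupKey) || { _id: id, Count: 0 };
  group.Count += 1;
  groups.set(groupKey, group);
}

// $project: drop _id and lift its parts to top-level fields.
const report = [...groups.values()].map((g) => ({
  Browser: g._id.Browser,
  Gender: g._id.Gender,
  Count: g.Count,
}));

console.log(report);
```

Each stage consumes the output of the previous one, which is the sense in which the slides describe a pipeline as a series of operations.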
  • Replication • Data redundancy • Automated failover/HA • Read scaling • Master-Slave Replication – Master handles writes – Slave handles reads
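The master-slave split described above can be pictured with the toy model below, written in plain JavaScript. It is a sketch under simplifying assumptions: `ReplicaNode` and `MasterSlaveCluster` are hypothetical classes, replication is triggered by hand, and failover is not modeled.

```javascript
// Hypothetical node holding a key-value copy of the data.
class ReplicaNode {
  constructor(name) {
    this.name = name;
    this.data = new Map();
    this.applied = 0; // number of log entries applied so far
  }
  apply(op) {
    this.data.set(op.key, op.value);
    this.applied += 1;
  }
}

// Hypothetical cluster: one master for writes, several slaves for reads.
class MasterSlaveCluster {
  constructor(slaveCount) {
    this.master = new ReplicaNode("master");
    this.slaves = [];
    for (let i = 0; i < slaveCount; i++) this.slaves.push(new ReplicaNode("slave" + i));
    this.log = []; // ordered list of writes accepted by the master
    this.nextRead = 0;
  }
  // Writes go to the master and are recorded for later replication.
  write(key, value) {
    const op = { key, value };
    this.master.apply(op);
    this.log.push(op);
  }
  // Each slave replays the log entries it has not applied yet.
  replicate() {
    for (const slave of this.slaves) {
      while (slave.applied < this.log.length) slave.apply(this.log[slave.applied]);
    }
  }
  // Reads are spread across slaves in round-robin order.
  read(key) {
    const slave = this.slaves[this.nextRead % this.slaves.length];
    this.nextRead += 1;
    return slave.data.get(key);
  }
}

const cluster = new MasterSlaveCluster(2);
cluster.write("Jim", [92, 34, 54]);
const beforeReplication = cluster.read("Jim"); // slaves have not caught up yet
cluster.replicate();
const afterReplication = cluster.read("Jim");

console.log(beforeReplication, afterReplication);
```

The read issued before `replicate()` returns nothing, which illustrates the trade-off of reading from slaves: reads scale out, but they may lag behind the master.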
  • Replication [diagram labels: Client, Writes, Master, Slave, Slave]
  • Sharding • Partitioning of data among multiple machines • Enables horizontal scaling – writes per second • Partition a collection, specify a shard key – ex: _id, timestamp
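As a rough illustration of partitioning by shard key, the sketch below routes documents and range queries to shards using a table of key ranges. The chunk boundaries, shard names and function names are all hypothetical; in a real deployment this routing is done by the mongos router using metadata it obtains from the config servers.

```javascript
// Hypothetical chunk table: each chunk covers the key range [min, max).
const chunks = [
  { min: -Infinity, max: 1000, shard: "shard1" },
  { min: 1000, max: 5000, shard: "shard2" },
  { min: 5000, max: Infinity, shard: "shard3" },
];

// Route a single document (or a query on one exact key) to its shard.
function shardFor(key) {
  const chunk = chunks.find((c) => key >= c.min && key < c.max);
  return chunk.shard;
}

// Route a range query [low, high] to every shard whose chunk overlaps it.
function shardsForRange(low, high) {
  return chunks.filter((c) => c.min <= high && c.max > low).map((c) => c.shard);
}

console.log(shardFor(42), shardFor(1000), shardsForRange(900, 1200));
```

A query that includes the shard key touches only the overlapping shards, while a query without it would have to be sent to all of them.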
  • Sharding [diagram labels: Client, Router (mongos), Config, Shard 1, Shard 2, Shard 3]
  • GridFS • Specification for storing large files in MongoDB • BSON objects in MongoDB are limited to 16 MB in size • GridFS – Divides large files among multiple documents
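The idea of dividing a file among multiple documents can be sketched as follows in plain JavaScript. The chunk documents use the `files_id`, `n` and `data` field names of GridFS chunk documents, but the functions, the file identifier and the tiny 256-byte chunk size are assumptions chosen to keep the example small; this is not a driver API.

```javascript
// Split a byte array into ordered chunk documents of at most chunkSize bytes.
function splitIntoChunks(fileId, bytes, chunkSize) {
  const chunkDocs = [];
  for (let offset = 0, n = 0; offset < bytes.length; offset += chunkSize, n++) {
    chunkDocs.push({
      files_id: fileId, // which file this chunk belongs to
      n: n, // position of the chunk within the file
      data: bytes.subarray(offset, offset + chunkSize),
    });
  }
  return chunkDocs;
}

// Rebuild the original bytes from chunk documents, in any order.
function reassemble(chunkDocs) {
  const ordered = [...chunkDocs].sort((a, b) => a.n - b.n);
  const total = ordered.reduce((sum, c) => sum + c.data.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const c of ordered) {
    out.set(c.data, offset);
    offset += c.data.length;
  }
  return out;
}

// A made-up 1000-byte "file" and an illustrative chunk size of 256 bytes.
const file = new Uint8Array(1000).map((_, i) => i % 256);
const parts = splitIntoChunks("file-1", file, 256);
const restored = reassemble([...parts].reverse());

console.log(parts.length, restored.length);
```

Because every chunk is its own document, each one stays well under the BSON size limit regardless of how large the whole file is.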
  • References • Mongo Cookbook – http://cookbook.mongodb.org/ • NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence – http://www.amazon.com/NoSQL-Distilled-Emerging-Polyglot-Persistence/dp/0321826620 • Seven Databases in Seven Weeks – http://www.amazon.com/Seven-Databases-Weeks-Modern-Movement/dp/1934356921
  • Contacts { name: "Sundar Nathikudi", mail: "mln@mlnadvertising.com", website: "http://www.mlnadvertising.com" }
  • Thank You