MongoDB Hackathon 02
 


    • MongoDB Hackathon 02
      Vivek A. Ganesan (vivganes@gmail.com)
      Big Data Gods Meetup, Santa Clara, CA, May 18, 2013
    • Before we start
      Copyright 2013, Vivek A. Ganesan, All rights reserved
      o A BIG thank you to our sponsors – Big Data Cloud
        o Meeting Space
        o Food + Drinks
        o Consulting/Training
    • Agenda
      o Review of Hackathon 01
      o Data Modeling
      o Indexing
      o Aggregation
      o Map/Reduce
    • Introduction
      o This is a hackathon, not a class
        o Which means we work on stuff together
        o Please consult and help your teammates
        o There will be labs (that's when we learn!)
      o Talk to your teammates
        o Figure out what problem you want to solve
        o Think about your data sets and how to model them in MongoDB
    • Review – MongoDB Basics
      o MongoDB is a document-oriented NoSQL data store
      o It stores data internally as Binary JSON (BSON)
      o A MongoDB server may hold multiple databases
      o A database may have multiple collections (the analog of tables)
      o A collection is a container of documents
      o Documents contain key/value pairs
        o MongoDB inserts a default "_id" key into every document
        o Users can set the value of "_id" to anything they want, as long as it is unique within the collection
      o Documents are schema-free
        o There is no fixed structure to a collection
        o A collection can hold documents with different key/value pairs
    • Review – Shell and Clients
      o The Mongo Shell is a CLI client for MongoDB
        o Shell commands are JavaScript functions
        o You can write your own JavaScript code within the shell
        o You can also import JavaScript files using load()
      o The Mongo Shell looks for an initialization file: ~/.mongorc.js
        o Set up global variables here
      o To use your favorite editor within the Mongo Shell:
        o Set the environment variable EDITOR to your editor, then use edit <variable> in the shell
      o MongoDB supports clients (drivers) in several programming languages:
        o JS, Java, C, C++, C#, Scala, Python, Ruby, Perl and Erlang
    • Review – MongoDB Objects
      o Note: Mongo Shell commands are in blue and output is in green
      o Mongo uses a hierarchical naming scheme for database objects
        o The current database is always in the db object
        o The db command prints the name of the current database
        o A collection called "mycollection" in the current database:
          db.mycollection (Note: this is a MongoDB object)
      o Commands are methods invoked on objects
        o For example, to insert a document into the db.mycollection collection:
          the db.mycollection.insert() command
        o For example, to find documents in the db.mycollection collection:
          the db.mycollection.find() command
    • Review – Create
      o First exercise:
        o Create a new database called "blog"
        o Create a collection called "users" and a collection called "posts"
      o Solution to the first exercise:
          use blog;
          db;                              => blog
          show collections;                => system.indexes
          db.createCollection("users");    => { "ok" : 1 }
          db.createCollection("posts");    => { "ok" : 1 }
          show collections;                => posts, system.indexes, users
    • Review – Insert
      o Second exercise:
        o In the "users" collection:
          o Insert a single document, {username: "admin"}
        o In the "posts" collection:
          o Insert ten posts using a loop
          o Blog data: post_title, post_body and post_tags (as CSV)
      o Solution to the second exercise:
          db.users.insert({username: "admin"});
          for (var i = 1; i <= 10; i++) {
            db.posts.insert({post_title: "Title", post_body: "Post Body", post_tags: "tag1,tag2,tag3,tag4,tag5"});
          }
    • Review – Updates with Modifiers
      o Third exercise:
        o In the "posts" collection:
          o Update all ten posts with an updated_at key set to the current timestamp
      o Solution to the third exercise:
        o Note: for an update call without a modifier, MongoDB replaces the entire document (modifiers start with a "$" symbol)
          db.posts.update({}, {$set: {updated_at: new Date()}}, false, true);
        o The last two arguments are upsert (false) and multi (true); without multi = true, only the first matching document would be updated
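To make the difference between a plain replacement update and a `$set` update concrete, here is a small plain-JavaScript sketch that models both behaviours on an in-memory object. It is a simplified illustration only: `replaceUpdate` and `setUpdate` are hypothetical helpers, not Mongo shell functions, and they handle top-level fields only. Like the server, the replacement keeps the original `_id`.

```javascript
// Simplified model of two update styles on a single document.
// Hypothetical helpers for illustration; not Mongo shell functions.

// Update without a modifier: the new document replaces the old one,
// keeping only the original _id.
function replaceUpdate(doc, replacement) {
  return Object.assign({ _id: doc._id }, replacement);
}

// Update with $set: only the listed (top-level) fields change; others are kept.
function setUpdate(doc, update) {
  return Object.assign({}, doc, update.$set);
}

const post = { _id: 1, post_title: "Title", post_body: "Post Body" };
const now = new Date(0); // fixed timestamp so the example is repeatable

const replaced = replaceUpdate(post, { updated_at: now });
const modified = setUpdate(post, { $set: { updated_at: now } });

console.log(Object.keys(replaced)); // post_title and post_body are gone
console.log(Object.keys(modified)); // all original keys plus updated_at
```

Running the replacement form against the ten posts would silently drop their titles, bodies and tags, which is why the solution uses `$set`.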
    • Review – Selective Updates
      o Fourth exercise:
        o In the "posts" collection:
          o Update the posts so that the first three posts have a "foo" tag (use the cursor functionality to iterate)
      o Solution to the fourth exercise:
          var c = db.posts.find().limit(3);
          while (c.hasNext()) {
            var post = c.next();
            post["post_tags"] = post["post_tags"] + ",foo";
            db.posts.save(post); // save() replaces the stored document that has the same _id
          }
    • Review – Mastering find
      o In a Mongo Shell:
        o Find all posts but extract only the post_title field
          db.posts.find({}, {post_title: 1, _id: 0});
        o List all posts in reverse order of creation (a default ObjectId _id starts with its creation timestamp, so sorting on _id approximates sorting on a created_on field)
          db.posts.find().sort({_id: -1});
        o Do the same as above but paginate in sets of three (this query returns the second page)
          db.posts.find().sort({_id: -1}).skip(3).limit(3);
        o Find all posts that contain a tag called "foo" (a regex match on the CSV string, so it would also match a tag such as "food")
          db.posts.find({post_tags: /foo/});
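The `skip`/`limit` pagination above generalises to any page: page n (counting from 1) with page size k skips (n − 1) × k documents and limits to k. The plain-JavaScript sketch below applies that arithmetic to an in-memory array sorted newest first; `pageBounds` and `paginate` are hypothetical helpers, and in the shell the same two numbers would be passed to `.skip()` and `.limit()`.

```javascript
// Page arithmetic for skip/limit pagination (pages numbered from 1).
function pageBounds(page, pageSize) {
  return { skip: (page - 1) * pageSize, limit: pageSize };
}

// Apply the bounds to an in-memory list, the way
// db.posts.find().sort({_id: -1}).skip(s).limit(l) would to a cursor.
function paginate(docs, page, pageSize) {
  const { skip, limit } = pageBounds(page, pageSize);
  return docs.slice(skip, skip + limit);
}

// Ten posts with ids 1..10, sorted newest first (descending _id).
const posts = Array.from({ length: 10 }, (_, i) => ({ _id: i + 1 }))
  .sort((a, b) => b._id - a._id);

console.log(paginate(posts, 2, 3).map((p) => p._id)); // ids 7, 6, 5
console.log(paginate(posts, 4, 3).map((p) => p._id)); // id 1 (last, partial page)
```

The server still walks past every skipped document, so for deep pages a range query on `_id` (continuing from the last `_id` seen) is often preferred over large `skip` values.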
    • Review – Modifiers
      o Fifth exercise:
        o Modify the "posts" collection
        o Change the post_tags field to an array instead of a CSV list
          var c = db.posts.find();
          while (c.hasNext()) {
            var post = c.next();
            post["post_tags"] = post["post_tags"].split(",");
            db.posts.save(post);
          }
    • Data Modeling
      o http://docs.mongodb.org/manual/core/data-modeling/
      o When to reference?
        o When it makes sense to, e.g., for many-to-many relationships
        o When document size is a concern (a single BSON document is limited to 16 MB)
        o Some drivers may resolve references automatically
      o When to embed?
        o When it is "natural", e.g., a blog post and its comments
        o When there is a need for atomic operations (a write to a single document is atomic)
        o When read performance is critical (one read returns the whole document)
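The two modeling styles are easiest to compare by writing out the document shapes. The plain-JavaScript sketch below shows a blog post with its comments embedded, and the same data with comments referenced by `post_id` in a separate collection. The field names (`comments`, `post_id`, `author`), the sample values, and the arrays standing in for collections are all hypothetical. With embedding, one read returns everything; with referencing, the client (or driver) has to do a second lookup to put the pieces back together.

```javascript
// Embedded: the comments live inside the post document,
// so one read returns the post together with its comments.
const embeddedPost = {
  _id: 1,
  post_title: "Title",
  comments: [
    { author: "alice", text: "Nice post" },
    { author: "bob", text: "Thanks" },
  ],
};

// Referenced: comments are separate documents pointing back at the post.
// Arrays stand in for the "posts" and "comments" collections.
const posts = [{ _id: 1, post_title: "Title" }];
const comments = [
  { _id: 100, post_id: 1, author: "alice", text: "Nice post" },
  { _id: 101, post_id: 1, author: "bob", text: "Thanks" },
];

// Resolving the reference takes a second lookup (a client-side "join").
function loadPostWithComments(postId) {
  const post = posts.find((p) => p._id === postId);
  if (!post) return null;
  const related = comments
    .filter((c) => c.post_id === postId)
    .map(({ author, text }) => ({ author, text }));
  return Object.assign({}, post, { comments: related });
}

console.log(loadPostWithComments(1).comments.length); // 2
```

Both paths end with the same shape; the difference is how many reads it takes and whether a single write can update post and comments atomically.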
    • Lab 01 – Model your data set
      o Break – 15 minutes
      o Lab 01 – 45 minutes – with your team:
        o Look at your data set and figure out how you will model it
        o How would you bulk load the data?
        o How would you handle errors while loading?
        o Implement the schema for your data set
        o Bulk load a small portion of your data set
        o Verify the load and also run some sample queries
        o Figure out which queries you would run frequently
    • Indexes
      o http://docs.mongodb.org/manual/core/indexes/
      o When to index?
        o To improve find performance
        o To improve sort performance
        o Note: every index adds overhead to writes (inserts, updates and deletes)
      o What to index?
        o Depends on the query
        o Usually, the most frequently searched fields
        o Sometimes, fields in embedded documents as well
    • Types of Indexes and Options
      o Unique indexes (_id has a unique index by default)
      o Simple (single-field) indexes
      o Compound indexes
        o Prefix order is important!
      o Text indexes
      o Sparse indexes
      o Multi-key indexes (for arrays)
      o Geospatial and geoHaystack indexes
      o Indexes can be built in the background (recommended!)
      o Indexes can be named explicitly (definitely recommended!)
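Why prefix order matters can be shown with a simplified model. A compound index on `{author: 1, post_date: 1, post_title: 1}` is sorted by `author` first, then `post_date`, then `post_title`, so equality conditions can narrow the scan only along a leading prefix of those fields. The hypothetical helper below encodes just that equality-prefix rule; the real query planner also weighs range predicates, sorts and competing indexes, so treat this as an illustration, not a description of the planner. The field names are assumptions.

```javascript
// Simplified model: how many leading fields of a compound index
// can be used by a query with equality conditions on `queryFields`.
function usablePrefixLength(indexFields, queryFields) {
  const wanted = new Set(queryFields);
  let n = 0;
  while (n < indexFields.length && wanted.has(indexFields[n])) {
    n += 1;
  }
  return n;
}

// Hypothetical compound index, e.g. created in the shell with
// db.posts.ensureIndex({author: 1, post_date: 1, post_title: 1})
const index = ["author", "post_date", "post_title"];

console.log(usablePrefixLength(index, ["author"]));               // 1
console.log(usablePrefixLength(index, ["author", "post_date"]));  // 2
console.log(usablePrefixLength(index, ["post_date"]));            // 0: no prefix, index not usable
console.log(usablePrefixLength(index, ["author", "post_title"])); // 1: only "author" narrows the scan
```

In practice this means putting the fields your queries always filter on first in the key pattern.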
    • Lab 02 – Indexes
      o Lab 02 – 30 minutes – with your team:
        o Look at the frequent queries from Lab 01 and consider:
          o Which would you index and why?
          o What kinds of indexes are needed?
          o Since this is predominantly a read use case, index away
          o Would you use a sparse index? For what and how?
          o Would you use a geospatial index? For what and how?
          o Would you use a TTL index? For what and how?
    • Aggregation
      o Used for "group by"-like queries
      o Aggregation Framework (introduced in 2.1)
        o http://docs.mongodb.org/manual/aggregation/
        o Simple count: db.posts.count();
        o Using the Aggregation Framework: db.posts.aggregate([{$group: {_id: null, count: {$sum: 1}}}]);
        o Check the reference for a comparison with SQL GROUP BY
      o Map/Reduce is still supported (the older approach, and still relevant)
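A typical "group by" on the blog data is counting posts per tag once `post_tags` is an array. In the shell this could be written as an `$unwind` / `$group` / `$sort` pipeline (shown in the comment for reference, not executed here). The plain-JavaScript sketch below computes the same result over an in-memory array so each stage can be followed step by step; the sample posts are hypothetical.

```javascript
// Shell equivalent (for reference):
// db.posts.aggregate([
//   { $unwind: "$post_tags" },
//   { $group: { _id: "$post_tags", count: { $sum: 1 } } },
//   { $sort: { count: -1 } }
// ]);

const posts = [
  { post_title: "A", post_tags: ["mongodb", "nosql"] },
  { post_title: "B", post_tags: ["mongodb", "foo"] },
  { post_title: "C", post_tags: ["mongodb"] },
];

// $unwind: one output document per array element.
const unwound = posts.flatMap((p) =>
  p.post_tags.map((tag) => ({ post_title: p.post_title, post_tags: tag }))
);

// $group: count documents per tag value.
const counts = new Map();
for (const doc of unwound) {
  counts.set(doc.post_tags, (counts.get(doc.post_tags) || 0) + 1);
}

// $sort: highest count first.
const result = [...counts]
  .map(([tag, count]) => ({ _id: tag, count }))
  .sort((a, b) => b.count - a.count);

console.log(result[0]); // the "mongodb" tag, with a count of 3
```

The output documents use `_id` for the group key, mirroring what `$group` returns.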
    • Lab 03 – Aggregation
      o Lab 03 – 30 minutes – with your team:
        o Figure out what aggregations to run on the data set:
          o For example, the average rating per user?
          o Or, the average number of movies rated, across all users?
        o Write the queries for these aggregations and test them
        o Are indexes helpful in aggregations? Why/why not?
        o Are you better off just doing these in your client code? Why/why not?
        o When would you use pipelined aggregations?
    • Map/Reduce
      o Scatter/gather framework
      o db.collection.mapReduce(map_fn, red_fn, {out: output_coll})
        o http://docs.mongodb.org/manual/aggregation/
      o Mapper – just emits key/value pairs
      o Framework – groups and sorts the mapper output => Reducer
      o Reducer – applies a function to its input => output collection
      o Distributed computation framework for full table scans
        o http://docs.mongodb.org/manual/tutorial/map-reduce-examples/
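The mapper / framework / reducer flow described above can be traced with a small in-memory simulation. The plain-JavaScript sketch below mimics the shape of a Mongo map/reduce job: a map function that calls `emit(key, value)` with `this` bound to the current document, and a reduce function applied to the values collected per key. `runMapReduce` is a hypothetical helper running in a single process, not the server implementation; the comment shows how the same `map`/`reduce` pair would be handed to `db.posts.mapReduce`.

```javascript
// In-memory simulation of a map/reduce job (single process, not distributed).
// Shell equivalent (for reference):
//   db.posts.mapReduce(map, reduce, { out: "tag_counts" });

let emit; // bound per run, like the emit() available to Mongo map functions

function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.forEach((doc) => map.call(doc)); // `this` is the current document

  // Group by key, sort the keys, then reduce each group.
  return [...groups.keys()].sort().map((key) => {
    const values = groups.get(key);
    // Like Mongo, skip reduce when a key has only one value.
    const value = values.length === 1 ? values[0] : reduce(key, values);
    return { _id: key, value };
  });
}

// Mapper: emit (tag, 1) for every tag of a post.
const map = function () {
  this.post_tags.forEach((tag) => emit(tag, 1));
};

// Reducer: add up the counts for one tag.
const reduce = function (key, values) {
  return values.reduce((sum, v) => sum + v, 0);
};

const posts = [
  { post_tags: ["mongodb", "nosql"] },
  { post_tags: ["mongodb", "foo"] },
];

console.log(runMapReduce(posts, map, reduce));
// one output document per tag: foo -> 1, mongodb -> 2, nosql -> 1
```

As in Mongo's output collections, each result document holds the key in `_id` and the reduced result in `value`.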
    • Lab 04 – Map/Reduce
      o Lab 04 – 30 minutes – with your team:
        o Go through the Map/Reduce examples
        o Figure out what Map/Reduce functions you would use
        o Implement these functions (on a small data set)
        o Some things to think about:
          o Can you use Map/Reduce to "seed" your recommendations?
          o Can you use incremental Map/Reduce to "update" your recommendations? How would you do this?
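Incremental map/reduce depends on the reduce function being safe to re-apply. Mongo may call reduce on its own earlier output, and with the output option `{ out: { reduce: "tag_counts" } }` it reduces the results of a new run (typically restricted to new documents with the `query` option) into the existing output collection. That only works if reducing two partial results gives the same answer as reducing all values at once, and if the reducer returns the same shape the mapper emits. The plain-JavaScript sketch below checks this for a counting reducer and shows why a naive averaging reducer breaks it. The usual fix is to carry a `{sum, count}` pair and divide at the end, which is what Mongo's `finalize` option is for. The reducers and ratings are hypothetical.

```javascript
// Counting reducer: safe to re-reduce, so partial results can be merged.
function reduceCount(key, values) {
  return values.reduce((sum, v) => sum + v, 0);
}

// Naive averaging reducer: NOT safe to re-reduce.
function reduceAvgNaive(key, values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Safe alternative: carry {sum, count}; the division would go in finalize.
function reduceSumCount(key, values) {
  return values.reduce(
    (acc, v) => ({ sum: acc.sum + v.sum, count: acc.count + v.count }),
    { sum: 0, count: 0 }
  );
}

// Hypothetical ratings for one user: an earlier batch and a new batch.
const oldBatch = [5, 3, 4];
const newBatch = [1];

// All values at once vs. re-reducing two partial results.
const countAll = reduceCount("u1", [1, 1, 1, 1]);
const countMerged = reduceCount("u1", [reduceCount("u1", [1, 1, 1]), reduceCount("u1", [1])]);

const avgAll = reduceAvgNaive("u1", oldBatch.concat(newBatch));
const avgMerged = reduceAvgNaive("u1", [reduceAvgNaive("u1", oldBatch), reduceAvgNaive("u1", newBatch)]);

// The mapper would emit {sum: rating, count: 1} for each rating.
const asPairs = (xs) => xs.map((x) => ({ sum: x, count: 1 }));
const pairMerged = reduceSumCount("u1", [
  reduceSumCount("u1", asPairs(oldBatch)),
  reduceSumCount("u1", asPairs(newBatch)),
]);

console.log(countAll, countMerged);             // 4 4
console.log(avgAll, avgMerged);                 // 3.25 2.5 (re-reducing changed the answer)
console.log(pairMerged.sum / pairMerged.count); // 3.25
```

The same reasoning applies to recommendation scores: keep whatever partial state (sums, counts) lets a new batch be merged into yesterday's output without reprocessing everything.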
    • Questions? Comments?
      Thank You!
      E-mail: vivganes@gmail.com
      Twitter: onevivek