Managing Social Content with MongoDB

Media owners are turning to MongoDB to drive social interaction with their published content. The way customers consume information has changed, and passive communication is no longer enough: they want to comment, share and engage with publishers and their community through a range of media types, via multiple channels, whenever and wherever they are. Taking this semi-structured and unstructured data and making it work in a traditional relational database poses serious challenges. This webinar looks at how MongoDB's schemaless design and document orientation give organisations like the Guardian the flexibility to aggregate social content and scale out.

Speaker notes
  • Evolutions in computing are significantly impacting the traditional RDBMS:
    * data volumes are orders of magnitude higher than before
    * tens of millions of queries a second
    * structured and unstructured data
    * cloud computing and storage
    * scaling horizontally rather than vertically, because buying a bigger box eventually hits a capacity limit
    * commodity servers rather than expensive SANs
    * developers no longer do waterfall development; they want to be more agile and more flexible in their data models
  • Where does MongoDB sit when you compare functionality against performance? It aims to offer most of the features of a relational database, but not complex joins, which don't scale.
  • * No joins, for scalability: doing joins across shards in SQL is highly inefficient and difficult.
    * MongoDB is geared for easy scaling: going from a single node to a distributed cluster is easy.
    * Little or no application code change is needed to scale from a single node to a sharded cluster.
  • * You can always add and remove indexes at runtime (but reindexing takes some time).
  • * upserts; update operators $push and $inc
    * atomicity
  • * Rich query language
    * Powerful: can do range queries with $lt and $gt
    * Update: can update parts of documents
  • * Later, when extending: what's wrong with that schema?
    * Comments: with a lot of comments the document keeps growing, but a single document can be at most 16 MB in size; growth also interacts with padding factors.
  • * Also: the one-to-many pattern
  • Change some customers to European ones.
    Craigslist: they couldn't introduce new features as fast as they wanted, because each feature required a schema change, which has a massive impact on the database. Possible …
  • And counting...
    * National Archives: digitising their whole dataset and storing it in MongoDB
    * The Guardian: main database for every new project
    * Navteq: discovered MongoDB because of its location-based features, and now loves it for the flexibility of the schema
    * CERN: uses it for their data aggregation system; all the systems feeding that database result in 1M updates a day
    * A customer in France:
      - 250 million products stored (product data only, not images, which are stored in their own CDN)
      - 300 million reads per day (peak at 1600 reads per second)
      - 150 million writes per day

    1. Managing Social Content with MongoDB
        Chris Harris - charris@10gen.com - twitter: @cj_harris5
    2. Traditional Architecture
    3. Traditional Architecture
        Diagram layers: HTML / Web Server / Application Server (Controllers, Services) / SQL Database
    4. Challenge - Write Volumes
    5. Increase in Write Ratio
        Users don't just want to read content! They want to share and contribute to the content. Volume writes!
    6-7. Need to Scale Datasource
        Diagram: JSON requests / Web Server / Application Server / Service #1 / SQL. Bottleneck!
    8. Application Cache?
        Diagram: JSON requests / Web Server / Application Server / Service #1 / App Cache / SQL
    9. Issues
        + Read-only data comes from a cache
        - Writes slow down, as both the cache and the database need to be updated
        - Cache data needs to be kept in sync between application servers
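The cache issues on slide 9 can be made concrete with a small in-memory model. This is a hypothetical sketch, not code from the talk: a shared `Map` stands in for the SQL database, and each `AppServer` keeps its own private cache, so a write through one server leaves the other server's cached copy stale.

```javascript
// Hypothetical model of slide 9: one shared database, one private cache per app server.
class AppServer {
  constructor(db) {
    this.db = db;            // shared "database" (a Map standing in for SQL)
    this.cache = new Map();  // private, per-server application cache
  }

  read(id) {
    // Read-only traffic is served from the cache when possible.
    if (!this.cache.has(id)) {
      this.cache.set(id, this.db.get(id));
    }
    return this.cache.get(id);
  }

  write(id, value) {
    // Every write has to update two places: the database and this server's cache.
    this.db.set(id, value);
    this.cache.set(id, value);
  }
}

const db = new Map([["article:1", { title: "Managing Social Content", comments: 0 }]]);
const serverA = new AppServer(db);
const serverB = new AppServer(db);

serverA.read("article:1");
serverB.read("article:1"); // both caches now hold comments: 0

serverA.write("article:1", { title: "Managing Social Content", comments: 1 });

console.log(serverA.read("article:1").comments); // 1
console.log(serverB.read("article:1").comments); // 0: serverB's cache is stale
```

Nothing in `write` tells `serverB` to drop its copy; keeping the caches in sync needs extra machinery, which is the cost the slide lists.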
    10. IT needs are evolving...
        Agile Development
          • Iterative
          • Continuous
        Data Volume, Type & Use
          • Trillions of records
          • 100's of millions of queries per second
          • Real-Time Analytics
          • Unstructured / semi-structured
        New Hardware Architectures
          • Commodity servers
          • Cloud Computing
          • Horizontal Scaling
    11. Tradeoff: Scale vs. Functionality
        Chart: scalability & performance vs. depth of functionality, positioning memcached, key/value and RDBMS
    12. Terminology
        RDBMS      MongoDB
        Table      Collection
        Row(s)     JSON Document
        Index      Index
        Join       Embedding & Linking
    13. Publishing Content with MongoDB
    14. A simple start
        article = {author: "Chris", date: new Date(), title: "Managing Social Content"}
        > db.articles.save(article)
        Map the documents to your application.
    15. Find the document
        > db.articles.find()
        { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"), author: "Chris", date: ISODate("2012-01-23T14:01:00.117Z"), title: "Managing Social Content" }
        Note:
        • _id must be unique, but can be anything you'd like
        • A BSON ObjectId is used by default if one is not supplied
    16. Add an index, find via index
        > db.articles.ensureIndex({author: 1})
        > db.articles.find({author: "Chris"})
        { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"), author: "Chris", date: ISODate("2012-01-23T14:01:00.117Z"), ... }
        Secondary index on "author"
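What the secondary index on `author` buys can be sketched in plain JavaScript. The `Collection` class below is hypothetical and only a model (MongoDB itself keeps B-tree indexes): the index maps each `author` value to the `_id`s that hold it, so an indexed lookup skips the full scan an unindexed field needs.

```javascript
// Hypothetical in-memory collection with secondary indexes, for illustration only.
// Assumes each _id is saved once (no updates), so index entries never go stale.
class Collection {
  constructor() {
    this.docs = new Map();     // _id -> document
    this.indexes = new Map();  // field name -> Map(value -> Set of _ids)
  }

  ensureIndex(field) {
    // Build the index from the documents already stored.
    const index = new Map();
    for (const [id, doc] of this.docs) {
      addToIndex(index, doc[field], id);
    }
    this.indexes.set(field, index);
  }

  save(doc) {
    this.docs.set(doc._id, doc);
    // New documents are added to every existing index.
    for (const [field, index] of this.indexes) {
      addToIndex(index, doc[field], doc._id);
    }
  }

  findBy(field, value) {
    const index = this.indexes.get(field);
    if (index) {
      // Indexed path: jump straight to the matching _ids.
      const ids = index.get(value) || new Set();
      return [...ids].map((id) => this.docs.get(id));
    }
    // Unindexed path: scan the whole collection.
    return [...this.docs.values()].filter((doc) => doc[field] === value);
  }
}

function addToIndex(index, value, id) {
  if (!index.has(value)) index.set(value, new Set());
  index.get(value).add(id);
}

const articles = new Collection();
articles.save({ _id: 1, author: "Chris", title: "Managing Social Content" });
articles.save({ _id: 2, author: "Marc", title: "Another article" });
articles.ensureIndex("author");
articles.save({ _id: 3, author: "Chris", title: "A follow-up" });

console.log(articles.findBy("author", "Chris").map((d) => d._id)); // [ 1, 3 ]
```

The titles and `_id`s are made-up sample data; the point is that `findBy("author", ...)` never touches documents by other authors once the index exists.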
    17. Social Tagging
    18. Extending the schema
        (image: http://nysi.org.uk/kids_stuff/rocket/rocket.htm)
    19. Adding Tags
        > db.articles.update(
              {title: "Managing Social Content"},
              {$push: {tags: {$each: ["MongoDB", "NoSQL"]}}}
          )
        Push social "tags" into the existing article. $each appends each element separately; without it, $push would append the whole array as a single nested element.
    20. Find the document
        > db.articles.find()
        { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"), author: "Chris", date: ISODate("2012-01-23T14:01:00.117Z"), title: "Managing Social Content", tags: [ "MongoDB", "NoSQL" ] }
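The difference between pushing an array and pushing its elements with `$each` can be modelled on plain objects. `pushOne` and `pushEach` are hypothetical helpers that mimic the two update forms; they are not MongoDB driver functions.

```javascript
// Mimics {$push: {field: value}}: appends `value` as ONE element, even if it is an array.
// A missing field starts as an empty array, as $push does.
function pushOne(doc, field, value) {
  const arr = doc[field] ?? [];
  return { ...doc, [field]: [...arr, value] };
}

// Mimics {$push: {field: {$each: values}}}: appends every element of `values` separately.
function pushEach(doc, field, values) {
  const arr = doc[field] ?? [];
  return { ...doc, [field]: [...arr, ...values] };
}

const article = { title: "Managing Social Content" };

const nested = pushOne(article, "tags", ["MongoDB", "NoSQL"]);
const flat = pushEach(article, "tags", ["MongoDB", "NoSQL"]);

console.log(JSON.stringify(nested.tags)); // [["MongoDB","NoSQL"]]
console.log(JSON.stringify(flat.tags));   // ["MongoDB","NoSQL"]
```

Only the flat form matches the `tags: [ "MongoDB", "NoSQL" ]` document the deck shows afterwards.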
    21. Social Comments
    22. Query operators
        • Conditional operators:
          ‣ $ne, $in, $nin, $mod, $all, $size, $exists, $type, ..
          ‣ $lt, $lte, $gt, $gte
        • Update operators:
          ‣ $set, $inc, $push
    23. Extending the Schema
        new_comment = {author: "Marc", date: new Date(), text: "great article", stars: 5}
        > db.articles.update(
              {title: "Managing Social Content"},
              {"$push": {comments: new_comment}, "$inc": {comments_count: 1}}
          )
    24-25. Extending the Schema
        { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
          author : "Chris",
          date: ISODate("2012-01-23T14:01:00.117Z"),
          title : "Managing Social Content",
          tags : [ "MongoDB", "NoSQL" ],
          comments : [{ author : "Marc", date : ISODate("2012-01-23T14:31:53.848Z"), text : "great article", stars : 5 }],
          comments_count: 1 }
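Slide 23 sends `$push` and `$inc` in one update document, so the comment array and `comments_count` change together; MongoDB applies a single-document update atomically. A hypothetical `applyUpdate` helper that understands only those two operators shows the resulting shape on a plain object (the second comment is made-up sample data):

```javascript
// Hypothetical mini update applier supporting only $push and $inc.
function applyUpdate(doc, update) {
  const result = { ...doc };
  for (const [field, value] of Object.entries(update.$push ?? {})) {
    result[field] = [...(result[field] ?? []), value];
  }
  for (const [field, amount] of Object.entries(update.$inc ?? {})) {
    result[field] = (result[field] ?? 0) + amount;
  }
  return result;
}

let article = { title: "Managing Social Content", tags: ["MongoDB", "NoSQL"] };

const comments = [
  { author: "Marc", text: "great article", stars: 5 },
  { author: "Fred", text: "agreed", stars: 4 },
];

for (const newComment of comments) {
  // Both changes go through one call, mirroring one atomic update document.
  article = applyUpdate(article, {
    $push: { comments: newComment },
    $inc: { comments_count: 1 },
  });
}

console.log(article.comments.length, article.comments_count); // 2 2
```

Keeping a separate counter lets a listing page show the comment count without loading the whole `comments` array.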
    26. Trees
        // Embedded tree
        { ...
          comments : [{
            author : "Marc", text : "...",
            replies : [{
              author : "Fred", text : "...",
              replies : []
            }]
          }]
        }
        + PROs: single document, performance, intuitive
        - CONs: hard to search, partial results, 16MB limit
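The "hard to search" con can be illustrated in plain JavaScript. This is an in-memory sketch with made-up data, not a MongoDB query: because replies nest to arbitrary depth, finding every comment by one author means walking the tree recursively, whereas a fixed dotted path such as `comments.author` only reaches one level.

```javascript
// Hypothetical recursive search over an embedded comment tree.
function findByAuthor(comments, author, found = []) {
  for (const comment of comments) {
    if (comment.author === author) found.push(comment);
    // Descend into the replies, however deep they go.
    findByAuthor(comment.replies ?? [], author, found);
  }
  return found;
}

const article = {
  title: "Managing Social Content",
  comments: [
    {
      author: "Marc", text: "...",
      replies: [
        { author: "Fred", text: "...", replies: [
          { author: "Marc", text: "...", replies: [] },
        ] },
      ],
    },
    { author: "Fred", text: "...", replies: [] },
  ],
};

console.log(findByAuthor(article.comments, "Marc").length); // 2
console.log(findByAuthor(article.comments, "Fred").length); // 2
```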
    27. One to Many - Normalized
        // Articles collection
        { _id : 1000, author : "Chris", date: ISODate("2012-01-23T14:01:00.117Z"), title : "Managing Social Content" }
        // Comments collection
        { _id : 1, article : 1000, author : "Marc", date : ISODate("2012-01-23T14:31:53.848Z"), ... }
        > article = db.articles.findOne({title: "Managing Social Content"});
        > db.comments.find({article: article._id});
        (findOne returns the document itself; find would return a cursor, which has no _id.)
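As written on the slide, the normalized pattern takes two queries: one for the article and one for its comments by `article` id. A hypothetical in-memory sketch of those two steps, using plain arrays instead of MongoDB collections (the extra comments are made-up sample data):

```javascript
// Hypothetical stand-ins for the two collections.
const articles = [
  { _id: 1000, author: "Chris", title: "Managing Social Content" },
];
const comments = [
  { _id: 1, article: 1000, author: "Marc", text: "great article" },
  { _id: 2, article: 1000, author: "Fred", text: "agreed" },
  { _id: 3, article: 2000, author: "Ann", text: "other article" },
];

// Like findOne: return the first matching document, or null.
function findOne(collection, predicate) {
  return collection.find(predicate) ?? null;
}

// Like find: return every matching document.
function findAll(collection, predicate) {
  return collection.filter(predicate);
}

// Query 1: fetch the article. Query 2: fetch its comments by reference.
const article = findOne(articles, (a) => a.title === "Managing Social Content");
const articleComments = findAll(comments, (c) => c.article === article._id);

console.log(articleComments.map((c) => c.author)); // [ 'Marc', 'Fred' ]
```

The trade-off against embedding: each comment is its own small document, so there is no 16MB ceiling, at the price of a second query.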
    28-31. Array of Ancestors
        Tree: a has replies b and e; b has replies c and d; e has reply f
        // Store all ancestors of a node
        { _id: "a" }
        { _id: "b", thread: [ "a" ], replyTo: "a" }
        { _id: "c", thread: [ "a", "b" ], replyTo: "b" }
        { _id: "d", thread: [ "a", "b" ], replyTo: "b" }
        { _id: "e", thread: [ "a" ], replyTo: "a" }
        { _id: "f", thread: [ "a", "e" ], replyTo: "e" }
        // find all messages whose thread contains "b"
        > db.msg_tree.find({"thread": "b"})
        // find all messages that replied directly to "b"
        > db.msg_tree.find({"replyTo": "b"})
        // find all ancestors of "f"
        > threads = db.msg_tree.findOne({"_id": "f"}).thread
        > db.msg_tree.find({"_id": {$in: threads}})
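The array-of-ancestors pattern relies on one rule at insert time: a reply's `thread` is its parent's `thread` plus the parent's `_id`. A hypothetical in-memory sketch builds the slide's tree that way and answers the three queries with array filters. Message "g" is an extra reply, not on the slide, added only so the first two queries give different answers.

```javascript
// Hypothetical in-memory version of the msg_tree collection.
const msgTree = [];

// A reply inherits its parent's ancestors and appends the parent itself.
function addMessage(id, replyTo) {
  if (replyTo === undefined) {
    msgTree.push({ _id: id });
    return;
  }
  const parent = msgTree.find((m) => m._id === replyTo);
  msgTree.push({ _id: id, thread: [...(parent.thread ?? []), parent._id], replyTo });
}

addMessage("a");
addMessage("b", "a");
addMessage("c", "b");
addMessage("d", "b");
addMessage("e", "a");
addMessage("f", "e");
addMessage("g", "c"); // hypothetical extra reply, not on the slide

// find({"thread": "b"}): every message with "b" among its ancestors
const inThreadB = msgTree.filter((m) => (m.thread ?? []).includes("b")).map((m) => m._id);

// find({"replyTo": "b"}): only the direct replies to "b"
const repliesToB = msgTree.filter((m) => m.replyTo === "b").map((m) => m._id);

// findOne({"_id": "f"}).thread, then find({"_id": {$in: threads}})
const threads = msgTree.find((m) => m._id === "f").thread;
const ancestorsOfF = msgTree.filter((m) => threads.includes(m._id)).map((m) => m._id);

console.log(inThreadB);    // [ 'c', 'd', 'g' ]
console.log(repliesToB);   // [ 'c', 'd' ]
console.log(ancestorsOfF); // [ 'a', 'e' ]
```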
    32. Location, Location, Location!
    33. Geospatial
        • Geo hash stored in B-Tree
        • First two values indexed
        db.articles.save({ loc: { long: 40.739037, lat: 40.739037 }});
        db.articles.save({ loc: [40.739037, 40.739037]});
        db.articles.ensureIndex({"loc": "2d"})
    34. Geospatial Query
        • Multi-location indexes for a single document
        • $near may return the document for each index match
        • $within will return a document once and once only
        Find 100 nearby locations:
        > db.locations.find({loc: {$near: [37.75, -122.42]}});
        Find all locations within a box:
        > box = [[40, 40], [60, 60]]
        > db.locations.find({loc: {$within: {$box: box}}});
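The two query shapes on slide 34 can be modelled on a flat plane. This is a hypothetical sketch, not MongoDB's geohash implementation: `near` sorts by plain Euclidean distance and caps the result size (the slide's "Find 100 nearby locations" suggests a default cap of 100, taken here as the default `limit`), and `withinBox` keeps points inside an inclusive box. The sample points are made up.

```javascript
// Hypothetical flat-plane model of "2d" queries; points are [x, y] pairs.
const locations = [
  { _id: 1, loc: [45, 45] },
  { _id: 2, loc: [55, 58] },
  { _id: 3, loc: [70, 10] },
  { _id: 4, loc: [38, -120] },
];

// $near-style: closest first, at most `limit` results.
function near(docs, [qx, qy], limit = 100) {
  const dist = ([x, y]) => Math.hypot(x - qx, y - qy);
  return [...docs].sort((a, b) => dist(a.loc) - dist(b.loc)).slice(0, limit);
}

// $within/$box-style: keep points inside [[minX, minY], [maxX, maxY]], inclusive.
function withinBox(docs, [[minX, minY], [maxX, maxY]]) {
  return docs.filter(({ loc: [x, y] }) => x >= minX && x <= maxX && y >= minY && y <= maxY);
}

console.log(near(locations, [37.75, -122.42], 2).map((d) => d._id)); // [ 4, 3 ]
console.log(withinBox(locations, [[40, 40], [60, 60]]).map((d) => d._id)); // [ 1, 2 ]
```

Each document here has a single location, so the "once and once only" distinction between `$within` and `$near` does not arise in this sketch.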
    35. Social Aggregation
    36. Aggregation framework
        • New aggregation framework
          • Declarative framework (no JavaScript)
          • Describe a chain of operations to apply
          • Expression evaluation
          • Return computed values
          • Framework: new operations added easily
          • C++ implementation
    37. Aggregation - Pipelines
        • Aggregation requests specify a pipeline
        • A pipeline is a series of operations
        • Members of a collection are passed through a pipeline to produce a result
          • Like a Unix pipe: ps -ef | grep -i mongod
    38. Example - twitter
        { "_id" : ObjectId("4f47b268fb1c80e141e9888c"),
          "user" : { "friends_count" : 73, "location" : "Brazil", "screen_name" : "Bia_cunha1", "name" : "Beatriz Helena Cunha", "followers_count" : 102 } }
        • Find the number of followers and friends by location
    39-42. Example - twitter
        db.tweets.aggregate([
          // Predicate
          {$match: {"user.friends_count": {$gt: 0}, "user.followers_count": {$gt: 0}}},
          // Parts of the document you want to project
          {$project: {location: "$user.location", friends: "$user.friends_count", followers: "$user.followers_count"}},
          // Function to apply to the result set
          {$group: {_id: "$location", friends: {$sum: "$friends"}, followers: {$sum: "$followers"}}}
        ]);
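The three stages can be traced by hand with ordinary array operations. This is a hypothetical in-memory re-implementation on invented sample tweets, not the MongoDB aggregation engine; it only shows what each stage hands to the next.

```javascript
// Made-up sample tweets, reduced to the fields the pipeline reads.
const tweets = [
  { user: { location: "Brazil", friends_count: 73, followers_count: 102 } },
  { user: { location: "Brazil", friends_count: 10, followers_count: 5 } },
  { user: { location: "Far Far Away", friends_count: 0, followers_count: 50 } },
  { user: { location: "Far Far Away", friends_count: 344, followers_count: 789 } },
];

// $match: keep tweets whose user has at least one friend and one follower.
const matched = tweets.filter(
  (t) => t.user.friends_count > 0 && t.user.followers_count > 0
);

// $project: reshape each document to just the fields the next stage needs.
const projected = matched.map((t) => ({
  location: t.user.location,
  friends: t.user.friends_count,
  followers: t.user.followers_count,
}));

// $group: one output document per location, summing friends and followers.
const groups = new Map();
for (const { location, friends, followers } of projected) {
  const g = groups.get(location) ?? { _id: location, friends: 0, followers: 0 };
  g.friends += friends;
  g.followers += followers;
  groups.set(location, g);
}
const result = [...groups.values()];

console.log(result);
```

The third tweet is dropped by `$match` because its user has no friends, so it contributes nothing to the "Far Far Away" totals.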
    43. Example - twitter
        { "result" : [ { "_id" : "Far Far Away", "friends" : 344, "followers" : 789 }, ... ], "ok" : 1 }
    44. Demo
        Demo files are at https://gist.github.com/2036709
    45. Use Cases
        • Content Management
        • Operational Intelligence
        • Product Data Management
        • User Data Management
        • High Volume Data Feeds
    46. Some Customers...
    47. Questions
        • 10gen Services: Development Support, Consultancy, TAM, Production Support
        • Free online MongoDB training: Develop, Deploy; classes start Oct. 2012
