Webinar: General Technical Overview of MongoDB


MongoDB is the leading open-source document database. In this webinar we'll dive into the technical details of MongoDB by first mapping it from relational concepts. Next we'll discuss an example data model and associated query functionality using commands pulled straight from the MongoDB shell. Finally, we'll delve into some of the deployment functionality provided by MongoDB, including solutions for data redundancy, node failover and auto-sharding.

Published in: Technology

  1. MongoDB Technical Overview
     Sandeep Parikh, Solutions Architect, 10gen
  2. Agenda
     • Relational Databases
     • MongoDB Features
     • MongoDB Functionality
     • Scaling and Deployment
     • Aggregates, Statistics, Analytics
     • Advanced Topics
  3. About 10gen
     • Background
       – Founded in 2007
       – First release of MongoDB in 2009
       – 74M+ in funding
     • MongoDB
       – Core server
       – Native drivers
     • Subscriptions, Consulting, Training
     • Monitoring
  4. Relational Databases
  5. Relational Databases
     [Diagram: entities and their fields]
     • Category: Name, URL
     • Article: Name, Slug, Publish date, Text
     • Tag: Name, URL
     • User: Name, Email address
     • Comment: Comment, Date, Author
  6. RDBMS Strengths
     • Data stored is very compact
     • Rigid schemas have led to powerful query capabilities
     • Data is optimized for joins and storage
     • Robust ecosystem of tools, libraries, integrations
     • 40+ years old!
  7. Enter “Big Data”
     • Gartner defines it with 3Vs
     • Volume
       – Vast amounts of data being collected
     • Variety
       – Evolving data
       – Uncontrolled formats, no single schema
       – Unknown at design time
     • Velocity
       – Inbound data speed
       – Fast read/write operations
       – Low latency
  8. Mapping Big Data to RDBMS
     • Difficult to store uncontrolled data formats
     • Scaling via big iron or custom data marts/partitioning schemes
     • Schema must be known at design time
     • Impedance mismatch with agile development and deployment techniques
     • Doesn’t map well to native language constructs
  9. MongoDB Features
  10. Goals
      • Scale horizontally over commodity systems
      • Incorporate what works for RDBMSs
        – Rich data models, ad-hoc queries, full indexes
      • Drop what doesn’t work well
        – Multi-row transactions, complex joins
      • Do not homogenize APIs
      • Match agile development and deployment workflows
  11. Key Features
      • Data stored as documents (JSON)
        – Flexible-schema
      • Full CRUD support (Create, Read, Update, Delete)
        – Atomic in-place updates
        – Ad-hoc queries: Equality, RegEx, Ranges, Geospatial
      • Secondary indexes
      • Replication – redundancy, failover
      • Sharding – partitioning for read/write scalability
  12. Document Oriented, Dynamic Schema
        {name: "will", eyes: "blue", birthplace: "NY", aliases: ["bill", "la ciacco"], gender: "???", boss: "ben"}
        {name: "jeff", eyes: "blue", height: 72, boss: "ben"}
        {name: "brendan", aliases: ["el diablo"]}
        {name: "matt", pizza: "DiGiorno", height: 72, boss: 555.555.1212}
        {name: "ben", hat: "yes"}
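As a rough illustration of what a dynamic schema means in practice, the plain-JavaScript sketch below keeps differently shaped documents in one array and filters on a field that only some of them carry. The array stands in for a collection and `findWhere` is a hypothetical helper, not a MongoDB API. The `boss` phone number is quoted here only so that the literal is valid JavaScript.

```javascript
// In-memory stand-in for a collection; this is not MongoDB itself.
const people = [
  { name: "will", eyes: "blue", birthplace: "NY", aliases: ["bill", "la ciacco"], boss: "ben" },
  { name: "jeff", eyes: "blue", height: 72, boss: "ben" },
  { name: "brendan", aliases: ["el diablo"] },
  { name: "matt", pizza: "DiGiorno", height: 72, boss: "555.555.1212" },
  { name: "ben", hat: "yes" }
];

// Hypothetical helper: keep documents whose fields equal every field in `criteria`.
function findWhere(docs, criteria) {
  return docs.filter(doc =>
    Object.keys(criteria).every(key => doc[key] === criteria[key])
  );
}

// Documents without a `height` field are simply skipped; nothing needs migrating.
const tall = findWhere(people, { height: 72 }).map(doc => doc.name);
console.log(tall);
```

No document had to declare `height` up front, which is the point the slide makes about schemas that are unknown at design time.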
  13. Disk seeks and data locality
      • Seek = 5+ ms
      • Read = really really fast
      [Diagram labels: Article, Comment, User]
  14. Disk seeks and data locality
      [Diagram labels: Article, User, Comment ×5]
  15. MongoDB Security
      • SSL
        – Between your app and MongoDB
        – Between nodes in MongoDB cluster
      • Authorization at the database level
        – Read Only / Read + Write / Administrator
      • Roadmap
        – 2.4: SASL, Kerberos authentication
        – 2.6: Pluggable authentication
  16. Use Cases
      • Content Management
      • Operational Intelligence
      • High Volume Data Feeds
      • User Data Management
      • E-Commerce
  17. MongoDB Functionality
  18. Documents
        > var new_article = {
            author: "roger",
            date: new Date(),
            title: "My Favorite 2012 Movies",
            body: "Here are my favorite movies from 2012…",
            tags: ["horror", "action", "independent"]
          }
        >
  19. Querying
        > db.articles.find()
        {
          _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
          author: "roger",
          date: ISODate("2013-01-08T22:10:19.880Z"),
          title: "My Favorite 2012 Movies",
          body: "Here are my favorite movies from 2012…",
          tags: ["horror", "action", "independent"]
        }
        // _id is unique but can be anything you like
  20. Indexes
        // create an ascending index on "author"
        > db.articles.ensureIndex({author: 1})
        > db.articles.find({author: "roger"})
        { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"), author: "roger", … }
  21. Ad-Hoc Queries
        // Query Operators:
        // $all, $exists, $mod, $ne, $in, $nin, $nor, $or,
        // $size, $type, $lt, $lte, $gt, $gte

        // find articles with any tags
        > db.articles.find({tags: {$exists: true}})
        // find posts matching a regular expression
        > db.articles.find({author: /^rog*/i})
        // count posts by author
        > db.articles.find({author: 'roger'}).count()
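The regular-expression query on this slide is worth a second look: in `/^rog*/i` the `*` applies only to the `g`, so the pattern matches any author whose name starts with "ro". The check below uses plain JavaScript `RegExp` objects rather than a MongoDB query; the author names are made up for illustration, and `/^rog/i` is just one possible tighter pattern if a "rog" prefix is what was intended.

```javascript
// Hypothetical author names, used only to exercise the two patterns.
const authors = ["roger", "Rogelio", "robert", "fred"];

const loose = /^rog*/i; // "ro" followed by zero or more "g" characters
const tight = /^rog/i;  // requires the full "rog" prefix

const looseMatches = authors.filter(name => loose.test(name));
const tightMatches = authors.filter(name => tight.test(name));

console.log(looseMatches); // also contains "robert"
console.log(tightMatches); // only names that start with "rog"
```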
  22. Atomic Updates
        // Update Modifiers
        // $set, $unset, $inc, $push, $pushAll, $pull,
        // $pullAll, $bit
        > comment = {
            author: "fred",
            date: new Date(),
            text: "Best list ever!"
          }
        > db.articles.update({ _id: "..." }, { $push: {comments: comment} });
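To make the modifier semantics concrete, here is a minimal in-memory sketch of how `$set`, `$inc` and `$push` change a single document. It is an assumption-laden model for illustration only: `applyUpdate` and the `commentCount` field are hypothetical, and the sketch says nothing about how the server achieves atomicity; it only shows the resulting document shape.

```javascript
// Minimal in-memory model of three update modifiers; not MongoDB's implementation.
function applyUpdate(doc, update) {
  for (const [field, value] of Object.entries(update.$set || {})) {
    doc[field] = value; // $set: overwrite or create the field
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    doc[field] = (doc[field] || 0) + amount; // $inc: a missing field starts at 0
  }
  for (const [field, item] of Object.entries(update.$push || {})) {
    if (doc[field] === undefined) doc[field] = []; // $push: create the array if absent
    if (!Array.isArray(doc[field])) throw new Error(field + " is not an array");
    doc[field].push(item);
  }
  return doc;
}

const article = { author: "roger", title: "My Favorite 2012 Movies" };
const comment = { author: "fred", date: new Date(), text: "Best list ever!" };

// commentCount is a hypothetical counter field, added to show $inc next to $push.
applyUpdate(article, { $push: { comments: comment }, $inc: { commentCount: 1 } });
console.log(article.comments.length, article.commentCount);
```

Because the comment is appended inside the article document itself, the reader of the article gets its comments in the same read, which ties back to the data-locality slides.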
  23. Nested Documents
        {
          _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
          author: "roger",
          date: ISODate("2013-01-08T22:10:19.880Z"),
          title: "My Favorite 2012 Movies",
          body: "Here are my favorite movies from 2012…",
          tags: ["horror", "action", "independent"],
          comments: [
            {
              author: "Fred",
              date: ISODate("2013-01-08T23:44:15.458Z"),
              text: "Best list ever!"
            }
          ]
        }
  24. Secondary Indexes
        // Index nested documents
        > db.articles.ensureIndex({"": 1})
        > db.articles.find({"": 'Fred'})
        // Index on tags
        > db.articles.ensureIndex({tags: 1})
        > db.articles.find({tags: 'Manga'})
        // Geospatial indexes
        > db.articles.ensureIndex({location: "2d"})
        > db.articles.find({location: {$near: [22, 42]}})
  25. Scaling MongoDB
  26. Scaling MongoDB
      • Replica Sets
        – Redundancy, failover, read scalability
      • Sharding
        – Auto-partitions data, read/write scalability
      • Multi-datacenter deployments
      • Tunable consistency
      • Engineering for zero downtime
  27. Replica Sets
      [Diagram labels: Client Application, Driver, Write, Read, Primary, Secondary ×2]
  28. Replica Set – Initialize
      [Diagram labels: Node 1 (Secondary), Node 2 (Secondary), Node 3 (Primary), Heartbeat, Replication ×2]
  29. Replica Set – Failure
      [Diagram labels: Node 1 (Secondary), Node 2 (Secondary), Node 3, Heartbeat, Primary Election]
  30. Replica Set – Failover
      [Diagram labels: Node 1 (Secondary), Node 2 (Primary), Node 3, Heartbeat, Replication]
  31. Replica Set – Recovery
      [Diagram labels: Node 1 (Secondary), Node 2 (Primary), Node 3 (Recovery), Heartbeat, Replication ×2]
  32. Replica Set – Recovered
      [Diagram labels: Node 1 (Secondary), Node 2 (Primary), Node 3 (Secondary), Heartbeat, Replication ×2]
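The failover sequence in the replica set slides depends on the surviving members being able to elect a new primary. A simplified way to reason about it, assuming every member has exactly one vote, is that an election can succeed only while a strict majority of the set's voting members can reach one another. The function below is an illustrative sketch with a made-up name; real elections also weigh factors such as member priority and how up to date each member is.

```javascript
// Simplified majority rule, assuming one vote per member.
function canElectPrimary(votingMembers, reachableMembers) {
  return reachableMembers > votingMembers / 2;
}

// Three-node set from the slides: losing one node still leaves a majority.
console.log(canElectPrimary(3, 2)); // true
// Losing two of three nodes leaves the survivor unable to become primary.
console.log(canElectPrimary(3, 1)); // false
```

This is also why an even member count adds no extra fault tolerance: a four-member set still tolerates only one failure.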
  33. Scaling Reads
      [Diagram labels: Client Application, Driver, Write, Read ×2, Primary, Secondary ×2]
  34. Sharding
      [Diagram labels: App Server ×3, Mongos ×3, Config Server ×3, Shard ×3 (each with Node 1, Secondary)]
  35. Data stored in shard
      • Shard is a node of the cluster
      • For production deployments a shard is a replica set
      [Diagram labels: Shard (Mongod) or Shard (Primary, Secondary, Secondary)]
  36. Config server stores metadata
      • Config Server
        – Stores cluster chunk ranges and locations
        – Production deployments need 3 nodes
        – Two phase commit (not a replica set)
      [Diagram labels: Config Server ×1 or Config Server ×3]
  37. Mongos manages the data
      • Mongos
        – Acts as a router / balancer
        – No local data (persists to config database)
        – Can have 1 or many
      [Diagram: alternative layouts of App Server and Mongos boxes]
  38. Sharding
      [Diagram labels: App Server ×3, Mongos ×3, Config Server ×3, Shard ×3 (each with Node 1, Secondary)]
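The sharding slides describe config servers holding chunk ranges and their locations, with mongos acting as the router. A minimal sketch of that routing step follows, assuming a numeric shard key and half-open chunk ranges `[min, max)`. The chunk table, shard names and function names are all hypothetical, and the real router does considerably more, such as caching metadata and balancing chunks.

```javascript
// Hypothetical chunk metadata of the kind the config servers are described as holding.
// Each chunk covers shard-key values in [min, max) and lives on one shard.
const chunks = [
  { min: -Infinity, max: 1000, shard: "shard0" },
  { min: 1000, max: 2000, shard: "shard1" },
  { min: 2000, max: Infinity, shard: "shard2" }
];

// Route a single-key operation to the one shard that owns the key.
function routeByShardKey(chunkTable, keyValue) {
  const chunk = chunkTable.find(c => keyValue >= c.min && keyValue < c.max);
  if (!chunk) throw new Error("no chunk covers key " + keyValue);
  return chunk.shard;
}

// Route a range query (low..high inclusive) to every shard whose chunks overlap it.
function routeRange(chunkTable, low, high) {
  const shards = new Set();
  for (const c of chunkTable) {
    if (c.min <= high && c.max > low) shards.add(c.shard);
  }
  return [...shards];
}

console.log(routeByShardKey(chunks, 1500));
console.log(routeRange(chunks, 900, 1100));
```

A query that includes the shard key touches one shard, while a range spanning a chunk boundary fans out to several, which is the read/write scalability trade-off the slides allude to.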
  39. Aggregates, Statistics, Analytics
  40. Analyzing Data in MongoDB
      • Custom application code
        – Run your queries, compute your results
      • Aggregation framework
        – Declarative, pipeline-based approach
      • Native Map/Reduce in MongoDB
        – JavaScript functions distributed across cluster
      • Hadoop
        – Offline batch processing/computation
  41. Aggregation Framework
        // Operations: $project, $match, $limit, $skip, $unwind, $group, $sort

        // Sample document
        {
          title: "this is my title",
          author: "bob",
          posted: new Date(),
          tags: ["fun", "good", "fun"],
          comments: [
            { author: "joe", text: "this is cool" },
            { author: "sam", text: "this is bad" }
          ],
          other: { foo: 5 }
        }

        // Pipeline
        db.article.aggregate(
          { $project: { author: 1, tags: 1 } },
          { $unwind: "$tags" },
          { $group: {
              _id: "$tags",
              authors: { $addToSet: "$author" }
          } }
        );
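To show what the `$project`, `$unwind` and `$group` stages on this slide do to the data, the sketch below reproduces the same steps with plain JavaScript arrays. It is an in-memory approximation, not the aggregation framework itself; the second article is a made-up document added so that grouping has more than one author to collect, and the ordering of the output here comes from this sketch rather than from any guarantee of the real pipeline.

```javascript
// First document follows the slide; the second is hypothetical.
const articles = [
  { title: "this is my title", author: "bob", tags: ["fun", "good", "fun"] },
  { title: "another title", author: "amy", tags: ["fun"] }
];

// $project + $unwind: keep author and tags, then emit one record per tag value.
const unwound = articles.flatMap(doc =>
  doc.tags.map(tag => ({ author: doc.author, tags: tag }))
);

// $group with $addToSet: collect the distinct authors seen for each tag.
const authorsByTag = new Map();
for (const { tags, author } of unwound) {
  if (!authorsByTag.has(tags)) authorsByTag.set(tags, new Set());
  authorsByTag.get(tags).add(author);
}

const result = [...authorsByTag].map(([tag, authors]) => ({
  _id: tag,
  authors: [...authors]
}));
console.log(JSON.stringify(result));
```

The duplicate "fun" tag on the first article produces two unwound records, but the set semantics of `$addToSet` keep "bob" from appearing twice.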
  42. Mapping SQL to Aggregation
      SQL statement:   SELECT COUNT(*) FROM users
      MongoDB command: db.users.aggregate([ {$group: {_id: null, count: {$sum: 1}}} ])

      SQL statement:   SELECT SUM(price) FROM orders
      MongoDB command: db.orders.aggregate([ {$group: {_id: null, total: {$sum: "$price"}}} ])

      SQL statement:   SELECT cust_id, SUM(price) FROM orders GROUP BY cust_id
      MongoDB command: db.orders.aggregate([ {$group: {_id: "$cust_id", total: {$sum: "$price"}}} ])

      SQL statement:   SELECT cust_id, SUM(price) FROM orders WHERE active=true GROUP BY cust_id
      MongoDB command: db.orders.aggregate([ {$match: {active: true}}, {$group: {_id: "$cust_id", total: {$sum: "$price"}}} ])
  43. Native Map/Reduce
      • More complex aggregation tasks
      • Map and Reduce functions written in JS
      • Can be distributed across sharded cluster for increased parallelism
  44. Map/Reduce Functions
        var map = function() {
          emit(, {votes: this.votes});
        };
        var reduce = function(key, values) {
          var sum = 0;
          values.forEach(function(doc) {
            sum += doc.votes;
          });
          return {votes: sum};
        };
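The map and reduce functions on this slide can be exercised without a server by writing a tiny driver that groups emitted values by key and reduces each group. The sketch below does that under several assumptions: the emit key is not shown on the slide, so `this.author` is used purely for illustration; `emit` is passed in as an argument instead of being a global as in the shell; and `runMapReduce` and the sample documents are hypothetical.

```javascript
// Hypothetical input documents; grouping by `author` is an assumption.
const docs = [
  { author: "roger", votes: 3 },
  { author: "fred", votes: 1 },
  { author: "roger", votes: 4 }
];

// Tiny single-process driver: run map over each document, then reduce per key.
function runMapReduce(input, map, reduce) {
  const emitted = new Map();
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  // `this` is bound to the current document, matching the slide's map function.
  input.forEach(doc => map.call(doc, emit));
  const output = {};
  for (const [key, values] of emitted) {
    // A key with a single emitted value is passed through without calling reduce.
    output[key] = values.length === 1 ? values[0] : reduce(key, values);
  }
  return output;
}

const map = function (emit) {
  emit(this.author, { votes: this.votes });
};
const reduce = function (key, values) {
  let sum = 0;
  values.forEach(doc => { sum += doc.votes; });
  return { votes: sum };
};

const totals = runMapReduce(docs, map, reduce);
console.log(totals);
```

Note that the reduce function returns the same shape that map emits (`{votes: …}`), which matters because reduce output may be fed back into reduce when work is split across a cluster.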
  45. Hadoop and MongoDB
      • MongoDB-Hadoop adapter
      • 1.0 released, 1.1 in development
      • Supports Hadoop
        – Map/Reduce, Streaming, Pig
      • MongoDB as input/output storage for Hadoop jobs
        – No need to go through HDFS
      • Leverage power of Hadoop ecosystem against operational data in MongoDB
  46. MongoDB Resources
      • Presentations, Webinars
        –
      • MongoDB documentation
        –
      • Community
        –
        –
  47. Questions