Big and Fat: Using MongoDB with Deep and Diverse Data Sets (MongoATL version)

The best presentation ever. Life changing.

Published in: Technology

  1. Tuesday, February 8, 2011
  2. Big and Fat. Using MongoDB with deep and diverse datasets: A case study
  3. About me
       • My name is Jeremy McAnally
       • “Software architect” at Intridea
       • Write a lot of books, OSS, etc.
       • http://github.com/jm
       • http://twitter.com/jm
       • http://authoringebooks.com
       • http://wickhamhousebrand.com
  4. New book!
  5. New book! [overlay text illegible in transcript]
  6. Preface: The Application™
  9. Disclaimer: We moved to (mostly) SQL.
  13. YAK SHAVE (repeated across the slide)
  14. Lesson 1: Abstraction is a double-edged sword.
  15. Abstract away! Talking to all data (no matter the source) the same way will keep you sane.
  16.
          users = MySQL::Query.execute("SELECT * FROM users;")
          users.each do |u|
            posts = db.collection("posts").find(:user_id => u["id"])
            # [...]
            comments = db.collection("comments").find(
              "$where" => "this.admin_count + this.moderator_count == 5")
          end
  17.
          users = User.all
          users.each do |u|
            posts = Post.find(:user_id => u.id)
            # [...]
            comments = Comment.where("this.admin_count + this.moderator_count == 5")
          end
  18.
          users = User.all
          users.each do |u|
            posts = Post.find(:user_id => u.id)
            # [...]
            comments = Comment.with_five_things
          end
  19. ...but wait! MongoDB has a lot of features that will perform better and need less (and often better) code.
  20.
          pharmacists = {}
          Patient.all.each do |patient|
            patient.prescriptions.each do |prescription|
              pharmacists[prescription.name] ||= 0
              pharmacists[prescription.name] += 1
            end
          end
  21. (Same code as slide 20, overlaid with:) SLOW AS CRAP
  22.
          map = "function() {
            this.prescriptions.forEach(function(p) {
              emit(p.name, { count : 1 });
            });
          }"

          reduce = "function(k, v) {
            var number = 0;
            v.forEach(function(item) {
              number += item.count;
            });
            return { count : number };
          }"

          pharms = @patients.map_reduce(map, reduce)
  24. Lesson 2: Schema design matters.
  25. Lesson 2: Schema design matters. (Overlay: DATA MODEL)
  26. Embedding works. Embedding documents is a smart decision in a lot of cases.
  27.
          SELECT * FROM patients WHERE id=212;
          SELECT * FROM prescriptions WHERE patient_id=212;
          SELECT * FROM appointments WHERE patient_id=212;
          SELECT * FROM contacts WHERE patient_id=212;
          SELECT * FROM claims WHERE patient_id=212;
          . . .
  28.
          {
            "_id" : ObjectId("4d51959614971661303ea716"),
            "title" : "Blogs rawk.",
            "body" : "Fo realz",
            "comments" : [
              {
                "user_name" : "Jeremy",
                "user_id" : 1234,
                "body" : "Yup."
              }
            ]
          }
  29. ...but watch it. You can also hit a ton of performance and design issues.
  32. OUR GIANT DOCUMENT; Mongo’s Pre-Allocated Space
  33. Search, listing, etc.; “Reference”; Pharmacy; Patient; Pharmacy
  34. Lesson 3: Don’t go nuts.
  35. Schemaless is fun! Having schemaless data has its own battery of advantages. (Image text: OH MAN MONGO JUST GOT REAL UP IN HERE; nosql)
  36. Schemaless Joy
       • Transforming data models is a delight
  38. Schemaless Joy
       • Transforming data models is a delight
       • Formless data isn’t awkward
  39.
          {
            "_id" : ObjectId("4d50c6c32472473e54122d29"),
            "name" : "Subject A",
            "2007" : 199,
            "2008" : 2002,
            "2010" : 387
          },
          {
            "_id" : ObjectId("4d50c6d92472473e54122d2a"),
            "name" : "Subject B",
            "2005" : 8,
            "2008" : 99,
            "2012" : 466
          },
          {
            "_id" : ObjectId("4d50c6f52472473e54122d2b"),
            "name" : "Subject C",
            "2005" : 100,
            "2009" : 120,
            "2010" : 1201,
            "2012" : 3469
          }
  40.
          > db.subjects.find({2008: {$ne: null}})
          { "_id" : ObjectId("4d50c6c32472473e54122d29"), "name" : "Subject A"
          { "_id" : ObjectId("4d50c6d92472473e54122d2a"), "name" : "Subject B"
  41. Schemaless Joy
       • Transforming data models is a delight
       • Formless data isn’t awkward
       • Arbitrary embedding is awesome
  43. Schemaless Joy
       • Transforming data models is a delight
       • Formless data isn’t awkward
       • Arbitrary embedding is awesome
       • Building to work with schemaless data can lead to some really powerful app concepts
  44. ...but be wary. Going nuts will create headaches for you.
  45. Schemaless Pain
  46. Schemaless Pain
       • Weird app behavior
  47. Schemaless Pain
       • Weird app behavior
       • Huge, long-running data transformations
  48. Schemaless Pain
       • Weird app behavior
       • Huge, long-running data transformations
       • Annoying data transforms for development envs
  49. Schemaless Pain
       • Weird app behavior
       • Huge, long-running data transformations
       • Annoying data transforms for development envs
       • Difficult to version data models
  50. Lesson 4: Dig deep.
  51.
          > db.runCommand({"serverStatus" : 1})
          {
            "version" : "1.4.3",
            "uptime" : 96,
            "localTime" : "Thu Nov 18 2010 01:49:38 GMT-0500 (EST)",
            "globalLock" : {
              "totalTime" : 96005290,
              "lockTime" : 174040,
              "ratio" : 0.0018128167729090762
            },
            "mem" : {
              "bits" : 64,
              "resident" : 2,
              "virtual" : 2396,
              "supported" : true,
              "mapped" : 0
            },
            "connections" : {
              "current" : 1,
              "available" : 19999
            },
            "extra_info" : {
  52.
          "opcounters" : {
            "insert" : 0,
            "query" : 1,
            "update" : 0,
            "delete" : 0,
            "getmore" : 0,
            "command" : 3
          }
  53.
          "connections" : {
            "current" : 1,
            "available" : 19999
          }
  54.
          Jeremy-McAnallys-MacBook-Pro:~ jeremymcanally$ mongostat
          connected to: 127.0.0.1
          insert/s query/s update/s delete/s getmore/s command/s mapped vsize res % locked % idx miss conn
                 0       0        0        0         0         1      0  2396   3        0          0    1
                 0       0        0        0         0         1      0  2396   3        0          0    1
                 0       0        0        0         0         1      0  2396   3        0          0    1
                 0       0        0        0         0         1      0  2396   3        0          0    1
                 0       0        0        0         0         1      0  2396   3        0          0    1
                 0       0        0        0         0         1      0  2396   3        0          0    1
                 0       0        0        0         0         1      0  2396   3        0          0    1
                 0       0        0        0         0         1      0  2396   3        0          0    1
  56.
          db._adminCommand({ diagLogging : 1 })
  57.
          db.currentOp()
          { inprog: [ { "opid" : 35 , "op" : "query" , "ns" : "fundb.parties" ,
                        "query" : "{ score : 1.0 }" , "inLock" : 1 }
                    ]
          }
  58.
          > db.oplog.$main.find()
          { "ts" : { "t" : 1290063566000, "i" : 1 }, "op" : "i", "ns" : "ming
          { "ts" : { "t" : 1290063569000, "i" : 1 }, "op" : "n", "ns" : "", "
          { "ts" : { "t" : 1290063579000, "i" : 1 }, "op" : "n", "ns" : "", "
          { "ts" : { "t" : 1290063581000, "i" : 1 }, "op" : "i", "ns" : "ming
          { "ts" : { "t" : 1290063581000, "i" : 2 }, "op" : "i", "ns" : "ming
  59.
          { "ts" :
            { "t" : 1290063566000,
              "i" : 1
            },
            "op" : "i",
            "ns" : "ming.foo",
            "o" : {
              "_id" : ObjectId("4ce4ceceabb1b65158000001"),
              "field" : 2
            }
          }
  60. That’s all I got. Questions?
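
The map/reduce job on slide 22 can be hard to picture without a server to run it on. The sketch below simulates the same two steps in plain Ruby against an in-memory array, so it shows the shape of the computation rather than how MongoDB executes it. The sample data and the names `PATIENTS`, `map_patient`, `reduce_counts`, and `count_prescriptions` are hypothetical and not from the talk.

```ruby
# Hypothetical sample data shaped like the patient documents in the talk:
# each patient embeds an array of prescriptions.
PATIENTS = [
  { "name" => "A", "prescriptions" => [{ "name" => "aspirin" }, { "name" => "statin" }] },
  { "name" => "B", "prescriptions" => [{ "name" => "aspirin" }] },
  { "name" => "C", "prescriptions" => [] }
]

# Map step: emit one [key, value] pair per embedded prescription,
# mirroring emit(p.name, { count : 1 }) in the JavaScript version.
def map_patient(patient)
  patient["prescriptions"].map { |p| [p["name"], { "count" => 1 }] }
end

# Reduce step: fold every value emitted for one key into a single value.
def reduce_counts(_key, values)
  { "count" => values.inject(0) { |total, v| total + v["count"] } }
end

# Driver: run map over every document, group the emitted pairs by key,
# then reduce each group.
def count_prescriptions(patients)
  emitted = patients.flat_map { |patient| map_patient(patient) }
  emitted.group_by(&:first).each_with_object({}) do |(key, pairs), out|
    out[key] = reduce_counts(key, pairs.map(&:last))
  end
end

RESULT = count_prescriptions(PATIENTS)
```

The reduce function returns a value with the same shape as the values it receives (`{ "count" => n }`). That matters on a real server, where reduce may be applied again to its own output.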

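The `opcounters` block on slide 52 holds cumulative totals since the server started, while `mongostat` on slide 54 shows per-second rates. A minimal sketch of how two snapshots could be turned into rates follows, assuming the snapshots are plain hashes already fetched from `serverStatus`. The helper name `op_rates` and the second snapshot are hypothetical, and this is not a claim about how `mongostat` is implemented.

```ruby
# Compute per-second rates from two cumulative opcounter snapshots
# taken `seconds` apart. Counters missing from `before` are treated as 0.
def op_rates(before, after, seconds)
  raise ArgumentError, "seconds must be positive" unless seconds > 0

  after.each_with_object({}) do |(op, total), rates|
    rates[op] = (total - before.fetch(op, 0)) / seconds.to_f
  end
end

# The first snapshot uses the numbers from slide 52; the second is made up.
first_sample  = { "insert" => 0,  "query" => 1,  "update" => 0,
                  "delete" => 0,  "getmore" => 0, "command" => 3 }
second_sample = { "insert" => 10, "query" => 21, "update" => 0,
                  "delete" => 0,  "getmore" => 0, "command" => 5 }

rates = op_rates(first_sample, second_sample, 2)
rates.each { |op, per_second| puts format("%-8s %.1f/s", op, per_second) }
```

Sampling at a fixed interval and diffing keeps the calculation independent of server uptime, which is why the raw totals on slide 52 are less useful on their own than the rates on slide 54.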