MongoDB for Genealogy

Why MongoDB is a great fit for genealogical data, thanks to its flexible schema, rich documents and ability to scale to humongous data sets.


Published in Technology
  • Remember, in 1995 there were around 10,000 websites. Mosaic, Lynx, Mozilla (pre-Netscape) and IE 2.0 were the only web browsers. Apache (Dec ’95), Java (’96), PHP (June ’95) and .NET didn’t exist yet. Linux just barely did (1.0 in ’94).
  • By reducing the transactional semantics the database provides, one can still solve an interesting set of problems where performance is very important, and horizontal scaling then becomes easier.
  • Store an array of the ids of all ancestors of a given document

Transcript

  • 1. Storing the Family Tree with MongoDB
  • 2. We’re going to talk about: MongoDB Intro & Fundamentals; MongoDB for Genealogy data; Scaling MongoDB for all the generations; The Family Tree; Storing a graph in MongoDB
  • 3. Steve @sp A. 15+ years building the internet. Father, husband, skateboarder, genealogist at ❤. Chief Solutions Architect @, responsible for drivers, integrations, web & docs
  • 4. Company behind MongoDB. Offices in NYC, Palo Alto, London & Dublin. 100+ employees. Support, consulting, training. Mgt: Google/DoubleClick, Oracle, Apple, NetApp, Mark Logic. Well funded: Sequoia, Union Square, Flybridge
  • 5. Introduction to MongoDB
  • 6. A bit of history
  • 7. 1974: The relational database is created
  • 8. 1979
  • 9. 1979 1994
  • 10. 1979 1994 1995
  • 11. Computers in 1995: 100 MHz Pentium; 10BASE-T; 16 MB RAM; 200 MB HD
  • 12. Cloud in 1995 (Windows 95 cloud wallpaper)
  • 13. Cell Phones in 2012: dual core 1.5 GHz; 802.11n (300+ Mbps); 1 GB RAM; 64 GB solid state
  • 14. MongoDB: Application; Document Oriented; High Performance; Fully Consistent; Horizontally Scalable. Example document: { author : "steve", date : new Date(), text : "About MongoDB...", tags : ["tech", "database"] }
  • 15. MongoDB philosophy: Keep functionality when we can (key/value stores are great, but we need more). Non-relational (no joins) makes scaling horizontally practical. Document data models are good. Database technology should run anywhere: virtualized, cloud, metal, etc.
  • 16. Under the hood: Written in C++. Runs nearly everywhere. Data serialized to BSON. Extensive use of memory-mapped files, i.e. read-through write-through memory caching.
  • 17. Database Landscape (chart: Scalability & Performance vs. Depth of Functionality, placing Memcache, MongoDB and RDBMS)
  • 18. “MongoDB has the best features of key/value stores, document databases and relational databases in one.” John Nunemaker
  • 19. Relational made normalized data look like this: Category (Name, Url); Article (Name, Slug, Publish date, Text); User (Name, Email Address); Tag (Name, Url); Comment (Comment, Date, Author)
  • 20. Document databases make normalized data look like this: Article (Name, Slug, Publish date, Text, Author, Comment[] (Comment, Date, Author), Tag[] (Value), Category[] (Value)); User (Name, Email Address)
  • 21. But we’ve been using a relational database for 40 years!
  • 22. How do people store documents in real life?
  • 23. Think about a doctor’s office. There are two ways they could organize their files
  • 24-26. Each document type in its own drawer (drawer labels as extracted: MRIs X-rays Lab Invoices Index History Medications Lab Forms)
  • 27. 2. Group related records Patient 1 Patient 2 Patient 3 ... Vendor 1 Vendor 2 Vendor 3
  • 28. 2. Group related records Patient 1 Patient 3 ... Patient 2 Vendor 1 Vendor 2 Vendor 3
  • 29. Databases work the same way (folders: Patient 1, Vendor 1). Relational: Category (Name, Url); Article (Name, Slug, Publish date, Text); User (Name, Email Address); Tag (Name, Url); Comment (Comment, Date, Author). Document: Article (Name, Slug, Publish date, Text, Author, Comment[] (Comment, Date, Author), Tag[] (Value), Category[] (Value)); User (Name, Email Address)
  • 30. Terminology (RDBMS ➜ Mongo): Table, View ➜ Collection; Row ➜ Document; Index ➜ Index; Join ➜ Embedded Document; Foreign Key ➜ Reference; Partition ➜ Shard
  • 31-32. Why MongoDB, My Top 10 Reasons: 10. Great developer experience; 9. Speaks your language; 8. Scale horizontally; 7. Fully consistent data w/atomic operations; 6. Memory caching integrated; 5. Open source; 4. Flexible, rich & structured data format, not just K:V; 3. Ludicrously fast (without going plaid); 2. Simplify infrastructure & application; 1. It’s web scale
  • 33. MongoDB Use Cases
  • 34. CMS / Blog. Needs: Business needed modern data store for rapid development and scale. Solution: Use PHP & MongoDB. Results: Real time statistics; all data, images, etc stored together (easy access, easy deployment, easy high availability); no need for complex migrations; enabled very rapid development and growth
  • 35. Photo Meta-Data. Problem: Business needed more flexibility than Oracle could deliver. Solution: Use MongoDB instead of Oracle. Results: Developed application in one sprint cycle; 500% cost reduction compared to Oracle; 900% performance improvement compared to Oracle
  • 36. Customer Analytics. Problem: Deal with massive data volume across all customer sites. Solution: Use MongoDB to replace Google Analytics / Omniture options. Results: Less than one week to build prototype and prove business case; rapid deployment of new features
  • 37. Archiving. Why MongoDB: Existing application built on MySQL; lots of friction with RDBMS based archive storage; needed more scalable archive storage backend. Solution: Keep MySQL for active data (100mil); MongoDB for archive (2+ billion). Results: No more alter table statements taking over 2 months to run; sharding fixed vertical scale problem; very happily looking at other places to use MongoDB
  • 38. Online Dictionary. Problem: MySQL could not scale to handle their 5B+ documents. Solution: Switched from MySQL to MongoDB. Results: Massive simplification of code base; eliminated need for external caching system; 20x performance improvement over MySQL
  • 39. E-commerce. Problem: Multi-vertical E-commerce impossible to model (efficiently) in RDBMS. Solution: Switched from MySQL to MongoDB. Results: Massive simplification of code base; rapidly build, halving time to market (and cost); eliminated need for external caching system; 50x+ performance improvement over MySQL
  • 40. Tons more: MongoDB casts a wide net; people keep coming up with new and brilliant ways to use it
  • 41. In Good Company and 1000s more
  • 42. MongoDB
  • 43. Start with an object (or array, hash, dict, etc.): place1 = { name : "10gen HQ", address : "578 Broadway 7th Floor", city : "New York", zip : "10011", tags : [ "business", "awesome" ] }
  • 44. Inserting the record (initial data load): > db.places.insert(place1)
  • 45. Querying: { name : "10gen HQ", address : "134 5th Avenue 3rd Floor", city : "New York", zip : "10011", tags : [ "business", "awesome" ] } > db.places.findOne({ zip : "10011", tags : "awesome" }) > db.places.find({ tags : "business" })
  • 46. Nested Documents { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "roger", date : "Sat Apr 24 2011 19:47:11", text : "About MongoDB...", tags : [ "tech", "databases" ], comments : [ { author : "Fred", date : "Sat Apr 25 2010 20:51:03", text : "Best Post Ever!" } ]}
  • 47. Object ID: > db.places.insert(place1) object(MongoId)#4 (1) { ["$id"]=> string(24) "4e9cc76a4a1817fd21000000" }
      4e9cc76a4a1817fd21000000
      |------||----||--||----|
         ts    mac  pid  inc
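The field layout on the Object ID slide can be sketched in a few lines of plain JavaScript. This is only an illustration of the diagram, not driver code: `parseObjectId` is a hypothetical helper, and the field widths (8, 6, 4 and 6 hex characters) are taken from the bars on the slide. Newer MongoDB versions are understood to fill the middle bytes differently, so treat the `mac` and `pid` names as specific to this deck.

```javascript
// Split a 24-character hex ObjectId into the four fields drawn on the slide.
// Assumption: widths of 8, 6, 4 and 6 hex characters, as in the slide's diagram.
function parseObjectId(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) {
    throw new Error("expected 24 hex characters");
  }
  const seconds = parseInt(hex.slice(0, 8), 16);
  return {
    ts: seconds,                          // seconds since the Unix epoch
    created: new Date(seconds * 1000),    // the same value as a Date
    mac: hex.slice(8, 14),                // machine identifier (hex)
    pid: hex.slice(14, 18),               // process id (hex)
    inc: parseInt(hex.slice(18, 24), 16)  // counter
  };
}

const parts = parseObjectId("4e9cc76a4a1817fd21000000");
console.log(parts.ts, parts.mac, parts.pid, parts.inc);
```

Because the timestamp leads the id, sorting by `_id` roughly sorts by creation time, which is one reason a separate "created" field is often unnecessary.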
  • 48. A More Complex Document: place1 = { name : "10gen HQ", address : "578 Broadway 7th Floor", city : "New York", zip : "10011", tags : [ "business", "awesome" ], latlong : [40.0, 72.0], tips : [ { user : "ryan", time : "6/26/2011", tip : "stop by for office hours" }, {.....} ] }
  • 49. Indexing & Adv Querying
      // Index nested documents
      db.posts.ensureIndex({ "comments.author" : 1 })
      db.posts.find({ "comments.author" : "Fred" })
      // Regular expressions
      db.posts.find({ "comments.author" : /^Fr/ })
      // Index on tags (multi-key index)
      db.posts.ensureIndex({ tags : 1 })
      db.posts.find({ tags : "tech" })
      // Geospatial index
      db.posts.ensureIndex({ "author.location" : "2d" })
      db.posts.find({ "author.location" : { $near : [22, 42] } })
  • 50-51. Updating
      > db.places.update(
          { name : "10gen HQ" },
          { $push : { tips : { user : "nosh", time : "6/26/2011", tip : "Office hours are great!" } } })
      Resulting document:
      place1 = { name : "10gen HQ", address : "578 Broadway 7th Floor", city : "New York", zip : "10011", tags : [ "business", "awesome" ], latlong : [40.0, 72.0],
        tips : [ { user : "ryan", time : "6/26/2011", tip : "stop by for office hours on Wednesdays from 4-6pm" },
                 { user : "nosh", time : "7/14/2011", tip : "Office hours are great!" } ] }
  • 52. Atomic Operations: $set, $unset, $rename, $push, $pop, $pull, $addToSet, $inc
  • 53. Cursors
      $cursor = $c->find(array("foo" => "bar"));
      foreach ($cursor as $id => $value) {
          echo "$id: ";
          var_dump($value);
      }
      $a = iterator_to_array($cursor);
  • 54. Paging
      page_num = 3;
      results_per_page = 10;
      cursor = db.collection.find()
          .sort({ "ts" : -1 })
          .skip(page_num * results_per_page)
          .limit(results_per_page);
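The skip/limit arithmetic on the Paging slide is easy to get off by one, so here is a small in-memory sketch of it. The `page` function and the sample documents are hypothetical stand-ins for the `find().sort().skip().limit()` chain; the sketch assumes `page_num` is zero-based, which is what `skip(page_num * results_per_page)` implies.

```javascript
// In-memory stand-in for find().sort({ ts: -1 }).skip(n).limit(m).
// Assumption: pageNum is zero-based, matching skip(page_num * results_per_page).
function page(docs, pageNum, resultsPerPage) {
  const sorted = [...docs].sort((a, b) => b.ts - a.ts); // newest first
  const skip = pageNum * resultsPerPage;
  return sorted.slice(skip, skip + resultsPerPage);
}

// 35 hypothetical documents with ts = 1..35
const sampleDocs = Array.from({ length: 35 }, (_, i) => ({ ts: i + 1 }));
const fourthPage = page(sampleDocs, 3, 10);
console.log(fourthPage.map(d => d.ts)); // [ 5, 4, 3, 2, 1 ]
```

With one-based page numbers the skip would be `(pageNum - 1) * resultsPerPage` instead. Large skips still walk past every skipped document, so for deep paging a range query on the sort key is usually a better fit.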
  • 55. GridFS
  • 56. Storing Files: under 16 MB
  • 57. Storing Big Files: >16 MB stored in 16 MB chunks
  • 58. Storing Big Files: works with replicated and
  • 59. A better network FS: GridFS files are seamlessly sharded & replicated. No OS constraints... No file size limits. No naming constraints. No folder limits. Standard across different OSs. MongoDB automatically generates the MD5 hash of the file
  • 60. MongoDB for Genealogy Data
  • 61. Types of genealogy data: Events (birth, death, etc); Official records; Census; Names; Relationships; Photographs; Diaries & letters; Ship passenger list; Occupation; and more
  • 62. Challenges of genealogy data: Lots of possible data points... need flexible schema. Multiple versions of same data point (3 different dates for death date, 4 variations on name). Data related to records. Multiple versions of same nodes (intelligent nondestructive merge needed). Need to have metadata associated
  • 63. Genealogy is changing
  • 64. (overlay text: “Recognize”)
      0 @I2@ INDI
      1 NAME Charles Phillip /Ingalls/
      1 SEX M
      1 BIRT
      2 DATE 10 JAN 1836
      2 PLAC Cuba, Allegheny, NY
      1 DEAT
      2 DATE 08 JUN 1902
      2 PLAC De Smet, Kingsbury, Dakota Territory
      1 FAMC @F2@
      1 FAMS @F3@
      0 @I3@ INDI
      1 NAME Caroline Lake /Quiner/
      1 SEX F
      1 BIRT
      2 DATE 12 DEC 1839
  • 65. GEDCOM: File format, not a database. Handles the great variety of data well. Doesn’t really scale beyond a local user. Doesn’t provide a good mechanism for storing external documents (birth certificates, etc). Built to solve the problem of sharing data
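Since GEDCOM is a line-oriented format with a level number in front of each line, turning a record into a nested document is mostly a matter of tracking the current parent per level. The following is a minimal sketch, not a full GEDCOM reader: `parseGedcom` and `toIndividual` are hypothetical helpers, only the tags that appear on the slide are handled, and the shape of the resulting document (`name`, `sex`, `events.birth`, `events.death`) is an assumption made for illustration.

```javascript
// Parse GEDCOM lines of the form "LEVEL [@XREF@] TAG [VALUE]" into nested nodes.
function parseGedcom(text) {
  const records = [];
  const stack = []; // stack[level] = node currently open at that level
  for (const raw of text.split("\n")) {
    const line = raw.trim();
    if (!line) continue;
    const m = /^(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s+(.*))?$/.exec(line);
    if (!m) continue; // skip lines that do not look like GEDCOM
    const level = Number(m[1]);
    const node = { tag: m[3], children: [] };
    if (m[2]) node.xref = m[2];
    if (m[4]) node.value = m[4];
    if (level === 0) {
      records.push(node);
    } else if (stack[level - 1]) {
      stack[level - 1].children.push(node);
    }
    stack[level] = node;
    stack.length = level + 1; // forget deeper levels left over from earlier lines
  }
  return records;
}

// Map one INDI record onto a document (hypothetical, simplified shape).
function toIndividual(rec) {
  const doc = { _id: rec.xref, events: {} };
  for (const child of rec.children) {
    if (child.tag === "NAME") doc.name = child.value;
    else if (child.tag === "SEX") doc.sex = child.value;
    else if (child.tag === "BIRT" || child.tag === "DEAT") {
      const ev = {};
      for (const c of child.children) {
        if (c.tag === "DATE") ev.date = c.value;
        if (c.tag === "PLAC") ev.place = c.value;
      }
      doc.events[child.tag === "BIRT" ? "birth" : "death"] = ev;
    }
  }
  return doc;
}

const gedcom = [
  "0 @I2@ INDI",
  "1 NAME Charles Phillip /Ingalls/",
  "1 SEX M",
  "1 BIRT",
  "2 DATE 10 JAN 1836",
  "2 PLAC Cuba, Allegheny, NY",
  "1 DEAT",
  "2 DATE 08 JUN 1902",
  "2 PLAC De Smet, Kingsbury, Dakota Territory"
].join("\n");

console.log(JSON.stringify(toIndividual(parseGedcom(gedcom)[0]), null, 2));
```

Dates are kept as the original GEDCOM strings here; converting them to real date values is a separate problem because genealogical dates are often partial or approximate.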
  • 66. Genealogy & MongoDB: Genealogy is anything but rigid and fixed. Flexible schema fits genealogy data well. Packaging things together makes sense. Relating records doesn’t require a relational database
  • 67. Individual: AFN; Modification Date; Name (First[], Middle[], Last[]); Events[] (type, date, contributor[], record[], Location (city, state, county, country))
  • 68. Individual: AFN; Modification Date; Name (First[], Middle[], Last[]); Events[] (type, date, contributor[], record[], Location (city, state, county, country, coordinates[])). User: Name; Email Address; Password; Individual_id. Record: contributor; type; thumbnail; content; description; tags[]
  • 69. Individual: individual = { _id : ObjectId("4f2978dfaa999d9db02618ce"), AFN : "1XYK-KQJ", name : { first : ["john", "johannes"], middle : "peter", last : ["smith", "sandvik"] } }
  • 70. Individual (same document, with query): db.individual.find({ "name.first" : "john", "name.middle" : "peter" })
  • 71. Events: events : { death : { date : ISODate("1989-07-14"), location : { city : "pensacola", state : "fl", county : "escambia", country : "usa", coordinates : [30.26, 87.12] }, contributor : ObjectId("4eeac...691") } }
  • 72. Events (same document, with queries):
      db.individual.find({ "events.death.date" : ISODate("1989-07-14") })
      db.individual.find({ "events.death.location.coordinates" : { $near : [30, 90] } })
  • 73-74. Duplicate Events
      events : { birth : [
        { date : ISODate("1928-04-06"),
          location : { city : "brattleboro", state : "vt", county : "windham", country : "usa", coordinates : [42.51, 72.34] },
          contributor : ObjectId("4ee...00000"), records : ObjectId("4ed8a...7b000000") },
        { date : ISODate("1928-04-16"),
          location : { city : "brattleboro", state : "vt", county : "windham", country : "usa", coordinates : [42.51, 72.34] },
          contributor : ObjectId("4ee...37bb"), records : ObjectId("4eea...0000c8") } ] }
  • 75. Duplicate Events: events : { birth : [ { date : ISODate("1928-04-06") }, { date : ISODate("1928-04-16") } ] }
      db.individual.find({ "events.birth.date" : ISODate("1928-04-16") })
      Same query works!!
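The "same query works" point rests on dotted paths matching through arrays: a condition on `events.birth.date` matches if any element of the `birth` array satisfies it. The sketch below models that behaviour in plain JavaScript so it can be tried without a database. It is a simplified model, not MongoDB's actual matching code; `valuesAt` and `matches` are hypothetical names, and dates are plain strings to keep the comparison trivial.

```javascript
// Collect every value reachable at a dotted path, descending into arrays.
function valuesAt(value, path) {
  if (path.length === 0) return Array.isArray(value) ? value : [value];
  if (Array.isArray(value)) return value.flatMap(v => valuesAt(v, path));
  if (value === null || typeof value !== "object") return [];
  return valuesAt(value[path[0]], path.slice(1));
}

// True if any value found at the path equals the target.
function matches(doc, dottedPath, target) {
  return valuesAt(doc, dottedPath.split(".")).some(v => v === target);
}

// One birth date recorded (hypothetical sample data).
const single = { events: { birth: { date: "1928-04-06" } } };
// Two conflicting birth dates recorded for the same person.
const multi = {
  events: { birth: [{ date: "1928-04-06" }, { date: "1928-04-16" }] }
};

console.log(matches(single, "events.birth.date", "1928-04-06")); // true
console.log(matches(multi, "events.birth.date", "1928-04-16"));  // true
console.log(matches(multi, "events.birth.date", "1900-01-01"));  // false
```

The practical consequence is that a field can start out holding a single event and later become an array of competing versions without the application's queries having to change.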
  • 76-77. Multiple Events
      marriage : [
        { date : ISODate("1939-08-11"), end_date : ISODate("1940-02-19"),
          to : ObjectId("4f297978aa999d9db02618cf"),
          location : { city : "raleigh", state : "nc", county : "wake", country : "usa", coordinates : [35.49, 78.38] },
          contributor : ObjectId("4eeac...91537bb") },
        { date : ISODate("1944-04-19"),
          to : ObjectId("4f2978dfaa999d9db02618ce"),
          location : { city : "atlanta", state : "ga", county : "fulton", country : "usa", coordinates : [33.45, 84.23] },
          contributor : ObjectId("4eeb...37bb") } ]
  • 78. All together
      individual = { _id : ObjectId("4f2978dfaa999d9db02618ce"), AFN : "1XYK-KQJ",
        name : { first : ["john", "johannes"], middle : "peter", last : ["smith", "sandvik"] },
        events : { birth : [
          { date : ISODate("1928-04-06"),
            location : { city : "brattleboro", state : "vt", county : "windham", country : "usa", coordinates : [42.51, 72.34] },
            contributor : ObjectId("4eeabc958b691537bb000000"), records : ObjectId("4ed8aea7d8562f7d7b000000") },
          { date : ISODate("1928-04-16"), location : { city : "brattleboro", ...
  • 79. Records: record1 = { _id : ObjectId("4ed8aea7d8562f7d7b"), contributor : ObjectId("4eeab...1537bb"), type : "birth certificate", thumbnail : BinData(0, "/9j/4AAQSkZJ...."), content : BinData(0, "j6b/Id11lWqs..."), tags : ["NY", "certified"], description : "John's birth certificate" }
  • 80. Users: user = { _id : ObjectId("4eeabc958b691537bb"), username : "spf13", email_address : "genealogy@spf13.com", password : "a.long.passphrase18", individual_id : ObjectId("4f2f...0ce") }
  • 81. Scaling MongoDB for all the generations
  • 82. Replica Sets (diagram: three replica sets, each with a Primary and several Secondaries; one set also has an Arbiter)
  • 83. Sharding (diagram: three App Servers, each with a MongoS; three ConfigD; twelve MongoD processes)
  • 84. The Family Tree
  • 85. It’s not a tree at all, it’s really a graph ... and an odd one at that
  • 86-87. It would be easy if it always looked like this
  • 88. All sorts of mess: Step & adopted relationships; Duplicate nodes; Lots of missing nodes; Divorces and re-marriages; Multiple names for the same person; Multiple dates for the same event
  • 89. How to make sense of it all
  • 90. Storing a graph in MongoDB
  • 91. Graphs are important. Without them we couldn’t store family relationships
  • 92. Trees / graphs in MongoDB: Since MongoDB data structures are essentially objects, there is a good degree of flexibility here. Think of how you would structure them in your application
  • 93. Trees / graphs in MongoDB: Each node is stored as a document. Contains references to related nodes. What is “related” depends on your application
  • 94. References vs Relation: MongoDB uses references. Unlike foreign keys, references don’t enforce integrity. Reference is really just a reference. For many applications a reference is sufficient
  • 95-98. Simple relationship
      { _id: "a" } { _id: "b" } { _id: "c" } { _id: "d" }
      { _id: "e", parents: ["a", "b"] }
      { _id: "f", parents: ["c", "d"] }
      { _id: "g", parents: ["e", "f"] }
      • Easy to access nodes in either direction
      • Good for trees / graphs
      • Can grab large sets
      • Minimum amount of maintenance
      • Balanced
      • Implied relationships
      // find all direct descendants of b:
      > db.family.find({ parents : "b" })
      // find all ancestors of g:
      var g = db.family.findOne({ _id : "g" });
      ancestorFind = function(child) {
        var rv = [];
        if (!("parents" in child)) return rv;
        var parents = db.family.find({ _id : { $in : child.parents } }).toArray();
        rv = rv.concat(parents);
        for (var i in parents) {
          rv = rv.concat(ancestorFind(parents[i]));
        }
        return rv;
      }
      ancestorFind(g);
      // find all descendants of b:
      var b = db.family.findOne({ _id : "b" });
      descendantsFind = function(par) {
        var rv = [];
        var k = db.family.find({ parents : par._id }).toArray();
        rv = rv.concat(k);
        for (var i in k) {
          rv = rv.concat(descendantsFind(k[i]));
        }
        return rv;
      }
      descendantsFind(b);
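The parent-reference model above can be tried without a database by running the same two traversals over an in-memory array. This is a sketch under stated assumptions: `family`, `ancestorIds` and `descendantIds` are hypothetical names, array scans stand in for the `find()` calls, and a `seen` set is added so that duplicate paths (or accidental cycles in messy data) do not produce repeated results.

```javascript
// The seven documents from the slide, held in memory instead of a collection.
const family = [
  { _id: "a" }, { _id: "b" }, { _id: "c" }, { _id: "d" },
  { _id: "e", parents: ["a", "b"] },
  { _id: "f", parents: ["c", "d"] },
  { _id: "g", parents: ["e", "f"] }
];
const byId = new Map(family.map(n => [n._id, n]));

// Walk upward through `parents`; `seen` prevents revisiting a node.
function ancestorIds(id, seen = new Set()) {
  const node = byId.get(id);
  for (const p of (node && node.parents) || []) {
    if (!seen.has(p)) {
      seen.add(p);
      ancestorIds(p, seen);
    }
  }
  return seen;
}

// Walk downward by finding nodes that list `id` as a parent.
function descendantIds(id, seen = new Set()) {
  for (const n of family) {
    if ((n.parents || []).includes(id) && !seen.has(n._id)) {
      seen.add(n._id);
      descendantIds(n._id, seen);
    }
  }
  return seen;
}

console.log([...ancestorIds("g")].sort());   // [ 'a', 'b', 'c', 'd', 'e', 'f' ]
console.log([...descendantIds("b")].sort()); // [ 'e', 'g' ]
```

Against a real collection each level of the walk is one round trip, which is the cost this model pays for its low maintenance.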
  • 99. Bi-directional
      { _id: "a", children: ["e"] } { _id: "b", children: ["e"] } { _id: "c", children: ["f"] } { _id: "d", children: ["f"] }
      { _id: "e", children: ["g"], parents: ["a", "b"] }
      { _id: "f", children: ["g"], parents: ["c", "d"] }
      { _id: "g", children: [], parents: ["e", "f"] }
      • Doesn’t really add much beyond the first example
      • More maintenance
      • Duplication of each relationship
      • Only real advantage is ability to grab all related nodes (both directions) with one query.
  • 100-101. Array of Ancestors
      { _id: "a" } { _id: "b" } { _id: "c" } { _id: "d" }
      { _id: "e", ancestors: ["a", "b"], parents: ["a", "b"] }
      { _id: "f", ancestors: ["c", "d"], parents: ["c", "d"] }
      { _id: "g", ancestors: ["a", "b", "c", "d", "e", "f"], parents: ["e", "f"] }
      • Great for small trees (or subsets)
      • Could be used to store X generations of ancestors
      • Optimized for retrieving entire tree
      • Uses implied relationships
      • No help on specifics... is this person my grandson?
      • Easier retrieval at expense of costlier maintenance
      // find all descendants of b:
      > db.tree.find({ ancestors : "b" })
      // find all direct descendants of b:
      > db.tree.find({ parents : "b" })
      // find all ancestors of g:
      > g = db.tree.findOne({ _id : "g" })
      > db.tree.find({ _id : { $in : g.ancestors } })
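The "costlier maintenance" of the ancestors array mostly means computing it at write time. One way to sketch that, assuming parents are always inserted before their children, is to take the union of the new node's parents and each parent's own ancestors. `insertWithAncestors` is a hypothetical helper and a `Map` stands in for the collection; unlike the slide, root nodes here carry empty `parents` and `ancestors` arrays rather than omitting the fields.

```javascript
// Insert a node, deriving `ancestors` from its parents' stored ancestors.
// Assumption: every parent is already present in `tree` when its child is added.
function insertWithAncestors(tree, id, parents = []) {
  const ancestors = new Set(parents);
  for (const p of parents) {
    const parentDoc = tree.get(p);
    for (const a of (parentDoc && parentDoc.ancestors) || []) {
      ancestors.add(a);
    }
  }
  const doc = { _id: id, parents, ancestors: [...ancestors].sort() };
  tree.set(id, doc);
  return doc;
}

const tree = new Map();
for (const root of ["a", "b", "c", "d"]) insertWithAncestors(tree, root);
insertWithAncestors(tree, "e", ["a", "b"]);
insertWithAncestors(tree, "f", ["c", "d"]);
const g = insertWithAncestors(tree, "g", ["e", "f"]);
console.log(g.ancestors); // [ 'a', 'b', 'c', 'd', 'e', 'f' ]

// Equivalent of db.tree.find({ ancestors: "b" }): all descendants in one pass.
const descendantsOfB = [...tree.values()]
  .filter(d => d.ancestors.includes("b"))
  .map(d => d._id);
console.log(descendantsOfB); // [ 'e', 'g' ]
```

The harder part, not shown, is repair: if a parent link is corrected later, every descendant's array has to be recomputed.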
  • 102. Relations (basic){ _id : "b", relations : [ { id : "a", relation : "parent"}, { id : "c", relation : "grandparent"}, { id : "d", relation : "parent"}]}
  • 103. Relations (detailed){ _id : "b", relations : [ { id : "a", relation : "parent", type : "mother", subtype : "biological" }, { id : "c", relation : "parent", type : "father", subtype : "adopted"}, { id : "d", relation : "parent", type : "father", subtype : "biological"}]}
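The detailed relations layout makes it possible to ask narrower questions, such as "who are this person's biological parents?". The sketch below filters the embedded array in memory; in a query the same idea would require all conditions to hold on a single array element. `relatedIds` is a hypothetical helper and the document is the one from the slide.

```javascript
// The document from the "Relations (detailed)" slide.
const person = {
  _id: "b",
  relations: [
    { id: "a", relation: "parent", type: "mother", subtype: "biological" },
    { id: "c", relation: "parent", type: "father", subtype: "adopted" },
    { id: "d", relation: "parent", type: "father", subtype: "biological" }
  ]
};

// Return the ids of relations where every criterion matches the same entry.
function relatedIds(doc, criteria) {
  return doc.relations
    .filter(r => Object.entries(criteria).every(([key, value]) => r[key] === value))
    .map(r => r.id);
}

console.log(relatedIds(person, { relation: "parent", subtype: "biological" })); // [ 'a', 'd' ]
console.log(relatedIds(person, { type: "father" }));                            // [ 'c', 'd' ]
```

Keeping the qualifiers on the relation itself, rather than on either person, is what lets one individual have both an adoptive and a biological father without duplicating nodes.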
  • 104. Shouldn’t I store my family tree in a graph database? They are built to store trees after all
  • 105-107. Graphs are great at traversing deep in a tree: Is this node my relative? Retrieve my paternal great, great, great, great grandpa
  • 108. Unfortunately that’s not how we commonly work. Typically we are working with a node and its immediate neighbors. The significant majority of our operations aren’t traversing. If those operations are important, perhaps a hybrid graph & document solution makes sense
  • 109. http://spf13.com; http://github.com/s; @spf13. Questions? Download at mongodb.org. We’re hiring!! Contact us at jobs@10gen.com