MongoDB for Genealogy

Why MongoDB is a great fit for genealogical data, thanks to its flexible schema, rich documents, and ability to scale to humongous data sets.

Published in: Technology
  • Remember, in 1995 there were around 10,000 websites. Mosaic, Lynx, Mozilla (pre-Netscape), and IE 2.0 were the only web browsers. Apache (Dec ’95), Java (’96), PHP (June ’95), and .NET didn’t exist yet, and Linux barely did (1.0 in ’94).
  • By reducing the transactional semantics the database provides, one can still solve an interesting set of problems where performance is very important, and horizontal scaling then becomes easier.
  • Store an array of the IDs of a given document’s ancestors.

    1. Storing the Family Tree with MongoDB
    2. We’re going to talk about:
        • MongoDB Intro & Fundamentals
        • MongoDB for Genealogy data
        • Scaling MongoDB for all the generations
        • The Family Tree
        • Storing a graph in MongoDB
    3. Steve @sp
        • 15+ years building the internet
        • Father, husband, skateboarder, genealogist at ❤
        • Chief Solutions Architect @
        • Responsible for drivers, integrations, web & docs
    4. Company behind MongoDB
        • Offices in NYC, Palo Alto, London & Dublin
        • 100+ employees
        • Support, consulting, training
        • Mgt: Google/DoubleClick, Oracle, Apple, NetApp, Mark Logic
        • Well funded: Sequoia, Union Square, Flybridge
    5. Introduction to MongoDB
    6. A bit of history
    7. 1974: The relational database is created
    8. 1979
    9. 1979, 1994
    10. 1979, 1994, 1995
    11. Computers in 1995
        • 100 MHz Pentium
        • 10BASE-T
        • 16 MB RAM
        • 200 MB HD
    12. Cloud in 1995 (Windows 95 cloud wallpaper)
    13. Cell Phones in 2012
        • Dual core 1.5 GHz
        • 802.11n (300+ Mbps)
        • 1 GB RAM
        • 64 GB solid state
    14. MongoDB
        • Application
        • Document oriented: { author: "steve", date: new Date(), text: "About MongoDB...", tags: [ "tech", "database" ] }
        • High performance
        • Fully consistent
        • Horizontally scalable
    15. MongoDB philosophy
        • Keep functionality when we can (key/value stores are great, but we need more)
        • Non-relational (no joins) makes scaling horizontally practical
        • Document data models are good
        • Database technology should run anywhere: virtualized, cloud, metal, etc.
    16. Under the hood
        • Written in C++
        • Runs nearly everywhere
        • Data serialized to BSON
        • Extensive use of memory-mapped files, i.e. read-through, write-through memory caching
    17. Database Landscape [figure: Scalability & Performance plotted against Depth of Functionality for MemCache, MongoDB, and RDBMS]
    18. “MongoDB has the best features of key/value stores, document databases and relational databases in one.” (John Nunemaker)
    19. Relational makes normalized data look like this:
        • Category: Name, Url
        • Article: Name, Slug, Publish date, Text
        • User: Name, Email Address
        • Tag: Name, Url
        • Comment: Comment, Date, Author
    20. Document databases make normalized data look like this:
        • Article: Name, Slug, Publish date, Text, Author, Comment[] (Comment, Date, Author), Tag[] (Value), Category[] (Value)
        • User: Name, Email Address
    21. But we’ve been using a relational database for 40 years!
    22. How do people store documents in real life?
    23. Think about a doctor’s office. There are two ways they could organize their files.
    24. Option 1: Each document type in its own drawer: MRIs, X-rays, Lab, Invoices, Index, History, Medications, Lab Forms
    27. Option 2: Group related records: Patient 1, Patient 2, Patient 3, ..., Vendor 1, Vendor 2, Vendor 3
    28. Option 2: Group related records: Patient 1, Patient 3, ... | Patient 2 | Vendor 1, Vendor 2, Vendor 3
    29. Databases work the same way [figure: the relational schema (Category, Article, User, Tag, Comment) beside the document schema (Article with embedded Comment[], Tag[], Category[], plus User), matched to the Patient 1 and Vendor 1 folders]
    30. Terminology (RDBMS ➜ MongoDB)
        • Table, View ➜ Collection
        • Row ➜ Document
        • Index ➜ Index
        • Join ➜ Embedded Document
        • Foreign Key ➜ Reference
        • Partition ➜ Shard
    31. Why MongoDB: My Top 10 Reasons
        10. Great developer experience
        9. Speaks your language
        8. Scale horizontally
        7. Fully consistent data w/ atomic operations
        6. Memory caching integrated
        5. Open source
        4. Flexible, rich & structured data format, not just K:V
        3. Ludicrously fast (without going plaid)
        2. Simplify infrastructure & application
        1. It’s web scale
    33. MongoDB Use Cases
    34. CMS / Blog
        Needs:
        • Business needed modern data store for rapid development and scale
        Solution:
        • Use PHP & MongoDB
        Results:
        • Real-time statistics
        • All data, images, etc. stored together: easy access, easy deployment, easy high availability
        • No need for complex migrations
        • Enabled very rapid development and growth
    35. Photo Meta-Data
        Problem:
        • Business needed more flexibility than Oracle could deliver
        Solution:
        • Use MongoDB instead of Oracle
        Results:
        • Developed application in one sprint cycle
        • 500% cost reduction compared to Oracle
        • 900% performance improvement compared to Oracle
    36. Customer Analytics
        Problem:
        • Deal with massive data volume across all customer sites
        Solution:
        • Use MongoDB to replace Google Analytics / Omniture options
        Results:
        • Less than one week to build prototype and prove business case
        • Rapid deployment of new features
    37. Archiving
        Why MongoDB:
        • Existing application built on MySQL
        • Lots of friction with RDBMS-based archive storage
        • Needed more scalable archive storage backend
        Solution:
        • Keep MySQL for active data (100 mil)
        • MongoDB for archive (2+ billion)
        Results:
        • No more ALTER TABLE statements taking over 2 months to run
        • Sharding fixed vertical scale problem
        • Very happily looking at other places to use MongoDB
    38. Online Dictionary
        Problem:
        • MySQL could not scale to handle their 5B+ documents
        Solution:
        • Switched from MySQL to MongoDB
        Results:
        • Massive simplification of code base
        • Eliminated need for external caching system
        • 20x performance improvement over MySQL
    39. E-commerce
        Problem:
        • Multi-vertical e-commerce impossible to model (efficiently) in RDBMS
        Solution:
        • Switched from MySQL to MongoDB
        Results:
        • Massive simplification of code base
        • Rapidly built, halving time to market (and cost)
        • Eliminated need for external caching system
        • 50x+ performance improvement over MySQL
    40. Tons more: MongoDB casts a wide net; people keep coming up with new and brilliant ways to use it
    41. In Good Company ... and 1000s more
    42. MongoDB
    43. Start with an object (or array, hash, dict, etc.)
        place1 = {
            name: "10gen HQ",
            address: "578 Broadway 7th Floor",
            city: "New York",
            zip: "10011",
            tags: [ "business", "awesome" ]
        }
    44. Inserting the record
        Initial data load:
        > db.places.insert(place1)
    45. Querying
        {
            name: "10gen HQ",
            address: "134 5th Avenue 3rd Floor",
            city: "New York",
            zip: "10011",
            tags: [ "business", "awesome" ]
        }
        > db.places.findOne({ zip: "10011", tags: "awesome" })
        > db.places.find({ tags: "business" })
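The `tags: "awesome"` condition matches even though `tags` holds an array. An equality condition on an array field is satisfied when any element equals the value. The in-memory sketch below imitates that rule so it can be tried without a server. It covers top-level fields and scalar conditions only. `valueMatches`, `matches`, `findIn`, and the second place are hypothetical and are not part of MongoDB or any driver.

```javascript
// In-memory imitation of MongoDB-style equality matching.
// Hypothetical helpers: top-level fields and scalar conditions only.

// A stored array matches if any element equals the condition;
// any other stored value must equal it exactly.
function valueMatches(stored, wanted) {
  if (Array.isArray(stored)) {
    return stored.some((element) => element === wanted);
  }
  return stored === wanted;
}

// Every field in the query must match (an implicit AND).
function matches(doc, query) {
  return Object.entries(query).every(([field, wanted]) =>
    valueMatches(doc[field], wanted)
  );
}

function findIn(collection, query) {
  return collection.filter((doc) => matches(doc, query));
}

const places = [
  { name: "10gen HQ", zip: "10011", tags: ["business", "awesome"] },
  { name: "Corner cafe", zip: "10012", tags: ["coffee"] }, // hypothetical
];

console.log(findIn(places, { zip: "10011", tags: "awesome" }).map((p) => p.name));
```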
    46. Nested Documents
        {
            _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
            author: "roger",
            date: "Sat Apr 24 2011 19:47:11",
            text: "About MongoDB...",
            tags: [ "tech", "databases" ],
            comments: [
                {
                    author: "Fred",
                    date: "Sat Apr 25 2010 20:51:03",
                    text: "Best Post Ever!"
                }
            ]
        }
    47. Object ID
        > db.places.insert(place1)
        object(MongoId)#4 (1) { ["$id"]=> string(24) "4e9cc76a4a1817fd21000000" }
        4e9cc76a 4a1817 fd21 000000
        |------| |----| |--| |----|
           ts     mac   pid   inc
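The breakdown above follows the ObjectId layout used by drivers of that era: a 4-byte Unix timestamp in seconds, a 3-byte machine identifier, a 2-byte process id, and a 3-byte counter. Newer specifications replace the machine and process fields with a 5-byte random value, but the leading timestamp works the same way. The hypothetical `decodeObjectId` helper below splits a 24-character hex string along the legacy boundaries.

```javascript
// Split a 24-hex-character ObjectId using the legacy field layout:
// 4-byte timestamp | 3-byte machine id | 2-byte process id | 3-byte counter.
// Hypothetical helper, for illustration only.
function decodeObjectId(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) {
    throw new Error(`not a 24-character hex ObjectId: ${hex}`);
  }
  const seconds = parseInt(hex.slice(0, 8), 16); // big-endian seconds since epoch
  return {
    timestamp: seconds,
    createdAt: new Date(seconds * 1000),
    machine: hex.slice(8, 14),
    pid: parseInt(hex.slice(14, 18), 16),
    counter: parseInt(hex.slice(18, 24), 16),
  };
}

const id = decodeObjectId("4e9cc76a4a1817fd21000000");
console.log(id.createdAt.toISOString()); // 2011-10-18T00:25:14.000Z
```

Because the timestamp leads, ObjectIds generated later sort after earlier ones, which is why sorting on `_id` roughly follows insertion time.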
    48. A More Complex Document
        place1 = {
            name: "10gen HQ",
            address: "578 Broadway 7th Floor",
            city: "New York",
            zip: "10011",
            tags: [ "business", "awesome" ],
            latlong: [ 40.0, 72.0 ],
            tips: [
                { user: "ryan", time: ISODate("2011-06-26"), tip: "stop by for office hours" },
                {.....}
            ]
        }
    49. Indexing & Advanced Querying
        // Index nested documents
        db.posts.ensureIndex({ "comments.author": 1 })
        db.posts.find({ "comments.author": "Fred" })
        // Regular expressions
        db.posts.find({ "comments.author": /^Fr/ })
        // Index on tags (multi-key index)
        db.posts.ensureIndex({ tags: 1 })
        db.posts.find({ tags: "tech" })
        // Geospatial index
        db.posts.ensureIndex({ "author.location": "2d" })
        db.posts.find({ "author.location": { $near: [ 22, 42 ] } })
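Queries such as `{ "comments.author": "Fred" }` work because dot notation descends into embedded documents. When it meets an array, it looks inside each element. The sketch below approximates that path resolution in memory. `resolvePath`, `pathMatches`, and the "Wilma" comment are hypothetical. The helpers ignore numeric array indexes and other server-side details.

```javascript
// Collect every value reachable at a dotted path, fanning out across arrays,
// roughly the way dot notation reaches into arrays of embedded documents.
function resolvePath(value, segments) {
  if (Array.isArray(value)) {
    // Look inside every element (this also expands an array at the leaf).
    return value.flatMap((element) => resolvePath(element, segments));
  }
  if (segments.length === 0) return [value];
  if (value === null || typeof value !== "object") return [];
  const [head, ...rest] = segments;
  if (!(head in value)) return [];
  return resolvePath(value[head], rest);
}

// Equality or RegExp condition against a dotted path.
function pathMatches(doc, dottedPath, condition) {
  const values = resolvePath(doc, dottedPath.split("."));
  return values.some((v) =>
    condition instanceof RegExp
      ? typeof v === "string" && condition.test(v)
      : v === condition
  );
}

const post = {
  author: "roger",
  tags: ["tech", "databases"],
  comments: [
    { author: "Fred", text: "Best Post Ever!" },
    { author: "Wilma", text: "Agreed" }, // hypothetical second comment
  ],
};

// Genealogy-shaped data: two competing birth dates (dates kept as strings here).
const individual = { events: { birth: [{ date: "1928-04-06" }, { date: "1928-04-16" }] } };

console.log(pathMatches(post, "comments.author", /^Fr/));
console.log(resolvePath(individual, ["events", "birth", "date"]));
```

The same fan-out explains why a query on `events.birth.date` can find a person by either recorded birth date.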
    50. Updating
        > db.places.update(
              { name: "10gen HQ" },
              { $push: { tips: { user: "nosh", time: ISODate("2011-07-14"), tip: "Office hours are great!" } } }
          )
        After the update:
        place1 = {
            name: "10gen HQ",
            address: "578 Broadway 7th Floor",
            city: "New York",
            zip: "10011",
            tags: [ "business", "awesome" ],
            latlong: [ 40.0, 72.0 ],
            tips: [
                { user: "ryan", time: ISODate("2011-06-26"), tip: "stop by for office hours on Wednesdays from 4-6pm" },
                { user: "nosh", time: ISODate("2011-07-14"), tip: "Office hours are great!" }
            ]
        }
    52. Atomic Operations: $set, $unset, $rename, $push, $pop, $pull, $addToSet, $inc
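Each of these operators modifies a single document atomically on the server. The sketch below applies a subset of them to a plain JavaScript object to make their effects concrete. `applyUpdate` and the `checkins` field are hypothetical, and the sketch is not the server's implementation. It handles top-level fields only: no dotted paths, no `$rename`, and no modifiers such as `$each`.

```javascript
// Apply a subset of MongoDB-style update operators to a copy of a plain,
// JSON-compatible object. Top-level fields only; hypothetical helper.
function applyUpdate(doc, update) {
  const out = JSON.parse(JSON.stringify(doc)); // leave the input untouched
  // Values are compared via JSON.stringify, so object key order matters here.
  const same = (x, y) => JSON.stringify(x) === JSON.stringify(y);
  for (const [op, fields] of Object.entries(update)) {
    for (const [field, arg] of Object.entries(fields)) {
      switch (op) {
        case "$set":
          out[field] = arg;
          break;
        case "$unset":
          delete out[field];
          break;
        case "$inc":
          out[field] = (out[field] ?? 0) + arg;
          break;
        case "$push":
          out[field] = [...(out[field] ?? []), arg];
          break;
        case "$addToSet":
          if (!(out[field] ?? []).some((el) => same(el, arg))) {
            out[field] = [...(out[field] ?? []), arg];
          }
          break;
        case "$pull":
          if (Array.isArray(out[field])) out[field] = out[field].filter((el) => !same(el, arg));
          break;
        case "$pop":
          // 1 removes the last element, -1 removes the first.
          if (Array.isArray(out[field])) {
            out[field] = arg === -1 ? out[field].slice(1) : out[field].slice(0, -1);
          }
          break;
        default:
          throw new Error(`operator not covered by this sketch: ${op}`);
      }
    }
  }
  return out;
}

const place = { name: "10gen HQ", tags: ["business", "awesome"], tips: [] };
const updated = applyUpdate(place, {
  $push: { tips: { user: "nosh", tip: "Office hours are great!" } },
  $addToSet: { tags: "awesome" }, // already present, so no change
  $inc: { checkins: 1 },          // hypothetical counter field
});
console.log(updated);
```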
    53. Cursors
        $cursor = $c->find(array("foo" => "bar"));
        foreach ($cursor as $id => $value) {
            echo "$id: ";
            var_dump($value);
        }
        $a = iterator_to_array($cursor);
    54. Paging
        page_num = 3;          // zero-based page index
        results_per_page = 10;
        cursor = db.collection.find()
            .sort({ "ts": -1 })
            .skip(page_num * results_per_page)
            .limit(results_per_page);
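`skip(page_num * results_per_page)` treats `page_num` as a zero-based index, so `page_num = 3` returns the fourth page (results 31 to 40). The sketch below reproduces the sort, skip, limit chain on an in-memory array so the arithmetic can be checked. `paginate` and the sample documents are hypothetical.

```javascript
// Return one page of results, newest first, mirroring
// find().sort({ ts: -1 }).skip(pageNum * perPage).limit(perPage).
function paginate(docs, pageNum, perPage) {
  const start = pageNum * perPage;
  return [...docs]
    .sort((a, b) => b.ts - a.ts)
    .slice(start, start + perPage);
}

// 35 hypothetical documents with ts = 1..35.
const docs = Array.from({ length: 35 }, (_, i) => ({ ts: i + 1 }));
const page3 = paginate(docs, 3, 10);
console.log(page3.map((d) => d.ts));
```

The server still walks past every skipped entry, so deep pages get slower. Paging by a range on the indexed sort field (for example, `ts` less than the last value seen) is a common alternative.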
    55. GridFS
    56. Storing files under 16 MB
    57. Storing big files: files over 16 MB are split into smaller chunks (255 KiB each by default in current drivers)
    58. Storing big files: works with replicated and sharded systems
    59. A better network FS
        • GridFS files are seamlessly sharded & replicated
        • No OS constraints...
        • No file size limits
        • No naming constraints
        • No folder limits
        • Standard across different OSs
        • MongoDB automatically generates the MD5 hash of the file
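GridFS stores one metadata document per file in `fs.files` and the file's bytes as numbered documents in `fs.chunks`, using the fields `files_id`, `n`, and `data`. The sketch below splits a Buffer the same way in memory with the 255 KiB default chunk size of current drivers. `toGridFsDocs` and the sample scan are hypothetical. A real application would use its driver's GridFS API instead.

```javascript
const crypto = require("crypto");

const DEFAULT_CHUNK_SIZE = 255 * 1024; // 261,120 bytes

// Build fs.files / fs.chunks-shaped documents for a file held in memory.
// Hypothetical helper; nothing is written to a database.
function toGridFsDocs(fileId, filename, bytes, chunkSize = DEFAULT_CHUNK_SIZE) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < bytes.length; offset += chunkSize, n += 1) {
    chunks.push({ files_id: fileId, n, data: bytes.subarray(offset, offset + chunkSize) });
  }
  const file = {
    _id: fileId,
    filename,
    length: bytes.length,
    chunkSize,
    uploadDate: new Date(),
    // Older servers and drivers stored an MD5 of the contents, as the slide says.
    md5: crypto.createHash("md5").update(bytes).digest("hex"),
  };
  return { file, chunks };
}

// A hypothetical 600 KiB scan of a birth certificate.
const scan = Buffer.alloc(600 * 1024, 7);
const { file, chunks } = toGridFsDocs("scan-1", "birth-certificate.jpg", scan);
console.log(chunks.length, chunks.map((c) => c.data.length));
```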
    60. MongoDB for Genealogy Data
    61. Types of genealogy data
        • Events (birth, death, etc.)
        • Photographs
        • Diaries & letters
        • Official records
        • Ship passenger lists
        • Census
        • Occupation
        • Names
        • Relationships
        • and more
    62. Challenges of genealogy data
        • Lots of possible data points... need flexible schema
        • Multiple versions of same data point (3 different dates for death date, 4 variations on name)
        • Data related to records
        • Multiple versions of same nodes (intelligent nondestructive merge needed)
        • Need to have metadata associated
    63. Genealogy is changing
    64. Recognize
        0 @I2@ INDI
        1 NAME Charles Phillip /Ingalls/
        1 SEX M
        1 BIRT
        2 DATE 10 JAN 1836
        2 PLAC Cuba, Allegheny, NY
        1 DEAT
        2 DATE 08 JUN 1902
        2 PLAC De Smet, Kingsbury, Dakota Territory
        1 FAMC @F2@
        1 FAMS @F3@
        0 @I3@ INDI
        1 NAME Caroline Lake /Quiner/
        1 SEX F
        1 BIRT
        2 DATE 12 DEC 1839
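GEDCOM's line grammar is simple enough to turn into documents. Each line is a level number, an optional `@xref@`, a tag, and an optional value. A line belongs to the nearest preceding line with a lower level. The sketch below parses the Ingalls record above into a tree and maps it onto the `individual` shape used later in the deck. `parseGedcom`, `toIndividual`, and the `gedcom_id` field are hypothetical. The sketch ignores continuation lines (`CONT`/`CONC`), character sets, and most tags.

```javascript
// Parse GEDCOM lines of the form "level [@xref@] TAG [value]" into a tree.
function parseGedcom(text) {
  const roots = [];
  const stack = []; // nodes that may still receive children
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "") continue;
    const m = line.match(/^(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s+(.*))?$/);
    if (!m) throw new Error(`unparseable GEDCOM line: ${line}`);
    const node = { level: Number(m[1]), xref: m[2] ?? null, tag: m[3], value: m[4] ?? "", children: [] };
    // Pop back to this line's parent: the nearest open node with a lower level.
    while (stack.length > 0 && stack[stack.length - 1].level >= node.level) stack.pop();
    if (stack.length > 0) stack[stack.length - 1].children.push(node);
    else roots.push(node);
    stack.push(node);
  }
  return roots;
}

// Map an INDI node onto the individual document shape from the slides.
// Dates stay as raw GEDCOM strings; converting them to ISODate is left out.
function toIndividual(indi) {
  const child = (node, tag) => node.children.find((c) => c.tag === tag);
  // GEDCOM names look like "Given Names /Surname/".
  const nameMatch = (child(indi, "NAME")?.value ?? "").match(/^(.*?)\s*\/([^\/]*)\//);
  const given = nameMatch ? nameMatch[1].split(/\s+/).filter(Boolean) : [];
  const event = (tag) => {
    const node = child(indi, tag);
    if (!node) return [];
    // Assumption: PLAC lists city, county, state from smallest to largest.
    const place = child(node, "PLAC");
    const [city, county, state] = place ? place.value.split(/\s*,\s*/) : [];
    return [{ date: child(node, "DATE")?.value ?? null, location: { city, county, state } }];
  };
  return {
    gedcom_id: indi.xref, // hypothetical field
    name: {
      first: given.slice(0, 1),
      middle: given.slice(1).join(" "),
      last: nameMatch ? [nameMatch[2]] : [],
    },
    events: { birth: event("BIRT"), death: event("DEAT") },
  };
}

const sample = `0 @I2@ INDI
1 NAME Charles Phillip /Ingalls/
1 SEX M
1 BIRT
2 DATE 10 JAN 1836
2 PLAC Cuba, Allegheny, NY
1 DEAT
2 DATE 08 JUN 1902
2 PLAC De Smet, Kingsbury, Dakota Territory
1 FAMC @F2@
1 FAMS @F3@`;

const [charles] = parseGedcom(sample);
console.log(JSON.stringify(toIndividual(charles), null, 2));
```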
    65. GEDCOM
        • File format, not a database
        • Handles the great variety of data well
        • Doesn’t really scale beyond a local user
        • Doesn’t provide a good mechanism for storing external documents (birth certificates, etc.)
        • Built to solve the problem of sharing data
    66. Genealogy & MongoDB
        • Genealogy is anything but rigid and fixed
        • Flexible schema fits genealogy data well
        • Packaging things together makes sense
        • Relating records doesn’t require a relational database
    67. Schema sketch
        • Individual: AFN, Modification Date
        • Name: First[], Middle[], Last[]
        • Events[]: type, date, contributor[], record[]
        • Location: city, state, county, country
    68. Schema sketch, with users and records
        • Individual: AFN, Modification Date
        • Name: First[], Middle[], Last[]
        • Events[]: type, date, contributor[], record[]
        • Location: city, state, county, country, coordinates[]
        • User: Name, Email Address, Password, Individual_id
        • Record: contributor, type, thumbnail, content, description, tags[]
    69-70. Individual
        individual = {
            _id: ObjectId("4f2978dfaa999d9db02618ce"),
            AFN: "1XYK-KQJ",
            name: {
                first: [ "john", "johannes" ],
                middle: "peter",
                last: [ "smith", "sandvik" ]
            }
        }
        > db.individual.find({ "name.first": "john", "name.middle": "peter" })
    71-72. Events
        events: {
            death: {
                date: ISODate("1989-07-14"),
                location: {
                    city: "pensacola",
                    state: "fl",
                    county: "escambia",
                    country: "usa",
                    coordinates: [ 30.26, 87.12 ]
                },
                contributor: ObjectId("4eeac...691")
            }
        }
        > db.individual.find({ "events.death.date": ISODate("1989-07-14") })
        > db.individual.find({ "events.death.location.coordinates": { $near: [ 30, 90 ] } })
    73-74. Duplicate Events
        events: {
            birth: [
                {
                    date: ISODate("1928-04-06"),
                    location: {
                        city: "brattleboro",
                        state: "vt",
                        county: "windham",
                        country: "usa",
                        coordinates: [ 42.51, 72.34 ]
                    },
                    contributor: ObjectId("4ee...00000"),
                    records: ObjectId("4ed8a...7b000000")
                },
                {
                    date: ISODate("1928-04-16"),
                    location: {
                        city: "brattleboro",
                        state: "vt",
                        county: "windham",
                        country: "usa",
                        coordinates: [ 42.51, 72.34 ]
                    },
                    contributor: ObjectId("4ee...37bb"),
                    records: ObjectId("4eea...0000c8")
                }
            ]
        }
    75. Duplicate Events
        events: {
            birth: [
                { date: ISODate("1928-04-06") },
                { date: ISODate("1928-04-16") }
            ]
        }
        > db.individual.find({ "events.birth.date": ISODate("1928-04-16") })
        Same query works!!
    76-77. Multiple Events
        marriage: [
            {
                date: ISODate("1939-08-11"),
                end_date: ISODate("1940-02-19"),
                to: ObjectId("4f297978aa999d9db02618cf"),
                location: {
                    city: "raleigh",
                    state: "nc",
                    county: "wake",
                    country: "usa",
                    coordinates: [ 35.49, 78.38 ]
                },
                contributor: ObjectId("4eeac...91537bb")
            },
            {
                date: ISODate("1944-04-19"),
                to: ObjectId("4f2978dfaa999d9db02618ce"),
                location: {
                    city: "atlanta",
                    state: "ga",
                    county: "fulton",
                    country: "usa",
                    coordinates: [ 33.45, 84.23 ]
                },
                contributor: ObjectId("4eeb...37bb")
            }
        ]
    78. All together
        individual = {
            _id: ObjectId("4f2978dfaa999d9db02618ce"),
            AFN: "1XYK-KQJ",
            name: {
                first: [ "john", "johannes" ],
                middle: "peter",
                last: [ "smith", "sandvik" ]
            },
            events: {
                birth: [
                    {
                        date: ISODate("1928-04-06"),
                        location: {
                            city: "brattleboro",
                            state: "vt",
                            county: "windham",
                            country: "usa",
                            coordinates: [ 42.51, 72.34 ]
                        },
                        contributor: ObjectId("4eeabc958b691537bb000000"),
                        records: ObjectId("4ed8aea7d8562f7d7b000000")
                    },
                    {
                        date: ISODate("1928-04-16"),
                        location: {
                            city: "brattleboro",
    79. Records
        record1 = {
            _id: ObjectId("4ed8aea7d8562f7d7b"),
            contributor: ObjectId("4eeab...1537bb"),
            type: "birth certificate",
            thumbnail: BinData(0, "/9j/4AAQSkZJ...."),
            content: BinData(0, "j6b/Id11lWqs..."),
            tags: [ "NY", "certified" ],
            description: "John's birth certificate"
        }
    80. Users
        user = {
            _id: ObjectId("4eeabc958b691537bb"),
            username: "spf13",
            email_address: "genealogy@spf13.com",
            password: "a.long.passphrase18",
            individual_id: ObjectId("4f2f...0ce")
        }
    81. Scaling MongoDB for all the generations
    82. Replica Sets [figure: three example layouts: Primary + Secondary + Secondary; Primary + Secondary + Arbiter; Primary + four Secondaries]
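Each layout above corresponds to a replica set configuration document of the kind passed to `rs.initiate()` in the mongo shell. The sketch below builds such a document for a primary, a secondary, and an arbiter, and computes how many votes an election needs. It assumes every member keeps the default single vote. The set name, host names, and helper functions are hypothetical placeholders.

```javascript
// Build a replica set config document in the shape rs.initiate() accepts.
// Hypothetical helper: every member keeps the default of one vote.
function replicaSetConfig(name, dataHosts, arbiterHosts = []) {
  const members = [
    ...dataHosts.map((host) => ({ host })),
    ...arbiterHosts.map((host) => ({ host, arbiterOnly: true })),
  ].map((member, i) => ({ _id: i, ...member }));
  return { _id: name, members };
}

// Electing a primary needs a strict majority of the voting members.
function votesNeeded(config) {
  return Math.floor(config.members.length / 2) + 1;
}

const psa = replicaSetConfig(
  "genealogy",
  ["db1.example.net:27017", "db2.example.net:27017"],
  ["arb1.example.net:27017"]
);
console.log(JSON.stringify(psa, null, 2));
console.log(votesNeeded(psa));
```

With an arbiter, the set can still elect a primary after one data-bearing node fails (2 of 3 votes remain) without storing a third copy of the data.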
    83. Sharding [figure: three App Servers, each with its own MongoS router; three ConfigD config servers; twelve MongoD processes arranged as four columns of three]
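In this layout, each `mongos` router relies on metadata held by the config servers. That metadata maps ranges of the shard key (chunks) to shards. A query that includes the shard key can go only to the owning shard, and one without it has to be broadcast to every shard. The sketch below models that lookup with a small, hypothetical chunk table keyed on `AFN`. It is a conceptual illustration, not how `mongos` is implemented, and not a recommendation of a shard key.

```javascript
// Hypothetical chunk table: each chunk owns shard-key values in [min, max).
// null marks an open end (the real system uses MinKey / MaxKey).
const chunkTable = [
  { min: null, max: "5", shard: "shard0" },
  { min: "5", max: "K", shard: "shard1" },
  { min: "K", max: null, shard: "shard2" },
];

function owns(chunk, key) {
  const aboveMin = chunk.min === null || key >= chunk.min;
  const belowMax = chunk.max === null || key < chunk.max;
  return aboveMin && belowMax;
}

function routeByKey(chunks, key) {
  const chunk = chunks.find((c) => owns(c, key));
  if (!chunk) throw new Error(`no chunk covers key ${key}`);
  return chunk.shard;
}

// Targeted when the query names the shard key, otherwise scatter-gather.
function targetShards(chunks, query, shardKey) {
  if (query[shardKey] !== undefined) return [routeByKey(chunks, query[shardKey])];
  return [...new Set(chunks.map((c) => c.shard))];
}

console.log(targetShards(chunkTable, { AFN: "1XYK-KQJ" }, "AFN"));
console.log(targetShards(chunkTable, { "name.first": "john" }, "AFN"));
```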
    84. The Family Tree
    85. It’s not a tree at all, it’s really a graph ... and an odd one at that
    86. It would be easy if it always looked like this
    88. All sorts of mess
        • Step & adopted relationships
        • Duplicate nodes
        • Lots of missing nodes
        • Divorces and re-marriages
        • Multiple names for the same person
        • Multiple dates for the same event
    89. How to make sense of it all
    90. Storing a graph in MongoDB
    91. Graphs are important: without them we couldn’t store family relationships
    92. Trees / graphs in MongoDB
        • Since MongoDB data structures are essentially objects, there’s a good degree of flexibility here
        • Think of how you would structure them in your application
    93. Trees / graphs in MongoDB
        • Each node is stored as a document
        • Each document contains references to related nodes
        • What counts as “related” depends on your application
    94. References vs. Relations
        • MongoDB uses references
        • Unlike foreign keys, references don’t enforce integrity
        • A reference is really just a reference
        • For many applications a reference is sufficient
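A reference is just a stored `_id`, so nothing stops it from pointing at a document that was never inserted or was later deleted. An application that needs integrity has to check for itself. The sketch below scans in-memory documents shaped like the deck's `parents` examples and reports dangling references. `findDanglingRefs` and the sample family are hypothetical.

```javascript
// Report references in `field` that point at _ids missing from the collection.
// Hypothetical helper operating on an in-memory array of documents.
function findDanglingRefs(docs, field) {
  const ids = new Set(docs.map((d) => d._id));
  const dangling = [];
  for (const doc of docs) {
    for (const ref of doc[field] ?? []) {
      if (!ids.has(ref)) dangling.push({ from: doc._id, missing: ref });
    }
  }
  return dangling;
}

const family = [
  { _id: "a" },
  { _id: "b" },
  { _id: "e", parents: ["a", "b"] },
  { _id: "g", parents: ["e", "f"] }, // "f" was never stored
];
console.log(findDanglingRefs(family, "parents"));
```

In genealogy data a dangling reference is often a known-but-missing ancestor rather than an error. Whether to repair it, flag it, or allow it is an application decision.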
    95. Simple relationship
        { _id: "a" }  { _id: "b" }  { _id: "c" }  { _id: "d" }
        { _id: "e", parents: [ "a", "b" ] }
        { _id: "f", parents: [ "c", "d" ] }
        { _id: "g", parents: [ "e", "f" ] }
        • Easy to access nodes in either direction
        • Good for trees / graphs
        • Can grab large sets
        • Minimum amount of maintenance
        • Balanced
        • Implied relationships
        // find direct children of b:
        > db.family.find({ parents: "b" })
        // find all descendants of b:
        var b = db.family.find({ _id: "b" }).toArray();
        descendantsFind = function (par) {
            var rv = [];
            for (var i = 0; i < par.length; i++) {
                var k = db.family.find({ parents: par[i]._id }).toArray();
                rv = rv.concat(k);
                rv = rv.concat(descendantsFind(k));
            }
            return rv;
        }
        descendantsFind(b);
        // find all ancestors of g:
        var g = db.family.findOne({ _id: "g" });
        ancestorFind = function (child) {
            var rv = [];
            if (!child.parents) return rv;
            // direct parents
            var parents = db.family.find({ _id: { $in: child.parents } }).toArray();
            rv = rv.concat(parents);
            for (var i = 0; i < parents.length; i++) {
                rv = rv.concat(ancestorFind(parents[i]));
            }
            return rv;
        }
        ancestorFind(g);
    99. Bi-directional
    { _id: "a", children: ["e"] } { _id: "b", children: ["e"] }
    { _id: "c", children: ["f"] } { _id: "d", children: ["f"] }
    { _id: "e", children: ["g"], parents: ["a", "b"] }
    { _id: "f", children: ["g"], parents: ["c", "d"] }
    { _id: "g", children: [], parents: ["e", "f"] }
    • Doesn’t really add much beyond the first example
    • More maintenance
    • Duplication of each relationship
    • Only real advantage is the ability to grab all related nodes (both directions) with one query
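The maintenance cost and the one-query advantage can both be seen in a small sketch. It assumes an in-memory `Map` stands in for the collection; `link` and `related` are hypothetical helpers. In MongoDB, `link` would correspond to two separate updates (for example `$addToSet` on each document), one per side of the relationship.

```javascript
// In-memory stand-in for a bi-directional family collection (hypothetical).
const family = new Map();

// Fetch a node, creating an empty one if it does not exist yet.
function node(id) {
  if (!family.has(id)) family.set(id, { _id: id, children: [], parents: [] });
  return family.get(id);
}

// Hypothetical helper: recording one relationship touches TWO documents,
// which is the duplication the slide warns about.
function link(parentId, childId) {
  const parent = node(parentId);
  const child = node(childId);
  if (!parent.children.includes(childId)) parent.children.push(childId);
  if (!child.parents.includes(parentId)) child.parents.push(parentId);
}

// Hypothetical helper: all directly related nodes, read from a single document.
function related(id) {
  const n = family.get(id);
  return n ? [...n.parents, ...n.children] : [];
}

link("a", "e"); link("b", "e");
link("c", "f"); link("d", "f");
link("e", "g"); link("f", "g");

console.log(JSON.stringify(related("e"))); // ["a","b","g"]
```

If either write fails, the two sides disagree, so the extra convenience of `related` is paid for with more careful update code.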
    100. Array of Ancestors
    { _id: "a" } { _id: "b" } { _id: "c" } { _id: "d" }
    { _id: "e", ancestors: ["a", "b"], parents: ["a", "b"] }
    { _id: "f", ancestors: ["c", "d"], parents: ["c", "d"] }
    { _id: "g", ancestors: ["a", "b", "c", "d", "e", "f"], parents: ["e", "f"] }
    • Great for small trees (or subsets)
    • Could be used to store X generations of ancestors
    • Optimized for retrieving the entire tree
    • Uses implied relationships
    • No help with specifics (e.g. "Is this person my grandson?")
    • Easier retrieval at the expense of costlier maintenance
        //find all descendants of b:
        > db.tree.find({ ancestors: 'b' })
        //find all direct descendants of b:
        > db.tree.find({ parents: 'b' })
        //find all ancestors of g:
        > g = db.tree.findOne({ _id: 'g' })
        > db.tree.find({ _id: { $in: g.ancestors } })
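The trade-off on this slide is that the `ancestors` array must be computed when a person is written, so reads need no recursion. A minimal sketch, assuming an in-memory `Map` stands in for `db.tree` and that parents are always inserted before their children; `insertPerson` and `descendantsOf` are hypothetical helpers. Unlike the slide, root documents here carry empty `ancestors` and `parents` arrays for simplicity.

```javascript
// In-memory stand-in for db.tree (hypothetical).
const tree = new Map();

// Hypothetical insert helper: derive the ancestors array from each parent's
// own ancestors plus the parent itself. This is the "costlier maintenance":
// every write does extra work, and re-parenting would mean rewriting subtrees.
// Assumes every parent id is already present in `tree`.
function insertPerson(id, parents = []) {
  const ancestors = [];
  for (const p of parents) {
    for (const a of [...tree.get(p).ancestors, p]) {
      if (!ancestors.includes(a)) ancestors.push(a);
    }
  }
  tree.set(id, { _id: id, ancestors, parents });
}

// Stand-in for db.tree.find({ ancestors: id }): one lookup, no recursion.
const descendantsOf = (id) =>
  [...tree.values()].filter((d) => d.ancestors.includes(id)).map((d) => d._id);

for (const id of ["a", "b", "c", "d"]) insertPerson(id);
insertPerson("e", ["a", "b"]);
insertPerson("f", ["c", "d"]);
insertPerson("g", ["e", "f"]);

console.log(JSON.stringify(tree.get("g").ancestors)); // ["a","b","e","c","d","f"]
console.log(JSON.stringify(descendantsOf("b")));      // ["e","g"]
```

The stored set for `g` matches the slide; only the order differs, because this sketch appends each parent after that parent's own ancestors.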
    102. Relations (basic)
    { _id: "b", relations: [
        { id: "a", relation: "parent" },
        { id: "c", relation: "grandparent" },
        { id: "d", relation: "parent" } ] }
    103. Relations (detailed)
    { _id: "b", relations: [
        { id: "a", relation: "parent", type: "mother", subtype: "biological" },
        { id: "c", relation: "parent", type: "father", subtype: "adopted" },
        { id: "d", relation: "parent", type: "father", subtype: "biological" } ] }
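Storing the relationship type inside each array element lets one document answer questions such as "who are b's biological parents?" by filtering its `relations` array. The sketch below does that filtering in plain JavaScript on the detailed document from the slide; `relationIds` is a hypothetical helper. On the server side, a query such as `db.family.find({ relations: { $elemMatch: { id: "a", type: "mother" } } })` would find people whose mother is `a` (the collection name `family` is an assumption here).

```javascript
// The detailed relations document from the slide.
const person = {
  _id: "b",
  relations: [
    { id: "a", relation: "parent", type: "mother", subtype: "biological" },
    { id: "c", relation: "parent", type: "father", subtype: "adopted" },
    { id: "d", relation: "parent", type: "father", subtype: "biological" },
  ],
};

// Hypothetical helper: ids of relations whose fields match every key in
// `criteria`, similar in spirit to an $elemMatch on a single element.
function relationIds(doc, criteria) {
  return doc.relations
    .filter((r) => Object.entries(criteria).every(([k, v]) => r[k] === v))
    .map((r) => r.id);
}

console.log(JSON.stringify(relationIds(person, { relation: "parent", subtype: "biological" })));
// → ["a","d"]
```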
    104. Shouldn’t I store my family tree in a graph database? They are built to store trees, after all
    105. Graphs are great at traversing deep in a tree
    • Is this node my relative?
    • Retrieve my paternal great, great, great, great grandpa
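To make "deep traversal" concrete: a great, great, great, great grandpa is six generations up (parent, grandparent, then four "greats"), so with one document per person the walk takes six dependent lookups. A minimal sketch, assuming hypothetical people `p0` through `p7` on a straight paternal line, each stored with a `relations` array shaped like the detailed relations slide, and an in-memory `Map` standing in for the collection; `paternalAncestor` is a hypothetical helper.

```javascript
// Hypothetical data: p0's father is p1, p1's father is p2, ..., up to p7.
const people = new Map();
for (let i = 0; i < 8; i++) {
  const relations = i < 7 ? [{ id: `p${i + 1}`, relation: "parent", type: "father" }] : [];
  people.set(`p${i}`, { _id: `p${i}`, relations });
}

// Follow type "father" links `generations` steps up. Each step would be one
// more findOne against a document store, because the next id is only known
// after the current document has been read.
function paternalAncestor(id, generations) {
  let current = people.get(id);
  for (let step = 0; step < generations && current; step++) {
    const father = current.relations.find((r) => r.type === "father");
    current = father ? people.get(father.id) : undefined;
  }
  return current ? current._id : null;
}

console.log(paternalAncestor("p0", 6)); // p6 (great, great, great, great grandpa)
console.log(paternalAncestor("p0", 9)); // null (the recorded line ends at p7)
```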
    108. Unfortunately, that’s not how we commonly work
    • Typically we are working with a node and its immediate neighbors
    • The significant majority of our operations aren’t traversals
    • If those operations are important, perhaps a hybrid graph & document solution makes sense
    109. http://spf13.com http://github.com/s @spf13
    Questions? Download at mongodb.org
    We’re hiring!! Contact us at jobs@10gen.com