Entity Relationships in a Document Database at CouchConf Boston

Unlike relational databases, document databases like CouchDB and Couchbase do not directly support entity relationships. This talk will explore patterns of modeling one-to-many and many-to-many entity relationships in a document database. These patterns include using an embedded JSON array, relating documents using identifiers, using a list of keys, and using relationship documents.

Speaker notes
  • A full outer join effectively combines both left and right outer joins. If your relational database doesn’t support full outer joins, then a left outer join is “close enough” for the following examples.
  • Entities are joined together in a single row.
  • Entities are collated together, but in separate rows. Note the use of compound keys.
  • Result set may also include a doc column if include_docs is set to true.
  • The “0” and “1” make the publisher sort before the publisher’s books. Note the use of compound keys.
  • Note that the keys are the same as with the embedded document approach, but the IDs are different.
  • Note that the best we can do is emit the book IDs, as we don’t have access to any other book data.
  • Note that the included doc is the one having the referenced ID, not the doc from which the row was emitted. Note that the docs are truncated.
  • Note that none of the entity documents contain any references to other entities.
  • Note that the docs are truncated.
  • Note that these are trade-offs that provide associated benefits.
  • Note that the startkey and endkey parameters need to be URL encoded.
  • Note that one must account for the “left” entity when using grouping levels.
  • Note that UUIDs are especially useful for Relationship Documents.
  • Note that the bulk document API is not transactional!

Transcript

  • 1. Entity Relationships in a Document Database
    MapReduce Views for SQL Users
  • 2. Entity: An object defined by its identity and a thread of continuity [1]
    [1] "Entity," Domain-driven Design Community <http://domaindrivendesign.org/node/109>.
  • 3. Entity Relationship Model
  • 4. Join vs. Collation
  • 5. SQL Query Joining Publishers and Books
      SELECT `publisher`.`id`, `publisher`.`name`, `book`.`title`
      FROM `publisher`
      FULL OUTER JOIN `book` ON `publisher`.`id` = `book`.`publisher_id`
      ORDER BY `publisher`.`id`, `book`.`title`;
  • 6. Joined Result Set
      publisher.id  publisher.name  book.title
      oreilly       OReilly Media   Building iPhone Apps with HTML, CSS, and JavaScript
      oreilly       OReilly Media   CouchDB: The Definitive Guide
      oreilly       OReilly Media   DocBook: The Definitive Guide
      oreilly       OReilly Media   RESTful Web Services
  • 7. Joined Result Set: the same rows as slide 6, with the publisher.id and publisher.name columns labeled Publisher (“left”)
  • 8. Joined Result Set: the same rows as slide 6, with the publisher.id and publisher.name columns labeled Publisher (“left”) and the book.title column labeled Book (“right”)
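For readers coming from SQL, the joined result set can be mimicked in memory. This is a rough sketch, not part of the talk. The `publishers` and `books` arrays and the `leftOuterJoin` function are hypothetical. It performs a left outer join, which the speaker notes say is “close enough” to the full outer join for these examples. Each book becomes one combined row that repeats the publisher's columns.

```javascript
// Hypothetical in-memory "tables" mirroring the publisher/book example.
const publishers = [{ id: "oreilly", name: "OReilly Media" }];
const books = [
  { title: "CouchDB: The Definitive Guide", publisher_id: "oreilly" },
  { title: "RESTful Web Services", publisher_id: "oreilly" },
  { title: "DocBook: The Definitive Guide", publisher_id: "oreilly" },
  { title: "Building iPhone Apps with HTML, CSS, and JavaScript", publisher_id: "oreilly" },
];

// Simple three-way comparison for strings.
const cmp = (x, y) => (x < y ? -1 : x > y ? 1 : 0);

// Left outer join: every publisher/book pair becomes ONE combined row; a
// publisher without books still yields a row whose book.title is null.
function leftOuterJoin(publishers, books) {
  const rows = [];
  for (const p of publishers) {
    const matches = books.filter((b) => b.publisher_id === p.id);
    const titles = matches.length > 0 ? matches.map((b) => b.title) : [null];
    for (const title of titles) {
      rows.push({ "publisher.id": p.id, "publisher.name": p.name, "book.title": title });
    }
  }
  // ORDER BY publisher.id, book.title (null titles compared as empty strings)
  return rows.sort(
    (a, b) =>
      cmp(a["publisher.id"], b["publisher.id"]) ||
      cmp(a["book.title"] ?? "", b["book.title"] ?? "")
  );
}

const joined = leftOuterJoin(publishers, books);
for (const row of joined) {
  console.log(row["publisher.id"], row["publisher.name"], row["book.title"]);
}
```

Every row carries its own copy of the publisher's name. The collated view results avoid that duplication by keeping the publisher in a separate row.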
  • 9. Collated Result Set
      key            id         value
      ["oreilly",0]  "oreilly"  "OReilly Media"
      ["oreilly",1]  "oreilly"  "Building iPhone Apps with HTML, CSS, and JavaScript"
      ["oreilly",1]  "oreilly"  "CouchDB: The Definitive Guide"
      ["oreilly",1]  "oreilly"  "DocBook: The Definitive Guide"
      ["oreilly",1]  "oreilly"  "RESTful Web Services"
  • 10. Collated Result Set: the same rows as slide 9, with the ["oreilly",0] row labeled Publisher
  • 11. Collated Result Set: the same rows as slide 9, with the ["oreilly",0] row labeled Publisher and the ["oreilly",1] rows labeled Books
  • 12. View Result Sets
    Made up of columns and rows
    Every row has the same three columns:
      • key
      • id
      • value
    Columns can contain a mixture of logical data types
  • 13. One to Many Relationships
  • 14. Embedded Entities: Nest related entities within a document
  • 15. Embedded Entities
    A single document represents the “one” entity
    Nested entities (a JSON array) represent the “many” entities
    Simplest way to create a one to many relationship
  • 16. Example: Publisher with Nested Books
      {
        "_id":"oreilly",
        "collection":"publisher",
        "name":"OReilly Media",
        "books":[
          { "title":"CouchDB: The Definitive Guide" },
          { "title":"RESTful Web Services" },
          { "title":"DocBook: The Definitive Guide" },
          { "title":"Building iPhone Apps with HTML, CSS, and JavaScript" }
        ]
      }
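  • 17. Map Function
      function(doc) {
        if ("publisher" == doc.collection) {
          emit([doc._id, 0], doc.name);
          // Index-based loop: for...in over an array also visits non-index properties
          var books = doc.books || [];
          for (var i = 0; i < books.length; i++) {
            emit([doc._id, 1], books[i].title);
          }
        }
      }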
  • 18. Result Set
      key            id         value
      ["oreilly",0]  "oreilly"  "OReilly Media"
      ["oreilly",1]  "oreilly"  "Building iPhone Apps with HTML, CSS, and JavaScript"
      ["oreilly",1]  "oreilly"  "CouchDB: The Definitive Guide"
      ["oreilly",1]  "oreilly"  "DocBook: The Definitive Guide"
      ["oreilly",1]  "oreilly"  "RESTful Web Services"
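A client usually turns collated rows back into a nested structure. Here is a minimal sketch of how that might be done. It assumes rows shaped like the result set from slide 18 (key, id, value). `groupCollated` is a hypothetical helper, not a CouchDB API. A second key element of 0 starts a new publisher, and each following 1 row adds a book to that publisher.

```javascript
// Rows as returned by the embedded-entities view (key, id, value).
const rows = [
  { key: ["oreilly", 0], id: "oreilly", value: "OReilly Media" },
  { key: ["oreilly", 1], id: "oreilly", value: "Building iPhone Apps with HTML, CSS, and JavaScript" },
  { key: ["oreilly", 1], id: "oreilly", value: "CouchDB: The Definitive Guide" },
  { key: ["oreilly", 1], id: "oreilly", value: "DocBook: The Definitive Guide" },
  { key: ["oreilly", 1], id: "oreilly", value: "RESTful Web Services" },
];

// Walk the collated rows once. Because [id, 0] sorts before [id, 1], the
// publisher row always arrives before that publisher's book rows.
function groupCollated(rows) {
  const publishers = [];
  let current = null;
  for (const row of rows) {
    const [publisherId, kind] = row.key;
    if (kind === 0) {
      current = { id: publisherId, name: row.value, books: [] };
      publishers.push(current);
    } else if (current !== null && current.id === publisherId) {
      current.books.push(row.value);
    }
    // Book rows without a preceding publisher row are ignored in this sketch.
  }
  return publishers;
}

const grouped = groupCollated(rows);
console.log(JSON.stringify(grouped, null, 2));
```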
  • 19. Limitations
    Only works if there aren’t a large number of related entities:
      • Too many nested entities can result in very large documents
      • Slow to transfer between client and server
      • Unwieldy to modify
      • Time-consuming to index
  • 20. Related Documents: Reference an entity by its identifier
  • 21. Related Documents
    A document representing the “one” entity
    Separate documents for each “many” entity
    Each “many” entity references its related “one” entity by the “one” entity’s document identifier
    Makes for smaller documents
    Reduces the probability of document update conflicts
  • 22. Example: Publisher
      { "_id":"oreilly", "collection":"publisher", "name":"OReilly Media" }
  • 23. Example: Related Book
      { "_id":"9780596155896", "collection":"book", "title":"CouchDB: The Definitive Guide", "publisher":"oreilly" }
  • 24. Map Function
      function(doc) {
        if ("publisher" == doc.collection) {
          emit([doc._id, 0], doc.name);
        }
        if ("book" == doc.collection) {
          emit([doc.publisher, 1], doc.title);
        }
      }
  • 25. Result Set
      key            id               value
      ["oreilly",0]  "oreilly"        "OReilly Media"
      ["oreilly",1]  "9780596155896"  "CouchDB: The Definitive Guide"
      ["oreilly",1]  "9780596529260"  "RESTful Web Services"
      ["oreilly",1]  "9780596805791"  "Building iPhone Apps with HTML, CSS, and JavaScript"
      ["oreilly",1]  "9781565925809"  "DocBook: The Definitive Guide"
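To show where this ordering comes from, here is a rough in-memory emulation of a view. It runs a map function over a subset of the Related Documents examples, then sorts rows by key and breaks ties by document ID. Several simplifications apply. `runView` and `collate` are hypothetical helpers. Real CouchDB supplies `emit` as a global rather than an argument. It also compares strings with ICU collation instead of the plain code-unit comparison used here, and it orders objects in ways this sketch does not attempt.

```javascript
// Rank of each JSON type in CouchDB's view collation order:
// null < false < true < numbers < strings < arrays < objects.
function typeRank(v) {
  if (v === null) return 0;
  if (v === false) return 1;
  if (v === true) return 2;
  if (typeof v === "number") return 3;
  if (typeof v === "string") return 4;
  if (Array.isArray(v)) return 5;
  return 6; // plain object
}

// Simplified collation: arrays compare element by element, then by length.
function collate(a, b) {
  const ra = typeRank(a), rb = typeRank(b);
  if (ra !== rb) return ra - rb;
  if (ra === 3) return a - b;
  if (ra === 4) return a < b ? -1 : a > b ? 1 : 0; // not ICU collation
  if (ra === 5) {
    for (let i = 0; i < Math.min(a.length, b.length); i++) {
      const c = collate(a[i], b[i]);
      if (c !== 0) return c;
    }
    return a.length - b.length;
  }
  return 0; // null/booleans are equal within their rank; objects not compared
}

// Run the map function over every doc, then sort rows by key, then by doc ID.
function runView(docs, mapFn) {
  const rows = [];
  for (const doc of docs) {
    const emit = (key, value) => rows.push({ key, id: doc._id, value });
    mapFn(doc, emit);
  }
  return rows.sort((x, y) => collate(x.key, y.key) || collate(x.id, y.id));
}

// Subset of the Related Documents examples, deliberately out of order.
const docs = [
  { _id: "9780596529260", collection: "book", title: "RESTful Web Services", publisher: "oreilly" },
  { _id: "oreilly", collection: "publisher", name: "OReilly Media" },
  { _id: "9780596155896", collection: "book", title: "CouchDB: The Definitive Guide", publisher: "oreilly" },
];

// Same logic as the slide 24 map function, with emit passed in explicitly.
function map(doc, emit) {
  if (doc.collection === "publisher") emit([doc._id, 0], doc.name);
  if (doc.collection === "book") emit([doc.publisher, 1], doc.title);
}

const rows = runView(docs, map);
for (const r of rows) console.log(JSON.stringify(r.key), r.id, r.value);
```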
  • 26. Limitations
    When retrieving the entity on the “right” side of the relationship, one cannot include any data from the entity on the “left” side of the relationship without the use of an additional query
    Only works for one to many relationships
  • 27. Many to Many Relationships
  • 28. List of Keys: Reference entities by their identifiers
  • 29. List of Keys
    A document representing each “many” entity on the “left” side of the relationship
    Separate documents for each “many” entity on the “right” side of the relationship
    Each “many” entity on the “right” side of the relationship maintains a list of document identifiers for its related “many” entities on the “left” side of the relationship
  • 30. Books and Related Authors
  • 31. Example: Book
      { "_id":"9780596805029", "collection":"book", "title":"DocBook 5: The Definitive Guide" }
  • 32. Example: Book
      { "_id":"9781565920514", "collection":"book", "title":"Making TeX Work" }
  • 33. Example: Book
      { "_id":"9781565925809", "collection":"book", "title":"DocBook: The Definitive Guide" }
  • 34. Example: Author
      { "_id":"muellner", "collection":"author", "name":"Leonard Muellner", "books":[ "9781565925809" ] }
  • 35. Example: Author
      { "_id":"walsh", "collection":"author", "name":"Norman Walsh", "books":[ "9780596805029", "9781565925809", "9781565920514" ] }
  • 36. Map Function
      function(doc) {
        if ("book" == doc.collection) {
          emit([doc._id, 0], doc.title);
        }
        if ("author" == doc.collection) {
          var books = doc.books || [];
          for (var i = 0; i < books.length; i++) {
            emit([books[i], 1], doc.name);
          }
        }
      }
  • 37. Result Set
      key                  id               value
      ["9780596805029",0]  "9780596805029"  "DocBook 5: The Definitive Guide"
      ["9780596805029",1]  "walsh"          "Norman Walsh"
      ["9781565920514",0]  "9781565920514"  "Making TeX Work"
      ["9781565920514",1]  "walsh"          "Norman Walsh"
      ["9781565925809",0]  "9781565925809"  "DocBook: The Definitive Guide"
      ["9781565925809",1]  "muellner"       "Leonard Muellner"
      ["9781565925809",1]  "walsh"          "Norman Walsh"
  • 38. Authors and Related Books
  • 39. Map Function
      function(doc) {
        if ("author" == doc.collection) {
          emit([doc._id, 0], doc.name);
          var books = doc.books || [];
          for (var i = 0; i < books.length; i++) {
            emit([doc._id, 1], {"_id":books[i]});
          }
        }
      }
  • 40. Result Set
      key             id          value
      ["muellner",0]  "muellner"  "Leonard Muellner"
      ["muellner",1]  "muellner"  {"_id":"9781565925809"}
      ["walsh",0]     "walsh"     "Norman Walsh"
      ["walsh",1]     "walsh"     {"_id":"9780596805029"}
      ["walsh",1]     "walsh"     {"_id":"9781565920514"}
      ["walsh",1]     "walsh"     {"_id":"9781565925809"}
  • 41. Including Docs (include_docs=true)
      key             id          value  doc (truncated)
      ["muellner",0]  "muellner"  …      {"name":"Leonard Muellner"}
      ["muellner",1]  "muellner"  …      {"title":"DocBook: The Definitive Guide"}
      ["walsh",0]     "walsh"     …      {"name":"Norman Walsh"}
      ["walsh",1]     "walsh"     …      {"title":"DocBook 5: The Definitive Guide"}
      ["walsh",1]     "walsh"     …      {"title":"Making TeX Work"}
      ["walsh",1]     "walsh"     …      {"title":"DocBook: The Definitive Guide"}
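The doc column above follows one rule. When the emitted value is an object with an `_id`, include_docs returns the linked document. Otherwise it returns the document that emitted the row. The hypothetical `includeDocs` helper below mimics that rule client-side over a small in-memory `docsById` map. It returns null when a linked document is missing; this sketch does not model `_rev` handling or deleted documents.

```javascript
// A hypothetical subset of the database, keyed by document ID.
const docsById = {
  muellner: { _id: "muellner", collection: "author", name: "Leonard Muellner" },
  "9781565925809": { _id: "9781565925809", collection: "book", title: "DocBook: The Definitive Guide" },
};

// Two rows from the Authors and Related Books view (slide 40).
const rows = [
  { key: ["muellner", 0], id: "muellner", value: "Leonard Muellner" },
  { key: ["muellner", 1], id: "muellner", value: { _id: "9781565925809" } },
];

// Linked value ({"_id": ...}) -> that document; otherwise -> the emitting doc.
// The id column is left untouched: it still names the emitting document.
function includeDocs(rows, docsById) {
  return rows.map((row) => {
    const v = row.value;
    const linked = v !== null && typeof v === "object" && typeof v._id === "string";
    const docId = linked ? v._id : row.id;
    return { ...row, doc: docsById[docId] ?? null };
  });
}

const withDocs = includeDocs(rows, docsById);
for (const row of withDocs) console.log(JSON.stringify(row.key), row.id, row.doc);
```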
  • 42. Or, we can reverse the references…
  • 43. Example: Author
      { "_id":"muellner", "collection":"author", "name":"Leonard Muellner" }
  • 44. Example: Author
      { "_id":"walsh", "collection":"author", "name":"Norman Walsh" }
  • 45. Example: Book
      { "_id":"9780596805029", "collection":"book", "title":"DocBook 5: The Definitive Guide", "authors":[ "walsh" ] }
  • 46. Example: Book
      { "_id":"9781565920514", "collection":"book", "title":"Making TeX Work", "authors":[ "walsh" ] }
  • 47. Example: Book
      { "_id":"9781565925809", "collection":"book", "title":"DocBook: The Definitive Guide", "authors":[ "muellner", "walsh" ] }
  • 48. Map Function
      function(doc) {
        if ("author" == doc.collection) {
          emit([doc._id, 0], doc.name);
        }
        if ("book" == doc.collection) {
          var authors = doc.authors || [];
          for (var i = 0; i < authors.length; i++) {
            emit([authors[i], 1], doc.title);
          }
        }
      }
  • 49. Result Set
      key             id               value
      ["muellner",0]  "muellner"       "Leonard Muellner"
      ["muellner",1]  "9781565925809"  "DocBook: The Definitive Guide"
      ["walsh",0]     "walsh"          "Norman Walsh"
      ["walsh",1]     "9780596805029"  "DocBook 5: The Definitive Guide"
      ["walsh",1]     "9781565920514"  "Making TeX Work"
      ["walsh",1]     "9781565925809"  "DocBook: The Definitive Guide"
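Reversing the references amounts to inverting a mapping. A migration script moving from the first List of Keys layout (authors list book IDs) to the reversed one (books list author IDs) might start with something like this sketch. `invertReferences` is a hypothetical helper, and writing the updated book documents back is left out.

```javascript
// Author documents in the first List of Keys layout (slides 34 and 35).
const authors = [
  { _id: "muellner", collection: "author", name: "Leonard Muellner", books: ["9781565925809"] },
  { _id: "walsh", collection: "author", name: "Norman Walsh",
    books: ["9780596805029", "9781565925809", "9781565920514"] },
];

// Invert author -> books references into book -> authors references.
function invertReferences(authors) {
  const byBook = new Map();
  for (const author of authors) {
    for (const bookId of author.books || []) {
      if (!byBook.has(bookId)) byBook.set(bookId, []);
      byBook.get(bookId).push(author._id);
    }
  }
  return byBook;
}

const authorsByBook = invertReferences(authors);
for (const [bookId, ids] of authorsByBook) console.log(bookId, ids);
```

The resulting arrays match the `authors` fields shown in slides 45 to 47. The authors are listed in the order they were visited, so sort them if a stable order matters.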
  • 50. Limitations
    Queries from the “right” side of the relationship cannot include any data from entities on the “left” side of the relationship (without the use of include_docs)
    A document representing an entity with lots of relationships could become quite large
  • 51. Relationship Documents: Create a document to represent each individual relationship
  • 52. Relationship Documents
    A document representing each “many” entity on the “left” side of the relationship
    Separate documents for each “many” entity on the “right” side of the relationship
    Neither the “left” nor the “right” side of the relationship contains any direct references to the other
    For each distinct relationship, a separate document includes the document identifiers for both the “left” and “right” sides of the relationship
  • 53. Example: Book
      { "_id":"9780596805029", "collection":"book", "title":"DocBook 5: The Definitive Guide" }
  • 54. Example: Book
      { "_id":"9781565920514", "collection":"book", "title":"Making TeX Work" }
  • 55. Example: Book
      { "_id":"9781565925809", "collection":"book", "title":"DocBook: The Definitive Guide" }
  • 56. Example: Author
      { "_id":"muellner", "collection":"author", "name":"Leonard Muellner" }
  • 57. Example: Author
      { "_id":"walsh", "collection":"author", "name":"Norman Walsh" }
  • 58. Example: Relationship Document
      { "_id":"44005f2c", "collection":"book-author", "book":"9780596805029", "author":"walsh" }
  • 59. Example: Relationship Document
      { "_id":"44005f72", "collection":"book-author", "book":"9781565920514", "author":"walsh" }
  • 60. Example: Relationship Document
      { "_id":"44006720", "collection":"book-author", "book":"9781565925809", "author":"muellner" }
  • 61. Example: Relationship Document
      { "_id":"44006b0d", "collection":"book-author", "book":"9781565925809", "author":"walsh" }
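Relationship documents are often written in batches, and the talk recommends the bulk document API for this. The sketch below only builds a request body of the form `{"docs": [...]}`, which would be POSTed to `/{db}/_bulk_docs`; it sends nothing. `relationshipDocs` is a hypothetical helper. Each `_id` is a random 32-character hex string, similar in shape to what CouchDB's `_uuids` endpoint returns. The example IDs in slides 58 to 61 are shorter.

```javascript
const crypto = require("crypto");

// Build one book-author relationship document per author, wrapped in the
// {"docs": [...]} envelope that the _bulk_docs endpoint expects.
function relationshipDocs(bookId, authorIds) {
  return {
    docs: authorIds.map((authorId) => ({
      _id: crypto.randomBytes(16).toString("hex"), // 32 random hex characters
      collection: "book-author",
      book: bookId,
      author: authorId,
    })),
  };
}

const body = relationshipDocs("9781565925809", ["muellner", "walsh"]);
console.log(JSON.stringify(body, null, 2));
```

The speaker notes warn that the bulk document API is not transactional. Some documents in a batch can be saved while others fail, so check each entry in the response for an error.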
  • 62. Books and Related Authors
  • 63. Map Function
      function(doc) {
        if ("book" == doc.collection) {
          emit([doc._id, 0], doc.title);
        }
        if ("book-author" == doc.collection) {
          emit([doc.book, 1], {"_id":doc.author});
        }
      }
  • 64. Result Set
      key                  id               value
      ["9780596805029",0]  "9780596805029"  "DocBook 5: The Definitive Guide"
      ["9780596805029",1]  "44005f2c"       {"_id":"walsh"}
      ["9781565920514",0]  "9781565920514"  "Making TeX Work"
      ["9781565920514",1]  "44005f72"       {"_id":"walsh"}
      ["9781565925809",0]  "9781565925809"  "DocBook: The Definitive Guide"
      ["9781565925809",1]  "44006720"       {"_id":"muellner"}
      ["9781565925809",1]  "44006b0d"       {"_id":"walsh"}
  • 65. Including Docs (include_docs=true)
      key                  id  value  doc (truncated)
      ["9780596805029",0]  …   …      {"title":"DocBook 5: The Definitive Guide"}
      ["9780596805029",1]  …   …      {"name":"Norman Walsh"}
      ["9781565920514",0]  …   …      {"title":"Making TeX Work"}
      ["9781565920514",1]  …   …      {"name":"Norman Walsh"}
      ["9781565925809",0]  …   …      {"title":"DocBook: The Definitive Guide"}
      ["9781565925809",1]  …   …      {"name":"Leonard Muellner"}
      ["9781565925809",1]  …   …      {"name":"Norman Walsh"}
  • 66. Authors and Related Books
  • 67. Map Function
      function(doc) {
        if ("author" == doc.collection) {
          emit([doc._id, 0], doc.name);
        }
        if ("book-author" == doc.collection) {
          emit([doc.author, 1], {"_id":doc.book});
        }
      }
  • 68. Result Set
      key             id          value
      ["muellner",0]  "muellner"  "Leonard Muellner"
      ["muellner",1]  "44006720"  {"_id":"9781565925809"}
      ["walsh",0]     "walsh"     "Norman Walsh"
      ["walsh",1]     "44005f2c"  {"_id":"9780596805029"}
      ["walsh",1]     "44005f72"  {"_id":"9781565920514"}
      ["walsh",1]     "44006b0d"  {"_id":"9781565925809"}
  • 69. Including Docs (include_docs=true)
      key             id  value  doc (truncated)
      ["muellner",0]  …   …      {"name":"Leonard Muellner"}
      ["muellner",1]  …   …      {"title":"DocBook: The Definitive Guide"}
      ["walsh",0]     …   …      {"name":"Norman Walsh"}
      ["walsh",1]     …   …      {"title":"DocBook 5: The Definitive Guide"}
      ["walsh",1]     …   …      {"title":"Making TeX Work"}
      ["walsh",1]     …   …      {"title":"DocBook: The Definitive Guide"}
  • 70. Limitations
    Queries can only contain data from the “left” or “right” side of the relationship (without the use of include_docs)
    Maintaining relationship documents may require more work
  • 71. Final Thoughts
  • 72. Document Databases Compared to Relational Databases
    Document databases have no tables (and therefore no columns)
    Indexes (views) are queried directly, instead of being used to optimize more generalized queries
    Result set columns can contain a mix of logical data types
    No built-in concept of relationships between documents
    Related entities can be embedded in a document, referenced from a document, or both
  • 73. Caveats
    No referential integrity
    No atomic transactions across document boundaries
    Some patterns may involve denormalized (i.e. redundant) data
    Data inconsistencies are inevitable (i.e. eventual consistency)
    Consider the implications of replication: what may seem consistent with one database may not be consistent across nodes (e.g. referencing entities that don’t yet exist on the node)
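Because the database does not enforce referential integrity, an application may want to detect dangling references itself. This is a minimal sketch, assuming the Relationship Documents layout and a hypothetical snapshot of one node. In that snapshot the muellner author document has not replicated yet, but relationship document 44006720 already points at it. `findDanglingRelationships` is a hypothetical helper.

```javascript
// Hypothetical snapshot of one node: the "muellner" author document is
// absent, but a relationship document already references it.
const docs = [
  { _id: "9781565925809", collection: "book", title: "DocBook: The Definitive Guide" },
  { _id: "walsh", collection: "author", name: "Norman Walsh" },
  { _id: "44006720", collection: "book-author", book: "9781565925809", author: "muellner" },
  { _id: "44006b0d", collection: "book-author", book: "9781565925809", author: "walsh" },
];

// Return relationship documents whose book or author ID is not present.
function findDanglingRelationships(docs) {
  const ids = new Set(docs.map((d) => d._id));
  return docs.filter(
    (d) => d.collection === "book-author" && (!ids.has(d.book) || !ids.has(d.author))
  );
}

const dangling = findDanglingRelationships(docs);
console.log(dangling.map((d) => d._id));
```

Under eventual consistency, a dangling reference may just be a document that has not arrived yet. A check like this is better suited to reporting or retrying than to deleting data.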
  • 74. Additional Techniques
    Use the startkey and endkey parameters to retrieve one entity and its related entities:
      startkey=["9781565925809"]&endkey=["9781565925809",{}]
    Define a reduce function and use grouping levels
    Use UUIDs rather than natural keys for better performance
    Use the bulk document API when writing Relationship Documents
    When using the List of Keys or Relationship Documents patterns, denormalize data so that you can have data from the “right” and “left” side of the relationship within your query results
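Two of these techniques can be illustrated in a short sketch. First, view parameters such as startkey and endkey are JSON values and must be URL encoded. Second, a `_count` reduce at group_level=1 also counts the “left” entity's own row. The helpers `viewQuery` and `authorCountsAtGroupLevel1` are hypothetical. The second emulates the grouping in memory over the key column of the Books and Related Authors result set (slide 37) rather than calling CouchDB.

```javascript
// Serialize each parameter value as JSON, then URL encode it.
function viewQuery(params) {
  return Object.entries(params)
    .map(([name, value]) => `${name}=${encodeURIComponent(JSON.stringify(value))}`)
    .join("&");
}

// One book and its related authors: {} sorts after any string or number,
// so ["9781565925809", {}] is an upper bound for all of that book's keys.
const qs = viewQuery({ startkey: ["9781565925809"], endkey: ["9781565925809", {}] });
console.log(qs);

// Key column from the Books and Related Authors result set.
const keys = [
  ["9780596805029", 0], ["9780596805029", 1],
  ["9781565920514", 0], ["9781565920514", 1],
  ["9781565925809", 0], ["9781565925809", 1], ["9781565925809", 1],
];

// Emulates _count with group_level=1: keys are truncated to their first
// element and counted. Each group also contains the book's own [id, 0] row,
// so one is subtracted to get the number of related authors.
function authorCountsAtGroupLevel1(keys) {
  const counts = new Map();
  for (const key of keys) {
    counts.set(key[0], (counts.get(key[0]) || 0) + 1);
  }
  return new Map([...counts].map(([group, n]) => [group, n - 1]));
}

console.log(authorCountsAtGroupLevel1(keys));
```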
  • 75. Cheat Sheet
                        Embedded   Related    List of    Relationship
                        Entities   Documents  Keys       Documents
      One to Many       ✓          ✓
      Many to Many                            ✓          ✓
      <= N* Relations   ✓                     ✓
      > N* Relations               ✓                     ✓
      * where N is a large number for your system
  • 76. http://oreilly.com/catalog/9781449303129/ http://oreilly.com/catalog/9781449303433/
  • 77. Thank You
    @BradleyHolt
    http://bradley-holt.com
    bradley.holt@foundline.com
    Copyright © 2011-2012 Bradley Holt. All rights reserved.