Entity Relationships in a Document Database at ZendCon 2012

Unlike relational databases, document databases like CouchDB and MongoDB do not directly support entity relationships. This talk will explore patterns of modeling one-to-many and many-to-many entity relationships in a document database. These patterns include using an embedded JSON array, relating documents using identifiers, using a list of keys, and using relationship documents. This talk will explore how these entity relationship patterns equate to how entities are joined in a relational database. We'll take a look at the relevant differences between document databases and relational databases. For example, document databases do not have tables, each document can have its own schema, there is no built-in concept of relationships between documents, views/indexes are queried directly instead of being used to optimize more generalized queries, a column within a result set can contain a mix of logical data types, and there is typically no support for transactions across document boundaries.

Presentation Transcript

  • Entity Relationships in a Document Database
    MapReduce Views for SQL Users
  • When to Choose a Document Database
    • You’re using a relational database, but have been relying heavily on denormalization to optimize read performance
    • You would like to give up consistency in exchange for a high level of concurrency
    • Your data model is a “fit” for documents (e.g. a CMS)
  • When Not to Choose a Document Database
    • Your data fits better in a relational model; SQL is a powerful and mature language for working with relational data sets
    • Consistency is critical to your application
    • You haven’t bothered exploring scalability options for your current database
  • Incremental Map/Reduce
    "How fucked is my NoSQL database?" howfuckedismydatabase.com. 2009. http://howfuckedismydatabase.com/nosql/ (24 October 2012).
  • Entity Relationship Model
  • Join vs. Collation
  • SQL Query Joining Publishers and Books
        SELECT `publisher`.`id`, `publisher`.`name`, `book`.`title`
        FROM `publisher`
        FULL OUTER JOIN `book` ON `publisher`.`id` = `book`.`publisher_id`
        ORDER BY `publisher`.`id`, `book`.`title`;
  • Joined Result Set
    Columns publisher.id and publisher.name come from Publisher (“left”); book.title comes from Book (“right”).
    publisher.id | publisher.name | book.title
    oreilly | O'Reilly Media | Building iPhone Apps with HTML, CSS, and JavaScript
    oreilly | O'Reilly Media | CouchDB: The Definitive Guide
    oreilly | O'Reilly Media | DocBook: The Definitive Guide
    oreilly | O'Reilly Media | RESTful Web Services
  • Collated Result Set
    The first row is the Publisher; the remaining rows are its Books.
    key | id | value
    ["oreilly",0] | "oreilly" | "O'Reilly Media"
    ["oreilly",1] | "oreilly" | "Building iPhone Apps with HTML, CSS, and JavaScript"
    ["oreilly",1] | "oreilly" | "CouchDB: The Definitive Guide"
    ["oreilly",1] | "oreilly" | "DocBook: The Definitive Guide"
    ["oreilly",1] | "oreilly" | "RESTful Web Services"
  • View Result Sets
    • Made up of columns and rows
    • Every row has the same three columns: key, id, and value
    • Columns can contain a mixture of logical data types
  • One to Many Relationships
  • Embedded Entities: Nest related entities within a document
  • Embedded Entities
    • A single document represents the “one” entity
    • Nested entities (a JSON array) represent the “many” entities
    • Simplest way to create a one to many relationship
  • Example: Publisher with Nested Books
        {
          "_id": "oreilly",
          "collection": "publisher",
          "name": "O'Reilly Media",
          "books": [
            { "title": "CouchDB: The Definitive Guide" },
            { "title": "RESTful Web Services" },
            { "title": "DocBook: The Definitive Guide" },
            { "title": "Building iPhone Apps with HTML, CSS, and JavaScript" }
          ]
        }
  • Map Function
        function(doc) {
          // Only publisher documents carry the nested books array
          if ("publisher" == doc.collection) {
            // Marker 0 sorts the publisher row ahead of its books
            emit([doc._id, 0], doc.name);
            for (var i in doc.books) {
              // Marker 1 puts every book row under the same publisher key
              emit([doc._id, 1], doc.books[i].title);
            }
          }
        }
  • Result Set
    key | id | value
    ["oreilly",0] | "oreilly" | "O'Reilly Media"
    ["oreilly",1] | "oreilly" | "Building iPhone Apps with HTML, CSS, and JavaScript"
    ["oreilly",1] | "oreilly" | "CouchDB: The Definitive Guide"
    ["oreilly",1] | "oreilly" | "DocBook: The Definitive Guide"
    ["oreilly",1] | "oreilly" | "RESTful Web Services"
  • Limitations
    Only works if there aren’t a large number of related entities:
    • Too many nested entities can result in very large documents
    • Slow to transfer between client and server
    • Unwieldy to modify
    • Time-consuming to index
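
The embedded-entities map function can be tried outside CouchDB with a small harness. The sketch below is only an approximation: `runView`, `compare`, and `compareKeys` are hypothetical helpers rather than CouchDB APIs, `emit` is passed as an argument instead of being a global, and the ordering is a simplified stand-in for CouchDB's view collation.

```javascript
// Hypothetical stand-in for CouchDB's view engine (not a CouchDB API):
// runs a map function over documents and returns the emitted rows in key order.
function runView(docs, map) {
  const rows = [];
  for (const doc of docs) {
    // CouchDB provides emit() as a global; this sketch passes it in explicitly.
    map(doc, (key, value) => rows.push({ key, id: doc._id, value }));
  }
  // Simplified ordering: compare array keys element by element, then by id.
  return rows.sort((a, b) => compareKeys(a.key, b.key) || compare(a.id, b.id));
}

function compare(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

function compareKeys(a, b) {
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    const c = compare(a[i], b[i]);
    if (c !== 0) return c;
  }
  return a.length - b.length;
}

// A shortened version of the publisher document from the slides.
const publisher = {
  _id: "oreilly",
  collection: "publisher",
  name: "O'Reilly Media",
  books: [
    { title: "CouchDB: The Definitive Guide" },
    { title: "RESTful Web Services" },
  ],
};

// Same logic as the slide's map function, with emit passed in and
// for...of used to walk the nested array.
function mapPublisherBooks(doc, emit) {
  if (doc.collection === "publisher") {
    emit([doc._id, 0], doc.name);
    for (const book of doc.books) {
      emit([doc._id, 1], book.title);
    }
  }
}

const rows = runView([publisher], mapPublisherBooks);
for (const row of rows) {
  console.log(JSON.stringify(row.key), row.id, JSON.stringify(row.value));
}
```

The publisher row comes first because its key ends in 0, and every book row shares the publisher's document id because all rows were emitted from the same document.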
  • Related Documents: Reference an entity by its identifier
  • Related Documents
    • A document representing the “one” entity
    • Separate documents for each “many” entity
    • Each “many” entity references its related “one” entity by the “one” entity’s document identifier
    • Makes for smaller documents
    • Reduces the probability of document update conflicts
  • Example: Publisher
        {
          "_id": "oreilly",
          "collection": "publisher",
          "name": "O'Reilly Media"
        }
  • Example: Related Book
        {
          "_id": "9780596155896",
          "collection": "book",
          "title": "CouchDB: The Definitive Guide",
          "publisher": "oreilly"
        }
  • Map Function
        function(doc) {
          if ("publisher" == doc.collection) {
            emit([doc._id, 0], doc.name);
          }
          if ("book" == doc.collection) {
            emit([doc.publisher, 1], doc.title);
          }
        }
  • Result Set
    key | id | value
    ["oreilly",0] | "oreilly" | "O'Reilly Media"
    ["oreilly",1] | "9780596155896" | "CouchDB: The Definitive Guide"
    ["oreilly",1] | "9780596529260" | "RESTful Web Services"
    ["oreilly",1] | "9780596805791" | "Building iPhone Apps with HTML, CSS, and JavaScript"
    ["oreilly",1] | "9781565925809" | "DocBook: The Definitive Guide"
  • Limitations
    • When retrieving the entity on the “right” side of the relationship, one cannot include any data from the entity on the “left” side of the relationship without the use of an additional query
    • Only works for one to many relationships
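
A collated result set is flat, so a client typically folds it back into nested objects itself. The following is a minimal sketch of that step, using the rows from the Related Documents result set; `groupByPublisher` is a hypothetical client-side helper, not part of any CouchDB library.

```javascript
// Rows as shown in the Related Documents result set (key, id, value).
const rows = [
  { key: ["oreilly", 0], id: "oreilly", value: "O'Reilly Media" },
  { key: ["oreilly", 1], id: "9780596155896", value: "CouchDB: The Definitive Guide" },
  { key: ["oreilly", 1], id: "9780596529260", value: "RESTful Web Services" },
  { key: ["oreilly", 1], id: "9780596805791", value: "Building iPhone Apps with HTML, CSS, and JavaScript" },
  { key: ["oreilly", 1], id: "9781565925809", value: "DocBook: The Definitive Guide" },
];

// Hypothetical helper: fold collated rows into publishers with nested books.
// It relies on each publisher row (marker 0) sorting before its books (marker 1).
function groupByPublisher(rows) {
  const publishers = [];
  let current = null;
  for (const { key: [publisherId, marker], id, value } of rows) {
    if (marker === 0) {
      current = { id: publisherId, name: value, books: [] };
      publishers.push(current);
    } else if (current !== null && current.id === publisherId) {
      current.books.push({ id, title: value });
    }
    // Book rows without a preceding publisher row are skipped in this sketch.
  }
  return publishers;
}

const publishers = groupByPublisher(rows);
console.log(JSON.stringify(publishers, null, 2));
```

This is the collation counterpart to reading a joined SQL result set: the relationship is recovered from row order and the shared key prefix rather than from repeated columns.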
  • Many to Many Relationships
  • List of Keys: Reference entities by their identifiers
  • List of Keys
    • A document representing each “many” entity on the “left” side of the relationship
    • Separate documents for each “many” entity on the “right” side of the relationship
    • Each “many” entity on the “right” side of the relationship maintains a list of document identifiers for its related “many” entities on the “left” side of the relationship
  • Books and Related Authors
  • Example: Book
        {
          "_id": "9780596805029",
          "collection": "book",
          "title": "DocBook 5: The Definitive Guide"
        }
  • Example: Book
        {
          "_id": "9781565920514",
          "collection": "book",
          "title": "Making TeX Work"
        }
  • Example: Book
        {
          "_id": "9781565925809",
          "collection": "book",
          "title": "DocBook: The Definitive Guide"
        }
  • Example: Author
        {
          "_id": "muellner",
          "collection": "author",
          "name": "Leonard Muellner",
          "books": [
            "9781565925809"
          ]
        }
  • Example: Author
        {
          "_id": "walsh",
          "collection": "author",
          "name": "Norman Walsh",
          "books": [
            "9780596805029",
            "9781565925809",
            "9781565920514"
          ]
        }
  • Map Function
        function(doc) {
          if ("book" == doc.collection) {
            emit([doc._id, 0], doc.title);
          }
          if ("author" == doc.collection) {
            for (var i in doc.books) {
              emit([doc.books[i], 1], doc.name);
            }
          }
        }
  • Result Set
    key | id | value
    ["9780596805029",0] | "9780596805029" | "DocBook 5: The Definitive Guide"
    ["9780596805029",1] | "walsh" | "Norman Walsh"
    ["9781565920514",0] | "9781565920514" | "Making TeX Work"
    ["9781565920514",1] | "walsh" | "Norman Walsh"
    ["9781565925809",0] | "9781565925809" | "DocBook: The Definitive Guide"
    ["9781565925809",1] | "muellner" | "Leonard Muellner"
    ["9781565925809",1] | "walsh" | "Norman Walsh"
  • Authors and Related Books
  • Map Function
        function(doc) {
          if ("author" == doc.collection) {
            emit([doc._id, 0], doc.name);
            for (var i in doc.books) {
              emit([doc._id, 1], {"_id":doc.books[i]});
            }
          }
        }
  • Result Set
    key | id | value
    ["muellner",0] | "muellner" | "Leonard Muellner"
    ["muellner",1] | "muellner" | {"_id":"9781565925809"}
    ["walsh",0] | "walsh" | "Norman Walsh"
    ["walsh",1] | "walsh" | {"_id":"9780596805029"}
    ["walsh",1] | "walsh" | {"_id":"9781565920514"}
    ["walsh",1] | "walsh" | {"_id":"9781565925809"}
  • Including Docs
    include_docs=true
    key | id | value | doc (truncated)
    ["muellner",0] | "muellner" | … | {"name":"Leonard Muellner"}
    ["muellner",1] | "muellner" | … | {"title":"DocBook: The Definitive Guide"}
    ["walsh",0] | "walsh" | … | {"name":"Norman Walsh"}
    ["walsh",1] | "walsh" | … | {"title":"DocBook 5: The Definitive Guide"}
    ["walsh",1] | "walsh" | … | {"title":"Making TeX Work"}
    ["walsh",1] | "walsh" | … | {"title":"DocBook: The Definitive Guide"}
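
The Including Docs slide depends on how `include_docs=true` chooses which document to attach: when the emitted value is an object with an `_id` field, CouchDB attaches that linked document instead of the document that emitted the row. The sketch below imitates that lookup in memory; `includeDocs` and `docsById` are hypothetical names, and the rows are a subset of the Authors and Related Books result set.

```javascript
// In-memory stand-in for the database, keyed by document id.
const docsById = new Map([
  ["muellner", { _id: "muellner", collection: "author", name: "Leonard Muellner" }],
  ["walsh", { _id: "walsh", collection: "author", name: "Norman Walsh" }],
  ["9781565925809", { _id: "9781565925809", collection: "book", title: "DocBook: The Definitive Guide" }],
  ["9781565920514", { _id: "9781565920514", collection: "book", title: "Making TeX Work" }],
]);

// A subset of the rows from the Authors and Related Books result set.
const rows = [
  { key: ["muellner", 0], id: "muellner", value: "Leonard Muellner" },
  { key: ["muellner", 1], id: "muellner", value: { _id: "9781565925809" } },
  { key: ["walsh", 0], id: "walsh", value: "Norman Walsh" },
  { key: ["walsh", 1], id: "walsh", value: { _id: "9781565920514" } },
];

// Hypothetical helper imitating include_docs=true: use value._id when the
// value is an object that has one, otherwise fall back to the row's own id.
function includeDocs(rows, docsById) {
  return rows.map((row) => {
    const linked =
      row.value !== null && typeof row.value === "object" && "_id" in row.value;
    const docId = linked ? row.value._id : row.id;
    return { ...row, doc: docsById.get(docId) || null };
  });
}

const withDocs = includeDocs(rows, docsById);
for (const row of withDocs) {
  console.log(JSON.stringify(row.key), row.doc.name || row.doc.title);
}
```

This is why the map function emits `{"_id": ...}` rather than the bare identifier: the object form is what lets a single query return both the author and the full book documents.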
  • Or, we can reverse the references…
  • Example: Author
        {
          "_id": "muellner",
          "collection": "author",
          "name": "Leonard Muellner"
        }
  • Example: Author
        {
          "_id": "walsh",
          "collection": "author",
          "name": "Norman Walsh"
        }
  • Example: Book
        {
          "_id": "9780596805029",
          "collection": "book",
          "title": "DocBook 5: The Definitive Guide",
          "authors": [
            "walsh"
          ]
        }
  • Example: Book
        {
          "_id": "9781565920514",
          "collection": "book",
          "title": "Making TeX Work",
          "authors": [
            "walsh"
          ]
        }
  • Example: Book
        {
          "_id": "9781565925809",
          "collection": "book",
          "title": "DocBook: The Definitive Guide",
          "authors": [
            "muellner",
            "walsh"
          ]
        }
  • Map Function
        function(doc) {
          if ("author" == doc.collection) {
            emit([doc._id, 0], doc.name);
          }
          if ("book" == doc.collection) {
            for (var i in doc.authors) {
              emit([doc.authors[i], 1], doc.title);
            }
          }
        }
  • Result Set
    key | id | value
    ["muellner",0] | "muellner" | "Leonard Muellner"
    ["muellner",1] | "9781565925809" | "DocBook: The Definitive Guide"
    ["walsh",0] | "walsh" | "Norman Walsh"
    ["walsh",1] | "9780596805029" | "DocBook 5: The Definitive Guide"
    ["walsh",1] | "9781565920514" | "Making TeX Work"
    ["walsh",1] | "9781565925809" | "DocBook: The Definitive Guide"
  • Limitations
    • Queries from the “right” side of the relationship cannot include any data from entities on the “left” side of the relationship (without the use of include_docs)
    • A document representing an entity with lots of relationships could become quite large
  • Relationship Documents: Create a document to represent each individual relationship
  • Relationship Documents
    • A document representing each “many” entity on the “left” side of the relationship
    • Separate documents for each “many” entity on the “right” side of the relationship
    • Neither the “left” nor the “right” side of the relationship contains any direct references to the other
    • For each distinct relationship, a separate document includes the document identifiers for both the “left” and “right” sides of the relationship
  • Example: Book
        {
          "_id": "9780596805029",
          "collection": "book",
          "title": "DocBook 5: The Definitive Guide"
        }
  • Example: Book
        {
          "_id": "9781565920514",
          "collection": "book",
          "title": "Making TeX Work"
        }
  • Example: Book
        {
          "_id": "9781565925809",
          "collection": "book",
          "title": "DocBook: The Definitive Guide"
        }
  • Example: Author
        {
          "_id": "muellner",
          "collection": "author",
          "name": "Leonard Muellner"
        }
  • Example: Author
        {
          "_id": "walsh",
          "collection": "author",
          "name": "Norman Walsh"
        }
  • Example: Relationship Document
        {
          "_id": "44005f2c",
          "collection": "book-author",
          "book": "9780596805029",
          "author": "walsh"
        }
  • Example: Relationship Document
        {
          "_id": "44005f72",
          "collection": "book-author",
          "book": "9781565920514",
          "author": "walsh"
        }
  • Example: Relationship Document
        {
          "_id": "44006720",
          "collection": "book-author",
          "book": "9781565925809",
          "author": "muellner"
        }
  • Example: Relationship Document
        {
          "_id": "44006b0d",
          "collection": "book-author",
          "book": "9781565925809",
          "author": "walsh"
        }
  • Books and Related Authors
  • Map Function
        function(doc) {
          if ("book" == doc.collection) {
            emit([doc._id, 0], doc.title);
          }
          if ("book-author" == doc.collection) {
            emit([doc.book, 1], {"_id":doc.author});
          }
        }
  • Result Set
    key | id | value
    ["9780596805029",0] | "9780596805029" | "DocBook 5: The Definitive Guide"
    ["9780596805029",1] | "44005f2c" | {"_id":"walsh"}
    ["9781565920514",0] | "9781565920514" | "Making TeX Work"
    ["9781565920514",1] | "44005f72" | {"_id":"walsh"}
    ["9781565925809",0] | "9781565925809" | "DocBook: The Definitive Guide"
    ["9781565925809",1] | "44006720" | {"_id":"muellner"}
    ["9781565925809",1] | "44006b0d" | {"_id":"walsh"}
  • Including Docs
    include_docs=true
    key | id | value | doc (truncated)
    ["9780596805029",0] | … | … | {"title":"DocBook 5: The Definitive Guide"}
    ["9780596805029",1] | … | … | {"name":"Norman Walsh"}
    ["9781565920514",0] | … | … | {"title":"Making TeX Work"}
    ["9781565920514",1] | … | … | {"name":"Norman Walsh"}
    ["9781565925809",0] | … | … | {"title":"DocBook: The Definitive Guide"}
    ["9781565925809",1] | … | … | {"name":"Leonard Muellner"}
    ["9781565925809",1] | … | … | {"name":"Norman Walsh"}
  • Authors and Related Books
  • Map Function
        function(doc) {
          if ("author" == doc.collection) {
            emit([doc._id, 0], doc.name);
          }
          if ("book-author" == doc.collection) {
            emit([doc.author, 1], {"_id":doc.book});
          }
        }
  • Result Set
    key | id | value
    ["muellner",0] | "muellner" | "Leonard Muellner"
    ["muellner",1] | "44006720" | {"_id":"9781565925809"}
    ["walsh",0] | "walsh" | "Norman Walsh"
    ["walsh",1] | "44005f2c" | {"_id":"9780596805029"}
    ["walsh",1] | "44005f72" | {"_id":"9781565920514"}
    ["walsh",1] | "44006b0d" | {"_id":"9781565925809"}
  • Including Docs
    include_docs=true
    key | id | value | doc (truncated)
    ["muellner",0] | … | … | {"name":"Leonard Muellner"}
    ["muellner",1] | … | … | {"title":"DocBook: The Definitive Guide"}
    ["walsh",0] | … | … | {"name":"Norman Walsh"}
    ["walsh",1] | … | … | {"title":"DocBook 5: The Definitive Guide"}
    ["walsh",1] | … | … | {"title":"Making TeX Work"}
    ["walsh",1] | … | … | {"title":"DocBook: The Definitive Guide"}
  • Limitations
    • Queries can only contain data from the “left” or “right” side of the relationship (without the use of include_docs)
    • Maintaining relationship documents may require more work
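
With relationship documents, the same stored data serves both directions of the relationship; only the map function changes. The sketch below runs both map functions from the slides over one document set. `collect` is a hypothetical in-memory harness with a simplified ordering (key parts, then document id), not CouchDB's actual view engine.

```javascript
// The book, author, and relationship documents from the slides.
const docs = [
  { _id: "9780596805029", collection: "book", title: "DocBook 5: The Definitive Guide" },
  { _id: "9781565920514", collection: "book", title: "Making TeX Work" },
  { _id: "9781565925809", collection: "book", title: "DocBook: The Definitive Guide" },
  { _id: "muellner", collection: "author", name: "Leonard Muellner" },
  { _id: "walsh", collection: "author", name: "Norman Walsh" },
  { _id: "44005f2c", collection: "book-author", book: "9780596805029", author: "walsh" },
  { _id: "44005f72", collection: "book-author", book: "9781565920514", author: "walsh" },
  { _id: "44006720", collection: "book-author", book: "9781565925809", author: "muellner" },
  { _id: "44006b0d", collection: "book-author", book: "9781565925809", author: "walsh" },
];

// Hypothetical harness: gather emitted rows, then order them by the two key
// parts and the document id. Assumes every key has the shape [string, number].
function collect(docs, map) {
  const rows = [];
  for (const doc of docs) {
    map(doc, (key, value) => rows.push({ key, id: doc._id, value }));
  }
  const cmp = (a, b) => (a < b ? -1 : a > b ? 1 : 0);
  return rows.sort(
    (a, b) => cmp(a.key[0], b.key[0]) || cmp(a.key[1], b.key[1]) || cmp(a.id, b.id)
  );
}

// Books and related authors (emit passed in rather than global).
function booksWithAuthors(doc, emit) {
  if (doc.collection === "book") emit([doc._id, 0], doc.title);
  if (doc.collection === "book-author") emit([doc.book, 1], { _id: doc.author });
}

// Authors and related books: same documents, opposite direction.
function authorsWithBooks(doc, emit) {
  if (doc.collection === "author") emit([doc._id, 0], doc.name);
  if (doc.collection === "book-author") emit([doc.author, 1], { _id: doc.book });
}

const byBook = collect(docs, booksWithAuthors);
const byAuthor = collect(docs, authorsWithBooks);
console.log(byBook.map((row) => JSON.stringify(row.key) + " " + row.id));
console.log(byAuthor.map((row) => JSON.stringify(row.key) + " " + row.id));
```

Neither the book nor the author documents change when a relationship is added or removed, which is the property that keeps them small and reduces update conflicts.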
  • Doctrine’s Object-Document Mapper (ODM)
  • Doctrine CouchDB [1]
    1. http://docs.doctrine-project.org/projects/doctrine-couchdb/
  • Features
    • Includes a CouchDB client library and ODM
    • Maps documents using Doctrine’s persistence semantics
    • Maps CouchDB views to PHP objects
    • Document conflict resolution support
    • Includes a write-behind feature for increased performance
  • Defining an Entity [1]
        /** @Document */
        class BlogPost
        {
            /** @Id */
            private $id;
            /** @Field(type="string") */
            private $headline;
            /** @Field(type="string") */
            private $text;
            /** @Field(type="datetime") */
            private $publishDate;
            // getter/setter here
        }
    1. http://docs.doctrine-project.org/projects/doctrine-couchdb/en/latest/reference/introduction.html#architecture
  • Persisting an Entity [1]
        $blogPost = new BlogPost();
        $blogPost->setHeadline("Hello World!");
        $blogPost->setText("This is a blog post going to be saved into CouchDB");
        $blogPost->setPublishDate(new DateTime("now"));
        $dm->persist($blogPost);
        $dm->flush();
    1. http://docs.doctrine-project.org/projects/doctrine-couchdb/en/latest/reference/introduction.html#architecture
  • Querying an Entity [1]
        // $dm is an instance of Doctrine\ODM\CouchDB\DocumentManager
        $blogPost = $dm->find("MyApp\Document\BlogPost", $theUUID);
    1. http://docs.doctrine-project.org/projects/doctrine-couchdb/en/latest/reference/introduction.html#querying
  • Doctrine MongoDB ODM [1]
    1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/
  • Features
    • Maps documents using Doctrine’s persistence semantics
    • Maps embedded documents
    • Maps referenced documents
    • Uses batch inserts
    • Performs atomic updates
  • Defining Entities [1]
        /** @MappedSuperclass */
        abstract class BaseEmployee
        {
            /** @Id */
            private $id;
            /** @EmbedOne(targetDocument="Address") */
            private $address;
            // ...
        }
    1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview
  • Defining Entities [1]
        /** @Document */
        class Employee extends BaseEmployee
        {
            /** @ReferenceOne(targetDocument="Documents\Manager") */
            private $manager;
            // ...
        }
    1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview
  • Defining Entities [1]
        /** @Document */
        class Manager extends BaseEmployee
        {
            /** @ReferenceMany(targetDocument="Documents\Project") */
            private $projects = array();
            // ...
        }
    1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview
  • Defining Entities [1]
        /** @EmbeddedDocument */
        class Address
        {
            /** @String */
            private $address;
            /** @String */
            private $city;
            /** @String */
            private $state;
            /** @String */
            private $zipcode;
            // ...
        }
    1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview
  • Defining Entities [1]
        /** @Document */
        class Project
        {
            /** @Id */
            private $id;
            /** @String */
            private $name;
            public function __construct($name)
            {
                $this->name = $name;
            }
            // ...
        }
    1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview
  • Persisting Entities [1]
        $employee = new Employee();
        $employee->setName('Employee');
        $employee->setSalary(50000.00);
        $employee->setStarted(new DateTime());
    1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview
  • Persisting Entities [1]
        $address = new Address();
        $address->setAddress('555 Doctrine Rd.');
        $address->setCity('Nashville');
        $address->setState('TN');
        $address->setZipcode('37209');
        $employee->setAddress($address);
    1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview
  • Persisting Entities [1]
        $project = new Project('New Project');
        $manager = new Manager();
        $manager->setName('Manager');
        $manager->setSalary(100000.00);
        $manager->setStarted(new DateTime());
        $manager->addProject($project);
    1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview
  • Persisting Entities [1]
        // $dm is an instance of Doctrine\ODM\MongoDB\DocumentManager
        $dm->persist($employee);
        $dm->persist($address);
        $dm->persist($project);
        $dm->persist($manager);
        $dm->flush();
    1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview
  • Querying an Entity
        // $dm is an instance of Doctrine\ODM\MongoDB\DocumentManager
        $manager = $dm->find("Documents\Manager", $theID);
  • Final Thoughts
  • Document Databases Compared to Relational Databases
    • Document databases have no tables (and therefore no columns)
    • Indexes (views) are queried directly, instead of being used to optimize more generalized queries
    • Result set columns can contain a mix of logical data types
    • No built-in concept of relationships between documents
    • Related entities can be embedded in a document, referenced from a document, or both
  • Caveats
    • No referential integrity
    • No atomic transactions across document boundaries
    • Some patterns may involve denormalized (i.e. redundant) data
    • Data inconsistencies are inevitable (i.e. eventual consistency)
    • Consider the implications of replication: what may seem consistent with one database may not be consistent across nodes (e.g. referencing entities that don’t yet exist on the node)
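
Because the database does not enforce referential integrity, an application that needs it has to check references itself. As a rough illustration, the sketch below scans a set of documents for relationship documents that point at identifiers missing from that set, as might happen on a node where the author document has not yet been replicated. `findDanglingRelationships` is a hypothetical helper, and the document set is a deliberately incomplete subset of the slides' examples.

```javascript
// Documents visible on one node; the "muellner" author document is
// deliberately missing to mimic replication that has not caught up.
const docs = [
  { _id: "9781565925809", collection: "book", title: "DocBook: The Definitive Guide" },
  { _id: "walsh", collection: "author", name: "Norman Walsh" },
  { _id: "44006720", collection: "book-author", book: "9781565925809", author: "muellner" },
  { _id: "44006b0d", collection: "book-author", book: "9781565925809", author: "walsh" },
];

// Hypothetical helper: report relationship documents whose book or author
// identifier does not match any document in the given set.
function findDanglingRelationships(docs) {
  const ids = new Set(docs.map((doc) => doc._id));
  return docs
    .filter((doc) => doc.collection === "book-author")
    .filter((doc) => !ids.has(doc.book) || !ids.has(doc.author))
    .map((doc) => ({
      relationship: doc._id,
      missing: [doc.book, doc.author].filter((id) => !ids.has(id)),
    }));
}

const dangling = findDanglingRelationships(docs);
console.log(JSON.stringify(dangling));
```

A check like this only describes one node at one moment; under eventual consistency the missing document may arrive later, so deleting a dangling relationship outright is not necessarily the right response.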
  • Additional Techniques
    • Use the startkey and endkey parameters to retrieve one entity and its related entities: startkey=["9781565925809"]&endkey=["9781565925809",{}]
    • Define a reduce function and use grouping levels
    • Use UUIDs rather than natural keys for better performance
    • Use the bulk document API when writing Relationship Documents
    • When using the List of Keys or Relationship Documents patterns, denormalize data so that you can have data from the “right” and “left” side of the relationship within your query results
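
The startkey/endkey range works because of how CouchDB orders keys: a shorter array sorts before a longer array that extends it, and objects sort after every other type, so `["9781565925809"]` and `["9781565925809",{}]` bracket every row for that book. The sketch below imitates this in memory using the Books and Related Authors rows from the List of Keys pattern. `collate`, `range`, and `countByGroupLevel` are hypothetical helpers; the string comparison is plain JavaScript ordering rather than CouchDB's ICU collation, and the counting helper only approximates what a `_count` reduce with `group_level` would return.

```javascript
// Type ordering used by CouchDB view collation:
// null < false < true < numbers < strings < arrays < objects.
function typeRank(v) {
  if (v === null) return 0;
  if (v === false) return 1;
  if (v === true) return 2;
  if (typeof v === "number") return 3;
  if (typeof v === "string") return 4;
  if (Array.isArray(v)) return 5;
  return 6; // objects sort last, which is why {} works as an upper bound
}

// Simplified collation: real CouchDB compares strings with ICU rules and
// also orders objects among themselves; this sketch treats objects as equal.
function collate(a, b) {
  const ra = typeRank(a);
  const rb = typeRank(b);
  if (ra !== rb) return ra - rb;
  if (Array.isArray(a)) {
    for (let i = 0; i < Math.min(a.length, b.length); i++) {
      const c = collate(a[i], b[i]);
      if (c !== 0) return c;
    }
    return a.length - b.length;
  }
  if (ra === 6) return 0;
  return a < b ? -1 : a > b ? 1 : 0;
}

// Keep rows whose key falls within [startkey, endkey], both inclusive.
function range(rows, startkey, endkey) {
  return rows.filter(
    (row) => collate(row.key, startkey) >= 0 && collate(row.key, endkey) <= 0
  );
}

// Count rows per key prefix, in the spirit of a _count reduce with group_level.
function countByGroupLevel(rows, level) {
  const counts = new Map();
  for (const row of rows) {
    const group = JSON.stringify(row.key.slice(0, level));
    counts.set(group, (counts.get(group) || 0) + 1);
  }
  return [...counts].map(([group, value]) => ({ key: JSON.parse(group), value }));
}

const rows = [
  { key: ["9780596805029", 0], id: "9780596805029", value: "DocBook 5: The Definitive Guide" },
  { key: ["9780596805029", 1], id: "walsh", value: "Norman Walsh" },
  { key: ["9781565920514", 0], id: "9781565920514", value: "Making TeX Work" },
  { key: ["9781565920514", 1], id: "walsh", value: "Norman Walsh" },
  { key: ["9781565925809", 0], id: "9781565925809", value: "DocBook: The Definitive Guide" },
  { key: ["9781565925809", 1], id: "muellner", value: "Leonard Muellner" },
  { key: ["9781565925809", 1], id: "walsh", value: "Norman Walsh" },
];

const oneBook = range(rows, ["9781565925809"], ["9781565925809", {}]);
const perBook = countByGroupLevel(rows, 1);
console.log(oneBook.map((row) => row.value));
console.log(JSON.stringify(perBook));
```

Each count from `countByGroupLevel` includes the book's own row, so a book with two authors reports three rows.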
  • Cheat Sheet
    (rows: relationship type or size; columns: pattern)
                     | Embedded Entities | Related Documents | List of Keys | Relationship Documents
    One to Many      | ✓ | ✓ |   |
    Many to Many     |   |   | ✓ | ✓
    <= N* Relations  | ✓ |   | ✓ |
    > N* Relations   |   | ✓ |   | ✓
    * where N is a large number for your system
  • http://oreilly.com/catalog/9781449303129/
    http://oreilly.com/catalog/9781449303433/
  • Thank You
    @BradleyHolt
    http://bradley-holt.com
    https://joind.in/7040
    Copyright © 2011-2012 Bradley Holt. All rights reserved.