2010 11-02-documents

  • Prague's Dancing House
  • Bill Karwin has written about this on his blog and in his book “SQL Antipatterns”.
  • Eric Evans has written the classic text on DDD, and performs DDD immersion classes regularly.
  • Transaction scripts do not need to be strictly procedural; the “pattern” can also apply to OOP code using such patterns as Strategy, Visitor, etc.
  • Also mention Azure Tables
  • Use AOP-like practices such as Signal/Slot, Subject/Observer, etc., to help automate this.

    1. Exploring Document Databases: Documents, Documents, Documents. ZendCon 2010. Matthew Weier O'Phinney, Project Lead, Zend Framework
    2. Writing the typical PHP app
    3. design the schema (http://musicbrainz.org/)
    4. take input, and shove it in a DB (http://musicbrainz.org/)
    5. write queries to pull from the DB
        $result = mysql_query(
            "SELECT * FROM sometable"
        );
        $rows = false;
        if (mysql_num_rows($result) > 0) {
            $rows = array();
            while ($row = mysql_fetch_assoc($result)) {
                $rows[] = $row;
            }
        }
    6. spit data onto a page (http://www.irs.gov/)
    7. Profit!
    8. Things That Happen
    9. Things That Happen (go wrong)
    10. SQL Injection
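One common guard against SQL injection is to bind user input as a parameter instead of concatenating it into the query string. The following is a minimal sketch, not taken from the slides: it assumes the `pdo_sqlite` extension is available, and the in-memory database, the `name` column, and the `findByName()` helper are hypothetical stand-ins for a real schema.

```php
<?php
// Hypothetical in-memory SQLite table standing in for "sometable".
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE sometable (id INTEGER PRIMARY KEY, name TEXT)');
$pdo->exec("INSERT INTO sometable (name) VALUES ('alice')");
$pdo->exec("INSERT INTO sometable (name) VALUES ('bob')");

// Hypothetical helper: the value is bound as a parameter, so it is
// treated as data and never parsed as part of the SQL statement.
function findByName(PDO $pdo, string $name): array
{
    $stmt = $pdo->prepare('SELECT id, name FROM sometable WHERE name = :name');
    $stmt->execute([':name' => $name]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

$rows   = findByName($pdo, 'alice');
// A classic injection attempt is just an unusual name that matches nothing.
$attack = findByName($pdo, "' OR '1'='1");
```

With string concatenation the second call could match every row; with a bound parameter it matches none.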
    11. performance issues
        ➔ Expensive queries
        ➔ Potentially ORM induced resource issues
    12. Design Issues
    13. design issues?
        ➔ Many 1:1 or 1:N relationships
        ➔ Non-trivial insert/update operations
        ➔ Harder to hit indexes on read operations
        ➔ For trivial stuff like tags, addresses, etc!
    14. design issues?
        ➔ Worse: Changing requirements
        ➔ Additional columns needed?
        ➔ Additional tables needed?
        ➔ Occasional data needed?
    15. design issues?
        ➔ Entity-Attribute-Value Anti-Pattern
            ➔ Often added after-the-fact, as requirements change or expand
            ➔ Support arbitrary data for any record of any table
            ➔ Can't do type enforcement
            ➔ Leads to complex joins that often cannot hit table indexes
            ➔ Leads to complex application logic to support retrieval and insertion of such metadata
    16. Design First
    17. Use Domain-Driven Design (DDD), or Behavior-Driven Development (BDD)
    18. the primary rule: Develop your application logic first, in order to determine what needs to be persisted.
    19. define your application entities
    20. use Plain Old PHP Objects
        class User
        {
            public function getId() {}
            public function setId($value) {}
            public function getRealname() {}
            public function setRealname($value) {}
            public function getEmail() {}
            public function setEmail($value) {}
        }
    21. write tests
        class PostTest extends PHPUnit_Framework_TestCase
        {
            public function setUp()
            {
                // Create the object under test before each test method runs
                $this->post = new Post();
            }

            public function testRaisesExceptionOnInvalidDate()
            {
                $this->setExpectedException(
                    'InvalidArgumentException');
                $this->post->setDate('foo bar');
            }
        }
    22. implement behaviors
        class Post
        {
            private $date;
            private $timezone = 'America/New_York';

            public function setDate($date)
            {
                // Reject anything strtotime() cannot parse
                if (false === strtotime($date)) {
                    throw new InvalidArgumentException();
                }
                // DateTime expects a DateTimeZone object, not a string
                $this->date = new DateTime(
                    $date, new DateTimeZone($this->timezone));
                return $this;
            }
        }
    23. NOW determine what data you need to persist.
    24. Define a schema based on the objects you use (http://musicbrainz.org/)
    25. map entities to data store
        public function fromArray(array $data)
        {
            $filter = new OptionsFilter();
            foreach ($data as $key => $value) {
                // Translate each array key to a setter name and call it if present
                $method = 'set' . $filter($key);
                if (method_exists($this, $method)) {
                    $this->$method($value);
                }
            }
        }

        public function toArray()
        {
            return array(
                '_id' => $this->getId(),
                // ...
            );
        }
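The mapping on slide 25 is cut off and relies on an `OptionsFilter` class the slides do not show. Here is a self-contained sketch of the same idea, assuming a simple key-to-setter convention (`_id` becomes `setId`, `realname` becomes `setRealname`); the inline key normalization is a hypothetical replacement for `OptionsFilter`, and the property names follow the `User` class from slide 20.

```php
<?php
class User
{
    private $id;
    private $realname;
    private $email;

    public function getId() { return $this->id; }
    public function setId($value) { $this->id = $value; return $this; }
    public function getRealname() { return $this->realname; }
    public function setRealname($value) { $this->realname = $value; return $this; }
    public function getEmail() { return $this->email; }
    public function setEmail($value) { $this->email = $value; return $this; }

    // Populate the object from a document; keys without a setter are ignored.
    public function fromArray(array $data)
    {
        foreach ($data as $key => $value) {
            // Hypothetical normalization: "_id" -> "Id", "real_name" -> "RealName"
            $name   = str_replace(' ', '', ucwords(str_replace('_', ' ', $key)));
            $method = 'set' . $name;
            if (method_exists($this, $method)) {
                $this->$method($value);
            }
        }
        return $this;
    }

    // Export only the fields that should be persisted.
    public function toArray()
    {
        return array(
            '_id'      => $this->getId(),
            'realname' => $this->getRealname(),
            'email'    => $this->getEmail(),
        );
    }
}

$user = new User();
$user->fromArray(array(
    '_id'      => 'weierophinney',
    'realname' => "Matthew Weier O'Phinney",
    'email'    => 'matthew@zend.com',
    'roles'    => array('admin', 'user'), // no setRoles(), so this is skipped
));
```

Silently skipping unknown keys keeps the entity tolerant of extra document fields, at the cost of hiding typos in key names.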
    26. approaches
        ➔ Transaction Scripts
        ➔ Object Relational Maps (ORM)
    27. use mappers or transaction scripts to translate objects to data & back
        $user = new User();
        $user->setId('matthew')
             ->setRealname("Matthew Weier O'Phinney");
        $mapper->save($user);

        $user = $repository->find('matthew');
    28. additions
        ➔ Service Layers
            ➔ Interacts with domain entities
            ➔ Good place for caching, ACLs, etc.
    29. use service objects to manipulate entities
        namespace BlogService;

        class Entries
        {
            public function fetchEntry($permalink) {}
            public function fetchCommentCount(
                $permalink) {}
            public function fetchComments($permalink) {}
            public function fetchTrackbacks($permalink) {}
            public function addComment($permalink,
                array $comment) {}
            public function addTrackback($permalink,
                array $comment) {}
            public function fetchTagCloud() {}
        }
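To illustrate why a service layer is a good place for caching, here is a minimal sketch of `fetchEntry()` backed by a per-request cache. The `EntryRepository` interface and `ArrayEntryRepository` class are hypothetical; the slides do not define how the service reaches the data store, and the `BlogService` namespace is omitted to keep the example in one file.

```php
<?php
// Hypothetical storage contract used by the service.
interface EntryRepository
{
    public function findByPermalink(string $permalink): ?array;
}

// Hypothetical array-backed repository that counts lookups.
class ArrayEntryRepository implements EntryRepository
{
    public $lookups = 0;
    private $entries;

    public function __construct(array $entries)
    {
        $this->entries = $entries;
    }

    public function findByPermalink(string $permalink): ?array
    {
        $this->lookups++;
        return $this->entries[$permalink] ?? null;
    }
}

class Entries
{
    private $repository;
    private $cache = array();

    public function __construct(EntryRepository $repository)
    {
        $this->repository = $repository;
    }

    // Callers only see the service; the cache is an internal detail.
    public function fetchEntry($permalink)
    {
        if (!array_key_exists($permalink, $this->cache)) {
            $this->cache[$permalink] = $this->repository->findByPermalink($permalink);
        }
        return $this->cache[$permalink];
    }
}

$repository = new ArrayEntryRepository(array(
    'blog-post-stub' => array('_id' => 'blog-post-stub', 'published' => true),
));
$service = new Entries($repository);
$first   = $service->fetchEntry('blog-post-stub');
$second  = $service->fetchEntry('blog-post-stub'); // served from the cache
```

Because the entities and the repository know nothing about the cache, it could be swapped for a shared cache or removed without touching either.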
    30. Data Persistence
    31. you have a choice
        ➔ Before, relational databases were the only choice
    32. you have a choice
        ➔ Today, relational databases are only one choice
    33. have your domain dictate storage
        ➔ Do you have many arbitrary, row-specific fields in the design?
        ➔ Do you need many pivot tables to describe a single entity?
        ➔ Is transactional integrity part of your requirements?
        ➔ Do changes need to be immediately available?
    34. ➔ defining by what it isn't?
    35. ➔ still defining by what it isn't …
    36. types: key/value stores
        ➔ each record is a key/value pair (though the value may be non-scalar)
        ➔ Interesting, but not what we're going to look at today.
    37. types: document databases
        ➔ Each document can define its own structure
        ➔ Typically a document consists of many key/value pairs
        ➔ This is what we'll look at!
        {
            _id: "weierophinney",
            realname: "Matthew Weier O'Phinney",
            email: "matthew@zend.com",
            roles: [ "admin", "user" ]
        }
    38. document dbs are plentiful
    39. document dbs solve web problems
        ➔ Data can expand and add properties over time without requiring schema changes!
        ➔ Different content types can co-exist in the same general storage
    40. document dbs solve web problems
        ➔ Aggregate related content in the document that owns it
            ➔ Tags
            ➔ Comments
            ➔ Addresses
        ➔ Eventual consistency
            ➔ Updates often don't need to propagate in real-time
    41. types of problems documents solve
        ➔ Blog and News Posts
        ➔ Product Entries
        ➔ Content Management documents
        ➔ … what don't they solve?
    42. identifiers are king
        ➔ Most are optimized for fetching via identifier
        ➔ Provide your own IDs
        ➔ Fallback on system (usually UUID)
    43. mapping documents to objects
        ➔ Many utilize JSON
        ➔ If they don't, abstractions let you sling PHP arrays
        $result = $cxn->fetch($id);
        $user = new User();
        $user->fromArray($result);

        $cxn->save($user->toArray());
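The `$cxn->fetch()` and `$cxn->save()` calls on slide 43 belong to no specific driver. As a rough sketch of the contract they imply, the hypothetical `InMemoryDocumentStore` below keeps documents as JSON strings keyed by `_id` and falls back to a generated identifier when the caller supplies none. The generated value is a random 128-bit hex string standing in for a UUID, not a standards-compliant one.

```php
<?php
// Hypothetical stand-in for a document database connection.
class InMemoryDocumentStore
{
    private $documents = array();

    // Store the document as JSON; return the identifier that was used.
    public function save(array $document): string
    {
        if (!isset($document['_id'])) {
            // Fallback ID: 16 random bytes rendered as 32 hex characters.
            $document['_id'] = bin2hex(random_bytes(16));
        }
        $this->documents[$document['_id']] = json_encode($document);
        return $document['_id'];
    }

    // Fetch by identifier and decode back to a plain PHP array.
    public function fetch(string $id): ?array
    {
        if (!isset($this->documents[$id])) {
            return null;
        }
        return json_decode($this->documents[$id], true);
    }
}

$cxn = new InMemoryDocumentStore();

// Caller-provided identifier.
$ownId = $cxn->save(array(
    '_id'   => 'weierophinney',
    'email' => 'matthew@zend.com',
    'roles' => array('admin', 'user'),
));

// No identifier given: the store generates one.
$generatedId = $cxn->save(array('email' => 'someone@example.com'));

$result = $cxn->fetch('weierophinney');
```

The array returned by `fetch()` can be handed straight to an entity's `fromArray()`, which is the round trip the slide describes.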
    44. aggregate metadata
        ➔ Instead of EAV tables, store metadata in the document
        {
            "_id" : "blog-post-stub",
            "published" : true,
            "reviewed" : true,
            "reviewed_by" : "matthew"
        }
    45. no pivot tables required!
        ➔ Instead of pivot tables, aggregate data inside values
        {
            "_id" : "blog-post-stub",
            "tags" : [
                "zend framework",
                "presentations"
            ]
        }
    46. It's not all walks in the park
    47. increased disk usage
        ➔ Each document contains its schema
        ➔ Silver lining: most solutions can cluster and/or provide sharding.
    48. schema differences
        ➔ How do you keep schemas in sync between documents when requirements change?
        ➔ If you have multiple schemas for the same document type, what do you query on?
        ➔ “firstName” or “FIRST_NAME”?
        ➔ Did you remember to create new indexes?
    49. managing schema changes
        ➔ Handle the differences in your application code
        switch ($user->schema_version) {
            case '2010-01-31':
                // ...
                break;
            case '2010-11-02':
                // ...
                break;
        }
        Meh.
    50. managing schema changes
        ➔ Do a batch conversion
            ➔ Copy all records to a new database or collection
            ➔ Migrate all records to the new schema
            ➔ Point your application to the new database/collection
        Meh.
    51. managing schema changes
        ➔ Version the document schema
        ➔ Update when fetched
        {
            "_id" : "blog-post-stub",
            "schema_version" : "2010-11-02"
        }

        if ($post->schema_version != $latest) {
            $post->metadata = $post->METADATA;
            $post->schema_version = $latest;
            unset($post->METADATA);
            $mapper->save($post);
        }
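The update-when-fetched approach handles one version jump; once several schema versions exist, the same idea can be chained. This sketch is an extension, not something shown on the slides: it assumes each migration step is a closure keyed by the version it upgrades from, and the `upgradeDocument()` function and `$migrations` table are hypothetical names. The version strings and the `METADATA` rename come from slides 49 and 51.

```php
<?php
// Hypothetical migration table: "from" version => closure returning the upgraded document.
$migrations = array(
    '2010-01-31' => function (array $doc) {
        // Same rename as on the slide: METADATA becomes metadata.
        $doc['metadata'] = $doc['METADATA'] ?? array();
        unset($doc['METADATA']);
        $doc['schema_version'] = '2010-11-02';
        return $doc;
    },
);

// Apply steps until the document reaches the latest version.
// Each step must advance schema_version, otherwise this loops forever.
function upgradeDocument(array $doc, array $migrations, string $latest): array
{
    while ($doc['schema_version'] !== $latest) {
        $version = $doc['schema_version'];
        if (!isset($migrations[$version])) {
            throw new RuntimeException("No migration from schema version $version");
        }
        $doc = $migrations[$version]($doc);
    }
    return $doc;
}

$latest = '2010-11-02';

$old = array(
    '_id'            => 'blog-post-stub',
    'schema_version' => '2010-01-31',
    'METADATA'       => array('reviewed' => true),
);
$upgraded = upgradeDocument($old, $migrations, $latest);

// A document already at the latest version passes through unchanged.
$current   = array('_id' => 'other-post', 'schema_version' => '2010-11-02');
$untouched = upgradeDocument($current, $migrations, $latest);
```

After upgrading, the caller would save the document back, as the slide does with `$mapper->save($post)`, so each document is migrated at most once.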
    52. benefits you may enjoy
        ➔ Easier mapping of document concepts to data persistence
        ➔ Easier scaling
            ➔ Most support clustering and sharding natively
        ➔ Easier migration to cloud-based storage
    53. Closing Notes
    54. ➔ Don't start your development from the wrong end. Start with objects.
        ➔ Be aware of all the options you have for persisting data; choose appropriately.
        ➔ Consider document data stores when your objects represent content; store metadata in the document.
    55. Thank you. Feedback? http://joind.in/2233 http://twitter.com/weierophinney http://framework.zend.com/
