2010 11-02-documents


Usage Rights

CC Attribution-NonCommercial-ShareAlike License

  • Prague's Dancing House
  • Bill Karwin has written on this on his blog and in his book “SQL Antipatterns”.
  • Eric Evans has written the classic text on DDD, and performs DDD immersion classes regularly.
  • Transaction scripts do not need to be strictly procedural; the “pattern” can also apply to OOP code using such patterns as Strategy, Visitor, etc.
  • Also mention Azure Tables
  • Use AOP-like practices such as SignalSlot, Subject/Observer, etc. to help automate this.

2010 11-02-documents Presentation Transcript

  • Exploring Document Databases: Documents, Documents, Documents (ZendCon 2010). Matthew Weier O'Phinney, Project Lead, Zend Framework
  • Writing the typical PHP app
  • design the schema (http://musicbrainz.org/)
  • take input, and shove it in a DB (http://musicbrainz.org/)
  • write queries to pull from the DB
    $result = mysql_query("SELECT * FROM sometable");
    $rows = false;
    if (mysql_num_rows($result) > 0) {
        $rows = array();
        while ($row = mysql_fetch_assoc($result)) {
            $rows[] = $row;
        }
    }
  • spit data onto a page (http://www.irs.gov/)
  • Profit!
  • Things That Happen
  • Things That Happen go wrong
  • SQL Injection
  • performance issues
    • Expensive queries
    • Potentially ORM induced resource issues
  • Design Issues
  • design issues?
    • Many 1:1 or 1:N relationships
      • Non-trivial insert/update operations
      • Harder to hit indexes on read operations
      • For trivial stuff like tags, addresses, etc!
  • design issues?
    • Worse: Changing requirements
      • Additional columns needed?
      • Additional tables needed?
      • Occasional data needed?
  • design issues?
    • Entity-Attribute-Value Anti-Pattern
      • Often added after-the-fact, as requirements change or expand
      • Support arbitrary data for any record of any table
      • Can't do type enforcement
      • Leads to complex joins that often cannot hit table indexes
      • Leads to complex application logic to support retrieval and insertion of such metadata
  • Design First
  • Use Domain Driven Design (DDD) or Behavior Driven Development (BDD)
  • the primary rule: develop your application logic first, in order to determine what needs to be persisted.
  • define your application entities
  • use Plain Old PHP Objects
    class User
    {
        public function getId() {}
        public function setId($value) {}
        public function getRealname() {}
        public function setRealname($value) {}
        public function getEmail() {}
        public function setEmail($value) {}
    }
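  • write tests
    class PostTest extends PHPUnit_Framework_TestCase
    {
        public function setUp()
        {
            // Create the object under test before each test method runs
            $this->post = new Post();
        }

        public function testRaisesExceptionOnInvalidDate()
        {
            $this->setExpectedException('InvalidArgumentException');
            $this->post->setDate('foo bar');
        }
    }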
  • implement behaviors
    class Post
    {
        private $date;
        private $timezone = 'America/New_York';

        public function setDate($date)
        {
            // Reject anything strtotime() cannot parse
            if (false === strtotime($date)) {
                throw new InvalidArgumentException();
            }
            // DateTime expects a DateTimeZone object, not a timezone name string
            $this->date = new DateTime($date, new DateTimeZone($this->timezone));
            return $this;
        }
    }
  • NOW determine what data you need to persist.
  • Define a schema based on the objects you use (http://musicbrainz.org/)
  • map entities to data store
    public function fromArray(array $data)
    {
        $filter = new OptionsFilter();
        foreach ($data as $key => $value) {
            $method = 'set' . $filter($key);
            if (method_exists($this, $method)) {
                $this->$method($value);
            }
        }
    }

    public function toArray()
    {
        return array(
            '_id' => $this->getId(),
            'timestamp' => $this->getTimestamp(),
            'title' => $this->getTitle(),
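The slide's code is cut off in the transcript, so what follows is a minimal sketch of the same array round trip, not the author's implementation. The `Entry` class, its three fields, and the `keyToSetter()` helper (standing in for the slide's `OptionsFilter`, whose behavior is not shown) are assumptions made for illustration.

```php
<?php
// Hypothetical entity; field names follow the keys visible on the slide.
class Entry
{
    private $id;
    private $timestamp;
    private $title;

    public function getId() { return $this->id; }
    public function setId($value) { $this->id = $value; return $this; }
    public function getTimestamp() { return $this->timestamp; }
    public function setTimestamp($value) { $this->timestamp = (int) $value; return $this; }
    public function getTitle() { return $this->title; }
    public function setTitle($value) { $this->title = $value; return $this; }

    // Assumed stand-in for OptionsFilter: "_id" becomes "setId",
    // "reviewed_by" would become "setReviewedBy".
    private function keyToSetter($key)
    {
        $words = explode('_', trim($key, '_'));
        return 'set' . implode('', array_map('ucfirst', $words));
    }

    // Populate the entity from a document array, skipping unknown keys.
    public function fromArray(array $data)
    {
        foreach ($data as $key => $value) {
            $method = $this->keyToSetter($key);
            if (method_exists($this, $method)) {
                $this->$method($value);
            }
        }
        return $this;
    }

    // Serialize the entity back into the array shape the store expects.
    public function toArray()
    {
        return array(
            '_id'       => $this->getId(),
            'timestamp' => $this->getTimestamp(),
            'title'     => $this->getTitle(),
        );
    }
}

$entry = new Entry();
$entry->fromArray(array(
    '_id'       => 'blog-post-stub',
    'timestamp' => 1288656000,
    'title'     => 'Documents',
    'unknown'   => 'ignored', // no matching setter, so it is dropped
));
$document = $entry->toArray();
```

Keys without a matching setter are silently skipped, which is what lets older or richer documents load into the same entity.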
  • approaches
    • Transaction Scripts
    • Object Relational Maps (ORM)
  • use mappers or transaction scripts to translate objects to data & back
    $user = new User();
    $user->setId('matthew')
         ->setName("Matthew Weier O'Phinney");
    $mapper->save($user);
    $user = $repository->find('matthew');
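To make the mapper idea concrete, here is a minimal sketch in which a PHP array stands in for the data store. `InMemoryUserMapper` is a hypothetical class, and it combines the slide's separate `$mapper` and `$repository` roles into one object for brevity.

```php
<?php
class User
{
    private $id;
    private $name;

    public function getId() { return $this->id; }
    public function setId($value) { $this->id = $value; return $this; }
    public function getName() { return $this->name; }
    public function setName($value) { $this->name = $value; return $this; }
}

// Hypothetical mapper: translates User objects to plain arrays and back.
// The $documents array is a placeholder for a real data store.
class InMemoryUserMapper
{
    private $documents = array();

    public function save(User $user)
    {
        $this->documents[$user->getId()] = array(
            '_id'  => $user->getId(),
            'name' => $user->getName(),
        );
    }

    public function find($id)
    {
        if (!isset($this->documents[$id])) {
            return null;
        }
        // Rebuild a fresh object from the stored array
        $user = new User();
        return $user->setId($this->documents[$id]['_id'])
                    ->setName($this->documents[$id]['name']);
    }
}

$mapper = new InMemoryUserMapper();
$user = new User();
$user->setId('matthew')->setName("Matthew Weier O'Phinney");
$mapper->save($user);

$found = $mapper->find('matthew');
```

The entity never touches storage itself; swapping the array for a database client would only change the mapper.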
  • additions
    • Service Layers
        • Interacts with domain entities
        • Good place for caching, ACLs, etc.
  • use service objects to manipulate entities
    namespace BlogService;

    class Entries
    {
        public function fetchEntry($permalink) {}
        public function fetchCommentCount($permalink) {}
        public function fetchComments($permalink) {}
        public function fetchTrackbacks($permalink) {}
        public function addComment($permalink, array $comment) {}
        public function addTrackback($permalink, array $comment) {}
        public function fetchTagCloud() {}
    }
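As an illustration of why a service layer is a good place for caching, the sketch below wraps a data-access callable and remembers results per permalink. `EntryService` and the closure that plays the data store are hypothetical; only the `fetchEntry()` method name comes from the slide.

```php
<?php
// Hypothetical service: wraps a data-access callable and caches results per permalink.
class EntryService
{
    private $fetcher;
    private $cache = array();

    public function __construct(callable $fetcher)
    {
        $this->fetcher = $fetcher;
    }

    public function fetchEntry($permalink)
    {
        // Only go to the data store on a cache miss
        if (!array_key_exists($permalink, $this->cache)) {
            $this->cache[$permalink] = call_user_func($this->fetcher, $permalink);
        }
        return $this->cache[$permalink];
    }
}

$lookups = 0;
$service = new EntryService(function ($permalink) use (&$lookups) {
    $lookups++; // counts trips to the pretend data store
    return array('_id' => $permalink, 'title' => 'Documents, Documents, Documents');
});

$first  = $service->fetchEntry('blog-post-stub');
$second = $service->fetchEntry('blog-post-stub'); // served from the cache
```

Callers see one method; whether the entry came from the cache or the store is the service's concern.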
  • Data Persistence
  • you have a choice
    • Before, relational databases were the only choice
  • you have a choice
    • Today, relational databases are only one choice
  • have your domain dictate storage
    • Do you have many arbitrary, row-specific fields in the design?
    • Do you need many pivot tables to describe a single entity?
    • Is transactional integrity part of your requirements?
    • Do changes need to be immediately available?
    • defining by what it isn't?
    • still defining by what it isn't …
  • types: key/value stores
    • Each record is a key/value pair (though the value may be non-scalar)
    • Interesting, but not what we're going to look at today.
  • types: document databases
    • Each document can define its own structure
    • Typically a document consists of many key/value pairs
    • This is what we'll look at!
    { _id: "weierophinney" , realname: "Matthew Weier O'Phinney" , email: "matthew@zend.com" , roles: [ "admin" , "user" ] }
  • document dbs are plentiful
  • document dbs solve web problems
    • Data can expand and add properties over time without requiring schema changes!
    • Different content types can co-exist in the same general storage
  • document dbs solve web problems
    • Aggregate related content in the document that owns it
      • Tags
      • Comments
      • Addresses
    • Eventual consistency
      • Updates often don't need to propagate in real-time
  • types of problems documents solve
    • Blog and News Posts
    • Product Entries
    • Content Management documents
    • … what don't they solve?
  • identifiers are king
    • Most are optimized for fetching via identifier
      • Provide your own IDs
      • Fall back on system-generated IDs (usually UUIDs)
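The two identifier strategies can be sketched as follows. In practice the database usually generates the fallback ID itself; the `uuid4()` and `withIdentifier()` helpers here are hypothetical and only show the decision in application code. The sketch assumes PHP 7 or later for `random_bytes()`.

```php
<?php
// Generates a random (version 4) UUID; requires PHP 7+ for random_bytes().
function uuid4()
{
    $bytes = random_bytes(16);
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40); // set version to 4
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80); // set RFC 4122 variant
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

// Hypothetical helper: keep a caller-supplied _id, otherwise fall back to a UUID.
function withIdentifier(array $document)
{
    if (!isset($document['_id']) || $document['_id'] === '') {
        $document['_id'] = uuid4();
    }
    return $document;
}

$named     = withIdentifier(array('_id' => 'weierophinney', 'realname' => "Matthew Weier O'Phinney"));
$anonymous = withIdentifier(array('realname' => 'Anonymous'));
```

A meaningful ID such as a username or permalink lets the application fetch the document directly, without a secondary lookup.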
  • mapping documents to objects
    • Many utilize JSON
    • If they don't, abstractions let you sling PHP arrays
    $result = $cxn->fetch($id);
    $user = new User();
    $user->fromArray($result);
    $cxn->save($user->toArray());
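Since many document stores exchange JSON, the array form of an entity is usually one `json_encode()` call away from what gets stored. The sketch below assumes a store that accepts and returns JSON strings; no particular database client is implied.

```php
<?php
// The example user document, expressed as a PHP array.
$document = array(
    '_id'      => 'weierophinney',
    'realname' => "Matthew Weier O'Phinney",
    'email'    => 'matthew@zend.com',
    'roles'    => array('admin', 'user'),
);

// Serialize for storage, then decode back into an associative array.
$json     = json_encode($document);
$restored = json_decode($json, true);
```

Passing `true` as the second argument to `json_decode()` returns associative arrays instead of `stdClass` objects, which is the shape `fromArray()` expects.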
  • aggregate metadata
    • Instead of EAV tables, store metadata in the document
    { "_id" : "blog-post-stub" , "published" : true , "reviewed" : true , "reviewed_by" : "matthew" }
  • no pivot tables required!
    • Instead of pivot tables, aggregate data inside values
    { "_id" : "blog-post-stub" , "tags" : [ "zend framework" , "presentations" ] }
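With tags stored inside the document, finding tagged posts needs no join. A real document database would run this query on the server, typically against an index; the sketch below filters in-memory arrays only to show the shape of the data, and `idsTaggedWith()` is a hypothetical helper.

```php
<?php
$posts = array(
    array('_id' => 'blog-post-stub', 'tags' => array('zend framework', 'presentations')),
    array('_id' => 'another-post',   'tags' => array('php')),
    array('_id' => 'untagged-post'), // documents may omit the field entirely
);

// Hypothetical helper: returns the _id of every document carrying the given tag.
function idsTaggedWith(array $documents, $tag)
{
    $matches = array_filter($documents, function ($doc) use ($tag) {
        return isset($doc['tags']) && in_array($tag, $doc['tags'], true);
    });
    return array_values(array_map(function ($doc) {
        return $doc['_id'];
    }, $matches));
}

$ids = idsTaggedWith($posts, 'presentations');
```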
  • It's not all walks in the park
  • increased disk usage
    • Each document contains its schema
      • Silver lining: most solutions can cluster and/or provide sharding.
  • schema differences
    • How do you keep schemas in sync between documents when requirements change?
    • If you have multiple schemas for the same document type, what do you query on?
      • “firstName” or “FIRST_NAME”?
    • Did you remember to create new indexes?
  • managing schema changes
    • Handle the differences in your application code
    switch ($user->schema_version) {
        case '2010-01-31':
            // ...
            break;
        case '2010-11-02':
            // ...
            break;
    }
    Meh.
  • managing schema changes
    • Do a batch conversion
      • Copy all records to a new database or collection
      • Migrate all records to the new schema
      • Point your application to the new database/collection
    Meh.
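The batch steps above can be sketched as a copy-and-migrate loop. Arrays stand in for the old and new collections, `migrateDocument()` is a hypothetical helper, and the `METADATA` to `metadata` rename is borrowed from the versioning slide purely as an example of a schema change.

```php
<?php
// Hypothetical migration step: rename the legacy METADATA key and stamp the new version.
function migrateDocument(array $doc, $latest)
{
    if (array_key_exists('METADATA', $doc)) {
        $doc['metadata'] = $doc['METADATA'];
        unset($doc['METADATA']);
    }
    $doc['schema_version'] = $latest;
    return $doc;
}

// Arrays stand in for the old and new collections.
$oldCollection = array(
    array('_id' => 'a', 'schema_version' => '2010-01-31', 'METADATA' => array('reviewed' => true)),
    array('_id' => 'b', 'schema_version' => '2010-11-02', 'metadata' => array('reviewed' => false)),
);

// Copy every record into the new collection, migrating as we go.
$newCollection = array();
foreach ($oldCollection as $doc) {
    $newCollection[] = migrateDocument($doc, '2010-11-02');
}
```

Because the old collection is left untouched, the application can be pointed back at it if the migration turns out to be wrong.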
  • managing schema changes
    • Version the document schema
    • Update when fetched
    { "_id" : "blog-post-stub" , "schema_version" : "2010-11-02" }
    if ($post->schema_version != $latest) {
        $post->metadata = $post->METADATA;
        $post->schema_version = $latest;
        unset($post->METADATA);
        $mapper->save($post);
    }
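A self-contained sketch of the update-when-fetched approach follows. `UpgradingStore` is a hypothetical wrapper around an in-memory array, and its public `$writes` counter exists only so the example can show that a document is re-saved once and then left alone.

```php
<?php
// Hypothetical store wrapper: upgrades stale documents as they are read.
class UpgradingStore
{
    const LATEST = '2010-11-02';

    private $documents;
    public $writes = 0; // exposed only to show how often a re-save happens

    public function __construct(array $documents)
    {
        $this->documents = $documents;
    }

    public function fetch($id)
    {
        $doc = $this->documents[$id];
        $version = isset($doc['schema_version']) ? $doc['schema_version'] : null;
        if ($version !== self::LATEST) {
            // Apply the schema change, then stamp the current version
            if (array_key_exists('METADATA', $doc)) {
                $doc['metadata'] = $doc['METADATA'];
                unset($doc['METADATA']);
            }
            $doc['schema_version'] = self::LATEST;
            $this->documents[$id] = $doc; // persist the upgraded copy
            $this->writes++;
        }
        return $doc;
    }
}

$store = new UpgradingStore(array(
    'blog-post-stub' => array(
        '_id'            => 'blog-post-stub',
        'schema_version' => '2010-01-31',
        'METADATA'       => array('reviewed' => true),
    ),
));

$first  = $store->fetch('blog-post-stub'); // upgraded and saved
$second = $store->fetch('blog-post-stub'); // already current, no write
```

Documents that are never read stay on the old schema, so queries against the new field names may miss them until a batch pass catches up.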
  • benefits you may enjoy
    • Easier mapping of document concepts to data persistence
    • Easier scaling
      • Most support clustering and sharding natively
      • Easier migration to cloud-based storage
  • Closing Notes
    • Don't start your development from the wrong end. Start with objects.
    • Be aware of all the options you have for persisting data; choose appropriately.
    • Consider document data stores when your objects represent content; store metadata in the document.
  • Thank you Feedback? http://joind.in/2233 http://twitter.com/weierophinney http://framework.zend.com/