• Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
Thinking in documents
 

Thinking in documents

on

  • 4,532 views

Introduction to NoSQL database in general, focusing on MongoDB

Introduction to NoSQL database in general, focusing on MongoDB

Statistics

Views

Total Views
4,532
Views on SlideShare
2,960
Embed Views
1,572

Actions

Likes
5
Downloads
72
Comments
0

10 Embeds 1,572

http://crodas.org 1512
http://www.slideshare.net 24
http://crodas.net 12
http://www.crodas.net 8
http://www.linkedin.com 6
http://webcache.googleusercontent.com 4
https://www.linkedin.com 3
http://womm.chu.pe 1
http://www.copyscape.com 1
http://cache.baidu.com 1
More...

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    Thinking in documents Thinking in documents Presentation Transcript

    • Thinking in Documents (dropping ACID) César D. Rodas crodas@member.fsf.org http://crodas.org/ PHP Conference 2009 Sâo Paulo, Brasil 1
    • Who is this fellow? Paraguayan Part of the Google Summer of Code 2008 PHP Classes Innovation Award winner 2007, 2008 ... and some other few things @crodas - http://crodas.org/ - L EX AT 2
    • Agenda How to scale The Web's major bottleneck NoSQL databases • Redis • Tokyo Cabinet • Cassandra • CouchDB • MongoDB Thinking in documents • Data behavior • Complex operations PHP Integration (The fun part!) Map/Reduce (Extra time) @crodas - http://crodas.org/ - L EX AT 3
    • Scaling? @crodas - http://crodas.org/ - L EX AT 4
    • Increase computational power @crodas - http://crodas.org/ - L EX AT 5
    • To make it reliable @crodas - http://crodas.org/ - L EX AT 6
    • DISTRIBUTED @crodas - http://crodas.org/ - L EX AT 7
    • How to scale Buying more hardware (and connectivity) Reverses (threaded) proxies DNS round robin for your Reverses proxies Gearmand Memcached and.. What about the data? @crodas - http://crodas.org/ - L EX AT 8
    • How to scale data? @crodas - http://crodas.org/ - L EX AT 9
    • The hardest way @crodas - http://crodas.org/ - L EX AT 10
    • Scaling RDBMS - Solutions Master - Slave replication Multi-Master replication Data sharding DRDB and Heartbeat (RAID-1 over the network) @crodas - http://crodas.org/ - L EX AT 11
    • @crodas - http://crodas.org/ - L EX AT 12
    • Master-Slave replication We need to modify our app It worth only if our application is read intense It doesn't spread the data across servers Single point of failure @crodas - http://crodas.org/ - L EX AT 13
    • Scaling RDBMS - Problems SQL JOIN Autoincrement Transactions (ACID) @crodas - http://crodas.org/ - L EX AT 14
    • The easiest way @crodas - http://crodas.org/ - L EX AT 15
    • Strong Consistency, High Availability, Partition-tolerance Theorem @crodas - http://crodas.org/ - L EX AT 16
    • BASE Basically Available, Soft state, Eventually Consistent @crodas - http://crodas.org/ - L EX AT 17
    • Everybody is doing it Google Amazon eBay Yahoo! Facebook ... @crodas - http://crodas.org/ - L EX AT 18
    • Open implementations Cassandra Redis Tokyo Cabinet/Tyrant CouchDB MongoDB (FTW!) ... @crodas - http://crodas.org/ - L EX AT 19
    • Cassandra No master (p2p) Storage model more like BigTable Open source Incremental scalable PHP interface (with Thrift) Never played too much with it. @crodas - http://crodas.org/ - L EX AT 20
    • Key-value @crodas - http://crodas.org/ - L EX AT 21
    • Key-value Fast Similar to PHP's array Simple Easy to distribute across machines @crodas - http://crodas.org/ - L EX AT 22
    • Memcached It is a key-value store engine used as a cache. No persistence(RAM, uses LRU) Lightening fast Well supported *Everybody* is using it Several clients for PHP [even I had wrote one ;-)] @crodas - http://crodas.org/ - L EX AT 23
    • Redis Very new As fast as Memcached Persistent to disk Very simple protocol Support lists and tuples Replication Operation in the key space I loved it! • Until I realised it is in-memory DB @crodas - http://crodas.org/ - L EX AT 24
    • Tokyo Tyrant Very similar to BerkeleyDB ( dba open() ) Performs well (I've been playing a bit with it) Actively developed HTTP Interface (+/-) Memcached Protocol (++) Going to Document-oriented (supports "tables") @crodas - http://crodas.org/ - L EX AT 25
    • Document-oriented DB @crodas - http://crodas.org/ - L EX AT 26
    • http://www.flickr.com/photos/beglen/152027605/ @crodas - http://crodas.org/ - L EX AT 27
    • What is a "Document"? <?php $collection[$id] = array( "title" => "PHP rules", "tags" => array("php", "web"), "body" => "... PHP rules ...", "comments" => array( array("author" => "crodas", "comment" => "Yes it does"), ) ); ?> @crodas - http://crodas.org/ - L EX AT 28
    • Docuement Databases Schema free Document versioning Improved Key-value store Great for storing objects @crodas - http://crodas.org/ - L EX AT 29
    • @crodas - http://crodas.org/ - L EX AT 30
    • CouchDB Apache project Asynchronous replication JSON-based (XML free!) RESTful interface (might be bad) Views are materialized on demand (not Indexes :-( ) Cool admin Safe IO (Append only) Distributed (concurrent) by nature (written in Erlang) @crodas - http://crodas.org/ - L EX AT 31
    • @crodas - http://crodas.org/ - L EX AT 32
    • @crodas - http://crodas.org/ - L EX AT 33
    • MongoDB Forgot about its name meaning in Portuguese. Fast, Fast, Fast JSON and BSON (Binary JSON-ish) Asynchronous replication, autosharding Support indexes (FTW!) Nested documents (FTW!) Advanced queries (FTW!) Native extension for PHP @crodas - http://crodas.org/ - L EX AT 34
    • MongoDB - Advanced Select • $gt, $lt, $gte, $lte, $eq, $neq: >, <, >=, <=, ==, != • $in, $nin • $size, $exists • group() • limit() • skip() • ... Update • $push • $pull • $inc • ... @crodas - http://crodas.org/ - L EX AT 35
    • pecl install mongo @crodas - http://crodas.org/ - L EX AT 36
    • MongoDB - Connection <?php /* connects to localhost:27017 */ $connection = new Mongo(); /* connect to a remote host (default port) */ $connection = new Mongo( "example.com" ); /* connect to a remote host at a given port */ $connection = new Mongo( "example.com:65432" ); /* select some DB (and create if it doesn't exits yet) */ $db = $connection->selectDB("db name"); ?> @crodas - http://crodas.org/ - L EX AT 37
    • MongoDB - "Tables" <?php $db = $connection->selectDB("db name"); $table = $db->getCollection("table"); ?> @crodas - http://crodas.org/ - L EX AT 38
    • FROM SQL to MongoDB @crodas - http://crodas.org/ - L EX AT 39
    • MongoDB - Count <?php /* SELECT count(*) FROM table */ $collection->count(); /* SELECT count(*) FROM table WHERE foo = 1 */ $collection->find(array("foo" => 1))->count(); ?> @crodas - http://crodas.org/ - L EX AT 40
    • MongoDB - Queries <?php /* * SELECT * FROM table WHERE field IN (5,6,7) and enable=1 * and worth < 5 * ORDER BY timestamp DESC */ $collection->ensureIndex( array('field'=>1, 'enable'=>1, 'worth'=>1, 'timestamp'=>-1) ); $filter = array( 'field' => array('$in' => array(5,6,7), 'enable' => 1, 'worth' => array('$lt' => 5) ); $results = $collection->find($filter)->sort(array('timestamp' => -1)); @crodas - http://crodas.org/ - L EX AT 41
    • MongoDB - Pagination <?php /* * SELECT * FROM table WHERE field IN (5,6,7) and enable=1 * and worth < 5 * ORDER BY timestamp DESC LIMIT $offset, 20 */ $filter = array( 'field' => array('$in' => array(5,6,7), 'enable' => 1, 'worth' => array('$lt' => 5) ); $cursor = $collection->find($filter); $cursor = $cursor->sort(array('timestamp' => -1))->skip($offset)->limit(20); foreach ($cursor as $result) { var dump($result); } @crodas - http://crodas.org/ - L EX AT 42
    • Thinking in documents @crodas - http://crodas.org/ - L EX AT 43
    • @crodas - http://crodas.org/ - L EX AT 44
    • MongoDB - Data structure <?php $post = array( "title" => "...", "body" => "...", "uri" => "...", "comments" => array( array( "email" => "...", "name" => "...", "comment" => "...", ), ), "tags" => array("tag1", "tag2"), ); /* Creating indexes (they're important) */ $collection->ensureIndex("uri"); $collection->ensureIndex("comments.email"); $collection->ensureIndex("tags"); @crodas - http://crodas.org/ - L EX AT 45
    • MongoDB - Data structure <?php /*** * - SELECT * FROM posts WHERE uri = <uri> * - SELECT tags.tag FROM post has tags * INNER JOIN tags ON (tags id == tags.id) WHERE post id = <post id> * - SELECT * FROM comments WHERE post = <post id> */ $result = $collection->find(array("uri" => "<uri>")); ?> @crodas - http://crodas.org/ - L EX AT 46
    • MongoDB <?php /*** * SELECT posts.* FROM posts INNER * JOIN comments ON (comments.post = posts.id) * WHERE comments.email = '<email>' * */ $filter = array( "comments.email" => 'crodas@member.fsf.org', ); $result = $collection->find($filter); ?> @crodas - http://crodas.org/ - L EX AT 47
    • MongoDB <?php /*** * SELECT * FROM posts * WHERE id IN (SELECT posts id FROM posts has tags * INNER JOIN tags ON (tags id == tags.id) WHERE tag = <tag>) * */ $filter = array( "tags" => '<tag>', ); $result = $collection->find($filter); ?> @crodas - http://crodas.org/ - L EX AT 48
    • MongoDB <?php /*** * SELECT * FROM posts WHERE id IN ( * SELECT post FROM comments GROUP * BY post HAVING count(*) > 10) */ $filter = array( "comments" => array('$size' => array('$gt' => 10)) ); $result = $collection->find($filter); ?> @crodas - http://crodas.org/ - L EX AT 49
    • MongoDB <?php /*** * SELECT * FROM posts WHERE 10 < ( * SELECT count(*) FROM comments * post = posts.id) */ /* on insert a comment */ $collection->update( array("uri" => "uri"), // select array('$inc' => array('comments size'=>1)) //increment ); $filter = array( "comments size" => array('$gt' => 10) ); $result = $collection->find($filter); @crodas - http://crodas.org/ - L EX AT 50
    • Map/Reduce Extra time @crodas - http://crodas.org/ - L EX AT 51
    • Map/Reduce -- Theory <?php for($i=0; $i < 50; $i++) { $result[$i] = pow($i, 2); } var dump($result); /*** * IF pow takes 1 second * 1 process = 50 seconds * 10 process = 5 seconds */ ?> @crodas - http://crodas.org/ - L EX AT 52
    • Map/Reduce -- Theory II <?php $data = range(1, 1000); /* MAP */ foreach ($data as $key => $value) { $n key = $value % 10; /* append */ $tmp[$n key][] = $value; } /* REDUCE */ foreach ($tmp as $key => $value) { $value = array sum($value); print "{$key} = {$value}n"; } @crodas - http://crodas.org/ - L EX AT 53
    • Questions? @crodas - http://crodas.org/ - L EX AT 54
    • Thank you fellows! @crodas - http://crodas.org/ - L EX AT 55
    • @crodas crodas.org @crodas - http://crodas.org/ - L EX AT 56
    • Powered by... @crodas - http://crodas.org/ - L EX AT 57