Thinking in documents

An introduction to NoSQL databases in general, focusing on MongoDB

Transcript

  • 1. Thinking in Documents (dropping ACID). César D. Rodas, crodas@member.fsf.org, http://crodas.org/. PHP Conference 2009, São Paulo, Brazil
  • 2. Who is this fellow? Paraguayan; part of the Google Summer of Code 2008; PHP Classes Innovation Award winner 2007, 2008; ... and a few other things. @crodas - http://crodas.org/ - LaTeX
  • 3. Agenda: How to scale; The Web's major bottleneck; NoSQL databases (Redis, Tokyo Cabinet, Cassandra, CouchDB, MongoDB); Thinking in documents (data behavior, complex operations); PHP integration (the fun part!); Map/Reduce (extra time)
  • 4. Scaling?
  • 5. Increase computational power
  • 6. To make it reliable
  • 7. DISTRIBUTED
  • 8. How to scale: buying more hardware (and connectivity); (threaded) reverse proxies; DNS round robin for your reverse proxies; Gearmand; Memcached; and... what about the data?
  • 9. How to scale data?
  • 10. The hardest way
  • 11. Scaling RDBMS - Solutions: master-slave replication; multi-master replication; data sharding; DRBD and Heartbeat (RAID-1 over the network)
  • 12.
  • 13. Master-slave replication: we need to modify our app; it is worth it only if our application is read-intensive; it doesn't spread the data across servers; the master is a single point of failure
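The "we need to modify our app" point on the slide above can be sketched as follows. This is a minimal illustration, not a production router: `pick_server()` and the server names are hypothetical, and it assumes that only SELECT and SHOW statements are safe to send to a slave.

```php
<?php
// Hypothetical router: decides which server should receive a statement.
// The server names are made up for illustration.
function pick_server(string $sql, string $master, array $slaves): string
{
    // Statements that only read can go to any slave.
    $isRead = preg_match('/^\s*(SELECT|SHOW)\b/i', $sql) === 1;
    if ($isRead && count($slaves) > 0) {
        return $slaves[array_rand($slaves)];
    }
    // Everything else (INSERT, UPDATE, DELETE, ...) must hit the master.
    return $master;
}

$master = "db-master";
$slaves = array("db-slave-1", "db-slave-2");

echo pick_server("SELECT * FROM posts", $master, $slaves), "\n";
echo pick_server("UPDATE posts SET title = 'x'", $master, $slaves), "\n";
```

Every write still lands on the single master, which is why the approach only pays off for read-intensive applications.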
  • 14. Scaling RDBMS - Problems: SQL JOIN; autoincrement; transactions (ACID)
  • 15. The easiest way
  • 16. Strong Consistency, High Availability, Partition-tolerance Theorem (the CAP theorem)
  • 17. BASE: Basically Available, Soft state, Eventually consistent
  • 18. Everybody is doing it: Google, Amazon, eBay, Yahoo!, Facebook, ...
  • 19. Open implementations: Cassandra, Redis, Tokyo Cabinet/Tyrant, CouchDB, MongoDB (FTW!), ...
  • 20. Cassandra: no master (p2p); storage model more like BigTable; open source; incrementally scalable; PHP interface (with Thrift); I never played much with it.
  • 21. Key-value
  • 22. Key-value: fast; similar to PHP's array; simple; easy to distribute across machines
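One way to read "easy to distribute across machines": because every lookup is by key, a hash of the key is enough to decide which machine holds it. The sketch below assumes a fixed server list; `server_for_key()` and the host names are hypothetical.

```php
<?php
// Hypothetical helper: maps a key to one of N servers by hashing it.
function server_for_key(string $key, array $servers): string
{
    // Mask the checksum so it is non-negative on 32-bit builds too.
    $hash = crc32($key) & 0x7fffffff;
    return $servers[$hash % count($servers)];
}

$servers = array("cache-1", "cache-2", "cache-3");

// The same key always lands on the same server.
echo server_for_key("user:42", $servers), "\n";
echo server_for_key("user:42", $servers), "\n";
```

Plain modulo hashing remaps most keys whenever the server list changes; consistent hashing is the usual refinement for that problem.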
  • 23. Memcached: a key-value store engine used as a cache; no persistence (RAM, uses LRU); lightning fast; well supported; *everybody* is using it; several clients for PHP [I even wrote one ;-)]
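The LRU (least recently used) eviction mentioned on the slide above can be illustrated in a few lines of plain PHP. This is only a sketch of the policy, not of how Memcached implements it; the `LruCache` class is hypothetical.

```php
<?php
// Tiny LRU cache. PHP arrays keep insertion order, so the first key is the
// least recently used one as long as every access re-inserts the key.
class LruCache
{
    private $items = array();
    private $capacity;

    public function __construct(int $capacity)
    {
        $this->capacity = $capacity;
    }

    public function get(string $key)
    {
        if (!array_key_exists($key, $this->items)) {
            return null; // cache miss
        }
        $value = $this->items[$key];
        unset($this->items[$key]);   // move the key to the "fresh" end
        $this->items[$key] = $value;
        return $value;
    }

    public function set(string $key, $value): void
    {
        unset($this->items[$key]);
        $this->items[$key] = $value;
        if (count($this->items) > $this->capacity) {
            reset($this->items);
            unset($this->items[key($this->items)]); // evict the oldest entry
        }
    }
}

$cache = new LruCache(2);
$cache->set("a", 1);
$cache->set("b", 2);
$cache->get("a");     // "a" is now fresher than "b"
$cache->set("c", 3);  // over capacity: evicts "b"
var_dump($cache->get("b")); // NULL
```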
  • 24. Redis: very new; as fast as Memcached; persistent to disk; very simple protocol; supports lists and tuples; replication; operations on the key space; I loved it! (until I realised it is an in-memory DB)
  • 25. Tokyo Tyrant: very similar to BerkeleyDB (dba_open()); performs well (I've been playing a bit with it); actively developed; HTTP interface (+/-); Memcached protocol (++); moving toward document-oriented (supports "tables")
  • 26. Document-oriented DB
  • 27. http://www.flickr.com/photos/beglen/152027605/
  • 28. What is a "Document"?
        <?php
        $collection[$id] = array(
            "title"    => "PHP rules",
            "tags"     => array("php", "web"),
            "body"     => "... PHP rules ...",
            "comments" => array(
                array("author" => "crodas", "comment" => "Yes it does"),
            )
        );
        ?>
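To make the idea concrete, the document from the slide above can be serialized as one JSON value, nested comments included. This sketch only uses PHP's built-in `json_encode()` and `json_decode()`; no database is involved.

```php
<?php
$post = array(
    "title"    => "PHP rules",
    "tags"     => array("php", "web"),
    "body"     => "... PHP rules ...",
    "comments" => array(
        array("author" => "crodas", "comment" => "Yes it does"),
    ),
);

// The whole post travels as a single self-contained value.
$json = json_encode($post);
echo $json, "\n";

// Decoding with the second argument set to true returns nested PHP arrays.
$copy = json_decode($json, true);
echo $copy["comments"][0]["author"], "\n";
```

Everything needed to render the post sits in one value, so no join is required to fetch its tags or comments.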
  • 29. Document Databases: schema free; document versioning; improved key-value store; great for storing objects
  • 30.
  • 31. CouchDB: Apache project; asynchronous replication; JSON-based (XML free!); RESTful interface (might be bad); views are materialized on demand (not indexes :-( ); cool admin; safe IO (append only); distributed (concurrent) by nature (written in Erlang)
  • 32.
  • 33.
  • 34. MongoDB: forget about what its name means in Portuguese; fast, fast, fast; JSON and BSON (binary JSON-ish); asynchronous replication, autosharding; supports indexes (FTW!); nested documents (FTW!); advanced queries (FTW!); native extension for PHP
  • 35. MongoDB - Advanced. Select: $gt, $lt, $gte, $lte, $eq, $ne (>, <, >=, <=, ==, !=); $in, $nin; $size, $exists; group(); limit(); skip(); ... Update: $push; $pull; $inc; ...
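The comparison operators on the slide above are written as nested arrays in PHP. The sketch below evaluates such a filter against an in-memory array to show how the operators read. It is a simplified, hypothetical matcher (`matches_condition()` and `matches_filter()` are made-up names), not the server's implementation: it treats a missing field as a non-match and spells not-equal as `$ne`, as MongoDB does.

```php
<?php
// Hypothetical helper: checks one value against an operator-style condition.
function matches_condition($value, $condition): bool
{
    if (!is_array($condition)) {
        return $value == $condition; // plain equality
    }
    foreach ($condition as $op => $operand) {
        switch ($op) {
            case '$gt':  $ok = $value >  $operand; break;
            case '$lt':  $ok = $value <  $operand; break;
            case '$gte': $ok = $value >= $operand; break;
            case '$lte': $ok = $value <= $operand; break;
            case '$ne':  $ok = $value != $operand; break;
            case '$in':  $ok = in_array($value, $operand); break;
            case '$nin': $ok = !in_array($value, $operand); break;
            default:
                throw new InvalidArgumentException("Unsupported operator: $op");
        }
        if (!$ok) {
            return false;
        }
    }
    return true;
}

// Every field named in the filter must satisfy its condition.
function matches_filter(array $doc, array $filter): bool
{
    foreach ($filter as $field => $condition) {
        if (!array_key_exists($field, $doc)) {
            return false; // simplification: missing field never matches
        }
        if (!matches_condition($doc[$field], $condition)) {
            return false;
        }
    }
    return true;
}

$doc = array('field' => 6, 'enable' => 1, 'worth' => 3);
$filter = array(
    'field'  => array('$in' => array(5, 6, 7)),
    'enable' => 1,
    'worth'  => array('$lt' => 5),
);
var_dump(matches_filter($doc, $filter)); // bool(true)
```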
  • 36. pecl install mongo
  • 37. MongoDB - Connection
        <?php
        /* connects to localhost:27017 */
        $connection = new Mongo();

        /* connect to a remote host (default port) */
        $connection = new Mongo("example.com");

        /* connect to a remote host at a given port */
        $connection = new Mongo("example.com:65432");

        /* select some DB (and create it if it doesn't exist yet) */
        $db = $connection->selectDB("db_name");
        ?>
  • 38. MongoDB - "Tables"
        <?php
        $db    = $connection->selectDB("db_name");
        $table = $db->getCollection("table");
        ?>
  • 39. From SQL to MongoDB
  • 40. MongoDB - Count
        <?php
        /* SELECT count(*) FROM table */
        $collection->count();

        /* SELECT count(*) FROM table WHERE foo = 1 */
        $collection->find(array("foo" => 1))->count();
        ?>
  • 41. MongoDB - Queries
        <?php
        /*
         * SELECT * FROM table WHERE field IN (5,6,7) AND enable = 1
         *     AND worth < 5
         *     ORDER BY timestamp DESC
         */
        $collection->ensureIndex(
            array('field' => 1, 'enable' => 1, 'worth' => 1, 'timestamp' => -1)
        );
        $filter = array(
            'field'  => array('$in' => array(5, 6, 7)),
            'enable' => 1,
            'worth'  => array('$lt' => 5)
        );
        $results = $collection->find($filter)->sort(array('timestamp' => -1));
  • 42. MongoDB - Pagination
        <?php
        /*
         * SELECT * FROM table WHERE field IN (5,6,7) AND enable = 1
         *     AND worth < 5
         *     ORDER BY timestamp DESC LIMIT $offset, 20
         */
        $filter = array(
            'field'  => array('$in' => array(5, 6, 7)),
            'enable' => 1,
            'worth'  => array('$lt' => 5)
        );
        $cursor = $collection->find($filter);
        $cursor = $cursor->sort(array('timestamp' => -1))->skip($offset)->limit(20);
        foreach ($cursor as $result) {
            var_dump($result);
        }
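The sort, skip and limit chain on the pagination slide follows the same arithmetic as slicing a sorted array. The sketch below runs on made-up in-memory rows so it needs no database; `paginate()` is a hypothetical helper, and deriving the offset from a 1-based page number is an assumption the slide does not spell out.

```php
<?php
// Sample rows standing in for documents; the values are made up.
$rows = array();
for ($i = 1; $i <= 50; $i++) {
    $rows[] = array('id' => $i, 'timestamp' => 1000 + $i);
}

// Same steps as the cursor chain: sort DESC, skip $offset, limit $perPage.
function paginate(array $rows, int $page, int $perPage = 20): array
{
    usort($rows, function ($a, $b) {
        return $b['timestamp'] <=> $a['timestamp']; // newest first
    });
    $offset = ($page - 1) * $perPage;
    return array_slice($rows, $offset, $perPage);
}

$page2 = paginate($rows, 2);
echo count($page2), " rows, first id = ", $page2[0]['id'], "\n";
// prints "20 rows, first id = 30"
```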
  • 43. Thinking in documents
  • 44.
  • 45. MongoDB - Data structure
        <?php
        $post = array(
            "title"    => "...",
            "body"     => "...",
            "uri"      => "...",
            "comments" => array(
                array(
                    "email"   => "...",
                    "name"    => "...",
                    "comment" => "...",
                ),
            ),
            "tags"     => array("tag1", "tag2"),
        );

        /* Creating indexes (they're important) */
        $collection->ensureIndex("uri");
        $collection->ensureIndex("comments.email");
        $collection->ensureIndex("tags");
  • 46. MongoDB - Data structure
        <?php
        /***
         * - SELECT * FROM posts WHERE uri = <uri>
         * - SELECT tags.tag FROM post_has_tags
         *       INNER JOIN tags ON (tags_id = tags.id) WHERE post_id = <post_id>
         * - SELECT * FROM comments WHERE post = <post_id>
         */
        $result = $collection->find(array("uri" => "<uri>"));
        ?>
  • 47. MongoDB
        <?php
        /***
         * SELECT posts.* FROM posts
         *     INNER JOIN comments ON (comments.post = posts.id)
         *     WHERE comments.email = '<email>'
         */
        $filter = array(
            "comments.email" => 'crodas@member.fsf.org',
        );
        $result = $collection->find($filter);
        ?>
  • 48. MongoDB
        <?php
        /***
         * SELECT * FROM posts
         *     WHERE id IN (SELECT posts_id FROM posts_has_tags
         *         INNER JOIN tags ON (tags_id = tags.id) WHERE tag = <tag>)
         */
        $filter = array(
            "tags" => '<tag>',
        );
        $result = $collection->find($filter);
        ?>
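The two filters above rely on two matching rules: a dotted path such as `comments.email` reaches into every sub-document of a list, and a plain value such as a tag matches a list field when any element equals it. The sketch below imitates both rules on an in-memory array. `values_at()` and `field_equals()` are hypothetical helpers, the sample post is made up, and edge cases of the real query engine are deliberately ignored.

```php
<?php
// Hypothetical helper: collects every value found at a dotted path,
// descending into each element of a list along the way.
function values_at(array $doc, string $path): array
{
    $current = array($doc);
    foreach (explode('.', $path) as $part) {
        $next = array();
        foreach ($current as $node) {
            if (!is_array($node)) {
                continue;
            }
            if (array_key_exists($part, $node)) {
                $next[] = $node[$part];
                continue;
            }
            // A list of sub-documents: look inside each of them.
            foreach ($node as $child) {
                if (is_array($child) && array_key_exists($part, $child)) {
                    $next[] = $child[$part];
                }
            }
        }
        $current = $next;
    }
    // A list value counts as a match if any of its elements matches.
    $flat = array();
    foreach ($current as $value) {
        if (is_array($value)) {
            foreach ($value as $element) {
                $flat[] = $element;
            }
        } else {
            $flat[] = $value;
        }
    }
    return $flat;
}

function field_equals(array $doc, string $path, $wanted): bool
{
    return in_array($wanted, values_at($doc, $path), true);
}

$post = array(
    "uri"      => "/hello",
    "comments" => array(
        array("email" => "a@example.com", "comment" => "first"),
        array("email" => "b@example.com", "comment" => "second"),
    ),
    "tags"     => array("tag1", "tag2"),
);

var_dump(field_equals($post, "comments.email", "b@example.com")); // bool(true)
var_dump(field_equals($post, "tags", "tag1"));                    // bool(true)
```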
  • 49. MongoDB
        <?php
        /***
         * SELECT * FROM posts WHERE id IN (
         *     SELECT post FROM comments GROUP BY post HAVING count(*) > 10)
         */
        /* Caution: $size matches an exact array length and does not accept
         * range operators such as $gt, so this filter does not behave like
         * the SQL above; the counter field on the next slide does. */
        $filter = array(
            "comments" => array('$size' => array('$gt' => 10))
        );
        $result = $collection->find($filter);
        ?>
  • 50. MongoDB
        <?php
        /***
         * SELECT * FROM posts WHERE 10 < (
         *     SELECT count(*) FROM comments
         *     WHERE post = posts.id)
         */
        /* on inserting a comment */
        $collection->update(
            array("uri" => "uri"),                        // select
            array('$inc' => array('comments_size' => 1))  // increment
        );

        $filter = array(
            "comments_size" => array('$gt' => 10)
        );
        $result = $collection->find($filter);
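The counter trick above amounts to keeping a denormalized `comments_size` field in step with the `comments` list, so that a range query can use an ordinary number. An in-memory sketch of that bookkeeping follows; `add_comment()` is a hypothetical helper and the sample posts are made up.

```php
<?php
// Hypothetical helper: append the comment and bump the counter together.
function add_comment(array $post, array $comment): array
{
    $post['comments'][] = $comment;                              // like $push
    $post['comments_size'] = ($post['comments_size'] ?? 0) + 1;  // like $inc
    return $post;
}

$busy = array("uri" => "/hello", "comments" => array());
for ($i = 0; $i < 11; $i++) {
    $busy = add_comment($busy, array("name" => "user$i", "comment" => "hi"));
}
$quiet = array("uri" => "/quiet", "comments" => array(), "comments_size" => 0);

// Equivalent of array("comments_size" => array('$gt' => 10)).
$popular = array_filter(array($busy, $quiet), function ($post) {
    return $post['comments_size'] > 10;
});

echo count($popular), " popular post(s)\n"; // prints "1 popular post(s)"
```

The cost of the approach is that every code path that adds or removes a comment must also update the counter.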
  • 51. Map/Reduce (extra time)
  • 52. Map/Reduce -- Theory
        <?php
        $result = array();
        for ($i = 0; $i < 50; $i++) {
            $result[$i] = pow($i, 2);
        }
        var_dump($result);
        /***
         * If pow() takes 1 second:
         *   1 process    = 50 seconds
         *   10 processes = 5 seconds
         */
        ?>
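The "10 processes = 5 seconds" arithmetic assumes the 50 inputs can be split into independent chunks. The sketch below shows only that partitioning step: the chunks are processed one after another in a single process, whereas a real setup would hand each chunk to a separate worker (for example through a job queue).

```php
<?php
$inputs  = range(0, 49);
$workers = 10;

// Split the 50 inputs into 10 chunks of 5 values each.
$chunks = array_chunk($inputs, (int) ceil(count($inputs) / $workers));

// Each chunk is independent, so each one could go to its own process.
$partials = array();
foreach ($chunks as $chunk) {
    $partials[] = array_map(function ($i) {
        return pow($i, 2);
    }, $chunk);
}

// Merge the partial results back together, in order.
$result = array_merge(...$partials);

echo count($chunks), " chunks, ", count($result), " results\n";
// prints "10 chunks, 50 results"
```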
  • 53. Map/Reduce -- Theory II
        <?php
        $data = range(1, 1000);
        $tmp  = array();

        /* MAP */
        foreach ($data as $key => $value) {
            $n_key = $value % 10;
            /* append */
            $tmp[$n_key][] = $value;
        }

        /* REDUCE */
        foreach ($tmp as $key => $value) {
            $value = array_sum($value);
            print "{$key} = {$value}\n";
        }
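The same two phases can be written as reusable functions, which makes the split between "emit a key for each value" and "fold the values of one key" explicit. This is a single-process sketch; `map_phase()` and `reduce_phase()` are hypothetical names, not part of any library.

```php
<?php
// MAP: group every value under the key computed for it.
function map_phase(array $data, callable $keyOf): array
{
    $groups = array();
    foreach ($data as $value) {
        $groups[$keyOf($value)][] = $value;
    }
    return $groups;
}

// REDUCE: fold each group into a single value, one key at a time.
function reduce_phase(array $groups, callable $reducer): array
{
    $out = array();
    foreach ($groups as $key => $values) {
        $out[$key] = array_reduce($values, $reducer, 0);
    }
    return $out;
}

$groups = map_phase(range(1, 1000), function ($value) {
    return $value % 10;
});
$sums = reduce_phase($groups, function ($carry, $value) {
    return $carry + $value;
});
ksort($sums);

foreach ($sums as $key => $sum) {
    print "{$key} = {$sum}\n";
}
```

Because each key is reduced independently of the others, the groups could be handed to different machines, which is the property that makes the pattern scale.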
  • 54. Questions?
  • 55. Thank you fellows!
  • 56. @crodas crodas.org
  • 57. Powered by...