Thinking in documents
Introduction to NoSQL databases in general, focusing on MongoDB

Published in: Technology

Transcript

  • 1. Thinking in Documents (dropping ACID)
      César D. Rodas, crodas@member.fsf.org, http://crodas.org/
      PHP Conference 2009, São Paulo, Brasil
  • 2. Who is this fellow?
      • Paraguayan
      • Part of the Google Summer of Code 2008
      • PHP Classes Innovation Award winner 2007, 2008
      • ... and a few other things
      @crodas - http://crodas.org/ - L EX AT
  • 3. Agenda
      • How to scale
      • The Web's major bottleneck
      • NoSQL databases: Redis, Tokyo Cabinet, Cassandra, CouchDB, MongoDB
      • Thinking in documents: data behavior, complex operations
      • PHP integration (the fun part!)
      • Map/Reduce (extra time)
  • 4. Scaling?
  • 5. Increase computational power
  • 6. To make it reliable
  • 7. DISTRIBUTED
  • 8. How to scale
      • Buying more hardware (and connectivity)
      • Reverse (threaded) proxies
      • DNS round robin for your reverse proxies
      • Gearmand
      • Memcached
      • and... what about the data?
  • 9. How to scale data?
  • 10. The hardest way
  • 11. Scaling RDBMS - Solutions
      • Master-Slave replication
      • Multi-Master replication
      • Data sharding
      • DRBD and Heartbeat (RAID-1 over the network)
  • 12.
  • 13. Master-Slave replication
      • We need to modify our app
      • It's only worth it if our application is read-intensive
      • It doesn't spread the data across servers
      • Single point of failure
  • 14. Scaling RDBMS - Problems
      • SQL JOIN
      • Autoincrement
      • Transactions (ACID)
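Autoincrement is a scaling problem because a single counter has to live on one server. One common way around it, sketched here under our own assumptions, is to let every server build IDs from the current time, a tag for the machine/process and a local counter. The class `IdGenerator` and its 20-hex-digit layout are hypothetical; the idea is loosely similar to MongoDB's ObjectId, but this is not its exact format.

```php
<?php
// Hypothetical ID scheme (not MongoDB's exact ObjectId layout): seconds since
// the epoch + a tag for the machine/process + a per-process counter.
final class IdGenerator
{
    private $node;    // 6 hex chars identifying this machine/process
    private $counter; // 24-bit counter, started at a random value

    public function __construct(string $nodeName)
    {
        $this->node = substr(md5($nodeName), 0, 6);
        $this->counter = random_int(0, 0xFFFFFF);
    }

    public function next(): string
    {
        // Wrap at 24 bits so the counter always fits in 6 hex digits.
        $this->counter = ($this->counter + 1) & 0xFFFFFF;
        // 8 hex (timestamp) + 6 hex (node) + 6 hex (counter) = 20 hex chars.
        return sprintf('%08x%s%06x', time(), $this->node, $this->counter);
    }
}

$web1 = new IdGenerator('web-1');
$web2 = new IdGenerator('web-2');
echo $web1->next(), "\n", $web2->next(), "\n";
```

No server has to be asked for the next number. Uniqueness here rests on every node having a distinct name and on one node creating fewer than 16,777,216 IDs per second.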
  • 15. The easiest way
  • 16. CAP theorem: Strong Consistency, High Availability, Partition tolerance
  • 17. BASE: Basically Available, Soft state, Eventually consistent
  • 18. Everybody is doing it: Google, Amazon, eBay, Yahoo!, Facebook, ...
  • 19. Open implementations: Cassandra, Redis, Tokyo Cabinet/Tyrant, CouchDB, MongoDB (FTW!), ...
  • 20. Cassandra
      • No master (p2p)
      • Storage model more like BigTable
      • Open source
      • Incrementally scalable
      • PHP interface (with Thrift)
      • I haven't played much with it.
  • 21. Key-value
  • 22. Key-value
      • Fast
      • Similar to PHP's array
      • Simple
      • Easy to distribute across machines
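To illustrate why a key-value store is easy to distribute, here is a minimal sketch assuming hash-modulo placement. The class `ShardedStore` is hypothetical, and plain PHP arrays stand in for separate servers.

```php
<?php
// Hypothetical sketch: N "servers" simulated by PHP arrays. The server for a key
// is chosen from a hash of the key alone, so every client agrees on it.
final class ShardedStore
{
    private $servers;

    public function __construct(int $serverCount)
    {
        $this->servers = array_fill(0, $serverCount, array());
    }

    public function serverFor(string $key): int
    {
        // Mask to 31 bits so the result is non-negative on 32-bit builds too.
        return (crc32($key) & 0x7FFFFFFF) % count($this->servers);
    }

    public function set(string $key, $value): void
    {
        $this->servers[$this->serverFor($key)][$key] = $value;
    }

    public function get(string $key)
    {
        $shard = $this->servers[$this->serverFor($key)];
        return array_key_exists($key, $shard) ? $shard[$key] : null;
    }
}

$store = new ShardedStore(3);
$store->set('user:1', 'crodas');
echo $store->serverFor('user:1'), ' ', $store->get('user:1'), "\n";
```

With plain modulo placement, changing the number of servers moves most keys to a different server; consistent hashing is the usual refinement when servers come and go.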
  • 23. Memcached
      • A key-value store engine used as a cache
      • No persistence (RAM only, LRU eviction)
      • Lightning fast
      • Well supported; *everybody* is using it
      • Several clients for PHP [I even wrote one ;-)]
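Memcached keeps everything in RAM and evicts the least recently used (LRU) items when it runs out of room. The policy itself fits in a few lines. The `LruCache` class below is a hypothetical sketch bounded by item count (Memcached is bounded by memory), not Memcached's actual implementation.

```php
<?php
// Minimal LRU cache sketch (hypothetical class, not Memcached's implementation).
// PHP arrays remember insertion order, so keeping recently used keys at the end
// leaves the least recently used key at the front.
final class LruCache
{
    private $capacity;
    private $items = array();

    public function __construct(int $capacity)
    {
        $this->capacity = $capacity;
    }

    public function get(string $key)
    {
        if (!array_key_exists($key, $this->items)) {
            return null; // cache miss
        }
        $value = $this->items[$key];
        unset($this->items[$key]);  // move the key to the "most recent" end
        $this->items[$key] = $value;
        return $value;
    }

    public function set(string $key, $value): void
    {
        unset($this->items[$key]);
        $this->items[$key] = $value;
        if (count($this->items) > $this->capacity) {
            // Evict the least recently used entry (the first key).
            unset($this->items[array_key_first($this->items)]);
        }
    }
}

$cache = new LruCache(2);
$cache->set('a', 1);
$cache->set('b', 2);
$cache->get('a');           // 'a' becomes the most recently used
$cache->set('c', 3);        // over capacity: 'b' is evicted
var_dump($cache->get('b')); // NULL
```

Because `get()` returns `null` on a miss, this sketch cannot tell a stored `null` apart from a missing key.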
  • 24. Redis
      • Very new
      • As fast as Memcached
      • Persistent to disk
      • Very simple protocol
      • Supports lists and sets
      • Replication
      • Operations on the key space
      • I loved it!
          • Until I realised it is an in-memory DB
  • 25. Tokyo Tyrant
      • Very similar to BerkeleyDB (dba_open())
      • Performs well (I've been playing a bit with it)
      • Actively developed
      • HTTP interface (+/-)
      • Memcached protocol (++)
      • Moving toward document orientation (supports "tables")
  • 26. Document-oriented DB
  • 27. http://www.flickr.com/photos/beglen/152027605/
  • 28. What is a "Document"?
      <?php
      $collection[$id] = array(
          "title"    => "PHP rules",
          "tags"     => array("php", "web"),
          "body"     => "... PHP rules ...",
          "comments" => array(
              array("author" => "crodas", "comment" => "Yes it does"),
          ),
      );
      ?>
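MongoDB speaks JSON and BSON, so it can help to see this document in JSON form: the nested tags and comments stay inside one record instead of being spread over several tables. This sketch uses only PHP's `json_encode`/`json_decode` (MongoDB itself stores BSON, not JSON text) and drops the long `body` field for brevity.

```php
<?php
// The slide's document (without the long body) as a PHP array.
$post = array(
    "title"    => "PHP rules",
    "tags"     => array("php", "web"),
    "comments" => array(
        array("author" => "crodas", "comment" => "Yes it does"),
    ),
);

// Lists become JSON arrays, associative arrays become JSON objects.
$json = json_encode($post);
echo $json, "\n";

// Decoding (as associative arrays) gives back the same structure.
$copy = json_decode($json, true);
var_dump($copy === $post); // bool(true)
```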
  • 29. Document Databases
      • Schema free
      • Document versioning
      • Improved key-value store
      • Great for storing objects
  • 30.
  • 31. CouchDB
      • Apache project
      • Asynchronous replication
      • JSON-based (XML free!)
      • RESTful interface (might be bad)
      • Views are materialized on demand (not indexes :-( )
      • Cool admin
      • Safe IO (append only)
      • Distributed (concurrent) by nature (written in Erlang)
  • 32.
  • 33.
  • 34. MongoDB
      • Forget what its name means in Portuguese.
      • Fast, fast, fast
      • JSON and BSON (Binary JSON-ish)
      • Asynchronous replication, autosharding
      • Supports indexes (FTW!)
      • Nested documents (FTW!)
      • Advanced queries (FTW!)
      • Native extension for PHP
  • 35. MongoDB - Advanced
      Select
      • $gt, $lt, $gte, $lte, $eq, $ne: >, <, >=, <=, ==, !=
      • $in, $nin
      • $size, $exists
      • group()
      • limit()
      • skip()
      • ...
      Update
      • $push
      • $pull
      • $inc
      • ...
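To make the select operators concrete, the hypothetical function `matches()` below evaluates a small subset of them against plain PHP arrays. It is a toy with assumed behavior (strict comparisons, top-level fields only, array conditions always read as operators), not how the MongoDB server evaluates queries.

```php
<?php
// Hypothetical toy evaluator for a few MongoDB-style operators on PHP arrays.
function matches(array $doc, array $filter): bool
{
    foreach ($filter as $field => $cond) {
        $exists = array_key_exists($field, $doc);
        $value  = $exists ? $doc[$field] : null;
        // A scalar condition means plain equality: array('enable' => 1)
        if (!is_array($cond)) {
            if (!$exists || $value !== $cond) {
                return false;
            }
            continue;
        }
        // An array condition is read as operator => argument pairs.
        foreach ($cond as $op => $arg) {
            switch ($op) {
                case '$gt':     $ok = $exists && $value >  $arg; break;
                case '$gte':    $ok = $exists && $value >= $arg; break;
                case '$lt':     $ok = $exists && $value <  $arg; break;
                case '$lte':    $ok = $exists && $value <= $arg; break;
                // Like MongoDB, $ne and $nin also match when the field is absent.
                case '$ne':     $ok = !$exists || $value !== $arg; break;
                case '$in':     $ok = $exists && in_array($value, $arg, true); break;
                case '$nin':    $ok = !$exists || !in_array($value, $arg, true); break;
                case '$exists': $ok = $exists === (bool) $arg; break;
                case '$size':   $ok = is_array($value) && count($value) === $arg; break;
                default: throw new InvalidArgumentException("unsupported operator $op");
            }
            if (!$ok) {
                return false;
            }
        }
    }
    return true;
}

$docs = array(
    array('field' => 5, 'enable' => 1, 'worth' => 3),
    array('field' => 6, 'enable' => 0, 'worth' => 1),
    array('field' => 9, 'enable' => 1, 'worth' => 2),
);
$filter = array(
    'field'  => array('$in' => array(5, 6, 7)),
    'enable' => 1,
    'worth'  => array('$lt' => 5),
);
$hits = array_values(array_filter($docs, fn($d) => matches($d, $filter)));
print_r($hits); // only the first document matches
```

Every condition in the filter must hold, which is why the filter behaves like SQL conditions joined with AND.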
  • 36. pecl install mongo
  • 37. MongoDB - Connection
      <?php
      /* connect to localhost:27017 */
      $connection = new Mongo();
      /* connect to a remote host (default port) */
      $connection = new Mongo("example.com");
      /* connect to a remote host at a given port */
      $connection = new Mongo("example.com:65432");
      /* select a DB (it is created if it doesn't exist yet) */
      $db = $connection->selectDB("db name");
      ?>
  • 38. MongoDB - "Tables"
      <?php
      $db = $connection->selectDB("db name");
      /* collections play the role of tables */
      $table = $db->selectCollection("table");
      ?>
  • 39. FROM SQL to MongoDB
  • 40. MongoDB - Count
      <?php
      /* SELECT count(*) FROM table */
      $collection->count();
      /* SELECT count(*) FROM table WHERE foo = 1 */
      $collection->find(array("foo" => 1))->count();
      ?>
  • 41. MongoDB - Queries
      <?php
      /*
       * SELECT * FROM table WHERE field IN (5,6,7) AND enable = 1
       *   AND worth < 5
       * ORDER BY timestamp DESC
       */
      $collection->ensureIndex(
          array('field' => 1, 'enable' => 1, 'worth' => 1, 'timestamp' => -1)
      );
      $filter = array(
          'field'  => array('$in' => array(5, 6, 7)),
          'enable' => 1,
          'worth'  => array('$lt' => 5),
      );
      $results = $collection->find($filter)->sort(array('timestamp' => -1));
      ?>
  • 42. MongoDB - Pagination
      <?php
      /*
       * SELECT * FROM table WHERE field IN (5,6,7) AND enable = 1
       *   AND worth < 5
       * ORDER BY timestamp DESC LIMIT $offset, 20
       */
      $filter = array(
          'field'  => array('$in' => array(5, 6, 7)),
          'enable' => 1,
          'worth'  => array('$lt' => 5),
      );
      $cursor = $collection->find($filter);
      $cursor = $cursor->sort(array('timestamp' => -1))->skip($offset)->limit(20);
      foreach ($cursor as $result) {
          var_dump($result);
      }
      ?>
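`sort()->skip()->limit()` maps onto `ORDER BY ... LIMIT offset, count`. Here is a plain-PHP picture of the same pipeline, with a hypothetical `page()` helper and made-up rows:

```php
<?php
// Plain-PHP picture of find()->sort(array('timestamp' => -1))->skip($offset)->limit($limit):
// sort descending, drop the first $offset rows, keep the next $limit rows.
function page(array $rows, string $field, int $offset, int $limit): array
{
    usort($rows, fn($a, $b) => $b[$field] <=> $a[$field]); // DESC, like -1
    return array_slice($rows, $offset, $limit);
}

$rows = array();
for ($i = 1; $i <= 45; $i++) {
    $rows[] = array('id' => $i, 'timestamp' => 1000 + $i);
}

$second = page($rows, 'timestamp', 20, 20); // "page 2", i.e. LIMIT 20, 20
echo $second[0]['id'], ' .. ', end($second)['id'], "\n"; // 25 .. 6
```

The last page is simply shorter: with 45 rows, an offset of 40 leaves 5.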
  • 43. Thinking in documents
  • 44.
  • 45. MongoDB - Data structure
      <?php
      $post = array(
          "title"    => "...",
          "body"     => "...",
          "uri"      => "...",
          "comments" => array(
              array(
                  "email"   => "...",
                  "name"    => "...",
                  "comment" => "...",
              ),
          ),
          "tags"     => array("tag1", "tag2"),
      );
      /* Creating indexes (they're important) */
      $collection->ensureIndex("uri");
      $collection->ensureIndex("comments.email");
      $collection->ensureIndex("tags");
  • 46. MongoDB - Data structure
      <?php
      /***
       * - SELECT * FROM posts WHERE uri = <uri>
       * - SELECT tags.tag FROM post_has_tags
       *     INNER JOIN tags ON (tags_id = tags.id) WHERE post_id = <post id>
       * - SELECT * FROM comments WHERE post = <post id>
       */
      $result = $collection->find(array("uri" => "<uri>"));
      ?>
  • 47. MongoDB
      <?php
      /***
       * SELECT posts.* FROM posts
       *   INNER JOIN comments ON (comments.post = posts.id)
       * WHERE comments.email = '<email>'
       */
      $filter = array(
          "comments.email" => 'crodas@member.fsf.org',
      );
      $result = $collection->find($filter);
      ?>
  • 48. MongoDB
      <?php
      /***
       * SELECT * FROM posts
       * WHERE id IN (SELECT post_id FROM post_has_tags
       *   INNER JOIN tags ON (tags_id = tags.id) WHERE tag = <tag>)
       */
      $filter = array(
          "tags" => '<tag>',
      );
      $result = $collection->find($filter);
      ?>
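These queries need no JOIN because a query on a field that holds a list is matched against the list's elements, and a dotted path like `comments.email` reaches into each embedded comment. The sketch below imitates that with hypothetical helpers `isList()`, `valuesAt()` and `matchesPath()`. It covers only this simple case (numeric path segments such as `comments.0.email` are not handled) and is not the server's algorithm; the sample post is made up.

```php
<?php
// True for PHP arrays with keys 0..n-1 (JSON-style lists).
function isList($value): bool
{
    return is_array($value)
        && ($value === array() || array_keys($value) === range(0, count($value) - 1));
}

// Collect every value reachable by $path, fanning out over lists on the way.
function valuesAt($value, array $path): array
{
    if ($path === array()) {
        // A list at the end of the path contributes its elements.
        return isList($value) ? $value : array($value);
    }
    if (isList($value)) {
        $out = array();
        foreach ($value as $item) {
            $out = array_merge($out, valuesAt($item, $path));
        }
        return $out;
    }
    $key = array_shift($path);
    if (!is_array($value) || !array_key_exists($key, $value)) {
        return array(); // the path does not exist in this document
    }
    return valuesAt($value[$key], $path);
}

// The document matches if ANY value found at the dotted path equals $expected.
function matchesPath(array $doc, string $dotted, $expected): bool
{
    return in_array($expected, valuesAt($doc, explode('.', $dotted)), true);
}

$post = array(
    'uri'      => '/thinking-in-documents',
    'tags'     => array('php', 'nosql'),
    'comments' => array(
        array('email' => 'someone@example.com', 'comment' => '...'),
        array('email' => 'crodas@member.fsf.org', 'comment' => '...'),
    ),
);
var_dump(matchesPath($post, 'comments.email', 'crodas@member.fsf.org')); // bool(true)
var_dump(matchesPath($post, 'tags', 'php'));                              // bool(true)
```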
  • 49. MongoDB
      <?php
      /***
       * SELECT * FROM posts WHERE id IN (
       *   SELECT post FROM comments GROUP BY post HAVING count(*) > 10)
       */
      /* $size only matches an exact array length, so it cannot express "> 10";
         requiring an 11th element (index 10) to exist does */
      $filter = array(
          "comments.10" => array('$exists' => true)
      );
      $result = $collection->find($filter);
      ?>
  • 50. MongoDB
      <?php
      /***
       * SELECT * FROM posts WHERE 10 < (
       *   SELECT count(*) FROM comments WHERE post = posts.id)
       */
      /* when a comment is inserted */
      $collection->update(
          array("uri" => "<uri>"),                       // select
          array('$inc' => array('comments_size' => 1))   // increment
      );
      $filter = array(
          "comments_size" => array('$gt' => 10)
      );
      $result = $collection->find($filter);
      ?>
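The counter approach is a form of denormalization: every write that adds a comment also bumps a number, so the read becomes a cheap `$gt` on one indexed field. Here is an in-memory sketch of that invariant, with a hypothetical `addComment()` that plays the role of `$push` plus `$inc` on made-up posts:

```php
<?php
// Every write that appends a comment also increments comments_size,
// so readers can filter on a plain number instead of counting array elements.
function addComment(array &$post, array $comment): void
{
    $post['comments'][] = $comment;                              // like '$push'
    $post['comments_size'] = ($post['comments_size'] ?? 0) + 1;  // like '$inc'
}

$posts = array(
    array('uri' => 'a', 'comments' => array(), 'comments_size' => 0),
    array('uri' => 'b', 'comments' => array(), 'comments_size' => 0),
);
for ($i = 0; $i < 12; $i++) {
    addComment($posts[0], array('email' => "user$i@example.com", 'comment' => '...'));
}
addComment($posts[1], array('email' => 'solo@example.com', 'comment' => '...'));

// Equivalent of find(array('comments_size' => array('$gt' => 10)))
$busy = array_filter($posts, fn($p) => $p['comments_size'] > 10);
echo implode(',', array_column($busy, 'uri')), "\n"; // a
```

The trade-off is that every code path that adds or removes a comment must keep the counter in step.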
  • 51. Map/Reduce (extra time)
  • 52. Map/Reduce -- Theory
      <?php
      $result = array();
      for ($i = 0; $i < 50; $i++) {
          $result[$i] = pow($i, 2);
      }
      var_dump($result);
      /***
       * IF pow takes 1 second
       *   1 process    = 50 seconds
       *   10 processes = 5 seconds
       */
      ?>
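The 10-process estimate assumes each `pow()` call is independent of the others, so the input can be split into chunks and handed to separate workers. This sketch only shows the split-and-combine step with `array_chunk()`; the chunks run one after another here, and `work()` is a hypothetical stand-in for the slow call.

```php
<?php
// Process one chunk; nothing here depends on any other chunk.
function work(array $chunk): array
{
    $out = array();
    foreach ($chunk as $i) {
        $out[$i] = pow($i, 2); // stand-in for the slow 1-second call
    }
    return $out;
}

// 50 inputs -> 10 chunks of 5; each chunk could go to its own process or machine.
$chunks   = array_chunk(range(0, 49), 5);
$partials = array_map('work', $chunks);

// Combine the partial results; array_replace keeps each value under its input key.
$result = array_replace(...$partials);
echo count($chunks), " chunks, result[49] = ", $result[49], "\n"; // 10 chunks, result[49] = 2401
```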
  • 53. Map/Reduce -- Theory II
      <?php
      $data = range(1, 1000);
      $tmp  = array();
      /* MAP: group each value under the key (value % 10) */
      foreach ($data as $key => $value) {
          $n_key = $value % 10;
          /* append */
          $tmp[$n_key][] = $value;
      }
      /* REDUCE: sum each group */
      foreach ($tmp as $key => $value) {
          $value = array_sum($value);
          print "{$key} = {$value}\n";
      }
      ?>
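The map and reduce steps above can be pulled out into callbacks, giving a tiny generic helper. `mapReduce()` below is a hypothetical in-memory sketch (single process, no distribution): the map callback returns `[key, value]` pairs, the pairs are grouped by key, and the reduce callback folds all values for one key.

```php
<?php
function mapReduce(array $data, callable $map, callable $reduce): array
{
    // MAP + group by key (the "shuffle" step)
    $groups = array();
    foreach ($data as $item) {
        foreach ($map($item) as [$key, $value]) {
            $groups[$key][] = $value;
        }
    }
    // REDUCE each group independently
    $out = array();
    foreach ($groups as $key => $values) {
        $out[$key] = $reduce($key, $values);
    }
    ksort($out);
    return $out;
}

$sums = mapReduce(
    range(1, 1000),
    fn($n) => array(array($n % 10, $n)),           // key = last digit
    fn($key, array $values) => array_sum($values)
);
foreach ($sums as $key => $sum) {
    echo "{$key} = {$sum}\n";
}
```

It prints the same ten sums as the loop above, in key order, for example `0 = 50500` and `1 = 49600`. Because each group is reduced on its own, the reduce step is the part that could be spread across workers.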
  • 54. Questions?
  • 55. Thank you fellows!
  • 56. @crodas crodas.org
  • 57. Powered by...
