Introduction to NoSQL databases in general, focusing on MongoDB.

1. Thinking in Documents (dropping ACID)
   César D. Rodas
   crodas@member.fsf.org
   http://crodas.org/
   PHP Conference 2009, São Paulo, Brazil

2. Who is this fellow?
   Paraguayan
   Part of the Google Summer of Code 2008
   PHP Classes Innovation Award winner (2007, 2008)
   ... and a few other things
   @crodas - http://crodas.org/ - LaTeX

3. Agenda
   How to scale
   The Web's major bottleneck
   NoSQL databases
     • Redis
     • Tokyo Cabinet
     • Cassandra
     • CouchDB
     • MongoDB
   Thinking in documents
     • Data behavior
     • Complex operations
   PHP integration (the fun part!)
   Map/Reduce (extra time)

4. Scaling?

5. Increase computational power

6. To make it reliable

7. DISTRIBUTED

8. How to scale
   Buying more hardware (and connectivity)
   Reverse (threaded) proxies
   DNS round robin for your reverse proxies
   Gearmand
   Memcached
   And... what about the data?

9. How to scale data?

10. The hardest way

11. Scaling RDBMS - Solutions
   Master-Slave replication
   Multi-Master replication
   Data sharding
   DRBD and Heartbeat (RAID-1 over the network)

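Data sharding means the application, or a layer in front of it, must decide which server owns each row. Below is a minimal sketch of the simplest scheme, hashing the key modulo the number of servers. It assumes PHP 7+, and the helper `pick_shard()` and the server names are hypothetical.

```php
<?php
/*
 * Hash-based sharding sketch: each key is routed to one of N servers.
 * pick_shard() and the server names are hypothetical.
 */
function pick_shard($key, array $servers)
{
    /* crc32() is stable for the same key; the mask keeps the value
     * non-negative even on 32-bit builds of PHP */
    $hash = crc32($key) & 0x7FFFFFFF;
    return $servers[$hash % count($servers)];
}

$servers = array("db1.example.com", "db2.example.com", "db3.example.com");

/* the same key always lands on the same server */
$server = pick_shard("user:42", $servers);
```

With plain modulo hashing, adding a fourth server moves most keys to a different shard. Real deployments therefore tend to use consistent hashing or range-based lookup tables instead. MongoDB's autosharding, for instance, splits a collection into ranges of a shard key.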
13. Master-Slave replication
   We need to modify our app
   It is only worth it if our application is read-intensive
   It doesn't spread the data across servers
   The master is a single point of failure

14. Scaling RDBMS - Problems
   SQL
   JOIN
   Autoincrement
   Transactions (ACID)

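Autoincrement is a problem because every insert has to ask one central counter for the next number. One common way around it is to let every node build unique ids on its own. The sketch below is loosely modelled on the 12-byte layout of MongoDB's ObjectId: a 4-byte timestamp, 5 random bytes chosen once per process, and a 3-byte counter. The function `next_id()` is hypothetical and assumes PHP 7+ for `random_bytes()` and `random_int()`.

```php
<?php
/*
 * Coordination-free id sketch (hypothetical helper), loosely modelled on
 * MongoDB's ObjectId layout: 4-byte timestamp + 5 random bytes + 3-byte
 * counter. No central server is needed to hand out ids.
 */
function next_id()
{
    static $random = null;
    static $counter = null;

    if ($random === null) {
        /* chosen once per process: identifies this generator */
        $random = random_bytes(5);
        $counter = random_int(0, 0xFFFFFF);
    }
    $counter = ($counter + 1) & 0xFFFFFF;

    $bytes = pack("N", time())               /* 4 bytes, big endian */
           . $random                          /* 5 bytes */
           . substr(pack("N", $counter), 1);  /* low 3 bytes of the counter */

    return bin2hex($bytes);                   /* 24 hex characters */
}
```

Because the timestamp comes first, ids produced this way sort roughly by creation time. That property is often what autoincrement was being used for in the first place.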
15. The easiest way

16. CAP Theorem
   Strong Consistency, High Availability, Partition tolerance

17. BASE
   Basically Available, Soft state, Eventually consistent

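"Soft state, eventually consistent" means replicas may disagree for a while and converge once they exchange updates. Below is a minimal sketch of one common convergence rule, last write wins, on plain PHP arrays. The helper names are hypothetical. The integer timestamps are supplied by the caller, and ties are not resolved here. Real systems break ties, for example by replica id.

```php
<?php
/*
 * Last-write-wins sketch: each replica keeps (value, ts) per key,
 * accepts writes independently, and merges by keeping the newest write.
 * lww_write()/lww_merge() are hypothetical helpers; ties are not handled.
 */
function lww_write(array &$replica, $key, $value, $ts)
{
    if (!isset($replica[$key]) || $replica[$key]['ts'] < $ts) {
        $replica[$key] = array('value' => $value, 'ts' => $ts);
    }
}

function lww_merge(array $a, array $b)
{
    /* $a is a copy, so the caller's replica is left untouched */
    foreach ($b as $key => $entry) {
        lww_write($a, $key, $entry['value'], $entry['ts']);
    }
    return $a;
}

$r1 = array();
$r2 = array();
lww_write($r1, "title", "PHP rules", 1);   /* write accepted by replica 1 */
lww_write($r2, "title", "PHP rocks", 2);   /* later write on replica 2 */

/* before syncing the replicas disagree (soft state); after exchanging
 * state in either direction they converge on the newest write */
$m1 = lww_merge($r1, $r2);
$m2 = lww_merge($r2, $r1);
```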
18. Everybody is doing it
   Google
   Amazon
   eBay
   Yahoo!
   Facebook
   ...

19. Open implementations
   Cassandra
   Redis
   Tokyo Cabinet/Tyrant
   CouchDB
   MongoDB (FTW!)
   ...

20. Cassandra
   No master (p2p)
   Storage model closer to BigTable
   Open source
   Incrementally scalable
   PHP interface (via Thrift)
   I haven't played much with it

21. Key-value

22. Key-value
   Fast
   Similar to PHP's array
   Simple
   Easy to distribute across machines

23. Memcached
   A key-value store engine used as a cache
   No persistence (RAM only, LRU eviction)
   Lightning fast
   Well supported
   *Everybody* is using it
   Several clients for PHP [even I wrote one ;-)]

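Memcached keeps everything in RAM and evicts the least recently used (LRU) entries when memory runs out. The sketch below reproduces that eviction policy with an item count standing in for a memory limit. It relies on the fact that PHP arrays keep insertion order. `LruCache` is a hypothetical class, not Memcached's implementation.

```php
<?php
/*
 * LRU cache sketch (hypothetical class). PHP arrays keep insertion order,
 * so if every access moves its key to the end, the first element is
 * always the least recently used one.
 */
class LruCache
{
    private $capacity;
    private $items = array();

    public function __construct($capacity)
    {
        $this->capacity = $capacity;
    }

    public function get($key)
    {
        if (!array_key_exists($key, $this->items)) {
            return null;
        }
        /* move the key to the end: it is now the most recently used */
        $value = $this->items[$key];
        unset($this->items[$key]);
        $this->items[$key] = $value;
        return $value;
    }

    public function set($key, $value)
    {
        unset($this->items[$key]);
        $this->items[$key] = $value;
        if (count($this->items) > $this->capacity) {
            /* evict the least recently used entry (the first one) */
            reset($this->items);
            unset($this->items[key($this->items)]);
        }
    }
}

$cache = new LruCache(2);
$cache->set("a", 1);
$cache->set("b", 2);
$cache->get("a");      /* "a" becomes the most recently used key */
$cache->set("c", 3);   /* over capacity: evicts "b" */
```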
24. Redis
   Very new
   As fast as Memcached
   Persistent to disk
   Very simple protocol
   Supports lists and sets
   Replication
   Operations on the key space
   I loved it!
     • Until I realised it is an in-memory DB

25. Tokyo Tyrant
   Very similar to BerkeleyDB (dba_open())
   Performs well (I've been playing a bit with it)
   Actively developed
   HTTP interface (+/-)
   Memcached protocol (++)
   Moving towards document orientation (supports "tables")

26. Document-oriented DB

27. Photo: http://www.flickr.com/photos/beglen/152027605/

28. What is a "Document"?
   <?php
   $collection[$id] = array(
       "title"    => "PHP rules",
       "tags"     => array("php", "web"),
       "body"     => "... PHP rules ...",
       "comments" => array(
           array("author" => "crodas", "comment" => "Yes it does"),
       ),
   );
   ?>

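A document is just a nested structure, so the PHP array on the slide maps directly onto JSON. The sketch below round-trips it through `json_encode()` and `json_decode()` to show the shape a document database stores. MongoDB actually stores BSON, a binary JSON-like format, so this shows the shape only, not the wire format.

```php
<?php
/* the slide's document as a plain nested PHP array */
$document = array(
    "title"    => "PHP rules",
    "tags"     => array("php", "web"),
    "body"     => "... PHP rules ...",
    "comments" => array(
        array("author" => "crodas", "comment" => "Yes it does"),
    ),
);

/* lists become JSON arrays, associative arrays become JSON objects */
$json = json_encode($document);

/* and back again; true asks for associative arrays instead of objects */
$copy = json_decode($json, true);
```

Embedded lists such as `tags` and `comments` travel with the document. This is what removes the need for the join tables used in the SQL versions later on.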
29. Document Databases
   Schema free
   Document versioning
   Improved key-value store
   Great for storing objects

31. CouchDB
   Apache project
   Asynchronous replication
   JSON-based (XML free!)
   RESTful interface (might be bad)
   Views are materialized on demand (not indexes :-( )
   Cool admin
   Safe IO (append only)
   Distributed (concurrent) by nature (written in Erlang)

34. MongoDB
   Forget about what its name means in Portuguese
   Fast, fast, fast
   JSON and BSON (Binary JSON-ish)
   Asynchronous replication, autosharding
   Supports indexes (FTW!)
   Nested documents (FTW!)
   Advanced queries (FTW!)
   Native extension for PHP

35. MongoDB - Advanced
   Select
     • $gt, $lt, $gte, $lte, $eq, $ne: >, <, >=, <=, ==, !=
     • $in, $nin
     • $size, $exists
     • group()
     • limit()
     • skip()
     • ...
   Update
     • $push
     • $pull
     • $inc
     • ...

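The semantics of several of these select operators can be illustrated with a toy matcher over a plain PHP array. This is a sketch of the meaning for top-level fields only, not the server's code. The function `matches()` is hypothetical. Note that MongoDB spells not-equal `$ne`. A missing field satisfies `$ne` and `$nin`, and `$size` compares against an exact length.

```php
<?php
/*
 * Toy matcher (hypothetical helper) for top-level fields: a plain value
 * means equality, an array of operators is checked operator by operator.
 */
function matches(array $doc, array $filter)
{
    foreach ($filter as $field => $condition) {
        $exists = array_key_exists($field, $doc);
        $value = $exists ? $doc[$field] : null;

        if (!is_array($condition)) {
            /* implicit equality, like WHERE field = value */
            if (!$exists || $value !== $condition) {
                return false;
            }
            continue;
        }
        foreach ($condition as $op => $arg) {
            switch ($op) {
                case '$gt':     $ok = $exists && $value >  $arg; break;
                case '$gte':    $ok = $exists && $value >= $arg; break;
                case '$lt':     $ok = $exists && $value <  $arg; break;
                case '$lte':    $ok = $exists && $value <= $arg; break;
                case '$ne':     $ok = $value !== $arg; break;
                case '$in':     $ok = $exists && in_array($value, $arg, true); break;
                case '$nin':    $ok = !in_array($value, $arg, true); break;
                case '$exists': $ok = $exists === (bool) $arg; break;
                case '$size':   $ok = is_array($value) && count($value) === $arg; break;
                default:
                    throw new InvalidArgumentException("unsupported operator $op");
            }
            if (!$ok) {
                return false;
            }
        }
    }
    return true;
}

/* WHERE field IN (5,6,7) AND enable = 1 AND worth < 5 */
$filter = array(
    "field"  => array('$in' => array(5, 6, 7)),
    "enable" => 1,
    "worth"  => array('$lt' => 5),
);
$doc = array("field" => 6, "enable" => 1, "worth" => 3);
```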
36. pecl install mongo

37. MongoDB - Connection
   <?php
   /* connects to localhost:27017 */
   $connection = new Mongo();

   /* connect to a remote host (default port) */
   $connection = new Mongo("example.com");

   /* connect to a remote host at a given port */
   $connection = new Mongo("example.com:65432");

   /* select a DB (it is created if it doesn't exist yet) */
   $db = $connection->selectDB("db_name");
   ?>

38. MongoDB - "Tables"
   <?php
   $db = $connection->selectDB("db_name");
   /* a collection plays the role of a table */
   $collection = $db->selectCollection("table");
   ?>

39. FROM SQL to MongoDB

40. MongoDB - Count
   <?php
   /* SELECT count(*) FROM table */
   $collection->count();

   /* SELECT count(*) FROM table WHERE foo = 1 */
   $collection->find(array("foo" => 1))->count();
   ?>

41. MongoDB - Queries
   <?php
   /*
    * SELECT * FROM table WHERE field IN (5,6,7) AND enable = 1
    * AND worth < 5
    * ORDER BY timestamp DESC
    */
   $collection->ensureIndex(
       array('field' => 1, 'enable' => 1, 'worth' => 1, 'timestamp' => -1)
   );

   $filter = array(
       'field'  => array('$in' => array(5, 6, 7)),
       'enable' => 1,
       'worth'  => array('$lt' => 5),
   );
   $results = $collection->find($filter)->sort(array('timestamp' => -1));

42. MongoDB - Pagination
   <?php
   /*
    * SELECT * FROM table WHERE field IN (5,6,7) AND enable = 1
    * AND worth < 5
    * ORDER BY timestamp DESC LIMIT $offset, 20
    */
   $filter = array(
       'field'  => array('$in' => array(5, 6, 7)),
       'enable' => 1,
       'worth'  => array('$lt' => 5),
   );
   $cursor = $collection->find($filter);
   $cursor = $cursor->sort(array('timestamp' => -1))->skip($offset)->limit(20);
   foreach ($cursor as $result) {
       var_dump($result);
   }

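To make the cursor chain concrete, the sketch below emulates what `find($filter)->sort(...)->skip($offset)->limit(20)` computes, using `array_filter()`, `usort()` and `array_slice()` on in-memory rows. The helper `paginate()` and the 50 generated rows are hypothetical. The server does the same work using the index rather than a full scan, and the sketch assumes PHP 7+ for the `<=>` operator.

```php
<?php
/*
 * Emulation of find()->sort()->skip()->limit() on a PHP array
 * (hypothetical helper; semantics only).
 */
function paginate(array $rows, callable $filter, $sortField, $offset, $limit)
{
    /* WHERE ... */
    $matched = array_values(array_filter($rows, $filter));
    /* ORDER BY $sortField DESC */
    usort($matched, function ($a, $b) use ($sortField) {
        return $b[$sortField] <=> $a[$sortField];
    });
    /* LIMIT $offset, $limit */
    return array_slice($matched, $offset, $limit);
}

/* 50 hypothetical rows with increasing timestamps */
$rows = array();
for ($i = 1; $i <= 50; $i++) {
    $rows[] = array("field" => 5 + $i % 3, "enable" => 1, "worth" => $i % 10, "timestamp" => $i);
}

$filter = function ($row) {
    return in_array($row["field"], array(5, 6, 7), true)
        && $row["enable"] === 1
        && $row["worth"] < 5;
};

$page1 = paginate($rows, $filter, "timestamp", 0, 20);
$page2 = paginate($rows, $filter, "timestamp", 20, 20);
```

One design note: `skip()` still has to walk past every skipped entry. For deep pages, filtering on the last timestamp seen (for example `'timestamp' => array('$lt' => $last)`) usually scales better than a growing offset.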
43. Thinking in documents

45. MongoDB - Data structure
   <?php
   $post = array(
       "title"    => "...",
       "body"     => "...",
       "uri"      => "...",
       "comments" => array(
           array(
               "email"   => "...",
               "name"    => "...",
               "comment" => "...",
           ),
       ),
       "tags"     => array("tag1", "tag2"),
   );

   /* Creating indexes (they're important) */
   $collection->ensureIndex("uri");
   $collection->ensureIndex("comments.email");
   $collection->ensureIndex("tags");

46. MongoDB - Data structure
   <?php
   /***
    * One find() replaces these three queries:
    * - SELECT * FROM posts WHERE uri = <uri>
    * - SELECT tags.tag FROM posts_has_tags
    *   INNER JOIN tags ON (tags_id = tags.id) WHERE post_id = <post_id>
    * - SELECT * FROM comments WHERE post = <post_id>
    */
   $result = $collection->find(array("uri" => "<uri>"));
   ?>

47. MongoDB
   <?php
   /***
    * SELECT posts.* FROM posts
    * INNER JOIN comments ON (comments.post = posts.id)
    * WHERE comments.email = '<email>'
    */
   $filter = array(
       "comments.email" => 'crodas@member.fsf.org',
   );
   $result = $collection->find($filter);
   ?>

48. MongoDB
   <?php
   /***
    * SELECT * FROM posts
    * WHERE id IN (SELECT post_id FROM posts_has_tags
    *   INNER JOIN tags ON (tags_id = tags.id) WHERE tag = <tag>)
    */
   $filter = array(
       "tags" => '<tag>',
   );
   $result = $collection->find($filter);
   ?>

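The two queries above rely on how matching works on embedded arrays. `"tags" => '<tag>'` matches when the array contains that value. `"comments.email" => X` matches when any embedded comment has that email. The sketch below emulates that dotted-path behavior in pure PHP. `field_values()` and `matches_value()` are hypothetical helpers, not the server's implementation.

```php
<?php
/*
 * Dotted-path sketch (hypothetical helpers): walking "comments.email"
 * descends into every element of a list, and an equality test on a list
 * field succeeds when any element is equal.
 */
function field_values($doc, $path)
{
    $values = array($doc);
    foreach (explode('.', $path) as $part) {
        $next = array();
        foreach ($values as $value) {
            if (!is_array($value)) {
                continue;
            }
            if (array_key_exists($part, $value)) {
                $next[] = $value[$part];
            } elseif (array_values($value) === $value) {
                /* a list of sub-documents: look inside every element */
                foreach ($value as $item) {
                    if (is_array($item) && array_key_exists($part, $item)) {
                        $next[] = $item[$part];
                    }
                }
            }
        }
        $values = $next;
    }
    return $values;
}

function matches_value($doc, $path, $wanted)
{
    foreach (field_values($doc, $path) as $value) {
        /* plain equality, or membership when the field holds a list */
        if ($value === $wanted || (is_array($value) && in_array($wanted, $value, true))) {
            return true;
        }
    }
    return false;
}

$post = array(
    "title"    => "PHP rules",
    "tags"     => array("php", "web"),
    "comments" => array(
        array("email" => "crodas@member.fsf.org", "comment" => "Yes it does"),
        array("email" => "someone@example.com", "comment" => "+1"),
    ),
);
```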
49. MongoDB
   <?php
   /***
    * SELECT * FROM posts WHERE id IN (
    *   SELECT post FROM comments GROUP BY post HAVING count(*) > 10)
    */
   /* $size only matches an exact array length and cannot be combined
    * with $gt, so check whether an 11th comment (index 10) exists */
   $filter = array(
       "comments.10" => array('$exists' => true),
   );
   $result = $collection->find($filter);
   ?>

50. MongoDB
   <?php
   /***
    * SELECT * FROM posts WHERE 10 < (
    *   SELECT count(*) FROM comments WHERE post = posts.id)
    */
   /* when a comment is inserted, also increment a counter */
   $collection->update(
       array("uri" => "<uri>"),                       // select
       array('$inc' => array('comments_size' => 1))   // increment
   );

   $filter = array(
       "comments_size" => array('$gt' => 10),
   );
   $result = $collection->find($filter);

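The counter approach works because a single update can both append the comment and bump the counter. In MongoDB, updates to a single document are atomic, so the two cannot drift apart. The sketch below emulates `'$push'` and `'$inc'` on a plain PHP array to show the resulting document. `apply_update()` is a hypothetical helper. Combining `$push` with `$inc` in one update is an assumption about how the insert would be written; the slide shows only the `$inc`.

```php
<?php
/*
 * Emulation of '$push' and '$inc' on a plain array (hypothetical helper;
 * semantics only). The counter is a denormalized copy of count(comments).
 */
function apply_update(array $doc, array $update)
{
    foreach ($update as $op => $fields) {
        foreach ($fields as $field => $arg) {
            switch ($op) {
                case '$inc':
                    $doc[$field] = (isset($doc[$field]) ? $doc[$field] : 0) + $arg;
                    break;
                case '$push':
                    $doc[$field][] = $arg;
                    break;
                default:
                    throw new InvalidArgumentException("unsupported operator $op");
            }
        }
    }
    return $doc;
}

$post = array("uri" => "<uri>", "comments" => array());
for ($i = 1; $i <= 11; $i++) {
    $post = apply_update($post, array(
        '$push' => array("comments" => array("comment" => "comment #$i")),
        '$inc'  => array("comments_size" => 1),
    ));
}

/* same test as array("comments_size" => array('$gt' => 10)) */
$popular = $post["comments_size"] > 10;
```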
51. Map/Reduce
   Extra time

52. Map/Reduce -- Theory
   <?php
   $result = array();
   for ($i = 0; $i < 50; $i++) {
       $result[$i] = pow($i, 2);
   }

   var_dump($result);

   /***
    * IF pow takes 1 second
    * 1 process = 50 seconds
    * 10 processes = 5 seconds
    */
   ?>

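The "10 processes = 5 seconds" argument only holds because each `pow()` call is independent of the others. The sketch below makes that split explicit. It cuts the 50 inputs into 10 chunks of 5, computes each chunk on its own, and merges the partial results. Here the chunks run one after another so the example stays self-contained. The helper `square_chunk()` is hypothetical, and sending each chunk to a separate process or machine is left as an assumption.

```php
<?php
/* work for one chunk; each call could run in a different process */
function square_chunk(array $chunk)
{
    $out = array();
    foreach ($chunk as $i) {
        $out[$i] = pow($i, 2);
    }
    return $out;
}

/* 10 independent chunks of 5 numbers each */
$chunks = array_chunk(range(0, 49), 5);
$partials = array_map('square_chunk', $chunks);

/* merge the partial results; keys are the original indexes */
$result = array();
foreach ($partials as $partial) {
    $result += $partial;
}
```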
53. Map/Reduce -- Theory II
   <?php
   $data = range(1, 1000);

   /* MAP: group every value by its last digit */
   $tmp = array();
   foreach ($data as $key => $value) {
       $n_key = $value % 10;
       /* append */
       $tmp[$n_key][] = $value;
   }

   /* REDUCE: sum each group */
   foreach ($tmp as $key => $value) {
       $value = array_sum($value);
       print "{$key} = {$value}\n";
   }

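The same example can be written with the map and reduce steps passed in as callbacks. This mirrors the general shape of the model: map emits (key, value) pairs, the pairs are grouped by key, and reduce folds each group. `map_reduce()` below is a hypothetical helper written for this sketch, not a PHP or MongoDB function.

```php
<?php
/*
 * Generic map/reduce sketch (hypothetical helper). $map returns a list of
 * (key, value) pairs; $reduce receives one key and all of its values.
 */
function map_reduce(array $data, callable $map, callable $reduce)
{
    $groups = array();
    foreach ($data as $item) {
        foreach ($map($item) as $pair) {
            list($key, $value) = $pair;
            $groups[$key][] = $value;
        }
    }
    $result = array();
    foreach ($groups as $key => $values) {
        $result[$key] = $reduce($key, $values);
    }
    return $result;
}

$sums = map_reduce(
    range(1, 1000),
    function ($value) {
        return array(array($value % 10, $value));   /* emit (last digit, value) */
    },
    function ($key, array $values) {
        return array_sum($values);
    }
);
ksort($sums);
```

Because each reduce call only sees one key's values, the groups could be reduced on different machines. That independence is what makes the model distribute well.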
54. Questions?

55. Thank you fellows!

56. @crodas
   crodas.org

57. Powered by...