CouchConf-SF-Introduction-to-MapReduce

  1. Wednesday, August 3, 11
  2. Introduction to MapReduce
  3. A View Is:
       - A named pair of functions: a map function and an optional reduce function
       - An entry in a design document
       - A disk file of indexed results
  4. A Map Function Is:
       - Called with every database document
       - An emitter of a key and a value
  5. A Reduce Function Is:
       - Called once with the map results
       - A simplifier (it reduces map output)
  6. Example "President" Document (1 of 44)
          {
            "_id": "5d5f25254ef8fd62d6b9f2db642a8fc2",
            "_rev": "1-157b2928bec2def71485cc751af7de37",
            "type": "president",
            "presidency": 1,
            "name": "George Washington",
            "wikipedia_entry": "http://en.wikipedia.org/wiki/George_Washington",
            "took_office": 1789,
            "left_office": 1797,
            "party": "Independent",
            "home_state": "Virginia"
          }
  7. Example "Event" Document (1 of 883)
          {
            "_id": "5d5f25254ef8fd62d6b9f2db642a9f7d",
            "_rev": "1-9630b35932dedbd4d31138aaf3385847",
            "type": "event",
            "year": 1791,
            "event": "The independent Vermont Republic becomes the 14th state"
          }
  8. Design Document (the "_design/" prefix of the _id is special; you choose the name that follows it)
          {
            ...
            "_id": "_design/design_document",
            "_rev": "1-9630b35932dedbd4d31138aaf3385847",
            "views": {
              "party_state_name": { "map": "function ... ", "reduce": " ... " },
              "president_events": { "map": "function ... " },
              "president_names": { "map": "function ... " },
              "presidents": { "map": "function ... ", "reduce": " ... " },
              "time_in_office": { "map": "function .... ", "reduce": " ... " },
              "total_time_in_office": { "map": "function .... ", "reduce": " ... " }
            },
            ...
          }
  9. president_names
  10. Invoke a View
  11. Invoke a View
          curl -X GET http://localhost:5984/presidents/_design/design_doc/_view/president_names
          {"total_rows":44,"offset":0,"rows":[
            {"id":"...","key":1789,"value":"George Washington"},
            {"id":"...","key":1797,"value":"John Adams"},
            {"id":"...","key":1801,"value":"Thomas Jefferson"},
            {"id":"...","key":1809,"value":"James Madison"},
            ...
          ]}
      The id of the emitting document is always included in each row.
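The transcript does not preserve the body of the president_names map function. The sketch below is one function that would be consistent with the rows shown on slide 11 (key = took_office, value = name), plus a tiny in-memory stand-in for the view engine. `presidentNamesMap`, `buildView`, the explicit `emit` argument, and the short document ids are all assumptions made for illustration; in CouchDB, `emit` is provided by the query server and rows are stored on disk.

```javascript
// Hypothetical map function: emits one row per "president" document.
// In CouchDB `emit` is a global; it is passed in here so the sketch runs standalone.
function presidentNamesMap(doc, emit) {
  if (doc.type === "president") {
    emit(doc.took_office, doc.name);
  }
}

// Minimal stand-in for a view: call the map function with every document,
// record the emitting document's id with each row, then sort rows by key.
function buildView(docs, mapFn) {
  const rows = [];
  for (const doc of docs) {
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  rows.sort((a, b) => a.key - b.key); // numeric keys only in this sketch
  return { total_rows: rows.length, offset: 0, rows };
}

// Illustrative documents (ids are placeholders, not real CouchDB ids).
const docs = [
  { _id: "b", type: "president", name: "John Adams", took_office: 1797 },
  { _id: "e", type: "event", year: 1791, event: "The independent Vermont Republic becomes the 14th state" },
  { _id: "a", type: "president", name: "George Washington", took_office: 1789 },
];

const result = buildView(docs, presidentNamesMap);
console.log(JSON.stringify(result.rows));
```

The event document produces no row because the map function only emits for documents whose type is "president".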
  12. Under the Hood: Views
      [Diagram. Request: http://localhost:5984/presidents/_design/design_doc/_view/president_names. Response: {"total_rows":44, "offset":0, "rows":[...]}. Labeled components: Erlang, HTTP, mod_couch, query server, CouchDB, storage engine, view, Spidermonkey, Disk, ICU.]
  13. Under the Hood: Views
      [Diagram. Response: {"total_rows":44, "offset":0, "rows":[...]} with rows {"id":"...","key":1789,"value":"George Washington"}, {"id":"...","key":1797,"value":"John Adams"}, ... Labeled components: CouchDB, Spidermonkey, Disk, ICU.]
  14. Fetch Documents Matching a Key
          GET http://localhost:5984/presidents/_design/design_doc/_view/president_names?key=1993
          {"total_rows":44,"offset":41,"rows":[
            {"id":"...","key":1993,"value":"Bill Clinton"}
          ]}
      Annotations: the key parameter can be any valid JSON; total_rows is the total number of view rows; offset is the offset into the rows; the returned row has the matching key.
  15. Get a Key Range of Documents
          GET http://localhost:5984/presidents/_design/design_doc/_view/president_names?startkey=1790&endkey=1810
          {"total_rows":44,"offset":1,"rows":[
            {"id":"...","key":1797,"value":"John Adams"},
            {"id":"...","key":1801,"value":"Thomas Jefferson"},
            {"id":"...","key":1809,"value":"James Madison"}
          ]}
  16. Limit the Number of Documents
          GET http://localhost:5984/presidents/_design/design_doc/_view/president_names?limit=2
          {"total_rows":44,"offset":0,"rows":[
            {"id":"...","key":1789,"value":"George Washington"},
            {"id":"...","key":1797,"value":"John Adams"}
          ]}
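Since key, startkey, and endkey accept any valid JSON, string and array keys need to be JSON-encoded and then URL-encoded before they go into the query string. The helper below is a minimal sketch of that step; `viewUrl` and its parameter handling are assumptions for illustration, not part of CouchDB or of the talk.

```javascript
// Hypothetical helper: build a view URL from a plain object of query options.
// key, startkey and endkey are JSON-encoded; other options are sent as-is.
function viewUrl(base, params = {}) {
  const jsonParams = new Set(["key", "startkey", "endkey"]);
  const query = Object.entries(params)
    .map(([name, value]) => {
      const text = jsonParams.has(name) ? JSON.stringify(value) : String(value);
      return `${encodeURIComponent(name)}=${encodeURIComponent(text)}`;
    })
    .join("&");
  return query ? `${base}?${query}` : base;
}

const base =
  "http://localhost:5984/presidents/_design/design_doc/_view/president_names";

console.log(viewUrl(base, { key: 1993 }));
console.log(viewUrl(base, { startkey: 1790, endkey: 1810 }));
console.log(viewUrl(base, { key: "Abraham Lincoln" })); // string keys keep their JSON quotes
```

Numeric keys pass through unchanged, which is why the slides can write `?key=1993` directly; a string key such as a president's name would need the surrounding double quotes.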
  17. presidents (_count)
  18. Invoke a View
  19. Reduce: _count
          {
            "_id": "_design/design_doc",
            "_rev": "1-157b2928bec2def71485cc751af7de37",
            "views": {
              "presidents": {
                "map": "function(doc) { if (doc.type == 'president') { emit(doc.took_office, doc) } }",
                "reduce": "_count"
              }
            },
            ...
          }
  20. _count Function
          GET http://localhost:5984/presidents/_design/design_doc/_view/presidents
          {"rows":[
            {"key":null,"value":44}
          ]}
  21. total_time_in_office (_sum)
  22. Invoke a View
  23. A Map Function for _sum
          {
            "_id": "_design/design_doc",
            "_rev": "1-157b2928bec2def71485cc751af7de37",
            "views": {
              "total_time_in_office": {
                "map": "function(doc) { if (doc.type == 'president') { emit(doc.name, doc.left_office - doc.took_office) } }",
                "reduce": "_sum"
              }
            },
            ...
          }
      Annotations: the emitted value is the number of years in office; rows will be sorted by name.
  24. Reduce: _sum
          {
            "_id": "_design/design_doc",
            "_rev": "1-157b2928bec2def71485cc751af7de37",
            "views": {
              "total_time_in_office": {
                "map": "function(doc) { if (doc.type == 'president') { emit(doc.name, doc.left_office - doc.took_office) } }",
                "reduce": "_sum"
              }
            },
            ...
          }
      Annotation: _sum requires number values.
  25. _sum Function
          GET http://localhost:5984/presidents/_design/design_doc/_view/total_time_in_office
          {"rows":[
            {"key":null,"value":232}
          ]}
  26. time_in_office (_stats)
  27. Invoke a View
  28. Reduce: _stats
          {
            "_id": "_design/design_doc",
            "_rev": "1-157b2928bec2def71485cc751af7de37",
            "views": {
              "time_in_office": {
                "map": "function(doc) { if (doc.type == 'president') { emit(doc.name, doc.left_office - doc.took_office) } }",
                "reduce": "_stats"
              }
            },
            ...
          }
  29. _stats Function
          GET http://localhost:5984/presidents/_design/design_doc/_view/time_in_office
          {"rows":[
            {"key":null,
             "value":{"sum":232,"count":43,"min":0,"max":12,"sumsqr":1546}}
          ]}
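To make the three built-in reducers concrete, here is a rough JavaScript sketch of what each one computes over a list of emitted values. This is only an illustration: the real `_count`, `_sum`, and `_stats` are implemented inside CouchDB and also handle re-reducing partial results, which this sketch ignores. The sample values are the terms of the first four presidents in years.

```javascript
// Plain-JavaScript approximations of the built-in reduce functions.
const count = (values) => values.length;

const sum = (values) => values.reduce((total, v) => total + v, 0);

const stats = (values) => ({
  sum: sum(values),
  count: values.length,
  min: Math.min(...values),
  max: Math.max(...values),
  sumsqr: values.reduce((total, v) => total + v * v, 0), // sum of squares
});

// Years in office: Washington 8, Adams 4, Jefferson 8, Madison 8.
const years = [8, 4, 8, 8];

console.log(count(years));
console.log(sum(years));
console.log(stats(years));
```

The `sumsqr` field is what lets a client derive variance or standard deviation from a single `_stats` row.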
  30. View Trees
  31. Disk-Based View Tree
      k = size of interior node, n = number of keys, depth = log_k(n)
          root:            A-R
          interior nodes:  A-H | I-R
                           A-C | D-F | G-H | I-L | N-R
          leaves:          A B C | D F | G H | I K L | N O Q R
  32. _count Nodes
          root:        14
          reductions:  7 | 7
                       3 | 2 | 2 | 3 | 4
          keys:        A B C | D F | G H | I K L | N O Q R
  33. _count Nodes
          root:        A-R 14
          reductions:  A-H 7 | I-R 7
                       A-C 3 | D-F 2 | G-H 2 | I-L 3 | N-R 4
          keys:        A B C | D F | G H | I K L | N O Q R
  34. Inserting a New Document
          new root:        A-R 15
          new reductions:  I-R 8, M-R 5
          existing nodes:  A-R 14; A-H 7, I-R 7; A-C 3, D-F 2, G-H 2, I-L 3, N-R 4
          keys:            A B C D F G H I K L M N O Q R (M is the new key)
  35. Committing the Change
          nodes:  A-R 15; A-R 14; I-R 8; A-H 7, I-R 7; M-R 5; A-C 3, D-F 2, G-H 2, I-L 3, N-R 4
          keys:   A B C D F G H I K L M N O Q R
  36. Getting a Key Range
          root:        A-R 14
          reductions:  A-H 7 | I-R 7
                       A-C 3 | D-F 2 | G-H 2 | I-L 3 | M-R 5
          keys:        A B C D F G H I K L M N O Q R (startkey and endkey are marked on the keys)
  37. Key Range Reduction
      Parenthesized numbers are the partial reductions for the key range.
          root:        15 (8)
          reductions:  7 (3) | 8 (5)
                       3 | 2 (1) | 2 | 3 | 5 (2)
          keys:        A B C D F G H I K L M N O Q R (startkey and endkey are marked on the keys)
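The key range reduction on slide 37 can be sketched in a few lines: each node caches the reduction of its subtree, so a range query reuses the cached value for any node that lies entirely inside the range and only inspects individual keys at the edges. The code below is a simplified in-memory model of that idea with a `_count`-style reduction; `leaf`, `interior`, and `rangeCount` are hypothetical names, and CouchDB's actual on-disk B-tree is more involved. The range F to N is an assumption chosen because it reproduces the partial counts on the slide.

```javascript
// A leaf holds keys; an interior node holds children.
// Every node caches its reduction (here: the number of keys beneath it).
function leaf(keys) {
  return { keys, min: keys[0], max: keys[keys.length - 1], count: keys.length };
}

function interior(children) {
  return {
    children,
    min: children[0].min,
    max: children[children.length - 1].max,
    count: children.reduce((n, child) => n + child.count, 0),
  };
}

// Count keys in [startkey, endkey], reusing cached counts where possible.
function rangeCount(node, startkey, endkey, stats) {
  if (node.max < startkey || node.min > endkey) return 0; // no overlap
  if (startkey <= node.min && node.max <= endkey) {
    stats.cachedHits += 1; // whole subtree is in range: use the cached value
    return node.count;
  }
  if (node.keys) {
    // Partially covered leaf: look at individual keys.
    return node.keys.filter((k) => k >= startkey && k <= endkey).length;
  }
  return node.children.reduce(
    (n, child) => n + rangeCount(child, startkey, endkey, stats),
    0
  );
}

// The tree from the slides after key M has been inserted.
const tree = interior([
  interior([leaf(["A", "B", "C"]), leaf(["D", "F"]), leaf(["G", "H"])]),
  interior([leaf(["I", "K", "L"]), leaf(["M", "N", "O", "Q", "R"])]),
]);

const stats = { cachedHits: 0 };
const total = rangeCount(tree, "F", "N", stats);
console.log(total, stats.cachedHits);
```

With this range, the G-H and I-L leaves are answered from their cached counts, while D-F and M-R are only partially covered and contribute 1 and 2 keys.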
  38. More Ways to Use Views
  39. Skip the Reduce Function
          GET http://localhost:5984/presidents/_design/design_doc/_view/time_in_office?reduce=false
          {"total_rows":44,"offset":0,"rows":[
            {"id":"...","key":"Abraham Lincoln","value":5},
            {"id":"...","key":"Andrew Jackson","value":8},
            {"id":"...","key":"Andrew Johnson","value":4},
            ...
          ]}
  40. Reversing the Order of Results
          GET http://localhost:5984/presidents/_design/design_doc/_view/president_names?descending=true
          {"total_rows":44,"offset":0,"rows":[
            {"id":"...","key":2009,"value":"Barack Obama"},
            {"id":"...","key":2001,"value":"George W. Bush"},
            {"id":"...","key":1993,"value":"Bill Clinton"},
            {"id":"...","key":1989,"value":"George H. W. Bush"},
            ...
          ]}
  41. Reversing the Order of a Range
      startkey and endkey are reversed, too.
          GET http://localhost:5984/presidents/_design/design_doc/_view/president_names?descending=true&startkey=1850&endkey=1790
          {"total_rows":44,"offset":0,"rows":[
            {"id":"...","key":2009,"value":"Barack Obama"},
            {"id":"...","key":2001,"value":"George W. Bush"},
            {"id":"...","key":1993,"value":"Bill Clinton"},
            {"id":"...","key":1989,"value":"George H. W. Bush"},
            ...
          ]}
  42. Ignore a Given Number of Rows
          GET http://localhost:5984/presidents/_design/design_doc/_view/president_names?limit=10&skip=1
      Avoid large values for skip.
  43. Paginating (Initial Page)
          // first page of documents
          GET http://localhost:5984/presidents/_design/design_doc/_view/president_names?limit=2
          {"total_rows":44,"offset":0,"rows":[
            {"id":"...","key":1789,"value":"George Washington"},
            {"id":"...","key":1797,"value":"John Adams"}
          ]}
      Note the last key of the result (1797).
  44. Paginating (Successive Pages)
          // successive pages
          GET http://localhost:5984/presidents/_design/design_doc/_view/president_names?startkey=1797&skip=1&limit=2
          {"total_rows":44,"offset":2,"rows":[
            {"id":"...","key":1801,"value":"Thomas Jefferson"},
            {"id":"...","key":1809,"value":"James Madison"}
          ]}
      startkey is the last key of the previous result; skip=1 means don't include the first document.
  45. Paginating in Reverse Order
          // first page of documents
          GET http://localhost:5984/presidents/_design/design_doc/_view/president_names?descending=true&limit=2
          // successive pages
          GET http://localhost:5984/presidents/_design/design_doc/_view/president_names?descending=true&startkey=1797&skip=1&limit=2
      startkey is the last key of the previous result; skip=1 means don't include the first document.
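The pagination recipe on slides 43 and 44 (request a page, remember its last key, then ask for the next page with startkey set to that key and skip=1) can be simulated against an in-memory list of rows. `queryView` and `pages` below are hypothetical helpers that mimic only the startkey, skip, and limit options; they are not CouchDB APIs. The sketch assumes every key is unique: when several rows share a key, restarting from the last key with skip=1 can repeat rows, and CouchDB's startkey_docid parameter exists to disambiguate that case.

```javascript
// Rows as they would come back from the president_names view, sorted by key.
const rows = [
  { key: 1789, value: "George Washington" },
  { key: 1797, value: "John Adams" },
  { key: 1801, value: "Thomas Jefferson" },
  { key: 1809, value: "James Madison" },
];

// Minimal stand-in for a view query supporting startkey, skip and limit.
function queryView(allRows, { startkey, skip = 0, limit = allRows.length } = {}) {
  const from =
    startkey === undefined ? 0 : allRows.findIndex((r) => r.key >= startkey);
  const begin = (from === -1 ? allRows.length : from) + skip;
  return allRows.slice(begin, begin + limit);
}

// Yield successive pages using the "last key + skip=1" recipe.
function* pages(allRows, pageSize) {
  let page = queryView(allRows, { limit: pageSize });
  while (page.length > 0) {
    yield page;
    const lastKey = page[page.length - 1].key;
    page = queryView(allRows, { startkey: lastKey, skip: 1, limit: pageSize });
  }
}

const allPages = [...pages(rows, 2)].map((p) => p.map((r) => r.value));
console.log(JSON.stringify(allPages));
```

Each request skips exactly one row no matter how deep the page is, which is why this recipe avoids the large skip values that slide 42 warns about.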
  46. Using a Stale View
          GET http://localhost:5984/presidents/_design/design_doc/_view/president_names?stale=ok
  47. Updating the View Immediately After
          GET http://localhost:5984/presidents/_design/design_doc/_view/president_names?stale=update_after
  48. party_state_name (group & group_level)
  49. Invoke a View
  50. Group Level
          Map Keys     Group Level 1   Group Level 2
          ["a",1,1]    ["a"]           ["a",1]
          ["a",3,4]                    ["a",3]
          ["a",3,8]
          ["b",2,6]    ["b"]           ["b",2]
          ["b",2,6]
          ["c",1,5]    ["c"]           ["c",1]
          ["c",4,2]                    ["c",4]
          GET http://localhost:5984/my_db/_design/ddoc/_view/v1?group_level=2
      group_level only applies to reduce views.
  51. Invoke a View
  52. Group
          GET http://localhost:5984/my_db/_design/greeting/_view/v1?group=true
      One output row for each unique key; equivalent to group_level=infinity.
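As a rough illustration of the table on slide 50, the sketch below groups rows by the first group_level elements of their array keys and reduces each group with a count. `groupReduce` is a hypothetical helper written for this example; the row values are placeholders, and passing `Infinity` as the level models group=true.

```javascript
// Map output with the array keys from the slide; values are placeholders.
const mapRows = [
  ["a", 1, 1], ["a", 3, 4], ["a", 3, 8],
  ["b", 2, 6], ["b", 2, 6],
  ["c", 1, 5], ["c", 4, 2],
].map((key) => ({ key, value: 1 }));

// Group rows by the first `groupLevel` key elements, then reduce each group.
function groupReduce(rows, groupLevel, reduce) {
  const groups = new Map(); // insertion order follows the sorted input rows
  for (const { key, value } of rows) {
    const groupKey = key.slice(0, groupLevel);
    const id = JSON.stringify(groupKey);
    if (!groups.has(id)) groups.set(id, { key: groupKey, values: [] });
    groups.get(id).values.push(value);
  }
  return [...groups.values()].map((g) => ({ key: g.key, value: reduce(g.values) }));
}

const countValues = (values) => values.length;

const level1 = groupReduce(mapRows, 1, countValues);
const level2 = groupReduce(mapRows, 2, countValues);
const exact = groupReduce(mapRows, Infinity, countValues); // like group=true

console.log(JSON.stringify(level2));
```

At level 2 the seven map rows collapse into five groups, matching the Group Level 2 column of the table; with exact grouping the two identical ["b",2,6] keys still share one output row.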
  53. Including Full Documents
          GET http://localhost:5984/presidents/_design/design_doc/_view/president_names?include_docs=true
          {"total_rows":44,"offset":0,"rows":[
            {"id":"...","key":1789,"value":"George Washington",
             "doc":{"_id":"...","_rev":"1-...","presidency":"1",
                    "wikipedia_entry":"http://en.wikipedia.org/...",
                    "took_office":1789,"left_office":1797,
                    "party":"Independent","home_state":"Virginia",
                    "name":"George Washington","type":"president"}},
            ...
          ]}
  54. Emitting with include_docs=true
          // includes the latest rev of the emitter
          function(doc) { emit("key", aValue) }
          // includes this rev of the emitter
          function(doc) { emit("key", {"_rev": doc._rev, "value": doc}) }
          // includes document with id foo
          function(doc) { emit("key", {"_id": "foo", "value": doc}) }
  55. Requesting Specific Keys
          curl -X POST -H "Content-Type: application/json" http://localhost:5984/presidents/_design/design_doc/_view/president_names -d '{"keys":[1789, 1929, 1993, ... ]}'
          curl -X POST -H "Content-Type: application/json" http://localhost:5984/presidents/_design/design_doc/_view/president_names?include_docs=true -d '{"keys":[1789, 1929, 1993, ... ]}'
  56. president_events (join)
  57. Invoke a View
  58. Collating Joins
          "views": {
            "president_events": {
              "map": "function(doc) { if (doc.type == 'president') { emit([doc.took_office], doc.name); } else if (doc.type == 'event') { emit([doc.year, 0], doc.event); } }"
            }
          }
      Annotations: a president's key is a one-element array holding the year he took office; an event's key holds the year of the event plus a second array element.
  59. Join Presidents and Events
          GET http://localhost:5984/presidents/_design/design_doc/_view/president_events
          {"total_rows":883,"offset":0,"rows":[
            {"id":"...","key":[1789],"value":"George Washington"},
            {"id":"...","key":[1790,0],"value":"Rhode Island ratifies the Constitution and becomes 13th state"},
            {"id":"...","key":[1791,0],"value":"Bill of Rights ratified"},
            {"id":"...","key":[1791,0],"value":"First Bank of the United States chartered"},
            ...
          ]}
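The join works because of how array keys collate: arrays are compared element by element, and an array that is a prefix of another sorts first, so a president's [year] row lands ahead of any [year, 0] event rows for the same year. The comparator below is a simplified sketch of that ordering for arrays of numbers only; `compareKeys` is a hypothetical name, the sample keys are illustrative, and CouchDB's real collation also covers strings, mixed types, and ICU rules that are out of scope here.

```javascript
// Compare two array keys element by element; a shorter prefix sorts first.
function compareKeys(a, b) {
  const shared = Math.min(a.length, b.length);
  for (let i = 0; i < shared; i++) {
    if (a[i] !== b[i]) return a[i] < b[i] ? -1 : 1;
  }
  return a.length - b.length;
}

// Illustrative keys: [year] for a president, [year, 0] for an event.
const keys = [[1791, 0], [1789], [1790, 0], [1789, 0]];
const sorted = [...keys].sort(compareKeys);

console.log(JSON.stringify(sorted));
```

The president row for a given year therefore acts as a header that is immediately followed by that year's events, and later years follow in order.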
  60. End
