Riak MapReduce

Slides from webinar on MapReduce:
http://blog.basho.com/2010/07/15/free-webinar---map/reduce-querying-in-riak---july-22-@-2pm-eastern/

Published in Technology

  • Tasks - individual map processes
    Combine - function to run over map results on local nodes before shipping data to reduce operations






Transcript

  • 1. MapReduce. Daniel Reverri, Developer Advocate, Basho
  • 2. Overview: Why MapReduce?; MapReduce Basics; Using MapReduce; Examples; Comparisons
  • 3. Why MapReduce? Parallel, distributed queries; easy to write; easy to run
  • 4. Riak is a Key/Value store
  • 5. Key/Value Data: /riak/cat/snowball1, /riak/cat/snowball2, /riak/cat/snowball3
  • 6. Cluster: catlady@192.168.1.10, catlady@192.168.1.11, catlady@192.168.1.12
  • 7. MapReduce Basics: operates over a known set of keys; runs near the data; consists of two types of functions (Map, Reduce)
  • 8. What is a Map Function? A function applied to one piece of data; operates in isolation; returns a list of results
  • 9. What can I do with a Map Function? Filtering (filter documents by “tags”); extracting (count words in a document, extract links to related data)
  • 10. Map: cross_the_road(cat), cross_the_road(cat), cross_the_road(cat)
  • 11. What is a Reduce Function? A function applied to a list of results; merges results from Map phases
  • 12. What can I do with a Reduce Function? Aggregate; sort
  • 13. Reduce: cross_the_road(cat), cross_the_road(cat), cross_the_road(cat), then sort(cats)
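
The map and reduce roles described on slides 7 to 13 can be sketched locally, without a Riak cluster. This is only an illustration of the data flow: `crossTheRoad`, `sortCats`, and `runLocally` are hypothetical names chosen to echo the slides' `cross_the_road(cat)` and `sort(cats)`, and the real system runs the map step on the nodes that hold the data.

```javascript
// Hypothetical stand-in for the slides' cross_the_road(cat): a map function
// sees one item in isolation and returns a list of results.
function crossTheRoad(cat) {
  return [cat + " crossed"];
}

// A reduce function receives the merged list of map results and also
// returns a list; here it sorts, echoing sort(cats).
function sortCats(cats) {
  return cats.slice().sort();
}

// Local simulation of one map phase followed by one reduce phase.
function runLocally(inputs, mapFn, reduceFn) {
  var mapped = [];
  inputs.forEach(function (input) {
    // Each map call contributes a list; the lists are concatenated.
    mapped = mapped.concat(mapFn(input));
  });
  return reduceFn(mapped);
}

var result = runLocally(
  ["snowball3", "snowball1", "snowball2"],
  crossTheRoad,
  sortCats
);
console.log(result);
```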
  • 14. Using MapReduce: define and submit request (REST or Protocol Buffers); review results
  • 15. Request (REST): POST to “/mapred”; Content-Type: application/json; list of bucket/key pairs; list of phase definitions; timeout in milliseconds
  • 16. Inputs
  • 17. Query
  • 18. Phase
  • 19. Phase: Type (map, reduce, link)
  • 20. Phase: Function (named)
  • 21. Phase: Function (anonymous)
  • 22. Phase: Keep (true|false)
  • 23. Phase: Argument
  • 24. Function Arguments
  • 25. Map: value
  • 26. Map: keyData, arg
  • 27. Reduce: arg
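
The request slides (15 to 27) showed their JSON as images, which the transcript does not capture. The following is a minimal sketch of what such a request body might look like, assuming the JSON field names `inputs`, `query`, `timeout`, `language`, `source`, `name`, `arg`, and `keep`; the bucket, keys, function source, and timeout value are made up for illustration.

```javascript
// Sketch of a body for POST /mapred (Content-Type: application/json).
var request = {
  // List of bucket/key pairs (illustrative bucket and keys).
  inputs: [
    ["cat", "snowball1"],
    ["cat", "snowball2"],
    ["cat", "snowball3"]
  ],
  // List of phase definitions, run in order.
  query: [
    {
      map: {
        language: "javascript",
        // Anonymous function: the source travels with the request.
        source: "function(value, keyData, arg) { return [value.key]; }",
        arg: null,   // static argument handed to the function
        keep: false  // leave this phase's output out of the response
      }
    },
    {
      reduce: {
        language: "javascript",
        // Named function: refers to code already loaded on the nodes.
        name: "Riak.reduceSort",
        keep: true
      }
    }
  ],
  // Timeout in milliseconds (illustrative value).
  timeout: 60000
};

var body = JSON.stringify(request);
console.log(body);
```

A map function receives `(value, keyData, arg)` and a reduce function receives the list of prior results plus `arg`, which matches the "Function Arguments" slides.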
  • 28. Examples
  • 29. Map Demo: count the number of times the word “demo” appears in a set of documents
  • 30. Demo Data: map_demo/key1.txt: “Random boring demo data for map demo”; map_demo/key2.txt: “More useless demo data”; map_demo/key3.txt: “demo demo demo demo demo”
  • 31. Request
  • 32. Inputs
  • 33. Query
  • 34. Map
  • 35. Results
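
The map function from the Map Demo was shown only as an image. A minimal sketch of such a function, run locally against the demo data from slide 30, might look like this. It assumes the object passed to a JavaScript map function exposes `key` and `values[0].data`; `countDemo` and the local driver loop are hypothetical, not the code from the webinar.

```javascript
// Map function sketch: count occurrences of "demo" in one document and
// return a list holding a single [key, count] pair.
function countDemo(value) {
  var text = value.values[0].data;
  var matches = text.match(/demo/g);
  return [[value.key, matches ? matches.length : 0]];
}

// Demo data from slide 30, wrapped in the assumed object shape.
var docs = [
  { bucket: "map_demo", key: "key1.txt",
    values: [{ data: "Random boring demo data for map demo" }] },
  { bucket: "map_demo", key: "key2.txt",
    values: [{ data: "More useless demo data" }] },
  { bucket: "map_demo", key: "key3.txt",
    values: [{ data: "demo demo demo demo demo" }] }
];

// Local stand-in for the map phase: apply the function to each document
// and concatenate the returned lists.
var counts = [];
docs.forEach(function (doc) {
  counts = counts.concat(countDemo(doc));
});
console.log(JSON.stringify(counts));
// prints [["key1.txt",2],["key2.txt",1],["key3.txt",5]]
```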
  • 36. Reduce Demo: sort documents by the number of times “demo” appears
  • 37. Request
  • 38. Inputs
  • 39. Query
  • 40. Reduce
  • 41. Results
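
For the Reduce Demo, a reduce function that sorts `[key, count]` pairs by count could be sketched as follows. The input pairs are the "demo" counts implied by the slide 30 data; `sortByCount` is a hypothetical name, and descending order is an assumption, since the transcript does not say which direction the webinar used.

```javascript
// Reduce function sketch: takes a list of [key, count] pairs and returns
// the same pairs sorted by count, highest first. Because the output has
// the same shape as the input, running it again over partial results
// plus earlier output still gives a correctly sorted list.
function sortByCount(valueList) {
  return valueList.slice().sort(function (a, b) {
    return b[1] - a[1];
  });
}

var mapResults = [["key1.txt", 2], ["key2.txt", 1], ["key3.txt", 5]];
var sorted = sortByCount(mapResults);
console.log(JSON.stringify(sorted));
// prints [["key3.txt",5],["key1.txt",2],["key2.txt",1]]
```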
  • 42. Argument Demo: enhance the “demo” count example to count words matching a regular expression
  • 43. Map with arg
  • 44. Results
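
The Argument Demo passes a regular expression to the map function through the phase's argument. A sketch under the same assumptions about the value object (`key`, `values[0].data`) is shown below; the function name and the example pattern are illustrative, not taken from the slides.

```javascript
// Map function sketch: the third parameter carries the phase argument,
// here a regular-expression source string supplied at query time.
function countByRegex(value, keyData, arg) {
  var re = new RegExp(arg, "g");
  var matches = value.values[0].data.match(re);
  return [[value.key, matches ? matches.length : 0]];
}

// Illustrative pattern: words that start with "d".
var pattern = "\\bd\\w+";

var doc = {
  bucket: "map_demo",
  key: "key2.txt",
  values: [{ data: "More useless demo data" }]
};

var hits = countByRegex(doc, null, pattern);
console.log(JSON.stringify(hits));
// prints [["key2.txt",2]]  ("demo" and "data")
```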
  • 45. Deploying Demo: deploy the enhanced count function as a named function
  • 46. js_source_dir: set in app.config; $ riak restart
  • 47. Named Function: /tmp/js_source/count_by_regex.js; $ riak-admin js_reload
  • 48. Query
  • 49. Results
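
The Deploying Demo moves the function into a file under `js_source_dir` so a query can refer to it by name instead of shipping its source. The sketch below shows one way such a file and the matching phase definition might look. The `Demo` namespace, the name `Demo.countByRegex`, and the pattern are hypothetical; the transcript gives only the file path.

```javascript
// Possible contents of /tmp/js_source/count_by_regex.js: the function is
// attached to a global object so it can be looked up by name.
var Demo = {
  countByRegex: function (value, keyData, arg) {
    var matches = value.values[0].data.match(new RegExp(arg, "g"));
    return [[value.key, matches ? matches.length : 0]];
  }
};

// Phase definition that refers to the deployed function by name rather
// than carrying a "source" string.
var query = [
  {
    map: {
      language: "javascript",
      name: "Demo.countByRegex",
      arg: "demo",
      keep: true
    }
  }
];

console.log(JSON.stringify(query));
```

Compared with an anonymous function, the request stays small and the function source lives in one place on the nodes.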
  • 50. Comparisons
  • 51. Hadoop (similarities): distributed across multiple machines; provides data locality (HDFS); phases run near the data
  • 52. Hadoop (differences): used for large, long-running jobs (hours); restarts failed tasks; 3 phases (map, combine, reduce)
  • 53. CouchDB (differences): not distributed across multiple machines; runs over all docs in a database; computes cached views for lookups; no query-time arguments; 2 phases (map, reduce)
  • 54. MongoDB (differences): not run in parallel; not spread across multiple machines; 3 phases (map, reduce, finalize)
  • 55. Closing thoughts
  • 56. Good to Know: phases must always return lists; map inputs are always bucket/key pairs; bucket queries are bad; anonymous functions are bad
  • 57. Features not Reviewed: link phase (link walking); results from multiple phases; Erlang MapReduce functions; streaming results
  • 58. Questions?