Riak MapReduce

Slides from the Basho webinar “Map/Reduce Querying in Riak” (July 22, 2010, 2pm Eastern):
http://blog.basho.com/2010/07/15/free-webinar---map/reduce-querying-in-riak---july-22-@-2pm-eastern/

  • Tasks: individual map processes.
    Combine: a function that runs over map results on the local node before the data is shipped to the reduce operations.
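The combine step described in the note can be illustrated with a small, single-process Python sketch. The two "nodes" are just dictionary entries and the word-count job is hypothetical; it shows the idea of pre-aggregating map output locally before shipping it, not Hadoop's or Riak's actual API. Combining works here because summing is associative, so one function can serve as both the local combine and the final reduce.

```python
from collections import Counter

# Hypothetical data: two "nodes", each holding a few local documents.
NODE_DOCS = {
    "node1": ["demo data demo", "more demo"],
    "node2": ["demo demo demo"],
}


def map_words(doc):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in doc.split()]


def sum_counts(pairs):
    """Sum counts per word; used both as the local combine and the final reduce."""
    totals = Counter()
    for word, n in pairs:
        totals[word] += n
    return sorted(totals.items())


shipped = []  # pairs that would cross the network to the reduce step
for docs in NODE_DOCS.values():
    local_pairs = [pair for doc in docs for pair in map_words(doc)]
    shipped.extend(sum_counts(local_pairs))  # combine runs on the node itself

final = sum_counts(shipped)
print(final)         # [('data', 1), ('demo', 6), ('more', 1)]
print(len(shipped))  # 4 pairs shipped instead of 8 uncombined map outputs
```

Without the combine step every one of the 8 `(word, 1)` pairs would be shipped; with it, each node sends one pair per distinct word.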

  • Transcript

    • 1. MapReduce: Daniel Reverri, Developer Advocate, basho
    • 2. Overview: Why MapReduce? MapReduce Basics. Using MapReduce. Examples. Comparisons.
    • 3. Why MapReduce? Parallel, distributed queries. Easy to write. Easy to run.
    • 4. Riak is a Key/Value store
    • 5. Key/Value Data: /riak/cat/snowball1, /riak/cat/snowball2, /riak/cat/snowball3
    • 6. Cluster: catlady@192.168.1.10, catlady@192.168.1.11, catlady@192.168.1.12
    • 7. MapReduce Basics: Operates over a known set of keys. Runs near the data. Consists of two types of functions: Map and Reduce.
    • 8. What is a Map Function? Function applied to one piece of data. Operates in isolation. Returns a list of results.
    • 9. What can I do with a Map Function? Filtering: filter documents by “tags”. Extracting: count words in a document; extract links to related data.
    • 10. Map: cross_the_road(cat), cross_the_road(cat), cross_the_road(cat)
    • 11. What is a Reduce Function? Function applied to a list of results. Merges results from Map phases.
    • 12. What can I do with a Reduce Function? Aggregate. Sort.
    • 13. Reduce: cross_the_road(cat), cross_the_road(cat), cross_the_road(cat), sort(cats)
    • 14. Using MapReduce: Define and submit the request (REST or Protocol Buffers). Review the results.
    • 15. Request (REST): POST to “/mapred” with Content-Type: application/json. List of bucket/key pairs. List of phase definitions. Timeout in milliseconds.
    • 16. Inputs
    • 17. Query
    • 18. Phase
    • 19. Phase Type (map, reduce, link)
    • 20. Phase Function (named)
    • 21. Phase Function (anonymous)
    • 22. Phase Keep (true|false)
    • 23. Phase Argument
    • 24. Function Arguments
    • 25. Map: value
    • 26. Map: keyData, arg
    • 27. Reduce: arg
    • 28. Examples
    • 29. Map Demo: Count the number of times the word “demo” appears in a set of documents.
    • 30. Demo Data: map_demo/key1.txt: “Random boring demo data for map demo”. map_demo/key2.txt: “More useless demo data”. map_demo/key3.txt: “demo demo demo demo demo”.
    • 31. Request
    • 32. Inputs
    • 33. Query
    • 34. Map
    • 35. Results
    • 36. Reduce Demo: Sort documents by the number of times “demo” appears.
    • 37. Request
    • 38. Inputs
    • 39. Query
    • 40. Reduce
    • 41. Results
    • 42. Argument Demo: Enhance the “demo” count example to count words matching a regular expression.
    • 43. Map with arg
    • 44. Results
    • 45. Deploying Demo: Deploy the enhanced count function as a named function.
    • 46. Set js_source_dir in app.config, then run $ riak restart
    • 47. Named Function: /tmp/js_source/count_by_regex.js, then run $ riak-admin js_reload
    • 48. Query
    • 49. Results
    • 50. Comparisons
    • 51. Hadoop (similarities): Distributed across multiple machines. Provides data locality (HDFS). Phases run near the data.
    • 52. Hadoop (differences): Used for large, long-running jobs (hours). Restarts failed tasks. 3 phases (map, combine, reduce).
    • 53. CouchDB (differences): Not distributed across multiple machines. Runs over all docs in a database. Computes cached views for lookups. No query-time arguments. 2 phases (map, reduce).
    • 54. MongoDB (differences): Not run in parallel. Not spread across multiple machines. 3 phases (map, reduce, finalize).
    • 55. Closing thoughts
    • 56. Good to Know: Phases must always return lists. Map inputs are always bucket/key pairs. Bucket queries are bad. Anonymous functions are bad.
    • 57. Features not Reviewed: Link phase (link walking). Results from multiple phases. Erlang MapReduce functions. Streaming results.
    • 58. Questions?
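Slides 15 to 27 describe the REST request (POST to “/mapred” with a JSON body holding inputs, phases, and a timeout), but the JSON itself did not survive in the transcript. The Python sketch below builds such a request without sending it. The field names (inputs, query, map, reduce, language, source, arg, keep, timeout) follow Riak's HTTP MapReduce interface as commonly documented for that era; the node address 127.0.0.1:8098, the JavaScript function bodies, and the helper names build_mapred_body and make_request are assumptions for illustration, not the code shown in the webinar. Check the field names against the documentation for your Riak version.

```python
import json
import urllib.request

# Assumption: Riak's HTTP interface on a local node at port 8098.
RIAK_URL = "http://127.0.0.1:8098/mapred"

# Hypothetical anonymous map function (JavaScript, shipped as a string):
# counts matches of the regex passed as `arg` in the object's first value.
MAP_SOURCE = """
function(value, keyData, arg) {
  var matches = value.values[0].data.match(new RegExp(arg, "g"));
  return [[value.key, matches ? matches.length : 0]];
}
"""

# Hypothetical anonymous reduce function: sorts [key, count] pairs, highest first.
REDUCE_SOURCE = """
function(values, arg) {
  return values.sort(function(a, b) { return b[1] - a[1]; });
}
"""


def build_mapred_body(inputs, pattern, timeout_ms=60000):
    """Assemble the JSON body: bucket/key inputs, phases, timeout in ms.

    The 60000 ms default is an arbitrary choice for this sketch.
    """
    return {
        "inputs": [[bucket, key] for bucket, key in inputs],
        "query": [
            # A named function would use "name" instead of "source".
            {"map": {"language": "javascript", "source": MAP_SOURCE,
                     "arg": pattern, "keep": False}},
            {"reduce": {"language": "javascript", "source": REDUCE_SOURCE,
                        "keep": True}},
        ],
        "timeout": timeout_ms,
    }


def make_request(body):
    """Wrap the body in a POST to /mapred with Content-Type: application/json."""
    return urllib.request.Request(
        RIAK_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


body = build_mapred_body(
    [("map_demo", "key1.txt"), ("map_demo", "key2.txt"), ("map_demo", "key3.txt")],
    pattern="demo",
)
req = make_request(body)
# Nothing is sent above; with a running node you could do:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```

Only the reduce phase sets keep to true, so under these assumptions the response would contain just the sorted [key, count] pairs.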
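The map, reduce, and argument demos (slides 29 to 44) can be mimicked locally to see what each phase returns. This is a hypothetical single-process Python simulation, not Riak code: map_count_by_regex, reduce_sort_by_count, and run_mapreduce are illustrative names, and sorting highest count first is an assumed choice since the slides only say “sort”. It uses the demo data from slide 30 and follows the rule from slide 56 that every phase returns a list.

```python
import re

# Demo data from slide 30: bucket "map_demo" with three text documents.
DOCS = {
    ("map_demo", "key1.txt"): "Random boring demo data for map demo",
    ("map_demo", "key2.txt"): "More useless demo data",
    ("map_demo", "key3.txt"): "demo demo demo demo demo",
}


def map_count_by_regex(bucket_key, text, arg):
    """Hypothetical map phase: sees one document in isolation, returns a list."""
    pattern = re.compile(arg)
    count = sum(1 for word in text.split() if pattern.fullmatch(word))
    return [[bucket_key[1], count]]


def reduce_sort_by_count(values, arg=None):
    """Hypothetical reduce phase: merges map results, highest count first."""
    return sorted(values, key=lambda pair: pair[1], reverse=True)


def run_mapreduce(inputs, map_fn, reduce_fn, arg):
    """Run map over each bucket/key input, then one reduce over all results."""
    mapped = []
    for bucket_key in inputs:
        mapped.extend(map_fn(bucket_key, DOCS[bucket_key], arg))  # phases return lists
    return reduce_fn(mapped)


results = run_mapreduce(list(DOCS), map_count_by_regex, reduce_sort_by_count, "demo")
print(results)
```

Here results is [["key3.txt", 5], ["key1.txt", 2], ["key2.txt", 1]]. Passing a different pattern such as "d.*" reuses the same map function unchanged, which is the point of the argument demo. In Riak itself a reduce phase may be invoked more than once over partial results; this single reduce call does not model that.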
