Riak MapReduce
Slides from webinar on MapReduce:

http://blog.basho.com/2010/07/15/free-webinar---map/reduce-querying-in-riak---july-22-@-2pm-eastern/

  • Tasks - individual map processes
  • Combine - function to run over map results on local nodes before shipping data to reduce operations

Riak MapReduce Presentation Transcript

  • 1. MapReduce. Daniel Reverri, Developer Advocate, Basho
  • 2. Overview: Why MapReduce?; MapReduce Basics; Using MapReduce; Examples; Comparisons
  • 3. Why MapReduce? Parallel, distributed queries; Easy to write; Easy to run
  • 4. Riak is a Key/Value store
  • 5. Key/Value Data: /riak/cat/snowball1, /riak/cat/snowball2, /riak/cat/snowball3
  • 6. Cluster: catlady@192.168.1.10, catlady@192.168.1.11, catlady@192.168.1.12
  • 7. MapReduce Basics: Operates over a known set of keys; Runs near the data; Consists of two types of functions (Map, Reduce)
  • 8. What is a Map Function? Function applied to one piece of data; Operates in isolation; Returns a list of results
  • 9. What can I do with a Map Function? Filtering (filter documents by “tags”); Extracting (count words in a document, extract links to related data)
  • 10. Map: cross_the_road(cat), cross_the_road(cat), cross_the_road(cat)
  • 11. What is a Reduce Function? Function applied to a list of results; Merges results from Map phases
  • 12. What can I do with a Reduce Function? Aggregate; Sort
  • 13. Reduce: cross_the_road(cat), cross_the_road(cat), cross_the_road(cat), then sort(cats)
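
The map and reduce roles described in slides 8 to 13 can be sketched in a single process. This is only an illustration of the data flow, not Riak itself: the cat names come from slide 5, while the body of crossTheRoad and the choice to sort by name are assumptions made for the example.

```javascript
// Local, single-process sketch of the map -> reduce flow (not Riak itself).
const cats = ["snowball3", "snowball1", "snowball2"];

// Map: applied to one piece of data in isolation; returns a list of results.
// What "crossing the road" produces is a made-up detail for illustration.
function crossTheRoad(cat) {
  return [cat];
}

// Reduce: applied to the merged list of map results; also returns a list.
function sortCats(results) {
  return results.slice().sort();
}

// Each map call runs independently; the result lists are concatenated
// and handed to the reduce function.
const mapped = cats.flatMap(crossTheRoad);
const sorted = sortCats(mapped);
console.log(sorted);
```

In a cluster the map calls would run on the nodes holding each key, and only the map results would travel to the reduce step.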
  • 14. Using MapReduce: Define and submit request (REST, Protocol Buffers); Review results
  • 15. Request (REST): POST to “/mapred”; Content-Type: application/json; List of bucket/key pairs; List of phase definitions; Timeout in milliseconds
  • 16. Inputs
  • 17. Query
  • 18. Phase
  • 19. Phase Type (map, reduce, link)
  • 20. Phase Function (named)
  • 21. Phase Function (anonymous)
  • 22. Phase Keep (true|false)
  • 23. Phase Argument
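
The request pieces named in slides 15 to 23 (inputs, query, phase type, named and anonymous functions, keep, argument, timeout) can be put together as one JSON body. The sketch below follows Riak's JSON MapReduce format as commonly documented; the bucket and keys reuse slide 5, and the specific functions, argument, and timeout value are assumptions chosen for illustration, not the contents of the original slides.

```javascript
// Sketch of a body for POST /mapred with Content-Type: application/json.
const request = {
  // Inputs: list of [bucket, key] pairs.
  inputs: [
    ["cat", "snowball1"],
    ["cat", "snowball2"],
    ["cat", "snowball3"],
  ],
  // Query: ordered list of phase definitions, keyed by phase type.
  query: [
    {
      map: {
        language: "javascript",
        // Anonymous function: the source travels with the request.
        source: "function(value, keyData, arg) { return [value.key]; }",
        // Phase argument: passed to the function as its last parameter.
        arg: "illustrative-argument",
        // keep: false means this phase's results are not returned.
        keep: false,
      },
    },
    {
      reduce: {
        language: "javascript",
        // Named function: here one of Riak's built-in JavaScript helpers.
        name: "Riak.reduceSort",
        keep: true,
      },
    },
  ],
  // Timeout in milliseconds (value chosen arbitrarily).
  timeout: 60000,
};

const body = JSON.stringify(request);
console.log(body);
```

Only the serialized body is built here; sending it requires a running Riak node and an HTTP client of your choice.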
  • 24. Function Arguments
  • 25. Map - value
  • 26. Map - keyData, arg
  • 27. Reduce - arg
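
Slides 24 to 27 list the arguments each function receives: a map function gets value, keyData, and arg, and a reduce function gets the list of values plus arg. The sketch below calls two such functions by hand. The object shape (bucket, key, values[0].data) reflects how Riak typically presents a stored object to JavaScript map functions; the stored JSON, the keyData string, and both arg values are hypothetical.

```javascript
// Hand-built stand-in for the object a map function receives.
const riakObject = {
  bucket: "cat",
  key: "snowball1",
  values: [{ metadata: {}, data: '{"lives": 9}' }],
};

// Map: value is the stored object, keyData comes from the input entry,
// arg comes from the phase definition. Written in ES5 style on purpose.
function mapExample(value, keyData, arg) {
  var doc = JSON.parse(value.values[0].data); // data arrives as a string
  return [{ key: value.key, lives: doc.lives, note: keyData, unit: arg }];
}

// Reduce: values is a list of earlier results, arg comes from the phase.
function reduceExample(values, arg) {
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i].lives;
  }
  return [{ label: arg, total: total }];
}

const mapped = mapExample(riakObject, "from-input", "lives");
const reduced = reduceExample(mapped, "total lives");
console.log(mapped, reduced);
```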
  • 28. Examples
  • 29. Map Demo: Count the number of times the word “demo” appears in a set of documents
  • 30. Demo Data: map_demo/key1.txt (“Random boring demo data for map demo”); map_demo/key2.txt (“More useless demo data”); map_demo/key3.txt (“demo demo demo demo demo”)
  • 31. Request
  • 32. Inputs
  • 33. Query
  • 34. Map
  • 35. Results
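
The Map Demo in slides 29 to 35 can be approximated locally with the demo data from slide 30. The function below is a sketch written for this transcript, not the code shown on the slides; the wrap helper is hypothetical and only imitates the object shape a map function would receive.

```javascript
// Demo data from slide 30, keyed by document name in bucket "map_demo".
const documents = {
  "key1.txt": "Random boring demo data for map demo",
  "key2.txt": "More useless demo data",
  "key3.txt": "demo demo demo demo demo",
};

// Hypothetical helper: imitates the object passed to a map function.
function wrap(bucket, key, data) {
  return { bucket: bucket, key: key, values: [{ data: data }] };
}

// Map function (ES5 style): count occurrences of the word "demo".
function countDemo(value) {
  var words = value.values[0].data.toLowerCase().split(/\s+/);
  var count = 0;
  for (var i = 0; i < words.length; i++) {
    if (words[i] === "demo") {
      count += 1;
    }
  }
  // Phases must return lists, so the [key, count] pair is wrapped in one.
  return [[value.key, count]];
}

// Apply the map function to each document and merge the result lists.
const results = Object.keys(documents).flatMap(function (key) {
  return countDemo(wrap("map_demo", key, documents[key]));
});
console.log(results);
```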
  • 36. Reduce Demo: Sort documents by the number of times “demo” appears
  • 37. Request
  • 38. Inputs
  • 39. Query
  • 40. Reduce
  • 41. Results
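
The Reduce Demo in slides 36 to 41 sorts documents by their “demo” count. A minimal sketch of such a reduce function follows, assuming the map phase emits [key, count] pairs and that descending order is wanted (the transcript does not say which direction the slides used). Because a reduce function may be applied to partial results and then again to the combined output, the example also checks that re-running it gives the same answer.

```javascript
// Assumed output of a counting map phase over the slide 30 demo data.
const mapResults = [
  ["key1.txt", 2],
  ["key2.txt", 1],
  ["key3.txt", 5],
];

// Reduce function (ES5 style): sort [key, count] pairs, highest count first.
function sortByCount(values) {
  return values.slice().sort(function (a, b) {
    return b[1] - a[1];
  });
}

const sorted = sortByCount(mapResults);

// Simulate a re-reduce: reduce a partial batch, then reduce it again
// together with the remaining results.
const partial = sortByCount(mapResults.slice(0, 2));
const rereduced = sortByCount(partial.concat(mapResults.slice(2)));

console.log(sorted);
```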
  • 42. Argument Demo: Enhance “demo” count example to count words matching a regular expression
  • 43. Map with arg
  • 44. Results
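
The Argument Demo in slides 42 to 44 generalizes the count to any regular expression supplied as the phase argument. The sketch below assumes the argument is a pattern string compiled inside the map function; the function name, the wrap helper, and the example pattern (words starting with “d”) are all hypothetical.

```javascript
// Hypothetical helper: imitates the object passed to a map function.
function wrap(key, data) {
  return { bucket: "map_demo", key: key, values: [{ data: data }] };
}

// Map function (ES5 style): arg is a regular expression source string.
function countByRegex(value, keyData, arg) {
  var re = new RegExp(arg, "g");
  var matches = value.values[0].data.match(re); // null when nothing matches
  return [[value.key, matches ? matches.length : 0]];
}

// Example pattern: whole words beginning with a lowercase "d".
const pattern = "\\bd\\w*";

const counts = [
  countByRegex(wrap("key1.txt", "Random boring demo data for map demo"), null, pattern),
  countByRegex(wrap("key2.txt", "More useless demo data"), null, pattern),
  countByRegex(wrap("key3.txt", "demo demo demo demo demo"), null, pattern),
].flat();
console.log(counts);
```

In a request, the same pattern would be supplied through the phase's arg field rather than passed by hand.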
  • 45. Deploying Demo: Deploy enhanced count function as a named function
  • 46. js_source_dir: set in app.config, then run: $ riak restart
  • 47. Named Function: /tmp/js_source/count_by_regex.js, then run: $ riak-admin js_reload
  • 48. Query
  • 49. Results
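
For the Deploying Demo in slides 45 to 49, a named function lives in a file under js_source_dir and is referenced from the query by name instead of by source. The sketch below shows what /tmp/js_source/count_by_regex.js might contain and how a phase could refer to it. The function name count_by_regex is an assumption derived from the file name on slide 47, and the arg value is illustrative.

```javascript
// Possible contents of /tmp/js_source/count_by_regex.js (ES5 style).
// The function name is assumed from the file name on slide 47.
function count_by_regex(value, keyData, arg) {
  var re = new RegExp(arg, "g");
  var matches = value.values[0].data.match(re);
  return [[value.key, matches ? matches.length : 0]];
}

// Phase definition that refers to the deployed function by name,
// so no function source is shipped with the request.
const query = [
  {
    map: {
      language: "javascript",
      name: "count_by_regex",
      arg: "demo",
      keep: true,
    },
  },
];

// Local check of the function against a hand-built object.
const sample = { key: "key3.txt", values: [{ data: "demo demo demo demo demo" }] };
const sampleResult = count_by_regex(sample, null, query[0].map.arg);
console.log(JSON.stringify(query), sampleResult);
```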
  • 50. Comparisons
  • 51. Hadoop (similarities): Distributed across multiple machines; Provides data locality (HDFS); Phases run near the data
  • 52. Hadoop (differences): Used for large, long-running jobs (hours); Restarts failed tasks; 3 phases (map, combine, reduce)
  • 53. CouchDB (differences): Not distributed across multiple machines; Runs over all docs in a database; Computes cached views for lookups; No query-time arguments; 2 phases (map, reduce)
  • 54. MongoDB (differences): Not run in parallel; Not spread across multiple machines; 3 phases (map, reduce, finalize)
  • 55. Closing thoughts
  • 56. Good to Know: Phases must always return lists; Map inputs are always bucket/key pairs; Bucket queries are bad; Anonymous functions are bad
  • 57. Features not Reviewed: Link phase (link walking); Results from multiple phases; Erlang MapReduce functions; Streaming results
  • 58. Questions?