Riak MapReduce
 


Slides from webinar on MapReduce:
http://blog.basho.com/2010/07/15/free-webinar---map/reduce-querying-in-riak---july-22-@-2pm-eastern/

  • Tasks: individual map processes. Combine: function to run over map results on local nodes before shipping data to reduce operations.

Riak MapReduce Presentation Transcript

  • MapReduce Daniel Reverri Developer Advocate basho
  • Overview: Why MapReduce?, MapReduce Basics, Using MapReduce, Examples, Comparisons
  • Why MapReduce? Parallel, distributed queries; easy to write; easy to run
  • Riak is a Key/Value store
  • Key/Value Data: /riak/cat/snowball1, /riak/cat/snowball2, /riak/cat/snowball3
  • Cluster: catlady@192.168.1.10, catlady@192.168.1.11, catlady@192.168.1.12
  • MapReduce Basics: operates over a known set of keys; runs near the data; consists of two types of functions (Map, Reduce)
  • What is a Map Function? Function applied to one piece of data; operates in isolation; returns a list of results
  • What can I do with a Map Function? Filtering: filter documents by “tags”. Extracting: count words in a document, extract links to related data
  • Map: cross_the_road(cat), cross_the_road(cat), cross_the_road(cat)
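
The filtering use case above can be sketched as a small map function. This is only an illustration: the function name, the "tags" field, and the sample objects are assumptions that do not appear in the slides, and the object layout (stored body under value.values[0].data) is assumed here rather than taken from the deck.

```javascript
// Hypothetical map function: emit the key of every document tagged "cat".
// Assumption: each stored value is a JSON document with a "tags" array.
function filterByTag(value, keyData, arg) {
  // Assumed object shape: the stored body is at value.values[0].data.
  var doc = JSON.parse(value.values[0].data);
  var tags = doc.tags || [];
  // Always return a list; an empty list filters the document out.
  return tags.indexOf("cat") !== -1 ? [value.key] : [];
}

// Stand-ins for the objects a map phase would receive (made up for illustration).
var snowball1 = {
  bucket: "cat",
  key: "snowball1",
  values: [{ data: '{"tags":["cat","white"]}' }]
};
var rex = {
  bucket: "cat",
  key: "rex",
  values: [{ data: '{"tags":["dog"]}' }]
};

console.log(filterByTag(snowball1));
console.log(filterByTag(rex));
```

Each call sees exactly one object and knows nothing about the others, which is what lets the map phase run in parallel near the data.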
  • What is a Reduce Function? Function applied to a list of results; merges results from Map phases
  • What can I do with a Reduce Function? Aggregate; sort
  • Reduce: cross_the_road(cat), cross_the_road(cat), cross_the_road(cat), results passed to sort(cats)
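
A reduce function in the spirit of sort(cats) might look like the sketch below. The function name and sample values are hypothetical. The second half illustrates a property worth assuming when writing reduce functions: the function may be applied to partial result lists and then again to the combined output, so it should give the same answer either way.

```javascript
// Hypothetical reduce function: merge map results into one sorted list.
function sortCats(values, arg) {
  // Copy before sorting so the input list is left untouched.
  return values.slice().sort();
}

// One pass over all map results.
var allAtOnce = sortCats(["snowball3", "snowball1", "snowball2"]);

// Two partial passes whose outputs are reduced again.
var partialA = sortCats(["snowball3", "snowball1"]);
var partialB = sortCats(["snowball2"]);
var merged = sortCats(partialA.concat(partialB));

console.log(allAtOnce);
console.log(merged);
```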
  • Using MapReduce: define and submit request (REST, Protocol Buffers); review results
  • Request (REST): POST to “/mapred”; Content-Type: application/json; list of bucket/key pairs; list of phase definitions; timeout in milliseconds
  • Inputs
  • Query
  • Phase
  • Phase Type (map, reduce, link)
  • Phase Function (named)
  • Phase Function (anonymous)
  • Phase Keep (true|false)
  • Phase Argument
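
The request parts listed above (inputs, query, phases, keep, arg, timeout) fit together roughly as in the sketch below, which builds the JSON body in JavaScript. The slide images with the actual request are not in this transcript, so the field values, the anonymous function source, and the timeout are illustrative assumptions; Riak.reduceSort is used as an example of a named built-in.

```javascript
// A sketch of a MapReduce request body, assuming the JSON layout described
// in the slides: inputs, a query made of phases, and a timeout.
var request = {
  // Inputs: a list of [bucket, key] pairs.
  inputs: [
    ["cat", "snowball1"],
    ["cat", "snowball2"],
    ["cat", "snowball3"]
  ],
  // Query: an ordered list of phases.
  query: [
    {
      // Phase type "map" with an anonymous function given as source text.
      map: {
        language: "javascript",
        source: "function(value, keyData, arg) { return [value.key]; }",
        arg: "passed to the function as its last argument",
        keep: false // do not include this phase's output in the response
      }
    },
    {
      // Phase type "reduce" with a named function.
      reduce: {
        language: "javascript",
        name: "Riak.reduceSort",
        keep: true // return this phase's output to the client
      }
    }
  ],
  // Timeout in milliseconds (value chosen arbitrarily here).
  timeout: 60000
};

// The serialized body is what would be POSTed to /mapred
// with Content-Type: application/json.
var body = JSON.stringify(request);
console.log(body);
```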
  • Function Arguments
  • Map - value
  • Map - keyData, arg
  • Reduce - arg
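
To make the argument lists above concrete, the sketch below echoes back what each function receives. The function names and the stand-in stored object are hypothetical, and the object shape (bucket, key, values[0].data) is an assumption for illustration.

```javascript
// Map: value is the stored object, keyData travels with the input,
// and arg comes from the phase definition.
function showMapArgs(value, keyData, arg) {
  return [{
    key: value.key,
    data: value.values[0].data,
    keyData: keyData,
    arg: arg
  }];
}

// Reduce: values is the list of results collected so far,
// and arg comes from the phase definition.
function showReduceArgs(values, arg) {
  return [{ received: values.length, arg: arg }];
}

// Stand-in for a stored object (made up for illustration).
var stored = { bucket: "cat", key: "snowball1", values: [{ data: "meow" }] };

var mapped = showMapArgs(stored, "extra", { mode: "demo" });
var reduced = showReduceArgs(mapped, 10);

console.log(JSON.stringify(mapped));
console.log(JSON.stringify(reduced));
```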
  • Examples
  • Map Demo: count the number of times the word “demo” appears in a set of documents
  • Demo Data: map_demo/key1.txt: “Random boring demo data for map demo”; map_demo/key2.txt: “More useless demo data”; map_demo/key3.txt: “demo demo demo demo demo”
  • Request
  • Inputs
  • Query
  • Map
  • Results
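
The map function from the demo is not captured in this transcript. A minimal sketch of one way to count “demo”, run locally against the demo data listed above, could look like this; the function name, the [key, count] result shape, and the mock object layout are assumptions.

```javascript
// Hypothetical map function: count occurrences of "demo" in one document.
function countDemo(value, keyData, arg) {
  var text = value.values[0].data;
  // match() with a global regex returns every occurrence, or null for none.
  var matches = text.match(/demo/g);
  // Return a list holding one [key, count] pair.
  return [[value.key, matches ? matches.length : 0]];
}

// Helper that builds a stand-in stored object (for local testing only).
function doc(key, text) {
  return { bucket: "map_demo", key: key, values: [{ data: text }] };
}

var docs = [
  doc("key1.txt", "Random boring demo data for map demo"),
  doc("key2.txt", "More useless demo data"),
  doc("key3.txt", "demo demo demo demo demo")
];

// Apply the map function to each document and flatten the result lists.
var results = [];
docs.forEach(function (d) {
  results = results.concat(countDemo(d));
});

console.log(JSON.stringify(results));
```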
  • Reduce Demo: sort documents by the number of times “demo” appears
  • Request
  • Inputs
  • Query
  • Reduce
  • Results
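
A reduce function for this demo might resemble the sketch below. It assumes the map phase emits [key, count] pairs and sorts them with the highest count first; both the pair shape and the sort direction are choices made here, since the slide content is not in the transcript.

```javascript
// Hypothetical reduce function: order [key, count] pairs by count, descending.
function sortByCount(values, arg) {
  // Copy first so the input list is not modified in place.
  return values.slice().sort(function (a, b) {
    return b[1] - a[1];
  });
}

// Sample map output for the three demo documents.
var counts = [
  ["key1.txt", 2],
  ["key2.txt", 1],
  ["key3.txt", 5]
];

var sorted = sortByCount(counts);
console.log(JSON.stringify(sorted));
```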
  • Argument Demo: enhance “demo” count example to count words matching a regular expression
  • Map with arg
  • Results
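
One way the enhanced map function could use the phase argument is sketched below: the pattern arrives as a string in arg and is compiled into a global regular expression. The function name, result shape, and sample pattern are assumptions for illustration.

```javascript
// Hypothetical map function: count matches of the pattern supplied in arg.
function countByRegex(value, keyData, arg) {
  // arg is assumed to be a regular expression source string.
  var re = new RegExp(arg, "g");
  var matches = value.values[0].data.match(re);
  return [[value.key, matches ? matches.length : 0]];
}

// Stand-in stored object built from the demo data (for local testing only).
var key1 = {
  bucket: "map_demo",
  key: "key1.txt",
  values: [{ data: "Random boring demo data for map demo" }]
};

// Count words that start with "d": demo, data, demo.
var wordsStartingWithD = countByRegex(key1, undefined, "\\bd\\w+");
// The original behaviour is recovered by passing "demo" as the pattern.
var demoOnly = countByRegex(key1, undefined, "demo");

console.log(JSON.stringify(wordsStartingWithD));
console.log(JSON.stringify(demoOnly));
```

Because the pattern is supplied at query time, the same function can serve different searches without being rewritten.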
  • Deploying Demo: deploy enhanced count function as a named function
  • js_source_dir: app.config; $ riak restart
  • Named Function: /tmp/js_source/count_by_regex.js; $ riak-admin js_reload
  • Query
  • Results
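
The contents of count_by_regex.js and the matching query are not in the transcript. The sketch below shows what such a file and phase definition might contain; the function name countByRegex, the pattern in arg, and the assumption that a top-level function in the file can be referenced by its name are all hypothetical.

```javascript
// Possible contents of /tmp/js_source/count_by_regex.js (hypothetical).
function countByRegex(value, keyData, arg) {
  var re = new RegExp(arg, "g");
  var matches = value.values[0].data.match(re);
  return [[value.key, matches ? matches.length : 0]];
}

// A phase that refers to the deployed function by name instead of
// shipping its source with every request (name assumed for illustration).
var phase = {
  map: {
    language: "javascript",
    name: "countByRegex",
    arg: "demo",
    keep: true
  }
};

console.log(JSON.stringify(phase));
```

Compared with an anonymous function, the request carries only a name, so the function source does not have to be sent and evaluated with each query.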
  • Comparisons
  • Hadoop (similarities): distributed across multiple machines; provides data locality (HDFS); phases run near the data
  • Hadoop (differences): used for large, long-running jobs (hours); restarts failed tasks; 3 phases (map, combine, reduce)
  • CouchDB (differences): not distributed across multiple machines; runs over all docs in a database; computes cached views for lookups; no query-time arguments; 2 phases (map, reduce)
  • MongoDB (differences): not run in parallel; not spread across multiple machines; 3 phases (map, reduce, finalize)
  • Closing thoughts
  • Good to Know: phases must always return lists; map inputs are always bucket/key pairs; bucket queries are bad; anonymous functions are bad
  • Features not Reviewed: link phase (link walking); results from multiple phases; Erlang MapReduce functions; streaming results
  • Questions?