MapReduce with R
MapReduce with R: Presentation Transcript

  • MapReduce with R. @holidayworking, August 28, 2010
  • Speaker profile. Twitter: @holidayworking. Also listed: F1; Java, PL/SQL; Python, Ruby, R
  • What is MapReduce? A programming model from Google, built from two functions: map and reduce
  • MapReduce processing flow
      1. Map
      2. Shuffle
      3. Reduce
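The three phases can be tried out in plain R before any cluster is involved. The following is a minimal in-memory sketch, not how Hadoop runs a job: the four records are the 2010/07/01 rows of the sample data shown later in the slides, and the field names `type` and `order` are borrowed from the mapper scripts.

```r
# Four records (type, order) taken from the 2010/07/01 sample rows.
records <- list(
  list(type = "single malt", order = 10L),
  list(type = "blended", order = 3L),
  list(type = "blended", order = 6L),
  list(type = "blended", order = 6L)
)

# Map: turn each record into a key/value pair.
mapped <- lapply(records, function(r) list(key = r$type, value = r$order))

# Shuffle: bring together all values that share a key.
keys <- vapply(mapped, function(p) p$key, character(1))
values <- vapply(mapped, function(p) p$value, integer(1))
shuffled <- split(values, keys)

# Reduce: collapse each key's values into a single result.
reduced <- vapply(shuffled, sum, integer(1))
print(reduced)
```

Here `split()` plays the role of the shuffle: it is the only step that needs to see all pairs at once, which is why the framework handles it between the user-written map and reduce steps.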
  • MapReduce [1]
  • MapReduce example: Grep
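In the MapReduce paper [1], distributed grep is described as a map function that emits a line when it matches a pattern, paired with an identity reduce. A rough R sketch of that idea follows; the function names `grep_map` and `grep_reduce` are hypothetical, and the input lines are rows from the sample data used later in the slides.

```r
# Map step of a distributed grep: keep only the lines that match the pattern.
grep_map <- function(lines, pattern) {
  lines[grepl(pattern, lines, fixed = TRUE)]
}

# Reduce step: identity, the matching lines are passed through unchanged.
grep_reduce <- function(lines) lines

lines <- c(
  "2010/07/02\tArdbeg 10 Years Old\tsingle malt\t2",
  "2010/07/02\tBallantine 12 Years Old\tblended\t8",
  "2010/07/02\tBallantine 17 Years Old\tblended\t7"
)
matches <- grep_reduce(grep_map(lines, "Ballantine"))
cat(matches, sep = "\n")
```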
  • Hadoop
      • An open-source implementation of Google File System and MapReduce
      • Hadoop itself is written in Java
      • MapReduce jobs are normally written in Java as well
      • Hadoop Streaming lets MapReduce jobs be written in other languages
      • So MapReduce can be written in R
  • MapReduce with R: the Scotch whiskies used in the example
      • Single malt: Ardbeg 10 Years Old, Bowmore 12 Years Old, Talisker 10 Years Old, The Glenlivet 12 Year Old, The Macallan 12 Years
      • Blended: Ballantine 12 Years Old, Ballantine 17 Years Old, Johnnie Walker Gold Label 18 Years Old, Johnnie Walker Swing
  • Sample data: 250 records created in iWork Numbers (columns: date, brand, type, order)

      2010/07/01  The Macallan 12 Years                   single malt  10
      2010/07/01  Ballantine 12 Years Old                 blended       3
      2010/07/01  Ballantine 17 Years Old                 blended       6
      2010/07/01  Johnnie Walker Gold Label 18 Years Old  blended       6
      2010/07/02  The Glenlivet 12 Year Old               single malt   4
      2010/07/02  Ardbeg 10 Years Old                     single malt   2
      2010/07/02  Ballantine 12 Years Old                 blended       8
      2010/07/02  Ballantine 17 Years Old                 blended       7
      2010/07/02  Johnnie Walker Swing                    blended       3
      (rows omitted)
      2010/07/31  Johnnie Walker Swing                    blended       4
      2010/07/31  Johnnie Walker Gold Label 18 Years Old  blended       2
      2010/07/31  Bowmore 12 Years Old                    single malt   4
      2010/07/31  Talisker 10 Years Old                   single malt   7
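Before running anything on Hadoop, it can help to load a few rows into R and compute an answer the ordinary way, as a reference for the MapReduce output. The sketch below assumes the file is tab-separated with no header row; the column names `date`, `brand`, `type` and `order` are taken from the variable names in the mapper scripts, not from the file itself.

```r
# The four 2010/07/01 rows from the sample data, as tab-separated text.
tsv <- paste(
  "2010/07/01\tThe Macallan 12 Years\tsingle malt\t10",
  "2010/07/01\tBallantine 12 Years Old\tblended\t3",
  "2010/07/01\tBallantine 17 Years Old\tblended\t6",
  "2010/07/01\tJohnnie Walker Gold Label 18 Years Old\tblended\t6",
  sep = "\n"
)

# Parse the text as a data frame; quote = "" keeps apostrophes literal.
orders <- read.delim(
  text = tsv, header = FALSE, quote = "",
  col.names = c("date", "brand", "type", "order"),
  stringsAsFactors = FALSE
)

# Total orders per day, computed without MapReduce.
daily <- tapply(orders$order, orders$date, sum)
print(daily)
```

For 2010/07/01 this gives 25, the same total that the per-date job reports for that day.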
  • Running MapReduce
      1. Write the Mapper
      2. Write the Reducer
      3. Run the job with Hadoop Streaming
           $ hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-0.20.2-streaming.jar \
               -input scotch.tsv -output output -mapper mapper.r -reducer reducer.r
      4. Check the result
           $ cat output/part-00000
           blended      592
           single malt  783
  • Reducer (reducer.r): sums the values for each key

      #!/usr/bin/env Rscript

      # Running totals, one entry per key
      env <- new.env(hash = TRUE)

      con <- file("stdin", open = "r")
      while (length(line <- readLines(con, n = 1, warn = FALSE)) > 0) {
        # Each input line is "key<TAB>value"
        line <- unlist(strsplit(line, "\t"))
        key <- line[1]
        value <- as.integer(line[2])
        if (exists(key, envir = env, inherits = FALSE)) {
          oldcount <- get(key, envir = env)
          assign(key, oldcount + value, envir = env)
        } else {
          assign(key, value, envir = env)
        }
      }
      close(con)

      # Emit one "key<TAB>total" line per key
      for (key in ls(env, all.names = TRUE)) {
        cat(key, "\t", get(key, envir = env), "\n", sep = "")
      }
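The reducer on this slide keeps every key in an environment until the input ends. Hadoop Streaming sorts the mapper output by key before the reducer sees it, so a reducer can also emit each total as soon as the key changes. The following is a sketch of that alternative, not the version from the talk; `reduce_sorted` is a hypothetical name, and it works on a character vector rather than stdin so it can be tried interactively.

```r
# Sum values over runs of identical keys in sorted "key<TAB>value" lines.
reduce_sorted <- function(lines) {
  out <- character(0)
  current <- NULL
  total <- 0L
  for (line in lines) {
    fields <- strsplit(line, "\t", fixed = TRUE)[[1]]
    key <- fields[1]
    value <- as.integer(fields[2])
    if (!is.null(current) && key != current) {
      # The key changed, so the previous key is complete: emit its total.
      out <- c(out, paste(current, total, sep = "\t"))
      total <- 0L
    }
    current <- key
    total <- total + value
  }
  # Emit the total for the last key, if there was any input at all.
  if (!is.null(current)) {
    out <- c(out, paste(current, total, sep = "\t"))
  }
  out
}

sorted_lines <- c("blended\t3", "blended\t6", "blended\t6", "single malt\t10")
result <- reduce_sorted(sorted_lines)
cat(result, sep = "\n")
```

The trade-off is that this version is only correct when equal keys are adjacent, whereas the environment-based reducer also works on unsorted input.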
  • Mapper: orders by date

      #!/usr/bin/env Rscript

      con <- file("stdin", open = "r")
      while (length(line <- readLines(con, n = 1, warn = FALSE)) > 0) {
        # Input columns: date, brand, type, order
        line <- unlist(strsplit(line, "\t"))
        date <- line[1]
        order <- line[4]
        # Emit "date<TAB>order"
        cat(sprintf("%s\t%s\n", date, order), sep = "")
      }
      close(con)

      $ cat output/part-00000
      2010/07/01  25
      2010/07/02  42
      2010/07/03  39
      ...
      2010/07/29  17
      2010/07/30  45
      2010/07/31  47
  • Mapper: orders by brand

      #!/usr/bin/env Rscript

      con <- file("stdin", open = "r")
      while (length(line <- readLines(con, n = 1, warn = FALSE)) > 0) {
        # Input columns: date, brand, type, order
        line <- unlist(strsplit(line, "\t"))
        brand <- line[2]
        order <- line[4]
        # Emit "brand<TAB>order"
        cat(sprintf("%s\t%s\n", brand, order), sep = "")
      }
      close(con)

      $ cat output/part-00000
      Ardbeg 10 Years Old                     166
      Ballantine 12 Years Old                 142
      Ballantine 17 Years Old                 150
      Bowmore 12 Years Old                    149
      Johnnie Walker Gold Label 18 Years Old  176
      Johnnie Walker Swing                    124
      Talisker 10 Years Old                   176
      The Glenlivet 12 Year Old               164
      The Macallan 12 Years                   128
  • Mapper: orders by type

      #!/usr/bin/env Rscript

      con <- file("stdin", open = "r")
      while (length(line <- readLines(con, n = 1, warn = FALSE)) > 0) {
        # Input columns: date, brand, type, order
        line <- unlist(strsplit(line, "\t"))
        type <- line[3]
        order <- line[4]
        # Emit "type<TAB>order"
        cat(sprintf("%s\t%s\n", type, order), sep = "")
      }
      close(con)

      $ cat output/part-00000
      blended      592
      single malt  783
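The three mappers differ only in which column they use as the key, so the shared part can be factored into one function. This is a sketch under that observation, not code from the talk: `emit_pairs` and its parameters `key_col` and `value_col` are hypothetical names, and the function takes a character vector instead of reading stdin so it is easy to try.

```r
# Build "key<TAB>value" lines from tab-separated input lines.
# key_col selects the grouping column: 1 = date, 2 = brand, 3 = type.
emit_pairs <- function(lines, key_col, value_col = 4L) {
  fields <- strsplit(lines, "\t", fixed = TRUE)
  vapply(
    fields,
    function(f) paste(f[key_col], f[value_col], sep = "\t"),
    character(1)
  )
}

lines <- c(
  "2010/07/02\tThe Glenlivet 12 Year Old\tsingle malt\t4",
  "2010/07/02\tArdbeg 10 Years Old\tsingle malt\t2"
)
by_brand <- emit_pairs(lines, key_col = 2L)
by_type <- emit_pairs(lines, key_col = 3L)
cat(by_brand, sep = "\n")
```

In a standalone script, the column index could plausibly be read with `commandArgs(trailingOnly = TRUE)`, so that one mapper file serves all three jobs; whether that is convenient depends on how the job is launched.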
  • Summary
      • MapReduce: a programming model from Google, built from the map and reduce functions
      • Hadoop: an open-source implementation of Google File System and MapReduce
      • With Hadoop Streaming, MapReduce jobs can be written in R
  • References
      [1] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI '04: Sixth Symposium on Operating System Design and Implementation, 2004.
      [2] Tom White. Hadoop.