Upcoming SlideShare
×

6,977 views

Published on

Published in: Technology
5 Likes
Statistics
Notes
• Full Name
Comment goes here.

Are you sure you want to Yes No
• Be the first to comment

Views
Total views
6,977
On SlideShare
0
From Embeds
0
Number of Embeds
484
Actions
Shares
0
48
0
Likes
5
Embeds 0
No embeds

No notes for slide
• \n
• \n
• \n
• \n
• \n
• \n
• \n
• \n
• \n
• \n
• \n
• \n
• \n
• \n
• \n
• \n
• \n
• \n
• \n
• \n
• \n
• \n
• \n
• \n
• \n
• \n
• \n
• \n

1. 1. 6
2. 2. 7
3. 3. 8
4. 4. Blocks (Input Data) Parallel Partition Node 1 Node 2 Node 3 (MAP) Network TransferParallel Recombine Node 1 Node 2 Node 3 (REDUCE) Output Data 9
5. 5. 10
6. 6. 11
7. 7. 12
8. 8. 13
9. 9. 14
10. 10. •R CMD INSTALL Rhipe_version.tar.gz 15
11. 11. map <- expression({ #})reduce <- expression( pre = {}, reduce = {}, post = {})z <- rhmr(map=map, reduce=reduce, inout=c("text","sequence") ,ifolder=”/tmp/input”, ofolder=”/tmp/output”)rhex(z)results <- rhread(“/tmp/output”) 16
12. 12. map <- expression({ library(openNLP) f <- table(tokenize(unlist(map.values), language = "en")) n <- names(f) p <- as.numeric(f) sapply(seq_along(n),function(r) rhcollect(n[r],p[r]))})reduce <- expression( pre = { total <- 0}, reduce = { total <- total+sum(unlist(reduce.values)) }, post = { rhcollect(reduce.key,total) })z <- rhmr(map=map, reduce=reduce, inout=c("text","sequence") ,ifolder=”/tmp/input”, ofolder=”/tmp/output”)rhex(z) 17
13. 13. > results <- rhread("/tmp/output")> results <- data.frame(word=unlist(lapply(results,"[[",1))’+ ,count =unlist (lapply(results,"[[",2)))> results <- (results[order(results\$count, decreasing=TRUE), ])> head(results) word count13 . 2080439 the 110111 , 76032 a 701153 to 65828 I 651> results[results["word"] == "FACEBOOK", ] word count3221 FACEBOOK 6> results[results["word"] == "Facebook", ] word count3223 Facebook 39> results[results["word"] == "facebook", ] word count3389 facebook 6 18
14. 14. 19