
Enabling R on Hadoop


Hadoop is a disruptive data processing framework that has made a large impact on today's data ecosystems. Enabling business users to translate existing skills to Hadoop is necessary to encourage adoption and to let businesses get value out of their Hadoop investment quickly. R, a prolific and rapidly growing data analysis language, now has a place in the Hadoop ecosystem. With the advent of technologies such as RHadoop, optimizing R workloads for use on Hadoop has become much easier. This session will help you understand how RHadoop projects such as RMR and RHDFS work with Hadoop, and will show you examples of using these technologies on the Hortonworks Data Platform.

Published in: Technology


  1. © Hortonworks Inc. 2012. Enabling R on Hadoop, July 11, 2013
  2. Your Presenters
     • Ravi Mutyala, Systems Architect
     • Paul Codding, Solutions Engineer
  3. Agenda
     • A Brief History of R
     • How R is Typically Used
     • How R is Used with Hadoop
     • Getting Started
  4. A Brief History of R
  5. History of R (timeline diagram)
     • S
       – 1976: S, Fortran, John Chambers
       – 1988: S V3, written in C & statistical models included
       – 1998: S V4
     • R
       – 1991: R created by Ross Ihaka & Robert Gentleman
       – 1997: R Core Group formed
       – 2000: R version 1.0 released
  6. How R is Typically Used
  7. Main Uses of R
     • Statistical Analysis & Modeling
       – Classification
       – Scoring
       – Ranking
       – Clustering
       – Finding relationships
       – Characterization
     • Common Uses
       – Interactive Data Analysis
       – General Purpose Statistics
       – Predictive Modeling
  8. How R is Used with Hadoop
  9. Hadoop Components (diagram: Hortonworks Data Platform (HDP))
     • Layers: Hadoop Core (Distributed Storage & Processing), Data Services (Store, Process and Access Data), Operational Services (Manage & Operate at Scale), Platform Services (Enterprise Readiness: HA, DR, Snapshots, Security, …)
     • Components: HDFS, YARN (in 2.0), WebHDFS, MapReduce, HCatalog, Hive, Pig, HBase, Sqoop, Flume, Oozie, Ambari
     • Deployment targets: OS, Cloud, VM, Appliance
  10. Hadoop Components & R (same HDP diagram as the previous slide, with these components called out)
     • Data Service Components
       – Hive
       – HBase
     • Hadoop Core
       – MapReduce
       – HDFS
  11. Options for R on Hadoop
     • Options
       – RODBC/RJDBC
       – RHive
       – RHadoop
     • Analysis
       – Focus
       – Integration Ease
       – Benefits
       – Limitations
  12. RODBC/RJDBC
     • Focus
       – SQL access from R
     • Integration Ease
       – Install Hortonworks Hive ODBC Driver
       – Install Hive libraries
     • Benefits
       – Low impact on existing R scripts that use other DB packages
       – No need to install Hadoop configuration or binaries on client machines
     • Limitations
       – Parallelism limited to Hive
       – Result set size
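The SQL-access route and its result-set-size limitation can be sketched in a few lines of R. This is only an illustration under stated assumptions: `limited_query` and `run_hive_query` are hypothetical helper names, the DSN is whatever name the Hive ODBC driver was registered under on the client, and `word_counts` is a made-up table. Only `odbcConnect`, `sqlQuery`, and `odbcClose` come from the RODBC package.

```r
# Hypothetical helper: wrap a query so Hive returns at most `max_rows` rows,
# keeping the result small enough to hold in R's memory on the client.
limited_query <- function(sql, max_rows = 1000L) {
  sprintf("SELECT * FROM (%s) capped LIMIT %d", sql, as.integer(max_rows))
}

# Hypothetical helper: run a capped query against Hive through an ODBC DSN.
# Nothing here executes until the function is called on a configured client.
run_hive_query <- function(dsn, sql, max_rows = 1000L) {
  if (!requireNamespace("RODBC", quietly = TRUE)) {
    stop("The RODBC package is not installed")
  }
  channel <- RODBC::odbcConnect(dsn)
  if (!inherits(channel, "RODBC")) {
    stop("Could not open an ODBC connection to DSN: ", dsn)
  }
  on.exit(RODBC::odbcClose(channel))  # always release the connection
  RODBC::sqlQuery(channel, limited_query(sql, max_rows))
}

# Building the query needs no cluster; the table name is made up.
print(limited_query("SELECT word, n FROM word_counts", 10))
```

Pushing aggregation and row limits into the HQL keeps the heavy work in Hive, which is where the parallelism lives in this option.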
  13. Deployment Considerations (cluster diagram)
     • Worker nodes: TT, DN (TaskTracker, DataNode)
     • Master services: JT (JobTracker), NN (NameNode), HS (HiveServer)
  14. RHive
     • Focus
       – Broad access to Hive and HDFS
     • Integration Ease
       – Requires Hadoop binaries, libraries, and configuration files on client machines
       – Uses Java DFS Client and HiveServer
     • Benefits
       – Wide range of features expressed through HQL
       – rhive-apply: R distributed apply function using HQL
     • Limitations
       – Requires heavy client deployment
       – Dependent on HiveServer, and can't be used with HiveServer2
  15. Deployment Considerations (cluster diagram)
     • Worker nodes: TT + DN
     • Master services: JT, NN, HS
     • R Edge Node
  16. RHadoop
     • Focus
       – Tight integration with core Hadoop components
     • Benefits
       – Ability to run R on a massively distributed system
       – Ability to work with full data sets instead of sample sets
     • Additional Information
       – https://github.com/RevolutionAnalytics/RHadoop/wiki
  17. RHadoop Architecture (diagram)
     • R packages: rhdfs, rhbase, rmr2
     • rhdfs: HDFS
     • rhbase: HBase Thrift Gateway, HBase
     • rmr2: MapReduce, Streaming, R on the cluster nodes
  18. rhdfs
     • Access HDFS from R
     • Read from HDFS to R dataframe
     • Write from R dataframe to HDFS
     • Version 1.0.6 adds support for Windows (using HDP)
  19. rhdfs
     • Hadoop CLI commands & rhdfs equivalents
     • hadoop fs -ls /
       – hdfs.ls("/")
     • hadoop fs -mkdir /user/rhdfs/ppt
       – hdfs.mkdir("/user/rhdfs/ppt")
     • hadoop fs -put 1.txt /user/rhdfs/ppt/
       – localData <- system.file(file.path("unitTestData", "1.txt"), package="rhdfs")
       – hdfs.put(localData, "/user/rhdfs/ppt/1.txt")
     • hadoop fs -get /user/rhdfs/ppt/1.txt 1.txt
       – hdfs.get("/user/rhdfs/ppt/1.txt", "test")
     • hadoop fs -rm /user/rhdfs/ppt/1.txt
       – hdfs.delete("/user/rhdfs/ppt/1.txt")
  20. rhbase
     • Access and change data within HBase
     • Uses Thrift API
     • Command Examples
       – hb.new.table
       – hb.insert
       – hb.scan.ex
       – hb.scan
  21. rmr2
     • Enables writing MapReduce jobs using R
     • Ability to parallelize algorithms
     • Ability to use big data sets without needing to sample data
     • mapreduce(input, output, map, reduce, …)
     • Reduce takes a key and a collection of values, which could be a vector, list, data frame, or matrix
     • Version 2.2.1 adds support for Windows (using HDP)
  22. Sample code - wordcount
          # The slide shows only the function body; the enclosing function is
          # restored here so that input, output, and pattern are defined and
          # the final closing brace is balanced.
          wordcount =
            function(input, output = NULL, pattern = " ") {
              # Map: split each line into words and emit (word, 1).
              wc.map =
                function(., lines) {
                  keyval(
                    unlist(
                      strsplit(
                        x = lines,
                        split = pattern)),
                    1)}
              # Reduce: sum the counts emitted for each word.
              wc.reduce =
                function(word, counts) {
                  keyval(word, sum(counts))}

              mapreduce(
                input = input,
                output = output,
                input.format = "text",
                map = wc.map,
                reduce = wc.reduce,
                combine = TRUE)}
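To see what the word-count job computes without a cluster, the map, shuffle, and reduce steps can be imitated in base R. This is a local sketch only: `simulate_wordcount` is an illustrative name rather than part of rmr2, and the grouping done by `split()` stands in for the shuffle that Hadoop performs between the map and reduce phases.

```r
# Local, base-R walk-through of the word-count logic; no Hadoop or rmr2 needed.
# `simulate_wordcount` is a hypothetical name, not an rmr2 function.
simulate_wordcount <- function(lines, pattern = " ") {
  # Map: split every line into words and pair each word with a count of 1.
  words <- unlist(strsplit(x = lines, split = pattern))
  ones  <- rep(1, length(words))
  # Shuffle: collect the 1s emitted for each distinct word.
  grouped <- split(ones, words)
  # Reduce: sum the collected counts per word.
  vapply(grouped, sum, numeric(1))
}

counts <- simulate_wordcount(c("r on hadoop", "hadoop and r"))
print(counts)
```

The result is a named numeric vector with one entry per distinct word, which mirrors the key/value pairs the reducer emits in the rmr2 version.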
  23. More Sample Code
          # 50 draws from a Binomial(size = 32, prob = 0.4) distribution.
          groups = rbinom(32, n = 50, prob = 0.4)
          # Local count of how often each value occurs.
          tapply(groups, groups, length)
          # The same count as a MapReduce job: write the data to the DFS,
          # emit (value, 1) in the map, and count per key in the reduce.
          groups = to.dfs(groups)
          from.dfs(
            mapreduce(
              input = groups,
              map = function(., v) keyval(v, 1),
              reduce =
                function(k, vv)
                  keyval(k, length(vv))))
  24. Deployment Considerations (cluster diagram)
     • Worker nodes: TT, DN, RS (TaskTracker, DataNode, RegionServer), each with R installed
     • Master services: JT, NN, HTG (JobTracker, NameNode, HBase Thrift Gateway)
     • R Edge Node
  25. RHadoop
     • Limitations
       – Requires installation of R on all TaskTracker nodes
       – Does not automatically parallelize algorithms
       – Different slot/memory configuration recommended to leave memory and CPU resources for R
     • Diagram: node resources split between OS and MapReduce, compared with OS, MapReduce, and R
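The slot and memory point can be made concrete with some back-of-the-envelope arithmetic. The sketch below assumes that each running task needs its JVM heap plus memory for one R process (rmr2 runs R through Streaming), and every number in it is invented for illustration; `task_slots` is a hypothetical helper, not a Hadoop setting.

```r
# Hypothetical sizing helper: how many concurrent tasks fit on one node once
# memory is set aside for the OS and, optionally, for an R process per task.
task_slots <- function(node_mem_gb, os_reserve_gb, jvm_heap_gb, r_mem_gb = 0) {
  usable_gb <- node_mem_gb - os_reserve_gb
  floor(usable_gb / (jvm_heap_gb + r_mem_gb))
}

# Made-up node: 48 GB RAM, 8 GB reserved for the OS and daemons, 2 GB per task JVM.
slots_plain  <- task_slots(48, 8, 2)                # MapReduce only
slots_with_r <- task_slots(48, 8, 2, r_mem_gb = 2)  # plus 2 GB per R process
print(c(plain = slots_plain, with_r = slots_with_r))
```

Under these assumed figures the node supports half as many slots once R is accounted for, which is why the slide recommends a different slot configuration on nodes that run R tasks.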
  26. Getting Started
  27. Your Fastest On-ramp to Enterprise Hadoop™!
     http://hortonworks.com/products/hortonworks-sandbox/
     The Sandbox lets you experience Apache Hadoop from the convenience of your own laptop: no data center, no cloud and no internet connection needed!
     The Hortonworks Sandbox is:
     • A free download: http://hortonworks.com/products/hortonworks-sandbox/
     • A complete, self-contained virtual machine with Apache Hadoop pre-configured
     • A personal, portable and standalone Hadoop environment
     • A set of hands-on, step-by-step tutorials that allow you to learn and explore Hadoop
  28. Installation
     • Install R on all nodes
     • Install dependent packages
       – RJSONIO
       – itertools
       – digest
       – Rcpp
       – rJava
       – functional
       – RCurl
       – httr
       – plyr
     • Download & install RHadoop packages
       – rmr2
       – rhdfs
       – rhbase (requires Thrift)
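Because the dependent packages have to be present on every node, a quick check of what is still missing can save a failed job later. The following base-R sketch uses the package names from the slide; `missing_packages` is an illustrative helper name, and the install call is left commented out on the assumption that not every node has access to a CRAN mirror.

```r
# Dependency packages named on the Installation slide.
rhadoop_deps <- c("RJSONIO", "itertools", "digest", "Rcpp", "rJava",
                  "functional", "RCurl", "httr", "plyr")

# Hypothetical helper: return the packages in `pkgs` that are not installed.
# `installed` defaults to the packages visible to this R session.
missing_packages <- function(pkgs,
                             installed = rownames(installed.packages())) {
  setdiff(pkgs, installed)
}

to_install <- missing_packages(rhadoop_deps)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment on a node with CRAN access
} else {
  message("All listed dependencies are installed")
}
```

Running the same check on each node before installing rmr2, rhdfs, and rhbase gives a simple way to confirm the cluster is consistent.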
  29. Questions & Answers
     • TRY: Download HDP at hortonworks.com
     • LEARN: Applying Data Science using Apache Hadoop Training
     • FOLLOW: twitter: @hortonworks, Facebook: facebook.com/hortonworks
     • Further questions & comments: paul@hortonworks.com, ravi@hortonworks.com
