Pingax
Big Data Analytics with R and Hadoop
http://pingax.com

How can R and Hadoop be used together


Author : Vignesh Prajapati
Categories : Hadoop, Machine Learning, R
Tagged as : Hadoop, Machine Learning, R
Date : November 26, 2013

Inspired by this Quora question, I started looking into how R and Hadoop can be integrated and used together. After a thorough verification process, I arrived at the possible ways to use R and Hadoop together for Big Data Analytics. This blog post is written to help Data Scientists, Data Engineers and Data Analysts who want a solution for running Machine Learning applications on larger datasets, and it suggests some refined ways to make that possible.

I assume here that you want to run Machine Learning algorithms (see the well-known Coursera online course by Professor Andrew Ng) over a dataset that is too large for the memory of a single machine. With the integrations described below, R users are not required to learn a new language, e.g., Java, or a new environment, e.g., cluster software and hardware, to work with Hadoop. Moreover, functionality from open source R packages can be used when writing mapper and reducer functions.

Since the popularity of the combined R and Hadoop platform keeps increasing, I think Big Data Analytics can become an emerging trend. With the help of this parallel data analytics platform, large organizations can more easily derive insights and gain greater advantages from Big Data Analytics.

Let's look at an outline of the ways R and Hadoop can be integrated to scale data analytics up to Big Data Analytics. They are as given below:

  1. RHadoop
  2. RHIPE
  3. ORCH
  4. HadoopStreaming (R package)
  5. Hadoop Streaming (Hadoop utility)

Now let's discuss real-world cases with popular Hadoop tools. To explain how this is possible, I am going to use various R and Hadoop tools, so let's first check the list of useful software. We need the following data-driven tools, grouped by technology:

  1. Linux-based operating system
     1. Ubuntu - Fast, secure and stylishly simple, the Ubuntu operating system is used by 20 million people worldwide every day.
     2. CentOS - CentOS is an enterprise-class Linux distribution derived from sources freely provided to the public by a prominent North American enterprise Linux vendor.
     3. Red Hat
  2. R
     1. R - The R programming language, for working with Machine Learning concepts.
     2. RStudio - RStudio is a well-known IDE for R.
  3. Hadoop
     1. Hadoop - Hadoop is an open source Big Data solution. Since it is a little hard to install Hadoop with all its components, I suggest trying a Hadoop distribution provided by Hortonworks, Cloudera, MapR or Amazon EMR.

There are possibly five ways to use R and Hadoop together. Let's look at each R and Hadoop integration:

  1. RHadoop - RHadoop is a great open source solution for R and Hadoop provided by Revolution Analytics. RHadoop is bundled with four main R packages to manage and analyze data with the Hadoop framework.
  2. RHIPE - RHIPE is the R and Hadoop Integrated Programming Environment, designed around Divide and Recombine (D&R) techniques for analyzing large datasets.
  3. ORCH - ORCH is the Oracle R Connector for Hadoop. ORCH can be used on the Oracle Big Data Appliance or on non-Oracle Hadoop clusters.
  4. HadoopStreaming - HadoopStreaming is an R package, available on CRAN, that provides utilities for using R scripts with Hadoop Streaming. It was developed by David S. Rosenberg with the aim of making Hadoop Streaming as easy as possible for R users.
  5. Hadoop Streaming - Hadoop Streaming is a Hadoop utility that allows users to develop and run MapReduce programs in languages other than Java.

In my next blog posts, I will write about how Machine Learning can be performed with the Big Data platform of R and Hadoop. If you want me to write about particular tools and technologies that can be used for the same purpose, let me know.
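To make the mapper and reducer idea from option 5 (Hadoop Streaming) a little more concrete, here is a minimal word-count sketch in base R. It is only an illustration under stated assumptions, not code from any of the packages listed above: the function names `map_lines` and `reduce_records` are hypothetical, and the map, sort and reduce phases are simulated locally on a small character vector rather than run on a cluster.

```r
# Hypothetical mapper: turn each input line into "word<TAB>1" records,
# the tab-separated key/value text format that Hadoop Streaming passes around.
map_lines <- function(lines) {
  words <- unlist(strsplit(tolower(lines), "[^a-z]+"))
  words <- words[nzchar(words)]
  if (length(words) == 0L) return(character(0))
  paste(words, 1L, sep = "\t")
}

# Hypothetical reducer: sum the counts of all records that share a key.
reduce_records <- function(records) {
  if (length(records) == 0L) return(character(0))
  parts  <- strsplit(records, "\t", fixed = TRUE)
  keys   <- vapply(parts, `[`, character(1), 1L)
  counts <- as.integer(vapply(parts, `[`, character(1), 2L))
  totals <- tapply(counts, keys, sum)
  paste(names(totals), totals, sep = "\t")
}

# Local simulation of the map -> sort -> reduce flow (no Hadoop involved).
input    <- c("R and Hadoop", "Hadoop and R together")
mapped   <- map_lines(input)
shuffled <- sort(mapped)          # stands in for Hadoop's shuffle/sort step
result   <- reduce_records(shuffled)
writeLines(result)
```

In a real streaming job, the mapper and the reducer would typically live in two separate R scripts that read their records with `readLines(file("stdin"))` and emit results with `writeLines()`; the scripts are then handed to the Hadoop Streaming utility through its `-mapper` and `-reducer` options.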
