RHadoop

Big Data Analytics with R and Hadoop

  1. Big Data Analytics with R and Hadoop. D. Praveen Kumar, Research Scholar (Full-Time), Department of Computer Science & Engineering, YSREC of Yogi Vemana University, Proddatur, Kadapa Dt., A.P., India. November 30, 2016. (YSR Engineering College of YVU, Proddatur, Kadapa; Big Data Analytics with R and Hadoop; November 30, 2016; Slide: 1 / 70)
  2. Outline: (1) Introduction, (2) RHadoop, (3) RHadoop Installation, (4) rhdfs Methods, (5) rmr2, (6) Examples.
  3. Big Data - Introduction: Big Data deals with large and complex data sets that can be structured, semi-structured, or unstructured and typically do not fit into memory for processing. Such data has to be processed in place, which means the computation is done where the data resides.
  4. Big Data - 3V’s: Velocity refers to the low-latency, real-time speed at which analytics need to be applied (for example, performing analytics on a continuous stream of data originating from a social networking site). Volume refers to the size of the data set, which may be in KB, MB, GB, TB, or PB depending on the application that generates or receives the data. Variety refers to the different types of data that can exist, for example text, audio, video, and photos.
  5. Big Data - 3V’s (cont.) [figure]
  6. Popular Organizations that Hold Big Data: some of the popular organizations that hold Big Data (figures up to 2014) are Facebook, which holds 40 PB of data and captures 100 TB/day; Yahoo!, which holds 60 PB of data; Twitter, which captures 8 TB/day; and eBay, which holds 40 PB of data and captures 50 TB/day.
  7. Hadoop - Introduction: Apache Hadoop is an open-source Java framework for processing and querying vast amounts of data on large clusters of commodity hardware. Hadoop is a top-level Apache project, initiated and led by Yahoo! and Doug Cutting. Its impact comes down to four characteristics: it is scalable, cost-effective, flexible, and fault-tolerant. Apache Hadoop has two main components: HDFS (Hadoop Distributed File System) for storage and MapReduce for processing.
  8. Requirements: Necessary: Java >= 7, ssh, a Linux OS (Ubuntu >= 14.04), and the Hadoop framework. Optional: Eclipse and an Internet connection.
  9. Java 7 & Installation: Hadoop requires a working Java installation; Java 1.7 or later is recommended. Install Java on a Linux platform with either of these commands:
      sudo apt-get install openjdk-7-jdk
      sudo apt-get install default-jdk
  10. Java PATH Setup: set the JAVA_HOME variable by opening the .bashrc file in your home directory (gedit ~/.bashrc) and adding this line at the end:
      export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
    Then reload the file with source ~/.bashrc (or open a new terminal).
  11. Installation & Configuration of SSH: Hadoop requires SSH (Secure Shell) access to manage its nodes, i.e. the remote machines plus your local machine if you want to run Hadoop on it. Install SSH with:
      sudo apt-get install ssh
    Next, generate a DSA SSH key with an empty passphrase for the user and authorize it:
      ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
      cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
    (OpenSSH 7.0 and later disable DSA keys by default; on such systems use -t rsa with ~/.ssh/id_rsa instead.)
  12. Download & Extract Hadoop: download Hadoop from the Apache download mirrors (http://mirror.fibergrid.in/apache/hadoop/common/) and extract the package to a location of your choice; here it is /usr/local/hadoop (with hadoop-2.7.2.tar.gz placed in /usr/local):
      $ sudo chmod 777 /usr/local
      $ cd /usr/local
      $ tar xzf hadoop-2.7.2.tar.gz
      $ sudo mv hadoop-2.7.2 hadoop
  13. Add Hadoop configuration in .bashrc: add the following lines to .bashrc in your home directory:
      export HADOOP_INSTALL=/usr/local/hadoop
      export PATH=$PATH:$HADOOP_INSTALL/bin
      export PATH=$PATH:$HADOOP_INSTALL/sbin
      export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
      export HADOOP_HDFS_HOME=$HADOOP_INSTALL
      export HADOOP_COMMON_HOME=$HADOOP_INSTALL
      export YARN_HOME=$HADOOP_INSTALL
      export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
      export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
  14. Create the tmp, DataNode & NameNode directories: create the NameNode directory with
      mkdir -p /usr/local/hadoopdata/hdfs/namenode
    create the DataNode directory with
      mkdir -p /usr/local/hadoopdata/hdfs/datanode
    and create Hadoop's tmp directory, owned by the Hadoop user (hadoop1 here), with
      sudo mkdir -p /app/hadoop/tmp
      sudo chown hadoop1:hadoop1 /app/hadoop/tmp
      sudo chmod 750 /app/hadoop/tmp
  15. Files to Configure: the following files need to be configured: core-site.xml, hadoop-env.sh, mapred-site.xml, and hdfs-site.xml.
  16. Add properties in /usr/local/hadoop/etc/hadoop/core-site.xml: add the following snippets between the <configuration> ... </configuration> tags of core-site.xml. This property specifies the location of the tmp directory:
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/app/hadoop/tmp</value>
      </property>
    This property specifies the default file system and its port number:
      <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:54310</value>
      </property>
  17. Add properties in /usr/local/hadoop/etc/hadoop/hadoop-env.sh: uncomment the JAVA_HOME line and give the correct path to Java:
      export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
  18. Add property in /usr/local/hadoop/etc/hadoop/mapred-site.xml: this file holds the host name and port that the MapReduce job tracker runs at. Add the following to mapred-site.xml:
      <property>
        <name>mapred.job.tracker</name>
        <value>localhost:54311</value>
      </property>
    (Hadoop 2.x ships only mapred-site.xml.template, so copy it to mapred-site.xml first. mapred.job.tracker is the Hadoop 1.x (MRv1) property; for jobs to run on YARN, mapreduce.framework.name must also be set to yarn, since its default is local.)
  19. Add properties in ... etc/hadoop/hdfs-site.xml: add the following to hdfs-site.xml. The replication factor:
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
    The NameNode directory:
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoopdata/hdfs/namenode</value>
      </property>
    The DataNode directory:
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoopdata/hdfs/datanode</value>
      </property>
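The three configuration files above share one structure: a <configuration> root holding <property> elements, each with a <name> and a <value>. A minimal sketch, assuming you want to script these edits from R rather than type them by hand; hadoop_properties is a hypothetical helper, the DataNode entry uses dfs.datanode.data.dir (the property Hadoop reads for DataNode storage), and values are not XML-escaped, so it only suits plain paths and numbers like those on this slide.

```r
# Hypothetical helper: render a Hadoop *-site.xml document from a named list.
# Values are inserted verbatim (no XML escaping), which is fine for plain
# paths and numbers but not for strings containing <, > or &.
hadoop_properties <- function(props) {
  entries <- sprintf(
    "  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>",
    names(props), as.character(unlist(props)))
  paste(c('<?xml version="1.0"?>', "<configuration>", entries, "</configuration>"),
        collapse = "\n")
}

hdfs_site <- hadoop_properties(list(
  dfs.replication       = 1,
  dfs.namenode.name.dir = "file:/usr/local/hadoopdata/hdfs/namenode",
  dfs.datanode.data.dir = "file:/usr/local/hadoopdata/hdfs/datanode"))

cat(hdfs_site, "\n")
# writeLines(hdfs_site, "/usr/local/hadoop/etc/hadoop/hdfs-site.xml")
```

The write line is left commented out because it replaces the whole file, including anything else already configured there.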
  20. Formatting the HDFS file system via the NameNode: the first step in starting up your Hadoop installation is formatting the Hadoop file system. This is needed only the first time you set up Hadoop. Do not format a running Hadoop file system, as you will lose all the data currently in HDFS. To format the file system, run:
      hadoop namenode -format
  21. Starting your single-node cluster: run
      start-all.sh
    This starts a NameNode, SecondaryNameNode, DataNode, ResourceManager, and NodeManager on your machine. A handy tool for checking whether the expected Hadoop processes are running is jps:
      hadoop1@hadoop1:/usr/local/hadoop$ jps
      2598 NameNode
      3112 ResourceManager
      3523 Jps
      2917 SecondaryNameNode
      2727 DataNode
      3242 NodeManager
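Checking the jps listing by eye works on one machine; a small R helper can perform the same check inside a script. A sketch under stated assumptions: missing_daemons is a hypothetical helper, it expects one "PID ClassName" pair per line as in the listing above, and on a live node the lines could be read with system("jps", intern = TRUE).

```r
# Daemons that start-all.sh is expected to launch on a single-node cluster
expected_daemons <- c("NameNode", "SecondaryNameNode", "DataNode",
                      "ResourceManager", "NodeManager")

# Hypothetical helper: return the expected daemons absent from jps output.
missing_daemons <- function(jps_lines, expected = expected_daemons) {
  lines <- trimws(jps_lines)
  fields <- strsplit(lines[nzchar(lines)], "\\s+")
  # The second field of each line is the Java class name of the process
  running <- vapply(fields,
                    function(f) if (length(f) >= 2) f[2] else NA_character_,
                    character(1))
  setdiff(expected, running)
}

# Sample taken from the jps listing above
sample_output <- c("2598 NameNode", "3112 ResourceManager", "3523 Jps",
                   "2917 SecondaryNameNode", "2727 DataNode", "3242 NodeManager")
print(missing_daemons(sample_output))   # character(0): all daemons are up

# On a live node:
# print(missing_daemons(system("jps", intern = TRUE)))
```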
  22. Stopping your single-node cluster: run
      stop-all.sh
    to stop all the daemons running on your machine. The output will look like this:
      stopping NodeManager
      localhost: stopping ResourceManager
      stopping NameNode
      localhost: stopping DataNode
      localhost: stopping SecondaryNameNode
  23. R - Introduction: R is an open-source software package for performing statistical analysis on data. R is a programming language derived from S, a language for statistical computing. R provides a wide variety of statistical, machine learning, and graphical techniques, and is highly extensible. R can now connect with other data stores, such as MySQL, SQLite, MongoDB, and Hadoop.
  24. R - Features: some of R's features are an effective statistical programming language, relational database support, data analytics, data visualization, and extension through the vast library of R packages.
  25. R - Operations: R supports data analytics through operations such as regression, classification, clustering, recommendation, and text mining.
  26. R - Installation (Windows): on Windows, follow these steps: (1) Navigate to www.r-project.org. (2) Click on the CRAN section, select a CRAN mirror, and select Windows as your OS (although sticking to Linux is advisable, since Hadoop is almost always used in a Linux environment). (3) Download the latest R version from the mirror. (4) Run the downloaded .exe to install R.
  27. R - Installation (Ubuntu): on Linux-Ubuntu, follow these steps: (1) Navigate to www.r-project.org. (2) Click on the CRAN section, select a CRAN mirror, and select your OS. (3) Add the CRAN <mirror> entry to the /etc/apt/sources.list file. (4) Download and update the package lists from the repositories with sudo apt-get update. (5) Install R with sudo apt-get install r-base.
  28. R - Installation (RHEL/CentOS): on Linux-RHEL/CentOS, follow these steps: (1) Navigate to www.r-project.org. (2) Click on CRAN, select a CRAN mirror, and select Red Hat OS. (3) Download the R-*core-*.rpm file. (4) Install the .rpm package with rpm -ivh R-*core-*.rpm. (5) Install R with sudo yum install R.
  29. Hadoop MapReduce in R: Hadoop MapReduce can be run from R in three ways: (1) R and Hadoop Integrated Programming Environment (RHIPE), (2) Hadoop Streaming, and (3) RHadoop. Of these three, RHadoop is the most efficient and the easiest to use.
  30. RHadoop - Introduction: RHadoop was developed by Revolution Analytics and consists of three main R packages: (1) rhdfs, which provides HDFS data operations; (2) rmr, which provides MapReduce execution operations; and (3) rhbase, which uses HBase as an input data source. It is not necessary to install all three RHadoop packages to run Hadoop MapReduce operations with R and Hadoop.
  31. RHadoop - Architecture [figure]
  32. rhdfs: rhdfs is an R interface that provides HDFS functionality from the R console. The rhdfs package calls the HDFS API in the backend to operate on data sources stored in HDFS. With rhdfs methods, R programmers can easily perform read and write operations on distributed data files.
  33. rmr: rmr is an R interface that provides the Hadoop MapReduce facility inside the R environment. R programmers only need to divide their application logic into map and reduce phases and submit it with the rmr methods. rmr then calls the Hadoop Streaming MapReduce API with job parameters such as the input directory, output directory, mapper, and reducer to run the R MapReduce job over the Hadoop cluster.
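The map, shuffle, and reduce split that rmr expects can be tried locally before touching a cluster. The following is a plain base-R simulation of a word count, not the rmr2 API: map_phase, shuffle_phase, and reduce_phase are hypothetical names. In rmr2 the map and reduce steps would instead be functions passed to mapreduce(), and Hadoop performs the shuffle (grouping values by key) between them.

```r
# Local simulation of map -> shuffle -> reduce for a word count.
# Plain base R for illustration; this is not the rmr2 API.

map_phase <- function(lines) {
  # Split each line into lower-case words and emit a (word, 1) pair per word
  words <- as.character(unlist(strsplit(tolower(lines), "[^a-z]+")))
  words <- words[nzchar(words)]
  data.frame(key = words, value = rep(1L, length(words)),
             stringsAsFactors = FALSE)
}

shuffle_phase <- function(pairs) {
  # Group all values that share a key (Hadoop does this between the phases)
  split(pairs$value, pairs$key)
}

reduce_phase <- function(groups) {
  # Sum the values collected for each key
  vapply(groups, sum, integer(1))
}

word_count <- function(lines) reduce_phase(shuffle_phase(map_phase(lines)))

counts <- word_count(c("R and Hadoop", "Hadoop and HDFS"))
print(counts)
```

Here counts is a named integer vector (and = 2, hadoop = 2, hdfs = 1, r = 1); split() sorts the keys, much as Hadoop sorts map output by key before the reduce phase.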
  34. rhbase: rhbase is an R interface for operating on the Hadoop HBase data source stored across the distributed network via a Thrift server. The rhbase package provides methods for initialization, read/write, and table manipulation operations.
  35. R and Hadoop installation: R and Hadoop were already installed in the previous steps.
  36. Installing the R packages: to connect R and Hadoop, install the following packages: httr, functional, devtools, plyr, reshape2, rJava, RJSONIO, itertools, digest, and Rcpp.
      install.packages(c('httr', 'functional', 'devtools', 'plyr', 'reshape2'))
      install.packages(c('rJava', 'RJSONIO', 'itertools', 'digest', 'Rcpp'))
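Re-running install.packages() for packages that are already present is slow, so a small check can list only the missing ones. A sketch: missing_packages is a hypothetical helper that treats a package as installed when system.file(package = ...) finds its directory, and the install line is left commented out so nothing is installed until you enable it.

```r
# Prerequisite packages listed on this slide
rhadoop_deps <- c("httr", "functional", "devtools", "plyr", "reshape2",
                  "rJava", "RJSONIO", "itertools", "digest", "Rcpp")

# Hypothetical helper: return the packages that are not installed.
# system.file(package = p) returns "" when package p cannot be found.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)),
                      logical(1))
  pkgs[!installed]
}

to_install <- missing_packages(rhadoop_deps)
print(to_install)
# install.packages(to_install)   # uncomment to install only what is missing
```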
  37. Setting environment variables: set the following environment variables from the R console:
      ## Setting HADOOP_CMD
      Sys.setenv(HADOOP_CMD = "/usr/local/hadoop/bin/hadoop")
      ## Setting up HADOOP_STREAMING
      Sys.setenv(HADOOP_STREAMING = "/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.7.2.jar")
    Alternatively, export them from the shell before starting R:
      export HADOOP_CMD="/usr/local/hadoop/bin/hadoop"
      export HADOOP_STREAMING="/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.7.2.jar"
    The version in the jar file name must match the installed Hadoop release (2.7.2 in the download step).
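A missing or mistyped path in these variables otherwise shows up only when rhdfs or rmr2 is loaded or a job runs. A sketch of an early check: check_hadoop_env is a hypothetical helper, and the paths assume the /usr/local/hadoop layout and the hadoop-2.7.2 release from the download step.

```r
# Hypothetical helper: return the names of variables that are unset or that
# point to a file which does not exist.
check_hadoop_env <- function(vars = c("HADOOP_CMD", "HADOOP_STREAMING")) {
  values <- Sys.getenv(vars)               # "" for variables that are unset
  ok <- nzchar(values) & file.exists(values)
  vars[!ok]
}

# Paths assume Hadoop 2.7.2 extracted to /usr/local/hadoop
Sys.setenv(HADOOP_CMD = "/usr/local/hadoop/bin/hadoop")
Sys.setenv(HADOOP_STREAMING =
  "/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.7.2.jar")

bad <- check_hadoop_env()
if (length(bad) > 0) {
  message("Check these before loading rhdfs/rmr2: ", paste(bad, collapse = ", "))
}
```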
  38. Usage of Hadoop Streaming jar [figure]
  39. Downloading RHadoop Packages: download the RHadoop packages from the Revolution Analytics GitHub repository, https://github.com/RevolutionAnalytics/RHadoop: rmr: rmr2_3.3.1.tar.gz; rhdfs: rhdfs_1.0.8.tar.gz; rhbase: rhbase_1.2.1.tar.gz. These packages can be installed from the R command line or from RStudio.
  40. Installing the rmr package: from a terminal (not from within R), run:
      R CMD INSTALL rmr2_3.3.1.tar.gz
    To install with RStudio instead: click Tools → Install Packages; change the "Install from" option from Repository (CRAN) to Package Archive File (.tar.gz); choose the rmr2_3.3.1.tar.gz file from your local system; and click Install. Installing from an archive file does not download missing dependencies, so the supporting packages installed earlier must already be present.
  41. Installing the rhdfs package: from a terminal (not from within R), run:
      R CMD INSTALL rhdfs_1.0.8.tar.gz
    To install with RStudio instead: click Tools → Install Packages; change the "Install from" option from Repository (CRAN) to Package Archive File (.tar.gz); choose the rhdfs_1.0.8.tar.gz file from your local system; and click Install. Installing from an archive file does not download missing dependencies, so the supporting packages installed earlier must already be present.
  42. Installing the rhbase package: from a terminal (not from within R), run:
      R CMD INSTALL rhbase_1.2.1.tar.gz
    To install with RStudio instead: click Tools → Install Packages; change the "Install from" option from Repository (CRAN) to Package Archive File (.tar.gz); choose the rhbase_1.2.1.tar.gz file from your local system; and click Install. Installing from an archive file does not download missing dependencies, so the supporting packages installed earlier must already be present.
  43. Loading the RHadoop libraries: RHadoop libraries are loaded like any other R library, with require() or library():
      library('rhdfs')   # Loading HDFS
      library('rmr2')    # Loading MapReduce
      library('rhbase')  # Loading HBase
  44. Initializing RHadoop: initialize the rhdfs package with a parameter specifying the location of the Hadoop configuration files. Syntax: hdfs.init(hadoop=PATH), where PATH specifies the location of the Hadoop configuration. If no parameter is passed, the configuration is taken by default from the HADOOP_CMD environment variable.
  45. hdfs.ls: lists the files and directories in HDFS. It returns a data frame whose columns correspond to permissions, owner, group, size (in bytes), modification time, and file or directory name. Syntax: hdfs.ls(path, recurse=FALSE). If recurse is TRUE, subdirectories are listed recursively.
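Because hdfs.ls returns an ordinary data frame, the usual R tools apply to it. A sketch, assuming the size and name columns are called size and file; largest_entries is a hypothetical helper, and the listing below is made-up sample data standing in for real hdfs.ls("/RHadoop", recurse = TRUE) output. Sizes are converted with as.numeric() in case they arrive as text.

```r
# Hypothetical helper: the n largest entries of an hdfs.ls()-style listing.
# Assumes columns named "size" and "file"; size may be numeric or text.
largest_entries <- function(listing, n = 5) {
  sizes <- as.numeric(as.character(listing$size))
  ord <- order(sizes, decreasing = TRUE)
  head(data.frame(file = as.character(listing$file)[ord], size = sizes[ord],
                  stringsAsFactors = FALSE), n)
}

# Made-up sample listing standing in for hdfs.ls("/RHadoop", recurse = TRUE)
listing <- data.frame(
  permission = c("drwxr-xr-x", "-rw-r--r--", "-rw-r--r--"),
  owner = "hadoop1", group = "supergroup",
  size = c("0", "120", "5000000"),
  modtime = "2016-11-30 10:00",
  file = c("/RHadoop/1", "/RHadoop/1/example.txt", "/RHadoop/1/big.csv"),
  stringsAsFactors = FALSE)

print(largest_entries(listing, n = 2))
```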
  46. hdfs.defaults: gets and sets the default configuration of HDFS. Syntax: hdfs.defaults(arg), where arg is the name of the parameter or NULL. It lists the following values: local: an rJava object corresponding to the local file system; blocksize: the default block size of files stored in HDFS; fs: an rJava object corresponding to HDFS; fu: a helper object for rhdfs; classpath: the Java classpath; replication: the default replication factor in HDFS; conf: name-value mappings for the Hadoop configuration parameters.
  47. hdfs.defaults: Examples [figure]
  48. hdfs.cat: reads lines from a file on HDFS. Syntax: hdfs.cat(path, n, buffersize), where path is the location of the source file, n is the number of lines to read from the file, and buffersize is the size of the buffer (optional). Example: hdfs.cat('/RHadoop/1/example.txt')
  49. hdfs.put: transfers data from the local file system to HDFS. Syntax: hdfs.put(src, dest, dstFS=hdfs.defaults("fs")), where src is the location of the source directory or file, dest is the location of the destination directory or file, and dstFS is the destination file system (optional). Example: hdfs.put('/home/dp/Desktop/example.txt', '/RHadoop/1/')
  50. hdfs.get: transfers data from HDFS to the local file system. Syntax: hdfs.get(src, dest, srcFS=hdfs.defaults("fs")), where src is the location of the source directory or file, dest is the location of the destination directory or file, and srcFS is the source file system (optional). Example: hdfs.get('/RHadoop/1/', '/home/dp/Desktop/1/')
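hdfs.put and hdfs.get are mirror images, so a common sanity check is to upload a file, download it again, and compare checksums. A sketch under stated assumptions: verify_roundtrip is a hypothetical helper, it assumes hdfs.init() has already been called and that hdfs.put(src, dest) places the file inside the HDFS directory dest, and the transfer functions are parameters so the flow can be tried with local stand-ins when no cluster is available.

```r
# Hypothetical helper: upload with put_fun, download with get_fun, and
# compare MD5 checksums of the original and the downloaded copy.
# Defaults assume rhdfs is installed and hdfs.init() has been called.
verify_roundtrip <- function(local_file, hdfs_dir,
                             put_fun = rhdfs::hdfs.put,
                             get_fun = rhdfs::hdfs.get) {
  put_fun(local_file, hdfs_dir)
  remote_file <- paste0(sub("/+$", "", hdfs_dir), "/", basename(local_file))
  copy_back <- file.path(tempdir(), paste0("roundtrip_", basename(local_file)))
  get_fun(remote_file, copy_back)
  unname(tools::md5sum(local_file) == tools::md5sum(copy_back))
}

# On a cluster, for example:
# verify_roundtrip("/home/dp/Desktop/example.txt", "/RHadoop/1/")
```

Because R evaluates default arguments lazily, rhdfs is only needed when the defaults are actually used.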
  51. hdfs.copy | hdfs.cp: copies data from one location in HDFS to another location in HDFS. Syntax: hdfs.copy(src, dest, overwrite=FALSE), where src is the location of the source directory or file, dest is the location of the destination directory or file, and overwrite controls whether an existing file at the destination is overwritten. Example: hdfs.copy('/RHadoop/1/', '/RHadoop/2/')
  52. hdfs.move
Moves data from one HDFS location to another and removes the source directory or file.
Syntax: hdfs.move(src, dest)
src: location of the source directory or file
dest: location of the destination directory or file
Example: hdfs.move('/RHadoop/1/', '/RHadoop/2/')
  53. hdfs.rename
Renames a file or directory on HDFS from R.
Syntax: hdfs.rename(src, dest)
src: current location of the directory or file
dest: new location of the directory or file
Example: hdfs.rename('/RHadoop/1/example.txt', '/RHadoop/1/sample.txt')
  54. hdfs.rm | hdfs.rmr | hdfs.delete
These functions delete files or directories on HDFS from R.
Syntax: hdfs.delete(path), hdfs.rm(path), hdfs.rmr(path)
Example (use one of them): hdfs.delete("/RHadoop/1/"), hdfs.rm("/RHadoop/1/"), hdfs.rmr("/RHadoop/1/")
  55. hdfs.chmod
Changes the permissions of HDFS files or directories.
Syntax: hdfs.chmod(path, permissions = '777')
permissions: a character string giving the permission mode of the file or directory in octal notation, as with Unix chmod
Example: hdfs.chmod("/RHadoop", permissions = '777')
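The permissions string follows the octal notation of Unix chmod: each of the three digits sums read (4), write (2) and execute (1) for the owner, the group and all other users. The base-R sketch below decodes such a string into rwx form so the effect of '777' is visible at a glance. `decode_permissions` is a hypothetical helper for illustration, not part of rhdfs.

```r
# Hypothetical helper (not part of rhdfs): decode a three-digit octal
# permission string such as "777" or "755" into Unix "rwx" notation.
decode_permissions <- function(permissions) {
  digits <- as.integer(strsplit(permissions, "")[[1]])
  stopifnot(length(digits) == 3, all(!is.na(digits)),
            all(digits >= 0 & digits <= 7))
  one_digit <- function(d) {
    paste0(if (bitwAnd(d, 4L) > 0) "r" else "-",   # read bit
           if (bitwAnd(d, 2L) > 0) "w" else "-",   # write bit
           if (bitwAnd(d, 1L) > 0) "x" else "-")   # execute bit
  }
  # Digits are, in order: owner, group, others
  paste(vapply(digits, one_digit, character(1)), collapse = "")
}

decode_permissions("777")  # "rwxrwxrwx"
decode_permissions("755")  # "rwxr-xr-x"
```

So '777' gives every user write access to the path; on a shared cluster a narrower mode such as '755' (only the owner may write) is usually the safer choice.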
  56. hdfs.dircreate | hdfs.mkdir
Both functions create a directory on HDFS.
Syntax: hdfs.mkdir(dirname)
Example: hdfs.mkdir("/RHadoop/3/")
  57. hdfs.file
Opens a file on the local system or on HDFS for read or write operations and returns a connection to it.
Syntax: hdfs.file(path, mode, buffersize, ...)
mode: 'r' for read, 'w' for write; append mode is not supported.
Example (a 100 MiB buffer): f = hdfs.file("/RHadoop/2/README.txt", "r", buffersize = 104857600)
  58. hdfs.write
Writes to a file stored on HDFS via streaming.
Syntax: hdfs.write(object, con, hsync = FALSE)
object: any R object; con: an HDFS connection opened in write mode with hdfs.file()
Example:
obj = c(1, 2, 3, 4, 5, 6, 7)
hdfs.write(obj, con, hsync = FALSE)
  59. hdfs.read
Reads bytes from an open HDFS connection (stream). The result is a raw vector that must be converted, or deserialized, before use.
Syntax: hdfs.read(con, n, start)
n: number of bytes to read; start: the position at which reading starts
Example:
f = hdfs.file("/RHadoop/2/README.txt", "r", buffersize = 104857600)
m = hdfs.read(f)
txt = rawToChar(m)
print(txt)
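Because hdfs.read() returns raw bytes, the right decoding depends on how those bytes were written. rawToChar() is correct for a plain text file such as README.txt. Our understanding is that hdfs.write() serializes any non-raw R object before writing it, in which case unserialize() is needed instead. The sketch below reproduces both decodings locally in base R; the raw vectors are stand-ins for what hdfs.read() would return, and no HDFS is involved.

```r
# Local stand-ins for the bytes hdfs.read() might return (no HDFS needed).

# Case 1: a plain text file is just character bytes.
text_bytes <- charToRaw("Hello, HDFS\n")
text <- rawToChar(text_bytes)          # decode as text, as on the slide

# Case 2: an R object stored as serialized bytes (assumed to be what
# hdfs.write() produces for a non-raw object such as c(1, 2, ..., 7)).
obj <- c(1, 2, 3, 4, 5, 6, 7)
obj_bytes <- serialize(obj, connection = NULL)
restored <- unserialize(obj_bytes)     # rawToChar() would not recover obj:
                                       # the bytes contain embedded NULs

print(text)
print(restored)
```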
  60. hdfs.close
Closes the stream once file operations are complete; no further operations are allowed on that connection.
Syntax: hdfs.close(con)
con: the HDFS connection
Example: hdfs.close(f)
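A common R idiom for making sure the close step always happens, even when a read fails partway, is on.exit(). The sketch shows the pattern with a base-R file connection standing in for an HDFS connection, so it runs anywhere; with rhdfs the open and cleanup calls would be hdfs.file() and hdfs.close() instead. `read_all_lines` is a hypothetical helper.

```r
# Hypothetical helper: read a whole file through a connection that is
# always closed. Base-R file() stands in for hdfs.file(); with rhdfs the
# cleanup line would call hdfs.close(con) rather than close(con).
read_all_lines <- function(path) {
  con <- file(path, open = "r")
  on.exit(close(con), add = TRUE)   # runs on normal return and on error
  readLines(con)
}

tmp <- tempfile(fileext = ".txt")
writeLines(c("line one", "line two"), tmp)
read_all_lines(tmp)
```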
  61. hdfs.file.info
Returns metadata about a file stored on HDFS.
Syntax: hdfs.file.info(path)
  62. to.dfs
Writes R objects to the distributed file system.
Syntax: to.dfs(kv, output, format = "native")
kv: a key-value pair or any valid R object such as a vector or matrix; output: a valid path; format: a string naming the format
Example: small.ints <- to.dfs(1:10)
  63. from.dfs
Reads R objects stored on HDFS in binary (serialized) format, for example by to.dfs().
Syntax: from.dfs(input, format)
input: a valid path; format: a string naming the format
Example: from.dfs('/tmp/RtmpRMIXzb/file2bda3fa07850')
  64. mapreduce
Defines and executes a MapReduce job.
Syntax: mapreduce(input, output, map, reduce, input.format, output.format)
input: path to the input folder on HDFS
output: path to the output folder on HDFS
map: an optional R function of two arguments, a key and a value, returning NULL or the value of keyval()
reduce: an optional R function of two arguments, a key and a data structure representing all the values associated with that key
input.format: format of the input data
output.format: format of the output data
  65. keyval
The keyval function creates the return values of the map and reduce functions, which are themselves arguments to mapreduce().
Syntax: keyval(key, val)
key: the desired key or keys; val: the desired value or values
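In the WordCount map function on slide 67, keyval(words, 1) pairs every word with the count 1: the single value is recycled to the length of the key vector. The base-R sketch below mimics that behaviour with a hypothetical stand-in, `kv_sketch`, so it runs without rmr2, and then groups the values by key the way the framework does before reduce is called.

```r
# Hypothetical stand-in for rmr2::keyval(): recycle key and val to a common
# length and return them as a two-element list, mirroring keyval(words, 1).
kv_sketch <- function(key, val) {
  n <- max(length(key), length(val))
  list(key = rep_len(key, n), val = rep_len(val, n))
}

words <- c("big", "data", "big", "r")
pairs <- kv_sketch(words, 1)   # four pairs, each with value 1

# Shuffle step: collect all values that share a key, as the framework
# does before handing (key, values) to reduce().
grouped <- split(pairs$val, pairs$key)
grouped$big   # c(1, 1)
```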
  66. WordCount MapReduce source code
# Set environment variables
Sys.setenv(HADOOP_CMD = "/usr/local/hadoop/bin/hadoop")
Sys.setenv(HADOOP_STREAMING = "/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.7.1.jar")
Sys.setenv(HADOOP_HOME = "/usr/local/hadoop/")
# Load libraries
library(rmr2)
library(rhdfs)
# Initialize the rhdfs package
hdfs.init()
(continued)
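rhdfs and rmr2 find Hadoop through these environment variables (rmr2 additionally needs HADOOP_STREAMING), so a mistyped path tends to surface later as a confusing failure. A small base-R check, run before hdfs.init(), can catch that early. `check_hadoop_env` is a hypothetical helper; the variable names are the three set on the slide.

```r
# Hypothetical pre-flight helper: for each variable, report whether it is
# set and whether the path it names exists on the local file system.
check_hadoop_env <- function(vars = c("HADOOP_CMD", "HADOOP_STREAMING", "HADOOP_HOME")) {
  values <- Sys.getenv(vars, unset = NA)
  ok <- !is.na(values) & nzchar(values) & file.exists(values)
  names(ok) <- vars
  ok
}

status <- check_hadoop_env()
if (!all(status)) {
  warning("Check these variables: ", paste(names(status)[!status], collapse = ", "))
}
```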
  67. WordCount MapReduce source code (continued)
map <- function(k, lines) {
  words.list <- strsplit(lines, ' ')
  words <- unlist(words.list)
  return(keyval(words, 1))
}
reduce <- function(word, counts) {
  keyval(word, sum(counts))
}
wordcount <- function(input, output) {
  mapreduce(input = input, output = output, input.format = "text", map = map, reduce = reduce)
}
(continued)
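Before submitting to a cluster, the map and reduce logic above can be checked on a few lines of text. The sketch below is a base-R simulation, not rmr2: `kv` is a hypothetical stand-in for keyval(), and `run_local` is a hypothetical driver that calls map once on all input lines, groups the emitted values by key (the shuffle), and calls reduce once per key.

```r
# Base-R simulation of the WordCount job (no Hadoop, no rmr2).
# kv() is a hypothetical stand-in for rmr2::keyval() that recycles val.
kv <- function(key, val) {
  n <- max(length(key), length(val))
  list(key = rep_len(key, n), val = rep_len(val, n))
}

# Same logic as the slide's map and reduce functions
wc_map <- function(k, lines) {
  words <- unlist(strsplit(lines, " "))
  kv(words, 1)
}
wc_reduce <- function(word, counts) kv(word, sum(counts))

# Hypothetical driver: map phase, shuffle by key, reduce phase
run_local <- function(lines, map, reduce) {
  mapped <- map(NULL, lines)
  groups <- split(mapped$val, mapped$key)
  reduced <- lapply(names(groups), function(k) reduce(k, groups[[k]]))
  data.frame(word  = unlist(lapply(reduced, `[[`, "key"), use.names = FALSE),
             count = unlist(lapply(reduced, `[[`, "val"), use.names = FALSE),
             stringsAsFactors = FALSE)
}

counts <- run_local(c("big data with r", "r and hadoop", "big data"),
                    wc_map, wc_reduce)
counts
```

To exercise the real mapreduce() call without a cluster, rmr2 also offers a local backend via rmr.options(backend = "local").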
  68. WordCount MapReduce source code (continued)
## Read text files from folder /in1/wc/
hdfs.root <- '/in1'
hdfs.data <- file.path(hdfs.root, 'wc')
## Save results in folder /in1/out
hdfs.out <- file.path(hdfs.root, 'out')
## Submit the job
out <- wordcount(hdfs.data, hdfs.out)
results <- from.dfs(out)
results.df <- as.data.frame(results, stringsAsFactors = FALSE)
colnames(results.df) <- c('word', 'count')
head(results.df)
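from.dfs() returns the job output as a key/value list, which is why as.data.frame() yields two columns that are then renamed to word and count. A natural next step is ranking words by frequency. The sketch below replaces from.dfs(out) with a small hand-written key/value list (hypothetical data) so it runs without Hadoop.

```r
# Hypothetical stand-in for from.dfs(out): a list with $key and $val
results <- list(key = c("hadoop", "r", "big", "data"), val = c(1, 3, 2, 2))

results.df <- as.data.frame(results, stringsAsFactors = FALSE)
colnames(results.df) <- c("word", "count")

# Rank by descending count, breaking ties alphabetically
ranked <- results.df[order(-results.df$count, results.df$word), ]
rownames(ranked) <- NULL
head(ranked, 3)
```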
  69. WordCount Output
  70. Thank you
