Outline Introduction Hadoop Installation Hadoop Configuration Starting & Stopping Map Reduce Execution
Big Data & Hadoop
D. Praveen Kumar
Research Scholar (Full-Time)
Department of Computer Science & Engineering
YSREC of Yogi Vemana University, Proddatur
Kadapa Dt., A. P, India
November 8, 2016
Bapatla Engineering College, Bapatla, Guntur
Big Data & Hadoop
November 8, 2016 Slide: 1 / 43
1 Introduction
2 Hadoop Installation
3 Hadoop Configuration
4 Starting & Stopping
5 Map Reduce
6 Execution
Guests = 4
Transportation from the railway station to your home: one auto or car is sufficient.
Food: Mom can prepare meals or snacks without difficulty.
Accommodation: your house is sufficient.
Facilities: the beds, bathrooms, water, and TV you already use are enough.
You can talk to each other, crack jokes, and keep everyone happy.
Expenditure is nearly Rs. 1,000/-
Guests = 100
Transportation = 25 autos or cars, or two buses
Food = catering
Accommodation = lodge
Facilities = AC, TV, and all other facilities
Maintenance = somewhat difficult
Expenditure = nearly Rs. 90,000/-
Guests = 10,000
Transportation = 2,500 autos or 500 buses
Food = catering
Accommodation = all lodges, function halls, and cottages in the town
Facilities = AC, TV, and all other facilities are somewhat difficult to provide
Maintenance = more difficult
Expenditure = nearly Rs. 2,00,000/-
Guests = 10,000,000
Transportation = how many autos?
Food = ?
Accommodation = ?
Facilities = ?
Maintenance = ?
Cost = ?
Problems
The same scaling problems arise in a computing environment:
It is difficult to handle a huge and ever-growing amount of data.
The data cannot be processed with only a few machines.
Distributing large data sets is difficult.
Constructing online or offline models is very difficult.
Solution
A single solution to all these problems is
What is Big Data?
Big data refers to voluminous amounts of structured or
unstructured data that organizations can potentially mine and
analyze.
Big data is a huge collection of large data sets characterized by
Big Data Platforms and Analytical Software
Hadoop
Here we go with Hadoop. Why?
Hadoop
Apache Hadoop is an open-source software framework for
distributed storage and distributed processing of very large data
sets on computer clusters built from commodity hardware.
The base Apache Hadoop framework is composed of the following
modules:
Hadoop Common: libraries and utilities needed by the other Hadoop modules
Hadoop Distributed File System (HDFS): a distributed file system that stores data
Hadoop YARN: a resource-management platform
Hadoop MapReduce: large-scale data processing
Hadoop Components
Requirements
Necessary
Java >= 7
ssh
Linux OS (Ubuntu >= 14.04)
Hadoop framework
Optional
Eclipse
Internet connection
Java 7 & Installation
Hadoop requires a working Java installation; Java 1.7 or later is recommended.
Use one of the following commands to install Java on Linux:
sudo apt-get install openjdk-7-jdk (or)
sudo apt-get install default-jdk
Java PATH Setup
We need to set the Java path.
Open the .bashrc file located in your home directory:
gedit ~/.bashrc
Add the line below at the end:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
Installation & Configuration of SSH
Hadoop requires SSH (Secure Shell) access to manage its nodes, i.e. remote machines plus your local machine if you want to use Hadoop on it.
Install SSH using the following command:
sudo apt-get install ssh
First, generate a DSA SSH key for the user, then authorize it for login:
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Download & Extract Hadoop
Download Hadoop from the Apache Download Mirrors
http://mirror.fibergrid.in/apache/hadoop/common/
Extract the contents of the Hadoop package to a location of your
choice. I picked /usr/local/hadoop.
$ cd /usr/local
$ sudo tar xzf hadoop-2.7.2.tar.gz
$ sudo mv hadoop-2.7.2 hadoop
Add Hadoop configuration in .bashrc
Add the following Hadoop settings to the .bashrc file in your home directory:
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
Create the tmp, NameNode, and DataNode directories
Run the command below to create the NameNode directory:
mkdir -p /usr/local/hadoopdata/hdfs/namenode
Run the command below to create the DataNode directory:
mkdir -p /usr/local/hadoopdata/hdfs/datanode
Run the commands below to create Hadoop's tmp directory and hand it over to the Hadoop user (hadoop1 here):
sudo mkdir -p /app/hadoop/tmp
sudo chown hadoop1:hadoop1 /app/hadoop/tmp
sudo chmod 750 /app/hadoop/tmp
Files to Configure
The following are the files we need to configure
core-site.xml
hadoop-env.sh
mapred-site.xml
hdfs-site.xml
Add properties in /usr/local/hadoop/etc/hadoop/core-site.xml
Add the following snippets between the <configuration> ... </configuration> tags in the core-site.xml file.
Add the property below to specify the location of the tmp directory:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
</property>
Add the property below to specify the default file system and its port number:
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
</property>
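A stray space or an unclosed tag in a site file is easy to miss. As a rough sanity check, the sketch below parses a core-site.xml-style document with the JDK's built-in DOM parser and prints the two properties. The class name SiteXmlSketch and the method readProperties are hypothetical, and the XML is inlined as a string for illustration. Hadoop loads these files through its own configuration code, which this sketch does not use.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper (not part of Hadoop): reads <property> name/value
// pairs from a Hadoop-style configuration document with the JDK DOM parser.
final class SiteXmlSketch {

    static Map<String, String> readProperties(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                // Trim so that padding around the text does not leak into the value.
                String name = property.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = property.getElementsByTagName("value").item(0).getTextContent().trim();
                props.put(name, value);
            }
            return props;
        } catch (Exception e) {
            // Malformed XML or a property without <name>/<value> ends up here.
            throw new IllegalArgumentException("Not a usable configuration document", e);
        }
    }

    public static void main(String[] args) {
        String coreSite =
              "<configuration>"
            + "<property><name>hadoop.tmp.dir</name><value>/app/hadoop/tmp</value></property>"
            + "<property><name>fs.default.name</name><value>hdfs://localhost:54310</value></property>"
            + "</configuration>";
        Map<String, String> props = readProperties(coreSite);
        System.out.println(props.get("hadoop.tmp.dir"));   // prints /app/hadoop/tmp
        System.out.println(props.get("fs.default.name"));  // prints hdfs://localhost:54310
    }
}
```

The same check would apply to mapred-site.xml and hdfs-site.xml, since they share the property layout.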
Set JAVA_HOME in /usr/local/hadoop/etc/hadoop/hadoop-env.sh
Uncomment the JAVA_HOME line and set it to the correct Java path:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
Add property in /usr/local/hadoop/etc/hadoop/mapred-site.xml
This file holds the host name and port at which the MapReduce job tracker runs. Add the following to mapred-site.xml:
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
</property>
Add properties in ... etc/hadoop/hdfs-site.xml
Add the following to hdfs-site.xml.
Set the replication factor:
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
Specify the NameNode directory:
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/usr/local/hadoopdata/hdfs/namenode</value>
</property>
Specify the DataNode directory:
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/usr/local/hadoopdata/hdfs/datanode</value>
</property>
Formatting the HDFS filesystem via the NameNode
The first step in starting up your Hadoop installation is formatting the Hadoop file system.
You need to do this only the first time you set up Hadoop.
Do not format a running Hadoop file system, as you will lose all the data currently in HDFS.
To format the file system, run the command:
hadoop namenode -format
Starting single-node cluster
Run the command:
start-all.sh
This will start a NameNode, SecondaryNameNode, DataNode, ResourceManager, and NodeManager on your machine.
A nifty tool for checking whether the expected Hadoop processes are running is jps:
hadoop1@hadoop1:/usr/local/hadoop$ jps
2598 NameNode
3112 ResourceManager
3523 Jps
2917 SecondaryNameNode
2727 DataNode
3242 NodeManager
Stopping your single-node cluster
Run the command:
stop-all.sh
This stops all the daemons running on your machine. The output will look like this:
stopping NodeManager
localhost: stopping ResourceManager
stopping NameNode
localhost: stopping DataNode
localhost: stopping SecondaryNameNode
Map-Reduce Framework
MapReduce is a programming paradigm.
It relies on two basic functions, Map and Reduce.
MapReduce is used to manage many large-scale computations.
The framework takes care of scheduling tasks, monitoring them, and re-executing failed tasks.
The framework schedules tasks on the nodes where the data is already present.
Map-Reduce Computation Steps
The key-value pairs from each Map task are collected by a
master controller and sorted by key. The keys are divided
among all the Reduce tasks, so all key-value pairs with the
same key wind up at the same Reduce task.
The Reduce tasks work on one key at a time, and combine
all the values associated with that key in some way. The
manner of combination of values is determined by the code
written by the user for the Reduce function.
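The steps above can be sketched in plain Java, without any Hadoop classes. This is only an in-memory illustration under simplifying assumptions: a single process stands in for the cluster, a sorted map stands in for the master controller's sort-and-group step, and the class name MapReduceSketch and its methods are hypothetical.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the map, shuffle/sort, and reduce steps.
// No Hadoop classes are used; all names here are hypothetical.
final class MapReduceSketch {

    // Map step: emit a (word, 1) pair for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    // Shuffle step: collect the values of each key; TreeMap keeps keys sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce step: combine all values of one key; here the combination is a sum.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs the three steps in order over a list of input lines.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> all = new ArrayList<>();
        for (String line : lines) {
            all.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(all).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("big data hadoop", "hadoop map reduce", "big hadoop");
        // prints {big=2, data=1, hadoop=3, map=1, reduce=1}
        System.out.println(run(lines));
    }
}
```

Replacing the body of reduce changes how values are combined, which is the part the user writes for the Reduce function.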
Hadoop - MapReduce
Hadoop - MapReduce (Word Count) Example
MapReduce - WordCountMapper
In WordCountMapper class we perform the following operations
Read a line from file
Split line into Words
Assign Count 1 to each word
WordCountMapper source code
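// Static nested class: place it inside the WordCountJob class.
public static class WordCountMapper
        extends Mapper<Object, Text, Text, IntWritable> {
    // Reused for every token instead of allocating new objects per word.
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the input line into whitespace-separated tokens.
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            // Emit (word, 1) for each token.
            context.write(word, one);
        }
    }
}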
MapReduce - WordCountReducer
In WordCountReducer class we perform the following operations
Sum the list of values
Assign sum to corresponding word
WordCountReducer source code
// Static nested class: place it inside the WordCountJob class.
public static class WordCountReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum all counts received for this word.
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        // Emit (word, total).
        result.set(sum);
        context.write(key, result);
    }
}
WordCountJob
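public class WordCountJob {
    // WordCountMapper and WordCountReducer go here as static nested classes.

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Job.getInstance replaces the deprecated Job constructor.
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountJob.class);
        job.setMapperClass(WordCountMapper.class);
        // The reducer doubles as a combiner that pre-sums counts on the map side.
        job.setCombinerClass(WordCountReducer.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // args[0] is the input path, args[1] is the output path.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Exit with 0 on success and 1 on failure.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}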
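The job above registers WordCountReducer as its combiner as well as its reducer. The plain-Java sketch below suggests why that is safe for word count: summing partial sums gives the same totals as summing every individual 1. The class CombinerSketch, its methods, and the two sample splits are hypothetical, and no Hadoop classes are involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java illustration of why a summing reducer can also
// serve as a combiner. No Hadoop classes are involved.
final class CombinerSketch {

    // Sums counts per word: the same logic the reducer applies.
    static Map<String, Integer> sumByWord(List<String> words) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);
        }
        return counts;
    }

    // Merges partial counts from several map tasks into final totals.
    static Map<String, Integer> mergePartials(List<Map<String, Integer>> partials) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, n) -> totals.merge(word, n, Integer::sum));
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> split1 = Arrays.asList("big", "data", "big");
        List<String> split2 = Arrays.asList("data", "hadoop", "big");

        // With a combiner: each map task pre-aggregates its own output.
        Map<String, Integer> combined =
                mergePartials(Arrays.asList(sumByWord(split1), sumByWord(split2)));

        // Without a combiner: every (word, 1) pair reaches the reducer.
        List<String> all = new ArrayList<>(split1);
        all.addAll(split2);
        Map<String, Integer> direct = sumByWord(all);

        System.out.println(combined);                 // prints {big=3, data=2, hadoop=1}
        System.out.println(combined.equals(direct));  // prints true
    }
}
```

This works because addition is associative and commutative. A reducer that computes, say, an average could not be reused as a combiner unchanged.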
Imports to include (at the top of the source file)
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
Execution of Hadoop Program in Eclipse
Step 1:
1 Start Hadoop in a terminal using the command:
$ start-all.sh
2 Use the jps command to check whether all Hadoop services have started.
Step 2: Open Eclipse
Step 3: Go to File ⇒ New ⇒ Project
Select Java Project and click the Next button
Enter a project name and click the Finish button
Step 4: Eclipse creates the project and shows it in the side panel
1 Right-click on the project ⇒ New ⇒ Class
2 Enter the name of the class and then click Finish
3 Write the MapReduce program in that class
Step 5: Write the Java program
Step 6: Import the JAR files
1 Right-click on the project and select Properties (Alt+Enter)
2 Select Java Build Path ⇒ click on Libraries, then click on Add External JARs
3 Select the JARs from the following Hadoop library directories:
/usr/local/hadoop/share/hadoop/common/lib
/usr/local/hadoop/share/hadoop/hdfs/lib
/usr/local/hadoop/share/hadoop/httpfs/lib
/usr/local/hadoop/share/hadoop/mapreduce/lib
/usr/local/hadoop/share/hadoop/yarn/lib
/usr/local/hadoop/share/hadoop/tools/
Step 7: Set the input file path
1 Create a folder in your home directory
2 Copy text files into it
3 Note the path of the input folder
Step 8: Set the input and output paths
1 Right-click on the source file ⇒ Run As ⇒ Run Configurations ⇒ Arguments
2 Enter your input and output paths, separated by a single space
3 Click Run
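The two space-separated values entered in the run configuration reach the program as args[0] (input path) and args[1] (output path). A minimal sketch of checking them before the job is configured follows. The class JobArgsSketch and the example paths are hypothetical and are not part of the WordCount program.

```java
// Hypothetical helper showing how the two program arguments map to paths.
final class JobArgsSketch {
    final String inputPath;
    final String outputPath;

    private JobArgsSketch(String inputPath, String outputPath) {
        this.inputPath = inputPath;
        this.outputPath = outputPath;
    }

    // Expects exactly two arguments: the input path, then the output path.
    static JobArgsSketch parse(String[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException(
                    "Usage: WordCountJob <input path> <output path>");
        }
        return new JobArgsSketch(args[0], args[1]);
    }

    public static void main(String[] args) {
        // Illustrative paths only; substitute the folder created in Step 7.
        JobArgsSketch parsed = parse(new String[] {"/home/hadoop1/input", "/home/hadoop1/output"});
        System.out.println("input  = " + parsed.inputPath);
        System.out.println("output = " + parsed.outputPath);
    }
}
```

The output path should name a directory that does not exist yet, since Hadoop's file output format refuses to write into an existing output directory.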
Thank you
 
Exploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptxExploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptx
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 
Computer Presentation.pptx ecommerce advantage s
Computer Presentation.pptx ecommerce advantage sComputer Presentation.pptx ecommerce advantage s
Computer Presentation.pptx ecommerce advantage s
 
Using PDB Relocation to Move a Single PDB to Another Existing CDB
Using PDB Relocation to Move a Single PDB to Another Existing CDBUsing PDB Relocation to Move a Single PDB to Another Existing CDB
Using PDB Relocation to Move a Single PDB to Another Existing CDB
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 

Hadoop Installation, Configuration, and MapReduce Program

  • 1. Outline Introduction Hadoop Installation Hadoop Configuration Starting & Stopping Map Reduce Execution Big Data & Hadoop D. Praveen Kumar Research Scholar (Full-Time) Department of Computer Science & Engineering YSREC of Yogi Vemana University, Proddatur Kadapa Dt., A. P, India November 8, 2016 Bapatla Engineering College, Bapatla, Guntur Big Data & Hadoop November 8, 2016 Slide: 1 / 43
  • 2. Outline: 1 Introduction; 2 Hadoop Installation; 3 Hadoop Configuration; 4 Starting & Stopping; 5 Map Reduce; 6 Execution
  • 3. GUESTS = 4. Transportation from the railway station to your home: one auto or car is sufficient. Mom can prepare food or snacks without difficulty. Your house is sufficient for accommodation. Facilities such as beds, bathrooms, water, and TV are the ones you already use. You can talk with the guests, crack jokes, and keep them happy. Expenditure is nearly Rs. 1,000/-.
  • 4. GUESTS = 100. Transportation = 25 autos/cars or two buses. Food = catering. Accommodation = lodge. Facilities = AC, TV, and all other facilities. Maintenance = somewhat difficult. Expenditure = nearly Rs. 90,000/-.
  • 5. GUESTS = 10000. Transportation = 2500 autos or 500 buses. Food = catering. Accommodation = all lodges, function halls, and cottages in the town. Facilities = AC, TV, and all other facilities are somewhat difficult to provide. Maintenance = more difficult. Expenditure = nearly Rs. 2,00,000/-.
  • 6. GUESTS = 10000000. Transportation = how many autos? Food = ? Accommodation = ? Facilities = ? Maintenance = ? Cost = ?
  • 7. Problems. The same situation arises in a computing environment: it is difficult to handle a huge and ever-growing amount of data; the data cannot be processed with only a few machines; distributing large data sets is difficult; constructing online or offline models is very difficult.
  • 8. Solution. A single solution to all these problems is
  • 9. What is Big Data? Big data refers to voluminous amounts of structured or unstructured data that organizations can potentially mine and analyze. Big data is a huge collection of large data sets characterized by
  • 10. Big Data Platforms and Analytical Software
  • 11. Here we go with Hadoop. Why?
  • 12. Hadoop. Apache Hadoop is an open-source software framework for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware. The base Apache Hadoop framework is composed of the following modules: Hadoop Common, which contains libraries and utilities needed by the other Hadoop modules; Hadoop Distributed File System (HDFS), a distributed file system that stores data; Hadoop YARN, a resource-management platform; Hadoop MapReduce, for large-scale data processing.
  • 13. Hadoop Components
  • 14. Requirements. Necessary: Java >= 7; ssh; Linux OS (Ubuntu >= 14.04); Hadoop framework. Optional: Eclipse; Internet connection.
  • 15. Java 7 & Installation. Hadoop requires a working Java installation; Java 1.7 or later is recommended. Use one of the following commands to install Java on a Linux platform:
        sudo apt-get install openjdk-7-jdk
      (or)
        sudo apt-get install default-jdk
  • 16. Java PATH Setup. We need to set the Java path. Open the .bashrc file located in the home directory:
        gedit ~/.bashrc
      Add the line below at the end:
        export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
  • 17. Installation & Configuration of SSH. Hadoop requires SSH (Secure Shell) access to manage its nodes, i.e. remote machines plus your local machine if you want to use Hadoop on it. Install SSH using the following command:
        sudo apt-get install ssh
      First, generate a DSA SSH key for the user and add it to the authorized keys:
        ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
        cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
  • 18. Download & Extract Hadoop. Download Hadoop from the Apache Download Mirrors: http://mirror.fibergrid.in/apache/hadoop/common/ Extract the contents of the Hadoop package to a location of your choice. I picked /usr/local/hadoop.
        $ cd /usr/local
        $ sudo tar xzf hadoop-2.7.2.tar.gz
        $ sudo mv hadoop-2.7.2 hadoop
  • 19. Add Hadoop configuration in .bashrc. Add the following Hadoop configuration to .bashrc in the home directory:
        export HADOOP_INSTALL=/usr/local/hadoop
        export PATH=$PATH:$HADOOP_INSTALL/bin
        export PATH=$PATH:$HADOOP_INSTALL/sbin
        export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
        export HADOOP_HDFS_HOME=$HADOOP_INSTALL
        export HADOOP_COMMON_HOME=$HADOOP_INSTALL
        export YARN_HOME=$HADOOP_INSTALL
        export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
        export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
  • 20. Create temp directory, DataNode & NameNode directories. Execute the command below to create the NameNode directory:
        mkdir -p /usr/local/hadoopdata/hdfs/namenode
      Execute the command below to create the DataNode directory:
        mkdir -p /usr/local/hadoopdata/hdfs/datanode
      Execute the commands below to create the tmp directory for Hadoop (hadoop1 is the user and group used in these slides):
        sudo mkdir -p /app/hadoop/tmp
        sudo chown hadoop1:hadoop1 /app/hadoop/tmp
        sudo chmod 750 /app/hadoop/tmp
  • 21. Files to Configure. The following are the files we need to configure: core-site.xml; hadoop-env.sh; mapred-site.xml; hdfs-site.xml.
  • 22. Add properties in /usr/local/hadoop/etc/hadoop/core-site.xml. Add the following snippets between the <configuration> ... </configuration> tags in the core-site.xml file. Add the property below to specify the location of the tmp directory:
        <property>
          <name>hadoop.tmp.dir</name>
          <value>/app/hadoop/tmp</value>
        </property>
      Add the property below to specify the location of the default file system and its port number:
        <property>
          <name>fs.default.name</name>
          <value>hdfs://localhost:54310</value>
        </property>
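The fs.default.name value is a URI: the scheme names the file system and the host and port say where it listens. As a small illustration (not part of the original slides), the sketch below uses only the standard java.net.URI class to split the value from the slide into those parts. The class name DefaultFsUriSketch and the describe helper are hypothetical names chosen for this example.

```java
import java.net.URI;

public class DefaultFsUriSketch {

    // Splits a file-system URI into scheme, host and port.
    // Hypothetical helper for illustration; Hadoop does its own parsing.
    static String describe(String value) {
        URI uri = URI.create(value);
        return "scheme=" + uri.getScheme()
                + " host=" + uri.getHost()
                + " port=" + uri.getPort();
    }

    public static void main(String[] args) {
        // Value taken from the core-site.xml property on the slide.
        System.out.println(describe("hdfs://localhost:54310"));
        // prints: scheme=hdfs host=localhost port=54310
    }
}
```

If the port in core-site.xml is changed, clients need to use the same value, since they read the address from this property.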
  • 23. Add properties in /usr/local/hadoop/etc/hadoop/hadoop-env.sh. Uncomment the JAVA_HOME line and give the correct path for Java:
        export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
  • 24. Add property in /usr/local/hadoop/etc/hadoop/mapred-site.xml. In this file we add the host name and port that the MapReduce job tracker runs at. Add the following in mapred-site.xml:
        <property>
          <name>mapred.job.tracker</name>
          <value>localhost:54311</value>
        </property>
  • 25. Add properties in ... etc/hadoop/hdfs-site.xml. In the file hdfs-site.xml add the following. Add the replication factor:
        <property>
          <name>dfs.replication</name>
          <value>1</value>
        </property>
      Specify the NameNode directory:
        <property>
          <name>dfs.namenode.name.dir</name>
          <value>file:/usr/local/hadoopdata/hdfs/namenode</value>
        </property>
      Specify the DataNode directory:
        <property>
          <name>dfs.datanode.data.dir</name>
          <value>file:/usr/local/hadoopdata/hdfs/datanode</value>
        </property>
  • 26. Formatting the HDFS filesystem via the NameNode. The first step in starting up your Hadoop installation is formatting the Hadoop file system. This is needed only the first time you set up a Hadoop installation. Do not format a running Hadoop filesystem, as you will lose all the data currently in HDFS. To format the filesystem, run the command:
        hadoop namenode -format
  • 27. Starting a single-node cluster. Run the command:
        start-all.sh
      This will start up a NameNode, SecondaryNameNode, DataNode, ResourceManager, and a NodeManager on your machine. A nifty tool for checking whether the expected Hadoop processes are running is jps:
        hadoop1@hadoop1:/usr/local/hadoop$ jps
        2598 NameNode
        3112 ResourceManager
        3523 Jps
        2917 SecondaryNameNode
        2727 DataNode
        3242 NodeManager
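Checking the jps listing by eye is easy to get wrong, for example because SecondaryNameNode contains the text NameNode. The sketch below is an illustration added to the transcript, not something from the slides: it parses jps-style text and reports which of the five daemons named above are missing. The class name JpsCheckSketch and the method missingDaemons are hypothetical, and the sample text is the listing from the slide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class JpsCheckSketch {

    // Daemons the slide says start-all.sh should bring up.
    static final List<String> EXPECTED = Arrays.asList(
            "NameNode", "SecondaryNameNode", "DataNode",
            "ResourceManager", "NodeManager");

    // Returns the expected daemons that do not appear in the jps output.
    static List<String> missingDaemons(String jpsOutput) {
        Set<String> running = new HashSet<>();
        for (String line : jpsOutput.split("\\r?\\n")) {
            // Each jps line has the form "<pid> <process name>".
            String[] parts = line.trim().split("\\s+");
            if (parts.length == 2) {
                running.add(parts[1]);
            }
        }
        List<String> missing = new ArrayList<>();
        for (String daemon : EXPECTED) {
            // Exact-name match, so SecondaryNameNode never counts as NameNode.
            if (!running.contains(daemon)) {
                missing.add(daemon);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String sample = "2598 NameNode\n3112 ResourceManager\n3523 Jps\n"
                + "2917 SecondaryNameNode\n2727 DataNode\n3242 NodeManager\n";
        System.out.println(missingDaemons(sample)); // prints []
        System.out.println(missingDaemons("2917 SecondaryNameNode\n3523 Jps\n"));
        // prints [NameNode, DataNode, ResourceManager, NodeManager]
    }
}
```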
  • 28. Stopping your single-node cluster. Run the command stop-all.sh to stop all the daemons running on your machine. The output will look like this:
        stopping NodeManager
        localhost: stopping ResourceManager
        stopping NameNode
        localhost: stopping DataNode
        localhost: stopping SecondaryNameNode
  • 29. Map-Reduce Framework. MapReduce is a programming paradigm that relies on two functions, Map and Reduce. MapReduce is used to manage many large-scale computations. The framework takes care of scheduling tasks, monitoring them, and re-executing failed tasks. The framework tries to schedule tasks on the nodes where the data is already present.
  • 30. Map-Reduce Computation Steps. The key-value pairs from each Map task are collected by a master controller and sorted by key. The keys are divided among all the Reduce tasks, so all key-value pairs with the same key wind up at the same Reduce task. The Reduce tasks work on one key at a time and combine all the values associated with that key in some way. The manner of combining the values is determined by the code the user writes for the Reduce function.
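The computation steps above can be sketched in plain Java without a cluster. This is a single-process illustration only, assuming whitespace-separated words and a summing Reduce function; it does not use the Hadoop API, and the class name MapReduceStepsSketch and its methods are hypothetical names for this example.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MapReduceStepsSketch {

    // Map step: emit one (word, 1) pair per whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(new SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    // Collect-and-sort step: group values by key; TreeMap keeps keys sorted.
    static Map<String, List<Integer>> groupByKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // Reduce step: combine all values of one key, here by summing them.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs map, group and reduce over all input lines.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> all = new ArrayList<>();
        for (String line : lines) {
            all.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : groupByKey(all).entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(
                "deer bear river", "car car river", "deer car bear")));
        // prints {bear=2, car=3, deer=2, river=2}
    }
}
```

In a real cluster the grouping is spread over many machines, but each Reduce task still sees one key together with all of its values, as in groupByKey here.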
  • 31. Hadoop - MapReduce
  • 32. Hadoop - MapReduce (Word Count) Example
  • 33. MapReduce - WordCountMapper. In the WordCountMapper class we perform the following operations: read a line from the file; split the line into words; assign a count of 1 to each word.
  • 34. WordCountMapper source code:
        // Declared static, so it is meant to be nested inside the job class.
        public static class WordCountMapper
                extends Mapper<Object, Text, Text, IntWritable> {
            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text();

            // Called once per input line; emits (word, 1) for every token.
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, one);
                }
            }
        }
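One consequence of using StringTokenizer with its default delimiters is that the mapper splits on whitespace only, so punctuation and letter case stay attached to the words and are counted separately. The sketch below, added here as an illustration, shows this with the standard library alone. The normalizedTokens method is a hypothetical variant, not something the slides propose; it lowercases the line and replaces everything except ASCII letters, digits, and whitespace with spaces before tokenizing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerSketch {

    // Same tokenization as the mapper: default delimiters are whitespace.
    static List<String> tokens(String line) {
        List<String> out = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            out.add(itr.nextToken());
        }
        return out;
    }

    // Hypothetical variant: lowercase, then turn every character that is
    // not an ASCII letter, digit or whitespace into a space.
    static List<String> normalizedTokens(String line) {
        return tokens(line.toLowerCase().replaceAll("[^a-z0-9\\s]", " "));
    }

    public static void main(String[] args) {
        String line = "Hadoop, hadoop and HADOOP.";
        System.out.println(tokens(line));
        // prints [Hadoop,, hadoop, and, HADOOP.]
        System.out.println(normalizedTokens(line));
        // prints [hadoop, hadoop, and, hadoop]
    }
}
```

With the slide's mapper, the three spellings above would end up as three different keys; whether that matters depends on what the word count is used for.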
  • 35. MapReduce - WordCountReducer. In the WordCountReducer class we perform the following operations: sum the list of values; assign the sum to the corresponding word.
  • 36. WordCountReducer source code:
        // Declared static, so it is meant to be nested inside the job class.
        public static class WordCountReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            private IntWritable result = new IntWritable();

            // Called once per word with all counts emitted for that word.
            public void reduce(Text key, Iterable<IntWritable> values,
                    Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }
  • 37. WordCountJob:
        public class WordCountJob {
            public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();
                // Job.getInstance replaces the deprecated new Job(conf, name).
                Job job = Job.getInstance(conf, "word count");
                job.setJarByClass(WordCountJob.class);
                job.setMapperClass(WordCountMapper.class);
                // The reducer also serves as the combiner.
                job.setCombinerClass(WordCountReducer.class);
                job.setReducerClass(WordCountReducer.class);
                job.setOutputKeyClass(Text.class);
                job.setOutputValueClass(IntWritable.class);
                // args[0] is the input path, args[1] the output path.
                FileInputFormat.addInputPath(job, new Path(args[0]));
                FileOutputFormat.setOutputPath(job, new Path(args[1]));
                System.exit(job.waitForCompletion(true) ? 0 : 1);
            }
        }
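The job registers WordCountReducer as the combiner as well as the reducer. That works for word count because addition is associative and commutative: summing per-task partial sums gives the same total as summing every individual count. The sketch below illustrates this in plain Java without the Hadoop API. The class name CombinerSketch, its methods, and the two sample map tasks are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CombinerSketch {

    // The reduce logic from the slides: add up all values.
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // No combiner: the reducer receives every single count.
    static int reduceDirectly(List<List<Integer>> countsPerMapTask) {
        List<Integer> all = new ArrayList<>();
        for (List<Integer> counts : countsPerMapTask) {
            all.addAll(counts);
        }
        return sum(all);
    }

    // With combiner: each map task pre-sums its counts, and the reducer
    // only adds one partial sum per task.
    static int reduceAfterCombining(List<List<Integer>> countsPerMapTask) {
        List<Integer> partialSums = new ArrayList<>();
        for (List<Integer> counts : countsPerMapTask) {
            partialSums.add(sum(counts));
        }
        return sum(partialSums);
    }

    public static void main(String[] args) {
        // Counts emitted for one word by two hypothetical map tasks.
        List<List<Integer>> counts = Arrays.asList(
                Arrays.asList(1, 1, 1), Arrays.asList(1, 1));
        System.out.println(reduceDirectly(counts));       // prints 5
        System.out.println(reduceAfterCombining(counts)); // prints 5
    }
}
```

A reduce function that is not associative, such as computing an average directly, could not be reused as a combiner in this way.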
  • 38. Import statements to include:
        import java.io.IOException;
        import java.util.StringTokenizer;
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.io.IntWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Job;
        import org.apache.hadoop.mapreduce.Mapper;
        import org.apache.hadoop.mapreduce.Reducer;
        import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
        import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
        import org.apache.hadoop.util.GenericOptionsParser;
  • 39. Execution of a Hadoop Program in Eclipse. Step 1: (1) Start Hadoop in a terminal using the command: $ start-all.sh (2) Use the jps command to check whether all Hadoop services have started. Step 2: Open Eclipse. Step 3: Go to File ⇒ New ⇒ Project, select Java Project and click the Next button, then enter the project name and click the Finish button.
  • 40. Continue... Step 4: Eclipse creates the project, shown on the right side. (1) Right-click on the project ⇒ New ⇒ Class. (2) Enter the name of the class and click Finish. (3) Write the MapReduce program in that class. Step 5: Write the Java program.
  • 41. Continue... Step 6: Importing JAR files. (1) Right-click on the project and select Properties (Alt+Enter). (2) Select Java Build Path ⇒ click on Libraries, then click on Add External JARs. (3) Select the JARs from the following Hadoop library directories:
        /usr/local/hadoop/share/hadoop/common/libs
        /usr/local/hadoop/share/hadoop/hdfs/libs
        /usr/local/hadoop/share/hadoop/httpfs/libs
        /usr/local/hadoop/share/hadoop/mapreduce/libs
        /usr/local/hadoop/share/hadoop/yarn/libs
        /usr/local/hadoop/share/hadoop/tools/
  • 42. Continue... Step 7: Set the input file path. (1) Create a folder in the home directory. (2) Copy text files into it. (3) Note the path of this input folder. Step 8: Set the input and output paths. (1) Right-click on the source ⇒ Run As ⇒ Run Configurations ⇒ Arguments. (2) Enter your input and output paths, separated by a single space. (3) Click Run.
  • 43. Thank you