SlideShare a Scribd company logo
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
Big Data & Hadoop
D. Praveen Kumar
Junior Research Fellow
Department of Computer Science & Engineering
Indian Institute of Technology (Indian School of Mines)
Dhanbad, Jharkhand, India
Head of IT & ITES, Skill Subsist Impels Ltd, Tirupati.
March 25, 2017
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 1 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
1 Introduction
2 Big Data
3 Sources of Big Data
4 Tools
5 HDFS
6 Installation
7 Configuration
8 Starting & Stopping
9 Map Reduce
10 Execution
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 2 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
Data
Data means a value or set of values.
Examples:
march 1st 2017
20, 30, 40
ΨΦϕ
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 3 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
Information
Meaningful or preprocessed data we called as Information.
Examples:
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 4 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
Data Types
The kind of data that may appear in a computer.
Examples: int
float
char
double
Abstract data types -user defined data types.
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 5 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
Traditional approaches
Traditional approaches to store and process the data
1 File system
2 RDBMS (Relational Database Management Systems)
3 Data Warehouse & Mining Tools
4 Grid Computing
5 Volunteer Computing
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 6 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
GUESTS =4
Transportation from railway station to your
home( one Auto/car is sufficient)
mom can prepare food or snacks without risk.
Your house is sufficient for Accommodation.
Facilities like bed, bathrooms, water and TV are
provided which you use.
You can talk to each other and crack jokes and
you can make them happy
Expenditure is nearly Rs.1000/-
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 7 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
GUESTS =100
Transportation = 25 autos/car or two
buses
Food = catering.
Accommodation = Lodge.
Facilities = AC, TV, and all other facilities
Maintenance= somewhat difficult
Expenditure =nearly Rs. 90,000/-
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 8 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
GUESTS =10000
Transportation = 2500 autos or 500 buses
Food = catering.
Accommodation = all Lodges, function
halls and cottages in the town.
Facilities = AC, TV, and all other
facilities are somewhat difficult to provide.
Maintenance= more difficult
Expenditure =nearly Rs. 2,00,000/-
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 9 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
Grid Computing
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 10 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
Volunteer Computing
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 11 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
GUESTS =10000000
Transportation=how many autos=?
Food =?
Accommodation =?
Facilities =?
Maintenance=?
Cost =?
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 12 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
Problems
Same we assume in computing environment
Difficult to handle a huge and ever growing amount of data
Processing of data can not be possible with few machines
distributing large data sets is difficult
Construction of online or offline models are very difficult
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 13 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
Solution
A single solution to all these problems is
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 14 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
What is Big Data?
Big data refers to voluminous amounts of structured or
unstructured data that organizations can potentially mine and
analyze.
Big data is huge amount of large data sets characterized by
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 15 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
Data generation
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 16 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
How Data generated
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 17 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
Internet of Events
Internet is the main source to generating the wast amount of data.
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 18 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
4 Internet of Events
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 19 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
4 Questions of Data Analysts
1 What happened?
2 Why did it happen?
3 What will happen?
4 What is the best that can happen?
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 20 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
Big Data Platforms and Analytical Software
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 21 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
Hadoop
Here we go with
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 22 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
Hadoop History
Hadoop was created by Doug Cutting, creator of Lucene.
He also involved in a project called Nutch. (It is basic version
of hadoop)
Nutch is a combination of MapReduce and NDFS (Nutch
Distributed File System)
Later Nutch renamed to Hadoop. (Mapreduce + HDFS
(Hadoop Distributed File System))
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 23 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
Hadoop
Apache Hadoop is an open-source software framework for
distributed storage and distributed processing of very large data
sets on computer clusters built from commodity hardware.
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 24 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
Hadoop
The base Apache Hadoop framework is composed of the following
modules:
Hadoop Common contains libraries and utilities needed by
other Hadoop modules
Hadoop Distributed File System (HDFS) a distributed
file-system that stores data
Hadoop YARN a resource-management platform
Hadoop MapReduce for large scale data processing.
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 25 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
Hadoop Components
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 26 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
Hadoop Components
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 27 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
HDFS- Goals
The design goals of HDFS
1 Very Large files
2 Streaming Data Access
3 Commodity Hardware
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 28 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
HDFS- Failed in
HDFS is Not FIT for
1 Lots of small files
2 Low latency database access
3 Multiple writers, arbitrary file modifications
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 29 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
HDFS- Concepts
1 Blocks
2 Namenodes
3 Datanodes
4 HDFS Federation
5 HDFS High Availability
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 30 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
Requirements
Necessary
Java >= 7
ssh
Linux OS (Ubuntu >=
14.04)
Hadoop framework
Optional
Eclipse
Internet connection
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 31 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
Java 7 & Installation
Hadoop requires a working Java installation. However, using
java 1.7 or more is recommended.
Following command is used to install java in linux platform
sudo apt-get install openjdk-7-jdk (or)
sudo apt-get install default-jdk
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 32 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
Java PATH Setup
We need to set JAVA path
Open the .bashrc file located in home directory
gedit ~/.bashrc
Add below line at the end:
export JAVA HOME=/usr/lib/jvm/java−7−openjdk−amd64
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 33 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
Installation & Configuration of SSH
Hadoop requires SSH(Secure Shell) access to manage its
nodes, i.e. remote machines plus your local machine if you
want to use Hadoop on it.
Install SSH using following command
sudo apt-get install ssh
First, we have to generate DSA an SSH key for user.
ssh-keygen -t dsa -P ’’ -f ~ /.ssh/id dsa
cat ~ /.ssh/id dsa.pub >> ~ /.ssh/authorized keys
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 34 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
Download & Extract Hadoop
Download Hadoop from the Apache Download Mirrors
http://mirror.fibergrid.in/apache/hadoop/common/
Extract the contents of the Hadoop package to a location of your
choice. I picked /usr/local/hadoop.
$ cd /usr/local
$ sudo tar xzf hadoop-2.7.2.tar.gz
$ sudo mv hadoop-2.7.2 hadoop
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 35 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
Add Hadoop configuration in .bashrc
Add Hadoop configuration in .bashrc in home directory.
export HADOOP INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP INSTALL/bin
export PATH=$PATH:$HADOOP INSTALL/sbin
export HADOOP MAPRED HOME=$HADOOP INSTALL
export HADOOP HDFS HOME=$HADOOP INSTALL
export HADOOP COMMON HOME=$HADOOP INSTALL
export YARN HOME=$HADOOP INSTALL
export HADOOP COMMON LIB NATIVE DIR=$HADOOP INSTALL/lib/native
export HADOOP OPTS="-Djava.library.path=$HADOOP INSTALL/lib"
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 36 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
Create temp file, DataNode & NameNode
Execute below commands to create NameNode
mkdir -p /usr/local/hadoopdata/hdfs/namenode
Execute below commands to create DataNode
mkdir -p /usr/local/hadoopdata/hdfs/datanode
Execute below code to create the tmp directory in hadoop
sudo mkdir -p /app/hadoop/tmp
sudo chown hadoop1:hadoop1 /app/hadoop/tmp
sudo chmod 750 /app/hadoop/tmp
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 37 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
Files to Configure
The following are the files we need to configure
core-site.xml
hadoop-env.sh
mapred-site.xml
hdfs-site.xml
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 38 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
Add properties in /usr/local/hadoop/etc/core-site.xml
Add the following snippets between the
< configuration > ... < /configuration > tags in the core-site.xml
file.
Add below property to specify the location of tmp
< property >
< name > hadoop.tmp.dir < /name >
< value > /app/hadoop/tmp < /value >
< /property >
Add below property to specify the location of default file
system and its port number.
< property >
< name > fs.default.name < /name >
< value > hdfs : //localhost : 9000 < /value >
< /property >
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 39 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
Add properties in /usr/local/hadoop/etc/hadoop-env.sh
Un-Comment the JAVA HOME and Give Correct Path For
Java.
export JAVA HOME=/usr/lib/jvm/java-7-openjdk-amd64
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 40 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
Add property in
/usr/local/hadoop/etc/hadoop/mapred-site.xml
In file we add The host name and port that the MapReduce job
tracker runs at. Add following in mapred-site.xml :
< property >
< name > mapred.job.tracker < /name >
< value > localhost : 54311 < /value >
< /property >
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 41 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
Add properties in ... etc/hadoop/hdfs-site.xml
In file hdfs-site.xml add following:
Add replication factor
< property >
< name > dfs.replication < /name >
< value > 1 < /value >
< /property >
Specify the NameNode
< property >
< name > dfs.namenode.name.dir < /name >
< value > file : /usr/local/hadoopdata/hdfs/namenode < /value >
< /property >
Specify the DataNode
< property >
< name > dfs.datanode.name.dir < /name >
< value > file : /usr/local/hadoopdata/hdfs/datanode < /value >
< /property >
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 42 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
Formatting the HDFS filesystem via the NameNode
The first step to starting up your Hadoop installation is
Formatting the Hadoop file system
We need to do this the first time you set up a Hadoop.
Do not format a running Hadoop filesystem as you will lose all
the data currently in HDFS
To format the filesystem, run the command
hadoop namenode -format
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 43 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
Starting single-node cluster
Run the command:
start-all.sh
This will startup a NameNode,SecondaryNameNode,
DataNode, ResourceManager and a NodeManager on your
machine.
A nifty tool for checking whether the expected Hadoop
processes are running is jps
hadoop1@hadoop1:/usr/local/hadoop$ jps
2598 NameNode
3112 ResourceManager
3523 Jps
2917 SecondaryNameNode
2727 DataNode
3242 NodeManager
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 44 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
Stopping your single-node cluster
Run the command
stop-all.sh
To stop all the daemons running on your machine output will be
like this.
stopping NodeManager
localhost: stopping ResourceManager
stopping NameNode
localhost: stopping DataNode
localhost: stopping SecondaryNameNode
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 45 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
Map-Reduce Framework
Map Reduce programming paradigm
It relies basically on two functions, Map and Reduce
Map Reduce used to manage many large-scale computations
The framework takes care of scheduling tasks, monitoring
them and re-executes the failed tasks.
The framework to effectively schedule tasks on the nodes
where data is already present
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 46 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
Map-Reduce Computation Steps
The key-value pairs from each Map task are collected by a
master controller and sorted by key. The keys are divided
among all the Reduce tasks, so all key-value pairs with the
same key wind up at the same Reduce task.
The Reduce tasks work on one key at a time, and combine
all the values associated with that key in some way. The
manner of combination of values is determined by the code
written by the user for the Reduce function.
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 47 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
Hadoop - MapReduce
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 48 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
Hadoop - MapReduce (Word Count) Example
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 49 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
MapReduce - WordCountMapper
In WordCountMapper class we perform the following operations
Read a line from file
Split line into Words
Assign Count 1 to each word
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 50 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
WordCountMapper source code
public static class WordCountMapper
extends Mapper<Object, Text, Text, IntWritable>{
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(Object key, Text value, Context context ) throws
IOException, InterruptedException {
StringTokenizer itr = new StringTokenizer(value.toString());
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
context.write(word, one);
}
}
}
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 51 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
MapReduce - WordCountReducer
In WordCountReducer class we perform the following operations
Sum the list of values
Assign sum to corresponding word
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 52 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
WordCountReducer source code
public static class WordCountReducer
extends Reducer<Text,IntWritable,Text,IntWritable> {
private IntWritable result = new IntWritable();
public void reduce(Text key, Iterable<IntWritable> values,
Context context ) throws IOException, InterruptedException {
int sum = 0;
for (IntWritable val : values) {
sum += val.get();
}
result.set(sum);
context.write(key, result);
}
}
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 53 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
WordCountJob
public class WordCountJob {
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = new Job(conf, "word count");
job.setJarByClass(WordCountJob.class);
job.setMapperClass(WordCountMapper.class);
job.setCombinerClass(WordCountReducer.class);
job.setReducerClass(WordCountReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 54 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
Header Files to include
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 55 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
Execution of Hadoop Program in Eclipse
Step1:
1 Starting Hadoop in terminal using command:
$ Start-all.sh
2 Use JPS command to check all services of Hadoop are started
or not.
Step 2: open Eclipse
Step 3: Go to file ⇒ New ⇒ Project
Select Java Project and click on Next button
Write project name and click on Finish button
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 56 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
Continue...
Step 4: Right side it creates a project
1 Right click on Project ⇒ New ⇒ Class
2 Write Name of Class and then Click Finish
3 Write MapReduce program in that class
Step 5: Write JAVA Program
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 57 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
Continue...
Step 6: Importing JAR files
1 Right click on Project and select properties (Alt+Enter)
2 Select Java Build Path ⇒ Click on Libraries, then click on add
external JARS
3 Select the following jars from Hadoop library.
/usr/local/Hadoop/share/Hadoop/common/libs
/usr/local/Hadoop/share/Hadoop/hdfs/libs
/usr/local/Hadoop/share/Hadoop/httpfs/libs
/usr/local/Hadoop/share/Hadoop/mapreduce/libs
/usr/local/Hadoop/share/Hadoop/yarn/libs
/usr/local/Hadoop/share/Hadoop/tools/
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 58 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
Continue ....
Step 7: Set input file path
1 Create folder in home dir
2 copy text files in to that
3 Select path of Input
Step 8: Set input and output path
1 right click on source ⇒ Run As ⇒ Run Configuration ⇒
Argument
2 Enter your input and out put path with a single space
3 click on Run
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 59 / 60
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
thank You
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 60 / 60

More Related Content

What's hot

Big data-analytics-cpe8035
Big data-analytics-cpe8035Big data-analytics-cpe8035
Big data-analytics-cpe8035
Neelam Rawat
 
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
Edureka!
 
Apache hive introduction
Apache hive introductionApache hive introduction
Apache hive introduction
Mahmood Reza Esmaili Zand
 
Hadoop ppt2
Hadoop ppt2Hadoop ppt2
Hadoop ppt2
Ankit Gupta
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
Philippe Julio
 
Apache hive
Apache hiveApache hive
Apache hive
Vaibhav Kadu
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
joelcrabb
 
Hadoop
Hadoop Hadoop
Hadoop
ABHIJEET RAJ
 
PPT on Hadoop
PPT on HadoopPPT on Hadoop
PPT on Hadoop
Shubham Parmar
 
What is NoSQL and CAP Theorem
What is NoSQL and CAP TheoremWhat is NoSQL and CAP Theorem
What is NoSQL and CAP Theorem
Rahul Jain
 
Hadoop Tutorial For Beginners
Hadoop Tutorial For BeginnersHadoop Tutorial For Beginners
Hadoop Tutorial For Beginners
Dataflair Web Services Pvt Ltd
 
Big data analytics - hadoop
Big data analytics - hadoopBig data analytics - hadoop
Big data analytics - hadoop
Vishwajeet Jadeja
 
Presentation About Big Data (DBMS)
Presentation About Big Data (DBMS)Presentation About Big Data (DBMS)
Presentation About Big Data (DBMS)
SiamAhmed16
 
What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...
What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...
What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...
Simplilearn
 
Hadoop Map Reduce
Hadoop Map ReduceHadoop Map Reduce
Hadoop Map Reduce
VNIT-ACM Student Chapter
 
Introduction of Big data, NoSQL & Hadoop
Introduction of Big data, NoSQL & HadoopIntroduction of Big data, NoSQL & Hadoop
Introduction of Big data, NoSQL & Hadoop
Savvycom Savvycom
 
Introduction à HDFS
Introduction à HDFSIntroduction à HDFS
Introduction à HDFS
Modern Data Stack France
 
Hadoop Seminar Report
Hadoop Seminar ReportHadoop Seminar Report
Hadoop Seminar Report
Bhushan Kulkarni
 
Data engineering design patterns
Data engineering design patternsData engineering design patterns
Data engineering design patterns
Valdas Maksimavičius
 
Web hdfs and httpfs
Web hdfs and httpfsWeb hdfs and httpfs
Web hdfs and httpfs
wchevreuil
 

What's hot (20)

Big data-analytics-cpe8035
Big data-analytics-cpe8035Big data-analytics-cpe8035
Big data-analytics-cpe8035
 
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
 
Apache hive introduction
Apache hive introductionApache hive introduction
Apache hive introduction
 
Hadoop ppt2
Hadoop ppt2Hadoop ppt2
Hadoop ppt2
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 
Apache hive
Apache hiveApache hive
Apache hive
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Hadoop
Hadoop Hadoop
Hadoop
 
PPT on Hadoop
PPT on HadoopPPT on Hadoop
PPT on Hadoop
 
What is NoSQL and CAP Theorem
What is NoSQL and CAP TheoremWhat is NoSQL and CAP Theorem
What is NoSQL and CAP Theorem
 
Hadoop Tutorial For Beginners
Hadoop Tutorial For BeginnersHadoop Tutorial For Beginners
Hadoop Tutorial For Beginners
 
Big data analytics - hadoop
Big data analytics - hadoopBig data analytics - hadoop
Big data analytics - hadoop
 
Presentation About Big Data (DBMS)
Presentation About Big Data (DBMS)Presentation About Big Data (DBMS)
Presentation About Big Data (DBMS)
 
What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...
What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...
What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...
 
Hadoop Map Reduce
Hadoop Map ReduceHadoop Map Reduce
Hadoop Map Reduce
 
Introduction of Big data, NoSQL & Hadoop
Introduction of Big data, NoSQL & HadoopIntroduction of Big data, NoSQL & Hadoop
Introduction of Big data, NoSQL & Hadoop
 
Introduction à HDFS
Introduction à HDFSIntroduction à HDFS
Introduction à HDFS
 
Hadoop Seminar Report
Hadoop Seminar ReportHadoop Seminar Report
Hadoop Seminar Report
 
Data engineering design patterns
Data engineering design patternsData engineering design patterns
Data engineering design patterns
 
Web hdfs and httpfs
Web hdfs and httpfsWeb hdfs and httpfs
Web hdfs and httpfs
 

Similar to Hadoop basics

Worldranking universities final documentation
Worldranking universities final documentationWorldranking universities final documentation
Worldranking universities final documentation
Bhadra Gowdra
 
HDFS & MapReduce
HDFS & MapReduceHDFS & MapReduce
HDFS & MapReduce
Skillspeed
 
How To Become A Big Data Engineer? Edureka
How To Become A Big Data Engineer? EdurekaHow To Become A Big Data Engineer? Edureka
How To Become A Big Data Engineer? Edureka
Edureka!
 
Hadoop Online Training
Hadoop Online TrainingHadoop Online Training
Hadoop Online Training
Nagendra Kumar
 
Big Data Hadoop Training by Easylearning Guru
Big Data Hadoop Training by Easylearning GuruBig Data Hadoop Training by Easylearning Guru
Big Data Hadoop Training by Easylearning Guru
KCC Software Ltd. & Easylearning.guru
 
Cloudera Academic Partnership: Teaching Hadoop to the Next Generation of Data...
Cloudera Academic Partnership: Teaching Hadoop to the Next Generation of Data...Cloudera Academic Partnership: Teaching Hadoop to the Next Generation of Data...
Cloudera Academic Partnership: Teaching Hadoop to the Next Generation of Data...
Cloudera, Inc.
 
解讀雲端大數據新趨勢
解讀雲端大數據新趨勢解讀雲端大數據新趨勢
解讀雲端大數據新趨勢
Jazz Yao-Tsung Wang
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
Giovanna Roda
 
Big Data Engineer Skills and Job Description | Edureka
Big Data Engineer Skills and Job Description | EdurekaBig Data Engineer Skills and Job Description | Edureka
Big Data Engineer Skills and Job Description | Edureka
Edureka!
 
Research IT @ Illinois: Establishing Service Responsive to Investigator Needs
Research IT @ Illinois: Establishing Service Responsive to Investigator NeedsResearch IT @ Illinois: Establishing Service Responsive to Investigator Needs
Research IT @ Illinois: Establishing Service Responsive to Investigator Needs
John Towns
 
[Azureビッグデータ関連サービスとHortonworks勉強会] Azure HDInsight
[Azureビッグデータ関連サービスとHortonworks勉強会] Azure HDInsight[Azureビッグデータ関連サービスとHortonworks勉強会] Azure HDInsight
[Azureビッグデータ関連サービスとHortonworks勉強会] Azure HDInsight
Naoki (Neo) SATO
 
Hpdw 2015-v10-paper
Hpdw 2015-v10-paperHpdw 2015-v10-paper
Hpdw 2015-v10-paper
restassure
 
TUW-ASE-Summer 2014: Data as a Service – Concepts, Design & Implementation, a...
TUW-ASE-Summer 2014: Data as a Service – Concepts, Design & Implementation, a...TUW-ASE-Summer 2014: Data as a Service – Concepts, Design & Implementation, a...
TUW-ASE-Summer 2014: Data as a Service – Concepts, Design & Implementation, a...
Hong-Linh Truong
 
Webinar: Big Data & Hadoop - When not to use Hadoop
Webinar: Big Data & Hadoop - When not to use HadoopWebinar: Big Data & Hadoop - When not to use Hadoop
Webinar: Big Data & Hadoop - When not to use Hadoop
Edureka!
 
Big Data Hadoop Tutorial by Easylearning Guru
Big Data Hadoop Tutorial by Easylearning GuruBig Data Hadoop Tutorial by Easylearning Guru
Big Data Hadoop Tutorial by Easylearning Guru
KCC Software Ltd. & Easylearning.guru
 
Resume_Ayush Gaur_v17
Resume_Ayush Gaur_v17Resume_Ayush Gaur_v17
Resume_Ayush Gaur_v17
Ayush Redevil Gaur
 
Hareesh
HareeshHareesh
Cloud and Bid data Dr.VK.pdf
Cloud and Bid data Dr.VK.pdfCloud and Bid data Dr.VK.pdf
Cloud and Bid data Dr.VK.pdf
kalai75
 
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
IJECEIAES
 
Cisco event 6 05 2014v3 wwt only
Cisco event 6 05 2014v3 wwt onlyCisco event 6 05 2014v3 wwt only
Cisco event 6 05 2014v3 wwt only
Arthur_Hansen
 

Similar to Hadoop basics (20)

Worldranking universities final documentation
Worldranking universities final documentationWorldranking universities final documentation
Worldranking universities final documentation
 
HDFS & MapReduce
HDFS & MapReduceHDFS & MapReduce
HDFS & MapReduce
 
How To Become A Big Data Engineer? Edureka
How To Become A Big Data Engineer? EdurekaHow To Become A Big Data Engineer? Edureka
How To Become A Big Data Engineer? Edureka
 
Hadoop Online Training
Hadoop Online TrainingHadoop Online Training
Hadoop Online Training
 
Big Data Hadoop Training by Easylearning Guru
Big Data Hadoop Training by Easylearning GuruBig Data Hadoop Training by Easylearning Guru
Big Data Hadoop Training by Easylearning Guru
 
Cloudera Academic Partnership: Teaching Hadoop to the Next Generation of Data...
Cloudera Academic Partnership: Teaching Hadoop to the Next Generation of Data...Cloudera Academic Partnership: Teaching Hadoop to the Next Generation of Data...
Cloudera Academic Partnership: Teaching Hadoop to the Next Generation of Data...
 
解讀雲端大數據新趨勢
解讀雲端大數據新趨勢解讀雲端大數據新趨勢
解讀雲端大數據新趨勢
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Big Data Engineer Skills and Job Description | Edureka
Big Data Engineer Skills and Job Description | EdurekaBig Data Engineer Skills and Job Description | Edureka
Big Data Engineer Skills and Job Description | Edureka
 
Research IT @ Illinois: Establishing Service Responsive to Investigator Needs
Research IT @ Illinois: Establishing Service Responsive to Investigator NeedsResearch IT @ Illinois: Establishing Service Responsive to Investigator Needs
Research IT @ Illinois: Establishing Service Responsive to Investigator Needs
 
[Azureビッグデータ関連サービスとHortonworks勉強会] Azure HDInsight
[Azureビッグデータ関連サービスとHortonworks勉強会] Azure HDInsight[Azureビッグデータ関連サービスとHortonworks勉強会] Azure HDInsight
[Azureビッグデータ関連サービスとHortonworks勉強会] Azure HDInsight
 
Hpdw 2015-v10-paper
Hpdw 2015-v10-paperHpdw 2015-v10-paper
Hpdw 2015-v10-paper
 
TUW-ASE-Summer 2014: Data as a Service – Concepts, Design & Implementation, a...
TUW-ASE-Summer 2014: Data as a Service – Concepts, Design & Implementation, a...TUW-ASE-Summer 2014: Data as a Service – Concepts, Design & Implementation, a...
TUW-ASE-Summer 2014: Data as a Service – Concepts, Design & Implementation, a...
 
Webinar: Big Data & Hadoop - When not to use Hadoop
Webinar: Big Data & Hadoop - When not to use HadoopWebinar: Big Data & Hadoop - When not to use Hadoop
Webinar: Big Data & Hadoop - When not to use Hadoop
 
Big Data Hadoop Tutorial by Easylearning Guru
Big Data Hadoop Tutorial by Easylearning GuruBig Data Hadoop Tutorial by Easylearning Guru
Big Data Hadoop Tutorial by Easylearning Guru
 
Resume_Ayush Gaur_v17
Resume_Ayush Gaur_v17Resume_Ayush Gaur_v17
Resume_Ayush Gaur_v17
 
Hareesh
HareeshHareesh
Hareesh
 
Cloud and Bid data Dr.VK.pdf
Cloud and Bid data Dr.VK.pdfCloud and Bid data Dr.VK.pdf
Cloud and Bid data Dr.VK.pdf
 
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
 
Cisco event 6 05 2014v3 wwt only
Cisco event 6 05 2014v3 wwt onlyCisco event 6 05 2014v3 wwt only
Cisco event 6 05 2014v3 wwt only
 

Recently uploaded

Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
jerlynmaetalle
 
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
zsjl4mimo
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdfUdemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Fernanda Palhano
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
jitskeb
 
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
g4dpvqap0
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
Walaa Eldin Moustafa
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
Timothy Spann
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
v7oacc3l
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
AlessioFois2
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
vikram sood
 
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
74nqk8xf
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
sameer shah
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
Social Samosa
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Aggregage
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
nuttdpt
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
nyfuhyz
 

Recently uploaded (20)

Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdfUdemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
 
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
 
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
 

Hadoop basics

  • 1. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc Big Data & Hadoop D. Praveen Kumar Junior Research Fellow Department of Computer Science & Engineering Indian Institute of Technology (Indian School of Mines) Dhanbad, Jharkhand, India Head of IT & ITES, Skill Subsist Impels Ltd, Tirupati. March 25, 2017 Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 1 / 60
  • 2. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc 1 Introduction 2 Big Data 3 Sources of Big Data 4 Tools 5 HDFS 6 Installation 7 Configuration 8 Starting & Stopping 9 Map Reduce 10 Execution Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 2 / 60
  • 3. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc Data Data means a value or set of values. Examples: march 1st 2017 20, 30, 40 ΨΦϕ Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 3 / 60
  • 4. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc Information Meaningful or preprocessed data we called as Information. Examples: Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 4 / 60
  • 5. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc Data Types The kind of data that may appear in a computer. Examples: int float char double Abstract data types -user defined data types. Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 5 / 60
  • 6. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc Traditional approaches Traditional approaches to store and process the data 1 File system 2 RDBMS (Relational Database Management Systems) 3 Data Warehouse & Mining Tools 4 Grid Computing 5 Volunteer Computing Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 6 / 60
  • 7. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc GUESTS =4 Transportation from railway station to your home( one Auto/car is sufficient) mom can prepare food or snacks without risk. Your house is sufficient for Accommodation. Facilities like bed, bathrooms, water and TV are provided which you use. You can talk to each other and crack jokes and you can make them happy Expenditure is nearly Rs.1000/- Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 7 / 60
  • 8. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc GUESTS =100 Transportation = 25 autos/car or two buses Food = catering. Accommodation = Lodge. Facilities = AC, TV, and all other facilities Maintenance= somewhat difficult Expenditure =nearly Rs. 90,000/- Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 8 / 60
  • 9. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc GUESTS =10000 Transportation = 2500 autos or 500 buses Food = catering. Accommodation = all Lodges, function halls and cottages in the town. Facilities = AC, TV, and all other facilities are somewhat difficult to provide. Maintenance= more difficult Expenditure =nearly Rs. 2,00,000/- Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 9 / 60
  • 10. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc Grid Computing Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 10 / 60
  • 11. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc Volunteer Computing Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 11 / 60
  • 12. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc GUESTS =10000000 Transportation=how many autos=? Food =? Accommodation =? Facilities =? Maintenance=? Cost =? Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 12 / 60
  • 13. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc Problems Same we assume in computing environment Difficult to handle a huge and ever growing amount of data Processing of data can not be possible with few machines distributing large data sets is difficult Construction of online or offline models are very difficult Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 13 / 60
  • 14. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc Solution A single solution to all these problems is Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 14 / 60
  • 15. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc What is Big Data? Big data refers to voluminous amounts of structured or unstructured data that organizations can potentially mine and analyze. Big data is huge amount of large data sets characterized by Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 15 / 60
  • 16. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc Data generation Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 16 / 60
  • 17. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc How Data generated Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 17 / 60
  • 18. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc Internet of Events Internet is the main source to generating the wast amount of data. Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 18 / 60
  • 19. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc 4 Internet of Events Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 19 / 60
  • 20. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc 4 Questions of Data Analysts 1 What happened? 2 Why did it happen? 3 What will happen? 4 What is the best that can happen? Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 20 / 60
  • 21. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc Big Data Platforms and Analytical Software Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 21 / 60
  • 22. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc Hadoop Here we go with Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 22 / 60
  • 23. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc Hadoop History Hadoop was created by Doug Cutting, creator of Lucene. He also involved in a project called Nutch. (It is basic version of hadoop) Nutch is a combination of MapReduce and NDFS (Nutch Distributed File System) Later Nutch renamed to Hadoop. (Mapreduce + HDFS (Hadoop Distributed File System)) Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 23 / 60
  • 24. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc Hadoop Apache Hadoop is an open-source software framework for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware. Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 24 / 60
  • 25. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc Hadoop The base Apache Hadoop framework is composed of the following modules: Hadoop Common contains libraries and utilities needed by other Hadoop modules Hadoop Distributed File System (HDFS) a distributed file-system that stores data Hadoop YARN a resource-management platform Hadoop MapReduce for large scale data processing. Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 25 / 60
  • 26. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc Hadoop Components Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 26 / 60
  • 27. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc Hadoop Components Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 27 / 60
  • 28. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc HDFS- Goals The design goals of HDFS 1 Very Large files 2 Streaming Data Access 3 Commodity Hardware Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 28 / 60
  • 29. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc HDFS- Failed in HDFS is Not FIT for 1 Lots of small files 2 Low latency database access 3 Multiple writers, arbitrary file modifications Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 29 / 60
  • 30. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc HDFS- Concepts 1 Blocks 2 Namenodes 3 Datanodes 4 HDFS Federation 5 HDFS High Availability Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 30 / 60
  • 31. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc Requirements Necessary Java >= 7 ssh Linux OS (Ubuntu >= 14.04) Hadoop framework Optional Eclipse Internet connection Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 31 / 60
  • 32. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc Java 7 & Installation Hadoop requires a working Java installation. However, using java 1.7 or more is recommended. Following command is used to install java in linux platform sudo apt-get install openjdk-7-jdk (or) sudo apt-get install default-jdk Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 32 / 60
  • 33. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc Java PATH Setup We need to set JAVA path Open the .bashrc file located in home directory gedit ~/.bashrc Add below line at the end: export JAVA HOME=/usr/lib/jvm/java−7−openjdk−amd64 Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 33 / 60
  • 34. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc Installation & Configuration of SSH Hadoop requires SSH(Secure Shell) access to manage its nodes, i.e. remote machines plus your local machine if you want to use Hadoop on it. Install SSH using following command sudo apt-get install ssh First, we have to generate DSA an SSH key for user. ssh-keygen -t dsa -P ’’ -f ~ /.ssh/id dsa cat ~ /.ssh/id dsa.pub >> ~ /.ssh/authorized keys Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 34 / 60
  • 35. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc Download & Extract Hadoop Download Hadoop from the Apache Download Mirrors http://mirror.fibergrid.in/apache/hadoop/common/ Extract the contents of the Hadoop package to a location of your choice. I picked /usr/local/hadoop. $ cd /usr/local $ sudo tar xzf hadoop-2.7.2.tar.gz $ sudo mv hadoop-2.7.2 hadoop Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 35 / 60
  • 36. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc Add Hadoop configuration in .bashrc Add Hadoop configuration in .bashrc in home directory. export HADOOP INSTALL=/usr/local/hadoop export PATH=$PATH:$HADOOP INSTALL/bin export PATH=$PATH:$HADOOP INSTALL/sbin export HADOOP MAPRED HOME=$HADOOP INSTALL export HADOOP HDFS HOME=$HADOOP INSTALL export HADOOP COMMON HOME=$HADOOP INSTALL export YARN HOME=$HADOOP INSTALL export HADOOP COMMON LIB NATIVE DIR=$HADOOP INSTALL/lib/native export HADOOP OPTS="-Djava.library.path=$HADOOP INSTALL/lib" Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 36 / 60
  • 37. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc Create temp file, DataNode & NameNode Execute below commands to create NameNode mkdir -p /usr/local/hadoopdata/hdfs/namenode Execute below commands to create DataNode mkdir -p /usr/local/hadoopdata/hdfs/datanode Execute below code to create the tmp directory in hadoop sudo mkdir -p /app/hadoop/tmp sudo chown hadoop1:hadoop1 /app/hadoop/tmp sudo chmod 750 /app/hadoop/tmp Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 37 / 60
  • 38. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc Files to Configure The following are the files we need to configure core-site.xml hadoop-env.sh mapred-site.xml hdfs-site.xml Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 38 / 60
  • 39. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc Add properties in /usr/local/hadoop/etc/core-site.xml Add the following snippets between the < configuration > ... < /configuration > tags in the core-site.xml file. Add below property to specify the location of tmp < property > < name > hadoop.tmp.dir < /name > < value > /app/hadoop/tmp < /value > < /property > Add below property to specify the location of default file system and its port number. < property > < name > fs.default.name < /name > < value > hdfs : //localhost : 9000 < /value > < /property > Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 39 / 60
  • 40. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc Add properties in /usr/local/hadoop/etc/hadoop-env.sh Un-Comment the JAVA HOME and Give Correct Path For Java. export JAVA HOME=/usr/lib/jvm/java-7-openjdk-amd64 Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 40 / 60
  • 41. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc Add property in /usr/local/hadoop/etc/hadoop/mapred-site.xml In file we add The host name and port that the MapReduce job tracker runs at. Add following in mapred-site.xml : < property > < name > mapred.job.tracker < /name > < value > localhost : 54311 < /value > < /property > Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 41 / 60
  • 42. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc Add properties in ... etc/hadoop/hdfs-site.xml In file hdfs-site.xml add following: Add replication factor < property > < name > dfs.replication < /name > < value > 1 < /value > < /property > Specify the NameNode < property > < name > dfs.namenode.name.dir < /name > < value > file : /usr/local/hadoopdata/hdfs/namenode < /value > < /property > Specify the DataNode < property > < name > dfs.datanode.name.dir < /name > < value > file : /usr/local/hadoopdata/hdfs/datanode < /value > < /property > Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 42 / 60
  • 43. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc Formatting the HDFS filesystem via the NameNode The first step to starting up your Hadoop installation is Formatting the Hadoop file system We need to do this the first time you set up a Hadoop. Do not format a running Hadoop filesystem as you will lose all the data currently in HDFS To format the filesystem, run the command hadoop namenode -format Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 43 / 60
  • 44. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc Starting single-node cluster Run the command: start-all.sh This will startup a NameNode,SecondaryNameNode, DataNode, ResourceManager and a NodeManager on your machine. A nifty tool for checking whether the expected Hadoop processes are running is jps hadoop1@hadoop1:/usr/local/hadoop$ jps 2598 NameNode 3112 ResourceManager 3523 Jps 2917 SecondaryNameNode 2727 DataNode 3242 NodeManager Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 44 / 60
  • 45. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc Stopping your single-node cluster Run the command stop-all.sh To stop all the daemons running on your machine output will be like this. stopping NodeManager localhost: stopping ResourceManager stopping NameNode localhost: stopping DataNode localhost: stopping SecondaryNameNode Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 45 / 60
  • 46. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc Map-Reduce Framework Map Reduce programming paradigm It relies basically on two functions, Map and Reduce Map Reduce used to manage many large-scale computations The framework takes care of scheduling tasks, monitoring them and re-executes the failed tasks. The framework to effectively schedule tasks on the nodes where data is already present Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 46 / 60
  • 47. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc Map-Reduce Computation Steps The key-value pairs from each Map task are collected by a master controller and sorted by key. The keys are divided among all the Reduce tasks, so all key-value pairs with the same key wind up at the same Reduce task. The Reduce tasks work on one key at a time, and combine all the values associated with that key in some way. The manner of combination of values is determined by the code written by the user for the Reduce function. Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 47 / 60
  • 48. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc Hadoop - MapReduce Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 48 / 60
  • 49. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc Hadoop - MapReduce (Word Count) Example Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 49 / 60
  • 50. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc MapReduce - WordCountMapper In WordCountMapper class we perform the following operations Read a line from file Split line into Words Assign Count 1 to each word Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 50 / 60
  • 51. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc WordCountMapper source code public static class WordCountMapper extends Mapper<Object, Text, Text, IntWritable>{ private final static IntWritable one = new IntWritable(1); private Text word = new Text(); public void map(Object key, Text value, Context context ) throws IOException, InterruptedException { StringTokenizer itr = new StringTokenizer(value.toString()); while (itr.hasMoreTokens()) { word.set(itr.nextToken()); context.write(word, one); } } } Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 51 / 60
  • 52. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc MapReduce - WordCountReducer In WordCountReducer class we perform the following operations Sum the list of values Assign sum to corresponding word Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 52 / 60
  • 53. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc WordCountReducer source code public static class WordCountReducer extends Reducer<Text,IntWritable,Text,IntWritable> { private IntWritable result = new IntWritable(); public void reduce(Text key, Iterable<IntWritable> values, Context context ) throws IOException, InterruptedException { int sum = 0; for (IntWritable val : values) { sum += val.get(); } result.set(sum); context.write(key, result); } } Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 53 / 60
  • 54. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc WordCountJob public class WordCountJob { public static void main(String[] args) throws Exception { Configuration conf = new Configuration(); Job job = new Job(conf, "word count"); job.setJarByClass(WordCountJob.class); job.setMapperClass(WordCountMapper.class); job.setCombinerClass(WordCountReducer.class); job.setReducerClass(WordCountReducer.class); job.setOutputKeyClass(Text.class); job.setOutputValueClass(IntWritable.class); FileInputFormat.addInputPath(job, new Path(args[0])); FileOutputFormat.setOutputPath(job, new Path(args[1])); System.exit(job.waitForCompletion(true) ? 0 : 1); } } Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 54 / 60
  • 55. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc Header Files to include import java.io.IOException; import java.util.StringTokenizer; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.fs.Path; import org.apache.hadoop.io.IntWritable; import org.apache.hadoop.io.Text; import org.apache.hadoop.mapreduce.Job; import org.apache.hadoop.mapreduce.Mapper; import org.apache.hadoop.mapreduce.Reducer; import org.apache.hadoop.mapreduce.lib.input.FileInputFormat; import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; import org.apache.hadoop.util.GenericOptionsParser; Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 55 / 60
  • 56. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc Execution of Hadoop Program in Eclipse Step1: 1 Starting Hadoop in terminal using command: $ Start-all.sh 2 Use JPS command to check all services of Hadoop are started or not. Step 2: open Eclipse Step 3: Go to file ⇒ New ⇒ Project Select Java Project and click on Next button Write project name and click on Finish button Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 56 / 60
  • 57. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc Continue... Step 4: Right side it creates a project 1 Right click on Project ⇒ New ⇒ Class 2 Write Name of Class and then Click Finish 3 Write MapReduce program in that class Step 5: Write JAVA Program Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 57 / 60
  • 58. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc Continue... Step 6: Importing JAR files 1 Right click on Project and select properties (Alt+Enter) 2 Select Java Build Path ⇒ Click on Libraries, then click on add external JARS 3 Select the following jars from Hadoop library. /usr/local/Hadoop/share/Hadoop/common/libs /usr/local/Hadoop/share/Hadoop/hdfs/libs /usr/local/Hadoop/share/Hadoop/httpfs/libs /usr/local/Hadoop/share/Hadoop/mapreduce/libs /usr/local/Hadoop/share/Hadoop/yarn/libs /usr/local/Hadoop/share/Hadoop/tools/ Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 58 / 60
  • 59. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc Continue .... Step 7: Set input file path 1 Create folder in home dir 2 copy text files in to that 3 Select path of Input Step 8: Set input and output path 1 right click on source ⇒ Run As ⇒ Run Configuration ⇒ Argument 2 Enter your input and out put path with a single space 3 click on Run Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 59 / 60
  • 60. Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc thank You Sree Venkateswara College of Engineering, Nellore, A. P. Big Data & Hadoop March 25, 2017 Slide: 60 / 60