EDUREKA HADOOP CERTIFICATION TRAINING (www.edureka.co/big-data-and-hadoop)
Agenda for Today's Session
• MapReduce Way
• Classes and Packages in MapReduce
• Explanation of a Complete MapReduce Program
• MapReduce Examples on Analytics
• MapReduce Example on Testing
MapReduce Example on Word Count Process
MapReduce Way
MapReduce Way – Word Count Process
Input/Output Classes in MapReduce
Input Format – Class Hierarchy
Output Format – Class Hierarchy
Packages and Classes in Word Count MapReduce Example
Packages to Import
import java.io.IOException;
import java.util.*;
// All these packages are present in hadoop-common.jar
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
// All these packages are present in hadoop-mapreduce-client-core.jar
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
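Mapper Class
public static class Map extends
Mapper<LongWritable, Text, Text, IntWritable> {
Map is the name of the Mapper class; it inherits from the superclass Mapper.
The Mapper class takes 4 type parameters, i.e.
Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>
With TextInputFormat, the input key (LongWritable) is the byte offset of a line and
the input value (Text) is the line itself; the output is a (Text, IntWritable) pair.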
Reducer Class
public static class Reduce extends
Reducer<Text, IntWritable, Text, IntWritable> {
Reduce is the name of the Reducer class; it inherits from the superclass Reducer.
The Reducer class takes 4 type parameters, i.e.
Reducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT>
The Reducer's input types (Text, IntWritable) must match the Mapper's output types.
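The type parameters above describe how data flows through word count: the mapper turns (offset, line) into (word, 1) pairs, the framework groups the pairs by word, and the reducer sums each group. The following is a minimal plain-Java sketch of that flow. It deliberately uses no Hadoop classes, so WordCountSketch, its method names, and the sample input lines are illustrative assumptions, not part of the Hadoop API or of the original slides.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Plain-Java simulation of the word count data flow; no Hadoop classes are used.
class WordCountSketch {

    // Map step: (line offset, line text) -> list of (word, 1) pairs,
    // mirroring Mapper<LongWritable, Text, Text, IntWritable>.
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            out.add(new SimpleEntry<>(tokens.nextToken(), 1));
        }
        return out;
    }

    // Reduce step: (word, [1, 1, ...]) -> total count,
    // mirroring Reducer<Text, IntWritable, Text, IntWritable>.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Driver: maps every line, groups values by key (the shuffle), then reduces each group.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        long offset = 0;
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(offset, line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
            offset += line.length() + 1; // rough stand-in for the next line's byte offset
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative input lines, not taken from the slides.
        System.out.println(run(Arrays.asList("Dear Bear River", "Car Car River", "Deer Car Bear")));
    }
}
```

In a real job, the map and reduce methods would live in the Map and Reduce classes shown above and write their output through the framework's context object rather than returning values.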
It's Time to See Some MapReduce Examples
MapReduce is useful in a wide range of applications across multiple domains.
This session covers 2 kinds of examples:
• Analytics: Process the data and produce the desired results
• Testing: Run a few test cases against MapReduce code using MRUnit
Let us see a few MapReduce Examples
on Analytics
MapReduce Temperature Example
Temperature Example: Weather Forecasting
• Problem Statement:
» Analyse weather data for Austin to determine hot and cold
days.
We have a weather dataset for Austin from NCEI.
NOAA's National Centers for Environmental Information (NCEI),
previously NCDC, is responsible for preserving, monitoring, assessing,
and providing public access to the Nation's treasure of climate and
historical weather data and information.
Refer -> ftp://ftp.ncdc.noaa.gov/pub/data/uscrn/products/daily01
Temperature Example - Weather Dataset
[Figure: sample of the weather dataset. The 6th column holds the Max Temp and the 7th column holds the Min Temp.]
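The hot and cold day analysis comes down to reading two temperature fields from each record and comparing them with thresholds. Below is a minimal plain-Java sketch of that per-record logic, which is the part a mapper would carry out. Several things here are assumptions rather than facts from the slides: the thresholds (35 and 10 degrees Celsius), the field positions (date in field 2, maximum temperature in field 6, minimum temperature in field 7, whitespace separated), the class name HotColdDaySketch, and the synthetic records in main.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch of the per-record logic for the hot/cold day analysis.
class HotColdDaySketch {
    // Hypothetical thresholds in degrees Celsius; the slides do not state them.
    static final float HOT_ABOVE = 35.0f;
    static final float COLD_BELOW = 10.0f;

    // Classifies one whitespace-separated record.
    // Assumed layout: field 2 = date (YYYYMMDD), field 6 = max temp, field 7 = min temp.
    static List<String> classify(String record) {
        String[] f = record.trim().split("\\s+");
        String date = f[1];
        float max = Float.parseFloat(f[5]);
        float min = Float.parseFloat(f[6]);
        List<String> out = new ArrayList<>();
        if (max > HOT_ABOVE) {
            out.add("Hot Day " + date);
        }
        if (min < COLD_BELOW) {
            out.add("Cold Day " + date);
        }
        return out;
    }

    public static void main(String[] args) {
        // Synthetic records for illustration only; not taken from the real dataset.
        List<String> records = Arrays.asList(
            "00000 20150101 0.0 -97.00 30.00 12.5 3.1",
            "00000 20150715 0.0 -97.00 30.00 37.4 24.0");
        for (String r : records) {
            System.out.println(classify(r));
        }
    }
}
```

In a MapReduce job, each label would be emitted as the output key with the temperature as its value, and the reducer could simply pass the pairs through.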
MapReduce Example
Last.fm Example
Last.fm is an online music website where users listen to various tracks,
and the data is collected as shown below. Write a MapReduce
program to get the number of unique listeners.
The data arrives in log files and looks like this:
UserId TrackId Shared Radio Skip
100001 150 1 1 0
100005 103 0 0 1
100142 78 1 0 0
110005 289 1 0 1
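One way to approach this problem is to emit (TrackId, UserId) from the map step and count the distinct UserIds per TrackId in the reduce step. The plain-Java sketch below follows that idea in a single pass, without Hadoop classes. It reads "unique listeners" as unique listeners per track, which is an interpretation rather than something the slide states, and it assumes whitespace-separated fields; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Plain-Java sketch: count distinct UserIds per TrackId.
class UniqueListenersSketch {

    static Map<Integer, Integer> uniqueListenersPerTrack(List<String> records) {
        // "Shuffle" state: TrackId -> set of UserIds seen for that track.
        Map<Integer, Set<Integer>> listeners = new TreeMap<>();
        for (String record : records) {
            String[] f = record.trim().split("\\s+");
            if (f.length != 5) {
                continue; // skip malformed lines
            }
            try {
                // Map step: extract (TrackId, UserId) from the record.
                int userId = Integer.parseInt(f[0]);
                int trackId = Integer.parseInt(f[1]);
                listeners.computeIfAbsent(trackId, k -> new HashSet<>()).add(userId);
            } catch (NumberFormatException e) {
                // Non-numeric line such as the header row; ignore it.
            }
        }
        // Reduce step: the size of each set is the number of unique listeners.
        Map<Integer, Integer> counts = new TreeMap<>();
        for (Map.Entry<Integer, Set<Integer>> e : listeners.entrySet()) {
            counts.put(e.getKey(), e.getValue().size());
        }
        return counts;
    }

    public static void main(String[] args) {
        // The header and rows shown on the slide.
        List<String> log = Arrays.asList(
            "UserId TrackId Shared Radio Skip",
            "100001 150 1 1 0",
            "100005 103 0 0 1",
            "100142 78 1 0 0",
            "110005 289 1 0 1");
        System.out.println(uniqueListenersPerTrack(log));
    }
}
```

Holding a set per track is fine for a sketch; on large data a real job would usually rely on the shuffle to group the UserIds and deduplicate them inside the reducer.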
Let us see a MapReduce Example
on Testing
MRUnit Testing Framework
• Provides 4 drivers for testing MapReduce code separately
» MapDriver
» ReduceDriver
» MapReduceDriver
» PipelineMapReduceDriver
• Helps fill the gap between MapReduce programs and JUnit*
• Gives better control over log messages through JUnit integration
*JUnit is a simple framework for writing repeatable tests.
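The idea behind a driver is to hand a single input to the code under test, declare the expected output, and let the driver compare the two. The sketch below illustrates that pattern in plain Java. MiniMapDriver is a hypothetical stand-in written for this example and is not the MRUnit API; its fluent method names only echo the style of driver-based tests, and the mapper under test is a simplified function rather than a Hadoop Mapper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for the driver pattern; this is NOT the MRUnit API.
class MiniMapDriver<I, O> {
    private final Function<I, List<O>> mapper;
    private final List<O> expected = new ArrayList<>();
    private I input;

    MiniMapDriver(Function<I, List<O>> mapper) {
        this.mapper = mapper;
    }

    // Sets the single input record to feed to the mapper.
    MiniMapDriver<I, O> withInput(I in) {
        this.input = in;
        return this;
    }

    // Appends one expected output record; order matters.
    MiniMapDriver<I, O> withOutput(O out) {
        expected.add(out);
        return this;
    }

    // Runs the mapper on the input and fails if the output differs from the expectation.
    void runTest() {
        List<O> actual = mapper.apply(input);
        if (!expected.equals(actual)) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        // Mapper under test: splits a line into words and emits "word=1" strings.
        Function<String, List<String>> wordMapper = line -> {
            List<String> out = new ArrayList<>();
            for (String w : line.trim().split("\\s+")) {
                out.add(w + "=1");
            }
            return out;
        };
        new MiniMapDriver<String, String>(wordMapper)
            .withInput("cat cat dog")
            .withOutput("cat=1")
            .withOutput("cat=1")
            .withOutput("dog=1")
            .runTest();
        System.out.println("mapper test passed");
    }
}
```

Because the mapper is tested in isolation, a failure points directly at the map logic rather than at the job configuration or the cluster.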
MapReduce MRUnit Example
Learning Resources
• Hadoop Tutorial: www.edureka.co/blog/hadoop-tutorial
• MapReduce Tutorial: www.edureka.co/blog/mapreduce-tutorial
• MapReduce Interview Questions: www.edureka.co/blog/interview-questions/hadoop-interview-questions-mapreduce
Thank You …
Questions/Queries/Feedback

MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka