Module-3
Hadoop MapReduce Framework
www.edureka.co/big-data-and-hadoop
Objectives
At the end of this module, you will be able to:
• Analyze different use cases where MapReduce is used
• Differentiate between the traditional way and the MapReduce way
• Learn about the Hadoop 2.x MapReduce architecture and components
• Understand the execution flow of a YARN MapReduce application
• Implement basic MapReduce concepts
• Run a MapReduce program
• Understand the Input Splits concept in MapReduce
• Understand the MapReduce job submission flow
• Implement a Combiner and a Partitioner in MapReduce
Let’s Revise
• Hadoop Cluster Configuration
• Data Loading Techniques
• Hadoop Cluster Modes
[Figure: configuration files by component. Core: core-site.xml; HDFS: hdfs-site.xml; YARN: yarn-site.xml; MapReduce (Map, Reduce): mapred-site.xml]
Let’s Revise
[Figure: data loading into HDFS using Flume, Sqoop, and Hadoop copy commands; data analysis using Pig and Hive]
Sqoop, Flume – Questions
Annie’s Question
Secondary NameNode is a hot backup for NameNode:
» TRUE
» FALSE
Annie’s Answer
Ans. FALSE. The Secondary NameNode (SNN) is the most misunderstood component of the HDFS architecture. The SNN is not a hot backup for the NameNode. It enables checkpointing in a Hadoop cluster by periodically merging the NameNode's edit log into the filesystem image.
Where MapReduce is Used?
HealthCare
• Problem Statement:
» De-identify personal health information.
Weather Forecasting
• Problem Statement:
» Find the maximum temperature recorded in a year.
The Traditional Way
[Figure: very big data is divided into splits; a separate grep runs on each split, and cat concatenates the per-split matches into "all matches"]
MapReduce Way
[Figure: very big data is divided into splits; the MapReduce framework runs MAP over each split and REDUCE combines the results into "all matches"]
Why MapReduce?
• Two biggest advantages:
» Taking processing to the data
» Processing data in parallel
[Figure: map tasks run on the nodes that hold HDFS blocks a, b, and c; the hierarchy shown is Data Center, Rack, Node]
Solving the Problem with MapReduce
[Figure: Sqoop takes a DB dump in CSV format and copies it onto HDFS; Map tasks read the CSV file from HDFS and apply the map logic; Reduce tasks apply the reduce logic and store the CSV file into HDFS]
Hadoop 2.x MapReduce Architecture
[Figure: a Client submits jobs to the Resource Manager. Node Managers run on Datanode1 and Datanode2 and host containers for the Application Master, a Map Task, and a Reduce Task. Arrows are labeled Job Submission, Resource Request, Node Status, and MapReduce Status. The Job History Server and the Namenode are also shown.]
Hadoop 2.x MapReduce Components
• Client
» Submits a MapReduce job
• Resource Manager
» Cluster-level resource manager
» Long life, high-quality hardware
• Node Manager
» One per Data Node
» Monitors resources on the Data Node
• Job History Server
» Maintains information about submitted MapReduce jobs after their ApplicationMaster terminates
• ApplicationMaster
» One per application
» Short life
» Coordinates and manages MapReduce jobs
» Negotiates with the Resource Manager to schedule tasks
» The tasks are started by NodeManager(s)
• Container
» Created by the NM when requested
» Allocates a certain amount of resources (memory, CPU, etc.) on a slave node
MapReduce Application Execution
Executing MapReduce Application on YARN
YARN MR Application Execution Flow
• MapReduce Job Execution
» Job Submission
» Job Initialization
» Tasks Assignment
» Memory Assignment
» Status Updates
» Failure Recovery
YARN MR Application Execution Flow
[Figure, built up over four slides: the Client JVM (Application, Job object) on the Client; the Resource Manager on the Management Node; HDFS; and a Node Manager on a Data Node hosting the MRAppMaster and a Task JVM (YarnChild) that runs the Map/Reduce Task. Status arrows are labeled Poll for Status and Update Status.]
1. Run Job
2. Get New Application
3. Copy Job Resources
4. Submit Application
5. Start MR AppMaster container
6. Create container
7. Get Input Splits
8. Request Resources
9. Start container
10. Create Container
11. Acquire Job Resources
12. Execute
Hadoop 2.x : YARN Workflow
[Figure: the Resource Manager, comprising the Scheduler and the Applications Manager (AsM), with many Node Managers. App Master1 runs Container1.1 and Container1.2; App Master2 runs Container2.1, Container2.2, and Container2.3, spread across the Node Managers.]
Annie’s Question
Which of the following disadvantages of the Hadoop 1.0 MapReduce
framework was YARN developed to overcome?
» Single point of failure of the NameNode
» Only one version can be run in classic MapReduce
» Too much burden on the JobTracker
Annie’s Answer
Too much burden on the JobTracker
Annie’s Question
In YARN, the functionality of the JobTracker has been replaced by
which of the following YARN features:
» Job Scheduling
» Task Monitoring
» Resource Management
» Node Management
Annie’s Answer
Task Monitoring and Resource Management. The fundamental idea of YARN is to split the two major functions of the JobTracker, resource management and job scheduling/monitoring, into separate daemons: a global ResourceManager (RM) for resources and a per-application ApplicationMaster (AM) for task monitoring.
Annie’s Question
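In YARN, which of the following daemons takes care of the
containers and the resource utilization by the applications?
» Node Manager
» JobTracker
» TaskTracker
» ApplicationMaster
Annie’s Answer
Node Manager. It is the per-node agent that launches containers, monitors their resource usage (CPU, memory, etc.), and reports it to the Resource Manager. The ApplicationMaster negotiates containers for its application but does not monitor their resource utilization.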
Annie’s Question
Can we run MRv1 Jobs in a YARN enabled Hadoop Cluster?
» Yes
» No
Annie’s Answer
Yes. MapReduce on YARN maintains binary compatibility for applications written against the older org.apache.hadoop.mapred API, so these existing applications can run on YARN directly without recompilation. Applications written against the newer org.apache.hadoop.mapreduce API are source compatible and may need to be recompiled.
MapReduce Paradigm
The Overall MapReduce Word Count Process
Input (K1,V1):
Deer Bear River
Car Car River
Deer Car Bear
Splitting: each line becomes one split.
Mapping, List(K2,V2):
Split 1: (Deer, 1) (Bear, 1) (River, 1)
Split 2: (Car, 1) (Car, 1) (River, 1)
Split 3: (Deer, 1) (Car, 1) (Bear, 1)
Shuffling, K2,List(V2):
Bear, (1,1)
Car, (1,1,1)
Deer, (1,1)
River, (1,1)
Reducing and Final Result, List(K3,V3):
Bear, 2
Car, 3
Deer, 2
River, 2
Anatomy of a MapReduce Program
Map: (K1, V1) → List (K2, V2)
Reduce: (K2, List (V2)) → List (K3, V3)
K = key, V = value
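The map, shuffle, and reduce signatures above can be sketched in plain Java without any Hadoop dependency. This is a minimal in-memory illustration, not the Hadoop API: the class name `WordCountSketch` and its method names are hypothetical, and the line offset stands in for K1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of word count; not Hadoop code.
final class WordCountSketch {
    // Map: (K1 = line offset, V1 = line text) -> List(K2 = word, V2 = 1)
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // Shuffle: group every V2 under its K2, with keys in sorted order.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce: (K2, List(V2)) -> V3 = total count for the word
    static int reduce(String word, List<Integer> ones) {
        int sum = 0;
        for (int v : ones) {
            sum += v;
        }
        return sum;
    }

    // Runs map over every line, then shuffle, then reduce per key.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        long offset = 0;
        for (String line : lines) {
            mapped.addAll(map(offset, line));
            offset += line.length() + 1;
        }
        Map<String, Integer> result = new TreeMap<>();
        shuffle(mapped).forEach((word, ones) -> result.put(word, reduce(word, ones)));
        return result;
    }

    public static void main(String[] args) {
        // prints {Bear=2, Car=3, Deer=2, River=2}
        System.out.println(wordCount(List.of("Deer Bear River", "Car Car River", "Deer Car Bear")));
    }
}
```

In a real job the framework performs the shuffle step, so only the map and reduce logic would be written by hand.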
Demo of WordCount Program
Annie’s Question
Input to the mapper is in the form of?
» A flat file
» (key, value) pair
» Only string
» All the above
Annie’s Answer
A Mapper accepts (key, value) pair as input.
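To make the (key, value) input concrete: with Hadoop's default text input format, the key is the byte offset of a line within the file and the value is the line itself. The sketch below reproduces that record shape in plain Java; the class and method names are hypothetical and no Hadoop types are used.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that shows the shape of line-oriented mapper input.
final class LineRecordsSketch {
    // Turns raw text into (byte offset, line) records in file order.
    static Map<Long, String> toRecords(String text) {
        Map<Long, String> records = new LinkedHashMap<>();
        long offset = 0;
        for (String line : text.split("\n")) {
            records.put(offset, line);
            // Advance past the line's bytes plus the newline that ended it.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        // prints {0=Deer Bear River, 16=Car Car River, 30=Deer Car Bear}
        System.out.println(toRecords("Deer Bear River\nCar Car River\nDeer Car Bear\n"));
    }
}
```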
Input Splits
[Figure: INPUT DATA is divided physically into HDFS blocks and logically into input splits]
Relation Between Input Splits and HDFS Blocks
[Figure: a file with lines 1 to 11 laid over HDFS block boundaries and grouped into splits]
• Logical records do not fit neatly into the HDFS blocks.
• Here the logical records are lines, and some lines cross block boundaries.
• The first split contains line 5 even though that line spans two blocks.
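One way to see how a line that crosses a boundary ends up in exactly one split is the rule sketched below: a split that does not start at offset 0 skips its first (possibly partial) line, and every split keeps reading until it finishes the last line that starts at or before its end. This is a simplified model written for illustration, assuming newline-terminated ASCII text; `SplitLinesSketch` is a hypothetical name, not Hadoop's record reader.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of reading whole lines from a byte-range split.
final class SplitLinesSketch {
    // Returns the lines owned by the split covering [start, start + length).
    static List<String> linesForSplit(String data, int start, int length) {
        int end = start + length;
        int pos = start;
        if (start != 0) {
            // Skip the first line: the previous split reads it to completion.
            int nl = data.indexOf('\n', start);
            pos = (nl < 0) ? data.length() : nl + 1;
        }
        List<String> lines = new ArrayList<>();
        // Read every line that STARTS at or before the split's end offset,
        // even if the line itself runs past that offset.
        while (pos <= end && pos < data.length()) {
            int nl = data.indexOf('\n', pos);
            int lineEnd = (nl < 0) ? data.length() : nl;
            lines.add(data.substring(pos, lineEnd));
            pos = lineEnd + 1;
        }
        return lines;
    }

    public static void main(String[] args) {
        String data = "aaaa\nbbbb\ncccc\ndddd\n"; // 20 characters, lines start at 0, 5, 10, 15
        System.out.println(linesForSplit(data, 0, 7));  // [aaaa, bbbb]
        System.out.println(linesForSplit(data, 7, 7));  // [cccc]
        System.out.println(linesForSplit(data, 14, 6)); // [dddd]
    }
}
```

In the example, the line "bbbb" starts inside the first split but ends beyond offset 7, and it is still read by the first split only.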
MapReduce Job Submission Flow
[Figure, built up over six slides: INPUT DATA is distributed to Node 1 and Node 2; a Map task runs on each node, followed by a Reduce task on each node]
• Input data is distributed to nodes.
• Each map task works on a “split” of data.
• Each mapper outputs intermediate data.
• Each reducer copies the intermediate data for all the keys it is responsible for, once it has learned from the ApplicationMaster which map tasks produced that data.
• The shuffle phase sorts and merges the data for each key.
• Reducer output is stored.
Annie’s Question
Does the MapReduce programming model provide a way for reducers
to communicate with each other?
» Yes, reducers running on the same machine can
communicate with each other through shared memory.
» No, each reducer runs independently and in isolation.
Annie’s Answer
Ans. No, reducers run independently and in isolation.
Individual tasks do not know the input source. Reducer tasks
rely on the Hadoop framework to deliver the appropriate input for
processing.
Annie’s Question
Who specifies the input split information?
» Randomly, decided by the NameNode
» Randomly, decided by the JobTracker
» Line by line, decided by the Input Splitter
» We have to specify it explicitly
Annie’s Answer
Ans. The client submits the input split information with the job. The splits, each with a start and end point, are computed on the client side by the job's InputFormat according to its configuration.
Overview of MapReduce
[Figure: complete view of MapReduce, illustrating combiners and partitioners in addition to Mappers and Reducers]
Combiners
Combiners can be viewed as ‘mini-reducers’ in the Map phase.
Partitioners
Partitioners determine which reducer is responsible for a particular key.
Combiner – Local Reduce
COMBINERS
Combiners are mini-reducers that perform a “local reduce” on the mapper results before those results are distributed. The reduced workload is then passed on to the Reducers.
[Figure: combiner data flow for two blocks]
Block1: B C D E D B
Mapper: (B,1) (C,1) (D,1) (E,1) (D,1) (B,1)
Combiner: (B,2) (C,1) (D,2) (E,1)
Block2: D A A C B D
Mapper: (D,1) (A,1) (A,1) (C,1) (B,1) (D,1)
Combiner: (D,2) (A,2) (C,1) (B,1)
Shuffle: (A, [2]) (B, [2,1]) (C, [1,1]) (D, [2,2]) (E, [1])
Reducer: (A,2) (B,3) (C,2) (D,4) (E,1)
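The two-block flow above can be reproduced with a small in-memory sketch. It assumes each mapper emits (letter, 1) for every letter in its block; `CombinerSketch` and its methods are hypothetical names, and the shuffle is modeled simply as handing both combiner outputs to one reduce step.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of a combiner followed by a reducer.
final class CombinerSketch {
    // Local reduce: sums the (key, 1) pairs emitted by ONE mapper.
    static Map<String, Integer> combine(List<String> mapperKeys) {
        Map<String, Integer> local = new TreeMap<>();
        for (String key : mapperKeys) {
            local.merge(key, 1, Integer::sum);
        }
        return local;
    }

    // Global reduce: sums the partial counts produced by all combiners.
    static Map<String, Integer> reduce(List<Map<String, Integer>> combined) {
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> part : combined) {
            part.forEach((key, count) -> total.merge(key, count, Integer::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> c1 = combine(List.of("B", "C", "D", "E", "D", "B"));
        Map<String, Integer> c2 = combine(List.of("D", "A", "A", "C", "B", "D"));
        System.out.println(c1);                         // {B=2, C=1, D=2, E=1}
        System.out.println(c2);                         // {A=2, B=1, C=1, D=2}
        System.out.println(reduce(List.of(c1, c2)));    // {A=2, B=3, C=2, D=4, E=1}
    }
}
```

With the combiner, 8 pairs cross the shuffle instead of the 12 pairs the two mappers emitted.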
Annie’s Question
At which level does the Combiner work?
» Mapper Level
» Partitioner Level
» Reducer Level
» All the above
Annie’s Answer
Ans. Mapper level, as the Combiner works on the output data from the
Mapper.
Annie’s Question
Combiner can be considered as:
» Semi Partitioner
» Semi Reducer
» Semi Shuffler
» Major Reducer
Annie’s Answer
Ans. Semi Reducer. The Combiner works on the Mapper
output and lessens the burden on the Reducer.
Partitioner – Redirecting Output from Mapper
[Figure: three Map tasks, each followed by a Partitioner that routes its output to one of three Reducers]
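A common way to choose the reducer for a key is to hash the key and take the result modulo the number of reduce tasks, which is the idea behind Hadoop's default hash partitioner. The sketch below uses `String.hashCode()` as a stand-in for the key's hash; the class name is hypothetical, and real Hadoop key types compute their own hash codes, so actual partition numbers would differ.

```java
// Hypothetical stand-alone model of hash partitioning.
final class HashPartitionSketch {
    // Returns the index of the reducer responsible for the given key.
    static int partition(String key, int numReduceTasks) {
        // Mask off the sign bit so the result is never negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        for (String key : new String[] {"A", "B", "C", "D", "E"}) {
            System.out.println(key + " -> reducer " + partition(key, 3));
        }
    }
}
```

Because the partition depends only on the key, every pair with the same key reaches the same reducer regardless of which mapper produced it.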
Demo: Combiner and Partitioner
Demo: Combiner and Partitioner MR Code
Annie’s Question
Can we use the same logic for the combiner and the reducer?
» No, they are separate entities.
» Yes, but only if the logic is commutative and associative and the combiner and reducer work on the same data types.
Annie’s Answer
Ans. Yes, you can use the same logic if it is commutative and associative and the Combiner and Reducer work on the same data types.
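A short numeric check shows why the operation matters. Summing per-mapper partial results gives the same total as summing everything at once, so a sum reducer can double as a combiner. A mean does not have this property, so the same trick would give a wrong answer. The class name and sample values below are hypothetical.

```java
import java.util.List;

// Hypothetical check of which reduce logic is safe to reuse as a combiner.
final class CombinerSafetySketch {
    static double sum(List<Double> values) {
        double total = 0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    static double mean(List<Double> values) {
        return sum(values) / values.size();
    }

    public static void main(String[] args) {
        List<Double> mapper1 = List.of(10.0, 20.0, 30.0);
        List<Double> mapper2 = List.of(40.0);
        List<Double> all = List.of(10.0, 20.0, 30.0, 40.0);

        // Sum: combining per mapper first gives the same answer (100.0 both ways).
        System.out.println(sum(all) + " vs " + sum(List.of(sum(mapper1), sum(mapper2))));
        // Mean: a mean of per-mapper means is not the overall mean (25.0 vs 30.0).
        System.out.println(mean(all) + " vs " + mean(List.of(mean(mapper1), mean(mapper2))));
    }
}
```

A mean can still be computed with a combiner if the combiner emits partial (sum, count) pairs instead of partial means, but then its logic is no longer identical to the reducer's.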
Annie’s Question
Can we change the format of output key class and output
value class?
» TRUE
» FALSE
Annie’s Answer
Ans. TRUE
HealthCare Dataset
Revisit De-identification Architecture
[Figure: Sqoop takes a DB dump in CSV format and ingests it into HDFS; Map Task 1, Map Task 2, ... read the CSV file from HDFS and de-identify columns based on configurations; Reduce Task 1, Reduce Task 2, ... store the de-identified CSV file into HDFS]
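DeIdentify MapReduce Code
import java.nio.charset.StandardCharsets;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;
import org.apache.commons.codec.binary.Base64; // assumed: Apache Commons Codec

// Encrypts one field value with AES and returns it Base64-encoded,
// or null if encryption fails. The key must be 16, 24, or 32 bytes long.
// "logger" is a class-level logger declared outside this excerpt.
public static String encrypt(String strToEncrypt, byte[] key)
{
    try
    {
        Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
        SecretKeySpec secretKey = new SecretKeySpec(key, "AES");
        cipher.init(Cipher.ENCRYPT_MODE, secretKey);
        // Use an explicit charset so the result does not depend on the platform default.
        byte[] encrypted = cipher.doFinal(strToEncrypt.getBytes(StandardCharsets.UTF_8));
        return Base64.encodeBase64String(encrypted).trim();
    }
    catch (Exception e)
    {
        logger.error("Error while encrypting", e);
    }
    return null;
}
} // closes the enclosing class, whose declaration is not shown on the slide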
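As a rough illustration of how such an encrypt method could be applied to "de-identify columns based on configurations", the sketch below encrypts only selected column indexes of one CSV line and shows that the value can be recovered with the same key. Everything here is an assumption made for the example: the class name, the sample record, the 16-byte key, and the use of the JDK's `java.util.Base64` in place of the Commons Codec class used on the slide.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import java.util.Set;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch: de-identify chosen columns of a CSV line with AES.
final class DeIdentifySketch {
    // Runs AES in the given mode (encrypt or decrypt) over the input bytes.
    private static byte[] aes(int mode, byte[] input, byte[] key) {
        try {
            Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
            cipher.init(mode, new SecretKeySpec(key, "AES"));
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES operation failed", e);
        }
    }

    static String encrypt(String plain, byte[] key) {
        byte[] out = aes(Cipher.ENCRYPT_MODE, plain.getBytes(StandardCharsets.UTF_8), key);
        return Base64.getEncoder().encodeToString(out);
    }

    static String decrypt(String encoded, byte[] key) {
        byte[] out = aes(Cipher.DECRYPT_MODE, Base64.getDecoder().decode(encoded), key);
        return new String(out, StandardCharsets.UTF_8);
    }

    // Encrypts only the configured column indexes; other fields pass through.
    static String deIdentify(String csvLine, Set<Integer> columns, byte[] key) {
        String[] fields = csvLine.split(",", -1);
        for (int i = 0; i < fields.length; i++) {
            if (columns.contains(i)) {
                fields[i] = encrypt(fields[i], key);
            }
        }
        return String.join(",", fields);
    }

    public static void main(String[] args) {
        byte[] key = "0123456789abcdef".getBytes(StandardCharsets.UTF_8); // sample 16-byte key
        String masked = deIdentify("11111,John,55,Diabetes", Set.of(1), key);
        System.out.println(masked);
        System.out.println(decrypt(masked.split(",", -1)[1], key));
    }
}
```

Standard Base64 output never contains a comma, so the encrypted field does not break the CSV layout. ECB mode maps equal plaintexts to equal ciphertexts, which keeps values joinable but also reveals which records share a value.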
Demo of DeIdentify Program
Weather Data
ftp://ftp.ncdc.noaa.gov/pub/data/uscrn/products/daily01/
Demo of WeatherData Program
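The core logic of the maximum-temperature problem can be sketched without Hadoop: the map step extracts a (year, temperature) pair from each record, and the reduce step keeps the largest temperature per year. The record layout used here ("year,temperature") and the sample values are made up for the example and are not the layout of the NOAA files linked above; the class name is hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory version of the maximum-temperature logic.
final class MaxTemperatureSketch {
    // Parses "year,temperature" records and keeps the maximum per year.
    static Map<String, Double> maxPerYear(List<String> records) {
        Map<String, Double> max = new TreeMap<>();
        for (String record : records) {
            String[] parts = record.split(",");
            String year = parts[0].trim();                      // map output key
            double temp = Double.parseDouble(parts[1].trim());  // map output value
            max.merge(year, temp, Double::max);                 // reduce: running maximum
        }
        return max;
    }

    public static void main(String[] args) {
        // prints {1949=31.1, 1950=22.0}
        System.out.println(maxPerYear(List.of("1949,28.9", "1949,31.1", "1950,22.0", "1950,-1.1")));
    }
}
```

Taking a maximum is commutative and associative, so in a real job the same logic could serve as both combiner and reducer.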
Assignment
• Write MapReduce code for WordCount on your own and run it on the Edureka VM
• Download all the MapReduce codes from the LMS, import them into your Eclipse IDE, and execute them
• Try the Maximum Temperature problem in MapReduce
• Try the Hot and Cold Day problem in MapReduce
Pre-work
• Watch the video “Running MapReduce Program” under Module-3 of your LMS
• Attempt the Word Count, Patents, & Alphabets assignments using the items present in the LMS under the tab Module 3
• Review the Interview Questions for MapReduce
http://www.edureka.in/blog/hadoop-interview-questions-mapreduce/
• Review the Next Generation MapReduce (MRv2 or YARN)
http://www.edureka.in/blog/apache-hadoop-2-0-and-yarn/
http://www.edureka.in/blog/hadoop-2-0-setting-up-a-single-node-cluster-in-15-minutes/
• Set up the CDH4 Hadoop development environment using the documents present in the LMS
http://blog.cloudera.com/blog/2013/08/how-to-use-eclipse-with-mapreduce-in-clouderas-quickstart-vm/
Agenda for Next Class
• Map and Reduce Side Join
• Counters
• DistributedCache
• Custom Input Format
• Sequence Input Format
• MRUnit
Survey
Your feedback is important to us, be it a compliment, a suggestion or a complaint. It helps us to make
the course better!
Please spare a few minutes to take the survey after the webinar.
Linux Mint Tutorial | Edureka
 
How to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| EdurekaHow to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| Edureka
 
Importance of Digital Marketing | Edureka
Importance of Digital Marketing | EdurekaImportance of Digital Marketing | Edureka
Importance of Digital Marketing | Edureka
 
RPA in 2020 | Edureka
RPA in 2020 | EdurekaRPA in 2020 | Edureka
RPA in 2020 | Edureka
 
Email Notifications in Jenkins | Edureka
Email Notifications in Jenkins | EdurekaEmail Notifications in Jenkins | Edureka
Email Notifications in Jenkins | Edureka
 
EA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | EdurekaEA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | Edureka
 
Cognitive AI Tutorial | Edureka
Cognitive AI Tutorial | EdurekaCognitive AI Tutorial | Edureka
Cognitive AI Tutorial | Edureka
 
AWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | EdurekaAWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | Edureka
 
Blue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | EdurekaBlue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | Edureka
 
Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka
 
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | EdurekaA star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
 
Kubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | EdurekaKubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | Edureka
 
Introduction to DevOps | Edureka
Introduction to DevOps | EdurekaIntroduction to DevOps | Edureka
Introduction to DevOps | Edureka
 

Recently uploaded

SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetEnjoy Anytime
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 

Recently uploaded (20)

SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 

Hadoop MapReduce Framework

  • 2. Objectives (www.edureka.co/big-data-and-hadoop). At the end of this module, you will be able to: analyze different use cases where MapReduce is used; differentiate between the traditional way and the MapReduce way; learn about the Hadoop 2.x MapReduce architecture and components; understand the execution flow of a YARN MapReduce application; implement basic MapReduce concepts; run a MapReduce program; understand the Input Splits concept in MapReduce; understand the MapReduce job submission flow; implement a Combiner and a Partitioner in MapReduce.
  • 3. Let’s Revise: Hadoop Cluster Configuration; Data Loading Techniques; Hadoop Cluster Modes. Configuration files: Core (core-site.xml), HDFS (hdfs-site.xml), YARN (yarn-site.xml), MapReduce (mapred-site.xml).
  • 4. Let’s Revise. Data Loading into HDFS: using Flume, using Sqoop, using Hadoop copy commands. Data Analysis: using Pig, using Hive. Questions on Sqoop and Flume.
  • 5. Annie’s Question: Secondary NameNode is a hot backup for NameNode: » TRUE » FALSE
  • 6. Annie’s Answer: FALSE. The Secondary NameNode (SNN) is the most misunderstood component of the HDFS architecture. The SNN is not a hot backup for the NameNode; it enables the checkpoint mechanism in a Hadoop cluster.
  • 7. Where Is MapReduce Used? Weather Forecasting, problem statement: » Finding the maximum temperature recorded in a year. HealthCare, problem statement: » De-identify personal health information.
  • 8. The Traditional Way: Very Big Data is divided into Split Data pieces; a grep runs over each split and produces matches; cat then concatenates them into All matches.
  • 9. MapReduce Way (MapReduce Framework): Very Big Data is divided into Split Data pieces; MAP processes each split and REDUCE combines the results into All matches.
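The split, grep, cat picture on slides 8 and 9 can be sketched in plain Java without any Hadoop dependency. This is only an illustration under stated assumptions: the class name SplitGrepSketch, the fixed split size and the sample lines are hypothetical, and a parallel stream stands in for the cluster.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration class; not part of the course material.
class SplitGrepSketch {
    // Cut the input into fixed-size pieces (a stand-in for "Split Data").
    static List<List<String>> split(List<String> lines, int splitSize) {
        List<List<String>> splits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += splitSize) {
            splits.add(lines.subList(i, Math.min(i + splitSize, lines.size())));
        }
        return splits;
    }

    // "Map" side: grep one split independently of all the others.
    static List<String> grep(List<String> split, String pattern) {
        return split.stream()
                .filter(line -> line.contains(pattern))
                .collect(Collectors.toList());
    }

    // "Reduce" side: concatenate the per-split matches, as cat does.
    static List<String> grepAll(List<String> lines, int splitSize, String pattern) {
        return split(lines, splitSize).parallelStream()
                .map(s -> grep(s, pattern))
                .flatMap(List::stream)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList(
                "error: disk", "ok", "error: net", "ok", "error: cpu");
        // prints [error: disk, error: net, error: cpu]
        System.out.println(grepAll(data, 2, "error"));
    }
}
```

Each split is searched without knowledge of the others, which is what lets the work run in parallel; only the final concatenation sees all matches.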
  • 10. Why MapReduce? Two biggest advantages: » Taking processing to the data » Processing data in parallel. (Diagram: Map Tasks a, b and c run against HDFS Blocks on Nodes within the Racks of a Data Center.)
  • 11. Solving the Problem with MapReduce: take a DB dump in CSV format and copy it to HDFS (Sqoop); store the CSV file in HDFS; read the CSV file from HDFS; apply the Map logic, then the Reduce logic, to produce the matches.
  • 12. Hadoop 2.x MapReduce Architecture. Components: Client, Resource Manager, Job History Server, Namenode, and Datanode1 and Datanode2, each running a Node Manager with Containers (Application Master, Map Task, Reduce Task). Message flows: Job Submission, Resource Request, Node Status, MapReduce Status.
  • 13. Hadoop 2.x MapReduce Components. Client: submits a MapReduce job. Resource Manager: cluster-level resource manager; long life, high-quality hardware. Node Manager: one per Data Node; monitors resources on the Data Node. Job History Server: maintains information about submitted MapReduce jobs after their ApplicationMaster terminates. ApplicationMaster: one per application; short life; coordinates and manages MapReduce jobs; negotiates with the Resource Manager to schedule tasks; the tasks are started by NodeManager(s). Container: created by the NM when requested; allocates a certain amount of resources (memory, CPU, etc.) on a slave node.
  • 15. YARN MR Application Execution Flow. MapReduce job execution: » Job Submission » Job Initialization » Task Assignment » Memory Assignment » Status Updates » Failure Recovery
  • 16. YARN MR Application Execution Flow. Participants: Application and Job Object in the Client JVM (Client), Resource Manager (Management Node), HDFS. Steps: 1. Run Job; 2. Get New Application; 3. Copy Job Resources; 4. Submit Application.
  • 17. YARN MR Application Execution Flow. Added participants: Node Manager and MRAppMaster (Data Node). Steps: 1. Run Job; 2. Get New Application; 3. Copy Job Resources; 4. Submit Application; 5. Start MR AppMaster container; 6. Create container; 7. Get Input Splits; 8. Request Resources; 9. Start container.
  • 18. YARN MR Application Execution Flow. Added participants: YarnChild in the Task JVM, running the Map/Reduce Task. Steps: 1. Run Job; 2. Get New Application; 3. Copy Job Resources; 4. Submit Application; 5. Start MR AppMaster container; 6. Create container; 7. Get Input Splits; 8. Request Resources; 9. Start container; 10. Create Container; 11. Acquire Job Resources; 12. Execute.
  • 19. YARN MR Application Execution Flow. Status reporting among the same participants (Client, Resource Manager, Node Manager, MRAppMaster, YarnChild): Poll for Status; Update Status.
  • 20. Hadoop 2.x: YARN Workflow. The Resource Manager contains the Scheduler and the Applications Manager (AsM). Node Managers host App Master1 with Container1.1 and Container1.2, and App Master2 with Container2.1, Container2.2 and Container2.3.
  • 21. Annie’s Question: YARN was developed to overcome which of the following disadvantages of the Hadoop 1.0 MapReduce framework? » Single point of failure of the NameNode » Only one version can be run in classic MapReduce » Too much burden on the JobTracker
  • 22. Annie’s Answer: Too much burden on the JobTracker.
  • 23. Annie’s Question: In YARN, the functionality of the JobTracker has been replaced by which of the following YARN features? » Job Scheduling » Task Monitoring » Resource Management » Node Management
  • 24. Annie’s Answer: Task Monitoring and Resource Management. The fundamental idea of YARN is to split the two major functions of the JobTracker, resource management and job scheduling/monitoring, into separate daemons: a global ResourceManager (RM) for resources and a per-application ApplicationMaster (AM) for task monitoring.
  • 25. Annie’s Question: In YARN, which of the following daemons takes care of the containers and the resource utilization by the applications? » Node Manager » JobTracker » TaskTracker » ApplicationMaster
  • 27. Annie’s Question: Can we run MRv1 jobs in a YARN-enabled Hadoop cluster? » Yes » No
  • 28. Annie’s Answer: Yes. MapReduce on YARN ensures full binary compatibility. Existing applications can run on YARN directly without recompilation.
  • 29. MapReduce Paradigm: The Overall MapReduce Word Count Process. Input (K1,V1): Deer Bear River / Car Car River / Deer Car Bear. Splitting: one line per split. Mapping, List(K2,V2): (Deer,1) (Bear,1) (River,1); (Car,1) (Car,1) (River,1); (Deer,1) (Car,1) (Bear,1). Shuffling, K2,List(V2): Bear,(1,1); Car,(1,1,1); Deer,(1,1); River,(1,1). Reducing, List(K3,V3): Bear,2; Car,3; Deer,2; River,2. Final result: Bear, 2; Car, 3; Deer, 2; River, 2.
  • 30. Anatomy of a MapReduce Program (key-value pairs). Map: (K1, V1) → List(K2, V2). Reduce: (K2, List(V2)) → List(K3, V3).
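The Map and Reduce signatures above, applied to the word count example, can be sketched in plain Java. This is a single-process illustration and not the Hadoop API: the class name WordCountSketch and its method names are hypothetical, and using the line's character offset as K1 is an assumption made for the sketch.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; a real job would use the Hadoop Mapper/Reducer classes.
class WordCountSketch {
    // Map: (K1 = line offset, V1 = line text) -> List(K2 = word, V2 = 1)
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Shuffle: group the values by key, sorted by key -> (K2, List(V2))
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce: (K2, List(V2)) -> V3 = total count for the word K3 = K2
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Drives the three phases over all input lines.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        long offset = 0;
        for (String line : lines) {
            mapped.addAll(map(offset, line));
            offset += line.length() + 1; // +1 for the line terminator
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(mapped).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {Bear=2, Car=3, Deer=2, River=2}
        System.out.println(run(Arrays.asList("Deer Bear River", "Car Car River", "Deer Car Bear")));
    }
}
```

The sorted TreeMap mirrors the fact that each reducer sees its keys in sorted order after the shuffle.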
  • 31. Demo of WordCount Program
  • 32. Annie’s Question: Input to the mapper is in the form of? » A flat file » (key, value) pair » Only string » All of the above
  • 33. Annie’s Answer: A Mapper accepts a (key, value) pair as input.
  • 35. Relation Between Input Splits and HDFS Blocks (diagram: a file of lines 1 to 11 laid over block boundaries and grouped into three splits). Logical records do not fit neatly into the HDFS blocks. Logical records are lines, and lines can cross block boundaries. The first split contains line 5 although that line spans two blocks.
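How a reader can honour line boundaries when splits are cut at arbitrary byte offsets can be sketched as follows. This is a simplified rule chosen for illustration (each line belongs to the split in which its first byte falls), not Hadoop's actual record reader; the class name InputSplitSketch and the offsets used are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; simplified compared with a real record reader.
class InputSplitSketch {
    // Returns the lines owned by the byte range [start, end) of the file.
    static List<String> readSplit(byte[] data, int start, int end) {
        int pos = start;
        // If the split begins in the middle of a line, that line belongs to the
        // previous split, so skip ahead to the start of the next line.
        if (start > 0 && data[start - 1] != '\n') {
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // step over the newline itself
        }
        List<String> lines = new ArrayList<>();
        // Read every line that starts inside the split, even if it ends beyond it.
        while (pos < end && pos < data.length) {
            int lineEnd = pos;
            while (lineEnd < data.length && data[lineEnd] != '\n') {
                lineEnd++;
            }
            lines.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
            pos = lineEnd + 1;
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] file = "line1\nline2\nline3\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(readSplit(file, 0, 8));   // prints [line1, line2]
        System.out.println(readSplit(file, 8, 16));  // prints [line3]
        System.out.println(readSplit(file, 16, 18)); // prints []
    }
}
```

Here "line2" crosses the boundary at byte 8 yet is read whole by the first split, while the second split skips the tail it lands in, so every line is processed exactly once.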
  • 36. MapReduce Job Submission Flow (INPUT DATA, Node 1, Node 2): Input data is distributed to nodes.
  • 37. MapReduce Job Submission Flow (a Map task on Node 1 and on Node 2): Input data is distributed to nodes. Each map task works on a “split” of data.
  • 38. MapReduce Job Submission Flow: Input data is distributed to nodes. Each map task works on a “split” of data. The mapper outputs intermediate data.
  • 39. MapReduce Job Submission Flow: Input data is distributed to nodes. Each map task works on a “split” of data. The mapper outputs intermediate data. The reducer copies all the data it is responsible for once it has identified the respective map tasks through the Application Master.
  • 40. MapReduce Job Submission Flow (Reduce tasks added on Node 1 and Node 2): Input data is distributed to nodes. Each map task works on a “split” of data. The mapper outputs intermediate data. The reducer copies all the data it is responsible for once it has identified the respective map tasks through the Application Master. The shuffle process sorts and merges the data for each key.
  • 41. MapReduce Job Submission Flow: Input data is distributed to nodes. Each map task works on a “split” of data. The mapper outputs intermediate data. The reducer copies all the data it is responsible for once it has identified the respective map tasks through the Application Master. The shuffle process sorts and merges the data for each key. The reducer output is stored.
  • 42. Annie’s Question: Does the MapReduce programming model provide a way for reducers to communicate with each other? » Yes, reducers running on the same machine can communicate with each other through shared memory » No, each reducer runs independently and in isolation.
  • 43. Annie’s Answer: No, reducers run independently and in isolation. Individual tasks do not know the input source. Reducer tasks rely on the Hadoop framework to deliver the appropriate input for processing.
  • 44. Annie’s Question: Who specifies the input split information? » Randomly, decided by the NameNode » Randomly, decided by the JobTracker » Line by line, decided by the Input Splitter » We have to specify it explicitly
  • 45. Annie’s Answer: The client has to submit the input split information by specifying the start and end points in the InputFormat configuration.
  • 46. Overview of MapReduce: a complete view of MapReduce, illustrating Combiners and Partitioners in addition to Mappers and Reducers. Combiners can be viewed as ‘mini-reducers’ in the Map phase. Partitioners determine which reducer is responsible for a particular key.
  • 47. Combiner: Local Reduce. Combiners are mini-reducers: they perform a “local reduce” before the mapper results are distributed, and then pass the workload on to the Reducers.
  • 48. Combiner. Block1, Mapper input: B C D E D B; Mapper output: (B,1) (C,1) (D,1) (E,1) (D,1) (B,1); Combiner output: (B,2) (C,1) (D,2) (E,1). Block2, Mapper input: D A A C B D; Mapper output: (D,1) (A,1) (A,1) (C,1) (B,1) (D,1); Combiner output: (D,2) (A,2) (C,1) (B,1). Shuffle: (A, [2]) (B, [2,1]) (C, [1,1]) (D, [2,2]) (E, [1]). Reducer output: (A,2) (B,3) (C,2) (D,4) (E,1).
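The numbers on this slide can be reproduced with a small plain-Java sketch of a combiner as a per-mapper local reduce. The class name CombinerSketch and its methods are hypothetical and no Hadoop classes are involved; the two input lists are the two blocks from the slide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not the Hadoop combiner API.
class CombinerSketch {
    // Combiner: sum the (key, 1) pairs emitted by ONE mapper before they leave the node.
    static Map<String, Integer> combine(List<String> mapperKeys) {
        Map<String, Integer> local = new TreeMap<>();
        for (String key : mapperKeys) {
            local.merge(key, 1, Integer::sum);
        }
        return local;
    }

    // Shuffle and reduce: merge the partial counts coming from all combiners.
    static Map<String, Integer> reduce(List<Map<String, Integer>> combinerOutputs) {
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> partial : combinerOutputs) {
            partial.forEach((key, count) -> total.merge(key, count, Integer::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> block1 = combine(Arrays.asList("B", "C", "D", "E", "D", "B"));
        Map<String, Integer> block2 = combine(Arrays.asList("D", "A", "A", "C", "B", "D"));
        System.out.println(block1); // prints {B=2, C=1, D=2, E=1}
        System.out.println(block2); // prints {A=2, B=1, C=1, D=2}

        List<Map<String, Integer>> partials = new ArrayList<>();
        partials.add(block1);
        partials.add(block2);
        System.out.println(reduce(partials)); // prints {A=2, B=3, C=2, D=4, E=1}
    }
}
```

Twelve mapper records shrink to eight combiner records before the shuffle, which is the saving the combiner is there to provide.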
  • 49. Annie’s Question: Combiner works at? » Mapper Level » Partitioner Level » Reducer Level » All of the above
  • 50. Annie’s Answer: Mapper level, as the Combiner works on the output data from the Mapper.
  • 51. Annie’s Question: Combiner can be considered as: » Semi Partitioner » Semi Reducer » Semi Shuffler » Major Reducer
  • 52. Annie’s Answer: Semi Reducer. The Combiner works on the Mapper output and lessens the burden on the Reducer.
  • 53. Partitioner: Redirecting Output from Mapper. The output of each Map task passes through a Partitioner, which routes each key to one of the Reducers.
  • 54. Demo: Combiner and Partitioner (MR code)
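A hash-based partitioner can be sketched in plain Java as below. The formula is the one used by Hadoop's default HashPartitioner; everything else is an assumption for illustration: the class name PartitionerSketch is hypothetical, and java.lang.String keys stand in for Hadoop's Text keys, whose hash codes differ.

```java
// Hypothetical illustration class; keys are plain Java Strings, not Hadoop Text.
class PartitionerSketch {
    // Masking with Integer.MAX_VALUE clears the sign bit, so the result of the
    // modulo is always a valid reducer index in [0, numReducers).
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        int numReducers = 3;
        for (String key : new String[] {"A", "B", "C", "D", "A"}) {
            // The same key always lands on the same reducer.
            System.out.println(key + " -> reducer " + partition(key, numReducers));
        }
    }
}
```

Because the partition depends only on the key, all values for one key meet at a single reducer regardless of which mapper emitted them.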
  • 55. Annie’s Question: Can we use the same logic for the combiner and the reducer? » No, they are separate entities. » Yes, but only if the reducer and combiner logic are commutative and associative and both use the same data types.
  • 56. Annie’s Answer: Yes, you can use the same logic if the Reducer and Combiner logic are both commutative and associative and both use the same data types.
  • 57. Annie’s Question: Can we change the format of the output key class and output value class? » TRUE » FALSE
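Why the commutative and associative condition matters can be seen by comparing a sum with a mean. The sketch below is an illustration with made-up values: the class name CombinerSafetySketch and the per-mapper arrays are hypothetical.

```java
// Hypothetical illustration class with made-up per-mapper values.
class CombinerSafetySketch {
    static double sum(double[] values) {
        double total = 0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    static double mean(double[] values) {
        return sum(values) / values.length;
    }

    // Reusing "sum" as the combiner: the sum of per-mapper sums is still the overall sum.
    static double sumWithCombiner(double[][] perMapper) {
        double[] partials = new double[perMapper.length];
        for (int i = 0; i < perMapper.length; i++) {
            partials[i] = sum(perMapper[i]);
        }
        return sum(partials);
    }

    // Reusing "mean" as the combiner: the mean of per-mapper means is NOT the overall mean.
    static double meanWithCombiner(double[][] perMapper) {
        double[] partials = new double[perMapper.length];
        for (int i = 0; i < perMapper.length; i++) {
            partials[i] = mean(perMapper[i]);
        }
        return mean(partials);
    }

    public static void main(String[] args) {
        double[][] perMapper = { {1, 2, 3}, {10} };
        System.out.println(sumWithCombiner(perMapper));      // prints 16.0, the correct total
        System.out.println(meanWithCombiner(perMapper));     // prints 6.0
        System.out.println(mean(new double[] {1, 2, 3, 10})); // prints 4.0, the true mean
    }
}
```

A mean can still be computed with a combiner if the combiner emits (sum, count) pairs and the reducer performs the final division, but then the combiner and reducer logic are no longer identical.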
  • 60. Revisit De-identification Architecture: take a DB dump in CSV format and ingest it into HDFS (Sqoop); read the CSV file from HDFS; Map Task 1, Map Task 2, ... de-identify columns based on configurations; Reduce Task 1, Reduce Task 2, ...; store the de-identified CSV file in HDFS.
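  • 61. DeIdentify MapReduce Code (fragment; assumes imports of javax.crypto.Cipher, javax.crypto.spec.SecretKeySpec and org.apache.commons.codec.binary.Base64, plus a logger field defined elsewhere in the class):
        public static String encrypt(String strToEncrypt, byte[] key) {
            try {
                // AES in ECB mode with PKCS5 padding, as on the slide
                Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
                SecretKeySpec secretKey = new SecretKeySpec(key, "AES");
                cipher.init(Cipher.ENCRYPT_MODE, secretKey);
                // Base64-encode the ciphertext so it can be stored as text in the CSV
                String encryptedString = Base64.encodeBase64String(cipher.doFinal(strToEncrypt.getBytes()));
                return encryptedString.trim();
            } catch (Exception e) {
                logger.error("Error while encrypting", e);
            }
            return null; // reached only when encryption failed
        }
    } // closes the enclosing class, which is not shown on the slide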
  • 62. Demo of DeIdentify Program
  • 64. Demo of WeatherData Program
  • 65. Assignment: write MapReduce code for WordCount on your own and run it on the Edureka VM; download all the MapReduce code from the LMS, import it into your Eclipse IDE and execute it; try the Maximum Temperature problem in MapReduce; try the Hot and Cold Day problem in MapReduce.
  • 66. Pre-work: watch the video “Running MapReduce Program” under Module-3 of your LMS; attempt the Word Count, Patents, and Alphabets assignments using the items present in the LMS under the Module 3 tab; review the interview questions for MapReduce (http://www.edureka.in/blog/hadoop-interview-questions-mapreduce/); review the Next Generation MapReduce, MRv2 or YARN (http://www.edureka.in/blog/apache-hadoop-2-0-and-yarn/ and http://www.edureka.in/blog/hadoop-2-0-setting-up-a-single-node-cluster-in-15-minutes/); set up the CDH4 Hadoop development environment using the documents present in the LMS (http://blog.cloudera.com/blog/2013/08/how-to-use-eclipse-with-mapreduce-in-clouderas-quickstart-vm/).
  • 67. Agenda for Next Class: Map and Reduce Side Join; Counters; DistributedCache; Custom Input Format; Sequence Input Format; MRUnit
  • 68. Survey: Your feedback is important to us, be it a compliment, a suggestion or a complaint. It helps us make the course better! Please spare a few minutes to take the survey after the webinar.