The document discusses Hadoop MapReduce and YARN. It provides an overview of MapReduce concepts like parallel processing and data locality. An example of vote counting is used to illustrate the MapReduce approach. Key components of MapReduce like Map, Reduce, and YARN are explained. The YARN application workflow is described through 8 steps from client submission to application completion. Hands-on MapReduce programming and learning resources are also mentioned.
www.edureka.co/big-data-and-hadoop EDUREKA HADOOP CERTIFICATION TRAINING
Agenda for today’s Session
1. What is Hadoop MapReduce?
2. MapReduce in a Nutshell
3. Advantages of MapReduce
4. Hadoop MapReduce Approach with an Example
5. Hadoop MapReduce/YARN Components
6. YARN With MapReduce
7. YARN Application Workflow
8. MapReduce Program with Hands On
MapReduce: Data Processing Using Programming
Hadoop MapReduce is the processing component of Apache Hadoop
It processes data in parallel in a distributed environment
Diagram: Big Data, Result
MapReduce in a Nutshell
Features
» Large Scale Distributed Model
» Parallel Programming
» A Programming Model
Used in
» Classification
» Analytics
» Recommendation
» Index and Search
Function
» Map
» Reduce
Design Pattern
» Classification, e.g. Top N records
» Analytics, e.g. Join, Selection
» Recommendation, e.g. Sort
» Summarization, e.g. Inverted Index
Implemented
» Google
» Apache Hadoop
For
» HDFS
» Pig
» Hive
» HBase
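The Map and Reduce functions and the Inverted Index pattern named on this slide can be sketched in plain Python. This is a minimal in-memory illustration, not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `inverted_index`) and the sample documents are assumptions made for this example, and the `shuffle` step stands in for the grouping that the framework would normally perform between the two phases.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map: emit one (word, doc_id) pair per word in the document."""
    for word in text.lower().split():
        yield word, doc_id


def shuffle(pairs):
    """Shuffle: group all emitted values by key (done by the framework in Hadoop)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, doc_ids):
    """Reduce: collapse the document ids for one word into a sorted, unique list."""
    return word, sorted(set(doc_ids))


def inverted_index(documents):
    """Run map, shuffle and reduce over a dict of doc_id -> text."""
    pairs = (pair for doc_id, text in documents.items()
             for pair in map_phase(doc_id, text))
    return dict(reduce_phase(word, ids) for word, ids in shuffle(pairs).items())


# Hypothetical sample documents, made up for this illustration.
docs = {
    "d1": "hadoop stores data",
    "d2": "mapreduce processes data",
    "d3": "hadoop runs mapreduce",
}
index = inverted_index(docs)
```

Each word ends up mapped to the documents that contain it, which is the lookup structure behind the "Index and Search" use case.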
Advantage 2: Data Locality - Processing to Storage
Moving data to the processing is very costly
In MapReduce, we move the processing to the data
Diagram: Master, Slave A, Slave B, Slave C, Slave D, Slave E, Data
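The idea of moving processing to the data can be sketched as a small task-assignment routine. This is a toy greedy model under stated assumptions, not the Hadoop scheduler: the function `assign_tasks`, the block ids and the slot counts are all hypothetical, and a real scheduler weighs more factors than this.

```python
def assign_tasks(block_locations, free_slots):
    """Assign each block's task to a node that already stores the block.

    block_locations: block id -> list of nodes holding a copy of the block
    free_slots: node -> number of tasks the node can still accept
    Returns: block id -> (node, is_local)
    """
    slots = dict(free_slots)  # copy so the caller's dict is left untouched
    plan = {}
    for block, holders in block_locations.items():
        # Prefer a node that holds the data: the processing moves to the data.
        local = [node for node in holders if slots.get(node, 0) > 0]
        if local:
            node, is_local = local[0], True
        else:
            # Fall back to any node with a free slot: the data has to be moved.
            candidates = [node for node, free in slots.items() if free > 0]
            if not candidates:
                raise RuntimeError("no free slots left in the cluster")
            node, is_local = candidates[0], False
        slots[node] -= 1
        plan[block] = (node, is_local)
    return plan


# Hypothetical block placement across the slaves from the slide.
blocks = {
    "b1": ["Slave A", "Slave B"],
    "b2": ["Slave A", "Slave C"],
    "b3": ["Slave A"],
}
slots = {"Slave A": 1, "Slave B": 1, "Slave C": 1, "Slave D": 1, "Slave E": 1}
plan = assign_tasks(blocks, slots)
```

In this run two of the three tasks execute where their block is stored; only the third needs its data copied to another node because Slave A is already busy.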
Election Votes Counting
Election Votes Casting
Votes are stored at different booths
The Result Centre has the details of all the booths
Diagram: Booth A, Booth B, Booth C, Booth D, Booth E, Result Centre, Data
Election Votes Counting – Traditional Way
Counting – Traditional Approach
Votes are moved to the Result Centre for counting
Moving all the votes to the Centre is costly
The Result Centre is over-burdened
Counting takes time
Diagram: Booth A, Booth B, Booth C, Booth D, Booth E, Result Centre, Data
Election Votes Counting – MapReduce Way
Counting – MapReduce Approach
Votes are counted at the individual booths
Booth-wise results are sent back to the Result Centre
The final result is declared easily and quickly this way
Diagram: Booth A, Booth B, Booth C, Booth D, Booth E, Result Centre, Votes
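The booth-wise counting above can be sketched in a few lines of Python. The ballots and candidate names are made up for illustration, and `count_booth` and `merge_counts` are hypothetical helper names: the first plays the role of the map step run at each booth, the second the reduce step run at the Result Centre.

```python
from collections import Counter
from functools import reduce


def count_booth(votes):
    """Map step: count the votes locally at one booth."""
    return Counter(votes)


def merge_counts(total, booth_count):
    """Reduce step: add one booth's tally to the running total."""
    return total + booth_count


# Hypothetical ballots per booth; the candidate names X, Y, Z are made up.
booths = {
    "Booth A": ["X", "Y", "X"],
    "Booth B": ["Y", "Y"],
    "Booth C": ["X", "X"],
    "Booth D": ["Z", "X"],
    "Booth E": ["Y", "Z"],
}

# Only the small per-booth tallies travel to the Result Centre, not the ballots.
booth_results = [count_booth(votes) for votes in booths.values()]
final_result = reduce(merge_counts, booth_results, Counter())
winner = final_result.most_common(1)[0]
```

The Result Centre only ever sees five small tallies, which is why the final merge is cheap compared with shipping every ballot to one place.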
Hadoop 2.x MapReduce YARN Components
ApplicationMaster
» One per application
» Short life
» Coordinates and Manages MapReduce Jobs
» Negotiates with Resource Manager to schedule tasks
» The tasks are started by NodeManager(s)
Job History Server
» Maintains information about submitted MapReduce jobs after their ApplicationMaster terminates
Client
» Submits a MapReduce Job
Resource Manager
» Cluster Level resource manager
» Long Life, High Quality Hardware
Node Manager
» One per Data Node
» Monitors resources on Data Node
Container
» Created by NM when requested
» Allocates a certain amount of resources (memory, CPU, etc.) on a slave node
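How a container reserves memory and CPU on a node can be sketched with two small classes. This is a toy model, not the YARN API: the class and field names mirror the components on the slide but are otherwise assumptions, and the first-fit placement rule is a simplification chosen for brevity rather than the policy of any real YARN scheduler.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Tracks the free resources on one data node."""
    name: str
    free_memory_mb: int
    free_vcores: int
    containers: list = field(default_factory=list)

    def launch(self, container_id, memory_mb, vcores):
        # Reserve the resources and record the running container.
        self.free_memory_mb -= memory_mb
        self.free_vcores -= vcores
        self.containers.append(container_id)


@dataclass
class ResourceManager:
    """Cluster-level view over all NodeManagers."""
    nodes: list
    next_id: int = 1

    def allocate(self, memory_mb, vcores):
        # First-fit (an assumption): take the first node with enough memory and CPU.
        for node in self.nodes:
            if node.free_memory_mb >= memory_mb and node.free_vcores >= vcores:
                container_id = f"container_{self.next_id}"
                self.next_id += 1
                node.launch(container_id, memory_mb, vcores)
                return container_id, node.name
        return None  # no node can host the request in this toy model


# Hypothetical two-node cluster.
rm = ResourceManager(nodes=[NodeManager("node1", 2048, 2),
                            NodeManager("node2", 4096, 4)])
first = rm.allocate(1024, 1)
second = rm.allocate(2048, 2)
third = rm.allocate(4096, 4)
```

The third request returns `None` because neither node has 4096 MB and 4 cores left after the first two allocations.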
Application Workflow
RM: Resource Manager, NM: Node Manager, AM: ApplicationMaster
Execution Sequence:
1. Client submits an application
2. RM allocates a container to start AM
3. AM registers with RM
4. AM requests containers from RM
5. AM notifies NM to launch containers
6. Application code is executed in the container
7. Client contacts RM/AM to monitor the application's status
8. AM unregisters from RM
Diagram: Client, RM, NM, AM, with the steps numbered 1 to 8
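The eight-step sequence can be sketched as a small state tracker that rejects steps taken out of order. This is an illustration only and makes no YARN calls: the `Step` names and the `Application` class are hypothetical, and treating step 7 as strictly sequential is a simplification, since a client can poll for status while the application is still running.

```python
from enum import Enum


class Step(Enum):
    """The eight workflow steps, numbered as on the slide (names are made up)."""
    CLIENT_SUBMITS = 1
    RM_ALLOCATES_AM_CONTAINER = 2
    AM_REGISTERS = 3
    AM_REQUESTS_CONTAINERS = 4
    AM_NOTIFIES_NM = 5
    CODE_EXECUTES = 6
    CLIENT_MONITORS = 7
    AM_UNREGISTERS = 8


class Application:
    """Records the steps an application has gone through, in order."""

    def __init__(self):
        self.history = []

    def advance(self, step):
        # Each step is only valid directly after the previous one.
        expected = len(self.history) + 1
        if step.value != expected:
            raise ValueError(f"{step.name} is out of order; expected step {expected}")
        self.history.append(step)

    @property
    def finished(self):
        # The application is complete once the AM has unregistered.
        return bool(self.history) and self.history[-1] is Step.AM_UNREGISTERS


app = Application()
for step in Step:  # Enum members iterate in definition order
    app.advance(step)
```

Calling `advance` with a later step on a fresh `Application`, for example `Step.AM_REGISTERS`, raises `ValueError` because the AM cannot register before the RM has started it.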