SlideShare a Scribd company logo
1 of 6
Download to read offline
Lecture 05
Hadoop MapReduce 1.0
Hadoop MapReduce 1.0 version
Refer slide time :( 0:18)
So, MapReduce is an execution engine of Hadoop and we are going to briefly describe Hadoop 1.0, its
components, that is MapReduce which runs on HDFS. So, MapReduce is the programming paradigm, of
the Hadoop system, for big data computing and it also performs, in MapReduce version 1.0 the resource
management and the data processing, aspects. Also which runs over HDFS 1.0. So, we are going to cover
this Hadoop or a MapReduce 1.0 version, in a brief.
Refer slide time :( 01:04)
So, MapReduce has two different major components, one is called the,’ Job Tracker,’ the other one is
called the,’ Task Tracker’.
Refer slide time :( 01:15)
And the scenario of the job tracker is, you can see here, in this particular diagram. So, this is the job
tracker, which is a part of MapReduce version one. Now this particular job tracker runs, on the master or
this is the master node. So, this master node and this name node, is a part of HDFS, HDFS 1.0 version.
So, both of them may resides on the same machine or may not be in the same machine, but for the sake of
simplicity we assume that the name node and the job tracker resides, on the same node which is called a
‘Master’, over here in this particular scenario, since it is having a master and several slaves. So, hence this
is basically a client-server architecture, which is followed in MapReduce, version 1.0 and also in the
HDFS version 1.0. So, in this particular diagram, we can see here, the job tracker resides on a particular
node which is basically the, the master node. And on the same master node, let us assume that as DFS
name node is there. So, we are not going to refer in this part of the discussion why because, we are
focusing only on the MapReduce.
So, hence we are not going to discuss this name node part, of HDFS. Now another component, is called
the, ‘Task Tracker’, the task record may resides, on the same node also resides on other different slave
nodes. So, the job tracker and task tracker they, they run in the form of a client-server model. So, job
tracker is basically, running as a must is a server and the task records is run as its client. So, it's a client-
server model let us understand, more into the functionality, of job tracker. So, job tracker as, I have
already mentioned, is hosted inside the master and receives the job execution requests, from the client.
So, the so, the client or the application, which basically is nothing but a MapReduce, program when it is
submitted by the client, then the job tracker has to deal with that particular program execution. So, its
main duty is to break, down the receive, its main duties are to break down, the received job, that is a that
is the big data computation, specified in the form of MapReduce jobs. And this particular MapReduce, is
divided into the smaller chunks and that is the small parts and these small parts that is called, ‘Chunks’,
are allocated with the map function and map and reduced function and this particular partial,
contradictions are happening at this particular slave nodes with the help of the, task tracker. And this is
the, the entire unit of execution, of this particular job. So, let us see the more detail of the task tracker. So,
the task tracker is the MapReduce component, on the slave machine, as there are multiple slave machines,
as we have shown here five of them. So, many task records are available, in the cluster its duties to
perform the computations, which are assigned by the by that the job tracker, on the data which is
available on the slave machines. So, the task tracker will communicate, the progress and report the result
back to the job tracker, the master node contains the job tracker and the name node whereas all the slave
nodes contain the, the task tracker and data nodes. So, in this particular way, the job tracker keeps the
track, of the map and reduce jobs, which are being allocated at different nodes and which are executing
on, the data set which are assigned which are allocated, to these particular nodes, where the data is there
the, the map and reduced function or the configuration will be performed. So, it is a computation engine.
So, MapReduce is a computation engine, in version 1.0. so, not only it allocates, the MapReduce jobs to
different slave nodes, where the data also resides, in the form of a chunks and it will then connect and so,
basically not only it assigns but also it tracks, keep track ,of the progress and the resources, which is being
allocated. Okay? Okay? Discovery hmm physically in order the execution steps.
Refer slide time :( 08:04)
So, we are going to now trace all the execution steps, for the life cycle of a MapReduce job, till the
application is submitted, by the client and to the to the MapReduce and it finishes and we are going to
trace the, the execution cycle or the execution steps in the MapReduce, version 1.0. so, the first step is,
the client submits the job, to the job tracker for example here, we have to find out we have to 20,000
records, of the customer and we want to find out all the customers from Mumbai. So, that means the
query is basically to be executed, on these data set and this particular operation to find all the customer,
this is the query, which is to be performed, using the MapReduce program which is being submitted. So,
this particular request is being submitted to the job tracker and job tracker will ask the name note, about
the location of this particular data set. So, the job tracker will consult the name node, which is a part of
HDFS, to find out the location, of the data where it is being installed. So, now I say that this 20,000
records are divided like this, there are five different nodes, which stores all of them this is node number
one, two, three, four, and five. So, the records first four thousand records stored are installed on this
particular node number one and the next 4,000 is stored on the node number two and then next four
thousand node number three and furthermore node number 4 and 5 respectively stores the remaining
twenty thousand records.
Now this is called the chunks or the splits. The entire, two thousand twenty thousand record is splitted
and stored on four different, five different nodes and this information will be given from, this name node
back to the job tracker now the job tracker as per the reply by the name node the job tracker ask the
respective task tracker to execute the tasks on their data. So, therefore, this particular job tracker, now
assigns the, the, the map function or map and reduce function map function, on the stash tracker to
execute, on that data chunk. So, with this particular direction the task tracker will perform this execution,
at all places there in parallel. So, this particular execution of MapReduce program is done in parallel, on
all the chunks. So, after the computation, of a MapReduce so, all the results are stored on the same data
node, so whatever is the result it will be stored on the same data node and the name node is informed,
about these particular results. So, the task tracker informs the job tracker, about the completion and the
progress of this jobs assigned to those job tracker now the job tracker informs this particular completion,
to that particular client and the client contacts the name node and retrieve the result back. So, after
completing it this particular job tracker will inform, to the client about the completion of this particular
job, that means now the result of the query is now ready, which the client will be able to access, with the
help of name node and gets back the result. So, this is the entire execution engine, of which is, there in the
form of, a MapReduce version 1.0, that we have briefly explained. Thank you.

More Related Content

Similar to lec5_ref.pdf

Big data unit iv and v lecture notes qb model exam
Big data unit iv and v lecture notes   qb model examBig data unit iv and v lecture notes   qb model exam
Big data unit iv and v lecture notes qb model exam
Indhujeni
 
Hadoop eco system with mapreduce hive and pig
Hadoop eco system with mapreduce hive and pigHadoop eco system with mapreduce hive and pig
Hadoop eco system with mapreduce hive and pig
KhanKhaja1
 
Hadoop interview question
Hadoop interview questionHadoop interview question
Hadoop interview question
pappupassindia
 

Similar to lec5_ref.pdf (20)

lec10_ref.pdf
lec10_ref.pdflec10_ref.pdf
lec10_ref.pdf
 
Big data unit iv and v lecture notes qb model exam
Big data unit iv and v lecture notes   qb model examBig data unit iv and v lecture notes   qb model exam
Big data unit iv and v lecture notes qb model exam
 
Hadoop interview questions - Softwarequery.com
Hadoop interview questions - Softwarequery.comHadoop interview questions - Softwarequery.com
Hadoop interview questions - Softwarequery.com
 
Unit3 MapReduce
Unit3 MapReduceUnit3 MapReduce
Unit3 MapReduce
 
Unit 2
Unit 2Unit 2
Unit 2
 
MapReduce-Notes.pdf
MapReduce-Notes.pdfMapReduce-Notes.pdf
MapReduce-Notes.pdf
 
lec7_ref.pdf
lec7_ref.pdflec7_ref.pdf
lec7_ref.pdf
 
Hadoop eco system with mapreduce hive and pig
Hadoop eco system with mapreduce hive and pigHadoop eco system with mapreduce hive and pig
Hadoop eco system with mapreduce hive and pig
 
lec15_ref.pdf
lec15_ref.pdflec15_ref.pdf
lec15_ref.pdf
 
lec2_ref.pdf
lec2_ref.pdflec2_ref.pdf
lec2_ref.pdf
 
MapReduce.pptx
MapReduce.pptxMapReduce.pptx
MapReduce.pptx
 
BIG DATA ANALYTICS
BIG DATA ANALYTICSBIG DATA ANALYTICS
BIG DATA ANALYTICS
 
lec9_ref.pdf
lec9_ref.pdflec9_ref.pdf
lec9_ref.pdf
 
Hadoop interview question
Hadoop interview questionHadoop interview question
Hadoop interview question
 
Hadoop Interview Questions and Answers
Hadoop Interview Questions and AnswersHadoop Interview Questions and Answers
Hadoop Interview Questions and Answers
 
Hadoop Architecture
Hadoop ArchitectureHadoop Architecture
Hadoop Architecture
 
Hadoop 31-frequently-asked-interview-questions
Hadoop 31-frequently-asked-interview-questionsHadoop 31-frequently-asked-interview-questions
Hadoop 31-frequently-asked-interview-questions
 
MapReduce
MapReduceMapReduce
MapReduce
 
Hadoop interview questions
Hadoop interview questionsHadoop interview questions
Hadoop interview questions
 
Enhancing Performance and Fault Tolerance of Hadoop Cluster
Enhancing Performance and Fault Tolerance of Hadoop ClusterEnhancing Performance and Fault Tolerance of Hadoop Cluster
Enhancing Performance and Fault Tolerance of Hadoop Cluster
 

More from vishal choudhary

More from vishal choudhary (20)

SE-Lecture1.ppt
SE-Lecture1.pptSE-Lecture1.ppt
SE-Lecture1.ppt
 
SE-Testing.ppt
SE-Testing.pptSE-Testing.ppt
SE-Testing.ppt
 
SE-CyclomaticComplexityand Testing.ppt
SE-CyclomaticComplexityand Testing.pptSE-CyclomaticComplexityand Testing.ppt
SE-CyclomaticComplexityand Testing.ppt
 
SE-Lecture-7.pptx
SE-Lecture-7.pptxSE-Lecture-7.pptx
SE-Lecture-7.pptx
 
Se-Lecture-6.ppt
Se-Lecture-6.pptSe-Lecture-6.ppt
Se-Lecture-6.ppt
 
SE-Lecture-5.pptx
SE-Lecture-5.pptxSE-Lecture-5.pptx
SE-Lecture-5.pptx
 
XML.pptx
XML.pptxXML.pptx
XML.pptx
 
SE-Lecture-8.pptx
SE-Lecture-8.pptxSE-Lecture-8.pptx
SE-Lecture-8.pptx
 
SE-coupling and cohesion.ppt
SE-coupling and cohesion.pptSE-coupling and cohesion.ppt
SE-coupling and cohesion.ppt
 
SE-Lecture-2.pptx
SE-Lecture-2.pptxSE-Lecture-2.pptx
SE-Lecture-2.pptx
 
SE-software design.ppt
SE-software design.pptSE-software design.ppt
SE-software design.ppt
 
SE1.ppt
SE1.pptSE1.ppt
SE1.ppt
 
SE-Lecture-4.pptx
SE-Lecture-4.pptxSE-Lecture-4.pptx
SE-Lecture-4.pptx
 
SE-Lecture=3.pptx
SE-Lecture=3.pptxSE-Lecture=3.pptx
SE-Lecture=3.pptx
 
Multimedia-Lecture-Animation.pptx
Multimedia-Lecture-Animation.pptxMultimedia-Lecture-Animation.pptx
Multimedia-Lecture-Animation.pptx
 
MultimediaLecture5.pptx
MultimediaLecture5.pptxMultimediaLecture5.pptx
MultimediaLecture5.pptx
 
Multimedia-Lecture-7.pptx
Multimedia-Lecture-7.pptxMultimedia-Lecture-7.pptx
Multimedia-Lecture-7.pptx
 
MultiMedia-Lecture-4.pptx
MultiMedia-Lecture-4.pptxMultiMedia-Lecture-4.pptx
MultiMedia-Lecture-4.pptx
 
Multimedia-Lecture-6.pptx
Multimedia-Lecture-6.pptxMultimedia-Lecture-6.pptx
Multimedia-Lecture-6.pptx
 
Multimedia-Lecture-3.pptx
Multimedia-Lecture-3.pptxMultimedia-Lecture-3.pptx
Multimedia-Lecture-3.pptx
 

Recently uploaded

Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Christo Ananth
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
Tonystark477637
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Christo Ananth
 

Recently uploaded (20)

Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdf
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 

lec5_ref.pdf

  • 1. Lecture 05 Hadoop MapReduce 1.0 Hadoop MapReduce 1.0 version Refer slide time :( 0:18)
  • 2. So, MapReduce is an execution engine of Hadoop and we are going to briefly describe Hadoop 1.0, its components, that is MapReduce which runs on HDFS. So, MapReduce is the programming paradigm, of the Hadoop system, for big data computing and it also performs, in MapReduce version 1.0 the resource management and the data processing, aspects. Also which runs over HDFS 1.0. So, we are going to cover this Hadoop or a MapReduce 1.0 version, in a brief. Refer slide time :( 01:04)
  • 3. So, MapReduce has two different major components, one is called the,’ Job Tracker,’ the other one is called the,’ Task Tracker’. Refer slide time :( 01:15)
  • 4. And the scenario of the job tracker is, you can see here, in this particular diagram. So, this is the job tracker, which is a part of MapReduce version one. Now this particular job tracker runs, on the master or this is the master node. So, this master node and this name node, is a part of HDFS, HDFS 1.0 version. So, both of them may resides on the same machine or may not be in the same machine, but for the sake of simplicity we assume that the name node and the job tracker resides, on the same node which is called a ‘Master’, over here in this particular scenario, since it is having a master and several slaves. So, hence this is basically a client-server architecture, which is followed in MapReduce, version 1.0 and also in the HDFS version 1.0. So, in this particular diagram, we can see here, the job tracker resides on a particular node which is basically the, the master node. And on the same master node, let us assume that as DFS name node is there. So, we are not going to refer in this part of the discussion why because, we are focusing only on the MapReduce. So, hence we are not going to discuss this name node part, of HDFS. Now another component, is called the, ‘Task Tracker’, the task record may resides, on the same node also resides on other different slave nodes. So, the job tracker and task tracker they, they run in the form of a client-server model. So, job tracker is basically, running as a must is a server and the task records is run as its client. So, it's a client- server model let us understand, more into the functionality, of job tracker. So, job tracker as, I have already mentioned, is hosted inside the master and receives the job execution requests, from the client. So, the so, the client or the application, which basically is nothing but a MapReduce, program when it is submitted by the client, then the job tracker has to deal with that particular program execution. So, its main duty is to break, down the receive, its main duties are to break down, the received job, that is a that is the big data computation, specified in the form of MapReduce jobs. And this particular MapReduce, is divided into the smaller chunks and that is the small parts and these small parts that is called, ‘Chunks’, are allocated with the map function and map and reduced function and this particular partial, contradictions are happening at this particular slave nodes with the help of the, task tracker. And this is
  • 5. the, the entire unit of execution, of this particular job. So, let us see the more detail of the task tracker. So, the task tracker is the MapReduce component, on the slave machine, as there are multiple slave machines, as we have shown here five of them. So, many task records are available, in the cluster its duties to perform the computations, which are assigned by the by that the job tracker, on the data which is available on the slave machines. So, the task tracker will communicate, the progress and report the result back to the job tracker, the master node contains the job tracker and the name node whereas all the slave nodes contain the, the task tracker and data nodes. So, in this particular way, the job tracker keeps the track, of the map and reduce jobs, which are being allocated at different nodes and which are executing on, the data set which are assigned which are allocated, to these particular nodes, where the data is there the, the map and reduced function or the configuration will be performed. So, it is a computation engine. So, MapReduce is a computation engine, in version 1.0. so, not only it allocates, the MapReduce jobs to different slave nodes, where the data also resides, in the form of a chunks and it will then connect and so, basically not only it assigns but also it tracks, keep track ,of the progress and the resources, which is being allocated. Okay? Okay? Discovery hmm physically in order the execution steps. Refer slide time :( 08:04)
  • 6. So, we are going to now trace all the execution steps, for the life cycle of a MapReduce job, till the application is submitted, by the client and to the to the MapReduce and it finishes and we are going to trace the, the execution cycle or the execution steps in the MapReduce, version 1.0. so, the first step is, the client submits the job, to the job tracker for example here, we have to find out we have to 20,000 records, of the customer and we want to find out all the customers from Mumbai. So, that means the query is basically to be executed, on these data set and this particular operation to find all the customer, this is the query, which is to be performed, using the MapReduce program which is being submitted. So, this particular request is being submitted to the job tracker and job tracker will ask the name note, about the location of this particular data set. So, the job tracker will consult the name node, which is a part of HDFS, to find out the location, of the data where it is being installed. So, now I say that this 20,000 records are divided like this, there are five different nodes, which stores all of them this is node number one, two, three, four, and five. So, the records first four thousand records stored are installed on this particular node number one and the next 4,000 is stored on the node number two and then next four thousand node number three and furthermore node number 4 and 5 respectively stores the remaining twenty thousand records. Now this is called the chunks or the splits. The entire, two thousand twenty thousand record is splitted and stored on four different, five different nodes and this information will be given from, this name node back to the job tracker now the job tracker as per the reply by the name node the job tracker ask the respective task tracker to execute the tasks on their data. So, therefore, this particular job tracker, now assigns the, the, the map function or map and reduce function map function, on the stash tracker to execute, on that data chunk. So, with this particular direction the task tracker will perform this execution, at all places there in parallel. So, this particular execution of MapReduce program is done in parallel, on all the chunks. So, after the computation, of a MapReduce so, all the results are stored on the same data node, so whatever is the result it will be stored on the same data node and the name node is informed, about these particular results. So, the task tracker informs the job tracker, about the completion and the progress of this jobs assigned to those job tracker now the job tracker informs this particular completion, to that particular client and the client contacts the name node and retrieve the result back. So, after completing it this particular job tracker will inform, to the client about the completion of this particular job, that means now the result of the query is now ready, which the client will be able to access, with the help of name node and gets back the result. So, this is the entire execution engine, of which is, there in the form of, a MapReduce version 1.0, that we have briefly explained. Thank you.