2. WHAT IS MAPREDUCE
MapReduce is the heart of Apache® Hadoop®. MapReduce is a
programming paradigm that runs in the background of Hadoop to provide
scalability and simple data-processing solutions. In general terms,
MapReduce is a framework for embarrassingly parallel computations that
use potentially large data sets and a large number of nodes. Ideally, it also
uses data that is stored locally on the particular node where a task is being
executed. The computations are embarrassingly parallel because there is
no communication between them; they run independently of one another.
3. WHY USE MAPREDUCE?
MapReduce is a programming model designed for processing large
volumes of data in parallel by dividing the work into a set of independent
tasks. You only need to express your business logic in the form MapReduce
expects; everything else is taken care of by the framework. The work (the
complete job) submitted by the user to the master is divided into small
units (tasks) and assigned to the slaves.
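The division of a job into independent tasks can be sketched in plain Python (this is an illustration of the idea, not the Hadoop API; the function names are hypothetical):

```python
# A minimal sketch of how a complete job is divided into small,
# independent tasks, each running the same user-supplied business logic.
def split_into_tasks(records, num_tasks):
    """Divide the job's full input into roughly equal, independent chunks."""
    chunk = -(-len(records) // num_tasks)  # ceiling division
    return [records[i:i + chunk] for i in range(0, len(records), chunk)]

def run_task(business_logic, chunk):
    """A task applies only the user's business logic to its own chunk."""
    return [business_logic(record) for record in chunk]

if __name__ == "__main__":
    tasks = split_into_tasks(list(range(10)), 3)
    results = [run_task(lambda x: x * x, t) for t in tasks]
    print(results)
```

Because no task depends on another's result, the framework is free to run them on different slave nodes at the same time.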
4. HOW MAPREDUCE WORKS
MapReduce has two steps. The first step, the "Map" step, takes the input,
breaks it into smaller sub-problems, and distributes them to the
worker nodes. The worker nodes then send their results back to the
"master" node. The second step, the "Reduce" step, takes the results from
the worker nodes and combines them in some manner to produce the
output for the original job.
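The two steps above can be sketched with the classic word-count example, written here in plain Python rather than the Hadoop Java API (the shuffle step, which groups Map output by key, is normally performed by the framework itself):

```python
from collections import defaultdict

def map_step(text):
    """Map: break the input into intermediate (word, 1) key/value pairs."""
    return [(word, 1) for word in text.split()]

def shuffle(pairs):
    """Group intermediate pairs by key, as the framework does between steps."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_step(groups):
    """Reduce: combine the values for each key into the final output."""
    return {key: sum(values) for key, values in groups.items()}

if __name__ == "__main__":
    pairs = map_step("the quick fox the fox")
    print(reduce_step(shuffle(pairs)))  # word counts for the input text
```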
5. MAP
Map gets its input from HDFS, and that input is split and processed in
parallel across the Hadoop cluster.
▪ The input to a Map task arrives in the form of key/value pairs.
▪ The main purpose of a Map task is to organize the data for Reduce processing.
▪ Map tasks read their input in the job's configured file format.
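A Map task of this shape can be sketched in the style of a Hadoop Streaming mapper, which reads raw input lines from standard input and emits tab-separated key/value pairs for the framework to sort and group (a minimal sketch, not a complete job):

```python
import sys

def mapper(lines):
    """Turn each input line into intermediate key/value pairs.

    Emitting "word\t1" organizes the raw text into the keyed form
    that the Reduce step will consume.
    """
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

if __name__ == "__main__":
    for pair in mapper(sys.stdin):
        print(pair)
```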
6. REDUCE
▪ Reduce gets its input from the output of the Map tasks.
▪ There may be several reducers, and they are independent of one another.
▪ The number of reducers is selected by the user; the default is one.
▪ Reducers create the final results based on the Map tasks' output.
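The matching Reduce side, again in the style of a Hadoop Streaming reducer: its input arrives already sorted by key, so consecutive lines with the same key can be combined into one final result (a sketch under that sorted-input assumption):

```python
import sys
from itertools import groupby

def reducer(lines):
    """Combine sorted "key\tvalue" lines into final (key, total) results."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(int(value) for _, value in group)

if __name__ == "__main__":
    for key, total in reducer(sys.stdin):
        print(f"{key}\t{total}")
```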
8. ARCHITECTURE AND COMPONENTS OF
MAPREDUCE
Job Client — submits MapReduce jobs to the JobTracker
JobTracker — part of the master node; it assigns tasks to the
TaskTrackers
TaskTracker — part of a slave node; it tracks its tasks and, once a task
is completed, informs the JobTracker
PayLoad — the application code that implements the Map and Reduce
functions
Mapper — the main purpose of the Mapper is to map the input data to
intermediate key/value pairs
NameNode — manages the HDFS metadata (the file system namespace)
DataNode — stores the HDFS data blocks, so data is present in advance
at the nodes where processing takes place
Master Node — the main purpose of the master node is to receive job
requests from clients
Slave Node — runs the Map and Reduce tasks