Hadoop, MapReduce and YARN networks
1. Paper name : Big Data Analytics
Staff : Mrs M. Florence Dayana M. C. A., M.Phil., (Ph.D.)
Class : II- M.Sc.(Computer Science)
Semester : IV
Unit : V
Topic : Hadoop, MapReduce and YARN Frameworks
3. MAPREDUCE:
• MapReduce is a framework for writing applications that
process very large amounts of data, in parallel, on large
clusters of hardware in a reliable manner.
• MapReduce is a processing technique and a programming
model for distributed computing, based on Java.
• The MapReduce algorithm is based on two important tasks.
They are Map and Reduce.
• Map takes a set of data and converts it into another set of
data, where the individual elements are broken down into
tuples (key/value pairs).
• Reduce takes the output from a map as an input and
combines those data tuples into a smaller set of tuples.
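The two tasks above can be sketched in plain Python, without Hadoop. This is only an illustration under assumptions: the function names `map_words` and `reduce_counts` and the two sample lines are hypothetical, and everything runs in a single process rather than on a cluster.

```python
from collections import defaultdict

def map_words(line):
    """Map: break one line of text into (word, 1) tuples."""
    return [(word.lower(), 1) for word in line.split()]

def reduce_counts(pairs):
    """Reduce: combine tuples that share a key into one (word, total) tuple."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())

# Hypothetical sample input, two lines of text.
lines = ["big data needs big clusters", "data moves to clusters"]

# Apply the map task to every line and flatten the resulting tuples.
mapped = [pair for line in lines for pair in map_words(line)]

# The reduce task turns the mapped tuples into a smaller set of tuples.
reduced = reduce_counts(mapped)

print(len(mapped), "tuples in,", len(reduced), "tuples out")
print(reduced)
```

The reduce output is smaller than the map output because every tuple with the same word is merged into a single total.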
6. Map stage :
The map’s job is to process the input data.
Generally the input data is a file or directory, usually
stored in the Hadoop Distributed File System (HDFS).
The input file is passed to the mapper function line by
line. The mapper processes the data and creates several
small chunks of data.
Reduce stage :
This stage is the combination of the Shuffle step and
the Reduce step. Its main job is to process the data that
comes from the mapper. After processing, it produces a
new set of output, which is stored in the Hadoop
Distributed File System (HDFS).
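The flow through the Map, Shuffle and Reduce stages can be simulated locally as a rough sketch. The input format (`year,reading` lines), the sample values, and the function names `mapper`, `shuffle` and `reducer` are all assumptions made for illustration; a real job would read its input from HDFS and write its output back there.

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    """Map stage: receive one input line and emit a (key, value) pair."""
    year, reading = line.split(",")
    return (year, int(reading))

def shuffle(pairs):
    """Shuffle step: sort by key, then group all values that share a key."""
    ordered = sorted(pairs, key=itemgetter(0))
    return [
        (key, [value for _, value in group])
        for key, group in groupby(ordered, key=itemgetter(0))
    ]

def reducer(key, values):
    """Reduce step: collapse the grouped values into one output record."""
    return (key, max(values))

# Hypothetical input file contents, one record per line.
input_lines = ["2021,31", "2022,35", "2021,38", "2022,29"]

shuffled = shuffle(mapper(line) for line in input_lines)
output = [reducer(key, values) for key, values in shuffled]

print(shuffled)
print(output)
```

The shuffle step is what guarantees that each call to the reducer sees every value emitted for its key.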
7. YARN Framework:
• YARN stands for Yet Another Resource Negotiator.
• It takes Hadoop beyond MapReduce programming in
Java. YARN lets other applications, such as HBase
and Spark, run on the same platform.
• Different YARN applications can co-exist on the
same cluster, so MapReduce, HBase and Spark can
all run at the same time.
• This brings great benefits for manageability and
cluster utilization.
9. SERIALIZATION:
• Serialization is the process of translating a data
structure or an object's state into binary or textual form
so that it can be transported over a network or stored
on persistent storage.
• When the data is received over the network or
retrieved from persistent storage, it needs to be
deserialized back into its original form.
• Serialization is also termed marshalling.
• Deserialization is also termed unmarshalling.
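A small round trip shows both the textual and the binary form. This sketch uses Python's standard `json` and `pickle` modules purely for illustration; Hadoop has its own serialization mechanisms, which are not shown here, and the sample `record` is hypothetical.

```python
import json
import pickle

# Hypothetical in-memory object whose state we want to transport or store.
record = {"word": "hadoop", "count": 3}

# Serialization (marshalling): object state -> textual or binary form.
text_form = json.dumps(record)      # textual form, a str
binary_form = pickle.dumps(record)  # binary form, a bytes object

# Deserialization (unmarshalling): textual or binary form -> object state.
restored_from_text = json.loads(text_form)
restored_from_binary = pickle.loads(binary_form)

print(type(text_form).__name__, type(binary_form).__name__)
print(restored_from_text == record, restored_from_binary == record)
```

Either form can be sent over a network or written to disk; the receiver only needs to know which format was used in order to rebuild the object.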