Map reduce team and yarn

1/4/2021MAP REDUCE AND YARN 1
DEPT OF INFORMATION TECHNOLOGY
MAP REDUCE AND YARN
PRESENTED BY
K.MANOJKUMAR(16BIT3051)
C.RANJITH KUMAR(16BIT3078)
GUIDED BY

BIG DATA
• Big data is a collection of massive amounts
of structured, semi-structured, and
unstructured data.

SOURCES OF DATA
•Social media
•Transport data
•Business transactions
•Bank and credit card data

HDFS
• HDFS holds very large amounts of data and
provides easy access to it.
• To store such huge data sets, files are
stored across multiple machines.
• HDFS is highly fault tolerant and designed
to run on low-cost hardware.

FEATURES OF HDFS
• It is suitable for distributed storage
and processing.
• Hadoop provides a command interface
to interact with HDFS.
• It offers streaming access to file system data.
• HDFS provides file permissions and
authentication.

DISTRIBUTED FILE SYSTEM
• Highly scalable distributed file system
for large data-intensive applications.
• E.g. 10K nodes, 100 million files, 10 PB
• Provides redundant storage of massive
amounts of data on cheap and
unreliable computers
• Files are replicated to handle hardware
failure
• Detects failures and recovers from them
• Provides a platform on which other
systems, like MapReduce, can run
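
The replication idea above can be illustrated with a minimal Python sketch. It is a toy model, not HDFS code: the round-robin placement, the function names, the node names, and the replication factor of 3 are all assumptions made for illustration, and real HDFS uses a more involved placement policy.

```python
def place_replicas(block_ids, nodes, replication=3):
    """Toy placement: put each block on `replication` distinct nodes, round-robin.

    Assumes replication <= len(nodes).
    """
    placement = {}
    for i, block in enumerate(block_ids):
        placement[block] = {nodes[(i + r) % len(nodes)] for r in range(replication)}
    return placement


def handle_failure(placement, failed_node, live_nodes, replication=3):
    """Drop the failed node, then copy under-replicated blocks to live nodes."""
    for holders in placement.values():
        holders.discard(failed_node)
        for node in live_nodes:
            if len(holders) >= replication:
                break
            holders.add(node)  # no effect if the node already holds a copy
    return placement


# Hypothetical four-node cluster storing two blocks.
nodes = ["n1", "n2", "n3", "n4"]
placement = place_replicas(["b0", "b1"], nodes)

# Node n2 fails; every block is brought back to three replicas.
placement = handle_failure(placement, "n2", ["n1", "n3", "n4"])
print(placement)
```

Because every block starts with several copies, losing one machine never loses data; the sketch only has to restore the replica count.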

CONCEPTS BEHIND DFS
•MapReduce
MR1
MR2
•YARN
Both MapReduce and YARN run
under Hadoop.

BEFORE MAP REDUCE
• Large-scale data processing was difficult!
• Managing hundreds or thousands of processors
• Managing parallelization and distribution
• I/O Scheduling
• Status and monitoring
• Fault/crash tolerance
• MapReduce provides all of these, easily!

MAP REDUCE -1
•The earlier version of MapReduce is called
MR1.
•It runs only the MapReduce model.
•Here the job tracker and task trackers manage
the jobs and tasks.

MAP REDUCE -2
• The new version of MapReduce is called
MR2.
• Here the job tracker and task trackers are gone.
• Each job controls its own destiny: each
job has an application master that takes care
of its execution flow.

MAP REDUCE-2 CHARACTERISTICS
•More isolated
•More scalable than MapReduce 1.
•It runs the MapReduce framework on top
of YARN.

METHOD OF MAP & REDUCE
• Input: a set of key/value pairs
• User supplies two functions:
• map(k, v) → list(k1, v1)
• reduce(k1, list(v1)) → v2
• (k1,v1) is an intermediate key/value pair
• Output is the set of (k1,v2) pairs
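
The two signatures above can be sketched in plain Python with a word count. This is a single-process illustration, not Hadoop code: the names `map_fn`, `reduce_fn`, and `run_map_reduce` and the sample records are assumptions for the example.

```python
from collections import defaultdict


def map_fn(key, value):
    """map(k, v) -> list(k1, v1): emit (word, 1) for every word in a line."""
    return [(word.lower(), 1) for word in value.split()]


def reduce_fn(key, values):
    """reduce(k1, list(v1)) -> v2: sum the counts collected for one word."""
    return sum(values)


def run_map_reduce(records, map_fn, reduce_fn):
    """Run map on every input pair, group by intermediate key, then reduce."""
    grouped = defaultdict(list)
    for k, v in records:
        for k1, v1 in map_fn(k, v):
            grouped[k1].append(v1)  # group intermediate values by k1
    return {k1: reduce_fn(k1, v1s) for k1, v1s in grouped.items()}


# Input: (line number, line text) pairs.
records = [(0, "big data on yarn"), (1, "map reduce on yarn")]
counts = run_map_reduce(records, map_fn, reduce_fn)
print(counts)
```

The user writes only `map_fn` and `reduce_fn`; the grouping step in between is what the framework provides, and in Hadoop it runs across many machines.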

MAP EXAMPLE

REDUCE EXAMPLE

HOW MAP AND REDUCE WORK TOGETHER
• Map returns information.
• Reduce accepts information.
• Reduce applies a user-defined function to
reduce the amount of data.

MAP REDUCE APPLICATIONS
• Yahoo!
• Web application uses Hadoop to create a database of
information on all known webpages
• Facebook
• Facebook data center uses Hadoop to provide
business statistics to application developers and
advertisers
• Rackspace
• Analyzes server log files and usage data using
Hadoop

YARN
• Stands for Yet Another Resource Negotiator.
• New framework for managing resources.
• YARN is a generic platform.
• Handles and schedules resource requests from
applications.
• Supervises the execution of the requests.
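
How a resource negotiator handles and schedules requests can be sketched with a toy Python model. Everything here is hypothetical: `ToyResourceManager`, its methods, the node names, and the memory sizes are illustrative only and are not the YARN API or its scheduling policy.

```python
from collections import deque


class ToyResourceManager:
    """Hypothetical stand-in for a resource negotiator; not the YARN API."""

    def __init__(self, node_memory_mb):
        self.free = dict(node_memory_mb)  # node name -> free memory in MB
        self.pending = deque()            # requests waiting for resources
        self.running = {}                 # request id -> (node, memory in MB)

    def submit(self, request_id, memory_mb):
        """An application asks for memory to run one task."""
        self.pending.append((request_id, memory_mb))

    def schedule(self):
        """Grant pending requests in submission order wherever they fit."""
        still_waiting = deque()
        while self.pending:
            request_id, memory_mb = self.pending.popleft()
            node = next((n for n, free in self.free.items() if free >= memory_mb), None)
            if node is None:
                still_waiting.append((request_id, memory_mb))  # no room yet
            else:
                self.free[node] -= memory_mb
                self.running[request_id] = (node, memory_mb)
        self.pending = still_waiting

    def finish(self, request_id):
        """Release the resources of a completed request."""
        node, memory_mb = self.running.pop(request_id)
        self.free[node] += memory_mb


rm = ToyResourceManager({"node1": 4096, "node2": 2048})
rm.submit("app1-task", 3072)
rm.submit("app2-task", 2048)
rm.submit("app3-task", 2048)
rm.schedule()
waiting_after_first_pass = list(rm.pending)  # app3-task does not fit yet

rm.finish("app1-task")  # frees 3072 MB on node1
rm.schedule()           # app3-task can now be placed
print(rm.running)
```

The manager knows nothing about MapReduce itself; it only tracks resources, which is the sense in which such a platform is generic.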

YARN

