1. 1/4/2021 MAP REDUCE AND YARN
DEPT OF INFORMATION TECHNOLOGY
MAP REDUCE AND YARN
PRESENTED BY
K.MANOJKUMAR(16BIT3051)
C.RANJITH KUMAR(16BIT3078)
GUIDED BY
2. BIG DATA
• Big data is a collection of massive amounts
of structured, semi-structured, and
unstructured data.
3. SOURCES OF DATA
•Social media
•Transport data
•Business transactions
•Bank and credit card data
4. HDFS
• HDFS holds very large amounts of data and
provides easy access to it.
• To store such huge volumes, files are
split and stored across multiple machines.
• HDFS is highly fault tolerant and designed
to run on low-cost hardware.
6. FEATURES OF HDFS
• It is suitable for distributed storage
and processing.
• Hadoop provides a command interface
to interact with HDFS.
• Streaming access to file system data.
• HDFS provides file permissions and
authentication.
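The command interface mentioned above is the `hdfs dfs` file system shell. As a rough sketch, the Python snippet below builds a few common commands and runs them only when an `hdfs` client is found on the PATH. The paths (`/user/demo/input`, `local.txt`) and the helper names `hdfs_dfs` and `run_all` are hypothetical, chosen only for illustration.

```python
import shutil
import subprocess

def hdfs_dfs(*args):
    """Build the argument list for one `hdfs dfs` shell command."""
    return ["hdfs", "dfs", *args]

# Hypothetical paths, for illustration only.
commands = [
    hdfs_dfs("-mkdir", "-p", "/user/demo/input"),              # create a directory
    hdfs_dfs("-put", "local.txt", "/user/demo/input"),         # upload a local file
    hdfs_dfs("-ls", "/user/demo/input"),                       # list the directory
    hdfs_dfs("-cat", "/user/demo/input/local.txt"),            # stream file contents
    hdfs_dfs("-chmod", "640", "/user/demo/input/local.txt"),   # set file permissions
]

def run_all(cmds):
    """Print each command, and run it only if an `hdfs` client is installed."""
    have_hdfs = shutil.which("hdfs") is not None
    for cmd in cmds:
        print(" ".join(cmd))
        if have_hdfs:
            subprocess.run(cmd, check=False)
```

Calling `run_all(commands)` on a machine without Hadoop simply prints the commands, which makes the sketch safe to try anywhere.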
7. DISTRIBUTED FILE SYSTEM
• Highly scalable distributed file system
for large data-intensive applications.
• E.g. 10K nodes, 100 million files, 10 PB
• Provides redundant storage of massive
amounts of data on cheap and
unreliable computers
• Files are replicated to handle hardware
failure
• Detects failures and recovers from them
• Provides a platform on which other
systems, like MapReduce, operate.
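To make replication concrete, here is a toy model of how a file might be cut into blocks and copied onto several machines. It assumes the usual HDFS defaults of 128 MB blocks and 3 replicas, but the round-robin placement and the names `place_blocks` and `surviving_blocks` are simplifications invented for this sketch; real HDFS uses a rack-aware placement policy.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the default block size in Hadoop 2 and later
REPLICATION = 3                 # default HDFS replication factor

def place_blocks(file_size, nodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return {block_index: [nodes holding a replica]} using round-robin placement."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    num_blocks = math.ceil(file_size / block_size)
    placement = {}
    for b in range(num_blocks):
        # Consecutive nodes, wrapping around, so replicas land on distinct machines.
        placement[b] = [nodes[(b + r) % len(nodes)] for r in range(replication)]
    return placement

def surviving_blocks(placement, failed_node):
    """Blocks that still have at least one replica after one node fails."""
    return [b for b, holders in placement.items()
            if any(n != failed_node for n in holders)]

nodes = ["node1", "node2", "node3", "node4"]
placement = place_blocks(file_size=1024 * 1024 * 1024, nodes=nodes)  # a 1 GB file
print(len(placement), "blocks;", len(surviving_blocks(placement, "node1")), "survive losing node1")
```

Because every block lives on three different nodes, losing any single node leaves all blocks readable, which is the idea behind "files are replicated to handle hardware failure".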
8. CONCEPTS BEHIND DFS
•MapReduce
MR1
MR2
•YARN
Both MapReduce and YARN run
under Hadoop.
9. BEFORE MAP REDUCE
• Large scale data processing was difficult!
• Managing hundreds or thousands of processors
• Managing parallelization and distribution
• I/O Scheduling
• Status and monitoring
• Fault/crash tolerance
• MapReduce provides all of these, easily!
10. MAP REDUCE -1
•The earlier version of MapReduce is called
MR1.
•It runs only the MapReduce model.
•Here the JobTracker and TaskTrackers manage
the jobs and tasks.
11. MAP REDUCE -2
• The new version of MapReduce is called
MR2.
• Here the JobTracker and TaskTrackers are gone.
• Each job controls its own destiny: each
job has an application master that takes care
of its execution flow.
13. METHOD OF MAP & REDUCE
• Input: a set of key/value pairs
• User supplies two functions:
• map(k, v) → list(k1, v1)
• reduce(k1, list(v1)) → v2
• (k1,v1) is an intermediate key/value pair
• Output is the set of (k1,v2) pairs
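The two signatures above can be illustrated with the classic word count. This is a single-process sketch, not Hadoop code: `map_fn`, `reduce_fn`, and the tiny driver `run_mapreduce` are names invented here, and the grouping step stands in for the shuffle that a real cluster performs between the two phases.

```python
from collections import defaultdict

def map_fn(key, value):
    """map(k, v) -> list(k1, v1): emit (word, 1) for each word in a line."""
    return [(word.lower(), 1) for word in value.split()]

def reduce_fn(key, values):
    """reduce(k1, list(v1)) -> v2: sum the counts collected for one word."""
    return sum(values)

def run_mapreduce(records, map_fn, reduce_fn):
    """Map every input record, group intermediate pairs by k1, then reduce each group."""
    groups = defaultdict(list)
    for k, v in records:
        for k1, v1 in map_fn(k, v):
            groups[k1].append(v1)
    return {k1: reduce_fn(k1, vs) for k1, vs in groups.items()}

# Input: (line number, line text) pairs.
records = [(0, "map reduce and yarn"), (1, "map and reduce")]
result = run_mapreduce(records, map_fn, reduce_fn)
print(result)  # {'map': 2, 'reduce': 2, 'and': 2, 'yarn': 1}
```

Here k is a line number, v is the line text, k1 is a word, v1 is the constant 1, and v2 is the total count for that word.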
16. HOW MAP AND REDUCE WORK TOGETHER
• Map returns information
• Reduce accepts information
• Reduce applies a user-defined function to
reduce the amount of data
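The role of the user-defined function can be sketched by feeding the same mapped pairs through two different reducers. The readings below are made up, and `reduce_by_key` is a hypothetical helper written for this example; it shows five intermediate pairs shrinking to one value per key.

```python
from functools import reduce
from itertools import groupby
from operator import itemgetter

# Intermediate (key, value) pairs as a map phase might return them (made-up readings).
mapped = [("2020", 31), ("2021", 28), ("2020", 35), ("2021", 30), ("2020", 29)]

def reduce_by_key(pairs, user_fn):
    """Sort and group pairs by key, then fold each group's values with user_fn."""
    out = {}
    by_key = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(by_key, key=itemgetter(0)):
        out[key] = reduce(user_fn, (value for _, value in group))
    return out

hottest = reduce_by_key(mapped, max)               # user-defined function: maximum
total = reduce_by_key(mapped, lambda a, b: a + b)  # user-defined function: sum

print(len(mapped), "pairs in,", len(hottest), "values out")
print(hottest)  # {'2020': 35, '2021': 30}
print(total)    # {'2020': 95, '2021': 58}
```

Swapping the function changes the answer but not the shape of the pipeline, which is why the reduce step is left to the user.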
17. MAP REDUCE APPLICATIONS
• Yahoo!
• Web application uses Hadoop to create a database of
information on all known webpages
• Facebook
• Facebook data center uses Hadoop to provide
business statistics to application developers and
advertisers
• Rackspace
• Analyzes server log files and usage data using
Hadoop
18. YARN
• Stands for Yet Another Resource Negotiator.
• New framework for managing resources.
• YARN is a generic platform that can run applications other than MapReduce.
• Handles and schedules resource requests from
applications.
• Supervises the execution of the requests.
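The request-and-schedule idea can be sketched with a toy resource manager. This is only a model under strong assumptions: the class `ToyResourceManager`, its fixed-size containers, and its first-come-first-served queue are invented for illustration, whereas real YARN tracks memory and CPU per node and supports pluggable schedulers.

```python
from collections import deque

class ToyResourceManager:
    """Grants fixed-size containers from a shared pool, first come first served."""

    def __init__(self, total_containers):
        self.free = total_containers
        self.pending = deque()  # queued (app_name, containers_wanted) requests
        self.running = {}       # app_name -> containers currently held

    def request(self, app_name, wanted):
        """An application asks for containers; it waits if the pool is too small."""
        self.pending.append((app_name, wanted))
        self._schedule()

    def release(self, app_name):
        """An application finished: return its containers and reschedule."""
        self.free += self.running.pop(app_name, 0)
        self._schedule()

    def _schedule(self):
        # Grant requests in arrival order while the head of the queue fits.
        while self.pending and self.pending[0][1] <= self.free:
            app, wanted = self.pending.popleft()
            self.free -= wanted
            self.running[app] = self.running.get(app, 0) + wanted

rm = ToyResourceManager(total_containers=4)
rm.request("wordcount", 3)        # granted immediately, 1 container left
rm.request("log-analysis", 2)     # must wait: only 1 container is free
waiting = list(rm.pending)
rm.release("wordcount")           # frees 3 containers, so the queued request is granted
print(waiting, rm.running, rm.free)
```

The second application starts only after the first releases its containers, which mirrors how a central scheduler arbitrates between competing applications.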
20. REFERENCES
• Jeffrey Dean and Sanjay Ghemawat,
MapReduce: Simplified Data Processing on
Large Clusters
http://labs.google.com/papers/mapreduce.html
• Sanjay Ghemawat, Howard Gobioff, and Shun-Tak
Leung, The Google File System
http://labs.google.com/papers/gfs.html