1/4/2021 MAP REDUCE AND YARN
DEPT OF INFORMATION TECHNOLOGY
MAP REDUCE AND YARN
PRESENTED BY
K.MANOJKUMAR (16BIT3051)
C.RANJITH KUMAR (16BIT3078)
GUIDED BY
BIG DATA
• Big data is a collection of massive amounts of structured, semi-structured and unstructured data.
SOURCES OF DATA
• Social media
• Transport data
• Business transactions
• Bank and credit card data
HDFS
• HDFS (Hadoop Distributed File System) holds very large amounts of data and provides easy access to it.
• To store such huge data, files are split into blocks and stored across multiple machines.
• HDFS is highly fault-tolerant and is designed to run on low-cost hardware.
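The HDFS bullets above can be made concrete with a small in-memory model. This is a minimal Python sketch, not HDFS code: it assumes a tiny block size, hypothetical node names, and a simple round-robin replica placement. Real HDFS uses much larger blocks (128 MB by default in Hadoop 2 and later) and a rack-aware placement policy that this sketch does not model.

```python
from typing import Dict, List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut a file's contents into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks: int, nodes: List[str], replication: int) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct nodes, round-robin.

    Simplified placement for illustration only; this is NOT HDFS's rack-aware policy.
    """
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    placement: Dict[int, List[str]] = {}
    for block_id in range(num_blocks):
        # Consecutive nodes (wrapping around) hold the copies of one block.
        placement[block_id] = [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
    return placement


if __name__ == "__main__":
    data = b"big data lives in many small blocks"
    blocks = split_into_blocks(data, block_size=8)  # hypothetical tiny block size
    layout = place_blocks(len(blocks), ["node1", "node2", "node3", "node4"], replication=3)
    for block_id, block in enumerate(blocks):
        print(block_id, block, layout[block_id])
```

Because every block lives on several machines, losing one machine does not lose any block, which is where the fault tolerance described above comes from.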
FEATURES OF HDFS
• It is suitable for distributed storage and processing.
• Hadoop provides a command-line interface for interacting with HDFS.
• It provides streaming access to file-system data.
• HDFS provides file permissions and authentication.
DISTRIBUTED FILE SYSTEM
• Highly scalable distributed file system for large, data-intensive applications
• e.g., 10K nodes, 100 million files, 10 PB
• Provides redundant storage of massive amounts of data on cheap and unreliable computers
• Files are replicated to handle hardware failure
• Detects failures and recovers from them
• Provides a platform on which other systems, such as MapReduce, are built
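The replication and failure-recovery bullets can be sketched the same way. In HDFS, DataNodes send periodic heartbeats to the NameNode, which re-replicates blocks whose copies were lost. The Python below is a simplified model of that idea under stated assumptions: the heartbeat timestamps, the timeout, and the node names are hypothetical, and "re-replication" here only picks another live node instead of copying real data.

```python
from typing import Dict, List, Set


def detect_dead_nodes(last_heartbeat: Dict[str, float], now: float, timeout: float) -> Set[str]:
    """Treat a node as failed if it has not reported within `timeout` seconds."""
    return {node for node, seen in last_heartbeat.items() if now - seen > timeout}


def re_replicate(placement: Dict[int, List[str]], live_nodes: List[str], replication: int) -> Dict[int, List[str]]:
    """Drop replicas on dead nodes and add copies of under-replicated blocks on other live nodes."""
    live = set(live_nodes)
    repaired: Dict[int, List[str]] = {}
    for block_id, holders in placement.items():
        survivors = [n for n in holders if n in live]
        if not survivors:
            # Every copy is gone: the block is lost and cannot be rebuilt from replicas.
            repaired[block_id] = []
            continue
        candidates = [n for n in live_nodes if n not in survivors]
        while len(survivors) < replication and candidates:
            # A real system would copy the block's bytes from a survivor to this node.
            survivors.append(candidates.pop(0))
        repaired[block_id] = survivors
    return repaired


if __name__ == "__main__":
    heartbeats = {"node1": 100.0, "node2": 70.0, "node3": 99.0, "node4": 98.0}
    dead = detect_dead_nodes(heartbeats, now=101.0, timeout=10.0)  # {'node2'}
    live = [n for n in heartbeats if n not in dead]
    placement = {0: ["node1", "node2", "node3"], 1: ["node2", "node3", "node4"]}
    print(re_replicate(placement, live, replication=3))
    # {0: ['node1', 'node3', 'node4'], 1: ['node3', 'node4', 'node1']}
```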
CONCEPTS BEHIND DFS
• MapReduce (two versions: MR1 and MR2)
• YARN
Both MapReduce and YARN run as part of Hadoop.
BEFORE MAP REDUCE
• Large-scale data processing was difficult because it meant:
• Managing hundreds or thousands of processors
• Managing parallelization and distribution
• I/O scheduling
• Status and monitoring
• Fault/crash tolerance
• MapReduce handles all of these, so programmers do not have to.
MAP REDUCE -1
• The earlier version of MapReduce is called MR1.
• It supports only the MapReduce processing model.
• A central JobTracker manages the jobs, and a TaskTracker on each worker node runs the tasks.
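To illustrate the MR1 design, here is a toy Python model of a central JobTracker handing tasks to TaskTrackers, each with a fixed number of task slots. It is a single-pass simplification under assumptions: the real JobTracker assigned tasks in response to TaskTracker heartbeats and tracked map and reduce slots separately, and the tracker names and slot counts below are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple


def assign_tasks(tasks: List[str], tracker_slots: Dict[str, int]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Central, JobTracker-style assignment: fill each TaskTracker's fixed slots in turn.

    Returns the per-tracker assignment and the tasks still waiting for a free slot.
    """
    queue = deque(tasks)
    assignment: Dict[str, List[str]] = {tracker: [] for tracker in tracker_slots}
    for tracker, slots in tracker_slots.items():
        while queue and len(assignment[tracker]) < slots:
            assignment[tracker].append(queue.popleft())
    return assignment, list(queue)


if __name__ == "__main__":
    map_tasks = [f"map-{i}" for i in range(5)]
    assignment, waiting = assign_tasks(map_tasks, {"tasktracker-1": 2, "tasktracker-2": 2})
    print(assignment)  # {'tasktracker-1': ['map-0', 'map-1'], 'tasktracker-2': ['map-2', 'map-3']}
    print(waiting)     # ['map-4']
```

Because one JobTracker makes every scheduling decision for every job, it becomes a bottleneck as the cluster grows; this is the scalability limit that MR2 addresses.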
MAP REDUCE -2
• The newer version of MapReduce is called MR2.
• The JobTracker and TaskTrackers are gone.
• Each job controls its own destiny: every job has its own ApplicationMaster that manages its execution flow.
MAP REDUCE-2 CHARACTERISTICS
• Jobs are more isolated from one another
• More scalable than MapReduce 1
• Runs the MapReduce framework on top of YARN
METHOD OF MAP & REDUCE
• Input: a set of key/value pairs
• User supplies two functions:
• map(k, v) → list(k1, v1)
• reduce(k1, list(v1)) → v2
• (k1,v1) is an intermediate key/value pair
• Output is the set of (k1,v2) pairs
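The two signatures above can be exercised with a small single-process driver. This Python sketch assumes everything fits in memory: it runs the map phase, groups intermediate values by key, then calls reduce once per key. It does not sort keys, spill to disk, or distribute work as Hadoop does, and the max-temperature records in the usage example are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")
K1 = TypeVar("K1")
V1 = TypeVar("V1")
V2 = TypeVar("V2")


def run_mapreduce(
    inputs: Iterable[Tuple[K, V]],
    map_fn: Callable[[K, V], List[Tuple[K1, V1]]],
    reduce_fn: Callable[[K1, List[V1]], V2],
) -> Dict[K1, V2]:
    # Map phase: each input pair (k, v) yields a list of intermediate pairs (k1, v1).
    intermediate: Dict[K1, List[V1]] = defaultdict(list)
    for k, v in inputs:
        for k1, v1 in map_fn(k, v):
            intermediate[k1].append(v1)  # group values by intermediate key
    # Reduce phase: reduce(k1, list(v1)) -> v2 gives the output pairs (k1, v2).
    return {k1: reduce_fn(k1, values) for k1, values in intermediate.items()}


if __name__ == "__main__":
    # Hypothetical records: (line offset, "year,temperature").
    records = [(0, "1950,22"), (8, "1950,31"), (16, "1951,18")]

    def max_temp_map(offset, line):
        year, temp = line.split(",")
        return [(year, int(temp))]

    def max_temp_reduce(year, temps):
        return max(temps)

    print(run_mapreduce(records, max_temp_map, max_temp_reduce))
    # {'1950': 31, '1951': 18}
```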
MAP EXAMPLE
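A standard illustration of the map step is word count, the running example in Dean and Ghemawat's MapReduce paper. In this minimal Python sketch the input key is assumed to be the line's byte offset and the value the line's text; the function name `word_count_map` is hypothetical.

```python
from typing import List, Tuple


def word_count_map(offset: int, line: str) -> List[Tuple[str, int]]:
    """map(k, v) -> list(k1, v1): emit (word, 1) for every word in the line.

    The key (the line's offset in the file) is not needed for counting.
    """
    return [(word.lower(), 1) for word in line.split()]


if __name__ == "__main__":
    print(word_count_map(0, "Deer Bear River Deer"))
    # [('deer', 1), ('bear', 1), ('river', 1), ('deer', 1)]
```

The mapper does not try to add the two "deer" entries together; combining values for the same key is left to the reduce step.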
REDUCE EXAMPLE
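The matching reduce step for word count sums the counts emitted for each word. This Python sketch assumes the values have already been grouped by key; `word_count_reduce` and the grouped data are hypothetical.

```python
from typing import List


def word_count_reduce(word: str, counts: List[int]) -> int:
    """reduce(k1, list(v1)) -> v2: add up all the 1s emitted for one word."""
    return sum(counts)


if __name__ == "__main__":
    # After grouping, each reducer call receives every count emitted for one word.
    grouped = {"deer": [1, 1], "bear": [1], "river": [1]}
    print({word: word_count_reduce(word, counts) for word, counts in grouped.items()})
    # {'deer': 2, 'bear': 1, 'river': 1}
```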
HOW MAP AND REDUCE WORK TOGETHER
• Map returns information as intermediate key/value pairs.
• Reduce accepts that information, grouped by key.
• Reduce applies a user-defined function to reduce the amount of data.
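Between the two phases sits the shuffle: every intermediate pair is routed to a reducer chosen from its key, so all values for one key meet at the same reducer. This Python sketch models that with hash partitioning. Hadoop's default `HashPartitioner` derives the partition from the key's Java `hashCode()`; here `zlib.crc32` is swapped in because Python's built-in `hash()` of strings changes between runs. The pairs and the reducer count are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple


def partition(key: str, num_reducers: int) -> int:
    """Pick a reducer for a key; every occurrence of the same key goes to the same reducer."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(map_outputs: List[Tuple[str, int]], num_reducers: int) -> List[Dict[str, List[int]]]:
    """Route intermediate pairs to reducers and group values by key (keys sorted per reducer)."""
    buckets: List[Dict[str, List[int]]] = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in map_outputs:
        buckets[partition(key, num_reducers)][key].append(value)
    return [dict(sorted(bucket.items())) for bucket in buckets]


if __name__ == "__main__":
    pairs = [("deer", 1), ("bear", 1), ("river", 1), ("car", 1), ("car", 1), ("deer", 1)]
    for reducer_id, groups in enumerate(shuffle(pairs, num_reducers=2)):
        # Each reducer applies the user-defined reduce function (here: sum) to its groups.
        print(reducer_id, {key: sum(values) for key, values in groups.items()})
```

Which reducer receives which word depends on the hash values, but the combined output across reducers is always the same word counts.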
MAP REDUCE APPLICATIONS
• Yahoo!
• Uses Hadoop in its web applications to build a database of information on all known web pages
• Facebook
• Facebook's data center uses Hadoop to provide business statistics to application developers and advertisers
• Rackspace
• Analyzes server log files and usage data using Hadoop
YARN
• Stands for Yet Another Resource Negotiator.
• A newer framework for managing cluster resources.
• YARN is a generic platform: it can run applications other than MapReduce.
• Handles and schedules resource requests from
applications.
• Supervises the execution of the requests.
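YARN's role can be pictured with a toy scheduler. In YARN, a ResourceManager allocates containers on NodeManagers in response to requests from each application's ApplicationMaster. The Python below is a minimal sketch under stated assumptions: resources are memory only, placement is first-fit, and the node names, sizes, and application IDs are hypothetical. Real YARN uses pluggable schedulers such as the Capacity Scheduler and Fair Scheduler, also tracks CPU (vcores), and monitors running containers, none of which is modeled here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NodeManager:
    """One worker node and the memory (MB) it still has free for containers."""
    name: str
    free_mb: int


@dataclass
class ResourceManager:
    """Toy first-fit scheduler: grants containers while free memory lasts."""
    nodes: List[NodeManager]
    allocations: Dict[str, List[str]] = field(default_factory=dict)

    def request(self, app_id: str, num_containers: int, mb_each: int) -> int:
        """Handle one application's request; return how many containers were granted."""
        granted = 0
        for _ in range(num_containers):
            node = next((n for n in self.nodes if n.free_mb >= mb_each), None)
            if node is None:
                break  # cluster is full; the application must wait and ask again later
            node.free_mb -= mb_each
            self.allocations.setdefault(app_id, []).append(node.name)
            granted += 1
        return granted


if __name__ == "__main__":
    rm = ResourceManager([NodeManager("nm1", 4096), NodeManager("nm2", 2048)])
    print(rm.request("mapreduce-job-1", num_containers=3, mb_each=1024))  # 3
    print(rm.request("other-app-2", num_containers=4, mb_each=1024))      # 3
    print(rm.allocations)
```

The second application is not MapReduce at all, which reflects the point that YARN is a generic platform rather than a MapReduce-only scheduler.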