1/4/2021 MAP REDUCE AND YARN
DEPT OF INFORMATION TECHNOLOGY
MAP REDUCE AND YARN
PRESENTED BY
K.MANOJKUMAR(16BIT3051)
C.RANJITH KUMAR(16BIT3078)
GUIDED BY
BIG DATA
• Big data is a collection of massive amounts
of structured, semi-structured and
unstructured data.
SOURCES OF DATA
•Social media
•Transport data
•Business transactions
•Bank and credit card data
HDFS
• HDFS holds very large amounts of data and
provides easier access.
• To store such huge volumes of data, the files are
stored across multiple machines.
• HDFS is highly fault tolerant and is designed
to run on low-cost hardware.
FEATURES OF HDFS
• HDFS is suitable for distributed storage
and processing.
• Hadoop provides a command interface
to interact with HDFS.
• HDFS provides streaming access to file system data.
• HDFS provides file permissions and
authentication.
DISTRIBUTED FILE SYSTEM
• Highly scalable distributed file system
for large data-intensive applications.
• E.g. 10K nodes, 100 million files, 10 PB
• Provides redundant storage of massive
amounts of data on cheap and
unreliable computers
• Files are replicated to handle hardware
failure
• Detects failures and recovers from them
• Provides a platform on which other
systems, like MapReduce, can run.
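The replication bullet above can be illustrated with a toy model. This is a minimal sketch, not HDFS code: the function names `place_replicas` and `readable_blocks` are hypothetical, placement is simple round-robin (real HDFS placement is more involved), and the replication factor of 3 is an assumption taken from HDFS's usual default rather than from the slide.

```python
def place_replicas(blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Hypothetical helper for illustration; assumes replication <= len(nodes).
    """
    placement = {}
    for index, block in enumerate(blocks):
        placement[block] = [
            nodes[(index + offset) % len(nodes)] for offset in range(replication)
        ]
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one replica on a live node."""
    return [
        block
        for block, holders in placement.items()
        if any(node not in failed_nodes for node in holders)
    ]


nodes = ["node1", "node2", "node3", "node4"]
blocks = ["blk_0", "blk_1", "blk_2", "blk_3"]

# With three replicas per block, losing two nodes still leaves every block readable.
placement = place_replicas(blocks, nodes, replication=3)
survivors = readable_blocks(placement, failed_nodes={"node2", "node3"})
print(survivors)

# With a single copy per block, losing one node loses the block stored there.
single_copy = place_replicas(blocks, nodes, replication=1)
print(readable_blocks(single_copy, failed_nodes={"node2"}))
```

The second call shows why replication matters: with only one copy, the block held by the failed node is no longer readable.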
CONCEPTS BEHIND DFS
•MapReduce
MR1
MR2
•YARN
Both MapReduce and YARN run under Hadoop.
BEFORE MAP REDUCE
• Large-scale data processing was difficult!
• Managing hundreds or thousands of processors
• Managing parallelization and distribution
• I/O Scheduling
• Status and monitoring
• Fault/crash tolerance
• MapReduce provides all of these, easily!
MAP REDUCE -1
•The earlier version of MapReduce is called
MR1.
•It runs only the MapReduce model.
•Here the JobTracker and TaskTrackers manage
the jobs and tasks.
MAP REDUCE -2
• The new version of MapReduce is called
MR2.
• Here the JobTracker and TaskTracker are gone.
• Each job controls its own destiny. Each
job has an application master that takes care
of the execution flow.
MAP REDUCE-2
CHARACTERISTICS
•More isolated
•More scalable than MapReduce 1.
•It runs the MapReduce framework on top
of YARN.
METHOD OF MAP & REDUCE
• Input: a set of key/value pairs
• User supplies two functions:
• map(k, v) → list(k1, v1)
• reduce(k1, list(v1)) → v2
• (k1,v1) is an intermediate key/value pair
• Output is the set of (k1,v2) pairs
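The two user-supplied functions above can be sketched with the classic word-count job. This is a minimal single-process illustration, not Hadoop's API: `map_fn`, `reduce_fn` and `run_mapreduce` are hypothetical names, and the grouping step in the middle stands in for the shuffle that the framework would perform between the two phases.

```python
from collections import defaultdict


def map_fn(key, value):
    """map(k, v) -> list((k1, v1)): emit (word, 1) for every word in a line."""
    return [(word.lower(), 1) for word in value.split()]


def reduce_fn(key, values):
    """reduce(k1, list(v1)) -> v2: sum all counts collected for one word."""
    return sum(values)


def run_mapreduce(records, map_fn, reduce_fn):
    """Run map, group the intermediate pairs by key, then run reduce per key."""
    # Map phase: every input (k, v) pair yields a list of intermediate pairs.
    intermediate = []
    for key, value in records:
        intermediate.extend(map_fn(key, value))

    # Shuffle phase (simplified): collect all v1 values that share a k1.
    groups = defaultdict(list)
    for k1, v1 in intermediate:
        groups[k1].append(v1)

    # Reduce phase: one call per intermediate key gives the (k1, v2) output.
    return {k1: reduce_fn(k1, values) for k1, values in sorted(groups.items())}


# Input: (line number, line text) pairs.
records = [(0, "the quick brown fox"), (1, "the lazy dog"), (2, "The fox")]
counts = run_mapreduce(records, map_fn, reduce_fn)
print(counts)
```

Here the intermediate pairs are (word, 1) and the output pairs are (word, total), matching the (k1, v1) and (k1, v2) roles in the list above.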
MAP EXAMPLE
REDUCE EXAMPLE
HOW MAP AND REDUCE WORK TOGETHER
• Map returns information.
• Reduce accepts information.
• Reduce applies a user-defined function to reduce the
amount of data.
MAP REDUCE APPLICATIONS
• Yahoo!
• Web application uses Hadoop to create a database of
information on all known webpages
• Facebook
• Facebook data center uses Hadoop to provide
business statistics to application developers and
advertisers
• Rackspace
• Analyzes server log files and usage data using
Hadoop
YARN
• Stands for Yet Another Resource Negotiator.
• A new framework for managing resources.
• YARN is a generic platform.
• Handles and schedules resource requests from
applications.
• Supervises the execution of the requests.
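The idea of handling and scheduling resource requests can be sketched with a toy model. This is only an illustration under stated assumptions and does not use or mirror YARN's actual API: the class `ToyResourceManager`, its methods, the memory figures and the strict first-in-first-out policy are all hypothetical choices made for the example.

```python
from collections import deque


class ToyResourceManager:
    """Toy model: grants memory to applications in FIFO order (not YARN's API)."""

    def __init__(self, total_memory_mb):
        self.free_mb = total_memory_mb
        self.pending = deque()   # requests waiting for resources
        self.running = {}        # app_id -> memory currently granted

    def submit(self, app_id, memory_mb):
        """Queue a resource request and try to schedule it."""
        self.pending.append((app_id, memory_mb))
        self._schedule()

    def finish(self, app_id):
        """Release the resources of a finished application and reschedule."""
        self.free_mb += self.running.pop(app_id)
        self._schedule()

    def _schedule(self):
        # Strict FIFO: grant the oldest request only while it fits in free memory.
        while self.pending and self.pending[0][1] <= self.free_mb:
            app_id, memory_mb = self.pending.popleft()
            self.free_mb -= memory_mb
            self.running[app_id] = memory_mb


rm = ToyResourceManager(total_memory_mb=4096)
rm.submit("app-1", 3072)   # fits, starts running
rm.submit("app-2", 2048)   # does not fit yet, waits in the queue
print(sorted(rm.running), list(rm.pending))

rm.finish("app-1")         # frees memory, so app-2 is scheduled
print(sorted(rm.running), rm.free_mb)
```

The second request waits until the first application finishes, which is the "handles and schedules" behaviour from the list above in its simplest form.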
YARN
REFERENCES
• Jeffrey Dean and Sanjay Ghemawat,
MapReduce: Simplified Data Processing on
Large Clusters,
http://labs.google.com/papers/mapreduce.html
• Sanjay Ghemawat, Howard Gobioff, and
Shun-Tak Leung, The Google File System,
http://labs.google.com/papers/gfs.html
