SlideShare a Scribd company logo
1 of 14
 A simple programming model
 Functional model
 For large-scale data processing
 Exploits large set of commodity computers
 Executes process in distributed manner
 Offers high availability
 Lots of demands for very large scale data
processing
 A certain common themes for these demands
 Lots of machines needed (scaling)
 Two basic operations on the input
▪ Map
▪ Reduce
 Map:
 Accepts input key/value
pair
 Emits intermediate
key/value pair
 Reduce :
 Accepts intermediate
key/value* pair
 Emits output key/value
pair
Very
big
data
Result
M
A
P
R
E
D
U
C
E
Partitioning
Function
Very
big
data
Split data
Split data
Split data
Split data
grep
grep
grep
grep
matches
matches
matches
matches
cat
All
matches
 Map
 Process a key/value pair to generate intermediate
key/value pairs
 Reduce
 Merge all intermediate values associated with the
same key
 Partition
 By default : hash(key) mod R
 Well balanced
 No reduce can begin until map is complete
 Master must communicate locations of
intermediate files
 Tasks scheduled based on location of data
 If map worker fails any time before reduce
finishes, task must be completely rerun
 MapReduce library does most of the hard work
for us!
 User to do list:
 indicate:
▪ Input/output files
▪ M: number of map tasks
▪ R: number of reduce tasks
▪ W: number of machines
 Write map and reduce functions
 Submit the job
 String Match, such as Grep
 Reverse index
 Count URL access frequency
 Lots of examples in data mining
 Provide a general-purpose model to simplify
large-scale computation
 Allow users to focus on the problem without
worrying about details
 Original paper
(http://labs.google.com/papers/mapreduce.h
tml)
 On wikipedia
(http://en.wikipedia.org/wiki/MapReduce)
 Hadoop – MapReduce in Java
(http://lucene.apache.org/hadoop/)
 http://code.google.com/edu/parallel/mapred
uce-tutorial.html

More Related Content

What's hot

Real Time Framework by Tonny
Real Time Framework by TonnyReal Time Framework by Tonny
Real Time Framework by Tonny
Agate Studio
 
Creating Reports in SAS Final
Creating Reports in SAS FinalCreating Reports in SAS Final
Creating Reports in SAS Final
Ryan Davidson
 
Pricipal Component Analysis Using R
Pricipal Component Analysis Using RPricipal Component Analysis Using R
Pricipal Component Analysis Using R
Karthi Keyan
 

What's hot (20)

Maps with leafletR
Maps with leafletRMaps with leafletR
Maps with leafletR
 
Imagery Analysis in ArcGIS New View, New Vision - Technical - Esri UK Annual ...
Imagery Analysis in ArcGIS New View, New Vision - Technical - Esri UK Annual ...Imagery Analysis in ArcGIS New View, New Vision - Technical - Esri UK Annual ...
Imagery Analysis in ArcGIS New View, New Vision - Technical - Esri UK Annual ...
 
Real Time Framework by Tonny
Real Time Framework by TonnyReal Time Framework by Tonny
Real Time Framework by Tonny
 
Advanced Analytics - Smart Analytics - Esri UK Annual Conference 2017
Advanced Analytics - Smart Analytics - Esri UK Annual Conference 2017Advanced Analytics - Smart Analytics - Esri UK Annual Conference 2017
Advanced Analytics - Smart Analytics - Esri UK Annual Conference 2017
 
Geolectioxydata
GeolectioxydataGeolectioxydata
Geolectioxydata
 
ON TRAFFIC-AWARE PARTITION AND AGGREGATION IN MAPREDUCE FOR BIG DATA APPLICAT...
ON TRAFFIC-AWARE PARTITION AND AGGREGATION IN MAPREDUCE FOR BIG DATA APPLICAT...ON TRAFFIC-AWARE PARTITION AND AGGREGATION IN MAPREDUCE FOR BIG DATA APPLICAT...
ON TRAFFIC-AWARE PARTITION AND AGGREGATION IN MAPREDUCE FOR BIG DATA APPLICAT...
 
03 sajjad ali -qgis working with raster
03 sajjad ali -qgis working with raster03 sajjad ali -qgis working with raster
03 sajjad ali -qgis working with raster
 
Office for National Statistics - Smart Data - Esri UK Annual Conference 2017
Office for National Statistics - Smart Data - Esri UK Annual Conference 2017Office for National Statistics - Smart Data - Esri UK Annual Conference 2017
Office for National Statistics - Smart Data - Esri UK Annual Conference 2017
 
Analytics for Smarter Working in the Field - Smart Working - Esri UK Annual C...
Analytics for Smarter Working in the Field - Smart Working - Esri UK Annual C...Analytics for Smarter Working in the Field - Smart Working - Esri UK Annual C...
Analytics for Smarter Working in the Field - Smart Working - Esri UK Annual C...
 
Network topologies working
Network topologies workingNetwork topologies working
Network topologies working
 
Using R to Visualize Spatial Data: R as GIS - Guy Lansley
Using R to Visualize Spatial Data: R as GIS - Guy LansleyUsing R to Visualize Spatial Data: R as GIS - Guy Lansley
Using R to Visualize Spatial Data: R as GIS - Guy Lansley
 
Creating Reports in SAS Final
Creating Reports in SAS FinalCreating Reports in SAS Final
Creating Reports in SAS Final
 
Automating Crime Data to Import into GIS
Automating Crime Data to Import into GISAutomating Crime Data to Import into GIS
Automating Crime Data to Import into GIS
 
MapReduce
MapReduceMapReduce
MapReduce
 
So Many Flightplans – So Many Problems
So Many Flightplans – So Many ProblemsSo Many Flightplans – So Many Problems
So Many Flightplans – So Many Problems
 
GoFFish - A Sub-graph centric framework for large scale graph analytics
GoFFish - A Sub-graph centric framework for large scale graph analyticsGoFFish - A Sub-graph centric framework for large scale graph analytics
GoFFish - A Sub-graph centric framework for large scale graph analytics
 
Pricipal Component Analysis Using R
Pricipal Component Analysis Using RPricipal Component Analysis Using R
Pricipal Component Analysis Using R
 
Spatial decision support and analytics on a campus scale: bringing GIS, CAD, ...
Spatial decision support and analytics on a campus scale: bringing GIS, CAD, ...Spatial decision support and analytics on a campus scale: bringing GIS, CAD, ...
Spatial decision support and analytics on a campus scale: bringing GIS, CAD, ...
 
ML whitepaper v0.2
ML whitepaper v0.2ML whitepaper v0.2
ML whitepaper v0.2
 
Gathering information through web applications - Smart Collaboration - Esri U...
Gathering information through web applications - Smart Collaboration - Esri U...Gathering information through web applications - Smart Collaboration - Esri U...
Gathering information through web applications - Smart Collaboration - Esri U...
 

Similar to Map Reduce introduction (google white papers)

Introduction To Map Reduce
Introduction To Map ReduceIntroduction To Map Reduce
Introduction To Map Reduce
rantav
 
Map reduce programming model to solve graph problems
Map reduce programming model to solve graph problemsMap reduce programming model to solve graph problems
Map reduce programming model to solve graph problems
Nishant Gandhi
 
2004 map reduce simplied data processing on large clusters (mapreduce)
2004 map reduce simplied data processing on large clusters (mapreduce)2004 map reduce simplied data processing on large clusters (mapreduce)
2004 map reduce simplied data processing on large clusters (mapreduce)
anh tuan
 
MapReduce
MapReduceMapReduce
MapReduce
robjk
 
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop ClustersHDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
Xiao Qin
 

Similar to Map Reduce introduction (google white papers) (20)

Hadoop Map Reduce
Hadoop Map ReduceHadoop Map Reduce
Hadoop Map Reduce
 
Introduction To Map Reduce
Introduction To Map ReduceIntroduction To Map Reduce
Introduction To Map Reduce
 
2 mapreduce-model-principles
2 mapreduce-model-principles2 mapreduce-model-principles
2 mapreduce-model-principles
 
MapReduce-Notes.pdf
MapReduce-Notes.pdfMapReduce-Notes.pdf
MapReduce-Notes.pdf
 
Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentation
 
Map reduce programming model to solve graph problems
Map reduce programming model to solve graph problemsMap reduce programming model to solve graph problems
Map reduce programming model to solve graph problems
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Map reduce
Map reduceMap reduce
Map reduce
 
2004 map reduce simplied data processing on large clusters (mapreduce)
2004 map reduce simplied data processing on large clusters (mapreduce)2004 map reduce simplied data processing on large clusters (mapreduce)
2004 map reduce simplied data processing on large clusters (mapreduce)
 
Lecture 1 mapreduce
Lecture 1  mapreduceLecture 1  mapreduce
Lecture 1 mapreduce
 
iot.pptx
iot.pptxiot.pptx
iot.pptx
 
An Introduction to MapReduce
An Introduction to MapReduce An Introduction to MapReduce
An Introduction to MapReduce
 
Map reduce in Hadoop BIG DATA ANALYTICS
Map reduce in Hadoop BIG DATA ANALYTICSMap reduce in Hadoop BIG DATA ANALYTICS
Map reduce in Hadoop BIG DATA ANALYTICS
 
Stratosphere with big_data_analytics
Stratosphere with big_data_analyticsStratosphere with big_data_analytics
Stratosphere with big_data_analytics
 
Introduction to map reduce
Introduction to map reduceIntroduction to map reduce
Introduction to map reduce
 
Sawmill - Integrating R and Large Data Clouds
Sawmill - Integrating R and Large Data CloudsSawmill - Integrating R and Large Data Clouds
Sawmill - Integrating R and Large Data Clouds
 
MapReduce
MapReduceMapReduce
MapReduce
 
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop ClustersHDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
 
Hadoop
HadoopHadoop
Hadoop
 
Map reduce - simplified data processing on large clusters
Map reduce - simplified data processing on large clustersMap reduce - simplified data processing on large clusters
Map reduce - simplified data processing on large clusters
 

Recently uploaded

Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdf
Kamal Acharya
 
DeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesDeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakes
MayuraD1
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
Neometrix_Engineering_Pvt_Ltd
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Kandungan 087776558899
 
Digital Communication Essentials: DPCM, DM, and ADM .pptx
Digital Communication Essentials: DPCM, DM, and ADM .pptxDigital Communication Essentials: DPCM, DM, and ADM .pptx
Digital Communication Essentials: DPCM, DM, and ADM .pptx
pritamlangde
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
ssuser89054b
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
Epec Engineered Technologies
 

Recently uploaded (20)

Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdf
 
DeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesDeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakes
 
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
 
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARHAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
 
Online electricity billing project report..pdf
Online electricity billing project report..pdfOnline electricity billing project report..pdf
Online electricity billing project report..pdf
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
 
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best ServiceTamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
 
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptxOrlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
 
Work-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxWork-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptx
 
School management system project Report.pdf
School management system project Report.pdfSchool management system project Report.pdf
School management system project Report.pdf
 
Digital Communication Essentials: DPCM, DM, and ADM .pptx
Digital Communication Essentials: DPCM, DM, and ADM .pptxDigital Communication Essentials: DPCM, DM, and ADM .pptx
Digital Communication Essentials: DPCM, DM, and ADM .pptx
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS Lambda
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna Municipality
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
 
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
 
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
 

Map Reduce introduction (google white papers)

  • 1.
  • 2.  A simple programming model  Functional model  For large-scale data processing  Exploits large set of commodity computers  Executes process in distributed manner  Offers high availability
  • 3.  Lots of demands for very large scale data processing  A certain common themes for these demands  Lots of machines needed (scaling)  Two basic operations on the input ▪ Map ▪ Reduce
  • 4.  Map:  Accepts input key/value pair  Emits intermediate key/value pair  Reduce :  Accepts intermediate key/value* pair  Emits output key/value pair Very big data Result M A P R E D U C E Partitioning Function
  • 5. Very big data Split data Split data Split data Split data grep grep grep grep matches matches matches matches cat All matches
  • 6.
  • 7.
  • 8.  Map  Process a key/value pair to generate intermediate key/value pairs  Reduce  Merge all intermediate values associated with the same key  Partition  By default : hash(key) mod R  Well balanced
  • 9.  No reduce can begin until map is complete  Master must communicate locations of intermediate files  Tasks scheduled based on location of data  If map worker fails any time before reduce finishes, task must be completely rerun  MapReduce library does most of the hard work for us!
  • 10.  User to do list:  indicate: ▪ Input/output files ▪ M: number of map tasks ▪ R: number of reduce tasks ▪ W: number of machines  Write map and reduce functions  Submit the job
  • 11.  String Match, such as Grep  Reverse index  Count URL access frequency  Lots of examples in data mining
  • 12.
  • 13.  Provide a general-purpose model to simplify large-scale computation  Allow users to focus on the problem without worrying about details
  • 14.  Original paper (http://labs.google.com/papers/mapreduce.h tml)  On wikipedia (http://en.wikipedia.org/wiki/MapReduce)  Hadoop – MapReduce in Java (http://lucene.apache.org/hadoop/)  http://code.google.com/edu/parallel/mapred uce-tutorial.html