• A simple programming model
• A functional model
• For large-scale data processing
• Exploits a large set of commodity computers
• Executes processes in a distributed manner
• Offers high availability
• Heavy demand for very large-scale data processing
• These demands share certain common themes
• Lots of machines needed (scaling)
• Two basic operations on the input
▪ Map
▪ Reduce
• Map:
▪ Accepts an input key/value pair
▪ Emits intermediate key/value pairs
• Reduce:
▪ Accepts an intermediate key together with all values for that key (key/value*)
▪ Emits output key/value pairs
[Figure: very big data → MAP → partitioning function → REDUCE → result]
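
The map and reduce signatures on this slide can be sketched in a few lines of Python. This is a minimal single-process illustration, not the library described in the deck: the `map_reduce` driver, the "city,temperature" record format, and all function names are assumptions made up for the example.

```python
from collections import defaultdict
from typing import Iterator, List, Tuple


def map_fn(key: int, value: str) -> Iterator[Tuple[str, int]]:
    """Map: accept one input key/value pair, emit intermediate pairs.

    key is a line number (unused here); value is a hypothetical
    'city,temperature' record.
    """
    city, temp = value.split(",")
    yield city.strip(), int(temp)


def reduce_fn(key: str, values: List[int]) -> Iterator[Tuple[str, int]]:
    """Reduce: accept an intermediate key with all of its values (key/value*)."""
    yield key, max(values)


def map_reduce(records, map_fn, reduce_fn):
    """Sequential stand-in for the framework: map, group by key, reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for ikey, ivalue in map_fn(key, value):
            groups[ikey].append(ivalue)
    output = []
    for ikey in sorted(groups):
        output.extend(reduce_fn(ikey, groups[ikey]))
    return output


records = [(0, "Oslo,3"), (1, "Cairo,31"), (2, "Oslo,7"), (3, "Cairo,28")]
print(map_reduce(records, map_fn, reduce_fn))  # [('Cairo', 31), ('Oslo', 7)]
```

The grouping step in the middle is what the framework does between the two user functions; the user only writes `map_fn` and `reduce_fn`.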
[Figure: distributed grep. Very big data is split into four pieces; grep runs on each piece and produces matches; cat combines them into all matches.]
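
The grep pipeline in the figure can be imitated in one process. The sketch below assumes the input is a list of text lines and runs the per-split `grep` calls one after another; in a real deployment each split would be handled by a different machine. The helper names `split_data`, `grep`, `cat`, and `distributed_grep` are illustrative only.

```python
import re
from typing import Iterable, List


def split_data(lines: List[str], num_splits: int) -> List[List[str]]:
    """Cut the input into at most num_splits contiguous chunks."""
    size = max(1, -(-len(lines) // num_splits))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def grep(pattern: str, chunk: List[str]) -> List[str]:
    """Map step: keep the lines of one chunk that match the pattern."""
    regex = re.compile(pattern)
    return [line for line in chunk if regex.search(line)]


def cat(match_lists: Iterable[List[str]]) -> List[str]:
    """Reduce step: concatenate the per-chunk matches."""
    return [line for matches in match_lists for line in matches]


def distributed_grep(pattern: str, lines: List[str], num_splits: int = 4) -> List[str]:
    chunks = split_data(lines, num_splits)
    return cat(grep(pattern, chunk) for chunk in chunks)


log = [
    "error: disk full",
    "ok",
    "warning: slow",
    "error: timeout",
    "ok",
    "error: retry",
]
print(distributed_grep(r"^error", log))
```

Because the chunks are contiguous and concatenated in order, the matches come out in their original order.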
• Map
▪ Processes a key/value pair to generate intermediate key/value pairs
• Reduce
▪ Merges all intermediate values associated with the same key
• Partition
▪ By default: hash(key) mod R
▪ Well balanced
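
A minimal sketch of the default partitioning rule, hash(key) mod R, follows. It uses `zlib.crc32` as the hash, which is an assumption for illustration: Python's built-in `hash()` on strings is salted per process, so separate mapper processes would not agree on a key's partition. The slide does not say which hash function the library uses.

```python
import zlib
from collections import Counter


def partition(key: str, num_reduce_tasks: int) -> int:
    """Return the reduce task (0 .. R-1) responsible for this key."""
    return zlib.crc32(key.encode("utf-8")) % num_reduce_tasks


R = 4  # number of reduce tasks
keys = [f"url-{i}" for i in range(1000)]  # made-up keys for the demo
load = Counter(partition(k, R) for k in keys)
for task in range(R):
    print(f"reduce task {task}: {load[task]} keys")
```

Every occurrence of a key maps to the same reduce task, which is what lets Reduce see all values for that key in one place.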
• No reduce can begin until the map phase is complete
• The master must communicate the locations of intermediate files
• Tasks are scheduled based on the location of the data
• If a map worker fails at any time before reduce finishes, its task must be completely rerun
• The MapReduce library does most of the hard work for us!
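
Two of the points above, the barrier between map and reduce and the complete rerun of a failed map task, can be illustrated with a toy scheduler. Everything here is hypothetical: the function names, the retry limit, and the simulated failure are assumptions, and the sketch only models a task crashing mid-run rather than a whole machine being lost.

```python
from typing import Callable, Dict, List

MapTask = Callable[[List[int]], List[int]]


def run_map_phase(splits: List[List[int]], map_task: MapTask,
                  max_attempts: int = 3) -> Dict[int, List[int]]:
    """Run every map task to completion, rerunning a failed task from scratch."""
    outputs: Dict[int, List[int]] = {}
    for task_id, split in enumerate(splits):
        for attempt in range(1, max_attempts + 1):
            try:
                # Output is recorded only if the whole task succeeds;
                # partial results of a failed attempt are thrown away.
                outputs[task_id] = map_task(split)
                break
            except RuntimeError:
                if attempt == max_attempts:
                    raise
    return outputs


def run_job(splits: List[List[int]], map_task: MapTask) -> int:
    # Barrier: reduce starts only after run_map_phase has returned,
    # that is, after every map task has finished.
    map_outputs = run_map_phase(splits, map_task)
    return sum(sum(values) for values in map_outputs.values())


def make_flaky_square_map(bad_value: int) -> MapTask:
    """Build a map task that squares numbers but crashes once on bad_value."""
    state = {"crashed": False}

    def map_task(split: List[int]) -> List[int]:
        out = []
        for x in split:
            if x == bad_value and not state["crashed"]:
                state["crashed"] = True
                raise RuntimeError("simulated worker failure")
            out.append(x * x)
        return out

    return map_task


print(run_job([[1, 2], [3, 4], [5, 6]], make_flaky_square_map(4)))  # 91
```

The second split fails halfway through on its first attempt, is rerun in full, and the final sum is the same as if no failure had occurred.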
• User to-do list:
• Indicate:
▪ Input/output files
▪ M: number of map tasks
▪ R: number of reduce tasks
▪ W: number of machines
• Write the map and reduce functions
• Submit the job
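
The to-do list above maps naturally onto a small job description. The sketch below is a hypothetical local stand-in, not the interface of any real MapReduce library: `JobSpec`, `submit`, and the in-memory `inputs` dictionary (used in place of input files) are all assumptions, and W is recorded but unused because everything runs in one process.

```python
import zlib
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass
class JobSpec:
    inputs: Dict[str, str]   # file name -> contents, standing in for input files
    num_map_tasks: int       # M
    num_reduce_tasks: int    # R
    num_machines: int        # W (recorded only in this single-process sketch)
    map_fn: Callable[[str, str], Iterable[Tuple[str, int]]]
    reduce_fn: Callable[[str, List[int]], int]


def submit(job: JobSpec) -> List[Dict[str, int]]:
    """Run the job locally; return one output dict per reduce task."""
    records = [(name, line)
               for name, text in sorted(job.inputs.items())
               for line in text.splitlines()]
    # Deal the records out to M map tasks.
    map_inputs = [records[i::job.num_map_tasks] for i in range(job.num_map_tasks)]
    # One bucket of grouped intermediate values per reduce task.
    buckets = [defaultdict(list) for _ in range(job.num_reduce_tasks)]
    for task_records in map_inputs:
        for key, value in task_records:
            for ikey, ivalue in job.map_fn(key, value):
                r = zlib.crc32(ikey.encode("utf-8")) % job.num_reduce_tasks
                buckets[r][ikey].append(ivalue)
    return [{k: job.reduce_fn(k, vs) for k, vs in sorted(b.items())}
            for b in buckets]


def count_words(name: str, line: str):
    for word in line.split():
        yield word, 1


job = JobSpec(
    inputs={"a.txt": "to be or\nnot to be", "b.txt": "to do"},
    num_map_tasks=2,
    num_reduce_tasks=3,
    num_machines=1,
    map_fn=count_words,
    reduce_fn=lambda word, ones: sum(ones),
)
for task_id, output in enumerate(submit(job)):
    print(task_id, output)
```

Each reduce task produces its own output, so a job with R reduce tasks yields R result sets that together cover every key exactly once.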
• String matching, such as grep
• Reverse index
• Counting URL access frequency
• Lots of examples in data mining
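
As one concrete case from the list, counting URL access frequency can be sketched as follows. The log format ("METHOD URL STATUS") and the function names are assumptions for the example; sorting followed by `itertools.groupby` stands in for the framework's grouping of intermediate pairs by key.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterator, List, Tuple


def map_access(log_line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (url, 1) for each well-formed request line."""
    parts = log_line.split()
    if len(parts) >= 2:
        yield parts[1], 1


def reduce_count(url: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: add up all the counts emitted for one URL."""
    return url, sum(counts)


def url_access_frequency(log_lines: List[str]) -> List[Tuple[str, int]]:
    # Sorting brings equal URLs together so groupby can collect their values.
    intermediate = sorted(pair for line in log_lines for pair in map_access(line))
    return [reduce_count(url, [count for _, count in group])
            for url, group in groupby(intermediate, key=itemgetter(0))]


logs = [
    "GET /index.html 200",
    "GET /about.html 200",
    "POST /login 302",
    "GET /index.html 304",
    "malformed",
]
print(url_access_frequency(logs))
# [('/about.html', 1), ('/index.html', 2), ('/login', 1)]
```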
• Provides a general-purpose model to simplify large-scale computation
• Allows users to focus on the problem without worrying about the details
• Original paper (http://labs.google.com/papers/mapreduce.html)
• On Wikipedia (http://en.wikipedia.org/wiki/MapReduce)
• Hadoop – MapReduce in Java (http://lucene.apache.org/hadoop/)
• http://code.google.com/edu/parallel/mapreduce-tutorial.html

Map Reduce introduction (google white papers)
