MapReduce
Simplified Data Processing on Large Clusters
Original Research by: Jeffrey Dean and Sanjay Ghemawat
Google Inc., Published in OSDI 2004
PRESENTATION BY: ABE ARREDONDO & JASON BEERE
UNIVERSITY OF TEXAS AT AUSTIN
GRADUATE SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING
EE 382N: DISTRIBUTED SYSTEMS OPT III PRC, FALL 2015
DR. VIJAY K. GARG, PROFESSOR; WEI-LUN HUNG, TEACHING ASSISTANT
SEPTEMBER 18TH, 2015
Agenda
• Introduction & Overview
• Motivation, Background, Examples
• Implementation
• Diagram
• Program Example
• Advantages, Disadvantages, Refinements, and Extensions
• Performance
• Conclusion
• References and Appendix
Introduction and Overview
•Motivation:
• Process lots of data, scalable to thousands of commodity CPUs, easy to use
•What does it do?
• Parallelization, Fault Tolerance, Load Balancing, I/O Scheduling, Monitoring
• Locality optimization: reduces the amount of data sent across the network
•Examples
• Web Search Service, Sorting, Data Mining, Machine Learning,
• Distributed Grep & Sort, Web Link-Graph Reversal, Inverted Indexes
•How and where is it used?
◦ Analytics, Maps, User Behavior, Retail Commercial Advertising, Social Media,
◦ Human Genome, Cancer Research, Facial Recognition, … Gov & Military, …
Monash Research Pub: http://www.dbms2.com/2008/08/26/known-applications-of-mapreduce/
Diagram
Snippet of code
map(String key, String value):
  // key: document name
  // value: document contents
  for each word w in value:
    EmitIntermediate(w, "1");

reduce(String key, Iterator values):
  // key: a word
  // values: a list of counts
  int result = 0;
  for each v in values:
    result += ParseInt(v);
  Emit(AsString(result));
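The word-count pseudocode can be tried out with a small single-process sketch. The Python below is only an illustration, not the Google library: `map_fn`, `reduce_fn`, and the `run_word_count` driver are hypothetical names, and the in-memory dictionary stands in for the distributed shuffle that groups intermediate values by key.

```python
from collections import defaultdict


def map_fn(key, value):
    """key: document name, value: document contents. Yields (word, "1")."""
    for word in value.split():
        yield word, "1"


def reduce_fn(key, values):
    """key: a word, values: an iterable of counts encoded as strings."""
    return str(sum(int(v) for v in values))


def run_word_count(documents):
    """Hypothetical driver: map every document, group by word, then reduce."""
    intermediate = defaultdict(list)
    for name, contents in documents.items():
        for word, count in map_fn(name, contents):
            intermediate[word].append(count)
    # Keys are visited in sorted order, one reduce call per distinct word.
    return {word: reduce_fn(word, intermediate[word]) for word in sorted(intermediate)}


docs = {"a.txt": "the quick fox", "b.txt": "the lazy dog the end"}
counts = run_word_count(docs)
print(counts)
```

As in the pseudocode, counts travel as strings between map and reduce; a real job would more likely use integers, which is a design choice rather than a requirement.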
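Advantages, Disadvantages, Refinements, and Extensions
•Master Fault Tolerance
• Master failure is unlikely; if it happens, a new copy of the master can be started from its last checkpoint
•Worker Failure
• Master redirects tasks to other workers
• Failure detection: master pings workers periodically (heartbeat)
• Re-execute completed and in-progress map tasks (completed map output sits on the failed machine's local disk)
• Re-execute in-progress reduce tasks (completed reduce output is already in the global file system)
• Task completion committed through master
•Straggler Machine Delays
• e.g., a machine with a bad disk
•Solution: master schedules backup executions of the remaining in-progress tasks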
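The worker-failure rules above can be sketched as a small bookkeeping function. This is a simplified model under stated assumptions, not the actual master implementation: the `Task` record, its state names, and `reschedule_after_failure` are hypothetical, and failure detection itself (the periodic ping) is taken as given.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    kind: str                      # "map" or "reduce"
    state: str                     # "idle", "in_progress", or "completed"
    worker: Optional[str] = None   # worker the task is (or was) assigned to


def reschedule_after_failure(tasks, failed_worker):
    """Reset every task that must run again after failed_worker is declared dead."""
    reset = []
    for task in tasks:
        if task.worker != failed_worker:
            continue
        # Completed map output lived on the failed machine's local disk, so it is lost.
        lost_map_output = task.kind == "map" and task.state == "completed"
        if task.state == "in_progress" or lost_map_output:
            task.state, task.worker = "idle", None
            reset.append(task)
    return reset


tasks = [
    Task("map", "completed", "w1"),
    Task("map", "in_progress", "w1"),
    Task("reduce", "in_progress", "w1"),
    Task("reduce", "completed", "w1"),   # stays completed: output is in the global file system
    Task("map", "completed", "w2"),      # untouched: different worker
]
reset = reschedule_after_failure(tasks, "w1")
print(len(reset), "tasks returned to the idle pool")
```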
Advantages, Disadvantages, Refinements, and Extensions 2
Refinements:
◦ Ordering Guarantees: within a partition, intermediate key/value pairs are guaranteed to be processed in increasing key order.
◦ Skipping bad records: a failing worker reports the offending record to the master in a UDP packet, and the master tells re-executions to skip records that fail repeatedly
◦ Combiner Function: partial combining on the map side speeds up MapReduce operations
◦ Local execution and local debugging tools (gdb)
◦ Status Info: Master runs an internal HTTP server and exports status pages.
◦ User-defined Counters facility used for sanity checking
Task Granularity and Pipelining
◦ Many more map tasks than machines
◦ Minimizes time for fault recovery
◦ Shuffling can be pipelined with map execution
◦ Dynamic load balancing
◦ Example: 200,000 map tasks and 5,000 reduce tasks on 2,000 machines
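To make the Combiner Function refinement concrete, here is a minimal sketch assuming a word-count job over two input splits. The function names are hypothetical; the point is only that partially summing on each map worker shrinks the number of intermediate pairs sent over the network without changing the final result.

```python
from collections import Counter


def map_words(text):
    """Map step: one (word, 1) pair per word in the split."""
    return [(word, 1) for word in text.split()]


def combine(pairs):
    """Combiner: partial sum per word, run locally on the map worker."""
    partial = Counter()
    for word, n in pairs:
        partial[word] += n
    return list(partial.items())


def reduce_counts(all_pairs):
    """Reduce step: final sum per word across all map outputs."""
    total = Counter()
    for word, n in all_pairs:
        total[word] += n
    return dict(total)


splits = ["to be or not to be", "to do or not to do"]
raw = [map_words(s) for s in splits]
combined = [combine(pairs) for pairs in raw]

pairs_without = sum(len(pairs) for pairs in raw)        # pairs shuffled with no combiner
pairs_with = sum(len(pairs) for pairs in combined)      # pairs shuffled with a combiner

result_without = reduce_counts(p for part in raw for p in part)
result_with = reduce_counts(p for part in combined for p in part)
print(pairs_without, pairs_with, result_with == result_without)
```

This works for word count because addition is associative and commutative; a reducer without those properties could not safely be reused as a combiner.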
Grep Performance
•Tests run on cluster of 1800 machines:
• 4 GB of memory
• Dual-processor 2 GHz Xeons with Hyper-Threading
• Dual 160 GB IDE disks
• Gigabit Ethernet per machine
• Bisection bandwidth approximately 100 Gbps
•Two benchmarks:
• MR_Grep: scan 10^10 100-byte records to extract records matching a rare pattern (92K matching records)
• MR_Sort: sort 10^10 100-byte records (modeled after the TeraSort benchmark)
•Locality optimization helped
• 1800 machines read 1 TB of data at peak ~31 GB/s
• Without this, rack switches would limit to 10 GB/s
• Startup overhead is significant for short jobs
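The figures on this slide can be cross-checked with a little arithmetic. The sketch below assumes decimal units (1 GB = 10^9 bytes) and, as an idealization, that the peak rate is sustained for the whole scan; real runs take longer because the rate ramps up and down and startup overhead is added on top.

```python
RECORDS = 10**10
RECORD_BYTES = 100
GB = 10**9  # decimal gigabyte, an assumption about the slide's units

total_bytes = RECORDS * RECORD_BYTES      # input size of each benchmark
peak_rate = 31 * GB                       # bytes/s with the locality optimization
rack_limited_rate = 10 * GB               # bytes/s if rack switches were the bottleneck

best_case_scan_s = total_bytes / peak_rate
rack_limited_scan_s = total_bytes / rack_limited_rate
per_machine_mb_s = peak_rate / 1800 / 10**6

print(f"input size: {total_bytes / 10**12:.0f} TB")
print(f"scan at peak rate: {best_case_scan_s:.1f} s")
print(f"scan at rack-limited rate: {rack_limited_scan_s:.1f} s")
print(f"per-machine read rate at peak: {per_machine_mb_s:.1f} MB/s")
```

The input works out to the 1 TB quoted on the slide, and the roughly threefold gap between the two scan times is the benefit attributed to reading from local disks.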
Sort Performance
Conclusion
MapReduce has proven to be a useful abstraction
Greatly simplifies large-scale computations at Google
Easy to use: focus on problem, let library deal with messy details
• Parallelization, Fault Tolerance, Load Balancing, I/O Scheduling, Monitoring
• Locality optimization: reduces the amount of data sent across the network
New code is simpler, easier to understand
MapReduce takes care of failures, slow machines
Easy to make indexing faster by adding more machines
Appendix and References
One Final Thought
In pioneer days they used oxen for heavy pulling, and when one ox couldn't budge a log, they didn't try to grow a larger ox. We shouldn't be trying for bigger computers, but for more systems of computers.
- Grace Hopper
Questions?
Errata
Sam’s Mother
Believed “an apple a day keeps a doctor away”
[Figure: Mother, Sam, An Apple]
(3) Ekanayake
One day
Sam thought of “drinking” the apple
• He used a [knife] to cut the [apple] and a [blender] to make juice.
(3) Ekanayake
Next Day
Sam applied his invention to all the fruits he could find in the fruit basket
• (map '( [fruits] )) gives ( [cut fruits] )
• (reduce '( [cut fruits] ))
Classical Notion of MapReduce in Functional Programming
A list of values mapped into another list of values, which gets reduced into a single value
(3) Ekanayake
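The classical functional notion described on this slide can be shown with Python's built-in `map` and `functools.reduce`. The `cut` and `blend` functions are hypothetical stand-ins for Sam's knife and blender; strings are used only so the intermediate list and the final single value are easy to inspect.

```python
from functools import reduce

fruits = ["apple", "orange", "pear"]


def cut(fruit):
    """The knife: applied independently to each fruit."""
    return f"{fruit} pieces"


def blend(juice, pieces):
    """The blender: folds the next cut fruit into the juice so far."""
    return f"{juice} + {pieces}"


pieces = list(map(cut, fruits))   # a list of values mapped into another list of values
juice = reduce(blend, pieces)     # the list reduced into a single value

print(pieces)
print(juice)
```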
18 Years Later
Sam got his first job in JuiceRUs for his talent in making juice
• Now, it’s not just one basket but a whole container of fruits
• Also, they produce a list of juice types separately
Large data and list of values for output
• But, Sam had just ONE [knife] and ONE [blender]
NOT ENOUGH!!
Wait!
(3) Ekanayake
Brave Sam
Implemented a parallel version of his innovation
Each input to a map is a list of <key, value> pairs: (<a, [fruit]>, <o, [fruit]>, <p, [fruit]>, …)
Each output of a map is a list of <key, value> pairs: (<a’, [cut fruit]>, <o’, [cut fruit]>, <p’, [cut fruit]>, …)
Grouped by key
Each input to a reduce is a <key, value-list> (possibly a list of these, depending on the grouping/hashing mechanism), e.g. <a’, ( [cut fruit] …)>
Reduced into a list of values
(3) Ekanayake
Brave Sam
Implemented a parallel version of his innovation
The idea of MapReduce in Data Intensive Computing
A list of <key, value> pairs mapped into another list of <key, value> pairs which gets grouped by the key and reduced into a list of values
(3) Ekanayake
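The data-intensive form of the idea, <key, value> pairs mapped, grouped by key, and reduced into a list of values, can be sketched in a few lines. Everything named here is an assumption made for illustration: `map_reduce`, `partition`, `cut`, and `blend` are hypothetical, and `zlib.crc32` is used as the grouping hash only because it is stable across runs.

```python
from collections import defaultdict
from zlib import crc32


def partition(key, num_reducers):
    """Assign a key to a reducer with a stable hash (crc32 is an arbitrary choice)."""
    return crc32(key.encode()) % num_reducers


def map_reduce(pairs, map_fn, reduce_fn, num_reducers=2):
    # Map: each input pair yields zero or more intermediate <key, value> pairs,
    # which are grouped by key inside the partition chosen by the hash.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        for k2, v2 in map_fn(key, value):
            partitions[partition(k2, num_reducers)][k2].append(v2)
    # Reduce: one call per <key, value-list>, keys visited in increasing order per partition.
    output = []
    for groups in partitions:
        for k2 in sorted(groups):
            output.append(reduce_fn(k2, groups[k2]))
    return output


def cut(key, fruit):
    """The knife: <a, apple> becomes <a', apple pieces>."""
    yield key + "'", f"{fruit} pieces"


def blend(key, pieces):
    """The blender: one result per key, here just the key and how many fruits went in."""
    return key, len(pieces)


basket = [("a", "apple"), ("o", "orange"), ("a", "apple"), ("p", "pear")]
juices = map_reduce(basket, cut, blend)
print(sorted(juices))
```

The order of `juices` depends on which partition each key hashes to, which is why the example sorts before printing.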
Afterwards
Sam realized,
◦ To create his favorite mix fruit juice he can use a combiner after the reducers
◦ If several <key, value-list> fall into the same group (based on the grouping/hashing algorithm) then use the blender (reducer) separately on each of them
◦ The knife (mapper) and blender (reducer) should not contain residue after use: Side Effect Free
◦ In general reducer should be associative and commutative
(3) Ekanayake
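Why the reducer should be associative and commutative can be seen with a quick spot check: values for a key may arrive in any order, so the result should not depend on it. The helper below is a hypothetical illustration that folds a small list in every possible order; it is a sanity check on sample data, not a proof of either property.

```python
import operator
from functools import reduce
from itertools import permutations


def order_insensitive(reducer, values):
    """True if left-folding `values` in every order gives a single distinct result."""
    results = {reduce(reducer, order) for order in permutations(values)}
    return len(results) == 1


values = [5, 3, 2]
add_ok = order_insensitive(operator.add, values)   # addition is associative and commutative
sub_ok = order_insensitive(operator.sub, values)   # subtraction is neither

print("add:", add_ok)
print("sub:", sub_ok)
```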
References
(1) MapReduce: Simplified Data Processing on Large Clusters, by Jeffrey Dean and Sanjay Ghemawat. Presentation: http://research.google.com/archive/mapreduce-osdi04-slides/index-auto-0002.html
(2) Map-Reduce Meets Wider Varieties of Applications, by Chen, Schlosser: Intel. http://www.cs.cmu.edu/~chensm/papers/IRP-TR-08-05.pdf
(3) MapReduce: Story of Sam, by Saliya Ekanayake, SALSA HPC Group, Pervasive Technology Institute, Indiana University, Bloomington. http://www.slideshare.net/esaliya/mapreduce-in-simple-terms
(4) Monash Research Pub: http://www.dbms2.com/2008/08/26/known-applications-of-mapreduce/
(5) Wikipedia: https://en.wikipedia.org/wiki/MapReduce#References
