SlideShare a Scribd company logo
1 of 30
Lecture 2 – MapReduce: Theory and Implementation CSE 490h – Introduction to Distributed Computing, Spring 2007 Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 License.
Outline ,[object Object],[object Object]
Functional Programming Review ,[object Object],[object Object],[object Object],[object Object]
Functional Programming Review ,[object Object],[object Object],[object Object]
Functional Updates Do Not Modify Structures ,[object Object],[object Object],[object Object],The append() function above reverses a list, adds a new element to the front, and returns all of that, reversed, which appends an item.  But it  never modifies lst !
Functions Can Be Used As Arguments ,[object Object],It does not matter what f does to its argument; DoDouble() will do it twice. What is the type of this function?
Map ,[object Object],[object Object]
Fold ,[object Object],[object Object]
fold left vs. fold right ,[object Object],[object Object],[object Object],SML Implementation: fun foldl f a []  = a | foldl f a (x::xs) = foldl f (f(x, a)) xs fun foldr f a []  = a | foldr f a (x::xs) = f(x, (foldr f a xs))
Example ,[object Object],[object Object],[object Object]
Example (Solved) ,[object Object],[object Object],[object Object],[object Object],[object Object]
A More Complicated Fold Problem ,[object Object],[object Object],[object Object]
A More Complicated Map Problem ,[object Object],[object Object]
map Implementation ,[object Object],[object Object],fun map f []  = [] | map f (x::xs) = (f x) :: (map f xs)
Implicit Parallelism In map ,[object Object],[object Object],[object Object]
MapReduce
Motivation: Large Scale Data Processing ,[object Object],[object Object],[object Object]
MapReduce ,[object Object],[object Object],[object Object],[object Object]
Programming Model ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
map ,[object Object],[object Object]
reduce ,[object Object],[object Object],[object Object]
 
Parallelism ,[object Object],[object Object],[object Object],[object Object]
Example: Count word occurrences map(String input_key, String input_value): // input_key: document name  // input_value: document contents  for each  word w  in  input_value:  EmitIntermediate (w, "1");  reduce(String output_key, Iterator intermediate_values):  // output_key: a word  // output_values: a list of counts  int  result = 0;  for each  v  in  intermediate_values:  result += ParseInt(v); Emit (AsString(result));
Example vs. Actual Source Code ,[object Object],[object Object],[object Object],[object Object]
Locality ,[object Object],[object Object]
Fault Tolerance ,[object Object],[object Object],[object Object],[object Object],[object Object]
Optimizations ,[object Object],[object Object],[object Object],Why is it safe to redundantly execute map tasks? Wouldn’t this mess up the total computation?
Optimizations ,[object Object],[object Object],Under what conditions is it sound to use a combiner?
MapReduce Conclusions ,[object Object],[object Object],[object Object],[object Object]

More Related Content

What's hot

06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clustering06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clustering
Subhas Kumar Ghosh
 
SGF15--e-Poster-Shevrin-3273
SGF15--e-Poster-Shevrin-3273SGF15--e-Poster-Shevrin-3273
SGF15--e-Poster-Shevrin-3273
ThomsonReuters
 
MapReduce : Simplified Data Processing on Large Clusters
MapReduce : Simplified Data Processing on Large ClustersMapReduce : Simplified Data Processing on Large Clusters
MapReduce : Simplified Data Processing on Large Clusters
Abolfazl Asudeh
 

What's hot (20)

Map and Reduce
Map and ReduceMap and Reduce
Map and Reduce
 
Map Reduce Online
Map Reduce OnlineMap Reduce Online
Map Reduce Online
 
Introduction to MapReduce
Introduction to MapReduceIntroduction to MapReduce
Introduction to MapReduce
 
Data Visualization With R
Data Visualization With RData Visualization With R
Data Visualization With R
 
R studio
R studio R studio
R studio
 
06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clustering06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clustering
 
Mapreduce
MapreduceMapreduce
Mapreduce
 
SGF15--e-Poster-Shevrin-3273
SGF15--e-Poster-Shevrin-3273SGF15--e-Poster-Shevrin-3273
SGF15--e-Poster-Shevrin-3273
 
Join optimization in hive
Join optimization in hive Join optimization in hive
Join optimization in hive
 
MapReduce : Simplified Data Processing on Large Clusters
MapReduce : Simplified Data Processing on Large ClustersMapReduce : Simplified Data Processing on Large Clusters
MapReduce : Simplified Data Processing on Large Clusters
 
MapReduce: Simplified Data Processing on Large Clusters
MapReduce: Simplified Data Processing on Large ClustersMapReduce: Simplified Data Processing on Large Clusters
MapReduce: Simplified Data Processing on Large Clusters
 
Mapfilterreducepresentation
MapfilterreducepresentationMapfilterreducepresentation
Mapfilterreducepresentation
 
Hash map (java platform se 8 )
Hash map (java platform se 8 )Hash map (java platform se 8 )
Hash map (java platform se 8 )
 
Introduction to map reduce
Introduction to map reduceIntroduction to map reduce
Introduction to map reduce
 
Optimization for iterative queries on Mapreduce
Optimization for iterative queries on MapreduceOptimization for iterative queries on Mapreduce
Optimization for iterative queries on Mapreduce
 
Stack & heap
Stack & heap Stack & heap
Stack & heap
 
Map algebra
Map algebraMap algebra
Map algebra
 
Data Visualization With R: Learn To Combine Multiple Graphs
Data Visualization With R: Learn To Combine Multiple GraphsData Visualization With R: Learn To Combine Multiple Graphs
Data Visualization With R: Learn To Combine Multiple Graphs
 
Mi primer map reduce
Mi primer map reduceMi primer map reduce
Mi primer map reduce
 
Mi primer map reduce
Mi primer map reduceMi primer map reduce
Mi primer map reduce
 

Similar to Mapreduce: Theory and implementation

Distributed Computing Seminar - Lecture 2: MapReduce Theory and Implementation
Distributed Computing Seminar - Lecture 2: MapReduce Theory and ImplementationDistributed Computing Seminar - Lecture 2: MapReduce Theory and Implementation
Distributed Computing Seminar - Lecture 2: MapReduce Theory and Implementation
tugrulh
 
Functional Programming in F#
Functional Programming in F#Functional Programming in F#
Functional Programming in F#
Dmitri Nesteruk
 
Real World Haskell: Lecture 6
Real World Haskell: Lecture 6Real World Haskell: Lecture 6
Real World Haskell: Lecture 6
Bryan O'Sullivan
 

Similar to Mapreduce: Theory and implementation (20)

Lec2 Mapred
Lec2 MapredLec2 Mapred
Lec2 Mapred
 
Distributed Computing Seminar - Lecture 2: MapReduce Theory and Implementation
Distributed Computing Seminar - Lecture 2: MapReduce Theory and ImplementationDistributed Computing Seminar - Lecture 2: MapReduce Theory and Implementation
Distributed Computing Seminar - Lecture 2: MapReduce Theory and Implementation
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Hadoop Map Reduce
Hadoop Map ReduceHadoop Map Reduce
Hadoop Map Reduce
 
MapReduce-Notes.pdf
MapReduce-Notes.pdfMapReduce-Notes.pdf
MapReduce-Notes.pdf
 
Unit3 MapReduce
Unit3 MapReduceUnit3 MapReduce
Unit3 MapReduce
 
Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentation
 
Matlab1
Matlab1Matlab1
Matlab1
 
Stacks,queues,linked-list
Stacks,queues,linked-listStacks,queues,linked-list
Stacks,queues,linked-list
 
MapReduce
MapReduceMapReduce
MapReduce
 
Multinomial Logistic Regression with Apache Spark
Multinomial Logistic Regression with Apache SparkMultinomial Logistic Regression with Apache Spark
Multinomial Logistic Regression with Apache Spark
 
Alpine Spark Implementation - Technical
Alpine Spark Implementation - TechnicalAlpine Spark Implementation - Technical
Alpine Spark Implementation - Technical
 
MapReduce
MapReduceMapReduce
MapReduce
 
Functional Programming in F#
Functional Programming in F#Functional Programming in F#
Functional Programming in F#
 
Game of Life - Polyglot FP - Haskell - Scala - Unison - Part 3
Game of Life - Polyglot FP - Haskell - Scala - Unison - Part 3Game of Life - Polyglot FP - Haskell - Scala - Unison - Part 3
Game of Life - Polyglot FP - Haskell - Scala - Unison - Part 3
 
Introduction to MapReduce
Introduction to MapReduceIntroduction to MapReduce
Introduction to MapReduce
 
MapReduce.pptx
MapReduce.pptxMapReduce.pptx
MapReduce.pptx
 
MapReduce
MapReduceMapReduce
MapReduce
 
Lecture 2 part 3
Lecture 2 part 3Lecture 2 part 3
Lecture 2 part 3
 
Real World Haskell: Lecture 6
Real World Haskell: Lecture 6Real World Haskell: Lecture 6
Real World Haskell: Lecture 6
 

More from Sri Prasanna

More from Sri Prasanna (20)

Qr codes para tech radar
Qr codes para tech radarQr codes para tech radar
Qr codes para tech radar
 
Qr codes para tech radar 2
Qr codes para tech radar 2Qr codes para tech radar 2
Qr codes para tech radar 2
 
Test
TestTest
Test
 
Test
TestTest
Test
 
assds
assdsassds
assds
 
assds
assdsassds
assds
 
asdsa
asdsaasdsa
asdsa
 
dsd
dsddsd
dsd
 
About stacks
About stacksAbout stacks
About stacks
 
About Stacks
About  StacksAbout  Stacks
About Stacks
 
About Stacks
About  StacksAbout  Stacks
About Stacks
 
About Stacks
About  StacksAbout  Stacks
About Stacks
 
About Stacks
About  StacksAbout  Stacks
About Stacks
 
About Stacks
About  StacksAbout  Stacks
About Stacks
 
About Stacks
About StacksAbout Stacks
About Stacks
 
About Stacks
About StacksAbout Stacks
About Stacks
 
Network and distributed systems
Network and distributed systemsNetwork and distributed systems
Network and distributed systems
 
Introduction & Parellelization on large scale clusters
Introduction & Parellelization on large scale clustersIntroduction & Parellelization on large scale clusters
Introduction & Parellelization on large scale clusters
 
Other distributed systems
Other distributed systemsOther distributed systems
Other distributed systems
 
Distributed file systems
Distributed file systemsDistributed file systems
Distributed file systems
 

Recently uploaded

Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Recently uploaded (20)

Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 

Mapreduce: Theory and implementation

  • 1. Lecture 2 – MapReduce: Theory and Implementation CSE 490h – Introduction to Distributed Computing, Spring 2007 Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 License.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 17.
  • 18.
  • 19.
  • 20.
  • 21.
  • 22.  
  • 23.
  • 24. Example: Count word occurrences map(String input_key, String input_value): // input_key: document name // input_value: document contents for each word w in input_value: EmitIntermediate (w, "1"); reduce(String output_key, Iterator intermediate_values): // output_key: a word // output_values: a list of counts int result = 0; for each v in intermediate_values: result += ParseInt(v); Emit (AsString(result));
  • 25.
  • 26.
  • 27.
  • 28.
  • 29.
  • 30.