SlideShare a Scribd company logo
1 of 66
Distributed Computing & MapReduce Presented by:  Abdul Qadeer
Today’s Agenda ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Feel free to comment / ask questions anytime!
Distributed Computing
Distributed Computing ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Distributed Computing - What ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Distributed Computing - Why ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Distributed Computing - Why ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Distributed Computing ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Clusters
How Clusters are Built? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
How Clusters are Built? ,[object Object],[object Object],http://www.flickr.com/photos/drurydrama/;http://www.fotosearch.com/photos-images/ox.html
How Clusters are Built? Table generated at : http://top500.org/stats/list/37/archtype
How Clusters are Built? ,[object Object]
How Clusters are Built? ,[object Object]
How Clusters are Used? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
How Clusters are Used? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
MapReduce
The Problem at Hand! ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
The Goal ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Elaboration by Example ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Solution 1 ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
What is wrong with Solution1? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
How to improve on solution1? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],More cons than pros! No guarantee of speed up.
Solution2 ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Solution 2 Elaboration ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
What is wrong with Solution2? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
How to improve solution2 ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],So ideally we need a solution which takes care of messy / complicated parallelism / distributed computing details
What is MapReduce ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Nuts and Bolts of MapReduce Fig. 1 taken from OSDI 2004 paper:MapReduce: Simplified Data Processing on Large Clusters App. programmer write mapper and reducer code.  Rest is automatic!  Messy details of parallelism, scalability, fault tolerance is taken care of by the framework.
Word Count Example ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Word Count Example ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Word Count Example ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Word Count Example ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Word Count Example – A Quiz ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Why Split Data ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
The Power of Splitting
Fault Tolerance ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Backup Tasks ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Performance due to Backup Tasks Fig. 3 taken from OSDI 2004 paper:MapReduce: Simplified Data Processing on Large Clusters
Combiners ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Example Usage of MapReduce ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Web Search ,[object Object],Fig taken from OSDI 2004 paper:MapReduce: Simplified Data Processing on Large Clusters
Yahoo Now Using MapReduce ,[object Object]
Case Studies
Case Study 1: New York Times ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Case Study 2: IPv4 Census ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Case Study 2: IPv4 Census ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Hadoop
MapReduce in Action! ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Hadoop and HPCNL Cluster Lets have a visual tour of Hadoop!
Cluster Summary
Worker State
Job Status
Graphical Progress Report
Speedy Machines do More Work
Backup jobs
Failures are Frequent
Backwards Progress
Backwards Progress
Blacklisting
Slow Machines
Stragglers
Multiple Backup Tasks
Sequencial WordCount
Word Counting Mapper and Reducer ,[object Object],[object Object]
Concluding Remarks ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Any more questions?

More Related Content

Viewers also liked

Web data management (chapter-1)
Web data management (chapter-1)Web data management (chapter-1)
Web data management (chapter-1)Dhaval Asodariya
 
Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ...
Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ...Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ...
Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ...Frederic Desprez
 
International Journal of Distributed Computing and Technology vol 2 issue 1
International Journal of Distributed Computing and Technology vol 2 issue 1International Journal of Distributed Computing and Technology vol 2 issue 1
International Journal of Distributed Computing and Technology vol 2 issue 1JournalsPub www.journalspub.com
 
Distributed-ness: Distributed computing & the clouds
Distributed-ness: Distributed computing & the cloudsDistributed-ness: Distributed computing & the clouds
Distributed-ness: Distributed computing & the cloudsRobert Coup
 
Grid – Distributed Computing at Scale
Grid – Distributed Computing at ScaleGrid – Distributed Computing at Scale
Grid – Distributed Computing at Scaleroyans
 
Consensus in distributed computing
Consensus in distributed computingConsensus in distributed computing
Consensus in distributed computingRuben Tan
 
Distributed computing the Google way
Distributed computing the Google wayDistributed computing the Google way
Distributed computing the Google wayEduard Hildebrandt
 
High performance data center computing using manageable distributed computing
High performance data center computing using manageable distributed computingHigh performance data center computing using manageable distributed computing
High performance data center computing using manageable distributed computingJuniper Networks
 
Introduction to OpenDaylight & Application Development
Introduction to OpenDaylight & Application DevelopmentIntroduction to OpenDaylight & Application Development
Introduction to OpenDaylight & Application DevelopmentMichelle Holley
 
Concepts of Distributed Computing & Cloud Computing
Concepts of Distributed Computing & Cloud Computing Concepts of Distributed Computing & Cloud Computing
Concepts of Distributed Computing & Cloud Computing Hitesh Kumar Markam
 
Load Balancing In Distributed Computing
Load Balancing In Distributed ComputingLoad Balancing In Distributed Computing
Load Balancing In Distributed ComputingRicha Singh
 
Grid computing notes
Grid computing notesGrid computing notes
Grid computing notesSyed Mustafa
 
OpenStack and OpenDaylight: An Integrated IaaS for SDN/NFV
OpenStack and OpenDaylight: An Integrated IaaS for SDN/NFVOpenStack and OpenDaylight: An Integrated IaaS for SDN/NFV
OpenStack and OpenDaylight: An Integrated IaaS for SDN/NFVCloud Native Day Tel Aviv
 

Viewers also liked (16)

Web data management (chapter-1)
Web data management (chapter-1)Web data management (chapter-1)
Web data management (chapter-1)
 
Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ...
Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ...Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ...
Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ...
 
International Journal of Distributed Computing and Technology vol 2 issue 1
International Journal of Distributed Computing and Technology vol 2 issue 1International Journal of Distributed Computing and Technology vol 2 issue 1
International Journal of Distributed Computing and Technology vol 2 issue 1
 
Distributed-ness: Distributed computing & the clouds
Distributed-ness: Distributed computing & the cloudsDistributed-ness: Distributed computing & the clouds
Distributed-ness: Distributed computing & the clouds
 
Grid – Distributed Computing at Scale
Grid – Distributed Computing at ScaleGrid – Distributed Computing at Scale
Grid – Distributed Computing at Scale
 
Consensus in distributed computing
Consensus in distributed computingConsensus in distributed computing
Consensus in distributed computing
 
Distributed computing the Google way
Distributed computing the Google wayDistributed computing the Google way
Distributed computing the Google way
 
BitCoin, P2P, Distributed Computing
BitCoin, P2P, Distributed ComputingBitCoin, P2P, Distributed Computing
BitCoin, P2P, Distributed Computing
 
High performance data center computing using manageable distributed computing
High performance data center computing using manageable distributed computingHigh performance data center computing using manageable distributed computing
High performance data center computing using manageable distributed computing
 
Introduction to OpenDaylight & Application Development
Introduction to OpenDaylight & Application DevelopmentIntroduction to OpenDaylight & Application Development
Introduction to OpenDaylight & Application Development
 
Concepts of Distributed Computing & Cloud Computing
Concepts of Distributed Computing & Cloud Computing Concepts of Distributed Computing & Cloud Computing
Concepts of Distributed Computing & Cloud Computing
 
Load Balancing In Distributed Computing
Load Balancing In Distributed ComputingLoad Balancing In Distributed Computing
Load Balancing In Distributed Computing
 
Grid computing notes
Grid computing notesGrid computing notes
Grid computing notes
 
Distributed Computing
Distributed ComputingDistributed Computing
Distributed Computing
 
OpenStack and OpenDaylight: An Integrated IaaS for SDN/NFV
OpenStack and OpenDaylight: An Integrated IaaS for SDN/NFVOpenStack and OpenDaylight: An Integrated IaaS for SDN/NFV
OpenStack and OpenDaylight: An Integrated IaaS for SDN/NFV
 
Distributed Computing
Distributed ComputingDistributed Computing
Distributed Computing
 

Similar to Distributed Computing & MapReduce

Distributed computing poli
Distributed computing poliDistributed computing poli
Distributed computing poliivascucristian
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation HadoopVarun Narang
 
Distributed Systems: scalability and high availability
Distributed Systems: scalability and high availabilityDistributed Systems: scalability and high availability
Distributed Systems: scalability and high availabilityRenato Lucindo
 
Hadoop training-in-hyderabad
Hadoop training-in-hyderabadHadoop training-in-hyderabad
Hadoop training-in-hyderabadsreehari orienit
 
MapReduce on Zero VM
MapReduce on Zero VM MapReduce on Zero VM
MapReduce on Zero VM Joy Rahman
 
Simplified Data Processing On Large Cluster
Simplified Data Processing On Large ClusterSimplified Data Processing On Large Cluster
Simplified Data Processing On Large ClusterHarsh Kevadia
 
Migration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming ModelsMigration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming ModelsZvi Avraham
 
Dori Exterman, Considerations for choosing the parallel computing strategy th...
Dori Exterman, Considerations for choosing the parallel computing strategy th...Dori Exterman, Considerations for choosing the parallel computing strategy th...
Dori Exterman, Considerations for choosing the parallel computing strategy th...Sergey Platonov
 
Hadoop bigdata overview
Hadoop bigdata overviewHadoop bigdata overview
Hadoop bigdata overviewharithakannan
 
Google Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 DayGoogle Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 Dayprogrammermag
 
Art Of Distributed P0
Art Of Distributed P0Art Of Distributed P0
Art Of Distributed P0George Ang
 
Intermachine Parallelism
Intermachine ParallelismIntermachine Parallelism
Intermachine ParallelismSri Prasanna
 
Hadoop live online training
Hadoop live online trainingHadoop live online training
Hadoop live online trainingHarika583
 
Choosing the right parallel compute architecture
Choosing the right parallel compute architecture Choosing the right parallel compute architecture
Choosing the right parallel compute architecture corehard_by
 
Nonrelational Databases
Nonrelational DatabasesNonrelational Databases
Nonrelational DatabasesUdi Bauman
 
Architecting and productionising data science applications at scale
Architecting and productionising data science applications at scaleArchitecting and productionising data science applications at scale
Architecting and productionising data science applications at scalesamthemonad
 
Hadoop Tutorial.ppt
Hadoop Tutorial.pptHadoop Tutorial.ppt
Hadoop Tutorial.pptSathish24111
 
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...Reynold Xin
 

Similar to Distributed Computing & MapReduce (20)

Distributed computing poli
Distributed computing poliDistributed computing poli
Distributed computing poli
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
 
Hadoop
HadoopHadoop
Hadoop
 
Distributed Systems: scalability and high availability
Distributed Systems: scalability and high availabilityDistributed Systems: scalability and high availability
Distributed Systems: scalability and high availability
 
Hadoop training-in-hyderabad
Hadoop training-in-hyderabadHadoop training-in-hyderabad
Hadoop training-in-hyderabad
 
MapReduce on Zero VM
MapReduce on Zero VM MapReduce on Zero VM
MapReduce on Zero VM
 
Simplified Data Processing On Large Cluster
Simplified Data Processing On Large ClusterSimplified Data Processing On Large Cluster
Simplified Data Processing On Large Cluster
 
Migration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming ModelsMigration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming Models
 
Dori Exterman, Considerations for choosing the parallel computing strategy th...
Dori Exterman, Considerations for choosing the parallel computing strategy th...Dori Exterman, Considerations for choosing the parallel computing strategy th...
Dori Exterman, Considerations for choosing the parallel computing strategy th...
 
Hadoop bigdata overview
Hadoop bigdata overviewHadoop bigdata overview
Hadoop bigdata overview
 
Google Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 DayGoogle Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 Day
 
Art Of Distributed P0
Art Of Distributed P0Art Of Distributed P0
Art Of Distributed P0
 
Intermachine Parallelism
Intermachine ParallelismIntermachine Parallelism
Intermachine Parallelism
 
Hadoop live online training
Hadoop live online trainingHadoop live online training
Hadoop live online training
 
Choosing the right parallel compute architecture
Choosing the right parallel compute architecture Choosing the right parallel compute architecture
Choosing the right parallel compute architecture
 
Hadoop tutorial
Hadoop tutorialHadoop tutorial
Hadoop tutorial
 
Nonrelational Databases
Nonrelational DatabasesNonrelational Databases
Nonrelational Databases
 
Architecting and productionising data science applications at scale
Architecting and productionising data science applications at scaleArchitecting and productionising data science applications at scale
Architecting and productionising data science applications at scale
 
Hadoop Tutorial.ppt
Hadoop Tutorial.pptHadoop Tutorial.ppt
Hadoop Tutorial.ppt
 
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
 

More from coolmirza143

Introduction Cell Processor
Introduction Cell ProcessorIntroduction Cell Processor
Introduction Cell Processorcoolmirza143
 
Cell processor lab
Cell processor labCell processor lab
Cell processor labcoolmirza143
 
Introduction Cell Processor
Introduction Cell ProcessorIntroduction Cell Processor
Introduction Cell Processorcoolmirza143
 
Map reduceoriginalpaper mandatoryreading
Map reduceoriginalpaper mandatoryreadingMap reduceoriginalpaper mandatoryreading
Map reduceoriginalpaper mandatoryreadingcoolmirza143
 
Multithreading in Android
Multithreading in AndroidMultithreading in Android
Multithreading in Androidcoolmirza143
 
Introduction to Real-Time Operating Systems
Introduction to Real-Time Operating SystemsIntroduction to Real-Time Operating Systems
Introduction to Real-Time Operating Systemscoolmirza143
 

More from coolmirza143 (8)

Introduction Cell Processor
Introduction Cell ProcessorIntroduction Cell Processor
Introduction Cell Processor
 
Cell processor lab
Cell processor labCell processor lab
Cell processor lab
 
Introduction Cell Processor
Introduction Cell ProcessorIntroduction Cell Processor
Introduction Cell Processor
 
Map reduceoriginalpaper mandatoryreading
Map reduceoriginalpaper mandatoryreadingMap reduceoriginalpaper mandatoryreading
Map reduceoriginalpaper mandatoryreading
 
Multithreading in Android
Multithreading in AndroidMultithreading in Android
Multithreading in Android
 
Cuda lab manual
Cuda lab manualCuda lab manual
Cuda lab manual
 
Introduction to Real-Time Operating Systems
Introduction to Real-Time Operating SystemsIntroduction to Real-Time Operating Systems
Introduction to Real-Time Operating Systems
 
Cuda 2011
Cuda 2011Cuda 2011
Cuda 2011
 

Recently uploaded

Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 

Recently uploaded (20)

Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 

Distributed Computing & MapReduce

Editor's Notes

  1. Course Theme: Learning Multi-Core Programming using different paradigms, techniques e.g. raw threads like Posix Threads, Threading Building Blocks, SIMD style programming using PS3, CUDA etc. Today we will see another style of parallel programming, somewhat similar to SIMD, called SPMD (Single program, multiple data). -In some sense you have been introduced to different parallelization techniques from most generic (and probably most difficult to manage!) to less generic (and probably less powerful and less difficult). -MapReduce was invented by Google and presented in OSDI in 2004. During past 6, 7 years MapReduce has become a defacto way to crunch huge data.
  2. Before we talk specifically about MapReduce you need to learn about: Some generic stuff about Distributed Computing What are clusters and how are they built
  3. Lets do a 5 minute exercise by asking the audience what they think / know about distributed computing.
  4. Probably distributed computing is one of those things which are difficult to succinctly define in few lines. But lets give it a shot.
  5. What is distributed computing, as compared to centralized computing -The network and its intricacies ~ bandwidth, delay, jitter etc. -Similarly the software design and / or the software tools can have their affect on the whole distributed system. -The element of complexity and to tame it
  6. Why on earth we need distributed computing? Distributed programming is difficult than centralized program because of the component of network, failure models (parts can fail, while other parts are alive) etc. -Over time a single machine is increasing in capability and one always need to stop and think that is it enough for me or not. Go to distributed computing out of the need only. -Example of John’s census example; now thinking to move census setup to a single server
  7. Problem: MSN like chat service with possibly millions of users Partitioning users on multiple servers (+Only a portion of user base will go down; graceful degradation) How this partitioning should be done? Based on country? Load?
  8. Take away message is that as a professional when you are making a decision to use centralized or distributed infrastructure, base your decision on the real need.
  9. Google and many others use an implementation of MapReduce which runs on cluster of commodity computers (though there are other implementation are available as well e.g. Standord’s Phoenix system which uses multiple cores within a single computer as a computer node.
  10. -It also depends who is the user of the cluster. E.g. in the case of library of congress, they use blade servers, which are easy for them to manage because their core business is not computing; when something go wrong in a blade, that server is pulled out and sent to the vendor for maintenance. -Similarly AWS can get a beefy server and make many system virtual machines out of it. (We will talk a lot more about it in tomorrow’s lecture).
  11. When you need more ‘power’ what will you do; grow a bigger ‘ox’ or use multiple oxen?
  12. MPP = Massively Parallel Programmed Computer (think of a big Ox!) Constellations = Sun Microsystems’ (Now acquired by Oracle) Sparc based system
  13. Cost / $
  14. So now the stage is set to introduce MapReduce
  15. Generally what is MapReduce. It is a framework for parallel programming which is good for a specific type of problems.