SlideShare a Scribd company logo
1
Cloud Computing :
MapReduce - Tutorial
Prof. Soumya K Ghosh
Department of Computer Science and Engineering
IIT KHARAGPUR
Introduction
• MapReduce: programming model developed at Google
• Objective:
– Implement large scale search
– Text processing on massively scalable web data stored using BigTable and GFS distributed file
system
• Designed for processing and generating large volumes of data via massively parallel
computations, utilizing tens of thousands of processors at a time
• Fault tolerant: ensure progress of computation even if processors and networks fail
• Example:
– Hadoop: open source implementation of MapReduce (developed at Yahoo!)
– Available on pre-packaged AMIs on Amazon EC2 cloud platform
9/11/2017 2
MapReduce Model
9/11/2017 3
• Parallel programming abstraction
• Used by many different parallel applications which carry out large-scale
computation involving thousands of processors
• Leverages a common underlying fault-tolerant implementation
• Two phases of MapReduce:
– Map operation
– Reduce operation
• A configurable number of M ‘mapper’ processors and R ‘reducer’ processors are
assigned to work on the problem
• The computation is coordinated by a single master process
MapReduce Model Contd…
9/11/2017 4
• Map phase:
– Each mapper reads approximately 1/M of the input from the global file
system, using locations given by the master
– Map operation consists of transforming one set of key-value pairs to
another:
– Each mapper writes computation results in one file per reducer
– Files are sorted by a key and stored to the local file system
– The master keeps track of the location of these files
MapReduce Model Contd…
9/11/2017 5
• Reduce phase:
– The master informs the reducers where the partial computations have been stored
on local files of respective mappers
– Reducers make remote procedure call requests to the mappers to fetch the files
– Each reducer groups the results of the map step using the same key and performs a
function f on the list of values that correspond to these key value:
– Final results are written back to the GFS file system
MapReduce: Example
9/11/2017 6
• 3 mappers; 2 reducers
• Map function:
• Reduce function:
Problem-1
9/11/2017 7
In a MapReduce framework consider the HDFS block size is 64 MB.
We have 3 files of size 64K, 65Mb and 127Mb. How many blocks will
be created by Hadoop framework?
Problem-2
9/11/2017 8
Write the pseudo-codes (for map and reduce functions) for calculating
the average of a set of integers in MapReduce.
Suppose A = (10, 20, 30, 40, 50) is a set of integers. Show the map and
reduce outputs.
Problem-3
9/11/2017 9
Compute total and average salary of organization XYZ and group by
based on gender (male or female) using MapReduce. The input is as
follows
Name, Gender, Salary
John, M, 10,000
Martha, F, 15,000
----
Problem-4
9/11/2017 10
Write the Map and Reduce functions (pseudo-codes) for the following Word
Length Categorization problem under MapReduce model.
Word Length Categorization: Given a text paragraph (containing only words),
categorize each word into following categories. Output the frequency of
occurrence of words in each category.
Categories:
tiny: 1-2 letters; small: 3-5 letters; medium: 6-9 letters; big: 10 or more letters
11

More Related Content

What's hot

Introduction to Clustering algorithm
Introduction to Clustering algorithmIntroduction to Clustering algorithm
Introduction to Clustering algorithm
hadifar
 
AI 7 | Constraint Satisfaction Problem
AI 7 | Constraint Satisfaction ProblemAI 7 | Constraint Satisfaction Problem
AI 7 | Constraint Satisfaction Problem
Mohammad Imam Hossain
 
NLP_KASHK:Minimum Edit Distance
NLP_KASHK:Minimum Edit DistanceNLP_KASHK:Minimum Edit Distance
NLP_KASHK:Minimum Edit Distance
Hemantha Kulathilake
 
Machine learning by Dr. Vivek Vijay and Dr. Sandeep Yadav
Machine learning by Dr. Vivek Vijay and Dr. Sandeep YadavMachine learning by Dr. Vivek Vijay and Dr. Sandeep Yadav
Machine learning by Dr. Vivek Vijay and Dr. Sandeep Yadav
Agile Testing Alliance
 
2. visualization in data mining
2. visualization in data mining2. visualization in data mining
2. visualization in data mining
Azad public school
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality Reduction
mrizwan969
 
Np hard
Np hardNp hard
Np hard
jesal_joshi
 
P, NP, NP-Complete, and NP-Hard
P, NP, NP-Complete, and NP-HardP, NP, NP-Complete, and NP-Hard
P, NP, NP-Complete, and NP-Hard
Animesh Chaturvedi
 
Concept learning
Concept learningConcept learning
Concept learning
Musa Hawamdah
 
Binary Search Tree
Binary Search TreeBinary Search Tree
Concept learning and candidate elimination algorithm
Concept learning and candidate elimination algorithmConcept learning and candidate elimination algorithm
Concept learning and candidate elimination algorithm
swapnac12
 
Generating code from dags
Generating code from dagsGenerating code from dags
Generating code from dags
indhu mathi
 
Design issues for the layers
Design issues for the layersDesign issues for the layers
Design issues for the layers
jayaprakash
 
Learning in AI
Learning in AILearning in AI
Learning in AI
Minakshi Atre
 
Hadoop Map Reduce
Hadoop Map ReduceHadoop Map Reduce
Hadoop Map Reduce
VNIT-ACM Student Chapter
 
Lecture 17 Iterative Deepening a star algorithm
Lecture 17 Iterative Deepening a star algorithmLecture 17 Iterative Deepening a star algorithm
Lecture 17 Iterative Deepening a star algorithm
Hema Kashyap
 
Webinar : P, NP, NP-Hard , NP - Complete problems
Webinar : P, NP, NP-Hard , NP - Complete problems Webinar : P, NP, NP-Hard , NP - Complete problems
Webinar : P, NP, NP-Hard , NP - Complete problems
Ziyauddin Shaik
 
Master theorem
Master theoremMaster theorem
Master theorem
fika sweety
 
Hadoop foundation for analytics
Hadoop foundation for analyticsHadoop foundation for analytics
Hadoop foundation for analytics
HariniA7
 
Machine Learning with Decision trees
Machine Learning with Decision treesMachine Learning with Decision trees
Machine Learning with Decision trees
Knoldus Inc.
 

What's hot (20)

Introduction to Clustering algorithm
Introduction to Clustering algorithmIntroduction to Clustering algorithm
Introduction to Clustering algorithm
 
AI 7 | Constraint Satisfaction Problem
AI 7 | Constraint Satisfaction ProblemAI 7 | Constraint Satisfaction Problem
AI 7 | Constraint Satisfaction Problem
 
NLP_KASHK:Minimum Edit Distance
NLP_KASHK:Minimum Edit DistanceNLP_KASHK:Minimum Edit Distance
NLP_KASHK:Minimum Edit Distance
 
Machine learning by Dr. Vivek Vijay and Dr. Sandeep Yadav
Machine learning by Dr. Vivek Vijay and Dr. Sandeep YadavMachine learning by Dr. Vivek Vijay and Dr. Sandeep Yadav
Machine learning by Dr. Vivek Vijay and Dr. Sandeep Yadav
 
2. visualization in data mining
2. visualization in data mining2. visualization in data mining
2. visualization in data mining
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality Reduction
 
Np hard
Np hardNp hard
Np hard
 
P, NP, NP-Complete, and NP-Hard
P, NP, NP-Complete, and NP-HardP, NP, NP-Complete, and NP-Hard
P, NP, NP-Complete, and NP-Hard
 
Concept learning
Concept learningConcept learning
Concept learning
 
Binary Search Tree
Binary Search TreeBinary Search Tree
Binary Search Tree
 
Concept learning and candidate elimination algorithm
Concept learning and candidate elimination algorithmConcept learning and candidate elimination algorithm
Concept learning and candidate elimination algorithm
 
Generating code from dags
Generating code from dagsGenerating code from dags
Generating code from dags
 
Design issues for the layers
Design issues for the layersDesign issues for the layers
Design issues for the layers
 
Learning in AI
Learning in AILearning in AI
Learning in AI
 
Hadoop Map Reduce
Hadoop Map ReduceHadoop Map Reduce
Hadoop Map Reduce
 
Lecture 17 Iterative Deepening a star algorithm
Lecture 17 Iterative Deepening a star algorithmLecture 17 Iterative Deepening a star algorithm
Lecture 17 Iterative Deepening a star algorithm
 
Webinar : P, NP, NP-Hard , NP - Complete problems
Webinar : P, NP, NP-Hard , NP - Complete problems Webinar : P, NP, NP-Hard , NP - Complete problems
Webinar : P, NP, NP-Hard , NP - Complete problems
 
Master theorem
Master theoremMaster theorem
Master theorem
 
Hadoop foundation for analytics
Hadoop foundation for analyticsHadoop foundation for analytics
Hadoop foundation for analytics
 
Machine Learning with Decision trees
Machine Learning with Decision treesMachine Learning with Decision trees
Machine Learning with Decision trees
 

Similar to Mod05lec23(map reduce tutorial)

Report Hadoop Map Reduce
Report Hadoop Map ReduceReport Hadoop Map Reduce
Report Hadoop Map Reduce
Urvashi Kataria
 
E031201032036
E031201032036E031201032036
E031201032036
ijceronline
 
Hadoop
HadoopHadoop
Big Data.pptx
Big Data.pptxBig Data.pptx
Big Data.pptx
NelakurthyVasanthRed1
 
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdfmodule3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
TSANKARARAO
 
Hadoop - Introduction to HDFS
Hadoop - Introduction to HDFSHadoop - Introduction to HDFS
Hadoop - Introduction to HDFS
Vibrant Technologies & Computers
 
Hadoop
HadoopHadoop
Hadoop
Anil Reddy
 
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Deanna Kosaraju
 
Big Data Processing
Big Data ProcessingBig Data Processing
Big Data Processing
Michael Ming Lei
 
Hadoop trainting in hyderabad@kelly technologies
Hadoop trainting in hyderabad@kelly technologiesHadoop trainting in hyderabad@kelly technologies
Hadoop trainting in hyderabad@kelly technologies
Kelly Technologies
 
Hadoop
HadoopHadoop
Map reducecloudtech
Map reducecloudtechMap reducecloudtech
Map reducecloudtech
Jakir Hossain
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
Dong Ngoc
 
Hadoop scheduler with deadline constraint
Hadoop scheduler with deadline constraintHadoop scheduler with deadline constraint
Hadoop scheduler with deadline constraint
ijccsa
 
Hadoop and Mapreduce Introduction
Hadoop and Mapreduce IntroductionHadoop and Mapreduce Introduction
Hadoop and Mapreduce Introduction
rajsandhu1989
 
Introduccion a Hadoop / Introduction to Hadoop
Introduccion a Hadoop / Introduction to HadoopIntroduccion a Hadoop / Introduction to Hadoop
Introduccion a Hadoop / Introduction to Hadoop
GERARDO BARBERENA
 
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
Reynold Xin
 
Hadoop Mapreduce Performance Enhancement Using In-Node Combiners
Hadoop Mapreduce Performance Enhancement Using In-Node CombinersHadoop Mapreduce Performance Enhancement Using In-Node Combiners
Hadoop Mapreduce Performance Enhancement Using In-Node Combiners
ijcsit
 
Hadoop mapreduce and yarn frame work- unit5
Hadoop mapreduce and yarn frame work-  unit5Hadoop mapreduce and yarn frame work-  unit5
Hadoop mapreduce and yarn frame work- unit5
RojaT4
 
Design Issues and Challenges of Peer-to-Peer Video on Demand System
Design Issues and Challenges of Peer-to-Peer Video on Demand System Design Issues and Challenges of Peer-to-Peer Video on Demand System
Design Issues and Challenges of Peer-to-Peer Video on Demand System
cscpconf
 

Similar to Mod05lec23(map reduce tutorial) (20)

Report Hadoop Map Reduce
Report Hadoop Map ReduceReport Hadoop Map Reduce
Report Hadoop Map Reduce
 
E031201032036
E031201032036E031201032036
E031201032036
 
Hadoop
HadoopHadoop
Hadoop
 
Big Data.pptx
Big Data.pptxBig Data.pptx
Big Data.pptx
 
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdfmodule3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
 
Hadoop - Introduction to HDFS
Hadoop - Introduction to HDFSHadoop - Introduction to HDFS
Hadoop - Introduction to HDFS
 
Hadoop
HadoopHadoop
Hadoop
 
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
 
Big Data Processing
Big Data ProcessingBig Data Processing
Big Data Processing
 
Hadoop trainting in hyderabad@kelly technologies
Hadoop trainting in hyderabad@kelly technologiesHadoop trainting in hyderabad@kelly technologies
Hadoop trainting in hyderabad@kelly technologies
 
Hadoop
HadoopHadoop
Hadoop
 
Map reducecloudtech
Map reducecloudtechMap reducecloudtech
Map reducecloudtech
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
Hadoop scheduler with deadline constraint
Hadoop scheduler with deadline constraintHadoop scheduler with deadline constraint
Hadoop scheduler with deadline constraint
 
Hadoop and Mapreduce Introduction
Hadoop and Mapreduce IntroductionHadoop and Mapreduce Introduction
Hadoop and Mapreduce Introduction
 
Introduccion a Hadoop / Introduction to Hadoop
Introduccion a Hadoop / Introduction to HadoopIntroduccion a Hadoop / Introduction to Hadoop
Introduccion a Hadoop / Introduction to Hadoop
 
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
 
Hadoop Mapreduce Performance Enhancement Using In-Node Combiners
Hadoop Mapreduce Performance Enhancement Using In-Node CombinersHadoop Mapreduce Performance Enhancement Using In-Node Combiners
Hadoop Mapreduce Performance Enhancement Using In-Node Combiners
 
Hadoop mapreduce and yarn frame work- unit5
Hadoop mapreduce and yarn frame work-  unit5Hadoop mapreduce and yarn frame work-  unit5
Hadoop mapreduce and yarn frame work- unit5
 
Design Issues and Challenges of Peer-to-Peer Video on Demand System
Design Issues and Challenges of Peer-to-Peer Video on Demand System Design Issues and Challenges of Peer-to-Peer Video on Demand System
Design Issues and Challenges of Peer-to-Peer Video on Demand System
 

More from Ankit Gupta

Biometricstechnology in iot and machine learning
Biometricstechnology in iot and machine learningBiometricstechnology in iot and machine learning
Biometricstechnology in iot and machine learning
Ankit Gupta
 
Week2 cloud computing week2
Week2 cloud computing week2Week2 cloud computing week2
Week2 cloud computing week2
Ankit Gupta
 
Week 8 lecture material
Week 8 lecture materialWeek 8 lecture material
Week 8 lecture material
Ankit Gupta
 
Week 4 lecture material cc (1)
Week 4 lecture material cc (1)Week 4 lecture material cc (1)
Week 4 lecture material cc (1)
Ankit Gupta
 
Week 3 lecture material cc
Week 3 lecture material ccWeek 3 lecture material cc
Week 3 lecture material cc
Ankit Gupta
 
Week 1 lecture material cc
Week 1 lecture material ccWeek 1 lecture material cc
Week 1 lecture material cc
Ankit Gupta
 
Mod05lec25(resource mgmt ii)
Mod05lec25(resource mgmt ii)Mod05lec25(resource mgmt ii)
Mod05lec25(resource mgmt ii)
Ankit Gupta
 
Mod05lec24(resource mgmt i)
Mod05lec24(resource mgmt i)Mod05lec24(resource mgmt i)
Mod05lec24(resource mgmt i)
Ankit Gupta
 
Mod05lec22(cloudonomics tutorial)
Mod05lec22(cloudonomics tutorial)Mod05lec22(cloudonomics tutorial)
Mod05lec22(cloudonomics tutorial)
Ankit Gupta
 
Mod05lec21(sla tutorial)
Mod05lec21(sla tutorial)Mod05lec21(sla tutorial)
Mod05lec21(sla tutorial)
Ankit Gupta
 
Lecture29 cc-security4
Lecture29 cc-security4Lecture29 cc-security4
Lecture29 cc-security4
Ankit Gupta
 
Lecture28 cc-security3
Lecture28 cc-security3Lecture28 cc-security3
Lecture28 cc-security3
Ankit Gupta
 
Lecture27 cc-security2
Lecture27 cc-security2Lecture27 cc-security2
Lecture27 cc-security2
Ankit Gupta
 
Lecture26 cc-security1
Lecture26 cc-security1Lecture26 cc-security1
Lecture26 cc-security1
Ankit Gupta
 
Lecture 30 cloud mktplace
Lecture 30 cloud mktplaceLecture 30 cloud mktplace
Lecture 30 cloud mktplace
Ankit Gupta
 
Week 7 lecture material
Week 7 lecture materialWeek 7 lecture material
Week 7 lecture material
Ankit Gupta
 
Gurukul Cse cbcs-2015-16
Gurukul Cse cbcs-2015-16Gurukul Cse cbcs-2015-16
Gurukul Cse cbcs-2015-16
Ankit Gupta
 
Microprocessor full hand made notes
Microprocessor full hand made notesMicroprocessor full hand made notes
Microprocessor full hand made notes
Ankit Gupta
 
Transfer Leaning Using Pytorch synopsis Minor project pptx
Transfer Leaning Using Pytorch  synopsis Minor project pptxTransfer Leaning Using Pytorch  synopsis Minor project pptx
Transfer Leaning Using Pytorch synopsis Minor project pptx
Ankit Gupta
 
Intro/Overview on Machine Learning Presentation -2
Intro/Overview on Machine Learning Presentation -2Intro/Overview on Machine Learning Presentation -2
Intro/Overview on Machine Learning Presentation -2
Ankit Gupta
 

More from Ankit Gupta (20)

Biometricstechnology in iot and machine learning
Biometricstechnology in iot and machine learningBiometricstechnology in iot and machine learning
Biometricstechnology in iot and machine learning
 
Week2 cloud computing week2
Week2 cloud computing week2Week2 cloud computing week2
Week2 cloud computing week2
 
Week 8 lecture material
Week 8 lecture materialWeek 8 lecture material
Week 8 lecture material
 
Week 4 lecture material cc (1)
Week 4 lecture material cc (1)Week 4 lecture material cc (1)
Week 4 lecture material cc (1)
 
Week 3 lecture material cc
Week 3 lecture material ccWeek 3 lecture material cc
Week 3 lecture material cc
 
Week 1 lecture material cc
Week 1 lecture material ccWeek 1 lecture material cc
Week 1 lecture material cc
 
Mod05lec25(resource mgmt ii)
Mod05lec25(resource mgmt ii)Mod05lec25(resource mgmt ii)
Mod05lec25(resource mgmt ii)
 
Mod05lec24(resource mgmt i)
Mod05lec24(resource mgmt i)Mod05lec24(resource mgmt i)
Mod05lec24(resource mgmt i)
 
Mod05lec22(cloudonomics tutorial)
Mod05lec22(cloudonomics tutorial)Mod05lec22(cloudonomics tutorial)
Mod05lec22(cloudonomics tutorial)
 
Mod05lec21(sla tutorial)
Mod05lec21(sla tutorial)Mod05lec21(sla tutorial)
Mod05lec21(sla tutorial)
 
Lecture29 cc-security4
Lecture29 cc-security4Lecture29 cc-security4
Lecture29 cc-security4
 
Lecture28 cc-security3
Lecture28 cc-security3Lecture28 cc-security3
Lecture28 cc-security3
 
Lecture27 cc-security2
Lecture27 cc-security2Lecture27 cc-security2
Lecture27 cc-security2
 
Lecture26 cc-security1
Lecture26 cc-security1Lecture26 cc-security1
Lecture26 cc-security1
 
Lecture 30 cloud mktplace
Lecture 30 cloud mktplaceLecture 30 cloud mktplace
Lecture 30 cloud mktplace
 
Week 7 lecture material
Week 7 lecture materialWeek 7 lecture material
Week 7 lecture material
 
Gurukul Cse cbcs-2015-16
Gurukul Cse cbcs-2015-16Gurukul Cse cbcs-2015-16
Gurukul Cse cbcs-2015-16
 
Microprocessor full hand made notes
Microprocessor full hand made notesMicroprocessor full hand made notes
Microprocessor full hand made notes
 
Transfer Leaning Using Pytorch synopsis Minor project pptx
Transfer Leaning Using Pytorch  synopsis Minor project pptxTransfer Leaning Using Pytorch  synopsis Minor project pptx
Transfer Leaning Using Pytorch synopsis Minor project pptx
 
Intro/Overview on Machine Learning Presentation -2
Intro/Overview on Machine Learning Presentation -2Intro/Overview on Machine Learning Presentation -2
Intro/Overview on Machine Learning Presentation -2
 

Recently uploaded

2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
Yasser Mahgoub
 
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsKuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
Victor Morales
 
CSM Cloud Service Management Presentarion
CSM Cloud Service Management PresentarionCSM Cloud Service Management Presentarion
CSM Cloud Service Management Presentarion
rpskprasana
 
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.pptUnit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
KrishnaveniKrishnara1
 
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECTCHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT
jpsjournal1
 
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
171ticu
 
The Python for beginners. This is an advance computer language.
The Python for beginners. This is an advance computer language.The Python for beginners. This is an advance computer language.
The Python for beginners. This is an advance computer language.
sachin chaurasia
 
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...
University of Maribor
 
Recycled Concrete Aggregate in Construction Part II
Recycled Concrete Aggregate in Construction Part IIRecycled Concrete Aggregate in Construction Part II
Recycled Concrete Aggregate in Construction Part II
Aditya Rajan Patra
 
官方认证美国密歇根州立大学毕业证学位证书原版一模一样
官方认证美国密歇根州立大学毕业证学位证书原版一模一样官方认证美国密歇根州立大学毕业证学位证书原版一模一样
官方认证美国密歇根州立大学毕业证学位证书原版一模一样
171ticu
 
Eric Nizeyimana's document 2006 from gicumbi to ttc nyamata handball play
Eric Nizeyimana's document 2006 from gicumbi to ttc nyamata handball playEric Nizeyimana's document 2006 from gicumbi to ttc nyamata handball play
Eric Nizeyimana's document 2006 from gicumbi to ttc nyamata handball play
enizeyimana36
 
Literature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptxLiterature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptx
Dr Ramhari Poudyal
 
Casting-Defect-inSlab continuous casting.pdf
Casting-Defect-inSlab continuous casting.pdfCasting-Defect-inSlab continuous casting.pdf
Casting-Defect-inSlab continuous casting.pdf
zubairahmad848137
 
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student MemberIEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
VICTOR MAESTRE RAMIREZ
 
Understanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine LearningUnderstanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine Learning
SUTEJAS
 
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdfIron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
RadiNasr
 
ISPM 15 Heat Treated Wood Stamps and why your shipping must have one
ISPM 15 Heat Treated Wood Stamps and why your shipping must have oneISPM 15 Heat Treated Wood Stamps and why your shipping must have one
ISPM 15 Heat Treated Wood Stamps and why your shipping must have one
Las Vegas Warehouse
 
Embedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoringEmbedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoring
IJECEIAES
 
Textile Chemical Processing and Dyeing.pdf
Textile Chemical Processing and Dyeing.pdfTextile Chemical Processing and Dyeing.pdf
Textile Chemical Processing and Dyeing.pdf
NazakatAliKhoso2
 
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMSA SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
IJNSA Journal
 

Recently uploaded (20)

2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
 
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsKuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
 
CSM Cloud Service Management Presentarion
CSM Cloud Service Management PresentarionCSM Cloud Service Management Presentarion
CSM Cloud Service Management Presentarion
 
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.pptUnit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
 
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECTCHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT
 
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
 
The Python for beginners. This is an advance computer language.
The Python for beginners. This is an advance computer language.The Python for beginners. This is an advance computer language.
The Python for beginners. This is an advance computer language.
 
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...
 
Recycled Concrete Aggregate in Construction Part II
Recycled Concrete Aggregate in Construction Part IIRecycled Concrete Aggregate in Construction Part II
Recycled Concrete Aggregate in Construction Part II
 
官方认证美国密歇根州立大学毕业证学位证书原版一模一样
官方认证美国密歇根州立大学毕业证学位证书原版一模一样官方认证美国密歇根州立大学毕业证学位证书原版一模一样
官方认证美国密歇根州立大学毕业证学位证书原版一模一样
 
Eric Nizeyimana's document 2006 from gicumbi to ttc nyamata handball play
Eric Nizeyimana's document 2006 from gicumbi to ttc nyamata handball playEric Nizeyimana's document 2006 from gicumbi to ttc nyamata handball play
Eric Nizeyimana's document 2006 from gicumbi to ttc nyamata handball play
 
Literature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptxLiterature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptx
 
Casting-Defect-inSlab continuous casting.pdf
Casting-Defect-inSlab continuous casting.pdfCasting-Defect-inSlab continuous casting.pdf
Casting-Defect-inSlab continuous casting.pdf
 
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student MemberIEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
 
Understanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine LearningUnderstanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine Learning
 
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdfIron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
 
ISPM 15 Heat Treated Wood Stamps and why your shipping must have one
ISPM 15 Heat Treated Wood Stamps and why your shipping must have oneISPM 15 Heat Treated Wood Stamps and why your shipping must have one
ISPM 15 Heat Treated Wood Stamps and why your shipping must have one
 
Embedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoringEmbedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoring
 
Textile Chemical Processing and Dyeing.pdf
Textile Chemical Processing and Dyeing.pdfTextile Chemical Processing and Dyeing.pdf
Textile Chemical Processing and Dyeing.pdf
 
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMSA SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
 

Mod05lec23(map reduce tutorial)

  • 1. 1 Cloud Computing : MapReduce - Tutorial Prof. Soumya K Ghosh Department of Computer Science and Engineering IIT KHARAGPUR
  • 2. Introduction • MapReduce: programming model developed at Google • Objective: – Implement large scale search – Text processing on massively scalable web data stored using BigTable and GFS distributed file system • Designed for processing and generating large volumes of data via massively parallel computations, utilizing tens of thousands of processors at a time • Fault tolerant: ensure progress of computation even if processors and networks fail • Example: – Hadoop: open source implementation of MapReduce (developed at Yahoo!) – Available on pre-packaged AMIs on Amazon EC2 cloud platform 9/11/2017 2
  • 3. MapReduce Model 9/11/2017 3 • Parallel programming abstraction • Used by many different parallel applications which carry out large-scale computation involving thousands of processors • Leverages a common underlying fault-tolerant implementation • Two phases of MapReduce: – Map operation – Reduce operation • A configurable number of M ‘mapper’ processors and R ‘reducer’ processors are assigned to work on the problem • The computation is coordinated by a single master process
  • 4. MapReduce Model Contd… 9/11/2017 4 • Map phase: – Each mapper reads approximately 1/M of the input from the global file system, using locations given by the master – Map operation consists of transforming one set of key-value pairs to another: – Each mapper writes computation results in one file per reducer – Files are sorted by a key and stored to the local file system – The master keeps track of the location of these files
  • 5. MapReduce Model Contd… 9/11/2017 5 • Reduce phase: – The master informs the reducers where the partial computations have been stored on local files of respective mappers – Reducers make remote procedure call requests to the mappers to fetch the files – Each reducer groups the results of the map step using the same key and performs a function f on the list of values that correspond to these key value: – Final results are written back to the GFS file system
  • 6. MapReduce: Example 9/11/2017 6 • 3 mappers; 2 reducers • Map function: • Reduce function:
  • 7. Problem-1 9/11/2017 7 In a MapReduce framework consider the HDFS block size is 64 MB. We have 3 files of size 64K, 65Mb and 127Mb. How many blocks will be created by Hadoop framework?
  • 8. Problem-2 9/11/2017 8 Write the pseudo-codes (for map and reduce functions) for calculating the average of a set of integers in MapReduce. Suppose A = (10, 20, 30, 40, 50) is a set of integers. Show the map and reduce outputs.
  • 9. Problem-3 9/11/2017 9 Compute total and average salary of organization XYZ and group by based on gender (male or female) using MapReduce. The input is as follows Name, Gender, Salary John, M, 10,000 Martha, F, 15,000 ----
  • 10. Problem-4 9/11/2017 10 Write the Map and Reduce functions (pseudo-codes) for the following Word Length Categorization problem under MapReduce model. Word Length Categorization: Given a text paragraph (containing only words), categorize each word into following categories. Output the frequency of occurrence of words in each category. Categories: tiny: 1-2 letters; small: 3-5 letters; medium: 6-9 letters; big: 10 or more letters
  • 11. 11