SlideShare a Scribd company logo
NADAR SARSWATHI COLLEGE
OF ARTS AND SCIENCE
THENI
DEPARTMENT OF INFORMATION
TECHNOLOGY
INTERNET OF THINGS
USING HADOOP MAPREDUCE FOR BATCH DATA
ANALYSIS
BY:
S. SABTHAMI
II. M.Sc(IT)
What is MapReduce?
• Terms are borrowed from Functional Language (e.g., Lisp)
Sum of squares:
• (map square ‘(1 2 3 4))
– Output: (1 4 9 16)
[processes each record sequentially and independently]
• (reduce + ‘(1 4 9 16))
– (+ 16 (+ 9 (+ 4 1) ) )
– Output: 30
[processes set of all records in batches]
Map
• Process individual records to generate
intermediate key/value pairs.
Map
• Process individual records to generate
intermediate key/value pairs.
• Parallelly Process a large number of
individual records to generate intermediate
key/value pairs.
Reduce
• Reduce processes and merges all intermediate
values associated per key
• Each key assigned to one Reduce
• Parallelly Processes and merges all
intermediate values by partitioning keys
Some Applications of MapReduce
Distributed Grep:
– Input: large set of files
– Output: lines that match pattern
– Map – Emits a line if it matches the supplied pattern
– Reduce – Copies the intermediate data to output
Some Applications of MapReduce
(2)
Reverse Web-Link Graph
– Input: Web graph: tuples (a, b) where (page a  page b)
– Output: For each page, list of pages that link to it
– Map – process web log and for each input <source, target>, it outputs
<target, source>
– Reduce - emits <target, list(source)>
Some Applications of MapReduce
(3)
Count of URL access frequency
– Input: Log of accessed URLs, e.g., from proxy server
– Output: For each URL, % of total accesses for that URL
– Map – Process web log and outputs <URL, 1>
– Multiple Reducers - Emits <URL, URL_count>
(So far, like Wordcount. But still need %)
– Chain another MapReduce job after above one
– Map – Processes <URL, URL_count> and outputs <1,
(<URL, URL_count> )>
– 1 Reducer – Sums up URL_count’s to calculate
overall_count.
Emits multiple <URL, URL_count/overall_count>
Some Applications of MapReduce
(4)
Map task’s output is sorted (e.g., quicksort)
Reduce task’s input is sorted (e.g., mergesort)
Sort
– Input: Series of (key, value) pairs
– Output: Sorted <value>s
– Map – <key, value>  <value, _> (identity)
– Reducer – <key, value>  <key, value> (identity)
– Partitioning function – partition keys across reducers based
on ranges (can’t use hashing!)
• Take data distribution into account to balance reducer
tasks
Programming MapReduce
Externally: For user
1. Write a Map program (short), write a Reduce program (short)
2. Specify number of Maps and Reduces (parallelism level)
3. Submit job; wait for result
4. Need to know very little about parallel/distributed programming!
Internally: For the Paradigm and Scheduler
1. Parallelize Map
2. Transfer data from Map to Reduce
3. Parallelize Reduce
4. Implement Storage for Map input, Map output, Reduce input, and
Reduce output
(Ensure that no Reduce starts before all Maps are finished. That is,
ensure the barrier between the Map phase and Reduce phase)
Inside MapReduce
1. Parallelize Map: easy! each map task is independent of
the other!
• All Map output records with same key assigned to
same Reduce
2. Transfer data from Map to Reduce:
• All Map output records with same key assigned to
same Reduce task
• use partitioning function, e.g., hash(key)%number of
reducers
iot.pptx

More Related Content

Similar to iot.pptx

Hadoop
HadoopHadoop
Big Data Analytics with Hadoop with @techmilind
Big Data Analytics with Hadoop with @techmilindBig Data Analytics with Hadoop with @techmilind
Big Data Analytics with Hadoop with @techmilind
EMC
 
MapReduce.pptx
MapReduce.pptxMapReduce.pptx
MapReduce.pptx
AtulYadav218546
 
Stratosphere with big_data_analytics
Stratosphere with big_data_analyticsStratosphere with big_data_analytics
Stratosphere with big_data_analytics
Avinash Pandu
 
Distributed Computing with Apache Hadoop. Introduction to MapReduce.
Distributed Computing with Apache Hadoop. Introduction to MapReduce.Distributed Computing with Apache Hadoop. Introduction to MapReduce.
Distributed Computing with Apache Hadoop. Introduction to MapReduce.
Konstantin V. Shvachko
 
Hadoop Architecture
Hadoop ArchitectureHadoop Architecture
Hadoop Architecture
Dr. C.V. Suresh Babu
 
Hadoop eco system with mapreduce hive and pig
Hadoop eco system with mapreduce hive and pigHadoop eco system with mapreduce hive and pig
Hadoop eco system with mapreduce hive and pig
KhanKhaja1
 
Unit3 MapReduce
Unit3 MapReduceUnit3 MapReduce
Meethadoop
MeethadoopMeethadoop
Meethadoop
IIIT-H
 
Lecture 2 part 3
Lecture 2 part 3Lecture 2 part 3
Lecture 2 part 3
Jazan University
 
Introduction to MapReduce
Introduction to MapReduceIntroduction to MapReduce
Introduction to MapReduce
Chicago Hadoop Users Group
 
Large Scale Data Analysis with Map/Reduce, part I
Large Scale Data Analysis with Map/Reduce, part ILarge Scale Data Analysis with Map/Reduce, part I
Large Scale Data Analysis with Map/Reduce, part IMarin Dimitrov
 
Hadoop and Mapreduce for .NET User Group
Hadoop and Mapreduce for .NET User GroupHadoop and Mapreduce for .NET User Group
Hadoop and Mapreduce for .NET User GroupCsaba Toth
 
Hadoop classes in mumbai
Hadoop classes in mumbaiHadoop classes in mumbai
Hadoop classes in mumbai
Vibrant Technologies & Computers
 
Introduction to the Map-Reduce framework.pdf
Introduction to the Map-Reduce framework.pdfIntroduction to the Map-Reduce framework.pdf
Introduction to the Map-Reduce framework.pdf
BikalAdhikari4
 
Scala and spark
Scala and sparkScala and spark
Scala and spark
Fabio Fumarola
 
Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentation
ateeq ateeq
 
AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)
Paul Chao
 
Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentation
Ahmad El Tawil
 

Similar to iot.pptx (20)

Hadoop
HadoopHadoop
Hadoop
 
Big Data Analytics with Hadoop with @techmilind
Big Data Analytics with Hadoop with @techmilindBig Data Analytics with Hadoop with @techmilind
Big Data Analytics with Hadoop with @techmilind
 
MapReduce.pptx
MapReduce.pptxMapReduce.pptx
MapReduce.pptx
 
Stratosphere with big_data_analytics
Stratosphere with big_data_analyticsStratosphere with big_data_analytics
Stratosphere with big_data_analytics
 
Distributed Computing with Apache Hadoop. Introduction to MapReduce.
Distributed Computing with Apache Hadoop. Introduction to MapReduce.Distributed Computing with Apache Hadoop. Introduction to MapReduce.
Distributed Computing with Apache Hadoop. Introduction to MapReduce.
 
Hadoop Architecture
Hadoop ArchitectureHadoop Architecture
Hadoop Architecture
 
Hadoop eco system with mapreduce hive and pig
Hadoop eco system with mapreduce hive and pigHadoop eco system with mapreduce hive and pig
Hadoop eco system with mapreduce hive and pig
 
Unit3 MapReduce
Unit3 MapReduceUnit3 MapReduce
Unit3 MapReduce
 
Meethadoop
MeethadoopMeethadoop
Meethadoop
 
Lecture 2 part 3
Lecture 2 part 3Lecture 2 part 3
Lecture 2 part 3
 
Introduction to MapReduce
Introduction to MapReduceIntroduction to MapReduce
Introduction to MapReduce
 
Large Scale Data Analysis with Map/Reduce, part I
Large Scale Data Analysis with Map/Reduce, part ILarge Scale Data Analysis with Map/Reduce, part I
Large Scale Data Analysis with Map/Reduce, part I
 
Hadoop and Mapreduce for .NET User Group
Hadoop and Mapreduce for .NET User GroupHadoop and Mapreduce for .NET User Group
Hadoop and Mapreduce for .NET User Group
 
Hadoop classes in mumbai
Hadoop classes in mumbaiHadoop classes in mumbai
Hadoop classes in mumbai
 
Introduction to the Map-Reduce framework.pdf
Introduction to the Map-Reduce framework.pdfIntroduction to the Map-Reduce framework.pdf
Introduction to the Map-Reduce framework.pdf
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Scala and spark
Scala and sparkScala and spark
Scala and spark
 
Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentation
 
AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)
 
Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentation
 

More from SabthamiS1

women%20empowerment11.pptx
women%20empowerment11.pptxwomen%20empowerment11.pptx
women%20empowerment11.pptx
SabthamiS1
 
big data analytics.pptx
big data analytics.pptxbig data analytics.pptx
big data analytics.pptx
SabthamiS1
 
dip.pptx
dip.pptxdip.pptx
dip.pptx
SabthamiS1
 
csc.pptx
csc.pptxcsc.pptx
csc.pptx
SabthamiS1
 
python.pptx
python.pptxpython.pptx
python.pptx
SabthamiS1
 
Data minig.pptx
Data minig.pptxData minig.pptx
Data minig.pptx
SabthamiS1
 
artificial intelligence.pptx
artificial intelligence.pptxartificial intelligence.pptx
artificial intelligence.pptx
SabthamiS1
 
distributed computing.pptx
distributed computing.pptxdistributed computing.pptx
distributed computing.pptx
SabthamiS1
 
Network and internet security
Network and internet security Network and internet security
Network and internet security
SabthamiS1
 
Java
Java Java
Java
SabthamiS1
 
Advance computer architecture
Advance computer architecture Advance computer architecture
Advance computer architecture
SabthamiS1
 
Data structure and algorithm
Data structure and algorithmData structure and algorithm
Data structure and algorithm
SabthamiS1
 

More from SabthamiS1 (12)

women%20empowerment11.pptx
women%20empowerment11.pptxwomen%20empowerment11.pptx
women%20empowerment11.pptx
 
big data analytics.pptx
big data analytics.pptxbig data analytics.pptx
big data analytics.pptx
 
dip.pptx
dip.pptxdip.pptx
dip.pptx
 
csc.pptx
csc.pptxcsc.pptx
csc.pptx
 
python.pptx
python.pptxpython.pptx
python.pptx
 
Data minig.pptx
Data minig.pptxData minig.pptx
Data minig.pptx
 
artificial intelligence.pptx
artificial intelligence.pptxartificial intelligence.pptx
artificial intelligence.pptx
 
distributed computing.pptx
distributed computing.pptxdistributed computing.pptx
distributed computing.pptx
 
Network and internet security
Network and internet security Network and internet security
Network and internet security
 
Java
Java Java
Java
 
Advance computer architecture
Advance computer architecture Advance computer architecture
Advance computer architecture
 
Data structure and algorithm
Data structure and algorithmData structure and algorithm
Data structure and algorithm
 

Recently uploaded

Palestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptxPalestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptx
RaedMohamed3
 
Introduction to Quality Improvement Essentials
Introduction to Quality Improvement EssentialsIntroduction to Quality Improvement Essentials
Introduction to Quality Improvement Essentials
Excellence Foundation for South Sudan
 
Thesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.pptThesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.ppt
EverAndrsGuerraGuerr
 
Basic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumersBasic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumers
PedroFerreira53928
 
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
Nguyen Thanh Tu Collection
 
Ethnobotany and Ethnopharmacology ......
Ethnobotany and Ethnopharmacology ......Ethnobotany and Ethnopharmacology ......
Ethnobotany and Ethnopharmacology ......
Ashokrao Mane college of Pharmacy Peth-Vadgaon
 
The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
Vivekanand Anglo Vedic Academy
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
TechSoup
 
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
AzmatAli747758
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
siemaillard
 
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXXPhrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
MIRIAMSALINAS13
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
Special education needs
 
Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
Thiyagu K
 
ESC Beyond Borders _From EU to You_ InfoPack general.pdf
ESC Beyond Borders _From EU to You_ InfoPack general.pdfESC Beyond Borders _From EU to You_ InfoPack general.pdf
ESC Beyond Borders _From EU to You_ InfoPack general.pdf
Fundacja Rozwoju Społeczeństwa Przedsiębiorczego
 
The Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdfThe Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdf
kaushalkr1407
 
How to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERPHow to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERP
Celine George
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
siemaillard
 
Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)
rosedainty
 
Language Across the Curriculm LAC B.Ed.
Language Across the  Curriculm LAC B.Ed.Language Across the  Curriculm LAC B.Ed.
Language Across the Curriculm LAC B.Ed.
Atul Kumar Singh
 
How to Break the cycle of negative Thoughts
How to Break the cycle of negative ThoughtsHow to Break the cycle of negative Thoughts
How to Break the cycle of negative Thoughts
Col Mukteshwar Prasad
 

Recently uploaded (20)

Palestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptxPalestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptx
 
Introduction to Quality Improvement Essentials
Introduction to Quality Improvement EssentialsIntroduction to Quality Improvement Essentials
Introduction to Quality Improvement Essentials
 
Thesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.pptThesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.ppt
 
Basic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumersBasic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumers
 
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
 
Ethnobotany and Ethnopharmacology ......
Ethnobotany and Ethnopharmacology ......Ethnobotany and Ethnopharmacology ......
Ethnobotany and Ethnopharmacology ......
 
The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
 
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXXPhrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
 
Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
 
ESC Beyond Borders _From EU to You_ InfoPack general.pdf
ESC Beyond Borders _From EU to You_ InfoPack general.pdfESC Beyond Borders _From EU to You_ InfoPack general.pdf
ESC Beyond Borders _From EU to You_ InfoPack general.pdf
 
The Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdfThe Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdf
 
How to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERPHow to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERP
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 
Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)
 
Language Across the Curriculm LAC B.Ed.
Language Across the  Curriculm LAC B.Ed.Language Across the  Curriculm LAC B.Ed.
Language Across the Curriculm LAC B.Ed.
 
How to Break the cycle of negative Thoughts
How to Break the cycle of negative ThoughtsHow to Break the cycle of negative Thoughts
How to Break the cycle of negative Thoughts
 

iot.pptx

  • 1. NADAR SARSWATHI COLLEGE OF ARTS AND SCIENCE THENI DEPARTMENT OF INFORMATION TECHNOLOGY INTERNET OF THINGS USING HADOOP MAPREDUCE FOR BATCH DATA ANALYSIS BY: S. SABTHAMI II. M.Sc(IT)
  • 2. What is MapReduce? • Terms are borrowed from Functional Language (e.g., Lisp) Sum of squares: • (map square ‘(1 2 3 4)) – Output: (1 4 9 16) [processes each record sequentially and independently] • (reduce + ‘(1 4 9 16)) – (+ 16 (+ 9 (+ 4 1) ) ) – Output: 30 [processes set of all records in batches]
  • 3. Map • Process individual records to generate intermediate key/value pairs.
  • 4. Map • Process individual records to generate intermediate key/value pairs. • Parallelly Process a large number of individual records to generate intermediate key/value pairs.
  • 5. Reduce • Reduce processes and merges all intermediate values associated per key • Each key assigned to one Reduce • Parallelly Processes and merges all intermediate values by partitioning keys
  • 6. Some Applications of MapReduce Distributed Grep: – Input: large set of files – Output: lines that match pattern – Map – Emits a line if it matches the supplied pattern – Reduce – Copies the intermediate data to output
  • 7. Some Applications of MapReduce (2) Reverse Web-Link Graph – Input: Web graph: tuples (a, b) where (page a  page b) – Output: For each page, list of pages that link to it – Map – process web log and for each input <source, target>, it outputs <target, source> – Reduce - emits <target, list(source)>
  • 8. Some Applications of MapReduce (3) Count of URL access frequency – Input: Log of accessed URLs, e.g., from proxy server – Output: For each URL, % of total accesses for that URL – Map – Process web log and outputs <URL, 1> – Multiple Reducers - Emits <URL, URL_count> (So far, like Wordcount. But still need %) – Chain another MapReduce job after above one – Map – Processes <URL, URL_count> and outputs <1, (<URL, URL_count> )> – 1 Reducer – Sums up URL_count’s to calculate overall_count. Emits multiple <URL, URL_count/overall_count>
  • 9. Some Applications of MapReduce (4) Map task’s output is sorted (e.g., quicksort) Reduce task’s input is sorted (e.g., mergesort) Sort – Input: Series of (key, value) pairs – Output: Sorted <value>s – Map – <key, value>  <value, _> (identity) – Reducer – <key, value>  <key, value> (identity) – Partitioning function – partition keys across reducers based on ranges (can’t use hashing!) • Take data distribution into account to balance reducer tasks
  • 10. Programming MapReduce Externally: For user 1. Write a Map program (short), write a Reduce program (short) 2. Specify number of Maps and Reduces (parallelism level) 3. Submit job; wait for result 4. Need to know very little about parallel/distributed programming! Internally: For the Paradigm and Scheduler 1. Parallelize Map 2. Transfer data from Map to Reduce 3. Parallelize Reduce 4. Implement Storage for Map input, Map output, Reduce input, and Reduce output (Ensure that no Reduce starts before all Maps are finished. That is, ensure the barrier between the Map phase and Reduce phase)
  • 11. Inside MapReduce 1. Parallelize Map: easy! each map task is independent of the other! • All Map output records with same key assigned to same Reduce 2. Transfer data from Map to Reduce: • All Map output records with same key assigned to same Reduce task • use partitioning function, e.g., hash(key)%number of reducers