© RAGINIJAIN CC SA 4.0
Ragini Jain
MSc CA 1st Year (2015 - 2017)
Map Reduce
Overview
● What is Map Reduce
● Map Reduce schematic
● Map Reduce in detail
● Comparison of Map Reduce models
● Demo
● References
What is Map Reduce
● A software framework that supports
– Parallel computing
– Distributed computing
on large data sets.
● The framework abstracts the data flow of running a parallel
program on a distributed computing system by providing users
with two interfaces in the form of functions:
– Map
– Reduce
● Users control and manipulate the data flow of their programs
by overriding the Map() and Reduce() functions.
● The Map Reduce library is the controller: it runs the user-supplied
functions and manages the data flow between them.
Map – reduce schematic
Source: jeremykyun
Map – reduce schematic (2)
Source: hadoop project
Map Reduce (in detail)
● The Map function is applied in parallel to every input (key, value)
pair and produces a new list of intermediate (key, value) pairs:
(key1, val1) ------(map function)---> List(key2, val2)
● The MapReduce library then collects the intermediate (key, value)
pairs produced from all input pairs, sorts them by key, and groups
together the values that share a key.
● Finally, the Reduce function is applied in parallel to each group,
producing a collection of values:
(key2, List(val2)) -----(reduce function)---> List(val2)
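The three steps above can be sketched on a single machine in plain Java. This is only an illustration of the data flow under stated assumptions, not a distributed implementation and not the code of any particular framework: the class name WordCountSketch, the word-count task, and the helper names map, reduce, and run are all hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {

    // Map: (key1, val1) -> List(key2, val2).
    // key1 is the line number (unused here), val1 is the line text;
    // every word is emitted as the intermediate pair (word, 1).
    static List<Map.Entry<String, Integer>> map(int lineNo, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Reduce: (key2, List(val2)) -> List(val2).
    // Sums the counts collected for one word.
    static List<Integer> reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return Collections.singletonList(sum);
    }

    // Stand-in for the library: runs map, groups and sorts by key, runs reduce.
    static Map<String, List<Integer>> run(List<String> lines) {
        List<Map.Entry<String, Integer>> intermediate = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            intermediate.addAll(map(i, lines.get(i)));
        }

        // A TreeMap keeps the keys sorted, mirroring the sort-by-key step.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> e : intermediate) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }

        Map<String, List<Integer>> result = new TreeMap<>();
        grouped.forEach((word, counts) -> result.put(word, reduce(word, counts)));
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("to be or not to be", "to do");
        // prints {be=[2], do=[1], not=[1], or=[1], to=[3]}
        System.out.println(run(lines));
    }
}
```

In a real framework the map and reduce calls would run on different machines; here they run sequentially so that the shape of the inputs and outputs is easy to follow.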
Map Reduce (as a query framework)
● SQL clauses that serve as building blocks for Map Reduce
operations on structured data and data warehouses:
– GROUP BY
– ORDER BY
● Example query on a very large set of demographic data:
         SELECT age, AVG(contacts)
         FROM social.person
         GROUP BY age
         ORDER BY age
GROUP BY (SQL vs Pig)
Comparison of Map Reduce models
● Google Map Reduce
– Prog Model: Map Reduce
– Data Handling: Google File System (GFS)
● Apache Hadoop
– Prog Model: Map Reduce
– Data Handling: HDFS (Hadoop Distributed File System)
● Microsoft Dryad
– Prog Model: DAG (Directed Acyclic Graph) execution
– Data Handling: Shared directories, local disks
● Twister
– Prog Model: Iterative Map Reduce
– Data Handling: Local disks
Demo
● Java program
– Uses features of the Java 8 platform:
   ● Lambda expressions
   ● Streams
– JDK references:
   ● java.util.Collection.stream()
   ● java.lang.Iterable.forEach()
   ● java.util.List
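The demo program itself is not part of this transcript. As a rough idea of how the listed JDK features fit together, the following sketch (an assumption, not the presenter's code; the class name StreamDemoSketch and the sample words are hypothetical) applies a map step and a reduce step to a java.util.List using lambdas and streams.

```java
import java.util.Arrays;
import java.util.List;

public class StreamDemoSketch {

    // Map each word to its length, then reduce the lengths to their sum.
    static int totalLength(List<String> words) {
        return words.stream()                 // java.util.Collection.stream()
                .map(String::length)          // map step
                .reduce(0, Integer::sum);     // reduce step
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("map", "reduce", "java");

        // java.lang.Iterable.forEach() with a lambda expression
        words.forEach(w -> System.out.println(w + " -> " + w.length()));

        // prints total = 13
        System.out.println("total = " + totalLength(words));
    }
}
```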
References
● Jeffrey Dean et al.
MapReduce: Simplified Data Processing on Large Clusters
http://research.google.com/archive/mapreduce.html
● Michael Stonebraker et al.
MapReduce and Parallel DBMSs: Friends or Foes?
http://dl.acm.org/citation.cfm?id=1629197
● Java Lambda expressions
https://docs.oracle.com/javase/tutorial/java/javaOO/lambdaexp
● PostgreSQL GROUP BY and ORDER BY
http://www.postgresql.org/docs/devel/static/sql-select.html
Thank you.
● Questions
● Clarifications
● Suggestions
● Feedback
Ragini Jain
15030142023@sicsr.ac.in
