SlideShare a Scribd company logo
1 of 12
© RAGINIJAIN CC SA 4.0
Ragini Jain
MSc CA 1st
Year (2015 - 2017)
Map Reduce
© RAGINIJAIN CC SA 4.0
Overview
● What is Map Reduce
● Map Reduce schematic
● Map Reduce in detail
● Comparison of Map Reduce models
● Demo
● References
© RAGINIJAIN CC SA 4.0
What is Map Reduce
● A software framework which supports
– Parallel
– Distributed computing
on large data sets.
● The framework abstracts the data flow of running a parallel
program on a distributed computing system by providing users
with two interfaces in the form of functions:
– Map
– Reduce
● Users can control and manipulate the data flow of their programs
by overriding the Map() and Reduce() function
● Map Reduce library is the controller.
© RAGINIJAIN CC SA 4.0
Map – reduce schematic
Source: jeremykyun
© RAGINIJAIN CC SA 4.0
Map – reduce schematic (2)
Source: hadoop project
© RAGINIJAIN CC SA 4.0
Map Reduce (in detail)
● The Map function is applied in parallel to every input (key, value)
pair and produces new set of intermediate (key, value) pairs
(key1, val1) ------(map function)---> List (key2, val2)
● Then the MapReduce library collects all the produced intermediate
(key, value) pairs from all input (key, val) pairs and sorts them
based on the key part
● Finally Reduce function is applied in parallel to each group
producing the collection of values
(key2, List(val2)) -----(reduce function) ---> List (val2)
© RAGINIJAIN CC SA 4.0
Map Reduce (as a query framework)
● SQL clauses that are the building block for Map Reduce
operations on structured data and data warehouses
– GROUP BY
– ORDER BY
● On a very large set of demographic data
         SELECT age, AVG(contacts)
             FROM social.person
         GROUP BY age
         ORDER BY age
© RAGINIJAIN CC SA 4.0
GROUP BY (SQL vs Pig)
© RAGINIJAIN CC SA 4.0
Comparison Map Reduce models
● Google Map Reduce
– Prog Model: Map Reduce
– Data handling: Google file system
● Apache Hadoop
– Prog Model: Map Reduce
– Data Handling: HDFS (Hadoop Distributed File system)
● Microsoft Dryad
– Prog Model: DAG (Directed Acyclic Graph) execution
– Data Handling: Shared directories, Local disks
● Twister
– Prog Model: Iterative Map Reduce
– Data Handling: Local disks
© RAGINIJAIN CC SA 4.0
Demo
● Java program
– Utilizes concepts from Java 8 programming language platform.
● Lambda expressions
● Streams
– JDK ref
● java.util.Collection.stream()
● java.lang.Iterable.forEach( )
● java.util.List
© RAGINIJAIN CC SA 4.0
References
● Jeffrey Dean et' al
MapReduce: Simplified Data Processing on Large Clusters
http://research.google.com/archive/mapreduce.html
● Michelle Stonebraker et' al
MapReduce and Parallel DBMSs: Friends or Foes ?
http://dl.acm.org/citation.cfm?id=1629197
● Java Lambda expressions
https://docs.oracle.com/javase/tutorial/java/javaOO/lambdaexp
● PostgreSQL GROUP BY and ORDER BY 
http://www.postgresql.org/docs/devel/static/sql­select.html
© RAGINIJAIN CC SA 4.0
Thank you.
● Questions
● Clarifications
● Suggestions
● Feedback
Ragini Jain
15030142023@sicsr.ac.in

More Related Content

What's hot

QGIS UK: QGIS Evangelism (thinkWhere)
QGIS UK: QGIS Evangelism (thinkWhere)QGIS UK: QGIS Evangelism (thinkWhere)
QGIS UK: QGIS Evangelism (thinkWhere)Ross McDonald
 
Integrating PostGIS in Web Applications
Integrating PostGIS in Web ApplicationsIntegrating PostGIS in Web Applications
Integrating PostGIS in Web ApplicationsCommand Prompt., Inc
 
Introducing MagnetoDB, a key-value storage sevice for OpenStack
Introducing MagnetoDB, a key-value storage sevice for OpenStackIntroducing MagnetoDB, a key-value storage sevice for OpenStack
Introducing MagnetoDB, a key-value storage sevice for OpenStackMirantis
 
Nokia Asha webinar: Developing location-based services for Nokia Asha phones ...
Nokia Asha webinar: Developing location-based services for Nokia Asha phones ...Nokia Asha webinar: Developing location-based services for Nokia Asha phones ...
Nokia Asha webinar: Developing location-based services for Nokia Asha phones ...Microsoft Mobile Developer
 
Dash plotly data visualization
Dash plotly data visualizationDash plotly data visualization
Dash plotly data visualizationCharu Gupta
 
QGIS and Altas: Automatic map generation
QGIS and Altas: Automatic map generationQGIS and Altas: Automatic map generation
QGIS and Altas: Automatic map generationQGIS UK
 
Integrating CAD and GIS Data at Mineta San Jose International Airport
Integrating CAD and GIS Data at Mineta San Jose International AirportIntegrating CAD and GIS Data at Mineta San Jose International Airport
Integrating CAD and GIS Data at Mineta San Jose International Airportjeffhobbs
 
An End User Perspective on Implementing Oracle in the Engineering Environment
An End User Perspective on Implementing Oracle in the Engineering EnvironmentAn End User Perspective on Implementing Oracle in the Engineering Environment
An End User Perspective on Implementing Oracle in the Engineering Environmentjeffhobbs
 
City of Roseville Case Study
City of Roseville Case StudyCity of Roseville Case Study
City of Roseville Case Studyjeffhobbs
 
Apache Big_Data Europe event: "Integrators at work! Real-life applications of...
Apache Big_Data Europe event: "Integrators at work! Real-life applications of...Apache Big_Data Europe event: "Integrators at work! Real-life applications of...
Apache Big_Data Europe event: "Integrators at work! Real-life applications of...BigData_Europe
 
The Whitebox Geospatial-Analyisis Tools Project and Open-Access GIS
The Whitebox Geospatial-Analyisis Tools Project and Open-Access GISThe Whitebox Geospatial-Analyisis Tools Project and Open-Access GIS
The Whitebox Geospatial-Analyisis Tools Project and Open-Access GISGolgi Alvarez
 
Enriching data by_cooking_recipes_in_cloud_dataprep
Enriching data by_cooking_recipes_in_cloud_dataprepEnriching data by_cooking_recipes_in_cloud_dataprep
Enriching data by_cooking_recipes_in_cloud_dataprepSupriya Badgujar
 
Location based services for Nokia X and Nokia Asha using Geo2tag
Location based services for Nokia X and Nokia Asha using Geo2tagLocation based services for Nokia X and Nokia Asha using Geo2tag
Location based services for Nokia X and Nokia Asha using Geo2tagMicrosoft Mobile Developer
 
New opensource geospatial software stack from NextGIS
New opensource geospatial software stack from NextGISNew opensource geospatial software stack from NextGIS
New opensource geospatial software stack from NextGISMaxim Dubinin
 
Designing and Using Cached Map
Designing and Using Cached Map Designing and Using Cached Map
Designing and Using Cached Map M.Muneeb Ashraf
 
Mago3D Barcelona ICGC(카탈루니아 지형 및 지질연구소) 발표자료
Mago3D Barcelona ICGC(카탈루니아 지형 및 지질연구소) 발표자료Mago3D Barcelona ICGC(카탈루니아 지형 및 지질연구소) 발표자료
Mago3D Barcelona ICGC(카탈루니아 지형 및 지질연구소) 발표자료BJ Jang
 
GraphQL & DGraph with Go
GraphQL & DGraph with GoGraphQL & DGraph with Go
GraphQL & DGraph with GoJames Tan
 
Sistema de recomendación entiempo real usando Delta Lake
Sistema de recomendación entiempo real usando Delta LakeSistema de recomendación entiempo real usando Delta Lake
Sistema de recomendación entiempo real usando Delta LakeGlobant
 
Producing Linked Open Data with a Content Management System
Producing Linked Open Data with a Content Management SystemProducing Linked Open Data with a Content Management System
Producing Linked Open Data with a Content Management SystemOpen Knowledge Belgium
 

What's hot (20)

QGIS UK: QGIS Evangelism (thinkWhere)
QGIS UK: QGIS Evangelism (thinkWhere)QGIS UK: QGIS Evangelism (thinkWhere)
QGIS UK: QGIS Evangelism (thinkWhere)
 
Integrating PostGIS in Web Applications
Integrating PostGIS in Web ApplicationsIntegrating PostGIS in Web Applications
Integrating PostGIS in Web Applications
 
Introducing MagnetoDB, a key-value storage sevice for OpenStack
Introducing MagnetoDB, a key-value storage sevice for OpenStackIntroducing MagnetoDB, a key-value storage sevice for OpenStack
Introducing MagnetoDB, a key-value storage sevice for OpenStack
 
Nokia Asha webinar: Developing location-based services for Nokia Asha phones ...
Nokia Asha webinar: Developing location-based services for Nokia Asha phones ...Nokia Asha webinar: Developing location-based services for Nokia Asha phones ...
Nokia Asha webinar: Developing location-based services for Nokia Asha phones ...
 
Dash plotly data visualization
Dash plotly data visualizationDash plotly data visualization
Dash plotly data visualization
 
QGIS and Altas: Automatic map generation
QGIS and Altas: Automatic map generationQGIS and Altas: Automatic map generation
QGIS and Altas: Automatic map generation
 
Integrating CAD and GIS Data at Mineta San Jose International Airport
Integrating CAD and GIS Data at Mineta San Jose International AirportIntegrating CAD and GIS Data at Mineta San Jose International Airport
Integrating CAD and GIS Data at Mineta San Jose International Airport
 
Streaming in the Extreme
Streaming in the ExtremeStreaming in the Extreme
Streaming in the Extreme
 
An End User Perspective on Implementing Oracle in the Engineering Environment
An End User Perspective on Implementing Oracle in the Engineering EnvironmentAn End User Perspective on Implementing Oracle in the Engineering Environment
An End User Perspective on Implementing Oracle in the Engineering Environment
 
City of Roseville Case Study
City of Roseville Case StudyCity of Roseville Case Study
City of Roseville Case Study
 
Apache Big_Data Europe event: "Integrators at work! Real-life applications of...
Apache Big_Data Europe event: "Integrators at work! Real-life applications of...Apache Big_Data Europe event: "Integrators at work! Real-life applications of...
Apache Big_Data Europe event: "Integrators at work! Real-life applications of...
 
The Whitebox Geospatial-Analyisis Tools Project and Open-Access GIS
The Whitebox Geospatial-Analyisis Tools Project and Open-Access GISThe Whitebox Geospatial-Analyisis Tools Project and Open-Access GIS
The Whitebox Geospatial-Analyisis Tools Project and Open-Access GIS
 
Enriching data by_cooking_recipes_in_cloud_dataprep
Enriching data by_cooking_recipes_in_cloud_dataprepEnriching data by_cooking_recipes_in_cloud_dataprep
Enriching data by_cooking_recipes_in_cloud_dataprep
 
Location based services for Nokia X and Nokia Asha using Geo2tag
Location based services for Nokia X and Nokia Asha using Geo2tagLocation based services for Nokia X and Nokia Asha using Geo2tag
Location based services for Nokia X and Nokia Asha using Geo2tag
 
New opensource geospatial software stack from NextGIS
New opensource geospatial software stack from NextGISNew opensource geospatial software stack from NextGIS
New opensource geospatial software stack from NextGIS
 
Designing and Using Cached Map
Designing and Using Cached Map Designing and Using Cached Map
Designing and Using Cached Map
 
Mago3D Barcelona ICGC(카탈루니아 지형 및 지질연구소) 발표자료
Mago3D Barcelona ICGC(카탈루니아 지형 및 지질연구소) 발표자료Mago3D Barcelona ICGC(카탈루니아 지형 및 지질연구소) 발표자료
Mago3D Barcelona ICGC(카탈루니아 지형 및 지질연구소) 발표자료
 
GraphQL & DGraph with Go
GraphQL & DGraph with GoGraphQL & DGraph with Go
GraphQL & DGraph with Go
 
Sistema de recomendación entiempo real usando Delta Lake
Sistema de recomendación entiempo real usando Delta LakeSistema de recomendación entiempo real usando Delta Lake
Sistema de recomendación entiempo real usando Delta Lake
 
Producing Linked Open Data with a Content Management System
Producing Linked Open Data with a Content Management SystemProducing Linked Open Data with a Content Management System
Producing Linked Open Data with a Content Management System
 

Similar to Map Reduce Framework Overview and Implementation in Java

MAP-REDUCE IMPLEMENTATIONS: SURVEY AND PERFORMANCE COMPARISON
MAP-REDUCE IMPLEMENTATIONS: SURVEY AND PERFORMANCE COMPARISONMAP-REDUCE IMPLEMENTATIONS: SURVEY AND PERFORMANCE COMPARISON
MAP-REDUCE IMPLEMENTATIONS: SURVEY AND PERFORMANCE COMPARISONijcsit
 
Stratosphere with big_data_analytics
Stratosphere with big_data_analyticsStratosphere with big_data_analytics
Stratosphere with big_data_analyticsAvinash Pandu
 
Download It
Download ItDownload It
Download Itbutest
 
RAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needsRAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needsConnected Data World
 
Spark Driven Big Data Analytics
Spark Driven Big Data AnalyticsSpark Driven Big Data Analytics
Spark Driven Big Data Analyticsinoshg
 
Benchmarking tool for graph algorithms
Benchmarking tool for graph algorithmsBenchmarking tool for graph algorithms
Benchmarking tool for graph algorithmsYash Khandelwal
 
Pivotal Greenplum 次世代マルチクラウド・データ分析プラットフォーム
Pivotal Greenplum 次世代マルチクラウド・データ分析プラットフォームPivotal Greenplum 次世代マルチクラウド・データ分析プラットフォーム
Pivotal Greenplum 次世代マルチクラウド・データ分析プラットフォームMasayuki Matsushita
 
A Survey on Data Mapping Strategy for data stored in the storage cloud 111
A Survey on Data Mapping Strategy for data stored in the storage cloud  111A Survey on Data Mapping Strategy for data stored in the storage cloud  111
A Survey on Data Mapping Strategy for data stored in the storage cloud 111NavNeet KuMar
 
An introduction To Apache Spark
An introduction To Apache SparkAn introduction To Apache Spark
An introduction To Apache SparkAmir Sedighi
 
Introduction to GCP Data Flow Presentation
Introduction to GCP Data Flow PresentationIntroduction to GCP Data Flow Presentation
Introduction to GCP Data Flow PresentationKnoldus Inc.
 
Introduction to GCP DataFlow Presentation
Introduction to GCP DataFlow PresentationIntroduction to GCP DataFlow Presentation
Introduction to GCP DataFlow PresentationKnoldus Inc.
 
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...Anant Corporation
 
Developing Enterprise Consciousness: Building Modern Open Data Platforms
Developing Enterprise Consciousness: Building Modern Open Data PlatformsDeveloping Enterprise Consciousness: Building Modern Open Data Platforms
Developing Enterprise Consciousness: Building Modern Open Data PlatformsScyllaDB
 
Architecting Analytic Pipelines on GCP - Chicago Cloud Conference 2020
Architecting Analytic Pipelines on GCP - Chicago Cloud Conference 2020Architecting Analytic Pipelines on GCP - Chicago Cloud Conference 2020
Architecting Analytic Pipelines on GCP - Chicago Cloud Conference 2020Mariano Gonzalez
 
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!Tugdual Grall
 
Implementation of p pic algorithm in map reduce to handle big data
Implementation of p pic algorithm in map reduce to handle big dataImplementation of p pic algorithm in map reduce to handle big data
Implementation of p pic algorithm in map reduce to handle big dataeSAT Publishing House
 
Dsm Presentation
Dsm PresentationDsm Presentation
Dsm Presentationrichoe
 
Spark cluster computing with working sets
Spark cluster computing with working setsSpark cluster computing with working sets
Spark cluster computing with working setsJinxinTang
 

Similar to Map Reduce Framework Overview and Implementation in Java (20)

MAP-REDUCE IMPLEMENTATIONS: SURVEY AND PERFORMANCE COMPARISON
MAP-REDUCE IMPLEMENTATIONS: SURVEY AND PERFORMANCE COMPARISONMAP-REDUCE IMPLEMENTATIONS: SURVEY AND PERFORMANCE COMPARISON
MAP-REDUCE IMPLEMENTATIONS: SURVEY AND PERFORMANCE COMPARISON
 
Stratosphere with big_data_analytics
Stratosphere with big_data_analyticsStratosphere with big_data_analytics
Stratosphere with big_data_analytics
 
Download It
Download ItDownload It
Download It
 
RAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needsRAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needs
 
Spark Driven Big Data Analytics
Spark Driven Big Data AnalyticsSpark Driven Big Data Analytics
Spark Driven Big Data Analytics
 
Benchmarking tool for graph algorithms
Benchmarking tool for graph algorithmsBenchmarking tool for graph algorithms
Benchmarking tool for graph algorithms
 
Pivotal Greenplum 次世代マルチクラウド・データ分析プラットフォーム
Pivotal Greenplum 次世代マルチクラウド・データ分析プラットフォームPivotal Greenplum 次世代マルチクラウド・データ分析プラットフォーム
Pivotal Greenplum 次世代マルチクラウド・データ分析プラットフォーム
 
A Survey on Data Mapping Strategy for data stored in the storage cloud 111
A Survey on Data Mapping Strategy for data stored in the storage cloud  111A Survey on Data Mapping Strategy for data stored in the storage cloud  111
A Survey on Data Mapping Strategy for data stored in the storage cloud 111
 
An introduction To Apache Spark
An introduction To Apache SparkAn introduction To Apache Spark
An introduction To Apache Spark
 
Introduction to GCP Data Flow Presentation
Introduction to GCP Data Flow PresentationIntroduction to GCP Data Flow Presentation
Introduction to GCP Data Flow Presentation
 
Introduction to GCP DataFlow Presentation
Introduction to GCP DataFlow PresentationIntroduction to GCP DataFlow Presentation
Introduction to GCP DataFlow Presentation
 
B04 06 0918
B04 06 0918B04 06 0918
B04 06 0918
 
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
 
Developing Enterprise Consciousness: Building Modern Open Data Platforms
Developing Enterprise Consciousness: Building Modern Open Data PlatformsDeveloping Enterprise Consciousness: Building Modern Open Data Platforms
Developing Enterprise Consciousness: Building Modern Open Data Platforms
 
Architecting Analytic Pipelines on GCP - Chicago Cloud Conference 2020
Architecting Analytic Pipelines on GCP - Chicago Cloud Conference 2020Architecting Analytic Pipelines on GCP - Chicago Cloud Conference 2020
Architecting Analytic Pipelines on GCP - Chicago Cloud Conference 2020
 
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
 
Implementation of p pic algorithm in map reduce to handle big data
Implementation of p pic algorithm in map reduce to handle big dataImplementation of p pic algorithm in map reduce to handle big data
Implementation of p pic algorithm in map reduce to handle big data
 
B04 06 0918
B04 06 0918B04 06 0918
B04 06 0918
 
Dsm Presentation
Dsm PresentationDsm Presentation
Dsm Presentation
 
Spark cluster computing with working sets
Spark cluster computing with working setsSpark cluster computing with working sets
Spark cluster computing with working sets
 

Recently uploaded

Q-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQ-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQuiz Club NITW
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfVanessa Camilleri
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptxmary850239
 
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
Unraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptxUnraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptx
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptxDhatriParmar
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management systemChristalin Nelson
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptxmary850239
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operationalssuser3e220a
 
Scientific Writing :Research Discourse
Scientific  Writing :Research  DiscourseScientific  Writing :Research  Discourse
Scientific Writing :Research DiscourseAnita GoswamiGiri
 
CLASSIFICATION OF ANTI - CANCER DRUGS.pptx
CLASSIFICATION OF ANTI - CANCER DRUGS.pptxCLASSIFICATION OF ANTI - CANCER DRUGS.pptx
CLASSIFICATION OF ANTI - CANCER DRUGS.pptxAnupam32727
 
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...DhatriParmar
 
CHEST Proprioceptive neuromuscular facilitation.pptx
CHEST Proprioceptive neuromuscular facilitation.pptxCHEST Proprioceptive neuromuscular facilitation.pptx
CHEST Proprioceptive neuromuscular facilitation.pptxAneriPatwari
 
Congestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentationCongestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentationdeepaannamalai16
 
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDecoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDhatriParmar
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxlancelewisportillo
 
4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptxmary850239
 
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxBIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxSayali Powar
 
Indexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdfIndexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdfChristalin Nelson
 
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...Nguyen Thanh Tu Collection
 
Sulphonamides, mechanisms and their uses
Sulphonamides, mechanisms and their usesSulphonamides, mechanisms and their uses
Sulphonamides, mechanisms and their usesVijayaLaxmi84
 

Recently uploaded (20)

Q-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQ-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdf
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx
 
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
Unraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptxUnraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptx
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operational
 
Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"
 
Scientific Writing :Research Discourse
Scientific  Writing :Research  DiscourseScientific  Writing :Research  Discourse
Scientific Writing :Research Discourse
 
CLASSIFICATION OF ANTI - CANCER DRUGS.pptx
CLASSIFICATION OF ANTI - CANCER DRUGS.pptxCLASSIFICATION OF ANTI - CANCER DRUGS.pptx
CLASSIFICATION OF ANTI - CANCER DRUGS.pptx
 
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
 
CHEST Proprioceptive neuromuscular facilitation.pptx
CHEST Proprioceptive neuromuscular facilitation.pptxCHEST Proprioceptive neuromuscular facilitation.pptx
CHEST Proprioceptive neuromuscular facilitation.pptx
 
Congestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentationCongestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentation
 
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDecoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
 
4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx
 
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxBIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
 
Indexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdfIndexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdf
 
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
 
Sulphonamides, mechanisms and their uses
Sulphonamides, mechanisms and their usesSulphonamides, mechanisms and their uses
Sulphonamides, mechanisms and their uses
 

Map Reduce Framework Overview and Implementation in Java

  • 1. © RAGINIJAIN CC SA 4.0 Ragini Jain MSc CA 1st Year (2015 - 2017) Map Reduce
  • 2. © RAGINIJAIN CC SA 4.0 Overview ● What is Map Reduce ● Map Reduce schematic ● Map Reduce in detail ● Comparison of Map Reduce models ● Demo ● References
  • 3. © RAGINIJAIN CC SA 4.0 What is Map Reduce ● A software framework which supports – Parallel – Distributed computing on large data sets. ● The framework abstracts the data flow of running a parallel program on a distributed computing system by providing users with two interfaces in the form of functions: – Map – Reduce ● Users can control and manipulate the data flow of their programs by overriding the Map() and Reduce() function ● Map Reduce library is the controller.
  • 4. © RAGINIJAIN CC SA 4.0 Map – reduce schematic Source: jeremykyun
  • 5. © RAGINIJAIN CC SA 4.0 Map – reduce schematic (2) Source: hadoop project
  • 6. © RAGINIJAIN CC SA 4.0 Map Reduce (in detail) ● The Map function is applied in parallel to every input (key, value) pair and produces new set of intermediate (key, value) pairs (key1, val1) ------(map function)---> List (key2, val2) ● Then the MapReduce library collects all the produced intermediate (key, value) pairs from all input (key, val) pairs and sorts them based on the key part ● Finally Reduce function is applied in parallel to each group producing the collection of values (key2, List(val2)) -----(reduce function) ---> List (val2)
  • 7. © RAGINIJAIN CC SA 4.0 Map Reduce (as a query framework) ● SQL clauses that are the building block for Map Reduce operations on structured data and data warehouses – GROUP BY – ORDER BY ● On a very large set of demographic data          SELECT age, AVG(contacts)              FROM social.person          GROUP BY age          ORDER BY age
  • 8. © RAGINIJAIN CC SA 4.0 GROUP BY (SQL vs Pig)
  • 9. © RAGINIJAIN CC SA 4.0 Comparison Map Reduce models ● Google Map Reduce – Prog Model: Map Reduce – Data handling: Google file system ● Apache Hadoop – Prog Model: Map Reduce – Data Handling: HDFS (Hadoop Distributed File system) ● Microsoft Dryad – Prog Model: DAG (Directed Acyclic Graph) execution – Data Handling: Shared directories, Local disks ● Twister – Prog Model: Iterative Map Reduce – Data Handling: Local disks
  • 10. © RAGINIJAIN CC SA 4.0 Demo ● Java program – Utilizes concepts from Java 8 programming language platform. ● Lambda expressions ● Streams – JDK ref ● java.util.Collection.stream() ● java.lang.Iterable.forEach( ) ● java.util.List
  • 11. © RAGINIJAIN CC SA 4.0 References ● Jeffrey Dean et' al MapReduce: Simplified Data Processing on Large Clusters http://research.google.com/archive/mapreduce.html ● Michelle Stonebraker et' al MapReduce and Parallel DBMSs: Friends or Foes ? http://dl.acm.org/citation.cfm?id=1629197 ● Java Lambda expressions https://docs.oracle.com/javase/tutorial/java/javaOO/lambdaexp ● PostgreSQL GROUP BY and ORDER BY  http://www.postgresql.org/docs/devel/static/sql­select.html
  • 12. © RAGINIJAIN CC SA 4.0 Thank you. ● Questions ● Clarifications ● Suggestions ● Feedback Ragini Jain 15030142023@sicsr.ac.in