CSci 5707, Fall 2013
MapReduce vs. Parallel DBMS
Hamid Safizadeh, Otelia Buffington
University of Minnesota
MapReduce Idea
• Mapping
  map (k1, v1) → list (k2, v2)
• Reducing
  reduce (k2, list(v2)) → list (v2)
Pseudo-code for counting the number of occurrences of each
word in a large collection of documents
Jeffrey Dean and Sanjay Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, OSDI ’04
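A minimal Python sketch of the word-count idea, following the map and reduce signatures above. The names map_fn, reduce_fn and the single-process driver run_mapreduce are illustrative assumptions, not the API of the paper or of any real framework.

```python
from collections import defaultdict

def map_fn(doc_name, contents):
    """map(k1, v1) -> list(k2, v2): emit (word, 1) for every word in a document."""
    return [(word, 1) for word in contents.split()]

def reduce_fn(word, counts):
    """reduce(k2, list(v2)) -> list(v2): sum all counts emitted for one word."""
    return [sum(counts)]

def run_mapreduce(documents, map_fn, reduce_fn):
    """Toy in-memory driver (hypothetical): map, group by key, then reduce."""
    groups = defaultdict(list)
    for k1, v1 in documents.items():
        for k2, v2 in map_fn(k1, v1):
            groups[k2].append(v2)
    return {k2: reduce_fn(k2, v2s) for k2, v2s in groups.items()}

docs = {"d1": "the quick fox", "d2": "the lazy dog the end"}
counts = run_mapreduce(docs, map_fn, reduce_fn)
print(counts["the"])  # [3]
```

In a real deployment the grouping step is carried out by the framework across many machines; here it is collapsed into one dictionary to keep the data flow visible.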
MapReduce Example
Calculation of the number of occurrences of each word
http://aimotion.blogspot.com/2010/08/mapreduce-with-mongodb-and-python.html
MapReduce Architecture
Execution overview
Jeffrey Dean and Sanjay Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, OSDI ’04
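The execution flow can be approximated in a single process: input splits go to map tasks, intermediate pairs are partitioned by key into one region per reduce task, and each reduce task sorts and reduces its own region. The sketch below assumes a word-count job; execute, partition and the use of CRC32 as the partitioning hash are illustrative choices, not details taken from the slides.

```python
import zlib
from itertools import groupby

def map_fn(split_id, text):
    """Map task: emit (word, 1) for each word in one input split."""
    return [(word, 1) for word in text.split()]

def reduce_fn(word, counts):
    """Reduce task: sum the counts collected for one word."""
    return sum(counts)

def partition(key, num_reducers):
    """Assign a key to a reduce task (deterministic hash modulo R)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def execute(splits, num_reducers=2):
    # Map phase: every split is processed independently and its output
    # is written into one of num_reducers intermediate regions.
    regions = [[] for _ in range(num_reducers)]
    for split_id, text in enumerate(splits):
        for key, value in map_fn(split_id, text):
            regions[partition(key, num_reducers)].append((key, value))
    # Reduce phase: each reduce task sorts its region by key, groups
    # equal keys, and produces its own output (one dict per task here).
    outputs = []
    for region in regions:
        region.sort(key=lambda kv: kv[0])
        outputs.append({
            key: reduce_fn(key, [v for _, v in group])
            for key, group in groupby(region, key=lambda kv: kv[0])
        })
    return outputs

splits = ["the quick fox", "the lazy dog", "the end"]
outputs = execute(splits, num_reducers=2)
merged = {k: v for out in outputs for k, v in out.items()}
print(merged["the"])  # 3
```

Because the partition function is deterministic, every occurrence of a key lands in the same region, so each word is counted by exactly one reduce task.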
MapReduce or Parallel DBMS
• Pavlo, A., Paulson, E., Rasin, A., Abadi, D.J., DeWitt, D.J., Madden, S., and Stonebraker, M., “A comparison of approaches to large-scale data analysis”, ACM SIGMOD International Conference, 2009 (http://database.cs.brown.edu/projects/mapreduce-vs-dbms)
• Dean, J., and Ghemawat, S., “MapReduce: A flexible data processing tool”, Communications of the ACM, Vol. 53, 2010 (DOI: 10.1145/1629175.1629198)
MapReduce Design Properties
• Heterogeneous Systems
  - Processing and combining data from a wide variety of storage systems (such as relational databases and file systems)
• Fault Tolerance
  - Providing fine-grained fault tolerance for large jobs (a failure in the middle of a multi-hour execution does not require restarting the job from scratch)
• Complex Functions
  - Simple Map and Reduce functions with straightforward SQL equivalents
  - Offering a better framework for some complicated tasks
Jeffrey Dean and Sanjay Ghemawat, MapReduce: A Flexible Data Processing Tool, Communications of the ACM, Vol. 53, 2010
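The fault-tolerance point can be illustrated with a small sketch in which a failed task is retried on its own while the results of completed tasks are kept. The scheduler run_job, the helper make_flaky and the simulated crash are hypothetical; a real framework would reschedule the task on another worker.

```python
def run_job(tasks, max_attempts=3):
    """Run each task independently; on failure, retry only that task."""
    results, attempts = {}, {}
    for task_id, task in tasks.items():
        for attempt in range(1, max_attempts + 1):
            attempts[task_id] = attempt
            try:
                results[task_id] = task()
                break  # task finished, move on to the next one
            except RuntimeError:
                continue  # stands in for rescheduling on another worker
        else:
            raise RuntimeError(f"task {task_id} failed {max_attempts} times")
    return results, attempts

def make_flaky(value, failures):
    """Hypothetical task that crashes `failures` times before succeeding."""
    state = {"left": failures}
    def task():
        if state["left"] > 0:
            state["left"] -= 1
            raise RuntimeError("simulated worker crash")
        return value
    return task

tasks = {
    "map-0": make_flaky(10, 0),
    "map-1": make_flaky(20, 1),  # fails once, then succeeds
    "map-2": make_flaky(30, 0),
}
results, attempts = run_job(tasks)
print(results)   # {'map-0': 10, 'map-1': 20, 'map-2': 30}
print(attempts)  # {'map-0': 1, 'map-1': 2, 'map-2': 1}
```

Only map-1 runs twice; the other tasks are not repeated, which is the sense in which the job does not restart from scratch.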
MapReduce Design Properties
• Performance
  - Loading data: Startup overhead for MapReduce
  - Reading data: Full scan over large data files
  - Merging results: The next consumer of the output is often another MapReduce
• Cost
  - Hardware: Networked workstations
  - Software: Open source (Hadoop)
  - Communication: Network system
Jeffrey Dean and Sanjay Ghemawat, MapReduce: A Flexible Data Processing Tool, Communications of the ACM, Vol. 53, 2010
Companies Using Hadoop
• Facebook
• Yahoo!
• Google
• Amazon
• Twitter
