SlideShare a Scribd company logo
1 of 15
MapReduce
Functional Programming Map Apply function to transform elements of a list  Return results as a list Reduce Apply function to all elements of a list Collect results and return as a single value
Definition MapReduce: A software framework to support processing of massive data sets across distributed computers
SAMPLE USE CASE Back end credit card processor Nightly processing of millions of transactions Processing requires grouping, sorting, and merchant wide analysis Can’t just divide the over all list into equal parts as further analysis is necessary Tight processing window
Description Simple, powerful programming model Language independent Can run on a single machine, but shines for distributed computing and extreme datasets Break down the processing problem into embarrassingly parallel atomic operations
Algorithm Map Phase  Raw data analyzed and converted to name/value pair Shuffle Phase All name/value pairs are sorted and grouped by their keys Reduce Phase All values associated with a key are processed for results
MapReduce Walk Through Goal: Construct a word frequency of all the words in Wikipedia
Step 0: Split Data Raw input data divided into N parts N > number of machines Split must be context specific
Step 1: Map Each machine takes/receives a single slice of the raw input for mapping The map function processes the input file and emits a name/value pair of the relevant data
STEP 2: Shuffle The results of the map phase are sorted and grouped by the key in each key value pair.
STEP 3: Reduce Results from shuffle phase divided into M parts M >number of machines Each machine runs a reduction method on a part of shuffle results.
MAPREDUCE BENEFITS Scale Processing speed increases with number of machines involved Reliable Loss of any one machine doesn’t stop processing Cost Often built from heterogeneous commodity grade computers
Use Case Results Processing time of 1 million records  Originally ~3 hours Reduced to 40 minutes on 5 computers
Other MapReduce installations Google – Index building Visa – Transaction Processing Facebook – Facebook Lexicon Intelligence Community Yahoo/Google – Terabyte Sort 10 billion, 100 byte records Yahoo: 910 nodes, 206 seconds Google: ~1,000 nodes, 68 seconds
Questions

More Related Content

What's hot

Automating Engineering with FME
Automating Engineering with FMEAutomating Engineering with FME
Automating Engineering with FMESafe Software
 
FME in Support of DOT Data Culture
FME in Support of DOT Data CultureFME in Support of DOT Data Culture
FME in Support of DOT Data CultureSafe Software
 
MAXITHERMAL SOFTWARE FOR USB LOGGERS : WITH MARATHON PRODUCTS
MAXITHERMAL SOFTWARE FOR USB LOGGERS : WITH MARATHON PRODUCTS MAXITHERMAL SOFTWARE FOR USB LOGGERS : WITH MARATHON PRODUCTS
MAXITHERMAL SOFTWARE FOR USB LOGGERS : WITH MARATHON PRODUCTS Marathon Products, Inc
 
LUMASS - a Spatial System Dynamics Modelling Framework
LUMASS - a Spatial System Dynamics Modelling FrameworkLUMASS - a Spatial System Dynamics Modelling Framework
LUMASS - a Spatial System Dynamics Modelling FrameworkAlexander Herzig
 
Web Based GIS LeadGen Introduction
Web Based GIS LeadGen IntroductionWeb Based GIS LeadGen Introduction
Web Based GIS LeadGen IntroductionInteractiveGIS
 
Spatial decision support and analytics on a campus scale: bringing GIS, CAD, ...
Spatial decision support and analytics on a campus scale: bringing GIS, CAD, ...Spatial decision support and analytics on a campus scale: bringing GIS, CAD, ...
Spatial decision support and analytics on a campus scale: bringing GIS, CAD, ...Safe Software
 
ArcGIS Bivariate Mapping Tools
ArcGIS Bivariate Mapping ToolsArcGIS Bivariate Mapping Tools
ArcGIS Bivariate Mapping ToolsAileen Buckley
 
Repartition join in mapreduce
Repartition join in mapreduceRepartition join in mapreduce
Repartition join in mapreduceUday Vakalapudi
 
Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentationAhmad El Tawil
 
ArcGIS Extensions
ArcGIS ExtensionsArcGIS Extensions
ArcGIS ExtensionsEsri
 
Utilities Industry Success Stories with FME
Utilities Industry Success Stories with FME Utilities Industry Success Stories with FME
Utilities Industry Success Stories with FME Safe Software
 
From 2D Drawings to 3D Navigation networks built with FME
From 2D Drawings to 3D Navigation networks built with FMEFrom 2D Drawings to 3D Navigation networks built with FME
From 2D Drawings to 3D Navigation networks built with FMESafe Software
 
Fdi extreme features
Fdi extreme featuresFdi extreme features
Fdi extreme features360cell
 
Hadoop MapReduce joins
Hadoop MapReduce joinsHadoop MapReduce joins
Hadoop MapReduce joinsShalish VJ
 
8 Ways Utility Networks Can Meet Data Demands
8 Ways Utility Networks Can Meet Data Demands8 Ways Utility Networks Can Meet Data Demands
8 Ways Utility Networks Can Meet Data DemandsSafe Software
 
Coordinate Systems in FME 101
Coordinate Systems in FME 101 Coordinate Systems in FME 101
Coordinate Systems in FME 101 Safe Software
 

What's hot (20)

Automating Engineering with FME
Automating Engineering with FMEAutomating Engineering with FME
Automating Engineering with FME
 
FME in Support of DOT Data Culture
FME in Support of DOT Data CultureFME in Support of DOT Data Culture
FME in Support of DOT Data Culture
 
MAXITHERMAL SOFTWARE FOR USB LOGGERS : WITH MARATHON PRODUCTS
MAXITHERMAL SOFTWARE FOR USB LOGGERS : WITH MARATHON PRODUCTS MAXITHERMAL SOFTWARE FOR USB LOGGERS : WITH MARATHON PRODUCTS
MAXITHERMAL SOFTWARE FOR USB LOGGERS : WITH MARATHON PRODUCTS
 
Hadoop Mapreduce joins
Hadoop Mapreduce joinsHadoop Mapreduce joins
Hadoop Mapreduce joins
 
LUMASS - a Spatial System Dynamics Modelling Framework
LUMASS - a Spatial System Dynamics Modelling FrameworkLUMASS - a Spatial System Dynamics Modelling Framework
LUMASS - a Spatial System Dynamics Modelling Framework
 
Web Based GIS LeadGen Introduction
Web Based GIS LeadGen IntroductionWeb Based GIS LeadGen Introduction
Web Based GIS LeadGen Introduction
 
QGIS Tutorial 2
QGIS Tutorial 2QGIS Tutorial 2
QGIS Tutorial 2
 
Spatial decision support and analytics on a campus scale: bringing GIS, CAD, ...
Spatial decision support and analytics on a campus scale: bringing GIS, CAD, ...Spatial decision support and analytics on a campus scale: bringing GIS, CAD, ...
Spatial decision support and analytics on a campus scale: bringing GIS, CAD, ...
 
ArcGIS Bivariate Mapping Tools
ArcGIS Bivariate Mapping ToolsArcGIS Bivariate Mapping Tools
ArcGIS Bivariate Mapping Tools
 
Repartition join in mapreduce
Repartition join in mapreduceRepartition join in mapreduce
Repartition join in mapreduce
 
Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentation
 
ArcGIS Extensions
ArcGIS ExtensionsArcGIS Extensions
ArcGIS Extensions
 
Utilities Industry Success Stories with FME
Utilities Industry Success Stories with FME Utilities Industry Success Stories with FME
Utilities Industry Success Stories with FME
 
Main map reduce
Main map reduceMain map reduce
Main map reduce
 
From 2D Drawings to 3D Navigation networks built with FME
From 2D Drawings to 3D Navigation networks built with FMEFrom 2D Drawings to 3D Navigation networks built with FME
From 2D Drawings to 3D Navigation networks built with FME
 
MDT7
MDT7MDT7
MDT7
 
Fdi extreme features
Fdi extreme featuresFdi extreme features
Fdi extreme features
 
Hadoop MapReduce joins
Hadoop MapReduce joinsHadoop MapReduce joins
Hadoop MapReduce joins
 
8 Ways Utility Networks Can Meet Data Demands
8 Ways Utility Networks Can Meet Data Demands8 Ways Utility Networks Can Meet Data Demands
8 Ways Utility Networks Can Meet Data Demands
 
Coordinate Systems in FME 101
Coordinate Systems in FME 101 Coordinate Systems in FME 101
Coordinate Systems in FME 101
 

Viewers also liked

Energy For One World- The Book- updated version (December 2012)
Energy For One World- The Book- updated version (December 2012)Energy For One World- The Book- updated version (December 2012)
Energy For One World- The Book- updated version (December 2012)Energy for One World
 
Maintaining Body Balance Through Probiotics By Mr. Hussian Sabuwala
Maintaining Body Balance Through Probiotics By Mr. Hussian SabuwalaMaintaining Body Balance Through Probiotics By Mr. Hussian Sabuwala
Maintaining Body Balance Through Probiotics By Mr. Hussian SabuwalaHealth Education Library for People
 
Conservation of natural resources A Presentation By Mr Allah Dad Khan Former...
Conservation of natural resources  A Presentation By Mr Allah Dad Khan Former...Conservation of natural resources  A Presentation By Mr Allah Dad Khan Former...
Conservation of natural resources A Presentation By Mr Allah Dad Khan Former...Mr.Allah Dad Khan
 
EFOW Performance Platforms- Plan Frame
EFOW Performance Platforms- Plan FrameEFOW Performance Platforms- Plan Frame
EFOW Performance Platforms- Plan FrameEnergy for One World
 
презентация пятница (1)
презентация пятница (1)презентация пятница (1)
презентация пятница (1)yuyukul
 
An Ecological View on Process Improvement: Some Thoughts for Improving Proces...
An Ecological View on Process Improvement: Some Thoughts for Improving Proces...An Ecological View on Process Improvement: Some Thoughts for Improving Proces...
An Ecological View on Process Improvement: Some Thoughts for Improving Proces...Luigi Buglione
 
The LEGO Maturity & Capability Model Approach
The LEGO Maturity & Capability Model ApproachThe LEGO Maturity & Capability Model Approach
The LEGO Maturity & Capability Model ApproachLuigi Buglione
 
Employee onboarding and employee engagement in it organizations human resourc...
Employee onboarding and employee engagement in it organizations human resourc...Employee onboarding and employee engagement in it organizations human resourc...
Employee onboarding and employee engagement in it organizations human resourc...AKSHAY KHATRI
 
презентация лето 13 группа
презентация лето 13 группапрезентация лето 13 группа
презентация лето 13 группаyuyukul
 
Banking Trends for 2016
Banking Trends for 2016Banking Trends for 2016
Banking Trends for 2016Capgemini
 
кожни систем човека
кожни систем човекакожни систем човека
кожни систем човекаMaja Simic
 
Skeletni sistem čoveka
Skeletni sistem čoveka Skeletni sistem čoveka
Skeletni sistem čoveka Ena Horvat
 
THE SCIENCE BEHIND EFFECTIVE FACEBOOK AD CAMPAIGNS
THE SCIENCE BEHIND EFFECTIVE FACEBOOK AD CAMPAIGNSTHE SCIENCE BEHIND EFFECTIVE FACEBOOK AD CAMPAIGNS
THE SCIENCE BEHIND EFFECTIVE FACEBOOK AD CAMPAIGNSunfunnel
 
Measuring Success on Facebook, Twitter & LinkedIn
Measuring Success on Facebook, Twitter & LinkedInMeasuring Success on Facebook, Twitter & LinkedIn
Measuring Success on Facebook, Twitter & LinkedInBrian Honigman
 

Viewers also liked (20)

Energy For One World- The Book- updated version (December 2012)
Energy For One World- The Book- updated version (December 2012)Energy For One World- The Book- updated version (December 2012)
Energy For One World- The Book- updated version (December 2012)
 
My favourite topic
My favourite topicMy favourite topic
My favourite topic
 
Maintaining Body Balance Through Probiotics By Mr. Hussian Sabuwala
Maintaining Body Balance Through Probiotics By Mr. Hussian SabuwalaMaintaining Body Balance Through Probiotics By Mr. Hussian Sabuwala
Maintaining Body Balance Through Probiotics By Mr. Hussian Sabuwala
 
Can Water Heal? By Ms. Anu Mehta
Can Water Heal? By Ms. Anu MehtaCan Water Heal? By Ms. Anu Mehta
Can Water Heal? By Ms. Anu Mehta
 
Conservation of natural resources A Presentation By Mr Allah Dad Khan Former...
Conservation of natural resources  A Presentation By Mr Allah Dad Khan Former...Conservation of natural resources  A Presentation By Mr Allah Dad Khan Former...
Conservation of natural resources A Presentation By Mr Allah Dad Khan Former...
 
EFOW Performance Platforms- Plan Frame
EFOW Performance Platforms- Plan FrameEFOW Performance Platforms- Plan Frame
EFOW Performance Platforms- Plan Frame
 
Presentation
PresentationPresentation
Presentation
 
презентация пятница (1)
презентация пятница (1)презентация пятница (1)
презентация пятница (1)
 
An Ecological View on Process Improvement: Some Thoughts for Improving Proces...
An Ecological View on Process Improvement: Some Thoughts for Improving Proces...An Ecological View on Process Improvement: Some Thoughts for Improving Proces...
An Ecological View on Process Improvement: Some Thoughts for Improving Proces...
 
The LEGO Maturity & Capability Model Approach
The LEGO Maturity & Capability Model ApproachThe LEGO Maturity & Capability Model Approach
The LEGO Maturity & Capability Model Approach
 
Hadoop Security
Hadoop SecurityHadoop Security
Hadoop Security
 
Employee onboarding and employee engagement in it organizations human resourc...
Employee onboarding and employee engagement in it organizations human resourc...Employee onboarding and employee engagement in it organizations human resourc...
Employee onboarding and employee engagement in it organizations human resourc...
 
Kekurangan zat makanan
Kekurangan zat makananKekurangan zat makanan
Kekurangan zat makanan
 
презентация лето 13 группа
презентация лето 13 группапрезентация лето 13 группа
презентация лето 13 группа
 
Banking Trends for 2016
Banking Trends for 2016Banking Trends for 2016
Banking Trends for 2016
 
кожни систем човека
кожни систем човекакожни систем човека
кожни систем човека
 
On Boarding Ppt
On Boarding PptOn Boarding Ppt
On Boarding Ppt
 
Skeletni sistem čoveka
Skeletni sistem čoveka Skeletni sistem čoveka
Skeletni sistem čoveka
 
THE SCIENCE BEHIND EFFECTIVE FACEBOOK AD CAMPAIGNS
THE SCIENCE BEHIND EFFECTIVE FACEBOOK AD CAMPAIGNSTHE SCIENCE BEHIND EFFECTIVE FACEBOOK AD CAMPAIGNS
THE SCIENCE BEHIND EFFECTIVE FACEBOOK AD CAMPAIGNS
 
Measuring Success on Facebook, Twitter & LinkedIn
Measuring Success on Facebook, Twitter & LinkedInMeasuring Success on Facebook, Twitter & LinkedIn
Measuring Success on Facebook, Twitter & LinkedIn
 

Similar to Map Reduce

Comparing Distributed Indexing To Mapreduce or Not?
Comparing Distributed Indexing To Mapreduce or Not?Comparing Distributed Indexing To Mapreduce or Not?
Comparing Distributed Indexing To Mapreduce or Not?TerrierTeam
 
Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentationateeq ateeq
 
2004 map reduce simplied data processing on large clusters (mapreduce)
2004 map reduce simplied data processing on large clusters (mapreduce)2004 map reduce simplied data processing on large clusters (mapreduce)
2004 map reduce simplied data processing on large clusters (mapreduce)anh tuan
 
Spatial Data Integrator - Software Presentation and Use Cases
Spatial Data Integrator - Software Presentation and Use CasesSpatial Data Integrator - Software Presentation and Use Cases
Spatial Data Integrator - Software Presentation and Use Casesmathieuraj
 
Introduction To Map Reduce
Introduction To Map ReduceIntroduction To Map Reduce
Introduction To Map Reducerantav
 
Sawmill - Integrating R and Large Data Clouds
Sawmill - Integrating R and Large Data CloudsSawmill - Integrating R and Large Data Clouds
Sawmill - Integrating R and Large Data CloudsRobert Grossman
 
Mapreduce2008 cacm
Mapreduce2008 cacmMapreduce2008 cacm
Mapreduce2008 cacmlmphuong06
 
MapReduce: Ordering and Large-Scale Indexing on Large Clusters
MapReduce: Ordering and  Large-Scale Indexing on Large ClustersMapReduce: Ordering and  Large-Scale Indexing on Large Clusters
MapReduce: Ordering and Large-Scale Indexing on Large ClustersIRJET Journal
 
Download It
Download ItDownload It
Download Itbutest
 
High Throughput Data Analysis
High Throughput Data AnalysisHigh Throughput Data Analysis
High Throughput Data AnalysisJ Singh
 
High Performance Computing on NYC Yellow Taxi Data Set
High Performance Computing on NYC Yellow Taxi Data SetHigh Performance Computing on NYC Yellow Taxi Data Set
High Performance Computing on NYC Yellow Taxi Data SetParag Ahire
 
Big data & Hadoop
Big data & HadoopBig data & Hadoop
Big data & HadoopAhmed Gamil
 
FME Around the World
FME Around the WorldFME Around the World
FME Around the WorldSafe Software
 

Similar to Map Reduce (20)

Comparing Distributed Indexing To Mapreduce or Not?
Comparing Distributed Indexing To Mapreduce or Not?Comparing Distributed Indexing To Mapreduce or Not?
Comparing Distributed Indexing To Mapreduce or Not?
 
Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentation
 
Map reduce
Map reduceMap reduce
Map reduce
 
2004 map reduce simplied data processing on large clusters (mapreduce)
2004 map reduce simplied data processing on large clusters (mapreduce)2004 map reduce simplied data processing on large clusters (mapreduce)
2004 map reduce simplied data processing on large clusters (mapreduce)
 
Spatial Data Integrator - Software Presentation and Use Cases
Spatial Data Integrator - Software Presentation and Use CasesSpatial Data Integrator - Software Presentation and Use Cases
Spatial Data Integrator - Software Presentation and Use Cases
 
Hadoop Map Reduce
Hadoop Map ReduceHadoop Map Reduce
Hadoop Map Reduce
 
Introduction To Map Reduce
Introduction To Map ReduceIntroduction To Map Reduce
Introduction To Map Reduce
 
Lecture 1 mapreduce
Lecture 1  mapreduceLecture 1  mapreduce
Lecture 1 mapreduce
 
Sawmill - Integrating R and Large Data Clouds
Sawmill - Integrating R and Large Data CloudsSawmill - Integrating R and Large Data Clouds
Sawmill - Integrating R and Large Data Clouds
 
Mapreduce2008 cacm
Mapreduce2008 cacmMapreduce2008 cacm
Mapreduce2008 cacm
 
MapReduce: Ordering and Large-Scale Indexing on Large Clusters
MapReduce: Ordering and  Large-Scale Indexing on Large ClustersMapReduce: Ordering and  Large-Scale Indexing on Large Clusters
MapReduce: Ordering and Large-Scale Indexing on Large Clusters
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Download It
Download ItDownload It
Download It
 
2 mapreduce-model-principles
2 mapreduce-model-principles2 mapreduce-model-principles
2 mapreduce-model-principles
 
High Throughput Data Analysis
High Throughput Data AnalysisHigh Throughput Data Analysis
High Throughput Data Analysis
 
MapReduce-Notes.pdf
MapReduce-Notes.pdfMapReduce-Notes.pdf
MapReduce-Notes.pdf
 
MapReduce.pptx
MapReduce.pptxMapReduce.pptx
MapReduce.pptx
 
High Performance Computing on NYC Yellow Taxi Data Set
High Performance Computing on NYC Yellow Taxi Data SetHigh Performance Computing on NYC Yellow Taxi Data Set
High Performance Computing on NYC Yellow Taxi Data Set
 
Big data & Hadoop
Big data & HadoopBig data & Hadoop
Big data & Hadoop
 
FME Around the World
FME Around the WorldFME Around the World
FME Around the World
 

Recently uploaded

"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 

Recently uploaded (20)

"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 

Map Reduce

  • 2. Functional Programming Map Apply function to transform elements of a list Return results as a list Reduce Apply function to all elements of a list Collect results and return as a single value
  • 3. Definition MapReduce: A software framework to support processing of massive data sets across distributed computers
  • 4. SAMPLE USE CASE Back end credit card processor Nightly processing of millions of transactions Processing requires grouping, sorting, and merchant wide analysis Can’t just divide the over all list into equal parts as further analysis is necessary Tight processing window
  • 5. Description Simple, powerful programming model Language independent Can run on a single machine, but shines for distributed computing and extreme datasets Break down the processing problem into embarrassingly parallel atomic operations
  • 6. Algorithm Map Phase Raw data analyzed and converted to name/value pair Shuffle Phase All name/value pairs are sorted and grouped by their keys Reduce Phase All values associated with a key are processed for results
  • 7. MapReduce Walk Through Goal: Construct a word frequency of all the words in Wikipedia
  • 8. Step 0: Split Data Raw input data divided into N parts N > number of machines Split must be context specific
  • 9. Step 1: Map Each machine takes/receives a single slice of the raw input for mapping The map function processes the input file and emits a name/value pair of the relevant data
  • 10. STEP 2: Shuffle The results of the map phase are sorted and grouped by the key in each key value pair.
  • 11. STEP 3: Reduce Results from shuffle phase divided into M parts M >number of machines Each machine runs a reduction method on a part of shuffle results.
  • 12. MAPREDUCE BENEFITS Scale Processing speed increases with number of machines involved Reliable Loss of any one machine doesn’t stop processing Cost Often built from heterogeneous commodity grade computers
  • 13. Use Case Results Processing time of 1 million records Originally ~3 hours Reduced to 40 minutes on 5 computers
  • 14. Other MapReduce installations Google – Index building Visa – Transaction Processing Facebook – Facebook Lexicon Intelligence Community Yahoo/Google – Terabyte Sort 10 billion, 100 byte records Yahoo: 910 nodes, 206 seconds Google: ~1,000 nodes, 68 seconds