Hadoop Map/Reduce

    Owen O’Malley
      July 2006
Map/Reduce Goals
– Distribution
   • The data is available where needed.
   • Application does not care how many computers
     are being used.
– Reliability
   • Application does not care that computers or
     networks may have temporary or permanent
     failures.
Application Perspective
• Define Mapper and Reducer classes and a
  “launching” program.
• Mapper
  – Is given a stream of key1, value1 pairs
  – Generates a stream of key2, value2 pairs
• Reducer
  – Is given a key2 and a stream of value2s
  – Generates a stream of key3, value3 pairs
• Launching Program
  – Creates a JobConf to define a job.
  – Submits the JobConf to the JobTracker and waits for
    completion.
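A minimal in-memory sketch of this contract, assuming plain Java types. SimpleMapper, SimpleReducer and LocalRunner are hypothetical names, not Hadoop classes; Hadoop's own interfaces work with Writable keys and values and an output collector. LocalRunner stands in for the framework: it feeds each key1, value1 pair to the mapper, groups the emitted key2, value2 pairs by key, and calls the reducer once per key2.

```java
import java.util.*;
import java.util.function.BiConsumer;

// Hypothetical mapper contract: one key1, value1 pair in, any number of key2, value2 pairs out.
interface SimpleMapper<K1, V1, K2, V2> {
    void map(K1 key, V1 value, BiConsumer<K2, V2> output);
}

// Hypothetical reducer contract: one key2 with all of its value2s in, key3, value3 pairs out.
interface SimpleReducer<K2, V2, K3, V3> {
    void reduce(K2 key, Iterator<V2> values, BiConsumer<K3, V3> output);
}

// Single-process stand-in for the framework: no distribution, no fault tolerance.
final class LocalRunner {
    static <K1, V1, K2 extends Comparable<K2>, V2, K3, V3> List<Map.Entry<K3, V3>> run(
            List<? extends Map.Entry<K1, V1>> input,
            SimpleMapper<K1, V1, K2, V2> mapper,
            SimpleReducer<K2, V2, K3, V3> reducer) {
        // Map phase: collect map output grouped by key2, in sorted key order.
        SortedMap<K2, List<V2>> groups = new TreeMap<>();
        for (Map.Entry<K1, V1> pair : input) {
            mapper.map(pair.getKey(), pair.getValue(),
                    (k2, v2) -> groups.computeIfAbsent(k2, k -> new ArrayList<>()).add(v2));
        }
        // Reduce phase: one call per distinct key2.
        List<Map.Entry<K3, V3>> output = new ArrayList<>();
        for (Map.Entry<K2, List<V2>> group : groups.entrySet()) {
            reducer.reduce(group.getKey(), group.getValue().iterator(),
                    (k3, v3) -> output.add(new AbstractMap.SimpleEntry<>(k3, v3)));
        }
        return output;
    }
}
```

For example, a WordCount mapper that emits (word, 1) for each word, paired with a reducer that sums the counts, turns the single record (0, "hi Owen bye Owen") into [Owen=2, bye=1, hi=1]. Uppercase letters sort before lowercase ones in Java's natural String order, which is why "Owen" comes first.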
Application Dataflow
   [diagram not reproduced]
Input & Output Formats
• The application also chooses input and output
  formats, which define how the persistent data
  is read and written. These are interfaces and
  can be defined by the application.
• InputFormat
  – Splits the input to determine the input to each map
    task.
  – Defines a RecordReader that reads key, value
    pairs that are passed to the map task.
• OutputFormat
  – Given the key, value pairs and a filename, writes
    the reduce task output to persistent storage.
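The two InputFormat duties can be sketched in plain Java as follows. SplitSketch, Split and readLines are hypothetical names; the fixed split size and the (byte offset, line) record format are assumptions modeled on line-oriented text input. A real implementation also has to handle records that straddle two splits, which this sketch ignores.

```java
import java.nio.charset.StandardCharsets;
import java.util.*;

// Hypothetical sketch of what an InputFormat does; not Hadoop's API.
final class SplitSketch {
    // A byte range of the input that becomes the input of one map task.
    static final class Split {
        final long start;
        final long length;
        Split(long start, long length) { this.start = start; this.length = length; }
        @Override public String toString() { return start + "+" + length; }
    }

    // Like getSplits: cut a file of fileLength bytes into splits of at most splitSize bytes.
    static List<Split> computeSplits(long fileLength, long splitSize) {
        if (splitSize <= 0) throw new IllegalArgumentException("splitSize must be positive");
        List<Split> splits = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            splits.add(new Split(start, Math.min(splitSize, fileLength - start)));
        }
        return splits;
    }

    // Like a RecordReader: turn text into (byte offset, line) pairs for the mapper.
    // Assumes '\n' line endings and ignores lines that cross split boundaries.
    static List<Map.Entry<Long, String>> readLines(String text) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        String[] lines = text.split("\n", -1);
        long offset = 0;
        for (int i = 0; i < lines.length; i++) {
            // A trailing newline yields an empty last piece that is not a record.
            if (i == lines.length - 1 && lines[i].isEmpty()) break;
            records.add(new AbstractMap.SimpleEntry<>(offset, lines[i]));
            offset += lines[i].getBytes(StandardCharsets.UTF_8).length + 1; // +1 for '\n'
        }
        return records;
    }
}
```

With these definitions, computeSplits(250, 100) yields three splits (0+100, 100+100, 200+50), and readLines("hi Owen\nbye Owen") yields the records (0, "hi Owen") and (8, "bye Owen").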
Output Ordering
• The application can control the sort order and
  partitions of the output via
  OutputKeyComparator and Partitioner.
• OutputKeyComparator
   – Defines how to compare serialized keys.
   – Defaults to the key class’s WritableComparable
     ordering, but should be defined for any
     application-defined key types.
      • key1.compareTo(key2)
• Partitioner
   – Given a map output key and the number of
     reduces, chooses a reduce.
   – Defaults to HashPartitioner
      • key.hashCode() % numReduces
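Both ordering hooks can be illustrated with ordinary Java objects. This is a hypothetical sketch rather than Hadoop's Partitioner or comparator API, and it compares deserialized keys instead of serialized bytes. One detail the slide formula glosses over: in Java, the % operator applied to a negative hashCode gives a negative result, so the sketch clears the sign bit before taking the remainder.

```java
import java.util.*;

// Hypothetical sketch of the two ordering hooks; not Hadoop's Partitioner/comparator API.
final class OrderingSketch {
    // Hash partitioning as on the slide (key.hashCode() % numReduces), with the sign
    // bit masked off so that a negative hashCode still maps to a valid reduce index.
    static int hashPartition(Object key, int numReduces) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduces;
    }

    // Sort one reduce's keys with a given comparator, as the framework would
    // before handing them to the reducer.
    static List<String> sortKeys(Collection<String> keys, Comparator<String> order) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(order);
        return sorted;
    }
}
```

Sorting the keys hi, Owen and bye with the natural order gives [Owen, bye, hi], while String.CASE_INSENSITIVE_ORDER gives [bye, hi, Owen]; supplying a different comparator is how an application changes the order in which its reducers see keys.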
Combiners
• Combiners are an optimization for jobs with
  reducers that can merge multiple values into
  a single value.
• Typically, the combiner is the same as the
  reducer and runs on the map outputs before they
  are transferred to the reducer’s machine.
• For example, WordCount’s mapper generates
  (word, 1) pairs, and the combiner and reducer
  generate the sum for each word.
  – Input: “hi Owen bye Owen”
  – Map output: (“hi”, 1), (“Owen”, 1), (“bye”, 1), (“Owen”, 1)
  – Combiner output: (“Owen”, 2), (“bye”, 1), (“hi”, 1)
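The WordCount example above can be reproduced with a small, hypothetical sketch; CombinerSketch and its methods are illustrative names, not Hadoop's API.

```java
import java.util.*;

// Hypothetical WordCount pieces; names are illustrative, not Hadoop's.
final class CombinerSketch {
    // Mapper: emit (word, 1) for every whitespace-separated word, in input order.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) out.add(new AbstractMap.SimpleEntry<>(word, 1));
        }
        return out;
    }

    // Combiner (same logic as the reducer): sum the counts per word.
    // The TreeMap also sorts the keys, as the framework does before combining.
    static SortedMap<String, Integer> combine(List<Map.Entry<String, Integer>> pairs) {
        SortedMap<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            sums.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return sums;
    }
}
```

Here combine() turns the four map output records into three, {Owen=2, bye=1, hi=1}, before anything would be sent over the network. Summing is associative and commutative, so applying it to partial groups on the map side and again in the reducer gives the same totals; that property is what lets the reducer double as the combiner.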
Process Communication
• Use a custom RPC implementation
  – Easy to change/extend
  – Defined as Java interfaces
  – Server objects implement the interface
  – Client proxy objects automatically created
• All messages originate at the client
  – Prevents cycles and therefore deadlocks
• Errors
  – Include timeouts and communication problems.
  – Are signaled to client via IOException.
  – Are NEVER signaled to the server.
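Java's built-in dynamic proxies (java.lang.reflect.Proxy) are one way to create client proxy objects automatically from an interface. The sketch below is hypothetical: HeartbeatProtocol, HeartbeatServer and RpcSketch are invented names, and the "transport" is an in-process call with a simulated outage flag instead of a network connection. It illustrates the error rule from the slide: failures surface to the caller as an IOException and are never signaled to the server object.

```java
import java.io.IOException;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

// Hypothetical protocol. Each method declares IOException so that
// communication failures can reach the client as a checked exception.
interface HeartbeatProtocol {
    String heartbeat(String trackerName) throws IOException;
}

// Server object: a plain implementation of the interface.
class HeartbeatServer implements HeartbeatProtocol {
    @Override
    public String heartbeat(String trackerName) {
        return "ack:" + trackerName;
    }
}

final class RpcSketch {
    // Builds a client proxy for any protocol interface. networkDown[0] simulates
    // an outage; a real RPC layer would serialize the call and send it to a server.
    static <T> T getProxy(Class<T> protocol, T server, boolean[] networkDown) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (networkDown[0]) {
                // Transport errors are reported only to the caller, never to the server.
                throw new IOException("call to " + method.getName() + " failed");
            }
            try {
                return method.invoke(server, args);
            } catch (InvocationTargetException e) {
                // Unwrap the server's exception; anything that is not an IOException is wrapped in one.
                Throwable cause = e.getCause();
                throw (cause instanceof IOException) ? cause : new IOException(cause);
            }
        };
        Object instance = Proxy.newProxyInstance(
                protocol.getClassLoader(), new Class<?>[] {protocol}, handler);
        return protocol.cast(instance);
    }
}
```

Because every call goes from client to server and errors only travel back to the caller, the server never blocks waiting on a client, which matches the slide's argument about avoiding cycles and deadlocks.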
Map/Reduce Processes
• Launching Application
  – User application code
  – Submits a specific kind of Map/Reduce job
• JobTracker
  – Handles all jobs
  – Makes all scheduling decisions
• TaskTracker
  – Manager for all tasks on a given node
• Task
  – Runs an individual map or reduce fragment for a
    given job
  – Is forked as a separate process by the TaskTracker
Process Diagram
   [diagram not reproduced]
Job Control Flow
• Application launcher creates and submits job.
• JobTracker initializes job, creates FileSplits,
  and adds tasks to queue.
• TaskTrackers ask for a new map or reduce
  task every 10 seconds or when the previous
  task finishes.
• As tasks run, the TaskTracker reports status
  to the JobTracker every 10 seconds.
• When the job completes, the JobTracker tells the
  TaskTrackers to delete temporary files.
• Application launcher notices job completion
  and stops waiting.
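The control flow can be modeled as a single-threaded sketch under simplifying assumptions: the JobControlSketch class and its task names are hypothetical, heartbeat timing and failures are left out, and reduce tasks are simply queued after the map tasks.

```java
import java.util.*;

// Hypothetical single-threaded model of the job control flow (no timing, no failures).
final class JobControlSketch {
    private final Deque<String> queue = new ArrayDeque<>();
    private final Set<String> running = new HashSet<>();
    private final Set<String> done = new HashSet<>();
    private final int totalTasks;

    // JobTracker initializes the job: one map task per split, plus the reduce tasks.
    JobControlSketch(int numSplits, int numReduces) {
        for (int i = 0; i < numSplits; i++) queue.add("map_" + i);
        for (int i = 0; i < numReduces; i++) queue.add("reduce_" + i);
        totalTasks = numSplits + numReduces;
    }

    // A TaskTracker asks for work (periodically, or when its previous task finishes).
    // Returns null when the queue is empty.
    String requestTask() {
        String task = queue.poll();
        if (task != null) running.add(task);
        return task;
    }

    // A TaskTracker reports that one of its tasks has completed.
    void reportDone(String task) {
        if (running.remove(task)) done.add(task);
    }

    // What the launcher polls while it waits for the job to finish.
    boolean isComplete() {
        return done.size() == totalTasks;
    }
}
```

A tracker loop that keeps requesting tasks and reporting them done drains the queue, after which isComplete() returns true and the launcher would stop waiting.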
Application Launcher
• Application code to create JobConf and set
  the parameters.
  – Mapper, Reducer classes
  – InputFormat and OutputFormat classes
  – Combiner class, if desired
• Writes the JobConf and the application jar to DFS
  and submits the job to the JobTracker.
• Can exit immediately or wait for the job to
  complete or fail.
JobTracker
• Takes JobConf and creates an instance of
  the InputFormat. Calls the getSplits method to
  generate map inputs.
• Creates a JobInProgress object and a set of
  TaskInProgress (“TIP”) and Task objects.
  – JobInProgress is the status of the job.
  – TaskInProgress is the status of a fragment of
    work.
  – Task is an attempt to do a TIP.
• As TaskTrackers request work, they are given
  Tasks to execute.
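The relationship between the three status objects can be sketched as follows. Job, Tip and the per-attempt states are simplified, hypothetical stand-ins for JobInProgress, TaskInProgress and Task. The rule that a TIP is finished as soon as one of its attempts succeeds is an assumption consistent with "Task is an attempt to do a TIP".

```java
import java.util.*;

// Hypothetical, simplified model of the JobTracker's bookkeeping. Job, Tip and
// the attempt list stand in for JobInProgress, TaskInProgress and Task.
final class TipSketch {
    enum AttemptState { RUNNING, SUCCEEDED, FAILED }

    // A TIP: one fragment of work plus every attempt (Task) made at it.
    static final class Tip {
        final String id;
        final List<AttemptState> attempts = new ArrayList<>();

        Tip(String id) { this.id = id; }

        // Hand out a new attempt, e.g. when a TaskTracker asks for work; returns its index.
        int newAttempt() {
            attempts.add(AttemptState.RUNNING);
            return attempts.size() - 1;
        }

        void finish(int attempt, boolean succeeded) {
            attempts.set(attempt, succeeded ? AttemptState.SUCCEEDED : AttemptState.FAILED);
        }

        // The fragment is done as soon as any one attempt succeeds.
        boolean isComplete() {
            return attempts.contains(AttemptState.SUCCEEDED);
        }
    }

    // Job status: complete when every TIP is complete.
    static final class Job {
        final List<Tip> tips = new ArrayList<>();

        Job(int numTips) {
            for (int i = 0; i < numTips; i++) tips.add(new Tip("tip_" + i));
        }

        boolean isComplete() {
            return tips.stream().allMatch(Tip::isComplete);
        }
    }
}
```

Separating the TIP from its attempts is what allows a failed attempt to be retried on another TaskTracker without losing track of the underlying fragment of work.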
TaskTracker
• All Tasks
  – Create the TaskRunner.
  – Copy the job.jar and job.xml from DFS.
  – Localize the JobConf for this Task.
  – Call task.prepare() (details later).
  – Launch the Task in a new JVM under
    TaskTracker.Child.
  – Catch output from the Task and log it at the info level.
  – Take Task status updates and send them to the JobTracker
    every 10 seconds.
  – If the job is killed, kill the task.
  – If the task dies or completes, tell the JobTracker.
TaskTracker for Reduces
• For Reduces, the task.prepare() fetches all of
  the relevant map outputs for this reduce.
• Files are fetched over HTTP from the Jetty server
  embedded in the map’s TaskTracker.
• Files are fetched in parallel threads, but with
  at most 1 concurrent fetch to each host.
• When fetches fail, a backoff scheme is used
  to avoid overloading TaskTrackers.
• Fetching accounts for the first 33% of the
  reduce’s progress.
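A hypothetical sketch of how fetcher threads might choose their next host. The one-fetch-per-host rule comes from the slide; the exponential backoff formula, the class name and the millisecond parameters are assumptions, since the slide only says that a backoff scheme is used. The methods are synchronized so that several fetcher threads can share one scheduler.

```java
import java.util.*;

// Hypothetical reduce-side copy scheduler. The exponential backoff rule and its
// constants are assumptions; the slides only say that a backoff scheme is used.
final class FetchSchedulerSketch {
    private final long baseDelayMs;
    private final long maxDelayMs;
    private final Set<String> busyHosts = new HashSet<>();          // hosts with a fetch in flight
    private final Map<String, Integer> failures = new HashMap<>();  // consecutive failures per host
    private final Map<String, Long> retryAtMs = new HashMap<>();    // earliest next attempt per host

    FetchSchedulerSketch(long baseDelayMs, long maxDelayMs) {
        this.baseDelayMs = baseDelayMs;
        this.maxDelayMs = maxDelayMs;
    }

    // Picks a host for a fetcher thread: not already being fetched from
    // (one fetch per host) and not inside its backoff window. Null if none is eligible.
    synchronized String pickHost(List<String> pendingHosts, long nowMs) {
        for (String host : pendingHosts) {
            if (!busyHosts.contains(host) && retryAtMs.getOrDefault(host, 0L) <= nowMs) {
                busyHosts.add(host);
                return host;
            }
        }
        return null;
    }

    synchronized void onSuccess(String host) {
        busyHosts.remove(host);
        failures.remove(host);
        retryAtMs.remove(host);
    }

    // Doubles the wait after each consecutive failure, capped at maxDelayMs.
    synchronized void onFailure(String host, long nowMs) {
        busyHosts.remove(host);
        int n = failures.merge(host, 1, Integer::sum);
        long delay = Math.min(maxDelayMs, baseDelayMs << Math.min(n - 1, 30));
        retryAtMs.put(host, nowMs + delay);
    }
}
```

With a 1000 ms base delay, a host that fails once is skipped for one second and, if it fails again, for two seconds, so a struggling TaskTracker sees progressively fewer requests.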
Map Tasks
• Use the InputFormat object to create a
  RecordReader from the FileSplit.
• Loop through the keys and values in the
  FileSplit and feed each to the mapper.
• Without a combiner, a SequenceFile is written
  for the keys destined for each reduce.
• With a combiner, the framework buffers
  100,000 keys and values, sorts, combines,
  and writes them to a SequenceFile for each
  reduce.
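The buffer, sort, combine and write cycle can be sketched as follows. MapOutputBuffer is a hypothetical name; the combiner is assumed to sum integer values (as in WordCount), partitioning reuses the hash rule from the Output Ordering slide, and each spill is kept in memory as one sorted map per reduce rather than written to SequenceFiles.

```java
import java.util.*;

// Hypothetical sketch of map-side buffering with a summing combiner.
// Each spill is kept in memory (one sorted map per reduce) instead of SequenceFiles.
final class MapOutputBuffer {
    private final int capacity;    // the slides use 100,000 records
    private final int numReduces;
    private final List<Map.Entry<String, Integer>> buffer = new ArrayList<>();
    // spills.get(i).get(r) is the sorted, combined output of spill i for reduce r.
    final List<List<SortedMap<String, Integer>>> spills = new ArrayList<>();

    MapOutputBuffer(int capacity, int numReduces) {
        this.capacity = capacity;
        this.numReduces = numReduces;
    }

    // Called for each pair the mapper emits; spills when the buffer is full.
    void collect(String key, int value) {
        buffer.add(new AbstractMap.SimpleEntry<>(key, value));
        if (buffer.size() >= capacity) spill();
    }

    // Partition, sort and combine the buffered pairs, then clear the buffer.
    void spill() {
        if (buffer.isEmpty()) return;
        List<SortedMap<String, Integer>> perReduce = new ArrayList<>();
        for (int r = 0; r < numReduces; r++) perReduce.add(new TreeMap<>());
        for (Map.Entry<String, Integer> e : buffer) {
            int r = (e.getKey().hashCode() & Integer.MAX_VALUE) % numReduces;
            // TreeMap keeps keys sorted; merge() applies the summing combiner.
            perReduce.get(r).merge(e.getKey(), e.getValue(), Integer::sum);
        }
        spills.add(perReduce);
        buffer.clear();
    }
}
```

With a capacity of 4 and one reduce, collecting the words of "hi Owen bye Owen" triggers one spill containing {Owen=2, bye=1, hi=1}; a final spill() call flushes whatever remains when the map finishes.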
Reduce Tasks: Sort
• Sort
  – 33% to 66% of reduce’s progress
  – Base
     • Read 100 MB (io.sort.mb) of keys and values into
       memory.
     • Sort the data in memory.
     • Write it to disk.
  – Merge
     • Read 10 (io.sort.factor) files and merge them into 1 file.
     • Repeat as many times as required (2 levels for 100 files,
       3 levels for 1000 files, etc.).
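The level counts quoted above follow from repeatedly merging up to io.sort.factor files into one. The hypothetical sketch below computes the number of levels and shows a single k-way merge with a priority queue; it works on in-memory lists of integers rather than files on disk.

```java
import java.util.*;

// Hypothetical sketch of the reduce-side merge; not Hadoop's merger code.
final class MergeSketch {
    // Levels needed to merge numFiles sorted runs into one when each merge
    // reads at most `factor` files (the role of io.sort.factor).
    static int mergeLevels(int numFiles, int factor) {
        if (factor < 2) throw new IllegalArgumentException("factor must be at least 2");
        int levels = 0;
        while (numFiles > 1) {
            numFiles = (numFiles + factor - 1) / factor; // ceil(numFiles / factor)
            levels++;
        }
        return levels;
    }

    // One merge: combine several sorted runs into a single sorted run
    // using a min-heap that holds the current head of each run.
    static List<Integer> merge(List<List<Integer>> runs) {
        // Heap entries are {runIndex, positionInRun}, ordered by the value they point at.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
                Comparator.comparingInt((int[] e) -> runs.get(e[0]).get(e[1])));
        for (int r = 0; r < runs.size(); r++) {
            if (!runs.get(r).isEmpty()) heap.add(new int[] {r, 0});
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] head = heap.poll();
            List<Integer> run = runs.get(head[0]);
            out.add(run.get(head[1]));
            if (head[1] + 1 < run.size()) heap.add(new int[] {head[0], head[1] + 1});
        }
        return out;
    }
}
```

mergeLevels(100, 10) is 2 and mergeLevels(1000, 10) is 3, matching the slide: 100 files become 10 after one level and 1 after the second.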
Reduce Tasks: Reduce
• Reduce
  – 66% to 100% of reduce’s progress
  – Use a SequenceFile.Reader to read sorted input
    and pass to reducer one key at a time along with
    the associated values.
  – Output keys and values are written to the
    OutputFormat object, which usually writes a file to
    DFS.
  – The output from the reduce is NOT resorted, so it
    keeps the sort order and partitioning of the map
    output keys.
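Two details of the reduce side can be sketched together: the progress accounting across the copy, sort and reduce phases (equal thirds, per the slides), and the loop that hands the reducer one key at a time with all of its values. ReducePhaseSketch and its methods are hypothetical names, and the input is assumed to be already sorted by key.

```java
import java.util.*;
import java.util.function.BiConsumer;

// Hypothetical sketch of the reduce side; not Hadoop's API.
final class ReducePhaseSketch {
    enum Phase { COPY, SORT, REDUCE }

    // Maps progress within a phase (0..1) onto overall reduce progress, using the
    // equal thirds from the slides: copy 0-33%, sort 33-66%, reduce 66-100%.
    static double overallProgress(Phase phase, double phaseFraction) {
        return (phase.ordinal() + phaseFraction) / 3.0;
    }

    // Walks key-sorted (key, value) pairs and calls the reducer once per distinct
    // key with all of that key's values.
    static <K, V> void forEachKey(List<? extends Map.Entry<K, V>> sorted,
                                  BiConsumer<K, List<V>> reducer) {
        int i = 0;
        while (i < sorted.size()) {
            K key = sorted.get(i).getKey();
            List<V> values = new ArrayList<>();
            while (i < sorted.size() && Objects.equals(sorted.get(i).getKey(), key)) {
                values.add(sorted.get(i).getValue());
                i++;
            }
            reducer.accept(key, values);
        }
    }
}
```

For example, overallProgress(Phase.SORT, 0.5) is 0.5, halfway through the whole reduce, and feeding the sorted WordCount pairs through forEachKey calls the reducer for Owen (2 values), bye and hi in that order.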

Securing your Kubernetes cluster_ a step-by-step guide to success !
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 

Hadoop Map Reduce Arch

  • 7. Combiners
    • Combiners are an optimization for jobs whose reducers can merge multiple values into a single value.
    • Typically, the combiner is the same as the reducer and runs on the map outputs before they are transferred to the reducer’s machine.
    • For example, WordCount’s mapper generates (word, 1) pairs, and the combiner and reducer both generate the sum for each word.
      – Input: “hi Owen bye Owen”
      – Map output: (“hi”, 1), (“Owen”, 1), (“bye”, 1), (“Owen”, 1)
      – Combiner output: (“Owen”, 2), (“bye”, 1), (“hi”, 1)
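The WordCount walk-through above can be traced in plain Java. This is a minimal in-memory sketch, not the Hadoop Mapper/Reducer API. The class `WordCountSketch` and its `map` and `sumByKey` helpers are hypothetical. A `TreeMap` stands in for the framework's key sort, which is why the combiner output comes out as `Owen`, `bye`, `hi`: uppercase letters sort before lowercase ones.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of WordCount; this is not the Hadoop Mapper/Reducer API.
class WordCountSketch {
    // Mapper: emit (word, 1) for every whitespace-separated token in the line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // an empty line yields one empty token; skip it
            }
            out.add(new AbstractMap.SimpleEntry<>(word, 1));
        }
        return out;
    }

    // Combiner and reducer share this logic: sum the counts for each word.
    // TreeMap keeps keys in String.compareTo order, so "Owen" sorts before "bye".
    static TreeMap<String, Integer> sumByKey(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            sums.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOut = map("hi Owen bye Owen");
        System.out.println(mapOut);           // [hi=1, Owen=1, bye=1, Owen=1]
        System.out.println(sumByKey(mapOut)); // {Owen=2, bye=1, hi=1}
    }
}
```

Reusing one function as both combiner and reducer works here because addition is associative and commutative. Partial sums computed on the map side therefore do not change the final totals.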
  • 8. Process Communication
    • Use a custom RPC implementation
      – Easy to change/extend
      – Defined as Java interfaces
      – Server objects implement the interface
      – Client proxy objects automatically created
    • All messages originate at the client
      – Prevents cycles and therefore deadlocks
    • Errors
      – Include timeouts and communication problems.
      – Are signaled to client via IOException.
      – Are NEVER signaled to the server.
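The slide does not say how the client proxies are generated. One standard JDK mechanism is `java.lang.reflect.Proxy`, and the sketch below assumes it. `EchoProtocol`, `RpcSketch`, and `getProxy` are hypothetical names. The network hop is replaced by a direct reflective call on an in-process server object, so the sketch shows only the shape of the design:

- The protocol is an interface.
- The server implements it.
- The client gets a generated proxy.
- Every failure surfaces to the caller as an `IOException`.

```java
import java.io.IOException;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

// Hypothetical protocol: a plain Java interface whose methods all declare IOException.
interface EchoProtocol {
    String echo(String msg) throws IOException;
}

class RpcSketch {
    // Create a client-side proxy for `protocol`. In this sketch the "transport" is a
    // reflective call on an in-process server object instead of a network round trip.
    static <T> T getProxy(Class<T> protocol, Object server) {
        InvocationHandler handler = (proxy, method, args) -> {
            try {
                return method.invoke(server, args);
            } catch (InvocationTargetException e) {
                Throwable cause = e.getCause();
                if (cause instanceof IOException) {
                    throw cause; // already the error type clients expect
                }
                // Any other failure is reported to the caller as an IOException.
                throw new IOException("call to " + method.getName() + " failed", cause);
            }
        };
        Object p = Proxy.newProxyInstance(
            protocol.getClassLoader(), new Class<?>[] {protocol}, handler);
        return protocol.cast(p);
    }
}
```

Note that the server object never calls back into the client, which matches the rule that all messages originate at the client.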
  • 9. Map/Reduce Processes
    • Launching Application
      – User application code
      – Submits a specific kind of Map/Reduce job
    • JobTracker
      – Handles all jobs
      – Makes all scheduling decisions
    • TaskTracker
      – Manager for all tasks on a given node
    • Task
      – Runs an individual map or reduce fragment for a given job
      – Forks from the TaskTracker
  • 11. Job Control Flow
    • Application launcher creates and submits job.
    • JobTracker initializes job, creates FileSplits, and adds tasks to queue.
    • TaskTrackers ask for a new map or reduce task every 10 seconds or when the previous task finishes.
    • As tasks run, the TaskTracker reports status to the JobTracker every 10 seconds.
    • When job completes, the JobTracker tells the TaskTrackers to delete temporary files.
    • Application launcher notices job completion and stops waiting.
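In the pull-based flow above, the TaskTrackers ask and the JobTracker answers. This can be modeled as a queue behind a single `heartbeat` call. The sketch is hypothetical and single-threaded: `JobTrackerSketch`, `heartbeat`, and the `map_N`/`reduce_N` task names are illustrative. The 10-second timer is left to the caller, and the sketch ignores failures and the dependency of reduces on finished maps.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical single-JobTracker model of pull-based scheduling: TaskTrackers call
// heartbeat(), and the JobTracker only ever answers; it never calls a TaskTracker.
class JobTrackerSketch {
    private final Queue<String> pending = new ArrayDeque<>();
    private final Map<String, String> running = new HashMap<>(); // task -> tracker
    private final int total;
    private int completed = 0;

    JobTrackerSketch(int maps, int reduces) {
        for (int i = 0; i < maps; i++) pending.add("map_" + i);
        for (int i = 0; i < reduces; i++) pending.add("reduce_" + i);
        total = maps + reduces;
    }

    // A TaskTracker calls this on its timer (e.g. every 10 seconds) or as soon as
    // a task finishes. Returns the next task to run, or null if none is waiting.
    synchronized String heartbeat(String tracker, String finishedTask) {
        if (finishedTask != null && running.remove(finishedTask) != null) {
            completed++;
        }
        String next = pending.poll();
        if (next != null) {
            running.put(next, tracker);
        }
        return next;
    }

    synchronized boolean isComplete() {
        return completed == total;
    }
}
```

Because every exchange starts with a TaskTracker's call, the JobTracker needs no connection back to the trackers. This mirrors the "all messages originate at the client" rule from the Process Communication slide.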
  • 12. Application Launcher
    • Application code to create JobConf and set the parameters.
      – Mapper, Reducer classes
      – InputFormat and OutputFormat classes
      – Combiner class, if desired
    • Writes JobConf and the application jar to DFS and submits job to JobTracker.
    • Can exit immediately or wait for the job to complete or fail.
  • 13. JobTracker
    • Takes JobConf and creates an instance of the InputFormat. Calls the getSplits method to generate map inputs.
    • Creates a JobInProgress object and a bunch of TaskInProgress “TIP” and Task objects.
      – JobInProgress is the status of the job.
      – TaskInProgress is the status of a fragment of work.
      – Task is an attempt to do a TIP.
    • As TaskTrackers request work, they are given Tasks to execute.
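The following sketches the split step, assuming a fixed target split size. `SplitSketch`, `Split`, and `getSplits` are hypothetical names that only echo the slide's vocabulary; a real InputFormat decides split sizes and record boundaries in its own way. Each resulting byte range would become the input of one map TIP.

```java
import java.util.ArrayList;
import java.util.List;

class SplitSketch {
    // Hypothetical stand-in for a FileSplit: the byte range [start, start + length) of one file.
    static final class Split {
        final String path;
        final long start;
        final long length;

        Split(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }

        @Override
        public String toString() {
            return path + ":" + start + "+" + length;
        }
    }

    // Cut a file of fileLength bytes into consecutive splits of at most splitSize bytes.
    static List<Split> getSplits(String path, long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<Split> splits = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            splits.add(new Split(path, start, Math.min(splitSize, fileLength - start)));
        }
        return splits;
    }
}
```

For a 250-byte file and a 100-byte split size, this yields three splits: 0+100, 100+100, and 200+50.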
  • 14. TaskTracker
    • All Tasks
      – Create the TaskRunner
      – Copy the job.jar and job.xml from DFS.
      – Localize the JobConf for this Task.
      – Call task.prepare() (details later)
      – Launch the Task in a new JVM under TaskTracker.Child.
      – Catch output from Task and log it at the info level.
      – Take Task status updates and send to JobTracker every 10 seconds.
      – If job is killed, kill the task.
      – If task dies or completes, tell the JobTracker.
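The JDK's `ProcessBuilder` can sketch two of the steps above: launching each Task in its own JVM and logging its output at the info level. The command layout, the `ChildLauncherSketch` class, and its `childCommand`/`runAndLog` methods are assumptions for illustration. The slide does not give the real child's classpath, main class, or arguments, so the sketch leaves them as parameters.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical launcher; the command layout below is an assumption, not Hadoop's.
class ChildLauncherSketch {
    private static final Logger LOG = Logger.getLogger("TaskRunner");

    // Build the command for a child JVM using the same Java installation as this one.
    static List<String> childCommand(String classpath, String mainClass, String taskId) {
        List<String> cmd = new ArrayList<>();
        cmd.add(Paths.get(System.getProperty("java.home"), "bin", "java").toString());
        cmd.add("-classpath");
        cmd.add(classpath);
        cmd.add(mainClass);
        cmd.add(taskId);
        return cmd;
    }

    // Start the child, copy each line it prints (stdout and stderr merged) into the
    // log at INFO level, and return the child's exit code once its output ends.
    static int runAndLog(List<String> cmd) throws IOException, InterruptedException {
        Process child = new ProcessBuilder(cmd).redirectErrorStream(true).start();
        try (BufferedReader out = new BufferedReader(
                new InputStreamReader(child.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = out.readLine()) != null) {
                LOG.info(line);
            }
        }
        return child.waitFor();
    }
}
```

Running each task in a separate JVM means a crash or memory leak in user code takes down only that child, and the TaskTracker can report the failure to the JobTracker.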
  • 15. TaskTracker for Reduces
    • For Reduces, task.prepare() fetches all of the relevant map outputs for this reduce.
    • Files are fetched over HTTP from the Jetty server in each map’s TaskTracker.
    • Files are fetched in parallel threads, but with only 1 fetch to each host at a time.
    • When fetches fail, a backoff scheme is used to keep from overloading TaskTrackers.
    • Fetching accounts for the first 33% of the reduce’s progress.
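The slide names a backoff scheme without specifying it. The sketch below assumes exponential backoff, doubling per consecutive failure up to a cap. It also shows the one-fetch-per-host rule as a selection step. `FetchSchedulerSketch`, `backoffMillis`, and `pickFetches` are hypothetical, as are the base and cap values a caller would pass.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical fetch policy for a reduce's copy phase.
class FetchSchedulerSketch {
    // Assumed policy: no delay before the first try, then baseMillis doubled per
    // additional consecutive failure, never exceeding maxMillis.
    static long backoffMillis(int failures, long baseMillis, long maxMillis) {
        if (failures <= 0) {
            return 0;
        }
        long delay = baseMillis;
        for (int i = 1; i < failures && delay < maxMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxMillis);
    }

    // Choose map outputs to fetch next: at most one per host, skipping hosts that
    // already have a fetch in flight. `pending` maps a map-output id to its host.
    static List<String> pickFetches(Map<String, String> pending, Set<String> busyHosts) {
        List<String> picked = new ArrayList<>();
        Set<String> used = new HashSet<>(busyHosts);
        for (Map.Entry<String, String> e : pending.entrySet()) {
            if (used.add(e.getValue())) { // add() is false if the host is taken
                picked.add(e.getKey());
            }
        }
        return picked;
    }
}
```

Capping the delay keeps a long-failing host from being abandoned entirely, while doubling quickly reduces the load on a TaskTracker that is struggling to serve its outputs.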
  • 16. Map Tasks
    • Use the InputFormat object to create a RecordReader from the FileSplit.
    • Loop through the keys and values in the FileSplit and feed each to the mapper.
    • With no combiner, a SequenceFile is written for the keys going to each reduce.
    • With a combiner, the framework buffers 100,000 keys and values, sorts, combines, and writes them to SequenceFiles for each reduce.
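The map-side buffering with a combiner can be modeled in memory. Records are collected until the buffer is full. Each record is then routed to a reduce, the records are sorted by key, and duplicates are summed. The details of this hypothetical `MapSpillSketch` are:

- `bufferLimit` plays the role of the 100,000-record buffer.
- Per-reduce `TreeMap`s stand in for the SequenceFiles.
- The combiner is WordCount's sum.
- `partition` assumes a hash rule that masks the sign bit, so negative `hashCode()` values still map to a valid reduce.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of map-side buffering with a WordCount (summing) combiner.
class MapSpillSketch {
    private final int numReduces;
    private final int bufferLimit; // stands in for the 100,000-record buffer
    private final List<Map.Entry<String, Integer>> buffer = new ArrayList<>();
    // One entry per spill; each spill holds a key-sorted, combined run per reduce.
    final List<List<TreeMap<String, Integer>>> spills = new ArrayList<>();

    MapSpillSketch(int numReduces, int bufferLimit) {
        this.numReduces = numReduces;
        this.bufferLimit = bufferLimit;
    }

    // Assumed partitioning rule: mask the sign bit, then take the remainder.
    static int partition(String key, int numReduces) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduces;
    }

    // Called once per mapper output record; spills when the buffer is full.
    void collect(String key, int value) {
        buffer.add(new AbstractMap.SimpleEntry<>(key, value));
        if (buffer.size() >= bufferLimit) {
            spill();
        }
    }

    // Route each buffered record to its reduce, sort by key (TreeMap), and sum duplicates.
    void spill() {
        if (buffer.isEmpty()) {
            return;
        }
        List<TreeMap<String, Integer>> runs = new ArrayList<>();
        for (int r = 0; r < numReduces; r++) {
            runs.add(new TreeMap<>());
        }
        for (Map.Entry<String, Integer> e : buffer) {
            runs.get(partition(e.getKey(), numReduces))
                .merge(e.getKey(), e.getValue(), Integer::sum);
        }
        spills.add(runs);
        buffer.clear();
    }
}
```

Combining before the data leaves the map machine is what saves network traffic. With "hi Owen bye Owen", the spill holds three records instead of four.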
  • 17. Reduce Tasks: Sort
    • Sort
      – 33% to 66% of reduce’s progress
      – Base
        • Read 100 MB (io.sort.mb) of keys and values into memory.
        • Sort the in-memory data.
        • Write to disk.
      – Merge
        • Read 10 (io.sort.factor) files and merge them into 1 file.
        • Repeat as many times as required (2 levels for 100 files, 3 levels for 1000 files, etc.)
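The level counts on the slide follow from merging at most io.sort.factor runs per pass. Each level divides the number of runs by the factor, rounding up. The hypothetical `MergeSketch` below computes that count. It also shows one k-way merge of sorted runs using a `java.util.PriorityQueue` of run heads. This is an in-memory illustration of the simple scheme on the slide, not the framework's file merger.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class MergeSketch {
    // Number of merge levels needed to reduce `files` sorted runs to one run,
    // merging at most `factor` runs at a time (factor plays the role of io.sort.factor).
    static int mergeLevels(int files, int factor) {
        if (factor < 2) {
            throw new IllegalArgumentException("factor must be at least 2");
        }
        int levels = 0;
        while (files > 1) {
            files = (files + factor - 1) / factor; // ceil(files / factor)
            levels++;
        }
        return levels;
    }

    // One k-way merge: a min-heap holds the current head of each non-empty run.
    static List<String> merge(List<List<String>> runs) {
        // Heap entries are {runIndex, positionInRun}, ordered by the key they point at.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (a, b) -> runs.get(a[0]).get(a[1]).compareTo(runs.get(b[0]).get(b[1])));
        for (int r = 0; r < runs.size(); r++) {
            if (!runs.get(r).isEmpty()) {
                heap.add(new int[] {r, 0});
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            List<String> run = runs.get(top[0]);
            out.add(run.get(top[1]));
            if (top[1] + 1 < run.size()) {
                heap.add(new int[] {top[0], top[1] + 1}); // advance within that run
            }
        }
        return out;
    }
}
```

With a factor of 10, `mergeLevels(100, 10)` is 2 and `mergeLevels(1000, 10)` is 3, matching the slide.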
  • 18. Reduce Tasks: Reduce
    • Reduce
      – 66% to 100% of reduce’s progress
      – Use a SequenceFile.Reader to read sorted input and pass to reducer one key at a time along with the associated values.
      – Output keys and values are written to the OutputFormat object, which usually writes a file to DFS.
      – The output from the reduce is NOT re-sorted, so it is in the order and fragmentation of the map output keys.
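The reduce phase hands the reducer one key at a time with all of its values. On key-sorted input, that is a single pass that flushes a group whenever the key changes. The hypothetical `ReduceSketch.groupAndReduce` shows that pass. `progress` converts a phase plus the fraction done within it into overall progress, using the one-third splits from the slides: phase 0 is fetch, 1 is sort, and 2 is reduce. Both names are illustrative, not framework APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

class ReduceSketch {
    // Walk key-sorted (key, value) pairs and call `reducer` once per distinct key
    // with all of that key's values. Correct only if equal keys are adjacent.
    static void groupAndReduce(List<Map.Entry<String, Integer>> sorted,
                               BiConsumer<String, List<Integer>> reducer) {
        String currentKey = null;
        List<Integer> values = new ArrayList<>();
        for (Map.Entry<String, Integer> e : sorted) {
            if (currentKey != null && !currentKey.equals(e.getKey())) {
                reducer.accept(currentKey, values); // key changed: flush the group
                values = new ArrayList<>();
            }
            currentKey = e.getKey();
            values.add(e.getValue());
        }
        if (currentKey != null) {
            reducer.accept(currentKey, values); // flush the last group
        }
    }

    // Overall reduce-task progress: fetch = phase 0, sort = phase 1, reduce = phase 2,
    // each worth one third; fractionDone is the progress within the current phase.
    static double progress(int phase, double fractionDone) {
        return (phase + fractionDone) / 3.0;
    }
}
```

Because the grouping relies on adjacency, the sort phase has to finish before the reducer can see any key. This is also why the reduce output simply follows the sorted order of the map output keys.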