SlideShare a Scribd company logo
1 of 8
MAP REDUCE: BASICS
1
MR: Processing
<key,
value>
<key, value>
MR Box/Job
MR: For distributed data processing (here discussing with reference to Hadoop).
Provides easy scalability.
Essential data processing units- Mapper and Reducer.
However other additional units can be introduced based on application.
MR: Key-Value Pairs
Following are the requirements for key-value pairs for MR Jobs:
Both the key and value classes need to be serializable. Which
implies they need to implement Writable interface.
Key class need to implement WritableComparable interface, to
enable the sorting.
MR: Detailed Processing
<key, value>
Map
<key, value>
Combine Reduce
<key, value> <key, value>
MR: Terminology
• Job: Execution of mapper/reducer program over a data set.
• Job Tracker: Schedules jobs and tracks the assign jobs to Task tracker.
• Task: Each job is divided into smaller tasks, thus it is execution of map/reduce on slice
of data.
• Task Tracker: Tracks the task and reports status to JobTracker.
• Task Attempt: Attempt to execute a task on slave node.
• Payload:
MR: Types of Nodes
• Named Node: Manages the Hadoop Distributed File System. JobTracker sits here.
• Data Node: Holds all the data required for processing. TaskTrackers sit here.
• Master Node: Executes JobTracker and accepts all the job requests from the client.
• Slave Node: Runs the mapper and reducer jobs/program.
MR: Process
• Map task executes map function on each split of the data. (Split
size is important)
• Output of the map job is written to the local disk not on HDFS
(to avoid replication). Since this is just an intermediately output
and is rejected once the entire process is done. Hence storing
on HDFS will cause replication.
• Output of map is input to reduce job (for simple Map-Reduce
model). Output of reduce (i.e. the final output) is stored on
HDFS (first on local node and then replicated on disk)
MR: References
• https://hadoop.apache.org/docs/current/hadoop-mapreduce-
client/hadoop-mapreduce-client-core/MapReduceTutorial.html
• https://www.guru99.com/introduction-to-mapreduce.html

More Related Content

What's hot

Mapreduce - Simplified Data Processing on Large Clusters
Mapreduce - Simplified Data Processing on Large ClustersMapreduce - Simplified Data Processing on Large Clusters
Mapreduce - Simplified Data Processing on Large ClustersAbhishek Singh
 
"MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ...
"MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ..."MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ...
"MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ...Adrian Florea
 
MapReduce: Simplified Data Processing On Large Clusters
MapReduce: Simplified Data Processing On Large ClustersMapReduce: Simplified Data Processing On Large Clusters
MapReduce: Simplified Data Processing On Large Clusterskazuma_sato
 
Wei's notes on MapReduce Scheduling
Wei's notes on MapReduce SchedulingWei's notes on MapReduce Scheduling
Wei's notes on MapReduce SchedulingLu Wei
 
Hadoop secondary sort and a custom comparator
Hadoop secondary sort and a custom comparatorHadoop secondary sort and a custom comparator
Hadoop secondary sort and a custom comparatorSubhas Kumar Ghosh
 
MapReduce Paradigm
MapReduce ParadigmMapReduce Paradigm
MapReduce ParadigmNilaNila16
 
Hadoop combiner and partitioner
Hadoop combiner and partitionerHadoop combiner and partitioner
Hadoop combiner and partitionerSubhas Kumar Ghosh
 
06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clustering06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clusteringSubhas Kumar Ghosh
 
Mapreduce total order sorting technique
Mapreduce total order sorting techniqueMapreduce total order sorting technique
Mapreduce total order sorting techniqueUday Vakalapudi
 
Introducing MapReduce Programming Framework
Introducing MapReduce Programming FrameworkIntroducing MapReduce Programming Framework
Introducing MapReduce Programming FrameworkSamuel Yee
 
Map reduce in Hadoop
Map reduce in HadoopMap reduce in Hadoop
Map reduce in Hadoopishan0019
 
GoFFish - A Sub-graph centric framework for large scale graph analytics
GoFFish - A Sub-graph centric framework for large scale graph analyticsGoFFish - A Sub-graph centric framework for large scale graph analytics
GoFFish - A Sub-graph centric framework for large scale graph analyticscharithwiki
 
An Enhanced MapReduce Model (on BSP)
An Enhanced MapReduce Model (on BSP)An Enhanced MapReduce Model (on BSP)
An Enhanced MapReduce Model (on BSP)Yu Liu
 
Synchronizing Data Between Smallworld and Azure Cosmos DB
Synchronizing Data Between Smallworld and Azure Cosmos DBSynchronizing Data Between Smallworld and Azure Cosmos DB
Synchronizing Data Between Smallworld and Azure Cosmos DBSafe Software
 
Repartition join in mapreduce
Repartition join in mapreduceRepartition join in mapreduce
Repartition join in mapreduceUday Vakalapudi
 
Automating Crime Data to Import into GIS
Automating Crime Data to Import into GISAutomating Crime Data to Import into GIS
Automating Crime Data to Import into GISSafe Software
 
Map reduce in Hadoop BIG DATA ANALYTICS
Map reduce in Hadoop BIG DATA ANALYTICSMap reduce in Hadoop BIG DATA ANALYTICS
Map reduce in Hadoop BIG DATA ANALYTICSArchana Gopinath
 

What's hot (20)

Mapreduce - Simplified Data Processing on Large Clusters
Mapreduce - Simplified Data Processing on Large ClustersMapreduce - Simplified Data Processing on Large Clusters
Mapreduce - Simplified Data Processing on Large Clusters
 
"MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ...
"MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ..."MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ...
"MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ...
 
MapReduce: Simplified Data Processing On Large Clusters
MapReduce: Simplified Data Processing On Large ClustersMapReduce: Simplified Data Processing On Large Clusters
MapReduce: Simplified Data Processing On Large Clusters
 
Wei's notes on MapReduce Scheduling
Wei's notes on MapReduce SchedulingWei's notes on MapReduce Scheduling
Wei's notes on MapReduce Scheduling
 
Hadoop secondary sort and a custom comparator
Hadoop secondary sort and a custom comparatorHadoop secondary sort and a custom comparator
Hadoop secondary sort and a custom comparator
 
MapReduce Paradigm
MapReduce ParadigmMapReduce Paradigm
MapReduce Paradigm
 
Hadoop combiner and partitioner
Hadoop combiner and partitionerHadoop combiner and partitioner
Hadoop combiner and partitioner
 
06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clustering06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clustering
 
Map reduce
Map reduceMap reduce
Map reduce
 
Mapreduce total order sorting technique
Mapreduce total order sorting techniqueMapreduce total order sorting technique
Mapreduce total order sorting technique
 
Introducing MapReduce Programming Framework
Introducing MapReduce Programming FrameworkIntroducing MapReduce Programming Framework
Introducing MapReduce Programming Framework
 
Map reduce in Hadoop
Map reduce in HadoopMap reduce in Hadoop
Map reduce in Hadoop
 
Accordion - VLDB 2014
Accordion - VLDB 2014Accordion - VLDB 2014
Accordion - VLDB 2014
 
GoFFish - A Sub-graph centric framework for large scale graph analytics
GoFFish - A Sub-graph centric framework for large scale graph analyticsGoFFish - A Sub-graph centric framework for large scale graph analytics
GoFFish - A Sub-graph centric framework for large scale graph analytics
 
An Enhanced MapReduce Model (on BSP)
An Enhanced MapReduce Model (on BSP)An Enhanced MapReduce Model (on BSP)
An Enhanced MapReduce Model (on BSP)
 
Synchronizing Data Between Smallworld and Azure Cosmos DB
Synchronizing Data Between Smallworld and Azure Cosmos DBSynchronizing Data Between Smallworld and Azure Cosmos DB
Synchronizing Data Between Smallworld and Azure Cosmos DB
 
MapReduce
MapReduceMapReduce
MapReduce
 
Repartition join in mapreduce
Repartition join in mapreduceRepartition join in mapreduce
Repartition join in mapreduce
 
Automating Crime Data to Import into GIS
Automating Crime Data to Import into GISAutomating Crime Data to Import into GIS
Automating Crime Data to Import into GIS
 
Map reduce in Hadoop BIG DATA ANALYTICS
Map reduce in Hadoop BIG DATA ANALYTICSMap reduce in Hadoop BIG DATA ANALYTICS
Map reduce in Hadoop BIG DATA ANALYTICS
 

Similar to Map Reduce Basics Explained

Introduction to map reduce
Introduction to map reduceIntroduction to map reduce
Introduction to map reduceM Baddar
 
MapReduce Paradigm
MapReduce ParadigmMapReduce Paradigm
MapReduce ParadigmDilip Reddy
 
MapReduce Paradigm
MapReduce ParadigmMapReduce Paradigm
MapReduce ParadigmDilip Reddy
 
Hadoop eco system with mapreduce hive and pig
Hadoop eco system with mapreduce hive and pigHadoop eco system with mapreduce hive and pig
Hadoop eco system with mapreduce hive and pigKhanKhaja1
 
Big Data Analytics Chapter3-6@2021.pdf
Big Data Analytics Chapter3-6@2021.pdfBig Data Analytics Chapter3-6@2021.pdf
Big Data Analytics Chapter3-6@2021.pdfWasyihunSema2
 
Hadoop fault tolerance
Hadoop  fault toleranceHadoop  fault tolerance
Hadoop fault tolerancePallav Jha
 
writing Hadoop Map Reduce programs
writing Hadoop Map Reduce programswriting Hadoop Map Reduce programs
writing Hadoop Map Reduce programsjani shaik
 
Introduction to the Map-Reduce framework.pdf
Introduction to the Map-Reduce framework.pdfIntroduction to the Map-Reduce framework.pdf
Introduction to the Map-Reduce framework.pdfBikalAdhikari4
 
Hadoop first mr job - inverted index construction
Hadoop first mr job - inverted index constructionHadoop first mr job - inverted index construction
Hadoop first mr job - inverted index constructionSubhas Kumar Ghosh
 
Juniper Innovation Contest
Juniper Innovation ContestJuniper Innovation Contest
Juniper Innovation ContestAMIT BORUDE
 
Hadoop interview questions - Softwarequery.com
Hadoop interview questions - Softwarequery.comHadoop interview questions - Softwarequery.com
Hadoop interview questions - Softwarequery.comsoftwarequery
 
Meethadoop
MeethadoopMeethadoop
MeethadoopIIIT-H
 
MAP REDUCE IN DATA SCIENCE.pptx
MAP REDUCE IN DATA SCIENCE.pptxMAP REDUCE IN DATA SCIENCE.pptx
MAP REDUCE IN DATA SCIENCE.pptxHARIKRISHNANU13
 

Similar to Map Reduce Basics Explained (20)

Introduction to map reduce
Introduction to map reduceIntroduction to map reduce
Introduction to map reduce
 
MapReduce Paradigm
MapReduce ParadigmMapReduce Paradigm
MapReduce Paradigm
 
MapReduce Paradigm
MapReduce ParadigmMapReduce Paradigm
MapReduce Paradigm
 
Hadoop eco system with mapreduce hive and pig
Hadoop eco system with mapreduce hive and pigHadoop eco system with mapreduce hive and pig
Hadoop eco system with mapreduce hive and pig
 
Hadoop Architecture
Hadoop ArchitectureHadoop Architecture
Hadoop Architecture
 
MapReduce.pptx
MapReduce.pptxMapReduce.pptx
MapReduce.pptx
 
Hadoop
HadoopHadoop
Hadoop
 
Big Data Analytics Chapter3-6@2021.pdf
Big Data Analytics Chapter3-6@2021.pdfBig Data Analytics Chapter3-6@2021.pdf
Big Data Analytics Chapter3-6@2021.pdf
 
MapReduce basics
MapReduce basicsMapReduce basics
MapReduce basics
 
Hadoop fault tolerance
Hadoop  fault toleranceHadoop  fault tolerance
Hadoop fault tolerance
 
MapReduce-Notes.pdf
MapReduce-Notes.pdfMapReduce-Notes.pdf
MapReduce-Notes.pdf
 
writing Hadoop Map Reduce programs
writing Hadoop Map Reduce programswriting Hadoop Map Reduce programs
writing Hadoop Map Reduce programs
 
Introduction to the Map-Reduce framework.pdf
Introduction to the Map-Reduce framework.pdfIntroduction to the Map-Reduce framework.pdf
Introduction to the Map-Reduce framework.pdf
 
Hadoop first mr job - inverted index construction
Hadoop first mr job - inverted index constructionHadoop first mr job - inverted index construction
Hadoop first mr job - inverted index construction
 
Juniper Innovation Contest
Juniper Innovation ContestJuniper Innovation Contest
Juniper Innovation Contest
 
Hadoop Map Reduce
Hadoop Map ReduceHadoop Map Reduce
Hadoop Map Reduce
 
Hadoop interview questions - Softwarequery.com
Hadoop interview questions - Softwarequery.comHadoop interview questions - Softwarequery.com
Hadoop interview questions - Softwarequery.com
 
Meethadoop
MeethadoopMeethadoop
Meethadoop
 
MAP REDUCE IN DATA SCIENCE.pptx
MAP REDUCE IN DATA SCIENCE.pptxMAP REDUCE IN DATA SCIENCE.pptx
MAP REDUCE IN DATA SCIENCE.pptx
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 

More from Surinder Kaur

More from Surinder Kaur (12)

Lucene
LuceneLucene
Lucene
 
Agile
AgileAgile
Agile
 
Apache Hive
Apache HiveApache Hive
Apache Hive
 
JSON Parsing
JSON ParsingJSON Parsing
JSON Parsing
 
Analysis of Emergency Evacuation of Building using PEPA
Analysis of Emergency Evacuation of Building using PEPAAnalysis of Emergency Evacuation of Building using PEPA
Analysis of Emergency Evacuation of Building using PEPA
 
Skype
SkypeSkype
Skype
 
NAT
NATNAT
NAT
 
XSLT
XSLTXSLT
XSLT
 
Dom
Dom Dom
Dom
 
java API for XML DOM
java API for XML DOMjava API for XML DOM
java API for XML DOM
 
intelligent sensors and sensor networks
intelligent sensors and sensor networksintelligent sensors and sensor networks
intelligent sensors and sensor networks
 
MPI n OpenMP
MPI n OpenMPMPI n OpenMP
MPI n OpenMP
 

Recently uploaded

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 

Recently uploaded (20)

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 

Map Reduce Basics Explained

  • 2. MR: Processing <key, value> <key, value> MR Box/Job MR: For distributed data processing (here discussing with reference to Hadoop). Provides easy scalability. Essential data processing units- Mapper and Reducer. However other additional units can be introduced based on application.
  • 3. MR: Key-Value Pairs Following are the requirements for key-value pairs for MR Jobs: Both the key and value classes need to be serializable. Which implies they need to implement Writable interface. Key class need to implement WritableComparable interface, to enable the sorting.
  • 4. MR: Detailed Processing <key, value> Map <key, value> Combine Reduce <key, value> <key, value>
  • 5. MR: Terminology • Job: Execution of mapper/reducer program over a data set. • Job Tracker: Schedules jobs and tracks the assign jobs to Task tracker. • Task: Each job is divided into smaller tasks, thus it is execution of map/reduce on slice of data. • Task Tracker: Tracks the task and reports status to JobTracker. • Task Attempt: Attempt to execute a task on slave node. • Payload:
  • 6. MR: Types of Nodes • Named Node: Manages the Hadoop Distributed File System. JobTracker sits here. • Data Node: Holds all the data required for processing. TaskTrackers sit here. • Master Node: Executes JobTracker and accepts all the job requests from the client. • Slave Node: Runs the mapper and reducer jobs/program.
  • 7. MR: Process • Map task executes map function on each split of the data. (Split size is important) • Output of the map job is written to the local disk not on HDFS (to avoid replication). Since this is just an intermediately output and is rejected once the entire process is done. Hence storing on HDFS will cause replication. • Output of map is input to reduce job (for simple Map-Reduce model). Output of reduce (i.e. the final output) is stored on HDFS (first on local node and then replicated on disk)