Advance MapReduce Concepts - Module 4
Counters
• Counters provide a way to measure the progress of a MapReduce program or the
number of operations that occur within it.
Built-in counter groups
• Task counters
• File system counters
• FileInputFormat counters
• FileOutputFormat counters
Creating Custom Counters
• The first step is to create an enum that contains the names of all custom
counters for a particular job.
    public enum CustomCounters { VALID, INVALID }
• Inside the map or reduce task, the counter can be incremented:
    if (validRecord) {
        context.getCounter(CustomCounters.VALID).increment(1);   // increase the counter by 1
    } else if (invalidRecord) {
        context.getCounter(CustomCounters.INVALID).increment(1); // increase the counter by 1
    }
• The custom counter values are displayed alongside the built-in counter
values on the job's summary web page, viewed through the JobTracker.
• The values can also be read programmatically:
    long validCounterValue = job.getCounters().findCounter(CustomCounters.VALID).getValue();
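The way enum-keyed counters accumulate can be sketched without a cluster. The following is a minimal model using only the Java standard library, not Hadoop's Counter API: each simulated task keeps its own counts and the totals are summed at the end, roughly as the framework does when it aggregates task counters for a job. The class name CounterSketch, the helper methods, and the "non-empty record is valid" rule are all assumptions made for the illustration.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Stdlib-only model of enum-keyed counters; this is NOT Hadoop's Counter API.
class CounterSketch {
    enum CustomCounters { VALID, INVALID }

    // Hypothetical validity rule: a record is valid if it is not blank.
    static boolean isValid(String record) {
        return record != null && !record.trim().isEmpty();
    }

    // Counts the records seen by one simulated task.
    static Map<CustomCounters, Long> runTask(List<String> records) {
        Map<CustomCounters, Long> counters = new EnumMap<>(CustomCounters.class);
        for (String record : records) {
            CustomCounters key = isValid(record) ? CustomCounters.VALID : CustomCounters.INVALID;
            counters.merge(key, 1L, Long::sum);
        }
        return counters;
    }

    // Sums per-task counters into job-level totals.
    static Map<CustomCounters, Long> aggregate(List<Map<CustomCounters, Long>> perTask) {
        Map<CustomCounters, Long> totals = new EnumMap<>(CustomCounters.class);
        for (Map<CustomCounters, Long> task : perTask) {
            task.forEach((k, v) -> totals.merge(k, v, Long::sum));
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<CustomCounters, Long> task1 = runTask(Arrays.asList("a,1", "", "b,2"));
        Map<CustomCounters, Long> task2 = runTask(Arrays.asList("  ", "c,3"));
        System.out.println(aggregate(Arrays.asList(task1, task2)));
    }
}
```

In a real job the increment happens through context.getCounter(...) as shown above; the sketch only illustrates that counts are kept per task and summed per job.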
Serialization
• Serialization is the process of turning structured objects into a byte
stream for transmission over a network or for writing to persistent
storage.
• Deserialization is the reverse process of turning a byte stream back
into a series of structured objects.
• Hadoop uses its own serialization format, Writables, which is certainly
compact and fast, but not so easy to extend or use from languages
other than Java.
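To give a feel for what "compact" means here, the sketch below compares two ways of turning the same integer into bytes using only the Java standard library: writing the raw value through a DataOutputStream (similar in spirit to a fixed-width Writable) and using Java's built-in object serialization, which also records class metadata. The class and method names are made up for the illustration, and no Hadoop classes are involved.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Compares the serialized size of one int under two stdlib mechanisms.
class SerializedSizeSketch {
    // Writes only the raw bytes of the int.
    static int rawSize(int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    // Uses Java object serialization, which writes a stream header and class description too.
    static int javaSerializedSize(int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(Integer.valueOf(value));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        System.out.println("raw bytes:             " + rawSize(163));
        System.out.println("Java serialized bytes: " + javaSerializedSize(163));
    }
}
```

The raw form is exactly four bytes, while the object-serialized form is several times larger for a single value.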
The Writable Interface
• The Writable interface defines two methods:
  • write(DataOutput out), for writing its state to a DataOutput binary stream
  • readFields(DataInput in), for reading its state from a DataInput binary stream
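The two-method contract can be tried out with plain java.io streams. The sketch below declares a local stand-in interface, SimpleWritable, that mirrors the shape described above so that it runs without Hadoop on the classpath; IntPair and WritableRoundTrip are hypothetical names for the illustration. With Hadoop available, the class would implement org.apache.hadoop.io.Writable instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in mirroring the two-method shape of the Writable interface.
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical value type holding two ints.
class IntPair implements SimpleWritable {
    int first;
    int second;

    IntPair() { }

    IntPair(int first, int second) {
        this.first = first;
        this.second = second;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(first);
        out.writeInt(second);
    }

    // Fields must be read in the same order they were written.
    @Override
    public void readFields(DataInput in) throws IOException {
        first = in.readInt();
        second = in.readInt();
    }
}

class WritableRoundTrip {
    static byte[] serialize(SimpleWritable w) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            w.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static void deserialize(SimpleWritable w, byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            w.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = serialize(new IntPair(3, 7));
        IntPair copy = new IntPair();
        deserialize(copy, data);
        System.out.println(data.length + " bytes -> " + copy.first + ", " + copy.second);
    }
}
```

Note that the object being filled in by readFields already exists; the method overwrites its state rather than returning a new instance, which is why such types need a no-argument constructor.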
Writable class hierarchy
Custom Writables Example
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;

// A pair of Text values that can be used as a MapReduce key.
public class TextPair implements WritableComparable<TextPair> {
    private Text first;
    private Text second;

    // A no-argument constructor is needed so instances can be created and then filled by readFields().
    public TextPair() {
        set(new Text(), new Text());
    }

    public TextPair(String first, String second) {
        set(new Text(first), new Text(second));
    }

    public TextPair(Text first, Text second) {
        set(first, second);
    }

    public void set(Text first, Text second) {
        this.first = first;
        this.second = second;
    }

    public Text getFirst() {
        return first;
    }

    public Text getSecond() {
        return second;
    }

    // Serialize both fields in order.
    @Override
    public void write(DataOutput out) throws IOException {
        first.write(out);
        second.write(out);
    }

    // Deserialize in the same order used by write().
    @Override
    public void readFields(DataInput in) throws IOException {
        first.readFields(in);
        second.readFields(in);
    }

    @Override
    public int hashCode() {
        return first.hashCode() * 163 + second.hashCode();
    }

    @Override
    public boolean equals(Object o) {
        if (o instanceof TextPair) {
            TextPair tp = (TextPair) o;
            return first.equals(tp.first) && second.equals(tp.second);
        }
        return false;
    }

    // Tab-separated representation of the pair.
    @Override
    public String toString() {
        return first + "\t" + second;
    }

    // Order by the first field, then by the second.
    @Override
    public int compareTo(TextPair tp) {
        int cmp = first.compareTo(tp.first);
        if (cmp != 0) {
            return cmp;
        }
        return second.compareTo(tp.second);
    }
}
Error Handling
• Counters can track non-fatal errors that should be recorded without failing the job.
• In the mapper:
    if (some_error_condition) {
        context.getCounter(COUNTER_GROUP, COUNTER).increment(1);
    }
• In the client:
    boolean okay = job.waitForCompletion(true);
    if (okay) {
        Counters counters = job.getCounters();
        Counter bwc = counters.findCounter(COUNTER_GROUP, COUNTER);
        System.out.println("Errors " + bwc.getDisplayName() + ": " + bwc.getValue());
    }
Compression
• Compression reduces the space needed to store files.
• It speeds up data transfer across the network and to or from disk.
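The space saving is easy to observe on repetitive, log-like data. The sketch below uses java.util.zip from the Java standard library rather than Hadoop's own compression codec classes, so it only illustrates the general effect; CompressionSketch and its sample data are assumptions made for the example, and the ratio achieved on real data depends on its content.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Stdlib-only gzip round trip; not Hadoop's codec API.
class CompressionSketch {
    static byte[] gzip(byte[] input) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Closing the stream writes the gzip trailer, so read the bytes only afterwards.
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static byte[] gunzip(byte[] compressed) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                bytes.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Hypothetical sample data: one log-like line repeated many times.
    static byte[] sampleData() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("2024-01-01 INFO record processed\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = sampleData();
        byte[] packed = gzip(raw);
        System.out.println("raw: " + raw.length + " bytes, gzip: " + packed.length + " bytes");
    }
}
```

Fewer bytes on disk also means fewer bytes to move between nodes, which is where the transfer-speed benefit comes from; the cost is the CPU time spent compressing and decompressing.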
Tuning
Map Side Tuning Properties
Reduce Side Tuning Properties
