Mastering Hadoop Map Reduce: Custom Types and Other Optimizations
Scott Crespo
Path to Success
Map Reduce Refresher
Optimization Strategies
Custom Type Example
Applications
What’s Hadoop?
A framework for storing and processing large data sets across a cluster of servers
What’s Map Reduce?
- A programming model for processing data sets that are distributed across a cluster
Raw Data → Map → (K, V) → Shuffle → (K, [V1..Vn]) → Reduce → (K, V)
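What About Hive and Pig?
Use them whenever possible! Hive (HiveQL queries) and Pig (Pig Latin scripts) can compile high-level jobs into MapReduce, so hand-written MapReduce is best kept for jobs they cannot express cleanly.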
Data States in Map Reduce (Letter Count)
Input: HelloWorld
Split: Hello | World
Map: (H,1) (E,1) (L,1) (L,1) (O,1) | (W,1) (O,1) (R,1) (L,1) (D,1)
Partition/Shuffle: (H,[1]) (E,[1]) (L,[1,1,1]) (O,[1,1]) (W,[1]) (R,[1]) (D,[1])
Reduce: (H,1) (E,1) (L,3) (O,2) (W,1) (R,1) (D,1)
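The data states above can be reproduced in memory to see what each phase does. This is a minimal plain-Java sketch, not Hadoop code: `LetterCountSketch` and its methods are hypothetical names, the splits are passed in directly, and keys keep first-seen order to match the figure (Hadoop would also sort keys during the shuffle).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the letter-count data states (no Hadoop involved).
final class LetterCountSketch {

    // Map: emit (letter, 1) for every character in one split.
    static List<Map.Entry<Character, Integer>> map(String split) {
        List<Map.Entry<Character, Integer>> pairs = new ArrayList<>();
        for (char c : split.toUpperCase().toCharArray()) {
            pairs.add(Map.entry(c, 1));
        }
        return pairs;
    }

    // Shuffle: group emitted values by key, e.g. L -> [1, 1, 1].
    // LinkedHashMap keeps first-seen key order; real Hadoop sorts keys here.
    static Map<Character, List<Integer>> shuffle(List<Map.Entry<Character, Integer>> pairs) {
        Map<Character, List<Integer>> grouped = new LinkedHashMap<>();
        for (Map.Entry<Character, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce: sum each key's values, e.g. L -> 3.
    static Map<Character, Integer> reduce(Map<Character, List<Integer>> grouped) {
        Map<Character, Integer> counts = new LinkedHashMap<>();
        grouped.forEach((key, values) ->
                counts.put(key, values.stream().mapToInt(Integer::intValue).sum()));
        return counts;
    }

    // Run map over every split, then shuffle and reduce the combined output.
    static Map<Character, Integer> run(List<String> splits) {
        List<Map.Entry<Character, Integer>> mapped = new ArrayList<>();
        for (String split : splits) {
            mapped.addAll(map(split));
        }
        return reduce(shuffle(mapped));
    }
}
```

`LetterCountSketch.run(List.of("Hello", "World"))` returns `{H=1, E=1, L=3, O=2, W=1, R=1, D=1}`, the same final state as the Reduce row.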
Basic Map Reduce Program Structure
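import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Type parameters are illustrative: (byte offset, line) input as produced by TextInputFormat.
public class MyMapReduceProgram {
  public static class MyMapperClass extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      // map code: turn one input record into zero or more (key, value) pairs
    }
  }

  public static class MyReducerClass extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      // reduce code: combine all values that share one key
    }
  }

  public static void main(String[] args) throws Exception {
    // driver code: configure the Job and submit it
  }
}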
Advanced Optimizations
- Drivers
- Custom Types
- Setup Methods
- Partitioning
- Combiners
- Chaining
- Fault Tolerance
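Partitioning decides which reducer receives each key. Hadoop's default `HashPartitioner` assigns a key to partition `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`; the plain-Java sketch below mimics that rule (`PartitionSketch` is a hypothetical name, not a Hadoop class) to show why a custom key's `hashCode()` matters: equal keys must hash alike so that all of their values meet at one reducer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of hash partitioning, modeled on the rule Hadoop's HashPartitioner uses.
final class PartitionSketch {

    // Choose a reducer index in [0, numReducers) from the key's hash code.
    static int partitionFor(Object key, int numReducers) {
        // Clearing the sign bit keeps negative hash codes from producing a negative index.
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Route every key to its bucket; equal keys always land in the same bucket.
    static List<List<String>> partition(List<String> keys, int numReducers) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String key : keys) {
            buckets.get(partitionFor(key, numReducers)).add(key);
        }
        return buckets;
    }
}
```

A poorly spread `hashCode()` still gives correct results, but it can pile many keys onto one reducer and slow the whole job.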
Generating N-Grams
- N-Gram: a contiguous sequence of n elements (here, words) taken from a longer sequence.
Trigrams of “The quick brown fox jumps over the lazy dog”:
(the quick brown), (quick brown fox), (brown fox jumps),
(fox jumps over), (jumps over the), (over the lazy), (the lazy dog)
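A plain-Java sketch of n-gram generation over whitespace-separated words. `NGrams.of` is a hypothetical helper, not a Hadoop API; it only illustrates the sliding-window definition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds word n-grams with a sliding window.
final class NGrams {

    // Return every run of n consecutive whitespace-separated words, joined by single spaces.
    static List<String> of(String text, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        String[] words = text.trim().split("\\s+");
        List<String> grams = new ArrayList<>();
        // k words yield k - n + 1 n-grams, or none when k < n.
        for (int i = 0; i + n <= words.length; i++) {
            grams.add(String.join(" ", Arrays.copyOfRange(words, i, i + n)));
        }
        return grams;
    }
}
```

`NGrams.of("the quick brown fox jumps over the lazy dog", 3)` returns seven trigrams, from "the quick brown" through "over the lazy" and "the lazy dog".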
Solution Design
NGramCounter {
  NGramMapper {
    map() {
      // Tokenize and sanitize the input line
      // Build each n-gram
      // Output (NGram ngram, Int count = 1)
    }
  }
  NGramCombiner {
    combine() {
      // On the map side, sum the local counts for each n-gram key
      // Output (NGram ngram, Int count)
    }
  }
  NGramReducer {
    reduce() {
      // Sum the counts for each n-gram key across all mappers
      // Output (NGram ngram, Int count)
    }
  }
}
The NGram key is a custom type!
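To see what the combiner buys, here is an in-memory sketch of this design in which each input split is mapped and combined locally before the reducer merges the partial counts. It is a simplified stand-in under stated assumptions: keys are space-joined `String`s rather than the custom `NGram` type, and `NGramCounterSketch` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of NGramMapper + NGramCombiner + NGramReducer.
final class NGramCounterSketch {

    // Mapper + combiner for one input split: lower-case and tokenize each line,
    // build its n-grams, and sum duplicate n-grams locally before "emitting" them.
    static Map<String, Integer> mapAndCombine(List<String> splitLines, int n) {
        Map<String, Integer> local = new HashMap<>();
        for (String line : splitLines) {
            String[] words = line.toLowerCase().trim().split("\\s+");
            for (int i = 0; i + n <= words.length; i++) {
                String gram = String.join(" ", Arrays.copyOfRange(words, i, i + n));
                local.merge(gram, 1, Integer::sum);
            }
        }
        return local;
    }

    // Reducer: merge the partial counts from every split into global totals.
    static Map<String, Integer> reduce(List<Map<String, Integer>> partialCounts) {
        Map<String, Integer> totals = new HashMap<>();
        for (Map<String, Integer> partial : partialCounts) {
            partial.forEach((gram, count) -> totals.merge(gram, count, Integer::sum));
        }
        return totals;
    }
}
```

With two splits, ["the quick brown fox", "the quick brown"] and ["The quick brown dog"], the first split's combiner emits (the quick brown, 2) instead of two separate (the quick brown, 1) pairs, and the reducer's final count for "the quick brown" is 3.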
Work Flow
- Prototype (Python)
- Custom Type (Trigram)
- Unit Tests
- Mapper
- Reducer
Prototype
Quick and Dirty Python
import sys

def test_mapper():
    lines = ["the quick brown fox jumped over the lazy dog", "the quick brown"]
    for line in lines:
        words = line.split()
        length = len(words)
        sys.stdout.write("\nLength of %d \n-------------------\n" % length)
        i = 0
        # slide a three-word window across the line
        while i + 2 < length:
            first = words[i]
            second = words[i + 1]
            third = words[i + 2]
            trigram = "%s %s %s \n" % (first, second, third)
            sys.stdout.write(trigram)
            i += 1

test_mapper()
Output
Length of 9
-------------------
the quick brown
quick brown fox
brown fox jumped
fox jumped over
jumped over the
over the lazy
the lazy dog
Length of 3
-------------------
the quick brown
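Custom Data Types
Custom Key Types
Keys must implement Hadoop's WritableComparable interface:
- Writable: the key can be serialized and transmitted across the network
- Comparable: the key can be compared with other keys, so keys can be sorted and grouped for the reduce phase
Methods to implement: write() and readFields() (Writable), compareTo() (Comparable), hashCode() (used by the default HashPartitioner to choose a reducer), toString() (used by TextOutputFormat when writing output), and equals() (kept consistent with compareTo() and hashCode())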
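The Writable half of the contract boils down to writing fields in a fixed order and reading them back in the same order. This plain-Java sketch uses `java.io.DataOutputStream`/`DataInputStream` directly; it assumes `String` fields with `writeUTF`/`readUTF` in place of Hadoop's `Text` encoding, and `TrigramWritableSketch` is a hypothetical name. The no-argument constructor mirrors Hadoop's need to create an empty key reflectively before calling `readFields()`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical plain-Java model of a Writable trigram key (not the Hadoop interface itself).
final class TrigramWritableSketch {
    private String first = "";
    private String second = "";
    private String third = "";

    // No-arg constructor: the framework creates an empty key, then calls readFields().
    TrigramWritableSketch() {}

    TrigramWritableSketch(String first, String second, String third) {
        this.first = first;
        this.second = second;
        this.third = third;
    }

    // Serialize the fields in a fixed order (writeUTF stands in for Hadoop's Text encoding).
    void write(DataOutput out) throws IOException {
        out.writeUTF(first);
        out.writeUTF(second);
        out.writeUTF(third);
    }

    // Read the fields back in the same order, overwriting all previous state
    // (Hadoop reuses key objects between records).
    void readFields(DataInput in) throws IOException {
        first = in.readUTF();
        second = in.readUTF();
        third = in.readUTF();
    }

    // Serialize to bytes and deserialize into a fresh object, as when a key crosses the network.
    static TrigramWritableSketch roundTrip(TrigramWritableSketch source) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                source.write(out);
            }
            TrigramWritableSketch copy = new TrigramWritableSketch();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            // In-memory streams should not fail; rethrow unchecked so callers need no try/catch.
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public String toString() {
        return first + " " + second + " " + third;
    }
}
```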
Trigram.java
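import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;

public class Trigram implements WritableComparable<Trigram> {
  …
  // Sort by first word, then second, then third.
  @Override
  public int compareTo(Trigram other) {
    int compared = first.compareTo(other.first);
    if (compared != 0) {
      return compared;
    }
    compared = second.compareTo(other.second);
    if (compared != 0) {
      return compared;
    }
    return third.compareTo(other.third);
  }

  // Used by the default HashPartitioner to pick a reducer. Multiplying at each
  // step makes word order matter, so "a b c" and "a c b" hash differently.
  @Override
  public int hashCode() {
    return (first.hashCode() * 163 + second.hashCode()) * 163 + third.hashCode();
  }
}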
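The ordering and hashing rules can be checked outside Hadoop. In this sketch `String` fields stand in for `Text`, `TrigramKey` is a hypothetical name, and an `equals()` is included so that it agrees with `compareTo()` and `hashCode()`.

```java
import java.util.Objects;

// Hypothetical plain-Java stand-in for Trigram's comparison and hashing logic.
final class TrigramKey implements Comparable<TrigramKey> {
    final String first;
    final String second;
    final String third;

    TrigramKey(String first, String second, String third) {
        this.first = Objects.requireNonNull(first);
        this.second = Objects.requireNonNull(second);
        this.third = Objects.requireNonNull(third);
    }

    // Lexicographic order: first word, then second, then third.
    @Override
    public int compareTo(TrigramKey other) {
        int compared = first.compareTo(other.first);
        if (compared != 0) {
            return compared;
        }
        compared = second.compareTo(other.second);
        if (compared != 0) {
            return compared;
        }
        return third.compareTo(other.third);
    }

    // Equal exactly when compareTo() returns 0, keeping the two methods consistent.
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof TrigramKey)) {
            return false;
        }
        TrigramKey t = (TrigramKey) o;
        return first.equals(t.first) && second.equals(t.second) && third.equals(t.third);
    }

    // Fold each field through the multiplier so word order affects the hash.
    @Override
    public int hashCode() {
        return (first.hashCode() * 163 + second.hashCode()) * 163 + third.hashCode();
    }

    @Override
    public String toString() {
        return first + " " + second + " " + third;
    }
}
```

With a simpler `first*163 + second + third` formula, "a b c" and "a c b" would share a hash and always land on the same reducer; folding every field through the multiplier keeps them apart.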
Map Reduce Program
TrigramMapper
public static class TrigramMapper
    extends Mapper<Object, Text, Trigram, IntWritable> {
  …
  @Override
  public void map(Object key, Text value, Context context)
      throws IOException, InterruptedException {
    String line = value.toString().toLowerCase();  // convert to a lower-case String
    line = line.replaceAll("[^a-z\\s]", "");        // keep only letters and whitespace
    String[] words = line.trim().split("\\s+");     // split on runs of whitespace
    int len = words.length;                         // loop bound; lines under 3 words emit nothing
    for (int i = 0; i + 2 < len; i++) {
      first.set(words[i]);
      second.set(words[i + 1]);
      third.set(words[i + 2]);
      trigram.set(first, second, third);
      context.write(trigram, one);                  // emit (trigram, 1)
    }
  }
}
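The tokenize-and-sanitize step is easy to unit test on its own, in line with the Unit Tests step of the workflow. This sketch copies the mapper's cleanup logic into a plain method that returns trigrams as strings instead of calling `context.write()`; `TrigramTokenizerSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Hadoop-free copy of the mapper's tokenizing logic for unit testing.
final class TrigramTokenizerSketch {

    static List<String> trigrams(String line) {
        // Lower-case, then strip everything except a-z and whitespace.
        String clean = line.toLowerCase().replaceAll("[^a-z\\s]", "");
        // Trim first so leading spaces do not yield an empty first token.
        String[] words = clean.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 2 < words.length; i++) {
            out.add(words[i] + " " + words[i + 1] + " " + words[i + 2]);
        }
        return out;
    }
}
```

The `[^a-z\\s]` filter also drops digits and accented letters (for example, "café" becomes "caf"), which is acceptable for an English prototype but worth revisiting for other corpora.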
TrigramReducer
public static class TrigramReducer
    extends Reducer<Trigram, IntWritable, Trigram, IntWritable> {
  private IntWritable result = new IntWritable();  // reused for every output record

  @Override
  public void reduce(Trigram key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable value : values) {
      sum += value.get();          // add each partial count for this trigram
    }
    result.set(sum);
    context.write(key, result);    // emit (trigram, total)
…
Driver
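// Imports used: org.apache.hadoop.conf.Configuration, org.apache.hadoop.fs.Path,
// org.apache.hadoop.io.IntWritable, org.apache.hadoop.mapreduce.Job,
// org.apache.hadoop.mapreduce.lib.input.FileInputFormat,
// org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
public static void main(String[] args) throws Exception {
  Configuration conf = new Configuration();
  Job job = Job.getInstance(conf, "Trigram Count");
  job.setJarByClass(TrigramCount.class);

  // Map phase emits (Trigram, IntWritable) pairs.
  job.setMapperClass(TrigramMapper.class);
  job.setMapOutputKeyClass(Trigram.class);
  job.setMapOutputValueClass(IntWritable.class);

  // Summing is associative and commutative, so the reducer doubles as the combiner.
  job.setReducerClass(TrigramReducer.class);
  job.setCombinerClass(TrigramReducer.class);
  job.setOutputKeyClass(Trigram.class);
  job.setOutputValueClass(IntWritable.class);

  // args[0] = input path, args[1] = output directory (must not already exist).
  FileInputFormat.addInputPath(job, new Path(args[0]));
  FileOutputFormat.setOutputPath(job, new Path(args[1]));
  System.exit(job.waitForCompletion(true) ? 0 : 1);
}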
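The driver reuses `TrigramReducer` as the combiner. That is safe because addition is associative and commutative: summing partial sums gives the same total however the framework groups them, and Hadoop may run a combiner zero, one, or several times. The sketch contrasts this with an average, which cannot be reapplied that way; `CombinerSafetySketch` is a hypothetical name.

```java
import java.util.List;

// Hypothetical illustration of which reduce functions can double as combiners.
final class CombinerSafetySketch {

    // Summation: safe to apply to partial results and again to the combined results.
    static int sum(List<Integer> values) {
        return values.stream().mapToInt(Integer::intValue).sum();
    }

    // Arithmetic mean: NOT safe to reapply, because a mean of means ignores group sizes.
    static double mean(List<Double> values) {
        return values.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
    }
}
```

Summing splits [1, 1, 1] and [1] separately and then summing the results gives 4, the same as summing all four values. By contrast, the mean of [1.0, 2.0, 3.0, 10.0] is 4.0, while the mean of the per-split means (2.0 and 10.0) is 6.0.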
Applications
Speech Recognition: trigram counts like these can feed an n-gram language model that scores candidate word sequences
(Trigram1, 90)
(Trigram2, 76)
(Trigram3, 8)
(Trigram4, 1)
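Counts like these are typically turned into probabilities before a recognizer can use them. A minimal sketch, assuming a plain maximum-likelihood estimate P(w3 | w1 w2) = count(w1 w2 w3) divided by the total count of trigrams that start with w1 w2, with no smoothing; `TrigramProbabilitySketch` is a hypothetical name, and any counts passed to it in testing are toy values.

```java
import java.util.Map;

// Hypothetical sketch: maximum-likelihood trigram probability from raw counts.
final class TrigramProbabilitySketch {

    // Estimate P(w3 | w1 w2) from counts keyed by the space-joined trigram "w1 w2 w3".
    static double probability(Map<String, Integer> trigramCounts,
                              String w1, String w2, String w3) {
        String prefix = w1 + " " + w2 + " ";
        long contextCount = 0;
        for (Map.Entry<String, Integer> entry : trigramCounts.entrySet()) {
            if (entry.getKey().startsWith(prefix)) {
                contextCount += entry.getValue();
            }
        }
        if (contextCount == 0) {
            return 0.0; // unseen context; production language models apply smoothing instead
        }
        return trigramCounts.getOrDefault(prefix + w3, 0) / (double) contextCount;
    }
}
```

With toy counts {"recognize speech today": 3, "recognize speech now": 1}, `probability(counts, "recognize", "speech", "today")` is 3 / 4 = 0.75.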
Other Applications
- Blog posts
- Stocks
- GIS coordinates
Any record with multiple attributes can be modeled as a custom type!
Stock
Attributes
Text timeStamp;
Text ticker;
Float price;
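A sketch of these Stock attributes as a plain-Java type with an explicit sort order (ticker, then timestamp), the kind of ordering a `compareTo()` on a custom Stock key might implement. `StockSketch`, its accessors, and the ISO-8601 timestamp format are assumptions, and a primitive `float` stands in for the `Float` price.

```java
import java.util.Comparator;

// Hypothetical plain-Java model of the Stock attributes.
final class StockSketch {
    private final String timeStamp; // assumed ISO-8601, e.g. "2014-03-01T09:30:00"
    private final String ticker;
    private final float price;

    StockSketch(String timeStamp, String ticker, float price) {
        this.timeStamp = timeStamp;
        this.ticker = ticker;
        this.price = price;
    }

    String timeStamp() { return timeStamp; }
    String ticker() { return ticker; }
    float price() { return price; }

    // Sort by ticker, then time; fixed-width ISO-8601 strings sort chronologically as text.
    static final Comparator<StockSketch> BY_TICKER_THEN_TIME =
            Comparator.comparing(StockSketch::ticker).thenComparing(StockSketch::timeStamp);
}
```

Price is deliberately left out of the ordering; a real key type would also keep `equals()` and `hashCode()` consistent with whatever fields it compares.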
Conclusion
Custom data types can:
- Improve runtime performance
- Result in reusable code
- Provide a consistent interface
Thank You!
Scott Crespo
scott@orlandods.com
