Learning Objectives - In this module, you will learn Advance MapReduce concepts such as Counters, Custom Writables, Compression, Tuning, Error Handling, and how to deal with complex MapReduce programs.
6. Creating Custom Counters
• First step is to create an Enum that will contain the names of all custom
counters for a particular job.
public enum CustomCounters {VALID, INVALID};
• Inside the map or reduce task, the counter can be adjusted
if(validRecord)
context.getCounter(CustomCounters.VALID).increment(1); // increase the counter by 1
else if(invalidRecord)
context.getCounter(CustomCounters.INVALID).increment(1); // increase the counter by 1
• The custom counter values will be displayed alongside the built-in counter
values on the summary web page for a job viewed through the JobTracker.
• The values can be accessed programmatically
long validCounterValue = job.getCounters().findCounter(CustomCounters.VALID).getValue();
7. Serialization
• Serialization is the process of turning structured objects into a byte
stream for transmission over a network or for writing to persistent
storage.
• Deserialization is the reverse process of turning a byte stream back
into a series of structured objects.
• Hadoop uses its own serialization format, Writables, which is certainly
compact and fast, but not so easy to extend or use from languages
other than Java.
8. The Writable Interface
• The Writable interface defines two methods
• one for writing its state to a DataOutput binary stream
• one for reading its state from a DataInput binary stream
10. Custom Writables Example
public class TextPair implements
WritableComparable<TextPair>{
private Text first;
private Text second;
public TextPair() {set(new Text(), new Text());}
public TextPair(String first, String second) {
set(new Text(first), new Text(second));}
public TextPair(Text first, Text second) {
set(first, second);}
public void set(Text first, Text second) {
this.first = first;
this.second = second;}
public Text getFirst() {
return first;}
public Text getSecond() {
return second;}
public void write(DataOutput out) throws
IOException {
first.write(out); second.write(out);}
public void readFields(DataInput in) throws
IOException {first.readFields(in);
second.readFields(in);}
@Override
public int hashCode() {
return first.hashCode() * 163 +
second.hashCode();}
@Override
public boolean equals(Object o) {
if (o instanceof TextPair) {
TextPair tp = (TextPair) o;
return first.equals(tp.first) &&
second.equals(tp.second);}return false;}
@Override
public String toString() {
return first + "t" + second;}
public int compareTo(TextPair tp) {
int cmp = first.compareTo(tp.first);
if (cmp != 0) {
return cmp;}
return second.compareTo(tp.second);}
}
11. Error Handling
• Handling non-fatal errors that need to be tracked
• In the mapper:
if (some_error_condition){
context.getCounter(COUNTER_GROUP, COUNTER).increment(1);
}
• In the client:
boolean okay = job.waitForCompletion(true);
if (okay){
Counters counters = job.getCounters();
Counter bwc = counters.findCounter(COUNTER_GROUP, COUNTER);
System.out.println("Errors" + bwc.getDisplayName()+":" + bwc.getValue());
}
12. Compression
• It reduces the space needed to store files.
• It speeds up data transfer across the network, or to or from disk.