An Introduction to Test-Driven Development on MapReduce
What is TDD 
• A test-first development approach: developers write test cases that
capture the failure cases, then improve the system until it reaches an
acceptable state (the tests pass).
Why it is difficult in Hadoop 
• Hadoop is a distributed framework designed to run on a large cluster
with terabytes of data
• Mimicking the behavior of a Hadoop cluster in a test is very hard
The Best Practice 
• Golden rule of programming: always abstract your business logic.
This makes it easier to unit test.
Example 
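import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class StockMeanReducer extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
    private final DoubleWritable writable = new DoubleWritable();

    @Override
    public void reduce(final Text stockText, Iterable<DoubleWritable> values, Context context)
            throws IOException, InterruptedException {
        // The mean is computed inline, so this logic can only be tested through Hadoop types.
        double total = 0;
        int count = 0;
        for (DoubleWritable stockPrice : values) {
            total += stockPrice.get();
            count++;
        }
        writable.set(total / count);
        context.write(stockText, writable);
    }
}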
The best approach – Abstraction 
public class StockMeanReducer2 extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
    private final DoubleWritable writable = new DoubleWritable();
    private final StockMean stockMean = new StockMean();

    @Override
    public void reduce(final Text stockText, Iterable<DoubleWritable> values, Context context)
            throws IOException, InterruptedException {
        // The reducer only wires Hadoop types to StockMean, which holds the business logic.
        stockMean.reset();
        for (DoubleWritable stockPrice : values) {
            stockMean.add(stockPrice.get());
        }
        writable.set(stockMean.calculate());
        context.write(stockText, writable);
    }
}
The best approach – Abstraction (contd.)
public class StockMean {
    private double total = 0;
    private int instance = 0;

    // Adds one price to the running total.
    public void add(final double value) {
        this.total += value;
        ++this.instance;
    }

    // Returns the mean of the prices added since the last reset (NaN if none were added).
    public double calculate() {
        return total / (double) instance;
    }

    public void reset() {
        this.total = 0;
        this.instance = 0;
    }
}
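Because StockMean has no Hadoop dependency, it can be checked with plain Java. The sketch below repeats the class so that it is self-contained; the StockMeanCheck class and its meanOf helper are hypothetical names used only for illustration.

```java
// Copy of the StockMean class from the slide, so the example compiles on its own.
class StockMean {
    private double total = 0;
    private int instance = 0;

    public void add(final double value) {
        this.total += value;
        ++this.instance;
    }

    public double calculate() {
        return total / (double) instance;
    }

    public void reset() {
        this.total = 0;
        this.instance = 0;
    }
}

// Hypothetical test harness: no cluster, no mocks, no Writable types.
class StockMeanCheck {
    // Feeds the given prices to a fresh StockMean and returns the mean.
    static double meanOf(double... prices) {
        StockMean mean = new StockMean();
        for (double price : prices) {
            mean.add(price);
        }
        return mean.calculate();
    }

    public static void main(String[] args) {
        System.out.println(meanOf(500, 100)); // prints 300.0

        // reset() must discard earlier values.
        StockMean mean = new StockMean();
        mean.add(10);
        mean.reset();
        mean.add(400);
        System.out.println(mean.calculate()); // prints 400.0
    }
}
```

In a real project the same checks would normally live in a JUnit or TestNG test rather than a main method.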
Testing Map Reduce Jobs 
• Best practices are fine, but I still need to test the code inside my
mapper and reducer. What should I do?
Introduction to MRUnit
• MRUnit is a MapReduce unit testing framework.
• Developed by Cloudera, then open-sourced; at the time of writing it is
in the Apache Incubator.
• Built on top of the Mockito mock object framework
• It is a generic framework that you can use with both JUnit and TestNG
MRUnit – Testing Mapper 
[Diagram components: Unit Test, MRUnit MapDriver, Mapper, Mock Output Collector]
(1) Set up and execute the test
(2) Call the map method with a key/value pair
(3) Map output is captured
(4) Compare against the expected outputs
Sample Mapper 
public class StockMeanMapper extends Mapper<Text, DoubleWritable, Text, DoubleWritable> {
    @Override
    protected void map(Text key, DoubleWritable value, Context context)
            throws IOException, InterruptedException {
        // Drop records without a key and records for the "xyz" symbol; pass everything else through.
        if (key == null) return;
        if (key.toString().equalsIgnoreCase("xyz")) return;
        context.write(key, value);
    }
}
Mapper Unit Test 
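import static org.junit.Assert.assertEquals;

import java.io.IOException;
import java.util.List;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.apache.hadoop.mrunit.types.Pair;
import org.junit.Before;
import org.junit.Test;

public class StockMeanMapperTest {
    private Mapper<Text, DoubleWritable, Text, DoubleWritable> mapper;
    private MapDriver<Text, DoubleWritable, Text, DoubleWritable> driver;

    @Before
    public void setUp() {
        mapper = new StockMeanMapper();
        driver = new MapDriver<Text, DoubleWritable, Text, DoubleWritable>(mapper);
    }

    @Test
    public void testPositiveConditionStockMeanMapper() throws IOException {
        // run() returns the captured output; the withOutput(...) expectation is checked by runTest().
        List<Pair<Text, DoubleWritable>> results = driver
                .withInput(new Text("rahul"), new DoubleWritable(1))
                .withOutput(new Text("rahul"), new DoubleWritable(1))
                .run();
        assertEquals(1, results.size());
    }
}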
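The four steps of the mapper test (set up, call map, capture the output, compare) can be sketched without MRUnit. Everything below is a hypothetical, simplified stand-in: SimpleMapper and TinyMapDriver are invented names, keys and values are plain String and double instead of Writable types, and this is not how MRUnit's MapDriver is implemented.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map.Entry;

// Hypothetical stand-in for a mapper: receives one key/value and emits into a collector.
interface SimpleMapper {
    void map(String key, double value, List<Entry<String, Double>> collector);
}

// Hypothetical, simplified driver; it only captures what the mapper emits.
class TinyMapDriver {
    private final SimpleMapper mapper;

    TinyMapDriver(SimpleMapper mapper) {
        this.mapper = mapper;
    }

    // Steps (2) and (3): call map once and capture its output in a list.
    List<Entry<String, Double>> run(String key, double value) {
        List<Entry<String, Double>> captured = new ArrayList<Entry<String, Double>>();
        mapper.map(key, value, captured);
        return captured;
    }
}

class TinyMapDriverDemo {
    // Same filtering rule as the sample mapper: drop null keys and the "xyz" symbol.
    static final SimpleMapper STOCK_FILTER = (key, value, out) -> {
        if (key == null) return;
        if (key.equalsIgnoreCase("xyz")) return;
        out.add(new SimpleEntry<String, Double>(key, value));
    };

    public static void main(String[] args) {
        TinyMapDriver driver = new TinyMapDriver(STOCK_FILTER); // step (1): set up
        // Step (4) would compare these captured outputs with the expected ones.
        System.out.println(driver.run("rahul", 1.0)); // prints [rahul=1.0]
        System.out.println(driver.run("XYZ", 50.0));  // prints []
    }
}
```

The value of a real driver such as MapDriver is that it does the same capture for actual Mapper subclasses and Writable types.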
MRUnit – Testing Reducer 
[Diagram components: Unit Test, MRUnit ReduceDriver, Reducer, Mock Output Collector]
(1) Set up and execute the test
(2) Call the reduce method with a key and its values
(3) Reduce output is captured
(4) Compare against the expected outputs
Sample Reducer 
public class StockMeanReducer2 extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
    private final DoubleWritable writable = new DoubleWritable();
    private final StockMean stockMean = new StockMean();

    @Override
    public void reduce(final Text stockText, Iterable<DoubleWritable> values, Context context)
            throws IOException, InterruptedException {
        stockMean.reset();
        for (DoubleWritable stockPrice : values) {
            stockMean.add(stockPrice.get());
        }
        writable.set(stockMean.calculate());
        context.write(stockText, writable);
    }
}
Reducer Unit Test 
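import static org.junit.Assert.assertEquals;

import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.apache.hadoop.mrunit.types.Pair;
import org.junit.Before;
import org.junit.Test;

public class StockMeanReducerTest {
    private ReduceDriver<Text, DoubleWritable, Text, DoubleWritable> driver;
    private Reducer<Text, DoubleWritable, Text, DoubleWritable> reducer;

    @Before
    public void setup() {
        reducer = new StockMeanReducer2();
        driver = new ReduceDriver<Text, DoubleWritable, Text, DoubleWritable>(reducer);
    }

    @Test
    public void testStockPositive() throws IOException {
        // Mean of 500 and 100 is 300.
        Pair<Text, DoubleWritable> assertPair =
                new Pair<Text, DoubleWritable>(new Text("ananth"), new DoubleWritable(300));
        List<Pair<Text, DoubleWritable>> results = driver
                .withInput(new Text("ananth"),
                        Arrays.asList(new DoubleWritable(500), new DoubleWritable(100)))
                .run();
        assertEquals(assertPair, results.get(0));
    }
}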
MRUnit – Testing MapReduce 
[Diagram components: Unit Test, MRUnit MapReduceDriver (MapDriver, Shuffle, ReduceDriver), Mapper, Reducer]
(1) Set up and execute the test
(2) Call the map method with a key/value pair
(3) MRUnit performs its own in-memory shuffle phase
(4) Call the reduce method with a key and its values
(5) Compare against the expected outputs
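The in-memory shuffle between map and reduce can be pictured as grouping the map output by key, with keys in sorted order, before each group is reduced. The following is a rough plain-Java sketch of that idea, not MRUnit's implementation; InMemoryShuffle and its methods are hypothetical names, and String/double stand in for Text/DoubleWritable.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of shuffle-then-reduce on in-memory data.
class InMemoryShuffle {
    // Convenience factory for one map-output record.
    static Map.Entry<String, Double> pair(String key, double value) {
        return new SimpleEntry<String, Double>(key, value);
    }

    // Shuffle: group values by key; the TreeMap keeps the keys sorted.
    static Map<String, List<Double>> shuffle(List<Map.Entry<String, Double>> mapOutput) {
        Map<String, List<Double>> grouped = new TreeMap<String, List<Double>>();
        for (Map.Entry<String, Double> record : mapOutput) {
            grouped.computeIfAbsent(record.getKey(), k -> new ArrayList<Double>())
                   .add(record.getValue());
        }
        return grouped;
    }

    // Reduce: one mean per key, emitted in the order the groups arrive.
    static Map<String, Double> reduceToMeans(Map<String, List<Double>> grouped) {
        Map<String, Double> means = new LinkedHashMap<String, Double>();
        for (Map.Entry<String, List<Double>> group : grouped.entrySet()) {
            double total = 0;
            for (double value : group.getValue()) {
                total += value;
            }
            means.put(group.getKey(), total / group.getValue().size());
        }
        return means;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Double>> mapOutput = new ArrayList<Map.Entry<String, Double>>();
        mapOutput.add(pair("rahul", 400.0));
        mapOutput.add(pair("ananth", 300.0));
        mapOutput.add(pair("ananth", 100.0));
        System.out.println(reduceToMeans(shuffle(mapOutput))); // prints {ananth=200.0, rahul=400.0}
    }
}
```

The sorted grouping explains why a test can assert on the order of the reduce output, not only on its contents.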
MapReduce Unit Test 
public class StockMeanMapReduceTest {
    private Mapper<Text, DoubleWritable, Text, DoubleWritable> mapper;
    private Reducer<Text, DoubleWritable, Text, DoubleWritable> reducer;
    private MapReduceDriver<Text, DoubleWritable, Text, DoubleWritable, Text, DoubleWritable> driver;

    @Before
    public void setup() {
        mapper = new StockMeanMapper();
        reducer = new StockMeanReducer2();
        driver = new MapReduceDriver<Text, DoubleWritable, Text, DoubleWritable, Text, DoubleWritable>(
                mapper, reducer);
    }
MapReduce Unit Test (contd.)
    @Test
    public void testPositive() throws IOException {
        Pair<Text, DoubleWritable> inputPair1 = new Pair<Text, DoubleWritable>(new Text("ananth"), new DoubleWritable(300));
        Pair<Text, DoubleWritable> inputPair2 = new Pair<Text, DoubleWritable>(new Text("ananth"), new DoubleWritable(100));
        Pair<Text, DoubleWritable> inputPair3 = new Pair<Text, DoubleWritable>(new Text("rahul"), new DoubleWritable(400));
        // "xyz" is filtered out by the mapper, so it must not appear in the output.
        Pair<Text, DoubleWritable> inputPair4 = new Pair<Text, DoubleWritable>(new Text("xyz"), new DoubleWritable(50));

        Pair<Text, DoubleWritable> assertPair1 = new Pair<Text, DoubleWritable>(new Text("ananth"), new DoubleWritable(200));
        Pair<Text, DoubleWritable> assertPair2 = new Pair<Text, DoubleWritable>(new Text("rahul"), new DoubleWritable(400));
        List<Pair<Text, DoubleWritable>> assertPair = Arrays.asList(assertPair1, assertPair2);

        List<Pair<Text, DoubleWritable>> results = driver
                .withInput(inputPair1)
                .withInput(inputPair2)
                .withInput(inputPair3)
                .withInput(inputPair4)
                .run();
        assertEquals(assertPair, results);
    }
}
Wait, there is one more thing!!! 
• Hadoop is all about data.
• We can't always assume that the data will be 100% perfect.
• So is unit testing with MRUnit and mock objects enough?
Hadoop LocalFileSystem
• The Hadoop API provides LocalFileSystem, which enables you to read data
from your local file system and test your MapReduce jobs.
• Best practice is to take a sample of your real data, load it into the
local file system, and test against it.
• LocalFileSystem works only on Linux-based systems.
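Samples of real data tend to contain malformed records, so the parsing step is worth isolating and testing in the same way as StockMean. The sketch below assumes a hypothetical "symbol,price" line format, which the slides do not specify; StockRecordParser is an invented name, and rejecting negative prices is an assumption as well.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map.Entry;
import java.util.Optional;

// Hypothetical parser for lines of the assumed form "symbol,price".
class StockRecordParser {
    // Returns the parsed record, or Optional.empty() for a malformed line.
    static Optional<Entry<String, Double>> parse(String line) {
        if (line == null) {
            return Optional.empty();
        }
        String[] fields = line.split(",", -1);
        if (fields.length != 2) {
            return Optional.empty(); // wrong number of columns
        }
        String symbol = fields[0].trim();
        if (symbol.isEmpty()) {
            return Optional.empty(); // missing symbol
        }
        try {
            double price = Double.parseDouble(fields[1].trim());
            if (Double.isNaN(price) || price < 0) {
                return Optional.empty(); // assumption: such prices are invalid
            }
            Entry<String, Double> record = new SimpleEntry<String, Double>(symbol, price);
            return Optional.of(record);
        } catch (NumberFormatException e) {
            return Optional.empty(); // price is not a number
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("ananth,300"));  // prints Optional[ananth=300.0]
        System.out.println(parse("ananth,oops")); // prints Optional.empty
        System.out.println(parse("ananth"));      // prints Optional.empty
    }
}
```

A mapper that reads text lines could call such a parser and skip (or count) the records it rejects, which keeps the dirty-data rules testable outside Hadoop.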
How can I test LocalFileSystem in 
Windows? – A little hack 
public class WindowsLocalFileSystem extends LocalFileSystem {
    public WindowsLocalFileSystem() {
        super();
    }

    // Create the directory first, then apply the permission through the tolerant setPermission.
    @Override
    public boolean mkdirs(final Path path, final FsPermission permission) throws IOException {
        final boolean result = super.mkdirs(path);
        this.setPermission(path, permission);
        return result;
    }
Hack (contd.)
    // Setting permissions can fail on Windows; log the failure and carry on instead of aborting.
    @Override
    public void setPermission(final Path path, final FsPermission permission) throws IOException {
        try {
            super.setPermission(path, permission);
        } catch (final IOException e) {
            System.err.println("Can't help it, hence ignoring IOException setting permission for path \""
                    + path + "\": " + e.getMessage());
        }
    }
}
How to use it? 
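public class StockMeanDriver extends Configured implements Tool {
    /**
     * @param args command-line arguments, passed through to ToolRunner
     * @throws Exception if the job fails to run
     */
    public static void main(String[] args) throws Exception {
        // Pass the real arguments (not null) and propagate the job's exit code.
        System.exit(ToolRunner.run(new StockMeanDriver(), args));
    }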
How to use it (contd.)
    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        // Run against the local file system with the local job runner.
        conf.set("fs.default.name", "file:///");
        conf.set("mapred.job.tracker", "local");
        conf.set("fs.file.impl", "org.intellipaat.training.hadoop.fs.WindowsLocalFileSystem");
        conf.set("io.serializations", "org.apache.hadoop.io.serializer.JavaSerialization,"
                + "org.apache.hadoop.io.serializer.WritableSerialization");

        Job job = new Job(conf, "Stock Mean");
        job.setJarByClass(StockMeanDriver.class);
        // StockMeanMapper2 is not shown in these slides; with TextInputFormat it must
        // accept LongWritable keys and Text values.
        job.setMapperClass(StockMeanMapper2.class);
        job.setReducerClass(StockMeanReducer2.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(DoubleWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path("input"));
        FileOutputFormat.setOutputPath(job, new Path("output"));

        // Report failure to the caller instead of always returning 0.
        return job.waitForCompletion(true) ? 0 : 1;
    }
}
