2. What is TDD
• A test-first development approach: developers write test cases first to capture the failure cases, then improve the system until it reaches an acceptable state.
3. Why it is difficult in Hadoop
• Hadoop is a distributed framework designed to run on large clusters with terabytes of data
• Mimicking the behavior of a Hadoop cluster is very hard
4. The Best Practice
• Golden Rule of Programming
Always abstract your business logic. This will make it easier for you to unit test
5. Example
public class StockMeanReducer extends Reducer<Text,DoubleWritable,Text,DoubleWritable>
{
  private DoubleWritable writable = new DoubleWritable();

  @Override
  public void reduce(final Text stockText, Iterable<DoubleWritable> values, Context context)
      throws IOException, InterruptedException
  {
    double total = 0;
    int count = 0;
    for (DoubleWritable stockPrice : values)
    {
      total += stockPrice.get();
      count++;
    }
    writable.set(total / count);
    context.write(stockText, writable);
  }
}
6. The best approach – Abstraction
public class StockMeanReducer2 extends Reducer<Text,DoubleWritable,Text,DoubleWritable>
{
  private DoubleWritable writable = new DoubleWritable();
  private final StockMean stockMean = new StockMean();

  @Override
  public void reduce(final Text stockText, Iterable<DoubleWritable> values, Context context)
      throws IOException, InterruptedException
  {
    stockMean.reset();
    for (DoubleWritable stockPrice : values)
    {
      stockMean.add(stockPrice.get());
    }
    writable.set(stockMean.calculate());
    context.write(stockText, writable);
  }
}
7. The best approach – Abstraction – cont
public class StockMean
{
  private double total = 0;
  private int instance = 0;

  public void add(final double value)
  {
    this.total += value;
    ++this.instance;
  }

  public double calculate()
  {
    return total / instance;
  }

  public void reset()
  {
    this.total = 0;
    this.instance = 0;
  }
}
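With the arithmetic extracted into StockMean, the business logic can be exercised as plain Java with no Hadoop classes on the classpath at all. A minimal sketch (StockMean is copied inline so the example is self-contained; in the real project it would be a JUnit test importing the class):

```java
// Plain-Java check of the extracted StockMean logic -- no Hadoop needed.
public class StockMeanCheck {
    // Inline copy of StockMean from the slide, for a self-contained sketch.
    static class StockMean {
        private double total = 0;
        private int instance = 0;
        public void add(final double value) { total += value; ++instance; }
        public double calculate() { return total / instance; }
        public void reset() { total = 0; instance = 0; }
    }

    public static void main(String[] args) {
        StockMean mean = new StockMean();
        mean.add(500);
        mean.add(100);
        // (500 + 100) / 2 = 300
        if (Math.abs(mean.calculate() - 300.0) > 1e-9)
            throw new AssertionError("expected mean 300");
        mean.reset();
        mean.add(42);
        if (Math.abs(mean.calculate() - 42.0) > 1e-9)
            throw new AssertionError("expected mean 42 after reset");
        System.out.println("StockMean checks passed");
    }
}
```

This is exactly what the Golden Rule buys you: the mean computation is testable in milliseconds, independent of any cluster.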
8. Testing MapReduce Jobs
• Best practices are fine, but I still need to test the code inside my mapper and reducer. What shall I do?
9. Introduction to MRUnit
• MRUnit is a MapReduce unit testing framework.
• Developed by Cloudera, since open sourced, and currently in the Apache Incubator.
• Built on top of the Mockito mock object framework.
• It is a generic framework that you can use with both JUnit and TestNG.
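For reference, MRUnit is typically pulled in as a Maven test dependency. The coordinates below are the Apache ones, but the version and the hadoop classifier shown are assumptions and must be matched to your Hadoop release:

```xml
<dependency>
  <groupId>org.apache.mrunit</groupId>
  <artifactId>mrunit</artifactId>
  <version>1.1.0</version> <!-- assumption: pick the current release -->
  <classifier>hadoop2</classifier> <!-- or hadoop1, depending on your cluster -->
  <scope>test</scope>
</dependency>
```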
10. MRUnit – Testing Mapper
[Diagram: Unit Test → MRUnit MapDriver → Mapper → Mock Output Collector]
(1) Set up and execute the test
(2) Call the map method with a key/value pair
(3) The map output is captured
(4) Compare against the expected outputs
12. Mapper Unit Test
public class StockMeanMapperTest {
  private Mapper<Text,DoubleWritable,Text,DoubleWritable> mapper;
  private MapDriver<Text,DoubleWritable,Text,DoubleWritable> driver;

  @Before
  public void setUp()
  {
    mapper = new StockMeanMapper();
    driver = new MapDriver<Text,DoubleWritable,Text,DoubleWritable>(mapper);
  }

  @Test
  public void testPositiveConditionStockMeanMapper() throws IOException
  {
    List<Pair<Text,DoubleWritable>> results = driver
        .withInput(new Text("rahul"), new DoubleWritable(1))
        .withOutput(new Text("rahul"), new DoubleWritable(1))
        .run();
    assertEquals(1, results.size());
  }
}
13. MRUnit – Testing Reducer
[Diagram: Unit Test → MRUnit ReduceDriver → Reducer → Mock Output Collector]
(1) Set up and execute the test
(2) Call the reduce method with a key and its values
(3) The reduce output is captured
(4) Compare against the expected outputs
14. Sample Reducer
public class StockMeanReducer2 extends Reducer<Text,DoubleWritable,Text,DoubleWritable>
{
  private DoubleWritable writable = new DoubleWritable();
  private final StockMean stockMean = new StockMean();

  @Override
  public void reduce(final Text stockText, Iterable<DoubleWritable> values, Context context)
      throws IOException, InterruptedException
  {
    stockMean.reset();
    for (DoubleWritable stockPrice : values)
    {
      stockMean.add(stockPrice.get());
    }
    writable.set(stockMean.calculate());
    context.write(stockText, writable);
  }
}
15. Reducer Unit Test
public class StockMeanReducerTest {
  private ReduceDriver<Text,DoubleWritable,Text,DoubleWritable> driver;
  private Reducer<Text,DoubleWritable,Text,DoubleWritable> reducer;

  @Before
  public void setup()
  {
    reducer = new StockMeanReducer2();
    driver = new ReduceDriver<Text,DoubleWritable,Text,DoubleWritable>(reducer);
  }

  @Test
  public void testStockPositive() throws IOException
  {
    Pair<Text,DoubleWritable> assertPair =
        new Pair<Text,DoubleWritable>(new Text("ananth"), new DoubleWritable(300));
    List<Pair<Text,DoubleWritable>> results = driver
        .withInput(new Text("ananth"),
            Arrays.asList(new DoubleWritable(500), new DoubleWritable(100)))
        .run();
    assertEquals(assertPair, results.get(0));
  }
}
16. MRUnit – Testing MapReduce
[Diagram: Unit Test → MRUnit MapReduceDriver → MapDriver/Mapper → Shuffle → ReduceDriver/Reducer]
(1) Set up and execute the test
(2) Call the map method with key/value pairs
(3) MRUnit performs its own in-memory shuffle phase
(4) Call the reduce method with each key and its values
(5) Compare against the expected outputs
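Step (3) is what MapReduceDriver adds over the single-phase drivers. A toy sketch of what such an in-memory shuffle does, using plain String/Double in place of Text/DoubleWritable (an illustration of the idea, not MRUnit's actual implementation):

```java
import java.util.*;

// Toy in-memory "shuffle": group map-output values by key, with keys
// sorted, mirroring what happens between the map and reduce phases.
public class InMemoryShuffle {
    public static SortedMap<String, List<Double>> shuffle(
            List<Map.Entry<String, Double>> mapOutput) {
        SortedMap<String, List<Double>> grouped = new TreeMap<>();
        for (Map.Entry<String, Double> kv : mapOutput) {
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                   .add(kv.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Double>> mapOutput = Arrays.asList(
            new AbstractMap.SimpleEntry<>("rahul", 400.0),
            new AbstractMap.SimpleEntry<>("ananth", 300.0),
            new AbstractMap.SimpleEntry<>("ananth", 100.0));
        // "ananth" gets both of its values grouped together, and the
        // reducer sees the keys in sorted order.
        System.out.println(shuffle(mapOutput));
    }
}
```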
17. MapReduce Unit Test
public class StockMeanMapReduceTest
{
  private Mapper<Text,DoubleWritable,Text,DoubleWritable> mapper;
  private Reducer<Text,DoubleWritable,Text,DoubleWritable> reducer;
  private MapReduceDriver<Text,DoubleWritable,Text,DoubleWritable,Text,DoubleWritable> driver;

  @Before
  public void setup()
  {
    mapper = new StockMeanMapper();
    reducer = new StockMeanReducer2();
    driver = new MapReduceDriver<Text,DoubleWritable,Text,DoubleWritable,Text,DoubleWritable>(mapper, reducer);
  }
18. MapReduce Unit Test – Contd..
  @Test
  public void testPositive() throws IOException
  {
    Pair<Text,DoubleWritable> inputPair1 = new Pair<Text,DoubleWritable>(new Text("ananth"), new DoubleWritable(300));
    Pair<Text,DoubleWritable> inputPair2 = new Pair<Text,DoubleWritable>(new Text("ananth"), new DoubleWritable(100));
    Pair<Text,DoubleWritable> inputPair3 = new Pair<Text,DoubleWritable>(new Text("rahul"), new DoubleWritable(400));
    Pair<Text,DoubleWritable> inputPair4 = new Pair<Text,DoubleWritable>(new Text("xyz"), new DoubleWritable(50));
    // Expected means: ananth (300 + 100) / 2 = 200; every input key must
    // appear in the expected output, so "xyz" is included as well.
    Pair<Text,DoubleWritable> assertPair1 = new Pair<Text,DoubleWritable>(new Text("ananth"), new DoubleWritable(200));
    Pair<Text,DoubleWritable> assertPair2 = new Pair<Text,DoubleWritable>(new Text("rahul"), new DoubleWritable(400));
    Pair<Text,DoubleWritable> assertPair3 = new Pair<Text,DoubleWritable>(new Text("xyz"), new DoubleWritable(50));
    List<Pair<Text,DoubleWritable>> assertPairs = Arrays.asList(assertPair1, assertPair2, assertPair3);
    List<Pair<Text,DoubleWritable>> results = driver
        .withInput(inputPair1)
        .withInput(inputPair2)
        .withInput(inputPair3)
        .withInput(inputPair4)
        .run();
    assertEquals(assertPairs, results);
  }
}
19. Wait, there is one more thing!!!
• Hadoop is all about data.
• We can't always assume that data will be 100% perfect.
• So is MRUnit testing with mock objects enough?
20. Hadoop LocalFileSystem
• The Hadoop API provides LocalFileSystem, which enables you to read data from your local file system and test your MapReduce jobs.
• Best practice is to take a sample of your real data, load it into the local file system, and test against it.
• LocalFileSystem only works out of the box on Linux-based systems.
21. How can I test LocalFileSystem in Windows? – A little hack
public class WindowsLocalFileSystem extends LocalFileSystem
{
  public WindowsLocalFileSystem()
  {
    super();
  }

  @Override
  public boolean mkdirs(final Path path, final FsPermission permission) throws IOException
  {
    final boolean result = super.mkdirs(path);
    this.setPermission(path, permission);
    return result;
  }
22. Hack Contd..
  @Override
  public void setPermission(final Path path, final FsPermission permission) throws IOException
  {
    try {
      super.setPermission(path, permission);
    }
    catch (final IOException e) {
      // Windows cannot apply the POSIX permissions, so ignore the failure.
      System.err.println("Can't help it, hence ignoring IOException setting permission for path \""
          + path + "\": " + e.getMessage());
    }
  }
}
23. How to use it?
public class StockMeanDriver extends Configured implements Tool
{
  /**
   * @param args
   * @throws Exception
   */
  public static void main(String[] args) throws Exception {
    ToolRunner.run(new StockMeanDriver(), args);
  }
24. How to use it – contd..
  @Override
  public int run(String[] arg0) throws Exception
  {
    Configuration conf = getConf();
    conf.set("fs.default.name", "file:///");
    conf.set("mapred.job.tracker", "local");
    conf.set("fs.file.impl", "org.intellipaat.training.hadoop.fs.WindowsLocalFileSystem");
    conf.set("io.serializations", "org.apache.hadoop.io.serializer.JavaSerialization," +
        "org.apache.hadoop.io.serializer.WritableSerialization");

    Job job = new Job(conf, "Stock Mean");
    job.setJarByClass(StockMeanDriver.class);
    job.setMapperClass(StockMeanMapper2.class);
    job.setReducerClass(StockMeanReducer2.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(DoubleWritable.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(DoubleWritable.class);
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    FileInputFormat.addInputPath(job, new Path("input"));
    FileOutputFormat.setOutputPath(job, new Path("output"));
    job.waitForCompletion(Boolean.TRUE);
    return 0;
  }
}