Transcript

  • 1. Testing Big Data: Unit Test in Hadoop (Part II) — Jongwook Woo (PhD), High-Performance Internet Computing Center (HiPIC), Educational Partner with Cloudera and Grants Awardee of Amazon AWS, Computer Information Systems Department, California State University, Los Angeles. Seoul Software Testing Conference
  • 2. Contents • Test in General • Use Cases: Big Data in Hadoop and Ecosystems • Unit Test in Hadoop
  • 3. Test in General • Quality Assurance – TDD (Test-Driven Development) • Unit Test – Tests functional units of the S/W – BDD (Behavior-Driven Development) • Based on TDD • Tests the behavior of the S/W – Integration Test: • integrated components – Group of unit tests • CI (Continuous Integration) Server – Hudson, Jenkins, etc.
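Not from the slides — a minimal plain-Java sketch of the unit-test idea above (the class and method names are invented for illustration; a real project would use JUnit): exercise one functional unit with a known input and assert on the output.

```java
// Minimal sketch of a unit test without any framework: the expected result
// is stated up front (TDD style) and checked against the unit under test.
public class AdderTest {
    // the "functional unit of the S/W" being tested (hypothetical example)
    static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        // assert known input -> known output
        if (add(2, 3) != 5) throw new AssertionError("add(2,3) should be 5");
        if (add(-1, 1) != 0) throw new AssertionError("add(-1,1) should be 0");
        System.out.println("all tests passed");  // prints "all tests passed"
    }
}
```

A CI server (slide 4) would compile this and run it on every commit, mailing the team when an assertion fails.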
  • 4. CI Server • Continuous Integration Server – TDD (Test-Driven Development) based • All developers commit their updates every day • The CI server compiles the code and runs the unit tests • If a test fails, everyone receives the failure email – Know who committed the bad code – Hudson, Jenkins, etc. • Supports SCM version-control tools – CVS, Subversion, Git
  • 5. Test in Hadoop • Much harder – Plain JUnit alone cannot drive Hadoop jobs – They run on a cluster – of servers – as parallel computations
  • 6. Use Cases: Shopzilla • Hadoop’s Elephant In The Room – Hadoop testing • Quality Assurance – Unit Test: functional units of the S/W – Integration Test: integrated components – BDD Test: behavior of the S/W • Augmented Development – Use a dev cluster? » Takes too long per day – Hadoop-In-A-Box
  • 7. Use Cases: Shopzilla • Hadoop-In-A-Box – Fully compatible mock environment • Without a cluster • Mocks the cluster state – Test locally • Single-node pseudo cluster • MiniMRCluster • => can test HDFS, Pig
  • 8. Use Cases: Yahoo • Developer – Wants to run Hadoop code on the local machine • Does not want to run Hadoop code on the Hadoop cluster • Yahoo HIT – Hadoop Integration Test – Runs Hadoop tests across the Hadoop ecosystem • Deploy HIT on a single Hadoop node or a cluster • Run tests in Hadoop, Pig, Hive, Oozie, …
  • 9. Unit Test in Hadoop • MRUnit testing framework – is based on JUnit – was donated by Cloudera to Apache – can test MapReduce programs • written for the 0.20, 0.23.x, 1.0.x, and 2.x versions of Hadoop – Can test the Mapper, the Reducer, and the combined Mapper/Reducer pipeline
  • 10. Unit Test in Hadoop • WordCount Example – reads text files and counts how often words occur • The input and the output are text files – Needs three classes • WordCount.java – Driver class with the main function • WordMapper.java – Mapper class with the map method • SumReducer.java – Reducer class with the reduce method
  • 11. WordCount Example • WordMapper.java – Mapper class with the map function – For the given sample input • assuming two map nodes – the sample input is distributed to the maps • The first map emits: – <Hello, 1> <World, 1> <Bye, 1> <World, 1> • The second map emits: – <Hello, 1> <Hadoop, 1> <Goodbye, 1> <Hadoop, 1>
  • 12. WordCount Example • SumReducer.java – Reducer class with the reduce function – For the input from the two Mappers • the reduce method simply sums up the values – which are the occurrence counts for each key • Thus the output of the job is: – <Bye, 1> <Goodbye, 1> <Hadoop, 2> <Hello, 2> <World, 2>
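The two slides above can be condensed into a plain-Java sketch (no Hadoop; the class name `WordCountSketch` is invented here) that computes the same counts the MapReduce job produces for the sample input:

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of what the WordCount job computes end to end:
// split each line into words (map), then sum the counts per word (reduce).
public class WordCountSketch {
    static Map<String, Integer> countWords(String... lines) {
        Map<String, Integer> counts = new TreeMap<>();   // sorted by key, like the shuffled output
        for (String line : lines)
            for (String word : line.split("\\s+"))
                counts.merge(word, 1, Integer::sum);     // reduce step: sum the emitted 1s
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords("Hello World Bye World",
                                      "Hello Hadoop Goodbye Hadoop"));
        // prints {Bye=1, Goodbye=1, Hadoop=2, Hello=2, World=2}
    }
}
```

This matches the job output listed on slide 12; the Hadoop version differs only in that the map and reduce steps run on separate, distributed nodes.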
  • 13. WordCount.java (Driver)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {
  public static void main(String[] args) throws Exception {
    if (args.length != 2) {
      System.out.println("usage: [input] [output]");
      System.exit(-1);
    }
    Job job = Job.getInstance(new Configuration());
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    job.setMapperClass(WordMapper.class);
    job.setReducerClass(SumReducer.class);
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    FileInputFormat.setInputPaths(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    job.setJarByClass(WordCount.class);
    job.submit();
  }
}
  • 14.–20. WordCount.java, annotated (these slides repeat the driver code, each highlighting one step; the callouts are shown here as comments):

public class WordCount {
  public static void main(String[] args) throws Exception {
    // Check the input and output arguments
    if (args.length != 2) {
      System.out.println("usage: [input] [output]");
      System.exit(-1);
    }
    Job job = Job.getInstance(new Configuration());
    // Set the output (key, value) types
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // Set the Mapper/Reducer classes
    job.setMapperClass(WordMapper.class);
    job.setReducerClass(SumReducer.class);
    // Set the input/output format classes
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    // Set the input/output paths
    FileInputFormat.setInputPaths(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    // Set the driver class
    job.setJarByClass(WordCount.class);
    // Submit the job to the master node
    job.submit();
  }
}
  • 21. WordMapper.java (Mapper class)

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordMapper extends Mapper<Object, Text, Text, IntWritable> {
  private Text word = new Text();
  private final static IntWritable one = new IntWritable(1);

  @Override
  public void map(Object key, Text value, Context context)
      throws IOException, InterruptedException {
    // Break the line into words for processing
    StringTokenizer wordList = new StringTokenizer(value.toString());
    while (wordList.hasMoreTokens()) {
      word.set(wordList.nextToken());
      context.write(word, one);
    }
  }
}
  • 22.–26. WordMapper.java, annotated (these slides repeat the Mapper code, each highlighting one step; the callouts are shown here as comments):

// Extends the Mapper class with input/output key and value types
public class WordMapper extends Mapper<Object, Text, Text, IntWritable> {
  // Output (key, value) types
  private Text word = new Text();
  private final static IntWritable one = new IntWritable(1);

  // Input (key, value) types; the output is written through the Context
  @Override
  public void map(Object key, Text value, Context context)
      throws IOException, InterruptedException {
    // Read the words from each line of the input file
    StringTokenizer wordList = new StringTokenizer(value.toString());
    // Count each word
    while (wordList.hasMoreTokens()) {
      word.set(wordList.nextToken());
      context.write(word, one);
    }
  }
}
  • 27. Shuffler/Sorter • Maps emit (key, value) pairs • The Shuffler/Sorter of the Hadoop framework – Sorts the (key, value) pairs by key – Then appends the values to make (key, list of values) pairs – For example, • The first and second maps emit: – <Hello, 1> <World, 1> <Bye, 1> <World, 1> – <Hello, 1> <Hadoop, 1> <Goodbye, 1> <Hadoop, 1> • The shuffler produces the following, which becomes the input of the reducer: – <Bye, 1>, <Goodbye, 1>, <Hadoop, <1,1>>, <Hello, <1,1>>, <World, <1,1>>
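The shuffle/sort step described above can be modeled in a few lines of plain Java (the class name `ShuffleSketch` is invented here; the real Hadoop shuffle is distributed and streams data between nodes):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Model of the shuffle/sort phase: merge the (key, value) pairs emitted by
// all mappers into key-sorted (key, list-of-values) pairs for the reducer.
public class ShuffleSketch {
    static SortedMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> emitted) {
        SortedMap<String, List<Integer>> grouped = new TreeMap<>();  // TreeMap sorts by key
        for (Map.Entry<String, Integer> pair : emitted)
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());                            // append value to the key's list
        return grouped;
    }

    public static void main(String[] args) {
        // the pairs emitted by the two maps on slide 27
        List<Map.Entry<String, Integer>> emitted = List.of(
            Map.entry("Hello", 1), Map.entry("World", 1), Map.entry("Bye", 1), Map.entry("World", 1),
            Map.entry("Hello", 1), Map.entry("Hadoop", 1), Map.entry("Goodbye", 1), Map.entry("Hadoop", 1));
        System.out.println(shuffle(emitted));
        // prints {Bye=[1], Goodbye=[1], Hadoop=[1, 1], Hello=[1, 1], World=[1, 1]}
    }
}
```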
  • 28. SumReducer.java (Reducer class)

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  private IntWritable totalWordCount = new IntWritable();

  @Override
  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int wordCount = 0;
    Iterator<IntWritable> it = values.iterator();
    while (it.hasNext()) {
      wordCount += it.next().get();
    }
    totalWordCount.set(wordCount);
    context.write(key, totalWordCount);
  }
}
  • 29.–33. SumReducer.java, annotated (these slides repeat the Reducer code, each highlighting one step; the callouts are shown here as comments):

// Extends the Reducer class with input/output key and value types
public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  // Output value type
  private IntWritable totalWordCount = new IntWritable();

  // Input (key, list of values) types; the output is written through the Context
  @Override
  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    // For each word, count/sum the number of values
    int wordCount = 0;
    Iterator<IntWritable> it = values.iterator();
    while (it.hasNext()) {
      wordCount += it.next().get();
    }
    // For each word, the total count becomes the value
    totalWordCount.set(wordCount);
    context.write(key, totalWordCount);
  }
}
  • 34. SumReducer • Reducer – Input: the shuffler’s output becomes the input of the reducer • <Bye, 1>, <Goodbye, 1>, <Hadoop, <1,1>>, <Hello, <1,1>>, <World, <1,1>> – Output • <Bye, 1>, <Goodbye, 1>, <Hadoop, 2>, <Hello, 2>, <World, 2>
  • 35. MRUnit Test • How to unit-test in Hadoop – Extend a JUnit test • with the org.apache.hadoop.mrunit.* API – Needs to test the Driver, Mapper, and Reducer • MapReduceDriver, MapDriver, ReduceDriver • Add input with the expected output
  • 36. MRUnit Test

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.MapDriver;
import org.apache.hadoop.mrunit.MapReduceDriver;
import org.apache.hadoop.mrunit.ReduceDriver;
import org.junit.Before;
import org.junit.Test;

public class TestWordCount {
  MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable> mapReduceDriver;
  MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;
  ReduceDriver<Text, IntWritable, Text, IntWritable> reduceDriver;

  @Before
  public void setUp() {
    WordMapper mapper = new WordMapper();
    SumReducer reducer = new SumReducer();
    mapDriver = new MapDriver<LongWritable, Text, Text, IntWritable>();
    mapDriver.setMapper(mapper);
    reduceDriver = new ReduceDriver<Text, IntWritable, Text, IntWritable>();
    reduceDriver.setReducer(reducer);
    mapReduceDriver = new MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable>();
    mapReduceDriver.setMapper(mapper);
    mapReduceDriver.setReducer(reducer);
  }
  • 37. MRUnit Test

  @Test
  public void testMapper() {
    mapDriver.withInput(new LongWritable(1), new Text("cat cat dog"));
    mapDriver.withOutput(new Text("cat"), new IntWritable(1));
    mapDriver.withOutput(new Text("cat"), new IntWritable(1));
    mapDriver.withOutput(new Text("dog"), new IntWritable(1));
    mapDriver.runTest();
  }

  @Test
  public void testReducer() {
    List<IntWritable> values = new ArrayList<IntWritable>();
    values.add(new IntWritable(1));
    values.add(new IntWritable(1));
    reduceDriver.withInput(new Text("cat"), values);
    reduceDriver.withOutput(new Text("cat"), new IntWritable(2));
    reduceDriver.runTest();
  }

  @Test
  public void testMapReduce() {
    mapReduceDriver.withInput(new LongWritable(1), new Text("cat cat dog"));
    mapReduceDriver.addOutput(new Text("cat"), new IntWritable(2));
    mapReduceDriver.addOutput(new Text("dog"), new IntWritable(1));
    mapReduceDriver.runTest();
  }
}
  • 38.–43. MRUnit Test: setUp(), annotated (these slides repeat the setup code, each highlighting one step; the callouts are shown here as comments):

// Using the MRUnit API, declare MapReduce, Mapper, and Reducer drivers
// with their input/output (key, value) types
public class TestWordCount {
  MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable> mapReduceDriver;
  MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;
  ReduceDriver<Text, IntWritable, Text, IntWritable> reduceDriver;

  // setUp() runs before each test method
  @Before
  public void setUp() {
    // Instantiate the WordCount Mapper and Reducer
    WordMapper mapper = new WordMapper();
    SumReducer reducer = new SumReducer();
    // Instantiate and set the Mapper driver with its input/output (key, value) types
    mapDriver = new MapDriver<LongWritable, Text, Text, IntWritable>();
    mapDriver.setMapper(mapper);
    // Instantiate and set the Reducer driver with its input/output (key, value) types
    reduceDriver = new ReduceDriver<Text, IntWritable, Text, IntWritable>();
    reduceDriver.setReducer(reducer);
    // Instantiate and set the MapReduce driver with its input/output (key, value) types
    mapReduceDriver = new MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable>();
    mapReduceDriver.setMapper(mapper);
    mapReduceDriver.setReducer(reducer);
  }
  • 44.–46. MRUnit Test: test methods, annotated (these slides repeat the test code, each highlighting one method; the callouts are shown here as comments):

  // Mapper test: define sample input with expected output
  @Test
  public void testMapper() {
    mapDriver.withInput(new LongWritable(1), new Text("cat cat dog"));
    mapDriver.withOutput(new Text("cat"), new IntWritable(1));
    mapDriver.withOutput(new Text("cat"), new IntWritable(1));
    mapDriver.withOutput(new Text("dog"), new IntWritable(1));
    mapDriver.runTest();
  }

  // Reducer test: define sample input with expected output
  @Test
  public void testReducer() {
    List<IntWritable> values = new ArrayList<IntWritable>();
    values.add(new IntWritable(1));
    values.add(new IntWritable(1));
    reduceDriver.withInput(new Text("cat"), values);
    reduceDriver.withOutput(new Text("cat"), new IntWritable(2));
    reduceDriver.runTest();
  }

  // MapReduce test: define sample input with expected output
  @Test
  public void testMapReduce() {
    mapReduceDriver.withInput(new LongWritable(1), new Text("cat cat dog"));
    mapReduceDriver.addOutput(new Text("cat"), new IntWritable(2));
    mapReduceDriver.addOutput(new Text("dog"), new IntWritable(1));
    mapReduceDriver.runTest();
  }
}
  • 47. MRUnit Test in Practice • Need to implement unit tests – How many? • For every Map, Reduce, and Driver class – Problems? • Mostly works – but it does not support complicated Map and Reduce APIs – How many problems can you detect? • Depends on how well you implement the MRUnit code
  • 48. Conclusion • MRUnit for Hadoop unit-test development • Integrate with the QA site via a CI server • Need to use it
  • 49. Question?
  • 50. References 1. Hadoop WordCount example with new MapReduce API (http://codesfusion.blogspot.com/2013/10/hadoop-wordcount-with-new-map-reduce-api.html) 2. Hadoop Word Count Example (http://wiki.apache.org/hadoop/WordCount) 3. Example: WordCount v1.0, Cloudera Hadoop Tutorial (http://www.cloudera.com/content/cloudera-content/cloudera-docs/HadoopTutorial/CDH4/Hadoop-Tutorial/ht_walk_through.html) 4. Testing Word Count (https://cwiki.apache.org/confluence/display/MRUNIT/Testing+Word+Count) 5. Apache MRUnit Tutorial (https://cwiki.apache.org/confluence/display/MRUNIT/MRUnit+Tutorial) 6. Hadoop Integration Test Suite, Shopzilla (https://github.com/shopzilla/hadoop-integration-test-suite) 7. Hadoop’s Elephant in the Room, Jeremy Lucas, Shopzilla (http://tech.shopzilla.com/2013/04/hadoops-elephant-in-the-room/) 8. Facebook Test MapReduce Local (https://github.com/facebook/hadoop-20/blob/master/src/test/org/apache/hadoop/mapreduce/TestMapReduceLocal.java) 9. Yahoo HIT Hadoop Integrated Testing (http://www.slideshare.net/ydn/hi-tv3?from_search=1)