2014 International Software Testing Conference in Seoul

Introduction to test Hadoop MapReduce code and working environment
1. Testing Big Data: Unit Test in Hadoop (Part II)
Jongwook Woo (PhD)
High-Performance Internet Computing Center (HiPIC)
Educational Partner with Cloudera and Grants Awardee of Amazon AWS
Computer Information Systems Department
California State University, Los Angeles
Seoul Software Testing Conference
2. Contents
• Test in General
• Use Cases: Big Data in Hadoop and Ecosystems
• Unit Test in Hadoop
3. Test in General
• Quality Assurance
  – TDD (Test Driven Development)
• Unit Test
  – Tests functional units of the S/W
• BDD (Behavior Driven Development)
  – Based on TDD
  – Tests the behavior of the S/W
• Integration Test
  – Tests integrated components: a group of unit tests
• CI (Continuous Integration) Server
  – Hudson, Jenkins, etc.
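As a minimal illustration of the unit-test idea above, here is a plain-Java sketch. The WordUtil class and its capitalize method are hypothetical examples invented for this sketch; a real JUnit test would put each check in an @Test method using assertEquals.

```java
// Hypothetical unit under test: capitalize the first letter of a word
class WordUtil {
    static String capitalize(String s) {
        if (s == null || s.isEmpty()) return s;
        return Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }
}

public class WordUtilTest {
    public static void main(String[] args) {
        // In JUnit these would be @Test methods using assertEquals
        if (!"Hadoop".equals(WordUtil.capitalize("hadoop")))
            throw new AssertionError("capitalize failed");
        if (!"Hi".equals(WordUtil.capitalize("Hi")))
            throw new AssertionError("already-capitalized case failed");
        System.out.println("all tests passed");
    }
}
```

A CI server would compile and run such tests on every commit, which is exactly the loop the next slide describes.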
4. CI Server
• Continuous Integration Server
  – Based on TDD (Test Driven Development)
    • All developers commit their updates every day
    • The CI server compiles the code and runs the unit tests
    • If a test fails, all developers receive the failure email
      – so the team knows who committed the bad code
  – Hudson, Jenkins, etc.
    • Support SCM version control tools
      – CVS, Subversion, Git
5. Test in Hadoop
• Much harder
  – JUnit cannot be used in Hadoop
  – Cluster
  – Server
  – Parallel computing
6. Use Cases: Shopzilla
• Hadoop’s Elephant In The Room
  – Hadoop testing
• Quality Assurance
  – Unit Test: functional units of the S/W
  – Integration Test: integrated components
  – BDD Test: behavior of the S/W
• Augmented Development
  – Use a dev cluster?
    » Too long per day
  – Hadoop-In-A-Box
7. Use Cases: Shopzilla
• Hadoop-In-A-Box
  – Fully compatible mock environment
    • Without a cluster
    • Mocks the cluster state
  – Test locally
    • Single-node pseudo cluster
    • MiniMRCluster
    • => can test HDFS, Pig
8. Use Cases: Yahoo
• Developer
  – Wants to run Hadoop code on the local machine
    • Does not want to run Hadoop code on the Hadoop cluster
• Yahoo HIT
  – Hadoop Integration Test
  – Runs Hadoop tests in the Hadoop ecosystems
    • Deploy HIT on a single Hadoop node or on a cluster
    • Run tests in Hadoop, Pig, Hive, Oozie, …
9. Unit Test in Hadoop
• MRUnit testing framework
  – Is based on JUnit
  – Donated by Cloudera to Apache
  – Can test MapReduce programs written for the 0.20, 0.23.x, 1.0.x, and 2.x versions of Hadoop
  – Can test the Mapper, the Reducer, and the MapperReducer pipeline
10. Unit Test in Hadoop
• WordCount Example
  – Reads text files and counts how often words occur
  – The input and the output are text files
  – Needs three classes
    • WordCount.java: Driver class with the main function
    • WordMapper.java: Mapper class with the map method
    • SumReducer.java: Reducer class with the reduce method
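Before the three Hadoop classes, the counting logic itself can be sketched in plain Java as a local, non-distributed illustration (the class name LocalWordCount is made up for this sketch):

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCount {
    // Count word occurrences in a block of text, sorted by word
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        StringTokenizer words = new StringTokenizer(text);
        while (words.hasMoreTokens()) {
            counts.merge(words.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Same sample input the following slides split across two map nodes
        System.out.println(count("Hello World Bye World Hello Hadoop Goodbye Hadoop"));
        // {Bye=1, Goodbye=1, Hadoop=2, Hello=2, World=2}
    }
}
```

Hadoop distributes exactly this work: the tokenizing becomes the Mapper and the summing becomes the Reducer, as the next two slides show.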
11. WordCount Example
• WordMapper.java
  – Mapper class with the map function
  – For the given sample input, assuming two map nodes
    • The sample input is distributed to the maps
    • The first map emits:
      – <Hello, 1> <World, 1> <Bye, 1> <World, 1>
    • The second map emits:
      – <Hello, 1> <Hadoop, 1> <Goodbye, 1> <Hadoop, 1>
12. WordCount Example
• SumReducer.java
  – Reducer class with the reduce function
  – For the input from the two Mappers, the reduce method just sums up the values,
    • which are the occurrence counts for each key
  – Thus the output of the job is:
    • <Bye, 1> <Goodbye, 1> <Hadoop, 2> <Hello, 2> <World, 2>
13-20. WordCount.java (Driver)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {
  public static void main(String[] args) throws Exception {
    // Check the input and output arguments
    if (args.length != 2) {
      System.out.println("usage: [input] [output]");
      System.exit(-1);
    }
    Job job = Job.getInstance(new Configuration());
    // Set the output (key, value) types
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // Set the Mapper/Reducer classes
    job.setMapperClass(WordMapper.class);
    job.setReducerClass(SumReducer.class);
    // Set the input/output format classes
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    // Set the input/output paths
    FileInputFormat.setInputPaths(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    // Set the driver class
    job.setJarByClass(WordCount.class);
    // Submit the job to the master node
    job.submit();
  }
}
21-26. WordMapper.java (Mapper class)

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Extends the Mapper class with the input/output key and value types:
// input (key, value) is (Object, Text), output (key, value) is (Text, IntWritable)
public class WordMapper extends Mapper<Object, Text, Text, IntWritable> {
  // Output (key, value) instances
  private Text word = new Text();
  private final static IntWritable one = new IntWritable(1);

  // Input (key, value) types; output is written through the Context
  @Override
  public void map(Object key, Text value, Context context)
      throws IOException, InterruptedException {
    // Read words from each line of the input file:
    // break the line into words for processing
    StringTokenizer wordList = new StringTokenizer(value.toString());
    // Count each word: emit (word, 1)
    while (wordList.hasMoreTokens()) {
      word.set(wordList.nextToken());
      context.write(word, one);
    }
  }
}
27. Shuffler/Sorter
• Maps emit (key, value) pairs
• The Shuffler/Sorter of the Hadoop framework
  – Sorts the (key, value) pairs by key
  – Then appends the values to make (key, list of values) pairs
  – For example,
    • The first and second maps emit:
      – <Hello, 1> <World, 1> <Bye, 1> <World, 1>
      – <Hello, 1> <Hadoop, 1> <Goodbye, 1> <Hadoop, 1>
    • The shuffler produces the input of the reducer:
      – <Bye, 1>, <Goodbye, 1>, <Hadoop, <1,1>>, <Hello, <1,1>>, <World, <1,1>>
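The shuffle/sort step described above can be simulated in plain Java: collect the emitted (key, value) pairs, then group the values by key in sorted order. This is only a sketch of the observable behavior, not Hadoop's actual implementation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class ShuffleSketch {
    // Group (key, value) pairs into (key, list of values), sorted by key
    static TreeMap<String, List<Integer>> shuffle(String[][] emitted) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (String[] kv : emitted) {
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>())
                   .add(Integer.parseInt(kv[1]));
        }
        return grouped;
    }

    public static void main(String[] args) {
        String[][] mapOutput = {
            {"Hello","1"}, {"World","1"}, {"Bye","1"}, {"World","1"},     // map 1
            {"Hello","1"}, {"Hadoop","1"}, {"Goodbye","1"}, {"Hadoop","1"} // map 2
        };
        System.out.println(shuffle(mapOutput));
        // {Bye=[1], Goodbye=[1], Hadoop=[1, 1], Hello=[1, 1], World=[1, 1]}
    }
}
```

The TreeMap gives the key-sorted order the slide shows; in a real cluster the grouping also happens across machines via a network shuffle.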
28-33. SumReducer.java (Reducer class)

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Extends the Reducer class with the input/output key and value types
public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  // Output value instance
  private IntWritable totalWordCount = new IntWritable();

  // Input is a (key, list of values) pair; output is written through the Context
  @Override
  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    // For each word, count/sum the number of values
    int wordCount = 0;
    Iterator<IntWritable> it = values.iterator();
    while (it.hasNext()) {
      wordCount += it.next().get();
    }
    // For each word, the total count becomes the output value
    totalWordCount.set(wordCount);
    context.write(key, totalWordCount);
  }
}
34. SumReducer
• Reducer
  – Input: what the shuffler produces becomes the input of the reducer
    • <Bye, 1>, <Goodbye, 1>, <Hadoop, <1,1>>, <Hello, <1,1>>, <World, <1,1>>
  – Output
    • <Bye, 1>, <Goodbye, 1>, <Hadoop, 2>, <Hello, 2>, <World, 2>
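The reduce step can likewise be simulated in plain Java by summing each key's list of values, mirroring what SumReducer does. This is a local sketch only, not Hadoop code (the class name ReduceSketch is made up):

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReduceSketch {
    // Sum each key's list of values into a single count, as SumReducer does
    static TreeMap<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        TreeMap<String, Integer> out = new TreeMap<>();
        grouped.forEach((k, vs) ->
            out.put(k, vs.stream().mapToInt(Integer::intValue).sum()));
        return out;
    }

    public static void main(String[] args) {
        // The shuffled input shown on the slide
        Map<String, List<Integer>> shuffled = new TreeMap<>();
        shuffled.put("Bye", List.of(1));
        shuffled.put("Goodbye", List.of(1));
        shuffled.put("Hadoop", List.of(1, 1));
        shuffled.put("Hello", List.of(1, 1));
        shuffled.put("World", List.of(1, 1));
        System.out.println(reduce(shuffled));
        // {Bye=1, Goodbye=1, Hadoop=2, Hello=2, World=2}
    }
}
```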
35. MRUnit Test
• How to unit test in Hadoop
  – Extend the JUnit test
    • With the org.apache.hadoop.mrunit.* API
  – Needs to test the Driver, Mapper, and Reducer
    • MapReduceDriver, MapDriver, ReduceDriver
    • Add input with expected output
36-46. MRUnit Test

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.MapDriver;
import org.apache.hadoop.mrunit.MapReduceDriver;
import org.apache.hadoop.mrunit.ReduceDriver;
import org.junit.Before;
import org.junit.Test;

public class TestWordCount {
  // Using the MRUnit API, declare MapReduce, Mapper, and Reducer drivers
  // with their input/output (key, value) types
  MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable> mapReduceDriver;
  MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;
  ReduceDriver<Text, IntWritable, Text, IntWritable> reduceDriver;

  // setUp() runs before each test method
  @Before
  public void setUp() {
    // Instantiate the WordCount Mapper and Reducer
    WordMapper mapper = new WordMapper();
    SumReducer reducer = new SumReducer();
    // Instantiate and set the Mapper driver with its input/output (key, value) types
    mapDriver = new MapDriver<LongWritable, Text, Text, IntWritable>();
    mapDriver.setMapper(mapper);
    // Instantiate and set the Reducer driver with its input/output (key, value) types
    reduceDriver = new ReduceDriver<Text, IntWritable, Text, IntWritable>();
    reduceDriver.setReducer(reducer);
    // Instantiate and set the MapperReducer driver with its input/output (key, value) types
    mapReduceDriver = new MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable>();
    mapReduceDriver.setMapper(mapper);
    mapReduceDriver.setReducer(reducer);
  }

  // Mapper test: define sample input with expected output
  @Test
  public void testMapper() {
    mapDriver.withInput(new LongWritable(1), new Text("cat cat dog"));
    mapDriver.withOutput(new Text("cat"), new IntWritable(1));
    mapDriver.withOutput(new Text("cat"), new IntWritable(1));
    mapDriver.withOutput(new Text("dog"), new IntWritable(1));
    mapDriver.runTest();
  }

  // Reducer test: define sample input with expected output
  @Test
  public void testReducer() {
    List<IntWritable> values = new ArrayList<IntWritable>();
    values.add(new IntWritable(1));
    values.add(new IntWritable(1));
    reduceDriver.withInput(new Text("cat"), values);
    reduceDriver.withOutput(new Text("cat"), new IntWritable(2));
    reduceDriver.runTest();
  }

  // MapperReducer test: define sample input with expected output
  @Test
  public void testMapReduce() {
    mapReduceDriver.withInput(new LongWritable(1), new Text("cat cat dog"));
    mapReduceDriver.addOutput(new Text("cat"), new IntWritable(2));
    mapReduceDriver.addOutput(new Text("dog"), new IntWritable(1));
    mapReduceDriver.runTest();
  }
}
47. MRUnit Test in Practice
• Need to implement unit tests
  – How many?
    • For all Map, Reduce, and Driver classes
  – Problems?
    • They mostly work
      – But MRUnit does not support complicated Map and Reduce APIs
  – How many problems can you detect?
    • Depends on how well you implement the MRUnit code
48. Conclusion
• MRUnit for Hadoop unit test development
• Integrate it with the QA site via a CI server
• Need to use it
49. Questions?
50. References
1. Hadoop WordCount example with the new MapReduce API (http://codesfusion.blogspot.com/2013/10/hadoop-wordcount-with-new-map-reduce-api.html)
2. Hadoop Word Count Example (http://wiki.apache.org/hadoop/WordCount)
3. Example: WordCount v1.0, Cloudera Hadoop Tutorial (http://www.cloudera.com/content/cloudera-content/cloudera-docs/HadoopTutorial/CDH4/Hadoop-Tutorial/ht_walk_through.html)
4. Testing Word Count (https://cwiki.apache.org/confluence/display/MRUNIT/Testing+Word+Count)
5. Apache MRUnit Tutorial (https://cwiki.apache.org/confluence/display/MRUNIT/MRUnit+Tutorial)
6. Hadoop Integration Test Suite, Shopzilla (https://github.com/shopzilla/hadoop-integration-test-suite)
7. Hadoop’s Elephant in the Room, Jeremy Lucas, Shopzilla (http://tech.shopzilla.com/2013/04/hadoops-elephant-in-the-room/)
8. Facebook Test MapReduce Local (https://github.com/facebook/hadoop-20/blob/master/src/test/org/apache/hadoop/mapreduce/TestMapReduceLocal.java)
9. Yahoo HIT: Hadoop Integrated Testing (http://www.slideshare.net/ydn/hi-tv3?from_search=1)