12/10/2014 Testing MultiOutputFormat based MapReduce | Ashok Agarwal 
Testing MultiOutputFormat based MapReduce
11 Thursday Sep 2014
Posted by Ashok Agarwal in Big Data
Tags: Big Data, Hadoop, MapReduce
In one of our projects, we were required to generate one output file per client from a MapReduce job, so that each client could see and analyze its own data.
Suppose you receive daily stock price files.
For 9/8/2014: 9_8_2014.csv
9/8/14,MSFT,47 
9/8/14,ORCL,40 
9/8/14,GOOG,577 
9/8/14,AAPL,100.4 
For 9/9/2014: 9_9_2014.csv 
9/9/14,MSFT,46 
9/9/14,ORCL,41 
9/9/14,GOOG,578 
9/9/14,AAPL,101 
So on… 
9/10/14,MSFT,48
9/10/14,ORCL,39.5
9/10/14,GOOG,577
9/10/14,AAPL,100
9/11/14,MSFT,47.5
9/11/14,ORCL,41
9/11/14,GOOG,588
9/11/14,AAPL,99.8
9/12/14,MSFT,46.69
9/12/14,ORCL,40.5
9/12/14,GOOG,576
9/12/14,AAPL,102.5
We want to analyze each stock's weekly trend. For that, we need to create per-stock data.
The mapper below splits each record read from the CSV (via TextInputFormat) on commas. The map output key is the stock symbol and the value is its price.
package com.jbksoft;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import java.io.IOException;

public class MyMultiOutputMapper extends Mapper<LongWritable, Text, Text, Text> {
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String[] tokens = line.split(",");
        // tokens: [date, symbol, price] -> emit (symbol, price)
        context.write(new Text(tokens[1]), new Text(tokens[2]));
    }
}
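Stripped of the Hadoop types, the map step is plain string handling. A minimal, Hadoop-free sketch of the same split-and-emit logic (the `MapStep` class and `toKeyValue` helper are hypothetical names, not part of the job):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

public class MapStep {
    // Mirrors the mapper: split a CSV record "date,symbol,price"
    // and produce (symbol, price) as the key/value pair.
    static Map.Entry<String, String> toKeyValue(String line) {
        String[] tokens = line.split(",");
        return new SimpleEntry<String, String>(tokens[1], tokens[2]);
    }

    public static void main(String[] args) {
        Map.Entry<String, String> kv = toKeyValue("9/8/14,MSFT,47");
        System.out.println(kv.getKey() + " -> " + kv.getValue()); // MSFT -> 47
    }
}
```

Feeding it a record from 9_8_2014.csv yields the same (symbol, price) pair the real mapper would emit via context.write.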
The reducer below creates a separate output file for each stock.
package com.jbksoft;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import java.io.IOException;

public class MyMultiOutputReducer extends Reducer<Text, Text, NullWritable, Text> {
    MultipleOutputs<NullWritable, Text> mos;

    public void setup(Context context) {
        mos = new MultipleOutputs<NullWritable, Text>(context);
    }

    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text value : values) {
            // The stock symbol becomes the base output file name.
            mos.write(NullWritable.get(), value, key.toString());
        }
    }

    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        mos.close();
    }
}
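Conceptually, the reducer plus MultipleOutputs routes each value into a per-key bucket; only the destination differs (HDFS files vs. something in memory). A Hadoop-free sketch under that assumption (`GroupByStock` and `group` are hypothetical names):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupByStock {
    // Mirrors what MultipleOutputs achieves: route each value into a
    // bucket named after its key (here a map entry instead of a file).
    static Map<String, List<String>> group(List<String[]> records) {
        Map<String, List<String>> buckets = new LinkedHashMap<String, List<String>>();
        for (String[] kv : records) {
            buckets.computeIfAbsent(kv[0], k -> new ArrayList<String>()).add(kv[1]);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String[]> recs = Arrays.asList(
                new String[]{"MSFT", "47"},
                new String[]{"MSFT", "46"},
                new String[]{"GOOG", "577"});
        System.out.println(group(recs)); // {MSFT=[47, 46], GOOG=[577]}
    }
}
```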
The driver for the code: 
package com.jbksoft;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import java.io.IOException;

public class MyMultiOutputTest {
    public static void main(String[] args)
            throws IOException, InterruptedException, ClassNotFoundException {
        Path inputDir = new Path(args[0]);
        Path outputDir = new Path(args[1]);
        Configuration conf = new Configuration();
        Job job = new Job(conf);
        job.setJarByClass(MyMultiOutputTest.class);
        job.setJobName("My MultipleOutputs Demo");
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setMapperClass(MyMultiOutputMapper.class);
        job.setReducerClass(MyMultiOutputReducer.class);
        FileInputFormat.setInputPaths(job, inputDir);
        FileOutputFormat.setOutputPath(job, outputDir);
        // LazyOutputFormat avoids creating empty default part files.
        LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
        job.waitForCompletion(true);
    }
}
The command for executing the above code (compiled and packaged as a jar), and the resulting output:
aagarwal-mbpro:~ ashok.agarwal$ hadoop jar test.jar com.jbksoft.MyMultiOutputTest
aagarwal-mbpro:~ ashok.agarwal$ ls -l /Users/ashok.agarwal/dev/HBaseDemo/output
total 32
-rwxr-xr-x 1 ashok.agarwal 1816361533 25 Sep 11 11:32 AAPL-r-00000
-rwxr-xr-x 1 ashok.agarwal 1816361533 20 Sep 11 11:32 GOOG-r-00000
-rwxr-xr-x 1 ashok.agarwal 1816361533 20 Sep 11 11:32 MSFT-r-00000
-rwxr-xr-x 1 ashok.agarwal 1816361533 19 Sep 11 11:32 ORCL-r-00000
-rwxr-xr-x 1 ashok.agarwal 1816361533  0 Sep 11 11:32 _SUCCESS
aagarwal-mbpro:~ ashok.agarwal$
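The file names in the listing follow the MultipleOutputs convention: the base name passed to write(), then "-r-", then the five-digit reducer partition number. A small sketch re-creating that name (the `partFileName` helper is hypothetical, not part of Hadoop):

```java
public class OutputName {
    // MultipleOutputs names reducer-side files <baseName>-r-<5-digit partition>,
    // e.g. "MSFT-r-00000" for partition 0.
    static String partFileName(String baseName, int partition) {
        return String.format("%s-r-%05d", baseName, partition);
    }

    public static void main(String[] args) {
        System.out.println(partFileName("MSFT", 0)); // MSFT-r-00000
    }
}
```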
Test cases for the above code can be created using MRUnit.
The reducer's MultipleOutputs member needs to be mocked, as below:
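The mocking idea is to subclass the output sink and override write() so that records are captured in an in-memory map, keyed by output file name, instead of being written to HDFS. A Hadoop-free sketch of the pattern (`Sink` and `mockSink` are hypothetical stand-ins for MultipleOutputs):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CapturePattern {
    // Stand-in for MultipleOutputs: the real class writes to HDFS files.
    static class Sink {
        void write(String fileName, String value) { /* real impl writes to disk */ }
    }

    // Build a mock sink whose write() records values in the given map,
    // keyed by output file name.
    static Sink mockSink(final Map<String, List<String>> captured) {
        return new Sink() {
            @Override
            void write(String fileName, String value) {
                List<String> values = captured.get(fileName);
                if (values == null) {
                    values = new ArrayList<String>();
                    captured.put(fileName, values);
                }
                values.add(value);
            }
        };
    }

    public static void main(String[] args) {
        Map<String, List<String>> captured = new HashMap<String, List<String>>();
        Sink mock = mockSink(captured);
        mock.write("MSFT", "47");
        mock.write("MSFT", "46");
        System.out.println(captured); // {MSFT=[47, 46]}
    }
}
```

The MRUnit test that follows applies exactly this trick to MultipleOutputs, then asserts on the captured map.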
package com.jbksoft.test;
import com.jbksoft.MyMultiOutputReducer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.junit.Before;
import org.junit.Test;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

public class MyMultiOutputReducerTest {
    MockOSReducer reducer;
    ReduceDriver<Text, Text, NullWritable, Text> reduceDriver;
    Configuration config;
    Map<String, List<Text>> outputCSVFiles;
    static String[] CSV = {
        "9/8/14,MSFT,47",
        "9/8/14,ORCL,40",
        "9/8/14,GOOG,577",
        "9/8/14,AAPL,100.4",
        "9/9/14,MSFT,46",
        "9/9/14,ORCL,41",
        "9/9/14,GOOG,578"
    };

    class MockOSReducer extends MyMultiOutputReducer {
        private Map<String, List<Text>> multipleOutputMap;

        public MockOSReducer(Map<String, List<Text>> map) {
            super();
            multipleOutputMap = map;
        }

        @Override
        public void setup(Reducer.Context context) {
            // Capture writes in a map keyed by output file name
            // instead of writing to HDFS.
            mos = new MultipleOutputs<NullWritable, Text>(context) {
                @Override
                public void write(NullWritable key, Text value, String outputFileName)
                        throws java.io.IOException, java.lang.InterruptedException {
                    List<Text> outputs = multipleOutputMap.get(outputFileName);
                    if (outputs == null) {
                        outputs = new ArrayList<Text>();
                        multipleOutputMap.put(outputFileName, outputs);
                    }
                    outputs.add(new Text(value));
                }
            };
            config = context.getConfiguration();
        }
    }

    @Before
    public void setup() throws Exception {
        config = new Configuration();
        outputCSVFiles = new HashMap<String, List<Text>>();
        reducer = new MockOSReducer(outputCSVFiles);
        reduceDriver = ReduceDriver.newReduceDriver(reducer);
        reduceDriver.setConfiguration(config);
    }

    @Test
    public void testReduceInput1Output() throws Exception {
        List<Text> list = new ArrayList<Text>();
        list.add(new Text("47"));
        list.add(new Text("46"));
        list.add(new Text("48"));
        reduceDriver.withInput(new Text("MSFT"), list);
        reduceDriver.runTest();

        Map<String, List<Text>> expectedCSVOutput = new HashMap<String, List<Text>>();
        List<Text> outputs = new ArrayList<Text>();
        outputs.add(new Text("47"));
        outputs.add(new Text("46"));
        outputs.add(new Text("48"));
        expectedCSVOutput.put("MSFT", outputs);
        validateOutputList(outputCSVFiles, expectedCSVOutput);
    }

    static void print(Map<String, List<Text>> outputCSVFiles) {
        for (String key : outputCSVFiles.keySet()) {
            List<Text> valueList = outputCSVFiles.get(key);
            for (Text pair : valueList) {
                System.out.println("OUTPUT " + key + " = " + pair.toString());
            }
        }
    }

    protected void validateOutputList(Map<String, List<Text>> actuals,
                                      Map<String, List<Text>> expects) {
        List<String> removeList = new ArrayList<String>();
        for (String key : expects.keySet()) {
            removeList.add(key);
            List<Text> expectedValues = expects.get(key);
            List<Text> actualValues = actuals.get(key);
            int expectedSize = expectedValues.size();
            int actualSize = actualValues.size();
            int i = 0;
            assertEquals("Number of output CSV files is " + actualSize
                    + ", expected " + expectedSize,
                    expectedSize, actualSize);
            while (expectedSize > i || actualSize > i) {
                if (expectedSize > i && actualSize > i) {
                    Text expected = expectedValues.get(i);
                    Text actual = actualValues.get(i);
                    assertTrue("Expected CSV content is " + expected.toString()
                            + " but found " + actual.toString(),
                            expected.equals(actual));
                }
                i++;
            }
        }
    }
}
The mapper unit test can be written as below:
package com.jbksoft.test;
import com.jbksoft.MyMultiOutputMapper;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.apache.hadoop.mrunit.types.Pair;
import org.junit.Before;
import org.junit.Test;
import java.util.ArrayList;
import java.util.List;

public class MyMultiOutputMapperTest {
    MyMultiOutputMapper mapper;
    MapDriver<LongWritable, Text, Text, Text> mapDriver;
    Configuration config;
    static String[] CSV = {
        "9/8/14,MSFT,47",
        "9/8/14,ORCL,40",
        "9/8/14,GOOG,577"
    };

    @Before
    public void setup() throws Exception {
        config = new Configuration();
        mapper = new MyMultiOutputMapper();
        mapDriver = MapDriver.newMapDriver(mapper);
        mapDriver.setConfiguration(config);
    }

    @Test
    public void testMapInput1Output() throws Exception {
        mapDriver.withInput(new LongWritable(), new Text(CSV[0]));
        mapDriver.withOutput(new Text("MSFT"), new Text("47"));
        mapDriver.runTest();
    }

    @Test
    public void testMapInput2Output() throws Exception {
        final List<Pair<LongWritable, Text>> inputs = new ArrayList<Pair<LongWritable, Text>>();
        inputs.add(new Pair<LongWritable, Text>(new LongWritable(), new Text(CSV[0])));
        inputs.add(new Pair<LongWritable, Text>(new LongWritable(), new Text(CSV[1])));
        final List<Pair<Text, Text>> outputs = new ArrayList<Pair<Text, Text>>();
        outputs.add(new Pair<Text, Text>(new Text("MSFT"), new Text("47")));
        outputs.add(new Pair<Text, Text>(new Text("ORCL"), new Text("40")));
        mapDriver.withAll(inputs).withAllOutput(outputs).runTest();
    }
}
References: 
1. MapReduce Tutorial 
2. HDFS Architecture 
3. MultipleOutputs 
4. MRUnit 
Blog at WordPress.com. The Chateau Theme. 
https://erashokagarwal.wordpress.com/2014/09/11/testing-multioutputformat-based-mapreduce/
