Testing MultiOutputFormat based MapReduce

Posted by Ashok Agarwal on Thursday, 11 Sep 2014, in Big Data
Tags: Big Data, Hadoop, MapReduce
In one of our projects, we were required to generate a per-client file as the output of a MapReduce job, so that each client could see and analyze their own data.

Suppose you receive a daily stock prices file.
For 9/8/2014: 9_8_2014.csv

9/8/14,MSFT,47
9/8/14,ORCL,40
9/8/14,GOOG,577
9/8/14,AAPL,100.4
For 9/9/2014: 9_9_2014.csv

9/9/14,MSFT,46
9/9/14,ORCL,41
9/9/14,GOOG,578
9/9/14,AAPL,101
And so on:

9/10/14,MSFT,48
9/10/14,ORCL,39.5
9/10/14,GOOG,577
9/10/14,AAPL,100
9/11/14,MSFT,47.5
9/11/14,ORCL,41
9/11/14,GOOG,588
9/11/14,AAPL,99.8
9/12/14,MSFT,46.69
9/12/14,ORCL,40.5
9/12/14,GOOG,576
9/12/14,AAPL,102.5
We want to analyze the weekly trend of each stock, and to do that we need to separate the data per stock.

The mapper below reads the records delivered by TextInputFormat and splits each CSV line on commas. The map output key is the stock symbol and the value is the price.
package com.jbksoft;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import java.io.IOException;

public class MyMultiOutputMapper extends Mapper<LongWritable, Text, Text, Text> {
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        // Each record is date,symbol,price
        String[] tokens = line.split(",");
        context.write(new Text(tokens[1]), new Text(tokens[2]));
    }
}
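As a quick sanity check, the mapper's parsing logic can be exercised outside Hadoop in plain Java. The `ParseCheck` class below is purely illustrative (it is not part of the job code); it applies the same split to one sample record:

```java
public class ParseCheck {
    public static void main(String[] args) {
        // Same split logic as the mapper: date,symbol,price
        String line = "9/8/14,MSFT,47";
        String[] tokens = line.split(",");
        String key = tokens[1];   // stock symbol
        String value = tokens[2]; // price
        System.out.println(key + "=" + value); // prints MSFT=47
    }
}
```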
The reducer below creates a file for each stock using MultipleOutputs. The `mos` field is declared protected so that a test subclass can replace it.
package com.jbksoft;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import java.io.IOException;

public class MyMultiOutputReducer extends Reducer<Text, Text, NullWritable, Text> {
    protected MultipleOutputs<NullWritable, Text> mos;

    public void setup(Context context) {
        mos = new MultipleOutputs<NullWritable, Text>(context);
    }

    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text value : values) {
            // The third argument is the base name of the output file,
            // so each stock symbol gets its own file.
            mos.write(NullWritable.get(), value, key.toString());
        }
    }

    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        mos.close();
    }
}
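Independent of Hadoop, the effect of MultipleOutputs here is simply to bucket values by the reduce key. A minimal plain-Java sketch of that grouping (the `GroupBySymbol` class and its names are illustrative, not part of the job):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupBySymbol {
    // Groups "date,symbol,price" records into one price list per symbol,
    // mirroring what the reducer writes into per-stock files.
    public static Map<String, List<String>> group(List<String> records) {
        Map<String, List<String>> files = new HashMap<String, List<String>>();
        for (String record : records) {
            String[] tokens = record.split(",");
            List<String> bucket = files.get(tokens[1]);
            if (bucket == null) {
                bucket = new ArrayList<String>();
                files.put(tokens[1], bucket);
            }
            bucket.add(tokens[2]);
        }
        return files;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList(
                "9/8/14,MSFT,47", "9/9/14,MSFT,46", "9/8/14,ORCL,40");
        System.out.println(group(records).get("MSFT")); // prints [47, 46]
    }
}
```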
The driver for the code: 
package com.jbksoft;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import java.io.IOException;

public class MyMultiOutputTest {
    public static void main(String[] args)
            throws IOException, InterruptedException, ClassNotFoundException {
        Path inputDir = new Path(args[0]);
        Path outputDir = new Path(args[1]);
        Configuration conf = new Configuration();
        Job job = new Job(conf);
        job.setJarByClass(MyMultiOutputTest.class);
        job.setJobName("My MultipleOutputs Demo");
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setMapperClass(MyMultiOutputMapper.class);
        job.setReducerClass(MyMultiOutputReducer.class);
        FileInputFormat.setInputPaths(job, inputDir);
        FileOutputFormat.setOutputPath(job, outputDir);
        // LazyOutputFormat suppresses the default empty part-r-xxxxx files;
        // only the MultipleOutputs files are created.
        LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
        job.waitForCompletion(true);
    }
}
The command for executing the above code (compiled and packaged as a jar), and the resulting per-stock output files:
aagarwal-mbpro:~ ashok.agarwal$ hadoop jar test.jar com.jbksoft.MyMultiOutputTest
aagarwal-mbpro:~ ashok.agarwal$ ls -l /Users/ashok.agarwal/dev/HBaseDemo/output
total 32
-rwxr-xr-x  1 ashok.agarwal  1816361533  25 Sep 11 11:32 AAPL-r-00000
-rwxr-xr-x  1 ashok.agarwal  1816361533  20 Sep 11 11:32 GOOG-r-00000
-rwxr-xr-x  1 ashok.agarwal  1816361533  20 Sep 11 11:32 MSFT-r-00000
-rwxr-xr-x  1 ashok.agarwal  1816361533  19 Sep 11 11:32 ORCL-r-00000
-rwxr-xr-x  1 ashok.agarwal  1816361533   0 Sep 11 11:32 _SUCCESS
aagarwal-mbpro:~ ashok.agarwal$
A test case for the above code can be created using MRUnit. Since MultipleOutputs writes directly to files rather than through the reduce context, it needs to be mocked, as below:
package com.jbksoft.test;
import com.jbksoft.MyMultiOutputReducer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.junit.Before;
import org.junit.Test;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

public class MyMultiOutputReducerTest {
    MockOSReducer reducer;
    ReduceDriver<Text, Text, NullWritable, Text> reduceDriver;
    Configuration config;
    Map<String, List<Text>> outputCSVFiles;
    static String[] CSV = {
        "9/8/14,MSFT,47",
        "9/8/14,ORCL,40",
        "9/8/14,GOOG,577",
        "9/8/14,AAPL,100.4",
        "9/9/14,MSFT,46",
        "9/9/14,ORCL,41",
        "9/9/14,GOOG,578"
    };

    // Subclass of the reducer that replaces MultipleOutputs with an
    // in-memory map of output file name -> records written.
    class MockOSReducer extends MyMultiOutputReducer {
        private Map<String, List<Text>> multipleOutputMap;

        public MockOSReducer(Map<String, List<Text>> map) {
            super();
            multipleOutputMap = map;
        }

        @Override
        public void setup(Reducer.Context context) {
            mos = new MultipleOutputs<NullWritable, Text>(context) {
                @Override
                public void write(NullWritable key, Text value, String outputFileName)
                        throws java.io.IOException, java.lang.InterruptedException {
                    List<Text> outputs = multipleOutputMap.get(outputFileName);
                    if (outputs == null) {
                        outputs = new ArrayList<Text>();
                        multipleOutputMap.put(outputFileName, outputs);
                    }
                    outputs.add(new Text(value));
                }
            };
            config = context.getConfiguration();
        }
    }

    @Before
    public void setup() throws Exception {
        config = new Configuration();
        outputCSVFiles = new HashMap<String, List<Text>>();
        reducer = new MockOSReducer(outputCSVFiles);
        reduceDriver = ReduceDriver.newReduceDriver(reducer);
        reduceDriver.setConfiguration(config);
    }

    @Test
    public void testReduceInput1Output() throws Exception {
        List<Text> list = new ArrayList<Text>();
        list.add(new Text("47"));
        list.add(new Text("46"));
        list.add(new Text("48"));
        reduceDriver.withInput(new Text("MSFT"), list);
        reduceDriver.runTest();

        Map<String, List<Text>> expectedCSVOutput = new HashMap<String, List<Text>>();
        List<Text> outputs = new ArrayList<Text>();
        outputs.add(new Text("47"));
        outputs.add(new Text("46"));
        outputs.add(new Text("48"));
        expectedCSVOutput.put("MSFT", outputs);
        validateOutputList(outputCSVFiles, expectedCSVOutput);
    }

    static void print(Map<String, List<Text>> outputCSVFiles) {
        for (String key : outputCSVFiles.keySet()) {
            List<Text> valueList = outputCSVFiles.get(key);
            for (Text pair : valueList) {
                System.out.println("OUTPUT " + key + " = " + pair.toString());
            }
        }
    }

    protected void validateOutputList(Map<String, List<Text>> actuals,
                                      Map<String, List<Text>> expects) {
        List<String> removeList = new ArrayList<String>();
        for (String key : expects.keySet()) {
            removeList.add(key);
            List<Text> expectedValues = expects.get(key);
            List<Text> actualValues = actuals.get(key);
            int expectedSize = expectedValues.size();
            int actualSize = actualValues.size();
            int i = 0;
            assertEquals("Number of output records is " + actualSize
                    + " but expected " + expectedSize, expectedSize, actualSize);
            while (expectedSize > i || actualSize > i) {
                if (expectedSize > i && actualSize > i) {
                    Text expected = expectedValues.get(i);
                    Text actual = actualValues.get(i);
                    assertTrue("Expected CSV content is " + expected.toString()
                            + " but found " + actual.toString(), expected.equals(actual));
                }
                i++;
            }
        }
    }
}
The mapper unit test can be written as below:
package com.jbksoft.test;
import com.jbksoft.MyMultiOutputMapper;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.apache.hadoop.mrunit.types.Pair;
import org.junit.Before;
import org.junit.Test;
import java.util.ArrayList;
import java.util.List;

public class MyMultiOutputMapperTest {
    MyMultiOutputMapper mapper;
    MapDriver<LongWritable, Text, Text, Text> mapDriver;
    Configuration config;
    static String[] CSV = {
        "9/8/14,MSFT,47",
        "9/8/14,ORCL,40",
        "9/8/14,GOOG,577"
    };

    @Before
    public void setup() throws Exception {
        config = new Configuration();
        mapper = new MyMultiOutputMapper();
        mapDriver = MapDriver.newMapDriver(mapper);
        mapDriver.setConfiguration(config);
    }

    @Test
    public void testMapInput1Output() throws Exception {
        mapDriver.withInput(new LongWritable(), new Text(CSV[0]));
        mapDriver.withOutput(new Text("MSFT"), new Text("47"));
        mapDriver.runTest();
    }

    @Test
    public void testMapInput2Output() throws Exception {
        final List<Pair<LongWritable, Text>> inputs = new ArrayList<Pair<LongWritable, Text>>();
        inputs.add(new Pair<LongWritable, Text>(new LongWritable(), new Text(CSV[0])));
        inputs.add(new Pair<LongWritable, Text>(new LongWritable(), new Text(CSV[1])));

        final List<Pair<Text, Text>> outputs = new ArrayList<Pair<Text, Text>>();
        outputs.add(new Pair<Text, Text>(new Text("MSFT"), new Text("47")));
        outputs.add(new Pair<Text, Text>(new Text("ORCL"), new Text("40")));

        mapDriver.withAll(inputs).withAllOutput(outputs).runTest();
    }
}
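To compile and run these tests, the MRUnit jar must be on the test classpath. With Maven that is a single test-scoped dependency; the version and the `hadoop2` classifier below are assumptions that may need adjusting for your Hadoop distribution (use `hadoop1` for the old mapred API era builds):

```xml
<dependency>
  <groupId>org.apache.mrunit</groupId>
  <artifactId>mrunit</artifactId>
  <version>1.1.0</version>
  <classifier>hadoop2</classifier>
  <scope>test</scope>
</dependency>
```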
References: 
1. MapReduce Tutorial 
2. HDFS Architecture 
3. MultipleOutputs 
4. MRUnit 
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOMLORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Microservice Teams - How the cloud changes the way we work
Microservice Teams - How the cloud changes the way we workMicroservice Teams - How the cloud changes the way we work
Microservice Teams - How the cloud changes the way we work
 
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
 
Oracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptxOracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptx
 
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
 
GreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-JurisicGreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-Jurisic
 
Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
 
Unveiling the Advantages of Agile Software Development.pdf
Unveiling the Advantages of Agile Software Development.pdfUnveiling the Advantages of Agile Software Development.pdf
Unveiling the Advantages of Agile Software Development.pdf
 
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissancesAtelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissances
 
Webinar On-Demand: Using Flutter for Embedded
Webinar On-Demand: Using Flutter for EmbeddedWebinar On-Demand: Using Flutter for Embedded
Webinar On-Demand: Using Flutter for Embedded
 
SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024
 
Empowering Growth with Best Software Development Company in Noida - Deuglo
Empowering Growth with Best Software  Development Company in Noida - DeugloEmpowering Growth with Best Software  Development Company in Noida - Deuglo
Empowering Growth with Best Software Development Company in Noida - Deuglo
 
Using Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional SafetyUsing Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional Safety
 
DDS-Security 1.2 - What's New? Stronger security for long-running systems
DDS-Security 1.2 - What's New? Stronger security for long-running systemsDDS-Security 1.2 - What's New? Stronger security for long-running systems
DDS-Security 1.2 - What's New? Stronger security for long-running systems
 
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling ExtensionsUI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
 
Transform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR SolutionsTransform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR Solutions
 

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import java.io.IOException;

public class MyMultiOutputReducer extends Reducer<Text, Text, NullWritable, Text> {

    // protected so a unit test can replace it with a capturing mock
    protected MultipleOutputs<NullWritable, Text> mos;

    public void setup(Context context) {
        mos = new MultipleOutputs<NullWritable, Text>(context);
    }

    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text value : values) {
            // The third argument becomes the output file name, e.g. MSFT-r-00000
            mos.write(NullWritable.get(), value, key.toString());
        }
    }

    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        mos.close();
    }
}
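The reducer delegates the per-file routing to MultipleOutputs. For readers without a cluster handy, the grouping it performs can be sketched in plain Java; the BucketSketch class and its method are illustrative stand-ins, not part of the Hadoop API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BucketSketch {
    // Group (stock, price) records into per-stock "files", mirroring
    // mos.write(NullWritable.get(), value, key.toString()) in the reducer.
    static Map<String, List<String>> bucketByStock(List<String[]> records) {
        Map<String, List<String>> files = new LinkedHashMap<>();
        for (String[] rec : records) {                  // rec = {stock, price}
            files.computeIfAbsent(rec[0], k -> new ArrayList<>()).add(rec[1]);
        }
        return files;
    }

    public static void main(String[] args) {
        List<String[]> records = Arrays.asList(
                new String[]{"MSFT", "47"},
                new String[]{"GOOG", "577"},
                new String[]{"MSFT", "46"});
        System.out.println(bucketByStock(records)); // {MSFT=[47, 46], GOOG=[577]}
    }
}
```

Each map key plays the role of one output file name, which is exactly what the mocked MultipleOutputs in the test below captures.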
The driver for the code:

package com.jbksoft;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import java.io.IOException;

public class MyMultiOutputTest {
    public static void main(String[] args)
            throws IOException, InterruptedException, ClassNotFoundException {
        Path inputDir = new Path(args[0]);
        Path outputDir = new Path(args[1]);
        Configuration conf = new Configuration();
        Job job = new Job(conf);
        job.setJarByClass(MyMultiOutputTest.class);
        job.setJobName("My MultipleOutputs Demo");
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setMapperClass(MyMultiOutputMapper.class);
        job.setReducerClass(MyMultiOutputReducer.class);
        FileInputFormat.setInputPaths(job, inputDir);
        FileOutputFormat.setOutputPath(job, outputDir);
        // LazyOutputFormat creates output files only when records are written
        LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
        job.waitForCompletion(true);
    }
}

The command for executing the above code (compiled and packaged as a jar):

aagarwal-mbpro:~ ashok.agarwal$ hadoop jar test.jar com.jbksoft.MyMultiOutputTest
aagarwal-mbpro:~ ashok.agarwal$ ls -l /Users/ashok.agarwal/dev/HBaseDemo/output
total 32
-rwxr-xr-x  1 ashok.agarwal  1816361533  25 Sep 11 11:32 AAPL-r-00000
-rwxr-xr-x  1 ashok.agarwal  1816361533  20 Sep 11 11:32 GOOG-r-00000
-rwxr-xr-x  1 ashok.agarwal  1816361533  20 Sep 11 11:32 MSFT-r-00000
-rwxr-xr-x  1 ashok.agarwal  1816361533  19 Sep 11 11:32 ORCL-r-00000
-rwxr-xr-x  1 ashok.agarwal  1816361533   0 Sep 11 11:32 _SUCCESS
aagarwal-mbpro:~ ashok.agarwal$
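With one file per stock in place, the weekly-trend analysis that motivated the job reduces to a simple per-file aggregation. A minimal sketch, assuming the per-stock file holds one price per line; the WeeklyTrend class is illustrative, not part of the original project:

```java
import java.util.Arrays;
import java.util.List;

public class WeeklyTrend {
    // Average of one stock's prices for the week, as read from its
    // per-stock output file (one price per line).
    static double weeklyAverage(List<String> prices) {
        return prices.stream()
                .mapToDouble(Double::parseDouble)
                .average()
                .orElse(0.0);
    }

    public static void main(String[] args) {
        // Contents of MSFT-r-00000 for the sample week
        List<String> msft = Arrays.asList("47", "46", "48", "47.5", "46.69");
        System.out.printf("%.2f%n", weeklyAverage(msft)); // prints 47.04
    }
}
```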
The test case for the above code can be created using MRUnit. Because MultipleOutputs writes real files, the reducer's member mos is mocked in the test so that writes land in an in-memory map instead of on disk.
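The capture pattern used by the mock can be seen in miniature outside Hadoop: override the writing method in an anonymous subclass and record what would have gone to disk. The class names here are illustrative only:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CapturePattern {
    // Stand-in for a writer that normally hits the filesystem
    static class FileWriterStub {
        void write(String file, String line) { /* would write to disk */ }
    }

    // Returns a writer whose output is captured into the given map,
    // mirroring how the test overrides MultipleOutputs.write(...)
    static FileWriterStub capturingWriter(final Map<String, List<String>> captured) {
        return new FileWriterStub() {
            @Override
            void write(String file, String line) {
                captured.computeIfAbsent(file, k -> new ArrayList<>()).add(line);
            }
        };
    }

    public static void main(String[] args) {
        Map<String, List<String>> captured = new HashMap<>();
        FileWriterStub writer = capturingWriter(captured);
        writer.write("MSFT", "47");
        writer.write("MSFT", "46");
        System.out.println(captured); // {MSFT=[47, 46]}
    }
}
```

The assertions can then run against the map, with no filesystem involved.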
package com.jbksoft.test;
import com.jbksoft.MyMultiOutputReducer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.apache.hadoop.mrunit.types.Pair;
import org.junit.Before;
import org.junit.Test;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

public class MyMultiOutputReducerTest {
    MockOSReducer reducer;
    ReduceDriver<Text, Text, NullWritable, Text> reduceDriver;
    Configuration config;
    Map<String, List<Text>> outputCSVFiles;
    static String[] CSV = {
        "9/8/14,MSFT,47",
        "9/8/14,ORCL,40",
        "9/8/14,GOOG,577",
        "9/8/14,AAPL,100.4",
        "9/9/14,MSFT,46",
        "9/9/14,ORCL,41",
        "9/9/14,GOOG,578"
    };

    class MockOSReducer extends MyMultiOutputReducer {

        private Map<String, List<Text>> multipleOutputMap;

        public MockOSReducer(Map<String, List<Text>> map) {
            super();
            multipleOutputMap = map;
        }

        @Override
        public void setup(Reducer.Context context) {
            // Capture writes in a map keyed by output file name
            // instead of creating real output files
            mos = new MultipleOutputs<NullWritable, Text>(context) {
                @Override
                public void write(NullWritable key, Text value, String outputFileName)
                        throws java.io.IOException, java.lang.InterruptedException {
                    List<Text> outputs = multipleOutputMap.get(outputFileName);
                    if (outputs == null) {
                        outputs = new ArrayList<Text>();
                        multipleOutputMap.put(outputFileName, outputs);
                    }
                    outputs.add(new Text(value));
                }
            };
            config = context.getConfiguration();
        }
    }

    @Before
    public void setup() throws Exception {
        config = new Configuration();
        outputCSVFiles = new HashMap<String, List<Text>>();
        reducer = new MockOSReducer(outputCSVFiles);
        reduceDriver = ReduceDriver.newReduceDriver(reducer);
        reduceDriver.setConfiguration(config);
    }

    @Test
    public void testReduceInput1Output() throws Exception {
        List<Text> list = new ArrayList<Text>();
        list.add(new Text("47"));
        list.add(new Text("46"));
        list.add(new Text("48"));
        reduceDriver.withInput(new Text("MSFT"), list);
        reduceDriver.runTest();

        Map<String, List<Text>> expectedCSVOutput = new HashMap<String, List<Text>>();
        List<Text> outputs = new ArrayList<Text>();
        outputs.add(new Text("47"));
        outputs.add(new Text("46"));
        outputs.add(new Text("48"));
        expectedCSVOutput.put("MSFT", outputs);

        validateOutputList(outputCSVFiles, expectedCSVOutput);
    }

    static void print(Map<String, List<Text>> outputCSVFiles) {
        for (String key : outputCSVFiles.keySet()) {
            List<Text> valueList = outputCSVFiles.get(key);
            for (Text pair : valueList) {
                System.out.println("OUTPUT " + key + " = " + pair.toString());
            }
        }
    }

    protected void validateOutputList(Map<String, List<Text>> actuals,
                                      Map<String, List<Text>> expects) {
        List<String> removeList = new ArrayList<String>();
        for (String key : expects.keySet()) {
            removeList.add(key);
            List<Text> expectedValues = expects.get(key);
            List<Text> actualValues = actuals.get(key);
            int expectedSize = expectedValues.size();
            int actualSize = actualValues.size();
            int i = 0;
            assertEquals("Number of output CSV records is " + actualSize
                    + " but expected " + expectedSize, expectedSize, actualSize);
            while (expectedSize > i || actualSize > i) {
                if (expectedSize > i && actualSize > i) {
                    Text expected = expectedValues.get(i);
                    Text actual = actualValues.get(i);
                    assertTrue("Expected CSV content is " + expected.toString()
                            + " but got " + actual.toString(),
                            expected.equals(actual));
                }
                i++;
            }
        }
    }
}

The mapper unit test can be as below:

package com.jbksoft.test;
import com.jbksoft.MyMultiOutputMapper;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.apache.hadoop.mrunit.types.Pair;
import org.junit.Before;
import org.junit.Test;
import java.util.ArrayList;
import java.util.List;

public class MyMultiOutputMapperTest {
    MyMultiOutputMapper mapper;
    MapDriver<LongWritable, Text, Text, Text> mapDriver;
    Configuration config;
    static String[] CSV = {
        "9/8/14,MSFT,47",
        "9/8/14,ORCL,40",
        "9/8/14,GOOG,577"
    };

    @Before
    public void setup() throws Exception {
        config = new Configuration();
        mapper = new MyMultiOutputMapper();
        mapDriver = MapDriver.newMapDriver(mapper);
        mapDriver.setConfiguration(config);
    }

    @Test
    public void testMapInput1Output() throws Exception {
        mapDriver.withInput(new LongWritable(), new Text(CSV[0]));
        mapDriver.withOutput(new Text("MSFT"), new Text("47"));
        mapDriver.runTest();
    }

    @Test
    public void testMapInput2Output() throws Exception {
        final List<Pair<LongWritable, Text>> inputs = new ArrayList<Pair<LongWritable, Text>>();
        inputs.add(new Pair<LongWritable, Text>(new LongWritable(), new Text(CSV[0])));
        inputs.add(new Pair<LongWritable, Text>(new LongWritable(), new Text(CSV[1])));

        final List<Pair<Text, Text>> outputs = new ArrayList<Pair<Text, Text>>();
        outputs.add(new Pair<Text, Text>(new Text("MSFT"), new Text("47")));
        outputs.add(new Pair<Text, Text>(new Text("ORCL"), new Text("40")));

        mapDriver.withAll(inputs).withAllOutput(outputs).runTest();
    }
}

References:
1. MapReduce Tutorial
2. HDFS Architecture
3. MultipleOutputs
4. MRUnit