MRUnit is a testing library that makes Hadoop jobs easier to test. It lets you specify test input and expected output programmatically, which reduces the need for external test files. Tests can focus on individual map and reduce functions. MRUnit abstracts away much of the boilerplate test setup code. It does have limitations, such as no support for distributed testing. Overall, though, the benefits of using MRUnit to test Hadoop jobs outweigh these drawbacks.
What is MRUnit?
• Testing library for MapReduce
• Developed by Cloudera
• Easy integration between MapReduce and standard testing tools (e.g., JUnit)
cloudera.com/hadoop-mrunit
Testing without MRUnit
• Write tests that create JobConf or Configuration objects
• Call conf.set('mapred.job.tracker', 'local') so the job runs in local mode
• Write test input files and store them alongside the MapReduce test code
• Lots of work to validate output files
• External file I/O makes tests slooooow
Testing with MRUnit
• No external test input or output files
• Programmatically specified
• Less test harness code (but also perhaps less control)
• Concise, fast tests
Example
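import org.apache.hadoop.io.Text
// Driver for the old (mapred) API; new-API mappers would use
// org.apache.hadoop.mrunit.mapreduce.MapReduceDriver instead
import org.apache.hadoop.mrunit.MapReduceDriver
import org.junit.Before
import org.junit.Test

class ExampleTest {
    private Example.MyMapper mapper
    private Example.MyReducer reducer
    private MapReduceDriver driver

    @Before void setUp() {
        mapper = new Example.MyMapper()
        reducer = new Example.MyReducer()
        // The driver runs the mapper and reducer in memory, with no input or output files
        driver = new MapReduceDriver(mapper, reducer)
    }

    @Test void testMapReduce() {
        driver.withInput(new Text('a'), new Text('b'))   // key/value pair fed to the mapper
        driver.withOutput(new Text('c'), new Text('d'))  // pair the reducer is expected to emit
        driver.runTest()                                 // runs the job and fails on any mismatch
    }
}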
Example
class ExampleTest {
    private Example.MyMapper mapper
    private Example.MyReducer reducer
    private MapReduceDriver driver

    @Before void setUp() {
        mapper = new Example.MyMapper()
        reducer = new Example.MyReducer()
        driver = new MapReduceDriver(mapper, reducer)
    }

    @Test void testMapReduce() {
        // withInput and withOutput return the driver itself, so the calls can be chained
        driver.withInput(new Text('a'), new Text('b'))
              .withOutput(new Text('c'), new Text('d'))
              .runTest()
    }
}
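To make the flow of these tests concrete, here is a minimal plain-Java model with no Hadoop or MRUnit dependency. It shows conceptually what a driver like MapReduceDriver does when runTest() is called. It runs the mapper over each input pair, groups the intermediate pairs by key in sorted order, and runs the reducer on each group. It then compares the result with the expected pairs. The model also shows why the calls in the second example can be chained: each with* method returns the driver itself. MiniDriver, Mapper, Reducer, Pair, WORD_MAPPER, and SUM_REDUCER are hypothetical names for this sketch, not MRUnit classes. MRUnit itself works with Hadoop's Writable types and Mapper/Reducer interfaces, which this sketch replaces with plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.function.BiConsumer;

// Hypothetical, stdlib-only model of an in-memory MapReduce test driver.
// MiniDriver, Mapper, Reducer and Pair are illustrative names, NOT MRUnit's API.
public class MiniDriverDemo {

    // A key/value pair; records provide equals() for comparing outputs.
    record Pair(String key, String value) {}

    interface Mapper { void map(String key, String value, BiConsumer<String, String> out); }
    interface Reducer { void reduce(String key, List<String> values, BiConsumer<String, String> out); }

    static final class MiniDriver {
        private final Mapper mapper;
        private final Reducer reducer;
        private final List<Pair> inputs = new ArrayList<>();
        private final List<Pair> expected = new ArrayList<>();

        MiniDriver(Mapper mapper, Reducer reducer) {
            this.mapper = mapper;
            this.reducer = reducer;
        }

        // Returning this from each with* method is what allows call chaining.
        MiniDriver withInput(String key, String value) { inputs.add(new Pair(key, value)); return this; }
        MiniDriver withOutput(String key, String value) { expected.add(new Pair(key, value)); return this; }

        // Map all inputs, group intermediate values by key in sorted order (the shuffle),
        // then reduce each group. Everything stays in memory: no files are read or written.
        List<Pair> run() {
            SortedMap<String, List<String>> groups = new TreeMap<>();
            for (Pair in : inputs) {
                mapper.map(in.key(), in.value(),
                        (k, v) -> groups.computeIfAbsent(k, x -> new ArrayList<>()).add(v));
            }
            List<Pair> actual = new ArrayList<>();
            groups.forEach((k, vs) -> reducer.reduce(k, vs, (ok, ov) -> actual.add(new Pair(ok, ov))));
            return actual;
        }

        // Fail if the actual output differs from the expected output (order-sensitive).
        void runTest() {
            List<Pair> actual = run();
            if (!actual.equals(expected)) {
                throw new AssertionError("expected " + expected + " but got " + actual);
            }
        }
    }

    // Word count mapper: emit (word, "1") for every word in the line.
    static final Mapper WORD_MAPPER = (offset, line, out) -> {
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) out.accept(word, "1");
        }
    };

    // Word count reducer: sum the counts for each word.
    static final Reducer SUM_REDUCER = (word, counts, out) ->
            out.accept(word, String.valueOf(counts.stream().mapToInt(Integer::parseInt).sum()));

    public static void main(String[] args) {
        new MiniDriver(WORD_MAPPER, SUM_REDUCER)
                .withInput("0", "b a b")
                .withOutput("a", "1")
                .withOutput("b", "2")
                .runTest();
        System.out.println("test passed");
    }
}
```

The chained call in main mirrors the second slide. If the reducer were wrong, for example emitting "1" for every word, runTest() would throw and report both lists. This is the kind of failure message you want from a unit test.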
Cool stuff I haven’t tried...
• The PipelineMapReduceDriver: allows testing a series of MapReduce passes
• Just call addMapReduce(mapper, reducer) for each pass
• Mock objects: MockReporter, MockInputSplit, and MockOutputCollector
• Test combiners with myMapReduceDriver.setCombiner(myCombiner)
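Both ideas in this list can be modeled without Hadoop. The following plain-Java approximation is not MRUnit code. In it, runPipeline applies MapReduce passes in order, so each pass consumes the previous pass's output. That is the idea behind chaining passes with addMapReduce. Separately, runPass can apply a combiner to each map task's output before the shuffle. A property worth testing is that adding a sum combiner does not change the final counts. All names (Stage, runPass, runPipeline, WORDS, SUM, COUNT_TO_KEY) are hypothetical. For simplicity, each input record stands in for one map task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.function.BiConsumer;

// Hypothetical, stdlib-only model of pipelined MapReduce passes and combiners.
// Names are illustrative, NOT MRUnit's API.
public class PipelineSketch {

    record Pair(String key, String value) {}

    interface Mapper { void map(String key, String value, BiConsumer<String, String> out); }
    interface Reducer { void reduce(String key, List<String> values, BiConsumer<String, String> out); }

    // One pass: a mapper, an optional combiner (null for none), and a reducer.
    record Stage(Mapper mapper, Reducer combiner, Reducer reducer) {}

    // Group pairs by key in sorted order and feed each group to the reducer.
    static List<Pair> reduceAll(List<Pair> pairs, Reducer reducer) {
        SortedMap<String, List<String>> groups = new TreeMap<>();
        for (Pair p : pairs) {
            groups.computeIfAbsent(p.key(), k -> new ArrayList<>()).add(p.value());
        }
        List<Pair> result = new ArrayList<>();
        groups.forEach((k, vs) -> reducer.reduce(k, vs, (ok, ov) -> result.add(new Pair(ok, ov))));
        return result;
    }

    // Run one pass. Each input record stands in for one map task, so the
    // combiner only sees the map output of a single record.
    static List<Pair> runPass(List<Pair> input, Stage stage) {
        List<Pair> shuffled = new ArrayList<>();
        for (Pair in : input) {
            List<Pair> mapped = new ArrayList<>();
            stage.mapper().map(in.key(), in.value(), (k, v) -> mapped.add(new Pair(k, v)));
            shuffled.addAll(stage.combiner() == null ? mapped : reduceAll(mapped, stage.combiner()));
        }
        return reduceAll(shuffled, stage.reducer());
    }

    // Chain passes: the output of each pass becomes the input of the next.
    static List<Pair> runPipeline(List<Pair> input, List<Stage> stages) {
        List<Pair> data = input;
        for (Stage stage : stages) {
            data = runPass(data, stage);
        }
        return data;
    }

    // Pass 1 mapper: emit (word, "1") for every word in the line.
    static final Mapper WORDS = (offset, line, out) -> {
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) out.accept(word, "1");
        }
    };

    // Sums numeric values. It works as both combiner and reducer because
    // addition is associative and commutative.
    static final Reducer SUM = (key, values, out) ->
            out.accept(key, String.valueOf(values.stream().mapToInt(Integer::parseInt).sum()));

    // Pass 2 mapper: turn (word, count) into (count, "1") to count words per count.
    static final Mapper COUNT_TO_KEY = (word, count, out) -> out.accept(count, "1");

    public static void main(String[] args) {
        List<Pair> input = List.of(new Pair("0", "a b a"), new Pair("1", "b c"));

        // The combiner should not change the final word counts.
        List<Pair> combined = runPass(input, new Stage(WORDS, SUM, SUM));
        List<Pair> uncombined = runPass(input, new Stage(WORDS, null, SUM));
        if (!combined.equals(uncombined)) {
            throw new AssertionError(combined + " != " + uncombined);
        }

        // Two chained passes: word count, then a histogram of the counts.
        List<Pair> histogram = runPipeline(input,
                List.of(new Stage(WORDS, SUM, SUM), new Stage(COUNT_TO_KEY, null, SUM)));
        System.out.println(histogram); // [Pair[key=1, value=1], Pair[key=2, value=2]]
    }
}
```

The first pass yields a=2, b=2, c=1 with or without the combiner. The second pass then reports one word that occurs once and two words that occur twice.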
In Summary, MRUnit...
• Makes testing your Hadoop jobs easier
• Abstracts away a lot of the boilerplate test setup you need
• Has its problems
• But they are outweighed by the benefits