Testing Hadoop jobs with MRUnit


Real-world examples and struggles with MRUnit testing Hadoop MapReduce jobs.

Transcript

  • 1. Testing Hadoop jobs with MRUnit © 2010 Eric Wendelin
  • 2. Eric Wendelin, Hadooper at Return Path • Blog: eriwen.com • Twitter: @eriwen
  • 3. What is MRUnit? • Testing library for MapReduce • Developed by Cloudera • Easy integration between MapReduce and standard testing tools (e.g. JUnit)
  • 4. Why do I need that?
  • 5. Testing without MRUnit • Write tests that create JobConf or Configuration objects • Run jobs locally with conf.set('mapred.job.tracker', 'local') • Create test input files and store them alongside the MapReduce test code • Validating output files takes a lot of work • External file I/O makes tests slooooow
  • 6. MRUnit makes testing Hadoop jobs easier
  • 7. Testing with MRUnit • No external test input or output files • Inputs and expected outputs are specified programmatically • Less test harness code (but perhaps also less control) • Concise, fast tests
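  • 8.
      import org.apache.hadoop.io.Text
      // Drivers for old-API (org.apache.hadoop.mapred) jobs; new-API drivers live in org.apache.hadoop.mrunit.mapreduce
      import org.apache.hadoop.mrunit.MapReduceDriver
      import org.junit.Before
      import org.junit.Test

      class ExampleTest {
          private Example.MyMapper mapper
          private Example.MyReducer reducer
          private MapReduceDriver driver

          @Before
          void setUp() {
              mapper = new Example.MyMapper()
              reducer = new Example.MyReducer()
              driver = new MapReduceDriver(mapper, reducer)
          }

          @Test
          void testMapReduce() {
              driver.withInput(new Text('a'), new Text('b'))
              driver.withOutput(new Text('c'), new Text('d'))
              driver.runTest()
          }
      }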
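  • 9.
      import org.apache.hadoop.io.Text
      import org.apache.hadoop.mrunit.MapReduceDriver
      import org.junit.Before
      import org.junit.Test

      class ExampleTest {
          private Example.MyMapper mapper
          private Example.MyReducer reducer
          private MapReduceDriver driver

          @Before
          void setUp() {
              mapper = new Example.MyMapper()
              reducer = new Example.MyReducer()
              driver = new MapReduceDriver(mapper, reducer)
          }

          @Test
          void testMapReduce() {
              // withInput() and withOutput() return the driver, so the calls can be chained
              driver.withInput(new Text('a'), new Text('b'))
                    .withOutput(new Text('c'), new Text('d'))
                    .runTest()
          }
      }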
  • 10. Test map and reduce separately
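  • 11.
      import org.apache.hadoop.io.Text
      import org.apache.hadoop.mrunit.MapDriver
      import org.junit.Before
      import org.junit.Test

      class ExampleTest {
          private Example.MyMapper mapper
          private MapDriver mDriver

          @Before
          void setUp() {
              mapper = new Example.MyMapper()
              mDriver = new MapDriver(mapper)
          }

          @Test
          void testMap() {
              // Only the mapper runs: one input pair in, the expected mapper output pair out
              mDriver.withInput(new Text('a'), new Text('b'))
              mDriver.withOutput(new Text('c'), new Text('d'))
              mDriver.runTest()
          }
      }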
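  • 12.
      import org.apache.hadoop.io.Text
      import org.apache.hadoop.mrunit.ReduceDriver
      import org.junit.Before
      import org.junit.Test

      class ExampleTest {
          private Example.MyReducer reducer
          private ReduceDriver rDriver

          @Before
          void setUp() {
              reducer = new Example.MyReducer()
              rDriver = new ReduceDriver(reducer)
          }

          @Test
          void testReduce() {
              // A reducer receives one key together with the list of all its values
              rDriver.withInput(new Text('a'), [new Text('foo'), new Text('bar')])
              rDriver.withOutput(new Text('c'), new Text('d'))
              rDriver.runTest()
          }
      }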
  • 13. Counters!
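  • 14.
      import static org.junit.Assert.assertEquals

      driver.withInput(...)
      driver.run()
      // Counters incremented by the mapper/reducer during run() can be inspected afterwards
      def counters = driver.getCounters()
      assertEquals(1L, counters.findCounter('foo', 'bar').getValue())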
  • 15. Verifying logging
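  • 16.
      import org.apache.log4j.AppenderSkeleton
      import org.apache.log4j.Logger
      import static org.junit.Assert.assertTrue

      // Collect every log event emitted while the driver runs
      def messages = []
      def appender = [
          append: { messages.add(it) },
          requiresLayout: { false },
          close: {}
      ] as AppenderSkeleton
      Logger.getRootLogger().addAppender(appender)
      try {
          driver.runTest()
      } finally {
          Logger.getRootLogger().removeAppender(appender)
      }
      assertTrue messages.any {
          it.getLevel().toString() == 'WARN' && it.getRenderedMessage().contains('My err')
      }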
  • 17. Cool stuff I haven’t tried... • PipelineMapReduceDriver: tests a series of MapReduce passes; just call addMapReduce(mapper, reducer) for each pass • Mock objects: MockReporter, MockInputSplit, and MockOutputCollector • Test combiners with myMapReduceDriver.setCombiner(myCombiner)
  • 18. Problems with MRUnit
  • 19. runTest() does not give meaningful information on failure
  • 20. Better to use run() and then assert
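  • 21.
      import org.apache.hadoop.io.Text
      import static org.junit.Assert.assertEquals

      driver.withInput(new Text('foo'), new Text('bar'))
      // run() returns the output as a list of (first, second) pairs instead of pass/fail
      def output = driver.run()
      assertEquals 'baz', output[0].first.toString()
      assertEquals 'jy', output[0].second.toString()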
  • 22. Documentation is severely lacking
  • 23. runXxx() calls setup() when using the new Hadoop API, but not when using the old API
  • 24. Tests run locally in a single JVM, not distributed across a cluster
  • 25. In summary, MRUnit... • Makes testing your Hadoop jobs easier • Abstracts away much of the boilerplate test setup you would otherwise need • Has its problems • ...but they are outweighed by the benefits
  • 26. cloudera.com/hadoop-mrunit Blog: eriwen.com Twitter: @eriwen Email: eric.wendelin@returnpath.net
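Slides 8 and 9 hand a mapper and a reducer to MapReduceDriver and let it check the expected output. Conceptually, the driver runs the mapper over the inputs, groups and sorts the intermediate pairs by key as the shuffle would, runs the reducer on each group, and compares the result, all inside one JVM. The Java sketch below imitates that flow with the standard library only, so it runs without Hadoop or MRUnit. `SimpleMapper`, `SimpleReducer` and `InMemoryMapReduce` are hypothetical names, not MRUnit classes, and MRUnit's real implementation may differ in detail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

// Hypothetical mapper shape (not Hadoop's Mapper): emits zero or more pairs per input record.
interface SimpleMapper<KI, VI, KO, VO> {
    void map(KI key, VI value, BiConsumer<KO, VO> emit);
}

// Hypothetical reducer shape (not Hadoop's Reducer): sees one key and all of its values.
interface SimpleReducer<KI, VI, KO, VO> {
    void reduce(KI key, List<VI> values, BiConsumer<KO, VO> emit);
}

final class InMemoryMapReduce {
    private InMemoryMapReduce() { }

    // Runs map -> shuffle/sort -> reduce in memory: no input files, no output files, no cluster.
    static <KI, VI, K extends Comparable<K>, V, KO, VO> List<Map.Entry<KO, VO>> run(
            List<Map.Entry<KI, VI>> input,
            SimpleMapper<KI, VI, K, V> mapper,
            SimpleReducer<K, V, KO, VO> reducer) {
        // Map phase; the TreeMap groups values by key with keys in sorted order,
        // standing in for the shuffle/sort between map and reduce.
        TreeMap<K, List<V>> groups = new TreeMap<>();
        for (Map.Entry<KI, VI> rec : input) {
            mapper.map(rec.getKey(), rec.getValue(),
                    (k, v) -> groups.computeIfAbsent(k, unused -> new ArrayList<>()).add(v));
        }
        // Reduce phase: one reducer call per distinct key, in key order.
        List<Map.Entry<KO, VO>> output = new ArrayList<>();
        for (Map.Entry<K, List<V>> group : groups.entrySet()) {
            reducer.reduce(group.getKey(), group.getValue(), (k, v) -> output.add(Map.entry(k, v)));
        }
        return output;
    }
}
```

For example, with a mapper that splits a line into words and emits (word, 1) and a reducer that sums the values, the inputs (0, "a b a") and (1, "b c") come back as [(a, 2), (b, 2), (c, 1)], keys in sorted order.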
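Slide 16 verifies logging by attaching a log4j appender that collects events, running the driver, and asserting on what was captured. The same capture-and-assert pattern can be sketched with `java.util.logging`, swapped in here only because it ships with the JDK; in a log4j-based Hadoop job you would attach an appender as the slide does. `CapturingHandler` and its `contains` method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;

// Hypothetical java.util.logging handler (a stand-in for the slide's log4j appender)
// that keeps every record published to it so a test can inspect them afterwards.
final class CapturingHandler extends Handler {
    private final List<LogRecord> records = new ArrayList<>();

    @Override
    public void publish(LogRecord logRecord) {
        records.add(logRecord);
    }

    @Override
    public void flush() { }

    @Override
    public void close() { }

    // True if any captured record has exactly this level and a message containing the text.
    boolean contains(Level level, String text) {
        for (LogRecord r : records) {
            if (r.getLevel().equals(level) && r.getMessage() != null && r.getMessage().contains(text)) {
                return true;
            }
        }
        return false;
    }
}
```

In a test, add the handler to the root logger (`Logger.getLogger("")`) before running the code under test, remove it in a `finally` block, then assert with something like `contains(Level.WARNING, "My err")`.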
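Slide 17 mentions PipelineMapReduceDriver (chained MapReduce passes) and combiners without trying them. The standard-library Java sketch below shows only the underlying idea: each pass's reduce output becomes the next pass's input, and a combiner is a reducer applied to map output before the shuffle. `PassMapper`, `PassReducer` and `InMemoryPipeline` are hypothetical; `addPass` only loosely echoes `addMapReduce(mapper, reducer)`, this is not MRUnit code, and it treats all map output as coming from a single map task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

// Hypothetical, simplified signatures: every pass reads and writes (String, Long) pairs.
interface PassMapper {
    void map(String key, long value, BiConsumer<String, Long> emit);
}

interface PassReducer {
    void reduce(String key, List<Long> values, BiConsumer<String, Long> emit);
}

final class InMemoryPipeline {
    private final List<PassMapper> mappers = new ArrayList<>();
    private final List<PassReducer> reducers = new ArrayList<>();

    // Registers one map/reduce pass; loosely echoes addMapReduce(mapper, reducer).
    InMemoryPipeline addPass(PassMapper mapper, PassReducer reducer) {
        mappers.add(mapper);
        reducers.add(reducer);
        return this;
    }

    // Runs the passes in order; each pass's reduce output is the next pass's input.
    List<Map.Entry<String, Long>> run(List<Map.Entry<String, Long>> input) {
        List<Map.Entry<String, Long>> data = input;
        for (int i = 0; i < mappers.size(); i++) {
            data = reduceAll(group(mapAll(data, mappers.get(i))), reducers.get(i));
        }
        return data;
    }

    // One pass with a combiner: the combiner reduces the map output before the shuffle.
    static List<Map.Entry<String, Long>> runWithCombiner(List<Map.Entry<String, Long>> input,
            PassMapper mapper, PassReducer combiner, PassReducer reducer) {
        List<Map.Entry<String, Long>> combined = reduceAll(group(mapAll(input, mapper)), combiner);
        return reduceAll(group(combined), reducer);
    }

    private static List<Map.Entry<String, Long>> mapAll(List<Map.Entry<String, Long>> in, PassMapper m) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        for (Map.Entry<String, Long> rec : in) {
            m.map(rec.getKey(), rec.getValue(), (k, v) -> out.add(Map.entry(k, v)));
        }
        return out;
    }

    // Groups values by key with keys sorted, standing in for the shuffle/sort.
    private static TreeMap<String, List<Long>> group(List<Map.Entry<String, Long>> pairs) {
        TreeMap<String, List<Long>> groups = new TreeMap<>();
        for (Map.Entry<String, Long> p : pairs) {
            groups.computeIfAbsent(p.getKey(), unused -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    private static List<Map.Entry<String, Long>> reduceAll(TreeMap<String, List<Long>> groups, PassReducer r) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        for (Map.Entry<String, List<Long>> g : groups.entrySet()) {
            r.reduce(g.getKey(), g.getValue(), (k, v) -> out.add(Map.entry(k, v)));
        }
        return out;
    }
}
```

Hadoop may run a combiner zero, one, or more times, so a combiner must not change the final result. Comparing a run with the combiner against a run without it is a quick check of that property; a summing reducer passes it because addition is associative and commutative.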
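Slides 19 to 21 recommend calling run() and asserting on the returned pairs yourself, because runTest() says little about what went wrong. A small helper can make such failures self-describing by naming the record and field that differ. The sketch below is hypothetical (`OutputAssertions.assertOutputs` is not an MRUnit or JUnit method) and uses `Map.Entry` in place of MRUnit's `Pair`, whose `getFirst()`/`getSecond()` would take the place of `getKey()`/`getValue()`.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class OutputAssertions {
    private OutputAssertions() { }

    // Hypothetical helper: compares expected and actual (key, value) records in order and
    // fails with a message naming the first record and field that differ.
    static void assertOutputs(List<? extends Map.Entry<?, ?>> expected,
                              List<? extends Map.Entry<?, ?>> actual) {
        int common = Math.min(expected.size(), actual.size());
        for (int i = 0; i < common; i++) {
            check("output[" + i + "].key", expected.get(i).getKey(), actual.get(i).getKey());
            check("output[" + i + "].value", expected.get(i).getValue(), actual.get(i).getValue());
        }
        // Content mismatches are reported first; a pure length mismatch is reported last.
        check("record count", expected.size(), actual.size());
    }

    private static void check(String field, Object expected, Object actual) {
        if (!Objects.equals(expected, actual)) {
            throw new AssertionError(field + ": expected <" + expected + "> but was <" + actual + ">");
        }
    }
}
```

A failure then reads like `output[0].value: expected <jy> but was <xx>` rather than a bare pass/fail.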