Testing Hadoop jobs with MRUnit

Real-world examples and struggles with MRUnit testing Hadoop MapReduce jobs.

Usage Rights

CC Attribution License

Testing Hadoop jobs with MRUnit Presentation Transcript

  • 1. Testing Hadoop jobs with MRUnit © 2010 Eric Wendelin
  • 2. Eric Wendelin Hadooper at Return Path Blog: eriwen.com Twitter: @eriwen
  • 3. What is MRUnit?
      • Testing library for MapReduce
      • Developed by Cloudera
      • Easy integration between MapReduce and standard testing tools (e.g. JUnit)
  • 4. Why do I need that?
  • 5. Testing without MRUnit
      • Write tests that create JobConf or Configuration objects
      • conf.set('mapred.job.tracker', 'local')
      • Developing new test input files stored alongside MapReduce test code
      • Lots of work to validate output files
      • External file I/O makes tests slooooow
  • 6. MRUnit makes testing Hadoop jobs easier
  • 7. Testing with MRUnit
      • No external test input or output files
      • Input and expected output are specified programmatically
      • Less test harness code (but also perhaps less control)
      • Concise, fast tests
  • 8.
      class ExampleTest {
          private Example.MyMapper mapper
          private Example.MyReducer reducer
          private MapReduceDriver driver

          @Before
          void setUp() {
              mapper = new Example.MyMapper()
              reducer = new Example.MyReducer()
              driver = new MapReduceDriver(mapper, reducer)
          }

          @Test
          void testMapReduce() {
              // Input and expected output are given in code, not in files
              driver.withInput(new Text('a'), new Text('b'))
              driver.withOutput(new Text('c'), new Text('d'))
              driver.runTest()
          }
      }
  • 9.
      class ExampleTest {
          private Example.MyMapper mapper
          private Example.MyReducer reducer
          private MapReduceDriver driver

          @Before
          void setUp() {
              mapper = new Example.MyMapper()
              reducer = new Example.MyReducer()
              driver = new MapReduceDriver(mapper, reducer)
          }

          @Test
          void testMapReduce() {
              // Same test as the previous slide, written as one chained call
              driver.withInput(new Text('a'), new Text('b'))
                    .withOutput(new Text('c'), new Text('d'))
                    .runTest()
          }
      }
  • 10. Test map and reduce separately
  • 11.
      class ExampleTest {
          private Example.MyMapper mapper
          private MapDriver mDriver

          @Before
          void setUp() {
              mapper = new Example.MyMapper()
              mDriver = new MapDriver(mapper)
          }

          @Test
          void testMap() {
              mDriver.withInput(new Text('a'), new Text('b'))
              mDriver.withOutput(new Text('c'), new Text('d'))
              mDriver.runTest()
          }
      }
  • 12.
      class ExampleTest {
          private Example.MyReducer reducer
          private ReduceDriver rDriver

          @Before
          void setUp() {
              reducer = new Example.MyReducer()
              rDriver = new ReduceDriver(reducer)
          }

          @Test
          void testReduce() {
              // A reducer receives one key and the list of values grouped under it
              rDriver.withInput(new Text('a'), [new Text('foo'), new Text('bar')])
              rDriver.withOutput(new Text('c'), new Text('d'))
              rDriver.runTest()
          }
      }
  • 13. Counters!
  • 14.
      driver.withInput(...)
      driver.run()
      def counters = driver.getCounters()
      assertEquals(1, counters.findCounter('foo', 'bar').getValue())
  • 15. Verifying logging
  • 16.
      // Collect every log event in a list through a throwaway log4j appender
      def messages = []
      def appender = [
          append: { messages.add(it) },
          close: { },
          requiresLayout: { false }
      ] as AppenderSkeleton
      Logger.getRootLogger().addAppender(appender)
      driver.runTest()
      assertTrue messages.any {
          it.getLevel().toString() == 'WARN' &&
              it.getMessage().toString().contains('My err')
      }
      Logger.getRootLogger().removeAppender(appender)
  • 17. Cool stuff I haven’t tried...
      • The PipelineMapReduceDriver - allows testing a series of MapReduce passes
          • Just call addMapReduce(mapper, reducer)
      • Mock objects - MockReporter, MockInputSplit, and MockOutputCollector
      • Test combiners with myMapReduceDriver.setCombiner(myCombiner)
  • 18. Problems with MRUnit
  • 19. runTest() does not give meaningful information on failure
  • 20. Better to use run() and then assert
  • 21.
      driver.setInput(new Text('foo'), new Text('bar'))
      def output = driver.run()
      // Compare as strings: a Text is never equal to a String
      assertEquals 'baz', output[0].first.toString()
      assertEquals 'jy', output[0].second.toString()
  • 22. Documentation is severely lacking
  • 23. runXxx() calls setup() for the new Hadoop API, but not for the old API
  • 24. Tests are not executed in a distributed way
  • 25. In summary, MRUnit...
      • Makes testing your Hadoop jobs easier
      • Abstracts away a lot of the boilerplate test setup you need
      • Has its problems, but they are outweighed by the benefits
  • 26. cloudera.com/hadoop-mrunit Blog: eriwen.com Twitter: @eriwen Email: eric.wendelin@returnpath.net © 2010 Eric Wendelin
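
The approach described on slides 7, 20 and 21 (supply key/value pairs in code, run everything in memory, then assert on the returned pairs) can be tried without Hadoop or MRUnit on the classpath. What follows is only a toy sketch in plain Java and makes no claim about how MRUnit is implemented. The class InMemoryMapReduce, its run method, and the word-count mapper and reducer are hypothetical names invented for this illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;

// Hypothetical toy harness for illustration; not part of MRUnit or Hadoop.
public class InMemoryMapReduce {

    // Runs map, then groups mapper output by key (sorted), then reduce.
    // Nothing touches the file system.
    public static <K1, V1, K2 extends Comparable<K2>, V2, K3, V3>
    List<Map.Entry<K3, V3>> run(
            List<Map.Entry<K1, V1>> input,
            BiFunction<K1, V1, List<Map.Entry<K2, V2>>> mapper,
            BiFunction<K2, List<V2>, List<Map.Entry<K3, V3>>> reducer) {
        // Stand-in for the shuffle: collect values under their key, keys sorted.
        Map<K2, List<V2>> grouped = new TreeMap<>();
        for (Map.Entry<K1, V1> record : input) {
            for (Map.Entry<K2, V2> emitted : mapper.apply(record.getKey(), record.getValue())) {
                grouped.computeIfAbsent(emitted.getKey(), k -> new ArrayList<>())
                       .add(emitted.getValue());
            }
        }
        // Call the reducer once per key with all values for that key.
        List<Map.Entry<K3, V3>> output = new ArrayList<>();
        for (Map.Entry<K2, List<V2>> group : grouped.entrySet()) {
            output.addAll(reducer.apply(group.getKey(), group.getValue()));
        }
        return output;
    }

    // Example mapper: emits (word, 1) for each whitespace-separated token.
    public static List<Map.Entry<String, Integer>> wordCountMap(Long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Example reducer: sums the counts collected for one word.
    public static List<Map.Entry<String, Integer>> wordCountReduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        out.add(new SimpleEntry<>(word, sum));
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<Long, String>> input = new ArrayList<>();
        input.add(new SimpleEntry<>(0L, "a b a"));
        List<Map.Entry<String, Integer>> output =
                run(input, InMemoryMapReduce::wordCountMap, InMemoryMapReduce::wordCountReduce);
        System.out.println(output); // prints [a=2, b=1]
    }
}
```

Because run returns the output pairs instead of checking them internally, a test can assert on each pair separately and get a specific failure message, which is the reason slide 20 gives for preferring run() over runTest().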