Agile NCR 2013- Anirudh Bhatnagar - Hadoop unit testing agile ncr

  1. Unit Testing Map Reduce Jobs in Hadoop
     Speaker Details:
     Anirudh Bhatnagar, Senior Consultant, Xebia India, abhatnagar@xebia.com
     Sanchit Agarwal, Senior Consultant, Xebia India, sagarwal@xebia.com
  2. Agenda
     ● Hadoop Introduction
     ● What is Map Reduce [Sample Code]
     ● Map-Reduce Testing using Mockito [Sample Code]
     ● Shortcomings with Mockito
     ● MRUnit Test Harness [Sample Code]
     ● Advantages of MRUnit
     ● What Lies Ahead
  3. What is Hadoop?
  4. Why Hadoop?
  5. How Hadoop works
  6. What is Map Reduce
  7. Map Reduce Execution
  8. Sample Map Reduce Code
     ● All examples and setup are done for a single-node cluster
       – Mapper class: map(LongWritable key, Text value, Context context)
       – Reducer class: reduce(Text key, Iterable<IntWritable> values, Context context)
  9. Problem Statement
     Find the top trend among all the given tags in different user logs
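The problem statement above can be sketched in plain Java, without any Hadoop dependency, to show how the work might split into a map step and a reduce step. This is only an illustration, not the speakers' code: the class name `TopTrendSketch`, its method names, and the assumption that each log line is a whitespace-separated list of tags are all hypothetical.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TopTrendSketch {

    /** Map step: emit a (tag, 1) pair for every tag found in one log line.
     *  Assumption: tags on a line are separated by whitespace. */
    public static List<Map.Entry<String, Integer>> map(String logLine) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String tag : logLine.trim().split("\\s+")) {
            if (!tag.isEmpty()) {
                out.add(new AbstractMap.SimpleImmutableEntry<>(tag, 1));
            }
        }
        return out;
    }

    /** Reduce step: sum all the counts collected for a single tag. */
    public static int reduce(String tag, Iterable<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    /** Driver: group pairs by tag (the "shuffle"), reduce each group,
     *  and return the tag with the highest total. Ties go to the tag that
     *  sorts first alphabetically; an empty input yields null. */
    public static String topTrend(List<String> logLines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : logLines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            int total = reduce(group.getKey(), group.getValue());
            if (total > bestCount) {
                best = group.getKey();
                bestCount = total;
            }
        }
        return best;
    }
}
```

In a real job the map and reduce steps would live in Mapper and Reducer subclasses with the signatures from the previous slide, and the framework would perform the grouping.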
  10. Sample Code: Unit Testing with Mockito
      ● No MRUnit code used
  11. Shortcomings with Mockito
      ● Not very intuitive for the Map-Reduce style of programming
      ● Map-Reduce semantics differ in subtle ways from what a Mockito-based test expresses
      ● Can be equally good in simple scenarios, but may fail to cover more complex ones
  12. MRUnit Test Harness
      ● Very intuitive for the Map-Reduce style of programming
      ● MRUnit helps bridge the gap between MapReduce programs and JUnit by providing a set of interfaces and test harnesses, which allow MapReduce programs to be more easily tested using standard tools and practices
      ● Provides 4 drivers for separately testing Map-Reduce code
        – MapDriver
        – ReduceDriver
        – MapReduceDriver
        – PipelineMapReduceDriver
  13. Sample Code with MRUnit
      ● Used in combination with JUnit to get better control over log messages
      ● Integrates easily with JUnit
  14. Gotchas With MRUnit
      ● MapDriver.withInput supports only one input; if it is called several times, each call replaces the previous input and only the last one is used
      ● Handle runTest() and run() with care: runTest() runs the test, checks the expected output, and returns void, while run() executes the code under test and returns a list of the output key-value pairs
      ● PipelineMapReduceDriver only supports the old Hadoop API
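The difference between runTest() and run() described above can be sketched as follows. This is a minimal illustration, not the speakers' sample code: `TagMapperSketch` and `TagMapper` are hypothetical names, the mapper assumes one tag per input line, and the snippet assumes the Hadoop new-API classes and the MRUnit `org.apache.hadoop.mrunit.mapreduce.MapDriver` are on the classpath.

```java
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.apache.hadoop.mrunit.types.Pair;

public class TagMapperSketch {

    /** Hypothetical mapper: emits (tag, 1) for the single tag on each line. */
    public static class TagMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(new Text(value.toString().trim()), ONE);
        }
    }

    /** runTest() returns void: it compares the mapper output with the
     *  expected output and fails with an AssertionError on a mismatch. */
    public static void checkWithRunTest() throws IOException {
        MapDriver.newMapDriver(new TagMapper())
                .withInput(new LongWritable(0), new Text("hadoop"))
                .withOutput(new Text("hadoop"), new IntWritable(1))
                .runTest();
    }

    /** run() performs no comparison: it returns the emitted pairs so the
     *  caller can write its own JUnit assertions on them. */
    public static List<Pair<Text, IntWritable>> collectWithRun()
            throws IOException {
        return MapDriver.newMapDriver(new TagMapper())
                .withInput(new LongWritable(0), new Text("hadoop"))
                .run();
    }
}
```

From a JUnit test, `checkWithRunTest()` is enough when the expected output is known up front, while the list returned by `collectWithRun()` suits cases that need custom assertions on the pairs.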
  15. What Lies Ahead
      ● MiniMRCluster and MiniDFSCluster classes offer full-blown in-memory MapReduce and HDFS clusters, and can launch multiple MapReduce and HDFS nodes
      ● Best practices and debugging techniques for Map-Reduce
  16. Questions?
  17. Bibliography
      ● Books
        – Hadoop in Practice
        – Hadoop Definitive Guide
        – Hadoop in Action
      ● Links
        – http://hadoop.apache.org/
      ● Blogs
        – http://codingjunkie.net/testing-hadoop-programs-with-mrunit/
        – http://java.dzone.com/articles/effective-testing-strategies
        – https://github.com/alexholmes/blog/blob/master/_posts/2012-10-20-hadoop-unit-testing-with-minimrcluster.markdown
