Designing and Testing Accumulo Iterators


Slides from the Washington DC Accumulo Meetup group on 10 November 2015. Topic covered is iterator design and a new test framework.

Published in: Software


  1. Designing and Testing Accumulo Iterators. Josh Elser, Member of Technical Staff; PMC, Apache Accumulo. 10 November 2015. © Hortonworks Inc. 2014. Apache, Accumulo, and Apache Accumulo are trademarks of the Apache Software Foundation.
  2. Design: How do I know if my Iterator works? What can I do in an Iterator? How are these methods even called?!
  3. Common Patterns: Only a certain subset of algorithms fits well into Accumulo Iterators. (Avoid shoving a square peg into a round hole.)
     • Filtering
     • Reduction
     • Bounded aggregations
       – Keep an upper bound on the number of elements being aggregated to avoid memory issues
     • Transformations
       – Key sort order must be retained
       – Best limited to the Value only
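
The patterns on this slide can be sketched in plain Python. This is only an illustration, not the Accumulo API: sorted `(key, value)` tuples stand in for Accumulo entries, and the names `filter_entries`, `bounded_collect`, and `transform_values` are hypothetical.

```python
def filter_entries(entries, predicate):
    """Filtering: drop entries without reordering the ones that remain."""
    return (kv for kv in entries if predicate(kv))


def bounded_collect(entries, max_elements):
    """Bounded aggregation: group values per key, holding at most
    `max_elements` values in memory before emitting a partial group."""
    current_key, batch = None, []
    for key, value in entries:
        # Flush when the key changes or the bound is reached.
        if batch and (key != current_key or len(batch) >= max_elements):
            yield current_key, batch
            batch = []
        current_key = key
        batch.append(value)
    if batch:
        yield current_key, batch


def transform_values(entries, fn):
    """Transformation limited to the value, so key sort order is retained."""
    return ((key, fn(value)) for key, value in entries)


entries = [("a", 1), ("a", 2), ("a", 3), ("b", 4)]
print(list(bounded_collect(entries, max_elements=2)))
```

Emitting a partial group when the bound is hit trades a single combined result for predictable memory use; the emitted keys still appear in non-decreasing order.
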
  4. Design: Josh’s Iterator Design Principles
     • Always make forward progress
     • Think functional
       – Avoid unnecessary state
     • Operate only on the data you have
     • Do one thing and do it efficiently
  5. Design: Make Forward Progress [diagram labeled Start and End]
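
One way to picture forward progress is a scan that is torn down and rebuilt between batches, resuming just past the last key it returned. The sketch below is a toy model under stated assumptions, not Accumulo code: keys are unique sorted strings, and `seek` and `scan_with_teardown` are hypothetical names.

```python
import bisect
from itertools import islice


def seek(sorted_keys, start, inclusive):
    """Position at `start`, or just past it when the start is exclusive."""
    find = bisect.bisect_left if inclusive else bisect.bisect_right
    return iter(sorted_keys[find(sorted_keys, start):])


def scan_with_teardown(sorted_keys, start, batch_size):
    """Read from `start` to the end, rebuilding the iterator after each batch."""
    results = []
    last, inclusive = start, True
    while True:
        batch = list(islice(seek(sorted_keys, last, inclusive), batch_size))
        if not batch:
            return results
        # Guard: after a re-seek, every key must lie beyond the last one
        # returned, otherwise the scan would loop forever.
        if not inclusive and batch[0] <= last:
            raise RuntimeError("no forward progress")
        results.extend(batch)
        last, inclusive = batch[-1], False


print(scan_with_teardown(["a", "b", "c", "d", "e"], "b", batch_size=2))
```

With this `seek` the guard never fires; it marks the condition that an iterator returning a key at or before its seek point would violate.
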
  6. Unnecessary State: Think about your Iterator like a function.
         def total(entries):
             # The running total is the only state carried between entries.
             running = 0
             for entry in entries:
                 running += entry
             return running
     • Avoid holding onto state when at all possible.
     • Think in terms of a stream rather than chunks of data.
     • Beware of memory implications when performing aggregations.
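
As a small illustration of thinking in streams, the sketch below computes a mean in one pass with two counters instead of materializing the values first. The `mean` function and the generator feeding it are assumptions made for the example.

```python
def mean(entries):
    """Single pass over a stream; memory use does not grow with input size."""
    count, running = 0, 0
    for entry in entries:
        count += 1
        running += entry
    return running / count if count else None


# A generator stands in for a stream: values are produced one at a time
# and never held in a list.
stream = (n * 2 for n in range(1, 4))
print(mean(stream))
```
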
  7. Operate locally: Not a Coprocessor/Container.
     Daily Reminder: Iterators have no calls for implementing a safe cleanup.
     • Iterators cannot properly handle I/O-related issues with external systems.
     • Slow external calls result in slow Accumulo.
     • Some problems are more safely implemented outside of an Accumulo Iterator.
  8. Simplicity: Avoid doing multiple things in a single Iterator.
     • Object Oriented Design 101
     • Iterators can be tricky to debug on their own
     • Configuring multiple Iterators is a feature
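
Stacking several single-purpose stages, rather than writing one stage that does everything, might look like the sketch below. It models each Iterator as a plain Python function over `(key, value)` tuples; `drop_empty_values`, `uppercase_values`, and `apply_stack` are hypothetical names, not Accumulo configuration calls.

```python
from functools import reduce


def drop_empty_values(entries):
    """Stage 1: filtering only."""
    return ((key, value) for key, value in entries if value != "")


def uppercase_values(entries):
    """Stage 2: value transformation only."""
    return ((key, value.upper()) for key, value in entries)


def apply_stack(source, stages):
    """Feed `source` through each stage in order; each stage wraps the previous."""
    return reduce(lambda upstream, stage: stage(upstream), stages, iter(source))


data = [("a", "x"), ("b", ""), ("c", "y")]
print(list(apply_stack(data, [drop_empty_values, uppercase_values])))
```

Because each stage does one thing, each can be tested and debugged separately and reordered or dropped without touching the others.
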
  9. Testing: You should always test your code before running it in any environment to ensure that it functions as intended.
  10. Testing: HOW?
  11. Testing: A framework designed for testing Iterators given input, a Range, options, and expected output.
  12. Testing [diagram: several tests, each built from an Iterator class, a Range, Iterator options, sorted input data, and either verification of output records or a true/false check; legend entries: User-Provided, Framework]
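
The shape of such a test case (an Iterator, a Range, options, sorted input, and expected output) can be sketched as a small Python harness. This is not the framework's real API: `IteratorTestCase`, `run_case`, and `prefix_filter` are hypothetical, and the Range is modeled as an inclusive `(start, end)` pair of keys.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class IteratorTestCase:
    iterator: Callable      # fn(entries, options) -> iterable of (key, value)
    key_range: tuple        # inclusive (start, end) keys, standing in for a Range
    options: dict
    sorted_input: list      # (key, value) pairs, sorted by key
    expected_output: list


def run_case(case):
    """Apply the iterator to the in-range input and compare with expectations."""
    start, end = case.key_range
    in_range = [(k, v) for k, v in case.sorted_input if start <= k <= end]
    actual = list(case.iterator(iter(in_range), case.options))
    return actual == case.expected_output


def prefix_filter(entries, options):
    """Example iterator under test: keep values that start with a prefix."""
    return ((k, v) for k, v in entries if v.startswith(options["prefix"]))


case = IteratorTestCase(
    iterator=prefix_filter,
    key_range=("b", "d"),
    options={"prefix": "ok"},
    sorted_input=[("a", "ok-1"), ("b", "ok-2"), ("c", "bad"), ("d", "ok-3")],
    expected_output=[("b", "ok-2"), ("d", "ok-3")],
)
print(run_case(case))
```
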
  13. Features
      • Auto-discovery of test cases
      • JUnit Parameterized test integration
      • Provided generic tests
        – Default Constructor
        – Re-Seek (teardown)
        – Deep Copy Verification
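
A generic re-seek check could be approximated as follows: run the Iterator once without interruption, then restart it just past each returned key and confirm the stitched-together output is unchanged. This is a sketch of the idea only, assuming unique keys and Iterators modeled as functions over `(key, value)` tuples; `outputs`, `survives_reseek`, and both example iterators are hypothetical.

```python
def outputs(iterator, sorted_input, after=None):
    """Run `iterator` over the input, optionally starting just past key `after`."""
    remaining = [(k, v) for k, v in sorted_input if after is None or k > after]
    return list(iterator(iter(remaining)))


def survives_reseek(iterator, sorted_input):
    """True if restarting after any returned key yields the same overall output."""
    expected = outputs(iterator, sorted_input)
    for i, (last_key, _) in enumerate(expected):
        resumed = outputs(iterator, sorted_input, after=last_key)
        if expected[:i + 1] + resumed != expected:
            return False
    return True


def keep_even_values(entries):
    """Decides per entry, so it carries no state across a teardown."""
    return ((k, v) for k, v in entries if v % 2 == 0)


def every_other_entry(entries):
    """Depends on position in the stream, which is lost on teardown."""
    return (kv for i, kv in enumerate(entries) if i % 2 == 0)


data = [("a", 1), ("b", 2), ("c", 3), ("d", 4)]
print(survives_reseek(keep_even_values, data), survives_reseek(every_other_entry, data))
```

The position-dependent iterator fails because its hidden counter restarts from zero after every teardown, which is the kind of unnecessary state the design principles warn against.
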
  14. Future Work
      • More Iterator tests!
      • A final resting place for the code
      • Documentation
      • Usability testing
  15. Thanks!