Designing and Testing Accumulo Iterators

Slides from the Washington DC Accumulo Meetup group on 10 November 2015. Topic covered is iterator design and a new test framework.

Published in: Software

  1. Designing and Testing Accumulo Iterators. Josh Elser, Member of Technical Staff; PMC, Apache Accumulo. 10 November 2015. © Hortonworks Inc. 2014. Apache, Accumulo, and Apache Accumulo are trademarks of the Apache Software Foundation.
  2. Design
     • How do I know if my Iterator works?
     • What can I do in an Iterator?
     • How are these methods even called?!
  3. Common Patterns: Only a certain subset of algorithms fit into Accumulo Iterators well. (Avoid shoving a square peg into a round hole.)
     • Filtering
     • Reduction
     • Bounded aggregations
       – Keep an upper bound on the number of elements being aggregated to avoid memory issues
     • Transformations
       – Key sort-order must be retained
       – Best limited to the Value only
  4. Design: Josh's Iterator Design Principles
     • Always make forward progress
     • Think functional: avoid unnecessary state
     • Operate only on the data you have
     • Do one thing and do it efficiently
  5. Design: Make Forward Progress. [Diagram labeled "Start" and "End"]
  6. Unnecessary State: Think about your Iterator like a function.

         def sum(entries):
             total = 0  # the only state kept is one running total
             for entry in entries:
                 total += entry
             return total

     • Avoid holding onto state whenever possible.
     • Think in terms of a stream rather than chunks of data.
     • Beware of memory implications when performing aggregations.
  7. Operate locally. Daily reminder: Iterators have no calls for implementing a safe cleanup.
     • Iterators cannot properly handle I/O-related issues with external systems.
     • Slow external calls result in a slow Accumulo.
     • Some problems are more safely implemented outside of an Accumulo Iterator.
     • Iterators are not a Coprocessor/Container.
  8. Simplicity: Avoid doing multiple things in a single Iterator.
     • Object Oriented Design 101
     • Iterators can be tricky to debug on their own
     • Configuring multiple Iterators is a feature
  9. Testing: You should always test your code before running it in any environment to ensure that it functions as intended.
  10. Testing: How?
  11. Testing: A framework designed for testing Iterators given input, a Range, options, and expected output.
  12. Testing. [Diagram with the labels: Test (five boxes), Iterator Class, Range, Iterator Options, Sorted Input Data, Verification of output records OR True/False check, User-Provided, Framework]
  13. Features
      • Auto-discovery of test cases
      • JUnit Parameterized test integration
      • Provided generic tests
        – Default Constructor
        – Re-Seek (teardown)
        – Deep Copy Verification
  14. Future Work
      • More Iterator tests!
      • A final resting place for the code
      • Documentation
      • Usability testing
      https://issues.apache.org/jira/browse/ACCUMULO-626
  15. Thanks! jelser@hortonworks.com
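
Slides 3 and 8 recommend small, single-purpose Iterators that are stacked rather than one Iterator that does everything. A minimal sketch of that idea follows, using plain Python generators over sorted (key, value) pairs instead of the real Accumulo Java API. The names `keep_if` and `map_values` are hypothetical and exist only for this illustration.

```python
from typing import Callable, Iterable, Iterator, Tuple

Entry = Tuple[str, int]  # (key, value); entries are assumed to arrive sorted by key


def keep_if(source: Iterable[Entry], predicate: Callable[[Entry], bool]) -> Iterator[Entry]:
    """Filtering: entries are dropped but never reordered."""
    for entry in source:
        if predicate(entry):
            yield entry


def map_values(source: Iterable[Entry], fn: Callable[[int], int]) -> Iterator[Entry]:
    """Transformation limited to the value, so key sort order is retained."""
    for key, value in source:
        yield key, fn(value)


data = [("a", 1), ("b", -2), ("c", 3)]

# Stack the two stages: filter first, then transform the surviving values.
stacked = list(map_values(keep_if(data, lambda e: e[1] > 0), lambda v: v * 10))
print(stacked)
```

Each stage can be checked on its own, which is the debugging benefit slide 8 points at; neither stage touches the key, so the output stays in key order.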

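The advice on slides 3 and 6 (bounded aggregations, streams rather than chunks) can be sketched as a generator that sums values per key while holding a single running total. This is an illustrative Python stand-in, not Accumulo code; `sum_by_key` is a hypothetical name, and the input is assumed to be sorted so that equal keys are adjacent.

```python
from typing import Iterable, Iterator, Optional, Tuple


def sum_by_key(source: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Emit one (key, total) pair per run of equal keys in a sorted stream."""
    current_key: Optional[str] = None
    total = 0
    for key, value in source:
        if current_key is not None and key != current_key:
            # The run for the previous key has ended; emit it and reset.
            yield current_key, total
            total = 0
        current_key = key
        total += value
    if current_key is not None:
        # Emit the final run, if the stream was not empty.
        yield current_key, total


totals = list(sum_by_key(iter([("a", 1), ("a", 2), ("b", 5)])))
print(totals)
```

Memory use does not grow with the number of entries per key, because the only state is the current key and its running total.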
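
Slides 11 through 13 describe tests built from sorted input, a Range, options, and expected output, plus a generic re-seek (teardown) check. The sketch below imitates that shape in Python. It is not the API of the framework from the talk: the function signature, the half-open range, and the names `min_value_filter`, `run_case`, and `run_with_reseek` are all assumptions made for illustration, and keys are assumed to be unique.

```python
from itertools import islice
from typing import Dict, Iterator, List, Optional, Tuple

Entry = Tuple[str, int]


def min_value_filter(data: List[Entry], start: Optional[str], start_inclusive: bool,
                     end: Optional[str], options: Dict[str, str]) -> Iterator[Entry]:
    """Hypothetical iterator under test: keep entries whose value >= options['min']."""
    minimum = int(options.get("min", "0"))
    for key, value in data:
        if start is not None and (key < start or (key == start and not start_inclusive)):
            continue  # before the start of the range
        if end is not None and key >= end:
            break  # past the exclusive end of the range
        if value >= minimum:
            yield key, value


def run_case(scan, data, start, end, options) -> List[Entry]:
    """Uninterrupted scan over [start, end)."""
    return list(scan(data, start, True, end, options))


def run_with_reseek(scan, data, start, end, options, teardown_after: int) -> List[Entry]:
    """Read some entries, discard the iterator, then re-seek just past the last key returned."""
    first = list(islice(scan(data, start, True, end, options), teardown_after))
    if not first:
        # Nothing was returned before the teardown, so the original range is used again.
        return run_case(scan, data, start, end, options)
    last_key = first[-1][0]
    rest = list(scan(data, last_key, False, end, options))  # start is now exclusive
    return first + rest


sorted_input = [("a", 1), ("b", 7), ("c", 3), ("d", 9), ("e", 8)]
options = {"min": "5"}
expected = [("b", 7), ("d", 9)]

actual = run_case(min_value_filter, sorted_input, "b", "e", options)
reseek_results = [
    run_with_reseek(min_value_filter, sorted_input, "b", "e", options, n)
    for n in range(4)
]
print(actual == expected, all(r == expected for r in reseek_results))
```

The re-seek check ties back to the "always make forward progress" principle: an iterator that is rebuilt and seeked to just after the last key it returned must neither repeat nor skip entries, wherever the teardown happens.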