As data volumes grow, using Accumulo as a backend for user applications is very appealing, but care must be taken to ensure that service-level agreements are met and that system response time is short and consistent. This talk focuses on problems encountered when implementing Accumulo as a backend for a user application and the specific techniques used to counter them. It covers the following topics:
Increasing performance for repeated queries
The effects of storage techniques on query response time
Providing near-instant results when querying large datasets
– Speaker –
Zach Radtka
Platform Engineer, Miner & Kasch
Zachary Radtka is a platform engineer at the data science firm Miner & Kasch and has extensive experience creating custom analytics that run on petabyte-scale datasets. Zach is the author of the O'Reilly book Hadoop with Python. Zach is an experienced educator, having instructed collegiate-level computer science classes, professional training classes on Big Data technologies, and public technology tutorials. He has also created production-level analytics for many industries, including US government, financial, healthcare, telecommunications, and retail.
— More Information —
For more information see http://www.accumulosummit.com/
15. Increase Speed of Scans: Combine and Compress
RowID    CF:CQ              Vis  Value
apple    fruit:produce      []   10
avocado  fruit:produce      []   4
banana   fruit:produce      []   11
beet     vegetable:produce  []   3
Results are returned in sorted key order, one entry per row: 10, 4, 11, 3
Using an iterator to combine and compress the results on the server means less data is sent over the network
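The combine-and-compress idea can be sketched in plain Python. This is only an illustration under stated assumptions: it does not use the Accumulo iterator API, the `ENTRIES` list is an in-memory stand-in for the four entries in the table above, and `combine_and_compress` and `decode` are hypothetical helper names. JSON and zlib are chosen here for convenience; the slide does not say which encoding or compression was used.

```python
import json
import zlib

# Hypothetical in-memory stand-in for the slide's table:
# (row_id, column_family, column_qualifier, value), already in sorted key order.
ENTRIES = [
    ("apple", "fruit", "produce", 10),
    ("avocado", "fruit", "produce", 4),
    ("banana", "fruit", "produce", 11),
    ("beet", "vegetable", "produce", 3),
]


def combine_and_compress(entries):
    """Fold every entry into one mapping and compress it into a single payload.

    This mimics what a server-side iterator could do before results leave
    the server: one combined result is sent instead of one result per entry.
    """
    combined = {row_id: value for row_id, _cf, _cq, value in entries}
    return zlib.compress(json.dumps(combined).encode("utf-8"))


def decode(payload):
    """Client side: decompress and parse the single combined payload."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))


payload = combine_and_compress(ENTRIES)
result = decode(payload)
print(len(ENTRIES), "entries combined into 1 payload")
print(result)
```

With only four tiny values the compressed payload will not necessarily be smaller than the raw values; the benefit described on the slide applies when a scan would otherwise return many entries.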
16. Increase Speed of Scans: Store More Efficiently
RowID      CF:CQ               Vis  Value
fruit      department:produce  []   {"apple":10, "avocado":4, "banana":11}
vegetable  department:produce  []   {"beet":3}
Similar to the previous solution, except that no iterator is needed
Greatly reduces the number of next() and seek() calls
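A minimal sketch of the re-packed layout, again in plain Python rather than against Accumulo itself. The `ENTRIES` list mirrors the one-entry-per-item table from the previous slide, and `pack_by_family` is a hypothetical helper; the `department` column family and JSON values follow the table above.

```python
import json
from collections import defaultdict

# Hypothetical stand-in for the original layout: one entry per item.
# (row_id, column_family, column_qualifier, value)
ENTRIES = [
    ("apple", "fruit", "produce", 10),
    ("avocado", "fruit", "produce", 4),
    ("banana", "fruit", "produce", 11),
    ("beet", "vegetable", "produce", 3),
]


def pack_by_family(entries):
    """Re-key the data so each category becomes one row holding a JSON map."""
    grouped = defaultdict(dict)
    for row_id, family, _cq, value in entries:
        grouped[family][row_id] = value
    # The old column family becomes the row ID; the value is a JSON document.
    return [
        (family, "department", "produce", json.dumps(items))
        for family, items in sorted(grouped.items())
    ]


packed = pack_by_family(ENTRIES)

# Reading all fruit now touches one stored entry instead of three.
fruit_value = next(value for row_id, _cf, _cq, value in packed if row_id == "fruit")
fruit = json.loads(fruit_value)
print(len(ENTRIES), "entries before,", len(packed), "entries after")
print(fruit)
```

The trade-off in this sketch is that updating a single item means rewriting the whole JSON value for its row, so the layout suits data that is read far more often than it is changed.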