1 © Hortonworks Inc. 2011–2018. All rights reserved
Multi-Lingual Accumulo
Communications
Marc Parisi
https://s.mrsegfault.com/summit
About
• I write code
• Apache NiFi / Apache Accumulo
• Serial website owner
• https://www.iotallthethings.com
Agenda
• History and Why
• Design
• Comparison Techniques
History
• April 2012 – June 2016, 2018 Python
• Started with C
• Migrated to C++
• Bad idea
• Accumulo & HBase & Cassandra
Questions that I get
• Why?
• Good Question!
• What is the purpose?
• Haven't a clue!
• Why not use proxy?
• A what?!
Why?
• Small footprint (Ruby/Python/Go)
• Colossus
• Reduces GC
• Reduces overhead
• Hedged reads
• Raw speed is not a factor!
Colossus
• First Iteration showed promise
• Shared memory locking and control to shared cache [3]
• C++ Server kept memory map and forwarded certain operations to Java Server
• Hedged Reads
• Complement to client
Hedged Reads
• RPC scans + Functor
• Keep track of winner to improve future scans
• Functor Scans RFiles with custom iterator stack
• Read Tserver in memory maps
• C++ RFile reads 3-5x faster in 2012 – writes 1-2x slower
• Libhdfs3 predecessor
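A minimal sketch of the hedged-read idea above: send the same scan down two paths, take whichever answers first, and remember the winner. The names `HedgedReader`, `rpc_scan` and `functor_scan` are hypothetical, and the sleep-based stand-ins only simulate latency; this is not Sharkbite's implementation.

```python
import time
from collections import Counter
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


class HedgedReader:
    """Race several read paths and keep track of which one wins."""

    def __init__(self, sources, max_workers=8):
        # sources maps a label to a callable that takes a scan range.
        self.sources = dict(sources)
        self.wins = Counter()
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def read(self, scan_range):
        # Submit the same scan to every path at once.
        futures = {
            self._pool.submit(fn, scan_range): name
            for name, fn in self.sources.items()
        }
        done, pending = wait(futures, return_when=FIRST_COMPLETED)
        for future in pending:
            future.cancel()  # best effort; a running call is not interrupted
        winner = next(iter(done))
        name = futures[winner]
        self.wins[name] += 1
        # result() re-raises if the winning path failed.
        return name, winner.result()

    def preferred(self):
        """Label of the path that has won most often so far, or None."""
        top = self.wins.most_common(1)
        return top[0][0] if top else None

    def close(self):
        self._pool.shutdown(wait=False)


# Stand-ins with arbitrary, assumed latencies.
def rpc_scan(scan_range):
    time.sleep(0.2)
    return f"rpc:{scan_range}"


def functor_scan(scan_range):
    time.sleep(0.01)
    return f"functor:{scan_range}"


if __name__ == "__main__":
    reader = HedgedReader({"rpc": rpc_scan, "functor": functor_scan})
    print(reader.read("a-m"))
    print(reader.preferred())
    reader.close()
```

A real client would likely use `preferred()` to decide which path to try first, or whether to hedge at all, on later scans.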
I'm not saying it's a bad idea...
• C / C++ developer limits
• C++11 improvements
• Template metaprogramming
• SFINAE
Design
https://accumulo.apache.org/1.3/user_manual/Accumulo_Design.html
Client Design
• Abstractions for client/server services [4]
• Differs for different client types
• Interconnects abstracted per client implementation
• Constructed types passed to services via templates
• Iterators of KeyValue support inline low-cost object translation
• Enables viewing stream as typed objects
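The last two bullets can be illustrated with a generator that translates each raw key/value pair as it is read, so the caller sees a stream of typed objects. A minimal sketch: the `(row, value)` byte-pair form, the `Reading` type and both function names are assumptions for illustration, not the client's real API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")
RawKeyValue = Tuple[bytes, bytes]  # assumed raw form: (row, value)


@dataclass(frozen=True)
class Reading:
    """Hypothetical typed view of one key/value pair."""
    sensor: str
    celsius: float


def typed_stream(
    raw: Iterable[RawKeyValue],
    translate: Callable[[bytes, bytes], T],
) -> Iterator[T]:
    # Translation happens inline, one pair at a time, so no
    # intermediate list of raw pairs is ever built.
    for row, value in raw:
        yield translate(row, value)


def to_reading(row: bytes, value: bytes) -> Reading:
    return Reading(sensor=row.decode("utf-8"), celsius=float(value.decode("ascii")))


if __name__ == "__main__":
    scan = [(b"sensor-1", b"21.5"), (b"sensor-2", b"19.0")]
    for reading in typed_stream(scan, to_reading):
        print(reading)
```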
Interactions at a glance
Abstractions -- Interconnect Layer
• Interconnect defines templates for all operations
• Templates (template <typename T>) are resolved at compile time
• Some run time polymorphism
• Some compile time
• Enables connecting to different services/systems more seamlessly
• Improves code re-use between other key value stores
Connection Heuristics
• Interconnects can define heuristic functions
• F(x) will define how scan ranges can be interrogated
• Intended to be combined across multiple clients
• Clients can and should interact
• Goals
• Avoid TServer overhead
• Hedged Reads
Is this any better?
• Avoiding proxy beneficial
• Look at stats soon
• Not necessarily better than Java Client
• Most operations are I/O bound at TServer
• Let's take a closer look
Proxy
• Additional RPC calls
• Additional GC
• Could be load balanced
• Doesn't facilitate language elitism...
Python by example
• Python code uses sharkbite
• Cost incurred for ctypes to call dynamically linked symbols
• RPC costs are same as those in sharkbite client and java client
• Thrift RPC
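To get a feel for the ctypes cost mentioned above, one can time a call into a dynamically linked C symbol against a Python built-in. A rough sketch that uses libc's `strlen` as a stand-in for a client library symbol; it assumes a POSIX system, where `ctypes.CDLL(None)` exposes the symbols already loaded into the process.

```python
import ctypes
import timeit

# POSIX-only assumption: CDLL(None) gives access to libc symbols.
libc = ctypes.CDLL(None)
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def per_call_seconds(fn, arg, calls=20_000):
    """Average wall-clock time of a single fn(arg) call."""
    return timeit.timeit(lambda: fn(arg), number=calls) / calls


if __name__ == "__main__":
    payload = b"accumulo"
    builtin = per_call_seconds(len, payload)
    foreign = per_call_seconds(libc.strlen, payload)
    print(f"len():          {builtin * 1e9:.0f} ns per call")
    print(f"ctypes strlen:  {foreign * 1e9:.0f} ns per call")
```

The absolute numbers depend on the machine and Python build; the point is only that each crossing into native code carries a fixed per-call cost on top of the RPC itself.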
C client
• Writing C is not actually native
• Conversion from C structs to C++ objects
• Not user friendly
• Is C?
• Impacts Python too
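The conversion cost shows up on the Python side as well: each record handed back through a C struct has to be copied into a Python object. A small sketch with ctypes; the `CKeyValue` layout and the `KeyValue` type are hypothetical and do not reflect Sharkbite's actual structs.

```python
import ctypes
from dataclasses import dataclass


class CKeyValue(ctypes.Structure):
    """Hypothetical C-side record layout, for illustration only."""
    _fields_ = [
        ("row", ctypes.c_char_p),
        ("value", ctypes.c_char_p),
        ("timestamp", ctypes.c_uint64),
    ]


@dataclass(frozen=True)
class KeyValue:
    row: str
    value: bytes
    timestamp: int


def from_c(record: CKeyValue) -> KeyValue:
    # Each field access copies data out of the struct into a new
    # Python object; this per-record work is the conversion cost.
    return KeyValue(
        row=record.row.decode("utf-8"),
        value=record.value,
        timestamp=record.timestamp,
    )


if __name__ == "__main__":
    print(from_c(CKeyValue(b"row1", b"v1", 42)))
```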
Sharkbite Python
Proxy
• Creates a connection to a proxy server
• Communications through proxy incur additional RPC
• RPC communications are native (only need proxy)
• No cost to make those RPC calls
Thrift Proxy
GoLang
• Better integration [5]
• Less overhead
• Improved compile optimizations
• Variadic functions not supported
Integrated Python vs Proxy
• R710 – 144 GB RAM with 24 cores
• Heavily loaded servers running multiple services (cameras, transcoding, etc.)
• 3 servers – 2 tablet servers – 32 GB per VM – no special configuration
• Write 10 million keys on fresh tables.
• Different rows
• Not concerned with costs associated with splits
• 100 runs of each write session from monitor JSON interface
• Computed average
Integrated Python vs Proxy
Client            Write Throughput  Max Memory Footprint
Sharkbite Client  122,272 k/s       29 kb
Java Client       100,183 k/s       320 MB
Sharkbite Python  37,239 k/s        45 kb
Proxy Python      14,293 k/s        47 kb (client), 1.2 GB (proxy)
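The arithmetic behind the comparison is just a mean per client and a ratio between clients. A small sketch using the averages from the table above; it assumes "k/s" means keys per second, and the helper names are made up for illustration. The caveats on the next slide apply to any ratio computed this way.

```python
from statistics import mean

# Averages as reported in the table (assumed unit: keys per second).
reported = {
    "Sharkbite Client": 122_272,
    "Java Client": 100_183,
    "Sharkbite Python": 37_239,
    "Proxy Python": 14_293,
}


def average_throughput(runs):
    """Mean of per-run throughput samples, e.g. the 100 runs per client."""
    return mean(runs)


def relative(client, baseline, table=reported):
    """How many times the throughput of `baseline` the `client` reached."""
    return table[client] / table[baseline]


if __name__ == "__main__":
    print(f"Sharkbite Python vs Proxy Python: "
          f"{relative('Sharkbite Python', 'Proxy Python'):.2f}x")
    print(f"Sharkbite Client vs Java Client: "
          f"{relative('Sharkbite Client', 'Java Client'):.2f}x")
```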
Performance Analysis
• DO. NOT. READ. INTO. THIS.
• Do not read into this
• GC was not really the limiting factor (1 GB heap on Java client)
• Not enough keys created to stress
• Memory footprint of Sharkbite client kept small due to small flush setting
• Sharkbite python costs spent converting objects
• *I think RPC costs impacted proxy python client.
A word of caution
• Perception of better client
• Tserver configurations have bigger impact
• Tablet Server tuning
Conclusion
• Sharkbite C++ interconnect
• Not ready for primetime
• May better facilitate languages
[6]
