Organizations looking to use a NoSQL data store based on Bigtable face a challenge when deciding between alternatives. Often superficial differences are overblown or, worse, subtle differences aren't discovered until it's too late. In this talk we compare and contrast Apache Accumulo with Apache Cassandra and Apache HBase, diving deep into design differences and subtleties that may hinder a project only once it reaches a certain scale of usage or stored data.
– Speaker –
Aaron Cordova
Co-founder and CTO, Koverse
Aaron has built multiple large-scale big data systems that are used by the intelligence, defense, finance and healthcare industries. Aaron co-founded Koverse Inc. Prior to that, he spent five years as a researcher for the National Security Agency (NSA), where he developed and deployed into operations dozens of advanced analytical techniques. He is the founder of Apache Accumulo, a scalable and secure data store on top of Apache Hadoop, and the author of the recently released O’Reilly book, Accumulo: Application Development, Table Design, and Best Practices.
— More Information —
For more information see http://www.accumulosummit.com/
13. Some Gotchas
• Usually requires tuning beyond what distros provide
• Not balancing clients and tablet servers
• Having many small tables vs few large tables
• Not as many free resources online like blog posts,
tutorials, forums, etc.
• Larger individual servers mean that server failure can
result in a large amount of data needing to be
replicated. Accumulo only needs to process recent
write-ahead log entries, however, before everything is
back online.
16. Recent Improvements in 1.7
• Client Authentication with Kerberos
• Data-Center Replication
• User-Initiated Compaction Strategies
• API Clarification
• Faster Startup via Configurable Threadpool Size for Assignments
• Group-Commit Threshold as a Factor of Data Size
• Balancing Groups of Tablets
• User-specified Durability
• Hadoop Metrics2 Support
• Distributed Tracing with HTrace
• Per-Table Volume Chooser
• Table and namespace custom properties
32. Security
• To use cell level security
• Ensure HBase is configured to use version 3 HFile storage
• VisibilityController must be added to the list of
coprocessors
• Set up the Hadoop group mapping mechanism
• By default, visibility labels are lost on replication
• ! (not) is included as an operator, which makes it more
important to ensure that clients can’t drop user
authorization tokens, since dropping one can elevate privilege
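The last bullet can be illustrated with a minimal sketch. This is not HBase code: the evaluator, the function names, the label names and the operator precedence (! binds tighter than &, which binds tighter than |) are all assumptions made for illustration. It only shows why a negated label lets a client see more by presenting fewer authorizations.

```python
import re

# Hypothetical tokenizer for expressions such as "secret & !probation".
TOKEN = re.compile(r"\s*([A-Za-z0-9_]+|[&|!()])")
OPERATORS = {"&", "|", "!", ")"}


def tokenize(expr):
    expr = expr.strip()
    tokens, pos = [], 0
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_visible(expr, auths):
    """Return True if a cell labeled `expr` is visible to a reader
    presenting the set of authorizations `auths`."""
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def parse_or():
        value = parse_and()
        while peek() == "|":
            take()
            rhs = parse_and()
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() == "&":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():
        if peek() == "!":
            take()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        tok = take()
        if tok == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("missing closing parenthesis")
            return value
        if tok is None or tok in OPERATORS:
            raise ValueError("expected a label")
        # A bare label is satisfied only if the reader holds it.
        return tok in auths

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("trailing tokens in expression")
    return result


if __name__ == "__main__":
    label = "secret & !probation"
    # A user holding both labels is denied...
    print(is_visible(label, {"secret", "probation"}))
    # ...but the same user is admitted if the client omits "probation".
    print(is_visible(label, {"secret"}))
```

Without a negation operator, dropping an authorization can only reduce what a reader sees. With it, the set of authorizations has to be enforced somewhere the client cannot tamper with.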
38. Architecture
• Tries to combine parts of Bigtable and Amazon’s
Dynamo
• Designed to span data centers, allows users to
choose between CP and AP
• Every node is the same, no masters, no ZooKeeper,
storage is coupled with service
• Each server still uses a memtable, sstable files on
disk, compaction, sorting, etc.
• Uses the ‘gossip’ peer-to-peer protocol
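The CP versus AP choice above is usually made per request through tunable consistency levels. The slide does not spell out the rule, so the following is a minimal sketch of the standard replica-overlap argument, with hypothetical function names: a read is guaranteed to reach at least one replica that acknowledged the latest write when the read and write acknowledgement counts together exceed the replication factor. It ignores repair mechanisms and failure handling.

```python
def quorum(replication_factor):
    """Smallest majority of replicas for a given replication factor."""
    return replication_factor // 2 + 1


def read_sees_latest_write(replication_factor, write_acks, read_acks):
    """True when every read set must overlap every write set,
    i.e. write_acks + read_acks > replication_factor."""
    return write_acks + read_acks > replication_factor


if __name__ == "__main__":
    rf = 3
    q = quorum(rf)
    # Quorum writes plus quorum reads: leans toward consistency (CP).
    print(q, read_sees_latest_write(rf, q, q))
    # One ack each way: leans toward availability (AP), stale reads possible.
    print(read_sees_latest_write(rf, 1, 1))
```

Requiring more acknowledgements trades availability for consistency: with a replication factor of 3, quorum operations tolerate one replica being down, while requiring all three replicas tolerates none.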
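59. Relative Usage by Declaration
[Bar chart comparing Cassandra and HBase across the categories ad, built on, cloud, enterprise, gaming, marketing, mobile, social and web; axis scale 0 to 18. Bar values not recoverable from the transcript.]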
https://en.wikipedia.org/wiki/Apache_Cassandra
https://hbase.apache.org/poweredbyhbase.html