Keynote: The Future of Apache HBase

Moderated by Lars Hofhansl (Salesforce), with Matteo Bertozzi (Cloudera), John Leach (Splice Machine), Maxim Lukiyanov (Microsoft), Matt Mullins (Facebook), and Carter Page (Google)

The future of HBase, via a variety of viewpoints.

Published in: Software

  1. The Future of HBase. Lars Hofhansl: Principal Architect & VP, Salesforce; HBase PMC & Committer; Phoenix PMC & Committer; Apache Member
  2. Who’s driving it?
  3. It’s us.* We define our future. (* People in this room, developers, contributors, commenters on the mailing list, committers, PMC members, etc.)
  4. Kafka: Confluent; Spark: Databricks; Cassandra: DataStax; HBase: ????
  5. Kafka: Confluent; Spark: Databricks; Cassandra: DataStax; HBase: Cloudera. Hortonworks.
  6. Kafka: Confluent; Spark: Databricks; Cassandra: DataStax; HBase: Adobe, Alibaba, Apple, Cask, Cloudera, Facebook, Google, Hortonworks, Huawei, HubSpot, IBM, Intel, NGDATA, Salesforce, The Gap, Twitter, Xiaomi, Yahoo!, etc.
  7. Cloud?
  8. Carter Page, Google: Cloud?
  9. What does Cloud mean for HBase’s future?
  10. More problems? More work to do? Rather...
  11. Free stuff?
  12. Free stuff engineering.
  13. HBase on GCP: HBase on Dataproc; HBase on Cloud Bigtable
  14. HBase on GCP: HBase on Dataproc; HBase on Cloud Bigtable. HBase Client (>= v1.0)
  15. Why is HBase the client for Cloud Bigtable?
  16. Why HBase? #1: Open source is the de facto way that standards are defined now. Committers, not committees.
  17. Why HBase? #2: HBase is indisputably the best open source implementation of the Bigtable architecture. Bigtable; HBase
  18. Why HBase? #3: Because supporting an ecosystem is the right thing. Technology needs a rich community to flourish.
  19. Supporting how?
  20. Rich abstractions on top of HBase. Future big data customers need fully formed solutions: a great graph database, a great IoT solution, a great geo solution, and so on... Open source. Each with the scale of HBase. And we want to help, with engineering time and code.
  21. But there’s already a great _______ HBase solution that could use some love!
  22. Please email me with ideas. (Really.) carterp (at) google.com. Have a great open source HBase integration that could use some Google engineering help?
  23. Maxim Lukiyanov, Microsoft: Cloud?
  24. CPU utilization: typical picture in pure key/value stores
  25. Unutilized CPU! Run something else on it (analytics on HBase, anybody?). Give it back to the cloud.
  26. HBase file system abstraction
  27. HBase file system abstraction
  28. HBase in the cloud
  29. HBase in the cloud
  30. OLAP?
  31. OLTP?
  32. Database?
  33. “What is your biggest mistake as an engineer? Not putting distributed transactions in BigTable. If you wanted to update more than one row you had to roll your own transaction protocol. It wasn’t put in because it would have complicated the system design. In retrospect lots of teams wanted that capability and built their own with different degrees of success. We should have implemented transactions in the core system. It would have been useful internally as well. Spanner fixed this problem by adding transactions.” - Jeff Dean, March 7th, 2016
  34. John Leach, Founder & CTO. Call for Founders!!! Be part of bringing Splice Machine to Open Source. Splice Machine, the first dual-engine RDBMS on HBase and Spark, is headed to open source, and we are looking for some key individuals to be founders to support the transition.
  35. Multi-Tenant Mixed Workloads
  36. Current Storage Challenges: Lack of Transactions (see Dean quote); Single Write-Optimized Store: Log-Structured Merge Tree; Limited Metadata Facilities. Current Execution Challenges: OLTP: Limited/Rigid Concurrency Model; OLAP: Foggy Execution Model; Remote Client Scans (Slow); Internal Scans via Coprocessor (In JVM); Custom Rolled Data Flow Engine (Yikes). Maintenance Operations: Do not talk about Fight Club (Compactions).
  37. Future Storage Approach (Code Named: Janus). Typed Storage System; JSON first class citizen; Serde based on Spark UnsafeRow. Hierarchical, Partition Aware Transactions; Partitions: Within and Across Data Centers. Write Optimized Store (Optional): LSM Tree. Read Optimized Store (Optional): Positional Delta Trees, Columnar. Full Metadata Facilities (https://datasketches.github.io/): Theta Sketch, Quantiles, Frequent Items.
  38. Future Execution Approach (Dual Engine). All Execution Engines: Statistical Hooks (Sketching Algorithms). OLAP Execution Engines: Spark, Flink, MapReduce, Impala, etc.; YARN, Fair Scheduling; Transactional Input/Output Formats; File System based with incremental memstore deltas; Columnar Support; Arrow, Calcite; Perform Compactions (yes, it works). OLTP Execution Engines: Row Based Storage, Remote HBase Scans.
  39. Modern Hardware?
  40. Matt Mullins, Facebook; Lars Hofhansl, Salesforce
  41. Salesforce Single-SKU project. We used to have 30+ different SKUs. Now there is one SKU (almost) for all projects: 1U, 10GbE everywhere, fat-tree networking (no/little oversubscription). Same SKU used by all projects. Very few exceptions: high-storage SKU and high-compute SKU. Vendor: varies. Allows us to order/repurpose in large quantities and then assign to projects. Compromise for individual projects, but cheaper overall. Fat-tree network -> location independence.
  42. HBase 2.0: Matteo Bertozzi, Cloudera
  43. HBase 2.0. We are trying to avoid another singularity (like 0.94 to 0.96). (Almost) rolling-upgradable from 1.x. Wire compatible with 1.x. Possible features: HBASE-11425 - Off-heap for read and write path; HBASE-13773 - Replication off ZooKeeper and ReplicationAdmin with ACLs support; HBASE-14070 - Hybrid-Logical Clocks; HBASE-14123 - Backups.
  44. “What makes HBase… truly Special?”
  45. The (Big) Landscape: Cassandra, CouchDB, DB2, Hana, HBase, Hive, Hypertable, Impala, Kudu, LevelDB, MySQL, Redshift, RocksDB, MongoDB, Oracle, PostgreSQL, Solr, SleepyCat, SQLite, SQL Server, Voldemort
  46. Can you spot HBase?
  47. The (Big) Landscape: Cassandra, CouchDB, DB2, Hana, HBase, Hive, Hypertable, Impala, Kudu, LevelDB, MySQL, Redshift, RocksDB, MongoDB, Oracle, PostgreSQL, Solr, SleepyCat, SQLite, SQL Server, Voldemort
  48. The (Big) Landscape: Cassandra, CouchDB, DB2, Hana, HBase, Hive, Hypertable, Impala, Kudu, LevelDB, MySQL, Redshift, RocksDB, MongoDB, Oracle, PostgreSQL, Solr, SleepyCat, SQLite, SQL Server, Voldemort
  49. (CC BY-SA 2.5)
  50. The HBase Sweet Spot: (1) Scales single clusters to hundreds or thousands of commodity machines. (2) Small Scans (<100m rows) and Gets. (3) Operations are harder, but amortized over large (>15 node) installs. (4) Consistent, open source, on-prem, cloud. (5) A foundational, general purpose, low latency storage engine. There is no system that handles analytical and OLTP workloads well. There is no replacement for HBase in this sweet spot.
  51. Future work: using large RAM effectively; off-heaping everything; in-memory compactions; optimizing lock contention to utilize all cores, on SSDs, 10GbE or more; scaling assignment manager; Spark integration for large scans, OLAP; multi-tenancy; sister projects such as Phoenix, for easy interfacing; easier operations to ease on-boarding.
  52. Time To Party!
