HBaseCon 2015: HBase 2.0 and Beyond Panel

3,998 views

Now that you've seen HBase 1.0, what's ahead in HBase 2.0 and beyond, and why? Find out from this panel of people who have designed and/or are working on 2.0 features.

Published in: Software
  1. HBase 2.0 and Beyond Panel (hbasecon.com)
     Moderator: Jonathan Hsieh
     Panel: Matteo Bertozzi / Sean Busbey / Jingcheng Du / Lars Hofhansl / Enis Soztutar / Jimmy Xiang
  2. Who are we?
     • Matteo Bertozzi – HBase PMC, Cloudera
     • Sean Busbey – HBase PMC, Cloudera
     • Jingcheng Du – Intel
     • Lars Hofhansl – HBase PMC, 0.94.x RM, Salesforce.com
     • Jonathan Hsieh – HBase PMC
     • Enis Soztutar – HBase PMC, 1.0.0 RM, Hortonworks
     • Jimmy Xiang – HBase PMC, Cloudera
  3. Outline
     • Storing Larger Objects efficiently
     • Making DDL Operations fault tolerant
     • Better Region Assignment
     • Compatibility guarantees for our users
     • Improving Availability
     • Using all machine resources
     • Q+A
  5. Why Moderate Object Storage (MOB)?
     • A growing demand for the ability to store moderate objects (MOB) in HBase (100KB up to 10MB).
     • Write amplification created by compactions: write performance degrades as massive numbers of MOBs accumulate in HBase.
     • Too many store files -> Frequent region compactions -> Massive I/O -> Slow compactions -> Flush delay -> High memory usage -> Blocking updates
     [Charts: Data Insertion Average Latency at 125G/500G/1T data volumes, and 1T Data Insertion Average Latency over 8 hours (5MB/record, 32 pre-split regions)]
  6. How MOB I/O works
     Write Path: Client -> HRegionServer -> HLog + memstore (MOB cell) -> Flush -> MOB cell goes to a MOB HFile, a Ref cell goes to the main HFile
     Read Path: Client -> HRegionServer -> memstore / main HFile (Ref cell) -> MOB HFile (MOB cell)
  7. Benefits
     • Move the MOBs out of the main I/O path to make the write amplification more predictable.
     • The same APIs to read and write MOBs.
     • Work with HBase export/copy table, bulk load, replication and snapshot features.
     • Work with HBase security mechanism.
     [Charts: insertion and random-get average latency, MOB Disabled vs. MOB Enabled (5MB/record, 32 pre-split regions); MOB Enabled shows lower latency in all cases]
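The ref-cell indirection behind slides 6 and 7 can be sketched as a toy model: values over a size threshold land in a side store, and the main store keeps only a small reference. The names here (`MobStore`, `THRESHOLD`) are illustrative, not HBase API.

```python
# Toy sketch of the MOB write/read path: large values go to a separate
# "MOB file" store and the main store keeps only a reference cell, so
# main-path compactions never rewrite the large payloads.
THRESHOLD = 100 * 1024  # 100KB, the lower bound for a MOB per the slide

class MobStore:
    def __init__(self):
        self.main = {}   # main HFile stand-in: small cells and ref cells
        self.mob = {}    # MOB HFile stand-in: large cells, rarely compacted

    def put(self, key, value):
        if len(value) > THRESHOLD:
            self.mob[key] = value
            self.main[key] = ("ref", key)   # ref cell points into MOB store
        else:
            self.main[key] = ("val", value)

    def get(self, key):
        kind, payload = self.main[key]
        # Follow the reference out to the MOB store when needed.
        return self.mob[payload] if kind == "ref" else payload
```

The reader sees one `get` API either way, which mirrors the "same APIs to read and write MOBs" bullet above.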
  9. Problem – Multi-Steps ops & Failures
     DDL & other operations consist of multiple steps, e.g. Create Table:
     Handler -> Create regions on FileSystem -> Add regions to META -> Assign -> cpHost.postCreateTableHandler() (ACLs)
     If we crash in between steps, we end up with half-applied state (e.g. regions present on the FileSystem but not in META); hbck MAY be able to repair it.
     If we crash in the middle of a single step (e.g. create N regions on fs), hbck does not have enough information to rebuild a correct state. Manual intervention is required to repair the state.
  10. Solution – Multi-Steps ops & Failures
      Rewrite each operation to use a State-Machine, e.g. Create Table:
      Handler -> Create regions on FileSystem -> Add regions to META -> Assign -> cpHost.postCreateTableHandler() (ACLs)
      ...each executed step is written to a store. If the machine goes down, we know what was pending and what should be rolled back, or how to continue to complete the operation.
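The persisted state machine on this slide can be sketched in a few lines, assuming an in-memory list in place of the real procedure store. The step names follow the Create Table example; everything else is made up for illustration.

```python
# Minimal sketch of a crash-recoverable multi-step operation: each
# completed step is persisted, so a re-run after a crash resumes from
# the last persisted step instead of redoing (or half-redoing) work.
STEPS = ["create_fs_regions", "add_to_meta", "assign", "post_create_hook"]

def run_procedure(store, do_step, crash_after=None):
    """Execute STEPS in order, appending each completed step to `store`.
    `crash_after` simulates a crash after the step with that index."""
    done = set(store)
    for i, step in enumerate(STEPS):
        if step in done:
            continue                 # already completed before the crash
        do_step(step)                # perform the step's side effect
        store.append(step)           # persist progress *after* the step
        if crash_after == i:
            raise RuntimeError("simulated crash")
```

On recovery the coordinator replays the store, skips completed steps, and continues (or, symmetrically, walks the store backwards to roll back), which is exactly what the slide means by knowing "what was pending".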
  11. Procedure-v2/Notification-Bus
      • The Procedure v2/NotificationBus aims to provide a unified way to build:
        • Synchronous calls, with the ability to see the state/result in case of failure.
        • Multi-step procedures with a rollback/roll-forward ability in case of failure (e.g. create/delete table)
        • Notifications across multiple machines (e.g. ACLs/Labels/Quota cache updates)
        • Coordination of long-running/heavy procedures (e.g. compactions, splits, …)
        • Procedures across multiple machines (e.g. Snapshots, Assignment)
        • Replication for Master operations (e.g. grant/revoke)
  12. Procedure-v2/Notification-Bus - Roadmap
      • Apache HBase 1.1
        • Fault tolerant Master Operations (e.g. create/delete/…)
        • Sync Client (We are still wire compatible, both ways)
      • Apache HBase 1.2
        • Master WebUI
        • Notification Bus, and at least Snapshot using it.
      • Apache HBase 1.3+ or 2.0 (depending on how hard it is to keep Master/RS compatibility)
        • Replace Cache Updates, Assignment Manager, Distributed Log Replay, …
        • New Features: Coordinated compactions, Master ops Replication (e.g. grant/revoke)
  14. ZK-based Region Assignment
      • Region states could be inconsistent
        • Assignment info stored in both meta table and ZooKeeper
        • Both Master and RegionServer can update them
      • Limited scalability and operations efficiency
        • ZooKeeper events used for coordination
  15. ZK-less Region Assignment
      • RPC based
      • Master, the true coordinator
        • Only Master can update the meta table
        • All state changes are persisted
        • Follow the state machine
      • RegionServer does what it is told by Master
        • Reports status to Master
        • Each step needs acknowledgement from Master
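"Follow the state machine" can be made concrete with a small transition table. The states below are a simplified subset of HBase's region states; the transition table here is illustrative, not the complete real one.

```python
# Sketch of master-driven region state transitions: the coordinator
# only accepts legal transitions and journals every accepted change,
# matching the "all state changes are persisted" bullet above.
TRANSITIONS = {
    "OFFLINE": {"OPENING"},
    "OPENING": {"OPEN", "FAILED_OPEN"},
    "OPEN": {"CLOSING"},
    "CLOSING": {"CLOSED"},
    "CLOSED": {"OPENING", "OFFLINE"},
    "FAILED_OPEN": {"OPENING"},
}

class RegionStateMachine:
    def __init__(self):
        self.state = "OFFLINE"
        self.journal = []  # persisted history of accepted transitions

    def transition(self, new_state):
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal {self.state} -> {new_state}")
        self.state = new_state
        self.journal.append(new_state)
```

Because only one party (the Master) applies transitions and each one is journaled, the inconsistency problem on slide 14 (two writers, two stores) cannot arise.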
  16. Current Status
      • Off by default in 1.0
      • Impact
        • Master is in the critical path
          • Meta should be co-located with Master
          • Procedure V2 could solve it (future work)
        • Deployment topology change
          • Master is a RegionServer, serves small system tables
      • Blog post has more info
        • https://blogs.apache.org/hbase/entry/hbase_zk_less_region_assignment
  18. HBase Semantic Versioning
      The Return to Sanity
  19. Client Version? Server Version? Hadoop Version? Binary Compatibility? HFile Version? Protobufs Client/Server Compatibility? ARRGGHHH. Should be SIMPLE!
  20. Semantic Versioning Makes Things Simple
  21. HBase <Major>.<Minor>.<Patch>
  22. MAJOR version: when you make incompatible API changes
  23. MINOR version: when you add backwards-compatible functionality
  24. PATCH version: when you make backwards-compatible bug fixes
  25. We are adopting this starting with HBase 1.0
  26. Compatibility Dimensions (the long version)
      • Client-Server wire protocol compatibility
      • Server-Server protocol compatibility
      • File format compatibility
      • Client API compatibility
      • Client Binary compatibility
      • Server-Side Limited API compatibility (taken from Hadoop)
      • Dependency Compatibility
      • Operational Compatibility
  27. TL;DR:
      • A patch upgrade is a drop-in replacement
      • A minor upgrade requires no application or client code modification
      • A major upgrade allows us - the HBase community - to make breaking changes.
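The TL;DR above is just generic semantic versioning, so it can be stated as code. This is a sketch of the semver rule, not an HBase API; version strings are assumed to be plain MAJOR.MINOR.PATCH.

```python
# Classify an upgrade between two semantic versions per the slide:
# patch = drop-in, minor = no client changes, major = breaking allowed.
def classify_upgrade(old, new):
    o_major, o_minor, _ = (int(x) for x in old.split("."))
    n_major, n_minor, _ = (int(x) for x in new.split("."))
    if n_major != o_major:
        return "major"   # incompatible API changes allowed
    if n_minor != o_minor:
        return "minor"   # backwards-compatible functionality added
    return "patch"       # backwards-compatible bug fixes only
```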
  28. Simple
  29. Thanks
      http://semver.org/
      http://hbase.apache.org/book.html#hbase.versioning
  31. Improving read availability
      • HBase is CP
      • When a node goes down, some regions are unavailable until recovery
      • Some class of applications want high availability (for reads)
      • Region replicas
      • TIMELINE consistency reads
  32. Phase contents
      • Phase 1
        • Region replicas
        • Stale data up to minutes (15 min)
        • In 1.0
      • Phase 2
        • Millisecond latencies for staleness (WAL replication)
        • Replicas for the meta table
        • Region splits and merges with region replicas
        • Scan support
        • In 1.1
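A TIMELINE-consistency read can be sketched as "primary first, possibly stale secondary as fallback". The `Replica` class and its fields are made up for illustration; the real client races RPCs against a timeout rather than checking liveness directly.

```python
# Sketch of a TIMELINE read: prefer the primary replica; if it is down,
# answer from a secondary and flag the result as possibly stale.
class Replica:
    def __init__(self, data, is_primary, alive=True):
        self.data, self.is_primary, self.alive = data, is_primary, alive

def timeline_get(replicas, key):
    """Return (value, stale): stale=True means a secondary answered."""
    primary = next(r for r in replicas if r.is_primary)
    if primary.alive:
        return primary.data[key], False
    for r in replicas:
        if not r.is_primary and r.alive:
            return r.data.get(key), True   # may lag the primary
    raise RuntimeError("no replica available")
```

The stale flag is the contract: reads stay available through a primary failure, but the application is told when the answer may be behind (by minutes in Phase 1, milliseconds in Phase 2).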
  33. [Diagram: RegionServer 1 hosts Region1, Region2, Region3; appends go to the WAL; Flush/Compaction writes hfiles to HDFS; ReplicaReplication tails the WAL]
  34. [Diagram: ReplicaReplication tails RegionServer 1's WAL and replays edits to Region2 (replica) on RegionServer 15 and Region1 (replica) on RegionServer 20; replicas also read flush files from HDFS]
  35. Pluggable WAL Replication
      • Pluggable WAL replication endpoint
      • You can write your own replicators!
      • Similar to co-processors (runs in the same RS process)

      hbase> add_peer 'my_peer',
        ENDPOINT_CLASSNAME => 'org.hbase.MyReplicationEndpoint',
        DATA => { "key1" => 1 },
        CONFIG => { "config1" => "value1", "config2" => "value2" }
  37. Workload Throughput
      Distributed work will eventually be limited by one of:
      • CPU
      • Disk IO
      • Network IO
  38. HBase Under (synthetic) Load Now: Not CPU Bound
  39. HBase Under (synthetic) Load Now: Not Disk Bound
  40. HBase Under (synthetic) Load Now: Not Network Bound
  41. Modest Gain: Multiple WALs
      • All regions write to one Write ahead log file (WAL).
      • Idea: Let's have multiple write ahead logs so that we can write more in parallel.
      • Follow-up work:
        • Taken to the limit, if we were on SSD we could have one WAL per region.
      [Diagram: a RegionServer writing one WAL leaves DataNode disks idle; multiple WALs use more disks in parallel]
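The multiple-WALs idea amounts to partitioning regions across log writers. A minimal sketch, assuming a simple stable hash (HBase's real region-to-WAL grouping strategies are pluggable, and `N_WALS` here is an arbitrary choice):

```python
# Sketch of multi-WAL partitioning: hash each region to one of N WAL
# writers so appends from different regions can hit different disks in
# parallel, instead of funneling through a single log file.
N_WALS = 4

def wal_for_region(region_name, n_wals=N_WALS):
    # Stable hash: a region always appends to the same WAL, which keeps
    # per-region edit ordering intact.
    return sum(region_name.encode()) % n_wals

def group_regions(regions, n_wals=N_WALS):
    groups = {i: [] for i in range(n_wals)}
    for r in regions:
        groups[wal_for_region(r, n_wals)].append(r)
    return groups
```

Stability of the mapping matters: recovery must be able to find every edit for a region in exactly one WAL's sequence.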
  42. Future Solutions
      • Alternative WAL providers
      • Read path optimizations based on profiling
      • Better tuning
  44. Thanks!