Apache HBase: State of the Union
Enis Söztutar
enis@apache.org

  1. Apache HBase: State of the Union
     Enis Söztutar
     enis@apache.org
  2. About Me
     Enis Söztutar
       • enis@apache.org
       • Committer and PMC member in Apache HBase, Phoenix, and Hadoop
       • HBase/Phoenix dev @Hortonworks
  3. Outline
     Versions, compatibility
     Releases, what is in HBase-{1.1, 1.2, 1.3}
     New Developments
     HBase-2.0
  4. Versions, Compatibility
  5. Semantic Versioning
     Starting with the 1.0 release, HBase works toward Semantic Versioning
     MAJOR.MINOR.PATCH[-identifiers]
     PATCH: only backward-compatible (BC) bug fixes
     MINOR: BC new features
     MAJOR: incompatible changes
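The MAJOR.MINOR.PATCH[-identifiers] scheme on the slide above can be illustrated with a minimal sketch. The names `Version`, `parse_version`, and `upgrade_kind` are hypothetical helpers written for this note, not part of HBase, and the parser only handles the simple form shown on the slide.

```python
import re
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int
    identifiers: str = ""


# Matches MAJOR.MINOR.PATCH with an optional "-identifiers" suffix.
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-(.+))?$")


def parse_version(text: str) -> Version:
    """Parse a version string such as '1.2.1' or '2.0.0-SNAPSHOT'."""
    m = _VERSION_RE.match(text)
    if m is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {text!r}")
    major, minor, patch, identifiers = m.groups()
    return Version(int(major), int(minor), int(patch), identifiers or "")


def upgrade_kind(old: str, new: str) -> str:
    """Classify a version change by the most significant component that differs."""
    a, b = parse_version(old), parse_version(new)
    if a.major != b.major:
        return "major"  # incompatible changes allowed
    if a.minor != b.minor:
        return "minor"  # backward-compatible new features
    if a.patch != b.patch:
        return "patch"  # backward-compatible bug fixes only
    return "none"
```

For example, `upgrade_kind("1.1.0", "1.1.5")` classifies the change as a patch upgrade, while `upgrade_kind("1.4.0-SNAPSHOT", "2.0.0-SNAPSHOT")` classifies it as a major one.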
  6. SemVer in Action
     1.0 released last year. Started following semantic versioning
     10 releases with 1.x.y versions. More coming!
     Release notes contain “compatibility” report for source / binary
     Patch upgrades do not have new features. Drop-in replacement.
     Minor versions are “compatible”
  7. To be, or not to be (Compatible)
  8. To be, or not to be (Compatible)
     Compatibility is NOT a simple yes or no
     Many dimensions
       • source, binary, wire, command line, dependencies, etc.
     What is the client interface?
       • InterfaceAudience.{Public,Private,LimitedPrivate}
     Read https://hbase.apache.org/book.html#upgrading
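  9. | Dimension                             | Major | Minor | Patch |
     |---------------------------------------|-------|-------|-------|
     | Client-Server Wire Compatibility      | ✗     | ✓     | ✓     |
     | Server-Server Compatibility           | ✗     | ✓     | ✓     |
     | File Format Compatibility             | ✗*    | ✓     | ✓     |
     | Client API Compatibility              | ✗     | ✓     | ✓     |
     | Client Binary Compatibility           | ✗     | ✗     | ✓     |
     | Server Side Limited API Compatibility | ✗     | ✗*/✓* | ✓     |
     | Dependency Compatibility              | ✗     | ✓     | ✓     |
     | Operation Compatibility               | ✗     | ✗     | ✓     |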
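The compatibility matrix above can be treated as a lookup table when planning an upgrade. The following is a rough sketch only: the cell values are copied from the slide, the dictionary keys and the function names (`cell`, `unconditional_breaks`, `needs_footnote`) are hypothetical, and the starred cells are reported separately because the slide does not carry the footnotes that explain them.

```python
# Rows copied from the slide; tuple order is (major, minor, patch).
COLUMNS = ("major", "minor", "patch")
MATRIX = {
    "client-server wire": ("✗", "✓", "✓"),
    "server-server": ("✗", "✓", "✓"),
    "file format": ("✗*", "✓", "✓"),
    "client api": ("✗", "✓", "✓"),
    "client binary": ("✗", "✗", "✓"),
    "server-side limited api": ("✗", "✗*/✓*", "✓"),
    "dependency": ("✗", "✓", "✓"),
    "operation": ("✗", "✗", "✓"),
}


def cell(dimension: str, kind: str) -> str:
    """Return the raw matrix cell for a dimension and upgrade kind."""
    return MATRIX[dimension][COLUMNS.index(kind)]


def unconditional_breaks(kind: str) -> list:
    """Dimensions marked with a plain ✗ for this kind of upgrade."""
    return [d for d in MATRIX if cell(d, kind) == "✗"]


def needs_footnote(kind: str) -> list:
    """Dimensions whose cell carries an asterisk; check the upgrade guide."""
    return [d for d in MATRIX if "*" in cell(d, kind)]
```

Under this encoding, `unconditional_breaks("minor")` lists the client binary and operation dimensions, and `unconditional_breaks("patch")` is empty, which matches the "drop-in replacement" claim for patch releases.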
  10. Releases
  11. 2015 H2 – 2016 H1 (repo and releases)
      [Branch diagram; releases shown per branch]
      (master) 2.0.0-SNAPSHOT
      (branch-1) 1.4.0-SNAPSHOT
      (branch-1.3) 1.3.0 RC
      (branch-1.2) 1.2.0, 1.2.1, 1.2.2 RC
      (branch-1.1) 1.1.0 … 1.1.5
      (branch-1.0) 1.0.0 … 1.0.3
      (0.98) … 0.98.19, 0.98.20
  12. RTFM – HBase-1.1 Release Notes
      • Async RPC client
      • Simple RPC throttling
      • Improved compaction controls
      • Scan improvements
      • Procedure V2 for improved reliability of cluster operations (HBASE-12439)
      • New extension interfaces for coprocessor users
      • Per-column family flush
      • WAL on SSD
      • BlockCache in Memcached
      • Region replica enhancements around META, WAL, and bulk loading
  13. RTFM – HBase-1.2 Release Notes
      • JDK8 is now supported
      • Hadoop 2.6.1+ and Hadoop 2.7.1+ are now supported
      • Per column-family time ranges for scan
      • Daemons respond to SIGHUP to reload configs
      • Region location methods added to thrift2 proxy
      • Table-level sync that sends deltas
      • Client side metrics via JMX
  14. RTFM – HBase-1.3 Release Notes
      • Date-based tiered compactions
      • Maven archetypes for HBase client applications
      • Throughput controller for flushes
      • Controlled delay (CoDel) based RPC scheduler (HBASE-15136)
      • Bulk loaded HFile replication
      • More improvements to Procedure V2
      • Improvements to Multi WAL
      • Many improvements and optimizations in metrics subsystem
      • Reduced memory allocation in RPC layer
      • Region location lookups optimizations in HBase client
  15. Releases – How to choose
      0.98 is still released frequently, likely will continue till end of 2016
      1.0 is EOL’ed. Move to 1.1 at least
      Both 1.1 and 1.2 are pretty stable
      Starting from scratch, use 1.2 or 1.3
      1.3 is coming shortly
      Moving between minor versions is easy for 1.x
  16. New Developments
  17. New Compaction Policies for Time series
      FIFO: First In, First Out
        • No Compaction!
        • Only data with very short TTL
      Date Tiered Compaction
        • Dramatic reduction in IO!
        • Partition hfiles and compaction by time windows
        • Scans with time ranges filter out whole files
  18. Date Tiered Compaction
      From https://labs.spotify.com/2014/12/18/date-tiered-compaction/
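The two ideas behind date tiered compaction on the slides above, partitioning files by time window and letting time-range scans skip whole files, can be sketched as follows. This is a simplified illustration and not HBase's actual policy: it assumes fixed-size windows (the real policy uses tiers of windows), a hypothetical `StoreFile` record holding only a timestamp range, and a half-open `[start_ts, end_ts)` scan range.

```python
from collections import defaultdict
from typing import NamedTuple


class StoreFile(NamedTuple):
    """Hypothetical stand-in for an hfile: a name plus its timestamp range."""
    name: str
    min_ts: int
    max_ts: int


def group_by_window(files, window_ms):
    """Bucket files by the fixed-size time window containing their newest cell.

    Files in the same bucket are candidates to be compacted together, so data
    from different time windows is never mixed into one file.
    """
    groups = defaultdict(list)
    for f in files:
        groups[f.max_ts // window_ms].append(f)
    return dict(groups)


def files_for_scan(files, start_ts, end_ts):
    """Keep only files whose timestamp range overlaps [start_ts, end_ts)."""
    return [f for f in files if f.max_ts >= start_ts and f.min_ts < end_ts]
```

With files covering timestamps 0–999, 1000–1999, 1500–1900, and 2000–2500 and a window of 1000, the middle two files land in the same window, and a scan over `[1000, 2000)` reads only those two.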
  19. Spark Integration
      • RDD
      • DataFrame / DataSet / SparkSQL
      • Partition pruning
      • Column pruning
      • Data locality
      • Predicate pushdown
  20. Spark Integration
  21. Perf
      Async
        • Async RPC client already in
        • Async Client
        • Async WAL Writer
      Row locks, Read / Write
      Write path re-ordered
  22. New Development – In Progress
      RPC Scheduling improvements
      Replication 2.0
      Reduce Garbage
      C++ Client
      Backup / Restore
  23. New Development – In Progress
      Offheaping
      Read path (done)
      Write path in development
      In-memory flushes/compactions
      Compact in-memory representations
      Fatter flushes
      Assignment Manager/Master
  24. HBase-2.0
  25. HBase-2.0
      Target is 2016 EOY
      Learnt from singularity (0.94 -> 0.96+)
      2.0 will be rolling upgradable!
        • Disclaimer: to the extent that we can make it
      JDK-8 only
      Will work with Hadoop-3?
      Assignment and data layout changes are the big driver
  26. How to prepare for HBase-2.0
      2.0 contains more API clean up
      Cleanup PB and guava “leaks” into the API
      Some deprecated APIs (HConnection, HTable, HBaseAdmin, etc.) are going away
      Start using JDK-8 (and G1). You will like it.
      1.x client should be able to do read / write / scan against 2.0 clusters
      Some DDL / Admin operations may not work
  27. Other HBase talks
      Today
        • (3:00pm) Omid: A Transactional Framework for HBase
        • (4:10pm) Hive Hbase Metastore - Improving Hive with a Big Data Metadata Storage
        • (5:00pm) Operating and Supporting Apache HBase - Best Practices and Improvements
      Thursday
        • (2:10pm) Managing Hadoop, HBase, and Storm Clusters at Yahoo Scale
        • (3:00pm) Phoenix + HBase: An Enterprise Grade Data-Warehouse Appliance for Interactive Analytics?
        • (4:10pm) The DAP: Where Yarn, HBase, Kafka and Spark go to Production
        • (5:00pm) HBase BoF
  28. Questions
      Thanks for listening *.
      *Here is a picture of a cat for your suffering!
