0.96.0
Bay Area Hadoop User Group, October 16th, 2013
Michael Stack <stack@apache.org>
October 2013 HUG: HBase 0.96

October 2013 HUG: HBase 0.96


The next major version of Apache HBase, 0.96, has several new features. It is called the "Singularity" because you will have to stop and restart your cluster to upgrade to 0.96. 0.96 requires at least Apache Hadoop 1.0.0 and is also supported on Hadoop 2.0.0. 0.96 uses protobufs throughout: all of its serializations to ZooKeeper, to the filesystem, and over RPC are protobufs. It runs on JDK7. Metrics have been revamped and converted to use Hadoop Metrics2. It has HBase Snapshots, PrefixTreeCompression, and more. This presentation gives a high-level overview of what's new in HBase 0.96.
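The Snapshots slide in the transcript calls snapshots inexpensive because they are "just metadata". A minimal sketch of that idea, assuming a toy model in which a table is a set of references to immutable files: this is not HBase code, and `ToyTable`, `flush`, `compact`, `restore`, and `clone` are hypothetical names chosen for illustration.

```python
class ToyTable:
    """Toy model: a table is a set of references to immutable files."""

    def __init__(self):
        self.store = {}    # file name -> contents; files are never modified
        self.live = set()  # names of the files the table currently reads

    def flush(self, name, rows):
        """Write a new immutable file and start reading from it."""
        self.store[name] = tuple(rows)
        self.live.add(name)

    def snapshot(self):
        """Record only the current file names; no data is copied."""
        return frozenset(self.live)

    def compact(self, new_name):
        """Merge live files into one; old files stay in the store."""
        merged = sorted(row for n in self.live for row in self.store[n])
        self.store[new_name] = tuple(merged)
        self.live = {new_name}

    def restore(self, snap):
        """Point the table back at the files named by the snapshot."""
        self.live = set(snap)

    def rows(self):
        return sorted(row for n in self.live for row in self.store[n])


def clone(table, snap):
    """New table that shares the snapshot's files instead of copying them."""
    copy = ToyTable()
    copy.store = table.store
    copy.live = set(snap)
    return copy


table = ToyTable()
table.flush("f1", ["a", "b"])
snap = table.snapshot()          # metadata only: the name "f1"
table.flush("f2", ["c"])
table.compact("f3")              # table now reads "f3"; "f1" is retained
cloned = clone(table, snap)
print(cloned.rows(), table.rows())
```

In this model, taking a snapshot costs the same regardless of table size, because only file names are recorded; the price is that files referenced by a snapshot must be retained after compaction.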

Published in: Technology, Education

Transcript of "October 2013 HUG: HBase 0.96"

  1. 0.96.0: Bay Area Hadoop User Group, October 16th, 2013
  2. Michael Stack <stack@apache.org>
     • 0.96.0 Release Manager
     • Chair of Apache HBase PMC*
     • Apache Hadoop PMC
     • Engineer at Cloudera in San Francisco
     * Project Management Committee
  3. HBase?
  4. "...scalable, distributed datastore."
  5. "...open source, distributed, scalable, consistent, low latency, random access non-relational database..."
  6. Inspiration: A Google Technology described in a 2006 paper, by Chang et al.?
  7. ● Apache Top-level Project
       ○ hbase.apache.org
     ● Up out of Apache Hadoop contrib
     ● Project goal: “Billions of rows X millions of columns on clusters of commodity hardware”
     ● HBase persists all data to HDFS
     ● Uses Apache ZooKeeper
       ○ Cluster coordination
  8. When would I use it?
  9. BIG DATA: Random read/writes
  10. SCALING!
  11. Who uses it?
  12. Who runs the project?
  13. Diverse team*: COMMITTERS! Preferably ALIVE!
      * http://hbase.apache.org/team-list.html
  14. • Release every month
      • Each more stable
      • & more performant
      • Some features…
      • Currently at 0.94.12
      • Wire compatible between releases
  15. http://www.flickr.com/photos/sysli/3026288256/sizes/o/in/photostream/
  16. (Self-)Migration
  17. Downstreamers
      ● Minimal API disturbance
        – None?
        – Last-minute feedback
      ● Hive, Sqoop, OpenTSDB
      ● Deprecations
  18. Stats
      ● >2k issues fixed
        – >1500 in 0.96.x only
      ● Currently 6th Release Candidate
      ● Branched 7 months ago
      ● 18 months in the making
  19. Requirements
      ● Hadoop 1.0.3+
      ● Hadoop 2.1.0-beta+
      ● Must choose one
  20. Big Themes
      ● Stability
      ● Operability
        – Insight, tools
      ● Scalability
      ● Evolvability
  21. http://www.flickr.com/photos/allspaw/5815258929/sizes/o/in/photostream/
  22. http://www.flickr.com/photos/allspaw/5815258929/sizes/o/in/photostream/
  23. http://www.flickr.com/photos/38595542@N02/3690830720/sizes/o/in/photostream/
  24. HBase
      • Dedicated meta WAL
      • Don't put WAL replicas on local node
        – 33% of reads have to timeout
      • Lowered ZK timeout
        – 30s instead of 180s
      • Watcher script kills znode
        – Detection time approaches 0
      • Faster assignment
  25. HDFS
      • HDFS-4721 Speed up lease/block recovery when DN fails and a block goes into recovery
        – Do not recover on STALE DNs
      • HDFS-3703 Decrease the datanode failure detection time
        – Avoid reading STALE DNs
      • HDFS-3912 Detecting and avoiding stale datanodes for writing
  26. Coming...
      ● Faster WAL replay/Distributed WAL Replay
        – No intermediate files
        – No wait on NN
      ● Committed
      ● Experimental
      ● Regions online immediately for Writes
        – Read older consistent view
      ● “Favored Nodes”
  27. One rationale for pb: http://goo.gl/N0HO6n
  28. • System tables
      • Filesystem
      • Up in ZooKeeper
      • Over the wire
  29. RPC
      • Implements Protobuf Service
      • Specification!
      • Data on the side
        o Encoding
        o Compression
      (diagram labels: PB, DATA)
  30. Scalability
      • e.g. Replicating 1k to 1k & heading north
      • HBASE-8778 Region assignments scan table directory making them slow for huge tables
      • HBASE-9208 ReplicationLogCleaner slow at large scale
      • HBASE-8877 Reentrant row locks
  31. Snapshots
      • By Table
        o Snapshot, clone, restore, export
      • Inexpensive
        o Just metadata
      • Good for...
        o Backups
        o Replication
        o Offline processing
  32. Integration Tests
      • Cluster test module
      • "Borrows" test types from all over
        o Netflix "ChaosMonkey"
        o Apache Accumulo linked-list dataloss checker
        o Standalone or cluster
        o Sizeable x data x runtime
      hbase-it/src/test/java/org/apache/hadoop/hbase/mapreduce/IntegrationTestBulkLoad.java
      hbase-it/src/test/java/org/apache/hadoop/hbase/mapreduce/IntegrationTestImportTsv.java
      hbase-it/src/test/java/org/apache/hadoop/hbase/mttr/IntegrationTestMTTR.java
      hbase-it/src/test/java/org/apache/hadoop/hbase/test/IntegrationTestBigLinkedList.java
      hbase-it/src/test/java/org/apache/hadoop/hbase/test/IntegrationTestLoadAndVerify.java
  33. StochasticLoadBalancer
      • Region Count
      • Locality
      • Movement Cost
      • Table Count
      • Regions/Table/RegionServer
      • Read/Write Counts
      • Memstore Size
      • Storefile Size
  34. Tracing
      • Review HDFS-5274 Add Tracing to HDFS!
  35. Namespaces
      • Grouping of tables
        – Like database in MySQL
      • System/User
        – hbase:meta
      • Quota
      • Coming
        – Security by ns
        – Grouping on cluster by ns
  36. Metrics2
      ● Radical revamp
      ● Module of Interfaces
        – H1 and H2 Impls modules
      ● Categories/Naming/Patterns
  37. API
      ● Client/Dev
      ● Hadoop Annotations
        – Stable/Evolving/Private
      ● Cell Interface
        – KeyValue deprecated
  38. Miscellaneous
      • X-Row (in-region) Transactions
      • Hardened Assignment
      • Hardened Replication
      • New UI
      • Online Merge
      • Finer grained ACLs
      • More Coprocessor hooks
  39. More Misc.
      • Maven modularized
      • Client-side Types
      • Revamped defaults
      • Compactions
        o Pluggable
        o Smarter triggers
      • Windows!
  40. 0.96.1, 0.96.2, etc.
      ● Bug fixes
      ● Performance fixes
      ● ONLY!
      ● No features!
  41. • Right after 0.96.0
        – Month or two
      • Rolling upgrade from 0.96.0
      • In-line Cell-tags
      • Quota/Groupings
      • Reverse Scan
  42. 1.0.0?
  43. Thank You! stack@apache.org
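
The StochasticLoadBalancer slide lists the factors the balancer weighs. A minimal sketch of the general approach, assuming the factors are combined into one weighted cost and that randomly proposed region moves are kept only when they lower it. Only two of the listed factors (region count and movement cost) are modeled here, and the weights, function names, and server names are hypothetical rather than taken from HBase.

```python
import random

# Hypothetical weights; HBase's actual weights and cost functions differ.
WEIGHTS = {"region_count": 500.0, "move": 100.0}


def region_count_cost(assignment, servers):
    """Skew of regions per server, scaled to the range [0, 1]."""
    total = len(assignment)
    if total == 0:
        return 0.0
    counts = [sum(1 for s in assignment.values() if s == srv) for srv in servers]
    mean = total / len(servers)
    # Worst case: every region sits on a single server.
    worst = (total - mean) + mean * (len(servers) - 1)
    if worst == 0:
        return 0.0
    return sum(abs(c - mean) for c in counts) / worst


def move_cost(assignment, initial):
    """Fraction of regions no longer on the server they started on."""
    if not assignment:
        return 0.0
    moved = sum(1 for region, srv in assignment.items() if initial[region] != srv)
    return moved / len(assignment)


def total_cost(assignment, initial, servers):
    """Weighted sum of the individual cost functions."""
    return (WEIGHTS["region_count"] * region_count_cost(assignment, servers)
            + WEIGHTS["move"] * move_cost(assignment, initial))


def balance(initial, servers, steps=2000, seed=0):
    """Propose random single-region moves; keep those that lower the cost."""
    rng = random.Random(seed)
    current = dict(initial)
    best = total_cost(current, initial, servers)
    regions = sorted(current)
    for _ in range(steps):
        region = rng.choice(regions)
        target = rng.choice(servers)
        previous = current[region]
        if target == previous:
            continue
        current[region] = target
        cost = total_cost(current, initial, servers)
        if cost < best:
            best = cost
        else:
            current[region] = previous  # revert a move that did not help
    return current, best


servers = ["rs1", "rs2", "rs3"]
initial = {f"region-{i}": "rs1" for i in range(12)}  # everything on one server
plan, cost = balance(initial, servers)
print({srv: sum(1 for s in plan.values() if s == srv) for srv in servers}, cost)
```

The movement cost acts as a brake: a move is accepted only when the gain in balance outweighs the penalty for relocating a region, which keeps the plan from shuffling regions needlessly.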