October 2013 HUG: HBase 0.96
 


The next major version of Apache HBase, 0.96, has several new features. It is called "The Singularity" because you will have to stop your cluster and start it again to upgrade to 0.96. 0.96 requires at least Apache Hadoop 1.0.0 and is supported on Hadoop 2.0.0 as well. 0.96 uses protobufs everywhere: all of its serializations to ZooKeeper, to the filesystem, and over RPC are protobufs. It runs on JDK7. Metrics have been reworked and converted to use Hadoop Metrics2. It also adds HBase Snapshots, PrefixTreeCompression, and more. This presentation gives a high-level overview of what's new in HBase 0.96.
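PrefixTreeCompression takes advantage of the fact that HBase keys are stored sorted, so neighbouring keys tend to share long prefixes. The real prefix-tree encoding builds a trie over the keys in each block. The sketch below shows only the simpler idea underneath it: each key is stored as the number of bytes it shares with the previous key, plus the remaining suffix. The function names are hypothetical and this is not HBase's encoder.

```python
from typing import List, Tuple


def shared_prefix_len(a: bytes, b: bytes) -> int:
    """Length of the common prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def prefix_encode(sorted_keys: List[bytes]) -> List[Tuple[int, bytes]]:
    """Encode each key as (bytes shared with the previous key, remaining suffix)."""
    encoded = []
    prev = b""
    for key in sorted_keys:
        common = shared_prefix_len(prev, key)
        encoded.append((common, key[common:]))
        prev = key
    return encoded


def prefix_decode(encoded: List[Tuple[int, bytes]]) -> List[bytes]:
    """Rebuild the original keys by reusing the previous key's prefix."""
    keys = []
    prev = b""
    for common, suffix in encoded:
        key = prev[:common] + suffix
        keys.append(key)
        prev = key
    return keys


if __name__ == "__main__":
    rows = [b"user0001:email", b"user0001:name", b"user0002:email"]
    enc = prefix_encode(rows)
    print(enc)
    assert prefix_decode(enc) == rows
```

The longer and more repetitive the sorted keys are, the more bytes this kind of encoding saves. A trie goes further by sharing prefixes across many keys, not just between adjacent pairs.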


Usage Rights

© All Rights Reserved


    October 2013 HUG: HBase 0.96 Presentation Transcript

    • 0.96.0 Bay Area Hadoop User Group, October 16th, 2013
    • Michael Stack <stack@apache.org>: 0.96.0 Release Manager; Chair of Apache HBase PMC*; Apache Hadoop PMC; Engineer at Cloudera in San Francisco (*PMC = Project Management Committee)
    • HBase?
    • "...scalable, distributed datastore."
    • "...open source, distributed, scalable, consistent, low latency, random access non-relational database..."
    • Inspiration: Google's Bigtable, a technology described in a 2006 paper by Chang et al.
    • Apache top-level project (hbase.apache.org); grew up out of Apache Hadoop contrib; project goal: “billions of rows × millions of columns on clusters of ‘commodity’ hardware”; HBase persists all data to HDFS; uses Apache ZooKeeper for cluster coordination
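HBase follows the Bigtable data model: a sparse, sorted map keyed by row, column (family:qualifier), and timestamp. The toy class below is a minimal in-memory sketch of that model. It assumes nothing about HBase's real storage engine, and `ToyTable` and its methods are hypothetical names, not the HBase client API.

```python
from collections import defaultdict
from typing import Dict, Iterator, Optional


class ToyTable:
    """Toy model of a sparse sorted map: row -> column -> timestamp -> value.

    Only cells that are actually written take up space. Reads return the
    newest version of a cell unless an upper-bound timestamp is given.
    """

    def __init__(self) -> None:
        # row -> column ("family:qualifier") -> timestamp -> value
        self._rows: Dict[bytes, Dict[bytes, Dict[int, bytes]]] = defaultdict(
            lambda: defaultdict(dict))

    def put(self, row: bytes, column: bytes, ts: int, value: bytes) -> None:
        self._rows[row][column][ts] = value

    def get(self, row: bytes, column: bytes,
            ts: Optional[int] = None) -> Optional[bytes]:
        # .get() does not create entries, so reads never grow the table.
        versions = self._rows.get(row, {}).get(column)
        if not versions:
            return None
        # Newest version at or before ts (or the newest overall if ts is None).
        candidates = [t for t in versions if ts is None or t <= ts]
        return versions[max(candidates)] if candidates else None

    def scan(self, start: bytes, stop: bytes) -> Iterator[bytes]:
        """Yield row keys in [start, stop) in byte-sorted order."""
        for row in sorted(self._rows):
            if start <= row < stop:
                yield row
```

Empty cells cost nothing in this model. That is part of what makes very wide, sparse tables practical.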
    • When would I use it?
    • BIG DATA; random reads/writes
    • SCALING!
    • Who uses it?
    • Who runs the project?
    • Diverse team* of COMMITTERS! Preferably ALIVE! (*http://hbase.apache.org/team-list.html)
    • Release every month, each more stable and more performant, with some features…; currently at 0.94.12; wire compatible between releases
    • http://www.flickr.com/photos/sysli/3026288256/sizes/o/in/photostream/
    • (Self-)Migration
    • Downstreamers: minimal API disturbance (none?); last-minute feedback from Hive, Sqoop, OpenTSDB; deprecations
    • Stats: >2k issues fixed, >1500 in 0.96.x only; currently on the 6th release candidate; branched 7 months ago; 18 months in the making
    • Requirements: Hadoop 1.0.3+ or Hadoop 2.1.0-beta+; must choose one
    • Big themes: stability; operability (insight, tools); scalability; evolvability
    • http://www.flickr.com/photos/allspaw/5815258929/sizes/o/in/photostream/
    • http://www.flickr.com/photos/38595542@N02/3690830720/sizes/o/in/photostream/
    • HBase: dedicated meta WAL; don't put WAL replicas on the local node (otherwise 33% of reads have to time out); lowered ZK timeout (30s instead of 180s); watcher script kills the znode (detection time approaches 0); faster assignment
    • HDFS: HDFS-4721, speed up lease/block recovery when a DN fails and a block goes into recovery (do not recover on STALE DNs); HDFS-3703, decrease the datanode failure detection time (avoid reading STALE DNs); HDFS-3912, detecting and avoiding stale datanodes for writing
    • Coming...: faster WAL replay via distributed WAL replay (no intermediate files; no wait on the NN; committed but experimental; regions online immediately for writes); read an older consistent view; “Favored Nodes”
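Here is where the 33% figure comes from. When a RegionServer's machine dies, the DataNode on that machine dies with it. If one of a WAL block's three replicas lived there, one in three replica locations is dead, and a reader that tries that replica first has to wait for a timeout. The sketch below picks WAL replica targets that exclude the writer's own host. It assumes uniform random placement. Real HDFS placement (rack awareness, favored nodes) is more involved, and the function names are hypothetical.

```python
import random
from fractions import Fraction
from typing import List, Optional, Sequence


def choose_wal_replicas(datanodes: Sequence[str], local_host: str,
                        replication: int = 3,
                        rng: Optional[random.Random] = None) -> List[str]:
    """Pick `replication` distinct datanodes, never the writer's own host."""
    if rng is None:
        rng = random.Random()
    candidates = [dn for dn in datanodes if dn != local_host]
    if len(candidates) < replication:
        raise ValueError("not enough remote datanodes for the requested replication")
    return rng.sample(candidates, replication)


def dead_replica_fraction(replicas: Sequence[str], dead_host: str) -> Fraction:
    """Fraction of replica locations that sit on the failed host."""
    return Fraction(sum(1 for r in replicas if r == dead_host), len(replicas))
```

With a local replica, `dead_replica_fraction(["rs1", "dn2", "dn3"], "rs1")` is 1/3. With `choose_wal_replicas` it is 0 by construction.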
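The idea behind the stale-datanode work is simple. A DataNode that has not heartbeated within some interval is treated as "stale": readers try it last and writers avoid it, without waiting for it to be declared dead. Below is a minimal sketch. The 30-second threshold is an assumed value for illustration, and the function names are hypothetical rather than HDFS's API.

```python
from typing import Dict, List

STALE_INTERVAL_S = 30.0  # assumed staleness threshold, for illustration only


def is_stale(last_heartbeat: float, now: float,
             interval: float = STALE_INTERVAL_S) -> bool:
    """A node is stale if we have not heard from it within `interval` seconds."""
    return now - last_heartbeat > interval


def order_for_read(replicas: List[str], heartbeats: Dict[str, float],
                   now: float) -> List[str]:
    """Live replicas first, stale ones last (sorted() is stable, so ties keep order)."""
    return sorted(replicas, key=lambda dn: is_stale(heartbeats[dn], now))


def writable_targets(nodes: List[str], heartbeats: Dict[str, float],
                     now: float) -> List[str]:
    """Exclude stale nodes when choosing where to write."""
    return [dn for dn in nodes if not is_stale(heartbeats[dn], now)]
```

Demoting a stale node, rather than dropping it, keeps its data reachable if it turns out to be merely slow.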
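A RegionServer's WAL interleaves edits for all of its regions, so recovery has to separate them per region. Classic log splitting writes per-region intermediate files. Distributed log replay instead sends each region's edits straight to wherever that region has been reassigned. The sketch below shows only the grouping and hand-off step. It assumes a plain in-memory list stands in for the WAL and a caller-supplied `send` callback stands in for the RPC, and all names are hypothetical.

```python
from collections import defaultdict, namedtuple
from typing import Callable, Dict, Iterable, List

# Hypothetical WAL record: which region it belongs to, its sequence id, and a cell.
WalEdit = namedtuple("WalEdit", ["region", "seq_id", "row", "value"])


def group_by_region(wal: Iterable[WalEdit]) -> Dict[str, List[WalEdit]]:
    """Bucket interleaved WAL edits by region, keeping sequence-id order."""
    buckets: Dict[str, List[WalEdit]] = defaultdict(list)
    for edit in sorted(wal, key=lambda e: e.seq_id):
        buckets[edit.region].append(edit)
    return dict(buckets)


def replay(wal: Iterable[WalEdit],
           send: Callable[[str, List[WalEdit]], None]) -> None:
    """Hand each region's edits directly to its new host (no intermediate files)."""
    for region, edits in group_by_region(wal).items():
        send(region, edits)
```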
    • One rationale for pb: http://goo.gl/N0HO6n
    • Protobufs everywhere: system tables; filesystem; up in ZooKeeper; over the wire
    • RPC: implements a protobuf Service; a specification! Data on the side (encoding, compression): PB | DATA
    • Scalability: e.g. replicating 1k to 1k and heading north; HBASE-8778, region assignments scan the table directory, making them slow for huge tables; HBASE-9208, ReplicationLogCleaner slow at large scale; HBASE-8877, reentrant row locks
    • Snapshots: by table (snapshot, clone, restore, export); inexpensive (just metadata); good for backups, replication, offline processing
    • Integration tests: cluster test module; "borrows" test types from all over (Netflix "ChaosMonkey", Apache Accumulo linked-list data-loss checker); standalone or cluster; sizeable in data and runtime. Examples: hbase-it/src/test/java/org/apache/hadoop/hbase/mapreduce/IntegrationTestBulkLoad.java, hbase-it/src/test/java/org/apache/hadoop/hbase/mapreduce/IntegrationTestImportTsv.java, hbase-it/src/test/java/org/apache/hadoop/hbase/mttr/IntegrationTestMTTR.java, hbase-it/src/test/java/org/apache/hadoop/hbase/test/IntegrationTestBigLinkedList.java, hbase-it/src/test/java/org/apache/hadoop/hbase/test/IntegrationTestLoadAndVerify.java
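"PB | DATA" means each call is described by protobuf messages, while bulk cell data can follow "on the side" as a separately encoded and optionally compressed block. The sketch below frames a call as varint-length-delimited byte strings (the same convention protobuf's delimited write/parse helpers use), followed by a compressed cell block. It is a simplified illustration, not HBase's exact wire layout. The opaque byte strings stand in for serialized protobuf messages, zlib stands in for whatever compressor is configured, and the function names are hypothetical.

```python
import zlib
from typing import List, Tuple


def encode_varint(n: int) -> bytes:
    """Base-128 varint, least-significant 7-bit group first (protobuf style)."""
    if n < 0:
        raise ValueError("varint must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Return (value, position just after the varint)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def frame_call(pb_parts: List[bytes], cells: bytes) -> bytes:
    """Length-delimited protobuf parts, then a compressed cell block 'on the side'."""
    body = b"".join(encode_varint(len(p)) + p for p in pb_parts)
    return body + zlib.compress(cells)


def unframe_call(frame: bytes, n_parts: int) -> Tuple[List[bytes], bytes]:
    """Split a frame back into its protobuf parts and decompressed cell block."""
    parts, pos = [], 0
    for _ in range(n_parts):
        length, pos = decode_varint(frame, pos)
        parts.append(frame[pos:pos + length])
        pos += length
    return parts, zlib.decompress(frame[pos:])
```

Keeping cells out of the protobuf messages lets bulky data skip protobuf encoding and be compressed as a single block.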
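A reentrant row lock lets a thread that already holds a row's lock take it again, for example from a nested code path in the same mutation, without deadlocking on itself. The sketch below keeps one `threading.RLock` per row key. It is a toy under stated assumptions: `RowLocks` is a hypothetical class, locks are never evicted, and HBase's actual implementation differs.

```python
import threading
from contextlib import contextmanager
from typing import Iterator


class RowLocks:
    """One re-entrant lock per row key (toy version; locks are never evicted)."""

    def __init__(self) -> None:
        self._guard = threading.Lock()  # protects the dict of per-row locks
        self._locks = {}                # row key -> threading.RLock

    def _lock_for(self, row: bytes):
        with self._guard:
            return self._locks.setdefault(row, threading.RLock())

    @contextmanager
    def locked(self, row: bytes) -> Iterator[None]:
        """Hold the row's lock for the duration of the with-block."""
        lock = self._lock_for(row)
        lock.acquire()
        try:
            yield
        finally:
            lock.release()
```

The same thread can nest `with locks.locked(b"r1"):` blocks. Another thread trying the same row still has to wait.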
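Snapshots are "just metadata" because HBase store files are immutable once written. A snapshot can record which files made up a table at that moment instead of copying their contents. A clone is a new table pointing at the same files, and a restore resets a table's file list. The toy catalog below models only that bookkeeping. `ToyCatalog` and its methods are hypothetical, and export is left out. The real feature also has to deal with data still in memory and with files that compactions would otherwise delete.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyCatalog:
    """Tables map to lists of immutable store-file names; file data is never copied."""
    tables: Dict[str, List[str]] = field(default_factory=dict)
    snapshots: Dict[str, List[str]] = field(default_factory=dict)

    def snapshot(self, table: str, name: str) -> None:
        # Record only the file names: cost grows with file count, not data size.
        self.snapshots[name] = list(self.tables[table])

    def clone(self, name: str, new_table: str) -> None:
        # The new table shares the snapshot's (immutable) files.
        self.tables[new_table] = list(self.snapshots[name])

    def restore(self, name: str, table: str) -> None:
        # Roll the table back to the snapshot's file list.
        self.tables[table] = list(self.snapshots[name])
```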
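The linked-list data-loss checker (the idea behind IntegrationTestBigLinkedList) writes records that each point at a previously written record, then verifies that every referenced record exists. A reference to a missing record means something was lost. Here is a minimal in-memory sketch, with a dict standing in for an HBase table and hypothetical function names. The real test runs as distributed jobs over far more data.

```python
import random
from typing import Dict, Optional, Set


def generate_chain(n: int, rng: random.Random) -> Dict[int, Optional[int]]:
    """Write n nodes under random keys; each node points at the previously written one."""
    keys = rng.sample(range(10 * n), n)
    table: Dict[int, Optional[int]] = {}
    prev: Optional[int] = None
    for key in keys:
        table[key] = prev
        prev = key
    return table


def find_undefined(table: Dict[int, Optional[int]]) -> Set[int]:
    """Keys that some node references but that are missing from the table."""
    return {ref for ref in table.values() if ref is not None and ref not in table}
```

Random keys scatter the chain across the key space, and therefore across regions and servers. A loss anywhere shows up as a dangling reference.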
    • StochasticLoadBalancer costs: region count; locality; movement cost; table count; regions/table/RegionServer; read/write counts; memstore size; storefile size
    • Tracing: review HDFS-5274, "Add Tracing to HDFS"!
    • Namespaces: grouping of tables (like a database in MySQL); system/user namespaces (e.g. hbase:meta); coming: quota, security by ns, grouping on cluster by ns
    • Metrics2: radical revamp; module of interfaces, with H1 and H2 implementation modules; categories/naming/patterns
    • API: client/dev; Hadoop annotations (Stable/Evolving/Private); Cell interface (KeyValue deprecated)
    • Miscellaneous: X-row (in-region) transactions; hardened assignment; hardened replication; new UI; online merge; finer-grained ACLs; more coprocessor hooks
    • More misc.: Maven modularized; client-side types; revamped defaults; compactions (pluggable, smarter triggers); Windows!
    • 0.96.1, 0.96.2, etc.: bug fixes and performance fixes ONLY! No features!
    • Right after 0.96.0 (a month or two): rolling upgrade from 0.96.0; in-line cell tags; quota/groupings; reverse scan
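The StochasticLoadBalancer scores a candidate assignment as a weighted sum of cost functions like those listed, randomly perturbs the assignment, and keeps a change only if it lowers the total cost. The sketch below uses just two of the listed costs: region-count skew and movement cost. The weights, step count, and function names are assumptions for illustration, not HBase's defaults.

```python
import random
from typing import Dict, List

Assignment = Dict[str, List[str]]  # server -> regions it hosts


def region_count_cost(a: Assignment) -> float:
    """Spread between the most and least loaded server (0 when balanced)."""
    counts = [len(regs) for regs in a.values()]
    return float(max(counts) - min(counts))


def move_cost(a: Assignment, original: Dict[str, str]) -> float:
    """Number of regions no longer on their original server."""
    return float(sum(1 for s, regs in a.items() for r in regs if original[r] != s))


def total_cost(a: Assignment, original: Dict[str, str],
               w_count: float = 1.0, w_move: float = 0.1) -> float:
    return w_count * region_count_cost(a) + w_move * move_cost(a, original)


def balance(a: Assignment, steps: int = 1000, seed: int = 0) -> Assignment:
    """Randomly move one region at a time, keeping moves that lower the cost."""
    rng = random.Random(seed)
    original = {r: s for s, regs in a.items() for r in regs}
    current = {s: list(regs) for s, regs in a.items()}  # don't mutate the input
    cost = total_cost(current, original)
    servers = list(current)
    for _ in range(steps):
        src, dst = rng.sample(servers, 2)
        if not current[src]:
            continue
        region = rng.choice(current[src])
        current[src].remove(region)
        current[dst].append(region)
        new_cost = total_cost(current, original)
        if new_cost < cost:
            cost = new_cost            # accept the improvement
        else:
            current[dst].remove(region)  # undo the move
            current[src].append(region)
    return current
```

The movement cost pulls against rebalancing, so the plan only moves regions when the load improvement is worth it. Adding locality or read/write costs would mean more terms in `total_cost`.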
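In 0.96, a fully qualified table name is namespace:qualifier, as in hbase:meta. A name written without a namespace lands in the "default" namespace, and "hbase" is the system namespace. The helper below is a hypothetical sketch of that naming rule, not HBase's TableName class, and it does not attempt HBase's full validation of legal characters.

```python
from typing import Tuple

NAMESPACE_DELIM = ":"
DEFAULT_NAMESPACE = "default"
SYSTEM_NAMESPACE = "hbase"


def split_table_name(name: str) -> Tuple[str, str]:
    """Return (namespace, qualifier); names without a namespace go to 'default'."""
    ns, sep, qualifier = name.partition(NAMESPACE_DELIM)
    if not sep:
        return DEFAULT_NAMESPACE, name
    if not ns or not qualifier:
        raise ValueError(f"malformed table name: {name!r}")
    return ns, qualifier


def is_system_table(name: str) -> bool:
    """True for tables in the system namespace, such as hbase:meta."""
    return split_table_name(name)[0] == SYSTEM_NAMESPACE
```

Grouping by namespace is what makes per-namespace security and quotas possible later: a policy can target everything under one prefix.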
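Cross-row transactions are limited to rows within one region, so a single server can apply them under its own locks: check that every row belongs to the region, lock all rows involved, apply every mutation, then release. The sketch below acquires locks in sorted row order so that two transactions touching overlapping rows cannot deadlock. It assumes a toy region with plain dict storage and no WAL, and `ToyRegion` and `mutate_rows` are hypothetical names.

```python
import threading
from typing import Dict, List, Tuple


class ToyRegion:
    """Holds rows in [start, end) and applies multi-row mutations under row locks."""

    def __init__(self, start: bytes, end: bytes) -> None:
        self.start, self.end = start, end
        self.data: Dict[bytes, bytes] = {}
        self._guard = threading.Lock()  # protects the per-row lock table
        self._locks = {}                # row key -> threading.Lock

    def _lock(self, row: bytes):
        with self._guard:
            return self._locks.setdefault(row, threading.Lock())

    def mutate_rows(self, mutations: List[Tuple[bytes, bytes]]) -> None:
        rows = sorted({row for row, _ in mutations})
        # Validate everything first, so nothing is written if any row is foreign.
        for row in rows:
            if not (self.start <= row < self.end):
                raise ValueError(f"row {row!r} is outside this region")
        locks = [self._lock(row) for row in rows]
        for lock in locks:  # sorted order avoids lock-ordering deadlocks
            lock.acquire()
        try:
            for row, value in mutations:
                self.data[row] = value
        finally:
            for lock in reversed(locks):
                lock.release()
```

Restricting the transaction to one region avoids any cross-server coordination. That trade-off is why it is "in-region" only.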
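Compaction triggers decide which store files are worth merging. A rule used by HBase's ratio-based policies is to leave alone an old file that is much larger than the newer files it would be merged with, and to start the selection at the first file whose size is at most ratio × the combined size of those newer files. The sketch below implements that rule. The ratio of 1.2 and the min/max file counts are assumed values, and the function is a hypothetical simplification, not the exact algorithm of any shipped policy.

```python
from typing import List


def select_for_compaction(sizes: List[int], ratio: float = 1.2,
                          min_files: int = 3, max_files: int = 10) -> List[int]:
    """Return indices of files to compact; `sizes` is ordered oldest first."""
    start = 0
    while len(sizes) - start >= min_files:
        newer_total = sum(sizes[start + 1:start + max_files])
        if sizes[start] <= ratio * newer_total:
            break  # this file is small enough relative to the newer ones
        start += 1  # too big compared with newer files: skip it
    selected = list(range(start, min(len(sizes), start + max_files)))
    # Not worth compacting fewer than min_files files.
    return selected if len(selected) >= min_files else []
```

For example, with sizes `[1000, 50, 40, 30, 20]` the big old file is skipped and the four small ones are selected. This avoids rewriting a large file just to absorb a little new data.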
    • 1.0.0?
    • Thank You! stack@apache.org