October 2013 HUG: HBase 0.96

The next major version of Apache HBase, 0.96, has several new features. It is nicknamed "the Singularity" because you will have to stop and restart your cluster to upgrade to 0.96. 0.96 requires Apache Hadoop 1.0.3 or later, and is also supported on Hadoop 2 (2.1.0-beta or later). 0.96 uses protobufs throughout: all of its serializations to ZooKeeper, to the filesystem, and over RPC are protobufs. It runs on JDK7. Metrics have been reworked and converted to Hadoop Metrics2. It also adds HBase Snapshots and PrefixTreeCompression, among other features. This presentation gives a high-level overview of what's new in HBase 0.96.
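PrefixTreeCompression exploits the fact that sorted HBase keys tend to share long leading byte runs. The sketch below shows only the simplest form of that idea. Each key is stored as (number of bytes shared with the previous key, remaining suffix).

It is plain Python written for illustration, and the function names are hypothetical. It is not HBase's PrefixTree format, which builds a trie-style structure over the keys in a block.

```python
"""Sketch: shared-prefix encoding of sorted keys.

Illustrates the idea behind prefix-based key encodings. This is NOT the
actual HBase PrefixTree format; all names here are hypothetical.
"""
from typing import List, Tuple


def common_prefix_len(a: bytes, b: bytes) -> int:
    """Length of the longest common prefix of a and b."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i


def encode(sorted_keys: List[bytes]) -> List[Tuple[int, bytes]]:
    """Store each key as (bytes shared with the previous key, remaining suffix)."""
    out = []
    prev = b""
    for key in sorted_keys:
        shared = common_prefix_len(prev, key)
        out.append((shared, key[shared:]))
        prev = key
    return out


def decode(encoded: List[Tuple[int, bytes]]) -> List[bytes]:
    """Rebuild each key from the previous key's shared prefix plus its suffix."""
    keys = []
    prev = b""
    for shared, suffix in encoded:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys


if __name__ == "__main__":
    keys = [b"user0001:email", b"user0001:name", b"user0002:email"]
    print(encode(keys))
```

For the three example keys, the second and third keys shrink to `b"name"` and `b"2:email"`, because they reuse 9 and 7 leading bytes of their predecessor. The longer and more repetitive the keys, the larger the saving. This is why sorted storage and prefix encoding fit well together.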

Transcript

  • 1. 0.96.0 Bay Area Hadoop User Group, October 16th, 2013
  • 2. Michael Stack <stack@apache.org>: 0.96.0 Release Manager; Chair of the Apache HBase PMC*; Apache Hadoop PMC; Engineer at Cloudera in San Francisco (*Project Management Committee)
  • 3. HBase?
  • 4. "...scalable, distributed datastore."
  • 5. "...open source, distributed, scalable, consistent, low latency, random access non-relational database..."
  • 6. Inspiration: a Google technology described in a 2006 paper by Chang et al.?
  • 7. Apache top-level project (hbase.apache.org); came up out of Apache Hadoop contrib; project goal: "Billions of rows X millions of columns on clusters of 'commodity' hardware"; HBase persists all data to HDFS; uses Apache ZooKeeper for cluster coordination
  • 8. When would I use it?
  • 9. BIG DATA: random reads/writes
  • 10. SCALING!
  • 11. Who uses it?
  • 12. Who runs the project?
  • 13. Diverse team* of COMMITTERS! Preferably ALIVE! (*http://hbase.apache.org/team-list.html)
  • 14. Release every month, each more stable and more performant; some features…; currently at 0.94.12; wire compatible between releases
  • 15. http://www.flickr.com/photos/sysli/3026288256/sizes/o/in/photostream/
  • 16. (Self-)Migration
  • 17. Downstreamers: minimal API disturbance (none? last-minute feedback from Hive, Sqoop, OpenTSDB); deprecations
  • 18. Stats: >2k issues fixed, >1500 in 0.96.x only; currently on the 6th release candidate; branched 7 months ago; 18 months in the making
  • 19. Requirements: Hadoop 1.0.3+ or Hadoop 2.1.0-beta+; must choose one
  • 20. Big themes: stability; operability (insight, tools); scalability; evolvability
  • 21. http://www.flickr.com/photos/allspaw/5815258929/sizes/o/in/photostream/
  • 22. http://www.flickr.com/photos/allspaw/5815258929/sizes/o/in/photostream/
  • 23. http://www.flickr.com/photos/38595542@N02/3690830720/sizes/o/in/photostream/
  • 24. HBase: dedicated meta WAL; don't put WAL replicas on the local node (otherwise 33% of reads have to time out); lowered ZK timeout (30s instead of 180s); watcher script kills the znode, so detection time approaches 0; faster assignment
  • 25. HDFS: HDFS-4721 Speed up lease/block recovery when DN fails and a block goes into recovery (do not recover on STALE DNs); HDFS-3703 Decrease the datanode failure detection time (avoid reading STALE DNs); HDFS-3912 Detecting and avoiding stale datanodes for writing
  • 26. Coming...: faster WAL replay / distributed WAL replay (no intermediate files, no wait on NN; committed, experimental); regions online immediately for writes; read an older consistent view; "Favored Nodes"
  • 27. One rationale for pb: http://goo.gl/N0HO6n
  • 28. System tables; filesystem; up in ZooKeeper; over the wire
  • 29. RPC: implements a Protobuf Service (a specification!); data on the side (encoding, compression); diagram: PB | DATA
  • 30. Scalability, e.g. replicating 1k to 1k and heading north: HBASE-8778 Region assignments scan table directory making them slow for huge tables; HBASE-9208 ReplicationLogCleaner slow at large scale; HBASE-8877 Reentrant row locks
  • 31. Snapshots: by table (snapshot, clone, restore, export); inexpensive (just metadata); good for backups, replication, offline processing
  • 32. Integration tests: cluster test module; "borrows" test types from all over (Netflix "ChaosMonkey", Apache Accumulo linked-list dataloss checker); standalone or cluster; sizeable data and runtime. Examples: hbase-it/src/test/java/org/apache/hadoop/hbase/mapreduce/IntegrationTestBulkLoad.java, hbase-it/src/test/java/org/apache/hadoop/hbase/mapreduce/IntegrationTestImportTsv.java, hbase-it/src/test/java/org/apache/hadoop/hbase/mttr/IntegrationTestMTTR.java, hbase-it/src/test/java/org/apache/hadoop/hbase/test/IntegrationTestBigLinkedList.java, hbase-it/src/test/java/org/apache/hadoop/hbase/test/IntegrationTestLoadAndVerify.java
  • 33. StochasticLoadBalancer: region count; locality; movement cost; table count; regions/table/RegionServer; read/write counts; memstore size; storefile size
  • 34. Tracing: review HDFS-5274 "Add Tracing to HDFS"!
  • 35. Namespaces: grouping of tables, like a database in MySQL; system/user namespaces (hbase:meta); quota coming; security by ns; grouping on cluster by ns
  • 36. Metrics2: radical revamp; module of interfaces, with H1 and H2 implementation modules; categories/naming/patterns
  • 37. API: client/dev; Hadoop annotations (Stable/Evolving/Private); Cell interface (KeyValue deprecated)
  • 38. Miscellaneous: X-row (in-region) transactions; hardened assignment; hardened replication; new UI; online merge; finer-grained ACLs; more coprocessor hooks
  • 39. More misc.: Maven modularized; client-side types; revamped defaults; compactions (pluggable, smarter triggers); Windows!
  • 40. 0.96.1, 0.96.2, etc.: bug fixes and performance fixes ONLY! No features!
  • 41. Right after 0.96.0 (a month or two): rolling upgrade from 0.96.0; in-line cell tags; quota/groupings; reverse scan
  • 42. 1.0.0?
  • 43. Thank You! stack@apache.org
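Slide 33 lists the factors the StochasticLoadBalancer weighs. The sketch below is a rough illustration of the general approach, under a much simpler model than HBase's:

- Randomly propose moving one region to another server.
- Score the resulting layout as a weighted sum of normalized costs.
- Keep the move only if the total cost drops.

Only two costs are modelled: region-count skew and movement cost. The weights, the step count and all function names are illustrative assumptions, not HBase defaults or APIs.

```python
"""Sketch of a stochastic, cost-based region balancer.

A simplified illustration of the idea behind HBase's StochasticLoadBalancer;
names, weights and step counts here are hypothetical.
"""
import random
from typing import Dict, List

# Illustrative weights (assumptions, not HBase's configured defaults).
REGION_COUNT_WEIGHT = 500.0
MOVE_COST_WEIGHT = 7.0

Assignment = Dict[str, List[str]]  # server name -> regions it hosts


def region_count_cost(assignment: Assignment) -> float:
    """Normalized skew: 0.0 when every server hosts the same number of regions."""
    counts = [len(regions) for regions in assignment.values()]
    total = sum(counts)
    if total == 0 or len(counts) < 2:
        return 0.0
    mean = total / len(counts)
    deviation = sum(abs(c - mean) for c in counts)
    # Worst case: every region on a single server.
    worst = (total - mean) + mean * (len(counts) - 1)
    return deviation / worst


def move_cost(assignment: Assignment, original: Dict[str, str]) -> float:
    """Fraction of regions that are no longer on their starting server."""
    if not original:
        return 0.0
    moved = sum(
        1
        for server, regions in assignment.items()
        for region in regions
        if original[region] != server
    )
    return moved / len(original)


def total_cost(assignment: Assignment, original: Dict[str, str]) -> float:
    """Weighted sum of the individual cost terms."""
    return (REGION_COUNT_WEIGHT * region_count_cost(assignment)
            + MOVE_COST_WEIGHT * move_cost(assignment, original))


def balance(assignment: Assignment, steps: int = 1000, seed: int = 0) -> Assignment:
    """Randomly propose single-region moves; keep those that lower the total cost."""
    rng = random.Random(seed)
    original = {r: s for s, regions in assignment.items() for r in regions}
    current = {s: list(regions) for s, regions in assignment.items()}  # copy input
    servers = sorted(current)
    if len(servers) < 2:
        return current
    best = total_cost(current, original)
    for _ in range(steps):
        src, dst = rng.sample(servers, 2)
        if not current[src]:
            continue
        region = rng.choice(current[src])
        current[src].remove(region)
        current[dst].append(region)
        cost = total_cost(current, original)
        if cost < best:
            best = cost  # keep the improving move
        else:
            current[dst].remove(region)  # undo the move
            current[src].append(region)
    return current


if __name__ == "__main__":
    start = {"rs1": [f"region{i}" for i in range(6)], "rs2": [], "rs3": []}
    print(balance(start))
```

The movement-cost weight is set low relative to skew. That lets the sketch move regions when doing so evens out load, while rejecting moves that only add churn without improving balance. The real balancer combines many more cost terms, such as locality, read/write counts and storefile size.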

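Slides 27 to 29 describe the move to protobufs for system tables, the filesystem, ZooKeeper and RPC. To give a feel for what those bytes look like, the sketch below implements two pieces of the Protocol Buffers wire format:

- the base-128 varint encoding;
- the field tag, which is the field number shifted left by 3 and OR'd with the wire type.

It is written from scratch for illustration and uses neither HBase nor the protobuf library. The function names are hypothetical.

```python
"""Sketch of protobuf wire-format varints and varint field tags."""
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 bits per byte, low bits first, high bit = 'more follows'."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Return (decoded value, position just past the varint)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_uint_field(field_number: int, value: int) -> bytes:
    """Tag (field_number << 3 | wire type 0, i.e. varint) followed by the value."""
    return encode_varint((field_number << 3) | 0) + encode_varint(value)


if __name__ == "__main__":
    print(encode_uint_field(1, 150).hex())
```

For example, field 1 holding the value 150 encodes as the three bytes `08 96 01`. Small numbers stay small on the wire, which is part of why protobuf messages are compact.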