Apache HBase 0.98
 

New features and improvements in the upcoming Apache HBase 0.98 release.


    Apache HBase 0.98 Presentation Transcript

    • Apache HBase 0.98
      Andrew Purtell
      Committer, Apache HBase, Apache Software Foundation
      Big Data US Research And Development, Intel
    • Who am I?
      • Committer on the Apache HBase project
      • Member of the Big Data Research And Development Group at Intel
      • Release manager for Apache HBase 0.98
    • What’s In Apache HBase 0.98?
      • 212 resolved JIRAs
      • New features
        – Reverse scans (HBASE-4811)
        – EXEC access checks for Endpoints (HBASE-6104)
        – Transparent server side encryption (HBASE-7544)
        – Per-cell ACLs (HBASE-7662)
        – Visibility labels (HBASE-7663)
        – Stripe compactions (HBASE-7667)
        – MapReduce over snapshots (HBASE-8369)
        – REST streaming scans (HBASE-9343)
      • Performance improvements
        – Improved WAL write threading model (HBASE-8755)
      • API cleanups and many bug fixes
    • Branch Release Criteria
      • Wire compatibility with HBase 0.96
        – Mixed client↔server and server↔server operation with 0.96 is possible as long as no 0.98-specific features are enabled
      • Compatible with earlier on-disk data formats
      • Direct upgrade possible from 0.94 → 0.98 using the same offline data migration procedure necessary for 0.94 → 0.96
      • No significant performance regression from 0.96 using defaults
      • Binary API compatibility with versions < 0.98 is not guaranteed; code that directly references HBase JARs may need to be recompiled
    • Reverse Scans (HBASE-4811)
      • Introduces a new internal scanner type that seeks to the end of a range and then steps backwards
      • No longer necessary to maintain tables of keys in reverse sort order for scanning
      • Exposed at the client with a new Scan method
        Scan#setReversed(boolean reversed)
      • A few % slower than forward scanning in CPU bound tests (server side, filters)
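
The seek-then-step-backwards idea can be illustrated without HBase at all. The following is a minimal sketch that models a table as a sorted in-memory map; the class name `ReverseScanSketch`, the method `reverseScan`, and the inclusive start / exclusive stop convention are assumptions made for illustration, not the HBase scanner implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of a reverse scan over a sorted key space; not HBase code.
final class ReverseScanSketch {

    // Returns row keys from startRow (inclusive) down to stopRow (exclusive),
    // in descending order. The bounds convention is an assumption of this sketch.
    static List<String> reverseScan(NavigableMap<String, String> rows,
                                    String startRow, String stopRow) {
        List<String> keys = new ArrayList<>();
        // Seek: position on the greatest key that is <= startRow.
        Map.Entry<String, String> entry = rows.floorEntry(startRow);
        // Step backwards until the stop row is reached or the rows run out.
        while (entry != null && entry.getKey().compareTo(stopRow) > 0) {
            keys.add(entry.getKey());
            entry = rows.lowerEntry(entry.getKey());
        }
        return keys;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> rows = new TreeMap<>();
        for (int i = 1; i <= 5; i++) {
            rows.put("row" + i, "value" + i);
        }
        System.out.println(reverseScan(rows, "row4", "row1")); // prints [row4, row3, row2]
    }
}
```

With this access pattern available, the same keys serve both scan directions, which is why a second table in reverse sort order is no longer needed.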
    • Endpoint EXEC Grants (HBASE-6104)
      • HBase ACLs can grant a familiar set of privileges to users (and groups):
        – (R)ead
        – (W)rite
        – E(X)ecute
        – (C)reate
        – (A)dmin
      • AccessController versions prior to 0.98 ignored X
      • Now access to coprocessor Endpoint invocations can be controlled on a global, per-table, or per-CF basis
        – Enable the AccessController
        – Set hbase.security.exec.permission.checks to “true”
        – Grant or revoke permissions as appropriate
        – Deploy the coprocessor application
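
The second step in the list above is a site file setting. A minimal sketch of the corresponding hbase-site.xml fragment follows; it uses only the property name given on the slide, and it assumes the AccessController has already been enabled through its own settings, which the slide does not spell out.

```xml
<!-- hbase-site.xml fragment (sketch): enforce EXEC permission checks on Endpoint invocations -->
<property>
  <name>hbase.security.exec.permission.checks</name>
  <value>true</value>
</property>
```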
    • Cell Tags
      • All values written to HBase are stored into cells
        – Cell is used interchangeably with “key-value” or “KeyValue” for legacy reasons
      • Cells can now also carry an arbitrary number of tags
        – Metadata, considered distinct from the key and the value
        – Optional dictionary compression for tags in HFiles and WALs
        – Only available server side
        – Coprocessors can manage their own user defined tags
    • HFile Version 3
      • HFile version 2 plus
        – The ability to persist cell tags
        – Support for optional file block encryption
      • Enabled via a site file change
        – hfile.format.version -> 3
      • Once enabled, all data is transparently migrated over time as new files are written by flushes and compactions
      • Required for:
        – Transparent Encryption (HBASE-7544)
        – Per-cell ACLs (HBASE-7662)
        – Visibility labels (HBASE-7663)
      • Considered experimental, but proven stable under load
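
The site file change named above can be sketched as the following hbase-site.xml fragment. Only the property name and value come from the slide; where the setting must be deployed (all servers, followed by a restart) is an assumption to check against the documentation.

```xml
<!-- hbase-site.xml fragment (sketch): write new HFiles in format version 3 -->
<property>
  <name>hfile.format.version</name>
  <value>3</value>
</property>
```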
    • Transparent Encryption (HBASE-7544)
      • Introduces a new generic cryptographic codec and key management framework into hbase-common
      • Provides transparent encryption of HBase on disk data
        – Optional per-file HFile block encryption (requires HFile v3)
        – Optional secure WAL reader and writer
      • Provides simple key management
        – Flexible and non-intrusive key rotation
        – Two-tier key architecture for consistency with best practices
        – Key provider supports secure local key storage or any network or hardware key storage with Java KeyStore support
      • Shell support
    • Transparent Encryption (HBASE-7544)
    • Per-Cell ACLs (HBASE-7662)
      • Extends the AccessController with support for persisting and checking ACL data in cell tags
      • Uses existing API facilities to transmit per cell ACLs
      • Backward compatible with existing installs and code
      • We treat ACLs on a cell as scoped only to the cell for straightforward policy evolution
      • All mutations must have covering permission in a dominating grant
    • Visibility Labels (HBASE-7663)
      • Introduces a new VisibilityController coprocessor
      • Introduces per-cell visibility expressions, client API extensions for setting visibility and authorizations, and new shell commands for label management
      • The maximal set of labels for a user is defined with the new shell command ‘setauths’ or equivalent admin API
      • Users specify visibility expressions on cells
      • Users submit authorizations on Gets and Scans
      • The effective label set for the request is built in the RPC context from authorizations; those not in the maximal set are dropped
        – How this is done is pluggable, e.g. integration with enterprise identity management solutions
      • Scan results are filtered with (label) set membership tests
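
The rule for building the effective label set, as described above, is a set intersection: requested authorizations that are not in the user's maximal set are dropped. A minimal sketch under that reading follows; `EffectiveLabelsSketch` and `effectiveLabels` are hypothetical names for illustration and not part of the HBase API.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Toy model of deriving a request's effective label set; not HBase code.
final class EffectiveLabelsSketch {

    // Keeps only those requested authorizations that the user's maximal set allows.
    static Set<String> effectiveLabels(Set<String> maximalSet, Set<String> requested) {
        Set<String> effective = new LinkedHashSet<>(requested);
        effective.retainAll(maximalSet); // labels outside the maximal set are dropped
        return effective;
    }

    public static void main(String[] args) {
        Set<String> maximal = new LinkedHashSet<>(Arrays.asList("secret", "topsecret"));
        Set<String> requested = new LinkedHashSet<>(Arrays.asList("secret", "probationary"));
        System.out.println(effectiveLabels(maximal, requested)); // prints [secret]
    }
}
```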
    • Visibility Labels (HBASE-7663)
      • Visibility expressions
        – Labels: arbitrary strings (converted into ordinals with an internal dictionary)
        – Expressions: labels joined in boolean expressions
        – Operators: &, |, !
        – Parentheses for precedence
      • Examples
        secret
        secret | topsecret
        ( secret | topsecret ) & !probationary
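
To make the expression syntax concrete, here is a small recursive-descent evaluator that checks an expression against a set of authorizations. It is a sketch, not the VisibilityController's parser: the class name, the rule that `&` binds tighter than `|`, and the set of characters accepted in a label are all assumptions of this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Toy evaluator for visibility expressions; not HBase code.
final class VisibilityExpressionSketch {
    private final String text;
    private final Set<String> auths;
    private int pos = 0;

    private VisibilityExpressionSketch(String text, Set<String> auths) {
        this.text = text;
        this.auths = auths;
    }

    // True if a cell carrying this expression is visible with the given authorizations.
    static boolean isVisible(String expression, Set<String> auths) {
        VisibilityExpressionSketch parser = new VisibilityExpressionSketch(expression, auths);
        boolean result = parser.parseOr();
        if (parser.peek() != '\0') {
            throw new IllegalArgumentException("Unexpected input at position " + parser.pos);
        }
        return result;
    }

    // Skips whitespace and returns the next character, or '\0' at end of input.
    private char peek() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) pos++;
        return pos < text.length() ? text.charAt(pos) : '\0';
    }

    private boolean parseOr() {
        boolean value = parseAnd();
        while (peek() == '|') {
            pos++;
            boolean rhs = parseAnd(); // always parse the operand, then combine
            value = value || rhs;
        }
        return value;
    }

    private boolean parseAnd() {
        boolean value = parseNot();
        while (peek() == '&') {
            pos++;
            boolean rhs = parseNot();
            value = value && rhs;
        }
        return value;
    }

    private boolean parseNot() {
        char c = peek();
        if (c == '!') {
            pos++;
            return !parseNot();
        }
        if (c == '(') {
            pos++;
            boolean value = parseOr();
            if (peek() != ')') throw new IllegalArgumentException("Missing ')' at position " + pos);
            pos++;
            return value;
        }
        int start = pos;
        while (pos < text.length() && isLabelChar(text.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("Expected label at position " + pos);
        return auths.contains(text.substring(start, pos)); // set membership test
    }

    // Assumption: labels consist of letters, digits, '_' and '-'.
    private static boolean isLabelChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_' || c == '-';
    }

    public static void main(String[] args) {
        Set<String> auths = new HashSet<>(Arrays.asList("topsecret"));
        System.out.println(isVisible("( secret | topsecret ) & !probationary", auths)); // prints true
    }
}
```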
    • Improved WAL Write Throughput (HBASE-8755)
      • Introduces a new threading model for WAL writes that reduces lock contention
      • Provides better write throughput when under load
        – A ~15% improvement in write ops/sec at high write concurrency
      • Lays groundwork for multiple WALs
        – Will provide further write throughput increase
        – Also important for limiting the impact of encrypting WAL entries
    • Stripe Compactions (HBASE-7667)
      • Stripe compactions split the data inside the region by row key and create sub-ranges of data
      • The sub-ranges are compacted independently
      • Depending on ingest and access patterns, using stripe compactions can reduce read latency variability and reduce compaction data volume (write amplification)
      • Two use cases in particular may benefit
        1. Approximately uniform keys and large regions
        2. Non-uniform data with sequential row keys (e.g. log data)
      • Can be complex to configure and tune; consult the documentation for details
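
The split into sub-ranges can be pictured as a lookup from row key to stripe. The sketch below assumes a region is divided by a sorted list of boundary keys, with each boundary belonging to the stripe on its right; `StripeSketch`, `stripeFor`, and `groupByStripe` are hypothetical names, and the real stripe compaction policy involves considerably more than this lookup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of assigning row keys to stripes inside one region; not HBase code.
final class StripeSketch {

    // With n sorted boundaries there are n + 1 stripes, numbered 0..n.
    // Assumption: a key equal to a boundary falls into the stripe to its right.
    static int stripeFor(List<String> sortedBoundaries, String rowKey) {
        int idx = Collections.binarySearch(sortedBoundaries, rowKey);
        return idx >= 0 ? idx + 1 : -idx - 1;
    }

    // Groups row keys by stripe; each group could then be compacted on its own.
    static Map<Integer, List<String>> groupByStripe(List<String> sortedBoundaries,
                                                    List<String> rowKeys) {
        Map<Integer, List<String>> stripes = new TreeMap<>();
        for (String key : rowKeys) {
            stripes.computeIfAbsent(stripeFor(sortedBoundaries, key), s -> new ArrayList<>())
                   .add(key);
        }
        return stripes;
    }

    public static void main(String[] args) {
        List<String> boundaries = Arrays.asList("g", "p");
        List<String> keys = Arrays.asList("apple", "g", "kiwi", "zebra");
        System.out.println(groupByStripe(boundaries, keys)); // prints {0=[apple], 1=[g, kiwi], 2=[zebra]}
    }
}
```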
    • MapReduce Over Snapshots (HBASE-8369)
      • Introduces MapReduce utilities supporting MR jobs over snapshots of table data
      • Similar to TableInputFormat, but instead of running over an online table using the HBase API it runs directly over HFiles on disk collected from a table snapshot
      • For performance-dominant use cases where the HBase API cannot provide sufficient throughput
        – Can increase throughput of bulk scanning ~5x by streaming HDFS reads directly to the client
      • Caveat: Not recommended from a security perspective
        – Built in access control is completely bypassed
        – It is a risk to open direct access to HFile data in HDFS
    • REST Streaming Scans (HBASE-9343)
      • The REST gateway provides stateful scanners to be consistent with the HBase API but this is not REST-ful
        – Scanner state is not shared across multiple gateways
        – Scanner state will be lost if the gateway fails
      • Introduces a new scanning mode to the REST API for stateless scanning
      • The client manages paging and limits
      • Instead of forcing a batching up of results as they come back from the RegionServers into multiple HTTP transactions, the stateless scanner can stream all results back to the client over one HTTP connection
    • End
      Questions?