SlideShare a Scribd company logo
Apache HBase 0.98
Andrew Purtell

Committer, Apache HBase, Apache Software Foundation
Big Data US Research And Development, Intel
Who am I?
• Committer on the Apache HBase project
• Member of the Big Data Research And Development
Group at Intel
• Release manager for Apache HBase 0.98
What’s In Apache HBase 0.98?
• 212 resolved JIRAs
• New features
–
–
–
–
–
–
–
–

Reverse scans (HBASE-4811)
EXEC access checks for Endpoints (HBASE-6104)
Transparent server side encryption (HBASE-7544)
Per-cell ACLs (HBASE-7662)
Visibility labels (HBASE-7663)
Stripe compactions (HBASE-7667)
MapReduce over snapshots (HBASE-8369)
REST streaming scans (HBASE-9343)

• Performance improvements
– Improved WAL write threading model (HBASE-8755)

• API cleanups and many bug fixes
Branch Release Criteria
• Wire compatibility with HBase 0.96
– Mixed client↔server and server↔server operation with 0.96
possible as long as no 0.98 specific features enabled

• Compatible with earlier on-disk data formats
• Direct upgrade possible from 0.94 → 0.98 using the
same offline data migration procedure necessary for
0.94 → 0.96
• No significant performance regression from 0.96 using
defaults
• Binary API compatibility with versions < 0.98 not
guaranteed, code that directly references HBase JARs
may need to be recompiled
Reverse Scans (HBASE-4811)
• Introduces a new internal scanner type that seeks to
the end of a range and then steps backwards
• No longer necessary to maintain tables of keys in
reverse sort order for scanning
• Exposed at the client with a new Scan method
Scan#setReversed(boolean reversed)

• A few % slower than forward scanning in CPU bound
tests (server side, filters)
Endpoint EXEC Grants (HBASE-6104)
• HBase ACLs can grant a familiar set of privileges to
users (and groups):
–
–
–
–
–

(R)ead
(W)rite
E(X)excute
(C)reate
(A)dmin

• AccessController versions prior to 0.98 ignored X
• Now access to coprocessor Endpoint invocations can
be controlled on a global, per-table, or per-CF basis
–
–
–
–

Enable the AccessController
Set hbase.security.exec.permission.checks to “true”
Grant or revoke permissions as appropriate
Deploy the coprocessor application
Cell Tags
• All values written to HBase are stored into cells
– Cell is used interchangeably with “key-value” or “KeyValue” for
legacy reasons

• Cells can now also carry an arbitrary number of tags
–
–
–
–

Metadata, considered distinct from the key and the value
Optional dictionary compression for tags in HFiles and WALs
Only available server side
Coprocessors can manage their own user defined tags
HFile Version 3
• HFile version 2 plus
– The ability to persist cell tags
– Support for optional file block encryption

• Enabled via a site file change
– hfile.format.version -> 3

• Once enabled, all data is transparently migrated over
time as new files are written by flushes and
compactions
• Required for:
– Transparent Encryption (HBASE-7544)
– Per-cell ACLs (HBASE-7662)
– Visibility labels (HBASE-7663)

• Considered experimental, but proven stable under load
Transparent Encryption (HBASE-7544)
• Introduces a new generic cryptographic codec and key
management framework into hbase-common
• Provides transparent encryption of HBase on disk data
– Optional per-file HFile block encryption (requires HFile v3)
– Optional secure WAL reader and writer

• Provides simple key management
– Flexible and non-intrusive key rotation
– Two-tier key architecture for consistency with best practices
– Key provider supports secure local key storage or any network
or hardware key storage with Java KeyStore support

• Shell support
Transparent Encryption (HBASE-7544)
Per-Cell ACLs (HBASE-7662)
• Extends the AccessController with support for
persisting and checking ACL data in cell tags
• Uses existing API facilities to transmit per cell ACLs
• Backward
compatible
with existing installs and
code
• We treat ACLs on a cell
as scoped only to the
cell for straightforward
policy evolution
• All mutations must have
covering permission in a
dominating grant
Visibility Labels (HBASE-7663)
• Introduces a new VisibilityController coprocessor
• Introduces per-cell visibility expressions, client API
extensions for setting visibility and authorizations, and new
shell commands for label management
• The maximal set of labels for a user is defined with the new
shell command ‘setauths’ or equivalent admin API
• Users specify visibility expressions on cells
• Users submit authorizations on Gets and Scans
• The effective label set for the request is built in the RPC
context from authorizations; those not in the maximal set
are dropped
– How this is done is pluggable, e.g. integration with enterprise
identity management solutions

• Scan results are filtered with (label) set membership tests
Visibility Labels (HBASE-7663)
• Visibility expressions
– Labels:
arbitrary
strings
(converted into ordinals with an
internal dictionary)
– Expressions: Labels joined in
boolean expressions
– Operators: &, |, !
– Parenthesis for precedence
secret
secret | topsecret
( secret | topsecret ) & !probationary
Improved WAL Write Throughput (HBASE-8755)
• Introduces a new threading model for WAL writes that
reduces lock contention
• Provides better write throughput when under load
– A ~15% improvement in write ops/sec at high write
concurrency

• Lays groundwork for multiple WALs
– Will provide further write throughput increase
– Also important for limiting the impact of encrypting WAL
entries
Stripe Compactions (HBASE-7667)
• Stripe compactions split the data inside the region by
row key and create sub-ranges of data
• The sub-ranges are compacted independently
• Depending on ingest and access patterns, using stripe
compactions can reduce read latency variability and
reduce compaction data volume (write amplification)
• Two use cases in particular may benefit
1. Approximately uniform keys and large regions
2. Non-uniform data with sequential row keys (e.g. log data)

• Can be complex to configure and tune, consult the
documentation for detail
MapReduce Over Snapshots (HBASE-8369)
• Introduces MapReduce utilities supporting MR jobs
over snapshots of table data
• Similar to TableInputFormat but instead of running over
an online table using the HBase API it runs directly
over HFiles on disk collected from a table snapshot.
• For performance-dominant use cases where the
HBase API cannot provide sufficient throughput
– Can increase throughput of bulk scanning ~5x by streaming
HDFS reads directly to the client

• Caveat: Not recommended from a security perspective
– Built in access control is completely bypassed
– It is a risk to open direct access to HFile data in HDFS
REST Streaming Scans (HBASE-9343)
• The REST gateway provides stateful scanners to be
consistent with the HBase API but this is not REST-ful
– Scanner state is not shared across multiple gateways
– Scanner state will be lost if the gateway fails

• Introduces a new scanning mode to the REST API for
stateless scanning
• The client manages paging and limits
• Instead of forcing a batching up of results as they
come back from the RegionServers into multiple HTTP
transactions, the stateless scanner can stream all
results back to the client over one HTTP connection
End
Questions?

More Related Content

What's hot

HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon
 
Introduction To HBase
Introduction To HBaseIntroduction To HBase
Introduction To HBase
Anil Gupta
 
Intro to HBase - Lars George
Intro to HBase - Lars GeorgeIntro to HBase - Lars George
Intro to HBase - Lars George
JAX London
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
huguk
 
Meet hbase 2.0
Meet hbase 2.0Meet hbase 2.0
Meet hbase 2.0
enissoz
 
Advanced Security In Hadoop Cluster
Advanced Security In Hadoop ClusterAdvanced Security In Hadoop Cluster
Advanced Security In Hadoop Cluster
Edureka!
 
SQOOP PPT
SQOOP PPTSQOOP PPT
SQOOP PPT
Dushhyant Kumar
 
Tajo Seoul Meetup July 2015 - What's New Tajo 0.11
Tajo Seoul Meetup July 2015 - What's New Tajo 0.11Tajo Seoul Meetup July 2015 - What's New Tajo 0.11
Tajo Seoul Meetup July 2015 - What's New Tajo 0.11
Hyunsik Choi
 
Apache hadoop hbase
Apache hadoop hbaseApache hadoop hbase
Apache hadoop hbase
sheetal sharma
 
The Heterogeneous Data lake
The Heterogeneous Data lakeThe Heterogeneous Data lake
The Heterogeneous Data lake
DataWorks Summit/Hadoop Summit
 
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
Cloudera, Inc.
 
8. key value databases laboratory
8. key value databases laboratory 8. key value databases laboratory
8. key value databases laboratory
Fabio Fumarola
 
IN-MEMORY DATABASE SYSTEMS FOR BIG DATA MANAGEMENT.SAP HANA DATABASE.
IN-MEMORY DATABASE SYSTEMS FOR BIG DATA MANAGEMENT.SAP HANA DATABASE.IN-MEMORY DATABASE SYSTEMS FOR BIG DATA MANAGEMENT.SAP HANA DATABASE.
IN-MEMORY DATABASE SYSTEMS FOR BIG DATA MANAGEMENT.SAP HANA DATABASE.
George Joseph
 
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, ClouderaHadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Cloudera, Inc.
 
SQL on Hadoop
SQL on HadoopSQL on Hadoop
SQL on Hadoop
Doron Vainrub
 
Chicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An IntroductionChicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An Introduction
Cloudera, Inc.
 
Dancing with the elephant h base1_final
Dancing with the elephant   h base1_finalDancing with the elephant   h base1_final
Dancing with the elephant h base1_finalasterix_smartplatf
 
Hadoop World 2011: Advanced HBase Schema Design
Hadoop World 2011: Advanced HBase Schema DesignHadoop World 2011: Advanced HBase Schema Design
Hadoop World 2011: Advanced HBase Schema Design
Cloudera, Inc.
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
Swiss Big Data User Group
 
Performance Analysis of HBASE and MONGODB
Performance Analysis of HBASE and MONGODBPerformance Analysis of HBASE and MONGODB
Performance Analysis of HBASE and MONGODB
Kaushik Rajan
 

What's hot (20)

HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
 
Introduction To HBase
Introduction To HBaseIntroduction To HBase
Introduction To HBase
 
Intro to HBase - Lars George
Intro to HBase - Lars GeorgeIntro to HBase - Lars George
Intro to HBase - Lars George
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
 
Meet hbase 2.0
Meet hbase 2.0Meet hbase 2.0
Meet hbase 2.0
 
Advanced Security In Hadoop Cluster
Advanced Security In Hadoop ClusterAdvanced Security In Hadoop Cluster
Advanced Security In Hadoop Cluster
 
SQOOP PPT
SQOOP PPTSQOOP PPT
SQOOP PPT
 
Tajo Seoul Meetup July 2015 - What's New Tajo 0.11
Tajo Seoul Meetup July 2015 - What's New Tajo 0.11Tajo Seoul Meetup July 2015 - What's New Tajo 0.11
Tajo Seoul Meetup July 2015 - What's New Tajo 0.11
 
Apache hadoop hbase
Apache hadoop hbaseApache hadoop hbase
Apache hadoop hbase
 
The Heterogeneous Data lake
The Heterogeneous Data lakeThe Heterogeneous Data lake
The Heterogeneous Data lake
 
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
 
8. key value databases laboratory
8. key value databases laboratory 8. key value databases laboratory
8. key value databases laboratory
 
IN-MEMORY DATABASE SYSTEMS FOR BIG DATA MANAGEMENT.SAP HANA DATABASE.
IN-MEMORY DATABASE SYSTEMS FOR BIG DATA MANAGEMENT.SAP HANA DATABASE.IN-MEMORY DATABASE SYSTEMS FOR BIG DATA MANAGEMENT.SAP HANA DATABASE.
IN-MEMORY DATABASE SYSTEMS FOR BIG DATA MANAGEMENT.SAP HANA DATABASE.
 
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, ClouderaHadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
 
SQL on Hadoop
SQL on HadoopSQL on Hadoop
SQL on Hadoop
 
Chicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An IntroductionChicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An Introduction
 
Dancing with the elephant h base1_final
Dancing with the elephant   h base1_finalDancing with the elephant   h base1_final
Dancing with the elephant h base1_final
 
Hadoop World 2011: Advanced HBase Schema Design
Hadoop World 2011: Advanced HBase Schema DesignHadoop World 2011: Advanced HBase Schema Design
Hadoop World 2011: Advanced HBase Schema Design
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
 
Performance Analysis of HBASE and MONGODB
Performance Analysis of HBASE and MONGODBPerformance Analysis of HBASE and MONGODB
Performance Analysis of HBASE and MONGODB
 

Viewers also liked

HBase Consistency and Performance Improvements
HBase Consistency and Performance ImprovementsHBase Consistency and Performance Improvements
HBase Consistency and Performance ImprovementsDataWorks Summit
 
Hadoop Summit 2012 | HBase Consistency and Performance Improvements
Hadoop Summit 2012 | HBase Consistency and Performance ImprovementsHadoop Summit 2012 | HBase Consistency and Performance Improvements
Hadoop Summit 2012 | HBase Consistency and Performance Improvements
Cloudera, Inc.
 
001 hbase introduction
001 hbase introduction001 hbase introduction
001 hbase introductionScott Miao
 
Hadoop voor niet-technici
Hadoop voor niet-techniciHadoop voor niet-technici
Hadoop voor niet-technici
Evert Lammerts
 
Streaming map reduce
Streaming map reduceStreaming map reduce
Streaming map reduce
danirayan
 
阿里自研数据库 Ocean base实践
阿里自研数据库 Ocean base实践阿里自研数据库 Ocean base实践
阿里自研数据库 Ocean base实践
wuqiuping
 
IoT:what about data storage?
IoT:what about data storage?IoT:what about data storage?
IoT:what about data storage?
DataWorks Summit/Hadoop Summit
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase
强 王
 
Time-Series Apache HBase
Time-Series Apache HBaseTime-Series Apache HBase
Time-Series Apache HBase
HBaseCon
 
Build a Time Series Application with Apache Spark and Apache HBase
Build a Time Series Application with Apache Spark and Apache  HBaseBuild a Time Series Application with Apache Spark and Apache  HBase
Build a Time Series Application with Apache Spark and Apache HBase
Carol McDonald
 
Hortonworks Technical Workshop: HBase For Mission Critical Applications
Hortonworks Technical Workshop: HBase For Mission Critical ApplicationsHortonworks Technical Workshop: HBase For Mission Critical Applications
Hortonworks Technical Workshop: HBase For Mission Critical Applications
Hortonworks
 
唯品会大数据实践 Sacc pub
唯品会大数据实践 Sacc pub唯品会大数据实践 Sacc pub
唯品会大数据实践 Sacc pub
Chao Zhu
 
Content Identification using HBase
Content Identification using HBaseContent Identification using HBase
Content Identification using HBase
HBaseCon
 
Design Patterns for Building 360-degree Views with HBase and Kiji
Design Patterns for Building 360-degree Views with HBase and KijiDesign Patterns for Building 360-degree Views with HBase and Kiji
Design Patterns for Building 360-degree Views with HBase and Kiji
HBaseCon
 
SE2016 Java Valerii Moisieienko "Apache HBase Workshop"
SE2016 Java Valerii Moisieienko "Apache HBase Workshop"SE2016 Java Valerii Moisieienko "Apache HBase Workshop"
SE2016 Java Valerii Moisieienko "Apache HBase Workshop"
Inhacking
 
Meet HBase 1.0
Meet HBase 1.0Meet HBase 1.0
Meet HBase 1.0
enissoz
 

Viewers also liked (17)

HBase Consistency and Performance Improvements
HBase Consistency and Performance ImprovementsHBase Consistency and Performance Improvements
HBase Consistency and Performance Improvements
 
Hadoop Summit 2012 | HBase Consistency and Performance Improvements
Hadoop Summit 2012 | HBase Consistency and Performance ImprovementsHadoop Summit 2012 | HBase Consistency and Performance Improvements
Hadoop Summit 2012 | HBase Consistency and Performance Improvements
 
001 hbase introduction
001 hbase introduction001 hbase introduction
001 hbase introduction
 
Hadoop voor niet-technici
Hadoop voor niet-techniciHadoop voor niet-technici
Hadoop voor niet-technici
 
Streaming map reduce
Streaming map reduceStreaming map reduce
Streaming map reduce
 
阿里自研数据库 Ocean base实践
阿里自研数据库 Ocean base实践阿里自研数据库 Ocean base实践
阿里自研数据库 Ocean base实践
 
Hbase Nosql
Hbase NosqlHbase Nosql
Hbase Nosql
 
IoT:what about data storage?
IoT:what about data storage?IoT:what about data storage?
IoT:what about data storage?
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase
 
Time-Series Apache HBase
Time-Series Apache HBaseTime-Series Apache HBase
Time-Series Apache HBase
 
Build a Time Series Application with Apache Spark and Apache HBase
Build a Time Series Application with Apache Spark and Apache  HBaseBuild a Time Series Application with Apache Spark and Apache  HBase
Build a Time Series Application with Apache Spark and Apache HBase
 
Hortonworks Technical Workshop: HBase For Mission Critical Applications
Hortonworks Technical Workshop: HBase For Mission Critical ApplicationsHortonworks Technical Workshop: HBase For Mission Critical Applications
Hortonworks Technical Workshop: HBase For Mission Critical Applications
 
唯品会大数据实践 Sacc pub
唯品会大数据实践 Sacc pub唯品会大数据实践 Sacc pub
唯品会大数据实践 Sacc pub
 
Content Identification using HBase
Content Identification using HBaseContent Identification using HBase
Content Identification using HBase
 
Design Patterns for Building 360-degree Views with HBase and Kiji
Design Patterns for Building 360-degree Views with HBase and KijiDesign Patterns for Building 360-degree Views with HBase and Kiji
Design Patterns for Building 360-degree Views with HBase and Kiji
 
SE2016 Java Valerii Moisieienko "Apache HBase Workshop"
SE2016 Java Valerii Moisieienko "Apache HBase Workshop"SE2016 Java Valerii Moisieienko "Apache HBase Workshop"
SE2016 Java Valerii Moisieienko "Apache HBase Workshop"
 
Meet HBase 1.0
Meet HBase 1.0Meet HBase 1.0
Meet HBase 1.0
 

Similar to Apache HBase 0.98

Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
Chris Nauroth
 
Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?
Cask Data
 
HBaseConAsia2018 Track2-1: Kerberos-based Big Data Security Solution and Prac...
HBaseConAsia2018 Track2-1: Kerberos-based Big Data Security Solution and Prac...HBaseConAsia2018 Track2-1: Kerberos-based Big Data Security Solution and Prac...
HBaseConAsia2018 Track2-1: Kerberos-based Big Data Security Solution and Prac...
Michael Stack
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop SecurityDataWorks Summit
 
Storage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduceStorage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduce
Chris Nauroth
 
Real-time Big Data Analytics Engine using Impala
Real-time Big Data Analytics Engine using ImpalaReal-time Big Data Analytics Engine using Impala
Real-time Big Data Analytics Engine using Impala
Jason Shih
 
Red Hat Gluster Storage, Container Storage and CephFS Plans
Red Hat Gluster Storage, Container Storage and CephFS PlansRed Hat Gluster Storage, Container Storage and CephFS Plans
Red Hat Gluster Storage, Container Storage and CephFS Plans
Red_Hat_Storage
 
Thug feb 23 2015 Chen Zhang
Thug feb 23 2015 Chen ZhangThug feb 23 2015 Chen Zhang
Thug feb 23 2015 Chen Zhang
Chen Zhang
 
Hbase 20141003
Hbase 20141003Hbase 20141003
Hbase 20141003
Jean-Baptiste Poullet
 
xPatterns ... beyond Hadoop (Spark, Shark, Mesos, Tachyon)
xPatterns ... beyond Hadoop (Spark, Shark, Mesos, Tachyon)xPatterns ... beyond Hadoop (Spark, Shark, Mesos, Tachyon)
xPatterns ... beyond Hadoop (Spark, Shark, Mesos, Tachyon)
Claudiu Barbura
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journey
Peter Clapham
 
xPatterns on Spark, Shark, Mesos, Tachyon
xPatterns on Spark, Shark, Mesos, TachyonxPatterns on Spark, Shark, Mesos, Tachyon
xPatterns on Spark, Shark, Mesos, Tachyon
Claudiu Barbura
 
Meet Apache HBase - 2.0
Meet Apache HBase - 2.0Meet Apache HBase - 2.0
Meet Apache HBase - 2.0
DataWorks Summit
 
Meet HBase 2.0
Meet HBase 2.0Meet HBase 2.0
Meet HBase 2.0
enissoz
 
MySQL Load Balancers - MaxScale, ProxySQL, HAProxy, MySQL Router & nginx - A ...
MySQL Load Balancers - MaxScale, ProxySQL, HAProxy, MySQL Router & nginx - A ...MySQL Load Balancers - MaxScale, ProxySQL, HAProxy, MySQL Router & nginx - A ...
MySQL Load Balancers - MaxScale, ProxySQL, HAProxy, MySQL Router & nginx - A ...
Severalnines
 
Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...
Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...
Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...
Alluxio, Inc.
 
HBaseCon2016-final
HBaseCon2016-finalHBaseCon2016-final
HBaseCon2016-finalMaryann Xue
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache Impala
Cloudera, Inc.
 
Hadoop - Apache Hbase
Hadoop - Apache HbaseHadoop - Apache Hbase
Hadoop - Apache Hbase
Vibrant Technologies & Computers
 
Introduction to MariaDB MaxScale
Introduction to MariaDB MaxScaleIntroduction to MariaDB MaxScale
Introduction to MariaDB MaxScale
I Goo Lee
 

Similar to Apache HBase 0.98 (20)

Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
 
Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?
 
HBaseConAsia2018 Track2-1: Kerberos-based Big Data Security Solution and Prac...
HBaseConAsia2018 Track2-1: Kerberos-based Big Data Security Solution and Prac...HBaseConAsia2018 Track2-1: Kerberos-based Big Data Security Solution and Prac...
HBaseConAsia2018 Track2-1: Kerberos-based Big Data Security Solution and Prac...
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
 
Storage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduceStorage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduce
 
Real-time Big Data Analytics Engine using Impala
Real-time Big Data Analytics Engine using ImpalaReal-time Big Data Analytics Engine using Impala
Real-time Big Data Analytics Engine using Impala
 
Red Hat Gluster Storage, Container Storage and CephFS Plans
Red Hat Gluster Storage, Container Storage and CephFS PlansRed Hat Gluster Storage, Container Storage and CephFS Plans
Red Hat Gluster Storage, Container Storage and CephFS Plans
 
Thug feb 23 2015 Chen Zhang
Thug feb 23 2015 Chen ZhangThug feb 23 2015 Chen Zhang
Thug feb 23 2015 Chen Zhang
 
Hbase 20141003
Hbase 20141003Hbase 20141003
Hbase 20141003
 
xPatterns ... beyond Hadoop (Spark, Shark, Mesos, Tachyon)
xPatterns ... beyond Hadoop (Spark, Shark, Mesos, Tachyon)xPatterns ... beyond Hadoop (Spark, Shark, Mesos, Tachyon)
xPatterns ... beyond Hadoop (Spark, Shark, Mesos, Tachyon)
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journey
 
xPatterns on Spark, Shark, Mesos, Tachyon
xPatterns on Spark, Shark, Mesos, TachyonxPatterns on Spark, Shark, Mesos, Tachyon
xPatterns on Spark, Shark, Mesos, Tachyon
 
Meet Apache HBase - 2.0
Meet Apache HBase - 2.0Meet Apache HBase - 2.0
Meet Apache HBase - 2.0
 
Meet HBase 2.0
Meet HBase 2.0Meet HBase 2.0
Meet HBase 2.0
 
MySQL Load Balancers - MaxScale, ProxySQL, HAProxy, MySQL Router & nginx - A ...
MySQL Load Balancers - MaxScale, ProxySQL, HAProxy, MySQL Router & nginx - A ...MySQL Load Balancers - MaxScale, ProxySQL, HAProxy, MySQL Router & nginx - A ...
MySQL Load Balancers - MaxScale, ProxySQL, HAProxy, MySQL Router & nginx - A ...
 
Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...
Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...
Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...
 
HBaseCon2016-final
HBaseCon2016-finalHBaseCon2016-final
HBaseCon2016-final
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache Impala
 
Hadoop - Apache Hbase
Hadoop - Apache HbaseHadoop - Apache Hbase
Hadoop - Apache Hbase
 
Introduction to MariaDB MaxScale
Introduction to MariaDB MaxScaleIntroduction to MariaDB MaxScale
Introduction to MariaDB MaxScale
 

Recently uploaded

LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
Peter Spielvogel
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
RinaMondal9
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 

Recently uploaded (20)

LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 

Apache HBase 0.98

  • 1. Apache HBase 0.98 Andrew Purtell Committer, Apache HBase, Apache Software Foundation Big Data US Research And Development, Intel
  • 2. Who am I? • Committer on the Apache HBase project • Member of the Big Data Research And Development Group at Intel • Release manager for Apache HBase 0.98
  • 3. What’s In Apache HBase 0.98? • 212 resolved JIRAs • New features – – – – – – – – Reverse scans (HBASE-4811) EXEC access checks for Endpoints (HBASE-6104) Transparent server side encryption (HBASE-7544) Per-cell ACLs (HBASE-7662) Visibility labels (HBASE-7663) Stripe compactions (HBASE-7667) MapReduce over snapshots (HBASE-8369) REST streaming scans (HBASE-9343) • Performance improvements – Improved WAL write threading model (HBASE-8755) • API cleanups and many bug fixes
  • 4. Branch Release Criteria • Wire compatibility with HBase 0.96 – Mixed client↔server and server↔server operation with 0.96 possible as long as no 0.98 specific features enabled • Compatible with earlier on-disk data formats • Direct upgrade possible from 0.94 → 0.98 using the same offline data migration procedure necessary for 0.94 → 0.96 • No significant performance regression from 0.96 using defaults • Binary API compatibility with versions < 0.98 not guaranteed, code that directly references HBase JARs may need to be recompiled
  • 5. Reverse Scans (HBASE-4811) • Introduces a new internal scanner type that seeks to the end of a range and then steps backwards • No longer necessary to maintain tables of keys in reverse sort order for scanning • Exposed at the client with a new Scan method Scan#setReversed(boolean reversed) • A few % slower than forward scanning in CPU bound tests (server side, filters)
  • 6. Endpoint EXEC Grants (HBASE-6104) • HBase ACLs can grant a familiar set of privileges to users (and groups): – – – – – (R)ead (W)rite E(X)excute (C)reate (A)dmin • AccessController versions prior to 0.98 ignored X • Now access to coprocessor Endpoint invocations can be controlled on a global, per-table, or per-CF basis – – – – Enable the AccessController Set hbase.security.exec.permission.checks to “true” Grant or revoke permissions as appropriate Deploy the coprocessor application
  • 7. Cell Tags • All values written to HBase are stored into cells – Cell is used interchangeably with “key-value” or “KeyValue” for legacy reasons • Cells can now also carry an arbitrary number of tags – – – – Metadata, considered distinct from the key and the value Optional dictionary compression for tags in HFiles and WALs Only available server side Coprocessors can manage their own user defined tags
  • 8. HFile Version 3 • HFile version 2 plus – The ability to persist cell tags – Support for optional file block encryption • Enabled via a site file change – hfile.format.version -> 3 • Once enabled, all data is transparently migrated over time as new files are written by flushes and compactions • Required for: – Transparent Encryption (HBASE-7544) – Per-cell ACLs (HBASE-7662) – Visibility labels (HBASE-7663) • Considered experimental, but proven stable under load
  • 9. Transparent Encryption (HBASE-7544) • Introduces a new generic cryptographic codec and key management framework into hbase-common • Provides transparent encryption of HBase on disk data – Optional per-file HFile block encryption (requires HFile v3) – Optional secure WAL reader and writer • Provides simple key management – Flexible and non-intrusive key rotation – Two-tier key architecture for consistency with best practices – Key provider supports secure local key storage or any network or hardware key storage with Java KeyStore support • Shell support
  • 11. Per-Cell ACLs (HBASE-7662) • Extends the AccessController with support for persisting and checking ACL data in cell tags • Uses existing API facilities to transmit per cell ACLs • Backward compatible with existing installs and code • We treat ACLs on a cell as scoped only to the cell for straightforward policy evolution • All mutations must have covering permission in a dominating grant
  • 12. Visibility Labels (HBASE-7663) • Introduces a new VisibilityController coprocessor • Introduces per-cell visibility expressions, client API extensions for setting visibility and authorizations, and new shell commands for label management • The maximal set of labels for a user is defined with the new shell command ‘setauths’ or equivalent admin API • Users specify visibility expressions on cells • Users submit authorizations on Gets and Scans • The effective label set for the request is built in the RPC context from authorizations; those not in the maximal set are dropped – How this is done is pluggable, e.g. integration with enterprise identity management solutions • Scan results are filtered with (label) set membership tests
  • 13. Visibility Labels (HBASE-7663) • Visibility expressions – Labels: arbitrary strings (converted into ordinals with an internal dictionary) – Expressions: Labels joined in boolean expressions – Operators: &, |, ! – Parenthesis for precedence secret secret | topsecret ( secret | topsecret ) & !probationary
  • 14. Improved WAL Write Throughput (HBASE-8755) • Introduces a new threading model for WAL writes that reduces lock contention • Provides better write throughput when under load – A ~15% improvement in write ops/sec at high write concurrency • Lays groundwork for multiple WALs – Will provide further write throughput increase – Also important for limiting the impact of encrypting WAL entries
  • 15. Stripe Compactions (HBASE-7667) • Stripe compactions split the data inside the region by row key and create sub-ranges of data • The sub-ranges are compacted independently • Depending on ingest and access patterns, using stripe compactions can reduce read latency variability and reduce compaction data volume (write amplification) • Two use cases in particular may benefit 1. Approximately uniform keys and large regions 2. Non-uniform data with sequential row keys (e.g. log data) • Can be complex to configure and tune, consult the documentation for detail
  • 16. MapReduce Over Snapshots (HBASE-8369) • Introduces MapReduce utilities supporting MR jobs over snapshots of table data • Similar to TableInputFormat but instead of running over an online table using the HBase API it runs directly over HFiles on disk collected from a table snapshot. • For performance-dominant use cases where the HBase API cannot provide sufficient throughput – Can increase throughput of bulk scanning ~5x by streaming HDFS reads directly to the client • Caveat: Not recommended from a security perspective – Built in access control is completely bypassed – It is a risk to open direct access to HFile data in HDFS
  • 17. REST Streaming Scans (HBASE-9343) • The REST gateway provides stateful scanners to be consistent with the HBase API but this is not REST-ful – Scanner state is not shared across multiple gateways – Scanner state will be lost if the gateway fails • Introduces a new scanning mode to the REST API for stateless scanning • The client manages paging and limits • Instead of forcing a batching up of results as they come back from the RegionServers into multiple HTTP transactions, the stateless scanner can stream all results back to the client over one HTTP connection