HBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend Micro

HBase Security for the Enterprise
Andrew Purtell, Trend Micro
On behalf of the Trend Hadoop Group
apurtell@apache.org
Agenda

• Who we are
• Motivation
• Use Cases
• Implementation
• Experience
• Quickstart Tutorial
Introduction
Trend Micro

• Headquartered: Tokyo, Japan; founded: LA, 1988
• Technology innovator and top-ranked security solutions provider
• 4,000+ employees worldwide
Trend Micro Smart Protection Network

[Diagram: threat collection feeds (customers, partners, TrendLabs research/service/support, samples, submissions, honeypots, web crawling, feedback loops, behavioral analysis) flow into a central data platform that drives web, email, and file reputation services, delivered via SaaS/cloud to endpoint, gateway, messaging, and off-network products; partners include ISPs, routers, etc.]

• Information integration is our advantage
Trend Hadoop Group

[Diagram: components of the internal distribution (Core, HDFS, HBase, MapReduce, ZooKeeper, Pig, Oozie, Hive, Mahout, Cascading, Giraph, Flume, Sqoop, Avro, Gora, Solr), each marked either "Supported" or "Not Supported, Monitoring"]

• We curate and support a complete internal distribution
• We act within the ASF community processes on behalf of internal stakeholders, and are ASF evangelists
Motivation
Our Challenges

• As we grow our business we see the network effects of our customers' interactions with the Internet and each other
• This is a volume, variety, and velocity problem
Why HBase?

• For our Hadoop-based applications, if we were forced to use MR for every operation, it would not be useful
• Fortunately, HBase provides low-latency random access to very large data tables and first-class Hadoop platform integration
But...

• Hadoop, for us, is the centerpiece of a data management consolidation strategy
• (Prior to release 0.92) HBase did not have intrinsic access control facilities
• Why do we care? Provenance, fault isolation, data sensitivity, auditable controls, ...
Our Solution

• Use HBase where appropriate
• Build in the basic access control features we need (added in 0.92, evolving in 0.94+)
• Do so with a community-sanctioned approach
• As a byproduct of this work, we have Coprocessors, separately interesting
Use Cases
Meta

• Our meta use case: Data integration, storage and service consolidation

[Diagram: yesterday, isolated "data islands"; today, a consolidated "data neighborhood"]
Application Fault Isolation

• Multitenant cluster, multiple application dev teams
• Need to strongly authenticate users to all system components: HDFS, HBase, ZooKeeper
• Rogue users cannot subvert authentication
• Allow and enforce restrictive permissions on internal application state: files, tables/CFs, znodes
Private Table (Default case)

• Strongly authenticate users to all system components
• Assign ownership when a table is created
• Allow only the owner full access to table resources
• Deny all others
• (Optional) Privacy on the wire with encrypted RPC

• Internal application state
• Applications under development, proofs of concept
Sensitive Column Families in Shared Tables

• Strongly authenticate users to all system components
• Grant read or read-write permissions to some CFs
• Restrict access to one or more other CFs only to owner
• Requires ACLs at per-CF granularity
• Default deny to help avoid policy mistakes

• Domain Reputation Repository (DRR)
• Tracking and logging system (TLS), like Google's Dapper
Read-only Access for Ad Hoc Query

• Strongly authenticate users to all system components
• Need to supply HBase delegation tokens to MR
• Grant write permissions to data ingress and analytic pipeline processes
• Grant read-only permissions for ad hoc uses, such as Pig jobs
• Default deny to help avoid policy mistakes

• Knowledge discovery via ad hoc query (Pig)
Implementation
Goals and Non-Goals

Goals

• Satisfy use cases
• Use what Secure Hadoop Core provides as much as possible
• Minimally invasive to core code

Non-Goals

• Row-level or per-value (cell) access control
• Complex policy, full role-based access control
• Push down of file ownership to HDFS
Coprocessors

• Inspired by Bigtable coprocessors, hinted at (like the Higgs boson) in Jeff Dean's LADIS '09 keynote talk
• Dynamically installed code that runs at each region in the RegionServers, loaded on a per-table basis:
      Observers: Like database triggers, provide event-based hooks for interacting with normal operations
      Endpoints: Like stored procedures, custom RPC methods called explicitly with parameters
• A high-level call interface for clients: Calls addressed to rows or ranges of rows are mapped to data location and parallelized by the client library
• Access checking is done by an Observer
• New security APIs implemented as Endpoints
Authentication

• Built on Secure Hadoop:
      Client authentication via Kerberos, a trusted third party
      Secure RPC based on SASL
• SASL can negotiate encryption and/or message integrity verification on a per-connection basis
• Make RPC extensible and pluggable; add a SecureRpcEngine option
• Support DIGEST-MD5 authentication, allowing Hadoop delegation token use for MapReduce:
      TokenProvider, a Coprocessor that provides and verifies HBase delegation tokens, and manages shared secrets on the cluster
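As a sketch of how these pieces are switched on, the 0.92-era options went into hbase-site.xml roughly as follows (property values are illustrative of that release line; check the security documentation for your version):

```xml
<!-- hbase-site.xml: illustrative 0.92-era security configuration -->
<property>
  <name>hbase.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <!-- swap in the secure RPC engine described above -->
  <name>hbase.rpc.engine</name>
  <value>org.apache.hadoop.hbase.ipc.SecureRpcEngine</value>
</property>
<property>
  <!-- load TokenProvider and AccessController in every RegionServer -->
  <name>hbase.coprocessor.region.classes</name>
  <value>org.apache.hadoop.hbase.security.token.TokenProvider,org.apache.hadoop.hbase.security.access.AccessController</value>
</property>
```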
Authorization – AccessController

• AccessController: A Coprocessor that manages access control lists
• Simple and familiar permissions model: READ, WRITE, CREATE, ADMIN
• Permissions grantable at table, column family, and column qualifier granularity
• Supports user- and group-based assignment
• The Hadoop group mapping service can model application roles as groups
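In HBase shell terms, grants at these granularities look like the following sketch (user, group, and table names are hypothetical):

```
grant 'bob', 'RW', 'drr', 'meta'   # user bob: read/write on table drr, CF meta
grant '@analysts', 'R', 'drr'      # group analysts (note the @): read on all of drr
user_permission 'drr'              # list the current grants for the table
```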
Authorization – Secure ZooKeeper

• ZooKeeper plays a critical role in HBase cluster operations and in the security implementation; it needs strong security or it becomes a weak point
• Kerberos-based client authentication
• Znode ACLs enforce SASL-authenticated access for sensitive data
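For reference, SASL/Kerberos authentication on a ZooKeeper server is typically wired up with a JAAS configuration along these lines (the keytab path, principal, and realm are placeholders to adjust for your site):

```
Server {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/etc/zookeeper/conf/zk.keytab"
  storeKey=true
  useTicketCache=false
  principal="zookeeper/host.example.com@EXAMPLE.COM";
};
```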
Audit

• Simple audit log via Log4J
• Still need to work out a structured format for audit log messages
Two Implementation “Levels”

1. Secure RPC
• SecureRpcEngine for integration with Secure Hadoop: strong user authentication, message integrity, and encryption on the wire
• Implementation is solid

2. Coprocessor-based add-ons
• TokenProvider: Install only if running MR jobs with HBase RPC security enabled
• AccessController: Install on a per-table basis, configure per-CF policy; otherwise no overheads
• Implementations bring in new runtime dependencies on ZooKeeper; still considered experimental
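Per-table installation can be sketched with the shell's table attribute mechanism; the coprocessor attribute takes the form 'path|class|priority|args', with the path left empty when the class is already on the RegionServer classpath (table name and priority below are illustrative):

```
alter 'drr', METHOD => 'table_att',
  'coprocessor' => '|org.apache.hadoop.hbase.security.access.AccessController|1001|'
```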
Layering

[Diagram: the stack, top to bottom]

    Thrift client | REST client
    HBase MapReduce client | HBase Thrift | HBase REST
    TokenProvider | HBase Java client
    HBase Secure RPC
    AccessController (optional on a per-table basis)
    HBase
    Hadoop Secure RPC
    MapReduce | HDFS
    Authentication infrastructure: Kerberos + LDAP
    OS
Experience
Secure RPC Engine

• Authentication adds latency at connection setup: extra round trips for SASL negotiation
• Recommendation: Increase RPC idle time for better connection reuse

• Negotiating message integrity (“auth-int”) takes ~5% off of max throughput
• Negotiating SASL encryption (“auth-conf”) takes ~10% off of max throughput
• Recommendation: Consider your need for such options carefully
Secure RPC Engine

• A Hadoop system including HBase will initiate RPC far more frequently than one without (file reads, compactions, client API access, …)
• If the KDC is overloaded, then not only client operations but also things like region post-deployment tasks may fail, increasing region transition time
• Recommendation: HA KDC deployment, KDC capacity planning, trust federation over multiple KDC HA pairs
Secure RPC Engine

• Activity swarms may be seen by a KDC as replay attacks (“Request is a replay (34)”)
• Recommendation: Ensure unique keys for each service instance, e.g. hbase/host@realm where host is the FQDN
• Recommendation: Check for clock skew over cluster hosts
• Recommendation: Use MIT Kerberos 1.8
• Recommendation: Increase RPC idle time for better connection reuse
• Recommendation: Avoid too-frequent HBCK validation of cluster health
Hadoop Security Issues (?)

• Open issue: Occasional swarms, lasting 5-10 seconds at intervals of about the TGT lifetime, of:

      date time host.dc ERROR [PostOpenDeployTasks:
         a74847b544ba37001f56a9d716385253]
         (org.apache.hadoop.security.UserGroupInformation) -
         PriviledgedActionException as:hbase/host.dc@realm (auth:KERBEROS)
         cause:javax.security.sasl.SaslException: GSS initiate failed
         [Caused by GSSException: No valid credentials provided (Mechanism
         level: Failed to find any Kerberos tgt)]

  Some Hadoop RPC improvements have not yet been ported
• Speaking of swarms, at or about the delegation token expiration interval you may see runs of:

      date time host.dc ERROR [DataStreamer for file file block blockId]
         (org.apache.hadoop.security.UserGroupInformation) -
         PriviledgedActionException as:blockId (auth:SIMPLE)
         cause:org.apache.hadoop.ipc.RemoteException: Block token with
         block_token_identifier (expiryDate=timestamp, keyId=keyId,
         userId=hbase, blockIds=blockId, access modes=[READ|WRITE]) is
         expired.

These should probably not be logged at ERROR level
TokenProvider

• Increases exposure to ZooKeeper-related RegionServer aborts: if keys cannot be rolled or accessed due to a ZK error, we must fail closed
• Recommendation: Provision sufficient ZK quorum peers and deploy them in separate failure domains (one at each top of rack, or similar)
• Recommendation: Redundant L2 / L2+L3 networking; you probably have it already

• Recent versions of ZooKeeper have important bug fixes
• Recommendation: Use ZooKeeper 3.4.4 (when released) or higher

For more detail on HBase token authentication:
http://wiki.apache.org/hadoop/Hbase/HBaseTokenAuthentication
AccessController

• Use 0.92.1 or above for a bug fix with Get protection

• The AccessController will create a small new “system” table named _acl_; the data in this table is almost as important as that in .META.
• Recommendation: Use the shell to manually flush the ACL table after permissions changes to ensure changes are persisted

• Recommendation: The recommendations related to ZooKeeper for TokenProvider apply equally here
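The manual flush noted above is a one-liner in the HBase shell:

```
flush '_acl_'
```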
Shell Support

• Shell support is rudimentary but will support the basic use cases
• Note: You must supply exactly the same permission specification to revoke as you did to grant; there is no wildcarding and nothing like "revoke all"
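For example, a revoke must name exactly the scope that was granted (user and table names are hypothetical; check `help 'revoke'` for the precise argument shape in your release):

```
grant 'alice', 'RW', 'drr', 'meta'
revoke 'alice', 'drr', 'meta'    # must repeat the same table/CF scope
revoke 'alice', 'drr'            # a broader revoke does not match the CF-scoped grant
```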
Demonstration Video
Thank You!