Slide 1
© Cloudera, Inc. All rights reserved.
Project Rhino: Enhancing Data
Protection for Hadoop
Sam Heywood – Director of Product Management,
Cloudera
Ritu Kama – Director of Product Management, Intel
Slide 2
Agenda
• Big Data Security Challenges
• Project Rhino & Security for Hadoop
• Unified Authorization
• HDFS Encryption
• Cloudera’s Compliance-Ready Security
Slide 3
How is Big Data Different?
Why It’s Different Architecturally
• Shared data
• Highly distributed system and inter-node communication
• All data is online
Why It’s Different Operationally
• Operate in internal network
• Insider data access
• No native security deployed, depends on traditional security perimeter
Slide 4
Two Reasons for Security for Hadoop
Hadoop Contains Sensitive Data
• All data is security relevant
• Improper usage or breaches of data will cause huge damage to the business
• Hadoop is governed by the same security requirements as any data center platform
Hadoop is Subject to Compliance Adherence
• Organizations are often required to comply with regulations such as HIPAA and PCI-DSS that require protection of personal information
• Adhere to other corporate security policies
Slide 5
A Brief History of Hadoop Security
2008: Originally developed without security in mind
• No authentication of users or services
• Anyone could submit arbitrary code to be executed
• Any user could impersonate other users
2009: Yahoo! focused on adding authentication
• Resulting security model was complex
• Security configurations were complex and error-prone
• No data-at-rest encryption
• Limited authorization capabilities
2013: Project Rhino works to add security to Hadoop
Project aims to add:
• Data Protection
• Authorization
• Authentication
Slide 6
Project Rhino Initiatives
Authentication
• Token Based Authentication
• Token Preauth
Authorization
• Sentry Role-Based Authorization
• HBase Cell Security
Data Protection
• Cryptographic File System and Data Encryption at Rest
• Data Encryption with AES-NI & Diceros
• HBase Transparent Encryption
• HDFS Extended Attribute
• ZooKeeper, Hive and Pig Data Encryption
Slide 7
Cloudera and Intel Project Rhino
Blueprint for enterprise-grade security
Rhino Goal: Unified Authorization
Engineers at Intel and Cloudera (together with Oracle and IBM) are now jointly contributing to Apache Sentry
Rhino Goal: Encryption and Key Management Framework
Cloudera and Intel engineers are now contributing HDFS encryption capabilities that can plug into enterprise key managers
Slide 8
Unified Authorization
Apache Sentry
Slide 9
Sentry – The Open Standard
Broad Contributions
• Cloudera
• IBM
• Intel
• Oracle
Multi-Vendor Support
• Cloudera
• IBM
• MapR
• Oracle
Wide Industry Adoption
• Banking
• Healthcare
• Insurance
• Pharma
• Telco
Third-Party Integrations
• Oracle Endeca
• Platfora
Slide 10
Unified Authorization with Apache Sentry
Sentry provides unified authorization via fine-grained RBAC for Impala, Hive, HDFS, and Search
Goal: Unified authorization for all Hadoop services and applications
[Diagram: Sam Smith → Group: Fraud Analysts → Sentry Role: Fraud Analyst Role → Sentry Permission: Read access to ALL transaction data]
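The user, group, role, and permission chain on this slide can be sketched in a few lines of Python. This is only an illustrative in-memory model, not Sentry's API: the dictionaries, the `Permission` type, and the `is_allowed` function are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Permission:
    action: str  # what may be done, e.g. "read"
    scope: str   # which data it applies to, e.g. "transactions"


# Hypothetical policy data mirroring the slide's example.
GROUP_MEMBERS = {"fraud_analysts": {"sam.smith"}}
GROUP_ROLES = {"fraud_analysts": {"fraud_analyst_role"}}
ROLE_PERMISSIONS = {
    "fraud_analyst_role": {Permission("read", "transactions")},
}


def is_allowed(user: str, action: str, scope: str) -> bool:
    """Walk user -> groups -> roles -> permissions and look for a match."""
    wanted = Permission(action, scope)
    for group, members in GROUP_MEMBERS.items():
        if user not in members:
            continue
        for role in GROUP_ROLES.get(group, ()):
            if wanted in ROLE_PERMISSIONS.get(role, ()):
                return True
    return False
```

Permissions attach to roles and roles attach to groups, so a user never holds a permission directly; access follows from group membership alone.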
Slide 11
Sentry and Active Directory Groups
• Sentry can be configured to use AD to determine a user’s group assignments
• Group assignment changes in AD are automatically picked up, resulting in updated Sentry role assignments
[Diagram: Sam Smith → AD Group: Fraud Analysts → Sentry Role: Fraud Analyst Role → Sentry Permission: Read access to ALL transaction data]
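One way to picture why a group change is picked up automatically: roles are resolved from the user's current groups at check time rather than stored per user. The sketch below assumes a plain dictionary as a stand-in for the AD lookup; `roles_for`, `directory`, and `group_roles` are hypothetical names, and no real directory or Sentry call is involved.

```python
def roles_for(user, directory, group_roles):
    """Resolve roles from the user's current group memberships."""
    roles = set()
    for group in directory.get(user, ()):
        roles |= group_roles.get(group, set())
    return roles


# Stand-in for the directory service: user -> current groups.
directory = {"sam.smith": {"Fraud Analysts"}}

# Role grants are made to groups, never to individual users.
group_roles = {
    "Fraud Analysts": {"fraud_analyst_role"},
    "Clinical Analysts": {"clinical_analyst_role"},
}

before = roles_for("sam.smith", directory, group_roles)

# Moving the user to another group changes the resolved roles
# without touching any role grant.
directory["sam.smith"] = {"Clinical Analysts"}
after = roles_for("sam.smith", directory, group_roles)
```

Only the directory entry changed between the two calls; the role grants stayed the same.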
Slide 12
Sentry Enforcement with CDH 5.3
[Diagram: Permissions specified by administrators (top-level and delegated) are stored as rules, e.g. "Rule 1: Allow fraud analysts read access to the transaction table". Common enforcement code, shared for consistency, sits in each access path: HiveServer2; Impala; MR, Pig, HDFS; and apps such as Datameer, Platfora, etc.]
Slide 13
Encryption & Key Management
HDFS Encryption
Slide 14
HDFS Encryption Available with CDH 5.3
• Supports specification of HDFS directories as “Encryption Zones”
• All subsequent directory contents encrypted
• Multi-tenant encryption with tenant-specific keys
• Separation of duties via key access restrictions
• Key management via Navigator Key Trustee
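As a rough illustration of the bullets above, the following Python sketch models encryption zones as directory roots mapped to tenant-specific key names, with a separate access list per key. It is a toy model only: the paths, key names, users, and functions are hypothetical, and it performs no actual encryption or HDFS calls.

```python
import posixpath

# Hypothetical zones: zone root directory -> name of the zone key.
ZONES = {
    "/data/tenant_a": "tenant_a_key",
    "/data/tenant_b": "tenant_b_key",
}

# Hypothetical key access lists, kept apart from file permissions so
# that a storage administrator does not automatically get key access.
KEY_ACL = {
    "tenant_a_key": {"alice"},
    "tenant_b_key": {"bob"},
}


def zone_key_for(path, zones=ZONES):
    """Return the key name of the zone containing path, or None."""
    current = posixpath.normpath(path)
    while True:
        if current in zones:
            return zones[current]
        parent = posixpath.dirname(current)
        if parent == current:  # reached the root without finding a zone
            return None
        current = parent


def can_decrypt(user, path):
    """True if the path is outside any zone or the user may use its key."""
    key = zone_key_for(path)
    return key is None or user in KEY_ACL.get(key, set())
```

Walking up parent directories, rather than comparing string prefixes, keeps a sibling such as `/data/tenant_ab` from being mistaken for part of the `/data/tenant_a` zone.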
Slide 15
Open Source HDFS Encryption
• Encryption for HDFS, HBase
• No encryption for metadata, log files, ingest paths
• No key management
• Complicated, manual command-line configuration
• Disjointed audit trail
[Diagram components: Manager, Navigator, Impala, Hive, HDFS, HBase, Sentry, Log Files, Ingest Paths, Metadata Store. Legend: Encrypted Data, Encryption Key]
Slide 16
Compliance-Ready Encryption & Key Management
Cloudera’s Solution:
• ALL data encrypted: HDFS, HBase, metadata, log files, ingest paths
• Enterprise Key Management via Navigator Key Trustee
• Configuration support via Cloudera Manager
• Audit integration to Cloudera Navigator
• Optional root-of-trust integration with HSMs
[Diagram components: Manager, Navigator, Impala, Hive, HDFS, HBase, Sentry, Navigator Key Trustee, Log Files, Ingest Paths, Metadata Store. Legend: Encrypted Data, Encryption Key]
Slide 17
Comparison: Encryption and Key Management
Capability               Cloudera Enterprise   Open Source
HDFS Data Encryption     ✔                     ✔
HBase Encryption         ✔                     ✔
Log File Encryption      ✔                     ✖
Metadata Encryption      ✔                     ✖
Ingest Path Encryption   ✔                     ✖
Key Management           ✔                     ✖
HSM Integration          ✔                     ✖
Configuration            ✔                     ✖
Integrated Auditing      ✔                     ✖
Slide 18
Encryption & Key Management
Navigator Encrypt & Navigator Key Trustee
Slide 19
Navigator Encrypt
Transparent layer between application and file system
• Compliance-Ready
• Massively Scalable
• High Performance: Optimized for Intel
• Separation of Duties via process-based access controls
• Key Management with Navigator Key Trustee
Slide 20
Navigator Key Trustee
“Virtual safe-deposit box” for managing encryption keys or other Hadoop security artifacts
• Separates keys from encrypted data
• Hot/Hot-Tandem dual key manager configuration
• Integration with HSMs from Thales, RSA, and SafeNet
• Roadmap: Management of SSL certificates, SSH keys, tokens, passwords, Kerberos keytab files, and more
Slide 21
Dynamic Data Masking with Apache Sentry
• Using views, Sentry provides column-restricted access to data
• Combined with UDFs, the resulting data is dynamically masked before it is displayed to the user
[Diagram: Sam Smith → Group: Clinical Analysts → Sentry Role: Clinical Analyst Role → Sentry Permission: Masked access to subset of patient data]
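The combination of a column-restricting view and a masking function can be sketched as follows. In Hive or Impala this would be a SQL view calling a UDF; the Python below is only a stand-in, and the column names, the sample row, and the `mask_ssn` and `clinical_view` functions are assumptions made for the example.

```python
def mask_ssn(value):
    """Keep the last four characters; replace earlier digits with 'X'."""
    head, tail = value[:-4], value[-4:]
    return "".join("X" if ch.isdigit() else ch for ch in head) + tail


# Columns exposed by the hypothetical view; "name" is deliberately absent.
VIEW_COLUMNS = ("patient_id", "diagnosis", "ssn")

# Columns passed through a masking function before they are returned.
MASKED = {"ssn": mask_ssn}


def clinical_view(rows):
    """Yield only the view's columns, masking where configured."""
    for row in rows:
        yield {
            col: MASKED[col](row[col]) if col in MASKED else row[col]
            for col in VIEW_COLUMNS
        }


rows = [
    {"patient_id": 1, "name": "Pat Doe", "diagnosis": "J45",
     "ssn": "123-45-6789"},
]
result = list(clinical_view(rows))
```

Granting the role access to the view rather than the underlying table is what restricts the columns; the masking function then shapes the values that remain visible.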
Slide 22
What’s Next?
• Log Redaction
• Highly Available Authorization
• Unified Credential Management
• Simplified Wire Encryption
• Attribute-Based Access Controls & “Follow the Data” Security
• Continued Cloudera & Intel Efforts
Slide 23
Balance Security and Privacy with Business Agility
Cloudera is the leader in Hadoop security.
Unique Capabilities:
• Comprehensive and Unified
• Secure at the core
• No Performance Impact
• Jointly engineered with Intel
• Compliance-Ready
• Only distribution to pass PCI audit
[Diagram: platform stack (Process, Discover, Model, Serve; Security and Administration; Unlimited Storage) with four security layers]
1. Perimeter: Standards-based Authentication
2. Access: Unified Role-based Authorization
3. Visibility: Auditing & Governance
4. Data: Encryption & Key Management
Slide 24
Thank You

Editor's Notes

  • #7 Intel launched Project Rhino in early 2013. Project Rhino is an open source initiative dedicated to enhancing security in Hadoop. In 2014, Cloudera joined Project Rhino with the Sentry project.
  • #8 Our security story is one that we’re building hand-in-hand with Intel. In 2013, Intel established Project Rhino, which is a blueprint for enterprise-grade security. It’s meant to address many of the security concerns with Hadoop and we are working closely with them on many of these concerns – specifically around delivering unified authorization for Hadoop through Apache Sentry and bringing new encryption and key management frameworks to a Hadoop cluster.
  • #10 Another note about Sentry: it is an open source Apache project, and it's emerging as an open standard for unified authorization. It has a broad set of contributions from Cloudera, Intel, IBM, and Oracle. It ships in multiple distributions. We’ve seen wide industry adoption across verticals and many third-party integrations. We want to provide unified authorization not only for Hadoop services but also for the third-party tools that users choose to access the cluster with.
  • #11 With Cloudera, we deliver unified authorization with Apache Sentry. Sentry provides unified authorization via fine-grained RBAC today for Impala, Hive, HDFS, and Search. The goal is to provide it for all Hadoop services and third-party applications (such as Spark, Pig, MR, and BI tools). How does it work? Here we have a Sentry role (the fraud analyst role), and this role has one or more permissions. In this example the permission is read access to all transaction data, so it has two parts: the action that can be taken (read) and the scope of the data it applies to (all transaction data). There’s a group in AD called fraud analysts, and Sam Smith, as a member of this group, has this role and these permissions. With the 5.3 release, we can provide table-level access control to MR, Spark, Pig, etc., and in 2015 we’ll add column-level access control for all services. The scope of data control can be server, database, table, or column level.
  • #12 Sentry can be configured to use AD to determine a user’s group assignments, so any changes to group assignments in AD are automatically picked up by Sentry, resulting in updated Sentry role assignments. You can manage Sam Smith’s access to the cluster simply by moving him between groups in AD. User access to the cluster is controlled via AD group management, which is how most group assignments are managed anyway (again, leveraging existing AD tools and skills).
  • #15 https://github.com/intel-hadoop/project-rhino/
  • #16 Navigator Encrypt provides massively scalable, high-performance at-rest data encryption for all critical Hadoop data, in and out of HDFS. Navigator Encrypt uses process-based access controls to mitigate data custodian issues and prevent unauthorized access to data in clear text. Navigator Key Trustee provides secure, policy-driven key management for Navigator Encrypt. Key Trustee can also be used to secure and manage any security-related Hadoop assets, e.g. SSL certificates and SSH keys.
  • #17 Navigator Encrypt provides massively scalable, high-performance at-rest data encryption for all critical Hadoop data, in and out of HDFS. Navigator Encrypt uses process-based access controls to mitigate data custodian issues and prevent unauthorized access to data in clear text. Navigator Key Trustee provides secure, policy-driven key management for Navigator Encrypt. Key Trustee can also be used to secure and manage any security-related Hadoop assets, e.g. SSL certificates and SSH keys.
  • #20 Navigator Encrypt provides massively scalable, high-performance at-rest data encryption for all critical Hadoop data, in and out of HDFS, with transparent encryption for Hadoop data as it’s written to disk. We can enable compliance initiatives (HIPAA, PCI-DSS, SOX, FERPA, EU data protection) that require at-rest encryption and key management. Deployment and configuration are fast and easy, with enterprise scalability. We provide a transparent layer between the application and file system that dramatically reduces the performance impact of encryption. It is fully integrated into Navigator. Features: Navigator Encrypt uses process-based access controls to mitigate data custodian issues and prevent unauthorized access to data in clear text. We can ensure sensitive data and encryption keys are never stored in plain text nor exposed publicly. We can make sure only applications that need access to plaintext data will have it. Navigator Encrypt can prevent admins and super users from accessing encrypted data. You can establish a variety of key retrieval policies that dictate who or what can access the secure artifact. Keys are protected by Navigator Key Trustee.
  • #21 Navigator Key Trustee is Cloudera’s key manager, and its primary use case is storing keys for Navigator Encrypt. Key Trustee is a software-based key manager with packaged integrations to HSMs like SafeNet Luna, Thales nShield, and RSA DPM, ensuring consistency with infosec policies that require these boxes to serve as the root of trust inside a corporate environment. Key Trustee runs on a dedicated server and ensures the keys are stored separately from the data, which is a requirement for regulations like PCI. In addition to key management, you can think of Key Trustee as a virtual safe-deposit box that can be used to secure any type of sensitive asset for the cluster. SSL certificates, SSH keys, passwords, keytab files, truststore files, and more can all be secured with Key Trustee.
  • #24 With Cloudera’s EDH, we have built in security that’s comprehensive, transparent, and compliance-ready. Cloudera offers a set of security and governance capabilities that’s unmatched within the Hadoop environment/ecosystem.