Learn the history of Project Rhino and its importance, the progress that’s been made so far (including a deep dive into the new security features announced with CDH 5.3), and what’s next for Hadoop security.
Intel launched Project Rhino in early 2013. Project Rhino is an open source initiative dedicated to enhancing security in Hadoop. In 2014, Cloudera joined Project Rhino, bringing the Sentry project.
Our security story is one that we’re building hand-in-hand with Intel. In 2013, Intel established Project Rhino, which is a blueprint for enterprise-grade security. It’s meant to address many of the security concerns with Hadoop and we are working closely with them on many of these concerns – specifically around delivering unified authorization for Hadoop through Apache Sentry and bringing new encryption and key management frameworks to a Hadoop cluster.
Another note about Sentry: Sentry is an open source Apache project, and it’s emerging as an open standard for unified authorization. It has a broad set of contributions from Cloudera, Intel, IBM, and Oracle, and it ships in multiple distributions. We’ve seen wide industry adoption across verticals and many third-party integrations – we want to provide unified authorization not only for Hadoop services but also for the third-party tools that users choose to access the cluster with.
At Cloudera, we deliver unified authorization with Apache Sentry. Sentry provides unified authorization via fine-grained role-based access control (RBAC) today for Impala, Hive, HDFS, and Search. The goal is to provide it for all Hadoop services and third-party applications (such as Spark, Pig, MR, BI tools, etc.). How does it work? You see here we have a Sentry role (the fraud analyst role), and this role has one or more permissions. In this example the permission is read access to all transaction data, so a permission has two parts: the action that can be taken (read) and the scope of data it applies to (all). There’s a group in AD called fraud analysts, and Sam Smith, as a member of this group, has this role and these permissions. With the 5.3 release, we can provide table-level access control to MR, Spark, Pig, etc., and in 2015 we’ll add column-level access control for all services. The scope of data control can be server, database, table, or column level.
Sentry can be configured to use AD to determine a user’s group assignment, so any changes to group assignment in AD are automatically picked up by Sentry, resulting in updated Sentry role assignments. You can therefore manage Sam Smith’s access to the cluster simply by moving them between groups in AD. User access to the cluster is controlled via AD group management, which is how most group assignments are managed anyway (again, leveraging existing AD tools and skills).
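The user, group, role, and permission chain described above can be sketched in a few lines of Python. This is only an illustrative model, not Sentry’s implementation or API: the dictionaries, the `is_allowed` function, and names such as `sam.smith`, `server1`, and `fraud_analyst_role` are all hypothetical, and `DIRECTORY_GROUPS` merely stands in for a lookup against AD.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Privilege:
    action: str       # what may be done, e.g. "read"
    resource: tuple   # scope as a path: (server,), (server, db), (server, db, table), ...

# Hypothetical policy data. A two-element resource means database-level scope.
ROLE_PRIVILEGES = {
    "fraud_analyst_role": {Privilege("read", ("server1", "transactions"))},
}
GROUP_ROLES = {"fraud analysts": {"fraud_analyst_role"}}
DIRECTORY_GROUPS = {"sam.smith": {"fraud analysts"}}  # stand-in for an AD group lookup

def covers(priv, action, resource):
    """A privilege covers a request if the action matches and its scope is a prefix of the resource."""
    return priv.action == action and resource[:len(priv.resource)] == priv.resource

def is_allowed(user, action, resource):
    """Resolve user -> groups -> roles -> privileges and check the request."""
    for group in DIRECTORY_GROUPS.get(user, set()):
        for role in GROUP_ROLES.get(group, set()):
            for priv in ROLE_PRIVILEGES.get(role, set()):
                if covers(priv, action, resource):
                    return True
    return False

if __name__ == "__main__":
    table = ("server1", "transactions", "payments")
    print(is_allowed("sam.smith", "read", table))   # access via the fraud analysts group
    DIRECTORY_GROUPS["sam.smith"] = {"marketing"}   # simulate moving Sam to another AD group
    print(is_allowed("sam.smith", "read", table))   # access is gone, no role edits needed
```

The point of the sketch is that no policy entry names Sam directly: changing the group membership alone changes what Sam can read, which mirrors managing access through AD groups.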
https://github.com/intel-hadoop/project-rhino/
Navigator Encrypt provides massively scalable, high-performance at-rest data encryption for all critical Hadoop data, in and out of HDFS
Navigator Encrypt uses process-based access controls to mitigate data custodian issues and prevent unauthorized access to data in clear text
Navigator Key Trustee provides secure, policy-driven key management for Navigator Encrypt. Key Trustee can also be used to secure and manage any security-related Hadoop assets, e.g. SSL certificates and SSH keys
Navigator Encrypt provides massively scalable, high-performance at-rest data encryption for all critical Hadoop data, in and out of HDFS. Hadoop data is encrypted transparently as it’s written to disk.
We can enable compliance (HIPAA, PCI-DSS, SOX, FERPA, EU data protection) initiatives that require at-rest encryption and key management
Fast, easy deployment and configuration with enterprise scalability
We provide a transparent layer between the application and file system that dramatically reduces performance impact of encryption
Fully integrated into Navigator.
Features
Navigator Encrypt uses process-based access controls to mitigate data custodian issues and prevent unauthorized access to data in clear text
We can ensure sensitive data and encryption keys are never stored in plain text or exposed publicly
We can make sure only applications that need access to plaintext data will have it
Navigator Encrypt can prevent admins and superusers from accessing encrypted data
You can establish a variety of key retrieval policies that dictate who or what can access the secure artifact
Keys are protected by Navigator Key Trustee
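To make the idea of process-based access control concrete, here is a minimal Python sketch. It is not Navigator Encrypt’s rule syntax or implementation; the `AclRule` type, the `may_read_plaintext` function, and the example paths are all hypothetical. It only illustrates the principle that the decision depends on which process is asking, not on which user (even root) is running it.

```python
from dataclasses import dataclass
from fnmatch import fnmatch

@dataclass(frozen=True)
class AclRule:
    path_pattern: str   # encrypted location the rule covers (glob pattern)
    process_path: str   # the one binary allowed to see plaintext there

# Hypothetical rule set: only the Java process may read the encrypted HDFS data directory.
RULES = [AclRule("/data/encrypted/dfs/*", "/usr/bin/java")]

def may_read_plaintext(process_path, user, file_path, rules=RULES):
    """Return True only if some rule trusts this exact process for this path.

    The user argument is deliberately ignored: being root or an admin
    grants nothing unless the calling process itself is trusted.
    """
    return any(
        rule.process_path == process_path and fnmatch(file_path, rule.path_pattern)
        for rule in rules
    )

if __name__ == "__main__":
    blk = "/data/encrypted/dfs/blk_1"
    print(may_read_plaintext("/usr/bin/java", "hdfs", blk))  # trusted process
    print(may_read_plaintext("/bin/cat", "root", blk))       # root with an untrusted tool
```

Anything not matched by a rule is denied by default, which is how a superuser running an arbitrary tool would be kept away from clear-text data in this model.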
Navigator Key Trustee is Cloudera’s key manager, and its primary use case is storing keys for Navigator Encrypt.
Key Trustee is a software-based key manager with packaged integrations to HSMs such as SafeNet Luna, Thales nShield, and RSA DPM, ensuring consistency with infosec policies that require these boxes to serve as the root of trust inside a corporate environment.
Key Trustee runs on a dedicated server and ensures the keys are stored separately from the data, which is a requirement for regulations like PCI.
In addition to key management, you can think of Key Trustee as a virtual safe deposit box that can be used to secure any type of sensitive asset for the cluster. SSL certificates, SSH keys, passwords, keytab files, truststore files, and more can all be secured with Key Trustee.
With Cloudera’s EDH, we have built-in security that’s comprehensive, transparent, and compliance-ready. Cloudera offers a set of security and governance capabilities that’s unmatched within the Hadoop ecosystem.