4. Access Control in Hadoop: Apache Ranger
>hdfs dfs -chmod -R 000 /apps/hive
[http://hortonworks.com/blog/best-practices-in-hdfs-authorization-with-apache-ranger]
5. Access Control in Hadoop: Apache Sentry
How do you ensure the consistency of the policies and the data?
[Mujumdar’15]
6. Access Control in Relational Databases
# Multi-tenancy for alice and bob on db1 and db2
grant all privileges on db1.* to 'alice'@'%';
grant all privileges on db2.* to 'bob'@'%';
Consistency of security and privileges guaranteed with foreign keys.
drop database db2; -- deletes associated privileges
16. Problem: Sensitive Data needs its own Cluster
NSA DataSet
User DataSet
Copy/cross-link between data sets
Alice has only one Kerberos Identity.
Neither attribute-based access control nor dynamic roles are supported in Hadoop.
Alice
17. Solution: Project-Specific UserIDs
Project NSA
Project Users
Member of
NSA__Alice
Users__Alice
Member of
HDFS enforces access control
How can we share DataSets between Projects?
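The naming scheme on this slide can be sketched as follows. This is a minimal illustration, assuming the project-specific user ID is just the project name and the user name joined by a double underscore, as in NSA__Alice and Users__Alice. The function names are hypothetical and are not taken from Hops.

```python
SEPARATOR = "__"  # assumed from the slide's examples NSA__Alice and Users__Alice


def project_user(project: str, user: str) -> str:
    """Build the project-specific user ID for one member of one project."""
    if SEPARATOR in project or SEPARATOR in user:
        raise ValueError("project and user names must not contain the separator")
    return f"{project}{SEPARATOR}{user}"


def split_project_user(project_user_id: str) -> tuple[str, str]:
    """Recover (project, user) from a project-specific user ID."""
    project, sep, user = project_user_id.partition(SEPARATOR)
    if not sep or not project or not user:
        raise ValueError(f"not a project-specific user ID: {project_user_id!r}")
    return project, user


# Alice is a member of two projects, so she gets two distinct identities.
alice_ids = [project_user(p, "Alice") for p in ("NSA", "Users")]
print(alice_ids)
```

Because each identity belongs to exactly one project, ordinary HDFS ownership and group checks are enough to keep the two projects' data apart.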
18. Sharing Data with First-Class DataSets
Project NSA
Project Users
Member of
DataSet
owns
Add members of Project NSA to the DataSet group
NSA__Alice
Users__Alice
Member of
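The sharing step on this slide can be modelled roughly as below. This is a sketch under assumptions: the DataSet is taken to be owned by Project Users, the DataSet group is modelled as a plain set of project-specific user IDs, and the class and method names are hypothetical rather than the Hops API.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    name: str
    members: set[str] = field(default_factory=set)

    def user_ids(self) -> set[str]:
        """Project-specific user IDs, e.g. NSA__Alice."""
        return {f"{self.name}__{member}" for member in self.members}


@dataclass
class DataSet:
    name: str
    owner: Project
    group: set[str] = field(default_factory=set)  # stands in for the DataSet group

    def __post_init__(self) -> None:
        # Members of the owning project are in the DataSet group from the start.
        self.group |= self.owner.user_ids()

    def share_with(self, project: Project) -> None:
        """Add the current members of another project to the DataSet group."""
        self.group |= project.user_ids()

    def can_access(self, project_user_id: str) -> bool:
        return project_user_id in self.group


users = Project("Users", {"Alice"})
nsa = Project("NSA", {"Alice"})
dataset = DataSet("dataset1", owner=users)

before = dataset.can_access("NSA__Alice")  # not shared yet
dataset.share_with(nsa)
after = dataset.can_access("NSA__Alice")   # shared via the DataSet group
print(before, after)
```

Sharing changes only group membership, so the data itself is neither copied nor cross-linked between projects.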
22. X.509 Certificate Per Project-Specific User
Alice@gmail.com
Authenticate
Add/Del
Users
Distributed
Database
Insert/Remove Certs
Project Mgr
Root
CA
Services
Hadoop
Spark
Kafka
etc
Cert Signing
Requests
23. Project
A project has an owner
A project is a collection of
- Members
- HDFS DataSets
- Kafka Topics
- Notebooks and Jobs
A project has quotas
project
dataset 1
dataset N
Topic 1
Topic N
Kafka
HDFS
24. Project Roles
Data Owner Privileges
- Import/Export data
- Manage Membership
- Share DataSets, Topics
Data Scientist Privileges
- Write and Run code
We delegate administration of privileges to users, just like GitHub.
25. Elastic Hadoop
Each Project has:
- YARN CPU Quota (in mins)
- HDFS Storage Quota (in GB/TB)
Uber-Style Pricing
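Per-project quota accounting of this kind might look like the sketch below. It assumes a quota is a simple ceiling that rejects any charge that would exceed it. The class, field, and method names are hypothetical, and pricing is left out because the slide gives no details on it.

```python
from dataclasses import dataclass


class QuotaExceeded(Exception):
    """Raised when a charge would push a project past one of its quotas."""


@dataclass
class ProjectQuota:
    yarn_cpu_minutes: float        # total YARN CPU minutes the project may use
    hdfs_storage_gb: float         # total HDFS storage the project may use
    used_cpu_minutes: float = 0.0
    used_storage_gb: float = 0.0

    def charge_cpu(self, minutes: float) -> None:
        if self.used_cpu_minutes + minutes > self.yarn_cpu_minutes:
            raise QuotaExceeded("YARN CPU quota exhausted")
        self.used_cpu_minutes += minutes

    def store(self, gigabytes: float) -> None:
        if self.used_storage_gb + gigabytes > self.hdfs_storage_gb:
            raise QuotaExceeded("HDFS storage quota exhausted")
        self.used_storage_gb += gigabytes


quota = ProjectQuota(yarn_cpu_minutes=600, hdfs_storage_gb=100)
quota.charge_cpu(450)
quota.store(80)

try:
    quota.charge_cpu(200)   # 450 + 200 exceeds the 600-minute quota
    rejected = False
except QuotaExceeded:
    rejected = True
print(rejected)
```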
27. Delegate Access Control to HDFS
HDFS enforces access control
- UserID per Project
- GroupID per Project and DataSet
Metadata Integrity using Foreign Keys
- Removing a project removes all users, groups, and (optionally) DataSets
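The foreign-key idea can be demonstrated with a small sketch. It uses an in-memory SQLite database purely for illustration, whereas the slides refer to a distributed database. The table names, column names, and the group ID format are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE project (name TEXT PRIMARY KEY);
CREATE TABLE project_user (
    user_id TEXT PRIMARY KEY,
    project TEXT NOT NULL REFERENCES project(name) ON DELETE CASCADE
);
CREATE TABLE project_group (
    group_id TEXT PRIMARY KEY,
    project  TEXT NOT NULL REFERENCES project(name) ON DELETE CASCADE
);
""")

conn.execute("INSERT INTO project VALUES ('NSA')")
conn.execute("INSERT INTO project_user VALUES ('NSA__Alice', 'NSA')")
conn.execute("INSERT INTO project_group VALUES ('NSA__dataset1', 'NSA')")

# Removing the project row cascades to its users and groups.
conn.execute("DELETE FROM project WHERE name = 'NSA'")

remaining_users = conn.execute("SELECT COUNT(*) FROM project_user").fetchone()[0]
remaining_groups = conn.execute("SELECT COUNT(*) FROM project_group").fetchone()[0]
print(remaining_users, remaining_groups)
```

With the cascade in place, no user or group row can outlive the project it belongs to, which is the consistency property the slide describes.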
28. Delegate Access Control to Kafka
Kafka brokers enforce access control with certificates
Principal name extracted from the X.509 Certificate is: projectName__userID
HopsAuthorizer enforces ACLs for the topic
ACLs are stored in the distributed database
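The authorization flow described above can be sketched in a few lines. This is not the HopsAuthorizer implementation. It assumes the principal is carried in the CN field of the certificate subject, uses a naive comma split that ignores escaped commas, and lets a dictionary stand in for the ACL table in the distributed database. All names are hypothetical.

```python
def principal_from_subject(subject_dn: str) -> str:
    """Return the CN value of a comma-separated X.509 subject string."""
    for part in subject_dn.split(","):
        key, _, value = part.strip().partition("=")
        if key.upper() == "CN":
            return value
    raise ValueError("subject has no CN field")


# (topic, principal) -> allowed operations; stands in for the ACL table.
acls = {
    ("topic1", "NSA__Alice"): {"read"},
}


def is_authorized(subject_dn: str, topic: str, operation: str) -> bool:
    """Allow the operation only if an ACL grants it to the certificate's principal."""
    principal = principal_from_subject(subject_dn)
    return operation in acls.get((topic, principal), set())


print(is_authorized("CN=NSA__Alice,O=Example", "topic1", "read"))
print(is_authorized("CN=Users__Alice,O=Example", "topic1", "read"))
```

Because the principal is project-specific, the same person is granted or denied access to a topic depending on which project's certificate she presents.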
29. Free Text Search for Metadata
Free-Text
Search
Distributed
Database
Elasticsearch
The Distributed Database is the Single Source of Truth.
A zero-overhead streaming API keeps Elasticsearch synchronized with the database.
MetaData
Designer
MetaData
Entry
32. Automated Installation
Vagrant/Chef to spin up on a single host
Karamel/Chef to deploy on AWS/GCE/OpenStack or on-premises
name: HopsWorks
ec2:
  type: m3.medium
cookbooks:
  hadoop:
    github: "hopshadoop/hopsworks-chef"
    version: "v0.1"
groups:
  ui:
    size: 1
    recipes:
      - hopsworks
  metadata:
    size: 2
    recipes:
      - hops::nn
      - hops::rm
  datanodes:
    size: 50
    recipes:
      - hops::dn
      - hops::nm
33. www.hops.site
A 2 MW datacenter research and test environment
5 lab modules, planned for up to 3,000-4,000 servers and 2,000-3,000 square meters
[Slide by Prof. Tor Björn Minde, CEO SICS North Swedish ICT AB]