Apache Ranger

© Hortonworks Inc. 2011 – 2015. All Rights Reserved
Apache Ranger
Rommel Garcia

Who Am I
• Solutions Engineer @hortonworks
• Security SME Lead @hortonworks
• Author “Virtualizing Hadoop: How to Install, Deploy, and Optimize Hadoop in a Virtualized Architecture”

5 Pillars of Security
• Authentication
• Authorization
• Audit
• Encryption
• Centralized Administration

Hadoop Security Tools
• AD/LDAP (authentication)
• Apache Knox (authentication)
• Kerberos (authentication)
• Apache Ranger (authorization, audit, KMS)
• HDFS TDE (data-at-rest encryption)
• Wire Encryption (data-in-transit protection)


Apache Ranger
• Provides centralized policy definition for authorizing access to resources
• Supported components as of v0.5:
– HDFS
– HBase
– Hive
– YARN
– Knox
– Storm
– Solr
– Kafka
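To make "centralized policy definition" concrete, the Python sketch below models one HDFS policy as a dictionary loosely shaped like the policy JSON exchanged by Ranger's public v2 REST API (`service`, `resources`, `policyItems`, `accesses`). The service name, path, users, and groups are hypothetical, and `is_allowed` is a simplified allow-only evaluation with a recursive prefix match on the path. It is not Ranger's actual policy engine.

```python
# Hypothetical HDFS policy, loosely modeled on Ranger's v2 policy JSON shape.
POLICY = {
    "service": "hadoopdev",  # hypothetical Ranger service (repository) name
    "name": "finance-data",
    "resources": {"path": {"values": ["/data/finance"], "isRecursive": True}},
    "policyItems": [
        {"accesses": [{"type": "read", "isAllowed": True},
                      {"type": "execute", "isAllowed": True}],
         "users": [], "groups": ["analysts"]},
        {"accesses": [{"type": "read", "isAllowed": True},
                      {"type": "write", "isAllowed": True},
                      {"type": "execute", "isAllowed": True}],
         "users": ["etl"], "groups": []},
    ],
}


def path_matches(resource: dict, path: str) -> bool:
    """Exact match, or a match below the base directory when isRecursive is set."""
    for base in resource["values"]:
        if path == base:
            return True
        if resource.get("isRecursive") and path.startswith(base.rstrip("/") + "/"):
            return True
    return False


def is_allowed(policy: dict, path: str, user: str, groups: set, access: str) -> bool:
    """Simplified allow-only evaluation of a single policy (no deny items)."""
    if not path_matches(policy["resources"]["path"], path):
        return False
    for item in policy["policyItems"]:
        grantee = user in item["users"] or bool(groups & set(item["groups"]))
        granted = any(a["type"] == access and a["isAllowed"] for a in item["accesses"])
        if grantee and granted:
            return True
    return False


print(is_allowed(POLICY, "/data/finance/q1.csv", "alice", {"analysts"}, "read"))   # True
print(is_allowed(POLICY, "/data/finance/q1.csv", "alice", {"analysts"}, "write"))  # False
print(is_allowed(POLICY, "/data/finance/q1.csv", "etl", set(), "write"))           # True
```

Granting through groups means a new analyst only needs to be added to the `analysts` group in the directory; the policy itself does not change.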

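Apache Ranger authZ Architecture
(Diagram: Ranger agents in HDFS, HBase, Hive, YARN, Knox, Storm, Solr, and Kafka communicate with the Ranger Policy Server and Audit Server, with the Administration Portal and REST APIs on top. Supporting components: DB, Solr, HDFS, and Log4j for policy and audit storage, KMS, and user/group sync from LDAP/AD.)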
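In this architecture the agents enforce policies locally: each plugin periodically downloads policies from the Policy Server and keeps a cached copy, so a temporary outage on the admin side does not stop enforcement. The sketch below illustrates that caching idea under stated assumptions; `PolicyCache`, the `fetch` callable, its version field, and the fake server are hypothetical and do not reflect Ranger's actual plugin classes.

```python
import time
from typing import Callable, List, Optional


class PolicyCache:
    """Hypothetical plugin-side policy cache (not Ranger's actual plugin class)."""

    def __init__(self, fetch: Callable[[Optional[int]], Optional[dict]],
                 poll_interval_s: float = 30.0):
        # Assumed contract: fetch(last_known_version) returns
        # {"version": int, "policies": [...]}, returns None when nothing changed,
        # and raises ConnectionError when the policy server is unreachable.
        self._fetch = fetch
        self._interval = poll_interval_s
        self._version: Optional[int] = None
        self._policies: List[dict] = []
        self._last_poll = float("-inf")

    def policies(self, now: Optional[float] = None) -> List[dict]:
        """Return cached policies, polling the server at most once per interval."""
        now = time.monotonic() if now is None else now
        if now - self._last_poll >= self._interval:
            self._last_poll = now
            try:
                update = self._fetch(self._version)
            except ConnectionError:
                update = None  # server down: keep enforcing the last good copy
            if update is not None:
                self._version = update["version"]
                self._policies = update["policies"]
        return self._policies


def fake_fetch(last_version):
    # Hypothetical server: serves version 1 once, then becomes unreachable.
    if last_version is None:
        return {"version": 1, "policies": [{"name": "finance-data"}]}
    raise ConnectionError("policy server unreachable")


cache = PolicyCache(fake_fetch, poll_interval_s=30)
print(cache.policies(now=0))   # first poll downloads version 1
print(cache.policies(now=45))  # poll fails; the cached copy is still returned
```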

Sample Simplified Workflow - HDFS
1. Admin sets policies for HDFS files/folders in the Policy Manager
2. Users access HDFS data: a data scientist runs a MapReduce job, users go through an application, and IT users use the CLI; all requests reach the NameNode
3. The NameNode uses the Ranger agent for authorization
4. Audit logs are pushed to the audit database
5. The NameNode provides resource access to the user/client
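The authorization, audit, and response steps of this workflow can be sketched as follows: the agent makes the decision, an audit event is written for every request whether it is allowed or denied, and only then is the result handed back. `AccessRequest`, `RangerAgentSketch`, the in-memory `allowed` set (standing in for the admin's policies), and the list standing in for the audit database are all hypothetical; the audit field names are modeled on Ranger's JSON audit events but should be treated as assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AccessRequest:
    user: str
    path: str
    access: str  # "read", "write", or "execute"
    client: str  # entry point, e.g. "mapreduce", "app", "cli"


class RangerAgentSketch:
    """Hypothetical stand-in for the Ranger agent inside the NameNode."""

    def __init__(self, allowed: set):
        self.allowed = allowed     # (user, path, access) tuples set by the admin
        self.audit_log: list = []  # stands in for the audit database

    def check(self, req: AccessRequest) -> bool:
        # Authorization decision.
        decision = (req.user, req.path, req.access) in self.allowed
        # Audit every request, allowed or denied.
        self.audit_log.append({
            "evtTime": datetime.now(timezone.utc).isoformat(),
            "reqUser": req.user,
            "resource": req.path,
            "access": req.access,
            "client": req.client,
            "result": int(decision),  # 1 = allowed, 0 = denied
        })
        # The NameNode grants or refuses access based on the decision.
        return decision


agent = RangerAgentSketch({("alice", "/data/sales", "read")})
print(agent.check(AccessRequest("alice", "/data/sales", "read", "cli")))  # True
print(agent.check(AccessRequest("bob", "/data/sales", "read", "app")))    # False
print(len(agent.audit_log))                                              # 2
```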

authZ Best Practice – POSIX + Ranger
• HDFS -> POSIX -> owned by hdfs -> Ranger ACLs
• Hive -> POSIX -> owned by hive -> Ranger ACLs
• HBase -> POSIX -> owned by hbase -> Ranger ACLs
• Solr -> native -> owned by solr -> Ranger ACLs
• Kafka -> owned by kafka -> Ranger ACLs

authZ Best Practice – Ranger
• Set POSIX permissions to 000 on all HDFS files, so that access is granted through Ranger policies
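The reason for 000: in Ranger's HDFS plugin, when no Ranger policy grants access, the request can fall back to native HDFS permission checks (controlled by the `xasecure.add-hadoop-authorization` setting in the plugin's security configuration). With mode 000, that fallback grants nothing to ordinary users, so Ranger policies become the single place where access is granted. The model below is a simplified sketch: it ignores HDFS extended ACLs and Ranger deny rules, and `HDFS_SUPERUSER = "hdfs"` is an assumption about the cluster.

```python
import stat

HDFS_SUPERUSER = "hdfs"  # assumption: the NameNode runs as the 'hdfs' superuser
BITS = {"read": 4, "write": 2, "execute": 1}


def posix_allows(mode, owner, group, user, user_groups, access):
    """Owner/group/other check on the low 9 permission bits (extended ACLs ignored)."""
    if user == HDFS_SUPERUSER:
        return True  # the HDFS superuser bypasses permission checks
    bit = BITS[access]
    if user == owner:
        return bool((mode >> 6) & bit)
    if group in user_groups:
        return bool((mode >> 3) & bit)
    return bool(mode & bit)


def authorize(ranger_allows, fallback_enabled, **posix_args):
    """Simplified model: a Ranger grant wins; otherwise optionally fall back to POSIX."""
    if ranger_allows:
        return True
    return fallback_enabled and posix_allows(**posix_args)


request = dict(owner="hdfs", group="hdfs", user="bob", user_groups={"analysts"}, access="read")
for mode in (0o755, 0o000):
    # No Ranger grant here, so the outcome depends entirely on the POSIX bits.
    print(stat.filemode(stat.S_IFREG | mode), authorize(False, True, mode=mode, **request))
# -rwxr-xr-x True
# ---------- False
```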

Ranger UserSync Best Practice
• Ensure LDAPS is used to integrate with Ranger
• Create an OU containing ONLY Hadoop users, for sync performance
• Only run usersync when necessary; consider:
– How many users are being added, and how often
– How many users are changing roles
– Too much syncing can degrade LDAP performance
• Do not sync anonymously
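These practices can be checked mechanically against the usersync configuration. The sketch below assumes property names as they appear in `ranger-ugsync-site.xml` for LDAP-based sync (verify them for your Ranger version). The sample values, the DN layout, and the one-hour minimum interval are hypothetical, and the OU test is only a crude prefix check.

```python
# Sample values are hypothetical; property names assumed from ranger-ugsync-site.xml.
SAMPLE = {
    "ranger.usersync.ldap.url": "ldaps://ldap.example.com:636",
    "ranger.usersync.ldap.binddn": "uid=ranger-sync,cn=users,cn=accounts,dc=example,dc=com",
    "ranger.usersync.ldap.user.searchbase": "ou=hadoop,dc=example,dc=com",
    "ranger.usersync.sleeptimeinmillisbetweensynccycle": "3600000",
}

MIN_SYNC_INTERVAL_MS = 60 * 60 * 1000  # hypothetical threshold: sync at most hourly


def check_usersync(props: dict) -> list:
    """Return best-practice violations; an empty list means none were found."""
    problems = []
    if not props.get("ranger.usersync.ldap.url", "").lower().startswith("ldaps://"):
        problems.append("use LDAPS (ldaps://) for the LDAP URL")
    if not props.get("ranger.usersync.ldap.binddn"):
        problems.append("bind with a service account; do not sync anonymously")
    # Crude heuristic: expect the search base to start at a dedicated OU.
    base = props.get("ranger.usersync.ldap.user.searchbase", "").lower()
    if not base.startswith("ou="):
        problems.append("scope the user search base to a dedicated Hadoop OU")
    interval = int(props.get("ranger.usersync.sleeptimeinmillisbetweensynccycle", "0"))
    if interval < MIN_SYNC_INTERVAL_MS:
        problems.append("sync less often to avoid loading the LDAP server")
    return problems


print(check_usersync(SAMPLE))  # []
print(len(check_usersync({"ranger.usersync.ldap.url": "ldap://ldap.example.com:389"})))  # 4
```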

Ranger Audit Locations
• HDFS
– Long-term storage that can be used to understand user event trends and predict anomalies
• RDBMS
– When auditors prefer SQL
– MySQL, Oracle, Postgres, SQL Server
• Solr
– Quick reporting metrics for understanding user event trends
• Log4j Appenders
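For the HDFS location, audit events can be mined for trends with very little code. The sketch below counts denied requests per user from JSON-lines audit records and flags users at or above a threshold. The field names (`reqUser`, and `result` with `0` meaning denied) are modeled on Ranger's JSON audit format but are assumptions here; the sample records and the threshold are made up.

```python
import json
from collections import Counter

# Made-up JSON-lines audit records for illustration.
SAMPLE_LINES = [
    '{"reqUser": "alice", "resource": "/data/sales", "access": "read", "result": 1}',
    '{"reqUser": "bob", "resource": "/data/finance", "access": "read", "result": 0}',
    '{"reqUser": "bob", "resource": "/data/finance", "access": "write", "result": 0}',
    '{"reqUser": "bob", "resource": "/data/hr", "access": "read", "result": 0}',
]


def denied_by_user(lines) -> Counter:
    """Count denied events (result == 0) per requesting user."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("result") == 0:
            counts[event.get("reqUser", "<unknown>")] += 1
    return counts


def flag_anomalies(counts: Counter, threshold: int = 3) -> list:
    """Users whose denial count meets a (hypothetical) alert threshold."""
    return sorted(user for user, n in counts.items() if n >= threshold)


counts = denied_by_user(SAMPLE_LINES)
print(flag_anomalies(counts))  # ['bob']
```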

Apache Ranger – ACLs & Audit Demo
Environment
• CentOS 6.6
• 2 VMs
• FreeIPA 2.0
• HDP 2.3
• Apache Ranger v0.5
• Kerberized 2-node cluster

Q&A

Ranger KMS + HDFS TDE
(Diagram: an HDFS client reads and writes file data through a crypto stream using the file's DEK. The NameNode holds encryption zones (attributes: EZ key ID, version; HDFS-6134) and encrypted files (attributes: EDEK, IV), and talks to the Key Management System (KMS, HADOOP-10433) through the KeyProvider API (HADOOP-10141). The KMS stores the EZ keys, returns EDEKs to the NameNode, and decrypts EDEKs into DEKs for the client.)
Acronym: Description
• EZ: Encryption Zone (an HDFS directory)
• EZK: Encryption Zone Key; the master key associated with all files in an EZ
• DEK: Data Encryption Key; a unique key associated with each file, generated by the KMS and encrypted with the EZ key
• EDEK: Encrypted DEK; the NameNode only has access to the encrypted DEK
• IV: Initialization Vector
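The key hierarchy in this diagram is envelope encryption: the KMS generates a per-file DEK and encrypts it with the zone's EZK, and the NameNode stores only the resulting EDEK. A client that is allowed to use the key asks the KMS to decrypt the EDEK, then encrypts or decrypts the file stream with the DEK. The sketch below walks that flow. HDFS TDE uses AES-CTR; to stay standard-library only, this sketch substitutes a toy SHA-256 counter-mode keystream. `ToyKMS` and the metadata record layout are hypothetical, and the sketch is an illustration, not something to protect real data with.

```python
import hashlib
import os


def keystream_xor(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Toy counter-mode cipher: XOR data with SHA-256(key || iv || counter) blocks.

    Stand-in for AES-CTR so the sketch needs only the standard library.
    Encryption and decryption are the same operation.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        block = hashlib.sha256(key + iv + counter).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)


class ToyKMS:
    """Hypothetical KMS: holds EZ keys (real KMS also enforces key ACLs, omitted here)."""

    def __init__(self):
        self._ez_keys = {}

    def create_key(self, name: str) -> None:
        self._ez_keys[name] = os.urandom(32)

    def generate_edek(self, ez_key_name: str) -> dict:
        dek = os.urandom(32)       # fresh per-file DEK
        wrap_iv = os.urandom(16)
        edek = keystream_xor(self._ez_keys[ez_key_name], wrap_iv, dek)
        return {"keyName": ez_key_name, "wrap_iv": wrap_iv, "edek": edek}  # no plain DEK

    def decrypt_edek(self, record: dict) -> bytes:
        return keystream_xor(self._ez_keys[record["keyName"]], record["wrap_iv"], record["edek"])


kms = ToyKMS()
kms.create_key("ezkey1")  # EZK for a hypothetical encryption zone

# NameNode side: on file create, store the EDEK plus a per-file IV as metadata.
file_meta = kms.generate_edek("ezkey1")
file_meta["file_iv"] = os.urandom(16)

# Client side: get the DEK from the KMS, then encrypt/decrypt the stream with it.
ciphertext = keystream_xor(kms.decrypt_edek(file_meta), file_meta["file_iv"], b"quarterly numbers")
plaintext = keystream_xor(kms.decrypt_edek(file_meta), file_meta["file_iv"], ciphertext)
print(plaintext)  # b'quarterly numbers'
```

Because only the EDEK is stored with the file metadata, compromising the NameNode alone does not reveal file contents; the client must also be authorized to use the EZ key in the KMS.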

Apache Ranger – KMS + TDE Demo
Exercise
• Create a key for the encryption zone
• Create an encryption zone
• Create a file
• Load the file into the encryption zone in HDFS
• List the encrypted file
• Print the encrypted file
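The exercise maps onto standard Hadoop CLI commands. The sketch below only assembles and prints them in order; set `dry_run=False` to execute them on a cluster node. The key name, zone path, and local file name are hypothetical, and the local file is assumed to exist already. Creating the zone and reading `/.reserved/raw` require the HDFS superuser, and the key must exist in the KMS before `hdfs crypto -createZone` can reference it.

```python
import shlex
import subprocess

KEY = "ezkey1"             # hypothetical key name
ZONE = "/secure"           # hypothetical encryption-zone path
LOCAL_FILE = "sample.txt"  # hypothetical local file, assumed to exist


def demo_steps(key: str = KEY, zone: str = ZONE, local_file: str = LOCAL_FILE) -> list:
    """Ordered CLI steps for the exercise; the key is created before the zone."""
    name = local_file.rsplit("/", 1)[-1]
    return [
        ["hadoop", "key", "create", key],                                   # EZ key in the KMS
        ["hdfs", "dfs", "-mkdir", "-p", zone],                              # zone dir must be empty
        ["hdfs", "crypto", "-createZone", "-keyName", key, "-path", zone],  # superuser only
        ["hdfs", "crypto", "-listZones"],
        ["hdfs", "dfs", "-put", local_file, f"{zone}/{name}"],              # encrypted on write
        ["hdfs", "dfs", "-ls", zone],                                       # list the file
        ["hdfs", "dfs", "-cat", f"{zone}/{name}"],                          # decrypted transparently
        ["hdfs", "dfs", "-cat", f"/.reserved/raw{zone}/{name}"],            # raw ciphertext
    ]


def run(steps: list, dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in steps:
        print("$", shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


run(demo_steps())
```

Reading the file through its normal path returns plaintext because decryption is transparent to authorized clients; the `/.reserved/raw` path shows the bytes as stored.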

Thank you!
Rommel Garcia
@rommelgarcia
/in/rommelgarcia
