Curb Your Insecurity with HDP
Tips for a Secure Cluster (with Spark too)
Hadoop Summit – San Jose
June 29th, 2016
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Pardeep Kumar
Sr. Systems Architect, NA Prof. Services
4+ years in Hadoop
Helping Fortune500 customers succeed in
their Hadoop journey
Setup, implement, migrate and secure some
of the largest clusters in North America
Security & Migration SME, HCC Guru
Loves Hadoop, Cricket and Kerberos ;)
pardeep.kumar@hortonworks.com
@hadooptutor
linkedin.com/in/pardeepkumarmishra
Ancil McBarnett
Sr. Solutions Engineer, NorthEast
Helping organizations design, implement,
operate and consume Hadoop and Big Data
Solutions. Specialize in Security and Hive
Tuning. HCC Guru.
Loves Cricket, and DJ Bravo Champion :D
amcbarnett@hortonworks.com
@mcbkingdom
linkedin.com/in/mcbkingdom
Hadoop Security in 4 Steps
Comprehensive Approach to Security
In order to protect any data system you must implement the following:
Administration
Central management and consistent security
How do I set policy across the entire cluster?
Authentication
Authenticate users and systems
Who am I/prove it?
Authorization
Provision access to data
What can I do?
Audit
Maintain a record of data access
What did I do?
Data Protection
Protect data at rest and in motion
How can I encrypt at rest and over the wire?
HDP Security: Comprehensive, Complete, Extensible
Perimeter Level Security
• Network Security (e.g. firewalls)
• Apache Knox (i.e. a gateway)
Authentication
• LDAP/AD - Kerberos
Data Protection
• Encrypts data in motion and data at rest: HDFS TDE with Ranger KMS;
refer to partner encryption solutions for broader needs
Authorization & Audit
• Consistent authorization controls
across all Apache components within
HDP: Apache Ranger
Authentication with Kerberos
Kerberos is a necessary evil; just do it!
Security Without Kerberos
Configure Kerberos – Ambari Wizard
Security With Kerberos
Apache Ranger
Apache Ranger
HDFS File Security
Hive Database and Table Security
Authorization and Audit
Authorization
Fine-grained access control
• HDFS – Folder, File
• Hive – Database, Table, Column
• HBase – Table, Column Family, Column
• Storm, Knox and more
Audit
Extensive user access auditing in
HDFS, Hive and HBase
• IP Address
• Resource type/ resource
• Timestamp
• Access granted or denied
Control access into system
Flexibility in defining policies
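To make the authorization and audit fields above concrete, here is a minimal sketch of a default-deny policy check that emits an audit record. It is not the Ranger API; the `Policy`, `AuditRecord` and `check_access` names, the glob-style resource patterns, and the sample users and paths are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: users allowed to perform actions on matching resources."""
    resource_type: str     # e.g. "hdfs", "hive", "hbase"
    resource_pattern: str  # glob over the resource name, e.g. "/data/finance/*"
    users: frozenset
    actions: frozenset


@dataclass(frozen=True)
class AuditRecord:
    """One audit entry carrying the fields listed on the slide."""
    timestamp: datetime
    user: str
    ip_address: str
    resource_type: str
    resource: str
    action: str
    granted: bool


def check_access(policies, user, ip_address, resource_type, resource, action):
    """Grant only if some policy matches (default deny) and record the decision."""
    granted = any(
        p.resource_type == resource_type
        and fnmatchcase(resource, p.resource_pattern)
        and user in p.users
        and action in p.actions
        for p in policies
    )
    return AuditRecord(datetime.now(timezone.utc), user, ip_address,
                       resource_type, resource, action, granted)


# Hypothetical sample policies: one HDFS folder, one Hive column.
policies = [
    Policy("hdfs", "/data/finance/*", frozenset({"john"}), frozenset({"read"})),
    Policy("hive", "sales.orders.amount", frozenset({"john"}), frozenset({"select"})),
]

allowed = check_access(policies, "john", "10.0.0.5", "hdfs", "/data/finance/q1.csv", "read")
denied = check_access(policies, "john", "10.0.0.5", "hdfs", "/data/hr/salaries.csv", "read")
```

Every decision, granted or denied, produces a record, which mirrors the "access granted or denied" audit field on the slide.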
REST API Security with Apache Knox
Hadoop REST APIs
Useful for connecting to Hadoop from outside the cluster
When more client language flexibility is required
– e.g. a Java binding is not an option
Challenges
– Client must have knowledge of cluster topology
– Ports must be opened to clients outside the cluster (in some cases on every host)
Service   API
WebHDFS   Supports HDFS user operations including reading files, writing to files, making directories, changing permissions and renaming.
WebHCat   Job control for MapReduce, Pig and Hive jobs, and HCatalog DDL commands.
Hive      Hive REST API operations
HBase     HBase REST API operations
Oozie     Job submission and management, and Oozie administration.
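As an illustration of the WebHDFS operations listed in the table, the sketch below builds request URLs in the `/webhdfs/v1/<path>?op=...` form. The `webhdfs_url` helper, the host name and the sample paths are assumptions for illustration; port 50070 is the direct NameNode port used elsewhere in this deck. Nothing is sent over the network here.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS REST URL: http://host:port/webhdfs/v1/<path>?op=<OP>&..."""
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# Operations named on the slide, paired with the HTTP method WebHDFS expects.
NN = "namenode-host"  # hypothetical host name
planned_requests = [
    ("GET", webhdfs_url(NN, 50070, "/tmp/demo/a.txt", "OPEN")),            # read a file
    ("PUT", webhdfs_url(NN, 50070, "/tmp/demo/a.txt", "CREATE")),          # write a file
    ("PUT", webhdfs_url(NN, 50070, "/tmp/demo", "MKDIRS")),                # make a directory
    ("PUT", webhdfs_url(NN, 50070, "/tmp/demo/a.txt", "SETPERMISSION",
                        permission="750")),                                # change permissions
    ("PUT", webhdfs_url(NN, 50070, "/tmp/demo/a.txt", "RENAME",
                        destination="/tmp/demo/b.txt")),                   # rename
]

for method, url in planned_requests:
    print(method, url)
```

Note that every URL names the NameNode host and port directly, which is the "client must have knowledge of cluster topology" challenge listed above.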
Authentication: API Security with Knox
Incubated and led by Hortonworks, Apache Knox extends the reach of the Hadoop REST API without Kerberos complexities
Single, simple point of access for a cluster
• Kerberos Encapsulation
• Single Hadoop access point
• REST API hierarchy
• Consolidated API calls
• Multi-cluster support
Central controls ensure consistency across one or more clusters
• Eliminates SSH “edge node”
• Central API management
• Central audit control
• Service level authorization
Integrated with existing systems to simplify identity maintenance
• SSO Integration: Siteminder and OAM
• LDAP and AD integration
Hadoop REST API with Knox
Service   Direct URL                             Knox URL
WebHDFS   http://namenode-host:50070/webhdfs     https://knox-host:8443/webhdfs
WebHCat   http://webhcat-host:50111/templeton    https://knox-host:8443/templeton
Oozie     http://ooziehost:11000/oozie           https://knox-host:8443/oozie
HBase     http://hbasehost:60080                 https://knox-host:8443/hbase
Hive      http://hivehost:10001/cliservice       https://knox-host:8443/hive
YARN      http://yarn-host:yarn-port/ws          https://knox-host:8443/resourcemanager
Direct URLs: masters could be on many different hosts
Knox URLs: one host, one port; consistent paths; SSL config at one host
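The mapping in this table can be sketched as a small URL rewriter. The path pairs are copied from the table; the `to_knox_url` helper and its `gateway_prefix` parameter are hypothetical, and the prefix is left empty to match the simplified URLs on the slide (a real deployment may place a gateway path and topology name in front of the service path, so check your own Knox configuration). HBase is left out because its direct URL in the table has no path to match on.

```python
from urllib.parse import urlsplit

# Direct-URL path prefix -> Knox path, copied from the table above.
KNOX_PATHS = {
    "/webhdfs": "/webhdfs",
    "/templeton": "/templeton",
    "/oozie": "/oozie",
    "/cliservice": "/hive",
    "/ws": "/resourcemanager",
}


def to_knox_url(direct_url, knox_host="knox-host", knox_port=8443, gateway_prefix=""):
    """Rewrite a direct service URL so it goes through the single Knox host and port."""
    parts = urlsplit(direct_url)
    for direct_prefix, knox_path in KNOX_PATHS.items():
        if parts.path == direct_prefix or parts.path.startswith(direct_prefix + "/"):
            rest = parts.path[len(direct_prefix):]          # keep the remainder of the path
            query = f"?{parts.query}" if parts.query else ""
            return f"https://{knox_host}:{knox_port}{gateway_prefix}{knox_path}{rest}{query}"
    raise ValueError(f"no Knox mapping known for {direct_url}")


print(to_knox_url("http://namenode-host:50070/webhdfs/v1/tmp?op=LISTSTATUS"))
print(to_knox_url("http://hivehost:10001/cliservice"))
```

Whatever master the direct URL pointed at, the rewritten URL always targets the same host, port and scheme, which is the point the table makes.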
Hadoop REST API Security: Drill-Down
[Diagram: a REST client reaches Knox Gateway instances (GW) over HTTP; the gateways sit in a DMZ between two firewalls, behind a load balancer (LB), and use LDAP to reach the enterprise identity provider (LDAP/AD). Knox forwards HTTP to Hadoop Cluster 1 and Hadoop Cluster 2, each shown with masters (RM, NN, WebHCat, Oozie, HS2, HBase) and slaves (DN, NM). An edge node with the Hadoop CLIs connects over RPC.]
Data Protection
Data Protection
HDP allows you to apply data protection policy at different layers across the Hadoop stack
Layer                What?                              How?
Storage and Access   Encrypt data while it is at rest   HDFS Transparent Data Encryption, partners, HBase encryption, OS-level encryption
Transmission         Encrypt data as it moves           SSL, SASL, RPC
Points of Communication
[Diagram: numbered points of communication between the client, the Hadoop cluster and its nodes: WebHDFS, DataTransferProtocol, RPC, JDBC/ODBC and M/R shuffle.]
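As a rough guide to how each point of communication is usually encrypted, the sketch below pairs the channels with the configuration property that, to my understanding of stock Hadoop and Hive, turns on wire encryption for it. The pairing, file names and values are assumptions to verify against your HDP version, and each setting also needs keystores or SASL set up separately.

```python
from collections import defaultdict
from xml.sax.saxutils import escape

# channel -> (config file, property, value); an assumed mapping, not taken from the slide.
WIRE_ENCRYPTION = {
    "WebHDFS": ("hdfs-site.xml", "dfs.http.policy", "HTTPS_ONLY"),
    "DataTransferProtocol": ("hdfs-site.xml", "dfs.encrypt.data.transfer", "true"),
    "RPC": ("core-site.xml", "hadoop.rpc.protection", "privacy"),
    "M/R Shuffle": ("mapred-site.xml", "mapreduce.shuffle.ssl.enabled", "true"),
    "JDBC/ODBC": ("hive-site.xml", "hive.server2.use.SSL", "true"),
}


def render_property(name, value):
    """Render one Hadoop-style <property> element."""
    return (f"<property>\n  <name>{escape(name)}</name>\n"
            f"  <value>{escape(value)}</value>\n</property>")


def properties_by_file(settings):
    """Group the rendered properties by the config file they belong to."""
    grouped = defaultdict(list)
    for _channel, (filename, name, value) in settings.items():
        grouped[filename].append(render_property(name, value))
    return dict(grouped)


by_file = properties_by_file(WIRE_ENCRYPTION)
for filename, snippets in by_file.items():
    print(f"# {filename}")
    print("\n".join(snippets))
```

On an Ambari-managed cluster these values would normally be set through Ambari rather than by editing the XML files by hand.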
Data Protection - HDFS Encryption
[Diagram: HDFS clients and the NameNode work with encrypted files inside an HDFS encryption zone. Keys come from a Key Management System (KMS) through the KeyProvider API, which is the partner integration point (for example stateless key management). Surrounding stack labels: Data Access, Data Management, Security Partners, YARN.]
• Leverage native HDFS Transparent Data Encryption or commercial solutions such as Protegrity
• Hortonworks is collaborating with partners to deliver enterprise-scale key management and more choices to customers
• Open source KMS with Ranger
• Or partner with commercial KMS solutions, e.g. Voltage KMS
- Partner joint engineering resources
- Voltage Stateless Key Management integrated with KeyProvider API
Open Source Key Management
Only HDP offers open source and commercial choices for key management
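For the native TDE option, the usual flow is to create a key in the KMS, create an empty directory, and turn that directory into an encryption zone. The sketch below only assembles the command lines and does not run them; the key name and zone path are hypothetical, and the commands are the stock `hadoop key` and `hdfs crypto` CLIs as I understand them.

```python
import shlex


def tde_commands(key_name, zone_path):
    """Return the command lines for creating a key and an encryption zone (not executed here)."""
    return [
        ["hadoop", "key", "create", key_name],                       # create the zone key in the KMS
        ["hdfs", "dfs", "-mkdir", "-p", zone_path],                  # the zone must be an empty directory
        ["hdfs", "crypto", "-createZone", "-keyName", key_name,
         "-path", zone_path],                                        # mark the directory as a zone
        ["hdfs", "crypto", "-listZones"],                            # confirm the zone exists
    ]


# Hypothetical key name and path, for illustration only.
for argv in tde_commands("demo_key", "/secure/demo"):
    print(shlex.join(argv))
```

Files written under the zone are then encrypted and decrypted transparently for authorized HDFS clients; who may use the key is controlled in the KMS.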
Demo Transparent Data Encryption
Securing Spark Deployments
Spark - Authentication
Spark leverages Kerberos on YARN
[Diagram: John Doe, the KDC, and a Hadoop cluster with the Spark AM and the NameNode (NN); steps numbered 1 to 7]
– kinit
– Get Service Ticket (ST) for Spark
– Use Spark ST, submit Spark Job
– Spark gets Namenode (NN) service ticket
– YARN launches Spark Executors using John Doe’s identity
– Executor reads from HDFS using John Doe’s delegation token
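From the user's side, the flow above starts with `kinit` followed by `spark-submit` on YARN. The sketch below assembles both command lines without running them. The principal, keytab path, class name and jar are hypothetical; `--principal` and `--keytab` are the spark-submit options for YARN that let Spark log in from a keytab, which is relevant for long-running jobs.

```python
def kinit_command(principal, keytab=None):
    """Build a kinit command line; with a keytab no password prompt is needed."""
    argv = ["kinit"]
    if keytab:
        argv += ["-k", "-t", keytab]
    argv.append(principal)
    return argv


def spark_submit_command(app_jar, main_class, principal=None, keytab=None):
    """Build a spark-submit command line for YARN cluster mode (not executed here)."""
    argv = ["spark-submit", "--master", "yarn", "--deploy-mode", "cluster"]
    if principal and keytab:
        # Lets Spark obtain fresh credentials itself instead of relying on the caller's ticket.
        argv += ["--principal", principal, "--keytab", keytab]
    argv += ["--class", main_class, app_jar]
    return argv


# Hypothetical principal, keytab, class and jar.
PRINCIPAL = "john@EXAMPLE.COM"
KEYTAB = "/etc/security/keytabs/john.keytab"

print(" ".join(kinit_command(PRINCIPAL)))
print(" ".join(spark_submit_command("app.jar", "com.example.Job", PRINCIPAL, KEYTAB)))
```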
Spark – Authorization
[Diagram: John Doe, the KDC, Ranger, a YARN cluster (A, B, C) and HDFS]
– Client gets service ticket for Spark
– Use Spark ST, submit Spark Job
– Get Namenode (NN) service ticket
– Executors read from HDFS
Ranger: Can John launch this job? Can John read this file?
Spark – Channel Encryption - Example
Channels: Shuffle Data; Control/RPC; Shuffle BlockTransfer; Read/Write Data; FS (Broadcast, File Download)
Settings:
– spark.authenticate = true (leverage YARN to distribute keys)
– spark.authenticate.enableSaslEncryption = true
– spark.ssl.enabled = true
– NM > Ex leverages YARN-based SSL
– Read/Write Data depends on the data source: for HDFS, RPC (RC4 | 3DES) or SSL for WebHDFS
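The three Spark properties named on this slide can be passed per job as `--conf` flags or placed in `spark-defaults.conf`. A minimal sketch that renders both forms follows; the `conf_args` and `defaults_lines` helper names are hypothetical, and the SSL setting additionally needs keystore and truststore properties that the slide does not list.

```python
# The three properties named on the slide.
CHANNEL_ENCRYPTION_CONF = {
    "spark.authenticate": "true",
    "spark.authenticate.enableSaslEncryption": "true",
    "spark.ssl.enabled": "true",
}


def conf_args(conf):
    """Render properties as spark-submit arguments: --conf key=value ..."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


def defaults_lines(conf):
    """Render properties as spark-defaults.conf lines: key<space>value."""
    return [f"{key} {value}" for key, value in sorted(conf.items())]


print(" ".join(["spark-submit"] + conf_args(CHANNEL_ENCRYPTION_CONF)))
print("\n".join(defaults_lines(CHANNEL_ENCRYPTION_CONF)))
```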
Gotchas with Spark Security
• Client > Spark Thrift Server > Spark Executors: no identity propagation on 2nd hop
– Forces STS to run as Hive user to read all data
– Reduces security
– Use SparkSQL via shell or programmatic API
– https://issues.apache.org/jira/browse/SPARK-5159
• SparkSQL: granular security unavailable
– Ranger integration will solve this problem (refer to the talk in Room 210A on Security in Spark and Hive)
– Brings row/column-level and masking features to SparkSQL
• Spark + HBase with Kerberos
– Issue fixed in Spark 1.4 (SPARK-6918)
• Spark Streaming + Kafka + Kerberos + SSL
– Issues fixed in HDP 2.4.x
• Spark jobs > 72 hours
– Kerberos token not renewed, fixed in Spark 1.5+
Questions??
