SlideShare a Scribd company logo
© Hortonworks Inc. 2015
Hadoop and Kerberos:
The madness beyond the
gate
Steve Loughran
stevel@hortonworks.com
@steveloughran
2015
Page 2
Me: Before Kerberos
© Hortonworks Inc.
Page 3
Me: After Kerberos
© Hortonworks Inc.
Page 4
HP Lovecraft Kerberos
Evil lurking in New England MIT Project Athena
Ancient, inhuman deities Kerberos Domain Controller
Manuscripts to drive the reader
insane
IETF RFC 4120
Entities never spoken of aloud UserGroupInformation
Doomed explorers of darkness You
© Hortonworks Inc. 2015
Leave now if you want
to retain your life of
naïve innocence
Page 5
© Hortonworks Inc.
Page 6
© Hortonworks Inc. 2015
export HADOOP_USER="root"
Page 7
© Hortonworks Inc. 2015
Modern Hadoop clusters
are locked down
through Kerberos
Page 8
© Hortonworks Inc. 2015
Discover Kerberos
before Kerberos
discovers you
Page 9
© Hortonworks Inc. 2015
Kerberos:
the gateway to hell
Page 10
© Hortonworks Inc.
This is not a metaphor
Art: Andrés Álvarez Iglesias
© Hortonworks Inc. 2015
KP
Kerberos is the gateway
Page 12
Authentication Service
Ticket Granting Service
Principal
user@REALM
user/hostname@REALM
(P, TGS, n1)
{KP.TGS, n1}KP, {ticket(P,TGS)} KTGS
Ticket(P, TGS) =
(TGS, P, tstart, tend, KPT)
KP
{KP.S, n2}KP, {ticket(P,S)} KS
{auth(P)}KP.TGS,{ticket(P,TGS)}KTGS,S,n2
KTGS
Kerberos Domain ControllerClient
auth(P)KP.TGS = {P, time)}KP.TGS
© Hortonworks Inc
Every service is a principal
alice@REALM
bob@REALM
oozie/ooziehost@REALM
namenode/nn1@REALM
hdfs/_HOST@REALM
hdfs/r04s12@REALM
hdfs/r04s13@REALM
yarn/_HOST@REALM
yarn/r04s12@REALM
HTTP/_HOST@REALM
Page 13
short names:
alice
bob
oozie
namenode
hdfs
yarn
HTTP
© Hortonworks Inc.
Page 14
Entering the darkness
© Hortonworks Inc.
Page 16
© Hortonworks Inc. 2015
HDFS Bootstrap: Kerberos Login
Page 17
shared keytab in /etc/hadoop
log in to kerberos
datanode/_HOST@REALM
tickets for TGS
namenode/nn@REALM
© Hortonworks Inc. 2015
HDFS Bootstrap: DNs register with NN
Page 18
shared keytab in /etc/hadoop
DN registration
Ticket for namenode/nn@REALM
ExportedBlockKeys
Request ticket for namenode/nn@REALM
namenode/nn@REALM
datanode/_HOST@REALM
© Hortonworks Inc.
Hadoop Tokens
• Issued and tracked by individual services
(HDFS, WebHDFS, Timeline Server, YARN RM, …)
• Grant some form of access:
Block tokens, Delegation Tokens
• Can be passed on to other processes
• Renewable via service APIs (RPC, HTTP)
• Revocable in server via service APIs
Page 19
read: O'Malley 2009, Hadoop Security Architecture
© Hortonworks Inc. 2015
HDFS IO: Block Tokens
Page 20
alice@REALM
Obtain ticket for namenode/nn@REALM
BlockToken
BlockToken
BlockToken: userId, (BlockPoolId, BlockId), keyId, expiryDate, access-modes
namenode/nn@REALM
open("file")
© Hortonworks Inc. 2015
service/host@REALM
Delegation Tokens delegate access
Page 21
alice@REALM BlockToken
HDFS
Delegation
Token
BlockToken
HDFS
Delegation
Token
HDFS
Delegation
Token
namenode/nn@REALM
Token
Obtain ticket for namenode/nn@REALM
Request delegation
token
© Hortonworks Inc. 2015
Launch Context
YARN app launch
Page 22
alice@REALM
HDFS
Delegation
Token
HDFS
resourcemanager/rm@REALM
nodemanager/_HOST@REALMalice
Launch Context
AM/RM
HDFS AM/RM
HDFS
HDFS
HDFS
AM/RM
namenode/nn@REALM
Obtain ticket for resourcemanager/rm@REALM
Request delegation
token
AM/RM
Token
Obtain tickvet for namenode/nn@REALM
AM/RM'
AM/RM'
AM/RM'
Refresh AM/RM
© Hortonworks Inc
That which must not be named: UGI
if(!UserGroupInformation.isSecurityEnabled()) {
stayInALifeOfNaiveInnocence();
} else {
sufferTheEnternalPainOfKerberos();
}
UserGroupInformation.checkTGTAndReloginFromKeytab();
UserGroupInformation.getLoginUser() // principal logged in as
UserGroupInformation.getCurrentUser() // principal acting as
Page 23
© Hortonworks Inc
UGI.doAs()
UserGroupInformation bob =
UserGroupInformation.createProxyUser("bob",
UserGroupInformation.getLoginUser());
FileSystem userFS = bob.doAs(
new PrivilegedExceptionAction<FileSystem>() {
public FileSystem run() throws Exception {
return FileSystem.get(FileSystem.getDefaultUri(), conf);
}
});
Page 24
© Hortonworks Inc
Hadoop RPC
@KerberosInfo(serverPrincipal = "my.kerberos.principal")
public interface MyRpc extends VersionedProtocol { … }
public class MyRpcPolicyProvider extends PolicyProvider {
public Service[] getServices() {
return new Service[] {
new Service("my.protocol.acl", MyRpc.class)
};
}
}
public class MyRpcSecurityInfo extends SecurityInfo { … }
META-INF/services/org.apache.hadoop.security.SecurityInfo
org.example.rpc.MyRpcSecurityInfo
Page 25
© Hortonworks Inc
IPC Server: get the current user identity
Messages.KillResponse killContainer(Messages.KillRequest request) {
UserGroupInformation callerUGI;
try {
callerUGI = UserGroupInformation.getCurrentUser();
} catch (IOException ie) {
LOG.info("Error getting UGI ", ie);
AuditLogger.logFailure("UNKNOWN", "Error getting UGI");
throw RPCUtil.getRemoteException(ie);
}
…
Page 26
© Hortonworks Inc
IPC Server: Authorize
String user = callerUGI.getShortUserName();
if (!checkAccess(callerUGI, MODIFY)) {
AuditLog.unauthorized(user,
KILL_CONTAINER_REQUEST,
"User doesn't have permissions to " + MODIFY);
throw RPCUtil.getRemoteException(
new AccessControlException(
+ user + " lacks access "
+ MODIFY_APP.name()));
}
AuditLog.authorized(user, KILL_CONTAINER_REQUEST)
Page 27
© Hortonworks Inc. 2015
SASL: RFC4422
Page 28
© Hortonworks Inc.
REST: SPNEGO (+ Delegation tokens)
Page 29
• Jersey + java.net
• httpclient? “if lucky it'll work”
HADOOP-11825: Move timeline client
Jersey+Kerberos+UGI support into a public implementation
© Hortonworks Inc.
Testing
Page 30
© Hortonworks Inc.
Error messages to fear
Art: Andrés Álvarez Iglesias
Failure unspecified at GSS-API level (Checksum failed)
No valid credentials provided (Failed to find any Kerberos tgt)
Server not found in Kerberos database
Clock skew too great
Principal not found
No valid credentials provided (Illegal key size)
© Hortonworks Inc.
Topics Avoided Not Covered
• Zookeeper
• JAAS
• Trying to use HTTPS in a YARN application
• Trying to use Full REST in a YARN application
• System properties to debug Kerberos & SPNEGO
• Group management
• HADOOP_PROXY_USER
Page 32
© Hortonworks Inc.
gitbook.com/@steveloughran
Questions?
Art: Andrés Álvarez Iglesias
© Hortonworks Inc.
Zookeeper
• SASL to negotiate security:
System.setProperty("zookeeper.sasl.client", "true");
• Permissions are not transitive down the tree
Page 34
List<ACL> perms = new ArrayList<>();
if (UserGroupInformation.isSecurityEnabled()) {
perms(new ACL(ZooDefs.Perms.ALL, ZooDefs.Ids.AUTH_IDS));
perms.add(new ACL(ZooDefs.Perms.READ,ZooDefs.Ids.ANYONE_ID_UNSAFE));
} else {
perms.add(new ACL(ZooDefs.Perms.ALL, ZooDefs.Ids.ANYONE_ID_UNSAFE));
}
zk.createPath(path, null, perms, CreateMode.PERSISTENT);
© Hortonworks Inc
System Properties for debugging
-Dsun.security.krb5.debug=true
-Dsun.security.spnego.debug=true
export HADOOP_JAAS_DEBUG=true
Page 35
© Hortonworks Inc.
Services
• RPC authentication via annotations & metadata in JAR
• YARN Web UIs: rely on RM proxy for authentication
• Authentication != Authorization
• Add audit logs on service endpoints
• YARN services: come up with a token refresh strategy:
keytab everywhere; keytab in AM; update from client
Page 36
© Hortonworks Inc.
JAAS
• Java Authentication and Authorization Service
• Core Kerberos classes and types (Principal)
• Text files to configure
–Different for different JVMs
–Need to double escape  for windows paths
• UGI handles setting up a JAAS context & logging in
Page 37
© Hortonworks Inc.
Glossary
• Simple Authentication and Security Layer (SASL)
• GSSAPI Generic Security Service Application Program
Interface (RFC-2743+ others)
• JAAS: Java Authentication and Authorization Service
• Simple and Protected GSSAPI Negotiation Mechanism
(SPNEGO)
Page 38

More Related Content

What's hot

Hadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayHadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox Gateway
DataWorks Summit
 
Hadoop Security Today and Tomorrow
Hadoop Security Today and TomorrowHadoop Security Today and Tomorrow
Hadoop Security Today and Tomorrow
DataWorks Summit
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
DataWorks Summit
 

What's hot (20)

Hadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayHadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox Gateway
 
Hadoop security
Hadoop securityHadoop security
Hadoop security
 
HadoopCon2015 Multi-Cluster Live Synchronization with Kerberos Federated Hadoop
HadoopCon2015 Multi-Cluster Live Synchronization with Kerberos Federated HadoopHadoopCon2015 Multi-Cluster Live Synchronization with Kerberos Federated Hadoop
HadoopCon2015 Multi-Cluster Live Synchronization with Kerberos Federated Hadoop
 
Hadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxHadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache Knox
 
Hadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, FutureHadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, Future
 
Technical tips for secure Apache Hadoop cluster #ApacheConAsia #ApacheCon
Technical tips for secure Apache Hadoop cluster #ApacheConAsia #ApacheConTechnical tips for secure Apache Hadoop cluster #ApacheConAsia #ApacheCon
Technical tips for secure Apache Hadoop cluster #ApacheConAsia #ApacheCon
 
YARN Services
YARN ServicesYARN Services
YARN Services
 
Hadoop security overview_hit2012_1117rev
Hadoop security overview_hit2012_1117revHadoop security overview_hit2012_1117rev
Hadoop security overview_hit2012_1117rev
 
Apache Knox setup and hive and hdfs Access using KNOX
Apache Knox setup and hive and hdfs Access using KNOXApache Knox setup and hive and hdfs Access using KNOX
Apache Knox setup and hive and hdfs Access using KNOX
 
Hadoop Security Today and Tomorrow
Hadoop Security Today and TomorrowHadoop Security Today and Tomorrow
Hadoop Security Today and Tomorrow
 
Hadoop Security Now and Future
Hadoop Security Now and FutureHadoop Security Now and Future
Hadoop Security Now and Future
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security Architecture
 
Hadoop security
Hadoop securityHadoop security
Hadoop security
 
CBlocks - Posix compliant files systems for HDFS
CBlocks - Posix compliant files systems for HDFSCBlocks - Posix compliant files systems for HDFS
CBlocks - Posix compliant files systems for HDFS
 
Hadoop ClusterClient Security Using Kerberos
Hadoop ClusterClient Security Using KerberosHadoop ClusterClient Security Using Kerberos
Hadoop ClusterClient Security Using Kerberos
 
Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...
Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...
Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...
 
Structor - Automated Building of Virtual Hadoop Clusters
Structor - Automated Building of Virtual Hadoop ClustersStructor - Automated Building of Virtual Hadoop Clusters
Structor - Automated Building of Virtual Hadoop Clusters
 
Open Source Security Tools for Big Data
Open Source Security Tools for Big DataOpen Source Security Tools for Big Data
Open Source Security Tools for Big Data
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
 
Effective Spark on Multi-Tenant Clusters
Effective Spark on Multi-Tenant ClustersEffective Spark on Multi-Tenant Clusters
Effective Spark on Multi-Tenant Clusters
 

Viewers also liked

Hadoop Administration pdf
Hadoop Administration pdfHadoop Administration pdf
Hadoop Administration pdf
Edureka!
 

Viewers also liked (18)

Hadoop Security: Overview
Hadoop Security: OverviewHadoop Security: Overview
Hadoop Security: Overview
 
Administer Hadoop Cluster
Administer Hadoop ClusterAdminister Hadoop Cluster
Administer Hadoop Cluster
 
Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015
 
Hadoop Operations: How to Secure and Control Cluster Access
Hadoop Operations: How to Secure and Control Cluster AccessHadoop Operations: How to Secure and Control Cluster Access
Hadoop Operations: How to Secure and Control Cluster Access
 
Introduction to sentry
Introduction to sentryIntroduction to sentry
Introduction to sentry
 
Apache Solr Workshop
Apache Solr WorkshopApache Solr Workshop
Apache Solr Workshop
 
Secure Hadoop Cluster With Kerberos
Secure Hadoop Cluster With KerberosSecure Hadoop Cluster With Kerberos
Secure Hadoop Cluster With Kerberos
 
Apache Sentry for Hadoop security
Apache Sentry for Hadoop securityApache Sentry for Hadoop security
Apache Sentry for Hadoop security
 
Deploying Enterprise-grade Security for Hadoop
Deploying Enterprise-grade Security for HadoopDeploying Enterprise-grade Security for Hadoop
Deploying Enterprise-grade Security for Hadoop
 
Hadoop admin
Hadoop adminHadoop admin
Hadoop admin
 
Introduction to Cloudera's Administrator Training for Apache Hadoop
Introduction to Cloudera's Administrator Training for Apache HadoopIntroduction to Cloudera's Administrator Training for Apache Hadoop
Introduction to Cloudera's Administrator Training for Apache Hadoop
 
Hadoop and Data Access Security
Hadoop and Data Access SecurityHadoop and Data Access Security
Hadoop and Data Access Security
 
Sentry - An Introduction
Sentry - An Introduction Sentry - An Introduction
Sentry - An Introduction
 
Hadoop Administration pdf
Hadoop Administration pdfHadoop Administration pdf
Hadoop Administration pdf
 
Kerberos, Token and Hadoop
Kerberos, Token and HadoopKerberos, Token and Hadoop
Kerberos, Token and Hadoop
 
Hadoop and Kerberos
Hadoop and KerberosHadoop and Kerberos
Hadoop and Kerberos
 
Kerberos
KerberosKerberos
Kerberos
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 

Similar to Hadoop and Kerberos: the Madness Beyond the Gate

Spark summit-east-dowling-feb2017-full
Spark summit-east-dowling-feb2017-fullSpark summit-east-dowling-feb2017-full
Spark summit-east-dowling-feb2017-full
Jim Dowling
 
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
Spark Summit
 

Similar to Hadoop and Kerberos: the Madness Beyond the Gate (20)

Practical Kerberos
Practical KerberosPractical Kerberos
Practical Kerberos
 
Curb Your Insecurity - Tips for a Secure Cluster (with Spark too)!!
Curb Your Insecurity - Tips for a Secure Cluster (with Spark too)!!Curb Your Insecurity - Tips for a Secure Cluster (with Spark too)!!
Curb Your Insecurity - Tips for a Secure Cluster (with Spark too)!!
 
Curb your insecurity with HDP
Curb your insecurity with HDPCurb your insecurity with HDP
Curb your insecurity with HDP
 
Practical Kerberos with Apache HBase
Practical Kerberos with Apache HBasePractical Kerberos with Apache HBase
Practical Kerberos with Apache HBase
 
HBaseConEast2016: Practical Kerberos with Apache HBase
HBaseConEast2016: Practical Kerberos with Apache HBaseHBaseConEast2016: Practical Kerberos with Apache HBase
HBaseConEast2016: Practical Kerberos with Apache HBase
 
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache HiveDiscover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive
 
Troubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the BeastTroubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the Beast
 
Discover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.finalDiscover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.final
 
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache Knox
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache KnoxFortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache Knox
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache Knox
 
Securing Spark Applications
Securing Spark ApplicationsSecuring Spark Applications
Securing Spark Applications
 
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
 
Discover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
Discover HDP2.1: Apache Storm for Stream Data Processing in HadoopDiscover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
Discover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
 
Big Data Security with Hadoop
Big Data Security with HadoopBig Data Security with Hadoop
Big Data Security with Hadoop
 
Open Source Security Tools for Big Data
Open Source Security Tools for Big DataOpen Source Security Tools for Big Data
Open Source Security Tools for Big Data
 
Spark summit-east-dowling-feb2017-full
Spark summit-east-dowling-feb2017-fullSpark summit-east-dowling-feb2017-full
Spark summit-east-dowling-feb2017-full
 
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
 
Hadoop Security
Hadoop SecurityHadoop Security
Hadoop Security
 
Apache Ranger
Apache RangerApache Ranger
Apache Ranger
 
How YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in HadoopHow YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in Hadoop
 
TriHUG October: Apache Ranger
TriHUG October: Apache RangerTriHUG October: Apache Ranger
TriHUG October: Apache Ranger
 

More from Steve Loughran

More from Steve Loughran (20)

Hadoop Vectored IO
Hadoop Vectored IOHadoop Vectored IO
Hadoop Vectored IO
 
The age of rename() is over
The age of rename() is overThe age of rename() is over
The age of rename() is over
 
What does Rename Do: (detailed version)
What does Rename Do: (detailed version)What does Rename Do: (detailed version)
What does Rename Do: (detailed version)
 
Put is the new rename: San Jose Summit Edition
Put is the new rename: San Jose Summit EditionPut is the new rename: San Jose Summit Edition
Put is the new rename: San Jose Summit Edition
 
@Dissidentbot: dissent will be automated!
@Dissidentbot: dissent will be automated!@Dissidentbot: dissent will be automated!
@Dissidentbot: dissent will be automated!
 
PUT is the new rename()
PUT is the new rename()PUT is the new rename()
PUT is the new rename()
 
Extreme Programming Deployed
Extreme Programming DeployedExtreme Programming Deployed
Extreme Programming Deployed
 
Testing
TestingTesting
Testing
 
I hate mocking
I hate mockingI hate mocking
I hate mocking
 
What does rename() do?
What does rename() do?What does rename() do?
What does rename() do?
 
Dancing Elephants: Working with Object Storage in Apache Spark and Hive
Dancing Elephants: Working with Object Storage in Apache Spark and HiveDancing Elephants: Working with Object Storage in Apache Spark and Hive
Dancing Elephants: Working with Object Storage in Apache Spark and Hive
 
Apache Spark and Object Stores —for London Spark User Group
Apache Spark and Object Stores —for London Spark User GroupApache Spark and Object Stores —for London Spark User Group
Apache Spark and Object Stores —for London Spark User Group
 
Spark Summit East 2017: Apache spark and object stores
Spark Summit East 2017: Apache spark and object storesSpark Summit East 2017: Apache spark and object stores
Spark Summit East 2017: Apache spark and object stores
 
Hadoop, Hive, Spark and Object Stores
Hadoop, Hive, Spark and Object StoresHadoop, Hive, Spark and Object Stores
Hadoop, Hive, Spark and Object Stores
 
Apache Spark and Object Stores
Apache Spark and Object StoresApache Spark and Object Stores
Apache Spark and Object Stores
 
Household INFOSEC in a Post-Sony Era
Household INFOSEC in a Post-Sony EraHousehold INFOSEC in a Post-Sony Era
Household INFOSEC in a Post-Sony Era
 
Slider: Applications on YARN
Slider: Applications on YARNSlider: Applications on YARN
Slider: Applications on YARN
 
Datacentre stack
Datacentre stackDatacentre stack
Datacentre stack
 
Overview of slider project
Overview of slider projectOverview of slider project
Overview of slider project
 
Help! My Hadoop doesn't work!
Help! My Hadoop doesn't work!Help! My Hadoop doesn't work!
Help! My Hadoop doesn't work!
 

Recently uploaded

Recently uploaded (20)

PLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. StartupsPLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. Startups
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Introduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG EvaluationIntroduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG Evaluation
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
In-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT ProfessionalsIn-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT Professionals
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and Planning
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
 

Hadoop and Kerberos: the Madness Beyond the Gate

  • 1. © Hortonworks Inc. 2015 Hadoop and Kerberos: The madness beyond the gate Steve Loughran stevel@hortonworks.com @steveloughran 2015
  • 2. Page 2 Me: Before Kerberos
  • 3. © Hortonworks Inc. Page 3 Me: After Kerberos
  • 4. © Hortonworks Inc. Page 4 HP Lovecraft Kerberos Evil lurking in New England MIT Project Athena Ancient, inhuman deities Kerberos Domain Controller Manuscripts to drive the reader insane IETF RFC 4120 Entities never spoken of aloud UserGroupInformation Doomed explorers of darkness You
  • 5. © Hortonworks Inc. 2015 Leave now if you want to retain your life of naïve innocence Page 5
  • 7. © Hortonworks Inc. 2015 export HADOOP_USER="root" Page 7
  • 8. © Hortonworks Inc. 2015 Modern Hadoop clusters are locked down through Kerberos Page 8
  • 9. © Hortonworks Inc. 2015 Discover Kerberos before Kerberos discovers you Page 9
  • 10. © Hortonworks Inc. 2015 Kerberos: the gateway to hell Page 10
  • 11. © Hortonworks Inc. This is not a metaphor Art: Andrés Álvarez Iglesias
  • 12. © Hortonworks Inc. 2015 KP Kerberos is the gateway Page 12 Authentication Service Ticket Granting Service Principal user@REALM user/hostname@REALM (P, TGS, n1) {KP.TGS, n1}KP, {ticket(P,TGS)} KTGS Ticket(P, TGS) = (TGS, P, tstart, tend, KPT) KP {KP.S, n2}KP, {ticket(P,S)} KS {auth(P)}KP.TGS,{ticket(P,TGS)}KTGS,S,n2 KTGS Kerberos Domain ControllerClient auth(P)KP.TGS = {P, time)}KP.TGS
  • 13. © Hortonworks Inc Every service is a principal alice@REALM bob@REALM oozie/ooziehost@REALM namenode/nn1@REALM hdfs/_HOST@REALM hdfs/r04s12@REALM hdfs/r04s13@REALM yarn/_HOST@REALM yarn/r04s12@REALM HTTP/_HOST@REALM Page 13 short names: alice bob oozie namenode hdfs yarn HTTP
  • 14. © Hortonworks Inc. Page 14 Entering the darkness
  • 15.
  • 17. © Hortonworks Inc. 2015 HDFS Bootstrap: Kerberos Login Page 17 shared keytab in /etc/hadoop log in to kerberos datanode/_HOST@REALM tickets for TGS namenode/nn@REALM
  • 18. © Hortonworks Inc. 2015 HDFS Bootstrap: DNs register with NN Page 18 shared keytab in /etc/hadoop DN registration Ticket for namenode/nn@REALM ExportedBlockKeys Request ticket for namenode/nn@REALM namenode/nn@REALM datanode/_HOST@REALM
  • 19. © Hortonworks Inc. Hadoop Tokens • Issued and tracked by individual services (HDFS, WebHDFS, Timeline Server, YARN RM, …) • Grant some form of access: Block tokens, Delegation Tokens • Can be passed on to other processes • Renewable via service APIs (RPC, HTTP) • Revocable in server via service APIs Page 19 read: O'Malley 2009, Hadoop Security Architecture
  • 20. © Hortonworks Inc. 2015 HDFS IO: Block Tokens Page 20 alice@REALM Obtain ticket for namenode/nn@REALM BlockToken BlockToken BlockToken: userId, (BlockPoolId, BlockId), keyId, expiryDate, access-modes namenode/nn@REALM open("file")
  • 21. © Hortonworks Inc. 2015 service/host@REALM Delegation Tokens delegate access Page 21 alice@REALM BlockToken HDFS Delegation Token BlockToken HDFS Delegation Token HDFS Delegation Token namenode/nn@REALM Token Obtain ticket for namenode/nn@REALM Request delegation token
  • 22. © Hortonworks Inc. 2015 Launch Context YARN app launch Page 22 alice@REALM HDFS Delegation Token HDFS resourcemanager/rm@REALM nodemanager/_HOST@REALMalice Launch Context AM/RM HDFS AM/RM HDFS HDFS HDFS AM/RM namenode/nn@REALM Obtain ticket for resourcemanager/rm@REALM Request delegation token AM/RM Token Obtain tickvet for namenode/nn@REALM AM/RM' AM/RM' AM/RM' Refresh AM/RM
  • 23. © Hortonworks Inc That which must not be named: UGI if(!UserGroupInformation.isSecurityEnabled()) { stayInALifeOfNaiveInnocence(); } else { sufferTheEnternalPainOfKerberos(); } UserGroupInformation.checkTGTAndReloginFromKeytab(); UserGroupInformation.getLoginUser() // principal logged in as UserGroupInformation.getCurrentUser() // principal acting as Page 23
  • 24. © Hortonworks Inc UGI.doAs() UserGroupInformation bob = UserGroupInformation.createProxyUser("bob", UserGroupInformation.getLoginUser()); FileSystem userFS = bob.doAs( new PrivilegedExceptionAction<FileSystem>() { public FileSystem run() throws Exception { return FileSystem.get(FileSystem.getDefaultUri(), conf); } }); Page 24
  • 25. © Hortonworks Inc Hadoop RPC @KerberosInfo(serverPrincipal = "my.kerberos.principal") public interface MyRpc extends VersionedProtocol { … } public class MyRpcPolicyProvider extends PolicyProvider { public Service[] getServices() { return new Service[] { new Service("my.protocol.acl", MyRpc.class) }; } } public class MyRpcSecurityInfo extends SecurityInfo { … } META-INF/services/org.apache.hadoop.security.SecurityInfo org.example.rpc.MyRpcSecurityInfo Page 25
  • 26. © Hortonworks Inc IPC Server: get the current user identity Messages.KillResponse killContainer(Messages.KillRequest request) { UserGroupInformation callerUGI; try { callerUGI = UserGroupInformation.getCurrentUser(); } catch (IOException ie) { LOG.info("Error getting UGI ", ie); AuditLogger.logFailure("UNKNOWN", "Error getting UGI"); throw RPCUtil.getRemoteException(ie); } … Page 26
  • 27. © Hortonworks Inc IPC Server: Authorize String user = callerUGI.getShortUserName(); if (!checkAccess(callerUGI, MODIFY)) { AuditLog.unauthorized(user, KILL_CONTAINER_REQUEST, "User doesn't have permissions to " + MODIFY); throw RPCUtil.getRemoteException( new AccessControlException( + user + " lacks access " + MODIFY_APP.name())); } AuditLog.authorized(user, KILL_CONTAINER_REQUEST) Page 27
  • 28. © Hortonworks Inc. 2015 SASL: RFC4422 Page 28
  • 29. © Hortonworks Inc. REST: SPNEGO (+ Delegation tokens) Page 29 • Jersey + java.net • httpclient? “if lucky it'll work” HADOOP-11825: Move timeline client Jersey+Kerberos+UGI support into a public implementation
  • 31. © Hortonworks Inc. Error messages to fear Art: Andrés Álvarez Iglesias Failure unspecified at GSS-API level (Checksum failed) No valid credentials provided (Failed to find any Kerberos tgt) Server not found in Kerberos database Clock skew too great Principal not found No valid credentials provided (Illegal key size)
  • 32. © Hortonworks Inc. Topics Avoided Not Covered • Zookeeper • JAAS • Trying to use HTTPS in a YARN application • Trying to use Full REST in a YARN application • System properties to debug Kerberos & SPNEGO • Group management • HADOOP_PROXY_USER Page 32
  • 34. © Hortonworks Inc. Zookeeper • SASL to negotiate security: System.setProperty("zookeeper.sasl.client", "true"); • Permissions are not transitive down the tree Page 34 List<ACL> perms = new ArrayList<>(); if (UserGroupInformation.isSecurityEnabled()) { perms(new ACL(ZooDefs.Perms.ALL, ZooDefs.Ids.AUTH_IDS)); perms.add(new ACL(ZooDefs.Perms.READ,ZooDefs.Ids.ANYONE_ID_UNSAFE)); } else { perms.add(new ACL(ZooDefs.Perms.ALL, ZooDefs.Ids.ANYONE_ID_UNSAFE)); } zk.createPath(path, null, perms, CreateMode.PERSISTENT);
  • 35. © Hortonworks Inc System Properties for debugging -Dsun.security.krb5.debug=true -Dsun.security.spnego.debug=true export HADOOP_JAAS_DEBUG=true Page 35
  • 36. © Hortonworks Inc. Services • RPC authentication via annotations & metadata in JAR • YARN Web UIs: rely on RM proxy for authentication • Authentication != Authorization • Add audit logs on service endpoints • YARN services: come up with a token refresh strategy: keytab everywhere; keytab in AM; update from client Page 36
  • 37. © Hortonworks Inc. JAAS • Java Authentication and Authorization Service • Core Kerberos classes and types (Principal) • Text files to configure –Different for different JVMs –Need to double escape for windows paths • UGI handles setting up a JAAS context & logging in Page 37
  • 38. © Hortonworks Inc. Glossary • Simple Authentication and Security Layer (SASL) • GSSAPI Generic Security Service Application Program Interface (RFC-2743+ others) • JAAS: Java Authentication and Authorization Service • Simple and Protected GSSAPI Negotiation Mechanism (SPNEGO) Page 38

Editor's Notes

  1. Th
  2. Enough people like Dunkin Donut's decaf coffee that you can buy it for home use —and supermarkets will stock it next to the MacDonalds coffee.
  3. This is your get out clause. Turn off encryption. Users are who they claim to be; the environment variable HADOOP_USER can change it on a whim.
  4. ..which is why production clusters are all locked down with kerberos. Callout: this doesn't cover authorization/access control (exception: Hadoop IPC acls), wire encryption, HTTPS or data encryption.
  5. So you can't ignore Kerberos. You only get a choice about when to encounter it -early on in your coding and testing -during final integration tests -in late night support calls.
  6. Photo: https://www.flickr.com/photos/doctorserone/4635167170/ Andrés Álvarez Iglesias
  7. The KDC is managed by the enterprise security team. They are either paranoid about security, or your organisation is 0wned by everyone from Anonymous to North Korea. They don't trust you, they don't trust Hadoop, and make the rest of the network ops people seem welcoming. You will need to work with these people.
  8. AuthenticatedURL DelegationTokenAuthenticatedURL org.apache.hadoop.hdfs.web.URLConnectionFactory org/apache/spark/deploy/history/yarn/rest in SPARK-1537
  9. There is a mini KDC, "MiniKDC" in the Hadoop codebase. I've used this in the YARN-913 registry work; its good for verifying that you got through the permissions logic, and for learning various acronyms. And at the end of the run you get tests that Jenkins can run every build. But I've embraced testing against kerberized VMs, where you do the work of creating keytabs, filling in the configuration files, requiring SPENGO authed web browsers, having your command line account kinit in regularly, services having tokens expire, etc. etc. Why? Because its what the real world is like. L
  10. Error messages with UGI are usually a sign of trouble Photo: https://www.flickr.com/photos/doctorserone/4635167170/ Andrés Álvarez Iglesias
  11. Photo: https://www.flickr.com/photos/doctorserone/4635167170/ Andrés Álvarez Iglesias