SlideShare a Scribd company logo
Hadoop Security
Building a fence around your Hadoop cluster
Lars Francke
June 12, 2017
Berlin Buzzwords 2017
Introduction
About me - Lars Francke
• Partner & Co-Founder at OpenCore
• Before that: EMEA Hadoop Consultant
• Hadoop since 2008/2009
• Apache Committer: Hive, ORC
• Contact:
• lars.francke@opencore.com
• @lars_francke
© 2017 OpenCore GmbH & Co. KG 1/50
Overview
Overview - Be Warned
?© 2017 OpenCore GmbH & Co. KG 2/50
The Book
Source: oreilly.com
© 2017 OpenCore GmbH & Co. KG 3/50
Other Books
Source: oreilly.com
© 2017 OpenCore GmbH & Co. KG 4/50
Other Sources
• Kerberos: The Definitive Guide1
• Kerberos (German)2
• Active Directory, 5th Edition3
• Hadoop and Kerberos: The Madness beyond the Gate4
• HBase: The Definitive Guide, 2nd Edition5
• Bulletproof SSL and TLS6
1http://shop.oreilly.com/product/9780596004033.do
2http://www.kerberos-buch.de/
3http://shop.oreilly.com/product/0636920023913.do
4https://www.gitbook.com/book/steveloughran/kerberos_and_hadoop/details
5http://shop.oreilly.com/product/0636920033943.do
6https://www.feistyduck.com/books/bulletproof-ssl-and-tls/
© 2017 OpenCore GmbH & Co. KG 5/50
Project Management
How is a typical project structured and where does
Hadoop security come into play?
© 2017 OpenCore GmbH & Co. KG 6/50
Project Management
• Idea/Initiation
© 2017 OpenCore GmbH & Co. KG 7/50
Project Management
• Idea/Initiation
• Planning/Design
© 2017 OpenCore GmbH & Co. KG 7/50
Project Management
• Idea/Initiation
• Planning/Design
• Execution/Implementation
© 2017 OpenCore GmbH & Co. KG 7/50
Project Management
• Idea/Initiation
• Planning/Design
• Execution/Implementation
• “Production”
© 2017 OpenCore GmbH & Co. KG 7/50
When To Start
You can not begin thinking
about Security too early!
© 2017 OpenCore GmbH & Co. KG 8/50
Next steps…
Let’s dig into the details of each step in a project
© 2017 OpenCore GmbH & Co. KG 9/50
Project Planning
What To Think About?
• What am I going to use the cluster for?
© 2017 OpenCore GmbH & Co. KG 10/50
What To Think About?
• What am I going to use the cluster for?
• Which tools am I going to need to achieve
the use-cases?
© 2017 OpenCore GmbH & Co. KG 10/50
What To Think About?
• What am I going to use the cluster for?
• Which tools am I going to need to achieve
the use-cases?
• What kind of data am I going to ingest, store
or process?
© 2017 OpenCore GmbH & Co. KG 10/50
What To Think About?
• What am I going to use the cluster for?
• Which tools am I going to need to achieve
the use-cases?
• What kind of data am I going to ingest, store
or process?
• What kind of corporate guidelines exist that
must be followed?
© 2017 OpenCore GmbH & Co. KG 10/50
Do Not Assume Anything
Do not assume companies/people know what a
product is capable of just because they seem
confident.
© 2017 OpenCore GmbH & Co. KG 11/50
What does Security even mean?
What is part of Hadoop Security?
© 2017 OpenCore GmbH & Co. KG 12/50
Parts Of A Security Concept
• Authentication of users
© 2017 OpenCore GmbH & Co. KG 13/50
Parts Of A Security Concept
• Authentication of users
• Authorization of users
© 2017 OpenCore GmbH & Co. KG 13/50
Parts Of A Security Concept
• Authentication of users
• Authorization of users
• Auditing
© 2017 OpenCore GmbH & Co. KG 13/50
Parts Of A Security Concept
• Authentication of users
• Authorization of users
• Auditing
• Data Protection: Encryption on-the-wire
© 2017 OpenCore GmbH & Co. KG 13/50
Parts Of A Security Concept
• Authentication of users
• Authorization of users
• Auditing
• Data Protection: Encryption on-the-wire
• Data Protection: Encryption at-rest
© 2017 OpenCore GmbH & Co. KG 13/50
Project Execution
Kerberos == Hadoop Security?
So, does Hadoop Security mean I need to “enable”
Kerberos and I’m done?
© 2017 OpenCore GmbH & Co. KG 14/50
Kerberos == Hadoop Security?
So, does Hadoop Security mean I need to “enable”
Kerberos and I’m done?
No! Far from it!
© 2017 OpenCore GmbH & Co. KG 14/50
Overview
Before you’re able to secure anything…
© 2017 OpenCore GmbH & Co. KG 15/50
Overview
Before you’re able to secure anything…
…something must be running that can be secured.
© 2017 OpenCore GmbH & Co. KG 15/50
Overview - contd.
© 2017 OpenCore GmbH & Co. KG 16/50
Network
© 2017 OpenCore GmbH & Co. KG 17/50
Host/Operating System
• SELinux
• NTP/Chrony
• Firewalls
• Antivirus
• Proxies
© 2017 OpenCore GmbH & Co. KG 18/50
Hadoop Installation
• You usually install a cluster manager first
© 2017 OpenCore GmbH & Co. KG 19/50
Hadoop Installation
• You usually install a cluster manager first
• Then you install Agents
© 2017 OpenCore GmbH & Co. KG 19/50
Hadoop Installation
• You usually install a cluster manager first
• Then you install Agents
• Followed by a distribution with lots of components…
© 2017 OpenCore GmbH & Co. KG 19/50
Authentication
• Core Hadoop requires Kerberos for strong authentication
• Multiple choices on how to implement that:
• Cluster-local standalone KDC
• Cluster-local standalone KDC with one-way cross-realm trust to a
central KDC
• Direct integration into a central KDC
© 2017 OpenCore GmbH & Co. KG 20/50
Authentication: Cloudera Manager/Ambari Wizards
Source: unsplash.com by Sirotorn Sumpunkulpak
© 2017 OpenCore GmbH & Co. KG 21/50
Authentication - contd.
What you need:
• Names/IPs for your KDC
• Supported Encryption Types (see my blog post7
for more details)
• Firewall must allow all cluster machines to access the KDC
• Potentially a bunch of information to configure krb5.conf
properly
7http://www.opencore.com/blog/2017/3/kerberos-encryption-types
© 2017 OpenCore GmbH & Co. KG 22/50
Authentication - contd.
Ideally you want the automatic option, but...
• You need an account in your AD/KDC that is allowed to create
other accounts!
• (AD only) You need to talk via LDAPS and make sure that you
have all the truststores set up properly
• (AD only) You cannot have multiple SPNs per User
• For SPNEGO you need a HTTP/<host> principal, sometimes
doesn’t match corporate guidelines
© 2017 OpenCore GmbH & Co. KG 23/50
Authentication - contd.
There’s a manual option…
© 2017 OpenCore GmbH & Co. KG 24/50
Authentication - contd.
Kerberos is only part of the authentication story.
© 2017 OpenCore GmbH & Co. KG 25/50
Authentication - contd.
Knox, Cloudera Manager, Ambari, Ranger, Sentry, Hive, Impala etc. all
have their own authentication layers which also support LDAP(S) or
other mechanisms.
© 2017 OpenCore GmbH & Co. KG 26/50
Identity Management
Your users need to exist on the workers so that YARN can start jobs
using setuid().
© 2017 OpenCore GmbH & Co. KG 27/50
Identity Management - contd.
• This is usually done using a tool like SSSD (free) or Centrify
(commercial)
© 2017 OpenCore GmbH & Co. KG 28/50
Identity Management - contd.
• This is usually done using a tool like SSSD (free) or Centrify
(commercial)
• When using Centrify, configure it to not create the default HTTP
principal for each server
© 2017 OpenCore GmbH & Co. KG 28/50
Identity Management - contd.
• This is usually done using a tool like SSSD (free) or Centrify
(commercial)
• When using Centrify, configure it to not create the default HTTP
principal for each server
• This does not mean that your users need to be able to SSH into
the workers, quite the opposite
© 2017 OpenCore GmbH & Co. KG 28/50
Identity Management - contd.
• You need to have the details needed to fetch user & group
information from LDAP(S)
• Some solutions require a domain join
© 2017 OpenCore GmbH & Co. KG 29/50
Impersonation/Proxy Users
© 2017 OpenCore GmbH & Co. KG 30/50
Authorization
• All tools have some form of authorization built-in
• Core Hadoop components have a first line defense: Service
Level Authorizations
• Ranger & Sentry promise cross-cutting RBAC functionality
© 2017 OpenCore GmbH & Co. KG 31/50
Data Protection: Encryption At-Rest
HDFS Transparent Encryption
All I need is a KMS, right?
© 2017 OpenCore GmbH & Co. KG 32/50
Data Protection: Encryption At-Rest - contd.
• KMS should be “close” to your clients and NameNode
• KMS should be separately administered
• AFAIK only Hadoop 3 will allow the / (root) directory to be its
own zone and have sub-zones
• Secure access to the backing store & host where KMS runs
• You can use a HSM
• What about Impersonation and things like Knox or HttpFS?
© 2017 OpenCore GmbH & Co. KG 33/50
Data Protection: Encryption At Rest - contd.
Now that my data in HDFS is secure I’m good, right?
• Metadata in databases
• Loq & Audit files
• YARN localized stuff
• Temporary files
• (Spark) spill files & cached data
Only Cloudera has a (paid) solution for this: Cloudera NavEncrypt
© 2017 OpenCore GmbH & Co. KG 34/50
Data Protection/Authorization: Data Masking
• Static vs. dynamic masking
• Ranger & BigSQL support masking, Sentry does not
• RecordService might be a potential integration point
• 3rd party tools
© 2017 OpenCore GmbH & Co. KG 35/50
Data Protection: Encryption On-The-Wire
• Web Interfaces
• REST/Thrift Interfaces
• MapReduce Shuffle
• Spark Shuffle (only in 2.1+)
• Server to Agent communication
• Ingest/Egress tools (Flume, Informatica, Kafka, …)
• RPC
• HDFS Data traffic
• Traffic from your YARN apps
© 2017 OpenCore GmbH & Co. KG 36/50
Data Protection: Encryption On-The-Wire
• 2-way TLS possible?
• Which cipher suites are supported?
• Some tools automatically disable HTTP others don’t!
• Missing documentation all around
© 2017 OpenCore GmbH & Co. KG 37/50
Auditing & Logging
• No documentation (on auditing events and lots of other things)
• Ranger aggregates audit logs, async by default
• Cloudera has Navigator
• No integrity protection
• Need to protect against admins
© 2017 OpenCore GmbH & Co. KG 38/50
3rd Party Tools
• Not a single product has good documentation
• How do they access the cluster?
• Do they themselves authenticate and authorize users? How?
• Auditing/Logging?
© 2017 OpenCore GmbH & Co. KG 39/50
Production
What now?
So you’ve finished the project and handed it over to the operations
team…what now?
© 2017 OpenCore GmbH & Co. KG 40/50
Keep it running
• Plan regular updates to your OS
• Use a Configuration Management tool (e.g. Ansible)
• Plan regular updates to your distribution!
© 2017 OpenCore GmbH & Co. KG 41/50
Monitoring
All our logging & auditing is irrelevant if you’re not monitoring &
alerting on those logs.
© 2017 OpenCore GmbH & Co. KG 42/50
Environments
You need a place where you can test upgrades.
You might also need a place for backups.
Devs would like a cluster as well…
© 2017 OpenCore GmbH & Co. KG 43/50
Environments
© 2017 OpenCore GmbH & Co. KG 44/50
Environments
© 2017 OpenCore GmbH & Co. KG 45/50
User Lifecycle
• What happens when a user leaves your organization?
• Caches
• Ranger Usersync
• Existing sessions
• Shared accounts
© 2017 OpenCore GmbH & Co. KG 46/50
Misc
Cloud?
What about the cloud providers?
© 2017 OpenCore GmbH & Co. KG 47/50
Distributions
• Cloudera
© 2017 OpenCore GmbH & Co. KG 48/50
Distributions
• Cloudera
• Hortonworks
© 2017 OpenCore GmbH & Co. KG 48/50
Distributions
• Cloudera
• Hortonworks
• IBM
© 2017 OpenCore GmbH & Co. KG 48/50
Distributions
• Cloudera
• Hortonworks
• IBM
• Microsoft HDInsight
© 2017 OpenCore GmbH & Co. KG 48/50
Thank You!
Thank you for listening!
© 2017 OpenCore GmbH & Co. KG 49/50
Questions & Contact
Questions?
Contact me at:
• lars.francke@opencore.com
• @lars_francke
Visit us at opencore.com
And if this stuff interests you: We’re looking to expand our team!
© 2017 OpenCore GmbH & Co. KG 50/50

More Related Content

What's hot

Cloud Foundry, the Open Platform As A Service
Cloud Foundry, the Open Platform As A ServiceCloud Foundry, the Open Platform As A Service
Cloud Foundry, the Open Platform As A Service
Patrick Chanezon
 
cross cloud inter-operability with iPaaS and serverless for Telco cloud SDN/NFV
cross cloud inter-operability with iPaaS and serverless for Telco cloud SDN/NFVcross cloud inter-operability with iPaaS and serverless for Telco cloud SDN/NFV
cross cloud inter-operability with iPaaS and serverless for Telco cloud SDN/NFV
Krishna-Kumar
 
Unlocking the Cloud Operating Model: The Provisioning Strategy
Unlocking the Cloud Operating Model: The Provisioning StrategyUnlocking the Cloud Operating Model: The Provisioning Strategy
Unlocking the Cloud Operating Model: The Provisioning Strategy
Mitchell Pronschinske
 
HP Helion Webinar #4 - Open stack the magic pill
HP Helion Webinar #4 - Open stack the magic pillHP Helion Webinar #4 - Open stack the magic pill
HP Helion Webinar #4 - Open stack the magic pill
BeMyApp
 
BayInfotech (BIT) ACI Portfolio
BayInfotech (BIT) ACI PortfolioBayInfotech (BIT) ACI Portfolio
BayInfotech (BIT) ACI Portfolio
Maulik Shyani
 
HP Helion - Copaco Cloud Event 2015 (break-out 4)
HP Helion - Copaco Cloud Event 2015 (break-out 4)HP Helion - Copaco Cloud Event 2015 (break-out 4)
HP Helion - Copaco Cloud Event 2015 (break-out 4)
Copaco Nederland
 
Kafka summit apac session
Kafka summit apac sessionKafka summit apac session
Kafka summit apac session
Christina Lin
 
Hashicorp Terraform Open Source vs Enterprise
Hashicorp Terraform Open Source vs EnterpriseHashicorp Terraform Open Source vs Enterprise
Hashicorp Terraform Open Source vs Enterprise
Stenio Ferreira
 
IoT Scale Event-Stream Processing for Connected Fleet at Penske
IoT Scale Event-Stream Processing for Connected Fleet at PenskeIoT Scale Event-Stream Processing for Connected Fleet at Penske
IoT Scale Event-Stream Processing for Connected Fleet at Penske
VMware Tanzu
 
Introduction to ibm cloud paks concept license and minimum config public
Introduction to ibm cloud paks concept license and minimum config publicIntroduction to ibm cloud paks concept license and minimum config public
Introduction to ibm cloud paks concept license and minimum config public
Petchpaitoon Krungwong
 
CF SUMMIT: Partnerships, Business and Cloud Foundry
CF SUMMIT: Partnerships, Business and Cloud FoundryCF SUMMIT: Partnerships, Business and Cloud Foundry
CF SUMMIT: Partnerships, Business and Cloud Foundry
Nima Badiey
 
HP Helion OpenStack Community Edition Deployment
HP Helion OpenStack Community Edition DeploymentHP Helion OpenStack Community Edition Deployment
HP Helion OpenStack Community Edition Deployment
Marton Kiss
 
Pulling Back the Curtain – CloudStack in Private and Community Clouds
Pulling Back the Curtain –CloudStack in Private and Community CloudsPulling Back the Curtain –CloudStack in Private and Community Clouds
Pulling Back the Curtain – CloudStack in Private and Community Clouds
Chip Childers
 
HP Helion Webinar #2
HP Helion Webinar #2 HP Helion Webinar #2
HP Helion Webinar #2
BeMyApp
 
Welcome to the Multi-cloud world
Welcome to the Multi-cloud worldWelcome to the Multi-cloud world
Welcome to the Multi-cloud world
Lew Tucker
 
Open stack the road ahead
Open stack   the road aheadOpen stack   the road ahead
Open stack the road ahead
Lew Tucker
 
Cloud Foundry and MongoDB
Cloud Foundry and MongoDBCloud Foundry and MongoDB
Cloud Foundry and MongoDB
Jake Peyser
 
Best Practices and Hard Lessons of Serverless- AWS Startup Day Toronto- Diego...
Best Practices and Hard Lessons of Serverless- AWS Startup Day Toronto- Diego...Best Practices and Hard Lessons of Serverless- AWS Startup Day Toronto- Diego...
Best Practices and Hard Lessons of Serverless- AWS Startup Day Toronto- Diego...
Amazon Web Services
 
Breaking the Monolith
Breaking the MonolithBreaking the Monolith
Breaking the Monolith
VMware Tanzu
 
HP Helion Webinar #1 - Introduction to HP Helion OpenStack w/Christian Frank
HP Helion Webinar #1 - Introduction to HP Helion OpenStack w/Christian FrankHP Helion Webinar #1 - Introduction to HP Helion OpenStack w/Christian Frank
HP Helion Webinar #1 - Introduction to HP Helion OpenStack w/Christian Frank
BeMyApp
 

What's hot (20)

Cloud Foundry, the Open Platform As A Service
Cloud Foundry, the Open Platform As A ServiceCloud Foundry, the Open Platform As A Service
Cloud Foundry, the Open Platform As A Service
 
cross cloud inter-operability with iPaaS and serverless for Telco cloud SDN/NFV
cross cloud inter-operability with iPaaS and serverless for Telco cloud SDN/NFVcross cloud inter-operability with iPaaS and serverless for Telco cloud SDN/NFV
cross cloud inter-operability with iPaaS and serverless for Telco cloud SDN/NFV
 
Unlocking the Cloud Operating Model: The Provisioning Strategy
Unlocking the Cloud Operating Model: The Provisioning StrategyUnlocking the Cloud Operating Model: The Provisioning Strategy
Unlocking the Cloud Operating Model: The Provisioning Strategy
 
HP Helion Webinar #4 - Open stack the magic pill
HP Helion Webinar #4 - Open stack the magic pillHP Helion Webinar #4 - Open stack the magic pill
HP Helion Webinar #4 - Open stack the magic pill
 
BayInfotech (BIT) ACI Portfolio
BayInfotech (BIT) ACI PortfolioBayInfotech (BIT) ACI Portfolio
BayInfotech (BIT) ACI Portfolio
 
HP Helion - Copaco Cloud Event 2015 (break-out 4)
HP Helion - Copaco Cloud Event 2015 (break-out 4)HP Helion - Copaco Cloud Event 2015 (break-out 4)
HP Helion - Copaco Cloud Event 2015 (break-out 4)
 
Kafka summit apac session
Kafka summit apac sessionKafka summit apac session
Kafka summit apac session
 
Hashicorp Terraform Open Source vs Enterprise
Hashicorp Terraform Open Source vs EnterpriseHashicorp Terraform Open Source vs Enterprise
Hashicorp Terraform Open Source vs Enterprise
 
IoT Scale Event-Stream Processing for Connected Fleet at Penske
IoT Scale Event-Stream Processing for Connected Fleet at PenskeIoT Scale Event-Stream Processing for Connected Fleet at Penske
IoT Scale Event-Stream Processing for Connected Fleet at Penske
 
Introduction to ibm cloud paks concept license and minimum config public
Introduction to ibm cloud paks concept license and minimum config publicIntroduction to ibm cloud paks concept license and minimum config public
Introduction to ibm cloud paks concept license and minimum config public
 
CF SUMMIT: Partnerships, Business and Cloud Foundry
CF SUMMIT: Partnerships, Business and Cloud FoundryCF SUMMIT: Partnerships, Business and Cloud Foundry
CF SUMMIT: Partnerships, Business and Cloud Foundry
 
HP Helion OpenStack Community Edition Deployment
HP Helion OpenStack Community Edition DeploymentHP Helion OpenStack Community Edition Deployment
HP Helion OpenStack Community Edition Deployment
 
Pulling Back the Curtain – CloudStack in Private and Community Clouds
Pulling Back the Curtain –CloudStack in Private and Community CloudsPulling Back the Curtain –CloudStack in Private and Community Clouds
Pulling Back the Curtain – CloudStack in Private and Community Clouds
 
HP Helion Webinar #2
HP Helion Webinar #2 HP Helion Webinar #2
HP Helion Webinar #2
 
Welcome to the Multi-cloud world
Welcome to the Multi-cloud worldWelcome to the Multi-cloud world
Welcome to the Multi-cloud world
 
Open stack the road ahead
Open stack   the road aheadOpen stack   the road ahead
Open stack the road ahead
 
Cloud Foundry and MongoDB
Cloud Foundry and MongoDBCloud Foundry and MongoDB
Cloud Foundry and MongoDB
 
Best Practices and Hard Lessons of Serverless- AWS Startup Day Toronto- Diego...
Best Practices and Hard Lessons of Serverless- AWS Startup Day Toronto- Diego...Best Practices and Hard Lessons of Serverless- AWS Startup Day Toronto- Diego...
Best Practices and Hard Lessons of Serverless- AWS Startup Day Toronto- Diego...
 
Breaking the Monolith
Breaking the MonolithBreaking the Monolith
Breaking the Monolith
 
HP Helion Webinar #1 - Introduction to HP Helion OpenStack w/Christian Frank
HP Helion Webinar #1 - Introduction to HP Helion OpenStack w/Christian FrankHP Helion Webinar #1 - Introduction to HP Helion OpenStack w/Christian Frank
HP Helion Webinar #1 - Introduction to HP Helion OpenStack w/Christian Frank
 

Similar to Building a fence around your Hadoop cluster

Secure and Convenient Workflows: Integrating HashiCorp Vault with Pivotal Clo...
Secure and Convenient Workflows: Integrating HashiCorp Vault with Pivotal Clo...Secure and Convenient Workflows: Integrating HashiCorp Vault with Pivotal Clo...
Secure and Convenient Workflows: Integrating HashiCorp Vault with Pivotal Clo...
Amanda MacLeod
 
Secure and Convenient Workflows: Integrating HashiCorp Vault with Pivotal Clo...
Secure and Convenient Workflows: Integrating HashiCorp Vault with Pivotal Clo...Secure and Convenient Workflows: Integrating HashiCorp Vault with Pivotal Clo...
Secure and Convenient Workflows: Integrating HashiCorp Vault with Pivotal Clo...
Amanda MacLeod
 
Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015
Shravan (Sean) Pabba
 
The Rise of DataOps: Making Big Data Bite Size with DataOps
The Rise of DataOps: Making Big Data Bite Size with DataOpsThe Rise of DataOps: Making Big Data Bite Size with DataOps
The Rise of DataOps: Making Big Data Bite Size with DataOps
Delphix
 
Secure and Convenient Workflows: Integrating HashiCorp Vault with Pivotal Clo...
Secure and Convenient Workflows: Integrating HashiCorp Vault with Pivotal Clo...Secure and Convenient Workflows: Integrating HashiCorp Vault with Pivotal Clo...
Secure and Convenient Workflows: Integrating HashiCorp Vault with Pivotal Clo...
Stenio Ferreira
 
Q Con New York 2015 Presentation - Conjur
Q Con New York 2015 Presentation - ConjurQ Con New York 2015 Presentation - Conjur
Q Con New York 2015 Presentation - Conjur
conjur_inc
 
Covert Attack Mystery Box: A few novel techniques for exploiting Microsoft “f...
Covert Attack Mystery Box: A few novel techniques for exploiting Microsoft “f...Covert Attack Mystery Box: A few novel techniques for exploiting Microsoft “f...
Covert Attack Mystery Box: A few novel techniques for exploiting Microsoft “f...
Beau Bullock
 
Big Data Warehousing Meetup: Securing the Hadoop Ecosystem by Cloudera
Big Data Warehousing Meetup: Securing the Hadoop Ecosystem by ClouderaBig Data Warehousing Meetup: Securing the Hadoop Ecosystem by Cloudera
Big Data Warehousing Meetup: Securing the Hadoop Ecosystem by Cloudera
Caserta
 
CIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding Edge
CIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding EdgeCIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding Edge
CIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding Edge
CloudIDSummit
 
Cloudy with a Chance of Databases
Cloudy with a Chance of DatabasesCloudy with a Chance of Databases
Cloudy with a Chance of Databases
Kellyn Pot'Vin-Gorman
 
New DevOps for the DBA
New DevOps for the DBANew DevOps for the DBA
New DevOps for the DBA
Kellyn Pot'Vin-Gorman
 
Running Enterprise Workloads in the Cloud
Running Enterprise Workloads in the CloudRunning Enterprise Workloads in the Cloud
Running Enterprise Workloads in the Cloud
DataWorks Summit
 
Open Source Security Tools for Big Data
Open Source Security Tools for Big DataOpen Source Security Tools for Big Data
Open Source Security Tools for Big Data
Great Wide Open
 
Open Source Security Tools for Big Data
Open Source Security Tools for Big DataOpen Source Security Tools for Big Data
Open Source Security Tools for Big Data
Rommel Garcia
 
Application security meetup - cloud security best practices 24062021
Application security meetup - cloud security best practices 24062021Application security meetup - cloud security best practices 24062021
Application security meetup - cloud security best practices 24062021
lior mazor
 
Hadoop security
Hadoop securityHadoop security
Hadoop security
Kashif Khan
 
Using CredHub for Kubernetes Deployments
Using CredHub for Kubernetes DeploymentsUsing CredHub for Kubernetes Deployments
Using CredHub for Kubernetes Deployments
VMware Tanzu
 
Vault Open Source vs Enterprise v2
Vault Open Source vs Enterprise v2Vault Open Source vs Enterprise v2
Vault Open Source vs Enterprise v2
Stenio Ferreira
 
Rails security: above and beyond the defaults
Rails security: above and beyond the defaultsRails security: above and beyond the defaults
Rails security: above and beyond the defaults
Matias Korhonen
 
13 practical tips for writing secure golang applications
13 practical tips for writing secure golang applications13 practical tips for writing secure golang applications
13 practical tips for writing secure golang applications
Karthik Gaekwad
 

Similar to Building a fence around your Hadoop cluster (20)

Secure and Convenient Workflows: Integrating HashiCorp Vault with Pivotal Clo...
Secure and Convenient Workflows: Integrating HashiCorp Vault with Pivotal Clo...Secure and Convenient Workflows: Integrating HashiCorp Vault with Pivotal Clo...
Secure and Convenient Workflows: Integrating HashiCorp Vault with Pivotal Clo...
 
Secure and Convenient Workflows: Integrating HashiCorp Vault with Pivotal Clo...
Secure and Convenient Workflows: Integrating HashiCorp Vault with Pivotal Clo...Secure and Convenient Workflows: Integrating HashiCorp Vault with Pivotal Clo...
Secure and Convenient Workflows: Integrating HashiCorp Vault with Pivotal Clo...
 
Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015
 
The Rise of DataOps: Making Big Data Bite Size with DataOps
The Rise of DataOps: Making Big Data Bite Size with DataOpsThe Rise of DataOps: Making Big Data Bite Size with DataOps
The Rise of DataOps: Making Big Data Bite Size with DataOps
 
Secure and Convenient Workflows: Integrating HashiCorp Vault with Pivotal Clo...
Secure and Convenient Workflows: Integrating HashiCorp Vault with Pivotal Clo...Secure and Convenient Workflows: Integrating HashiCorp Vault with Pivotal Clo...
Secure and Convenient Workflows: Integrating HashiCorp Vault with Pivotal Clo...
 
Q Con New York 2015 Presentation - Conjur
Q Con New York 2015 Presentation - ConjurQ Con New York 2015 Presentation - Conjur
Q Con New York 2015 Presentation - Conjur
 
Covert Attack Mystery Box: A few novel techniques for exploiting Microsoft “f...
Covert Attack Mystery Box: A few novel techniques for exploiting Microsoft “f...Covert Attack Mystery Box: A few novel techniques for exploiting Microsoft “f...
Covert Attack Mystery Box: A few novel techniques for exploiting Microsoft “f...
 
Big Data Warehousing Meetup: Securing the Hadoop Ecosystem by Cloudera
Big Data Warehousing Meetup: Securing the Hadoop Ecosystem by ClouderaBig Data Warehousing Meetup: Securing the Hadoop Ecosystem by Cloudera
Big Data Warehousing Meetup: Securing the Hadoop Ecosystem by Cloudera
 
CIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding Edge
CIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding EdgeCIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding Edge
CIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding Edge
 
Cloudy with a Chance of Databases
Cloudy with a Chance of DatabasesCloudy with a Chance of Databases
Cloudy with a Chance of Databases
 
New DevOps for the DBA
New DevOps for the DBANew DevOps for the DBA
New DevOps for the DBA
 
Running Enterprise Workloads in the Cloud
Running Enterprise Workloads in the CloudRunning Enterprise Workloads in the Cloud
Running Enterprise Workloads in the Cloud
 
Open Source Security Tools for Big Data
Open Source Security Tools for Big DataOpen Source Security Tools for Big Data
Open Source Security Tools for Big Data
 
Open Source Security Tools for Big Data
Open Source Security Tools for Big DataOpen Source Security Tools for Big Data
Open Source Security Tools for Big Data
 
Application security meetup - cloud security best practices 24062021
Application security meetup - cloud security best practices 24062021Application security meetup - cloud security best practices 24062021
Application security meetup - cloud security best practices 24062021
 
Hadoop security
Hadoop securityHadoop security
Hadoop security
 
Using CredHub for Kubernetes Deployments
Using CredHub for Kubernetes DeploymentsUsing CredHub for Kubernetes Deployments
Using CredHub for Kubernetes Deployments
 
Vault Open Source vs Enterprise v2
Vault Open Source vs Enterprise v2Vault Open Source vs Enterprise v2
Vault Open Source vs Enterprise v2
 
Rails security: above and beyond the defaults
Rails security: above and beyond the defaultsRails security: above and beyond the defaults
Rails security: above and beyond the defaults
 
13 practical tips for writing secure golang applications
13 practical tips for writing secure golang applications13 practical tips for writing secure golang applications
13 practical tips for writing secure golang applications
 

Recently uploaded

Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
Aftab Hussain
 
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Crescat
 
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeA Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
Aftab Hussain
 
Empowering Growth with Best Software Development Company in Noida - Deuglo
Empowering Growth with Best Software  Development Company in Noida - DeugloEmpowering Growth with Best Software  Development Company in Noida - Deuglo
Empowering Growth with Best Software Development Company in Noida - Deuglo
Deuglo Infosystem Pvt Ltd
 
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
mz5nrf0n
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
Łukasz Chruściel
 
Using Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional SafetyUsing Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional Safety
Ayan Halder
 
GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
Neo4j
 
Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
Drona Infotech
 
APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
Boni García
 
What is Augmented Reality Image Tracking
What is Augmented Reality Image TrackingWhat is Augmented Reality Image Tracking
What is Augmented Reality Image Tracking
pavan998932
 
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
Google
 
SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024
Hironori Washizaki
 
Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
Octavian Nadolu
 
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
Google
 
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Łukasz Chruściel
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOMLORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
lorraineandreiamcidl
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
rickgrimesss22
 
Hand Rolled Applicative User Validation Code Kata
Hand Rolled Applicative User ValidationCode KataHand Rolled Applicative User ValidationCode Kata
Hand Rolled Applicative User Validation Code Kata
Philip Schwarz
 
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissancesAtelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Neo4j
 

Recently uploaded (20)

Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
 
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
 
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeA Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
 
Empowering Growth with Best Software Development Company in Noida - Deuglo
Empowering Growth with Best Software  Development Company in Noida - DeugloEmpowering Growth with Best Software  Development Company in Noida - Deuglo
Empowering Growth with Best Software Development Company in Noida - Deuglo
 
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
 
Using Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional SafetyUsing Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional Safety
 
GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
 
Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
 
APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
 
What is Augmented Reality Image Tracking
What is Augmented Reality Image TrackingWhat is Augmented Reality Image Tracking
What is Augmented Reality Image Tracking
 
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
 
SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024
 
Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
 
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
 
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOMLORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
 
Hand Rolled Applicative User Validation Code Kata
Hand Rolled Applicative User ValidationCode KataHand Rolled Applicative User ValidationCode Kata
Hand Rolled Applicative User Validation Code Kata
 
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissancesAtelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissances
 

Building a fence around your Hadoop cluster

  • 1. Hadoop Security Building a fence around your Hadoop cluster Lars Francke June 12, 2017 Berlin Buzzwords 2017
  • 3. About me - Lars Francke • Partner & Co-Founder at OpenCore • Before that: EMEA Hadoop Consultant • Hadoop since 2008/2009 • Apache Committer: Hive, ORC • Contact: • lars.francke@opencore.com • @lars_francke © 2017 OpenCore GmbH & Co. KG 1/50
  • 5. Overview - Be Warned ?© 2017 OpenCore GmbH & Co. KG 2/50
  • 6. The Book Source: oreilly.com © 2017 OpenCore GmbH & Co. KG 3/50
  • 7. Other Books Source: oreilly.com © 2017 OpenCore GmbH & Co. KG 4/50
  • 8. Other Sources • Kerberos: The Definitive Guide1 • Kerberos (German)2 • Active Directory, 5th Edition3 • Hadoop and Kerberos: The Madness beyond the Gate4 • HBase: The Definitive Guide, 2nd Edition5 • Bulletproof SSL and TLS6 1http://shop.oreilly.com/product/9780596004033.do 2http://www.kerberos-buch.de/ 3http://shop.oreilly.com/product/0636920023913.do 4https://www.gitbook.com/book/steveloughran/kerberos_and_hadoop/details 5http://shop.oreilly.com/product/0636920033943.do 6https://www.feistyduck.com/books/bulletproof-ssl-and-tls/ © 2017 OpenCore GmbH & Co. KG 5/50
  • 9. Project Management How is a typical project structured and where does Hadoop security come into play? © 2017 OpenCore GmbH & Co. KG 6/50
  • 10. Project Management • Idea/Initiation © 2017 OpenCore GmbH & Co. KG 7/50
  • 11. Project Management • Idea/Initiation • Planning/Design © 2017 OpenCore GmbH & Co. KG 7/50
  • 12. Project Management • Idea/Initiation • Planning/Design • Execution/Implementation © 2017 OpenCore GmbH & Co. KG 7/50
  • 13. Project Management • Idea/Initiation • Planning/Design • Execution/Implementation • “Production” © 2017 OpenCore GmbH & Co. KG 7/50
  • 14. When To Start You can not begin thinking about Security too early! © 2017 OpenCore GmbH & Co. KG 8/50
  • 15. Next steps… Let’s dig into the details of each step in a project © 2017 OpenCore GmbH & Co. KG 9/50
  • 17. What To Think About? • What am I going to use the cluster for? © 2017 OpenCore GmbH & Co. KG 10/50
  • 18. What To Think About? • What am I going to use the cluster for? • Which tools am I going to need to achieve the use-cases? © 2017 OpenCore GmbH & Co. KG 10/50
  • 19. What To Think About? • What am I going to use the cluster for? • Which tools am I going to need to achieve the use-cases? • What kind of data am I going to ingest, store or process? © 2017 OpenCore GmbH & Co. KG 10/50
  • 20. What To Think About? • What am I going to use the cluster for? • Which tools am I going to need to achieve the use-cases? • What kind of data am I going to ingest, store or process? • What kind of corporate guidelines exist that must be followed? © 2017 OpenCore GmbH & Co. KG 10/50
  • 21. Do Not Assume Anything Do not assume companies/people know what a product is capable of just because they seem confident. © 2017 OpenCore GmbH & Co. KG 11/50
  • 22. What does Security even mean? What is part of Hadoop Security? © 2017 OpenCore GmbH & Co. KG 12/50
  • 23. Parts Of A Security Concept • Authentication of users © 2017 OpenCore GmbH & Co. KG 13/50
  • 24. Parts Of A Security Concept • Authentication of users • Authorization of users © 2017 OpenCore GmbH & Co. KG 13/50
  • 25. Parts Of A Security Concept • Authentication of users • Authorization of users • Auditing © 2017 OpenCore GmbH & Co. KG 13/50
  • 26. Parts Of A Security Concept • Authentication of users • Authorization of users • Auditing • Data Protection: Encryption on-the-wire © 2017 OpenCore GmbH & Co. KG 13/50
  • 27. Parts Of A Security Concept • Authentication of users • Authorization of users • Auditing • Data Protection: Encryption on-the-wire • Data Protection: Encryption at-rest © 2017 OpenCore GmbH & Co. KG 13/50
  • 29. Kerberos == Hadoop Security? So, does Hadoop Security mean I need to “enable” Kerberos and I’m done? © 2017 OpenCore GmbH & Co. KG 14/50
  • 30. Kerberos == Hadoop Security? So, does Hadoop Security mean I need to “enable” Kerberos and I’m done? No! Far from it! © 2017 OpenCore GmbH & Co. KG 14/50
  • 31. Overview Before you’re able to secure anything… © 2017 OpenCore GmbH & Co. KG 15/50
  • 32. Overview Before you’re able to secure anything… …something must be running that can be secured. © 2017 OpenCore GmbH & Co. KG 15/50
  • 33. Overview - contd. © 2017 OpenCore GmbH & Co. KG 16/50
  • 34. Network © 2017 OpenCore GmbH & Co. KG 17/50
  • 35. Host/Operating System • SELinux • NTP/Chrony • Firewalls • Antivirus • Proxies © 2017 OpenCore GmbH & Co. KG 18/50
  • 36. Hadoop Installation • You usually install a cluster manager first © 2017 OpenCore GmbH & Co. KG 19/50
  • 37. Hadoop Installation • You usually install a cluster manager first • Then you install Agents © 2017 OpenCore GmbH & Co. KG 19/50
  • 38. Hadoop Installation • You usually install a cluster manager first • Then you install Agents • Followed by a distribution with lots of components… © 2017 OpenCore GmbH & Co. KG 19/50
  • 39. Authentication • Core Hadoop requires Kerberos for strong authentication • Multiple choices on how to implement that: • Cluster-local standalone KDC • Cluster-local standalone KDC with one-way cross-realm trust to a central KDC • Direct integration into a central KDC © 2017 OpenCore GmbH & Co. KG 20/50
  • 40. Authentication: Cloudera Manager/Ambari Wizards Source: unsplash.com by Sirotorn Sumpunkulpak © 2017 OpenCore GmbH & Co. KG 21/50
  • 41. Authentication - contd. What you need: • Names/IPs for your KDC • Supported Encryption Types (see my blog post7 for more details) • Firewall must allow all cluster machines to access the KDC • Potentially a bunch of information to configure krb5.conf properly 7http://www.opencore.com/blog/2017/3/kerberos-encryption-types © 2017 OpenCore GmbH & Co. KG 22/50
  • 42. Authentication - contd. Ideally you want the automatic option, but... • You need an account in your AD/KDC that is allowed to create other accounts! • (AD only) You need to talk via LDAPS and make sure that you have all the truststores set up properly • (AD only) You cannot have multiple SPNs per User • For SPNEGO you need a HTTP/<host> principal, sometimes doesn’t match corporate guidelines © 2017 OpenCore GmbH & Co. KG 23/50
  • 43. Authentication - contd. There’s a manual option… © 2017 OpenCore GmbH & Co. KG 24/50
  • 44. Authentication - contd. Kerberos is only part of the authentication story. © 2017 OpenCore GmbH & Co. KG 25/50
  • 45. Authentication - contd. Knox, Cloudera Manager, Ambari, Ranger, Sentry, Hive, Impala etc. all have their own authentication layers which also support LDAP(S) or other mechanisms. © 2017 OpenCore GmbH & Co. KG 26/50
  • 46. Identity Management Your users need to exist on the workers so that YARN can start jobs using setuid(). © 2017 OpenCore GmbH & Co. KG 27/50
  • 47. Identity Management - contd. • This is usually done using a tool like SSSD (free) or Centrify (commercial) © 2017 OpenCore GmbH & Co. KG 28/50
  • 48. Identity Management - contd. • This is usually done using a tool like SSSD (free) or Centrify (commercial) • When using Centrify, configure it to not create the default HTTP principal for each server © 2017 OpenCore GmbH & Co. KG 28/50
  • 49. Identity Management - contd. • This is usually done using a tool like SSSD (free) or Centrify (commercial) • When using Centrify, configure it to not create the default HTTP principal for each server • This does not mean that your users need to be able to SSH into the workers, quite the opposite © 2017 OpenCore GmbH & Co. KG 28/50
  • 50. Identity Management - contd. • You need to have the details needed to fetch user & group information from LDAP(S) • Some solutions require a domain join © 2017 OpenCore GmbH & Co. KG 29/50
  • 51. Impersonation/Proxy Users © 2017 OpenCore GmbH & Co. KG 30/50
  • 52. Authorization • All tools have some form of authorization built-in • Core Hadoop components have a first line defense: Service Level Authorizations • Ranger & Sentry promise cross-cutting RBAC functionality © 2017 OpenCore GmbH & Co. KG 31/50
  • 53. Data Protection: Encryption At-Rest HDFS Transparent Encryption All I need is a KMS, right? © 2017 OpenCore GmbH & Co. KG 32/50
  • 54. Data Protection: Encryption At-Rest - contd. • KMS should be “close” to your clients and NameNode • KMS should be separately administered • AFAIK only Hadoop 3 will allow the / (root) directory to be its own zone and have sub-zones • Secure access to the backing store & host where KMS runs • You can use a HSM • What about Impersonation and things like Knox or HttpFS? © 2017 OpenCore GmbH & Co. KG 33/50
  • 55. Data Protection: Encryption At Rest - contd. Now that my data in HDFS is secure I’m good, right? • Metadata in databases • Loq & Audit files • YARN localized stuff • Temporary files • (Spark) spill files & cached data Only Cloudera has a (paid) solution for this: Cloudera NavEncrypt © 2017 OpenCore GmbH & Co. KG 34/50
  • 56. Data Protection/Authorization: Data Masking • Static vs. dynamic masking • Ranger & BigSQL support masking, Sentry does not • RecordService might be a potential integration point • 3rd party tools © 2017 OpenCore GmbH & Co. KG 35/50
  • 57. Data Protection: Encryption On-The-Wire • Web Interfaces • REST/Thrift Interfaces • MapReduce Shuffle • Spark Shuffle (only in 2.1+) • Server to Agent communication • Ingest/Egress tools (Flume, Informatica, Kafka, …) • RPC • HDFS Data traffic • Traffic from your YARN apps © 2017 OpenCore GmbH & Co. KG 36/50
  • 58. Data Protection: Encryption On-The-Wire • 2-way TLS possible? • Which cipher suites are supported? • Some tools automatically disable HTTP others don’t! • Missing documentation all around © 2017 OpenCore GmbH & Co. KG 37/50
  • 59. Auditing & Logging • No documentation (on auditing events and lots of other things) • Ranger aggregates audit logs, async by default • Cloudera has Navigator • No integrity protection • Need to protect against admins © 2017 OpenCore GmbH & Co. KG 38/50
  • 60. 3rd Party Tools • Not a single product has good documentation • How do they access the cluster? • Do they themselves authenticate and authorize users? How? • Auditing/Logging? © 2017 OpenCore GmbH & Co. KG 39/50
  • 62. What now? So you’ve finished the project and handed it over to the operations team…what now? © 2017 OpenCore GmbH & Co. KG 40/50
  • 63. Keep it running • Plan regular updates to your OS • Use a Configuration Management tool (e.g. Ansible) • Plan regular updates to your distribution! © 2017 OpenCore GmbH & Co. KG 41/50
  • 64. Monitoring All our logging & auditing is irrelevant if you’re not monitoring & alerting on those logs. © 2017 OpenCore GmbH & Co. KG 42/50
  • 65. Environments You need a place where you can test upgrades. You might also need a place for backups. Devs would like a cluster as well… © 2017 OpenCore GmbH & Co. KG 43/50
  • 66. Environments © 2017 OpenCore GmbH & Co. KG 44/50
  • 67. Environments © 2017 OpenCore GmbH & Co. KG 45/50
  • 68. User Lifecycle • What happens when a user leaves your organization? • Caches • Ranger Usersync • Existing sessions • Shared accounts © 2017 OpenCore GmbH & Co. KG 46/50
  • 69. Misc
  • 70. Cloud? What about the cloud providers? © 2017 OpenCore GmbH & Co. KG 47/50
  • 71. Distributions • Cloudera © 2017 OpenCore GmbH & Co. KG 48/50
  • 72. Distributions • Cloudera • Hortonworks © 2017 OpenCore GmbH & Co. KG 48/50
  • 73. Distributions • Cloudera • Hortonworks • IBM © 2017 OpenCore GmbH & Co. KG 48/50
  • 74. Distributions • Cloudera • Hortonworks • IBM • Microsoft HDInsight © 2017 OpenCore GmbH & Co. KG 48/50
  • 75. Thank You! Thank you for listening! © 2017 OpenCore GmbH & Co. KG 49/50
  • 76. Questions & Contact Questions? Contact me at: • lars.francke@opencore.com • @lars_francke Visit us at opencore.com And if this stuff interests you: We’re looking to expand our team! © 2017 OpenCore GmbH & Co. KG 50/50