SlideShare a Scribd company logo
1 of 22
Securing the Hadoop Ecosystem
Patrick Angeles
Big Data Warehouse Meetup
Feb 10, 2014
Why is Security Important?
About Me
Hadooping for 5+ years
• Responsible for several secure Hadoop deployments
• Did e-commerce and consumer analytics (PCI, PII,
etc.)
• Crypto and PKI in a previous life.
•
Why Secure Hadoop?
•

Multi-tenancy
•

•

You want your cluster to store data and run workloads
from multiple users and groups

Compliance
•

You have policies on which personnel can view what data
Agenda
Hadoop Ecosystem Interactions
• Security Concepts
• Security in Practice
•

•
•

IT Infrastructure Integration
Deployment Recommendations
Hadoop on its Own
WebHdfs
client

HDFS
client

Hadoop
NN

SNN

DN TT

Map
Task

DN TT

Map
Task

DN TT

Reduce
Task

HttpFS

MR
client

hdfs, httpfs & mapred users

JT

end users

protocols: RPC/data transfer/HTTP
Hadoop and Friends
service users

end users

clients

protocols: RPCs/data/HTTP/Thrift/Avro-RPC
services

clients
Hbase

Zookeeper

RPC

Hbase

RPC

Zookeeper
Oozie

HTTP

Oozie

WebHdfs
Pig

HTTP

Hue

Crunch

HTTP

browser

HTTP
Cascading

MapRed

RPC

Hadoop

RPC

Flume

Sqoop

Impala

Hive

Hive Metastore

Thrift

Avro RPC

Thrift

Flume

Impala
Security Concepts
Authentication
• Authorization
• Confidentiality
•

•

•

Encryption

Auditing
•

Traceability
Authentication
•

End Users to Services, as a user
•
•
•

•

Services to Services, as a service
•
•

•

CLI & libraries: Kerberos (kinit or keytab)
Web UIs: Kerberos SPNEGO & pluggable HTTP auth
MR tasks use delegation tokens
Credentials: Kerberos (keytab)
Client SSL certificates (for shuffle encryption)

Services to Services, on behalf of a user
•

Proxy-user (after Kerberos for service)
Authorization
•

HDFS Data
•

•

HBase Data
•

•

Fine-grained authorization through Apache Sentry (Incubating)

Jobs (Hadoop, Oozie)
•

•

Read/Write Access Control Lists (ACLs) at table level

Hive Server 2 and Impala
•

•

File System permissions (Unix like user/group
permissions)

Job ACLs for Hadoop Scheduler Queues, manage &
view jobs

Zookeeper
•

ACLs at znodes, authenticated & read/write
Confidentiality
•

Data in transit
RPC: using SASL
• HDFS data: using SASL
• HTTP: using SSL (web UIs, shuffle). Requires SSL
certs
•

•

Data at rest
Nothing out of the box
• Doable by: custom ‘compression’ codec or
local file system encryption
•
Auditing
•

Who accessed (read/write) FS data
•
•

•

Who submitted, managed, or viewed a Job or a
Query
•

•

NN audit log contains all file opens, creates
NN audit log contains all metadata ops, e.g. rename, listdir

JT, RM, and Job History Server logs contain history of all
jobs run on a cluster

Who submitted, managed, or viewed a workflow
•

Oozie audit logs contain history of all user requests
Auditing Gaps
•

Not all projects have explicit audit logs
•
•

•

It is difficult to correlate jobs & data access
•
•

•

Audit-like information can be extracted by processing logs
Eg: Impala query logs are distributed across all nodes
Eg: Map-Reduce jobs launched by Pig job
Eg: HDFS data accessed by a Map-Reduce job

Tools written on top of Hadoop can do this well
Security in Practice
Integration: Kerberos
Users don’t want Yet Another Credential
• Corp IT doesn’t want to provision thousands of
service principals
• Solution: local KDC + one-way trust
• Run a KDC (usually MIT Kerberos) in the cluster
•

•

•

Put all service principals here

Set up one-way trust of central corporate realm by
local KDC
•

Normal user credentials can be used to access Hadoop
Integration: Groups
•

Much of Hadoop authorization uses “groups”
•

•

Users’ groups are not stored in Hadoop anywhere
•
•

•

User ‘patrick’ might belong to groups ‘analysts’, ‘eng’, etc.
Refers to external system to determine group membership
NN/JT/Oozie/Hive servers all must perform group mapping

Default plugins for user/group mapping:
•
•
•

ShellBasedUnixGroupsMapping – forks/runs `/bin/id’
JniBasedUnixGroupsMapping – makes a system call
LdapGroupsMapping – talks directly to an LDAP server
Integration: Kerberos + LDAP

Central Active Directory

LDAP group
mapping

me@EXAMPLE.COM
…

Hadoop Cluster

NN

JT
Local KDC

Cross-realm trust

hdfs/host1@HADOOP.EXAMPLE.COM
yarn/host2@HADOOP.EXAMPLE.COM
…
Integration: Web Interfaces
•

Most web interfaces authenticate using SPNEGO
•
•
•

•

Standard HTTP authentication protocol
Used internally by services which communicate over HTTP
Most browsers support Kerberos SPNEGO authentication

Hadoop components which use servlets for web
interfaces can plug in custom filter
•

Integrate with intranet SSO HTTP solution
Recommendations
•

Security configuration is a PITA
•

•

Do only what you really need

Enable cluster security (Kerberos) only if un-trusted
groups of users are sharing the cluster
•

Otherwise use edge-security to keep outsiders out

Only enable wire encryption if required
• Only enable web interface authentication if required
•
Security Enablement
•

Secure Hadoop enablement order
1.
2.
3.
4.
5.
6.
7.

HDFS RPC (including SNN check-pointing)
JobTracker RPC
TaskTrackers RPC & LinuxTaskControler
Hadoop web UI
Configure monitoring to work with security
Other services (HBase, Oozie, Hive Metastore, etc)
Continue with authorization and network encryption if
needed
Administration
•

Use an admin/management tool
•
•
•

Several inter-related configuration knobs
To manage principals/keytabs creation and distribution
Automatically configures monitoring for security
Q&A

More Related Content

What's hot

Security needs in Hadoop’s Current and Future – How Apache Ranger can help?
Security needs in Hadoop’s Current and Future – How Apache Ranger can help?Security needs in Hadoop’s Current and Future – How Apache Ranger can help?
Security needs in Hadoop’s Current and Future – How Apache Ranger can help?DataWorks Summit
 
Open Source Security Tools for Big Data
Open Source Security Tools for Big DataOpen Source Security Tools for Big Data
Open Source Security Tools for Big DataRommel Garcia
 
Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...
Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...
Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...Cloudera, Inc.
 
Article data-centric security key to cloud and digital business
Article   data-centric security key to cloud and digital businessArticle   data-centric security key to cloud and digital business
Article data-centric security key to cloud and digital businessUlf Mattsson
 
Nl HUG 2016 Feb Hadoop security from the trenches
Nl HUG 2016 Feb Hadoop security from the trenchesNl HUG 2016 Feb Hadoop security from the trenches
Nl HUG 2016 Feb Hadoop security from the trenchesBolke de Bruin
 
Hadoop Security and Compliance - StampedeCon 2016
Hadoop Security and Compliance - StampedeCon 2016Hadoop Security and Compliance - StampedeCon 2016
Hadoop Security and Compliance - StampedeCon 2016StampedeCon
 
Implementing Security on a Large Multi-Tenant Cluster the Right Way
Implementing Security on a Large Multi-Tenant Cluster the Right WayImplementing Security on a Large Multi-Tenant Cluster the Right Way
Implementing Security on a Large Multi-Tenant Cluster the Right WayDataWorks Summit
 
What the Enterprise Requires - Business Continuity and Visibility
What the Enterprise Requires - Business Continuity and VisibilityWhat the Enterprise Requires - Business Continuity and Visibility
What the Enterprise Requires - Business Continuity and VisibilityCloudera, Inc.
 
Classification based security in Hadoop
Classification based security in HadoopClassification based security in Hadoop
Classification based security in HadoopMadhan Neethiraj
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop SecurityDataWorks Summit
 
Advanced Security In Hadoop Cluster
Advanced Security In Hadoop ClusterAdvanced Security In Hadoop Cluster
Advanced Security In Hadoop ClusterEdureka!
 
Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise Users
Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise UsersApache Knox Gateway "Single Sign On" expands the reach of the Enterprise Users
Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise UsersDataWorks Summit
 
Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache...
Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache...Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache...
Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache...Hortonworks
 
Ranger admin dev overview
Ranger admin dev overviewRanger admin dev overview
Ranger admin dev overviewTushar Dudhatra
 
Securing Enterprise Healthcare Big Data by the Combination of Knox/F5, Ranger...
Securing Enterprise Healthcare Big Data by the Combination of Knox/F5, Ranger...Securing Enterprise Healthcare Big Data by the Combination of Knox/F5, Ranger...
Securing Enterprise Healthcare Big Data by the Combination of Knox/F5, Ranger...DataWorks Summit
 
Security implementation on hadoop
Security implementation on hadoopSecurity implementation on hadoop
Security implementation on hadoopWei-Chiu Chuang
 
Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...
Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...
Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...Artem Ervits
 

What's hot (20)

Security needs in Hadoop’s Current and Future – How Apache Ranger can help?
Security needs in Hadoop’s Current and Future – How Apache Ranger can help?Security needs in Hadoop’s Current and Future – How Apache Ranger can help?
Security needs in Hadoop’s Current and Future – How Apache Ranger can help?
 
Hadoop Security
Hadoop SecurityHadoop Security
Hadoop Security
 
Open Source Security Tools for Big Data
Open Source Security Tools for Big DataOpen Source Security Tools for Big Data
Open Source Security Tools for Big Data
 
Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...
Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...
Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...
 
Article data-centric security key to cloud and digital business
Article   data-centric security key to cloud and digital businessArticle   data-centric security key to cloud and digital business
Article data-centric security key to cloud and digital business
 
Hadoop Security
Hadoop SecurityHadoop Security
Hadoop Security
 
Big data security
Big data securityBig data security
Big data security
 
Nl HUG 2016 Feb Hadoop security from the trenches
Nl HUG 2016 Feb Hadoop security from the trenchesNl HUG 2016 Feb Hadoop security from the trenches
Nl HUG 2016 Feb Hadoop security from the trenches
 
Hadoop Security and Compliance - StampedeCon 2016
Hadoop Security and Compliance - StampedeCon 2016Hadoop Security and Compliance - StampedeCon 2016
Hadoop Security and Compliance - StampedeCon 2016
 
Implementing Security on a Large Multi-Tenant Cluster the Right Way
Implementing Security on a Large Multi-Tenant Cluster the Right WayImplementing Security on a Large Multi-Tenant Cluster the Right Way
Implementing Security on a Large Multi-Tenant Cluster the Right Way
 
What the Enterprise Requires - Business Continuity and Visibility
What the Enterprise Requires - Business Continuity and VisibilityWhat the Enterprise Requires - Business Continuity and Visibility
What the Enterprise Requires - Business Continuity and Visibility
 
Classification based security in Hadoop
Classification based security in HadoopClassification based security in Hadoop
Classification based security in Hadoop
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
 
Advanced Security In Hadoop Cluster
Advanced Security In Hadoop ClusterAdvanced Security In Hadoop Cluster
Advanced Security In Hadoop Cluster
 
Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise Users
Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise UsersApache Knox Gateway "Single Sign On" expands the reach of the Enterprise Users
Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise Users
 
Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache...
Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache...Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache...
Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache...
 
Ranger admin dev overview
Ranger admin dev overviewRanger admin dev overview
Ranger admin dev overview
 
Securing Enterprise Healthcare Big Data by the Combination of Knox/F5, Ranger...
Securing Enterprise Healthcare Big Data by the Combination of Knox/F5, Ranger...Securing Enterprise Healthcare Big Data by the Combination of Knox/F5, Ranger...
Securing Enterprise Healthcare Big Data by the Combination of Knox/F5, Ranger...
 
Security implementation on hadoop
Security implementation on hadoopSecurity implementation on hadoop
Security implementation on hadoop
 
Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...
Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...
Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...
 

Viewers also liked

Anthony R Palazzo - Manufacturing-2016
Anthony R Palazzo - Manufacturing-2016Anthony R Palazzo - Manufacturing-2016
Anthony R Palazzo - Manufacturing-2016Anthony Palazzo
 
The Workers Business Model
The Workers Business ModelThe Workers Business Model
The Workers Business ModelMalcolm Ryder
 
Zlot Chorągwi
Zlot ChorągwiZlot Chorągwi
Zlot Chorągwicnskubiak
 
Hive contributors meetup apache sentry
Hive contributors meetup   apache sentryHive contributors meetup   apache sentry
Hive contributors meetup apache sentryBrock Noland
 
Cerner Corporation
Cerner CorporationCerner Corporation
Cerner CorporationWaqask23
 
Expertise Hour: The Dos and Don'ts of Web Chat with Johan Jacobs
 Expertise Hour: The Dos and Don'ts of Web Chat with Johan Jacobs Expertise Hour: The Dos and Don'ts of Web Chat with Johan Jacobs
Expertise Hour: The Dos and Don'ts of Web Chat with Johan JacobsMoxie
 
Unlock Hadoop Success with Cloudera Navigator Optimizer
Unlock Hadoop Success with Cloudera Navigator OptimizerUnlock Hadoop Success with Cloudera Navigator Optimizer
Unlock Hadoop Success with Cloudera Navigator OptimizerCloudera, Inc.
 
Knowledge Economy - Knowledge Ecology
Knowledge Economy - Knowledge EcologyKnowledge Economy - Knowledge Ecology
Knowledge Economy - Knowledge EcologyJay Hays
 
Hadoop Integration into Data Warehousing Architectures
Hadoop Integration into Data Warehousing ArchitecturesHadoop Integration into Data Warehousing Architectures
Hadoop Integration into Data Warehousing ArchitecturesHumza Naseer
 
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...Cloudera, Inc.
 
Hadoop Distributed File System (HDFS) Encryption with Cloudera Navigator Key ...
Hadoop Distributed File System (HDFS) Encryption with Cloudera Navigator Key ...Hadoop Distributed File System (HDFS) Encryption with Cloudera Navigator Key ...
Hadoop Distributed File System (HDFS) Encryption with Cloudera Navigator Key ...Cloudera, Inc.
 

Viewers also liked (20)

Caab2010jan Foh
Caab2010jan FohCaab2010jan Foh
Caab2010jan Foh
 
Resume 1
Resume 1Resume 1
Resume 1
 
Anthony R Palazzo - Manufacturing-2016
Anthony R Palazzo - Manufacturing-2016Anthony R Palazzo - Manufacturing-2016
Anthony R Palazzo - Manufacturing-2016
 
learning
learninglearning
learning
 
Ytube
YtubeYtube
Ytube
 
The Workers Business Model
The Workers Business ModelThe Workers Business Model
The Workers Business Model
 
3rd Slide Markma
3rd Slide Markma3rd Slide Markma
3rd Slide Markma
 
Zlot Chorągwi
Zlot ChorągwiZlot Chorągwi
Zlot Chorągwi
 
Hive contributors meetup apache sentry
Hive contributors meetup   apache sentryHive contributors meetup   apache sentry
Hive contributors meetup apache sentry
 
Cerner Corporation
Cerner CorporationCerner Corporation
Cerner Corporation
 
Expertise Hour: The Dos and Don'ts of Web Chat with Johan Jacobs
 Expertise Hour: The Dos and Don'ts of Web Chat with Johan Jacobs Expertise Hour: The Dos and Don'ts of Web Chat with Johan Jacobs
Expertise Hour: The Dos and Don'ts of Web Chat with Johan Jacobs
 
Unlock Hadoop Success with Cloudera Navigator Optimizer
Unlock Hadoop Success with Cloudera Navigator OptimizerUnlock Hadoop Success with Cloudera Navigator Optimizer
Unlock Hadoop Success with Cloudera Navigator Optimizer
 
Knowledge Economy - Knowledge Ecology
Knowledge Economy - Knowledge EcologyKnowledge Economy - Knowledge Ecology
Knowledge Economy - Knowledge Ecology
 
April 2014 HUG : Apache Sentry
April 2014 HUG : Apache SentryApril 2014 HUG : Apache Sentry
April 2014 HUG : Apache Sentry
 
Hadoop Integration into Data Warehousing Architectures
Hadoop Integration into Data Warehousing ArchitecturesHadoop Integration into Data Warehousing Architectures
Hadoop Integration into Data Warehousing Architectures
 
Presentación Práctica Grupal 2012
Presentación Práctica Grupal 2012Presentación Práctica Grupal 2012
Presentación Práctica Grupal 2012
 
Sentry - An Introduction
Sentry - An Introduction Sentry - An Introduction
Sentry - An Introduction
 
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...
 
Hadoop Distributed File System (HDFS) Encryption with Cloudera Navigator Key ...
Hadoop Distributed File System (HDFS) Encryption with Cloudera Navigator Key ...Hadoop Distributed File System (HDFS) Encryption with Cloudera Navigator Key ...
Hadoop Distributed File System (HDFS) Encryption with Cloudera Navigator Key ...
 
Proarrhythmia
ProarrhythmiaProarrhythmia
Proarrhythmia
 

Similar to Big Data Warehousing Meetup: Securing the Hadoop Ecosystem by Cloudera

Securing the Hadoop Ecosystem
Securing the Hadoop EcosystemSecuring the Hadoop Ecosystem
Securing the Hadoop EcosystemDataWorks Summit
 
CIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding Edge
CIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding EdgeCIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding Edge
CIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding EdgeCloudIDSummit
 
Middleware in Golang: InVision's Rye
Middleware in Golang: InVision's RyeMiddleware in Golang: InVision's Rye
Middleware in Golang: InVision's RyeCale Hoopes
 
SQL Server Konferenz 2014 - SSIS & HDInsight
SQL Server Konferenz 2014 - SSIS & HDInsightSQL Server Konferenz 2014 - SSIS & HDInsight
SQL Server Konferenz 2014 - SSIS & HDInsightTillmann Eitelberg
 
Securing Hadoop in an Enterprise Context (v2)
Securing Hadoop in an Enterprise Context (v2)Securing Hadoop in an Enterprise Context (v2)
Securing Hadoop in an Enterprise Context (v2)Hellmar Becker
 
drupal 7 amfserver presentation: integrating flash and drupal
drupal 7 amfserver presentation: integrating flash and drupaldrupal 7 amfserver presentation: integrating flash and drupal
drupal 7 amfserver presentation: integrating flash and drupalrolf vreijdenberger
 
BigData Security - A Point of View
BigData Security - A Point of ViewBigData Security - A Point of View
BigData Security - A Point of ViewKaran Alang
 
Securing Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise ContextSecuring Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise ContextHellmar Becker
 
Big data - Online Training
Big data - Online TrainingBig data - Online Training
Big data - Online TrainingLearntek1
 
HBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend Micro
HBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend MicroHBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend Micro
HBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend MicroCloudera, Inc.
 
TriHUG October: Apache Ranger
TriHUG October: Apache RangerTriHUG October: Apache Ranger
TriHUG October: Apache Rangertrihug
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop SecurityChris Nauroth
 
Harjeet 2.3 hadoop year
Harjeet 2.3 hadoop yearHarjeet 2.3 hadoop year
Harjeet 2.3 hadoop yearHarjeet Singh
 
Open Source Security Tools for Big Data
Open Source Security Tools for Big DataOpen Source Security Tools for Big Data
Open Source Security Tools for Big DataGreat Wide Open
 
Big Data Developers Moscow Meetup 1 - sql on hadoop
Big Data Developers Moscow Meetup 1  - sql on hadoopBig Data Developers Moscow Meetup 1  - sql on hadoop
Big Data Developers Moscow Meetup 1 - sql on hadoopbddmoscow
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security ArchitectureOwen O'Malley
 
Trusted Analytics as a Service (BDT209) | AWS re:Invent 2013
Trusted Analytics as a Service (BDT209) | AWS re:Invent 2013Trusted Analytics as a Service (BDT209) | AWS re:Invent 2013
Trusted Analytics as a Service (BDT209) | AWS re:Invent 2013Amazon Web Services
 

Similar to Big Data Warehousing Meetup: Securing the Hadoop Ecosystem by Cloudera (20)

Securing the Hadoop Ecosystem
Securing the Hadoop EcosystemSecuring the Hadoop Ecosystem
Securing the Hadoop Ecosystem
 
CIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding Edge
CIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding EdgeCIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding Edge
CIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding Edge
 
Middleware in Golang: InVision's Rye
Middleware in Golang: InVision's RyeMiddleware in Golang: InVision's Rye
Middleware in Golang: InVision's Rye
 
SQL Server Konferenz 2014 - SSIS & HDInsight
SQL Server Konferenz 2014 - SSIS & HDInsightSQL Server Konferenz 2014 - SSIS & HDInsight
SQL Server Konferenz 2014 - SSIS & HDInsight
 
Securing Hadoop in an Enterprise Context (v2)
Securing Hadoop in an Enterprise Context (v2)Securing Hadoop in an Enterprise Context (v2)
Securing Hadoop in an Enterprise Context (v2)
 
Securing Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise ContextSecuring Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise Context
 
HDFCloud Workshop: HDF5 in the Cloud
HDFCloud Workshop: HDF5 in the CloudHDFCloud Workshop: HDF5 in the Cloud
HDFCloud Workshop: HDF5 in the Cloud
 
drupal 7 amfserver presentation: integrating flash and drupal
drupal 7 amfserver presentation: integrating flash and drupaldrupal 7 amfserver presentation: integrating flash and drupal
drupal 7 amfserver presentation: integrating flash and drupal
 
BigData Security - A Point of View
BigData Security - A Point of ViewBigData Security - A Point of View
BigData Security - A Point of View
 
Securing Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise ContextSecuring Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise Context
 
Big data - Online Training
Big data - Online TrainingBig data - Online Training
Big data - Online Training
 
HBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend Micro
HBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend MicroHBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend Micro
HBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend Micro
 
TriHUG October: Apache Ranger
TriHUG October: Apache RangerTriHUG October: Apache Ranger
TriHUG October: Apache Ranger
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
 
Harjeet 2.3 hadoop year
Harjeet 2.3 hadoop yearHarjeet 2.3 hadoop year
Harjeet 2.3 hadoop year
 
HDF Cloud Services
HDF Cloud ServicesHDF Cloud Services
HDF Cloud Services
 
Open Source Security Tools for Big Data
Open Source Security Tools for Big DataOpen Source Security Tools for Big Data
Open Source Security Tools for Big Data
 
Big Data Developers Moscow Meetup 1 - sql on hadoop
Big Data Developers Moscow Meetup 1  - sql on hadoopBig Data Developers Moscow Meetup 1  - sql on hadoop
Big Data Developers Moscow Meetup 1 - sql on hadoop
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security Architecture
 
Trusted Analytics as a Service (BDT209) | AWS re:Invent 2013
Trusted Analytics as a Service (BDT209) | AWS re:Invent 2013Trusted Analytics as a Service (BDT209) | AWS re:Invent 2013
Trusted Analytics as a Service (BDT209) | AWS re:Invent 2013
 

More from Caserta

Using Machine Learning & Spark to Power Data-Driven Marketing
Using Machine Learning & Spark to Power Data-Driven MarketingUsing Machine Learning & Spark to Power Data-Driven Marketing
Using Machine Learning & Spark to Power Data-Driven MarketingCaserta
 
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...Caserta
 
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Caserta
 
General Data Protection Regulation - BDW Meetup, October 11th, 2017
General Data Protection Regulation - BDW Meetup, October 11th, 2017General Data Protection Regulation - BDW Meetup, October 11th, 2017
General Data Protection Regulation - BDW Meetup, October 11th, 2017Caserta
 
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...Caserta
 
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteArchitecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteCaserta
 
Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)Caserta
 
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017Caserta
 
The Rise of the CDO in Today's Enterprise
The Rise of the CDO in Today's EnterpriseThe Rise of the CDO in Today's Enterprise
The Rise of the CDO in Today's EnterpriseCaserta
 
Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics Caserta
 
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016Caserta
 
You're the New CDO, Now What?
You're the New CDO, Now What?You're the New CDO, Now What?
You're the New CDO, Now What?Caserta
 
The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation Caserta
 
Making Big Data Easy for Everyone
Making Big Data Easy for EveryoneMaking Big Data Easy for Everyone
Making Big Data Easy for EveryoneCaserta
 
Benefits of the Azure Cloud
Benefits of the Azure CloudBenefits of the Azure Cloud
Benefits of the Azure CloudCaserta
 
Big Data Analytics on the Cloud
Big Data Analytics on the CloudBig Data Analytics on the Cloud
Big Data Analytics on the CloudCaserta
 
Intro to Data Science on Hadoop
Intro to Data Science on HadoopIntro to Data Science on Hadoop
Intro to Data Science on HadoopCaserta
 
The Emerging Role of the Data Lake
The Emerging Role of the Data LakeThe Emerging Role of the Data Lake
The Emerging Role of the Data LakeCaserta
 
Not Your Father's Database by Databricks
Not Your Father's Database by DatabricksNot Your Father's Database by Databricks
Not Your Father's Database by DatabricksCaserta
 
Mastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkMastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkCaserta
 

More from Caserta (20)

Using Machine Learning & Spark to Power Data-Driven Marketing
Using Machine Learning & Spark to Power Data-Driven MarketingUsing Machine Learning & Spark to Power Data-Driven Marketing
Using Machine Learning & Spark to Power Data-Driven Marketing
 
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
 
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
 
General Data Protection Regulation - BDW Meetup, October 11th, 2017
General Data Protection Regulation - BDW Meetup, October 11th, 2017General Data Protection Regulation - BDW Meetup, October 11th, 2017
General Data Protection Regulation - BDW Meetup, October 11th, 2017
 
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
 
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteArchitecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
 
Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)
 
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
 
The Rise of the CDO in Today's Enterprise
The Rise of the CDO in Today's EnterpriseThe Rise of the CDO in Today's Enterprise
The Rise of the CDO in Today's Enterprise
 
Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics
 
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
 
You're the New CDO, Now What?
You're the New CDO, Now What?You're the New CDO, Now What?
You're the New CDO, Now What?
 
The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation
 
Making Big Data Easy for Everyone
Making Big Data Easy for EveryoneMaking Big Data Easy for Everyone
Making Big Data Easy for Everyone
 
Benefits of the Azure Cloud
Benefits of the Azure CloudBenefits of the Azure Cloud
Benefits of the Azure Cloud
 
Big Data Analytics on the Cloud
Big Data Analytics on the CloudBig Data Analytics on the Cloud
Big Data Analytics on the Cloud
 
Intro to Data Science on Hadoop
Intro to Data Science on HadoopIntro to Data Science on Hadoop
Intro to Data Science on Hadoop
 
The Emerging Role of the Data Lake
The Emerging Role of the Data LakeThe Emerging Role of the Data Lake
The Emerging Role of the Data Lake
 
Not Your Father's Database by Databricks
Not Your Father's Database by DatabricksNot Your Father's Database by Databricks
Not Your Father's Database by Databricks
 
Mastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkMastering Customer Data on Apache Spark
Mastering Customer Data on Apache Spark
 

Recently uploaded

Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsPrecisely
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 

Recently uploaded (20)

Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 

Big Data Warehousing Meetup: Securing the Hadoop Ecosystem by Cloudera

  • 1. Securing the Hadoop Ecosystem Patrick Angeles Big Data Warehouse Meetup Feb 10, 2014
  • 2. Why is Security Important?
  • 3. About Me Hadooping for 5+ years • Responsible for several secure Hadoop deployments • Did e-commerce and consumer analytics (PCI, PII, etc.) • Crypto and PKI in a previous life. •
  • 4. Why Secure Hadoop? • Multi-tenancy • • You want your cluster to store data and run workloads from multiple users and groups Compliance • You have policies on which personnel can view what data
  • 5. Agenda Hadoop Ecosystem Interactions • Security Concepts • Security in Practice • • • IT Infrastructure Integration Deployment Recommendations
  • 6. Hadoop on its Own WebHdfs client HDFS client Hadoop NN SNN DN TT Map Task DN TT Map Task DN TT Reduce Task HttpFS MR client hdfs, httpfs & mapred users JT end users protocols: RPC/data transfer/HTTP
  • 7. Hadoop and Friends service users end users clients protocols: RPCs/data/HTTP/Thrift/Avro-RPC services clients Hbase Zookeeper RPC Hbase RPC Zookeeper Oozie HTTP Oozie WebHdfs Pig HTTP Hue Crunch HTTP browser HTTP Cascading MapRed RPC Hadoop RPC Flume Sqoop Impala Hive Hive Metastore Thrift Avro RPC Thrift Flume Impala
  • 8. Security Concepts Authentication • Authorization • Confidentiality • • • Encryption Auditing • Traceability
  • 9. Authentication • End Users to Services, as a user • • • • Services to Services, as a service • • • CLI & libraries: Kerberos (kinit or keytab) Web UIs: Kerberos SPNEGO & pluggable HTTP auth MR tasks use delegation tokens Credentials: Kerberos (keytab) Client SSL certificates (for shuffle encryption) Services to Services, on behalf of a user • Proxy-user (after Kerberos for service)
  • 10. Authorization • HDFS Data • • HBase Data • • Fine-grained authorization through Apache Sentry (Incubating) Jobs (Hadoop, Oozie) • • Read/Write Access Control Lists (ACLs) at table level Hive Server 2 and Impala • • File System permissions (Unix like user/group permissions) Job ACLs for Hadoop Scheduler Queues, manage & view jobs Zookeeper • ACLs at znodes, authenticated & read/write
  • 11. Confidentiality • Data in transit RPC: using SASL • HDFS data: using SASL • HTTP: using SSL (web UIs, shuffle). Requires SSL certs • • Data at rest Nothing out of the box • Doable by: custom ‘compression’ codec or local file system encryption •
  • 12. Auditing • Who accessed (read/write) FS data • • • Who submitted, managed, or viewed a Job or a Query • • NN audit log contains all file opens, creates NN audit log contains all metadata ops, e.g. rename, listdir JT, RM, and Job History Server logs contain history of all jobs run on a cluster Who submitted, managed, or viewed a workflow • Oozie audit logs contain history of all user requests
  • 13. Auditing Gaps • Not all projects have explicit audit logs • • • It is difficult to correlate jobs & data access • • • Audit-like information can be extracted by processing logs Eg: Impala query logs are distributed across all nodes Eg: Map-Reduce jobs launched by Pig job Eg: HDFS data accessed by a Map-Reduce job Tools written on top of Hadoop can do this well
  • 15. Integration: Kerberos Users don’t want Yet Another Credential • Corp IT doesn’t want to provision thousands of service principals • Solution: local KDC + one-way trust • Run a KDC (usually MIT Kerberos) in the cluster • • • Put all service principals here Set up one-way trust of central corporate realm by local KDC • Normal user credentials can be used to access Hadoop
  • 16. Integration: Groups • Much of Hadoop authorization uses “groups” • • Users’ groups are not stored in Hadoop anywhere • • • User ‘patrick’ might belong to groups ‘analysts’, ‘eng’, etc. Refers to external system to determine group membership NN/JT/Oozie/Hive servers all must perform group mapping Default plugins for user/group mapping: • • • ShellBasedUnixGroupsMapping – forks/runs `/bin/id’ JniBasedUnixGroupsMapping – makes a system call LdapGroupsMapping – talks directly to an LDAP server
  • 17. Integration: Kerberos + LDAP Central Active Directory LDAP group mapping me@EXAMPLE.COM … Hadoop Cluster NN JT Local KDC Cross-realm trust hdfs/host1@HADOOP.EXAMPLE.COM yarn/host2@HADOOP.EXAMPLE.COM …
  • 18. Integration: Web Interfaces • Most web interfaces authenticate using SPNEGO • • • • Standard HTTP authentication protocol Used internally by services which communicate over HTTP Most browsers support Kerberos SPNEGO authentication Hadoop components which use servlets for web interfaces can plug in custom filter • Integrate with intranet SSO HTTP solution
  • 19. Recommendations • Security configuration is a PITA • • Do only what you really need Enable cluster security (Kerberos) only if un-trusted groups of users are sharing the cluster • Otherwise use edge-security to keep outsiders out Only enable wire encryption if required • Only enable web interface authentication if required •
  • 20. Security Enablement • Secure Hadoop enablement order 1. 2. 3. 4. 5. 6. 7. HDFS RPC (including SNN check-pointing) JobTracker RPC TaskTrackers RPC & LinuxTaskControler Hadoop web UI Configure monitoring to work with security Other services (HBase, Oozie, Hive Metastore, etc) Continue with authorization and network encryption if needed
  • 21. Administration • Use an admin/management tool • • • Several inter-related configuration knobs To manage principals/keytabs creation and distribution Automatically configures monitoring for security
  • 22. Q&A

Editor's Notes

  1. Proxy-user setup:Relying party is configured to recognized super-users who are allowed to impersonate