Securing Hadoop in an Enterprise Context
Apache: Big Data conference
Hellmar Becker, Senior IT Specialist
Budapest, September 29, 2015
Who am I?
Securing Hadoop in an Enterprise Context
1. The Challenge
2. Excursion: Hadoop Usage Patterns
3. Aspects of Security
4. Analytic Clusters: “Sandbox” Model
5. Securing HDFS Environments That Do Automated Processing
6. Connecting to the Enterprise Directory
7. Further Aspects
8. Questions
1. The Challenge
Data Lake and Advanced Analytics within ING
Integrate all data sources within the bank into one processing platform
• Batch data streams
• Live transactions
• Model building for customer interaction
Empower data scientists and analysts to get the best results with advanced analytics tools and predictive models
Open source software where possible, with Hadoop as a core component
Risks
• Data loss
• Privacy breach
• System intrusion
Possible consequences
• Legal consequences
• Loss of reputation
• Financial loss
Hadoop "out of the box" does not have any security model switched on
Hadoop user model:
• A user name is just an alphanumeric string
• So is a group name
• They do not have to match entities in the OS
• Via the REST API (WebHDFS), anybody could in theory read and write HDFS simply by claiming a user name
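To make the REST API point concrete: in Hadoop's default "simple" authentication mode, WebHDFS takes the caller's identity from the user.name query parameter and does not verify it. The minimal sketch below only builds such a URL and sends nothing; the host name is hypothetical, and 50070 is the Hadoop 2.x default NameNode HTTP port (Hadoop 3.x uses 9870).

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host: str, port: int, path: str, op: str, user: str) -> str:
    """Build a WebHDFS REST URL that asserts an arbitrary user name.

    With simple (non-Kerberos) authentication the NameNode accepts the
    user.name parameter as-is; nothing checks that the caller really is
    that user.
    """
    query = urlencode({"op": op, "user.name": user})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# Hypothetical host; the caller simply claims to be "hdfs" (typically the HDFS superuser).
print(webhdfs_url("namenode.example.com", 50070, "/sales-data", "OPEN", "hdfs"))
```

Once Kerberos is enabled, WebHDFS authenticates callers via SPNEGO and the claimed user.name is no longer trusted as the identity.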
2. Excursion: Hadoop Usage Patterns
Hadoop Usage Patterns
1. File Storage
2. Deep Data
3. Analytical Hadoop
4. (Real Time)
Hadoop Usage Patterns: Characteristics
Topic | Analytical Hadoop | Deep Data | File Storage
User access | Named accounts | Non-personal accounts | Non-personal accounts
Capacity management | Small disk space | Large disk space | Large disk space
Resource management | High CPU & memory | Medium CPU & memory | Low CPU & memory
Confidentiality, Integrity, Availability (CIA) rating | C based on use case, IA low | C static/data driven, IA high | C static/data driven, IA high
Flexibility | High | Low | Low
Tooling outside Hadoop | High & user driven | Low & life cycle driven | Low & life cycle driven
Disaster recovery & high availability | Low | High | High
Predictability of jobs | Ad hoc | Scheduled | None
Data | Subset relevant for use case | All | All
Lineage | Irrelevant | Relevant | Relevant
Descriptive metadata | Relevant | Relevant | Relevant
Environments (Develop, Test, Acceptance, Production) | Develop (Test) | Test, Acceptance, Production | Test, Acceptance, Production
3. Aspects of Security
Aspects of Security
Technical: Rings of Defense
• Perimeter Level Security
• Application Level Authentication and Authorization
• OS Security
• Data Protection
See also: http://www.slideshare.net/vinnies12/hadoop-security-today-tomorrow-apache-knox
Conceptual: Five Pillars of Security
• Administration
• Authentication
• Authorization
• Auditing
• Data Protection
See also: http://hortonworks.com/hdp/security/
4. Analytic Clusters: “Sandbox” Model
Approach A: “Sandbox”
• Strong perimeter security
• Ideally "air gapped"
• Practical: allow access only through a terminal service (Citrix, VNC)
Pro:
• Easy to implement
• No changes to internal settings
Con:
• Even legitimate data transfers are difficult
• Not suitable for automated batch processing
• Software updates only through a manually maintained mirror
Used in exploratory environments (pattern 3, Analytical Hadoop)
5. Securing HDFS Environments That Do Automated Processing
Administration
• General goal: zero-touch deployment
• Automatic synchronization with the enterprise directory
• The Ranger UI is used only for incidents
Authentication
• Kerberos
• One KDC per cluster? (Yes)
• Connecting to the enterprise directory (next chapter)
• Keep the Kerberos principals (Hadoop users) completely separate from OS users
Authorization
Simplest approach: HDFS ACLs
But:
• No easy-to-use GUI
• Difficult to maintain an overview
• Only for HDFS; does not handle other components
> hdfs dfs -setfacl -m group:execs:r-- /sales-data
> hdfs dfs -getfacl /sales-data
# file: /sales-data
# owner: bruce
# group: sales
user::rw-
group::r--
group:execs:r--
mask::r--
other::---
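ACL output like the one above can also be checked programmatically, for example when auditing many paths. A minimal sketch, assuming the plain-text getfacl format shown, including the "#effective:" annotation Hadoop's getfacl may append when the mask restricts an entry; parse_getfacl and effective are hypothetical helpers, not part of a Hadoop client library. It also shows why the mask matters: named entries and the owning group never get more than the mask allows.

```python
def parse_getfacl(text: str) -> dict:
    """Parse `hdfs dfs -getfacl` output into header fields and ACL entries."""
    acl = {"entries": {}}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Header lines such as "# owner: bruce"
            key, _, value = line[1:].partition(":")
            acl[key.strip()] = value.strip()
            continue
        # Drop a trailing "#effective:..." annotation if present
        entry = line.split("#", 1)[0].strip()
        parts = entry.split(":")
        if len(parts) < 3:
            raise ValueError(f"unexpected ACL line: {raw!r}")
        # "type:name:perms"; the type may carry a "default:" prefix
        kind, name, perms = ":".join(parts[:-2]), parts[-2], parts[-1]
        acl["entries"][(kind, name)] = perms
    return acl


def effective(acl: dict, kind: str, name: str = "") -> str:
    """Effective permissions of an access-ACL entry after applying the mask.

    The mask limits named users, the owning group and named groups; it does
    not limit the file owner (user::) or other::.
    """
    perms = acl["entries"][(kind, name)]
    mask = acl["entries"].get(("mask", ""))
    if mask is None or kind == "other" or (kind == "user" and name == ""):
        return perms
    return "".join(p if m != "-" else "-" for p, m in zip(perms, mask))
```

Usage: parse_getfacl(output)["owner"] returns "bruce" for the output above, and effective(acl, "group", "execs") returns the permissions the execs group actually gets.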
Better: unified rights management with Ranger
• Service principals are made known to Ranger directly; personal accounts (PAs) are granted rights only via groups
• Groups and users are synced with AD (see details below)
• Note: Ranger cannot take away privileges that were granted at a lower level
• HDFS permissions and ACLs override Ranger
• Make sure these access paths are locked down
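The caveat that Ranger cannot take away lower-level privileges follows from how the HDFS plugin decides: if no Ranger policy allows a request, it can fall back to native HDFS permissions and ACLs (in ranger-hdfs-security.xml this fallback is governed by xasecure.add-hadoop-authorization). The sketch below models that decision order in a deliberately simplified way; the policy and ACL structures are hypothetical, not Ranger's data model, and owner and mask handling are omitted.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical, simplified structures (not Ranger's real data model).
Policy = Tuple[str, str, FrozenSet[str]]  # (path prefix, group, allowed actions)
ACL = Dict[Tuple[str, str], str]          # ("user"|"group"|"other", name) -> "rwx"-style perms

PERM_INDEX = {"read": 0, "write": 1, "execute": 2}
PERM_CHAR = "rwx"


@dataclass(frozen=True)
class Request:
    user: str
    groups: FrozenSet[str]
    path: str
    action: str  # "read", "write" or "execute"


def _under(path: str, prefix: str) -> bool:
    prefix = prefix.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")


def ranger_allows(policies: List[Policy], req: Request) -> bool:
    return any(_under(req.path, prefix) and group in req.groups and req.action in actions
               for prefix, group, actions in policies)


def hdfs_allows(acl: ACL, req: Request) -> bool:
    i = PERM_INDEX[req.action]

    def grants(perms: str) -> bool:
        return perms[i] == PERM_CHAR[i]

    # POSIX-like precedence: named user entry, then groups, then other.
    if ("user", req.user) in acl:
        return grants(acl[("user", req.user)])
    group_perms = [p for (kind, name), p in acl.items() if kind == "group" and name in req.groups]
    if group_perms:
        return any(grants(p) for p in group_perms)
    return grants(acl.get(("other", ""), "---"))


def is_allowed(policies: List[Policy], acl: ACL, req: Request, hadoop_fallback: bool = True) -> bool:
    if ranger_allows(policies, req):
        return True
    # No Ranger allow policy matched: native HDFS permissions/ACLs may still
    # grant access, and Ranger cannot revoke that.
    return hadoop_fallback and hdfs_allows(acl, req)
```

With a Ranger policy granting only the execs group read access, a member of the owning sales group can still read the data through the HDFS permissions unless the fallback is disabled or the HDFS permissions are tightened.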
Auditing
• Ranger standard auditing
• More testing required: is audit logging to a database good enough and fast enough?
6. Connecting to the Enterprise Directory
Separation of administrative duties
• Personal users in the corporate Active Directory, non-personal accounts (NPAs) in the cluster KDC
• One-way realm trust
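With a one-way trust, the cluster realm typically trusts the AD realm, so AD users can obtain tickets for cluster services while cluster principals are not valid in AD. Hadoop then maps each authenticated principal to a short user name via the hadoop.security.auth_to_local rules in core-site.xml. The sketch below is a simplified stand-in for that mapping, not Hadoop's rule parser; both realm names and short_name are hypothetical.

```python
import re

# Hypothetical realm names for illustration only.
AD_REALM = "CORP.EXAMPLE.COM"         # corporate Active Directory (personal users)
CLUSTER_REALM = "HADOOP.EXAMPLE.COM"  # cluster KDC (non-personal accounts)

# primary[/instance]@REALM
PRINCIPAL_RE = re.compile(r"^(?P<primary>[^/@]+)(?:/(?P<instance>[^/@]+))?@(?P<realm>[^/@]+)$")


def short_name(principal: str, trusted=(AD_REALM, CLUSTER_REALM)) -> str:
    """Map a Kerberos principal to a Hadoop short user name.

    Simplified: strips instance and realm, but only for trusted realms.
    No lowercasing or service-specific rules are applied.
    """
    m = PRINCIPAL_RE.match(principal)
    if m is None:
        raise ValueError(f"not a Kerberos principal: {principal!r}")
    if m.group("realm") not in trusted:
        raise PermissionError(f"untrusted realm: {m.group('realm')}")
    return m.group("primary")
```

In a real cluster the AD part would be expressed as an auth_to_local rule, for example RULE:[1:$1@$0](.*@CORP\.EXAMPLE\.COM)s/@.*// (again with a hypothetical realm).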
Specific challenges
• Historically, Windows and Linux are different worlds
• Need to work in interdisciplinary teams
• Educate AD experts on the details of Kerberos realm trust
• Still to be solved: YARN containers need to run as an OS user that matches the HDFS user name
• AD and Linux LDAP use different user keys
• Currently, some teams use workarounds for this (manual maintenance required)
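One way to gauge the YARN user-matching problem on a given node is to list the Hadoop short names that have no local OS account. A minimal sketch, assuming the LinuxContainerExecutor (which launches containers as the submitting user) and a Unix host; missing_os_users is a hypothetical helper, and the lookup function is injectable so it can be tested without real accounts.

```python
import pwd
from typing import Callable, Iterable, List


def missing_os_users(hadoop_users: Iterable[str],
                     lookup: Callable[[str], object] = pwd.getpwnam) -> List[str]:
    """Return the Hadoop short names that have no matching OS account.

    Containers for these users would fail to launch on this node, because
    the container executor runs them as the OS user of the same name.
    """
    missing = []
    for name in hadoop_users:
        try:
            lookup(name)
        except KeyError:  # pwd.getpwnam raises KeyError for unknown users
            missing.append(name)
    return sorted(missing)
```

The input list could come from the directory sync; running the check on every worker node shows where the AD and Linux LDAP user keys have drifted apart.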
Security roles for personal accounts
• Maintained in HR database/tools
• More interdisciplinary cooperation required!
• Need to map abstract "business roles" (function descriptions) to "technical roles" (sets of privileges)
• HR database maintainers have to update this mapping; changes are then reflected in AD
• In LDAP, these technical roles appear as groups
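The business-to-technical role mapping can be pictured as a lookup table whose output is a set of LDAP groups. A minimal sketch; all role and group names, BUSINESS_TO_TECHNICAL and technical_roles are hypothetical, and in the setup described here the mapping lives in HR tools and AD, not in code. Failing loudly on an unmapped business role is a design choice that keeps gaps in the mapping visible.

```python
from typing import Dict, Iterable, Set

# Hypothetical example data: business roles (function descriptions, from HR)
# mapped to technical roles (privilege sets), which appear as LDAP groups.
BUSINESS_TO_TECHNICAL: Dict[str, Set[str]] = {
    "Credit Risk Analyst": {"hdp-risk-read"},
    "Data Scientist": {"hdp-sandbox-rw", "hdp-datalake-read"},
    "Data Engineer": {"hdp-datalake-read", "hdp-datalake-write"},
}


def technical_roles(business_roles: Iterable[str],
                    mapping: Dict[str, Set[str]] = BUSINESS_TO_TECHNICAL) -> Set[str]:
    """Union of the technical roles (LDAP groups) implied by a person's business roles."""
    groups: Set[str] = set()
    for role in business_roles:
        if role not in mapping:
            raise KeyError(f"no technical mapping for business role {role!r}")
        groups |= mapping[role]
    return groups
```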
Synchronizing users and roles from Active Directory
• Ranger's usersync process queries Active Directory via the LDAP protocol
  - Ranger 0.4: reads all users, then determines their group affiliation
  - More than 50,000 employees in ING Group
  - Need to limit the load on the LDAP server!
  - Ranger 0.5: group-driven query; still not optimal because it uses attribute filters
• The most efficient LDAP query is either by a single DN (Distinguished Name) or by container (query base DN)
  - But we cannot use containers because of enterprise policy
• Solution: a custom Python script that queries LDAP hierarchically
  - One “supergroup” is picked by DN
  - The members of the “supergroup” are all LDAP groups that have Hadoop-related privileges
  - Query all these groups, again by DN
  - Examine the members of each group (personal users)
  - Make the user-group relationships known to Ranger via a REST call
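The hierarchical query above can be sketched as a two-level walk in which every lookup is a single-DN read. This is not the author's script: fetch_members is a hypothetical callback (for example backed by a base-scope search with the ldap3 library), DNs are assumed to start with a CN, and the final REST call to Ranger is left out because its endpoint and payload depend on the Ranger version.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Set

# fetch_members(dn) returns the "member" values of the single entry at dn.
# Assumption: implemented as a base-scope lookup on that exact DN, e.g. with
# ldap3 something like conn.search(dn, "(objectClass=*)", search_scope=BASE,
# attributes=["member"]).
FetchMembers = Callable[[str], Iterable[str]]


def collect_user_groups(supergroup_dn: str, fetch_members: FetchMembers) -> Dict[str, Set[str]]:
    """Walk supergroup -> Hadoop groups -> users; return user DN -> group DNs."""
    user_groups: Dict[str, Set[str]] = defaultdict(set)
    # Level 1: members of the supergroup are the Hadoop-related groups.
    for group_dn in fetch_members(supergroup_dn):
        # Level 2: members of each group are personal user accounts.
        for user_dn in fetch_members(group_dn):
            user_groups[user_dn].add(group_dn)
    return dict(user_groups)


def cn(dn: str) -> str:
    """Return the value of a leading CN RDN ("CN=alice,OU=..." -> "alice").

    Simplified: escaped commas inside DN values are not handled.
    """
    attr, _, value = dn.split(",", 1)[0].partition("=")
    if attr.strip().upper() != "CN":
        raise ValueError(f"DN does not start with CN: {dn!r}")
    return value.strip()


def to_short_names(user_groups: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Convert DNs to the short user/group names a Ranger sync would use."""
    return {cn(u): {cn(g) for g in groups} for u, groups in user_groups.items()}
```

The number of LDAP lookups is one for the supergroup plus one per Hadoop group, independent of the total number of employees in the directory.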
7. Further Aspects
Securing the Non-Kerberos/Ranger Components
• Use LDAP to authenticate in Ambari and Hue
• Note: our current setup connects Ambari to Unix LDAP, which is not in sync with AD
Securing the Perimeter
• Knox
• Reverse proxy
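With Knox as the single entry point, clients address HDFS through the gateway rather than the NameNode; Knox publishes WebHDFS under https://<gateway>/gateway/<topology>/webhdfs/v1/... The sketch rewrites a direct WebHDFS URL into that form. The host names, the topology name "default" and port 8443 (Knox's default gateway port) are assumptions for illustration, and dropping user.name reflects the assumption that Knox authenticates the user itself and asserts that identity to the cluster.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def via_knox(webhdfs_url: str, gateway: str, topology: str) -> str:
    """Rewrite a direct WebHDFS URL into a Knox gateway URL.

    gateway is "host:port" of the Knox server; topology names the Knox
    topology that proxies the cluster.
    """
    parts = urlsplit(webhdfs_url)
    if not parts.path.startswith("/webhdfs/v1"):
        raise ValueError(f"not a WebHDFS URL: {webhdfs_url!r}")
    # Assumption: identity comes from authenticating to Knox, so any
    # client-supplied user.name parameter is dropped.
    query = urlencode([(k, v) for k, v in parse_qsl(parts.query) if k != "user.name"])
    return urlunsplit(("https", gateway, f"/gateway/{topology}{parts.path}", query, ""))
```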
Securing Platform Components
• A good HDFS security model takes care of much of what follows
• Considerations for database-like processing (Hive, HBase): column-based or file-based security models; you can't have both
8. Questions
Attributions
• Hellmar in Nîmes / With Python in Mindanao, by the author
• Domtoren in het oranje licht by helena_is_here is licensed under CC BY 2.0
• Data Pipeline, ING OIB Image Bank
• Storm surge by David Baird is licensed under CC BY-SA 2.0; cropped by me
• System Lock by Yuri Samoilov is licensed under CC BY 2.0; cropped by me
• Safe by Rob Pongsajapan is licensed under CC BY 2.0; cropped by me
• Hercules and Cerberus by The Los Angeles County Museum of Art is in the public domain
Backup
Security Model
More Related Content

What's hot

Hadoop Security Today and Tomorrow
Hadoop Security Today and TomorrowHadoop Security Today and Tomorrow
Hadoop Security Today and Tomorrow
DataWorks Summit
 
April 2014 HUG : Apache Sentry
April 2014 HUG : Apache SentryApril 2014 HUG : Apache Sentry
April 2014 HUG : Apache Sentry
Yahoo Developer Network
 
Data protection for hadoop environments
Data protection for hadoop environmentsData protection for hadoop environments
Data protection for hadoop environments
DataWorks Summit
 
Ranger admin dev overview
Ranger admin dev overviewRanger admin dev overview
Ranger admin dev overview
Tushar Dudhatra
 
Deploying Enterprise-grade Security for Hadoop
Deploying Enterprise-grade Security for HadoopDeploying Enterprise-grade Security for Hadoop
Deploying Enterprise-grade Security for Hadoop
Cloudera, Inc.
 
Apache Sentry for Hadoop security
Apache Sentry for Hadoop securityApache Sentry for Hadoop security
Apache Sentry for Hadoop security
bigdatagurus_meetup
 
Big data security
Big data securityBig data security
Big data security
Joey Echeverria
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security Architecture
Owen O'Malley
 
Securing the Hadoop Ecosystem
Securing the Hadoop EcosystemSecuring the Hadoop Ecosystem
Securing the Hadoop Ecosystem
DataWorks Summit
 
Open Source Security Tools for Big Data
Open Source Security Tools for Big DataOpen Source Security Tools for Big Data
Open Source Security Tools for Big Data
Rommel Garcia
 
Nl HUG 2016 Feb Hadoop security from the trenches
Nl HUG 2016 Feb Hadoop security from the trenchesNl HUG 2016 Feb Hadoop security from the trenches
Nl HUG 2016 Feb Hadoop security from the trenches
Bolke de Bruin
 
Hadoop security
Hadoop securityHadoop security
Hadoop security
Biju Nair
 
Big Data Security with Hadoop
Big Data Security with HadoopBig Data Security with Hadoop
Big Data Security with Hadoop
Cloudera, Inc.
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
DataWorks Summit
 
Getting started with MariaDB with Docker
Getting started with MariaDB with DockerGetting started with MariaDB with Docker
Getting started with MariaDB with Docker
MariaDB plc
 
Hadoop and Data Access Security
Hadoop and Data Access SecurityHadoop and Data Access Security
Hadoop and Data Access Security
Cloudera, Inc.
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
Chris Nauroth
 
Database Security Threats — MariaDB Security Best Practices
Database Security Threats — MariaDB Security Best PracticesDatabase Security Threats — MariaDB Security Best Practices
Database Security Threats — MariaDB Security Best Practices
MariaDB plc
 
Database Security Threats - MariaDB Security Best Practices
Database Security Threats - MariaDB Security Best PracticesDatabase Security Threats - MariaDB Security Best Practices
Database Security Threats - MariaDB Security Best Practices
MariaDB plc
 
Apache Hadoop Security - Ranger
Apache Hadoop Security - RangerApache Hadoop Security - Ranger
Apache Hadoop Security - Ranger
Isheeta Sanghi
 

What's hot (20)

Hadoop Security Today and Tomorrow
Hadoop Security Today and TomorrowHadoop Security Today and Tomorrow
Hadoop Security Today and Tomorrow
 
April 2014 HUG : Apache Sentry
April 2014 HUG : Apache SentryApril 2014 HUG : Apache Sentry
April 2014 HUG : Apache Sentry
 
Data protection for hadoop environments
Data protection for hadoop environmentsData protection for hadoop environments
Data protection for hadoop environments
 
Ranger admin dev overview
Ranger admin dev overviewRanger admin dev overview
Ranger admin dev overview
 
Deploying Enterprise-grade Security for Hadoop
Deploying Enterprise-grade Security for HadoopDeploying Enterprise-grade Security for Hadoop
Deploying Enterprise-grade Security for Hadoop
 
Apache Sentry for Hadoop security
Apache Sentry for Hadoop securityApache Sentry for Hadoop security
Apache Sentry for Hadoop security
 
Big data security
Big data securityBig data security
Big data security
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security Architecture
 
Securing the Hadoop Ecosystem
Securing the Hadoop EcosystemSecuring the Hadoop Ecosystem
Securing the Hadoop Ecosystem
 
Open Source Security Tools for Big Data
Open Source Security Tools for Big DataOpen Source Security Tools for Big Data
Open Source Security Tools for Big Data
 
Nl HUG 2016 Feb Hadoop security from the trenches
Nl HUG 2016 Feb Hadoop security from the trenchesNl HUG 2016 Feb Hadoop security from the trenches
Nl HUG 2016 Feb Hadoop security from the trenches
 
Hadoop security
Hadoop securityHadoop security
Hadoop security
 
Big Data Security with Hadoop
Big Data Security with HadoopBig Data Security with Hadoop
Big Data Security with Hadoop
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
 
Getting started with MariaDB with Docker
Getting started with MariaDB with DockerGetting started with MariaDB with Docker
Getting started with MariaDB with Docker
 
Hadoop and Data Access Security
Hadoop and Data Access SecurityHadoop and Data Access Security
Hadoop and Data Access Security
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
 
Database Security Threats — MariaDB Security Best Practices
Database Security Threats — MariaDB Security Best PracticesDatabase Security Threats — MariaDB Security Best Practices
Database Security Threats — MariaDB Security Best Practices
 
Database Security Threats - MariaDB Security Best Practices
Database Security Threats - MariaDB Security Best PracticesDatabase Security Threats - MariaDB Security Best Practices
Database Security Threats - MariaDB Security Best Practices
 
Apache Hadoop Security - Ranger
Apache Hadoop Security - RangerApache Hadoop Security - Ranger
Apache Hadoop Security - Ranger
 

Viewers also liked

Destroying Data Silos
Destroying Data SilosDestroying Data Silos
Destroying Data Silos
Hellmar Becker
 
Erfolg im Netz messbar machen - Chancen für die Leasingindustrie
Erfolg im Netz messbar machen - Chancen für die LeasingindustrieErfolg im Netz messbar machen - Chancen für die Leasingindustrie
Erfolg im Netz messbar machen - Chancen für die Leasingindustrie
Hellmar Becker
 
Hadoop Security, Cloudera - Todd Lipcon and Aaron Myers - Hadoop World 2010
Hadoop Security, Cloudera - Todd Lipcon and Aaron Myers - Hadoop World 2010Hadoop Security, Cloudera - Todd Lipcon and Aaron Myers - Hadoop World 2010
Hadoop Security, Cloudera - Todd Lipcon and Aaron Myers - Hadoop World 2010
Cloudera, Inc.
 
Hadoop & Complex Systems Research
Hadoop & Complex Systems ResearchHadoop & Complex Systems Research
Hadoop & Complex Systems Research
Dr. Mirko Kämpf
 
Placement Cell project
Placement Cell projectPlacement Cell project
Placement Cell project
Manish Kumar
 
Big Data Use Cases
Big Data Use CasesBig Data Use Cases
Big Data Use Cases
boorad
 

Viewers also liked (6)

Destroying Data Silos
Destroying Data SilosDestroying Data Silos
Destroying Data Silos
 
Erfolg im Netz messbar machen - Chancen für die Leasingindustrie
Erfolg im Netz messbar machen - Chancen für die LeasingindustrieErfolg im Netz messbar machen - Chancen für die Leasingindustrie
Erfolg im Netz messbar machen - Chancen für die Leasingindustrie
 
Hadoop Security, Cloudera - Todd Lipcon and Aaron Myers - Hadoop World 2010
Hadoop Security, Cloudera - Todd Lipcon and Aaron Myers - Hadoop World 2010Hadoop Security, Cloudera - Todd Lipcon and Aaron Myers - Hadoop World 2010
Hadoop Security, Cloudera - Todd Lipcon and Aaron Myers - Hadoop World 2010
 
Hadoop & Complex Systems Research
Hadoop & Complex Systems ResearchHadoop & Complex Systems Research
Hadoop & Complex Systems Research
 
Placement Cell project
Placement Cell projectPlacement Cell project
Placement Cell project
 
Big Data Use Cases
Big Data Use CasesBig Data Use Cases
Big Data Use Cases
 

Similar to Securing Hadoop in an Enterprise Context

Securing Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise ContextSecuring Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise Context
DataWorks Summit/Hadoop Summit
 
Deploying Big-Data-as-a-Service (BDaaS) in the Enterprise
Deploying Big-Data-as-a-Service (BDaaS) in the EnterpriseDeploying Big-Data-as-a-Service (BDaaS) in the Enterprise
Deploying Big-Data-as-a-Service (BDaaS) in the Enterprise
Big-Data-as-a-Service (BDaaS) Meetup
 
Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...
Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...
Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...
Cloudera, Inc.
 
Hadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, FutureHadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, Future
Uwe Printz
 
Big Data in the Cloud - The What, Why and How from the Experts
Big Data in the Cloud - The What, Why and How from the ExpertsBig Data in the Cloud - The What, Why and How from the Experts
Big Data in the Cloud - The What, Why and How from the Experts
DataWorks Summit/Hadoop Summit
 
Vmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps IronfanVmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps Ironfan
Jim Kaskade
 
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...
VMworld
 
50 Shades of SQL
50 Shades of SQL50 Shades of SQL
50 Shades of SQL
DataWorks Summit
 
Big Data Warehousing Meetup: Securing the Hadoop Ecosystem by Cloudera
Big Data Warehousing Meetup: Securing the Hadoop Ecosystem by ClouderaBig Data Warehousing Meetup: Securing the Hadoop Ecosystem by Cloudera
Big Data Warehousing Meetup: Securing the Hadoop Ecosystem by Cloudera
Caserta
 
Hadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsHadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the experts
DataWorks Summit/Hadoop Summit
 
Hadoop and SQL: Delivery Analytics Across the Organization
Hadoop and SQL:  Delivery Analytics Across the OrganizationHadoop and SQL:  Delivery Analytics Across the Organization
Hadoop and SQL: Delivery Analytics Across the Organization
Seeling Cheung
 
Combat Cyber Threats with Cloudera Impala & Apache Hadoop
Combat Cyber Threats with Cloudera Impala & Apache HadoopCombat Cyber Threats with Cloudera Impala & Apache Hadoop
Combat Cyber Threats with Cloudera Impala & Apache Hadoop
Cloudera, Inc.
 
1. beyond mission critical virtualizing big data and hadoop
1. beyond mission critical   virtualizing big data and hadoop1. beyond mission critical   virtualizing big data and hadoop
1. beyond mission critical virtualizing big data and hadoop
Chiou-Nan Chen
 
Open Source Security Tools for Big Data
Open Source Security Tools for Big DataOpen Source Security Tools for Big Data
Open Source Security Tools for Big Data
Great Wide Open
 
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
DataWorks Summit
 
End-to-End Security and Auditing in a Big Data as a Service Deployment
End-to-End Security and Auditing in a Big Data as a Service DeploymentEnd-to-End Security and Auditing in a Big Data as a Service Deployment
End-to-End Security and Auditing in a Big Data as a Service Deployment
DataWorks Summit/Hadoop Summit
 
HBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend Micro
HBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend MicroHBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend Micro
HBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend Micro
Cloudera, Inc.
 
Hadoop in the Cloud – The What, Why and How from the Experts
Hadoop in the Cloud – The What, Why and How from the ExpertsHadoop in the Cloud – The What, Why and How from the Experts
Hadoop in the Cloud – The What, Why and How from the Experts
DataWorks Summit/Hadoop Summit
 
Keeping your Enterprise’s Big Data Secure by Owen O’Malley at Big Data Spain ...
Keeping your Enterprise’s Big Data Secure by Owen O’Malley at Big Data Spain ...Keeping your Enterprise’s Big Data Secure by Owen O’Malley at Big Data Spain ...
Keeping your Enterprise’s Big Data Secure by Owen O’Malley at Big Data Spain ...
Big Data Spain
 
Hadoop in the cloud – The what, why and how from the experts
Hadoop in the cloud – The what, why and how from the expertsHadoop in the cloud – The what, why and how from the experts
Hadoop in the cloud – The what, why and how from the experts
DataWorks Summit
 

Similar to Securing Hadoop in an Enterprise Context (20)

Securing Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise ContextSecuring Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise Context
 
Deploying Big-Data-as-a-Service (BDaaS) in the Enterprise
Deploying Big-Data-as-a-Service (BDaaS) in the EnterpriseDeploying Big-Data-as-a-Service (BDaaS) in the Enterprise
Deploying Big-Data-as-a-Service (BDaaS) in the Enterprise
 
Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...
Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...
Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...
 
Hadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, FutureHadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, Future
 
Big Data in the Cloud - The What, Why and How from the Experts
Big Data in the Cloud - The What, Why and How from the ExpertsBig Data in the Cloud - The What, Why and How from the Experts
Big Data in the Cloud - The What, Why and How from the Experts
 
Vmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps IronfanVmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps Ironfan
 
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...
 
50 Shades of SQL
50 Shades of SQL50 Shades of SQL
50 Shades of SQL
 
Big Data Warehousing Meetup: Securing the Hadoop Ecosystem by Cloudera
Big Data Warehousing Meetup: Securing the Hadoop Ecosystem by ClouderaBig Data Warehousing Meetup: Securing the Hadoop Ecosystem by Cloudera
Big Data Warehousing Meetup: Securing the Hadoop Ecosystem by Cloudera
 
Hadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsHadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the experts
 
Hadoop and SQL: Delivery Analytics Across the Organization
Hadoop and SQL:  Delivery Analytics Across the OrganizationHadoop and SQL:  Delivery Analytics Across the Organization
Hadoop and SQL: Delivery Analytics Across the Organization
 
Combat Cyber Threats with Cloudera Impala & Apache Hadoop
Combat Cyber Threats with Cloudera Impala & Apache HadoopCombat Cyber Threats with Cloudera Impala & Apache Hadoop
Combat Cyber Threats with Cloudera Impala & Apache Hadoop
 
1. beyond mission critical virtualizing big data and hadoop
1. beyond mission critical   virtualizing big data and hadoop1. beyond mission critical   virtualizing big data and hadoop
1. beyond mission critical virtualizing big data and hadoop
 
Open Source Security Tools for Big Data
Open Source Security Tools for Big DataOpen Source Security Tools for Big Data
Open Source Security Tools for Big Data
 
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
 
End-to-End Security and Auditing in a Big Data as a Service Deployment
End-to-End Security and Auditing in a Big Data as a Service DeploymentEnd-to-End Security and Auditing in a Big Data as a Service Deployment
End-to-End Security and Auditing in a Big Data as a Service Deployment
 
HBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend Micro
HBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend MicroHBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend Micro
HBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend Micro
 
Hadoop in the Cloud – The What, Why and How from the Experts
Hadoop in the Cloud – The What, Why and How from the ExpertsHadoop in the Cloud – The What, Why and How from the Experts
Hadoop in the Cloud – The What, Why and How from the Experts
 
Keeping your Enterprise’s Big Data Secure by Owen O’Malley at Big Data Spain ...
Keeping your Enterprise’s Big Data Secure by Owen O’Malley at Big Data Spain ...Keeping your Enterprise’s Big Data Secure by Owen O’Malley at Big Data Spain ...
Keeping your Enterprise’s Big Data Secure by Owen O’Malley at Big Data Spain ...
 
Hadoop in the cloud – The what, why and how from the experts
Hadoop in the cloud – The what, why and how from the expertsHadoop in the cloud – The what, why and how from the experts
Hadoop in the cloud – The what, why and how from the experts
 

Recently uploaded

Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdfUdemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Fernanda Palhano
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
nyfuhyz
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
bopyb
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
Walaa Eldin Moustafa
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
Sm321
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
aqzctr7x
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
AndrzejJarynowski
 
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
74nqk8xf
 
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
g4dpvqap0
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
Social Samosa
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
Timothy Spann
 
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
zsjl4mimo
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
jerlynmaetalle
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
kuntobimo2016
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
vikram sood
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
nuttdpt
 

Recently uploaded (20)

Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
 
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdfUdemy_2024_Global_Learning_Skills_Trends_Report (1).pdf

Securing Hadoop in an Enterprise Context

  • 1. Securing Hadoop in an Enterprise Context
    Apache: Big Data conference
    Hellmar Becker, Senior IT Specialist
    Budapest, September 29, 2015
  • 3. Securing Hadoop in an Enterprise Context
    1. The Challenge
    2. Excursion: Hadoop Usage Patterns
    3. Aspects of Security
    4. Analytic Clusters: “Sandbox” Model
    5. Securing HDFS Environments That Do Automated Processing
    6. Connecting to the Enterprise Directory
    7. Further Aspects
    8. Questions
  • 5. Data Lake and Advanced Analytics within ING
    Integrate all data sources within the bank into one processing platform:
    • Batch data streams
    • Live transactions
    • Model building for customer interaction
    Empower data scientists and analysts to get the best results with advanced analytics tools and predictive models.
    Open source software where possible, with Hadoop as a core component.
  • 6. Risks
    • Data loss
    • Privacy breach
    • System intrusion
    Possible consequences:
    • Legal consequences
    • Loss of reputation
    • Financial loss
  • 7. Hadoop "out of the box" does not have any security model switched on
    Hadoop user model:
    • A user name is just an alphanumeric string
    • So is a group name
    • Neither has to match an entity in the OS
    • Via the REST API (WebHDFS), anybody could in theory read and write HDFS
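To make the REST API point concrete: with the default `simple` authentication, WebHDFS takes the caller's identity from the `user.name` query parameter, so the client merely asserts who it is. The sketch below only builds request URLs and sends nothing. The host name `namenode.example.com` is hypothetical, and port 50070 is assumed as the Hadoop 2.x NameNode HTTP default.

```python
from urllib.parse import urlencode


def webhdfs_url(host: str, port: int, path: str, op: str, user: str) -> str:
    """Build a WebHDFS REST URL that relies on simple (non-Kerberos) authentication.

    Under simple auth the NameNode trusts whatever user.name the client sends;
    nothing verifies that the caller really is that user.
    """
    if not path.startswith("/"):
        raise ValueError("HDFS paths must be absolute")
    query = urlencode({"op": op, "user.name": user})
    return f"http://{host}:{port}/webhdfs/v1{path}?{query}"


# Hypothetical cluster: any client can claim to be the HDFS superuser "hdfs".
url = webhdfs_url("namenode.example.com", 50070, "/sales-data", "OPEN", "hdfs")
print(url)  # http://namenode.example.com:50070/webhdfs/v1/sales-data?op=OPEN&user.name=hdfs
```

Once Kerberos is enabled (`hadoop.security.authentication=kerberos`), the NameNode requires SPNEGO authentication or a delegation token instead of trusting `user.name`.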
  • 8. 2. Excursion: Hadoop Usage Patterns
  • 9. Hadoop Usage Patterns
    1. File Storage
    2. Deep Data
    3. Analytical Hadoop
    4. (Real Time)
  • 10. Hadoop Usage Patterns: Characteristics
    Topic | Analytical Hadoop | Deep Data | File Storage
    ----- | ----- | ----- | -----
    User access | Named | Non-personal accounts | Non-personal accounts
    Capacity management | Small disk space | Large disk space | Large disk space
    Resource management | High CPU & memory | Medium CPU & memory | Low CPU & memory
    Confidentiality/Integrity/Availability rating | C based on use case, IA-low | C static/data driven, IA-high | C static/data driven, IA-high
    Flexibility | High | Low | Low
    Tooling outside Hadoop | High & user driven | Low & life cycle driven | Low & life cycle driven
    Disaster recovery & high availability | Low | High | High
    Predictability of jobs | Ad hoc | Scheduled | None
    Data | Subset relevant for use case | All | All
    Lineage | Irrelevant | Relevant | Relevant
    Descriptive metadata | Relevant | Relevant | Relevant
    Develop/Test/Acceptance/Production | Develop (Test) | Test, Acceptance, Production | Test, Acceptance, Production
  • 11. 3. Aspects of Security
  • 12. Aspects of Security
    Technical: Rings of Defense
    • Perimeter Level Security
    • Application Level Authentication and Authorization
    • OS Security
    • Data Protection
    See also: http://www.slideshare.net/vinnies12/hadoop-security-today-tomorrow-apache-knox
    Conceptual: Five Pillars of Security
    • Administration
    • Authentication
    • Authorization
    • Auditing
    • Data Protection
    See also: http://hortonworks.com/hdp/security/
  • 14. Approach A: “Sandbox”
    • Strong perimeter security
    • Ideally "air gapped"
    • Practical: allow access only through a terminal service (Citrix, VNC)
    Pro:
    • Easy to implement
    • No changes to internal settings
    Con:
    • Even legitimate data transfers are difficult
    • Not suitable for automated batch processing
    • Software updates only through a manually maintained mirror
    Used in exploratory environments (pattern 3)
  • 15. 5. Securing HDFS Environments That Do Automated Processing
  • 16. Administration
    • General goal: zero-touch deployment
    • Automatic synchronization with the enterprise directory
    • The Ranger UI is only used for incidents
    Authentication
    • Kerberos
    • One KDC per cluster? (Yes)
    • Connecting to the enterprise directory (next chapter)
    • Keep the Kerberos principals (Hadoop users) completely separate from OS users
  • 17. Authorization
    Simplest approach: HDFS ACLs
      > hdfs dfs -setfacl -m group:execs:r-- /sales-data
      > hdfs dfs -getfacl /sales-data
      # file: /sales-data
      # owner: bruce
      # group: sales
      user::rw-
      group::r--
      group:execs:r--
      mask::r--
      other::---
    BUT:
    • No easy-to-use GUI
    • Difficult to maintain an overview
    • Only for HDFS; does not handle other components
    Better: unified rights management with Ranger
    • Service principals are made known to Ranger directly; personal accounts (PAs) get their rights only through groups
    • Groups and users are synced with AD (see below for details)
    • Note: Ranger cannot take away privileges that were granted on a lower level
      • HDFS permissions and ACLs override Ranger
      • Make sure these access paths are locked down
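The following sketch shows why ACLs and Ranger interact this way. `acl_allows` evaluates an HDFS ACL in the POSIX-style order described in the HDFS permissions guide: owner first, then named users, then the owning group and named groups, then other, with the mask limiting named entries and the owning group. `hdfs_access` then combines that result with a Ranger decision, modelled on the Ranger HDFS plugin's fallback to native Hadoop authorization (to our understanding controlled by `xasecure.add-hadoop-authorization`). In that model, access is granted if either layer allows it, so a missing Ranger policy cannot revoke what an ACL grants. The data model and the users `carol` and `dave` are hypothetical simplifications. The superuser, the sticky bit and Ranger deny policies are ignored.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet


def perm_bits(s: str) -> FrozenSet[str]:
    """Turn an 'r-x' style string into the set of granted permissions."""
    return frozenset(c for c in s if c in "rwx")


@dataclass
class Acl:
    owner: str
    group: str
    user_perm: str    # user::
    group_perm: str   # group:: (owning group)
    other_perm: str   # other::
    named_users: Dict[str, str] = field(default_factory=dict)   # user:NAME:perm
    named_groups: Dict[str, str] = field(default_factory=dict)  # group:NAME:perm
    mask: str = "rwx"  # mask::, limits named entries and group::


def acl_allows(acl: Acl, user: str, groups: FrozenSet[str], want: str) -> bool:
    """Check one permission ('r', 'w' or 'x') against a simplified HDFS ACL."""
    mask = perm_bits(acl.mask)
    if user == acl.owner:
        return want in perm_bits(acl.user_perm)  # owner entry is not masked
    if user in acl.named_users:
        return want in perm_bits(acl.named_users[user]) & mask
    matching = [acl.group_perm] if acl.group in groups else []
    matching += [p for g, p in acl.named_groups.items() if g in groups]
    if matching:
        # Any matching group entry may grant; if none does, access is denied
        # without falling through to "other".
        return any(want in perm_bits(p) & mask for p in matching)
    return want in perm_bits(acl.other_perm)


def hdfs_access(ranger_allows: bool, acl_ok: bool, fallback_enabled: bool = True) -> bool:
    """Simplified plugin decision: Ranger allow, else fall back to HDFS permissions."""
    return ranger_allows or (fallback_enabled and acl_ok)


# The ACL from the getfacl output on slide 17.
sales_data = Acl(owner="bruce", group="sales", user_perm="rw-", group_perm="r--",
                 other_perm="---", named_groups={"execs": "r--"}, mask="r--")

carol_read = acl_allows(sales_data, "carol", frozenset({"execs"}), "r")   # True
carol_write = acl_allows(sales_data, "carol", frozenset({"execs"}), "w")  # False
dave_read = acl_allows(sales_data, "dave", frozenset({"marketing"}), "r")  # False
# No Ranger policy grants carol read access, yet the ACL still does:
print(hdfs_access(ranger_allows=False, acl_ok=carol_read))  # True
```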
  • 18. Auditing
    • Ranger standard auditing
    • More testing required: is audit logging to a database good enough and fast enough?
  • 19. 6. Connecting to the Enterprise Directory
  • 20. Separation of administrative duties
    • Personal users in the corporate Active Directory, non-personal accounts (NPAs) in the cluster KDC
    • One-way realm trust
    Specific challenges
    • Historically, Windows and Linux are different worlds
    • Need to work in interdisciplinary teams
    • Educate AD experts on the details of Kerberos realm trust
    • Still to be solved: YARN containers need to run as an OS user that matches the HDFS user name
    • AD and Linux LDAP use different user keys
    • Currently, some teams use workarounds for this (manual maintenance required)
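With a one-way trust, personal users arrive as principals of the AD realm while NPAs live in the cluster KDC's realm. Hadoop reduces both to short names using the `hadoop.security.auth_to_local` rules. With the LinuxContainerExecutor, that short name is the account YARN containers run as, so it has to exist on every NodeManager, which is the OS-user challenge listed on slide 20. The sketch below re-implements a simplified subset of the rule evaluation. It handles `RULE:[n:format](regex)s/pattern/replacement/[g]` and `DEFAULT`, but not the `/L` lowercase flag or backreferences in replacements. The realm names `CORP.EXAMPLE.COM` and `HADOOP.EXAMPLE.COM` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Principal:
    components: Tuple[str, ...]  # ("alice",) or ("nn", "host1.example.com")
    realm: str


def parse_principal(name: str) -> Principal:
    """Split 'primary[/instance]@REALM' into its parts."""
    base, sep, realm = name.partition("@")
    if not sep or not base or not realm:
        raise ValueError(f"not a fully qualified principal: {name!r}")
    return Principal(tuple(base.split("/")), realm)


# RULE:[n:format](match-regex)s/pattern/replacement/[g]  (simplified grammar)
_RULE = re.compile(r"RULE:\[(\d+):([^\]]+)\]\(([^)]*)\)(?:s/([^/]*)/([^/]*)/(g?))?")


def apply_rule(rule: str, p: Principal, default_realm: str) -> Optional[str]:
    """Return the short name produced by one rule, or None if it does not apply."""
    if rule == "DEFAULT":
        # DEFAULT: principals from the local realm map to their first component.
        return p.components[0] if p.realm == default_realm else None
    m = _RULE.fullmatch(rule)
    if m is None:
        raise ValueError(f"unsupported rule: {rule!r}")
    n, fmt, match_re, pattern, replacement, glob = m.groups()
    if int(n) != len(p.components):
        return None  # rule is written for a different number of components
    params = (p.realm, *p.components)  # $0 = realm, $1/$2 = components
    base = re.sub(r"\$(\d)", lambda g: params[int(g.group(1))], fmt)
    if not re.fullmatch(match_re, base):
        return None
    if pattern is not None:
        # Literal replacement; "g" replaces all occurrences, otherwise the first.
        base = re.sub(pattern, lambda _: replacement, base, count=0 if glob else 1)
    return base


def to_short_name(name: str, rules: List[str], default_realm: str) -> str:
    """Apply the rules in order; the first one that matches wins."""
    p = parse_principal(name)
    for rule in rules:
        short = apply_rule(rule, p, default_realm)
        if short is not None:
            return short
    raise LookupError(f"no auth_to_local rule matches {name!r}")


RULES = [
    # Trusted AD realm: strip the realm, so alice@CORP.EXAMPLE.COM becomes alice.
    r"RULE:[1:$1@$0](.*@CORP\.EXAMPLE\.COM)s/@.*//",
    "DEFAULT",
]
CLUSTER_REALM = "HADOOP.EXAMPLE.COM"

print(to_short_name("alice@CORP.EXAMPLE.COM", RULES, CLUSTER_REALM))        # alice
print(to_short_name("etl_batch@HADOOP.EXAMPLE.COM", RULES, CLUSTER_REALM))  # etl_batch
print(to_short_name("nn/host1.example.com@HADOOP.EXAMPLE.COM", RULES, CLUSTER_REALM))  # nn
```

A principal from any other realm falls through every rule and is rejected. This is the desired outcome when only the AD realm is trusted.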
  • 21. Security roles for personal accounts
    • Maintained in HR database/tools
    • More interdisciplinary cooperation required!
    • Need to map abstract "business roles" (function descriptions) to "technical roles" (sets of privileges)
    • HR database maintainers have to update this; it will be reflected in AD
    • In LDAP, these technical roles appear as groups
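The two-level role model on slide 21 can be pictured as a lookup. HR assigns business roles to people, each business role maps to a set of technical roles, and each technical role shows up in LDAP as a group. The role and group names below are hypothetical. In the setup described, this table would be maintained in HR tooling and surface in AD, not in code.

```python
from typing import Dict, List, Set

# Hypothetical mapping from business roles (function descriptions) to
# technical roles (privilege sets), each of which appears in LDAP as a group.
BUSINESS_TO_TECHNICAL: Dict[str, Set[str]] = {
    "credit-risk-analyst": {"hdp-read-risk", "hdp-sandbox-user"},
    "marketing-analyst": {"hdp-read-customer-aggregates", "hdp-sandbox-user"},
}


def groups_for(business_roles: List[str]) -> Set[str]:
    """Union of the technical roles (LDAP groups) implied by a person's business roles."""
    groups: Set[str] = set()
    for role in business_roles:
        groups |= BUSINESS_TO_TECHNICAL.get(role, set())  # unknown roles grant nothing
    return groups


print(sorted(groups_for(["credit-risk-analyst", "marketing-analyst"])))
# ['hdp-read-customer-aggregates', 'hdp-read-risk', 'hdp-sandbox-user']
```

Treating an unmapped business role as granting nothing keeps the model fail-closed. A person only gains Hadoop privileges once someone has explicitly mapped their function.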
  • 22. Synchronizing users and roles from Active Directory
    • Ranger's uxugsync process queries Active Directory through the LDAP protocol
    • Ranger 0.4: reads all users, then determines their group affiliation
      • More than 50,000 employees in ING Group
      • Need to limit the load on the LDAP server!
    • Ranger 0.5: group-driven query, still not optimal because it uses attribute filters
    • The most efficient LDAP query is either by a single DN (Distinguished Name) or by container (query base DN)
      • But we cannot use containers because of enterprise policy
    • Solution: custom Python script that queries LDAP hierarchically
      • One "supergroup" is picked by DN
      • The members of the "supergroup" are all LDAP groups that have Hadoop-related privileges
      • Query all these groups, again by DN
      • Examine the members of each group (personal users)
      • Make the user-group relationships known to Ranger via a REST call
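The hierarchical walk on slide 22 can be sketched as follows. This is not the author's script. The directory is an in-memory dict keyed by DN, and every lookup is a read of a single entry by its DN, which in a real script would correspond to a BASE-scope LDAP search (for example with the `ldap3` package, `conn.search(dn, '(objectClass=*)', search_scope=BASE, attributes=[...])`). All DNs, group names and attribute choices (`cn`, `member`, `sAMAccountName`) are assumptions. Nested groups are not expanded. The final REST call to Ranger is left out because the slide does not specify its endpoint or payload.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Set

Entry = Dict[str, List[str]]

# Hypothetical stand-in for Active Directory, keyed by DN.
DIRECTORY: Dict[str, Entry] = {
    "CN=hadoop-super,OU=Groups,DC=corp,DC=example": {
        "objectClass": ["group"], "cn": ["hadoop-super"],
        "member": ["CN=hdp-analysts,OU=Groups,DC=corp,DC=example",
                   "CN=hdp-risk,OU=Groups,DC=corp,DC=example"],
    },
    "CN=hdp-analysts,OU=Groups,DC=corp,DC=example": {
        "objectClass": ["group"], "cn": ["hdp-analysts"],
        "member": ["CN=alice,OU=Users,DC=corp,DC=example",
                   "CN=bob,OU=Users,DC=corp,DC=example"],
    },
    "CN=hdp-risk,OU=Groups,DC=corp,DC=example": {
        "objectClass": ["group"], "cn": ["hdp-risk"],
        "member": ["CN=bob,OU=Users,DC=corp,DC=example"],
    },
    "CN=alice,OU=Users,DC=corp,DC=example": {"objectClass": ["user"], "sAMAccountName": ["alice"]},
    "CN=bob,OU=Users,DC=corp,DC=example": {"objectClass": ["user"], "sAMAccountName": ["bob"]},
}


def sync_hierarchically(supergroup_dn: str, read: Callable[[str], Entry]) -> Dict[str, List[str]]:
    """Walk supergroup -> Hadoop groups -> users, reading every entry by DN only."""
    user_groups: Dict[str, Set[str]] = defaultdict(set)
    accounts: Dict[str, Optional[str]] = {}  # member DN -> account name (None if not a user)
    for group_dn in read(supergroup_dn).get("member", []):
        group = read(group_dn)
        group_name = group["cn"][0]
        for member_dn in group.get("member", []):
            if member_dn not in accounts:  # read each member entry at most once
                member = read(member_dn)
                is_user = "user" in member.get("objectClass", [])
                accounts[member_dn] = member["sAMAccountName"][0] if is_user else None
            account = accounts[member_dn]
            if account is not None:  # nested groups are skipped in this sketch
                user_groups[account].add(group_name)
    return {user: sorted(groups) for user, groups in sorted(user_groups.items())}


lookups: List[str] = []


def read_by_dn(dn: str) -> Entry:
    lookups.append(dn)  # stands in for one base-scope LDAP read
    return DIRECTORY[dn]


mapping = sync_hierarchically("CN=hadoop-super,OU=Groups,DC=corp,DC=example", read_by_dn)
print(mapping)       # {'alice': ['hdp-analysts'], 'bob': ['hdp-analysts', 'hdp-risk']}
print(len(lookups))  # 5
```

The lookup count stays proportional to the number of Hadoop-relevant groups and their members, not to the 50,000+ accounts in the directory. This is the point of querying by DN instead of reading all users.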
  • 24. Securing the Non-Kerberos/Ranger Components
    • Use LDAP to authenticate in Ambari and Hue
    • Note: our current setup connects Ambari to Unix LDAP, which is not in sync with AD
    Securing the Perimeter
    • Knox
    • Reverse proxy
    Securing Platform Components
    • A good HDFS security model takes care of much that follows
    • Considerations for database-like processing (Hive, HBase): column-based or file-based security models; you can't have both
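As a reverse proxy, Knox gives clients a single gateway URL in place of direct NameNode addresses. The sketch below rewrites a direct WebHDFS URL into the gateway form `https://<gateway-host>:8443/gateway/<topology>/webhdfs/v1/...`. We take 8443 to be Knox's default port. The gateway host and the topology name `default` are hypothetical. Dropping `user.name` is a choice of this sketch: behind the gateway, identity should come from authenticating to Knox, not from a query parameter.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def via_knox(direct_url: str, gateway: str, topology: str) -> str:
    """Rewrite a direct WebHDFS URL so that it goes through a Knox gateway.

    Internal host and port disappear from the client's view; only the
    gateway address and topology name are exposed.
    """
    parts = urlsplit(direct_url)
    if not parts.path.startswith("/webhdfs/v1"):
        raise ValueError("not a WebHDFS URL")
    # Keep the operation parameters, but do not forward a self-asserted identity.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "user.name"]
    return f"https://{gateway}/gateway/{topology}{parts.path}?{urlencode(query)}"


knox_url = via_knox(
    "http://namenode.example.com:50070/webhdfs/v1/sales-data?op=OPEN&user.name=hdfs",
    "knox.example.com:8443",
    "default",
)
print(knox_url)  # https://knox.example.com:8443/gateway/default/webhdfs/v1/sales-data?op=OPEN
```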
  • 26. Attributions
    • Hellmar in Nîmes / With Python in Mindanao, by the author
    • Domtoren in het oranje licht by helena_is_here is licensed under CC BY 2.0
    • Data Pipeline, ING OIB Image Bank
    • Storm surge by David Baird is licensed under CC BY-SA 2.0; cropped by me
    • System Lock by Yuri Samoilov is licensed under CC BY 2.0; cropped by me
    • Safe by Rob Pongsajapan is licensed under CC BY 2.0; cropped by me
    • Hercules and Cerberus by The Los Angeles County Museum of Art is Public Domain