Hadoop Summit 2016
Securing Hadoop in an Enterprise Context
Hellmar Becker, DevOps Engineer
Dublin, April 14, 2016
Who am I?
1. The Challenge
2. Hadoop Usage Patterns
3. Aspects of Security
4. Building Blocks for a Security Architecture
5. Questions
The Challenge
Data Lake and Advanced Analytics within ING
External and internal reporting for own or regulatory purposes
Integrate all data sources within the bank into one processing platform
• Batch data streams
• Live transactions
• Model building for customer interaction
Better understand customer needs in an increasingly digital world
Data can help us offer tailored products and services
Empower data scientists and analysts to get the best results with advanced analytics tools and predictive models
Open source software where possible – Hadoop as a core component
Possible consequences
• Legal consequences
• Loss of reputation
• Financial loss
Risks
• Data loss
• Privacy breach
• System intrusion
By default, Hadoop "out of the box" runs without security
Hadoop user model:
• A user name is just an alphanumeric string
• So is a group name
• They do not have to match entities in the OS
• Via the REST API, anybody could read or modify data
So the security design has to be actively built, and this is what we did.
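To make the point about the REST API concrete: in a cluster without Kerberos, WebHDFS identifies the caller by the `user.name` query parameter, which is just a string the client chooses. The sketch below only builds such a URL and sends nothing. The host name and file path are hypothetical, and 50070 is assumed as the NameNode HTTP port (the Hadoop 2 default).

```python
from urllib.parse import quote, urlencode


def webhdfs_url(namenode, path, op, user_name):
    """Build a WebHDFS URL that relies on simple (non-Kerberos) authentication.

    The server takes `user.name` at face value, so any caller can claim
    to be any user.
    """
    query = urlencode({"op": op, "user.name": user_name})
    return f"http://{namenode}/webhdfs/v1{quote(path)}?{query}"


# Hypothetical host and file; nothing is sent over the network here.
url = webhdfs_url("namenode.example.com:50070", "/sales-data/report.csv", "OPEN", "bruce")
print(url)
```

Changing the last argument to another name is all it takes to act as a different user, which is why perimeter and authentication controls have to be added on top.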
Hadoop Usage Patterns
1. File Storage
2. Deep Data
3. Analytical Hadoop
4. (Real Time)
Aspects of Security
Technical: Rings of Defense
• Perimeter Level Security
• Application Level Authentication and Authorization
• OS Security
• Data Protection
See also: http://www.slideshare.net/vinnies12/hadoop-security-today-tomorrow-apache-knox
Conceptual: Five Pillars of Security
• Administration
• Authentication
• Authorization
• Auditing
• Data Protection
See also: http://hortonworks.com/hdp/security/
Building Blocks for a Security Architecture
Building Blocks: Perimeter Security
• Firewall around the entire cluster
• “Stepping stone” servers
• Citrix/Terminal server for interactive access
• Ingestion server with defined transfer paths
User model
• Personal users defined locally or in the corporate directory
• Service/technical users defined locally
Software updates and software development
• Through a manually maintained mirror
Used in exploratory environments (pattern 3)
Building Blocks: Internal Security
Administration
• General goal: zero-touch deployment
• Automatic synchronization with the enterprise directory
• UI access is only used for incidents
Authentication
• Kerberos
• Future: share a KDC HA cluster among Hadoop instances
• Connecting to the enterprise directory using trusts and synchronization (next chapter)
• Keep the Kerberos principals (Hadoop users) completely separate from OS users
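As an illustration of what a Kerberos principal looks like to Hadoop, the sketch below splits a principal of the form `primary[/instance]@REALM` and derives a short user name only when the realm is the cluster's own. This loosely mirrors the idea behind Hadoop's default principal-to-short-name mapping but is not its implementation. The realm and host names are hypothetical.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str
    instance: Optional[str]
    realm: str


def parse_principal(name: str) -> Principal:
    """Split 'primary[/instance]@REALM' into its parts."""
    local, sep, realm = name.rpartition("@")
    if not sep or not local or not realm:
        raise ValueError(f"not a fully qualified principal: {name!r}")
    primary, _, instance = local.partition("/")
    return Principal(primary, instance or None, realm)


def short_name(name: str, default_realm: str) -> str:
    """Return the primary component, but only for the cluster's own realm."""
    principal = parse_principal(name)
    if principal.realm != default_realm:
        raise ValueError(f"no mapping rule for realm {principal.realm!r}")
    return principal.primary


# Hypothetical service and user principals.
service = parse_principal("nn/host1.example.com@HADOOP.EXAMPLE.COM")
user = short_name("bruce@HADOOP.EXAMPLE.COM", "HADOOP.EXAMPLE.COM")
```

The short name is again only a string. Nothing in this mapping requires an OS account of the same name to exist.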
Authorization
Unified rights management with Ranger
• Service principals are made known to Ranger directly; PAs' rights are assigned only based on groups
• Groups and users are synced with Active Directory
• Ranger 0.4 cannot take away privileges that were granted on a lower level
• HDFS permissions and ACLs override Ranger
• Make sure these access paths are locked down
HDFS ACLs (No!)
• No easy-to-use GUI
• Difficult to maintain an overview
• Only for HDFS; does not handle other components
> hdfs dfs -setfacl -m group:execs:r-- /sales-data
> hdfs dfs -getfacl /sales-data
# file: /sales-data
# owner: bruce
# group: sales
user::rw-
group::r--
group:execs:r--
mask::r--
other::---
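One reason ACL listings like this are hard to keep an overview of is the mask entry: following POSIX ACL semantics, it caps the permissions of named users, named groups and the owning group, but not those of the owner or of others. The sketch below parses `getfacl`-style text and computes the effective permissions. The function names are illustrative, and `default:` entries are skipped for brevity.

```python
SAMPLE = """\
# file: /sales-data
# owner: bruce
# group: sales
user::rw-
group::r--
group:execs:r--
mask::r--
other::---
"""


def parse_getfacl(text):
    """Return (kind, name, perms) tuples from getfacl-style output."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and trailing notes
        if not line:
            continue
        parts = line.split(":")
        if parts[0] == "default" or len(parts) != 3:
            continue  # default ACLs are out of scope for this sketch
        entries.append(tuple(parts))
    return entries


def effective_permissions(entries):
    """Apply the mask to named users, named groups and the owning group."""
    mask = next((perms for kind, _, perms in entries if kind == "mask"), None)
    result = {}
    for kind, name, perms in entries:
        if kind == "mask":
            continue
        capped = kind == "group" or (kind == "user" and name != "")
        if mask is not None and capped:
            perms = "".join(p if p == m else "-" for p, m in zip(perms, mask))
        result[f"{kind}:{name}"] = perms
    return result


acl = effective_permissions(parse_getfacl(SAMPLE))
```

In the listing from the slide the mask is `r--`, so the `execs` group keeps read access and nothing more, even if a later `-setfacl` granted it `rwx` without widening the mask.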
What We Have Done: Corporate Integration
• Personal users in corporate Active Directory, NPAs in cluster KDC
• One KDC pair per cluster
• One-way realm trust
• Custom script to synchronize Ranger
Challenges
• Learning to work in interdisciplinary teams
• Organizational boundaries
• UNIX – Windows
• Infra – Platform DevOps
Example: Ambari service connects to UNIX LDAP rather than AD
OS security and Hadoop security are not integrated
• YARN container users
• Hadoop ACLs, group mapping
• Multitenancy? (Not solved in this picture)
Working Around Ranger’s Limitations
• Ranger's uxugsync process queries Active Directory through the LDAP protocol
• Ranger 0.4: reads all users, then determines their group affiliation
• More than 50,000 employees in ING Group
• Need to limit the load on the LDAP server!
• Ranger 0.5: group-driven query, still not optimal because it uses attribute filters
• The most efficient LDAP query is either by a single DN (Distinguished Name) or by container (query base DN)
• But we cannot use containers because of enterprise policy
• Solution: custom Python script that queries LDAP hierarchically
• One “supergroup” is picked by DN
• The members of the “supergroup” are all LDAP groups that have Hadoop-related privileges
• Query all these groups, again by DN
• Examine the members of each group (personal users)
• Make the user-group relationships known to Ranger via REST call
Ranger User-Group API is not documented or supported
Database schema: creates duplicate records, inconsistent deletion behavior
OS integration should be better
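The hierarchical lookup described on this slide can be sketched as follows. This is not the script used at ING: the directory is replaced by a hypothetical in-memory dictionary so the example runs without an LDAP server, `lookup_by_dn` stands in for a base-scope search on a single DN, and the attribute names `cn`, `uid` and `member` are assumptions. The final REST call to Ranger is left out because, as noted, that API is undocumented.

```python
# Hypothetical in-memory stand-in for the directory: DN -> entry attributes.
BASE = "dc=example,dc=com"
DIRECTORY = {
    f"cn=hadoop-all,ou=groups,{BASE}": {
        "cn": "hadoop-all",
        "member": [
            f"cn=hadoop-analysts,ou=groups,{BASE}",
            f"cn=hadoop-admins,ou=groups,{BASE}",
        ],
    },
    f"cn=hadoop-analysts,ou=groups,{BASE}": {
        "cn": "hadoop-analysts",
        "member": [f"uid=alice,ou=people,{BASE}", f"uid=bob,ou=people,{BASE}"],
    },
    f"cn=hadoop-admins,ou=groups,{BASE}": {
        "cn": "hadoop-admins",
        "member": [f"uid=alice,ou=people,{BASE}"],
    },
    f"uid=alice,ou=people,{BASE}": {"uid": "alice"},
    f"uid=bob,ou=people,{BASE}": {"uid": "bob"},
    # An employee without Hadoop privileges is never touched by the walk.
    f"uid=carol,ou=people,{BASE}": {"uid": "carol"},
}

LOOKUPS = []  # records every DN that was queried, to show the load


def lookup_by_dn(dn):
    """Stand-in for a base-scope LDAP search on exactly one DN."""
    LOOKUPS.append(dn)
    return DIRECTORY[dn]


def collect_user_groups(supergroup_dn, lookup=lookup_by_dn):
    """Walk supergroup -> groups -> users, querying only by DN."""
    user_groups = {}
    supergroup = lookup(supergroup_dn)
    for group_dn in supergroup.get("member", []):
        group = lookup(group_dn)
        for user_dn in group.get("member", []):
            user = lookup(user_dn)
            user_groups.setdefault(user["uid"], set()).add(group["cn"])
    return user_groups


mapping = collect_user_groups(f"cn=hadoop-all,ou=groups,{BASE}")
```

The number of lookups grows with the number of Hadoop-related groups and their memberships, not with the size of the whole directory, which is the point of avoiding attribute filters over 50,000 user entries.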
A Better Approach: Corporate Directory Integration
• IPA and sssd provide user/group mapping on Hadoop and OS level
• Role-based access for personal users, managed through a central tool
• One user database for Hadoop services, Ambari, Ranger
• YARN, HDFS user models fall nicely into place
• Requires ING patches (HDP 2.4, Ranger 0.6)
• RANGER-827 use getent instead of files
• RANGER-842 use pam for Ranger auth
• HADOOP-12751, HIVE-4413 support ‘@’ in user name
• AMBARI-6432 support IPA KDC
Timelines! We need this prioritized by our vendor
Questions
Attributions
• Hellmar in Nîmes / With Python in Mindanao, by the author
• Domtoren in het oranje licht by helena_is_here is licensed under CC BY 2.0
• Data Pipeline, ING OIB Image Bank
• Storm surge by David Baird is licensed under CC BY-SA 2.0; cropped by me
• Scared Girl by Victor Bezrukov - Port-42 is licensed under CC BY 2.0
• System Lock by Yuri Samoilov is licensed under CC BY 2.0; cropped by me
• Safe by Rob Pongsajapan is licensed under CC BY 2.0; cropped by me
• Hercules and Cerberus by The Los Angeles County Museum of Art is Public Domain
