1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
BEST PRACTICES FOR ENTERPRISE USER MANAGEMENT IN HADOOP ENVIRONMENT
DataWorks Summit 2017 Munich
Sailaja Polavarapu, Sr. Software Engineer, Hortonworks
Don Bosco Durai, Cofounder & Chief Security Architect, Privacera
Don Bosco Durai
⬢ Cofounder and Chief Security Architect at Privacera
⬢ Committer on Apache Ranger and Apache Ambari
⬢ Contributor to security in most Apache projects
Sailaja Polavarapu
⬢ Apache Ranger contributor since 2015
⬢ Apache Ranger committer
⬢ Contributed major improvements to the Usersync module in Ranger
⬢ Currently working on the Hortonworks Security Team
⬢ Contact: spolavarapu@apache.org
Agenda
◆ Authentication and Users in Hadoop
◆ Integrating Ranger with AD/LDAP
◆ Common Use cases
◆ LDAP connection check tool
◆ Best practices
◆ Demo
Most commonly asked question
If I have Ranger, do I need Kerberos?
Why Authenticate Users?
Authentication
Authorization
Auditing
Service Types
⬢ Infrastructure: HDFS, Oozie, Storm, YARN, Hive Server, HBase, ZooKeeper, Kafka
⬢ Apps: Zeppelin, Ambari Views, Ambari Admin, Ranger, Atlas, LogSearch
Infrastructure - Kerberos
[Diagram: Users and three cluster nodes, with numbered callouts 1 to 6 whose labels did not survive extraction.]
⬢ Master Node: YARN Resource Manager, Hive Server, HDFS Name Node
⬢ Node 1: YARN Node Manager, HDFS Data Node, Linux processes
⬢ Node 2: YARN Node Manager, HDFS Data Node, Linux processes
Apps - Username & Password
[Diagram labels: Portals, Notebooks/Viewer, BI Tools, Hive Server2, Zeppelin, Ambari Views, HDFS, Ambari, Atlas, Ranger, Spark]
Knox - Gateway & SSO
⬢ Services: Ambari, WebHDFS (HDFS), Templeton (HCatalog), Stargate (HBase), Oozie, Hive/JDBC, Yarn RM, Storm
⬢ UIs: Name Node UI, Job History UI, Oozie UI, HBase UI, Yarn UI, Spark UI, Ambari UI, Ranger Admin Console
Authentication and User Source
⬢ Hive JDBC: LDAP/Kerberos
⬢ Web Apps (Zeppelin, Ranger, Ambari, Atlas): LDAP
⬢ CLI/API (HDFS, Hive Beeline, HBase, etc.): Kerberos
User/Group Synchronization in Ranger
[Diagram: Ranger UserSync syncs users/groups from AD/LDAP to Ranger Admin, which is backed by a database.]
User sources
⬢ AD/LDAP
– Syncs users and groups from LDAP Organizational Units (OUs)
⬢ Unix native users
– Syncs users and groups from the /etc/passwd and /etc/group files
⬢ File sources
– Syncs users and groups from a file specified in the configuration
– Supports file formats such as CSV and JSON
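To make the Unix source concrete, here is a minimal Python sketch of how users and groups can be read from passwd-format and group-format text. It is not Ranger's actual implementation. The function names and the sample uid/gid values are assumptions for illustration, and only the standard colon-separated file layouts are relied on.

```python
def parse_passwd(text):
    """Return {username: primary_gid} from passwd-format text."""
    users = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 4:
            continue  # skip malformed entries
        users[fields[0]] = int(fields[3])
    return users


def parse_group(text):
    """Return {groupname: (gid, [members])} from group-format text."""
    groups = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 4:
            continue  # skip malformed entries
        name, _password, gid, members = fields[:4]
        groups[name] = (int(gid), [m for m in members.split(",") if m])
    return groups


def users_to_groups(passwd_text, group_text):
    """Combine primary and supplementary memberships per user."""
    users = parse_passwd(passwd_text)
    groups = parse_group(group_text)
    gid_to_name = {gid: name for name, (gid, _members) in groups.items()}
    result = {user: set() for user in users}
    for user, gid in users.items():
        if gid in gid_to_name:
            result[user].add(gid_to_name[gid])  # primary group
    for name, (_gid, members) in groups.items():
        for member in members:
            if member in result:
                result[member].add(name)  # supplementary group
    return result


# Hypothetical sample data; the ids are invented for illustration.
PASSWD = (
    "jdoe:x:1001:2001:John Doe:/home/jdoe:/bin/bash\n"
    "bhall:x:1002:2002:Bob Hall:/home/bhall:/bin/bash\n"
)
GROUP = (
    "hdp_testing:x:2001:\n"
    "hdp_admin:x:2002:\n"
    "dev_ops:x:2003:jdoe,bhall\n"
)

MEMBERSHIPS = users_to_groups(PASSWD, GROUP)
```

In a real deployment the text would come from reading /etc/passwd and /etc/group; the inline strings keep the sketch self-contained.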
Integrating Ranger with AD/LDAP
⬢ Understanding your deployment
– What kind of directory server is it: Active Directory, OpenLDAP, etc.?
– Is the communication between the Hadoop cluster and the directory server secure or unsecured?
– Do you have at least a read-only LDAP user for binding?
– Are there any firewall restrictions on communication between Hadoop and the directory server?
– Is Centrify being used as an LDAP proxy?
– Does your AD have spaces or special characters in usernames?
Requirements for User Management
⬢ Gathering details of the directory server structure
– AD/LDAP URL and bind credentials
– Are there specific OU(s) for Hadoop users and groups?
– How many users and groups are in the domain and/or in the OUs?
– Which filters for user search and/or group search should be configured to limit the users and groups synced to Hadoop?
– Which attributes are available on the directory server for users and groups, such as uid, sAMAccountName, memberof, objectclass, etc.?
– Will authorization policies be configured at the user level or the group level?
Sample Active Directory Server Structure
DC=ad01,DC=hadoop,DC=com
  OU=Hadoop Users
    sAMAccountName=jdoe, cn=John Doe
    sAMAccountName=bhall, cn=Bob Hall
    sAMAccountName=asmith, cn=Andy Smith
    sAMAccountName=acaroll, cn=Ashley Caroll
  OU=Hadoop Groups
    cn=hdp_testing
    cn=dev_ops
    cn=hdp_admin
Filter shown in the diagram:
(|(memberof=cn=hdp_testing,ou=Hadoop Groups,dc=hortonworks,dc=com)(memberof=cn=hdp_admin,ou=Hadoop Groups,dc=hortonworks,dc=com)(memberof=cn=dev_ops,ou=Hadoop Groups,dc=hortonworks,dc=com))
Use Case
⬢ Sync all users that belong to the groups “hdp_testing”, “hdp_admin”, or “dev_ops”
User based Search
⬢ Filter based on “memberof” attribute of the user
(|(memberof=cn=hdp_testing,ou=Hadoop Groups,dc=hortonworks,dc=com)
(memberof=cn=hdp_admin,ou=Hadoop Groups,dc=hortonworks,dc=com)
(memberof=cn=dev_ops,ou=Hadoop Groups,dc=hortonworks,dc=com))
User name attribute: sAMAccountName
User search filter:
(|(memberof=cn=hdp_testing,ou=Hadoop Groups,dc=hortonworks,dc=com)
(memberof=cn=hdp_admin,ou=Hadoop Groups,dc=hortonworks,dc=com)
(memberof=cn=dev_ops,ou=Hadoop Groups,dc=hortonworks,dc=com))
User search base: OU=Hadoop Users,dc=hortonworks,dc=com
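Long OR filters like the one above are easy to mistype by hand. The following Python sketch shows one way to generate such a filter from a list of group names. The helper names are hypothetical. It assumes the group names need no DN-level escaping and applies only the filter-value escaping described in RFC 4515.

```python
def escape_filter_value(value):
    """Escape the characters RFC 4515 reserves inside filter values."""
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\0": r"\00",
    }
    return "".join(replacements.get(ch, ch) for ch in value)


def member_of_filter(group_names, groups_base):
    """Build an OR filter that matches users in any of the given groups.

    groups_base is the DN under which the groups live; it is inserted
    as given, so it is assumed to be a well-formed DN already.
    """
    if not group_names:
        raise ValueError("at least one group name is required")
    clauses = [
        "(memberof=cn={},{})".format(escape_filter_value(name), groups_base)
        for name in group_names
    ]
    if len(clauses) == 1:
        return clauses[0]  # no OR wrapper needed for a single clause
    return "(|" + "".join(clauses) + ")"


USER_FILTER = member_of_filter(
    ["hdp_testing", "hdp_admin", "dev_ops"],
    "ou=Hadoop Groups,dc=hortonworks,dc=com",
)
```

Generating the string keeps the parentheses balanced and avoids the stray whitespace that tends to creep in when a filter is copied across lines.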
Group based Search
⬢ Filter based on the group name or “cn” attribute of the group
(|(cn=hdp_*)(cn=dev_*))
Group name attribute: cn
Group search base: OU=Hadoop Groups,dc=hortonworks,dc=com
Group member attribute: member
Group search filter: (|(cn=dev_*)(cn=hdp_*))
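To preview which group names a wildcard filter of this shape would select, the substring match can be approximated locally. This Python sketch is an approximation, not a directory query: the function names are hypothetical, and it assumes cn is compared case-insensitively, as it is under the standard LDAP schema.

```python
import re


def substring_pattern(pattern):
    """Turn an LDAP-style substring pattern such as 'hdp_*' into a regex.

    Only '*' is treated as a wildcard; everything else is literal.
    """
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(".*".join(parts), re.IGNORECASE)


def matches_group_filter(cn, patterns=("hdp_*", "dev_*")):
    """Approximate the filter (|(cn=hdp_*)(cn=dev_*)) for one group name."""
    return any(substring_pattern(p).fullmatch(cn) for p in patterns)


CANDIDATES = ["hdp_testing", "dev_ops", "hdp_admin", "finance"]
SELECTED = [name for name in CANDIDATES if matches_group_filter(name)]
```

A dry run like this helps estimate how many groups a prefix convention will pull in before the filter is configured.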
LDAP connection check tool
⬢ Command line tool
⬢ Used for
– Discovering various LDAP attributes
– Validating the LDAP settings in Ranger, Ambari, or HDFS LDAP Group Mapping
– Retrieving the total number of users and/or groups
⬢ Available as part of the Ranger installation
⬢ Requires basic information such as the LDAP URL, bind credentials, etc., supplied through either
– A command line interface
– A template properties file updated with the values specific to the setup
Tool usage
⬢ usage: run.sh
-a ignore authentication properties
-d <arg> {all|users|groups}
-h show help.
-i <arg> Input file name
-o <arg> Output directory
-r <arg> {all|users|groups}
⬢ All of the above parameters are optional
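As an illustration of how these options combine, the Python sketch below assembles an argument list for the tool using only the flags listed above. The helper is hypothetical. The parameter names `discover` and `retrieve` are one reading of `-d` and `-r` based on the tool's stated purposes, since the usage text does not spell them out, and the `./run.sh` path is an assumption about where the script lives.

```python
VALID_SCOPES = ("all", "users", "groups")


def build_command(script="./run.sh", discover=None, retrieve=None,
                  input_file=None, output_dir=None, ignore_auth=False):
    """Assemble an argument list from the options shown in the usage text.

    Every option is optional, mirroring the usage text; scopes for -d and
    -r are validated against {all|users|groups}.
    """
    for flag, scope in (("-d", discover), ("-r", retrieve)):
        if scope is not None and scope not in VALID_SCOPES:
            raise ValueError("{} expects one of {}".format(flag, VALID_SCOPES))
    cmd = [script]
    if ignore_auth:
        cmd.append("-a")
    if discover is not None:
        cmd += ["-d", discover]
    if retrieve is not None:
        cmd += ["-r", retrieve]
    if input_file is not None:
        cmd += ["-i", input_file]
    if output_dir is not None:
        cmd += ["-o", output_dir]
    return cmd


# Hypothetical file and directory names, for illustration only.
EXAMPLE = build_command(discover="users",
                        input_file="input.properties",
                        output_dir="output")
```

The resulting list could be passed to `subprocess.run(EXAMPLE, check=True)` from the tool's directory; check the tool's own help output (`-h`) for the flags your Ranger version actually accepts.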
CLI option for the LDAP tool
⬢ The CLI prompts for the following when no input file is specified:
Ldap url [ldap://ldap.example.com:389]:
Bind DN [cn=admin,ou=users,dc=example,dc=com]:
Bind Password:
User Search Base [ou=users,dc=example,dc=com]:
User Search Filter [cn=user1]:
Sample Authentication User [user1]:
Sample Authentication Password:
Demo
Best practices and Strategies
⬢ Use LDAP/AD for application service authentication
⬢ Use Ranger for authorization
⬢ When using SSL, verify that the truststore certificates are updated across the system
⬢ Use the LDAP connection check tool to
– Discover LDAP configuration attributes
– Verify the number of users and groups to be synced to Ranger
⬢ Verify that case conversion and special characters in user and group names are handled uniformly across the Hadoop environment
– Matching rules must be used in core-site.xml as well as in Ranger
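The last point can be checked mechanically: apply each side's name-mapping rule to the same directory names and flag any name that comes out differently. The Python sketch below models that check. The `normalize` helper and its parameters are hypothetical stand-ins for whatever case conversion and character substitution are configured on the Hadoop side and in Ranger; they are not the actual configuration syntax of either.

```python
def normalize(name, case="none", replace_spaces_with=None):
    """Apply a simple, illustrative name-mapping rule.

    case: "none", "lower", or "upper".
    replace_spaces_with: optional replacement for space characters.
    """
    if case == "lower":
        name = name.lower()
    elif case == "upper":
        name = name.upper()
    elif case != "none":
        raise ValueError("case must be 'none', 'lower', or 'upper'")
    if replace_spaces_with is not None:
        name = name.replace(" ", replace_spaces_with)
    return name


def find_mismatches(directory_names, hadoop_rule, ranger_rule):
    """Return the names that the two rules map to different results."""
    return [n for n in directory_names if hadoop_rule(n) != ranger_rule(n)]


def lower_with_underscores(name):
    return normalize(name, case="lower", replace_spaces_with="_")


def unchanged(name):
    return normalize(name)


NAMES = ["JDoe", "bhall", "Andy Smith"]

# Rules that disagree: policies written for the Ranger form of a name
# would not match the name Hadoop resolves for the same user.
MISMATCHED = find_mismatches(NAMES, lower_with_underscores, unchanged)

# Rules that agree: nothing to fix.
CONSISTENT = find_mismatches(NAMES, lower_with_underscores, lower_with_underscores)
```

Any name reported in `MISMATCHED` is a user or group for which an authorization policy could silently fail to apply.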
user@ranger.apache.org

Search and Society: Reimagining Information Access for Radical FuturesBhaskar Mitra
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlPeter Udo Diehl
 
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Julian Hyde
 
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...CzechDreamin
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
 
PLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. StartupsPLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. StartupsStefano
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
 
WSO2CONMay2024OpenSourceConferenceDebrief.pptx
WSO2CONMay2024OpenSourceConferenceDebrief.pptxWSO2CONMay2024OpenSourceConferenceDebrief.pptx
WSO2CONMay2024OpenSourceConferenceDebrief.pptxJennifer Lim
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...Product School
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Product School
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Thierry Lestable
 

Recently uploaded (20)

AI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekAI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří Karpíšek
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
 
Agentic RAG What it is its types applications and implementation.pdf
Agentic RAG What it is its types applications and implementation.pdfAgentic RAG What it is its types applications and implementation.pdf
Agentic RAG What it is its types applications and implementation.pdf
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeFree and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
 
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
 
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
PLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. StartupsPLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. Startups
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
WSO2CONMay2024OpenSourceConferenceDebrief.pptx
WSO2CONMay2024OpenSourceConferenceDebrief.pptxWSO2CONMay2024OpenSourceConferenceDebrief.pptx
WSO2CONMay2024OpenSourceConferenceDebrief.pptx
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 

Best Practices for Enterprise User Management in Hadoop Environment

  • 1. Best Practices for Enterprise User Management in Hadoop Environment. Sailaja Polavarapu, Sr. Software Engineer, Hortonworks. Don Bosco Durai, Cofounder & Chief Security Architect, Privacera. Dataworks Summit 2017 Munich. © Hortonworks Inc. 2011 – 2016. All Rights Reserved
  • 2. Don Bosco Durai ⬢ Cofounder and Chief Security Architect at Privacera ⬢ Committer in Apache Ranger and Apache Ambari ⬢ Contributor in most Apache projects for security
  • 3. Sailaja Polavarapu ⬢ Apache Ranger contributor since 2015 ⬢ Apache Ranger Committer ⬢ Contributed major improvements for Usersync module in Ranger ⬢ Currently working at Hortonworks Security Team ⬢ Contact: spolavarapu@apache.org
  • 4. Agenda ◆ Authentication and Users in Hadoop ◆ Integrating Ranger with AD/LDAP ◆ Common Use cases ◆ LDAP connection check tool ◆ Best practices ◆ Demo
  • 5. Most commonly asked question: If I have Ranger, do I need Kerberos?
  • 6. Why Authenticate Users? Authentication, Authorization, Auditing
  • 7. Service Types. Infrastructure: HDFS, Oozie, Storm, YARN, Hive Server, HBase, Zookeeper, Kafka. Apps: Zeppelin, Ambari Views, Ambari Admin, Ranger, Atlas, LogSearch
  • 8. Infrastructure - Kerberos [diagram: Users; Master Node (YARN Resource Manager, Hive Server, HDFS Name Node); Node 1 and Node 2, each with YARN Node Manager, HDFS Data Node, and Linux processes; steps numbered 1 to 6]
  • 9. Apps - Username & Password [diagram labels: Portals, Notebooks/Viewer, BI Tools, Zeppelin, Ambari Views, Ambari, Atlas, Ranger, Hive Server2, HDFS, Spark]
  • 10. Knox - Gateway & SSO [diagram: Services: Ambari, WebHDFS (HDFS), Templeton (HCatalog), Stargate (HBase), Oozie, Hive/JDBC, Yarn RM, Storm; UIs: Name Node UI, Job History UI, Oozie UI, HBase UI, Yarn UI, Spark UI, Ambari UI, Ranger Admin Console]
  • 11. Authentication and User Source. Hive JDBC: LDAP/Kerberos; Web Apps (Zeppelin, Ranger, Ambari, Atlas): LDAP; CLI/API (HDFS, Hive Beeline, HBase, etc.): Kerberos
  • 12. User/Group Synchronization in Ranger [diagram labels: AD/LDAP, Ranger UserSync, Sync Users/Groups, Ranger Admin, Database]
  • 13. User sources ⬢ AD/LDAP – Syncs users and groups from LDAP Organizational Units (OU) ⬢ Unix Native Users – Syncs users and groups from /etc/passwd and /etc/group files ⬢ File Sources – Syncs users and groups from a file specified in the configuration. – Supports file formats such as CSV, JSON, etc.
  • 14. Integrating Ranger with AD/LDAP ⬢ Understanding your deployment – What kind of directory server: Active Directory, OpenLDAP, etc.? – Is the communication between the Hadoop cluster and the directory server secure or unsecured? – Do you have at least a read-only LDAP user for binding? – Are there any firewall restrictions on communication between Hadoop and the directory server? – Is Centrify being used as an LDAP proxy? – Does your AD have spaces or special characters in usernames?
  • 15. Requirements for User Management ⬢ Gathering details of the directory server structure – AD/LDAP URL and bind credentials – Any specific OU(s) for Hadoop users and groups? – How many users and groups are in the domain and/or in the OUs? – What filters for user search and/or group search should be configured to limit the users and groups synced to Hadoop? – What attributes are available on the directory server for users and groups, such as uid, sAMAccountName, memberof, objectclass, etc.? – Are authorization policies to be configured at user level or group level?
  • 16. Sample Active Directory Server Structure [diagram: DC=ad01,DC=hadoop,DC=com; OU=Hadoop Users: sAMAccountName=jdoe (cn=John Doe), sAMAccountName=bhall (cn=Bob Hall), sAMAccountName=asmith (cn=Andy Smith), sAMAccountName=acaroll (cn=Ashley Caroll); OU=Hadoop Groups: cn=hdp_testing, cn=dev_ops, cn=hdp_admin; filter: (|(memberof=cn=hdp_testing,ou=Hadoop Groups,dc=hortonworks,dc=com)(memberof=cn=hdp_admin,ou=Hadoop Groups,dc=hortonworks,dc=com)(memberof=cn=dev_ops,ou=Hadoop Groups,dc=hortonworks,dc=com))]
  • 17. Use Case ⬢ Sync all the users that belong to groups “hdp_testing”, “hdp_admin”, or “dev_ops”
  • 19. User based Search ⬢ Filter based on “memberof” attribute of the user
  • 20. (|(memberof=cn=hdp_testing,ou=Hadoop Groups,dc=hortonworks,dc=com)(memberof=cn=hdp_admin,ou=Hadoop Groups,dc=hortonworks,dc=com)(memberof=cn=dev_ops,ou=Hadoop Groups,dc=hortonworks,dc=com))
  • 21. Username attribute: sAMAccountName; user search filter: (|(memberof=cn=hdp_testing,ou=Hadoop Groups,dc=hortonworks,dc=com)(memberof=cn=hdp_admin,ou=Hadoop Groups,dc=hortonworks,dc=com)(memberof=cn=dev_ops,ou=Hadoop Groups,dc=hortonworks,dc=com)); user search base: OU=Hadoop Users,dc=hortonworks,dc=com
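
The user search filter on slides 20 and 21 is an OR over one "memberof" clause per group. A minimal sketch of how such a filter string could be assembled is shown here; the helper names `escape_filter_value` and `member_of_filter` are hypothetical and not part of Ranger, and the escaping follows the LDAP filter string rules in RFC 4515.

```python
def escape_filter_value(value: str) -> str:
    """Escape characters that are special inside an LDAP filter value (RFC 4515)."""
    replacements = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\0": r"\00"}
    return "".join(replacements.get(ch, ch) for ch in value)


def member_of_filter(group_names, groups_base_dn):
    """Build an OR filter matching users that belong to any of the given groups."""
    clauses = []
    for name in group_names:
        # Hypothetical layout: every group lives directly under groups_base_dn.
        group_dn = f"cn={name},{groups_base_dn}"
        clauses.append(f"(memberof={escape_filter_value(group_dn)})")
    return "(|" + "".join(clauses) + ")"


# Group names and base DN taken from the slides.
user_filter = member_of_filter(
    ["hdp_testing", "hdp_admin", "dev_ops"],
    "ou=Hadoop Groups,dc=hortonworks,dc=com",
)
print(user_filter)
```

The resulting string is what would go into the user search filter setting, alongside the username attribute and the user search base shown on slide 21.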
  • 22. Group based Search ⬢ Filter based on the group name or “cn” attribute of the group: (|(cn=hdp_*)(cn=dev_*))
  • 23. Group name attribute: cn; group search base: OU=Hadoop Groups,dc=hortonworks,dc=com; group member attribute: member; group search filter: (|(cn=dev_*)(cn=hdp_*))
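
To preview which groups a wildcard filter such as (|(cn=hdp_*)(cn=dev_*)) would select, the matching can be approximated offline. The sketch below is only an approximation using Python's `fnmatch` module, not a real LDAP search: it assumes the patterns contain no wildcard other than `*`, and it compares names case-insensitively, which is the usual matching behavior for "cn". The group names "finance" and "HDP_audit" are made up for illustration.

```python
from fnmatch import fnmatchcase


def matches_group_filter(cn, patterns=("hdp_*", "dev_*")):
    """Return True if the group name matches any of the wildcard patterns."""
    # Lowercase both sides to mimic case-insensitive matching on "cn".
    return any(fnmatchcase(cn.lower(), pattern.lower()) for pattern in patterns)


# First three names come from the slides; the last two are hypothetical.
groups = ["hdp_testing", "hdp_admin", "dev_ops", "finance", "HDP_audit"]
selected = [g for g in groups if matches_group_filter(g)]
print(selected)
```

A preview like this can help catch a prefix that is broader than intended before the filter is applied to a large directory.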
  • 24. LDAP connection check tool ⬢ Command line tool ⬢ Used for – Discovering various LDAP attributes – Validating the LDAP settings in Ranger, Ambari, or HDFS LDAP Group Mapping – Retrieving the total number of users and/or groups ⬢ Available as part of the Ranger installation ⬢ Requires basic information like LDAP URL, bind credentials, etc. – Command line interface – A template properties file to update with values specific to the setup
  • 25. Tool usage ⬢ usage: run.sh
        -a        ignore authentication properties
        -d <arg>  {all|users|groups}
        -h        show help.
        -i <arg>  Input file name
        -o <arg>  Output directory
        -r <arg>  {all|users|groups}
      ⬢ All of the above parameters are optional
  • 26. CLI option for the LDAP tool ⬢ The CLI is provided when an input file is not specified:
        Ldap url [ldap://ldap.example.com:389]:
        Bind DN [cn=admin,ou=users,dc=example,dc=com]:
        Bind Password:
        User Search Base [ou=users,dc=example,dc=com]:
        User Search Filter [cn=user1]:
        Sample Authentication User [user1]:
        Sample Authentication Password:
  • 27. Demo
  • 28. Best practices and Strategies ⬢ Use LDAP/AD for application service authentication ⬢ Use Ranger for authorization ⬢ Verify that the truststore certs are updated across the system when SSL is used ⬢ Use the LDAP connection check tool to – discover LDAP configuration attributes – verify the number of users and groups to be synced to Ranger ⬢ Verify that case conversion and special characters in user and group names are handled uniformly across the Hadoop environment – Matching rules must be used in core-site.xml as well as in Ranger
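
The last point on slide 28 can be illustrated with a small consistency check. This is a sketch under stated assumptions, not Ranger or Hadoop code: `normalize_name` and `find_mismatches` are hypothetical helpers, and each "rule" dictionary stands in for whatever case conversion and space handling a given component is configured with. Names that come out differently under the two rules are the ones whose policies would silently fail to match.

```python
def normalize_name(name, case="lower", space_replacement=None):
    """Apply one component's (hypothetical) name handling rule to a user or group name."""
    if space_replacement is not None:
        name = name.replace(" ", space_replacement)
    if case == "lower":
        return name.lower()
    if case == "upper":
        return name.upper()
    if case == "none":
        return name
    raise ValueError(f"unknown case rule: {case}")


def find_mismatches(names, rule_a, rule_b):
    """Return the names that two components would resolve to different strings."""
    return [
        n for n in names
        if normalize_name(n, **rule_a) != normalize_name(n, **rule_b)
    ]


# Illustrative rules only: one side lowercases names, the other leaves case as-is.
ranger_rule = {"case": "lower", "space_replacement": "_"}
hadoop_rule = {"case": "none", "space_replacement": "_"}

names = ["JDoe", "Bob Hall", "asmith"]
print(find_mismatches(names, ranger_rule, hadoop_rule))
```

With identical rules on both sides the list is empty, which is the state the slide recommends.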
  • 29. user@ranger.apache.org