1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
BEST PRACTICES FOR ENTERPRISE USER MANAGEMENT IN HADOOP ENVIRONMENT
DataWorks Summit 2017 Munich
Sailaja Polavarapu, Sr. Software Engineer, Hortonworks
Don Bosco Durai, Cofounder & Chief Security Architect, Privacera
Don Bosco Durai
⬢ Cofounder and Chief Security Architect at Privacera
⬢ Committer in Apache Ranger and Apache Ambari
⬢ Contributor to security in most Apache projects
Sailaja Polavarapu
⬢ Apache Ranger contributor since 2015
⬢ Apache Ranger Committer
⬢ Contributed major improvements to the Usersync module in Ranger
⬢ Currently working on the Hortonworks Security Team
⬢ Contact: spolavarapu@apache.org
Agenda
◆ Authentication and Users in Hadoop
◆ Integrating Ranger with AD/LDAP
◆ Common Use cases
◆ LDAP connection check tool
◆ Best practices
◆ Demo
Most commonly asked question
If I have Ranger, do I need Kerberos?
Why Authenticate Users?
Authentication
Authorization
Auditing
Service Types
Infrastructure: HDFS, Oozie, Storm, YARN, Hive Server, HBase, Zookeeper, Kafka
Apps: Zeppelin, Ambari Views, Ambari Admin, Ranger, Atlas, LogSearch
Infrastructure - Kerberos
[Diagram: Users; a Master Node (YARN Resource Manager, Hive Server, HDFS Name Node); Node 1 and Node 2, each running a YARN Node Manager and an HDFS Data Node as Linux processes; interactions numbered 1 to 6.]
Apps - Username & Password
[Diagram: Portals, Notebooks/Viewer, Hive Server2, Zeppelin, Ambari Views, HDFS, Ambari, Atlas, Ranger, BI Tools, Spark.]
Knox - Gateway & SSO
Services: Ambari, WebHDFS (HDFS), Templeton (HCatalog), Stargate (HBase), Oozie, Hive/JDBC, Yarn RM, Storm
UIs: Name Node UI, Job History UI, Oozie UI, HBase UI, Yarn UI, Spark UI, Ambari UI, Ranger Admin Console
Authentication and User Source
Hive JDBC: LDAP/Kerberos
Web Apps (Zeppelin, Ranger, Ambari, Atlas): LDAP
CLI/API (HDFS, Hive Beeline, HBase, etc.): Kerberos
User/Group Synchronization in Ranger
[Diagram: Ranger UserSync syncs users/groups from AD/LDAP to Ranger Admin, which is backed by a database.]
User sources
⬢ AD/LDAP
– Syncs users and groups from LDAP Organizational Units (OU)
⬢ Unix Native Users
– Syncs users and groups from the /etc/passwd and /etc/group files
⬢ File Sources
– Syncs users and groups from a file specified in the configuration
– Supports several file formats, such as CSV and JSON
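As a rough illustration of the Unix native source, the following sketch reads passwd-format and group-format text and derives each user's groups. It is not Ranger's implementation; the function names and the sample data are hypothetical, and only the standard colon-separated layout of /etc/passwd and /etc/group is assumed.

```python
def parse_passwd(text):
    """Return {username: primary_gid} from passwd-format text."""
    users = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 4:
            continue  # skip malformed entries
        users[fields[0]] = int(fields[3])
    return users


def parse_group(text):
    """Return {group_name: (gid, [members])} from group-format text."""
    groups = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 4:
            continue  # skip malformed entries
        members = [m for m in fields[3].split(",") if m]
        groups[fields[0]] = (int(fields[2]), members)
    return groups


def user_groups(passwd_text, group_text):
    """Map each user to the set of groups it belongs to.

    A user belongs to a group if the group is the user's primary group
    or the user is listed in the group's member field.
    """
    users = parse_passwd(passwd_text)
    groups = parse_group(group_text)
    result = {name: set() for name in users}
    for group_name, (gid, members) in groups.items():
        for name, primary_gid in users.items():
            if primary_gid == gid or name in members:
                result[name].add(group_name)
    return result


# Hypothetical sample data in /etc/passwd and /etc/group layout.
SAMPLE_PASSWD = (
    "jdoe:x:1001:2001:John Doe:/home/jdoe:/bin/bash\n"
    "bhall:x:1002:2002:Bob Hall:/home/bhall:/bin/bash\n"
)
SAMPLE_GROUP = "hdp_admin:x:2001:\ndev_ops:x:2002:jdoe\n"

if __name__ == "__main__":
    print(user_groups(SAMPLE_PASSWD, SAMPLE_GROUP))
```

Working on text rather than on the real files keeps the sketch testable on any machine.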
Integrating Ranger with AD/LDAP
⬢ Understanding your deployment
– What kind of directory server is in use: Active Directory, OpenLDAP, etc.?
– Is the communication between the Hadoop cluster and the directory server secure or insecure?
– Do you have at least a read-only LDAP user for binding?
– Are there any firewall restrictions on communication between Hadoop and the directory server?
– Is Centrify being used as an LDAP proxy?
– Does your AD have spaces or special characters in usernames?
Requirements for User Management
⬢ Gathering details of the directory server structure
– AD/LDAP URL and bind credentials
– Are there specific OU(s) for Hadoop users and groups?
– How many users and groups are in the domain and/or in the OUs?
– Which filters for user search and/or group search should be configured to limit the users and groups synced to Hadoop?
– Which attributes are available on the directory server for users and groups (uid, sAMAccountName, memberof, objectclass, etc.)?
– Should authorization policies be configured at the user level or the group level?
Sample Active Directory Server Structure
DC=ad01,DC=hadoop,DC=com
  OU=Hadoop Users
    sAMAccountName=jdoe, cn=John Doe
    sAMAccountName=bhall, cn=Bob Hall
    sAMAccountName=asmith, cn=Andy Smith
    sAMAccountName=acaroll, cn=Ashley Caroll
  OU=Hadoop Groups
    cn=hdp_testing
    cn=dev_ops
    cn=hdp_admin
User filter shown in the diagram:
(|(memberof=cn=hdp_testing,ou=Hadoop Groups,dc=hortonworks,dc=com)(memberof=cn=hdp_admin,ou=Hadoop Groups,dc=hortonworks,dc=com)(memberof=cn=dev_ops,ou=Hadoop Groups,dc=hortonworks,dc=com))
Use Case
⬢ Sync all the users that belong to the groups “hdp_testing”, “hdp_admin”, or “dev_ops”
User based Search
⬢ Filter based on “memberof” attribute of the user
(|(memberof=cn=hdp_testing,ou=Hadoop Groups,dc=hortonworks,dc=com)
  (memberof=cn=hdp_admin,ou=Hadoop Groups,dc=hortonworks,dc=com)
  (memberof=cn=dev_ops,ou=Hadoop Groups,dc=hortonworks,dc=com))
Username attribute: sAMAccountName
User search filter: (|(memberof=cn=hdp_testing,ou=Hadoop Groups,dc=hortonworks,dc=com)(memberof=cn=hdp_admin,ou=Hadoop Groups,dc=hortonworks,dc=com)(memberof=cn=dev_ops,ou=Hadoop Groups,dc=hortonworks,dc=com))
User search base: OU=Hadoop Users,dc=hortonworks,dc=com
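A filter of this shape can be generated from a list of group names rather than typed by hand. The sketch below is one way to do that; `member_of_filter` and `escape_filter_value` are hypothetical helper names, and the code assumes the group names contain no characters that need DN-level escaping (only LDAP filter escaping is applied).

```python
def escape_filter_value(value):
    """Escape the characters that are special inside an LDAP search filter."""
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\0": r"\00",
    }
    return "".join(replacements.get(ch, ch) for ch in value)


def member_of_filter(group_names, groups_base_dn, attribute="memberof"):
    """Build an OR filter matching users that belong to any listed group."""
    clauses = [
        f"({attribute}=cn={escape_filter_value(name)},{groups_base_dn})"
        for name in group_names
    ]
    if not clauses:
        raise ValueError("at least one group name is required")
    if len(clauses) == 1:
        return clauses[0]  # a single clause needs no OR wrapper
    return "(|" + "".join(clauses) + ")"


if __name__ == "__main__":
    print(
        member_of_filter(
            ["hdp_testing", "hdp_admin", "dev_ops"],
            "ou=Hadoop Groups,dc=hortonworks,dc=com",
        )
    )
```

Generating the filter avoids the stray spaces and mismatched parentheses that are easy to introduce when a long filter is edited manually.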
Group based Search
⬢ Filter based on the group name or “cn” attribute of the group
(|(cn=hdp_*)(cn=dev_*))
Group name attribute: cn
Group search base: OU=Hadoop Groups,dc=hortonworks,dc=com
Group member attribute: member
Group search filter: (|(cn=dev_*)(cn=hdp_*))
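To preview which group names a wildcard filter such as `(|(cn=hdp_*)(cn=dev_*))` would select, the matching can be approximated offline. This is only a sketch: `matches_group_filter` is a hypothetical helper, it treats `*` as "any sequence of characters", and it assumes the directory compares `cn` values case-insensitively. The directory server remains the authority on what actually matches.

```python
import re


def substring_pattern_to_regex(pattern):
    """Turn an LDAP-style pattern, where '*' is the only wildcard, into a regex."""
    # Escape everything, then turn the escaped '*' back into a wildcard.
    body = re.escape(pattern).replace(r"\*", ".*")
    return re.compile(body, re.IGNORECASE | re.DOTALL)


def matches_group_filter(cn, patterns=("hdp_*", "dev_*")):
    """Return True if the group name matches any of the OR-ed patterns."""
    return any(
        substring_pattern_to_regex(p).fullmatch(cn) is not None for p in patterns
    )


if __name__ == "__main__":
    for name in ["hdp_testing", "hdp_admin", "dev_ops", "finance"]:
        print(name, matches_group_filter(name))
```

A name-prefix convention like this keeps the group filter short, at the cost of requiring that every Hadoop-related group follows the naming scheme.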
LDAP connection check tool
⬢ Command line tool
⬢ Used for
– Discovering various LDAP attributes
– Validating the LDAP settings in Ranger, Ambari, or HDFS LDAP Group Mapping
– Retrieving the total number of users and/or groups
⬢ Available as part of the Ranger installation
⬢ Requires basic information such as the LDAP URL and bind credentials, supplied through either
– the command line interface, or
– a template properties file updated with the values specific to the setup
Tool usage
⬢ usage: run.sh
 -a        ignore authentication properties
 -d <arg>  {all|users|groups}
 -h        show help.
 -i <arg>  Input file name
 -o <arg>  Output directory
 -r <arg>  {all|users|groups}
⬢ All of the above parameters are optional
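When the tool is driven from a script, the argument list can be assembled programmatically. The sketch below uses only the flags listed in the usage text above; `build_check_command` is a hypothetical helper, and naming the `-d` and `-r` parameters `discover` and `retrieve` is an assumption based on the tool's stated purposes, not something the usage text spells out.

```python
VALID_SCOPES = {"all", "users", "groups"}


def build_check_command(script="./run.sh", input_file=None, output_dir=None,
                        discover=None, retrieve=None, ignore_auth=False):
    """Assemble an argument list for run.sh; every option is optional."""
    args = [script]
    if ignore_auth:
        args.append("-a")
    # Assumption: -d selects what to discover and -r what to retrieve.
    for flag, scope in (("-d", discover), ("-r", retrieve)):
        if scope is not None:
            if scope not in VALID_SCOPES:
                raise ValueError(f"{flag} expects one of {sorted(VALID_SCOPES)}")
            args += [flag, scope]
    if input_file is not None:
        args += ["-i", input_file]
    if output_dir is not None:
        args += ["-o", output_dir]
    return args


if __name__ == "__main__":
    # "input.properties" and "out" are placeholder names.
    print(build_check_command(input_file="input.properties",
                              output_dir="out", retrieve="users"))
```

The resulting list can be passed to `subprocess.run` on a host where the tool is installed; building a list instead of a shell string avoids quoting problems with paths that contain spaces.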
CLI option for the LDAP tool
⬢ An interactive CLI is provided when no input file is specified:
Ldap url [ldap://ldap.example.com:389]:
Bind DN [cn=admin,ou=users,dc=example,dc=com]:
Bind Password:
User Search Base [ou=users,dc=example,dc=com]:
User Search Filter [cn=user1]:
Sample Authentication User [user1]:
Sample Authentication Password:
Demo
Best practices and Strategies
⬢ Use LDAP/AD for application service authentication
⬢ Use Ranger for authorization
⬢ When SSL is used, verify that the truststore certificates are updated across the system
⬢ Use the LDAP connection check tool to
– discover LDAP configuration attributes
– verify the number of users and groups to be synced to Ranger
⬢ Verify that case conversion and special characters in user and group names are handled uniformly across the Hadoop environment
– The same matching rules must be used in core-site.xml and in Ranger
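The last point can be checked before rollout by applying each component's name-handling rule to a sample of directory names and listing the names that come out differently. The sketch below is illustrative only: the rule format (a case setting plus character replacements) and the function names are hypothetical and do not correspond to actual Ranger or core-site.xml settings.

```python
def normalize_name(name, case="none", replacements=None):
    """Apply character replacements, then a case conversion, to a name."""
    for old, new in (replacements or {}).items():
        name = name.replace(old, new)
    if case == "lower":
        return name.lower()
    if case == "upper":
        return name.upper()
    if case == "none":
        return name
    raise ValueError(f"unknown case rule: {case!r}")


def find_mismatches(names, rule_a, rule_b):
    """Return the names that two rule sets would normalize differently."""
    return [
        n for n in names
        if normalize_name(n, **rule_a) != normalize_name(n, **rule_b)
    ]


if __name__ == "__main__":
    # Hypothetical rules for two components that should agree.
    component_a = {"case": "lower", "replacements": {" ": "_"}}
    component_b = {"case": "none"}
    print(find_mismatches(["jdoe", "John Doe"], component_a, component_b))
```

Any name reported here would be known to one component under a different spelling than to the other, so policies written against it might not apply as intended.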
user@ranger.apache.org

More Related Content

What's hot

ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...Databricks
 
Installation of Dspace in Windows OS: A Complete Documentation
Installation of Dspace in Windows OS: A Complete DocumentationInstallation of Dspace in Windows OS: A Complete Documentation
Installation of Dspace in Windows OS: A Complete DocumentationAshok Kumar Satapathy
 
Sqoop on Spark for Data Ingestion
Sqoop on Spark for Data IngestionSqoop on Spark for Data Ingestion
Sqoop on Spark for Data IngestionDataWorks Summit
 
Compact, Compress, De-Duplicate (DAOS)
Compact, Compress, De-Duplicate (DAOS)Compact, Compress, De-Duplicate (DAOS)
Compact, Compress, De-Duplicate (DAOS)Ulrich Krause
 
Integrando Skype em aplicações Delphi
Integrando Skype em aplicações DelphiIntegrando Skype em aplicações Delphi
Integrando Skype em aplicações DelphiAndreano Lanusse
 
Oracle E-Business Suite R12.2.5 on Database 12c: Install, Patch and Administer
Oracle E-Business Suite R12.2.5 on Database 12c: Install, Patch and AdministerOracle E-Business Suite R12.2.5 on Database 12c: Install, Patch and Administer
Oracle E-Business Suite R12.2.5 on Database 12c: Install, Patch and AdministerAndrejs Karpovs
 
Solaris Linux Performance, Tools and Tuning
Solaris Linux Performance, Tools and TuningSolaris Linux Performance, Tools and Tuning
Solaris Linux Performance, Tools and TuningAdrian Cockcroft
 
Fluid UI, Tips, Info
Fluid UI, Tips, InfoFluid UI, Tips, Info
Fluid UI, Tips, InfoAnoop Savio
 
Ad108 - XPages in the IBM Lotus Notes Client - A Deep Dive!
Ad108 - XPages in the IBM Lotus Notes Client - A Deep Dive!Ad108 - XPages in the IBM Lotus Notes Client - A Deep Dive!
Ad108 - XPages in the IBM Lotus Notes Client - A Deep Dive!ddrschiw
 
IBM Traveler Management, Security and Performance
IBM Traveler Management, Security and PerformanceIBM Traveler Management, Security and Performance
IBM Traveler Management, Security and PerformanceGabriella Davis
 
Engage.UG 2022 - Domino TOTP/2FA - Best Practices and Pitfalls
Engage.UG 2022 - Domino TOTP/2FA - Best Practices and PitfallsEngage.UG 2022 - Domino TOTP/2FA - Best Practices and Pitfalls
Engage.UG 2022 - Domino TOTP/2FA - Best Practices and PitfallsMilan Matejic
 
The Ultimate Administrator’s Guide to HCL Nomad Web
The Ultimate Administrator’s Guide to HCL Nomad WebThe Ultimate Administrator’s Guide to HCL Nomad Web
The Ultimate Administrator’s Guide to HCL Nomad Webpanagenda
 
Apache Tez – Present and Future
Apache Tez – Present and FutureApache Tez – Present and Future
Apache Tez – Present and FutureDataWorks Summit
 
April, 2021 OpenNTF Webinar - Domino Administration Best Practices
April, 2021 OpenNTF Webinar - Domino Administration Best PracticesApril, 2021 OpenNTF Webinar - Domino Administration Best Practices
April, 2021 OpenNTF Webinar - Domino Administration Best PracticesHoward Greenberg
 
IMS DC Self Study Complete Tutorial
IMS DC Self Study Complete TutorialIMS DC Self Study Complete Tutorial
IMS DC Self Study Complete TutorialSrinimf-Slides
 
Cassandra sharding and consistency (lightning talk)
Cassandra sharding and consistency (lightning talk)Cassandra sharding and consistency (lightning talk)
Cassandra sharding and consistency (lightning talk)Federico Razzoli
 
IBM Notes Traveler administration and Log troubleshooting tips
IBM Notes Traveler administration and Log troubleshooting tipsIBM Notes Traveler administration and Log troubleshooting tips
IBM Notes Traveler administration and Log troubleshooting tipsjayeshpar2006
 

What's hot (20)

ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
 
Installation of Dspace in Windows OS: A Complete Documentation
Installation of Dspace in Windows OS: A Complete DocumentationInstallation of Dspace in Windows OS: A Complete Documentation
Installation of Dspace in Windows OS: A Complete Documentation
 
Sqoop on Spark for Data Ingestion
Sqoop on Spark for Data IngestionSqoop on Spark for Data Ingestion
Sqoop on Spark for Data Ingestion
 
Compact, Compress, De-Duplicate (DAOS)
Compact, Compress, De-Duplicate (DAOS)Compact, Compress, De-Duplicate (DAOS)
Compact, Compress, De-Duplicate (DAOS)
 
Integrando Skype em aplicações Delphi
Integrando Skype em aplicações DelphiIntegrando Skype em aplicações Delphi
Integrando Skype em aplicações Delphi
 
Achieving 100k Queries per Hour on Hive on Tez
Achieving 100k Queries per Hour on Hive on TezAchieving 100k Queries per Hour on Hive on Tez
Achieving 100k Queries per Hour on Hive on Tez
 
Oracle E-Business Suite R12.2.5 on Database 12c: Install, Patch and Administer
Oracle E-Business Suite R12.2.5 on Database 12c: Install, Patch and AdministerOracle E-Business Suite R12.2.5 on Database 12c: Install, Patch and Administer
Oracle E-Business Suite R12.2.5 on Database 12c: Install, Patch and Administer
 
HBase Low Latency
HBase Low LatencyHBase Low Latency
HBase Low Latency
 
Solaris Linux Performance, Tools and Tuning
Solaris Linux Performance, Tools and TuningSolaris Linux Performance, Tools and Tuning
Solaris Linux Performance, Tools and Tuning
 
Apache Ranger
Apache RangerApache Ranger
Apache Ranger
 
Fluid UI, Tips, Info
Fluid UI, Tips, InfoFluid UI, Tips, Info
Fluid UI, Tips, Info
 
Ad108 - XPages in the IBM Lotus Notes Client - A Deep Dive!
Ad108 - XPages in the IBM Lotus Notes Client - A Deep Dive!Ad108 - XPages in the IBM Lotus Notes Client - A Deep Dive!
Ad108 - XPages in the IBM Lotus Notes Client - A Deep Dive!
 
IBM Traveler Management, Security and Performance
IBM Traveler Management, Security and PerformanceIBM Traveler Management, Security and Performance
IBM Traveler Management, Security and Performance
 
Engage.UG 2022 - Domino TOTP/2FA - Best Practices and Pitfalls
Engage.UG 2022 - Domino TOTP/2FA - Best Practices and PitfallsEngage.UG 2022 - Domino TOTP/2FA - Best Practices and Pitfalls
Engage.UG 2022 - Domino TOTP/2FA - Best Practices and Pitfalls
 
The Ultimate Administrator’s Guide to HCL Nomad Web
The Ultimate Administrator’s Guide to HCL Nomad WebThe Ultimate Administrator’s Guide to HCL Nomad Web
The Ultimate Administrator’s Guide to HCL Nomad Web
 
Apache Tez – Present and Future
Apache Tez – Present and FutureApache Tez – Present and Future
Apache Tez – Present and Future
 
April, 2021 OpenNTF Webinar - Domino Administration Best Practices
April, 2021 OpenNTF Webinar - Domino Administration Best PracticesApril, 2021 OpenNTF Webinar - Domino Administration Best Practices
April, 2021 OpenNTF Webinar - Domino Administration Best Practices
 
IMS DC Self Study Complete Tutorial
IMS DC Self Study Complete TutorialIMS DC Self Study Complete Tutorial
IMS DC Self Study Complete Tutorial
 
Cassandra sharding and consistency (lightning talk)
Cassandra sharding and consistency (lightning talk)Cassandra sharding and consistency (lightning talk)
Cassandra sharding and consistency (lightning talk)
 
IBM Notes Traveler administration and Log troubleshooting tips
IBM Notes Traveler administration and Log troubleshooting tipsIBM Notes Traveler administration and Log troubleshooting tips
IBM Notes Traveler administration and Log troubleshooting tips
 

Viewers also liked

Viewers also liked (12)

Solving Cyber at Scale
Solving Cyber at ScaleSolving Cyber at Scale
Solving Cyber at Scale
 
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and Future
 
Dancing Elephants - Efficiently Working with Object Stories from Apache Spark...
Dancing Elephants - Efficiently Working with Object Stories from Apache Spark...Dancing Elephants - Efficiently Working with Object Stories from Apache Spark...
Dancing Elephants - Efficiently Working with Object Stories from Apache Spark...
 
Big Data in Azure
Big Data in AzureBig Data in Azure
Big Data in Azure
 
Automatic Detection, Classification and Authorization of Sensitive Personal D...
Automatic Detection, Classification and Authorization of Sensitive Personal D...Automatic Detection, Classification and Authorization of Sensitive Personal D...
Automatic Detection, Classification and Authorization of Sensitive Personal D...
 
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron
MaaS (Model as a Service): Modern Streaming Data Science with Apache MetronMaaS (Model as a Service): Modern Streaming Data Science with Apache Metron
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron
 
File Format Benchmark - Avro, JSON, ORC and Parquet
File Format Benchmark - Avro, JSON, ORC and ParquetFile Format Benchmark - Avro, JSON, ORC and Parquet
File Format Benchmark - Avro, JSON, ORC and Parquet
 
Running Services on YARN
Running Services on YARNRunning Services on YARN
Running Services on YARN
 
Apache Metron: Community Driven Cyber Security
Apache Metron: Community Driven Cyber Security Apache Metron: Community Driven Cyber Security
Apache Metron: Community Driven Cyber Security
 
Bringing it All Together: Apache Metron (Incubating) as a Case Study of a Mod...
Bringing it All Together: Apache Metron (Incubating) as a Case Study of a Mod...Bringing it All Together: Apache Metron (Incubating) as a Case Study of a Mod...
Bringing it All Together: Apache Metron (Incubating) as a Case Study of a Mod...
 
Hadoop 3 in a Nutshell
Hadoop 3 in a NutshellHadoop 3 in a Nutshell
Hadoop 3 in a Nutshell
 
Apache Kafka Best Practices
Apache Kafka Best PracticesApache Kafka Best Practices
Apache Kafka Best Practices
 

Similar to Best Practices for Enterprise User Management in Hadoop Environment

Managing enterprise users in Hadoop ecosystem
Managing enterprise users in Hadoop ecosystemManaging enterprise users in Hadoop ecosystem
Managing enterprise users in Hadoop ecosystemDataWorks Summit
 
Cloudy with a chance of Hadoop - real world considerations
Cloudy with a chance of Hadoop - real world considerationsCloudy with a chance of Hadoop - real world considerations
Cloudy with a chance of Hadoop - real world considerationsDataWorks Summit
 
Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...DataWorks Summit
 
Hadoop in adtech
Hadoop in adtechHadoop in adtech
Hadoop in adtechYuta Imai
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop SecurityDataWorks Summit
 
Cloudy with a chance of Hadoop - DataWorks Summit 2017 San Jose
Cloudy with a chance of Hadoop - DataWorks Summit 2017 San JoseCloudy with a chance of Hadoop - DataWorks Summit 2017 San Jose
Cloudy with a chance of Hadoop - DataWorks Summit 2017 San JoseMingliang Liu
 
Curb your insecurity with HDP - Tips for a Secure Cluster
Curb your insecurity with HDP - Tips for a Secure ClusterCurb your insecurity with HDP - Tips for a Secure Cluster
Curb your insecurity with HDP - Tips for a Secure Clusterahortonworks
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseDataWorks Summit
 
Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017alanfgates
 
Hadoop & devOps : better together
Hadoop & devOps : better togetherHadoop & devOps : better together
Hadoop & devOps : better togetherMaxime Lanciaux
 
Running Apache Zeppelin production
Running Apache Zeppelin productionRunning Apache Zeppelin production
Running Apache Zeppelin productionVinay Shukla
 
Running Zeppelin in Enterprise
Running Zeppelin in EnterpriseRunning Zeppelin in Enterprise
Running Zeppelin in EnterpriseDataWorks Summit
 
Apache Ambari - What's New in 1.7.0
Apache Ambari - What's New in 1.7.0Apache Ambari - What's New in 1.7.0
Apache Ambari - What's New in 1.7.0Hortonworks
 
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies DataWorks Summit/Hadoop Summit
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop SecurityChris Nauroth
 
Hadoop Present - Open Enterprise Hadoop
Hadoop Present - Open Enterprise HadoopHadoop Present - Open Enterprise Hadoop
Hadoop Present - Open Enterprise HadoopYifeng Jiang
 

Similar to Best Practices for Enterprise User Management in Hadoop Environment (20)

Managing enterprise users in Hadoop ecosystem
Managing enterprise users in Hadoop ecosystemManaging enterprise users in Hadoop ecosystem
Managing enterprise users in Hadoop ecosystem
 
Cloudy with a chance of Hadoop - real world considerations
Cloudy with a chance of Hadoop - real world considerationsCloudy with a chance of Hadoop - real world considerations
Cloudy with a chance of Hadoop - real world considerations
 
Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...
 
Streamline Hadoop DevOps with Apache Ambari
Streamline Hadoop DevOps with Apache AmbariStreamline Hadoop DevOps with Apache Ambari
Streamline Hadoop DevOps with Apache Ambari
 
Hadoop in adtech
Hadoop in adtechHadoop in adtech
Hadoop in adtech
 
Curb Your Insecurity - Tips for a Secure Cluster (with Spark too)!!
Curb Your Insecurity - Tips for a Secure Cluster (with Spark too)!!Curb Your Insecurity - Tips for a Secure Cluster (with Spark too)!!
Curb Your Insecurity - Tips for a Secure Cluster (with Spark too)!!
 
Curb your insecurity with HDP
Curb your insecurity with HDPCurb your insecurity with HDP
Curb your insecurity with HDP
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
 
Cloudy with a chance of Hadoop - DataWorks Summit 2017 San Jose
Cloudy with a chance of Hadoop - DataWorks Summit 2017 San JoseCloudy with a chance of Hadoop - DataWorks Summit 2017 San Jose
Cloudy with a chance of Hadoop - DataWorks Summit 2017 San Jose
 
Curb your insecurity with HDP - Tips for a Secure Cluster
Curb your insecurity with HDP - Tips for a Secure ClusterCurb your insecurity with HDP - Tips for a Secure Cluster
Curb your insecurity with HDP - Tips for a Secure Cluster
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data Warehouse
 
Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017
 
Hadoop & devOps : better together
Hadoop & devOps : better togetherHadoop & devOps : better together
Hadoop & devOps : better together
 
Running Apache Zeppelin production
Running Apache Zeppelin productionRunning Apache Zeppelin production
Running Apache Zeppelin production
 
Running Zeppelin in Enterprise
Running Zeppelin in EnterpriseRunning Zeppelin in Enterprise
Running Zeppelin in Enterprise
 
Apache Ambari - What's New in 1.7.0
Apache Ambari - What's New in 1.7.0Apache Ambari - What's New in 1.7.0
Apache Ambari - What's New in 1.7.0
 
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
 
Hadoop Present - Open Enterprise Hadoop
Hadoop Present - Open Enterprise HadoopHadoop Present - Open Enterprise Hadoop
Hadoop Present - Open Enterprise Hadoop
 
Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionRunning Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
 

More from DataWorks Summit/Hadoop Summit

Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerDataWorks Summit/Hadoop Summit
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformDataWorks Summit/Hadoop Summit
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDataWorks Summit/Hadoop Summit
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...DataWorks Summit/Hadoop Summit
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...DataWorks Summit/Hadoop Summit
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLDataWorks Summit/Hadoop Summit
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)DataWorks Summit/Hadoop Summit
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...DataWorks Summit/Hadoop Summit
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesDataWorks Summit/Hadoop Summit
 

More from DataWorks Summit/Hadoop Summit (20)

State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
 
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and ZeppelinRevolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
 
Hadoop Crash Course
Hadoop Crash CourseHadoop Crash Course
Hadoop Crash Course
 
Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
 
Schema Registry - Set you Data Free
Schema Registry - Set you Data FreeSchema Registry - Set you Data Free
Schema Registry - Set you Data Free
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
 

Best Practices for Enterprise User Management in Hadoop Environment

  • 1. © Hortonworks Inc. 2011 – 2016. All Rights Reserved. BEST PRACTICES FOR ENTERPRISE USER MANAGEMENT IN HADOOP ENVIRONMENT. Sailaja Polavarapu, Sr. Software Engineer, Hortonworks. Don Bosco Durai, Cofounder & Chief Security Architect, Privacera. Dataworks Summit 2017 Munich.
  • 2. Don Bosco Durai ⬢ Cofounder and Chief Security Architect at Privacera ⬢ Committer on Apache Ranger and Apache Ambari ⬢ Contributor to most Apache projects for security
  • 3. Sailaja Polavarapu ⬢ Apache Ranger contributor since 2015 ⬢ Apache Ranger Committer ⬢ Contributed major improvements to the Usersync module in Ranger ⬢ Currently working on the Hortonworks Security Team ⬢ Contact: spolavarapu@apache.org
  • 4. Agenda ◆ Authentication and Users in Hadoop ◆ Integrating Ranger with AD/LDAP ◆ Common Use cases ◆ LDAP connection check tool ◆ Best practices ◆ Demo
  • 5. Most commonly asked question: If I have Ranger, do I need Kerberos?
  • 6. Why Authenticate Users? Authentication, Authorization, Auditing
  • 7. Service Types. Infrastructure: HDFS, Oozie, Storm, YARN, Hive Server, HBase, Zookeeper, Kafka. Apps: Zeppelin, Ambari Views, Ambari Admin, Ranger, Atlas, LogSearch.
  • 8. Infrastructure - Kerberos (diagram): Users; Master Node (YARN Resource Manager, Hive Server, HDFS Name Node); Node 1 and Node 2 (each with YARN Node Manager, HDFS Data Node, and Linux processes); interactions numbered 1 to 6.
  • 9. Apps - Username & Password (diagram): Portals, Notebooks/Viewer, BI Tools; Hive Server2, Zeppelin, Ambari Views, HDFS, Ambari, Atlas, Ranger, Spark.
  • 10. Knox - Gateway & SSO (diagram): Ambari, WebHDFS (HDFS), Templeton (HCatalog), Stargate (HBase), Oozie, Hive/JDBC, Yarn RM, Storm. Services UIs: Name Node UI, Job History UI, Oozie UI, HBase UI, Yarn UI, Spark UI, Ambari UI, Ranger Admin Console.
  • 11. Authentication and User Source. Hive JDBC: LDAP/Kerberos. Web Apps (Zeppelin, Ranger, Ambari, Atlas): LDAP. CLI/API (HDFS, Hive Beeline, HBase, etc.): Kerberos.
  • 12. User/Group Synchronization in Ranger (diagram): AD/LDAP, Ranger UserSync (Sync Users/Groups), Ranger Admin, Database.
  • 13. User sources ⬢ AD/LDAP – Syncs users and groups from LDAP Organizational Units (OUs) ⬢ Unix Native Users – Syncs users and groups from the /etc/passwd and /etc/group files ⬢ File Sources – Syncs users and groups from a file specified in the configuration – Supports file formats such as CSV, JSON, etc.
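
To make the Unix native source concrete, here is a minimal sketch of deriving user-to-group memberships from passwd- and group-format text. It is an illustration only, not Ranger's Usersync implementation; the function names and the sample records are hypothetical, and the field layout assumed is the standard colon-separated passwd/group format.

```python
# Hypothetical sample data in /etc/passwd and /etc/group format (not from the slides).
SAMPLE_PASSWD = """\
jdoe:x:1001:2001:John Doe:/home/jdoe:/bin/bash
bhall:x:1002:2002:Bob Hall:/home/bhall:/bin/bash
"""

SAMPLE_GROUP = """\
hdp_admin:x:2001:
dev_ops:x:2002:jdoe
hdp_testing:x:2003:jdoe,bhall
"""


def parse_passwd(text):
    """Return {username: primary_gid} from passwd-format text."""
    users = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 4:
            continue  # skip malformed records
        users[fields[0]] = int(fields[3])
    return users


def parse_group(text):
    """Return {gid: (group_name, [member, ...])} from group-format text."""
    groups = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 4:
            continue  # skip malformed records
        members = [m for m in fields[3].split(",") if m]
        groups[int(fields[2])] = (fields[0], members)
    return groups


def group_memberships(passwd_text, group_text):
    """Return {username: sorted group names}, combining primary and supplementary groups."""
    users = parse_passwd(passwd_text)
    groups = parse_group(group_text)
    result = {name: set() for name in users}
    # Supplementary memberships come from the member list of each group.
    for _gid, (group_name, members) in groups.items():
        for member in members:
            if member in result:
                result[member].add(group_name)
    # The primary group comes from the GID field of the passwd record.
    for name, gid in users.items():
        if gid in groups:
            result[name].add(groups[gid][0])
    return {name: sorted(names) for name, names in result.items()}
```

Calling `group_memberships(SAMPLE_PASSWD, SAMPLE_GROUP)` returns a dictionary keyed by username, which is roughly the shape of data a sync process would push to an admin service.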
  • 14. Integrating Ranger with AD/LDAP ⬢ Understanding your deployment – What kind of directory server is it: Active Directory, OpenLDAP, etc.? – Is the communication between the Hadoop cluster and the directory server secure or unsecured? – Do you have at least a read-only LDAP user for binding? – Are there any firewall restrictions on communication between Hadoop and the directory server? – Is Centrify being used as an LDAP proxy? – Does your AD have spaces or special characters in usernames?
  • 15. Requirements for User Management ⬢ Gathering details of the directory server structure – AD/LDAP URL and bind credentials – Any specific OU(s) for Hadoop users and groups? – How many users and groups are in the domain and/or in the OUs? – What filters for user search and/or group search should be configured to limit the users and groups synced to Hadoop? – What attributes are available on the directory server for users and groups, such as uid, sAMAccountName, memberof, objectclass, etc.? – Are authorization policies to be configured at the user level or the group level?
  • 16. Sample Active Directory Server Structure (diagram): DC=ad01,DC=hadoop,DC=com contains OU=Hadoop Users and OU=Hadoop Groups. Users (listed twice in the diagram): sAMAccountName=jdoe (cn=John Doe), sAMAccountName=bhall (cn=Bob Hall), sAMAccountName=asmith (cn=Andy Smith), sAMAccountName=acaroll (cn=Ashley Caroll). Groups: cn=hdp_testing, cn=dev_ops, cn=hdp_admin. Filter shown: (|(memberof=cn=hdp_testing,ou=Hadoop Groups,dc=hortonworks,dc=com)(memberof=cn=hdp_admin,ou=Hadoop Groups,dc=hortonworks,dc=com)(memberof=cn=dev_ops,ou=Hadoop Groups,dc=hortonworks,dc=com))
  • 17. Use Case ⬢ Sync all the users that belong to the groups “hdp_testing”, “hdp_admin”, or “dev_ops”
  • 19. User based Search ⬢ Filter based on the “memberof” attribute of the user
  • 20. (|(memberof=cn=hdp_testing,ou=Hadoop Groups,dc=hortonworks,dc=com)(memberof=cn=hdp_admin,ou=Hadoop Groups,dc=hortonworks,dc=com)(memberof=cn=dev_ops,ou=Hadoop Groups,dc=hortonworks,dc=com))
  • 21. Configuration values shown: username attribute sAMAccountName; user search filter (|(memberof=cn=hdp_testing,ou=Hadoop Groups,dc=hortonworks,dc=com)(memberof=cn=hdp_admin,ou=Hadoop Groups,dc=hortonworks,dc=com)(memberof=cn=dev_ops,ou=Hadoop Groups,dc=hortonworks,dc=com)); user search base OU=Hadoop Users,dc=hortonworks,dc=com
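
A filter of this shape is easy to mistype by hand, so the following sketch assembles it from a list of group names. The helper names are hypothetical and this is not part of Ranger; the escaping follows the special characters listed in RFC 4515, and group names that contain DN-special characters (such as commas) would need additional DN escaping that is not handled here.

```python
def escape_filter_value(value):
    """Escape the characters that RFC 4515 reserves inside an LDAP filter value."""
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\0": r"\00",
    }
    return "".join(replacements.get(ch, ch) for ch in value)


def memberof_filter(group_names, group_base):
    """Build an OR filter matching users whose memberof points at any of the groups.

    group_names: plain group cn values, e.g. ["hdp_testing", "hdp_admin"]
    group_base:  DN suffix under which the groups live
    """
    if not group_names:
        raise ValueError("at least one group name is required")
    clauses = [
        "(memberof=cn={},{})".format(escape_filter_value(name), group_base)
        for name in group_names
    ]
    if len(clauses) == 1:
        return clauses[0]  # a single clause needs no OR wrapper
    return "(|" + "".join(clauses) + ")"


# Values taken from the slides above.
USER_FILTER = memberof_filter(
    ["hdp_testing", "hdp_admin", "dev_ops"],
    "ou=Hadoop Groups,dc=hortonworks,dc=com",
)
```

`USER_FILTER` reproduces the filter string from the slide, and adding a fourth group only requires extending the list.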
  • 22. Group based Search ⬢ Filter based on the group name or “cn” attribute of the group: (|(cn=hdp_*)(cn=dev_*))
  • 23. Configuration values shown: group name attribute cn; group search base OU=Hadoop Groups,dc=hortonworks,dc=com; group member attribute member; group search filter (|(cn=dev_*)(cn=hdp_*))
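
Before applying a wildcard group filter, it can help to preview which group names it would select. The sketch below emulates the `cn=prefix*` substring match locally with regular expressions; it is an approximation for illustration (the directory server does the real matching), it assumes case-insensitive comparison of cn, and the function names and the extra group names in the usage note are hypothetical.

```python
import re


def cn_pattern_to_regex(pattern):
    """Translate an LDAP-style substring pattern such as 'hdp_*' into a regex.

    Only '*' is treated as a wildcard; everything else is matched literally.
    """
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(".*".join(parts), re.IGNORECASE)


def select_groups(group_names, patterns):
    """Return the group names matching at least one pattern, in input order."""
    regexes = [cn_pattern_to_regex(p) for p in patterns]
    return [
        name
        for name in group_names
        if any(regex.fullmatch(name) for regex in regexes)
    ]
```

For example, `select_groups(["hdp_testing", "hdp_admin", "dev_ops", "finance"], ["hdp_*", "dev_*"])` keeps the three Hadoop-related groups and drops `finance`.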
  • 24. LDAP connection check tool ⬢ Command line tool ⬢ Used for – Discovering various LDAP attributes – Validating the LDAP settings in Ranger, Ambari, or HDFS LDAP Group Mapping – Retrieving the total number of users and/or groups ⬢ Available as part of the Ranger installation ⬢ Requires basic information such as the LDAP URL, bind credentials, etc., supplied through – A command line interface – A template properties file to update with the values specific to the setup
  • 25. Tool usage ⬢ usage: run.sh; -a ignore authentication properties; -d <arg> {all|users|groups}; -h show help; -i <arg> input file name; -o <arg> output directory; -r <arg> {all|users|groups} ⬢ All of the above parameters are optional
  • 26. CLI option for the LDAP tool ⬢ The CLI prompts for values when an input file is not specified: Ldap url [ldap://ldap.example.com:389]: Bind DN [cn=admin,ou=users,dc=example,dc=com]: Bind Password: User Search Base [ou=users,dc=example,dc=com]: User Search Filter [cn=user1]: Sample Authentication User [user1]: Sample Authentication Password:
  • 27. Demo
  • 28. Best practices and Strategies ⬢ Use LDAP/AD for application service authentication ⬢ Use Ranger for authorization ⬢ Verify that the truststore certs are updated across the system when SSL is used ⬢ Use the LDAP connection check tool to – discover LDAP configuration attributes – verify the number of users and groups to be synced to Ranger ⬢ Verify that case conversion and special characters in user and group names are handled uniformly across the Hadoop environment – Matching rules must be used in core-site.xml as well as in Ranger
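
The last point, uniform case handling, can be checked offline before a sync is enabled. The sketch below compares how a list of directory names would look after two case-conversion settings and reports the names that would end up different on the two sides. The mode values ("lower", "upper", "none") and the function names are assumptions made for this illustration; they are not read from any Ranger or Hadoop configuration.

```python
def convert_case(name, mode):
    """Apply a case-conversion mode ('lower', 'upper', or 'none') to a name."""
    if mode == "lower":
        return name.lower()
    if mode == "upper":
        return name.upper()
    if mode == "none":
        return name
    raise ValueError("unknown case conversion mode: {!r}".format(mode))


def find_case_mismatches(directory_names, hadoop_mode, ranger_mode):
    """Return (original, hadoop_form, ranger_form) for names that differ between the two."""
    mismatches = []
    for name in directory_names:
        hadoop_form = convert_case(name, hadoop_mode)
        ranger_form = convert_case(name, ranger_mode)
        if hadoop_form != ranger_form:
            mismatches.append((name, hadoop_form, ranger_form))
    return mismatches
```

With names `["JDoe", "bhall"]`, a "lower" setting on one side and "none" on the other flags `JDoe`, because a policy written for `JDoe` would not match a request arriving as `jdoe`; using the same mode on both sides yields an empty list.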
  • 29. user@ranger.apache.org