How to manage authorization rules on Hadoop cluster with Apache Ranger
Krzysztof Adamski
We deliver innovative IT services for the ING Group all over the world.
ING Services Polska
Building standard for new generation digital bank
Keywords: Social, Harmonisation, Digitalisation, Customer Call Centres, Webservices, In the Cloud, Virtual Bank, Software as a Service, Infrastructure as a Service, Cloud Platform as a service, Data Centre, Seamless, Concept of ONE, No geographical boundaries, Exception Handling, APIs, My identity, Straight through processing, Customer experience, Personalisation, Automation, Standardisation, Agile, Self Service, Mobile First, Real Time, Security, 24/7, ‘Outside in and Inside out’, Omnichannel, Zero Touch, Customer journeys, Analytics, Big Data, Digitalised branches
People matters
554 people; average age at ISP: 33.26
Age distribution: 20-30: 197, 31-40: 289, 41-50: 58, 50-70: 10
16.43% (91) / 83.57% (463)
How secure is your cluster?
Ownership and permissions look fine…
How secure is your cluster?
That must have been a sophisticated hack…
3 x A or 4 as you wish
Hadoop authentication methods
Simple
Hadoop authentication methods
Kerberos
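Which of the two methods a cluster uses is selected by the `hadoop.security.authentication` property in `core-site.xml` (`simple` is the default, `kerberos` enables Kerberos). With `simple`, the user name is whatever the client claims, so none of the authorization rules discussed later can be trusted. A minimal sketch, assuming you can read the cluster's `core-site.xml`; `read_hadoop_conf` and `check_auth_mode` are hypothetical helpers written only for illustration.

```python
import xml.etree.ElementTree as ET


def read_hadoop_conf(xml_text: str) -> dict:
    """Parse a Hadoop *-site.xml document into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    conf = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            conf[name.strip()] = (value or "").strip()
    return conf


def check_auth_mode(conf: dict) -> str:
    """Hypothetical helper: report the authentication mode and warn on anything but Kerberos."""
    # Hadoop uses 'simple' when the property is absent.
    mode = conf.get("hadoop.security.authentication", "simple").lower()
    if mode != "kerberos":
        return f"WARNING: authentication is '{mode}'; clients can claim any user name"
    return "OK: Kerberos authentication is enabled"


SAMPLE = """<configuration>
  <property><name>hadoop.security.authentication</name><value>kerberos</value></property>
  <property><name>hadoop.security.authorization</name><value>true</value></property>
</configuration>"""

print(check_auth_mode(read_hadoop_conf(SAMPLE)))
```

The same parser works for `hdfs-site.xml` or any other Hadoop `*-site.xml` file, since they share the `<configuration><property>` layout.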
Hortonworks Ring of Defense Architecture (hortonworks.com)
Diagram components: Client, Apache Knox, Active Directory, KDC, HiveServer 2, HDFS, Ranger
• Client sends the original request with user id and password
• Knox gets a service ticket (ST) for Hive
• Knox runs as proxy user using the Hive ST
• Use the Hive ST, submit the query
• Hive gets a Namenode (NN) service ticket
• Hive creates the MapReduce job using the NN ST
• Client gets the query result
What is IPA?
redhat.com
AD Account mapping
redhat.com
SSSD integration
redhat.com
IPA for central UAM
• This works great for OS
• Can this be used by Hadoop?
• Can this be used by Ranger?
Installation through Ambari
hortonworks.com
Installation through Ambari
hortonworks.com
HDP 2.3.4
Watch the ranger.usersync.source.impl.class property
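The `ranger.usersync.source.impl.class` property decides where usersync reads users and groups from, and with an LDAP/AD source an unscoped search is how a whole directory ends up in Ranger. A minimal sketch, assuming the usersync configuration has already been parsed into a dict; the class-name mapping reflects our understanding of Ranger 0.5-era releases, and `describe_usersync` is a hypothetical helper.

```python
# Usersync source classes as we understand them for Ranger 0.5-era releases;
# verify against your own ranger-ugsync-site.xml before relying on this mapping.
SYNC_SOURCES = {
    "org.apache.ranger.unixusersync.process.UnixUserGroupBuilder": "Unix",
    "org.apache.ranger.ldapusersync.process.LdapUserGroupBuilder": "LDAP/AD",
    "org.apache.ranger.unixusersync.process.FileSourceUserGroupBuilder": "File",
}


def describe_usersync(conf: dict) -> list:
    """Hypothetical check: report the usersync source and flag an unscoped LDAP/AD sync."""
    cls = conf.get("ranger.usersync.source.impl.class", "")
    source = SYNC_SOURCES.get(cls, f"unknown ({cls or 'unset'})")
    findings = [f"usersync source: {source}"]
    if source == "LDAP/AD" and not conf.get("ranger.usersync.ldap.user.searchfilter"):
        # No user filter: every user entry under the search base is a sync candidate.
        findings.append("WARNING: no user search filter set for LDAP/AD sync")
    return findings


conf = {"ranger.usersync.source.impl.class":
        "org.apache.ranger.ldapusersync.process.LdapUserGroupBuilder"}
for line in describe_usersync(conf):
    print(line)
```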
Enable Ranger for HDFS
hortonworks.com
hortonworks.com
hortonworks.com
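Once the HDFS plugin is enabled, access is, as we understand the plugin, decided in two stages: Ranger policies are evaluated first, and if none grants access the NameNode falls back to the native HDFS permissions and ACLs (controlled by `xasecure.add-hadoop-authorization` in `ranger-hdfs-security.xml`). That fallback is why "ownership and permissions look fine" still matters: a permissive mode bit grants access that no Ranger policy mentions. A simplified sketch; real policies also match groups, wildcards and non-recursive paths, and the names `HdfsPolicy` and `is_access_allowed` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class HdfsPolicy:
    """Simplified Ranger HDFS policy: one recursive path plus per-user access types."""
    path: str
    allowed: Dict[str, Set[str]] = field(default_factory=dict)  # user -> {"read", "write", "execute"}


def path_matches(policy_path: str, path: str) -> bool:
    # Recursive match on whole path components, so /data/finance does not match /data/finance2.
    base = policy_path.rstrip("/")
    return path == base or path.startswith(base + "/")


def is_access_allowed(policies: List[HdfsPolicy], user: str, path: str, access: str,
                      hdfs_native_check: Callable[[str, str, str], bool],
                      add_hadoop_authorization: bool = True) -> bool:
    """Ranger policies first; if none grants access, optionally fall back to HDFS permissions/ACLs."""
    for policy in policies:
        if path_matches(policy.path, path) and access in policy.allowed.get(user, set()):
            return True
    if add_hadoop_authorization:
        return hdfs_native_check(user, path, access)
    return False


policies = [HdfsPolicy("/data/finance", {"alice": {"read"}})]
# Stand-in for a world-readable directory (e.g. mode 755): anyone may read.
world_readable = lambda user, path, access: access == "read"

print(is_access_allowed(policies, "bob", "/data/finance/q1.csv", "read", world_readable))
print(is_access_allowed(policies, "bob", "/data/finance/q1.csv", "read", world_readable,
                        add_hadoop_authorization=False))
```

The first call prints `True` even though no policy mentions `bob`; the second prints `False`. Tightening the HDFS permissions on protected paths (or disabling the fallback) makes Ranger the effective gate.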
Ranger audit
• It is recommended that you store audits in Solr and HDFS, and disable Audit to DB.
• Otherwise you can expect performance issues
• Audit is stored in a single table
• No partitions
• No data retention
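If DB audit has to stay enabled for a while, the missing retention can be approximated with a scheduled purge. A minimal sketch on an in-memory SQLite stand-in; the table and column names (`xa_access_audit`, `event_time`, `request_user`) are assumptions modelled on Ranger's audit schema, so check them against your database, and on a large MySQL table delete in batches rather than in one statement.

```python
import sqlite3
from datetime import datetime, timedelta


def purge_old_audit(conn: sqlite3.Connection, retention_days: int, now: datetime) -> int:
    """Delete audit rows older than the retention window and return how many were removed."""
    cutoff = (now - timedelta(days=retention_days)).strftime("%Y-%m-%d %H:%M:%S")
    # ISO-formatted timestamps compare correctly as strings.
    cur = conn.execute("DELETE FROM xa_access_audit WHERE event_time < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xa_access_audit (id INTEGER PRIMARY KEY, event_time TEXT, request_user TEXT)")
now = datetime(2016, 3, 1)
ages_in_days = (1, 10, 45, 120)
conn.executemany(
    "INSERT INTO xa_access_audit (event_time, request_user) VALUES (?, 'alice')",
    [((now - timedelta(days=d)).strftime("%Y-%m-%d %H:%M:%S"),) for d in ages_in_days],
)
print(purge_old_audit(conn, retention_days=30, now=now))  # 2 rows older than 30 days
```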
IPA as a central UAM
• This works great for OS
• Can this be used by Hadoop? Works great for PA in IPA
• Can this be used by Ranger? Not yet. You still need to bind to LDAP.
Ranger KMS
One big advantage of encryption in HDFS is that even privileged users, such as the “hdfs” superuser, can be blocked from viewing encrypted data.
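The reason the superuser can be locked out is envelope encryption: each file's data encryption key (DEK) is stored by the NameNode only in encrypted form (EDEK), and only the KMS can decrypt it, so the KMS, not HDFS, decides who can read. Hadoop KMS exposes a `hadoop.kms.blacklist.DECRYPT_EEK` setting for this; how Ranger KMS surfaces it may differ by version. The toy model below only illustrates that flow; `ToyKMS` is hypothetical and its XOR "cipher" is deliberately not real cryptography.

```python
import hashlib
import os


class ToyKMS:
    """Toy model of HDFS transparent encryption with a KMS-side decrypt blacklist.

    NOT real cryptography: XOR with a key-derived pad stands in for the
    encryption-zone key wrapping each file's data encryption key (DEK).
    """

    def __init__(self, blacklist_decrypt=("hdfs",)):
        self._zone_keys = {}
        self.blacklist_decrypt = set(blacklist_decrypt)

    def create_zone_key(self, name: str) -> None:
        self._zone_keys[name] = os.urandom(32)

    def _pad(self, key_name: str, iv: bytes) -> bytes:
        # 32-byte pad derived from the zone key and a per-file IV.
        return hashlib.sha256(self._zone_keys[key_name] + iv).digest()

    def generate_encrypted_dek(self, key_name: str):
        # On file create: the NameNode keeps only (key name, IV, EDEK), never the DEK itself.
        dek, iv = os.urandom(32), os.urandom(16)
        edek = bytes(a ^ b for a, b in zip(dek, self._pad(key_name, iv)))
        return key_name, iv, edek

    def decrypt_edek(self, user: str, key_name: str, iv: bytes, edek: bytes) -> bytes:
        # On read: the client asks the KMS for the DEK, and the KMS checks the caller first.
        if user in self.blacklist_decrypt:
            raise PermissionError(f"{user} is not allowed to decrypt keys")
        return bytes(a ^ b for a, b in zip(edek, self._pad(key_name, iv)))


kms = ToyKMS()
kms.create_zone_key("finance_key")
key_name, iv, edek = kms.generate_encrypted_dek("finance_key")
dek = kms.decrypt_edek("alice", key_name, iv, edek)  # allowed
```

Calling `kms.decrypt_edek("hdfs", ...)` raises `PermissionError`: the superuser can still read the raw blocks, but without the DEK they are only ciphertext.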
Caveats
• Ranger (the same goes for Sentry) feels like bolted-on security
• User synchronization can be very slow with many users due to architectural issues
• Doesn’t manage HDFS ACLs, and requires the hive user to have access to the data… defeating end-to-end security
• Vulnerability scans just kill Ranger ;)
Caveats
mysql> select count(*) from x_user;
+----------+
| count(*) |
+----------+
| 99 |
+----------+
1 row in set (0.00 sec)
mysql> select count(*) from x_group;
+----------+
| count(*) |
+----------+
| 45 |
+----------+
1 row in set (0.00 sec)
mysql> select count(*) from x_group_users;
+----------+
| count(*) |
+----------+
| 645697 |
+----------+
1 row in set (0.13 sec)
mysql> select sum(user_id) from (select count(distinct user_id) user_id
from x_group_users group by p_group_id) temp;
+--------------+
| sum(user_id) |
+--------------+
| 603 |
+--------------+
1 row in set (1.21 sec)
mysql> delete from x_group_users where id not in (
    select minid from (
        select min(id) as minid from x_group_users
        group by p_group_id, user_id
    ) as temp
);
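The counts above tell the story: 99 users and 45 groups, yet 645,697 rows in `x_group_users`, of which only 603 are distinct (group, user) pairs; the rest are duplicates. The delete keeps the lowest `id` for each pair. The extra derived table (`as temp`) is needed because MySQL refuses to select from the table being deleted from in a subquery (error 1093). The sketch below reproduces the clean-up on an in-memory SQLite copy, where the plain subquery is allowed; the repeated-insert loop is only our assumed stand-in for whatever produced the duplicates. Take a backup before running the real statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x_group_users (id INTEGER PRIMARY KEY, p_group_id INTEGER, user_id INTEGER)")

# Simulate five sync runs that each re-insert the same three memberships.
memberships = [(1, 10), (1, 11), (2, 10)]
for _run in range(5):
    conn.executemany("INSERT INTO x_group_users (p_group_id, user_id) VALUES (?, ?)", memberships)


def count(sql: str) -> int:
    return conn.execute(sql).fetchone()[0]


before = count("SELECT COUNT(*) FROM x_group_users")
distinct = count("SELECT COUNT(*) FROM (SELECT DISTINCT p_group_id, user_id FROM x_group_users)")

# Same idea as the MySQL statement: keep the lowest id per (group, user) pair.
conn.execute("""
    DELETE FROM x_group_users WHERE id NOT IN (
        SELECT MIN(id) FROM x_group_users GROUP BY p_group_id, user_id
    )
""")
conn.commit()
after = count("SELECT COUNT(*) FROM x_group_users")
print(before, distinct, after)  # 15 3 3
```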
Make it better
• https://issues.apache.org/jira/browse/RANGER-827
usersync SSSD integration (sync explicitly specified groups)
• https://issues.apache.org/jira/browse/HADOOP-12751
allow users with domain suffix (avoid naming collision)
• https://issues.apache.org/jira/browse/HIVE-12981
the same for Hive
• https://issues.apache.org/jira/browse/RANGER-842
PAM integrated authentication for Ranger
Ambari integration with IPA
• https://github.com/HariSekhon/tools/blob/master/ambari_freeipa_kerberos_setup.pl
Other upcoming features (Ranger 0.6)
• Tag based policies
• Geolocation based policies
• Deny and exclude policies
• Hive Metastore plugin
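Deny and exclude policies change how a single policy is evaluated. Roughly as we understand the 0.6 design, a policy carries allow items, allow exceptions, deny items and deny exceptions, and a matching deny (not excepted) wins over any allow. A simplified sketch; it matches users only (real policies also match groups and access types), and `PolicyItems` and `evaluate` are hypothetical names, so verify the exact semantics against the release documentation.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class PolicyItems:
    # Each set holds user names; real Ranger policy items also carry groups and access types.
    allow: Set[str] = field(default_factory=set)
    allow_exceptions: Set[str] = field(default_factory=set)
    deny: Set[str] = field(default_factory=set)
    deny_exceptions: Set[str] = field(default_factory=set)


def evaluate(items: PolicyItems, user: str) -> str:
    """Return 'DENY', 'ALLOW' or 'UNDETERMINED' for a user on a matching resource."""
    # Deny (minus its exceptions) is checked first, so it overrides any allow.
    if user in items.deny and user not in items.deny_exceptions:
        return "DENY"
    if user in items.allow and user not in items.allow_exceptions:
        return "ALLOW"
    # Nothing decided: the caller falls back to other policies or native permissions.
    return "UNDETERMINED"


items = PolicyItems(
    allow={"alice", "bob", "carol"},
    allow_exceptions={"carol"},
    deny={"bob", "dave"},
    deny_exceptions={"dave"},
)
for user in ("alice", "bob", "carol", "dave"):
    print(user, evaluate(items, user))
```

Here `alice` is allowed, `bob` is denied despite being in the allow list, and `carol` (allow exception) and `dave` (deny exception, never allowed) stay undetermined.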
Some take away tips
• Install updates on a regular basis
• Isolate your cluster from the rest of the network
• Kerberize your cluster
• Secure the user interfaces
• Enable HDFS ACLs (dfs.namenode.acls.enabled)
• Set a restrictive HDFS umask (fs.permissions.umask-mode)
• Watch for superusers (hadoop.proxyuser settings)
• Change the default OS umask (watch out for upgrades and configuration file permissions)
• Make sure the Hive warehouse HDFS path is protected
• Implement Ranger
• Just don’t sync your whole AD with it ;)
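Two of the tips above are easy to check mechanically. HDFS creates files from a base mode of 666 and directories from 777, then clears the bits set in `fs.permissions.umask-mode` (default `022`), and any `hadoop.proxyuser.<user>.*` entry set to `*` lets that service impersonate far more than it needs. A minimal sketch, assuming the configuration has already been parsed into a dict; `effective_mode` and `risky_proxyusers` are hypothetical helpers.

```python
def effective_mode(umask_octal: str, is_dir: bool) -> str:
    """Permissions HDFS applies to a new file or directory for a given fs.permissions.umask-mode."""
    base = 0o777 if is_dir else 0o666
    return format(base & ~int(umask_octal, 8), "03o")


def risky_proxyusers(conf: dict) -> list:
    """Flag hadoop.proxyuser.* entries (hosts, groups, ...) that are set to the '*' wildcard."""
    return [
        name
        for name, value in sorted(conf.items())
        if name.startswith("hadoop.proxyuser.") and value.strip() == "*"
    ]


for umask in ("022", "077"):
    print(umask, "dir:", effective_mode(umask, True), "file:", effective_mode(umask, False))

core_site = {
    "hadoop.proxyuser.hive.hosts": "*",
    "hadoop.proxyuser.hive.groups": "etl",
    "hadoop.proxyuser.knox.groups": "*",
}
print(risky_proxyusers(core_site))
```

With the default `022`, new directories get 755 and files 644, readable by every cluster user; `077` gives 700 and 600.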
krzysztof.adamski@ingservicespolska.pl
@adamskikrzysiek
http://pl.linkedin.com/in/adamskikrzysztof
And yes, we are hiring!

BigDataTech 2016 How to manage authorization rules on Hadoop cluster with Apache Ranger
