By: Shrey Mehrotra
 A form of protection where a separation is created between the assets and the threat.
 Security in IT realm:
Application security
Computing security
Data security
Information security
Network security
Data : We have critical data in HDFS.
Resources : Each node of Hadoop cluster has resources required for executing
applications.
Applications : Web Applications and REST APIs to access cluster details.
Services : HDFS, YARN and other services running on the cluster nodes.
Network Security : Services and Application communications over network.
 Configuration
 Data confidentiality
 Service Level Authorization
 Encryption
 Authentication for Hadoop HTTP web-consoles
 Delegation Tokens
 Kerberos
core-site.xml
Parameter | Value | Notes
hadoop.security.authentication | kerberos | simple : No authentication (default). kerberos : Enable authentication by Kerberos.
hadoop.security.authorization | true | Enable RPC service-level authorization.
hadoop.rpc.protection | authentication | authentication : authentication only (default). integrity : integrity check in addition to authentication. privacy : data encryption in addition to integrity.
hadoop.proxyuser.superuser.hosts | | Comma-separated hosts from which the superuser is allowed to impersonate other users. * means wildcard.
hadoop.proxyuser.superuser.groups | | Comma-separated groups to which the users impersonated by the superuser belong. * means wildcard.
hdfs-site.xml
Parameter | Value | Notes
dfs.block.access.token.enable | true | Enable HDFS block access tokens for secure operations.
dfs.https.enable | true | This value is deprecated. Use dfs.http.policy.
dfs.namenode.https-address | nn_host_fqdn:50470 |
dfs.https.port | 50470 |
dfs.namenode.keytab.file | /etc/security/keytab/nn.service.keytab | Kerberos keytab file for the NameNode.
dfs.namenode.kerberos.principal | nn/_HOST@REALM.TLD | Kerberos principal name for the NameNode.
dfs.namenode.kerberos.internal.spnego.principal | HTTP/_HOST@REALM.TLD | HTTP Kerberos principal name for the NameNode.
 Superuser can submit jobs or access hdfs on behalf of another user in a secured
way.
 Superuser must have kerberos credentials to be able to impersonate another user.
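Ex. A superuser "super" wants to submit a job or access HDFS as "alice".
// conf (a JobConf) and someFilePath (a Path) are assumed to be defined elsewhere.
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.security.UserGroupInformation;

// Create a proxy UGI for alice. The login user is 'super'.
UserGroupInformation ugi =
    UserGroupInformation.createProxyUser("alice", UserGroupInformation.getLoginUser());
ugi.doAs(new PrivilegedExceptionAction<Void>() {
  public Void run() throws Exception {
    // Submit a job
    JobClient jc = new JobClient(conf);
    jc.submitJob(conf);
    // OR access hdfs
    FileSystem fs = FileSystem.get(conf);
    fs.mkdirs(someFilePath);
    return null;
  }
});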
 The superuser must be configured on the NameNode and ResourceManager to be
allowed to impersonate another user. The following configuration is required.
<property>
<name>hadoop.proxyuser.super.groups</name>
<value>group1,group2</value>
<description>Allow the superuser super to impersonate any members of the group group1 and group2</description>
</property>
<property>
<name>hadoop.proxyuser.super.hosts</name>
<value>host1,host2</value>
<description>The superuser can connect only from host1 and host2 to impersonate a user</description>
</property>
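The hosts and groups rules above can be sketched as a small check. This is a minimal illustration, not Hadoop's implementation: the class ProxyUserRule and its method names are hypothetical, and it only handles plain host names and the * wildcard.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper (not a Hadoop class): models the checks described by
// hadoop.proxyuser.<superuser>.hosts and hadoop.proxyuser.<superuser>.groups.
class ProxyUserRule {
    private final Set<String> hosts;
    private final Set<String> groups;

    ProxyUserRule(String hostsValue, String groupsValue) {
        this.hosts = split(hostsValue);
        this.groups = split(groupsValue);
    }

    // Turn a comma-separated configuration value into a set of names.
    private static Set<String> split(String value) {
        Set<String> out = new HashSet<>();
        for (String item : value.split(",")) {
            String trimmed = item.trim();
            if (!trimmed.isEmpty()) {
                out.add(trimmed);
            }
        }
        return out;
    }

    // True when the request comes from an allowed host and the impersonated
    // user belongs to at least one allowed group. "*" matches anything.
    boolean mayImpersonate(String remoteHost, Set<String> groupsOfTargetUser) {
        boolean hostOk = hosts.contains("*") || hosts.contains(remoteHost);
        boolean groupOk = groups.contains("*");
        for (String g : groupsOfTargetUser) {
            groupOk = groupOk || groups.contains(g);
        }
        return hostOk && groupOk;
    }

    public static void main(String[] args) {
        ProxyUserRule rule = new ProxyUserRule("host1,host2", "group1,group2");
        Set<String> aliceGroups = new HashSet<>(Arrays.asList("group1"));
        System.out.println(rule.mayImpersonate("host1", aliceGroups)); // true
        System.out.println(rule.mayImpersonate("host3", aliceGroups)); // false
    }
}
```

Both conditions must hold, so a superuser connecting from an unlisted host is rejected even when the impersonated user is in an allowed group.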
 Service-level authorization is the initial authorization mechanism. It ensures that clients connecting to a particular Hadoop service have the
necessary, pre-configured permissions and are authorized to access that service.
 For example, a MapReduce cluster can use this mechanism to allow a configured list of users/groups to
submit jobs.
 By default, service-level authorization is disabled for Hadoop.
 To enable it, set the following configuration property in core-site.xml:
<property>
<name>hadoop.security.authorization</name>
<value>true</value>
</property>
 hadoop-policy.xml defines an access control list for each Hadoop service.
 Every ACL has a simple format: a comma-separated list of users and a comma-separated list of groups, with the two lists separated by a space.
Example: user1,user2 group1,group2.
 Blocked Access Control Lists
The blocked list for a service uses the same property name with a ".blocked" suffix: security.client.protocol.acl -> security.client.protocol.acl.blocked
 Refreshing Service Level Authorization Configuration
hadoop dfsadmin -refreshServiceAcl
 Allow only users alice, bob and users in the mapreduce group to submit jobs to the MapReduce cluster:
<property>
<name>security.job.submission.protocol.acl</name>
<value>alice,bob mapreduce</value>
</property>
 Allow only DataNodes running as the users who belong to the group datanodes to communicate with the NameNode:
<property>
<name>security.datanode.protocol.acl</name>
<value> datanodes</value>
</property>
 Allow any user to talk to the HDFS cluster as a DFSClient:
<property>
<name>security.client.protocol.acl</name>
<value>*</value>
</property>
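To make the ACL value format concrete, here is a minimal sketch of how a value such as "alice,bob mapreduce" can be read. The class ServiceAclSketch is hypothetical and is not Hadoop's own ACL class. It assumes the rules stated in this section (users, then a space, then groups; * allows everyone) plus one more rule from the Hadoop documentation: a value that starts with a space lists groups only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a Hadoop class) that reads a service-level ACL value.
class ServiceAclSketch {
    final boolean allowAll;
    final List<String> users = new ArrayList<>();
    final List<String> groups = new ArrayList<>();

    ServiceAclSketch(String value) {
        allowAll = value.trim().equals("*");
        if (allowAll) {
            return;
        }
        // Split on the first space only: users come before it, groups after it.
        String[] parts = value.split(" ", 2);
        addNames(parts[0], users);
        if (parts.length > 1) {
            addNames(parts[1], groups);
        }
    }

    // Add every non-empty name from a comma-separated list to the target.
    private static void addNames(String csv, List<String> target) {
        for (String name : csv.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                target.add(trimmed);
            }
        }
    }

    // A caller passes if the ACL is "*", names the user, or names one of the user's groups.
    boolean isAllowed(String user, List<String> userGroups) {
        if (allowAll || users.contains(user)) {
            return true;
        }
        for (String g : userGroups) {
            if (groups.contains(g)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ServiceAclSketch acl = new ServiceAclSketch("alice,bob mapreduce");
        System.out.println(acl.users + " " + acl.groups); // [alice, bob] [mapreduce]
    }
}
```

Under these rules a bare name with no leading space is read as a user, so a groups-only ACL needs the leading space.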
 Data Encryption on RPC
• Covers the data transferred between Hadoop services and clients.
• Setting hadoop.rpc.protection to "privacy" in core-site.xml activates data encryption.
 Data Encryption on Block data transfer
• Set dfs.encrypt.data.transfer to "true" in hdfs-site.xml.
• Set dfs.encrypt.data.transfer.algorithm to either "3des" or "rc4" to choose the specific encryption
algorithm.
• By default, 3DES is used.
 Data Encryption on HTTP
• Data transfer between the web consoles and clients is protected by using SSL (HTTPS).
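The three hadoop.rpc.protection levels correspond to the standard SASL quality-of-protection values (auth, auth-int, auth-conf). The enum below is a hypothetical sketch of that mapping for illustration only. It is not taken from the slides, and the type name RpcProtection is an assumption.

```java
import java.util.Locale;

// Hypothetical enum: maps hadoop.rpc.protection values to SASL QOP strings.
enum RpcProtection {
    AUTHENTICATION("auth"),   // authentication only
    INTEGRITY("auth-int"),    // authentication plus integrity check
    PRIVACY("auth-conf");     // integrity plus encryption of the payload

    final String saslQop;

    RpcProtection(String saslQop) {
        this.saslQop = saslQop;
    }

    // Parse a configuration value such as "privacy" (case-insensitive).
    static RpcProtection fromConfig(String value) {
        return valueOf(value.trim().toUpperCase(Locale.ROOT));
    }

    // Only the "privacy" level encrypts the data sent over RPC.
    boolean encryptsPayload() {
        return this == PRIVACY;
    }

    public static void main(String[] args) {
        RpcProtection p = fromConfig("privacy");
        System.out.println(p + " -> " + p.saslQop); // PRIVACY -> auth-conf
    }
}
```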
 HDFS implements a permissions model for files and directories that shares much of the POSIX model.
 User Identity
 simple : In this mode of operation, the identity of a client process is determined by the host operating system.
 kerberos : In Kerberized operation, the identity of a client process is determined by its Kerberos credentials.
 Group Mapping
 org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback
 org.apache.hadoop.security.ShellBasedUnixGroupsMapping
 HDFS stores the user and group of a file or directory as strings; there is no conversion from user and group identity
numbers as is conventional in Unix.
 Shell Operations
• hadoop fs -chmod [-R] mode file
• hadoop fs -chgrp [-R] group file
• hadoop fs -chown [-R] [owner][:[group]] file
 The Super-User
 The super-user is the user with the same identity as the NameNode process itself.
 Permission checks never fail for the super-user.
 There is no persistent notion of who was the super-user.
 When the NameNode is started, the process identity determines who is the super-user for now.
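The owner/group/other check and the super-user bypass described above can be sketched as follows. This is a simplified, hypothetical model (the class PermissionSketch is not HDFS code). It uses string names for user and group, as HDFS does, and ignores ACL entries.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical model of a POSIX-style permission check with a super-user bypass.
class PermissionSketch {
    static final int READ = 4;
    static final int WRITE = 2;
    static final int EXECUTE = 1;

    // mode is an octal value such as 0640: owner bits, group bits, other bits.
    static boolean isPermitted(String user, Set<String> userGroups,
                               String owner, String group, int mode,
                               int requested, String superUser) {
        if (user.equals(superUser)) {
            return true; // permission checks never fail for the super-user
        }
        int bits;
        if (user.equals(owner)) {
            bits = (mode >> 6) & 7;      // owner class
        } else if (userGroups.contains(group)) {
            bits = (mode >> 3) & 7;      // group class
        } else {
            bits = mode & 7;             // other class
        }
        return (bits & requested) == requested;
    }

    public static void main(String[] args) {
        Set<String> sales = Collections.singleton("sales");
        // File owned by alice:sales with mode 640; "hdfs" stands in for the super-user.
        System.out.println(isPermitted("bob", sales, "alice", "sales", 0640, READ, "hdfs"));  // true
        System.out.println(isPermitted("bob", sales, "alice", "sales", 0640, WRITE, "hdfs")); // false
    }
}
```

Exactly one class of bits is tested per request: an owner who lacks a permission is denied even if the group or other bits would grant it.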
WebHDFS
 Uses Kerberos (SPNEGO) and Hadoop delegation tokens for authentication.
An ACL provides a way to set different permissions for specific named users or named groups, not only the file's owner and
the file's group.
 By default, support for ACLs is disabled.
 Enable ACLs by adding the following configuration property to hdfs-site.xml and restarting the NameNode:
<property>
<name>dfs.namenode.acls.enabled</name>
<value>true</value>
</property>
ACLs Shell Commands
 hdfs dfs -getfacl [-R] <path>
 hdfs dfs -setfacl [-R] [-b|-k -m|-x <acl_spec> <path>]|[--set <acl_spec> <path>]
-R : Recursive
-m : Modify ACL.
-b : Remove all but the base ACL entries. The entries for user, group and others are retained for compatibility
with permission bits.
-k : Remove the default ACL.
-x : Remove specified ACL entries.
<acl_spec> : Comma separated list of ACL entries.
--set : Fully replace the ACL, discarding all existing entries.
 hdfs dfs -ls <args>
ls will append a '+' character to the permissions string of any file or directory that has an ACL.
Source : Apache
 Tokens are generated for applications and containers.
 The password for each token is generated with an HMAC algorithm (HMAC_ALGORITHM).
 YARN interfaces for secret manager tokens
BaseNMTokenSecretManager
AMRMTokenSecretManager
BaseClientToAMTokenSecretManager
BaseContainerTokenSecretManager
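The HMAC-based password generation mentioned above can be sketched with the JDK's javax.crypto API. This is a minimal illustration, not YARN's secret manager code: the class TokenPasswordSketch, the key and the identifier are made up, and HmacSHA1 is assumed as the algorithm.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch: the token password is an HMAC of the token identifier
// computed with a master key known only to the issuing service.
class TokenPasswordSketch {
    static final String HMAC_ALGORITHM = "HmacSHA1"; // assumed algorithm

    static byte[] createPassword(byte[] tokenIdentifier, byte[] masterKey) {
        try {
            Mac mac = Mac.getInstance(HMAC_ALGORITHM);
            mac.init(new SecretKeySpec(masterKey, HMAC_ALGORITHM));
            return mac.doFinal(tokenIdentifier);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC computation failed", e);
        }
    }

    // The service recomputes the HMAC and compares it in constant time.
    static boolean verify(byte[] tokenIdentifier, byte[] password, byte[] masterKey) {
        return MessageDigest.isEqual(createPassword(tokenIdentifier, masterKey), password);
    }

    public static void main(String[] args) {
        byte[] key = "demo-master-key".getBytes(StandardCharsets.UTF_8);
        byte[] id = "container_0001".getBytes(StandardCharsets.UTF_8);
        byte[] password = createPassword(id, key);
        System.out.println(password.length);           // 20 bytes for HmacSHA1
        System.out.println(verify(id, password, key)); // true
    }
}
```

Because only the holder of the master key can produce a matching password, a client cannot forge a token for a different identifier.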
Source : Hortonworks
 Enable ACL check in YARN
<property>
<name>yarn.acl.enable</name>
<value>true</value>
</property>
Queues ACL
 QueueACLsManager checks the access of each user against the ACL defined on the queue.
 The following would restrict submission to the "support" queue to the user "shrey" and the
members of the "sales" group, and queue administration to the "sales" group:
<property>
<name>yarn.scheduler.capacity.root.<queue-path>.acl_submit_applications</name>
<value>shrey sales</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.<queue-path>.acl_administer_queue</name>
<value> sales</value>
</property>
[Figure: a client sending a plain-text or encrypted password to services]
 Kerberos is a network authentication protocol.
 It is used to authenticate the identity of the services running on different
nodes (machines) communicating over a non-secure network.
 It uses "tickets" as the basic unit for authentication.
 Authentication Server
It is a service used to authenticate or verify clients. It usually checks for the username of the requesting client
in the system and, on success, issues a Ticket Granting Ticket (TGT).
 Ticket Granting Server
It issues service tickets based on the target service name, the client's
TGT and an authenticator.
 Principals
A principal is a unique identity to which Kerberos can assign tickets issued by the Ticket
Granting Server.
To enable Kerberos authentication in Hadoop, we need to configure the following properties
in core-site.xml:
<property>
<name>hadoop.security.authentication</name>
<value>kerberos</value>
<!-- Giving value as "simple" disables security.-->
</property>
<property>
<name>hadoop.security.authorization</name>
<value>true</value>
</property>
 A keytab is a file containing Kerberos principals and encrypted keys. These files are used to
log in directly to Kerberos without being prompted for the password.
 Enabling Kerberos for HDFS services:
A. Generating the keytab
Create the hdfs keytab file that will contain the hdfs principal and HTTP principal. This keytab file is used for the
NameNode and DataNode.
B. Associate the keytab with the HDFS principal
kadmin: xst -norandkey -k hdfs.keytab hdfs/fully.qualified.domain.name HTTP/fully.qualified.domain.name
sudo mv hdfs.keytab /etc/hadoop/conf/
 Add the following properties to the hdfs-site.xml file
<!-- Namenode security configs -->
<property>
<name>dfs.namenode.keytab.file</name>
<value>/etc/hadoop/conf/hdfs.keytab</value>
<!-- path to the HDFS keytab -->
</property>
<property>
<name>dfs.namenode.kerberos.principal</name>
<value>hdfs/_HOST@YOUR-REALM.COM</value>
</property>
<!-- Datanode security configs -->
<property>
<name>dfs.datanode.keytab.file</name>
<value>/etc/hadoop/conf/hdfs.keytab</value>
<!-- path to the HDFS keytab -->
</property>
<property>
<name>dfs.datanode.kerberos.principal</name>
<value>hdfs/_HOST@YOUR-REALM.COM</value>
</property>

Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 

Recently uploaded (20)

Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 

Hadoop security

  • 2. A form of protection where a separation is created between the assets and the threat.
       Security in the IT realm:
       - Application security
       - Computing security
       - Data security
       - Information security
       - Network security
  • 3. Data: critical data is stored in HDFS.
       Resources: each node of the Hadoop cluster has resources required for executing applications.
       Applications: web applications and REST APIs used to access cluster details.
       Services: HDFS, YARN and other services running on the cluster nodes.
       Network security: service and application communication over the network.
  • 4.
       - Configuration
       - Data confidentiality
       - Service Level Authorization
       - Encryption
       - Authentication for Hadoop HTTP web-consoles
       - Delegation Tokens
       - Kerberos
  • 5. core-site.xml
       Parameter | Value | Notes
       hadoop.security.authentication | kerberos | simple: no authentication (default). kerberos: enable authentication by Kerberos.
       hadoop.security.authorization | true | Enable RPC service-level authorization.
       hadoop.rpc.protection | authentication | authentication: authentication only (default). integrity: integrity check in addition to authentication. privacy: data encryption in addition to integrity.
       hadoop.proxyuser.superuser.hosts | | Comma-separated hosts from which the superuser is allowed to impersonate other users. * is a wildcard.
       hadoop.proxyuser.superuser.groups | | Comma-separated groups to which the users impersonated by the superuser must belong. * is a wildcard.

       hdfs-site.xml
       Parameter | Value | Notes
       dfs.block.access.token.enable | true | Enable HDFS block access tokens for secure operations.
       dfs.https.enable | true | This value is deprecated. Use dfs.http.policy.
       dfs.namenode.https-address | nn_host_fqdn:50470 |
       dfs.https.port | 50470 |
       dfs.namenode.keytab.file | /etc/security/keytab/nn.service.keytab | Kerberos keytab file for the NameNode.
       dfs.namenode.kerberos.principal | nn/_HOST@REALM.TLD | Kerberos principal name for the NameNode.
       dfs.namenode.kerberos.internal.spnego.principal | HTTP/_HOST@REALM.TLD | HTTP Kerberos principal name for the NameNode.
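  • 6.
       - A superuser can submit jobs or access HDFS on behalf of another user in a secured way.
       - The superuser must have Kerberos credentials to be able to impersonate another user.
       Example: a superuser "bob" wants to submit a job or access the HDFS cluster as "alice".

       import java.security.PrivilegedExceptionAction;
       import org.apache.hadoop.fs.FileSystem;
       import org.apache.hadoop.mapred.JobClient;
       import org.apache.hadoop.security.UserGroupInformation;

       // Create a proxy UGI for alice. The login user is the superuser (bob).
       // conf (a JobConf) and someFilePath (a Path) are assumed to be defined by the caller.
       UserGroupInformation ugi =
           UserGroupInformation.createProxyUser("alice", UserGroupInformation.getLoginUser());
       ugi.doAs(new PrivilegedExceptionAction<Void>() {
         public Void run() throws Exception {
           // Submit a job as alice
           JobClient jc = new JobClient(conf);
           jc.submitJob(conf);
           // OR access HDFS as alice
           FileSystem fs = FileSystem.get(conf);
           fs.mkdirs(someFilePath);
           return null;
         }
       });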
  • 7. The superuser must be configured on the NameNode and ResourceManager to be allowed to impersonate another user. The following configuration is required:

       <property>
         <name>hadoop.proxyuser.super.groups</name>
         <value>group1,group2</value>
         <description>Allow the superuser super to impersonate any member of the groups group1 and group2</description>
       </property>
       <property>
         <name>hadoop.proxyuser.super.hosts</name>
         <value>host1,host2</value>
         <description>The superuser can connect only from host1 and host2 to impersonate a user</description>
       </property>
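The two properties above can be read as a pair of checks: the request must come from an allowed host, and the impersonated user must belong to an allowed group. The following is a minimal Java sketch of that rule under those assumptions. It is not Hadoop's own implementation, and the names `ProxyUserRule` and `mayImpersonate` are hypothetical.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical illustration of the proxy-user rule; not Hadoop's own code.
final class ProxyUserRule {
    private final Set<String> allowedHosts;
    private final Set<String> allowedGroups;

    // hostsValue and groupsValue mirror the comma-separated property values.
    ProxyUserRule(String hostsValue, String groupsValue) {
        this.allowedHosts = splitCsv(hostsValue);
        this.allowedGroups = splitCsv(groupsValue);
    }

    private static Set<String> splitCsv(String csv) {
        return Arrays.stream(csv.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toSet());
    }

    // True when the superuser connects from an allowed host AND the
    // impersonated user is a member of at least one allowed group.
    boolean mayImpersonate(String clientHost, Set<String> impersonatedUserGroups) {
        boolean hostOk = allowedHosts.contains("*") || allowedHosts.contains(clientHost);
        boolean groupOk = allowedGroups.contains("*")
                || impersonatedUserGroups.stream().anyMatch(allowedGroups::contains);
        return hostOk && groupOk;
    }

    public static void main(String[] args) {
        ProxyUserRule rule = new ProxyUserRule("host1,host2", "group1,group2");
        System.out.println(rule.mayImpersonate("host1", Set.of("group1"))); // true
        System.out.println(rule.mayImpersonate("host3", Set.of("group1"))); // false
    }
}
```

Both conditions must hold, which is why a wildcard on one property alone does not open up impersonation completely.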
  • 8.
       - Service-level authorization is the initial authorization mechanism that ensures clients connecting to a particular Hadoop service have the necessary, pre-configured permissions and are authorized to access the given service.
       - For example, a MapReduce cluster can use this mechanism to allow a configured list of users/groups to submit jobs.
       - By default, service-level authorization is disabled for Hadoop.
       - To enable it, set the following configuration property in core-site.xml:

       <property>
         <name>hadoop.security.authorization</name>
         <value>true</value>
       </property>
  • 9.
       - hadoop-policy.xml defines an access control list for each Hadoop service.
       - Every ACL has a simple format: a comma-separated list of users and a comma-separated list of groups, with the two lists separated by a space. Example: user1,user2 group1,group2.
       - Blocked access control lists: the blocked list for a service uses the same property name with the suffix ".blocked". For example, security.client.protocol.acl has the blocked list security.client.protocol.acl.blocked.
       - Refreshing the service-level authorization configuration:
           hadoop dfsadmin -refreshServiceAcl
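To make the ACL format above concrete, here is a small Java sketch that parses a value such as `user1,user2 group1,group2` and answers whether a caller is allowed. It is an illustration only, not Hadoop's ACL class. The name `ServiceAclSketch` is hypothetical, and treating `*` as "everyone" follows the wildcard usage shown elsewhere in this deck.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical parser for the "users<space>groups" ACL format; not Hadoop's own code.
final class ServiceAclSketch {
    private final boolean allowAll;
    private final Set<String> users;
    private final Set<String> groups;

    ServiceAclSketch(String acl) {
        this.allowAll = acl.trim().equals("*");
        // Split at the first space only: users before it, groups after it.
        String[] parts = acl.split(" ", 2);
        this.users = allowAll ? Set.of() : splitCsv(parts[0]);
        this.groups = (allowAll || parts.length < 2) ? Set.of() : splitCsv(parts[1]);
    }

    private static Set<String> splitCsv(String csv) {
        return Arrays.stream(csv.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toSet());
    }

    // Allowed when the ACL is a wildcard, the user is listed,
    // or the user belongs to at least one listed group.
    boolean isAllowed(String user, Set<String> userGroups) {
        return allowAll
                || users.contains(user)
                || userGroups.stream().anyMatch(groups::contains);
    }

    public static void main(String[] args) {
        ServiceAclSketch acl = new ServiceAclSketch("alice,bob mapreduce");
        System.out.println(acl.isAllowed("alice", Set.of()));            // true
        System.out.println(acl.isAllowed("carol", Set.of("mapreduce"))); // true
        System.out.println(acl.isAllowed("carol", Set.of("sales")));     // false
    }
}
```

Because the split happens at the first space, a groups-only ACL in this sketch is written with a leading space, for example `" mapreduce"`.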
  • 10.
       - Allow only users alice, bob and users in the mapreduce group to submit jobs to the MapReduce cluster:
           <property>
             <name>security.job.submission.protocol.acl</name>
             <value>alice,bob mapreduce</value>
           </property>
       - Allow only DataNodes running as users who belong to the group datanodes to communicate with the NameNode:
           <property>
             <name>security.datanode.protocol.acl</name>
             <value>datanodes</value>
           </property>
       - Allow any user to talk to the HDFS cluster as a DFSClient:
           <property>
             <name>security.client.protocol.acl</name>
             <value>*</value>
           </property>
  • 11.
       - Data encryption on RPC
           - Covers the data transferred between Hadoop services and clients.
           - Setting hadoop.rpc.protection to "privacy" in core-site.xml activates data encryption.
       - Data encryption on block data transfer
           - Set dfs.encrypt.data.transfer to "true" in hdfs-site.xml.
           - Set dfs.encrypt.data.transfer.algorithm to either "3des" or "rc4" to choose the specific encryption algorithm.
           - By default, 3DES is used.
       - Data encryption on HTTP
           - Data transfer between the web console and clients is protected by using SSL (HTTPS).
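The three values of hadoop.rpc.protection correspond to the standard SASL quality-of-protection levels. As a rough Java sketch (the enum name `RpcProtection` and its methods are hypothetical, chosen for illustration), the mapping and the question "is the payload encrypted?" can be expressed as:

```java
import java.util.Locale;

// Hypothetical enum mirroring the hadoop.rpc.protection values and the
// standard SASL QOP strings (auth, auth-int, auth-conf).
enum RpcProtection {
    AUTHENTICATION("auth"),   // authentication only
    INTEGRITY("auth-int"),    // authentication + integrity check
    PRIVACY("auth-conf");     // authentication + integrity + encryption

    final String saslQop;

    RpcProtection(String saslQop) {
        this.saslQop = saslQop;
    }

    // Parse a configuration value such as "privacy" (case-insensitive).
    static RpcProtection fromConfig(String value) {
        return valueOf(value.trim().toUpperCase(Locale.ROOT));
    }

    // Only the privacy level encrypts the RPC payload.
    boolean encryptsPayload() {
        return this == PRIVACY;
    }

    public static void main(String[] args) {
        RpcProtection p = fromConfig("privacy");
        System.out.println(p.saslQop + " encrypts=" + p.encryptsPayload());
    }
}
```

Each level includes the guarantees of the one before it, so choosing "privacy" also gives authentication and integrity checking.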
  • 12.
       - HDFS implements a permissions model for files and directories that shares much of the POSIX model.
       - User identity
           - simple: in this mode of operation, the identity of a client process is determined by the host operating system.
           - kerberos: in Kerberized operation, the identity of a client process is determined by its Kerberos credentials.
       - Group mapping
           - org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback
           - org.apache.hadoop.security.ShellBasedUnixGroupsMapping
       - HDFS stores the user and group of a file or directory as strings; there is no conversion from user and group identity numbers as is conventional in Unix.
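A POSIX-style check of this kind picks exactly one permission class (owner, group or other) and tests the requested access against it. The Java sketch below illustrates that idea, comparing user and group names as strings as described above. The class and method names are hypothetical, and the sketch ignores ACLs and the super-user.

```java
import java.util.Set;

// Hypothetical illustration of a POSIX-style permission check on string identities.
final class PermissionCheckSketch {
    static final int READ = 4;
    static final int WRITE = 2;
    static final int EXECUTE = 1;

    // mode is the usual 9-bit permission value, e.g. 0640 (octal).
    // access is a combination of READ, WRITE and EXECUTE.
    static boolean isPermitted(String user, Set<String> userGroups,
                               String owner, String group, int mode, int access) {
        int bits;
        if (user.equals(owner)) {
            bits = (mode >> 6) & 7;   // owner class
        } else if (userGroups.contains(group)) {
            bits = (mode >> 3) & 7;   // group class
        } else {
            bits = mode & 7;          // other class
        }
        return (bits & access) == access;
    }

    public static void main(String[] args) {
        int mode = 0640; // rw-r-----
        System.out.println(isPermitted("alice", Set.of("staff"), "alice", "staff", mode, READ | WRITE)); // true
        System.out.println(isPermitted("bob", Set.of("staff"), "alice", "staff", mode, WRITE));          // false
        System.out.println(isPermitted("carol", Set.of("guest"), "alice", "staff", mode, READ));         // false
    }
}
```

Note that only one class is consulted: an owner whose owner bits deny access is not rescued by more permissive group or other bits.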
  • 13.
       - Shell operations
           - hadoop fs -chmod [-R] mode file
           - hadoop fs -chgrp [-R] group file
           - hadoop fs -chown [-R] [owner][:[group]] file
       - The super-user
           - The super-user is the user with the same identity as the NameNode process itself.
           - Permission checks never fail for the super-user.
           - There is no persistent notion of who was the super-user.
           - When the NameNode is started, the process identity determines who is the super-user for now.
       - WebHDFS
           - Uses Kerberos (SPNEGO) and Hadoop delegation tokens for authentication.
  • 14.
       - An ACL provides a way to set different permissions for specific named users or named groups, not only the file's owner and the file's group.
       - By default, support for ACLs is disabled.
       - Enable ACLs by adding the following configuration property to hdfs-site.xml and restarting the NameNode:

       <property>
         <name>dfs.namenode.acls.enabled</name>
         <value>true</value>
       </property>
  • 15. ACL shell commands
       - hdfs dfs -getfacl [-R] <path>
       - hdfs dfs -setfacl [-R] [-b|-k -m|-x <acl_spec> <path>]|[--set <acl_spec> <path>]
           -R : Recursive.
           -m : Modify ACL.
           -b : Remove all but the base ACL entries. The entries for user, group and others are retained for compatibility with permission bits.
           -k : Remove the default ACL.
           -x : Remove specified ACL entries.
           <acl_spec> : Comma-separated list of ACL entries.
           --set : Fully replace the ACL, discarding all existing entries.
       - hdfs dfs -ls <args>
           ls appends a '+' character to the permissions string of any file or directory that has an ACL.
  • 17.
       - Tokens are generated for applications and containers.
       - HMAC_ALGORITHM is used to generate the password for tokens.
       - YARN interfaces for secret manager tokens:
           - BaseNMTokenSecretManager
           - AMRMTokenSecretManager
           - BaseClientToAMTokenSecretManager
           - BaseContainerTokenSecretManager
       Source: Hortonworks
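The idea of deriving a token password with an HMAC can be sketched with the JDK's own `javax.crypto.Mac`. This is an illustration under stated assumptions, not YARN's secret manager code: the class name `TokenPasswordSketch` is hypothetical, and `HmacSHA1` is assumed as the algorithm purely for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical illustration: password = HMAC(masterKey, tokenIdentifier).
final class TokenPasswordSketch {
    // Assumption for this sketch; the deck only names HMAC_ALGORITHM.
    static final String HMAC_ALGORITHM = "HmacSHA1";

    // The issuer computes the password from the serialized token identifier.
    static byte[] createPassword(byte[] tokenIdentifier, byte[] masterKey) {
        try {
            Mac mac = Mac.getInstance(HMAC_ALGORITHM);
            mac.init(new SecretKeySpec(masterKey, HMAC_ALGORITHM));
            return mac.doFinal(tokenIdentifier);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC computation failed", e);
        }
    }

    // The verifier recomputes the password and compares in constant time.
    static boolean verify(byte[] tokenIdentifier, byte[] password, byte[] masterKey) {
        return MessageDigest.isEqual(createPassword(tokenIdentifier, masterKey), password);
    }

    public static void main(String[] args) {
        byte[] key = "demo-master-key".getBytes(StandardCharsets.UTF_8);
        byte[] id = "container_0001".getBytes(StandardCharsets.UTF_8);
        byte[] password = createPassword(id, key);
        System.out.println(verify(id, password, key)); // true
    }
}
```

Only a party holding the master key can produce or check the password, so a token cannot be forged or altered by the client that carries it.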
  • 18.
       - Enable the ACL check in YARN. Queue ACLs are set through:
           - yarn.scheduler.capacity.root.<queue-path>.acl_submit_applications
           - yarn.scheduler.capacity.root.<queue-path>.acl_administer_queue
       - QueueACLsManager checks the access of each user against the ACL defined in the queue.
       - The following would restrict access to the "support" queue to the user "shrey" and the members of the "sales" group:

       <property>
         <name>yarn.acl.enable</name>
         <value>true</value>
       </property>
       <property>
         <name>yarn.scheduler.capacity.root.<queue-path>.acl_submit_applications</name>
         <value>shrey sales</value>
       </property>
       <property>
         <name>yarn.scheduler.capacity.root.<queue-path>.acl_administer_queue</name>
         <value>sales</value>
       </property>
  • 20. [Diagram: a client authenticating to services with a plain-text or encrypted password]
  • 21.
       - Kerberos is a network authentication protocol.
       - It is used to authenticate the identity of services running on different nodes (machines) that communicate over a non-secure network.
       - It uses "tickets" as the basic unit of authentication.
  • 22.
       - Authentication Server: a service used to authenticate or verify clients. It usually checks for the username of the requesting client in the system and issues the initial ticket, the Ticket Granting Ticket (TGT).
       - Ticket Granting Server: generates service tickets based on the target service name, the initial ticket (TGT) and an authenticator.
       - Principals: a principal is the unique identity to which Kerberos can assign tickets.
  • 29. To enable Kerberos authentication in Hadoop, configure the following properties in core-site.xml:

       <property>
         <name>hadoop.security.authentication</name>
         <value>kerberos</value>
         <!-- Setting the value to "simple" disables security. -->
       </property>
       <property>
         <name>hadoop.security.authorization</name>
         <value>true</value>
       </property>
  • 30.
       - A keytab is a file containing Kerberos principals and encrypted keys. These files are used to log in directly to Kerberos without being prompted for a password.
       - Enabling Kerberos for HDFS services:
           A. Generate the keytab
              Create the hdfs keytab file that will contain the hdfs principal and the HTTP principal. This keytab file is used for the NameNode and DataNode.
           B. Associate the keytab with the principals
              kadmin: xst -norandkey -k hdfs.keytab hdfs/fully.qualified.domain.name HTTP/fully.qualified.domain.name
              sudo mv hdfs.keytab /etc/hadoop/conf/
  • 31. Add the following properties to the hdfs-site.xml file:

       <!-- NameNode security configs -->
       <property>
         <name>dfs.namenode.keytab.file</name>
         <value>/etc/hadoop/conf/hdfs.keytab</value> <!-- path to the HDFS keytab -->
       </property>
       <property>
         <name>dfs.namenode.kerberos.principal</name>
         <value>hdfs/_HOST@YOUR-REALM.COM</value>
       </property>

       <!-- DataNode security configs -->
       <property>
         <name>dfs.datanode.keytab.file</name>
         <value>/etc/hadoop/conf/hdfs.keytab</value> <!-- path to the HDFS keytab -->
       </property>
       <property>
         <name>dfs.datanode.kerberos.principal</name>
         <value>hdfs/_HOST@YOUR-REALM.COM</value>
       </property>