Thank you to all the users of Hadoop & Spark.
Thank you to everyone developing and contributing to Hadoop & Spark.
Thank you for coming to this session.
Access control is governed by the external data sources: e.g., for HDFS, S3, and HBase, the existing access policies still apply.
John Doe first authenticates to Kerberos before submitting a Spark job:
kinit -kt /etc/security/keytabs/johndoe.keytab johndoe@EXAMPLE.COM
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 lib/spark-examples*.jar 10
The first step of security is network security.
The second step of security is authentication.
Most Hadoop ecosystem projects rely on Kerberos for authentication.
Kerberos – 3 Headed Guard Dog : https://en.wikipedia.org/wiki/Cerberus
The client talks to the KDC using the Kerberos library.
Orange line – client-to-KDC communication.
Green line – client-to-HDFS communication; this path does not talk to Kerberos/the KDC.
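As a sketch, the two lines in the diagram map to commands like these on a Kerberized client (the principal johndoe@EXAMPLE.COM and the HDFS path are illustrative):

```shell
# Orange line: authenticate to the KDC and cache a ticket-granting ticket (TGT)
kinit johndoe@EXAMPLE.COM

# Inspect the ticket cache; the TGT shows up as krbtgt/EXAMPLE.COM@EXAMPLE.COM
klist

# Green line: talk to HDFS directly; the Kerberos library uses the cached TGT
# to obtain a service ticket for the NameNode, so no manual KDC step is needed
hdfs dfs -ls /user/johndoe
```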
Controlling HDFS authorization is straightforward and already done.
Controlling Hive row/column-level authorization in Spark is work in progress.
For HDFS as a data source, use RPC encryption, or use SSL with WebHDFS.
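A hedged sketch of the corresponding Hadoop settings (property names follow Hadoop's wire-encryption documentation; values are examples, not a complete secure configuration):

```xml
<!-- hdfs-site.xml: encrypt the HDFS data-transfer protocol (DataNode traffic) -->
<property>
  <name>dfs.encrypt.data.transfer</name>
  <value>true</value>
</property>

<!-- core-site.xml: "privacy" enables encryption on the Hadoop RPC channel -->
<property>
  <name>hadoop.rpc.protection</name>
  <value>privacy</value>
</property>

<!-- hdfs-site.xml: serve WebHDFS over HTTPS only -->
<property>
  <name>dfs.http.policy</name>
  <value>HTTPS_ONLY</value>
</property>
```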
For NodeManager (NM) shuffle data, use YARN SSL.
Spark supports SSL for its file server (broadcast and file downloads).
Shuffle block transfer supports SASL-based encryption; SSL support is coming.
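As a sketch, the Spark side of the last two points might look like this in spark-defaults.conf for Spark 1.x (keystore paths and passwords are placeholders):

```properties
# Shared-secret authentication; required before SASL encryption can be enabled
spark.authenticate                       true

# SASL-based encryption for shuffle block transfers
spark.authenticate.enableSaslEncryption  true

# SSL for the file server used by broadcast and file downloads
spark.ssl.enabled                        true
spark.ssl.keyStore                       /path/to/keystore.jks
spark.ssl.keyStorePassword               changeit
spark.ssl.trustStore                     /path/to/truststore.jks
spark.ssl.trustStorePassword             changeit
```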
Thank you Prasad Wagle (Twitter) & Prabhjot Singh (Hortonworks)