Spark Security

Introduction to Spark Security

Published in: Software

Spark Security

  1. Apache Spark: Enterprise Security for Production Deployments
     Yifeng Jiang (蒋 逸峰), Solutions Engineer, Hortonworks, @uprush
     December 21, 2016
  2. What are the security requirements? (© Hortonworks Inc. 2011 – 2016. All Rights Reserved)
     • Spark users should be authenticated
     • Integrate with corporate LDAP/AD
     • Allow access only to authorized users
     • Audit all access
     • Protect data both in motion and at rest
     • Easily manage all security
     • Make security easy to manage
     • …
  3. Interacting with Spark
     Diagram labels: Zeppelin, Spark-Shell, Spark Thrift Server, and REST Server, each with its own Driver; Spark on YARN running the executors ("Ex").
  4. Context: Spark Deployment Modes
     • Spark on YARN
       – Spark driver (SparkContext) in the YARN AM (yarn-cluster)
       – Spark driver (SparkContext) in the local client (yarn-client)
         • Spark Shell and Spark Thrift Server run in yarn-client only
     Diagram: YARN-Client and YARN-Cluster layouts, each showing the Client, App Master, Spark Driver, and Executor.
  5. Spark on YARN
     Diagram (steps numbered 1–4 on the slide): John Doe, Spark Submit, and a Hadoop Cluster containing the YARN RM, Node Manager, Spark AM, Executor, and HDFS.
  6. DEMO: A DATA LAKE WITHOUT SECURITY
  7. Spark – Security – Four Pillars
     • Authentication
     • Authorization
     • Audit
     • Encryption
     Spark leverages Kerberos on YARN.
  8. Authenticate users with Kerberos/AD
     Diagram (steps numbered 1–7 on the slide): John Doe, KDC, and a Hadoop Cluster containing the Spark AM, NN, and Executor. Step labels:
     • kinit
     • Get service ticket for Spark
     • Use Spark ST, submit Spark job
     • Spark gets Namenode (NN) service ticket
     • YARN launches Spark Executors using John Doe's identity
     • Executor reads from HDFS using John Doe's delegation token
  9. Spark – Kerberos – Example
       kinit -kt /etc/security/keytabs/johndoe.keytab johndoe@EXAMPLE.COM
       spark-submit --class org.apache.spark.examples.SparkPi \
         --master yarn-cluster --num-executors 3 --driver-memory 512m \
         --executor-memory 512m --executor-cores 1 \
         /usr/hdp/current/spark-client/lib/spark-examples*.jar 10
  10. Allow only authorized users access to Spark jobs
      Diagram: John Doe, KDC, Ranger, YARN Cluster (nodes A, B, C), and HDFS. Step labels:
      • Client gets service ticket for Spark
      • Use Spark ST, submit Spark job
      • Get Namenode (NN) service ticket
      • Executors read from HDFS
      Ranger checks: "Can John launch this job?" and "Can John read this file?"
  11. SparkSQL: Fine-grained security
  12. SparkSQL Security: Current Status
      • SparkSQL: only coarse-grained access control today
      Diagram labels: JDBC client; Spark Thrift Server (driver); YARN containers (DAG), run as the hive user; Hive Metastore; HDFS /apps/hive/warehouse/…
  13. SparkSQL Security
      • Spark Thrift Server (STS) and Spark Executors run as the hive user to read all data
        – No authorization support in STS
        – No Ranger integration support
        – Anyone who can authenticate to STS can read ALL data
      • No identity propagation on the 2nd hop (STS to Executors): no equivalent of doAs in HS2
  14. How Hive Security Works
      Diagram: ODBC/JDBC clients, LDAP, HiveServer 2, KDC, Ranger, and YARN & HDFS (nodes A, B, C). Numbered steps:
      1. Original request with user id/password
      2. HS2 authenticates user/password (LDAP)
      3. Ranger AuthZ
      4. Hive gets Namenode (NN) service ticket
      5. Hive creates MR/Tez job using NN ST as proxy user
      Other labels: Use Hive ST, submit query; Client gets query result; Ranger syncs users/groups from LDAP.
  15. DEMO: HIVE & SPARKSQL AUTHORIZATION
  16. Key Features: Spark Column Security with LLAP
      • Fine-grained column-level access control for SparkSQL.
      • Fully dynamic policies per user. Doesn't require views.
      • Use standard Ranger policies and tools to control access and masking policies.
      Flow:
      1. SparkSQL gets data locations, known as "splits", from HiveServer2 and plans the query.
      2. HiveServer2 authorizes access using Ranger. Per-user policies like row filtering are applied.
      3. Spark gets a modified query plan based on the dynamic security policy.
      4. Spark reads data from LLAP. Filtering and masking are guaranteed by the LLAP server.
      Diagram labels: Spark Client; HiveServer2 (Authorization); Hive Metastore (Data Locations, View Definitions); LLAP (Data Read, Filter Pushdown); Ranger Server (Dynamic Policies); steps 1–4.
  17. Example: Per-User Row Filtering by Region in SparkSQL
      Original query:
        SELECT * from CUSTOMERS WHERE total_spend > 10000
      Query rewrites based on dynamic Ranger policies:
        Spark User 2 (East Region):
          SELECT * from CUSTOMERS WHERE total_spend > 10000 AND region = "east"
        Spark User 1 (West Region):
          SELECT * from CUSTOMERS WHERE total_spend > 10000 AND region = "west"
      LLAP data access:
        User ID   Region   Total Spend
        1         East     5,131
        2         East     27,828
        3         West     55,493
        4         West     7,193
        5         East     18,193
      Fine-grained security for SparkSQL: http://bit.ly/2bLghGz http://bit.ly/2bTX7Pm
  18. Dynamic Masking and Row Level Filtering
      Source data:
        Country   National ID   CC No              Name          DOB         MRN          Policy ID
        US        232323233     4539067047629850   John Doe      9/12/1969   8233054331   nj23j424
        US        333287465     5391304868205600   Jane Doe      9/13/1969   3736885376   cadsd984
        Japan     T30007873     4532488639863821   Ben Jackson   73/1975     876392473A   KK-287365
      After Ranger policy enforcement, users from US customer support groups see row-filtered data for US persons only, with CC and SSN (National ID) masked and MRN nullified:
        Country   National ID   CC No                 MRN    Name
        US        xxxxx3233     4539 xxxx xxxx xxxx   null   John Doe
        US        xxxxx7465     5391 xxxx xxxx xxxx   null   Jane Doe
      Japan Health Policy Admins view the relevant columns unmasked, but row-filtering policies restrict them to data for Japan persons only:
        Country   National ID   Name          MRN
        Japan     T30007873     Ben Jackson   876392473A
  19. THANK YOU @uprush
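
Slide 9 shows the two commands a user runs on a Kerberized cluster: `kinit` with a keytab, then `spark-submit`. As a minimal sketch, the same two invocations can be assembled as argument lists in Python, for example to pass to `subprocess.run`. The function names and defaults below are illustrative assumptions, not part of the deck; the flags and values are the ones shown on the slide.

```python
from typing import List, Sequence


def kinit_command(keytab: str, principal: str) -> List[str]:
    """Build the kinit call that obtains a Kerberos ticket from a keytab."""
    return ["kinit", "-kt", keytab, principal]


def spark_submit_command(
    app_jar: str,
    main_class: str,
    app_args: Sequence[str] = (),
    master: str = "yarn-cluster",  # value used on the slide
    num_executors: int = 3,
    driver_memory: str = "512m",
    executor_memory: str = "512m",
    executor_cores: int = 1,
) -> List[str]:
    """Build the spark-submit argument list with the flags from slide 9."""
    return [
        "spark-submit",
        "--class", main_class,
        "--master", master,
        "--num-executors", str(num_executors),
        "--driver-memory", driver_memory,
        "--executor-memory", executor_memory,
        "--executor-cores", str(executor_cores),
        app_jar,
        *app_args,
    ]


if __name__ == "__main__":
    # Printing only: nothing is executed here. Note that an argument list
    # passed to subprocess.run is not shell-expanded, so a wildcard such as
    # spark-examples*.jar would have to be resolved first (e.g. glob.glob).
    print(kinit_command("/etc/security/keytabs/johndoe.keytab",
                        "johndoe@EXAMPLE.COM"))
    print(spark_submit_command(
        "/usr/hdp/current/spark-client/lib/spark-examples*.jar",
        "org.apache.spark.examples.SparkPi",
        ["10"],
    ))
```

Keeping the ticket acquisition and the job submission as separate steps mirrors the slide: the job is submitted under whatever identity `kinit` established.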
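
Slide 17 describes per-user row filtering as a query rewrite: the same original query gains an extra region predicate depending on who runs it. The sketch below imitates that idea in plain Python. It is a toy string rewrite under stated assumptions, not how HiveServer2, Ranger, or LLAP implement it (slide 16 says Spark receives a modified query plan). The user names and the `USER_REGION` mapping are hypothetical; the rows are the ones shown on the slide.

```python
# Rows from the slide's CUSTOMERS table.
CUSTOMERS = [
    {"user_id": 1, "region": "East", "total_spend": 5131},
    {"user_id": 2, "region": "East", "total_spend": 27828},
    {"user_id": 3, "region": "West", "total_spend": 55493},
    {"user_id": 4, "region": "West", "total_spend": 7193},
    {"user_id": 5, "region": "East", "total_spend": 18193},
]

# Hypothetical stand-in for a per-user row-filter policy.
USER_REGION = {"spark_user_1": "west", "spark_user_2": "east"}


def rewrite_query(query: str, user: str) -> str:
    """Append the user's region predicate to a simple SELECT.

    Naive on purpose: only correct for a query whose WHERE clause (if any)
    is a plain conjunction at the end of the statement.
    """
    region = USER_REGION.get(user)
    if region is None:
        raise PermissionError(f"no row-filter policy for {user}")
    predicate = f'region = "{region}"'
    joiner = "AND" if " where " in query.lower() else "WHERE"
    return f"{query} {joiner} {predicate}"


def visible_rows(user: str, min_spend: int):
    """Evaluate the rewritten query against the in-memory rows."""
    region = USER_REGION[user]
    return [
        row for row in CUSTOMERS
        if row["total_spend"] > min_spend and row["region"].lower() == region
    ]


if __name__ == "__main__":
    original = "SELECT * from CUSTOMERS WHERE total_spend > 10000"
    for user in ("spark_user_1", "spark_user_2"):
        print(user, "->", rewrite_query(original, user))
        print("   rows:", [r["user_id"] for r in visible_rows(user, 10000)])
```

The sketch denies users without a policy rather than returning the unfiltered query, which is an assumption chosen to fail closed.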
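
Slide 18 combines three effects: row filtering by country, partial masking of the National ID and credit card number, and nullifying the MRN. A minimal Python sketch of those transformations, applied to the slide's source rows, might look like the following. The function names and the two "view" functions are illustrative assumptions and plain-Python stand-ins, not Ranger's implementation.

```python
# Source rows from the slide (DOB and Policy ID omitted for brevity).
RECORDS = [
    {"country": "US", "national_id": "232323233",
     "cc_no": "4539067047629850", "name": "John Doe", "mrn": "8233054331"},
    {"country": "US", "national_id": "333287465",
     "cc_no": "5391304868205600", "name": "Jane Doe", "mrn": "3736885376"},
    {"country": "Japan", "national_id": "T30007873",
     "cc_no": "4532488639863821", "name": "Ben Jackson", "mrn": "876392473A"},
]


def mask_show_last4(value: str) -> str:
    """Replace every character except the last four with 'x'."""
    return "x" * max(len(value) - 4, 0) + value[-4:]


def mask_show_first4_grouped(value: str) -> str:
    """Keep the first four characters, mask the rest, group in fours."""
    masked = value[:4] + "x" * max(len(value) - 4, 0)
    return " ".join(masked[i:i + 4] for i in range(0, len(masked), 4))


def us_support_view(records):
    """US rows only; National ID and CC masked; MRN nullified."""
    return [
        {
            "country": r["country"],
            "national_id": mask_show_last4(r["national_id"]),
            "cc_no": mask_show_first4_grouped(r["cc_no"]),
            "mrn": None,
            "name": r["name"],
        }
        for r in records if r["country"] == "US"
    ]


def japan_admin_view(records):
    """Japan rows only; selected columns shown unmasked."""
    columns = ("country", "national_id", "name", "mrn")
    return [
        {c: r[c] for c in columns}
        for r in records if r["country"] == "Japan"
    ]


if __name__ == "__main__":
    for row in us_support_view(RECORDS):
        print(row)
    for row in japan_admin_view(RECORDS):
        print(row)
```

Both views read the same underlying rows; only the filter and the per-column transformation differ by audience, which is the point the slide makes about dynamic policies.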
