Curb your insecurity with HDP

Published in: Technology

  1. Curb Your Insecurity with HDP: Tips for a Secure Cluster (with Spark too). Hadoop Summit, San Jose, June 29th, 2016
  2. © Hortonworks Inc. 2011 – 2016. All Rights Reserved
     Pardeep Kumar, Sr. Systems Architect, NA Prof. Services
       • 4+ years in Hadoop
       • Helping Fortune 500 customers succeed in their Hadoop journey
       • Set up, implement, migrate and secure some of the largest clusters in North America
       • Security & Migration SME, HCC Guru
       • Loves Hadoop, Cricket and Kerberos ;)
       • pardeep.kumar@hortonworks.com, @hadooptutor, linkedin.com/in/pardeepkumarmishra
     Ancil McBarnett, Sr. Solutions Engineer, NorthEast
       • Helping organizations design, implement, operate and consume Hadoop and Big Data solutions
       • Specializes in Security and Hive Tuning. HCC Guru
       • Loves Cricket, and DJ Bravo Champion :D
       • amcbarnett@hortonworks.com, @mcbkingdom, linkedin.com/in/mcbkingdom
  3. Hadoop Security in 4 Steps
  4. Comprehensive Approach to Security
     In order to protect any data system you must implement the following:
       • Administration: central management and consistent security. How do I set policy across the entire cluster?
       • Authentication: authenticate users and systems. Who am I/prove it?
       • Authorization: provision access to data. What can I do?
       • Audit: maintain a record of data access. What did I do?
       • Data Protection: protect data at rest and in motion. How can I encrypt at rest and over the wire?
  5. HDP Security: Comprehensive, Complete, Extensible
       • Perimeter Level Security: network security (e.g. firewalls); Apache Knox (e.g. gateways)
       • Authentication: LDAP/AD - Kerberos
       • Data Protection: encrypts data in motion and data at rest; refer to partner encryption solutions for broader needs. HDFS TDE with Ranger KMS
       • Authorization & Audit: consistent authorization controls across all Apache components within HDP. Apache Ranger
  6. Authentication with Kerberos: Kerberos is a necessary evil, just do it!!
  7. Security Without Kerberos
  8. Configure Kerberos – Ambari Wizard
  9. Security With Kerberos
  10. Apache Ranger
  11. Apache Ranger
  12. HDFS File Security
  13. Hive Database and Table Security
  14. Authorization and Audit
      Authorization: fine-grained access control
        • HDFS – Folder, File
        • Hive – Database, Table, Column
        • HBase – Table, Column Family, Column
        • Storm, Knox and more
      Audit: extensive user access auditing in HDFS, Hive and HBase
        • IP address
        • Resource type/resource
        • Timestamp
        • Access granted or denied
      Control access into system
      Flexibility in defining policies
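
The authorization and audit model on this slide can be illustrated with a toy sketch. This is not the Apache Ranger API: `Policy`, `AuditRecord` and `check_access` are hypothetical names, the exact-match rule is a simplification, and the `user` field on the audit record is an assumption (the slide lists only IP address, resource, timestamp and outcome).

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import FrozenSet, List


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: which users may perform which accesses on one resource."""
    resource_type: str          # e.g. "hdfs-path", "hive-column", "hbase-column-family"
    resource: str               # e.g. "sales.orders.amount"
    users: FrozenSet[str]
    accesses: FrozenSet[str]    # e.g. {"select"}


@dataclass(frozen=True)
class AuditRecord:
    """One audit entry carrying the fields named on the slide."""
    user: str                   # assumption: not listed on the slide
    ip_address: str
    resource_type: str
    resource: str
    timestamp: datetime
    granted: bool


def check_access(policies: List[Policy], audit_log: List[AuditRecord],
                 user: str, ip_address: str,
                 resource_type: str, resource: str, access: str) -> bool:
    """Grant only if some policy matches exactly, and always write an audit record."""
    granted = any(
        p.resource_type == resource_type
        and p.resource == resource
        and user in p.users
        and access in p.accesses
        for p in policies
    )
    audit_log.append(AuditRecord(user, ip_address, resource_type, resource,
                                 datetime.now(timezone.utc), granted))
    return granted
```

The point of the sketch is that every decision, granted or denied, leaves a record; a real deployment would define policies centrally rather than in application code.
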
  15. REST API Security with Apache Knox
  16. Hadoop REST APIs
        • Useful for connecting to Hadoop from outside the cluster
        • When more client language flexibility is required, i.e. Java binding is not an option
        • Challenges
            – Client must have knowledge of cluster topology
            – Required to open ports (and in some cases, on every host) outside the cluster
      Service   API
      WebHDFS   Supports HDFS user operations including reading files, writing to files, making directories, changing permissions and renaming.
      WebHCat   Job control for MapReduce, Pig and Hive jobs, and HCatalog DDL commands.
      Hive      Hive REST API operations
      HBase     HBase REST API operations
      Oozie     Job submission and management, and Oozie administration.
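
As a rough illustration of the WebHDFS operations listed above, the sketch below builds the HTTP method and URL for a few of them. The helper name `webhdfs_request` and the host name are hypothetical, and port 50070 is taken from the direct-URL table later in this deck; the `/webhdfs/v1` prefix and the operation names follow the WebHDFS REST API. The code only constructs requests and does not contact a cluster.

```python
from urllib.parse import quote, urlencode

# WebHDFS operation name -> HTTP method, for the user operations named on the slide.
WEBHDFS_OPS = {
    "OPEN": "GET",            # read a file
    "LISTSTATUS": "GET",      # list a directory
    "CREATE": "PUT",          # write a file
    "MKDIRS": "PUT",          # make directories
    "SETPERMISSION": "PUT",   # change permissions
    "RENAME": "PUT",          # rename a file or directory
}


def webhdfs_request(host, path, op, port=50070, **params):
    """Return (http_method, url) for one WebHDFS call; hypothetical helper."""
    op = op.upper()
    if op not in WEBHDFS_OPS:
        raise ValueError(f"unsupported operation: {op}")
    if not path.startswith("/"):
        raise ValueError("path must be absolute")
    query = urlencode({"op": op, **params})
    return WEBHDFS_OPS[op], f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


if __name__ == "__main__":
    print(webhdfs_request("namenode-host", "/tmp/data", "LISTSTATUS"))
    print(webhdfs_request("namenode-host", "/tmp/new", "MKDIRS", permission="755"))
```

Note that the client has to know which host runs the NameNode, which is the topology challenge the slide describes.
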
  17. Authentication: API Security with Knox
      Incubated and led by Hortonworks, Apache Knox extends the reach of the Hadoop REST API without Kerberos complexities.
        • Integrated with existing systems to simplify identity maintenance
        • Single, simple point of access for a cluster
        • Central controls ensure consistency across one or more clusters
      Features
        • Eliminates SSH “edge node”
        • Central API management
        • Central audit control
        • Service level authorization
        • SSO integration: Siteminder and OAM
        • LDAP and AD integration
        • Kerberos encapsulation
        • Single Hadoop access point
        • REST API hierarchy
        • Consolidated API calls
        • Multi-cluster support
  18. Hadoop REST API with Knox
      Service   Direct URL                             Knox URL
      WebHDFS   http://namenode-host:50070/webhdfs     https://knox-host:8443/webhdfs
      WebHCat   http://webhcat-host:50111/templeton    https://knox-host:8443/templeton
      Oozie     http://ooziehost:11000/oozie           https://knox-host:8443/oozie
      HBase     http://hbasehost:60080                 https://knox-host:8443/hbase
      Hive      http://hivehost:10001/cliservice       https://knox-host:8443/hive
      YARN      http://yarn-host:yarn-port/ws          https://knox-host:8443/resourcemanager
        • Direct URLs: masters could be on many different hosts
        • Knox URLs: one host, one port; consistent paths; SSL config at one host
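
The "one host, one port, consistent paths" idea can be sketched as a simple lookup. The paths below are copied from the table on this slide, which is simplified; the `knox_url` helper and the `gateway_prefix` parameter are hypothetical, and the `/gateway/default` value used in the usage example is an assumption about a typical deployment, where a gateway path and topology name precede the service path.

```python
# Service name -> Knox path, as shown in the slide's table.
KNOX_PATHS = {
    "WebHDFS": "/webhdfs",
    "WebHCat": "/templeton",
    "Oozie": "/oozie",
    "HBase": "/hbase",
    "Hive": "/hive",
    "YARN": "/resourcemanager",
}


def knox_url(service, knox_host="knox-host", knox_port=8443, gateway_prefix=""):
    """Build the single-entry-point URL for a service; hypothetical helper.

    gateway_prefix is an assumption for deployments that put a gateway path
    and topology name in front of the service path.
    """
    if service not in KNOX_PATHS:
        raise KeyError(f"unknown service: {service}")
    return f"https://{knox_host}:{knox_port}{gateway_prefix}{KNOX_PATHS[service]}"


if __name__ == "__main__":
    for name in KNOX_PATHS:
        print(name, knox_url(name))
    print(knox_url("WebHDFS", gateway_prefix="/gateway/default"))
```

Every service resolves to the same host and port, so a client needs only one firewall opening and one SSL certificate to trust.
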
  19. Hadoop REST API Security: Drill-Down
      [Diagram. Labels: REST Client; Enterprise Identity Provider (LDAP/AD); Knox Gateway instances (GW) in a DMZ behind a load balancer (LB), with firewalls on either side; Edge Node/Hadoop CLIs; connections marked HTTP, RPC and LDAP; Hadoop Cluster 1 and Hadoop Cluster 2, each with masters (RM, NN, WebHCat, Oozie, HS2, HBase) and slaves (DN, NM).]
  20. Data Protection
  21. Data Protection
      HDP allows you to apply data protection policy at different layers across the Hadoop stack.
      Layer                What?                              How?
      Storage and Access   Encrypt data while it is at rest   HDFS Transparent Data Encryption, partners, HBase encryption, OS-level encryption
      Transmission         Encrypt data as it moves           SSL, SASL, RPC
  22. Points of Communication
      [Diagram. Numbered communication channels between a client, cluster nodes and the Hadoop cluster: WebHDFS, DataTransferProtocol, RPC, JDBC/ODBC and M/R Shuffle.]
  23. Data Protection - HDFS Encryption
      [Diagram. Labels: HDFS clients; NameNode; HDFS encryption zone containing encrypted files; KeyProvider API (partner integration point); Key Management System (KMS); Open Source Key Management; Stateless Key Management; stack layers Data Access, Data Management, Security, Partners, YARN.]
        • Leverage native HDFS Transparent Data Encryption or commercial options such as Protegrity
        • Hortonworks is collaborating with partners to deliver enterprise-scale key management and more choices to customers
        • Open source KMS with Ranger
        • Or partner with commercial KMS solutions, e.g. Voltage KMS
            - Partner joint engineering resources
            - Voltage Stateless Key Management integrated with KeyProvider API
        • Only HDP offers open source and commercial choices for key management
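
For the native HDFS Transparent Data Encryption option, the usual sequence is to create a key in the KMS and then mark an empty directory as an encryption zone. The sketch below only assembles those command lines; the function names, key name and path are hypothetical, and it assumes the `hadoop` and `hdfs` CLIs are on the path, a KMS is already configured, and the commands are run by a user allowed to manage keys and zones.

```python
import shlex


def encryption_zone_commands(key_name, zone_path):
    """Return the command lines for creating one encryption zone (not executed here)."""
    return [
        ["hadoop", "key", "create", key_name],        # create the zone key in the KMS
        ["hdfs", "dfs", "-mkdir", "-p", zone_path],   # the zone directory must exist and be empty
        ["hdfs", "crypto", "-createZone", "-keyName", key_name, "-path", zone_path],
        ["hdfs", "crypto", "-listZones"],             # verify the zone was created
    ]


def format_command(argv):
    """Render an argument list as a shell-safe string."""
    return " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    # "demo_key" and "/secure/demo" are illustrative names only.
    for argv in encryption_zone_commands("demo_key", "/secure/demo"):
        print(format_command(argv))
```

Files written under the zone are then encrypted and decrypted transparently by the HDFS client, which is why applications need no code changes.
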
  24. Demo: Transparent Data Encryption
  25. Securing Spark Deployments
  26. Spark - Authentication
      Spark leverages Kerberos on YARN.
      [Diagram with steps numbered 1 to 7, involving John Doe, the KDC, the Hadoop cluster, the Spark AM and the NameNode (NN). Step labels:]
        • kinit
        • Get Service Ticket (ST) for Spark
        • Use Spark ST, submit Spark job
        • Spark gets NameNode (NN) service ticket
        • YARN launches Spark executors using John Doe’s identity
        • Executor reads from HDFS using John Doe’s delegation token
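
From the user's side, the flow above usually reduces to obtaining a Kerberos ticket and then submitting to YARN; the service tickets and delegation tokens are handled by Spark and YARN. A minimal sketch, assuming `kinit`, `klist` and `spark-submit` are installed on the client; the function name, principal and application file are hypothetical, and the commands are only assembled, not run.

```python
def kerberized_submit_commands(principal, app, app_args=()):
    """Return the client-side command lines for a Kerberos-authenticated Spark job on YARN."""
    return [
        ["kinit", principal],   # obtain a ticket-granting ticket from the KDC
        ["klist"],              # confirm the ticket is in the credential cache
        ["spark-submit", "--master", "yarn", "--deploy-mode", "cluster", app, *app_args],
    ]


if __name__ == "__main__":
    # "jdoe@EXAMPLE.COM" and "job.py" are illustrative names only.
    for argv in kerberized_submit_commands("jdoe@EXAMPLE.COM", "job.py", ["--date", "2016-06-29"]):
        print(" ".join(argv))
```
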
  27. Spark – Authorization
      [Diagram. John Doe, the KDC, a YARN cluster (nodes A, B, C), HDFS and Ranger. Step labels:]
        • Client gets service ticket for Spark
        • Use Spark ST, submit Spark job
        • Get NameNode (NN) service ticket
        • Executors read from HDFS
      Ranger checks: Can John launch this job? Can John read this file?
  28. Spark – Channel Encryption - Example
      Channels shown in the diagram
        • Shuffle data
        • Control/RPC
        • Shuffle BlockTransfer
        • Read/write data
        • FS – Broadcast, file download
      Settings and notes
        • spark.authenticate = true. Leverage YARN to distribute keys
        • spark.authenticate.enableSaslEncryption = true
        • Depends on the data source: for HDFS, RPC (RC4 | 3DES) or SSL for WebHDFS
        • NM > Ex leverages YARN-based SSL
        • spark.ssl.enabled = true
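
The three Spark properties named on this slide can be passed either on the command line or in a defaults file. The sketch below renders both forms; the helper names are hypothetical, and the property set is limited to what the slide lists. In practice SSL typically also needs keystore settings, which the slide does not give, so they are left out rather than guessed.

```python
# Properties exactly as named on the slide.
ENCRYPTION_CONF = {
    "spark.authenticate": "true",
    "spark.authenticate.enableSaslEncryption": "true",
    "spark.ssl.enabled": "true",
}


def to_submit_flags(conf):
    """Render properties as repeated `--conf key=value` arguments for spark-submit."""
    flags = []
    for key, value in sorted(conf.items()):
        flags += ["--conf", f"{key}={value}"]
    return flags


def to_defaults_file(conf):
    """Render properties as whitespace-separated lines in spark-defaults.conf style."""
    return "".join(f"{key} {value}\n" for key, value in sorted(conf.items()))


if __name__ == "__main__":
    print(" ".join(["spark-submit", *to_submit_flags(ENCRYPTION_CONF), "job.py"]))
    print(to_defaults_file(ENCRYPTION_CONF), end="")
```
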
  29. Gotchas with Spark Security
        • Client -> Spark Thrift Server -> Spark Executors
            – No identity propagation on 2nd hop
            – Forces STS to run as Hive user to read all data
            – Reduces security
            – Use SparkSQL via shell or programmatic API
            – https://issues.apache.org/jira/browse/SPARK-5159
        • SparkSQL
            – Granular security unavailable
            – Ranger integration will solve this problem (refer to talk in Room 210A for Security in Spark and Hive)
            – Brings row/column level/masking features to SparkSQL
        • Spark + HBase with Kerberos
            – Issue fixed in Spark 1.4 (SPARK-6918)
        • Spark Streaming + Kafka + Kerberos + SSL
            – Issues fixed in HDP 2.4.x
        • Spark jobs > 72 hours
            – Kerberos token not renewed, fixed in Spark 1.5+
  30. Questions??
