Hadoop security @ Philly Hadoop Meetup May 2015

Apache Hadoop Security, Today and Tomorrow

  1. Apache Hadoop Security: Today and Tomorrow
     Philly Hadoop Meetup
     Shravan (Sean) Pabba | Senior Systems Engineer | @skpabba
  2. Agenda
     • Hadoop Evolution
     • Why is Hadoop Security Different?
     • Enterprise Security
       • Perimeter
       • Access
       • Visibility
       • Data
     • What's Next?
     • Demo
     © Cloudera, Inc. All rights reserved.
  3. Have you worked with:
     • Hadoop?
     • Cloudera/Hortonworks/MapR/Others?
     • Security?
     • Kerberos/AD/Encryption?
  4. Evolution of the Hadoop Platform
     • 2005-07: Core Hadoop (HDFS, MapReduce)
     • 2008: adds HBase, ZooKeeper
     • 2009: adds Hive, Mahout
     • 2010: adds Sqoop, Whirr, Avro
     • 2011: adds Flume, Bigtop, Oozie, MRUnit, HCatalog, Hue, YARN
     • 2012: adds Spark, Tez, Impala, Kafka
     • 2013-2015: adds Parquet, Sentry
     The stack is continually evolving and growing!
  5. Multiple Workloads: Batch, Interactive, and Real-Time
     Leading performance and usability in one platform.
     • End-to-end analytic workflows
     • Access more data
     • Work with data in new ways
     • Enable new users
     Platform components:
     • Process: Ingest (Sqoop, Flume); Transform (MapReduce, Hive, Pig, Spark)
     • Discover: Analytic Database (Impala); Search (Solr)
     • Model: Machine Learning (SAS, R, Spark, Mahout)
     • Serve: NoSQL Database (HBase); Streaming (Spark Streaming)
     • Scale-Out Storage: HDFS, HBase
     • Security and Administration: YARN, Cloudera Manager, Cloudera Navigator
     Multiple big data opportunities in one optimized, high-performance, multi-tenant platform.
  6. Why is Hadoop Security Different?
     Benefit of the EDH → security side effect:
     • A single platform for all the data → combines data and audiences that used to be securely siloed
     • A rich, flexible ecosystem of tools & utilities → security method proliferation can increase costs and introduce coverage gaps
     • Ingest data of any type → sensitive fields can be added without review
     • Active Archive provides lower-cost storage than legacy systems → loses the built-in compliance controls that legacy systems provided
  7. Multiple Security Stakeholders: Competing Goals?
     Business Users
     • Run high-value workloads in the cluster
     • Quickly adopt new innovations
     Information Security
     • Follow established policies and procedures
     • Maintain compliance
     IT/Operations
     • Integrate with existing IT investments
     • Minimize end-user support
     • Automate configuration
  8. A Brief History of Hadoop Security
     • 2005: Originally developed without security in mind
       • No authentication of users or services
       • Anyone could submit arbitrary code to be executed
       • Any user could impersonate other users
     • 2009: Yahoo! focused on adding authentication
       • Security model was complex
       • Security configurations were complex and error-prone
       • No data-at-rest encryption
       • Limited authorization capabilities
     • 2013: Project Rhino works to enhance Hadoop security. The project aims to add:
       • Data protection
       • Authorization
       • Simplified authentication
  9. What is Enterprise Security? Four Functional Areas
     Securing the Hadoop cluster for its users, applications, and operators across four areas: Perimeter, Access, Visibility, and Data.
  10. Enterprise Security: Authentication, Authorization, Audit, and Compliance
      • Perimeter: guarding access to the cluster itself. InfoSec concept: Authentication. Tools: Kerberos, LDAP/AD, Cloudera Manager
      • Access: defining what users and applications can do with data. InfoSec concept: Authorization. Tools: HDFS ACLs, Apache Sentry, HBase ABAC
      • Visibility: reporting on where data came from and how it's being used. InfoSec concept: Audit. Tools: Cloudera Navigator
      • Data: protecting data in the cluster from unauthorized visibility. InfoSec concept: Compliance. Tools: HDFS Encryption, Navigator Encrypt & Key Trustee
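The Access row lists HBase ABAC. Assuming this refers to HBase cell-level visibility labels, each cell carries a boolean expression over labels (for example `secret|(confidential&!probationary)`), and a read succeeds only if the user's authorizations satisfy it. The evaluator below is a hypothetical sketch of that idea, not HBase's own code; the function names are made up.

```python
import re

# Toy evaluator for HBase-style visibility expressions built from labels,
# & (and), | (or), ! (not) and parentheses. Not HBase's implementation.
TOKEN = re.compile(r"\s*([A-Za-z0-9_.:-]+|[&|!()])")


def lex_labels(expression):
    """Split an expression into label and operator tokens."""
    tokens, pos, text = [], 0, expression.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError(f"unexpected character at {pos} in {expression!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def visible(expression, authorizations):
    """True if the set of authorizations satisfies the expression."""
    tokens, pos = lex_labels(expression), 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def parse_or():
        value = parse_and()
        while peek() == "|":
            take()
            rhs = parse_and()  # parse first so no tokens are skipped by short-circuiting
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() == "&":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():
        tok = take()
        if tok == "!":
            return not parse_not()
        if tok == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected {tok!r}")
        return tok in authorizations  # a bare label: does the user hold it?

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("trailing tokens in expression")
    return result


cell = "secret|(confidential&!probationary)"
print(visible(cell, {"confidential"}))                  # True
print(visible(cell, {"confidential", "probationary"}))  # False
```

Precedence follows the usual convention (`!` binds tightest, then `&`, then `|`); the point is that the decision depends on attributes of the user and the cell, not on a fixed per-user grant.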
  11. Cloudera and Intel: Project Rhino
      • Contributed by Intel in 2013
      • Blueprint for enterprise-grade security
      Rhino goal: unified authorization. Engineers at Intel and Cloudera (together with Oracle and IBM) are jointly contributing to Apache Sentry.
      Rhino goal: encryption and key management framework. Cloudera and Intel engineers are now contributing HDFS encryption capabilities that can plug into enterprise key managers.
  12. Perimeter Security Requirements
      • Preserve user choice of the right Hadoop service (e.g. Impala, Spark)
      • Conform to centrally managed authentication policies
      • Implement with existing standard systems: Active Directory and Kerberos
      Perimeter: guarding access to the cluster itself. InfoSec concept: Authentication. Tools: Kerberos, LDAP/AD, Cloudera Manager
  13. Perimeter: Authentication in Hadoop
      Kerberos
      • Provably strong authentication between all Hadoop services and (optionally) to end-points
      • Cloudera Manager hides the complexity
      LDAP/AD
      • Username / password
      • Option for Hue, Hive Metastore, Impala connectors, and Cloudera Manager admin logins
      SAML
      • Single Sign-On (SSO) for the options listed above
      • Kerberos clients no longer required on most user end-points
  14. Authentication Options and Coverage
      Services: HDFS, HBase & Search, Impala & HiveServer2, MapReduce & YARN, and other services
      Access paths: clients, applications (Pig, Hue, etc.), commercial BI tools, and gateways
      • "Core" Kerberos: service-to-service authentication inside the cluster
      • "End-to-End" Kerberos: Kerberos from the clients through to the services
      • "Edge" AD/LDAP/SAML: username/password or SSO at the user-facing edge
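  15. Example: Kerberos Authentication to an NFS Server
      Client ID: jdoe@FOO.BAR.ORG
      • Step 1, KDC → client: TGT & client session key
      • Step 2, client → KDC: TGT & authenticator
      • Step 3, KDC → client: service ticket & service session key
      • Step 4, client → NFS server: service ticket & authenticator
      • Step 5, NFS server → client: timestamp (mutual authentication)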
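The five steps above can be modeled in a few lines of Python to show who can open what. This is a toy, assuming a `Sealed` box as a stand-in for encryption (only the exact key that sealed a payload can open it; no real cryptography happens), a simplified `krbtgt` principal name, and a made-up NFS host principal. The 300-second skew limit mirrors the common Kerberos default.

```python
import time
from dataclasses import dataclass

MAX_SKEW_SECS = 300  # common Kerberos default for allowed clock skew (5 minutes)


@dataclass
class Sealed:
    """Stand-in for encryption: only the key that sealed it can open it."""
    key: str
    payload: dict

    def open(self, key):
        if key != self.key:
            raise PermissionError("cannot decrypt: wrong key")
        return self.payload


def check_fresh(auth, client):
    """Reject authenticators for the wrong client or outside the skew window."""
    if auth["client"] != client or abs(time.time() - auth["time"]) > MAX_SKEW_SECS:
        raise PermissionError("stale or mismatched authenticator")


class KDC:
    def __init__(self, long_term_keys):
        self.keys = long_term_keys  # secrets shared with every principal, incl. krbtgt

    def as_exchange(self, client):
        """Step 1: TGT (sealed for the TGS) + client session key (sealed for the client)."""
        session = f"tgs-session-{client}"
        tgt = Sealed(self.keys["krbtgt"], {"client": client, "session": session})
        return tgt, Sealed(self.keys[client], {"session": session})

    def tgs_exchange(self, tgt, authenticator, service):
        """Steps 2-3: trade TGT + authenticator for a service ticket and session key."""
        ticket = tgt.open(self.keys["krbtgt"])
        check_fresh(authenticator.open(ticket["session"]), ticket["client"])
        svc_session = f"svc-session-{ticket['client']}-{service}"
        svc_ticket = Sealed(self.keys[service], {"client": ticket["client"], "session": svc_session})
        return svc_ticket, Sealed(ticket["session"], {"session": svc_session})


class Service:
    def __init__(self, principal, key):
        self.principal, self.key = principal, key

    def ap_exchange(self, svc_ticket, authenticator):
        """Steps 4-5: accept the ticket, reply with the timestamp for mutual auth."""
        ticket = svc_ticket.open(self.key)
        auth = authenticator.open(ticket["session"])
        check_fresh(auth, ticket["client"])
        return ticket["client"], Sealed(ticket["session"], {"time": auth["time"]})


def access(kdc, service, client, client_key):
    tgt, sealed = kdc.as_exchange(client)
    tgs_session = sealed.open(client_key)["session"]  # fails if the password/key is wrong
    now = time.time()
    svc_ticket, sealed_svc = kdc.tgs_exchange(
        tgt, Sealed(tgs_session, {"client": client, "time": now}), service.principal)
    svc_session = sealed_svc.open(tgs_session)["session"]
    who, reply = service.ap_exchange(svc_ticket, Sealed(svc_session, {"client": client, "time": now}))
    if reply.open(svc_session)["time"] != now:
        raise PermissionError("server failed mutual authentication")
    return who


# Hypothetical principals and keys.
NFS = "nfs/nfs-server.foo.bar.org@FOO.BAR.ORG"
keys = {"krbtgt": "k-tgs", "jdoe@FOO.BAR.ORG": "k-jdoe", NFS: "k-nfs"}
kdc, nfs = KDC(keys), Service(NFS, keys[NFS])
print(access(kdc, nfs, "jdoe@FOO.BAR.ORG", "k-jdoe"))  # jdoe@FOO.BAR.ORG
```

Note that the client never sees the TGS or NFS long-term keys, and the NFS server never talks to the KDC: the sealed service ticket carries everything it needs.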
  16. Kerberizing a Hadoop Cluster
      • Need an MIT KDC or Active Directory
      • Appropriate realm definitions in place
      • Each Hadoop service needs a principal for each host it runs on
      • Cloudera Manager makes it easy to enable Kerberos
      • A recent Cloudera blog post has detailed instructions: http://blog.cloudera.com/blog/2015/03/how-to-quickly-configure-kerberos-for-your-apache-hadoop-cluster/
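"A principal for each host" means names of the form `service/fully.qualified.host@REALM`. Hadoop configuration can use the `_HOST` placeholder (for example `hdfs/_HOST@REALM`), which each daemon replaces with its own lowercased hostname. The sketch below expands such templates for a hypothetical service-to-host layout; the host names are made up and the realm is borrowed from the earlier example.

```python
# Hypothetical cluster layout: which service principals are needed on which hosts.
LAYOUT = {
    "hdfs": ["nn1.example.com", "dn1.example.com", "dn2.example.com"],
    "yarn": ["rm1.example.com", "dn1.example.com", "dn2.example.com"],
    "HTTP": ["nn1.example.com", "rm1.example.com"],  # SPNEGO for web endpoints
}
REALM = "FOO.BAR.ORG"


def expand_host(template, hostname):
    """Replace the _HOST placeholder the way Hadoop daemons do (hostname lowercased)."""
    service, rest = template.split("/", 1)
    host_part, realm = rest.split("@", 1)
    if host_part == "_HOST":
        host_part = hostname.lower()
    return f"{service}/{host_part}@{realm}"


def required_principals(layout, realm):
    """One principal per (service, host) pair."""
    return sorted(
        expand_host(f"{service}/_HOST@{realm}", host)
        for service, hosts in layout.items()
        for host in hosts
    )


for principal in required_principals(LAYOUT, REALM):
    print(principal)
```

Eight host/service pairs yield eight principals; on a real cluster this count grows quickly, which is one reason the slide points to Cloudera Manager to automate it.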
  17. Active Directory and Kerberos
      Active Directory
      • Manages users, groups, and services
      • Provides username / password authentication
      • Group membership determines service access
      Kerberos
      • Trusted and standard third party
      • Authenticated users receive "tickets"
      • "Tickets" gain access to services
      Flow: the user authenticates to AD (User [ssmith], Password [*****]) → the authenticated user gets a Kerberos ticket → the ticket grants access to services, e.g. Impala
  18. Automated Authentication with Cloudera Manager
      Direct-to-AD Kerberos Integration
      • Users authenticate directly against AD
      • Hadoop services defined directly in AD Kerberos
      • User access to Hadoop services controlled via AD groups
      Kerberos Configuration Wizard
      • Automates Kerberos configuration for existing Hadoop clusters, simplifying a tedious and error-prone process
      Added Tuning and Monitoring
      • Tune interrelated configuration for dual KDCs
      • Service monitoring through CM when Kerberos is enabled
  19. Access Security Requirements
      • Provide users access to the data needed to do their job
      • Centrally manage access policies
      • Leverage a role-based access control model built on AD
      Access: defining what users and applications can do with data. InfoSec concept: Authorization. Tools: HDFS ACLs, Apache Sentry, HBase ABAC
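HDFS ACLs follow the POSIX ACL model: the owner entry is checked first, then named-user entries, then the owning group and named groups, and finally "other". A mask caps what named users and group entries can grant, and a user who matches some group entry but is not granted access by it does not fall back to "other". A minimal sketch of that evaluation order (it ignores the HDFS superuser bypass; the class and file layout are hypothetical):

```python
from dataclasses import dataclass, field

READ, WRITE, EXECUTE = 4, 2, 1  # same bit values as rwx in a POSIX mode


@dataclass
class AclEntries:
    owner: str
    group: str
    owner_perm: int
    group_perm: int            # the owning-group entry
    other_perm: int
    mask: int = 0b111          # caps named users and all group entries
    named_users: dict = field(default_factory=dict)
    named_groups: dict = field(default_factory=dict)


def has(perm, want):
    return (perm & want) == want


def permitted(acl, user, user_groups, want):
    """POSIX ACL evaluation order: owner, named user, groups, other."""
    if user == acl.owner:
        return has(acl.owner_perm, want)
    if user in acl.named_users:
        return has(acl.named_users[user] & acl.mask, want)
    matched = False
    for group, perm in [(acl.group, acl.group_perm), *acl.named_groups.items()]:
        if group in user_groups:
            matched = True
            if has(perm & acl.mask, want):
                return True
    if matched:                # a matching group entry blocks fallback to "other"
        return False
    return has(acl.other_perm, want)


# Hypothetical /data/transactions: etl:finance rwxr-x---, plus user:sam:rwx, mask r-x
txn = AclEntries("etl", "finance", 0b111, 0b101, 0b000, mask=0b101, named_users={"sam": 0b111})
print(permitted(txn, "sam", set(), READ))    # True
print(permitted(txn, "sam", set(), WRITE))   # False: the mask removes w
```

The mask is what lets an administrator tighten every non-owner entry at once without editing each one.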
  20. RBAC and Centralized Authorization
      Manage data access by role instead of by individual user:
      • The Fraud Analyst role has read access to ALL transaction data
      • The Branch Teller role has read / write access to a very limited set of data
      • Relationships between users and roles are established via groups
      An RBAC policy is then uniformly enforced for all Hadoop services:
      • Provides unified authorization controls
      • As opposed to tools for managing numerous, service-specific policies
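The user → group → role → privilege chain can be sketched with plain dictionaries. The names below are hypothetical and modeled on the slide's Fraud Analyst and Branch Teller examples; the point is that users never hold privileges directly, so changing a group membership changes access everywhere.

```python
# Hypothetical policy data modeled on the slide's example.
user_groups = {"ssmith": {"fraud_analysts"}, "jteller": {"branch_tellers"}}
group_roles = {"fraud_analysts": {"fraud_analyst"}, "branch_tellers": {"branch_teller"}}
role_privileges = {
    "fraud_analyst": {("transactions", "read")},                         # read on ALL transaction data
    "branch_teller": {("branch_accounts", "read"), ("branch_accounts", "write")},
}


def roles_for(user):
    """Users inherit roles only through their groups."""
    return {role for group in user_groups.get(user, set())
            for role in group_roles.get(group, set())}


def allowed(user, obj, action):
    return any((obj, action) in role_privileges.get(role, set()) for role in roles_for(user))


print(allowed("ssmith", "transactions", "read"))   # True
print(allowed("jteller", "transactions", "read"))  # False
user_groups["jteller"].add("fraud_analysts")       # e.g. a group change in AD
print(allowed("jteller", "transactions", "read"))  # True
```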
  21. Unified Authorization with Apache Sentry
      Sentry provides unified authorization via fine-grained RBAC for Impala, Hive, Search, MapReduce, Pig, HDFS…
      Example: user Sam Smith → group "Fraud Analysts" → Sentry role "Fraud Analyst Role" → Sentry permission "Read access to ALL transaction data"
  22. Sentry and Active Directory Groups
      • Sentry can be configured to use AD to determine a user's group assignments
      • Group assignment changes in AD are automatically picked up, resulting in updated Sentry role assignments
      Example: Sam Smith → AD group "Fraud Analysts" → Sentry role "Fraud Analyst Role" → Sentry permission "Read access to ALL transaction data"
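"Automatically picked up" usually means "picked up once a cached lookup expires". Hadoop itself, for example, caches group lookups for `hadoop.security.groups.cache.secs` (300 seconds by default); the slide does not say how Sentry caches. The sketch below assumes a simple time-to-live cache in front of a hypothetical directory lookup function, with an injectable clock so the expiry is easy to see.

```python
import time


class GroupCache:
    """TTL cache in front of a directory lookup (e.g. an AD/LDAP group query)."""

    def __init__(self, lookup, ttl_secs=300.0, clock=time.monotonic):
        self.lookup = lookup          # hypothetical: user -> iterable of group names
        self.ttl = ttl_secs
        self.clock = clock
        self._entries = {}            # user -> (expiry time, groups)

    def groups(self, user):
        now = self.clock()
        hit = self._entries.get(user)
        if hit and hit[0] > now:
            return hit[1]             # still fresh: serve the cached answer
        fresh = frozenset(self.lookup(user))
        self._entries[user] = (now + self.ttl, fresh)
        return fresh


directory = {"ssmith": {"fraud_analysts"}}
fake_now = [0.0]
cache = GroupCache(lambda u: directory.get(u, set()), ttl_secs=300, clock=lambda: fake_now[0])

print(cache.groups("ssmith"))            # frozenset({'fraud_analysts'})
directory["ssmith"] = {"branch_tellers"}  # membership changed in AD
print(cache.groups("ssmith"))            # still the cached value
fake_now[0] = 301.0
print(cache.groups("ssmith"))            # frozenset({'branch_tellers'})
```

The TTL is the trade-off knob: shorter means faster revocation, longer means fewer directory queries.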
  23. Sentry Enforces Each Rule Across Hadoop Components
      Permission rules are specified by administrators (top-level and delegated), e.g. Rule 1: "Allow fraud analysts read access to the transaction table".
      Common enforcement code runs in each component for consistency: HiveServer2, Impala, Search, MapReduce, Pig, HDFS*, and apps such as Datameer, Platfora, etc.*
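"Common enforcement code" means every engine asks the same question the same way. The sketch below assumes a simplified Sentry-like model: privileges are granted on a scope path (server, database, table), a grant at a parent scope covers everything beneath it, and `ALL` implies `SELECT` and `INSERT`. Real Sentry also handles URIs, columns, and grant options; the role names and engine hooks here are hypothetical.

```python
# Hypothetical grants: role -> list of (scope path, action).
GRANTS = {
    "fraud_analyst": [(("server1", "sales", "transactions"), "SELECT")],
    "sales_admin": [(("server1", "sales"), "ALL")],
}
IMPLIES = {"ALL": {"ALL", "SELECT", "INSERT"}, "SELECT": {"SELECT"}, "INSERT": {"INSERT"}}


def implies(granted_scope, granted_action, scope, action):
    """A grant covers the request if it sits at or above it and its action implies the request."""
    return scope[: len(granted_scope)] == granted_scope and action in IMPLIES[granted_action]


def authorize(roles, scope, action):
    """The single check shared by every engine front-end."""
    return any(
        implies(g_scope, g_action, scope, action)
        for role in roles
        for g_scope, g_action in GRANTS.get(role, [])
    )


def hive_select(roles, db, table):    # hypothetical engine hook
    return authorize(roles, ("server1", db, table), "SELECT")


def impala_select(roles, db, table):  # hypothetical engine hook, same rule
    return authorize(roles, ("server1", db, table), "SELECT")


print(hive_select({"fraud_analyst"}, "sales", "transactions"))    # True
print(impala_select({"fraud_analyst"}, "sales", "customers"))     # False
```

Because both hooks delegate to `authorize`, Rule 1 cannot drift between Hive and Impala.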
  24. Visual Policy Management
  25. Visibility Security Requirements
      • Understand where report data came from and discover more data like it
      • Comply with policies for audit, data classification, and lineage
      • Centralize the audit repository; perform discovery; automate lineage
      Visibility: reporting on where data came from and how it's being used. InfoSec concept: Audit. Tools: Cloudera Navigator
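Centralizing audit starts with turning raw service logs into structured records. As one example, assuming the tab-separated `key=value` layout of the HDFS NameNode audit log (`allowed=`, `ugi=`, `ip=`, `cmd=`, `src=`, ...), the sketch below parses records and counts denied operations per user. The sample lines are hypothetical and built by a helper; this is not how Navigator ingests audits.

```python
from collections import Counter

PREFIX = "FSNamesystem.audit: "


def parse_audit_line(line):
    """Turn one audit record into a dict of its tab-separated key=value fields."""
    record = line.split(PREFIX, 1)[-1]
    fields = dict(part.split("=", 1) for part in record.rstrip("\n").split("\t") if "=" in part)
    fields["user"] = fields.get("ugi", "").split(" ")[0]  # "ssmith (auth:KERBEROS)" -> "ssmith"
    return fields


def denied_by_user(lines):
    """Count denied operations per user, e.g. to feed a central audit view."""
    return Counter(r["user"] for r in map(parse_audit_line, lines) if r.get("allowed") == "false")


def sample(allowed, user, cmd, src):
    """Build a hypothetical record in the assumed layout."""
    fields = [f"allowed={allowed}", f"ugi={user} (auth:KERBEROS)", "ip=/10.0.0.5",
              f"cmd={cmd}", f"src={src}", "dst=null", "perm=null", "proto=rpc"]
    return "2015-05-20 10:00:00,000 INFO " + PREFIX + "\t".join(fields)


log = [
    sample("true", "ssmith", "open", "/data/transactions/part-0"),
    sample("false", "jteller", "open", "/data/transactions/part-0"),
    sample("false", "jteller", "listStatus", "/data/transactions"),
]
print(denied_by_user(log))  # Counter({'jteller': 2})
```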
  26. Visibility through Cloudera Navigator: Data Management for an EDH
      Auditing and Access Management
      • Full audit and access history for HDFS, Impala, Hive, HBase, and Sentry
      • Review and verify HDFS permissions
      Metadata & Discovery
      • Easily discover, classify, and locate data to support governance and compliance
      Lineage & Provenance
      • Automatic collection and easy visualization of upstream and downstream data lineage
      Lifecycle Management
      • Policy-based data management
      Architecture: Cloudera Navigator sits over CDH (HDFS, HBase, Hive) and provides audit & access management, classification & discovery, lineage, and lifecycle management, backed by an enterprise metadata repository of business metadata and system metadata.
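Conceptually, "upstream and downstream lineage" is reachability in a directed graph whose edges say "this dataset was derived from that one". A minimal sketch with a hypothetical, hand-written edge list (not Navigator's API or data model): downstream is a breadth-first walk along the edges, upstream is the same walk on the reversed graph.

```python
from collections import deque

# Hypothetical lineage edges: dataset -> datasets derived from it.
EDGES = {
    "raw/transactions": ["hive.sales.transactions"],
    "hive.sales.transactions": ["impala.fraud_scores", "hive.sales.daily_totals"],
    "impala.fraud_scores": ["report/fraud_dashboard"],
    "hive.sales.daily_totals": [],
}


def reverse(edges):
    """Flip every edge so the walk goes toward sources instead of consumers."""
    rev = {}
    for src, dsts in edges.items():
        for dst in dsts:
            rev.setdefault(dst, []).append(src)
    return rev


def reachable(edges, start):
    """Breadth-first walk returning every node reachable from start."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


downstream = reachable(EDGES, "raw/transactions")
upstream = reachable(reverse(EDGES), "report/fraud_dashboard")
print(sorted(upstream))  # where did the dashboard's data come from?
```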
  28. Data Security Requirements
      • Perform analytics on regulated data
      • Encrypt data at rest and in motion, conform to key management policies, and protect data from root
      • Integrate with existing HSMs as part of the key management infrastructure
      Data: protecting data in the cluster from unauthorized visibility. InfoSec concept: Compliance. Tools: HDFS Encryption, Navigator Encrypt & Key Trustee
  29. Data: Protection in Hadoop
      Data in Motion ("Network Encryption")
      • SSL/TLS/HTTPS: web user interfaces, JDBC
      • SASL: most other paths (e.g. network RPC)
      • Centrally enabled and configured in Cloudera Manager
      Data at Rest ("Data Encryption")
      • HDFS Encryption
      • Cloudera Navigator
      • OS-level file system encryption
      • Certified partner solutions
      • Field-level encryption
      • Data masking or tokenization
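The last bullet, masking or tokenization, can be illustrated with the standard library. One common approach, assumed here rather than taken from any Cloudera product, is a keyed HMAC as a deterministic token (the same input always yields the same token, so joins and counts still work) plus display masking that keeps only the last four digits. The key is a hypothetical placeholder; in practice it would live in a key manager.

```python
import hashlib
import hmac

SECRET = b"hypothetical-tokenization-key"  # placeholder; keep real keys in a key manager


def tokenize_value(value):
    """Deterministic token: same input -> same token, so joins still work."""
    return "tok_" + hmac.new(SECRET, value.encode(), hashlib.sha256).hexdigest()[:16]


def mask(card_number):
    """Keep only the last four digits for display."""
    digits = "".join(ch for ch in card_number if ch.isdigit())
    return "*" * (len(digits) - 4) + digits[-4:]


print(tokenize_value("4111111111111111"))
print(mask("4111 1111 1111 1234"))  # ************1234
```

Unlike encryption, a keyed hash cannot be reversed; a tokenization service that must return originals would keep a token-to-value vault instead.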
  30. Encrypting Data in Motion (SSL/TLS, HTTPS, and SASL)
      • HDFS: SASL (RPC), SASL (Data Transfer Protocol)
      • YARN: SASL (RPC)
      • MapReduce: SASL (RPC), HTTPS (shuffle)
      • Flume: SSL (Avro RPC)
      • HiveServer2: SASL (Thrift), SASL (JDBC), SSL (JDBC, ODBC)
      • Impala: SSL
      • Search: SSL
      • Hue: HTTPS
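When these settings are managed by hand rather than through Cloudera Manager, the HDFS rows correspond to Hadoop properties such as `hadoop.rpc.protection` (`authentication`, `integrity`, or `privacy`, i.e. SASL QOP `auth`, `auth-int`, or `auth-conf`) in core-site.xml and `dfs.encrypt.data.transfer` in hdfs-site.xml. The sketch renders those properties in Hadoop's `<configuration>/<property>` XML layout and reads them back; the chosen values are one example, not a recommendation for every cluster.

```python
import xml.etree.ElementTree as ET

CORE_SITE = {
    "hadoop.security.authentication": "kerberos",
    "hadoop.rpc.protection": "privacy",    # authenticate, integrity-check, and encrypt RPC
}
HDFS_SITE = {
    "dfs.encrypt.data.transfer": "true",   # encrypt the DataNode data transfer protocol
}
SASL_QOP = {"authentication": "auth", "integrity": "auth-int", "privacy": "auth-conf"}


def render(properties):
    """Emit a Hadoop-style configuration document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def read(xml_text):
    """Parse the same layout back into a dict."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}


print(render(CORE_SITE))
```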
  31. Encrypting Data in Motion (continued)
      • ZooKeeper: SASL (RPC)
      • Sentry: SASL (RPC)
      • Oozie: HTTPS
      • Spark: SSL for the Akka and HTTP (broadcast and file server) protocols; no SSL support for the web UI or the block transfer service
      • Kafka: none
  32. HDFS Encryption
      • Available in CDH 5.3 / Hadoop 2.6 (HDFS-6134)
      • Supports specification of HDFS directories as "encryption zones"
      • All subsequent directory contents are encrypted
      • Multi-tenant encryption with tenant-specific keys
      • Complements Navigator Encrypt for metadata encryption
      • Key management via Navigator Key Trustee
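Encryption zones use envelope encryption. An administrator creates a zone key (e.g. `hadoop key create <key>`) and marks a directory as a zone (`hdfs crypto -createZone -keyName <key> -path <dir>`). For each new file, the NameNode obtains an encrypted data encryption key (EDEK) from the key management server and stores it with the file's metadata; the client asks the key server to decrypt it and encrypts the data itself. The toy below shows only who holds which key. Real HDFS uses AES-CTR; because the standard library has no AES, this sketch swaps in a SHA-256 counter-mode keystream that is NOT secure. All class names are hypothetical.

```python
import hashlib
import os

# Toy stream cipher (SHA-256 counter mode). Illustration only, not secure.


def keystream_xor(key, nonce, data):
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + nonce + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class ToyKMS:
    """Holds zone keys and never hands them out."""

    def __init__(self):
        self._zone_keys = {}

    def create_key(self, name):
        self._zone_keys[name] = os.urandom(32)

    def generate_edek(self, key_name):
        dek, nonce = os.urandom(32), os.urandom(16)  # fresh per-file DEK
        return {"key": key_name, "nonce": nonce,
                "edek": keystream_xor(self._zone_keys[key_name], nonce, dek)}

    def decrypt_edek(self, edek):
        return keystream_xor(self._zone_keys[edek["key"]], edek["nonce"], edek["edek"])


class ToyNameNode:
    """Stores each file's EDEK as metadata; never sees a plaintext DEK."""

    def __init__(self, kms):
        self.kms, self.zones, self.files = kms, {}, {}

    def create_zone(self, path, key_name):
        self.zones[path] = key_name

    def create_file(self, path):
        zone = next(z for z in self.zones if path.startswith(z + "/"))  # sketch: must be in a zone
        self.files[path] = {"edek": self.kms.generate_edek(self.zones[zone]), "iv": os.urandom(16)}
        return self.files[path]


def write_file(nn, kms, path, plaintext):
    meta = nn.create_file(path)
    dek = kms.decrypt_edek(meta["edek"])              # client asks the KMS to unwrap the DEK
    return keystream_xor(dek, meta["iv"], plaintext)  # only ciphertext reaches DataNodes


def read_file(nn, kms, path, ciphertext):
    meta = nn.files[path]
    return keystream_xor(kms.decrypt_edek(meta["edek"]), meta["iv"], ciphertext)


kms = ToyKMS()
kms.create_key("tenantA-key")                         # tenant-specific zone key
nn = ToyNameNode(kms)
nn.create_zone("/zones/tenantA", "tenantA-key")
blob = write_file(nn, kms, "/zones/tenantA/tx.csv", b"id,amount\n1,10.00\n")
print(read_file(nn, kms, "/zones/tenantA/tx.csv", blob))
```

Separating the zone keys (key server) from the EDEKs (NameNode) and the ciphertext (DataNodes) is what lets tenants use different keys and keeps any single component from decrypting data alone.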
  33. Open Source HDFS Encryption
      • Encryption for HDFS, HBase
      • No encryption for metadata, log files, ingest paths
      • No key management
      • Complicated, manual command-line configuration
      • Incomplete audit trail
      [Diagram: Cloudera Manager, Navigator, Impala, Hive, HDFS, HBase, and Sentry, with log files, ingest paths, and the metadata store left unencrypted; legend: encrypted data, encryption key]
  34. Compliance-Ready Encryption & Key Management
      Cloudera's solution:
      • ALL data encrypted: HDFS, HBase, metadata, log files, ingest paths
      • Enterprise key management via Navigator Key Trustee
      • Configuration support via Cloudera Manager
      • Audit integration with Cloudera Navigator
      • Optional root-of-trust integration with HSMs
      [Diagram: the same components as the previous slide, all encrypted, with Navigator Key Trustee managing the encryption keys]
  35. Navigator Encrypt
      Transparent layer between application and file system
      • Compliance-ready
      • Massively scalable
      • High performance: optimized for Intel
      • Separation of duties
      • Key management with Navigator Key Trustee
  36. Navigator Key Trustee
      "Virtual safe-deposit box" for managing encryption keys and other Hadoop security artifacts
      • Separates keys from encrypted data
      • Centralized management with audit controls
      • Integration with HSMs from Thales, RSA, and SafeNet
      • Roadmap: management of SSL certificates, SSH keys, tokens, passwords, Kerberos keytab files, and more
  37. What's Next?
      • Log redaction: delivered as part of CDH 5.4
      • Highly available authorization
      • Unified credential management
      • Simplified wire encryption
      • Attribute-based access controls & "follow the data" security
      • A single fine-grained permissions rule (such as row- and column-level) enforced across all access paths, including MR and Spark in addition to Hive and Impala
      • Continued Cloudera & Intel efforts
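Log redaction boils down to applying search-and-replace rules to each log line before it is written. The slide does not give CDH's rule format or defaults, so the sketch below assumes hypothetical regex rules for card-like numbers, US SSN-like values, and `password=` fields.

```python
import re

# Hypothetical (pattern, replacement) rules; CDH's actual rules may differ.
RULES = [
    (re.compile(r"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b"), "XXXX-XXXX-XXXX-XXXX"),  # card-like
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "XXX-XX-XXXX"),                          # SSN-like
    (re.compile(r"(password\s*[=:]\s*)\S+", re.IGNORECASE), r"\1********"),         # keep the key
]


def redact(line):
    """Apply every rule in order and return the sanitized line."""
    for pattern, replacement in RULES:
        line = pattern.sub(replacement, line)
    return line


print(redact("user=ssmith password=hunter2 card=4111 1111 1111 1111"))
# user=ssmith password=******** card=XXXX-XXXX-XXXX-XXXX
```

Rules run in order, so a broad early pattern can hide data a later, more specific rule would have labeled differently; ordering is part of the policy.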
  38. Demo Time
  39. Thank You | @skpabba
