Hadoop and Data Access Security
 

The fundamentals and best practices of securing your Hadoop cluster are top of mind today. In this session, we will examine and explain the components, tools, and frameworks used in Hadoop for authentication, authorization, audit, and encryption of data and processes. See how the latest innovations can let you securely connect more data to more users within your organization.

Presentation Transcript

  • Securing the Hadoop Ecosystem
  • Hadoop Security and Compliance Challenges
    • History: security was not a priority for early Hadoop adopters such as Yahoo! and Facebook; it is now.
    • Data concentration: the quantity and diversity of data create compliance challenges.
    • Flexibility of the Hadoop architecture:
      • Many paths for data in, out, and through processing
      • Data accessed at different granularities, from fields to files
      • ELT: sensitive data “discovery” occurs after the data arrives
  • Cloudera Has Led Investment in Security
    • Authentication: first Hadoop distribution to offer strong authentication throughout
    • Encryption: first Hadoop distribution to support encryption on the wire
    • Audit: only Hadoop distribution to support audit histories for all data objects and access paths; single point for log capture and audit
    • Authorization: founded the Apache Sentry project, along with Oracle and Lab41, to manage fine-grained permissions
    • Automation: Cloudera Manager automates security configuration and LDAP/AD integration
  • Case Study: Finance and Banking (Fraud and Purchasing Behavior Analysis)
    • Identify patterns in financially sensitive, PCI, and PII data
    • Before: unable to build applications on Hadoop; forced to use other systems, to greatly limit Hadoop access, or to forgo analysis due to privacy concerns
    • Now: broad analysis capabilities with Impala for a large user population, secured by Sentry
  • Enterprise Security in Hadoop: Overview
    • Four functional areas secure a Hadoop cluster used by users, applications, and operators: Perimeter, Data, Access, and Visibility
  • Defining the Functional Areas
    • Perimeter: guarding access to the cluster itself. Technical concepts: authentication, network isolation
    • Data: protecting data in the cluster from unauthorized visibility. Technical concepts: encryption, tokenization, data masking
    • Access: defining what users and applications can do with data. Technical concepts: permissions, authorization
    • Visibility: reporting on where data came from and how it is being used. Technical concepts: auditing, lineage
  • Enabling Enterprise Security
    • Perimeter (authentication, network isolation): Kerberos | AD/LDAP
    • Data (encryption, tokenization, data masking): Native | Certified Partners
    • Access (permissions, authorization): Sentry
    • Visibility (auditing, lineage): Cloudera Navigator
  • Perimeter: Authentication in Hadoop
    • Kerberos
      • Provably strong authentication between all Hadoop services and (optionally) to end points
      • Cloudera Manager hides the complexity
    • LDAP/AD
      • Username/password
      • Option for Hue, Hive Metastore, Impala connectors, and Cloudera Manager admin logins
    • SAML
      • Single sign-on (SSO) for the listed options
      • Kerberos clients are no longer required on most user end points
  • Authentication Options and Coverage
    [Diagram: services HDFS (DataNode, NameNode), YARN (ResourceManager, ApplicationMaster), Impala (Impala daemon, StateStore), MapReduce (JobTracker, TaskTracker), other services (Oozie, Search, etc.), and third-party gateways; client applications (Pig, Hive, Hue, etc.). Coverage layers: “End-to-End” Kerberos, “Core” Kerberos, and “Edge” AD/LDAP/SAML.]
  • IT Integration: Kerberos
    • Users don’t want Yet Another Credential
    • Corporate IT doesn’t want to provision and maintain thousands of service principals and keytabs
    • Solution: local KDC + one-way trust
      • Run an MIT Kerberos KDC in the cluster
      • Put all service principals there
      • Set up a one-way trust in which the local KDC trusts the central corporate realm
      • Normal user credentials can then be used to access Hadoop
    • Recommended: use Cloudera Manager
      • To properly tune inter-related configuration knobs
      • To manage creation and distribution of principals and keytabs
      • To preserve service monitoring with Kerberos security enabled
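  • IT Integration: Kerberos + LDAP
    [Diagram: the Hadoop cluster runs a local KDC (MIT Kerberos) holding service principals such as hdfs/host1@HADOOP.EXAMPLE.COM and yarn/host2@HADOOP.EXAMPLE.COM, …; a central Active Directory holds user principals such as user@EXAMPLE.COM, …; a cross-realm trust links the two realms; the NameNode (NN) and JobTracker (JT) use LDAP group mapping.]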
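The one-way trust only helps because Hadoop maps each authenticated Kerberos principal to a short local name before it checks permissions or group membership. The Python sketch below is a simplified, hypothetical illustration of that step: it accepts principals from the two example realms on the slide and keeps only the primary component. Real clusters express this with hadoop.security.auth_to_local rules, which are richer than what is modeled here; TRUSTED_REALMS and the function names are assumptions for illustration.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str             # e.g. "hdfs" or "user"
    instance: Optional[str]  # host part of a service principal, else None
    realm: str               # e.g. "HADOOP.EXAMPLE.COM"


def parse_principal(name: str) -> Principal:
    """Split 'primary[/instance]@REALM' into its parts."""
    if "@" not in name:
        raise ValueError(f"principal has no realm: {name!r}")
    left, realm = name.rsplit("@", 1)
    primary, _, instance = left.partition("/")
    return Principal(primary, instance or None, realm)


# Hypothetical realms: the cluster's local MIT KDC realm, plus the corporate
# Active Directory realm that the local KDC trusts (one-way trust).
TRUSTED_REALMS = frozenset({"HADOOP.EXAMPLE.COM", "EXAMPLE.COM"})


def to_short_name(name: str) -> str:
    """Map a principal from a trusted realm to its short (local) user name.

    Simplified stand-in for auth_to_local rules: drop instance and realm.
    """
    principal = parse_principal(name)
    if principal.realm not in TRUSTED_REALMS:
        raise PermissionError(f"realm not trusted: {principal.realm}")
    return principal.primary


if __name__ == "__main__":
    for p in ("hdfs/host1@HADOOP.EXAMPLE.COM", "user@EXAMPLE.COM"):
        print(p, "->", to_short_name(p))
```

Both the service principal and the corporate user resolve to short names, while a principal from any other realm is rejected. Hadoop's built-in DEFAULT rule only strips the cluster's default realm, so cross-realm setups typically add explicit rules for the trusted corporate realm.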
  • Network Access Management
    • Use Hue as a web front end to both Hadoop and Oozie, controlling access through the browser
    • HTTP proxy servers:
      • Oozie: MapReduce, Pig, and Hive jobs
      • HttpFS: hadoop fs operations exposed over HTTP
      • HBase REST server: HBase reads
    • Secure configuration: Oozie, Hue, and HttpFS front ends co-located to act as a network bridge
    • Hue supports AD/LDAP-based authentication instead of Kerberos, for client simplicity
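HttpFS serves the WebHDFS REST API, so a client outside the network boundary only needs HTTP access to the gateway rather than to every DataNode. As a hedged sketch, the Python function below builds such a request URL; the host name is hypothetical, 14000 is HttpFS's default port, and LISTSTATUS and OPEN are standard WebHDFS operations. The user.name parameter applies only to simple (pseudo) authentication; on a Kerberized cluster the client authenticates with SPNEGO instead.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def httpfs_url(host: str, path: str, op: str, *, port: int = 14000,
               scheme: str = "http", user: Optional[str] = None) -> str:
    """Build a WebHDFS-style REST URL served by an HttpFS gateway.

    `user` adds the user.name parameter used by simple (pseudo)
    authentication; with Kerberos the client would use SPNEGO and omit it.
    Use scheme="https" when the gateway is configured for SSL.
    """
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    params = {"op": op}
    if user is not None:
        params["user.name"] = user
    return f"{scheme}://{host}:{port}/webhdfs/v1{quote(path)}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical gateway host and user.
    print(httpfs_url("gateway.example.com", "/user/alice", "LISTSTATUS",
                     user="alice"))
```

Only the URL is built here; sending it (and handling SPNEGO) is left to an HTTP client.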
  • Data: Protection in Hadoop
    • Data in motion (“network encryption”)
      • SASL: network RPC
      • SSL: MapReduce shuffle
      • SSL: web-based user and administration tools
      • SSL: JDBC
      • HDFS data transfer protocol
    • Data at rest (“data encryption”)
      • Certified partner solutions
      • Field-level encryption
      • Data masking or tokenization
      • OS-level file system encryption
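Masking and tokenization are named here without detail. As a generic, hedged illustration (not a Cloudera or certified-partner implementation), the Python sketch below masks all but the last four digits of a card number and derives a deterministic token with HMAC-SHA256, which is one common form of irreversible tokenization; vault-based schemes that can map tokens back to values work differently. The function names and the key are hypothetical.

```python
import hashlib
import hmac


def mask_digits(value: str, visible: int = 4) -> str:
    """Masking: replace all but the last `visible` digits with '*'."""
    digits = "".join(ch for ch in value if ch.isdigit())
    visible = max(0, min(visible, len(digits)))
    cut = len(digits) - visible
    return "*" * cut + digits[cut:]


def tokenize(value: str, key: bytes) -> str:
    """Tokenization via keyed hashing (HMAC-SHA256).

    Deterministic: the same value and key always yield the same token, so
    tokenized columns can still be joined and grouped, while the mapping
    itself is one-way (unlike encryption, there is no decrypt step).
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    # Hypothetical key; a real deployment would fetch it from a key manager.
    key = b"example-key-from-a-key-manager"
    print(mask_digits("4111 1111 1111 1111"))
    print(tokenize("4111111111111111", key)[:16])
```

Determinism is the design trade-off: it keeps tokenized data usable for analysis, but it also means low-entropy values such as card numbers must be protected by keeping the key secret.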
  • Prior State of Authorization: Two Sub-Optimal Choices for SQL on Hadoop
    • Insecure advisory authorization
      • Users could grant themselves permissions
      • Intended to prevent accidental deletion of data
      • Problem: did not guard against malicious users
      • Problem: only worked with Hive
    • HDFS impersonation
      • Data was protected only at the file level, by HDFS permissions
      • Problem: file-level protection is not granular enough
      • Problem: lacked flexibility; not role-based
  • Sentry: Key Capabilities
    • Fine-grained authorization: specify security for servers, databases, tables, views, and search indices
    • Role-based authorization
      • SELECT privilege on views and tables
      • INSERT privilege on tables
      • TRANSFORM privilege on servers
      • ALL privilege on the server, databases, tables, and views
      • ALL privilege is needed to create or modify schema
    • Multitenant administration
      • Separate policies for each database/schema
      • Policies can be maintained by separate admins
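To make the role-based model concrete, the Python sketch below encodes one reading of these capabilities: a privilege granted on an object also applies to the objects beneath it (server, then database, then table), "*" matches any object at a level, and ALL implies the narrower SELECT and INSERT actions. It is a simplified illustration, not Sentry's policy engine; the Privilege type and implies function are hypothetical, and TRANSFORM and view handling are omitted.

```python
from typing import NamedTuple, Tuple


class Privilege(NamedTuple):
    # Object path from the top of the hierarchy, e.g.
    # ("server1",), ("server1", "db1") or ("server1", "db1", "sales");
    # "*" matches any object at that level.
    scope: Tuple[str, ...]
    action: str  # "SELECT", "INSERT" or "ALL"


def _covers(granted: Tuple[str, ...], requested: Tuple[str, ...]) -> bool:
    """True if `granted` names the requested object or one of its parents."""
    if len(granted) > len(requested):
        return False
    return all(g in ("*", r) for g, r in zip(granted, requested))


def implies(granted: Privilege, requested: Privilege) -> bool:
    """Does holding `granted` allow the `requested` operation?"""
    if not _covers(granted.scope, requested.scope):
        return False
    return granted.action == "ALL" or granted.action == requested.action


if __name__ == "__main__":
    admin = Privilege(("server1",), "ALL")
    reader = Privilege(("server1", "db1", "*"), "SELECT")
    print(implies(admin, Privilege(("server1", "db1", "sales"), "INSERT")))
    print(implies(reader, Privilege(("server1", "db1", "sales"), "INSERT")))
```

A table-level grant never implies anything on its parent database, which is why creating or modifying schema needs ALL at the database or server level.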
  • Sentry Architecture
    [Diagram: a binding layer (Impala, HiveServer2, Search) calls the authorization provider interface; policy engines (Impala, Hive, Search) handle evaluation and validation; a policy provider interface handles parsing of policies stored in a file (local FS/HDFS) or a database.]
  • SQL Query Execution Flow
    • Parse: validate SQL grammar
    • Build: construct the statement tree
    • Check: validate statement objects; the first check is authorization (Sentry)
    • Plan: forward to the execution planner
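The ordering in this flow matters: authorization is checked once the statement's objects are known and before any execution plan is built, so a denied query never reaches the planner. A hypothetical Python sketch of that ordering follows; the toy regex grammar (two statement shapes only), the is_authorized callback, and the PLAN string are placeholders, not Hive or Impala internals.

```python
import re
from typing import Callable, NamedTuple


class Statement(NamedTuple):
    action: str  # "SELECT" or "INSERT"
    table: str   # fully qualified "db.table"


# Toy grammar covering two statement shapes only (hypothetical).
_GRAMMAR = {
    "SELECT": re.compile(r"\s*SELECT\s+.+?\s+FROM\s+(\w+\.\w+)\b", re.I | re.S),
    "INSERT": re.compile(r"\s*INSERT\s+INTO\s+(\w+\.\w+)\b", re.I),
}


def parse(sql: str) -> Statement:
    """Parse + Build: validate the grammar and produce a statement object."""
    for action, pattern in _GRAMMAR.items():
        match = pattern.match(sql)
        if match:
            return Statement(action, match.group(1).lower())
    raise SyntaxError(f"unsupported statement: {sql!r}")


def run_query(sql: str, is_authorized: Callable[[str, str], bool]) -> str:
    stmt = parse(sql)
    # Check: authorization happens before any planning or execution.
    if not is_authorized(stmt.action, stmt.table):
        raise PermissionError(f"{stmt.action} on {stmt.table} denied")
    # Plan: stand-in for handing the statement to the execution planner.
    return f"PLAN {stmt.action} {stmt.table}"


if __name__ == "__main__":
    allowed = {("SELECT", "sales.orders")}
    print(run_query("SELECT * FROM sales.orders",
                    lambda action, table: (action, table) in allowed))
```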
  • Multitenant Security
    Global policy file:
      [groups]
      admin_group = admin_role
      dep1_admin = uri_role
      [roles]
      admin_role = server=server1
      uri_role = server=server1->uri=hdfs://ha-nn-uri/data
      [databases]
      db1 = hdfs://ha-nn-uri/user/hive/sentry/db1.ini
    Per-database policy file (db1.ini):
      [groups]
      dep1_admin = db1_admin_role
      dep1_analyst = db1_read_role
      [roles]
      db1_admin_role = server=server1->db=db1
      db1_read_role = server=server1->db=db1->table=*->action=select
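In this model, resolution runs from a user's groups (typically supplied by the Hadoop group mapping, for example from LDAP) to roles and then to privilege strings, and the per-database file contributes its own groups and roles. The Python sketch below reproduces both policy files and resolves them with the standard configparser module. It is an illustrative assumption about the lookup, not Sentry's parser: loading per-database files from the [databases] paths and validating their contents are left out, the function names are hypothetical, and the uri_role line uses Sentry's server=...->uri=... privilege form.

```python
import configparser
from typing import Iterable, List, Set

GLOBAL_POLICY = """
[groups]
admin_group = admin_role
dep1_admin = uri_role

[roles]
admin_role = server=server1
uri_role = server=server1->uri=hdfs://ha-nn-uri/data

[databases]
db1 = hdfs://ha-nn-uri/user/hive/sentry/db1.ini
"""

DB1_POLICY = """
[groups]
dep1_admin = db1_admin_role
dep1_analyst = db1_read_role

[roles]
db1_admin_role = server=server1->db=db1
db1_read_role = server=server1->db=db1->table=*->action=select
"""


def load_policy(text: str) -> configparser.ConfigParser:
    # Only '=' separates keys from values (values contain ':'), no '%'
    # interpolation, and group/role names keep their case.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str
    parser.read_string(text)
    return parser


def split_list(value: str) -> List[str]:
    return [item.strip() for item in value.split(",") if item.strip()]


def privileges_for(groups: Iterable[str],
                   policies: Iterable[configparser.ConfigParser]) -> Set[str]:
    """Resolve a user's groups -> roles -> privilege strings across files."""
    groups = list(groups)  # iterated once per policy file
    privileges: Set[str] = set()
    for policy in policies:
        for group in groups:
            for role in split_list(policy["groups"].get(group, "")):
                privileges.update(split_list(policy["roles"].get(role, "")))
    return privileges


if __name__ == "__main__":
    files = [load_policy(GLOBAL_POLICY), load_policy(DB1_POLICY)]
    print(sorted(privileges_for(["dep1_admin"], files)))
```

A dep1_admin user thus collects the URI privilege from the global file and database-level ALL on db1 from the per-database file, while dep1_analyst only receives SELECT on the tables of db1.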
  • Apache Ecosystem and Sentry
    • Inline support in Cloudera Impala
    • Extensibility plug-in for Apache HiveServer2
    • Inline support in Cloudera Search
    • Complementary security with HDFS ACLs
  • Access: Authorization in Hadoop
    • File ACL: permissions at file-level granularity
      • HDFS POSIX-style permissions: u/g/o
      • Access Control Lists (ACLs)
    • Data RBAC: permissions on tables, views, and indices
      • Sentry for HiveServer2, Impala, and Search
    • Admin RBAC and App and Workflow controls: HBase, Oozie, MapReduce; Cloudera Manager, Hue
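At the file level, HDFS evaluates POSIX-style permissions much like a Unix file system: the owner, group, or other bits are chosen by the first class that matches the user, and only those bits are consulted. The Python sketch below models that rule. ACL entries, which add named users and groups on top of these bits, are not modeled, and the superuser name hdfs is an assumption (HDFS treats the identity running the NameNode as the superuser); the type and function names are hypothetical.

```python
from typing import AbstractSet, NamedTuple

READ, WRITE, EXECUTE = 4, 2, 1


class FileStatus(NamedTuple):
    owner: str
    group: str
    mode: int  # permission bits, e.g. 0o750 for rwxr-x---


def check_access(status: FileStatus, user: str, user_groups: AbstractSet[str],
                 wanted: int, superuser: str = "hdfs") -> bool:
    """POSIX-style u/g/o check: only the first matching class is consulted."""
    if user == superuser:  # the superuser bypasses permission checks
        return True
    if user == status.owner:
        bits = (status.mode >> 6) & 0o7
    elif status.group in user_groups:
        bits = (status.mode >> 3) & 0o7
    else:
        bits = status.mode & 0o7
    return (bits & wanted) == wanted


if __name__ == "__main__":
    st = FileStatus("alice", "analysts", 0o750)
    print(check_access(st, "bob", {"analysts"}, READ))   # group r-x allows read
    print(check_access(st, "bob", {"analysts"}, WRITE))  # group has no write
```

This coarse, per-file model is exactly why the earlier "HDFS impersonation" approach was not granular enough for SQL workloads.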
  • Visibility: Cloudera Navigator
    • Audit & access control: maintain a full audit history; ensure appropriate permissions and report on data access for compliance
    • Discovery & exploration: find out what data is available and what it looks like
    • Lineage: trace data back to its original source
    • Lifecycle management: migrate data based on policies
    [Diagram: Cloudera’s Enterprise Data Hub (storage for any type of data; unified, elastic, resilient, secure) with batch processing (MapReduce), analytic SQL (Impala), search engine (Solr), machine learning (Spark), stream processing (Spark Streaming), workload management (YARN), filesystem (HDFS), online NoSQL (HBase), data management (Cloudera Navigator), system management (Cloudera Manager), security (Sentry), and third-party apps.]
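Lineage, as described here, means following a dataset back through whatever produced it until reaching data with no recorded upstream inputs. The Python sketch below shows that traversal over a small, made-up lineage graph; the dataset names echo the fraud-analysis case study but are hypothetical. Navigator gathers lineage metadata from cluster activity itself, and this code neither uses nor imitates its API.

```python
from typing import Dict, List, Set

# Hypothetical lineage graph: each dataset maps to the datasets it was
# derived from (its direct inputs).
LINEAGE: Dict[str, List[str]] = {
    "fraud_report": ["scored_txns"],
    "scored_txns": ["txns_clean", "customer_profiles"],
    "txns_clean": ["raw_card_txns"],
}


def original_sources(dataset: str, inputs: Dict[str, List[str]]) -> Set[str]:
    """Walk the lineage graph upstream; datasets with no recorded inputs are
    treated as original sources."""
    sources: Set[str] = set()
    seen: Set[str] = set()
    stack = [dataset]
    while stack:
        node = stack.pop()
        if node in seen:  # shared ancestors are visited only once
            continue
        seen.add(node)
        parents = inputs.get(node, [])
        if not parents:
            sources.add(node)
        stack.extend(parents)
    return sources


if __name__ == "__main__":
    print(sorted(original_sources("fraud_report", LINEAGE)))
```

The seen set keeps the walk linear even when several derived datasets share an upstream source, which is common once many jobs read the same raw data.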
  • Why Navigator?
    1. Lots of data landing in Cloudera Enterprise
      • Huge quantities
      • Many different sources, structured and unstructured
      • Varying levels of sensitivity
    2. Many users working with the data
      • Administrators and compliance officers
      • Analysts and data scientists
      • Business users
    3. Need to effectively control and consume data
      • Get visibility and control over the environment
      • Discover and explore data
  • Cloudera 5: Enabling the Enterprise Data Hub
    • Open source, scalable, flexible, cost-effective: ✔
    • Managed, open architecture, secure and governed: ✖ before, ✔ with Cloudera 5
    [Diagram: Cloudera’s Enterprise Data Hub component stack, with Sentry providing security.]