Data Security for Big Data
Jianwei Li
Role-based access control and unified security management for big data
1.
Data Access Security in Hadoop: Apache Sentry and RecordService
Jianwei Li (jarred@cloudera.com)
© Cloudera, Inc. All rights reserved.
2.
Agenda
• Data Access Security in Hadoop
• Sentry
• RecordService
3.
Hadoop Security Pillars: Authentication, Authorization, Audit, and Compliance
• Perimeter: guarding access to the cluster itself. Technical concepts: authentication, network isolation
• Access: defining what users and applications can do with data. Technical concepts: permissions, authorization
• Visibility: reporting on where data came from and how it's being used. Technical concepts: auditing, lineage
• Data: protecting data in the cluster from unauthorized visibility. Technical concepts: encryption, tokenization, data masking
Products shown: Cloudera Manager; Apache Sentry & RecordService; Cloudera Navigator; Navigator Encrypt & Key Trustee | Partners
4.
Sentry & RecordService
[Platform diagram]
• PROCESS, ANALYZE, SERVE: BATCH (Spark, Hive, Pig, MapReduce); STREAM (Spark); SQL (Impala); SEARCH (Solr); SDK (Kite)
• UNIFIED SERVICES: RESOURCE MANAGEMENT (YARN); SECURITY (Sentry, RecordService)
• STORE: FILESYSTEM (HDFS); RELATIONAL (Kudu); NoSQL (HBase)
• INTEGRATE: STRUCTURED (Sqoop); UNSTRUCTURED (Kafka, Flume)
• OPERATIONS: Cloudera Manager, Cloudera Director
• DATA MANAGEMENT: Cloudera Navigator, Encrypt and KeyTrustee, Optimizer
5.
Authorization Mechanisms in Hadoop
• POSIX-style permissions on files and directories
  • Read, Write, Execute
  • Owner, group, other
• Access Control Lists (ACLs) for management of services and resources
  • Set different permissions for specific named users or named groups
  • hdfs dfs -setfacl [-R] [-b|-k -m|-x <acl_spec> <path>]|[--set <acl_spec> <path>]
• Role-Based Access Control (RBAC) for certain services with advanced access controls to data
  • Sentry
  • RecordService
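The owner/group/other check combined with a named-user ACL entry can be sketched as follows. This is a simplified illustration, not HDFS's actual algorithm: it ignores the ACL mask and named-group entries, and `FileStatus`, `user_acl` and `is_allowed` are hypothetical names introduced here.

```python
from dataclasses import dataclass, field

READ, WRITE, EXECUTE = 4, 2, 1  # rwx permission bits

@dataclass
class FileStatus:
    owner: str
    group: str
    mode: int  # POSIX-style mode, e.g. 0o640
    # Hypothetical named-user ACL entries: user name -> permission bits
    user_acl: dict = field(default_factory=dict)

def is_allowed(st, user, groups, wanted):
    """Pick the first matching class (owner, named user, group, other)."""
    if user == st.owner:
        bits = (st.mode >> 6) & 7
    elif user in st.user_acl:
        bits = st.user_acl[user]
    elif st.group in groups:
        bits = (st.mode >> 3) & 7
    else:
        bits = st.mode & 7
    return (bits & wanted) == wanted

status = FileStatus("alice", "finance", 0o640, {"carol": READ})
print(is_allowed(status, "bob", {"finance"}, READ))
```

The named-user entry lets `carol` read the file without being added to the `finance` group, which is the case plain POSIX permissions cannot express.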
6.
Apache Sentry
7.
Sentry Overview
• Apache Sentry is an authorization module for Hadoop
• Provides the ability to control and enforce access to data and/or privileges on data for authenticated users
• Apache licensed; an ASF Incubator project
• Supports ease of administration through role-based authorization (RBAC)
• It currently works out of the box with
  • Hive/HCatalog
  • Apache Solr
  • Impala
• More to come (e.g. HBase, Kudu)
8.
Sentry and Hadoop Components
9.
Sentry Architecture
• Sentry Server: The Sentry RPC server manages the authorization metadata. It supports interfaces to securely retrieve and manipulate the metadata.
• Data Engine: A data processing application, such as Hive or Impala, that needs to authorize access to data or metadata resources. The data engine loads the Sentry plugin, and all client requests for accessing resources are intercepted and routed to the Sentry plugin for validation.
• Sentry Plugin: The Sentry plugin runs in the data engine. It offers interfaces to manipulate authorization metadata stored in the Sentry server, and includes the authorization policy engine that evaluates access requests using the authorization metadata retrieved from the server.
10.
Sentry Components
• Bindings – extract access requests from the client and pass them to the policy engine
• Policy engine – reconciles access requests with access policies
• Policy provider – provides a common interface to the rules database
  • File-based – deprecated except for Solr
  • Database-based – matches RDBMS syntax
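How the three components hand work to one another can be sketched roughly as below. This is a toy model under stated assumptions, not Sentry's code: the class and function names (`InMemoryPolicyProvider`, `PolicyEngine`, `bind`) are hypothetical, and the rules live in a dictionary instead of a file or database.

```python
class InMemoryPolicyProvider:
    """Stand-in for the rules database: group -> set of (privilege, table)."""

    def __init__(self, rules):
        self._rules = rules

    def privileges_for(self, groups):
        granted = set()
        for group in groups:
            granted |= self._rules.get(group, set())
        return granted


class PolicyEngine:
    """Reconciles access requests with the policies from the provider."""

    def __init__(self, provider):
        self._provider = provider

    def authorize(self, groups, requests):
        granted = self._provider.privileges_for(groups)
        # Every object touched by the query must be covered by a grant.
        return all(request in granted for request in requests)


def bind(action, tables):
    """Binding: turn an already-parsed query into access requests."""
    return [(action, table) for table in tables]


provider = InMemoryPolicyProvider(
    {"analysts": {("select", "sales"), ("select", "customers")}}
)
engine = PolicyEngine(provider)
print(engine.authorize({"analysts"}, bind("select", ["sales", "customers"])))
```

Keeping the provider behind a small interface is what lets the file-based and database-based stores be swapped without touching the engine.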
11.
Sentry Policy Store & Service
• Persists the role-to-privilege and group-to-role mappings in an RDBMS
• Provides programmatic APIs to create, query, update and delete them
• Enables various Sentry clients to retrieve and modify the privileges concurrently and securely
• Supports Kerberos authentication
12.
Sentry/Hive Integration: Query authorization
• Done with HiveServer2 via a plug-in
• Performed after the query is successfully compiled
• The plug-in gets the list of objects the query is trying to access
• Converts this list into an authorization request
• The user is either allowed or denied
13.
Sentry/Hive Integration: Changing privileges
Same as above. If approved:
• Hive generates a Sentry-specific task
• This task invokes the Sentry store client
• The client sends an RPC request to the Sentry service to make the authorization policy changes
14.
Sentry/Impala Integration
• Similar to Hive
• Catalogd caches and distributes Sentry policy changes across all Impalad nodes
• Authorization is faster because requests are local to each Impalad
15.
Synchronizing HDFS ACLs and Sentry Permissions
Maps Sentry privileges to HDFS ACLs:
• SELECT privilege -> Read access on the file
• INSERT privilege -> Write access on the file
• ALL privilege -> Read and Write access on the file
The NameNode loads a Sentry plugin that caches Sentry privileges as well as Hive metadata.
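The privilege-to-permission mapping above can be sketched as a small lookup. This is an illustration only: the function name `acl_entry` is hypothetical, the output follows the `group:<name>:<rwx>` form accepted by `hdfs dfs -setfacl`, and the real NameNode plugin may derive additional bits that this sketch does not model.

```python
# Mapping taken from the slide: SELECT -> read, INSERT -> write, ALL -> both.
PRIVILEGE_TO_PERMS = {
    "SELECT": {"r"},
    "INSERT": {"w"},
    "ALL": {"r", "w"},
}

def acl_entry(group, privileges):
    """Build one group ACL entry from the Sentry privileges a group holds."""
    perms = set()
    for privilege in privileges:
        perms |= PRIVILEGE_TO_PERMS[privilege.upper()]
    rwx = "".join(c if c in perms else "-" for c in "rwx")
    return f"group:{group}:{rwx}"

print(acl_entry("finance-department", ["SELECT"]))
```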
16.
The actors that play a part in Sentry authorization
• Resource – Server, Database, Table or URI
• Privileges – Select, Insert
• Roles – collections of privileges
• Users and Groups
17.
User Identity and Group Mapping
• User management: Active Directory, MIT Kerberos
• Group mapping:
  • System Security Services Daemon (SSSD)
  • Linux OS with LDAP
    • Samba, Centrify, Winbind…
  • Active Directory/LDAP
    • hadoop.security.group.mapping -> org.apache.hadoop.security.LdapGroupsMapping
  • Manual configuration in the OS
    • useradd, newgrp
18.
Group and Role Mapping
• Groups
  • Alice -> finance-department
  • Bob -> finance-department, finance-manager
• Role mapping:
  • "Analyst" role: "select" on the "Customers" and "Sales" tables
  • Grant the "Analyst" role to "finance-department"
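The user -> group -> role -> privilege chain from this slide can be walked in a few lines. The data mirrors the Alice/Bob example; the dictionary layout and the `can` helper are assumptions made for illustration, not Sentry's storage model.

```python
# Data from the slide's example.
USER_GROUPS = {
    "Alice": {"finance-department"},
    "Bob": {"finance-department", "finance-manager"},
}
GROUP_ROLES = {"finance-department": {"Analyst"}}
ROLE_PRIVILEGES = {"Analyst": {("select", "Customers"), ("select", "Sales")}}

def can(user, action, table):
    """True if any role reachable through the user's groups grants the action."""
    for group in USER_GROUPS.get(user, ()):
        for role in GROUP_ROLES.get(group, ()):
            if (action, table) in ROLE_PRIVILEGES.get(role, ()):
                return True
    return False

print(can("Alice", "select", "Sales"))
```

Privileges are never attached to users directly: adding a user to `finance-department` is enough to give them everything the "Analyst" role holds.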
19.
Sentry Commands – Create/Drop Role
• Creates a role to which privileges can be granted, or drops an existing role
• Only Sentry admin users can use these commands
• By default, the hive, impala and hue users have admin privileges in Sentry
• CREATE ROLE [role_name];
• DROP ROLE [role_name];
20.
Sentry Commands – Grant/Revoke Privilege
• Grant privileges on an object to a role, or revoke them
• Only Sentry admin users can use these commands
• GRANT <PRIVILEGE> [, <PRIVILEGE> ] ON <OBJECT> <object_name> TO ROLE <roleName> [,ROLE <roleName>]
• REVOKE <PRIVILEGE> [, <PRIVILEGE> ] ON <OBJECT> <object_name> FROM ROLE <roleName> [,ROLE <roleName>]
• GRANT <PRIVILEGE> ... WITH GRANT OPTION
• Objects can be Server, Database, Table, URI
21.
Sentry Commands – Grant/Revoke Role
• The GRANT ROLE and REVOKE ROLE statements assign roles to groups and remove roles from groups
• Only Sentry admin users can use these commands
• GRANT ROLE role_name [, role_name] TO GROUP <groupName> [,GROUP <groupName>]
• REVOKE ROLE role_name [, role_name] FROM GROUP <groupName> [,GROUP <groupName>]
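Putting the CREATE ROLE, GRANT and GRANT ROLE forms together, a typical setup script could be generated as in the sketch below. The helper `role_setup_statements` and the role, table and group names are hypothetical; the function only formats strings following the syntax shown on the slides and does no quoting or validation.

```python
def role_setup_statements(role, privileges, groups):
    """Render the statements that create a role, grant it privileges,
    and assign it to groups. `privileges` holds
    (privilege, object_type, object_name) tuples."""
    statements = [f"CREATE ROLE {role};"]
    for privilege, object_type, object_name in privileges:
        statements.append(
            f"GRANT {privilege} ON {object_type} {object_name} TO ROLE {role};"
        )
    for group in groups:
        statements.append(f"GRANT ROLE {role} TO GROUP {group};")
    return statements

for statement in role_setup_statements(
    "analyst", [("SELECT", "TABLE", "customers")], ["finance"]
):
    print(statement)
```

The generated statements would be run by a Sentry admin user from a Hive or Impala session.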
22.
Sentry Commands – SHOW
• SHOW CURRENT ROLES; - Lists all the roles in effect for the current user session
• SHOW ROLES; - Lists all the roles in the system (only for Sentry admin users)
• SHOW ROLE GRANT GROUP <groupName>; - Lists all the roles assigned to the given <groupName> (only allowed for Sentry admin users and other users that are part of the group)
• SHOW GRANT ROLE <roleName>; - Lists all the grants for the given <roleName> (only allowed for Sentry admin users and other users that have been granted the role)
• SHOW GRANT ROLE <roleName> on OBJECT <objectName>; - Lists all the grants for a role on the given <objectName> (only allowed for Sentry admin users and other users that have been granted the role)
23.
Sentry Web UI
24.
RecordService
25.
Permission Enforcement Today with Sentry
• Admins specify permissions as Sentry permission rules, held by the Sentry Service
• Rule: "Allow fraud analysts read access to the transaction table"
• Sentry enforcement points: HiveServer2, Impala, HDFS (MR, Pig, Spark, ...), Search (Solr)
• Apps: Datameer, Platfora, Zoomdata, etc.
• Coarse grained (table)
26.
The Need for Fine-Grained Access Control Across All Access Paths
• Columns: sensitive column visibility varies. Example: credit card numbers
  • Managers: 1234 5678 1234 5678
  • CallCenter: XXXX XXXX XXXX 5678
  • Analysts: XXXX XXXX XXXX XXXX
  • Others: do not see the credit card column
• Rows: different groups of users need access to different records
  • European privacy laws
  • Government security clearance
  • Financial information restrictions
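The per-role visibility of the credit card column can be expressed as a masking function. This is a sketch of the idea only: the role names come from the slide, while `mask_card` and its "return `None` when the column is hidden" convention are assumptions made for illustration.

```python
def mask_card(number, role):
    """Return the card number as a given role would see it, or None if the
    role must not see the column at all."""
    digits = number.replace(" ", "")
    if role == "Managers":
        visible = len(digits)   # full number
    elif role == "CallCenter":
        visible = 4             # last four digits only
    elif role == "Analysts":
        visible = 0             # fully masked
    else:
        return None             # column hidden
    masked = "X" * (len(digits) - visible) + digits[len(digits) - visible:]
    # Regroup into blocks of four for display.
    return " ".join(masked[i:i + 4] for i in range(0, len(masked), 4))

print(mask_card("1234 5678 1234 5678", "CallCenter"))
```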
27.
The workaround
Split the original file; use HDFS permissions to limit access.

Original file:
Date/time             Accnt #     SSN          Asset  Trade  Broker
09:33:11 16-Feb-2015  0234837823  238-23-9876  ABC    Sell   group1
11:33:01 16-Feb-2015  3947848494  329-44-9847  TBT    Buy    group2
14:12:34 16-Feb-2015  4848367383  123-56-2345  DEF    Sell   group3
09:22:03 16-Feb-2015  3485739384  585-11-2345  INTC   Buy    group1
11:55:33 16-Feb-2015  3847598390  234-11-8765  F      Buy    group1
10:22:55 16-Feb-2015  8765432176  344-22-9876  UA     Buy    group3
13:45:24 16-Feb-2015  3456789012  412-22-8765  XYZ    Sell   group2
09:03:44 16-Feb-2015  4857389329  123-44-5678  TMV    Buy    group1
15:55:55 16-Feb-2015  4756983234  234-76-9274  MA     Buy    group3

Split file for group3:
Date/time             Accnt #     SSN          Asset  Trade  Broker
14:12:34 16-Feb-2015  4848367383  123-56-2345  DEF    Sell   group3
10:22:55 16-Feb-2015  8765432176  344-22-9876  UA     Buy    group3
15:55:55 16-Feb-2015  4756983234  234-76-9274  MA     Buy    group3

Split file for group2:
Date/time             Accnt #     SSN          Asset  Trade  Broker
11:33:01 16-Feb-2015  3947848494  329-44-9847  TBT    Buy    group2
13:45:24 16-Feb-2015  3456789012  412-22-8765  XYZ    Sell   group2

Split file for group1:
Date/time             Accnt #     SSN          Asset  Trade  Broker
09:33:11 16-Feb-2015  0234837823  238-23-9876  ABC    Sell   group1
09:22:03 16-Feb-2015  3485739384  585-11-2345  INTC   Buy    group1
11:55:33 16-Feb-2015  3847598390  234-11-8765  F      Buy    group1
09:03:44 16-Feb-2015  4857389329  123-44-5678  TMV    Buy    group1

What if only some brokers in each group are allowed to see the full SSN?
28.
The Solution
• Apply controls to the master data file
• Row, column, and sub-column (masking) controls
• Ability to enforce these across access paths

Date/time             Accnt #     SSN          Asset  Trade  Broker
09:33:11 16-Feb-2015  0234837823  238-23-9876  ABC    Sell   group1
11:33:01 16-Feb-2015  3947848494  329-44-9847  TBT    Buy    group2
14:12:34 16-Feb-2015  4848367383  123-56-2345  HDP    Sell   group3
09:22:03 16-Feb-2015  3485739384  585-11-2345  INTC   Buy    group1
11:55:33 16-Feb-2015  3847598390  234-11-8765  F      Buy    group1
10:22:55 16-Feb-2015  8765432176  344-22-9876  UA     Buy    group3
13:45:24 16-Feb-2015  3456789012  412-22-8765  AMZN   Sell   group2

Diagram labels: Column-Level Controls; Row-Level Controls; What All Group 1 Brokers See
29.
RecordService: Unified Access Control Enforcement
• Sentry permission rules: permissions specified by administrators (top-level and delegated)
• Rule: "Allow managers to see social security numbers"
• Sentry Service
• STORAGE: HDFS, HBase
• RecordService
• Impala, Spark, MR, Solr, Apps, …
30.
RecordService – Overview
• Simplifies
  • Provides a higher-level, logical abstraction for data (i.e. tables or views)
  • Returns objects with a schema (instead of paths and bytes). No need for applications to worry about storage APIs and file formats.
  • HCatalog? Similar concept, but RecordService is secure and performant. The plan is to support HCatalog as a data model on RecordService.
• Secures
  • Central location for all authorization checks using Sentry metadata
  • Secure service that does not execute arbitrary user code
• Accelerates
  • Unified data access path allows platform-wide performance improvements
31.
Architecture
32.
Architecture
• Runs as a distributed service: Planner Servers and Worker Servers
• Servers do not store any state
  • Easy HA, fault tolerance
• Planner Servers are responsible for request planning
  • Retrieve and combine metadata (NN, HMS, Sentry)
  • Split generation -> creates tasks for workers
  • Perform authorization
• Worker Servers read from storage and construct records
  • IO, file parsing, predicate evaluation
  • Run as the "source" for a DAG computation
33.
Architecture – Fault tolerance
• Cluster state persisted in ZK
  • Membership, delegation tokens, secret keys
• Servers do not communicate with each other directly => scalability
• Planner services
  • Expected to run a few (e.g. 3) for HA
  • Fault tolerance handled with clients getting a list of planners and failing over
  • Plan requests are short
• Worker services
  • Expected to run on each node in the cluster with data
  • Fault tolerance handled by the framework (e.g. MR) rescheduling the task
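Client-side failover across a list of planners might look like the sketch below. It is a generic pattern under stated assumptions, not the RecordService client: `plan_with_failover`, `fake_plan` and the host names are hypothetical, and a connection failure is modelled with Python's built-in `ConnectionError`.

```python
def plan_with_failover(planners, request, plan_fn):
    """Try each planner in turn and return the first successful plan.

    `plan_fn(host, request)` is whatever call sends the plan request;
    it is expected to raise ConnectionError when a planner is down.
    """
    last_error = None
    for host in planners:
        try:
            return plan_fn(host, request)
        except ConnectionError as err:
            last_error = err  # remember the failure and try the next planner
    raise ConnectionError(f"all {len(planners)} planners failed") from last_error


def fake_plan(host, request):
    """Hypothetical stand-in for a real plan RPC: planner1 is down."""
    if host == "planner1":
        raise ConnectionError(f"{host} unreachable")
    return (host, request)


print(plan_with_failover(["planner1", "planner2"], "scan db.tbl", fake_plan))
```

Because plan requests are short and planners hold no state, retrying the whole request on another planner is cheap.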
34.
Architecture – Security
• Authentication using Kerberos and delegation tokens
• Planner authorizes the request using metadata in Sentry
  • Column-level ACLs
  • Row-level ACLs – create a view with a predicate
  • Masking – create a view with the masking function in the select list
• Worker runs the generated tasks
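The "view with a predicate" and "view with a masking function in the select list" ideas can be mimicked over in-memory rows. This is only a conceptual model: in RecordService the views are defined in SQL and enforced by the service, whereas `make_view`, `mask_ssn` and the sample rows below are hypothetical.

```python
def mask_ssn(ssn):
    """Sub-column masking: keep only the last four digits."""
    return "XXX-XX-" + ssn[-4:]

def identity(value):
    return value

def make_view(predicate, select):
    """A 'view' is a row predicate plus a select list of column -> function."""
    def scan(rows):
        return [
            {column: fn(row[column]) for column, fn in select.items()}
            for row in rows
            if predicate(row)
        ]
    return scan

TRADES = [
    {"account": "0234837823", "ssn": "238-23-9876", "asset": "ABC", "broker": "group1"},
    {"account": "3947848494", "ssn": "329-44-9847", "asset": "TBT", "broker": "group2"},
]

# Row-level control: only group1 rows. Column-level control: broker is left
# out of the select list. Masking: SSN passes through mask_ssn.
group1_view = make_view(
    lambda row: row["broker"] == "group1",
    {"account": identity, "ssn": mask_ssn, "asset": identity},
)
print(group1_view(TRADES))
```

Granting a role access to the view rather than the base table is what turns these three controls into a single Sentry permission.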
35.
Client APIs – Integration with ecosystem
• Similar APIs designed to integrate with MapReduce and Spark
• Client APIs make things simpler
36.
MR Example
    // Original path-based input, commented out:
    //FileInputFormat.setInputPaths(job, new Path(args[0]));
    //job.setInputFormatClass(AvroKeyInputFormat.class);
    // RecordService replacement: read the table named by args[0]
    RecordServiceConfig.setInputTable(configuration, null, args[0]);
    job.setInputFormatClass(
        com.cloudera.recordservice.avro.mapreduce.AvroKeyInputFormat.class);
37.
Spark Example
    // Comment out one or the other
    val file = sc.recordServiceTextFile(path)
    //val file = sc.textFile(path)
38.
Spark SQL Example
    ctx.sql(s"""
      |CREATE TEMPORARY TABLE $tbl
      |USING com.cloudera.recordservice.spark.DefaultSource
      |OPTIONS (
      |  RecordServiceTable '$db.$tbl',
      |  RecordServiceTableSize '$size'
      |)
    """.stripMargin)
39.
Performance
• Shares some core components with Impala
  • IO management, optimized C++ code, runtime code generation, uses low-level storage APIs
  • Highly efficient implementation of the scan functionality
• Optimized columnar on-wire format
  • Inspired by Apache Parquet
• Accelerates performance for many workloads
40.
Terasort
• Close to a worst-case scenario. Minimal schema: a single STRING column
• Custom RecordServiceTeraInputFormat (similar to TeraInputFormat)
• 78-node cluster (12 cores/24 Hyper-Threaded, 12 disks)
• Ran at 1 billion, 50 billion and 1 trillion (~100TB) scales
• See the GitHub repo for more details and runnable examples
41.
TeraChecksum
[Chart: TeraChecksum normalized job time, without RecordService vs. with RecordService, for 1B, 50B and 1T rows on MapReduce and on Spark. Data labels shown: 1, 0.48, 0.23, 1.03, 0.8, 0.85.]
42.
Spark SQL
• Represents a more typical use case
• Data has a full schema
• TPC-DS
  • 500GB scale factor, on Parquet
• Cluster
  • 5-node cluster
43.
Spark SQL
[Chart: TPC-DS on Spark SQL, Spark SQL vs. Spark SQL with RecordService; axis from 0 to 350.]
~15% improvement in query times; queries are not scan bound
44.
Spark SQL
[Chart: 2% selective scan and Sum(col), Spark SQL vs. Spark SQL with RecordService. Data labels shown: 29.5, 31, 14, 23.5.]
45.
Summary – Sentry and RecordService
• Sentry role: U.S. Customer Transaction Analysis
  • Sentry permission: read access to Transactions.Date… where Country = US
  • Sentry permission: read access to Customers.CustomerID… where Country = US
• Group: Tier 1 Customer Support Reps (Sam Smith)
• Group: Tier 1 Broker Analysts (Martha Jones)
[Diagram: sample Customers table (Cust. ID, SSN, Phone, Country) and Transactions table (Date/Time, Cust. ID, Trade, Country) containing US and EU rows]
46.
Getting Started: Sentry
Users:
• Install CDH, try a VM, or try on AWS: cloudera.com/download
• Read docs: www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/sg_sentry_overview.html
• Get help: community.cloudera.com
Developers:
• Contribute: sentry.incubator.apache.org
• Report issues: issues.apache.org/jira/browse/SENTRY
• Join Dev list: dev@sentry.incubator.apache.org
Contributions/participation are welcome and encouraged!
47.
Getting Started: RecordService
Users:
• Install the RS Beta or try a VM: cloudera.github.io/RecordServiceClient
• Get help: recordservice-user@googlegroups.com
Developers:
• Contribute: github.com/cloudera/RecordServiceClient
• Join Dev list: recordservice-dev@googlegroups.com
Contributions/participation are welcome and encouraged!
48.
Thank you