Sentry: Open Source Authorization for
Hive & Impala
Alexander Alten-Lorenz | Senior Field Engineer, Cloudera	

Wednesday, ...
Sentry - An Introduction

A talk I gave in Stuttgart about Sentry, with a live demo.

Published in: Technology

Transcript of "Sentry - An Introduction"

  1. Sentry: Open Source Authorization for Hive & Impala
     Alexander Alten-Lorenz | Senior Field Engineer, Cloudera
     Wednesday, 7th November 2013
  2. Defining Security Functions
     - Perimeter: Guarding access to the cluster itself. Technical concepts: authentication, network isolation.
     - Data: Protecting data in the cluster from unauthorized visibility. Technical concepts: encryption, data masking.
     - Access: Defining what users and applications can do with data. Technical concepts: permissions, authorization.
     - Visibility: Reporting on where data came from and how it's being used. Technical concepts: auditing, lineage.
  3. Enabling Enterprise Security
     - Perimeter: Guarding access to the cluster itself. Technical concepts: authentication, network isolation. Kerberos | Oozie | Knox
     - Data: Protecting data in the cluster from unauthorized visibility. Technical concepts: encryption, data masking. Certified Partners
     - Access: Defining what users and applications can do with data. Technical concepts: permissions, authorization. Sentry (available 7/23)
     - Visibility: Reporting on where data came from and how it's being used. Technical concepts: auditing, lineage. Cloudera Navigator
  4. Hive Overview
     SQL Access to Hadoop
     - MapReduce: great massively scalable batch processing framework; required development for each new job
     - Hive opened up Hadoop for more users with standard SQL
     Key Challenges
     - Batch MapReduce too slow for interactive BI/analytics
     - No concurrency, no security
     Options Today
     - Impala designed for low-latency queries
     - HiveServer2 delivers concurrency, authentication
  5. Our Open Source Activity
     CDH 4.1 (HiveServer2)
     - Concurrency and Kerberos authentication for Hive
     - JDBC and Beeline clients
     CDH 4.2
     - HDFS impersonation authorization as stop-gap
     - Pluggable authentication API
     - JDBC LDAP username/password
     ODBC
     - Supports Kerberos authentication and LDAP
     - Extended partner certification
  6. Current State of Authorization
     Two Sub-Optimal Choices for SQL on Hadoop
     Insecure Advisory Authorization
     - Users can grant themselves permissions
     - Intended to prevent accidental deletion of data
     - Problem: Doesn't guard against malicious users
     HDFS Impersonation
     - Data is protected at the file level by HDFS permissions
     - Problem: File-level not granular enough
     - Problem: Not role-based
  7. Authorization Requirements
     - Secure Authorization: Ability to control access to data and/or privileges on data for authenticated users
     - Fine-Grained Authorization: Ability to give users access to a subset of data (e.g. column) in a database
     - Role-Based Authorization: Ability to create/apply templatized privileges based on functional roles
     - Multi-Tenant Administration: Ability for central admin group to empower lower-level admins to manage security for each database/schema
  8. The Next Step: Introducing Sentry
     Authorization module for Hive & Impala
     Unlocks Key RBAC Requirements
     - Secure, fine-grained, role-based authorization
     - Multi-tenant administration
     Open Source
     - Intent to donate to ASF
     Available and Fully Supported
     - HiveServer2 & Impala 1.1 initially
  9. Key Benefits of Sentry
     - Store Sensitive Data in Hadoop
     - Extend Hadoop to More Users
     - Enable New Use Cases
     - Enable Multi-User Applications
     - Comply with Regulations
  10. Key Capabilities of Sentry
      Fine-Grained Authorization
      - Specify security for SERVERS, DATABASES, TABLES & VIEWS
      Role-Based Authorization
      - SELECT privilege on views & tables
      - INSERT privilege on tables
      - TRANSFORM privilege on servers
      - ALL privilege on the server, databases, tables & views
      - ALL privilege is needed to create/modify schema
      Multi-Tenant Administration
      - Separate policies for each database/schema
      - Can be maintained by separate admins
  11. Apache Ecosystem and Sentry
      - Shared Hive Metastore (with HCatalog)
      - Extensibility plug-in for HiveServer2
      - Inline support in Impala 1.1
      - Potential extension to Pig, MapReduce, REST
      [Diagram: Hive Metastore, HCatalog, Sentry; possible future development]
  12. Sentry Architecture
      [Diagram labels: Impala; Binding Layer; HiveServer2; Impala; Hive; Authorization Provider; Future; Policy Engine; Policy Provider; File (Local FS/HDFS); Database; Interface; Evaluation, Validation; Parsing; Interface]
  13. Query Execution Flow
      - Parse: validate SQL grammar
      - Build: construct statement tree
      - Check: validate statement objects
        • First check: authorization
      - Plan: forward to execution planner
      [Diagram labels: SQL, Sentry, MR, Query]
  14. Example Security Policy

      [databases]
      # Defines the location of the per DB policy file for the
      # 'customers' DB (schema)
      customers = hdfs://ha-nn-uri/etc/access/customers.ini

      [groups]
      # Assigns Hadoop groups to their respective set of roles
      manager = analyst_role, junior_analyst_role
      analyst = analyst_role
      jranalyst = junior_analyst_role
      customers_admin = customers_admin_role
      admin = admin_role

      [roles]
      # Roles that can import or export data to the URIs defined,
      # i.e. a landing zone. Since the server runs as the user "hive,"
      # files in this directory must either have the "hive" group set
      # with read/write or be set world read/write.
      analyst_role = server=server1->db=analyst1, server=server1->db=jranalyst1->table=*->action=select, server=server1->uri=hdfs://ha-nn-uri/landing/analyst1
      junior_analyst_role = server=server1->db=jranalyst1, server=server1->uri=hdfs://ha-nn-uri/landing/jranalyst1

      # Privileges for 'customers' can be defined in the global policy
      # file even though 'customers' has its own policy file.
      # Note that the privileges from both the global policy file and
      # the per-db policy file are merged. There is no overriding.
      # Role controls everything for the 'customers' DB on server1.
      customers_admin_role = server=server1->db=customers

      # Role controls everything on server1.
      admin_role = server=server1
  15. Live Demo & Give Aways
      - Closes gap between HDFS and Metastore
      - Easy to implement
      - RFC 2307 compliant (Kerberos)
      - Enable Multi-User Applications in one Hive WH
      - Enables Multi-Tenancy per Row and Column
  16. About
      dev@sentry.incubator.apache.org
      alexander@cloudera.com
      @mapredit
      mapredit.blogspot.com
      Web: http://wiki.apache.org/incubator/SentryProposal
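
The example security policy in the transcript maps Hadoop groups to roles and roles to privileges of the form server=...->db=...->table=...->action=.... The following Python sketch shows one way such a file could be read and evaluated. It is an illustration only, not Sentry's implementation: the function names (parse_policy, implies, is_authorized) are hypothetical, the [databases] section and per-database files are ignored, URI privileges are matched exactly rather than by path prefix, and a shorter grant is assumed to cover everything beneath it.

```python
def parse_policy(text):
    """Parse section headers and 'key = v1, v2' lines into nested dicts."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
            continue
        if current is None:
            continue  # ignore entries that appear before any section header
        # Split on the first '=' only; privilege strings contain '=' themselves.
        key, _, value = line.partition("=")
        current[key.strip()] = [v.strip() for v in value.split(",") if v.strip()]
    return sections


def parse_privilege(privilege):
    """Turn 'server=s1->db=d1' into [('server', 's1'), ('db', 'd1')]."""
    parts = []
    for part in privilege.split("->"):
        name, _, value = part.partition("=")
        parts.append((name.strip().lower(), value.strip()))
    return parts


def implies(granted, requested):
    """Assumed rule: a grant covers a request if every part of the grant
    matches the request; a shorter grant covers all objects below it."""
    g, r = parse_privilege(granted), parse_privilege(requested)
    if len(g) > len(r):
        return False
    for (gname, gvalue), (rname, rvalue) in zip(g, r):
        if gname != rname:
            return False
        wildcard = gvalue == "*" or (gname == "action" and gvalue.lower() == "all")
        if not wildcard and gvalue != rvalue:
            return False
    return True


def is_authorized(policy, user_groups, requested):
    """Resolve groups -> roles -> privileges and test each grant."""
    for group in user_groups:
        for role in policy.get("groups", {}).get(group, []):
            for granted in policy.get("roles", {}).get(role, []):
                if implies(granted, requested):
                    return True
    return False


# A trimmed copy of the policy from the transcript (URI grants left out).
POLICY = """
[groups]
manager = analyst_role, junior_analyst_role
analyst = analyst_role
jranalyst = junior_analyst_role
admin = admin_role

[roles]
analyst_role = server=server1->db=analyst1, server=server1->db=jranalyst1->table=*->action=select
junior_analyst_role = server=server1->db=jranalyst1
admin_role = server=server1
"""

policy = parse_policy(POLICY)
```

Under these assumptions, a member of the analyst group may select from any table in jranalyst1 but may not insert into it, while a member of jranalyst holds the whole database and admin holds the whole server.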
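
The query execution flow in the transcript places the authorization check after the statement tree is built and before the statement is forwarded to the execution planner. The Python sketch below imitates that ordering with a toy pipeline. Everything in it is hypothetical: the regular expression accepts only a simple SELECT ... FROM db.table statement, the authorize callback stands in for the policy check, and plan returns a string instead of a real execution plan.

```python
import re


class AuthorizationError(Exception):
    """Raised when the check step rejects a statement."""


# Toy grammar: SELECT <anything> FROM <db>.<table>
SELECT_RE = re.compile(
    r"^\s*select\s+.+?\s+from\s+(\w+)\.(\w+)\s*;?\s*$", re.IGNORECASE
)


def parse(sql):
    """Parse: validate the SQL grammar (only the toy grammar above)."""
    match = SELECT_RE.match(sql)
    if match is None:
        raise ValueError(f"unsupported statement: {sql!r}")
    return match


def build(match):
    """Build: construct a minimal statement tree."""
    return {"action": "select", "db": match.group(1), "table": match.group(2)}


def check(statement, user, authorize):
    """Check: authorization runs first, before any planning happens."""
    if not authorize(user, statement):
        raise AuthorizationError(
            f"{user} may not {statement['action']} on "
            f"{statement['db']}.{statement['table']}"
        )
    return statement


def plan(statement):
    """Plan: stand-in for forwarding to the execution planner."""
    return f"PLAN scan {statement['db']}.{statement['table']}"


def run_query(sql, user, authorize):
    """Run the steps in the order given on the slide."""
    return plan(check(build(parse(sql)), user, authorize))
```

Because check raises before plan is called, a rejected statement never reaches the planner in this sketch, which is the point the slide makes about where authorization sits in the flow.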