Sentry - An Introduction


Talk I held in Stuttgart about Sentry, with a Live Demo.


Transcript

  • 1. Sentry: Open Source Authorization for Hive & Impala
      • Alexander Alten-Lorenz | Senior Field Engineer, Cloudera
      • Wednesday, 7th November 2013
  • 2. Defining Security Functions
      • Perimeter: Guarding access to the cluster itself. Technical concepts: authentication, network isolation.
      • Data: Protecting data in the cluster from unauthorized visibility. Technical concepts: encryption, data masking.
      • Access: Defining what users and applications can do with data. Technical concepts: permissions, authorization.
      • Visibility: Reporting on where data came from and how it's being used. Technical concepts: auditing, lineage.
  • 3. Enabling Enterprise Security
      • Perimeter: Guarding access to the cluster itself. Technical concepts: authentication, network isolation. Provided by: Kerberos | Oozie | Knox.
      • Data: Protecting data in the cluster from unauthorized visibility. Technical concepts: encryption, data masking. Provided by: Certified Partners.
      • Access: Defining what users and applications can do with data. Technical concepts: permissions, authorization. Provided by: Sentry (available 7/23).
      • Visibility: Reporting on where data came from and how it's being used. Technical concepts: auditing, lineage. Provided by: Cloudera Navigator.
  • 4. Hive Overview
      • SQL Access to Hadoop
          • MapReduce: great massively scalable batch processing framework; required development for each new job
          • Hive opened up Hadoop for more users with standard SQL
      • Key Challenges
          • Batch MapReduce too slow for interactive BI/analytics
          • No concurrency, no security
      • Options Today
          • Impala designed for low-latency queries
          • HiveServer2 delivers concurrency, authentication
  • 5. Our Open Source Activity
      • CDH 4.1 (HiveServer2)
          • Concurrency and Kerberos authentication for Hive
          • JDBC and Beeline clients
      • CDH 4.2
          • HDFS impersonation authorization as stop-gap
          • Pluggable authentication API
          • JDBC LDAP username/password
      • ODBC
          • Supports Kerberos authentication and LDAP
          • Extended partner certification
  • 6. Current State of Authorization: Two Sub-Optimal Choices for SQL on Hadoop
      • Insecure Advisory Authorization
          • Users can grant themselves permissions
          • Intended to prevent accidental deletion of data
          • Problem: Doesn't guard against malicious users
      • HDFS Impersonation
          • Data is protected at the file level by HDFS permissions
          • Problem: File-level not granular enough
          • Problem: Not role-based
  • 7. Authorization Requirements
      • Secure Authorization: Ability to control access to data and/or privileges on data for authenticated users
      • Fine-Grained Authorization: Ability to give users access to a subset of data (e.g. column) in a database
      • Role-Based Authorization: Ability to create/apply templatized privileges based on functional roles
      • Multi-Tenant Administration: Ability for central admin group to empower lower-level admins to manage security for each database/schema
  • 8. The Next Step: Introducing Sentry, an Authorization Module for Hive & Impala
      • Unlocks Key RBAC Requirements
          • Secure, fine-grained, role-based authorization
          • Multi-tenant administration
      • Open Source
          • Intent to donate to ASF
      • Available and Fully Supported
          • HiveServer2 & Impala 1.1 initially
  • 9. Key Benefits of Sentry
      • Store Sensitive Data in Hadoop
      • Extend Hadoop to More Users
      • Enable New Use Cases
      • Enable Multi-User Applications
      • Comply with Regulations
  • 10. Key Capabilities of Sentry
      • Fine-Grained Authorization
          • Specify security for SERVERS, DATABASES, TABLES & VIEWS
      • Role-Based Authorization
          • SELECT privilege on views & tables
          • INSERT privilege on tables
          • TRANSFORM privilege on servers
          • ALL privilege on the server, databases, tables & views
          • ALL privilege is needed to create/modify schema
      • Multi-Tenant Administration
          • Separate policies for each database/schema
          • Can be maintained by separate admins
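
    The scope and privilege model on this slide can be illustrated with a minimal Python sketch. It assumes a simplified reading of the model: a privilege granted at a broader scope (server, then database, then table) covers everything beneath it, and a grant without an explicit action is treated as ALL. The function names `parse_privilege` and `implies` are hypothetical and are not part of Sentry; URI and TRANSFORM privileges are left out.

```python
from typing import Dict

# Scope levels from broadest to narrowest, as listed on the slide.
SCOPE_KEYS = ("server", "db", "table")


def parse_privilege(text: str) -> Dict[str, str]:
    """Split 'server=s1->db=d1->table=t1->action=select' into a dict."""
    parts: Dict[str, str] = {}
    for segment in text.split("->"):
        key, _, value = segment.partition("=")
        parts[key.strip().lower()] = value.strip()
    return parts


def implies(granted: str, requested: str) -> bool:
    """Return True if the granted privilege covers the requested one.

    Assumption (not taken from Sentry's code): a grant that stops at a
    broader scope covers all narrower scopes, '*' matches any name, and
    a missing action means ALL.
    """
    g = parse_privilege(granted)
    r = parse_privilege(requested)
    for key in SCOPE_KEYS:
        if key not in g:
            break  # grant ends here, so it covers everything below
        if g[key] != "*" and g[key] != r.get(key):
            return False
    g_action = g.get("action", "all").lower()
    r_action = r.get("action", "all").lower()
    return g_action in ("all", "*") or g_action == r_action
```

    Under these assumptions, `implies("server=server1", "server=server1->db=customers->table=orders->action=insert")` is true, while a SELECT-only grant on a table does not cover an INSERT on the same table.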
  • 11. Apache Ecosystem and Sentry
      • Shared Hive Metastore (with HCatalog)
      • Extensibility plug-in for HiveServer2
      • Inline support in Impala 1.1
      • Potential extension to Pig, MapReduce, REST
      • [Diagram: Sentry alongside the Hive Metastore and HCatalog, with possible future development marked]
  • 12. Sentry Architecture
      • [Diagram; labels as extracted]
      • Binding Layer: Impala, Hive (HiveServer2), Future
      • Authorization Provider: Policy Engine, Policy Provider
      • Policy storage: File (Local FS/HDFS), Database
      • Other labels: Interface; Evaluation, Validation; Parsing
  • 13. Query Execution Flow
      • Parse: Validate SQL grammar
      • Build: Construct statement tree
      • Check: Validate statement objects
          • First check: Authorization (Sentry)
      • Plan: Forward to execution planner
      • [Diagram labels: SQL, Sentry, MR, Query]
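
    The ordering of the stages above can be sketched in Python as follows. This is a toy model, not HiveServer2 or Impala code: it only understands `SELECT ... FROM db.table`, and every name in it (`Statement`, `AuthorizationError`, `run`, the `allowed` set) is a hypothetical stand-in. The point it illustrates is that the authorization check runs on the built statement before anything is handed to the planner.

```python
import re
from dataclasses import dataclass
from typing import Set, Tuple


class AuthorizationError(Exception):
    """Raised when the user lacks the privilege a statement needs."""


@dataclass
class Statement:
    action: str
    database: str
    table: str


# Toy grammar: only 'SELECT <columns> FROM <db>.<table>' is accepted.
_SELECT = re.compile(
    r"^\s*select\s+.+?\s+from\s+(\w+)\.(\w+)\s*;?\s*$", re.IGNORECASE
)


def parse(sql: str) -> "re.Match[str]":
    """Parse: validate the SQL grammar."""
    match = _SELECT.match(sql)
    if match is None:
        raise ValueError("unsupported or malformed statement")
    return match


def build(match: "re.Match[str]") -> Statement:
    """Build: construct a (very small) statement tree."""
    return Statement("select", match.group(1).lower(), match.group(2).lower())


def check(stmt: Statement, user: str,
          allowed: Set[Tuple[str, str, str, str]]) -> None:
    """Check: authorization is the first validation of statement objects."""
    if (user, stmt.database, stmt.table, stmt.action) not in allowed:
        raise AuthorizationError(
            f"{user} may not {stmt.action} on {stmt.database}.{stmt.table}"
        )


def plan(stmt: Statement) -> str:
    """Plan: forward to the execution planner (here just a description)."""
    return f"scan {stmt.database}.{stmt.table}"


def run(sql: str, user: str, allowed: Set[Tuple[str, str, str, str]]) -> str:
    stmt = build(parse(sql))
    check(stmt, user, allowed)  # a rejected query never reaches plan()
    return plan(stmt)
```

    A rejected statement raises `AuthorizationError` inside `check`, so no plan is ever produced for it.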
  • 14. Example Security Policy

      [databases]
      # Defines the location of the per-DB policy file for the
      # 'customers' DB (schema)
      customers = hdfs://ha-nn-uri/etc/access/customers.ini

      [groups]
      # Assigns Hadoop groups to their respective set of roles
      manager = analyst_role, junior_analyst_role
      analyst = analyst_role
      jranalyst = junior_analyst_role
      customers_admin = customers_admin_role
      admin = admin_role

      [roles]
      # Roles that can import or export data to the URIs defined,
      # i.e. a landing zone. Since the server runs as the user "hive",
      # files in this directory must either have the "hive" group set
      # with read/write or be set world read/write.
      analyst_role = server=server1->db=analyst1, server=server1->db=jranalyst1->table=*->action=select, server=server1->uri=hdfs://ha-nn-uri/landing/analyst1
      junior_analyst_role = server=server1->db=jranalyst1, server=server1->uri=hdfs://ha-nn-uri/landing/jranalyst1

      # Role controls everything for the 'customers' DB on server1.
      # Privileges for 'customers' can be defined in the global policy
      # file even though 'customers' has its own policy file.
      # Note that the privileges from both the global policy file and
      # the per-DB policy file are merged. There is no overriding.
      customers_admin_role = server=server1->db=customers

      # Role controls everything on server1.
      admin_role = server=server1
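
    How a policy of this shape maps a user's Hadoop groups to privileges can be sketched with Python's standard `configparser`. This is only an illustration of the group → role → privilege lookup: Sentry itself does not use this code, the helper names `load_policy` and `privileges_for` are hypothetical, and the embedded policy is a shortened copy of the example above (the `[databases]` section and URI privileges are omitted).

```python
import configparser
from typing import Dict, Iterable, List, Set

# Shortened copy of the example policy; lines must start at column 0,
# otherwise configparser treats them as continuation lines.
POLICY = """\
[groups]
manager = analyst_role, junior_analyst_role
analyst = analyst_role
jranalyst = junior_analyst_role

[roles]
analyst_role = server=server1->db=analyst1, server=server1->db=jranalyst1->table=*->action=select
junior_analyst_role = server=server1->db=jranalyst1
"""

Policy = Dict[str, Dict[str, List[str]]]


def _split(value: str) -> List[str]:
    """Split a comma-separated value into trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def load_policy(text: str) -> Policy:
    """Parse the policy text into {section: {name: [items]}}."""
    # Only '=' is a delimiter, so ':' inside URIs is left alone.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(text)
    return {
        section: {name: _split(value) for name, value in parser[section].items()}
        for section in parser.sections()
    }


def privileges_for(policy: Policy, user_groups: Iterable[str]) -> Set[str]:
    """Collect every privilege granted via the roles of the given groups."""
    granted: Set[str] = set()
    for group in user_groups:
        for role in policy.get("groups", {}).get(group, []):
            granted.update(policy.get("roles", {}).get(role, []))
    return granted
```

    For instance, `privileges_for(load_policy(POLICY), ["jranalyst"])` yields only the `jranalyst1` database privilege, while a member of `manager` receives the union of both roles' privileges.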
  • 15. Live Demo & Give Aways
      • Closes gap between HDFS and Metastore
      • Easy to implement
      • RFC 2307 compliant (Kerberos)
      • Enables multi-user applications in one Hive warehouse
      • Enables multi-tenancy per row and column
  • 16. About
      • dev@sentry.incubator.apache.org
      • alexander@cloudera.com
      • @mapredit
      • mapredit.blogspot.com
      • Web: http://wiki.apache.org/incubator/SentryProposal