TriHUG 2/14: Apache Sentry

Deploying enterprise-grade security for Hadoop with Apache Sentry (incubating).
Apache Hive is deployed in the vast majority of Hadoop use cases despite the major practical flaws in its most secure operational mode (Kerberos + user impersonation).
In this talk we will discuss these flaws and how Apache Sentry addresses them. We will then enable Apache Sentry on an existing cluster. Additional topics will include Hadoop security and Role-Based Access Control (RBAC).


1. Deploying enterprise-grade security for Hadoop
    Brock Noland | Software Engineer, Cloudera
    February 27, 2014
2. Outline
    • Introduction
    • Hadoop security primer (Authentication, Authorization)
    • Security options (Default, Kerberos with Impersonation, Kerberos with Sentry)
    • Demo
3. Introduction
    Tonight's focus is SQL-on-Hadoop
    • Vast majority of Hadoop users use Hive or Cloudera Impala
    • Data warehouse offload is the most common use case
    • Data warehouse offload is a two-step process:
      1. Automatic transformations moved to Hadoop
      2. Data analysts given query access
4. Data warehouse use case (diagram): Online Database → Hadoop → Data Warehouse
5. Outline
    • Introduction
    • Hadoop security primer (Authentication, Authorization)
    • Security options (Default, Kerberos with Impersonation, Kerberos with Sentry)
    • Demo
6. Authentication
    • Authentication is who you are
    • Hadoop models
      • Default – "trusted network"
      • Strong – Kerberos
7. Default Authentication – trusted network
    • Default security mechanism
    • Hadoop client uses the local username
    • Used in
      • POCs
      • Startups
      • Demos
      • Pre-prod environments
8. Default Authentication – trusted network (diagram)
    A client host (user "brock", file a.txt containing "some data") writes to Hadoop:
    $ whoami
    brock
    $ cat a.txt
    some data
    $ hadoop fs -put a.txt .
9. Strong Authentication – Kerberos
    • Hadoop is secured with Kerberos
      • Provides mutual authentication
      • Protects against eavesdropping and replay attacks
    • Every user and service has a Kerberos "principal"
      • Service: impala/hostname@MYCOMPANY.COM
      • User: brock@MYCOMPANY.COM
    • Credentials
      • Service: keytabs
      • User: password
10. Strong Authentication – Kerberos (diagram)
    The client host presents a Kerberos ticket and sends encrypted data* to Hadoop:
    $ whoami
    brock
    $ kinit
    Password: *******
    $ cat a.txt
    some data
    $ hadoop fs -put a.txt .
    * RPC encryption must be enabled
11. Strong Authentication – Kerberos
    • Keytab
      • Encrypted key for servers (similar to a "password")
      • Generated by a server such as MIT Kerberos or Active Directory
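    A quick illustrative sketch of the two credential types (not from the deck; the keytab path and service principal below are example values):
    $ kinit                                                                    # user: authenticate with a password
    Password for brock@MYCOMPANY.COM: *******
    $ kinit -kt /etc/impala/conf/impala.keytab impala/hostname@MYCOMPANY.COM   # service: authenticate with its keytab
    $ klist                                                                    # show the ticket obtained either way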
12. Strong Authentication – Kerberos
    • Impersonation
      • Services such as Hive Server 2 impersonate users
      • Data loaded by "joe" via HS2 is owned by "joe"
      • Oozie jobs submitted by "brock" are run as "brock"
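    For context, HiveServer2 impersonation ("doAs") also depends on Hadoop's proxy-user settings. A minimal way to check what an existing cluster uses, assuming typical /etc/... config locations (the paths are assumptions):
    $ grep -A1 'hive.server2.enable.doAs' /etc/hive/conf/hive-site.xml   # true = HS2 runs queries as the connecting user
    $ grep -A1 'hadoop.proxyuser.hive' /etc/hadoop/conf/core-site.xml    # hosts/groups the "hive" service user may proxy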
13. Hive Server 2 and Oozie (diagram)
    Beeline (Hive CLI) and Tableau connect over JDBC to Hive Server 2 (HS2); the Oozie CLI and Control-M submit work to Oozie; HS2 and Oozie both run against Hadoop.
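    As a concrete sketch of the JDBC path above, a client such as Beeline reaches a Kerberized HS2 by naming the HS2 service principal in the JDBC URL (the hostname and realm are placeholders):
    $ kinit        # the end user authenticates first
    $ beeline -u "jdbc:hive2://hs2host.mycompany.com:10000/default;principal=hive/hs2host.mycompany.com@MYCOMPANY.COM"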
14. Authorization
    • HDFS permissions
      • Unix style
      • Read/Write/Execute for Owner/Group/Other
      • Coarse grained
    • Other Hadoop components have authorization
      • MapReduce: who can use which job queues
      • HBase: table ACLs
15. HDFS Permissions
    $ hadoop fs -ls file
    -rw-r-----   1 analyst1 analysts   2244 2014-01-19 12:15 file
    • Permissions
      • Unix-style permissions
      • Read/Write/Execute
      • Owner/Group/Other
    • Owner
      • One and only one owner
    • Group
      • One and only one group
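    For reference, owner, group, and mode are managed with the usual Unix-style commands (the path below is illustrative):
    $ hadoop fs -chown analyst1:analysts /user/analyst1/file   # exactly one owner and one group
    $ hadoop fs -chgrp analysts /user/analyst1/file            # change only the group
    $ hadoop fs -chmod 640 /user/analyst1/file                 # rw- owner, r-- group, --- other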
16. Back to our use case
    • Scenario facts
      • ETL offload is a success
      • Data warehouse is expensive and at capacity
      • Same data is in Hadoop
    • Next step
      • End users start using Hadoop to augment the DW
      • Security becomes the primary concern
17. End users need to share data
    • Unlike automated ETL jobs, end users want to share data with peers
    • HDFS permissions must be managed manually
    • Each file has a single group
    • End result is users set permissions to world readable/writable
18. Outline
    • Introduction
    • Hadoop security primer (Authentication, Authorization)
    • Security options (Default, Kerberos with Impersonation, Kerberos with Sentry)
    • Demo
19. Hive: security holes
    CREATE TEMPORARY FUNCTION custom_udf AS 'com.mycompany.MaliciousClass';
    SELECT TRANSFORM(stuff) USING 'malicious-script.pl' AS thing1, thing;
    CREATE EXTERNAL TABLE external_table(column1 string) LOCATION '/path/to/any/table';
20. Hive: security holes
    CREATE TABLE test (c1 string) ROW FORMAT SERDE 'com.mycompany.MaliciousClass';
    FROM (
      FROM t1
      MAP t1.c1
      USING 'malicious-script1.pl'
      CLUSTER BY key) map_output
    INSERT OVERWRITE TABLE t2
    REDUCE t2.c1
    USING 'malicious-script2.pl'
    AS c2;
21. Default: Authorization
    • Hive ships with an "advisory" authorization system
      • All users see all databases/tables/columns
      • Does not fix any of the security holes
      • Users grant themselves permissions
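    A sketch of why this mode is only "advisory" (the table and user names are made up): in legacy mode nothing stops a user from granting a privilege to themselves right before running the query.
    0: jdbc:hive2://localhost:10000/default> GRANT SELECT ON TABLE manager1_table TO USER jranalyst1;
    0: jdbc:hive2://localhost:10000/default> select count(*) from manager1_table;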
22. Outline
    • Introduction
    • Hadoop security primer (Authentication, Authorization)
    • Security options (Default, Kerberos with Impersonation, Kerberos with Sentry)
    • Demo
23. Kerberos with impersonation: sharing data
    The user "manager1" wants to share the table "manager1_table" with senior analysts but not junior analysts.
    # hadoop fs -ls -R /user/hive/warehouse
    drwxr-x--T   - analyst1    analyst1       0  analyst1_table
    drwxr-x--T   - jranalyst1  jranalyst1     0  jranalyst1_table
    drwxr-x--T   - manager1    manager1       0  manager1_table
24. Kerberos with impersonation: sharing data
    IT must create a group:
    # groupadd senioranalysts
    Then add the appropriate members to the group:
    # usermod -G analyst,senioranalysts analyst1
    # usermod -G management,analyst,senioranalysts manager1
25. Kerberos with impersonation: sharing data
    Then "manager1" can manually change the file permissions:
    $ hadoop fs -chgrp -R senioranalysts …/warehouse/manager1_table
    $ hadoop fs -ls /user/hive/warehouse/
    Found 3 items
    drwxr-x--T   - analyst1    analyst1          0  analyst1_table
    drwxr-x--T   - jranalyst1  jranalyst1        0  jranalyst1_table
    drwxr-x--T   - manager1    senioranalysts    0  manager1_table
26. Kerberos with impersonation: sharing data
    Now any senior-level analyst can query the data:
    $ whoami
    analyst1
    $ beeline
    ...
    Connected to: Hive (version 0.10.0)
    0: jdbc:hive2://localhost:10000/default> select count(*) from manager1_table;
    +------------+
    |  count(*)  |
    +------------+
    |     47     |
    +------------+
27. Kerberos with impersonation: sharing data
    Junior analysts cannot query the data:
    $ whoami
    jranalyst1
    $ beeline
    ....
    Connected to: Hive (version 0.10.0)
    0: jdbc:hive2://localhost:10000/default> select * from manager1_table;
    Error: java.io.IOException: org.apache.hadoop.security.AccessControlException:
    Permission denied: user=jranalyst1, access=READ_EXECUTE,
    inode="/user/hive/warehouse/manager1_table":manager1:senioranalysts:drwxr-x--T
28. Kerberos with impersonation: sharing data
    What happens in the real world?
29. Kerberos with impersonation: sharing data
    Table "manager1_table" is owned by user/group "manager1":
    $ hadoop fs -ls /user/hive/warehouse/
    Found 3 items
    drwxr-x--T   - analyst1    analyst1      0  analyst1_table
    drwxr-x--T   - jranalyst1  jranalyst1    0  jranalyst1_table
    drwxr-x--T   - manager1    manager1      0  manager1_table
30. Kerberos with impersonation: sharing data
    User "manager1" makes "manager1_table" world readable/writable:
    $ hadoop fs -chmod -R 777 /user/hive/warehouse/manager1_table
    $ hadoop fs -ls /user/hive/warehouse/
    Found 3 items
    drwxr-x--T   - analyst1    analyst1      0  analyst1_table
    drwxr-x--T   - jranalyst1  jranalyst1    0  jranalyst1_table
    drwxrwxrwt   - manager1    manager1      0  manager1_table
31. Kerberos with impersonation: summary
    • Securing Hive with Kerberos and impersonation makes Hive unusable for DW offload:
      • Manual file permission management
      • End state is world readable/writable data
      • No ability to restrict access to columns or rows
      • All users see all databases/tables/columns
32. Outline
    • Introduction
    • Hadoop security primer (Authentication, Authorization)
    • Security options (Default, Kerberos with Impersonation, Kerberos with Sentry)
    • Demo
33. Fine-Grained Security: Apache Sentry
    Authorization module for Hive, Search, & Impala
    • Unlocks key RBAC requirements
      • Secure, fine-grained, role-based authorization
      • Multi-tenant administration
    • Open source
      • Apache Incubator project
    • Ecosystem support
      • Apache SOLR, HiveServer2, & Impala 1.1+
34. Key Benefits of Sentry
    • Store sensitive data in Hadoop
    • Extend Hadoop to more users
    • Comply with regulations
35. Key Capabilities of Sentry
    • Fine-grained authorization
      • Specify security for SERVERS, DATABASES, TABLES & VIEWS
    • Role-based authorization
      • SELECT privilege on views & tables
      • INSERT privilege on tables
      • ALL privilege on the server, databases, tables & views
      • ALL privilege is needed to create/modify schema
    • Multi-tenant administration
      • Separate policies for each database/schema
      • Can be maintained by separate admins
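    A minimal sketch of what this looks like with Sentry's file-based policy provider (the provider shown on the next slide); the group, role, and table names reuse the earlier sharing example, and the HDFS path is an assumption:
    $ cat > sentry-provider.ini <<'EOF'
    [groups]
    # OS/LDAP group -> Sentry role(s)
    senioranalysts = senior_analyst_role
    [roles]
    # role -> privileges, scoped server->db->table with an action
    senior_analyst_role = server=server1->db=default->table=manager1_table->action=select
    EOF
    $ hadoop fs -put sentry-provider.ini /user/hive/sentry/sentry-provider.ini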
36. Sentry Architecture (diagram)
    Binding layer: Impala, HiveServer2 (Hive), SOLR (Search), Pig, …
    Authorization provider: policy engine + policy provider
    Policy storage: file (local FS/HDFS) or database
37. Query Execution Flow (diagram)
    SQL → Parse (validate SQL grammar) → Build (construct statement tree) → Check (validate statement objects; the first check is authorization, via Sentry) → Plan (forward to the execution planner) → MapReduce query
38. Outline
    • Introduction
    • Hadoop security primer (Authentication, Authorization)
    • Security options (Default, Kerberos with Impersonation, Kerberos with Sentry)
    • Demo
