
Big Data Security with Hadoop


Cloudera's Principal Solutions Architect, Joey Echeverria, explains Big Data security at the Federal Big Data Forum.


  1. Big Data Security
     Joey Echeverria | Principal Solutions Architect
     joey@cloudera.com | @fwiffo
     ©2013 Cloudera, Inc.
  2. Big Data Security: EARLY DAYS
  3. Hadoop File Permissions
     • Added in HADOOP-1298
     • Hadoop 0.16
     • Early 2008
     • Authorization without authentication
     • POSIX-like RWX bits
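The "POSIX-like RWX bits" and "authorization without authentication" points on slide 3 can be illustrated with a minimal sketch. The function name, users, and groups below are hypothetical and this is not HDFS code; it only shows how an owner/group/other bit check works, and why it guards against accidents rather than attackers when the user name is self-reported.

```python
def can_access(mode, owner, group, user, user_groups, want):
    """Return True if `user` may perform `want` ('r', 'w' or 'x').

    `mode` is an octal permission value such as 0o640. `user` and
    `user_groups` are taken on trust here: nothing verifies that the
    caller really is that user, which is the weakness the slide names.
    """
    bit = {"r": 4, "w": 2, "x": 1}[want]
    if user == owner:
        shift = 6          # owner bits
    elif group in user_groups:
        shift = 3          # group bits
    else:
        shift = 0          # other bits
    return bool((mode >> shift) & bit)


# Hypothetical file owned by alice:analysts with mode rw-r-----
mode = 0o640
print(can_access(mode, "alice", "analysts", "alice", {"analysts"}, "w"))
print(can_access(mode, "alice", "analysts", "bob", {"analysts"}, "w"))
print(can_access(mode, "alice", "analysts", "bob", {"analysts"}, "r"))
```

Because the caller supplies `user`, anyone who claims to be `alice` passes the owner check; the bits only become a security control once the identity is authenticated.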
  4. MapReduce ACLs
     • Added in HADOOP-3698
     • Hadoop 0.19
     • Late 2008
     • ACLs per job queue
     • Set a list of allowed users or groups per operation
         • Job submission
         • Job administration
     • No authentication
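The per-queue, per-operation ACLs on slide 4 could be modeled roughly as follows. The queue name, operation names, and helper functions are assumptions made for illustration, not Hadoop's configuration keys or API.

```python
from dataclasses import dataclass, field


@dataclass
class OperationAcl:
    """Allowed users and groups for one operation on one queue."""
    users: set = field(default_factory=set)
    groups: set = field(default_factory=set)

    def allows(self, user, user_groups):
        return user in self.users or bool(self.groups & set(user_groups))


# Hypothetical ACL table: queue -> operation -> allowed users/groups.
QUEUE_ACLS = {
    "research": {
        "submit": OperationAcl(users={"alice"}, groups={"analysts"}),
        "administer": OperationAcl(users={"alice"}),
    },
}


def is_allowed(queue, operation, user, user_groups=()):
    """Deny by default when the queue or operation has no ACL."""
    acl = QUEUE_ACLS.get(queue, {}).get(operation)
    return acl is not None and acl.allows(user, user_groups)


print(is_allowed("research", "submit", "bob", ["analysts"]))
print(is_allowed("research", "administer", "bob", ["analysts"]))
```

As with file permissions, the user and group names are unverified at this stage, so the check is only as trustworthy as the client that reports them.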
  5. Securing a Cluster Through a Gateway
     • Hadoop cluster runs on a private network
     • Gateway server dual-homed (Hadoop network and public network)
     • Users SSH onto gateway
     • Optionally can create an SSH proxy for jobs to be submitted from the client machine
     • Provides minimum level of protection
  6. Big Data Security: WHY SECURITY MATTERS
  7. Prevent Accidental Access
     • Don’t let users shoot themselves in the foot
     • Main driver for early features
     • Not security per se, but a critical first step
     • Doesn’t require strong authentication
  8. Stop Malicious Users
     • Early features were necessary, but not sufficient
     • Security has to get real
     • Hadoop runs arbitrary code
     • Implicit trust doesn’t prevent the insider threat
  9. Co-mingle All Your Data
     • Often overlooked
     • Big data means getting rid of stovepipes
     • Scalability and flexibility are only 50% of the problem
     • Trust your data in a multi-tenant environment
     • Most critical driver
  10. Big Data Security: AN EVOLVING STORY
  11. Authorization
      • Files
      • MapReduce/YARN job queues
      • Service-level authorization
      • Whitelists and blacklists of hosts and users
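A whitelist/blacklist check of the kind mentioned on slide 11 might look like the sketch below. The precedence rules (blacklist wins, an empty whitelist admits everyone) and the host names are assumptions for illustration; the slide does not specify how Hadoop resolves conflicts.

```python
def admitted(name, whitelist, blacklist):
    """Decide whether a host or user name is admitted.

    Assumed rules: a blacklisted name is always rejected; if the
    whitelist is empty, every other name is admitted; otherwise the
    name must appear in the whitelist.
    """
    if name in blacklist:
        return False
    return not whitelist or name in whitelist


# Hypothetical host lists.
allowed_hosts = {"worker1.example.com", "worker2.example.com"}
blocked_hosts = {"worker2.example.com"}

print(admitted("worker1.example.com", allowed_hosts, blocked_hosts))
print(admitted("worker2.example.com", allowed_hosts, blocked_hosts))
```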
  12. Authentication
      • HADOOP-4487
      • Hadoop 0.22 and 0.20.205
      • Late 2010
      • Based on Kerberos and internal delegation tokens
      • Provides strong user authentication
      • Also used for service-to-service authentication
      [Figure: excerpt from a design document, section 2.2 "High Level Use Cases", including "Figure 1: HDFS High-level Dataflow"]
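The "internal delegation tokens" on slide 12 can be pictured with a simplified sketch: once a user has authenticated (for example with Kerberos), a service issues a token signed with a secret only it knows, and later requests present the token instead of Kerberos credentials. Everything below (key, field names, JSON encoding, SHA-256) is an assumption chosen for illustration and is not Hadoop's token format.

```python
import hashlib
import hmac
import json
import time

MASTER_KEY = b"demo-master-key"  # hypothetical secret held only by the service


def issue_token(owner, renewer, lifetime_s, now=None):
    """Return (identifier, password) for an already-authenticated user."""
    now = time.time() if now is None else now
    identifier = json.dumps(
        {"owner": owner, "renewer": renewer, "expires": now + lifetime_s},
        sort_keys=True,
    ).encode()
    # The password is a keyed hash of the identifier.
    password = hmac.new(MASTER_KEY, identifier, hashlib.sha256).digest()
    return identifier, password


def verify_token(identifier, password, now=None):
    """Return the token owner if the token is genuine and unexpired, else None."""
    now = time.time() if now is None else now
    expected = hmac.new(MASTER_KEY, identifier, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, password):
        return None
    claims = json.loads(identifier)
    return claims["owner"] if claims["expires"] > now else None


identifier, password = issue_token("joe", "mapred", lifetime_s=60)
print(verify_token(identifier, password))
```

The design point is that a task running on the user's behalf can carry the token without ever holding the user's long-lived credentials, and the token expires on its own.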
  13. Encryption
      • Over-the-wire encryption for some socket connections
      • RPC encryption added soon after Kerberos
      • Shuffle encryption (HTTPS) added in Hadoop 2.0.2-alpha, backported to CDH4 MR1
      • HDFS block streamer encryption added in Hadoop 2.0.2-alpha
      • Volume-level encryption for data at rest
  14. Big Data Security: SECURITY FOR KEY VALUE STORES
  15. Apache Accumulo
      • Robust, scalable, high-performance data storage and retrieval system
      • Built by NSA, now an Apache project
      • Based on Google’s BigTable
      • Built on top of HDFS, ZooKeeper and Thrift
      • Iterators for server-side extensions
      • Cell labels for flexible security models
  16. Data Model
      • Multi-dimensional, persistent, sorted map
      • Key/Value store with a twist
      • A single primary key (Row ID)
      • Secondary key (Column) internal to a row
          • Family
          • Qualifier
      • Per-cell timestamp
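The "multi-dimensional, sorted map" on slide 16 can be sketched as an in-memory structure keyed by (row ID, column family, column qualifier, timestamp). The class and method names are hypothetical, persistence is omitted, and sorting newer timestamps first is an assumption of this sketch.

```python
import bisect


class SortedCellMap:
    """Toy sorted map keyed by (row, family, qualifier, timestamp)."""

    def __init__(self):
        self._keys = []    # sorted list of key tuples
        self._values = {}  # key tuple -> value

    def put(self, row, family, qualifier, timestamp, value):
        # Negating the timestamp makes newer versions sort first (assumption).
        key = (row, family, qualifier, -timestamp)
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan_row(self, row):
        """Yield (family, qualifier, timestamp, value) for one row, in key order."""
        start = bisect.bisect_left(self._keys, (row,))
        for key in self._keys[start:]:
            if key[0] != row:
                break
            _, family, qualifier, neg_ts = key
            yield family, qualifier, -neg_ts, self._values[key]


cells = SortedCellMap()
cells.put("row1", "info", "name", 1, "old name")
cells.put("row1", "info", "name", 2, "new name")
cells.put("row1", "attr", "age", 1, "30")
for cell in cells.scan_row("row1"):
    print(cell)
```

Because keys are kept sorted, all cells for a row (and all versions of a column) sit next to each other, which is what makes range scans cheap in this kind of store.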
  17. Cell-Level Security
      • Labels stored per cell
      • Labels consist of Boolean expressions (AND, OR, nesting)
      • Labels associated with each user
      • Cell labels checked against user’s labels with a built-in iterator
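Checking a cell's Boolean label expression against a user's labels, as described on slide 17, could be sketched with a small recursive-descent evaluator. The `&`/`|` syntax, the rule that `&` binds tighter than `|`, and the treatment of an empty label as visible to everyone are assumptions of this sketch, not a statement of Accumulo's exact grammar.

```python
import re

TOKEN = re.compile(r"\s*([A-Za-z0-9_]+|[&|()])")


def tokenize(expr):
    tokens, pos = [], 0
    expr = expr.rstrip()
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if not match:
            raise ValueError(f"bad label expression at offset {pos}: {expr!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def visible(expr, user_labels):
    """Return True if a user holding `user_labels` satisfies `expr`."""
    tokens = tokenize(expr)
    if not tokens:
        return True  # assumption: an unlabeled cell is visible to everyone
    pos = 0

    def parse_or():
        nonlocal pos
        result = parse_and()
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            rhs = parse_and()  # always parse, then combine
            result = result or rhs
        return result

    def parse_and():
        nonlocal pos
        result = parse_term()
        while pos < len(tokens) and tokens[pos] == "&":
            pos += 1
            rhs = parse_term()
            result = result and rhs
        return result

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        token = tokens[pos]
        pos += 1
        if token == "(":
            result = parse_or()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return result
        if token in ("&", "|", ")"):
            raise ValueError(f"unexpected token {token!r}")
        return token in user_labels

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result


print(visible("admin|(analyst&us)", {"analyst", "us"}))
print(visible("admin|(analyst&us)", {"analyst"}))
```

In a real store this check runs on the server for every cell a scan touches, so cells the user may not see are never returned to the client.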
  18. Pluggable Authentication
      • Currently supports username/password authentication backed by ZooKeeper
      • ACCUMULO-259
      • Targeted for Accumulo 1.5.0
      • Authentication info replaced with generic tokens
      • Supports multiple implementations (e.g. Kerberos)
  19. Application Level
      • Accumulo often paired with application-level authentication/authorization
      • Accumulo users created per application
      • Each application granted access level of most permitted user
      • Application authenticates users, grabs user authorizations, passes user labels with requests
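The application-level pattern on slide 19 can be sketched as follows: the application account holds the widest set of labels, and for each request it passes only the labels of the end user it has authenticated. The directory, label names, and single-label cells are hypothetical simplifications.

```python
# Labels granted to the application's own account (hypothetical).
APP_AUTHORIZATIONS = frozenset({"public", "analyst", "admin"})

# Stand-in for the application's own user store (hypothetical).
USER_DIRECTORY = {
    "joe": {"public", "analyst"},
    "guest": {"public"},
}


def labels_for_request(user):
    """Labels to pass with a request: never more than the app itself holds."""
    user_labels = USER_DIRECTORY.get(user, set())
    return APP_AUTHORIZATIONS & user_labels


def scan(cells, user):
    """Return values whose single required label the user holds."""
    labels = labels_for_request(user)
    return [value for value, required in cells if required in labels]


CELLS = [
    ("press release", "public"),
    ("draft report", "analyst"),
    ("audit log", "admin"),
]

print(scan(CELLS, "joe"))    # ['press release', 'draft report']
print(scan(CELLS, "guest"))  # ['press release']
```

The trade-off is that the data store trusts the application to authenticate users correctly and to pass the right labels, so a compromised application can read everything its account is allowed to see.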
  20. Apache HBase
      • Also based on Google’s BigTable
      • Started as a Hadoop contrib project
      • Supports column-level ACLs
      • Kerberos for authentication
      • Discussion and early prototypes of cell-level security ongoing
  21. Big Data Security: FUTURE
  22. Encryption for Data at Rest
      • Need multiple levels of granularity
      • Encryption keys tied to authorization labels (like Accumulo labels or HBase ACLs)
      • APIs for file-level, block-level, or record-level encryption
  23. Hive Security
      • Column-level ACLs
      • Kerberos authentication
      • AccessServer