sqrrl
Secure. Scale. Adapt
Sqrrl Data, Inc., All Rights Reserved
Security of data within Hadoop
  
General Data Problems
[Figure: "Problem" and "Solution" panels; "<5% of Data". Source: Forrester]
  
What about security?
  
What is the market saying?
•  security becomes an “enabler” by making it possible to bring together huge stores of data
•  You want security to be just as scalable, high-performance and self-organizing as the clusters
•  most big data technologies don’t have any security features built in
•  want fine-grained security and policy control at the database-level
  
A few more risks…
•  With every copy of data, there is an increased risk of unintended disclosure
•  Every now and then, people with access and privileges take a look at records without a legitimate business purpose, e.g., an employee of a banking system looking up their neighbor
  
The Perfect Storm
[Figure labels: Security Analysis, Customer Support, Customer Profiles, Sales & Marketing, Social Media, Business Improvement, Big Data, Regulations & Breaches; "Increased profits" appears six times]
  
The Perfect Storm
•  Big Data is a time-bomb, based on how things are coming together
•  Big Data deployment is growing fast; organizations are rushing into it
•  Shortage of Big Data skills
•  Big Data security solutions are not effective
•  General shortage of security skills
  
So what can we do?
  
Data-Centric Security
(Def.) A form of security in which data carries with it the elements of provenance that are required to make policy decisions on its visibility:
•  Separate data modeling for security and analysis
•  Data comes with security attributes governing its visibility; data is self-describing
•  Reusability of applications across security domains
•  Distributed development of ingest and query applications
•  Supported by Accumulo’s cell-level security
  
Data-Centric Security
Within Accumulo, a key is a 5-tuple, consisting of:
•  Row: Controls Atomicity
•  Column Family: Controls Locality
•  Column Qualifier: Controls Uniqueness
•  Visibility Label: Controls Access
•  Timestamp: Controls Versioning

Accumulo Key/Value Example
Row       Col. Fam.     Col. Qual.     Visibility    Timestamp  Value
John Doe  Notes         PCP            PCP_JD        20120912   Patient suffers from an acute …
John Doe  Test Results  Cholesterol    JD|PCP_JD     20120912   183
John Doe  Test Results  Mental Health  JD|PSYCH_JD   20120801   Pass
John Doe  Test Results  X-Ray          JD|PHYS_JD    20120513   1010110110100…
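
The visibility labels in the table above (for example JD|PCP_JD) are boolean expressions over authorization tokens. The following is a minimal Python sketch of how such an expression could be checked against a user's authorizations. It is an illustration only, not Accumulo's own parser: the function names tokenize and is_visible are hypothetical, and reading JD as the patient and PCP_JD as the patient's primary care physician is an assumption about the slide's example. Like Accumulo's syntax, the sketch supports & (and), | (or) and parentheses, and refuses to mix & and | without parentheses.

```python
import re


def tokenize(expr):
    """Split a visibility expression into labels, '&', '|', '(' and ')'."""
    tokens = re.findall(r"[A-Za-z0-9_]+|[&|()]", expr)
    if "".join(tokens) != expr.replace(" ", ""):
        raise ValueError(f"unexpected character in {expr!r}")
    return tokens


def is_visible(expr, authorizations):
    """Return True if a user holding `authorizations` may read a cell labeled `expr`."""
    if not expr:
        return True  # an empty label is treated as visible to everyone
    tokens = tokenize(expr)
    pos = 0

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            result = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return result
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected {tok!r}")
        return tok in authorizations  # a bare label: does the user hold it?

    def parse_expr():
        nonlocal pos
        result = parse_term()
        op = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if op is None:
                op = tokens[pos]
            elif tokens[pos] != op:
                raise ValueError("mixing & and | requires parentheses")
            pos += 1
            rhs = parse_term()
            result = (result and rhs) if op == "&" else (result or rhs)
        return result

    result = parse_expr()
    if pos != len(tokens):
        raise ValueError("trailing tokens in expression")
    return result


if __name__ == "__main__":
    pcp_auths = {"PCP_JD"}  # assumed: authorizations held by the primary care physician
    print(is_visible("JD|PCP_JD", pcp_auths))    # True: cholesterol result is readable
    print(is_visible("JD|PSYCH_JD", pcp_auths))  # False: mental health result is not
```

Under these assumptions, the physician can read the Notes and Cholesterol cells but not the Mental Health or X-Ray cells, while a user holding JD can read every test result but not the physician's notes.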
  
Data-Centric Security
Row  Col    Value
1    Name   Jones
1    Sales  100
1    Age    28
2    Name   Smith
2    Sales  350
2    Age    25
2    Quota  1000

Row  Col    Value
1    Name   Anon1
1    Sales  100
2    Name   Smith
2    Sales  350
2    Quota  1000

[Figure labels: User 1, User 2, Data Store]

A data-centric security approach allows all the data to be stored on a single platform, and only authorized data is returned to the user.
Pushing security to the data level simplifies application development and enables more powerful queries.
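
The two tables above can be reproduced with a small Python sketch of filtering at scan time: every cell is stored once on a single platform, and a scan returns only the cells whose label matches the caller's authorizations. Everything named here is an assumption for illustration: the labels "hr" and "analyst", the Cell type, the scan function, and in particular the idea that the anonymized name Anon1 is stored as a separate cell with its own label (the slides do not say how the anonymized view is produced).

```python
from typing import NamedTuple


class Cell(NamedTuple):
    row: str
    col: str
    visibility: frozenset  # holding any one of these (hypothetical) labels grants access
    value: str


BOTH = frozenset({"hr", "analyst"})
HR = frozenset({"hr"})
ANALYST = frozenset({"analyst"})

# One shared store; the application does not keep separate copies per audience.
STORE = [
    Cell("1", "Name", HR, "Jones"),
    Cell("1", "Name", ANALYST, "Anon1"),  # assumed: anonymized variant stored alongside
    Cell("1", "Sales", BOTH, "100"),
    Cell("1", "Age", HR, "28"),
    Cell("2", "Name", BOTH, "Smith"),
    Cell("2", "Sales", BOTH, "350"),
    Cell("2", "Age", HR, "25"),
    Cell("2", "Quota", BOTH, "1000"),
]


def scan(store, authorizations):
    """Return (row, col, value) for every cell the caller is authorized to see."""
    return [
        (c.row, c.col, c.value)
        for c in store
        if c.visibility & authorizations  # non-empty intersection means access
    ]


if __name__ == "__main__":
    for name, auths in (("hr user", {"hr"}), ("analyst user", {"analyst"})):
        print(name)
        for row, col, value in scan(STORE, auths):
            print(f"  {row}  {col:<6} {value}")
```

Because the filter runs where the data lives, the same query code serves both users; neither application needs its own access-control logic, which is the simplification the slide describes.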
  
Encryption of Files
We now have user access to the data secured. But what about your HDFS administrators?
  
Encryption of Files
By encrypting the files we write into HDFS, we further limit who can access the data!
  

Meetup presentation 06192013
