sqrrl: Secure. Scale. Adapt
Security of data within Hadoop
Sqrrl Data, Inc., All Rights Reserved
General Data Problems
[Figure: Problem / Solution; "<5% of Data". Source: Forrester]
What about security?
What is the market saying?
• security becomes an “enabler” by making it possible to bring together huge stores of data
• You want security to be just as scalable, high-performance and self-organizing as the clusters
• most big data technologies don’t have any security features built in
• want fine-grained security and policy control at the database-level
A few more risks…
• With every copy of data, there is an increased risk of unintended disclosure
• Every now and then, people with access and privileges look at records without a legitimate business purpose, e.g., an employee of a banking system looking up their neighbor
The Perfect Storm
[Figure labels: Big Data; Security Analysis; Customer Support; Customer Profiles; Sales & Marketing; Social Media; Business Improvement; Increased profits; Regulations & Breaches]
The Perfect Storm
• Big Data is a time bomb, given how these trends are coming together
• Big Data deployment is growing fast; organizations are rushing into it
• Shortage in Big Data skills
• Big Data security solutions are not effective
• General shortage in security skills
So what can we do?
Data-Centric Security
(Def.) A form of security in which data carries with it the elements of provenance that are required to make policy decisions on its visibility:
• Separate data modeling for security and analysis
• Data comes with security attributes governing its visibility; the data is self-describing
• Reusability of applications across security domains
• Distributed development of ingest and query applications
• Supported by Accumulo’s cell-level security
Data-Centric Security
Within Accumulo, a key is a 5-tuple, consisting of:
• Row: Controls Atomicity
• Column Family: Controls Locality
• Column Qualifier: Controls Uniqueness
• Visibility Label: Controls Access
• Timestamp: Controls Versioning

Accumulo Key/Value Example

Row       Col. Fam.     Col. Qual.     Visibility   Timestamp  Value
John Doe  Notes         PCP            PCP_JD       20120912   Patient suffers from an acute …
John Doe  Test Results  Cholesterol    JD|PCP_JD    20120912   183
John Doe  Test Results  Mental Health  JD|PSYCH_JD  20120801   Pass
John Doe  Test Results  X-Ray          JD|PHYS_JD   20120513   1010110110100…
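To make the role of the visibility label concrete, here is a minimal Python sketch of cell-level filtering over the example rows above. It is an illustration only, not Accumulo's implementation or API: the names `is_visible`, `scan`, and `CELLS` are hypothetical, and the evaluator assumes labels combine with `|` (or), `&` (and), and parentheses, with mixed operators always parenthesized. It also assumes that a cell with an empty label is visible to everyone.

```python
import re

def tokenize(expr):
    # Labels are runs of letters, digits, and underscores; operators are & | ( ).
    return re.findall(r"[A-Za-z0-9_]+|[&|()]", expr)

def is_visible(expr, auths):
    """Return True if a user holding the set `auths` satisfies the label expression."""
    tokens = tokenize(expr)
    pos = 0

    def parse_term():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            result = parse_expr()
            pos += 1  # skip the closing ")"
            return result
        return tok in auths  # a bare label is satisfied if the user holds it

    def parse_expr():
        nonlocal pos
        result = parse_term()
        # Evaluate left to right; mixed & and | are assumed to be parenthesized.
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            op = tokens[pos]
            pos += 1
            rhs = parse_term()
            result = (result and rhs) if op == "&" else (result or rhs)
        return result

    # Assumption: an empty label means the cell is visible to everyone.
    return parse_expr() if tokens else True

# (row, column family, column qualifier, visibility, timestamp, value)
CELLS = [
    ("John Doe", "Notes", "PCP", "PCP_JD", "20120912", "Patient suffers from an acute …"),
    ("John Doe", "Test Results", "Cholesterol", "JD|PCP_JD", "20120912", "183"),
    ("John Doe", "Test Results", "Mental Health", "JD|PSYCH_JD", "20120801", "Pass"),
    ("John Doe", "Test Results", "X-Ray", "JD|PHYS_JD", "20120513", "1010110110100…"),
]

def scan(cells, auths):
    # Return only the cells whose visibility label the user's authorizations satisfy.
    return [cell for cell in cells if is_visible(cell[3], auths)]

if __name__ == "__main__":
    for auths in ({"JD"}, {"PCP_JD"}, {"PSYCH_JD"}):
        print(sorted(auths), [cell[2] for cell in scan(CELLS, auths)])
```

In this sketch, a user holding only the `PSYCH_JD` authorization gets back the Mental Health result and nothing else, while a user holding `JD` gets all three test results but not the PCP notes. The filtering happens where the data is read, so the querying application does not need its own access logic.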
Data-Centric Security

Data Store
Row  Col    Value
1    Name   Jones
1    Sales  100
1    Age    28
2    Name   Smith
2    Sales  350
2    Age    25
2    Quota  1000

Authorized view returned to a user (figure labels: User 1, User 2)
Row  Col    Value
1    Name   Anon1
1    Sales  100
2    Name   Smith
2    Sales  350
2    Quota  1000

The data-centric security approach allows all the data to be stored on a single platform, and only authorized data is returned to the user.
Pushing security to the data level simplifies application development and enables more powerful queries.
Encryption of Files
We now have user access to the data secured. But what about your HDFS administrators?
Encryption of Files
By encrypting the files we write into HDFS, we further limit who can access the data!