Big Data Security and Privacy - Presentation to AFCEA Cyber Symposium 2014

In our era of “Big Data”, organizations are collecting massive data sets from various sources, analyzing them, and making decisions based on that analysis, so security in this process is becoming increasingly important. With regulations like HIPAA and other privacy protection laws, securing access to data sets and determining their releasability is critical. Organizations using Big Data Analytics solutions face challenges, as most of today’s solutions were not designed with security in mind. This presentation focuses on challenges, use cases, and practical real-world solutions for securing Big Data Analytics solutions and preserving privacy in them, addressing authorization, differential privacy, and more.

Published in: Technology

  1. Big Data Security and Privacy
     AFCEA CyberSecurity Symposium 2014
     Kevin T. Smith, Novetta Solutions
     June 25, 2014
     Ksmith <AT> Novetta.com | KevinTSmith <AT> Comcast.Net
     Copyright © 2014, Novetta Solutions, LLC. All rights reserved.
  2. Big Data
     With the increase in computing power, electronic devices, and access to the Internet, more data than ever is being produced, collected, and transmitted.
     Interesting facts*:
     • Facebook collects 250 terabytes a day
     • Digital data production worldwide doubled in 2009 to 1 zettabyte (1 million petabytes)
     • Worldwide digital production is expected to reach:
       • 7.9 zettabytes in 2015
       • 35 zettabytes in 2020
     *Stats from Thomson Reuters & InfoQ, http://www.infoq.com/news/2013/12/HadoopUsage
     Organizations have recognized the power of data analysis, but are struggling to manage the massive amounts of information they have.
  3. Securing Big Data – Why Should We Care?
     Regulatory, Access Control & Releasability Concerns
     – Regulatory: Many organizations are required to enforce access control & privacy restrictions on data sets (HIPAA, privacy laws) or face steep penalties and fines
     – Access Control: U.S. Government organizations are required to provide access control based on need-to-know & formal authorization credentials
     – Releasability: Big Data brings new data management challenges, & organizations are struggling to understand what results they can release without unintentionally disclosing information
     Insider Threat / Threats on Availability
     – How do you control access to your analytics? Many deployments are unsecured
     – “Your data is only a distributed delete away”
     Mismanagement of Data Sets & Breaches are Costly
     – AOL Research “Data Valdez Incident”: Listed as one of CNN/Money’s “Dumbest Moments in Business”; $5M settlement + $100 to each member at the time + $50 “to any member concerned”
     – Netflix Contest & “Anonymized Data Set”: Class action lawsuit, $9M settlement
     – PlayStation (2011): Experts predict costs to Sony between $2.4 and $2.6 billion
     *Ponemon Institute, “Cost of Data Breach Study: Global Analysis”, May 2013
  4. What Makes Securing Big Data Different?
     Unique Challenges to Big Data Analytics
     – Distributed Security: When data and processing are distributed across a cluster, there are many moving parts to secure for confidentiality, integrity, and availability. This often makes developing & configuring security on these systems complex.
     – Combination of Different Sources: Big Data Analytics solutions are great at bringing many data sources together & doing analytics on their combination. Given that each data source may have its own access control security policy, how do you enforce security policies on the combination of these data sources?
     – Aggregation & Differential Privacy: When you combine different sources of data, you may discover “connections” between those sources that disclose more information than you intended, potentially violating access control and privacy policies.
     – Unintended Deduction from Large Data Sets: Data sets are typically so large that it is often difficult to determine what sensitive information may be deduced from them.
  5. Deduction & Differential Privacy Example
     Could a data analyst working for Commissioner Gordon deduce that Batman is Bruce Wayne?
  6. To Complicate the Matter…
     Most data analytics tools were designed without security in mind. Example: Apache Hadoop
     Originally No Security Model
     – No authentication of users or services
     – Anyone could submit arbitrary code to be executed
     – Anyone could add, delete, or read data in the distributed file system
     – You could write a service that impersonated a Hadoop service
     – Later, after authorization was added, user impersonation = command line switch
     2009 Yahoo! Security Retrofit
     – Resulting security model is complex
     – Configuration is complex
     – No data-at-rest encryption
     – Kerberos-centric
     – Limited authorization capabilities
     – Easy to mess up if you don’t know what you are doing
     Things Are Changing, But They Are Changing Slowly!
     – An alphabet soup of secure distributions, vendor add-ons & security-focused companies
     – Companies releasing Hadoop distros are taking security seriously (see recent press releases: Cloudera and Gazzang, Hortonworks and XA Secure)
     – Much activity in open source movements like Project Rhino & projects like Apache Sentry
  7. All Security Needs to be Policy-Driven
  8. Air Gap & Isolation Approaches
     - Network isolation in various forms is used in lieu of security in “closed networks”
     - Import/export is problematic
     - Accidents may still happen
     - Does not solve differential privacy or authorization (AuthZ) issues
  9. Augmenting Analytic Security with Other Tools
     Find your analytics tools’ limitations & complement your solution with other tools and libraries.
     Ex: Apache Accumulo
     • Cell-level access control via visibility labels
     • By default, uses its own database for users & credentials
     • Can be extended in code to use other Identity & Access Management infrastructure
     Example here shows building a security layer over Hadoop…
  10. Differential Privacy & Deduction
      – Many approaches are in the academic sphere
        • Cynthia Dwork from Microsoft Research is one of the leading researchers
        • Lots of university work
        • Lots of math involved
      – I’m involved in more practical solutions (but no math)
        • Determining access control policies up front & applying that policy
        • Determining entities that should not resolve (Batman + Bruce Wayne) & including this in the security of the system
        • Sometimes this involves an aggregation filter component to prevent the resolution of entities
        • We will still need to follow the academic research in this area
  11. Final Thoughts – General Guidance
      Every Security Approach Is Different
      – Security is a journey, not a destination
      – Know your security requirements
        • Understand your security requirements & policies related to access to data
      – Know the security policies of your data
        • Understand the security policies of your data so that you can enforce them
      – Know your tools & their limitations
        • Understand, from an in-depth perspective, how to successfully meet your security goals
        • Understand the limitations of your tools & augment your solutions with other approaches
      – Understand the unique challenges of Big Data security
        • Combination of different sources & resulting policies
        • Aggregation and differential privacy (Netflix Contest)
        • Unintended disclosure (The Batman Problem)
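
Slide 9 mentions cell-level access control in Apache Accumulo, where each cell carries a label expression and a reader sees the cell only if their authorizations satisfy it. The following is a minimal sketch of that idea in Python, not Accumulo's own API: the names `is_visible`, `scan`, and `CELLS` are hypothetical, and the expression grammar (labels joined by `&` or `|`, parentheses required when mixing the two, empty label meaning unrestricted) is a simplified assumption modeled on Accumulo's visibility expressions.

```python
import re

# One token is either a label or one of the operators & | ( )
TOKEN = re.compile(r"\s*([A-Za-z0-9_.:-]+|[&|()])")


def tokenize(expr):
    """Split a visibility expression into labels and operators."""
    expr = expr.strip()
    tokens, pos = [], 0
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos} in {expr!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_visible(expr, auths):
    """Return True if a reader holding the set `auths` may see a cell labeled `expr`."""
    tokens = tokenize(expr)
    if not tokens:
        return True  # assumption: an empty label means the cell is unrestricted
    pos = 0

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("expression ended unexpectedly")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected {tok!r}")
        return tok in auths  # a bare label is satisfied if the reader holds it

    def parse_expr():
        nonlocal pos
        value = parse_term()
        op = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if op is None:
                op = tokens[pos]
            elif tokens[pos] != op:
                raise ValueError("mixing & and | requires parentheses")
            pos += 1
            rhs = parse_term()  # always parse, so malformed input is still rejected
            value = (value and rhs) if op == "&" else (value or rhs)
        return value

    result = parse_expr()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result


# Hypothetical cells: (row, column, visibility label, value)
CELLS = [
    ("patient:42", "name", "public", "J. Doe"),
    ("patient:42", "diagnosis", "medical&(doctor|auditor)", "flu"),
    ("patient:42", "billing", "finance|auditor", "$120"),
]


def scan(cells, auths):
    """Return only the cells whose label is satisfied by the reader's authorizations."""
    return [cell for cell in cells if is_visible(cell[2], auths)]
```

Calling `scan(CELLS, {"public", "auditor"})` filters out the diagnosis cell, because the reader lacks the `medical` label. Putting the check in the scan path rather than in each analytic is what lets one policy apply to every query over the combined data.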

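Slide 10 points to the academic work on differential privacy without showing any of the math. As a rough illustration only, the sketch below applies the Laplace mechanism, one standard technique from that literature, to a counting query. It is not the presenter's method. The names `private_count` and `laplace_noise` are hypothetical, and it assumes each person contributes at most one record, so the count has sensitivity 1 and the noise scale is `1 / epsilon`.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of records matching `predicate`.

    Assumes adding or removing one person changes the true count by at
    most 1 (sensitivity 1), so Laplace noise with scale 1/epsilon is used.
    Smaller epsilon means more noise and stronger privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical data set: one record per person
RECORDS = [
    {"city": "Gotham", "night_shift": True},
    {"city": "Gotham", "night_shift": False},
    {"city": "Metropolis", "night_shift": True},
]

if __name__ == "__main__":
    noisy = private_count(RECORDS, lambda r: r["city"] == "Gotham", epsilon=0.5)
    print(f"noisy count of Gotham residents: {noisy:.2f}")
```

Because the released number is noisy, an analyst cannot tell from a single answer whether any one individual is in the data set, which is the kind of unintended deduction the slides warn about. Repeated queries consume more privacy budget, which this sketch does not track.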