In the era of “Big Data”, organizations collect and analyze massive data sets from many sources and base decisions on that analysis, so securing this process is increasingly important. With regulations such as HIPAA and other privacy protection laws, securing access to data sets and determining their releasability is critical. Organizations using Big Data analytics solutions face challenges because most of today’s solutions were not designed with security in mind. This presentation focuses on challenges, use cases, and practical real-world solutions for securing and preserving privacy in Big Data analytics solutions, addressing authorization, differential privacy, and more.
2. Big Data
With the increase in computing power, electronic devices, and access to the Internet,
more data than ever is being produced, collected, and transmitted.
Interesting Facts*:
• Facebook Collects 250 Terabytes a Day
• Digital data production worldwide doubled in 2009 to 1 zettabyte (1 million petabytes)
• Worldwide digital production is expected to reach 7.9 zettabytes in 2015 and 35 zettabytes in 2020
*Stats from Thomson Reuters & InfoQ, http://www.infoq.com/news/2013/12/HadoopUsage
Organizations have recognized the power of data analysis, but are struggling to manage
the massive amounts of information they have.
8. Air Gap & Isolation Approaches
- Network isolation in various forms is used in lieu of security in “closed networks”
- Import/export is problematic
- Accidents may still happen
- Does not solve issues related to differential privacy or authorization (AuthZ)
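To make the last point concrete: isolation controls who can reach the data, not what an answer reveals about one individual. The sketch below is one common way to add differential privacy to a counting query, the Laplace mechanism. It is an illustration under stated assumptions, not a method taken from this presentation: the function name `private_count`, the record layout, and the choice of `epsilon` are all hypothetical.

```python
import random


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of records matching `predicate`.

    Hypothetical helper for illustration. A counting query changes by at
    most 1 when one record is added or removed (sensitivity 1), so Laplace
    noise with scale 1/epsilon is added to the true count.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()

    true_count = sum(1 for r in records if predicate(r))

    # The difference of two independent exponential samples with rate
    # `epsilon` follows a Laplace distribution with scale 1/epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


if __name__ == "__main__":
    # Made-up records; the field name "dx" is an assumption.
    records = [{"dx": "flu"}] * 30 + [{"dx": "cold"}] * 70
    print(private_count(records, lambda r: r["dx"] == "flu", epsilon=0.5))
```

A smaller `epsilon` means more noise and stronger privacy; a larger one means more accurate answers. Repeated queries consume privacy budget, which this sketch does not track.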
9. Augmenting Analytic Security with Other Tools
Ex: Apache Accumulo
• Cell-level access control via visibility labels
• By default, uses its own database for users and credentials
• Can be extended in code to use other Identity & Access Management infrastructure
Find your analytics tool’s limitations and complement your solution with other tools
and libraries.
Example here shows building a security layer over Hadoop…
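The idea behind cell-level access control can be sketched in a few lines. This is a toy model, not Accumulo's API: every name below (`is_visible`, `visible_cells`, the row layout) is hypothetical. Each cell carries a label expression built from `&`, `|`, and parentheses, and a reader sees the cell only if the authorizations they hold satisfy it. As a simplification, this sketch lets `&` bind tighter than `|`, whereas Accumulo requires parentheses when mixing the two.

```python
import re

_TOKEN = re.compile(r"\s*([A-Za-z0-9_]+|[&|()])")


def _tokenize(expr):
    """Split a label expression into names, operators, and parentheses."""
    expr = expr.strip()
    tokens, pos = [], 0
    while pos < len(expr):
        match = _TOKEN.match(expr, pos)
        if not match:
            raise ValueError(f"bad visibility expression: {expr!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_visible(expr, auths):
    """Return True if a reader holding `auths` may see a cell labeled `expr`."""
    tokens = _tokenize(expr)
    if not tokens:
        return True  # an unlabeled cell is visible to everyone
    pos = 0

    def parse_or():
        nonlocal pos
        value = parse_and()
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            rhs = parse_and()
            value = value or rhs
        return value

    def parse_and():
        nonlocal pos
        value = parse_atom()
        while pos < len(tokens) and tokens[pos] == "&":
            pos += 1
            rhs = parse_atom()
            value = value and rhs
        return value

    def parse_atom():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = parse_or()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected token {tok!r}")
        return tok in auths  # a bare label: does the reader hold it?

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result


def visible_cells(row, auths):
    """Filter a row of {column: (value, label)} down to readable cells."""
    return {col: val for col, (val, label) in row.items()
            if is_visible(label, auths)}
```

For example, given a made-up row `{"name": ("Alice", ""), "diagnosis": ("flu", "doctor|nurse"), "account": ("A-17", "billing&admin")}`, calling `visible_cells(row, {"nurse"})` keeps the `name` and `diagnosis` cells and drops `account`. In a real deployment the set of authorizations would come from the identity and access management infrastructure rather than being passed in by the caller.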