Five Steps to Secure Big Data
Ulf Mattsson, CTO
Protegrity
ulf.mattsson AT protegrity.com
Ulf Mattsson, CTO Protegrity
20 years with IBM
• Research & Development & Global Services

Inventor
• Encryption, Tokeniza...
Big Data
What is Big Data?
Hadoop
• Designed to handle the emerging “4 V’s”
• Massively Parallel Processing (MPP)
• Elastic scale
•...
Has Your Organization Already Invested in Big Data?

Source: Gartner
http://www.ey.com/Publication/vwLUAssets/EY_-_2013_Global_Information_Security_Survey/$FILE/EY-GISS-Under-cyber-attack.pdf...
Holes in Big Data…

Source: Gartner
Many Ways to Hack Big Data

BI Reporting

RDBMS

Hackers

Pig (Data Flow)

Hive (SQL)

Sqoop

Unvetted
Applications
Or
Ad ...
Current Data Security for Big Data
Authentication
• Who am I and how do I prove it?
•

Ensure the identity of the users, s...
Data
Security


Taking Data Security
to the Next Level
Achieving Best Data Security for Big Data
Massively Scalable Data Security
Maximum Transparency
Maximum Performance
Easy t...
Many Layers of Defense
Corporate Enterprise

Kerberos Authentication
Encrypted Communications

Big Data

Corporate Firewal...
Protecting the Big Data Ecosystem
BI Applications

BI Applications are authorized to access
sensitive data through the pol...
Coarse
Grained


Policy Based
File and Disk
Encryption
File Based Encryption Example
Files with personally identifiable information
Stored in Hadoop cluster
Root user logged-in to...
Fine
Grained


Policy Based
Field Level Data
Protection
Fine Grained Protection: Field Protection

Production Systems

Encryption
• Reversible
• Policy Control (Authorized / Unau...
Field Level Protection Example
Files with personally identifiable information
Loaded into a Hive table
Select data from tha...
Security
Policy


Take Control Of Data
Security
Policy Based Access Control

Combination of what
data needs to be
protected and who has
access to that data is
the key to ...
Protegrity Data Security Policy

What

What is the sensitive data that needs to be protected. Data
Element.

How

How you ...
Policy Based Field Protection Example
Files with personally identifiable information
Loaded into a Hive table
Create a view...
Enterprise Strength

Enterprise


Protection platforms must
protect sensitive data end to
end – at rest, in transit and...
End to End Data Security Across the Enterprise

Enterprise Heterogeneous Coverage
• File Protectors: AIX, HPUX, Linux, Sol...
Best Practices for Protecting Big Data
Start Early
Fine Grained protection
Select the optimal protection for the future
En...
Five Point Data Protection
Methodology

1. Classify


2. Discovery

3. Protect

4. Enforce

5. Monitor
Classify
Determine what data is
sensitive to your organization.
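
A minimal sketch of a classification catalog, assuming column names are the unit being classified. The catalog keys and the `classify` helper are hypothetical names chosen for illustration; the regulation labels follow the examples of sensitive data given in this deck.

```python
# Hypothetical catalog: data element -> regulations that make it sensitive.
# Element names are illustrative; regulation labels come from the deck.
CATALOG = {
    "credit_card_number": ["PCI DSS"],
    "name": ["HIPAA", "State Privacy Laws"],
    "address": ["HIPAA", "State Privacy Laws"],
    "phone_number": ["HIPAA", "State Privacy Laws"],
}


def classify(columns):
    """Split column names into sensitive ones (with regulations) and the rest."""
    sensitive = {}
    unclassified = []
    for col in columns:
        key = col.strip().lower()
        if key in CATALOG:
            sensitive[col] = CATALOG[key]
        else:
            unclassified.append(col)
    return sensitive, unclassified


if __name__ == "__main__":
    found, rest = classify(["Name", "credit_card_number", "store_id"])
    print(found)
    print(rest)
```

Columns that fall into the unclassified list are candidates for manual review rather than proof that the data is harmless.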

Select US Regulations for Security and Privacy
Financial Services
Healthcare and Pharmaceuticals
Infrastructure and Energy...
1. Classify: Examples of Sensitive Data
Sensitive Information
Credit Card Numbers

PCI DSS

Names

HIPAA, State Privacy La...
Discovery
Discover where the sensitive
data is located and how it flows
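
One way discovery can be approximated is a pattern scan over text records. The sketch below looks for card-like digit runs and confirms them with the Luhn checksum; it is an illustration only, not the discovery method of any product named in this deck. The function names and the `sources` layout (system name mapped to a list of text lines) are assumptions.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text):
    """Return card-like numbers in text that pass the Luhn check."""
    hits = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            hits.append(digits)
    return hits


def discover(sources):
    """Map each source name to the line numbers that contain card numbers."""
    report = {}
    for name, lines in sources.items():
        flagged = [n for n, line in enumerate(lines, start=1)
                   if find_card_numbers(line)]
        if flagged:
            report[name] = flagged
    return report
```

A scan like this only covers one data element; names, addresses and dates need other detectors, and the result still has to be tied back to the business context of each system.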

2. Discovery in a large enterprise with many systems
System

System

System

System

System

System

System

System

Syste...
2. Discovery: Determine the context to the Business
System

Retail

System

System

Employees
System

System

Corporate IP...
2. Discover: Context to the Business and to Security
Collecting
transactions

Stores &
Ecommerce

Databases

Data Protecti...
Protect
Protect the sensitive data at
rest and in transit.
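
As a rough illustration of field-level protection, the sketch below implements a simple vault-based tokenizer that replaces digits with random digits and keeps separators, so the token has the same layout as the original. This is a generic textbook approach and an assumption on our part; it is not Protegrity's tokenization method, and the `TokenVault` class is a hypothetical name.

```python
import secrets


class TokenVault:
    """In-memory token vault: illustrative only, not production-grade."""

    def __init__(self):
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenize(self, value):
        """Return a stable token with the same length and separators."""
        if value in self._token_by_value:
            return self._token_by_value[value]
        if sum(ch.isdigit() for ch in value) < 4:
            # Too few digits to randomize without likely collisions.
            raise ValueError("value needs at least four digits")
        while True:
            token = "".join(
                secrets.choice("0123456789") if ch.isdigit() else ch
                for ch in value
            )
            # Retry on the rare case of a collision or an unchanged value.
            if token != value and token not in self._value_by_token:
                break
        self._token_by_value[value] = token
        self._value_by_token[token] = value
        return token

    def detokenize(self, token):
        """Return the original value; raises KeyError for unknown tokens."""
        return self._value_by_token[token]


if __name__ == "__main__":
    vault = TokenVault()
    token = vault.tokenize("0389 3778 3652 0038")
    print(token, "->", vault.detokenize(token))
```

Because the token keeps the format of the original, downstream analytics can often run on protected data unchanged, while only callers with access to the vault can reverse it.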

Balancing Security and Data Insight
Tug of war between security and data insight
Big Data is designed for access
Privacy r...
Protection Beyond Kerberos

ETL Tools

BI Reporting

RDBMS

Pig (Data Flow)

Hive (SQL)

Sqoop

MapReduce
(Job Scheduling/...
Volume Encryption

Entire file is in the
clear when analyzed

MapReduce

HDFS

Protected with
Volume Encryption

File Encryption – Authorized User

Entire file is in the
clear when analyzed

MapReduce

HDFS

Protected with
File Encrypt...
File Encryption – Non Authorized User

Entire file is
unreadable when
analyzed

MapReduce

HDFS

Protected with
File En...
Volume Encryption + Gateway Field Protection

Granular Field
Level Protection

MapReduce

HDFS

Data Protection File
Gatew...
Volume Encryption + Internal MapReduce Field Protection

Analytics
Granular Field
Level Protection

MapReduce
Hadoop
Stagi...
Enforce
Policies are used to enforce
rules about how sensitive data
should be treated in the
enterprise.
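
The sketch below shows how a policy built from the What, How, Who and When elements described in this deck might drive what a requestor sees. The `Policy` class, the `present` function and the role names are hypothetical; the sample values mirror the authorized and unauthorized user examples in the deck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    data_element: str                    # What: the sensitive data element
    protection: str                      # How: e.g. "tokenization"
    authorized_roles: frozenset          # Who: roles allowed to see clear data
    allowed_hours: range = range(0, 24)  # When: hours of day access is granted


def present(policy, role, hour, clear_value, protected_value, audit_log):
    """Return the clear value only to authorized requestors; audit every call."""
    allowed = role in policy.authorized_roles and hour in policy.allowed_hours
    outcome = "unprotect" if allowed else "denied"
    audit_log.append((policy.data_element, role, hour, outcome))
    return clear_value if allowed else protected_value


if __name__ == "__main__":
    log = []
    name_policy = Policy(
        data_element="name",
        protection="tokenization",
        authorized_roles=frozenset({"data_scientist", "business_analyst"}),
        allowed_hours=range(8, 18),
    )
    print(present(name_policy, "data_scientist", 10, "Joe Smith", "csu wusoj", log))
    print(present(name_policy, "dba", 10, "Joe Smith", "csu wusoj", log))
    print(log)
```

Denied requests still receive the protected value rather than an error, which matches the idea that privileged users can operate the system without seeing the clear data.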

A Data Security Policy
What

What is the sensitive data that needs to be protected. Data
Element.

How

How you want to pr...
Volume Encryption + Field Protection + Policy Enforcement

MapReduce

HDFS
Protected with
Volume Encryption

Data Protecti...
Volume Encryption + Field Protection + Policy Enforcement

MapReduce

HDFS
Protected with
Volume Encryption

Data Protecti...
4. Authorized User Example
Presentation to requestor
Name: Joe Smith
Address: 100 Main Street, Pleasantville, CA

Data Sci...
4. Un-Authorized User Example
Presentation to requestor
Name: csu wusoj
Address: 476 srta coetse, cysieondusbak, CA

Privi...
Monitor
A critically important part of a
security solution is the ongoing
monitoring of any activity on
sensitive data.
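
A very small sketch of monitoring, assuming access events arrive as (user, data element) pairs and that "abnormal" simply means more accesses than a fixed threshold. Both the event shape and the threshold rule are assumptions made for illustration; real monitoring would use richer signals.

```python
from collections import Counter


def flag_abnormal_access(events, threshold):
    """Return users whose number of sensitive-data accesses exceeds threshold.

    events: iterable of (user, data_element) tuples (hypothetical format).
    """
    counts = Counter(user for user, _ in events)
    return sorted(user for user, n in counts.items() if n > threshold)


if __name__ == "__main__":
    sample = [("alice", "name"), ("bob", "name"), ("bob", "address"),
              ("bob", "credit_card_number"), ("alice", "address")]
    print(flag_abnormal_access(sample, threshold=2))
```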

4...
Best Practices for Protecting Big Data
Start early
Granular protection
Select the optimal protection
Enterprise coverage
P...
How Protegrity Can Help

1
2

We can help you Discover where the sensitive data sits

3

We can help you Protect your sens...
Protegrity Summary
Proven enterprise data security
software and innovation leader
•

Sole focus on the protection of
data
...
Please contact us for more information
Ulf.Mattsson@protegrity.com
Info@protegrity.com
database, security, big data, tokenization

Published in: Technology
Five steps to secure big data

  1. 1. Five Steps to Secure Big Data Ulf Mattsson, CTO Protegrity ulf.mattsson AT protegrity.com
  2. 2. Ulf Mattsson, CTO Protegrity 20 years with IBM • Research & Development & Global Services Inventor • Encryption, Tokenization & Intrusion Prevention Involvement • PCI Security Standards Council (PCI SSC) • American National Standards Institute (ANSI) X9 • Encryption & Tokenization • International Federation for Information Processing • IFIP WG 11.3 Data and Application Security • ISACA New York Metro chapter
  3. 3. Big Data
  4. 4. What is Big Data? Hadoop • Designed to handle the emerging “4 V’s” • Massively Parallel Processing (MPP) • Elastic scale • Usually Read-Only • Allows for data insights on massive, heterogeneous data sets • Includes an ecosystem of components: Hive Pig Other Application Layers MapReduce HDFS Storage Layers Physical Storage
  5. 5. Has Your Organization Already Invested in Big Data? Source: Gartner
  6. 6. http://www.ey.com/Publication/vwLUAssets/EY_-_2013_Global_Information_Security_Survey/$FILE/EY-GISS-Under-cyber-attack.pdf
  7. 7. Holes in Big Data… Source: Gartner
  8. 8. Many Ways to Hack Big Data BI Reporting RDBMS Hackers Pig (Data Flow) Hive (SQL) Sqoop Unvetted Applications Or Ad Hoc Processes MapReduce (Job Scheduling/Execution System) HBase (Column DB) HDFS (Hadoop Distributed File System) Avro (Serialization) ZooKeeper (Coordination) ETL Tools Privileged Users Source: http://nosql.mypopescu.com/post/1473423255/apache-hadoop-and-hbase
  9. 9. Current Data Security for Big Data Authentication • Who am I and how do I prove it? • Ensure the identity of the users, services and hosts that make up and use the system is authoritatively known Authorization • What am I allowed to see and do? • Ensure services and data are accessed only by entitled identities Data Protection • How is my Data being Protected? • Ensure data cannot be usefully stolen or undetectably tampered with Auditing • What have I attempted to do or done? • Ensure a permanent record of who did what, when
  10. 10. Data Security Taking Data Security to the Next Level
  11. 11. Achieving Best Data Security for Big Data Massively Scalable Data Security Maximum Transparency Maximum Performance Easy to Use Heterogeneous System Compatibility Enterprise Ready
  12. 12. Many Layers of Defense Corporate Enterprise Kerberos Authentication Encrypted Communications Big Data Corporate Firewall Authorization through ACLs Fine Grained Big Data Cluster 8 Data Security Policy Protegrity Coarse Grained
  13. 13. Protecting the Big Data Ecosystem BI Applications BI Applications are authorized to access sensitive data through the policy. Data Access Framework Pig Hive Data Processing Framework (MapReduce) Data Storage Framework (HDFS) User Defined Functions (UDFs) enable Field Level data protection with Policy based access controls with Monitoring. Java API enables Field Level data protection with Policy based access controls with Monitoring. File level data protection with Policy based access controls for existing and new data. Volume or File Encryption with Policy based access controls at the OS file system level.
  14. 14. Coarse Grained Policy Based File and Disk Encryption
  15. 15. File Based Encryption Example Files with personally identifiable information Stored in Hadoop cluster Root user logged in to one of the nodes Search for sensitive information on disk
  16. 16. Fine Grained Policy Based Field Level Data Protection
  17. 17. Fine Grained Protection: Field Protection Production Systems Encryption • Reversible • Policy Control (Authorized / Unauthorized Access) • Lacks Integration Transparency • Complex Key Management • Example !@#$%a^.,mhu7///&*B()_+!@ Tokenization / Pseudonymization • Reversible • Policy Control (Authorized / Unauthorized Access) • Integrates Transparently • No Complex Key Management • Business Intelligence Credit Card: 0389 3778 3652 0038 Non-Production Systems Masking • Not reversible • No Policy, Everyone Can Access the Data • Integrates Transparently • No Complex Key Management • Example 0389 3778 3652 0038
  18. 18. Field Level Protection Example Files with personally identifiable information Loaded into a Hive table Select data from that table Root user logged in to one of the nodes Search for sensitive information on disk
  19. 19. Security Policy Take Control Of Data Security
  20. 20. Policy Based Access Control Combination of what data needs to be protected and who has access to that data is the key to creating a meaningful policy What: What is the sensitive data that needs to be protected? Data Element. Who: Who should have access to sensitive data and who should not? Security access control. Roles & Members.
  21. 21. Protegrity Data Security Policy What: What is the sensitive data that needs to be protected? Data Element. How: How you want to protect and present sensitive data. There are several methods for protecting sensitive data: encryption, tokenization, monitoring, etc. Who: Who should have access to sensitive data and who should not? Security access control. Roles & Members. When: When should sensitive data access be granted to those who have access? Day of week, time of day. Where: Where is the sensitive data stored? This will be where the policy is enforced. At the protector. Audit: Audit authorized or unauthorized access to sensitive data. Optional audit of protect/unprotect.
  22. 22. Policy Based Field Protection Example Files with personally identifiable information Loaded into a Hive table Create a view on that table Select data as authorized user Select data as privileged user
  23. 23. Enterprise Strength Enterprise Protection platforms must protect sensitive data end to end – at rest, in transit and on any technology platform
  24. 24. End to End Data Security Across the Enterprise Enterprise Heterogeneous Coverage • File Protectors: AIX, HPUX, Linux, Solaris, Windows • Database Protectors: DB2, SQL Server, Oracle, Teradata, Informix, Netezza, Greenplum • Big Data Protectors: BigInsights, Cloudera, Greenplum, MapR, Aster, Apache Hadoop, Hortonworks • Big Iron Platform: zSeries, HP Non-Stop
  25. 25. Best Practices for Protecting Big Data Start Early Fine Grained protection Select the optimal protection for the future Enterprise coverage Protection against insider threat Transparent protection to the analysis process Policy based protection and audit
  26. 26. Five Point Data Protection Methodology 1. Classify 2. Discovery 3. Protect 4. Enforce 5. Monitor
  27. 27. Classify Determine what data is sensitive to your organization.
  28. 28. Select US Regulations for Security and Privacy Financial Services Healthcare and Pharmaceuticals Infrastructure and Energy Federal Government
  29. 29. 1. Classify: Examples of Sensitive Data (Sensitive Information: Compliance Regulation / Laws) • Credit Card Numbers: PCI DSS • Names: HIPAA, State Privacy Laws • Address: HIPAA, State Privacy Laws • Dates: HIPAA, State Privacy Laws • Phone Numbers: HIPAA, State Privacy Laws • Personal ID Numbers: HIPAA, State Privacy Laws • Personally owned property numbers: HIPAA, State Privacy Laws • Personal Characteristics: HIPAA, State Privacy Laws • Asset Information: HIPAA, State Privacy Laws
  30. 30. Discovery Discover where the sensitive data is located and how it flows
  31. 31. 2. Discovery in a large enterprise with many systems System System System System System System System System System System System System Corporate Firewall System
  32. 32. 2. Discovery: Determine the context to the Business System Retail System System Employees System System Corporate IP System Healthcare Corporate Firewall System
  33. 33. 2. Discover: Context to the Business and to Security Collecting transactions Stores & Ecommerce Databases Data Protection Solution Requirements File Server Hadoop Applications File Server containing IP Corporate Firewall Research Databases
  34. 34. Protect Protect the sensitive data at rest and in transit.
  35. 35. Balancing Security and Data Insight Tug of war between security and data insight Big Data is designed for access Privacy regulations require de-identification Granular data-level protection Traditional security doesn’t allow for seamless data use
  36. 36. Protection Beyond Kerberos ETL Tools BI Reporting RDBMS Pig (Data Flow) Hive (SQL) Sqoop MapReduce (Job Scheduling/Execution System) API enabled Field level data protection API enabled Field level data protection HBase (Column DB) HDFS (Hadoop Distributed File System) Field level data protection for existing and new data. Volume Encryption
  37. 37. Volume Encryption Entire file is in the clear when analyzed MapReduce HDFS Protected with Volume Encryption
  38. 38. File Encryption – Authorized User Entire file is in the clear when analyzed MapReduce HDFS Protected with File Encryption
  39. 39. File Encryption – Non Authorized User Entire file is unreadable when analyzed MapReduce HDFS Protected with File Encryption
  40. 40. Volume Encryption + Gateway Field Protection Granular Field Level Protection MapReduce HDFS Data Protection File Gateway Kerberos Access Control Protected with Volume Encryption
  41. 41. Volume Encryption + Internal MapReduce Field Protection Analytics Granular Field Level Protection MapReduce Hadoop Staging HDFS MapReduce Kerberos Access Control Protected with Volume Encryption
  42. 42. Enforce Policies are used to enforce rules about how sensitive data should be treated in the enterprise.
  43. 43. A Data Security Policy What: What is the sensitive data that needs to be protected? Data Element. How: How you want to protect and present sensitive data. There are several methods for protecting sensitive data: encryption, tokenization, monitoring, etc. Who: Who should have access to sensitive data and who should not? Security access control. Roles & Members. When: When should sensitive data access be granted to those who have access? Day of week, time of day. Where: Where is the sensitive data stored? This will be where the policy is enforced. At the protector. Audit: Audit authorized or unauthorized access to sensitive data. Optional audit of protect/unprotect.
  44. 44. Volume Encryption + Field Protection + Policy Enforcement MapReduce HDFS Protected with Volume Encryption Data Protection Policy
  45. 45. Volume Encryption + Field Protection + Policy Enforcement MapReduce HDFS Protected with Volume Encryption Data Protection Policy
  46. 46. 4. Authorized User Example Presentation to requestor Name: Joe Smith Address: 100 Main Street, Pleasantville, CA Data Scientist, Business Analyst Selected data displayed (least privilege) Response Request Policy Enforcement Authorized Does the requestor have the authority to access the protected data? Protection at rest Name: csu wusoj Address: 476 srta coetse, cysieondusbak, CA
  47. 47. 4. Un-Authorized User Example Presentation to requestor Name: csu wusoj Address: 476 srta coetse, cysieondusbak, CA Privileged User, DBA, System Administrators, Bad Guy Response Request Policy Enforcement Not Authorized Does the requestor have the authority to access the protected data? Protection at rest Name: csu wusoj Address: 476 srta coetse, cysieondusbak, CA
  48. 48. Monitor A critically important part of a security solution is the ongoing monitoring of any activity on sensitive data.
  49. 49. Best Practices for Protecting Big Data Start early Granular protection Select the optimal protection Enterprise coverage Protection against insider threat Protect highly sensitive data in a way that is mostly transparent to the analysis process Policy based protection Record data access events
  50. 50. How Protegrity Can Help 1 We can help you Classify the sensitive data 2 We can help you Discover where the sensitive data sits 3 We can help you Protect your sensitive data in a flexible way 4 We can help you Enforce policies that enable business functions and keep sensitive data out of the wrong hands. 5 We can help you Monitor sensitive data to gain insights on abnormal behaviors.
  51. 51. Protegrity Summary Proven enterprise data security software and innovation leader • Sole focus on the protection of data • Patented Technology, Continuing to Drive Innovation Cross-industry applicability • Financial Services, Insurance, Banking • Healthcare • Telecommunications, Media and Entertainment • Retail, Hospitality, Travel and Transportation • Manufacturing and Government
  52. 52. Please contact us for more information Ulf.Mattsson@protegrity.com Info@protegrity.com