Bridging the Gap Between Privacy and Big Data
Ulf Mattsson, CTO
Protegrity
ulf.mattsson AT protegrity.com

BigData and Privacy webinar at Brighttalk

Published in: Technology
Notes
  • Let's start with a recent Gartner study of the Big Data adoption rate. Governments have traditionally been seen as laggards in adopting new, leading-edge environments. The Gartner survey, with over 700 respondents globally, bears this out for Big Data: only 16% of government respondents said they had invested in it, which puts government in the chart's furthest-right column, next to utilities at 17%. Leading the pack, to no surprise, was media and communications, followed by banking.
  • What are the key characteristics of tokenization? No complex key management; business intelligence; production systems: reversible, policy control (authorized / unauthorized access); test / dev: not reversible; integrates transparently.
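The tokenization characteristics listed in the notes above (format kept, reversible only under policy control) can be illustrated with a small sketch. This is a toy, not the vaultless scheme the deck describes: it swaps in a simple in-memory lookup table, which is closer to vault-based tokenization, and the class name `ToyTokenizer` and its methods are hypothetical names chosen for illustration.

```python
import secrets
import string


class ToyTokenizer:
    """Toy lookup-table tokenizer (hypothetical; not a vendor implementation)."""

    def __init__(self):
        self._token_for = {}   # clear value -> token
        self._value_for = {}   # token -> clear value

    @staticmethod
    def _random_like(value):
        # Swap each ASCII digit or letter for a random one of the same kind;
        # leave everything else (spaces, dashes, dots) in place so the
        # token keeps the format of the input.
        out = []
        for ch in value:
            if ch in string.digits:
                out.append(secrets.choice(string.digits))
            elif ch in string.ascii_lowercase:
                out.append(secrets.choice(string.ascii_lowercase))
            elif ch in string.ascii_uppercase:
                out.append(secrets.choice(string.ascii_uppercase))
            else:
                out.append(ch)
        return "".join(out)

    def tokenize(self, value):
        """Return a stable, same-format token for value."""
        if value in self._token_for:
            return self._token_for[value]
        for _ in range(100):  # bounded retries in case of a collision
            token = self._random_like(value)
            if token != value and token not in self._value_for:
                self._token_for[value] = token
                self._value_for[token] = value
                return token
        raise ValueError("could not generate a unique token for this value")

    def present(self, token, authorized):
        """Authorized requestors get the clear value; others get the token."""
        return self._value_for[token] if authorized else token


if __name__ == "__main__":
    tokenizer = ToyTokenizer()
    token = tokenizer.tokenize("3872 3789 1620 3675")
    print(token)                                       # random digits, same grouping
    print(tokenizer.present(token, authorized=True))   # original value
    print(tokenizer.present(token, authorized=False))  # token unchanged
```

Because the token keeps the length and character classes of the input, downstream systems that expect, say, a 16-digit card number layout can keep working on tokenized data, which is the transparency property the notes mention.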

    1. Bridging the Gap Between Privacy and Big Data
        Ulf Mattsson, CTO, Protegrity
        ulf.mattsson AT protegrity.com
    2. Ulf Mattsson, CTO, Protegrity
        20 years with IBM
            • Research & Development & Global Services
        Inventor
            • Encryption, Tokenization & Intrusion Prevention
        Member of
            • PCI Security Standards Council (PCI SSC)
            • American National Standards Institute (ANSI) X9
            • Encryption & Tokenization
            • International Federation for Information Processing
            • IFIP WG 11.3 Data and Application Security
    3. Agenda
        • Big Data adoption rate
        • Security holes and threats to data
        • Privacy regulations
        • New data protection techniques
        • Best practices for protecting big data
    4. Bridging the Gap Between Privacy and Big Data
    5. BIG DATA ADOPTION AND HOLES
    6. Has Your Organization Already Invested in Big Data?
        Source: Gartner
    7. Big Data Adoption Statistics: Are they investing?
        • In 2012, Gartner predicted Big Data to be majority adopted in 2 to 5 years.
        • In 2013, Gartner updated this prediction to 5 to 10 years…
        • Why has adoption been slower than expected?
        Source: Gartner
    8. Holes in Big Data…
        Source: Gartner
    9. Balancing Security and Data Insight
        • Big Data: Data Monetization, Customer Support, Customer Profiles, Sales & Marketing, Social Media, Business Improvement
        • Increased profits (label repeated five times on the slide)
        • Regulations & Breaches
    10. What’s the Problem with Securing Big Data?
        • Many ways to hack Big Data
        • Many regulations on sensitive data
        • Privileged User Threats
        • Encryption
            • Data Size
            • Data type
            • Data format
            • Performance (SLA)
        • Analytics
        • Inter-node data movement
    11. Many Ways to Hack Big Data
        Source: http://nosql.mypopescu.com/post/1473423255/apache-hadoop-and-hbase
        • Components: HDFS (Hadoop Distributed File System), MapReduce (Job Scheduling/Execution System), HBase (Column DB), Pig (Data Flow), Hive (SQL), Sqoop, ETL Tools, BI Reporting, RDBMS, Avro (Serialization), Zookeeper (Coordination)
        • Threats: Hackers, Privileged Users, Unvetted Applications Or Ad Hoc Processes
    12. Big Data and The Insider Threat
    13. BALANCING SECURITY AND DATA INSIGHT
    14. Balancing Security and Data Insight
        • Tug of war between security and data insight
        • Big Data is designed for access
        • Privacy regulations require de-identification
        • Granular data-level protection
        • Traditional security doesn’t allow for seamless data use
    15. Reduction of Pain with New Protection Techniques
        • Chart axes: Pain & TCO (High to Low) against time (1970, 2000, 2005, 2010)
        • Techniques: Strong Encryption (AES, 3DES); Format Preserving Encryption (DTP, FPE); Vault-based Tokenization; Vaultless Tokenization
        • Input Value: 3872 3789 1620 3675
        • Example outputs: !@#$%a^.,mhu7///&*B()_+!@ and 8278 2789 2990 2789 (the latter shown three times)
        • Callouts: Format Preserving; Greatly reduced Key Management; No Vault
    16. Speed of Different Protection Methods
        • Chart axis: Transactions per second*, from 100 to 10 000 000
        • Methods compared: Format Preserving Encryption; Vaultless Data Tokenization; AES CBC Encryption Standard; Vault-based Data Tokenization
        *: Speed will depend on the configuration
    17. HOW IS TOKENIZATION BRIDGING THE GAP?
    18. De-Identified Sensitive Data
        Field | Real Data | Tokenized / Pseudonymized
        Name | Joe Smith | csu wusoj
        Address | 100 Main Street, Pleasantville, CA | 476 srta coetse, cysieondusbak, CA
        Date of Birth | 12/25/1966 | 01/02/1966
        Telephone | 760-278-3389 | 760-389-2289
        E-Mail Address | joe.smith@surferdude.org | eoe.nwuer@beusorpdqo.org
        SSN | 076-39-2778 | 076-28-3390
        CC Number | 3678 2289 3907 3378 | 3846 2290 3371 3378
        Business URL | www.surferdude.com | www.sheyinctao.com
        Fingerprint | | Encrypted
        Photo | | Encrypted
        X-Ray | | Encrypted
        Healthcare Data – Primary Care Data | Dr. visits, prescriptions, hospital stays and discharges, clinical, billing, etc. | Protection methods can be equally applied to the actual healthcare data, but not needed with de-identification
    19. Research Brief: “Tokenization Gets Traction”
        • Aberdeen has seen a steady increase in enterprise use of tokenization
        • Alternative to encryption
        • 47% using tokenization for something other than cardholder data (PII and PHI)
        • Tokenization users had 50% fewer security-related incidents
        Author: Derek Brink, VP and Research Fellow, IT Security and IT GRC
    20. Protection Granularity: Encryption of Fields
        Production Systems: Encryption
            • Reversible
            • Policy Control (Authorized / Unauthorized Access)
            • Lacks Integration Transparency
            • Complex Key Management
            • Example: !@#$%a^.,mhu7///&*B()_+!@
    21. Protection Granularity: Encryption vs. Masking
        Production Systems: Encryption of fields
            • Reversible
            • Policy Control (Authorized / Unauthorized Access)
            • Lacks Integration Transparency
            • Complex Key Management
            • Example: !@#$%a^.,mhu7///&*B()_+!@
        Non-Production Systems: Masking of fields
            • Not reversible
            • No Policy, Everyone can access the data
            • Integrates Transparently
            • No Complex Key Management
            • Example: 0389 3778 3652 0038
    22. Protection Granularity: Tokenization of Fields
        Production Systems and Non-Production Systems: Vaultless Tokenization / Pseudonymization
            • No Complex Key Management
            • Business Intelligence
            • Reversible
            • Policy Control (Authorized / Unauthorized Access)
            • Not Reversible
            • Integrates Transparently
            • Example: Credit Card: 0389 3778 3652 0038
    23. CLASSIFY DATA
    24. Select US Regulations for Security and Privacy
        • Financial Services
        • Healthcare and Pharmaceuticals
        • Infrastructure and Energy
        • Federal Government
    25. Privacy Laws
        • 54 International Privacy Laws
        • 30 United States Privacy Laws
    26. Step 1, Classify: Examples of Sensitive Data
        Sensitive Information | Compliance Regulation / Laws
        Credit Card Numbers | PCI DSS
        Names | HIPAA, State Privacy Laws
        Address | HIPAA, State Privacy Laws
        Dates | HIPAA, State Privacy Laws
        Phone Numbers | HIPAA, State Privacy Laws
        Personal ID Numbers | HIPAA, State Privacy Laws
        Personally owned property numbers | HIPAA, State Privacy Laws
        Personal Characteristics | HIPAA, State Privacy Laws
        Asset Information | HIPAA, State Privacy Laws
    27. PCI DSS
        Build and maintain a secure network.
            1. Install and maintain a firewall configuration to protect data
            2. Do not use vendor-supplied defaults for system passwords and other security parameters
        Protect cardholder data.
            3. Protect stored data
            4. Encrypt transmission of cardholder data and sensitive information across public networks
        Maintain a vulnerability management program.
            5. Use and regularly update anti-virus software
            6. Develop and maintain secure systems and applications
        Implement strong access control measures.
            7. Restrict access to data by business need-to-know
            8. Assign a unique ID to each person with computer access
            9. Restrict physical access to cardholder data
        Regularly monitor and test networks.
            10. Track and monitor all access to network resources and cardholder data
            11. Regularly test security systems and processes
        Maintain an information security policy.
            12. Maintain a policy that addresses information security
    28. PCI DSS 3.0
        • Protection of cardholder data in memory
        • Clarification of key management dual control and split knowledge
        • Recommendations on making PCI DSS business-as-usual and best practices
        • Security policy and operational procedures added
        • Increased password strength
        • New requirements for point-of-sale terminal security
        • More robust requirements for penetration testing
    29. HIPAA PHI: List of 18 Identifiers
            1. Names
            2. All geographical subdivisions smaller than a State
            3. All elements of dates (except year) related to individual
            4. Phone numbers
            5. Fax numbers
            6. Electronic mail addresses
            7. Social Security numbers
            8. Medical record numbers
            9. Health plan beneficiary numbers
            10. Account numbers
            11. Certificate/license numbers
            12. Vehicle identifiers and serial numbers
            13. Device identifiers and serial numbers
            14. Web Universal Resource Locators (URLs)
            15. Internet Protocol (IP) address numbers
            16. Biometric identifiers, including fingerprints
            17. Full face photographic images
            18. Any other unique identifying number
    30. HIPAA Omnibus Final Rule – Some Examples
        • Make business associates of covered entities directly liable for compliance
        • Strengthen the limitations on the use and disclosure of protected health information for marketing purposes
        • Prohibit the sale of protected health information without individual authorization
        • Expand individuals’ rights to receive electronic copies of their health information
    31. PROTECT DATA
    32. Protection Beyond Kerberos
        • Volume Encryption
        • Field level data protection for existing and new data
        • API enabled field level data protection (label shown twice on the slide)
        • Components shown: HDFS (Hadoop Distributed File System), MapReduce (Job Scheduling/Execution System), HBase (Column DB), Pig (Data Flow), Hive (SQL), Sqoop, ETL Tools, BI Reporting, RDBMS
    33. Volume Encryption
        • HDFS and MapReduce protected with Volume Encryption
        • Entire file is in the clear when analyzed
    34. File Encryption – Authorized User
        • HDFS and MapReduce protected with File Encryption
        • Entire file is in the clear when analyzed
    35. File Encryption – Non Authorized User
        • HDFS and MapReduce protected with File Encryption
        • Entire file is unreadable when analyzed
    36. Volume Encryption + Gateway Field Protection
        • Diagram labels: HDFS; MapReduce; Kerberos Access Control; Granular Field Level Protection; Protected with Volume Encryption; Data Protection File Gateway
    37. Volume Encryption + Internal MapReduce Field Protection
        • Diagram labels: HDFS; MapReduce; Kerberos Access Control; Granular Field Level Protection; Protected with Volume Encryption; Hadoop Staging; MapReduce Analytics
    38. ENFORCE DATA PROTECTION
    39. A Data Security Policy
        • What: What is the sensitive data that needs to be protected? Data Element.
        • How: How you want to protect and present sensitive data. There are several methods for protecting sensitive data: encryption, tokenization, monitoring, etc.
        • Who: Who should have access to sensitive data and who should not. Security access control. Roles & Members.
        • When: When should sensitive data access be granted to those who have access. Day of week, time of day.
        • Where: Where is the sensitive data stored? This will be where the policy is enforced. At the protector.
        • Audit: Audit authorized or un-authorized access to sensitive data. Optional audit of protect/unprotect.
    40. Volume Encryption + Field Protection + Policy Enforcement
        • Diagram labels: HDFS; Protected with Volume Encryption; MapReduce; Data Protection Policy
    41. Volume Encryption + Field Protection + Policy Enforcement
        • Diagram labels: HDFS; Protected with Volume Encryption; MapReduce; Data Protection Policy
    42. Step 4: Authorized User Example
        • Requestor: Data Scientist, Business Analyst
        • Policy Enforcement: Does the requestor have the authority to access the protected data? Authorized.
        • Protection at rest: Name: csu wusoj; Address: 476 srta coetse, cysieondusbak, CA
        • Presentation to requestor (response): Name: Joe Smith; Address: 100 Main Street, Pleasantville, CA. Selected data displayed (least privilege).
    43. Step 4: Un-Authorized User Example
        • Requestor: Privileged User, DBA, System Administrators, Bad Guy
        • Policy Enforcement: Does the requestor have the authority to access the protected data? Not Authorized.
        • Protection at rest: Name: csu wusoj; Address: 476 srta coetse, cysieondusbak, CA
        • Presentation to requestor (response): Name: csu wusoj; Address: 476 srta coetse, cysieondusbak, CA
    44. MONITOR DATA PROTECTION
    45. Best Practices for Protecting Big Data
        • Start early
        • Granular protection
        • Select the optimal protection
        • Enterprise coverage
        • Protection against insider threat
        • Protect highly sensitive data in a way that is mostly transparent to the analysis process
        • Policy based protection
        • Record data access events
    46. Introduction to Protegrity
        Proven enterprise data security software and innovation leader
            • Sole focus on the protection of data
        Growth driven by risk management and compliance
            • PCI (Payment Card Industry)
            • PII (Personally Identifiable Information)
            • PHI (Protected Health Information) – HIPAA
            • State and Foreign Privacy Laws, Breach Notification Laws
        Successful across many key industries
    47. How Protegrity Can Help
            1. We can help you Classify the sensitive data
            2. We can help you Discover where the sensitive data sits
            3. We can help you Protect your sensitive data in a flexible way
            4. We can help you Enforce policies that enable business functions and keep sensitive data out of the wrong hands
            5. We can help you Monitor sensitive data to gain insights on abnormal behaviors
    48. Please contact us for more information
        Ulf.Mattsson@protegrity.com
        Info@protegrity.com
        www.protegrity.com
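The six-part data security policy from slide 39 (What, How, Who, When, Where, Audit) could be represented roughly as follows. This is a minimal sketch under stated assumptions: the class `DataElementPolicy`, its field names, the role strings, and the working-hours window are all hypothetical and are not taken from any Protegrity product.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class DataElementPolicy:
    data_element: str             # What: the sensitive field being protected
    method: str                   # How: e.g. "tokenization" or "encryption"
    allowed_roles: frozenset      # Who: roles that may see clear data
    allowed_weekdays: frozenset   # When: 0 = Monday ... 6 = Sunday
    allowed_hours: range          # When: hours of the day (24-hour clock)
    enforcement_point: str        # Where: the component enforcing the policy
    audit_log: list = field(default_factory=list)  # Audit: access decisions

    def is_allowed(self, role, at):
        """Decide whether `role` may see clear data at time `at`, and log it."""
        allowed = (
            role in self.allowed_roles
            and at.weekday() in self.allowed_weekdays
            and at.hour in self.allowed_hours
        )
        # Record both authorized and unauthorized attempts.
        self.audit_log.append((at.isoformat(), role, self.data_element, allowed))
        return allowed


if __name__ == "__main__":
    policy = DataElementPolicy(
        data_element="ssn",
        method="tokenization",
        allowed_roles=frozenset({"data_scientist", "business_analyst"}),
        allowed_weekdays=frozenset(range(0, 5)),   # Monday to Friday
        allowed_hours=range(8, 18),                # 08:00 to 17:59
        enforcement_point="protector",
    )
    monday_morning = datetime(2014, 3, 3, 10, 0)
    print(policy.is_allowed("data_scientist", monday_morning))
    print(policy.is_allowed("dba", monday_morning))
    print(policy.audit_log)
```

The roles mirror the authorized and un-authorized requestors on slides 42 and 43: a data scientist inside the permitted window gets clear data, while a DBA is refused and would see only the protected values. Both attempts land in the audit log.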
