A TALE OF TWO REGULATIONS: CROSS-BORDER DATA
PROTECTION FOR BIG DATA UNDER GDPR AND PRIVACY SHIELD
DATAWORKS SUMMIT – MUNICH 2017
Balaji Ganesan, CEO,
Privacera
Srikanth Venkat, Sr. Director,
Hortonworks
Agenda
Introductions
What is GDPR and Privacy Shield?
Key regulations to consider
Big data strategies to manage GDPR
Demo
Wrap up
3 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Apache Ranger
Comprehensive and Extensible Security Model
– Centralized platform to define, administer and manage
security policies consistently across Hadoop components
(HDFS, Hive, HBase, YARN, Kafka, Solr, Storm, Knox, NiFi,
Atlas)
– Extensible Architecture with ability to add custom policy
conditions, user context enrichers
Centralized Auditing
– Central audit location for all access requests
– Support multiple destination sources (HDFS, Solr, etc.)
– Real-time visual query interface
Fine-Grained Authorization
– Data access control at the database, table, and column level, for LDAP
groups and specific users
Advanced Security
• Dynamic Security Policies: Tag (Atlas), Prohibition, Time, and
Location
• Dynamic Column Masking & Row Filtering
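The idea behind dynamic column masking and row filtering can be sketched in a few lines of plain Python. This is an illustrative model only, not Ranger's API: the names `MaskingPolicy`, `RowFilterPolicy`, `apply_policies`, and `mask_show_last4` are hypothetical, and the example assumes policies are keyed on user groups.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Iterable, List, Set

Row = Dict[str, object]


def mask_show_last4(value: object) -> str:
    """Replace every character except the last four with 'x'."""
    text = str(value)
    return "x" * max(len(text) - 4, 0) + text[-4:]


@dataclass(frozen=True)
class MaskingPolicy:
    """Hypothetical policy: mask a column unless the user is in an exempt group."""
    column: str
    exempt_groups: FrozenSet[str]
    mask: Callable[[object], str]


@dataclass(frozen=True)
class RowFilterPolicy:
    """Hypothetical policy: members of `group` only see rows matching `predicate`."""
    group: str
    predicate: Callable[[Row], bool]


def apply_policies(
    rows: Iterable[Row],
    user_groups: Set[str],
    masks: List[MaskingPolicy],
    filters: List[RowFilterPolicy],
) -> List[Row]:
    # Row filtering: keep only rows that satisfy every filter bound to the user's groups.
    active_filters = [f for f in filters if f.group in user_groups]
    visible = [r for r in rows if all(f.predicate(r) for f in active_filters)]

    # Column masking: copy each row and mask columns the user is not exempt from.
    result = []
    for row in visible:
        out = dict(row)
        for m in masks:
            if m.column in out and not (m.exempt_groups & user_groups):
                out[m.column] = m.mask(out[m.column])
        result.append(out)
    return result


customers = [
    {"name": "Ana", "country": "DE", "national_id": "123456789"},
    {"name": "Bob", "country": "US", "national_id": "987654321"},
]
masks = [MaskingPolicy("national_id", frozenset({"compliance"}), mask_show_last4)]
filters = [RowFilterPolicy("eu_analysts", lambda r: r["country"] == "DE")]

analyst_view = apply_policies(customers, {"eu_analysts"}, masks, filters)
compliance_view = apply_policies(customers, {"compliance"}, masks, filters)
```

In this sketch the stored data is never changed. Masking and filtering happen at read time, which is what makes the policies "dynamic".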
[Diagram: HDP platform layers (Operations, Security, Governance, Storage) with Batch, Interactive, Streaming, Search, and Machine Learning workloads]
Apache Atlas: Metadata & Governance
[Diagram: Atlas metadata exchange with structured and traditional sources (RDBMS, MPP appliances) and with Kafka, Storm, Sqoop, Hive, Falcon, Ranger, and custom partner integrations]
Metadata-driven governance services for Hadoop and
enterprise big data ecosystems
Data Lineage/Provenance
– Along the entire data lifecycle, with integrated cross-component lineage
Data Classification
– Supports classification of data assets using tags (e.g. PII, PHI, PCI) and attributes
Metadata Catalog Search
– Free-text search on metadata
– Advanced search using DSL
Integrations
– Across the Hadoop ecosystem, through a common metadata store
– Out-of-the-box real-time metadata and lineage ingestion with Hive, Sqoop, Storm/Kafka
– APIs for custom metadata ingestion
– Apache Ranger integration for classification-based security
Key Benefits:
Modern Data Lakes need new ways to
govern because:
• Cost – Traditional staff ratio to data size not possible
• Diversity – Only way to manage velocity of new datasets
• Agility – Quick change based on tags / taxonomy
HDP – Security & Governance
[Diagram: Atlas tracks metadata and lineage for entities in the data lake (Hive tables, HDFS files, HBase tables, streams, pipelines, feeds) and keeps tags on those assets in its metastore. Ranger manages access policies and audit logs. An Atlas client in Ranger subscribes to a topic to get metadata updates, and the policy decision point (PDP) with its resource cache evaluates classification, prohibition, time, and location policies.]
Industry First: Dynamic Tag-based Security Policies
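A tag-based policy attaches to a classification (such as PII) rather than to a specific table or file, so newly tagged assets are covered without writing new policies. The sketch below models that idea together with prohibition (deny) and time-bound policies. It is a simplified illustration, not the Ranger or Atlas API: `TagPolicy`, `is_allowed`, and the asset naming scheme are assumptions, and location conditions are left out for brevity.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class TagPolicy:
    """Hypothetical policy bound to a classification tag instead of a resource."""
    tag: str
    groups: FrozenSet[str]
    allow: bool                             # False models a prohibition (deny) policy
    valid_until: Optional[datetime] = None  # None means the policy does not expire


def is_allowed(
    asset: str,
    user_groups: Set[str],
    asset_tags: Dict[str, Set[str]],
    policies: List[TagPolicy],
    now: datetime,
) -> bool:
    """Default deny; any matching prohibition overrides matching allow policies."""
    tags = asset_tags.get(asset, set())
    allowed = False
    for policy in policies:
        if policy.tag not in tags:
            continue  # policy is about a tag this asset does not carry
        if not (policy.groups & user_groups):
            continue  # policy does not apply to this user
        if policy.valid_until is not None and now > policy.valid_until:
            continue  # time-bound policy has expired
        if not policy.allow:
            return False  # prohibition wins immediately
        allowed = True
    return allowed


# Tags as a metadata catalog might supply them (asset names are made up).
asset_tags = {
    "hive:crm.customers.email": {"PII"},
    "hive:crm.orders.total": set(),
}
policies = [
    TagPolicy("PII", frozenset({"support"}), allow=True, valid_until=datetime(2018, 5, 25)),
    TagPolicy("PII", frozenset({"contractors"}), allow=False),
]
```

Because the decision depends only on the tags an asset carries, tagging a new column as PII in the catalog is enough to bring it under the existing policies.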
Privacera – Focus on Sensitive Data
• Discover: discover sensitive data
• Track: track movements of sensitive data
• Detect: detect breach or accidental use
• Protect: block user, encrypt, or mask
What is GDPR?
o Replaces the Data Protection Directive
o Harmonized rules across member countries
o Applies to all companies processing personal data related to the EU
o EU scope
o Revenue-based penalties
o Data subject rights
o Data protection officers
o Data breach notifications
What is Privacy Shield?
“The EU-US Privacy Shield is a framework for transatlantic exchanges of
personal data for commercial purposes between the European
Union and the United States.”
Privacy Shield vs GDPR
Source: TRUSTe
GDPR Awareness
• 1 in 3 companies feel prepared for GDPR today
• 70% of business and IT professionals do not know about their company’s GDPR strategy
• 75% of respondents outside the EU did not know about or were not prepared for GDPR
• 97% of companies have no concrete plans for GDPR post May 2018
Source: Dell GDPR Survey, Oct 2016
GDPR – Considerations for big data?
• Cybersecurity and Breach Notification
• Consent
• Profiling
• RTBF (right to be forgotten) and data portability
Cybersecurity and Breach Notification
What it means?
• Specific suggestions on security
• Pseudonymisation and encryption of personal data
• Ensure the ongoing confidentiality, integrity, availability and resilience of processing systems and services
• Restore availability and access to personal data in a timely manner after an incident
• Data breach
  – A personal data breach must be notified to a supervisory authority “not later than 72 hours after having become aware of it”
How you can manage?
• Engage with the data protection officer and cybersecurity team
• Invest in data discovery solutions
• Data-at-rest and dynamic masking solutions
• Data replication and disaster recovery
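Two of the points on this slide lend themselves to a short worked example: pseudonymisation and the 72-hour notification window. The sketch below uses keyed hashing (HMAC-SHA256) as one possible pseudonymisation technique. That choice is an assumption made for illustration, not something GDPR mandates. The function names and the sample key are likewise hypothetical.

```python
import hashlib
import hmac
from datetime import datetime, timedelta, timezone

# The regulation's "not later than 72 hours after having become aware of it".
NOTIFICATION_WINDOW = timedelta(hours=72)


def pseudonymise(value: str, secret_key: bytes) -> str:
    """Map an identifier to a stable token using HMAC-SHA256.

    The same value and key always give the same token, so joins still work,
    but the original value cannot be read back without a lookup kept elsewhere.
    """
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def notification_deadline(became_aware_at: datetime) -> datetime:
    """Latest time to notify the supervisory authority."""
    return became_aware_at + NOTIFICATION_WINDOW


def hours_remaining(became_aware_at: datetime, now: datetime) -> float:
    """Hours left before the deadline (negative once it has passed)."""
    return (notification_deadline(became_aware_at) - now).total_seconds() / 3600


key = b"example-key-kept-in-a-secrets-store"  # placeholder; never hard-code real keys
token = pseudonymise("ana@example.com", key)

aware = datetime(2017, 4, 5, 9, 0, tzinfo=timezone.utc)
deadline = notification_deadline(aware)
```

Pseudonymised data remains personal data as long as the key or a mapping table allows re-identification, so the key needs the same protection as the data itself.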
Consent
What it means?
• Affirmative consent for data processing
• Consent is specific to each data processing operation
• GDPR requires explicit consent for special categories of personal data
• Parental consent for processing children’s personal data
• Right to withdraw consent
How you can manage?
• Establish a process for giving notice and collecting consent
• Establish the purpose for data processing
• Leverage access control to block data where no consent is provided
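Blocking data where no consent is provided can be modeled as a lookup keyed on data subject and purpose, applied before records reach the consumer. The sketch below is a minimal in-memory illustration under those assumptions. `ConsentRegistry`, `filter_by_consent`, and the `subject_id` field are hypothetical names, and a real deployment would persist consent and enforce it in the access-control layer.

```python
from typing import Dict, Iterable, List, Set, Tuple


class ConsentRegistry:
    """Hypothetical store of (subject, purpose) pairs with active consent."""

    def __init__(self) -> None:
        self._granted: Set[Tuple[str, str]] = set()

    def grant(self, subject_id: str, purpose: str) -> None:
        # Consent is recorded per processing purpose, not globally.
        self._granted.add((subject_id, purpose))

    def withdraw(self, subject_id: str, purpose: str) -> None:
        # Withdrawal takes effect for all later reads; discard() is a no-op if absent.
        self._granted.discard((subject_id, purpose))

    def has_consent(self, subject_id: str, purpose: str) -> bool:
        return (subject_id, purpose) in self._granted


def filter_by_consent(
    records: Iterable[Dict[str, object]],
    registry: ConsentRegistry,
    purpose: str,
    id_field: str = "subject_id",
) -> List[Dict[str, object]]:
    """Return only records whose subject has consented to this purpose."""
    return [r for r in records if registry.has_consent(str(r[id_field]), purpose)]


registry = ConsentRegistry()
registry.grant("u1", "marketing")
registry.grant("u2", "analytics")

records = [
    {"subject_id": "u1", "email": "ana@example.com"},
    {"subject_id": "u2", "email": "bob@example.com"},
]
marketing_view = filter_by_consent(records, registry, "marketing")
```

Keying consent on the purpose as well as the subject reflects the slide's point that consent is specific to a processing operation: consent for analytics does not unlock marketing use.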
Profiling
What it means?
• Restrictions on processing that may be classified as profiling
• Automated processing of personal data
• Using the personal data to evaluate a person
• Includes taking decisions about, or making predictions about, an individual
• Right of the data subject to object to or halt profiling
• Notice and access
How you can manage?
• Establish a strict process for automated data processing
• For analytics involving personal data, ensure manual processing can be done
• Establish consent for any manual profiling-related processing
RTBF and data portability
What it means?
• Right to erasure (right to be forgotten)
  – Remove data upon request
• Reasonable access to own data
• Right to know purposes
• Only applies if original consent was provided
• Data portability
  – Data subjects can receive their own data
  – Right to transfer data to a third party
• Right to rectify data
How you can manage?
• Invest in a user-centric UI to receive requests
• Customer identity solution
• Data discovery and classification
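Erasure, portability, and rectification requests all depend on being able to find every record that belongs to one data subject. The sketch below shows the three operations against a simple in-memory store indexed by subject. `SubjectDataStore` and its methods are hypothetical, and JSON is only one possible machine-readable export format. In a data lake, the index would come from data discovery and a customer identity solution.

```python
import json
from typing import Dict, List


class SubjectDataStore:
    """Hypothetical store that indexes records by data subject."""

    def __init__(self) -> None:
        self._records: Dict[str, List[Dict[str, object]]] = {}

    def add(self, subject_id: str, record: Dict[str, object]) -> None:
        self._records.setdefault(subject_id, []).append(dict(record))

    def export(self, subject_id: str) -> str:
        """Portability: return the subject's data in a machine-readable format."""
        payload = {"subject_id": subject_id, "records": self._records.get(subject_id, [])}
        return json.dumps(payload, sort_keys=True)

    def rectify(self, subject_id: str, field: str, value: object) -> int:
        """Rectification: correct a field wherever it appears; return records changed."""
        changed = 0
        for record in self._records.get(subject_id, []):
            if field in record:
                record[field] = value
                changed += 1
        return changed

    def erase(self, subject_id: str) -> int:
        """Erasure: remove all records for the subject; return how many were removed."""
        return len(self._records.pop(subject_id, []))


store = SubjectDataStore()
store.add("u1", {"email": "ana@example.com", "city": "Munich"})
store.add("u1", {"order_id": 42, "city": "Munich"})
store.add("u2", {"email": "bob@example.com"})
```

Returning counts from `rectify` and `erase` gives the request-handling UI something to confirm back to the data subject and to write to an audit log.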
Big data security and governance checklist
1 Coordinate with Privacy and Security teams
2 Invest in user and customer identity
3 Data discovery and classification
4 Centralize data around consent, purpose
5 Analyze pseudonymization, encryption options
6 Constantly monitor personal data for breaches
7 Automate data retention and recovery strategies
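Checklist item 7 can be illustrated with a small retention check: given a retention period per processing purpose, find the records that have outlived it. This is a sketch under stated assumptions. The `purpose` and `collected_at` fields, the `expired_records` function, and the retention periods shown are all made up for the example.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List


def expired_records(
    records: Iterable[Dict[str, object]],
    retention_by_purpose: Dict[str, timedelta],
    now: datetime,
) -> List[Dict[str, object]]:
    """Return records held longer than the retention period for their purpose.

    Records whose purpose has no configured period are not returned here;
    a real job would probably report them separately for review.
    """
    expired = []
    for record in records:
        limit = retention_by_purpose.get(str(record["purpose"]))
        if limit is not None and now - record["collected_at"] > limit:
            expired.append(record)
    return expired


retention = {"marketing": timedelta(days=365)}  # illustrative period only
sample = [
    {"id": 1, "purpose": "marketing", "collected_at": datetime(2016, 1, 1)},
    {"id": 2, "purpose": "marketing", "collected_at": datetime(2017, 1, 1)},
    {"id": 3, "purpose": "billing", "collected_at": datetime(2010, 1, 1)},
]
to_purge = expired_records(sample, retention, now=datetime(2017, 4, 5))
```

Running a check like this on a schedule, and feeding its output to an erasure routine, is one way to turn a retention policy into an automated process.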
Demo Scenarios
• Data classification
• Collect consent
• Access data based on consent provided
• Dynamic anonymization
• Purpose verification
• Breach Notification
More Information…
Check out these sessions
• Apache Atlas: Governance for your data, 4:10p, Wednesday, April 5th 2017
• Bridle Your Flying Islands And Castles In The Sky: Built-in Governance And Security For The Cloud, 11:30a, April 6th 2017
• BoF sessions – Security and Governance, 5:50p, Thursday, April 6th 2017
Hortonworks
https://hortonworks.com/apache/ranger/
https://hortonworks.com/apache/atlas
Privacera
http://www.privacera.com/GDPR
send email to gdpr@privacera.com


Editor's Notes

  • #4 HDP2.5 represents a maturing of our offerings in security and a sort of growing of our little elephant