Data Governance, Security & Compliance
Crash Course
Srikanth Venkat – Senior Director, PM
Eyad Garelnabi – Senior Solutions Engineer
© Hortonworks Inc. 2011 – 2016. All Rights Reserved
Agenda
Regulatory Compliance
How can HDP help with Compliance requirements?
Overview of applicable HDP components
- Authentication & Data Protection: Kerberos, Ranger KMS
- Authorization/Audit/Admin: Apache Ranger
- Data Governance: Apache Atlas
- Data Stewardship: Data Steward Studio
Demo & Optional Lab
Building Blocks for Compliance
• Identify and classify personal data
• Understand provenance, origin, lineage and impact
• Classify personal data for business purpose and security
• Centralize data access and auditing for consent and purpose
• Monitor and correlate data access via audits for breach forensics
• Anonymize & pseudonymize personal/sensitive data
• Automate data use, retention, and recovery strategies
How Do I…
1. Discover and inventory personal data across data lakes?
2. Centralize consent data to provide appropriate security controls in a data lake?
3. Efficiently erase a subject’s data when they invoke the right to be forgotten?
3 Key Compliance Challenges
Challenge #1: Understand Your Personal Data
Data mapping and inventory are critical for privacy protection
• Article 6: Lawfulness of Processing
• Article 30: Records of Processing Activities
• Article 32: Security of Processing
Required to secure and analyze data responsibly
• Article 25: Data protection by design and by default
• Article 5: Principles of personal data processing
• Article 35: Data protection impact assessment
Data Map: 5,000 databases, 500,000 tables, 50 million columns
Challenge #1: Understand Your Personal Data (continued)
• Where? Where is the personal data located?
• What? What is this data about (subject matter or domain), e.g. personal data?
• Who? Who is the owner of this data? Who has access to it and who has accessed it?
• Why? For what purpose was this data collected?
• When? When was this data created or updated? When was it accessed?
• How? Was proper consent acquired to collect this data? Was there a legal basis for processing this data?
Challenge #2 & 3
SUBJECT RIGHTS UNDER GDPR
• Consent: Consent can be withdrawn at any time by the customer
• Data Portability: Right to transfer personal data from one data processor to another
• Erasure: Right to have all personal data erased upon the subject's request
• Correction: Right to rectify inaccurate data
• Access: Right to know what information has been collected and how it is being processed
Challenge #2
Consent needs to be:
• Specific per purpose
• Explicit
Subject can revoke consent at any time they choose
• Article 7: Conditions of consent
Ways of acquiring consent: marketing, email, phone, chat, web forms, others
Challenge #3: Handling Right to be Forgotten in Hadoop
Right to erasure (Right to be forgotten)
Subject can request erasure under several conditions
• Article 17: Right to erasure
Conditions:
• Data gathered on minors without parental consent
• Subject objects to processing
• Data collected without specific processing intent
• Erasure for controller's legal or regulatory compliance obligations
• Subject withdraws consent and requests erasure
• Unlawful processing of data
Hortonworks Stack – Solutions & OSS Components
Components
• Storage and Compute: YARN, HDFS
Solutions
• Data Steward Studio (DSS)
How can HDP help with GDPR Compliance?
• Identify and classify sensitive personal data (Apache Atlas – tagging)
• Classify personal data for business purpose and security (Apache Atlas – tagging, Apache Ranger – tag based policies)
• Centralized data access and auditing for consent and purpose (Apache Ranger dynamic row filtering, Hive ACID)
• Anonymization/pseudonymization (Apache Ranger dynamic masking)
• Automate data use, retention, and recovery strategies (Apache Atlas – tagging + Apache Ranger – tag based policies + other data movement tools/scripts)
• Monitor and correlate data access via audits for breach forensics (Apache Ranger audits)
• Understand provenance, origin, lineage and impact (Apache Atlas – lineage)
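As a rough illustration of the "identify and classify" step, the sketch below suggests classification tags for columns based on their names. The tag names (PII, PCI, PHI) follow the demo's vocabulary, but the patterns, function names, and matching rule are assumptions made for illustration. It does not call Apache Atlas; in practice the resulting tags would be attached to Atlas entities through the Atlas UI or API.

```python
import re

# Hypothetical column-name patterns per tag; a real deployment would define
# its own classifications and discovery rules.
TAG_PATTERNS = {
    "PII": re.compile(r"(national_?id|ssn|name|birth|address)", re.IGNORECASE),
    "PCI": re.compile(r"(cc_?number|card)", re.IGNORECASE),
    "PHI": re.compile(r"(mrn|blood_?group|insurance)", re.IGNORECASE),
}


def suggest_tags(column_name):
    """Return the sorted list of tags whose pattern matches the column name."""
    return sorted(
        tag for tag, pattern in TAG_PATTERNS.items() if pattern.search(column_name)
    )


def inventory(columns):
    """Map each column to its suggested tags, skipping columns with no match."""
    result = {}
    for col in columns:
        tags = suggest_tags(col)
        if tags:
            result[col] = tags
    return result


if __name__ == "__main__":
    # Column names taken from the ww_customers query shown later in the deck.
    for col, tags in inventory(["country", "nationalid", "ccnumber", "mrn", "name"]).items():
        print(col, "->", tags)
```

Name matching alone misses personal data stored under opaque column names, so a sketch like this would normally be combined with content profiling and steward review.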
Authentication & Data Protection
Background: HDP + Kerberos
[Diagram: HDP Cluster with Service Components (A, B, C, D, X), each holding a keytab, and a KDC]
Kerberos is used to secure the Components in the cluster. Kerberos identities are managed via “keytabs” on the Component hosts. Principals for the cluster are managed in the KDC.
Kerberos + Active Directory
[Diagram (Authentication): Client and Hadoop Cluster; AD / LDAP user store and KDC linked by a Cross Realm Trust]
Users: smith@EXAMPLE.COM
Hosts: host1@HADOOP.EXAMPLE.COM
Services: hdfs/host1@HADOOP.EXAMPLE.COM
• Use existing directory tools to manage users
• Use Kerberos tools to manage host + service principals
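The principal names above follow the usual Kerberos shape of primary, optional instance, and realm. A minimal sketch of splitting such a name is shown below; the `Principal` type and `parse_principal` function are hypothetical helpers, and the sketch ignores escaped characters that full Kerberos name parsing would handle.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str              # user or service name, e.g. "hdfs"
    instance: Optional[str]   # usually the host for service principals
    realm: str                # e.g. "HADOOP.EXAMPLE.COM"


def parse_principal(text):
    """Split 'primary[/instance]@REALM' into its parts (no escape handling)."""
    name, sep, realm = text.rpartition("@")
    if not sep or not name or not realm:
        raise ValueError(f"not a principal with a realm: {text!r}")
    primary, slash, instance = name.partition("/")
    return Principal(primary, instance if slash else None, realm)


if __name__ == "__main__":
    for example in ("smith@EXAMPLE.COM", "hdfs/host1@HADOOP.EXAMPLE.COM"):
        print(parse_principal(example))
```

With a cross-realm trust, user principals live in the directory's realm (EXAMPLE.COM here) while service principals live in the cluster's realm (HADOOP.EXAMPLE.COM), which is why the realm is kept as a separate field.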
Data Protection in Hadoop
Data protection can be applied at three different layers in HDP:
• Storage: encrypt data while it is at rest
  - Transparent Data Encryption in HDFS, Ranger KMS + HSM, Partner Products (HPE Voltage, Protegrity, Dataguise)
• Transmission: encrypt data over the wire when it leaves the cluster
  - SASL (RPC) and TLS (HTTP)
  - Avoid for intracluster communication
• Upon Access: apply restrictions when accessed
  - Ranger (Dynamic Column Masking + Row Filtering), Partner Masking + Encryption
Ranger KMS
Transparent Data Encryption in HDFS
[Diagram: HDFS Client, NameNode (NN), and DataNodes (DN) holding blocks A, B, C, D; Ranger KMS backed by a SafeNet Luna HSM]
Benefits
• Selective encryption of relevant files/folders
• Prevent rogue admin access to sensitive data
• Fine grained access controls
• Transparent to end application w/o changes
• Ranger KMS integrated with external HSM (SafeNet Luna), adding to reliability/security of KMS
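Selective encryption of a folder is typically set up by creating a key in the KMS and then declaring an HDFS encryption zone that uses it. The sketch below only assembles the standard `hadoop key create` and `hdfs crypto -createZone` commands as argument lists and prints them; it does not run them. The key name `pii_key` and the path `/data/pii` are made-up examples, and the helper function is hypothetical.

```python
import shlex


def tde_setup_commands(key_name, zone_path):
    """Return argv lists for creating a key and an encryption zone that uses it."""
    return [
        # Create the encryption zone key in the configured KMS.
        ["hadoop", "key", "create", key_name],
        # The zone directory must exist and be empty before it becomes a zone.
        ["hdfs", "dfs", "-mkdir", "-p", zone_path],
        # Mark the directory as an encryption zone tied to the key.
        ["hdfs", "crypto", "-createZone", "-keyName", key_name, "-path", zone_path],
    ]


if __name__ == "__main__":
    for argv in tde_setup_commands("pii_key", "/data/pii"):
        print(shlex.join(argv))
```

Files written under the zone are then encrypted and decrypted by the HDFS client, which is what makes the encryption transparent to the application.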
Authorization, Audit, Administration
Apache Ranger
Auditing
• Central audit location for all access requests
• Support multiple destination sources (HDFS, Solr, etc.)
• Real-time visual query interface
Ranger KMS
• Store and manage encryption keys
• Support HDFS Transparent Data Encryption
• Integration with HSM
  - SafeNet Luna
Authorization
• Centralized platform to define, administer and manage security policies consistently across Hadoop components
  - HDFS, Hive, HBase, YARN, Kafka, Solr, Storm, Knox, NiFi, Atlas
• Extensible Architecture
  - Custom policy conditions, user context enrichers
  - Easy to add new component types for authorization
Ranger – ABAC Model
⬢ Combination of the subject, action, resource, and environment
⬢ Uses descriptive attributes: AD group, Apache Atlas-based tags or classifications, geo-location, etc.
⬢ Ranger approach is consistent with NIST 800-162
⬢ Avoid role proliferation and manageability issues
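To make the ABAC idea concrete, the following sketch evaluates a request described by subject, action, resource, and environment attributes against a small set of policies. It is an illustrative model only: the `Request` and `Policy` classes, the attribute names, and the "deny wins, default deny" rule are assumptions for this example and not Ranger's policy engine or API.

```python
from dataclasses import dataclass


@dataclass
class Request:
    subject: dict      # e.g. {"user": "joe", "groups": {"analyst"}}
    action: str        # e.g. "select"
    resource: dict     # e.g. {"table": "ww_customers", "tags": {"PII"}}
    environment: dict  # e.g. {"location": "US"}


@dataclass
class Policy:
    name: str
    groups: set
    actions: set
    resource_tags: set
    locations: set
    effect: str = "allow"  # "allow" or "deny"

    def matches(self, req):
        """A policy applies only if every attribute dimension matches."""
        return (
            bool(self.groups & req.subject.get("groups", set()))
            and req.action in self.actions
            and bool(self.resource_tags & req.resource.get("tags", set()))
            and req.environment.get("location") in self.locations
        )


def decide(policies, req):
    """Deny overrides allow; a request matching no policy is denied."""
    matched = [p for p in policies if p.matches(req)]
    if any(p.effect == "deny" for p in matched):
        return "deny"
    return "allow" if matched else "deny"


if __name__ == "__main__":
    policies = [
        Policy("analysts_read_pii_in_us", {"analyst"}, {"select"}, {"PII"}, {"US"}),
        Policy("no_expired_data", {"analyst", "hr"}, {"select"}, {"EXPIRED"}, {"US", "EU"}, "deny"),
    ]
    req = Request({"user": "joe", "groups": {"analyst"}}, "select",
                  {"table": "ww_customers", "tags": {"PII"}}, {"location": "US"})
    print(decide(policies, req))
```

Because decisions depend on attributes such as tags and location rather than on a dedicated role per combination, new tables or locations can be covered without creating new roles.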
HDP Architecture: Dynamic Tag-based Security Policies
[Diagram: Atlas tracks metadata and lineage for entities in the data lake (Hive tables, HDFS files, HBase tables; streams, pipelines, feeds) and keeps tags, assets, and entities in its metastore. The Atlas client in Ranger subscribes to a topic and gets metadata updates, which feed a resource cache. The Ranger PDP evaluates classification, prohibition, time, and location policies. Ranger manages access policies and audit logs.]
Dynamic Row Filtering & Column Masking: Apache Ranger with Apache Hive
User 1: Joe (Location: US, Group: Analyst)
User 2: Ivanna (Location: EU, Group: HR)
Groups shown: Analysts, HR, Marketing

Source table ww_customers:
Country | National ID | CC No            | DOB       | MRN        | Name          | Policy ID
US      | 232323233   | 4539067047629850 | 9/12/1969 | 8233054331 | John Doe      | nj23j424
US      | 333287465   | 5391304868205600 | 8/13/1979 | 3736885376 | Jane Doe      | cadsd984
Germany | T22000129   | 4532786256545550 | 3/5/1963  | 876452830A | Ernie Schwarz | KK-2345909

Ranger Policy Enforcement: the query is rewritten based on dynamic Ranger policies, which filter rows by region and apply the relevant column masking.

Joe's original query:
SELECT country, nationalid, ccnumber, mrn, name FROM ww_customers
Result for Joe:
Country | National ID | CC No               | MRN  | Name
US      | xxxxx3233   | 4539 xxxx xxxx xxxx | null | John Doe
US      | xxxxx7465   | 5391 xxxx xxxx xxxx | null | Jane Doe
Users from the US Analyst group see data for US persons only, with CC and National ID (SSN) masked and MRN nullified.

Ivanna's original query:
SELECT country, nationalid, name, mrn FROM ww_customers
Result for Ivanna:
Country | National ID | Name          | MRN
Germany | T22000129   | Ernie Schwarz | 876452830A
EU HR Policy Admins can see unmasked data but are restricted by row filtering policies to data for EU persons only.
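The effect of the row filter and column masks on the sample rows can be simulated in a few lines. This is a plain in-memory sketch, not how Ranger rewrites Hive queries: the profile names, the `run_query` helper, and the one-country `EU_COUNTRIES` set are assumptions for illustration, and the credit card mask assumes 16-digit numbers.

```python
ROWS = [
    {"country": "US", "nationalid": "232323233", "ccnumber": "4539067047629850",
     "mrn": "8233054331", "name": "John Doe"},
    {"country": "US", "nationalid": "333287465", "ccnumber": "5391304868205600",
     "mrn": "3736885376", "name": "Jane Doe"},
    {"country": "Germany", "nationalid": "T22000129", "ccnumber": "4532786256545550",
     "mrn": "876452830A", "name": "Ernie Schwarz"},
]

EU_COUNTRIES = {"Germany"}  # assumption: stands in for the eu_countries lookup


def show_last4(value):
    """Replace everything except the last four characters with 'x'."""
    return "x" * (len(value) - 4) + value[-4:]


def show_first4(value):
    """Keep the first four digits; assumes a 16-digit card number."""
    return value[:4] + " xxxx" * 3


def nullify(value):
    """Hide the value completely."""
    return None


# Hypothetical per-group profiles: one row filter plus per-column masks.
PROFILES = {
    "us_analyst": {
        "row_filter": lambda row: row["country"] == "US",
        "masks": {"nationalid": show_last4, "ccnumber": show_first4, "mrn": nullify},
    },
    "eu_hr": {
        "row_filter": lambda row: row["country"] in EU_COUNTRIES,
        "masks": {},
    },
}


def run_query(profile_name, columns):
    """Apply the profile's row filter, then its column masks, to ROWS."""
    profile = PROFILES[profile_name]
    result = []
    for row in ROWS:
        if not profile["row_filter"](row):
            continue
        masked = {}
        for col in columns:
            mask = profile["masks"].get(col)
            masked[col] = mask(row[col]) if mask else row[col]
        result.append(masked)
    return result


if __name__ == "__main__":
    for r in run_query("us_analyst", ["country", "nationalid", "ccnumber", "mrn", "name"]):
        print(r)
    for r in run_query("eu_hr", ["country", "nationalid", "name", "mrn"]):
        print(r)
```

Both users issue an ordinary SELECT against the same table; only the profile applied at enforcement time differs, which is the point of the slide.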
Data Governance
ATLAS CAPABILITIES (Governance, Security, Business)
• Metadata Catalog & Search
• Lineage & Chain of Custody
• Business glossary
• Metadata Audits & Security
Atlas Hooks and Bridges
[Diagram: metadata flowing through Atlas hooks and bridges, including Information Governance Catalog (Tech Preview)]
Data Stewardship
What If…
[Diagram: structured and unstructured clusters running in the cloud and on premises, across Data Center Dublin, Data Center Las Vegas, and Data Center Bangkok. Callouts: Aware of Data Sources, Enable New Services, Security Controls]
What is Hortonworks DataPlane Service?
HORTONWORKS DATAPLANE SERVICE: Manage, Secure, Govern
Hortonworks DataPlane Service is a common set of services that:
⬢ Supports enterprise deployment strategy and move to the cloud
⬢ Addresses compliance and regulatory requirements for enterprise
⬢ Eliminates policy silos and ensures security & governance moves w/ data
⬢ Simplifies data asset management and provides access to analysts and data scientists
⬢ Extensible to new services: Services enablement layer to bring new offerings to market rapidly
[Diagram: DATA IN MOTION (Hortonworks DataFlow) and DATA AT REST (Hortonworks Data Platform) across multiple clusters and sources: cloud, on prem, hybrid, IoT]
Goals
Demo & Optional Lab
Demo Scenario
• Hortonia: mid-size financial services company (HortoniaBank merged with HortoniaAssurance health insurance services) expanding from US to international markets
• Employees in EU and US
• Multiple business units need access to customer data: Analysts, Compliance Admins, HR
• Customer data is co-mingled as well as isolated
• Leases data from external data brokers
• Needs to have rational security policies to provide the right level of access control to customer data across geographies, business functions, and to comply with external regulations (GDPR, PII, HIPAA, EU Privacy etc.)
all user passwords: hadoop
Demo Setup
• 2 Customer Tables: 50K customer records each with 38 fields (PII, PHI, PCI & non-sensitive data)
  - us_customers: USA person data only
  - ww_customers: multi-language, multi-country, localized person data
• 1 Reference table: eu_countries (reference table for looking up EU country codes to country mappings)
• Finance DB: 1 data set leased from a data broker
  - tax_2015: Data lease expired already (on Dec 31st 2015)
all user passwords: hadoop
Ranger Hive Policies Setup for Demo
• Only US employees can see data in the us_customers table, and only from locations within the US (access_us_customers)
• US employees can see only data rows of US persons in the ww_customers table (“filter_ww_customers for consent” + access_ww_customers)
• EU employees can see only data rows of EU persons who have given consent within the last 365 days in the ww_customers table (“filter_ww_customers for consent” + access_ww_customers)
• US HR team members can see all original unmasked data (PCI, PII, …)
• Super users belonging to etl/DPO groups can see data for all EU customers
• Masking: Analysts can view only masked versions of sensitive data from the WW customers table but are prohibited from viewing PII data in US tables (all masking policies are under the Masking tab of Resource based policies)
• Prohibition: No combination of zip code, insurance, and blood group data is permitted to be joined in any query (prohibition policy)
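The consent rule for EU employees ("consent within the last 365 days") amounts to a per-row predicate. A minimal sketch of that predicate follows; the column names `country_code` and `consent_date`, the sample country codes, and the helper names are hypothetical, since the deck does not list the table's schema or the actual filter expression used in the demo.

```python
from datetime import date, timedelta

# Assumption: a small sample standing in for the eu_countries reference table.
EU_COUNTRY_CODES = {"DE", "FR", "IE"}


def consent_is_current(consent_date, today, max_age_days=365):
    """True if consent was given on or before today and within max_age_days."""
    if consent_date is None:
        return False
    age = today - consent_date
    return timedelta(0) <= age <= timedelta(days=max_age_days)


def visible_to_eu_employee(row, today):
    """Row filter: EU person with consent given in the last 365 days."""
    return (
        row["country_code"] in EU_COUNTRY_CODES
        and consent_is_current(row["consent_date"], today)
    )


if __name__ == "__main__":
    today = date(2018, 6, 1)
    rows = [
        {"name": "A", "country_code": "DE", "consent_date": date(2018, 1, 15)},
        {"name": "B", "country_code": "FR", "consent_date": date(2016, 12, 1)},
        {"name": "C", "country_code": "US", "consent_date": date(2018, 5, 1)},
        {"name": "D", "country_code": "IE", "consent_date": None},
    ]
    print([r["name"] for r in rows if visible_to_eu_employee(r, today)])
```

Expressing consent as a filter over a date column means that updating or clearing the consent date changes what a query returns without any change to the policy itself.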
Personas
User         | Group       | Access Privileges
joe_analyst  | us_employee | US Data Only, non-sensitive data only, rest masked or forbidden depending on sensitivity
ivanna_eu_hr | eu_employee | EU Data Only (only customers who gave consent), All sensitive data
etl_user     | eu_employee | EU Data (all customers), All sensitive data, Update consent/Delete

Masking Rules
Data Column    | Masking Type | Sample Output
Password**     | Hash         | 237672b21819462ff39fcea7d990c3e5
National ID    | Last 4 Only  | xx-xx-9324
Credit Card    | First 4 Only | 4532xxxxxxxxxxxx
Street Address | Static       | nnn Xxxxxx Xxxxx
MRN**          | NULL         | null
Birthdate      | CUSTOM       | Hide birthday by showing it as 01/01/yyyy
Age            | CUSTOM       | Add a random number below 20 to actual age
** = masked via tag based masking
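The hash and custom masking types in the table can be approximated with short functions. This is a sketch of the transformations only, not the expressions configured in the demo's Ranger policies. MD5 is an assumption, chosen because the sample output is 32 hex characters, and "a random number below 20" is read here as 0 to 19.

```python
import hashlib
import random
from datetime import date


def mask_hash(value):
    """Replace the value with a hex digest (MD5 assumed; yields 32 hex chars)."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def mask_birthdate(birthdate):
    """Keep only the year by moving the date to January 1st."""
    return birthdate.replace(month=1, day=1)


def mask_age(age, rng):
    """Add a random offset in the range 0..19 to the real age."""
    return age + rng.randrange(20)


if __name__ == "__main__":
    rng = random.Random()
    print(mask_hash("example-password"))
    print(mask_birthdate(date(1969, 9, 12)).strftime("%m/%d/%Y"))
    print(mask_age(42, rng))
```

Hashing gives the same output for the same input, so masked values can still be joined or counted. The birthdate and age masks trade precision for privacy while keeping the column usable for coarse analysis.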
Environments & Demo Instructions
https://community.hortonworks.com/articles/151939/hdp-securitygovernance-demo-kit.html

GDPR/CCPA Compliance and Data Governance in Hadoop
