For firms in the financial industry, especially within regulated organizations such as credit card processors and banks, PCI DSS compliance has become a business and operational necessity. Although the blueprint of a PCI-compliant architecture varies from organization to organization, the mixture of modern Hadoop-based data lakes and legacy systems is a common theme.
In this talk, we will discuss recent updates to PCI DSS and how significant portions of PCI DSS compliance controls can be achieved using the open source Hadoop security stack and technologies for the Hadoop ecosystem. We will provide a broad overview of implementing key aspects of the PCI DSS standards at Worldpay, such as encryption management, data protection with anonymization, separation of duties, and deployment considerations for securing Hadoop clusters at the network layer, from a practitioner's perspective. The talk will provide patterns and practices that map current Hadoop security capabilities to the security controls that a PCI-compliant environment requires.
Speaker
David Walker, Enterprise Data Platform Programme Director, Worldpay
Srikanth Venkat, Senior Director Product Management, Hortonworks
Background: Kerberos
⬢ Strongly authenticating and establishing a user's identity is the basis for secure access in Hadoop
⬢ Users need to be able to reliably "identify" themselves and have that identity propagated throughout the Hadoop cluster
⬢ Design & implementation of Kerberos security in native Apache Hadoop was delivered by Hortonworks co-founder Owen O'Malley!
⬢ Why Kerberos?
⬢ Establishes identity for clients, hosts and services
⬢ Prevents impersonation; passwords are never sent over the wire
⬢ Integrates with enterprise identity management tools such as LDAP & Active Directory
⬢ More granular auditing of data access and job execution
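The "passwords are never sent over the wire" property can be illustrated with a toy challenge-response sketch. This is not the actual Kerberos protocol (which uses tickets issued by a KDC), just a minimal illustration of the underlying idea: both sides hold a key derived from the password, and the client proves possession by answering a fresh challenge with an HMAC, so the password itself never travels.

```python
import hashlib
import hmac
import os

def derive_key(password: str, salt: bytes) -> bytes:
    # Both client and KDC derive this key locally; the password never travels.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

def client_respond(password: str, salt: bytes, challenge: bytes) -> bytes:
    # The client sends only an HMAC over the server's challenge.
    return hmac.new(derive_key(password, salt), challenge, "sha256").digest()

def server_verify(stored_key: bytes, challenge: bytes, response: bytes) -> bool:
    expected = hmac.new(stored_key, challenge, "sha256").digest()
    return hmac.compare_digest(expected, response)

salt = b"user@EXAMPLE.COM"           # per-principal salt, as in Kerberos
challenge = os.urandom(16)           # fresh nonce for each authentication
stored = derive_key("s3cret", salt)  # provisioned out of band

assert server_verify(stored, challenge, client_respond("s3cret", salt, challenge))
assert not server_verify(stored, challenge, client_respond("wrong", salt, challenge))
```

A captured response is useless for replay because the challenge is fresh each time; real Kerberos additionally binds authenticators to timestamps and service identities.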
Background: HDP + Kerberos
[Diagram: an HDP cluster containing service components A, B, C, D and several instances of component X, each holding a keytab, all authenticating against a KDC.]
Kerberos is used to secure the components in the cluster. Kerberos identities are managed via "keytabs" on the component hosts. Principals for the cluster are managed in the KDC.
Automated Kerberos Setup with Ambari
Wizard-driven and automated Kerberos support (Kerberos principal creation for service accounts, keytab generation and distribution to the appropriate hosts, permissions, etc.)
Removes cumbersome, time-consuming and error-prone administration of Kerberos
Works with existing Kerberos infrastructure, including Active Directory, to automate common tasks, removing the burden from the operator:
• Add/Delete Host
• Add Service
• Add/Delete Component
• Regenerate Keytabs
• Disable Kerberos
Kerberos + Active Directory
Cross-Realm Trust
[Diagram: a client authenticates to the Hadoop cluster; users live in AD/LDAP, while host and service principals live in a cluster-local KDC, connected by a cross-realm trust.]
Users: smith@EXAMPLE.COM
Hosts: host1@HADOOP.EXAMPLE.COM
Services: hdfs/host1@HADOOP.EXAMPLE.COM
User store: use existing directory tools to manage users
KDC: use Kerberos tools to manage host and service principals
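The principal forms above follow a fixed structure: user principals are `primary@REALM`, while host and service principals are `primary/instance@REALM`. A small (hypothetical) helper makes the split, and the realm suffix, explicit:

```python
def parse_principal(principal: str) -> dict:
    """Split a Kerberos principal into primary, optional instance, and realm."""
    name, _, realm = principal.partition("@")
    primary, _, instance = name.partition("/")
    return {"primary": primary, "instance": instance or None, "realm": realm}

# User principals live in the corporate AD realm...
assert parse_principal("smith@EXAMPLE.COM") == {
    "primary": "smith", "instance": None, "realm": "EXAMPLE.COM"}
# ...while host and service principals live in the cluster KDC's realm.
assert parse_principal("hdfs/host1@HADOOP.EXAMPLE.COM") == {
    "primary": "hdfs", "instance": "host1", "realm": "HADOOP.EXAMPLE.COM"}
```

Keeping the two realms separate is what lets directory admins manage users with AD tools while cluster admins manage service principals with Kerberos tools.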
Apache Ranger
Authorization
• Centralized platform to define, administer and manage security policies consistently across Hadoop components
• HDFS, Hive, HBase, YARN, Kafka, Solr, Storm, Knox, NiFi, Atlas
• Extensible architecture
• Custom policy conditions, user context enrichers
• Easy to add new component types for authorization
Auditing
• Central audit location for all access requests
• Supports multiple audit destinations (HDFS, Solr, etc.)
• Real-time visual query interface
Ranger KMS
• Stores and manages encryption keys
• Supports HDFS Transparent Data Encryption
• Integration with HSMs (SafeNet Luna)
Ranger – ABAC Model
• Access decisions are based on a combination of the subject, action, resource, and environment
• Uses descriptive attributes: AD group, Apache Atlas-based tags or classifications, geo-location, etc.
• Ranger's approach is consistent with NIST SP 800-162
• Avoids role proliferation and manageability issues
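The ABAC idea can be sketched in a few lines: a decision function looks at attributes of the subject, action, resource, and environment rather than at a role list. This is a toy sketch, not Ranger's actual policy engine; the rule and attribute names are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Request:
    subject: dict      # e.g. AD groups, user location
    action: str        # e.g. "select"
    resource: dict     # e.g. Atlas tags on a Hive column, data region
    environment: dict  # e.g. geo-location, time of day

def abac_allow(req: Request) -> bool:
    # Toy rule: analysts may select PII-tagged resources only in their own region.
    if "PII" in req.resource.get("tags", []):
        return ("Analyst" in req.subject["groups"]
                and req.subject["location"] == req.resource["region"]
                and req.action == "select")
    return True  # untagged resources are open in this sketch

same_region = Request({"groups": ["Analyst"], "location": "US"}, "select",
                      {"tags": ["PII"], "region": "US"}, {})
cross_region = Request({"groups": ["Analyst"], "location": "US"}, "select",
                       {"tags": ["PII"], "region": "EU"}, {})
assert abac_allow(same_region)
assert not abac_allow(cross_region)
```

Because the decision is computed from attributes, adding a new analyst or a new PII column requires no new role, which is exactly the proliferation problem ABAC avoids.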
Dynamic Row Filtering & Column Masking: Apache Ranger with Apache Hive
User 1: Joe (Location: US, Group: Analyst)
User 2: Ivanna (Location: EU, Group: HR)

Joe's original query:
SELECT country, nationalid, ccnumber, mrn, name FROM ww_customers

ww_customers (source data):
Country | National ID | CC No            | DOB       | MRN        | Name          | Policy ID
US      | 232323233   | 4539067047629850 | 9/12/1969 | 8233054331 | John Doe      | nj23j424
US      | 333287465   | 5391304868205600 | 8/13/1979 | 3736885376 | Jane Doe      | cadsd984
Germany | T22000129   | 4532786256545550 | 3/5/1963  | 876452830A | Ernie Schwarz | KK-2345909

Ranger policy enforcement: the query is rewritten based on dynamic Ranger policies, filtering rows by region and applying the relevant column masking.

Joe's result: users from the US Analyst group see data for US persons only, with CC and National ID (SSN) shown as masked values and MRN nullified.
Country | National ID | CC No               | MRN  | Name
US      | xxxxx3233   | 4539 xxxx xxxx xxxx | null | John Doe
US      | xxxxx7465   | 5391 xxxx xxxx xxxx | null | Jane Doe

Ivanna's original query:
SELECT country, nationalid, name, mrn FROM ww_customers

Ivanna's result: EU HR policy admins can see unmasked values but are restricted by row-filtering policies to data for EU persons only.
Country | National ID | Name          | MRN
Germany | T22000129   | Ernie Schwarz | 876452830A
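The masking and row-filtering behavior in the tables above can be reproduced in a short sketch. This is not Ranger's query-rewrite engine; it is a toy stand-in showing the same per-user transformations, with the policy rules hard-coded for the two users.

```python
def mask_national_id(nid: str) -> str:
    # Show only the last four characters: 232323233 -> xxxxx3233
    return "x" * (len(nid) - 4) + nid[-4:]

def mask_cc(cc: str) -> str:
    # Keep the issuer prefix, mask the rest: 4539067047629850 -> 4539 xxxx xxxx xxxx
    return cc[:4] + " xxxx xxxx xxxx"

def apply_policy(rows, user):
    out = []
    for r in rows:
        if user["group"] == "Analyst" and user["location"] == "US":
            if r["country"] != "US":
                continue  # row filter: US analysts see US persons only
            out.append({"country": r["country"],
                        "nationalid": mask_national_id(r["nationalid"]),
                        "ccnumber": mask_cc(r["ccnumber"]),
                        "mrn": None,  # MRN nullified for this group
                        "name": r["name"]})
        elif user["group"] == "HR" and user["location"] == "EU":
            if r["country"] == "US":
                continue  # row filter: EU HR sees EU persons only
            out.append({k: r[k] for k in ("country", "nationalid", "name", "mrn")})
    return out

rows = [
    {"country": "US", "nationalid": "232323233",
     "ccnumber": "4539067047629850", "mrn": "8233054331", "name": "John Doe"},
    {"country": "Germany", "nationalid": "T22000129",
     "ccnumber": "4532786256545550", "mrn": "876452830A", "name": "Ernie Schwarz"},
]
joe = apply_policy(rows, {"group": "Analyst", "location": "US"})
ivanna = apply_policy(rows, {"group": "HR", "location": "EU"})
assert joe == [{"country": "US", "nationalid": "xxxxx3233",
                "ccnumber": "4539 xxxx xxxx xxxx", "mrn": None, "name": "John Doe"}]
assert ivanna == [{"country": "Germany", "nationalid": "T22000129",
                   "name": "Ernie Schwarz", "mrn": "876452830A"}]
```

The point of doing this in the policy layer rather than in application code is that the same table serves both audiences with no data duplication.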
Data Protection in Hadoop
Data protection must be applied at three different layers in Apache Hadoop:
• Storage: encrypt data while it is at rest
  Transparent Data Encryption in HDFS, Ranger KMS + HSM, partner products (HPE Voltage, Protegrity, Dataguise)
• Transmission: encrypt data while it is in motion
  Native Apache Hadoop 2.0 provides wire encryption.
• Upon access: apply restrictions when data is accessed
  Ranger (dynamic column masking + row filtering), partner masking + encryption
Data Protection – Layered Approach
• Encryption of data at rest
  – OS-level encryption (LUKS)
  – Certified partners for volume encryption (e.g. Vormetric (Thales), Protegrity, HPE Voltage Security)
  – HDFS TDE file/folder-level encryption with keys managed by Ranger KMS; external HSM integration
• Encryption of data on the wire
  – All wire protocols can be encrypted by the HDP platform
  – Wire-level encryption enhancements (SSL)
• Granular data protection
  – Dynamic masking + row filtering for Hive with Ranger
  – Classification-based security with Ranger + Atlas
  – Element-level encryption/masking from certified partners (HPE Voltage, Protegrity)
Ranger KMS: Transparent Data Encryption in HDFS
[Diagram: an HDFS client reads and writes encrypted blocks (A, B, C, D) across DataNodes; the NameNode tracks the blocks, and Ranger KMS, backed by a SafeNet Luna HSM, manages the encryption keys.]
Benefits
• Selective encryption of relevant files/folders
• Prevents rogue admin access to sensitive data
• Fine-grained access controls
• Transparent to the end application, with no changes required
• Ranger KMS integrates with an external HSM (SafeNet Luna), adding to the reliability and security of the KMS
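HDFS TDE uses a layered key hierarchy: an encryption-zone key held by the KMS wraps a per-file data encryption key (DEK), and HDFS metadata stores only the wrapped form (EDEK). The toy sketch below illustrates that wrap/unwrap flow; XOR stands in for real AES key wrapping, and none of this is the actual Hadoop KMS API.

```python
import os

def xor(a: bytes, b: bytes) -> bytes:
    # Stand-in for real AES key wrapping; never use XOR for actual crypto.
    return bytes(x ^ y for x, y in zip(a, b))

class ToyKMS:
    """Holds encryption-zone keys; only wrapped DEKs ever leave for storage."""
    def __init__(self):
        self._zone_keys = {}

    def create_zone_key(self, name: str):
        self._zone_keys[name] = os.urandom(16)

    def generate_wrapped_dek(self, zone: str):
        dek = os.urandom(16)                    # fresh per-file key
        edek = xor(dek, self._zone_keys[zone])  # wrapped form, safe to store in NameNode metadata
        return dek, edek

    def unwrap(self, zone: str, edek: bytes) -> bytes:
        # Ranger KMS would check authorization policies here before unwrapping.
        return xor(edek, self._zone_keys[zone])

kms = ToyKMS()
kms.create_zone_key("/secure/cards")
dek, edek = kms.generate_wrapped_dek("/secure/cards")
assert kms.unwrap("/secure/cards", edek) == dek
```

Because HDFS admins only ever handle EDEKs, a rogue admin who can read NameNode metadata still cannot decrypt file data without passing the KMS authorization check, which is the "prevent rogue admin access" benefit above.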
HSM integration with Ranger KMS
• The HSM client needs to be set up on the KMS nodes
• When installing Ranger KMS, HSM parameters can be specified
• If KMS is already installed with a DB, the master key can be migrated to the HSM
• All other TDE functionality remains unchanged
HSM integration with Ranger KMS
Only the master key is held in the HSM; all other keys are stored in the Ranger KMS DB.
Apache Atlas Vision: Open Metadata & Governance Services
[Diagram: Atlas collects metadata from traditional structured sources (RDBMS metadata, MPP appliances) and from Hadoop components (Sqoop, Kafka, Storm, Hive, Falcon), and shares it with Ranger, custom integrations, and partner tools.]
Comprehensive Enterprise Data Catalog
• Lists all of your data, where it is located, its origin (lineage), owner, structure, meaning, classification and quality
• Integrates both on-premise and cloud platforms to provide an enterprise-wide view
Open Enterprise Data Connectors
• Interoperable connector framework to connect your data catalog out of the box with many vendor technologies
• No expensive population of proprietary, siloed metadata repositories
Dynamic Metadata Discovery
• Metadata is added to the catalog automatically as new data is created or existing data is updated
• Extensible discovery processes that characterize and classify the data
Enabling Collaboration & Workflows
• Subject matter experts locate the data they need quickly and efficiently, and share their knowledge about the data and its usage to help others
• Interested parties and processes are notified automatically
Automated Governance Processes
• Metadata-driven access control
• Auditing, metering, and monitoring
• Quality control and exception management
• Rights (entitlement) management
Predefined standards for glossaries, data schemas, rules and regulations
Vision: metadata-driven foundational governance services for the enterprise data ecosystem
• Open frameworks and APIs
• Agile and secure collaboration around data and advanced analytics
• Reduce operational costs while extracting the economic value of data
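The catalog capabilities above (entities, lineage, classification, tag search) can be sketched as a tiny in-memory structure. This is an illustrative toy, not the Atlas type system or REST API; the entity and tag names are made up.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    name: str
    type: str                                   # e.g. "hive_table", "hdfs_path"
    tags: set = field(default_factory=set)      # classifications
    inputs: list = field(default_factory=list)  # lineage: upstream entity names

class ToyCatalog:
    def __init__(self):
        self.entities = {}

    def register(self, entity: Entity):
        self.entities[entity.name] = entity

    def classify(self, name: str, tag: str):
        self.entities[name].tags.add(tag)

    def search_by_tag(self, tag: str):
        return sorted(e.name for e in self.entities.values() if tag in e.tags)

    def lineage(self, name: str):
        # Walk upstream inputs recursively to reconstruct the data's origin.
        out = []
        for up in self.entities[name].inputs:
            out.append(up)
            out.extend(self.lineage(up))
        return out

cat = ToyCatalog()
cat.register(Entity("raw_txns", "hdfs_path"))
cat.register(Entity("ww_customers", "hive_table", inputs=["raw_txns"]))
cat.classify("ww_customers", "PCI")
assert cat.search_by_tag("PCI") == ["ww_customers"]
assert cat.lineage("ww_customers") == ["raw_txns"]
```

In the real system, the "register" step happens automatically via hooks and topic notifications as jobs run, which is what "dynamic metadata discovery" refers to.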
HDP – Security & Governance
[Diagram: Ranger manages access policies and audit logs; its policy decision point (PDP) evaluates classification, prohibition, time and location policies against a resource cache. Atlas tracks metadata and lineage in its metastore, and Atlas clients subscribe to a topic to receive metadata updates. Tags are attached to assets: entities in the data lake such as Hive tables, HDFS files, HBase tables, streams, pipelines and feeds.]
Industry first: dynamic tag-based security policies
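The value of tag-based policies is that a policy references a classification rather than a specific asset, so newly classified data is protected with no policy change. A minimal sketch of that idea (toy data structures, not the Ranger tag-policy model):

```python
# Assets carry classifications from the catalog; policies reference tags only.
asset_tags = {"ww_customers": {"PCI"}, "marketing_emails": set()}

tag_policies = {
    # tag -> groups allowed to access assets carrying that tag
    "PCI": {"pci_auditors"},
}

def allowed(user_groups: set, asset: str) -> bool:
    for tag in asset_tags.get(asset, set()):
        permitted = tag_policies.get(tag, set())
        if not user_groups & permitted:
            return False  # asset carries a tag the user is not cleared for
    return True

assert allowed({"pci_auditors"}, "ww_customers")
assert not allowed({"marketing"}, "ww_customers")
assert allowed({"marketing"}, "marketing_emails")

# Classify a new asset: it is immediately covered by the existing PCI policy.
asset_tags["new_feed"] = {"PCI"}
assert not allowed({"marketing"}, "new_feed")
```

In the HDP stack this loop is closed automatically: Atlas pushes tag updates over a topic, and Ranger's tag-sync keeps the PDP's view current without an admin touching a policy.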
Walk Through Items
⬢ Ranger
⬢ ABAC Fine Grained Security
⬢ Resource/Masking/Row Filtering Policies
⬢ Audits – self audits/access/plugin audits, logins
⬢ User/Group/Roles in Ranger
⬢ Atlas
⬢ Search and tag assets
⬢ Tag Attributes
⬢ Tag based policies in Ranger
This is designed to impress - I use lots of junk examples here:
* These slides were written before our merger with Vantiv - we were the biggest in the UK, they were the biggest in the US - now we are bringing this all together in the biggest payment company in the world
* If you've used a card 3 times in the UK, we have your details: your name, card number and expiry date. If one of those transactions was on the web, we also have your address, email and probably your mobile too. If it was for a flight, we know where you are going
* So this data is sensitive - it has to be secure, which is why our enterprise data platform focuses so much on security. We do security at every level [explain each one]: data at rest, data in motion, authentication, access control and tokenisation
* We are so important to the digital economy that we are on the UK Gov't risk register
* Payments aren't just card transactions - we do mobile payments in Kenya, sit behind PayPal and have an alternative platform that handles Bitcoin etc.
* Our transactions peak out at >1500 per day - do you [the audience] know which is the busiest day? [wait for guesses: Christmas, Black Friday, etc.] then do the China Singles Day stats - see Wikipedia for numbers
* Without Worldpay you probably can't buy me a beer in the bar later
* All the stats in this presentation refer to Worldpay pre-merger - we are bigger (and better) now
Proxying services:
- Only HTTP resources are proxied at the moment - let us know if there are requests for other protocols
- Supports UIs, REST APIs, JDBC/ODBC, and WebSockets
- Knox can be configured to impersonate the user when making requests to the cluster - that's called the trusted proxy pattern. Not all services use Knox as a trusted proxy.
Auth services:
- OAuth will be supported in the future, subject to demand
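With the trusted proxy pattern, Knox authenticates the end user itself and asserts that identity to the cluster, commonly via the WebHDFS `doas` query parameter. A small sketch of building such a proxied request URL (the gateway host and topology name here are hypothetical; the Knox path shape and `doas` parameter follow the usual conventions):

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def knox_webhdfs_url(gateway, topology, path, op, doas=None):
    """Build a WebHDFS URL routed through a Knox gateway.

    With the trusted-proxy pattern, Knox authenticates the end user and
    the cluster trusts Knox to assert that identity via `doas`.
    """
    params = {"op": op}
    if doas:
        params["doas"] = doas
    return (f"https://{gateway}:8443/gateway/{topology}"
            f"/webhdfs/v1{path}?{urlencode(params)}")

url = knox_webhdfs_url("knox.example.com", "default", "/data/cards",
                       op="LISTSTATUS", doas="smith")
assert url.startswith(
    "https://knox.example.com:8443/gateway/default/webhdfs/v1/data/cards")
assert parse_qs(urlsplit(url).query) == {"op": ["LISTSTATUS"], "doas": ["smith"]}
```

For the impersonation to be honored, the cluster must list the Knox service user as an allowed proxy user; services that do not, fall outside the trusted-proxy pattern as noted above.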
------------------------------------------------------------------------------
HDP Certified =
We have Groovy-based DSL scripting access to underlying Hadoop resources via the SDK (WebHDFS, WebHCat, Oozie, HBase), so most likely we should colour-code that too.
YARN RM = Resource Manager
HBase was tested, then they ran into some issues, so it's now tracked for Atlantic: https://hortonworks.jira.com/browse/BUG-84047
For example, even with SSL to the web UIs, an account with sudo privileges will be able to sniff and decrypt network traffic (domain account passwords). The damage will also be widespread, since the sudo user can obtain the domain credentials of any user who logs on to the cluster.
With Kerberos this concern goes away, since the user's password is never sent to the cluster. If the sudo user becomes compromised, the attacker can only obtain a Kerberos ticket, which is scoped to a particular service and has a limited lifetime. We definitely need Kerberos-based authentication for Ambari.
Strong authentication = password never sent over the wire
HSMs are trust anchors - appliances or cards with a dedicated crypto processor that generates and stores crypto keys securely - validated by third-party security certifications (FIPS 140-2, Common Criteria EAL, etc.); they simplify audits, ease compliance, and provide high performance.