© Worldpay 2018. All rights reserved.
Not just a necessary evil, it’s good for business:
Implementing PCI DSS controls for the Hadoop
ecosystem at the UK’s largest payment processor
David Walker/Worldpay & Srikanth Venkat/Hortonworks
DataWorks Summit – Berlin
17-19 April 2018
Session Synopsis & Your Speakers
David has over 20 years’ technical leadership expertise and
has led the development and management of complex BI
solutions, supporting technical architectures for a wide
range of organisations spanning SME start-ups to large
enterprise. In his role at Worldpay, David specialises in
developing and delivering the Enterprise Data Platform, a
multi-tenant highly secure Hadoop platform for decision
engines, analytics and reporting using his experience and
knowledge in technical architecture, data modelling, ETL
design, data quality, and metadata management. A key
aspect of David’s role also involves acting as the lynchpin
between Worldpay’s commercial and technical business
leaders by regularly engaging at the executive level. David
also manages cross-cultural teams in the analysis of
technical infrastructures and the delivery of innovative and
successful change programmes.
Srikanth Venkat is currently responsible for the Security & Governance
portfolio of products at Hortonworks, which includes Apache Knox,
Apache Ranger, Apache Atlas, platform-wide security and
Hortonworks DataPlane Service. Prior to Hortonworks, Srikanth
held multiple roles in areas of cloud services, marketplaces, security,
and business applications. His experience includes leadership across
Product Management, Strategy and Operations, and Technical
Architecture with broad experience in startups to global organizations
including Telefonica, Salesforce.com, Cisco-Webex, Proofpoint,
Dataguise, Trilogy Software, and Hewlett-Packard. Srikanth holds a
PhD in Engineering with a focus on Artificial Intelligence from
University of Pittsburgh, and an MBA in General Management from
Indiana University and a Masters in Global Management from
Thunderbird School of Global Management. Srikanth is a Data
Sciences & Machine Learning hobbyist and enjoys tinkering with Big
Data technologies.
For firms in the financial industry, especially regulated organizations such as credit card processors and banks, PCI DSS compliance has become a business and operational necessity. Although the blueprint of a PCI-compliant architecture varies from organization to organization, the mixture of modern Hadoop-based data lakes and legacy systems is a common theme.
In this talk, we will discuss recent updates to PCI DSS and how significant portions of PCI DSS compliance controls can be achieved using the open source Hadoop security stack and technologies for the Hadoop ecosystem. We will provide a broad overview of implementing key aspects of PCI DSS standards at Worldpay, such as encryption management, data protection with anonymization, separation of duties, and deployment considerations for securing Hadoop clusters at the network layer, from a practitioner’s perspective. The talk will provide patterns and practices that map current Hadoop security capabilities to the security controls that a PCI-compliant environment requires.
Worldpay In (Big) Numbers
[Infographic, figures not recoverable: transactions daily and the average per second, number of merchants, payment methods and currencies, countries, and the percentage of all UK non-cash transactions processed]
Channels: In Store, Online, Mobile
Data Security & Regulatory Compliance are both in the news …
… but in reality they are two sides of the same coin
• Payment Card Industry Data Security Standards*
• General Data Protection Regulations
• Payment Services Directive 2*
• Data Protection Act(s)
* Other industries have their own standards but the principle is the same
So why is this good for business?
• In a digital world the success of our business (regardless of industry) will be significantly defined by our organisation’s ability to handle and use data responsibly throughout our business. We must protect our customers and business partners from both data misuse and from fraud. In short, we need to be trusted by our customers in the ways that we handle their information
• Legal & regulatory standards are being set by governments, regulators and industry bodies in an attempt to set a minimum sufficient standard to protect data subjects
How do you develop a secure platform?
• Compliance is not lip-service to doing security – the auditing for PCI DSS is rigorous and we have to continuously review and upgrade our systems to maintain compliance
• Audit of and compliance with these standards is a way of demonstrating that we have taken appropriate steps to protect our data assets – and in the worst case scenario it is also a way of mitigating the financial and reputational impact of an incident
Either start with a blank piece of paper … or adopt and commit to a security framework
Today’s Hadoop Environments Are The Big Targets Within Your Organisation
• If you are building or have built a large successful Hadoop deployment that contains a large amount of your business data then you have just created a massive target within your organisation
• PCI DSS only certifies a project or implementation
• No single product can deliver a PCI DSS compliant solution
• As the implementers of a system we are looking to get the greatest amount of compliance by deploying the smallest number of products and tools to do the job
First Some Historical Context
• The Worldpay journey to build a big data platform started in April 2015
• We started with Hortonworks Data Platform (HDP) 2.2
• The Hortonworks Data Platform Security document did not exist
• Apache Ranger was new, Apache Atlas was a concept, Hortonworks DataPlane wasn’t even a twinkle
• Today we are on HDP 2.6.4 and have applied nearly every release in between
• Across the entire software product stack we did 298 patch sets and upgrades in 2017
• Besides the core platform, paying for support and deploying Hortonworks SmartSense significantly improves your security profile
• We are also interested in:
– https://workbench.cisecurity.org – Center for Internet Security
– http://owasp.org – Open Web Application Security Project
Even your fish tank is a risk to your data platform(s)
• PCI is about putting in place the
– Security
– Logging of activity
– Audit of that security
– Separation of duties
– Patching Cycles
– Etc.
• And then maintaining them
• We are just finishing our 2018 PCI cycle and start planning the 2019 PCI cycle in September
The PCI DSS 3.2 Requirements
Goal: Build and Maintain a Secure Network and Systems
1. Install and maintain a firewall configuration to protect cardholder data
2. Do not use vendor-supplied defaults for system passwords and other security parameters
Goal: Protect Cardholder Data
3. Protect stored cardholder data
4. Encrypt transmission of cardholder data across open, public networks
Goal: Maintain a Vulnerability Management Program
5. Protect all systems against malware and regularly update antivirus software or programs
6. Develop and maintain secure systems and applications
Goal: Implement Strong Access Control Measures
7. Restrict access to cardholder data by business need to know
8. Identify and authenticate access to system components
9. Restrict physical access to cardholder data
Goal: Regularly Monitor and Test Networks
10. Track and monitor all access to network resources and cardholder data
11. Regularly test security systems and processes
Goal: Maintain an Information Security Policy
12. Maintain a policy that addresses information security for all personnel
Addressing The Requirements
1. Install and maintain a firewall configuration to protect cardholder data
• The Worldpay network has defence in depth, much more than just firewalls, including virtualised jumpboxes and two-factor authentication. Our network traffic is monitored and logged
• Apache Knox is used to supplement perimeter security
2. Do not use vendor-supplied defaults for system passwords and other security parameters
• Apache Ambari allows us to install, configure and manage the system passwords, connection ports, certificates, etc.
• Apache Ambari is used to help implement Kerberos
• Keys stored in HSMs
3. Protect stored cardholder data
• Hardware Encrypted Disks
• Apache Atlas is used to ‘tag’ columns as PCI or PII data
• Apache Atlas is used to mask data and/or remove data at run time
• Apache Ranger is used to restrict access to the data based on roles (RBAC)
• Apache Ranger is used to restrict access to the data based on attributes (ABAC)
• Apache Ranger is integrated with our LDAP/Active Directory
• Apache HDFS Transparent Data Encryption enabled
• HDFS ACLs enabled
• Micro Focus SecureData (formerly HP Voltage) is used to either Tokenise or Encrypt sensitive (PCI & PII) data
• Vormetric Disk Protection enabled
Addressing The Requirements
4. Encrypt transmission of cardholder data across open, public networks
• All of our components use TLS 1.2 to encrypt network traffic – this has to be supported by every Hortonworks component to be effective
5. Protect all systems against malware and regularly update antivirus software or programs
• Worldpay runs on Linux rather than Windows but we do still have anti-virus
• Worldpay implements File Integrity Management that checks critical files are not being modified
• Regular patching of entire software stack including OS and all software packages as patches and releases come out
• Worldpay limits what software can be downloaded and installed on any server
• Hortonworks has addressed, or is addressing, specific vulnerabilities we have found
• Use Hortonworks SmartSense to ensure optimal configurations
6. Develop and maintain secure systems and applications
• Worldpay peer reviews our code before deploying
• Worldpay developed code has to be scanned with tools like Veracode
• Worldpay develops to OWASP (Open Web Application Security Project) standards for interfaces
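The File Integrity Management idea under requirement 5 (checking that critical files are not being modified) can be pictured as hashing each monitored file and comparing it with a recorded baseline. The Python sketch below is only an illustration of that idea, not the tool Worldpay uses; the function names are hypothetical and the list of monitored files is left to the caller.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_baseline(paths):
    """Record a known-good digest for every monitored file (hypothetical helper)."""
    return {str(p): sha256_of(Path(p)) for p in paths}


def find_changes(baseline):
    """Return the monitored paths that are missing or whose content changed."""
    changed = []
    for name, expected in baseline.items():
        path = Path(name)
        if not path.is_file() or sha256_of(path) != expected:
            changed.append(name)
    return sorted(changed)
```

A real deployment would also protect the baseline itself and alert on changes; the sketch covers only the comparison step.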
Addressing The Requirements
7. Restrict access to cardholder data by business need to know
• Apache Ranger is used to restrict access to the data based on roles (RBAC)
• Apache Ranger is used to restrict access to the data based on attributes (ABAC)
8. Identify and authenticate access to system components
• Kerberos enabled cluster
• Apache Ranger is integrated with our LDAP/Active Directory
• Apache Ranger implements user -> group -> role -> access relationship
9. Restrict physical access to cardholder data
• Tightly restricted access to the data centres
• No disks returned on failure to the vendors
• Indirect server access via virtualised jumpboxes
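The user -> group -> role -> access relationship under requirement 8 can be sketched as three lookups chained together. This is a simplified Python illustration, not Ranger's data model; every user, group, role and resource name below is hypothetical.

```python
# Hypothetical directory and policy data, for illustration only.
USER_GROUPS = {"alice": {"fraud-analysts"}, "bob": {"platform-admins"}}
GROUP_ROLES = {"fraud-analysts": {"pci_reader"}, "platform-admins": {"cluster_operator"}}
ROLE_ACCESS = {
    "pci_reader": {("payments.transactions", "select")},
    "cluster_operator": {("yarn.queue.default", "admin")},
}


def effective_access(user):
    """Follow user -> group -> role -> access and collect the granted pairs."""
    access = set()
    for group in USER_GROUPS.get(user, ()):
        for role in GROUP_ROLES.get(group, ()):
            access |= ROLE_ACCESS.get(role, set())
    return access


def can(user, resource, action):
    """Deny by default: only explicitly granted (resource, action) pairs pass."""
    return (resource, action) in effective_access(user)
```

Because access is granted to roles rather than to people, removing a user from a directory group removes the access without touching any policy.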
Addressing The Requirements
10. Track and monitor all access to network resources and cardholder data
• Systems Access logged via Apache Ranger to Apache Solr and made available to auditors
• All other Hortonworks audit functions also enabled
11. Regularly test security systems and processes
• Worldpay ‘pentests’ systems regularly (i.e. on installation, after major changes and annually) as part of the certification process
• The EDP Governance team defines and audits policies relating to security (as well as other data management functions)
12. Maintain a policy that addresses information security for all personnel
• Worldpay has mandatory compliance training on PCI and other security issues that has to be renewed each year by all employees
Our 2.6.4 Components that help us create a PCI compliant system today
Srikanth Venkat – Senior Director, Product Management
Security & Governance in HDP:
From a PCI-DSS Perspective
© Hortonworks Inc. 2011 – 2016. All Rights Reserved
Authentication & API Security: Apache Knox
Apache Knox Overview
[Diagram: Knox architecture. Authentication and federation providers (WebSSO, KnoxSSO/Token, SAML, OAuth, LDAP/AD, SPNEGO, header based) sit in front of HTTP proxying services for UIs, REST APIs and WebSockets, covering WebHDFS, WebHCat, Hive, HBase, YARN RM, Ambari, Ranger, Zeppelin, Oozie, Phoenix, Gremlin, Atlas, Druid and JDBC/ODBC. Client DSL/SDK services provide a Groovy based DSL, the KnoxShell SDK, token sessions and REST API classes. The legend distinguishes services that are HDP certified as of HDP 2.6.4 from those that are community supported.]
Proxying Services
★ Provide access to Hadoop via proxying of HTTP resources
★ Ecosystem APIs and UIs + Hadoop oriented dispatching for Kerberos + doAs (impersonation) etc.
Authentication Services
★ REST API access, WebSSO flow for UIs
★ LDAP/AD, Header based PreAuth, and Token Exchange
★ Kerberos, SAML, OAuth
Client DSL/SDK Services
★ Scripting through DSL
★ Using Knox Shell classes directly as SDK
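To make the proxying idea concrete: a client addresses the Knox gateway rather than the cluster service, and Knox dispatches the call onward. The Python sketch below only builds the gateway URL and an HTTP Basic header for a WebHDFS call; it makes no network request. The host name, the topology name "default", port 8443 and the "gateway" path are assumptions for illustration and depend on how a given gateway is configured.

```python
import base64
from urllib.parse import quote, urlencode


def knox_webhdfs_url(gateway_host, topology, hdfs_path, op,
                     port=8443, gateway_path="gateway"):
    """Build the URL of a WebHDFS operation as exposed through a Knox topology."""
    path = quote(hdfs_path.lstrip("/"))
    query = urlencode({"op": op})
    return (f"https://{gateway_host}:{port}/{gateway_path}/{topology}"
            f"/webhdfs/v1/{path}?{query}")


def basic_auth_header(username, password):
    """HTTP Basic credentials, which Knox can check against LDAP/AD."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return {"Authorization": "Basic " + token.decode("ascii")}


# Hypothetical gateway host and HDFS path.
url = knox_webhdfs_url("knox.example.com", "default", "/data/landing", "LISTSTATUS")
print(url)
```

The client never needs a Kerberos ticket or a direct route to the NameNode; only the gateway endpoint is exposed.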
Authentication: Kerberos
Background: Kerberos
⬢ Strongly authenticating and establishing a user’s identity is the basis for secure access in Hadoop
⬢ Users need to be able to reliably “identify” themselves and have identity propagated throughout the Hadoop cluster
⬢ Design & implementation of Kerberos security in native Apache Hadoop was delivered by Hortonworks co-founder Owen O’Malley!
⬢ Why Kerberos?
– Establishes identity for clients, hosts and services
– Prevents impersonation; passwords are never sent over the wire
– Integrates with enterprise identity management tools such as LDAP & Active Directory
– More granular auditing of data access/job execution
Background: HDP + Kerberos
[Diagram: an HDP cluster of service components, each holding a keytab, alongside a KDC]
Kerberos is used to secure the Components in the cluster. Kerberos identities are managed via “keytabs” on the Component hosts.
Principals for the cluster are managed in the KDC.
Automated Kerberos Setup with Ambari
• Wizard-driven and automated Kerberos support (Kerberos principal creation for service accounts, keytab generation and distribution for appropriate hosts, permissions, etc.)
• Removes cumbersome, time-consuming and error-prone administration of Kerberos
• Works with existing Kerberos infrastructure, including Active Directory, to automate common tasks, removing the burden from the operator:
– Add/Delete Host
– Add Service
– Add/Delete Component
– Regenerate Keytabs
– Disable Kerberos
Kerberos + Active Directory
[Diagram: a client, AD/LDAP and the Hadoop cluster’s KDC, with a cross-realm trust between AD/LDAP and the KDC]
AD / LDAP (User Store): use existing directory tools to manage users
– Users: smith@EXAMPLE.COM
KDC (Authentication): use Kerberos tools to manage host + service principals
– Hosts: host1@HADOOP.EXAMPLE.COM
– Services: hdfs/host1@HADOOP.EXAMPLE.COM
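The principal names on this slide follow the usual primary/instance@REALM shape, and the cross-realm trust matters whenever a user and a service live in different realms. The Python sketch below parses the slide’s example names; the helper names are hypothetical, and it deliberately ignores escaping rules that real Kerberos libraries handle.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str
    instance: Optional[str]
    realm: str


def parse_principal(text: str) -> Principal:
    """Split 'primary[/instance]@REALM' into its parts (simplified, no escaping)."""
    name, sep, realm = text.rpartition("@")
    if not sep or not name or not realm:
        raise ValueError(f"not a principal with a realm: {text!r}")
    primary, _, instance = name.partition("/")
    return Principal(primary, instance or None, realm)


def needs_cross_realm_trust(user: Principal, service: Principal) -> bool:
    """A user from one realm reaching a service in another relies on a trust."""
    return user.realm != service.realm


user = parse_principal("smith@EXAMPLE.COM")
service = parse_principal("hdfs/host1@HADOOP.EXAMPLE.COM")
print(user, service, needs_cross_realm_trust(user, service))
```

In the slide’s layout, users stay in the corporate directory realm while host and service principals live in the cluster’s own realm, which is why the trust between the two is needed.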
Authorization & Audits: Apache Ranger
Apache Ranger
Authorization
• Centralized platform to define, administer and manage security policies consistently across Hadoop components
– HDFS, Hive, HBase, YARN, Kafka, Solr, Storm, Knox, NiFi, Atlas
• Extensible Architecture
– Custom policy conditions, user context enrichers
– Easy to add new component types for authorization
Auditing
• Central audit location for all access requests
• Support multiple destination sources (HDFS, Solr, etc.)
• Real-time visual query interface
Ranger KMS
• Store and manage encryption keys
• Support HDFS Transparent Data Encryption
• Integration with HSM
– SafeNet Luna
Ranger – ABAC Model
• ABAC Model
– Combination of the subject, action, resource, and environment
– Uses descriptive attributes: AD group, Apache Atlas-based tags or classifications, geo-location, etc.
• Ranger approach is consistent with NIST 800-162
• Avoid role proliferation and manageability issues
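An ABAC decision combines attributes of the subject, the action, the resource and the environment. The Python sketch below illustrates that combination with tag-keyed policies; it is not Ranger’s policy engine, and the group, tag and location values are hypothetical. Denying untagged resources is a choice made for this sketch only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    subject_groups: frozenset   # subject attribute, e.g. AD groups
    action: str                 # e.g. "select"
    resource_tags: frozenset    # resource attribute, e.g. Atlas-style tags
    location: str               # environment attribute, e.g. geo-location


@dataclass(frozen=True)
class Policy:
    tag: str
    allowed_groups: frozenset
    allowed_actions: frozenset
    allowed_locations: frozenset


def is_allowed(request, policies):
    """Deny by default; every tag on the resource needs a matching policy."""
    for tag in request.resource_tags:
        covered = any(
            p.tag == tag
            and bool(request.subject_groups & p.allowed_groups)
            and request.action in p.allowed_actions
            and request.location in p.allowed_locations
            for p in policies
        )
        if not covered:
            return False
    return bool(request.resource_tags)


# Hypothetical policy: PCI-tagged data may be selected by PCI analysts in the UK.
POLICIES = [Policy("PCI", frozenset({"pci-analysts"}),
                   frozenset({"select"}), frozenset({"UK"}))]
```

One policy keyed on the PCI tag covers every resource carrying that tag, which is how attribute-based rules avoid a separate role per table or column.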
Dynamic Row Filtering & Column Masking: Apache Ranger with Apache Hive
Source table ww_customers:
Country  National ID  CC No             DOB        MRN         Name           Policy ID
US       232323233    4539067047629850  9/12/1969  8233054331  John Doe       nj23j424
US       333287465    5391304868205600  8/13/1979  3736885376  Jane Doe       cadsd984
Germany  T22000129    4532786256545550  3/5/1963   876452830A  Ernie Schwarz  KK-2345909
Ranger Policy Enforcement: query rewritten based on dynamic Ranger policies (filter rows by region & apply relevant column masking)
User 1: Joe (Location: US, Group: Analyst)
Original Query: SELECT country, nationalid, ccnumber, mrn, name FROM ww_customers
Country  National ID  CC No                MRN   Name
US       xxxxx3233    4539 xxxx xxxx xxxx  null  John Doe
US       xxxxx7465    5391 xxxx xxxx xxxx  null  Jane Doe
Users from the US Analyst group see data for US persons with CC and National ID (SSN) as masked values, and MRN is nullified
User 2: Ivanna (Location: EU, Group: HR)
Original Query: SELECT country, nationalid, name, mrn FROM ww_customers
Country  National ID  Name           MRN
Germany  T22000129    Ernie Schwarz  876452830A
EU HR Policy Admins can see unmasked data but are restricted by row filtering policies to see data for EU persons only
Groups shown: Analysts, HR, Marketing
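The effect of the row filter and column masks on this slide can be reproduced in a few lines. The Python sketch below mimics the outcome only, not how Ranger rewrites Hive queries; the policy table, the set of EU countries and the mask helper names are assumptions, and the credit card mask is emitted without the spaces shown on the slide.

```python
ROWS = [
    {"country": "US", "nationalid": "232323233", "ccnumber": "4539067047629850",
     "mrn": "8233054331", "name": "John Doe"},
    {"country": "US", "nationalid": "333287465", "ccnumber": "5391304868205600",
     "mrn": "3736885376", "name": "Jane Doe"},
    {"country": "Germany", "nationalid": "T22000129", "ccnumber": "4532786256545550",
     "mrn": "876452830A", "name": "Ernie Schwarz"},
]

EU_COUNTRIES = {"Germany"}  # assumption: only the slide's sample country


def show_last_4(value):
    """Mask everything except the last four characters."""
    return "x" * (len(value) - 4) + value[-4:]


def show_first_4(value):
    """Mask everything except the first four characters."""
    return value[:4] + "x" * (len(value) - 4)


def nullify(value):
    """Replace the value with null."""
    return None


# Hypothetical per-group policies: a row filter plus column masks.
POLICIES = {
    "Analyst": {
        "row_filter": lambda row: row["country"] == "US",
        "masks": {"nationalid": show_last_4, "ccnumber": show_first_4, "mrn": nullify},
    },
    "HR": {
        "row_filter": lambda row: row["country"] in EU_COUNTRIES,
        "masks": {},
    },
}


def run_query(group, columns):
    """Apply the group's row filter, then mask each selected column."""
    policy = POLICIES[group]
    result = []
    for row in ROWS:
        if not policy["row_filter"](row):
            continue
        result.append({c: policy["masks"].get(c, lambda v: v)(row[c]) for c in columns})
    return result
```

Both users issue an ordinary SELECT; what differs is the policy attached to their group, so no application code has to know about masking.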
Data Protection
Data Protection in Hadoop
Data protection must be applied at three different layers in Apache Hadoop
• Storage: encrypt data while it is at rest
– Transparent Data Encryption in HDFS, Ranger KMS + HSM, Partner Products (HPE Voltage, Protegrity, Dataguise)
• Transmission: encrypt data as it is in motion
– Native Apache Hadoop 2.0 provides wire encryption
• Upon Access: apply restrictions when accessed
– Ranger (Dynamic Column Masking + Row Filtering), Partner Masking + Encryption
Data Protection – Layered Approach
• Encryption of Data at Rest
– OS Level Encryption (LUKS)
– Certified Partners for volume encryption (e.g. Vormetric (Thales), Protegrity, HPE Voltage Security)
– HDFS TDE file/folder level encryption with keys managed by Ranger KMS, External HSM integration
• Encryption of Data on the Wire
– All wire protocols can be encrypted by HDP platform
– Wire-level encryption enhancements (SSL)
• Granular Data Protection
– Dynamic Masking + Row Filtering for Hive with Ranger
– Classification Based Security with Ranger + Atlas
– Element level encryption/masking from certified partners (HPE Voltage, Protegrity)
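For encryption on the wire, a client can refuse anything older than TLS 1.2 regardless of what the server offers. The Python sketch below shows one way to do that with the standard library; it illustrates the client side only and says nothing about how HDP components themselves are configured.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Client-side TLS settings: verify certificates, refuse anything below TLS 1.2."""
    context = ssl.create_default_context()  # certificate and host name checks on
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


ctx = make_client_context()
print(ctx.minimum_version, ctx.verify_mode, ctx.check_hostname)
```

The context can then be passed to standard library clients such as http.client.HTTPSConnection(host, context=ctx).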
Transparent Data Encryption in HDFS
[Diagram: an HDFS client reads and writes encrypted blocks (A, B, C, D) held on DataNodes (DN) under a NameNode (NN), with keys provided by Ranger KMS backed by a SafeNet Luna HSM]
Benefits
• Selective encryption of relevant files/folders
• Prevent rogue admin access to sensitive data
• Fine grained access controls
• Transparent to end application w/o changes
• Ranger KMS integrated to external HSM (SafeNet Luna) adding to reliability/security of KMS
HSM integration with Ranger KMS
• HSM client needs to be set up in KMS nodes
• When installing Ranger KMS, HSM parameters can be specified
• If KMS is already installed with DB, Master key can be migrated to HSM
• All other TDE functionality remains unchanged
HSM integration with Ranger KMS
• Only master key will be in HSM
• Other keys stored in Ranger KMS DB
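The split described here (master key in the HSM, other keys in the KMS database) is a key hierarchy: the master key wraps the zone keys, and a zone key wraps the per-file data keys. The Python sketch below shows only that layering. The wrap function is a toy XOR keystream standing in for a real cipher and is NOT secure, and the class and key names are hypothetical rather than the Ranger KMS API.

```python
import hashlib
import secrets


def toy_wrap(wrapping_key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256-derived keystream. Toy stand-in for a cipher; NOT secure."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(wrapping_key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


toy_unwrap = toy_wrap  # XOR with the same keystream reverses the wrap


class ToyHsm:
    """Holds the master key; callers can wrap and unwrap but never read it."""

    def __init__(self):
        self._master_key = secrets.token_bytes(32)

    def wrap(self, key: bytes) -> bytes:
        return toy_wrap(self._master_key, key)

    def unwrap(self, wrapped: bytes) -> bytes:
        return toy_unwrap(self._master_key, wrapped)


class ToyKms:
    """Keeps zone keys only in wrapped form, as a KMS database would."""

    def __init__(self, hsm: ToyHsm):
        self._hsm = hsm
        self.db = {}  # zone key name -> zone key wrapped by the master key

    def create_zone_key(self, name: str) -> None:
        self.db[name] = self._hsm.wrap(secrets.token_bytes(32))

    def generate_encrypted_data_key(self, name: str) -> bytes:
        """Create a fresh data key and return it wrapped by the zone key."""
        zone_key = self._hsm.unwrap(self.db[name])
        return toy_wrap(zone_key, secrets.token_bytes(32))

    def decrypt_data_key(self, name: str, wrapped_data_key: bytes) -> bytes:
        zone_key = self._hsm.unwrap(self.db[name])
        return toy_unwrap(zone_key, wrapped_data_key)


hsm = ToyHsm()
kms = ToyKms(hsm)
kms.create_zone_key("pci_zone_key")  # hypothetical key name
wrapped_data_key = kms.generate_encrypted_data_key("pci_zone_key")
```

With this layering, a copy of the KMS database alone yields only wrapped keys; nothing can be unwrapped without access to the HSM.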
Governance with Apache Atlas
Apache Atlas Vision: Open Metadata & Governance Services
[Diagram: Atlas metadata exchanged with structured sources, traditional RDBMS metadata, MPP appliances, Kafka, Storm, Sqoop, Hive, Falcon, Ranger, and custom partner connectors]
Comprehensive Enterprise Data Catalog
• Lists all of your data, where it is located, its origin (lineage), owner, structure, meaning, classification and quality
• Integrate both on-premise and cloud platforms to provide enterprise wide view
Open Enterprise Data Connectors
• Interoperable connector framework to connect to your data catalog out of the box with many vendor technologies
• No expensive population of proprietary siloed metadata repositories
Dynamic Metadata Discovery
• Metadata is added automatically to the catalog as new data is created or data is updated
• Extensible discovery processes that characterize and classify the data
Enabling Collaboration & Workflows
• Subject matter experts locate the data they need quickly and efficiently, share their knowledge about the data and its usage to help others
• Interested parties and processes are notified automatically
Automated Governance Processes
• Metadata-driven access control
• Auditing, metering, and monitoring
• Quality control and exception management
• Rights (entitlement) management
Predefined standards for glossaries, data schemas, rules and regulations
Vision: Metadata-driven foundational governance services for enterprise data ecosystem
• Open frameworks and APIs
• Agile and secure collaboration around data and advanced analytics
• Reduce operational costs while extracting economic value of data
HDP – Security & Governance
Industry First: Dynamic Tag-based Security Policies
[Diagram: Atlas tracks metadata and lineage for entities in the data lake (Hive tables, HDFS files, HBase tables, streams, pipelines, feeds) and keeps tags on assets and entities in the Atlas metastore. An Atlas client subscribed to a topic gets metadata updates for Ranger. Ranger manages access policies and audit logs; its PDP, with a resource cache, evaluates classification, prohibition, time and location policies.]
Walk Through
Walk Through Items
⬢ Ranger
– ABAC Fine Grained Security
– Resource/Masking/Row Filtering Policies
– Audits – self audits/access/plugin audits, logins
– User/Group/Roles in Ranger
⬢ Atlas
– Search and tag assets
– Tag Attributes
– Tag based policies in Ranger
Worldpay – Hortonworks Partnership
• Worldpay has partnered closely with Hortonworks to improve security and governance features across HDP and to certify its internal platforms for PCI DSS
• Collaboration has resulted in community enhancements to Apache Knox, Apache Ranger, Apache Atlas, wire encryption & TDE
• Ongoing collaboration on HDP platform security fixes from external audits
• Key learnings incorporated into Hortonworks DataPlane Service – Data Steward Studio (DSS)
Leaders in Modern Money
Innovating In Secure Modern Data Analytics
Thank You
David M Walker (david.walker@worldpay.com)
Enterprise Data Platform Programme Director, Worldpay
Srikanth Venkat
Senior Director, Product Management, Hortonworks

More Related Content

What's hot

Solving Performance Problems on Hadoop
Solving Performance Problems on HadoopSolving Performance Problems on Hadoop
Solving Performance Problems on HadoopTyler Mitchell
 
Data Offload for the Chief Data Officer – how to move data onto Hadoop withou...
Data Offload for the Chief Data Officer – how to move data onto Hadoop withou...Data Offload for the Chief Data Officer – how to move data onto Hadoop withou...
Data Offload for the Chief Data Officer – how to move data onto Hadoop withou...DataWorks Summit
 
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...Seeling Cheung
 
Big Data at Geisinger Health System: Big Wins in a Short Time
Big Data at Geisinger Health System: Big Wins in a Short TimeBig Data at Geisinger Health System: Big Wins in a Short Time
Big Data at Geisinger Health System: Big Wins in a Short TimeDataWorks Summit
 
Continuous Data Ingestion pipeline for the Enterprise
Continuous Data Ingestion pipeline for the EnterpriseContinuous Data Ingestion pipeline for the Enterprise
Continuous Data Ingestion pipeline for the EnterpriseDataWorks Summit
 
HDFS tiered storage: mounting object stores in HDFS
HDFS tiered storage: mounting object stores in HDFSHDFS tiered storage: mounting object stores in HDFS
HDFS tiered storage: mounting object stores in HDFSDataWorks Summit
 
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...DataWorks Summit
 
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...Precisely
 
Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive...
Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive...Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive...
Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive...DataWorks Summit
 
Reaching scale limits on a Hadoop platform: issues and errors created by spee...
Reaching scale limits on a Hadoop platform: issues and errors created by spee...Reaching scale limits on a Hadoop platform: issues and errors created by spee...
Reaching scale limits on a Hadoop platform: issues and errors created by spee...DataWorks Summit
 
Benefits of Transferring Real-Time Data to Hadoop at Scale
Benefits of Transferring Real-Time Data to Hadoop at ScaleBenefits of Transferring Real-Time Data to Hadoop at Scale
Benefits of Transferring Real-Time Data to Hadoop at ScaleHortonworks
 
Cloud Innovation Day - Commonwealth of PA v11.3
Cloud Innovation Day - Commonwealth of PA v11.3Cloud Innovation Day - Commonwealth of PA v11.3
Cloud Innovation Day - Commonwealth of PA v11.3Eric Rice
 
Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat...
Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat...Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat...
Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat...DataWorks Summit
 
O2’s Financial Data Hub: going beyond IFRS compliance to support digital tran...
O2’s Financial Data Hub: going beyond IFRS compliance to support digital tran...O2’s Financial Data Hub: going beyond IFRS compliance to support digital tran...
O2’s Financial Data Hub: going beyond IFRS compliance to support digital tran...DataWorks Summit
 
GDPR-focused partner community showcase for Apache Ranger and Apache Atlas
GDPR-focused partner community showcase for Apache Ranger and Apache AtlasGDPR-focused partner community showcase for Apache Ranger and Apache Atlas
GDPR-focused partner community showcase for Apache Ranger and Apache AtlasDataWorks Summit
 
Presentacin webinar move_up_to_power8_with_scale_out_servers_final
Presentacin webinar move_up_to_power8_with_scale_out_servers_finalPresentacin webinar move_up_to_power8_with_scale_out_servers_final
Presentacin webinar move_up_to_power8_with_scale_out_servers_finalDiego Alberto Tamayo
 
Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...
Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...
Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...DataWorks Summit
 
Verizon Centralizes Data into a Data Lake in Real Time for Analytics
Verizon Centralizes Data into a Data Lake in Real Time for AnalyticsVerizon Centralizes Data into a Data Lake in Real Time for Analytics
Verizon Centralizes Data into a Data Lake in Real Time for AnalyticsDataWorks Summit
 
High Performance Spatial-Temporal Trajectory Analysis with Spark
High Performance Spatial-Temporal Trajectory Analysis with Spark High Performance Spatial-Temporal Trajectory Analysis with Spark
High Performance Spatial-Temporal Trajectory Analysis with Spark DataWorks Summit/Hadoop Summit
 

What's hot (20)

Solving Performance Problems on Hadoop
Solving Performance Problems on HadoopSolving Performance Problems on Hadoop
Solving Performance Problems on Hadoop
 
Data Offload for the Chief Data Officer – how to move data onto Hadoop withou...
Data Offload for the Chief Data Officer – how to move data onto Hadoop withou...Data Offload for the Chief Data Officer – how to move data onto Hadoop withou...
Data Offload for the Chief Data Officer – how to move data onto Hadoop withou...
 
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
 
Big Data at Geisinger Health System: Big Wins in a Short Time
Big Data at Geisinger Health System: Big Wins in a Short TimeBig Data at Geisinger Health System: Big Wins in a Short Time
Big Data at Geisinger Health System: Big Wins in a Short Time
 
Continuous Data Ingestion pipeline for the Enterprise
Continuous Data Ingestion pipeline for the EnterpriseContinuous Data Ingestion pipeline for the Enterprise
Continuous Data Ingestion pipeline for the Enterprise
 
HDFS tiered storage: mounting object stores in HDFS
HDFS tiered storage: mounting object stores in HDFSHDFS tiered storage: mounting object stores in HDFS
HDFS tiered storage: mounting object stores in HDFS
 
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...
 
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...

Implementing PCI DSS Controls for Hadoop at UK's Largest Payment Processor

  • 1. Not just a necessary evil, it’s good for business: Implementing PCI DSS controls for the Hadoop ecosystem at the UK’s largest payment processor. David Walker/Worldpay & Srikanth Venkat/Hortonworks. DataWorks Summit – Berlin, 17-19 April 2018. © Worldpay 2018. All rights reserved.
  • 2. Session Synopsis & Your Speakers. David has over 20 years’ technical leadership expertise and has led the development and management of complex BI solutions, supporting technical architectures for a wide range of organisations, from SME start-ups to large enterprises. In his role at Worldpay, David specialises in developing and delivering the Enterprise Data Platform, a multi-tenant, highly secure Hadoop platform for decision engines, analytics and reporting, using his experience and knowledge in technical architecture, data modelling, ETL design, data quality, and metadata management. A key aspect of David’s role also involves acting as the lynchpin between Worldpay’s commercial and technical business leaders by regularly engaging at the executive level. David also manages cross-cultural teams in the analysis of technical infrastructures and the delivery of innovative and successful change programmes.
Srikanth Venkat is currently responsible for the Security & Governance portfolio of products at Hortonworks, which includes Apache Knox, Apache Ranger, Apache Atlas, platform-wide security, and Hortonworks DataPlane Service. Prior to Hortonworks, Srikanth held multiple roles in the areas of cloud services, marketplaces, security, and business applications. His experience includes leadership across Product Management, Strategy and Operations, and Technical Architecture, in organizations ranging from startups to global companies including Telefonica, Salesforce.com, Cisco-Webex, Proofpoint, Dataguise, Trilogy Software, and Hewlett-Packard. Srikanth holds a PhD in Engineering with a focus on Artificial Intelligence from the University of Pittsburgh, an MBA in General Management from Indiana University, and a Master’s in Global Management from Thunderbird School of Global Management. Srikanth is a Data Sciences & Machine Learning hobbyist and enjoys tinkering with Big Data technologies.
For firms in the financial industry, especially regulated organizations such as credit card processors and banks, PCI DSS compliance has become a business and operational necessity. Although the blueprint of a PCI-compliant architecture varies from organization to organization, a mixture of modern Hadoop-based data lakes and legacy systems is a common theme. In this talk, we will discuss recent updates to PCI DSS and how significant portions of the PCI DSS compliance controls can be achieved using the open source Hadoop security stack and technologies for the Hadoop ecosystem. We will provide a broad overview, from a practitioner’s perspective, of implementing key aspects of the PCI DSS standards at Worldpay, such as encryption management, data protection with anonymization, separation of duties, and deployment considerations for securing Hadoop clusters at the network layer. The talk will provide patterns and practices that map current Hadoop security capabilities to the security controls that a PCI-compliant environment requires.
  • 3. © Worldpay 2017. All rights reserved. Worldpay In (Big) Numbers [the figures were images and are missing from this transcript]: transactions daily, and the average per second; number of merchants; number of payment methods and currencies; number of countries; percentage of all non-cash transactions processed in the UK. Channels: In Store, Online, Mobile
  • 4. Data Security & Regulatory Compliance are both in the news … but in reality they are two sides of the same coin: Payment Card Industry Data Security Standard*, General Data Protection Regulation, Payment Services Directive 2*, Data Protection Act(s). * Other industries have their own standards but the principle is the same
  • 5. So why is this good for business? • In a digital world the success of our business (regardless of industry) will be significantly defined by our organisation’s ability to handle and use data responsibly throughout our business. We must protect our customers and business partners from both data misuse and fraud. In short, we need to be trusted by our customers in the way that we handle their information • Legal & regulatory standards are being set by governments, regulators and industry bodies in an attempt to set a minimum sufficient standard to protect data subjects
  • 6. How do you develop a secure platform? • Compliance is not just paying lip service to security: the auditing for PCI DSS is rigorous and we have to continuously review and upgrade our systems to maintain compliance • Audit of and compliance with these standards is a way of demonstrating that we have taken appropriate steps to protect our data assets, and in the worst case scenario it is also a way of mitigating the financial and reputational impact of an incident. Either start with a blank piece of paper … or adopt and commit to a security framework
  • 7. Today’s Hadoop Environments Are The Big Targets Within Your Organisation • If you are building, or have built, a large successful Hadoop deployment that contains a large amount of your business data, then you have just created a massive target within your organisation • PCI DSS only certifies a project or implementation • No single product can deliver a PCI DSS compliant solution • As the implementers of a system we are looking to get the greatest amount of compliance by deploying the smallest number of products and tools to do the job
  • 8. First Some Historical Context • The Worldpay journey to build a big data platform started in April 2015 • We started with Hortonworks 2.2 • The Hortonworks Data Platform Security document did not exist • Apache Ranger was new, Apache Atlas was a concept, Hortonworks DataPlane wasn’t even a twinkle • Today we are on 2.6.4 and have applied nearly every release in between • Across the entire software product stack we did 298 patch sets and upgrades in 2017 • Besides the core platform, paying for support and deploying Hortonworks SmartSense significantly improves your security profile • We are also interested in: • https://workbench.cisecurity.org – Center for Internet Security • http://owasp.org – Open Web Application Security Project
  • 9. Even your fish tank is a risk to your data platform(s) • PCI is about putting in place the: Security; Logging of activity; Audit of that security; Separation of duties; Patching Cycles; etc. • And then maintaining them • We are just finishing our 2018 PCI cycle and will start planning the 2019 PCI cycle in September
  • 10. The PCI DSS 3.2 Requirements (Goals: PCI DSS 3.2 Requirement)
      Build and Maintain a Secure Network and Systems: 1. Install and maintain a firewall configuration to protect cardholder data; 2. Do not use vendor-supplied defaults for system passwords and other security parameters
      Protect Cardholder Data: 3. Protect stored cardholder data; 4. Encrypt transmission of cardholder data across open, public networks
      Maintain a Vulnerability Management Program: 5. Protect all systems against malware and regularly update antivirus software or programs; 6. Develop and maintain secure systems and applications
      Implement Strong Access Control Measures: 7. Restrict access to cardholder data by business need to know; 8. Identify and authenticate access to system components; 9. Restrict physical access to cardholder data
      Regularly Monitor and Test Networks: 10. Track and monitor all access to network resources and cardholder data; 11. Regularly test security systems and processes
      Maintain an Information Security Policy: 12. Maintain a policy that addresses information security for all personnel
  • 11. Addressing The Requirements. 1. Install and maintain a firewall configuration to protect cardholder data • The Worldpay network has defence in depth, with much more than just firewalls, including virtualised jumpboxes and two-factor authentication. Our network traffic is monitored and logged • Apache Knox is used to supplement perimeter security. 2. Do not use vendor-supplied defaults for system passwords and other security parameters • Apache Ambari allows us to install, configure and manage the system passwords, connection ports, certificates, etc. • Apache Ambari is used to help implement Kerberos • Keys stored in HSMs. 3. Protect stored cardholder data • Hardware Encrypted Disks • Apache Atlas is used to ‘tag’ columns as PCI or PII data • Apache Atlas is used to mask data and/or remove data at run time • Apache Ranger is used to restrict access to the data based on roles (RBAC) • Apache Ranger is used to restrict access to the data based on attributes (ABAC) • Apache Ranger is integrated with our LDAP/Active Directory • Apache HDFS Transparent Data Encryption enabled • HDFS ACLs enabled • Micro Focus SecureData (formerly HP Voltage) is used to either Tokenise or Encrypt sensitive (PCI & PII) data • Vormetric Disk Protection enabled
  • 12. Addressing The Requirements. 4. Encrypt transmission of cardholder data across open, public networks • All of our components use TLS 1.2 to encrypt network traffic; this has to be supported by every Hortonworks component to be effective. 5. Protect all systems against malware and regularly update antivirus software or programs • Worldpay runs on Linux rather than Windows but we do still have anti-virus • Worldpay implements File Integrity Management that checks critical files are not being modified • Regular patching of the entire software stack, including the OS and all software packages, as patches and releases come out • Worldpay limits what software can be downloaded and installed on any servers • Hortonworks has addressed, or is addressing, specific vulnerabilities we have found • Use Hortonworks SmartSense to ensure optimal configurations. 6. Develop and maintain secure systems and applications • Worldpay peer reviews our code before deploying • Worldpay developed code has to be scanned with tools like Veracode • Worldpay develops to OWASP (Open Web Application Security Project) standards for interfaces
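The File Integrity Management control mentioned under requirement 5 can be illustrated with a minimal Python sketch. This is not Worldpay's tooling (the slides do not name the product used); it only shows the general idea of recording a hash baseline for critical files and flagging any later change. The monitored file name and all function names are hypothetical.

```python
import hashlib
import os
import tempfile


def file_digest(path):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_baseline(paths):
    """Record a digest for every monitored file."""
    return {path: file_digest(path) for path in paths}


def find_changes(baseline):
    """Compare the current state with the baseline; return {path: status}."""
    changes = {}
    for path, expected in baseline.items():
        if not os.path.exists(path):
            changes[path] = "missing"
        elif file_digest(path) != expected:
            changes[path] = "modified"
    return changes


def demo():
    """Create a file, baseline it, tamper with it, and report both checks."""
    with tempfile.TemporaryDirectory() as workdir:
        # Hypothetical monitored file; a real deployment would list
        # configuration files, binaries and keytabs.
        config = os.path.join(workdir, "core-site.xml")
        with open(config, "w") as handle:
            handle.write("<configuration/>\n")
        baseline = build_baseline([config])
        before = find_changes(baseline)
        with open(config, "a") as handle:
            handle.write("<!-- tampered -->\n")
        after = find_changes(baseline)
        return before, list(after.values())
```

In practice the baseline itself has to be stored somewhere the monitored hosts cannot modify, otherwise an attacker who changes a file can also change its recorded digest.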
  • 13. Addressing The Requirements. 7. Restrict access to cardholder data by business need to know • Apache Ranger is used to restrict access to the data based on roles (RBAC) • Apache Ranger is used to restrict access to the data based on attributes (ABAC). 8. Identify and authenticate access to system components • Kerberos enabled cluster • Apache Ranger is integrated with our LDAP/Active Directory • Apache Ranger implements the user -> group -> role -> access relationship. 9. Restrict physical access to cardholder data • Tightly restricted access to the data centres • No disks returned to the vendors on failure • Indirect server access via virtualised jumpboxes
  • 14. Addressing The Requirements. 10. Track and monitor all access to network resources and cardholder data • System access is logged via Apache Ranger to Apache Solr and made available to auditors • All other Hortonworks audit functions are also enabled. 11. Regularly test security systems and processes • Worldpay ‘pentests’ systems regularly (i.e. on installation, after major changes and annually) as part of the certification process • The EDP Governance team defines and audits policies relating to security (as well as other data management functions). 12. Maintain a policy that addresses information security for all personnel • Worldpay has a set of mandatory compliance training courses on PCI and other security issues that have to be renewed each year by all employees
  • 15. 15 © Worldpay 2018. All rights reserved.15 Our 2.6.4 Components that help us create a PCI compliant system today
  • 16. 16 Srikanth Venkat – Senior Director, Product Management Security & Governance in HDP: From a PCI-DSS Perspective
  • 17. 17 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Authentication & API Security: Apache Knox
  • 18. Apache Knox Overview. Proxying Services (HTTP proxying of services, UIs, REST APIs and Web Sockets): ★ Provide access to Hadoop via proxying of HTTP resources ★ Ecosystem APIs and UIs + Hadoop oriented dispatching for Kerberos + doAs (impersonation) etc. Components shown in the diagram: Hive, Ambari, HBase, WebHCat, WebHDFS, YARN, YARN RM, Ranger, Zeppelin, Oozie, Phoenix, Gremlin, Atlas, Druid, JDBC/ODBC. Authentication Services (WebSSO, authentication and federation providers, KnoxSSO/Token; SAML, OAuth, LDAP/AD, SPNEGO, Header Based): ★ REST API access, WebSSO flow for UIs ★ LDAP/AD, Header based PreAuth, and Token Exchange ★ Kerberos, SAML, OAuth. Client DSL/SDK Services (Groovy based DSL, KnoxShell SDK, Token Sessions, REST API Classes): ★ Scripting through DSL ★ Using Knox Shell classes directly as SDK. Diagram legend: HDP Certified as of HDP 2.6.4; Community supported
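To make the proxying idea concrete, the Python sketch below builds the URL a client would call when WebHDFS is reached through a Knox gateway rather than directly. It assumes the commonly documented Knox URL layout of `https://host:port/gateway/<topology>/<service>`, with 8443, `gateway` and `default` as typical defaults; the host name and HDFS path are hypothetical, and a real deployment may use different values for all of them. No network call is made.

```python
from urllib.parse import quote, urlencode


def knox_webhdfs_url(gateway_host, hdfs_path, operation,
                     gateway_port=8443, gateway_path="gateway",
                     topology="default"):
    """Build a WebHDFS URL that goes through a Knox gateway.

    The client only ever sees the gateway address; Knox authenticates the
    caller and dispatches the request to the cluster on the caller's behalf.
    """
    base = f"https://{gateway_host}:{gateway_port}/{gateway_path}/{topology}"
    query = urlencode({"op": operation})
    return f"{base}/webhdfs/v1{quote(hdfs_path)}?{query}"


# Hypothetical gateway host and HDFS directory.
listing_url = knox_webhdfs_url("knox.example.com", "/data/landing", "LISTSTATUS")
```

Because every REST call funnels through one address, firewall rules can expose only the gateway port and keep the cluster's own service ports closed to clients, which is how Knox supplements perimeter security.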
  • 19. 19 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Authentication: Kerberos
  • 20. Background: Kerberos ⬢ Strongly authenticating and establishing a user’s identity is the basis for secure access in Hadoop ⬢ Users need to be able to reliably “identify” themselves and have their identity propagated throughout the Hadoop cluster ⬢ Design & implementation of Kerberos security in native Apache Hadoop was delivered by Hortonworks co-founder Owen O’Malley! ⬢ Why Kerberos? ⬢ Establishes identity for clients, hosts and services ⬢ Prevents impersonation; passwords are never sent over the wire ⬢ Integrates with enterprise identity management tools such as LDAP & Active Directory ⬢ More granular auditing of data access/job execution
  • 21. Background: HDP + Kerberos. [Diagram: an HDP cluster whose service components (A, B, C, D, … X) each hold a keytab, alongside a KDC.] Kerberos is used to secure the Components in the cluster. Kerberos identities are managed via “keytabs” on the Component hosts. Principals for the cluster are managed in the KDC.
  • 22. Automated Kerberos Setup with Ambari • Wizard driven and automated Kerberos support (Kerberos principal creation for service accounts, keytab generation and distribution for appropriate hosts, permissions, etc.) • Removes cumbersome, time consuming and error prone administration of Kerberos • Works with existing Kerberos infrastructure, including Active Directory, to automate common tasks, removing the burden from the operator: Add/Delete Host; Add Service; Add/Delete Component; Regenerate Keytabs; Disable Kerberos
  • 23. Kerberos + Active Directory. [Diagram: a client authenticates against the AD / LDAP user store, which has a cross realm trust with the KDC of the Hadoop cluster.] Users: smith@EXAMPLE.COM. Hosts: host1@HADOOP.EXAMPLE.COM. Services: hdfs/host1@HADOOP.EXAMPLE.COM. Use existing directory tools to manage users. Use Kerberos tools to manage host + service principals.
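The principal names on this slide follow the usual `primary/instance@REALM` shape, and the realm is what tells the cluster whether an identity is local or arrives over the cross realm trust. A small Python sketch of that reading follows; the `Principal` type and helper names are hypothetical and are not part of any Kerberos library.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str             # user or service name, e.g. "smith" or "hdfs"
    instance: Optional[str]  # host part of a service principal, if any
    realm: str               # Kerberos realm, e.g. "HADOOP.EXAMPLE.COM"


def parse_principal(text):
    """Split 'primary[/instance]@REALM' into its parts."""
    name, separator, realm = text.rpartition("@")
    if not separator or not name or not realm:
        raise ValueError(f"not a fully qualified principal: {text!r}")
    primary, _, instance = name.partition("/")
    return Principal(primary, instance or None, realm)


def arrives_via_trust(principal, cluster_realm):
    """True when the principal belongs to a realm other than the cluster's."""
    return principal.realm != cluster_realm


user = parse_principal("smith@EXAMPLE.COM")
service = parse_principal("hdfs/host1@HADOOP.EXAMPLE.COM")
```

With the slide's example names, `smith` lives in the directory realm and is accepted by the cluster only because of the trust, while `hdfs/host1` is a service principal managed in the cluster's own KDC.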
  • 24. 24 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Authorization & Audits: Apache Ranger
  • 25. Apache Ranger. Authorization: • Centralized platform to define, administer and manage security policies consistently across Hadoop components • HDFS, Hive, HBase, YARN, Kafka, Solr, Storm, Knox, NiFi, Atlas • Extensible Architecture • Custom policy conditions, user context enrichers • Easy to add new component types for authorization. Auditing: • Central audit location for all access requests • Support multiple destination sources (HDFS, Solr, etc.) • Real-time visual query interface. Ranger KMS: • Store and manage encryption keys • Support HDFS Transparent Data Encryption • Integration with HSM • SafeNet Luna
  • 26. Ranger – ABAC Model • Combination of the subject, action, resource, and environment • Uses descriptive attributes: AD group, Apache Atlas-based tags or classifications, geo-location, etc. • Ranger approach is consistent with NIST 800-162 • Avoids role proliferation and manageability issues
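The four ABAC inputs named on this slide (subject, action, resource, environment) can be sketched as a tiny policy evaluator in Python. This is an illustration of the model only, not Ranger's policy engine or API: the class names, the attribute choices (AD group, Atlas-style tag, location) and the "deny overrides allow, otherwise deny by default" rule are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    user_groups: frozenset    # subject attribute, e.g. AD groups
    action: str               # e.g. "select"
    resource_tags: frozenset  # resource attribute, e.g. classification tags
    location: str             # environment attribute, e.g. geo-location


@dataclass(frozen=True)
class Policy:
    name: str
    effect: str               # "allow" or "deny"
    groups: frozenset
    actions: frozenset
    tags: frozenset
    locations: frozenset

    def matches(self, request):
        """A policy applies only when all four attribute checks pass."""
        return (bool(self.groups & request.user_groups)
                and request.action in self.actions
                and bool(self.tags & request.resource_tags)
                and request.location in self.locations)


def decide(policies, request):
    """Deny overrides allow; no matching policy means deny."""
    matched = [policy for policy in policies if policy.matches(request)]
    if any(policy.effect == "deny" for policy in matched):
        return "deny"
    if any(policy.effect == "allow" for policy in matched):
        return "allow"
    return "deny"


POLICIES = [
    Policy("analysts-read-pci", "allow", frozenset({"analyst"}),
           frozenset({"select"}), frozenset({"PCI"}), frozenset({"US", "UK"})),
    Policy("contractors-no-pci", "deny", frozenset({"contractor"}),
           frozenset({"select", "update"}), frozenset({"PCI"}),
           frozenset({"US", "UK", "EU"})),
]
```

Note that a single tag-based policy covers every resource carrying the `PCI` tag, which is how attribute-based rules avoid creating one role per table or per team.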
  • 27. Dynamic Row Filtering & Column Masking: Apache Ranger with Apache Hive
      Source table ww_customers:
      Country | National ID | CC No | DOB | MRN | Name | Policy ID
      US | 232323233 | 4539067047629850 | 9/12/1969 | 8233054331 | John Doe | nj23j424
      US | 333287465 | 5391304868205600 | 8/13/1979 | 3736885376 | Jane Doe | cadsd984
      Germany | T22000129 | 4532786256545550 | 3/5/1963 | 876452830A | Ernie Schwarz | KK-2345909
      Ranger Policy Enforcement: Query Rewritten based on Dynamic Ranger Policies: Filter rows by region & apply relevant column masking
      User 1: Joe, Location: US, Group: Analyst. Original Query: SELECT country, nationalid, ccnumber, mrn, name FROM ww_customers
      Users from US Analyst group see data for US persons with CC and National ID (SSN) as masked values and MRN is nullified:
      Country | National ID | CC No | MRN | Name
      US | xxxxx3233 | 4539 xxxx xxxx xxxx | null | John Doe
      US | xxxxx7465 | 5391 xxxx xxxx xxxx | null | Jane Doe
      User 2: Ivanna, Location: EU, Group: HR. Original Query: SELECT country, nationalid, name, mrn FROM ww_customers
      EU HR Policy Admins can see unmasked but are restricted by row filtering policies to see data for EU persons only:
      Country | National ID | Name | MRN
      Germany | T22000129 | Ernie Schwarz | 876452830A
      User groups shown: Analysts, HR, Marketing
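The effect of the two policies on this slide can be reproduced with a short Python sketch over the same sample rows. It simulates the outcome only: in the real system Ranger rewrites the Hive query, whereas here the filtering and masking are applied to in-memory rows. The policy dictionary layout and function names are hypothetical.

```python
ROWS = [
    {"country": "US", "nationalid": "232323233", "ccnumber": "4539067047629850",
     "mrn": "8233054331", "name": "John Doe"},
    {"country": "US", "nationalid": "333287465", "ccnumber": "5391304868205600",
     "mrn": "3736885376", "name": "Jane Doe"},
    {"country": "Germany", "nationalid": "T22000129", "ccnumber": "4532786256545550",
     "mrn": "876452830A", "name": "Ernie Schwarz"},
]

EU_COUNTRIES = {"Germany"}  # only the EU country present in the sample data


def show_last4(value):
    """Mask everything except the last four characters."""
    return "x" * (len(value) - 4) + value[-4:]


def show_first4_card(value):
    """Keep the first four digits of a 16-digit card number."""
    return value[:4] + " xxxx" * 3


def nullify(value):
    """Replace the value with null."""
    return None


POLICIES = {
    "us_analyst": {
        "row_filter": lambda row: row["country"] == "US",
        "masks": {"nationalid": show_last4, "ccnumber": show_first4_card,
                  "mrn": nullify},
    },
    "eu_hr": {
        "row_filter": lambda row: row["country"] in EU_COUNTRIES,
        "masks": {},  # HR sees unmasked values, but only for EU persons
    },
}


def run_query(rows, columns, policy_name):
    """Apply the row filter first, then mask each selected column."""
    policy = POLICIES[policy_name]
    result = []
    for row in rows:
        if not policy["row_filter"](row):
            continue
        masked = {}
        for column in columns:
            mask = policy["masks"].get(column)
            masked[column] = mask(row[column]) if mask else row[column]
        result.append(masked)
    return result


analyst_view = run_query(
    ROWS, ["country", "nationalid", "ccnumber", "mrn", "name"], "us_analyst")
hr_view = run_query(ROWS, ["country", "nationalid", "name", "mrn"], "eu_hr")
```

Both users issue an ordinary SELECT against the same table; the difference in what they see comes entirely from the policy attached to their group and location.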
  • 28. 28 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Data Protection
  • 29. Data Protection: Data Protection in Hadoop must be applied at three different layers in Apache Hadoop • Storage: encrypt data while it is at rest. Transparent Data Encryption in HDFS, Ranger KMS + HSM, Partner Products (HPE Voltage, Protegrity, Dataguise) • Transmission: encrypt data as it is in motion. Native Apache Hadoop 2.0 provides wire encryption • Upon Access: apply restrictions when accessed. Ranger (Dynamic Column Masking + Row Filtering), Partner Masking + Encryption
  • 30. Data Protection – Layered Approach • Encryption of Data at Rest – OS Level Encryption (LUKS) – Certified Partners for volume encryption (e.g. Vormetric (Thales), Protegrity, HPE Voltage Security) – HDFS TDE file/folder level encryption with keys managed by Ranger KMS, External HSM integration • Encryption of Data on the Wire – All wire protocols can be encrypted by the HDP platform – Wire-level encryption enhancements (SSL) • Granular Data Protection – Dynamic Masking + Row Filtering for Hive with Ranger – Classification Based Security with Ranger + Atlas – Element level encryption/masking from certified partners (HPE Voltage, Protegrity)
  • 31. Transparent Data Encryption in HDFS with Ranger KMS. [Diagram: an HDFS client, the NameNode (NN) and three DataNodes (DN) holding blocks A, B, C, D, with Ranger KMS backed by a SafeNet Luna HSM.] Benefits • Selective encryption of relevant files/folders • Prevent rogue admin access to sensitive data • Fine grained access controls • Transparent to end application w/o changes • Ranger KMS integrated to external HSM (SafeNet Luna) adding to reliability/security of KMS
  • 32. HSM integration with Ranger KMS • The HSM client needs to be set up on the KMS nodes • When installing Ranger KMS, HSM parameters can be specified • If KMS is already installed with a DB, the master key can be migrated to the HSM • All other TDE functionality remains unchanged
  • 33. HSM integration with Ranger KMS: only the master key will be in the HSM; other keys are stored in the Ranger KMS DB
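The key hierarchy described on the last three slides can be sketched in Python as envelope encryption: a master key that never leaves the HSM wraps the zone keys held in the KMS database, and a zone key in turn wraps the per-file data key. The slides state only the first layer (master key in HSM, other keys in the KMS DB); the per-file data key layer follows the general HDFS Transparent Data Encryption design. All class names are hypothetical, and the cipher is a toy XOR keystream built from SHA-256 that is NOT secure and stands in for real AES purely so the example runs with the standard library.

```python
import hashlib
import os


def toy_cipher(key, data):
    """XOR data with a SHA-256 derived keystream (symmetric, NOT secure)."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class ToyHsm:
    """Holds the master key; callers can wrap and unwrap but never read it."""

    def __init__(self):
        self._master_key = os.urandom(32)

    def wrap(self, key):
        return toy_cipher(self._master_key, key)

    def unwrap(self, wrapped):
        return toy_cipher(self._master_key, wrapped)


class ToyKms:
    """Stores zone keys in its database only in wrapped form."""

    def __init__(self, hsm):
        self._hsm = hsm
        self.database = {}  # zone name -> zone key wrapped by the master key

    def create_zone_key(self, zone):
        self.database[zone] = self._hsm.wrap(os.urandom(32))

    def generate_encrypted_data_key(self, zone):
        zone_key = self._hsm.unwrap(self.database[zone])
        return toy_cipher(zone_key, os.urandom(32))

    def decrypt_data_key(self, zone, encrypted_data_key):
        zone_key = self._hsm.unwrap(self.database[zone])
        return toy_cipher(zone_key, encrypted_data_key)


def write_file(kms, zone, plaintext):
    """Return what would be stored: the encrypted data key and the ciphertext."""
    encrypted_data_key = kms.generate_encrypted_data_key(zone)
    data_key = kms.decrypt_data_key(zone, encrypted_data_key)
    return encrypted_data_key, toy_cipher(data_key, plaintext)


def read_file(kms, zone, encrypted_data_key, ciphertext):
    data_key = kms.decrypt_data_key(zone, encrypted_data_key)
    return toy_cipher(data_key, ciphertext)
```

The point of the layering is that a copy of the KMS database alone is useless without the HSM, and a copy of the stored file alone is useless without the KMS, which is what limits the damage a rogue administrator of any single system can do.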
  • 34. 34 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Governance with Apache Atlas
  • 35. Apache Atlas Vision: Open Metadata & Governance Services
      Vision: Metadata-driven foundational governance services for enterprise data ecosystem • Open frameworks and APIs • Agile and secure collaboration around data and advanced analytics • Reduce operational costs while extracting economic value of data
      [Diagram: Atlas metadata at the centre, connected to structured sources, traditional RDBMS metadata, MPP appliances, Kafka, Storm, Sqoop, Hive, Falcon, Ranger, custom sources and partners.]
      Comprehensive Enterprise Data Catalog • Lists all of your data, where it is located, its origin (lineage), owner, structure, meaning, classification and quality • Integrate both on-premise and cloud platforms to provide enterprise wide view
      Open Enterprise Data Connectors • Interoperable connector framework to connect to your data catalog out of the box with many vendor technologies • No expensive population of proprietary siloed metadata repositories
      Dynamic Metadata Discovery • Metadata is added automatically to the catalog as new data is created or data is updated • Extensible discovery processes that characterize and classify the data
      Enabling Collaboration & Workflows • Subject matter experts locate the data they need quickly and efficiently, share their knowledge about the data and its usage to help others • Interested parties and processes are notified automatically
      Automated Governance Processes • Metadata-driven access control • Auditing, metering, and monitoring • Quality control and exception management • Rights (entitlement) management
      Predefined standards for glossaries, data schemas, rules and regulations
  • 36. HDP – Security & Governance. Industry First: Dynamic Tag-based Security Policies. [Diagram: Atlas tracks metadata and lineage; the Atlas Metastore holds tags, assets and entities for the entities in the data lake (streams, pipelines, feeds, Hive tables, HDFS files, HBase tables). Ranger manages access policies and audit logs; an Atlas client subscribes to a topic and gets metadata updates, which feed Ranger's PDP and resource cache. Policies shown: Classification, Prohibition, Time, Location.]
  • 38. 38 Walk Through Items ⬢ Ranger ⬢ ABAC Fine Grained Security ⬢ Resource/Masking/Row Filtering Policies ⬢ Audits – self audits/access/plugin audits, logins ⬢ User/Group/Roles in Ranger ⬢ Atlas ⬢ Search and tag assets ⬢ Tag Attributes ⬢ Tag based policies in Ranger
  • 39. Worldpay – Hortonworks Partnership • Worldpay has partnered closely with Hortonworks to improve security and governance features across HDP and to certify its internal platforms for PCI DSS • Collaboration has resulted in community enhancements to Apache Knox, Apache Ranger, and Apache Atlas, and to wire encryption & TDE • Ongoing collaboration on HDP platform security fixes from external audits • Key learnings incorporated into Hortonworks DataPlane Service – Data Steward Studio (DSS)
  • 40. 40 © Worldpay 2018. All rights reserved. Leaders in Modern Money Innovating In Secure Modern Data Analytics Thank You David M Walker (david.walker@worldpay.com) Enterprise Data Platform Programme Director, Worldpay Srikanth Venkat Senior Director, Product Management, Hortonworks

Editor's Notes

  1. This is designed to impress - I use lots of junk examples here:
     * These slides were written before our merger with Vantiv - we were the biggest in the UK, they were the biggest in the US - now we are bringing this all together in the biggest payment company in the world
     * If you've used a card 3 times in the UK - we have your details, your name, card number and expiry date. If one of those transactions was on the web we also have your address, email and probably your mobile too. If it was for a flight, we know where you are going
     * So this data is sensitive - it has to be secure, which is why our enterprise data platform focuses so much on security - we do security at every level [explain each one]: data at rest, data in motion, authentication, access control and tokenisation
     * We are so important to the digital economy that we are on the UK Gov't risk register
     * Payments aren't just card transactions - we do mobile payments in Kenya, sit behind PayPal and have an alternative platform that looks at Bitcoin etc.
     * Our transactions peak out at >1500 per day - do you [the audience] know which is the busiest day? - [wait for guesses - Christmas, Black Friday, etc.] then do the China Singles Day stats - see Wikipedia for numbers
     * Without Worldpay you probably can't buy me a beer in the bar later
     * All the stats in this presentation refer to Worldpay pre-merger - we are bigger (and better) now
  2. Proxying services: Only proxy HTTP resources at the moment; let us know if there are requests for other protocols. Supports UIs, REST APIs, JDBC/ODBC, and WebSockets. We can configure Knox to impersonate the user when making requests to the cluster; that’s called the trusted proxy pattern. Not all services use Knox as a trusted proxy. Auth services: OAuth will be supported in the future subject to demand. HDP Certified = [colour swatch not preserved in the transcript]. We have Groovy based DSL scripting access to underlying Hadoop resources via the SDK (WebHDFS, WebHCat, Oozie, HBase), so most likely we should colour code that too. YARN RM = Resource Manager. HBase was tested, then they ran into some issues, so it’s now tracked for Atlantic, https://hortonworks.jira.com/browse/BUG-84047
  3. For example, even if you had SSL to the web UIs, an account with sudo privileges would still be a risk: this sudo user would be able to sniff and decrypt network traffic (domain account passwords). The damage would also be widespread, since the sudo user would be able to get the domain credentials of any user who logs on to the cluster. With Kerberos this concern does not arise, since the user’s password is not sent to the cluster. If the sudo user becomes compromised, he can only get a Kerberos ticket, which is for talking to a particular service and also has a limited lifetime. We definitely need Kerberos based authentication for Ambari. Strong Authentication = Password never sent over the wire
  4. HSMs are trust anchors: appliances or cards with a dedicated crypto processor that generates and stores crypto keys securely. They are validated by third-party security certifications (FIPS 140-2, CC EAL, etc.), simplify audits, ease compliance, and provide high performance