A presentation from the Big Data Week conference in 2016 that looks at how Worldpay, a major payments provider, deployed a secure Hadoop cluster in order to meet business requirements.
Data Works Berlin 2018 - Worldpay - PCI Compliance (David Walker)
A presentation from the Data Works conference in 2018 that looks at how Worldpay, a major payments provider, deployed a secure Hadoop cluster in order to meet business requirements and in the process became one of the few fully certified PCI-compliant clusters in the world.
Data Works Summit Munich 2017 - Worldpay - Multi Tenancy Clusters (David Walker)
A presentation from the Data Works Summit conference in 2017 that looks at how Worldpay, a major payments provider, deployed a secure Hadoop cluster to support multiple business cases in a multi-tenancy cluster.
This document discusses how data is structured and modeled in databases and data warehouses. It introduces concepts like left-to-right entity relationship diagrams and data model depth. It examines how characteristics like model depth, data volumes, and complexity affect areas like reporting structures, data warehouse design, ETL processes, data quality, and query performance. Understanding these characteristics helps reduce their negative impacts and lower project costs.
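Where the summary mentions "data model depth", one workable reading is the longest chain of foreign-key dependencies in the entity-relationship graph. A minimal sketch of that reading in Python – the entity names and the depth-as-longest-path interpretation are illustrative assumptions, not taken from the document:

```python
# Illustrative sketch: "model depth" read as the longest chain of
# foreign-key dependencies in an entity-relationship graph.
# The entities below are hypothetical examples, not from the paper.
from functools import lru_cache

def model_depth(fk_edges):
    """Longest dependency chain in a DAG of (child -> parent) FK edges."""
    parents = {}
    for child, parent in fk_edges:
        parents.setdefault(child, []).append(parent)

    @lru_cache(maxsize=None)
    def depth(entity):
        return 1 + max((depth(p) for p in parents.get(entity, [])), default=0)

    return max(depth(e) for e in {e for edge in fk_edges for e in edge})

edges = [
    ("order_line", "order"), ("order", "customer"),
    ("order", "store"), ("store", "region"),
]
print(model_depth(edges))  # 4: order_line -> order -> store -> region
```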
Data warehousing change in a challenging environment (David Walker)
This white paper discusses the challenges of managing change in a data warehousing environment. It describes a typical data warehouse architecture, with source systems feeding data into a data warehouse and then into data marts or cubes, and outlines the common processes involved, such as development, operations and data quality processes. The paper then discusses two major challenges. The first is configuration and change management, as frequent changes to source systems, applications and technologies impact the data warehouse. The second is managing and improving data quality, as issues in source systems are often replicated in the data warehouse.
Openworld04 - Information Delivery - The Change In Data Management At Network... (David Walker)
Network Rail implemented a new information delivery strategy using Oracle technologies like the Balanced Scorecard, Discoverer, and Portal. They developed executive scorecards quickly for mandated KPIs and then additional scorecards. Data comes from various sources into staging areas and warehouses accessible with Discoverer. A portal provides integrated access. Applications replace Excel/Access and improve data quality. The approach involves a small agile team and spreading solutions across the business.
Storage Characteristics Of Call Data Records In Column Store Databases (David Walker)
This document summarizes the storage characteristics of call data records (CDRs) in column store databases. It discusses what CDRs are, what a column store database is, and how efficient column stores are for storing CDR and similar machine-generated data. It provides details on the structure and content of sample CDR data, how the data was loaded into a Sybase IQ column store database for testing purposes, and the results in terms of storage characteristics and what would be needed for a production environment.
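The efficiency claim rests on column stores encoding each column separately: low-cardinality columns such as call type or cell ID compress down to a few bits per value. A rough, synthetic illustration of that per-column effect – not the Sybase IQ methodology used in the study:

```python
# Back-of-envelope estimate of dictionary-encoding benefit per column:
# low-cardinality columns (call type, cell id) need only a few bits
# per value, which is why column stores shine on machine-generated
# CDRs. Synthetic data; the real study loaded sample CDRs into Sybase IQ.
import math, random

random.seed(42)
rows = 100_000
cdrs = {
    "caller":    [f"+44{random.randint(10**8, 10**9 - 1)}" for _ in range(rows)],
    "call_type": [random.choice(["VOICE", "SMS", "DATA"]) for _ in range(rows)],
    "cell_id":   [random.randint(1, 500) for _ in range(rows)],
}

for col, values in cdrs.items():
    distinct = len(set(values))
    bits = max(1, math.ceil(math.log2(distinct)))
    print(f"{col:10s} distinct={distinct:6d} -> ~{bits} bits/value dict-encoded")
```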
IOUG93 - Technical Architecture for the Data Warehouse - Presentation (David Walker)
The document outlines a technical architecture for implementing a data warehouse. It discusses business analysis, database schema design, project management, data acquisition, building a transaction repository, data aggregation, data marts, metadata and security, middleware and presentation layers. The goal is to help users find the information they need from the data warehouse. Contact information is provided at the end.
The document discusses six governance processes for data and business intelligence: data lifecycle, data models, data quality, data security, data warehousing, and metadata. For each process, it provides an overview of why governance is important in that area, and what the governance process will do to manage issues and ensure requirements are met. The governance processes aim to balance various factors, control changes, and provide oversight and accountability for data management.
Oracle BI06 From Volume To Value - Presentation (David Walker)
The document discusses challenges with a European mobile telco's data warehouse that contains over 150 billion call detail records. It takes too long to get answers from the data warehouse and it is underutilized. The document recommends establishing quick service teams, performing data profiling and cleansing, integrating the data warehouse into business processes, using business information portals, and RSS feeds to address engagement, user, and technical issues. This will help users get timely, accurate information and increase adoption of the data warehouse.
How Real Time Data Changes the Data Warehouse (Mark Madsen)
Surveys show a growing demand for more up-to-date data in our BI environments. Meeting these needs requires moving from a strict reliance on nightly batch-style ETL to other methods. What is often ignored is how this affects the data warehouse. This shift introduces new technology and methods, which means the warehouse must support new types of workloads. The talk covers:
• Methods and tools for processing up-to-date data
• New requirements for your data warehouse database or platform
• What to look for as you address these requirements
These are the slides from my talk at Data Day Texas 2016 (#ddtx16).
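One common way to move beyond nightly batch, in line with the topic list above, is micro-batching: poll the source on a short interval and apply only rows beyond a high-water mark. A minimal sketch with hypothetical table and column names – real pipelines add retries and idempotency:

```python
# Minimal micro-batch pattern: poll for rows past the last high-water
# mark and apply them, rather than reloading nightly. Table/column
# names are hypothetical.
import sqlite3, time

def micro_batch(source: sqlite3.Connection, target: sqlite3.Connection,
                interval_s: float = 5.0, cycles: int = 3) -> None:
    high_water = 0
    for _ in range(cycles):
        rows = source.execute(
            "SELECT id, payload FROM events WHERE id > ? ORDER BY id",
            (high_water,),
        ).fetchall()
        if rows:
            target.executemany("INSERT INTO events VALUES (?, ?)", rows)
            target.commit()
            high_water = rows[-1][0]   # advance the watermark
        time.sleep(interval_s)

src, tgt = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
for db in (src, tgt):
    db.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
src.executemany("INSERT INTO events VALUES (?, ?)", [(1, "a"), (2, "b")])
src.commit()
micro_batch(src, tgt, interval_s=0.01, cycles=2)
print(tgt.execute("SELECT COUNT(*) FROM events").fetchone()[0])  # 2
```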
The world of data warehousing has changed! With the advent of Big Data, Streaming Data, IoT, and The Cloud, what is a modern data management professional to do? It may seem to be a very different world with different concepts, terms, and techniques. Or is it? Lots of people still talk about having a data warehouse or several data marts across their organization. But what does that really mean today in 2016? How about the Corporate Information Factory (CIF), the Data Vault, an Operational Data Store (ODS), or just star schemas? Where do they fit now (or do they)? And now we have the Extended Data Warehouse (XDW) as well. How do all these things help us bring value and data-based decisions to our organizations? Where do Big Data and the Cloud fit? Is there a coherent architecture we can define? This talk will endeavor to cut through the hype and the buzzword bingo to help you figure out what part of this is helpful. I will discuss what I have seen in the real world (working and not working!) and a bit of where I think we are going and need to go in 2016 and beyond.
Are You Killing the Benefits of Your Data Lake? (Denodo)
Watch the full webinar on-demand here: https://goo.gl/RL1ZSa
Data lakes are centralized data repositories. Data needed by data scientists is physically copied to a data lake, which serves as a single storage environment. This way, data scientists can access all the data from a single entry point – a one-stop shop to get the right data. However, such an approach is not always feasible for all the data, and it limits the lake's use to data scientists alone, making it a single-purpose system.
So, what’s the solution?
A multi-purpose data lake allows a broader and deeper use of the data lake without minimizing the potential value for data science and without making it an inflexible environment.
Attend this session to learn:
• Disadvantages and limitations that are weakening or even killing the potential benefits of a data lake.
• Why a multi-purpose data lake is essential in building a universal data delivery system.
• How to build a logical multi-purpose data lake using data virtualization.
Do not miss this opportunity to make your data lake project successful and beneficial.
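The "logical" data lake the session describes leaves data in its source systems and combines it at query time. A toy illustration of that federation idea with two in-memory SQLite "sources" – the schemas and data are invented for the example, and real data virtualization platforms add optimization, caching and security on top:

```python
# Toy federation: a "virtual view" joins two physically separate
# sources at query time instead of copying both into one lake.
# Schemas and data are invented for illustration.
import sqlite3

crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
crm.execute("INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex')")

billing = sqlite3.connect(":memory:")
billing.execute("CREATE TABLE invoices (customer_id INTEGER, amount REAL)")
billing.execute("INSERT INTO invoices VALUES (1, 120.0), (1, 80.0), (2, 45.5)")

def virtual_revenue_view():
    """Join across sources at query time; nothing is materialized."""
    names = dict(crm.execute("SELECT id, name FROM customers"))
    totals = billing.execute(
        "SELECT customer_id, SUM(amount) FROM invoices GROUP BY customer_id")
    return [(names[cid], total) for cid, total in totals]

print(virtual_revenue_view())  # [('Acme', 200.0), ('Globex', 45.5)]
```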
SQL Azure Database is a cloud database service from Microsoft. SQL Azure provides web-facing database functionality as a utility service. Cloud-based database solutions such as SQL Azure can provide many benefits, including rapid provisioning, cost-effective scalability, high availability, and reduced management overhead. This paper provides an overview of some scale-out strategies, the challenges of scaling out on-premises, and how you can benefit from scaling out with SQL Azure.
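A core scale-out strategy for a database service like this is horizontal partitioning (sharding): route each row deterministically to one of several databases by a key hash. A hedged sketch of just the routing logic – the shard names and count are made up, and production systems also need rebalancing and cross-shard query handling:

```python
# Hash-based shard routing, the core of most horizontal scale-out
# schemes: each customer key maps deterministically to one shard.
# Shard names/count are illustrative only.
import hashlib

SHARDS = ["shard_0", "shard_1", "shard_2", "shard_3"]

def shard_for(key: str) -> str:
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

for customer in ("alice", "bob", "carol"):
    print(customer, "->", shard_for(customer))
```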
Big Data comes from a variety of sources as human activities online generate vast amounts of data every day through intentional, accidental, and unknown means. This includes activities on social media, sensors, logs, and more. Content delivery networks (CDNs) can help distribute big data by caching content on servers located closer to users. While pushing content to CDNs offloads work from origin servers and improves performance, it also segments users and requires replication strategies to maintain consistency. Techniques include pre-computing static content from dynamic sources, pushing searches and other functions to CDNs, and experimenting with different cache models. Overall, CDNs can be an effective way to distribute big data, but they also introduce more complexity and dependence on the CDN.
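The pre-compute-and-cache technique mentioned above can be shown in miniature: render dynamic content once at the origin, then serve it from an edge cache until a TTL expires. Purely illustrative:

```python
# Miniature edge cache with TTL: serve a pre-computed response until
# it expires, then regenerate from the origin. Illustrative only.
import time

class EdgeCache:
    def __init__(self, ttl_s: float):
        self.ttl_s = ttl_s
        self.store = {}  # key -> (expires_at, value)

    def get(self, key, origin_fetch):
        expires_at, value = self.store.get(key, (0.0, None))
        if time.monotonic() >= expires_at:          # stale or missing
            value = origin_fetch(key)               # hit the origin once
            self.store[key] = (time.monotonic() + self.ttl_s, value)
        return value

hits = {"count": 0}
def origin(key):
    hits["count"] += 1
    return f"<html>page for {key}</html>"

cache = EdgeCache(ttl_s=60)
for _ in range(3):
    cache.get("/home", origin)
print(hits["count"])  # 1 -- the other two requests came from the edge
```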
Govern and Protect Your End User Information (Denodo)
Watch this Fast Data Strategy session with speakers Clinton Cohagan, Chief Enterprise Data Architect, Lawrence Livermore National Lab & Nageswar Cherukupalli, Vice President & Group Manager, Infosys here: https://buff.ly/2k8f8M5
In its recent report “Predictions 2018: A year of reckoning”, Forrester predicts that 80% of firms affected by GDPR will not comply with the regulation by May 2018. Of those noncompliant firms, 50% will intentionally not comply.
Compliance doesn’t have to be this difficult! What if you have an opportunity to facilitate compliance with a mature technology and significant cost reduction? Data virtualization is a mature, cost-effective technology that enables privacy by design to facilitate compliance.
Attend this session to learn:
• How data virtualization provides a compliance foundation with data catalog, auditing, and data security.
• How you can enable a single enterprise-wide data access layer with guardrails.
• Why data virtualization is a must-have capability for compliance use cases.
• How Denodo’s customers have facilitated compliance.
Wallchart - Data Warehouse Documentation Roadmap (David Walker)
This document outlines the key components and processes involved in planning, designing, building, implementing and managing a data warehouse architecture. It includes sections on business requirements, data requirements, technical architecture, data modeling, ETL processes, testing, implementation, project management and documentation. The document provides a roadmap to guide an organization through each stage of developing an enterprise data warehouse.
• A strong relationship with the founder of Data Vault for over 3 years now.
• Supporting your business with 40+ certified consultants.
• Incorporated as the preferred Enterprise Data Warehouse modelling paradigm in the Logica BI Framework.
• Satisfied customers in many countries and industry sectors.
Ten Pillars of World Class Data Virtualization (Denodo)
This presentation describes how to achieve a successful and mature enterprise data virtualization solution. You will learn the key attributes to look for in an enterprise DV platform, the journey to maturity from an implementation perspective and how a solution can impact your fast data-driven business outcomes.
This presentation is part of the Fast Data Strategy Conference, and you can watch the video here goo.gl/tHWXuO.
Data Science Operationalization: The Journey of Enterprise AI (Denodo)
Watch full webinar here: https://bit.ly/3kVmYJl
As we move into a world driven by AI initiatives, we find ourselves facing new and diverse challenges when it comes to operationalization. Creating a solution and putting it into practice are certainly not the same thing. The challenges span various organizational and data facets. In many instances, data scientists may be working in silos, and connecting to the live data may not always be possible. But how does one guarantee that a model developed in a silo is still relevant to live data? How can we manage the data flow and data access across the entire AI operationalization cycle?
Watch on-demand to explore:
- The journey and challenges of the Data Scientist
- How Denodo data virtualization with data movement streamlines operationalization
- The best practices and techniques when dealing with siloed data
- How customers have used data virtualization in their data science initiatives
How Data Virtualization Puts Machine Learning into Production (APAC) (Denodo)
Watch full webinar here: https://bit.ly/3mJJ4w9
Advanced data science techniques, like machine learning, have proven to be extremely useful tools for deriving valuable insights from existing data. Platforms like Spark and complex libraries for R, Python and Scala put advanced techniques at the fingertips of data scientists. However, these data scientists spend most of their time looking for the right data and massaging it into a usable format. Data virtualization offers a new alternative to address these issues in a more efficient and agile way.
Attend this session to learn how companies can use data virtualization to:
- Create a logical architecture to make all enterprise data available for advanced analytics exercises
- Accelerate data acquisition and massaging, providing the data scientist with a powerful tool to complement their practice
- Integrate popular tools from the data science ecosystem: Spark, Python, Zeppelin, Jupyter, etc.
Implementing Data Virtualization for Data Warehouses and Master Data Manageme... (Denodo)
The ongoing evolution of business requirements and growth of data volumes continue to put added challenges on existing DW and MDM implementations – challenges that in many cases cannot be met. Data Virtualization complements existing DW, MDM and other architectures and business initiatives, providing the agility and flexibility – at a lower cost – to enable Virtual MDM, self-service BI, operational BI, rapid prototyping and real-time analytics.
More information and FREE registrations for this webinar: http://goo.gl/asYztF
Landing page for the entire Packed Lunch webinar series: http://goo.gl/NATMHw
Attend & get unique insights into:
How Data Virtualization can provide a simple and low cost alternative to traditional DW and MDM solutions
How Data Virtualization can enhance and extend existing DW or MDM solutions to provide a more agile data integration architecture
Case studies that demonstrate how Data Virtualization has increased agility to meet complex information needs
Data Lakes - The Key to a Scalable Data Architecture (Zaloni)
Data lakes are central to modern data architectures. They can store all types of raw data, create refined datasets for various use cases, and provide shorter time-to-insight with proper management and governance. The document discusses how a data lake reference architecture can include landing, raw, refined, and trusted zones to enable analytics while governing data. It also outlines considerations for implementing a scalable, secure, and governed data lake platform.
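The landing/raw/refined/trusted zoning described in the summary often amounts to a disciplined path convention plus promotion rules. A hedged sketch – the zone names follow the summary, while the paths and the validation rule are invented:

```python
# Zone-based lake layout: data is promoted landing -> raw -> refined
# -> trusted only when it passes checks. Paths and the validation
# rule here are illustrative.
import tempfile
from pathlib import Path

ZONES = ["landing", "raw", "refined", "trusted"]

def promote(lake_root: Path, dataset: str, filename: str,
            validate=lambda text: bool(text.strip())) -> Path:
    """Move a file one zone forward if it passes validation."""
    for current, nxt in zip(ZONES, ZONES[1:]):
        src = lake_root / current / dataset / filename
        if src.exists():
            if not validate(src.read_text()):
                raise ValueError(f"{src} failed validation; not promoted")
            dst = lake_root / nxt / dataset / filename
            dst.parent.mkdir(parents=True, exist_ok=True)
            src.rename(dst)
            return dst
    raise FileNotFoundError(filename)

root = Path(tempfile.mkdtemp())
(root / "landing" / "orders").mkdir(parents=True)
(root / "landing" / "orders" / "2016-01-01.csv").write_text("id,amount\n1,9.5\n")
print(promote(root, "orders", "2016-01-01.csv"))  # now under raw/orders/
```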
Analyst View of Data Virtualization: Conversations with Boulder Business Inte... (Denodo)
In this presentation, executives from Denodo preview the new Denodo Platform 6.0 release that delivers Dynamic Query Optimizer, cloud offering on Amazon Web Services, and self-service data discovery and search. Over 30 analysts, led by Claudia Imhoff, provide input on strategic direction and benefits of Denodo 6.0 to the data virtualization and the broader data integration market.
This presentation is part of the Fast Data Strategy Conference, and you can watch the video here goo.gl/DR6r3m.
Data Governance, Compliance and Security in Hadoop with Cloudera (Caserta)
The document discusses data governance, compliance and security in Hadoop. It provides an agenda for an event on this topic, including presentations from Joe Caserta of Caserta Concepts on data governance in big data, and Patrick Angeles of Cloudera on using Cloudera for data governance in Hadoop. The document also includes background information on Caserta Concepts and their expertise in data warehousing, business intelligence and big data analytics.
In this document, we will present a very brief introduction to Big Data (what is Big Data?), Hadoop (how does Hadoop fit the picture?) and Cloudera Hadoop (what is the difference between Cloudera Hadoop and regular Hadoop?).
Please note that this document is for Hadoop beginners looking for a place to start.
SAP Analytics Cloud: Haben Sie schon alle Datenquellen im Live-Zugriff? (Denodo)
Watch full webinar here: https://bit.ly/3hfEO6d
SAP Analytics Cloud ("SAC" for short) is a cloud service that provides users with extensive analytics functionality in a single product. As always with SAP, the SAC is technologically well integrated into the world of SAP systems.
However, the data that companies want to analyse today very often resides in a wide variety of data sources: in relational databases, in data lakes, in web services, in files, in NoSQL databases, and more. This inevitably raises the question of how you can connect, transform and combine all of that data from within the SAC – ideally live, i.e. with queries against real-time data. This is where data virtualization comes into play: it gives applications (including the SAC) uniform, integrated and high-performance access to both SAP and non-SAP data.
In this webcast you will learn:
- How data virtualization works (in a nutshell)
- How you can access all of your data in real time from within the SAC (via what is called a "Live Data Connection")
- How data virtualization optimizes performance, even for queries on large data volumes
Enabling a Data Mesh Architecture with Data Virtualization (Denodo)
Watch full webinar here: https://bit.ly/3rwWhyv
The Data Mesh architectural design was first proposed in 2019 by Zhamak Dehghani, principal technology consultant at Thoughtworks, a technology company that is closely associated with the development of distributed agile methodology. A data mesh is a distributed, de-centralized data infrastructure in which multiple autonomous domains manage and expose their own data, called “data products,” to the rest of the organization.
Organizations leverage data mesh architecture when they experience shortcomings in highly centralized architectures, such as the lack of domain-specific expertise in data teams, the inflexibility of centralized data repositories in meeting the specific needs of different departments within large organizations, and the slowness of centralized data infrastructures in provisioning data and responding to changes.
In this session, Pablo Alvarez, Global Director of Product Management at Denodo, explains how data virtualization is your best bet for implementing an effective data mesh architecture.
You will learn:
- How data mesh architecture not only enables better performance and agility, but also self-service data access
- The requirements for “data products” in the data mesh world, and how data virtualization supports them
- How data virtualization enables domains in a data mesh to be truly autonomous
- Why a data lake is not automatically a data mesh
- How to implement a simple, functional data mesh architecture using data virtualization
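One way to read the "data product" requirement discussed above is that each domain publishes its data behind a small, self-describing contract. A speculative sketch of such a contract – the field names are invented for illustration and are neither Dehghani's definition nor Denodo's API:

```python
# Speculative "data product" contract: each domain owns its dataset
# and publishes it with discoverable metadata. Field names are
# invented for illustration.
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class DataProduct:
    domain: str                    # owning domain, e.g. "payments"
    name: str                      # product name within the domain
    schema: dict                   # column -> type, for discoverability
    fetch: Callable[[], Iterable]  # access path the domain controls
    owner: str = "unknown"         # accountability stays with the domain

payments_product = DataProduct(
    domain="payments",
    name="settled_transactions",
    schema={"txn_id": "str", "amount": "float", "currency": "str"},
    fetch=lambda: [("t1", 10.0, "GBP"), ("t2", 4.2, "EUR")],
    owner="payments-team",
)

# A consumer discovers the product via its metadata, then pulls data
# through the interface the owning domain controls.
print(payments_product.schema)
print(list(payments_product.fetch()))
```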
Not Just a necessary evil, it’s good for business: implementing PCI DSS contr... (DataWorks Summit)
For firms in the financial industry, especially within regulated organizations such as credit card processors and banks, PCI DSS compliance has become a business and operational necessity. Although the blueprint of a PCI-compliant architecture varies from organization to organization, the mixture of modern Hadoop-based data lakes and legacy systems is a common theme.
In this talk, we will discuss recent updates to PCI DSS and how significant portions of PCI DSS compliance controls can be achieved using the open source Hadoop security stack and technologies for the Hadoop ecosystem. We will provide a broad overview of implementing key aspects of the PCI DSS standard at WorldPay, such as encryption management, data protection with anonymization, separation of duties, and deployment considerations for securing the Hadoop clusters at the network layer, from a practitioner's perspective. The talk will provide patterns and practices that map current Hadoop security capabilities to the security controls that a PCI-compliant environment requires.
Speakers:
David Walker, Enterprise Data Platform Programme Director, Worldpay
Srikanth Venkat, Senior Director Product Management, Hortonworks
Worldpay processes billions of transactions annually and stores vast amounts of transaction and customer data. In 2015, Worldpay committed to building a new enterprise data platform on Hadoop to provide analytics, reporting, and machine learning capabilities. The platform uses a multi-tenancy model with different "tenancy types" like data warehousing, decision services, APIs, and technical insights. Each tenancy type has its own components and services. Worldpay's platform currently has live implementations for data warehousing and is developing multiple decision services, with a goal of supporting tens of services within two years.
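Two of the controls named in the abstract above – data protection with anonymization and encryption management – commonly reduce to masking and tokenizing the card number (PAN). A generic sketch of that idea, not Worldpay's actual implementation; the key below is a placeholder that would live in an HSM or KMS in practice:

```python
# Generic PAN protection: mask for display (first 6 / last 4, a
# common PCI DSS practice) and tokenize with a keyed hash so the raw
# PAN never lands in analytics tables. Not Worldpay's implementation;
# SECRET_KEY is a placeholder -- never hardcode keys.
import hashlib, hmac

SECRET_KEY = b"replace-with-key-from-hsm"  # placeholder

def mask_pan(pan: str) -> str:
    digits = pan.replace(" ", "")
    return digits[:6] + "*" * (len(digits) - 10) + digits[-4:]

def tokenize_pan(pan: str) -> str:
    digits = pan.replace(" ", "")
    return hmac.new(SECRET_KEY, digits.encode(), hashlib.sha256).hexdigest()

pan = "4111 1111 1111 1111"
print(mask_pan(pan))      # 411111******1111
print(tokenize_pan(pan))  # stable surrogate, usable as a join key
```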
Don’t Get Caught in a PCI Pickle: Meet Compliance and Protect Payment Card Da... (DataStax)
Data security is an absolute requirement for any organization – large or small – that handles debit, credit and pre-paid cards. But navigating, understanding and complying with PCI DSS (Payment Card Industry Data Security Standard) regulations can be tough. In this webinar, we’ll examine the guidelines for securing payment card data and show you how a combined solution from DataStax and Gazzang can put you on course for compliance.
Will Your Cloud Be Compliant? OpenStack Security (Scott Carlson)
This document discusses cloud compliance and OpenStack security. It provides an overview of common compliance standards like PCI DSS and explains that while standards are generic, controls are similar across standards. The document outlines typical enterprise ecosystems and responsibilities in cloud environments. It also summarizes OpenStack security guidelines and provides examples of how to design private clouds for compliance, including configuring infrastructure, networking, and handling data, based on PayPal's methodology.
Rubrik offers a software-defined data management platform that can help organizations accelerate their GDPR compliance efforts. The platform provides centralized management of data across on-premises, edge, and cloud environments. It employs security measures like encryption and immutable storage that are designed with privacy and compliance in mind. Rubrik also simplifies compliance through policy-driven automation that enforces data protection, retention, and deletion policies. Reporting tools give insights into policy effectiveness. The unified platform streamlines compliance processes around identifying, managing, and securing personal data.
This document discusses application architectures using Hadoop. It provides an example case study of clickstream analysis. It covers the challenges of Hadoop implementation and various architectural considerations for data storage and modeling, data ingestion, and data processing. For data processing, it discusses different processing engines like MapReduce, Pig, Hive, Spark and Impala. It also discusses the specific processing needed for the clickstream data, such as sessionization and filtering.
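Sessionization, mentioned above, typically means sorting each user's events by time and starting a new session whenever the gap exceeds a threshold (30 minutes is a common default). A minimal pure-Python sketch standing in for the MapReduce/Spark implementations the talk covers:

```python
# Minimal sessionization: per user, order events by timestamp and cut
# a new session after 30 minutes of inactivity.
from collections import defaultdict

SESSION_GAP_S = 30 * 60

def sessionize(events):
    """events: iterable of (user_id, epoch_seconds) -> {user: [sessions]}"""
    by_user = defaultdict(list)
    for user, ts in events:
        by_user[user].append(ts)
    sessions = {}
    for user, stamps in by_user.items():
        stamps.sort()
        user_sessions = [[stamps[0]]]
        for prev, cur in zip(stamps, stamps[1:]):
            if cur - prev > SESSION_GAP_S:
                user_sessions.append([cur])   # gap too big: new session
            else:
                user_sessions[-1].append(cur)
        sessions[user] = user_sessions
    return sessions

clicks = [("u1", 0), ("u1", 600), ("u1", 600 + 31 * 60), ("u2", 100)]
print({u: len(s) for u, s in sessionize(clicks).items()})  # {'u1': 2, 'u2': 1}
```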
Application Architectures with Hadoop | Data Day Texas 2015 (Cloudera, Inc.)
This document discusses application architectures using Hadoop. It begins with an introduction to the speaker and his book on Hadoop architectures. It then presents a case study on clickstream analysis, describing how web logs could be analyzed in Hadoop. The document discusses challenges of Hadoop implementation and various architectural considerations for data storage, modeling, ingestion, processing and more. It focuses on choices for storage layers, file formats, schema design and processing engines like MapReduce, Spark and Impala.
User management - the next-gen of authentication meetup 27012022 (Lior Mazor)
Authentication is evolving. Customers are expecting much more from the user management experience in applications they are using today. Join us virtually for our upcoming "User Management - the next-gen of Authentication" meetup to learn about the secrets of building user management the right way, the secure way.
Webinar | Aligning GDPR Requirements with Today's Hybrid Cloud Realities (DataStax)
This webinar discussed how DataStax and Thales eSecurity can help organizations comply with GDPR requirements in today's hybrid cloud environments. The key points are:
1) GDPR compliance and hybrid cloud are realities organizations must address
2) A single "point solution" is insufficient - partnerships between data platform and security services providers are needed
3) DataStax and Thales eSecurity can provide the necessary access controls, authentication, encryption, auditing and other capabilities across disparate environments to meet the 7 key GDPR security requirements.
Turning Petabytes of Data into Profit with Hadoop for the World’s Biggest Ret... (Cloudera, Inc.)
PRGX is the world's leading provider of accounts payable audit services and works with leading global retailers. As new forms of data started to flow into their organizations, standard RDBMS systems were not allowing them to scale. Now, by using Talend with Cloudera Enterprise, they are able to achieve a 9-10x performance benefit in processing data, reduce errors, and provide more innovative products and services to end customers.
Watch this webinar to learn how PRGX worked with Cloudera and Talend to create a high-performance computing platform for data analytics and discovery that rapidly allows them to process, model, and serve massive amounts of structured and unstructured data.
Automatic Data Encryption (ADE) is a security tool available for UniData and UniVerse. This session will concentrate on real-world topics rather than ‘how to’. The intent is to provide the knowledge required for creating a strategy for your application and customers.
- Splunk has been used at athenahealth for 3 years to correlate security information from various tools in a centralized dashboard. It is used by their security incident response team and other security teams.
- Splunk ingests 400GB of data per day from over 100 forwarders, including Windows logs, firewall logs, and other security data. They aim to retain 2 years of searchable data (see the sizing sketch below).
- Splunk has provided value through improved visibility, flexibility to ingest various data sources, the ability to customize alerts and searches, and more efficient incident response by reducing time spent searching multiple systems.
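The 400GB/day and two-year figures above imply a very large searchable store. A back-of-envelope check, where the 2:1 compression ratio is an assumption rather than a number from the talk:

```python
# Back-of-envelope sizing from the figures above: 400 GB/day ingest,
# 2 years searchable. The 2:1 compression ratio is an assumption,
# not a number from the talk.
daily_gb = 400
retention_days = 2 * 365
compression_ratio = 2.0  # assumed raw:indexed

raw_tb = daily_gb * retention_days / 1024
stored_tb = raw_tb / compression_ratio
print(f"raw: {raw_tb:.0f} TB, stored at 2:1: {stored_tb:.0f} TB")
# raw: 285 TB, stored at 2:1: 143 TB
```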
This document describes Raxsonic's cloud-based data protection solution. It provides complete backup and recovery for servers, applications, desktops, laptops, and mobile devices. The solution protects data across multiple locations through server backup, endpoint backup, archiving, file sharing, and disaster recovery. It uses a unique three-tier encryption process to securely store data in their private and public cloud infrastructure or on-premise with a cloud-connected appliance. The solution is affordable and scalable to meet the needs of organizations of any size.
Comprehensive Security for the Enterprise III: Protecting Data at Rest and In... (Cloudera, Inc.)
This webinar discusses how you can use Navigator capabilities such as Encrypt and Key Trustee to secure data and enable compliance. Additionally, we will discuss our joint work with Intel on Project Rhino (an initiative to improve data security in Hadoop). We also hear from a security architect at a financial services company that is using encryption and key management to meet financial regulatory requirements.
A JMS Secure Data presentation designed to show businesses that use PCs, laptops and servers – including portable and mobile devices and magnetic media – to store and transmit personal information how to treat that data and keep their business compliant.
Losing data that could cause damage or distress to individuals may lead to enforcement action against your business, including financial penalties.
The document discusses application architectures using Hadoop. It provides an example case study of clickstream analysis of web logs. It discusses challenges of Hadoop implementation and various architectural considerations for data storage, modeling, ingestion, processing and what specific processing needs to happen for the case study. These include sessionization, filtering, and business intelligence/discovery. Storage options, file formats, schema design, and processing engines like MapReduce, Spark and Impala are also covered.
The document summarizes key points from a presentation on privacy for tech startups. It discusses why privacy is important for startups to consider, providing practical information security controls startups can implement, and new privacy principles from the GDPR that startups should be aware of. Some highlights include:
- Privacy should be a priority from the start and can help startups win trust among users and investors.
- Practical security controls include encrypting data, patching systems, training employees, and monitoring for vulnerabilities.
- The GDPR introduces new principles like data protection by design, security of processing, breach notification requirements, data protection impact assessments, and data protection officers.
Get to know which security standards are applicable to OpenStack clouds
Evgeniya Shumakher, Mirantis
Compliance with critical industry and regulatory standards used to be mostly the concern of application makers and customers integrating their solutions. Cloud computing – especially IaaS – has made things a lot more complicated. Meanwhile, emerging cloud-specific standards, like FedRAMP or CSA cloud security guidelines, are suggesting new, complex and stringent requirements – while also offering critical guidance.
The presentation offers an inside look at the process:
- The most important compliance and security standards for cloud builders
- Where existing OpenStack resources can fully or partially solve common compliance problems
- Where standards support within OpenStack is currently thin
- The common workflow for architecting standards-compliant clouds
- Common risks and emerging opportunities
Take a closer look at PCI Compliance for private OpenStack clouds
Scott Carlson, PayPal
PCI Compliance is very important for large financial institutions. As one of the larger installations of OpenStack within the Financial space, PayPal has driven forward the PCI conversation and will be sharing the technical perspective on the following related to PCI and OpenStack Private Clouds:
- How does OpenStack fit into an existing PCI-Compliant Environment
- When there is not an external Cloud Service Provider, how does your team need to compensate
- What are the design choices required to continue to be PCI-Compliant
- Physical versus Logical devices
- Hypervisor versus Guest compliance
- Management Networks for PCI and non-PCI Zones
The case study won’t give a fully prescriptive account of how to obtain PCI compliance, because there is a lot more to gaining compliance than just making your cloud compliant, but it will help you understand:
- Where existing OpenStack resources can fully or partially solve PCI compliance problems
- Where the OpenStack community needs to join together to solve problems in order to continue growth into PCI-compliant spaces
Your database holds your company's most sensitive and important assets: your data. All those customers' personal details, credit card numbers, social security numbers – you can't afford to leave them vulnerable to any breach, whether from outside or inside.
Similar to Big Data Week 2016 - Worldpay - Deploying Secure Clusters
Big Data Analytics 2017 - Worldpay - Empowering Payments (David Walker)
A presentation from the Big Data Analytics conference in 2017 that looks at how Worldpay, a major payments provider, uses data science and big data analytics to influence successful card payments.
A discussion on how insurance companies could use telematics data, social media and open data sources to analyse and better price policies for their customers
Data Driven Insurance Underwriting (Dutch Language Version) (David Walker)
A discussion on how insurance companies could use telematics data, social media and open data sources to analyse and better price policies for their customers
An introduction to data virtualization in business intelligence (David Walker)
A brief description of what Data Virtualisation is and how it can be used to support business intelligence applications and development. Originally presented to the ETIS Conference in Riga, Latvia in October 2013
A presentation to the ETIS Business Intelligence & Data Warehousing Working Group in Brussels on 22-Mar-13 discussing what SaaS & Cloud mean and how they will affect BI in Telcos.
1. The document describes building an analytical platform for a retailer by using open source tools R and RStudio along with SAP Sybase IQ database.
2. Key aspects included setting up SAP Sybase IQ as a column-store database for storage and querying of data, implementing R and RStudio for statistical analysis, and automating running of statistical models on new data.
3. The solution provided a low-cost platform capable of rapid prototyping of analytical models and production use for predictive analytics.
Gathering Business Requirements for Data Warehouses (David Walker)
This document provides an overview of the process for gathering business requirements for a data management and warehousing project. It discusses why requirements are gathered, the types of requirements needed, how business processes create data in the form of dimensions and measures, and how the gathered requirements will be used to design reports to meet business needs. A straw-man proposal is presented as a starting point for further discussion.
Building a data warehouse of call data records (David Walker)
This document discusses considerations for building a data warehouse to archive call detail records (CDRs) for a mobile virtual network operator (MVNO). The MVNO needed to improve compliance with data retention laws and enable more flexible analysis of CDR data. Key factors examined were whether to use Hadoop/NoSQL solutions and relational databases. While Hadoop can handle unstructured data, the CDRs have a defined structure and the IT team lacked NoSQL skills, so a relational database was deemed more suitable.
Those responsible for data management often struggle due to the many responsibilities involved. While organizations recognize data as a key asset, they are often unable to properly manage it. Creating a "Literal Staging Area" or LSA platform can help take a holistic view of improving overall data management. An LSA makes a copy of business systems that is refreshed daily and can be used for tasks like data quality monitoring, analysis, and operational reporting to help address data management challenges in a cost effective way for approximately $120,000.
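The daily refresh at the heart of an LSA can be as simple as a truncate-and-reload of each source table into a staging database. A skeleton sketch – the connections and the table list are placeholders for illustration:

```python
# Skeleton of a daily Literal Staging Area refresh: truncate-and-
# reload each source table into staging. Connections and the table
# list are placeholders.
import sqlite3

TABLES = ["customers", "orders"]  # hypothetical source tables

def refresh_lsa(source: sqlite3.Connection, staging: sqlite3.Connection):
    for table in TABLES:
        cols = [r[1] for r in source.execute(f"PRAGMA table_info({table})")]
        staging.execute(f"DROP TABLE IF EXISTS {table}")
        staging.execute(f"CREATE TABLE {table} ({', '.join(cols)})")
        rows = source.execute(f"SELECT * FROM {table}").fetchall()
        marks = ", ".join("?" for _ in cols)
        staging.executemany(f"INSERT INTO {table} VALUES ({marks})", rows)
    staging.commit()

src, stg = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
src.execute("CREATE TABLE customers (id, name)")
src.execute("CREATE TABLE orders (id, customer_id)")
src.execute("INSERT INTO customers VALUES (1, 'Acme')")
refresh_lsa(src, stg)
print(stg.execute("SELECT * FROM customers").fetchall())  # [(1, 'Acme')]
```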
A linux mac os x command line interfaceDavid Walker
This document describes a Linux/Mac OS X command line interface for interacting with the AffiliateWindow API. It provides scripts that allow sending API requests via cURL or Wget from the command line. The scripts read an XML request file, send it to the AffiliateWindow API server, and write the response to an XML file. This provides an alternative to PHP for accessing the API from the command line for testing, auditing, or using other development tools.
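By way of illustration, a minimal Python stand-in for the shell scripts described (the endpoint URL and file names are hypothetical placeholders, not the real AffiliateWindow details):

```python
# Illustrative stand-in for the cURL/Wget scripts described above.
# The URL and file names are placeholders, not the real API details.
import urllib.request

def send_xml_request(request_file: str, response_file: str,
                     url: str = "https://api.example.com/endpoint") -> None:
    """Read an XML request file, POST it, and write the XML response."""
    with open(request_file, "rb") as f:
        body = f.read()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "text/xml"}
    )
    with urllib.request.urlopen(req) as resp, open(response_file, "wb") as out:
        out.write(resp.read())

if __name__ == "__main__":
    send_xml_request("request.xml", "response.xml")
```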
Connections a life in the day of - david walkerDavid Walker
David Walker is a Principal Consultant who leads large data warehousing projects with teams of 1 to 20 people. He enjoys rugby and spends time with his family in Dorset when not travelling for work. The document provides biographical details about Walker's background, responsibilities, interests, and perspectives on technology and business challenges.
Conspectus data warehousing appliances – fad or futureDavid Walker
Data warehousing appliances aim to simplify and accelerate the process of extracting, transforming, and loading data from multiple source systems into a dedicated database for analysis. Traditional data warehousing systems are complex and expensive to implement and maintain over time as data volumes increase. Data warehousing appliances use commodity hardware and specialized database engines to radically reduce data loading times, improve query performance, and simplify administration. While appliances introduce new challenges around proprietary technologies and credibility of performance claims, organizations that have implemented them report major gains in query speed and storage efficiency with reduced support costs. As more vendors enter the market, appliances are poised to become a key part of many organizations' data warehousing strategies.
Using the right data model in a data martDavid Walker
A presentation describing how to choose the right data model design for your data mart. Discusses the pros and cons of different data models with different RDBMS technologies and tools
The document discusses spatial data and analysis. It defines spatial data as information that can be analyzed based on geographic context, such as locations, distances and boundaries. It then describes the three common types of spatial data - points, lines and polygons - and how they are used to answer questions about proximity and relationships between objects. Finally, it outlines some of the key sources for spatial data, challenges in working with spatial data, and provides a model for how to deliver spatial data and analysis.
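As a concrete illustration of the three types and a proximity test, here is a minimal sketch using the shapely library (an assumption; the presentation itself is tool-agnostic):

```python
# The three common spatial types and two proximity questions,
# using shapely; coordinates and names are illustrative.
from shapely.geometry import Point, LineString, Polygon

store = Point(0.0, 0.0)                                  # a location
road = LineString([(0, 1), (5, 1)])                      # a route
region = Polygon([(-2, -2), (4, -2), (4, 4), (-2, 4)])   # a boundary

print(region.contains(store))   # is the store inside the region? -> True
print(store.distance(road))     # how far is the store from the road? -> 1.0
```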
UKOUG06 - An Introduction To Process Neutral Data Modelling - PresentationDavid Walker
Data Management & Warehousing is a consulting firm that specializes in enterprise data warehousing. The document discusses process neutral data modeling, which is a technique for designing data warehouse models that are less impacted by changes in source systems or business processes. It does this by incorporating metadata into the data model similar to how XML includes metadata in data files. The approach defines major entities, their types and properties, relationships between entities, and occurrences to model interactions between entities in a consistent way that supports managing changes.
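A minimal sketch of the entity / type / property pattern described, with illustrative names not taken from the original material:

```python
# Illustrative sketch of a process neutral model: entities carry
# their type and properties as data, so adding a new property or
# relationship kind is a data change, not a schema change.
from dataclasses import dataclass, field

@dataclass
class EntityType:
    name: str                    # e.g. "Customer", "Account"

@dataclass
class Property:
    name: str
    value: str

@dataclass
class Entity:
    entity_type: EntityType
    properties: list[Property] = field(default_factory=list)

@dataclass
class Relationship:
    source: Entity
    target: Entity
    kind: str                    # e.g. "holds", "owns"

customer = Entity(EntityType("Customer"), [Property("Name", "ACME Ltd")])
account = Entity(EntityType("Account"), [Property("Number", "12345")])
link = Relationship(customer, account, "holds")
```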
IRM09 - What Can IT Really Deliver For BI and DW - PresentationDavid Walker
This document summarizes a discussion from Data Management & Warehousing about what IT can deliver for Business Intelligence. Some of the key points covered include:
1. The business has substantial front-loaded costs to pay for Business Intelligence and Data Warehousing. There are also ongoing costs for system changes and maintenance.
2. The business must understand that Business Intelligence is an ongoing, long-term development and not a one-off project.
3. It is important for the business and IT to agree on what a successful Business Intelligence solution would look like.
ETIS11 - Agile Business Intelligence - PresentationDavid Walker
The document discusses techniques for becoming more agile in business intelligence projects. It advocates for establishing small, skilled teams with strong user relationships and delegated authority. True agile organizations allow teams to operate outside standard corporate procedures and regularly deliver incremental improvements. Large organizations tend to prioritize processes and risk avoidance over agility, creativity, and benefits. Successful examples demonstrate recognizing the need to overcome bureaucracy through practices like Lockheed Martin's SkunkWorks model.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many of those features provide convenience and capability at the expense of security. This best practices guide outlines steps users can take to better protect their personal devices and information.
Digital Marketing Trends in 2024 | Guide for Staying AheadWask
https://www.wask.co/ebooks/digital-marketing-trends-in-2024
Feeling lost in the digital marketing whirlwind of 2024? Technology is changing, consumer habits are evolving, and staying ahead of the curve feels like a never-ending pursuit. This e-book is your compass. Dive into actionable insights to handle the complexities of modern marketing. From hyper-personalization to the power of user-generated content, learn how to build long-term relationships with your audience and unlock the secrets to success in the ever-shifting digital landscape.
Main news related to the CCS TSI 2023 (2023/1695)Jakub Marek
An English 🇬🇧 translation of the presentation accompanying the speech I gave about the main changes introduced by CCS TSI 2023 at the largest Czech conference on railway communications and signalling systems, held at the Clarion Hotel Olomouc from 7 to 9 November 2023 (konferenceszt.cz). It was attended by around 500 participants and 200 online followers.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The video recording (in Czech) of the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
Driving Business Innovation: Latest Generative AI Advancements & Success StorySafe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Best 20 SEO Techniques To Improve Website Visibility In SERPPixlogix Infotech
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
Project Management Semester Long Project - Acuityjpupo2018
Acuity is an innovative learning app designed to transform the way you engage with knowledge. Powered by AI technology, Acuity takes complex topics and distills them into concise, interactive summaries that are easy to read & understand. Whether you're exploring the depths of quantum mechanics or seeking insight into historical events, Acuity provides the key information you need without the burden of lengthy texts.
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able to lower your costs through an optimized configuration and keep them low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc.
- Practical examples and best practices to implement right away
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
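As a flavour of the approach, a minimal sketch using DSPy's documented signature interface (assuming a recent DSPy release and an OpenAI API key in the environment; the model name is a placeholder):

```python
import dspy

# Configure the LM backend; the model name is a placeholder and an
# OpenAI API key in the environment is assumed (recent DSPy API).
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# A signature ("question -> answer") declares typed inputs/outputs;
# DSPy turns it into a prompt rather than you writing one by hand.
qa = dspy.ChainOfThought("question -> answer")

result = qa(question="What is a star schema?")
print(result.answer)      # the output field named in the signature
print(result.reasoning)   # ChainOfThought also exposes its reasoning
```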
Webinar: Designing a schema for a Data WarehouseFederico Razzoli
Are you new to data warehouses (DWH)? Do you need to check whether your data warehouse follows the best practices for a good design? In both cases, this webinar is for you.
A data warehouse is a central relational database that contains all measurements about a business or an organisation. This data comes from a variety of heterogeneous data sources, which includes databases of any type that back the applications used by the company, data files exported by some applications, or APIs provided by internal or external services.
But designing a data warehouse correctly is a hard task, which requires first gathering information about the business processes that need to be analysed. These processes must be translated into so-called star schemas, that is, denormalised schemas in which each table represents either a dimension or the facts. A minimal sketch of such a schema follows the topic list below.
We will discuss these topics:
- How to gather information about a business;
- Understanding dictionaries and how to identify business entities;
- Dimensions and facts;
- Setting a table granularity;
- Types of facts;
- Types of dimensions;
- Snowflakes and how to avoid them;
- Expanding existing dimensions and facts.
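To make the star-schema idea concrete, here is a minimal sketch under illustrative table and column names (sqlite3 is used only to keep the example self-contained; the webinar itself is not tied to any product):

```python
# One fact table with foreign keys into two denormalised dimensions.
import sqlite3

ddl = """
CREATE TABLE dim_date (
    date_key     INTEGER PRIMARY KEY,  -- surrogate key, e.g. 20240115
    full_date    TEXT,
    month_name   TEXT,
    year         INTEGER
);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category     TEXT                  -- denormalised: no snowflaking
);
CREATE TABLE fact_sales (
    date_key     INTEGER REFERENCES dim_date(date_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    quantity     INTEGER,              -- additive measure
    amount       REAL                  -- additive measure
    -- grain: one row per product per day (set the grain first!)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(ddl)
```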
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets (a hedged sketch follows below)
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
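As a taste of the kind of query the demos cover, here is a hedged pymongo sketch; the connection string, index name, field names and query vector are all placeholders, and it assumes an Atlas cluster with a vector index already defined:

```python
# Hedged sketch of an Atlas Vector Search query; all names below
# are placeholders and a pre-built vector index is assumed.
from pymongo import MongoClient

client = MongoClient("mongodb+srv://user:pass@cluster.example.net")
coll = client["shop"]["products"]

query_vector = [0.12, -0.07, 0.33]  # normally from an embedding model

pipeline = [
    {
        "$vectorSearch": {
            "index": "vector_index",
            "path": "embedding",
            "queryVector": query_vector,
            "numCandidates": 100,   # breadth of the approximate search
            "limit": 5,             # top-k results returned
        }
    },
    {"$project": {"name": 1, "score": {"$meta": "vectorSearchScore"}}},
]

for doc in coll.aggregate(pipeline):
    print(doc)
```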
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on integration of Salesforce with Bonterra Impact Management.
Interested in deploying an integration with Salesforce for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
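To ground the XPath discussion, here is a small lxml sketch showing the mechanics of a custom XPath extension function called from XSLT; a plain Python function stands in for the AI-backed functions the presentation covers, and the namespace URI is a placeholder:

```python
from lxml import etree

def shout(context, text):
    # Toy stand-in for an AI-backed function: rewrite the selected text.
    return str(text).upper()

# Register the function in a (placeholder) namespace so XSLT can call it.
ns = etree.FunctionNamespace("urn:example:ai")
ns["shout"] = shout

stylesheet = etree.XML("""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ai="urn:example:ai">
  <xsl:template match="/doc">
    <out><xsl:value-of select="ai:shout(string(title))"/></out>
  </xsl:template>
</xsl:stylesheet>
""")

transform = etree.XSLT(stylesheet)
print(transform(etree.XML("<doc><title>hello xml</title></doc>")))
# Prints: <out>HELLO XML</out>
```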