© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Data Privacy and Governance in the Age of
Big Data: Deploying a De-Identified Data
Lake
Ryan Peterson
Principal Solutions Architect,
Data & Analytics
AWS Partner Team
GPSTEC303
Danielle Greshock
Sr. Manager, Business Applications
AWS Partner Team
Let’s look at some metrics
What problems are customers trying to solve?
• What type of data am I collecting?
• Where do I collect it?
• Where do I store it?
• Do I have the appropriate legal collection
statements?
• How and when do I delete data?
• How do I secure the data?
• What responsibility do I have?
• Why do I collect the data?
• What is my legal basis for processing and
using the data?
• Where is a list of all my data?
• Do I communicate with the subject I am
collecting from?
• Who do I share it with?
• Who has access to my data? How do I
control it?
• What are the use cases for the data? Are
they permitted? Who provided permission?
• How do I find my data?
How are privacy regulations attempting to protect consumers?
Global
• CSA: Cloud Security Alliance Controls
• ISO 9001: Global Quality Standard
• ISO 27001: Security Management Controls
• ISO 27017: Cloud Specific Controls
• ISO 27018: Personal Data Protection
• PCI DSS Level 1: Payment Card Standards
• SOC 1: Audit Controls Report
• SOC 2: Security, Availability, & Confidentiality Report
• SOC 3: General Controls Report
United States
• CJIS: Criminal Justice Information Services
• DoD SRG: DoD Data Processing
• FedRAMP: Government Data Standards
• FERPA: Educational Privacy Act
• FIPS: Government Security Standards
• FISMA: Federal Information Security Management
• GxP: Quality Guidelines and Regulations
• FFIEC: Financial Institutions Regulation
• HIPAA: Protected Health Information
• ITAR: International Arms Regulations
• MPAA: Protected Media Content
• NIST: National Institute of Standards and Technology
• SEC Rule 17a-4(f): Financial Data Standards
• VPAT/Section 508: Accountability Standards
• CCPA: California Consumer Privacy Act of 2018
Asia Pacific
• FISC [Japan]: Financial Industry Information Systems
• IRAP [Australia]: Australian Security Standards
• K-ISMS [Korea]: Korean Information Security
• MTCS Tier 3 [Singapore]: Multi-Tier Cloud Security Standard
• My Number Act [Japan]: Personal Information Protection
Europe
• C5 [Germany]: Operational Security Attestation
• Cyber Essentials Plus [UK]: Cyber Threat Protection
• G-Cloud [UK]: UK Government Standards
• IT-Grundschutz [Germany]: Baseline Protection Methodology
• GDPR [EU]: General Data Protection Regulation
Situation one: Internal challenges with development lifecycle
What challenges do I have, and what can I do?
What are my major risks for data compromise?
Situation two: External challenges with bad actors
What challenges do I have, and what can I do?
What are my major risks for data compromise?
How we’ve handled such challenges, pre-GDPR
• Outsource credit card processing
• Masking personally identifiable information (PII)
• Firewall rules for distributed denial-of-service (DDoS)
• Network lockdowns
• Database encryption
• Alerting
• Environment automation (no SSH)
• Validation and sanitization of your data input and output (SQL
injection / cross-site scripting). Does the data look like it’s supposed
to?
How does a de-identified data lake help?
• It protects what both
internal and external
bad actors want—the
data!
• It allows developers to
focus on their goals—
high-quality, tested
software
Then and now
Then                           Now
Directives                     Laws
Best practices / Good ethics   Regulatory requirements
No consequences                Heavy fines
Overhead                       In design and necessity
How do we resolve PII dangers?
• Do we need to solve these
individual issues?
• Is there a solution architecture that
solves all PII issues?
• What best practices have you used
to mitigate PII dangers?
[Architecture diagram: de-identified data lake on AWS]
Sources: Oracle, SQL Server, Teradata (DB, DWH, NoSQL, NewSQL, in-memory); Avro, Parquet, CSV, JSON, XML (structured & richer file formats); SaaS and on-premises applications; social & web logs; video & images; 2nd-party PII (payroll providers, marketing partners, insurance providers)
Ingestion: batch ingestion; media ingestion (S3 staging, image scanning, Amazon Rekognition); streaming (Kinesis)
Processing: ETL with PII masking/filter; data processing at scale on a compute cluster (Amazon EMR); AWS Glue; Amazon Macie (PII compliance)
Storage: S3 data lake; Data Catalog; curated data in Amazon Redshift / Spectrum, Amazon RDS, Amazon DynamoDB
2nd parties: hashed PII; decentralized hashed PII matching; HITCH token + de-identified data
Analytics: exploratory analytics (Amazon Athena); analysis, ML, search, reporting & dashboards (Amazon SageMaker, Amazon QuickSight, Amazon Elasticsearch Service)
Identity & security: IAM, Directory Service, KMS
Why catalogs are key
• New business initiatives: How do I use my data for business value?
• Advanced analytics: What data is available to me?
• Regulatory compliance: How do I understand, document, and ensure proper usage?
Data Catalog current & future state
Current state
Two types of data catalogs:
1. Pure data cataloging for inventorying
and identification
2. Catalogs embedded in apps to make them
more useful
This has resulted in limited usefulness because
catalogs are not integrated into larger enterprise
information management activities
Desired state
Data marketplace with an actionable data
catalog:
- Find data
- Transform data
- Provision data
- Maintain data governance and
catalog currency
Actionable Data Catalog via a self-service Data Platform
[Diagram: Self-Service Data Platform]
Sources: files, RDBMS, data lake, catalogs, data governance tools
Data management: data discovery & catalog; ingest, transform, improve
Data marketplace
Data consumers: RDBMS, data lake, reports, applications
Actionable data catalog
• Enables a self-service Data
Platform
• Increases productivity of data
producers and consumers
• Governance and catalog
currency are infused
throughout the process
Self-Service Data Platform: Discover, catalog & ingest
• Catalog everything
▪ Relational database
management system
(RDBMS), Data Lakes,
Automated Data Inventory
• Leverage enterprise
definitions & standards
• Annotate & customize
for business need
• Empower SMEs to discover
new data
Ingest
Catalog
Self-Service Data Platform: Prepare & manage
• Enrich from catalog
▪ Leverage data profile &
quality findings
▪ Build recipes to cleanse,
join, and aggregate
• Schedule, Manage,
Operationalize
▪ Scale out
▪ Govern based on DM
policies
Workflow designer
Token masking
Self-Service Data Platform: Marketplace
• Collaborate
▪ Workspaces & rankings
▪ Annotate and customize
• Share
▪ Provision to existing enterprise
data warehouses (EDW) or data
marts
▪ Deliver to data science data lake
or zone
▪ Enable in data visualization or BI
tools (Tableau)
Share
Provision
Deployment architecture for AWS data lake with ZDP
[Deployment diagram: one AWS Region with two Availability Zones, each with a public subnet and a private subnet]
Networking: internet gateway, NAT gateways, Elastic IPs, Amazon S3 endpoints
Zaloni components: Zaloni bastions, Zaloni managed clusters (cluster instances in an Auto Scaling group)
AWS services: Amazon S3, AWS Glue, Amazon Athena, Amazon Kinesis, Amazon Kinesis Data Firehose, Amazon ES, Amazon Redshift, Amazon RDS, Amazon QuickSight, Storage Gateway, AWS Directory Service, Amazon CloudWatch
Zaloni’s Integrated Self-Service Data Platform (ZDP) offerings
Enable Govern Engage
• Batch ingestion
• Streaming ingestion
• Metadata capture
• Auto discovery
• Data quality
• Data lineage
• Data mastering
• Data privacy/security
• Data enrichment
• Data lifecycle management
• Discovery catalog
• Self-service ingestion
• Self-service preparation
Dataguise DgSecure capabilities
Detect
Find and report the exact
quantity and location of
sensitive data in structured,
unstructured and semi-
structured content
Protect
Remediate your sensitive
data exposure by masking
and/or encrypting it at the
element level
Monitor
Track how and where
sensitive data is being
accessed through a 360°
dashboard
Right of access
Upon request, precisely find
and report all records and
other information of a specific
individual
Right to erasure
Upon request, delete all
records and other information
of a specific individual
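The right of access and right to erasure requests above can be sketched in a few lines. This is only an illustration and not the DgSecure API: the `Store` class, `find_subject`, and `erase_subject` are hypothetical names, and the sketch assumes every record carries an `email` field that identifies the individual.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Hypothetical stand-in for one data store (a table, bucket, or index)."""
    name: str
    records: list = field(default_factory=list)


def _is_subject(record: dict, email: str) -> bool:
    # Assumption: the individual is identified by a case-insensitive email match.
    return str(record.get("email", "")).lower() == email.lower()


def find_subject(stores, email):
    """Right of access: report every record held about one individual."""
    return [(s.name, r) for s in stores for r in s.records if _is_subject(r, email)]


def erase_subject(stores, email):
    """Right to erasure: delete those records and return how many were removed."""
    removed = 0
    for store in stores:
        kept = [r for r in store.records if not _is_subject(r, email)]
        removed += len(store.records) - len(kept)
        store.records = kept
    return removed


crm = Store("crm", [{"email": "mary@example.com", "state": "NV"},
                    {"email": "mateo@example.com", "state": "TX"}])
tickets = Store("tickets", [{"email": "Mary@example.com", "note": "billing question"}])

report = find_subject([crm, tickets], "mary@example.com")
erased = erase_subject([crm, tickets], "mary@example.com")
```

A real deployment would also need to reach backups, logs, and unstructured content, which this sketch does not attempt.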
Validating the knowns & finding the unknowns—
Structured and semi-structured data
Email                 Customer ID  Transcript
csalazar@example.com  19664        Just talked to Carlos Salazar
mary@example.com      23423        Mary’s SSN is 000000000
mateo@example.com     99644        Mateo is moving to Nevada
NA                    02945        It is expected to rain tomorrow
Sensitive elements detected: Email (Email column); ID (Customer ID column); Name, SSN, State (Transcript column)
After masking:
Email     Customer ID  Transcript
4t34gttt  7462391      Just talked to Jane Roe
44e5325   1239474      Jorge’s SSN is 666666666
0we&yrw   9983487      Sofia is moving to Texas
NA        3344325      It is expected to rain tomorrow
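A minimal sketch of this kind of masking follows. It covers the known columns (email, customer ID) and one unknown (an SSN inside free text). It is an illustration under stated assumptions, not the tool shown on the slide: `SECRET_KEY`, `pseudonymize`, and `mask_row` are hypothetical names, and keyed hashing plus a `[SSN]` placeholder are swapped in for the realistic substitute values in the table above. Finding names and states in free text would need entity recognition, which is out of scope here.

```python
import hashlib
import hmac
import re

# Hypothetical key for the demo; in practice it would come from a key manager.
SECRET_KEY = b"demo-key-not-for-production"

# Nine digits, with or without the usual dashes.
SSN_PATTERN = re.compile(r"\b\d{3}-?\d{2}-?\d{4}\b")


def pseudonymize(value: str, length: int = 8) -> str:
    """Replace a known-sensitive value with a deterministic keyed token."""
    digest = hmac.new(SECRET_KEY, value.lower().encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length]


def mask_row(row: dict) -> dict:
    """Mask the known columns and scrub SSNs found in the free-text column."""
    masked = dict(row)
    if row["email"] != "NA":
        masked["email"] = pseudonymize(row["email"])
    masked["customer_id"] = pseudonymize(row["customer_id"])
    masked["transcript"] = SSN_PATTERN.sub("[SSN]", row["transcript"])
    return masked


rows = [
    {"email": "mary@example.com", "customer_id": "23423",
     "transcript": "Mary's SSN is 000000000"},
    {"email": "NA", "customer_id": "02945",
     "transcript": "It is expected to rain tomorrow"},
]
masked_rows = [mask_row(r) for r in rows]
```

Because the tokens are deterministic, the same email always maps to the same token, so masked tables can still be joined on the masked column.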
Finding the needles in the haystack
Unstructured data
Customers → Call Center: “Your call will be recorded for quality assurance”
Sensitive elements to find in the recording: Full Name, Social Security Number
……………..this is Shirley Rodriguez, and my social is
six six six six six six six six six is there any more
information you need for my app...........
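The transcript above shows why a plain digit pattern is not enough for unstructured data: the number is spoken as words. A rough sketch of one way to catch it, assuming the audio has already been transcribed to text and that an SSN appears as nine consecutive digit words (`DIGIT_WORDS`, `find_spoken_ssns`, and `redact_spoken_ssns` are hypothetical names):

```python
import re

DIGIT_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}

_WORD = "|".join(DIGIT_WORDS)
# Eight digit words each followed by whitespace, then a ninth digit word.
SPOKEN_SSN = re.compile(rf"\b(?:(?:{_WORD})\s+){{8}}(?:{_WORD})\b", re.IGNORECASE)


def find_spoken_ssns(text: str) -> list:
    """Return each spoken nine-digit run converted to a digit string."""
    return [
        "".join(DIGIT_WORDS[word.lower()] for word in match.group(0).split())
        for match in SPOKEN_SSN.finditer(text)
    ]


def redact_spoken_ssns(text: str) -> str:
    """Replace each spoken nine-digit run with a placeholder."""
    return SPOKEN_SSN.sub("[SSN]", text)


transcript = ("this is Shirley Rodriguez, and my social is "
              "six six six six six six six six six is there any more "
              "information you need")
found = find_spoken_ssns(transcript)
redacted = redact_spoken_ssns(transcript)
```

The caller's name is left untouched here; detecting names in free text generally needs an entity recognition model rather than a pattern.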
Match datasets without PII ever leaving your organization
Senate matching
Remain in control
Matching subject to strict
governance and licensing
workflows. Audit all
matching.
Private by design
PII never leaves your
organization’s firewall
Future-proof compliance
The end of the PII
honeypot
Decentralized matching
occurs on hash
fragments. Protect PII at
all times.
The decentralized matching process
Senate matching protects customer PII for data custodians:
1. Tokenization: Anonymizes PII by replacing it with random tokens when datasets are uploaded to the
contributor node
2. Hashing: Protects PII by hashing (a one-way, non-reversible transformation); the original PII is not stored
3. Slicing: Distributes hash fragments across a distributed computing network
4. Matching: Provides a matching service to link identities across different contributor nodes, without
disclosing data or putting it at risk of misuse
5. Governance: Performs these matches according to data exchange governance within Data Republic’s
Senate Platform
6. Auditable: Functions in a transparent and verifiable way so that analysts and custodians can build trust in
the system, the data, and with each other
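The first four steps above can be sketched as follows. This is a toy illustration of the idea, not Data Republic's actual protocol: the shared salt, the choice of SHA-256, the four equal slices, and all class and function names are assumptions made for the example.

```python
import hashlib
import secrets
from collections import defaultdict

NUM_SLICES = 4  # hypothetical number of matcher nodes


def hash_pii(pii: str, salt: bytes) -> str:
    """One-way hash of normalized PII (assumes contributors share a salt)."""
    return hashlib.sha256(salt + pii.strip().lower().encode("utf-8")).hexdigest()


def slice_hash(digest: str, parts: int = NUM_SLICES) -> list:
    """Cut the hex digest into equal fragments, one per matcher node."""
    step = len(digest) // parts
    return [digest[i * step:(i + 1) * step] for i in range(parts)]


class ContributorNode:
    """Runs inside the custodian's environment; raw PII never leaves it."""

    def __init__(self, salt: bytes):
        self.salt = salt
        self.token_by_pii = {}  # kept locally by the custodian

    def prepare(self, pii_values):
        """Return one list of (token, fragment) pairs per matcher node."""
        per_node = [[] for _ in range(NUM_SLICES)]
        for pii in pii_values:
            token = secrets.token_hex(8)  # random token, unrelated to the PII
            self.token_by_pii[pii] = token
            for i, fragment in enumerate(slice_hash(hash_pii(pii, self.salt))):
                per_node[i].append((token, fragment))
        return per_node


def match_on_node(fragments_a, fragments_b):
    """One matcher node: token pairs whose fragment on this node is equal."""
    index = defaultdict(set)
    for token, fragment in fragments_b:
        index[fragment].add(token)
    return {(ta, tb) for ta, frag in fragments_a for tb in index.get(frag, ())}


def aggregate(per_node_matches):
    """Aggregator: keep only pairs that every matcher node reported."""
    return set.intersection(*per_node_matches)


salt = b"shared-demo-salt"
payroll, insurer = ContributorNode(salt), ContributorNode(salt)
slices_a = payroll.prepare(["mary@example.com", "mateo@example.com"])
slices_b = insurer.prepare(["Mary@Example.com ", "csalazar@example.com"])
matches = aggregate([match_on_node(a, b) for a, b in zip(slices_a, slices_b)])
```

Each matcher node sees only one fragment of each hash, so no single node holds enough to look up or re-identify a full value; only the aggregator's intersection yields matched token pairs.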
Comparing methods for PII management
The old way
1. Custodian’s database (CRM or EDW) contains customer PII data that must be protected
2. The central data store is able to match customer records from multiple contributors
3. Matched customer tokens are used by an analyst to link datasets
The new way with Senate matching
1. Custodian’s database (CRM or EDW) contains customer PII data that must be protected
2. Senate Matching Contributor Node hashes the PII so that the original text is unrecoverable
3. Hash values are sliced into parts and stored in a distributed network
4. No single company can match or re-identify customer data
5. Matched customer tokens are used by an analyst to link datasets
Senate matching process
Zones: your org’s environment (customer database, contributor node); the matcher network (decentralized network, aggregator node); Senate
After logging into Data Republic’s Senate Platform: Attribute data uploaded → Data license + match approved → Match Discovery workspace → Output
• Contributor node: The node returns the random tokens to the data custodian.
• Anonymized data: Custodian sends anonymized data to Senate by substituting any identifying information with the random token generated by their Contributor Node.
• Aggregator node: Custodian approves match request. The matcher network sends its list of matches to the Aggregator Node, which sends decrypted tokens.
The problem
The old way
Bringing data from sources
like databases into data
lakes and data warehouses
to perform analytics
Today
There are many
operational systems
and sources of data
ETL needs to adapt
Risk is increased by
compounding the number of
locations where data is stored
Cloud-based
repositories
Modern data lakes and
analytics tools are all based
in the cloud
How data normally flows …
Source data (database or API) → Extraction process → Amazon S3 data lake → Load process → Amazon Redshift staging table → Transformation process → Amazon Redshift destination table → Reporting process → Reports and extracts
The solution: Transforming sensitive data
The key to building a de-identified system is adding a sensitive data transformation step to the data extraction process.
Source data (database or API) → Extraction and transformation process → Amazon S3 data lake → Load process → Amazon Redshift staging table → Post-load transformation → Amazon Redshift destination table → Reporting process → Reports and extracts
How customers use Etleap
Source data (database or API) → Extraction, sensitive data filtering, milestoning, and transformation → Amazon S3 de-identified data lake → Data curation → Amazon Redshift data warehouse
Key takeaways
• Detailed catalog of locations where sensitive data resides
• Methodology to audit, assess, and then handle PII
• Significantly accelerated time to value: months to minutes
• Improved consumer, employee, and shareholder trust and revenue
• Risk of 2nd party misuse reduced, enables data monetization
Thank you!
Ryan Peterson
ryapet@amazon.com
What's New with Amazon Redshift ft. McDonald's (ANT350-R1) - AWS re:Invent 2018Amazon Web Services
 
Breaking the Ice: How Broadridge is Helping Customers Transform Cold Archiva...
 Breaking the Ice: How Broadridge is Helping Customers Transform Cold Archiva... Breaking the Ice: How Broadridge is Helping Customers Transform Cold Archiva...
Breaking the Ice: How Broadridge is Helping Customers Transform Cold Archiva...Amazon Web Services
 


Data Privacy & Governance in the Age of Big Data: Deploy a De-Identified Data Lake (GPSTEC303) - AWS re:Invent 2018

  • 1.
  • 2. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Data Privacy and Governance in the Age of Big Data: Deploying a De-Identified Data Lake (GPSTEC303). Ryan Peterson, Principal Solutions Architect, Data & Analytics, AWS Partner Team. Danielle Greshock, Sr. Manager, Business Applications, AWS Partner Team.
  • 3. Let’s look at some metrics
  • 4. What problems are customers trying to solve?
      • What type of data am I collecting?
      • Where do I collect it?
      • Where do I store it?
      • Do I have the appropriate legal collection statements?
      • How and when do I delete data?
      • How do I secure the data?
      • What responsibility do I have?
      • Why do I collect the data?
      • What is my legal basis for processing and using the data?
      • Where is a list of all my data?
      • Do I communicate with the subject I am collecting from?
      • Who do I share it with?
      • Who has access to my data? How do I control it?
      • What are the use cases for the data? Are they permitted? Who provided permission?
      • How do I find my data?
  • 5. How are privacy regulations attempting to protect consumers?
      Global: CSA (Cloud Security Alliance Controls); ISO 9001 (Global Quality Standard); ISO 27001 (Security Management Controls); ISO 27017 (Cloud Specific Controls); ISO 27018 (Personal Data Protection); PCI DSS Level 1 (Payment Card Standards); SOC 1 (Audit Controls Report); SOC 2 (Security, Availability, & Confidentiality Report); SOC 3 (General Controls Report)
      United States: CJIS (Criminal Justice Information Services); DoD SRG (DoD Data Processing); FedRAMP (Government Data Standards); FERPA (Educational Privacy Act); FIPS (Government Security Standards); FISMA (Federal Information Security Management); GxP (Quality Guidelines and Regulations); FFIEC (Financial Institutions Regulation); HIPAA (Protected Health Information); ITAR (International Arms Regulations); MPAA (Protected Media Content); NIST (National Institute of Standards and Technology); SEC Rule 17a-4(f) (Financial Data Standards); VPAT/Section 508 (Accountability Standards)
      Asia Pacific: FISC [Japan] (Financial Industry Information Systems); IRAP [Australia] (Australian Security Standards); K-ISMS [Korea] (Korean Information Security); MTCS Tier 3 [Singapore] (Multi-Tier Cloud Security Standard); My Number Act [Japan] (Personal Information Protection)
      Europe: C5 [Germany] (Operational Security Attestation); Cyber Essentials Plus [UK] (Cyber Threat Protection); G-Cloud [UK] (UK Government Standards); IT-Grundschutz [Germany] (Baseline Protection Methodology)
      Highlighted: GDPR [EU] (General Data Protection Regulation); CCPA (California Consumer Privacy Act of 2018)
  • 6. Internal challenges with development lifecycle. What challenges do I have, and what can I do? What are my major risks for data compromise?
  • 7. Situation two: External challenges with bad actors. What challenges do I have, and what can I do? What are my major risks for data compromise?
  • 8. How we’ve handled such challenges, pre-GDPR
      • Outsource credit card processing
      • Masking personally identifiable information (PII)
      • Firewall rules for distributed denial-of-service (DDoS)
      • Network lockdowns
      • Database encryption
      • Alerting
      • Environment automation (no SSH)
      • Validation and sanitization of your data input and output (SQL injection / cross-site scripting). Does the data look like it’s supposed to?
  • 9. How does a de-identified data lake help?
      • It protects what both internal and external bad actors want: the data!
      • It allows developers to focus on their goals: high-quality, tested software
  • 10. Then and now
      Then: Directives | Now: Laws
      Then: Best practices / Good ethics | Now: Regulatory requirements
      Then: No consequences | Now: Heavy fines
      Then: Overhead | Now: In design and necessity
  • 11. How do we resolve PII dangers?
      • Do we need to solve these individual issues?
      • Is there a solution architecture that solves all PII issues?
      • What best practices have you used to mitigate PII dangers?
  • 12. [Architecture diagram; labels as extracted:] Oracle; SQL Server; Teradata; Avro, Parquet; CSV, JSON, XML; DB, DWH, NoSQL, NewSQL, In-Memory; Structured & Richer file formats; SaaS, On premise; Social & Web Logs; Batch Ingestion; Compute Cluster; Amazon EMR; 2nd Party PII; Curated data; Amazon Redshift / Spectrum; Amazon RDS; Amazon DynamoDB; Analysis, ML, Search, Reporting & Dashboards; Data Catalog; Exploratory Analytics; Analytics; Data Processing at Scale; PII Masking/Filter ETL; 2nd Parties; Hashed PII; Decentralized Hashed PII Matching; HITCH Token + De-Identified Data; Amazon SageMaker; Amazon Macie; PII Compliance; AWS Glue; Amazon Athena; S3 Data Lake; Amazon QuickSight; Media Ingestion; Video & Images; S3 Staging; Image Scanning; Amazon Rekognition; Streaming; Kinesis; Amazon Elasticsearch Service; Identity & Security; IAM; Directory Service; KMS; Payroll Providers; Marketing Partners; Insurance Providers
  • 13.
  • 14. Why catalogs are key
      Drivers: New business initiatives; Advanced analytics; Regulatory compliance
      • How do I understand, document, and ensure proper usage?
      • What data is available to me?
      • How do I use my data for business value?
  • 15. Data Catalog current & future state
      Current state: Two types of data catalogs:
      1. Pure data cataloging for inventorying and identification
      2. Catalogs embedded in apps to make them more useful
      This has resulted in limited usefulness because there is no integration into larger enterprise information management activities.
      Desired state: Data marketplace with an actionable data catalog:
      - Find data
      - Transform data
      - Provision data
      - Maintain data governance and catalog currency
  • 16. Actionable Data Catalog via a self-service Data Platform
      [Diagram labels as extracted:] Self-Service Data Platform; RDBMS; Data lake; Catalogs; Data governance tools; Data discovery & catalog; Ingest; Transform; Improve; Files; RDBMS; Data lake; Reports; Data marketplace; Data management; Data consumers; Applications
      Actionable data catalog:
      • Enables a self-service Data Platform
      • Increases productivity of data producers and consumers
      • Governance and catalog currency are infused throughout the process
  • 17. Self-Service Data Platform: Discover, catalog & ingest
      • Catalog everything
          ▪ Relational database management system (RDBMS), Data Lakes, Automated Data Inventory
      • Leverage enterprise definitions & standards
      • Annotate & customize for business need
      • Empower SMEs to discover new data
      [Screenshot labels: Ingest; Catalog]
  • 18. Self-Service Data Platform: Prepare & manage
      • Enrich from catalog
          ▪ Leverage data profile & quality findings
          ▪ Build recipes to cleanse, join, and aggregate
      • Schedule, Manage, Operationalize
          ▪ Scale out
          ▪ Govern based on DM policies
      [Screenshot labels: Workflow designer; Token masking]
  • 19. Self-Service Data Platform: Marketplace
      • Collaborate
          ▪ Workspaces & rankings
          ▪ Annotate and customize
      • Share
          ▪ Provision to existing enterprise data warehouses (EDW) or data marts
          ▪ Deliver to data science data lake or zone
          ▪ Enable in data visualization or BI tools (Tableau)
      [Screenshot labels: Share; Provision]
  • 20. Deployment architecture for AWS data lake with ZDP
      [Architecture diagram; labels as extracted, duplicates across the two Availability Zones merged:] AWS Region; Availability Zone (two); Private subnet 1; Public subnet 1; Private subnet 2; Public subnet 2; NAT GW; Internet GW; Elastic IPs; Amazon S3 Endpoint; Storage GW; Zaloni bastion; Auto Scaling Group; Cluster instances; Zaloni Managed Clusters; Amazon RDS; Amazon Redshift; Amazon ES; Amazon Athena; Amazon Kinesis; Amazon Kinesis Data Firehose; Amazon S3; AWS Glue; Amazon QuickSight; AWS Directory Service; Amazon CloudWatch
  • 21. Zaloni’s Integrated Self-Service Data Platform (ZDP) offerings
      Pillars: Enable; Govern; Engage
      • Batch ingestion
      • Streaming ingestion
      • Metadata capture
      • Auto discovery
      • Data quality
      • Data lineage
      • Data mastering
      • Data privacy/security
      • Data enrichment
      • Data lifecycle management
      • Discovery catalog
      • Self-service ingestion
      • Self-service preparation
  • 22.
  • 23. Dataguise DgSecure capabilities
      • Detect: Find and report the exact quantity and location of sensitive data in structured, unstructured, and semi-structured content
      • Protect: Remediate your sensitive data exposure by masking and/or encrypting it at the element level
      • Monitor: Track how and where sensitive data is being accessed through a 360° dashboard
      • Right of access: Upon request, precisely find and report all records and other information of a specific individual
      • Right to erasure: Upon request, delete all records and other information of a specific individual
  • 24. Validating the knowns & finding the unknowns: Structured and semi-structured data
      Email | Customer ID | Transcript
      csalazar@example.com | 19664 | Just talked to Carlos Salazar
      mary@example.com | 23423 | Mary’s SSN is 000000000
      mateo@example.com | 99644 | Mateo is moving to Nevada
      NA | 02945 | It is expected to rain tomorrow
      Sensitive data types called out: Email; ID; Name, SSN, State
  • 25. Validating the knowns & finding the unknowns: Structured and semi-structured
      Original data:
      Email | Customer ID | Transcript
      csalazar@example.com | 19664 | Just talked to Carlos Salazar
      mary@example.com | 23423 | Mary’s SSN is 000000000
      mateo@example.com | 99644 | Mateo is moving to Nevada
      NA | 02945 | It is expected to rain tomorrow
      Sensitive data types called out: Email; ID; Name, SSN, State
      Masked data:
      Email | Customer ID | Transcript
      4t34gttt | 7462391 | Just talked to Jane Roe
      44e5325 | 1239474 | Jorge’s SSN is 666666666
      0we&yrw | 9983487 | Sofia is moving to Texas
      NA | 3344325 | It is expected to rain tomorrow
  • 26. Finding the needles in the haystack: Unstructured data
      [Diagram labels: Customers; Call Center; "Your call will be recorded for quality assurance"; Social Security Number; Full Name]
      Call transcript excerpt: "... this is Shirley Rodriguez, and my social is six six six six six six six six six is there any more information you need for my app ..."
  • 27.
  • 28. Match datasets without PII ever leaving your organization (Senate matching)
      • Remain in control: Matching subject to strict governance and licensing workflows. Audit all matching.
      • Private by design: PII never leaves your organization’s firewall
      • Future-proof compliance
      • The end of the PII honeypot: Decentralized matching occurs on hash fragments. Protect PII at all times.
  • 29. The decentralized matching process. Senate matching protects customer PII for data custodians:
      1. Tokenization: Anonymizes PII by replacing it with random tokens when datasets are uploaded to the contributor node
      2. Hashing: Protects PII by hashing it (a one-way, non-reversible transformation); the original PII is not stored
      3. Slicing: Distributes hash fragments across a distributed computing network
      4. Matching: Provides a matching service to link identities across different contributor nodes, without disclosing data or putting it at risk of misuse
      5. Governance: Performs these matches according to data exchange governance within Data Republic’s Senate Platform
      6. Auditable: Functions in a transparent and verifiable way so that analysts and custodians can build trust in the system, the data, and with each other
  • 30. Comparing methods for PII management
      The old way:
      • Custodian’s database (CRM or EDW) contains customer PII data that must be protected
      • The central data store is able to match customer records from multiple contributors
      • Matched customer tokens are used by an analyst to link datasets
      The new way with Senate matching:
      • Custodian’s database (CRM or EDW) contains customer PII data that must be protected
      • Senate Matching Contributor Node hashes the PII so that the original text is unrecoverable
      • Hash values are sliced into parts and stored in a distributed network
      • No single company can match or re-identify customer data
      • Matched customer tokens are used by an analyst to link datasets
  • 31. Senate matching process
      [Process diagram; labels as extracted:] Customer database; Contributor node; Decentralized network; Aggregator node; Your org’s environment; The matcher network; Senate; Attribute data uploaded; Data license + match approved; Match; Discovery workspace; Output; Anonymized data
      Captions:
      • After logging into the DR’s Senate Platform
      • The node returns the random tokens to the data custodian.
      • Custodian approves match request.
      • The Matcher network sends their list of matches to the Aggregator Node, which sends decrypted tokens.
      • Custodian sends anonymized data to Senate by substituting any identifying information with the random token generated by their Contributor Node
  • 32.
  • 33. The problem
      • The old way: Bringing data from sources like databases into data lakes and data warehouses to perform analytics
      • Today: There are many operational systems and sources of data
      • ETL needs to adapt: Risk is increased by compounding the number of locations where data is stored
      • Cloud-based repositories: Modern data lakes and analytics tools are all based in the cloud
  • 34. How data normally flows …
      [Flow diagram; labels as extracted:] Source data (Database or API); Extraction process; Amazon S3 data lake; Load process; Amazon Redshift staging table; Transformation process; Amazon Redshift destination table; Reporting process; Reports and extracts
  • 35. The solution: Transforming sensitive data
      The key to building a de-identified system is adding a sensitive data transformation step to the data extraction process.
      [Flow diagram; labels as extracted:] Source data (Database or API); Extraction and transformation process; Amazon S3 data lake; Load process; Amazon Redshift staging table; Post-load transformation; Amazon Redshift destination table; Reporting process (shown twice); Reports and extracts
  • 36. How customers use Etleap
      [Flow diagram; labels as extracted:] Source data (Database or API); Extraction, sensitive data filtering, milestoning, and transformation; Amazon S3 de-identified data lake; Data curation; Amazon Redshift data warehouse
  • 37. [Architecture diagram repeated from slide 12.]
  • 38. Key takeaways
      • Detailed catalog of locations where sensitive data resides
      • Methodology to audit, assess, and then handle PII
      • Significantly accelerated time to value: months to minutes
      • Improved consumer, employee, and shareholder trust and revenue
      • Risk of 2nd party misuse reduced, enables data monetization
  • 39. Thank you! Ryan Peterson, ryapet@amazon.com
  • 40.
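
Slide 35 describes adding a sensitive-data transformation step to the extraction process, and slide 25 shows records before and after masking. The following is a minimal Python sketch of such a step, not the method of any product named in the deck. The column names come from the sample table; the keyed-hash tokens, the `SECRET_KEY` value, the `[REDACTED-SSN]` marker, and the nine-digit pattern are illustrative assumptions. Unlike the slide, this sketch does not detect or replace personal names or locations in free text.

```python
import hashlib
import hmac
import re

# Hypothetical key for illustration only; a real pipeline would load it
# from a managed key store rather than hard-coding it.
SECRET_KEY = b"example-only-key"

# Assumed column names, modeled on the sample table from the slides.
PII_COLUMNS = ("email", "customer_id")

# Nine digits, optionally dash-separated: a deliberately simple SSN-like pattern.
SSN_PATTERN = re.compile(r"\b\d{3}-?\d{2}-?\d{4}\b")


def tokenize(value):
    """Return a deterministic keyed token (truncated HMAC-SHA256) for a value."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def deidentify(record):
    """Tokenize known PII columns and redact SSN-like digit runs in free text."""
    clean = dict(record)  # copy so the source record is left untouched
    for column in PII_COLUMNS:
        if clean.get(column) not in (None, "NA"):
            clean[column] = tokenize(str(clean[column]))
    clean["transcript"] = SSN_PATTERN.sub(
        "[REDACTED-SSN]", clean.get("transcript", "")
    )
    return clean


if __name__ == "__main__":
    row = {
        "email": "mary@example.com",
        "customer_id": "23423",
        "transcript": "Mary's SSN is 000000000",
    }
    print(deidentify(row))
```

Deterministic tokens are chosen here so that the same customer still joins across tables after de-identification; a design that needs unlinkable values would use random tokens with a protected lookup table instead.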
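
The hashing, slicing, and matching steps listed on slide 29 can be illustrated with a toy simulation. This is a sketch under stated assumptions, not Data Republic's actual protocol: the deck does not specify the hash function, how salts are shared, or how many fragments are used, so SHA-256, a shared `salt` string, and `NUM_SLICES = 4` are all hypothetical choices. Each simulated matcher node sees only one fragment of every hash, and an aggregator keeps the token pairs on which every node agrees.

```python
import hashlib

NUM_SLICES = 4  # assumed number of matcher nodes; not stated in the slides


def hash_pii(value, salt):
    """One-way hash of a normalized PII value (SHA-256 is an assumption)."""
    normalized = value.strip().lower()
    return hashlib.sha256((salt + normalized).encode("utf-8")).hexdigest()


def slice_hash(hex_digest, parts=NUM_SLICES):
    """Split a hex digest into equal-sized fragments, one per matcher node."""
    size = len(hex_digest) // parts
    return [hex_digest[i * size:(i + 1) * size] for i in range(parts)]


def node_matches(fragments_a, fragments_b):
    """One matcher node: pair tokens whose fragments on this node are equal."""
    index = {}
    for token, fragment in fragments_b.items():
        index.setdefault(fragment, []).append(token)
    return {
        (token_a, token_b)
        for token_a, fragment in fragments_a.items()
        for token_b in index.get(fragment, [])
    }


def decentralized_match(custodian_a, custodian_b, salt):
    """Aggregator: keep only token pairs reported by every matcher node.

    Each custodian dict maps a random token to the PII value it stands for.
    """
    sliced_a = {t: slice_hash(hash_pii(v, salt)) for t, v in custodian_a.items()}
    sliced_b = {t: slice_hash(hash_pii(v, salt)) for t, v in custodian_b.items()}
    per_node = [
        node_matches(
            {t: s[i] for t, s in sliced_a.items()},
            {t: s[i] for t, s in sliced_b.items()},
        )
        for i in range(NUM_SLICES)
    ]
    return set.intersection(*per_node)


if __name__ == "__main__":
    a = {"tokA1": "mary@example.com", "tokA2": "mateo@example.com"}
    b = {"tokB1": "Mary@Example.com ", "tokB2": "csalazar@example.com"}
    print(decentralized_match(a, b, "shared-salt"))
```

In this simulation everything runs in one process, so it shows only the data flow; the privacy property the slides describe depends on the fragments actually being held by separate parties.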