SlideShare a Scribd company logo
1 of 75
Download to read offline
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Best Practices to Secure Data Lake
on AWS
Varun Rao Bhamidimarri
Solution Architect
AWS
A N T 3 2 7
Tony Nguyen
Senior Big Data Consultant
AWS
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Understand
the value
proposition for the
data lake and how
Amazon Web
Services (AWS) can
help
1
of some best
practices for secure
data lake
implementation
2
into role/scenario
based approaches to
data lake security
3
What to expect
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Assumptions and Housekeeping
• Targeted towards anyone wanting to build a secure data lake on AWS
• Assumes:
• Foundational AWS knowledge
• High level knowledge of Data Lake and AWS Analytics service portfolio
• Knowledge of security concepts such as SSL / TLS, encryption, authentication /
authorization
• This session slides and recording will be shared online
• Please don’t forget to submit your feedback!
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
What is a data lake?
• Collect and store all data, at any scale,
and low cost
• Helps locate, curate, and secure your
data
• Provide democratized access to data
within your organization
• Quickly and easily perform new types of
data analysis
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Primary components of a data lake
Storage
Compute
Metadata/Catalog
Storage & streams
Catalogue & search
Entitlements
API & UI
Attributes of a modern
data architecture
Key pillars of a
data lake
Architectural layers of a
Data lake (without security)
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Primary components of a data lake
Storage
Compute
Metadata/Catalog
Architectural layers of a
data lake (without security)
• Object
• Block
• File
• Attached
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Primary components of a data lake
Storage
Compute
Metadata/Catalog
Architectural layers of a
data lake (without security)
index
search
• Curate
commission
decommission
lineage
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Primary components of a data lake
Storage
Compute
Metadata/Catalog
Architectural layers of a
data lake (without security)
• Server-based
• Serverless
• Hybrid
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Building a data lake on AWS
Batch
AWS STS
Metadata User Access
Data Movement Analytics and Machine Learning
Security/Governance
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Building a data lake on AWS
Batch
AWS STS
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Securing all of these tools is challenging
• Having such a diverse set of tools
from the ecosystem allows you to
choose the best tool for the job…
• …but also makes a single unified
solution for security challenging!
• How do you secure each layer,
while still satisfying your specific
security and compliance
requirements?
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Security challenges with data lakes
Data challenges
• Controlling access to data
• Data masking, row / column / cell
level encryption, key management
• Data loss / exfiltration
• Loss of data integrity
• Data provenance
• Compliance requirements
(GDPR and others)
Management challenges
• Central administration
• Federated authentication,
typically with Active
Directory
• Role-based access control
(RBAC)
• Centralized audit
• End-to-end data protection
(at-rest and in-transit)
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Shared responsibility model
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Shared responsibility model – service types
• Infrastructure
• Managed
• Serverless
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Shared responsibility model – comparison
Amazon EMR
• Amazon EC2 infrastructure needs to
be managed
• Root-level access via SSH
• Patching of instances
• Some level of Amazon CloudWatch /
Amazon CloudTrail logging is done
for customer, but not exhaustive
• Instance profile role, Amazon EMR
Service role need to be configured by
customer
• Local disk encryption, Amazon S3
encryption, etc. needs to be
configured by customer…
Amazon Athena
• No infrastructure to manage
• Service access is governed via IAM
policy documents
• Amazon S3 access is via bucket
policy / IAM policy
• Encryption is managed
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS helps you secure
Compliance
AWS Artifact
Amazon Inspector
AWS CloudHSM
Amazon Cognito
AWS CloudTrail
Security
Amazon GuardDuty
AWS Shield
AWS WAF
Amazon Macie
Amazon Virtual Private
Cloud (Amazon VPC)
Encryption
AWS Certificate Manager
AWS Key Management
Service (AWS KMS)
Encryption at rest
Encryption in transit
Bring your own keys, HSM
support
Identity
AWS Identity and Access
Management (IAM)
AWS Single Sign-On
Amazon Cloud Directory
AWS Directory Service
AWS Organizations
Customers need to have multiple levels of security, identity and access management,
encryption, and compliance to secure their data lake
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Amazon
S3 buckets and
prefixes
Amazon
EMR clusters
Team X
Amazon
Redshift clusters
“Coarse-grained”
ownership
Prefer “coarse-grained” ownership
• Teams own entire Amazon S3 buckets
and clusters
• Ownership segregated by AWS accounts
• Access control easier to setup and
maintain
• Suitable for autonomous teams
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Encrypt data at rest
Pick encryption mode for Amazon S3 objects
CSE - KMS CSE - C
SSE - KMS SSE - C SSE - S3Server side
Client side
AWS KMS S3 built-inCustom KMS
Key
Management
overhead
Encryption
point
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Compliance: Log and audit all AWS activity
• Log and continuously
monitor every account activity
and API calls with Amazon
CloudTrail
• Increase visibility into your user
and resource activity
• Log management and data
events into separate trails
• Centralize logs into separate
security account
• Disable S3 delete using IAM
Store data in
Amazon S3
Account event
occurs generating
API activity
CloudTrail captures
and records the
API activity
A log of API calls
is delivered to
S3 bucket and
optionally delivered
to Amazon
CloudWatch Logs
and CloudWatch
Events
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Security in the cloud - basics
• Account
• Federate accounts with Active Directory / Identity Provider
• Setup multi-factor authentication (MFA)
• Avoid using root account credentials
• IAM access should be least privilege
• Network
• Private VPC Subnets
• VPC endpoint/Interface endpoints
• Least privilege for Security groups
• Storage
• Encrypt using KMS
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Different types of roles
Security Admin
Data Lake on AWSData Curator
Analyst
Data Engineer
Data scientist
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Data workflow
• All roles have actions and responsibilities that correspond to each
phase in the overall data workflow
• Think of the data lake in terms of producers and consumers
COLLECT STORE DELIVER ANALYZESUBSCRIBEDISCOVER
What data do I have?
“Through 2018, 80% of data
lakes will not include effective
metadata management
capabilities, making them
inefficient.”
-Gartner
Data lake
on AWS
Storage | Archival storage | Data catalog
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Onboarding new data
Data Curator
Identify New
Data
Create Dataset
Definition
Load / Stage
Raw Data
Register Raw
Data against
Dataset
Definition
Data Owner Developer / Data
Engineer
Developer / Data
Engineer
COLLECT STORE
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Searching and accessing data
DELIVERSUBSCRIBEDISCOVER
Search for
Data in Data
Catalog
Identify Data,
Request
Access
Approve
Access
Access and
Query Data
Data Scientist /
Business User
Data Owner/Security
Admin
Data Scientist /
Analyst
Data Scientist /
Business User
ANALYZE
DISCOVER
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Data analytics tools and access patterns
Machine Learning/Deep Learning
IDE
Central
Storage
spectrum
Programmatic
SQL
API
JDBC/ODBC
CLI
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Security admin tasks
• Setup security guardrails
• Preemptive and detective controls
• Provide data access across teams/environments
• Validate security requirements based on data classification
• Verify Data owner/producer has authorized access
• Run regular audits
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Amazon S3 – preemptive controls
• Create buckets based on business domains
• Assign bucket policies
• Restrict by VPC, HTTPS, IP filters, KMS keys
• Restrict using Tags/Conditions
• "Condition": {"StringEquals": {"S3:ResourceTag/HIPAA":"True"}
• "Condition": {"StringEquals": {"aws:UserAgent": "AWS Redshift/Spectrum"}
• Enable encryption/Enable versioning
• MFA delete
• Enable backups – across accounts/regions
• IAM permission boundary
• S3 public access setting N E W !
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Amazon S3 data – detective controls
• Enable AWS Config to detect S3 bucket level changes
• s3-bucket-public-read-prohibited, s3-bucket-public-write-prohibited, s3-bucket-ssl-
requests-only
• S3 data access audit using CloudTrail – Log to separate CloudWatch
logs
• Kerberos enabled EMR clusters allows you to track AD user
• Use Amazon GuardDuty to detect unauthorized and unexpected
activity
• Enable Amazon Macie to classify sensitive data
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Encrypt data in transit
Point “A” Point “B” Data flow protection
Enterprise data
sources
Amazon S3 Encrypted with SSL/TLS; S3 requests signed with AWS Sigv4
Amazon S3 Amazon EMR Encrypted with SSL/TLS
Amazon S3 Amazon Redshift Encrypted with SSL/TLS
Amazon EMR Clients Encrypted with SSL/TLS; varies with Hadoop application client
Amazon Redshift Clients Supports SSL/TLS; Requires configuration
Apache Hadoop on Amazon EMR
• Hadoop RPC encryption
• HDFS Block data transfer encryption
• KMS over HTTPS is not enabled by default with Hadoop KMS
• May vary with EMR release (such as Tez and Spark in release 5.0.0+)
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AD groups
Security authorization mapping
IAM roles Redshift DB Groups
EMR Security
configuration
AD group based
policies
Glue catalog
resource policies
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Map database ACL’s to db grant/glue policy
catalog.user_table
grant group developer select on catalog.user_table
AD group: developer
Action: ['glue:GetTable*’, ‘glue:GetPartiton*’]
Principal: [“arn:aws:iam::<account>:role/developer_role”]
Resource:[“arn:aws:glue:<region>:<account>:table/gluecatalog/user_table”,
“arn:aws:glue:<region>:<account-id>:table/gluecatalog/user_table/*”]
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Map storage ACL’s to Amazon S3 policy
s3://bucket/path/
Effect: Allow
Action: [s3:ListBucket’, “s3:GetObject”]
Principal: [“arn:aws:iam::<account>:role/developer_role”]
Resource: [“s3://bucket/path”, “s3://bucket/path/*”]
AD group: developer
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Scenario – retail company X
Sales
SKU
customer_id
date_of_purchase
price
Product
SKU
category
product_name
manufacturer
Customer
customer_id
age
gender
location
cc_info
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Gather insights from the data
• Business user (External vendor – belongs to a manufacturer)
• Sales by product category (cannot see other manufacture's data)
• Sales by location
• Get sales forecast by product
• Analyst (Employee – may belong to a Product line/Business unit)
• Sales by product category
• Sales by location
• Data scientist (Employee – may not belong to a Business Unit)
• Forecast the sales of a specific product, based on age group, location and time of the year
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Workflow – onboarding new data
Security AdminAnalyst/Business
user
Analyst/Business
user
ANALYZE
DISCOVER
DELIVERSUBSCRIBEDISCOVER
Curator
Data Engineer
STORE
Get Sales
by product
category
Setup data
pipeline
Register to
data
catalog
Setup
appropriate
permissions
Create and
consume
reports
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Role based tasks
Data EngineerCurator Security Admin
Setup staging catalog
Enable access to Data
Engineer
Verifies and
commissions dataset to
production catalog
Setup Amazon EMR
cluster
Setup process to
move data from
source into Amazon
S3
Orchestrate and
schedule the job
Analyst
Create and publish
dashboard
Enable access to
Analyst
Setup Row-level
security for
business users
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Grant data/catalog access – data engineer
Central Catalog Account
Staging
Curator
Data Engineer
Business unit account
Data Engineer
Role
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Onboarding new data – security/configuration
"Classification": "spark-hive-site", "Properties":
{
"hive.metastore.client.factory.class":
"com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
"hive.metastore.glue.catalogid": "acct-id"
}
Effect: Allow
Action: ['glue:*Database*’, ‘glue:*Table*’,’glue:*Partition*’]
Effect: Allow
Action: ['s3:PutObject', 's3:GetObject', 's3:DeleteObject']
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS Glue catalog - resource policies
• Fine-grained access control to Catalog using IAM policies
• Restrict what they can view and query
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Build the data pipeline – Amazon EMR
Data
Extract/Collect User/VisualizeStore Process/Analyze
Answer
and
Insight
Data
Data Engineer
AWS DataSync
SFTP Service for Amazon S3
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Amazon EMR - authentication
• Configure Kerberos for cluster authentication
• LDAP for HiveServer2, Hue, Presto, Zeppelin
• Perimeter security using Apache Knox
• Simplify authentication of various Hadoop services and UI's
• Mask service specific URL's/Ports by acting as a Proxy
• Enable SSL/TLS termination at the perimeter
• Ease management of published endpoints across multiple clusters
• Supports federation
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Amazon EMR – storage authorization
• Control access to Amazon S3 based
on user’s AD groups
• Use different IAM roles for EMRFS
requests to Amazon S3
• These IAM roles can be mapped to
users, groups or the location of data
in Amazon S3.
AD groups IAM roles
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Amazon EMR – service authorization
• Apache Ranger provides
authorization of Hadoop cluster
services
• Eg: Hive tables, HDFS files, HBase etc
• Also provides Audits
• Column masking and Row filtering
for Hive
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Best practices - Amazon EMR security
• EMRFS storage AuthZ
• Apache Ranger
• Table and SQL-level authorization
for Hive using HiveServer2
• Role-based Authorization with AD
• IAM
• Amazon EMR logs to Amazon S3
• Amazon S3 Access Logs
• Apache Ranger Audit
• Amazon CloudTrail – Amazon EMR
API’s/EMRFS calls
AuthorizationAuthentication
Data protection at rest
Audit
Data protection at motion Compliance Programs
• Kerberos
• Knox, Shiro
• LDAP / AD integration
• SSE-S3 , SSE-KMS, Amazon S3
Client Encryption
• Disk encryption using AWS KMS
• SELinux using EMR BA
• Custom AMI
• SSL/TLS in transit using Security
configurations
• SSL/TLS for calls to S3 (default)
• SOC1,2,3
• ISO
• PCI DSS
• FedRAMP
• HIPAA BAA
• DoD SRG IL2/IL4
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Data ready - what next?
• Curator
• Verifies data registered with staging catalog
• Runs sanity checks
• Commissions the dataset into the production catalog
• Creates a View to filter data by Product category
• select * from sales join product where sales.sku = product.sku and category = ‘Electronics’
• Security Admin
• Enable access to Analyst
• Setup Row-level security
• Analyst
• Create and publish dashboard
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Amazon Athena - create view
CREATE VIEW sales_electronics AS
SELECT sum(price) FROM sales, product
WHERE sales.sku = product.sku and
product.category = ‘Electronics’
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Amazon Athena – secure data flow
Central Catalog Account Business unit account
Data analyst
On-Premise
Identity
provider (IDP)
SAML token
Role
Production
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Amazon QuickSight - data governance
Create managed datasets that give power users and authors the flexibility
to perform self-serve analytics on data that you control.
Create datasets that:
• Can be shared with any user
• Automatically refresh
• Have row level security
• Users cannot modify
• Dynamically update
with changes
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Amazon QuickSight – secure data flow
Central catalog account Business unit account
Business User
On-Premise
Identity
provider (IDP)
SAML token
Role
Row level
filter based
on the
manufacturer
Production
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Amazon Athena - security Controls
• IAM policies mapped to Roles –
these polices are passed all the way
to storage layer
• Views with Glue Catalog resource
policies
• All API calls are logged to
CloudTrail
• S3 Access Logs can provide data
access information
AuthorizationAuthentication
Data protection at rest
Audit
Data protection in motion Compliance Programs
• IAM federation
• Cross-account
• EC2 instance profile
• CSE-KMS
• SSE-KMS
• SSE-S3
• Use separate KMS keys for source
and destination buckets
• JDBC connections use TLS/SSL by
default
• Data transfer between S3 and
Athena is encrypted by TLS
• SOC 1,2,3
• HIPAA BAA
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Amazon QuickSight - security controls
• IAM policies
• Row-level Security
• Amazon CloudTrail
• Amazon S3 Access Logs
AuthorizationAuthentication
Data protection at rest
Audit
Data protection in motion Compliance Programs
• IAM federation
• QuickSight-only users
• Cross-account via Amazon S3
• MFA
• Differences between Standard and
Enterprise
• Encrypt your source datasets and
Amazon S3
• QuickSight Enterprise edition: data
at rest in SPICE is also encrypted
• SSL/TLS
• Interface Endpoints to VPCs and
Direct Connect
• HIPAA
• SOC2
• PCI-DSS
• ISO
• FedRAMP
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Workflow – access existing data
Security Engineer
Query Sales by
Location
Access the user
location data
(user id, state)
Review and
authorize
specific
columns
Setup grants
in database
Analyze data
Analyst
Data owner -
Marketing
Analyst Analyst
ANALYZE
DISCOVER
DELIVERSUBSCRIBEDISCOVER
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Amazon Redshift – secure data flow
Central Data Catalog Account
Business team account Business
analyst
On-Premise
Identity
provider (IDP)
SAML token
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Redshift federated authentication
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Amazon Redshift federated authorization
• Setup Amazon Redshift DBGroups
• For example - CREATE Group ‘XXX’
• Use Grants to setup authorization access
• GRANT SELECT on table ‘YYYY’ to group ‘group1’
• Configure SAML assertion for your IDP
<Attribute Name="https://redshift.amazon.com/SAML/Attributes/DbGroups">
<AttributeValue>group1</AttributeValue>
<AttributeValue>group2</AttributeValue>
<AttributeValue>group3</AttributeValue>
</Attribute>
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Amazon Redshift audit logging
Cloudtrail
• Amazon Redshift
API calls
• KMS API calls
• S3 calls
Amazon S3
• Connection logs
• User logs
• User activity logs
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Best practices - Amazon Redshift security
• DB groups with grants
• Restrict access by IAM policy
• Use condition keys “ResourceTag”
and “RequestTag”
• API logs to Amazon Cloudtrail
• Logs to Amazon S3
• Connection logs
• User logs
• User activity logs
AuthorizationAuthentication
Data protection at rest
Audit
Data protection at motion Compliance Programs
• IAM federation
• DB username and password
• KMS
• HSM – AWS CloudHSM Classic
• Key rotation – CMK, DEK
• SSL (ACM) - Set “require_SSL =
true” in parameter group
• FIPS 140-2 support
• SOC1,2,3
• PCI DSS Level 1
• FedRAMP
• HIPAA eligible with BAA
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Workflow – build predictive model
Data ScientistData scientist Data Owner
Want to
predict sales
by location
Access the user
location data
(user id, state)
Review and
authorize
specific
columns
Pull data into
notebook,
develop and
train model
Deploy and
test model
ANALYZE
DISCOVER
DELIVERSUBSCRIBEDISCOVER
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Amazon SageMaker with Apache Spark
Data scientist
Spark Magic
Model EndpointsModels
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Best practices – Amazon Sagemaker security
• Restrict access by IAM policy and
condition keys
• API logs to Amazon Cloudtrail -
exception of InvokeEndpoint
AuthorizationAuthentication
Data protection at rest
Audit
Data protection at motion Compliance Programs
• IAM federation
• KMS based encryption for
• Notebooks
• Training jobs
• Amazon S3 location to store
models
• Endpoint
• HTTPS for API/Console
• Notebooks
• VPC enabled
• Interface endpoint
• Limit by IP
• Training jobs/Endpoints
• VPC enabled
• PCI DSS
• HIPAA eligible with BAA
• ISO
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Amazon.com’s vision is to be
the earth’s most customer—
centric company; where people
can find anything they want
to buy online.
Challenge:
Load 500K+ transactions each day, and
serve 300K+ queries/extracts each day
from Amazon businsses (Amazon.com,
Amazon Prime, Amazon Music, Amazon
Alexa, Amazon Video, and Twitch).
Solution:
• Land data in S3 as a data lake
• Use Redshift as preferred SQL based
analysis by business users, and EMR
for machine learning
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Amazon.com uses AWS for data lakes & analytics
• DynamoDB capturing all
Amazon.com transactions
• Everything from DynamoDB,
RDS PostgreSQL and Kinesis
fed to a Amazon S3 data
lake
• AWS Glue used to catalog
the data
• Amazon Redshift used for all
SQL-based queries, and
Amazon EMR for all machine
learning and big data
processing
• End-users use Amazon
QuickSight for visualizations
AWS Glue
Catalog
Amazon
QuickSight
Amazon S3
Amazon
Athena
Amazon EMR
Amazon
DynamoDB
PostgreSQL
Amazon Kinesis
Amazon
Redshift
Machine
Learning
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Summary
• Federate access
• Setup roles and responsibility matrix within your organization
• Leverage centralized data catalog
• Use both preemptive and detective controls
• Perform regular audits
• Secure storage, catalog and processing layers
• Incentivize teams to register datasets to catalog
• Streamline process between data producers and data consumers
Thank you!
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Varun Rao Bhamidimarri
vbhamidi@amazon.com
Tony Nguyen
aanwin@amazon.com
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.

More Related Content

What's hot

Protecting Your Data- AWS Security Tools and Features
Protecting Your Data- AWS Security Tools and FeaturesProtecting Your Data- AWS Security Tools and Features
Protecting Your Data- AWS Security Tools and FeaturesAmazon Web Services
 
데브옵스 엔지니어를 위한 신규 운영 서비스 - 김필중, AWS 개발 전문 솔루션즈 아키텍트 / 김현민, 메가존클라우드 솔루션즈 아키텍트 :...
데브옵스 엔지니어를 위한 신규 운영 서비스 - 김필중, AWS 개발 전문 솔루션즈 아키텍트 / 김현민, 메가존클라우드 솔루션즈 아키텍트 :...데브옵스 엔지니어를 위한 신규 운영 서비스 - 김필중, AWS 개발 전문 솔루션즈 아키텍트 / 김현민, 메가존클라우드 솔루션즈 아키텍트 :...
데브옵스 엔지니어를 위한 신규 운영 서비스 - 김필중, AWS 개발 전문 솔루션즈 아키텍트 / 김현민, 메가존클라우드 솔루션즈 아키텍트 :...Amazon Web Services Korea
 
글로벌 기업들의 효과적인 데이터 분석을 위한 Data Lake 구축 및 분석 사례 - 김준형 (AWS 솔루션즈 아키텍트)
글로벌 기업들의 효과적인 데이터 분석을 위한 Data Lake 구축 및 분석 사례 - 김준형 (AWS 솔루션즈 아키텍트)글로벌 기업들의 효과적인 데이터 분석을 위한 Data Lake 구축 및 분석 사례 - 김준형 (AWS 솔루션즈 아키텍트)
글로벌 기업들의 효과적인 데이터 분석을 위한 Data Lake 구축 및 분석 사례 - 김준형 (AWS 솔루션즈 아키텍트)Amazon Web Services Korea
 
게임서비스를 위한 ElastiCache 활용 전략 :: 구승모 솔루션즈 아키텍트 :: Gaming on AWS 2016
게임서비스를 위한 ElastiCache 활용 전략 :: 구승모 솔루션즈 아키텍트 :: Gaming on AWS 2016게임서비스를 위한 ElastiCache 활용 전략 :: 구승모 솔루션즈 아키텍트 :: Gaming on AWS 2016
게임서비스를 위한 ElastiCache 활용 전략 :: 구승모 솔루션즈 아키텍트 :: Gaming on AWS 2016Amazon Web Services Korea
 
HSBC and AWS Day - AWS foundations
HSBC and AWS Day - AWS foundationsHSBC and AWS Day - AWS foundations
HSBC and AWS Day - AWS foundationsAmazon Web Services
 
Terraform을 기반한 AWS 기반 대규모 마이크로서비스 인프라 운영 노하우 - 이용욱, 삼성전자 :: AWS Summit Seoul ...
Terraform을 기반한 AWS 기반 대규모 마이크로서비스 인프라 운영 노하우 - 이용욱, 삼성전자 :: AWS Summit Seoul ...Terraform을 기반한 AWS 기반 대규모 마이크로서비스 인프라 운영 노하우 - 이용욱, 삼성전자 :: AWS Summit Seoul ...
Terraform을 기반한 AWS 기반 대규모 마이크로서비스 인프라 운영 노하우 - 이용욱, 삼성전자 :: AWS Summit Seoul ...Amazon Web Services Korea
 
AWS Backup을 이용한 데이터베이스의 백업 자동화와 편리한 복구방법
AWS Backup을 이용한 데이터베이스의 백업 자동화와 편리한 복구방법AWS Backup을 이용한 데이터베이스의 백업 자동화와 편리한 복구방법
AWS Backup을 이용한 데이터베이스의 백업 자동화와 편리한 복구방법Amazon Web Services Korea
 
Amazon SageMaker 모델 학습 방법 소개::최영준, 솔루션즈 아키텍트 AI/ML 엑스퍼트, AWS::AWS AIML 스페셜 웨비나
Amazon SageMaker 모델 학습 방법 소개::최영준, 솔루션즈 아키텍트 AI/ML 엑스퍼트, AWS::AWS AIML 스페셜 웨비나Amazon SageMaker 모델 학습 방법 소개::최영준, 솔루션즈 아키텍트 AI/ML 엑스퍼트, AWS::AWS AIML 스페셜 웨비나
Amazon SageMaker 모델 학습 방법 소개::최영준, 솔루션즈 아키텍트 AI/ML 엑스퍼트, AWS::AWS AIML 스페셜 웨비나Amazon Web Services Korea
 
워크로드 특성에 따른 안전하고 효율적인 Data Lake 운영 방안
워크로드 특성에 따른 안전하고 효율적인 Data Lake 운영 방안워크로드 특성에 따른 안전하고 효율적인 Data Lake 운영 방안
워크로드 특성에 따른 안전하고 효율적인 Data Lake 운영 방안Amazon Web Services Korea
 
Amazon Personalize 개인화 추천 모델 만들기::김태수, 솔루션즈 아키텍트, AWS::AWS AIML 스페셜 웨비나
Amazon Personalize 개인화 추천 모델 만들기::김태수, 솔루션즈 아키텍트, AWS::AWS AIML 스페셜 웨비나Amazon Personalize 개인화 추천 모델 만들기::김태수, 솔루션즈 아키텍트, AWS::AWS AIML 스페셜 웨비나
Amazon Personalize 개인화 추천 모델 만들기::김태수, 솔루션즈 아키텍트, AWS::AWS AIML 스페셜 웨비나Amazon Web Services Korea
 
Amazon Personalize 소개 (+ 실습 구성)::김영진, 솔루션즈 아키텍트, AWS::AWS AIML 스페셜 웨비나
Amazon Personalize 소개 (+ 실습 구성)::김영진, 솔루션즈 아키텍트, AWS::AWS AIML 스페셜 웨비나Amazon Personalize 소개 (+ 실습 구성)::김영진, 솔루션즈 아키텍트, AWS::AWS AIML 스페셜 웨비나
Amazon Personalize 소개 (+ 실습 구성)::김영진, 솔루션즈 아키텍트, AWS::AWS AIML 스페셜 웨비나Amazon Web Services Korea
 
20190326 AWS Black Belt Online Seminar Amazon CloudWatch
20190326 AWS Black Belt Online Seminar Amazon CloudWatch20190326 AWS Black Belt Online Seminar Amazon CloudWatch
20190326 AWS Black Belt Online Seminar Amazon CloudWatchAmazon Web Services Japan
 
Build Data Lakes & Analytics on AWS: Patterns & Best Practices
Build Data Lakes & Analytics on AWS: Patterns & Best PracticesBuild Data Lakes & Analytics on AWS: Patterns & Best Practices
Build Data Lakes & Analytics on AWS: Patterns & Best PracticesAmazon Web Services
 
AWS Lake Formation을 통한 손쉬운 데이터 레이크 구성 및 관리 - 윤석찬 :: AWS Unboxing 온라인 세미나
AWS Lake Formation을 통한 손쉬운 데이터 레이크 구성 및 관리 - 윤석찬 :: AWS Unboxing 온라인 세미나AWS Lake Formation을 통한 손쉬운 데이터 레이크 구성 및 관리 - 윤석찬 :: AWS Unboxing 온라인 세미나
AWS Lake Formation을 통한 손쉬운 데이터 레이크 구성 및 관리 - 윤석찬 :: AWS Unboxing 온라인 세미나Amazon Web Services Korea
 
Building Your Data Lake on AWS - Level 200
Building Your Data Lake on AWS - Level 200Building Your Data Lake on AWS - Level 200
Building Your Data Lake on AWS - Level 200Amazon Web Services
 
AWS Summit Seoul 2023 | 실시간 CDC 데이터 처리! Modern Transactional Data Lake 구축하기
AWS Summit Seoul 2023 | 실시간 CDC 데이터 처리! Modern Transactional Data Lake 구축하기AWS Summit Seoul 2023 | 실시간 CDC 데이터 처리! Modern Transactional Data Lake 구축하기
AWS Summit Seoul 2023 | 실시간 CDC 데이터 처리! Modern Transactional Data Lake 구축하기Amazon Web Services Korea
 
Aurora MySQL Backtrack을 이용한 빠른 복구 방법 - 진교선 :: AWS Database Modernization Day 온라인
Aurora MySQL Backtrack을 이용한 빠른 복구 방법 - 진교선 :: AWS Database Modernization Day 온라인Aurora MySQL Backtrack을 이용한 빠른 복구 방법 - 진교선 :: AWS Database Modernization Day 온라인
Aurora MySQL Backtrack을 이용한 빠른 복구 방법 - 진교선 :: AWS Database Modernization Day 온라인Amazon Web Services Korea
 
[AWS Migration Workshop] 데이터베이스를 AWS로 손쉽게 마이그레이션 하기
[AWS Migration Workshop]  데이터베이스를 AWS로 손쉽게 마이그레이션 하기[AWS Migration Workshop]  데이터베이스를 AWS로 손쉽게 마이그레이션 하기
[AWS Migration Workshop] 데이터베이스를 AWS로 손쉽게 마이그레이션 하기Amazon Web Services Korea
 

What's hot (20)

Protecting Your Data- AWS Security Tools and Features
Protecting Your Data- AWS Security Tools and FeaturesProtecting Your Data- AWS Security Tools and Features
Protecting Your Data- AWS Security Tools and Features
 
데브옵스 엔지니어를 위한 신규 운영 서비스 - 김필중, AWS 개발 전문 솔루션즈 아키텍트 / 김현민, 메가존클라우드 솔루션즈 아키텍트 :...
데브옵스 엔지니어를 위한 신규 운영 서비스 - 김필중, AWS 개발 전문 솔루션즈 아키텍트 / 김현민, 메가존클라우드 솔루션즈 아키텍트 :...데브옵스 엔지니어를 위한 신규 운영 서비스 - 김필중, AWS 개발 전문 솔루션즈 아키텍트 / 김현민, 메가존클라우드 솔루션즈 아키텍트 :...
데브옵스 엔지니어를 위한 신규 운영 서비스 - 김필중, AWS 개발 전문 솔루션즈 아키텍트 / 김현민, 메가존클라우드 솔루션즈 아키텍트 :...
 
글로벌 기업들의 효과적인 데이터 분석을 위한 Data Lake 구축 및 분석 사례 - 김준형 (AWS 솔루션즈 아키텍트)
글로벌 기업들의 효과적인 데이터 분석을 위한 Data Lake 구축 및 분석 사례 - 김준형 (AWS 솔루션즈 아키텍트)글로벌 기업들의 효과적인 데이터 분석을 위한 Data Lake 구축 및 분석 사례 - 김준형 (AWS 솔루션즈 아키텍트)
글로벌 기업들의 효과적인 데이터 분석을 위한 Data Lake 구축 및 분석 사례 - 김준형 (AWS 솔루션즈 아키텍트)
 
게임서비스를 위한 ElastiCache 활용 전략 :: 구승모 솔루션즈 아키텍트 :: Gaming on AWS 2016
게임서비스를 위한 ElastiCache 활용 전략 :: 구승모 솔루션즈 아키텍트 :: Gaming on AWS 2016게임서비스를 위한 ElastiCache 활용 전략 :: 구승모 솔루션즈 아키텍트 :: Gaming on AWS 2016
게임서비스를 위한 ElastiCache 활용 전략 :: 구승모 솔루션즈 아키텍트 :: Gaming on AWS 2016
 
HSBC and AWS Day - AWS foundations
HSBC and AWS Day - AWS foundationsHSBC and AWS Day - AWS foundations
HSBC and AWS Day - AWS foundations
 
Terraform을 기반한 AWS 기반 대규모 마이크로서비스 인프라 운영 노하우 - 이용욱, 삼성전자 :: AWS Summit Seoul ...
Terraform을 기반한 AWS 기반 대규모 마이크로서비스 인프라 운영 노하우 - 이용욱, 삼성전자 :: AWS Summit Seoul ...Terraform을 기반한 AWS 기반 대규모 마이크로서비스 인프라 운영 노하우 - 이용욱, 삼성전자 :: AWS Summit Seoul ...
Terraform을 기반한 AWS 기반 대규모 마이크로서비스 인프라 운영 노하우 - 이용욱, 삼성전자 :: AWS Summit Seoul ...
 
AWS Backup을 이용한 데이터베이스의 백업 자동화와 편리한 복구방법
AWS Backup을 이용한 데이터베이스의 백업 자동화와 편리한 복구방법AWS Backup을 이용한 데이터베이스의 백업 자동화와 편리한 복구방법
AWS Backup을 이용한 데이터베이스의 백업 자동화와 편리한 복구방법
 
Amazon SageMaker 모델 학습 방법 소개::최영준, 솔루션즈 아키텍트 AI/ML 엑스퍼트, AWS::AWS AIML 스페셜 웨비나
Amazon SageMaker 모델 학습 방법 소개::최영준, 솔루션즈 아키텍트 AI/ML 엑스퍼트, AWS::AWS AIML 스페셜 웨비나Amazon SageMaker 모델 학습 방법 소개::최영준, 솔루션즈 아키텍트 AI/ML 엑스퍼트, AWS::AWS AIML 스페셜 웨비나
Amazon SageMaker 모델 학습 방법 소개::최영준, 솔루션즈 아키텍트 AI/ML 엑스퍼트, AWS::AWS AIML 스페셜 웨비나
 
워크로드 특성에 따른 안전하고 효율적인 Data Lake 운영 방안
워크로드 특성에 따른 안전하고 효율적인 Data Lake 운영 방안워크로드 특성에 따른 안전하고 효율적인 Data Lake 운영 방안
워크로드 특성에 따른 안전하고 효율적인 Data Lake 운영 방안
 
Amazon Personalize 개인화 추천 모델 만들기::김태수, 솔루션즈 아키텍트, AWS::AWS AIML 스페셜 웨비나
Amazon Personalize 개인화 추천 모델 만들기::김태수, 솔루션즈 아키텍트, AWS::AWS AIML 스페셜 웨비나Amazon Personalize 개인화 추천 모델 만들기::김태수, 솔루션즈 아키텍트, AWS::AWS AIML 스페셜 웨비나
Amazon Personalize 개인화 추천 모델 만들기::김태수, 솔루션즈 아키텍트, AWS::AWS AIML 스페셜 웨비나
 
Amazon Personalize 소개 (+ 실습 구성)::김영진, 솔루션즈 아키텍트, AWS::AWS AIML 스페셜 웨비나
Amazon Personalize 소개 (+ 실습 구성)::김영진, 솔루션즈 아키텍트, AWS::AWS AIML 스페셜 웨비나Amazon Personalize 소개 (+ 실습 구성)::김영진, 솔루션즈 아키텍트, AWS::AWS AIML 스페셜 웨비나
Amazon Personalize 소개 (+ 실습 구성)::김영진, 솔루션즈 아키텍트, AWS::AWS AIML 스페셜 웨비나
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWS
 
20190326 AWS Black Belt Online Seminar Amazon CloudWatch
20190326 AWS Black Belt Online Seminar Amazon CloudWatch20190326 AWS Black Belt Online Seminar Amazon CloudWatch
20190326 AWS Black Belt Online Seminar Amazon CloudWatch
 
Build Data Lakes & Analytics on AWS: Patterns & Best Practices
Build Data Lakes & Analytics on AWS: Patterns & Best PracticesBuild Data Lakes & Analytics on AWS: Patterns & Best Practices
Build Data Lakes & Analytics on AWS: Patterns & Best Practices
 
AWS Lake Formation을 통한 손쉬운 데이터 레이크 구성 및 관리 - 윤석찬 :: AWS Unboxing 온라인 세미나
AWS Lake Formation을 통한 손쉬운 데이터 레이크 구성 및 관리 - 윤석찬 :: AWS Unboxing 온라인 세미나AWS Lake Formation을 통한 손쉬운 데이터 레이크 구성 및 관리 - 윤석찬 :: AWS Unboxing 온라인 세미나
AWS Lake Formation을 통한 손쉬운 데이터 레이크 구성 및 관리 - 윤석찬 :: AWS Unboxing 온라인 세미나
 
Building-a-Data-Lake-on-AWS
Building-a-Data-Lake-on-AWSBuilding-a-Data-Lake-on-AWS
Building-a-Data-Lake-on-AWS
 
Building Your Data Lake on AWS - Level 200
Building Your Data Lake on AWS - Level 200Building Your Data Lake on AWS - Level 200
Building Your Data Lake on AWS - Level 200
 
AWS Summit Seoul 2023 | 실시간 CDC 데이터 처리! Modern Transactional Data Lake 구축하기
AWS Summit Seoul 2023 | 실시간 CDC 데이터 처리! Modern Transactional Data Lake 구축하기AWS Summit Seoul 2023 | 실시간 CDC 데이터 처리! Modern Transactional Data Lake 구축하기
AWS Summit Seoul 2023 | 실시간 CDC 데이터 처리! Modern Transactional Data Lake 구축하기
 
Aurora MySQL Backtrack을 이용한 빠른 복구 방법 - 진교선 :: AWS Database Modernization Day 온라인
Aurora MySQL Backtrack을 이용한 빠른 복구 방법 - 진교선 :: AWS Database Modernization Day 온라인Aurora MySQL Backtrack을 이용한 빠른 복구 방법 - 진교선 :: AWS Database Modernization Day 온라인
Aurora MySQL Backtrack을 이용한 빠른 복구 방법 - 진교선 :: AWS Database Modernization Day 온라인
 
[AWS Migration Workshop] 데이터베이스를 AWS로 손쉽게 마이그레이션 하기
[AWS Migration Workshop]  데이터베이스를 AWS로 손쉽게 마이그레이션 하기[AWS Migration Workshop]  데이터베이스를 AWS로 손쉽게 마이그레이션 하기
[AWS Migration Workshop] 데이터베이스를 AWS로 손쉽게 마이그레이션 하기
 

Similar to Best Practices to Secure Data Lake on AWS (ANT327) - AWS re:Invent 2018

Getting Started with AWS Security
Getting Started with AWS SecurityGetting Started with AWS Security
Getting Started with AWS SecurityAmazon Web Services
 
SecuringYourCustomersDataFromDayOne_SFStartupDay
SecuringYourCustomersDataFromDayOne_SFStartupDaySecuringYourCustomersDataFromDayOne_SFStartupDay
SecuringYourCustomersDataFromDayOne_SFStartupDayAmazon Web Services
 
Securing Your Customers Data From Day One
Securing Your Customers Data From Day OneSecuring Your Customers Data From Day One
Securing Your Customers Data From Day OneAmazon Web Services
 
AWS18_StartupDayToronto_SecuringYourCustomersDataFromDayOne
AWS18_StartupDayToronto_SecuringYourCustomersDataFromDayOneAWS18_StartupDayToronto_SecuringYourCustomersDataFromDayOne
AWS18_StartupDayToronto_SecuringYourCustomersDataFromDayOneAmazon Web Services
 
Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018
Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018
Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018Amazon Web Services
 
Securing Customer Data from Day 1 - AWS Startup Day Boston 2018.pdf
Securing Customer Data from Day 1 - AWS Startup Day Boston 2018.pdfSecuring Customer Data from Day 1 - AWS Startup Day Boston 2018.pdf
Securing Customer Data from Day 1 - AWS Startup Day Boston 2018.pdfAmazon Web Services
 
AWS STARTUP DAY 2018 I Securing Your Customer Data From Day One
AWS STARTUP DAY 2018 I Securing Your Customer Data From Day OneAWS STARTUP DAY 2018 I Securing Your Customer Data From Day One
AWS STARTUP DAY 2018 I Securing Your Customer Data From Day OneAWS Germany
 
Deep dive - AWS security by design
Deep dive - AWS security by designDeep dive - AWS security by design
Deep dive - AWS security by designRichard Harvey
 
Deep Dive - AWS Security by Design
Deep Dive - AWS Security by DesignDeep Dive - AWS Security by Design
Deep Dive - AWS Security by DesignAmazon Web Services
 
Crash Course in Security Best Practices, AWS Startup Day Cape Town 2018
Crash Course in Security Best Practices, AWS Startup Day Cape Town 2018Crash Course in Security Best Practices, AWS Startup Day Cape Town 2018
Crash Course in Security Best Practices, AWS Startup Day Cape Town 2018Amazon Web Services
 
Big data journey to the cloud rohit pujari 5.30.18
Big data journey to the cloud   rohit pujari 5.30.18Big data journey to the cloud   rohit pujari 5.30.18
Big data journey to the cloud rohit pujari 5.30.18Cloudera, Inc.
 
Evolve Your Incident Response Process and Powers for AWS - SID306 - Chicago A...
Evolve Your Incident Response Process and Powers for AWS - SID306 - Chicago A...Evolve Your Incident Response Process and Powers for AWS - SID306 - Chicago A...
Evolve Your Incident Response Process and Powers for AWS - SID306 - Chicago A...Amazon Web Services
 
How to Implement a Well-Architected Security Solution.pdf
How to Implement a Well-Architected Security Solution.pdfHow to Implement a Well-Architected Security Solution.pdf
How to Implement a Well-Architected Security Solution.pdfAmazon Web Services
 
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...AWS Riyadh User Group
 
Evolve Your Incident Response Process and Powers for AWS
Evolve Your Incident Response Process and Powers for AWS Evolve Your Incident Response Process and Powers for AWS
Evolve Your Incident Response Process and Powers for AWS Amazon Web Services
 
Security Automation using AWS Management Tools
Security Automation using AWS Management ToolsSecurity Automation using AWS Management Tools
Security Automation using AWS Management ToolsAmazon Web Services
 
Data Security in the Cloud - Matt Taylor - AWS TechShift ANZ 2018
Data Security in the Cloud - Matt Taylor - AWS TechShift ANZ 2018Data Security in the Cloud - Matt Taylor - AWS TechShift ANZ 2018
Data Security in the Cloud - Matt Taylor - AWS TechShift ANZ 2018Amazon Web Services
 

Similar to Best Practices to Secure Data Lake on AWS (ANT327) - AWS re:Invent 2018 (20)

Getting Started with AWS Security
Getting Started with AWS SecurityGetting Started with AWS Security
Getting Started with AWS Security
 
SecuringYourCustomersDataFromDayOne_SFStartupDay
SecuringYourCustomersDataFromDayOne_SFStartupDaySecuringYourCustomersDataFromDayOne_SFStartupDay
SecuringYourCustomersDataFromDayOne_SFStartupDay
 
Securing Your Customers Data From Day One
Securing Your Customers Data From Day OneSecuring Your Customers Data From Day One
Securing Your Customers Data From Day One
 
Securing Your Customers Data From Day One
Securing Your Customers Data From Day OneSecuring Your Customers Data From Day One
Securing Your Customers Data From Day One
 
AWS18_StartupDayToronto_SecuringYourCustomersDataFromDayOne
AWS18_StartupDayToronto_SecuringYourCustomersDataFromDayOneAWS18_StartupDayToronto_SecuringYourCustomersDataFromDayOne
AWS18_StartupDayToronto_SecuringYourCustomersDataFromDayOne
 
How AI is disrupting the world
How AI is disrupting the world How AI is disrupting the world
How AI is disrupting the world
 
Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018
Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018
Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018
 
Securing Customer Data from Day 1 - AWS Startup Day Boston 2018.pdf
Securing Customer Data from Day 1 - AWS Startup Day Boston 2018.pdfSecuring Customer Data from Day 1 - AWS Startup Day Boston 2018.pdf
Securing Customer Data from Day 1 - AWS Startup Day Boston 2018.pdf
 
AWS STARTUP DAY 2018 I Securing Your Customer Data From Day One
AWS STARTUP DAY 2018 I Securing Your Customer Data From Day OneAWS STARTUP DAY 2018 I Securing Your Customer Data From Day One
AWS STARTUP DAY 2018 I Securing Your Customer Data From Day One
 
Deep dive - AWS security by design
Deep dive - AWS security by designDeep dive - AWS security by design
Deep dive - AWS security by design
 
Deep Dive - AWS Security by Design
Deep Dive - AWS Security by DesignDeep Dive - AWS Security by Design
Deep Dive - AWS Security by Design
 
Crash Course in Security Best Practices, AWS Startup Day Cape Town 2018
Crash Course in Security Best Practices, AWS Startup Day Cape Town 2018Crash Course in Security Best Practices, AWS Startup Day Cape Town 2018
Crash Course in Security Best Practices, AWS Startup Day Cape Town 2018
 
Big data journey to the cloud rohit pujari 5.30.18
Big data journey to the cloud   rohit pujari 5.30.18Big data journey to the cloud   rohit pujari 5.30.18
Big data journey to the cloud rohit pujari 5.30.18
 
Evolve Your Incident Response Process and Powers for AWS - SID306 - Chicago A...
Evolve Your Incident Response Process and Powers for AWS - SID306 - Chicago A...Evolve Your Incident Response Process and Powers for AWS - SID306 - Chicago A...
Evolve Your Incident Response Process and Powers for AWS - SID306 - Chicago A...
 
How to Implement a Well-Architected Security Solution.pdf
How to Implement a Well-Architected Security Solution.pdfHow to Implement a Well-Architected Security Solution.pdf
How to Implement a Well-Architected Security Solution.pdf
 
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
 
Security@Scale
Security@ScaleSecurity@Scale
Security@Scale
 
Evolve Your Incident Response Process and Powers for AWS
Evolve Your Incident Response Process and Powers for AWS Evolve Your Incident Response Process and Powers for AWS
Evolve Your Incident Response Process and Powers for AWS
 
Security Automation using AWS Management Tools
Security Automation using AWS Management ToolsSecurity Automation using AWS Management Tools
Security Automation using AWS Management Tools
 
Data Security in the Cloud - Matt Taylor - AWS TechShift ANZ 2018
Data Security in the Cloud - Matt Taylor - AWS TechShift ANZ 2018Data Security in the Cloud - Matt Taylor - AWS TechShift ANZ 2018
Data Security in the Cloud - Matt Taylor - AWS TechShift ANZ 2018
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Best Practices to Secure Data Lake on AWS (ANT327) - AWS re:Invent 2018

  • 1.
  • 2. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Best Practices to Secure Data Lake on AWS Varun Rao Bhamidimarri Solution Architect AWS A N T 3 2 7 Tony Nguyen Senior Big Data Consultant AWS
  • 3. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Understand the value proposition for the data lake and how Amazon Web Services (AWS) can help 1 of some best practices for secure data lake implementation 2 into role/scenario based approaches to data lake security 3 What to expect
  • 4. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Assumptions and Housekeeping • Targeted towards anyone wanting to build a secure data lake on AWS • Assumes: • Foundational AWS knowledge • High level knowledge of Data Lake and AWS Analytics service portfolio • Knowledge of security concepts such as SSL / TLS, encryption, authentication / authorization • This session slides and recording will be shared online • Please don’t forget to submit your feedback!
  • 5. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. What is a data lake? • Collect and store all data, at any scale, and low cost • Helps locate, curate, and secure your data • Provide democratized access to data within your organization • Quickly and easily perform new types of data analysis
  • 6. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Primary components of a data lake Storage Compute Metadata/Catalog Storage & streams Catalogue & search Entitlements API & UI Attributes of a modern data architecture Key pillars of a data lake Architectural layers of a Data lake (without security)
  • 7. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Primary components of a data lake Storage Compute Metadata/Catalog Architectural layers of a data lake (without security) • Object • Block • File • Attached
  • 8. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Primary components of a data lake Storage Compute Metadata/Catalog Architectural layers of a data lake (without security) index search • Curate commission decommission lineage
  • 9. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Primary components of a data lake Storage Compute Metadata/Catalog Architectural layers of a data lake (without security) • Server-based • Serverless • Hybrid
  • 10. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Building a data lake on AWS Batch AWS STS Metadata User Access Data Movement Analytics and Machine Learning Security/Governance
  • 11. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Building a data lake on AWS Batch AWS STS
  • 12. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Securing all of these tools is challenging • Having such a diverse set of tools from the ecosystem allows you to choose the best tool for the job… • …but also makes a single unified solution for security challenging! • How do you secure each layer, while still satisfying your specific security and compliance requirements?
  • 13. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 14. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Security challenges with data lakes Data challenges • Controlling access to data • Data masking, row / column / cell level encryption, key management • Data loss / exfiltration • Loss of data integrity • Data provenance • Compliance requirements (GDPR and others) Management challenges • Central administration • Federated authentication, typically with Active Directory • Role-based access control (RBAC) • Centralized audit • End-to-end data protection (at-rest and in-transit)
  • 15. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Shared responsibility model
  • 16. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Shared responsibility model – service types • Infrastructure • Managed • Serverless
  • 17. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Shared responsibility model – comparison Amazon EMR • Amazon EC2 infrastructure needs to be managed • Root-level access via SSH • Patching of instances • Some level of Amazon CloudWatch / Amazon CloudTrail logging is done for customer, but not exhaustive • Instance profile role, Amazon EMR Service role need to be configured by customer • Local disk encryption, Amazon S3 encryption, etc. needs to be configured by customer… Amazon Athena • No infrastructure to manage • Service access is governed via IAM policy documents • Amazon S3 access is via bucket policy / IAM policy • Encryption is managed
  • 18. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 19. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS helps you secure Compliance AWS Artifact Amazon Inspector AWS CloudHSM Amazon Cognito AWS CloudTrail Security Amazon GuardDuty AWS Shield AWS WAF Amazon Macie Amazon Virtual Private Cloud (Amazon VPC) Encryption AWS Certificate Manager AWS Key Management Service (AWS KMS) Encryption at rest Encryption in transit Bring your own keys, HSM support Identity AWS Identity and Access Management (IAM) AWS Single Sign-On Amazon Cloud Directory AWS Directory Service AWS Organizations Customers need to have multiple levels of security, identity and access management, encryption, and compliance to secure their data lake
  • 20. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon S3 buckets and prefixes Amazon EMR clusters Team X Amazon Redshift clusters “Coarse-grained” ownership Prefer “coarse-grained” ownership • Teams own entire Amazon S3 buckets and clusters • Ownership segregated by AWS accounts • Access control easier to setup and maintain • Suitable for autonomous teams
  • 21. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Encrypt data at rest Pick encryption mode for Amazon S3 objects CSE - KMS CSE - C SSE - KMS SSE - C SSE - S3Server side Client side AWS KMS S3 built-inCustom KMS Key Management overhead Encryption point
  • 22. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Compliance: Log and audit all AWS activity • Log and continuously monitor every account activity and API calls with Amazon CloudTrail • Increase visibility into your user and resource activity • Log management and data events into separate trails • Centralize logs into separate security account • Disable S3 delete using IAM Store data in Amazon S3 Account event occurs generating API activity CloudTrail captures and records the API activity A log of API calls is delivered to S3 bucket and optionally delivered to Amazon CloudWatch Logs and CloudWatch Events
  • 23. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Security in the cloud - basics • Account • Federate accounts with Active Directory / Identity Provider • Setup multi-factor authentication (MFA) • Avoid using root account credentials • IAM access should be least privilege • Network • Private VPC Subnets • VPC endpoint/Interface endpoints • Least privilege for Security groups • Storage • Encrypt using KMS
  • 24. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 25. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Different types of roles Security Admin Data Lake on AWSData Curator Analyst Data Engineer Data scientist
  • 26. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Data workflow • All roles have actions and responsibilities that correspond to each phase in the overall data workflow • Think of the data lake in terms of producers and consumers COLLECT STORE DELIVER ANALYZESUBSCRIBEDISCOVER
  • 27. What data do I have? “Through 2018, 80% of data lakes will not include effective metadata management capabilities, making them inefficient.” -Gartner Data lake on AWS Storage | Archival storage | Data catalog
  • 28. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Onboarding new data Data Curator Identify New Data Create Dataset Definition Load / Stage Raw Data Register Raw Data against Dataset Definition Data Owner Developer / Data Engineer Developer / Data Engineer COLLECT STORE
  • 29. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Searching and accessing data DELIVERSUBSCRIBEDISCOVER Search for Data in Data Catalog Identify Data, Request Access Approve Access Access and Query Data Data Scientist / Business User Data Owner/Security Admin Data Scientist / Analyst Data Scientist / Business User ANALYZE DISCOVER
  • 30. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 31. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Data analytics tools and access patterns Machine Learning/Deep Learning IDE Central Storage spectrum Programmatic SQL API JDBC/ODBC CLI
  • 32. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Security admin tasks • Setup security guardrails • Preemptive and detective controls • Provide data access across teams/environments • Validate security requirements based on data classification • Verify Data owner/producer has authorized access • Run regular audits
  • 33. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon S3 – preemptive controls • Create buckets based on business domains • Assign bucket policies • Restrict by VPC, HTTPS, IP filters, KMS keys • Restrict using Tags/Conditions • "Condition": {"StringEquals": {"S3:ResourceTag/HIPAA":"True"} • "Condition": {"StringEquals": {"aws:UserAgent": "AWS Redshift/Spectrum"} • Enable encryption/Enable versioning • MFA delete • Enable backups – across accounts/regions • IAM permission boundary • S3 public access setting N E W !
  • 34. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon S3 data – detective controls • Enable AWS Config to detect S3 bucket level changes • s3-bucket-public-read-prohibited, s3-bucket-public-write-prohibited, s3-bucket-ssl- requests-only • S3 data access audit using CloudTrail – Log to separate CloudWatch logs • Kerberos enabled EMR clusters allows you to track AD user • Use Amazon GuardDuty to detect unauthorized and unexpected activity • Enable Amazon Macie to classify sensitive data
  • 35. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Encrypt data in transit Point “A” Point “B” Data flow protection Enterprise data sources Amazon S3 Encrypted with SSL/TLS; S3 requests signed with AWS Sigv4 Amazon S3 Amazon EMR Encrypted with SSL/TLS Amazon S3 Amazon Redshift Encrypted with SSL/TLS Amazon EMR Clients Encrypted with SSL/TLS; varies with Hadoop application client Amazon Redshift Clients Supports SSL/TLS; Requires configuration Apache Hadoop on Amazon EMR • Hadoop RPC encryption • HDFS Block data transfer encryption • KMS over HTTPS is not enabled by default with Hadoop KMS • May vary with EMR release (such as Tez and Spark in release 5.0.0+)
  • 36. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. AD groups Security authorization mapping IAM roles Redshift DB Groups EMR Security configuration AD group based policies Glue catalog resource policies
  • 37. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Map database ACL’s to db grant/glue policy catalog.user_table grant group developer select on catalog.user_table AD group: developer Action: ['glue:GetTable*’, ‘glue:GetPartiton*’] Principal: [“arn:aws:iam::<account>:role/developer_role”] Resource:[“arn:aws:glue:<region>:<account>:table/gluecatalog/user_table”, “arn:aws:glue:<region>:<account-id>:table/gluecatalog/user_table/*”]
  • 38. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Map storage ACL’s to Amazon S3 policy s3://bucket/path/ Effect: Allow Action: [s3:ListBucket’, “s3:GetObject”] Principal: [“arn:aws:iam::<account>:role/developer_role”] Resource: [“s3://bucket/path”, “s3://bucket/path/*”] AD group: developer
  • 39. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 40. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Scenario – retail company X Sales SKU customer_id date_of_purchase price Product SKU category product_name manufacturer Customer customer_id age gender location cc_info
  • 41. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Gather insights from the data • Business user (External vendor – belongs to a manufacturer) • Sales by product category (cannot see other manufacture's data) • Sales by location • Get sales forecast by product • Analyst (Employee – may belong to a Product line/Business unit) • Sales by product category • Sales by location • Data scientist (Employee – may not belong to a Business Unit) • Forecast the sales of a specific product, based on age group, location and time of the year
  • 42. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Workflow – onboarding new data Security AdminAnalyst/Business user Analyst/Business user ANALYZE DISCOVER DELIVERSUBSCRIBEDISCOVER Curator Data Engineer STORE Get Sales by product category Setup data pipeline Register to data catalog Setup appropriate permissions Create and consume reports
  • 43. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Role based tasks Data EngineerCurator Security Admin Setup staging catalog Enable access to Data Engineer Verifies and commissions dataset to production catalog Setup Amazon EMR cluster Setup process to move data from source into Amazon S3 Orchestrate and schedule the job Analyst Create and publish dashboard Enable access to Analyst Setup Row-level security for business users
  • 44. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Grant data/catalog access – data engineer Central Catalog Account Staging Curator Data Engineer Business unit account Data Engineer Role
  • 45. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Onboarding new data – security/configuration "Classification": "spark-hive-site", "Properties": { "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory", "hive.metastore.glue.catalogid": "acct-id" } Effect: Allow Action: ['glue:*Database*’, ‘glue:*Table*’,’glue:*Partition*’] Effect: Allow Action: ['s3:PutObject', 's3:GetObject', 's3:DeleteObject']
  • 46. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS Glue catalog - resource policies • Fine-grained access control to Catalog using IAM policies • Restrict what they can view and query
  • 47. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 48. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Build the data pipeline – Amazon EMR Data Extract/Collect User/VisualizeStore Process/Analyze Answer and Insight Data Data Engineer AWS DataSync SFTP Service for Amazon S3
  • 49. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon EMR - authentication • Configure Kerberos for cluster authentication • LDAP for HiveServer2, Hue, Presto, Zeppelin • Perimeter security using Apache Knox • Simplify authentication of various Hadoop services and UI's • Mask service specific URL's/Ports by acting as a Proxy • Enable SSL/TLS termination at the perimeter • Ease management of published endpoints across multiple clusters • Supports federation
  • 50. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon EMR – storage authorization • Control access to Amazon S3 based on user’s AD groups • Use different IAM roles for EMRFS requests to Amazon S3 • These IAM roles can be mapped to users, groups or the location of data in Amazon S3. AD groups IAM roles
  • 51. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon EMR – service authorization • Apache Ranger provides authorization of Hadoop cluster services • Eg: Hive tables, HDFS files, HBase etc • Also provides Audits • Column masking and Row filtering for Hive
  • 52. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Best practices - Amazon EMR security • EMRFS storage AuthZ • Apache Ranger • Table and SQL-level authorization for Hive using HiveServer2 • Role-based Authorization with AD • IAM • Amazon EMR logs to Amazon S3 • Amazon S3 Access Logs • Apache Ranger Audit • Amazon CloudTrail – Amazon EMR API’s/EMRFS calls AuthorizationAuthentication Data protection at rest Audit Data protection at motion Compliance Programs • Kerberos • Knox, Shiro • LDAP / AD integration • SSE-S3 , SSE-KMS, Amazon S3 Client Encryption • Disk encryption using AWS KMS • SELinux using EMR BA • Custom AMI • SSL/TLS in transit using Security configurations • SSL/TLS for calls to S3 (default) • SOC1,2,3 • ISO • PCI DSS • FedRAMP • HIPAA BAA • DoD SRG IL2/IL4
  • 53. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 54. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Data ready - what next? • Curator • Verifies data registered with staging catalog • Runs sanity checks • Commissions the dataset into the production catalog • Creates a View to filter data by Product category • select * from sales join product where sales.sku = product.sku and category = ‘Electronics’ • Security Admin • Enable access to Analyst • Setup Row-level security • Analyst • Create and publish dashboard
  • 55. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Athena - create view CREATE VIEW sales_electronics AS SELECT sum(price) FROM sales, product WHERE sales.sku = product.sku and product.category = ‘Electronics’
  • 56. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Athena – secure data flow Central Catalog Account Business unit account Data analyst On-Premise Identity provider (IDP) SAML token Role Production
  • 57. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon QuickSight - data governance Create managed datasets that give power users and authors the flexibility to perform self-serve analytics on data that you control. Create datasets that: • Can be shared with any user • Automatically refresh • Have row level security • Users cannot modify • Dynamically update with changes
  • 58. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon QuickSight – secure data flow Central catalog account Business unit account Business User On-Premise Identity provider (IDP) SAML token Role Row level filter based on the manufacturer Production
  • 59. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Athena - security Controls • IAM policies mapped to Roles – these polices are passed all the way to storage layer • Views with Glue Catalog resource policies • All API calls are logged to CloudTrail • S3 Access Logs can provide data access information AuthorizationAuthentication Data protection at rest Audit Data protection in motion Compliance Programs • IAM federation • Cross-account • EC2 instance profile • CSE-KMS • SSE-KMS • SSE-S3 • Use separate KMS keys for source and destination buckets • JDBC connections use TLS/SSL by default • Data transfer between S3 and Athena is encrypted by TLS • SOC 1,2,3 • HIPAA BAA
  • 60. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon QuickSight - security controls • IAM policies • Row-level Security • Amazon CloudTrail • Amazon S3 Access Logs AuthorizationAuthentication Data protection at rest Audit Data protection in motion Compliance Programs • IAM federation • QuickSight-only users • Cross-account via Amazon S3 • MFA • Differences between Standard and Enterprise • Encrypt your source datasets and Amazon S3 • QuickSight Enterprise edition: data at rest in SPICE is also encrypted • SSL/TLS • Interface Endpoints to VPCs and Direct Connect • HIPAA • SOC2 • PCI-DSS • ISO • FedRAMP
  • 61. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 62. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Workflow – access existing data Security Engineer Query Sales by Location Access the user location data (user id, state) Review and authorize specific columns Setup grants in database Analyze data Analyst Data owner - Marketing Analyst Analyst ANALYZE DISCOVER DELIVERSUBSCRIBEDISCOVER
  • 63. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Redshift – secure data flow Central Data Catalog Account Business team account Business analyst On-Premise Identity provider (IDP) SAML token
  • 64. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Redshift federated authentication
  • 65. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Redshift federated authorization • Setup Amazon Redshift DBGroups • For example - CREATE Group ‘XXX’ • Use Grants to setup authorization access • GRANT SELECT on table ‘YYYY’ to group ‘group1’ • Configure SAML assertion for your IDP <Attribute Name="https://redshift.amazon.com/SAML/Attributes/DbGroups"> <AttributeValue>group1</AttributeValue> <AttributeValue>group2</AttributeValue> <AttributeValue>group3</AttributeValue> </Attribute>
  • 66. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Redshift audit logging Cloudtrail • Amazon Redshift API calls • KMS API calls • S3 calls Amazon S3 • Connection logs • User logs • User activity logs
  • 67. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Best practices - Amazon Redshift security • DB groups with grants • Restrict access by IAM policy • Use condition keys “ResourceTag” and “RequestTag” • API logs to Amazon Cloudtrail • Logs to Amazon S3 • Connection logs • User logs • User activity logs AuthorizationAuthentication Data protection at rest Audit Data protection at motion Compliance Programs • IAM federation • DB username and password • KMS • HSM – AWS CloudHSM Classic • Key rotation – CMK, DEK • SSL (ACM) - Set “require_SSL = true” in parameter group • FIPS 140-2 support • SOC1,2,3 • PCI DSS Level 1 • FedRAMP • HIPAA eligible with BAA
  • 68. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Workflow – build predictive model Data ScientistData scientist Data Owner Want to predict sales by location Access the user location data (user id, state) Review and authorize specific columns Pull data into notebook, develop and train model Deploy and test model ANALYZE DISCOVER DELIVERSUBSCRIBEDISCOVER
  • 69. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon SageMaker with Apache Spark Data scientist Spark Magic Model EndpointsModels
  • 70. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Best practices – Amazon Sagemaker security • Restrict access by IAM policy and condition keys • API logs to Amazon Cloudtrail - exception of InvokeEndpoint AuthorizationAuthentication Data protection at rest Audit Data protection at motion Compliance Programs • IAM federation • KMS based encryption for • Notebooks • Training jobs • Amazon S3 location to store models • Endpoint • HTTPS for API/Console • Notebooks • VPC enabled • Interface endpoint • Limit by IP • Training jobs/Endpoints • VPC enabled • PCI DSS • HIPAA eligible with BAA • ISO
  • 71. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon.com’s vision is to be the earth’s most customer— centric company; where people can find anything they want to buy online. Challenge: Load 500K+ transactions each day, and serve 300K+ queries/extracts each day from Amazon businsses (Amazon.com, Amazon Prime, Amazon Music, Amazon Alexa, Amazon Video, and Twitch). Solution: • Land data in S3 as a data lake • Use Redshift as preferred SQL based analysis by business users, and EMR for machine learning
  • 72. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon.com uses AWS for data lakes & analytics • DynamoDB capturing all Amazon.com transactions • Everything from DynamoDB, RDS PostgreSQL and Kinesis fed to a Amazon S3 data lake • AWS Glue used to catalog the data • Amazon Redshift used for all SQL-based queries, and Amazon EMR for all machine learning and big data processing • End-users use Amazon QuickSight for visualizations AWS Glue Catalog Amazon QuickSight Amazon S3 Amazon Athena Amazon EMR Amazon DynamoDB PostgreSQL Amazon Kinesis Amazon Redshift Machine Learning
  • 73. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Summary • Federate access • Setup roles and responsibility matrix within your organization • Leverage centralized data catalog • Use both preemptive and detective controls • Perform regular audits • Secure storage, catalog and processing layers • Incentivize teams to register datasets to catalog • Streamline process between data producers and data consumers
  • 74. Thank you! © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Varun Rao Bhamidimarri vbhamidi@amazon.com Tony Nguyen aanwin@amazon.com
  • 75. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.