SlideShare a Scribd company logo
1 of 19
Cobus Bernard
Sr Developer Advocate
Amazon Web Services
Getting Started with Data Lakes on
AWS
@cobusbernard
cobusbernard
cobusbernard
Agenda
What is a Data Lake
Storing data in S3
Steps to build a Data Lake
AWS Lakeformation
Demo
Q&A
A data lake is a
centralised repository
that allows you to store all your
structured and unstructured
data at any scale
Why data lakes?
Data Lakes provide:
Relational and non-relational data
Scale-out to EBs (1EB = 1,024 PB = 1,048,576 TB)
Diverse set of analytics and machine learning tools
Work on data without any data movement
Designed for low cost storage and analytics
OLTP ERP CRM LOB
Data Warehouse
Business
Intelligence
Data Lake
1001100001001010111001
0101011100101010000101
1111011010
0011110010110010110
0100011000010
Devices Web Sensors Social
Catalog
Machine
Learning
DW Queries Big data
processing
Interactive Real-time
Build a secure data lake on Amazon S3
Amazon S3 Block
Public Access
Amazon S3
object lock
Amazon S3
object tags
Amazon S3
access points
• Multi-tenant bucket
• Dedicated access
points
• Custom permissions
from a virtual private
cloud (VPC)
• Controls public access
• Across AWS accounts &
individual S3 bucket
levels
• Specify any type of
public permissions via
ACL or policy
• Immutable
Amazon S3 objects
• Retention management
controls
• Data protection and
compliance
• Access control, lifecycle
policies, analysis
• Classify data, filter
objects
• Define replication
policies
FSx for Lustre
Choosing the right data lake storage class
Select storage class by data pipeline stage
Raw data ETL
• Small log files
• Overwrites if synced
• Short lived
• Moved & deleted
• Batched & archived
Production
data lake Historical data
Amazon S3
Standard
Amazon S3
Standard
Amazon S3
Intelligent-Tiering
Amazon S3 Glacier or
S3 Glacier Deep Archive
• Data churn
• Small intermediates
• Multiple transforms
• Deletes <30 days
• Output to data lake
• Optimized sizes (MBs)
• Many users
• Unpredictable access
• Long-lived assets
• Hot to cool
• Historical assets
• ML model training
• Compliance/Audit
• Data protection
• Planned restores
Online cool data
Amazon S3 Standard
Infrequent Access
and One Zone-IA
• Replicated DR data
• Infrequently accessed
• Infrequent queries
• ML model training
Typical steps of building a data lake
Setup storage1
Move data2
Cleanse, prep, and
catalog data
3
Configure and enforce
security and compliance
policies
4
Make data available
for analytics
5
Data preparation accounts for ~80% of the work
Building training sets
Cleaning and organizing data
Collecting data sets
Mining data for patterns
Refining algorithms
Other
Sample of steps required Find sources
Create Amazon Simple Storage Service (Amazon S3) locations
Configure access policies
Map tables to Amazon S3 locations
ETL jobs to load and clean data
Create metadata access policies
Configure access from analytics services
Rinse and repeat for other:
data sets, users, and end-services
And more:
manage and monitor ETL jobs
update metadata catalog as data changes
update policies across services as users and permissions change
manually maintain cleansing scripts
create audit processes for compliance
…
Manual | Error-prone | Time consuming
Enforce security policies
across multiple services
Gain and manage new
insights
Identify, ingest, clean, and
transform data
Build a secure data lake in days
AWS Lake Formation
How it works
Register existing data or import new
Amazon S3 forms the storage layer for
Lake Formation
Register existing S3 buckets that
contain your data
Ask Lake Formation to create required
S3 buckets and import data into them
Data is stored in your account. You have
direct access to it. No lock-in.
Data Lake Storage
Data
Catalog
Access
Control
Data
import
Lake Formation
Crawlers ML-based
data prep
Easily load data to your data lake
logs
DBs
Blueprints
Data Lake Storage
Data
Catalog
Access
Control
Data
import
Lake Formation
Crawlers ML-based
data prep
one-shot
incremental
With blueprints
You
1. Point us to the source
2. Tell us the location to load to
in your data lake
3. Specify how often you want
to load the data
Blueprints
1. Discover the source table(s)
schema
2. Automatically convert to the
target data format
3. Automatically partition the
data based on the
partitioning schema
4. Keep track of data that was
already processed
5. You can customize any of the
above
Blueprints build on AWS Glue
© 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.
https://aws.training
https://aws.amazon.com/training/path-databases
https://aws.amazon.com/training/path-advanced-networking
https://bit.ly/aws-office-hours
https://bit.ly/africa-virtual-day
https://bit.ly/emea-summit
https://bit.ly/cobus-youtube
Useful links
Thank you!
© 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Cobus Bernard
Sr Developer Advocate
Amazon Web Services
@cobusbernard
cobusbernard
cobusbernard

More Related Content

What's hot

Amazon RDS: Deep Dive - SRV310 - Chicago AWS Summit
Amazon RDS: Deep Dive - SRV310 - Chicago AWS SummitAmazon RDS: Deep Dive - SRV310 - Chicago AWS Summit
Amazon RDS: Deep Dive - SRV310 - Chicago AWS SummitAmazon Web Services
 
AWS-Enabled Disaster Recovery and Business Continuity for SIFIs
AWS-Enabled Disaster Recovery and Business Continuity for SIFIsAWS-Enabled Disaster Recovery and Business Continuity for SIFIs
AWS-Enabled Disaster Recovery and Business Continuity for SIFIsAmazon Web Services
 
Ask Me Anything about Amazon Aurora (DAT369-R1) - AWS re:Invent 2018
Ask Me Anything about Amazon Aurora (DAT369-R1) - AWS re:Invent 2018Ask Me Anything about Amazon Aurora (DAT369-R1) - AWS re:Invent 2018
Ask Me Anything about Amazon Aurora (DAT369-R1) - AWS re:Invent 2018Amazon Web Services
 
What's New in Amazon Aurora (DAT204-R1) - AWS re:Invent 2018
What's New in Amazon Aurora (DAT204-R1) - AWS re:Invent 2018What's New in Amazon Aurora (DAT204-R1) - AWS re:Invent 2018
What's New in Amazon Aurora (DAT204-R1) - AWS re:Invent 2018Amazon Web Services
 
Disaster Recovery Options on AWS Loft
Disaster Recovery Options on AWS LoftDisaster Recovery Options on AWS Loft
Disaster Recovery Options on AWS LoftAmazon Web Services
 
Building with AWS Databases: Match Your Workload to the Right Database (DAT30...
Building with AWS Databases: Match Your Workload to the Right Database (DAT30...Building with AWS Databases: Match Your Workload to the Right Database (DAT30...
Building with AWS Databases: Match Your Workload to the Right Database (DAT30...Amazon Web Services
 
Best Practices for Migrating Oracle Databases to the Cloud - AWS Online Tech ...
Best Practices for Migrating Oracle Databases to the Cloud - AWS Online Tech ...Best Practices for Migrating Oracle Databases to the Cloud - AWS Online Tech ...
Best Practices for Migrating Oracle Databases to the Cloud - AWS Online Tech ...Amazon Web Services
 
Building a Strong Foundation with AWS Storage Services
Building a Strong Foundation with AWS Storage ServicesBuilding a Strong Foundation with AWS Storage Services
Building a Strong Foundation with AWS Storage ServicesAmazon Web Services
 
Amazon Aurora and AWS Database Migration Service
Amazon Aurora and AWS Database Migration ServiceAmazon Aurora and AWS Database Migration Service
Amazon Aurora and AWS Database Migration ServiceAmazon Web Services
 
AWS tutorial-Part82: Exam Essentials#2
AWS tutorial-Part82: Exam Essentials#2AWS tutorial-Part82: Exam Essentials#2
AWS tutorial-Part82: Exam Essentials#2SaM theCloudGuy
 
Builders Day' - Databases on AWS: The Right Tool for The Right Job
Builders Day' - Databases on AWS: The Right Tool for The Right JobBuilders Day' - Databases on AWS: The Right Tool for The Right Job
Builders Day' - Databases on AWS: The Right Tool for The Right JobAmazon Web Services LATAM
 
AWSome Day 2016 - Module 4: Databases: Amazon DynamoDB and Amazon RDS
AWSome Day 2016 - Module 4: Databases: Amazon DynamoDB and Amazon RDSAWSome Day 2016 - Module 4: Databases: Amazon DynamoDB and Amazon RDS
AWSome Day 2016 - Module 4: Databases: Amazon DynamoDB and Amazon RDSAmazon Web Services
 
AWS tutorial-Part58:AWS Cloud Database Products-1st Intro Session
AWS tutorial-Part58:AWS Cloud Database Products-1st Intro SessionAWS tutorial-Part58:AWS Cloud Database Products-1st Intro Session
AWS tutorial-Part58:AWS Cloud Database Products-1st Intro SessionSaM theCloudGuy
 
Backup and Recovery with Cloud-Native Deduplication and Use Cases from the Fi...
Backup and Recovery with Cloud-Native Deduplication and Use Cases from the Fi...Backup and Recovery with Cloud-Native Deduplication and Use Cases from the Fi...
Backup and Recovery with Cloud-Native Deduplication and Use Cases from the Fi...Amazon Web Services
 
Cost and Performance Optimisation in Amazon RDS - AWS Summit Sydney 2018
Cost and Performance Optimisation in Amazon RDS - AWS Summit Sydney 2018Cost and Performance Optimisation in Amazon RDS - AWS Summit Sydney 2018
Cost and Performance Optimisation in Amazon RDS - AWS Summit Sydney 2018Amazon Web Services
 
Deep Dive on PostgreSQL Databases on Amazon RDS (DAT324) - AWS re:Invent 2018
Deep Dive on PostgreSQL Databases on Amazon RDS (DAT324) - AWS re:Invent 2018Deep Dive on PostgreSQL Databases on Amazon RDS (DAT324) - AWS re:Invent 2018
Deep Dive on PostgreSQL Databases on Amazon RDS (DAT324) - AWS re:Invent 2018Amazon Web Services
 
AWS Database Migration Service
AWS Database Migration ServiceAWS Database Migration Service
AWS Database Migration Servicetechugo
 
Best Practices for Running Oracle Databases on Amazon RDS (DAT317) - AWS re:I...
Best Practices for Running Oracle Databases on Amazon RDS (DAT317) - AWS re:I...Best Practices for Running Oracle Databases on Amazon RDS (DAT317) - AWS re:I...
Best Practices for Running Oracle Databases on Amazon RDS (DAT317) - AWS re:I...Amazon Web Services
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSAmazon Web Services
 

What's hot (20)

Amazon RDS: Deep Dive - SRV310 - Chicago AWS Summit
Amazon RDS: Deep Dive - SRV310 - Chicago AWS SummitAmazon RDS: Deep Dive - SRV310 - Chicago AWS Summit
Amazon RDS: Deep Dive - SRV310 - Chicago AWS Summit
 
How Amazon uses AWS Analytics
How Amazon uses AWS AnalyticsHow Amazon uses AWS Analytics
How Amazon uses AWS Analytics
 
AWS-Enabled Disaster Recovery and Business Continuity for SIFIs
AWS-Enabled Disaster Recovery and Business Continuity for SIFIsAWS-Enabled Disaster Recovery and Business Continuity for SIFIs
AWS-Enabled Disaster Recovery and Business Continuity for SIFIs
 
Ask Me Anything about Amazon Aurora (DAT369-R1) - AWS re:Invent 2018
Ask Me Anything about Amazon Aurora (DAT369-R1) - AWS re:Invent 2018Ask Me Anything about Amazon Aurora (DAT369-R1) - AWS re:Invent 2018
Ask Me Anything about Amazon Aurora (DAT369-R1) - AWS re:Invent 2018
 
What's New in Amazon Aurora (DAT204-R1) - AWS re:Invent 2018
What's New in Amazon Aurora (DAT204-R1) - AWS re:Invent 2018What's New in Amazon Aurora (DAT204-R1) - AWS re:Invent 2018
What's New in Amazon Aurora (DAT204-R1) - AWS re:Invent 2018
 
Disaster Recovery Options on AWS Loft
Disaster Recovery Options on AWS LoftDisaster Recovery Options on AWS Loft
Disaster Recovery Options on AWS Loft
 
Building with AWS Databases: Match Your Workload to the Right Database (DAT30...
Building with AWS Databases: Match Your Workload to the Right Database (DAT30...Building with AWS Databases: Match Your Workload to the Right Database (DAT30...
Building with AWS Databases: Match Your Workload to the Right Database (DAT30...
 
Best Practices for Migrating Oracle Databases to the Cloud - AWS Online Tech ...
Best Practices for Migrating Oracle Databases to the Cloud - AWS Online Tech ...Best Practices for Migrating Oracle Databases to the Cloud - AWS Online Tech ...
Best Practices for Migrating Oracle Databases to the Cloud - AWS Online Tech ...
 
Building a Strong Foundation with AWS Storage Services
Building a Strong Foundation with AWS Storage ServicesBuilding a Strong Foundation with AWS Storage Services
Building a Strong Foundation with AWS Storage Services
 
Amazon Aurora and AWS Database Migration Service
Amazon Aurora and AWS Database Migration ServiceAmazon Aurora and AWS Database Migration Service
Amazon Aurora and AWS Database Migration Service
 
AWS tutorial-Part82: Exam Essentials#2
AWS tutorial-Part82: Exam Essentials#2AWS tutorial-Part82: Exam Essentials#2
AWS tutorial-Part82: Exam Essentials#2
 
Builders Day' - Databases on AWS: The Right Tool for The Right Job
Builders Day' - Databases on AWS: The Right Tool for The Right JobBuilders Day' - Databases on AWS: The Right Tool for The Right Job
Builders Day' - Databases on AWS: The Right Tool for The Right Job
 
AWSome Day 2016 - Module 4: Databases: Amazon DynamoDB and Amazon RDS
AWSome Day 2016 - Module 4: Databases: Amazon DynamoDB and Amazon RDSAWSome Day 2016 - Module 4: Databases: Amazon DynamoDB and Amazon RDS
AWSome Day 2016 - Module 4: Databases: Amazon DynamoDB and Amazon RDS
 
AWS tutorial-Part58:AWS Cloud Database Products-1st Intro Session
AWS tutorial-Part58:AWS Cloud Database Products-1st Intro SessionAWS tutorial-Part58:AWS Cloud Database Products-1st Intro Session
AWS tutorial-Part58:AWS Cloud Database Products-1st Intro Session
 
Backup and Recovery with Cloud-Native Deduplication and Use Cases from the Fi...
Backup and Recovery with Cloud-Native Deduplication and Use Cases from the Fi...Backup and Recovery with Cloud-Native Deduplication and Use Cases from the Fi...
Backup and Recovery with Cloud-Native Deduplication and Use Cases from the Fi...
 
Cost and Performance Optimisation in Amazon RDS - AWS Summit Sydney 2018
Cost and Performance Optimisation in Amazon RDS - AWS Summit Sydney 2018Cost and Performance Optimisation in Amazon RDS - AWS Summit Sydney 2018
Cost and Performance Optimisation in Amazon RDS - AWS Summit Sydney 2018
 
Deep Dive on PostgreSQL Databases on Amazon RDS (DAT324) - AWS re:Invent 2018
Deep Dive on PostgreSQL Databases on Amazon RDS (DAT324) - AWS re:Invent 2018Deep Dive on PostgreSQL Databases on Amazon RDS (DAT324) - AWS re:Invent 2018
Deep Dive on PostgreSQL Databases on Amazon RDS (DAT324) - AWS re:Invent 2018
 
AWS Database Migration Service
AWS Database Migration ServiceAWS Database Migration Service
AWS Database Migration Service
 
Best Practices for Running Oracle Databases on Amazon RDS (DAT317) - AWS re:I...
Best Practices for Running Oracle Databases on Amazon RDS (DAT317) - AWS re:I...Best Practices for Running Oracle Databases on Amazon RDS (DAT317) - AWS re:I...
Best Practices for Running Oracle Databases on Amazon RDS (DAT317) - AWS re:I...
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
 

Similar to AWS SSA Webinar 21 - Getting Started with Data lakes on AWS

What's New with Big Data Analytics
What's New with Big Data AnalyticsWhat's New with Big Data Analytics
What's New with Big Data AnalyticsAmazon Web Services
 
Introduction to AWS Lake Formation.pptx
Introduction to AWS Lake Formation.pptxIntroduction to AWS Lake Formation.pptx
Introduction to AWS Lake Formation.pptxSwathiPonugumati
 
Fast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWSFast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWSAmazon Web Services
 
Building Data Lakes in the AWS Cloud
Building Data Lakes in the AWS CloudBuilding Data Lakes in the AWS Cloud
Building Data Lakes in the AWS CloudAmazon Web Services
 
AWS Summit Singapore - Architecting a Serverless Data Lake on AWS
AWS Summit Singapore - Architecting a Serverless Data Lake on AWSAWS Summit Singapore - Architecting a Serverless Data Lake on AWS
AWS Summit Singapore - Architecting a Serverless Data Lake on AWSAmazon Web Services
 
Scalable Data Analytics - DevDay Austin 2017 Day 2
Scalable Data Analytics - DevDay Austin 2017 Day 2Scalable Data Analytics - DevDay Austin 2017 Day 2
Scalable Data Analytics - DevDay Austin 2017 Day 2Amazon Web Services
 
Full Stack Analytics on AWS - AWS Summit Cape Town 2017
Full Stack Analytics on AWS - AWS Summit Cape Town 2017 Full Stack Analytics on AWS - AWS Summit Cape Town 2017
Full Stack Analytics on AWS - AWS Summit Cape Town 2017 Amazon Web Services
 
AWS Innovate: Build a Data Lake on AWS- Johnathon Meichtry
AWS Innovate: Build a Data Lake on AWS- Johnathon MeichtryAWS Innovate: Build a Data Lake on AWS- Johnathon Meichtry
AWS Innovate: Build a Data Lake on AWS- Johnathon MeichtryAmazon Web Services Korea
 
(BDT317) Building A Data Lake On AWS
(BDT317) Building A Data Lake On AWS(BDT317) Building A Data Lake On AWS
(BDT317) Building A Data Lake On AWSAmazon Web Services
 
AWS March 2016 Webinar Series Building Your Data Lake on AWS
AWS March 2016 Webinar Series Building Your Data Lake on AWS AWS March 2016 Webinar Series Building Your Data Lake on AWS
AWS March 2016 Webinar Series Building Your Data Lake on AWS Amazon Web Services
 
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...Amazon Web Services
 
Building your First Big Data Application on AWS
Building your First Big Data Application on AWSBuilding your First Big Data Application on AWS
Building your First Big Data Application on AWSAmazon Web Services
 
Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana...
Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana...Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana...
Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana...Amazon Web Services
 
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS Amazon Web Services LATAM
 
BDA305 Building Data Lakes and Analytics on AWS
BDA305 Building Data Lakes and Analytics on AWSBDA305 Building Data Lakes and Analytics on AWS
BDA305 Building Data Lakes and Analytics on AWSAmazon Web Services
 
Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Michael Rys
 
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Amazon Web Services
 
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Amazon Web Services
 
Building Your Data Lake on AWS - Level 200
Building Your Data Lake on AWS - Level 200Building Your Data Lake on AWS - Level 200
Building Your Data Lake on AWS - Level 200Amazon Web Services
 

Similar to AWS SSA Webinar 21 - Getting Started with Data lakes on AWS (20)

What's New with Big Data Analytics
What's New with Big Data AnalyticsWhat's New with Big Data Analytics
What's New with Big Data Analytics
 
Introduction to AWS Lake Formation.pptx
Introduction to AWS Lake Formation.pptxIntroduction to AWS Lake Formation.pptx
Introduction to AWS Lake Formation.pptx
 
Fast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWSFast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWS
 
Building Data Lakes in the AWS Cloud
Building Data Lakes in the AWS CloudBuilding Data Lakes in the AWS Cloud
Building Data Lakes in the AWS Cloud
 
AWS Summit Singapore - Architecting a Serverless Data Lake on AWS
AWS Summit Singapore - Architecting a Serverless Data Lake on AWSAWS Summit Singapore - Architecting a Serverless Data Lake on AWS
AWS Summit Singapore - Architecting a Serverless Data Lake on AWS
 
Scalable Data Analytics - DevDay Austin 2017 Day 2
Scalable Data Analytics - DevDay Austin 2017 Day 2Scalable Data Analytics - DevDay Austin 2017 Day 2
Scalable Data Analytics - DevDay Austin 2017 Day 2
 
Full Stack Analytics on AWS - AWS Summit Cape Town 2017
Full Stack Analytics on AWS - AWS Summit Cape Town 2017 Full Stack Analytics on AWS - AWS Summit Cape Town 2017
Full Stack Analytics on AWS - AWS Summit Cape Town 2017
 
AWS Innovate: Build a Data Lake on AWS- Johnathon Meichtry
AWS Innovate: Build a Data Lake on AWS- Johnathon MeichtryAWS Innovate: Build a Data Lake on AWS- Johnathon Meichtry
AWS Innovate: Build a Data Lake on AWS- Johnathon Meichtry
 
(BDT317) Building A Data Lake On AWS
(BDT317) Building A Data Lake On AWS(BDT317) Building A Data Lake On AWS
(BDT317) Building A Data Lake On AWS
 
AWS March 2016 Webinar Series Building Your Data Lake on AWS
AWS March 2016 Webinar Series Building Your Data Lake on AWS AWS March 2016 Webinar Series Building Your Data Lake on AWS
AWS March 2016 Webinar Series Building Your Data Lake on AWS
 
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
 
Building your First Big Data Application on AWS
Building your First Big Data Application on AWSBuilding your First Big Data Application on AWS
Building your First Big Data Application on AWS
 
Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana...
Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana...Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana...
Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana...
 
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
 
BDA305 Building Data Lakes and Analytics on AWS
BDA305 Building Data Lakes and Analytics on AWSBDA305 Building Data Lakes and Analytics on AWS
BDA305 Building Data Lakes and Analytics on AWS
 
Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)
 
Implementing a Data Lake
Implementing a Data LakeImplementing a Data Lake
Implementing a Data Lake
 
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
 
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
 
Building Your Data Lake on AWS - Level 200
Building Your Data Lake on AWS - Level 200Building Your Data Lake on AWS - Level 200
Building Your Data Lake on AWS - Level 200
 

More from Cobus Bernard

London Microservices Meetup: Lessons learnt adopting microservices
London Microservices  Meetup: Lessons learnt adopting microservicesLondon Microservices  Meetup: Lessons learnt adopting microservices
London Microservices Meetup: Lessons learnt adopting microservicesCobus Bernard
 
AWS SSA Webinar 34 - Getting started with databases on AWS - Managing DBs wit...
AWS SSA Webinar 34 - Getting started with databases on AWS - Managing DBs wit...AWS SSA Webinar 34 - Getting started with databases on AWS - Managing DBs wit...
AWS SSA Webinar 34 - Getting started with databases on AWS - Managing DBs wit...Cobus Bernard
 
AWS SSA Webinar 33 - Getting started with databases on AWS Amazon DynamoDB
AWS SSA Webinar 33 - Getting started with databases on AWS Amazon DynamoDBAWS SSA Webinar 33 - Getting started with databases on AWS Amazon DynamoDB
AWS SSA Webinar 33 - Getting started with databases on AWS Amazon DynamoDBCobus Bernard
 
AWS SSA Webinar 32 - Getting Started with databases on AWS: Choosing the righ...
AWS SSA Webinar 32 - Getting Started with databases on AWS: Choosing the righ...AWS SSA Webinar 32 - Getting Started with databases on AWS: Choosing the righ...
AWS SSA Webinar 32 - Getting Started with databases on AWS: Choosing the righ...Cobus Bernard
 
AWS SSA Webinar 30 - Getting Started with AWS - Infrastructure as Code - Terr...
AWS SSA Webinar 30 - Getting Started with AWS - Infrastructure as Code - Terr...AWS SSA Webinar 30 - Getting Started with AWS - Infrastructure as Code - Terr...
AWS SSA Webinar 30 - Getting Started with AWS - Infrastructure as Code - Terr...Cobus Bernard
 
AWS SSA Webinar 28 - Getting Started with AWS - Infrastructure as Code
AWS SSA Webinar 28 - Getting Started with AWS - Infrastructure as CodeAWS SSA Webinar 28 - Getting Started with AWS - Infrastructure as Code
AWS SSA Webinar 28 - Getting Started with AWS - Infrastructure as CodeCobus Bernard
 
AWS Webinar 24 - Getting Started with AWS - Understanding DR
AWS Webinar 24 - Getting Started with AWS - Understanding DRAWS Webinar 24 - Getting Started with AWS - Understanding DR
AWS Webinar 24 - Getting Started with AWS - Understanding DRCobus Bernard
 
AWS SSA Webinar 19 - Getting Started with Multi-Region Architecture: Services
AWS SSA Webinar 19 - Getting Started with Multi-Region Architecture: ServicesAWS SSA Webinar 19 - Getting Started with Multi-Region Architecture: Services
AWS SSA Webinar 19 - Getting Started with Multi-Region Architecture: ServicesCobus Bernard
 
AWS SSA Webinar 18 - Getting Started with Multi-Region Architecture: Data
AWS SSA Webinar 18 - Getting Started with Multi-Region Architecture: DataAWS SSA Webinar 18 - Getting Started with Multi-Region Architecture: Data
AWS SSA Webinar 18 - Getting Started with Multi-Region Architecture: DataCobus Bernard
 
AWS EMEA Online Summit - Live coding with containers
AWS EMEA Online Summit - Live coding with containersAWS EMEA Online Summit - Live coding with containers
AWS EMEA Online Summit - Live coding with containersCobus Bernard
 
AWS EMEA Online Summit - Blending Spot and On-Demand instances to optimizing ...
AWS EMEA Online Summit - Blending Spot and On-Demand instances to optimizing ...AWS EMEA Online Summit - Blending Spot and On-Demand instances to optimizing ...
AWS EMEA Online Summit - Blending Spot and On-Demand instances to optimizing ...Cobus Bernard
 
AWS SSA Webinar 17 - Getting Started on AWS with Amazon RDS
AWS SSA Webinar 17 - Getting Started on AWS with Amazon RDSAWS SSA Webinar 17 - Getting Started on AWS with Amazon RDS
AWS SSA Webinar 17 - Getting Started on AWS with Amazon RDSCobus Bernard
 
AWS SSA Webinar 16 - Getting Started on AWS with Amazon EC2
AWS SSA Webinar 16 - Getting Started on AWS with Amazon EC2AWS SSA Webinar 16 - Getting Started on AWS with Amazon EC2
AWS SSA Webinar 16 - Getting Started on AWS with Amazon EC2Cobus Bernard
 
AWS SSA Webinar 15 - Getting started on AWS with Containers: Amazon EKS
AWS SSA Webinar 15 - Getting started on AWS with Containers: Amazon EKSAWS SSA Webinar 15 - Getting started on AWS with Containers: Amazon EKS
AWS SSA Webinar 15 - Getting started on AWS with Containers: Amazon EKSCobus Bernard
 
AWS SSA Webinar 13 - Getting started on AWS with Containers: Amazon ECS
AWS SSA Webinar 13 - Getting started on AWS with Containers: Amazon ECSAWS SSA Webinar 13 - Getting started on AWS with Containers: Amazon ECS
AWS SSA Webinar 13 - Getting started on AWS with Containers: Amazon ECSCobus Bernard
 
AWS SSA Webinar 11 - Getting started on AWS: Security
AWS SSA Webinar 11 - Getting started on AWS: SecurityAWS SSA Webinar 11 - Getting started on AWS: Security
AWS SSA Webinar 11 - Getting started on AWS: SecurityCobus Bernard
 
AWS SSA Webinar 12 - Getting started on AWS with Containers
AWS SSA Webinar 12 - Getting started on AWS with ContainersAWS SSA Webinar 12 - Getting started on AWS with Containers
AWS SSA Webinar 12 - Getting started on AWS with ContainersCobus Bernard
 
HashiTalks Africa - Going multi-account on AWS with Terraform
HashiTalks Africa - Going multi-account on AWS with TerraformHashiTalks Africa - Going multi-account on AWS with Terraform
HashiTalks Africa - Going multi-account on AWS with TerraformCobus Bernard
 
AWS SSA Webinar 10 - Getting Started on AWS: Networking
AWS SSA Webinar 10 - Getting Started on AWS: NetworkingAWS SSA Webinar 10 - Getting Started on AWS: Networking
AWS SSA Webinar 10 - Getting Started on AWS: NetworkingCobus Bernard
 
AWS SSA Webinar 9 - Getting Started on AWS: Storage
AWS SSA Webinar 9 - Getting Started on AWS: StorageAWS SSA Webinar 9 - Getting Started on AWS: Storage
AWS SSA Webinar 9 - Getting Started on AWS: StorageCobus Bernard
 

More from Cobus Bernard (20)

London Microservices Meetup: Lessons learnt adopting microservices
London Microservices  Meetup: Lessons learnt adopting microservicesLondon Microservices  Meetup: Lessons learnt adopting microservices
London Microservices Meetup: Lessons learnt adopting microservices
 
AWS SSA Webinar 34 - Getting started with databases on AWS - Managing DBs wit...
AWS SSA Webinar 34 - Getting started with databases on AWS - Managing DBs wit...AWS SSA Webinar 34 - Getting started with databases on AWS - Managing DBs wit...
AWS SSA Webinar 34 - Getting started with databases on AWS - Managing DBs wit...
 
AWS SSA Webinar 33 - Getting started with databases on AWS Amazon DynamoDB
AWS SSA Webinar 33 - Getting started with databases on AWS Amazon DynamoDBAWS SSA Webinar 33 - Getting started with databases on AWS Amazon DynamoDB
AWS SSA Webinar 33 - Getting started with databases on AWS Amazon DynamoDB
 
AWS SSA Webinar 32 - Getting Started with databases on AWS: Choosing the righ...
AWS SSA Webinar 32 - Getting Started with databases on AWS: Choosing the righ...AWS SSA Webinar 32 - Getting Started with databases on AWS: Choosing the righ...
AWS SSA Webinar 32 - Getting Started with databases on AWS: Choosing the righ...
 
AWS SSA Webinar 30 - Getting Started with AWS - Infrastructure as Code - Terr...
AWS SSA Webinar 30 - Getting Started with AWS - Infrastructure as Code - Terr...AWS SSA Webinar 30 - Getting Started with AWS - Infrastructure as Code - Terr...
AWS SSA Webinar 30 - Getting Started with AWS - Infrastructure as Code - Terr...
 
AWS SSA Webinar 28 - Getting Started with AWS - Infrastructure as Code
AWS SSA Webinar 28 - Getting Started with AWS - Infrastructure as CodeAWS SSA Webinar 28 - Getting Started with AWS - Infrastructure as Code
AWS SSA Webinar 28 - Getting Started with AWS - Infrastructure as Code
 
AWS Webinar 24 - Getting Started with AWS - Understanding DR
AWS Webinar 24 - Getting Started with AWS - Understanding DRAWS Webinar 24 - Getting Started with AWS - Understanding DR
AWS Webinar 24 - Getting Started with AWS - Understanding DR
 
AWS SSA Webinar 19 - Getting Started with Multi-Region Architecture: Services
AWS SSA Webinar 19 - Getting Started with Multi-Region Architecture: ServicesAWS SSA Webinar 19 - Getting Started with Multi-Region Architecture: Services
AWS SSA Webinar 19 - Getting Started with Multi-Region Architecture: Services
 
AWS SSA Webinar 18 - Getting Started with Multi-Region Architecture: Data
AWS SSA Webinar 18 - Getting Started with Multi-Region Architecture: DataAWS SSA Webinar 18 - Getting Started with Multi-Region Architecture: Data
AWS SSA Webinar 18 - Getting Started with Multi-Region Architecture: Data
 
AWS EMEA Online Summit - Live coding with containers
AWS EMEA Online Summit - Live coding with containersAWS EMEA Online Summit - Live coding with containers
AWS EMEA Online Summit - Live coding with containers
 
AWS EMEA Online Summit - Blending Spot and On-Demand instances to optimizing ...
AWS EMEA Online Summit - Blending Spot and On-Demand instances to optimizing ...AWS EMEA Online Summit - Blending Spot and On-Demand instances to optimizing ...
AWS EMEA Online Summit - Blending Spot and On-Demand instances to optimizing ...
 
AWS SSA Webinar 17 - Getting Started on AWS with Amazon RDS
AWS SSA Webinar 17 - Getting Started on AWS with Amazon RDSAWS SSA Webinar 17 - Getting Started on AWS with Amazon RDS
AWS SSA Webinar 17 - Getting Started on AWS with Amazon RDS
 
AWS SSA Webinar 16 - Getting Started on AWS with Amazon EC2
AWS SSA Webinar 16 - Getting Started on AWS with Amazon EC2AWS SSA Webinar 16 - Getting Started on AWS with Amazon EC2
AWS SSA Webinar 16 - Getting Started on AWS with Amazon EC2
 
AWS SSA Webinar 15 - Getting started on AWS with Containers: Amazon EKS
AWS SSA Webinar 15 - Getting started on AWS with Containers: Amazon EKSAWS SSA Webinar 15 - Getting started on AWS with Containers: Amazon EKS
AWS SSA Webinar 15 - Getting started on AWS with Containers: Amazon EKS
 
AWS SSA Webinar 13 - Getting started on AWS with Containers: Amazon ECS
AWS SSA Webinar 13 - Getting started on AWS with Containers: Amazon ECSAWS SSA Webinar 13 - Getting started on AWS with Containers: Amazon ECS
AWS SSA Webinar 13 - Getting started on AWS with Containers: Amazon ECS
 
AWS SSA Webinar 11 - Getting started on AWS: Security
AWS SSA Webinar 11 - Getting started on AWS: SecurityAWS SSA Webinar 11 - Getting started on AWS: Security
AWS SSA Webinar 11 - Getting started on AWS: Security
 
AWS SSA Webinar 12 - Getting started on AWS with Containers
AWS SSA Webinar 12 - Getting started on AWS with ContainersAWS SSA Webinar 12 - Getting started on AWS with Containers
AWS SSA Webinar 12 - Getting started on AWS with Containers
 
HashiTalks Africa - Going multi-account on AWS with Terraform
HashiTalks Africa - Going multi-account on AWS with TerraformHashiTalks Africa - Going multi-account on AWS with Terraform
HashiTalks Africa - Going multi-account on AWS with Terraform
 
AWS SSA Webinar 10 - Getting Started on AWS: Networking
AWS SSA Webinar 10 - Getting Started on AWS: NetworkingAWS SSA Webinar 10 - Getting Started on AWS: Networking
AWS SSA Webinar 10 - Getting Started on AWS: Networking
 
AWS SSA Webinar 9 - Getting Started on AWS: Storage
AWS SSA Webinar 9 - Getting Started on AWS: StorageAWS SSA Webinar 9 - Getting Started on AWS: Storage
AWS SSA Webinar 9 - Getting Started on AWS: Storage
 

Recently uploaded

Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)Dana Luther
 
Denver Web Design brochure for public viewing
Denver Web Design brochure for public viewingDenver Web Design brochure for public viewing
Denver Web Design brochure for public viewingbigorange77
 
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012rehmti665
 
Call Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on Delivery
Call Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on DeliveryCall Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on Delivery
Call Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on Deliverybabeytanya
 
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024APNIC
 
Call Girls in East Of Kailash 9711199171 Delhi Enjoy Call Girls With Our Escorts
Call Girls in East Of Kailash 9711199171 Delhi Enjoy Call Girls With Our EscortsCall Girls in East Of Kailash 9711199171 Delhi Enjoy Call Girls With Our Escorts
Call Girls in East Of Kailash 9711199171 Delhi Enjoy Call Girls With Our Escortsindian call girls near you
 
Russian Call girls in Dubai +971563133746 Dubai Call girls
Russian  Call girls in Dubai +971563133746 Dubai  Call girlsRussian  Call girls in Dubai +971563133746 Dubai  Call girls
Russian Call girls in Dubai +971563133746 Dubai Call girlsstephieert
 
Russian Call Girls Thane Swara 8617697112 Independent Escort Service Thane
Russian Call Girls Thane Swara 8617697112 Independent Escort Service ThaneRussian Call Girls Thane Swara 8617697112 Independent Escort Service Thane
Russian Call Girls Thane Swara 8617697112 Independent Escort Service ThaneCall girls in Ahmedabad High profile
 
VIP Kolkata Call Girls Salt Lake 8250192130 Available With Room
VIP Kolkata Call Girls Salt Lake 8250192130 Available With RoomVIP Kolkata Call Girls Salt Lake 8250192130 Available With Room
VIP Kolkata Call Girls Salt Lake 8250192130 Available With Roomgirls4nights
 
VIP Kolkata Call Girl Kestopur 👉 8250192130 Available With Room
VIP Kolkata Call Girl Kestopur 👉 8250192130  Available With RoomVIP Kolkata Call Girl Kestopur 👉 8250192130  Available With Room
VIP Kolkata Call Girl Kestopur 👉 8250192130 Available With Roomdivyansh0kumar0
 
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...SofiyaSharma5
 
Git and Github workshop GDSC MLRITM
Git and Github  workshop GDSC MLRITMGit and Github  workshop GDSC MLRITM
Git and Github workshop GDSC MLRITMgdsc13
 
AlbaniaDreamin24 - How to easily use an API with Flows
AlbaniaDreamin24 - How to easily use an API with FlowsAlbaniaDreamin24 - How to easily use an API with Flows
AlbaniaDreamin24 - How to easily use an API with FlowsThierry TROUIN ☁
 
象限策略:Google Workspace 与 Microsoft 365 对业务的影响 .pdf
象限策略:Google Workspace 与 Microsoft 365 对业务的影响 .pdf象限策略:Google Workspace 与 Microsoft 365 对业务的影响 .pdf
象限策略:Google Workspace 与 Microsoft 365 对业务的影响 .pdfkeithzhangding
 
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebGDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebJames Anderson
 
VIP 7001035870 Find & Meet Hyderabad Call Girls Dilsukhnagar high-profile Cal...
VIP 7001035870 Find & Meet Hyderabad Call Girls Dilsukhnagar high-profile Cal...VIP 7001035870 Find & Meet Hyderabad Call Girls Dilsukhnagar high-profile Cal...
VIP 7001035870 Find & Meet Hyderabad Call Girls Dilsukhnagar high-profile Cal...aditipandeya
 

Recently uploaded (20)

Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
 
Denver Web Design brochure for public viewing
Denver Web Design brochure for public viewingDenver Web Design brochure for public viewing
Denver Web Design brochure for public viewing
 
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
 
Model Call Girl in Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in  Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in  Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝
 
Call Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on Delivery
Call Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on DeliveryCall Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on Delivery
Call Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on Delivery
 
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024
 
Rohini Sector 26 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 26 Call Girls Delhi 9999965857 @Sabina Saikh No AdvanceRohini Sector 26 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 26 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
 
Call Girls in East Of Kailash 9711199171 Delhi Enjoy Call Girls With Our Escorts
Call Girls in East Of Kailash 9711199171 Delhi Enjoy Call Girls With Our EscortsCall Girls in East Of Kailash 9711199171 Delhi Enjoy Call Girls With Our Escorts
Call Girls in East Of Kailash 9711199171 Delhi Enjoy Call Girls With Our Escorts
 
Rohini Sector 6 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 6 Call Girls Delhi 9999965857 @Sabina Saikh No AdvanceRohini Sector 6 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 6 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
 
Russian Call girls in Dubai +971563133746 Dubai Call girls
Russian  Call girls in Dubai +971563133746 Dubai  Call girlsRussian  Call girls in Dubai +971563133746 Dubai  Call girls
Russian Call girls in Dubai +971563133746 Dubai Call girls
 
Russian Call Girls Thane Swara 8617697112 Independent Escort Service Thane
Russian Call Girls Thane Swara 8617697112 Independent Escort Service ThaneRussian Call Girls Thane Swara 8617697112 Independent Escort Service Thane
Russian Call Girls Thane Swara 8617697112 Independent Escort Service Thane
 
VIP Kolkata Call Girls Salt Lake 8250192130 Available With Room
VIP Kolkata Call Girls Salt Lake 8250192130 Available With RoomVIP Kolkata Call Girls Salt Lake 8250192130 Available With Room
VIP Kolkata Call Girls Salt Lake 8250192130 Available With Room
 
Vip Call Girls Aerocity ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Aerocity ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Aerocity ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Aerocity ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
VIP Kolkata Call Girl Kestopur 👉 8250192130 Available With Room
VIP Kolkata Call Girl Kestopur 👉 8250192130  Available With RoomVIP Kolkata Call Girl Kestopur 👉 8250192130  Available With Room
VIP Kolkata Call Girl Kestopur 👉 8250192130 Available With Room
 
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...
 
Git and Github workshop GDSC MLRITM
Git and Github  workshop GDSC MLRITMGit and Github  workshop GDSC MLRITM
Git and Github workshop GDSC MLRITM
 
AlbaniaDreamin24 - How to easily use an API with Flows
AlbaniaDreamin24 - How to easily use an API with FlowsAlbaniaDreamin24 - How to easily use an API with Flows
AlbaniaDreamin24 - How to easily use an API with Flows
 
象限策略:Google Workspace 与 Microsoft 365 对业务的影响 .pdf
象限策略:Google Workspace 与 Microsoft 365 对业务的影响 .pdf象限策略:Google Workspace 与 Microsoft 365 对业务的影响 .pdf
象限策略:Google Workspace 与 Microsoft 365 对业务的影响 .pdf
 
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebGDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
 
VIP 7001035870 Find & Meet Hyderabad Call Girls Dilsukhnagar high-profile Cal...
VIP 7001035870 Find & Meet Hyderabad Call Girls Dilsukhnagar high-profile Cal...VIP 7001035870 Find & Meet Hyderabad Call Girls Dilsukhnagar high-profile Cal...
VIP 7001035870 Find & Meet Hyderabad Call Girls Dilsukhnagar high-profile Cal...
 

AWS SSA Webinar 21 - Getting Started with Data lakes on AWS

  • 1. Cobus Bernard Sr Developer Advocate Amazon Web Services Getting Started with Data Lakes on AWS @cobusbernard cobusbernard cobusbernard
  • 2. Agenda What is a Data Lake Storing data in S3 Steps to build a Data Lake AWS Lakeformation Demo Q&A
  • 3. A data lake is a centralised repository that allows you to store all your structured and unstructured data at any scale
  • 4. Why data lakes? Data Lakes provide: Relational and non-relational data Scale-out to EBs (1EB = 1,024 PB = 1,048,576 TB) Diverse set of analytics and machine learning tools Work on data without any data movement Designed for low cost storage and analytics OLTP ERP CRM LOB Data Warehouse Business Intelligence Data Lake 1001100001001010111001 0101011100101010000101 1111011010 0011110010110010110 0100011000010 Devices Web Sensors Social Catalog Machine Learning DW Queries Big data processing Interactive Real-time
  • 5. Build a secure data lake on Amazon S3 Amazon S3 Block Public Access Amazon S3 object lock Amazon S3 object tags Amazon S3 access points • Multi-tenant bucket • Dedicated access points • Custom permissions from a virtual private cloud (VPC) • Controls public access • Across AWS accounts & individual S3 bucket levels • Specify any type of public permissions via ACL or policy • Immutable Amazon S3 objects • Retention management controls • Data protection and compliance • Access control, lifecycle policies, analysis • Classify data, filter objects • Define replication policies FSx for Lustre
  • 6.
  • 7. Choosing the right data lake storage class Select storage class by data pipeline stage Raw data ETL • Small log files • Overwrites if synced • Short lived • Moved & deleted • Batched & archived Production data lake Historical data Amazon S3 Standard Amazon S3 Standard Amazon S3 Intelligent-Tiering Amazon S3 Glacier or S3 Glacier Deep Archive • Data churn • Small intermediates • Multiple transforms • Deletes <30 days • Output to data lake • Optimized sizes (MBs) • Many users • Unpredictable access • Long-lived assets • Hot to cool • Historical assets • ML model training • Compliance/Audit • Data protection • Planned restores Online cool data Amazon S3 Standard Infrequent Access and One Zone-IA • Replicated DR data • Infrequently accessed • Infrequent queries • ML model training
  • 8. Typical steps of building a data lake Setup storage1 Move data2 Cleanse, prep, and catalog data 3 Configure and enforce security and compliance policies 4 Make data available for analytics 5
  • 9. Data preparation accounts for ~80% of the work Building training sets Cleaning and organizing data Collecting data sets Mining data for patterns Refining algorithms Other
  • 10. Sample of steps required Find sources Create Amazon Simple Storage Service (Amazon S3) locations Configure access policies Map tables to Amazon S3 locations ETL jobs to load and clean data Create metadata access policies Configure access from analytics services Rinse and repeat for other: data sets, users, and end-services And more: manage and monitor ETL jobs update metadata catalog as data changes update policies across services as users and permissions change manually maintain cleansing scripts create audit processes for compliance … Manual | Error-prone | Time consuming
  • 11. Enforce security policies across multiple services Gain and manage new insights Identify, ingest, clean, and transform data Build a secure data lake in days AWS Lake Formation
  • 13. Register existing data or import new Amazon S3 forms the storage layer for Lake Formation Register existing S3 buckets that contain your data Ask Lake Formation to create required S3 buckets and import data into them Data is stored in your account. You have direct access to it. No lock-in. Data Lake Storage Data Catalog Access Control Data import Lake Formation Crawlers ML-based data prep
  • 14. Easily load data to your data lake logs DBs Blueprints Data Lake Storage Data Catalog Access Control Data import Lake Formation Crawlers ML-based data prep one-shot incremental
  • 15. With blueprints You 1. Point us to the source 2. Tell us the location to load to in your data lake 3. Specify how often you want to load the data Blueprints 1. Discover the source table(s) schema 2. Automatically convert to the target data format 3. Automatically partition the data based on the partitioning schema 4. Keep track of data that was already processed 5. You can customize any of the above
  • 17. © 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 19. Thank you! © 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved. Cobus Bernard Sr Developer Advocate Amazon Web Services @cobusbernard cobusbernard cobusbernard

Editor's Notes

  1. That’s why many customers are moving to a data lake architecture. A data lake is an architectural approach that helps you manage multiple data types from a wide variety of sources, both structured and unstructured, through a unified set of tools, so it's readily available to be categorized, processed, analyzed and consumed by diverse groups within an organization. Since data can be stored as-is, there is no need to convert it to a predefined schema and you no longer need to know what questions you want to ask of your data beforehand. batch, interactive, online, search, in-memory and other processing engines. It helps you manage multiple data types from a wide variety of sources, both structured and unstructured, through a unified set of tools
  2. Data lakes allow you to break down data silos and bring data into a single central repository. You can store a wide variety of data formats, at any scale and at low cost. Data lakes provide you a single source of truth and allow you access to the same data using a variety of analytics and machine-learning tools.
  3. Turns out there are a lot of steps involved in building data lakes 1/ Set up storage – Data lakes hold a massive amount of data. Before doing anything else, customers need to set up storage to hold all of that data. If they are using AWS they configure S3 buckets and partitions. If they are doing this on-prem they acquire hardware and set up large disk arrays to hold all of the data for their data lake. 2/ Move data -- Customers need to connect to different data sources on-premises, in the cloud, and on IoT devices. Then they need to collect and organize the relevant data sets from those sources, crawl the data to extract the schemas, and add metadata tags to the catalog. Customers do this today with a collection of file transfer and ETL tools, like AWS Glue. 3/ Clean and prepare data -- Next, that data must be carefully partitioned, indexed, and transformed to columnar formats to optimize for performance and cost. Customers need to clean, de-duplicate, and match related records. Today this is done using rigid and complex SQL statements that only work so well and are difficult to maintain. This process of collecting, cleaning, and transforming the incoming data is complex and must be manually monitored in order to avoid errors. 4/ Configure and enforce policies – Sensitive data must be secured according to compliance requirements. This means creating and applying data access, protection, and compliance policies to make sure you are meeting required standards. For example, restricting access to personally identifiable information (PII) at the table, column, or row level, encrypting all data, and keeping audit logs of who is accessing the data. Today customers use access control lists on S3 buckets or they use 3rd party encryption and access control software to secure the data. And for every analytics service that needs to access the data, customers need to create and maintain data access, protection and compliance policies for each one. For example, if you are running analysis against your data lake, using Redshift and Athena, you need to set up access control rules for each of these services. 5/ Make it easy to find data - Different people in your organizations, like analysts and data scientists, may have trouble finding and trusting data sets in the data lake. You need to make it easy for those end-users to easily find relevant and trusted data. To do this you must clearly label the data in a catalog of the data lake and provide users with the ability to access and analyze this data without making requests to IT. Each of these steps involve a lot of work because today a lot of it is done manually. Customers can spend months building data access and transformation workflows, mapping security and policy settings, and configuring tools and services for data movement, storage, cataloging, security, analytics, and machine learning. With all these steps, a fully productive data lake can take months to implement. TRANSITION: We’ve learned from the tens of thousands of customers running analytics on AWS that most customers that want to do analytics want to build a data lake, and many of them want this to be easier and faster than it is today.
  4. A recent study by CrowdFlower who surveyed ~80 data scientists about their jobs. They found Data scientists spend 60% of their time on cleaning and organizing data. Collecting data sets comes second at 19% of their time, meaning data scientists spend around ~80% of their time on preparing and managing data for analysis. They also found data preparation was the least enjoyable parts of their work! https://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/#6493d6c76f63 The question we have to ask ourselves is if we can make data preparation easier? Can we minimize the time that people are collecting data sets and cleaning/organizing their data?
  5. Lake Formation automates many of the steps we discussed, allowing customers to get started with just a few clicks from a single, unified dashboard. 1/ Identify, ingest, clean, and transform data: With Lake Formation, you can move, store, catalog, and clean your data faster. 2/ Enforce security policies across multiple services: Once your data sources are setup, you then define security, governance, and auditing policies in one place, and enforce those policies for all users and all applications. 3/ Gain and manage new insights: With Michigan you build a data catalog that describes available data sets and their appropriate business uses. This makes your users more productive by helping them find the right data set to analyze. By providing a catalog of your data and consistent security enforcement, Michigan makes it easier for your analysts and data scientists to combine multiple analytic tools, like Athena, Redshift, and EMR, across diverse data sets.
  6. With just a few clicks, you can setup your data lake on Amazon S3 and start ingesting data that is readily queryable. Lake For To get started, you go to the Michigan dashboard in the AWS console, add your data sources and then Michigan will crawl those sources and move the data into your new Amazon S3 data lake. Michigan uses machine learning to automatically lay out the data in Amazon S3 partitions, change it into formats for faster analytics, like Apache Parquet and ORC, and also de-duplicates and finds matching records to increase data quality. From a single screen set up all of the permissions for your data lake and they will be implemented across all services accessing this data - analytics and machine learning services (Amazon Redshift, Amazon Athena, and Amazon EMR.) This reduces the hassle in re-defining policies across multiple services and provides consistent enforcement and compliance of those policies. mation …
  7. Blueprints heavily leverage the functionality in AWS Glue: We use Glue crawlers and connections to connect and discover the raw data that needs to be ingested. We use Glue code-gen and jobs to generate the ingest code to bring that data into the data lake. We leverage the data catalog for organizing the metadata. We have added a workflow construct to stitch together crawlers, jobs, and allow for monitoring for individual workflows. They’re an natural extension of the AWS Glue capabilities.