SlideShare a Scribd company logo

AWS Lake Formation Deep Dive

Learn about building out your data lake on AWS using AWS Lake Formation.

1 of 62
Download to read offline
© 2020, Amazon Web Services, Inc. or its Affiliates.
Cobus Bernard
Senior Developer Advocate
Amazon Web Services
AWS Lake Formation
Deep Dive
© 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates.
What’s coming up:
Bonus
• QR codes
• Data Lake refresher
• AWS Lake Formation & Related Services
• Use-case examples
• Going forward
• QA
© 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates.
Poll – Which of these are you using?
• S3 Lifecycle policy
• Glue ETL with ML
• Athena to query Relational Databases
• AWS Datalake
Amazon S3 | AWS Glue
AWS Direct Connect
AWS Snowball
AWS Snowmobile
AWS Database Migration Service
AWS IoT Core
Amazon Kinesis Data Firehose
Amazon Kinesis Data Streams
Amazon Kinesis Video Streams
On-premises
Data Movement
Amazon SageMaker
AWS Deep Learning AMIs
Amazon Rekognition
Amazon Lex
AWS DeepLens
Amazon Comprehend
Amazon Translate
Amazon Transcribe
Amazon Polly
Amazon Athena
Amazon EMR
Amazon Redshift
Amazon Elasticsearch Service
Amazon Kinesis
Amazon QuickSight
Analytics
Machine Learning
Real-time
Data Movement
Data Lake
© 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates.
Data Lake Refresher
Concepts & flow
© 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates.
A data lake isa centralized repository that allows you
to store all your structured and unstructured dataat
any scale
DataLakeDefinition
© 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates.
• All data in one place, a single source of truth
• Support Different Formats - structured/semi-structured/unstructured/raw data
• Supports fast ingestion and consumption
• Schema on read
• Designed for low-cost storage
• Decouples storage and compute
• Supports protection and security rules
DataLakeMainConcepts

Recommended

Introduction to AWS Lake Formation.pptx
Introduction to AWS Lake Formation.pptxIntroduction to AWS Lake Formation.pptx
Introduction to AWS Lake Formation.pptxSwathiPonugumati
 
Best Practices for Building Your Data Lake on AWS
Best Practices for Building Your Data Lake on AWSBest Practices for Building Your Data Lake on AWS
Best Practices for Building Your Data Lake on AWSAmazon Web Services
 
The Zen of DataOps – AWS Lake Formation and the Data Supply Chain Pipeline
The Zen of DataOps – AWS Lake Formation and the Data Supply Chain PipelineThe Zen of DataOps – AWS Lake Formation and the Data Supply Chain Pipeline
The Zen of DataOps – AWS Lake Formation and the Data Supply Chain PipelineAmazon Web Services
 
How to build a data lake with aws glue data catalog (ABD213-R) re:Invent 2017
How to build a data lake with aws glue data catalog (ABD213-R)  re:Invent 2017How to build a data lake with aws glue data catalog (ABD213-R)  re:Invent 2017
How to build a data lake with aws glue data catalog (ABD213-R) re:Invent 2017Amazon Web Services
 
Building Serverless ETL Pipelines with AWS Glue
Building Serverless ETL Pipelines with AWS GlueBuilding Serverless ETL Pipelines with AWS Glue
Building Serverless ETL Pipelines with AWS GlueAmazon Web Services
 

More Related Content

What's hot

How to go from zero to data lakes in days - ADB202 - New York AWS Summit
How to go from zero to data lakes in days - ADB202 - New York AWS SummitHow to go from zero to data lakes in days - ADB202 - New York AWS Summit
How to go from zero to data lakes in days - ADB202 - New York AWS SummitAmazon Web Services
 
글로벌 기업들의 효과적인 데이터 분석을 위한 Data Lake 구축 및 분석 사례 - 김준형 (AWS 솔루션즈 아키텍트)
글로벌 기업들의 효과적인 데이터 분석을 위한 Data Lake 구축 및 분석 사례 - 김준형 (AWS 솔루션즈 아키텍트)글로벌 기업들의 효과적인 데이터 분석을 위한 Data Lake 구축 및 분석 사례 - 김준형 (AWS 솔루션즈 아키텍트)
글로벌 기업들의 효과적인 데이터 분석을 위한 Data Lake 구축 및 분석 사례 - 김준형 (AWS 솔루션즈 아키텍트)Amazon Web Services Korea
 
Building Advanced Workflows with AWS Glue (ANT333) - AWS re:Invent 2018
Building Advanced Workflows with AWS Glue (ANT333) - AWS re:Invent 2018Building Advanced Workflows with AWS Glue (ANT333) - AWS re:Invent 2018
Building Advanced Workflows with AWS Glue (ANT333) - AWS re:Invent 2018Amazon Web Services
 
Building Your Data Lake on AWS - Level 200
Building Your Data Lake on AWS - Level 200Building Your Data Lake on AWS - Level 200
Building Your Data Lake on AWS - Level 200Amazon Web Services
 
Architecting a Serverless Data Lake on AWS
Architecting a Serverless Data Lake on AWSArchitecting a Serverless Data Lake on AWS
Architecting a Serverless Data Lake on AWSAmazon Web Services
 
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...Amazon Web Services
 
AWS Lake Formation을 통한 손쉬운 데이터 레이크 구성 및 관리 - 윤석찬 :: AWS Unboxing 온라인 세미나
AWS Lake Formation을 통한 손쉬운 데이터 레이크 구성 및 관리 - 윤석찬 :: AWS Unboxing 온라인 세미나AWS Lake Formation을 통한 손쉬운 데이터 레이크 구성 및 관리 - 윤석찬 :: AWS Unboxing 온라인 세미나
AWS Lake Formation을 통한 손쉬운 데이터 레이크 구성 및 관리 - 윤석찬 :: AWS Unboxing 온라인 세미나Amazon Web Services Korea
 
How to backup, restore and archive your data on AWS
How to backup, restore and archive your data on AWSHow to backup, restore and archive your data on AWS
How to backup, restore and archive your data on AWSAmazon Web Services
 
Amazon S3 & Amazon Glacier - Object Storage Overview
Amazon S3 & Amazon Glacier - Object Storage OverviewAmazon S3 & Amazon Glacier - Object Storage Overview
Amazon S3 & Amazon Glacier - Object Storage OverviewAmazon Web Services
 
Getting Started with AWS Enterprise Applications: WorkSpaces, WorkMail, WorkDocs
Getting Started with AWS Enterprise Applications: WorkSpaces, WorkMail, WorkDocsGetting Started with AWS Enterprise Applications: WorkSpaces, WorkMail, WorkDocs
Getting Started with AWS Enterprise Applications: WorkSpaces, WorkMail, WorkDocsAmazon Web Services
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSAmazon Web Services
 
Building Data Lakes in the AWS Cloud
Building Data Lakes in the AWS CloudBuilding Data Lakes in the AWS Cloud
Building Data Lakes in the AWS CloudAmazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Secure, Build and Deduplicate Your Data Lake Data with Amazon Lake Formation
Secure, Build and Deduplicate Your Data Lake Data with Amazon Lake FormationSecure, Build and Deduplicate Your Data Lake Data with Amazon Lake Formation
Secure, Build and Deduplicate Your Data Lake Data with Amazon Lake FormationAmazon Web Services
 
Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaAmazon Web Services
 
Build Data Lakes & Analytics on AWS: Patterns & Best Practices
Build Data Lakes & Analytics on AWS: Patterns & Best PracticesBuild Data Lakes & Analytics on AWS: Patterns & Best Practices
Build Data Lakes & Analytics on AWS: Patterns & Best PracticesAmazon Web Services
 
Deep Dive on Amazon Athena - AWS Online Tech Talks
Deep Dive on Amazon Athena - AWS Online Tech TalksDeep Dive on Amazon Athena - AWS Online Tech Talks
Deep Dive on Amazon Athena - AWS Online Tech TalksAmazon Web Services
 

What's hot (20)

Amazon QuickSight
Amazon QuickSightAmazon QuickSight
Amazon QuickSight
 
How to go from zero to data lakes in days - ADB202 - New York AWS Summit
How to go from zero to data lakes in days - ADB202 - New York AWS SummitHow to go from zero to data lakes in days - ADB202 - New York AWS Summit
How to go from zero to data lakes in days - ADB202 - New York AWS Summit
 
글로벌 기업들의 효과적인 데이터 분석을 위한 Data Lake 구축 및 분석 사례 - 김준형 (AWS 솔루션즈 아키텍트)
글로벌 기업들의 효과적인 데이터 분석을 위한 Data Lake 구축 및 분석 사례 - 김준형 (AWS 솔루션즈 아키텍트)글로벌 기업들의 효과적인 데이터 분석을 위한 Data Lake 구축 및 분석 사례 - 김준형 (AWS 솔루션즈 아키텍트)
글로벌 기업들의 효과적인 데이터 분석을 위한 Data Lake 구축 및 분석 사례 - 김준형 (AWS 솔루션즈 아키텍트)
 
Building Advanced Workflows with AWS Glue (ANT333) - AWS re:Invent 2018
Building Advanced Workflows with AWS Glue (ANT333) - AWS re:Invent 2018Building Advanced Workflows with AWS Glue (ANT333) - AWS re:Invent 2018
Building Advanced Workflows with AWS Glue (ANT333) - AWS re:Invent 2018
 
Amazon QuickSight
Amazon QuickSightAmazon QuickSight
Amazon QuickSight
 
AWS glue technical enablement training
AWS glue technical enablement trainingAWS glue technical enablement training
AWS glue technical enablement training
 
Building Your Data Lake on AWS - Level 200
Building Your Data Lake on AWS - Level 200Building Your Data Lake on AWS - Level 200
Building Your Data Lake on AWS - Level 200
 
Architecting a Serverless Data Lake on AWS
Architecting a Serverless Data Lake on AWSArchitecting a Serverless Data Lake on AWS
Architecting a Serverless Data Lake on AWS
 
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
 
AWS Lake Formation을 통한 손쉬운 데이터 레이크 구성 및 관리 - 윤석찬 :: AWS Unboxing 온라인 세미나
AWS Lake Formation을 통한 손쉬운 데이터 레이크 구성 및 관리 - 윤석찬 :: AWS Unboxing 온라인 세미나AWS Lake Formation을 통한 손쉬운 데이터 레이크 구성 및 관리 - 윤석찬 :: AWS Unboxing 온라인 세미나
AWS Lake Formation을 통한 손쉬운 데이터 레이크 구성 및 관리 - 윤석찬 :: AWS Unboxing 온라인 세미나
 
How to backup, restore and archive your data on AWS
How to backup, restore and archive your data on AWSHow to backup, restore and archive your data on AWS
How to backup, restore and archive your data on AWS
 
Amazon S3 & Amazon Glacier - Object Storage Overview
Amazon S3 & Amazon Glacier - Object Storage OverviewAmazon S3 & Amazon Glacier - Object Storage Overview
Amazon S3 & Amazon Glacier - Object Storage Overview
 
Getting Started with AWS Enterprise Applications: WorkSpaces, WorkMail, WorkDocs
Getting Started with AWS Enterprise Applications: WorkSpaces, WorkMail, WorkDocsGetting Started with AWS Enterprise Applications: WorkSpaces, WorkMail, WorkDocs
Getting Started with AWS Enterprise Applications: WorkSpaces, WorkMail, WorkDocs
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
 
Building Data Lakes in the AWS Cloud
Building Data Lakes in the AWS CloudBuilding Data Lakes in the AWS Cloud
Building Data Lakes in the AWS Cloud
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Secure, Build and Deduplicate Your Data Lake Data with Amazon Lake Formation
Secure, Build and Deduplicate Your Data Lake Data with Amazon Lake FormationSecure, Build and Deduplicate Your Data Lake Data with Amazon Lake Formation
Secure, Build and Deduplicate Your Data Lake Data with Amazon Lake Formation
 
Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & Athena
 
Build Data Lakes & Analytics on AWS: Patterns & Best Practices
Build Data Lakes & Analytics on AWS: Patterns & Best PracticesBuild Data Lakes & Analytics on AWS: Patterns & Best Practices
Build Data Lakes & Analytics on AWS: Patterns & Best Practices
 
Deep Dive on Amazon Athena - AWS Online Tech Talks
Deep Dive on Amazon Athena - AWS Online Tech TalksDeep Dive on Amazon Athena - AWS Online Tech Talks
Deep Dive on Amazon Athena - AWS Online Tech Talks
 

Similar to AWS Lake Formation Deep Dive

Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28Amazon Web Services
 
AWS Floor 28 - Building Data lake on AWS
AWS Floor 28 - Building Data lake on AWSAWS Floor 28 - Building Data lake on AWS
AWS Floor 28 - Building Data lake on AWSAdir Sharabi
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSAmazon Web Services
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSAmazon Web Services
 
Build Data Lakes and Analytics on AWS: Patterns & Best Practices
Build Data Lakes and Analytics on AWS: Patterns & Best PracticesBuild Data Lakes and Analytics on AWS: Patterns & Best Practices
Build Data Lakes and Analytics on AWS: Patterns & Best PracticesAmazon Web Services
 
Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaAmazon Web Services
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSAmazon Web Services
 
Building a Modern Data Platform in the Cloud
Building a Modern Data Platform in the CloudBuilding a Modern Data Platform in the Cloud
Building a Modern Data Platform in the CloudAmazon Web Services
 
Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...
Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...
Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...Amazon Web Services
 
BI & Analytics - A Datalake on AWS
BI & Analytics - A Datalake on AWSBI & Analytics - A Datalake on AWS
BI & Analytics - A Datalake on AWSAmazon Web Services
 
Implementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdfImplementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdfAmazon Web Services
 
Building-Serverless-Analytics-On-AWS
Building-Serverless-Analytics-On-AWSBuilding-Serverless-Analytics-On-AWS
Building-Serverless-Analytics-On-AWSAmazon Web Services
 
Building Data Lakes for Analytics on AWS
Building Data Lakes for Analytics on AWSBuilding Data Lakes for Analytics on AWS
Building Data Lakes for Analytics on AWSAmazon Web Services
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSAmazon Web Services
 
Module 1 - CP Datalake on AWS
Module 1 - CP Datalake on AWSModule 1 - CP Datalake on AWS
Module 1 - CP Datalake on AWSLam Le
 
AWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWS
AWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWSAWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWS
AWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWSSteven Hsieh
 

Similar to AWS Lake Formation Deep Dive (20)

Data_Analytics_and_AI_ML
Data_Analytics_and_AI_MLData_Analytics_and_AI_ML
Data_Analytics_and_AI_ML
 
Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28
 
AWS Floor 28 - Building Data lake on AWS
AWS Floor 28 - Building Data lake on AWSAWS Floor 28 - Building Data lake on AWS
AWS Floor 28 - Building Data lake on AWS
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
 
Construindo data lakes e analytics com AWS
Construindo data lakes e analytics com AWSConstruindo data lakes e analytics com AWS
Construindo data lakes e analytics com AWS
 
Build Data Lakes and Analytics on AWS: Patterns & Best Practices
Build Data Lakes and Analytics on AWS: Patterns & Best PracticesBuild Data Lakes and Analytics on AWS: Patterns & Best Practices
Build Data Lakes and Analytics on AWS: Patterns & Best Practices
 
Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & Athena
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
 
Building a Modern Data Platform in the Cloud
Building a Modern Data Platform in the CloudBuilding a Modern Data Platform in the Cloud
Building a Modern Data Platform in the Cloud
 
Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...
Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...
Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...
 
Data Warehouses and Data Lakes
Data Warehouses and Data LakesData Warehouses and Data Lakes
Data Warehouses and Data Lakes
 
BI & Analytics - A Datalake on AWS
BI & Analytics - A Datalake on AWSBI & Analytics - A Datalake on AWS
BI & Analytics - A Datalake on AWS
 
Implementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdfImplementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdf
 
Building-Serverless-Analytics-On-AWS
Building-Serverless-Analytics-On-AWSBuilding-Serverless-Analytics-On-AWS
Building-Serverless-Analytics-On-AWS
 
Building Data Lakes for Analytics on AWS
Building Data Lakes for Analytics on AWSBuilding Data Lakes for Analytics on AWS
Building Data Lakes for Analytics on AWS
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
 
Module 1 - CP Datalake on AWS
Module 1 - CP Datalake on AWSModule 1 - CP Datalake on AWS
Module 1 - CP Datalake on AWS
 
Implementing a Data Lake
Implementing a Data LakeImplementing a Data Lake
Implementing a Data Lake
 
AWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWS
AWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWSAWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWS
AWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWS
 

More from Cobus Bernard

London Microservices Meetup: Lessons learnt adopting microservices
London Microservices  Meetup: Lessons learnt adopting microservicesLondon Microservices  Meetup: Lessons learnt adopting microservices
London Microservices Meetup: Lessons learnt adopting microservicesCobus Bernard
 
AWS SSA Webinar 34 - Getting started with databases on AWS - Managing DBs wit...
AWS SSA Webinar 34 - Getting started with databases on AWS - Managing DBs wit...AWS SSA Webinar 34 - Getting started with databases on AWS - Managing DBs wit...
AWS SSA Webinar 34 - Getting started with databases on AWS - Managing DBs wit...Cobus Bernard
 
AWS SSA Webinar 33 - Getting started with databases on AWS Amazon DynamoDB
AWS SSA Webinar 33 - Getting started with databases on AWS Amazon DynamoDBAWS SSA Webinar 33 - Getting started with databases on AWS Amazon DynamoDB
AWS SSA Webinar 33 - Getting started with databases on AWS Amazon DynamoDBCobus Bernard
 
AWS SSA Webinar 32 - Getting Started with databases on AWS: Choosing the righ...
AWS SSA Webinar 32 - Getting Started with databases on AWS: Choosing the righ...AWS SSA Webinar 32 - Getting Started with databases on AWS: Choosing the righ...
AWS SSA Webinar 32 - Getting Started with databases on AWS: Choosing the righ...Cobus Bernard
 
AWS SSA Webinar 30 - Getting Started with AWS - Infrastructure as Code - Terr...
AWS SSA Webinar 30 - Getting Started with AWS - Infrastructure as Code - Terr...AWS SSA Webinar 30 - Getting Started with AWS - Infrastructure as Code - Terr...
AWS SSA Webinar 30 - Getting Started with AWS - Infrastructure as Code - Terr...Cobus Bernard
 
AWS SSA Webinar 28 - Getting Started with AWS - Infrastructure as Code
AWS SSA Webinar 28 - Getting Started with AWS - Infrastructure as CodeAWS SSA Webinar 28 - Getting Started with AWS - Infrastructure as Code
AWS SSA Webinar 28 - Getting Started with AWS - Infrastructure as CodeCobus Bernard
 
AWS Webinar 24 - Getting Started with AWS - Understanding DR
AWS Webinar 24 - Getting Started with AWS - Understanding DRAWS Webinar 24 - Getting Started with AWS - Understanding DR
AWS Webinar 24 - Getting Started with AWS - Understanding DRCobus Bernard
 
AWS Webinar 23 - Getting Started with AWS - Understanding total cost of owner...
AWS Webinar 23 - Getting Started with AWS - Understanding total cost of owner...AWS Webinar 23 - Getting Started with AWS - Understanding total cost of owner...
AWS Webinar 23 - Getting Started with AWS - Understanding total cost of owner...Cobus Bernard
 
AWS SSA Webinar 21 - Getting Started with Data lakes on AWS
AWS SSA Webinar 21 - Getting Started with Data lakes on AWSAWS SSA Webinar 21 - Getting Started with Data lakes on AWS
AWS SSA Webinar 21 - Getting Started with Data lakes on AWSCobus Bernard
 
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWS
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWSAWS SSA Webinar 20 - Getting Started with Data Warehouses on AWS
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWSCobus Bernard
 
AWS SSA Webinar 19 - Getting Started with Multi-Region Architecture: Services
AWS SSA Webinar 19 - Getting Started with Multi-Region Architecture: ServicesAWS SSA Webinar 19 - Getting Started with Multi-Region Architecture: Services
AWS SSA Webinar 19 - Getting Started with Multi-Region Architecture: ServicesCobus Bernard
 
AWS SSA Webinar 18 - Getting Started with Multi-Region Architecture: Data
AWS SSA Webinar 18 - Getting Started with Multi-Region Architecture: DataAWS SSA Webinar 18 - Getting Started with Multi-Region Architecture: Data
AWS SSA Webinar 18 - Getting Started with Multi-Region Architecture: DataCobus Bernard
 
AWS EMEA Online Summit - Live coding with containers
AWS EMEA Online Summit - Live coding with containersAWS EMEA Online Summit - Live coding with containers
AWS EMEA Online Summit - Live coding with containersCobus Bernard
 
AWS EMEA Online Summit - Blending Spot and On-Demand instances to optimizing ...
AWS EMEA Online Summit - Blending Spot and On-Demand instances to optimizing ...AWS EMEA Online Summit - Blending Spot and On-Demand instances to optimizing ...
AWS EMEA Online Summit - Blending Spot and On-Demand instances to optimizing ...Cobus Bernard
 
AWS SSA Webinar 17 - Getting Started on AWS with Amazon RDS
AWS SSA Webinar 17 - Getting Started on AWS with Amazon RDSAWS SSA Webinar 17 - Getting Started on AWS with Amazon RDS
AWS SSA Webinar 17 - Getting Started on AWS with Amazon RDSCobus Bernard
 
AWS SSA Webinar 16 - Getting Started on AWS with Amazon EC2
AWS SSA Webinar 16 - Getting Started on AWS with Amazon EC2AWS SSA Webinar 16 - Getting Started on AWS with Amazon EC2
AWS SSA Webinar 16 - Getting Started on AWS with Amazon EC2Cobus Bernard
 
AWS SSA Webinar 15 - Getting started on AWS with Containers: Amazon EKS
AWS SSA Webinar 15 - Getting started on AWS with Containers: Amazon EKSAWS SSA Webinar 15 - Getting started on AWS with Containers: Amazon EKS
AWS SSA Webinar 15 - Getting started on AWS with Containers: Amazon EKSCobus Bernard
 
AWS SSA Webinar 13 - Getting started on AWS with Containers: Amazon ECS
AWS SSA Webinar 13 - Getting started on AWS with Containers: Amazon ECSAWS SSA Webinar 13 - Getting started on AWS with Containers: Amazon ECS
AWS SSA Webinar 13 - Getting started on AWS with Containers: Amazon ECSCobus Bernard
 
AWS SSA Webinar 11 - Getting started on AWS: Security
AWS SSA Webinar 11 - Getting started on AWS: SecurityAWS SSA Webinar 11 - Getting started on AWS: Security
AWS SSA Webinar 11 - Getting started on AWS: SecurityCobus Bernard
 
AWS SSA Webinar 12 - Getting started on AWS with Containers
AWS SSA Webinar 12 - Getting started on AWS with ContainersAWS SSA Webinar 12 - Getting started on AWS with Containers
AWS SSA Webinar 12 - Getting started on AWS with ContainersCobus Bernard
 

More from Cobus Bernard (20)

London Microservices Meetup: Lessons learnt adopting microservices
London Microservices  Meetup: Lessons learnt adopting microservicesLondon Microservices  Meetup: Lessons learnt adopting microservices
London Microservices Meetup: Lessons learnt adopting microservices
 
AWS SSA Webinar 34 - Getting started with databases on AWS - Managing DBs wit...
AWS SSA Webinar 34 - Getting started with databases on AWS - Managing DBs wit...AWS SSA Webinar 34 - Getting started with databases on AWS - Managing DBs wit...
AWS SSA Webinar 34 - Getting started with databases on AWS - Managing DBs wit...
 
AWS SSA Webinar 33 - Getting started with databases on AWS Amazon DynamoDB
AWS SSA Webinar 33 - Getting started with databases on AWS Amazon DynamoDBAWS SSA Webinar 33 - Getting started with databases on AWS Amazon DynamoDB
AWS SSA Webinar 33 - Getting started with databases on AWS Amazon DynamoDB
 
AWS SSA Webinar 32 - Getting Started with databases on AWS: Choosing the righ...
AWS SSA Webinar 32 - Getting Started with databases on AWS: Choosing the righ...AWS SSA Webinar 32 - Getting Started with databases on AWS: Choosing the righ...
AWS SSA Webinar 32 - Getting Started with databases on AWS: Choosing the righ...
 
AWS SSA Webinar 30 - Getting Started with AWS - Infrastructure as Code - Terr...
AWS SSA Webinar 30 - Getting Started with AWS - Infrastructure as Code - Terr...AWS SSA Webinar 30 - Getting Started with AWS - Infrastructure as Code - Terr...
AWS SSA Webinar 30 - Getting Started with AWS - Infrastructure as Code - Terr...
 
AWS SSA Webinar 28 - Getting Started with AWS - Infrastructure as Code
AWS SSA Webinar 28 - Getting Started with AWS - Infrastructure as CodeAWS SSA Webinar 28 - Getting Started with AWS - Infrastructure as Code
AWS SSA Webinar 28 - Getting Started with AWS - Infrastructure as Code
 
AWS Webinar 24 - Getting Started with AWS - Understanding DR
AWS Webinar 24 - Getting Started with AWS - Understanding DRAWS Webinar 24 - Getting Started with AWS - Understanding DR
AWS Webinar 24 - Getting Started with AWS - Understanding DR
 
AWS Webinar 23 - Getting Started with AWS - Understanding total cost of owner...
AWS Webinar 23 - Getting Started with AWS - Understanding total cost of owner...AWS Webinar 23 - Getting Started with AWS - Understanding total cost of owner...
AWS Webinar 23 - Getting Started with AWS - Understanding total cost of owner...
 
AWS SSA Webinar 21 - Getting Started with Data lakes on AWS
AWS SSA Webinar 21 - Getting Started with Data lakes on AWSAWS SSA Webinar 21 - Getting Started with Data lakes on AWS
AWS SSA Webinar 21 - Getting Started with Data lakes on AWS
 
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWS
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWSAWS SSA Webinar 20 - Getting Started with Data Warehouses on AWS
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWS
 
AWS SSA Webinar 19 - Getting Started with Multi-Region Architecture: Services
AWS SSA Webinar 19 - Getting Started with Multi-Region Architecture: ServicesAWS SSA Webinar 19 - Getting Started with Multi-Region Architecture: Services
AWS SSA Webinar 19 - Getting Started with Multi-Region Architecture: Services
 
AWS SSA Webinar 18 - Getting Started with Multi-Region Architecture: Data
AWS SSA Webinar 18 - Getting Started with Multi-Region Architecture: DataAWS SSA Webinar 18 - Getting Started with Multi-Region Architecture: Data
AWS SSA Webinar 18 - Getting Started with Multi-Region Architecture: Data
 
AWS EMEA Online Summit - Live coding with containers
AWS EMEA Online Summit - Live coding with containersAWS EMEA Online Summit - Live coding with containers
AWS EMEA Online Summit - Live coding with containers
 
AWS EMEA Online Summit - Blending Spot and On-Demand instances to optimizing ...
AWS EMEA Online Summit - Blending Spot and On-Demand instances to optimizing ...AWS EMEA Online Summit - Blending Spot and On-Demand instances to optimizing ...
AWS EMEA Online Summit - Blending Spot and On-Demand instances to optimizing ...
 
AWS SSA Webinar 17 - Getting Started on AWS with Amazon RDS
AWS SSA Webinar 17 - Getting Started on AWS with Amazon RDSAWS SSA Webinar 17 - Getting Started on AWS with Amazon RDS
AWS SSA Webinar 17 - Getting Started on AWS with Amazon RDS
 
AWS SSA Webinar 16 - Getting Started on AWS with Amazon EC2
AWS SSA Webinar 16 - Getting Started on AWS with Amazon EC2AWS SSA Webinar 16 - Getting Started on AWS with Amazon EC2
AWS SSA Webinar 16 - Getting Started on AWS with Amazon EC2
 
AWS SSA Webinar 15 - Getting started on AWS with Containers: Amazon EKS
AWS SSA Webinar 15 - Getting started on AWS with Containers: Amazon EKSAWS SSA Webinar 15 - Getting started on AWS with Containers: Amazon EKS
AWS SSA Webinar 15 - Getting started on AWS with Containers: Amazon EKS
 
AWS SSA Webinar 13 - Getting started on AWS with Containers: Amazon ECS
AWS SSA Webinar 13 - Getting started on AWS with Containers: Amazon ECSAWS SSA Webinar 13 - Getting started on AWS with Containers: Amazon ECS
AWS SSA Webinar 13 - Getting started on AWS with Containers: Amazon ECS
 
AWS SSA Webinar 11 - Getting started on AWS: Security
AWS SSA Webinar 11 - Getting started on AWS: SecurityAWS SSA Webinar 11 - Getting started on AWS: Security
AWS SSA Webinar 11 - Getting started on AWS: Security
 
AWS SSA Webinar 12 - Getting started on AWS with Containers
AWS SSA Webinar 12 - Getting started on AWS with ContainersAWS SSA Webinar 12 - Getting started on AWS with Containers
AWS SSA Webinar 12 - Getting started on AWS with Containers
 

Recently uploaded

NANOG 90: 'BGP in 2023' presented by Geoff Huston
NANOG 90: 'BGP in 2023' presented by Geoff HustonNANOG 90: 'BGP in 2023' presented by Geoff Huston
NANOG 90: 'BGP in 2023' presented by Geoff HustonAPNIC
 
Reactive programming with Spring Webflux.pptx
Reactive programming with Spring Webflux.pptxReactive programming with Spring Webflux.pptx
Reactive programming with Spring Webflux.pptxJoão Esperancinha
 
Seagate HDD Firmware Repair Tool Datasheet 2024
Seagate HDD Firmware Repair Tool Datasheet 2024Seagate HDD Firmware Repair Tool Datasheet 2024
Seagate HDD Firmware Repair Tool Datasheet 2024Dolphin Data Lab
 
WAN-IFRA: World Press Trends Outlook 2023-2024
WAN-IFRA: World Press Trends Outlook 2023-2024WAN-IFRA: World Press Trends Outlook 2023-2024
WAN-IFRA: World Press Trends Outlook 2023-2024Damian Radcliffe
 
Information Technology Project to Create a Business
Information Technology Project to Create a BusinessInformation Technology Project to Create a Business
Information Technology Project to Create a Businessmbowl010
 
ConFoo 2024 - Need for Speed: Removing speed bumps in API Projects
ConFoo 2024  - Need for Speed: Removing speed bumps in API ProjectsConFoo 2024  - Need for Speed: Removing speed bumps in API Projects
ConFoo 2024 - Need for Speed: Removing speed bumps in API ProjectsŁukasz Chruściel
 
DNS-OARC 42: Is the DNS ready for IPv6? presentation by Geoff Huston
DNS-OARC 42: Is the DNS ready for IPv6? presentation by Geoff HustonDNS-OARC 42: Is the DNS ready for IPv6? presentation by Geoff Huston
DNS-OARC 42: Is the DNS ready for IPv6? presentation by Geoff HustonAPNIC
 
Elevate Your Business: Unleashing Collaboration and Efficiency through Expert...
Elevate Your Business: Unleashing Collaboration and Efficiency through Expert...Elevate Your Business: Unleashing Collaboration and Efficiency through Expert...
Elevate Your Business: Unleashing Collaboration and Efficiency through Expert...Prometix Pty Ltd
 
Biometrics Technology Intresting PPT
Biometrics Technology Intresting PPTBiometrics Technology Intresting PPT
Biometrics Technology Intresting PPTPraveenKumarThota7
 
ConFoo 2024 - Sylius 2.0, top-notch eCommerce for customizable solution
ConFoo 2024 - Sylius 2.0, top-notch eCommerce for customizable solutionConFoo 2024 - Sylius 2.0, top-notch eCommerce for customizable solution
ConFoo 2024 - Sylius 2.0, top-notch eCommerce for customizable solutionŁukasz Chruściel
 

Recently uploaded (10)

NANOG 90: 'BGP in 2023' presented by Geoff Huston
NANOG 90: 'BGP in 2023' presented by Geoff HustonNANOG 90: 'BGP in 2023' presented by Geoff Huston
NANOG 90: 'BGP in 2023' presented by Geoff Huston
 
Reactive programming with Spring Webflux.pptx
Reactive programming with Spring Webflux.pptxReactive programming with Spring Webflux.pptx
Reactive programming with Spring Webflux.pptx
 
Seagate HDD Firmware Repair Tool Datasheet 2024
Seagate HDD Firmware Repair Tool Datasheet 2024Seagate HDD Firmware Repair Tool Datasheet 2024
Seagate HDD Firmware Repair Tool Datasheet 2024
 
WAN-IFRA: World Press Trends Outlook 2023-2024
WAN-IFRA: World Press Trends Outlook 2023-2024WAN-IFRA: World Press Trends Outlook 2023-2024
WAN-IFRA: World Press Trends Outlook 2023-2024
 
Information Technology Project to Create a Business
Information Technology Project to Create a BusinessInformation Technology Project to Create a Business
Information Technology Project to Create a Business
 
ConFoo 2024 - Need for Speed: Removing speed bumps in API Projects
ConFoo 2024  - Need for Speed: Removing speed bumps in API ProjectsConFoo 2024  - Need for Speed: Removing speed bumps in API Projects
ConFoo 2024 - Need for Speed: Removing speed bumps in API Projects
 
DNS-OARC 42: Is the DNS ready for IPv6? presentation by Geoff Huston
DNS-OARC 42: Is the DNS ready for IPv6? presentation by Geoff HustonDNS-OARC 42: Is the DNS ready for IPv6? presentation by Geoff Huston
DNS-OARC 42: Is the DNS ready for IPv6? presentation by Geoff Huston
 
Elevate Your Business: Unleashing Collaboration and Efficiency through Expert...
Elevate Your Business: Unleashing Collaboration and Efficiency through Expert...Elevate Your Business: Unleashing Collaboration and Efficiency through Expert...
Elevate Your Business: Unleashing Collaboration and Efficiency through Expert...
 
Biometrics Technology Intresting PPT
Biometrics Technology Intresting PPTBiometrics Technology Intresting PPT
Biometrics Technology Intresting PPT
 
ConFoo 2024 - Sylius 2.0, top-notch eCommerce for customizable solution
ConFoo 2024 - Sylius 2.0, top-notch eCommerce for customizable solutionConFoo 2024 - Sylius 2.0, top-notch eCommerce for customizable solution
ConFoo 2024 - Sylius 2.0, top-notch eCommerce for customizable solution
 

AWS Lake Formation Deep Dive

  • 1. © 2020, Amazon Web Services, Inc. or its Affiliates. Cobus Bernard Senior Developer Advocate Amazon Web Services AWS Lake Formation Deep Dive
  • 2. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. What’s coming up: Bonus • QR codes • Data Lake refresher • AWS Lake Formation & Related Services • Use-case examples • Going forward • QA
  • 3. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. Poll – Which of these are you using? • S3 Lifecycle policy • Glue ETL with ML • Athena to query Relational Databases • AWS Datalake Amazon S3 | AWS Glue AWS Direct Connect AWS Snowball AWS Snowmobile AWS Database Migration Service AWS IoT Core Amazon Kinesis Data Firehose Amazon Kinesis Data Streams Amazon Kinesis Video Streams On-premises Data Movement Amazon SageMaker AWS Deep Learning AMIs Amazon Rekognition Amazon Lex AWS DeepLens Amazon Comprehend Amazon Translate Amazon Transcribe Amazon Polly Amazon Athena Amazon EMR Amazon Redshift Amazon Elasticsearch Service Amazon Kinesis Amazon QuickSight Analytics Machine Learning Real-time Data Movement Data Lake
  • 4. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. Data Lake Refresher Concepts & flow
  • 5. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. A data lake isa centralized repository that allows you to store all your structured and unstructured dataat any scale DataLakeDefinition
  • 6. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. • All data in one place, a single source of truth • Support Different Formats - structured/semi-structured/unstructured/raw data • Supports fast ingestion and consumption • Schema on read • Designed for low-cost storage • Decouples storage and compute • Supports protection and security rules DataLakeMainConcepts
  • 7. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. Traditional Data lake Then: - Enterprise Data Warehouse - Batched based ETL for BI analysis - Dashboarding tools Data silos to ERP CRM LOB DW Silo 1 Business Intelligence Devices Web Sensors Social DW Silo 2 Business Intelligence
  • 8. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. Modern Data lake Now: - More data, less structured - Accessed in various ways - Real time-streaming - Machine learning, scientific, analytics, regulatory requirements - Low cost storage & analytics OLTP ERP CRM LOB DataWarehouse Business Intelligence Data Lake 10011000010010101110010101 011100101010000101111101101 0 0011110010110010110 0100011000010 Devices Web Sensors Social Catalog Machine Learning DW Queries Big data processing Interactive Real-time
  • 9. © 2020, Amazon Web Services, Inc. or its Affiliates. Why choose AWS for data lakes and analytics? Most secure infrastructure for analytics Most scalable and cost effective Easiest to build data lakes and analytics Most comprehensive and open 1 2 3 4
  • 10. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. Moving to data lake architectures Extends or evolves DW architectures Store any data in any format Durable, available, and exabyte scale Secure, compliant, auditable Run any type of analytics from DW to Predictive Data Warehousing Analytics Machine Learning Data lake
  • 11. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. More data lakes and analytics than anywhere else Tens of thousands of data lakes run on AWS across all industries
  • 12. © 2020, Amazon Web Services, Inc. or its Affiliates. Workflow of AWS services for big data End-to-end Pipeline
  • 13. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Simplified Data Pipeline Data Sources Ingest Process & Analyze Consume Amazon S3 Catalog Store Amazon S3 Store
  • 14. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Multiple Data Sources Data sources Amazon DynamoDB Web logs / cookies ERP Connected devices Ingest Process & Analyze Consume Amazon S3 Catalog Store Amazon S3 Store
  • 15. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. Amazon DynamoDB Fully managed, multi-region, multi-master database Nonrelational database that delivers reliable performance at any scale Consistent single-digit millisecond latency Built-in security, backup and restore, in-memory Caching Support Streams
  • 16. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Process & Analyze Consume Ingestion Options Ingest Amazon Kinesis AWS Snowball Amazon MSK Data sources Amazon DynamoDB Web logs / cookies ERP Connected devices Database Migration Service Catalog Store Amazon S3 Store
  • 17. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. Amazon Kinesis Data Streams • For technical developers • Build your own custom applications that process or analyze streaming data Amazon Kinesis Data Firehose • For all developers, data scientists • Easily load massive volumes of streaming data into S3, Amazon Redshift, and Amazon Elasticsearch Amazon Kinesis Data Analytics • For all developers, data scientists • Easily analyze data streams using standard SQL queries Amazon Kinesis:StreamingData MadeEasy
  • 18. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Storage Layer Process & Analyze Consume Catalog IngestIngest Amazon Kinesis AWS Snowball Amazon MSK Data sources Amazon DynamoDB Web logs / cookies ERP Connected devices Database Migration Service Amazon S3 Store Amazon S3
  • 19. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. Secure, highly scalable, durable object storage with millisecond latency for data access Store any type of data–web sites, mobile apps, corporate applications, and IoT sensors, at any scale Store data in the format you want: Unstructured (logs, dump files) | semi-structured (JSON, XML) | structured (CSV, Parquet) Storage lifecycle integration Amazon S3-Standard |Amazon S3-Infrequent Access | Amazon Glacier AmazonS3is theBase
  • 20. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Store Data Discovery and Catalog Amazon S3 Process & Analyze Consume Catalog AWS Glue IngestIngest Amazon Kinesis AWS Snowball Amazon MSK Data sources Amazon DynamoDB Web logs / cookies ERP Connected devices Database Migration Service Store Amazon S3
  • 21. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. Automatically discovers data and stores schema Data searchable, and available for ETL Generates customizable code Schedules and runs your ETL jobs Serverless AWSGlue -ServerlessDataCatalogand ETL
  • 22. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Ingest Consume Amazon Athena Amazon EMR Amazon Redshift Amazon Elasticsearch Store Amazon S3 Process & Analyze Process and Analyze Ingest Amazon Kinesis AWS Snowball Amazon MSK Data sources Amazon DynamoDB Web logs / cookies ERP Connected devices Database Migration Service Catalog AWS Glue
  • 23. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. Interactive query service to analyze data in Amazon S3 using standard SQL No infrastructure to set up or manage and no data to load Supports Multiple Data Formats – Define Schema on Demand AmazonAthena -InteractiveAnalysis
  • 24. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Ingest Consume Amazon Kinesis BITools Querying the Data Lake Database Migration Service AWS Snowball Amazon MSK Amazon Athena Amazon EMR Amazon Redshift Amazon Elasticsearch Process & Analyze Jupyter Notebooks Amazon API Gateway Amazon QuickSight Catalog AWS Glue Store Amazon S3 Store Amazon S3 Data sources Amazon DynamoDB Web logs / cookies ERP Connected devices
  • 25. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. Amazon QuickSight Supports variety of Data source andTargets Fully managed and scalable Super fast and easy to use Cost-effective
  • 26. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. Poll – Which do you struggle most with? • Data migration/ingestion • Data processing/ETL • Data access security/auditing • Cancelling your line with Telkom
  • 27. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. Data preparation accounts for ~80% of the work Building training sets Cleaning and organizing data Collecting data sets Mining data for patterns Refining algorithms Other
  • 28. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. Sample of steps required Find sources Create Amazon Simple Storage Service (Amazon S3) locations Configure access policies Map tables to Amazon S3 locations ETL jobs to load and clean data Create metadata access policies Configure access from analytics services Rinse and repeat for other: data sets, users, and end-services And more: manage and monitor ETL jobs update metadata catalog as data changes update policies across services as users and permissions change manually maintain cleansing scripts create audit processes for compliance … Manual | Error-prone |Time consuming
  • 29. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. Lake Formation Data lake infrastructure & management S3/Glacier AWS GlueLake Formation
  • 30. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. Lake Formation Data lake infrastructure & management S3/Glacier AWS GlueLake Formation
  • 31. © 2020, Amazon Web Services, Inc. or its Affiliates. Build a secure data lake in days with AWS Lake Formation Amazon S3 data lake storage AWS Lake Formation Simplified ingest and cleaning enables data engineers to build faster Centralized management of fine-grained permissions empower security officers Comprehensive set of integrated tools enable user access consistently AWS Glue Blueprints ML Transforms Data catalog Access control
  • 32. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. Top features of Lake Formation • Enhanced governance layer - security and governance layer at the Data Catalog level • Blueprints / Data Importers - templates for ETL, metadata (schema) and partition management • ML Transformations – ML algorithms that customers can use to create their own ML Transforms (i.e. record de-duplication) • Enhanced Data Catalog - enable users to record more metadata and tag Data Catalog objects (i.e. databases, tables, columns)
  • 33. © 2020, Amazon Web Services, Inc. or its Affiliates. Easier way to load data to the lake Logs DBs Blueprints One-shot Incremental
  • 34. © 2020, Amazon Web Services, Inc. or its Affiliates. How it works
  • 35. © 2020, Amazon Web Services, Inc. or its Affiliates. How it works
  • 36. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. Lake Formation Security
  • 37. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. Security permissions in Lake Formation Search and view permissions granted to a user, role, or group in one place Verify permissions granted to a user Easily revoke policies for a user
  • 38. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. Security permissions in Lake Formation Control data access with simple grant and revoke permissions Specify permissions on tables and columns rather than on buckets and objects Easily view policies granted to a particular user Audit all data access at one place
  • 39. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. Roles in a data lake project IAM Administrator Data Lake Admin Data Lake Developer/Analyst ProvisionsAWS resources and accounts Manages Data Lake Resources Access/Query Data Lake e.g. - Create Data Lake Admin - Create S3 bucket - Create Roles/Policies e.g. - Registers Data Lake - Secure data lake/add access controls e.g. - Query Data Lake - Creates MLTransforms - Writes ETL Jobs - Visualization usingAmazon QuickSight
  • 40. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. Configuration and Execution Steps Activity User 1 Provision the following: - Data lake administrator - Data lake analyst/developer - Data lake and glue roles IAMAdministrator 2 Register a data lake Data Lake Administrator 3 Assign Lake Formation Permissions to data lake analyst Data Lake Administrator 4 Create AWS Glue Database Data Lake Administrator 5 Crawl and catalog Patient data in AWS Glue Data Lake Analyst 6 Assign table permissions to data lake Analyst Data Lake Administrator 7 Observe the data pattern and duplicates in data usingAmazon Athena Data Lake Analyst 8 Create, teach andTune an AWS Lake Formation MLTransform Data Lake Analyst 9 Create an AWS Glue ETL Job to use MLTransform for data deduplication Data Lake Analyst 10 Catalog de-duplicated data and query usingAmazon Athena Data Lake Analyst
  • 41. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. Secure once, access in multiple ways Data Lake Storage Data Catalog Access Control Lake Formation Admin
  • 42. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. Search and collaborate across multiple users Text-based, faceted search across all metadata Add attributes like Data owners, stewards, and other as table properties Add data sensitivity level, column definitions, and others as column properties Text-based search and filtering Query data in Amazon Athena
  • 43. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. Enhanced Governance Layer AWS Lake Formation provides a security and governance layer at the Data Catalog level. Users can grant or revoke permissions to the Data Catalog objects such as databases, tables and columns for IAM principals (IAM users and roles). This functionality will be extended to row level access in subsequent releases.
  • 44. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. Lake Formation Glue/Blueprints
  • 45. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. AWS Glue Less hassle Integrated across AWS: supports Aurora, RDS, Redshift, S3, and common database engines in yourVPC running on EC2 Serverless Serverless: no infrastructure to provision or manage More power Automatically generates the code to execute your data transformations and loading processes Simple, flexible, and cost-effective ETL & Data Catalog
  • 46. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. Blueprints build on AWS Glue
  • 47. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. Blueprints / Data Importers Blueprints are templates for data ingestion, transformation, metadata (schema) and partition management. Blueprints help customers to quickly and easily build and maintain a data lake.
  • 48. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. With blueprints You 1. Point us to the source 2. Tell us the location to load to in your data lake 3. Specify how often you want to load the data Blueprints 1. Discover the source table(s) schema 2. Automatically convert to the target data format 3. Automatically partition the data based on the partitioning schema 4. Keep track of data that was already processed 5. You can customize any of the above
  • 49. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. Glue Features andTips • Use VPC Endpoint for Glue, Athena and S3 • Daisy-chain JDBC jobs to avoid startup pain • Use Lambda for tricks that aren’t yet available
  • 50. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. Lambda with Glue • Automatically start an AWS Glue job when a file is uploaded to S3 • Use Lambda to Auto-Run a newly created Glue Crawler • Use Cloudwatch with Lambda to receive SNS notices for specific Glue Events • Create & Enable Glue Trigger in Lambda
  • 51. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. Lake Formation MLTransformations
  • 52. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. MLTransformations Deduplication & FuzzyMatching of your Data
  • 53. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. MLTransforms
  • 54. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. 1. Create a new MLTransform
  • 55. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. 2.
  • 56. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. Fuzzy deduplication – Under the hood
  • 57. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. What next? Recommended things to check out
  • 58. © 2020, Amazon Web Services, Inc. or its Affiliates. http://bit.ly/LFML-2019 Lake Formation Lab
  • 59. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Free foundational to advanced digital courses cover AWS services and teach architecting best practices AWSTraining and Certification Visit aws.amazon.com/training/path-architecting/ Classroom offerings, including Architecting on AWS, feature AWS expert instructors and hands-on labs Validate expertise with the AWS Certified Solutions Architect - Associate or AWS Certification Solutions Architect - Professional exams Resources created by the experts at AWS to propel your organization and career forward
  • 60. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS Big Data blogs Real deep, use-cases, step by step. • “Use AWS Glue to run ETL jobs against non-native JDBC data sources” • “Matching patient records with the AWS Lake Formation FindMatches transform” • “Discovering metadata with AWS Lake Formation “ https://aws.amazon.com/blogs/?awsf.blog-master-category=category%23big-data
  • 61. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. Q&A
  • 62. © 2020, Amazon Web Services, Inc. or its Affiliates.© 2020, Amazon Web Services, Inc. or its Affiliates. Thank you!

Editor's Notes

  1. Data Lake The new, the old End to end pipeline Lake formation Security Glue Blueprints Lambda ML
  2. Who doesn’t have an AWS account?
  3. Dark data Schema on read means means defined in a catalog Parquet preferred Compute managed, but options
  4. Business decisions were made around the Enterprise Data warehousing with BI tools. Less relational, more diverse. 10x every 5 years Who has access/what type of access
  5. Business decisions were made around the Enterprise Data warehousing with BI tools. Less relational, more diverse. 10x every 5 years Who has access/what type of access Datalake is the evolution of data warehousing.
  6. 1/ easy path to build a data lake and start running diverse analytics workloads, 2/ secure cloud storage, compute, and network infrastructure that meets the specific needs of analytic workloads, 3/ a fully integrated analytics stack with a mature set of analytics tools, covering all common uses cases and leveraging open source and standard languages, engines, and platforms, and 4/ performance, the most scalability, and the lowest cost for analytics.
  7. analyze in a variety of ways with different engines go beyond insights, from operational reporting on historical data, to being able to perform ML and real-time analytics => accurately predict future outcomes.   S3 to provide even more insight without the delays and cost from moving or transforming your data.
  8. Many mature companies with dark data, to startup companies running real time applications built on what they learn
  9. (How to Build a Data Lake.pptx)
  10. Spend more time here, ask what people are using
  11. Natively supported by big data frameworks (Spark, Hive, Presto, and others) Decouple storage and compute No need to run compute clusters for storage (unlike HDFS) Can run transient Amazon EMR clusters with Amazon EC2 Spot Instances Multiple & heterogeneous analysis clusters and services can use the same data Designed for 99.999999999% durability No need to pay for data replication within a region Secure – SSL, client/server-side encryption at rest
  12. Encryptable Hive compatible
  13. Mini-ETL with Create Table As Statement, Views, Workgroups, query JSON, catalog upgrades
  14. Who doesn’t love a good pie chart.
  15. I think that “other” might be designing their data storage structure. As someone helping solve issues in customer datasets, I can tell you some people need to spend more time defining their partition structure and data generation size
  16. Mention security here how governments + medical companies are using it There is a slide coming, detail in there, but just mention the security here
  17. Many exists, but still not simple enough. Manual and time-consuming tasks such as loading data from diverse sources, monitoring these data flows, setting up partitions, turning on encryption and managing keys, re-organizing data into columnar format, and granting and auditing access. days, not months. enables secured self-service discovery and access for users Aware of multiple analytics services, easy on-demand access to specific resources that fit the processor and memory requirements of each analytics workload. The data is curated and cataloged, already prepared for any flavor of analytics, and related records are matched and de-duplicated with machine learning. Automation reduces the time it takes to get to answers when your data lake is built on top of AWS
  18. Lake Formation simplifies this manual process, automates many of the steps, allowing customers to setup a Data Lake just a few clicks from a single, unified dashboard. This reduces the time to setup a Data Lake from months to days. To eliminate siloes, you need to build a data lake automates many of the complex steps required to set up a data lake, reducing the time required to build a secure data lake from months to days. Security control at the object level for our object storage (data lake storage) layer. Other cloud vendors only provide bucket level security control. Deep integration across services that are needed to get answers from your data, including storage, compute, networking, and data movement. For example, Amazon EMR makes it easy to use EC2 Spot instances to save up to 90% on analytic workloads. Amazon Redshift allows you to query your S3 objects directly from your data warehouse. A single security model across all analytic services. AWS Lake Formation provides a single way to control access to your data whether you are accessing that data from a data warehouse, a Spark cluster, or a serverless query technology. Mature analytics services. Amazon EMR was first released in 2009 and Amazon Redshift first launched in 2013. Amazon S3 was one of the first AWS products and has been available since 2006. Tens of thousands of customers have data lakes on AWS and X exabytes of data is analyzed every day. A single object storage layer that is compatible with all AWS analytics and machine learning services. Amazon S3 is our only object storage service, we do not have different versions of S3 and we do not have separate “data lake storage.” 5 storage tiers and intelligent tiering in Amazon S3, so are able to store more data at a lower cost and with less manual data lifecycle work than with any other cloud provider. AWS S3 and AWS managed services store customer data in independent data centers across three availability zones within a single AWS Region and automatically replicate data between any regions regardless of storage class, providing a very high degree of fault tolerance and data durability out-of-the-box. AWS analytics services provide best of breed performance. Amazon Redshift is 2x faster than the next most popular competitor and Amazon EMR runs Apache Spark workloads over 10x faster than open source Spark. Speed helps get to answers quickly and also helps keep costs down for complex analytics.
  19. AWS Lake Formation has an enhanced Data Catalog to enable users to record more metadata and Tags at Databases, Tables and Columns. All this data will be searchable. 
  20. Good time to break?
  21. Database, table, column
  22. IAM is API based, but it isn’t designed for real-world access control. On Glue Catalog we can grant resource level permissions, again, it is API based and doesn’t give granular enough access.
  23. Register location till file level. Need to give IAM role that has access to that location and trust Lake Formation We update SLR policy. By default has permissions to List bucket, but as you register locations, we add additional permissions
  24. Data lake admin not the IAM Admin by default. Keep separate for security. Data lake will not ignore IAM. Need to add themselves as DataLake admin. In LF you would give permissions to a table, without having to give access to the bucket.
  25. Glue customer but not lake-formation customer, will have full access. Until location registered. Tells Athena, Redshift and EMR to check LF when querying. By default, no one has access. Grant permissions to any IAM users/roles. All permissions are on Catalog objects. After registering, Athena won’t work. Root is denied, shouldn’t access. Allowed on users/roles
  26. Takes in principle (user/role), then resource (db/table/column). Table permissions grant/deny. Similar to databases, but differences. Grant permissions are permissions to give permissions, for example for managers. Best practice for design. Example: Athena query: get-table get-temporary-credentials We don’t use
  27. Grant permissions on table, must specify database as well. Athena/redshift can do column level filtering. Glue can’t.
  28. Column level support. Encryption on catalog Service side security
  29. Relationship advice Who here is building ETL with Glue? Who uses Crawler? Who uses Workflow?
  30. 1/ integrated natively supports data stored in Amazon Aurora and all other Amazon RDS engines, Amazon Redshift, and Amazon S3, as well as common database engines, databases EC2. 2/ Cost Effective / Serverless: AWS Glue is serverless. There is no infrastructure to provision or manage. AWS Glue handles provisioning, configuration, and scaling of the resources required to run your ETL jobs on a fully managed, scale-out Apache Spark environment. You pay only for the resources used while your jobs are running. 3/ Mover Power: AWS Glue automates much of the effort in building, maintaining, and running ETL jobs. AWS Glue crawls your data sources, identifies data formats, and suggests schemas and transformations. AWS Glue automatically generates the code to execute your data transformations and loading processes.
  31. Console only feature Why no S3 to S3?
  32. Endpoint is more secure, less latency. Examples:
  33. AWS Lake Formation includes specialized ML-based dataset transformation algorithms customers can use to create their own ML Transforms. These include record de-duplication and match finding.
  34. 1/ Less Hassle: AWS Glue is integrated across a wide range of AWS services, meaning less hassle for you when onboarding. AWS Glue natively supports data stored in Amazon Aurora and all other Amazon RDS engines, Amazon Redshift, and Amazon S3, as well as common database engines and databases in your Virtual Private Cloud (Amazon VPC) running on Amazon EC2. 2/ Cost Effective / Serverless: AWS Glue is serverless. There is no infrastructure to provision or manage. AWS Glue handles provisioning, configuration, and scaling of the resources required to run your ETL jobs on a fully managed, scale-out Apache Spark environment. You pay only for the resources used while your jobs are running. 3/ Mower Power: AWS Glue automates much of the effort in building, maintaining, and running ETL jobs. AWS Glue crawls your data sources, identifies data formats, and suggests schemas and transformations. AWS Glue automatically generates the code to execute your data transformations and loading processes.
  35. Go read the docs!!
  36. STORY BACKGROUND Georgia-Pacific, owned by Koch Industries, is an American wood products, pulp, and paper company based in Atlanta, Georgia. The organization is one of the world’s largest manufacturers and distributors of pulp, towel and tissue paper and dispensers, packaging, and wood and gypsum building products. They use an S3 data lake as part of an advanced analytics and ML solution to gain new insights, optimize processes, and maximize resources. They now save millions annually by leveraging new insights to improve equipment failure predictions, run more production lines efficiently, and ensure high quality products. https://aws.amazon.com/solutions/case-studies/georgia-pacific/ In the first six months, Georgia-Pacific transferred about 50 TB of production data—more than 500 billion records—from hundreds of large, complex manufacturing and converting-process machines. The company uses Amazon Kinesis to stream real-time data from manufacturing equipment to a central data lake based on Amazon Simple Storage Service (Amazon S3), allowing it to efficiently ingest and analyze structured and unstructured data at scale. Georgia-Pacific knew it could learn from its structured and unstructured data, but the company lacked a cost-effective storage mechanism to ingest, transform, house, and analyze this data.   Georgia-Pacific uses Amazon Elastic MapReduce (Amazon EMR) to transform the data before delivering it in a structured fashion to data analysts through Amazon Redshift. The analysts use Amazon Athena on top of Amazon S3 to query the raw data, which includes information on pulping mechanisms, paper machines, converting lines, vibration trends, throughput, and paper quality. Georgia-Pacific also uses Amazon SageMaker, an AWS machine-learning (ML) solution, to build, train, and deploy ML models at scale. Using ML models built with raw production data, Amazon SageMaker provides real-time feedback to machine operators regarding optimum machine speeds and other adjustable variables, enabling less experienced operators to detect breaks earlier and maintain quality.
  37. Sticky-note feedback.