SlideShare a Scribd company logo

Building-a-Data-Lake-on-AWS

AWS delivers an integrated suite of services that provide everything needed to quickly and easily build and manage a data lake for analytics. AWS-powered data lakes can handle the scale, agility, and flexibility required to combine different types of data and analytics approaches to gain deeper insights, in ways that traditional data silos and data warehouses cannot. In this session, we will show you how you can quickly build a data lake on AWS that ingests, catalogs and processes incoming data and makes it ready for analysis. Using a live demo, we demonstrate the capabilities of AWS provided analytical services such as AWS Glue, Amazon Athena and Amazon EMR and how to build a Data Lake on AWS step-by-step.

1 of 36
Download to read offline
S U M M I T
Hong Kong
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T
Building a Data Lake on AWS
Rahul Bhartia
Principal Big Data Architect - AWS
rbhartia@
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T
There are more
people accessing data
And more
requirements for
making data available
Data Scientists
Analysts
Business Users
Applications
Secure Real time
Flexible Scalable
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T
A
data lake
is a
centralized repository
that allows you to store
all your structured and unstructured data
at any scale
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T
Building a Data lake on AWS
Amazon S3
AWS
Glue
AWS
Snowball
AWS
DataSync
AWS Data
Migration
AWS Storage
Gateway
Amazon
Kinesis
Crawler
Job
Data
Catalog
AWS Lake
Formation
Amazon
Athena
Amazon
EMR
Amazon
QuickSight
Amazon
Redshift
Amazon
SageMaker
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T
Data lakes made serverless
Amazon
S3
AWS
Glue
Amazon
Athena
Amazon
QuickSight
Serverless. Zero
infrastructure. Zero
administration
Never pay for
idle resources
$
Availability and
fault tolerance
built in
Automatically scales
resources with
usage
Amazon
Kinesis
Amazon
Sagemaker
Devices Web Sensors Social
Ad

Recommended

AWS Lake Formation Deep Dive
AWS Lake Formation Deep DiveAWS Lake Formation Deep Dive
AWS Lake Formation Deep DiveCobus Bernard
 
Architecting a Serverless Data Lake on AWS
Architecting a Serverless Data Lake on AWSArchitecting a Serverless Data Lake on AWS
Architecting a Serverless Data Lake on AWSAmazon Web Services
 
Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaAmazon Web Services
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSAmazon Web Services
 
Build Data Lakes & Analytics on AWS: Patterns & Best Practices
Build Data Lakes & Analytics on AWS: Patterns & Best PracticesBuild Data Lakes & Analytics on AWS: Patterns & Best Practices
Build Data Lakes & Analytics on AWS: Patterns & Best PracticesAmazon Web Services
 

More Related Content

What's hot

Introduction to AWS Lake Formation.pptx
Introduction to AWS Lake Formation.pptxIntroduction to AWS Lake Formation.pptx
Introduction to AWS Lake Formation.pptxSwathiPonugumati
 
Introduction to Amazon QuickSight - Pop-up Loft TLV 2017
Introduction to Amazon QuickSight - Pop-up Loft TLV 2017Introduction to Amazon QuickSight - Pop-up Loft TLV 2017
Introduction to Amazon QuickSight - Pop-up Loft TLV 2017Amazon Web Services
 
Automated Solution for Deploying AWS Landing Zone (GPSWS407) - AWS re:Invent ...
Automated Solution for Deploying AWS Landing Zone (GPSWS407) - AWS re:Invent ...Automated Solution for Deploying AWS Landing Zone (GPSWS407) - AWS re:Invent ...
Automated Solution for Deploying AWS Landing Zone (GPSWS407) - AWS re:Invent ...Amazon Web Services
 
AWS Lake Formation을 통한 손쉬운 데이터 레이크 구성 및 관리 - 윤석찬 :: AWS Unboxing 온라인 세미나
AWS Lake Formation을 통한 손쉬운 데이터 레이크 구성 및 관리 - 윤석찬 :: AWS Unboxing 온라인 세미나AWS Lake Formation을 통한 손쉬운 데이터 레이크 구성 및 관리 - 윤석찬 :: AWS Unboxing 온라인 세미나
AWS Lake Formation을 통한 손쉬운 데이터 레이크 구성 및 관리 - 윤석찬 :: AWS Unboxing 온라인 세미나Amazon Web Services Korea
 
Data & Analytics ReInvent Recap [AWS Basel Meetup - Jan 2023].pdf
Data & Analytics ReInvent Recap [AWS Basel Meetup - Jan 2023].pdfData & Analytics ReInvent Recap [AWS Basel Meetup - Jan 2023].pdf
Data & Analytics ReInvent Recap [AWS Basel Meetup - Jan 2023].pdfChris Bingham
 
AWS Cloud Migration Insights Forum
AWS Cloud Migration Insights ForumAWS Cloud Migration Insights Forum
AWS Cloud Migration Insights ForumAmazon Web Services
 
Building Data Lakes in the AWS Cloud
Building Data Lakes in the AWS CloudBuilding Data Lakes in the AWS Cloud
Building Data Lakes in the AWS CloudAmazon Web Services
 
Getting Started with AWS Database Migration Service
Getting Started with AWS Database Migration ServiceGetting Started with AWS Database Migration Service
Getting Started with AWS Database Migration ServiceAmazon Web Services
 
Introducing AWS DataSync - Simplify, automate, and accelerate online data tra...
Introducing AWS DataSync - Simplify, automate, and accelerate online data tra...Introducing AWS DataSync - Simplify, automate, and accelerate online data tra...
Introducing AWS DataSync - Simplify, automate, and accelerate online data tra...Amazon Web Services
 
Building a Modern Data Platform on AWS
Building a Modern Data Platform on AWSBuilding a Modern Data Platform on AWS
Building a Modern Data Platform on AWSAmazon Web Services
 
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...Amazon Web Services
 
Building Data Lakes for Analytics on AWS
Building Data Lakes for Analytics on AWSBuilding Data Lakes for Analytics on AWS
Building Data Lakes for Analytics on AWSAmazon Web Services
 

What's hot (20)

Introduction to AWS Lake Formation.pptx
Introduction to AWS Lake Formation.pptxIntroduction to AWS Lake Formation.pptx
Introduction to AWS Lake Formation.pptx
 
Introduction to Amazon QuickSight - Pop-up Loft TLV 2017
Introduction to Amazon QuickSight - Pop-up Loft TLV 2017Introduction to Amazon QuickSight - Pop-up Loft TLV 2017
Introduction to Amazon QuickSight - Pop-up Loft TLV 2017
 
Introduction to AWS Glue
Introduction to AWS GlueIntroduction to AWS Glue
Introduction to AWS Glue
 
Automated Solution for Deploying AWS Landing Zone (GPSWS407) - AWS re:Invent ...
Automated Solution for Deploying AWS Landing Zone (GPSWS407) - AWS re:Invent ...Automated Solution for Deploying AWS Landing Zone (GPSWS407) - AWS re:Invent ...
Automated Solution for Deploying AWS Landing Zone (GPSWS407) - AWS re:Invent ...
 
AWS Lake Formation을 통한 손쉬운 데이터 레이크 구성 및 관리 - 윤석찬 :: AWS Unboxing 온라인 세미나
AWS Lake Formation을 통한 손쉬운 데이터 레이크 구성 및 관리 - 윤석찬 :: AWS Unboxing 온라인 세미나AWS Lake Formation을 통한 손쉬운 데이터 레이크 구성 및 관리 - 윤석찬 :: AWS Unboxing 온라인 세미나
AWS Lake Formation을 통한 손쉬운 데이터 레이크 구성 및 관리 - 윤석찬 :: AWS Unboxing 온라인 세미나
 
Cloud Migration Workshop
Cloud Migration WorkshopCloud Migration Workshop
Cloud Migration Workshop
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWS
 
BDA311 Introduction to AWS Glue
BDA311 Introduction to AWS GlueBDA311 Introduction to AWS Glue
BDA311 Introduction to AWS Glue
 
Data & Analytics ReInvent Recap [AWS Basel Meetup - Jan 2023].pdf
Data & Analytics ReInvent Recap [AWS Basel Meetup - Jan 2023].pdfData & Analytics ReInvent Recap [AWS Basel Meetup - Jan 2023].pdf
Data & Analytics ReInvent Recap [AWS Basel Meetup - Jan 2023].pdf
 
Amazon QuickSight
Amazon QuickSightAmazon QuickSight
Amazon QuickSight
 
AWS Cloud Migration Insights Forum
AWS Cloud Migration Insights ForumAWS Cloud Migration Insights Forum
AWS Cloud Migration Insights Forum
 
Building Data Lakes in the AWS Cloud
Building Data Lakes in the AWS CloudBuilding Data Lakes in the AWS Cloud
Building Data Lakes in the AWS Cloud
 
Introduction to Amazon Aurora
Introduction to Amazon AuroraIntroduction to Amazon Aurora
Introduction to Amazon Aurora
 
AWS glue technical enablement training
AWS glue technical enablement trainingAWS glue technical enablement training
AWS glue technical enablement training
 
Getting Started with AWS Database Migration Service
Getting Started with AWS Database Migration ServiceGetting Started with AWS Database Migration Service
Getting Started with AWS Database Migration Service
 
Introducing AWS DataSync - Simplify, automate, and accelerate online data tra...
Introducing AWS DataSync - Simplify, automate, and accelerate online data tra...Introducing AWS DataSync - Simplify, automate, and accelerate online data tra...
Introducing AWS DataSync - Simplify, automate, and accelerate online data tra...
 
Building a Modern Data Platform on AWS
Building a Modern Data Platform on AWSBuilding a Modern Data Platform on AWS
Building a Modern Data Platform on AWS
 
Amazon QuickSight
Amazon QuickSightAmazon QuickSight
Amazon QuickSight
 
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
 
Building Data Lakes for Analytics on AWS
Building Data Lakes for Analytics on AWSBuilding Data Lakes for Analytics on AWS
Building Data Lakes for Analytics on AWS
 

Similar to Building-a-Data-Lake-on-AWS

Building-Serverless-Analytics-On-AWS
Building-Serverless-Analytics-On-AWSBuilding-Serverless-Analytics-On-AWS
Building-Serverless-Analytics-On-AWSAmazon Web Services
 
AWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWS
AWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWSAWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWS
AWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWSSteven Hsieh
 
Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaAmazon Web Services
 
Building a Modern Data Platform in the Cloud
Building a Modern Data Platform in the CloudBuilding a Modern Data Platform in the Cloud
Building a Modern Data Platform in the CloudAmazon Web Services
 
Building a modern data platform in AWS
Building a modern data platform in AWSBuilding a modern data platform in AWS
Building a modern data platform in AWSAmazon Web Services
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSAmazon Web Services
 
Building a Modern Data Platform on AWS. Public Sector Summit Brussels 2019
Building a Modern Data Platform on AWS. Public Sector Summit Brussels 2019Building a Modern Data Platform on AWS. Public Sector Summit Brussels 2019
Building a Modern Data Platform on AWS. Public Sector Summit Brussels 2019javier ramirez
 
Build your own log analytics solution on AWS - ADB301 - Atlanta AWS Summit
Build your own log analytics solution on AWS - ADB301 - Atlanta AWS SummitBuild your own log analytics solution on AWS - ADB301 - Atlanta AWS Summit
Build your own log analytics solution on AWS - ADB301 - Atlanta AWS SummitAmazon Web Services
 
Stream processing and managing real-time data
Stream processing and managing real-time dataStream processing and managing real-time data
Stream processing and managing real-time dataAmazon Web Services
 
在 AWS 上構建無服務器分析
在 AWS 上構建無服務器分析在 AWS 上構建無服務器分析
在 AWS 上構建無服務器分析Amazon Web Services
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSAmazon Web Services
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSAmazon Web Services
 
Best Practices for Migrating Databases to the Cloud - AWS Summit Sydney
Best Practices for Migrating Databases to the Cloud - AWS Summit SydneyBest Practices for Migrating Databases to the Cloud - AWS Summit Sydney
Best Practices for Migrating Databases to the Cloud - AWS Summit SydneyAmazon Web Services
 
Build a dashboard using serverless security analytics - SDD201 - AWS re:Infor...
Build a dashboard using serverless security analytics - SDD201 - AWS re:Infor...Build a dashboard using serverless security analytics - SDD201 - AWS re:Infor...
Build a dashboard using serverless security analytics - SDD201 - AWS re:Infor...Amazon Web Services
 
Building Serverless Analytics Solutions with Amazon QuickSight (ANT391) - AWS...
Building Serverless Analytics Solutions with Amazon QuickSight (ANT391) - AWS...Building Serverless Analytics Solutions with Amazon QuickSight (ANT391) - AWS...
Building Serverless Analytics Solutions with Amazon QuickSight (ANT391) - AWS...Amazon Web Services
 
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019Amazon Web Services
 
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019AWS Summits
 

Similar to Building-a-Data-Lake-on-AWS (20)

Building-Serverless-Analytics-On-AWS
Building-Serverless-Analytics-On-AWSBuilding-Serverless-Analytics-On-AWS
Building-Serverless-Analytics-On-AWS
 
AWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWS
AWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWSAWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWS
AWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWS
 
Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & Athena
 
Building a Modern Data Platform in the Cloud
Building a Modern Data Platform in the CloudBuilding a Modern Data Platform in the Cloud
Building a Modern Data Platform in the Cloud
 
Data_Analytics_and_AI_ML
Data_Analytics_and_AI_MLData_Analytics_and_AI_ML
Data_Analytics_and_AI_ML
 
Building a modern data platform in AWS
Building a modern data platform in AWSBuilding a modern data platform in AWS
Building a modern data platform in AWS
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
 
Modern Data Platform on AWS
Modern Data Platform on AWSModern Data Platform on AWS
Modern Data Platform on AWS
 
Construindo data lakes e analytics com AWS
Construindo data lakes e analytics com AWSConstruindo data lakes e analytics com AWS
Construindo data lakes e analytics com AWS
 
Building a Modern Data Platform on AWS. Public Sector Summit Brussels 2019
Building a Modern Data Platform on AWS. Public Sector Summit Brussels 2019Building a Modern Data Platform on AWS. Public Sector Summit Brussels 2019
Building a Modern Data Platform on AWS. Public Sector Summit Brussels 2019
 
Build your own log analytics solution on AWS - ADB301 - Atlanta AWS Summit
Build your own log analytics solution on AWS - ADB301 - Atlanta AWS SummitBuild your own log analytics solution on AWS - ADB301 - Atlanta AWS Summit
Build your own log analytics solution on AWS - ADB301 - Atlanta AWS Summit
 
Stream processing and managing real-time data
Stream processing and managing real-time dataStream processing and managing real-time data
Stream processing and managing real-time data
 
在 AWS 上構建無服務器分析
在 AWS 上構建無服務器分析在 AWS 上構建無服務器分析
在 AWS 上構建無服務器分析
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
 
Best Practices for Migrating Databases to the Cloud - AWS Summit Sydney
Best Practices for Migrating Databases to the Cloud - AWS Summit SydneyBest Practices for Migrating Databases to the Cloud - AWS Summit Sydney
Best Practices for Migrating Databases to the Cloud - AWS Summit Sydney
 
Build a dashboard using serverless security analytics - SDD201 - AWS re:Infor...
Build a dashboard using serverless security analytics - SDD201 - AWS re:Infor...Build a dashboard using serverless security analytics - SDD201 - AWS re:Infor...
Build a dashboard using serverless security analytics - SDD201 - AWS re:Infor...
 
Building Serverless Analytics Solutions with Amazon QuickSight (ANT391) - AWS...
Building Serverless Analytics Solutions with Amazon QuickSight (ANT391) - AWS...Building Serverless Analytics Solutions with Amazon QuickSight (ANT391) - AWS...
Building Serverless Analytics Solutions with Amazon QuickSight (ANT391) - AWS...
 
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
 
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Building-a-Data-Lake-on-AWS

  • 1. S U M M I T Hong Kong
  • 2. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T Building a Data Lake on AWS Rahul Bhartia Principal Big Data Architect - AWS rbhartia@
  • 3. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T There are more people accessing data And more requirements for making data available Data Scientists Analysts Business Users Applications Secure Real time Flexible Scalable
  • 4. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T A data lake is a centralized repository that allows you to store all your structured and unstructured data at any scale
  • 5. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T Building a Data lake on AWS Amazon S3 AWS Glue AWS Snowball AWS DataSync AWS Data Migration AWS Storage Gateway Amazon Kinesis Crawler Job Data Catalog AWS Lake Formation Amazon Athena Amazon EMR Amazon QuickSight Amazon Redshift Amazon SageMaker
  • 6. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T Data lakes made serverless Amazon S3 AWS Glue Amazon Athena Amazon QuickSight Serverless. Zero infrastructure. Zero administration Never pay for idle resources $ Availability and fault tolerance built in Automatically scales resources with usage Amazon Kinesis Amazon Sagemaker Devices Web Sensors Social
  • 7. S U M M I T © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 8. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T Amazon S3—Object Storage Security and compliance • 3 different forms of encryption at rest and encryption in transit • Log and monitor with CloudTrail & discover and protect data with Macie Flexible management • Classify, report, and visualize data usage trends • Use Tag for consumption, cost, and security • Build lifecycle policies to automate tiering and retention Durability, availability & scalability • Built for eleven nine’s of durability • Data distributed across 3 facilities within a region; • Global replication capabilities Query-in-Place • Run analytics & ML on without moving data • Retrieve subset of data, improving performance by 400% with S3 Select
  • 9. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T AWS Glue Data Catalog Unified metadata repository for data in • Amazon S3 • Amazon DynamoDB • Relational databases - Amazon RDS, Amazon Redshift Query your data from Amazon Athena or Amazon Redshift Spectrum or Amazon EMR Augment technical metadata with business metadata for tables Schema evolution using versioning Central and searchable view of your data-assets
  • 10. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T Crawlers automatically build your Data Catalog and keep it in sync. Automatically discover new data, extracts schema definitions • Detect schema changes and version tables • Detect Hive style partitions on Amazon S3 Built-in classifiers for popular types; custom classifiers using Grok expression Run ad hoc or on a schedule; serverless – only pay when crawler runs AWS Glue Crawlers Automatically catalog your data
  • 11. S U M M I T © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Enforce security policies across multiple services Gain and manage new insights Identify, ingest, clean, and transform data Build a secure data lake in days AWS Lake Formation
  • 12. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T Easily load data to your data lake Logs and Events Databases Amazon S3 Blueprints AWS Lake Formation Amazon RDS Amazon Aurora Amazon Kinesis Firehose Amazon CloudTrail Full-load Incremental AWS Glue Data Catalog
  • 13. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T Blueprints build on AWS Glue
  • 14. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T Easily de-duplicate your data with ML transforms
  • 15. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T Secure once, access in multiple ways Access Control Admin Amazon S3 AWS Lake Formation AWS Glue Data Catalog Amazon Athena Amazon EMR Amazon QuickSight Amazon Redshift
  • 16. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T Security permissions in AWS Lake Formation Control data access with simple grant and revoke permissions Specify permissions on tables and columns rather than on buckets and objects Easily view policies granted to a particular user Audit all data access at one place
  • 17. S U M M I T © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 18. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T Data movement from on-premises AWS Snowball Petabyte and Exabyte- scale data transport solution that uses secure appliances to transfer large amounts of data into and out of the AWS cloud AWS DataSync Automate moving data between on-premises and Amazon S3 using Network File System (NFS) protocol, at speeds up to 10 times faster than open-source tools. AWS Storage Gateway Lets your on-premises applications to use AWS for storage; includes a highly-optimized data transfer mechanism, bandwidth management, along with local cache AWS Database Migration Service Migrate database from the most widely-used commercial and open- source offerings to AWS quickly and securely with minimal downtime to applications
  • 19. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T Change Data Capture (CDC) to Amazon S3 AWS Database Migration Service Source database Crawlers Data catalogSnapshot Data AWS Glue Amazon Athena Amazon EMR New! • Support for Parquet • Support for S3 encryption with KMS Amazon Redshift
  • 20. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T Data movement in real-time Amazon Kinesis Video Streams Securely stream video from connected devices to AWS for analytics, machine learning (ML), and other processing Amazon Kinesis Data Firehose Capture, transform, and load data streams into AWS data stores for near real-time analytics with existing business intelligence tools. Amazon Kinesis Data Streams Build custom, real-time applications that process data streams using popular stream processing frameworks Managed Streaming For Kafka Fully managed open- source platform for building real-time streaming data pipelines and applications.
  • 21. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T Prefix: raw/life/year=!{timestamp:yyyy}/month=!{timestamp:MM}/day=!{timestamp:dd}/ Buffer: Up to 128MB or 15 minutes Real-time events to Amazon S3 Kinesis Data Streams Kinesis Data Firehose Lambda Transformation Aggregated JSON Data Aggregated Parquet Data Amazon Athena Crawlers Save as Parquet Data Catalog
  • 22. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T AWS Glue ETL New!
  • 23. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T Transforming data using AWS Glue Amazon S3 (Raw data) Amazon S3 (Processed data) AWS Glue Data Catalog AWS Glue Crawler AWS Glue Crawler AWS Glue job Lambda Function Amazon S3 (Enriched data) AWS Glue Crawler AWS Glue job File Put Event Trigger
  • 24. S U M M I T © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 25. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T Analytics Amazon QuickSight Pay only for what you use; Scale to tens of thousands of users; Embedded analytics; Build end-to-end BI solutions Amazon Athena Amazon EMR Flexible, open source choice for Hadoop and Spark; Lower cost than on-premises with autoscaling; Security with Encryption, Authentication and Authorization Amazon Redshift Cost-effective and up to 10x faster than traditional data warehouses; Easy to setup, deploy and manage; Scale on-demand for large data volume and high query concurrency Run interactive queries to easily analyze data in Amazon S3 using standard SQL; No infrastructure to set up or manage and no data to load New! Workgroups Multi-master Concurrency scaling ML Insights
  • 26. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T Automated reports using Amazon Athena athena.startQueryExecution("SELECT * FROM business_view”) SNS Queue 1 2 3 4 Email notification 5 1. Schedule query 2. Track QueryID for status 3. Query results to Amazon S3 4. New file trigger 5. Job complete notification Lambda Function Athena Query S3 Bucket Lambda Function SNS Topic DynamoDB Table
  • 27. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T Athena Workgroups Isolation Metrics Cost Controls Tags Use tags to categorize your AWS resources in different ways, for instance by purpose, owner, or environment. Build dashboards and alerting based on Workgroup metrics are published to Cloudwatch Define per query data scanned threshold; Any query exceeding that will be cancelled; Trigger alarms to notify of increasing usage and cost Unique query output location per Workgroup Encrypt results with unique AWS KMS key per Workgroup
  • 28. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T Amazon EMR Notebooks EMR Cluster AWS Management Console EMR-managed Jupyter notebook Users S3 bucket Auto-save Amazon S3 AWS Glue Data Catalog SageMaker Athena
  • 29. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T Amazon EMR – High Availability Livy Zookeeper HiveServer2 Yarn RM HDFS NameNode Livy Zookeeper HiveServer2 Yarn RM HDFS NameNode Livy Zookeeper HiveServer2 Yarn RM Master Node 1 Master Node 2 Master Node 3 EMR Cluster Active Standby Standby Active Active Active
  • 30. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T Data-warehousing with Amazon Redshift AWS Database Migration Service Database Crawlers Data catalog Amazon Kinesis Firehose Amazon Redshift Files Events Save as Parquet Upload to S3 Redshift Spectrum CDC Replication
  • 31. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T Unload to Parquet Amazon Redshift N E W ! New features Speed Scale WLM ConcurrencySimplicity Amazon Lake Formation integrationSecurity Auto-Vacuum & Analyze Auto Data Distribution Deferred Maintenance Snapshot Scheduler Spectrum Request Accelerator 10x average performance improvement Elastic resize Concurrency Scaling N E W ! N E W !N E W ! C O M I N G S O O N C O M I N G S O O N C O M I N G S O O N Improving short query acceleration C O M I N G S O O N C O M I N G S O O N Stored procedures N E W ! N E W ! N E W !
  • 32. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T ML Insights with Amazon QuickSight ML Anomaly detection ML Forecasting Auto Narratives
  • 33. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T AI & Machine Learning AI services that enable developers to plug-in pre-built AI functionality into their apps ML services that make it easy for any developer to get started and get deep with ML Frameworks and interfaces for machine learning practitioners Amazon S3 Raw Data Initial training data is annotated by human labelers Active learning model is trained from human labeled data Ambiguous data is sent to human labelers for annotation Human labeled data is then sent back to retrain and improve the machine learning model Training data the model understands is labeled automatically An accurate training data set is ready for use in Amazon SageMaker
  • 34. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T Amazon SageMaker Frameworks Interfaces EC2 P3 & P3dn EC2 C5 FPGASs GreenGrass Elastic Inference AI & Machine Learning AI Services Frameworks & Infrastructure Rekognition Image Polly Transcribe Translate Comprehend & Comprehend Medical Rekognition Video Textract Forecast PersonalizeLex Vision Speech ChatbotsLanguage Forecasting Recommendations Infrastructure Pre-built algorithms & notebooks Data labeling (Ground Truth) One-click model training & tuning Optimization (NEO) One-click deployment & hosting Reinforcement learningAlgorithms & models (AWS Marketplace for ML) Train DeployBuild ML Services
  • 35. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T Thank you! S U M M I T © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 36. S U M M I T © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.