SUMMIT
SANTA CLARA 2019
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Performing real-time ETL into data lakes
ADB202
Joyjeet Banerjee
Enterprise Solutions Architect
Amazon Web Services
Stream ingestion
Data from tens of thousands of data sources can be written to a single stream.
• AWS toolkits & libraries: AWS SDK, AWS Mobile SDK, Kinesis Producer Library, Kinesis Agent
• AWS service integrations: AWS IoT, Amazon CloudWatch Logs, Amazon CloudWatch Events, AWS Database Migration Service (AWS DMS)*
• Third-party offerings: Log4j, Flume, Fluentd
*AWS DMS sources include eight on-premises databases, one Azure database, five Amazon RDS / Amazon Aurora database types, and Amazon S3.
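A single stream can absorb many sources because every record carries a partition key, which Kinesis Data Streams hashes with MD5 into a 128-bit key space divided among the stream's shards. The sketch below models that routing in plain Python. It assumes evenly split hash key ranges and reads the MD5 digest as a big-endian unsigned integer; the function names are hypothetical and not part of any AWS SDK.

```python
import hashlib

HASH_KEY_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def hash_key(partition_key: str) -> int:
    """Map a partition key to a 128-bit integer via MD5 (big-endian, an assumption here)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_ranges(shard_count: int) -> list[tuple[int, int]]:
    """Hypothetical evenly split, inclusive (start, end) hash key ranges, one per shard."""
    bounds = [i * HASH_KEY_SPACE // shard_count for i in range(shard_count + 1)]
    return [(bounds[i], bounds[i + 1] - 1) for i in range(shard_count)]


def shard_for(partition_key: str, ranges: list[tuple[int, int]]) -> int:
    """Return the index of the shard whose hash key range contains the key's hash."""
    h = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= h <= end:
            return index
    raise ValueError("hash key outside all shard ranges")
```

Records that share a partition key (for example, one device ID) always land on the same shard, which is what keeps their relative order intact.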
Data is stored in the order it was received for a set duration (the retention period). During that period it can be replayed any number of times.
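To make retention and replay concrete, here is a toy in-memory model of a single shard: records receive increasing sequence numbers, expire after a retention window, and any reader can start over from an earlier sequence number. It is an illustration only; the class and method names are hypothetical and are not the Kinesis API, where the equivalent is requesting a shard iterator at a sequence number or timestamp.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    sequence_number: int
    arrival_time: float  # seconds on any monotonic clock
    data: bytes


@dataclass
class RetainedLog:
    """Toy single-shard log: keeps records in arrival order for retention_seconds."""
    retention_seconds: float
    records: list = field(default_factory=list)
    next_sequence: int = 0

    def append(self, data: bytes, now: float) -> int:
        """Store a record and return its sequence number."""
        seq = self.next_sequence
        self.records.append(Record(seq, now, data))
        self.next_sequence += 1
        return seq

    def expire(self, now: float) -> None:
        """Drop records older than the retention window."""
        cutoff = now - self.retention_seconds
        self.records = [r for r in self.records if r.arrival_time >= cutoff]

    def read_from(self, sequence_number: int, now: float) -> list:
        """Replay every retained record at or after sequence_number, oldest first."""
        self.expire(now)
        return [r.data for r in self.records if r.sequence_number >= sequence_number]
```

Reading does not remove anything, so several applications (or one application re-running after a bug fix) can replay the same records independently until they age out.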
Amazon Kinesis Data Streams
• Easy administration and low cost
• Real-time, elastic performance
• Secure, durable storage
• Available to multiple real-time analytics applications
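Elasticity in Kinesis Data Streams is expressed in shards. Each shard accepts up to 1 MB/s or 1,000 records/s of writes and serves up to 2 MB/s of reads shared by standard consumers, so a first shard estimate is the largest of those ratios. A back-of-the-envelope sketch, assuming evenly distributed partition keys, no enhanced fan-out, no headroom, and 1 MB = 1,024 KB; the function name is hypothetical:

```python
import math

# Per-shard throughput limits for Kinesis Data Streams.
WRITE_MB_PER_SHARD = 1.0        # MB/s of incoming data
WRITE_RECORDS_PER_SHARD = 1000  # records/s of incoming data
READ_MB_PER_SHARD = 2.0         # MB/s of outgoing data, shared by standard consumers


def estimate_shards(records_per_sec, avg_record_kb, consumers=1):
    """Rough shard count: the tightest of the write-bytes, write-records, and read limits."""
    write_mb = records_per_sec * avg_record_kb / 1024
    needed = max(
        write_mb / WRITE_MB_PER_SHARD,
        records_per_sec / WRITE_RECORDS_PER_SHARD,
        write_mb * consumers / READ_MB_PER_SHARD,  # every consumer reads every record
    )
    return max(1, math.ceil(needed))
```

For example, 5,000 records/s of 0.5 KB each is bound by the record-count limit rather than by bytes, so the estimate is driven by the 1,000 records/s figure.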
Amazon Kinesis Data Firehose
• Zero administration and seamless elasticity
• Direct-to-data store integration
• Serverless continuous data transformations
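Firehose's serverless transformations invoke an AWS Lambda function with a batch of base64-encoded records; the function must return every recordId with a result of Ok, Dropped, or ProcessingFailed, plus the (re-encoded) data. A minimal sketch, assuming JSON sensor readings with a hypothetical temperature_c field and a hypothetical "type": "heartbeat" marker for records to discard:

```python
import base64
import json


def _result(record_id, result, data):
    """Build one output entry in the shape Firehose expects from a transformation."""
    return {"recordId": record_id, "result": result, "data": data}


def lambda_handler(event, context):
    """Add a Fahrenheit field to each reading and drop heartbeat records."""
    output = []
    for record in event["records"]:
        record_id = record["recordId"]
        try:
            reading = json.loads(base64.b64decode(record["data"]))
            if reading.get("type") == "heartbeat":  # hypothetical marker
                output.append(_result(record_id, "Dropped", record["data"]))
                continue
            reading["temperature_f"] = float(reading["temperature_c"]) * 9 / 5 + 32
        except (ValueError, KeyError, TypeError, AttributeError):
            # Malformed input: Firehose routes ProcessingFailed records to its error output.
            output.append(_result(record_id, "ProcessingFailed", record["data"]))
            continue
        line = json.dumps(reading) + "\n"  # newline-delimited JSON in S3
        output.append(_result(record_id, "Ok", base64.b64encode(line.encode("utf-8")).decode("ascii")))
    return {"records": output}
```

Appending a newline to each output record keeps the objects Firehose writes to Amazon S3 newline-delimited, which is the layout most query engines expect for JSON.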
Within each shard, records are read in the order they were produced, enabling real-time analytics or streaming ETL.
Consumers:
• AWS services: Amazon EMR, AWS Lambda, Amazon Kinesis Data Analytics (SQL/Java)
• Kinesis Client Library
• Third party: Apache Spark
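When AWS Lambda is the consumer, it receives a batch of records from one shard, each with a base64-encoded kinesis.data field, a partition key, and a sequence number. The sketch below assumes JSON readings and uses a hypothetical ALERTS list in place of a real sink. It processes records in order and reports the first failure through the batchItemFailures response, which Lambda honors only when the event source mapping has ReportBatchItemFailures enabled.

```python
import base64
import json

ALERTS = []  # hypothetical sink; a real consumer might write to a database or another stream


def handle_reading(reading):
    """Hypothetical business logic: keep readings above 30 degrees Celsius."""
    if reading["temperature_c"] > 30:
        ALERTS.append(reading)


def lambda_handler(event, context):
    """Process a batch of Kinesis Data Streams records in sequence order."""
    for record in event["Records"]:
        kinesis = record["kinesis"]
        try:
            reading = json.loads(base64.b64decode(kinesis["data"]))
            handle_reading(reading)
        except (ValueError, KeyError, TypeError):
            # Report the first failed record so Lambda retries from this sequence number
            # (requires ReportBatchItemFailures on the event source mapping).
            return {"batchItemFailures": [{"itemIdentifier": kinesis["sequenceNumber"]}]}
    return {"batchItemFailures": []}
```

Stopping at the first failure, instead of skipping it, preserves the per-shard ordering described above.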
Amazon Kinesis Data Analytics
• Interact with streaming data in real time using SQL or integrated Java applications
• Build fully managed and elastic stream processing applications
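Streaming SQL in Kinesis Data Analytics typically aggregates over time windows. The sketch below shows the idea behind a tumbling (fixed, non-overlapping) window in plain Python over a bounded list of (timestamp, key) events. It is an analogy for a windowed GROUP BY query, not Kinesis Data Analytics code, and the function name is hypothetical.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per key in fixed, non-overlapping windows.

    events: iterable of (timestamp_seconds, key) pairs.
    Returns {window_start: {key: count}} ordered by window start.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for timestamp, key in events:
        # Each timestamp falls in exactly one window: [start, start + window_seconds).
        window_start = int(timestamp // window_seconds) * window_seconds
        counts[window_start][key] += 1
    return {start: dict(per_key) for start, per_key in sorted(counts.items())}
```

A managed stream processor does the same grouping incrementally and emits each window's result once the window closes, instead of waiting for a finished list.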
Real-time streaming on AWS
Real-time streaming and analytics: easily collect, process, and analyze video and data streams in real time.
• Kinesis Video Streams: capture and store video streams for analytics
• Kinesis Data Streams: collect and store data streams for analytics
• Kinesis Data Firehose: load data streams into AWS data stores
• Kinesis Data Analytics: analyze data streams with SQL or Java
• Amazon Managed Streaming for Kafka: collect and store data streams for analytics
Source: IoT devices → Destination: Amazon S3
Stream processing → Analytical readiness
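• IoT device → AWS IoT Core (IoT rule) → Amazon Kinesis Data Firehose → Amazon S3
• A Lambda function transforms records inside Firehose.
• Optional: Firehose converts records to Parquet or ORC, using the table schema stored in the AWS Glue Data Catalog.
• AWS Glue and the AWS Glue Data Catalog describe the data landed in Amazon S3.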
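Analytical readiness also depends on how objects are laid out in Amazon S3. Firehose's default prefix is a UTC YYYY/MM/DD/HH layout; a Hive-style key=value layout instead lets AWS Glue crawlers and Amazon Athena recognize partition columns. A sketch of building such a prefix, assuming a hypothetical table name and hourly partitions:

```python
from datetime import datetime, timezone


def hive_partition_prefix(table, event_time):
    """Build a Hive-style S3 key prefix such as 'telemetry/year=2019/month=07/day=25/hour=14/'.

    event_time must be timezone-aware; it is converted to UTC so that every
    producer writes to the same partition for the same instant.
    """
    if event_time.tzinfo is None:
        raise ValueError("event_time must be timezone-aware")
    t = event_time.astimezone(timezone.utc)
    return f"{table}/year={t:%Y}/month={t:%m}/day={t:%d}/hour={t:%H}/"
```

Queries that filter on year, month, day, or hour can then skip whole prefixes instead of scanning every object.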
Stream processing → Analytical readiness
• IoT device → Elastic Load Balancing → Amazon EC2 instances → Amazon Kinesis Data Streams → Amazon Kinesis Data Analytics → Amazon S3
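In this design the EC2 fleet writes to Kinesis Data Streams, and batching with the PutRecords API cuts per-request overhead. PutRecords accepts up to 500 records per call, up to 1 MiB per record and 5 MiB per call, with partition keys counted toward the size limits. A sketch of a batching helper, assuming (partition_key, data_bytes) input; the entry dictionaries match the Records shape that boto3's put_records expects, but the helper itself is hypothetical:

```python
MAX_RECORDS_PER_CALL = 500
MAX_BYTES_PER_CALL = 5 * 1024 * 1024
MAX_BYTES_PER_RECORD = 1024 * 1024


def batch_for_put_records(entries):
    """Split (partition_key, data) pairs into batches that fit PutRecords limits.

    Size counts the data bytes plus the UTF-8 bytes of the partition key.
    """
    batches, current, current_bytes = [], [], 0
    for partition_key, data in entries:
        size = len(data) + len(partition_key.encode("utf-8"))
        if size > MAX_BYTES_PER_RECORD:
            raise ValueError(f"record for key {partition_key!r} is {size} bytes; limit is 1 MiB")
        # Start a new batch when adding this record would break a per-call limit.
        if current and (len(current) == MAX_RECORDS_PER_CALL
                        or current_bytes + size > MAX_BYTES_PER_CALL):
            batches.append(current)
            current, current_bytes = [], 0
        current.append({"PartitionKey": partition_key, "Data": data})
        current_bytes += size
    if current:
        batches.append(current)
    return batches
```

Each batch can be passed as Records= to put_records. Because the call can partially fail, check FailedRecordCount in the response and resend only the failed entries.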
An example architecture
• Sources: mobile devices, metering, click streams, IoT sensors, logs. A sample log record:
  [Wed Oct 11 14:32:52 2018] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test
• Stream ingestion: AWS SDKs, Amazon Kinesis Agent, Amazon Kinesis Producer Library → Amazon Kinesis Data Streams
• Real-time applications (seconds): Amazon Kinesis Client Library, Amazon Kinesis Data Analytics
• Streaming ETL (minutes): Amazon Kinesis Data Firehose → Amazon S3, Amazon Redshift, Amazon Elasticsearch Service, Splunk
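The sample Apache error log line is typical input for the streaming ETL tier: a transformation step (for example, the Lambda function attached to Firehose) can turn it into a structured record before it lands in Amazon S3 or Amazon Redshift. A minimal parsing sketch, assuming this exact "[timestamp] [level] [client address] message" layout; the pattern and field names are illustrative:

```python
import re
from datetime import datetime

# Matches lines like "[Wed Oct 11 14:32:52 2018] [error] [client 127.0.0.1] message ..."
LOG_PATTERN = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\] \[(?P<level>\w+)\] \[client (?P<client>[^\]]+)\] (?P<message>.*)$"
)


def parse_error_log_line(line):
    """Return a dict of fields with an ISO 8601 timestamp, or None if the line doesn't match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    parsed = datetime.strptime(fields["timestamp"], "%a %b %d %H:%M:%S %Y")
    fields["timestamp"] = parsed.isoformat()
    return fields
```

Returning None for unmatched lines lets the caller mark them as failed (or route them elsewhere) instead of silently loading malformed rows.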
“Thomson Reuters provides professionals with the intelligence, technology, and human expertise they need to find trusted answers.”
Customers using AWS streaming solutions
• 1 billion events per week from connected devices
• Near-real-time home valuation (Zestimates)
• Live clickstream dashboards refreshed in under 10 s
• 100 GB/day of clickstreams from 250+ sites
• 50 billion daily ad impressions, sub-50 ms responses
• Online stylist processing 10 million events/day
• Migrated data bus from self-managed Kafka to Kinesis
• Facilitate communications between 100+ microservices
• IoT predictive analytics
• Real-time game events analytics
Amazon Go video analytics
Thank you!
 

AWS Real-Time ETL to Data Lakes

  • 1. SUMMIT SANTA CLARA 2019
  • 2. Performing real-time ETL into data lakes (ADB202). Joyjeet Banerjee, Enterprise Solutions Architect, Amazon Web Services. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 14. Stream ingestion: data from tens of thousands of data sources can be written to a single stream.
      ◦ AWS toolkits & libraries: AWS SDK, AWS Mobile SDK, Kinesis Producer Library, Kinesis Agent
      ◦ AWS service integrations: AWS IoT, Amazon CloudWatch Logs, Amazon CloudWatch Events, AWS Database Migration Service (AWS DMS)*
      ◦ Third-party offerings: Log4j, Flume, Fluentd
      *AWS DMS sources include eight on-premises databases, one Azure database, five Amazon RDS / Amazon Aurora database types, and Amazon S3.
  • 16. Data is stored in the order it was received for a set duration (the retention period) and can be replayed any number of times during that window.
  • 17. Amazon Kinesis Data Streams
      ◦ Easy administration and low cost
      ◦ Real-time, elastic performance
      ◦ Secure, durable storage
      ◦ Available to multiple real-time analytics applications
  • 18. Amazon Kinesis Data Firehose
      ◦ Zero administration and seamless elasticity
      ◦ Direct-to-data-store integration
      ◦ Serverless continuous data transformations
  • 20. Records are read in the order they are produced (within each shard), enabling real-time analytics or streaming ETL. Consumers include:
      ◦ Kinesis: Kinesis Client Library
      ◦ AWS services: Amazon Kinesis Data Analytics (SQL/Java), AWS Lambda, Amazon EMR
      ◦ Third party: Apache Spark
  • 21. Amazon Kinesis Data Analytics
      ◦ Interact with streaming data in real time using SQL or integrated Java applications
      ◦ Build fully managed and elastic stream processing applications
  • 22. Real-time streaming and analytics on AWS: easily collect, process, and analyze video and data streams in real time.
      ◦ Kinesis Video Streams: capture and store video streams for analytics
      ◦ Kinesis Data Streams: collect and store data streams for analytics
      ◦ Kinesis Data Firehose: load data streams into AWS data stores
      ◦ Kinesis Data Analytics: analyze data streams with SQL or Java
      ◦ Amazon Managed Streaming for Apache Kafka (Amazon MSK): collect and store data streams for analytics
  • 23. Source: IoT devices. Destination: Amazon S3.
  • 24. Stream processing / Analytical readiness
  • 25. IoT device → AWS IoT Core (IoT rule) → Amazon Kinesis Data Firehose (Lambda function; optional conversion to Parquet or ORC) → Amazon S3 → AWS Glue and the AWS Glue Data Catalog
  • 26. Stream processing / Analytical readiness
  • 27. IoT device → Elastic Load Balancing → Amazon EC2 instances → Amazon Kinesis Data Streams → Amazon Kinesis Data Analytics → Amazon S3
  • 28. Stream processing / Analytical readiness
  • 30. An example architecture:
      ◦ Sources: mobile devices, metering, click streams, IoT sensors, and logs (for example, "[Wed Oct 11 14:32:52 2018] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test")
      ◦ Stream ingestion: AWS SDKs, Amazon Kinesis Agent, and Amazon Kinesis Producer Library write to Amazon Kinesis Data Streams
      ◦ Real-time applications (seconds): Amazon Kinesis Client Library and Amazon Kinesis Data Analytics
      ◦ Streaming ETL (minutes): Amazon Kinesis Data Firehose delivers to Amazon S3, Amazon Redshift, Amazon Elasticsearch Service, and Splunk
  • 31. “Thomson Reuters provides professionals with the intelligence, technology, and human expertise they need to find trusted answers.”
  • 33. Customers using AWS streaming solutions:
      ◦ 1 billion events per week from connected devices
      ◦ Near-real-time home valuation (Zestimates)
      ◦ Live clickstream dashboards refreshed in under 10 s
      ◦ 100 GB/day of clickstreams from 250+ sites
      ◦ 50 billion daily ad impressions, sub-50 ms responses
      ◦ Online stylist processing 10 million events/day
      ◦ Migrated data bus from self-managed Kafka to Kinesis
      ◦ Facilitates communications between 100+ microservices
      ◦ IoT predictive analytics
      ◦ Real-time game events analytics
  • 34. Amazon Go video analytics
  • 35. Thank you!
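Slide 14 shows tens of thousands of sources writing to a single stream. Kinesis Data Streams spreads that load by hashing each record's partition key with MD5 into a 128-bit integer and routing the record to the shard whose hash key range contains that value, so all records sharing a key land on the same shard, in order. The sketch below models that routing in plain Python under stated assumptions: shards that split the hash space evenly, UTF-8 encoded keys, and the digest read as a big-endian integer. `hash_key`, `even_shard_ranges`, and `shard_for` are hypothetical helpers, not part of any AWS SDK.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 digests are 128-bit values


def hash_key(partition_key: str) -> int:
    """MD5 of the UTF-8 encoded partition key, read as a big-endian integer (assumed)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def even_shard_ranges(shard_count: int) -> list:
    """Split [0, 2**128 - 1] into contiguous, inclusive (start, end) ranges of near-equal size."""
    if shard_count < 1:
        raise ValueError("a stream needs at least one shard")
    return [
        (i * HASH_SPACE // shard_count, (i + 1) * HASH_SPACE // shard_count - 1)
        for i in range(shard_count)
    ]


def shard_for(partition_key: str, ranges: list) -> int:
    """Index of the shard whose hash key range contains the key's hash."""
    h = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= h <= end:
            return index
    raise ValueError("hash key is not covered by any shard range")


if __name__ == "__main__":
    ranges = even_shard_ranges(4)
    for device in ("sensor-1", "sensor-2", "sensor-3"):
        print(device, "-> shard", shard_for(device, ranges))
```

Using a device ID as the partition key keeps each device's readings ordered on one shard; a low-cardinality key (for example, a constant) would instead funnel every source into a single hot shard.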
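Slide 16 describes stream storage: records are kept in arrival order for the retention period and can be re-read as often as needed until they expire. The class below is a hypothetical in-memory model of a single shard that illustrates those semantics (increasing sequence numbers, time-based expiry, non-destructive reads). It is not the Kinesis API; the caller supplies the clock as `now` so the example stays deterministic, and `RetainedShard` is an illustrative name.

```python
from collections import deque


class RetainedShard:
    """Hypothetical in-memory model of one shard: ordered, time-bounded, replayable."""

    def __init__(self, retention_seconds: float):
        self.retention_seconds = retention_seconds
        self._records = deque()  # (sequence_number, arrival_time, data), oldest first
        self._next_sequence = 0

    def _expire(self, now: float) -> None:
        """Drop records that have aged out of the retention window."""
        cutoff = now - self.retention_seconds
        while self._records and self._records[0][1] < cutoff:
            self._records.popleft()

    def put(self, data: bytes, now: float) -> int:
        """Append a record in arrival order and return its sequence number."""
        self._expire(now)
        sequence_number = self._next_sequence
        self._records.append((sequence_number, now, data))
        self._next_sequence += 1
        return sequence_number

    def read_from(self, sequence_number: int, now: float) -> list:
        """Retained records at or after sequence_number; reading never deletes anything."""
        self._expire(now)
        return [data for seq, _, data in self._records if seq >= sequence_number]

    def read_oldest(self, now: float) -> list:
        """Replay everything still retained (comparable in spirit to a TRIM_HORIZON iterator)."""
        self._expire(now)
        return [data for _, _, data in self._records]


if __name__ == "__main__":
    shard = RetainedShard(retention_seconds=24 * 3600)
    shard.put(b"reading-1", now=0)
    shard.put(b"reading-2", now=60)
    print(shard.read_oldest(now=120))  # both records, in arrival order
    print(shard.read_oldest(now=120))  # replaying again returns the same records
    print(shard.read_oldest(now=24 * 3600 + 30))  # reading-1 has expired
```

Because reads are non-destructive, two independent consumers (say, a real-time dashboard and a backfill job) can each replay the same retained records, which is the property slide 17 lists as "available to multiple real-time analytics applications".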
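Slide 17 highlights elastic performance, which for provisioned Kinesis Data Streams is bought in shards. Each shard accepts up to 1,000 records per second and about 1 MB per second of writes, so a rough shard count is the larger of the two ratios. A back-of-the-envelope sketch, assuming those per-shard write quotas (verify them against current AWS documentation before sizing a real stream) and an average record size you supply; `shards_needed` is a hypothetical helper.

```python
import math

# Assumed per-shard write quotas for provisioned Kinesis Data Streams.
SHARD_MAX_RECORDS_PER_SEC = 1_000
SHARD_MAX_BYTES_PER_SEC = 1_000_000  # "1 MB/s", conservatively treated as 10**6 bytes


def shards_needed(records_per_sec: float, avg_record_bytes: float, headroom: float = 1.0) -> int:
    """Smallest shard count satisfying both the record-rate and the byte-rate limit.

    headroom > 1.0 scales the average load up to leave room for peaks.
    """
    if records_per_sec < 0 or avg_record_bytes < 0 or headroom <= 0:
        raise ValueError("rates must be non-negative and headroom positive")
    peak_records = records_per_sec * headroom
    peak_bytes = peak_records * avg_record_bytes
    by_records = math.ceil(peak_records / SHARD_MAX_RECORDS_PER_SEC)
    by_bytes = math.ceil(peak_bytes / SHARD_MAX_BYTES_PER_SEC)
    return max(1, by_records, by_bytes)
```

For the 100 GB/day clickstream volume cited on slide 33, and assuming 1 KB records, the average load is about 1,157 records/s and 1.16 MB/s, so `shards_needed(100e9 / 86400 / 1000, 1000)` returns 2; with `headroom=2.0` for peaks it returns 3. Averages understate bursty traffic, which is why the headroom factor exists.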
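Slide 18 lists direct-to-data-store integration for Kinesis Data Firehose. Firehose buffers incoming records and writes a batch to the destination when either the buffer size or the buffer interval is reached, whichever comes first; for Amazon S3 it prefixes object keys with a UTC time path in `YYYY/MM/dd/HH` form by default. The class below is a hypothetical model of that behavior, not Firehose code: the thresholds, the `poll` timer hook, stamping the prefix at flush time, and the `batch-<epoch>` object name are all illustrative assumptions.

```python
from datetime import datetime, timezone


class DeliveryBuffer:
    """Hypothetical model of Firehose-style buffering: flush on size OR interval, whichever first."""

    def __init__(self, max_bytes: int, max_seconds: float):
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self._items = []
        self._size = 0
        self._opened_at = None  # arrival time of the first record in the current buffer

    def add(self, record: bytes, now: float):
        """Buffer one record; return (s3_key, payload) if a flush condition is met, else None."""
        if self._opened_at is None:
            self._opened_at = now
        self._items.append(record)
        self._size += len(record)
        if self._size >= self.max_bytes or now - self._opened_at >= self.max_seconds:
            return self._flush(now)
        return None

    def poll(self, now: float):
        """Timer hook: flush a non-empty buffer whose interval has elapsed, even with no new data."""
        if self._items and now - self._opened_at >= self.max_seconds:
            return self._flush(now)
        return None

    def _flush(self, now: float):
        # UTC prefix YYYY/MM/dd/HH/ (this model stamps the flush time); the object name is illustrative.
        prefix = datetime.fromtimestamp(now, tz=timezone.utc).strftime("%Y/%m/%d/%H/")
        key = f"{prefix}batch-{int(now)}"
        payload = b"".join(self._items)
        self._items, self._size, self._opened_at = [], 0, None
        return key, payload
```

The size trigger bounds memory and object size under heavy load, while the interval trigger bounds latency when traffic is light; that trade-off is why slide 30 places Firehose-based streaming ETL in the "minutes" tier rather than the "seconds" tier.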
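Slide 20 notes that consumers read records in the order they were produced, which is what makes streaming ETL possible. With AWS Lambda as the consumer, the function receives a batch from a single shard whose `Records[i].kinesis.data` fields are base64-encoded payloads, in order. A minimal sketch of such a handler, assuming JSON payloads with a hypothetical Celsius `temperature` field; the skip-malformed-records policy and the returned summary are illustrative choices, since Lambda does not use a Kinesis handler's return value unless partial batch responses are configured.

```python
import base64
import json


def decode_records(event: dict) -> list:
    """Decode base64 JSON payloads from a Kinesis event batch, keeping shard order."""
    decoded = []
    for record in event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"])
        try:
            decoded.append(json.loads(payload))
        except ValueError:
            # Illustrative policy: skip malformed payloads instead of failing the batch.
            continue
    return decoded


def transform(reading: dict) -> dict:
    """Hypothetical ETL step: add a Fahrenheit copy of a Celsius 'temperature' field."""
    out = dict(reading)
    if isinstance(out.get("temperature"), (int, float)):
        out["temperature_f"] = round(out["temperature"] * 9 / 5 + 32, 2)
    return out


def handler(event, context):
    """Entry point for a Lambda function subscribed to a Kinesis data stream."""
    results = [transform(r) for r in decode_records(event) if isinstance(r, dict)]
    # A real function would write `results` to a sink here (S3, a database, another stream).
    return {"processed": len(results), "results": results}
```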
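Slide 21 describes querying streams with SQL in Kinesis Data Analytics. A common streaming query is a tumbling-window aggregate (counts over fixed, non-overlapping time buckets), which Kinesis Data Analytics SQL typically expresses with a `STEP(... ROWTIME BY INTERVAL ...)` term in `GROUP BY`. The generator below reproduces the idea in plain Python for illustration only; it assumes events arrive in timestamp order and says nothing about how the managed service executes queries.

```python
from collections import defaultdict


def tumbling_window_counts(events, width_seconds):
    """Count events per key in fixed, non-overlapping windows.

    `events` is an iterable of (timestamp_seconds, key) pairs in timestamp order.
    A window is emitted as (window_start, {key: count}) once an event past its end arrives;
    windows that received no events are skipped.
    """
    current_start = None
    counts = defaultdict(int)
    for ts, key in events:
        window_start = int(ts // width_seconds) * width_seconds
        if current_start is not None and window_start != current_start:
            yield current_start, dict(counts)
            counts = defaultdict(int)
        current_start = window_start
        counts[key] += 1
    if current_start is not None:
        yield current_start, dict(counts)  # flush the last, still-open window at end of input
```

For example, `list(tumbling_window_counts([(0, "a"), (10, "b"), (59, "a"), (60, "a")], 60))` yields one window starting at 0 with `{"a": 2, "b": 1}` and one starting at 60 with `{"a": 1}`.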
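Slide 25 places a Lambda function inside the Kinesis Data Firehose delivery path between AWS IoT Core and Amazon S3. Firehose invokes such a function with a batch of `records`, each carrying a `recordId` and base64 `data`, and expects every record back with the same `recordId`, a `result` of `Ok`, `Dropped`, or `ProcessingFailed`, and base64 output `data`. A minimal sketch of that contract, assuming JSON device readings with hypothetical `device_id` and `temperature` fields:

```python
import base64
import json


def _respond(record_id: str, result: str, data: str) -> dict:
    """Build one output record in the shape Firehose expects back from the function."""
    return {"recordId": record_id, "result": result, "data": data}


def lambda_handler(event, context):
    """Kinesis Data Firehose data-transformation handler (sketch)."""
    output = []
    for record in event["records"]:
        try:
            reading = json.loads(base64.b64decode(record["data"]))
        except ValueError:  # bad base64 or bad JSON: report it as a processing failure
            output.append(_respond(record["recordId"], "ProcessingFailed", record["data"]))
            continue
        if not isinstance(reading, dict) or "device_id" not in reading:  # hypothetical field
            output.append(_respond(record["recordId"], "Dropped", record["data"]))
            continue
        # Keep a flat schema and end each record with a newline so that the objects
        # written to S3 are newline-delimited JSON.
        flat = {"device_id": reading["device_id"], "temperature": reading.get("temperature")}
        encoded = base64.b64encode((json.dumps(flat) + "\n").encode("utf-8")).decode("ascii")
        output.append(_respond(record["recordId"], "Ok", encoded))
    return {"records": output}
```

Newline-delimited JSON is a layout that AWS Glue crawlers and Amazon Athena handle well; the optional Parquet or ORC conversion on the slide is configured on the delivery stream itself (using a table schema from the AWS Glue Data Catalog) rather than in this function.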
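Slide 27 shows IoT devices posting through Elastic Load Balancing to Amazon EC2 instances that write into Kinesis Data Streams. An ingest tier like that usually batches with the `PutRecords` API, which accepts at most 500 records and 5 MiB per request (partition keys included), with each record at most 1 MiB, and which can partially fail. A sketch assuming a boto3 Kinesis client is passed in (for example, `boto3.client("kinesis")`); `batch_for_put_records` and `send` are hypothetical helpers.

```python
MAX_RECORDS_PER_CALL = 500
MAX_BYTES_PER_CALL = 5 * 1024 * 1024  # 5 MiB per request, partition keys included
MAX_BYTES_PER_RECORD = 1024 * 1024    # 1 MiB per record, partition key included


def batch_for_put_records(records):
    """Split (partition_key, data) pairs into PutRecords-sized batches, preserving order."""
    batch, batch_bytes = [], 0
    for key, data in records:
        size = len(key.encode("utf-8")) + len(data)
        if size > MAX_BYTES_PER_RECORD:
            raise ValueError(f"record for key {key!r} exceeds 1 MiB")
        if batch and (len(batch) == MAX_RECORDS_PER_CALL or batch_bytes + size > MAX_BYTES_PER_CALL):
            yield batch
            batch, batch_bytes = [], 0
        batch.append({"PartitionKey": key, "Data": data})
        batch_bytes += size
    if batch:
        yield batch


def send(client, stream_name, records):
    """Send batches with a boto3 Kinesis client; return the entries that should be retried."""
    failed = []
    for batch in batch_for_put_records(records):
        response = client.put_records(StreamName=stream_name, Records=batch)
        # PutRecords can partially fail: result entries follow request order, failures carry ErrorCode.
        for entry, result in zip(batch, response["Records"]):
            if "ErrorCode" in result:
                failed.append((entry["PartitionKey"], entry["Data"]))
    return failed
```

Retrying only the entries returned by `send`, ideally with backoff, avoids duplicating records that already succeeded. AWS documents that `PutRecords` does not guarantee record ordering, so producers that need strict per-key order use `PutRecord` (optionally with `SequenceNumberForOrdering`) instead.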
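Slide 30 includes a raw Apache error-log line as an example source. Before such lines are useful in Amazon Redshift or Amazon Elasticsearch Service, a streaming ETL step typically parses them into structured fields. A minimal sketch, assuming the bracketed layout of that sample line and English (C-locale) day and month names; the field names and helper functions are illustrative.

```python
import json
import re
from datetime import datetime

# Matches the bracketed error-log layout of the slide 30 sample:
# [timestamp] [level] [client address] message   (the client part is optional)
ERROR_LINE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\] "
    r"\[(?P<level>[a-z]+)\] "
    r"(?:\[client (?P<client>[^\]]+)\] )?"
    r"(?P<message>.*)$"
)


def parse_error_line(line: str):
    """Parse one raw error-log line into a dict, or return None if it cannot be parsed."""
    match = ERROR_LINE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    try:
        # Normalize to ISO 8601 so downstream stores can sort and filter by time.
        fields["timestamp"] = datetime.strptime(
            fields["timestamp"], "%a %b %d %H:%M:%S %Y"
        ).isoformat()
    except ValueError:
        return None
    return fields


def to_json_line(line: str):
    """One newline-terminated JSON document per parsed line; None for unparseable input."""
    fields = parse_error_line(line)
    return None if fields is None else json.dumps(fields) + "\n"
```

Emitting one JSON document per line keeps the output easy to load into Amazon Redshift or index in Amazon Elasticsearch Service, and unparseable lines can be counted or routed to an error location instead of being silently lost.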