© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS re:INVENT
From Batch to Streaming:
How Amazon Flex Uses Real-time Analytics to Deliver Packages on Time
November 28, 2017
Agenda
• Real-time streaming data overview
• Streaming data services
• Benefits of streaming analytics
• Batch to streaming best practices
• How Amazon Flex moved from batch to streaming
What is batch processing?
Execution of a series of jobs in a program on a
computer without manual intervention - Wikipedia
• Data is collected over a period of time
• Process and analyze on a schedule
• Combine several processes to obtain final result
Most data is produced continuously
Mobile apps Web clickstream Application logs
Metering records IoT sensors Smart buildings
The diminishing value of data
Recent data is highly valuable
• If you act on it in time
• Perishable insights (M. Gualtieri, Forrester)
Old + recent data is more valuable
• If you have the means to combine them
Processing real-time, streaming data
What are the key requirements?
• Durable
• Continuous
• Fast
• Correct
• Reactive
• Reliable
Collect → Transform → Analyze → React → Persist
Amazon Kinesis makes it easy to work with real-time streaming data
Kinesis Streams
• For technical developers
• Collect and stream data for ordered, replayable, real-time processing
Kinesis Firehose
• For all developers, data scientists
• Easily load massive volumes of streaming data into Amazon S3, Redshift, Elasticsearch
Kinesis Analytics
• For all developers, data scientists
• Easily analyze data streams using standard SQL queries
• Compute analytics in real time
Amazon Kinesis Streams
• Reliably ingest and durably store streaming data at low cost
• Build custom real-time applications to process streaming data
• Use your stream-processing framework of choice
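A minimal sketch of the "collect and stream" step, assuming a boto3-style Kinesis client is passed in (for example `boto3.client("kinesis")`). The stream name, the `device_id` partition-key field, and the helper names are hypothetical; only the `put_records` call shape and its 500-record batch limit come from the Kinesis API.

```python
import json
from typing import Any, Dict, Iterable, List

MAX_BATCH = 500  # PutRecords accepts at most 500 records per call


def to_kinesis_records(events: Iterable[Dict[str, Any]], key_field: str) -> List[Dict[str, Any]]:
    """Convert event dicts into the entry shape that PutRecords expects."""
    return [
        {
            "Data": json.dumps(event, sort_keys=True).encode("utf-8"),
            # Records that share a partition key go to the same shard, in order.
            "PartitionKey": str(event[key_field]),
        }
        for event in events
    ]


def send_events(client: Any, stream_name: str, events: Iterable[Dict[str, Any]],
                key_field: str = "device_id") -> int:
    """Send events in batches; return how many records the service reported as failed."""
    records = to_kinesis_records(events, key_field)
    failed = 0
    for start in range(0, len(records), MAX_BATCH):
        batch = records[start:start + MAX_BATCH]
        response = client.put_records(StreamName=stream_name, Records=batch)
        failed += response.get("FailedRecordCount", 0)
    return failed
```

A real producer would also retry the individual entries reported as failed; that is left out here to keep the sketch short.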
Amazon Kinesis Firehose
• Reliably ingest and deliver batched, compressed, and encrypted data to S3, Redshift, and Elasticsearch
• Point and click setup with zero administration and seamless elasticity
• Managed stream-processing consumer
Amazon Kinesis Analytics
• Interact with streaming data in real time using SQL
• Build fully managed and elastic stream processing applications that process data for real-time visualizations and alarms
Benefits of streaming analysis
Immediate results
• Real-time aggregations
• Filtering
• Anomaly detection
Reduced complexity
• Fewer scheduled jobs to manage
• Kinesis is a fully managed solution
Scalable
• Enables parallel processing
• Horizontally scales, based on your ingest rate
Batch to streaming best practices
Migrate incrementally
• Don’t boil the ocean
• Begin by streaming data in parallel to existing batch processes
• Persist streaming data into durable storage, like Amazon S3
• Add in streaming analysis results to replace batch analysis
[Diagram labels: Data producer, Application databases, Amazon Kinesis, Streaming data, ETL, Amazon S3, Data warehouse]
Batch to streaming best practices
Perform ITL rather than ETL
• ITL: Ingest-Transform-Load
• ETL: Extract-Transform-Load
• Transform data in near-real time rather than a scheduled job
• Enrich data in near-real time
• Persist transformed and/or enriched data
[Diagram labels: Data producer, Raw streaming data, Amazon Kinesis Firehose, AWS Lambda function (transform data), Enrichment source data, Amazon S3 (raw data; transformed and/or enriched data)]
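The transform-and-enrich step could look roughly like the Lambda handler below. This is a sketch, not the presenters' code: it follows the record format that Kinesis Firehose uses for data-transformation Lambdas (base64 `data`, a `recordId`, and a `result` of `Ok` or `ProcessingFailed`), while the `RELEASE_CHANNELS` lookup table and the `app_version` field are made-up stand-ins for an enrichment source.

```python
import base64
import json
from typing import Any, Dict

# Hypothetical enrichment source: app version -> release channel.
RELEASE_CHANNELS = {"1.0": "stable", "1.1": "beta"}


def transform(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the payload with one enrichment field added."""
    enriched = dict(payload)
    enriched["release_channel"] = RELEASE_CHANNELS.get(payload.get("app_version"), "unknown")
    return enriched


def handler(event: Dict[str, Any], context: Any = None) -> Dict[str, Any]:
    """Firehose transformation entry point: decode, enrich, re-encode each record."""
    results = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            # Newline-delimited JSON keeps the delivered objects easy to parse.
            data = json.dumps(transform(payload)) + "\n"
            results.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(data.encode("utf-8")).decode("ascii"),
            })
        except (ValueError, AttributeError, TypeError):
            # Malformed input: hand the original bytes back, marked as failed.
            results.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": results}
```

Every incoming `recordId` is returned exactly once, which the Firehose transformation contract requires.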
Batch to streaming best practices
Aggregate upon arrival
• Continuously write raw data to persistent data store for archival and other analysis
• Aggregate in real time when window size < 1 hour
• Write aggregated data to persistent data store for immediate value
[Diagram labels: Data producer, Raw streaming data, Amazon Kinesis Firehose, Amazon Kinesis Analytics (aggregate), Amazon S3 (raw data, aggregated data), Results]
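The slide performs this aggregation in Kinesis Analytics with SQL. As an illustration of the underlying idea only, here is a plain-Python sketch of a tumbling-window count; the 60-second window, the `(timestamp, event_name)` tuple shape, and the function names are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

WINDOW_SECONDS = 60  # assumed window size, well under the 1-hour guideline


def window_start(epoch_seconds: float, size: int = WINDOW_SECONDS) -> int:
    """Floor a timestamp to the start of its tumbling window."""
    return int(epoch_seconds // size) * size


def aggregate(events: Iterable[Tuple[float, str]],
              size: int = WINDOW_SECONDS) -> Dict[Tuple[int, str], int]:
    """Count events per (window start, event name) as they arrive."""
    counts: Counter = Counter()
    for timestamp, name in events:
        counts[(window_start(timestamp, size), name)] += 1
    return dict(counts)
```

The aggregated counts are what would be written to the persistent store for immediate use, while the raw events are archived separately.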
Batch to streaming example
Brandon Smith
• Senior software engineer
• Worked at Amazon for 12 years in Kindle, AWS, and now Last Mile Delivery
• Currently working on Amazon Flex
• Amazon delivery app (Android/iOS)
• Crowd-sourced model launched in 30+ U.S. cities
• Used by Amazon Logistics worldwide
• Deliveries for Amazon.com, Prime Now, Amazon Fresh, restaurants, grocery stores
• Millions of packages per year
The problem
• Collecting, processing, and storing telemetry data
• Telemetry data = remote measurements
• Includes metrics, crashes, logs, sensor data, clickstream data, etc.
The goal
• Understand what’s happening in the field
• Analyze all the data and make performance optimizations
• Focus our time on improving the app and the delivery flow
Use cases
Use case 1: Alarming
• We want to know within minutes if there are problems
• Example: If the delivery count drops below our expected/historical value,
we want to alarm
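The alarm rule in the example above can be sketched as a comparison against a historical baseline. The 50% tolerance, the use of a simple mean as the "expected" value, and the function name are assumptions; the slide does not say how the expected value is computed.

```python
from statistics import mean
from typing import Sequence


def should_alarm(current_count: int, historical_counts: Sequence[int],
                 tolerance: float = 0.5) -> bool:
    """Return True when the current count falls below a fraction of the baseline.

    historical_counts holds counts for comparable past windows (assumed input).
    """
    if not historical_counts:
        return False  # no baseline yet, so there is nothing to compare against
    expected = mean(historical_counts)
    return current_count < expected * tolerance
```

For instance, with past windows of 100, 110, and 90 deliveries the baseline is 100, so a window with 40 deliveries would alarm and one with 80 would not.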
Use case 2: Troubleshooting
• Logs and crashes published to Amazon CloudWatch Logs in near-real time
• Can filter and search to troubleshoot issues
Use case 3: Dashboards
• We can write SQL, generate reports, and create visualizations
• But we really want real-time dashboards instead of daily reports
[Image labels: Daily reports, Real-time dashboards]
Use case 4: Releases
• Deploying new app versions and monitoring adoption in real time
• Release new code smoothly and with confidence
Use case 5: Sharing data
• Consumers get notifications of new data in real time
• Consumers can join their data with other data in the data lake
[Diagram labels: S3 bucket, Data lake]
Use case 6: Deeper analytics
• Look at the stream of data and the historical data
• Build ML models, create predictions, detect anomalies
How did we build it?
Getting from batch to streaming
• To solve our use cases, we had to incrementally improve our system
• We evolved from a batch-based system to a stream-based system
• Let’s walk through the iterations
Iteration 1: Use existing systems
• Collect metrics and send to an existing metrics service
• ETL jobs to load data into a big Oracle Data Warehouse
[Diagram labels: App, Data collection, Existing metrics service, ETL, DW]
Iteration 1: Use existing systems
1. Batch process with 24-hour delay
2. Fixed, inflexible DB schema
3. Analysis difficult and slow via SQL
Iteration 2: Use AWS
• Collect metrics in the app using the Amazon Mobile Analytics SDK, which automatically loads data into Redshift
[Diagram labels: App, Data collection, CloudFormation, ETL system]
Iteration 2: Use AWS
1. Batch process with a 2-hour delay (previously 24 hours)
2. Fixed, inflexible DB schema
3. Analysis difficult and slow via SQL
Iteration 3: Automated DB schema
• Add shared configuration that is used in the app and automatically updates the Redshift schema
[Diagram labels: App, Data collection, Schema config, CloudFormation, ETL system]
Iteration 3: Automated DB schema
1. Batch process with a 2-hour delay (previously 24 hours)
2. Auto-updating DB schema (previously fixed and inflexible)
3. Analysis difficult and slow via SQL
Iteration 4: Use Streams
• Introduce a Kinesis stream and Kinesis Firehose to publish to Redshift
• Partition data by date to simplify data retention policies
[Diagram labels: App, Data collection (via Pinpoint), Schema config]
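One way to see why date partitioning simplifies retention: if every partition is named after its day, expiring old data means dropping whole partitions instead of deleting individual rows. The sketch below assumes a hypothetical naming scheme of `telemetry/YYYY-MM-DD/` and a retention period in days; the deck does not describe the actual partition layout.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Iterable, List


def partition_prefix(epoch_seconds: float, base: str = "telemetry") -> str:
    """Name the partition for an event by its UTC calendar day."""
    day = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).date()
    return f"{base}/{day.isoformat()}/"


def expired_prefixes(prefixes: Iterable[str], today: date,
                     retention_days: int) -> List[str]:
    """Return the partitions whose day is older than the retention cutoff."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for prefix in prefixes:
        # The day is the last path component, e.g. "telemetry/2017-11-01/".
        day = date.fromisoformat(prefix.rstrip("/").rsplit("/", 1)[-1])
        if day < cutoff:
            expired.append(prefix)
    return expired
```

The retention job then only needs the list of partition names, never the data inside them.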
Iteration 4: Use Streams
1. Streaming process with a delay of a couple of minutes (previously a batch process with a 24-hour, then 2-hour, delay)
2. Auto-updating DB schema (previously fixed and inflexible)
3. Analysis difficult and slow via SQL
Iteration 5: Generic message types
• Use generic message types
• Publish the data to:
  • S3
  • Redshift
  • Elasticsearch
[Diagram labels: App, Elasticsearch]
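A generic message type can be pictured as an envelope: a small fixed header plus a free-form payload, so new kinds of data need no schema change. The sketch below uses JSON for readability (the deck's diagram mentions ProtoBuf) and routes each message to a list of sink callables by type; the field names `type`, `timestamp`, and `payload` and both function names are hypothetical.

```python
import json
from typing import Any, Callable, Dict, List

Sink = Callable[[Dict[str, Any]], None]


def make_message(message_type: str, payload: Dict[str, Any], timestamp: float) -> bytes:
    """Wrap an arbitrary payload in a fixed envelope."""
    envelope = {"type": message_type, "timestamp": timestamp, "payload": payload}
    return json.dumps(envelope, sort_keys=True).encode("utf-8")


def fan_out(raw: bytes, sinks: Dict[str, List[Sink]], default: List[Sink]) -> int:
    """Deliver one message to every sink registered for its type.

    Unknown types go to the default sinks, so new message types are
    still captured before any consumer knows about them.
    """
    message = json.loads(raw)
    targets = sinks.get(message["type"], default)
    for sink in targets:
        sink(message)
    return len(targets)
```

In a real pipeline the sinks would be writers for S3, Redshift, and Elasticsearch; here they can be any callable, which keeps the routing logic easy to exercise on its own.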
Iteration 5
[Diagram labels: App, Data collection, ProtoBuf, Consumer Lambdas, Consumer Redshifts, Elasticsearch, SQL reports, Dashboards]
Iteration 5: Generic message types
1. Streaming process with a delay of a few seconds (previously a batch process with a 24-hour, then 2-hour, delay)
2. Auto-updating DB schema and generic message types (previously a fixed, inflexible schema)
3. Analysis made flexible by processing the message payload (previously difficult and slow via SQL)
Data flow
[Diagram labels: App, Consumer Lambdas, Consumer Redshifts, Elasticsearch, SQL reports, Dashboards]
Future improvements
Some ideas to make the system even better:
1. Use Kinesis Analytics to query the real-time data stream
2. Use Amazon Athena to query data directly from S3
3. Use Amazon AI services to do deeper data analysis
Summary
Did we solve our use cases?
1. Real-time metrics and alarming
2. Real-time dashboards
3. Real-time logs and crash troubleshooting
4. Monitoring new releases
5. Sharing data with other teams
6. Deeper analytics
Benefits of Streaming
1. Agility: real-time data means your business can react more quickly
2. Flexibility: generic message types give you flexible schemas so your system can handle multiple data types and future use cases
3. Shareability: streams allow you to multiplex and share your data easily with your consumers
4. Extensibility: processing streams of data allows you to write it to multiple data storage systems, which enables a variety of analytics tools
Thank you!

Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 
Come costruire un'architettura Serverless nel Cloud AWS
Come costruire un'architettura Serverless nel Cloud AWSCome costruire un'architettura Serverless nel Cloud AWS
Come costruire un'architettura Serverless nel Cloud AWS
 

ABD217_From Batch to Streaming

  • 1. AWS re:INVENT
      From Batch to Streaming: How Amazon Flex Uses Real-time Analytics to Deliver Packages on Time
      November 28, 2017
      © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 2. Agenda
      • Real-time streaming data overview
      • Streaming data services
      • Benefits of streaming analytics
      • Batch to streaming best practices
      • How Amazon Flex moved from batch to streaming
  • 3. What is batch processing?
      Execution of a series of jobs in a program on a computer without manual intervention (Wikipedia)
      • Data is collected over a period of time
      • Process and analyze on a schedule
      • Combine several processes to obtain final result
  • 4. Most data is produced continuously
      • Mobile apps
      • Web clickstream
      • Application logs
      • Metering records
      • IoT sensors
      • Smart buildings
  • 5. The diminishing value of data
      Recent data is highly valuable
      • If you act on it in time
      • Perishable insights (M. Gualtieri, Forrester)
      Old + recent data is more valuable
      • If you have the means to combine them
  • 6. Processing real-time, streaming data
      What are the key requirements?
      • Durable
      • Continuous
      • Fast
      • Correct
      • Reactive
      • Reliable
      Collect, Transform, Analyze, React, Persist
  • 7. Amazon Kinesis makes it easy to work with real-time streaming data
      Kinesis Streams
      • For technical developers
      • Collect and stream data for ordered, replayable, real-time processing
      Kinesis Firehose
      • For all developers, data scientists
      • Easily load massive volumes of streaming data into Amazon S3, Redshift, Elasticsearch
      Kinesis Analytics
      • For all developers, data scientists
      • Easily analyze data streams using standard SQL queries
      • Compute analytics in real time
  • 8. Amazon Kinesis Streams
      • Reliably ingest and durably store streaming data at low cost
      • Build custom real-time applications to process streaming data
      • Use your stream-processing framework of choice
  • 9. Amazon Kinesis Firehose
      • Reliably ingest and deliver batched, compressed, and encrypted data to S3, Redshift, and Elasticsearch
      • Point and click setup with zero administration and seamless elasticity
      • Managed stream-processing consumer
  • 10. Amazon Kinesis Analytics
      • Interact with streaming data in real time using SQL
      • Build fully managed and elastic stream processing applications that process data for real-time visualizations and alarms
  • 11. Benefits of streaming analysis
      Immediate results
      • Real-time aggregations
      • Filtering
      • Anomaly detection
      Reduced complexity
      • Fewer scheduled jobs to manage
      • Kinesis is a fully managed solution
      Scalable
      • Enables parallel processing
      • Scales horizontally, based on your ingest rate
  • 12. Batch to streaming best practices: Migrate incrementally
      • Don’t boil the ocean
      • Begin by streaming data in parallel to existing batch processes
      • Persist streaming data into durable storage, like Amazon S3
      • Add in streaming analysis results to replace batch analysis
      (Diagram labels: Application databases, Data warehouse, Data producer, Amazon Kinesis, ETL, ETL, Amazon S3, Streaming data)
  • 13. Batch to streaming best practices: Perform ITL rather than ETL
      • ITL: Ingest-Transform-Load
      • ETL: Extract-Transform-Load
      • Transform data in near-real time rather than in a scheduled job
      • Enrich data in near-real time
      • Persist transformed and/or enriched data
      (Diagram labels: Data producer, Amazon Kinesis Firehose, Raw streaming data, AWS Lambda function, Amazon S3, Transformed data, Transform data, Enrichment source data, Raw data, Transformed and/or enriched data)
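The ITL pattern on slide 13 (Firehose invoking a Lambda function to transform and enrich records in flight) can be sketched roughly as follows. This is a minimal illustration, not the speakers' code: it assumes JSON-encoded records, and the field names (`type`, `station`) and the `STATION_CITIES` enrichment table are hypothetical. The event and response shapes (`recordId`, `result`, base64-encoded `data`) follow the contract Firehose uses for data-transformation Lambdas.

```python
import base64
import json

# Hypothetical enrichment source: maps a station code to a city name.
STATION_CITIES = {"SEA1": "Seattle", "PDX2": "Portland"}


def transform(record: dict) -> dict:
    """Normalize and enrich one telemetry record (illustrative fields only)."""
    return {
        "event_type": record["type"].lower(),
        "station": record["station"],
        "city": STATION_CITIES.get(record["station"], "unknown"),
    }


def handler(event, context=None):
    """Decode, transform, and re-encode each record in a Firehose batch."""
    results = []
    for rec in event["records"]:
        try:
            payload = json.loads(base64.b64decode(rec["data"]))
            data = json.dumps(transform(payload)) + "\n"
            results.append({
                "recordId": rec["recordId"],
                "result": "Ok",
                "data": base64.b64encode(data.encode("utf-8")).decode("ascii"),
            })
        except (KeyError, TypeError, ValueError):
            # Hand the untouched record back and mark it as failed.
            results.append({
                "recordId": rec["recordId"],
                "result": "ProcessingFailed",
                "data": rec["data"],
            })
    return {"records": results}
```

Records that cannot be parsed are returned unchanged with a failed status rather than dropped, so the raw data is not lost while the transform is being developed.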
  • 14. Batch to streaming best practices: Aggregate upon arrival
      • Continuously write raw data to persistent data store for archival and other analysis
      • Aggregate in real time when window size < 1 hour
      • Write aggregated data to persistent data store for immediate value
      (Diagram labels: Amazon Kinesis Firehose, Raw streaming data, Amazon S3, Raw data, Aggregated data, Amazon Kinesis Analytics, Aggregate Results, Data producer)
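To make "aggregate upon arrival" on slide 14 concrete, here is a small sketch of a tumbling-window count in plain Python. It only illustrates the windowing idea; the deck performs this step with Kinesis Analytics (SQL), and the `(timestamp, event_type)` event shape and 60-second window are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def tumbling_counts(
    events: Iterable[Tuple[float, str]],
    window_seconds: int = 60,
) -> Dict[Tuple[int, str], int]:
    """Count events per (window start, event type) in fixed, non-overlapping windows."""
    counts: Counter = Counter()
    for timestamp, event_type in events:
        # Round the timestamp down to the start of its window.
        window_start = int(timestamp // window_seconds) * window_seconds
        counts[(window_start, event_type)] += 1
    return dict(counts)
```

Each event is assigned to exactly one window, so the per-window results can be written out as soon as the window closes while the raw events are archived separately.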
  • 15. Batch to streaming example
  • 16. Brandon Smith
      • Senior software engineer
      • Worked at Amazon for 12 years in Kindle, AWS, and now Last Mile Delivery
      • Currently working on Amazon Flex
  • 17.
      • Amazon delivery app (Android/iOS)
      • Crowd-sourced model launched in 30+ U.S. cities
      • Used by Amazon Logistics worldwide
  • 18.
      • Deliveries for Amazon.com, Prime Now, Amazon Fresh, restaurants, grocery stores
      • Millions of packages per year
  • 19. The problem
      • Collecting, processing, and storing telemetry data
      • Telemetry data = remote measurements
      • Includes metrics, crashes, logs, sensor data, clickstream data, etc.
  • 20. The goal
      • Understand what’s happening in the field
      • Analyze all the data and make performance optimizations
      • Focus our time on improving the app and the delivery flow
  • 21. Use cases
  • 22. Use case 1: Alarming
      • We want to know within minutes if there are problems
      • Example: If the delivery count drops below our expected/historical value, we want to alarm
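The alarming example on slide 22 amounts to comparing a current count against an expected value. A minimal sketch, assuming the expected value comes from historical data and that a 20% drop is the alarm threshold (both the `max_drop` parameter and its default are hypothetical, not taken from the talk):

```python
def should_alarm(current_count: int, expected_count: float, max_drop: float = 0.2) -> bool:
    """Return True when current_count is more than max_drop below expected_count."""
    if expected_count <= 0:
        # No usable baseline, so there is nothing to compare against.
        return False
    return current_count < expected_count * (1.0 - max_drop)
```

In practice the check would run on each completed aggregation window, which is what makes a minutes-level alarm possible compared with a daily batch.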
  • 23. Use case 2: Troubleshooting
      • Logs and crashes published to Amazon CloudWatch Logs in near-real time
      • Can filter and search to troubleshoot issues
  • 24. Use case 3: Dashboards
      • We can write SQL, generate reports, and create visualizations
      • But we really want real-time dashboards instead of daily reports
      (Figure labels: Daily reports, Real-time dashboards)
  • 25. Use case 4: Releases
      • Deploying new app versions and monitoring adoption in real time
      • Release new code smoothly and with confidence
  • 26. Use case 5: Sharing data
      • Consumers get notifications of new data in real time
      • Consumers can join their data with other data in the data lake
      (Figure labels: S3 bucket, Data lake)
  • 27. Use case 6: Deeper analytics
      • Look at the stream of data and the historical data
      • Build ML models, create predictions, detect anomalies
  • 28. How did we build it?
  • 29. Getting from batch to streaming
      • To solve our use cases, we had to incrementally improve our system
      • We evolved from a batch-based system to a stream-based system
      • Let’s walk through the iterations
  • 30. Iteration 1: Use existing systems
      • Collect metrics and send to an existing metrics service
      • ETL jobs to load data into a big Oracle Data Warehouse
      (Diagram labels: Existing metrics service, App, DW, ETL, Data collection)
  • 31. Iteration 1: Use existing systems
      1. Batch process with 24-hour delay
      2. Fixed, inflexible DB schema
      3. Analysis difficult and slow via SQL
      (Diagram labels: Existing metrics service, App, DW, ETL, Data collection)
  • 32. Iteration 2: Use AWS
      • Collect metrics in the app using the Amazon Mobile Analytics SDK, which automatically loads data into Redshift
      (Diagram labels: App, CloudFormation, ETL system, Data collection)
  • 33. Iteration 2: Use AWS
      1. Batch process with 2-hour delay (down from 24 hours)
      2. Fixed, inflexible DB schema
      3. Analysis difficult and slow via SQL
      (Diagram labels: App, CloudFormation, ETL system, Data collection)
  • 34. Iteration 3: Automated DB schema
      • Add shared configuration that is used in the app and automatically updates the Redshift schema
      (Diagram labels: App, CloudFormation, ETL system, Data collection, Schema config)
  • 35. Iteration 3: Automated DB schema
      1. Batch process with 2-hour delay
      2. Auto-updating DB schema (previously fixed and inflexible)
      3. Analysis difficult and slow via SQL
      (Diagram labels: App, Schema config, CloudFormation, ETL system, Data collection)
  • 36. Iteration 4: Use Streams
      • Introduce a Kinesis stream and Kinesis Firehose to publish to Redshift
      • Partition data by date to simplify data retention policies
      (Diagram labels: App, Data collection, Via Pinpoint, Schema config)
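Slide 36 does not say where the date partitioning is applied, so the following is only a sketch of the idea under one assumption: that records are grouped under a `YYYY/MM/DD` prefix derived from the event time in UTC. The `telemetry` base name and both helper functions are hypothetical. With date-based partitions, a retention policy reduces to selecting whole partitions older than a cutoff.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Iterable, List


def partition_prefix(epoch_seconds: float, base: str = "telemetry") -> str:
    """Build a date-based prefix (UTC) for a record's event time."""
    dt = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return f"{base}/{dt:%Y/%m/%d}/"


def expired_partitions(
    partition_dates: Iterable[date], today: date, retention_days: int
) -> List[date]:
    """Return the partition dates that fall outside the retention window."""
    cutoff = today - timedelta(days=retention_days)
    return [d for d in partition_dates if d < cutoff]
```

Because every record for a given day lives in one partition, expiring old data means removing entire partitions instead of scanning and deleting individual rows.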
  • 37. Iteration 4: Use Streams
      1. Streaming process with a delay of a couple of minutes (previously a batch process with a 2-hour delay)
      2. Auto-updating DB schema
      3. Analysis difficult and slow via SQL
      (Diagram labels: App, Data collection, Via Pinpoint, Schema config)
  • 38. Iteration 5: Generic message types
      • Use generic message types
      • Publish the data to:
          • S3
          • Redshift
          • Elasticsearch
      (Diagram labels: App, Elasticsearch)
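One way to picture the "generic message types" of slide 38 is an envelope that carries a type name plus an opaque payload, so new kinds of data can flow through the same stream without a schema change. The sketch below uses JSON as a stand-in for readability; the deck's diagram mentions ProtoBuf, and the envelope fields (`type`, `version`, `payload`), the message types, and the handlers are all hypothetical.

```python
import json
from typing import Any, Callable, Dict


def wrap(message_type: str, payload: Dict[str, Any], version: int = 1) -> str:
    """Serialize a payload inside a generic envelope."""
    return json.dumps({"type": message_type, "version": version, "payload": payload})


# Each consumer registers handlers only for the message types it cares about.
HANDLERS: Dict[str, Callable[[Dict[str, Any]], str]] = {
    "metric": lambda p: f"metric {p['name']}={p['value']}",
    "crash": lambda p: f"crash in {p['screen']}",
}


def consume(raw: str) -> str:
    """Route a message by its envelope type; unknown types are reported, not rejected."""
    envelope = json.loads(raw)
    handler = HANDLERS.get(envelope["type"])
    if handler is None:
        return f"unhandled type {envelope['type']}"
    return handler(envelope["payload"])
```

A producer can start emitting a new message type at any time; existing consumers simply skip it until a handler is added.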
  • 39. Iteration 5
      (Diagram labels: App, Data collection, Elasticsearch, Consumer Lambdas, SQL reports, Dashboards, ProtoBuf, Consumer Redshifts)
  • 40. Iteration 5: Generic message types
      1. Streaming process with a delay of a few seconds (down from 24 hours, then 2 hours)
      2. Auto-updating DB schema and generic message types (previously fixed and inflexible)
      3. Analysis is flexible by processing the message payload (previously difficult and slow via SQL)
  • 41. Data flow
      (Diagram labels: App, Elasticsearch, Consumer Redshifts, Consumer Lambdas, SQL reports, Dashboards)
  • 42. Future improvements
      Some ideas to make the system even better:
      1. Use Kinesis Analytics to query the real-time data stream
      2. Use Amazon Athena to query data directly from S3
      3. Use Amazon AI services to do deeper data analysis
  • 43. Summary
      Did we solve our use cases?
      1. Real-time metrics and alarming
      2. Real-time dashboards
      3. Real-time logs and crash troubleshooting
      4. Monitoring new releases
      5. Sharing data with other teams
      6. Deeper analytics
  • 44. Benefits of Streaming
      1. Agility: real-time data means your business can react more quickly
      2. Flexibility: generic message types give you flexible schemas so your system can handle multiple data types and future use cases
      3. Shareability: streams allow you to multiplex and share your data easily with your consumers
      4. Extensibility: processing streams of data allows us to write it to multiple data storage systems, which enables a variety of analytics tools
  • 45. Thank you!