Serverless Data
Streaming at Scale
Anahit Pogosova
Lead Cloud Software Engineer (Solita / Yle)
20.10.2020
o Who, What & Why?
o Under the Hood
o Gotchas and Lessons Learned
• Lead Cloud Software Engineer
• Part of the Data & AI team at
Finland’s national public
broadcasting company
Me
• AWS Community Builder
10+ years
“full stack”, all kinds of stuff
• Yle Areena, the biggest
streaming service in Finland
• Areena recommendations
• Areena image personalisation
• Automatic image extraction
• Article recommendations (yle.fi)
• Smart notifications (Yle Uutisvahti)
• .. and more
Yle
• Data!
• user interaction and content metadata
• Collecting the data
• Storing the data
• Visualizing the data
• Utilizing the data (ML & AI)
• To understand the customers
• To help provide better service for everyone
{
"adobe":true,
"is_heartbeat":true,
"collectorreceived":1555267233549,
...
"s:asset:name":"Yle TV1",
"s:event:type":"start",
"s:meta:category":"nettitv",
"s:meta:content_type":"livetv",
"s:meta:ns_st_st":"yle tv1",
...
"s:meta:title":"eduskuntavaalit 2019 - tulosilta",
...
"s:meta:yle.vrsContent":"video",
"s:meta:yle.vrsDevice":"android",
"s:meta:yle.vrsPlatform":"mobile",
"s:meta:yle.vrsProduct":"areena",
"s:meta:yle_client":"android.areena.481-b4ce224bf",
"s:meta:yle_language":"fi",
"s:sp:channel":"yleisradio",
"s:sp:hb_version":"android-2.2.1.214-d5c678",
"s:user:mid":"71057009616815049761612335654599557361"
}
Yle
• ~ 500 000 000 requests per day
• ~ 600 000 rpm during prime time
• > 0.5 TB event data per day
• Apache Parquet
• JSON
• Max so far: ~ 2.5 million rpm
• elections + hockey finals
Yle
Under the Hood
Agenda
Load the data to the datalake in a
columnar format
Enable content personalization
through near real-time analytics
No server is easier to manage than
“no server”.
Dr. Werner Vogels
(CTO, Amazon)
Kinesis Data Streams
• Fully managed and massively scalable service to stream data
• Data available in milliseconds and retained from 24 hours up to 7 days
• Custom stream processing with consumers
• Shard is the unit of parallelism
• In: 1 MB/sec or 1 000 records/sec
• Out: 2 MB/sec
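A rough sizing sketch based on the per-shard write limits above. The function name, the one-record-per-request mapping and the average record size are assumptions for illustration, not figures from the talk.

```python
import math

# Per-shard write limits as listed on the slide.
SHARD_IN_BYTES_PER_SEC = 1_000_000   # 1 MB/sec, treated here as 10**6 bytes
SHARD_IN_RECORDS_PER_SEC = 1_000     # 1 000 records/sec


def shards_needed(records_per_sec: float, avg_record_bytes: float) -> int:
    """Smallest shard count that satisfies both write limits (hypothetical helper)."""
    by_count = math.ceil(records_per_sec / SHARD_IN_RECORDS_PER_SEC)
    by_bytes = math.ceil(records_per_sec * avg_record_bytes / SHARD_IN_BYTES_PER_SEC)
    return max(1, by_count, by_bytes)


# Assumption: one record per request and roughly 1 KB per record.
prime_time_shards = shards_needed(600_000 / 60, 1_000)
```

This only covers steady-state averages; bursts within a single second can still be throttled.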
Amazon Kinesis
Agent
• Stand-alone Java
application to
stream data from
files
Service
Integrations
• CloudWatch Logs
• CloudWatch Events
• AWS IoT
• DB Migration Service
• API Gateway
Amazon Kinesis
Producer Library
(KPL)
• Provides higher
level of abstraction
over API calls
Amazon Kinesis
API (AWS SDK)
• Most flexible
• Allows full control
over writing data
Kinesis Data Streams, Writing Data
• putRecord(params, callback)
• putRecords(params, callback)
• Up to 500 records
• Up to 5 MiB
Kinesis Data Streams, AWS SDK
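A minimal sketch of splitting records into batches that respect the putRecords limits above (500 records, 5 MiB). The helper name is hypothetical, and as a simplification it counts payload bytes only, whereas the real request limit also includes partition keys.

```python
from typing import Iterable, Iterator, List

MAX_RECORDS_PER_CALL = 500
MAX_BYTES_PER_CALL = 5 * 1024 * 1024  # 5 MiB


def batch_records(records: Iterable[bytes]) -> Iterator[List[bytes]]:
    """Yield lists of records that each fit into one putRecords call."""
    batch: List[bytes] = []
    batch_bytes = 0
    for data in records:
        size = len(data)
        if size > MAX_BYTES_PER_CALL:
            raise ValueError("record larger than the per-request limit")
        # Close the current batch if adding this record would break a limit.
        if batch and (len(batch) >= MAX_RECORDS_PER_CALL
                      or batch_bytes + size > MAX_BYTES_PER_CALL):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(data)
        batch_bytes += size
    if batch:
        yield batch
```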
putRecords(params, callback)
• Request failure
• Retries by default up to 3 times
• Uses exponential backoff
• Base delay by default is 100 ms
Kinesis Data Streams, AWS SDK
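To make the defaults above concrete, here is a sketch of the nominal exponential backoff schedule for 3 retries with a 100 ms base delay. The function is hypothetical; an SDK may also randomise the actual wait, so treat these values as nominal rather than exact.

```python
from typing import List


def backoff_delays_ms(max_retries: int = 3, base_ms: int = 100) -> List[int]:
    """Nominal delay before each retry: base * 2**attempt."""
    return [base_ms * 2 ** attempt for attempt in range(max_retries)]


delays = backoff_delays_ms()      # delay before retry 1, 2 and 3
worst_case_wait_ms = sum(delays)  # total nominal waiting time across all retries
```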
Lambda
o One Lambda is invoked per shard by default
• NEW(ish)! Parallelization factor (max 10)
• Up to 10 times as many concurrent Lambdas as there are shards!
o Lambda is invoked once per second,
or:
• the number of records reaches the configured batch size
(max 10 000 records)
• the record batch size reaches Lambda’s synchronous invocation payload limit
(6 MB)
• NEW(ish)! the batch window reaches its maximum value
(max 5 min)
Lambda
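The limits above can be captured in a small settings object. This is a sketch only: the class and field names are hypothetical, and the default values are illustrative rather than taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class KinesisTriggerSettings:
    """Hypothetical holder for the Kinesis-to-Lambda trigger settings on the slide."""
    shards: int
    batch_size: int = 100              # max 10 000 records
    batch_window_seconds: int = 0      # max 5 min = 300 s
    parallelization_factor: int = 1    # max 10

    def validate(self) -> None:
        if not 1 <= self.batch_size <= 10_000:
            raise ValueError("batch size must be between 1 and 10 000")
        if not 0 <= self.batch_window_seconds <= 300:
            raise ValueError("batch window must be between 0 and 300 seconds")
        if not 1 <= self.parallelization_factor <= 10:
            raise ValueError("parallelization factor must be between 1 and 10")

    def max_concurrent_invocations(self) -> int:
        # Up to `parallelization_factor` concurrent Lambdas per shard.
        return self.shards * self.parallelization_factor
```

With 10 shards and a parallelization factor of 10, up to 100 invocations can run at once, which is worth checking against the account's Lambda concurrency limit.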
Before..
• Lambda retries the batch until
success or data expiration
• No other batches are
processed from the
shard (aka poison pill)!
Lambda, Error Handling
After!
• Maximum retry attempts
(max 10 000)
• Maximum record age
(1 min – 7 days)
• Bisect batch on function failure
• On-failure destination
(SQS or SNS)
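A simplified model of how bisecting a batch on failure isolates a poison pill. It is not Lambda's actual implementation: the function name is hypothetical, retries and record age are ignored, and the `failed` list merely stands in for whatever handles failures.

```python
from typing import Callable, List, Sequence


def process_with_bisect(batch: Sequence[int],
                        handler: Callable[[Sequence[int]], None],
                        failed: List[int]) -> None:
    """Run handler on the batch; on error split it in half and try each half."""
    if not batch:
        return
    try:
        handler(batch)
    except Exception:
        if len(batch) == 1:
            failed.append(batch[0])   # the poison pill has been isolated
            return
        mid = len(batch) // 2
        process_with_bisect(batch[:mid], handler, failed)
        process_with_bisect(batch[mid:], handler, failed)


processed: List[int] = []
failed: List[int] = []


def handler(records: Sequence[int]) -> None:
    # Hypothetical handler: record 5 can never be processed.
    if 5 in records:
        raise ValueError("poison pill")
    processed.extend(records)


process_with_bisect(list(range(8)), handler, failed)
```

Only the bad record ends up in `failed`; the rest of the batch is processed instead of blocking the shard.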
Agenda
Load the data to the datalake in a
columnar format
Enable content personalization
through near real-time analytics
• Fully managed service to load streaming
data into a data lake
• S3, Redshift, Amazon Elasticsearch Service
• HTTP endpoints (New!)
• Datadog, New Relic, MongoDB, and Splunk (Newish!)
• Lets you load streaming data with 0 lines of code
• Scales automatically (no shards to manage)
• Can batch, compress, transform and convert
data before loading to the destination
Kinesis Firehose
• Data stored up to 24 hours
• Batches records to certain size or for certain
period of time
• 1 to 128 MB
• 60 to 900 seconds
• Uses Glue Data Catalog to convert JSON to
• Apache Parquet
• Apache ORC
Kinesis Firehose
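The "certain size or certain period of time" rule can be sketched as a buffer that flushes on whichever condition is hit first. The class is a hypothetical illustration of the idea, not Firehose's implementation; time is passed in explicitly so the behaviour is easy to follow.

```python
from typing import List, Optional


class BufferingSketch:
    """Flush when the buffered size or the buffer age reaches its threshold."""

    def __init__(self, max_bytes: int, max_seconds: float) -> None:
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self._records: List[bytes] = []
        self._bytes = 0
        self._opened_at: Optional[float] = None

    def add(self, record: bytes, now: float) -> Optional[List[bytes]]:
        if self._opened_at is None:
            self._opened_at = now          # the interval starts with the first record
        self._records.append(record)
        self._bytes += len(record)
        return self.poll(now)

    def poll(self, now: float) -> Optional[List[bytes]]:
        """Return the flushed batch if a threshold is reached, else None."""
        if self._opened_at is None:
            return None
        if self._bytes >= self.max_bytes or now - self._opened_at >= self.max_seconds:
            flushed = self._records
            self._records, self._bytes, self._opened_at = [], 0, None
            return flushed
        return None
```

With real settings the thresholds would lie within the ranges above, for example 128 MB and 900 seconds.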
Kinesis Streams vs. Firehose
Kinesis Data Streams:
• Fully managed service to stream data
• Data available up to 7 days
• Scaling using shards
• Custom stream processing with consumers
Kinesis Firehose:
• Fully managed service to load data into a data lake
• Data available for 24 hours
• Scales automatically
• Batching, compressing, converting data out of the box
+ custom transformations with
Amazon Kinesis
Agent
• Stand-alone Java
application to
stream data from
files
Service
Integrations
• Kinesis Streams
• CloudWatch Logs
• CloudWatch Events
• AWS IoT
Amazon Kinesis API
(AWS SDK)
• Most flexible
• Allows full control
over writing data
Kinesis Firehose, Writing Data
putRecordBatch(params, callback)
• Request failure
• Retries by default up to 3 times
• Uses exponential backoff
• Base delay by default is 100 ms
Kinesis Firehose, AWS SDK
Load the data to the datalake in a
columnar format
Enable content personalization
through near real-time analytics
Agenda
• Fully managed service to run SQL queries
on the streaming data
• Join, filter and aggregate data over a time-based or a row-based window
Kinesis Data Analytics
Ingests data from:
Kinesis Data Stream
Kinesis Firehose
Sends results to:
Kinesis Data Stream
Kinesis Firehose
Lambda function
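Kinesis Data Analytics expresses windowed aggregation in SQL. The Python sketch below only illustrates the concept of a time-based (tumbling) window; the function name and the event shape are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def tumbling_window_counts(events: Iterable[Tuple[float, str]],
                           window_seconds: int) -> Dict[Tuple[int, str], int]:
    """Count events per (window start, key) over fixed, non-overlapping windows."""
    counts: Counter = Counter()
    for timestamp, key in events:
        window_start = int(timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical (timestamp in seconds, event type) pairs.
events = [(0, "start"), (30, "start"), (59.9, "pause"), (60, "start")]
per_minute = tumbling_window_counts(events, 60)
```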
Gotchas and
Lessons Learned
putRecords(params, callback)
Partial failure:
• Exponential backoff + jitter
Gotcha!
Writing to Kinesis Streams
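A sketch of handling partial failures: a putRecords call can succeed overall while individual records are rejected, so only the rejected records are resent, with exponential backoff and jitter between attempts. `put_records` is a hypothetical callable wrapping the real API call; it is assumed to return the response as a dict with `FailedRecordCount` and a `Records` list in request order, where rejected entries carry an `ErrorCode`.

```python
import random
import time
from typing import Any, Callable, Dict, List


def put_with_retries(put_records: Callable[[List[Any]], Dict[str, Any]],
                     records: List[Any],
                     max_retries: int = 3,
                     base_delay: float = 0.1,
                     sleep: Callable[[float], None] = time.sleep) -> List[Any]:
    """Send records, resending only the rejected ones. Returns what still failed."""
    pending = list(records)
    for attempt in range(max_retries + 1):
        response = put_records(pending)
        if response.get("FailedRecordCount", 0) == 0:
            return []
        # Results are in the same order as the request: keep the rejected ones.
        pending = [rec for rec, result in zip(pending, response["Records"])
                   if "ErrorCode" in result]
        if attempt < max_retries:
            # Jitter: wait a random time up to the exponential cap.
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    return pending
```

Anything returned at the end has exhausted its retries and needs a fallback, such as a dead-letter queue or a log entry.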
• Kinesis limits are per second, CloudWatch metrics are per minute
• 1 MB/sec or 1 000 records/sec
• Can 5 000 records/minute exceed the throughput?
• Beware of network latency!
• can be one reason for bursts in Kinesis
• avoid the external network by using a Kinesis VPC endpoint
Gotcha!
Writing to Kinesis Streams
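The question above can be answered with a small experiment: 5 000 records in a minute is far below 1 000 records/sec on average, yet a burst can still exceed the per-second limit of a shard. The arrival times below are made up for illustration.

```python
from collections import Counter
from typing import Iterable


def peak_records_per_second(timestamps: Iterable[float]) -> int:
    """Largest number of records that arrived within any single second."""
    per_second = Counter(int(ts) for ts in timestamps)
    return max(per_second.values(), default=0)


# Hypothetical minute: 2 000 records arrive within second 10,
# the other 3 000 are spread evenly over seconds 15..54.
burst = [10 + i / 4000 for i in range(2000)]
rest = [(i % 40) + 15 + 0.5 for i in range(3000)]
timestamps = burst + rest

per_minute_total = len(timestamps)
peak = peak_records_per_second(timestamps)
```

A per-minute metric shows 5 000 records, while the peak second carried 2 000 records, twice the per-shard limit.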
• IncomingRecords = PutRecord + PutRecords
The number of records successfully put to the Kinesis Stream
• WriteProvisionedThroughputExceeded = PutRecord + PutRecords
The number of records rejected due to throttling
• IncomingRecords + WriteProvisionedThroughputExceeded = Total
amount of incoming records
Gotcha!
Writing to Kinesis Streams
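Combining the two metrics as described above gives the share of incoming records that were throttled. The function name is hypothetical and the numbers in the usage line are invented.

```python
def throttled_share(incoming_records: int, write_throughput_exceeded: int) -> float:
    """Fraction of all incoming records that were rejected due to throttling."""
    total = incoming_records + write_throughput_exceeded
    return write_throughput_exceeded / total if total else 0.0


share = throttled_share(incoming_records=9_000, write_throughput_exceeded=1_000)
```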
• IteratorAge
• latency between when a record is added, and when it is processed
• If it’s increasing, increase the number of shards, or
• increase the parallelization factor (NEWish)
• Two different iterator age metrics
• Kinesis stream iterator age is a combination metric across all consumers
• not too informative
• Lambda’s own iterator age should be used instead!
Gotcha!
Reading from Kinesis Streams
• Beware of timeouts!
• connectTimeout: timeout for establishing a new connection on a
socket
• if not explicitly set, this value will default to the value of timeout
• timeout: read timeout for an existing socket (2 min)
• time between when request ends and the response is
received, including service and network round-trips
Gotcha!
• Firehose scales endlessly!
Or does it?
• “It is a fully managed service that automatically scales to match the
throughput of your data.”
• “When Direct PUT is configured as the data source, each Kinesis Data
Firehose delivery stream is subject to the following limits: […]
5,000 records/second, 2,000 transactions/second, and 5 MiB/second.”
• ThrottledRecords: the number of records that were throttled because data
ingestion exceeded one of the delivery stream limits.
Gotcha!
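A quick check of a workload against the Direct PUT limits quoted above. The limit values are taken from that quote and may differ per region or after a limit increase; the function name and the workload figures are assumptions.

```python
from typing import Dict, List

# Limits as quoted on the slide for Direct PUT delivery streams.
DIRECT_PUT_LIMITS: Dict[str, int] = {
    "records_per_sec": 5_000,
    "requests_per_sec": 2_000,
    "bytes_per_sec": 5 * 1024 * 1024,  # 5 MiB/second
}


def exceeded_limits(records_per_sec: float,
                    requests_per_sec: float,
                    bytes_per_sec: float) -> List[str]:
    """Names of the limits that the given workload would exceed."""
    observed = {
        "records_per_sec": records_per_sec,
        "requests_per_sec": requests_per_sec,
        "bytes_per_sec": bytes_per_sec,
    }
    return [name for name, limit in DIRECT_PUT_LIMITS.items()
            if observed[name] > limit]


# Hypothetical workload: 10 000 records/sec of ~1 KB each, sent in 20 batched calls/sec.
over = exceeded_limits(10_000, 20, 10_000 * 1_000)
```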
• Always learn about the service limits
• (there are always limits)
• hard and soft limits
• Keep a close eye on Lambda’s
concurrency limits
• Deep dive into the error handling
• Don’t just assume things ..
• If not sure, ask AWS Support
• Keep a close eye on service updates
• Everything fails all the time, especially at scale, so be prepared and
fail fast
“Everything fails,
all the time”
Dr. Werner Vogels
(CTO, Amazon)
Lessons Learned
Thank you!
ANAHIT POGOSOVA
@anahit_fi
Shameless Plug
Real World Serverless with Yan Cui, @theburningmonk
episode #14
Mastering AWS Kinesis Data Streams, Part 1 (2)
dev.solita.fi
@anahit_fi
Anahit Pogosova
AWS Community Nordics Virtual Meetup

一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
 
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 

AWS Community Nordics Virtual Meetup

  • 1. Serverless Data Streaming at Scale Anahit Pogosova Lead Cloud Software Engineer (Solita / Yle) 20.10.2020
  • 2. o Who, What & Why? o Under the Hood o Gotchas and Lessons Learned
  • 3.
  • 4. • Lead Cloud Software Engineer • Part of the Data & AI team at Finland’s national public broadcasting company Me • AWS Community Builder 10+ years ”full stack”, all kinds of stuff
  • 5. • Yle Areena, the biggest streaming service in Finland • Areena recommendations • Areena image personalisation • Automatic image extraction • Article recommendations (yle.fi) • Smart notifications (Yle Uutisvahti) • .. and more Yle
  • 6. • Data! • user interaction and content metadata • Collecting the data • Storing the data • Visualizing the data • Utilizing the data (ML & AI) • To understand the customers • To help provide better service for everyone { "adobe":true, "is_heartbeat":true, "collectorreceived":1555267233549, ... "s:asset:name":"Yle TV1", "s:event:type":"start", "s:meta:category":"nettitv", "s:meta:content_type":"livetv", "s:meta:ns_st_st":"yle tv1", ... "s:meta:title":"eduskuntavaalit 2019 - tulosilta", ... "s:meta:yle.vrsContent":"video", "s:meta:yle.vrsDevice":"android", "s:meta:yle.vrsPlatform":"mobile", "s:meta:yle.vrsProduct":"areena", "s:meta:yle_client":"android.areena.481-b4ce224bf", "s:meta:yle_language":"fi", "s:sp:channel":"yleisradio", "s:sp:hb_version":"android-2.2.1.214-d5c678", "s:user:mid":"71057009616815049761612335654599557361" } Yle
  • 7. • ~ 500 000 000 requests per day • ~ 600 000 rpm during prime time • > 0.5 TB event data per day • Apache Parquet • JSON • Max so far: ~ 2.5 mln rpm • elections + hockey finals Yle
  • 9.
  • 10. Agenda Load the data to the datalake in a columnar format Enable content personalization through near real-time analytics
  • 11. No server is easier to manage than “no server”. Dr. Werner Vogels (CTO, Amazon)
  • 12.
  • 13. Kinesis Data Streams • Fully managed and massively scalable service to stream data • Data available in milliseconds and stored from 24 hours to up to 7 days • Custom stream processing with consumers • Shard is the unit of parallelism • In: 1 MB/sec or 1 000 records/sec • Out: 2 MB/sec
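As a rough illustration of the per-shard write limits quoted above, the minimum shard count for a steady load can be estimated as follows. The helper name and the 800-byte average record size are assumptions made for the example; only the 1 MB/sec and 1 000 records/sec figures come from the slide, and the sketch ignores bursts and uneven partition keys.

```javascript
// Per-shard write limits quoted on the slide.
const SHARD_IN_BYTES_PER_SEC = 1000 * 1000; // 1 MB/sec, treated as 10^6 bytes (assumption)
const SHARD_IN_RECORDS_PER_SEC = 1000;

// Hypothetical helper: minimum number of shards for a steady write load.
function estimateShards(recordsPerSec, avgRecordBytes) {
  const byRecords = Math.ceil(recordsPerSec / SHARD_IN_RECORDS_PER_SEC);
  const byBytes = Math.ceil((recordsPerSec * avgRecordBytes) / SHARD_IN_BYTES_PER_SEC);
  return Math.max(1, byRecords, byBytes);
}

// 600 000 requests per minute is 10 000 per second; 800 bytes per record is made up.
console.log(estimateShards(600000 / 60, 800));
```

Whichever limit is hit first decides the shard count, so small records are bound by the record limit and large ones by the byte limit.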
  • 14. Amazon Kinesis Agent • Stand-alone Java application to stream data from files Service Integrations • CloudWatch Logs • CloudWatch Events • AWS IoT • DB Migration Service • API Gateway Amazon Kinesis Producer Library (KPL) • Provides higher level of abstraction over API calls Amazon Kinesis API (AWS SDK) • Most flexible • Allows full control over writing data Kinesis Data Streams, Writing Data
  • 15. • putRecord(params, callback) • putRecords(params, callback) • Up to 500 records • Up to 5 MiB Kinesis Data Streams, AWS SDK
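A minimal sketch of splitting a list of entries into batches that respect the two putRecords limits above (500 records, 5 MiB per call). The function names are hypothetical, and counting the partition key towards the request size is how I understand the documented limit, so treat that detail as an assumption.

```javascript
const MAX_RECORDS_PER_CALL = 500;
const MAX_BYTES_PER_CALL = 5 * 1024 * 1024; // 5 MiB

// Size of one entry: data blob plus partition key (assumed to count towards the limit).
function entrySize(entry) {
  return Buffer.byteLength(entry.Data) + Buffer.byteLength(entry.PartitionKey);
}

// Hypothetical helper: group entries into arrays that each fit one putRecords call.
function chunkForPutRecords(entries) {
  const batches = [];
  let current = [];
  let currentBytes = 0;
  for (const entry of entries) {
    const size = entrySize(entry);
    const full =
      current.length >= MAX_RECORDS_PER_CALL || currentBytes + size > MAX_BYTES_PER_CALL;
    if (full && current.length > 0) {
      batches.push(current);
      current = [];
      currentBytes = 0;
    }
    current.push(entry);
    currentBytes += size;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```

Each returned batch can then be passed as the `Records` parameter of one putRecords call.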
  • 16. putRecords(params, callback) • Request failure • Retries by default up to 3 times • Uses exponential backoff • Base delay by default is 100 ms Kinesis Data Streams, AWS SDK
  • 17. Lambda o One Lambda is invoked per shard by default • NEW(ish)! Parallelization factor (max 10) • Up to 10 times as many concurrent Lambdas as there are shards!
  • 18. o Lambda is invoked once per second, or: • the number of records reaches the configured batch size (max 10 000 records) • the record batch size reaches synchronous Lambda’s payload limit (6MB) • NEW(ish)! the batch window reaches its maximum value (max 5 min) Lambda
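For orientation, a sketch of what a Lambda consumer does with such a batch. Kinesis record payloads arrive base64-encoded in `event.Records[i].kinesis.data`; the counting step and the function names are hypothetical, and the payload is assumed to be JSON like the sample event shown earlier in the talk.

```javascript
// Decode one Kinesis record from a Lambda event; the payload is base64-encoded.
function decodeRecord(record) {
  const json = Buffer.from(record.kinesis.data, "base64").toString("utf8");
  return JSON.parse(json);
}

// Hypothetical processing step: count the events in the batch by type.
function countEventTypes(event) {
  const counts = {};
  for (const record of event.Records) {
    const type = decodeRecord(record)["s:event:type"] || "unknown";
    counts[type] = (counts[type] || 0) + 1;
  }
  return counts;
}

// Lambda entry point: each invocation receives a batch of records from one shard.
// In a real deployment this function would be exported as the handler.
async function handler(event) {
  return countEventTypes(event);
}
```

If the handler throws, the whole batch counts as failed, which is where the error-handling settings on the next slide come in.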
  • 19. Before.. • Lambda retries the batch until success or data expiration • No other batches are processed from the shard (aka poison pill)! Lambda, Error Handling After! • Maximum retry attempts (max 10 000) • Maximum record age (1 min – 7 days) • Bisect batch on function failure • On-failure destination (SQS or SNS)
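The settings from the last three slides map onto parameters of Lambda's CreateEventSourceMapping call. The fragment below is a sketch: the ARNs and function name are placeholders, and the values are illustrative picks within the ranges stated on the slides, not the speaker's actual configuration.

```javascript
// Parameters in the shape accepted by CreateEventSourceMapping (AWS SDK for JavaScript).
// All ARNs and names are placeholders; the values are examples only.
const eventSourceMappingParams = {
  EventSourceArn: "arn:aws:kinesis:eu-west-1:123456789012:stream/example-stream",
  FunctionName: "example-consumer",
  StartingPosition: "LATEST",
  BatchSize: 500,                     // max 10 000 records
  MaximumBatchingWindowInSeconds: 30, // max 5 min
  ParallelizationFactor: 5,           // max 10 concurrent batches per shard
  MaximumRetryAttempts: 3,            // max 10 000
  MaximumRecordAgeInSeconds: 3600,    // 1 min to 7 days
  BisectBatchOnFunctionError: true,   // split a failing batch to isolate the poison pill
  DestinationConfig: {
    OnFailure: { Destination: "arn:aws:sqs:eu-west-1:123456789012:example-dlq" },
  },
};
```

With the SDK installed, this object would be passed to `createEventSourceMapping` on a Lambda client. The same settings are also available through infrastructure-as-code tools.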
  • 20. Agenda Load the data to the datalake in a columnar format Enable content personalization through near real-time analytics
  • 21.
  • 22. • Fully managed service to load streaming data into a data lake • S3, Redshift, Amazon Elasticsearch Service • HTTP endpoints (New!) • Datadog, New Relic, MongoDB, and Splunk (Newish!) • Loads streaming data with 0 lines of code • Scales automatically (no shards to manage) • Can batch, compress, transform and convert data before loading to the destination Kinesis Firehose
  • 23. • Data stored up to 24 hours • Batches records to certain size or for certain period of time • 1 to 128 MB • 60 to 900 seconds • Uses Glue Data Catalog to convert JSON to • Apache Parquet • Apache ORC Kinesis Firehose
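A small sketch of how the two buffering limits interact, assuming Firehose delivers a batch as soon as either the size or the time limit is reached. The function name is hypothetical, MB is treated as MiB, and the model assumes a constant incoming byte rate, which real traffic will not have.

```javascript
// Hypothetical helper: seconds until a buffer is flushed at a constant byte rate.
// Whichever limit (size or interval) is reached first triggers delivery.
function secondsUntilFlush(bytesPerSec, bufferMB, bufferIntervalSec) {
  const bufferBytes = bufferMB * 1024 * 1024; // MB treated as MiB (assumption)
  if (bytesPerSec <= 0) return bufferIntervalSec;
  return Math.min(bufferIntervalSec, bufferBytes / bytesPerSec);
}

// At 4 MiB/sec a 128 MB buffer fills long before a 900 second interval elapses.
console.log(secondsUntilFlush(4 * 1024 * 1024, 128, 900));
```

At low traffic the interval decides the object size in the destination, at high traffic the size limit does.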
  • 24. Kinesis Streams vs. Firehose • Fully managed service to stream data • Data available up to 7 days • Scaling using shards • Custom stream processing with consumers • Fully managed service to load data into a data lake • Data available for 24 hours • Scales automatically • Batching, compressing, converting data out of the box + custom transformations with
  • 25. Amazon Kinesis Agent • Stand-alone Java application to stream data from files Service Integrations • Kinesis Streams • CloudWatch Logs • CloudWatch Events • AWS IoT Amazon Kinesis API (AWS SDK) • Most flexible • Allows full control over writing data Kinesis Firehose, Writing Data
  • 26. putRecordBatch(params, callback) • Request failure • Retries by default up to 3 times • Uses exponential backoff • Base delay by default is 100 ms Kinesis Firehose, AWS SDK
  • 27. Load the data to the datalake in a columnar format Enable content personalization through near real-time analytics Agenda
  • 28.
  • 29. • Fully managed service to run SQL queries on the streaming data • Join, filter and aggregate data over a time-based or a row-based window Kinesis Data Analytics Ingests data from: Kinesis Data Stream Kinesis Firehose Sends results to: Kinesis Data Stream Kinesis Firehose Lambda function
  • 30.
  • 32. putRecords(params, callback) Partial failure: • Exponential backoff + jitter Gotcha! Writing to Kinesis Streams
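A sketch of handling partial failures by hand. A putRecords call can succeed as a request while some records are rejected; the response reports this through `FailedRecordCount` and a per-record `ErrorCode`. The `putRecords` argument here is a hypothetical injected function (for example a thin wrapper around the SDK call), and the helper names, base delay and attempt count are assumptions.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Exponential backoff with full jitter: a random delay in [0, baseMs * 2^attempt).
function backoffDelay(attempt, baseMs = 100, random = Math.random) {
  return random() * baseMs * Math.pow(2, attempt);
}

// putRecords: injected function taking the request params and returning a promise
// of the response, e.g. (params) => kinesis.putRecords(params).promise()
async function putRecordsWithRetry(putRecords, streamName, records, maxAttempts = 5) {
  let pending = records;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const result = await putRecords({ StreamName: streamName, Records: pending });
    if (result.FailedRecordCount === 0) return [];
    // Response entries are in request order; rejected ones carry an ErrorCode.
    pending = pending.filter((_, i) => result.Records[i].ErrorCode);
    if (attempt < maxAttempts - 1) await sleep(backoffDelay(attempt));
  }
  return pending; // records still rejected after all attempts
}
```

Only the rejected records are resent, and the jitter spreads the retries out so that many producers do not hit the stream again at the same instant.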
  • 33. • Kinesis limits are per second, CloudWatch metrics are per minute • 1 MB/sec or 1 000 records/sec • Can 5 000 records/minute exceed the throughput? • Beware of network latency! • can be one reason for bursts in Kinesis • avoid the external network by using a Kinesis VPC endpoint Gotcha! Writing to Kinesis Streams
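To make the per-second versus per-minute point concrete, the sketch below counts records per second for two made-up traffic patterns that both total 5 000 records in a minute. Bucketing by whole seconds is a simplification of how the service enforces the limit, and the function name is hypothetical.

```javascript
const RECORDS_PER_SEC_LIMIT = 1000; // per shard

// Count records per whole second and return the seconds that exceed the limit.
function throttledSeconds(timestampsMs, limit = RECORDS_PER_SEC_LIMIT) {
  const perSecond = new Map();
  for (const t of timestampsMs) {
    const sec = Math.floor(t / 1000);
    perSecond.set(sec, (perSecond.get(sec) || 0) + 1);
  }
  return [...perSecond].filter(([, n]) => n > limit).map(([sec]) => sec);
}

// 5 000 records: one every 12 ms over a minute, or all bunched into the first 2 seconds.
const even = Array.from({ length: 5000 }, (_, i) => i * 12);
const bursty = Array.from({ length: 5000 }, (_, i) => (i * 2000) / 5000);
console.log(throttledSeconds(even).length, throttledSeconds(bursty).length); // prints: 0 2
```

Both patterns look identical on a per-minute CloudWatch graph, yet only the bursty one runs into throttling on a single shard.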
  • 34. • IncomingRecords = PutRecord + PutRecords The number of records successfully put to the Kinesis Stream • WriteProvisionedThroughputExceeded = PutRecord + PutRecords The number of records rejected due to throttling • IncomingRecords + WriteProvisionedThroughputExceeded = Total number of incoming records Gotcha! Writing to Kinesis Streams
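The relationship between the two metrics can be written as a tiny helper. The function name is hypothetical, and the inputs are assumed to be the sums of each metric over the same time period.

```javascript
// incoming: sum of IncomingRecords; throttled: sum of WriteProvisionedThroughputExceeded.
function throttleStats(incoming, throttled) {
  const total = incoming + throttled;
  return { total, throttledShare: total === 0 ? 0 : throttled / total };
}

console.log(throttleStats(900, 100));
```

Looking at IncomingRecords alone understates the real write demand whenever throttling occurs.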
  • 35. • IteratorAge • latency between when a record is added, and when it is processed • If it’s increasing, increase the number of shards, or • increase the parallelization factor (NEWish) • Two different iterator age metrics • Kinesis stream iterator age is a combination metric across all consumers • not too informative • Lambda’s own iterator age should be used instead! Gotcha! Reading from Kinesis Streams
  • 36. • Beware of timeouts! • connectTimeout: timeout for establishing a new connection on a socket • if not explicitly set, this value will default to the value of timeout • timeout: read timeout for an existing socket (2 min) • time between when request ends and the response is received, including service and network round-trips Gotcha!
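A sketch of setting both timeouts explicitly and of why the defaults matter. The options object follows the shape accepted by AWS SDK for JavaScript (v2) service constructors; the concrete values are illustrative assumptions, and `worstCaseMs` is a hypothetical helper that gives a rough upper bound by assuming every attempt runs into the read timeout.

```javascript
// Client options with explicit timeouts; the values are examples, not recommendations.
const kinesisClientOptions = {
  region: "eu-west-1",
  maxRetries: 3,
  httpOptions: {
    connectTimeout: 1000, // ms allowed for establishing the connection
    timeout: 2000,        // ms read timeout (the SDK default is 2 minutes)
  },
};
// With the SDK installed: const kinesis = new AWS.Kinesis(kinesisClientOptions);

// Rough upper bound on the time one call can block: every attempt times out,
// plus un-jittered exponential backoff between attempts.
function worstCaseMs({ maxRetries, httpOptions }, baseDelayMs = 100) {
  const attempts = maxRetries + 1;
  let backoff = 0;
  for (let i = 0; i < maxRetries; i++) backoff += baseDelayMs * Math.pow(2, i);
  return attempts * httpOptions.timeout + backoff;
}

console.log(worstCaseMs(kinesisClientOptions));
console.log(worstCaseMs({ maxRetries: 3, httpOptions: { timeout: 120000 } }));
```

Under these assumptions the default 2 minute timeout with 3 retries can block a caller for about 8 minutes, which may be longer than the calling Lambda is allowed to run.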
  • 37. • Firehose scales endlessly! Or does it? • “It is a fully managed service that automatically scales to match the throughput of your data.” • “When Direct PUT is configured as the data source, each Kinesis Data Firehose delivery stream is subject to the following limits: […] 5,000 records/second, 2,000 transactions/second, and 5 MiB/second.” • ThrottledRecords: the number of records that were throttled because data ingestion exceeded one of the delivery stream limits. Gotcha!
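A quick check of a load profile against the Direct PUT limits quoted above. The limit values are the ones on the slide (actual quotas may differ and can be raised on request), the function name is hypothetical, and apart from the record rate derived from the talk's ~600 000 rpm the example load is made up.

```javascript
// Direct PUT limits as quoted on the slide.
const DIRECT_PUT_LIMITS = {
  recordsPerSec: 5000,
  transactionsPerSec: 2000,
  bytesPerSec: 5 * 1024 * 1024, // 5 MiB
};

// Hypothetical helper: names of the limits that a given load exceeds.
function exceededLimits(load, limits = DIRECT_PUT_LIMITS) {
  return Object.keys(limits).filter((key) => load[key] > limits[key]);
}

// ~600 000 rpm is 10 000 records/sec; the other two figures are invented.
console.log(exceededLimits({
  recordsPerSec: 10000,
  transactionsPerSec: 100,
  bytesPerSec: 4 * 1024 * 1024,
}));
```

Batching more records per call lowers the transaction rate but does nothing for the records-per-second limit, so a prime-time load like this needs a quota increase or a different ingestion path.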
  • 38. • Always learn about the service limits • (there are always limits) • hard and soft limits • Keep a close eye on Lambda’s concurrency limits • Deep dive into the error handling • Don’t just assume things .. • If not sure, ask AWS Support • Keep a close eye on service updates • Everything fails all the time, especially at scale, so better be prepared and fail fast “Everything fails, all the time” Dr. Werner Vogels (CTO, Amazon) Lessons Learned
  • 40. Shameless Plug Real World Serverless with Yan Cui, @theburningmonk episode #14 Mastering AWS Kinesis Data Streams, Part 1 (2) dev.solita.fi @anahit_fi Anahit Pogosova