Principal Data Architect at Home24
Data Services: Search, Recommendations, Ranking
Worked on: Here Maps, Sapo.pt, DataJet, Xing, …
Scala, Perl, Prolog, Java, SQL, R, …
AWS: Step-Functions, Lambda Function, EMR, EC2,
Batch, SQS, SNS, Firehose, Athena, API Gateway, ...
home24.tech.blog
● 15 people of 12 nationalities
● Serverless lovers
● AWS technologies: Step Functions, CloudFormation, Lambda Functions,
Athena, EMR, Redshift, S3, ...
For data ingestion we have:
                                Production                 Development
Number of Lambdas               625                        2,311
Number of Step Functions        113                        490
Consumed time (a month)         3,383,525 sec (39 days)    5,371,037 sec (62 days)
Number of requests (a month)    2,014,203                  3,300,118
● The majority of our streams have a low message rate
● The big stream doesn't have an easily predictable message rate
and can peak at 100 messages/sec
● We will have many more low-rate streams
Main requirements
● Store new Stream Data in Raw S3 Bucket
● Refine Raw S3 Bucket data to a Refined S3 Bucket
● Badly formatted messages shall not stop the flow
● Notification shall be sent on bad data
● Data must be refined in less than 10 minutes
Other
● Able to replay many days of data fast
● For development, every developer shall be able to deploy their own
version independently
Requirements
● Collect data from SNS
● The data must be stored as received in S3.
● File size must be easy to process on
Lambda (< 10 MB)
● At least 1 file per minute must be created
Architecture
● An SQS queue collects all data from the SNS topic
● A Lambda copies the data from the SQS to a
Firehose
● The Lambda Function is invoked once a
minute via CloudWatch Event
● Firehose merges the data and creates files
on Raw S3 Bucket
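The copy step above can be sketched roughly as follows. This is a minimal illustration, not the deck's actual Lambda: the function name `drain_queue_to_firehose`, the `max_batches` limit, and the newline appended to each record are assumptions. The `sqs` and `firehose` arguments are expected to behave like boto3 clients (`receive_message`, `delete_message_batch`, `put_record_batch`).

```python
def drain_queue_to_firehose(sqs, firehose, queue_url, stream_name, max_batches=100):
    """Copy pending SQS messages to Firehose, deleting only the accepted ones."""
    copied = 0
    for _ in range(max_batches):
        response = sqs.receive_message(
            QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=0
        )
        messages = response.get("Messages", [])
        if not messages:
            break  # nothing left to copy in this invocation

        # Assumption: one record per line, so a newline is appended to each body.
        records = [{"Data": (m["Body"] + "\n").encode("utf-8")} for m in messages]
        result = firehose.put_record_batch(
            DeliveryStreamName=stream_name, Records=records
        )

        # RequestResponses follows the request order; rejected entries carry an ErrorCode.
        accepted = [
            {"Id": str(i), "ReceiptHandle": m["ReceiptHandle"]}
            for i, (m, r) in enumerate(zip(messages, result["RequestResponses"]))
            if "ErrorCode" not in r
        ]
        if accepted:
            sqs.delete_message_batch(QueueUrl=queue_url, Entries=accepted)
        copied += len(accepted)
    return copied
```

Inside a scheduled Lambda handler one would typically pass `boto3.client("sqs")` and `boto3.client("firehose")`. Messages that Firehose rejects are left on the queue and become visible again after the visibility timeout.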
Requirement
● When some messages cannot be
processed, send a notification.
Architecture
● The data is deleted from the SQS
Queue after successful copy to
Firehose
● In case of error, the messages will
end up in the Dead-Letter Queue
● A non-empty Dead-Letter Queue means
there is an error in the data
● After fixing the Lambda function, one
can always copy the messages back
to the Raw SQS
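A rough sketch of the two Dead-Letter Queue operations mentioned above: checking whether the queue is non-empty, and copying its messages back to the Raw SQS queue once the Lambda is fixed. The function names `dlq_has_messages` and `redrive` are hypothetical. `sqs` is assumed to behave like a boto3 SQS client.

```python
def dlq_has_messages(sqs, dlq_url):
    """Return True when the Dead-Letter Queue reports at least one message."""
    attrs = sqs.get_queue_attributes(
        QueueUrl=dlq_url, AttributeNames=["ApproximateNumberOfMessages"]
    )
    return int(attrs["Attributes"]["ApproximateNumberOfMessages"]) > 0


def redrive(sqs, dlq_url, target_url, max_batches=1000):
    """Copy messages from the DLQ back to the target queue, then delete them."""
    moved = 0
    for _ in range(max_batches):
        response = sqs.receive_message(
            QueueUrl=dlq_url, MaxNumberOfMessages=10, WaitTimeSeconds=0
        )
        messages = response.get("Messages", [])
        if not messages:
            break

        entries = [
            {"Id": str(i), "MessageBody": m["Body"]} for i, m in enumerate(messages)
        ]
        result = sqs.send_message_batch(QueueUrl=target_url, Entries=entries)
        sent_ids = {s["Id"] for s in result.get("Successful", [])}

        # Only remove from the DLQ what was really re-queued.
        done = [
            {"Id": str(i), "ReceiptHandle": m["ReceiptHandle"]}
            for i, m in enumerate(messages)
            if str(i) in sent_ids
        ]
        if done:
            sqs.delete_message_batch(QueueUrl=dlq_url, Entries=done)
        moved += len(done)
    return moved
```

The notification itself could hang off a CloudWatch alarm on the queue depth instead of polling; the check above only illustrates the "non-empty means error" rule.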
Requirements
● Decompress data (zip, deflate, gz,
base64, ...)
● Normalize fields (dates for example)
● Add metadata
● Convert all to JSON
● Store on S3
Architecture
● When a new file is created on Raw S3
Bucket a message is sent to SQS via
SNS
● The Lambda Function is invoked once
a minute via CloudWatch Event and
processes all unprocessed files
● A file with the same key as the Raw file is
created in the Refined S3 Bucket
● Messages that fail to process will end up
in the Dead-Letter Queue
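The refining steps listed above (decompress, normalize, add metadata, emit JSON) might look roughly like this for a single message. Everything specific here is an assumption for illustration: the `timestamp` field holding epoch seconds, the `_meta` key, and the explicit `steps` tuple describing how a stream is encoded. Zip archives are left out of the sketch.

```python
import base64
import gzip
import json
import zlib
from datetime import datetime, timezone

# Decoders keyed by the (assumed) encoding names used for each stream.
DECODERS = {
    "base64": base64.b64decode,
    "gz": gzip.decompress,
    "deflate": zlib.decompress,
}


def refine_line(line, source_key, steps=()):
    """Decode one raw message, normalize its date and return a JSON string."""
    data = line.encode("utf-8")
    for step in steps:
        data = DECODERS[step](data)
    record = json.loads(data)

    # Normalize: epoch seconds become an ISO 8601 UTC string (assumed field name).
    if "timestamp" in record:
        record["timestamp"] = datetime.fromtimestamp(
            record["timestamp"], tz=timezone.utc
        ).isoformat()

    # Metadata: remember which Raw file the record came from.
    record["_meta"] = {"source_key": source_key}
    return json.dumps(record, sort_keys=True)


def refine_file(lines, source_key, steps=()):
    """Refine every line; bad lines are collected instead of stopping the flow."""
    refined, rejected = [], []
    for line in lines:
        try:
            refined.append(refine_line(line, source_key, steps))
        except (ValueError, TypeError, OSError, EOFError, zlib.error):
            rejected.append(line)
    return refined, rejected
```

The refined lines would then be written under the same key in the Refined bucket, and the rejected ones reported through the Dead-Letter Queue.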
Requirements
● Replay multiple days of data
Architecture
● A Lambda Function lists the files in the
Raw S3 Bucket and sends
messages to SQS
● Since the files in Raw and Refined
have the same key, the replayed files will
always overwrite the existing ones
● The execution time of the Refiner
Lambda will rise and the Refiner
Lambdas will work in parallel
Parallelism:
● Our Lambda execution time goes up to ~190 sec, with 3 Lambdas
running in parallel
● 9198 S3 objects
● 30 GB of GZip data, 10 GB/hour
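The replay Lambda could be sketched along these lines. The day-based key prefix (`YYYY/MM/DD/`), the JSON message body with `bucket` and `key`, and the function names are assumptions; `s3` and `sqs` are expected to behave like boto3 clients.

```python
import json
from datetime import date, timedelta


def day_prefixes(start, days, fmt="%Y/%m/%d/"):
    """Build one key prefix per day to replay (assumed key layout)."""
    return [(start + timedelta(days=i)).strftime(fmt) for i in range(days)]


def replay(s3, sqs, bucket, queue_url, prefixes):
    """List Raw files under the prefixes and enqueue one message per file."""
    paginator = s3.get_paginator("list_objects_v2")
    pending, sent = [], 0

    def flush():
        nonlocal pending, sent
        if pending:
            entries = [
                {"Id": str(i), "MessageBody": json.dumps({"bucket": bucket, "key": k})}
                for i, k in enumerate(pending)
            ]
            sqs.send_message_batch(QueueUrl=queue_url, Entries=entries)
            sent += len(pending)
            pending = []

    for prefix in prefixes:
        for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
            for obj in page.get("Contents", []):
                pending.append(obj["Key"])
                if len(pending) == 10:  # SQS accepts at most 10 entries per batch
                    flush()
    flush()
    return sent
```

For example, `replay(s3, sqs, "raw-bucket", queue_url, day_prefixes(date(2018, 3, 1), 3))` would enqueue three days of files; the bucket name here is made up.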
Requirement
● Developers shall be able to
deploy their Stream
Processors
● No interaction with an external
team shall be required
Architecture
● We created an internal SNS
where we clone the external
messages
● SNS can write to multiple
SQS queues
● Same CloudFormation magic,
and every developer can
deploy their own environment
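One way to picture the per-developer environment is a small template generator. This is only a sketch of the SNS-to-many-SQS fan-out, not the deck's actual templates: the function name `developer_stack`, the logical IDs, and the queue naming scheme are assumptions. The resource types are standard CloudFormation ones.

```python
import json


def developer_stack(developer, topic_arn):
    """Build a CloudFormation template subscribing a developer's own queue to the topic."""
    queue_arn = {"Fn::GetAtt": ["RawQueue", "Arn"]}
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            # Each developer gets a queue with their own (assumed) name prefix.
            "RawQueue": {
                "Type": "AWS::SQS::Queue",
                "Properties": {"QueueName": f"{developer}-raw-queue"},
            },
            # The queue must allow the internal SNS topic to send messages to it.
            "RawQueuePolicy": {
                "Type": "AWS::SQS::QueuePolicy",
                "Properties": {
                    "Queues": [{"Ref": "RawQueue"}],
                    "PolicyDocument": {
                        "Version": "2012-10-17",
                        "Statement": [{
                            "Effect": "Allow",
                            "Principal": {"Service": "sns.amazonaws.com"},
                            "Action": "sqs:SendMessage",
                            "Resource": queue_arn,
                            "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
                        }],
                    },
                },
            },
            # One more subscription on the shared topic; other queues are unaffected.
            "RawSubscription": {
                "Type": "AWS::SNS::Subscription",
                "Properties": {
                    "Protocol": "sqs",
                    "TopicArn": topic_arn,
                    "Endpoint": queue_arn,
                },
            },
        },
    }
```

Serializing the result with `json.dumps(developer_stack("alice", topic_arn), indent=2)` gives a template body that each developer could deploy as a separate stack.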
EC2 vs. Lambda
CPU / Price
  EC2:    1 t2.nano (5% vCPU and 500 MB)
          0.0063*24*30 = 4.536 $/month
  Lambda: Considering 3 seconds a minute with the highest memory
          (2 vCPU and 1536 MB)
          3*60*24*30*10*(0.000002501+0.0000002) = 3.5 $/month
DevOps
  EC2:    Higher
  Lambda: Low
Scale
  EC2:    Scales up to 1 vCPU while it has credits. To have more vCPUs you
          need to use more expensive instance types or implement
          autoscaling
  Lambda: Out of the box until a certain level.
          2 vCPU * 5 Lambdas = 10 vCPUs
Price-wise, Lambda seems a good solution. For our problems, 10 vCPUs is
clearly more than enough.
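The price arithmetic from the comparison can be checked with a few lines. The numbers are the slide's own. Reading `0.000002501` as the charge per 100 ms at 1536 MB and `0.0000002` as the per-request charge is an interpretation, and the formula is reproduced as written.

```python
# EC2: one t2.nano at 0.0063 $/hour, running the whole month.
ec2_monthly = 0.0063 * 24 * 30

# Lambda: 3 seconds of execution per minute, billed in 100 ms units (10 per second).
billed_units = 3 * 60 * 24 * 30 * 10
lambda_monthly = billed_units * (0.000002501 + 0.0000002)

print(f"EC2:    {ec2_monthly:.3f} $/month")
print(f"Lambda: {lambda_monthly:.1f} $/month")
```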
Kinesis vs. SQS
Slow stream
  Kinesis: 2 shards 24.5 $/month
           PUTs 0.042 $/month
  SQS:     Requests 2.07 $/month
Fast stream
  Kinesis: 3 shards 36.7 $/month
           PUTs 1.1 $/month
  SQS:     Requests 51.8 $/month
Errors
  Kinesis: Errors have to be controlled externally
  SQS:     Errors go to the Dead-Letter Queue
We analyze our 2 types of data stream:
● Slow stream: 1 message/second (2.6 million requests/month)
● Fast stream: 25 messages/second (64.8 million requests/month)
with spikes of 100 messages/second
On SQS you pay for PUTs and GETs; on Kinesis you pay for PUTs
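The monthly figures above can be approximately reproduced as follows. The unit prices are not stated in the slides: the values below are assumptions back-calculated from the totals (0.017 $ per shard-hour, 0.0165 $ per million Kinesis PUTs, 0.40 $ per million SQS requests, with one PUT and one GET per message), so treat them as illustrative only.

```python
SECONDS_PER_MONTH = 60 * 60 * 24 * 30

# Assumed unit prices (not from the slides; chosen to match the quoted totals).
SHARD_HOUR = 0.017
KINESIS_PUT_PER_MILLION = 0.0165
SQS_REQUEST_PER_MILLION = 0.40


def monthly_costs(messages_per_second, shards):
    """Return (kinesis shard cost, kinesis PUT cost, SQS request cost) in $/month."""
    millions = messages_per_second * SECONDS_PER_MONTH / 1_000_000
    kinesis_shards = shards * SHARD_HOUR * 24 * 30
    kinesis_puts = millions * KINESIS_PUT_PER_MILLION
    sqs_requests = millions * 2 * SQS_REQUEST_PER_MILLION  # one PUT + one GET per message
    return kinesis_shards, kinesis_puts, sqs_requests


slow = monthly_costs(messages_per_second=1, shards=2)
fast = monthly_costs(messages_per_second=25, shards=3)
print("slow:", [round(x, 3) for x in slow])
print("fast:", [round(x, 3) for x in fast])
```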
● You just pay for what you use
● Scalability is not an issue at our message volume (peak of 100
messages/second)
○ SQS and Firehose can easily process that volume of messages
○ Multiple Lambdas can work in parallel in case of high traffic or
replay.
● Separate Lambdas per stream make the logs easier to understand
● Separate environments simplify the developers' work
● Data is on S3 and it can be queried via Athena, EMR, Redshift
Spectrum, ...
Questions
Answers

This 7-second Brain Wave Ritual Attracts Money To You.!
 
Internet-Security-Safeguarding-Your-Digital-World (1).pptx
Internet-Security-Safeguarding-Your-Digital-World (1).pptxInternet-Security-Safeguarding-Your-Digital-World (1).pptx
Internet-Security-Safeguarding-Your-Digital-World (1).pptx
 
1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样
1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样
1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样
 
一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理
一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理
一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理
 
guildmasters guide to ravnica Dungeons & Dragons 5...
guildmasters guide to ravnica Dungeons & Dragons 5...guildmasters guide to ravnica Dungeons & Dragons 5...
guildmasters guide to ravnica Dungeons & Dragons 5...
 
History+of+E-commerce+Development+in+China-www.cfye-commerce.shop
History+of+E-commerce+Development+in+China-www.cfye-commerce.shopHistory+of+E-commerce+Development+in+China-www.cfye-commerce.shop
History+of+E-commerce+Development+in+China-www.cfye-commerce.shop
 
The+Prospects+of+E-Commerce+in+China.pptx
The+Prospects+of+E-Commerce+in+China.pptxThe+Prospects+of+E-Commerce+in+China.pptx
The+Prospects+of+E-Commerce+in+China.pptx
 
Multi-cluster Kubernetes Networking- Patterns, Projects and Guidelines
Multi-cluster Kubernetes Networking- Patterns, Projects and GuidelinesMulti-cluster Kubernetes Networking- Patterns, Projects and Guidelines
Multi-cluster Kubernetes Networking- Patterns, Projects and Guidelines
 
APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024
APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024
APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024
 
How to Use Contact Form 7 Like a Pro.pptx
How to Use Contact Form 7 Like a Pro.pptxHow to Use Contact Form 7 Like a Pro.pptx
How to Use Contact Form 7 Like a Pro.pptx
 
test test test test testtest test testtest test testtest test testtest test ...
test test  test test testtest test testtest test testtest test testtest test ...test test  test test testtest test testtest test testtest test testtest test ...
test test test test testtest test testtest test testtest test testtest test ...
 
急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样
急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样
急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样
 
JAVIER LASA-EXPERIENCIA digital 1986-2024.pdf
JAVIER LASA-EXPERIENCIA digital 1986-2024.pdfJAVIER LASA-EXPERIENCIA digital 1986-2024.pdf
JAVIER LASA-EXPERIENCIA digital 1986-2024.pdf
 
1.Wireless Communication System_Wireless communication is a broad term that i...
1.Wireless Communication System_Wireless communication is a broad term that i...1.Wireless Communication System_Wireless communication is a broad term that i...
1.Wireless Communication System_Wireless communication is a broad term that i...
 
一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理
一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理
一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理
 
BASIC C++ lecture NOTE C++ lecture 3.pptx
BASIC C++ lecture NOTE C++ lecture 3.pptxBASIC C++ lecture NOTE C++ lecture 3.pptx
BASIC C++ lecture NOTE C++ lecture 3.pptx
 

Store stream data on Data Lake

  • 1.
  • 2. Principal Data Architect at Home24
       Data Services: Search, Recommendations, Ranking
       Worked on: Here Maps, Sapo.pt, DataJet, Xing, …
       Scala, Perl, Prolog, Java, SQL, R, …
       AWS: Step Functions, Lambda, EMR, EC2, Batch, SQS, SNS, Firehose, Athena, API Gateway, ...
  • 4.
  • 5. ● 15 people of 12 nationalities
       ● Serverless lovers. For data ingestion we have:
       ● AWS technologies: Step Functions, CloudFormation, Lambda Functions, Athena, EMR, Redshift, S3, ...

                                      Production                 Development
       Number of Lambdas              625                        2311
       Number of Step Functions       113                        490
       Consumed time (a month)        3,383,525 sec (39 days)    5,371,037 sec (62 days)
       Number of requests (a month)   2,014,203                  3,300,118
  • 6. ● The majority of our streams have a low message rate
       ● The Big Stream does not have an easily predictable message rate and can peak at 100 messages/sec
       ● We will have many more low-rate streams
  • 7. Main requirements
       ● Store new stream data in the Raw S3 Bucket
       ● Refine the Raw S3 Bucket data into a Refined S3 Bucket
       ● Malformed messages shall not stop the flow
       ● A notification shall be sent on bad data
       ● Data must be refined in less than 10 minutes
       Other
       ● Able to replay many days of data fast
       ● For development, every developer shall be able to deploy their own version independently
  • 11. Requirements
        ● Collect data from SNS
        ● The data must be stored in S3 as received
        ● File sizes must be easy to process on Lambda (< 10 MB)
        ● At least 1 file per minute must be created
        Architecture
        ● An SQS queue collects all data from the SNS topic
        ● A Lambda function copies the data from the SQS queue to a Firehose
        ● The Lambda function is invoked once a minute via a CloudWatch Event
        ● Firehose merges the data and creates files in the Raw S3 Bucket
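The copy step from SQS to Firehose could look roughly like the sketch below. It is not the code from the talk: the function name, the `max_polls` limit and the newline delimiter are assumptions made for illustration. The `sqs` and `firehose` arguments are expected to behave like boto3 clients and are passed in by the caller, so the block itself needs no AWS access.

```python
def drain_queue_to_firehose(sqs, firehose, queue_url, stream_name, max_polls=50):
    """Copy messages from an SQS queue to a Firehose delivery stream.

    Only messages that Firehose accepted are deleted from the queue.
    Returns the number of messages copied.
    """
    copied = 0
    for _ in range(max_polls):
        # SQS returns at most 10 messages per receive call.
        response = sqs.receive_message(
            QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=0
        )
        messages = response.get("Messages", [])
        if not messages:
            break  # queue drained for this invocation

        # Assumption: a trailing newline separates records, because Firehose
        # concatenates records without adding a delimiter of its own.
        result = firehose.put_record_batch(
            DeliveryStreamName=stream_name,
            Records=[{"Data": (m["Body"] + "\n").encode("utf-8")} for m in messages],
        )

        # put_record_batch can fail per record; failed entries carry "ErrorCode".
        accepted = [
            m
            for m, r in zip(messages, result["RequestResponses"])
            if "ErrorCode" not in r
        ]
        if accepted:
            sqs.delete_message_batch(
                QueueUrl=queue_url,
                Entries=[
                    {"Id": str(i), "ReceiptHandle": m["ReceiptHandle"]}
                    for i, m in enumerate(accepted)
                ],
            )
        copied += len(accepted)
    return copied
```

Messages that were not accepted are simply left undeleted, so they become visible again after the queue's visibility timeout and are retried on a later invocation.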
  • 16. Requirement
        ● When some messages cannot be processed, send a notification
        Architecture
        ● The data is deleted from the SQS queue after a successful copy to Firehose
        ● In case of error, the messages end up in the Dead-Letter Queue
        ● A non-empty Dead-Letter Queue means there is an error in the data
        ● After fixing the Lambda function, one can always copy the messages back to the Raw SQS queue
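A minimal sketch of the two Dead-Letter Queue operations mentioned above: checking whether the queue is non-empty (the trigger for a notification) and copying its messages back to the raw queue after a fix. The function names are hypothetical, and `sqs` is assumed to behave like a boto3 SQS client supplied by the caller.

```python
def dead_letter_count(sqs, dlq_url):
    """Return the approximate number of messages waiting in the Dead-Letter Queue."""
    attrs = sqs.get_queue_attributes(
        QueueUrl=dlq_url, AttributeNames=["ApproximateNumberOfMessages"]
    )
    return int(attrs["Attributes"]["ApproximateNumberOfMessages"])


def redrive(sqs, dlq_url, raw_queue_url):
    """Copy every message from the Dead-Letter Queue back to the raw queue.

    Only the message body is copied; message attributes are not preserved
    in this sketch. Returns the number of messages moved.
    """
    moved = 0
    while True:
        response = sqs.receive_message(QueueUrl=dlq_url, MaxNumberOfMessages=10)
        messages = response.get("Messages", [])
        if not messages:
            return moved
        for message in messages:
            # Send first, delete second: a crash in between duplicates a
            # message instead of losing it.
            sqs.send_message(QueueUrl=raw_queue_url, MessageBody=message["Body"])
            sqs.delete_message(
                QueueUrl=dlq_url, ReceiptHandle=message["ReceiptHandle"]
            )
            moved += 1
```

A scheduled check could call `dead_letter_count` and publish a notification whenever the result is greater than zero.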
  • 21. Requirements
        ● Decompress data (zip, deflate, gz, base64, ...)
        ● Normalize fields (dates, for example)
        ● Add metadata
        ● Convert everything to JSON
        ● Store the result on S3
        Architecture
        ● When a new file is created in the Raw S3 Bucket, a message is sent to SQS via SNS
        ● The Lambda function is invoked once a minute via a CloudWatch Event and processes all unprocessed files
        ● A file with the same key as the raw file is created in the Refined S3 Bucket
        ● Messages that fail to process end up in the Dead-Letter Queue
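The refine steps (decompress, normalize, add metadata, emit JSON) can be illustrated with a small standard-library sketch. The record layout is invented for the example: it assumes each raw line is base64-encoded gzip of a JSON object with a `created_at` field in epoch seconds, and it attaches the raw file key under a hypothetical `_meta` field. The real streams will differ.

```python
import base64
import gzip
import json
from datetime import datetime, timezone


def refine_record(raw_line, source_key):
    """Decode one raw line and return it as a normalized JSON string."""
    payload = gzip.decompress(base64.b64decode(raw_line))
    record = json.loads(payload)
    # Normalize the (hypothetical) epoch-seconds field to ISO 8601 in UTC.
    record["created_at"] = datetime.fromtimestamp(
        record["created_at"], tz=timezone.utc
    ).isoformat()
    # Add metadata: here only the key of the raw file the record came from.
    record["_meta"] = {"source_key": source_key}
    return json.dumps(record, sort_keys=True)


def refine_file(raw_text, source_key):
    """Refine every line of a raw file.

    Returns (refined_lines, failed_lines). A bad line is collected rather
    than raised, so one malformed message does not stop the flow.
    """
    refined, failed = [], []
    for line in raw_text.splitlines():
        if not line.strip():
            continue
        try:
            refined.append(refine_record(line, source_key))
        except Exception:  # any decode or parse failure marks the line as bad
            failed.append(line)
    return refined, failed
```

In this sketch the caller would write `refined_lines` to the Refined S3 Bucket under the same key and treat a non-empty `failed_lines` as a processing failure.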
  • 26. Requirements
        ● Replay multiple days of data
        Architecture
        ● A Lambda function lists the files in the Raw S3 Bucket and sends messages to SQS
        ● Since the files in Raw and Refined have the same key, the replayed files always overwrite the existing ones
        ● The execution time of the Refiner Lambda will rise and the Refiner Lambdas will work in parallel
        Parallelism:
        ● Our Lambda goes up to ~190 sec, with 3 Lambdas running in parallel
        ● 9198 S3 objects
        ● 30 GB of GZip data, 10 GB/hour
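A replay along these lines might be sketched as follows. Several details are assumptions: the raw keys are taken to start with a `YYYY/MM/DD/` prefix (the default Firehose layout), the message body is a made-up JSON object rather than a real S3 event notification, and `s3` and `sqs` stand for boto3-like clients passed in by the caller.

```python
import json
from datetime import timedelta


def day_prefixes(start, end):
    """Yield one 'YYYY/MM/DD/' key prefix per day from start to end, inclusive."""
    day = start
    while day <= end:
        yield day.strftime("%Y/%m/%d/")
        day += timedelta(days=1)


def _send_batch(sqs, queue_url, bucket, keys):
    # Hypothetical message body; a real setup would mimic the S3 event format.
    sqs.send_message_batch(
        QueueUrl=queue_url,
        Entries=[
            {"Id": str(i), "MessageBody": json.dumps({"bucket": bucket, "key": key})}
            for i, key in enumerate(keys)
        ],
    )
    return len(keys)


def replay(s3, sqs, bucket, queue_url, start, end):
    """Queue every raw file between start and end for refining again."""
    sent, batch = 0, []
    paginator = s3.get_paginator("list_objects_v2")
    for prefix in day_prefixes(start, end):
        for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
            for obj in page.get("Contents", []):
                batch.append(obj["Key"])
                if len(batch) == 10:  # send_message_batch accepts at most 10 entries
                    sent += _send_batch(sqs, queue_url, bucket, batch)
                    batch = []
    if batch:
        sent += _send_batch(sqs, queue_url, bucket, batch)
    return sent
```

Because the refined file reuses the raw key, running the replay twice is harmless: the second run overwrites the same objects. At the reported 10 GB/hour, the 30 GB in the example above corresponds to about 3 hours of replay.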
  • 30. Requirement
        ● Developers shall be able to deploy their Stream Processors
        ● No interaction with an external team shall be required
        Architecture
        ● We created an internal SNS topic where we clone the external messages
        ● SNS can write to multiple SQS queues
        ● With the same CloudFormation magic, every developer can deploy their own environment
  • 34. EC2 vs. Lambda
        CPU / Price
        ● EC2: 1 t2.nano (5% vCPU and 500 MB): 0.0063*24*30 = 4.536 $/month
        ● Lambda: considering 3 seconds a minute with the highest memory (2 vCPU and 1536 MB): 3*60*24*30*10*(0.000002501+0.0000002) = 3.5 $/month
        DevOps
        ● EC2: Higher
        ● Lambda: Low
        Scale
        ● EC2: Scales up to 1 vCPU while it has credits. To have more vCPUs you need to use more expensive instance types or implement autoscaling
        ● Lambda: Out of the box up to a certain level. 2 vCPU * 5 Lambdas = 10 vCPUs
        Price-wise, Lambda seems a good solution. For our problems, 10 vCPUs is clearly more than enough.
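The two monthly figures can be reproduced with the arithmetic below. The unit prices are the ones printed on the slide; the reading of the factor 10 as "100 ms billing units per second" and of 0.0000002 as the per-request price is an interpretation, not something the slide states.

```python
HOURS_PER_MONTH = 24 * 30
MINUTES_PER_MONTH = 60 * 24 * 30

# EC2: one t2.nano at the hourly price from the slide.
EC2_HOURLY_PRICE = 0.0063
ec2_monthly = EC2_HOURLY_PRICE * HOURS_PER_MONTH

# Lambda: 3 busy seconds every minute at the 1536 MB setting.
BUSY_SECONDS_PER_MINUTE = 3
BILLING_UNITS_PER_SECOND = 10      # assumption: billing in 100 ms units
PRICE_PER_UNIT = 0.000002501       # slide figure for 1536 MB
PRICE_PER_REQUEST = 0.0000002      # slide figure, read here as the request fee

billed_units = BUSY_SECONDS_PER_MINUTE * MINUTES_PER_MONTH * BILLING_UNITS_PER_SECOND
# The slide adds the request fee to every billing unit, so this mirrors it.
lambda_monthly = billed_units * (PRICE_PER_UNIT + PRICE_PER_REQUEST)

print(f"EC2:    {ec2_monthly:.3f} $/month")
print(f"Lambda: {lambda_monthly:.2f} $/month")
```

Charging the request fee once per invocation instead of once per billing unit would make the Lambda figure slightly lower, so the comparison on the slide is, if anything, conservative.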
  • 37. Kinesis vs. SQS
        We analyze our 2 types of data stream:
        ● Slow stream: 1 message/second (2.6 million requests/month)
        ● Fast stream: 25 messages/second (64.8 million requests/month) with spikes of 100 messages/second
        On SQS you pay for PUTs and GETs; on Kinesis you pay for PUTs
        Slow stream
        ● Kinesis: 2 shards 24.5 $/month, PUTs 0.042 $/month
        ● SQS: requests 2.07 $/month
        Fast stream
        ● Kinesis: 3 shards 36.7 $/month, PUTs 1.1 $/month
        ● SQS: requests 51.8 $/month
        Errors
        ● Kinesis: errors have to be controlled externally
        ● SQS: errors go to the Dead-Letter Queue
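The monthly figures in this comparison can be approximately reproduced as shown below. The unit prices in the sketch (0.017 $ per shard-hour, 0.0165 $ per million PUTs, 0.40 $ per million SQS requests) are assumptions back-computed from the slide's totals, not values taken from an AWS price list.

```python
SECONDS_PER_MONTH = 60 * 60 * 24 * 30  # 2,592,000
HOURS_PER_MONTH = 24 * 30

# Assumed unit prices, inferred from the totals on the slide.
SHARD_HOUR_PRICE = 0.017
KINESIS_PUT_PRICE_PER_MILLION = 0.0165
SQS_PRICE_PER_MILLION = 0.40


def monthly_messages(messages_per_second):
    return messages_per_second * SECONDS_PER_MONTH


def sqs_cost(messages):
    # On SQS every message is paid twice: once for the PUT, once for the GET.
    return 2 * messages / 1_000_000 * SQS_PRICE_PER_MILLION


def kinesis_cost(messages, shards):
    """Return (shard cost, PUT cost) per month."""
    shard_cost = shards * SHARD_HOUR_PRICE * HOURS_PER_MONTH
    put_cost = messages / 1_000_000 * KINESIS_PUT_PRICE_PER_MILLION
    return shard_cost, put_cost


slow = monthly_messages(1)    # 2,592,000 messages/month
fast = monthly_messages(25)   # 64,800,000 messages/month

for name, messages, shards in (("slow", slow, 2), ("fast", fast, 3)):
    shard_cost, put_cost = kinesis_cost(messages, shards)
    print(f"{name}: Kinesis {shard_cost:.1f} + {put_cost:.3f}, SQS {sqs_cost(messages):.2f} $/month")
```

With these prices the fixed shard cost dominates Kinesis for the slow stream, while the per-request cost makes SQS more expensive for the fast stream.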
  • 38. ● You only pay for what you use
        ● Scalability is not an issue at our message volume (at most 100 messages/second)
          ○ SQS and Firehose can easily process that volume of messages
          ○ Multiple Lambdas can work in parallel in case of high traffic or replay
        ● Separate Lambdas per stream make the logs easier to understand
        ● Separate environments simplify the developers' work
        ● Data is on S3 and can be queried via Athena, EMR, Redshift Spectrum, ...