Opher Dubrovsky & Avner Tamir
June 10, 2020
DevTalks Reimagined
Taming the Beast !!
● Amazing powers of serverless
● Pitfalls
● How not to get burnt
A system - DataOut
250 Billion events / day
Serverless
Who We Are
● Opher Dubrovsky - Big Data dev lead. Loves data and challenges
● Avner Tamir - Director of Cloud Operations. Automate everything !
Nielsen Identity Engine
● eXelate (startup) acquired by Nielsen in 2015
● A Data company
● Machine learning models for insights
● Data used for (it's a DMP):
○ Targeting (campaigns)
○ Business decisions
Nielsen Identity Engine in numbers
~300B events/day
~60TB/day
S3
~3000 nodes/day
$500,000/month
Outline
○ Serverless
○ Problem & solution architecture
○ Pitfalls we encountered & solutions
○ Q&A
Serverless
Running Code on Servers
[Figure: some code]
Steps to Run
1. Deploy & Install
2. Start server
3. Run code
4. Maintain
5. Scale
Costs $$
Hassle
Serverless
[Figure: some code]
Steps to Run
1. Deploy
2. Start server
3. Run code
4. Stop server
5. Delete code
6. Reuse server for something else
Benefits:
● Save $$
● Less management hassle
● Automatic scaling
● Highly parallelizable
Unexpected Benefit
Code Improvements === Cost Reduction
[Figure: improvements vs. $$]
Our System: DataOut
Goals:
● BigData pipeline - 24/7 stream
● Sends data to ad platforms
● Daily data:
○ 200 platforms * 400,000 segments * millions of users
○ == 120 Billion Events / Day
Has to be:
● Scalable
● Cost effective
● Meet SLAs
● Transparent
Our Solution - Serverless Infrastructure
Input Output
What Makes It Attractive ?
Data
Processing
Costs
Classical Server Systems Are Built for the Worst Case
[Figure annotations: "Many hours are like this", "Wasted $$$$$$"]
What Makes It Attractive ?
Data
Processing
Costs
Savings
Benefits of Serverless
● Scale up / down
● Run in parallel (short processing time, SLA)
● Save when not in use
● Optimizations translate to savings
● Simple to maintain
We Got Superpowers
Some Numbers
● Average Day - 120 Billion events
● Highest Day - 250 Billion events
○ 17 million files
○ 55 TB of data
● Hourly Scaling -
○ 1 TB ←→ 6 TB
○ Scales up/down by a factor of 6 !!
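To put these daily totals in perspective, here is a small back-of-the-envelope sketch that converts them to per-second rates. It uses only the figures on this slide; the assumption that events are spread evenly over 24 hours is ours (the hourly scaling bullet shows they are not), so treat the results as averages rather than peaks.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def events_per_second(events_per_day: float) -> float:
    """Average event rate implied by a daily event count."""
    return events_per_day / SECONDS_PER_DAY


average_rate = events_per_second(120e9)    # average day from the slide
peak_day_rate = events_per_second(250e9)   # highest day from the slide
hourly_scale_factor = 6 / 1                # 1 TB <-> 6 TB per hour

print(f"average day: {average_rate:,.0f} events/sec")
print(f"highest day: {peak_day_rate:,.0f} events/sec")
print(f"hourly scale factor: {hourly_scale_factor:.0f}x")
```

Even on an average day this works out to roughly 1.4 million events per second, and close to 2.9 million on the highest day.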
BUT - Life is Not That Easy
Pitfall 1 - Clash of Clans
● Easy is risky, easy and manual is crazy
● Too many breaking changes
● Too many hands
● No orchestration
First Shot Solution
● Blocking permissions
● BUT - CloudOps became a bottleneck
Created tension and ARGUMENTS !!
Second Shot
● Automate anything and everything…..
● Give developers access to run all automation tools
Reduced tensions - developers could now work !!
Clash of Clans - Peace Conference
● Created a shared task force (Cloud & Data engineers)
● Defined & implemented a holistic solution
○ Tools (Terraform & Jenkins)
○ Processes (CI & CD)
○ Deployments (Canary, gradual, full)
Clash of Clans - New Era
● Developers own production!
● CloudOps maintain tools and approve changes
Pitfall 2 - Costs
● Optimization will save $$$
● BUT - what is the reverse of optimization ?
BUGS !!
BUG 1 - Reprocessing
BUG 1 - Reprocessing 38 Times !!
● Monitor tasks
● If task is “in-progress” for too long - we reprocess
[Figure: tasks over time; labels: Task Processing Time, Reprocess Threshold, Reprocess Tasks]
The Bug
● We deployed a version that had a lower threshold
● Many normal tasks were reprocessed (38 times)
[Figure: tasks over time; labels: Task Processing Time, Reprocess Threshold, Reprocess Tasks]
Result - Costs
Total $ Damage - $3,413
Item                          Cost
Total cost of 4 bug days      $7,442
Average 4 day costs           $4,029
Net $ damage                  $3,413
Solution for Reprocessing
● Task validations to prevent multi processing
● Stick to your procedures !!!
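One way such task validations might look is sketched below. This is not the DataOut implementation: the `Task` fields, the `MAX_ATTEMPTS` cap and the `safety_factor` check are hypothetical names and values chosen for illustration. The idea is that a task is reprocessed only if it is still in progress, has passed the threshold, and has not already been retried too often, and that a new threshold is rejected before deployment if normal tasks would routinely exceed it.

```python
from dataclasses import dataclass

MAX_ATTEMPTS = 3  # hypothetical cap; the talk does not state one


@dataclass
class Task:
    task_id: str
    started_at: float            # seconds; start of the current attempt
    attempts: int = 1
    status: str = "in-progress"


def should_reprocess(task: Task, now: float, threshold_s: float) -> bool:
    """Reprocess only in-progress tasks that are past the threshold and under the cap."""
    if task.status != "in-progress":
        return False
    if task.attempts >= MAX_ATTEMPTS:
        return False  # stop retrying; a human or an alert should take over
    return (now - task.started_at) > threshold_s


def reprocess(task: Task, now: float) -> None:
    """Record a new attempt for the task."""
    task.attempts += 1
    task.started_at = now


def validate_threshold(threshold_s: float, typical_processing_s: float,
                       safety_factor: float = 2.0) -> None:
    """Reject a threshold that normal tasks would routinely exceed."""
    if threshold_s < typical_processing_s * safety_factor:
        raise ValueError(
            f"threshold {threshold_s}s is too close to the typical "
            f"processing time of {typical_processing_s}s"
        )
```

With a cap like this, a mistakenly low threshold still causes extra work, but a task cannot be reprocessed dozens of times.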
BUG 2 - Lambda Throttling
BUG 2 - Lambda Throttled Queue
[Figure: pipeline diagram; components: SNS, Work Manager, Workers, FanOut, DB]
High load → causing queued invocations
BUG 2 - Lambda Throttled Queue
[Figure: pipeline diagram; components: SNS, FanOut, Workers; annotation: Increased concurrency]
AWS Cost Report
Total $ Damage - $2,555
Item                              Cost
Total cost of 5 bug days          $6,055
Average day costs (of 5 days)     $3,500
Net $ damage                      $2,555
Solution for Escalating Costs
● New monitoring & alerts on:
○ Number of invocations
○ Costs, measured in real time
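A minimal sketch of such a deviation alert follows, assuming a simple "current value vs. historical mean" rule. The function name `check_deviation` and the 1.5x ratio are hypothetical. The sample numbers read the cost table as five-day totals ($3,500 normal, $6,055 during the bug), which is our interpretation of the slide.

```python
from statistics import mean
from typing import Optional, Sequence


def check_deviation(metric: str, history: Sequence[float], current: float,
                    max_ratio: float = 1.5) -> Optional[str]:
    """Return an alert message when `current` exceeds the mean of `history` by more than `max_ratio`."""
    if not history:
        return None  # no baseline yet
    baseline = mean(history)
    if baseline <= 0:
        return None  # baseline not usable
    ratio = current / baseline
    if ratio > max_ratio:
        return (f"ALERT {metric}: {current:,.0f} is {ratio:.2f}x "
                f"the baseline of {baseline:,.0f}")
    return None


normal_daily_cost = [3500.0 / 5] * 5   # assumed: $3,500 spread over 5 normal days
bug_day_cost = 6055.0 / 5              # assumed: $6,055 spread over 5 bug days

alert = check_deviation("daily_cost_usd", normal_daily_cost, bug_day_cost)
if alert:
    print(alert)
```

The same check can be pointed at invocation counts. Running it hourly rather than daily shortens the time a bug can burn money unnoticed.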
Pitfall 3 - DDoS Attack on Partners
Our system scales up and down as traffic comes in !!
BUT - the world does not work like that !!
Email from Partner
Subject: You’re killing our systems
Message: can you guys check…….
……….
Data going up 1 → 6x over the day
Solution for DDoS
Throttling
We can throttle on:
● MB/sec
● Items/sec
● HTTP requests/sec, connections, …..
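The talk does not say which throttling algorithm was used. A token bucket is one common choice and is sketched here as an assumption. The class name and parameters are illustrative. The "tokens" can stand for megabytes, items or HTTP requests, depending on which dimension the partner needs limited.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow up to `rate_per_sec` units per second, with bursts up to `capacity`."""

    def __init__(self, rate_per_sec: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def try_acquire(self, amount: float = 1.0) -> bool:
        """Take `amount` tokens if available; return False when the caller must wait."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if amount <= self.tokens:
            self.tokens -= amount
            return True
        return False


# Example: cap one partner at 100 requests/sec with bursts of up to 200.
partner_limit = TokenBucket(rate_per_sec=100, capacity=200)
if partner_limit.try_acquire():
    pass  # send the request here
```

A sender that gets `False` would hold or re-queue the work, so the partner sees a bounded rate even when upstream traffic grows sixfold.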
Summary - Avoiding the Pitfalls
● Measure all risk dimensions
● In our system they are:
○ Cost
○ Throughput
○ Load
● Quickly alert on deviations
● Build detailed dashboards on the above
● Create circuit breakers in the system based on alerts !!
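As a rough illustration of the last bullet, the sketch below shows a circuit breaker that an alert can trip and that blocks further work until it is reset or a cooldown passes. The class, the cooldown value and the `send_batch` helper are hypothetical and are not taken from the system described in the talk.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Blocks work after an alert trips it, until reset or the cooldown elapses."""

    def __init__(self, cooldown_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.reason: Optional[str] = None
        self._opened_at: Optional[float] = None

    def trip(self, reason: str) -> None:
        """Open the breaker, for example from a cost or throughput alert."""
        self.reason = reason
        self._opened_at = self.clock()

    def reset(self) -> None:
        """Close the breaker manually once the problem is understood."""
        self.reason = None
        self._opened_at = None

    def allow(self) -> bool:
        """Return True when work may proceed."""
        if self._opened_at is None:
            return True
        if self.clock() - self._opened_at >= self.cooldown_s:
            self.reset()
            return True
        return False


def send_batch(breaker: CircuitBreaker, batch: list,
               send: Callable[[list], None]) -> bool:
    """Send a batch only while the breaker is closed."""
    if not breaker.allow():
        return False
    send(batch)
    return True
```

For cost alerts, a manual `reset()` may be the safer policy, since an automatic cooldown can let a runaway bug resume spending.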
Summary - Serverless
● Amazing powers and scale
● Great cost optimizations
● Parallelization
BUT - BEWARE of escalating costs
and be ready for the UNEXPECTED !!
Copyright © 2017 The Nielsen Company. Confidential and proprietary.
Questions

THE RISE AND FALL OF SERVERLESS COSTS - TAMING THE (SERVERLESS) BEAST

  • 1. Opher Dubrovsky & Avner Tamir June 10, 2020 DevTalks Reimagined Taming the Beast !!
  • 2. ● Amazing powers of serverless ● Pitfalls ● How not to get burnt
  • 3. A system - DataOut 250 Billion events / day Serverless
  • 4. Who We Are ● Opher Dubrovsky: Big Data dev lead. Loves data and challenges ● Avner Tamir: Director of Cloud Operations. Automate everything !
  • 5. Nielsen Identity Engine ● eXelate (startup) acquired by Nielsen in 2015 ● A Data company ● Machine learning models for insights ● Data used for (it's a DMP): ○ Targeting (campaigns) ○ Business decisions
  • 6. Nielsen Identity Engine in numbers ~300B events/day ~60TB/day S3 ~3000 nodes/day $500,000/month
  • 7. Outline ○ Serverless ○ Problem & solution architecture ○ Pitfalls we encountered & solutions ○ Q&A
  • 9. Running Code on Servers Some Code Steps to Run 1. Deploy & Install 2. Start server 3. Run code 4. Maintain 5. Scale Costs $$ Hassle
  • 10. Serverless Some Code Steps to Run 1. Deploy 2. Start server 3. Run code 4. Stop server 5. Delete code 6. Reuse server for something else Benefits: ● Save $$ ● Less management hassle ● Automatic scaling ● Highly parallelizable
  • 11. Unexpected Benefit Code Improvements === Cost Reduction improvements $$
  • 13. Goals: ● BigData pipeline - 24/7 stream ● Sends data to ad platforms ● Daily data: ○ 200 platforms * 400,000 segments * millions of users ○ == 120 Billion Events / Day Has to be: ● Scalable ● Cost effective ● Meet SLAs ● Transparent
  • 14. Our Solution - Serverless Infrastructure Input Output
  • 15. Our Solution - Serverless Infrastructure Input Output
  • 16. Our Solution - Serverless Infrastructure Input Output
  • 17. What Makes It Attractive ? Data Processing Costs
  • 18. Classical Server Systems Are Built for the worst case Many hours are like this Wasted $$$$$$
  • 19. What Makes It Attractive ? Data Processing Costs Savings
  • 20. Benefits of Serverless ● Scale up / down ● Run in parallel (short processing time, SLA) ● Save when not in use ● Optimizations translate to savings ● Simple to maintain
  • 22. Some Numbers ● Average Day - 120 Billion events ● Highest Day - 250 Billion events ○ 17 million files ○ 55 TB of data ● Hourly Scaling - ○ 1 TB ↔ 6 TB ○ Scales up/down by a factor of 6 !!
  • 23. BUT - Life is Not That Easy
  • 24. Pitfall 1 - Clash of clans ● Easy is risky, easy and manual is crazy ● Too many breaking changes ● Too many hands ● No orchestration
  • 25. First Shot Solution ● Blocking permissions ● BUT - CloudOps became a bottleneck Created tension and ARGUMENTS !!
  • 26. Second Shot ● Automate anything and everything… ● Give developers access to run all automation tools Reduced tensions - developers could now work !!
  • 27. Clash of Clans - Peace Conference ● Created a shared task force (Cloud & Data engineers) ● Define & implement a holistic solution ○ Tools (Terraform & Jenkins) ○ Processes (CI & CD) ○ Deployments (Canary, gradual, full)
  • 28. Clash of Clans - New Era ● Developers own production! ● CloudOps maintain tools and approve changes
  • 29. Pitfall 2 - Costs ● Optimization will save $$$ ● BUT - what is the reverse of optimization ? BUGS !!
  • 30. BUG 1 - Reprocessing
  • 31. BUG 1 - Reprocessing 38 Times !! ● Monitor tasks ● If a task is “in-progress” for too long - we reprocess [Diagram: task timeline showing processing time, the reprocess threshold and the reprocessed tasks]
  • 32. The Bug ● We deployed a version that had a lower threshold ● Many normal tasks were reprocessed (38 times) [Diagram: the same task timeline with a lower reprocess threshold, so more tasks are reprocessed]
  • 34. Total $ Damage - $3,413 ● Total cost of 4 bug days: $7,442 ● Average 4 day costs: $4,029 ● Net $ damage: $3,413
  • 35. Solution for Reprocessing ● Task validations to prevent multi processing ● Stick to your procedures !!!
  • 36. BUG 2 - Lambda Throttling
  • 37. BUG 2 - Lambda Throttled [Diagram: Queue, SNS, Work Manager, Workers, FanOut, DB] High load → causing queued invocations
  • 38. BUG 2 - Lambda Throttled [Diagram: Queue, SNS, Workers, FanOut] Increased concurrency
  • 40. Total $ Damage - $2,555 ● Total cost of 5 bug days: $6,055 ● Average day costs (of 5 days): $3,500 ● Net $ damage: $2,555
  • 41. Solution for Escalating Costs ● New monitoring & alerts on: ○ Number of invocations ○ Measure costs in real time and alert
  • 42. Pitfall 3 - DDoS Attack on Partners Our system scales up and down as traffic comes in !! BUT - the world does not work like that !!
  • 43. Email from Partner Subject: You’re killing our systems Message: can you guys check… Data going up 1 → 6x over the day
  • 44. Solution for DDoS - Throttling We can throttle on: ● MB/sec ● Items/sec ● HTTP requests/sec, connections, …
  • 45. Summary - Avoiding the Pitfalls ● Measure all risk dimensions ● In our system they are: ○ Cost ○ Throughput ○ Load ● Quickly alert on deviations ● Build detailed dashboards on the above ● Create circuit breakers in the system based on alerts !!
  • 46. Summary - Serverless ● Amazing powers and scale ● Great cost optimizations ● Parallelization BUT - BEWARE of escalating costs and be ready for the UNEXPECTED !!
  • 47. Copyright © 2017 The Nielsen Company. Confidential and proprietary. Questions
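
The reprocessing check from the "BUG 1 - Reprocessing" slides, together with the "task validations to prevent multi processing" fix, might look roughly like the sketch below. It is only an illustration: the `Task` fields, the threshold value and the `MAX_ATTEMPTS` cap are assumptions, since the slides do not give the real values or data model.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical values; the talk does not state the real threshold or cap.
REPROCESS_THRESHOLD_SEC = 900.0
MAX_ATTEMPTS = 3  # validation cap so a task cannot be reprocessed dozens of times


@dataclass
class Task:
    task_id: str
    started_at: float          # epoch seconds when the current attempt began
    status: str = "in-progress"
    attempts: int = 1


def find_tasks_to_reprocess(tasks: Dict[str, Task], now: float,
                            threshold: float = REPROCESS_THRESHOLD_SEC) -> List[str]:
    """Return IDs of tasks that have been in-progress longer than the threshold."""
    stuck = []
    for task in tasks.values():
        if task.status != "in-progress":
            continue  # validation: finished tasks are never reprocessed
        if now - task.started_at <= threshold:
            continue  # still within normal processing time
        if task.attempts >= MAX_ATTEMPTS:
            task.status = "failed"  # stop retrying and leave it for a human
            continue
        stuck.append(task.task_id)
    return stuck


def reprocess(tasks: Dict[str, Task], task_id: str, now: float) -> None:
    """Record a new attempt so the attempt cap can take effect."""
    task = tasks[task_id]
    task.attempts += 1
    task.started_at = now
```

Calling `find_tasks_to_reprocess` with a much lower `threshold` shows the failure mode from "The Bug" slide: tasks that are still processing normally get selected again. The attempt cap bounds how often that can happen.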
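
The two "Total $ Damage" slides and the "Solution for Escalating Costs" slide boil down to comparing actual spend or invocation counts against a baseline. A minimal sketch of that arithmetic and of a deviation alert follows. The function names and the `max_ratio` of 1.5 are assumptions made for illustration; the slides only say that invocations and real-time cost are monitored.

```python
from statistics import mean
from typing import Sequence


def net_damage(actual_cost: float, baseline_cost: float) -> float:
    """Extra money spent during the bug period compared with a normal period."""
    return actual_cost - baseline_cost


def is_deviation(current: float, history: Sequence[float],
                 max_ratio: float = 1.5) -> bool:
    """True when the current value exceeds max_ratio times the historical mean.

    max_ratio is a hypothetical tuning knob, not a value from the talk.
    """
    if not history:
        return False  # no baseline yet, so nothing to compare against
    return current > max_ratio * mean(history)


# Figures taken from the slides.
bug1 = net_damage(7442, 4029)   # reprocessing bug, 4 days
bug2 = net_damage(6055, 3500)   # Lambda throttling bug, 5 days
```

The same `is_deviation` check can be applied to hourly invocation counts or to cost, so an alert fires within hours instead of after several days of extra spend.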
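
The throttling described on the "Solution for DDoS" slide (limits on MB/sec, items/sec or requests/sec) is commonly implemented with a token bucket. The sketch below assumes that technique; the talk does not say which mechanism was actually used, and the class and parameter names are hypothetical.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow at most rate_per_sec units per second, with bursts up to capacity.

    A unit can be an item, a megabyte or an HTTP request, depending on
    which dimension is being throttled.
    """

    def __init__(self, rate_per_sec: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity      # start with a full bucket
        self.clock = clock
        self.last = clock()

    def _refill(self) -> None:
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now

    def try_acquire(self, amount: float = 1.0) -> bool:
        """Take amount tokens if available; otherwise refuse without blocking."""
        self._refill()
        if amount <= self.tokens:
            self.tokens -= amount
            return True
        return False
```

A sender would call `try_acquire(len(batch))` before each delivery to a partner and delay or requeue the batch when it returns `False`, so the outgoing rate stays flat even when incoming traffic grows sixfold.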
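
The last point of "Summary - Avoiding the Pitfalls", circuit breakers driven by alerts, could be sketched as below. This is an assumed, simplified design: the breaker opens when any alert fires and closes again after a cool-down. The class name, the `cooldown_sec` parameter and the cool-down behaviour are illustrative and not taken from the talk.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Stops work when an alert fires and resumes after a cool-down period."""

    def __init__(self, cooldown_sec: float,
                 clock: Callable[[], float] = time.monotonic):
        self.cooldown = cooldown_sec
        self.clock = clock
        self.opened_at: Optional[float] = None  # None means the breaker is closed

    def on_alert(self) -> None:
        """Called by monitoring when cost, throughput or load deviates."""
        self.opened_at = self.clock()

    def allow(self) -> bool:
        """Return True if work may proceed."""
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown:
            self.opened_at = None  # cool-down elapsed, close the breaker
            return True
        return False
```

Workers would check `allow()` before picking up a task, which turns an alert into an automatic stop instead of waiting for someone to read it.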

Editor's Notes

  1. Data company: we get or buy data from our partners in various ways, online and offline. We enrich the data, which in our case means generating attributes. An attribute is something we assign to a device based on the data we have, for example "Sports Fan" or "Eats Organic Food". The enriched data we generate helps support our clients' business decisions and also allows them to target the relevant audiences.
     Nielsen Identity Engine
     - A group inside Nielsen
     - Born from eXelate, a company acquired by Nielsen in March 2015
     - Nielsen is a data company and so are we; we had a strong business relationship until at some point they decided to go for it and acquired eXelate
     - Data company, meaning:
       - Buying and onboarding data into Nielsen Identity Engine from data providers, customers and Nielsen data
       - We have a huge, high-quality dataset
       - Enrich the data using machine learning models in order to create more relevant, quality insights
       - Categorize and sell according to need
     - Helping brands make intelligent business decisions
       - E.g. targeting in the digital marketing world, meaning helping fit ads to viewers
       - For example, a street sign fits only a very small % of the people who see it, whereas online ads can fit the profile of the individual who sees them:
         - More interesting to the user
         - More chance they will click the ad
         - Better ROI for the marketer