© 2019 TWILIO INC. ALL RIGHTS RESERVED.
You’re spiky and we know it!!
Ravindra Bhanot
©2022 TWILIO INC. ALL RIGHTS RESERVED
Key Contributors
Ravi Bhanot, Principal Software Engineer
Evolved the app through its stages to support scale and requirements.
Thomas D’Silva, Principal Software Engineer
Critical in designing the composable and flexible filters.
Scott Reynolds, Architect
Guided the key/value schema designs with scalability and latency considerations.
Introduction to Twilio
Twilio communication data patterns
- spikes and seasonality
Spikes at hour boundaries
Elevated traffic at some hours of the day
Alerting use case / Minimum viable product
- Alert customers on defined threshold criteria in as close to real time as possible
- Allow setting alerts over a subset of the data in the event stream using field-level filters
- Allow flexible time rollups
- Allow statistical operations over rollups - SUM, COUNT, AVERAGE, PERCENTAGE
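The rollup requirements above can be sketched as follows. This is a minimal illustration, not the implementation from the talk: the `rollup` helper and the `(timestamp_ms, value)` event shape are hypothetical, and PERCENTAGE is left out because the slides do not define what it is a percentage of.

```python
from collections import defaultdict

FIVE_MINS_MS = 5 * 60 * 1000


def rollup(events, window_ms, operation):
    """Bucket (timestamp_ms, value) events into fixed windows and reduce each bucket."""
    buckets = defaultdict(list)
    for timestamp_ms, value in events:
        # Align every event to the start of the window it falls into.
        window_start = timestamp_ms - (timestamp_ms % window_ms)
        buckets[window_start].append(value)

    reducers = {
        "SUM": sum,
        "COUNT": len,
        "AVERAGE": lambda values: sum(values) / len(values),
    }
    reduce_values = reducers[operation]
    return {start: reduce_values(values) for start, values in sorted(buckets.items())}


# Three events land in the first five-minute window, one in the second.
events = [(0, 40.0), (60_000, 70.0), (299_999, 10.0), (300_000, 5.0)]
sums = rollup(events, FIVE_MINS_MS, "SUM")
```

Changing `window_ms` is what makes the rollup flexible; the reducer table is the only place a new operation would need to be added.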
Approaches to creating a monitoring
solution for customers
● Cache/Database based approach with cron/routines to check
thresholds
● Streams based approach
Chosen way
● Kafka Streams due to familiarity
● Ease of maintaining state in statestores
● Ease of calculating and updating aggregates
Evolution of the solution
Journey of how we went from a simple two-stage (filter and aggregate) application to a scalable three-stage solution.
Glossary for forthcoming slides
● Input topic - Stream of records on which aggregates will be
calculated and thresholds will be checked.
● Alert Config - User/Customer defined configuration of
thresholds, time constraints and filters.
● Account - Notion of a single user of the system whose
interactions will cause multiple events on the input topic.
● Alert - A single violation or recovery event of an alert config’s
threshold criteria.
● Sink - Send a stream of records to the Kafka broker rather than just forwarding it to the next stage of the app locally inside a machine.
Sample Alert Config/Alert Criteria definition
{
  "alert_config_sid": "AK10001",
  "account_sid": "AC40c3a0fa4f71f6f0b7cbd895724fb211",
  "recordFilter": {
    "field_name": {"string": "phonenumber"},
    "equality_type": "EQUALS",
    "field_value": {"string": "+1949xxxxxxx"},
    "operator_type": "LEAF",
    "operands": null
  },
  "dataset": "BillingTransactions",
  "measure_field_name": {"string": "amount"},
  "threshold": {
    "alert_level_threshold": "100",
    "comparison": "ABOVE",
    "operation": "SUM",
    "time_period": "FIVE_MINS"
  },
  "flap_detection_method": "WEIGHTED_AVERAGE",
  "notification_preferences": [..],
  "date_created_epoch_milli": 1599071100000,
  "date_updated_epoch_milli": 1599071100000
}
Sample Triggered Alert
ERROR SAMPLE -
{
  "alert_config_sid": "AK10001",
  "dataset": "BillingTransactions",
  "account_sid": "AC40c3a0fa4f71f6f0b7cbd895724fb211",
  "alert_time_epoch": 1599071100000,
  "metric_value": "200",
  "alert_status": "ERROR"
}
RECOVERED SAMPLE -
{
  "alert_config_sid": "AK10001",
  "dataset": "BillingTransactions",
  "account_sid": "AC40c3a0fa4f71f6f0b7cbd895724fb211",
  "alert_time_epoch": 159907113600,
  "metric_value": "10",
  "alert_status": "RECOVERED"
}
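A sketch of how a config's threshold could turn an aggregated value into the ERROR and RECOVERED events shown in these samples. The `evaluate` function, the `BELOW` comparison, and the rule of emitting an alert only when the status changes are assumptions for illustration; the slides show only `ABOVE`.

```python
def evaluate(config, metric_value, previous_status, now_epoch_milli):
    """Return an alert dict when the status changes, otherwise None."""
    threshold = config["threshold"]
    limit = float(threshold["alert_level_threshold"])
    if threshold["comparison"] == "ABOVE":
        violated = metric_value > limit
    elif threshold["comparison"] == "BELOW":  # assumed counterpart of ABOVE
        violated = metric_value < limit
    else:
        raise ValueError(f"unsupported comparison: {threshold['comparison']}")

    status = "ERROR" if violated else "RECOVERED"
    # Assumption: stay silent when nothing changed, and never report a
    # recovery for a config that has not been in ERROR before.
    if status == previous_status:
        return None
    if status == "RECOVERED" and previous_status is None:
        return None
    return {
        "alert_config_sid": config["alert_config_sid"],
        "dataset": config["dataset"],
        "account_sid": config["account_sid"],
        "alert_time_epoch": now_epoch_milli,
        "metric_value": str(metric_value),
        "alert_status": status,
    }


config = {
    "alert_config_sid": "AK10001",
    "account_sid": "AC40c3a0fa4f71f6f0b7cbd895724fb211",
    "dataset": "BillingTransactions",
    "threshold": {"alert_level_threshold": "100", "comparison": "ABOVE"},
}
first = evaluate(config, 200, None, 1599071100000)
```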
Sample Alert Config with Composable Filters
{
  .
  .
  "recordFilter": {
    "field_name": null,
    "equality_type": "EQUALS",
    "field_value": null,
    "operator_type": "OR",
    "operands": {
      "array": [
        {
          "field_name": {"string": "status"},
          "equality_type": "EQUALS",
          "field_value": {"string": "SENT"},
          "operator_type": "LEAF",
          "operands": null
        },
        {
          "field_name": {"string": "status"},
          "equality_type": "EQUALS",
          "field_value": {"string": "DELIVERED"},
          "operator_type": "LEAF",
          "operands": null
        }
      ]
    }
  },
  .
  .
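A composable filter like the one above can be evaluated recursively. The sketch below is an assumption about how that might look, not the code from the talk: `unwrap` and `matches` are hypothetical helpers, the `{"string": ...}` and `{"array": ...}` wrappers are treated as single-entry unions, and `AND` is an assumed operator (the slides show only `OR` and `LEAF`).

```python
def unwrap(wrapped):
    """Unwrap a single-entry wrapper such as {"string": "status"}; None stays None."""
    if wrapped is None:
        return None
    return next(iter(wrapped.values()))


def matches(record_filter, record):
    """Return True when the record satisfies the (possibly nested) filter."""
    operator = record_filter["operator_type"]
    if operator == "LEAF":
        field = unwrap(record_filter["field_name"])
        expected = unwrap(record_filter["field_value"])
        if record_filter["equality_type"] == "EQUALS":
            return record.get(field) == expected
        raise ValueError(f"unsupported equality_type: {record_filter['equality_type']}")

    # Non-leaf nodes combine the results of their operands.
    results = (matches(operand, record) for operand in unwrap(record_filter["operands"]))
    if operator == "OR":
        return any(results)
    if operator == "AND":  # assumed operator, not shown on the slides
        return all(results)
    raise ValueError(f"unsupported operator_type: {operator}")


def leaf(field, value):
    return {
        "field_name": {"string": field},
        "equality_type": "EQUALS",
        "field_value": {"string": value},
        "operator_type": "LEAF",
        "operands": None,
    }


status_filter = {
    "field_name": None,
    "equality_type": "EQUALS",
    "field_value": None,
    "operator_type": "OR",
    "operands": {"array": [leaf("status", "SENT"), leaf("status", "DELIVERED")]},
}
```

Because every node has the same shape, operands can themselves be `OR` nodes, which is what makes the filter composable.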
Australopithecus state - Filter, Aggregate
& Inspect
Filter, Aggregate & Inspect - Key
design considerations
- Alert configs topic is read into a global state store - all alert configs are available on every machine in the app cluster
- Keys of the output topics of all stages include account_id and alert_config_id for contextual reference when processing
- Output topic of the first stage (Filter) is sunk to the broker - records get distributed across the machines in the app cluster as the source for the second stage (Aggregator)
- Third stage (Inspect) consumes the output topic of the second stage (Aggregator) local to each machine
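The keying and distribution ideas above can be sketched without Kafka. Everything here is a stand-in: `filter_stage`, `partition_for`, the `matches` callables and the `|` key separator are hypothetical, and CRC32 is used only as a deterministic hash for the illustration, not as the partitioner Kafka uses.

```python
import zlib


def filter_stage(record, configs_by_account):
    """Yield (key, measure) for each alert config of the account that the record matches."""
    for config in configs_by_account.get(record["account_sid"], []):
        if config["matches"](record):
            # The key carries both ids so later stages know which config a value belongs to.
            key = f"{record['account_sid']}|{config['alert_config_sid']}"
            yield key, record[config["measure_field_name"]]


def partition_for(key, num_partitions):
    """Map a key to a partition; equal keys always land on the same partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Stand-in for the global state store: every node sees all configs.
configs_by_account = {
    "AC1": [
        {"alert_config_sid": "AK10001", "measure_field_name": "amount",
         "matches": lambda r: r["phonenumber"] == "+15550100"},
        {"alert_config_sid": "AK10002", "measure_field_name": "amount",
         "matches": lambda r: r["phonenumber"] == "+15550199"},
    ]
}
record = {"account_sid": "AC1", "phonenumber": "+15550100", "amount": 25.0}
keyed = list(filter_stage(record, configs_by_account))
```

One input record can fan out to several keyed records, one per matching alert config, which is why the key needs both ids.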
Filter, Aggregate & Inspect - Downsides
- Too many records in the first stage (Filter) for one account cause too many inspections in the third stage (Inspect), which delays inspection of other accounts when their records show up
- Aggregates for the present minute are still being populated while being inspected, which can trigger false alerts
Filter, Aggregate & Inspect -
Crucial Learnings
- Distribution of records amongst partitions
Sink to the broker when a repartition is needed.
Forward to the next stage when the metadata needed for processing is present locally.
- State store/RocksDB tuning
a) Use an LRU cache with a Bloom filter
b) Use kHashSearch instead of the default binary search in case of frequent lookups
Blog - https://www.twilio.com/blog/kafka-streams-near-real-time
Questions ?
Neanderthal state - Filter, Aggregate
& Punctuate
Filter, Aggregate & Punctuate -
Design considerations
- Punctuator to inspect the state of local aggregates
Punctuate every 60 seconds based on wall-clock time so that only aggregates local to a node are inspected.
Aggregates are not read immediately, to allow record state to finalize.
- Aggregation pauses when the punctuator runs
Punctuation should be quick, and since current-minute aggregates are not read during punctuation, it is okay to pause processing of a stage to free up cores for the punctuator.
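The "do not read the current minute" rule can be illustrated with a small in-memory stand-in for the aggregate store. This is a sketch under assumptions: `MinuteAggregator` is hypothetical, it keeps only a running SUM, and evicting inspected minutes is a simplification made for the example.

```python
MINUTE_MS = 60_000


class MinuteAggregator:
    """Keeps a running SUM per (key, minute) and inspects only finished minutes."""

    def __init__(self):
        self.sums = {}  # (key, minute_start_ms) -> running sum

    def add(self, key, timestamp_ms, value):
        minute_start = timestamp_ms - (timestamp_ms % MINUTE_MS)
        bucket = (key, minute_start)
        self.sums[bucket] = self.sums.get(bucket, 0.0) + value

    def punctuate(self, now_ms):
        """Return and evict aggregates for minutes that ended before the current one."""
        current_minute = now_ms - (now_ms % MINUTE_MS)
        closed = {bucket: total for bucket, total in self.sums.items()
                  if bucket[1] < current_minute}
        for bucket in closed:
            del self.sums[bucket]
        return closed


aggregator = MinuteAggregator()
aggregator.add("AC1|AK10001", 10_000, 5.0)
aggregator.add("AC1|AK10001", 50_000, 7.0)
aggregator.add("AC1|AK10001", 65_000, 1.0)  # belongs to the still-open minute
inspected = aggregator.punctuate(now_ms=70_000)
```

Only the first minute is returned for inspection; the bucket that is still filling stays in the store until a later punctuation.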
Filter, Aggregate & Punctuate -
Downsides
- Limit the number of alert configs per account, as the time taken scales with the number of alert configs.
Performance test to determine the limit on the number of alert configs per account for a single Twilio use case.
- Flappy/Toggling Alerts
Alerts can toggle between ERROR and RECOVERED states
causing a lot of notification spam for the end customer.
Homo Sapiens state -
Filter, Aggregate/Punctuate
& Conduct
Filter, Aggregate/Punctuate &
Conduct - Design features
- Decaying weighted average to avoid over-notifying customers
Based on the history of the last X time windows, assign decreasing weights based on the time of state transitions (example - ERROR -> RECOVERED).
Use this cumulative score to decide if flappy or not.
Image credit - Nagios
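One way to read the decaying weighted average is sketched below. It is an assumption-laden illustration, not the formula from the talk: `flap_score` and `is_flapping` are hypothetical, the linear weights from 0.8 (oldest) to 1.2 (newest) and the 0.5 threshold are placeholder values, and the score is simply the weighted share of windows in which the status changed.

```python
def flap_score(statuses, oldest_weight=0.8, newest_weight=1.2):
    """Weighted share of status changes; statuses are ordered oldest to newest."""
    slots = len(statuses) - 1  # number of possible transitions
    if slots < 1:
        return 0.0
    weighted_changes = 0.0
    total_weight = 0.0
    for i in range(slots):
        # Weight grows linearly with recency, so older transitions count less.
        fraction = i / (slots - 1) if slots > 1 else 1.0
        weight = oldest_weight + (newest_weight - oldest_weight) * fraction
        total_weight += weight
        if statuses[i] != statuses[i + 1]:
            weighted_changes += weight
    return weighted_changes / total_weight


def is_flapping(statuses, threshold=0.5):
    """Treat the alert as flappy when the score reaches the (assumed) threshold."""
    return flap_score(statuses) >= threshold


steady = ["ERROR"] * 5
alternating = ["ERROR", "RECOVERED"] * 3
old_change = ["ERROR", "RECOVERED", "RECOVERED", "RECOVERED"]
recent_change = ["ERROR", "ERROR", "ERROR", "RECOVERED"]
```

A config judged flappy could have its notifications suppressed until the score drops again, which is the over-notification problem the slide describes.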
Scalability testing
Future state
- Support predictive or proactive operations to avoid entering bad states
- Make the punctuator adaptive to real-time needs
- Detect anomalies in data
Q & A
Other notable contributors over the years
● Minakshi Korad
● Georgiana Ogrean
● Jyotsna Shevade
● Ram Kolla
● Dante Bourret
● Sriram Ramarathnam
● Tom Tobin

Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 

You’re Spiky and We Know It With Ravindra Bhanot | Current 2022

  • 1. © 2019 TWILIO INC. ALL RIGHTS RESERVED. You’re spiky and we know it !! Ravindra Bhanot ©2022 TWILIO INC. ALL RIGHTS RESERVED
  • 2. Key Contributors: Ravi Bhanot (Principal Software Engineer), Thomas D’Silva (Principal Software Engineer), Scott Reynolds (Architect). Evolved the app through its stages to support scale and requirements. Critical in designing a composable and flexible filter design. Guided the key/value design of schemas with scalability and latency considerations.
  • 3. Introduction to Twilio
  • 4. Twilio communication data patterns - spikes and seasonality
      - Spikes at hour boundaries
      - Elevated traffic at some hours of the day
  • 5. Alerting use case / Minimum viable product
      - Alert customers on defined threshold criteria in as close to real time as possible
      - Allow setting alerts over a subset of data in the event stream using field-level filters
      - Allow flexible time rollups
      - Allow statistical operations over rollups: SUM, COUNT, AVERAGE, PERCENTAGE
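A minimal sketch of the rollup requirements above, in plain Python rather than Kafka Streams. Events are bucketed into fixed time windows and the four operations are computed per bucket. All names here (`Bucket`, `rollup`, `evaluate`, the `amount` field) are hypothetical, and PERCENTAGE is assumed to mean the share of records in a window that pass the filter, since the slides do not define it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Bucket:
    total: float = 0.0  # running SUM of the measure field over matching records
    matched: int = 0    # COUNT of records that passed the filter
    seen: int = 0       # all records in the window (assumed PERCENTAGE denominator)


def window_start(ts_ms: int, window_ms: int) -> int:
    """Align a timestamp to the start of its fixed-size window."""
    return ts_ms - (ts_ms % window_ms)


def rollup(events, window_ms, matches):
    """Group (timestamp_ms, record) pairs into windows and aggregate them."""
    buckets = defaultdict(Bucket)
    for ts_ms, record in events:
        bucket = buckets[window_start(ts_ms, window_ms)]
        bucket.seen += 1
        if matches(record):
            bucket.matched += 1
            bucket.total += record["amount"]
    return dict(buckets)


def evaluate(bucket: Bucket, operation: str) -> float:
    """Compute one statistical operation over a finished bucket."""
    if operation == "SUM":
        return bucket.total
    if operation == "COUNT":
        return float(bucket.matched)
    if operation == "AVERAGE":
        return bucket.total / bucket.matched if bucket.matched else 0.0
    if operation == "PERCENTAGE":
        return 100.0 * bucket.matched / bucket.seen if bucket.seen else 0.0
    raise ValueError(f"unsupported operation: {operation}")
```

Changing `window_ms` is what makes the rollup "flexible": the same events can be evaluated over one-minute or five-minute windows without touching the aggregation logic.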
  • 6. Approaches to creating a monitoring solution for customers
      ● Cache/database-based approach with cron/routines to check thresholds
      ● Streams-based approach
  • 7. Chosen way
      ● Kafka Streams due to familiarity
      ● Ease of maintaining state in statestores
      ● Ease of calculating and updating aggregates
  • 8. Evolution of the solution: the journey of how we went from a simple two-stage application (filter and aggregate) to a three-stage scalable solution. (Image credit: Shutterstock)
  • 9. Glossary for forthcoming slides
      ● Input topic - Stream of records on which aggregates will be calculated and thresholds will be checked.
      ● Alert Config - User/customer-defined configuration of thresholds, time constraints and filters.
      ● Account - Notion of a single user of the system whose interactions will cause multiple events on the input topic.
      ● Alert - A single violation or recovery event of an alert config’s threshold criteria.
      ● Sink - Send the stream of records to the Kafka broker rather than just forwarding it to the next stage of the app locally inside a machine.
  • 10. Sample Alert Config/Alert Criteria definition
      {
        "alert_config_sid": "AK10001",
        "account_sid": "AC40c3a0fa4f71f6f0b7cbd895724fb211",
        "recordFilter": {
          "field_name": {"string": "phonenumber"},
          "equality_type": "EQUALS",
          "field_value": {"string": "+1949xxxxxxx"},
          "operator_type": "LEAF",
          "operands": null
        },
        "dataset": "BillingTransactions",
        "measure_field_name": {"string": "amount"},
        "threshold": {
          "alert_level_threshold": "100",
          "comparison": "ABOVE",
          "operation": "SUM",
          "time_period": "FIVE_MINS"
        },
        "flap_detection_method": "WEIGHTED_AVERAGE",
        "notification_preferences": [..],
        "date_created_epoch_milli": 1599071100000,
        "date_updated_epoch_milli": 1599071100000
      }
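As a rough illustration of how the `threshold` block of the config above might be evaluated against an aggregate, here is a sketch in plain Python. The `BELOW` comparison, the strict greater-than semantics for `ABOVE`, and the function names are assumptions; the deck only shows `ABOVE`. The `ERROR` and `RECOVERED` status strings follow the sample alerts shown elsewhere in this deck.

```python
# Trimmed copy of the sample config; only the fields this sketch reads.
CONFIG = {
    "alert_config_sid": "AK10001",
    "account_sid": "AC40c3a0fa4f71f6f0b7cbd895724fb211",
    "dataset": "BillingTransactions",
    "threshold": {
        "alert_level_threshold": "100",
        "comparison": "ABOVE",
        "operation": "SUM",
        "time_period": "FIVE_MINS",
    },
}


def is_violation(metric_value: float, threshold: dict) -> bool:
    """Compare an aggregate against the configured level."""
    level = float(threshold["alert_level_threshold"])  # a string in the sample
    comparison = threshold["comparison"]
    if comparison == "ABOVE":
        return metric_value > level  # strict comparison is an assumption
    if comparison == "BELOW":  # assumed counterpart, not shown in the deck
        return metric_value < level
    raise ValueError(f"unsupported comparison: {comparison}")


def build_alert(config: dict, metric_value: float, now_ms: int) -> dict:
    """Shape an alert record from a config and the current aggregate value."""
    violated = is_violation(metric_value, config["threshold"])
    return {
        "alert_config_sid": config["alert_config_sid"],
        "dataset": config["dataset"],
        "account_sid": config["account_sid"],
        "alert_time_epoch": now_ms,
        "metric_value": f"{metric_value:g}",  # string-valued, as in the samples
        "alert_status": "ERROR" if violated else "RECOVERED",
    }
```

A real implementation would presumably emit `RECOVERED` only when the previous state was `ERROR`; this sketch omits that state tracking to stay short.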
  • 11. Sample Triggered Alert
      ERROR SAMPLE -
      {
        "alert_config_sid": "AK10001",
        "dataset": "BillingTransactions",
        "account_sid": "AC40c3a0fa4f71f6f0b7cbd895724fb211",
        "alert_time_epoch": 1599071100000,
        "metric_value": "200",
        "alert_status": "ERROR"
      }
      RECOVERED SAMPLE -
      {
        "alert_config_sid": "AK10001",
        "dataset": "BillingTransactions",
        "account_sid": "AC40c3a0fa4f71f6f0b7cbd895724fb211",
        "alert_time_epoch": 159907113600,
        "metric_value": "10",
        "alert_status": "RECOVERED"
      }
  • 12. Sample Alert Config with Composable Filters
      {
        . .
        "recordFilter": {
          "field_name": null,
          "equality_type": "EQUALS",
          "field_value": null,
          "operator_type": "OR",
          "operands": {
            "array": [
              {
                "field_name": {"string": "status"},
                "equality_type": "EQUALS",
                "field_value": {"string": "SENT"},
                "operator_type": "LEAF",
                "operands": null
              },
              {
                "field_name": {"string": "status"},
                "equality_type": "EQUALS",
                "field_value": {"string": "DELIVERED"},
                "operator_type": "LEAF",
                "operands": null
              }
            ]
          }
        },
        . .
      }
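The composable filter above is a small expression tree: `LEAF` nodes compare one field, and operator nodes combine their `operands`. A minimal recursive evaluator might look like the sketch below. It is not the talk's implementation; the `AND` operator and the helper names are assumptions (the deck only shows `OR`, `LEAF` and `EQUALS`), and the single-key wrappers such as `{"string": ...}` and `{"array": ...}` are simply unwrapped as they appear in the sample.

```python
def unwrap(value):
    """Strip the single-key wrapper ({"string": ...} or {"array": ...}) seen in the sample."""
    if isinstance(value, dict) and len(value) == 1:
        return next(iter(value.values()))
    return value


def matches(node: dict, record: dict) -> bool:
    """Recursively evaluate a recordFilter node against one input record."""
    operator = node["operator_type"]
    if operator == "LEAF":
        field = unwrap(node["field_name"])
        expected = unwrap(node["field_value"])
        if node["equality_type"] == "EQUALS":
            return record.get(field) == expected
        raise ValueError(f"unsupported equality_type: {node['equality_type']}")
    children = unwrap(node["operands"]) or []
    if operator == "OR":
        return any(matches(child, record) for child in children)
    if operator == "AND":  # assumed operator, not shown in the deck
        return all(matches(child, record) for child in children)
    raise ValueError(f"unsupported operator_type: {operator}")


# The filter from the slide: status is SENT or DELIVERED.
SENT_OR_DELIVERED = {
    "field_name": None,
    "equality_type": "EQUALS",
    "field_value": None,
    "operator_type": "OR",
    "operands": {"array": [
        {"field_name": {"string": "status"}, "equality_type": "EQUALS",
         "field_value": {"string": "SENT"}, "operator_type": "LEAF", "operands": None},
        {"field_name": {"string": "status"}, "equality_type": "EQUALS",
         "field_value": {"string": "DELIVERED"}, "operator_type": "LEAF", "operands": None},
    ]},
}
```

Because operator nodes can themselves appear as operands, filters of arbitrary depth compose without any change to the evaluator.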
  • 13. Australopithecus state - Filter, Aggregate & Inspect
  • 14. Filter, Aggregate & Inspect - Key design considerations
      - Alert configs topic read into a global statestore
      - All alert configs are available on all machines in the app cluster
      - Key of the output topic of all stages to include account_id and alert_config_id for contextual reference when processing
      - Sink output topic of the first stage (Filter) to the broker
      - Records get distributed across the machines in the app cluster as source to the second stage (Aggregator)
      - Third stage (Inspect) consumes the output topic of the second stage (Aggregator) local to each machine
  • 15. Filter, Aggregate & Inspect - Downsides
      - Too many records in the first stage (Filter) for one account cause too many inspections in the third stage (Inspect), which delays inspection for other accounts when their records show up
      - Aggregates for the present minute are still being populated while being inspected, which can cause false alerts to trigger
      (Image credit: Adobe)
  • 16. Filter, Aggregate & Inspect - Crucial Learnings
      - Distribution of records amongst partitions: sink to the broker if trying a repartition. Forward to the next stage if the processing metadata is locally present.
      - Statestore/RocksDB tuning
          a) Use an LRU cache with a BloomFilter
          b) Use kHashSearch instead of the default binary search in case of frequent lookups
      Blog - https://www.twilio.com/blog/kafka-streams-near-real-time
  • 18. Neanderthal state - Filter, Aggregate & Punctuate
  • 19. Filter, Aggregate & Punctuate - Design considerations
      - Punctuator to inspect state of local aggregates: punctuate every 60 seconds based on wall-clock time so that only aggregates local to a node are inspected. Aggregates are not read immediately, to allow record state to finalize.
      - Aggregation pauses when the punctuator runs: punctuation should be quick and, because current-minute aggregates are not read during punctuation, it is okay to pause processing of a stage to free up cores for the punctuator.
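The punctuation idea above can be sketched in plain Python as follows: aggregates are kept per minute, and each punctuation pass inspects only minutes that have already closed, skipping the one still being populated. This is an illustration under assumptions, not the talk's code; the class and method names are hypothetical. In Kafka Streams the callback would typically be registered through `ProcessorContext.schedule` with `PunctuationType.WALL_CLOCK_TIME`, whereas here the caller passes the wall-clock time in explicitly.

```python
MINUTE_MS = 60_000


class LocalAggregates:
    """Per-minute sums held on one node, keyed by account and alert config."""

    def __init__(self):
        # (account_sid, alert_config_sid, minute_start_ms) -> running sum
        self.sums = {}
        self.inspected = set()

    def add(self, account_sid: str, alert_config_sid: str, ts_ms: int, amount: float) -> None:
        """Fold one filtered record into its minute bucket."""
        minute_start = ts_ms - (ts_ms % MINUTE_MS)
        key = (account_sid, alert_config_sid, minute_start)
        self.sums[key] = self.sums.get(key, 0.0) + amount

    def punctuate(self, now_ms: int):
        """Return closed, not-yet-inspected buckets; the current minute is skipped."""
        current_minute = now_ms - (now_ms % MINUTE_MS)
        ready = []
        for key, total in sorted(self.sums.items()):
            if key[2] < current_minute and key not in self.inspected:
                self.inspected.add(key)
                ready.append((key, total))
        return ready
```

Skipping the current minute is what addresses the false-alert downside of the earlier design: a bucket is only compared against its threshold once no more records are expected to land in it on this node.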
  • 20. Filter, Aggregate & Punctuate - Downsides
      - Limit the number of alert configs per account, as the time taken scales with the number of alert configs. Performance test to determine the limits on the number of alert configs per account for a single Twilio use case.
      - Flappy/toggling alerts: alerts can toggle between ERROR and RECOVERED states, causing a lot of notification spam for the end customer.
  • 21. Homo Sapiens state - Filter, Aggregate/Punctuate & Conduct
  • 22. Filter, Aggregate/Punctuate & Conduct - Design features
      - Decaying weighted average to avoid over-notifying customers: based on the history of the last X time windows, assign decreasing weights based on the time of state transitions (example: ERROR -> RECOVERED). Use this cumulative score to decide whether the alert is flappy or not.
      (Image credit: Nagios)
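One way to read the decaying weighted average above is sketched below: each consecutive pair of window states is either a transition or not, recent pairs weigh more than old ones, and the weighted share of transitions is the flap score. The exponential decay, the `decay=0.8` factor and the `0.5` threshold are illustrative assumptions, not values from the talk.

```python
def flap_score(states, decay: float = 0.8) -> float:
    """Weighted share of state transitions in [0, 1]; states run oldest to newest."""
    transitions = [prev != cur for prev, cur in zip(states, states[1:])]
    if not transitions:
        return 0.0
    count = len(transitions)
    # The newest transition slot gets weight 1.0; each older slot decays further.
    weights = [decay ** (count - 1 - i) for i in range(count)]
    flapped = sum(w for w, changed in zip(weights, transitions) if changed)
    return flapped / sum(weights)


def is_flappy(states, threshold: float = 0.5, decay: float = 0.8) -> bool:
    """Decide whether notifications for this alert config should be suppressed."""
    return flap_score(states, decay) >= threshold
```

With this scoring, an alert that alternated between ERROR and RECOVERED in every recent window scores 1.0 and is treated as flappy, while a single old transition followed by a stable state scores low and still notifies the customer.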
  • 23. Scalability testing
  • 24. Future state
      - Support predictive or proactive operations to avoid entering bad states
      - Make the punctuator adaptive to real-time needs
      - Detect anomalies in data
  • 25. Q & A. Other notable contributors over the years: Minakshi Korad, Georgiana Ogrean, Jyotsna Shevade, Ram Kolla, Dante Bourret, Sriram Ramarathnam, Tom Tobin