Why am I doing this???
Anne-Marie Tousch
Senior Data Scientist, Datadog
PyLadies Meetup
November 16th, 2023
❏ To share my pain
❏ To show off my knowledge
❏ To explain away why I'm failing so much
❏ Why did I sign up for this talk?
❏ To make you ask the same question
Why am I doing this?
Why am I doing this???
Or why data science is harder than you think
Anne-Marie Tousch
Senior Data Scientist, Datadog
PyLadies Meetup
November 16th, 2023
Quick bio
● 2006-2010: computer vision (PhD)
● 2010-2014: computer vision (startup)
● 2014-2020: ML (RecSys, …)
● 2020-?: AIOps
(Timeline: roles shifting from more machine learning toward more software engineering)
● We run on millions of hosts
● We collect tens of trillions of events per day
Visit datadoghq.com for more information
Datadog Watchdog™
https://docs.datadoghq.com/watchdog/
Anomaly monitors
https://docs.datadoghq.com/monitors/types/anomaly/#overview
The challenge of Anomaly Detection
The challenge of anomaly detection
Is this an anomaly?
Ghosh, Supriyo, et al. "How to fight production incidents? an empirical study on a large-scale cloud
service." Proceedings of the 13th Symposium on Cloud Computing. 2022.
"How incidents are detected? … we
observe that about 55% of the incidents
were detected by the automated
watchdogs."
The challenge of anomaly detection
Is this an anomaly?
Should I page someone?
Anomaly detection for cloud systems
● Account for the severity of the anomaly
● Low time to detection
● Low false detection rates
● Explainability matters
Understand the context of the product
Why am I building this algorithm?
(Charts: hits/second, errors/hits)
The challenge of Time Series
The challenge of Time Series
Hewamalage, Hansika, Klaus Ackermann, and Christoph Bergmeir. "Forecast evaluation for data scientists:
common pitfalls and best practices." Data Mining and Knowledge Discovery 37.2 (2023): 788-832.
The challenge of Time Series
"we regularly come across papers in top
Artificial Intelligence (AI)/ML conferences
and journals (even winning best paper
awards) that use inadequate and misleading
benchmark methods for comparison"
Hewamalage, Hansika, Klaus Ackermann, and Christoph Bergmeir. "Forecast evaluation for data scientists:
common pitfalls and best practices." Data Mining and Knowledge Discovery 37.2 (2023): 788-832.
The challenge of Time Series
MAE: mean absolute error
MSE: mean squared error
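Both metrics fit in a few lines, and a toy series with a single spike shows why they can rank the same forecasts differently: MAE penalizes the spike linearly, MSE quadratically.

```python
def mae(y_true, y_pred):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error: average of (actual - predicted) squared."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [10, 12, 11, 50]   # one large spike at the end
y_pred = [10, 11, 12, 12]
print(mae(y_true, y_pred))  # 10.0  -- spike contributes linearly
print(mse(y_true, y_pred))  # 361.5 -- spike dominates quadratically
```

Which penalty is "right" depends on the product: paging someone over a missed spike is very different from slightly misdrawing a forecast band.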
Schmidl, Sebastian, Phillip Wenig, and Thorsten Papenbrock. "Anomaly detection in time series: a
comprehensive evaluation." Proceedings of the VLDB Endowment 15.9 (2022): 1779-1797.
This comprehensive, scientific study
carefully evaluates most
state-of-the-art anomaly detection
algorithms. We collected and
re-implemented 71 anomaly detection
algorithms from different domains and
evaluated them on 976 time series
datasets.
Our experimental results on the
different datasets show that, overall,
every anomaly detection family can be
effective and there is no clear winner.
Choosing the right algorithm for the context
● What do your time series look like?
○ Domain knowledge
● Are you evaluating correctly?
○ Do you have relevant benchmarks?
○ Do you have a strong "simple" baseline?
○ Do you have relevant evaluation metrics?
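A strong "simple" baseline for forecasting can be as small as the naive and seasonal-naive forecasts below. This is a generic sketch, not code from the talk; if a sophisticated model cannot beat these, the evaluation should say so.

```python
def naive_forecast(series, horizon=1):
    """Naive baseline: predict that future values equal the last observed value."""
    return [series[-1]] * horizon

def seasonal_naive_forecast(series, season=24, horizon=1):
    """Seasonal naive: repeat the value from one season ago
    (e.g. season=24 for hourly data with a daily cycle)."""
    return [series[-season + (h % season)] for h in range(horizon)]

hourly = list(range(48))  # toy hourly series: two "days" of data
print(naive_forecast(hourly, 3))                # [47, 47, 47]
print(seasonal_naive_forecast(hourly, 24, 3))   # [24, 25, 26]
```

Comparing a new model against these two, on relevant metrics, is exactly the kind of baseline the Hewamalage et al. paper finds missing in published work.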
The challenge of anomaly detection
Is this an anomaly?
Is this unlike other events in the same context?
The challenge of Data Science in general
Classical software
Use algorithms to process data.
Classical Software
(Diagram: metric → smooth → threshold → anomaly detection)
Strong contracts
Machine Learning: so what's different?
The function is generated from the data
Machine Learning
Weak contracts
Different kinds of contracts

Strong contracts: function definition is clear
- Rules / mathematics
- Unit tests
- Explainable

Weak contracts: function definition is data-dependent
- Examples
- Statistical accuracy
- Uncertain outcome
Different kinds of contracts
Strong: "An anomaly is whenever latency goes above a given threshold"
Weak: "An anomaly is an event unlike others in the same context"
(ideas from "Two big challenges in machine learning", keynote by Léon Bottou, ICML 2015)
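The difference shows up in how each contract is tested: a rule admits exact unit tests, while a learned detector can only be held to a statistical accuracy target. Everything below is a toy illustration; the functions and numbers are invented for this sketch.

```python
# Strong contract: a rule with a clear definition -- unit-testable exactly.
def is_anomaly_rule(latency_ms, threshold_ms=500):
    return latency_ms > threshold_ms

assert is_anomaly_rule(600) is True    # exact, per-input guarantee
assert is_anomaly_rule(400) is False

# Weak contract: a toy stand-in for a trained model (k-sigma around
# "learned" statistics) -- we can only hold it to an accuracy target.
def is_anomaly_model(x, mean=10.0, std=2.0, k=3.0):
    return abs(x - mean) > k * std

labeled = [(10, False), (11, False), (30, True), (9, False), (17, True)]
accuracy = sum(is_anomaly_model(x) == y for x, y in labeled) / len(labeled)
assert accuracy >= 0.8   # statistical contract, not a per-example one
```

The weak contract's test can pass today and fail after retraining or a data shift, which is precisely the uncertainty the slide is pointing at.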
Should I use machine learning?
● Can you describe the problem with simple rules?
● Do you have data?
● Do you need 100% accuracy?
○ Can you have 100% accuracy, realistically?
● Do you need 100% explainability?
○ E.g. regulations/law
So, why am I doing this?
Takeaways
Data science is harder than you think
● Understand the product
○ What kind of contract fits better?
● Evaluate rigorously
○ Why is this algorithm better than any other?
● Adapt to the context
○ Why am I doing this?
Thanks! Questions?
annemarie@datadoghq.com
