Statistical Learning Based Anomaly Detection @ Twitter
Arun Kejariwal (@arun_kejariwal)
Joint work with Jordan Hochenbaum and Owen Vallis
November 2014

Internet Trends
• Real-time [1]
[1] http://techcrunch.com/2014/05/05/amazon-extends-its-shopping-cart-to-twitter/

Twitter: Global Town Square

Data Fidelity
• Data-driven decision making
  ◦ Evolving product landscape
• Data partners
  ◦ Nielsen
  ◦ Dataminr
• Operational
  ◦ Performance and Availability

Data Fidelity: Challenges
• Anomalies
  ◦ Exogenous factors
    ▪ User behavior
    ▪ Events
    ▪ Data center
  ◦ Endogenous factors
    ▪ Agile development
      - Fail fast
    ▪ Data collection
• Millions of time series [1,2]
  ◦ Scalability
[1] http://strata.oreilly.com/2013/09/how-twitter-monitors-millions-of-time-series.html
[2] http://strataconf.com/strata2014/public/schedule/detail/32431

Anomaly Detection: Why Bother?
• Analyze User Engagement
  ◦ Events
    ▪ Super Bowl, Japanese New Year
  ◦ Year-over-year analysis (input to forecasting)
• Identify Attacks
  ◦ DoS
  ◦ Malware attacks
• Identify Bots
  ◦ Separating actual users from spam

Anomaly Detection
• Visual
  ◦ Prone to errors
  ◦ Not scalable
    ▪ Machine-generated data: from 11% of the digital universe in 2005 to > 40% by 2020 [1]
    ▪ Cloud infrastructure: ~50% CAGR over 2013-2017 [2]
• Algorithmic approach
  ◦ Automate!
[1] http://www.emc.com/about/news/press/2012/20121211-01.htm
[2] http://www.forbes.com/sites/gilpress/2013/12/12/16-1-billion-big-data-market-2014-predictions-from-idc-and-iia/

Anomaly Detection: Background
• Over 50 years of research [1]
  ◦ Statistics
    ▪ Extreme Value Theory
    ▪ Robust Statistics, Grubbs' test, ESD
  ◦ Econometrics
  ◦ Finance
    ▪ Value at Risk (VaR)
  ◦ Signal Processing
  ◦ Music Information Retrieval
  ◦ Networking
  ◦ E-Commerce
  ◦ Performance Regression
[1] "Anomaly Detection: A Survey" by Chandola et al. ACM Computing Surveys, 2009.
[Photos: Jon from Etsy; Toufic from Metafor]

Anomaly Detection: Overview
• Definition
  ◦ "An anomaly is an observation that deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism" [1,2]
[1] "Identification of Outliers" by Douglas M. Hawkins. London: Chapman and Hall, 1980.
[2] "Outlier Analysis" by Charu C. Aggarwal. Springer, 2013.

Anomaly Detection
• Characterization
  ◦ Magnitude
  ◦ Width
  ◦ Frequency
  ◦ Direction

Anomaly Detection (contd.)
• Two flavors
  ◦ Global
    ▪ Max value
  ◦ Local
    ▪ Intra-day
[Figure: examples of global and local anomalies]

Anomaly Detection (contd.)
• Traditional Approaches
  ◦ Metrics
    ▪ Mean μ
    ▪ Standard deviation σ
  ◦ Rule of thumb
    ▪ μ + 3σ
  ◦ Which time series?
    ▪ Raw
    ▪ Moving Averages
      - SMA, EWMA, PEWMA
[Figure: time series annotated with a 3σ threshold]

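The rule of thumb above can be applied either to the raw series or to deviations from a moving average. The sketch below shows both, using only the standard library; the function names, the EWMA smoothing factor `alpha=0.3`, and the sample series are illustrative assumptions, not Twitter's production settings.

```python
import statistics

def three_sigma_anomalies(xs, k=3.0):
    """Indices i where |x_i - mu| > k * sigma (population mean and standard deviation)."""
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)
    return [i for i, x in enumerate(xs) if abs(x - mu) > k * sigma]

def ewma(xs, alpha=0.3):
    """EWMA: s_0 = x_0, s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    out = [float(xs[0])]
    for x in xs[1:]:
        out.append(alpha * x + (1 - alpha) * out[-1])
    return out

def ewma_anomalies(xs, alpha=0.3, k=3.0):
    """Apply the k-sigma rule to one-step deviations x_t - EWMA_{t-1}, for t >= 1."""
    smooth = ewma(xs, alpha)
    deviations = [x - s for x, s in zip(xs[1:], smooth[:-1])]
    # Shift by one so the indices refer back to positions in xs.
    return [i + 1 for i in three_sigma_anomalies(deviations, k)]

# Hypothetical metric with a single spike at index 15.
series = [10, 11, 9, 10, 12, 10, 11, 9, 10, 11, 10, 12, 9, 10, 11, 50, 10, 11, 9, 10]
print(three_sigma_anomalies(series))  # rule applied to the raw series
print(ewma_anomalies(series))         # rule applied to deviations from the EWMA
```

One design caveat worth noting: after a spike the EWMA is pulled upward, so the next few deviations turn negative. On noisier data those echoes can themselves cross the threshold.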
Anomaly Detection (contd.)
• Impact of multi-modal distribution
  ◦ μ shifts by ~0.2%
  ◦ Inflates σ by 4.5%
    ▪ Misses quite a few anomalies
  ◦ What do the multiple modes correspond to?
    ▪ Seasonality

Anomaly Detection (contd.)
• Robust Statistics
  ◦ MAD (median absolute deviation): MAD = median(|X_i − median(X)|)
    ▪ Robust: breakdown point
      - Median 50% vs. mean 0%
  ◦ σ_MAD = K · MAD
    ▪ K = 1.4826 for normally distributed data

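A minimal standard-library sketch of the robust statistics above: MAD, the scaled estimate σ_MAD = K · MAD, and robust z-scores that measure distance from the median in units of σ_MAD. The function names, the |z| > 3 cut-off, and the sample data are illustrative assumptions.

```python
import statistics

K = 1.4826  # makes K * MAD a consistent estimator of sigma for normally distributed data

def mad(xs):
    """Median absolute deviation: median(|x_i - median(x)|)."""
    med = statistics.median(xs)
    return statistics.median([abs(x - med) for x in xs])

def sigma_mad(xs):
    """Robust standard-deviation estimate K * MAD."""
    return K * mad(xs)

def robust_z_scores(xs):
    """(x_i - median) / sigma_MAD for every point."""
    med = statistics.median(xs)
    s = sigma_mad(xs)
    if s == 0:
        raise ValueError("MAD is zero (e.g. most points identical); robust z-scores are undefined")
    return [(x - med) / s for x in xs]

# Hypothetical data: one corrupted reading drags the mean far away, but not the median.
data = [10, 11, 9, 10, 12, 10, 11, 9, 10, 1000]
print(statistics.fmean(data), statistics.median(data))
print([i for i, z in enumerate(robust_z_scores(data)) if abs(z) > 3])
```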
Anomaly Detection (contd.)
• Limitations of using MAD

Anomaly Detection (contd.)
• Grubbs' test
  ◦ Critical value is derived from the data at a chosen significance level (α)
• Limitations
  ◦ Assumes the data are normally distributed
  ◦ Good for detecting ONLY one outlier
  ◦ Seasonality unaware

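A minimal sketch of the two-sided Grubbs' test, assuming the textbook form: G = max|x_i − x̄| / s is compared with G_crit = ((N − 1)/√N) · √(t² / (N − 2 + t²)), where t is the upper α/(2N) quantile of Student's t with N − 2 degrees of freedom. To stay dependency-free, the t quantile (`t_ppf`, a hypothetical helper) is computed numerically with Simpson's rule plus bisection; the function names and sample data are illustrative.

```python
import math
import statistics

def t_ppf(p, df, steps=1000):
    """Upper quantile (p > 0.5) of Student's t: Simpson's-rule CDF plus bisection."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    def cdf(x):  # x > 0; relies on the density being symmetric about 0
        h = x / steps
        s = pdf(0.0) + pdf(x) + sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
        return 0.5 + s * h / 3

    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def grubbs_test(xs, alpha=0.05):
    """Two-sided Grubbs' test for ONE outlier; returns (index or None, G, G_crit)."""
    n = len(xs)
    if n < 3:
        raise ValueError("Grubbs' test needs at least 3 points")
    mean = statistics.fmean(xs)
    sd = statistics.stdev(xs)
    if sd == 0:
        raise ValueError("zero variance; the test statistic is undefined")
    # Test statistic: the largest absolute deviation from the mean, in units of s.
    idx = max(range(n), key=lambda i: abs(xs[i] - mean))
    g = abs(xs[idx] - mean) / sd
    # Critical value from the t distribution at significance level alpha.
    t = t_ppf(1 - alpha / (2 * n), n - 2)
    g_crit = (n - 1) / math.sqrt(n) * math.sqrt(t * t / (n - 2 + t * t))
    return (idx if g > g_crit else None), g, g_crit

with_outlier = [10, 11, 9, 10, 12, 10, 11, 9, 10, 30]  # hypothetical
clean = [10, 11, 9, 10, 12, 10, 11, 9, 10, 11]         # hypothetical
print(grubbs_test(with_outlier))
print(grubbs_test(clean))
```

With SciPy available, `scipy.stats.t.ppf(1 - alpha / (2 * n), n - 2)` can replace the hand-rolled quantile. Note that the test assumes normality and reports at most one outlier, matching the limitations listed above.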
Anomaly Detection (contd.)
• ESD (Generalized Extreme Studentized Deviate) [1]
  ◦ Critical value (λi) recalculated at every iteration
  ◦ The largest i such that Ri > λi determines the number of anomalies
  ◦ An upper bound on the number of anomalies is an input parameter
• Limitations
  ◦ Generalized ESD assumes a "normal" distribution
  ◦ Seasonality unaware
[1] Rosner, Bernard. "Percentage Points for a Generalized ESD Many-Outlier Procedure." Technometrics 25, no. 2 (1983): 165–172.

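A minimal sketch of generalized ESD as commonly stated from Rosner (1983): at step i, R_i = max|x_j − x̄| / s over the points still present, λ_i = (n − i) · t / √((n − i − 1 + t²)(n − i + 1)) with t the p = 1 − α/(2(n − i + 1)) quantile of Student's t on n − i − 1 degrees of freedom, and the most extreme point is then removed. The `t_ppf` helper is a hypothetical stdlib-only stand-in for a library quantile function; names and data are illustrative.

```python
import math
import statistics

def t_ppf(p, df, steps=1000):
    """Upper quantile (p > 0.5) of Student's t: Simpson's-rule CDF plus bisection."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    def cdf(x):  # x > 0; relies on the density being symmetric about 0
        h = x / steps
        s = pdf(0.0) + pdf(x) + sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
        return 0.5 + s * h / 3

    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def generalized_esd(xs, max_anomalies, alpha=0.05):
    """Return sorted indices of anomalies; max_anomalies is the user-supplied upper bound."""
    n = len(xs)
    if not 1 <= max_anomalies <= n - 2:
        raise ValueError("need 1 <= max_anomalies <= n - 2")
    remaining = list(enumerate(xs))  # (original index, value) pairs
    removed, num_anomalies = [], 0
    for i in range(1, max_anomalies + 1):
        values = [v for _, v in remaining]
        mean, sd = statistics.fmean(values), statistics.stdev(values)
        if sd == 0:
            break
        pos = max(range(len(remaining)), key=lambda j: abs(remaining[j][1] - mean))
        r_i = abs(remaining[pos][1] - mean) / sd
        # lambda_i is recomputed at every iteration.
        p = 1 - alpha / (2 * (n - i + 1))
        t = t_ppf(p, n - i - 1)
        lam_i = (n - i) * t / math.sqrt((n - i - 1 + t * t) * (n - i + 1))
        removed.append(remaining.pop(pos)[0])
        if r_i > lam_i:
            num_anomalies = i  # the LARGEST such i decides the count
    return sorted(removed[:num_anomalies])

data = [10, 11, 9, 10, 12, 10, 11, 9, 10, 11, 40, -20]  # hypothetical
print(generalized_esd(data, max_anomalies=3))
```

Because the count is the largest i with R_i > λ_i, a step that fails early (for instance because two opposite outliers inflate s) does not stop later steps from counting. This is how generalized ESD reduces masking.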
Our Approach

Anomaly Detection (contd.)
• Addressing Seasonality
  ◦ Key Idea
    ▪ Time Series Decomposition

Anomaly Detection (contd.)
• Determining the seasonal component
  ◦ Regression on sub-cycle plots [1]
[1] "STL: A seasonal-trend decomposition procedure based on loess" by Cleveland et al. Journal of Official Statistics, Vol. 6, Issue 1, 1990.

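The sub-cycle idea can be illustrated by splitting a series into its cycle-subseries (all points sharing the same phase within the period). The sketch below then summarizes each subseries by its median. This is a deliberately crude stand-in: STL itself smooths each subseries with loess and iterates with a trend fit. For real use, `statsmodels.tsa.seasonal.STL` provides a full implementation. Function names and data here are hypothetical.

```python
import statistics

def cycle_subseries(xs, period):
    """Split xs into `period` subseries; subseries p holds xs[p], xs[p + period], ..."""
    return [xs[phase::period] for phase in range(period)]

def seasonal_component(xs, period):
    """Crude seasonal estimate (NOT STL): each subseries' median, centred to average zero."""
    levels = [statistics.median(sub) for sub in cycle_subseries(xs, period)]
    centre = statistics.fmean(levels)
    levels = [lv - centre for lv in levels]
    return [levels[t % period] for t in range(len(xs))]

# Hypothetical period-3 series with one spike (30) in the last cycle.
xs = [1, 5, 9, 1, 5, 9, 1, 5, 9, 1, 5, 30]
print(cycle_subseries(xs, 3))
print(seasonal_component(xs, 3))
```

Using the median per subseries keeps the single spike from leaking into the seasonal estimate.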
Anomaly Detection (contd.)
• Impact of removing the seasonal and trend components
  ◦ Transforms our multi-modal data into unimodal data
    ▪ Amenable to ESD/MAD!
[Figure: The decomposed residual becomes "unimodal", which significantly shrinks the value of sigma. The original "multi-modal" raw data has a much wider sigma, leading ESD to miss many of the outliers.]

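To see why removing seasonality shrinks σ, the sketch below generates a hypothetical hourly metric with a strong daily cycle, injects a modest spike, and compares z-scores before and after subtracting each point's cycle-subseries median (a crude stand-in for STL's seasonal component). The data generator, spike size, and seed are assumptions for illustration only. They do not reproduce the figures on the slide.

```python
import math
import random
import statistics

PERIOD = 24  # hypothetical: hourly samples with a daily cycle

def synthetic_series(days=30, amplitude=50.0, noise=2.0, seed=7):
    """Hypothetical seasonal metric: level 100 + daily sinusoid + Gaussian noise."""
    rng = random.Random(seed)
    return [100 + amplitude * math.sin(2 * math.pi * t / PERIOD) + rng.gauss(0, noise)
            for t in range(days * PERIOD)]

def deseasonalize(xs, period=PERIOD):
    """Subtract each point's cycle-subseries median (crude stand-in for STL)."""
    medians = [statistics.median(xs[p::period]) for p in range(period)]
    return [x - medians[t % period] for t, x in enumerate(xs)]

def z_score(xs, i):
    """Classic z-score of point i using the sample mean and standard deviation."""
    return (xs[i] - statistics.fmean(xs)) / statistics.stdev(xs)

raw = synthetic_series()
raw[100] += 15.0  # inject a hypothetical +15 spike
resid = deseasonalize(raw)
z_raw, z_resid = z_score(raw, 100), z_score(resid, 100)

print(f"sigma(raw) = {statistics.stdev(raw):.1f}, sigma(residual) = {statistics.stdev(resid):.1f}")
print(f"z-score of the spike: raw {z_raw:.1f}, residual {z_resid:.1f}")
```

On the raw series the seasonal swing dominates σ, so the spike sits well inside 3σ. On the residual, σ is close to the noise level and the same spike stands out.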
Anomaly Detection (contd.)
• Challenges remain!
[Figure: Trend smoothing distortion creates "phantom" anomalies]

Anomaly Detection (contd.)
• Marrying Robust Statistics with Seasonal Decomposition
[Figure: The median is free from distortion]

Anomaly Detection (contd.)
• Applying ESD on the residual
[Figure: Decomposition exposes anomalies]

Anomaly Detection (contd.)
• Recap
  ◦ Extract the seasonal component using STL
    ▪ Filters out periodic spikes
  ◦ Residual = Raw − Seasonal_raw − Median_raw
  ◦ Run ESD on the residual (using the median and MAD)

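The recap can be sketched end to end under stated assumptions. The seasonal component is approximated by per-phase medians rather than STL. The residual is Raw − Seasonal_raw − Median_raw. Generalized ESD then runs on that residual with the median and K · MAD (K = 1.4826) in place of the mean and standard deviation. All function names, the injected anomalies, and `max_anomalies=5` are hypothetical, and this is not the authors' implementation.

```python
import math
import random
import statistics

def t_ppf(p, df, steps=1000):
    """Upper quantile (p > 0.5) of Student's t: Simpson's-rule CDF plus bisection."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    def cdf(x):  # x > 0; relies on the density being symmetric about 0
        h = x / steps
        s = pdf(0.0) + pdf(x) + sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
        return 0.5 + s * h / 3

    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def seasonal_component(xs, period):
    """Stand-in for STL: each point gets its cycle-subseries median, centred on zero."""
    levels = [statistics.median(xs[p::period]) for p in range(period)]
    centre = statistics.median(levels)
    return [levels[t % period] - centre for t in range(len(xs))]

def residual(xs, period):
    """Residual = Raw - Seasonal_raw - Median_raw (the median replaces a smoothed trend)."""
    seasonal = seasonal_component(xs, period)
    med = statistics.median(xs)
    return [x - s - med for x, s in zip(xs, seasonal)]

def robust_esd(xs, max_anomalies, alpha=0.05):
    """Generalized ESD with median and K * MAD in place of mean and standard deviation."""
    n = len(xs)
    if not 1 <= max_anomalies <= n - 2:
        raise ValueError("need 1 <= max_anomalies <= n - 2")
    remaining = list(enumerate(xs))  # (original index, value) pairs
    removed, num_anomalies = [], 0
    for i in range(1, max_anomalies + 1):
        values = [v for _, v in remaining]
        med = statistics.median(values)
        mad = statistics.median([abs(v - med) for v in values])
        if mad == 0:
            break  # robust spread is zero; the statistic is undefined
        pos = max(range(len(remaining)), key=lambda j: abs(remaining[j][1] - med))
        r_i = abs(remaining[pos][1] - med) / (1.4826 * mad)
        p = 1 - alpha / (2 * (n - i + 1))
        t = t_ppf(p, n - i - 1)
        lam_i = (n - i) * t / math.sqrt((n - i - 1 + t * t) * (n - i + 1))
        removed.append(remaining.pop(pos)[0])
        if r_i > lam_i:
            num_anomalies = i  # the largest i with R_i > lambda_i wins
    return sorted(removed[:num_anomalies])

def seasonal_robust_esd(xs, period, max_anomalies, alpha=0.05):
    return robust_esd(residual(xs, period), max_anomalies, alpha)

rng = random.Random(42)
period = 24  # hypothetical hourly data with a daily cycle
series = [100 + 50 * math.sin(2 * math.pi * t / period) + rng.gauss(0, 2)
          for t in range(10 * period)]
series[50] += 20.0   # injected anomalies (hypothetical)
series[130] -= 20.0
found = seasonal_robust_esd(series, period, max_anomalies=5)
print(found)
```

On this synthetic series the two injected points should be reported. Applied to the raw series instead, the ±50 daily swing would dominate the spread and hide them.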
Anomaly Detection (contd.)
• Illustrative example

Anomaly Detection (contd.)
• Applications
  ◦ Three perspectives
    ▪ Capacity
      - CPU utilization
      - Garbage collection
      - Network activity
    ▪ User behavior
      - Events
        · Impressions
        · Link clicks
      - Spam
    ▪ Forecasting

Anomaly Detection (contd.)
• Deployed in production
  ◦ Used by a large number of services at Twitter
  ◦ Automatic e-mail notification
    ▪ Only sent if anomalies are present
    ▪ Anomalies annotated
    ▪ CSV with anomaly locations attached

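The notification rules above (send only when anomalies exist, attach a CSV of their locations) can be sketched with Python's standard `email` and `csv` modules. The function name, subject line, CSV columns, and addresses are hypothetical, and actually sending the message (for example via `smtplib`) is left out.

```python
import csv
import io
from email.message import EmailMessage

def build_anomaly_email(series_name, anomalies, sender, recipient):
    """Return an EmailMessage with a CSV of anomaly locations, or None if there are none.

    `anomalies` is a list of (timestamp, value) pairs; all names here are hypothetical.
    """
    if not anomalies:
        return None  # no anomalies, no notification
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["timestamp", "value"])
    writer.writerows(anomalies)

    msg = EmailMessage()
    msg["Subject"] = f"[anomaly] {len(anomalies)} anomalies in {series_name}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"{len(anomalies)} anomalies detected in {series_name}; locations attached.")
    # A str payload with subtype="csv" becomes a text/csv attachment.
    msg.add_attachment(buf.getvalue(), subtype="csv", filename=f"{series_name}_anomalies.csv")
    return msg

msg = build_anomaly_email("cpu", [("2014-11-01T00:00", 42.0)], "a@example.com", "b@example.com")
print(msg["Subject"])
```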
Open Sourcing
• Skyline from Etsy
  ◦ https://github.com/etsy/skyline/blob/master/src/analyzer/algorithms.py
• Coming soon!
  ◦ R package

Join the Flock
• Like problem solving?
• Like challenges?
• Be at the cutting edge
• Make an impact
• We are hiring!!
  ◦ https://twitter.com/JoinTheFlock
  ◦ https://twitter.com/jobs
  ◦ Contact us: @arun_kejariwal

My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 

Recently uploaded (20)

#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 

Statistical Learning Based Anomaly Detection @ Twitter

  • 1. Statistical Learning Based Anomaly Detection @ Twitter
    Arun Kejariwal (@arun_kejariwal)
    Joint work with Jordan Hochenbaum and Owen Vallis
    November 2014
  • 2. Internet trends
    • Real-time [1]
    [1] http://techcrunch.com/2014/05/05/amazon-extends-its-shopping-cart-to-twitter/
    AK 2
  • 3. Twitter: Global Town Square
  • 4. Data Fidelity
    • Data-driven decision making
      – Evolving product landscape
    • Data partners
      – Nielsen
      – Dataminr
    • Operational
      – Performance and availability
  • 5. Data Fidelity: Challenges
    • Anomalies
      – Exogenic factors
        · User behavior
        · Events
        · Data center
      – Endogenic factors
        · Agile development ("fail fast")
        · Data collection
    • Millions of time series [1, 2]
      – Scalability
    [1] http://strata.oreilly.com/2013/09/how-twitter-monitors-millions-of-time-series.html
    [2] http://strataconf.com/strata2014/public/schedule/detail/32431
  • 6. Anomaly Detection: Why Bother?
    • Analyze user engagement
      – Events (e.g., Super Bowl, Japanese New Year)
      – Year-over-year analysis (input to forecasting)
    • Identify attacks
      – DoS
      – Malware attacks
    • Identify bots
      – Separate actual users from spam
  • 7. Anomaly Detection
    • Visual inspection
      – Prone to errors
      – Not scalable
        · Machine-generated data: from 11% of the digital universe in 2005 to more than 40% by 2020 [1]
        · Cloud infrastructure: ~50% CAGR over 2013-2017 [2]
    • Algorithmic approach
      – Automate!
    [1] http://www.emc.com/about/news/press/2012/20121211-01.htm
    [2] http://www.forbes.com/sites/gilpress/2013/12/12/16-1-billion-big-data-market-2014-predictions-from-idc-and-iia/
  • 8. Anomaly Detection: Background
    • Over 50 years of research [1]
      – Statistics
        · Extreme value theory
        · Robust statistics, Grubbs' test, ESD
      – Econometrics
      – Finance
        · Value at Risk (VaR)
      – Signal processing
      – Music information retrieval
      – Networking
      – E-commerce
      – Performance regression
    Jon from Etsy, Toufic from Metafor
    [1] Chandola et al. “Anomaly Detection.” ACM Computing Surveys, 2009.
  • 9. Anomaly Detection: Overview
    • Definition
      – “An anomaly is an observation that deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism” [1, 2]
    [1] Hawkins, Douglas M. “Identification of Outliers.” London: Chapman and Hall, 1980.
    [2] Aggarwal, Charu C. “Outlier Analysis.” Springer, 2013.
  • 10. Anomaly Detection
    • Characterization
      – Magnitude
      – Width
      – Frequency
      – Direction
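One way to make the four attributes concrete is to group consecutive flagged points into episodes. This is a minimal sketch under stated assumptions and is not the deck's definition. `AnomalyEpisode`, `characterize` and the fixed `baseline` are hypothetical names. Magnitude is taken as the largest absolute deviation from the baseline, width as the run length, direction as the sign of that deviation, and frequency as episodes per observation.

```python
from dataclasses import dataclass


@dataclass
class AnomalyEpisode:
    start: int        # index of the first flagged point
    width: int        # number of consecutive flagged points
    magnitude: float  # largest absolute deviation from the (assumed) baseline
    direction: str    # "up" for a spike above the baseline, "down" for a dip


def characterize(series, flags, baseline):
    """Group consecutive flagged points into episodes and describe each one."""
    episodes = []
    i, n = 0, len(series)
    while i < n:
        if not flags[i]:
            i += 1
            continue
        start = i
        while i < n and flags[i]:
            i += 1
        deviations = [series[j] - baseline for j in range(start, i)]
        peak = max(deviations, key=abs)
        episodes.append(AnomalyEpisode(start, i - start, abs(peak),
                                       "up" if peak > 0 else "down"))
    return episodes


series = [10, 10, 30, 31, 10, 10, 2, 10]
flags = [False, False, True, True, False, False, True, False]
episodes = characterize(series, flags, baseline=10)
frequency = len(episodes) / len(series)  # hypothetical measure: episodes per observation
```

Here the first episode is two points wide, reaching 21 above the baseline. The second is a one-point dip of 8 below it.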
  • 11. Anomaly Detection (contd.)
    • Two flavors
      – Global: max value
      – Local: intra-day
    [Figure panels: Global, Local]
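A minimal sketch of the difference, assuming the same k·σ rule in both cases. A global check compares each point against the statistics of the whole series. A local check recomputes them per window, e.g. one day of hourly points. The function names, the windowing and the hourly data are illustrative assumptions.

```python
import statistics


def global_anomalies(series, k=3.0):
    """Flag points more than k standard deviations from the mean of the whole series."""
    mu, sigma = statistics.fmean(series), statistics.pstdev(series)
    return [i for i, x in enumerate(series) if sigma > 0 and abs(x - mu) > k * sigma]


def local_anomalies(series, window, k=3.0):
    """Same rule, but mean and standard deviation are recomputed per window (e.g. one day)."""
    flagged = []
    for start in range(0, len(series), window):
        chunk = series[start:start + window]
        mu, sigma = statistics.fmean(chunk), statistics.pstdev(chunk)
        flagged += [start + i for i, x in enumerate(chunk)
                    if sigma > 0 and abs(x - mu) > k * sigma]
    return flagged


# Hypothetical hourly data: a quiet day with one spike at hour 5, then a busy day.
quiet_day = [10] * 24
quiet_day[5] = 40
busy_day = [100] * 24
series = quiet_day + busy_day
```

On this data the spike at hour 5 is well inside the global band, because the busy day inflates σ. Within its own day it stands out, so only the local check flags it.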
  • 12. Anomaly Detection (contd.)
    • Traditional approaches
      – Metrics
        · Mean μ
        · Standard deviation σ
      – Rule of thumb
        · μ + 3σ
      – Which time series?
        · Raw
        · Moving averages: SMA, EWMA, PEWMA
    [Figure annotation: 3σ]
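The μ + 3σ rule of thumb needs only the standard library. Both helpers below are hypothetical names. The EWMA is the plain recursive form with an assumed smoothing factor `alpha`. PEWMA is not shown.

```python
import statistics


def three_sigma(series, k=3.0):
    """Indices of points outside mu +/- k*sigma (k = 3 is the rule of thumb)."""
    mu, sigma = statistics.fmean(series), statistics.pstdev(series)
    return [i for i, x in enumerate(series) if abs(x - mu) > k * sigma]


def ewma(series, alpha=0.3):
    """Exponentially weighted moving average: s_t = alpha * x_t + (1 - alpha) * s_(t-1)."""
    smoothed = [float(series[0])]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


series = [10] * 20 + [50] + [10] * 9
print(three_sigma(series))  # [20]: the spike is the only point flagged
```

To answer the "which time series?" question, the same check can be run on deviations from a smoothed series, e.g. `[x - s for x, s in zip(series, ewma(series))]`, instead of the raw values. Which choice works better depends on the data.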
  • 13. Anomaly Detection (contd.)
    • Impact of a multi-modal distribution
      – μ shifts by ~0.2%
      – σ is inflated by 4.5%
        · Quite a few anomalies are missed
      – What do the multiple modes correspond to?
        · Seasonality
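A toy bimodal series reproduces the σ inflation qualitatively. The data are made up: two flat modes around 10 and 30, standing in for low- and high-traffic periods. They do not correspond to the 0.2% and 4.5% figures above, which come from Twitter's own series.

```python
import statistics

night = [9, 10, 11] * 8   # hypothetical low-traffic mode around 10
day = [29, 30, 31] * 8    # hypothetical high-traffic mode around 30
pooled = night + day

sigma_night = statistics.pstdev(night)    # ~0.82
sigma_pooled = statistics.pstdev(pooled)  # ~10.03
mu_pooled = statistics.fmean(pooled)      # 20.0

# A reading of 20 is ~12 sigma away from the night mode,
# yet it sits exactly on the pooled mean, so mu + 3*sigma never flags it.
z_vs_night = (20 - statistics.fmean(night)) / sigma_night
z_vs_pooled = (20 - mu_pooled) / sigma_pooled
```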
  • 14. Anomaly Detection (contd.)
    • Robust statistics
      – MAD (median absolute deviation)
        · Robust: the breakdown point of the median is 50%, vs. 0% for the mean
      – σ_MAD = K · MAD
        · K = 1.4826 for normally distributed data
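A short sketch of MAD and of the breakdown-point contrast. Replacing one of five values with garbage leaves the median unchanged but moves the mean arbitrarily far. `mad` and `robust_z` are illustrative helper names.

```python
import statistics

K = 1.4826  # makes K * MAD a consistent estimate of sigma for normally distributed data


def mad(series):
    """Median absolute deviation from the median."""
    med = statistics.median(series)
    return statistics.median([abs(x - med) for x in series])


def robust_z(series):
    """(x - median) / (K * MAD): a z-score that a few outliers cannot drag around."""
    med, scale = statistics.median(series), K * mad(series)
    if scale == 0:
        raise ValueError("MAD is zero; robust z-scores are undefined")
    return [(x - med) / scale for x in series]


clean = [1, 2, 3, 4, 5]
corrupted = [1, 2, 3, 4, 1_000_000]  # one value replaced by garbage
```

The explicit zero check matters in practice. If more than half of the values are identical, MAD is 0 and the robust z-score is undefined.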
  • 15. Anomaly Detection (contd.)
    • Limitations of using MAD
  • 16. Anomaly Detection (contd.)
    • Grubbs' test
      – The critical value is derived from the data using a statistical significance level (α)
    • Limitations
      – Assumes the data are normally distributed
      – Detects only a single outlier
      – Seasonality unaware
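For reference, here is the usual two-sided form of Grubbs' statistic and its critical value. This is the standard textbook form, not taken from the slides. N is the sample size, x̄ and s are the sample mean and standard deviation, and t_{α/(2N), N-2} is the upper critical value of Student's t with N - 2 degrees of freedom at significance α/(2N). The single suspected outlier is rejected when G exceeds the right-hand side.

```latex
G = \frac{\max_{i} \lvert x_i - \bar{x} \rvert}{s},
\qquad
G > \frac{N-1}{\sqrt{N}}
    \sqrt{\frac{t^{2}_{\alpha/(2N),\,N-2}}{N-2+t^{2}_{\alpha/(2N),\,N-2}}}
```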
  • 17. Anomaly Detection (contd.)
    • ESD (generalized extreme Studentized deviate) [1]
      – The critical value λ_i is recalculated at every iteration
      – The largest i such that R_i > λ_i determines the number of anomalies
      – An upper bound on the number of anomalies is an input parameter
    • Limitations
      – Generalized ESD assumes a normal distribution
      – Seasonality unaware
    [1] Rosner, Bernard. “Percentage Points for a Generalized ESD Many-Outlier Procedure.” Technometrics 25, no. 2 (1983): 165–172.
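A self-contained sketch of Rosner's generalized ESD using only the Python standard library. It is not the code used at Twitter. The standard library has no Student t quantile, so `t_ppf` inverts a t CDF built on the regularized incomplete beta function (Lentz continued fraction). That helper machinery and all function names are assumptions of this sketch. The optional `robust=True` switch replaces the mean and standard deviation with the median and 1.4826·MAD, the substitution described in the recap on slide 25.

```python
import math
import statistics


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the regularized incomplete beta function (Lentz's method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Each iteration applies one even and one odd term of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def _betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def t_cdf(t, df):
    """Student t CDF, using P(|T| > t) = I_{df/(df+t^2)}(df/2, 1/2)."""
    tail = 0.5 * _betainc(df / 2.0, 0.5, df / (df + t * t))
    return 1.0 - tail if t >= 0 else tail


def t_ppf(p, df):
    """Upper quantile of Student t for 0.5 < p < 1, found by bisection on t_cdf."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if t_cdf(mid, df) < p else (lo, mid)
    return 0.5 * (lo + hi)


def generalized_esd(series, max_anomalies, alpha=0.05, robust=False):
    """Rosner's generalized ESD test; returns indices of the detected anomalies.

    robust=False uses mean / standard deviation; robust=True uses median / (1.4826 * MAD).
    """
    n = len(series)
    if not 1 <= max_anomalies <= n - 2:
        raise ValueError("max_anomalies must be between 1 and len(series) - 2")
    values, positions = list(series), list(range(n))
    removed, num_anomalies = [], 0
    for i in range(1, max_anomalies + 1):
        if robust:
            center = statistics.median(values)
            scale = 1.4826 * statistics.median([abs(v - center) for v in values])
        else:
            center, scale = statistics.fmean(values), statistics.stdev(values)
        if scale == 0:
            break
        # R_i: test statistic of the most extreme remaining point, which is then removed.
        j = max(range(len(values)), key=lambda k: abs(values[k] - center))
        r_i = abs(values[j] - center) / scale
        removed.append(positions.pop(j))
        values.pop(j)
        # lambda_i: critical value at significance alpha, recomputed every iteration.
        t = t_ppf(1.0 - alpha / (2.0 * (n - i + 1)), n - i - 1)
        lambda_i = (n - i) * t / math.sqrt((n - i - 1 + t * t) * (n - i + 1))
        if r_i > lambda_i:
            num_anomalies = i  # the largest such i determines the count
    return removed[:num_anomalies]
```

`max_anomalies` is the upper bound mentioned on the slide. Every candidate up to that bound is removed and tested, and only the first `num_anomalies` are reported.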
  • 19. Anomaly Detection (contd.)
    • Addressing seasonality
      – Key idea: time series decomposition
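In the additive form that STL produces, each observation splits into trend, seasonal and residual parts. Removing the seasonal part (and the trend) leaves the residual for the anomaly tests to operate on:

```latex
X_t = T_t + S_t + R_t
\quad\Longrightarrow\quad
R_t = X_t - T_t - S_t
```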
  • 20. Anomaly Detection (contd.)
    • Determining the seasonal component
      – Regression on sub-cycle (cycle-subseries) plots [1]
    [1] Cleveland et al. “STL: A Seasonal-Trend Decomposition Procedure Based on Loess.” Journal of Official Statistics, Vol. 6, Issue 1, 1990.
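A cycle-subseries collects all observations that share a position in the cycle, e.g. every 9:00 reading. This is a deliberately crude sketch. STL fits a loess smoother to each subseries, but here each subseries is summarized by its plain mean, offset from the overall mean. The function names are hypothetical.

```python
import statistics


def cycle_subseries(series, period):
    """Group observations by their position in the cycle (e.g. hour of the day)."""
    return [series[phase::period] for phase in range(period)]


def seasonal_component(series, period):
    """Crude seasonal estimate: mean of each cycle-subseries minus the overall mean.

    STL instead smooths each cycle-subseries with loess; a plain mean is used here
    only to keep the sketch short.
    """
    level = statistics.fmean(series)
    offsets = [statistics.fmean(sub) - level for sub in cycle_subseries(series, period)]
    return [offsets[t % period] for t in range(len(series))]
```

For a perfectly periodic input such as `[1, 5, 9] * 4` with `period=3`, the estimate is the repeating offset pattern `[-4.0, 0.0, 4.0]`.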
  • 21. Anomaly Detection (contd.)
    • Impact of removing the seasonal and trend components
      – Transforms the multi-modal data into unimodal data
        · Amenable to ESD/MAD!
    The decomposed residual becomes unimodal, which significantly shrinks σ. The original multi-modal raw data has a much larger σ, which leads ESD to miss many of the outliers.
  • 22. Anomaly Detection (contd.)
    • Challenges remain!
      – Trend-smoothing distortion creates “phantom” anomalies
  • 23. Anomaly Detection (contd.)
    • Marrying robust statistics with seasonal decomposition
      – The median is free from distortion
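The phantom-anomaly effect can be reproduced with a single spike. Here a centred moving average is a hypothetical stand-in for the trend smoother (STL itself uses loess). The spike drags the smoothed trend up, so its flat neighbours show negative residuals that look like anomalies. Subtracting the median instead leaves only the genuine spike.

```python
import statistics


def centred_moving_average(series, window):
    """Centred moving average with an odd window; edges use whatever points exist."""
    half = window // 2
    return [statistics.fmean(series[max(0, t - half):t + half + 1])
            for t in range(len(series))]


series = [10.0] * 15
series[7] = 80.0  # one genuine spike

trend = centred_moving_average(series, window=5)
ma_residual = [x - m for x, m in zip(series, trend)]

med = statistics.median(series)
median_residual = [x - med for x in series]

print(ma_residual[5:10])      # [-14.0, -14.0, 56.0, -14.0, -14.0]: phantom dips around the spike
print(median_residual[5:10])  # [0.0, 0.0, 70.0, 0.0, 0.0]: only the spike remains
```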
  • 24. Anomaly Detection (contd.)
    • Applying ESD on the residual
      – Decomposition exposes anomalies
  • 25. Anomaly Detection (contd.)
    • Recap
      – Extract the seasonal component using STL
        · Filters out periodic spikes
      – Residual = Raw - Seasonal_raw - Median_raw
      – Run ESD on the residual (using the median and MAD)
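The residual step of the recap can be sketched as follows, with two stated simplifications. First, the seasonal component is a per-phase median, a stand-in for STL rather than STL itself. Second, the sketch stops at the median/MAD score that each ESD round maximizes. The full test would iterate, removing the top point and comparing each R_i with λ_i. Function names and data are hypothetical.

```python
import statistics


def seasonal_by_phase_median(series, period):
    """Stand-in for STL's seasonal component: per-phase median minus the overall median."""
    overall = statistics.median(series)
    offsets = [statistics.median(series[p::period]) - overall for p in range(period)]
    return [offsets[t % period] for t in range(len(series))]


def residual(series, period):
    """Residual = Raw - Seasonal_raw - Median_raw."""
    seasonal = seasonal_by_phase_median(series, period)
    med = statistics.median(series)
    return [x - s - med for x, s in zip(series, seasonal)]


def robust_scores(values):
    """|r - median| / (1.4826 * MAD): the statistic a median/MAD ESD maximizes each round."""
    med = statistics.median(values)
    scale = 1.4826 * statistics.median([abs(v - med) for v in values])
    if scale == 0:
        raise ValueError("MAD is zero; scores are undefined")
    return [abs(v - med) / scale for v in values]


# Hypothetical series: period-4 pattern, small deterministic noise, one spike at t = 10.
pattern = [10, 20, 30, 20] * 6
series = [b + [0, 1, -1][t % 3] for t, b in enumerate(pattern)]
series[10] += 50

scores = robust_scores(residual(series, period=4))
```

Once the periodic pattern is removed, the residual is just the small noise plus the spike. The spike's score is an order of magnitude above every other point.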
  • 26. Anomaly Detection (contd.)
    • Illustrative example
  • 27. Anomaly Detection (contd.)
    • Applications
      – Three perspectives
        · Capacity: CPU utilization, garbage collection, network activity
        · User behavior: events (impressions, link clicks), spam
        · Forecasting
  • 28. Anomaly Detection (contd.)
    • Deployed in production
      – Used by a large number of services at Twitter
      – Automatic e-mail notification
        · Sent only if anomalies are present
        · Anomalies annotated
        · CSV with anomaly locations attached
  • 29. Open Sourcing
    • Skyline from Etsy
      – https://github.com/etsy/skyline/blob/master/src/analyzer/algorithms.py
    • Coming soon!
      – R package
  • 30. Join the Flock
    Like problem solving? Like challenges? Be at the cutting edge. Make an impact.
    • We are hiring!
      – https://twitter.com/JoinTheFlock
      – https://twitter.com/jobs
      – Contact us: @arun_kejariwal