
Anomaly detection made easy

How we detect anomalies @Allegro

Published in: Data & Analytics


  1. Anomaly Detection made easy Piotr Guzik
  2. $whoami ● Data Engineer @Allegro (Scala, Kafka, Spark, Ansible, ML) ● Trainer @GetInData ● https://twitter.com/guzik_io ● Data Science flavour
  3. Why is anomaly detection interesting? Anomaly detection on a clickstream is all about: ● SLA (data should be a first-class citizen) ● You should be the first to know if something is wrong “Engineers in the 20th century made mistakes, but never by more than one order of magnitude. In IT it is not that good.”
  4. Motivation and goals Goal: get quick information when data is lost ● Losing data is much like losing money ● “You cannot improve what you cannot measure” ● The team responsible for a given service should be alerted when something is wrong
  5. How to start? Important questions: ● How do we get the data? ● Real-time detection? ● What delay is acceptable? ● What is an anomaly?
  6. Discovering the datasource and the data itself ● Datasource: Druid ● OLAP cube dimensions as domains ● Data aggregated every 15 minutes ● Metric: the simplest count
  7. What is the core data? Data ~= the result of the query: ● select count(*) as cnt, category, action, time_window_15_m from page_views where category = 'Search' and action = 'ShowItem' group by category, action, time_window_15_m
  8. First look at the data
  9. Knowing the data ● Clickstream is periodic ● Week == period ● Days of the week differ a lot ● There is a rapid increase in web traffic around 6 PM, and it starts to fall around 10 PM
  10. Research Motto: the solution must be easy, and not only for data scientists. Available solutions: ● Twitter library: too hard, heavy math, many hyperparameters ● HTM algorithms: way too hard, neural networks, deep learning, very hard to reason about the algorithm and its results So we had to create our own simple model
  11. What should our model look like? The perfect model: ● Simple ● Time-aware ● Detects in minutes rather than hours ● Adapts to trends (ads, currently popular items) ● Does not report too many false positives ● Uses confidence intervals
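The confidence-interval idea above can be sketched as a simple band check. This is a minimal illustration, not the actual Allegro model: the state shape (`ModelState`), the function name, and the width factor `k` are all assumptions.

```scala
// Hypothetical sketch: flag an observed count as anomalous when it
// falls outside a confidence band around the model's smoothed mean.
case class ModelState(mean: Double, sd: Double)

// k controls the band width; a larger k reports fewer false positives.
def isAnomaly(observed: Double, state: ModelState, k: Double = 3.0): Boolean = {
  val lower = state.mean - k * state.sd
  val upper = state.mean + k * state.sd
  observed < lower || observed > upper
}
```

With `k = 3` and a well-estimated standard deviation, normal fluctuations stay inside the band, which directly serves the "not too many false positives" goal.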
  12. The best tool for inventing an algorithm
  13. Model draft
  14. F.A.I.L. - first attempt in learning A simple statistical model in R. First results: ● Rapid changes of the metric are a problem ● The trend is important but must not lead to overfitting
  15. Experimenting in progress After model evolution: ● Outliers are problematic (they inflate the standard deviation) ● Outliers == duplicates of data on HDFS (thank you, Camus!) ● Percentiles are great for outlier removal
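The percentile trick above can be sketched as trimming everything outside a percentile range before estimating statistics, so duplicated-data spikes do not inflate the standard deviation. A minimal sketch using the nearest-rank method; the function names and the 5th/95th-percentile cutoffs are illustrative choices, not the talk's actual parameters.

```scala
// Hypothetical sketch: nearest-rank percentile over a sorted sample.
def percentile(sorted: Vector[Double], p: Double): Double = {
  val rank = math.ceil(p * sorted.size).toInt
  sorted(math.min(sorted.size - 1, math.max(0, rank - 1)))
}

// Keep only observations inside the [low, high] percentile band.
def trimOutliers(xs: Vector[Double], low: Double = 0.05, high: Double = 0.95): Vector[Double] = {
  val sorted = xs.sorted
  val (lo, hi) = (percentile(sorted, low), percentile(sorted, high))
  xs.filter(x => x >= lo && x <= hi)
}
```

A single duplicated-ingestion spike among otherwise normal counts falls outside the band and is dropped before the mean and standard deviation are computed.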
  16. Problems with R ● Only the data scientist knows R ● There is no easy way to deploy it ● You cannot monitor it easily ● It is hard to maintain Decision: we have to rewrite it. From scratch. In Scala.
  17. Input from Druid
  18. Model
  19. Some math (EMA!)
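The EMA here is the standard exponential moving average recurrence, ema(t) = alpha * x(t) + (1 - alpha) * ema(t - 1). A minimal sketch, assuming this textbook form; the function name and any particular alpha are not from the slides.

```scala
// Exponential moving average: each observation is blended with the
// previous estimate; alpha close to 1 forgets the past quickly,
// which is what lets the model adapt to trends.
def ema(values: Seq[Double], alpha: Double): Double =
  values.tail.foldLeft(values.head)((prev, x) => alpha * x + (1 - alpha) * prev)
```

Unlike a plain mean over all history, the EMA's influence of old observations decays geometrically, so a level shift in traffic is absorbed after a bounded number of windows.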
  20. Trend (fast-changing world)
  21. Learning is a difficult process What if we learned something that is no longer valid? The mean could be bad, but what about the EMA?
  22. Anomaly Detection - almost there?
  23. Anomaly Detection - did we miss something? ● A long-lasting anomaly is not an anomaly anymore ● Loss of data is the crucial case ● The output should be easy to understand
  24. Long-lasting anomalies - key concepts Output: a probability (with sign) of an anomaly ● Small anomalies should be smoothed and larger ones amplified (for monitoring and alerting) ● We define where obvious anomalies start ● We define after how long we should treat an anomaly as the norm (be careful here)
  25. Long-lasting anomalies - the fix In case of long-lasting anomalies, we rescale all model parameters, as if we had been wrong from the beginning
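The rescaling fix above can be sketched as multiplying the learned parameters by the ratio between the observed and expected level, so the persistent new traffic level becomes the model's norm. The state shape, the function name, and using a single ratio for both parameters are assumptions for illustration.

```scala
// Hypothetical sketch: once an anomaly has lasted past the configured
// horizon, rescale the learned level and spread so current traffic
// becomes the new norm, as if the model had tracked it all along.
case class Model(level: Double, sd: Double)

def adoptNewNorm(model: Model, observed: Double): Model = {
  val ratio = observed / model.level // how far reality drifted from the model
  Model(model.level * ratio, model.sd * ratio)
}
```

Scaling the spread together with the level keeps the relative width of the confidence band unchanged, so detection sensitivity is preserved at the new traffic level.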
  26. Deployment SaaS model ● Multiple deployments with the same codebase ● Different configurations ● Clients define how they want to react
  27. Configuration example
  28. Whole team - thank you More than just me and my team were involved in this process. Big thanks to: ● My team for motivation and hot discussions :) ● Paweł Zawistowski for the initial model in R ● Other teams for real use cases (that is why you want to be in production quickly)
  29. Thank you Q &amp; A Piotr Guzik
