BigML Education - Anomaly Detection

Learn how to identify unusual instances in your data using BigML's Anomaly Detector.

Published in: Data & Analytics

  1. BigML Education: Anomaly Detection (July 2017)
  2. In This Video (BigML Education Program)
       • Definition of anomaly detection
       • Creation and interpretation of a BigML anomaly detector
       • Generating an anomaly-free dataset
       • Scoring instances with the trained anomaly detector
  3. Unsupervised Learning
       • Supervised learning
           • One field is the “objective field” (or “target variable”, or “label”) that is to be predicted
           • The algorithm is trying to create a model that makes this prediction accurately
       • Unsupervised learning
           • The algorithm is trying to discover some structure in the data
           • Learned structure can often be applied to new data
  4. Anomalies in a Dataset
       [Figure label: Anomalous Instances]
  5. Applications
       • Detecting rare, malicious behavior (fraud, intrusion)
       • Alerting service technicians to possible failures
       • Filtering of anomalies for “cleaner” supervised learning
       • Assessing model competence
  6. Isolation Forests (excerpt from Chapter 2, “Understanding Anomalies”)
       [Figure 2.1: Graphic representation example of a normal data point (left) versus an anomalous data point (right). Labels: xi - Difficult to Isolate; xo - Easy to Isolate]
       When all instances have been isolated, BigML automatically calculates an anomaly score by averaging the number of splits needed to isolate an instance across trees in the ensemble. A lower number of splits results in a higher score. These averages are then normalized to get a final score that can take values between 0% and 100%. This score measures how anomalous an instance is, e.g., the red data point on the left in Figure 2.1 took 10 partitions to isolate, while the one on the right took only 4, so the one on the right will have a higher anomaly score.
  7. Review
       • Anomaly detection is a way of detecting unusual instances in your dataset
       • Detecting anomalies has many important real-world use cases
       • The BigML interface allows you to easily view and interact with the detected anomalies in your dataset
       • You can create a new dataset with your anomaly detector, either by filtering anomalies from the training data, or by scoring a new dataset with the trained anomaly detector
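
The isolation idea on slide 6 and the two dataset operations in the review can be sketched in a few lines of plain Python. This is an illustration only, not BigML's implementation: the function names (`train_forest`, `anomaly_score`, and the helpers) are hypothetical, the toy data is generated just for the example, and the normalization `2 ** (-mean_depth / norm)` is an assumption borrowed from the standard isolation forest formulation. The slides say only that the averaged split counts are normalized to a score between 0% and 100%.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def expected_path_length(n):
    """Average depth of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random field at a random value."""
    if len(points) <= 1 or depth >= max_depth:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # nothing to split on in this field
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(tree, point, depth=0):
    """Splits needed to reach a leaf, plus a correction for leaves left unsplit."""
    if "split" not in tree:
        return depth + expected_path_length(tree["size"])
    branch = "left" if point[tree["dim"]] < tree["split"] else "right"
    return path_length(tree[branch], point, depth + 1)


def train_forest(points, n_trees=100, sample_size=64, seed=0):
    """Build an ensemble of random trees, each on a random sample of the data."""
    if len(points) < 2:
        raise ValueError("need at least two instances")
    rng = random.Random(seed)
    sample_size = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(points, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    return trees, expected_path_length(sample_size)


def anomaly_score(forest, point):
    """Score in (0, 1]; fewer splits on average gives a higher score."""
    trees, norm = forest
    mean_depth = sum(path_length(t, point) for t in trees) / len(trees)
    return 2.0 ** (-mean_depth / norm)


# Toy data: a 2-D cluster plus one outlier added by hand (illustrative only).
data_rng = random.Random(42)
training = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(50)]
training.append((10.0, 10.0))

forest = train_forest(training)
scores = [anomaly_score(forest, p) for p in training]

# Anomaly-free dataset: drop the single highest-scoring training instance.
top = max(range(len(training)), key=scores.__getitem__)
cleaned = [p for i, p in enumerate(training) if i != top]

# Scoring new instances with the trained detector.
new_scores = [anomaly_score(forest, p) for p in [(0.1, -0.2), (9.0, 9.5)]]
```

Multiplying a score by 100 gives a percentage comparable in spirit to the 0% to 100% range on slide 6. Dropping only the top-scoring instance is one possible filtering rule; a score threshold or a top-N cutoff would work the same way.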
