
Mathematics of anomalies


Why are anomalies important? Because they tell a different story from the norm. An anomaly might signal a patient's failing heart rate, fraudulent credit card activity, or an early indication of a tsunami, so detecting anomalies is extremely important.
Many anomaly detection algorithms are available, and most have parameters that the user must set. It is not always clear how to set them. For example, some algorithms detect anomalies using kernel density estimates, but they require the user to choose the bandwidth. Choosing a bandwidth for anomaly detection is different from choosing one for general kernel density estimation, and in high dimensions especially it is not an obvious task.
In this talk, we introduce lookout, a new approach that uses topological data analysis to select the bandwidth for anomaly detection. With this bandwidth, lookout uses leave-one-out kernel density estimates and extreme value theory to detect anomalies.
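To make the leave-one-out idea concrete, here is a minimal sketch in Python for one-dimensional data. It is not the lookout implementation: the function name `loo_kde`, the Gaussian kernel, the toy data, and the hand-picked bandwidth are all assumptions for illustration. The bandwidth is not selected by topological data analysis, and the extreme value theory step is omitted.

```python
import math

def loo_kde(points, bandwidth):
    """Leave-one-out Gaussian KDE for 1-D data (illustrative helper).

    The density at each point is estimated from all the other points,
    so an isolated point cannot prop up its own density.
    """
    n = len(points)
    norm = (n - 1) * bandwidth * math.sqrt(2 * math.pi)
    densities = []
    for i, x in enumerate(points):
        total = 0.0
        for j, y in enumerate(points):
            if i != j:  # leave the point itself out
                z = (x - y) / bandwidth
                total += math.exp(-0.5 * z * z)
        densities.append(total / norm)
    return densities

# Toy data (assumed): five clustered points and one isolated point.
data = [0.0, 0.1, 0.2, 0.3, 0.4, 5.0]
dens = loo_kde(data, bandwidth=0.5)  # bandwidth chosen by hand here

# Index of the point with the lowest leave-one-out density.
lowest = min(range(len(data)), key=lambda i: dens[i])
print(lowest, dens[lowest])
```

In this toy example the isolated point at 5.0 has the lowest leave-one-out density, which is what marks it as a candidate anomaly.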
We also define the concept of anomaly persistence, which explores the birth and death of anomalies as the bandwidth changes. If a data point is identified as an anomaly over a large range of bandwidth values, its significance as an anomaly increases.
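The persistence idea can be sketched by sweeping a grid of bandwidths and recording how often each point is flagged. The sketch below is an assumption-laden illustration, not the method from the talk. It swaps in a simple rule (a point is flagged when its leave-one-out density falls below a fixed fraction of the median density) in place of extreme value theory. The names `loo_kde` and `persistence`, the `ratio` parameter, and the bandwidth grid are hypothetical.

```python
import math
import statistics

def loo_kde(points, bandwidth):
    """Leave-one-out Gaussian KDE for 1-D data (illustrative helper)."""
    n = len(points)
    norm = (n - 1) * bandwidth * math.sqrt(2 * math.pi)
    densities = []
    for i, x in enumerate(points):
        total = sum(
            math.exp(-0.5 * ((x - y) / bandwidth) ** 2)
            for j, y in enumerate(points)
            if i != j
        )
        densities.append(total / norm)
    return densities

def persistence(points, bandwidths, ratio=0.1):
    """Fraction of bandwidths at which each point is flagged.

    Stand-in rule (not extreme value theory): flag a point when its
    leave-one-out density is below `ratio` times the median density.
    """
    counts = [0] * len(points)
    for h in bandwidths:
        dens = loo_kde(points, h)
        cutoff = ratio * statistics.median(dens)
        for i, d in enumerate(dens):
            if d < cutoff:
                counts[i] += 1
    return [c / len(bandwidths) for c in counts]

# Toy data and bandwidth grid (both assumed for illustration).
data = [0.0, 0.1, 0.2, 0.3, 0.4, 5.0]
bandwidths = [0.1, 0.2, 0.5, 1.0, 5.0]
scores = persistence(data, bandwidths)
print(scores)
```

On this toy data the isolated point is flagged at the smaller bandwidths and stops being flagged at the largest one, where the kernel is wide enough to absorb it. The clustered points are never flagged. A point that stays flagged over most of the grid is the kind of persistent anomaly the paragraph above describes.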
The R package lookout implements this algorithm.


