Data Applied: Outliers


  1. 5: Data-Applied.com: Outliers
  2. Introduction
     Outliers are examples in a database with unusual properties.
     They can produce erroneous results and hence should be removed.
     The algorithm defined here differs from a conventional nested-loop outlier detection algorithm in its improved running time, which it gets by pruning examples that can no longer rank among the top outliers.
  3. Pseudo code
     Procedure: Find Outliers
     Input: k, the number of nearest neighbours; n, the number of outliers to return; D, a set of examples in random order.
     Output: O, a set of outliers.
     Let maxdist(x, Y) return the maximum distance between x and an example in Y.
     Let Closest(x, Y, k) return the k closest examples in Y to x.
     begin
       c <- 0                                  // set the cutoff for pruning to 0
       O <- NULL                               // initialize to the empty set
       while B <- get-next-block(D) {          // load a block of examples from D
         Neighbours(b) <- NULL for all b in B
  4. Pseudo code (contd.)
         for each d in D {
           for each b in B, b != d {
             if |Neighbours(b)| < k or distance(b, d) < maxdist(b, Neighbours(b)) {
               Neighbours(b) <- Closest(b, Neighbours(b) U {d}, k)
               if score(Neighbours(b), b) < c {
                 remove b from B
               }
             }
           }
         }
         O <- Top(B U O, n)                    // keep only the top n outliers
         c <- min(score(o)) for all o in O     // the cutoff is the score of the weakest outlier
       }
       return O
     end
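
The pseudo code above can be sketched in Python roughly as follows. This is an illustrative sketch under stated assumptions, not Data Applied's implementation. The slides leave `distance` and `score` undefined, so Euclidean distance and "mean distance to the k nearest neighbours found so far" are assumed here. The names `find_outliers` and `block_size` are hypothetical. Two guards are also additions that the pseudo code does not spell out: an example is pruned only once k neighbours are known, and the cutoff is raised only once n outliers have been collected.

```python
import heapq
import math


def distance(a, b):
    """Euclidean distance between two equal-length numeric tuples (assumed metric)."""
    return math.dist(a, b)


def score(neighbour_dists):
    """Outlier score: mean distance to the neighbours found so far (assumed definition)."""
    return sum(neighbour_dists) / len(neighbour_dists)


def find_outliers(data, k, n, block_size=64):
    """Return up to n (score, index) pairs for the highest-scoring examples, best first.

    `data` should already be in random order, as the pseudo code requires.
    """
    cutoff = 0.0   # c in the pseudo code
    outliers = []  # O in the pseudo code, stored as (score, index) pairs
    for start in range(0, len(data), block_size):
        block = range(start, min(start + block_size, len(data)))
        # neighbours[b] holds the (at most k) smallest distances seen so far for b
        neighbours = {b: [] for b in block}
        for d in range(len(data)):
            for b in list(neighbours):  # copy, because entries may be pruned
                if b == d:
                    continue
                dist = distance(data[b], data[d])
                near = neighbours[b]
                if len(near) < k or dist < max(near):
                    near.append(dist)
                    near.sort()
                    del near[k:]  # keep only the k closest
                    # Assumed guard: the score is an upper bound only once
                    # k neighbours are known, so prune only then.
                    if len(near) == k and score(near) < cutoff:
                        del neighbours[b]
        candidates = outliers + [
            (score(near), b) for b, near in neighbours.items() if near
        ]
        outliers = heapq.nlargest(n, candidates)  # Top(B U O, n)
        # Assumed guard: raise the cutoff only once n outliers are available.
        if len(outliers) == n:
            cutoff = outliers[-1][0]  # score of the weakest outlier
    return outliers
```

Calling `find_outliers(points, k=3, n=2)` on a shuffled list of coordinate tuples returns the two examples whose nearest neighbours are farthest away. Because neighbour distances can only shrink as more of D is scanned, an example whose running score drops below the cutoff cannot re-enter the top n, and this is what lets the inner loop skip it.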
  5. Outliers using Data Applied’s web interface
  6. Step 1: Selection of data
  7. Step 2: Selecting Outliers
  8. Step 3: Result
  9. Visit more self-help tutorials
     • Pick a tutorial of your choice and browse through it at your own pace.
     • The tutorials section is free, self-guiding and will not involve any additional support.
     • Visit us at www.dataminingtools.net
