Reverse nearest neighbors in unsupervised distance based outlier

•Download as DOCX, PDF•

0 likes•189 views

Outlier detection in high-dimensional data presents various challenges resulting from the “curse of dimensionality.” A prevailing view is that distance concentration, i.e., the tendency of distances in high-dimensional data to become indiscernible, hinders the detection of outliers by making distance-based methods label all points as almost equally good outliers.

Technology

1.
#13/ 19, 1st Floor, Municipal Colony, Kangayanellore Road, Gandhi Nagar, vellore – 6.
Off: 0416-2247353 / 6066663 Mo: +91 9500218218 /8870603602,
Project Titles: http://shakastech.weebly.com/2015-2016-titles
Website: www.shakastech.com, Email - id: shakastech@gmail.com, info@shakastech.com
REVERSE NEAREST NEIGHBORS IN UNSUPERVISED DISTANCE-BASED
OUTLIER DETECTION
ABSTRACT:
Outlier detection in high-dimensional data presents various challenges resulting from the
“curse of dimensionality.” A prevailing view is that distance concentration, i.e., the tendency of
distances in high-dimensional data to become indiscernible, hinders the detection of outliers by
making distance-based methods label all points as almost equally good outliers. In this paper we
provide evidence supporting the opinion that such a view is too simple, by demonstrating that
distance-based methods can produce more contrasting outlier scores in high-dimensional
settings. Furthermore, we show that high dimensionality can have a different impact, by
reexamining the notion of reverse nearest neighbors in the unsupervised outlier-detection
context. Namely, it was recently observed that the distribution of points’ reverse-neighbor counts
becomes skewed in high dimensions, resulting in the phenomenon known as hubness. We
provide insight into how some points (antihubs) appear very infrequently in k-NN lists of other
points, and explain the connection between antihubs, outliers, and existing unsupervised outlier-
detection methods. By evaluating the classic k-NN method, the angle-based technique (ABOD)
designed for high-dimensional data, the density-based local outlier factor (LOF) and influenced
outlierness (INFLO) methods, and antihub-based methods on various synthetic and real-world
data sets, we offer novel insight into the usefulness of reverse neighbor counts in unsupervised
outlier detection.

The presentation was presented by Nenad Tomasev at the 7th International Conference on Machine Learning and Data Mining in Pattern Recognition (MLDM 2011) in New York, NY, USA on August 30th, 2011. Publication: http://bit.ly/yQQsOq Abstract: High-dimensional data are by their very nature often difficult to handle by conventional machine-learning algorithms, which is usually characterized as an aspect of the curse of dimensionality. However, it was shown that some of the arising high-dimensional phenomena can be exploited to increase algorithm accuracy. One such phenomenon is hubness, which refers to the emergence of hubs in high-dimensional spaces, where hubs are influential points included in many k-neighbor sets of other points in the data. This phenomenon was previously used to devise a crisp weighted voting scheme for the k-nearest neighbor classifier. In this paper we go a step further by embracing the soft approach, and propose several fuzzy measures for k-nearest neighbor classification, all based on hubness, which express fuzziness of elements appearing in k-neighborhoods of other points. Experimental evaluation on real data from the UCI repository and the image domain suggests that the fuzzy approach provides a useful measure of confidence in the predicted labels, resulting in improvement over the crisp weighted method, as well the standard kNN classifier.

Nearest Neighbor Algorithm Zaffar AhmedZaffar Ahmed Shaikh

KNN

West Virginia University

A probabilistic misbehavior detection scheme towards efficient trust establis...

Shakas Technologies

Continuous answering-holistic-queries-over sensor networks

Shakas Technologies

Link error and malicious packet dropping are two sources for packet losses in multi-hop wireless ad hoc network. In this paper, while observing a sequence of packet losses in the network, we are interested in determining whether the losses are caused by link errors only, or by the combined effect of link errors and malicious drop. We are especially interested in the insider-attack case, whereby malicious nodes that are part of the route exploit their knowledge of the communication context to selectively drop a small amount of packets critical to the network performance. Because the packet dropping rate in this case is comparable to the channel error rate, conventional algorithms that are based on detecting the packet loss rate cannot achieve satisfactory detection accuracy.

Secure and distributed data discovery and dissemination in wireless sensor ne...

Shakas Technologies

Behavior rule specification based ntrusion detection for safety critical medi...

Shakas Technologies

Secure two party differentially private data release for vertically partition...

Shakas Technologies

Privacy preserving and truthful detection of packet

Shakas Technologies

A framework for secure computations with two

Shakas Technologies

A framework for secure computations with two

Shakas Technologies

User service-rating-prediction-by-exploring social users rating Behaviours

Shakas Technologies

Cost aware secure routing (caser) protocol design for wireless sensor networks

Shakas Technologies

A Mixture Model of Hubness and PCA for Detection of Projected Outliers

Zac Darcy

With the Advancement of time and technology, Outlier Mining methodologies help to sift through the large amount of interesting data patterns and winnows the malicious data entering in any field of concern. It has become indispensible to build not only a robust and a generalised model for anomaly detection but also to dress the same model with extra features like utmost accuracy and precision. Although the K-means algorithm is one of the most popular, unsupervised, unique and the easiest clustering algorithm, yet it can be used to dovetail PCA with hubness and the robust model formed from Guassian Mixture to build a very generalised and a robust anomaly detection system. A major loophole of the K-means algorithm is its constant attempt to find the local minima and result in a cluster that leads to ambiguity. In this paper, an attempt has done to combine K-means algorithm with PCA technique that results in the formation of more closely centred clusters that work more accurately with K-means algorithm .This combination not only provides the great boost to the detection of outliers but also enhances its accuracy and precision.

A MIXTURE MODEL OF HUBNESS AND PCA FOR DETECTION OF PROJECTED OUTLIERS

Zac Darcy

A Mixture Model of Hubness and PCA for Detection of Projected Outliers

Zac Darcy

Eye gaze tracking with a web camera in a desktop environment

Shakas Technologies

A unified framework for user identification across online and offline data

Shakas Technologies

A unified framework for user identification across online and offline data

Shakas Technologies

Outlier Detection in Data Mining An Essential Component of Semiconductor Manu...

yieldWerx Semiconductor

Outlier detection is a critical research field within data mining due to its vast range of applications including fraud detection, cybersecurity, health diagnostics, and significantly for the semiconductor manufacturing industry. It refers to identifying data points that significantly deviate from expected patterns, providing crucial insights into different aspects of data. However, the ambiguity between outliers and normal behavior, evolving definitions of 'normal', application-specific techniques, and noisy data mimicking outliers, often complicate the outlier detection process. This review article offers an in-depth analysis of the most advanced outlier detection methods, presenting a thorough understanding of future research prospects.

PANDA: PUBLIC AUDITING FOR SHARED DATA WITH EFFICIENT USER REVOCATION IN THE ...

Shakas Technologies

PANDA: PUBLIC AUDITING FOR SHARED DATA WITH EFFICIENT USER REVOCATION IN THE ...

Shakas Technologies

Kdd08 abodKruthikka Palraj

angle based outlier de

Kruthikka Palraj

A Review on Deep-Learning-Based Cyberbullying Detection

Shakas Technologies

A Review on Deep-Learning-Based Cyberbullying Detection Shakas Technologies ( Galaxy of Knowledge) #11/A 2nd East Main Road, Gandhi Nagar, Vellore - 632006. Mobile : +91-9500218218 / 8220150373| land line- 0416- 3552723 Shakas Training & Development | Shakas Sales & Services | Shakas Educational Trust|IEEE projects | Research & Development | Journal Publication | Email : info@shakastech.com | shakastech@gmail.com | website: www.shakastech.com Facebook: https://www.facebook.com/pages/Shakas-Technologies

A Personal Privacy Data Protection Scheme for Encryption and Revocation of Hi...

Shakas Technologies

A Personal Privacy Data Protection Scheme for Encryption and Revocation of High-Dimensional Attri Shakas Technologies ( Galaxy of Knowledge) #11/A 2nd East Main Road, Gandhi Nagar, Vellore - 632006. Mobile : +91-9500218218 / 8220150373| land line- 0416- 3552723 Shakas Training & Development | Shakas Sales & Services | Shakas Educational Trust|IEEE projects | Research & Development | Journal Publication | Email : info@shakastech.com | shakastech@gmail.com | website: www.shakastech.com Facebook: https://www.facebook.com/pages/Shakas-Technologies

Similar to Reverse nearest neighbors in unsupervised distance based outlier

Privacy preserving and truthful detection of packet dropping attacks in wirel...

Shakas Technologies

Secure and distributed data discovery and dissemination in wireless sensor ne...

Shakas Technologies

Behavior rule specification based ntrusion detection for safety critical medi...

Shakas Technologies

Secure two party differentially private data release for vertically partition...

Shakas Technologies

Privacy preserving and truthful detection of packet

Shakas Technologies

A framework for secure computations with two

Shakas Technologies

A framework for secure computations with two

Shakas Technologies

User service-rating-prediction-by-exploring social users rating Behaviours

Shakas Technologies

Cost aware secure routing (caser) protocol design for wireless sensor networks

Shakas Technologies

A Mixture Model of Hubness and PCA for Detection of Projected Outliers

Zac Darcy

A MIXTURE MODEL OF HUBNESS AND PCA FOR DETECTION OF PROJECTED OUTLIERS

Zac Darcy

A Mixture Model of Hubness and PCA for Detection of Projected Outliers

Zac Darcy

Eye gaze tracking with a web camera in a desktop environment

Shakas Technologies

A unified framework for user identification across online and offline data

Shakas Technologies

A unified framework for user identification across online and offline data

Shakas Technologies

Outlier Detection in Data Mining An Essential Component of Semiconductor Manu...

yieldWerx Semiconductor

PANDA: PUBLIC AUDITING FOR SHARED DATA WITH EFFICIENT USER REVOCATION IN THE ...

Shakas Technologies

PANDA: PUBLIC AUDITING FOR SHARED DATA WITH EFFICIENT USER REVOCATION IN THE ...

Shakas Technologies

Kdd08 abodKruthikka Palraj

angle based outlier de

Kruthikka Palraj

Similar to Reverse nearest neighbors in unsupervised distance based outlier (20)