SlideShare a Scribd company logo
1 of 6
Download to read offline
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056
Volume: 04 Issue: 01 | Jan -2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1350
Unsupervised Distance Based Detection of Outliers by using Anti-hubs
Mr. Kiran V. Markad, Mr. Kiran M. Moholkar , Mr. Sopan N. Abdal Department of Information
Technology D Y PATIL College of Engineering, Ambi Talegaon , Pune, India
Prof. Rajani ThiteAssistantProfessor
Department of Information Technology D Y PATIL College of Engineering, Ambi Talegaon , Pune, India
---------------------------------------------------------------------***-------------------------------------------------------------------
Abstract- Distance based outlier detection methods fails
as the dimensionality of the data increases due to all point
becomes good outlier. Reason of these issues is irrelevant
and redundant features; nearest neighbor of the point P is K
points whose distance to point P is less than all other points.
Reverse nearest neighbors (RNN) of Point P is the points for
which P is in their k nearest neighbor list. Some points are
frequently comes in k-nearestneighborlistofanotherpoints
and some points are infrequently comes in k nearest
neighbor list of different points are called as Anti-hubs.
There are many researches of ant hub based unsupervised
outlier detection but there is one issue is occurring that high
computation cost for finding anti hubs. If the data that
having better dimensionality, high computation cost, high
computation complexity, time requirementto findantihubs.
To remove the unwanted feature and high dimensionality
data to make an efficient system results. Feature selection is
the process proposed toremoveunwantedfeatureandmake
a system more efficient. By applying feature selection this
paper extends anti-hub based outlier detection method for
high dimensional data.
Key Words :- High-Dimensional, Detection of Data
Outliers, Reverse nearest Neighbours.
1.INTRODUCTION
There are three main types of outlier detection methods
namely, unsupervised, semi-supervised and supervised.
There is need to find outlier in many application for that we
have to study outlier detection analysis. There is Need of
availability of correct labels of the instances for Supervised
and Semi Supervised outlier detection. Currently
unsupervised technique is used widely which does not need
label to the instances for outlier detection.
Currently the best efficient methodforoutlierdetectionis
unsupervised distance base outlier detection method. The
normal instances have small amount of distances among
them and outliers have large amount of distances among
them in distance base outlier detection. As the increase in
dimensions of data distances are not useful to find outliers
causes every point become an outlier. Therefore, high
dimensionality of data is the biggest problemorchallengefor
unsupervised distance outlier detection. The base paper can
show that unsupervised distance base outlier detection
system can handle high dimensionaldata,itcandetectoutlier
under specific condition i.e. data should be useful and
attributes are meaningful means data should not be noisy
then it can be success to handle high dimensional data. K-
nearest neighbor (KNN) of the point P is K points whose
distance to point P is less than all other points. Reverse
nearest neighbors (RNN) of Point P is the points for which P
is in their k nearest neighbor list. Some points are frequently
comes in k-nearest neighbor list of other points and some
points are infrequently comes in k nearest neighbor list of
some other points are called as Anti-hubs.
For outlier detection RNN concept is used in literature, but
there is no theoretical proof which explores the relation
between the Outlier natures of the points and reverses
nearest neighbors. The reverse nearest count is get affected
as the dimensionality of the data increases,sothereisneedto
investigate how outlier detection methods bases on RNN get
affected by the dimensionality of the data.
For outlier detection RNN concept isusedinliterature[2][4],
but there is no theoretical proof which explores the relation
between the Outlier naturesofthepointsandreversenearest
neighbors. Paper [6] states that reverse nearest count is get
affected as the dimensionality of the data increases, so there
is need to investigatehowoutlierdetectionmethodsbaseson
RNN get affected by the dimensionality of the data.
In existing system it takes large computation cost, time to
calculate the reverse nearest neighbors of the all points. Use
of Antihubs for outlier detection is of high computational
task. Computation complexity increases with the data
dimensionality. For this there is scope to removal of
irrelevant features before application of Reverse Nearest
Neighbor. So to overcome this problem, feature selection is
applied on the data. In this step, all features are rank
according to their importance and required features are
selected for finding reverse nearest neighbors. To find
reverse nearest neighbor using Euclidean distance and
outlier score is calculated by using technique from existing
system. According to studies, if system does not know about
the distribution of the data then Euclidean distance is the
best choice. Proposed scheme deals with curse of
dimensionality efficiently. We discussed existing system,
problem statement and proposed scheme with detailed
structure and algorithms.
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056
Volume: 04 Issue: 01 | Jan -2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1351
2. Literature Survey
Hawkins et al [1] the quality of data has problem observed in
OBIS data binding processes is discussed. DBSCAN a density
based clustering algorithm for large spatial database is
employed to identify outliers. The algorithmistobeeffective
and efficient for this purpose. The relationship between
outliers and erroneous data points are discussed and the
future scope to develop an operational data quality checking
tool based on this algorithm is discussed.
V. Hautamaki et al [2] used reverse nearest neighbor countis
to score outlier nature of the point. User defined threshold
was used to take decision about outlier nature of the point.
Method proposed in this paper is named as Outlier Detection
using in degree Number (ODIN). If score is less than
threshold then the point is said to be an outlierotherwiseitis
normal point. The link between the reverse nearestneighbor
count and outlier nature of the point investigate by this
paper.
J. Lin et al [3] were special case of ODIN [2] where point was
considered outlier if reverse nearest neighbor count of the
point is zero. In this paper does not provide any
mathematical explanation or proof why point which has
reverses nearest neighbor count is outlier. They mainly
focused on the speed and scalability.
The method to find reverses nearest neighbor of the point in
metric spaces described by Y. Tao et al [4]. Proposed
algorithms do not necessitate representationoftheinstances
i.e. objects. Proposed techniqueusesmetricindexthereforeit
affirms by recurring to the insertion/deletion operations of
the index.
C. Lijun et al [5] explored the relation between outlier and
RNN but there was no re-search study how high
dimensionality was connected with reverse nearest
neighbors. They focused on data stream application and
reducing execution timefor finding reversenearestneighbor
of point.
Outlier detectionwastheprocessofdiscoveringobservations
which noticeably deviatesfromotherobservationsandalsoit
was a fundamental approach in data analysis task described
by Gustavo H. Orair et al [6]. Applications range from
financial fraud detection to clinical diagnosis of diseases and
network intrusion detection. They described and evaluated
several distance based outlier detection approaches. They
presented the study to understand the impact of
optimizations strategies and tried to consolidate them.
K. S. Beyer et al [7] tried to finding the effective answers for
the problem of nearestneighbor.Thisproblemisspecified[7]
as, finding the data point that was closest to the query point
by giving anaggregation of points ofdataandaquerypointin
a multidimensional metric space.Theyanalyzedtheeffectsof
dimensionality on Nearest Neighbor queries. They observed
that as there is increase in the dimensionality, the distanceto
the neighbor advances to the distance to the farthest
neighbor. Conducted the experiments to find out the
proportion at which the NN breaks down and also explored
the situations where even on dimensionality NN queries do
not break down.
Many currentsystemsusesdataminingbasedmethodsorthe
methods that are based on signature which depends on
labeled data i.e. supervised training data for the purpose of
intrusion detection described by E. Eskin et al [8]. As such
systems can only detects the intrusions based on previously
identified intrusions, there wasa risk ofattackuntilnewtype
of attack has been manually revised. To train the model
which will detect the attacks, anomaly detection algorithms
based on supervised learning needs purely normal data,
might contain some intrusions. Algorithm may not be able to
identify and predict the future instancesofsuchintrusionsas
it was considered as normal. To address the problem, they
proposed a geometric model for anomaly detection founded
on unsupervised method. For the detection of the outlier,
paper proposed three algorithms. First algorithm was based
on cluster based technique, second was based on k-nearest
neighbor and third was based on support vector machine
based algorithm.
The context of outlier detection,inthisapproachassignedthe
each object with the levelofbeinganoutlierandthisassigned
level i.e. degree of the object was called as Local Outlier
Factor (LOF) explored by H.P.Kriegel et al [9]. In this
approach, they used density based clustering for finding
outliers in multidimensional datasets. Local Outlier factor
means instead of considering an outlier as a dual dimension,
assign object a level to which it was kept apart from the
around neighbors.
Hans-Peter Kriegel et al [10] describe distance based
approaches. This distance based approaches degrades in
performance due to high dimensionality of the data [10]. As
they rely on the distances curse of dimensionality arises the
performance issues. Identification of Density
Hermine N. Akouemo [11] this paper proposed the
combination of two statistical techniques for the detection
and imputation of outliers in time series data. An
autoregressive integrated moving average with exogenous
inputs (ARIMAX) model is used to extract the characteristics
of the time series and to find the residuals. The outliers are
detected by performing hypothesis testing on the residuals
and the anomalous data are imputed using another ARIMAX
model. This paper tests the algorithm using both synthetic
and real data sets and wepresent the analysis andcomments
on those results.
Discoursed issues in outlier detection in the case of eminent
data dimensionality and showed the way outlier detection in
high dimensional data can be made using unsupervised
methods describe In paper [12] Milos Radovanovic et al. It
also enquires how Anti-hubs are associated to the point’s
outlier nature.
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056
Volume: 04 Issue: 01 | Jan -2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1352
Milos Radovanovic et al [13] discusses problems in outlier
detection in high dimensionality and shows that how
unsupervised methods can be used for outlier detection in
high dimensional data 2) investigates how Anti-hubs are
related to outlier nature of the point. Based on the relation
anti-hubs and outlier two methods are proposed for outlier
detection for high and low dimensional data.
2.1 Proposed System
1. Feature Selection
To deal with the Curse of dimensionality proposed systemis
designed. It takes high computationcost,timetocalculatethe
reverse nearest neighbors ofthe all points inexistingsystem.
Feature selection is applied on the data to overcome this
problem. In this step, all features are rank according to their
importance and required features are selected for finding
reverse nearest neighbors. Importance of the feature is
calculated using the Mutual Information (MI) measure.
Mutual Information is one most important feature which
calculates the mutual dependence between two features.
MI (A, B) = ………….. (1)
The mutual information between feature A and feature B
calculated by Equation 1 where PB (b).PA (a) is marginal
probability distribution and PAB (a, b) is joint probability
distribution. To calculate the MI of A, sum of MI of A with all
other features is taken,
MI (A) = (MI (A, i))………………………….. (2)
After calculation of MI values of all features, features with MI
values less than threshold values are discarded from further
process.
2. Find Reverse nearest Neighbor
In this step, data of selected features will be considered for
finding the reverse nearest neighbor. To determine the
reverse nearest neighbor, first k-nearest neighbors of each
point is evaluated. Existing system used Euclidian measure
for calculating the distance between two instances.Euclidian
distance measure
Fig. 1. System Architecture
Works fine for two and three dimensional data but is gets
negatively affected with high dimensionality. According to
studies, if system doesn’t know about the distribution of the
data then Euclidean distance is the best choice. Number of
occurrences of point P in the k nearest neighbor list of the all
other points is called as k-occurrence. Points in the dataset
for which point’s P is k-nearest neighbor are reverse nearest
neighbor forpoint P. From the k-nearest neighborlistofeach
point, reverse nearestneighborlistofeachpointiscalculated.
3. Outlier Score of Each Point
Previous methods than existing system considered k-
occurrence of the point as anoutlier score.Lessk-occurrence
indicates more outlier score of the point. Proposed system
will follow existing system to calculatetheoutlierscoreofthe
point. Sum of k-occurrence score of k-nearest neighbors of
the point P is outlier score of the point P.
Outlier Score (P) = (koccurence(pi))
Where pi is the nearest point of point P If Outlier scores
(P) is larger than the threshold then Point P is considered as
outlier.
3 Implementation details
A. Algorithm
Algorithm AntiHub 2 with feature selection
It works under the following stages
1: Select features
2: Computation of mutual dependence of two random variable
using equation
MI (A, B) = …….. (1)
Equation 1 calculates the mutual information between
feature A and feature B. where PB(b).PA(a) is marginal
probability distribution and PAB(a,b) is joint probability
distribution
3: Then MI of one feature with all other features is computed
using the relation:
MIft = MIftj.............. (2)
Where i,j=1,2,......ft with i!=j and ft is total number offeatures
4: Then MI of each feature is use to rank the feature
5: a = AntiHubdist (D,k)
6: For each i (1,2, …..,n)
7: anni =∑j_ NNdist(k, i) aj where NNdist (k, i) is the set of
indices of k nearest neighbors of xi
8: disc=0
9: For each α (0, step, 2* step ….1)
10: For each i _ (1, 2 … n)
11: cti = (1-α).ai + α.anni
12: cdisc = discScore (ct, p)
13: If cdisc > disc
14: t = ct, disc = cdisc
15. For each i _ (1, 2… n)
16. si= f(ti) where f : R R is a monotone function
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056
Volume: 04 Issue: 01 | Jan -2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1353
B. Mathematical Model
Let, S be Anti hub based fast unsupervised outlier detection
scheme having Input, Processes and Output it can be
represented as,
S = (I, P, O)
Where, I, is a set of inputs given to the System, O is a set of
outputs given by the System,
P is a set of processes in the System.
I = (I1, I2, I3, I4)
I1- is set of input data D with m number of features with n
number of instances.
I2- k for knn
I3- Mutual Information threshold
I4- Outlier score threshold
P= (P1, P2, P3, P4, P5, P6, P7)
P1- Find the Mutual Information between two random
variables A and B
MI (A, B) = (1)
Where
PA (a) is marginal probability distribution and
PAB (a; b) is joint probability distribution
Output will be O1
P2 - Find Mutual Information of Feature
MI (A) = (MI (A, i)) (2)
Features with high MI than threshold MI is selected for
farther process
If MI (Ai) ≥Threshold MI
Then Select Ai
Else discard Ai
P3 - To find the distance between two instances Euclidean
distance is used
Where d (p,q ) is Euclidean distance between p andqpoints,
both points has n dimensions
Output will be O3
P4 - Find k-nearest neighbor of each point
Knn (P) = (p1, p2, p3, p4, .pk)
List of k nearest neighbor points is calculated.
P5 - Find RNN list of each point
RNN (P) = Set of points for which P is in their knn list
P6 - Outlier score of each point
Outlier Score (P) = (koccurrence (pi))
Where k indicates k nearest neighbors of point p
P7 - Outlier detection
If Outlier Score (P) > threshold then P is outlier
O1 - List of MI of among all features in D
O2 - List of selected features
O3 - Euclidean distance
O4 - List of list of knn points for each point
O5 - List of RNN of each point is calculated
O6 - List of outlier score of each point
Let, S be Anti hub based fast unsupervised outlier detection
scheme having Input, Processes and Output it can be
represented as,
S = (I, P, O)
Where, I, is a set of inputs given to the System, O is a set of
outputs given by the System,
P is a set of processes in the System.
I = (I1, I2, I3, I4)
I1- is set of input data D with m number of features with n
number of instances.
I2- k for knn
I3- Mutual Information threshold
I4- Outlier score threshold
P= (P1, P2, P3, P4, P5, P6, P7)
P1- Find the Mutual Information between two random
variables A and B
MI (A, B) = (1)
Where
PA (a) is marginal probability distribution and
PAB (a; b) is joint probability distribution
Output will be O1
P2 - Find Mutual Information of Feature
MI (A) = (MI (A, i)) (2)
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056
Volume: 04 Issue: 01 | Jan -2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1354
Features with high MI than threshold MI is selected for
farther process
If MI (Ai) ≥Threshold MI
Then Select Ai
Else discard Ai
P3 - To find the distance between two instances Euclidean
distance is used
Where d (p,q ) is Euclidean distance between p andqpoints,
both points has n dimensions
Output will be O3
P4 - Find k-nearest neighbor of each point
Knn (P) = (p1, p2, p3, p4, .pk)
List of k nearest neighbor points is calculated.
P5 - Find RNN list of each point
RNN (P) = Set of points for which P is in their knn list
P6 - Outlier score of each point
Outlier Score (P) = (koccurrence (pi))
Where k indicates k nearest neighbors of point p
P7 - Outlier detection
If Outlier Score (P) > threshold then P is outlier
O1 - List of MI of among all features in D
O2 - List of selected features
O3 - Euclidean distance
O4 - List of list of knn points for each point
O5 - List of RNN of each point is calculated
O6 - List of outlier score of each point
C Experimental Setup
The scheme is implemented using Java framework (version
jdk 1.8) on Windows platform. The Net bean IDE (version
8.0.2) is applied as a development tool. The scheme doesn’t
need any particular hardware to run; any standard machine
can be able to run the application.
4. Experimental Results
The reason of the conducting experiments is to check the
effect of feature selection before anti-hub based outlier
detection on high dimensional data. To see the effectiveness
accuracy, memory and time requirement of Antihub based
outlier detection i.e. Antihub2 [13] and Proposed method is
compared. For experiment purpose, we used KDD dataset.
Dataset contains 1050 instances, 42 attributes and 1.456%
outliers. Minor class category considered as outlier class.
Table 1 shows the actual results.
TABLE1.ACCURACY COMPARISON WITH K VARIATION
K Simple Anithub 2 Antihub 2 with
feature selection
10 84.79 % 93.40%
100 84.70 % 90.34 %
500 83.37 % 88.04 %
Fig. 2. Accuracy comparison with k variation
Table2.time comparison with k variation
K Simple Anithub 2
time in sec.
Antihub 2 with
feature selection
time in sec.
10 322 310
100 324 314
500 335 324
Fig. 3. time comparison with k variation
Table3.memory comparison with k variation
K Simple Anithub 2
memory in MB
Antihub 2 with
feature selection
memory in MB
10 25 19
100 37 26
500 48 45
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056
Volume: 04 Issue: 01 | Jan -2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1355
Fig. 4. Memory comparison with k variation
onsider Feature selection selects25dimensionsfrom38
dimensions. If existing system needs 1 unit time to
process all 28 features then proposed system will
required 0.65 unit time. Same as time, memory
requirement will be less than existing system.
5. Conclusion
Exiting method proposed reverse nearest neighbor
outlier detection using anti-hub. But using anti hub for
outlier detection is of large computational task.
Computational complexity increases with the data
dimensionality to avoid thisdiscardingofunwantedfeatures
before application of reverse nearest neighbor is proceed.
Reduces computational task and improves the efficiency of
finding anti-hub and also enhances the anti-hub based
unsupervised outlier detection. Fromactual resultsitisclear
that proposed system maintains the accuracy and also
reduces the time and memory requirement for outlier
detection.
REFERENCES
[1] Hawkins, D.: “Identification of Outliers”, Chapman and
Hall, London, 1980..
[2] V. Hautamaki, I. Karkkainen, and P. Franti,“Outlier
detection using k-nearest neigh-bour graph,” in Proc
17th Int. Conf. Pattern Recognit., vol. 3, 2004, pp. 430-
433.
[3] J. Lin, D. Etter, and D. DeBarr,“Exact and approximate
reverse nearest neighborsearchformultimedia data,”in
Proc 8th SIAM Int. Conf. Data Mining, 2008,pp.656-667.
[4] Y. Tao, M. L. Yiu, and N. Mamoulis, ”Reverse nearest
neighbor search in metric spaces,”IEEE Trans. Knowl.
Data Eng.,vol. 18, no. 9, pp. 1239-1252, Sep. 2006.
[5] C. Lijun, L. Xiyin, Z. Tiejun, Z. Zhongping, and L. Aiyong,
”A data stream outlier delection algorithm based on
reverse k nearest neighbors,” in Proc. 3rd Int. Symp.
Comput.Intell.Des., 2010, pp. 236-239.
[6] Gustavo H. Orair, Carlos H. C. Teixeira, Wagner Meira Jr.,
Ye Wang and Srinivasan-Parthasarathy , ”Distance-
Based Outlier Detection: Consolidation and Renewed
Bear-ing,”2010.
[7] K. S. Beyer, J. Goldstein, R. Ramakrishnan, and U. Shaft,
”When is nearest neighbor meaningful?,”inProc.7thInt.
Conf. Database Theory, 1999, pp. 217-235.
[8] E. Eskin, A. Arnold, M. Prerau, L. Portnoy, and S. Stolfo,
”A geometric framework for unsupervised anomaly
detection: Detecting intrusions in unlabeled data,”in
Proc. Conf. Appl. Data Mining Comput. Security, 2002,
pp. 78-100.
[9] M. Breunig, H.P.Kriegel, R. T. Ng, and J. Sander, ”LOF:
Identifying density-based local outliers,” SIGMOD Rec.,
vol. 29, no. 2, pp. 93-104, 2000.
[10] Hans-Peter Kriegel,MatthiasSchubertandArthurZimek,
”Angle-Based Outlier De-tection in High-dimensional
Data,” 2008.
[11] Hermine N. Akouemo and Richard J. Povinelli “Time
series outlier detection and imputation” Milwaukee,
Wisconsin 53233,july 2014
[12] Milos Radovanovic, AlexandrosNanopoulos, and
MirjanaIvanov , ”Reverse nearest Neighbors in
Unsupervised Distance-Based Outlier Detection,” 2015.
[13] Milos Radovanovic, AlexandrosNanopoulos, and
MirjanaIvanov , ”Reverse nearest Neighbors in
Unsupervised Distance-Based Outlier Detection,” 2015.

More Related Content

What's hot

Research Inventy : International Journal of Engineering and Science is publis...
Research Inventy : International Journal of Engineering and Science is publis...Research Inventy : International Journal of Engineering and Science is publis...
Research Inventy : International Journal of Engineering and Science is publis...researchinventy
 
Software Defect Prediction Using Radial Basis and Probabilistic Neural Networks
Software Defect Prediction Using Radial Basis and Probabilistic Neural NetworksSoftware Defect Prediction Using Radial Basis and Probabilistic Neural Networks
Software Defect Prediction Using Radial Basis and Probabilistic Neural NetworksEditor IJCATR
 
Sentiment Analysis on Twitter Data
Sentiment Analysis on Twitter DataSentiment Analysis on Twitter Data
Sentiment Analysis on Twitter DataIRJET Journal
 
Anomaly Detection using multidimensional reduction Principal Component Analysis
Anomaly Detection using multidimensional reduction Principal Component AnalysisAnomaly Detection using multidimensional reduction Principal Component Analysis
Anomaly Detection using multidimensional reduction Principal Component AnalysisIOSR Journals
 
IRJET- A Review on: Sentiment Polarity Analysis on Twitter Data from Diff...
IRJET-  	  A Review on: Sentiment Polarity Analysis on Twitter Data from Diff...IRJET-  	  A Review on: Sentiment Polarity Analysis on Twitter Data from Diff...
IRJET- A Review on: Sentiment Polarity Analysis on Twitter Data from Diff...IRJET Journal
 
Insights to Problems, Research Trend and Progress in Techniques of Sentiment ...
Insights to Problems, Research Trend and Progress in Techniques of Sentiment ...Insights to Problems, Research Trend and Progress in Techniques of Sentiment ...
Insights to Problems, Research Trend and Progress in Techniques of Sentiment ...IJECEIAES
 
Artificial Neural Content Techniques for Enhanced Intrusion Detection and Pre...
Artificial Neural Content Techniques for Enhanced Intrusion Detection and Pre...Artificial Neural Content Techniques for Enhanced Intrusion Detection and Pre...
Artificial Neural Content Techniques for Enhanced Intrusion Detection and Pre...AM Publications
 
Research Method EMBA chapter 12
Research Method EMBA chapter 12Research Method EMBA chapter 12
Research Method EMBA chapter 12Mazhar Poohlah
 
Research Method EMBA chapter 11
Research Method EMBA chapter 11Research Method EMBA chapter 11
Research Method EMBA chapter 11Mazhar Poohlah
 
Yolinda chiramba Survey Paper
Yolinda chiramba Survey PaperYolinda chiramba Survey Paper
Yolinda chiramba Survey PaperYolinda Chiramba
 
GROUP FUZZY TOPSIS METHODOLOGY IN COMPUTER SECURITY SOFTWARE SELECTION
GROUP FUZZY TOPSIS METHODOLOGY IN COMPUTER SECURITY SOFTWARE SELECTIONGROUP FUZZY TOPSIS METHODOLOGY IN COMPUTER SECURITY SOFTWARE SELECTION
GROUP FUZZY TOPSIS METHODOLOGY IN COMPUTER SECURITY SOFTWARE SELECTIONijfls
 
A novel ensemble modeling for intrusion detection system
A novel ensemble modeling for intrusion detection system A novel ensemble modeling for intrusion detection system
A novel ensemble modeling for intrusion detection system IJECEIAES
 
Controlling informative features for improved accuracy and faster predictions...
Controlling informative features for improved accuracy and faster predictions...Controlling informative features for improved accuracy and faster predictions...
Controlling informative features for improved accuracy and faster predictions...Damian R. Mingle, MBA
 
USING NLP APPROACH FOR ANALYZING CUSTOMER REVIEWS
USING NLP APPROACH FOR ANALYZING CUSTOMER REVIEWSUSING NLP APPROACH FOR ANALYZING CUSTOMER REVIEWS
USING NLP APPROACH FOR ANALYZING CUSTOMER REVIEWScsandit
 
A Defect Prediction Model for Software Product based on ANFIS
A Defect Prediction Model for Software Product based on ANFISA Defect Prediction Model for Software Product based on ANFIS
A Defect Prediction Model for Software Product based on ANFISIJSRD
 
Sentiment Analysis and Classification of Tweets using Data Mining
Sentiment Analysis and Classification of Tweets using Data MiningSentiment Analysis and Classification of Tweets using Data Mining
Sentiment Analysis and Classification of Tweets using Data MiningIRJET Journal
 

What's hot (18)

Research Inventy : International Journal of Engineering and Science is publis...
Research Inventy : International Journal of Engineering and Science is publis...Research Inventy : International Journal of Engineering and Science is publis...
Research Inventy : International Journal of Engineering and Science is publis...
 
Software Defect Prediction Using Radial Basis and Probabilistic Neural Networks
Software Defect Prediction Using Radial Basis and Probabilistic Neural NetworksSoftware Defect Prediction Using Radial Basis and Probabilistic Neural Networks
Software Defect Prediction Using Radial Basis and Probabilistic Neural Networks
 
Sentiment Analysis on Twitter Data
Sentiment Analysis on Twitter DataSentiment Analysis on Twitter Data
Sentiment Analysis on Twitter Data
 
Anomaly Detection using multidimensional reduction Principal Component Analysis
Anomaly Detection using multidimensional reduction Principal Component AnalysisAnomaly Detection using multidimensional reduction Principal Component Analysis
Anomaly Detection using multidimensional reduction Principal Component Analysis
 
IRJET- A Review on: Sentiment Polarity Analysis on Twitter Data from Diff...
IRJET-  	  A Review on: Sentiment Polarity Analysis on Twitter Data from Diff...IRJET-  	  A Review on: Sentiment Polarity Analysis on Twitter Data from Diff...
IRJET- A Review on: Sentiment Polarity Analysis on Twitter Data from Diff...
 
Insights to Problems, Research Trend and Progress in Techniques of Sentiment ...
Insights to Problems, Research Trend and Progress in Techniques of Sentiment ...Insights to Problems, Research Trend and Progress in Techniques of Sentiment ...
Insights to Problems, Research Trend and Progress in Techniques of Sentiment ...
 
Artificial Neural Content Techniques for Enhanced Intrusion Detection and Pre...
Artificial Neural Content Techniques for Enhanced Intrusion Detection and Pre...Artificial Neural Content Techniques for Enhanced Intrusion Detection and Pre...
Artificial Neural Content Techniques for Enhanced Intrusion Detection and Pre...
 
Research Method EMBA chapter 12
Research Method EMBA chapter 12Research Method EMBA chapter 12
Research Method EMBA chapter 12
 
Research Method EMBA chapter 11
Research Method EMBA chapter 11Research Method EMBA chapter 11
Research Method EMBA chapter 11
 
F363941
F363941F363941
F363941
 
Yolinda chiramba Survey Paper
Yolinda chiramba Survey PaperYolinda chiramba Survey Paper
Yolinda chiramba Survey Paper
 
GROUP FUZZY TOPSIS METHODOLOGY IN COMPUTER SECURITY SOFTWARE SELECTION
GROUP FUZZY TOPSIS METHODOLOGY IN COMPUTER SECURITY SOFTWARE SELECTIONGROUP FUZZY TOPSIS METHODOLOGY IN COMPUTER SECURITY SOFTWARE SELECTION
GROUP FUZZY TOPSIS METHODOLOGY IN COMPUTER SECURITY SOFTWARE SELECTION
 
A novel ensemble modeling for intrusion detection system
A novel ensemble modeling for intrusion detection system A novel ensemble modeling for intrusion detection system
A novel ensemble modeling for intrusion detection system
 
Controlling informative features for improved accuracy and faster predictions...
Controlling informative features for improved accuracy and faster predictions...Controlling informative features for improved accuracy and faster predictions...
Controlling informative features for improved accuracy and faster predictions...
 
A45010107
A45010107A45010107
A45010107
 
USING NLP APPROACH FOR ANALYZING CUSTOMER REVIEWS
USING NLP APPROACH FOR ANALYZING CUSTOMER REVIEWSUSING NLP APPROACH FOR ANALYZING CUSTOMER REVIEWS
USING NLP APPROACH FOR ANALYZING CUSTOMER REVIEWS
 
A Defect Prediction Model for Software Product based on ANFIS
A Defect Prediction Model for Software Product based on ANFISA Defect Prediction Model for Software Product based on ANFIS
A Defect Prediction Model for Software Product based on ANFIS
 
Sentiment Analysis and Classification of Tweets using Data Mining
Sentiment Analysis and Classification of Tweets using Data MiningSentiment Analysis and Classification of Tweets using Data Mining
Sentiment Analysis and Classification of Tweets using Data Mining
 

Similar to Unsupervised Distance Based Detection of Outliers by using Anti-hubs

Outlier Detection using Reverse Neares Neighbor for Unsupervised Data
Outlier Detection using Reverse Neares Neighbor for Unsupervised DataOutlier Detection using Reverse Neares Neighbor for Unsupervised Data
Outlier Detection using Reverse Neares Neighbor for Unsupervised Dataijtsrd
 
Outlier Detection Using Unsupervised Learning on High Dimensional Data
Outlier Detection Using Unsupervised Learning on High Dimensional DataOutlier Detection Using Unsupervised Learning on High Dimensional Data
Outlier Detection Using Unsupervised Learning on High Dimensional DataIJERA Editor
 
IRJET - Crime Analysis and Prediction - by using DBSCAN Algorithm
IRJET -  	  Crime Analysis and Prediction - by using DBSCAN AlgorithmIRJET -  	  Crime Analysis and Prediction - by using DBSCAN Algorithm
IRJET - Crime Analysis and Prediction - by using DBSCAN AlgorithmIRJET Journal
 
Data Analytics on Solar Energy Using Hadoop
Data Analytics on Solar Energy Using HadoopData Analytics on Solar Energy Using Hadoop
Data Analytics on Solar Energy Using HadoopIJMERJOURNAL
 
An Efficient Unsupervised AdaptiveAntihub Technique for Outlier Detection in ...
An Efficient Unsupervised AdaptiveAntihub Technique for Outlier Detection in ...An Efficient Unsupervised AdaptiveAntihub Technique for Outlier Detection in ...
An Efficient Unsupervised AdaptiveAntihub Technique for Outlier Detection in ...theijes
 
IRJET- Credit Card Fraud Detection using Isolation Forest
IRJET- Credit Card Fraud Detection using Isolation ForestIRJET- Credit Card Fraud Detection using Isolation Forest
IRJET- Credit Card Fraud Detection using Isolation ForestIRJET Journal
 
Outlier Detection Approaches in Data Mining
Outlier Detection Approaches in Data MiningOutlier Detection Approaches in Data Mining
Outlier Detection Approaches in Data MiningIRJET Journal
 
IRJET- Missing Data Imputation by Evidence Chain
IRJET- Missing Data Imputation by Evidence ChainIRJET- Missing Data Imputation by Evidence Chain
IRJET- Missing Data Imputation by Evidence ChainIRJET Journal
 
A Mixture Model of Hubness and PCA for Detection of Projected Outliers
A Mixture Model of Hubness and PCA for Detection of Projected OutliersA Mixture Model of Hubness and PCA for Detection of Projected Outliers
A Mixture Model of Hubness and PCA for Detection of Projected OutliersZac Darcy
 
A MIXTURE MODEL OF HUBNESS AND PCA FOR DETECTION OF PROJECTED OUTLIERS
A MIXTURE MODEL OF HUBNESS AND PCA FOR DETECTION OF PROJECTED OUTLIERSA MIXTURE MODEL OF HUBNESS AND PCA FOR DETECTION OF PROJECTED OUTLIERS
A MIXTURE MODEL OF HUBNESS AND PCA FOR DETECTION OF PROJECTED OUTLIERSZac Darcy
 
A Mixture Model of Hubness and PCA for Detection of Projected Outliers
A Mixture Model of Hubness and PCA for Detection of Projected OutliersA Mixture Model of Hubness and PCA for Detection of Projected Outliers
A Mixture Model of Hubness and PCA for Detection of Projected OutliersZac Darcy
 
Detection of Outliers in Large Dataset using Distributed Approach
Detection of Outliers in Large Dataset using Distributed ApproachDetection of Outliers in Large Dataset using Distributed Approach
Detection of Outliers in Large Dataset using Distributed ApproachEditor IJMTER
 
IRJET- Probability based Missing Value Imputation Method and its Analysis
IRJET- Probability based Missing Value Imputation Method and its AnalysisIRJET- Probability based Missing Value Imputation Method and its Analysis
IRJET- Probability based Missing Value Imputation Method and its AnalysisIRJET Journal
 
IRJET- Credit Card Fraud Detection Analysis
IRJET- Credit Card Fraud Detection AnalysisIRJET- Credit Card Fraud Detection Analysis
IRJET- Credit Card Fraud Detection AnalysisIRJET Journal
 
Review of Algorithms for Crime Analysis & Prediction
Review of Algorithms for Crime Analysis & PredictionReview of Algorithms for Crime Analysis & Prediction
Review of Algorithms for Crime Analysis & PredictionIRJET Journal
 
Comparison of Data Mining Techniques used in Anomaly Based IDS
Comparison of Data Mining Techniques used in Anomaly Based IDS  Comparison of Data Mining Techniques used in Anomaly Based IDS
Comparison of Data Mining Techniques used in Anomaly Based IDS IRJET Journal
 
Analysis on different Data mining Techniques and algorithms used in IOT
Analysis on different Data mining Techniques and algorithms used in IOTAnalysis on different Data mining Techniques and algorithms used in IOT
Analysis on different Data mining Techniques and algorithms used in IOTIJERA Editor
 
Enhanced DV Hop Localization Algorithm for Wireless Sensor Networks
Enhanced DV Hop Localization Algorithm for Wireless Sensor NetworksEnhanced DV Hop Localization Algorithm for Wireless Sensor Networks
Enhanced DV Hop Localization Algorithm for Wireless Sensor Networksijtsrd
 
An Efficient Approach for Outlier Detection in Wireless Sensor Network
An Efficient Approach for Outlier Detection in Wireless Sensor NetworkAn Efficient Approach for Outlier Detection in Wireless Sensor Network
An Efficient Approach for Outlier Detection in Wireless Sensor NetworkIOSR Journals
 

Similar to Unsupervised Distance Based Detection of Outliers by using Anti-hubs (20)

Outlier Detection using Reverse Neares Neighbor for Unsupervised Data
Outlier Detection using Reverse Neares Neighbor for Unsupervised DataOutlier Detection using Reverse Neares Neighbor for Unsupervised Data
Outlier Detection using Reverse Neares Neighbor for Unsupervised Data
 
Outlier Detection Using Unsupervised Learning on High Dimensional Data
Outlier Detection Using Unsupervised Learning on High Dimensional DataOutlier Detection Using Unsupervised Learning on High Dimensional Data
Outlier Detection Using Unsupervised Learning on High Dimensional Data
 
G44093135
G44093135G44093135
G44093135
 
IRJET - Crime Analysis and Prediction - by using DBSCAN Algorithm
IRJET -  	  Crime Analysis and Prediction - by using DBSCAN AlgorithmIRJET -  	  Crime Analysis and Prediction - by using DBSCAN Algorithm
IRJET - Crime Analysis and Prediction - by using DBSCAN Algorithm
 
Data Analytics on Solar Energy Using Hadoop
Data Analytics on Solar Energy Using HadoopData Analytics on Solar Energy Using Hadoop
Data Analytics on Solar Energy Using Hadoop
 
An Efficient Unsupervised AdaptiveAntihub Technique for Outlier Detection in ...
An Efficient Unsupervised AdaptiveAntihub Technique for Outlier Detection in ...An Efficient Unsupervised AdaptiveAntihub Technique for Outlier Detection in ...
An Efficient Unsupervised AdaptiveAntihub Technique for Outlier Detection in ...
 
IRJET- Credit Card Fraud Detection using Isolation Forest
IRJET- Credit Card Fraud Detection using Isolation ForestIRJET- Credit Card Fraud Detection using Isolation Forest
IRJET- Credit Card Fraud Detection using Isolation Forest
 
Outlier Detection Approaches in Data Mining
Outlier Detection Approaches in Data MiningOutlier Detection Approaches in Data Mining
Outlier Detection Approaches in Data Mining
 
IRJET- Missing Data Imputation by Evidence Chain
IRJET- Missing Data Imputation by Evidence ChainIRJET- Missing Data Imputation by Evidence Chain
IRJET- Missing Data Imputation by Evidence Chain
 
A Mixture Model of Hubness and PCA for Detection of Projected Outliers
A Mixture Model of Hubness and PCA for Detection of Projected OutliersA Mixture Model of Hubness and PCA for Detection of Projected Outliers
A Mixture Model of Hubness and PCA for Detection of Projected Outliers
 
A MIXTURE MODEL OF HUBNESS AND PCA FOR DETECTION OF PROJECTED OUTLIERS
A MIXTURE MODEL OF HUBNESS AND PCA FOR DETECTION OF PROJECTED OUTLIERSA MIXTURE MODEL OF HUBNESS AND PCA FOR DETECTION OF PROJECTED OUTLIERS
A MIXTURE MODEL OF HUBNESS AND PCA FOR DETECTION OF PROJECTED OUTLIERS
 
A Mixture Model of Hubness and PCA for Detection of Projected Outliers
A Mixture Model of Hubness and PCA for Detection of Projected OutliersA Mixture Model of Hubness and PCA for Detection of Projected Outliers
A Mixture Model of Hubness and PCA for Detection of Projected Outliers
 
Detection of Outliers in Large Dataset using Distributed Approach
Detection of Outliers in Large Dataset using Distributed ApproachDetection of Outliers in Large Dataset using Distributed Approach
Detection of Outliers in Large Dataset using Distributed Approach
 
IRJET- Probability based Missing Value Imputation Method and its Analysis
IRJET- Probability based Missing Value Imputation Method and its AnalysisIRJET- Probability based Missing Value Imputation Method and its Analysis
IRJET- Probability based Missing Value Imputation Method and its Analysis
 
IRJET- Credit Card Fraud Detection Analysis
IRJET- Credit Card Fraud Detection AnalysisIRJET- Credit Card Fraud Detection Analysis
IRJET- Credit Card Fraud Detection Analysis
 
Review of Algorithms for Crime Analysis & Prediction
Review of Algorithms for Crime Analysis & PredictionReview of Algorithms for Crime Analysis & Prediction
Review of Algorithms for Crime Analysis & Prediction
 
Comparison of Data Mining Techniques used in Anomaly Based IDS
Comparison of Data Mining Techniques used in Anomaly Based IDS  Comparison of Data Mining Techniques used in Anomaly Based IDS
Comparison of Data Mining Techniques used in Anomaly Based IDS
 
Analysis on different Data mining Techniques and algorithms used in IOT
Analysis on different Data mining Techniques and algorithms used in IOTAnalysis on different Data mining Techniques and algorithms used in IOT
Analysis on different Data mining Techniques and algorithms used in IOT
 
Enhanced DV Hop Localization Algorithm for Wireless Sensor Networks
Enhanced DV Hop Localization Algorithm for Wireless Sensor NetworksEnhanced DV Hop Localization Algorithm for Wireless Sensor Networks
Enhanced DV Hop Localization Algorithm for Wireless Sensor Networks
 
An Efficient Approach for Outlier Detection in Wireless Sensor Network
An Efficient Approach for Outlier Detection in Wireless Sensor NetworkAn Efficient Approach for Outlier Detection in Wireless Sensor Network
An Efficient Approach for Outlier Detection in Wireless Sensor Network
 

More from IRJET Journal

TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...IRJET Journal
 
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURESTUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTUREIRJET Journal
 
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...IRJET Journal
 
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsEffect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsIRJET Journal
 
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...IRJET Journal
 
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...IRJET Journal
 
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...IRJET Journal
 
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...IRJET Journal
 
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASA REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASIRJET Journal
 
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...IRJET Journal
 
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProP.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProIRJET Journal
 
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...IRJET Journal
 
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemSurvey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemIRJET Journal
 
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesReview on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesIRJET Journal
 
React based fullstack edtech web application
React based fullstack edtech web applicationReact based fullstack edtech web application
React based fullstack edtech web applicationIRJET Journal
 
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...IRJET Journal
 
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.IRJET Journal
 
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...IRJET Journal
 
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignMultistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignIRJET Journal
 
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...IRJET Journal
 

More from IRJET Journal (20)

TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
 
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURESTUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
 
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
 
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsEffect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil Characteristics
 
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
 
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
 
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
 
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
 
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASA REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADAS
 
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
 
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProP.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
 
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
 
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemSurvey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare System
 
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesReview on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridges
 
React based fullstack edtech web application
React based fullstack edtech web applicationReact based fullstack edtech web application
React based fullstack edtech web application
 
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
 
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
 
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
 
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignMultistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
 
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
 

Recently uploaded

Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx959SahilShah
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfAsst.prof M.Gokilavani
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZTE
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
HARMONY IN THE HUMAN BEING - Unit-II UHV-2
HARMONY IN THE HUMAN BEING - Unit-II UHV-2HARMONY IN THE HUMAN BEING - Unit-II UHV-2
HARMONY IN THE HUMAN BEING - Unit-II UHV-2RajaP95
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSCAESB
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
Introduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxIntroduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxvipinkmenon1
 
power system scada applications and uses
power system scada applications and usespower system scada applications and uses
power system scada applications and usesDevarapalliHaritha
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxwendy cai
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxbritheesh05
 

Recently uploaded (20)

Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
HARMONY IN THE HUMAN BEING - Unit-II UHV-2
HARMONY IN THE HUMAN BEING - Unit-II UHV-2HARMONY IN THE HUMAN BEING - Unit-II UHV-2
HARMONY IN THE HUMAN BEING - Unit-II UHV-2
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentation
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
Introduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxIntroduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptx
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
power system scada applications and uses
power system scada applications and usespower system scada applications and uses
power system scada applications and uses
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
 
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptx
 

Unsupervised Distance Based Detection of Outliers by using Anti-hubs

  • 1. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056 Volume: 04 Issue: 01 | Jan -2017 www.irjet.net p-ISSN: 2395-0072 © 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1350 Unsupervised Distance Based Detection of Outliers by using Anti-hubs Mr. Kiran V. Markad, Mr. Kiran M. Moholkar , Mr. Sopan N. Abdal Department of Information Technology D Y PATIL College of Engineering, Ambi Talegaon , Pune, India Prof. Rajani ThiteAssistantProfessor Department of Information Technology D Y PATIL College of Engineering, Ambi Talegaon , Pune, India ---------------------------------------------------------------------***------------------------------------------------------------------- Abstract- Distance based outlier detection methods fails as the dimensionality of the data increases due to all point becomes good outlier. Reason of these issues is irrelevant and redundant features; nearest neighbor of the point P is K points whose distance to point P is less than all other points. Reverse nearest neighbors (RNN) of Point P is the points for which P is in their k nearest neighbor list. Some points are frequently comes in k-nearestneighborlistofanotherpoints and some points are infrequently comes in k nearest neighbor list of different points are called as Anti-hubs. There are many researches of ant hub based unsupervised outlier detection but there is one issue is occurring that high computation cost for finding anti hubs. If the data that having better dimensionality, high computation cost, high computation complexity, time requirementto findantihubs. To remove the unwanted feature and high dimensionality data to make an efficient system results. Feature selection is the process proposed toremoveunwantedfeatureandmake a system more efficient. By applying feature selection this paper extends anti-hub based outlier detection method for high dimensional data. Key Words :- High-Dimensional, Detection of Data Outliers, Reverse nearest Neighbours. 1.INTRODUCTION There are three main types of outlier detection methods namely, unsupervised, semi-supervised and supervised. There is need to find outlier in many application for that we have to study outlier detection analysis. There is Need of availability of correct labels of the instances for Supervised and Semi Supervised outlier detection. Currently unsupervised technique is used widely which does not need label to the instances for outlier detection. Currently the best efficient methodforoutlierdetectionis unsupervised distance base outlier detection method. The normal instances have small amount of distances among them and outliers have large amount of distances among them in distance base outlier detection. As the increase in dimensions of data distances are not useful to find outliers causes every point become an outlier. Therefore, high dimensionality of data is the biggest problemorchallengefor unsupervised distance outlier detection. The base paper can show that unsupervised distance base outlier detection system can handle high dimensionaldata,itcandetectoutlier under specific condition i.e. data should be useful and attributes are meaningful means data should not be noisy then it can be success to handle high dimensional data. K- nearest neighbor (KNN) of the point P is K points whose distance to point P is less than all other points. Reverse nearest neighbors (RNN) of Point P is the points for which P is in their k nearest neighbor list. Some points are frequently comes in k-nearest neighbor list of other points and some points are infrequently comes in k nearest neighbor list of some other points are called as Anti-hubs. For outlier detection RNN concept is used in literature, but there is no theoretical proof which explores the relation between the Outlier natures of the points and reverses nearest neighbors. The reverse nearest count is get affected as the dimensionality of the data increases,sothereisneedto investigate how outlier detection methods bases on RNN get affected by the dimensionality of the data. For outlier detection RNN concept isusedinliterature[2][4], but there is no theoretical proof which explores the relation between the Outlier naturesofthepointsandreversenearest neighbors. Paper [6] states that reverse nearest count is get affected as the dimensionality of the data increases, so there is need to investigatehowoutlierdetectionmethodsbaseson RNN get affected by the dimensionality of the data. In existing system it takes large computation cost, time to calculate the reverse nearest neighbors of the all points. Use of Antihubs for outlier detection is of high computational task. Computation complexity increases with the data dimensionality. For this there is scope to removal of irrelevant features before application of Reverse Nearest Neighbor. So to overcome this problem, feature selection is applied on the data. In this step, all features are rank according to their importance and required features are selected for finding reverse nearest neighbors. To find reverse nearest neighbor using Euclidean distance and outlier score is calculated by using technique from existing system. According to studies, if system does not know about the distribution of the data then Euclidean distance is the best choice. Proposed scheme deals with curse of dimensionality efficiently. We discussed existing system, problem statement and proposed scheme with detailed structure and algorithms.
  • 2. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056 Volume: 04 Issue: 01 | Jan -2017 www.irjet.net p-ISSN: 2395-0072 © 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1351 2. Literature Survey Hawkins et al [1] the quality of data has problem observed in OBIS data binding processes is discussed. DBSCAN a density based clustering algorithm for large spatial database is employed to identify outliers. The algorithmistobeeffective and efficient for this purpose. The relationship between outliers and erroneous data points are discussed and the future scope to develop an operational data quality checking tool based on this algorithm is discussed. V. Hautamaki et al [2] used reverse nearest neighbor countis to score outlier nature of the point. User defined threshold was used to take decision about outlier nature of the point. Method proposed in this paper is named as Outlier Detection using in degree Number (ODIN). If score is less than threshold then the point is said to be an outlierotherwiseitis normal point. The link between the reverse nearestneighbor count and outlier nature of the point investigate by this paper. J. Lin et al [3] were special case of ODIN [2] where point was considered outlier if reverse nearest neighbor count of the point is zero. In this paper does not provide any mathematical explanation or proof why point which has reverses nearest neighbor count is outlier. They mainly focused on the speed and scalability. The method to find reverses nearest neighbor of the point in metric spaces described by Y. Tao et al [4]. Proposed algorithms do not necessitate representationoftheinstances i.e. objects. Proposed techniqueusesmetricindexthereforeit affirms by recurring to the insertion/deletion operations of the index. C. Lijun et al [5] explored the relation between outlier and RNN but there was no re-search study how high dimensionality was connected with reverse nearest neighbors. They focused on data stream application and reducing execution timefor finding reversenearestneighbor of point. Outlier detectionwastheprocessofdiscoveringobservations which noticeably deviatesfromotherobservationsandalsoit was a fundamental approach in data analysis task described by Gustavo H. Orair et al [6]. Applications range from financial fraud detection to clinical diagnosis of diseases and network intrusion detection. They described and evaluated several distance based outlier detection approaches. They presented the study to understand the impact of optimizations strategies and tried to consolidate them. K. S. Beyer et al [7] tried to finding the effective answers for the problem of nearestneighbor.Thisproblemisspecified[7] as, finding the data point that was closest to the query point by giving anaggregation of points ofdataandaquerypointin a multidimensional metric space.Theyanalyzedtheeffectsof dimensionality on Nearest Neighbor queries. They observed that as there is increase in the dimensionality, the distanceto the neighbor advances to the distance to the farthest neighbor. Conducted the experiments to find out the proportion at which the NN breaks down and also explored the situations where even on dimensionality NN queries do not break down. Many currentsystemsusesdataminingbasedmethodsorthe methods that are based on signature which depends on labeled data i.e. supervised training data for the purpose of intrusion detection described by E. Eskin et al [8]. As such systems can only detects the intrusions based on previously identified intrusions, there wasa risk ofattackuntilnewtype of attack has been manually revised. To train the model which will detect the attacks, anomaly detection algorithms based on supervised learning needs purely normal data, might contain some intrusions. Algorithm may not be able to identify and predict the future instancesofsuchintrusionsas it was considered as normal. To address the problem, they proposed a geometric model for anomaly detection founded on unsupervised method. For the detection of the outlier, paper proposed three algorithms. First algorithm was based on cluster based technique, second was based on k-nearest neighbor and third was based on support vector machine based algorithm. The context of outlier detection,inthisapproachassignedthe each object with the levelofbeinganoutlierandthisassigned level i.e. degree of the object was called as Local Outlier Factor (LOF) explored by H.P.Kriegel et al [9]. In this approach, they used density based clustering for finding outliers in multidimensional datasets. Local Outlier factor means instead of considering an outlier as a dual dimension, assign object a level to which it was kept apart from the around neighbors. Hans-Peter Kriegel et al [10] describe distance based approaches. This distance based approaches degrades in performance due to high dimensionality of the data [10]. As they rely on the distances curse of dimensionality arises the performance issues. Identification of Density Hermine N. Akouemo [11] this paper proposed the combination of two statistical techniques for the detection and imputation of outliers in time series data. An autoregressive integrated moving average with exogenous inputs (ARIMAX) model is used to extract the characteristics of the time series and to find the residuals. The outliers are detected by performing hypothesis testing on the residuals and the anomalous data are imputed using another ARIMAX model. This paper tests the algorithm using both synthetic and real data sets and wepresent the analysis andcomments on those results. Discoursed issues in outlier detection in the case of eminent data dimensionality and showed the way outlier detection in high dimensional data can be made using unsupervised methods describe In paper [12] Milos Radovanovic et al. It also enquires how Anti-hubs are associated to the point’s outlier nature.
  • 3. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056 Volume: 04 Issue: 01 | Jan -2017 www.irjet.net p-ISSN: 2395-0072 © 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1352 Milos Radovanovic et al [13] discusses problems in outlier detection in high dimensionality and shows that how unsupervised methods can be used for outlier detection in high dimensional data 2) investigates how Anti-hubs are related to outlier nature of the point. Based on the relation anti-hubs and outlier two methods are proposed for outlier detection for high and low dimensional data. 2.1 Proposed System 1. Feature Selection To deal with the Curse of dimensionality proposed systemis designed. It takes high computationcost,timetocalculatethe reverse nearest neighbors ofthe all points inexistingsystem. Feature selection is applied on the data to overcome this problem. In this step, all features are rank according to their importance and required features are selected for finding reverse nearest neighbors. Importance of the feature is calculated using the Mutual Information (MI) measure. Mutual Information is one most important feature which calculates the mutual dependence between two features. MI (A, B) = ………….. (1) The mutual information between feature A and feature B calculated by Equation 1 where PB (b).PA (a) is marginal probability distribution and PAB (a, b) is joint probability distribution. To calculate the MI of A, sum of MI of A with all other features is taken, MI (A) = (MI (A, i))………………………….. (2) After calculation of MI values of all features, features with MI values less than threshold values are discarded from further process. 2. Find Reverse nearest Neighbor In this step, data of selected features will be considered for finding the reverse nearest neighbor. To determine the reverse nearest neighbor, first k-nearest neighbors of each point is evaluated. Existing system used Euclidian measure for calculating the distance between two instances.Euclidian distance measure Fig. 1. System Architecture Works fine for two and three dimensional data but is gets negatively affected with high dimensionality. According to studies, if system doesn’t know about the distribution of the data then Euclidean distance is the best choice. Number of occurrences of point P in the k nearest neighbor list of the all other points is called as k-occurrence. Points in the dataset for which point’s P is k-nearest neighbor are reverse nearest neighbor forpoint P. From the k-nearest neighborlistofeach point, reverse nearestneighborlistofeachpointiscalculated. 3. Outlier Score of Each Point Previous methods than existing system considered k- occurrence of the point as anoutlier score.Lessk-occurrence indicates more outlier score of the point. Proposed system will follow existing system to calculatetheoutlierscoreofthe point. Sum of k-occurrence score of k-nearest neighbors of the point P is outlier score of the point P. Outlier Score (P) = (koccurence(pi)) Where pi is the nearest point of point P If Outlier scores (P) is larger than the threshold then Point P is considered as outlier. 3 Implementation details A. Algorithm Algorithm AntiHub 2 with feature selection It works under the following stages 1: Select features 2: Computation of mutual dependence of two random variable using equation MI (A, B) = …….. (1) Equation 1 calculates the mutual information between feature A and feature B. where PB(b).PA(a) is marginal probability distribution and PAB(a,b) is joint probability distribution 3: Then MI of one feature with all other features is computed using the relation: MIft = MIftj.............. (2) Where i,j=1,2,......ft with i!=j and ft is total number offeatures 4: Then MI of each feature is use to rank the feature 5: a = AntiHubdist (D,k) 6: For each i (1,2, …..,n) 7: anni =∑j_ NNdist(k, i) aj where NNdist (k, i) is the set of indices of k nearest neighbors of xi 8: disc=0 9: For each α (0, step, 2* step ….1) 10: For each i _ (1, 2 … n) 11: cti = (1-α).ai + α.anni 12: cdisc = discScore (ct, p) 13: If cdisc > disc 14: t = ct, disc = cdisc 15. For each i _ (1, 2… n) 16. si= f(ti) where f : R R is a monotone function
  • 4. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056 Volume: 04 Issue: 01 | Jan -2017 www.irjet.net p-ISSN: 2395-0072 © 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1353 B. Mathematical Model Let, S be Anti hub based fast unsupervised outlier detection scheme having Input, Processes and Output it can be represented as, S = (I, P, O) Where, I, is a set of inputs given to the System, O is a set of outputs given by the System, P is a set of processes in the System. I = (I1, I2, I3, I4) I1- is set of input data D with m number of features with n number of instances. I2- k for knn I3- Mutual Information threshold I4- Outlier score threshold P= (P1, P2, P3, P4, P5, P6, P7) P1- Find the Mutual Information between two random variables A and B MI (A, B) = (1) Where PA (a) is marginal probability distribution and PAB (a; b) is joint probability distribution Output will be O1 P2 - Find Mutual Information of Feature MI (A) = (MI (A, i)) (2) Features with high MI than threshold MI is selected for farther process If MI (Ai) ≥Threshold MI Then Select Ai Else discard Ai P3 - To find the distance between two instances Euclidean distance is used Where d (p,q ) is Euclidean distance between p andqpoints, both points has n dimensions Output will be O3 P4 - Find k-nearest neighbor of each point Knn (P) = (p1, p2, p3, p4, .pk) List of k nearest neighbor points is calculated. P5 - Find RNN list of each point RNN (P) = Set of points for which P is in their knn list P6 - Outlier score of each point Outlier Score (P) = (koccurrence (pi)) Where k indicates k nearest neighbors of point p P7 - Outlier detection If Outlier Score (P) > threshold then P is outlier O1 - List of MI of among all features in D O2 - List of selected features O3 - Euclidean distance O4 - List of list of knn points for each point O5 - List of RNN of each point is calculated O6 - List of outlier score of each point Let, S be Anti hub based fast unsupervised outlier detection scheme having Input, Processes and Output it can be represented as, S = (I, P, O) Where, I, is a set of inputs given to the System, O is a set of outputs given by the System, P is a set of processes in the System. I = (I1, I2, I3, I4) I1- is set of input data D with m number of features with n number of instances. I2- k for knn I3- Mutual Information threshold I4- Outlier score threshold P= (P1, P2, P3, P4, P5, P6, P7) P1- Find the Mutual Information between two random variables A and B MI (A, B) = (1) Where PA (a) is marginal probability distribution and PAB (a; b) is joint probability distribution Output will be O1 P2 - Find Mutual Information of Feature MI (A) = (MI (A, i)) (2)
  • 5. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056 Volume: 04 Issue: 01 | Jan -2017 www.irjet.net p-ISSN: 2395-0072 © 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1354 Features with high MI than threshold MI is selected for farther process If MI (Ai) ≥Threshold MI Then Select Ai Else discard Ai P3 - To find the distance between two instances Euclidean distance is used Where d (p,q ) is Euclidean distance between p andqpoints, both points has n dimensions Output will be O3 P4 - Find k-nearest neighbor of each point Knn (P) = (p1, p2, p3, p4, .pk) List of k nearest neighbor points is calculated. P5 - Find RNN list of each point RNN (P) = Set of points for which P is in their knn list P6 - Outlier score of each point Outlier Score (P) = (koccurrence (pi)) Where k indicates k nearest neighbors of point p P7 - Outlier detection If Outlier Score (P) > threshold then P is outlier O1 - List of MI of among all features in D O2 - List of selected features O3 - Euclidean distance O4 - List of list of knn points for each point O5 - List of RNN of each point is calculated O6 - List of outlier score of each point C Experimental Setup The scheme is implemented using Java framework (version jdk 1.8) on Windows platform. The Net bean IDE (version 8.0.2) is applied as a development tool. The scheme doesn’t need any particular hardware to run; any standard machine can be able to run the application. 4. Experimental Results The reason of the conducting experiments is to check the effect of feature selection before anti-hub based outlier detection on high dimensional data. To see the effectiveness accuracy, memory and time requirement of Antihub based outlier detection i.e. Antihub2 [13] and Proposed method is compared. For experiment purpose, we used KDD dataset. Dataset contains 1050 instances, 42 attributes and 1.456% outliers. Minor class category considered as outlier class. Table 1 shows the actual results. TABLE1.ACCURACY COMPARISON WITH K VARIATION K Simple Anithub 2 Antihub 2 with feature selection 10 84.79 % 93.40% 100 84.70 % 90.34 % 500 83.37 % 88.04 % Fig. 2. Accuracy comparison with k variation Table2.time comparison with k variation K Simple Anithub 2 time in sec. Antihub 2 with feature selection time in sec. 10 322 310 100 324 314 500 335 324 Fig. 3. time comparison with k variation Table3.memory comparison with k variation K Simple Anithub 2 memory in MB Antihub 2 with feature selection memory in MB 10 25 19 100 37 26 500 48 45
  • 6. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056 Volume: 04 Issue: 01 | Jan -2017 www.irjet.net p-ISSN: 2395-0072 © 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1355 Fig. 4. Memory comparison with k variation onsider Feature selection selects25dimensionsfrom38 dimensions. If existing system needs 1 unit time to process all 28 features then proposed system will required 0.65 unit time. Same as time, memory requirement will be less than existing system. 5. Conclusion Exiting method proposed reverse nearest neighbor outlier detection using anti-hub. But using anti hub for outlier detection is of large computational task. Computational complexity increases with the data dimensionality to avoid thisdiscardingofunwantedfeatures before application of reverse nearest neighbor is proceed. Reduces computational task and improves the efficiency of finding anti-hub and also enhances the anti-hub based unsupervised outlier detection. Fromactual resultsitisclear that proposed system maintains the accuracy and also reduces the time and memory requirement for outlier detection. REFERENCES [1] Hawkins, D.: “Identification of Outliers”, Chapman and Hall, London, 1980.. [2] V. Hautamaki, I. Karkkainen, and P. Franti,“Outlier detection using k-nearest neigh-bour graph,” in Proc 17th Int. Conf. Pattern Recognit., vol. 3, 2004, pp. 430- 433. [3] J. Lin, D. Etter, and D. DeBarr,“Exact and approximate reverse nearest neighborsearchformultimedia data,”in Proc 8th SIAM Int. Conf. Data Mining, 2008,pp.656-667. [4] Y. Tao, M. L. Yiu, and N. Mamoulis, ”Reverse nearest neighbor search in metric spaces,”IEEE Trans. Knowl. Data Eng.,vol. 18, no. 9, pp. 1239-1252, Sep. 2006. [5] C. Lijun, L. Xiyin, Z. Tiejun, Z. Zhongping, and L. Aiyong, ”A data stream outlier delection algorithm based on reverse k nearest neighbors,” in Proc. 3rd Int. Symp. Comput.Intell.Des., 2010, pp. 236-239. [6] Gustavo H. Orair, Carlos H. C. Teixeira, Wagner Meira Jr., Ye Wang and Srinivasan-Parthasarathy , ”Distance- Based Outlier Detection: Consolidation and Renewed Bear-ing,”2010. [7] K. S. Beyer, J. Goldstein, R. Ramakrishnan, and U. Shaft, ”When is nearest neighbor meaningful?,”inProc.7thInt. Conf. Database Theory, 1999, pp. 217-235. [8] E. Eskin, A. Arnold, M. Prerau, L. Portnoy, and S. Stolfo, ”A geometric framework for unsupervised anomaly detection: Detecting intrusions in unlabeled data,”in Proc. Conf. Appl. Data Mining Comput. Security, 2002, pp. 78-100. [9] M. Breunig, H.P.Kriegel, R. T. Ng, and J. Sander, ”LOF: Identifying density-based local outliers,” SIGMOD Rec., vol. 29, no. 2, pp. 93-104, 2000. [10] Hans-Peter Kriegel,MatthiasSchubertandArthurZimek, ”Angle-Based Outlier De-tection in High-dimensional Data,” 2008. [11] Hermine N. Akouemo and Richard J. Povinelli “Time series outlier detection and imputation” Milwaukee, Wisconsin 53233,july 2014 [12] Milos Radovanovic, AlexandrosNanopoulos, and MirjanaIvanov , ”Reverse nearest Neighbors in Unsupervised Distance-Based Outlier Detection,” 2015. [13] Milos Radovanovic, AlexandrosNanopoulos, and MirjanaIvanov , ”Reverse nearest Neighbors in Unsupervised Distance-Based Outlier Detection,” 2015.