Local Outlier Factor
Lab Report: Lab Development and Application of Data Mining and
Learning Systems 2015
Amr Koura
Abstract: Outlier detection has become an important problem in many real-world applications. In some applications, such as intrusion detection, finding outliers is more important than finding common patterns. In this paper, we discuss one of the outlier detection algorithms, the "LOF: Local Outlier Factor" algorithm. We present the algorithm in two modes: the first is "batch mode", where the input data set is known in advance, while the second is "incremental mode", where outliers must be detected on the fly while streaming data is being received. The paper also presents the implementation details for both modes and the integration with the open-source data mining project "realKD". In the first part, on LOF batch mode, the paper provides a theoretical explanation of the algorithm and discusses its implementation details. In the second part, on the incremental mode, the paper shows how the algorithm computes outliers efficiently, such that insertion and deletion of points affect only a limited number of nearest neighbors and do not depend on the total number of points N in the data set. The implementation details and the integration with the realKD library are also discussed.
1 Introduction
Knowledge discovery in databases (KDD) focuses on identifying understandable knowledge from existing data. Most KDD algorithms concentrate on computing patterns that match a large portion of the objects in a data set. However, in applications like intrusion detection, detecting rare events that deviate from the majority is more important than identifying common patterns.
Most outlier detection algorithms rely on a clustering algorithm: for clustering algorithms, outliers are the points that reside outside the clusters and are considered noise. This approach depends heavily on the particular clustering algorithm and its parameters. In fact, there are very few algorithms that are directly concerned with outlier detection.
Most outlier detection algorithms treat being an outlier as a binary property, so each point is either classified as an outlier or not. The Local Outlier Factor is an algorithm that quantifies the tendency of a point to be an outlier: it computes a local outlier factor for each example that expresses this tendency.
The algorithm is based on density, as in density-based clustering: it computes the density of each point and compares it to the density of its neighbors. The outlier factor is local in the sense that only a restricted neighborhood of each point is taken into account.
Online detection of outliers plays an important role in many streaming applications. Automated identification of outliers in data streams is a hot research topic with many uses in modern applications such as security, image, and multimedia analysis. The incremental mode of the LOF algorithm can detect outliers efficiently: incremental LOF provides performance equivalent to batch LOF and has O(N log N) time complexity, where N is the total number of points in the data set. The paper runs experiments on insertion and deletion of points using incremental LOF and compares the results with those obtained from the batch mode test.
The paper also discusses the implementation details for batch and incremental mode and shows how to integrate the code with the open-source project "realKD".
2 Related Work
Most previous KDD outlier detection research built outlier detection approaches on top of clustering algorithms. Those algorithms were optimized for clustering and treated outliers as noise. There was a need for algorithms designed solely to identify outliers.
The paper "LOF: Identifying Density-based Local Outliers" [1] was one of the very first papers to study the LOF algorithm. In that paper, the authors explain the theory behind the LOF algorithm and the equations that define it.
The paper illustrates the problem with previous distance-based algorithms using figure 1. In this figure, you can see a data set with a dense cluster C2, a less dense cluster C1, and two objects O1 and O2. According to the paper, if the distance between each object in C1 and its nearest neighbors is larger than the distance between O2 and C2, then we cannot find a minimum distance dmin that classifies O2 as an outlier without also classifying the objects in C1 as outliers. LOF solves this problem because it takes a local view of the points, not a global view as the distance-based approaches do. Our implementation of the batch mode is entirely based on the equations provided in [1].
The demand for outlier detection in data stream applications has made it an increasingly important and active research area. The incremental mode of the LOF algorithm in this paper is based on [2]. In that paper, the author presents an efficient algorithm for detecting outliers in data stream applications and shows that the insertion or deletion of a point affects only a limited number of neighbors and does not depend on the total number of points N in the data set. In this paper we use the same algorithms and try them on a real example. Our implementation of the incremental mode of LOF is based on the algorithms provided in [2].

Figure 1: distance-based algorithm (http://www.dbs.ifi.lmu.de/Publikationen/Papers/LOF.pdf)
Our implementation integrates with the open-source project "realKD". realKD is an open-source Java library designed to help users apply KDD algorithms to discover real knowledge from real data.
The repository for the realKD library is "https://bitbucket.org/realKD/realkd/wiki/Home". realKD is published under the MIT license, and many algorithms are already implemented in it. An outlier detection algorithm based on support vector machines (SVM) had been implemented previously, and our algorithm is intended to be the second outlier detection algorithm in the library. In the next section we discuss the interfaces we implement in order to integrate successfully, and we also describe how the user can call our algorithm with parameters from the command line interface.
3 Local Outlier Factor Algorithm
The Local Outlier Factor algorithm is based on the concept of local density: the algorithm compares the density of each point with the density of its nearest neighbors. In the following subsections, the paper discusses the theory behind the algorithm, the implementation details, and the integration. The first subsection deals with the "batch" mode, while the second deals with the "incremental" mode.
3.1 LOF Batch Mode:
In this section, we discuss the theory behind LOF batch mode and the implementation and integration details for the algorithm. The details of the algorithm are based on [1].
3.1.1 Formal Definition:
Let k-distance(p) be the distance from the object p to its k-th nearest neighbor. The set of the k nearest neighbors includes all objects at this distance, and can therefore contain more than k objects. We denote the set of k nearest neighbors by N_k(p):

\[ N_k(p) = \{\, q \in D \setminus \{p\} \mid d(q, p) \le k\text{-}distance(p) \,\} \]
Deļ¬nition: reachability distance of an object p w.r.t object o:
Let K be integer number. the reachability distance of an object p with respect to
object o is deļ¬ned as:
reach āˆ’ distk(p, o) = max{k āˆ’ distance(o), d(p, o)}
Figure 2: reach-dist(p1,o) and reach-dist(p2,o), for k=4 (http://www.dbs.ifi.lmu.de/Publikationen/Papers/LOF.pdf)

Figure 2 illustrates the idea of the reachability distance between objects p and o. If object p is far away from object o, like p2, then the reachability distance equals the distance between p and o, d(p,o). But if object p is close to o, like p1, then the reachability distance equals the k-distance of object o. This shows the importance of the parameter k: the higher the value of k, the more similar the reachability distances for objects within the same neighborhood.
Deļ¬nition: local reachability density of object p:
lrdK(p) = 1/
oāˆˆNK (p)
reachāˆ’distK (p,o)
|NK (p)|
the Local reachability density of object p is the inverse of average reachability distance
based on K nearest neighbor of p.
Deļ¬nition: local outlier factor of an object p:
4
The local outlier factor of an object p is deļ¬ned as:
LOFK(p) =
oāˆˆNK (p)
lrdK (o)
lrdK (p)
|NK (p)|
The LOF value for an object gives the tendency of that object to be an outlier.The
LOF value has a very special properties that can help to detect the outlier. The LOF
value for objects that exists deep inside the cluster is approximately equals 1, but LOF
values objects outside the cluster has values larger than 1. proof [1]
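Putting the three definitions together, batch LOF can be sketched in a few lines. This is an illustrative Python version (ties at the k-distance are truncated to exactly k neighbours for brevity), not the realKD code:

```python
import math

def lof_batch(pts, k):
    """Compute LOF_k for every point: k-NN, then lrd, then the lrd ratio."""
    n = len(pts)
    # k nearest neighbours of each point (indices), by ascending distance
    nbrs = [sorted((j for j in range(n) if j != i),
                   key=lambda j: math.dist(pts[i], pts[j]))[:k]
            for i in range(n)]
    # k-distance(i) = distance to the k-th neighbour
    kdist = [math.dist(pts[i], pts[nbrs[i][-1]]) for i in range(n)]

    def lrd(i):  # inverse of the average reachability distance
        total = sum(max(kdist[j], math.dist(pts[i], pts[j])) for j in nbrs[i])
        return len(nbrs[i]) / total

    return [sum(lrd(j) for j in nbrs[i]) / (len(nbrs[i]) * lrd(i))
            for i in range(n)]
```

For four points on a unit square plus one far-away point, the square's corners get LOF approximately 1 and the far point a value well above 1, mirroring the Cairo example in section 4.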
3.1.2 Implementation Details
In this subsection, we present our implementation details for the batch mode LOF algorithm.
The class ā€LOFOutlierā€ in package ā€de.unibonn.realkd.algorithms.outlier.LOFā€ con-
tains all the details code to implement the algorithm equation. The class extends from
abstract class ā€AbstractMiningAlgorithmā€ which is the class that contains the logic to
be called from the realKD framework. we maintain N*N matrix called ā€trainingMatrixā€
that contains the distance between all data in our data set. Another N*N matrix called
ā€sortedTrainingMatrixā€ contains sorted order of indices of data according to their dis-
tance. for example , if the nearest neighbor for data with index 0 is the data with index
2, then index 6, then the ļ¬rst row of this matrix, will be like this:
0 2 6 ...
so for the ļ¬rst data , the nearest neighbor is it self (index:0), then data in index 2,
then data in index 6 , and so on. The reason behind selecting this data structure is
to optimize the speed , so in order to ļ¬nd K-nearest neighbor , or K- inverse nearest
neighbor ā€which we will need in the incremental mode laterā€, it is easy to work with
these matrices to do the task fast.
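The role of the two matrices can be shown with a small sketch (Python here for brevity; the actual fields live in the Java class described above):

```python
import math

def build_matrices(pts):
    """Pairwise distances plus, per row, all point indices sorted by distance,
    mirroring trainingMatrix and sortedTrainingMatrix."""
    n = len(pts)
    dist = [[math.dist(pts[i], pts[j]) for j in range(n)] for i in range(n)]
    order = [sorted(range(n), key=lambda j: dist[i][j]) for i in range(n)]
    return dist, order
```

With these structures, the k nearest neighbors of point i are simply order[i][1:k+1] (position 0 is the point itself), and the reverse k nearest neighbors of i are the points j with i in order[j][1:k+1].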
The main inputs to this algorithm are the data set and the value of the parameter k. The class "FractionOFLOFParameter" is the placeholder for the k value. The main logic lives in the function "concreteCall", which is called from realKD once the user specifies the algorithm name "LOF" and passes the k value. This function calls the "computeLofValues" function, which takes the input data set and the k value and calls the corresponding functions to compute the LRD and LOF for each example in the data set. To test this class, the user should call the program with the following parameters:
RealKD load "Path to input data set" "Path to input attribute File" "Path to input group file" run LOF "specify Numeric target attributes" KValue=[Value of K]
For example:
RealKD load "/Users/XYZ/simpleTestFile/data.txt" "/Users/XYZ/simpleTestFile/attributes.txt" "/Users/XYZ/simpleTestFile/groups.txt" run LOF "Numeric Target attributes=Latitude,Longitude" KValue=3
3.2 Incremental LOF Mode
Designing an incremental LOF algorithm is motivated by two goals. First, the performance of the algorithm should be equivalent to the performance of the iterated "static" LOF algorithm. Second, because data streams are considered infinite, we need an efficient algorithm that can perform insertion and deletion efficiently, independent of the total number of points N; otherwise the performance would be O(N² log N). The paper [2] presents an efficient algorithm for incremental LOF insertion and deletion: on each insertion or deletion operation, the algorithm updates only a limited number of neighbors, so it does not depend on the total number of records N. This improves the complexity compared to iterated static LOF, making it O(N log N) rather than O(N² log N).
In this paper we present the details of the insertion and deletion parts; our implementation is based on the algorithms in [2].
3.2.1 Insertion
In the insertion part, the algorithm must keep track of the points whose k-distance, LRD, and LOF should be updated after inserting the new point.
Figure 3 shows the general framework for inserting a new point in incremental LOF.
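The first step of that framework, finding which existing points are affected at all, can be sketched as follows. This is a simplified Python illustration of the idea, not the realKD code; the full algorithm then propagates lrd and LOF updates outward from this set:

```python
import math

def knn(i, pts, k):
    """Indices of the k nearest neighbours of pts[i]."""
    return sorted((j for j in range(len(pts)) if j != i),
                  key=lambda j: math.dist(pts[i], pts[j]))[:k]

def affected_by_insert(pts, p_new, k):
    """After appending p_new, the points whose k-distance (and hence lrd/LOF)
    must be updated are the reverse k-NN of the new point: exactly those
    points that now count p_new among their k nearest neighbours."""
    all_pts = pts + [p_new]
    new_idx = len(all_pts) - 1
    return [i for i in range(new_idx) if new_idx in knn(i, all_pts, k)]
```

Inserting a point into a tight cluster affects only that cluster's members; a far-away point keeps its neighborhood and its LOF value untouched, which is exactly why the update cost does not grow with N.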
3.2.2 Deletion
In data stream applications, there is a need to delete one or more examples, because of memory limitations and sometimes because these examples become outdated.
As in the insertion part, the deletion part must keep track of the affected examples in order to update their k-distance, LRD, and LOF after deleting the required data example.
Figure 4 shows the framework for deletion.
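The deletion framework is symmetric to insertion: the points that must be updated are the reverse k nearest neighbours of the deleted point, whose k-distance grows once it is gone. A minimal Python sketch of this first step (again illustrative only, not the realKD code):

```python
import math

def knn(i, pts, k):
    """Indices of the k nearest neighbours of pts[i]."""
    return sorted((j for j in range(len(pts)) if j != i),
                  key=lambda j: math.dist(pts[i], pts[j]))[:k]

def affected_by_delete(pts, del_idx, k):
    """Points whose k-distance changes when pts[del_idx] is removed: exactly
    the reverse k-NN of the deleted point; everyone else is untouched."""
    return [i for i in range(len(pts))
            if i != del_idx and del_idx in knn(i, pts, k)]
```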
3.2.3 Implementation Details:
In this section we present the implementation details, the integration, and the command line interface for calling incremental LOF for both insertion and deletion.
Analogous to batch LOF, we create two classes for incremental LOF: first, the "ILOFOutlierAdd" class, which contains the logic of the insertion algorithm, and second, "ILOFOutlierDelete", which contains the logic of the deletion algorithm. For reusability, both classes extend the "LOFOutlier" class, because we need to reuse all the functions that compute LRD and LOF, and we need to use trainingMatrix and sortedTrainingMatrix as well.

Figure 3: incremental LOF insertion (http://www-ai.cs.uni-dortmund.de/LEHRE/FACHPROJEKT/SS12/paper/outlier/pokrajac2007.pdf)
3.2.3.1 Insertion: In the case of insertion, the algorithm needs the k value "which was already implemented in batch mode, so we reuse it" and the new data point to be inserted. The class corresponding to the new parameter is "ILOFNewDataParameter".
Again the function "concreteCall" contains the main logic behind the insertion algorithm. In this function, the code inserts the new example into the data set and updates the LOF values of only a limited number of affected neighbors, as specified in the algorithm.
To test incremental LOF insertion, the user should call the program with the following parameters:
RealKD load "Path to input data set" "Path to input attribute File" "Path to input group file" run LOF "specify Numeric target attributes" KValue=[Value of K] newPoint="string-delimited new example"

Figure 4: incremental LOF deletion (http://www-ai.cs.uni-dortmund.de/LEHRE/FACHPROJEKT/SS12/paper/outlier/pokrajac2007.pdf)
For example, to insert a data sample with attributes "Alexandria", "31.205753", "29.924526", the user executes:
RealKD load "/Users/XYZ/simpleTestFile/data.txt" "/Users/XYZ/simpleTestFile/attributes.txt" "/Users/XYZ/simpleTestFile/groups.txt" run LOF "Numeric Target attributes=Latitude,Longitude" KValue=3 newPoint="Alexandria;31.205753;29.924526"
3.2.3.2 Deletion: In the case of deletion, the algorithm needs the k value "which was already implemented in batch mode, so we reuse it" and the index of the data point to be deleted. The class corresponding to the new parameter is "ILOFDeleteExample".
Again the function "concreteCall" contains the main logic behind the deletion algorithm. In this function, the code deletes the example at the index position passed by the user and updates the LOF values of only a limited number of affected neighbors, as specified in the algorithm.
To test incremental LOF deletion, the user should call the program with the following parameters:
RealKD load "Path to input data set" "Path to input attribute File" "Path to input group file" run LOF "specify Numeric target attributes" KValue=[Value of K] deleteIndex=[index to be deleted]
For example, to delete the fourth data point in the data set "index=3", the user executes:
RealKD load "/Users/XYZ/simpleTestFile/data.txt" "/Users/XYZ/simpleTestFile/attributes.txt" "/Users/XYZ/simpleTestFile/groups.txt" run LOF "Numeric Target attributes=Latitude,Longitude" KValue=3 deleteIndex=3
4 Experiments
In this section, we present an experiment running the algorithm on a simple geographical data set containing German and Egyptian cities with their coordinates (longitude and latitude). For simplicity, we choose the small value k = 3.
4.1 Running Batch Mode
In the data set, we put nine German cities and one Egyptian city with their coordinates "longitude and latitude", and then run the algorithm with k = 3.
Here is the input data set; each record contains city name, latitude, longitude:
Berlin 52.520 13.380
Hamburg 53.550 10.000
Munchen 48.140 11.580
Koln 50.950 6.970
Frankfurt 50.120 8.680
Dortmund 51.510 7.480
Stuttgart 48.790 9.190
Essen 51.470 7.000
Bonn 50.730 7.100
Cairo 30.3 31.14
After running the program, the algorithm computes the LOF value for all cities; we can see that the Egyptian city has a large LOF value. The following is the program output, which shows the index of each city along with its LOF value:
1 1.191001325549464
2 1.1997290645736223
3 0.9628552264586343
4 0.7586643428289646
5 0.7359971660104047
6 0.7495005015494334
7 1.005038367007253
8 0.6938237614108347
9 0.8042180889618675
10 5.521620958353801
From the program output, we see that the object outside the cluster (Cairo) has a LOF value much larger than 1, while all other objects have LOF values approximately equal to 1. Now let's add three more Egyptian cities:
Aswan 25.6833 32.6500
Alexandria 31.13 29.58
Hurghada 27.15 33.50
So now the number of Egyptian cities is 4, and they form their own cluster, since their number is larger than the value of k. When the algorithm asks for the 3 nearest neighbors of an Egyptian city, the list consists of 3 cities from the same cluster, and the LOF values are therefore approximately equal to 1. When we run the program, we get the following output:
1 1.187791873035322
2 1.1898577015355898
3 0.968974318939152
4 0.7563190608212818
5 0.7411657187228445
6 0.7532750824303704
7 0.9872825814658073
8 0.6951376411639997
9 0.8009756608212486
10 0.77008269448957
11 0.7192686329698315
12 0.7903058153557572
13 0.7239698839427493
Now we can see that all points have a LOF value approximately equal to 1, which matches our expectation, as all examples are now within clusters.
4.2 Running Incremental Mode
In this section we run incremental LOF insertion, compute the LOF values afterwards, and compare the result with the result we got from running the batch mode in the previous subsection. We start with the same nine German cities, plus three Egyptian cities:
Cairo 30.3 31.14
Aswan 25.6833 32.6500
Alexandria 31.13 29.58
Then, using incremental LOF insertion, we add the fourth Egyptian city, Hurghada;27.15;33.50, and compute the LOF values for all cities. The user calls the program from the command line with the following parameters:
RealKD load "/Users/XYZ/simpleTestFile/data.txt" "/Users/XYZ/simpleTestFile/attributes.txt" "/Users/XYZ/simpleTestFile/groups.txt" run LOF "Numeric Target attributes=Latitude,Longitude" KValue=3 newPoint="Hurghada;27.15;33.50"
Then, this is the output:
1 1.187791873035322
2 1.1898577015355898
3 0.968974318939152
4 0.7563190608212818
5 0.7411657187228445
6 0.7532750824303704
7 0.9872825814658073
8 0.6951376411639997
9 0.8009756608212486
10 0.77008269448957
11 0.7192686329698315
12 0.7903058153557572
13 0.7239698839427493
So we get the same result from running LOF in batch mode and in incremental mode.
5 Conclusion
In this paper, we have discussed two modes of running the LOF algorithm: batch mode and incremental mode. By running an experiment on a simple 2-d geographical data set, we obtained the results expected from our theoretical understanding.
The algorithm computes a LOF value for each existing data example, and this value expresses the tendency of that data point to be an outlier. Data examples that lie inside a cluster have a LOF value approximately equal to 1, while examples outside the clusters have LOF values larger than 1.
The incremental LOF algorithm has the same performance as the iterated static algorithm but better complexity, as it updates only a limited number of neighbors and does not depend on the total number of examples in the data set.
Our implementation of the batch mode has been integrated into the realKD development branch, while the incremental mode has been tested on a local machine and will soon be merged into the realKD development branch.
More research is required to find optimized ways of computing the k nearest neighbors and the reverse k nearest neighbors to improve performance. More research is also required on selecting a suitable k value and on determining the LOF threshold for identifying outliers when applying the algorithm to real-world data sets.
References
[1] Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng, and Jörg Sander. LOF: Identifying density-based local outliers. SIGMOD Rec., 29(2):93–104, May 2000.
[2] Dragoljub Pokrajac. Incremental local outlier detection for data streams. In Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining, pages 504–515, 2007.
Outlier detection method introduction
Ā 
Analysis on different Data mining Techniques and algorithms used in IOT
Analysis on different Data mining Techniques and algorithms used in IOTAnalysis on different Data mining Techniques and algorithms used in IOT
Analysis on different Data mining Techniques and algorithms used in IOT
Ā 
Recognition of handwritten digits using rbf neural network
Recognition of handwritten digits using rbf neural networkRecognition of handwritten digits using rbf neural network
Recognition of handwritten digits using rbf neural network
Ā 
Recognition of handwritten digits using rbf neural network
Recognition of handwritten digits using rbf neural networkRecognition of handwritten digits using rbf neural network
Recognition of handwritten digits using rbf neural network
Ā 
Computer Vision: Visual Extent of an Object
Computer Vision: Visual Extent of an ObjectComputer Vision: Visual Extent of an Object
Computer Vision: Visual Extent of an Object
Ā 
Algorithmic Analysis to Video Object Tracking and Background Segmentation and...
Algorithmic Analysis to Video Object Tracking and Background Segmentation and...Algorithmic Analysis to Video Object Tracking and Background Segmentation and...
Algorithmic Analysis to Video Object Tracking and Background Segmentation and...
Ā 
MULTIPLE HUMAN TRACKING USING RETINANET FEATURES, SIAMESE NEURAL NETWORK, AND...
MULTIPLE HUMAN TRACKING USING RETINANET FEATURES, SIAMESE NEURAL NETWORK, AND...MULTIPLE HUMAN TRACKING USING RETINANET FEATURES, SIAMESE NEURAL NETWORK, AND...
MULTIPLE HUMAN TRACKING USING RETINANET FEATURES, SIAMESE NEURAL NETWORK, AND...
Ā 
Phenoflow: An Architecture for Computable Phenotypes
Phenoflow: An Architecture for Computable PhenotypesPhenoflow: An Architecture for Computable Phenotypes
Phenoflow: An Architecture for Computable Phenotypes
Ā 
Traffic Features Extraction and Clustering Analysis for Abnormal Behavior Det...
Traffic Features Extraction and Clustering Analysis for Abnormal Behavior Det...Traffic Features Extraction and Clustering Analysis for Abnormal Behavior Det...
Traffic Features Extraction and Clustering Analysis for Abnormal Behavior Det...
Ā 
Lecture06-Arithmetic Code-2-Algorithm Implementation-P2.pdf
Lecture06-Arithmetic Code-2-Algorithm Implementation-P2.pdfLecture06-Arithmetic Code-2-Algorithm Implementation-P2.pdf
Lecture06-Arithmetic Code-2-Algorithm Implementation-P2.pdf
Ā 
Software architacture recovery
Software architacture recoverySoftware architacture recovery
Software architacture recovery
Ā 
IRJET- Autonomous Underwater Vehicle: Electronics and Software Implementation...
IRJET- Autonomous Underwater Vehicle: Electronics and Software Implementation...IRJET- Autonomous Underwater Vehicle: Electronics and Software Implementation...
IRJET- Autonomous Underwater Vehicle: Electronics and Software Implementation...
Ā 
Web spam classification using supervised artificial neural network algorithms
Web spam classification using supervised artificial neural network algorithmsWeb spam classification using supervised artificial neural network algorithms
Web spam classification using supervised artificial neural network algorithms
Ā 
Improvement of Search Algorithm for Integral Distinguisher in Subblock-Based ...
Improvement of Search Algorithm for Integral Distinguisher in Subblock-Based ...Improvement of Search Algorithm for Integral Distinguisher in Subblock-Based ...
Improvement of Search Algorithm for Integral Distinguisher in Subblock-Based ...
Ā 
Project Report
Project ReportProject Report
Project Report
Ā 

More from AMR koura

End-to-end machine learning project in Arabic
End-to-end machine learning project in ArabicEnd-to-end machine learning project in Arabic
End-to-end machine learning project in ArabicAMR koura
Ā 
Neural Network Based Player Retention Prediction in Free to Play Games
Neural Network Based Player Retention Prediction in Free to Play GamesNeural Network Based Player Retention Prediction in Free to Play Games
Neural Network Based Player Retention Prediction in Free to Play GamesAMR koura
Ā 
yelp data challenge
yelp data challengeyelp data challenge
yelp data challengeAMR koura
Ā 
Svm V SVC
Svm V SVCSvm V SVC
Svm V SVCAMR koura
Ā 
Local Outlier Factor
Local Outlier FactorLocal Outlier Factor
Local Outlier FactorAMR koura
Ā 
parameterized complexity for graph Motif
parameterized complexity for graph Motifparameterized complexity for graph Motif
parameterized complexity for graph MotifAMR koura
Ā 

More from AMR koura (6)

End-to-end machine learning project in Arabic
End-to-end machine learning project in ArabicEnd-to-end machine learning project in Arabic
End-to-end machine learning project in Arabic
Ā 
Neural Network Based Player Retention Prediction in Free to Play Games
Neural Network Based Player Retention Prediction in Free to Play GamesNeural Network Based Player Retention Prediction in Free to Play Games
Neural Network Based Player Retention Prediction in Free to Play Games
Ā 
yelp data challenge
yelp data challengeyelp data challenge
yelp data challenge
Ā 
Svm V SVC
Svm V SVCSvm V SVC
Svm V SVC
Ā 
Local Outlier Factor
Local Outlier FactorLocal Outlier Factor
Local Outlier Factor
Ā 
parameterized complexity for graph Motif
parameterized complexity for graph Motifparameterized complexity for graph Motif
parameterized complexity for graph Motif
Ā 

Recently uploaded

Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
Ā 
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxHistory Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxsocialsciencegdgrohi
Ā 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
Ā 
Science lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lessonScience lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lessonJericReyAuditor
Ā 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
Ā 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaVirag Sontakke
Ā 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
Ā 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
Ā 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting DataJhengPantaleon
Ā 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfMahmoud M. Sallam
Ā 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
Ā 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
Ā 
Class 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfClass 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfakmcokerachita
Ā 
call girls in Kamla Market (DELHI) šŸ” >ą¼’9953330565šŸ” genuine Escort Service šŸ”āœ”ļøāœ”ļø
call girls in Kamla Market (DELHI) šŸ” >ą¼’9953330565šŸ” genuine Escort Service šŸ”āœ”ļøāœ”ļøcall girls in Kamla Market (DELHI) šŸ” >ą¼’9953330565šŸ” genuine Escort Service šŸ”āœ”ļøāœ”ļø
call girls in Kamla Market (DELHI) šŸ” >ą¼’9953330565šŸ” genuine Escort Service šŸ”āœ”ļøāœ”ļø9953056974 Low Rate Call Girls In Saket, Delhi NCR
Ā 
ā€œOh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
ā€œOh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...ā€œOh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
ā€œOh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
Ā 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
Ā 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
Ā 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
Ā 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
Ā 

Recently uploaded (20)

Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
Ā 
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxHistory Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
Ā 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
Ā 
Science lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lessonScience lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lesson
Ā 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
Ā 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
Ā 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of India
Ā 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Ā 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
Ā 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
Ā 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdf
Ā 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
Ā 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
Ā 
Class 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfClass 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdf
Ā 
call girls in Kamla Market (DELHI) šŸ” >ą¼’9953330565šŸ” genuine Escort Service šŸ”āœ”ļøāœ”ļø
call girls in Kamla Market (DELHI) šŸ” >ą¼’9953330565šŸ” genuine Escort Service šŸ”āœ”ļøāœ”ļøcall girls in Kamla Market (DELHI) šŸ” >ą¼’9953330565šŸ” genuine Escort Service šŸ”āœ”ļøāœ”ļø
call girls in Kamla Market (DELHI) šŸ” >ą¼’9953330565šŸ” genuine Escort Service šŸ”āœ”ļøāœ”ļø
Ā 
ā€œOh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
ā€œOh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...ā€œOh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
ā€œOh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
Ā 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
Ā 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
Ā 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Ā 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Ā 

Labreport

  • 1. Local Outlier Factor
Lab Report: Lab Development and Application of Data Mining and Learning Systems 2015
Amr Koura

Abstract: Outlier detection has become an important problem in many real-world applications. In some applications, such as intrusion detection, finding outliers is more important than finding common patterns. In this paper we discuss one outlier detection algorithm, "LOF: Local Outlier Factor". We present the algorithm in two modes: "batch mode", where the input data set is known in advance, and "incremental mode", where outliers must be detected on the fly while streaming data is received. The paper also covers the implementation details for both modes and their integration with the open-source data mining project "realKD". In the first part, on batch-mode LOF, the paper gives a theoretical explanation of the algorithm and discusses its implementation details; in the second part, on incremental mode, the paper shows how the algorithm computes outliers efficiently, such that the insertion and deletion of points affect only a limited number of nearest neighbors and do not depend on the total number of points N in the data set. The implementation details and the integration with the realKD library are also discussed.

1 Introduction
Knowledge discovery in databases (KDD) focuses on identifying understandable knowledge in existing data. Most KDD algorithms concentrate on computing patterns that match a large portion of the objects in a data set. However, in applications like intrusion detection, detecting rare events that deviate from the majority is more important than identifying common patterns.
Most outlier detection algorithms rely on a clustering algorithm. For clustering algorithms, outliers are points that reside outside the clusters and are considered noise.
This approach depends heavily on the particular clustering algorithm and its parameters. In fact, there are very few algorithms that are directly concerned with outlier detection.
  • 2. Most outlier detection algorithms treat outlierness as a binary property: points are either classified as outliers or not. The local outlier factor is an algorithm that quantifies the tendency of a point to be an outlier: it computes a local outlier factor for each example that expresses this tendency. The algorithm builds on density-based clustering, computing the density of each point and comparing it with the density of its neighbors. The outlier factor is local in the sense that only a restricted neighborhood of each point is taken into account.
Online detection of outliers plays an important role in many streaming applications. Automated identification of outliers in data streams is an active research topic with many uses in modern applications such as security, image, and multimedia analysis. The incremental mode of the LOF algorithm can detect outliers efficiently: incremental LOF provides performance equivalent to batch-mode LOF and has O(N log N) time complexity, where N is the total number of points in the data set.
The paper runs experiments on the insertion and deletion of points using incremental LOF and compares the results with those obtained from the batch-mode test. The paper also discusses the implementation details for batch and incremental mode and shows how to integrate the code with the open-source project "realKD".

2 Related Work
Most previous KDD outlier detection papers built outlier detection approaches on top of clustering algorithms. Those algorithms were optimized for clustering and treated outliers as noise, so there was a need for algorithms designed solely to identify outliers. "LOF: Identifying Density-based Local Outliers" [1] was one of the very first papers to study the LOF algorithm. In that paper, the authors explain the theory behind the LOF algorithm and derive its equations.
The paper illustrates the problem with earlier distance-based algorithms using figure 1. In this figure you can see a data set with a dense cluster C2, a less dense cluster C1, and two objects o1 and o2. According to the paper, the distance between each object in C1 and its nearest neighbors is larger than the distance between o2 and C2, so we cannot find a minimum distance dmin that classifies o2 as an outlier without also classifying the objects in C1 as outliers. LOF solves this problem because it takes a local view of the points, not a global view as distance-based methods do. Our implementation of the batch mode is based entirely on the equations given in [1].
The demand for outlier detection in data stream applications has become an increasingly important and active research area. The incremental mode of the LOF algorithm in this paper is based on [2]. In that paper, the author presents an efficient algorithm for detecting
  • 3. Figure 1: distance-based algorithm (http://www.dbs.ifi.lmu.de/Publikationen/Papers/LOF.pdf)

outliers in data stream applications and shows that the insertion or deletion of a point depends only on a limited number of neighbors and not on the total number of points N in the data set. In this paper we use the same algorithms and try them on a real example. Our implementation of the incremental mode of LOF is based on the algorithms provided in [2].
Our implementation has to integrate with an open-source project called "realKD". realKD is an open-source Java library designed to help users apply KDD algorithms to discover real knowledge from real data. The repository of the realKD library is https://bitbucket.org/realKD/realkd/wiki/Home. The realKD library is published under the MIT license, and many algorithms are already implemented in it. One outlier detection algorithm, based on support vector machines (SVM), was implemented previously, and our algorithm is intended to be the second outlier detection algorithm in that library. In the next section we discuss the interfaces we implemented in order to integrate successfully, and we also describe how the user can call our algorithm with parameters from the command-line interface.

3 Local Outlier Factor Algorithm
The local outlier factor algorithm is based on the concept of local density: the algorithm compares the density of each point with the density of its nearest neighbors. In the following subsections, the paper discusses the theory behind the algorithm, the implementation details, and the integration. The first subsection deals with the "batch" mode, while the second deals with the "incremental" mode.

3.1 LOF Batch Mode
In this section we discuss the theory behind batch-mode LOF as well as the implementation and integration details. The details of the algorithm
  • 4. are based on [1].

3.1.1 Formal Definitions
Let k-distance(p) be the distance from the object p to its k-th nearest neighbor. The set of k nearest neighbors includes all objects at this distance, and can therefore contain more than k objects. We denote the set of k nearest neighbors of p as N_k(p):

N_k(p) = { q ∈ D \ {p} | d(p, q) ≤ k-distance(p) }

Definition (reachability distance of an object p w.r.t. an object o): Let k be an integer. The reachability distance of an object p with respect to an object o is defined as:

reach-dist_k(p, o) = max{ k-distance(o), d(p, o) }

Figure 2: reach-dist(p1, o) and reach-dist(p2, o) for k = 4 (http://www.dbs.ifi.lmu.de/Publikationen/Papers/LOF.pdf)

Figure 2 illustrates the idea of the reachability distance between objects p and o. If p is far away from o, like p2, then the reachability distance equals the actual distance d(p, o); but if p is close to o, like p1, then the reachability distance equals the k-distance of o. This shows the importance of the parameter k: the higher the value of k, the more similar the reachability distances of objects within the same neighborhood become.

Definition (local reachability density of an object p):

lrd_k(p) = 1 / ( ( Σ_{o ∈ N_k(p)} reach-dist_k(p, o) ) / |N_k(p)| )

The local reachability density of an object p is the inverse of the average reachability distance over the k nearest neighbors of p.

Definition (local outlier factor of an object p):
  • 5. The local outlier factor of an object p is defined as:

LOF_k(p) = ( Σ_{o ∈ N_k(p)} lrd_k(o) / lrd_k(p) ) / |N_k(p)|

The LOF value of an object expresses its tendency to be an outlier, and it has a very useful property for detecting outliers: the LOF value of objects deep inside a cluster is approximately equal to 1, while objects outside the clusters have LOF values larger than 1 (for a proof, see [1]).

3.1.2 Implementation Details
In this subsection we show our implementation of the batch-mode LOF algorithm. The class "LOFOutlier" in the package "de.unibonn.realkd.algorithms.outlier.LOF" contains the code that implements the algorithm's equations. The class extends the abstract class "AbstractMiningAlgorithm", which contains the logic needed for the algorithm to be called from the realKD framework. We maintain an N×N matrix called "trainingMatrix" that holds the pairwise distances between all points in the data set. Another N×N matrix, "sortedTrainingMatrix", contains in each row the indices of the points sorted by their distance to that row's point. For example, if the nearest neighbor of the point with index 0 is the point with index 2, followed by the point with index 6, then the first row of this matrix looks like:

0 2 6 ...

That is, for the first point, the nearest neighbor is itself (index 0), then the point at index 2, then the point at index 6, and so on. The reason for choosing this data structure is speed: finding the k nearest neighbors, or the k inverse nearest neighbors (which we will need later in incremental mode), reduces to fast lookups in these matrices. The main inputs the algorithm needs are the data set and the value of the parameter k. The class "FractionOFLOFParameter" is the placeholder for the k value.
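To make the definitions above concrete, here is a minimal, self-contained batch-mode sketch in Java. It is a brute-force illustration with our own names (LofSketch and its methods are not the realKD LOFOutlier classes), assuming two-dimensional points and Euclidean distance:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Brute-force batch-mode LOF sketch following the equations above.
// O(N^2) neighbor search; illustrative only, not the realKD code.
public class LofSketch {

    static double dist(double[] a, double[] b) {
        double dx = a[0] - b[0], dy = a[1] - b[1];
        return Math.sqrt(dx * dx + dy * dy);
    }

    // k-distance(p): distance from point p to its k-th nearest other point.
    static double kDistance(double[][] d, int p, int k) {
        double[] ds = new double[d.length - 1];
        int j = 0;
        for (int i = 0; i < d.length; i++)
            if (i != p) ds[j++] = dist(d[p], d[i]);
        Arrays.sort(ds);
        return ds[k - 1];
    }

    // N_k(p): all points within k-distance(p); ties can make |N_k(p)| > k.
    static List<Integer> neighbors(double[][] d, int p, int k) {
        double kd = kDistance(d, p, k);
        List<Integer> nk = new ArrayList<>();
        for (int i = 0; i < d.length; i++)
            if (i != p && dist(d[p], d[i]) <= kd) nk.add(i);
        return nk;
    }

    // reach-dist_k(p, o) = max{k-distance(o), d(p, o)}.
    static double reachDist(double[][] d, int p, int o, int k) {
        return Math.max(kDistance(d, o, k), dist(d[p], d[o]));
    }

    // lrd_k(p): inverse of the mean reachability distance over N_k(p).
    static double lrd(double[][] d, int p, int k) {
        List<Integer> nk = neighbors(d, p, k);
        double sum = 0.0;
        for (int o : nk) sum += reachDist(d, p, o, k);
        return nk.size() / sum;
    }

    // LOF_k(p): mean ratio of the neighbors' lrd to p's own lrd.
    static double lof(double[][] d, int p, int k) {
        List<Integer> nk = neighbors(d, p, k);
        double sum = 0.0;
        for (int o : nk) sum += lrd(d, o, k) / lrd(d, p, k);
        return sum / nk.size();
    }
}
```

On a tight cluster of points plus one distant point, the cluster members get LOF values of approximately 1 while the distant point gets a LOF value well above 1, matching the property quoted from [1].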
The main logic lives in the function "concreteCall", which realKD calls once the user specifies the algorithm name "LOF" and passes the k value. This function calls the "computeLofValues" function, which takes the input data set and the k value and calls the corresponding functions to compute the LRD and LOF for each example in the data set. To test this class, the user should call the program with the following parameters:

RealKD load "Path to input data set" "Path to input attribute file" "Path to input group file" run LOF "specify numeric target attributes" KValue=[value of k]

For example:
  • 6. RealKD load "/Users/XYZ/simpleTestFile/data.txt" "/Users/XYZ/simpleTestFile/attributes.txt" "/Users/XYZ/simpleTestFile/groups.txt" run LOF "Numeric Target attributes=Latitude,Longitude" KValue=3

3.2 Incremental LOF Mode
Designing an incremental LOF algorithm is motivated by two goals. First, the performance of the algorithm should be equivalent to the performance of the iterated "static" LOF algorithm. Second, because data streams are considered infinite, we need an efficient algorithm that performs insertion and deletion without depending on the total number of points N; otherwise the overall cost would be O(N² log N). The paper [2] gives an efficient algorithm for incremental-mode LOF insertion and deletion. In each insertion/deletion operation, the algorithm updates a limited number of neighbors, so the cost does not depend on the total number of records N. This improves the complexity compared with iterated static LOF, making it O(N log N) rather than O(N² log N). In this paper we show the details of the insertion and deletion parts; our implementation is based on the algorithms in [2].

3.2.1 Insertion
In the insertion part, the algorithm must keep track of the points whose k-distance, LRD, and LOF have to be updated after inserting the new data point. Figure 3 shows the general framework for inserting a new point with incremental LOF.

3.2.2 Deletion
In data stream applications, there is a need to delete one or more examples, because of memory limitations and sometimes because the examples become outdated. As in the insertion part, the deletion part must keep track of the affected examples and update their k-distance, LRD, and LOF after deleting the requested example. Figure 4 shows the framework for deletion.

3.2.3 Implementation Details
In this section we show the implementation details, the integration, and the command-line interface for calling incremental LOF for both insertion and deletion.
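The key point of the insertion framework in section 3.2.1 is that only the reverse k-nearest neighbors of the new point (the points that now count it among their k nearest neighbors) need their k-distance, and subsequently their LRD and LOF, recomputed. A brute-force sketch of that affected set, with our own illustrative names (not the realKD classes):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of the update scope of an incremental LOF insertion: the
// points whose k-distance changes when q is inserted are exactly
// those with d(p, q) <= their current k-distance, i.e. the reverse
// k-nearest neighbors of q. Their number does not grow with N.
public class IlofInsertScope {

    static double dist(double[] a, double[] b) {
        double dx = a[0] - b[0], dy = a[1] - b[1];
        return Math.sqrt(dx * dx + dy * dy);
    }

    // k-distance of point p within the current data set (brute force).
    static double kDistance(double[][] d, int p, int k) {
        double[] ds = new double[d.length - 1];
        int j = 0;
        for (int i = 0; i < d.length; i++)
            if (i != p) ds[j++] = dist(d[p], d[i]);
        Arrays.sort(ds);
        return ds[k - 1];
    }

    // Indices whose k-distance (and hence LRD/LOF) must be updated
    // after inserting q; every other point is left untouched.
    static List<Integer> affectedByInsert(double[][] d, double[] q, int k) {
        List<Integer> affected = new ArrayList<>();
        for (int p = 0; p < d.length; p++)
            if (dist(d[p], q) <= kDistance(d, p, k)) affected.add(p);
        return affected;
    }
}
```

For example, inserting a point between two close neighbors on a line affects only those two points, regardless of how many points lie further away.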
Analogously to batch LOF, we create two classes for incremental LOF: "ILOFOutlierAdd", which contains the logic of the insertion algorithm, and "ILOFOutlierDelete", which contains the logic of the deletion algorithm. For reusability, both classes extend the "LOFOutlier" class, because we need to reuse all the functions that
  • 7. Figure 3: incremental LOF insertion (http://www-ai.cs.uni-dortmund.de/LEHRE/FACHPROJEKT/SS12/paper/outlier/pokrajac2007.pdf)

compute LRD and LOF, and we also need to use trainingMatrix and sortedTrainingMatrix.

3.2.3.1 Insertion: In the case of insertion, the algorithm needs the k value (already implemented in batch mode, so we reuse it) and the new data point to be inserted. The class that corresponds to this new parameter is "ILOFNewDataParameter". Again, the function "concreteCall" contains the main logic of the insertion algorithm. In this function, the code inserts the new example into the data set and updates the LOF values of only the limited number of affected neighbors, as shown in the algorithm. To test incremental LOF addition, the user should call the program with the following parameters:

RealKD load "Path to input data set" "Path to input attribute file" "Path to input group file" run LOF "specify numeric target attributes" KValue=[value of k] newPoint="semicolon-delimited string of the new example"

For example, to insert a data sample with attributes "Alexandria", "31.205753", "29.924526", the user executes:

RealKD load "/Users/XYZ/simpleTestFile/data.txt" "/Users/XYZ/simpleTestFile/attributes.txt" "/Users/XYZ/simpleTestFile/groups.txt" run LOF "Numeric Target attributes=Latitude,Longitude" KValue=3 newPoint="Alexandria;31.205753;29.924526"

  • 8. Figure 4: incremental LOF deletion (http://www-ai.cs.uni-dortmund.de/LEHRE/FACHPROJEKT/SS12/paper/outlier/pokrajac2007.pdf)

3.2.3.2 Deletion: In the case of deletion, the algorithm needs the k value (again reused from batch mode) and the index of the data point to be deleted. The class that corresponds to this new parameter is "ILOFDeleteExample". Again, the function "concreteCall" contains the main logic of the deletion algorithm. In this function, the code deletes the example at the index position that the user passes and updates the LOF values of only the limited number of affected neighbors, as shown in the algorithm.
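Symmetrically to insertion, the deletion framework of figure 4 only touches the points that counted the deleted example among their k nearest neighbors; their k-distance can only grow, after which their LRD and LOF are refreshed. A brute-force sketch of that affected set, again with our own illustrative names (not the ILOFOutlierDelete class):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of the update scope of an incremental LOF deletion: only
// points that had the deleted point within their k-distance
// (d(p, del) <= k-distance(p)) need k-distance, LRD and LOF
// recomputed, so the cost is independent of N.
public class IlofDeleteScope {

    static double dist(double[] a, double[] b) {
        double dx = a[0] - b[0], dy = a[1] - b[1];
        return Math.sqrt(dx * dx + dy * dy);
    }

    // k-distance of point p within the current data set (brute force).
    static double kDistance(double[][] d, int p, int k) {
        double[] ds = new double[d.length - 1];
        int j = 0;
        for (int i = 0; i < d.length; i++)
            if (i != p) ds[j++] = dist(d[p], d[i]);
        Arrays.sort(ds);
        return ds[k - 1];
    }

    // Indices (other than del itself) whose neighborhood statistics
    // must be refreshed after deleting the point at index del.
    static List<Integer> affectedByDelete(double[][] d, int del, int k) {
        List<Integer> affected = new ArrayList<>();
        for (int p = 0; p < d.length; p++)
            if (p != del && dist(d[p], d[del]) <= kDistance(d, p, k))
                affected.add(p);
        return affected;
    }
}
```

Deleting a point on a line, for instance, affects only its immediate neighbors, not the distant points whose k nearest neighbors are unchanged.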
To test incremental LOF deletion, the user should call the program with the following parameters:

RealKD load "<path to input data file>" "<path to input attribute file>" "<path to input group file>" run LOF "Numeric Target attributes=<attributes>" KValue=<value of K> deleteIndex=<index to be deleted>

For example, to delete the fourth record in the data set (index=3), the user executes:

RealKD load "/Users/XYZ/simpleTestFile/data.txt" "/Users/XYZ/simpleTestFile/attributes.txt" "/Users/XYZ/simpleTestFile/groups.txt" run LOF "Numeric Target attributes=Latitude,Longitude" KValue=3 deleteIndex=3

4 Experiments

In this section, we show an experiment that runs the algorithm against a simple geographical data set containing German and Egyptian cities with their coordinates (latitude and longitude). For simplicity, we select a small K value, K=3.

4.1 Running Batch Mode

We put nine German cities and one Egyptian city with their coordinates into the data set and run the algorithm with K=3. Here is the input data set; each record contains the city name, latitude, and longitude:

Berlin 52.520 13.380
Hamburg 53.550 10.000
Munchen 48.140 11.580
Koln 50.950 6.970
Frankfurt 50.120 8.680
Dortmund 51.510 7.480
Stuttgart 48.790 9.190
Essen 51.470 7.000
Bonn 50.730 7.100
Cairo 30.3 31.14

After running the program, the algorithm has computed an LOF value for every city, and we can see that the Egyptian city has a large LOF value. The following is the program output, which shows the index of each city along with its LOF value:
1 1.191001325549464
2 1.1997290645736223
3 0.9628552264586343
4 0.7586643428289646
5 0.7359971660104047
6 0.7495005015494334
7 1.005038367007253
8 0.6938237614108347
9 0.8042180889618675
10 5.521620958353801

From the program output, we see that the object outside the cluster (Cairo) has an LOF value much larger than 1, while all other objects have LOF values approximately equal to 1. Now let us add three more Egyptian cities:

Aswan 25.6833 32.6500
Alexandria 31.13 29.58
Hurghada 27.15 33.50

The number of Egyptian cities is now 4, so they form their own cluster, since their number is larger than the value of K. When the algorithm asks for the 3 nearest neighbors of an Egyptian city, the list contains 3 cities from the same cluster, and the LOF values become approximately equal to 1. When we run the program, we get the following output:

1 1.187791873035322
2 1.1898577015355898
3 0.968974318939152
4 0.7563190608212818
5 0.7411657187228445
6 0.7532750824303704
7 0.9872825814658073
8 0.6951376411639997
9 0.8009756608212486
10 0.77008269448957
11 0.7192686329698315
12 0.7903058153557572
13 0.7239698839427493

Now we can see that all objects have LOF values approximately equal to 1. This matches our expectation, as all examples now lie within clusters.

4.2 Running Incremental Mode

In this section we run incremental LOF insertion, compute the resulting LOF values, and compare them with the result we got from running the batch
mode in the previous subsection. We start with the same nine German cities plus three Egyptian cities:

Cairo 30.3 31.14
Aswan 25.6833 32.6500
Alexandria 31.13 29.58

Then, by running incremental LOF insertion, we add a fourth Egyptian city, Hurghada;27.15;33.50, and compute the LOF values for all cities. The user calls the program from the command line with the following parameters:

RealKD load "/Users/XYZ/simpleTestFile/data.txt" "/Users/XYZ/simpleTestFile/attributes.txt" "/Users/XYZ/simpleTestFile/groups.txt" run LOF "Numeric Target attributes=Latitude,Longitude" KValue=3 newPoint="Hurghada;27.15;33.50"

This is the output:

1 1.187791873035322
2 1.1898577015355898
3 0.968974318939152
4 0.7563190608212818
5 0.7411657187228445
6 0.7532750824303704
7 0.9872825814658073
8 0.6951376411639997
9 0.8009756608212486
10 0.77008269448957
11 0.7192686329698315
12 0.7903058153557572
13 0.7239698839427493

So we get the same result from running LOF in batch mode and in incremental mode.

5 Conclusion

In this paper we have discussed two modes of running the LOF algorithm: batch mode and incremental mode. By running an experiment on a simple 2-D geographical data set, we got the results expected from our theoretical understanding. The algorithm computes an LOF value for each existing data example, and this value quantifies the tendency of the data point to be an outlier. Data examples that lie inside a cluster have LOF values approximately equal to 1, while examples outside any cluster have LOF values larger than 1.

The incremental LOF algorithm produces the same results as repeatedly running the static algorithm, but with better complexity, as it updates only a limited number of neighbors and does not depend on the total number of examples in the data set.
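The batch computation behind the experiments above can be sketched end-to-end in a few dozen lines. The following is a simplified, self-contained sketch of the standard LOF definitions (it ignores ties at the k-distance and is not the RealKD implementation), run on the ten-city data set of Section 4.1:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class LofSketch {

    // Euclidean distance between two 2-D points (latitude, longitude).
    static double dist(double[] a, double[] b) {
        return Math.hypot(a[0] - b[0], a[1] - b[1]);
    }

    // Indices of the k nearest neighbors of point i.
    // Simplification: ties at the k-distance are ignored.
    static List<Integer> kNN(List<double[]> d, int i, int k) {
        List<Integer> others = new ArrayList<>();
        for (int j = 0; j < d.size(); j++) if (j != i) others.add(j);
        others.sort(Comparator.comparingDouble(j -> dist(d.get(i), d.get(j))));
        return new ArrayList<>(others.subList(0, k));
    }

    // k-distance of point i: distance to its k-th nearest neighbor.
    static double kDist(List<double[]> d, int i, int k) {
        return dist(d.get(i), d.get(kNN(d, i, k).get(k - 1)));
    }

    // Local reachability density: inverse of the average reachability
    // distance from i to its k nearest neighbors.
    static double lrd(List<double[]> d, int i, int k) {
        double sum = 0;
        for (int j : kNN(d, i, k)) {
            sum += Math.max(kDist(d, j, k), dist(d.get(i), d.get(j)));
        }
        return k / sum;
    }

    // Local outlier factor: average ratio of the neighbors' LRD to i's LRD.
    static double lof(List<double[]> d, int i, int k) {
        double sum = 0;
        for (int j : kNN(d, i, k)) sum += lrd(d, j, k);
        return sum / (k * lrd(d, i, k));
    }

    public static void main(String[] args) {
        // Nine German cities plus Cairo (latitude, longitude), Section 4.1.
        List<double[]> cities = Arrays.asList(
            new double[]{52.520, 13.380}, new double[]{53.550, 10.000},
            new double[]{48.140, 11.580}, new double[]{50.950, 6.970},
            new double[]{50.120, 8.680},  new double[]{51.510, 7.480},
            new double[]{48.790, 9.190},  new double[]{51.470, 7.000},
            new double[]{50.730, 7.100},  new double[]{30.3, 31.14});
        for (int i = 0; i < cities.size(); i++) {
            System.out.printf("%d %.4f%n", i + 1, lof(cities, i, 3));
        }
        // Cairo (index 10) is the only point with LOF well above 1.
    }
}
```

This sketch reproduces the qualitative behavior summarized above: the clustered cities score near 1, while the single far-away city scores well above 1.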
Our implementation of the batch mode has been integrated into the realkd development branch, while the incremental mode has been tested on a local machine and will be merged into the realkd development branch soon.

More research is required to find an optimized way to compute the k-nearest neighbors and reverse k-nearest neighbors in order to improve performance. More research is also required to select a suitable K value and to determine the LOF threshold for identifying outliers when applying the algorithm to real-world data sets.

References

[1] Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng, and Jörg Sander. LOF: Identifying density-based local outliers. SIGMOD Rec., 29(2):93–104, May 2000.

[2] Dragoljub Pokrajac, Aleksandar Lazarevic, and Longin Jan Latecki. Incremental local outlier detection for data streams. In Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining, pages 504–515, 2007.