Outlier Detection Using Integrated
Feature Selection Algorithms in
High-Dimensional Data
Under the esteemed guidance of
Dr. D. Naga Raju, Ph.D.
Professor
Presented by
M. Rao Batchanaboyina
A seminar
on
OUTLIERS
Definition:
An outlier is an observation which deviates so much from the
other observations as to arouse suspicions that it was
generated by a different mechanism.
Outliers are also referred to as abnormalities, discordants,
deviants, or anomalies.
Useful Applications:
Intrusion Detection Systems
Interesting Sensor Events
Medical Diagnosis
Law Enforcement
Earth Science
Outlier Detection Models:
Extreme Value Analysis
Probabilistic and Statistical Models
Linear Models
Proximity-Based Models
Information-Theoretic Models
High Dimensional Outlier Detection
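The first of these models, extreme value analysis, can be sketched with a simple z-score test (a minimal Python illustration; the function name and toy data are hypothetical, not from the seminar):

```python
def zscore_outliers(values, threshold=3.0):
    """Flag values whose z-score (distance from the mean in units of
    standard deviation) exceeds the threshold."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [v for v in values if std > 0 and abs(v - mean) / std > threshold]

# one extreme value hidden among ordinary observations
data = [10, 11, 9, 10, 12, 10, 11, 95]
print(zscore_outliers(data, threshold=2.0))  # -> [95]
```

Note that this works only in low dimensionality; it is exactly the kind of full-space test that breaks down for high-dimensional data, as the later slides explain.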
High-Dimensional Data:
The dimensionality N is considered large if it is in the range
of hundreds.
In recent applications of feature selection, however, the
dimensionality can be not only in the tens but also in the
hundreds or thousands.
High-Dimensional Outlier Detection Methods:
The Subspace Method:
Projected Outliers with Grids
Distance-Based Subspace Outlier Detection
Combining Outliers from Multiple Subspaces
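The grid-based projected-outlier idea can be sketched as follows: project the data onto a small subspace, bucket the projections into grid cells, and flag points that fall in sparse cells (a toy Python sketch; the function name and data are illustrative):

```python
from collections import Counter

def sparse_cells(points, dims, bins=3):
    """Project points onto the subspace `dims`, bucket each projection
    into a grid with `bins` cells per dimension, and report points that
    sit alone in a cell as candidate projected outliers."""
    cells = Counter()
    cell_of = {}
    for i, p in enumerate(points):
        cell = tuple(min(int(p[d] * bins), bins - 1) for d in dims)
        cells[cell] += 1
        cell_of[i] = cell
    return [i for i, c in cell_of.items() if cells[c] == 1]

# three clustered points and one isolated point in the (0, 1) subspace
pts = [(0.1, 0.1, 0.5), (0.15, 0.12, 0.9), (0.12, 0.14, 0.1), (0.9, 0.95, 0.5)]
print(sparse_cells(pts, dims=(0, 1)))  # -> [3]
```

A real projected-outlier algorithm would search over many candidate subspaces rather than fixing one; this sketch only shows the per-subspace grid test.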
High-Dimensional Outlier Detection:
When outlier detection is performed on high-dimensional data
with conventional algorithms, they suffer from the well-known
artifact called the "curse of dimensionality":
in high-dimensional space the data becomes sparse, and true
outliers become masked by the noise effects of multiple
dimensions when the data is analyzed in full dimensionality.
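The masking effect can be demonstrated directly: as dimensionality grows, distances between random points concentrate, so the contrast between the nearest and farthest points shrinks (a small stdlib-only Python demonstration; the parameters are arbitrary):

```python
import math
import random

def distance_ratio(dim, n_points=200, seed=0):
    """Relative contrast (max - min) / min of distances from the origin
    for random points in [0,1]^dim; it shrinks as dim grows, which is
    one face of the curse of dimensionality."""
    rng = random.Random(seed)
    dists = []
    for _ in range(n_points):
        p = [rng.random() for _ in range(dim)]
        dists.append(math.sqrt(sum(x * x for x in p)))
    return (max(dists) - min(dists)) / min(dists)

for d in (2, 20, 200):
    print(d, round(distance_ratio(d), 3))  # contrast drops with dimension
```

With little contrast between near and far neighbours, distance-based outlier scores computed in full dimensionality lose their discriminative power, which motivates subspace methods and feature selection.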
Integrating Feature Selection Algorithms:
Feature subset selection involves four steps:
1. Subset Generation
2. Subset Evaluation
3. Stopping Criteria
4. Result Validation
Feature subset selection:
Subset Generation:
Subset generation follows two approaches:
1. Forward approach
2. Backward approach
To implement these approaches efficiently, the following
search strategies are used:
a. Complete search: branch and bound
b. Sequential search: greedy hill climbing
c. Random search: random-start hill climbing
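The forward approach with a greedy (hill-climbing) sequential search can be sketched as follows (a hypothetical Python sketch; the scoring function is a toy stand-in for a real evaluation criterion):

```python
def forward_select(features, score, k):
    """Greedy forward search: start from the empty subset and repeatedly
    add the single feature that most improves the subset score; stop at
    k features or when no addition improves the score."""
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: score(selected + [f]))
        if score(selected + [best]) <= score(selected):
            break  # no addition produces a better subset
        selected.append(best)
        remaining.remove(best)
    return selected

# toy criterion: reward known-useful features, penalize subset size
useful = {"f1", "f3"}
score = lambda s: sum(1 for f in s if f in useful) - 0.1 * len(s)
print(forward_select(["f1", "f2", "f3", "f4"], score, k=3))  # -> ['f1', 'f3']
```

The backward approach is symmetric: start from the full set and greedily remove the feature whose removal most improves the score.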
Feature subset selection:
Subset Evaluation:
Subset evaluation is carried out in two ways:
1. Independent criteria (measured from the data alone)
2. Dependent criteria (measured through the mining algorithm)
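An independent (filter-style) criterion scores a feature from the data alone, without running any mining algorithm; variance is one of the simplest examples (a toy Python sketch; the data is illustrative):

```python
def variance_score(column):
    """Independent criterion: rank a feature by its variance, computed
    from the data itself; a near-constant feature carries no information."""
    n = len(column)
    mean = sum(column) / n
    return sum((x - mean) ** 2 for x in column) / n

cols = {"age": [20, 45, 33, 60], "const": [1, 1, 1, 1]}
ranked = sorted(cols, key=lambda c: variance_score(cols[c]), reverse=True)
print(ranked)  # -> ['age', 'const']
```

Other independent criteria (distance, dependency, consistency measures) follow the same pattern: a cheap statistic of the data replaces a full run of the learner.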
Feature subset selection:
Stopping Criteria:
A stopping criterion determines when the feature selection
process stops.
Some frequently used stopping criteria are:
1. The search completes.
2. A given bound is reached.
3. Subsequent addition (or deletion) of any feature
does not produce a better subset.
4. A sufficiently good subset is selected.
Feature subset selection:
Result Validation:
A straightforward way to validate the result is to measure it
directly using prior knowledge about the data.
In real-world applications, however, we usually do not have
such prior knowledge; hence we have to rely on indirect
methods, monitoring the change in mining performance as the
feature set changes.
Feature selection steps:
Original feature set (from high-dimensional data) → Subset Generation →
Subset Evaluation (goodness of subset) → Stopping Criteria:
if not met (NO), generate the next subset; if met (YES), proceed to
Result Validation.
[Figure: the four key steps of feature selection]
Design of an Integrated System for Intelligent Feature Selection:
It is a two-step process.
1. A Unifying Platform:
The unifying platform serves two purposes:
a) to group existing algorithms with similar
characteristics and investigate their strengths and
weaknesses on the same platform;
b) to provide a guideline for building an intelligent feature
selection system.
Categorizing Framework for Feature Selection Algorithms:
[Figure: categorizing framework for feature selection algorithms in a
three-dimensional framework]
Design of an Integrated System for Intelligent Feature Selection:
2. An Integrated System
[Figure: a preliminary integrated system]
Feature Selection with Large Dimensionality:
From the framework for feature selection, filter model
algorithms appear preferable to wrapper model algorithms
when dealing with large dimensionality, since filter model
algorithms use evaluation criteria that are less
computationally expensive than those of wrapper model
algorithms.
Recently, hybrid model algorithms have been considered for
handling data sets with high dimensionality.
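The cost difference can be made concrete: a dependent (wrapper-style) criterion must run the mining algorithm itself for every candidate subset, for example leave-one-out nearest-neighbour accuracy (a toy Python sketch; data and names are illustrative):

```python
def loo_accuracy(X, y, subset):
    """Wrapper criterion: leave-one-out accuracy of a 1-nearest-neighbour
    vote using only the chosen feature subset; every evaluation runs the
    learner itself, which is what makes wrappers expensive."""
    def dist(a, b):
        return sum((a[j] - b[j]) ** 2 for j in subset)
    correct = 0
    for i in range(len(X)):
        j = min((k for k in range(len(X)) if k != i),
                key=lambda k: dist(X[i], X[k]))
        correct += y[j] == y[i]
    return correct / len(X)

X = [(0, 5), (1, 9), (0, 2), (9, 4), (8, 7), (9, 1)]
y = [0, 0, 0, 1, 1, 1]
print(loo_accuracy(X, y, subset=(0,)))  # first feature separates classes -> 1.0
print(loo_accuracy(X, y, subset=(1,)))  # second feature is noise -> 0.0
```

Each call costs a full pass of the learner over the data, and a subset search makes many such calls; a filter criterion (e.g. variance or correlation) costs one pass over the data per feature, which is why filters scale better to large dimensionality.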
Real-World Applications of Feature Selection:
Intrusion Detection
Genomic Analysis
Image Retrieval
Customer Relationship Management
Existing System vs. Proposed System
Conclusion:
A new framework that applies high-dimensional outlier
detection algorithms integrated with feature selection is
used to detect outliers in high-dimensional data.
References
[1] H. Liu and L. Yu, "Toward Integrating Feature Selection Algorithms for Classification
and Clustering," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 4,
April 2005.
[2] C. C. Aggarwal, Outlier Analysis, Springer Science+Business Media, New York, 2013.
[3] M. Dash and H. Liu, "Feature Selection for Clustering," Proc.
Fourth Pacific-Asia Conf. Knowledge Discovery and Data Mining
(PAKDD-2000), pp. 110-121, 2000.
[4] M. Devaney and A. Ram, "Efficient Feature Selection in
Conceptual Clustering," Proc. 14th Int'l Conf. Machine Learning,
pp. 92-97, 1997.
[5] "Feature Selection: An Ever Evolving Frontier in Data Mining," JMLR: Workshop and
Conference Proceedings 10: 4-13, The Fourth Workshop on Feature Selection in Data
Mining.
Queries?
Thank you
