Analysis Of Attribute Relevance
1. DATA MINING
ANALYSIS OF ATTRIBUTE RELEVANCE
BY
V.PRADEEPA
I.M.SC(CS&IT)
NADAR SARASWATHI COLLEGE
OF ARTS AND SCIENCE, THENI.
2. INTRODUCTION
* Performing data mining analysis on databases is
difficult because of the extensive volume of data.
* Attribute oriented analysis is one technique for
coping with this volume.
* Here the analysis is done on the basis of
attributes: attributes are selected and generalised, and the
patterns of knowledge ultimately formed are based on
attributes only.
* An attribute is a property or characteristic of an
object. A collection of attributes describes an object.
3. Attribute Generalisation
* Attribute generalisation is based on the following
rule: “If there is a large set of distinct values for an attribute,
then a generalisation operator should be selected and applied
to the attribute.”
* Nominal attributes: The dice operation defines a sub-cube
by performing a selection on two or more dimensions.
* Structured attributes: Climbing up a concept hierarchy
is used, that is, replacing a value in an attribute-value pair
with a more general one. The roll-up operation performs
aggregation on a data cube, either by climbing up a concept
hierarchy for a dimension or by dimension reduction.
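The generalisation rule above can be illustrated with a minimal sketch. The city-to-state hierarchy, the function name `generalise` and the threshold value are all hypothetical, chosen only for illustration. The sketch climbs one level of the hierarchy when an attribute has more distinct values than the threshold allows.

```python
from collections import Counter

# Hypothetical one-level concept hierarchy: city -> state.
CITY_TO_STATE = {
    "Theni": "Tamil Nadu",
    "Madurai": "Tamil Nadu",
    "Kochi": "Kerala",
    "Mysuru": "Karnataka",
}


def generalise(values, hierarchy, threshold):
    """Climb one hierarchy level if there are more than `threshold` distinct values."""
    if len(set(values)) <= threshold:
        return list(values)  # few enough distinct values, keep as-is
    # Replace each value with its more general concept (unknown values stay unchanged).
    return [hierarchy.get(v, v) for v in values]


cities = ["Theni", "Madurai", "Kochi", "Mysuru", "Theni"]
states = generalise(cities, CITY_TO_STATE, threshold=3)

# Identical generalised values are merged and counted, as in a roll-up.
counts = Counter(states)
print(counts)
```

Here the four distinct cities exceed the threshold of three, so every value is replaced by its state and the counts are accumulated per state.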
4. ATTRIBUTE RELEVANCE
* The general idea behind attribute relevance
analysis is to compute some measure which is used to
quantify the relevance of an attribute with respect to a
given class or concept.
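The slides do not name a particular measure. One commonly used choice is information gain, which is assumed here purely for illustration; the attribute names and toy data are hypothetical. The sketch scores an attribute by how much knowing its value reduces the entropy of the class labels.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of the class minus the weighted entropy after splitting on the attribute."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: four tuples, two classes.
labels = ["yes", "yes", "no", "no"]
outlook = ["sunny", "sunny", "rainy", "rainy"]  # separates the classes perfectly
windy = ["t", "f", "t", "f"]                    # carries no class information

gain_outlook = information_gain(outlook, labels)
gain_windy = information_gain(windy, labels)
print(gain_outlook, gain_windy)
```

An attribute with a higher score is more relevant to the class; attributes scoring near zero are candidates for removal.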
5. ATTRIBUTE SELECTION
* Attribute selection is a term commonly used in data
mining to describe the tools and techniques available for
reducing inputs to a manageable size for processing and
analysis.
* Attribute selection implies not only cardinality
reduction but also the choice of attributes based on their
usefulness for analysis.
6. SELECTION CRITERIA
* Find a subset of attributes that is most likely to
describe/predict the class best. The following method may be
used.
* Filtering: Filter type methods select variables
independently of the model that will later be trained. They
suppress the least interesting variables. These methods are
particularly efficient in computation time and robust to
overfitting.
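A filter can be sketched as "score every attribute on its own, then keep the top k". The scoring function below (absolute Pearson correlation with a numeric class label) is an assumption made for illustration, not a method named in the slides, and the column names and data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either sequence is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def filter_select(columns, target, k):
    """Rank attributes by |correlation with the class| and keep the best k."""
    scores = {name: abs(pearson(vals, target)) for name, vals in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical data set with one useful and two uninformative attributes.
columns = {
    "hours_studied": [1, 2, 3, 4, 5],
    "shoe_size": [7, 9, 8, 9, 7],
    "constant": [1, 1, 1, 1, 1],
}
target = [0, 0, 0, 1, 1]

selected, scores = filter_select(columns, target, k=1)
print(selected, scores)
```

No model is trained at any point, which is why filters of this kind are cheap to compute.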
7. INSTANCE BASED ATTRIBUTE SELECTION
* Instance Based Filters: The goal of the
instance based search is to find the closest decision
boundary to the instance under consideration and assign
weight to the features that bring about the change in class.
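The slides do not name an algorithm. A well-known instance-based filter is Relief, and the following is a simplified sketch in its spirit, under stated assumptions: numeric features, every instance visited once rather than sampled at random, and at least two instances per class. For each instance it finds the nearest neighbour of the same class (the "hit") and of a different class (the "miss"), then rewards features that differ across the class boundary and penalises features that differ within the class. The function name and data are hypothetical.

```python
def relief_weights(X, y):
    """Simplified Relief-style feature weights for numeric data."""
    n, d = len(X), len(X[0])
    # Value range of each feature, used to scale differences into [0, 1].
    ranges = [(max(r[j] for r in X) - min(r[j] for r in X)) or 1.0 for j in range(d)]

    def diff(j, a, b):
        return abs(a[j] - b[j]) / ranges[j]

    def dist(a, b):
        return sum(diff(j, a, b) for j in range(d))

    weights = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        hit = min(hits, key=lambda k: dist(X[i], X[k]))     # nearest same-class instance
        miss = min(misses, key=lambda k: dist(X[i], X[k]))  # nearest other-class instance
        for j in range(d):
            weights[j] += (diff(j, X[i], X[miss]) - diff(j, X[i], X[hit])) / n
    return weights


# Hypothetical data: feature 0 separates the classes, feature 1 does not.
X = [[1.0, 5.0], [1.2, 1.0], [3.0, 4.8], [3.1, 1.2]]
y = [0, 0, 1, 1]

weights = relief_weights(X, y)
print(weights)
```

Features with large positive weights are the ones whose values change when the class changes; features with weights near or below zero can be filtered out.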
8. CLASS COMPARISON
* In many applications, users may not be interested in
having a single class described or characterised, but rather
would prefer to mine a description that compares or
distinguishes one class from other comparable classes.
* Class comparison mines descriptions that
distinguish a target class from its contrasting classes.
9. STATISTICAL MEASURES
* The general procedure for class comparison is as
follows:
* Data Collection: The set of relevant data in the
database is collected by query processing and is partitioned
respectively into a target class and one or a set of contrasting
classes.
* Dimension relevance analysis: If there are many
dimensions and analytical comparison is desired, then
dimension relevance analysis should be performed on these
classes and only the highly relevant dimensions are included
in the further analysis.
10. * Synchronous generalization: Generalization is
performed on the target class to the level controlled by a
user- or expert-specified dimension threshold, which
results in a prime target class relation. The contrasting
classes are generalized to the same level, so that the two
can be compared.
* Presentation of the derived comparison: The
resulting class comparison description can be visualized in
the form of tables, graphs, and rules.
* This presentation usually includes a “contrasting”
measure (such as count %) that reflects the comparisons
between the target and contrasting classes.
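The count % measure mentioned above can be sketched as follows. The tuples, the attribute name `major` and the helper `count_percent` are hypothetical; the data is assumed to be already generalised to a common level in both classes.

```python
from collections import Counter


def count_percent(rows, attribute):
    """Percentage of tuples in a class that take each value of the attribute."""
    counts = Counter(row[attribute] for row in rows)
    total = sum(counts.values())
    return {value: 100.0 * n / total for value, n in counts.items()}


# Hypothetical generalised tuples for a target class and one contrasting class.
target = [{"major": "Science"}, {"major": "Science"}, {"major": "Arts"}, {"major": "Science"}]
contrasting = [{"major": "Arts"}, {"major": "Arts"}, {"major": "Science"}, {"major": "Arts"}]

target_pct = count_percent(target, "major")
contrast_pct = count_percent(contrasting, "major")

# Present the comparison as a small table: value, target %, contrasting %.
for value in sorted(set(target_pct) | set(contrast_pct)):
    t = target_pct.get(value, 0.0)
    c = contrast_pct.get(value, 0.0)
    print(f"{value:10s} {t:6.1f}% {c:6.1f}%")
```

Placing the two percentages side by side for each generalised value is what lets a reader see which values distinguish the target class from the contrasting class.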
11. * Descriptive statistics are of great help in
understanding the distribution of the data.
* They help us choose an effective
implementation.
12. MEASURING CENTRAL TENDENCY
* Arithmetic mean: the sum of a collection of
numbers divided by the number of numbers in the
collection.
* Median: the number separating the higher half of
a data sample from the lower half.
* Mode: the value that appears most often
in a set of data.
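These three measures can be checked on a small sample with Python's standard `statistics` module. The sample values are hypothetical.

```python
import statistics

# Hypothetical sample of six values.
data = [2, 3, 3, 5, 7, 10]

mean_value = statistics.mean(data)      # sum (30) divided by count (6)
median_value = statistics.median(data)  # even count: average of the two middle values, 3 and 5
mode_value = statistics.mode(data)      # 3 appears twice, every other value once

print(mean_value, median_value, mode_value)
```

Because the sample has an even number of values, the median is the midpoint between the two central values rather than a member of the sample.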
13. MEASURING DISPERSION
* Variance (σ²): variance measures how far a
set of numbers is spread out from its mean.
* Standard deviation (σ): standard
deviation is the square root of the variance. It quantifies the
amount of variation or dispersion of a set of data values in
the same units as the data.
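As a short illustration, the population variance and its square root, the standard deviation, can be computed with Python's standard `statistics` module. The sample values are hypothetical, and the population forms (`pvariance`, `pstdev`) are assumed here; the sample forms (`variance`, `stdev`) divide by n - 1 instead of n.

```python
import math
import statistics

# Hypothetical data set with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

variance_value = statistics.pvariance(data)  # mean of squared deviations from the mean
stdev_value = statistics.pstdev(data)        # square root of the population variance

print(variance_value, stdev_value)
print(math.isclose(stdev_value, math.sqrt(variance_value)))
```

The squared deviations sum to 32 over eight values, so the variance is 4 and the standard deviation is 2.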