DATA MINING
ANALYSIS OF ATTRIBUTE RELEVANCE
BY
V.PRADEEPA
I.M.SC(CS&IT)
NADAR SARASWATHI COLLEGE
OF ARTS AND SCIENCE, THENI.
INTRODUCTION
* Performing data mining analysis on databases is
difficult because of the sheer volume of data.
* Attribute oriented analysis is one technique for
handling this volume.
* Here the analysis is done on the basis of
attributes: attributes are selected and generalised, and the
patterns of knowledge ultimately formed are expressed in
terms of attributes only.
* An attribute is a property or characteristic of an
object. A collection of attributes describes an object.
Attribute Generalisation
* Attribute generalisation is based on the following
rule: “If there is a large set of distinct values for an attribute,
then a generalisation operator should be selected and applied
to the attribute.”
* Nominal attributes: A selection on two or more
dimensions defines a sub-cube (the dice operation).
* Structured attributes: Concept hierarchy climbing
is used, replacing a value in an attribute-value pair with a
more general one. The operation performs aggregation on
a data cube (roll-up), either by climbing up a concept
hierarchy for a dimension or by dimension reduction.
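The generalisation rule above can be sketched in a few lines of Python. This is only an illustration: the `CITY_TO_COUNTRY` hierarchy, the `generalise` helper and the threshold value are hypothetical and are not taken from the slides.

```python
from collections import Counter

# Hypothetical one-level concept hierarchy for a "city" attribute.
CITY_TO_COUNTRY = {
    "Chennai": "India",
    "Madurai": "India",
    "Theni": "India",
    "Vancouver": "Canada",
    "Toronto": "Canada",
}


def generalise(values, hierarchy, threshold):
    """Climb one level of the hierarchy when the attribute has
    more distinct values than the threshold allows."""
    if len(set(values)) <= threshold:
        return list(values)
    # Values that are missing from the hierarchy are left unchanged.
    return [hierarchy.get(v, v) for v in values]


cities = ["Chennai", "Madurai", "Theni", "Vancouver", "Toronto", "Chennai"]
generalised = generalise(cities, CITY_TO_COUNTRY, threshold=3)

# Identical generalised values are merged and their counts accumulated,
# which is the aggregation step of a roll-up.
counts = Counter(generalised)
print(counts)
```

Five distinct cities exceed the threshold of three, so every city is replaced by its country and the six tuples collapse into two generalised values with counts.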
ATTRIBUTE RELEVANCE
* The general idea behind attribute relevance
analysis is to compute a measure that quantifies the
relevance of an attribute with respect to a given class
or concept.
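The slides do not name a specific measure. One commonly used choice is information gain, sketched below as an assumption; the function names and the toy student data are invented for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(values, labels):
    """Reduction in class entropy obtained by splitting on an attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Toy data (hypothetical): which attribute is relevant to "status"?
major = ["science", "science", "arts", "arts"]
gender = ["F", "M", "F", "M"]
status = ["graduate", "graduate", "undergraduate", "undergraduate"]

gain_major = information_gain(major, status)
gain_gender = information_gain(gender, status)
print(gain_major, gain_gender)
```

In this toy data `major` determines the class completely, so its gain equals the full class entropy, while `gender` tells us nothing about the class and its gain is zero.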
ATTRIBUTE SELECTION
* Attribute selection is a term commonly used in data
mining to describe the tools and techniques available for
reducing inputs to a manageable size for processing and
analysis.
* Attribute selection implies not only cardinality
reduction but also the choice of attributes based on their
usefulness for analysis.
SELECTION CRITERIA
* Find a subset of attributes that is most likely to
describe/predict the class best. The following method may be
used.
* Filtering: Filter-type methods select variables
independently of the model. Filter methods suppress the least
interesting variables. These methods are particularly efficient
in computation time and robust to overfitting.
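A filter can be sketched as "score every attribute without training a model, then keep the best k". The scoring function used here (absolute Pearson correlation with a numeric class label) is an assumption chosen for brevity, and the column names and values are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant column carries no information
    return cov / (sd_x * sd_y)


def filter_select(columns, target, k):
    """Rank attributes by |correlation| with the target and keep the top k."""
    scores = {name: abs(pearson(col, target)) for name, col in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


columns = {
    "hours_studied": [1, 2, 3, 4, 5],
    "shoe_size": [7, 9, 7, 9, 8],
    "constant": [1, 1, 1, 1, 1],
}
target = [0, 0, 1, 1, 1]  # numeric class label

selected, scores = filter_select(columns, target, k=1)
print(selected, scores)
```

No learning algorithm is involved at any point, which is why filters are cheap to compute and do not overfit to a particular model.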
INSTANCE BASED ATTRIBUTE SELECTION
* Instance Based Filters: The goal of the
instance-based search is to find the closest decision
boundary to the instance under consideration and assign
weight to the features that bring about the change of class.
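The slides do not name an algorithm. A simplified Relief-style weighting is one way to realise this idea and is shown here as an assumption: for every instance it finds the nearest neighbour of the same class (hit) and of a different class (miss), and rewards features that differ across the class boundary. The data is invented, and the sketch assumes numeric features and at least two instances per class.

```python
def relief_weights(X, y):
    """Simplified Relief: one pass over all instances, numeric features."""
    n, m = len(X), len(X[0])
    # Feature ranges, used to scale differences into [0, 1].
    spans = []
    for j in range(m):
        column = [row[j] for row in X]
        spans.append((max(column) - min(column)) or 1.0)

    def diff(a, b, j):
        return abs(a[j] - b[j]) / spans[j]

    def distance(a, b):
        return sum(diff(a, b, j) for j in range(m))

    weights = [0.0] * m
    for i in range(n):
        same = [k for k in range(n) if k != i and y[k] == y[i]]
        other = [k for k in range(n) if y[k] != y[i]]
        hit = min(same, key=lambda k: distance(X[i], X[k]))
        miss = min(other, key=lambda k: distance(X[i], X[k]))
        for j in range(m):
            # Reward features that change across the class boundary,
            # penalise features that change within the same class.
            weights[j] += (diff(X[i], X[miss], j) - diff(X[i], X[hit], j)) / n
    return weights


# Feature 0 separates the two classes; feature 1 is noise.
X = [[0.0, 0.1], [0.1, 0.9], [0.9, 0.2], [1.0, 0.8]]
y = [0, 0, 1, 1]

weights = relief_weights(X, y)
print(weights)
```

The feature that changes when crossing from one class to the other ends up with a positive weight, and the noisy feature ends up with a negative one.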
CLASS COMPARISON
* In many applications, users may not be interested in
having a single class described or characterised, but rather
would prefer to mine a description that compares or
distinguishes one class from other comparable classes.
* Class comparison mines descriptions that
distinguish a target class from its contrasting classes.
STATISTICAL MEASURES
* The general procedure for class comparison is as
follows:
* Data Collection: The set of relevant data in the
database is collected by query processing and is partitioned
into a target class and one or more contrasting
classes.
* Dimension relevance analysis: If there are many
dimensions and analytical comparison is desired, then
dimension relevance analysis should be performed on these
classes and only the highly relevant dimensions are included
in the further analysis.
* Synchronous generalization: Generalization is
performed on the target class to the level controlled by a
user- or expert-specified dimension threshold, which
results in a prime target class relation. The contrasting
classes are generalized to the same level so that the
classes can be compared.
* Presentation of the derived comparison: The
resulting class comparison description can be visualized in
the form of tables, graphs, and rules.
* This presentation usually includes a “contrasting”
measure (such as count %) that reflects the comparisons
between the target and contrasting classes.
* The descriptive statistics are of great help in
understanding the distribution of the data.
* They help us choose an effective
implementation.
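The count % contrasting measure from the class comparison procedure can be sketched as follows. The helper names and the generalised tuples are hypothetical; both classes are assumed to have been generalised to the same level already.

```python
from collections import Counter


def count_percent(tuples):
    """Share of each generalised tuple within its own class, in percent."""
    counts = Counter(tuples)
    total = sum(counts.values())
    return {t: 100.0 * n / total for t, n in counts.items()}


def compare(target, contrasting):
    """Place the count % of both classes next to each other, row by row."""
    target_pct = count_percent(target)
    contrast_pct = count_percent(contrasting)
    rows = sorted(set(target_pct) | set(contrast_pct))
    return [(row, target_pct.get(row, 0.0), contrast_pct.get(row, 0.0))
            for row in rows]


# Toy generalised tuples of the form (major, age_range).
target_class = [("Science", "21-25")] * 3 + [("Science", "16-20")]
contrasting_class = [("Science", "16-20")] * 2 + [("Arts", "16-20")] * 2

table = compare(target_class, contrasting_class)
for row, target_share, contrast_share in table:
    print(row, target_share, contrast_share)
```

A row with a high share in the target class and a low share in the contrasting class is a description that distinguishes the target class.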
MEASURING CENTRAL TENDENCY
* Arithmetic mean is the sum of a collection of
numbers divided by the number of numbers in the
collection.
* Median: The median is the number separating the
higher half of a data sample from the lower half.
* Mode: The mode is the value that appears most often
in a set of data.
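The three measures can be checked on a small sample with Python's standard `statistics` module. The marks below are made-up values.

```python
import statistics

marks = [35, 40, 40, 55, 80]  # hypothetical sample, already sorted

mean = statistics.mean(marks)      # (35 + 40 + 40 + 55 + 80) / 5
median = statistics.median(marks)  # middle value of the sorted sample
mode = statistics.mode(marks)      # most frequent value

print(mean, median, mode)
```

The single large value 80 pulls the mean above the median, which is why the median is often preferred for skewed data.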
MEASURING DISPERSION:
* Variance (σ²): Variance measures how far a
set of numbers is spread out from its mean.
* Standard deviation (σ): Standard
deviation is the square root of the variance. It quantifies the
amount of variation or dispersion of a set of data values in
the same units as the data.
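A short check with Python's standard `statistics` module, treating the data as a whole population (an assumption; for a sample, `statistics.variance` and `statistics.stdev` divide by n - 1 instead of n). The values are made up.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical population, mean = 5

# Population variance: mean of the squared deviations from the mean.
variance = statistics.pvariance(data)

# Population standard deviation: square root of the variance.
std_dev = statistics.pstdev(data)

print(variance, std_dev)
```

The squared deviations sum to 32, so the variance is 32 / 8 = 4 and the standard deviation is 2.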
THANK YOU
