DATA MINING (DEFINITION)
 Data mining is the process of sorting through
large data sets to identify patterns and establish
relationships to solve problems through data
analysis. Data mining tools allow enterprises to
predict future trends.
 The term "data mining" is in fact a misnomer,
because the goal is the extraction of patterns
and knowledge from large amounts of data, not
the extraction (mining) of data itself.
 Data mining is an interdisciplinary subfield
of computer science and statistics with an
overall goal to extract information (with
intelligent methods) from a data set and
transform the information into a
comprehensible structure for further use. Data
mining is the analysis step of the "knowledge
discovery in databases" process, or KDD.
 Aside from the raw analysis step, it also
involves database and data management aspects,
data pre-processing, model and inference
considerations, interestingness metrics,
complexity considerations, post-processing of
discovered structures, visualization, and online
updating.
 The difference between data analysis and data
mining is that data analysis summarizes the
past, for example by analyzing the effectiveness
of a marketing campaign. In contrast, data mining
focuses on using specific machine learning and
statistical models to predict the future and
discover patterns in the data.
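 As a minimal sketch of this contrast, the Python
snippet below works on a small, made-up table of
weekly marketing spend and sales. The numbers and the
names `history`, `summarize`, and `fit_line` are
hypothetical and chosen only for illustration. The
summary step describes what has already happened, while
the fitted least-squares line is used to predict a week
that has not been observed yet.

```python
from statistics import mean

# Hypothetical history: (marketing spend, sales) per week, illustrative only.
history = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]


def summarize(rows):
    """Data analysis: describe the past with simple aggregates."""
    spend = [s for s, _ in rows]
    sales = [y for _, y in rows]
    return {
        "avg_spend": mean(spend),
        "avg_sales": mean(sales),
        "sales_per_unit_spend": sum(sales) / sum(spend),
    }


def fit_line(rows):
    """Data mining (simplest case): fit sales = a * spend + b by least squares."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    a = sum((x - x_bar) * (y - y_bar) for x, y in rows) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    b = y_bar - a * x_bar
    return a, b


summary = summarize(history)          # looks backward
a, b = fit_line(history)              # learns a pattern from the data
predicted_sales = a * 5.0 + b         # looks forward to an unseen spend level
```

 A straight-line fit is only one of many possible
models. It is used here because it keeps the
"summarize versus predict" distinction easy to see.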
Knowledge Discovery in Databases (KDD)
 Knowledge discovery in databases (KDD) is the
process of discovering useful knowledge from a
collection of data. This widely used process
includes data preparation and selection, data
cleansing, incorporating prior knowledge about the
data sets, and interpreting accurate solutions from
the observed results.
 Major KDD application areas include marketing,
fraud detection, telecommunications, and
manufacturing.
 Traditionally, data mining and knowledge discovery
were performed manually. As time passed, the amount
of data in many systems grew beyond terabyte size
and could no longer be handled manually.
Moreover, discovering underlying patterns in data is
considered essential for the success of any
business. As a result, several software tools were
developed to discover hidden patterns in data and
make inferences, and these tools came to form a
part of artificial intelligence.
 The KDD process has reached its peak in the
last 10 years. It now encompasses many different
approaches to discovery, including inductive
learning, Bayesian statistics, semantic query
optimization, knowledge acquisition for expert
systems, and information theory. The ultimate
goal is to extract high-level knowledge from
low-level data.
STAGES IN KDD:
 The overall process of finding and interpreting
patterns from data involves the repeated application of
the following steps:
 Developing an understanding of:
   the application domain
   the relevant prior knowledge
   the goals of the end-user
 Creating a target data set: selecting a data set, or
focusing on a subset of variables or data samples, on
which discovery is to be performed.
 Data cleaning and preprocessing.
   Removal of noise or outliers.
   Collecting necessary information to model or account
   for noise.
   Strategies for handling missing data fields.
   Accounting for time sequence information and known
   changes.
 Data reduction and projection.
   Finding useful features to represent the data depending
   on the goal of the task.
   Using dimensionality reduction or transformation
   methods to reduce the effective number of variables
   under consideration or to find invariant representations
   for the data.
 Choosing the data mining task.
   Deciding whether the goal of the KDD process is
   classification, regression, clustering, etc.
 Choosing the data mining algorithm(s).
   Selecting method(s) to be used for searching for
   patterns in the data.
   Deciding which models and parameters may be
   appropriate.
   Matching a particular data mining method with the
   overall criteria of the KDD process.
 Data mining.
   Searching for patterns of interest in a particular
   representational form or a set of such representations,
   such as classification rules or trees, regression,
   clustering, and so forth.
 Interpreting mined patterns.
 Consolidating discovered knowledge.
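 The steps above can be sketched end to end in a few
lines of Python. This is only an illustration under
stated assumptions: the records in `raw`, the helper
names (`select_target`, `clean`, `project`, `kmeans_1d`,
`interpret`), the outlier rule, and the choice of
clustering with a tiny one-dimensional k-means are all
hypothetical and are not taken from the slides. A real
KDD project would pick the task and algorithm to suit
its own goals.

```python
from statistics import mean, median

# Hypothetical raw records; None marks a missing field, 9999.0 is an obvious outlier.
raw = [
    {"id": 1, "age": 23, "spend": 10.0},
    {"id": 2, "age": 25, "spend": 12.0},
    {"id": 3, "age": 31, "spend": None},
    {"id": 4, "age": 47, "spend": 95.0},
    {"id": 5, "age": 52, "spend": 105.0},
    {"id": 6, "age": 50, "spend": 9999.0},
    {"id": 7, "age": 24, "spend": 14.0},
    {"id": 8, "age": 49, "spend": 100.0},
]


def select_target(records, fields):
    """Create the target data set: keep only the variables of interest."""
    return [{f: r[f] for f in fields} for r in records]


def clean(records, field, max_ratio=10.0):
    """Drop records with a missing value, then drop outliers far above the median."""
    present = [r for r in records if r[field] is not None]
    med = median(r[field] for r in present)
    return [r for r in present if r[field] <= max_ratio * med]


def project(records, field):
    """Data reduction and projection: rescale one feature to the range [0, 1]."""
    values = [r[field] for r in records]
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values], lo, hi


def kmeans_1d(values, k=2, iterations=20):
    """Data mining: a tiny 1-D k-means used as the chosen clustering algorithm."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers across the sorted values.
    centers = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[nearest].append(v)
        # Keep the old center if a cluster happens to be empty.
        centers = [mean(g) if g else centers[i] for i, g in enumerate(groups)]
    labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
    return centers, labels


def interpret(centers, lo, hi):
    """Interpret mined patterns: express cluster centers in the original units."""
    return [round(c * (hi - lo) + lo, 2) for c in centers]


target = select_target(raw, ["id", "spend"])     # target data set
cleaned = clean(target, "spend")                 # cleaning and preprocessing
scaled, lo, hi = project(cleaned, "spend")       # reduction and projection
centers, labels = kmeans_1d(scaled, k=2)         # data mining (clustering)
segments = interpret(centers, lo, hi)            # interpretation
```

 In this toy run, `segments` holds the typical spend of
a low-spending and a high-spending group, which is the
kind of high-level knowledge the process aims to
extract from low-level records.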
THANK YOU