Data mining

DATA MINING (DEFINITION)
 Data mining is the process of sorting through
large data sets to identify patterns and establish
relationships to solve problems through data
analysis. Data mining tools allow enterprises to
predict future trends.
 The term "data mining" is in fact a misnomer,
because the goal is the extraction of patterns
and knowledge from large amounts of data, not
the extraction (mining) of data itself.

 Data mining is an interdisciplinary subfield
of computer science and statistics with an
overall goal to extract information (with
intelligent methods) from a data set and
transform the information into a
comprehensible structure for further use. Data
mining is the analysis step of the "knowledge
discovery in databases" process, or KDD.

 Aside from the raw analysis step, it also
involves database and data management aspects,
data pre-processing, model and inference
considerations, interestingness metrics,
complexity considerations, post-processing of
discovered structures, visualization, and online
updating.

 The difference between data analysis and data
mining is that data analysis summarizes the past,
for example by analyzing the effectiveness of a
marketing campaign. In contrast, data mining
focuses on using specific machine learning and
statistical models to predict the future and
discover patterns in the data.
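
 As a rough illustration of this distinction, the sketch below first
summarizes past figures (data analysis) and then fits a least-squares
line to the same figures to predict the next period (a very small
predictive model). The monthly sales numbers and the function names are
made up for illustration and do not come from the slides.

```python
from statistics import mean

# Hypothetical monthly sales figures (illustrative data only).
monthly_sales = [100.0, 110.0, 120.0, 130.0, 140.0]


def summarize(values):
    """Data analysis: describe what has already happened."""
    return {"mean": mean(values), "min": min(values), "max": max(values)}


def fit_line(values):
    """Fit y = slope * t + intercept by least squares, with t = 0, 1, 2, ..."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = mean(values)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return slope, intercept


def predict_next(values):
    """Prediction: evaluate the fitted line one period past the data."""
    slope, intercept = fit_line(values)
    return slope * len(values) + intercept


summary = summarize(monthly_sales)
forecast = predict_next(monthly_sales)
print(summary)
print(forecast)
```

 The summary only restates the history, while the forecast extrapolates
a pattern found in it. Real data mining models are usually far richer
than a straight line.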

Knowledge Discovery in Databases (KDD)
 Knowledge discovery in databases (KDD) is the
process of discovering useful knowledge from a
collection of data. This widely used process
includes data preparation and selection, data
cleansing, incorporating prior knowledge about the
data sets, and interpreting accurate solutions from
the observed results.
 Major KDD application areas include marketing,
fraud detection, telecommunication and
manufacturing.

 Traditionally, data mining and knowledge discovery
were performed manually. As time passed, the amount
of data in many systems grew beyond terabyte size
and could no longer be handled manually. Moreover,
discovering underlying patterns in data is
considered essential for any business to succeed.
As a result, several software tools were developed
to discover hidden patterns in data and make
inferences from them, and these tools became a part
of artificial intelligence.

 The KDD process has reached its peak in the
last 10 years. It now houses many different
approaches to discovery, including inductive
learning, Bayesian statistics, semantic query
optimization, knowledge acquisition for expert
systems, and information theory. The ultimate
goal is to extract high-level knowledge from
low-level data.

STAGES IN KDD:
 The overall process of finding and interpreting
patterns from data involves the repeated application of
the following steps:
 Developing an understanding of:
    the application domain
    the relevant prior knowledge
    the goals of the end-user

 Creating a target data set: selecting a data set, or
focusing on a subset of variables, or data samples, on
which discovery is to be performed.
 Data cleaning and preprocessing:
    Removal of noise or outliers.
    Collecting necessary information to model or account
      for noise.
    Strategies for handling missing data fields.
    Accounting for time sequence information and known
      changes.
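
 For the data cleaning step above, a minimal sketch might look like the
following. It assumes two specific strategies that the slides do not
prescribe: missing fields (represented here as None) are filled with the
median of the observed values, and outliers are dropped when they lie
more than k median absolute deviations from the median. The readings and
the function names are hypothetical.

```python
from statistics import median


def fill_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def drop_outliers(values, k=3.0):
    """Drop values more than k median absolute deviations from the median."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # No spread around the median, so nothing can be judged an outlier.
        return list(values)
    return [v for v in values if abs(v - center) <= k * mad]


# Hypothetical sensor readings with one missing field and one gross outlier.
readings = [10.0, 12.0, None, 11.0, 13.0, 500.0, 12.0]

filled = fill_missing(readings)
cleaned = drop_outliers(filled)
print(cleaned)
```

 The median is used for both steps because, unlike the mean, it is
barely affected by the very outliers that are about to be removed.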

 Data reduction and projection:
    Finding useful features to represent the data depending
      on the goal of the task.
    Using dimensionality reduction or transformation
      methods to reduce the effective number of variables
      under consideration or to find invariant representations
      for the data.
 Choosing the data mining task:
    Deciding whether the goal of the KDD process is
      classification, regression, clustering, etc.
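
 As one possible reading of the data reduction and projection step, the
sketch below reduces two correlated variables to a single variable by
projecting onto the first principal component. Principal component
analysis is only one of many dimensionality reduction methods and is an
assumption here, as are the sample rows and all function names. The
leading direction is approximated with plain power iteration.

```python
import math


def center(rows):
    """Subtract the per-column mean from every row."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    return [[r[j] - means[j] for j in range(dims)] for r in rows]


def covariance(rows):
    """Sample covariance matrix of rows that are already centered."""
    n, dims = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(dims)]
            for i in range(dims)]


def first_component(cov, iterations=100):
    """Approximate the leading eigenvector of cov by power iteration."""
    size = len(cov)
    vec = [1.0] * size
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(size)) for i in range(size)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            raise ValueError("start vector has no component along the data")
        vec = [x / norm for x in nxt]
    return vec


def project(rows, direction):
    """Reduce each row to one number: its coordinate along direction."""
    return [sum(a * b for a, b in zip(r, direction)) for r in rows]


# Hypothetical rows in which the second variable is twice the first.
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]

centered = center(rows)
direction = first_component(covariance(centered))
reduced = project(centered, direction)
print(direction)
print(reduced)
```

 Because the two variables here are perfectly correlated, one projected
variable keeps all of the variation. On real data some information is
lost, and how much is acceptable depends on the goal of the task.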

 Choosing the data mining algorithm(s):
    Selecting method(s) to be used for searching for
      patterns in the data.
    Deciding which models and parameters may be
      appropriate.
    Matching a particular data mining method with the
      overall criteria of the KDD process.
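
 One common way to decide which parameters may be appropriate is to
compare candidates on data held back from training. The sketch below
does this for the parameter k of a k-nearest-neighbour classifier. Both
the choice of classifier and the hold-out procedure are assumptions made
for illustration, and the labelled points are invented.

```python
from collections import Counter


def knn_predict(train, x, k):
    """Predict the majority label among the k training points nearest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, validation, k):
    """Fraction of validation points that are labelled correctly."""
    hits = sum(knn_predict(train, x, k) == label for x, label in validation)
    return hits / len(validation)


def choose_k(train, validation, candidates):
    """Return the candidate k with the best validation accuracy.

    When several candidates tie, the first one listed is returned.
    """
    return max(candidates, key=lambda k: accuracy(train, validation, k))


# Hypothetical one-dimensional data; (3.4, "high") is a deliberately noisy label.
train = [(1.0, "low"), (2.0, "low"), (3.0, "low"), (3.4, "high"),
         (7.0, "high"), (8.0, "high"), (9.0, "high")]
validation = [(3.5, "low"), (2.5, "low"), (7.5, "high"), (8.5, "high")]

best_k = choose_k(train, validation, [1, 3, 5])
print(best_k)
```

 With k = 1 the noisy training point misleads the classifier, while
larger values of k outvote it. The same comparison loop can be used to
match other methods and parameters against the criteria of the overall
KDD process.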

 Data mining:
    Searching for patterns of interest in a particular
      representational form or a set of such representations,
      such as classification rules or trees, regression,
      clustering, and so forth.
 Interpreting mined patterns.
 Consolidating discovered knowledge.
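
 The data mining step itself can take many forms. As a small example of
one of the tasks named above, clustering, the sketch below groups
one-dimensional values with Lloyd's k-means algorithm. The algorithm
choice, the deterministic starting centroids, and the sample values are
all assumptions made for illustration.

```python
def kmeans_1d(values, k, iterations=20):
    """Cluster 1-D values with Lloyd's algorithm.

    Returns (centroids, clusters), where clusters is the assignment
    computed in the last iteration that ran.
    """
    ordered = sorted(values)
    # Deterministic start: pick k values spread evenly across the sorted data.
    step = (len(ordered) - 1) / (k - 1) if k > 1 else 0
    centroids = [ordered[round(i * step)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in ordered:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break  # Assignments are stable, so the algorithm has converged.
        centroids = updated
    return centroids, clusters


# Hypothetical measurements that form two obvious groups.
values = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]

centroids, clusters = kmeans_1d(values, k=2)
print(centroids)
print(clusters)
```

 Interpreting the result, for example deciding what the two groups mean
in the application domain, belongs to the later steps of the process.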
