2. Overview
History of Data Mining
Definition of Data Mining
What is Data Mining?
Data Mining as a whole Process
Why Data Mining is required
Applications of Data Mining
Functions for Data Mining
3. History of Data Mining
The term "Data mining" was introduced in the
1990s, but data mining is the evolution of a field
with a long history.
Early methods of identifying patterns in data
include Bayes' theorem (1700s) and regression
analysis (1800s).
4. Definition of Data Mining
Data mining is the process of discovering patterns in
large data sets involving methods at the intersection
of machine learning, statistics, and database systems.
Data mining is an interdisciplinary subfield of
computer science and statistics with an overall goal to
extract information (with intelligent methods) from a
data set and transform the information into a
comprehensible structure for further use.
6. What is Data Mining
Data mining is the analysis step of the "knowledge
discovery in databases" process.
Aside from the raw analysis step, it also involves
database and data management aspects, data pre-
processing, model and inference considerations,
interestingness metrics, complexity considerations,
post-processing of discovered structures,
visualization, and online updating.
7. What is Data Mining
Technically, data mining is the computational
process of analyzing data from different
perspectives, dimensions, and angles, and
categorizing or summarizing it into meaningful
information.
Data mining can be applied to any type of data,
e.g. Data Warehouses, Transactional Databases,
Relational Databases, Multimedia Databases,
Spatial Databases, Time-series Databases, and the
World Wide Web.
8. Data Analysis Vs. Data Mining
Data analysis is used to test models and
hypotheses on a dataset, e.g., analyzing the
effectiveness of a marketing campaign,
regardless of the amount of data.
In contrast, data mining uses machine-learning
and statistical models to uncover hidden
patterns in a large volume of data.
9. Data Mining as a whole process
The whole process of Data Mining comprises
three main phases:
1. Data Pre-processing – Data cleaning,
integration, selection and transformation take
place
2. Data Extraction – The actual data mining
takes place
3. Data Evaluation and Presentation –
Results are analyzed and presented
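The three phases can be sketched as a small pipeline. This is only an illustration: the records, the function names (preprocess, mine, present) and the choice of simple item counting as the mining step are assumptions made for this example, not part of the original material.

```python
from collections import Counter

# Hypothetical raw records: (customer, item); None marks a missing value.
raw_records = [
    ("alice", "Bread"), ("alice", "butter "), ("bob", "bread"),
    ("bob", None), ("carol", "BREAD"), ("carol", "butter"),
]

def preprocess(records):
    """Phase 1: clean (drop missing values) and transform (normalize text)."""
    return [(c, item.strip().lower()) for c, item in records if item is not None]

def mine(records):
    """Phase 2: the mining step, here just counting how often each item occurs."""
    return Counter(item for _, item in records)

def present(counts):
    """Phase 3: evaluate and present, most frequent items first."""
    return [f"{item}: {n}" for item, n in counts.most_common()]

report = present(mine(preprocess(raw_records)))
print("\n".join(report))
```

Each phase only consumes the output of the previous one, so a real mining algorithm could replace the counting step without touching the cleaning or reporting code.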
11. Why is Data Mining required?
There is a huge amount of data available in the
Information Industry. This data is of no use until it is
converted into useful information. It is necessary to
analyze this huge amount of data and extract useful
information from it.
Extraction of information is not the only process we
need to perform.
Data mining also involves other processes such as
Data Cleaning, Data Integration, Data Transformation,
Pattern Evaluation and Data Presentation.
12. Data mining applications
The information or knowledge extracted in this
way can be used for any of the following
applications −
o Market Analysis
o Fraud Detection
o Customer Retention
o Production Control
o Science Exploration
Apart from these, data mining can also be used in
the areas of sports, astrology, and Internet Web
Surf-Aid.
14. Market Analysis and Management
Market analysis (often called market basket
analysis) is the careful study of the purchases
made by customers in a supermarket.
The concept is mainly applied to identify the
items that a customer buys together.
For example, if a person buys bread, what are the
chances that he/she will also purchase butter?
This analysis helps companies design and promote
offers and deals. It is carried out with the
help of data mining.
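The bread-and-butter question can be made concrete with two standard association-rule measures, support and confidence. The baskets below are invented for illustration, and the helper names are assumptions of this sketch.

```python
# Hypothetical baskets, one set of items per customer visit.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= b for b in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimated chance of consequent given antecedent:
    support(antecedent and consequent) / support(antecedent)."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)

bread_butter_support = support({"bread", "butter"}, baskets)
bread_to_butter = confidence({"bread"}, {"butter"}, baskets)
print(bread_butter_support, bread_to_butter)
```

In this toy data, bread and butter appear together in 3 of 5 baskets, and 3 of the 4 baskets containing bread also contain butter, so the estimated chance of butter given bread is about 0.75.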
15. Market Analysis and Management
Listed below are the areas of marketing in
which data mining is used −
Customer Profiling − Data mining helps determine
what kind of people buy what kind of products.
Identifying Customer Requirements − Data
mining helps in identifying the best products for
different customers. It uses prediction to find the
factors that may attract new customers.
16. Market Analysis and Management
Cross Market Analysis − Data mining finds
associations and correlations between product sales.
Target Marketing − Data mining helps to find
clusters of model customers who share the same
characteristics such as interests, spending habits,
income, etc.
Determining Customer Purchasing Patterns − Data
mining helps in determining customer purchasing
patterns.
17. Market Analysis and Management
Providing Summary Information − Data mining
provides various multidimensional summary
reports.
18. Corporate Analysis and Risk Management
Data mining is used in the following fields of the
Corporate Sector −
Finance Planning and Asset Evaluation − It
involves cash flow analysis and prediction, and
contingent claim analysis to evaluate assets.
Resource Planning − It involves summarizing and
comparing the resources and spending.
Competition − It involves monitoring competitors
and market directions.
19. Fraud Detection
Data mining is also used in the fields of credit
card services and telecommunication to detect
fraud.
For fraudulent telephone calls, it helps to identify
the destination of the call, duration of the call,
time of the day or week, etc.
It also analyzes patterns that deviate from
expected norms (normal conditions).
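One simple way to flag deviations from expected norms is a z-score test on a single attribute such as call duration. This is a minimal sketch with invented data; the threshold of 2 standard deviations and the function name flag_outliers are assumptions, and real fraud systems combine many more signals.

```python
from statistics import mean, stdev

# Hypothetical call durations in minutes for one subscriber.
durations = [3, 4, 5, 4, 3, 5, 4, 60, 4, 3]

def flag_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` sample standard
    deviations away from the mean."""
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if sigma > 0 and abs(v - mu) / sigma > threshold]

suspicious = flag_outliers(durations)
print(suspicious)
```

Here only the 60-minute call is flagged, since every other call lies well within one standard deviation of the mean.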
20. Functions for Data Mining
Data mining functions are defined by the kinds of
patterns that can be mined.
Depending on the kind of data to be mined,
these functions fall into two categories −
Descriptive
Classification and Prediction
22. Descriptive Function
The descriptive function deals with the general
properties of data in the database. The
descriptive functions are −
Class/Concept Description
Mining of Frequent Patterns
Mining of Associations
Mining of Correlations
Mining of Clusters
23. Classification and Prediction
Classification is the process of finding a model
that describes the data classes or concepts.
The purpose is to be able to use this model to
predict the class of objects whose class label is
unknown.
This derived model is based on the analysis of
sets of training data.
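As an illustration of deriving a model from training data and using it to predict an unknown class label, here is a minimal nearest-neighbour sketch. The technique is one possible choice, not one named in the slides, and the features, labels and function name are invented for the example.

```python
import math

# Hypothetical training data: (annual income in thousands, age) -> class label.
training = [
    ((20, 25), "low spender"),
    ((25, 30), "low spender"),
    ((80, 45), "high spender"),
    ((90, 50), "high spender"),
]

def classify(point, training):
    """Predict the label of the training example closest to `point`
    (1-nearest-neighbour with Euclidean distance)."""
    _, label = min(training, key=lambda ex: math.dist(ex[0], point))
    return label

prediction = classify((85, 40), training)
print(prediction)
```

The new customer at (85, 40) has no class label, so the model assigns the label of the closest training example, which here is "high spender".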
24. Classification and Prediction
The derived model can be presented in the
following forms −
Classification (IF-THEN) Rules
Decision Trees
Mathematical Formulae
Neural Networks
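The first of these forms, classification (IF-THEN) rules, can be sketched as an ordered list of conditions. The attribute names, thresholds and labels below are hypothetical; in practice such rules would be learned from training data rather than written by hand.

```python
# Hypothetical rules: (IF condition, THEN class label), checked in order.
rules = [
    (lambda c: c["income"] >= 60 and c["age"] >= 35, "high spender"),
    (lambda c: c["income"] < 60, "low spender"),
]

def apply_rules(customer, rules, default="unknown"):
    """Return the label of the first rule whose IF part holds,
    or `default` when no rule matches."""
    for condition, label in rules:
        if condition(customer):
            return label
    return default

print(apply_rules({"income": 85, "age": 40}, rules))
print(apply_rules({"income": 70, "age": 25}, rules))
```

The second customer matches neither rule and falls through to the default label, which shows why a rule set usually needs a fallback class.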
25. Classification and Prediction
The functions involved in these processes
are as follows −
Classification
Prediction
Outlier Analysis
Evolution Analysis