Data mining involves extracting useful information from large datasets. It begins by analyzing simple data to develop representations, then extends this to more complex datasets. Data mining has applications in retail, banking, insurance, and medicine. The main data mining operations are predictive modeling, database segmentation, link analysis, and deviation detection. The CRISP-DM process standardizes the data mining process into business understanding, data understanding, data preparation, modeling, evaluation, and deployment phases.
2. What is data mining?
The process of extracting valid,
previously unknown, comprehensible,
and actionable information from large
databases and using it to make
crucial business decisions.
It starts by developing a
representation of simple data, which is
then extended to larger sets of data,
working on the premise that the larger
data has a structure similar to the
simple data.
3. Data mining Applications
It is applicable in almost all areas,
whether in business or in
science.
It serves different purposes and offers
different benefits depending on where
the technique is applied.
4. Data mining Applications
Retail/Marketing
Identifying buying patterns of customers.
Finding associations among customer
demographic characteristics.
Predicting response to mailing
campaigns.
Market basket analysis.
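To illustrate market basket analysis, here is a minimal sketch that counts how often pairs of items are bought together (support) and how often one item appears given another (confidence). The transactions and the function names pair_support and confidence are hypothetical, chosen only for illustration; real tools typically use more scalable algorithms.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions; each set is one customer's basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]

def pair_support(baskets):
    """Return the fraction of baskets that contain each item pair."""
    counts = Counter()
    for basket in baskets:
        # Sort so each pair is always stored in the same order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}

def confidence(baskets, antecedent, consequent):
    """Estimate how often `consequent` is bought when `antecedent` is."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    hits = sum(consequent in b for b in with_antecedent)
    return hits / len(with_antecedent)

support = pair_support(transactions)
bread_to_milk = confidence(transactions, "bread", "milk")
```

Pairs with high support and high confidence are candidates for rules such as "customers who buy bread also tend to buy milk".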
5. Data mining Applications
Banking
Detecting patterns of fraudulent credit
card use.
Identifying loyal customers.
Predicting customers likely to change
their credit card affiliation.
Determining credit card spending by
customer groups.
6. Data mining Applications
Insurance
Claims analysis.
Predicting which customers will buy
new policies.
Medicine
Characterizing patient behavior to
predict surgery visits.
Identifying successful medical
therapies for different illnesses.
7. Data mining Operations
The four main operations of data mining:
Predictive modeling
Database segmentation
Link analysis
Deviation detection
8. Data mining Operations
Predictive modeling
Uses observations to form a model of
the important characteristics of some
phenomenon.
Database segmentation
Partitions a database into an
unknown number of segments, or
clusters, of similar records.
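As a rough illustration of segmentation without a preset number of clusters, the sketch below groups records on a single numeric field and starts a new segment whenever the gap to the previous value is too large. This gap-based rule is a deliberate simplification, not the method the slides prescribe; the function segment_by_gap, the max_gap parameter, and the spend figures are all hypothetical.

```python
def segment_by_gap(values, max_gap):
    """Group sorted values; open a new segment when the gap exceeds max_gap."""
    segments = []
    for v in sorted(values):
        if segments and v - segments[-1][-1] <= max_gap:
            # Close enough to the previous value: same segment.
            segments[-1].append(v)
        else:
            # Too far away (or first value): start a new segment.
            segments.append([v])
    return segments

# Hypothetical monthly spend per customer.
spend = [12, 15, 14, 80, 85, 300, 13, 82]
segments = segment_by_gap(spend, max_gap=10)
```

The number of segments is not given in advance; it falls out of the data and the chosen gap.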
9. Data mining Operations
Link analysis
Establishes links, called associations,
between individual records or sets
of records in a database.
Deviation detection
Newest data mining operation
Often a source of true discovery
because it identifies outliers which
express deviation.
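One simple way to flag outliers is to measure how far each value lies from the median, in units of the median absolute deviation (MAD). The sketch below assumes this particular rule and a cut-off of 5; the function name find_outliers, the threshold, and the transaction amounts are hypothetical.

```python
from statistics import median

def find_outliers(values, threshold=5.0):
    """Return values whose distance from the median exceeds threshold * MAD."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # No spread to measure against, so nothing can be flagged.
        return []
    return [v for v in values if abs(v - med) / mad > threshold]

# Hypothetical credit card transaction amounts.
amounts = [50, 52, 49, 51, 48, 500]
outliers = find_outliers(amounts)
```

The median is used instead of the mean so that a single extreme value does not distort the baseline it is compared against.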
10. Data mining Process
Cross-Industry Standard Process for
Data Mining (CRISP-DM)
Specifies a data mining process
model that is not specific to any industry
or tool.
Evolved from the knowledge
discovery processes used widely in
industry and in direct response to user
requirements.
11. Data mining Process (cont…)
Major objectives of this specification are
to make large data mining projects run
more efficiently as well as to make them
cheaper, more reliable and more
manageable.
A hierarchical process model
12. Data mining Process (cont…)
The process is divided into 6 different
generic phases ranging from business
understanding to deployment of
project results.
The phases of CRISP-DM model are:
Business understanding
Data understanding
Data preparation
Modeling
13. Data mining Process (cont…)
Evaluation
Deployment
Business understanding
This phase focuses on understanding
the project objectives and requirements
from the business point of view.
Data understanding
This phase includes tasks for the initial
collection of the data and is concerned
with establishing the main characteristics
of the data.
14. Data mining Process (cont…)
Data preparation
This phase involves all the activities for
constructing the final data set on which
modeling tools can be applied directly.
Modeling
This phase is the actual data mining
operation and involves selecting modeling
techniques, selecting modeling parameters
and assessing the model created.
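The three modeling activities can be sketched with a deliberately tiny predictive model: a single threshold rule (the technique), a set of candidate thresholds (the parameter), and a held-out set for assessment. All records, candidate values, and function names here are hypothetical and only show the shape of the workflow.

```python
# Hypothetical labelled records: (monthly_spend, responded_to_campaign).
train = [(10, 0), (20, 0), (30, 0), (60, 1), (70, 1), (80, 1)]
holdout = [(15, 0), (25, 0), (65, 1), (90, 1)]

def predict(threshold, x):
    """Predict 1 when the value exceeds the threshold, else 0."""
    return 1 if x > threshold else 0

def accuracy(threshold, records):
    """Fraction of records that the threshold rule labels correctly."""
    correct = sum(predict(threshold, x) == y for x, y in records)
    return correct / len(records)

# Parameter selection: keep the candidate with the best training accuracy.
candidates = [5, 25, 45, 75]
best = max(candidates, key=lambda t: accuracy(t, train))

# Model assessment: score the chosen rule on records it has not seen.
holdout_accuracy = accuracy(best, holdout)
```

Assessing on held-out records rather than the training records gives a more honest estimate of how the model will behave on new data.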
15. Data mining Process (cont…)
Evaluation
This phase validates the model from the data
analysis point of view.
The model and the steps in modeling are
verified within the context of achieving the
business goals.
Deployment
This phase can be as simple as generating a report or as
complex as implementing a repeatable data
mining process across the enterprise.