2. Introduction
Data mining is the art and science of discovering
knowledge, insights, and patterns in data.
It is the act of extracting useful patterns from an
organized collection of data.
Patterns must be valid, novel, potentially useful,
and understandable.
Data mining is a multidisciplinary field that
borrows techniques from a variety of fields.
It uses knowledge of data quality and data
organization from the database area.
3. Cont…
Data mining also draws:
Analytical techniques from statistics and computer
science
Knowledge of decision making from the field of business
management
Example: customers who buy cheese and milk also
buy bread
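
The cheese, milk, and bread example above is an association rule. A minimal sketch of how such a rule could be scored is shown here, assuming the common support and confidence measures. The transactions and the function names are hypothetical and invented for illustration only; they are not taken from the slides.

```python
# Hypothetical shopping baskets, invented purely for illustration.
transactions = [
    {"cheese", "milk", "bread"},
    {"cheese", "milk", "bread", "eggs"},
    {"cheese", "milk"},
    {"milk", "bread"},
    {"cheese", "bread"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Among baskets containing the antecedent, the fraction that
    also contain the consequent. Returns 0.0 if the antecedent
    never occurs."""
    antecedent_support = support(antecedent, baskets)
    if antecedent_support == 0:
        return 0.0
    both = support(set(antecedent) | set(consequent), baskets)
    return both / antecedent_support


# Rule under test: {cheese, milk} -> {bread}
rule_support = support({"cheese", "milk", "bread"}, transactions)
rule_confidence = confidence({"cheese", "milk"}, {"bread"}, transactions)
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

With these made-up baskets, two of five contain all three items and two of the three cheese-and-milk baskets also contain bread. A rule is usually kept only if both measures clear thresholds chosen by the analyst.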
4. Gathering & selecting data
Data is growing in velocity,
volume, and variety.
To learn from data, quality data needs to be
effectively gathered and organized, and
then sufficiently mined.
Gathering and curating data takes time and effort,
especially when data is unstructured or semi-structured.
Knowledge of the business domain helps to select
the right streams of data for pursuing new
insights.
5. Data cleaning & preparation
Duplicate data needs to be removed
Missing values need to be filled in
Data elements should be comparable
Continuous values may need to be binned
Outlier data elements need to be removed
Ensure that the data is representative of the
phenomena
Data may need to be selected to increase information
density
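
Several of the cleaning steps above can be sketched with the Python standard library. This is an illustrative sketch only: the records, the median fill, the 1.5 x IQR outlier rule, and the age bands are all assumptions chosen for the example, not methods prescribed by the slides.

```python
import statistics

# Hypothetical raw records (customer id, age); values are invented.
raw = [
    (1, 34), (2, 45), (2, 45),   # second (2, 45) is a duplicate
    (3, None),                   # missing age
    (4, 29), (5, 51), (6, 38),
    (7, 240),                    # implausible outlier
]

# 1. Remove duplicate records while preserving their order.
deduped = list(dict.fromkeys(raw))

# 2. Fill missing ages with the median of the known ages (an assumption;
#    other fill strategies may suit other data).
median_age = statistics.median(age for _, age in deduped if age is not None)
filled = [(cid, median_age if age is None else age) for cid, age in deduped]

# 3. Remove outliers that fall outside 1.5 * IQR of the quartiles.
q1, _, q3 = statistics.quantiles([age for _, age in filled], n=4)
spread = q3 - q1
low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
cleaned = [(cid, age) for cid, age in filled if low <= age <= high]


# 4. Bin the continuous age value into hypothetical bands.
def age_band(age):
    if age < 35:
        return "under 35"
    if age < 50:
        return "35 to 49"
    return "50 and over"


binned = [(cid, age_band(age)) for cid, age in cleaned]
print(binned)
```

The order of the steps is a design choice: the median is used for filling because it is less sensitive to the outlier than the mean, so filling before outlier removal does not distort the result much here.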
6. Outputs of data mining
Data mining outputs serve different types of
objectives.
Data mining outputs include:
Decision trees
Regression equations or mathematical functions
Business rules
9. Data mining techniques
Decision trees
Regression
Artificial neural networks
Cluster analysis
Association rules
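
As one illustration of the techniques listed above, regression fits a mathematical function to observed data. The sketch below fits a straight line by ordinary least squares. The data points and the variable meanings (spend and sales) are hypothetical, and real projects would normally use a statistics library rather than hand-written formulas.

```python
# Hypothetical observations: x could be advertising spend, y could be sales.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]


def fit_line(x_values, y_values):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x_values)
    mean_x = sum(x_values) / n
    mean_y = sum(y_values) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values))
    sxx = sum((x - mean_x) ** 2 for x in x_values)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.2f} * x + {intercept:.2f}")

# Use the fitted function to estimate y for a new x value.
predicted = slope * 6.0 + intercept
```

The fitted equation is itself the data mining output: it can be handed to a decision maker or embedded in another system to estimate new values.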
10. Tools & platforms for data mining
Simple or sophisticated
Standalone or embedded
Open source or commercial
User interface
Data formats
12. Data mining best practices
Business understanding
Data understanding
Data preparation
Modeling
Model evaluation
Dissemination and rollout
14. Myths about Data Mining
Myth #1 Data mining is about algorithms
Myth #2 Data mining is about predictive accuracy
Myth #3 Data mining requires a data warehouse
Myth #4 Data mining requires large quantities of
data
Myth #5 Data mining requires a technology expert
15. Data mining mistakes
Mistake #1 Selecting the wrong problem for data
mining
Mistake #2 Buried under mountains of data
without clear metadata
Mistake #3 Disorganized data mining
Mistake #4 Insufficient business knowledge
Mistake #5 Incompatibility of data mining tools
and datasets
Mistake #6 Looking only at aggregated results
and not at individual records/predictions
Mistake #7 Not measuring your results differently
from the way your sponsor measures them