Defining the metrics by which the model will be evaluated
Defining the final objective for the data mining project
Prepare and Explore Data
Data can be scattered across a company and stored in different formats
It may contain inconsistencies such as flawed or missing entries
For example, a record may show that a customer bought a product before that customer was born
Understand the data in order to make appropriate decisions when models are created
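A consistency check of this kind can be sketched in a few lines of Python. The records, field order, and the function name find_problem_records are hypothetical and only illustrate the idea of flagging missing or impossible entries before modeling.

```python
from datetime import date

# Hypothetical records: (customer_id, birth_date, purchase_date)
records = [
    ("C001", date(1980, 5, 17), date(2010, 3, 2)),
    ("C002", date(1995, 8, 9), date(1990, 1, 15)),   # purchase precedes birth
    ("C003", None, date(2012, 7, 30)),               # missing birth date
]

def find_problem_records(rows):
    """Return (customer_id, reason) pairs for rows that fail basic checks."""
    problems = []
    for customer_id, birth, purchase in rows:
        if birth is None or purchase is None:
            problems.append((customer_id, "missing date"))
        elif purchase < birth:
            problems.append((customer_id, "purchase before birth"))
    return problems

problems = find_problem_records(records)
for customer_id, reason in problems:
    print(customer_id, reason)
```

Flagged rows are reported rather than silently dropped, so an analyst can decide whether to correct, exclude, or investigate them.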
Building Model and Validating Model
Knowledge gained from the Exploring Data step helps define and create a mining model.
A model typically contains input columns, an identifying column, and a predictable column.
Patterns are found by passing the original data through a mathematical algorithm.
The model must be validated before it is put into production. Several tests are run.
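One common validation test is a holdout check: fit the model on part of the data and score it on the rest. The sketch below assumes made-up cases with an identifying column, one input column, and a predictable column, and uses a simple threshold rule as a stand-in for a real mining algorithm.

```python
import random

# Hypothetical cases: (case_id, input_value, label).
# The label is 1 when the input value is 5 or more.
cases = [(i, i % 10, 1 if i % 10 >= 5 else 0) for i in range(100)]

rng = random.Random(0)   # fixed seed so the split is repeatable
rng.shuffle(cases)
train, test = cases[:70], cases[70:]

def accuracy(threshold, rows):
    """Share of rows where the rule 'input >= threshold' matches the label."""
    hits = sum(1 for _, x, y in rows if (1 if x >= threshold else 0) == y)
    return hits / len(rows)

# "Training": pick the threshold that fits the training rows best.
best_threshold = max(range(11), key=lambda t: accuracy(t, train))

# "Validation": score the chosen threshold on rows the model never saw.
test_accuracy = accuracy(best_threshold, test)
print(best_threshold, test_accuracy)
```

Scoring on held-out rows gives a fairer estimate of how the model will behave in production than scoring on the rows used to build it.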
Data Mining Algorithms
Linear Regression Algorithm
Decision Trees Algorithm
Naive Bayes Algorithm
Sequence Clustering Algorithm
Time Series Algorithm
Neural Network Algorithm (SSAS)
Logistic Regression Algorithm
Financial Data Mining
Data Mining In Healthcare
Scientific Data Mining
Data Mining in Oil and Gas industry
Real World Example
Consider a bank that gives loans to customers and has a dataset of a group of customers who selected a financial loan product, some of whom went "BAD".
The information we will make use of comes from standard credit reports provided by all the major credit bureaus, including variables such as:
Number of credit reports requested for this person in last 6 months
Number of credit cards with balances greater than 80% of available credit
Number of new credit accounts opened in last 12 months
How long ago was oldest account opened?
How long ago was newest account opened?
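As an illustration, the variables above can be laid out as one record per customer, with an identifying column, input columns, and a predictable column. The class name, field names, the use of months as the unit, and the sample values are all assumptions made for this sketch, not part of any credit bureau format.

```python
from dataclasses import dataclass

@dataclass
class CreditRecord:
    customer_id: str                      # identifying column
    inquiries_last_6_months: int          # credit reports requested
    cards_over_80_pct_utilization: int    # cards with balance > 80% of limit
    new_accounts_last_12_months: int
    months_since_oldest_account: int      # unit (months) is an assumption
    months_since_newest_account: int      # unit (months) is an assumption
    went_bad: bool                        # predictable column

    def inputs(self):
        """Return the five input columns as a list of numbers."""
        return [
            self.inquiries_last_6_months,
            self.cards_over_80_pct_utilization,
            self.new_accounts_last_12_months,
            self.months_since_oldest_account,
            self.months_since_newest_account,
        ]

# Made-up example customer
record = CreditRecord("C001", 3, 2, 1, 120, 4, went_bad=True)
print(record.inputs(), record.went_bad)
```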
Data mining can generate new business opportunities by:
Automated prediction of trends and behaviors
Data mining automates the process of finding predictive information in a large database. For example, data on past promotional mailings can be used to identify the targets most likely to maximize return on investment in future mailings.
Automated discovery of previously unknown patterns
Data mining tools sweep through databases and identify previously hidden patterns. An example of pattern discovery is the analysis of retail sales data to identify seemingly unrelated products that are often purchased together.
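The retail example can be sketched by counting how often each pair of products appears in the same basket. The baskets below are invented for illustration, and simple pair counting is only one basic way to surface products that are bought together.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sales transactions, one set of products per basket
baskets = [
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"diapers", "beer"},
]

pair_counts = Counter()
for basket in baskets:
    # Sort so each pair is always stored in the same order
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

top_pair, top_count = pair_counts.most_common(1)[0]
print(top_pair, top_count)
```

Pairs with high counts are candidates for further analysis, such as checking whether the association is stronger than chance.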
What if every telephone call you make, every credit card purchase you make, every flight you take, every visit to the doctor you make, every warranty card you send in, every employment application you fill out, every school record you have, your credit record, every web page you visit ... was all collected together? A lot would be known about you!
Data Readiness for Analysis:
Data mining requires a consolidated, de-duplicated, and cleaned data store to draw from. An estimated 70 to 85 percent of the work in building data mining models goes into cleaning and preparing data prior to a specific analysis.
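A minimal sketch of de-duplication, assuming made-up customer rows where the same person appears with different spacing and letter case. The normalize function and its rules are illustrative; real cleaning rules depend on the data.

```python
def normalize(record):
    """Collapse extra whitespace and lowercase both fields."""
    name, email = record
    return (" ".join(name.split()).lower(), email.strip().lower())

# Hypothetical raw rows gathered from different systems
raw = [
    ("Ann Lee", "ann@example.com"),
    ("ann  lee ", "ANN@example.com "),   # same customer, different formatting
    ("Bob Roy", "bob@example.com"),
]

seen = set()
clean = []
for record in raw:
    key = normalize(record)
    if key not in seen:          # keep only the first copy of each customer
        seen.add(key)
        clean.append(key)

print(clean)
```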
Future of Data Mining
Intelligent agents turned loose on medical research data or on sub-atomic particle data.
Computers may reveal new treatments for diseases or new insights into the nature of the universe.