Data mining

Published in: Education

  1. DEFINITION: Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses.
  2. • Extract, transform, and load transaction data onto the data warehouse system.
     • Store and manage the data in a multidimensional database system.
     • Provide data access to business analysts and information technology professionals.
     • Analyze the data with application software.
     • Present the data in a useful format, such as a graph or table.
  3. Classes, Clusters, Association, Sequential patterns
  4. Classes: Stored data is used to locate data in predetermined groups. For example, a restaurant chain could mine customer purchase data to determine when customers visit and what they typically order. This information could be used to increase traffic by offering daily specials.
  5. Clusters: Data items are grouped according to logical relationships or consumer preferences. For example, data can be mined to identify market segments or consumer affinities.
  6. Association: Data can be mined to identify associations. The beer-diaper example is a well-known case of associative mining.
  7. Sequential patterns: Data is mined to anticipate behavior patterns and trends. For example, an outdoor equipment retailer could predict the likelihood of a backpack being purchased based on a consumer's purchase of sleeping bags and hiking shoes.
  8. | Evolutionary Step | Business Question | Enabling Technologies | Product Providers | Characteristics |
     |---|---|---|---|---|
     | Data Collection (1960s) | "What was my total revenue in the last five years?" | Computers, tapes, disks | IBM, CDC | Retrospective, static data delivery |
     | Data Access (1980s) | "What were unit sales in New England last March?" | Relational databases (RDBMS), Structured Query Language (SQL), ODBC | Oracle, Sybase, Informix, IBM, Microsoft | Retrospective, dynamic data delivery at record level |
     | Data Warehousing & Decision Support (1990s) | "What were unit sales in New England last March? Drill down to Boston." | On-line analytic processing (OLAP), multidimensional databases, data warehouses | Pilot, Comshare, Arbor, Cognos, Microstrategy | Retrospective, dynamic data delivery at multiple levels |
     | Data Mining (Emerging Today) | "What’s likely to happen to Boston unit sales next month? Why?" | Advanced algorithms, multiprocessor computers, massive databases | Pilot, Lockheed, IBM, SGI, numerous startups (nascent industry) | Prospective, proactive information delivery |
  9. Techniques: Neural Network, Decision Tree, Visualisation, Link Analysis
  10. Neural Network
      • Are used in a black-box fashion.
      • One creates a training data set, lets the neural network learn patterns based on known outcomes, then sets the neural network loose on huge amounts of data.
      • For example, a credit card company has 3,000 records, 100 of which are known fraud records.
      • The data set is used to train the neural network so that it learns the difference between the fraud records and the legitimate ones.
  11. Link analysis
      • This is another technique for associating like records.
      • Not used much, but there are some tools created just for this.
      • As the name suggests, the technique tries to find links, whether among customers or transactions, and to demonstrate those links.
  12. Visualisation
      • Helps users understand their data.
      • Bridges the gap from text-based to graphical presentation.
      • Decision tree, rule, cluster, and pattern visualization help users see data relationships rather than read about them.
      • Many of the stronger data mining programs have made strides in improving their visual content over the past few years.
  13. Decision Tree
      • Use real data mining algorithms.
      • Decision trees help with classification and produce very descriptive information, helping users to understand their data.
      • A decision tree process will generate the rules followed in a process.
      • For example, a lender at a bank goes through a set of rules when approving a loan.
      • Based on the loan data a bank has, the outcomes of the loans, and the limits of acceptable levels of default, the decision tree can set up the guidelines for the lending institution.
  14. PROCESS STAGES
      1. The initial exploration
      2. Model building or pattern identification with validation/verification
      3. Deployment
  15. Stage 1: Exploration
      • This stage usually starts with data preparation, which may involve cleaning data, data transformations, selecting subsets of records and - in case of data sets with large numbers of variables ("fields")
  16. Stage 2: Model building and validation
      This stage involves considering various models and choosing the best one based on predictive performance (i.e., explaining the variability in question and producing stable results across samples).
  17. Process Models (CRISP-DM): Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment
  18. Six Sigma (DMAIC): Define, Measure, Analyze, Improve, Control. SEMMA: Sample, Explore, Modify, Model, Assess.
  19. Stage 3: Deployment
      This final stage involves taking the model selected as best in the previous stage and applying it to new data in order to generate predictions or estimates of the expected outcome.
  20. • KDnuggets and Rexer Analytics have surveyed people involved in data mining about which software they use most.
      • While the most popular software is not necessarily the best for a particular purpose, these surveys can help guide us in choosing which software to evaluate.
  21. • Includes a wide variety of methods.
      • An easy-to-use interface makes it accessible to general users.
      • Flexibility and extensibility make it suitable for academic users.
      • Is written in Java and released under the GNU General Public Licence (GPL).
      • Runs on Windows, Linux, Mac, and other platforms.
  22. • Part of the SAS suite of analysis software; uses a client-server architecture with a Java-based client, allowing parallel processing and grid computing.
      • Can be deployed on both Windows and Linux/Unix platforms.
      • User interface: an easy-to-use data-flow GUI.
      • Can integrate code written in the SAS language.
      • A data mining package with multiple techniques and a data-flow interface.
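
Slide 6 mentions associative mining and the beer-diaper example without showing how an association is measured. The sketch below is one common way to do it, using the standard support, confidence, and lift measures. The baskets and the function names are invented for illustration and are not taken from the slides.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical market baskets, made up for this illustration only.
BASKETS: List[FrozenSet[str]] = [
    frozenset({"beer", "diapers", "chips"}),
    frozenset({"diapers", "milk"}),
    frozenset({"beer", "diapers"}),
    frozenset({"milk", "bread"}),
    frozenset({"beer", "chips"}),
]


def support(items: Iterable[str], baskets: List[FrozenSet[str]]) -> float:
    """Fraction of baskets that contain every item in `items`."""
    wanted = frozenset(items)
    return sum(1 for basket in baskets if wanted <= basket) / len(baskets)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               baskets: List[FrozenSet[str]]) -> float:
    """Estimated P(consequent | antecedent) over the baskets."""
    antecedent = frozenset(antecedent)
    base = support(antecedent, baskets)
    if base == 0:
        return 0.0
    return support(antecedent | frozenset(consequent), baskets) / base


def lift(antecedent: Iterable[str], consequent: Iterable[str],
         baskets: List[FrozenSet[str]]) -> float:
    """Confidence divided by the consequent's baseline support.

    Values above 1 suggest the items co-occur more often than chance.
    """
    baseline = support(consequent, baskets)
    if baseline == 0:
        return 0.0
    return confidence(antecedent, consequent, baskets) / baseline


if __name__ == "__main__":
    print("support(diapers, beer):", support({"diapers", "beer"}, BASKETS))
    print("confidence(diapers -> beer):", confidence({"diapers"}, {"beer"}, BASKETS))
    print("lift(diapers -> beer):", lift({"diapers"}, {"beer"}, BASKETS))
```

A rule such as "diapers → beer" is usually kept only if both its support and its confidence clear thresholds chosen by the analyst; those thresholds are a judgment call and are not specified in the slides.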
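
Slide 13 says a decision tree can derive lending guidelines from past loan outcomes. As a rough illustration, the sketch below learns a single split (a one-level tree, often called a decision stump) by minimizing weighted Gini impurity. The loan records, the debt-to-income feature, and the function names are hypothetical; a real tree would repeat the split recursively over many features.

```python
from typing import List, Optional, Tuple

# (debt-to-income ratio, defaulted?) pairs: hypothetical loan history.
LOANS: List[Tuple[float, bool]] = [
    (0.10, False), (0.15, False), (0.20, False), (0.25, False),
    (0.35, True), (0.40, False), (0.45, True), (0.55, True),
]


def gini(labels: List[bool]) -> float:
    """Gini impurity of a group of binary labels (0.0 means pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(records: List[Tuple[float, bool]]) -> Tuple[Optional[float], float]:
    """Return (threshold, weighted impurity) of the best single split.

    Candidate thresholds are midpoints between adjacent distinct values.
    """
    values = sorted({x for x, _ in records})
    best_threshold: Optional[float] = None
    best_score = float("inf")
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in records if x <= threshold]
        right = [y for x, y in records if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(records)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


if __name__ == "__main__":
    threshold, score = best_split(LOANS)
    print(f"Guideline: approve when debt-to-income <= {threshold:.2f} "
          f"(weighted Gini {score:.4f})")
```

The learned threshold reads directly as a rule, which is the descriptive quality the slide attributes to decision trees.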
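
Slides 16 and 19 describe choosing the best model by predictive performance across samples and then applying it to new data. One way this could look in practice is k-fold cross-validation followed by a prediction on an unseen input, sketched below. The two candidate models, the data, and all names are assumptions made for the example; the slides do not prescribe a specific validation method.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

Model = Callable[[float], float]
Fitter = Callable[[Sequence[float], Sequence[float]], Model]


def fit_mean(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """Baseline model: always predict the mean of the training targets."""
    average = mean(ys)
    return lambda x: average


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """Ordinary least-squares fit of a straight line y = a + b*x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def cv_mse(fit: Fitter, xs: Sequence[float], ys: Sequence[float], k: int = 4) -> float:
    """Mean squared error over k folds (every k-th record is held out)."""
    errors: List[float] = []
    for fold in range(k):
        held_out = set(range(fold, len(xs), k))
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        model = fit(train_x, train_y)
        errors.extend((model(xs[i]) - ys[i]) ** 2 for i in held_out)
    return mean(errors)


# Hypothetical historical data, roughly following y = 2x.
XS = [1, 2, 3, 4, 5, 6, 7, 8]
YS = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

# Stage 2: score each candidate on held-out data and keep the best.
candidates: Dict[str, Fitter] = {"mean": fit_mean, "line": fit_line}
scores = {name: cv_mse(fit, XS, YS) for name, fit in candidates.items()}
best_name = min(scores, key=scores.get)

# Stage 3: refit the chosen model on all data and apply it to a new record.
best_model = candidates[best_name](XS, YS)
prediction = best_model(9.0)

if __name__ == "__main__":
    print("cross-validated MSE:", scores)
    print("selected model:", best_name)
    print("prediction for x = 9:", round(prediction, 2))
```

Scoring on held-out folds rather than on the training data is what checks that results are stable across samples, the criterion slide 16 names.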