Data Mining by Carol Zhou (5/2)

Transcript

  • 1. Data Mining, Carol Zhou, CS157B, Spring 2006
  • 2. Preview:
    • What is Data Mining
    • Five major elements of DM
    • Results of Data Mining
    • Data Mining versus OLAP
    • Six phases of the DM Process
    • References
  • 3. What is Data Mining
    • The process of analyzing data from different perspectives and summarizing it into useful information
    • The process of finding correlations or patterns among dozens of fields in large relational databases.
    • Analytical tools for analyzing data
  • 4. Five major elements of DM:
    • Extract, transform, and load transaction data onto the data warehouse system.
    • Store and manage the data in a multidimensional database system.
    • Provide data access to business analysts and information technology professionals.
    • Analyze the data by application software.
    • Present the data in a useful format, such as a graph or table.
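The five elements above can be sketched end to end on a toy scale. This is a minimal illustration, not a real warehouse: the CSV content, the `sales` table, and the function names are all assumptions made for the example, and an in-memory SQLite database stands in for the multidimensional store.

```python
import csv
import io
import sqlite3

# Hypothetical raw transaction export; a real source would be a file or an operational database.
RAW_CSV = """store,product,amount
north,widget,10.50
north,gadget,4.25
south,widget,7.00
south,widget,3.00
"""

def extract(text):
    """Extract: parse the raw rows into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalise names and convert amounts to integer cents."""
    return [
        (r["store"].strip().lower(), r["product"].strip().lower(), round(float(r["amount"]) * 100))
        for r in rows
    ]

def load(conn, records):
    """Load and store: write the cleaned records into a warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (store TEXT, product TEXT, cents INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()

def summarize(conn):
    """Analyze: total sales per store and product."""
    cur = conn.execute(
        "SELECT store, product, SUM(cents) FROM sales "
        "GROUP BY store, product ORDER BY store, product"
    )
    return cur.fetchall()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
summary = summarize(conn)

# Present: print the result as a small table.
for store, product, cents in summary:
    print(f"{store:<6} {product:<7} {cents / 100:>8.2f}")
```

Storing amounts as integer cents is a design choice made here to avoid floating-point drift when summing; the slides do not prescribe it.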
  • 5. Results of Data Mining:
    • Forecasting what may happen in the future
    • Classifying people or things into groups by recognizing patterns
    • Clustering people or things into groups based on their attributes
    • Associating what events are likely to occur together
    • Sequencing what events are likely to lead to later events
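As one concrete instance of the "associating" result, the sketch below counts which items occur together in the same transaction and computes a simple rule confidence. The baskets and function names are invented for illustration; real association mining would normally use a dedicated algorithm such as Apriori on far larger data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets, invented for illustration.
BASKETS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]

def pair_support(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

def confidence(baskets, antecedent, consequent):
    """Fraction of baskets containing `antecedent` that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)

pairs = pair_support(BASKETS)
butter_to_bread = confidence(BASKETS, "butter", "bread")
bread_to_milk = confidence(BASKETS, "bread", "milk")
```

In this toy data every basket with butter also contains bread, so that rule has confidence 1.0, while only two of the three bread baskets contain milk.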
  • 6. Data Mining versus OLAP
    • OLAP: On-line Analytical Processing
    • It provides a very good view of what is happening, but it cannot predict what will happen in the future or explain why it is happening
  • 7. Data Mining versus Data Analysis (DA)
    • Data Mining
      • Originally developed to act as expert systems to solve problems
      • Less interested in the mechanics of the technique
        • If it makes sense, then let's use it
      • Does not require assumptions to be made about the data
      • Can find patterns in very large amounts of data
      • Requires understanding of the data and the business problem
    • Data Analysis
      • Tests for statistical correctness of models
        • Are the statistical assumptions of the models correct?
      • Hypothesis testing
        • Is the relationship significant?
        • Use a test to validate significance
      • Tends to rely on sampling
      • Techniques are not good for large amounts of data
      • Requires strong statistical skills
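To make "use a test to validate significance" concrete, here is a minimal sketch of one possible test: a permutation test on the Pearson correlation between two fields. The choice of test, the sample data, and the function names are assumptions for illustration; the slides do not name a specific test.

```python
import random
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def permutation_p_value(xs, ys, trials=2000, seed=0):
    """Estimate how often shuffled data shows a correlation at least as strong as the observed one."""
    rng = random.Random(seed)
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (hits + 1) / (trials + 1)

# Hypothetical paired observations (for example, ad spend and sales), invented for illustration.
spend = [1, 2, 3, 4, 5, 6, 7, 8]
sales = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

r = pearson(spend, sales)
p = permutation_p_value(spend, sales)
print(f"r = {r:.3f}, p = {p:.4f}")
```

A small p-value suggests the relationship is unlikely to be a chance artifact of this sample, which is the question the data analysis side of the comparison asks.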
  • 8. Phase one in the DM Process
    • Business Understanding:
      • Statement of business objective
      • Statement of data mining objective
      • Statement of success criteria
  • 9. Phase two in the DM Process
    • Data Understanding
      • Explore the data and verify the quality
      • Find outliers
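One common way to find outliers during data understanding is the interquartile range (IQR) rule, sketched below. The rule, the multiplier of 1.5, and the sample values are assumptions chosen for illustration; the slides do not specify a method.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR below the first or above the third quartile."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

# Hypothetical transaction amounts with one suspicious entry.
amounts = [12, 14, 15, 15, 16, 17, 18, 19, 20, 250]
outliers = iqr_outliers(amounts)
print(outliers)
```

Flagged values still need a human decision: an outlier may be a data-entry error to clean or a genuine rare event worth modeling.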
  • 10. Phase three in the DM Process
    • Data Preparation:
      • Usually takes over 90% of our time
        • Collection
        • Assessment
        • Consolidation and cleaning
          • Table links, aggregation level, missing values, etc.
        • Data selection
          • Active role in ignoring non-contributory data
          • Use of samples
          • Visualization tools
        • Transformations: create new variables
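Three of the preparation steps above (handling missing values, creating a new variable, and dropping non-contributory fields) can be sketched as follows. The records, field names, and the choice of median imputation are assumptions for illustration only.

```python
from statistics import median

# Hypothetical customer records; None marks a missing value.
RECORDS = [
    {"id": 1, "age": 34, "income": 52000, "visits": 12, "spend": 600.0},
    {"id": 2, "age": None, "income": 48000, "visits": 3, "spend": 90.0},
    {"id": 3, "age": 51, "income": None, "visits": 8, "spend": 320.0},
    {"id": 4, "age": 29, "income": 61000, "visits": 0, "spend": 0.0},
]

def impute_median(records, field):
    """Cleaning: replace missing values of `field` with the median of the known values."""
    fill = median(r[field] for r in records if r[field] is not None)
    return [{**r, field: fill if r[field] is None else r[field]} for r in records]

def add_spend_per_visit(records):
    """Transformation: derive a new variable, guarding against division by zero."""
    return [
        {**r, "spend_per_visit": r["spend"] / r["visits"] if r["visits"] else 0.0}
        for r in records
    ]

def select_fields(records, fields):
    """Data selection: keep only the fields expected to contribute to the model."""
    return [{f: r[f] for f in fields} for r in records]

prepared = impute_median(RECORDS, "age")
prepared = impute_median(prepared, "income")
prepared = add_spend_per_visit(prepared)
prepared = select_fields(prepared, ["age", "income", "spend_per_visit"])
```

Each step returns new records rather than modifying the input, which keeps the raw data available for reassessment if a cleaning decision turns out to be wrong.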
  • 11. Phase four in the DM Process
    • Model building
      • Selection of the modeling technique is based on the data mining objective
      • Modeling is an iterative process that differs for supervised and unsupervised learning
        • Model for either description or prediction
  • 12. Types of Models
    • Prediction Models for Predicting and Classifying
    • Descriptive Models for Grouping and Finding Associations
  • 13. Phase five in the DM Process
    • Model Evaluation
      • Evaluation of model: how well it performed on test data
      • Methods and criteria depend on model type
      • Interpretation of model: whether it is important, and whether it is easy or hard, depends on the algorithm
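Evaluating "how well it performed on test data" usually means holding some labelled examples back from training. The sketch below builds a deliberately simple prediction model (a one-nearest-neighbour classifier) and measures its accuracy on a held-out split. The classifier, the 75/25 split, and the data are assumptions for illustration, not the method used in the slides.

```python
import random

def nearest_neighbor_predict(train, point):
    """Predict the label of `point` as the label of its closest training example."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(train, key=lambda example: squared_distance(example[0], point))
    return label

def holdout_split(examples, test_fraction=0.25, seed=0):
    """Shuffle the examples and split them into training and test sets."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def accuracy(train, test):
    """Fraction of test examples whose predicted label matches the true label."""
    correct = sum(nearest_neighbor_predict(train, x) == y for x, y in test)
    return correct / len(test)

# Hypothetical labelled data: (visits, spend) -> customer segment.
EXAMPLES = [
    ((1, 10), "low"), ((2, 15), "low"), ((1, 20), "low"), ((3, 12), "low"),
    ((9, 90), "high"), ((8, 110), "high"), ((10, 95), "high"), ((9, 120), "high"),
]

train, test = holdout_split(EXAMPLES)
score = accuracy(train, test)
print(f"accuracy on {len(test)} held-out examples: {score:.2f}")
```

Accuracy suits a classifier like this one; as the slide notes, other model types call for other criteria, such as error measures for numeric prediction.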
  • 15. Phase six in the DM Process
    • Deployment
      • Determine how the results need to be utilized
      • Who needs to use them?
      • How often do they need to be used?
    • Deploy data mining results by:
      • Scoring a database
      • Using results as business rules
      • Interactive scoring online
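"Scoring a database" and "using results as business rules" can be combined in a small sketch: a rule derived from a model is applied to every row of a table and the result is written back. The `customers` table, the thresholds, and the `score_customer` rule are hypothetical, and an in-memory SQLite database stands in for the production system.

```python
import sqlite3

def score_customer(visits, spend):
    """Hypothetical business rule standing in for a deployed model."""
    return "high_value" if visits >= 5 and spend >= 300 else "standard"

def score_database(conn):
    """Batch scoring: compute a segment for every customer and store it in the table."""
    rows = conn.execute("SELECT id, visits, spend FROM customers").fetchall()
    conn.executemany(
        "UPDATE customers SET segment = ? WHERE id = ?",
        [(score_customer(visits, spend), customer_id) for customer_id, visits, spend in rows],
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, visits INTEGER, spend REAL, segment TEXT)"
)
conn.executemany(
    "INSERT INTO customers (id, visits, spend) VALUES (?, ?, ?)",
    [(1, 12, 600.0), (2, 3, 90.0), (3, 8, 320.0)],
)

score_database(conn)
segments = dict(conn.execute("SELECT id, segment FROM customers"))
print(segments)
```

The same `score_customer` function could be called per request for interactive online scoring, so batch and online deployment share one rule.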
  • 16. References:
    • http://www.dama-ncr.org/Library/2001.11.14-Laura%20Squier.ppt
    • http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm