Data Mining: Carol Zhou, CS157B, Spring 2006
Data Mining by Carol Zhou (5/2)

  2. Preview: <ul><li>What is Data Mining</li></ul><ul><li>Five major elements of DM</li></ul><ul><li>Results of Data Mining</li></ul><ul><li>Data Mining versus OLAP</li></ul><ul><li>Six phases of the DM Process</li></ul><ul><li>References</li></ul>
  3. What is Data Mining? <ul><li>The process of analyzing data from different perspectives and summarizing it into useful information</li></ul><ul><li>The process of finding correlations or patterns among dozens of fields in large relational databases</li></ul><ul><li>Analytical tools for analyzing data</li></ul>
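To make "finding correlations among fields" concrete, here is a minimal sketch that computes the Pearson correlation for every pair of numeric fields and keeps the strongly correlated ones. The field names, values, and the 0.8 threshold are hypothetical, and the sketch assumes no field is constant (a constant field would make the denominator zero).

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def correlated_pairs(table, threshold=0.8):
    """Return (field_a, field_b, r) for every field pair with |r| >= threshold."""
    results = []
    for a, b in combinations(sorted(table), 2):
        r = pearson(table[a], table[b])
        if abs(r) >= threshold:
            results.append((a, b, round(r, 3)))
    return results

# Hypothetical columns from a customer table (one list per field).
customers = {
    "age": [23, 35, 47, 52, 61],
    "income": [28_000, 45_000, 61_000, 70_000, 80_000],
    "store_visits": [12, 9, 14, 8, 11],
}
print(correlated_pairs(customers))
```

With dozens of fields the number of pairs grows quadratically, which is one reason mining tools automate this search.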
  4. Five major elements of DM: <ul><li>Extract, transform, and load transaction data into the data warehouse system.</li></ul><ul><li>Store and manage the data in a multidimensional database system.</li></ul><ul><li>Provide data access to business analysts and information technology professionals.</li></ul><ul><li>Analyze the data with application software.</li></ul><ul><li>Present the data in a useful format, such as a graph or table.</li></ul>
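The extract, transform, and load step can be sketched with the Python standard library. This is an illustration under stated assumptions: the CSV export, column names, and cleaning rule are hypothetical, and an in-memory SQLite table stands in for the warehouse (SQLite is relational, not the multidimensional store the slide mentions).

```python
import csv
import io
import sqlite3

# Hypothetical raw transaction export (the "extract" source).
RAW = """txn_id,store,amount
1,north,19.99
2,south,5.00
3,north,42.50
4,south,
"""

def extract(text):
    """Extract: read raw rows from a CSV export."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows with a missing amount and convert field types."""
    clean = []
    for row in rows:
        if row["amount"].strip():
            clean.append((int(row["txn_id"]), row["store"].upper(), float(row["amount"])))
    return clean

def load(conn, rows):
    """Load: insert the cleaned rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (txn_id INTEGER PRIMARY KEY, store TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse
load(conn, transform(extract(RAW)))

# Analyze and present: total sales per store as a small text table.
for store, total in conn.execute(
    "SELECT store, SUM(amount) FROM sales GROUP BY store ORDER BY store"
):
    print(f"{store:<6} {total:8.2f}")
```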
  5. Results of Data Mining: <ul><li>Forecasting what may happen in the future</li></ul><ul><li>Classifying people or things into groups by recognizing patterns</li></ul><ul><li>Clustering people or things into groups based on their attributes</li></ul><ul><li>Associating what events are likely to occur together</li></ul><ul><li>Sequencing what events are likely to lead to later events</li></ul>
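"Associating what events are likely to occur together" is commonly quantified with support and confidence, the measures used by algorithms such as Apriori. The sketch below is not Apriori itself; it is a brute-force count over item pairs, and the shopping baskets and the 0.6 support threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_rules(baskets, min_support=0.6):
    """Return rules (A, B, support, confidence) for item pairs with support >= min_support.

    support(A, B)       = fraction of baskets containing both A and B
    confidence(A -> B)  = support(A, B) / support(A)
    """
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in set(basket))
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(set(basket)), 2)
    )
    rules = []
    for (a, b), count in sorted(pair_counts.items()):
        support = count / n
        if support >= min_support:
            rules.append((a, b, support, count / item_counts[a]))
            rules.append((b, a, support, count / item_counts[b]))
    return rules

# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]
for a, b, s, c in pair_rules(baskets):
    print(f"{a} -> {b}: support={s:.2f}, confidence={c:.2f}")
```

Note that confidence is asymmetric: in this data every basket with beer also has diapers, but not the reverse.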
  6. Data Mining versus OLAP <ul><li>OLAP: On-line Analytical Processing</li></ul><ul><li>It provides a good view of what is happening, but cannot predict what will happen in the future or explain why it is happening</li></ul>
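A typical OLAP operation is a roll-up: summing a measure over chosen dimensions and drilling down by adding more. This minimal sketch (hypothetical fact rows, plain dictionaries rather than a real OLAP server) illustrates the slide's point: it summarizes what has happened, but nothing in it predicts the future or explains causes.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, quarter, product, revenue).
FACTS = [
    ("west", "Q1", "laptop", 1200),
    ("west", "Q1", "phone", 800),
    ("west", "Q2", "laptop", 1500),
    ("east", "Q1", "laptop", 900),
    ("east", "Q2", "phone", 700),
    ("east", "Q2", "laptop", 1100),
]
DIMENSIONS = ("region", "quarter", "product")

def roll_up(facts, *dims):
    """Sum revenue grouped by the chosen dimensions (an OLAP-style roll-up)."""
    idx = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in idx)
        totals[key] += row[-1]  # revenue is the last column
    return dict(sorted(totals.items()))

print(roll_up(FACTS))                       # grand total (no dimensions)
print(roll_up(FACTS, "region"))             # totals per region
print(roll_up(FACTS, "region", "quarter"))  # drill down into quarters
```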
  7. Data Mining versus Data Analysis (DA) <ul><li>Data Mining</li></ul><ul><li>- Originally developed to act as expert systems to solve problems</li></ul><ul><li>- Less interested in the mechanics of the technique</li></ul><ul><li>If it makes sense, then let’s use it</li></ul><ul><li>Does not require assumptions to be made about the data</li></ul><ul><li>Can find patterns in very large amounts of data</li></ul><ul><li>Requires understanding of the data and the business problem</li></ul><ul><li>Data Analysis</li></ul><ul><li>- Tests models for statistical correctness</li></ul><ul><li>* Are the statistical assumptions of the models correct?</li></ul><ul><li>- Hypothesis testing</li></ul><ul><li>* Is the relationship significant?</li></ul><ul><li>Use a test to validate significance</li></ul><ul><li>- Tends to rely on sampling</li></ul><ul><li>- Techniques are not well suited to large amounts of data</li></ul><ul><li>- Requires strong statistical skills</li></ul>
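The slide does not name a specific significance test. As one illustrative choice, here is a permutation test asking "is the relationship significant?": it shuffles one variable many times and measures how often a correlation at least as strong arises by chance. The data, trial count, and seed are hypothetical.

```python
import random

def corr(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def permutation_p_value(xs, ys, trials=10_000, seed=0):
    """Two-sided permutation test for correlation.

    Returns the (add-one smoothed) fraction of shuffles whose |r| is at least
    the observed |r|; a small value suggests the relationship is not chance.
    """
    rng = random.Random(seed)
    observed = abs(corr(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(corr(xs, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)

# Hypothetical sample: advertising spend vs. sales.
ad_spend = [1, 2, 3, 4, 5, 6, 7, 8]
sales = [2.1, 2.9, 4.2, 4.8, 6.1, 7.2, 7.9, 9.1]
p = permutation_p_value(ad_spend, sales)
print(f"r = {corr(ad_spend, sales):.3f}, p = {p:.4f}")
```

Because every shuffle recomputes the statistic over the full sample, tests like this become costly on very large data, which echoes the slide's point about relying on sampling.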
  8. Phase one in the DM Process <ul><li>Business Understanding:</li></ul><ul><li>- Statement of Business Objective</li></ul><ul><li>- Statement of Data Mining Objective</li></ul><ul><li>- Statement of Success Criteria</li></ul>
  9. Phase two in the DM Process <ul><li>Data Understanding:</li></ul><ul><li>- Explore the data and verify its quality</li></ul><ul><li>- Find outliers</li></ul>
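One common way to "find outliers" while exploring a numeric field is Tukey's fences: flag values more than 1.5 interquartile ranges below the first quartile or above the third. The slide does not prescribe a method, so treat this as a sketch; the daily order counts are hypothetical.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical daily order counts; one day looks suspicious.
daily_orders = [12, 14, 13, 15, 14, 13, 98, 12, 14]
print(iqr_outliers(daily_orders))  # → [98]
```

Quartiles are used instead of the mean and standard deviation because a single extreme value inflates the standard deviation and can hide itself.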
  10. Phase three in the DM Process <ul><li>Data Preparation:</li></ul><ul><li>- Usually takes over 90% of our time</li></ul><ul><li>* Collection</li></ul><ul><li>* Assessment</li></ul><ul><li>* Consolidation and Cleaning</li></ul><ul><li>- table links, aggregation level, missing values, etc.</li></ul><ul><li>* Data selection</li></ul><ul><li>- active role in ignoring non-contributory data</li></ul><ul><li>- use of samples</li></ul><ul><li>- visualization tools</li></ul><ul><li>* Transformations: create new variables</li></ul>
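Two of the preparation steps above, handling missing values and creating new variables, can be sketched in a few lines. The records and the derived `income_per_person` field are hypothetical, and mean imputation is only one of several possible strategies.

```python
from statistics import mean

# Hypothetical customer records; None marks a missing value.
records = [
    {"id": 1, "income": 52_000, "household_size": 2},
    {"id": 2, "income": None, "household_size": 4},
    {"id": 3, "income": 38_000, "household_size": 1},
    {"id": 4, "income": 72_000, "household_size": 5},
]

def prepare(rows):
    """Fill missing income with the mean of known values, then derive a new variable.

    Returns new dictionaries; the input rows are left unchanged.
    """
    known = [r["income"] for r in rows if r["income"] is not None]
    fill = mean(known)
    prepared = []
    for r in rows:
        income = r["income"] if r["income"] is not None else fill
        prepared.append(
            {**r, "income": income, "income_per_person": income / r["household_size"]}
        )
    return prepared

for row in prepare(records):
    print(row)
```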
  11. Phase four in the DM Process <ul><li>Model Building:</li></ul><ul><li>* Selection of the modeling technique is based on the data mining objective</li></ul><ul><li>* Modeling is an iterative process, different for supervised and unsupervised learning</li></ul><ul><li>- Models are built for either description or prediction</li></ul>
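For a supervised model built for prediction, one of the simplest techniques is a nearest-centroid classifier: average the labeled training examples of each class, then assign a new case to the closest class average. This is a sketch, not a recommendation; the churn labels and the two features (visits per month, average basket in dollars) are hypothetical, and real features would usually be scaled first.

```python
from math import dist

def fit_centroids(points, labels):
    """Supervised training step: average the feature vectors of each labeled class."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        s = sums.setdefault(y, [0.0] * len(p))
        for i, v in enumerate(p):
            s[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}

def predict(centroids, point):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda y: dist(centroids[y], point))

# Hypothetical labeled training data: (visits per month, average basket $).
train_X = [(1, 20), (2, 25), (1, 30), (8, 60), (10, 55), (9, 65)]
train_y = ["churn", "churn", "churn", "stay", "stay", "stay"]

centroids = fit_centroids(train_X, train_y)
print(predict(centroids, (2, 28)), predict(centroids, (9, 58)))
```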
  12. Types of Models <ul><li>Predictive Models for Predicting and Classifying</li></ul><ul><li>Descriptive Models for Grouping and Finding Associations</li></ul>
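A descriptive model for grouping needs no labels. As an illustration, here is k-means (Lloyd's algorithm) on a single attribute: assign each value to its nearest center, move each center to the mean of its members, and repeat until nothing changes. The spend figures and initial centers are hypothetical; real tools handle many attributes and choose initial centers more carefully.

```python
def kmeans_1d(values, centers, iterations=10):
    """Lloyd's k-means on one numeric attribute, starting from the given centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to its members' mean (keep it if empty).
        new_centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break  # converged: assignments no longer change the centers
        centers = new_centers
    return centers, clusters

# Hypothetical annual spend per customer.
annual_spend = [120, 150, 130, 900, 950, 1000, 400, 420]
centers, clusters = kmeans_1d(annual_spend, [100, 500, 1000])
print(centers)
print(clusters)
```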
  13. Phase five in the DM Process <ul><li>Model Evaluation:</li></ul><ul><li>- Evaluation of the model: how well it performs on test data</li></ul><ul><li>- Methods and criteria depend on the model type</li></ul><ul><li>- Interpretation of the model:</li></ul><ul><li>whether it matters, and whether it is easy or hard, depends on the algorithm</li></ul>
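For a classification model, a basic way to judge "how well it performs on test data" is accuracy plus a confusion matrix computed on held-out cases the model never saw during training. The labels and predictions below are hypothetical; other model types (for example clustering) need different criteria, as the slide notes.

```python
from collections import Counter

def evaluate(actual, predicted):
    """Return accuracy and a confusion matrix as a Counter of (actual, predicted) pairs."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    confusion = Counter(zip(actual, predicted))
    correct = sum(n for (a, p), n in confusion.items() if a == p)
    return correct / len(actual), confusion

# Hypothetical held-out test labels vs. a model's predictions.
actual    = ["churn", "stay", "stay", "churn", "stay", "stay", "churn", "stay"]
predicted = ["churn", "stay", "churn", "churn", "stay", "stay", "stay", "stay"]

accuracy, confusion = evaluate(actual, predicted)
print(f"accuracy = {accuracy:.2f}")
for (a, p), n in sorted(confusion.items()):
    print(f"actual={a:<5} predicted={p:<5} count={n}")
```

The confusion matrix shows which kinds of mistakes the model makes, which accuracy alone hides.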
  15. Phase six in the DM Process <ul><li>Deployment:</li></ul><ul><li>- Determine how the results need to be used</li></ul><ul><li>- Who needs to use them?</li></ul><ul><li>- How often do they need to be used?</li></ul><ul><li>Deploy Data Mining results by:</li></ul><ul><li>- Scoring a database</li></ul><ul><li>- Using results as business rules</li></ul><ul><li>- Interactive online scoring</li></ul>
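"Scoring a database" and "using results as business rules" can be combined: express the model's output as a rule, then write a score into every row. This sketch uses an in-memory SQLite table and registers a Python function with `sqlite3.Connection.create_function` so the rule runs inside an `UPDATE`. The table, columns, thresholds, and `churn_rule` itself are hypothetical.

```python
import sqlite3

def churn_rule(visits_per_month, avg_basket):
    """Hypothetical business rule derived from a model: flag infrequent, low-spend customers."""
    return "at_risk" if visits_per_month < 3 and avg_basket < 40 else "ok"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, visits REAL, basket REAL, segment TEXT)"
)
conn.executemany(
    "INSERT INTO customers (id, visits, basket) VALUES (?, ?, ?)",
    [(1, 1, 25.0), (2, 9, 60.0), (3, 2, 80.0)],
)

# Make the Python rule callable from SQL, then score every row in one batch.
conn.create_function("churn_rule", 2, churn_rule)
conn.execute("UPDATE customers SET segment = churn_rule(visits, basket)")
conn.commit()

print(conn.execute("SELECT id, segment FROM customers ORDER BY id").fetchall())
```

Batch scoring like this suits periodic runs; interactive online scoring would instead call `churn_rule` per request as new data arrives.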
  16. References: <ul><li>http://www.dama-ncr.org/Library/2001.11.14-Laura%20Squier.ppt</li></ul><ul><li>http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm</li></ul>
