DATA MINING: DEFINITIONS AND DECISION TREE EXAMPLES
Emily Thomas, Director of Planning and Institutional Research

  2. WHAT IS DATA MINING?
     • Data mining is the discovery of hidden knowledge, unexpected patterns and new rules in large databases.
     • Data mining is exploratory. Its results lack the protection against spurious conclusions that theory-based, hypothesis-driven statistics provide.
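The risk of spurious conclusions in exploratory work can be illustrated with a small calculation. The sketch below is not from the slides: it assumes the tests are independent and that no real relationship exists, and the function names are hypothetical. It shows how quickly the chance of at least one "significant" result grows as more relationships are tested, alongside the stricter per-test threshold a Bonferroni correction would require.

```python
def chance_of_false_positive(num_tests, alpha=0.05):
    """Probability that at least one of num_tests independent tests on
    pure noise is 'significant' at level alpha (assumes independence)."""
    return 1 - (1 - alpha) ** num_tests


def bonferroni_threshold(num_tests, alpha=0.05):
    """Per-test significance level that keeps the overall false-positive
    rate at or below alpha across num_tests tests."""
    return alpha / num_tests


if __name__ == "__main__":
    # Print how the risk grows with the number of relationships examined.
    for k in (1, 5, 20, 100):
        print(k, round(chance_of_false_positive(k), 3), bonferroni_threshold(k))
```

Under these assumptions, screening 20 unrelated variables at the 0.05 level gives roughly a 64% chance of at least one false "finding", which is why patterns found by mining are best treated as leads to confirm rather than as tested conclusions.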
  3. WHY USE DATA MINING?
     In the corporate world:
     • Large amounts of data are captured in enterprise databases.
     • These databases are too large for traditional statistical techniques.
     • Identifying patterns in the data can target profitable, or unprofitable, customers.
  4. WHY USE DATA MINING?
     In institutional research:
     • There are large numbers of variables.
     • We have insufficient time and resources to investigate all the relationships that might be informative.
     • Identifying data patterns can shed light on student behavior.
  5. WHY DATA MINING NOW?
     • Development of large, integrated enterprise databases
     • Development of data mining techniques and software
     • Development of simplified user interfaces
  6. DATA MINING TECHNIQUES
     • Decision trees
     • Rule induction
     • Nearest neighbors
     • Neural networks
     • Clustering
     • Genetic algorithms
     • Exploratory factor analysis
     • Stepwise regression
  7. DECISION TREE ANALYSIS
     • CHAID: Chi-squared Automatic Interaction Detector (SPSS AnswerTree)
     • Select significant independent variables
     • Identify category groupings or interval breaks to create groups most different with respect to the dependent variable
     • Select as the primary independent variable the one identifying groups with the most different values of the dependent variable
     • Select additional variables to extend each branch if there are further significant differences
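The variable-selection step described on slide 7 can be sketched in a few lines. This is a simplified illustration, not the AnswerTree implementation: it assumes every predictor and the outcome are binary (so each test has one degree of freedom), it skips the category-merging and multiple-comparison adjustments that full CHAID performs, and the records and field names (`on_campus`, `has_aid`, `returned`) are invented toy data, not the retention figures from the slides.

```python
import math
from collections import Counter


def chi_squared(rows, predictor, target):
    """Pearson chi-squared statistic for the predictor-by-target table."""
    cell = Counter((r[predictor], r[target]) for r in rows)
    row_tot = Counter(r[predictor] for r in rows)
    col_tot = Counter(r[target] for r in rows)
    n = len(rows)
    stat = 0.0
    for a in row_tot:
        for b in col_tot:
            expected = row_tot[a] * col_tot[b] / n
            stat += (cell[(a, b)] - expected) ** 2 / expected
    return stat


def p_value_df1(stat):
    """Upper-tail p-value for chi-squared with 1 degree of freedom
    (valid here only because all variables are assumed binary)."""
    return math.erfc(math.sqrt(stat / 2))


def best_split(rows, predictors, target, alpha=0.05):
    """Return the predictor with the smallest p-value, or None if no
    predictor is significant at level alpha."""
    p, name = min((p_value_df1(chi_squared(rows, x, target)), x)
                  for x in predictors)
    return name if p < alpha else None


def make(n, on_campus, has_aid, returned):
    """Build n identical hypothetical student records."""
    return [{"on_campus": on_campus, "has_aid": has_aid,
             "returned": returned}] * n


# Hypothetical toy data: 40 invented students.
rows = (make(9, "yes", "yes", "yes") + make(1, "yes", "yes", "no")
        + make(9, "yes", "no", "yes") + make(1, "yes", "no", "no")
        + make(5, "no", "yes", "yes") + make(5, "no", "yes", "no")
        + make(5, "no", "no", "yes") + make(5, "no", "no", "no"))

if __name__ == "__main__":
    print(best_split(rows, ["on_campus", "has_aid"], "returned"))
```

In this toy data the return rate differs sharply by `on_campus` and not at all by `has_aid`, so `on_campus` is chosen as the primary split. Extending a branch, as the last bullet describes, would amount to calling `best_split` again on the rows within each resulting subgroup.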
  8. TRANSFER RETENTION RATES
     Percent of new full-time Fall 2002 transfers returning in Spring 2003
  9. TRANSFER RETENTION RATES FALL 2002-SPRING 2003
  10. SOS 2000: SATISFACTION WITH THE QUALITY OF EDUCATION
  11. VERY LARGE INTELLECTUAL GROWTH
      19% of students
  12. LARGE INTELLECTUAL GROWTH
      41% of students
  13. LOW OR MODERATE INTELLECTUAL GROWTH
      40% of students
  14. SOS 2000: SATISFACTION WITH “THIS COLLEGE IN GENERAL”
  15. DECISION TREE ADVANTAGES AND DISADVANTAGES
      Advantages:
      • Discover unexpected relationships
      • Identify subgroup differences
      • Use categorical or continuous data
      • Accommodate missing data
      Disadvantages:
      • Possibly spurious relationships
      • Presentation difficulties
  16. BIBLIOGRAPHY
      • AnswerTree 2.0: User’s Guide. SPSS, 1998.
      • Adriaans, P and D Zantinge (1996). Data Mining. Harlow, England and elsewhere: Addison-Wesley.
      • Bordon, VMH (1995). Segmenting Student Markets with a Student Satisfaction and Priorities Survey. Research in Higher Education 16:2, 115-138.
      • Neville, PG (1999). “Decision Trees for Predictive Modeling,” SAS Technical Report, The SAS Institute.
      • Thomas, EH and N Galambos. What Satisfies Students? Mining Student-Opinion Data with Regression and Decision Tree Analysis. Forthcoming in Research in Higher Education, May 2004.
