Predict student behavior to increase retention



  1. 1. Predict student behavior to increase retention. Online seminar presented by Jing Luan, Ph.D., Cabrillo College, and Bob Valencic, SPSS Inc. August 22, 2002.
  2. 2. Seminar agenda
       - Business issues in higher education
       - How to predict student behavior and increase retention?
           - Data mining concepts
           - Data mining methods
       - Case studies
       - Getting started on data mining
       - Q&A
  3. 3. Higher education business issues
       - Institutional effectiveness
       - Student learning outcome assessment
       - Enrollment management
           - Achieving optimum attraction, retention and persistence goals
       - Marketing
           - Increasing competition for students
       - Alumni
       How can data mining help?
  4. 4. Institutional effectiveness: Getting to know your students
       - Which students make greatest use of institutional services?
       - What courses provide high full-time equivalent students (FTES) and allow better use of space?
       - What are the patterns in course taking?
       - What courses tend to be taken as a group?
  5. 5. Enrollment management: Helping your students succeed
       - Who are our best students?
       - Where do our students come from?
       - Who is most likely to return for another semester?
       - Who is most likely to fail or drop out?
  6. 6. Marketing: Making the best use of tight budgets
       - Who is most likely to respond to our new campaign?
       - Which type of marketing/recruiting works best?
       - Where should we focus our advertising and recruiting?
  7. 7. Alumni: Continuing the relationship
       - What are the different types/groups of alumni?
       - Who is likely to pledge, for how much, and when?
       - Where and on whom should we focus our fundraising drives?
  8. 8. Our focus today: Predicting student behavior
       - Acquiring new students
       - Retaining students
       - Increasing persistence to and beyond graduation
  9. 9. Data mining defined
       “The process of discovering meaningful new correlations, patterns, and trends by sifting through large amounts of data stored in repositories and by using pattern recognition technologies as well as statistical and mathematical techniques.” (The Gartner Group)
  10. 10. Another definition
       “Simply put, data mining is used to discover patterns and relationships in your data in order to help you make better business decisions.” (Robert Small, Two Crows)
  11. 11. CRISP-DM
       - Business Understanding
       - Data Understanding
       - Data Preparation
       - Modeling
       - Evaluation
       - Deployment
  12. 12. Two types of data mining
       - Supervised
           - Purpose: classification and estimation
           - Algorithms: C5.0, C&RT, neural network, etc.
       - Unsupervised
           - Purpose: clustering and association
           - Algorithms: Kohonen, K-means, TwoStep, GRI, etc.
  13. 13. Algorithm vs. model
       - Algorithm: a technical term describing a specific, mathematically driven data mining function
       - Model: a set of representative rules, behaviors or characteristics against which data are analyzed to find similarities
  14. 14. Neural networks
       - A machine learning method (often loosely equated with machine learning itself)
       - Identifies complex relations
       - Somewhat difficult to interpret
       - Long computation times
       [Diagram: input layer, hidden layer, output]
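The input, hidden and output layers named on the slide can be illustrated with a minimal forward pass. This is only a sketch: the feature values, weights and biases below are made-up numbers, not anything learned in the seminar's case study, and a real network would fit its weights during training.

```python
import math


def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Pass one record through input layer -> hidden layer -> single output."""
    # Each hidden unit takes a weighted sum of all inputs, then applies sigmoid.
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    # The output unit does the same with the hidden activations.
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


# Hypothetical record with two scaled features (for example, GPA and unit load).
student = [0.8, 0.6]
score = forward(
    student,
    hidden_weights=[[1.5, -0.5], [-1.0, 2.0]],  # assumed values
    hidden_biases=[0.1, -0.2],                  # assumed values
    output_weights=[1.2, 0.7],                  # assumed values
    output_bias=-0.5,                           # assumed value
)
print(round(score, 3))
```

The score is a number between 0 and 1 that could be read as a persistence likelihood. The weights carry no direct meaning on their own, which is one reason the slide calls neural networks somewhat difficult to interpret.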
  15. 15. Decision trees
       - Easy to interpret
       - income < $40K
           - job > 5 yrs then yes
           - job < 5 yrs then no
       - income > $40K
           - high debt then no
           - low debt then yes
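The tree on this slide reads directly as nested conditions, which is why decision trees are easy to interpret. The sketch below encodes it as a function. The function and parameter names are hypothetical, and since the slide does not say what happens at exactly $40K or exactly 5 years, those boundary cases are assigned arbitrarily here.

```python
def decide(income, years_on_job, high_debt):
    """Walk the two-level tree from the slide and return True for 'yes'."""
    if income < 40_000:
        # Lower-income branch: the answer depends on job tenure.
        # Assumption: exactly 5 years falls on the 'no' side.
        return years_on_job > 5
    # Higher-income branch (assumption: exactly $40K lands here):
    # the answer depends on debt level.
    return not high_debt


print(decide(30_000, 6, high_debt=True))   # lower income, long tenure
print(decide(55_000, 2, high_debt=True))   # higher income, high debt
```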
  16. 16. Apriori
       - Discovers events that occur together
       - Often called ‘market basket’ analysis
       - Example: What groups of classes do certain students take in the same semester that may impact facilities and course scheduling?
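To illustrate the market basket idea, the sketch below finds pairs of courses that are frequently taken in the same semester. It covers only the first two passes of an Apriori-style search (frequent single items, then pairs built from them) and is not the Clementine implementation. The course names and the support threshold are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return {pair: support} for item pairs appearing in enough baskets."""
    n = len(baskets)
    # Pass 1: support of each single course.
    item_counts = Counter(item for basket in baskets for item in set(basket))
    frequent_items = {i for i, c in item_counts.items() if c / n >= min_support}
    # Pass 2: count only pairs made of frequent courses. This is the Apriori
    # pruning idea: a pair cannot be frequent unless both of its items are.
    pair_counts = Counter()
    for basket in baskets:
        kept = sorted(set(basket) & frequent_items)
        pair_counts.update(combinations(kept, 2))
    return {p: c / n for p, c in pair_counts.items() if c / n >= min_support}


# Hypothetical semester schedules, one list per student.
schedules = [
    ["MATH 1", "ENGL 1", "BIO 1"],
    ["MATH 1", "ENGL 1"],
    ["MATH 1", "BIO 1"],
    ["ENGL 1", "HIST 1"],
]
pairs = frequent_pairs(schedules, min_support=0.5)
print(pairs)
```

Pairs with high support are candidates for coordinated scheduling, since many students need both courses in the same term.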
  17. 17. Kohonen network
       - Seeks to describe dataset in terms of natural clusters of cases
       - Example: identify similar groups of students
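A Kohonen network (self-organizing map) can be sketched in a few lines. The version below is a deliberately small one-dimensional map with a neighborhood radius of one and a linearly decaying learning rate. These choices, the initialization from the first few records, and the student feature values are all assumptions made for illustration, not the settings used by Clementine.

```python
def nearest_unit(units, point):
    """Index of the map unit whose weights are closest to the point."""
    def sq_dist(unit):
        return sum((u - x) ** 2 for u, x in zip(unit, point))
    return min(range(len(units)), key=lambda i: sq_dist(units[i]))


def train_som(points, n_units, epochs=20, rate=0.5):
    """Train a 1-D map: the winner and its direct neighbors move toward each point."""
    # Assumption: start the units at the first n_units records.
    units = [list(p) for p in points[:n_units]]
    for epoch in range(epochs):
        step = rate * (1 - epoch / epochs)  # learning rate shrinks over time
        for point in points:
            winner = nearest_unit(units, point)
            for i in range(len(units)):
                gap = abs(i - winner)
                if gap > 1:
                    continue  # outside the neighborhood, leave unchanged
                pull = step if gap == 0 else step * 0.5
                units[i] = [u + pull * (x - u) for u, x in zip(units[i], point)]
    return units


# Hypothetical students described by two features scaled to 0..1.
students = [(0.1, 0.2), (0.5, 0.5), (0.9, 0.8),
            (0.15, 0.25), (0.55, 0.45), (0.85, 0.9)]
units = train_som(students, n_units=3)
groups = [nearest_unit(units, s) for s in students]
print(groups)
```

Each student is assigned to the map unit nearest to them, and students sharing a unit form one of the "natural clusters" the slide refers to.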
  18. 18. Case study using Clementine®
       - Predicting student persistence
  19. 19. Examining data
  20. 20. Clustering using TwoStep
  21. 21. Building models for persistence in streams: A node is being executed (notice the red arrows denoting the flow of data).
  22. 22. Seeing the work of neural thinking: Graphic display showing an ANN learning the data.
  23. 23. Results of neural node: These are the outputs of the neural network. Overall accuracy and significance of features (left). Predicted number of policies using fresh data vs. known data (above).
  24. 24. Examining C5.0: The control panel of the C5.0 node (Expert).
  25. 25. Results of C5.0 node: View the prediction by individual records (PNXT vs. $C-PNXT). View the overall prediction accuracy.
  26. 26. Comparing C&RT and C5.0: Use the Analysis node to examine the difference in accuracy for C&RT and C5.0.
  27. 27. Which one is better, C&RT or C5.0? C5.0 has an accuracy rate of 66.3% and C&RT 63.7%. They agree 72% of the time.
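The two figures on this slide, accuracy and agreement, measure different things: accuracy compares each model's predictions with the known outcome, while agreement compares the two models with each other. A minimal sketch of both calculations follows. The prediction lists are toy data invented for illustration and do not reproduce the 66.3%, 63.7% or 72% figures.

```python
def accuracy(predicted, actual):
    """Share of records where the model matches the known outcome."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def agreement(pred_a, pred_b):
    """Share of records where two models give the same prediction."""
    return sum(a == b for a, b in zip(pred_a, pred_b)) / len(pred_a)


# Toy data: 1 = student persisted to the next term, 0 = did not.
actual = [1, 1, 0, 0, 1, 0, 1, 0]
c50    = [1, 1, 0, 1, 1, 0, 0, 0]  # hypothetical C5.0 predictions
cart   = [1, 0, 0, 1, 1, 0, 1, 1]  # hypothetical C&RT predictions

print(accuracy(c50, actual))   # 0.75
print(accuracy(cart, actual))  # 0.625
print(agreement(c50, cart))    # 0.625
```

Two models can agree on a record and both be wrong, so a high agreement rate does not by itself mean either model is accurate.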
  28. 28. Visualizing Results
  29. 29. Visualizing Results
  30. 30. Scoring new data: Moment of truth. The most powerful feature of data mining is using learned “rules” to predict (score) fresh data for business purposes. Shown here is the switch to a fresh data set that Clementine has not seen before.
  31. 31. Using models to score new data [Panels: Model Results, Scored Results]
  32. 32. Additional case study: Predicting the behavior of transfer students
       - How best to identify future transfer students so the college can groom them?
       - What can a community college do to increase transfer rates?
       - Using decision tree models, the top rule for successful transfers was: taking more than 12 units, taking fewer than 5 non-transfer courses, and taking at least one math course.
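The top rule from this case study can be applied to new student records in the same way a learned model scores fresh data. The sketch below is one possible encoding. The field names and sample records are hypothetical, and the slide does not say over what period the units and courses are counted, so the function simply takes totals as given.

```python
def matches_top_transfer_rule(units, non_transfer_courses, math_courses):
    """True when a record satisfies all three conditions of the top rule."""
    return units > 12 and non_transfer_courses < 5 and math_courses >= 1


# Hypothetical new records to score.
new_students = [
    {"id": "A", "units": 15, "non_transfer_courses": 2, "math_courses": 1},
    {"id": "B", "units": 9,  "non_transfer_courses": 1, "math_courses": 2},
    {"id": "C", "units": 14, "non_transfer_courses": 6, "math_courses": 1},
    {"id": "D", "units": 13, "non_transfer_courses": 0, "math_courses": 0},
]

likely_transfers = [
    s["id"]
    for s in new_students
    if matches_top_transfer_rule(
        s["units"], s["non_transfer_courses"], s["math_courses"]
    )
]
print(likely_transfers)  # ['A']
```

Students who miss the rule by a single condition, such as student D with no math course, may be the most useful group for advising outreach.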
  33. 33. Getting started: Evaluate data mining software
       - Company stability and customer feedback
       - User interface
       - Scalability
       - Server/Client
       - Modeling capacities
       - Learning curve
       - Join a listserv, such as CLUG
       - Cost
  34. 34. Getting started: Develop a data mining plan for your institution
       - Determine business needs
       - Determine technology infrastructure and management support
       - Identify mining area and business problems
       - Determine data source(s)
       - Invite an expert to jump start
       - Pilot test mining results
       - CRISP-DM and real-time data mining, Knowledge Discovery in Databases (KDD)
  35. 35. Want to Learn More?
       - Full training course descriptions at:
       - Contact us or one of our other data mining experts by calling 800-543-5815.
       - Check out the Knowledge Management/Data Mining Discussion Group:
       - Obtain the book, “Knowledge Management: Building A Competitive Advantage in Higher Education,” published by Jossey-Bass: ,,0787962910,00.html
       - Bob Valencic [email_address]
       - Jing Luan [email_address]
  36. 36. Thank you!
       - Predict student behavior to increase retention
       - 2nd Annual Public Sector Roadshow
       - October 15 in Washington, D.C.