Which students make greatest use of institutional services?
What courses provide high full-time equivalent students (FTES) and allow better use of space?
What are the patterns in course taking?
What courses tend to be taken as a group?
Getting to know your students
Who are our best students?
Where do our students come from?
Who is most likely to return for another semester?
Who is most likely to fail or drop out?
Helping your students succeed
Who is most likely to respond to our new campaign?
Which type of marketing/recruiting works best?
Where should we focus our advertising and recruiting?
Making the best use of tight budgets
What are the different types/groups of alumni?
Who is likely to pledge, for how much, and when?
Where and on whom should we focus our fundraising drives?
Continuing the relationship
Our focus today: Predicting student behavior
Acquiring new students
Increasing persistence to and beyond graduation
Data mining defined
“The process of discovering meaningful new correlations, patterns, and trends by sifting through large amounts of data stored in repositories and by using pattern recognition technologies as well as statistical and mathematical techniques.”
The Gartner Group
“Simply put, data mining is used to discover patterns and relationships in your data in order to help you make better business decisions.”
Robert Small, Two Crows
Two types of data mining
Purpose: classification and estimation
Purpose: clustering and association
Algorithm vs. model
Algorithm: a technical term describing a specific mathematically driven data mining function
Model: a set of representative rules, behaviors or characteristics against which data are analyzed to find similarities
Synonymous with Machine Learning
Identifies complex relations
Somewhat difficult to interpret
Long computation times
[Figure: neural network diagram showing the input layer, hidden layer, and output]
Easy to interpret
- income < $40K
  - job > 5 yrs, then yes
  - job < 5 yrs, then no
- income > $40K
  - high debt, then no
  - low debt, then yes
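The income/job/debt rules above can be written as nested conditions, which is part of why decision trees are easy to interpret. This is a minimal sketch: the function name `classify` and its parameters are hypothetical, and the handling of the boundary cases (income of exactly $40K, exactly 5 years on the job) is an assumption, since the slide does not cover them.

```python
def classify(income, years_on_job, high_debt):
    """Return 'yes' or 'no' by walking the example tree from the slide."""
    if income < 40_000:
        # Lower-income branch: the decision depends on job tenure.
        # Assumption: exactly 5 years is treated as 'no'.
        return "yes" if years_on_job > 5 else "no"
    # Higher-income branch: the decision depends on debt.
    # Assumption: income of exactly $40K falls into this branch.
    return "no" if high_debt else "yes"


print(classify(30_000, 7, high_debt=True))   # yes
print(classify(55_000, 2, high_debt=True))   # no
```

Each path from the root to a leaf reads as one plain-language rule.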
Discovers events that occur together
Often called ‘market basket’ analysis
Example – What groups of classes do certain students take in the same semester that may impact facilities and course scheduling?
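One simple way to approach the course-grouping question is to count how often each pair of courses appears in the same student's schedule. The sketch below does only that pair-counting step. The course codes and enrollments are made up for illustration, and `pair_support` is a hypothetical helper, not a function from any data mining package.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: one set of courses per student for a single semester.
enrollments = [
    {"MATH101", "ENGL100", "CHEM110"},
    {"MATH101", "CHEM110"},
    {"ENGL100", "HIST120"},
    {"MATH101", "CHEM110", "HIST120"},
]


def pair_support(baskets):
    """Return, for each pair of items, the share of baskets containing both."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives every pair a consistent (a, b) ordering.
        counts.update(combinations(sorted(basket), 2))
    return {pair: n / len(baskets) for pair, n in counts.items()}


support = pair_support(enrollments)
print(support[("CHEM110", "MATH101")])  # 0.75
```

Pairs with high support are candidates for coordinated scheduling. A full market basket analysis would also look at confidence and lift, which this sketch omits.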
Seeks to describe dataset in terms of natural clusters of cases
Example – identify similar groups of students
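To make the idea of natural clusters concrete, here is a small k-means sketch. K-means is swapped in for illustration only; it is not the TwoStep algorithm used in the case study that follows. The student records, the two features, and the starting centers are all assumptions.

```python
def kmeans(points, centers, iterations=10):
    """Plain k-means on 2-D points; returns (centers, labels)."""
    labels = []
    for _ in range(iterations):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(
                range(len(centers)),
                key=lambda k: (p[0] - centers[k][0]) ** 2
                + (p[1] - centers[k][1]) ** 2,
            )
            for p in points
        ]
        # Move each center to the mean of the points assigned to it.
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(
                    (
                        sum(m[0] for m in members) / len(members),
                        sum(m[1] for m in members) / len(members),
                    )
                )
            else:
                new_centers.append(centers[k])  # leave an empty cluster in place
        centers = new_centers
    return centers, labels


# Hypothetical students: (units attempted, GPA).
students = [(6, 2.1), (7, 2.4), (5, 2.0), (15, 3.5), (14, 3.2), (16, 3.8)]
centers, labels = kmeans(students, centers=[students[0], students[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In practice the features would be put on a common scale first, since here the units column dominates the distance.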
Predicting student persistence
Case study using Clementine ®
Clustering using TwoStep
Building models for persistence in streams: a node is being executed (notice the red arrows denoting the flow of data).
Seeing the work of neural thinking: a graphic display showing an ANN learning the data.
Results of neural node: these are the outputs of the neural network. Overall accuracy and significance of features (left). Predicted number of policies using fresh data vs. known data (above).
Examining C5.0: the control panel of the C5.0 node (Expert).
Results of C5.0 node: view the prediction by individual records (PNXT vs. $C-PNXT). View the overall prediction accuracy.
Comparing C&RT and C5.0: use the Analysis node to examine the difference in accuracy for C&RT and C5.0.
Which one is better, C&RT or C5.0? C5.0 has an accuracy rate of 66.3% and C&RT 63.7%. They agree 72% of the time.
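The two figures quoted here, per-model accuracy and the rate at which the models agree, are simple to compute once each model's predictions sit next to the known outcomes. The sketch below uses made-up predictions and hypothetical helper names; it does not reproduce the case study's numbers or Clementine's Analysis node.

```python
def accuracy(predicted, actual):
    """Share of records where the prediction matches the known outcome."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def agreement(pred_a, pred_b):
    """Share of records where two models make the same prediction."""
    return sum(a == b for a, b in zip(pred_a, pred_b)) / len(pred_a)


# Hypothetical outcomes (1 = persisted, 0 = did not) and two models' predictions.
actual = [1, 1, 0, 1, 0, 0, 1, 0]
model_a = [1, 1, 0, 0, 0, 1, 1, 0]
model_b = [1, 0, 0, 0, 0, 0, 1, 1]

print(accuracy(model_a, actual))    # 0.75
print(accuracy(model_b, actual))    # 0.625
print(agreement(model_a, model_b))  # 0.625
```

Agreement is measured between the models only, so two models can agree with each other on records that both get wrong.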
Scoring new data: the moment of truth. The most powerful feature of data mining is using learned “rules” to predict (score) fresh data for business purposes. Shown here is the switch to a fresh data set that Clementine has not seen before.
Using models to score new data: model results and scored results.
Additional case study
How best to identify future transfer students so the college can groom them?
What can a community college do to increase transfer rates?
Using decision tree models, the top rule for successful transfers was: taking more than 12 units, taking fewer than 5 non-transfer courses, and taking at least one math course.
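A rule like this can be applied directly to student records to flag likely transfers. The following is a minimal sketch of the rule as a predicate. The function and parameter names are hypothetical, and the thresholds are taken from the rule as stated above.

```python
def matches_top_transfer_rule(units, non_transfer_courses, math_courses):
    """True when a student meets all three conditions of the top rule."""
    return units > 12 and non_transfer_courses < 5 and math_courses >= 1


# Hypothetical students.
print(matches_top_transfer_rule(15, 2, 1))  # True
print(matches_top_transfer_rule(15, 2, 0))  # False: no math course taken
```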
Predicting the behavior of transfer students
Company stability and customer feedback
Join a listserv, such as CLUG
Evaluate data mining software
Determine business needs
Determine technology infrastructure and management support
Identify mining area and business problems
Determine data source(s)
Invite an expert to jump start
Pilot test mining results
CRISP-DM and real-time data mining, Knowledge Discovery in Databases (KDD)
Develop a data mining plan for your institution
Want to Learn More?
Full training course descriptions at:
Contact us or one of our other data mining experts by calling 800-543-5815.
Check out the Knowledge Management/Data Mining Discussion Group:
Obtain the book, “Knowledge Management – Building A Competitive Advantage in Higher Education,” published by Jossey-Bass: