An Introduction to Data Mining in Institutional Research


  • 1. An Introduction to Data Mining in Institutional Research Dr. Thulasi Kumar Director of Institutional Research University of Northern Iowa
  • 2. AIR/SPSS Professional Development Series Background Covering variety of topics Up to date information on
  • 3. Common Questions 1. Will I be able to get copies of the slides after the event? 2. Is this web seminar being taped so I or others can view it after the fact? 3. Can I ask questions during this event? Copyright 2003-4, SPSS Inc.
  • 4. Common Questions 1. Will I be able to get copies of the slides after the event? Yes 2. Is this web seminar being taped so I or others can view it after the fact? Yes 3. Can I ask questions during this event? Yes
  • 5. Today’s Agenda Data Mining Overview History How it compares to other analytic techniques Phases in the Data Mining Process Applications of Data Mining in Institutional Research Data Mining solutions Question and Answer
  • 6. The Evolution of Data Analysis
    Evolutionary Step | Business Question | Enabling Technologies | Product Providers | Characteristics
    Data Collection (1960s) | "What was my total revenue in the last five years?" | Computers, tapes, disks | IBM, CDC | Retrospective, static data delivery
    Data Access (1980s) | "What were unit sales in New England last March?" | Relational databases (RDBMS), Structured Query Language (SQL), ODBC | Oracle, Sybase, Informix, IBM, Microsoft | Retrospective, dynamic data delivery at record level
    Data Warehousing & Decision Support (1990s) | "What were unit sales in New England last March? Drill down to Boston." | On-line analytic processing (OLAP), multidimensional databases, data warehouses | SPSS, Comshare, Arbor, Cognos, Microstrategy, NCR | Retrospective, dynamic data delivery at multiple levels
    Data Mining (Emerging Today) | "What's likely to happen to Boston unit sales next month? Why?" | Advanced algorithms, multiprocessor computers, massive databases | SPSS/Clementine, Lockheed, IBM, SGI, SAS, NCR, Oracle, numerous startups | Prospective, proactive information delivery
    Source: SPSS BI
  • 7. What is Data Mining?
    - The process of discovering meaningful new correlations, patterns, and trends by sifting through large amounts of data stored in repositories and by using pattern recognition technologies as well as statistical and mathematical techniques (The Gartner Group).
    - The exploration and analysis of large quantities of data in order to discover meaningful patterns and rules (Berry and Linoff).
    - The nontrivial extraction of implicit, previously unknown, and potentially useful information from data (Frawley, Piatetsky-Shapiro, and Matheus).
  • 8. Differences between Statistics and Data Mining
    STATISTICS | DATA MINING
    Confirmative | Explorative
    Small data sets/File-based | Large data sets/Databases
    Small number of variables | Large number of variables
    Deductive | Inductive
    Numeric data | Numeric and non-numeric
    Clean data | Data cleaning
  • 9. Paradigm Shift
    Traditional IR Work: Data file => Descriptive/Regression Analysis => Tabulations/Reports
    Data Mining Driven IR Work: Database => Data Mining (Visualization, Association, Clustering, Predictive Modeling) => Immediate Actions
    (Diagram axis labels under each workflow: Historical, Predictive)
    Source: Jing Luan, Cabrillo College, CA
  • 10. Data Mining is not… OLAP Data Warehousing Data Visualization SQL Ad Hoc Queries Reporting
  • 11. Data Mining Roots and Algorithms Statistics Distributions, mathematics, etc. Machine Learning Computer science, heuristics and induction algorithms Artificial Intelligence Emulating human intelligence Neural Networks Biological models, psychology and engineering
  • 12. Data Mining is…
    Predictive Modeling: Linear/Logistic Regression, Neural Networks, Decision Trees
    Clustering: Kohonen Neural Networks Clustering, K-Means Clustering, Nearest Neighbor Clustering
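To make the clustering entry on slide 12 concrete, here is a minimal sketch of K-Means in plain Python. It is not how Clementine implements the technique; the student records (GPA, credit hours per term), the function name `kmeans`, and its parameters are all made up for illustration.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means on 2-D points: assign each point to its nearest
    centroid, recompute centroids as cluster means, repeat until stable."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points

    def nearest(p):
        # Index of the centroid with the smallest squared Euclidean distance.
        return min(range(k), key=lambda i: (p[0] - centroids[i][0]) ** 2
                                           + (p[1] - centroids[i][1]) ** 2)

    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p)].append(p)
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append((sum(x for x, _ in members) / len(members),
                                      sum(y for _, y in members) / len(members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids

    labels = [nearest(p) for p in points]
    return centroids, labels

# Hypothetical students: (GPA, credit hours per term).
students = [(2.1, 9), (2.3, 10), (2.0, 8), (3.7, 16), (3.8, 15), (3.6, 17)]
centroids, labels = kmeans(students, k=2)
print(labels)
```

On real data the two variables would normally be rescaled first, since credit hours would otherwise dominate the distance.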
  • 13. Data Mining is…(cont’d)
    Slide labels: Segmentation, Decision Trees, Neural Networks, Predictive Modeling, Affinity Analysis, Association Rule, Sequence Generators
    Figure: decision tree for Credit ranking (1=default)
    Root: Bad 52.01% (n=168), Good 47.99% (n=155), Total n=323
    Split on Paid Weekly/Monthly (P-value=0.0000, Chi-square=179.6665, df=1)
      Weekly pay: Bad 86.67% (n=143), Good 13.33% (n=22), Total n=165 (51.08%)
        Split on Age Categorical (P-value=0.0000, Chi-square=30.1113, df=1)
          Young (< 25); Middle (25-35): Bad 90.51% (n=143), Good 9.49% (n=15), Total n=158 (48.92%)
          Old (> 35): Bad 0.00% (n=0), Good 100.00% (n=7), Total n=7 (2.17%)
      Monthly salary: Bad 15.82% (n=25), Good 84.18% (n=133), Total n=158 (48.92%)
        Split on Age Categorical (P-value=0.0000, Chi-square=58.7255, df=1)
          Young (< 25): Bad 48.98% (n=24), Good 51.02% (n=25), Total n=49 (15.17%)
            Split on Social Class (P-value=0.0016, Chi-square=12.0388, df=1)
              Management; Clerical: Bad 0.00% (n=0), Good 100.00% (n=8), Total n=8 (2.48%)
              Professional: Bad 58.54% (n=24), Good 41.46% (n=17), Total n=41 (12.69%)
          Middle (25-35); Old (> 35): Bad 0.92% (n=1), Good 99.08% (n=108), Total n=109 (33.75%)
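The slide does not say which chi-square statistic the tree reports. As a rough check, the sketch below recomputes the root split (weekly pay: 143 bad, 22 good; monthly salary: 25 bad, 133 good) in two ways: the likelihood-ratio form and the Pearson form. The function names are illustrative. The likelihood-ratio value appears to land close to the 179.6665 shown on the slide, while the Pearson value is noticeably lower, which suggests (but does not prove) that the tool used the likelihood-ratio form.

```python
import math

def expected_counts(table):
    """Expected cell counts under independence of rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def likelihood_ratio_chi2(table):
    """G^2 = 2 * sum(O * ln(O / E)); cells with O = 0 contribute nothing."""
    expected = expected_counts(table)
    return 2.0 * sum(o * math.log(o / e)
                     for obs_row, exp_row in zip(table, expected)
                     for o, e in zip(obs_row, exp_row) if o > 0)

def pearson_chi2(table):
    """X^2 = sum((O - E)^2 / E)."""
    expected = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))

# Rows: weekly pay, monthly salary. Columns: Bad, Good (counts from slide 13).
root_split = [[143, 22], [25, 133]]
g2 = likelihood_ratio_chi2(root_split)
x2 = pearson_chi2(root_split)
print(f"likelihood-ratio chi-square: {g2:.4f}")
print(f"Pearson chi-square:          {x2:.4f}")
```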
  • 14. Phases in the DM Process: CRISP-DM
    Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment
  • 15. CRISP-DM
    Business Understanding: Understand the project objectives and identify the data mining problem
    Data Understanding: Capture, understand, and explore your data for quality issues
    Data Preparation: Clean data, merge data, derive attributes, etc.
    Modeling: Select the data mining techniques and build the model
    Evaluation: Evaluate the results and approve models
    Deployment: Put models into practice; plan monitoring and maintenance
  • 16. Data at the heart of the Predictive Enterprise
    Interaction data: Offers, Results, Context, Click streams, Notes
    Attitudinal data: Opinions, Preferences, Needs, Desires
    Descriptive data: Attributes, Characteristics, Self-declared info, (Geo)demographics
    Behavioral data: Orders, Transactions, Payment history, Usage history
    Source: SPSS BI
  • 17. Data Mining Applications: Institutional Effectiveness. Which students make greatest use of institutional services? What courses provide high full-time equivalent students (FTES) and allow better use of space? What are the patterns in course taking? What courses tend to be taken as a group?
  • 18. Data Mining Applications (cont’d): Enrollment Management. Who are our best students? Where do our students come from? Who is most likely to return for another semester? Who is most likely to fail or drop out?
  • 19. Data Mining Applications (cont’d): Marketing. Who is most likely to respond to our new campaign? Which type of marketing/recruiting works best? Where should we focus our advertising and recruiting?
  • 20. Data Mining Applications (cont’d): Alumni. What are the different types/groups of alumni? Who is likely to pledge, for how much, and when? Where and on whom should we focus our fundraising drives?
  • 21. Data Mining Applications in Institutional Research
    Task | Technique | Example applications
    Categorize your students | Classification | Cafeteria meal planning; Student housing planning; Identify high risk students
    Predict student retention/alumni donations | Neural Nets/Regression | Estimate/predict alumni contribution; Predict new student application rate
    Group similar students | Segmentation | Course planning; Academic scheduling; Identify student preferences for clubs and social organizations
    Identify courses that are taken together | Association | Faculty teaching load estimation; Course planning; Academic scheduling
    Find patterns and trends over time | Sequence | Predict alumni donation; Predict potential demand for library resources
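The Association task on slide 21 ("identify courses that are taken together") can be sketched with three standard association-rule measures: support, confidence, and lift. The transcripts, course codes, and the `pair_rules` helper below are hypothetical. A production tool would also handle item sets larger than pairs.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transcripts: the set of courses each student took.
transcripts = [
    {"MATH101", "STAT200", "CS110"},
    {"MATH101", "STAT200"},
    {"MATH101", "CS110"},
    {"ENGL100", "HIST120"},
    {"MATH101", "STAT200", "ENGL100"},
]

def pair_rules(baskets, min_support=0.4):
    """Return (lhs, rhs, support, confidence, lift) for course pairs whose
    joint support meets min_support, sorted by confidence (highest first)."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(pair for basket in baskets
                          for pair in combinations(sorted(basket), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n                    # share of students taking both
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]       # P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)  # confidence vs. base rate of rhs
            rules.append((lhs, rhs, support, confidence, lift))
    return sorted(rules, key=lambda rule: rule[3], reverse=True)

rules = pair_rules(transcripts)
for lhs, rhs, support, confidence, lift in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, "
          f"confidence={confidence:.2f}, lift={lift:.2f}")
```

A lift above 1 means the two courses appear together more often than their individual popularity would predict, which is the kind of signal that could feed course planning and scheduling.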
  • 22. Data Mining with Clementine Industry-leading workbench for data mining Comprehensive range of tools for all stages of the data mining process Pioneered visual approach for maximum productivity Multiple modeling techniques to predict future events
  • 23. Summary
    A successful data mining strategy involves:
    - Well-defined goals, project objectives, and questions
    - Sufficient and relevant data
    - Careful consideration and selection of software and analysts (technical and domain experts)
    - Support from senior administrators (VPs and the President)
    DM provides a set of tools, techniques, and a standardized process.
    Domain expertise in institutional research is needed to build, test, validate, and deploy models.
    DM does not build models automatically. Analysts do.
  • 24. Next Steps: Data Mining Resources
  • 25. Questions?
  • 26. Next Steps: Webcasts and White Papers December 12th, 2pm Moving Beyond the Basics: Data Mining for Institutional Research Information at Visit to download a copy of the SPSS Data Mining Tips Guide
  • 27. For more information Complete the evaluation form and tell us what you thought of today’s webcast
  • 28. THANK YOU! Survey also at: gid=0010