Why the disparity?
Building a Generalized Linear Model: Tasks and Challenges
• Data Preparation
  Tasks: data extraction; aggregate the data
  Challenges: data changes over time; getting point-in-time data
• Exploratory Data Analysis
  Tasks: eliminate outliers; cut or impute missing data
  Challenges: manual process; requires time & specialized modeling skills
• Feature Selection
  Tasks: What features do I use? What about interactions between features?
  Challenges: lack of domain knowledge & limited resources; limited shelf-life
PurePredict™
• Thousands of decision trees
[Figure: two example decision trees, labeled Pass No. 1 and Pass No. 2, splitting on Applied, Deposit, Visited Campus and Gender, with a percentage shown at each branch]
PurePredict™
• Modular approach to data submission makes retrieval easy
[Diagram: data modules (Student Source, Test Scores, Financial Aid, Lead Qualification, Application Essays, Visits & Events, Comm Logs, Visit Surveys) and Aggregated Student Data]
PurePredict™
• Thousands of decision trees
• Modular approach to data submission makes retrieval easy
• Our advanced algorithms create hundreds of data features
• Added unique, proprietary features
• A new model each month means we use all of your data
• July: Inquiry Method; Demographics
• November: Application Source; Admit Flag; Campus Visit
• February: Deposit Flag; Interaction Data (email, SMS, etc.); Admitted Student Reception Attendance
• May: Deposit Flag; New Student Orientation Registration; Financial Aid Offer
PurePredict™
• Likelihood to enroll score for each data record
• A new model each month, delivered in an easy-to-understand report
• An experienced, on-call client services team
• Gains in personnel efficiencies, budget savings and ROI
Mindstream Retention Model
• Leverages the Predictive model
• Academics
• Financial
• Social
• Demographic
• Up to four risk scores for targeted intervention strategies
• Adds data from first semester
• Courses, grades and attendance
• Extracurricular participation & student life
• Payment plan history
• Transcript requests
Data Health Monitoring
• Detailed monitoring
• Actionable intelligence
• Measurable results
Source: US Department of Education, Mindstream analysis
Data Health Monitoring
• e.g., duplication, undeliverable emails
• e.g., between-year variance, changes in data format
• What should I collect that I’m not?
A smarter approach, designed for you from beginning to end.
https://mindstreamco.com

Enroll more, spend less


Editor's Notes

  1. After a tumultuous year when tough decisions were made to ensure survival, private not-for-profit colleges and universities face a future that is volatile, uncertain, complex and ambiguous. Leaders and stakeholders have responded well to the exigencies of the pandemic, but they must now transform their institutions to not just survive, but thrive, in the “New Normal”.
  2. One transformative strategy for thriving in this environment is data-informed decision-making.
  3. Predictive and prescriptive analytics will enable you to take the guesswork out of enrollment management.
  4. Before we get started, allow me to tell you who I am, and what my firm does. Mindstream is a management consulting firm based in San Antonio, TX. I’m a graduate of the USMA at West Point. After serving as an Army officer, I went to work for IBM, and then ended up at Bain Consulting. When I left Bain Consulting in 2001 to start my own management consulting firm, I gravitated towards higher ed. I did so because of my father, Dr. Joe Garcia, who was a long-time administrator in the TAMU System, serving as the VP for Finance and Administration/CFO at Texas A&M International, Texas A&M Kingsville, and Texas A&M San Antonio. For the first several years we focused on the large public systems in Texas, i.e., the University of Texas and Texas A&M University. I decided to focus on private non-profits in 2014, because I saw that their needs were just as great as (if not greater than) those of the large public institutions. Over the last 7 years we’ve helped schools such as Abilene Christian University, Agnes Scott College, Pepperdine University and St. Edward’s University transform themselves. Our mission is to help colleges and universities transform themselves institutionally, cost-wise, academically and in admissions and enrollment. Our expertise is broad AND deep. My consultants come from higher ed, and we are a great blend of experience and expertise. We see ourselves as problem solvers, and we go beyond report generation, producing work for our clients that results in sustainable change.
  5. Logistic regression has existed for ~200 years and has been one of the leading methods for probability response modeling since the 1950s. It is also the most common method used by predictive modeling companies in the higher ed space.
  6. What if you could know 80% of your enrolled class would come from the top 20% in your inquiry pool? What about 85% from the top 44%? That sounds pretty good, right? This is the best that you can hope for using a logistic regression model. This is better than nothing, but the fact is that in today’s environment this 80/20 benchmark is unacceptable. Why not? Because it is too inefficient to change your actions, and it doesn’t provide us with sufficient intel. Hence, it isn’t good enough to change your marketing strategy!
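The "80% from the top 20%" figure is a cumulative gains statistic: rank the inquiry pool by model score, take the top slice, and count what share of all eventual enrollees falls inside it. A minimal sketch of that calculation follows. The function name and the ten-record pool are made up for illustration and do not come from any Mindstream tool.

```python
def capture_rate(scores, enrolled, top_fraction):
    """Share of all enrollees found in the top `top_fraction` of records by score."""
    # Rank records from highest to lowest model score.
    ranked = sorted(zip(scores, enrolled), key=lambda pair: pair[0], reverse=True)
    cutoff = max(1, round(len(ranked) * top_fraction))
    captured = sum(flag for _, flag in ranked[:cutoff])
    total = sum(enrolled)
    return captured / total if total else 0.0


# Hypothetical pool: model score and whether the student enrolled (1) or not (0).
scores = [0.91, 0.85, 0.40, 0.33, 0.30, 0.22, 0.15, 0.10, 0.05, 0.02]
enrolled = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

top_20_share = capture_rate(scores, enrolled, 0.2)
print(f"Top 20% of the pool holds {top_20_share:.0%} of enrollees")
```

The closer this share gets to 100% for a small top slice, the more confidently the remaining records can be given a cheaper treatment.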
  7. Enter machine learning algorithms, the basis for a more effective and efficient predictive model. What began in 1952 with an IBM employee teaching a computer how to play checkers is now the driver behind IBM Watson, Google, Amazon, Facebook, Microsoft, self-driving cars, and most recent advances in the military, healthcare, manufacturing and transportation industries.
  8. Example A: a university that collected and tracked 151 variables across more than 245,000 records. Example B: a university that collected less than a third of that amount. The results show a dramatic increase in the ability to predict enrollment within a much smaller percentage of the model population.
  9. Let’s take a closer look at the increased accuracy of PurePredict™ over logistic regression. In this scenario we have an inquiry pool of 20,000 students with a fall enrollment goal of 1,000. The marketing budget for our inquiry pool is $200,000. Let’s start with the results of a standard logistic regression model.
  10. If you’ll remember, with the leading model 85% of enrollees were found in just under the top 50% of the model population. So, we must market to half of the inquiry pool (10,000 inquiries) to get 850 of our 1,000-student enrollment goal. Based on the model, if we market to the next 5,000 inquiries found in the third quartile, we will find another 100 enrolled students. The final 50 students needed to hit our goal are found in the bottom 25% of the model. If I’m managing this enrollment scenario, there is too much risk in treating any of the inquiries found in the bottom 50% any differently than the top 50%. I need every one of those students to hit my goal. The bottom 25% holds $1.2M in revenue with those 50 students. I would have to move forward treating all 20,000 inquiries the same, with the same marketing spend and the same recruiting attention. The law of diminishing returns!
  11. A quick look at the spend and its results: With a $200,000 budget and 20,000 inquiries, each student on my inquiry campaign costs me $10. Now let’s look at the same scenario using PurePredict™ .
  12. Again, let me remind you of a university in which PurePredict™ found 97% of their enrollees in the top 10% of the model population. Using those results in our scenario, we only had to target the top 2,000 inquiries to find 970 of our future enrollees. When we add the next 4,000 inquiries most likely to enroll to our inquiry campaign we find another 60 enrollees, putting us over our enrollment goal. The model tells us that the remaining 70% of our pool will only yield 15 students. Let’s compare the two graphs.
  13. On the right, the regression model doesn’t really provide me a pool of inquiries in which I might reduce my marketing spend. All three buckets contain a large number of students who are imperative to my success. With PurePredict™, assuming all other things hold constant, I will meet or exceed my enrollment goal within the top 30% of my model population (or my top 6,000 inquiries). That leaves me nearly 14,000 inquiries, each of which I have roughly a one-tenth-of-one-percent chance of landing. If I’m managing this enrollment scenario with PurePredict™ data results, I will implement a multi-layered strategy that should reduce my budget cost or increase my enrollment, maybe both. Let’s play out the marketing spend with
  14. As we look at the campaign results, pay close attention to the “spend per student”, which is highlighted. This is where the power of our experienced enrollment team comes into play. We have the same marketing budget of $200,000. One might say, “Let’s run the first two groups, or the top 6,000 inquiries, through our standard inquiry campaign at $10 each and forget the other 14,000. We can save over 2/3 of our budget.” If you are in a major cost-cutting situation, you might consider venturing down a path such as this, but we would never recommend a strategy that includes halting communication to nearly ¾ of your inquiry pool. In this situation our team would likely suggest the following: Since you know these top 2,000 inquiries are already really interested in your institution, let’s spend a little more money (or time or recruiting energy) to try to land even more of them. This group is the easiest to convince to choose you. A little extra attention should increase your enrolled percentage from this group. You’ll see that with the next group, we suggest doubling your spend. These are students truly on the fence, and it will take a greater amount of effort to enroll this group, but you still have a much better chance of yielding more students from this pool of inquiries than from the bottom 70%. Now we move to the remaining 14,000 inquiries. We suggest this group be put on an email-only campaign with the main call to action being to either apply or visit campus. No print pieces, no admissions counselor time. If they take a step toward you on their own or through an email you sent, then the next time we run your model, that student will move up into one of the other buckets and begin receiving more communication. You’ll notice that even with spending more on the top 6,000 students than originally planned, you still have a budget surplus of $60,000. And you’ve surpassed your enrollment goal by 45 students.
  15. To complete this comparison scenario, you see that with our model you: enrolled 45 more students; reduced your spending by $60,000 (30%); increased revenue by $1.1M; and achieved an ROI 50% greater than with the logistic regression method. Let’s take a closer look at the reasons for the disparity between the two types of analytic methods, and then focus on the specific benefits of PurePredict™.
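The headline numbers in this comparison can be re-derived from the figures quoted in the scenario. The sketch below does that arithmetic. Two things in it are assumptions rather than statements from the deck: revenue per student is taken as the $24,000 implied by "$1.2M in revenue with those 50 students", and the ROI comparison is read as enrollees per marketing dollar.

```python
BUDGET = 200_000
# Assumption: per-student revenue implied by "$1.2M in revenue with those 50 students".
REVENUE_PER_STUDENT = 1_200_000 / 50


def summarize(spend, enrollees):
    """Collect the basic campaign metrics for one scenario."""
    return {
        "spend": spend,
        "enrollees": enrollees,
        "cost_per_enrollee": spend / enrollees,
        "revenue": enrollees * REVENUE_PER_STUDENT,
    }


# Logistic regression scenario: full budget spent to reach the 1,000-student goal.
logistic = summarize(spend=BUDGET, enrollees=1_000)
# PurePredict scenario: $60,000 surplus, and 970 + 60 + 15 enrollees across the three groups.
purepredict = summarize(spend=BUDGET - 60_000, enrollees=970 + 60 + 15)

extra_students = purepredict["enrollees"] - logistic["enrollees"]
extra_revenue = purepredict["revenue"] - logistic["revenue"]
savings_pct = 100 * (logistic["spend"] - purepredict["spend"]) / logistic["spend"]
# Assumption: "ROI" compared as enrollees per dollar spent.
efficiency_ratio = logistic["cost_per_enrollee"] / purepredict["cost_per_enrollee"]

print(extra_students, extra_revenue, savings_pct, round(efficiency_ratio, 2))
```

Under these assumptions the extra revenue comes to $1.08M, which rounds to the $1.1M quoted, and the enrollees-per-dollar ratio is about 1.49, in line with the "50% greater" ROI claim.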
  16. That sets the stage really well to get into a little more detail about our approach. I think the big question in most people’s minds is probably how we’re able to achieve such significant gains through machine learning over more traditional approaches. I know some of you are accustomed to dealing with logistic, probit or even just linear models. So, in order to simplify the problem statement, let me explain that nearly every regression model is actually a linear model under the covers. We just use different link functions (paired with different distribution assumptions) to transform the problem into a linear one. This family of models is referred to collectively as ‘generalized linear models’, and there are several drawbacks that apply universally to this entire family of modeling approaches. I think the easiest way to explain them is to walk through the process of building one of these models.
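To make the "linear model under the covers" point concrete, here is a small sketch of logistic regression viewed as a generalized linear model: a plain weighted sum of the features, passed through the inverse of the logit link to give a probability. The coefficients and the two features are hypothetical values chosen for illustration, not fitted to any real data.

```python
import math


def linear_predictor(coefs, intercept, features):
    """The linear part shared by every generalized linear model."""
    return intercept + sum(c * x for c, x in zip(coefs, features))


def logistic_probability(coefs, intercept, features):
    """Inverse logit link: maps the linear predictor onto a 0-1 probability."""
    eta = linear_predictor(coefs, intercept, features)
    return 1.0 / (1.0 + math.exp(-eta))


def log_odds(p):
    """Logit link: maps a probability back onto the linear scale."""
    return math.log(p / (1.0 - p))


# Hypothetical coefficients for two binary features: visited campus, deposited.
coefs, intercept = [1.2, 0.8], -2.0
student = [1.0, 0.0]  # visited campus, has not deposited

p = logistic_probability(coefs, intercept, student)
print(f"probability={p:.3f}, log-odds={log_odds(p):.3f}")
```

The log-odds of the prediction equal the linear predictor exactly, which is why each feature can only shift the outcome by a fixed additive amount unless interaction terms are added by hand.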
  17. The first step with any modeling project is to pull the data. To do this, we need to identify a specific date and be able to pull together a data set with multiple years of data as of this date. If you have a true data warehouse, with snapshots going back several years that you can pull from, this is an easy task; but most of us are not this fortunate. This means we have to use change logs and system dates to rebuild a view of what our system looked like as of a date that could be as many as 3-5 years in the past. As you can imagine, this presents several non-trivial challenges to using a robust data set, and most universities never make it past this step in the process. I’ll walk you quickly through two of the primary challenges: We all know data changes over time. For example, let’s say a freshman, who ultimately wants to be a medical doctor, applies in November and lists a major of ‘Chemistry’, but at New Student Orientation, she is advised to change that major to ‘Biology’ and add a ‘Pre-Med’ concentration. Depending on how that data is stored, it is likely that when we pull major data the next November -- and try to run it through a predictive model -- we show that student as a Biology major with a Pre-Med concentration. The fact, however, is we didn’t have that information for another eight months on a year-over-year basis. If we use that information in our November model, we are using information from the future to predict a future outcome and biasing our model. In this case, the concentration actually serves as a proxy for New Student Orientation attendance. We have found this kind of systemic data stability issue with every client we’ve worked with. Because of this whole concept of not using future information to build a model, best practice dictates that we choose an as-of date for our model.
For example, I may use data that was in my system as of June 1 in a given year to create a model of my inquiry pool prior to launching my freshman application for admission in July. This is the only way to ensure a quality model, but it also means that our model has a limited ‘shelf-life’. This is because it’s limited to the data that is available and predictive as of the summer before the senior year in high school. However, once the application opens, we have access to a whole new level and quality of very valuable data, which essentially makes our pre-application model irrelevant in less than a month. There are ways to get around this and make a single model more durable, but all of them involve a careful balancing act that leaves important data out of your model at both the beginning and end of the recruitment cycle. This means we either have to settle for a single ‘ok’ model, or invest in building and implementing several models each year in order to predict a single outcome, which just isn’t sustainable.
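One way to picture the as-of rebuild described in this note is to replay a change log and keep only the changes that had happened by the chosen date. The sketch below does this for the Chemistry-to-Biology example. The log layout, the student ID and the specific dates are hypothetical, and a real student information system would need far more care around deletions and back-dated corrections.

```python
from datetime import date

# Hypothetical change log: (student_id, field, new_value, effective_date).
CHANGE_LOG = [
    ("S001", "major", "Chemistry", date(2020, 11, 15)),
    ("S001", "major", "Biology", date(2021, 6, 20)),
    ("S001", "concentration", "Pre-Med", date(2021, 6, 20)),
]


def snapshot_as_of(change_log, as_of):
    """Rebuild each student's record using only changes effective on or before `as_of`."""
    records = {}
    # Replay changes in date order so later values overwrite earlier ones.
    for student_id, field, value, effective in sorted(change_log, key=lambda row: row[3]):
        if effective <= as_of:
            records.setdefault(student_id, {})[field] = value
    return records


# What the system knew at the November as-of date versus what it shows a year later.
november_view = snapshot_as_of(CHANGE_LOG, date(2020, 11, 30))
current_view = snapshot_as_of(CHANGE_LOG, date(2021, 11, 30))

print(november_view)
print(current_view)
```

Training a November model on the later view would leak the Pre-Med concentration, which is exactly the future-information problem described above.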
  18. Once we have a validated and verified data set as of a specific point in time year-over-year, we would start the process of exploratory data analysis. This requires the attention of someone trained in modeling techniques to prepare that data set, either by eliminating records with missing data or by making liberal assumptions about the data in order to impute the missing data, which just means filling it in with estimated values. This presents a significant challenge for data such as lead qualifications, which might be sparsely populated but highly predictive.
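As an illustration of the two options just described, the sketch below compares dropping incomplete records with a simple median imputation. The `lead_score` field and its values are invented for the example, and median fill is only one of many possible imputation choices.

```python
from statistics import median

# Hypothetical lead-qualification scores; None marks a missing value.
leads = [
    {"id": 1, "lead_score": 80},
    {"id": 2, "lead_score": None},
    {"id": 3, "lead_score": 60},
    {"id": 4, "lead_score": None},
    {"id": 5, "lead_score": 70},
]


def drop_missing(rows, field):
    """Option 1: cut every record where the field is missing."""
    return [row for row in rows if row[field] is not None]


def impute_median(rows, field):
    """Option 2: fill gaps with the median and flag which values were filled."""
    fill = median(row[field] for row in rows if row[field] is not None)
    imputed = []
    for row in rows:
        missing = row[field] is None
        new_row = dict(row)
        new_row[field] = fill if missing else row[field]
        new_row[field + "_was_missing"] = missing
        imputed.append(new_row)
    return imputed


complete_cases = drop_missing(leads, "lead_score")
imputed = impute_median(leads, "lead_score")
print(len(complete_cases))  # 3
print(imputed[1])
```

Dropping records loses two of the five leads here, while imputing keeps them at the cost of an assumption about what the missing scores would have been. The added flag column lets a model still see that the value was originally absent.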
  19. Let’s assume you have successfully overcome all of these challenges, and you have arrived at a fully populated and validated data set suitable for modeling. At this point, the analyst would visualize the relationships in the data, conduct correlation analyses, and mine the data by adding certain transformed versions of the data into the data set to get more predictive value than offered by the raw data. Unless you have unlimited resources, this means you end up exploring only a very small portion of the data’s potential before moving on to create the model. Even then, you have to have very specific domain knowledge about the underlying student behavior, as well as very specialized knowledge of predictive modeling techniques just to unlock ‘most’ of the potential in your data. To make things more challenging, these two skill sets can rarely be found in one person, so you end up needing a team involving an analyst and a functional manager to guide the modeling process. These are significant challenges, but many of these issues can be circumvented by employing a machine learning approach.
  20. In general, tree-based machine learning models are relatively insensitive to missing data and outliers, and automatically handle variable selection and interactions. In our solution we use a modeling technique called Gradient Boosting Machines. This type of model makes thousands of passes over the data, building a small decision tree on each pass. <<CLICK FOR PASS No. 1>> With each successive pass, it learns from the prior passes to improve its classifications. I realize this is an overly simplified example, so if you’ve got a background in machine learning please don’t get out the torches and pitchforks, but if in pass one the model did not do a good job of predicting enrollment for female leads who have not deposited, it might focus on this group in the next pass in order to improve the model. <<CLICK FOR PASS No. 2>> Ultimately, we end up with thousands of individual decision trees. Each one by itself may not be a great predictor, but as the model learns over thousands of passes through the data and aggregates the results, it ultimately provides excellent predictions. Our models generally leverage information from 50 or more different features, while a logistic model is typically limited to 6-8 features. In short, we squeeze more predictive value out of the same data.
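For readers who want to see the pass-by-pass idea in code, here is a toy sketch of boosting with one-split trees ("stumps") under squared error. It is not the PurePredict implementation; the data, feature names, learning rate, and number of passes are all assumptions chosen to keep the example small.

```python
# Toy data: 1 = yes, 0 = no. Features and outcomes are invented for illustration.
ROWS = [
    {"deposit": 1, "female": 1, "visited": 1},
    {"deposit": 1, "female": 0, "visited": 1},
    {"deposit": 1, "female": 0, "visited": 0},
    {"deposit": 1, "female": 1, "visited": 0},
    {"deposit": 0, "female": 1, "visited": 1},
    {"deposit": 0, "female": 1, "visited": 0},
    {"deposit": 0, "female": 0, "visited": 1},
    {"deposit": 0, "female": 0, "visited": 0},
]
ENROLLED = [1, 1, 1, 1, 1, 0, 0, 0]


def fit_stump(rows, residuals):
    """Find the single yes/no split that best explains the current residuals."""
    best = None
    for feature in rows[0]:
        yes = [r for row, r in zip(rows, residuals) if row[feature] == 1]
        no = [r for row, r in zip(rows, residuals) if row[feature] == 0]
        if not yes or not no:
            continue  # this feature cannot split the data
        yes_mean, no_mean = sum(yes) / len(yes), sum(no) / len(no)
        error = sum((r - yes_mean) ** 2 for r in yes) + sum((r - no_mean) ** 2 for r in no)
        if best is None or error < best[0]:
            best = (error, feature, yes_mean, no_mean)
    return best[1:]


def boost(rows, targets, passes=50, learning_rate=0.3):
    """Each pass fits a stump to whatever the earlier passes still get wrong."""
    base = sum(targets) / len(targets)
    predictions = [base] * len(rows)
    stumps = []
    for _ in range(passes):
        residuals = [t - p for t, p in zip(targets, predictions)]
        feature, yes_mean, no_mean = fit_stump(rows, residuals)
        stumps.append((feature, yes_mean, no_mean))
        predictions = [
            p + learning_rate * (yes_mean if row[feature] == 1 else no_mean)
            for row, p in zip(rows, predictions)
        ]
    return base, learning_rate, stumps


def predict(model, row):
    """Sum the base rate and the scaled contribution of every stump."""
    base, learning_rate, stumps = model
    score = base
    for feature, yes_mean, no_mean in stumps:
        score += learning_rate * (yes_mean if row[feature] == 1 else no_mean)
    return score


def squared_error(model, rows, targets):
    return sum((t - predict(model, row)) ** 2 for row, t in zip(rows, targets))


model = boost(ROWS, ENROLLED)
baseline = (sum(ENROLLED) / len(ENROLLED), 0.3, [])
print("first split:", model[2][0][0])
print("error before boosting:", squared_error(baseline, ROWS, ENROLLED))
print("error after boosting:", squared_error(model, ROWS, ENROLLED))
```

In this toy data the first pass splits on the deposit flag, and later passes work on the residual errors among non-depositors, which is the "focus on the group it got wrong" behavior described above. Production gradient boosting libraries use deeper trees, other loss functions, and regularization that this sketch leaves out.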
  21. We’ve divided the data submission into modules that allow you to pull data from different sources and submit it directly to us, without the need to do any aggregations or transformations on your end. You essentially submit a raw data dump from your CRM or Student Information System. The primary module is the core student data, but it is supplemented with eight other modules encompassing everything from campus visits to financial aid and even the essay text submitted with a student’s application. Yes, you heard me right. We mine the application essay. A lot of people have a hard time understanding that one, but here’s something to remember: every time a student contacts you, they leave subtle clues as to where you stand with them in the language they use and how they approach the conversation. If you know how and where to look, you can find significant information even in a student’s application essays. Once we have your raw data, we apply a series of proprietary data mining algorithms to transform it into 500 or more unique features to feed into your model; but here’s the real kicker. Each one of these features is specifically designed and curated to maximize the predictive value of your data. I could download the entire census database and come up with literally thousands of features to throw at a model, but that only increases the chance that the model will latch onto spurious, false-positive predictors, so more is not always better. We do append both geodemographic and other proprietary data we’ve developed prior to running your model, but we don’t overdo it because we know what data is important to predicting enrollment management outcomes. I’ve personally been building predictive enrollment models for over a decade, and we’ve programmed all our years of research and experience in higher education modeling into the algorithms driving the PurePredict system.
  22. One of the most exciting parts of our approach is that we train and deliver a new model to you every month, enabling you to take full advantage of your data as the enrollment cycle progresses.
  23. In July prior to the senior year, your model may contain information about how the student inquired and basic demographic data. Once the application opens, that model is obsolete, because we now have stronger predictors to draw from, and the cycle continues all year long. In order to capitalize on this additional data, you have to update your models on a regular basis, which is simply not sustainable with a traditional modeling approach.
  24. Each month, we will deliver a file with a model score for each student, which you can leverage in your workflows and marketing campaigns. This will be accompanied by a very concise management-level report summarizing your most recent model and what data it is leveraging. Our strategy team and data scientists will be with you throughout the year to help you understand how to use your model results to achieve efficiencies in personnel and marketing operations, find budget savings, and increase ROI, all while limiting risk to the bottom line: your class.
  25. After putting all that work into recruiting a class, it sure would be nice to keep them around for a while, right? So let’s talk about retention for a minute. We’re very excited that our team has identified measures of engagement in the recruiting process that help us to predict retention prior to enrollment. This makes your PurePredict data set the perfect foundation for a retention model which can be delivered prior to enrollment, and even updated to deliver new scores on a regular basis. Additionally, we can add in data throughout the first semester as it becomes available, such as course attendance, mid-term and final grades, extracurricular involvement, payment plan history, and transcript requests. Depending on the data that is available, we can then deliver up to four separate risk scores you can use to target specific intervention strategies. These include risk scores for academic factors, financial situations, social involvement, and demographics. As with all PurePredict models, we offer convenient modular data submission, the results are summarized in an easy-to-understand report format, and our strategy team will walk with you throughout the process.
  26. One of the questions we get a lot is, “How do I know if my data is good enough to use for a predictive model?”, so let’s talk about how we can help answer that question. We’ve already touched on several of the important data issues that can come up when using your data to create predictive models. The Data Health Monitor is a product we’ve designed with these pitfalls in mind to ensure you are aware of the holes in your data and can take immediate action to remedy them. Each Data Health Monitor report contains not only a high-level summary of the issues we uncover in your data, but also detailed Excel-based reports embedded right within the PDF. These sub-reports contain the data necessary for you to take immediate action. As you make progress in cleaning your data, you are able to see the results of your effort reflected in your data health scores. These improvements can then be reported to your executive team to show progress toward institutional goals for data quality.