Data-Driven College Counseling Techniques

Data-Driven College Counseling
Michael Discenza
Senior Data Scientist - SchooLinks
SchooLinks | A personalized college and career readiness solution

About Me
● Statistics B.A. + M.A. @ Columbia
● Data & Accountability Team @ Success Academies Charter
Network in NYC
● Data Science @ JPMorgan, @ RUN Ads (digital ad targeting)
● Currently Data Science @ SchooLinks
● Why am I here and what do I know about college
planning/counseling?

What should you take away?
1) How being data-driven can help you in
counseling
2) Process/Framework to follow
3) Exposure and working knowledge of more advanced
techniques

Why data-driven?
● “Big data”
● It helps businesses figure out what they need to do, or what actions
they need to take to meet their goals - predicting the future? Super
powers?
● Why does it work? It boils down to the scientific method and the
availability of data.
● Businesses saw that this framework for thinking about the world
was helpful for them.

What is data useful for in college
counseling?

Conduct research yourself if you’re in academia or want to publish
a paper about college counseling…
But more likely:
○ Learn how to be more effective individually,
○ Be more effective as a department, or
○ Justify the investment to use a new curriculum/ new
approach (be sure it has a positive outcome)

What are your goals?
● Examples from folks that we work with:
○ Increase the number of students who have meaningful
post graduation plans
○ Increase college going rate
○ Close achievement gap in your school/district
○ College retention

Quantify these Goals
● Goal Metrics
○ Outcomes (matriculation, retention)
○ Process metrics (setting up for success… completion date of
applications)
● KPI - intermediate/progress tracking
○ FAFSA Completion
○ PSAT/SAT/ACT, etc. completion rates

The Process
1) PLAN
2) Get data
3) Prepare/Analyze data
4) Improve based on the learnings from your data

1. Plan
● Background research to make sure your question is a good one for
primary research: i.e. specific to your school and can’t be more
efficiently answered by reading it in a book or elsewhere
● Write down your question
● Make sure it is important and tied to outcomes you care about…
and will yield actionable insights
● Lab notebook (folder on your computer) etc.
● Ensure data access/you will be able to complete the actual
“research”
● List out assumptions

2. Getting Data
● Two main types of data:
○ Outcome data (dependent variable) - college going rate,
students who were accepted into their top 3 choices
(combination of the KPI and the goal metrics we talked about
before)
○ Treatment data (independent variable) - curriculum they used,
programs/extracurricular at schools, sentiment as reported by
surveys
● Sources of data (we’ll talk about strategies for each):
○ Existing data
○ Data you collect

2.1 Getting Data - Using Existing Sources of
Student Data
● Where do you find it/how do you access?
○ SIS data - grades, attendance, participation -> CSV export
○ College tools such as SchooLinks and Naviance, National
Student Clearinghouse data
● What does the data mean, “data generating process”
● FERPA?

2.2 Getting Data - Collecting Your Own Data
● Sources:
○ Structured activities/curriculum exposure (what they do)
○ Surveys/Questionnaires (what they say they do and what
they think)
● Best Practices:
○ Organization is essential: lab notebook, dates/timestamps
○ Pay attention to ID space - your ability to analyze data is
actually really tied to your ability to tie outcome data to
treatment data with a key

3. Preparing Data
● Combining data from different data sets: ID space (key)
○ Vlookup (Excel, Google sheets, Apple Numbers)
○ Joins - SQL, python, etc.
● Messy/missing data, outliers - what to include and not to
include?
● Visualizing data
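
The key-based combining step can be sketched in plain Python. This is only an illustration under assumptions: the student IDs, column names, and values are made up, the IDs are assumed to be unique within each table, and in practice the rows would come from CSV exports rather than being typed in.

```python
# Hypothetical treatment data (e.g. from a survey), keyed by student ID.
treatment_rows = [
    {"student_id": "1001", "curriculum": "A"},
    {"student_id": "1002", "curriculum": "B"},
    {"student_id": "1003", "curriculum": "B"},  # no outcome record
]

# Hypothetical outcome data (e.g. from an SIS export), keyed by the same ID.
outcome_rows = [
    {"student_id": "1001", "fafsa_completed": True},
    {"student_id": "1002", "fafsa_completed": False},
    {"student_id": "1004", "fafsa_completed": True},  # no treatment record
]


def inner_join(left, right, key):
    """Return one merged row for every key value found in both tables.

    Assumes each key value appears at most once per table.
    """
    right_by_key = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            joined.append({**row, **match})
    return joined


def unmatched_keys(left, right, key):
    """Return key values that appear in only one of the two tables."""
    left_keys = {row[key] for row in left}
    right_keys = {row[key] for row in right}
    return sorted(left_keys ^ right_keys)


merged = inner_join(treatment_rows, outcome_rows, "student_id")
dropped = unmatched_keys(treatment_rows, outcome_rows, "student_id")
print(merged)
print("IDs present in only one table:", dropped)
```

Listing the unmatched IDs is a deliberate choice here: students who silently drop out of a join are often the first sign of an ID space problem.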

4. Analyzing Data
● All about the relationship between the treatment data and the
outcome data.
● Conditional probability is the most complicated math you’ll
need and most of these dynamics are really easily visualized with
graphs
● Background: ASCA actually has a pretty useful book - a review
of percentages/probability, etc. focused on giving counselors the
background to do this work - you might be able to pick it up
here if there’s a bookstore

Sample Data Analysis
Students who had treatment B achieved success at a rate of
33%, whereas students who had treatment A achieved success
at a rate of 22%
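
A comparison like this is just a conditional probability per treatment group: successes in the group divided by students in the group. A minimal sketch follows; the records are made up (chosen so the rates land near the figures on this slide) and the group labels "A" and "B" are placeholders for whatever treatments are being compared.

```python
from collections import defaultdict

# Made-up records: (treatment group, whether the student achieved the outcome).
records = (
    [("A", True)] * 2 + [("A", False)] * 7
    + [("B", True)] * 3 + [("B", False)] * 6
)


def success_rate_by_group(rows):
    """Estimate P(success | group) as successes / students in that group."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for group, succeeded in rows:
        totals[group] += 1
        if succeeded:
            successes[group] += 1
    return {group: successes[group] / totals[group] for group in totals}


rates = success_rate_by_group(records)
for group, rate in sorted(rates.items()):
    print(f"P(success | {group}) = {rate:.0%}")
```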

What to do with your findings:
● Apply them yourself
● Share them - if they’re worthwhile for you, they’re probably
worthwhile for the rest of your dept (ideally “generalizable”)
● Communicate them - for larger adoption across a dept or
funding
○ Graphs, writing, speaking
○ Keep it simple

Pretty simple… what’s the big deal?
Concepts you should be aware of:
● Regression
● Classification
● Multivariate Analysis
● Causal Analysis
● Statistical Confidence (p values)
● Machine Learning
Goal: know how these are useful

Classification
● Determining the class or group of a
case
● Outcome is the probability of a case being
part of a certain group
● Most common use case is binary
classification
● Many different statistical methods
(“families of models”) can be used
Example:
Predicting whether a student will fill out the FAFSA
by a certain date based on academic performance
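
One possible way to sketch this example is logistic regression, which is only one of the model families that could be used. Everything below is an assumption for illustration: the GPA values and filing outcomes are made up, GPA stands in for "academic performance", and the model is fit with a hand-written gradient descent rather than a statistics package.

```python
import math

# Made-up training data: (GPA, 1 if FAFSA was filed by the deadline else 0).
students = [
    (1.8, 0), (2.0, 0), (2.3, 0), (2.5, 1), (2.7, 0),
    (3.0, 1), (3.2, 1), (3.4, 0), (3.6, 1), (3.9, 1),
]


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(data, learning_rate=0.1, steps=5000):
    """Fit P(y=1 | x) = sigmoid(w * (x - mean_x) + b) by gradient descent."""
    mean_x = sum(x for x, _ in data) / len(data)  # center x for stability
    w, b = 0.0, 0.0
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in data:
            error = sigmoid(w * (x - mean_x) + b) - y
            grad_w += error * (x - mean_x)
            grad_b += error
        w -= learning_rate * grad_w / len(data)
        b -= learning_rate * grad_b / len(data)
    return w, b, mean_x


def predict_probability(model, gpa):
    """Return the modeled probability that a student files on time."""
    w, b, mean_x = model
    return sigmoid(w * (gpa - mean_x) + b)


model = fit_logistic(students)
for gpa in (2.0, 3.0, 3.8):
    print(f"GPA {gpa}: P(files on time) = {predict_probability(model, gpa):.2f}")
```

The output is a probability per student, which can then be turned into a yes/no prediction by picking a cutoff such as 0.5.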

Regression
● Predicting continuous outcomes
● Similar to our exercise but instead
of the probability, we’re looking at
the average score for particular
treatment groups or the average
change in one variable for a unit
change in the other “slope”
Example:
Predicting number of AP classes by hours of
extracurricular activity per week
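
The "slope" idea can be sketched with ordinary least squares on a single predictor. The numbers below are made up purely for illustration; the slope is read as the average change in AP classes for each additional hour of extracurricular activity per week.

```python
from statistics import mean

# Made-up data: (extracurricular hours per week, number of AP classes).
observations = [(0, 1), (2, 1), (4, 2), (6, 3), (8, 3), (10, 5)]


def fit_line(points):
    """Ordinary least squares fit of y = intercept + slope * x."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in points)
        / sum((x - x_bar) ** 2 for x in xs)
    )
    intercept = y_bar - slope * x_bar
    return slope, intercept


slope, intercept = fit_line(observations)
print(f"Each extra hour per week goes with {slope:.2f} more AP classes on average")
print(f"Predicted AP classes at 5 hours per week: {intercept + slope * 5:.2f}")
```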

Multivariate Analysis
● Incorporating more than one
independent variable, still only one
response variable
● Can think of it as data in more than two
dimensions
● Think about the effect of one variable
controlling for all others
● Could be in classification setting or
regression setting
http://metabolomicsplatform.com/projects/gc-ms/
Example:
Predicting whether a student will fill out the FAFSA by a
certain date based on academic performance,
demographics, and survey data
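
"Controlling for" a second variable can be illustrated without a full model by stratifying: compute the same conditional probabilities within each level of the other variable. This is a simplified stand-in for a multivariate model, not a replacement for one. The counts, the GPA bands, and the workshop variable are all made up.

```python
from collections import defaultdict

# Made-up counts: (GPA band, attended workshop, filed FAFSA) -> students.
counts = {
    ("high", True, True): 8, ("high", True, False): 2,
    ("high", False, True): 3, ("high", False, False): 2,
    ("low", True, True): 2, ("low", True, False): 3,
    ("low", False, True): 2, ("low", False, False): 8,
}


def filing_rates(table, by_band):
    """Return P(filed | attended), overall or within each GPA band."""
    filed = defaultdict(int)
    total = defaultdict(int)
    for (band, attended, did_file), n in table.items():
        key = (band, attended) if by_band else attended
        total[key] += n
        if did_file:
            filed[key] += n
    return {key: filed[key] / total[key] for key in total}


overall = filing_rates(counts, by_band=False)
within = filing_rates(counts, by_band=True)

raw_gap = overall[True] - overall[False]
band_gaps = {
    band: within[(band, True)] - within[(band, False)]
    for band in ("high", "low")
}
print(f"Raw gap between attendees and non-attendees: {raw_gap:.2f}")
for band, gap in band_gaps.items():
    print(f"Gap within {band}-GPA students: {gap:.2f}")
```

In this made-up table the gap shrinks once GPA band is held fixed, because high-GPA students were more likely to attend the workshop in the first place.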

Causal Analysis
● Accounting for the fact that example cases
aren’t always assigned to treatment in a
randomized way
● Many techniques, usually require a lot of data,
simplest is Propensity Score Matching (PSM)
● 1) build model to understand probability of
assignment to treatment/control
● 2) Pick groups of subjects in treatment and
control groups that had the same chance of
being assigned to the treatment or control
based on all of the other data (controls for bias)
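
The two PSM steps can be sketched on a toy scale. This is a heavily simplified illustration, not a production method: the students are made up, the propensity "model" is just the observed treatment rate within each GPA band, and matching is greedy nearest-neighbor with a hypothetical `caliper` of 0.05 that leaves poorly matched students out.

```python
from collections import defaultdict
from typing import NamedTuple


class Student(NamedTuple):
    id: str
    gpa_band: str
    treated: bool   # took part in the program
    enrolled: bool  # outcome: enrolled in college


# Made-up students for illustration only.
students = [
    Student("s1", "high", True, True), Student("s2", "high", True, True),
    Student("s3", "high", True, False), Student("s4", "high", False, True),
    Student("s5", "mid", True, True), Student("s6", "mid", False, False),
    Student("s7", "mid", False, True), Student("s8", "low", True, False),
    Student("s9", "low", False, False), Student("s10", "low", False, False),
    Student("s11", "low", False, True),
]


def propensity_by_band(rows):
    """Step 1: estimate P(treated | GPA band) from observed frequencies."""
    treated = defaultdict(int)
    total = defaultdict(int)
    for s in rows:
        total[s.gpa_band] += 1
        treated[s.gpa_band] += s.treated
    return {band: treated[band] / total[band] for band in total}


def match_pairs(rows, scores, caliper=0.05):
    """Step 2: pair each treated student with the closest unused control,
    skipping students with no control within `caliper` of their score."""
    controls = [s for s in rows if not s.treated]
    pairs = []
    for t in (s for s in rows if s.treated):
        def distance(c):
            return abs(scores[c.gpa_band] - scores[t.gpa_band])
        candidates = [c for c in controls if distance(c) <= caliper]
        if candidates:
            best = min(candidates, key=distance)
            controls.remove(best)
            pairs.append((t, best))
    return pairs


scores = propensity_by_band(students)
pairs = match_pairs(students, scores)
matched_ids = [(t.id, c.id) for t, c in pairs]
effect = sum(t.enrolled - c.enrolled for t, c in pairs) / len(pairs)
print("Propensity by band:", scores)
print("Matched pairs:", matched_ids)
print(f"Mean outcome difference in matched pairs: {effect:.2f}")
```

Two of the five treated students end up unmatched here, which is a small-scale view of why these techniques usually require a lot of data.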

P-values
● All about quantifying how certain
you are that your finding is a real
finding and not just “random
variation” or statistical noise
● Dependent on sample size and
variability of your data
● Given that there was no true
difference between groups, how
likely would you be to find the
two groups as different as you did
in your analysis
http://uk.cochrane.org/news/key-statistical-result-interpretation-p-value-plain-english
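
The "no true difference" question can be answered directly with a permutation test: if group labels did not matter, every way of reshuffling them is equally likely, so the p-value is the share of reshufflings that produce a gap at least as large as the observed one. This sketch uses made-up outcomes and small groups so that every relabeling can be enumerated exactly; it is one way to get a p-value, not the only one.

```python
from itertools import combinations

# Made-up outcomes (1 = success, 0 = no success) for two groups of 10 students.
group_a = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
group_b = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]


def permutation_p_value(a, b):
    """Two-sided exact permutation test on the difference in success rates."""
    pooled = a + b
    total = sum(pooled)
    n_a, n_b = len(a), len(b)
    observed = abs(sum(b) / n_b - sum(a) / n_a)

    at_least_as_extreme = 0
    n_relabelings = 0
    # Every way of choosing which students are relabeled as group A.
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        sum_b = total - sum_a
        gap = abs(sum_b / n_b - sum_a / n_a)
        if gap >= observed - 1e-12:  # tolerance for floating-point rounding
            at_least_as_extreme += 1
        n_relabelings += 1
    return at_least_as_extreme / n_relabelings


p_value = permutation_p_value(group_a, group_b)
print(f"Observed gap: {sum(group_b) / 10 - sum(group_a) / 10:.2f}")
print(f"p-value: {p_value:.3f}")
```

With larger groups, enumerating every relabeling becomes impractical and a random sample of relabelings is the usual substitute.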

Machine Learning
● Figuring out how to encode knowledge
and patterns into structures we can use
● All about predictive accuracy vs.
statistics which is more about
assembling knowledge of the
underlying patterns that we study
● Supervised vs. Unsupervised Learning
● Many different methodologies:
decision trees, Bayesian learning,
deep learning, clustering, expectation
maximization

Concluding Remarks
● Data skills - more about logic, domain knowledge, and posing
good questions rather than hard technical skills
● Get 80% of the way there with conditional probability
● Use tools to automate workflow and save time
● Ask questions... of your data, your vendors, colleagues, the
internet

Questions?
Contact Info:
Mike@schoolinks.com
