1) This document discusses how using data can help make college counseling more effective by quantifying goals, analyzing relationships between student data and outcomes, and identifying actions that lead to meeting goals.
2) It provides a framework for conducting data-driven research including planning questions, obtaining existing and new student data, preparing and analyzing the data, and applying findings.
3) Key analysis techniques mentioned include regression, classification, multivariate analysis, and causal analysis, which can be used to predict outcomes and determine the effects of different factors while accounting for biases in the data.
2. About Me
● Statistics B.A. + M.A. @ Columbia
● Data & Accountability Team @ Success Academies Charter Network in NYC
● Data Science @ JPMorgan, @ RUN Ads (digital ad targeting)
● Currently Data Science @ SchooLinks
● Why am I here, and what do I know about college planning/counseling?
3. What should you take away?
1) How being data-driven can help you in counseling
2) Process/Framework to follow
3) Exposure to, and working knowledge of, more advanced techniques
4. Why data-driven?
● “Big data”
● It helps businesses figure out what they need to do, or what actions
they need to take to meet their goals - predicting the future? Super
powers?
● Why does it work? It boils down to the scientific method and the availability of data.
● Businesses saw that this framework for thinking about the world
was helpful for them.
7. Conduct research yourself if you’re in academia or want to publish
a paper about college counseling…
But more likely:
○ Learn how to be more effective individually,
○ Be more effective as a department, or
○ Justify the investment in a new curriculum/new approach (be sure it has a positive outcome)
8. What are your goals?
● Examples from folks that we work with:
○ Increase the number of students who have meaningful post-graduation plans
○ Increase college going rate
○ Close achievement gap in your school/district
○ College retention
9. Quantify these Goals
● Goal Metrics
○ Outcomes (matriculation, retention)
○ Process metrics (setting up for success… completion date of
applications)
● KPI - intermediate/progress tracking
○ FAFSA Completion
○ PSAT/SAT/ACT, etc. completion rates
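To make a KPI concrete, here is a minimal Python sketch that computes two completion rates from a list of student records. The field names (fafsa_completed, sat_taken), the sample records, and the March 1 target date are all hypothetical. They are not taken from any particular SIS or college-planning tool.

```python
from datetime import date

# Hypothetical student records; field names are illustrative, not a real export format.
students = [
    {"id": "S1", "fafsa_completed": date(2024, 2, 10), "sat_taken": True},
    {"id": "S2", "fafsa_completed": None, "sat_taken": True},
    {"id": "S3", "fafsa_completed": date(2024, 4, 2), "sat_taken": False},
    {"id": "S4", "fafsa_completed": date(2024, 1, 15), "sat_taken": True},
]

def fafsa_completion_rate(records, deadline):
    """KPI: share of students who completed the FAFSA on or before `deadline`."""
    on_time = sum(
        1 for r in records
        if r["fafsa_completed"] is not None and r["fafsa_completed"] <= deadline
    )
    return on_time / len(records)

def sat_completion_rate(records):
    """KPI: share of students who have taken the SAT."""
    return sum(r["sat_taken"] for r in records) / len(records)

deadline = date(2024, 3, 1)  # hypothetical internal target date
print(f"FAFSA by {deadline}: {fafsa_completion_rate(students, deadline):.0%}")
print(f"SAT taken: {sat_completion_rate(students):.0%}")
```

Tracking the same rate at several dates during the year is what turns it into a progress (KPI) measure rather than a one-time outcome.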
10. The Process
1) PLAN
2) Get data
3) Prepare/Analyze data
4) Improve based on the learnings from your data
11. 1. Plan
● Background research to make sure your question is a good one for primary research, i.e. specific to your school and not something you could answer more efficiently by reading a book or another source
● Write down your question
● Make sure it is important, tied to outcomes you care about, and will yield actionable insights
● Keep a lab notebook (e.g. a folder on your computer)
● Ensure you have data access and will be able to complete the actual “research”
● List out assumptions
12. 2. Getting Data
● Two main types of data:
○ Outcome data (dependent variable) - college-going rate, students who were accepted into their top 3 choices (a combination of the KPIs and goal metrics we talked about before)
○ Treatment data (independent variable) - curriculum they used,
programs/extracurricular at schools, sentiment as reported by
surveys
● Sources of data (we’ll talk about strategies for each):
○ Existing data
○ Data you collect
13. 2.1 Getting Data - Using Existing Sources of
Student Data
● Where do you find it/how do you access?
○ SIS data - grades, attendance, participation -> CSV export
○ College tools such as SchooLinks and Naviance, National Student Clearinghouse data
● What does the data mean? Understand the “data generating process”
● FERPA?
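Once you have an SIS CSV export, a good first step is to load it and look at where values are missing. The sketch below does this using only Python's standard csv module. The column names and rows are hypothetical, and an inline string stands in for the real file. With a real export you would open the file with open(path, newline="") instead. Note that every value comes back as a string.

```python
import csv
import io

# Stand-in for a CSV export from a student information system (SIS).
# Column names are hypothetical; real exports differ by vendor.
SIS_EXPORT = """student_id,grade_level,gpa,attendance_rate
1001,12,3.4,0.95
1002,12,,0.88
1003,11,2.9,
"""

def load_export(text):
    """Parse CSV text into a list of dicts, one per student row."""
    return list(csv.DictReader(io.StringIO(text)))

def missing_counts(rows):
    """Count blank cells per column. Patterns of missingness are a clue to the
    data generating process (e.g. a field only recorded for some students)."""
    counts = {}
    for row in rows:
        for col, value in row.items():
            counts.setdefault(col, 0)
            if value is None or value.strip() == "":
                counts[col] += 1
    return counts

rows = load_export(SIS_EXPORT)
print(len(rows), "rows; blanks per column:", missing_counts(rows))
```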
14. 2.2 Getting Data - Collecting Your Own Data
● Sources:
○ Structured activities/curriculum exposure (what they do)
○ Surveys/Questionnaires (what they say they do and what
they think)
● Best Practices:
○ Organization is essential: lab notebook, dates/timestamps
○ Pay attention to ID space - your ability to analyze data is really tied to your ability to link outcome data to treatment data with a key
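Before you analyze anything, it is worth checking how many outcome records can actually be linked to treatment records. Here is a minimal sketch of that check, with hypothetical IDs. It assumes that IDs match once they are converted to strings and stripped of whitespace. Other mismatches, such as leading zeros ("01001" vs 1001), are deliberately not normalized, because that depends on your systems.

```python
from collections import Counter

def id_overlap_report(outcome_ids, treatment_ids):
    """Summarize how well outcome records can be linked to treatment records by ID.
    IDs are compared as stripped strings, so 1001 and "1001 " count as the same key;
    leading zeros (e.g. "01001") are NOT normalized here."""
    outcome_ids = [str(i).strip() for i in outcome_ids]
    treatment_ids = [str(i).strip() for i in treatment_ids]
    outcome_set, treatment_set = set(outcome_ids), set(treatment_ids)
    matched = outcome_set & treatment_set
    return {
        "match_rate": len(matched) / len(outcome_set) if outcome_set else 0.0,
        "unmatched_outcome_ids": sorted(outcome_set - treatment_set),
        "duplicate_treatment_ids": sorted(
            i for i, n in Counter(treatment_ids).items() if n > 1
        ),
    }

report = id_overlap_report(
    outcome_ids=[1001, 1002, 1003, 1004],             # outcome data, e.g. enrollment results
    treatment_ids=["1001", "1002 ", "1002", "1005"],  # treatment data, e.g. survey responses
)
print(report)
```

A low match rate or duplicate IDs at this stage usually means the data collection process needs fixing, not the analysis.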
15. 3. Preparing Data
● Combining data from different data sets: ID space (key)
○ VLOOKUP (Excel, Google Sheets, Apple Numbers)
○ Joins - SQL, Python, etc.
● Messy/missing data, outliers - what to include and not to
include?
● Visualizing data
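The join step can be sketched in plain Python as a left join on the student ID, which is the same idea as VLOOKUP or a SQL LEFT JOIN. In pandas the equivalent would be DataFrame.merge with how="left". The data is hypothetical, and so is the rule of dropping students with missing treatment data. The point is to make that choice explicitly and to report how many rows it removes.

```python
def left_join(outcomes, treatments, key="student_id"):
    """Attach treatment columns to every outcome row that shares `key`,
    like VLOOKUP or a SQL LEFT JOIN. Unmatched rows get None.
    Assumes each key appears at most once in `treatments`."""
    lookup = {row[key]: row for row in treatments}
    cols = sorted({c for row in treatments for c in row if c != key})
    joined = []
    for row in outcomes:
        match = lookup.get(row[key], {})
        joined.append({**row, **{c: match.get(c) for c in cols}})
    return joined

outcomes = [
    {"student_id": "1001", "enrolled": 1},
    {"student_id": "1002", "enrolled": 0},
    {"student_id": "1003", "enrolled": 1},
]
treatments = [
    {"student_id": "1001", "attended_workshop": 1},
    {"student_id": "1002", "attended_workshop": 0},
]

joined = left_join(outcomes, treatments)
# Decide explicitly how to handle missing treatment data; here we drop those rows.
complete = [r for r in joined if r["attended_workshop"] is not None]
print(f"kept {len(complete)} of {len(joined)} students")
```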
16. 4. Analyzing Data
● All about the relationship between the treatment data and the
outcome data.
● Conditional probability is the most complicated math you’ll need, and most of these dynamics are easily visualized with graphs
● Background: ASCA actually has a pretty useful book, a review of percentages/probability, etc., focused on giving counselors the background to do this work; you might be able to pick it up here if there’s a bookstore
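Here is what the conditional-probability step looks like in code. The sketch estimates P(enrolled | attended workshop) and P(enrolled | did not attend) as simple proportions. It uses a small, hypothetical joined data set, where 1 means yes and 0 means no.

```python
def conditional_rate(rows, outcome, given, value):
    """Estimate P(outcome = 1 | given = value) as a proportion of matching rows."""
    subset = [r for r in rows if r[given] == value]
    if not subset:
        return None  # no students in this group
    return sum(r[outcome] for r in subset) / len(subset)

# Hypothetical joined data: 1 = yes, 0 = no.
students = [
    {"attended_workshop": 1, "enrolled": 1},
    {"attended_workshop": 1, "enrolled": 1},
    {"attended_workshop": 1, "enrolled": 1},
    {"attended_workshop": 1, "enrolled": 0},
    {"attended_workshop": 0, "enrolled": 1},
    {"attended_workshop": 0, "enrolled": 0},
    {"attended_workshop": 0, "enrolled": 0},
    {"attended_workshop": 0, "enrolled": 0},
]

p_with = conditional_rate(students, "enrolled", "attended_workshop", 1)
p_without = conditional_rate(students, "enrolled", "attended_workshop", 0)
print(f"P(enrolled | workshop) = {p_with:.2f}, P(enrolled | no workshop) = {p_without:.2f}")
```

A gap like 0.75 vs 0.25 is a starting point, not proof that the workshop caused it. The causal analysis and p-value slides later in this deck explain why.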
19. What to do with your findings:
● Apply them yourself
● Share them - if they’re worthwhile for you, they’re probably
worthwhile for the rest of your dept (ideally “generalizable”)
● Communicate them - for wider adoption across a dept or to secure funding
○ Graphs, writing, speaking
○ Keep it simple
20. Pretty simple… what’s the big deal?
Concepts you should be aware of:
● Regression
● Classification
● Multivariate Analysis
● Causal Analysis
● Statistical Confidence (p values)
● Machine Learning
Goal: know how these are useful
21. Classification
● Determining the class or group of a case
● The output is the probability of a case being part of a certain group
● Most common use case is binary classification
● Many different statistical methods (“families of models”) can be used
Example:
Predicting whether a student will fill out the FAFSA by a certain date based on academic performance
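Logistic regression is one common family of models for binary classification. Here is a from-scratch sketch of it on the FAFSA example, with GPA as the only predictor. The data is hypothetical, and the model is fitted with plain gradient descent so every step is visible. In practice you would normally use a library implementation, such as scikit-learn's LogisticRegression or statsmodels.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.5, steps=5000):
    """Fit P(y = 1 | x) = sigmoid(b0 + b1 * (x - mean)) by gradient descent on
    log-loss. Centering x keeps plain gradient descent well conditioned.
    Returns a function that maps x to a predicted probability."""
    n = len(xs)
    mean = sum(xs) / n
    b0 = b1 = 0.0
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * (x - mean)) - y  # gradient of log-loss w.r.t. the logit
            g0 += err
            g1 += err * (x - mean)
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return lambda x: sigmoid(b0 + b1 * (x - mean))

# Hypothetical data: GPA and whether the FAFSA was filed by the target date.
gpa = [2.0, 2.3, 2.5, 2.8, 3.0, 3.2, 3.5, 3.8]
filed = [0, 0, 0, 1, 0, 1, 1, 1]

predict = fit_logistic(gpa, filed)
for g in (2.2, 2.9, 3.6):
    p = predict(g)
    print(f"GPA {g}: P(filed) = {p:.2f} -> predicted class {int(p >= 0.5)}")
```

The model outputs a probability, and the 0.5 cutoff that turns it into a class is a choice you can change. For example, you might lower it to flag more students for FAFSA outreach.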
22. Regression
● Predicting continuous outcomes
● Similar to our exercise, but instead of the probability, we’re looking at the average score for particular treatment groups, or the average change in one variable for a unit change in the other (the “slope”)
Example:
Predicting the number of AP classes from hours of extracurricular activity per week
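Both views of regression on this slide, group averages and a fitted slope, fit in a few lines of Python. The hours and AP-class counts below are hypothetical. The slope is computed by hand with ordinary least squares. Python 3.10+ also provides statistics.linear_regression, which returns the same slope and intercept.

```python
def group_means(groups):
    """Average outcome for each treatment group."""
    return {name: sum(vals) / len(vals) for name, vals in groups.items()}

def ols_fit(xs, ys):
    """Ordinary least squares for y = intercept + slope * x.
    The slope is the average change in y for a one-unit change in x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical data: weekly hours of extracurriculars and AP classes taken.
hours = [0, 2, 4, 6, 8]
ap_classes = [1, 2, 2, 3, 4]

print(group_means({"0-3 hrs": [1, 2], "4+ hrs": [2, 3, 4]}))
slope, intercept = ols_fit(hours, ap_classes)
print(f"AP classes ~= {intercept:.2f} + {slope:.2f} * hours")
```

On this toy data the slope is 0.35. That means each extra weekly hour goes with about 0.35 more AP classes on average. This is an association, not an effect.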
23. Multivariate Analysis
● Incorporating more than one
independent variable, still only one
response variable
● Can think of it as data in more than two
dimensions
● Think about the effect of one variable
controlling for all others
● Could be in classification setting or
regression setting
http://metabolomicsplatform.com/projects/gc-ms/
Example:
Predicting whether a student will fill out the FAFSA by a certain date based on academic performance, demographics, and survey data
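“Controlling for all others” is easiest to see with data where the right answer is known. This sketch uses the regression setting and NumPy. The data is hypothetical and deliberately constructed: applications = 1 + 0.5 × sessions + 2 × GPA, and students who attend more counseling sessions also tend to have higher GPAs. A one-predictor fit gives sessions credit for part of GPA's effect, while the two-predictor fit recovers the 0.5.

```python
import numpy as np

# Hypothetical data: counseling sessions attended, GPA, and applications submitted.
sessions = np.array([0, 1, 2, 3, 4, 5], dtype=float)
gpa = np.array([2.0, 2.5, 2.6, 3.2, 3.4, 3.9])
# Built so each extra session adds exactly 0.5 applications, holding GPA fixed.
applications = 1.0 + 0.5 * sessions + 2.0 * gpa

# One predictor: the session slope also absorbs GPA's effect, so it is biased upward.
simple_slope = np.polyfit(sessions, applications, 1)[0]

# Two predictors: design matrix with an intercept column, solved by least squares.
X = np.column_stack([np.ones_like(sessions), sessions, gpa])
coef, *_ = np.linalg.lstsq(X, applications, rcond=None)
intercept, session_effect, gpa_effect = coef

print("sessions only:", round(float(simple_slope), 3))
print("sessions, controlling for GPA:", round(float(session_effect), 3))
```

Here the one-predictor slope comes out around 1.23, more than double the true 0.5.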
24. Causal Analysis
● Accounting for the fact that example cases
aren’t always assigned to treatment in a
randomized way
● Many techniques exist and they usually require a lot of data; the simplest is Propensity Score Matching (PSM)
● 1) Build a model to estimate each subject’s probability of assignment to treatment/control (the propensity score)
● 2) Pick subjects in the treatment and control groups that had the same chance of being assigned to treatment, based on all of the other data (controls for bias)
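The two PSM steps can be sketched with one categorical covariate, GPA band. With a single discrete covariate, the treated share within each band serves as the propensity score. It stands in for the logistic regression you would usually fit when there are several covariates. Each treated student is then matched to the control with the closest score (nearest neighbor, with replacement). The data is hypothetical and built so that high-GPA students both attend the workshop and enroll more often, while the workshop itself makes no difference within a band.

```python
from collections import defaultdict

def propensity_scores(rows, covariate):
    """P(treated | covariate), estimated as the treated share within each
    covariate value. Stand-in for the logistic regression usually used in PSM."""
    counts = defaultdict(lambda: [0, 0])  # covariate value -> [treated, total]
    for r in rows:
        counts[r[covariate]][0] += r["treated"]
        counts[r[covariate]][1] += 1
    return {value: t / n for value, (t, n) in counts.items()}

def naive_difference(rows):
    """Treated mean outcome minus control mean outcome, with no adjustment."""
    t = [r["outcome"] for r in rows if r["treated"]]
    c = [r["outcome"] for r in rows if not r["treated"]]
    return sum(t) / len(t) - sum(c) / len(c)

def matched_difference(rows, covariate):
    """Average effect on the treated: pair each treated student with the control
    whose propensity score is closest (1-nearest-neighbor, with replacement)."""
    ps = propensity_scores(rows, covariate)
    controls = [r for r in rows if not r["treated"]]
    diffs = []
    for t in (r for r in rows if r["treated"]):
        match = min(controls, key=lambda c: abs(ps[c[covariate]] - ps[t[covariate]]))
        diffs.append(t["outcome"] - match["outcome"])
    return sum(diffs) / len(diffs)

# Hypothetical data: high-GPA students attend more AND enroll more,
# but within each GPA band the workshop makes no difference.
students = [
    {"gpa_band": band, "treated": treated, "outcome": enrolled}
    for band, treated, enrolled in [
        ("high", 1, 1), ("high", 1, 1), ("high", 1, 1), ("high", 0, 1),
        ("low", 1, 0), ("low", 0, 0), ("low", 0, 0), ("low", 0, 0),
    ]
]

print("naive:", naive_difference(students))
print("matched:", matched_difference(students, "gpa_band"))
```

The naive comparison suggests a 0.5 gain in enrollment. After matching on the propensity score, that gain disappears. Real PSM work adds balance checks and usually a caliper (a maximum allowed score distance).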
25. P-values
● All about quantifying how certain
you are that your finding is a real
finding and not just “random
variation” or statistical noise
● Dependent on the sample size and variability of your data
● Given that there was no true difference between groups, how likely would you be to find the two groups as different as they were in your analysis?
http://uk.cochrane.org/news/key-statistical-result-interpretation-p-value-plain-english
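The definition on this slide can be computed directly with an exact permutation test. This is one of several ways to get a p-value. Counselors will more often meet t-tests or chi-square tests, but the permutation version spells out the “no true difference” assumption. If group labels don't matter, every split of the pooled students into groups of the original sizes is equally likely. The p-value is the share of those splits that look at least as different as the real groups. The enrollment data is hypothetical. Enumerating every split is only practical for small groups.

```python
from itertools import combinations

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test for a difference in group means.
    Under 'no true difference', every split of the pooled values into groups
    of the original sizes is equally likely; the p-value is the share of splits
    whose difference is at least as large as the one observed."""
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)
    extreme = splits = 0
    for idx in combinations(range(len(pooled)), n_a):
        s_a = sum(pooled[i] for i in idx)
        diff = abs(s_a / n_a - (total - s_a) / n_b)
        extreme += diff >= observed - 1e-12  # tolerance for float round-off
        splits += 1
    return extreme / splits

workshop = [1, 1, 1, 1, 0]     # 1 = enrolled
no_workshop = [0, 0, 0, 1, 0]
print(permutation_p_value(workshop, no_workshop))
print(permutation_p_value([1, 1, 1, 1, 1], [0, 0, 0, 0, 0]))
```

With only five students per group, a 4-of-5 vs 1-of-5 split gives p ≈ 0.21. Only the most extreme 5-of-5 vs 0-of-5 split gets down to about 0.008. This is the sample-size dependence mentioned above.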
26. Machine Learning
● Figuring out how to encode knowledge
and patterns into structures we can use
● All about predictive accuracy, vs. statistics, which is more about building knowledge of the underlying patterns that we study
● Supervised vs. Unsupervised Learning
● Many different methodologies: decision trees, Bayesian learning, deep learning, clustering, expectation maximization
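Machine learning's focus on predictive accuracy usually means measuring accuracy on students the model did not see during training. Here is a supervised-learning sketch of that idea using a decision stump, a decision tree with a single split. The GPA/FAFSA training and test data are hypothetical. Full decision trees and other methods are available in libraries such as scikit-learn, for example DecisionTreeClassifier.

```python
def stump_predict(threshold, x):
    """Predict 1 (files FAFSA on time) when x is at or above the threshold."""
    return int(x >= threshold)

def accuracy(threshold, xs, ys):
    """Share of cases where the stump's prediction matches the true label."""
    return sum(stump_predict(threshold, x) == y for x, y in zip(xs, ys)) / len(xs)

def fit_stump(xs, ys):
    """Learn a one-split decision tree: try each observed value as the threshold
    and keep the one with the best training accuracy (first one wins ties)."""
    return max(sorted(set(xs)), key=lambda t: accuracy(t, xs, ys))

# Hypothetical training data (GPA, filed FAFSA on time) and a held-out test set.
train_x = [2.0, 2.3, 2.5, 2.8, 3.0, 3.2, 3.5, 3.8]
train_y = [0, 0, 0, 1, 0, 1, 1, 1]
test_x = [2.1, 2.9, 3.3, 3.6]
test_y = [0, 0, 1, 1]

threshold = fit_stump(train_x, train_y)
print("threshold:", threshold)
print("train accuracy:", accuracy(threshold, train_x, train_y))
print("test accuracy:", accuracy(threshold, test_x, test_y))
```

Accuracy on the held-out set is typically lower than on the training set. The held-out number is the honest estimate of how the model will do on next year's students.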
28. Concluding Remarks
● Data skills - more about logic, domain knowledge, and posing
good questions rather than hard technical skills
● Get 80% of the way there with conditional probability
● Use tools to automate workflow and save time
● Ask questions... of your data, your vendors, colleagues, the
internet