3. DSSG Summer Fellowship @ UChicago
● Website: https://dssg.uchicago.edu/
● Real world: https://dssg.uchicago.edu/2016/08/18/the-real-world-dssg/
● Fellows come from backgrounds ranging from computer science and machine learning,
to statistics, math, physical sciences and engineering, to social sciences,
public health and public policy.
5. DSaPP
● Website: http://dsapp.uchicago.edu/
● To me, it is more like a data-driven consulting startup
● Work on applied research projects with government and non-profit partners to
solve high-impact social problems, and create scalable, data-driven systems
for social good
● Use design and systems thinking to develop reusable, open-source software
tools and data products.
● Combine methods and tools from predictive analytics and machine learning
with rigorous social science methods to build systems that help solve
large-scale social challenges.
7. But Most Common Machine Learning Tasks...
● Regression: using trends to predict outcomes
● Clustering: finding existing groups or categories
● Classification: labeling and sorting into groups
● Dimension Reduction: finding important predictors
8. … You Actually Learned In Kindergarten
● Regression: using trends to predict outcomes
● Clustering: finding existing groups or categories
● Dimension Reduction: finding important predictors
● Classification: labeling and sorting into groups
9. Most Projects Fall in a Few Categories
• Early warning & intervention
• Efficient resource allocation & targeted action
• Effective advocacy & fundraising
• Data-driven policy recommendation & evaluation
10. “Predictive analytics is emerging as a game-changer. Instead of looking
backward to analyze ‘what happened?’ predictive analytics help executives
answer ‘What’s next?’ and ‘What should we do about it?’”
Forbes Magazine, “Why Predictive Analytics Is A Game-Changer”
18. Social Good
● High impact social problems
○ Public Health
○ Education
○ Public Safety
○ Economic Development
○ Criminal Justice
○ Environment
○ ...
19. Data Science for Social Good
● The problem is important and has social impact.
● Data can play a role in solving the problem
● The organization has the right data
● The organization is ready to tackle this problem and take actions based on the work
23. Redirecting People with Complex Conditions to Effective Care
● Jail, Court, Probation
● EMS
● Mental Health Center
● Johnson County Services
24. Goals
● Start with a very vague and abstract goal
● Most organizations haven’t explicitly defined analytical goals for many of the
problems they’re tackling.
● The objective here is to take the outcome we’re trying to achieve and turn it
into a goal that is measurable and can be optimized
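One way a vague goal can become measurable is to score how well a ranked list of people matches who actually needed help. The sketch below uses precision at the top k as an illustrative metric; the metric choice, the function name `precision_at_k`, and the toy data are assumptions made for the example, not something stated in the slides.

```python
def precision_at_k(scores, outcomes, k):
    """Fraction of the k highest-scored people who had the outcome.

    scores:   dict of person id -> model score (higher = higher priority)
    outcomes: dict of person id -> 1 if the outcome occurred, else 0
    Both dicts and the ids in them are hypothetical illustration data.
    """
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of people")
    # Rank ids by score, highest first, and keep the top k.
    top_k = sorted(scores, key=scores.get, reverse=True)[:k]
    return sum(outcomes[person] for person in top_k) / k


scores = {"a": 0.9, "b": 0.7, "c": 0.4, "d": 0.2}
outcomes = {"a": 1, "b": 0, "c": 1, "d": 0}
top2 = precision_at_k(scores, outcomes, k=2)
```

Tying k to the number of people an organization can actually reach keeps the metric connected to the action it is meant to inform.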
26. Actions
● The work we do can typically only have impact if it’s actionable.
● These actions often need to be fairly concrete
○ home inspections
○ enrolling a student in one of three after school programs
○ targeted emails for fundraising or advocacy
○ dispatching an emergency vehicle
○ scheduling a waste pickup
● A well-scoped project ideally has a set of actions that the organization is
already taking and that can now be better informed using data science.
● Sometimes the project ends up creating a new set of actions as well
28. Data
● What data do you have and what data do you need?
● Matching the data to the action
● External and/or Public Data
29. Feature groups: Demographics, Counts of Interactions, Interaction Context, Timeline
● Standard deviation of time between public system interactions
● Had two bookings within a year
● Age at earliest interaction with a public system
● Age group at last interaction with a public service
● Number of bookings in last year
● Number of mental health entries in the last year
● Total number of bookings
● Number of therapists seen
● Number of mental health services used
● Type of therapy
● Average bail amount
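A few of the features on this slide could be computed roughly as follows. This is a minimal sketch: the input shape (a list of booking dates for one person), the function name `interaction_features`, the 365-day window, and the use of the population standard deviation are all assumptions made for illustration.

```python
from datetime import date, timedelta
from statistics import pstdev


def interaction_features(bookings, as_of):
    """Compute simple count and timeline features for one person.

    bookings: list of datetime.date booking dates (hypothetical input shape)
    as_of:    date on which the features are computed; later events are ignored
    """
    bookings = sorted(d for d in bookings if d <= as_of)
    one_year_ago = as_of - timedelta(days=365)
    # Days between consecutive bookings.
    gaps = [(later - earlier).days for earlier, later in zip(bookings, bookings[1:])]
    return {
        "total_bookings": len(bookings),
        "bookings_last_year": sum(1 for d in bookings if d > one_year_ago),
        "two_bookings_within_a_year": any(gap <= 365 for gap in gaps),
        # Needs at least two gaps to say anything about spread.
        "std_days_between_bookings": pstdev(gaps) if len(gaps) >= 2 else None,
    }


example = [date(2014, 3, 1), date(2014, 9, 1), date(2016, 1, 15)]
features = interaction_features(example, as_of=date(2016, 6, 30))
```

Filtering on `as_of` keeps information from after the prediction date out of the features.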
30. Analysis
● Description:
○ Primarily focused on understanding events and behaviors that have happened in the past.
○ Methods used to do description are sometimes called unsupervised learning methods and
include methods for clustering.
● Detection:
○ Less focused on the past and more focused on ongoing events.
○ Detection tasks often involve detecting events and anomalies that are currently happening.
● Prediction:
○ Focused on the future and predicting future behaviors and events.
● Behavior Change:
○ Focused on causing change in behaviors of people, organizations, neighborhoods.
○ Typically uses methods from causal inference and behavioral economics.
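As a small illustration of the detection category, the sketch below flags a new observation that sits far from recent history using a z-score. The z-score rule, the threshold of 3, and the idea of monitoring a daily count are assumptions chosen for the example; the slides do not name a specific detection method.

```python
from statistics import mean, stdev


def is_anomalous(history, latest, threshold=3.0):
    """Return True if `latest` is more than `threshold` sample standard
    deviations away from the mean of `history` (a hypothetical list of
    recent daily counts with at least two values)."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat history: any change at all is unusual.
        return latest != mu
    return abs(latest - mu) / sigma > threshold


daily_counts = [10, 12, 11, 9, 10, 11, 12, 10]
spike = is_anomalous(daily_counts, latest=25)
normal = is_anomalous(daily_counts, latest=11)
```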
31. Data Source → Aggregation → Prediction → Risk Score
● Machine Learning
● Risk score for next year
● Timeline: 2010, 2012, 2014, 2016
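The aggregation step in this pipeline can be sketched as turning each person's event history into one training example per cutoff year: features come only from events before the cutoff, and the label describes the following year. The event format, the function name `build_example`, and the choice of "booked in the next year" as the label are assumptions made for illustration; the slides only say that the output is a risk score for next year.

```python
from datetime import date


def build_example(events, cutoff_year):
    """Build (features, label) for one person at January 1 of cutoff_year.

    events: list of (date, kind) tuples, where kind is "booking" or
            "mental_health" (hypothetical event format)
    """
    cutoff = date(cutoff_year, 1, 1)
    next_cutoff = date(cutoff_year + 1, 1, 1)
    past = [(d, kind) for d, kind in events if d < cutoff]
    following_year = [(d, kind) for d, kind in events if cutoff <= d < next_cutoff]
    features = {
        "total_bookings": sum(1 for _, kind in past if kind == "booking"),
        "mental_health_entries": sum(1 for _, kind in past if kind == "mental_health"),
    }
    # Assumed label: 1 if the person was booked during the following year.
    label = int(any(kind == "booking" for _, kind in following_year))
    return features, label


events = [
    (date(2013, 5, 2), "booking"),
    (date(2014, 11, 20), "mental_health"),
    (date(2015, 3, 9), "booking"),
]
features_2015, label_2015 = build_example(events, cutoff_year=2015)
features_2016, label_2016 = build_example(events, cutoff_year=2016)
```

Examples built at earlier cutoffs can be used to fit a model, and examples at a later cutoff to check how well its risk scores hold up on data it has not seen.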
34. 102 individuals
● 19 years total jail time
● $250,000 absolute minimum cost
● 2 years since last mental health contact
● Names listed: John Doe, Jane Smith, James Williams, Mary Johnson, Robert Jones, Michael Davis, Linda Miller, Elizabeth Martinez, William Garcia, Maria Brown, David Moore
● Scores listed: 6.94, 5.79, 5.75, 6.17, 5.02, 4.72, 4.28, 3.85, 3.64, 3.51, 4.49