DSSG Summer Fellowship @ UChicago
● Website: https://dssg.uchicago.edu/
● Real world: https://dssg.uchicago.edu/2016/08/18/the-real-world-dssg/
● Fellows come from fields ranging from computer science and machine learning, to statistics, math, physical sciences and engineering, to social sciences, public health and public policy.
● Website: http://dsapp.uchicago.edu/
● To me, it is more like a data-driven consulting startup.
● Works on applied research projects with government and non-profit partners to solve high-impact social problems and create scalable, data-driven systems for social good.
● Uses design and systems thinking to develop reusable, open-source software
tools and data products.
● Combines methods and tools from predictive analytics and machine learning with rigorous social science methods to build systems that help solve large-scale social challenges.
But Most Common Machine Learning Tasks...
… You Actually Learned In Kindergarten
● Using trends to …
● Sorting into groups or categories
Most Projects Fall in a Few Categories
• Early warning & intervention
• Efficient resource allocation & targeted action
• Effective advocacy & fundraising
• Data-driven policy recommendation & evaluation
“Predictive analytics is emerging as a game-changer. Instead of looking backward to analyze ‘what happened?’ predictive analytics help executives answer ‘What’s next?’ and ‘What should we do about it?’”
Why Predictive Analytics Is A Game-Changer
● High-impact social problems
○ Public Health
○ Public Safety
○ Economic Development
○ Criminal Justice
Data Science for Social Good
● The problem is important and has social impact.
● Data can play a role in solving the problem.
● The organization has the right data.
● The organization is ready to tackle this problem and take actions based on the work.
Redirecting People with Complex Conditions to
● Projects usually start with a vague, abstract goal.
● Most organizations haven’t explicitly defined analytical goals for many of the
problems they’re tackling.
● The objective here is to take the outcome we’re trying to achieve and turn it
into a goal that is measurable and can be optimized
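As a minimal sketch of turning an outcome into something measurable, suppose (hypothetically) the outcome is "keep people out of jail" and the label is whether a person has another booking within a fixed horizon after a prediction ("as-of") date. The function name, the 365-day horizon, and the data layout are illustrative assumptions, not the project's actual definition.

```python
from datetime import date, timedelta

def rebooking_label(booking_dates, as_of, horizon_days=365):
    """Return 1 if any booking falls in (as_of, as_of + horizon_days], else 0.

    Hypothetical outcome definition: "booked again within the horizon".
    """
    window_end = as_of + timedelta(days=horizon_days)
    # Bookings on or before as_of are history, not outcome.
    return int(any(as_of < d <= window_end for d in booking_dates))

bookings = [date(2014, 3, 1), date(2015, 1, 20)]
print(rebooking_label(bookings, as_of=date(2014, 6, 1)))  # 1
print(rebooking_label(bookings, as_of=date(2015, 6, 1)))  # 0
```

Once the label is a 0/1 value per person and date, it can be predicted and evaluated, which is what makes the goal optimizable.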
● The work we do can typically only have impact if it’s actionable.
● These actions often need to be fairly concrete
○ home inspections
○ enrolling a student in one of three after school programs
○ targeted emails for fundraising or advocacy
○ dispatching an emergency vehicle
○ scheduling a waste pickup
● A well-scoped project ideally has a set of actions the organization is already taking that can now be better informed by data science.
● Sometimes the project ends up creating a new set of actions as well.
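Actions like home inspections usually come with a fixed capacity, so a risk model feeds the action by producing a ranked list of exactly that many cases. A minimal sketch, assuming a hypothetical `capacity` (e.g., inspections per week) and a risk score already computed for each case:

```python
def action_list(risk_scores, capacity):
    """Return the IDs of the `capacity` highest-risk cases.

    risk_scores: dict mapping a hypothetical case ID to a model score.
    Ties are broken by ID so the list is reproducible.
    """
    ranked = sorted(risk_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [case_id for case_id, _ in ranked[:capacity]]

scores = {"p1": 0.91, "p2": 0.15, "p3": 0.67, "p4": 0.91}
print(action_list(scores, capacity=2))  # ['p1', 'p4']
```

Deterministic tie-breaking matters in practice: staff who receive the list should get the same cases for the same inputs.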
● What data do you have and what data do you need?
● Matching the data to the action
● External and/or Public Data
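Matching data to the action often starts by combining records from different agencies into one profile per person. The sketch below assumes both hypothetical sources (jail bookings and mental health entries) already share a clean `person_id`; real cross-agency linkage frequently needs fuzzy matching on names and birth dates instead.

```python
from collections import defaultdict

def link_records(bookings, mental_health_entries):
    """Group (person_id, date) records from two sources into one profile per person.

    Assumes a shared, clean person_id across sources (hypothetical).
    """
    profiles = defaultdict(lambda: {"bookings": [], "mental_health": []})
    for person_id, booking_date in bookings:
        profiles[person_id]["bookings"].append(booking_date)
    for person_id, entry_date in mental_health_entries:
        profiles[person_id]["mental_health"].append(entry_date)
    return dict(profiles)

bookings = [("A", "2015-02-01"), ("B", "2015-03-10"), ("A", "2015-09-12")]
mental_health = [("A", "2014-11-30"), ("C", "2015-01-05")]
profiles = link_records(bookings, mental_health)
print(profiles["A"])
# {'bookings': ['2015-02-01', '2015-09-12'], 'mental_health': ['2014-11-30']}

# People who appear in both systems.
in_both = sorted(pid for pid, p in profiles.items() if p["bookings"] and p["mental_health"])
print(in_both)  # ['A']
```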
● Standard deviation of time between public …
● Had two bookings within a year
● Age at earliest interaction with a public …
● Age group at last interaction with a public …
● Number of bookings in last year
● Number of mental health entries in the last …
● Total number of bookings
● Number of therapists seen
● Number of mental health services used
● Type of therapy
● Average bail amount
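Features like those listed above are computed from raw event dates relative to an as-of date, using only events before that date so the model never sees the future. The sketch below computes a few booking-based ones. It uses bookings in place of the truncated "interaction with a public …" wording, and every name in it is hypothetical.

```python
from datetime import date, timedelta
from statistics import pstdev

def booking_features(booking_dates, birth_date, as_of):
    """Compute example features from booking dates strictly before as_of."""
    past = sorted(d for d in booking_dates if d < as_of)
    # Days between consecutive bookings.
    gaps = [(later - earlier).days for earlier, later in zip(past, past[1:])]
    one_year_ago = as_of - timedelta(days=365)

    def age_on(day):
        # Whole years between birth_date and day.
        before_birthday = (day.month, day.day) < (birth_date.month, birth_date.day)
        return day.year - birth_date.year - before_birthday

    return {
        "total_bookings": len(past),
        "bookings_last_year": sum(1 for d in past if d >= one_year_ago),
        "two_bookings_within_a_year": any(g <= 365 for g in gaps),
        # Population standard deviation; 0.0 when there is at most one gap.
        "std_days_between_bookings": pstdev(gaps) if gaps else 0.0,
        "age_at_first_booking": age_on(past[0]) if past else None,
    }

features = booking_features(
    [date(2012, 5, 1), date(2013, 2, 1), date(2015, 8, 1), date(2016, 3, 1)],
    birth_date=date(1990, 6, 15),
    as_of=date(2016, 1, 1),
)
print(features)
```

The 2016 booking is ignored because it falls after the as-of date.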
● Description:
○ Primarily focused on understanding events and behaviors that have happened in the past.
○ Methods used for description are sometimes called unsupervised learning methods and include clustering.
● Detection:
○ Less focused on the past and more focused on ongoing events.
○ Detection tasks often involve detecting events and anomalies that are currently happening.
● Prediction:
○ Focused on the future and on predicting future behaviors and events.
● Behavior Change:
○ Focused on causing change in behaviors of people, organizations, neighborhoods.
○ Typically uses methods from causal inference and behavioral economics.
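Because prediction is about the future, evaluation should mimic deployment: train on earlier periods and test on a later one. A minimal rolling-origin split, assuming yearly as-of dates and a hypothetical 365-day label horizon; a training date is kept only if its label window has closed by the test date, so no training label peeks into the test period. The years are purely illustrative.

```python
from datetime import date, timedelta

def temporal_splits(as_of_dates, horizon_days=365):
    """Yield (train_dates, test_date) pairs for rolling-origin evaluation."""
    dates = sorted(as_of_dates)
    for test_date in dates[1:]:
        # Keep training dates whose label window (d, d + horizon] ends by test_date.
        train = [d for d in dates if d + timedelta(days=horizon_days) <= test_date]
        if train:
            yield train, test_date

dates = [date(2010, 1, 1), date(2012, 1, 1), date(2014, 1, 1), date(2016, 1, 1)]
for train, test in temporal_splits(dates):
    print([d.year for d in train], "->", test.year)
# [2010] -> 2012
# [2010, 2012] -> 2014
# [2010, 2012, 2014] -> 2016
```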
[Figure: Data Source → Aggregation → Prediction → Risk Score; risk score for next …; timeline 2010, 2012, 2014, 2016]
● Rank List: top 200 people
● Precision: 52% ~ 102 people
● 19 years total jail time
● $250,000 absolute minimum cost
● 2 years since last mental health …
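Precision at a fixed list size is the share of people on the rank list who actually had the outcome. A minimal sketch, assuming parallel lists of scores and 0/1 labels observed after the prediction date; the function name and toy data are illustrative.

```python
def precision_at_k(scores, labels, k):
    """Return the fraction of the k highest-scoring entries whose label is 1."""
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and len(scores)")
    # Indices sorted by descending score; sorted() is stable, so ties keep input order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = sum(labels[i] for i in ranked[:k])
    return hits / k

scores = [0.9, 0.8, 0.3, 0.7, 0.1]
labels = [1, 0, 0, 1, 1]
print(precision_at_k(scores, labels, k=3))  # 0.6666666666666666
```

Evaluating at k equal to the real capacity (such as the top 200 here) ties the metric directly to the action the organization can take.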