
Data Science for Social Good



Eddie Lin, who took part in the University of Chicago's Data Science for Social Good Fellowship (DSSG) in the US, shares his experience.

Published in: Government & Nonprofit


  1. Data Science for Social Good ● Feb. 9, 2017 @ Taipei ● Eddie Lin
  2. Introduction ● 林子耘 Eddie ● 0.33 Musician + 0.33 Engineer + 0.33 Scientist ● Website: ● GitHub: ● SoundCloud: ● Took part in DSSG 2016; now working at DSaPP
  3. DSSG Summer Fellowship @ UChicago ● Website: ● Real world: ● From computer science and machine learning, to statistics, math, physical sciences and engineering, to social sciences, public health and public policy.
  4. DSaPP ● Website: ● To me, it is more like a data-driven consulting startup ● Works on applied research projects with government and non-profit partners to solve high-impact social problems, and creates scalable, data-driven systems for social good ● Uses design and systems thinking to develop reusable, open-source software tools and data products ● Combines methods and tools from predictive analytics and machine learning with rigorous social science methods to build systems that help solve large-scale social challenges
  5. But Most Common Machine Learning Tasks... ● Regression: using trends to predict outcomes ● Clustering: finding existing groups or categories ● Classification: labeling and sorting into groups ● Dimensionality Reduction: finding important predictors
  6. … You Actually Learned In Kindergarten ● Regression: using trends to predict outcomes ● Clustering: finding existing groups or categories ● Dimension Reduction: finding important predictors ● Classification: labeling and sorting into groups
  7. Most Projects Fall in a Few Categories ● Early warning & intervention ● Efficient resource allocation & targeted action ● Effective advocacy & fundraising ● Data-driven policy recommendation & evaluation
  8. “Predictive analytics is emerging as a game-changer. Instead of looking backward to analyze ‘what happened?’, predictive analytics help executives answer ‘What’s next?’ and ‘What should we do about it?’” (Forbes Magazine, “Why Predictive Analytics Is A Game-Changer”)
  9. The Battle of Venn Diagrams
  10. Social Good ● High-impact social problems ○ Public Health ○ Education ○ Public Safety ○ Economic Development ○ Criminal Justice ○ Environment ○ ...
  11. Data Science for Social Good ● The problem is important and has social impact ● Data can play a role in solving the problem ● The organization has the right data ● The organization is ready to tackle this problem and take action based on the work
  12. DSSG ● Open Source ○ Git & GitHub ● Databases ○ SQL (e.g. PostgreSQL), NoSQL ● Programming ○ Python! R? ○ Web? D3.js? React.js? ○ Hadoop? Spark? ● Analysis ○ scikit-learn, pandas, gensim, … ● Pipeline ○ Drake ○ Airflow ○ Luigi, …
  13. Project Scoping: Problem Formulation
  14. Redirecting People with Complex Conditions to Effective Care (diagram: Jail, Court, Probation; EMS; Mental Health Center; Johnson County Services)
  15. Goals ● Start with a very vague and abstract goal ● Most organizations haven’t explicitly defined analytical goals for many of the problems they’re tackling ● The objective here is to take the outcome we’re trying to achieve and turn it into a goal that is measurable and can be optimized
  16. (figure: 575,000 people; 127,000 County Services)
  17. Actions ● The work we do can typically only have impact if it’s actionable ● These actions often need to be fairly concrete, e.g. ○ home inspections ○ enrolling a student in one of three after-school programs ○ targeted emails for fundraising or advocacy ○ dispatching an emergency vehicle ○ scheduling a waste pickup ● A well-scoped project ideally has a set of actions that the organization is already taking and that can now be better informed using data science ● Sometimes we end up creating a new set of actions as well
  18. (figure: EMTs, paramedics, police, probation officers, mental health caseworkers)
  19. Data ● What data do you have, and what data do you need? ● Matching the data to the action ● External and/or public data
  20. (figure: feature groups Demographics, Counts of Interactions, Interaction Context, Timeline) ● Standard deviation of time between public system interactions ● Had two bookings within a year ● Age at earliest interaction with a public system ● Age group at last interaction with a public service ● Number of bookings in the last year ● Number of mental health entries in the last year ● Total number of bookings ● Number of therapists seen ● Number of mental health services used ● Type of therapy ● Average bail amount
  21. Analysis ● Description: ○ Primarily focused on understanding events and behaviors that have happened in the past. ○ Methods used for description are sometimes called unsupervised learning methods and include clustering. ● Detection: ○ Less focused on the past and more focused on ongoing events. ○ Detection tasks often involve detecting events and anomalies that are currently happening. ● Prediction: ○ Focused on the future and on predicting future behaviors and events. ● Behavior Change: ○ Focused on causing changes in the behavior of people, organizations, and neighborhoods. ○ Typically uses methods from causal inference and behavioral economics.
  22. (diagram: Data Source, Aggregation, Machine Learning, Prediction, Risk Score: risk score for next year; timeline 2010, 2012, 2014, 2016)
  23. (figure: risk scores 6.94, 5.79, 5.75, 6.87, 5.74, 4.72, 4.69, 3.65, 3.64, 3.61, 4.72)
  24. (figure: risk scores 6.94, 5.79, 5.75, 6.87, 5.74, 4.72, 4.69, 3.65, 3.64, 3.61, 4.72) ● Rank List: top 200 people ● Precision: 52% (~102 people)
  25. 102 individuals ● 19 years total jail time ● $250,000 absolute minimum cost ● 2 years since last mental health contact ● (figure: example names John Doe, Jane Smith, James Williams, Mary Johnson, Robert Jones, Michael Davis, Linda Miller, Elizabeth Martinez, William Garcia, Maria Brown, David Moore with risk scores 6.94, 5.79, 5.75, 6.17, 5.02, 4.72, 4.28, 3.85, 3.64, 3.51, 4.49)
  26. Workflow
  27. What about Taiwan?
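Slides 5 and 6 list the most common machine learning tasks. As a minimal sketch of two of them, assuming a single numeric input per person and using only Python's standard library, regression fits a trend line and classification assigns the label whose group average is closest. The helper names `fit_line` and `nearest_centroid` are hypothetical and not taken from the talk; nearest centroid is just one simple classifier chosen for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Regression: ordinary least squares fit of y = a + b * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    a = y_bar - b * x_bar
    return a, b


def nearest_centroid(train, point):
    """Classification: pick the label whose mean training value is closest to `point`."""
    centroids = {label: mean(values) for label, values in train.items()}
    return min(centroids, key=lambda label: abs(centroids[label] - point))


a, b = fit_line([1, 2, 3, 4], [2, 4, 6, 8])
print(a, b)  # 0.0 2.0
print(nearest_centroid({"low": [1, 2, 3], "high": [10, 11, 12]}, 9))  # high
```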
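Slide 12 names pipeline tools such as Drake, Airflow, and Luigi. What they share is running steps in an order that respects their dependencies. The sketch below does not use any of those tools; it only illustrates that idea with Python's standard `graphlib` module (Python 3.9+), and the step names in `PIPELINE` are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each step maps to the set of steps it depends on.
PIPELINE = {
    "load_raw_data": set(),
    "clean_data": {"load_raw_data"},
    "build_features": {"clean_data"},
    "train_model": {"build_features"},
    "score_people": {"train_model", "build_features"},
}


def run_order(dag):
    """Return an execution order in which every step comes after all of its dependencies."""
    return list(TopologicalSorter(dag).static_order())


for step in run_order(PIPELINE):
    print("running", step)
```

The real tools add scheduling, retries, and caching of intermediate outputs on top of this ordering.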
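Slide 20 lists features built from each person's history of interactions with public systems. A minimal sketch of how a few of them could be computed, assuming a hypothetical record format of `(date, kind)` events where `kind` is `"booking"` or `"mental_health"`, and treating "last year" as the 365 days before an `as_of` date. The function name `build_features` and the feature keys are illustrative, not the project's actual schema.

```python
from datetime import date, timedelta
from statistics import pstdev


def build_features(birth_date, events, as_of):
    """Compute a few example features from (date, kind) events up to `as_of`."""
    events = sorted(e for e in events if e[0] <= as_of)  # ignore anything after as_of
    one_year_ago = as_of - timedelta(days=365)
    bookings = [d for d, kind in events if kind == "booking"]
    # Days between consecutive interactions of any kind.
    gaps = [(later[0] - earlier[0]).days for earlier, later in zip(events, events[1:])]
    first = events[0][0] if events else None

    def age_on(day):
        # Whole years between birth_date and day.
        return day.year - birth_date.year - (
            (day.month, day.day) < (birth_date.month, birth_date.day)
        )

    return {
        "total_bookings": len(bookings),
        "bookings_last_year": sum(d > one_year_ago for d in bookings),
        "mental_health_entries_last_year": sum(
            kind == "mental_health" and d > one_year_ago for d, kind in events
        ),
        "had_two_bookings_within_a_year": any(
            (b - a).days <= 365 for a, b in zip(bookings, bookings[1:])
        ),
        # Population standard deviation of the gaps (0.0 if there are none).
        "std_days_between_interactions": pstdev(gaps) if gaps else 0.0,
        "age_at_earliest_interaction": age_on(first) if first else None,
    }


events = [
    (date(2014, 3, 1), "booking"),
    (date(2015, 1, 10), "mental_health"),
    (date(2015, 6, 1), "booking"),
    (date(2016, 2, 1), "booking"),
]
print(build_features(date(1980, 6, 15), events, as_of=date(2016, 6, 30)))
```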
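Slide 22 shows data from several sources aggregated over a 2010 to 2016 timeline and fed to a model that produces a risk score for the next year. One common way to evaluate such a model, sketched here as an assumption rather than the team's documented setup, is to split by time. Features come from history up to a cutoff date, the label records whether the outcome happens in the following 365 days, and the model is tested at a later cutoff. The helpers and the "any booking" outcome below are hypothetical.

```python
from datetime import date, timedelta


def label_next_year(events, cutoff):
    """1 if there is a booking in the 365 days after `cutoff`, else 0 (hypothetical outcome)."""
    end = cutoff + timedelta(days=365)
    return int(any(cutoff < d <= end for d, kind in events if kind == "booking"))


def temporal_splits(years):
    """Pair consecutive January 1 cutoffs as (train_cutoff, test_cutoff)."""
    cutoffs = [date(year, 1, 1) for year in years]
    return list(zip(cutoffs, cutoffs[1:]))


events = [(date(2015, 3, 1), "booking"), (date(2016, 8, 1), "mental_health")]
print(temporal_splits([2014, 2015, 2016]))
print(label_next_year(events, date(2015, 1, 1)))  # 1
print(label_next_year(events, date(2016, 1, 1)))  # 0
```

Because each training label window ends on or before the next cutoff, the model is never trained on outcomes from the period it is evaluated on.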
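Slide 24 turns the risk scores into a rank list of the top 200 people and reports its precision. Assuming precision here means the share of listed people who actually went on to have the predicted outcome (precision at k), a minimal sketch follows. `precision_at_k` is a hypothetical helper; the scores reuse values from slide 23, and the 0/1 outcomes are invented purely for illustration.

```python
def precision_at_k(scores, outcomes, k):
    """Share of the k highest-scoring people whose outcome is 1 (assumes k >= 1)."""
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    top = ranked[:k]
    return sum(outcome for _, outcome in top) / len(top)


# Toy example: risk scores from the slide with hypothetical 0/1 outcomes.
scores = [6.94, 5.79, 5.75, 6.87, 5.74, 4.72, 4.69, 3.65, 3.64, 3.61]
outcomes = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
print(precision_at_k(scores, outcomes, k=4))  # 0.75
```

Ranking and then measuring precision only at the top of the list matches how such a list is used: the partner can act on a fixed number of people, so accuracy below the cutoff matters less.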