
Thoughts on Big Data and more for the WA State Legislature


Brief remarks on big data trends and responsible data science at the Workshop on Science and Technology for Washington State: Advising the Legislature, October 4th 2017 in Seattle.

Published in: Data & Analytics

  1. 1. A few thoughts on big data, human services, and legislation Bill Howe, PhD Associate Professor, Information School Director, Urbanalytics Group Adjunct Associate Professor, Computer Science & Engineering University of Washington
  2. 2. Data Science for Social Good • Quarter-long, on-site projects, engagement two days per week – Simple two-page proposals – 4-6 concurrent teams: Network effects among cohort beyond 1:1 – Each team is ~50% project lead + ~50% eScience FTE • Capstone and course projects • Commissioned Research projects Submit project ideas at
  3. 3. Predictors of Permanent Housing for Homeless Families. Project Leads: Neil Roche & Anjana Sundaram, Gates Foundation. DSSG Fellows: Joan Wang, Jason Portenoy, Fabliha Ibnat, Chris Suberlak. ALVA High School Students: Cameron Holt, Xilalit Sanchez. eScience Data Scientist Mentors: Ariel Rokem, Bryna Hazelton. When homeless families engage in services and programs, what factors are most likely to lead to a successful exit? The DSSG team • developed algorithms to identify ‘families’ and to identify ‘episodes’ of homelessness, including back-to-back or overlapping enrollments in individual programs • devised innovative ways to visualize and analyze the ways families transition between programs. The Gates Foundation, together with Building Changes, has partnered with King, Pierce, and Snohomish counties to make homelessness in these counties rare, brief, and one-time.
  4. 4. Homeless families may take many pathways through programs Emergency shelter Transitional housing Rapid re-housing Permanent housing Housing with services Unsuccessful exit
  5. 5. Relatively simple visualizations…
  6. 6. Preliminary results to understand potential predictors of successful outcomes Correlation with successful outcome, by family characteristics Correlation with successful outcome, by homelessness program Emergency Shelter use tends to be associated with unsuccessful outcomes (unsurprising!) Homelessness Prevention programs more strongly associated with positive outcomes than transitional housing Substance abuse strongly associated with unsuccessful outcomes Parent employment strongest predictor of successful outcomes
  7. 7. Common trajectories lead to different outcomes: • a successful exit from an episode would mean that the family found a permanent housing solution • a proportion of these still receive government subsidies • other exits are exits back into homelessness, or to other, unknown destinations Novel Analyses of Family Trajectories through Programs An example using Pierce County data
  8. 8. How much time do you spend “handling data” as opposed to “doing science”? Mode answer: “90%” 10/6/2017 Bill Howe, UW
  9. 9. My research for 10 years: Making it easier to work with large, noisy, heterogeneous datasets • SQLShare: Easier to use databases • Myria: Easier to use scalable systems • Worked great in the physical sciences • But social, health, and civic colleagues have stricter requirements…
  10. 10. Observation: Epistemic issues are beginning to dominate the big data / data science discussion in every field: reproducibility, algorithmic bias, curation, fairness, accountability, transparency, provenance, explanations, persuasion
  11. 11. ProPublica, May 2016
  12. 12. “Should I be afraid of risk assessment tools?” “No, you gotta tell me a lot more about yourself. At what age were you first arrested? What is the date of your most recent crime?” “…and what was the culture of policing in the neighborhood in which I grew up in?”, September 2016 “Philadelphia is grappling with the prospect of a racist computer algorithm”
  13. 13. The way I think about this… (1) First decade of Data Science research and practice: What can we do with massive, noisy, heterogeneous datasets? Next decade of Data Science research and practice: What should we do with massive, noisy, heterogeneous datasets?
  14. 14. The way I think about this… (2) Decisions are based on two sources of information: 1. Past examples, e.g., “prior arrests tend to increase likelihood of future arrests” 2. Societal constraints, e.g., “we must avoid racial discrimination” We’ve become very good at automating the use of past examples. We’ve only just started to think about incorporating societal constraints. 10/6/2017 Data, Responsibly / SciTech NW
  15. 15. The way I think about this… (3) How do we apply societal constraints to algorithmic decision-making? Option 1: Rely on human oversight. Ex: EU General Data Protection Regulation requires that a human be involved in legally binding algorithmic decision-making. Ex: Wisconsin Supreme Court says a human must review algorithmic decisions made by recidivism models. Issues with scalability, prejudice. Option 2: Build systems to help enforce these constraints. This is the approach we are exploring.
  16. 16. 10/6/2017 Bill Howe, UW 16
  17. 17. Closing thoughts…. • WA State has an opportunity to play a leadership role in legislation around algorithmic bias, fairness, accountability, and transparency • We have the private and public tech expertise, the community engagement, and the political will to address this issue directly. • If we let the technology guide the policy, we’re in trouble.
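
Slide 3 mentions algorithms that identify ‘episodes’ of homelessness from back-to-back or overlapping program enrollments. The slides do not show how the DSSG team did this, so the following is only a minimal sketch of one common way to do it (interval merging). The function name `merge_episodes`, the `max_gap_days` parameter, and the sample dates are all hypothetical and chosen for illustration.

```python
from datetime import date, timedelta


def merge_episodes(enrollments, max_gap_days=1):
    """Group (entry, exit) enrollment date pairs into episodes.

    Enrollments that overlap, or that start within max_gap_days of the
    previous exit (back-to-back), are folded into the same episode.
    max_gap_days is an assumed tuning knob, not a value from the slides.
    """
    episodes = []
    for entry, exit_ in sorted(enrollments):
        if episodes and entry <= episodes[-1][1] + timedelta(days=max_gap_days):
            # Overlapping or back-to-back: extend the current episode.
            start, end = episodes[-1]
            episodes[-1] = (start, max(end, exit_))
        else:
            # Gap is too large: this enrollment starts a new episode.
            episodes.append((entry, exit_))
    return episodes


# Made-up enrollments for one family (not real data).
enrollments = [
    (date(2015, 1, 1), date(2015, 2, 1)),   # emergency shelter
    (date(2015, 1, 20), date(2015, 4, 1)),  # overlaps the shelter stay
    (date(2015, 4, 2), date(2015, 6, 1)),   # starts the day after the previous exit
    (date(2016, 3, 1), date(2016, 5, 1)),   # months later: a separate episode
]
episodes = merge_episodes(enrollments)
for start, end in episodes:
    print(start, "to", end)
```

With these sample dates the first three enrollments collapse into one episode and the last stands alone. A real pipeline would also need to decide how large a gap still counts as "back-to-back", which is a policy choice rather than a technical one.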
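
Slides 14 and 15 contrast learning from past examples with enforcing societal constraints, and mention building systems that help enforce those constraints. The slides do not describe such a system, so what follows is only an illustrative sketch of one simple check, the gap in flag rates between groups (often called demographic parity). It is one of several competing fairness definitions, not the author's method. The group labels, the sample decisions, and the 0.1 threshold are assumptions.

```python
from collections import defaultdict


def selection_rates(decisions):
    """Return the fraction of flagged cases per group.

    decisions is an iterable of (group, flagged) pairs, where flagged is
    True when the model marked the person as high risk.
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for group, is_flagged in decisions:
        totals[group] += 1
        flagged[group] += int(is_flagged)
    return {group: flagged[group] / totals[group] for group in totals}


def parity_gap(decisions):
    """Largest difference in flag rate between any two groups."""
    rates = selection_rates(decisions)
    return max(rates.values()) - min(rates.values())


# Made-up model outputs for two hypothetical groups.
decisions = [
    ("A", True), ("A", False), ("A", False), ("A", False),
    ("B", True), ("B", True), ("B", False), ("B", False),
]
rates = selection_rates(decisions)
gap = parity_gap(decisions)

MAX_GAP = 0.1  # assumed policy threshold, not taken from the slides
needs_review = gap > MAX_GAP
print(rates, gap, needs_review)
```

A check like this only reports a disparity. Deciding which fairness definition applies and what gap is acceptable is exactly the kind of societal constraint the slides argue should be set by policy rather than by the technology.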