Data, Responsibly:
The Next Decade of Data Science
Bill Howe, PhD
Associate Professor, Information School
Director, Cascadia Urban Analytics Cooperative
Adjunct Associate Professor, Computer Science & Engineering
University of Washington
My goals this afternoon…
• Describe “data science” from my perspective
• Describe some concerns that have recently emerged around the
irresponsible use of data science techniques and technologies
• Show off some of the work we’re doing to address it
DataLab
Bill Howe
Databases, data
management
Jessica Hullman
Visualization, HCI
Carole Palmer
Open data, digital
curation
Nic Weber
Open data, civic tech
Jevin West
Science of science,
bibliometrics
… “calling bullshit”
Emma Spiro
Social network
analysis
The Fourth Paradigm
1. Empirical + experimental
2. Theoretical
3. Computational
4. Data-Intensive
Jim Gray
1/10/2018 Bill Howe, UW 4
Nearly every field of discovery is transitioning from
“data poor” to “data rich”
Astronomy: LSST
Physics: LHC
Oceanography: OOI
Social Sciences
Biology: Sequencing
Economics
Neuroscience: EEG, fMRI
My view:
Data science is about answering questions
using large, noisy, and heterogeneous
datasets, usually those that were
collected for some unrelated purpose
Question:
How early and accurately can we predict flu
outbreaks, so we can plan production levels
of flu vaccine?
Dataset:
Search histories of users
source:
http://www.google.org/flutrends/us/#US
http://www.google.com/permissions/using-product-graphics.html
flu risk
“Scientific hindsight shows that
Google Flu Trends far overstated this
year's flu season….”
“Lots of media attention to this
year's flu season skewed Google's
search engine traffic.”
David Wagner, Atlantic Wire, Feb
13 2013
Question:
Do people that take paroxetine and
pravastatin together exhibit
hypoglycemia symptoms?
Dataset:
Search engine histories
Ryen W White, Nicholas P Tatonetti, Nigam H Shah, Russ B Altman, Eric Horvitz,
Web-scale pharmacovigilance: listening to signals from the crowd, J Am
Med Inform Assoc, March 2013, doi:10.1136/amiajnl-2012-001482
Open Sidewalks – Sidewalk maps for low-mobility citizens
Project Leads: Nick Bolten, Anat Caspi – Taskar Center, CSE
DSSG Fellows: Amir Amini, Yun Hao, Vaishnavi Ravichandran,
Andre Stephens
ALVA High School Students: Nick Krasnoselsky, Doris Layman
eScience Data Scientist Mentors: Anthony Arendt, Jake
Vanderplas
“30 million Americans over 15
years old experience limited mobility,
including difficulty walking, climbing stairs, using
wheelchairs, crutches, walkers,” while “24
million more persons experience
difficulty walking a quarter mile”
Picture: US Federal Highway Administration
http://www.fhwa.dot.gov/environment/bicycle_pedestrian/publications/sidewalk2/sidewalks204.cfm
Automated cleaning of sidewalk data through computational geometry
powered by data
from:
SDOT/Socrata
Google API
Step                    Runtime   Solved (All)       Percent
Connecting T-Gaps       ~3.9s     3,837 (4,352)      88.2
Intersection Cleaning   ~23.6s    38,844 (44,700)    86.9
Polygon Cleaning        ~10min    7,283 (8,035)      90.6
Subgraphs               ~23.2s    39,913 (45,265)    88.1
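The slides do not define a "T-gap". The sketch below assumes it means a sidewalk segment whose endpoint stops just short of another segment it should meet, and closes the gap by snapping the endpoint onto the nearest segment when it lies within a tolerance. The function names, the tolerance, and the coordinates are hypothetical illustrations, not the project's actual code.

```python
import math

def project_point_to_segment(p, a, b):
    """Return the point on segment a-b that is closest to p."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:  # degenerate segment: a and b coincide
        return a
    # Parameter of the orthogonal projection, clamped onto the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)

def close_t_gap(endpoint, segments, tolerance):
    """Snap endpoint onto the nearest segment if it is within tolerance.

    Returns the snapped point, or the original endpoint when no segment
    is close enough (the gap is left for manual review).
    """
    best, best_dist = None, float("inf")
    for a, b in segments:
        q = project_point_to_segment(endpoint, a, b)
        d = math.dist(endpoint, q)
        if d < best_dist:
            best, best_dist = q, d
    if best is not None and best_dist <= tolerance:
        return best
    return endpoint

# Hypothetical data: one straight sidewalk and two dangling endpoints.
street = [((0.0, 0.0), (10.0, 0.0))]
snapped = close_t_gap((4.0, 0.3), street, tolerance=0.5)    # close enough: snaps
unchanged = close_t_gap((4.0, 2.0), street, tolerance=0.5)  # too far: left alone
```

A tolerance that is too large would connect sidewalks that are genuinely separate, which may be one reason the table reports some gaps as unsolved.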
Homeless families may take many pathways through programs
Emergency
shelter
Transitional
housing
Rapid
re-housing
Permanent
housing
Housing with
services
Unsuccessful exit
Develop visualizations to show how homeless families move
through programs
Preliminary results to understand potential predictors of
successful outcomes
Correlation with successful outcome,
by family characteristics
Correlation with successful outcome, by
homelessness program
Emergency Shelter use
tends to be associated with
unsuccessful outcomes
(unsurprising!)
Homelessness Prevention
programs more strongly
associated with positive
outcomes than
transitional housing
Substance abuse strongly
associated with
unsuccessful outcomes
Parent employment
strongest predictor of
successful outcomes
Common trajectories lead to different outcomes:
• a successful exit from an episode would mean that the family found a permanent housing
solution
• a proportion of these still receive government subsidies
• other exits are exits back into homelessness, or to other, unknown destinations
Analyzing Family Trajectories through Programs
Data: Pierce County
Emergency Shelter -> Rapid Re-housing: 80% successful exits
Emergency Shelter -> Transitional Housing: only 40% successful exits
ORCA Percentage Difference in Ridership, Seattle
Mark
Hallenbeck
TRAC
Metro Boardings By Type of Rider
Passenger Type   Redmond (boardings)   Tukwila (boardings)   Redmond (%)   Tukwila (%)
Adult            317181                72202                 91%           67%
Youth            12818                 7433                  4%            7%
Senior           5425                  4577                  2%            4%
Disabled         7722                  10449                 2%            10%
Low Income       6912                  12438                 2%            12%
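The percentage columns appear to be each rider type's share of that city's total boardings. A quick check, assuming that reading, recomputes them from the counts on the slide (the variable and function names are illustrative):

```python
# Boarding counts copied from the slide: (Redmond, Tukwila) per rider type.
boardings = {
    "Adult": (317181, 72202),
    "Youth": (12818, 7433),
    "Senior": (5425, 4577),
    "Disabled": (7722, 10449),
    "Low Income": (6912, 12438),
}

def shares(counts):
    """Each count as a whole-number percentage of the column total."""
    total = sum(counts)
    return [round(100 * c / total) for c in counts]

redmond_shares = shares([r for r, _ in boardings.values()])
tukwila_shares = shares([t for _, t in boardings.values()])

for rider, r_pct, t_pct in zip(boardings, redmond_shares, tukwila_shares):
    print(f"{rider:<11} Redmond {r_pct:>3}%   Tukwila {t_pct:>3}%")
```

Under that assumption the recomputed values match the slide, and they make the contrast explicit: Tukwila has fewer boardings overall but a much larger share of disabled and low-income riders.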
Session 2
Summer 2014
121,215 students
Session 1
Spring 2013
119,504 students
Cathy O’Neil
September 2016
Three properties of a WMD:
Opacity
Scale
Damage
July 2016
“Data, Responsibly”
Dagstuhl Workshop
Gerhard
Weikum
Serge
Abiteboul
Julia
Stoyanovich
Gerome
Miklau
Observation:
Epistemic issues are beginning to dominate
the data science discussion in every field
reproducibility, “algorithmic bias,” curation, discrimination,
accountability, transparency, provenance, explanations,
persuasion, privacy
Ex: Staples online pricing
Reasoning: Offer deals to people that live near competitors’ stores
Effect: lower prices offered to buyers who live in more affluent
neighborhoods
[Latanya Sweeney; CACM 2013]
Racially identifying names trigger
ads suggestive of an arrest record
slide adapted from Stoyanovich, Miklau
Amazon Prime Now Delivery Area: Atlanta Bloomberg, 2016
Amazon Prime Now Delivery Area: Chicago Bloomberg, 2016
Amazon Prime Now Delivery Area: Boston Bloomberg, 2016
Propublica, May 2016
The Special Committee on Criminal Justice Reform's
hearing on reducing the pre-trial jail population.
Technical.ly, September 2016
Philadelphia is grappling with the prospect of a racist computer algorithm
Any background signal in the
data of institutional racism is
amplified by the algorithm
operationalized by the algorithm
legitimized by the algorithm
“Should I be afraid of risk assessment tools?”
“No, you gotta tell me a lot more about yourself.
At what age were you first arrested?
What is the date of your most recent crime?”
“And what’s the culture of policing in the
neighborhood in which I grew up in?”
First decade of Data Science research and practice:
What can we do with massive, noisy, heterogeneous datasets?
Next decade of Data Science research and practice:
What should we do with massive, noisy, heterogeneous datasets?
The way I think about this…..(1)
The way I think about this…. (2)
Decisions are based on two sources of information:
1. Past examples
e.g., “prior arrests tend to increase likelihood of future arrests”
2. Societal constraints
e.g., “we must avoid racial discrimination”
11/10/2016 Data, Responsibly / SciTech NW 16
We’ve become very good at automating the use of past examples
We’ve only just started to think about incorporating societal constraints
The way I think about this… (3)
How do we apply societal constraints to algorithmic
decision-making?
Option 1: Keep a human in the loop
Ex: EU General Data Protection Regulation requires that a
human be involved in legally binding algorithmic decision-making
Ex: Wisconsin Supreme Court says a human must review
algorithmic decisions made by recidivism models
Option 2: Build them into the algorithms themselves
I’ll talk about some approaches for this
The way I think about this…(4)
On transparency vs. accountability:
• For human decision-making, sometimes explanations are
required, improving transparency
– Supreme court decisions
– Employee reprimands/termination
• But when transparency is difficult, accountability takes over
– medical emergencies, business decisions
• As we shift decisions to algorithms, we lose both
transparency AND accountability
• “The buck stops where?”
So what can we do about it?
• Algorithms that balance predictive accuracy with fairness
• Increase data sharing, while protecting privacy
– Avoid the “tyranny of convenience”
• Ensure transparency in all methods, datasets
• Track known biases in how data was collected, so it can
be controlled in downstream analytics
• All of these approaches are being explored in the
research community.
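As a rough illustration of the first bullet, one common way to put a number on "fairness" is the gap in positive-prediction rates between groups (demographic parity), which can then be traded off against accuracy. This is only one of several competing fairness criteria and is not necessarily the one meant here; the toy labels, groups, and the penalty weight are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def positive_rate(y_pred, groups, group):
    """Fraction of members of `group` that received a positive prediction."""
    preds = [p for p, g in zip(y_pred, groups) if g == group]
    return sum(preds) / len(preds)

def demographic_parity_gap(y_pred, groups):
    """Largest difference in positive-prediction rate between any two groups."""
    rates = [positive_rate(y_pred, groups, g) for g in sorted(set(groups))]
    return max(rates) - min(rates)

def penalized_score(y_true, y_pred, groups, weight):
    """Accuracy minus a fairness penalty; `weight` sets the trade-off."""
    return accuracy(y_true, y_pred) - weight * demographic_parity_gap(y_pred, groups)

# Hypothetical toy data: two groups of four people each.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

acc = accuracy(y_true, y_pred)                 # 7 of 8 correct
gap = demographic_parity_gap(y_pred, groups)   # group a: 3/4 positive, group b: 0/4
score = penalized_score(y_true, y_pred, groups, weight=0.5)
```

A model that looks good on accuracy alone can still score poorly once the gap is penalized, which is the kind of trade-off such algorithms have to negotiate.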
Recap
• There’s a sea change underway in how we will teach
and practice data science
• No longer only about what can be done, but about
what should be done
• This is not just a policy/behavior/culture issue – there
are technical problems to solve
• Prediction: If a company is not thinking about this
stuff, they will soon be facing retention and
compliance issues
– Witness how the privacy discussion evolved
REPRODUCIBILITY
11/10/2016 Bill Howe, UW 32
Science is a complete mess
• Reproducibility
– Begley & Ellis, Nature 2012: 6 out of 53 cancer studies reproducible
– Only about half of 100 psychology studies had effect sizes that approximated
the original result (Science, 2015)
– Ioannidis 2005: Why Most Published Research Findings Are False
– Reinhart & Rogoff: global economic policy based on spreadsheet fuck ups
Science, 2015
11/10/2016 Data, Responsibly @ Dagstuhl 35
Retractions are increasing…..
Why is this happening? (1)
Why is this happening? (2)
Publication Bias!
“DEEP CURATION”
TOWARDS AUTOMATIC SCIENTIFIC CLAIM CHECKING
Vision: Validate scientific claims automatically
– Check for manipulation (manipulated images, Benford’s Law)
– Extract claims from papers
– Check claims against the authors’ data
– Check claims against related data sets
– Automatic meta-analysis across the literature + public datasets
• First steps
– Automatic curation: Validate and attach metadata to public datasets
– Longitudinal analysis of the visual literature
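The Benford's Law check mentioned above can be sketched as a first-digit test: in many naturally occurring datasets the leading digit d appears with probability log10(1 + 1/d), and a large chi-square distance from that distribution is a signal worth a closer look, not proof of manipulation. The function names and the two sample datasets are illustrative assumptions.

```python
import math
from collections import Counter

def benford_expected():
    """Expected probability of each leading digit 1..9 under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit(x):
    """Leading nonzero digit of a nonzero, finite number."""
    x = abs(x)
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)

def benford_chi_square(values):
    """Chi-square distance between observed leading digits and Benford's Law."""
    digits = [first_digit(v) for v in values if v != 0]
    n = len(digits)
    observed = Counter(digits)
    return sum(
        (observed.get(d, 0) - n * p) ** 2 / (n * p)
        for d, p in benford_expected().items()
    )

# Powers of two are known to follow Benford's Law closely; the integers
# 100..999 have uniformly distributed leading digits and do not.
powers = [2 ** k for k in range(1, 101)]
uniform = list(range(100, 1000))

chi_powers = benford_chi_square(powers)
chi_uniform = benford_chi_square(uniform)
```

With eight degrees of freedom, the usual 5% critical value is about 15.5; many legitimate datasets (bounded measurements, for example) do not follow Benford's Law at all, so the test only makes sense where the law is expected to hold.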
Microarray experiments
Microarray samples submitted to the Gene Expression Omnibus
Curation is fast becoming the
bottleneck to data sharing
Maxim Gretchkin, Hoifung Poon
No growth in number of
datasets used per paper!
Majority of samples are
one-time-use only!
color = labels supplied
as metadata
clusters = 1st two PCA
dimensions on the
gene expression data
itself
Can we curate algorithmically?
The expression data
and the text labels
appear to disagree
Better Tissue
Type Labels
Domain knowledge
(Ontology)
Expression data
Free-text Metadata
2 Deep Networks
text
expr
SVM
Deep Curation
Distant supervision and co-learning between a text-based classifier and an
expression-based classifier: both models improve by training on each
other's results.
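The co-learning idea can be sketched with a classic co-training loop: two classifiers see different views of the same samples, and each hands its most confident new label to the other. The slides describe deep networks; this sketch swaps in a one-dimensional nearest-centroid classifier to stay short, and the feature values, seed labels, and tissue names are hypothetical.

```python
def fit_centroids(points, labels):
    """Nearest-centroid 'classifier': the mean feature value per class."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(centroids, x):
    """Return (label, margin); a larger margin means a more confident call."""
    ranked = sorted(centroids, key=lambda y: abs(x - centroids[y]))
    best = ranked[0]
    if len(ranked) == 1:
        return best, 0.0
    margin = abs(x - centroids[ranked[1]]) - abs(x - centroids[best])
    return best, margin

def co_train(text_view, expr_view, seed_labels, rounds=10):
    """seed_labels maps sample index -> label (e.g. from distant supervision)."""
    text_labels = dict(seed_labels)  # training labels for the text model
    expr_labels = dict(seed_labels)  # training labels for the expression model
    n = len(text_view)

    def train(view, labels):
        return fit_centroids([view[i] for i in labels], list(labels.values()))

    for _ in range(rounds):
        text_model = train(text_view, text_labels)
        expr_model = train(expr_view, expr_labels)
        # Each model labels the sample it is most sure about for the OTHER model.
        for model, view, target in (
            (text_model, text_view, expr_labels),
            (expr_model, expr_view, text_labels),
        ):
            candidates = [i for i in range(n) if i not in target]
            if not candidates:
                continue
            scored = [(predict(model, view[i]), i) for i in candidates]
            (label, _), idx = max(scored, key=lambda s: s[0][1])
            target[idx] = label
    return train(text_view, text_labels), train(expr_view, expr_labels)

# Hypothetical toy data: six samples, two views, two seed labels.
text_view = [0.10, 0.20, 0.15, 0.90, 0.80, 0.85]
expr_view = [1.0, 1.2, 0.9, 5.0, 5.5, 4.8]
truth = ["liver", "liver", "liver", "brain", "brain", "brain"]
seeds = {0: "liver", 3: "brain"}

text_model, expr_model = co_train(text_view, expr_view, seeds)
text_pred = [predict(text_model, x)[0] for x in text_view]
expr_pred = [predict(expr_model, x)[0] for x in expr_view]
```

Only the two seed samples carry labels at the start; every other training label comes from the opposite model, which mirrors the claim that no hand-labeled training data is needed.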
Free-text classifier
Expression classifier
Deep Curation:
Our stuff wins, with no training data
state of the art
our reimplementation
of the state of the art
our dueling
pianos NN
amount of training data used

More Related Content

What's hot

Citizen Sensor Data Mining, Social Media Analytics and Applications
Citizen Sensor Data Mining, Social Media Analytics and ApplicationsCitizen Sensor Data Mining, Social Media Analytics and Applications
Citizen Sensor Data Mining, Social Media Analytics and ApplicationsAmit Sheth
 
Transitioning Education’s Knowledge Infrastructure ICLS 2018
Transitioning Education’s Knowledge Infrastructure ICLS 2018Transitioning Education’s Knowledge Infrastructure ICLS 2018
Transitioning Education’s Knowledge Infrastructure ICLS 2018Simon Buckingham Shum
 
Ralph schroeder and eric meyer
Ralph schroeder and eric meyerRalph schroeder and eric meyer
Ralph schroeder and eric meyeroiisdp
 
Future Flight Fridays: Public Trust in Future Flight
Future Flight Fridays: Public Trust in Future FlightFuture Flight Fridays: Public Trust in Future Flight
Future Flight Fridays: Public Trust in Future FlightKTN
 
Social and Physical Sensing Enabled Decision Support for Disaster Management ...
Social and Physical Sensing Enabled Decision Support for Disaster Management ...Social and Physical Sensing Enabled Decision Support for Disaster Management ...
Social and Physical Sensing Enabled Decision Support for Disaster Management ...Artificial Intelligence Institute at UofSC
 
Science as an Open Enterprise – Geoffrey Boulton
Science as an Open Enterprise – Geoffrey BoultonScience as an Open Enterprise – Geoffrey Boulton
Science as an Open Enterprise – Geoffrey BoultonOpenAIRE
 
Learning Analytics as Educational Knowledge Infrastructure
Learning Analytics as Educational Knowledge InfrastructureLearning Analytics as Educational Knowledge Infrastructure
Learning Analytics as Educational Knowledge InfrastructureSimon Buckingham Shum
 
Citizen Science Phenotypes
Citizen Science PhenotypesCitizen Science Phenotypes
Citizen Science PhenotypesAndrea Wiggins
 
Mest3 Internet Lessons 1-3
Mest3 Internet Lessons 1-3Mest3 Internet Lessons 1-3
Mest3 Internet Lessons 1-3Macguffin
 
MIT Program on Information Science Talk -- Julia Flanders on Jobs, Roles, Ski...
MIT Program on Information Science Talk -- Julia Flanders on Jobs, Roles, Ski...MIT Program on Information Science Talk -- Julia Flanders on Jobs, Roles, Ski...
MIT Program on Information Science Talk -- Julia Flanders on Jobs, Roles, Ski...Micah Altman
 
Web Observatories and e-Research
Web Observatories and e-ResearchWeb Observatories and e-Research
Web Observatories and e-ResearchDavid De Roure
 
Making Decisions in a World Awash in Data: We’re going to need a different bo...
Making Decisions in a World Awash in Data: We’re going to need a different bo...Making Decisions in a World Awash in Data: We’re going to need a different bo...
Making Decisions in a World Awash in Data: We’re going to need a different bo...Micah Altman
 
AI & Work, with Transparency & the Crowd
AI & Work, with Transparency & the Crowd AI & Work, with Transparency & the Crowd
AI & Work, with Transparency & the Crowd Matthew Lease
 
Information, Science, and Society
Information, Science, and SocietyInformation, Science, and Society
Information, Science, and SocietyMelanie Swan
 
Public data archiving: Who does? Who doesn't? What can we do about it?
Public data archiving: Who does?  Who doesn't?  What can we do about it?Public data archiving: Who does?  Who doesn't?  What can we do about it?
Public data archiving: Who does? Who doesn't? What can we do about it?Heather Piwowar
 
An initial exploration of Citizen Science
An initial exploration of Citizen ScienceAn initial exploration of Citizen Science
An initial exploration of Citizen ScienceNiamh O Riordan
 

What's hot (20)

Citizen Sensor Data Mining, Social Media Analytics and Applications
Citizen Sensor Data Mining, Social Media Analytics and ApplicationsCitizen Sensor Data Mining, Social Media Analytics and Applications
Citizen Sensor Data Mining, Social Media Analytics and Applications
 
Transitioning Education’s Knowledge Infrastructure ICLS 2018
Transitioning Education’s Knowledge Infrastructure ICLS 2018Transitioning Education’s Knowledge Infrastructure ICLS 2018
Transitioning Education’s Knowledge Infrastructure ICLS 2018
 
Ralph schroeder and eric meyer
Ralph schroeder and eric meyerRalph schroeder and eric meyer
Ralph schroeder and eric meyer
 
Future Flight Fridays: Public Trust in Future Flight
Future Flight Fridays: Public Trust in Future FlightFuture Flight Fridays: Public Trust in Future Flight
Future Flight Fridays: Public Trust in Future Flight
 
Social and Physical Sensing Enabled Decision Support for Disaster Management ...
Social and Physical Sensing Enabled Decision Support for Disaster Management ...Social and Physical Sensing Enabled Decision Support for Disaster Management ...
Social and Physical Sensing Enabled Decision Support for Disaster Management ...
 
Crowdsourcing Science
Crowdsourcing ScienceCrowdsourcing Science
Crowdsourcing Science
 
Science as an Open Enterprise – Geoffrey Boulton
Science as an Open Enterprise – Geoffrey BoultonScience as an Open Enterprise – Geoffrey Boulton
Science as an Open Enterprise – Geoffrey Boulton
 
Learning Analytics as Educational Knowledge Infrastructure
Learning Analytics as Educational Knowledge InfrastructureLearning Analytics as Educational Knowledge Infrastructure
Learning Analytics as Educational Knowledge Infrastructure
 
Citizen Science Phenotypes
Citizen Science PhenotypesCitizen Science Phenotypes
Citizen Science Phenotypes
 
Delphi2 results (Cycle 2) and towards Delphi3
Delphi2 results (Cycle 2) and towards Delphi3Delphi2 results (Cycle 2) and towards Delphi3
Delphi2 results (Cycle 2) and towards Delphi3
 
Mest3 Internet Lessons 1-3
Mest3 Internet Lessons 1-3Mest3 Internet Lessons 1-3
Mest3 Internet Lessons 1-3
 
MIT Program on Information Science Talk -- Julia Flanders on Jobs, Roles, Ski...
MIT Program on Information Science Talk -- Julia Flanders on Jobs, Roles, Ski...MIT Program on Information Science Talk -- Julia Flanders on Jobs, Roles, Ski...
MIT Program on Information Science Talk -- Julia Flanders on Jobs, Roles, Ski...
 
Web Observatories and e-Research
Web Observatories and e-ResearchWeb Observatories and e-Research
Web Observatories and e-Research
 
Making Decisions in a World Awash in Data: We’re going to need a different bo...
Making Decisions in a World Awash in Data: We’re going to need a different bo...Making Decisions in a World Awash in Data: We’re going to need a different bo...
Making Decisions in a World Awash in Data: We’re going to need a different bo...
 
AI & Work, with Transparency & the Crowd
AI & Work, with Transparency & the Crowd AI & Work, with Transparency & the Crowd
AI & Work, with Transparency & the Crowd
 
Information, Science, and Society
Information, Science, and SocietyInformation, Science, and Society
Information, Science, and Society
 
Little eScience
Little eScienceLittle eScience
Little eScience
 
Data Science and Urban Science @ UW
Data Science and Urban Science @ UWData Science and Urban Science @ UW
Data Science and Urban Science @ UW
 
Public data archiving: Who does? Who doesn't? What can we do about it?
Public data archiving: Who does?  Who doesn't?  What can we do about it?Public data archiving: Who does?  Who doesn't?  What can we do about it?
Public data archiving: Who does? Who doesn't? What can we do about it?
 
An initial exploration of Citizen Science
An initial exploration of Citizen ScienceAn initial exploration of Citizen Science
An initial exploration of Citizen Science
 

Similar to Data Responsibly: The next decade of data science

Data, Responsibly: The Next Decade of Data Science
Data, Responsibly: The Next Decade of Data ScienceData, Responsibly: The Next Decade of Data Science
Data, Responsibly: The Next Decade of Data ScienceUniversity of Washington
 
Thoughts on Big Data and more for the WA State Legislature
Thoughts on Big Data and more for the WA State LegislatureThoughts on Big Data and more for the WA State Legislature
Thoughts on Big Data and more for the WA State LegislatureUniversity of Washington
 
CODATA International Training Workshop in Big Data for Science for Researcher...
CODATA International Training Workshop in Big Data for Science for Researcher...CODATA International Training Workshop in Big Data for Science for Researcher...
CODATA International Training Workshop in Big Data for Science for Researcher...Johann van Wyk
 
Open Data in a Big Data World: easy to say, but hard to do?
Open Data in a Big Data World: easy to say, but hard to do?Open Data in a Big Data World: easy to say, but hard to do?
Open Data in a Big Data World: easy to say, but hard to do?LEARN Project
 
Mind the Gap: Reflections on Data Policies and Practice
Mind the Gap: Reflections on Data Policies and PracticeMind the Gap: Reflections on Data Policies and Practice
Mind the Gap: Reflections on Data Policies and PracticeLizLyon
 
Biomedical Data Science: We Are Not Alone
Biomedical Data Science: We Are Not AloneBiomedical Data Science: We Are Not Alone
Biomedical Data Science: We Are Not AlonePhilip Bourne
 
Citizen Science overview for ASU HSD598 graduate course, "Citizen Science"
Citizen Science overview for ASU HSD598 graduate course, "Citizen Science"Citizen Science overview for ASU HSD598 graduate course, "Citizen Science"
Citizen Science overview for ASU HSD598 graduate course, "Citizen Science"Darlene Cavalier
 
Univ of Miami CTSI: Citizen science seminar; Oct 2014
Univ of Miami CTSI: Citizen science seminar; Oct 2014Univ of Miami CTSI: Citizen science seminar; Oct 2014
Univ of Miami CTSI: Citizen science seminar; Oct 2014Richard Bookman
 
Studying Cybercrime: Raising Awareness of Objectivity & Bias
Studying Cybercrime: Raising Awareness of Objectivity & BiasStudying Cybercrime: Raising Awareness of Objectivity & Bias
Studying Cybercrime: Raising Awareness of Objectivity & Biasgloriakt
 
Respond to these two classmates’ posts. 1. After reading thi.docx
Respond to these two classmates’ posts. 1. After reading thi.docxRespond to these two classmates’ posts. 1. After reading thi.docx
Respond to these two classmates’ posts. 1. After reading thi.docxdaynamckernon
 
Data Science definition
Data Science definitionData Science definition
Data Science definitionCarloLauro1
 
Let's talk about Data Science
Let's talk about Data ScienceLet's talk about Data Science
Let's talk about Data ScienceCarlo Lauro
 
Acting as Advocate? Seven steps for libraries in the data decade
Acting as Advocate? Seven steps for libraries in the data decadeActing as Advocate? Seven steps for libraries in the data decade
Acting as Advocate? Seven steps for libraries in the data decadeLizLyon
 
Computational Social Science:The Collaborative Futures of Big Data, Computer ...
Computational Social Science:The Collaborative Futures of Big Data, Computer ...Computational Social Science:The Collaborative Futures of Big Data, Computer ...
Computational Social Science:The Collaborative Futures of Big Data, Computer ...Academia Sinica
 
Open Data in a Global Ecosystem
Open Data in a Global EcosystemOpen Data in a Global Ecosystem
Open Data in a Global EcosystemPhilip Bourne
 
Respond to at least two of your classmates’ posts. 1. After .docx
Respond to at least two of your classmates’ posts. 1. After .docxRespond to at least two of your classmates’ posts. 1. After .docx
Respond to at least two of your classmates’ posts. 1. After .docxdaynamckernon
 
A politics of counting - putting people back into big data
A politics of counting - putting people back into big dataA politics of counting - putting people back into big data
A politics of counting - putting people back into big dataHamish Robertson
 
After reading this journal article regarding ethics of interne.docx
After reading this journal article regarding ethics of interne.docxAfter reading this journal article regarding ethics of interne.docx
After reading this journal article regarding ethics of interne.docxrosiecabaniss
 

Similar to Data Responsibly: The next decade of data science (20)

Data, Responsibly: The Next Decade of Data Science
Data, Responsibly: The Next Decade of Data ScienceData, Responsibly: The Next Decade of Data Science
Data, Responsibly: The Next Decade of Data Science
 
Thoughts on Big Data and more for the WA State Legislature
Thoughts on Big Data and more for the WA State LegislatureThoughts on Big Data and more for the WA State Legislature
Thoughts on Big Data and more for the WA State Legislature
 
Science Data, Responsibly
Science Data, ResponsiblyScience Data, Responsibly
Science Data, Responsibly
 
CODATA International Training Workshop in Big Data for Science for Researcher...
CODATA International Training Workshop in Big Data for Science for Researcher...CODATA International Training Workshop in Big Data for Science for Researcher...
CODATA International Training Workshop in Big Data for Science for Researcher...
 
Open Data in a Big Data World: easy to say, but hard to do?
Open Data in a Big Data World: easy to say, but hard to do?Open Data in a Big Data World: easy to say, but hard to do?
Open Data in a Big Data World: easy to say, but hard to do?
 
Mind the Gap: Reflections on Data Policies and Practice
Mind the Gap: Reflections on Data Policies and PracticeMind the Gap: Reflections on Data Policies and Practice
Mind the Gap: Reflections on Data Policies and Practice
 
Biomedical Data Science: We Are Not Alone
Biomedical Data Science: We Are Not AloneBiomedical Data Science: We Are Not Alone
Biomedical Data Science: We Are Not Alone
 
Citizen Science overview for ASU HSD598 graduate course, "Citizen Science"
Citizen Science overview for ASU HSD598 graduate course, "Citizen Science"Citizen Science overview for ASU HSD598 graduate course, "Citizen Science"
Citizen Science overview for ASU HSD598 graduate course, "Citizen Science"
 
Univ of Miami CTSI: Citizen science seminar; Oct 2014
Univ of Miami CTSI: Citizen science seminar; Oct 2014Univ of Miami CTSI: Citizen science seminar; Oct 2014
Univ of Miami CTSI: Citizen science seminar; Oct 2014
 
Studying Cybercrime: Raising Awareness of Objectivity & Bias
Studying Cybercrime: Raising Awareness of Objectivity & BiasStudying Cybercrime: Raising Awareness of Objectivity & Bias
Studying Cybercrime: Raising Awareness of Objectivity & Bias
 
Respond to these two classmates’ posts. 1. After reading thi.docx
Respond to these two classmates’ posts. 1. After reading thi.docxRespond to these two classmates’ posts. 1. After reading thi.docx
Respond to these two classmates’ posts. 1. After reading thi.docx
 
Data Science definition
Data Science definitionData Science definition
Data Science definition
 
Let's talk about Data Science
Let's talk about Data ScienceLet's talk about Data Science
Let's talk about Data Science
 
Acting as Advocate? Seven steps for libraries in the data decade
Acting as Advocate? Seven steps for libraries in the data decadeActing as Advocate? Seven steps for libraries in the data decade
Acting as Advocate? Seven steps for libraries in the data decade
 
Computational Social Science:The Collaborative Futures of Big Data, Computer ...
Computational Social Science:The Collaborative Futures of Big Data, Computer ...Computational Social Science:The Collaborative Futures of Big Data, Computer ...
Computational Social Science:The Collaborative Futures of Big Data, Computer ...
 
Open Data in a Global Ecosystem
Open Data in a Global EcosystemOpen Data in a Global Ecosystem
Open Data in a Global Ecosystem
 
Respond to at least two of your classmates’ posts. 1. After .docx
Respond to at least two of your classmates’ posts. 1. After .docxRespond to at least two of your classmates’ posts. 1. After .docx
Respond to at least two of your classmates’ posts. 1. After .docx
 
A politics of counting - putting people back into big data
A politics of counting - putting people back into big dataA politics of counting - putting people back into big data
A politics of counting - putting people back into big data
 
After reading this journal article regarding ethics of interne.docx
After reading this journal article regarding ethics of interne.docxAfter reading this journal article regarding ethics of interne.docx
After reading this journal article regarding ethics of interne.docx
 
A brave new world: student surveillance in higher education
A brave new world: student surveillance in higher educationA brave new world: student surveillance in higher education
A brave new world: student surveillance in higher education
 

More from University of Washington

Database Agnostic Workload Management (CIDR 2019)
Database Agnostic Workload Management (CIDR 2019)Database Agnostic Workload Management (CIDR 2019)
Database Agnostic Workload Management (CIDR 2019)University of Washington
 
The Other HPC: High Productivity Computing in Polystore Environments
The Other HPC: High Productivity Computing in Polystore EnvironmentsThe Other HPC: High Productivity Computing in Polystore Environments
The Other HPC: High Productivity Computing in Polystore EnvironmentsUniversity of Washington
 
Big Data + Big Sim: Query Processing over Unstructured CFD Models
Big Data + Big Sim: Query Processing over Unstructured CFD ModelsBig Data + Big Sim: Query Processing over Unstructured CFD Models
Big Data + Big Sim: Query Processing over Unstructured CFD ModelsUniversity of Washington
 
Data Science, Data Curation, and Human-Data Interaction
Data Science, Data Curation, and Human-Data InteractionData Science, Data Curation, and Human-Data Interaction
Data Science, Data Curation, and Human-Data InteractionUniversity of Washington
 
The Other HPC: High Productivity Computing
The Other HPC: High Productivity ComputingThe Other HPC: High Productivity Computing
The Other HPC: High Productivity ComputingUniversity of Washington
 
Big Data Talent in Academic and Industry R&D
Big Data Talent in Academic and Industry R&DBig Data Talent in Academic and Industry R&D
Big Data Talent in Academic and Industry R&DUniversity of Washington
 
Big Data Middleware: CIDR 2015 Gong Show Talk, David Maier, Bill Howe
Big Data Middleware: CIDR 2015 Gong Show Talk, David Maier, Bill Howe Big Data Middleware: CIDR 2015 Gong Show Talk, David Maier, Bill Howe
Big Data Middleware: CIDR 2015 Gong Show Talk, David Maier, Bill Howe University of Washington
 
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)University of Washington
 
XLDB South America Keynote: eScience Institute and Myria
XLDB South America Keynote: eScience Institute and MyriaXLDB South America Keynote: eScience Institute and Myria
XLDB South America Keynote: eScience Institute and MyriaUniversity of Washington
 
Myria: Analytics-as-a-Service for (Data) Scientists
Myria: Analytics-as-a-Service for (Data) ScientistsMyria: Analytics-as-a-Service for (Data) Scientists
Myria: Analytics-as-a-Service for (Data) ScientistsUniversity of Washington
 
Big Data Curricula at the UW eScience Institute, JSM 2013
Big Data Curricula at the UW eScience Institute, JSM 2013Big Data Curricula at the UW eScience Institute, JSM 2013
Big Data Curricula at the UW eScience Institute, JSM 2013University of Washington
 
Enabling Collaborative Research Data Management with SQLShare
Enabling Collaborative Research Data Management with SQLShareEnabling Collaborative Research Data Management with SQLShare
Enabling Collaborative Research Data Management with SQLShareUniversity of Washington
 
Virtual Appliances, Cloud Computing, and Reproducible Research
Virtual Appliances, Cloud Computing, and Reproducible ResearchVirtual Appliances, Cloud Computing, and Reproducible Research
Virtual Appliances, Cloud Computing, and Reproducible ResearchUniversity of Washington
 
HaLoop: Efficient Iterative Processing on Large-Scale Clusters
HaLoop: Efficient Iterative Processing on Large-Scale ClustersHaLoop: Efficient Iterative Processing on Large-Scale Clusters
HaLoop: Efficient Iterative Processing on Large-Scale ClustersUniversity of Washington
 

Data Responsibly: The next decade of data science

  • 1. Data, Responsibly: The Next Decade of Data Science Bill Howe, PhD Associate Professor, Information School Director, Cascadia Urban Analytics Cooperative Adjunct Associate Professor, Computer Science & Engineering University of Washington
  • 2. My goals this afternoon… • Describe “data science” from my perspective • Describe some concerns that have recently emerged around the irresponsible use of data science techniques and technologies • Show off some of the work we’re doing to address it
  • 3. DataLab Bill Howe Databases, data management Jessica Hullman Visualization, HCI Carole Palmer Open data, digital curation Nic Weber Open data, civic tech Jevin West Science of science, bibliometrics …”calling bullshit” Emma Spiro Social network analysis
  • 4. The Fourth Paradigm 1. Empirical + experimental 2. Theoretical 3. Computational 4. Data-Intensive Jim Gray 1/10/2018 Bill Howe, UW 4
  • 5. Nearly every field of discovery is transitioning from “data poor” to “data rich” Astronomy: LSST Physics: LHC Oceanography: OOI Social Sciences Biology: Sequencing Economics Neuroscience: EEG, fMRI
  • 6. My view: Data science is about answering questions using large, noisy, and heterogeneous datasets, usually those that were collected for some unrelated purpose
  • 7. Question: How early and accurately can we predict flu outbreaks, so we can plan production levels of flu vaccine? Dataset: Search histories of users
  • 8. source: http://www.google.org/flutrends/us/#US http://www.google.com/permissions/using-product-graphics.html flu risk “Scientific hindsight shows that Google Flu Trends far overstated this year's flu season….” “Lots of media attention to this year's flu season skewed Google's search engine traffic.” David Wagner, Atlantic Wire, Feb 13 2013
  • 9. Question: Do people that take paroxetine and pravastatin together exhibit hypoglycemia symptoms? Dataset: Search engine histories
  • 10. Ryen W White, Nicholas P Tatonetti, Nigam H Shah, Russ B Altman, Eric Horvitz, Web-scale pharmacovigilance: listening to signals from the crowd, J Am Med Inform Assoc, March 2013, doi:10.1136/amiajnl-2012-001482
  • 11. Open Sidewalks – Sidewalk maps for low-mobility citizens Project Leads: Nick Bolten, Anat Caspi – Taskar Center, CSE DSSG Fellows: Amir Amini, Yun Hao, Vaishnavi Ravichandran, Andre Stephens ALVA High School Students: Nick Krasnoselsky, Doris Layman eScience Data Scientist Mentors: Anthony Arendt, Jake Vanderplas “30 million Americans over 15 years old experience limited mobility, including difficulty walking, climbing stairs, using wheelchairs, crutches, walkers, while 24 million more persons experience difficulty walking a quarter mile” | Picture: US Federal Highway Administration http://www.fhwa.dot.gov/environment/bicycle_pedestrian/publications/sidewalk2/sidewalks204.cfm
  • 12. Automated cleaning of sidewalk data through computational geometry, powered by data from: SDOT/Socrata, Google API
      Step | Runtime | Solved (All) | Percent
      Connecting T-Gaps | ~3.9s | 3,837 (4,352) | 88.2
      Intersection Cleaning | ~23.6s | 38,844 (44,700) | 86.9
      Polygon Cleaning | ~10min | 7,283 (8,035) | 90.6
      Subgraphs | ~23.2s | 39,913 (45,265) | 88.1
  • 13. Homeless families may take many pathways through programs Emergency shelter Transitional housing Rapid re-housing Permanent housing Housing with services Unsuccessful exit
  • 14. Develop visualizations to show how homeless families move through programs
  • 15. Preliminary results to understand potential predictors of successful outcomes Correlation with successful outcome, by family characteristics Correlation with successful outcome, by homelessness program Emergency Shelter use tends to be associated with unsuccessful outcomes (unsurprising!) Homelessness Prevention programs more strongly associated with positive outcomes than transitional housing Substance abuse strongly associated with unsuccessful outcomes Parent employment strongest predictor of successful outcomes
  • 16. Common trajectories lead to different outcomes: • a successful exit from an episode would mean that the family found a permanent housing solution • a proportion of these still receive government subsidies • other exits are exits back into homelessness, or to other, unknown destinations Analyzing Family Trajectories through Programs Data: Pierce County Emergency Shelter -> Rapid Re-housing Emergency Shelter -> Transitional Housing 80% successful exits Only 40% successful exits
  • 17. ORCA Percentage Difference in Ridership, Seattle Mark Hallenbeck TRAC
  • 18. Metro Boardings By Type of Rider
      Passenger Type | Redmond (boardings) | Tukwila (boardings) | Redmond (share) | Tukwila (share)
      Adult | 317181 | 72202 | 91% | 67%
      Youth | 12818 | 7433 | 4% | 7%
      Senior | 5425 | 4577 | 2% | 4%
      Disabled | 7722 | 10449 | 2% | 10%
      Low Income | 6912 | 12438 | 2% | 12%
  • 20. Session 2 Summer 2014 121,215 students Session 1 Spring 2013 119,504 students
  • 22. Cathy O’Neil, September 2016. Three properties of a WMD (Weapon of Math Destruction): Opacity, Scale, Damage
  • 23. July 2016 “Data, Responsibly” Dagstuhl Workshop Gerhard Weikum Serge Abiteboul Julia Stoyanovich Gerome Miklau
  • 24. Observation: Epistemic issues are beginning to dominate the data science discussion in every field reproducibility, “algorithmic bias,” curation, discrimination, accountability, transparency, provenance, explanations, persuasion, privacy
  • 25. Ex: Staples online pricing. Reasoning: Offer deals to people that live near competitors’ stores. Effect: lower prices offered to buyers who live in more affluent neighborhoods
  • 26. [Latanya Sweeney; CACM 2013] Racially identifying names trigger ads suggestive of an arrest record (slide adapted from Stoyanovich, Miklau)
  • 27. Amazon Prime Now Delivery Area: Atlanta (Bloomberg, 2016)
  • 28. Amazon Prime Now Delivery Area: Chicago (Bloomberg, 2016)
  • 29. Amazon Prime Now Delivery Area: Boston (Bloomberg, 2016)
  • 31. The Special Committee on Criminal Justice Reform's hearing on reducing the pre-trial jail population. Technical.ly, September 2016: Philadelphia is grappling with the prospect of a racist computer algorithm. Any background signal in the data of institutional racism is amplified by the algorithm, operationalized by the algorithm, legitimized by the algorithm. “Should I be afraid of risk assessment tools?” “No, you gotta tell me a lot more about yourself. At what age were you first arrested? What is the date of your most recent crime?” “And what’s the culture of policing in the neighborhood in which I grew up in?”
  • 32. The way I think about this… (1) First decade of Data Science research and practice: What can we do with massive, noisy, heterogeneous datasets? Next decade of Data Science research and practice: What should we do with massive, noisy, heterogeneous datasets?
  • 33. The way I think about this…. (2) Decisions are based on two sources of information: 1. Past examples e.g., “prior arrests tend to increase likelihood of future arrests” 2. Societal constraints e.g., “we must avoid racial discrimination” 11/10/2016 Data, Responsibly / SciTech NW 16 We’ve become very good at automating the use of past examples We’ve only just started to think about incorporating societal constraints
  • 34. The way I think about this… (3) How do we apply societal constraints to algorithmic decision-making? Option 1: Keep a human in the loop Ex: EU General Data Protection Regulation requires that a human be involved in legally binding algorithmic decision-making Ex: Wisconsin Supreme Court says a human must review algorithmic decisions made by recidivism models Option 2: Build them into the algorithms themselves I’ll talk about some approaches for this
  • 35. The way I think about this… (4) On transparency vs. accountability: • For human decision-making, sometimes explanations are required, improving transparency – Supreme court decisions – Employee reprimands/termination • But when transparency is difficult, accountability takes over – medical emergencies, business decisions • As we shift decisions to algorithms, we lose both transparency AND accountability • “The buck stops where?”
  • 36. So what can we do about it? • Algorithms that balance predictive accuracy with fairness • Increase data sharing, while protecting privacy – Avoid the “tyranny of convenience” • Ensure transparency in all methods, datasets • Track known biases in how data was collected, so they can be controlled in downstream analytics • All of these approaches are being explored in the research community.
  • 37. Recap • There’s a sea change underway in how we will teach and practice data science • No longer only about what can be done, but about what should be done • This is not just a policy/behavior/culture issue – there are technical problems to solve • Prediction: If a company is not thinking about this stuff, they will soon be facing retention and compliance issues – Witness how the privacy discussion evolved
  • 39. Science is a complete mess • Reproducibility – Begley & Ellis, Nature 2012: 6 out of 53 cancer studies reproducible – Only about half of 100 psychology studies had effect sizes that approximated the original result (Science, 2015) – Ioannidis 2005: Why Most Published Research Findings Are False – Reinhart & Rogoff: global economic policy based on spreadsheet fuck ups 11/10/2016 Bill Howe, UW 33
  • 41. 11/10/2016 Data, Responsibly @ Dagstuhl 35 Retractions are increasing…..
  • 42.
  • 43. Why is this happening? (1)
  • 44. Why is this happening? (2)
  • 45. Why is this happening? (2) Publication Bias!
  • 46. “DEEP CURATION” TOWARDS AUTOMATIC SCIENTIFIC CLAIM CHECKING
  • 47. Vision: Validate scientific claims automatically – Check for manipulation (manipulated images, Benford’s Law) – Extract claims from papers – Check claims against the authors’ data – Check claims against related data sets – Automatic meta-analysis across the literature + public datasets • First steps – Automatic curation: Validate and attach metadata to public datasets – Longitudinal analysis of the visual literature
  • 49. Microarray samples submitted to the Gene Expression Omnibus Curation is fast becoming the bottleneck to data sharing Maxim Gretchkin Poon Hoifung
  • 50. Maxim Gretchkin Poon Hoifung No growth in number of datasets used per paper!
  • 51. Maxim Gretchkin Poon Hoifung Majority of samples are one-time-use only!
  • 52. color = labels supplied as metadata clusters = 1st two PCA dimensions on the gene expression data itself Can we curate algorithmically? Maxim Gretchkin Poon Hoifung The expression data and the text labels appear to disagree
  • 53. Maxim Gretchkin Poon Hoifung Better Tissue Type Labels Domain knowledge (Ontology) Expression data Free-text Metadata 2 Deep Networks text expr SVM
  • 54. Deep Curation Maxim Gretchkin Poon Hoifung Distant supervision and co-learning between text-based classifier and expression-based classifier: Both models improve by training on each other’s results. Free-text classifier Expression classifier
  • 55. Deep Curation: Our stuff wins, with no training data Maxim Gretchkin Poon Hoifung state of the art our reimplementation of the state of the art our dueling pianos NN amount of training data used
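A note on slide 18: the percentage columns in the Metro boardings table appear to be each passenger type's share of that city's total boardings. That reading is an assumption (the slide does not label the columns), but the arithmetic is consistent with it. A minimal Python sketch that recomputes the shares from the counts on the slide; the names `BOARDINGS` and `rider_shares` are made up for illustration:

```python
# Boarding counts copied from slide 18 (Metro Boardings By Type of Rider).
BOARDINGS = {
    "Redmond": {"Adult": 317181, "Youth": 12818, "Senior": 5425,
                "Disabled": 7722, "Low Income": 6912},
    "Tukwila": {"Adult": 72202, "Youth": 7433, "Senior": 4577,
                "Disabled": 10449, "Low Income": 12438},
}


def rider_shares(counts):
    """Return each passenger type's share of total boardings, as a whole percent."""
    total = sum(counts.values())
    return {rider: round(100 * n / total) for rider, n in counts.items()}


if __name__ == "__main__":
    for city, counts in BOARDINGS.items():
        print(city, rider_shares(counts))
```

Under that assumption the rounded shares reproduce the slide's percentages: for example, Low Income riders are about 2% of Redmond boardings and about 12% of Tukwila boardings.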
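A note on slide 47: the slide lists Benford's Law as one way to check for manipulated numbers. Benford's Law says that in many naturally occurring datasets the leading digit d appears with probability log10(1 + 1/d). The sketch below is only an illustration of that idea, not the method used in the talk; the function names and the choice of a chi-square statistic are assumptions made here.

```python
import math
from collections import Counter


def leading_digit(x):
    """Return the first significant digit of a nonzero number."""
    # Scientific notation with 17 significant digits puts that digit first.
    return int(f"{abs(x):.16e}"[0])


def benford_expected():
    """Benford's Law: P(leading digit = d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def benford_chi_square(values):
    """Chi-square statistic of observed leading digits against Benford's Law.

    Larger values mean a worse fit. With 8 degrees of freedom, a statistic
    above roughly 15.5 would be flagged at the 5% level.
    """
    digits = [leading_digit(v) for v in values if v != 0]
    n = len(digits)
    counts = Counter(digits)
    return sum(
        (counts.get(d, 0) - n * p) ** 2 / (n * p)
        for d, p in benford_expected().items()
    )


if __name__ == "__main__":
    powers_of_two = [2 ** k for k in range(1, 200)]  # known to follow Benford closely
    suspicious = [9.0] * 100                          # every value starts with 9
    print(benford_chi_square(powers_of_two))
    print(benford_chi_square(suspicious))
```

A high statistic is only a hint that numbers deserve a closer look; data that spans a narrow range or is bounded by design can fail this test without any manipulation.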

Editor's Notes

  1. 4
  2. And processing power, either as raw processor speed or via novel multi-core and many-core architectures, is also continuing to increase exponentially…
  3. … but human cognitive capacity is remaining constant. How can computing technologies help scientists make sense out of these vast and complex data sets?
  4. The challenges stem from the large, noisy, and heterogeneous data more than from collecting the data in the first place. Data scie
  5. Google
  6. So in part as an attempt to relate “eScience” and “data science,” and in part to make sure the idea of data science wasn’t completely taken over by the machine learning people, we ran a massively open online course last Spring called Introduction to Data Science. We taught Scalable Databases, MapReduce, Statistics, Machine Learning, Visualization.
  7. Following a 2014 report entitled “Big Data: Seizing Opportunities, Preserving Values”