Harnessing the power of
data science in the
service of humanity.
Our Purpose: We amplify the impact of social organizations.
Our Customer: Social organizations that have a clear theory of
change for reducing human suffering
Our Competitive Advantage: Our network of pro bono data scientists
Our Product: Data science services, i.e. predictive analytics, machine
learning, AI
Our Style: Human-centered design, jargon-free, accessible
What We Do
DataKind and ICAAD
Classifying UPR Records
ICAAD: International Center for Advocates
Against Discrimination
Non-profit organization that combats structural discrimination through
monitoring global trends, fostering research and designing interventions
Promote religious freedom in France
Combat gender-based violence in the Pacific Islands
Better documentation of hate crimes
Mapping discrimination
What was the DataCorps problem?
We have a database of text records from the United
Nations: the “Universal Periodic Review”
Data: Universal Periodic Review
What was the DataCorps problem?
How do we leverage these UPR records to better
understand human rights conditions across the world?
Labeling with Sustainable Development Goals
Adopted in 2015, the SDGs are a set of seventeen aspirational goals that all UN
member states have committed to achieving, covering a broad range of human
rights and development issues
Successor to the Millennium Development Goals
Task: How do we map a UPR record to one or more SDGs?
Deliverables
1) Build an MVP algorithm that systematically classifies Universal Periodic
Review (UPR) records using Sustainable Development Goals
2) Using the results from the algorithm, create a dashboard that visualizes
global patterns of discrimination
These two tools will enable ICAAD to better allocate their resources towards
the most important human rights interventions, as well as better disseminate
their findings to other related organizations.
What We Did...
UPR Data
Source            Number of Records   Number of Labels
ICAAD labeled     1247                2351
DataKind labeled  349                 628
All-organic, self-harvested and hand-labeled...
Data Prep
1) Each UPR = (very short) document
2) Clean, tokenize and create (1,2)-grams
3) Create term-document matrix
4) Feed bag-of-words matrix into ML model
5) ML model = two step “ensemble”
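As a rough illustration of steps 1 to 3 above, the sketch below tokenizes a few short records, builds (1,2)-grams, and assembles a count-based term-document matrix. The function names `ngrams` and `term_document_matrix` and the cleaning rule (lowercase, keep alphanumeric runs) are assumptions made for this example, not the team's actual code.

```python
import re
from collections import Counter


def ngrams(text, n_values=(1, 2)):
    """Lowercase the text, keep alphanumeric tokens, and return 1- and 2-grams."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    grams = []
    for n in n_values:
        grams.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return grams


def term_document_matrix(docs):
    """Return (vocabulary, matrix) with one row of n-gram counts per document."""
    counts = [Counter(ngrams(doc)) for doc in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[doc_counts[term] for term in vocab] for doc_counts in counts]
    return vocab, matrix


# Toy records, invented for illustration only.
vocab, matrix = term_document_matrix(["Combat corruption", "Combat hate crimes"])
```

Each row of `matrix` is the bag-of-words vector that would be fed to the model in step 4.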
Machine Learning Layer: Multi-Label SVM
Support Vector Machine
Linear (no kernel)
Loss function: Squared hinge
Penalty type: L1
Regularization constant: 2.0
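One way these settings could be expressed is with scikit-learn, sketched below. This is an assumption: the slides do not name the library, so the mapping of "regularization constant" to `C=2.0` and the one-vs-rest handling of multiple labels are guesses for illustration. The records and labels are toy data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Toy records and SDG labels, invented for illustration only.
texts = [
    "combat corruption in public institutions",
    "improve access to HIV treatment and health care",
    "ratify the convention on migrant workers",
    "investigate corruption and strengthen courts",
    "expand health services for migrant workers living with HIV",
    "protect migrant workers from discrimination",
]
sdg_labels = [[16], [3], [10], [16], [3, 10], [10]]

# One indicator column per SDG, so a record can carry several labels.
binarizer = MultiLabelBinarizer()
Y = binarizer.fit_transform(sdg_labels)

model = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)),  # bag of (1,2)-grams
    OneVsRestClassifier(
        # The L1 penalty with squared hinge loss requires the primal form (dual=False).
        LinearSVC(penalty="l1", loss="squared_hinge", dual=False, C=2.0)
    ),
)
model.fit(texts, Y)
predictions = model.predict(texts)  # one row per record, one column per SDG
```

The L1 penalty tends to zero out most n-gram weights, which may help when there are far more n-grams than labeled records.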
Keyword Lookup Layer
If UPR text contains the word “corruption” → SDG #16
If UPR text contains the word “HIV” or “AIDS” → SDG #3
If UPR text contains the word “ICRMW” → SDG #10
And so on...
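The rules above amount to a small lookup table. In the sketch below, the three rules come from the slide; the names `KEYWORD_TO_SDG`, `keyword_labels`, and `ensemble_labels`, and the choice to combine the two layers by taking the union of their labels, are assumptions for illustration.

```python
import re

# Keyword -> SDG number, taken from the three rules listed on the slide.
KEYWORD_TO_SDG = {"corruption": 16, "hiv": 3, "aids": 3, "icrmw": 10}


def keyword_labels(text):
    """Return the SDGs triggered by whole-word keyword matches."""
    # Lowercasing means the verb "aids" also matches; a real rule set
    # might want case-sensitive matching for acronyms.
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    return {sdg for keyword, sdg in KEYWORD_TO_SDG.items() if keyword in tokens}


def ensemble_labels(ml_labels, text):
    """Assumed combination rule: union of ML labels and keyword labels."""
    return set(ml_labels) | keyword_labels(text)


labels = ensemble_labels({5}, "Ratify the ICRMW and address HIV prevalence")
```

A union can only add labels, which is consistent with a layer that raises recall at some cost in precision.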
Final Ensemble Model: CV Metrics
Metric      ML Layer   ML + Keyword Lookup
Precision   0.827      0.772
Recall      0.758      0.848
F1-Score    0.787      0.802
ML Layer by itself does very well, but by adding the Keyword layer,
we can sacrifice a little bit of precision for a large gain in recall,
and get overall better performance.
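The trade-off can be checked with the usual F1 formula, the harmonic mean of precision and recall. The sketch below applies it to the reported precision and recall. The results differ slightly from the reported F1 scores, presumably because the slide's figures are averaged across cross-validation folds or labels; that explanation is an assumption.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


ml_only = f1_score(0.827, 0.758)       # about 0.791 (slide reports 0.787)
ml_plus_lookup = f1_score(0.772, 0.848)  # about 0.808 (slide reports 0.802)
```

Either way, the combined model comes out ahead: the recall gain outweighs the precision loss.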
Dashboard Visualizations: http://52.3.119.223/
The Aftermath Part 1
● Proof-of-concept algorithm delivered last October
● Demonstrated to, and implemented among, various project
partners
The Aftermath Part 2
● Team from Xerox brought in to build v2 of algorithm
○ The 17 main SDG categories contain 169 additional sub-goals
○ ICAAD wants to classify UPR records using these sub-goals
○ Army of volunteer lawyers doing a lot of manual labeling
● Something concrete by next summer!
What We Learned (Parting Shots)
1. Easier = better
2. Small data is hard
3. Simple Boolean logic works surprisingly well
4. Data scientists are paid (and sometimes not) to do the
dirty work
DataCorps Team
Ben Cohen: Software Engineer @ Warby Parker
Rebecca Wei: PhD Student @ Northwestern
Karry Lu: Senior Data Scientist @ Plated
Project Repo
https://github.com/karry-lu/datakind-icaad-model
How do we find evidence?
How do we communicate evidence?
How do we use evidence?
Information overload
Respondents from the UK conservation community
indicated a desire to use evidence, but
lacked a support framework to quickly
sort and evaluate it
Experience-based
Evidence-based
modified from Pullin et al. 2004
Evidence gap
Need for knowledge on effectiveness
Evidence-based decision making:
Using findings to inform actions
Desired outcomes achieved
Research project
Communicate findings
Monitor and evaluate progress and outcomes
Identify knowledge gaps
Synthesize knowledge gluts
Determine indicators
Adjust actions
RESEARCHERS
PRACTITIONERS
Theory of change
The need
Practitioners need standardized
storage and access to research
insights from academic and grey
literature for evidence-based
decision making
Researchers need a framework to
follow to create these resources
Best science
Expert opinion
Society’s needs and preferences
Evidence-based decision-making
Systematic mapping process
Systematic Map
Problem #1: interactivity
Thorn, Jessica PR, et al. "What evidence exists for the
effectiveness of on-farm conservation land
management strategies for preserving ecosystem
services in developing countries? A systematic map."
Environmental Evidence 5.1 (2016): 13.
The Ask
Problem #2: Manual screening
Mapping example
Problem #3: Tools exist
colandrapp.com
Tool framework
System 1: relevance ranking
• Citations are ranked by expected relevance
depending on the availability and number of
user-labeled examples
– 1st uses search terms from review planning:
computes the amount of overlap between those
terms and citations' title + abstract + keywords
– 2nd after enough examples have been labeled,
uses distributional word vectors (word2vec) as
features for a support vector classifier that predicts
inclusion or exclusion; use confidence of that
classification as expected relevance
• Citations are randomly sampled each time, to avoid
hasty generalization
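The first ranking stage described above (overlap between search terms and each citation's title, abstract and keywords) could look roughly like the sketch below. The field names, the function names, and the scoring rule (fraction of single-word search terms found) are assumptions; the second stage, with word2vec features and a support vector classifier, is not shown.

```python
import re

FIELDS = ("title", "abstract", "keywords")  # hypothetical citation field names


def overlap_score(search_terms, citation):
    """Fraction of (single-word) search terms found in the citation's text fields."""
    text = " ".join(citation.get(field, "") for field in FIELDS)
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    terms = {term.lower() for term in search_terms}
    return len(terms & tokens) / len(terms) if terms else 0.0


def rank_citations(search_terms, citations):
    """Sort citations so the highest expected relevance comes first."""
    return sorted(citations, key=lambda c: overlap_score(search_terms, c), reverse=True)
```

Once enough citations have been labeled, the classifier's confidence would replace `overlap_score` as the sort key.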
Unscreened citations: relevance is re-learned every 10
citations, and documents are re-sorted
System 2: extraction and tagging
• A better methodology might be to use the training
data to find sentences in the document that might
indicate a label, which also provides provenance
• We can train the system to over-predict (predict
sentences from a large number of the labels), so
that the system can focus on recall, while human
annotators can focus on precision
• For locations we can use a "Named Entity
Recognition" system to find mentioned locations
in the document, and suggest these as labels
• For other metadata, we can train a model which
predicts the relevance of sentences to a label
• We show the sentences that best predict labels to
the user, who can then use that information to
pick the correct labels
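The suggestion step described above could be sketched as follows. `score_sentence` stands in for a trained sentence-relevance model and is hypothetical, as are the function names and the default `threshold` and `top_k` values. A low threshold makes the system over-predict, leaving precision to the human annotator.

```python
import re


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def suggest_labels(document, score_sentence, labels, threshold=0.2, top_k=2):
    """Map each suggested label to its best supporting sentences (provenance).

    score_sentence(sentence, label) -> float in [0, 1] is a hypothetical model.
    """
    sentences = split_sentences(document)
    suggestions = {}
    for label in labels:
        scored = sorted(((score_sentence(s, label), s) for s in sentences), reverse=True)
        evidence = [(score, s) for score, s in scored[:top_k] if score >= threshold]
        if evidence:
            suggestions[label] = evidence
    return suggestions
```

The user sees each suggested label next to the sentences that triggered it and accepts or rejects the label.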
Data extraction
This process also learns relevance; the threshold is set at
50 reviews before it presents confidence
www.natureandpeopleevidence.org
Interaction with the data portal
• Output CSV file with individual citations and factor tags
• CSV ingested by receiving system
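A hand-off file of this kind could be written with Python's standard `csv` module, as sketched below. The column names (`citation_id`, `title`, `tags`) and the semicolon used to join tags are assumptions; the portal's real schema is not given in the slides.

```python
import csv
import io


def citations_to_csv(citations):
    """Serialize citations and their factor tags to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["citation_id", "title", "tags"], lineterminator="\n"
    )
    writer.writeheader()
    for citation in citations:
        writer.writerow({
            "citation_id": citation["citation_id"],
            "title": citation["title"],
            "tags": ";".join(citation["tags"]),  # one cell holds all factor tags
        })
    return buffer.getvalue()


# Toy record, invented for illustration only.
csv_text = citations_to_csv(
    [{"citation_id": 1, "title": "Agroforestry in Kenya", "tags": ["agroforestry", "Kenya"]}]
)
```

The receiving system would then split the `tags` cell on the same separator when ingesting the file.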
http://natureandpeopleevidence.org/
Measuring evidence synthesis and
dissemination on the “T” impact model
Sector (diffuse) impact as
measured by
• Access and operability
• Common vs. uncommon solution
• Dissemination framework
Organization (deep) impact as
measured by
• Operational efficiency
• Increased productivity
• Expanded service
16 reviews
13 review leads
28 users
Two weeks of soft launch
colandr
Two virtual trainings conducted
Data Portal
~1,400 sessions
8 months
47 registered users
Multiple in-person trainings
“Evidence Based Conservation”
Articles on it: 139
Articles citing them: 2100
How many people use evidence in decision making?
ODSC East 2017: Data Science Models For Good

  • 1.
  • 2. Harnessing the power of data science in the service of humanity.
  • 3. Our Purpose: We amplify the impact of social organizations. Our Customer: Social organizations that have a clear theory of change for reducing human suffering Our Competitive Advantage: Our network of pro bono data scientists Our Product: Data science services, i.e. predictive analytics, machine learning, AI Our Style: Human-centered design, jargon-free, accessible What We Do
  • 4.
  • 5.
  • 7.
  • 11.
  • 13.
  • 14.
  • 15.
  • 17. ICAAD: International Center for Advocates Against Discrimination Non-profit organization that combats structural discrimination through monitoring global trends, fostering research and designing interventions Promote religious freedom in France Combat gender-based violence in the Pacific Islands Better documentation of hate crimes Mapping discrimination
  • 18. 18
  • 19. 19
  • 20. What was the DataCorps problem? We have a database of text records from the United Nations: the “Universal Periodic Review”
  • 21. 21
  • 23. What was the DataCorps problem? How do we leverage these UPR records to better understand human rights conditions across the world?
  • 24. Labeling with Sustainable Development Goals Adopted in 2015, the SDGs are a set of seventeen aspirational goals that all UN member states are committed to achieve, covering a broad range of human rights and development issues Successor to the Millennium Development Goals
  • 25. Task: How do we map a UPR to an SDG(s)?
  • 26. Deliverables 1) Build an MVP algorithm that systematically classifies Universal Periodic Review (UPR) records using Sustainable Development Goals 2) Using the results from the algorithm, create a dashboard that visualizes global patterns of discrimination These two tools will enable ICAAD to better allocate their resources towards the most important human rights interventions, as well as better disseminate their findings to other related organizations.
  • 28. UPR Data (source: number of records, number of labels). ICAAD labeled: 1247 records, 2351 labels. DataKind labeled: 349 records, 628 labels. All-organic, self-harvested and hand-labeled...
  • 29. Data Prep: 1) Each UPR = (very short) document; 2) Clean, tokenize and create (1,2)-grams; 3) Create term-document matrix; 4) Feed bag-of-words matrix into ML model; 5) ML model = two-step “ensemble”
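The data-prep steps above can be sketched in a few lines of standard-library Python. This is only an illustration, not the team's actual pipeline: the tokenizer, the function names (`tokenize`, `ngrams_1_2`, `term_document_matrix`) and the two example records are assumptions made up for this sketch.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed cleaning rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def ngrams_1_2(tokens):
    """Return all unigrams followed by all bigrams (bigrams joined by a space)."""
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams

def term_document_matrix(docs):
    """Build a dense count matrix: one row per document, one column per term."""
    counts = [Counter(ngrams_1_2(tokenize(d))) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[t] for t in vocab] for c in counts]
    return vocab, matrix

# Made-up stand-ins for (very short) UPR records.
docs = ["Combat corruption in public office", "Ratify the ICRMW"]
vocab, matrix = term_document_matrix(docs)
```

Each row of `matrix` is the bag-of-words vector that would be fed to the ML layer. With real data a sparse representation would be preferable to the dense lists used here.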
  • 30. Machine Learning Layer: Multi-Label SVM. Support Vector Machine: linear (no kernel); loss function: squared hinge; penalty type: L1; regularization constant: 2.0
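For reference, one common way to write a linear SVM with squared hinge loss and an L1 penalty is shown below. It assumes that the multi-label problem is handled with one binary classifier per SDG $k$ (one-vs-rest) and that the slide's "regularization constant" is the weight $C$ on the loss term; neither detail is stated in the slides.

```latex
\min_{w_k,\, b_k} \;\; \lVert w_k \rVert_1
  \;+\; C \sum_{i=1}^{n} \max\!\bigl(0,\; 1 - y_{ik}\,(w_k^{\top} x_i + b_k)\bigr)^{2},
\qquad C = 2.0
```

Here $x_i$ is the bag-of-words vector of record $i$ and $y_{ik} \in \{-1, +1\}$ indicates whether record $i$ carries SDG $k$. The L1 term drives many weights to exactly zero, which suits a small data set with a large n-gram vocabulary.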
  • 31. Keyword Lookup Layer: If UPR text contains the word “corruption” → SDG #16; if UPR text contains the word “HIV” or “AIDS” → SDG #3; if UPR text contains the word “ICRMW” → SDG #10; and so on...
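A minimal sketch of such a keyword layer follows. The three rules come from the slide; the remaining rules are not given, so the table is incomplete. Combining the two layers by taking the union of their labels is an assumption, chosen because it matches the reported effect (higher recall, lower precision). The names `KEYWORD_RULES`, `keyword_labels` and `ensemble_labels` are hypothetical.

```python
import re

# Rules listed on the slide; the full rule set ("And so on...") is not given.
KEYWORD_RULES = {
    "corruption": 16,
    "hiv": 3,
    "aids": 3,
    "icrmw": 10,
}

def keyword_labels(text):
    """Return the SDG numbers whose keyword appears as a whole word in the text."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return {sdg for keyword, sdg in KEYWORD_RULES.items() if keyword in words}

def ensemble_labels(ml_labels, text):
    """Combine both layers by union (assumed): a label is kept if either layer proposes it."""
    return set(ml_labels) | keyword_labels(text)
```

For example, `ensemble_labels({5}, "Ratify the ICRMW")` yields SDGs 5 and 10. Note that case-insensitive matching also fires on the verb "aids"; a production rule set would need to handle such collisions.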
  • 32. Final Ensemble Model: CV Metrics. ML Layer: precision 0.827, recall 0.758, F1-score 0.787. ML + Keyword Lookup: precision 0.772, recall 0.848, F1-score 0.802. The ML layer by itself does very well, but by adding the keyword layer we can sacrifice a little precision for a large gain in recall and get better overall performance.
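The precision/recall trade-off can be illustrated with a small multi-label metric function. The slides do not say how the metrics were averaged across labels or folds, so the micro-averaging used here is an assumption, and the label sets are made up.

```python
def micro_prf(true_sets, pred_sets):
    """Micro-averaged precision, recall and F1 over per-record label sets."""
    tp = sum(len(t & p) for t, p in zip(true_sets, pred_sets))
    n_pred = sum(len(p) for p in pred_sets)
    n_true = sum(len(t) for t in true_sets)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_true if n_true else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Made-up example: two records with their true SDG labels.
true = [{16}, {3, 5}]
ml_only = [{16}, {5}]          # misses SDG 3 on the second record
ml_plus_kw = [{16, 10}, {3, 5}]  # recovers SDG 3 but adds a wrong SDG 10

ml_scores = micro_prf(true, ml_only)
ens_scores = micro_prf(true, ml_plus_kw)
```

In this toy case the added layer lowers precision and raises recall, with a net gain in F1, which is the same pattern the slide reports.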
  • 34. The Aftermath Part 1 ● Proof-of-concept algorithm delivered last October ● Demonstrated and implemented among various project partners
  • 35. The Aftermath Part 2 ● Team from Xerox brought in to build v2 of algorithm ○ Main SDG category contains 169 additional sub-goals ○ ICAAD wants to classify UPR records using these sub-goals ○ Army of volunteer lawyers doing a lot of manual labeling ● Something concrete by next summer!
  • 36. What We Learned (Parting Shots) 1. Easier = better 2. Small data is hard 3. Simple Boolean logic works surprisingly well 4. Data scientists are paid (and sometimes not) to do the dirty work
  • 37. DataCorps Team: Ben Cohen, Software Engineer @ Warby Parker; Rebecca Wei, PhD Student @ Northwestern; Karry Lu, Senior Data Scientist @ Plated
  • 42. How do we find evidence? How do we communicate evidence? How do we use evidence?
  • 46. Respondents from the UK conservation community indicate a desire to use evidence but lacked a support framework to quickly sort and evaluate evidence. Evidence gap: experience-based vs. evidence-based (modified from Pullin et al. 2004).
  • 48. Evidence-based decision making (diagram linking RESEARCHERS and PRACTITIONERS; labels): need for knowledge on effectiveness; using findings to inform actions; desired outcomes achieved; research project; communicate findings; monitor and evaluate progress and outcomes; identify knowledge gaps; synthesize knowledge gluts; determine indicators; adjust actions; theory of change.
  • 49. The need: Practitioners need standardized storage and access to research insights from academic and grey literature for evidence-based decision making. Researchers need a framework to follow to create these resources. (Diagram) Evidence-based decision-making: best science, expert opinion, society’s needs and preferences.
  • 52. Problem #1: interactivity Thorn, Jessica PR, et al. "What evidence exists for the effectiveness of on-farm conservation land management strategies for preserving ecosystem services in developing countries? A systematic map." Environmental Evidence 5.1 (2016): 13.
  • 53. The Ask. Problem #2: Manual screening
  • 56. Problem #3: Tools exist
  • 63. System 1: relevance ranking • Citations are ranked by expected relevance, using a method that depends on the availability and number of user-labeled examples – First, it uses the search terms from review planning and computes the amount of overlap between those terms and each citation's title + abstract + keywords – Second, once enough examples have been labeled, it uses distributional word vectors (word2vec) as features for a support vector classifier that predicts inclusion or exclusion, and uses the confidence of that classification as the expected relevance • Citations are randomly sampled each time, to avoid hasty generalization
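The first ranking stage (term overlap) can be sketched as follows. This is an illustration under stated assumptions, not the tool's actual code: search terms are assumed to be single words, the overlap measure is taken to be the fraction of search terms found, and the citation fields and function names are hypothetical. The second stage (word2vec features plus a support vector classifier) is not shown.

```python
import re

def tokens(text):
    """Lowercased alphanumeric tokens of a text, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlap_score(search_terms, citation):
    """Fraction of search terms found in the citation's title + abstract + keywords."""
    text = " ".join(
        [citation["title"], citation["abstract"], " ".join(citation["keywords"])]
    )
    found = tokens(text)
    terms = {t.lower() for t in search_terms}
    return len(terms & found) / len(terms) if terms else 0.0

def rank_citations(search_terms, citations):
    """Sort citations from highest to lowest overlap with the search terms."""
    return sorted(
        citations, key=lambda c: overlap_score(search_terms, c), reverse=True
    )

# Made-up citations for illustration.
citations = [
    {"id": 1, "title": "Urban traffic models",
     "abstract": "Simulation of road networks.", "keywords": ["traffic"]},
    {"id": 2, "title": "Agroforestry and ecosystem services",
     "abstract": "On-farm conservation improves soil.",
     "keywords": ["conservation", "farm"]},
]
ranked = rank_citations(["conservation", "farm", "ecosystem"], citations)
```

Because this stage needs no labeled examples, it can order citations before any screening has been done.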
  • 64. Unscreened citations: relevance is re-learned every 10 citations, and documents are re-sorted
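A screening loop with this "refit every 10 citations" behaviour might look like the sketch below. The function names, the pluggable `label_fn` and `fit_fn` callables, and the toy word-count model are all assumptions for illustration; the real system uses a trained classifier and also samples citations randomly, which is omitted here.

```python
def screen(citations, label_fn, fit_fn, batch_size=10):
    """Label citations in batches; after each batch, refit and re-sort the rest."""
    unscreened = list(citations)
    labeled = []
    refits = 0
    while unscreened:
        batch, unscreened = unscreened[:batch_size], unscreened[batch_size:]
        labeled.extend((c, label_fn(c)) for c in batch)
        score = fit_fn(labeled)  # fit_fn returns a scoring function
        refits += 1
        unscreened.sort(key=score, reverse=True)  # most relevant first
    return labeled, refits

def fit_word_counts(labeled):
    """Toy relevance model: number of words shared with citations marked relevant."""
    included_words = set()
    for text, is_relevant in labeled:
        if is_relevant:
            included_words |= set(text.lower().split())
    return lambda text: len(included_words & set(text.lower().split()))

# Made-up citations: every fifth one is about forests (the relevant topic).
citations = [
    f"paper {i} about forests" if i % 5 == 0 else f"paper {i} about traffic"
    for i in range(25)
]
labeled, refits = screen(citations, lambda t: "forests" in t, fit_word_counts)
```

After the first batch of ten, the remaining forest papers move to the front of the queue, so the reviewer sees likely-relevant citations earlier.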
  • 65. System 2: extraction and tagging • A better methodology might be to use the training data to find sentences in the document that might indicate a label. (provide provenance) • We can train the system to over-predict (predict sentences from a large number of the labels), so that the system can focus on recall, while human annotators can focus on precision • For locations we can use a "Named Entity Recognition" system to find mentioned locations in the document, and suggest these as labels • For other metadata, we can train a model which predicts the relevance of sentences to a label • We show the sentences that best predict labels to the user, who can then use that information to pick the correct labels
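The "over-predict, then let humans pick" idea can be sketched as a low-threshold suggestion step that keeps the best-scoring sentence as provenance for each label. The sentence scores would come from a trained model; here they are made-up numbers, and the function name and the 0.2 threshold are assumptions.

```python
def suggest_labels(sentence_scores, threshold=0.2):
    """Suggest every label whose best sentence scores at least `threshold`.

    sentence_scores maps label -> list of (sentence, score) pairs.
    Returns label -> (best_sentence, best_score), so the annotator sees
    the evidence behind each suggestion.
    """
    suggestions = {}
    for label, scored in sentence_scores.items():
        if not scored:
            continue
        best_sentence, best_score = max(scored, key=lambda pair: pair[1])
        if best_score >= threshold:
            suggestions[label] = (best_sentence, best_score)
    return suggestions

# Made-up model scores for one document.
scores = {
    "intervention: agroforestry": [
        ("Farmers planted shade trees.", 0.62),
        ("Yields were measured.", 0.10),
    ],
    "outcome: soil health": [("Soil carbon increased.", 0.25)],
    "outcome: income": [("The study ran for two years.", 0.05)],
}
suggested = suggest_labels(scores)
```

A deliberately low threshold favours recall; the human annotator then supplies precision by rejecting weak suggestions such as the 0.25 one above if it is wrong.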
  • 66. Data extraction: This process also learns relevance; it is set to require 50 reviews before it presents a confidence
  • 68. Interaction with the data portal • Output CSV file with individual citations and factor tags • CSV ingested by receiving system: http://natureandpeopleevidence.org/
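A CSV export of citations with their factor tags could be produced along these lines. The column names (`citation_id`, `title`, and one column per tag) and the record structure are hypothetical; the actual schema expected by the receiving system is not described in the slides.

```python
import csv
import io

def citations_to_csv(citations, tag_names):
    """Serialize citations to CSV text: one row per citation, one column per tag."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["citation_id", "title"] + tag_names, lineterminator="\n"
    )
    writer.writeheader()
    for c in citations:
        row = {"citation_id": c["id"], "title": c["title"]}
        # Missing tags are written as empty cells.
        row.update({t: c["tags"].get(t, "") for t in tag_names})
        writer.writerow(row)
    return buf.getvalue()

# Made-up citation for illustration.
records = [
    {"id": 1, "title": "Agroforestry, soils and yields", "tags": {"country": "Kenya"}},
]
csv_text = citations_to_csv(records, ["country", "intervention"])
```

Using the `csv` module rather than manual string joins ensures that titles containing commas or quotes are escaped correctly for the ingesting system.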
  • 69. Measuring evidence synthesis and dissemination on the “T” impact model. Sector (diffuse) impact as measured by: • Access and operability • Common vs. uncommon solution • Dissemination framework. Organization (deep) impact as measured by: • Operational efficiency • Increased productivity • Expanded service
  • 70. colandr, two weeks of soft launch: 16 reviews; 13 review leads; 28 users; two virtual trainings conducted
  • 71. Data Portal: ~1,400 sessions; 8 months; 47 registered users; multiple in-person trainings
  • 72. “Evidence Based Conservation”: articles on it: 139; articles citing them: 2100. How many people use evidence in decision making?