Using Citizen Science to organize biomedical knowledge
Andrew Su, Ph.D.
@andrewsu
asu@scripps.edu
http://sulab.org
March 5, 2015
Future of Genomic Medicine
Slides posted at slideshare.net/andrewsu
Candidate genes
FLNB
CTNNB1
EPHA3
SMAD3
XPO1
RPS27
FLCN
ATR
FLT3
BRD2
ERG
RAF1
EGFR
ERBB4
RARA
JAK3
LRP1
WT1
PML
SMARCA4
…
The biomedical literature is growing fast…
[Figure: Number of new PubMed-indexed articles per year, 1983 to 2013; y-axis from 0 to 1,200,000]
… but it is very hard to query and compute
Imatinib
Crizotinib
Erlotinib
Gefitinib
Sorafenib
Lapatinib
Dasatinib
…
Acute myeloid leukemia
Acute lymphoblastic leukemia
Chronic myelogenous leukemia
Chronic lymphocytic leukemia
Hodgkin lymphoma
Non-Hodgkin lymphoma
Myeloma
…
AND
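The drug and disease lists on this slide can be combined into one Boolean keyword query. The following is a minimal sketch, assuming a generic AND/OR syntax with quoted phrases rather than any particular search engine's exact grammar. The helper names `or_group` and `build_query` are hypothetical, and only the terms named on the slide are used, because both lists are truncated with "…".

```python
def or_group(terms):
    """Join terms into a parenthesised OR group, quoting multi-word terms."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(drugs, diseases):
    """Require at least one drug term AND at least one disease term."""
    return or_group(drugs) + " AND " + or_group(diseases)


# Only the terms listed on the slide; the slide's lists continue with "…".
drugs = [
    "Imatinib", "Crizotinib", "Erlotinib", "Gefitinib",
    "Sorafenib", "Lapatinib", "Dasatinib",
]
diseases = [
    "Acute myeloid leukemia", "Acute lymphoblastic leukemia",
    "Chronic myelogenous leukemia", "Chronic lymphocytic leukemia",
    "Hodgkin lymphoma", "Non-Hodgkin lymphoma", "Myeloma",
]

query = build_query(drugs, diseases)
print(query)
```

Even these two short lists cover 49 drug-disease pairs, and a keyword match of this kind says nothing about how a matched drug and disease are related.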
Pathways
Diseases
Proteins
Variants
Genes
Drugs
Goal: Assemble a network of biomedical knowledge that is comprehensive, current, computable and traceable.
Information Extraction
1. Identify high-level concepts in text
2. Identify relationships between concepts
NCBI Disease Corpus
Doğan and Lu. Proceedings of the 2012 Workshop on BioNLP, 2012, 91-9.
• 593 PubMed abstracts
• 12 expert annotators (2 per document)
• 6,900 “disease concept” mentions
Question: Can a group of non-scientists collectively perform concept recognition in biomedical texts?
Amazon Mechanical Turk (AMT)
Requester
Amazon
Workers
1. Create tasks
2. Execute
3. Aggregate
Experimental design
Task: Identify the “disease concepts” in the 593 abstracts from the NCBI Disease Corpus
– $0.06 per Human Intelligence Task (HIT)
– HIT = annotate one abstract from PubMed
– 15 workers annotate each abstract
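With 15 workers per abstract, the individual annotations have to be merged into one answer. The slides do not spell out the aggregation rule, so the following is only a sketch of one plausible scheme: simple vote counting over exact character spans, keeping a span when at least `k` workers marked it. The function name `aggregate_by_votes`, the `(start, end)` span representation, and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def aggregate_by_votes(worker_annotations, k):
    """Return the spans that at least k workers marked.

    worker_annotations: one set of (start, end) character spans per worker.
    Exact-span matching is an assumption of this sketch.
    """
    votes = Counter()
    for spans in worker_annotations:
        votes.update(set(spans))  # each worker contributes one vote per span
    return {span for span, count in votes.items() if count >= k}


# Toy data for 15 hypothetical workers on a single abstract.
workers = (
    [{(10, 24)}] * 9                 # 9 workers mark only the first span
    + [{(10, 24), (40, 52)}] * 4     # 4 workers mark two spans
    + [{(60, 70)}] * 2               # 2 workers mark an unrelated span
)

consensus = aggregate_by_votes(workers, k=6)
print(sorted(consensus))
```

Raising `k` keeps only spans with broad agreement, and lowering it admits spans that fewer workers marked.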
Comparison to gold standard
k = 6
F score = 0.87
• 593 documents
• 15 users / doc
• 9 days
• 145 workers
• $630.96
Precision
Recall
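Precision, recall, and F score against a gold standard can be computed from sets of annotated spans. The sketch below assumes the balanced F score (the harmonic mean of precision and recall) and exact-span matching; the slides say only "F score", so both choices are assumptions, and the spans are made-up toy data rather than results from the experiment.

```python
def precision_recall_f(predicted, gold):
    """Compare predicted spans with gold-standard spans by exact match."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Toy example: 3 of 4 predicted spans are correct, 3 of 4 gold spans are found.
gold = {(10, 24), (40, 52), (80, 95), (100, 110)}
predicted = {(10, 24), (40, 52), (80, 95), (60, 70)}

precision, recall, f_score = precision_recall_f(predicted, gold)
print(precision, recall, f_score)
```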
Comparisons to text-mining algorithms
F score
Text-mining
AMT experiments
Comparisons to human annotators
Average level of agreement between expert annotators (stage 1)
F = 0.76
Comparisons to human annotators
F = 0.76
F = 0.87
Average level of agreement between expert annotators (stage 2)
Does Mechanical Turk scale?
1,000,000 articles per year
× 10 annotators / article
× 4 tasks / doc
× $0.06 / task
= $2,400,000 / year
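The yearly cost estimate is the product of the four figures on this slide. A short check of the arithmetic, working in whole cents to avoid floating-point rounding; the constant names are chosen here for illustration:

```python
ARTICLES_PER_YEAR = 1_000_000
ANNOTATORS_PER_ARTICLE = 10
TASKS_PER_DOC = 4
CENTS_PER_TASK = 6  # $0.06 per task, expressed in cents

total_cents = (
    ARTICLES_PER_YEAR * ANNOTATORS_PER_ARTICLE * TASKS_PER_DOC * CENTS_PER_TASK
)
total_dollars = total_cents // 100

print(f"${total_dollars:,} / year")
```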
Question: Can a group of non-scientists collectively perform concept recognition in biomedical texts, and will they do it for free?
http://mark2cure.org
Mark2Cure Campaign #0
• Goal: replicate the NCBI disease corpus
– 593 documents, 15x redundancy
• Launched Jan 19, 2015
• Completed Feb 16, 2015
– 4 weeks
– 10,275 document annotation events
– 212 unique users
Comparison to gold standard
[Figure: plot with both axes running from 0 to 1; labels: Precision, Recall, Voting threshold]
k = 6
F score = 0.84
Total cost: $0
Does Citizen Science scale?
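AE = Annotation events

$$
\text{volunteers needed}
= \frac{\text{Number of annotation events per year}}{\text{Number of annotation events per year per volunteer}}
= \frac{1{,}000{,}000 \text{ articles} \times 10 \text{ AE / article}}{\dfrac{10{,}275 \text{ AE} \times 365 \text{ days}}{212 \text{ annotators} \times 28 \text{ days}}}
\approx 15{,}828
$$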
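The volunteer estimate can be reproduced from the Campaign #0 figures. The sketch below assumes each volunteer keeps annotating at the average rate observed during the 28-day campaign for a full year, which is the extrapolation the slide's fraction implies; the variable names are chosen here for illustration.

```python
import math

# Observed in Mark2Cure Campaign #0
campaign_events = 10_275      # document annotation events (AE)
campaign_volunteers = 212     # unique users
campaign_days = 28            # 4 weeks

# Target workload
articles_per_year = 1_000_000
events_per_article = 10

events_needed_per_year = articles_per_year * events_per_article

# Average AE per volunteer, extrapolated from 28 days to 365 days
events_per_volunteer_per_year = (
    campaign_events * 365 / (campaign_volunteers * campaign_days)
)

volunteers_needed = math.ceil(
    events_needed_per_year / events_per_volunteer_per_year
)
print(volunteers_needed)
```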
Does Citizen Science scale?
15,828 volunteers needed
175,000 volunteers
300,000 volunteers
37,000 volunteers
1,000,000 volunteers
Annotating the relationships
This molecule inhibits the growth of a broad panel of cancer cell lines, and is particularly efficacious in leukemia cells, including orthotopic leukemia preclinical models as well as in ex vivo acute myeloid leukemia (AML) and chronic lymphocytic leukemia (CLL) patient tumor samples. Thus, inhibition of CDK9 may represent an interesting approach as a cancer therapeutic target especially in hematologic malignancies.
therapeutic target
subject
predicate
object
GENE
DISEASE
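The subject / predicate / object labels above describe a relationship as a triple between two typed concepts. The following is a minimal sketch of such a record. The class names `Concept` and `Relation` are hypothetical, and choosing "hematologic malignancies" as the DISEASE object is an assumption for illustration, since the slide text does not say which disease mention fills that slot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    text: str           # the mention as it appears in the abstract
    concept_type: str   # e.g. "GENE" or "DISEASE"


@dataclass(frozen=True)
class Relation:
    subject: Concept
    predicate: str
    obj: Concept

    def as_triple(self):
        """Return the relation as a plain (subject, predicate, object) tuple."""
        return (self.subject.text, self.predicate, self.obj.text)


cdk9 = Concept("CDK9", "GENE")
# Assumption: which disease mention is the object is not stated on the slide.
disease = Concept("hematologic malignancies", "DISEASE")

relation = Relation(subject=cdk9, predicate="therapeutic target", obj=disease)
print(relation.as_triple())
```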
Candidate genes
FLNB
CTNNB1
EPHA3
SMAD3
XPO1
RPS27
FLCN
ATR
FLT3
BRD2
ERG
RAF1
EGFR
ERBB4
RARA
JAK3
LRP1
WT1
PML
SMARCA4
…
Other group members
Cyrus Afrasiabi
Sebastian Burgstaller
Ramya Gamini
Louis Gioia
Salvatore Loguercio
Adam Mark
Erick Scott
Greg Stupp
Andra Waagmeester
Kevin Xin
Contact
http://sulab.org
asu@scripps.edu
@andrewsu
+Andrew Su
Mark2Cure
Ben Good
Max Nanis
Ginger Tsueng
Chunlei Wu
All Mark2Curators!
Funding and Support
BioGPS: GM83924
Gene Wiki: GM089820
BD2K Center of Excellence: GM114833
Icon credits (Noun Project, Wikimedia Commons): Zach VanDeHey, hunotika, Viktorvoigt, Alberto Rojas, Lloyd Humphreys
Matt and Cristina Might
NGLY1 community
Why do I Mark2Cure?
• I am retired, have a doctorate in medical humanities, and have two children with Gaucher disease. I am just looking for some way to put my education to use.
• My 4 year old daughter Phoebe is living with and battling rare disease.
• I have Ehlers Danlos Syndrome. I hope to help people learn about this painful and debilitating disorder, so that others like me can receive more effective medical care.
• Take part in something that helps humanity.
• I Mark2Cure in memory of my son Mike who had type 1 diabetes.
• Studied biology in college and I really miss it!
• In memory of my daughter who had Cystic Fibrosis
• To give back
