Panel on Citizen Science and Crowdsourcing Games - March 27, 2015

Andrew Su, Professor at The Scripps Research Institute
Gene-specific review article for every human gene
Data integration for genes, drugs, diseases
Robust classifiers of breast cancer prognosis
Annotation of biomedical literature
Expert-guided classifier design
Gene-centric web portal
Bioinformatics algorithm optimization
Andrew Su, Ph.D.
@andrewsu
asu@scripps.edu
http://sulab.org
Slides: slideshare.net/andrewsu
Mark2Cure – biocuration by microtasking
• Challenge: The biomedical literature is massive and growing exponentially, but it is largely inaccessible
• Opportunity: Better access to existing knowledge can make the scientific process more efficient and productive
• Current situation
– Manual biocuration by experts
– Natural language processing
Mark2Cure – biocuration by microtasking
• Our approach: Use the Amazon Mechanical Turk platform for paid microtask crowdsourcing
• Results: Reproduced an expert-generated gold standard at equivalent accuracy, in a shorter time, and at a fraction of the cost
[Figure: precision and recall plot; K = 6, F score = 0.87]
• 593 documents
• 9 days
• 145 workers
• $0.06 / task
• Total cost: $630.96
Mark2Cure – biocuration by citizen science
• Our approach: Use volunteer-based citizen science for microtask crowdsourcing
• Results: Reproduced an expert-generated gold standard at equivalent accuracy, in a shorter time, and at no cost
• 593 documents
• 28 days
• 212 workers
• Total cost: $0.00
[Figure: precision and recall plot labeled with voting threshold; k = 6, F score = 0.84]
http://mark2cure.org
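The slides report an F score at a voting threshold of k = 6 but do not spell out how individual workers' annotations were combined or matched against the gold standard. The sketch below shows one common way to do it, under assumptions that are not from the slides: each annotation is a (document id, text) pair, matching is exact, and an annotation is kept when at least k workers submitted it. The function names and the toy data are hypothetical.

```python
from collections import Counter


def aggregate_by_votes(worker_annotations, k):
    """Keep every annotation submitted by at least k distinct workers.

    worker_annotations: one iterable of hashable annotations per worker.
    Assumption: exact-match voting; the slides do not describe the real rule.
    """
    votes = Counter()
    for annotations in worker_annotations:
        # set() ensures each worker votes at most once per annotation
        votes.update(set(annotations))
    return {annotation for annotation, n in votes.items() if n >= k}


def precision_recall_f(predicted, gold):
    """Return (precision, recall, F score) of a predicted set against a gold set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Toy data (hypothetical): three workers annotating two documents.
gold = {("d1", "breast cancer"), ("d1", "anemia"), ("d2", "asthma")}
workers = [
    [("d1", "breast cancer"), ("d1", "anemia")],
    [("d1", "breast cancer"), ("d2", "asthma"), ("d2", "cough")],
    [("d1", "breast cancer"), ("d2", "asthma")],
]

for k in (1, 2, 3):
    predicted = aggregate_by_votes(workers, k)
    p, r, f = precision_recall_f(predicted, gold)
    print(f"k={k}: precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

Raising k trades recall for precision in this toy setting, which is the kind of trade-off a precision/recall plot over voting thresholds is meant to show.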
Collaborative knowledge management
• Challenge: Biomedical research allows for genome-scale profiling, but few of the resulting genes are already familiar to the researcher
• Opportunity: Better access to existing knowledge can make the scientific process more efficient and productive
• Current situation
– Review articles (but sparse coverage)
– Lots of reading of primary literature
Collaborative knowledge management
• Our approach: Create a gene-specific review article for every human gene that is collaboratively written, continuously updated, and community reviewed
• Results: 5M page views and >1000 edits per month
Collaborative knowledge management
• Our approach: Create a gene-specific Wikidata database entry for every human gene that is collaboratively integrated, continuously updated, and community reviewed
• Results: All human genes and diseases are loaded in Wikidata, with drugs and relationships to follow soon
Bioinformatics algorithm optimization
• Challenge: Antibody sequence clustering is computationally expensive (CPU and memory)
• Opportunity: Large-scale clustering of antibody sequences can aid vaccine development
• Current situation: Research-grade code can cluster ~100k sequences in 1.7 hours on a high-memory (150 GB) machine
Bioinformatics algorithm optimization
• Our approach: Ran a TopCoder contest for 10 days, offering $7500 in prize money
• Results: The best solution can cluster 2.3M sequences in 30 seconds on a typical desktop computer (1.1 GB)
[Figure: Benchmarks; axes: log(# sequences processed) and log(execution time)]
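To put the before-and-after figures on one scale, the arithmetic below converts both runs to sequences per second. It assumes the two reported runs are directly comparable, which the slides do not establish (the hardware differs, and clustering cost need not scale linearly with input size), so the ratios are rough indications only. The function and variable names are illustrative.

```python
def throughput(n_sequences, seconds):
    """Sequences clustered per second for one reported run."""
    return n_sequences / seconds


# Figures quoted on the slides.
baseline = throughput(100_000, 1.7 * 3600)  # ~100k sequences in 1.7 hours
contest = throughput(2_300_000, 30)         # 2.3M sequences in 30 seconds

speedup = contest / baseline                # ratio of sequences per second
memory_ratio = 150 / 1.1                    # 150 GB machine vs 1.1 GB

print(f"baseline: {baseline:.1f} sequences/s")
print(f"contest:  {contest:.0f} sequences/s")
print(f"throughput ratio: about {speedup:.0f}x")
print(f"memory ratio:     about {memory_ratio:.0f}x")
```

Under these assumptions the contest solution processes roughly 4,700 times more sequences per second while using on the order of one hundredth of the memory.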
Other group members
Cyrus Afrasiabi
Ramya Gamini
Louis Gioia
Salvatore Loguercio
Adam Mark
Erick Scott
Greg Stupp
Kevin Xin
Contact
http://sulab.org
asu@scripps.edu
@andrewsu
+Andrew Su
Mark2Cure
Ben Good
Max Nanis
Ginger Tsueng
Chunlei Wu
All Mark2Curators!
Funding and Support
BioGPS: GM83924
Gene Wiki: GM089820
BD2K Center of Excellence: GM114833
Gene Wiki
Ben Good
Sebastian Burgstaller
Andra Waagmeester
Elvira Mitraka, UMB
Lynn Schriml, UMB
Paul Pavlidis, UBC
Gang Fu, NCBI
Contests
Chunlei Wu
Ben Good
Brian Briney, TSRI
Dennis Burton, TSRI
Rinat Sergeev, HBS
Jin Paik, HBS
Karim Laklani, HBS
Jingbo Shang
Rashid Sial, Appirio
Join the team! bit.ly/sulabawesome
Game for breast cancer prognosis
• Challenge: Genomic classifiers of disease are difficult to train in a way that consistently validates on secondary datasets
• Opportunity: Better classifiers of disease diagnosis and/or prognosis have many clinical applications
• Current situation: Most attempts to train classifiers rely on machine learning methods that utilize little or no biological knowledge
Game for breast cancer prognosis
• Our approach: Enlist a crowd of expert game players with diverse perspectives to identify the most biologically relevant genes
• Results: Gene sets derived from game player data showed comparable performance to expert-generated gene sets
• 1077 registered players
• 15,669 games played
• Demographics
– 59% male, 41% female
– 21-29 is the most frequent age group
– 35% had a graduate degree, 32% were biologists