SlideShare a Scribd company logo
The Gene Wiki: Crowdsourcing human gene
               annotation

                    Andrew Su, Ph.D.
      Department of Molecular and Experimental Medicine
               The Scripps Research Institute

                    Biocuration 2012

                       April 2, 2012
2
The Long Tail is a prolific source of content


                      Short
                      Head
            Content
           produced


                                      Long Tail



                              Contributors (sorted)




            News :      Newspapers                 Blogs
             Video:     TV/Hollywood              YouTube
  Product reviews:    Consumer reports         Amazon reviews
    Food reviews:        Food critics               Yelp
    Talent judging:       Olympics              American Idol
  Gene annotation:     Manual curation           Gene Wiki
3




  We can harness the
Long Tail of scientists
to directly participate in
  the gene annotation
        process.
4
Wikipedia is reasonably accurate
5
Wikipedia has breadth and depth



           Articles




            Words
            (millions)



                         Wikipedia       Britannica
                                          Online




                                     http://en.wikipedia.org/wiki/Wikipedia:Size_comparisons, July 2008
Filtering, extracting, and summarizing PubMed



Documents




 Concepts
7
Wiki success depends on a positive feedback

                  Gene wiki page utility




                             1   100
                         2             200




    Number of                                Number of
   contributors                                users
8
 10,000 gene “stubs” within Wikipedia          Utility




                                                         Users

                                        Contributors



                                         Protein structure
    Gene
  summary
                                          Symbols and
                                           identifiers


                                         Gene Ontology
                                          annotations
   Protein
interactions

                                        Tissue expression
  Linked                                     pattern
references

                                         Links to structured
                                             databases



Huss, PLoS Biol, 2008
9
 Gene Wiki has a critical mass of readers
                                                                      Utility




                                                                                Users
                                                               Contributors
                                         Total: ~4.3 million
                                           views / month




Huss, PLoS Biol, 2008; Good, NAR, 2011
10
 Gene Wiki has a critical mass of editors
                                                                                  Utility



                  ~10,000 words added / month
                                                                                            Users
                                                                            Contributors
                         Total 1.42 million words
                            ≈ 230 full-length articles


                    4.3 million views / month


                                                         Cumulative edits
                                                                                  Productive
                                                                                     edits
                             1000 edits / month



                                                                                Vandalism


Good, NAR, 2011
11
A review article for every gene is powerful




      Reelin: 68 editors, 543 edits since July 2002
      Heparin: 175 editors, 320 edits since June 2003
      AMPK: 44 editors, 84 edits since March 2004
      RNAi: 232 editors, 708 edits since October 2002
                                          References to the literature
         Hyperlinks to related concepts
12
Making the Gene Wiki more computable



Free text       Structured annotations
13
Filling the gaps in gene annotation

                                   NCBI Entrez Gene: 3362



                       Gene Wiki
                       mapping


          Wikilink                    Candidate
                                      assertion

                                   GO:0004993



                       GO exact
                       synonym
14
Filling the gaps in gene annotation

                                   NCBI Entrez Gene: 334



                       Gene Wiki
                       mapping


          Wikilink                    Candidate
                                      assertion

                                   GO:0006897



                       GO exact
                        match
Disease associations mined from the Gene Wiki
                                        Good, BMC Genomics 2011, 12:603


  Gene Wiki Articles
      (10,271)                               23% exact
                                               match


      Filter out                                    5% match
     seeded text                                     parent
                                                    2% match
                                                      child
                       70% have
       NCBO
                       no match
     Annotator



  Matched Disease                        2147
                        Compare to
  Ontology terms                       candidate
                        DO database
      (2983)                          annotations
Disease associations mined from the Gene Wiki
                                             Good, BMC Genomics 2011, 12:603




                        Expert curation




                                          Correct
       Incorrect: 10%            86%

           Maybe: 4%                       Overall specificity: 90-93%
GO associations mined from the Gene Wiki
                                        Good, BMC Genomics 2011, 12:603


  Gene Wiki Articles
      (10,271)                                 17% exact
                                                 match


      Filter out
     seeded text                                     26% match
                                                       parent



                       55% have
       NCBO            no match
     Annotator                               2% match
                                               child


   Matched Gene                          6319
                        Compare to
   Ontology terms                      candidate
                        GO database
      (11,022)                        annotations
GO associations mined from the Gene Wiki
                                              Good, BMC Genomics 2011, 12:603




                      Expert curation



                                    Correct

                              14%
                                        Maybe
                       60%      26%
          Incorrect
                                         Overall specificity: 48-64%
19
Common sources of error in GO associations
                                                        Good, BMC Genomics 2011, 12:603



       1) Incorrect concept recognition
            OR2F1: “Olfactory receptors … are
            responsible for the recognition and G protein-
            mediated transduction of odorant signals.”

 Signal transduction (GO:0007165)          Transduction (GO:0009293)
 The cellular process in which a signal    The transfer of genetic information to a
 is conveyed to trigger a change in the    bacterium from a bacteriophage or
 activity or state of a cell. Signal       between bacterial or yeast cells
 transduction begins with reception of a   mediated by a phage vector.
 signal, e.g. a ligand binding to a
 receptor or receptor activation by a
 stimulus such as light, and ends with
 regulation of a downstream cellular
 process…
20
Common sources of error in GO associations
                                         Good, BMC Genomics 2011, 12:603



    2) Incorrect sentence context
        MEF2C: “Several post translational
        modifications have been identified including
        phosphorylation on serine-59 …”

                                          Dephosphorylation
                                          Excretion
                    Phosporylation        Gene expression
                                          Glycosylation
                                          Localization
       MEF2C         Neurogenesis         Methylation
                                          Proteolysis
                                          Secretion
                                          Transport
      Myelination                         Transcription
                                          Translation
21
Novel GO annotations – so what?




                 6319
  11,022                                 ~100,000
                “novel”    4703 (43%)
annotations                             annotations
              annotations match known
mined from                               from GO
              @ 48-64% annotations
 Gene Wiki                              consortium
               specificity
22
Gene Wiki content improves enrichment analysis
    axon                                            Enrichment
  guidance     GO term
                                                     analysis
(GO:0007411)

                                     811 articles

 264 genes                           PubMed          Concept
               Gene list
                                     abstracts      recognition




                     GO:0007411
                      Yes    No
Linked genes   Yes     13        2
   through
               No     251   12033
   PubMed

                 P = 1.55 E-20
23
Gene Wiki content improves enrichment analysis
   muscle                                          Enrichment
 contraction   GO term
                                                    analysis
(GO:0006936)

                                 251 articles

  87 genes                      PubMed              Concept
               Gene list
                                abstracts          recognition
                                     +
                                Gene Wiki
                                 87 articles
                   GO:0006936                     GO:0006936


Linked genes                       Linked genes
   through                            through
   PubMed                            PubMed +
                                     Gene Wiki
                   P = 1.0                        P = 1.22 E-09
24
Gene Wiki content improves enrichment analysis



                     More
    p-value       significant
(PubMed + GW)    PubMed only

                                                  Muscle
                                                contraction



                                     More
                                  significant
                                 PubMed + GW




                   p-value (PubMed only)
25
Challenges and future directions


   • How to complement and integrate with
     traditional biocuration workflows?
   • How to disseminate and utilize
     crowdsourced annotations?
26




          The
 Long Tail of scientists
is a valuable source of
  information on gene
        function
27
       Collaborators                                                  Group members
Doug Howe, ZFIN                                             Erik Clarke       Ian Macleod
John Hogenesch, U Penn
Jon Huss, GNF
                                                            Ben Good (*)      Chunlei Wu
Luca de Alfaro, UCSC                                        Salvatore Loguercio
Angel Pizzaro, U Penn
Faramarz Valafar, SDSU
Pierre Lindenbaum,
      Fondation Jean Dausset
Michael Martone, Rush                                  See poster # 30 for more on
Konrad Koehler, Karo Bio
Warren Kibbe, Simon Lim, Northwestern                      the Gene Wiki and
Many Wikipedia editors                                  crowdsourcing in biology!
    WP:MCB Project



                                                                                     Contact
                                                                                 http://sulab.org
                                                                                asu@scripps.edu
                                                                                  @andrewsu
                                                                                  +Andrew Su



                                        Funding and Support



                                   (BioGPS: GM83924, Gene Wiki: GM089820)
28
Making the Gene Wiki more reliable
  Novartis is a multinational   2       The company name is derived
  pharmaceutical company                 from old Greek, and means
 based in Basel, Switzerland                 "destroyer of birds".
that manufactures drugs such
         as clozapine
     (Clozaril), diclofenac
         (Voltaren), …

                                    2
29
Making the Gene Wiki more reliable
  Novartis is a multinational         2         The company name is derived
  pharmaceutical company                         from old Greek, and means
 based in Basel, Switzerland                         "destroyer of birds".
that manufactures drugs such
   as clozapine (Clozaril),
   diclofenac (Voltaren), …




              36211 total edits              36 total edits

                                  *                                          *
                                  *
                                  *
                                  *                                          *
                                  *
                                  *                                          *
                                  *
                                  *                                          *
                                  *                                          *

          High-trust author               Low-trust author
                                                      http://www.wikitrust.net/

More Related Content

Similar to ISB2012: The Gene Wiki: Crowdsourcing human gene annotation

NCBO Webinar: Translating unstructured, crowdsourced content into structured ...
NCBO Webinar: Translating unstructured, crowdsourced content into structured ...NCBO Webinar: Translating unstructured, crowdsourced content into structured ...
NCBO Webinar: Translating unstructured, crowdsourced content into structured ...
Andrew Su
 
A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...
A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...
A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...
Andrew Su
 
Wikipedia as an engine for scientific communication and collaboration at mass...
Wikipedia as an engine for scientific communication and collaboration at mass...Wikipedia as an engine for scientific communication and collaboration at mass...
Wikipedia as an engine for scientific communication and collaboration at mass...
Andrew Su
 
bioinformatics enabling knowledge generation from agricultural omics data
bioinformatics enabling knowledge generation from agricultural omics databioinformatics enabling knowledge generation from agricultural omics data
bioinformatics enabling knowledge generation from agricultural omics data
International Institute of Tropical Agriculture
 
BioCuration 2019 - Evidence and Conclusion Ontology 2019 Update
BioCuration 2019 - Evidence and Conclusion Ontology 2019 UpdateBioCuration 2019 - Evidence and Conclusion Ontology 2019 Update
BioCuration 2019 - Evidence and Conclusion Ontology 2019 Update
dolleyj
 
Introduction to Ontologies for Environmental Biology
Introduction to Ontologies for Environmental BiologyIntroduction to Ontologies for Environmental Biology
Introduction to Ontologies for Environmental Biology
Barry Smith
 
How Bio Ontologies Enable Open Science
How Bio Ontologies Enable Open ScienceHow Bio Ontologies Enable Open Science
How Bio Ontologies Enable Open Sciencedrnigam
 
20120717 ismb2012
20120717 ismb201220120717 ismb2012
20120717 ismb2012
anewgene
 
BioPortal: ontologies and integrated data resources at the click of a mouse
BioPortal: ontologies and integrated data resourcesat the click of a mouseBioPortal: ontologies and integrated data resourcesat the click of a mouse
BioPortal: ontologies and integrated data resources at the click of a mouse
INRAE (MISTEA) and University of Montpellier (LIRMM)
 
20120220 Tri-Con Cloud Computing Symposium
20120220 Tri-Con Cloud Computing Symposium20120220 Tri-Con Cloud Computing Symposium
20120220 Tri-Con Cloud Computing SymposiumAndrew Su
 
Semantic Web for 360-degree Health: State-of-the-Art & Vision for Better Inte...
Semantic Web for 360-degree Health: State-of-the-Art & Vision for Better Inte...Semantic Web for 360-degree Health: State-of-the-Art & Vision for Better Inte...
Semantic Web for 360-degree Health: State-of-the-Art & Vision for Better Inte...
Amit Sheth
 
BioWikis BSB10
BioWikis BSB10BioWikis BSB10
BioWikis BSB10
Dan Bolser
 
Integrate Ontologies into your apps
Integrate Ontologies into your appsIntegrate Ontologies into your apps
Integrate Ontologies into your apps
IRIDA_community
 
Knowledge Exchange, Nov 2011, Bonn
Knowledge Exchange, Nov 2011, BonnKnowledge Exchange, Nov 2011, Bonn
Knowledge Exchange, Nov 2011, Bonn
Todd Vision
 
Bio-ontologies in bioinformatics: Growing up challenges
Bio-ontologies in bioinformatics: Growing up challengesBio-ontologies in bioinformatics: Growing up challenges
Bio-ontologies in bioinformatics: Growing up challenges
Janna Hastings
 
Web Apollo: Lessons learned from community-based biocuration efforts.
Web Apollo: Lessons learned from community-based biocuration efforts.Web Apollo: Lessons learned from community-based biocuration efforts.
Web Apollo: Lessons learned from community-based biocuration efforts.
Monica Munoz-Torres
 
Species pages and portals
Species pages and portals Species pages and portals
Species pages and portals
Cyndy Parr
 
Knowledge Organization System (KOS) for biodiversity information resources, G...
Knowledge Organization System (KOS) for biodiversity information resources, G...Knowledge Organization System (KOS) for biodiversity information resources, G...
Knowledge Organization System (KOS) for biodiversity information resources, G...
Dag Endresen
 
Masson - ViralZone
Masson - ViralZoneMasson - ViralZone
Masson - ViralZone
Pascale Gaudet
 

Similar to ISB2012: The Gene Wiki: Crowdsourcing human gene annotation (20)

NCBO Webinar: Translating unstructured, crowdsourced content into structured ...
NCBO Webinar: Translating unstructured, crowdsourced content into structured ...NCBO Webinar: Translating unstructured, crowdsourced content into structured ...
NCBO Webinar: Translating unstructured, crowdsourced content into structured ...
 
A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...
A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...
A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...
 
Wikipedia as an engine for scientific communication and collaboration at mass...
Wikipedia as an engine for scientific communication and collaboration at mass...Wikipedia as an engine for scientific communication and collaboration at mass...
Wikipedia as an engine for scientific communication and collaboration at mass...
 
bioinformatics enabling knowledge generation from agricultural omics data
bioinformatics enabling knowledge generation from agricultural omics databioinformatics enabling knowledge generation from agricultural omics data
bioinformatics enabling knowledge generation from agricultural omics data
 
BioCuration 2019 - Evidence and Conclusion Ontology 2019 Update
BioCuration 2019 - Evidence and Conclusion Ontology 2019 UpdateBioCuration 2019 - Evidence and Conclusion Ontology 2019 Update
BioCuration 2019 - Evidence and Conclusion Ontology 2019 Update
 
Introduction to Ontologies for Environmental Biology
Introduction to Ontologies for Environmental BiologyIntroduction to Ontologies for Environmental Biology
Introduction to Ontologies for Environmental Biology
 
Wikis at work
Wikis at workWikis at work
Wikis at work
 
How Bio Ontologies Enable Open Science
How Bio Ontologies Enable Open ScienceHow Bio Ontologies Enable Open Science
How Bio Ontologies Enable Open Science
 
20120717 ismb2012
20120717 ismb201220120717 ismb2012
20120717 ismb2012
 
BioPortal: ontologies and integrated data resources at the click of a mouse
BioPortal: ontologies and integrated data resourcesat the click of a mouseBioPortal: ontologies and integrated data resourcesat the click of a mouse
BioPortal: ontologies and integrated data resources at the click of a mouse
 
20120220 Tri-Con Cloud Computing Symposium
20120220 Tri-Con Cloud Computing Symposium20120220 Tri-Con Cloud Computing Symposium
20120220 Tri-Con Cloud Computing Symposium
 
Semantic Web for 360-degree Health: State-of-the-Art & Vision for Better Inte...
Semantic Web for 360-degree Health: State-of-the-Art & Vision for Better Inte...Semantic Web for 360-degree Health: State-of-the-Art & Vision for Better Inte...
Semantic Web for 360-degree Health: State-of-the-Art & Vision for Better Inte...
 
BioWikis BSB10
BioWikis BSB10BioWikis BSB10
BioWikis BSB10
 
Integrate Ontologies into your apps
Integrate Ontologies into your appsIntegrate Ontologies into your apps
Integrate Ontologies into your apps
 
Knowledge Exchange, Nov 2011, Bonn
Knowledge Exchange, Nov 2011, BonnKnowledge Exchange, Nov 2011, Bonn
Knowledge Exchange, Nov 2011, Bonn
 
Bio-ontologies in bioinformatics: Growing up challenges
Bio-ontologies in bioinformatics: Growing up challengesBio-ontologies in bioinformatics: Growing up challenges
Bio-ontologies in bioinformatics: Growing up challenges
 
Web Apollo: Lessons learned from community-based biocuration efforts.
Web Apollo: Lessons learned from community-based biocuration efforts.Web Apollo: Lessons learned from community-based biocuration efforts.
Web Apollo: Lessons learned from community-based biocuration efforts.
 
Species pages and portals
Species pages and portals Species pages and portals
Species pages and portals
 
Knowledge Organization System (KOS) for biodiversity information resources, G...
Knowledge Organization System (KOS) for biodiversity information resources, G...Knowledge Organization System (KOS) for biodiversity information resources, G...
Knowledge Organization System (KOS) for biodiversity information resources, G...
 
Masson - ViralZone
Masson - ViralZoneMasson - ViralZone
Masson - ViralZone
 

More from Andrew Su

Building and mining a heterogeneous biomedical knowledge graph
Building and mining a heterogeneous biomedical knowledge graphBuilding and mining a heterogeneous biomedical knowledge graph
Building and mining a heterogeneous biomedical knowledge graph
Andrew Su
 
Wikidata as a FAIR knowledge graph for the life sciences
Wikidata as a FAIR knowledge graph for the life sciencesWikidata as a FAIR knowledge graph for the life sciences
Wikidata as a FAIR knowledge graph for the life sciences
Andrew Su
 
The Gene Wiki: Using Wikipedia and Wikidata to organize biomedical knowledge
The Gene Wiki: Using Wikipedia and Wikidata to organize biomedical knowledgeThe Gene Wiki: Using Wikipedia and Wikidata to organize biomedical knowledge
The Gene Wiki: Using Wikipedia and Wikidata to organize biomedical knowledge
Andrew Su
 
BOSC2017: Using Wikidata as an open, community-maintained database of biomedi...
BOSC2017: Using Wikidata as an open, community-maintained database of biomedi...BOSC2017: Using Wikidata as an open, community-maintained database of biomedi...
BOSC2017: Using Wikidata as an open, community-maintained database of biomedi...
Andrew Su
 
WikiGenomes Poster (ISMB)
WikiGenomes Poster (ISMB)WikiGenomes Poster (ISMB)
WikiGenomes Poster (ISMB)
Andrew Su
 
The case for an open biomedical knowledgebase
The case for an open biomedical knowledgebaseThe case for an open biomedical knowledgebase
The case for an open biomedical knowledgebase
Andrew Su
 
Open data, compound repurposing, and rare diseases (ISCB)
Open data, compound repurposing, and rare diseases (ISCB)Open data, compound repurposing, and rare diseases (ISCB)
Open data, compound repurposing, and rare diseases (ISCB)
Andrew Su
 
Open data, compound repurposing, and rare diseases -- Point Loma Nazarene Uni...
Open data, compound repurposing, and rare diseases -- Point Loma Nazarene Uni...Open data, compound repurposing, and rare diseases -- Point Loma Nazarene Uni...
Open data, compound repurposing, and rare diseases -- Point Loma Nazarene Uni...
Andrew Su
 
Citizen Science and Rare Disease Research
Citizen Science and Rare Disease ResearchCitizen Science and Rare Disease Research
Citizen Science and Rare Disease Research
Andrew Su
 
Open biomedical knowledge using crowdsourcing and citizen science
Open biomedical knowledge using crowdsourcing and citizen scienceOpen biomedical knowledge using crowdsourcing and citizen science
Open biomedical knowledge using crowdsourcing and citizen science
Andrew Su
 
Heart BD2K, Biocuration, and Citizen Science
Heart BD2K, Biocuration, and Citizen ScienceHeart BD2K, Biocuration, and Citizen Science
Heart BD2K, Biocuration, and Citizen Science
Andrew Su
 
Panel on Citizen Science and Crowdsourcing Games - March 27, 2015
Panel on Citizen Science and Crowdsourcing Games - March 27, 2015Panel on Citizen Science and Crowdsourcing Games - March 27, 2015
Panel on Citizen Science and Crowdsourcing Games - March 27, 2015
Andrew Su
 
Using Citizen Science to organize biomedical knowledge
Using Citizen Science to organize biomedical knowledgeUsing Citizen Science to organize biomedical knowledge
Using Citizen Science to organize biomedical knowledge
Andrew Su
 
UCSD / DBMI seminar 2015-02-6
UCSD / DBMI seminar 2015-02-6UCSD / DBMI seminar 2015-02-6
UCSD / DBMI seminar 2015-02-6
Andrew Su
 
Crowdsourcing and Learning from Crowd Data (Tutorial @ PSB2015)
Crowdsourcing and Learning from Crowd Data (Tutorial @ PSB2015)Crowdsourcing and Learning from Crowd Data (Tutorial @ PSB2015)
Crowdsourcing and Learning from Crowd Data (Tutorial @ PSB2015)
Andrew Su
 
Microtask crowdsourcing for annotating diseases in PubMed abstracts (ASHG 2014)
Microtask crowdsourcing for annotating diseases in PubMed abstracts (ASHG 2014)Microtask crowdsourcing for annotating diseases in PubMed abstracts (ASHG 2014)
Microtask crowdsourcing for annotating diseases in PubMed abstracts (ASHG 2014)
Andrew Su
 
Crowdsourcing Biology: The Gene Wiki, BioGPS, and Citizen Science
Crowdsourcing Biology: The Gene Wiki, BioGPS, and Citizen ScienceCrowdsourcing Biology: The Gene Wiki, BioGPS, and Citizen Science
Crowdsourcing Biology: The Gene Wiki, BioGPS, and Citizen Science
Andrew Su
 
Centralized Model Organism Database (Biocuration 2014 poster)
Centralized Model Organism Database (Biocuration 2014 poster)Centralized Model Organism Database (Biocuration 2014 poster)
Centralized Model Organism Database (Biocuration 2014 poster)
Andrew Su
 
Crowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.org
Crowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.orgCrowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.org
Crowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.org
Andrew Su
 
GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)
GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)
GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)
Andrew Su
 

More from Andrew Su (20)

Building and mining a heterogeneous biomedical knowledge graph
Building and mining a heterogeneous biomedical knowledge graphBuilding and mining a heterogeneous biomedical knowledge graph
Building and mining a heterogeneous biomedical knowledge graph
 
Wikidata as a FAIR knowledge graph for the life sciences
Wikidata as a FAIR knowledge graph for the life sciencesWikidata as a FAIR knowledge graph for the life sciences
Wikidata as a FAIR knowledge graph for the life sciences
 
The Gene Wiki: Using Wikipedia and Wikidata to organize biomedical knowledge
The Gene Wiki: Using Wikipedia and Wikidata to organize biomedical knowledgeThe Gene Wiki: Using Wikipedia and Wikidata to organize biomedical knowledge
The Gene Wiki: Using Wikipedia and Wikidata to organize biomedical knowledge
 
BOSC2017: Using Wikidata as an open, community-maintained database of biomedi...
BOSC2017: Using Wikidata as an open, community-maintained database of biomedi...BOSC2017: Using Wikidata as an open, community-maintained database of biomedi...
BOSC2017: Using Wikidata as an open, community-maintained database of biomedi...
 
WikiGenomes Poster (ISMB)
WikiGenomes Poster (ISMB)WikiGenomes Poster (ISMB)
WikiGenomes Poster (ISMB)
 
The case for an open biomedical knowledgebase
The case for an open biomedical knowledgebaseThe case for an open biomedical knowledgebase
The case for an open biomedical knowledgebase
 
Open data, compound repurposing, and rare diseases (ISCB)
Open data, compound repurposing, and rare diseases (ISCB)Open data, compound repurposing, and rare diseases (ISCB)
Open data, compound repurposing, and rare diseases (ISCB)
 
Open data, compound repurposing, and rare diseases -- Point Loma Nazarene Uni...
Open data, compound repurposing, and rare diseases -- Point Loma Nazarene Uni...Open data, compound repurposing, and rare diseases -- Point Loma Nazarene Uni...
Open data, compound repurposing, and rare diseases -- Point Loma Nazarene Uni...
 
Citizen Science and Rare Disease Research
Citizen Science and Rare Disease ResearchCitizen Science and Rare Disease Research
Citizen Science and Rare Disease Research
 
Open biomedical knowledge using crowdsourcing and citizen science
Open biomedical knowledge using crowdsourcing and citizen scienceOpen biomedical knowledge using crowdsourcing and citizen science
Open biomedical knowledge using crowdsourcing and citizen science
 
Heart BD2K, Biocuration, and Citizen Science
Heart BD2K, Biocuration, and Citizen ScienceHeart BD2K, Biocuration, and Citizen Science
Heart BD2K, Biocuration, and Citizen Science
 
Panel on Citizen Science and Crowdsourcing Games - March 27, 2015
Panel on Citizen Science and Crowdsourcing Games - March 27, 2015Panel on Citizen Science and Crowdsourcing Games - March 27, 2015
Panel on Citizen Science and Crowdsourcing Games - March 27, 2015
 
Using Citizen Science to organize biomedical knowledge
Using Citizen Science to organize biomedical knowledgeUsing Citizen Science to organize biomedical knowledge
Using Citizen Science to organize biomedical knowledge
 
UCSD / DBMI seminar 2015-02-6
UCSD / DBMI seminar 2015-02-6UCSD / DBMI seminar 2015-02-6
UCSD / DBMI seminar 2015-02-6
 
Crowdsourcing and Learning from Crowd Data (Tutorial @ PSB2015)
Crowdsourcing and Learning from Crowd Data (Tutorial @ PSB2015)Crowdsourcing and Learning from Crowd Data (Tutorial @ PSB2015)
Crowdsourcing and Learning from Crowd Data (Tutorial @ PSB2015)
 
Microtask crowdsourcing for annotating diseases in PubMed abstracts (ASHG 2014)
Microtask crowdsourcing for annotating diseases in PubMed abstracts (ASHG 2014)Microtask crowdsourcing for annotating diseases in PubMed abstracts (ASHG 2014)
Microtask crowdsourcing for annotating diseases in PubMed abstracts (ASHG 2014)
 
Crowdsourcing Biology: The Gene Wiki, BioGPS, and Citizen Science
Crowdsourcing Biology: The Gene Wiki, BioGPS, and Citizen ScienceCrowdsourcing Biology: The Gene Wiki, BioGPS, and Citizen Science
Crowdsourcing Biology: The Gene Wiki, BioGPS, and Citizen Science
 
Centralized Model Organism Database (Biocuration 2014 poster)
Centralized Model Organism Database (Biocuration 2014 poster)Centralized Model Organism Database (Biocuration 2014 poster)
Centralized Model Organism Database (Biocuration 2014 poster)
 
Crowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.org
Crowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.orgCrowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.org
Crowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.org
 
GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)
GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)
GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)
 

Recently uploaded

Physiology of Chemical Sensation of smell.pdf
Physiology of Chemical Sensation of smell.pdfPhysiology of Chemical Sensation of smell.pdf
Physiology of Chemical Sensation of smell.pdf
MedicoseAcademics
 
Knee anatomy and clinical tests 2024.pdf
Knee anatomy and clinical tests 2024.pdfKnee anatomy and clinical tests 2024.pdf
Knee anatomy and clinical tests 2024.pdf
vimalpl1234
 
Superficial & Deep Fascia of the NECK.pptx
Superficial & Deep Fascia of the NECK.pptxSuperficial & Deep Fascia of the NECK.pptx
Superficial & Deep Fascia of the NECK.pptx
Dr. Rabia Inam Gandapore
 
New Drug Discovery and Development .....
New Drug Discovery and Development .....New Drug Discovery and Development .....
New Drug Discovery and Development .....
NEHA GUPTA
 
Evaluation of antidepressant activity of clitoris ternatea in animals
Evaluation of antidepressant activity of clitoris ternatea in animalsEvaluation of antidepressant activity of clitoris ternatea in animals
Evaluation of antidepressant activity of clitoris ternatea in animals
Shweta
 
POST OPERATIVE OLIGURIA and its management
POST OPERATIVE OLIGURIA and its managementPOST OPERATIVE OLIGURIA and its management
POST OPERATIVE OLIGURIA and its management
touseefaziz1
 
Hemodialysis: Chapter 3, Dialysis Water Unit - Dr.Gawad
Hemodialysis: Chapter 3, Dialysis Water Unit - Dr.GawadHemodialysis: Chapter 3, Dialysis Water Unit - Dr.Gawad
Hemodialysis: Chapter 3, Dialysis Water Unit - Dr.Gawad
NephroTube - Dr.Gawad
 
ANATOMY AND PHYSIOLOGY OF URINARY SYSTEM.pptx
ANATOMY AND PHYSIOLOGY OF URINARY SYSTEM.pptxANATOMY AND PHYSIOLOGY OF URINARY SYSTEM.pptx
ANATOMY AND PHYSIOLOGY OF URINARY SYSTEM.pptx
Swetaba Besh
 
BRACHYTHERAPY OVERVIEW AND APPLICATORS
BRACHYTHERAPY OVERVIEW  AND  APPLICATORSBRACHYTHERAPY OVERVIEW  AND  APPLICATORS
BRACHYTHERAPY OVERVIEW AND APPLICATORS
Krishan Murari
 
Charaka Samhita Sutra sthana Chapter 15 Upakalpaniyaadhyaya
Charaka Samhita Sutra sthana Chapter 15 UpakalpaniyaadhyayaCharaka Samhita Sutra sthana Chapter 15 Upakalpaniyaadhyaya
Charaka Samhita Sutra sthana Chapter 15 Upakalpaniyaadhyaya
Dr KHALID B.M
 
heat stroke and heat exhaustion in children
heat stroke and heat exhaustion in childrenheat stroke and heat exhaustion in children
heat stroke and heat exhaustion in children
SumeraAhmad5
 
Triangles of Neck and Clinical Correlation by Dr. RIG.pptx
Triangles of Neck and Clinical Correlation by Dr. RIG.pptxTriangles of Neck and Clinical Correlation by Dr. RIG.pptx
Triangles of Neck and Clinical Correlation by Dr. RIG.pptx
Dr. Rabia Inam Gandapore
 
Ocular injury ppt Upendra pal optometrist upums saifai etawah
Ocular injury  ppt  Upendra pal  optometrist upums saifai etawahOcular injury  ppt  Upendra pal  optometrist upums saifai etawah
Ocular injury ppt Upendra pal optometrist upums saifai etawah
pal078100
 
Maxilla, Mandible & Hyoid Bone & Clinical Correlations by Dr. RIG.pptx
Maxilla, Mandible & Hyoid Bone & Clinical Correlations by Dr. RIG.pptxMaxilla, Mandible & Hyoid Bone & Clinical Correlations by Dr. RIG.pptx
Maxilla, Mandible & Hyoid Bone & Clinical Correlations by Dr. RIG.pptx
Dr. Rabia Inam Gandapore
 
micro teaching on communication m.sc nursing.pdf
micro teaching on communication m.sc nursing.pdfmicro teaching on communication m.sc nursing.pdf
micro teaching on communication m.sc nursing.pdf
Anurag Sharma
 
Couples presenting to the infertility clinic- Do they really have infertility...
Couples presenting to the infertility clinic- Do they really have infertility...Couples presenting to the infertility clinic- Do they really have infertility...
Couples presenting to the infertility clinic- Do they really have infertility...
Sujoy Dasgupta
 
Pulmonary Thromboembolism - etilogy, types, medical- Surgical and nursing man...
Pulmonary Thromboembolism - etilogy, types, medical- Surgical and nursing man...Pulmonary Thromboembolism - etilogy, types, medical- Surgical and nursing man...
Pulmonary Thromboembolism - etilogy, types, medical- Surgical and nursing man...
VarunMahajani
 
NVBDCP.pptx Nation vector borne disease control program
NVBDCP.pptx Nation vector borne disease control programNVBDCP.pptx Nation vector borne disease control program
NVBDCP.pptx Nation vector borne disease control program
Sapna Thakur
 
Physiology of Special Chemical Sensation of Taste
Physiology of Special Chemical Sensation of TastePhysiology of Special Chemical Sensation of Taste
Physiology of Special Chemical Sensation of Taste
MedicoseAcademics
 
Report Back from SGO 2024: What’s the Latest in Cervical Cancer?
Report Back from SGO 2024: What’s the Latest in Cervical Cancer?Report Back from SGO 2024: What’s the Latest in Cervical Cancer?
Report Back from SGO 2024: What’s the Latest in Cervical Cancer?
bkling
 

Recently uploaded (20)

Physiology of Chemical Sensation of smell.pdf
Physiology of Chemical Sensation of smell.pdfPhysiology of Chemical Sensation of smell.pdf
Physiology of Chemical Sensation of smell.pdf
 
Knee anatomy and clinical tests 2024.pdf
Knee anatomy and clinical tests 2024.pdfKnee anatomy and clinical tests 2024.pdf
Knee anatomy and clinical tests 2024.pdf
 
Superficial & Deep Fascia of the NECK.pptx
Superficial & Deep Fascia of the NECK.pptxSuperficial & Deep Fascia of the NECK.pptx
Superficial & Deep Fascia of the NECK.pptx
 
New Drug Discovery and Development .....
New Drug Discovery and Development .....New Drug Discovery and Development .....
New Drug Discovery and Development .....
 
Evaluation of antidepressant activity of clitoris ternatea in animals
Evaluation of antidepressant activity of clitoris ternatea in animalsEvaluation of antidepressant activity of clitoris ternatea in animals
Evaluation of antidepressant activity of clitoris ternatea in animals
 
POST OPERATIVE OLIGURIA and its management
POST OPERATIVE OLIGURIA and its managementPOST OPERATIVE OLIGURIA and its management
POST OPERATIVE OLIGURIA and its management
 
Hemodialysis: Chapter 3, Dialysis Water Unit - Dr.Gawad
Hemodialysis: Chapter 3, Dialysis Water Unit - Dr.GawadHemodialysis: Chapter 3, Dialysis Water Unit - Dr.Gawad
Hemodialysis: Chapter 3, Dialysis Water Unit - Dr.Gawad
 
ANATOMY AND PHYSIOLOGY OF URINARY SYSTEM.pptx
ANATOMY AND PHYSIOLOGY OF URINARY SYSTEM.pptxANATOMY AND PHYSIOLOGY OF URINARY SYSTEM.pptx
ANATOMY AND PHYSIOLOGY OF URINARY SYSTEM.pptx
 
BRACHYTHERAPY OVERVIEW AND APPLICATORS
BRACHYTHERAPY OVERVIEW  AND  APPLICATORSBRACHYTHERAPY OVERVIEW  AND  APPLICATORS
BRACHYTHERAPY OVERVIEW AND APPLICATORS
 
Charaka Samhita Sutra sthana Chapter 15 Upakalpaniyaadhyaya
Charaka Samhita Sutra sthana Chapter 15 UpakalpaniyaadhyayaCharaka Samhita Sutra sthana Chapter 15 Upakalpaniyaadhyaya
Charaka Samhita Sutra sthana Chapter 15 Upakalpaniyaadhyaya
 
heat stroke and heat exhaustion in children
heat stroke and heat exhaustion in childrenheat stroke and heat exhaustion in children
heat stroke and heat exhaustion in children
 
Triangles of Neck and Clinical Correlation by Dr. RIG.pptx
Triangles of Neck and Clinical Correlation by Dr. RIG.pptxTriangles of Neck and Clinical Correlation by Dr. RIG.pptx
Triangles of Neck and Clinical Correlation by Dr. RIG.pptx
 
Ocular injury ppt Upendra pal optometrist upums saifai etawah
Ocular injury  ppt  Upendra pal  optometrist upums saifai etawahOcular injury  ppt  Upendra pal  optometrist upums saifai etawah
Ocular injury ppt Upendra pal optometrist upums saifai etawah
 
Maxilla, Mandible & Hyoid Bone & Clinical Correlations by Dr. RIG.pptx
Maxilla, Mandible & Hyoid Bone & Clinical Correlations by Dr. RIG.pptxMaxilla, Mandible & Hyoid Bone & Clinical Correlations by Dr. RIG.pptx
Maxilla, Mandible & Hyoid Bone & Clinical Correlations by Dr. RIG.pptx
 
micro teaching on communication m.sc nursing.pdf
micro teaching on communication m.sc nursing.pdfmicro teaching on communication m.sc nursing.pdf
micro teaching on communication m.sc nursing.pdf
 
Couples presenting to the infertility clinic- Do they really have infertility...
Couples presenting to the infertility clinic- Do they really have infertility...Couples presenting to the infertility clinic- Do they really have infertility...
Couples presenting to the infertility clinic- Do they really have infertility...
 
Pulmonary Thromboembolism - etilogy, types, medical- Surgical and nursing man...
Pulmonary Thromboembolism - etilogy, types, medical- Surgical and nursing man...Pulmonary Thromboembolism - etilogy, types, medical- Surgical and nursing man...
Pulmonary Thromboembolism - etilogy, types, medical- Surgical and nursing man...
 
NVBDCP.pptx Nation vector borne disease control program
NVBDCP.pptx Nation vector borne disease control programNVBDCP.pptx Nation vector borne disease control program
NVBDCP.pptx Nation vector borne disease control program
 
Physiology of Special Chemical Sensation of Taste
Physiology of Special Chemical Sensation of TastePhysiology of Special Chemical Sensation of Taste
Physiology of Special Chemical Sensation of Taste
 
Report Back from SGO 2024: What’s the Latest in Cervical Cancer?
Report Back from SGO 2024: What’s the Latest in Cervical Cancer?Report Back from SGO 2024: What’s the Latest in Cervical Cancer?
Report Back from SGO 2024: What’s the Latest in Cervical Cancer?
 

ISB2012: The Gene Wiki: Crowdsourcing human gene annotation

  • 1. The Gene Wiki: Crowdsourcing human gene annotation Andrew Su, Ph.D. Department of Molecular and Experimental Medicine The Scripps Research Institute Biocuration 2012 April 2, 2012
  • 2. 2 The Long Tail is a prolific source of content Short Head Content produced Long Tail Contributors (sorted) News : Newspapers Blogs Video: TV/Hollywood YouTube Product reviews: Consumer reports Amazon reviews Food reviews: Food critics Yelp Talent judging: Olympics American Idol Gene annotation: Manual curation Gene Wiki
  • 3. 3 We can harness the Long Tail of scientists to directly participate in the gene annotation process.
  • 5. 5 Wikipedia has breadth and depth Articles Words (millions) Wikipedia Britannica Online http://en.wikipedia.org/wiki/Wikipedia:Size_comparisons, July 2008
  • 6. Filtering, extracting, and summarizing PubMed Documents Concepts
  • 7. 7 Wiki success depends on a positive feedback Gene wiki page utility 1 100 2 200 Number of Number of contributors users
  • 8. 8 10,000 gene “stubs” within Wikipedia Utility Users Contributors Protein structure Gene summary Symbols and identifiers Gene Ontology annotations Protein interactions Tissue expression Linked pattern references Links to structured databases Huss, PLoS Biol, 2008
  • 9. 9 Gene Wiki has a critical mass of readers Utility Users Contributors Total: ~4.3 million views / month Huss, PLoS Biol, 2008; Good, NAR, 2011
  • 10. 10 Gene Wiki has a critical mass of editors Utility ~10,000 words added / month Users Contributors Total 1.42 million words ≈ 230 full-length articles 4.3 million views / month Cumulative edits Productive edits 1000 edits / month Vandalism Good, NAR, 2011
  • 11. 11 A review article for every gene is powerful Reelin: 68 editors, 543 edits since July 2002 Heparin: 175 editors, 320 edits since June 2003 AMPK: 44 editors, 84 edits since March 2004 RNAi: 232 editors, 708 edits since October 2002 References to the literature Hyperlinks to related concepts
  • 12. 12 Making the Gene Wiki more computable Free text Structured annotations
  • 13. 13 Filling the gaps in gene annotation NCBI Entrez Gene: 3362 Gene Wiki mapping Wikilink Candidate assertion GO:0004993 GO exact synonym
  • 14. 14 Filling the gaps in gene annotation NCBI Entrez Gene: 334 Gene Wiki mapping Wikilink Candidate assertion GO:0006897 GO exact match
  • 15. Disease associations mined from the Gene Wiki Good, BMC Genomics 2011, 12:603 Gene Wiki Articles (10,271) 23% exact match Filter out 5% match seeded text parent 2% match child 70% have NCBO no match Annotator Matched Disease 2147 Compare to Ontology terms candidate DO database (2983) annotations
  • 16. Disease associations mined from the Gene Wiki Good, BMC Genomics 2011, 12:603 Expert curation Correct Incorrect: 10% 86% Maybe: 4% Overall specificity: 90-93%
  • 17. GO associations mined from the Gene Wiki Good, BMC Genomics 2011, 12:603 Gene Wiki Articles (10,271) 17% exact match Filter out seeded text 26% match parent 55% have NCBO no match Annotator 2% match child Matched Gene 6319 Compare to Ontology terms candidate GO database (11,022) annotations
  • 18. GO associations mined from the Gene Wiki Good, BMC Genomics 2011, 12:603 Expert curation Correct 14% Maybe 60% 26% Incorrect Overall specificity: 48-64%
  • 19. 19 Common sources of error in GO associations Good, BMC Genomics 2011, 12:603 1) Incorrect concept recognition OR2F1: “Olfactory receptors … are responsible for the recognition and G protein- mediated transduction of odorant signals.” Signal transduction (GO:0007165) Transduction (GO:0009293) The cellular process in which a signal The transfer of genetic information to a is conveyed to trigger a change in the bacterium from a bacteriophage or activity or state of a cell. Signal between bacterial or yeast cells transduction begins with reception of a mediated by a phage vector. signal, e.g. a ligand binding to a receptor or receptor activation by a stimulus such as light, and ends with regulation of a downstream cellular process…
  • 20. 20 Common sources of error in GO associations Good, BMC Genomics 2011, 12:603 2) Incorrect sentence context MEF2C: “Several post translational modifications have been identified including phosphorylation on serine-59 …” Dephosphorylation Excretion Phosporylation Gene expression Glycosylation Localization MEF2C Neurogenesis Methylation Proteolysis Secretion Transport Myelination Transcription Translation
  • 21. 21 Novel GO annotations – so what? 6319 11,022 ~100,000 “novel” 4703 (43%) annotations annotations annotations match known mined from from GO @ 48-64% annotations Gene Wiki consortium specificity
  • 22. 22 Gene Wiki content improves enrichment analysis axon Enrichment guidance GO term analysis (GO:0007411) 811 articles 264 genes PubMed Concept Gene list abstracts recognition GO:0007411 Yes No Linked genes Yes 13 2 through No 251 12033 PubMed P = 1.55 E-20
  • 23. 23 Gene Wiki content improves enrichment analysis muscle Enrichment contraction GO term analysis (GO:0006936) 251 articles 87 genes PubMed Concept Gene list abstracts recognition + Gene Wiki 87 articles GO:0006936 GO:0006936 Linked genes Linked genes through through PubMed PubMed + Gene Wiki P = 1.0 P = 1.22 E-09
  • 24. 24 Gene Wiki content improves enrichment analysis More p-value significant (PubMed + GW) PubMed only Muscle contraction More significant PubMed + GW p-value (PubMed only)
  • 25. 25 Challenges and future directions • How to complement and integrate with traditional biocuration workflows? • How to disseminate and utilize crowdsourced annotations?
  • 26. 26 The Long Tail of scientists is a valuable source of information on gene function
  • 27. 27 Collaborators Group members Doug Howe, ZFIN Erik Clarke Ian Macleod John Hogenesch, U Penn Jon Huss, GNF Ben Good (*) Chunlei Wu Luca de Alfaro, UCSC Salvatore Loguercio Angel Pizzaro, U Penn Faramarz Valafar, SDSU Pierre Lindenbaum, Fondation Jean Dausset Michael Martone, Rush See poster # 30 for more on Konrad Koehler, Karo Bio Warren Kibbe, Simon Lim, Northwestern the Gene Wiki and Many Wikipedia editors crowdsourcing in biology! WP:MCB Project Contact http://sulab.org asu@scripps.edu @andrewsu +Andrew Su Funding and Support (BioGPS: GM83924, Gene Wiki: GM089820)
  • 28. 28 Making the Gene Wiki more reliable Novartis is a multinational 2 The company name is derived pharmaceutical company from old Greek, and means based in Basel, Switzerland "destroyer of birds". that manufactures drugs such as clozapine (Clozaril), diclofenac (Voltaren), … 2
  • 29. 29 Making the Gene Wiki more reliable Novartis is a multinational 2 The company name is derived pharmaceutical company from old Greek, and means based in Basel, Switzerland "destroyer of birds". that manufactures drugs such as clozapine (Clozaril), diclofenac (Voltaren), … 36211 total edits 36 total edits * * * * * * * * * * * * * * High-trust author Low-trust author http://www.wikitrust.net/

Editor's Notes

  1. Relying on the entire community of scientists to digest the biomedical literature: identification filtering extraction summarization
  2. Structured annotations enable pathway analysis, statistical analyses, cross-species comparisons
  3. Transduction accounts for 70% of the concept recognition problems
  4. Tried on 773 GO categories, significant in 356 cases (46%)
  5. We extended this analysis to all 773 GO terms used in human gene annotations and found a consistent improvement in the enrichment scores
  6. We started working with Doug Howe because he helped us learn a lot about biocuration, but clearly we’d need to expand partnersIn particular, since GO curation seems to be largely drawn by organisms
  7. Also want to convince you that the Long Tail of bioinformatics developers is valuable too, but first have to convince you that there is a bottleneck in tool development.
  8. Reverted four minutes later
  9. Reverted four minutes later