Wikipedia as an engine for scientific communication and collaboration at massive scale

Andrew Su, Ph.D.
@andrewsu
asu@scripps.edu
http://sulab.org

ScienceWriters2012
October 27, 2012

The biomedical literature is growing rapidly

[Figure: Number of PubMed-indexed articles by year, 1979 to 2009; y-axis from 0 to 1,000,000]

The biomedical literature is growing rapidly

[Figure: Number of articles read by a typical scientist (human reading capacity) by year, 1979 to 2009; y-axis from 0 to 20]

High-throughput molecular profiling is powerful

~20,000 genes → 100+ candidates → 10+ experiments → testable hypothesis

Filtering, extracting, and summarizing PubMed

Documents
Concepts
Review article

Filtering, extracting, and summarizing PubMed

Documents
Concepts

10k gene “stubs” within Wikipedia ≈ “Gene Wiki”

Gene summary
Protein structure
Symbols and identifiers
Gene Ontology annotations
Protein interactions
Tissue expression pattern
Linked references
Links to structured databases

Huss, PLoS Biol, 2008

Gene Wiki has a critical mass of readers

Total: 4.0 million views / month

Rank 1-10: Laypeople
Insulin
Titin
Human chorionic gonadotropin
Vasopressin
ANKH
CLOCK
Catalase
Erythropoietin
Glucagon
Parathyroid hormone

Rank 101-110: Scientists
Tau protein
Interleukin 10
APC
C-Met
Factor V
Interleukin 8
CD44
Histamine H1 receptor
Kappa Opioid receptor
Dihydrofolate reductase

Rank 1001-1010: Specialists
CSDA
CNTNAP2
IGSF8
Adenosine A3 receptor
RYR1
ETV6
Small heterodimer partner
5-HT1D receptor
TRPC6
Interleukin-6 receptor

Huss, PLoS Biol, 2008; Huss, NAR, 2010; Good, NAR, 2011

Gene Wiki has a critical mass of editors

[Figure: editor count and edit count]

Increase of ~10,000 words / month from >1,000 edits
Currently 1.42 million words
Approximately equal to 230 full-length articles

Huss, NAR, 2010; Good, NAR, 2011

A review article for every gene is powerful

Reelin: 98 editors, 703 edits since July 2002
Heparin: 358 editors, 654 edits since June 2003
AMPK: 109 editors, 203 edits since March 2004
RNAi: 394 editors, 994 edits since October 2002

Hyperlinks to related concepts
References to the literature

The Gene Wiki is timely and current

Manny Ramirez suspended for doping
Catalase linked to premature gray hair
Also, MGAT2 (obesity), ALDH2 (heart attack), SOX21 (hair loss), SATB1 (breast cancer), TSLP (asthma), CCR5 (HIV), …

Huss, NAR, 2010

The Gene Wiki is (reasonably) reliable

              Per-edit probability   Average lifetime   Probability by time
Good edits    98.9%                  115.4 d            99.968%
Vandalism     1.1%                   3.4 d              0.032% (0.63% for WP overall)

[Figure: cumulative edits by date]

Good, NAR, 2011
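
The slide does not say how the "probability by time" column was derived. One plausible reading, offered here only as an assumption, is that each class of edit is weighted by how long it survives: the per-edit probability multiplied by the average lifetime, normalized across both classes. The sketch below applies that weighting to the slide's figures; the function name `time_weighted_share` is hypothetical and not taken from the cited paper.

```python
# Figures transcribed from the slide (Good, NAR, 2011).
P_GOOD, LIFE_GOOD_DAYS = 0.989, 115.4
P_VANDAL, LIFE_VANDAL_DAYS = 0.011, 3.4


def time_weighted_share(p, lifetime, p_other, lifetime_other):
    """Share of total edit-days contributed by one class of edit.

    ASSUMPTION: "probability by time" means per-edit probability
    weighted by average lifetime. The slide does not confirm this.
    """
    weighted = p * lifetime
    return weighted / (weighted + p_other * lifetime_other)


vandal_share = time_weighted_share(
    P_VANDAL, LIFE_VANDAL_DAYS, P_GOOD, LIFE_GOOD_DAYS
)
good_share = time_weighted_share(
    P_GOOD, LIFE_GOOD_DAYS, P_VANDAL, LIFE_VANDAL_DAYS
)

print(f"Vandalism, time-weighted: {vandal_share:.3%}")
print(f"Good edits, time-weighted: {good_share:.3%}")
```

Under this assumption the vandalism share comes out at roughly 0.03%, in line with the slide's 0.032%; the small difference is consistent with rounding in the published inputs.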

Making the Gene Wiki more reliable

"Novartis is a multinational pharmaceutical company based in Basel, Switzerland that manufactures drugs such as clozapine (Clozaril), diclofenac (Voltaren), …"

"The company name is derived from old Greek, and means "destroyer of birds"."

Good, NAR, 2011
http://www.wikitrust.net/

Making the Gene Wiki more reliable

High-trust author (36211 total edits): "Novartis is a multinational pharmaceutical company based in Basel, Switzerland that manufactures drugs such as clozapine (Clozaril), diclofenac (Voltaren), …"

Low-trust author (36 total edits): "The company name is derived from old Greek, and means "destroyer of birds"."

Good, NAR, 2011
http://www.wikitrust.net/
Partnering with traditional scientific publishing

Collaborators
Doug Howe, ZFIN
John Hogenesch, U Penn
Jon Huss, GNF
Luca de Alfaro, UCSC
Angel Pizzaro, U Penn
Faramarz Valafar, SDSU
Pierre Lindenbaum, Fondation Jean Dausset
Michael Martone, Rush
Konrad Koehler, Karo Bio
Warren Kibbe, Simon Lim, Northwestern
Many Wikipedia editors
WP:MCB Project

Group members
Ben Good
Max Nanis
Salvatore Loguercio
Chunlei Wu
Ian Macleod

http://slideshare.com/andrewsu

Contact
http://sulab.org
asu@scripps.edu
@andrewsu
+Andrew Su

Funding and Support
(BioGPS: GM83924, Gene Wiki: GM089820)

Editor's Notes

  1. Next-gen sequencing identifies candidate genes. Also: microarray data, proteomics, GWAS, methylation, post-translational modifications, translocation detection, etc. What do these genes do?
  2. Relying on the entire community of scientists to digest the biomedical literature: identification, filtering, extraction, summarization.
  3. Relying on the entire community of scientists to digest the biomedical literature: identification, filtering, extraction, summarization.
  4. Reverted four minutes later
  5. Reverted four minutes later