Wikipedia as an engine for
 scientific communication and
collaboration at massive scale

              Andrew Su, Ph.D.
                 @andrewsu
               asu@scripps.edu
                http://sulab.org

        ScienceWriters2012

          October 27, 2012
2
The biomedical literature is growing rapidly


[Figure: Number of PubMed-indexed articles by year, 1979-2009; y-axis from 0 to 1,000,000]
3
The biomedical literature is growing rapidly


[Figure: Number of articles read by typical scientist / Average capacity of human scientist, 1979-2009; y-axis from 0 to 20]
4
High-throughput molecular profiling is powerful




                                       Testable
                                      hypothesis




   ~20,000 genes   100+ candidates   10+ experiments
Filtering, extracting, and summarizing PubMed



Documents




 Concepts             Review article
Filtering, extracting, and summarizing PubMed



Documents




 Concepts
7
 10k gene “stubs” within Wikipedia ≈ “Gene Wiki”



         Gene summary
         Protein structure
         Symbols and identifiers
         Gene Ontology annotations
         Protein interactions
         Tissue expression pattern
         Linked references
         Links to structured databases



Huss, PLoS Biol, 2008
8
 Gene Wiki has a critical mass of readers
 Total: 4.0 million views / month

 Rank 1-10: Laypeople
     Insulin
     Titin
     Human chorionic gonadotropin
     Vasopressin
     ANKH
     CLOCK
     Catalase
     Erythropoietin
     Glucagon
     Parathyroid hormone

 Rank 101-110: Scientists
     Tau protein
     Interleukin 10
     APC
     C-Met
     Factor V
     Interleukin 8
     CD44
     Histamine H1 receptor
     Kappa Opioid receptor
     Dihydrofolate reductase

 Rank 1001-1010: Specialists
     CSDA
     CNTNAP2
     IGSF8
     Adenosine A3 receptor
     RYR1
     ETV6
     Small heterodimer partner
     5-HT1D receptor
     TRPC6
     Interleukin-6 receptor

Huss, PLoS Biol, 2008; Huss, NAR, 2010; Good, NAR, 2011
9
 Gene Wiki has a critical mass of readers




Huss, PLoS Biol, 2008; Huss, NAR, 2010; Good, NAR, 2011
10
 Gene Wiki has a critical mass of editors



[Figure: Editor count (Editors) and edit count (Edits)]




               Increase of ~10,000 words / month from >1,000 edits
                            Currently 1.42 million words
                   Approximately equal to 230 full-length articles

Huss, NAR, 2010; Good, NAR, 2011
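
The word counts on this slide imply an average article length, which a quick calculation makes explicit. This is a back-of-the-envelope sketch in Python: the variable names are illustrative, the monthly figure is the slide's approximate value, and the derived numbers are not stated in the talk.

```python
# Figures quoted on the slide.
total_words = 1_420_000         # "Currently 1.42 million words"
equivalent_articles = 230       # "Approximately equal to 230 full-length articles"
words_added_per_month = 10_000  # "~10,000 words / month" (approximate)

# Implied length of one "full-length article" (derived, not from the slide).
words_per_article = total_words / equivalent_articles

# Implied growth in article-equivalents per month (derived, not from the slide).
articles_per_month = words_added_per_month / words_per_article

print(f"Implied words per full-length article: {words_per_article:.0f}")
print(f"Implied growth: {articles_per_month:.1f} article-equivalents per month")
```

Under these assumptions, the community adds somewhat more than one full-length article's worth of text each month.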
11
A review article for every gene is powerful




     Reelin: 98 editors, 703 edits since July 2002
     Heparin: 358 editors, 654 edits since June 2003
     AMPK: 109 editors, 203 edits since March 2004
     RNAi: 394 editors, 994 edits since October 2002

     Hyperlinks to related concepts
     References to the literature
12
 The Gene Wiki is timely and current

     Manny Ramirez suspended for doping
     Catalase linked to premature gray hair

     Also, MGAT2 (obesity), ALDH2 (heart attack), SOX21 (hair loss), SATB1 (breast cancer), TSLP (asthma), CCR5 (HIV), …
Huss, NAR, 2010
13
 The Gene Wiki is (reasonably) reliable

                    Per-edit       Average      Probability
                    probability    lifetime     by time
     Good edits     98.9%          115.4 d      99.968%
     Vandalism      1.1%           3.4 d        0.032% (0.63% for WP overall)

[Figure: Cumulative edits by date]



Good, NAR, 2011
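
One way to read the "Probability by time" column is as a time-weighted share: each kind of edit is weighted by how often it occurs and by how long it survives before being changed. The Python sketch below assumes that interpretation (the slide does not give the formula) and uses hypothetical names. With the rounded figures from the table it lands close to the slide's values, though not exactly on them.

```python
def time_weighted_share(per_edit_prob, avg_lifetime_days):
    """Return each category's share of total 'edit-days'.

    Assumed model (not stated on the slide): a category's share is
    proportional to its per-edit probability times its average lifetime.
    """
    weights = {k: per_edit_prob[k] * avg_lifetime_days[k] for k in per_edit_prob}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


# Rounded values taken from the table on the slide.
per_edit_prob = {"good": 0.989, "vandalism": 0.011}
avg_lifetime_days = {"good": 115.4, "vandalism": 3.4}

shares = time_weighted_share(per_edit_prob, avg_lifetime_days)
for kind, share in shares.items():
    print(f"{kind}: {share:.3%}")
```

The point of the weighting is that vandalism, although it makes up about one edit in a hundred, is reverted quickly, so a reader arriving at a random moment is far less likely to see it than the per-edit rate suggests.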
14
 Making the Gene Wiki more reliable
  Novartis is a multinational pharmaceutical company based in Basel, Switzerland
  that manufactures drugs such as clozapine (Clozaril), diclofenac (Voltaren), …

  The company name is derived from old Greek, and means "destroyer of birds".




Good, NAR, 2011                               http://www.wikitrust.net/
15
 Making the Gene Wiki more reliable
  High-trust author (36211 total edits):
      Novartis is a multinational pharmaceutical company based in Basel, Switzerland
      that manufactures drugs such as clozapine (Clozaril), diclofenac (Voltaren), …

  Low-trust author (36 total edits):
      The company name is derived from old Greek, and means "destroyer of birds".

Good, NAR, 2011                                           http://www.wikitrust.net/
16
Partnering with traditional scientific publishing
17
Partnering with traditional scientific publishing
18
Partnering with traditional scientific publishing
19
Collaborators
    Doug Howe, ZFIN
    John Hogenesch, U Penn
    Jon Huss, GNF
    Luca de Alfaro, UCSC
    Angel Pizzaro, U Penn
    Faramarz Valafar, SDSU
    Pierre Lindenbaum, Fondation Jean Dausset
    Michael Martone, Rush
    Konrad Koehler, Karo Bio
    Warren Kibbe, Simon Lim, Northwestern
    Many Wikipedia editors
    WP:MCB Project

Group members
    Ben Good
    Max Nanis
    Salvatore Loguercio
    Chunlei Wu
    Ian Macleod

http://slideshare.com/andrewsu

Contact
    http://sulab.org
    asu@scripps.edu
    @andrewsu
    +Andrew Su

Funding and Support
    (BioGPS: GM83924, Gene Wiki: GM089820)


Editor's Notes

  • #5 Next-gen sequencing identifies candidate genes. Also microarray data, proteomics, GWAS, methylation, post-translational modifications, translocation detection, etc. What do these genes do?
  • #6 Relying on the entire community of scientists to digest the biomedical literature: identification, filtering, extraction, summarization
  • #7 Relying on the entire community of scientists to digest the biomedical literature: identification, filtering, extraction, summarization
  • #15 Reverted four minutes later
  • #16 Reverted four minutes later