Cultivating and mining the Gene Wiki for crowdsourced gene annotation. ISMB Bio-Ontologies SIG, July 14, 2011. Andrew Su, Ph.D.
Few genes are well annotated… [Figure: annotation counts per gene for 23,278 protein-coding genes, sorted by decreasing counts; series: PubMed and Gene Ontology; plot labels: 59%, 38%; top genes: TP53, TNF, APOE, MTHFR, IL6, HLA-DRB1, VEGFA, EGFR, TGFB1, ACE.] Data: NCBI gene2pubmed, August 2010
… because the literature is sparsely curated?
… because the literature is sparsely curated? [Figure label: Number of articles read by typical scientist]
311,696 articles (1.5% of PubMed) have been cited by GO annotations
Sooner or later, the research community will need to be involved in the annotation effort to scale up to the rate of data generation.
The Long Tail is a prolific source of content. [Figure: content produced vs. contributors (sorted), Short Head vs. Long Tail.] Publishing: newspapers vs. blogs. Video: TV/Hollywood vs. YouTube. Product reviews: Consumer Reports vs. Amazon reviews. Food reviews: food critics vs. Yelp. Judging: Olympics vs. American Idol.
Wikipedia is reasonably accurate
Wikipedia has breadth and depth. [Table comparing Wikipedia and Britannica Online by articles, words (millions), and words/article.] Source: http://en.wikipedia.org/wiki/Wikipedia:Size_comparisons, July 2008
We can harness the Long Tail of scientists to directly participate in the gene annotation process.
10,000 gene “stubs” within Wikipedia. Page elements: protein structure, gene summary, symbols and identifiers, Gene Ontology annotations, protein interactions, tissue expression pattern, linked references, links to structured databases.
Wiki success depends on positive feedback. [Figure: feedback loop linking Gene Wiki page utility, number of users, and number of contributors; labels: 1, 100, 2, 200]
Filtering, extracting, and summarizing PubMed. [Figure labels: Documents, Concepts]
A review article for every gene is powerful. Reelin: 68 editors, 543 edits since July 2002. Heparin: 175 editors, 320 edits since June 2003. AMPK: 44 editors, 84 edits since March 2004. RNAi: 232 editors, 708 edits since October 2002. Articles carry references to the literature and hyperlinks to related concepts.
Gene Wiki has a diverse critical mass of readers. Total: 5.0 million views / month. Rank 1-10 (general society): Insulin, Titin, Human chorionic gonadotropin, Vasopressin, ANKH, CLOCK, Catalase, Erythropoietin, Glucagon, Parathyroid hormone. Rank 101-110 (scientists): Tau protein, Interleukin 10, APC, C-Met, Factor V, Interleukin 8, CD44, Histamine H1 receptor, Kappa Opioid receptor, Dihydrofolate reductase. Rank 1001-1010 (specialists): CSDA, CNTNAP2, IGSF8, Adenosine A3 receptor, RYR1, ETV6, Small heterodimer partner, 5-HT1D receptor, TRPC6, Interleukin-6 receptor.
Readership is poised to grow
The Gene Wiki has a critical mass of editors. [Figure: editor count vs. edit count.] In Jan-Jun 2010, 7474 edits were made by 2109 unique users; total increase in text ≈ 20 PLoS Biology research articles.
Making the Gene Wiki more reliable. Example text with per-author trust marking: “The company name is derived from old Greek, and means "destroyer of birds". Novartis is a multinational pharmaceutical company based in Basel, Switzerland that manufactures drugs such as clozapine (Clozaril), diclofenac (Voltaren), …” [Figure labels: 36211 total edits, 36 total edits; High-trust author, Low-trust author.] http://www.wikitrust.net/
Making the Gene Wiki more computable. [Figure labels: Structured annotations, Free text]
Example text from 5-HT1A receptor. Snippet from article on 5-HT1A receptor: “…5-HT1A receptor agonists decrease blood pressure and heart rate or cause hypotension via a central mechanism, by inducing peripheral vasodilation, and by stimulating the vagus nerve…” Highlighted concepts: Agonists, Receptor, Blood pressure, Heart rate, Hypotension, Vasodilation, Vagus nerve.
Example text from 5-HT1A receptor. [Figure: 5-HT1A receptor linked to Agonists, Receptor, Blood pressure, Heart rate, Hypotension, Vasodilation, Vagus nerve]
Re-discovering common knowledge. NCBI Entrez Gene: 3362; mined term: GO:0004993 (GO exact synonym). [Diagram: Gene Wiki mapping, wikilink, candidate assertion]
Mining the most recent literature. NCBI Entrez Gene: 57620; mined term: GO:0030154 (GO related concept). [Diagram: Gene Wiki mapping, wikilink, candidate assertion]
Filling the gaps in gene annotation. NCBI Entrez Gene: 334; mined term: GO:0006897 (GO exact match). [Diagram: Gene Wiki mapping, wikilink, candidate assertion]
Disease associations mined from the Gene Wiki. Pipeline: Gene Wiki articles (10,271) → filter out seeded text → NCBO Annotator → matched Disease Ontology terms (2983) → compare to DO database → 2147 candidate annotations. Comparison with DO database: 23% exact match, 5% match parent, 2% match child, 70% have no match.
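The "compare to database" step above can be sketched as a small classifier over an ontology hierarchy. This is a minimal sketch, not the talk's implementation: the toy hierarchy, the term IDs, and the function names are hypothetical, and it assumes that "match parent" means the mined term is an ancestor of an existing annotation and "match child" means it is a descendant of one.

```python
# Toy is-a hierarchy (child -> set of parents). Term IDs are hypothetical
# placeholders, not real Disease Ontology or GO identifiers.
PARENTS = {
    "T:leaf": {"T:mid"},
    "T:mid": {"T:root"},
    "T:other": {"T:root"},
    "T:root": set(),
}


def ancestors(term, parents=PARENTS):
    """Return every transitive parent of `term`."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def classify(mined, known, parents=PARENTS):
    """Compare one mined term against a gene's existing annotations."""
    if mined in known:
        return "exact match"
    # Assumed reading: the mined term is a parent (ancestor) of a known term.
    if any(mined in ancestors(k, parents) for k in known):
        return "match parent"
    # Assumed reading: the mined term is a child (descendant) of a known term.
    if any(k in ancestors(mined, parents) for k in known):
        return "match child"
    return "no match"


known_annotations = {"T:mid"}
for term in ("T:mid", "T:root", "T:leaf", "T:other"):
    print(term, "->", classify(term, known_annotations))
```

Under this reading, "no match" and "match child" are the categories that would carry information not already implied by the existing annotations.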
Disease associations mined from the Gene Wiki. Expert curation of candidate annotations (categories: Correct, Maybe, Incorrect); chart labels: 86%, 10%, 4%. Overall specificity: 90-93%.
GO associations mined from the Gene Wiki. Pipeline: Gene Wiki articles (10,271) → filter out seeded text → NCBO Annotator → matched Gene Ontology terms (11,022) → compare to GO database → 6319 candidate annotations. Comparison with GO database: 17% exact match, 26% match parent, 2% match child, 55% have no match.
GO associations mined from the Gene Wiki. Expert curation of candidate annotations (categories: Correct, Maybe, Incorrect); chart labels: 14%, 26%, 60%. Overall specificity: 48-64%.
Common sources of error in GO associations. 1) Incorrect concept recognition. OR2F1: “Olfactory receptors … are responsible for the recognition and G protein-mediated transduction of odorant signals.” Recognized term: Transduction (GO:0009293): The transfer of genetic information to a bacterium from a bacteriophage or between bacterial or yeast cells mediated by a phage vector. Intended term: Signal transduction (GO:0007165): The cellular process in which a signal is conveyed to trigger a change in the activity or state of a cell. Signal transduction begins with reception of a signal, e.g. a ligand binding to a receptor or receptor activation by a stimulus such as light, and ends with regulation of a downstream cellular process…
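The failure mode on this slide can be reproduced with a naive dictionary matcher. This is an illustrative sketch only: it is not the NCBO Annotator's algorithm, the two-entry lexicon is taken from the slide, the function name is hypothetical, and the sentence is abridged from the OR2F1 quote.

```python
import re

# Two GO labels from the slide; a real annotator would load a full ontology.
LEXICON = {
    "transduction": "GO:0009293",
    "signal transduction": "GO:0007165",
}


def recognize(text, lexicon=LEXICON):
    """Find lexicon labels as whole phrases, longest labels first, no overlaps."""
    lowered = text.lower()
    claimed = []  # character spans already assigned to a longer label
    hits = []
    for label in sorted(lexicon, key=len, reverse=True):
        pattern = r"\b" + re.escape(label) + r"\b"
        for match in re.finditer(pattern, lowered):
            overlaps = any(match.start() < end and match.end() > start
                           for start, end in claimed)
            if not overlaps:
                claimed.append((match.start(), match.end()))
                hits.append((label, lexicon[label]))
    return hits


sentence = ("Olfactory receptors are responsible for the recognition and "
            "G protein-mediated transduction of odorant signals.")
print(recognize(sentence))
print(recognize("a signal transduction pathway"))
```

Because the sentence says "transduction of odorant signals", the phrase "signal transduction" never occurs literally, so even longest-match lookup falls back to the bare "transduction" label and its phage-related GO term.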
Common sources of error in GO associations. 2) Incorrect sentence context. MEF2C: “Several post translational modifications have been identified including phosphorylation on serine-59 …” [Figure: MEF2C linked to Phosphorylation, Neurogenesis, Myelination.] Terms listed: Dephosphorylation, Excretion, Gene expression, Glycosylation, Localization, Methylation, Proteolysis, Secretion, Transport, Transcription, Translation.
Is 48-64% specificity useful? Enrichment analysis for GO term muscle contraction (GO:0006936). Inputs: concept recognition over PubMed abstracts (5449 articles); gene list (87 genes) plus Gene Wiki (87 articles). Linked genes by PubMed only: P = 1.0. Linked genes by PubMed + Gene Wiki: P = 1.22 E-09.
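An enrichment p-value of this kind is commonly computed with a one-sided hypergeometric test. The slide does not say which test was used, so the following is a sketch under that assumption; the function name and all counts in the usage lines are hypothetical and are not the talk's data.

```python
from math import comb


def enrichment_p(population, annotated, sample, overlap):
    """One-sided hypergeometric tail: P(X >= overlap).

    population: total number of genes considered
    annotated:  genes in the population linked to the term
    sample:     size of the gene list being tested
    overlap:    genes in the list that are linked to the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 1000 genes, 50 linked to the term, a list of 20 genes.
print(enrichment_p(1000, 50, 20, 0))   # no linked genes in the list
print(enrichment_p(1000, 50, 20, 10))  # ten linked genes in the list
```

When no gene in the list is linked to the term, the tail covers every possible outcome and the p-value is 1, which is one way a result like P = 1.0 can arise; adding links from another source raises the overlap and lowers the p-value.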
GO associations improve enrichment analyses. [Figure: p-value (PubMed + Gene Wiki) vs. p-value (PubMed only); highlighted point: Muscle contraction]
“Like the image of the [mammoth] hairball, it is equally unhelpful in understanding the object’s properties. You can guess that the network is large and its connectivity is complex, but not more. At best, the visualization is merely decorative.” - Martin Krzywinski, http://mkweb.bcgsc.ca/linnet/talks/linnet-informatics2010.pdf
TOP 100 GENES
Mapping to many biomedical semantic groups
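From text mining to a Semantic Gene Wiki. [Table comparing Home-grown wiki, Gene Wiki/Wikipedia, and Semantic Gene Wiki on community contributions, semantic representation, and semantic querying; marks in extraction order: ✗ ✓ ✓ Home-grown wiki ✓ ✓ ✗ ? Gene Wiki/Wikipedia ✓ ✓ – Semantic Gene Wiki]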
Semantic Wiki Links. Gene Wiki (based on MediaWiki): {{SWL|target=apoptosis|type=promotes}} yields the plain link [[apoptosis]], rendered as “apoptosis”. Mirror and translate to the Semantic Gene Wiki (based on Semantic MediaWiki, SMW): typed links such as [[promote::apoptosis]], [[repress::apoptosis]], [[modulate::apoptosis]], also rendered as “apoptosis”, which enable semantic queries, RDF, etc.
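The "mirror and translate" step can be sketched as a text rewrite of the SWL template into either a plain wikilink or an SMW typed link. This is a hypothetical illustration rather than the project's code: the function names are invented, the template is assumed to list target before type exactly as on the slide, and the type value is passed through unchanged (the slide shows the property as "promote", so the real translation may normalize it).

```python
import re

# Matches {{SWL|target=...|type=...}} with parameters in the order on the slide.
SWL = re.compile(r"\{\{SWL\|target=(?P<target>[^|}]+)\|type=(?P<type>[^|}]+)\}\}")


def to_plain_wikilink(wikitext):
    """Render SWL templates as ordinary MediaWiki links."""
    return SWL.sub(lambda m: "[[" + m.group("target") + "]]", wikitext)


def to_smw_property(wikitext):
    """Render SWL templates as Semantic MediaWiki typed links."""
    return SWL.sub(
        lambda m: "[[" + m.group("type") + "::" + m.group("target") + "]]",
        wikitext,
    )


source = "This gene {{SWL|target=apoptosis|type=promotes}} in some cells."
print(to_plain_wikilink(source))
print(to_smw_property(source))
```

Keeping one template in the source text and deriving both renderings from it means the Wikipedia page stays readable while the mirrored copy gains a queryable relation.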
For community-based science, data is king. Data without structure is valuable, but structure without data is not. On Wikipedia: copy-editing (WP:MCB, Boghog), figures (artists and illustrators), structure (wiki links, infoboxes), citations (DOI bot, CitationBot), provenance (WikiTrust). [Diagram labels: Domain expert, Information scientist]
The Gene Wiki successfully harnesses the Long Tail of scientists for community annotation of gene function
Collaborators: Doug Howe, ZFIN; Salvatore Loguercio (*), TU Dresden; John Hogenesch, U Penn; Jon Huss, GNF; Angel Pizzaro, U Penn; Faramarz Valafar, SDSU; Pierre Lindenbaum, Fondation Jean Dausset; Michael Martone, Rush; Konrad Koehler, Karo Bio; Warren Kibbe, Simon Lim, Northwestern; many Wikipedia editors, WP:MCB Project. Group members: Erik Clarke, Ben Good (*), Ian Macleod, Chunlei Wu. (*) See talk on SNPedia mashup at 1:55 PM. WikiTrust (UCSC): Luca de Alfaro, Bo Adler, Ian Pye. Contact: http://sulab.org, asu@scripps.edu, @andrewsu, +Andrew Su. Funding and Support: ISMB travel support (BioGPS: GM83924, Gene Wiki: GM089820)

Cultivating and mining the Gene Wiki for crowdsourced gene annotation

  • 1. Cultivating and mining the Gene Wiki for crowdsourcedgene annotation ISMB Bio-Ontologies SIG July 14, 2011 Andrew Su, Ph.D.
  • 2. Few genes are well annotated… 2 Counts TP53 TNF APOE MTHFR IL6 HLA-DRB1 VEGFA EGFR TGFB1 ACE 59% PubMed 38% 23,278 protein-coding genes Gene ontology Genes, sorted by decreasing counts Data: NCBI gene2pubmed, August 2010
  • 3. … because the literature is sparsely curated? 3
  • 4. … because the literature is sparsely curated? 4 Number of articles read by typical scientist
  • 5. 5 311,696 articles (1.5% of PubMed) have been cited by GO annotations
  • 6. 6 0 Sooner or later, the research community will need to be involved in the annotation effort to scale up to the rate of data generation.
  • 7. The Long Tail is a prolific source of content 7 Short Head Content produced Long Tail Contributors (sorted) Publishing: Video: Product reviews: Food reviews: Judging: Newspapers TV/Hollywood Consumer reports Food critics Olympics Blogs YouTube Amazon reviews Yelp American Idol
  • 9. Wikipedia has breadth and depth 9 Articles Words (millions) Words/ article Wikipedia Britannica Online http://en.wikipedia.org/wiki/Wikipedia:Size_comparisons, July 2008
  • 10. 10 We can harness the Long Tail of scientists to directly participate in the gene annotation process.
  • 11. 10,000 gene “stubs” within Wikipedia 11 Protein structure Gene summary Symbols and identifiers Gene Ontology annotations Protein interactions Tissue expression pattern Linked references Links to structured databases
  • 12. Wiki success depends on a positive feedback 12 Gene wiki page utility 1 100 2 200 Number of users Number of contributors
  • 13. Filtering, extracting, and summarizing PubMed Documents Concepts
  • 14. A review article for every gene is powerful 14 Reelin: 68 editors, 543 edits since July 2002 Heparin: 175 editors, 320 edits since June 2003 AMPK: 44 editors, 84 edits since March 2004 RNAi: 232 editors, 708 edits since October 2002 References to the literature Hyperlinks to related concepts
  • 15. Gene Wiki has a diverse critical mass of readers 15 Utility Rank 101-110: Scientists Tau protein Interleukin 10 APC C-Met Factor V Interleukin 8 CD44 Histamine H1 receptor Kappa Opioid receptor Dihydrofolatereductase Rank 1001-1010: Specialists CSDA CNTNAP2 IGSF8 Adenosine A3 receptor RYR1 ETV6 Small heterodimer partner 5-HT1D receptor TRPC6 Interleukin-6 receptor Users Contributors Rank 1-10: General society Insulin Titin Human chorionic gonadotropin Vasopressin ANKH CLOCK Catalase Erythropoietin Glucagon Parathyroid hormone Total: 5.0 million views / month
  • 16. Readership is poised to grow 16 Utility Users Contributors
  • 17. The Gene Wiki has a critical mass of editors 17 Utility Users Contributors Editors Editor count Edit count Edits In Jan – Jun 2010 … … 7474 edits were made by 2109 unique users … total increase in text ≈ 20 PLoS Biology research articles
  • 18. Making the Gene Wiki more reliable. Example text: The company name is derived from old Greek, and means "destroyer of birds". Novartis is a multinational pharmaceutical company based in Basel, Switzerland that manufactures drugs such as clozapine (Clozaril), diclofenac (Voltaren), …
  • 19. Making the Gene Wiki more reliable. Same example text as slide 18, marked up by author trust. Edit counts shown: 36211 total edits; 36 total edits. Legend: high-trust author; low-trust author. http://www.wikitrust.net/
  • 20. Making the Gene Wiki more computable. Structured annotations; free text.
  • 21. Example text from 5-HT1A receptor. Snippet from article on 5-HT1A receptor: “…5-HT1A receptor agonists decrease blood pressure and heart rate or cause hypotension via a central mechanism, by inducing peripheral vasodilation, and by stimulating the vagus nerve…” Concept labels: Agonists, Heart rate, Receptor, Blood pressure, Vasodilation, Hypotension, Vagus nerve.
  • 22. Example text from 5-HT1A receptor Agonists Heart rate Receptor Blood pressure 5-HT1A receptor Vasodilation Hypotension Vagus nerve
  • 23. 23
  • 24. Re-discovering common knowledge. Diagram labels: NCBI Entrez Gene: 3362; Gene Wiki mapping; Wikilink; GO:0004993; GO exact synonym; Candidate assertion.
  • 25. Mining the most recent literature. Diagram labels: NCBI Entrez Gene: 57620; Gene Wiki mapping; Wikilink; GO:0030154; GO related concept; Candidate assertion.
  • 26. Filling the gaps in gene annotation. Diagram labels: NCBI Entrez Gene: 334; Gene Wiki mapping; Wikilink; GO:0006897; GO exact match; Candidate assertion.
  • 27. Disease associations mined from the Gene Wiki. Pipeline: Gene Wiki articles (10,271) → NCBO Annotator → matched Disease Ontology terms (2983) → filter out seeded text → 2147 candidate annotations → compare to DO database. Result: 23% exact match; 5% match parent; 2% match child; 70% have no match.
  • 28. Disease associations mined from the Gene Wiki Expert curation Correct Maybe Incorrect 86% 10% Overall specificity: 90-93% 4%
  • 29. GO associations mined from the Gene Wiki. Pipeline: Gene Wiki articles (10,271) → NCBO Annotator → matched Gene Ontology terms (11,022) → filter out seeded text → 6319 candidate annotations → compare to GO database. Result: 17% exact match; 26% match parent; 2% match child; 55% have no match.
  • 30. GO associations mined from the Gene Wiki Expert curation Correct Maybe Incorrect 14% 26% Overall specificity: 48-64% 60%
  • 31. Common sources of error in GO associations. 1) Incorrect concept recognition. OR2F1: “Olfactory receptors … are responsible for the recognition and G protein-mediated transduction of odorant signals.” Transduction (GO:0009293): The transfer of genetic information to a bacterium from a bacteriophage or between bacterial or yeast cells mediated by a phage vector. Signal transduction (GO:0007165): The cellular process in which a signal is conveyed to trigger a change in the activity or state of a cell. Signal transduction begins with reception of a signal, e.g. a ligand binding to a receptor or receptor activation by a stimulus such as light, and ends with regulation of a downstream cellular process…
  • 32. Common sources of error in GO associations. 2) Incorrect sentence context. MEF2C: “Several post translational modifications have been identified including phosphorylation on serine-59 …” Terms listed: Dephosphorylation, Excretion, Gene expression, Glycosylation, Localization, Methylation, Proteolysis, Secretion, Transport, Transcription, Translation; highlighted: Phosphorylation. [Figure labels: MEF2C, Neurogenesis, Myelination]
  • 33. Is 48 – 64 % specificity useful? Enrichment analysis for the GO term muscle contraction (GO:0006936). Inputs: gene list (87 genes); concept recognition over PubMed abstracts (5449 articles) + Gene Wiki (87 articles). Linked genes by PubMed only: P = 1.0. Linked genes by PubMed + Gene Wiki: P = 1.22E-09.
  • 34. GO associations improve enrichment analyses. [Figure: p-value (PubMed + Gene Wiki) against p-value (PubMed only); labeled point: muscle contraction]
  • 35. “Like the image of the [mammoth] hairball, it is equally unhelpful in understanding the object’s properties. You can guess that the network is large and its connectivity is complex, but not more. At best, the visualization is merely decorative.” - Martin Krzywinski http://mkweb.bcgsc.ca/linnet/talks/linnet-informatics2010.pdf
  • 36. TOP 100 GENES
  • 37. Mapping to many biomedical semantic groups
  • 38. From text mining to a Semantic Gene Wiki. Semantic representation Community contributions Semantics Semantic querying ✗ ✓ ✓ Home-grown wiki ✓ ✓ ✗ ? Gene Wiki/Wikipedia ✓ ✓ – Semantic Gene Wiki
  • 39. Semantic Wiki Links. Gene Wiki (based on MediaWiki): wikitext [[apoptosis]] or {{SWL|target=apoptosis|type=promotes}}, rendered text “apoptosis”. Mirror and translate. Semantic Gene Wiki (based on Semantic MediaWiki, SMW): wikitext [[promote::apoptosis]], [[repress::apoptosis]], [[modulate::apoptosis]], rendered text “apoptosis”; semantic queries, RDF, etc.
  • 40. For community-based science, data is king. Data without structure is valuable, but structure without data is not.
  • 41. For community-based science, data is king. Data without structure is valuable, but structure without data is not. Wikipedia contributions: copy-editing (WP:MCB, Boghog); figures (artists and illustrators); structure (wiki links, infoboxes); citations (DOI bot, CitationBot); provenance (WikiTrust). [Figure labels: Domain expert, Information scientist]
  • 42. The Gene Wiki successfully harnesses the Long Tail of scientists for community annotation of gene function.
  • 43. Collaborators: Doug Howe, ZFIN; Salvatore Loguercio (*), TU Dresden; John Hogenesch, U Penn; Jon Huss, GNF; Angel Pizzaro, U Penn; Faramarz Valafar, SDSU; Pierre Lindenbaum, Fondation Jean Dausset; Michael Martone, Rush; Konrad Koehler, Karo Bio; Warren Kibbe, Simon Lim, Northwestern; many Wikipedia editors; WP:MCB Project. Group members: Erik Clarke; Ben Good (*); Ian Macleod; Chunlei Wu. (*) See talk on SNPedia mashup at 1:55 PM. WikiTrust (UCSC): Luca de Alfaro; Bo Adler; Ian Pye. Contact: http://sulab.org; asu@scripps.edu; @andrewsu; +Andrew Su. Funding and Support: ISMB travel support (BioGPS: GM83924, Gene Wiki: GM089820).
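
The comparison step on slides 27 and 29 sorts each mined term into "exact match", "match parent", "match child", or "no match" relative to a gene's existing annotations. A minimal sketch of one way to do that is below. It is not the authors' pipeline: the toy hierarchy `PARENTS`, its placeholder term names, and the function names are hypothetical, and the slides do not define the categories, so this sketch assumes "match parent" means the mined term is an ancestor of an existing annotation and "match child" means it is a descendant.

```python
# Hypothetical toy hierarchy: term -> set of direct parent terms.
# A real run would load the full GO or DO hierarchy instead.
PARENTS = {
    "T:leaf": {"T:mid"},
    "T:mid": {"T:root"},
    "T:other": {"T:root"},
    "T:root": set(),
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links upward."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def classify(candidate, existing, parents=PARENTS):
    """Compare one mined term against a gene's existing annotations."""
    existing = set(existing)
    if candidate in existing:
        return "exact match"
    # Assumption: the mined term is more general than an existing annotation.
    if any(candidate in ancestors(term, parents) for term in existing):
        return "match parent"
    # Assumption: the mined term is more specific than an existing annotation.
    if ancestors(candidate, parents) & existing:
        return "match child"
    return "no match"
```

Under this reading, the "no match" terms are the ones that would be new to the reference database.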

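The "mirror and translate" step on slide 39 appears to rewrite Gene Wiki templates such as `{{SWL|target=apoptosis|type=promotes}}` into Semantic MediaWiki typed links such as `[[promote::apoptosis]]`. A minimal sketch of such a rewrite follows. The regular expression, the function name `translate_swl`, and the `PROPERTY_NAMES` mapping are assumptions for illustration; the slide shows only this one template, and the sketch handles only that two-parameter form in that parameter order.

```python
import re

# Assumed mapping from SWL "type" values to SMW property names.
# Only "promotes" -> "promote" is suggested by the slide; any other
# type is passed through unchanged.
PROPERTY_NAMES = {"promotes": "promote"}

# Matches {{SWL|target=...|type=...}} exactly as written on the slide.
SWL_PATTERN = re.compile(
    r"\{\{SWL\|target=(?P<target>[^|{}]+)\|type=(?P<type>[^|{}]+)\}\}"
)


def translate_swl(wikitext):
    """Replace each SWL template with an SMW typed link."""

    def to_typed_link(match):
        target = match.group("target").strip()
        link_type = match.group("type").strip()
        prop = PROPERTY_NAMES.get(link_type, link_type)
        return f"[[{prop}::{target}]]"

    return SWL_PATTERN.sub(to_typed_link, wikitext)
```

Plain wikilinks such as `[[apoptosis]]` are left untouched, so ordinary Gene Wiki text passes through unchanged.
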
Editor's Notes

  1. We are very early in our efforts to comprehensively annotate human gene function. Why important? Genome-scale surveys aren’t biased toward well studied genes; huge opportunity for biomedical discovery. 59% have 5 or fewer references; 38% have one or no references.
  2. If you believe that greater than 1.5% of articles have relevance to gene function, then it says there is a bottleneck in our curation efforts. Numbers updated 7/15/2011.
  3. Relying on the entire community of scientists to digest the biomedical literature: identification, filtering, extraction, summarization
  4. Reverted four minutes later
  5. Reverted four minutes later
  6. Structured annotations enable pathway analysis, statistical analyses, cross-species comparisons
  7. 5-HT1a is a serotonin receptor. TODO: add real ontology identifiers
  8. 5-HT1a is a serotonin receptor
  9. TODO: update example?
  10. Transduction accounts for 70% of the concept recognition problems
  11. We extended this analysis to all 773 GO terms used in human gene annotations and found a consistent improvement in the enrichment scores
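
The enrichment p-values discussed in note 11 and on slide 33 depend on a statistical test that neither the slides nor the notes name. One common choice for gene-set enrichment is the one-sided hypergeometric test, sketched here with the standard library only. The function name and parameter names are hypothetical, and this is an illustration of the general technique rather than the analysis used in the talk.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, overlap):
    """Return P(X >= overlap) for a hypergeometric draw.

    population: total number of genes considered
    annotated:  genes in the population linked to the term
    sample:     size of the gene list being tested
    overlap:    genes in the list that are linked to the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability mass of every outcome at least as extreme.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total
```

An overlap of zero always gives a p-value of 1.0, and the p-value shrinks as more genes in the list become linked to the term, which is the direction of change that adding a second annotation source would produce.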