A Centralized Model Organism Database (CMOD)
for the Long Tail of Genomes
ABSTRACT
Andrew I. Su, Benjamin M. Good, Chinmay Naik and Adriel Carolino
The Scripps Research Institute, La Jolla, California, USA
Background
How Gene Wiki?
We acknowledge support from the National
Institute of General Medical Sciences
(GM089820 and GM083924).
CONTACT
Benjamin Good: bgood@scripps.edu, @bgood
Andrew Su: asu@scripps.edu, @andrewsu
The CMOD vision
GENE WIKI EXAMPLE
FUNDING
Progress and status
CONCLUSION
One: structure from text mining
The Dark Matter of genome annotation
We need more hands on deck! We have
multiple positions open for postdocs and
programmers interested in crowdsourcing
and bioinformatics projects (like CMOD)!
Figure: cumulative number of sequenced genomes by year (1997 to 2025), y-axis ticks from 1 to 1,000,000 in powers of ten, plotted for Bacteria, Eukaryotes, and Archaea.
Model organism databases (MODs) are fantastic resources for
organizing genomic information for commonly-studied
organisms. To facilitate the creation and maintenance of MODs,
the Generic Model Organism Database (GMOD) Project
provides “a set of interoperable open-source software
components for visualizing, annotating, and managing biological
data.”
Provide a database of the
world’s knowledge that
anyone can edit.
- Denny Vrandečić
Despite the obvious success and value of GMOD, the number of
sequenced genomes is growing exponentially. Does the model of
one database per organism scale with the rate of genome sequencing?
Figure courtesy Scott Cain
Wikidata (http://wikidata.org) is an innovative and important
new tool for community-based knowledge management.
Wikidata is supported by the Wikimedia Foundation, which also
operates Wikipedia. In short, Wikidata is to structured data what
Wikipedia is to free text.
Model organism databases are fantastic resources for
genomics researchers. But relatively few model organisms
have stable funding for their database, and the number of
sequenced genomes is increasing exponentially. It seems
impractical to create and fund a model organism database
for each of them. Here, we describe our efforts to build a
Centralized Model Organism Database (CMOD), a single
online resource to support all genomes and organisms. To
scale to the Long Tail of Genomes, CMOD employs an open
editing model in which the entire research community is
empowered to edit and maintain genomic data. We
describe our efforts to systematically populate CMOD with
two core data types across all organisms – genome
annotations and Gene Ontology annotations.
We propose to build a Centralized Model Organism Database
(CMOD), which would house gene and genome annotations for
all genomes. This database would be based on Wikidata,
enabling it to be community-curated, continuously-updated, and
computer-readable.
CMOD
Gene and genome annotations
CMOD data can be accessed using a number of mechanisms.
The Wikidata web interface offers convenient access using a web
browser. The Wikidata application programming interface (API)
and associated programming libraries allow programmers and
bioinformaticians computational access to the data. Wikidata
export to RDF offers compatibility with the Semantic Web and
Linked Data. We also envision that many popular GMOD tools,
including GBrowse, JBrowse, and WebApollo, can be
modified to use CMOD as the back-end data warehouse.
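As a minimal sketch of the API and RDF access routes described above, the following Python builds request URLs for the public Wikidata endpoints. The helper names and the item IDs used in the usage note (Q1, Q2) are placeholders chosen for illustration, not part of CMOD or of the wikidatagenebot code; the sketch only constructs URLs and does not send a request.

```python
from urllib.parse import urlencode

# Public Wikidata endpoints (MediaWiki API and per-item data export).
WIKIDATA_API = "https://www.wikidata.org/w/api.php"
ENTITY_DATA = "https://www.wikidata.org/wiki/Special:EntityData"


def entity_api_url(item_ids, props=("labels", "claims"), language="en"):
    """Build a wbgetentities request URL for one or more item IDs."""
    params = {
        "action": "wbgetentities",
        "ids": "|".join(item_ids),      # several items can be fetched at once
        "props": "|".join(props),       # which parts of each item to return
        "languages": language,
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)


def entity_rdf_url(item_id, fmt="ttl"):
    """Build the URL of the RDF export for one item (Turtle by default)."""
    return f"{ENTITY_DATA}/{item_id}.{fmt}"
```

For example, `entity_api_url(["Q1", "Q2"])` yields a URL that any HTTP client can fetch to obtain JSON, and `entity_rdf_url("Q1")` points at the Turtle serialization of the same item for use with Semantic Web tooling.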
Wikidata
Wikidata currently catalogs over 14 million entities, and
describes those entities in the form of 27 million statements.
This knowledgebase is the product of over 50 million edits. Of
those edits, ~90% are contributed by bots that predominantly
import data from structured resources, and ~10% are contributed
by human editors.
This seminal paper identified 517 operons and 103 small
regulatory RNAs in Listeria monocytogenes, an important human
pathogen. Unfortunately, these annotations cannot be
downloaded from the Broad’s “Listeria monocytogenes
Database”, nor NCBI Genome, nor UCSC’s Microbial Genome
Browser, nor EnsemblBacteria, nor any GMOD instance. The
only place they are available is from the Supplementary
information on the Nature website in PDF format.
We have loaded gene and genome annotation data for ~1000
human genes, the human proteins they encode, and their mouse
orthologs according to the data model shown above. The code
repository for managing these data is available at
https://bitbucket.org/sulab/wikidatagenebot.
The Skeptic’s Corner
Will CMOD scale with the exponential growth in sequenced
genomes? Yes, because there is no gatekeeper to adding new
content. Anyone is empowered to directly contribute. Even
though the technical infrastructure is centralized, the data
management is highly distributed.
Who will contribute to CMOD? We envision a wide spectrum of
contributors, from large biocuration/annotation centers adding
large data sets, to individual bioinformaticians who deposit
structured versions of previously unstructured data, to individual
scientists contributing individual annotations.
Will CMOD content be trustworthy? As with Wikipedia, we expect
Wikidata overall to asymptotically approach perfect accuracy and
completeness. Moreover, because provenance is a
core part of the data model, the presence/absence/type of the
reference can be used to systematically filter the knowledgebase
according to each user’s needs.
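The provenance-based filtering mentioned above can be sketched in Python as follows. The sketch assumes the JSON layout that the Wikidata API returns for an item's claims, where each statement may carry a "references" list; the function names and the example claims are hypothetical and heavily trimmed, and the property IDs (P351 Entrez Gene ID, P703 found in taxon, P248 stated in) are used only for illustration.

```python
def has_reference(statement, source_property=None):
    """Return True if the statement carries at least one reference.

    If source_property is given (for example "P248", "stated in"),
    at least one reference must use that property.
    """
    references = statement.get("references", [])
    if source_property is None:
        return bool(references)
    return any(source_property in ref.get("snaks", {}) for ref in references)


def filter_claims(claims, source_property=None):
    """Keep only referenced statements; drop properties left with none."""
    kept = {}
    for prop, statements in claims.items():
        referenced = [s for s in statements if has_reference(s, source_property)]
        if referenced:
            kept[prop] = referenced
    return kept


# Hypothetical, trimmed claims for a single gene item (values are placeholders).
example_claims = {
    "P351": [  # Entrez Gene ID, with a "stated in" reference
        {
            "mainsnak": {"property": "P351"},
            "references": [{"snaks": {"P248": [{"property": "P248"}]}}],
        }
    ],
    "P703": [  # found in taxon, with no reference at all
        {"mainsnak": {"property": "P703"}}
    ],
}

trusted = filter_claims(example_claims, source_property="P248")
print(sorted(trusted))  # prints ['P351']
```

A user who accepts any sourced statement would call `filter_claims(example_claims)` without a property, while a stricter user could require a particular reference type, which is one way the presence, absence, or type of reference could drive filtering.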
Managing genomic information and knowledge is a critical
challenge for biomedical research. Community infrastructure
that allows individuals to collaboratively and collectively organize
knowledge has the potential to be an enabling technology in
biological research. Here, we propose CMOD as one such
application that is particularly focused on the Long Tail of
sequenced genomes.
Cumulative number of sequenced genomes

More Related Content

What's hot

Introduction to Bioinformatics.
 Introduction to Bioinformatics. Introduction to Bioinformatics.
Introduction to Bioinformatics.
Elena Sügis
 
Wikidata and the Semantic Web of Food
Wikidata and the  Semantic Web of FoodWikidata and the  Semantic Web of Food
Wikidata and the Semantic Web of Food
Benjamin Good
 
Careers in bioinformatics, Scope, Skills and Jobs
Careers in bioinformatics, Scope, Skills and JobsCareers in bioinformatics, Scope, Skills and Jobs
Careers in bioinformatics, Scope, Skills and Jobs
M Abdullah Chaudhry
 
Interoperable Data for KnetMiner and DFW Use Cases
Interoperable Data for KnetMiner and DFW Use CasesInteroperable Data for KnetMiner and DFW Use Cases
Interoperable Data for KnetMiner and DFW Use Cases
Rothamsted Research, UK
 
Introduction to Bioinformatics Slides
Introduction to Bioinformatics SlidesIntroduction to Bioinformatics Slides
Introduction to Bioinformatics Slides
Saide OER Africa
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
Nuno Barreto
 
BIOINFORMATICS Applications And Challenges
BIOINFORMATICS Applications And ChallengesBIOINFORMATICS Applications And Challenges
BIOINFORMATICS Applications And Challenges
Amos Watentena
 
B.sc biochem i bobi u-1 introduction to bioinformatics
B.sc biochem i bobi u-1 introduction to bioinformaticsB.sc biochem i bobi u-1 introduction to bioinformatics
B.sc biochem i bobi u-1 introduction to bioinformatics
Rai University
 
Bioinformatics: What, Why and Where?
Bioinformatics: What, Why and Where?Bioinformatics: What, Why and Where?
Bioinformatics: What, Why and Where?
Mohamed El Hadidi, Ph.D.
 
Computational Biology and Bioinformatics
Computational Biology and BioinformaticsComputational Biology and Bioinformatics
Computational Biology and Bioinformatics
Sharif Shuvo
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
nadimissimple
 
Role of bioinformatics in life sciences research
Role of bioinformatics in life sciences researchRole of bioinformatics in life sciences research
Role of bioinformatics in life sciences research
Anshika Bansal
 
Career oppurtunities in the field of Bioinformatics
Career oppurtunities in the field of BioinformaticsCareer oppurtunities in the field of Bioinformatics
Career oppurtunities in the field of Bioinformatics
Shikha Thakur
 
Data sharing - Data management - The SysMO-SEEK Story
Data sharing - Data management - The SysMO-SEEK StoryData sharing - Data management - The SysMO-SEEK Story
Data sharing - Data management - The SysMO-SEEK Story
Research Information Network
 
Gcc talk baltimore july 2014
Gcc talk baltimore july 2014Gcc talk baltimore july 2014
Gcc talk baltimore july 2014pratikomics
 
Globus Genomics: Democratizing NGS Analysis
Globus Genomics: Democratizing NGS AnalysisGlobus Genomics: Democratizing NGS Analysis
Globus Genomics: Democratizing NGS Analysis
Ravi Madduri
 
Application of bioinformatics
Application of bioinformaticsApplication of bioinformatics
Application of bioinformatics
Kamlesh Patade
 
2016 bmdid-mappings
2016 bmdid-mappings2016 bmdid-mappings
2016 bmdid-mappings
Michel Dumontier
 

What's hot (19)

Introduction to Bioinformatics.
 Introduction to Bioinformatics. Introduction to Bioinformatics.
Introduction to Bioinformatics.
 
Wikidata and the Semantic Web of Food
Wikidata and the  Semantic Web of FoodWikidata and the  Semantic Web of Food
Wikidata and the Semantic Web of Food
 
Careers in bioinformatics, Scope, Skills and Jobs
Careers in bioinformatics, Scope, Skills and JobsCareers in bioinformatics, Scope, Skills and Jobs
Careers in bioinformatics, Scope, Skills and Jobs
 
Interoperable Data for KnetMiner and DFW Use Cases
Interoperable Data for KnetMiner and DFW Use CasesInteroperable Data for KnetMiner and DFW Use Cases
Interoperable Data for KnetMiner and DFW Use Cases
 
Introduction to Bioinformatics Slides
Introduction to Bioinformatics SlidesIntroduction to Bioinformatics Slides
Introduction to Bioinformatics Slides
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
BIOINFORMATICS Applications And Challenges
BIOINFORMATICS Applications And ChallengesBIOINFORMATICS Applications And Challenges
BIOINFORMATICS Applications And Challenges
 
B.sc biochem i bobi u-1 introduction to bioinformatics
B.sc biochem i bobi u-1 introduction to bioinformaticsB.sc biochem i bobi u-1 introduction to bioinformatics
B.sc biochem i bobi u-1 introduction to bioinformatics
 
Bioinformatics: What, Why and Where?
Bioinformatics: What, Why and Where?Bioinformatics: What, Why and Where?
Bioinformatics: What, Why and Where?
 
Bioinformatics principles and applications
Bioinformatics principles and applicationsBioinformatics principles and applications
Bioinformatics principles and applications
 
Computational Biology and Bioinformatics
Computational Biology and BioinformaticsComputational Biology and Bioinformatics
Computational Biology and Bioinformatics
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
Role of bioinformatics in life sciences research
Role of bioinformatics in life sciences researchRole of bioinformatics in life sciences research
Role of bioinformatics in life sciences research
 
Career oppurtunities in the field of Bioinformatics
Career oppurtunities in the field of BioinformaticsCareer oppurtunities in the field of Bioinformatics
Career oppurtunities in the field of Bioinformatics
 
Data sharing - Data management - The SysMO-SEEK Story
Data sharing - Data management - The SysMO-SEEK StoryData sharing - Data management - The SysMO-SEEK Story
Data sharing - Data management - The SysMO-SEEK Story
 
Gcc talk baltimore july 2014
Gcc talk baltimore july 2014Gcc talk baltimore july 2014
Gcc talk baltimore july 2014
 
Globus Genomics: Democratizing NGS Analysis
Globus Genomics: Democratizing NGS AnalysisGlobus Genomics: Democratizing NGS Analysis
Globus Genomics: Democratizing NGS Analysis
 
Application of bioinformatics
Application of bioinformaticsApplication of bioinformatics
Application of bioinformatics
 
2016 bmdid-mappings
2016 bmdid-mappings2016 bmdid-mappings
2016 bmdid-mappings
 

Similar to Centralized Model Organism Database (Biocuration 2014 poster)

Web services for sharing germplasm data sets, at FAO in Rome (2006)
Web services for sharing germplasm data sets, at FAO in Rome (2006)Web services for sharing germplasm data sets, at FAO in Rome (2006)
Web services for sharing germplasm data sets, at FAO in Rome (2006)
Dag Endresen
 
Gene Wiki and Mark2Cure update for BD2K
Gene Wiki and Mark2Cure update for BD2KGene Wiki and Mark2Cure update for BD2K
Gene Wiki and Mark2Cure update for BD2K
Benjamin Good
 
Web based servers and softwares for genome analysis
Web based servers and softwares for genome analysisWeb based servers and softwares for genome analysis
Web based servers and softwares for genome analysis
Dr. Naveen Gaurav srivastava
 
20 years of evolution in data production in health and life sciences
20 years of evolution in data production in health and life sciences20 years of evolution in data production in health and life sciences
20 years of evolution in data production in health and life sciences
slecrom
 
Celsi®, CELL SIGNALING
Celsi®, CELL SIGNALINGCelsi®, CELL SIGNALING
Celsi®, CELL SIGNALING
Spleißen Wege Bio, S de RL de CV
 
CELSI®, CELL SIGNALING
CELSI®, CELL SIGNALINGCELSI®, CELL SIGNALING
CELSI®, CELL SIGNALING
Spleißen Wege Bio, S de RL de CV
 
Celsi®, a virtual simulation software for cell signaling pathways
Celsi®, a virtual simulation software for cell signaling pathwaysCelsi®, a virtual simulation software for cell signaling pathways
Celsi®, a virtual simulation software for cell signaling pathways
Spleißen Wege Bio, S de RL de CV
 
Big Data, The Community and The Commons (May 12, 2014)
Big Data, The Community and The Commons (May 12, 2014)Big Data, The Community and The Commons (May 12, 2014)
Big Data, The Community and The Commons (May 12, 2014)
Robert Grossman
 
A consistent and efficient graphical User Interface Design and Querying Organ...
A consistent and efficient graphical User Interface Design and Querying Organ...A consistent and efficient graphical User Interface Design and Querying Organ...
A consistent and efficient graphical User Interface Design and Querying Organ...
CSCJournals
 
Use of open_linked_data_in_bioinformatics
Use of open_linked_data_in_bioinformaticsUse of open_linked_data_in_bioinformatics
Use of open_linked_data_in_bioinformaticsRemzi Çelebi
 
SFSCON23 - Michele Finelli - Management of large genomic data with free software
SFSCON23 - Michele Finelli - Management of large genomic data with free softwareSFSCON23 - Michele Finelli - Management of large genomic data with free software
SFSCON23 - Michele Finelli - Management of large genomic data with free software
South Tyrol Free Software Conference
 
Software Pipelines: The Good, The Bad and The Ugly
Software Pipelines: The Good, The Bad and The UglySoftware Pipelines: The Good, The Bad and The Ugly
Software Pipelines: The Good, The Bad and The Ugly
João André Carriço
 
Big Data and AI in Fighting Against COVID-19
Big Data and AI in Fighting Against COVID-19Big Data and AI in Fighting Against COVID-19
Big Data and AI in Fighting Against COVID-19
Bill Liu
 
Big Data and AI for Covid-19
Big Data and AI for Covid-19Big Data and AI for Covid-19
Big Data and AI for Covid-19
Andrew Zhang
 
Life Technologies' Journey to the Cloud (ENT208) | AWS re:Invent 2013
Life Technologies' Journey to the Cloud (ENT208) | AWS re:Invent 2013Life Technologies' Journey to the Cloud (ENT208) | AWS re:Invent 2013
Life Technologies' Journey to the Cloud (ENT208) | AWS re:Invent 2013
Amazon Web Services
 
What is Biological Computing And How It Will Change Our World
What is Biological Computing And How It Will Change Our WorldWhat is Biological Computing And How It Will Change Our World
What is Biological Computing And How It Will Change Our World
Bernard Marr
 
2011Field talk at iEVOBIO 2011
2011Field talk at iEVOBIO 20112011Field talk at iEVOBIO 2011
2011Field talk at iEVOBIO 2011
MIBBI Checklists
 
MMTF-Spark: Interactive, Scalable, and Reproducible Datamining of 3D Macromo...
 MMTF-Spark: Interactive, Scalable, and Reproducible Datamining of 3D Macromo... MMTF-Spark: Interactive, Scalable, and Reproducible Datamining of 3D Macromo...
MMTF-Spark: Interactive, Scalable, and Reproducible Datamining of 3D Macromo...
Peter Rose
 
Semantics for Bioinformatics: What, Why and How of Search, Integration and An...
Semantics for Bioinformatics: What, Why and How of Search, Integration and An...Semantics for Bioinformatics: What, Why and How of Search, Integration and An...
Semantics for Bioinformatics: What, Why and How of Search, Integration and An...
Amit Sheth
 

Similar to Centralized Model Organism Database (Biocuration 2014 poster) (20)

Web services for sharing germplasm data sets, at FAO in Rome (2006)
Web services for sharing germplasm data sets, at FAO in Rome (2006)Web services for sharing germplasm data sets, at FAO in Rome (2006)
Web services for sharing germplasm data sets, at FAO in Rome (2006)
 
Gene Wiki and Mark2Cure update for BD2K
Gene Wiki and Mark2Cure update for BD2KGene Wiki and Mark2Cure update for BD2K
Gene Wiki and Mark2Cure update for BD2K
 
Web based servers and softwares for genome analysis
Web based servers and softwares for genome analysisWeb based servers and softwares for genome analysis
Web based servers and softwares for genome analysis
 
20 years of evolution in data production in health and life sciences
20 years of evolution in data production in health and life sciences20 years of evolution in data production in health and life sciences
20 years of evolution in data production in health and life sciences
 
Celsi®, CELL SIGNALING
Celsi®, CELL SIGNALINGCelsi®, CELL SIGNALING
Celsi®, CELL SIGNALING
 
CELSI®, CELL SIGNALING
CELSI®, CELL SIGNALINGCELSI®, CELL SIGNALING
CELSI®, CELL SIGNALING
 
Celsi®, a virtual simulation software for cell signaling pathways
Celsi®, a virtual simulation software for cell signaling pathwaysCelsi®, a virtual simulation software for cell signaling pathways
Celsi®, a virtual simulation software for cell signaling pathways
 
Big Data, The Community and The Commons (May 12, 2014)
Big Data, The Community and The Commons (May 12, 2014)Big Data, The Community and The Commons (May 12, 2014)
Big Data, The Community and The Commons (May 12, 2014)
 
A consistent and efficient graphical User Interface Design and Querying Organ...
A consistent and efficient graphical User Interface Design and Querying Organ...A consistent and efficient graphical User Interface Design and Querying Organ...
A consistent and efficient graphical User Interface Design and Querying Organ...
 
Mrr iti pm_poster
Mrr iti pm_posterMrr iti pm_poster
Mrr iti pm_poster
 
Use of open_linked_data_in_bioinformatics
Use of open_linked_data_in_bioinformaticsUse of open_linked_data_in_bioinformatics
Use of open_linked_data_in_bioinformatics
 
SFSCON23 - Michele Finelli - Management of large genomic data with free software
SFSCON23 - Michele Finelli - Management of large genomic data with free softwareSFSCON23 - Michele Finelli - Management of large genomic data with free software
SFSCON23 - Michele Finelli - Management of large genomic data with free software
 
Software Pipelines: The Good, The Bad and The Ugly
Software Pipelines: The Good, The Bad and The UglySoftware Pipelines: The Good, The Bad and The Ugly
Software Pipelines: The Good, The Bad and The Ugly
 
Big Data and AI in Fighting Against COVID-19
Big Data and AI in Fighting Against COVID-19Big Data and AI in Fighting Against COVID-19
Big Data and AI in Fighting Against COVID-19
 
Big Data and AI for Covid-19
Big Data and AI for Covid-19Big Data and AI for Covid-19
Big Data and AI for Covid-19
 
Life Technologies' Journey to the Cloud (ENT208) | AWS re:Invent 2013
Life Technologies' Journey to the Cloud (ENT208) | AWS re:Invent 2013Life Technologies' Journey to the Cloud (ENT208) | AWS re:Invent 2013
Life Technologies' Journey to the Cloud (ENT208) | AWS re:Invent 2013
 
What is Biological Computing And How It Will Change Our World
What is Biological Computing And How It Will Change Our WorldWhat is Biological Computing And How It Will Change Our World
What is Biological Computing And How It Will Change Our World
 
2011Field talk at iEVOBIO 2011
2011Field talk at iEVOBIO 20112011Field talk at iEVOBIO 2011
2011Field talk at iEVOBIO 2011
 
MMTF-Spark: Interactive, Scalable, and Reproducible Datamining of 3D Macromo...
 MMTF-Spark: Interactive, Scalable, and Reproducible Datamining of 3D Macromo... MMTF-Spark: Interactive, Scalable, and Reproducible Datamining of 3D Macromo...
MMTF-Spark: Interactive, Scalable, and Reproducible Datamining of 3D Macromo...
 
Semantics for Bioinformatics: What, Why and How of Search, Integration and An...
Semantics for Bioinformatics: What, Why and How of Search, Integration and An...Semantics for Bioinformatics: What, Why and How of Search, Integration and An...
Semantics for Bioinformatics: What, Why and How of Search, Integration and An...
 

More from Andrew Su

Building and mining a heterogeneous biomedical knowledge graph
Building and mining a heterogeneous biomedical knowledge graphBuilding and mining a heterogeneous biomedical knowledge graph
Building and mining a heterogeneous biomedical knowledge graph
Andrew Su
 
Wikidata as a FAIR knowledge graph for the life sciences
Wikidata as a FAIR knowledge graph for the life sciencesWikidata as a FAIR knowledge graph for the life sciences
Wikidata as a FAIR knowledge graph for the life sciences
Andrew Su
 
The Gene Wiki: Using Wikipedia and Wikidata to organize biomedical knowledge
The Gene Wiki: Using Wikipedia and Wikidata to organize biomedical knowledgeThe Gene Wiki: Using Wikipedia and Wikidata to organize biomedical knowledge
The Gene Wiki: Using Wikipedia and Wikidata to organize biomedical knowledge
Andrew Su
 
BOSC2017: Using Wikidata as an open, community-maintained database of biomedi...
BOSC2017: Using Wikidata as an open, community-maintained database of biomedi...BOSC2017: Using Wikidata as an open, community-maintained database of biomedi...
BOSC2017: Using Wikidata as an open, community-maintained database of biomedi...
Andrew Su
 
WikiGenomes Poster (ISMB)
WikiGenomes Poster (ISMB)WikiGenomes Poster (ISMB)
WikiGenomes Poster (ISMB)
Andrew Su
 
The case for an open biomedical knowledgebase
The case for an open biomedical knowledgebaseThe case for an open biomedical knowledgebase
The case for an open biomedical knowledgebase
Andrew Su
 
Open data, compound repurposing, and rare diseases (ISCB)
Open data, compound repurposing, and rare diseases (ISCB)Open data, compound repurposing, and rare diseases (ISCB)
Open data, compound repurposing, and rare diseases (ISCB)
Andrew Su
 
Open data, compound repurposing, and rare diseases -- Point Loma Nazarene Uni...
Open data, compound repurposing, and rare diseases -- Point Loma Nazarene Uni...Open data, compound repurposing, and rare diseases -- Point Loma Nazarene Uni...
Open data, compound repurposing, and rare diseases -- Point Loma Nazarene Uni...
Andrew Su
 
Citizen Science and Rare Disease Research
Citizen Science and Rare Disease ResearchCitizen Science and Rare Disease Research
Citizen Science and Rare Disease Research
Andrew Su
 
Open biomedical knowledge using crowdsourcing and citizen science
Open biomedical knowledge using crowdsourcing and citizen scienceOpen biomedical knowledge using crowdsourcing and citizen science
Open biomedical knowledge using crowdsourcing and citizen science
Andrew Su
 
Heart BD2K, Biocuration, and Citizen Science
Heart BD2K, Biocuration, and Citizen ScienceHeart BD2K, Biocuration, and Citizen Science
Heart BD2K, Biocuration, and Citizen Science
Andrew Su
 
Panel on Citizen Science and Crowdsourcing Games - March 27, 2015
Panel on Citizen Science and Crowdsourcing Games - March 27, 2015Panel on Citizen Science and Crowdsourcing Games - March 27, 2015
Panel on Citizen Science and Crowdsourcing Games - March 27, 2015
Andrew Su
 
Using Citizen Science to organize biomedical knowledge
Using Citizen Science to organize biomedical knowledgeUsing Citizen Science to organize biomedical knowledge
Using Citizen Science to organize biomedical knowledge
Andrew Su
 
UCSD / DBMI seminar 2015-02-6
UCSD / DBMI seminar 2015-02-6UCSD / DBMI seminar 2015-02-6
UCSD / DBMI seminar 2015-02-6
Andrew Su
 
Crowdsourcing and Learning from Crowd Data (Tutorial @ PSB2015)
Crowdsourcing and Learning from Crowd Data (Tutorial @ PSB2015)Crowdsourcing and Learning from Crowd Data (Tutorial @ PSB2015)
Crowdsourcing and Learning from Crowd Data (Tutorial @ PSB2015)
Andrew Su
 
Microtask crowdsourcing for annotating diseases in PubMed abstracts (ASHG 2014)
Microtask crowdsourcing for annotating diseases in PubMed abstracts (ASHG 2014)Microtask crowdsourcing for annotating diseases in PubMed abstracts (ASHG 2014)
Microtask crowdsourcing for annotating diseases in PubMed abstracts (ASHG 2014)
Andrew Su
 
Crowdsourcing Biology: The Gene Wiki, BioGPS, and Citizen Science
Crowdsourcing Biology: The Gene Wiki, BioGPS, and Citizen ScienceCrowdsourcing Biology: The Gene Wiki, BioGPS, and Citizen Science
Crowdsourcing Biology: The Gene Wiki, BioGPS, and Citizen Science
Andrew Su
 
A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...
A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...
A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...
Andrew Su
 
Crowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.org
Crowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.orgCrowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.org
Crowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.org
Andrew Su
 
NCBO Webinar: Translating unstructured, crowdsourced content into structured ...
NCBO Webinar: Translating unstructured, crowdsourced content into structured ...NCBO Webinar: Translating unstructured, crowdsourced content into structured ...
NCBO Webinar: Translating unstructured, crowdsourced content into structured ...
Andrew Su
 

More from Andrew Su (20)

Building and mining a heterogeneous biomedical knowledge graph
Building and mining a heterogeneous biomedical knowledge graphBuilding and mining a heterogeneous biomedical knowledge graph
Building and mining a heterogeneous biomedical knowledge graph
 
Wikidata as a FAIR knowledge graph for the life sciences
Wikidata as a FAIR knowledge graph for the life sciencesWikidata as a FAIR knowledge graph for the life sciences
Wikidata as a FAIR knowledge graph for the life sciences
 
The Gene Wiki: Using Wikipedia and Wikidata to organize biomedical knowledge
The Gene Wiki: Using Wikipedia and Wikidata to organize biomedical knowledgeThe Gene Wiki: Using Wikipedia and Wikidata to organize biomedical knowledge
The Gene Wiki: Using Wikipedia and Wikidata to organize biomedical knowledge
 
BOSC2017: Using Wikidata as an open, community-maintained database of biomedi...
BOSC2017: Using Wikidata as an open, community-maintained database of biomedi...BOSC2017: Using Wikidata as an open, community-maintained database of biomedi...
BOSC2017: Using Wikidata as an open, community-maintained database of biomedi...
 
WikiGenomes Poster (ISMB)
WikiGenomes Poster (ISMB)WikiGenomes Poster (ISMB)
WikiGenomes Poster (ISMB)
 
The case for an open biomedical knowledgebase
The case for an open biomedical knowledgebaseThe case for an open biomedical knowledgebase
The case for an open biomedical knowledgebase
 
Open data, compound repurposing, and rare diseases (ISCB)
Open data, compound repurposing, and rare diseases (ISCB)Open data, compound repurposing, and rare diseases (ISCB)
Open data, compound repurposing, and rare diseases (ISCB)
 
Open data, compound repurposing, and rare diseases -- Point Loma Nazarene Uni...
Open data, compound repurposing, and rare diseases -- Point Loma Nazarene Uni...Open data, compound repurposing, and rare diseases -- Point Loma Nazarene Uni...
Open data, compound repurposing, and rare diseases -- Point Loma Nazarene Uni...
 
Citizen Science and Rare Disease Research
Citizen Science and Rare Disease ResearchCitizen Science and Rare Disease Research
Citizen Science and Rare Disease Research
 
Open biomedical knowledge using crowdsourcing and citizen science
Open biomedical knowledge using crowdsourcing and citizen scienceOpen biomedical knowledge using crowdsourcing and citizen science
Open biomedical knowledge using crowdsourcing and citizen science
 
Heart BD2K, Biocuration, and Citizen Science
Heart BD2K, Biocuration, and Citizen ScienceHeart BD2K, Biocuration, and Citizen Science
Heart BD2K, Biocuration, and Citizen Science
 
Panel on Citizen Science and Crowdsourcing Games - March 27, 2015
Panel on Citizen Science and Crowdsourcing Games - March 27, 2015Panel on Citizen Science and Crowdsourcing Games - March 27, 2015
Panel on Citizen Science and Crowdsourcing Games - March 27, 2015
 
Using Citizen Science to organize biomedical knowledge
Using Citizen Science to organize biomedical knowledgeUsing Citizen Science to organize biomedical knowledge
Using Citizen Science to organize biomedical knowledge
 
UCSD / DBMI seminar 2015-02-6
UCSD / DBMI seminar 2015-02-6UCSD / DBMI seminar 2015-02-6
UCSD / DBMI seminar 2015-02-6
 
Crowdsourcing and Learning from Crowd Data (Tutorial @ PSB2015)
Crowdsourcing and Learning from Crowd Data (Tutorial @ PSB2015)Crowdsourcing and Learning from Crowd Data (Tutorial @ PSB2015)
Crowdsourcing and Learning from Crowd Data (Tutorial @ PSB2015)
 
Microtask crowdsourcing for annotating diseases in PubMed abstracts (ASHG 2014)
Microtask crowdsourcing for annotating diseases in PubMed abstracts (ASHG 2014)Microtask crowdsourcing for annotating diseases in PubMed abstracts (ASHG 2014)
Microtask crowdsourcing for annotating diseases in PubMed abstracts (ASHG 2014)
 
Crowdsourcing Biology: The Gene Wiki, BioGPS, and Citizen Science
Crowdsourcing Biology: The Gene Wiki, BioGPS, and Citizen ScienceCrowdsourcing Biology: The Gene Wiki, BioGPS, and Citizen Science
Crowdsourcing Biology: The Gene Wiki, BioGPS, and Citizen Science
 
A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...
A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...
A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...
 
Crowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.org
Crowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.orgCrowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.org
Crowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.org
 
NCBO Webinar: Translating unstructured, crowdsourced content into structured ...
NCBO Webinar: Translating unstructured, crowdsourced content into structured ...NCBO Webinar: Translating unstructured, crowdsourced content into structured ...
NCBO Webinar: Translating unstructured, crowdsourced content into structured ...
 

Centralized Model Organism Database (Biocuration 2014 poster)

A Centralized Model Organism Database (CMOD) for the Long Tail of Genomes
Andrew I. Su, Benjamin M. Good, Chinmay Naik and Adriel Carolino
The Scripps Research Institute, La Jolla, California, USA

ABSTRACT
Model organism databases are fantastic resources for genomics researchers. But relatively few model organisms have stable funding for their database, and the number of sequenced genomes is increasing exponentially. It seems impractical to create and fund a model organism database for each of them. Here, we describe our efforts to build a Centralized Model Organism Database (CMOD), a single online resource to support all genomes and organisms. To scale to the Long Tail of Genomes, CMOD employs an open editing model in which the entire research community is empowered to edit and maintain genomic data. We describe our efforts to systematically populate CMOD with two core data types across all organisms: genome annotations and Gene Ontology annotations.

Background
Model organism databases (MODs) are fantastic resources for organizing genomic information for commonly-studied organisms. To facilitate the creation and maintenance of MODs, the Generic Model Organism Database (GMOD) Project provides “a set of interoperable open-source software components for visualizing, annotating, and managing biological data.”

Despite the obvious success and value of GMOD, the number of sequenced genomes is growing exponentially. Does this model scale with the rate of genome sequencing?

[Figure: Cumulative number of sequenced genomes, 1997 to 2025, on a logarithmic axis from 1 to 1,000,000; series: Bacteria, Eukaryotes, Archaea. Figure courtesy Scott Cain.]

The Dark Matter of genome annotation
This seminal paper identified 517 operons and 103 small regulatory RNAs in Listeria monocytogenes, an important human pathogen. Unfortunately, these annotations cannot be downloaded from the Broad’s “Listeria monocytogenes Database”, NCBI Genome, UCSC’s Microbial Genome Browser, EnsemblBacteria, or any GMOD instance. The only place they are available is the Supplementary Information on the Nature website, in PDF format.

Wikidata
“Provide a database of the world’s knowledge that anyone can edit.” - Denny Vrandečić

Wikidata (http://wikidata.org) is an innovative and important new tool for community-based knowledge management. Wikidata is supported by the Wikimedia Foundation, which also operates Wikipedia. In short, Wikidata is to structured data what Wikipedia is to free text.

Wikidata currently catalogs over 14 million entities, and describes those entities in the form of 27 million statements. This knowledgebase is the product of over 50 million edits. Of those edits, ~90% are contributed by bots that predominantly import data from structured resources, and ~10% are contributed by human editors.

The CMOD vision
We propose to build a Centralized Model Organism Database (CMOD), which would house gene and genome annotations for all genomes. This database would be based on Wikidata, enabling it to be community-curated, continuously updated, and computer-readable.

CMOD Gene and genome annotations
CMOD data can be accessed through several mechanisms. The Wikidata web interface offers convenient access using a web browser. The Wikidata application programming interface (API) and associated programming libraries give programmers and bioinformaticians computational access to the data. Wikidata export to RDF offers compatibility with the Semantic Web and Linked Data. We also envision that many popular GMOD tools, including GBrowse, JBrowse, and WebApollo, can be modified to use CMOD as the back-end data warehouse.

Progress and status
We have loaded gene and genome annotation data for ~1000 human genes, the human proteins they encode, and their mouse orthologs according to the data model shown above. The code repository for managing these data is available at https://bitbucket.org/sulab/wikidatagenebot.

The Skeptic’s Corner
Will CMOD scale with the exponential growth in sequenced genomes?
Yes, because there is no gatekeeper to adding new content. Anyone is empowered to directly contribute. Even though the technical infrastructure is centralized, the data management is highly distributed.

Who will contribute to CMOD?
We envision a wide spectrum of contributors, from large biocuration/annotation centers adding large data sets, to individual bioinformaticians who deposit structured versions of previously unstructured data, to individual scientists contributing individual annotations.

Will CMOD content be trustworthy?
As with Wikipedia, we expect that Wikidata overall will asymptotically approach perfect accuracy and completeness. Moreover, because provenance is a core part of the data model, the presence, absence, or type of a statement's reference can be used to systematically filter the knowledgebase according to each user’s needs.

CONCLUSION
Managing genomic information and knowledge is a critical challenge for biomedical research. Community infrastructure that allows individuals to collaboratively and collectively organize knowledge has the potential to be an enabling technology in biological research. Here, we propose CMOD as one such application that is particularly focused on the Long Tail of sequenced genomes.

FUNDING
We acknowledge support from the National Institute of General Medical Sciences (GM089820 and GM083924).

CONTACT
Benjamin Good: bgood@scripps.edu, @bgood
Andrew Su: asu@scripps.edu, @andrewsu

We need more hands on deck! We have multiple positions open for postdocs and programmers interested in crowdsourcing and bioinformatics projects (like CMOD)!
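
The API access described under "CMOD Gene and genome annotations" and the reference-based filtering mentioned in The Skeptic's Corner can be illustrated with a minimal Python sketch. It is not taken from the poster or from the wikidatagenebot repository. It assumes the public Wikidata `wbgetentities` API action and the standard Wikidata entity JSON layout, in which each statement may carry a `references` list. The function names, the sample entity, and its property keys are hypothetical placeholders, and no network request is made.

```python
from urllib.parse import urlencode

# Public Wikidata API endpoint (not named on the poster; assumed here).
WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def entity_url(entity_id: str) -> str:
    """Build a wbgetentities URL that requests the claims of one entity as JSON."""
    params = {
        "action": "wbgetentities",
        "ids": entity_id,
        "props": "claims",
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def referenced_statements(entity: dict) -> dict:
    """Keep only statements that carry at least one reference (provenance)."""
    kept = {}
    for prop, statements in entity.get("claims", {}).items():
        with_refs = [s for s in statements if s.get("references")]
        if with_refs:
            kept[prop] = with_refs
    return kept


# Hypothetical entity fragment, shaped like Wikidata entity JSON.
# The property keys are placeholders, not real Wikidata property IDs.
sample_entity = {
    "claims": {
        "P_WITH_REF": [
            {"mainsnak": {}, "references": [{"snaks": {}}]},
            {"mainsnak": {}},  # no reference, so it is filtered out
        ],
        "P_NO_REF": [{"mainsnak": {}}],
    }
}

print(entity_url("Q42"))
print(sorted(referenced_statements(sample_entity)))
```

A stricter filter could inspect the snaks inside each reference to keep only statements backed by a particular type of source, which is the "type of reference" case the poster mentions.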