Curators are necessarily detail oriented -- a trait born of, and reinforced by, our efforts to describe biological data accurately and precisely. To ensure comprehensive coverage and meaningful integration of new and existing knowledge, however, it is important to periodically step back from this fine-grained view and assess emergent features in accumulated curation. I will explore how PomBase has used the global "big picture" view of curated data to provide biological summaries, modularise content, and improve data display and access for our users. The global perspective can also be used to detect annotation errors and identify knowledge gaps, thereby improving overall annotation quality. I will also describe the progress we have made in engaging fission yeast researchers in community curation. Finally, I will show that the global curation perspective and community engagement share a common theme: both improve overall understanding, accessibility and reuse of accumulated knowledge by our user community.
Curate locally, think globally
1. Curate locally, think globally
(Insights from the “big-picture” view of curation)
Valerie Wood, PomBase,
Department of Biochemistry, University of Cambridge, UK
ISB 2019
2. The PomBase team
Midori Harris (curator, ontology developer)
Antonia Lock (curator)
Kim Rutherford (developer)
3. What can we learn from a “big picture” view of curated data (especially to improve our resources for end users)?
How can we effectively engage users in the curation process?
4. Insights from the “big-picture” view of curation
● QC: identify annotation errors and/or outliers
● Identify annotation gaps
● Identify knowledge gaps (and improve annotation breadth)
● Improve data access and presentation
Ultimately curation helps us to join the dots and synthesize new knowledge from data integration.
We often overlook the value of the emergent knowledge from the ‘sum of the parts’.
5. Gene expression
Making lists using ontologies and vocabularies
We curate detail, annotating genes to ‘terms’:
Gene1: RNA recognition motif; mRNA export; protein kinase activity; nucleus; transcription
Gene2: protein kinase activity; RNA binding domain; mRNA export; nucleus; transcription; mitotic cell cycle
Essentially creating 1000s of lists of ‘objects’ with similar features:
mRNA export: Gene 1, Gene 2, Gene 4, Gene 6, Gene 8, Gene 9
transcription: Gene 1, Gene 2, Gene 3, Gene 7, Gene 10
These lists are often related to each other through ontologies.
We can use sets of lists to create “Annotation subsets”.
So why are lists useful?
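The list-making idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene names and term assignments are toy data loosely modelled on the slide, and `build_term_lists` is a hypothetical helper, not part of any PomBase tool.

```python
from collections import defaultdict

# Toy annotations (assumed for illustration; loosely based on the slide).
gene_to_terms = {
    "Gene1": ["mRNA export", "protein kinase activity", "nucleus", "transcription"],
    "Gene2": ["protein kinase activity", "mRNA export", "nucleus",
              "transcription", "mitotic cell cycle"],
    "Gene3": ["transcription"],
}


def build_term_lists(annotations):
    """Invert gene -> terms into term -> sorted list of genes."""
    term_to_genes = defaultdict(set)
    for gene, terms in annotations.items():
        for term in terms:
            term_to_genes[term].add(gene)
    return {term: sorted(genes) for term, genes in term_to_genes.items()}


term_lists = build_term_lists(gene_to_terms)
for term, genes in sorted(term_lists.items()):
    print(f"{term}: {', '.join(genes)}")
```

Each curated gene-to-term annotation contributes one entry to a per-term list, so the lists fall out of detailed curation without extra work.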
6. GO slim = Ontology subset
of “high level” GO terms
“GO slim annotation subset”
= set of lists
GO slim
https://www.ebi.ac.uk/QuickGO/
Biological process slim (for analysis)
should represent known biology well
8. ‘High level’ terms are often uninformative for physiological role
Fission yeast: 4369 proteins with biological process annotation
metabolic process: 3237 (75% of BP annotated proteins)
cellular process: 4112
Intersection: metabolic process ∩ cellular process = 3167
Other process terms excluded: response to (chemical), phosphorylation (can also apply to any module)
Terms which apply across annotation space are often too general to be informative about physiological role (for a biologist).
Slims with specificity are more useful.
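As a rough check on why such broad terms say little, the counts quoted on this slide can be combined with simple set arithmetic. The sketch below uses only the four numbers from the slide; the variable names and the `percent` helper are assumptions for illustration.

```python
# Counts as quoted on the slide (fission yeast, biological process annotation).
total_bp = 4369   # proteins with any biological process annotation
metabolic = 3237  # annotated to "metabolic process"
cellular = 4112   # annotated to "cellular process"
both = 3167       # annotated to both terms


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Inclusion-exclusion: proteins annotated to at least one of the two terms.
union = metabolic + cellular - both

print(f"metabolic process: {percent(metabolic, total_bp):.1f}% of BP-annotated proteins")
print(f"cellular process:  {percent(cellular, total_bp):.1f}% of BP-annotated proteins")
print(f"either term:       {percent(union, total_bp):.1f}% of BP-annotated proteins")
print(f"metabolic proteins also in cellular process: {percent(both, metabolic):.1f}%")
```

The metabolic process share comes out at about 74%, in line with the slide's rounded 75%, and almost all of those proteins are also in cellular process, so neither term separates proteins by physiological role.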
9. Fission yeast GO slim, 53 terms
● Good coverage of process (99% of gene products with BP). Important to clearly indicate what does not “slim” (and why)
● Some gene products belong to more than one slim category. Overlaps are unavoidable but are minimised where possible
● Align with biologically meaningful ‘modules’
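The three properties listed above (coverage, overlaps, and what does not slim) can each be computed from a gene-to-slim-term mapping. The following is a minimal sketch with made-up data; `slim_report` and the gene identifiers are hypothetical, and a real pipeline would first map each annotation up to its slim ancestors through the ontology.

```python
def slim_report(gene_to_slim_terms):
    """Return (coverage fraction, genes in >1 slim category, genes that do not slim)."""
    total = len(gene_to_slim_terms)
    unslimmed = sorted(g for g, terms in gene_to_slim_terms.items() if not terms)
    multi = sorted(g for g, terms in gene_to_slim_terms.items() if len(terms) > 1)
    coverage = (total - len(unslimmed)) / total if total else 0.0
    return coverage, multi, unslimmed


# Toy mapping (assumed): each gene product and the slim terms it maps to.
gene_to_slim_terms = {
    "g1": {"transcription"},
    "g2": {"transcription", "mRNA export"},
    "g3": set(),  # annotated, but to nothing covered by the slim
    "g4": {"lipid metabolic process"},
}

coverage, multi, unslimmed = slim_report(gene_to_slim_terms)
print(f"coverage: {coverage:.0%}")
print("in more than one slim category:", multi)
print("does not slim:", unslimmed)
```

Reporting the genes that do not slim alongside the coverage figure makes it easy to state explicitly what is left out.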
10. Slim terms and intersections (biological modules)
5069 proteins
All cardiolipin biosynthesis
Unknown 700
11. Using co-annotation and biological knowledge as a QC procedure for annotation
Example intersection with no co-annotated genes: transmembrane transport ∩ tRNA metabolism = 0

                      tRNA metabolism   transmembrane transport
Fission yeast         161               339
All GO annotation     78000             14000

Current intersection: 10 (possible annotation errors?)
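A co-annotation check of this kind reduces to a set intersection between two term lists. The sketch below assumes toy gene lists with placeholder gene names; `co_annotated` is a hypothetical helper written for this example.

```python
def co_annotated(term_lists, term_a, term_b):
    """Return the sorted genes annotated to both terms."""
    genes_a = set(term_lists.get(term_a, ()))
    genes_b = set(term_lists.get(term_b, ()))
    return sorted(genes_a & genes_b)


# Toy lists (assumed); gene names are placeholders, not real annotations.
term_lists = {
    "tRNA metabolism": ["geneA", "geneB", "geneC"],
    "transmembrane transport": ["geneC", "geneD"],
}

suspects = co_annotated(term_lists, "tRNA metabolism", "transmembrane transport")
print(f"{len(suspects)} co-annotated gene(s) to review: {suspects}")
```

When biology says the intersection should be empty, every gene returned is a candidate annotation error worth a curator's attention.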
15. Multispecies rule building results
Pilot project, tested mouse, worm, yeast
107 rules created, each stating that a particular annotation intersection = 0
Annotation errors (experimental) identified (and corrected): 147
(chart values: 74 and 73)
Acknowledgements: MGI: David Hill; WormBase: Kimberley Van Auken; SGD: Stacia Engel, Rama Balakrishnan
16. Multispecies rule building results
Only 0.001% of annotation corpus (600 million). Lots of scope...
Preliminary rules are now incorporated into the GO rule base
Plan to publish soon…
Electronic annotations are based on manual annotation using experimental data.
Therefore a small number of corrections to manual annotation can fix a large number of automatic annotations applied across non-model species
18. Slow progress characterizing unknowns
Hidden in plain sight: what remains to be discovered in the eukaryotic proteome? PMID:30938578
20% pombe and cerevisiae still “unknown process”
19. 20% human also unknown
117 terms
53 terms
Extended PomBase slim to cover multicellular process annotation
We confirmed that human unannotated are unknown, even when not explicitly annotated as such
20. Why are unknowns unstudied?
27
Based on recent gene characterizations in fission yeast
Most recently characterised proteins are involved in non-core functions:
● Environment-responsive or aging-related processes: detoxification, proteostasis, lipidostasis, damage accumulation.
● Processes that are only required over longer timescales
● Less than 25% are housekeeping processes
21.
22. How can we help users to cut through the complexity?
https://www.pombase.org/browse-curation/fission-yeast-go-slim-terms
See P174 for recent updates to the PomBase website
25. Community Curation, making small-scale data FAIR
See P133 (Antonia Lock) P168 (Alayne Cuzick)
Easy to use curation tool (Canto), step-by-step workflow
26. Please, add also delta crs1: normal onset of premeiotic DNA replication. Data in Fig S4.
I am wondering a normal proportion germinates and go on to form viable colonies) - this is not what the definition is suggesting, but would be a more useful term
I like this better. Is there also a ….“reduced viability of spore population” or something like this?
….in addition to “delayed onset of premeiotic DNA replication” Is it possible to use two different Term names?
Yes of course. The peak looks a bit broader - would this be the equivalent to 'prolonged premeiotic DNA replication’?
Yes, the kinetics of the disappearance of the G1 population is much slower; prolonged premeiotic DNA replication is fine (or extended).
27. Community curation, increasing participation
Literature triage identifies 6K ‘gene specific’ papers among the 12.5K that mention fission yeast
Quality is EXCELLENT; coverage is not so good, but improves with subsequent sessions.
Once ‘initiated’, the drop-out rate is low.
Nobody does it until asked; most need reminders
Annotations per low-throughput study
9 18 41
28. Understand curation
improved reuse, visibility
and dissemination
Canto is easy to use BUT we can improve
242 respondents who had used Canto out of 632 total
29. What are the barriers?
The dog ate my homework (7)
● Many apologies for not having done yet...
● I know I should have done.
● I keep meaning to and will!
● It is next on the 'to do' list!
● I have no excuse. I should and will curate my paper
● feel guilty for not doing so!
● ...I'm sure it's not that difficult, just hard to find time. I do think it's worthwhile and that I should prioritize my curation contribution
● Curation of papers is extremely important and this survey definitely motivates me to take the time to use Canto and curate my papers.
32. Testimonials - Making new connections
It is back and forth: think about the ..results for a while, then compare with the body of data in PomBase, then think/work a bit more. Rinse and repeat. Martin Převorovský, Principal Investigator, Charles University, Czech Republic
I don't think we could have done anything without pombase...we build our research around its knowledge base. Mikel Zaratiegui, Principal Investigator, Rutgers University, US
…...frequently use the ...gene annotations to make connections between pathways and to design experiments. Amanda Bird, Principal Investigator, The Ohio State University, US
Recently we performed a screen and by using PomBase we quickly realized that all the hits were clustered in the same pathway. Finding this out without pombase would have required extensive review of papers that are not within our field of expertise. In this example a few minutes of work on PomBase gave us confidence that we were onto something and saved us many weeks of work. Anonymous Principal Investigator
PomBase...has saved me countless hours of fruitless experiments and helped open up many new, unexpected avenues of investigation. Gautam Dey, Postdoc, MRC LMCB UCL, UK
“Over 300 testimonials have been received from across the research community..Quite simply, without it, many significant discoveries would simply not have been made….”
“Ultimately, this integrated data is driving science forward in novel ways by enabling the community to make connections between new and existing data…” Paul Nurse, Director, Crick Institute
33. Acknowledgements
The PomBase team
Midori Harris (curator, ontology developer)
Antonia Lock (curator)
Kim Rutherford (developer)
Collaborators
Gene Ontology editorial team
Pascale Gaudet
David Hill
Kimberley Van Auken
Harold Drabkin
Chris Mungall
Seth Carbon
35. Intersections in a simple eukaryote
cytoplasmic translation,
RNA metabolism
ribosome biogenesis
nucleocytoplasmic transport
TOTAL 1359
cell wall organization
glycosylation
lipid metabolic process
membrane organization
vesicle-mediated transport
TOTAL 722
Intermodule: only 9 shared genes
Using co-annotation and biological knowledge as a QC procedure for annotation
36. Step 1
Annotations shared between sets of GO terms are explored and annotation intersections (number of genes annotated) are noted.
37. Step 2
Rules are created for “zero intersects” based on known biology:
• (“cellular amino acid metabolic process” ∩ “DNA recombination”) = 0
• (“lipid metabolic process” ∩ “carbohydrate metabolic process”) = 0
Step 3
Identify new annotations violating existing rules.
Report to contributing database(s) for validation.
38. Step 4
Annotations critically inspected, leading to one of two outcomes:
A: Violation identified: contributing database corrects annotation
B: Annotation confirmed: rules are extended to allow specific exceptions:
Steps 1-4 form an iterative process: explore co-annotation, create biological “rules”, identify and report violations, correct or modify.
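The four-step cycle can be illustrated with a small rule checker. This is a sketch under stated assumptions, not the GO rule base implementation: the two rules are the examples quoted in Step 2, while the gene identifiers, the `find_violations` helper, and the way exceptions are stored are all hypothetical.

```python
# Zero-intersect rules: pairs of terms expected to share no genes (examples from Step 2).
RULES = [
    ("cellular amino acid metabolic process", "DNA recombination"),
    ("lipid metabolic process", "carbohydrate metabolic process"),
]


def find_violations(term_to_genes, rules, exceptions=frozenset()):
    """Return (term_a, term_b, gene) triples that break a rule and are not known exceptions."""
    violations = []
    for term_a, term_b in rules:
        shared = term_to_genes.get(term_a, set()) & term_to_genes.get(term_b, set())
        for gene in sorted(shared):
            if (term_a, term_b, gene) not in exceptions:
                violations.append((term_a, term_b, gene))
    return violations


# Toy annotations (assumed); g2 and g4 are each co-annotated across a rule.
term_to_genes = {
    "cellular amino acid metabolic process": {"g1", "g2"},
    "DNA recombination": {"g2", "g3"},
    "lipid metabolic process": {"g4"},
    "carbohydrate metabolic process": {"g4", "g5"},
}

# Step 3: identify and report violations.
first_pass = find_violations(term_to_genes, RULES)

# Step 4, outcome A: the contributing database corrects the g2 annotation.
term_to_genes["DNA recombination"].discard("g2")
# Step 4, outcome B: g4 is confirmed, so it is recorded as an allowed exception.
confirmed = {("lipid metabolic process", "carbohydrate metabolic process", "g4")}

# Re-running the check after corrections and exceptions closes the loop.
second_pass = find_violations(term_to_genes, RULES, confirmed)
print(first_pass)
print(second_pass)
```

Keeping confirmed exceptions separate from the rules means a rule stays strict for every other gene, and newly loaded annotations are still checked on the next iteration.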