PomBase conventions for improving
annotation depth, breadth,
consistency and accuracy
Annotation numbers are important
…but numbers aren’t everything…
• Use of annotation for data-mining and data-analysis is limited
by errors, inconsistencies and omissions.
• PomBase combines annotation conventions, which improve
information content (greater coverage and specificity, less
redundancy), with QC mechanisms that identify possible
annotation inconsistencies and errors.
• In combination these mechanisms address many recurring
annotation issues.
1. The definition is critical
All ontology terms have a “fixed” definition
• If a definition is misleading or incorrect, its meaning cannot
be changed. To fix it, the term is obsoleted and its annotations
are migrated.
• This makes annotations very robust to ontology changes. If
a term needs to be repositioned, the annotations remain
correct.
• We annotate to the definition, not the term name. Always
check the definition.
2. Improving annotation specificity
• i) Consider descendant terms
• ii) Veto use of uninformative terms
2i. Consider descendants
Annotate as specifically as the experiment allows and be
unambiguous about the biology
• regulation: positive or negative?
• translation: cytoplasmic or mitochondrial?
• transport: of what? to where? how?
• chromosome segregation: mitotic or meiotic?
If the available terms are insufficient, request a more specific
term
• For a carboxylic acid carrier, “carboxylic acid transport”
initially looks OK.
• However, “transmembrane transport” is not explicit here:
carboxylic acid might be transported in other ways.
2i. Consider descendants e.g.
More specific annotation can
provide additional detail e.g.
• substrate,
• type (transmembrane),
• sometimes directionality
Additional parents increase the
information content, because the
gene product is annotated
indirectly to more terms.
2. Consider descendants e.g.
2ii. Veto use of non-specific terms
Identify the set of ontology terms where more specific
annotation should be possible (more biological detail)
Examples:
• cellular process (which one?)
• translation (cytoplasmic? mitochondrial?)
• transport (of what? to where?)
Some GO terms are already flagged as not for manual
annotation. Review and improve annotations to vetoed terms.
PomBase blocks 1298 upper-level GO terms for direct
annotation (<200 violations)
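
The veto check described above could be sketched roughly as follows. This is a minimal illustration only, not PomBase's actual pipeline: the function name `find_veto_violations`, the gene names, and the use of term names instead of GO IDs are all assumptions made for the example.

```python
# Hypothetical veto list: term names taken from the examples on this slide.
VETOED_TERMS = {"cellular process", "translation", "transport"}


def find_veto_violations(annotations, vetoed_terms):
    """Return (gene, term) pairs annotated directly to a vetoed term."""
    return [(gene, term) for gene, term in annotations if term in vetoed_terms]


# Hypothetical annotations (gene names are placeholders).
annotations = [
    ("geneA", "translation"),              # too general: flagged
    ("geneB", "cytoplasmic translation"),  # specific enough: passes
    ("geneC", "transport"),                # too general: flagged
]

violations = find_veto_violations(annotations, VETOED_TERMS)
for gene, term in violations:
    print(f"{gene}: direct annotation to vetoed term '{term}', needs review")
```

Each flagged pair is a candidate for review: either a more specific descendant term can be used, or a new term needs to be requested.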
3i. Missing parents
Original arrangement
3. Improve the ontologies
3i. Missing parents
These process annotations were originally in different branches
of the ontology, so all annotations were required
New arrangement:
3i. Missing parents
3i. Missing parents
Collapsed 6 processes to 2, with exactly the same information content.
Less redundancy; easier for users to interpret annotation.
3ii. Report incorrect parents
AKA “True Path Violations” or “TPVs”
For example
protein maturation
--protein processing (part_of)
----proteolysis (part_of)
(not all proteolysis is processing or
maturation)
4. The power of Annotation Extensions
Provide additional specificity for a GO annotation e.g.
• Target gene (kinase substrate, TF regulation target)
• Location of a function
• Localization dependencies (protein A localizes protein B)
• Spatial and temporal aspects of processes, functions, locations (cell cycle stage
of occurrence)
• ADD an example of a gene product specific AE
See: Huntley et al. A method for increasing expressivity of Gene Ontology
annotations using a compositional approach. PMID:24885854
cyclin-dependent protein serine/threonine kinase
• has substrate fkh2 involved in negative regulation of conjugation with cellular fusion
• directly inhibits srw1 involved in positive regulation of G1/S transition
• has substrate drc1 involved in positive regulation of mitotic cell cycle DNA replication
• has substrate cdc18, orc2 involved in negative regulation of DNA replication during mitotic G2 phase
• has substrate xlf1 involved in negative regulation of double-strand break repair via nonhomologous end joining,
during mitotic G2 phase
• has substrate rap1 involved in negative regulation of mitotic telomere tethering at nuclear periphery
during mitotic M phase
• has substrate hcn1 during mitotic M phase
• has substrate cut3 involved in positive regulation of mitotic chromosome condensation during mitotic metaphase
• has substrate mde4 involved in correction of merotelic attachment, mitotic during mitotic metaphase
• has substrate nsk1 involved in negative regulation of attachment of mitotic spindle microtubules during mitotic
metaphase
• has substrate mde4, cut7 involved in negative regulation of mitotic spindle elongation during mitotic metaphase
• has substrate klp9 involved in negative regulation of mitotic spindle elongation during mitotic anaphase A
• directly inhibits clp1 involved in negative regulation of exit from mitosis
• has substrate byr4 involved in positive regulation of septation initiation signaling
• directly inhibits dis2
• has substrate rum1, crb2, sds23
Link function (cyclin-dependent-kinase) to target genes, processes,
and temporal information
4. Annotation Extension e.g. cdc2
Alternative (human CDK1):
Not scalable or maintainable
4. Using AE for effectors
• Reciprocal of the extension (automated) called “target of”
• Collects known “upstream effectors” on cdc2 page
• We can use effector-substrate connections to generate
networks (interaction, metabolic, regulatory)
• Provide directional links to support pathway reconstruction
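
As a rough sketch of the idea above, the reciprocal “target of” view and the directed network edges can both be derived from the same extension records. The tuple layout and the function names `reciprocal_targets` and `directed_edges` are hypothetical; the three records are taken from the cdc2 extension list shown earlier.

```python
from collections import defaultdict

# (annotated gene, extension relation, target gene), from the cdc2 example.
extensions = [
    ("cdc2", "has substrate", "fkh2"),
    ("cdc2", "directly inhibits", "srw1"),
    ("cdc2", "has substrate", "rum1"),
]


def reciprocal_targets(extensions):
    """Map each target gene to its upstream effectors (the "target of" view)."""
    target_of = defaultdict(list)
    for effector, relation, target in extensions:
        target_of[target].append((relation, effector))
    return dict(target_of)


def directed_edges(extensions):
    """Return effector -> target edges suitable for building a network."""
    return [(effector, target) for effector, _relation, target in extensions]


target_of = reciprocal_targets(extensions)
edges = directed_edges(extensions)
print(target_of["rum1"])  # upstream effectors recorded for rum1
print(edges)
```

Because the reciprocal is computed from the forward annotation, it never needs to be curated separately and cannot drift out of sync with it.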
4. Using Annotation Extensions to
generate networks/pathways
[Figure: example network generated from annotation extensions; node labels: sty1, cmk2, srk1, rum1, atf1, gsa1, gpx1, ntp1, sro1, ish1]
4. Automated AE networks e.g.
44/59 connected in automated network based on annotated
connections within “regulation of G2/M transition” (fission yeast)
(Network for each GO slim category from the slim page)
5. Suppress redundant IEA annotation
• PomBase pipelines filter redundant IEA
(Inferred from Electronic Annotation)
evidence
• Removes >90% of IEA annotations (because a
manual annotation already exists)
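
A minimal sketch of such a filter is shown below. It assumes that an IEA counts as redundant when the same gene has a manual annotation to the same term or to one of its descendants; the slides do not spell out the exact rule, so treat this as an assumption. The toy ontology fragment, gene name, and function names are hypothetical.

```python
# Toy ontology fragment: child term -> set of direct parents (hypothetical).
PARENTS = {
    "cytoplasmic translation": {"translation"},
    "translation": {"gene expression"},
    "gene expression": set(),
    "chromatin modification": set(),
}


def ancestors_and_self(term, parents):
    """Return the term plus every ancestor reachable through parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def filter_redundant_iea(manual, iea, parents):
    """Keep only IEA (gene, term) pairs not implied by a manual annotation."""
    implied = set()
    for gene, term in manual:
        for ancestor in ancestors_and_self(term, parents):
            implied.add((gene, ancestor))
    return [pair for pair in iea if pair not in implied]


manual = [("geneA", "cytoplasmic translation")]
iea = [
    ("geneA", "translation"),             # implied by the manual annotation
    ("geneA", "gene expression"),         # implied by the manual annotation
    ("geneA", "chromatin modification"),  # not implied: retained for review
]

retained = filter_redundant_iea(manual, iea, PARENTS)
print(retained)
```

The retained annotations are exactly the ones worth a curator's attention: each is either an incorrect mapping, a missing ontology parent, or a genuine gap in manual annotation.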
5. Suppress redundant IEA annotation
13 annotations are reduced to 4
Same information, fewer terms
Incorrect annotations are more easily spotted
Mis16 is not involved in ‘chromatin modification’ -> fix mapping
5. Suppress redundant IEA,
QC of mappings
Missing parents in ontology more obvious
“inorganic anion exchanger” should be an ‘ancestor’ of
GO:0005452, to suppress the IEA as redundant
5. Suppress redundant IEA,
QC of ontology
(SPBC543.05c)
5. Suppress redundant IEA annotation
• >40,000 fission yeast IEAs are available.
• PomBase filters 36,000 as redundant and retains 4,000 (so IEAs are at
least 90% accurate, if the manual annotations are correct).
• It is easier to evaluate the remaining IEAs to identify and fix
anomalies
Reducing IEAs over time
5. Suppress redundant IEA
• More concise view with zero loss of information
• IEA mappings derived from a single experiment/publication
can be interpreted as proof by repetition and make weak EXP
data appear multiply supported/acceptable
• Fewer annotations, easier QC of remaining IEAs
Q “Why isn’t an IEA covered by manual annotation?” Either:
1. Incorrect mapping
2. Missing parent in ontology
3. Missing annotation -> find supporting evidence and
annotate manually (EXP or ISO)
(PomBase also filters NAS/TAS/IC)
6. Annotate by process (pathway)
• Annotating by process rather than “ad hoc”
improves consistency and allows ‘annotation
gaps’ to be targeted
• Papers are processed more quickly: the curator
becomes familiar with an area of biology and
the techniques used, does not need to read the
background every time, and learns to recognise
phenotypes.
From PMID:22898774: regulation of the metaphase/anaphase
transition by the MCC, the APC and upstream signalling.
Identify obvious missing annotation, for example
between complex members.
6. Annotate by process or pathway
[Figure: metaphase/anaphase transition pathway; labels: SAC/MCC, cdc20, APC, proteasome, securin, separase, cohesin subunit, post transition]
QC can be performed on processes or components,
e.g. use STRING to evaluate outliers (potential annotation
errors) with the input list “regulation of mitotic
metaphase/anaphase transition”.
Can also ask “are any complex members missing?”
• We are annotating whole organisms, so use a
holistic, whole-organism annotation approach
• Evaluate annotation breadth (coverage) using
slims
• Evaluate intersections between slim processes
7. Assess annotation at the
organismal level
7. Evaluate organismal annotation
coverage using “slims”
• EXP supported BP
• ISO/IEA inferred BP
‘unknowns’
• Species specific, no
inference possible
• Conserved, but
unannotated in any
species
7. Browsable Slim:
http://preview.pombase.org/browse-curation/fission-yeast-go-slim-terms
7. Sensible assignments?
DNA
recombination
Periodically check that
slim class contents
look sensible
7. Monitor unslimmed gene products
Note: Exclude biologically uninformative terms like “phosphorylation” or
“response to chemical” as these could apply to any real biological role.
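
Monitoring unslimmed gene products amounts to a set difference, sketched below under stated assumptions: the gene identifiers, the slim contents, and the function name `unslimmed_genes` are hypothetical; only the two excluded term names come from the note above.

```python
# Terms excluded from the slim because they are biologically uninformative.
UNINFORMATIVE = {"phosphorylation", "response to chemical"}


def unslimmed_genes(all_genes, slim_genes, uninformative=UNINFORMATIVE):
    """Return genes not covered by any informative slim term."""
    covered = set()
    for term, genes in slim_genes.items():
        if term not in uninformative:
            covered |= genes
    return set(all_genes) - covered


# Hypothetical inputs.
all_genes = {"g1", "g2", "g3"}
slim_genes = {
    "cytokinesis": {"g1"},
    "phosphorylation": {"g2"},  # ignored: says little about the biological role
}

unslimmed = unslimmed_genes(all_genes, slim_genes)
print(sorted(unslimmed))
```

Here g2 still counts as unslimmed even though it has an annotation, because its only term is on the excluded list.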
[Figure: visual slim of all pombe proteins (total 5054; unknown 830). Top-level groupings include CELL DIVISION (751), MITOCHONDRIAL ORG/EXP, MEMBRANES, TRAFFICKING, CELL SURFACE (787), SMALL MOLECULE TM TRANSPORT, CENTRAL MET, ENERGY AND BUILDING BLOCKS (549), EXPRESSION (1294), PROTEIN ASSEMBLY/STABILITY (765), signalling, sexual reproductive process (many intersections), and Other (290; no intersections; includes adhesion and many proteases). Sub-category and intersection counts are not recoverable from the extracted text.]
7. Visual slim, all pombe proteins
7. Evaluate intersections between slim
categories
Evaluate intersections between processes
Many GO processes are rarely co-annotated because they are
functionally, spatially or temporally distant. For example, we would
not expect “ribosome biogenesis” to intersect with “vitamin
metabolism”.
We can use this observation to identify potential conflicts using
the GO term matrix
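
The intersection check can be sketched as a pairwise comparison of the gene sets annotated to each slim term. This is an illustrative sketch only, not the implementation behind the GO term matrix: the gene sets, the `not_expected` list, and the function names are hypothetical.

```python
from itertools import combinations

# Hypothetical gene sets per slim term.
slim_genes = {
    "ribosome biogenesis": {"g1", "g2", "g3"},
    "vitamin metabolism": {"g3", "g4"},
    "cytoplasmic translation": {"g1", "g5"},
}

# Pairs of processes that are not expected to share genes (assumed list).
not_expected = {frozenset({"ribosome biogenesis", "vitamin metabolism"})}


def slim_intersections(slim_genes):
    """Return the shared genes for every pair of slim terms."""
    return {
        (a, b): slim_genes[a] & slim_genes[b]
        for a, b in combinations(sorted(slim_genes), 2)
    }


def unexpected_intersections(intersections, not_expected):
    """Return non-empty intersections between pairs flagged as unexpected."""
    return {
        pair: genes
        for pair, genes in intersections.items()
        if genes and frozenset(pair) in not_expected
    }


matrix = slim_intersections(slim_genes)
flagged = unexpected_intersections(matrix, not_expected)
print(flagged)
```

Each flagged gene is a candidate for review; the cause may be an annotation error, an incorrect mapping, or an ontology error, as in the examples on the following slides.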
Fission yeast intersections Jan 2012
Fission yeast intersections March 2017
7. Identifies ontology errors (e.g.)
DNA metabolism and chromosome segregation do not usually intersect
Regulation of chromosome condensation should not be a DNA metabolic process
7. Ontology error (e.g.)
Genes annotated to folic acid metabolism were also incorrectly annotated to amino acid
metabolism. Folic acid was classified as an amino acid by ChEBI -> fix ChEBI, which fixes GO
7. Finds incorrect mappings (e.g.)
Intersection between tRNA metabolism and transcription.
Elongator is no longer thought to have a direct role in transcription, so the mapping was removed
8. Consider Author intent
Think about the biology the author intended
e.g. rubidium ion transmembrane transporter/transport
Rubidium ion is used to assay K+ transport; it is a non-physiological
substrate, so annotate K+ transport, not rubidium transport
e.g. Apoptosis (RPS19)
Rps19 mutant displayed condensed DNA, a fragmented nucleus
and caspase activation - indicative of apoptosis.
Since RPS19 has an essential role in ribosome biogenesis,
apoptosis is likely to be an indirect effect of disrupting an
upstream process (translation), i.e. an experimental readout
9. Communication with the author
and community curation
• Most authors are happy to discuss their publications. If unsure
about an annotation, ask them. PomBase routinely consults
authors as a QC step to refine annotation.
9. Community Curation
• Most authors are happy to curate their own papers
• Co-curation by author and curator improves annotation quality
(especially PhD/post doc/recent papers).
• 9619 annotations (FYPO/GO/MOD) created by the community
from 510 publications (excludes HTP spreadsheet submissions)
Some example sessions
• http://tinyurl.com/q2bgyqv
• http://tinyurl.com/p7d979b
• http://tinyurl.com/o72bzul
Very specific annotation is possible because Canto guides the user
step by step to construct genotypes and ontology based annotations.
“Drill down” to more specific terms is assisted.
Prompts are provided for AE of specified types for certain terms.
10. Prioritise error fixing
• Fixing known errors takes precedence over new annotation,
like critical bugs in code
• Even small errors often uncover larger issues, or can fix many
problems simultaneously across multiple species.
• Prevents propagation of annotation errors
11. GO process vs. phenotype
• GO annotation should reflect a gene's direct involvement
in, or role in regulating, processes or functions.
• Phenotypes may indicate that a mutation *affects* a
process, but may reflect downstream or indirect effects.
e.g. ER membrane defect -> nuclear envelope defect -> chromosome
decondensation defect-> defects in next round of DNA replication.
• A “DNA replication phenotype” alone is not enough to
make a “DNA replication” GO annotation.
• A single phenotype is often NOT SPECIFIC FOR A PROCESS.
Phenotype annotation rules
• To make GO annotations based on phenotypes,
ask the question:
“Is this phenotype, or collection of phenotypes,
specific to this process?” (this usually needs detailed
phenotypes)
Additional data can support GO inference from
phenotype (location, orthology), and author intent.
(Intersections between processes useful for identifying
annotation errors caused by indirect annotation)
Summary

More Related Content

Viewers also liked

Tesegggc
TesegggcTesegggc
Ubuntu
UbuntuUbuntu
Ubuntu
SaRiita Meza
 
Planificación del 3 er cohorte
Planificación del 3 er cohortePlanificación del 3 er cohorte
Planificación del 3 er cohorte
UGMA.
 
JW Healthcare Logo
JW Healthcare LogoJW Healthcare Logo
JW Healthcare LogoJocar Jardin
 
Diapos
DiaposDiapos
Sesion de aprendizaje razonami
Sesion de aprendizaje razonamiSesion de aprendizaje razonami
Sesion de aprendizaje razonamiRuth Myryam
 
Korrika 18ri buruzko gutuna
Korrika 18ri buruzko gutunaKorrika 18ri buruzko gutuna
Korrika 18ri buruzko gutuna
Goiztiri AEK euskaltegia
 
Irakurle kanpaina: 2016ko negua
Irakurle kanpaina: 2016ko neguaIrakurle kanpaina: 2016ko negua
Irakurle kanpaina: 2016ko negua
Goiztiri AEK euskaltegia
 
Ulermena escolar letra ESKUTITZA
Ulermena escolar letra ESKUTITZAUlermena escolar letra ESKUTITZA
Ulermena escolar letra ESKUTITZA
idoialariz
 
Asmakizunak
Asmakizunak Asmakizunak
Asmakizunak
idoialariz
 

Viewers also liked (14)

Tesegggc
TesegggcTesegggc
Tesegggc
 
Reclamebord v2
Reclamebord v2Reclamebord v2
Reclamebord v2
 
5295
52955295
5295
 
Ubuntu
UbuntuUbuntu
Ubuntu
 
Planificación del 3 er cohorte
Planificación del 3 er cohortePlanificación del 3 er cohorte
Planificación del 3 er cohorte
 
Page 12
Page 12Page 12
Page 12
 
JW Healthcare Logo
JW Healthcare LogoJW Healthcare Logo
JW Healthcare Logo
 
Diapos
DiaposDiapos
Diapos
 
Sesion de aprendizaje razonami
Sesion de aprendizaje razonamiSesion de aprendizaje razonami
Sesion de aprendizaje razonami
 
Korrika 18ri buruzko gutuna
Korrika 18ri buruzko gutunaKorrika 18ri buruzko gutuna
Korrika 18ri buruzko gutuna
 
Onet50
Onet50Onet50
Onet50
 
Irakurle kanpaina: 2016ko negua
Irakurle kanpaina: 2016ko neguaIrakurle kanpaina: 2016ko negua
Irakurle kanpaina: 2016ko negua
 
Ulermena escolar letra ESKUTITZA
Ulermena escolar letra ESKUTITZAUlermena escolar letra ESKUTITZA
Ulermena escolar letra ESKUTITZA
 
Asmakizunak
Asmakizunak Asmakizunak
Asmakizunak
 

Similar to PomBase conventions for improving annotation depth, breadth, consistency and accuracy

Copy of biocuration 2017
Copy of biocuration 2017Copy of biocuration 2017
Copy of biocuration 2017
Valerie Wood
 
Bioinformatics t7-protein structure-v2013_wim_vancriekinge
Bioinformatics t7-protein structure-v2013_wim_vancriekingeBioinformatics t7-protein structure-v2013_wim_vancriekinge
Bioinformatics t7-protein structure-v2013_wim_vancriekinge
Prof. Wim Van Criekinge
 
2016 bioinformatics i_proteins_wim_vancriekinge
2016 bioinformatics i_proteins_wim_vancriekinge2016 bioinformatics i_proteins_wim_vancriekinge
2016 bioinformatics i_proteins_wim_vancriekinge
Prof. Wim Van Criekinge
 
GLBIO/CCBC Metagenomics Workshop
GLBIO/CCBC Metagenomics WorkshopGLBIO/CCBC Metagenomics Workshop
GLBIO/CCBC Metagenomics Workshop
Morgan Langille
 
Bioinformatica t3-scoring matrices
Bioinformatica t3-scoring matricesBioinformatica t3-scoring matrices
Bioinformatica t3-scoring matrices
Prof. Wim Van Criekinge
 
2015 bioinformatics protein_structure_wimvancriekinge
2015 bioinformatics protein_structure_wimvancriekinge2015 bioinformatics protein_structure_wimvancriekinge
2015 bioinformatics protein_structure_wimvancriekinge
Prof. Wim Van Criekinge
 
Bioinformatica t7-protein structure
Bioinformatica t7-protein structureBioinformatica t7-protein structure
Bioinformatica t7-protein structure
Prof. Wim Van Criekinge
 
Tyler functional annotation thurs 1120
Tyler functional annotation thurs 1120Tyler functional annotation thurs 1120
Tyler functional annotation thurs 1120
Sucheta Tripathy
 
Cross Product Extensions to the Gene Ontology
Cross Product Extensions to the Gene OntologyCross Product Extensions to the Gene Ontology
Cross Product Extensions to the Gene Ontology
Chris Mungall
 
Computational Prediction Of Protein-1.pptx
Computational Prediction Of Protein-1.pptxComputational Prediction Of Protein-1.pptx
Computational Prediction Of Protein-1.pptx
ashharnomani
 
Bioinformaatics for M.Sc. Biotecchnology.pptx
Bioinformaatics for M.Sc. Biotecchnology.pptxBioinformaatics for M.Sc. Biotecchnology.pptx
Bioinformaatics for M.Sc. Biotecchnology.pptx
Ranjan Jyoti Sarma
 
Homology modeling
Homology modelingHomology modeling
Mapping protein to function
Mapping protein to functionMapping protein to function
Mapping protein to function
Abhik Seal
 
Apollo annotation guidelines for i5k projects Diaphorina citri
Apollo annotation guidelines for i5k projects Diaphorina citriApollo annotation guidelines for i5k projects Diaphorina citri
Apollo annotation guidelines for i5k projects Diaphorina citri
Monica Munoz-Torres
 
Bioinformatics t7-proteinstructure v2014
Bioinformatics t7-proteinstructure v2014Bioinformatics t7-proteinstructure v2014
Bioinformatics t7-proteinstructure v2014
Prof. Wim Van Criekinge
 
scRNA-Seq Lecture - Stem Cell Network RNA-Seq Workshop 2017
scRNA-Seq Lecture - Stem Cell Network RNA-Seq Workshop 2017scRNA-Seq Lecture - Stem Cell Network RNA-Seq Workshop 2017
scRNA-Seq Lecture - Stem Cell Network RNA-Seq Workshop 2017
David Cook
 
Best Practices in Structural Biology
Best Practices in Structural BiologyBest Practices in Structural Biology
Best Practices in Structural Biology
Rohit Satyam
 
BioAssay Express: Creating and exploiting assay metadata
BioAssay Express: Creating and exploiting assay metadataBioAssay Express: Creating and exploiting assay metadata
BioAssay Express: Creating and exploiting assay metadata
Philip Cheung
 
Techniques used for separation in proteomics
Techniques used for separation in proteomicsTechniques used for separation in proteomics
Techniques used for separation in proteomics
Nilesh Chandra
 
Critical Reading Biomedical Research Papers-2022.pptx
Critical Reading Biomedical Research Papers-2022.pptxCritical Reading Biomedical Research Papers-2022.pptx
Critical Reading Biomedical Research Papers-2022.pptx
MingdergLai
 

Similar to PomBase conventions for improving annotation depth, breadth, consistency and accuracy (20)

Copy of biocuration 2017
Copy of biocuration 2017Copy of biocuration 2017
Copy of biocuration 2017
 
Bioinformatics t7-protein structure-v2013_wim_vancriekinge
Bioinformatics t7-protein structure-v2013_wim_vancriekingeBioinformatics t7-protein structure-v2013_wim_vancriekinge
Bioinformatics t7-protein structure-v2013_wim_vancriekinge
 
2016 bioinformatics i_proteins_wim_vancriekinge
2016 bioinformatics i_proteins_wim_vancriekinge2016 bioinformatics i_proteins_wim_vancriekinge
2016 bioinformatics i_proteins_wim_vancriekinge
 
GLBIO/CCBC Metagenomics Workshop
GLBIO/CCBC Metagenomics WorkshopGLBIO/CCBC Metagenomics Workshop
GLBIO/CCBC Metagenomics Workshop
 
Bioinformatica t3-scoring matrices
Bioinformatica t3-scoring matricesBioinformatica t3-scoring matrices
Bioinformatica t3-scoring matrices
 
2015 bioinformatics protein_structure_wimvancriekinge
2015 bioinformatics protein_structure_wimvancriekinge2015 bioinformatics protein_structure_wimvancriekinge
2015 bioinformatics protein_structure_wimvancriekinge
 
Bioinformatica t7-protein structure
Bioinformatica t7-protein structureBioinformatica t7-protein structure
Bioinformatica t7-protein structure
 
Tyler functional annotation thurs 1120
Tyler functional annotation thurs 1120Tyler functional annotation thurs 1120
Tyler functional annotation thurs 1120
 
Cross Product Extensions to the Gene Ontology
Cross Product Extensions to the Gene OntologyCross Product Extensions to the Gene Ontology
Cross Product Extensions to the Gene Ontology
 
Computational Prediction Of Protein-1.pptx
Computational Prediction Of Protein-1.pptxComputational Prediction Of Protein-1.pptx
Computational Prediction Of Protein-1.pptx
 
Bioinformaatics for M.Sc. Biotecchnology.pptx
Bioinformaatics for M.Sc. Biotecchnology.pptxBioinformaatics for M.Sc. Biotecchnology.pptx
Bioinformaatics for M.Sc. Biotecchnology.pptx
 
Homology modeling
Homology modelingHomology modeling
Homology modeling
 
Mapping protein to function
Mapping protein to functionMapping protein to function
Mapping protein to function
 
Apollo annotation guidelines for i5k projects Diaphorina citri
Apollo annotation guidelines for i5k projects Diaphorina citriApollo annotation guidelines for i5k projects Diaphorina citri
Apollo annotation guidelines for i5k projects Diaphorina citri
 
Bioinformatics t7-proteinstructure v2014
Bioinformatics t7-proteinstructure v2014Bioinformatics t7-proteinstructure v2014
Bioinformatics t7-proteinstructure v2014
 
scRNA-Seq Lecture - Stem Cell Network RNA-Seq Workshop 2017
scRNA-Seq Lecture - Stem Cell Network RNA-Seq Workshop 2017scRNA-Seq Lecture - Stem Cell Network RNA-Seq Workshop 2017
scRNA-Seq Lecture - Stem Cell Network RNA-Seq Workshop 2017
 
Best Practices in Structural Biology
Best Practices in Structural BiologyBest Practices in Structural Biology
Best Practices in Structural Biology
 
BioAssay Express: Creating and exploiting assay metadata
BioAssay Express: Creating and exploiting assay metadataBioAssay Express: Creating and exploiting assay metadata
BioAssay Express: Creating and exploiting assay metadata
 
Techniques used for separation in proteomics
Techniques used for separation in proteomicsTechniques used for separation in proteomics
Techniques used for separation in proteomics
 
Critical Reading Biomedical Research Papers-2022.pptx
Critical Reading Biomedical Research Papers-2022.pptxCritical Reading Biomedical Research Papers-2022.pptx
Critical Reading Biomedical Research Papers-2022.pptx
 

More from Valerie Wood

Go users meeting, unknowns
Go users meeting, unknownsGo users meeting, unknowns
Go users meeting, unknowns
Valerie Wood
 
Curate locally, think globally
Curate locally, think globallyCurate locally, think globally
Curate locally, think globally
Valerie Wood
 
GO slimming tips
GO slimming tipsGO slimming tips
GO slimming tips
Valerie Wood
 
PomBase infographic
PomBase infographicPomBase infographic
PomBase infographic
Valerie Wood
 
New PomBase website features
New PomBase website featuresNew PomBase website features
New PomBase website features
Valerie Wood
 
Hidden in plain sight
Hidden in plain sightHidden in plain sight
Hidden in plain sight
Valerie Wood
 

More from Valerie Wood (6)

Go users meeting, unknowns
Go users meeting, unknownsGo users meeting, unknowns
Go users meeting, unknowns
 
Curate locally, think globally
Curate locally, think globallyCurate locally, think globally
Curate locally, think globally
 
GO slimming tips
GO slimming tipsGO slimming tips
GO slimming tips
 
PomBase infographic
PomBase infographicPomBase infographic
PomBase infographic
 
New PomBase website features
New PomBase website featuresNew PomBase website features
New PomBase website features
 
Hidden in plain sight
Hidden in plain sightHidden in plain sight
Hidden in plain sight
 

Recently uploaded

Eukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptxEukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptx
RitabrataSarkar3
 
20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx
Sharon Liu
 
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
vluwdy49
 
The binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defectsThe binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defects
Sérgio Sacani
 
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdfTopic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
TinyAnderson
 
SAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdfSAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdf
KrushnaDarade1
 
Immersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths ForwardImmersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths Forward
Leonel Morgado
 
Basics of crystallography, crystal systems, classes and different forms
Basics of crystallography, crystal systems, classes and different formsBasics of crystallography, crystal systems, classes and different forms
Basics of crystallography, crystal systems, classes and different forms
MaheshaNanjegowda
 
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
yqqaatn0
 
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero WaterSharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Texas Alliance of Groundwater Districts
 
The debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically youngThe debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically young
Sérgio Sacani
 
Randomised Optimisation Algorithms in DAPHNE
Randomised Optimisation Algorithms in DAPHNERandomised Optimisation Algorithms in DAPHNE
Randomised Optimisation Algorithms in DAPHNE
University of Maribor
 
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
David Osipyan
 
Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless Reproducibility
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
Phenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvementPhenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvement
IshaGoswami9
 
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
AbdullaAlAsif1
 
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills MN
 
Medical Orthopedic PowerPoint Templates.pptx
Medical Orthopedic PowerPoint Templates.pptxMedical Orthopedic PowerPoint Templates.pptx
Medical Orthopedic PowerPoint Templates.pptx
terusbelajar5
 
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
yqqaatn0
 
Cytokines and their role in immune regulation.pptx
Cytokines and their role in immune regulation.pptxCytokines and their role in immune regulation.pptx
Cytokines and their role in immune regulation.pptx
Hitesh Sikarwar
 

Recently uploaded (20)

Eukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptxEukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptx
 
20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx
 
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
 
The binding of cosmological structures by massless topological defects

PomBase conventions for improving annotation depth, breadth, consistency and accuracy

  • 1. PomBase conventions for improving annotation depth, breadth, consistency and accuracy
  • 2. Annotation numbers are important… but numbers aren’t everything. • The use of annotation for data mining and data analysis is limited by errors, inconsistencies and omissions. • PomBase uses a combination of annotation conventions, to improve information content (annotation coverage, specificity and redundancy), and QC mechanisms, to identify possible annotation inconsistencies and errors. • In combination, these mechanisms address many recurring annotation issues.
  • 3. 1. The definition is critical All ontology terms have a “fixed” definition • If a definition is misleading or incorrect, its meaning cannot be changed. Instead, the term is obsoleted and its annotations are migrated. • This makes annotations very robust to ontology changes. If a term needs to be repositioned, the annotations remain correct. • We annotate to the definition, not the term name. Always check the definition.
  • 4. 2. Improving annotation specificity • i) Consider descendant terms • ii) Veto use of uninformative terms
  • 5. 2i. Consider descendants Annotate as specifically as the experiment allows and be unambiguous about the biology • regulation: positive or negative? • translation: cytoplasmic or mitochondrial? • transport: of what? to where? how? • chromosome segregation: mitotic or meiotic? If the available terms are insufficient, request a more specific term.
  • 6. 2i. Consider descendants, e.g. • For a carboxylic acid carrier, “carboxylic acid transport” initially looks OK • However, “transmembrane transport” is not explicit here… Carboxylic acid might be transported in other ways…
  • 7. 2i. Consider descendants, e.g. More specific annotation can provide additional detail, e.g. • substrate, • type (transmembrane), • sometimes directionality. Additional parents increase the information content, because the gene product is annotated indirectly to more terms.
  • 8. 2ii. Veto use of non-specific terms Identify the set of ontology terms where more specific annotation should be possible (more biological detail). Examples: • cellular process (which one?) • translation (cytoplasmic? mitochondrial?) • transport (of what? to where?) Some GO terms are already flagged as not for manual annotation. Review and improve annotations to vetoed terms. PomBase blocks 1298 upper-level GO terms for direct annotation (<200 violations).
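The veto check described on this slide can be sketched as a simple lookup of direct annotations against a blocked-term set. This is a minimal illustration only: the term set, the gene names (`geneA` etc.) and the function `find_veto_violations` are hypothetical and do not reproduce PomBase's actual list of 1298 blocked terms or its pipeline.

```python
# Hypothetical blocked-term set; the real PomBase list is not reproduced here.
VETOED_TERMS = {"cellular process", "translation", "transport"}


def find_veto_violations(annotations, vetoed_terms):
    """Return the (gene, term) pairs that are annotated directly to a vetoed term."""
    return [(gene, term) for gene, term in annotations if term in vetoed_terms]


# Toy direct annotations with made-up gene names.
annotations = [
    ("geneA", "transport"),
    ("geneA", "carboxylic acid transmembrane transport"),
    ("geneB", "cytoplasmic translation"),
    ("geneC", "cellular process"),
]

violations = find_veto_violations(annotations, VETOED_TERMS)
for gene, term in violations:
    print(f"{gene}: '{term}' is too general, look for a more specific descendant")
```

Each reported pair is a candidate for review: the curator either finds a more specific descendant term or requests one.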
  • 9. 3. Improve the ontologies 3i. Missing parents: original arrangement
  • 10. 3i. Missing parents These process annotations were originally in different branches of the ontology, so all annotations were required
  • 12. 3i. Missing parents Collapsed 6 processes to 2. Exactly the same information content, with less redundancy; easier for users to interpret the annotation.
  • 13. 3ii. Report incorrect parents, AKA “True Path Violations” or “TPVs”. For example: protein maturation --protein processing (part_of) ----proteolysis (part_of) (not all proteolysis is processing or maturation)
  • 14. 4. The power of Annotation Extensions Provide additional specificity for a GO annotation, e.g. • Target gene (kinase substrate, TF regulation target) • Location of a function • Localization dependencies (protein A localizes protein B) • Spatial and temporal aspects of processes, functions, locations (cell cycle stage of occurrence) • ADD an example of a gene product specific AE See: Huntley et al. A method for increasing expressivity of Gene Ontology annotations using a compositional approach. PMID:24885854
  • 15. 4. Annotation Extension, e.g. cdc2 Link function (cyclin-dependent kinase) to target genes, processes, and temporal information. cyclin-dependent protein serine/threonine kinase • has substrate fkh2 involved in negative regulation of conjugation with cellular fusion • directly inhibits srw1 involved in positive regulation of G1/S transition • has substrate drc1 involved in positive regulation of mitotic cell cycle DNA replication • has substrate cdc18, orc2 involved in negative regulation of DNA replication during mitotic G2 phase • has substrate xlf1 involved in negative regulation of double-strand break repair via nonhomologous end joining, during mitotic G2 phase • has substrate rap1 involved in negative regulation of mitotic telomere tethering at nuclear periphery during mitotic M phase • has substrate hcn1 during mitotic M phase • has substrate cut3 involved in positive regulation of mitotic chromosome condensation during mitotic metaphase • has substrate mde4 involved in correction of merotelic attachment, mitotic, during mitotic metaphase • has substrate nsk1 involved in negative regulation of attachment of mitotic spindle microtubules during mitotic metaphase • has substrate mde4, cut7 involved in negative regulation of mitotic spindle elongation during mitotic metaphase • has substrate klp9 involved in negative regulation of mitotic spindle elongation during mitotic anaphase A • directly inhibits clp1 involved in negative regulation of exit from mitosis • has substrate byr4 involved in positive regulation of septation initiation signaling • directly inhibits dis2 • has substrate rum1, crb2, sds23
  • 16. Alternative (human CDK1): Not scalable or maintainable
  • 17. 4. Using AE for effectors • The reciprocal of the extension (generated automatically) is called “target of” • It collects the known “upstream effectors”, e.g. on the cdc2 page
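The reciprocal “target of” view can be sketched as an inversion of the extension triples. The four triples below are taken from the cdc2 slide; the tuple layout `(gene, relation, target)` and the function name `build_target_of` are assumptions made for illustration, not PomBase's data model.

```python
from collections import defaultdict

# (annotated gene, extension relation, target gene); values from the cdc2 example slide.
extensions = [
    ("cdc2", "has substrate", "fkh2"),
    ("cdc2", "directly inhibits", "srw1"),
    ("cdc2", "has substrate", "drc1"),
    ("cdc2", "directly inhibits", "clp1"),
]


def build_target_of(extensions):
    """Invert extension triples so that each target lists its upstream effectors."""
    target_of = defaultdict(list)
    for gene, relation, target in extensions:
        target_of[target].append((relation, gene))
    return dict(target_of)


target_of = build_target_of(extensions)
for target, effectors in target_of.items():
    for relation, gene in effectors:
        print(f"{target} is target of: {gene} ({relation})")
```

Because the inversion is mechanical, the reciprocal view never needs separate curation; it is regenerated whenever the forward extensions change.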
  • 18. 4. Using Annotation Extensions to generate networks/pathways • We can use effector-substrate connections to generate networks (interaction, metabolic, regulatory) • Provide directional links to support pathway reconstruction [Figure: example network with nodes sty1, cmk2, srk1, rum1, atf1, gsa1, gpx1, ntp1, sro1, ish1]
  • 19. 4. Automated AE networks, e.g. 44/59 connected in an automated network based on annotated connections within “regulation of G2/M transition” (fission yeast). (A network is available for each GO slim category from the slim page.)
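A figure such as “44/59 connected” can be derived by counting the genes in a process list that take part in at least one annotated connection to another gene in the same list. The sketch below assumes directed `(effector, target)` edges; the edges come from the cdc2 slide, but the gene list is a toy list (with a made-up `geneX`), not the real “regulation of G2/M transition” set.

```python
def connected_genes(genes, edges):
    """Return genes that share at least one edge with another gene in `genes`."""
    gene_set = set(genes)
    connected = set()
    for source, target in edges:
        # Only count edges whose two distinct ends are both in the process list.
        if source in gene_set and target in gene_set and source != target:
            connected.update((source, target))
    return connected


# Toy process list; geneX is a hypothetical gene with no annotated connection.
process_genes = ["cdc2", "clp1", "srw1", "fkh2", "geneX"]

# Directed effector -> target edges derived from annotation extensions.
edges = [("cdc2", "clp1"), ("cdc2", "srw1"), ("cdc2", "fkh2")]

connected = connected_genes(process_genes, edges)
print(f"{len(connected)}/{len(process_genes)} connected")
```

Genes left unconnected are a natural starting point when looking for missing extensions.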
  • 20. 5. Suppress redundant IEA annotation • PomBase pipelines filter redundant IEA (Inferred from Electronic Annotation) evidence • This removes >90% of IEAs (because a manual annotation already exists)
  • 21. 5. Suppress redundant IEA annotation 13 annotations are reduced to 4. Same information, fewer terms.
  • 22. 5. Suppress redundant IEA, QC of mappings Incorrect annotations are more easily spotted. Mis16 is not involved in ‘chromatin modification’ -> fix mapping
  • 23. 5. Suppress redundant IEA, QC of ontology (SPBC543.05c) Missing parents in the ontology are more obvious. “inorganic anion exchanger” should be an ‘ancestor’ of GO:0005452, to suppress the IEA as redundant
  • 24. 5. Suppress redundant IEA annotation • >40,000 fission yeast IEAs available. • PomBase filters 36,000 as redundant and retains 4,000 (IEAs are at least 90% accurate, if the manual annotations are correct). • It is easier to evaluate the remaining IEAs to identify/fix anomalies. [Chart: reducing IEAs over time]
  • 25. 5. Suppress redundant IEA • More concise view with zero loss of information • IEA mappings derived from a single experiment/publication can be interpreted as proof by repetition and make weak EXP data appear multiply supported/acceptable • Fewer annotations, easier QC of remaining IEAs Q: “Why isn’t an IEA covered by a manual annotation?” Either: 1. Incorrect mapping 2. Missing parent in ontology 3. Missing annotation -> find supporting evidence and annotate manually (EXP or ISO) (PomBase also filters NAS/TAS/IC)
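The redundancy rule in this section can be sketched as follows: an IEA is dropped when the same gene already has a manual annotation to the same term or to one of its descendants. The small ontology, the gene names and the function names below are illustrative assumptions; the actual PomBase filter is not shown in the slides.

```python
from collections import defaultdict

# Toy ontology as child -> parents; term relationships are assumed for illustration.
parents = {
    "carboxylic acid transmembrane transport": [
        "carboxylic acid transport",
        "transmembrane transport",
    ],
    "carboxylic acid transport": ["transport"],
    "transmembrane transport": ["transport"],
    "cytoplasmic translation": ["translation"],
}


def ancestors_or_self(term, parents):
    """Return the term together with all of its ancestors."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def filter_redundant_iea(manual, iea, parents):
    """Keep only IEAs not already implied by a manual annotation for the same gene."""
    covered = defaultdict(set)
    for gene, term in manual:
        covered[gene] |= ancestors_or_self(term, parents)
    return [(gene, term) for gene, term in iea if term not in covered[gene]]


manual = [("geneA", "carboxylic acid transmembrane transport")]
iea = [
    ("geneA", "transport"),                # implied by the manual annotation
    ("geneA", "transmembrane transport"),  # implied by the manual annotation
    ("geneA", "translation"),              # not covered: mapping error or missing annotation?
    ("geneB", "transport"),                # geneB has no manual annotation
]

retained = filter_redundant_iea(manual, iea, parents)
print(retained)
```

The retained IEAs are exactly the ones worth asking the slide's question about: incorrect mapping, missing parent, or missing manual annotation.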
  • 26. 6. Annotate by process (pathway) • Annotating by process rather than “ad hoc” improves consistency and allows ‘annotation gaps’ to be targeted • Process papers more quickly (become more familiar with the field, experimental methods) Become familiar with an area of biology and the techniques used. Don’t need to read the background every time. Recognise phenotypes.
  • 27. 6. Annotate by process or pathway From PMID:22898774: Regulation of the metaphase/anaphase transition by the MCC, the APC and upstream signalling. Identify obvious missing annotation, for example between complex members
  • 28. 6. Annotate by process or pathway [Figure labels: cdc20, proteasome, APC, separase, cohesin subunit, securin, post-transition, SAC/MCC] Can perform QC on processes or components, e.g. use STRING to evaluate outliers (potential annotation errors). Input list: “regulation of mitotic metaphase/anaphase transition”. Can also ask “are any complex members missing?”
  • 29. 7. Assess annotation at the organismal level • We are annotating whole organisms… use a holistic, whole-organism annotation approach • Evaluate annotation breadth (coverage) using slims • Evaluate intersections between slim processes
  • 30. 7. Evaluate organismal annotation coverage using “slims” • EXP supported BP • ISO/IEA inferred BP • ‘Unknowns’: species specific, no inference possible • ‘Unknowns’: conserved, but unannotated in any species
  • 32. 7. Sensible assignments? (e.g. DNA recombination) Periodically check that slim class contents look sensible
  • 33. 7. Monitor unslimmed gene products Note: Exclude biologically uninformative terms like “phosphorylation” or “response to chemical” as these could apply to any real biological role.
  • 34. 7. Visual slim, all pombe proteins [Figure: diagram of all 5054 fission yeast proteins grouped by GO slim category, with counts for each category and for the intersections between categories; 830 are “Unknown”. Top-level groups: CELL DIVISION 751; MITOCHONDRIAL ORG/EXP 280; MEMBRANES, TRAFFICKING, CELL SURFACE 787; SMALL MOLECULE TM TRANSPORT 288; CENTRAL MET, ENERGY AND BUILDING BLOCKS 549; EXPRESSION 1294; PROTEIN ASSEMBLY/STABILITY 765; signalling 404; sexual reproductive process 262 (many intersections); Other 290 (no intersections; includes adhesion and many proteases).]
  • 35. 7. Evaluate intersections between slim categories Many GO processes are rarely co-annotated because they are functionally, spatially or temporally distant. For example, we would not expect “ribosome biogenesis” to intersect with “vitamin metabolism”. We can use this observation to identify potential conflicts using the GO term matrix
  • 38. 7. Identifies ontology errors (e.g.) DNA metabolism and chromosome segregation do not usually intersect. Regulation of chromosome condensation should not be a DNA metabolic process
  • 39. 7. Ontology error (e.g.) Genes annotated to folic acid metabolism were also incorrectly annotated to amino acid metabolism. Folic acid was classified as an amino acid by ChEBI -> fix ChEBI, which fixes GO
  • 40. 7. Finds incorrect mappings (e.g.) Intersection between tRNA metabolism and transcription. Elongator is no longer thought to have a direct role in transcription; mapping removed
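The intersection check behind these examples can be sketched by listing, for every pair of slim categories, the genes annotated to both, and then flagging pairs that are not expected to overlap. Gene names, slim assignments and the `unexpected_pairs` set are hypothetical; this is not the PomBase GO term matrix tool itself.

```python
from collections import defaultdict
from itertools import combinations


def slim_intersections(gene_to_slims):
    """Map each pair of slim categories to the genes annotated to both."""
    shared = defaultdict(list)
    for gene, slims in gene_to_slims.items():
        # Sort so that each pair has one canonical (alphabetical) order.
        for pair in combinations(sorted(slims), 2):
            shared[pair].append(gene)
    return dict(shared)


def flag_unexpected(shared, unexpected_pairs):
    """Return only the intersections that are not expected to occur."""
    return {pair: genes for pair, genes in shared.items() if pair in unexpected_pairs}


# Toy slim assignments with made-up gene names.
gene_to_slims = {
    "geneA": {"ribosome biogenesis", "cytoplasmic translation"},
    "geneB": {"ribosome biogenesis", "vitamin metabolism"},
    "geneC": {"vitamin metabolism"},
}

# Pairs a curator would not expect to intersect (alphabetical order within the pair).
unexpected_pairs = {("ribosome biogenesis", "vitamin metabolism")}

flagged = flag_unexpected(slim_intersections(gene_to_slims), unexpected_pairs)
print(flagged)
```

Each flagged gene then needs a manual decision: the cause may be an ontology error, an incorrect mapping, or a genuine but unusual dual role.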
  • 41. 8. Consider author intent Think about the biology the author intended. e.g. rubidium ion transmembrane transporter/transport: rubidium ion is used to assay K+ transport; rubidium itself is a non-physiological substrate. e.g. Apoptosis (RPS19): an Rps19 mutant displayed condensed DNA, a fragmented nucleus and caspase activation, indicative of apoptosis. Since RPS19 has an essential role in ribosome biogenesis, apoptosis is likely to be an indirect effect of the disruption of an upstream process, translation (i.e. an experimental readout)
  • 42. 9. Communication with the author and community curation • Most authors are happy to discuss their publications. If unsure about an annotation, ask them. PomBase routinely consults authors as a QC step to refine annotation.
  • 43. 9. Community Curation • Most authors are happy to curate their own papers • Co-curation by author and curator improves annotation quality (especially PhD/postdoc/recent papers). • 9619 annotations (FYPO/GO/MOD) created by the community from 510 publications (excludes HTP spreadsheet submissions)
  • 44. Some example sessions • http://tinyurl.com/q2bgyqv • http://tinyurl.com/p7d979b • http://tinyurl.com/o72bzul
  • 45. Very specific annotation is possible because Canto guides the user step by step to construct genotypes and ontology based annotations. “Drill down” to more specific terms is assisted. Prompts are provided for AE of specified types for certain terms.
  • 46. 10. Prioritise error fixing • Fixing known errors takes precedence over new annotation, like critical bugs in code • Even small errors often uncover larger issues, and fixing them can resolve many problems simultaneously across multiple species. • Prevents propagation of annotation errors
  • 47. 11. GO process vs. phenotype • GO annotation should reflect a gene's direct involvement in, or role in regulating, processes or functions. • Phenotypes may indicate that a mutation *affects* a process, but may reflect downstream or indirect effects, e.g. ER membrane defect -> nuclear envelope defect -> chromosome decondensation defect -> defects in next round of DNA replication. • A “DNA replication phenotype” alone is not enough to make a “DNA replication” GO annotation. • A single phenotype is often NOT SPECIFIC FOR A PROCESS.
  • 48. Phenotype annotation rules • To make GO annotations based on phenotypes, ask the question “Is this phenotype, or collection of phenotypes, specific to this process?” (usually need detailed phenotypes). Additional data (location, orthology) and author intent can support GO inference from phenotype. (Intersections between processes are useful for identifying annotation errors caused by indirect annotation.)

Editor's Notes

  1. Describe some PomBase curation procedures; might be useful to other databases/curators
  2. Coverage: genes annotated OR number of different processes for a gene
  3. Another important point is that annotations are explicitly coupled by using a term which covers both (although this can also be done with extensions)
  4. Arrange temporally?
  5. NOTE: we don’t filter redundant EXP annotations, but we do manage this in the display, so the term is presented and the source (often multiple) is available in a full view. Later we hope to hide higher-level EXP annotations
  6. Complexes cluster together, some genes incorrectly annotated; can work out how they are connected, check appropriate sub-processes are annotated for complexes, complex annotations are internally consistent, etc.
  7. Add error type examples
  8. ChEBI is used to define chemicals in GO
  9. This isn’t speculative; it’s the curator using what is known but not explicitly stated. It’s a valid interpretation of the experiment based on what is presented: we are modelling the biology