Curate locally, think globally
(Insights from the “big-picture” view of curation)
Valerie Wood, PomBase,
Department of Biochemistry, University of Cambridge, UK
ISB 2019
The PomBase team
Midori Harris (curator, ontology developer)
Antonia Lock (curator)
Kim Rutherford (developer)
What can we learn from a “big picture”
view of curated data (especially to
improve our resources for end users)?
How can we effectively engage users in
the curation process?
● QC: identify annotation errors
and/or outliers
● Identify annotation gaps
● Identify knowledge gaps (and
improve annotation breadth)
● Improve data access and
presentation
Ultimately curation helps us
to join the dots and
synthesize new knowledge
from data integration.
Insights from the “big-picture” view of curation
We often overlook the value of
the emergent knowledge from
the ‘sum of the parts’
Gene expression
Making lists using ontologies and vocabularies
Gene1
RNA recognition motif
mRNA export
protein kinase activity
nucleus
transcription
Gene2
protein kinase activity
RNA binding domain
mRNA export
nucleus
transcription
mitotic cell cycle
mRNA export
Gene 1
Gene 2
Gene 4
Gene 6
Gene 8
Gene 9
transcription
Gene 1
Gene 2
Gene 3
Gene 7
Gene 10
Essentially creating 1000s of lists of
‘objects’ with similar features
We curate detail, annotating
genes to ‘terms’
These lists are often
related to each other
through ontologies
We can use sets of lists
to create “Annotation
subsets”
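A minimal sketch of how per-gene annotations become per-term lists. The gene names and terms loosely follow the toy example on the slide; the `annotations` dictionary and the `build_term_lists` function are illustrative assumptions, not PomBase code.

```python
from collections import defaultdict

# Toy per-gene annotations, loosely following the slide (illustrative only).
annotations = {
    "Gene1": ["RNA recognition motif", "mRNA export",
              "protein kinase activity", "nucleus", "transcription"],
    "Gene2": ["protein kinase activity", "RNA binding domain",
              "mRNA export", "nucleus", "transcription",
              "mitotic cell cycle"],
}

def build_term_lists(gene_to_terms):
    """Invert gene -> terms into term -> sorted list of genes."""
    term_to_genes = defaultdict(set)
    for gene, terms in gene_to_terms.items():
        for term in terms:
            term_to_genes[term].add(gene)
    return {term: sorted(genes) for term, genes in term_to_genes.items()}

term_lists = build_term_lists(annotations)
print(term_lists["mRNA export"])         # genes sharing this term
print(term_lists["mitotic cell cycle"])  # a list with a single member
```

Each key of `term_lists` is one "list" in the sense used here; grouping several keys together would give an annotation subset.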
So why are lists useful?
GO slim = Ontology subset
of “high level” GO terms
“GO slim annotation subset”
= set of lists
GO slim
https://www.ebi.ac.uk/QuickGO/
Biological process slim (for analysis)
should represent known biology well
Cofactor metabolic process
DNA metabolic process
cytoplasmic translation
mitochondrial translation
metabolic process
Intersection
Metabolic process ∩
cellular process 3167
‘High level’ terms are often uninformative for
physiological role
Fission yeast: 4369 proteins with biological process annotation
metabolic process
3237
75% of BP
annotated
proteins
cellular process
4112
Other process terms excluded
response to (chemical)
phosphorylation
(can also apply to any module)
Terms which apply across annotation space are
often too general to be informative about
physiological role (for a biologist).
Slims with specificity are more useful.
Fission yeast GO slim, 53 terms
● Good coverage of process
(99% of gene products with
BP). Important to clearly
indicate what does not “slim”
(and why)
● Some gene products belong
to more than one slim
category. Overlaps are
unavoidable but are minimised
where possible
● Align with biologically
meaningful ‘modules’
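The coverage and overlap points above can be sketched as follows. This is a toy example under stated assumptions: the `parents` hierarchy, the three-term `slim`, and the gene names are hypothetical and do not reproduce the real GO structure or the 53-term fission yeast slim.

```python
# Hypothetical mini-ontology: term -> direct parent terms.
parents = {
    "DNA replication": ["DNA metabolic process"],
    "DNA repair": ["DNA metabolic process"],
    "cytoplasmic translation": ["translation"],
    "mitochondrial translation": ["translation"],
    "DNA metabolic process": ["metabolic process"],
    "translation": ["metabolic process"],
    "metabolic process": [],
    "cell adhesion": [],
}

# Hypothetical slim: specific terms rather than "metabolic process".
slim = {"DNA metabolic process", "cytoplasmic translation",
        "mitochondrial translation"}

# Hypothetical direct annotations per gene.
annotations = {
    "geneA": {"DNA replication"},
    "geneB": {"cytoplasmic translation", "DNA repair"},
    "geneC": {"mitochondrial translation"},
    "geneD": {"cell adhesion"},  # has an annotation but does not "slim"
}

def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen

def slim_categories(terms):
    """Slim terms reached from any of a gene's direct annotations."""
    reached = set()
    for term in terms:
        reached |= ancestors(term) & slim
    return reached

gene_to_slim = {g: slim_categories(t) for g, t in annotations.items()}
coverage = sum(1 for s in gene_to_slim.values() if s) / len(gene_to_slim)
multi_category = sorted(g for g, s in gene_to_slim.items() if len(s) > 1)
not_slimmed = sorted(g for g, s in gene_to_slim.items() if not s)

print(f"coverage: {coverage:.0%}")
print("in more than one slim category:", multi_category)
print("does not slim:", not_slimmed)
```

Reporting `not_slimmed` explicitly mirrors the point that what does not "slim" should be clearly indicated.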
Slim terms and intersections (biological modules)
5069 proteins
All cardiolipin biosynthesis
Unknown 700
tRNA metabolism transmembrane transport
Fission yeast 161 Fission yeast 339
Example intersection with no co-annotated genes
Using co-annotation and biological knowledge as a QC procedure
for annotation
Current intersection 10
possible annotation errors?
All GO annotation 78000 All GO annotation 14000
Transmembrane transport
∩ tRNA metabolism = 0
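A minimal sketch of counting co-annotated genes for every pair of terms, which is the kind of pairwise view the intersection check relies on. The gene identifiers and term sets are made up for illustration, and `co_annotation_counts` is a hypothetical helper, not part of any GO tool.

```python
from itertools import combinations

# Hypothetical term -> annotated genes (illustrative data only).
term_to_genes = {
    "tRNA metabolism": {"g1", "g2", "g3"},
    "transmembrane transport": {"g4", "g5"},
    "cytoplasmic translation": {"g2", "g6"},
}

def co_annotation_counts(term_sets):
    """Count genes shared by each unordered pair of terms."""
    counts = {}
    for a, b in combinations(sorted(term_sets), 2):
        counts[frozenset((a, b))] = len(term_sets[a] & term_sets[b])
    return counts

counts = co_annotation_counts(term_to_genes)
for pair, n in counts.items():
    a, b = sorted(pair)
    flag = "" if n else "  <- candidate zero-intersection rule"
    print(f"{a} ∩ {b} = {n}{flag}")
```

A pair that biology says should never overlap but shows a non-zero count is where a curator would start looking for annotation errors.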
The Matrix Tool
http://amigo.geneontology.org/matrix
(Seth Carbon & Chris Mungall, Berkeley Lab)
Fission yeast intersections 01/2012
Fission yeast intersections 03/2019
Multispecies rule building results
Pilot project, tested mouse, worm, yeast
107 rules created, each stating that a particular annotation intersection = 0
Annotation errors (experimental) identified (and corrected): 147
74
73
Acknowledgements, MGI David Hill, WormBase Kimberley Van Auken, SGD Stacia Engel, Rama
Balakrishnan
Multispecies rule building results
Only 0.001% of annotation corpus (600 million). Lots of scope...
Preliminary rules are now incorporated into the GO rule base
Plan to publish soon...
Electronic annotations are based on manual annotation using experimental data.
Therefore a small number of corrections to manual annotation can fix a large number of
automatic annotations applied across non-model species
Unknowns: the elephant in the room
Unknown 700
?
Slow progress characterizing
unknowns
Hidden in plain sight: what remains to be discovered
in the eukaryotic proteome? PMID:30938578
20% pombe and cerevisiae
still “unknown process”
20% human also unknown
117 terms
53 terms
Extended PomBase slim to
cover multicellular process
annotation
We confirmed that unannotated
human proteins are unknown,
even when not explicitly
annotated as such
Why are unknowns unstudied?
27
Based on recent gene characterizations in
fission yeast
Most recently characterised proteins are
involved in non-core functions:
● environment-responsive or aging-related
processes: detoxification, proteostasis,
lipidostasis, damage accumulation.
● Processes that are only required over
longer timescales
● Less than 25% are housekeeping
processes
How can we help users to cut through the complexity?
https://www.pombase.org/browse-curation/fission-yeast-go-slim-terms
See P174
for recent
updates to the
PomBase
website
New interactive view (Quilt Tool), cuts across data
types
Community Curation, making small-scale data
FAIR
See P133 (Antonia Lock) P168 (Alayne Cuzick)
Easy to use curation tool (Canto), step-by-step workflow
Please, add also delta
crs1: normal onset of
premeiotic DNA replication.
Data in Fig S4.
I am wondering a normal
proportion germinates and
go on to form viable
colonies) - this is not what
the definition is suggesting,
but would be a more useful
term
I like this better. Is there
also a ….“reduced viability
of spore population” or
something like this?
….in addition to “delayed
onset of premeiotic DNA
replication” Is it possible to
use two different Term
names?
Yes of course. The peak
looks a bit broader - would
this be the equivalent to
'prolonged premeiotic DNA
replication’?
Yes, the kinetics of the
disappearance of the G1
population is much slower;
prolonged premeiotic DNA
replication is fine (or
extended).
Community curation, increasing participation
Literature triage identifies 6K ‘gene specific’ papers
among the 12.5K that mention fission yeast
Quality is EXCELLENT, coverage not so good, but improves
with subsequent sessions.
Once ‘initiated’ drop out rate is low.
Nobody does it until asked, most need reminders
Annotations per low-throughput study
9 18 41
Understand curation
improved reuse, visibility
and dissemination
Canto is easy to use
BUT we can
improve
242 respondents who had used Canto
out of 632 total
What are the barriers?
The dog ate my homework (7)
● Many apologies for not having done
yet...
● I know I should have done.
● I keep meaning to and will!
● It is next on the 'to do' list!
● I have no excuse. I should and will
curate my paper
● feel guilty for not doing so!
● ...I'm sure it's not that difficult, just hard
to find time. I do think it's worthwhile
and that I should prioritize my curation
contribution
● Curation of papers is extremely
important and this survey definitely
motivates me to take the time to use
Canto and curate my papers.
250
105
81
67
22
13
67
Incentives and Nudges
https://www.freepik.com/
Applying small behavioural
‘nudges’ to increase
participation
Easy
Attractive
Social
Timely
Incentives
and Nudges
Reciprocal links between PubMed
and PomBase publication page
Curator Attribution
Testimonials - Making new connections
It is back and forth: think about the ..results for a while, then compare with the body of
data in PomBase, then think/work a bit more. Rinse and repeat. Martin Převorovský,
Principal Investigator, Charles University, Czech Republic
I don't think we could have done anything without pombase...we build our research around
its knowledge base. Mikel Zaratiegui, Principal Investigator, Rutgers University, US
...frequently use the ... gene annotations to make connections between pathways
and to design experiments. Amanda Bird, Principal Investigator, The Ohio State
University, US
Recently we performed a screen and by using PomBase we quickly realized that all the hits
were clustered in the same pathway. Finding this out without pombase would have
required extensive review of papers that are not within our field of expertise. In this
example a few minutes of work on PomBase gave us confidence that we were onto
something and saved us many weeks of work. Anonymous Principal Investigator
PomBase...has saved me countless hours of fruitless experiments and helped open up
many new, unexpected avenues of investigation. Gautam Dey, Postdoc, MRC LMCB
UCL, UK
“Over 300 testimonials have been received from across the research
community... Quite simply, without it, many significant discoveries
would simply not have been made...”
“Ultimately, this integrated data is driving science forward in novel
ways by enabling the community to make connections between new
and existing data...” Paul Nurse, Director, Crick Institute
Acknowledgements
The PomBase team
Midori Harris (curator, ontology developer)
Antonia Lock (curator)
Kim Rutherford (developer)
Collaborators
Gene Ontology editorial team
Pascale Gaudet
David Hill
Kimberley Van Auken
Harold Drabkin
Chris Mungall
Seth Carbon
SPARE SLIDES
Intersections in a simple eukaryote
cytoplasmic translation,
RNA metabolism
ribosome biogenesis
nucleocytoplasmic transport
TOTAL 1359
cell wall organization
glycosylation
lipid metabolic process
membrane organization
vesicle-mediated transport
TOTAL 722
Intermodule
only 9 shared
genes
Using co-annotation and biological knowledge as a QC
procedure for annotation
Step 1
Annotations shared between sets of GO
terms are explored, and annotation
intersections (number of genes annotated
to both terms) are noted.
Step 2
Rules are created for “zero intersects” based on known biology:
• (“cellular amino acid metabolic process” ∩ “DNA recombination”) = 0
• (“lipid metabolic process” ∩ “carbohydrate metabolic process”) = 0
Step 3
Identify new annotations
violating existing rules.
Report to contributing
database(s) for validation.
Step 4
Annotations critically inspected, leading to one of two outcomes:
A: Violation identified: contributing database corrects annotation
B: Annotation confirmed: rules are extended to allow specific exceptions
Explore co-annotation
Biological “rules”
Identify and report
Correct or modify
Steps 1-4: iterative process
29 annotation errors corrected
Multispecies exercise: cohesin complex vs. processes
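The rule-and-exception cycle in the QC procedure can be sketched as below. The two rules are the ones quoted on the slide; the gene names, the `exceptions` table, and `find_violations` are hypothetical, and the sketch only compares direct annotations (a real check would presumably also consider annotations to descendant terms).

```python
# Zero-intersection rules quoted on the slide.
rules = [
    ("cellular amino acid metabolic process", "DNA recombination"),
    ("lipid metabolic process", "carbohydrate metabolic process"),
]

# Hypothetical confirmed exceptions (outcome B): rule -> allowed genes.
exceptions = {
    ("lipid metabolic process", "carbohydrate metabolic process"): {"geneX"},
}

# Hypothetical direct annotations per gene.
annotations = {
    "geneX": {"lipid metabolic process", "carbohydrate metabolic process"},
    "geneY": {"cellular amino acid metabolic process", "DNA recombination"},
    "geneZ": {"DNA recombination"},
}

def find_violations(gene_terms, zero_rules, allowed):
    """Return (gene, term_a, term_b) for each rule a gene breaks."""
    violations = []
    for gene, terms in sorted(gene_terms.items()):
        for rule in zero_rules:
            term_a, term_b = rule
            co_annotated = term_a in terms and term_b in terms
            if co_annotated and gene not in allowed.get(rule, set()):
                violations.append((gene, term_a, term_b))
    return violations

# Each reported violation goes to the contributing database for review.
for gene, term_a, term_b in find_violations(annotations, rules, exceptions):
    print(f"{gene}: {term_a} ∩ {term_b} should be 0")
```

After review, a violation is either corrected at source (outcome A) or the gene is added to `exceptions` (outcome B), and the check is run again.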

Curate locally, think globally

  • 1. Curate locally, think globally (Insights from the “big-picture” view of curation) Valerie Wood, PomBase, Department of Biochemistry, University of Cambridge, UK ISB 2019
  • 2. The PomBase team Midori Harris (curator , ontology developer) Antonia Lock (curator) Kim Rutherford (developer)
  • 3. What can we learn from a “big picture” view of curated data (especially to improve our resources for end users) ? How can we effectively engage users in the curation process?
  • 4. ● QC- Identify annotation errors and/or outliers ● Identify annotation gaps ● Identify knowledge gaps (and improve annotation breadth) ● Improve data access and presentation Ultimately curation helps us to join the dots and synthesize new knowledge from data integration. Insights from the “big-picture” view of curationon We often overlook the value of the emergent knowledge from the ‘sum of the parts’
  • 5. Gene expression Lorem Ipsum Lorem Ipsum Making lists using ontologies and vocabularies Gene1 RNA recognition motif mRNA export protein kinase activity nucleus transcription Gene2 protein kinase activity RNA binding domain mRNA export nucleus transcription mitotic cell cycle mRNA export Gene 1 Gene 2 Gene 4 Gene 6 Gene 8 Gene 9 transcription Gene 1 Gene 2 Gene 3 Gene 7 Gene 10 mRNA export Gene 1 Gene 2 Gene 4 Gene 6 Gene 8 Gene 9 transcription Gene 1 Gene 2 Gene 3 Gene 7 Gene 10 Essentially creating 1000s of lists of ‘objects’ with similar features We curate detail, annotating genes to ‘terms’ These lists are often related to each other through ontologies We can use sets of lists to create “Annotation subsets” So why are lists useful?
  • 6. GO slim = Ontology subset of “high level” GO terms “GO slim annotation subset” = set of lists GO slim https://www.ebi.ac.uk/QuickGO/ Biological process slim (for analysis) should represent known biology well
  • 8. Intersection Metabolic process ∩ cellular process 3167 ‘High level’ terms are often uninformative for physiological role Fission yeast: 4369 proteins with biological process annotation metabolic process 3237 75% of BP annotated proteins cellular process 4112 Other process terms excluded response to (chemical) phosphorylation (can also apply to any module) Terms which apply across annotation space are often too general to be informative about physiological role (for a biologist). Slims with specificity are more useful.
  • 9. Fission yeast GO slim, 53 terms ● Good coverage of process (99% of gene products with BP). Important to clearly indicate what does not “slim” (and why) ● Some gene products belong to more than one slim category. Overlaps are unavoidable but are minimised where possible ● Align with biologically meaningful ‘modules’
  • 10. Slim terms and intersections (biological modules) 5069 proteins All cardiolipin biosynthesis Unknown 700
  • 11. tRNA metabolism transmembrane transport Fission yeast 161 Fission yeast 339 Example intersection with no co-annotated genes Using co-annotation and biological knowledge as a QC procedure for annotation Current intersection 10 possible annotation errors? All GO annotation 78000 All GO annotation 14000 Transmembrane transport ∩ tRNA metabolism = 0
  • 12. The Matrix Tool http://amigo.geneontology.org/matrix (Seth Carbon & Chris Mungall), Berkeley Lab
  • 15. Multispecies rule building results Pilot project, tested mouse, worm, yeast 107 rules created, each stating that a particular annotation intersection = 0 Annotation errors (experimental) identified (and corrected): 147 74 73 Acknowledgements: MGI David Hill; WormBase Kimberley Van Auken; SGD Stacia Engel, Rama Balakrishnan
  • 16. Multispecies rule building results Only 0.001% of annotation corpus (600 million). Lots of scope... Preliminary rules are now incorporated into the GO rule base Plan to publish soon…. Electronic annotations are based on manual annotation using experimental data. Therefore a small number of corrections to manual annotation can fix a large number of automatic annotations applied across non-model species
  • 17. Unknowns- the elephant in the room Unknown 700 ?
  • 18. Slow progress characterizing unknowns Hidden in plain sight: what remains to be discovered in the eukaryotic proteome? PMID:30938578 20% pombe and cerevisiae still “unknown process”
  • 19. 20% human also unknown 117 terms 53 terms Extended PomBase slim to cover multicellular process annotation We confirmed that human unannotated are unknown, even when not explicitly annotated as such
  • 20. Why are unknowns unstudied? 27 Based on recent gene characterizations in fission yeast Most recently characterised proteins are involved in non-core functions: ● environment responsive or aging related processes: detoxification, proteostasis, lipidostasis, damage accumulation. ● Processes that are only required over longer timescales ● Less than 25% are housekeeping processes
  • 22. How can we help users to cut through the complexity? https://www.pombase.org/browse-curation/fission-yeast-go-slim-terms See P174 for recent updates to the PomBase website
  • 24. New interactive view (Quilt Tool), cut across data types
  • 25. Community Curation, making small-scale data FAIR See P133 (Antonia Lock) P168 (Alayne Cuzick) Easy to use curation tool (Canto), step-by-step workflow
  • 26. Please, add also delta crs1: normal onset of premeiotic DNA replication. Data in Fig S4. I am wondering a normal proportion germinates and go on to form viable colonies) - this is not what the definition is suggesting, but would be a more useful term I like this better. Is there also a ….“reduced viability of spore population” or something like this? ….in addition to “delayed onset of premeiotic DNA replication” Is it possible to use two different Term names? Yes of course. The peak looks a bit broader - would this be the equivalent to 'prolonged premeiotic DNA replication’? Yes, the kinetics of the disappearance of the G1 population is much slower; prolonged premeiotic DNA replication is fine (or extended).
  • 27. Community curation, increasing participation Literature triage identifies 6K ‘gene specific’ papers among the 12.5K that mention fission yeast Quality is EXCELLENT, coverage not so good, but improves with subsequent sessions. Once ‘initiated’ drop out rate is low. Nobody does it until asked, most need reminders Annotations per low-throughput study 9 18 41
  • 28. Understand curation improved reuse, visibility and dissemination Canto is easy to use BUT we can improve 242 respondents who had used Canto out of 632 total
  • 29. What are the barriers? The dog ate my homework (7) ● Many apologies for not having done yet... ● I know I should have done. ● I keep meaning to and will! ● It is next on the 'to do' list! ● I have no excuse. I should and will curate my paper ● feel guilty for not doing so! ● ...I'm sure it's not that difficult, just hard to find time. I do think it's worthwhile and that I should prioritize my curation contribution ● Curation of papers is extremely important and this survey definitely motivates me to take the time to use Canto and curate my papers. 250 105 81 67 22 13 67
  • 30. Incentives and Nudges https://www.freepik.com/ Applying small behavioural ‘nudges’ to increase participation Easy Attractive Social Timely
  • 31. Incentives and Nudges Reciprocal links between PubMed and PomBase publication page Curator Attribution
  • 32. Testimonials - Making new connections It is back and forth: think about the ..results for a while, then compare with the body of data in PomBase, then think/work a bit more. Rinse and repeat. Martin Převorovský, Principal Investigator, Charles University, Czech Republic I don't think we could have done anything without pombase...we build our research around its knowledge base. Mikel Zaratiegui, Principal Investigator, Rutgers University, US …...frequently use the ...gene annotations to make connections between pathways and to design experiments. Amanda Bird, Principal Investigator, The Ohio State University, US Recently we performed a screen and by using PomBase we quickly realized that all the hits were clustered in the same pathway. Finding this out without pombase would have required extensive review of papers that are not within our field of expertise. In this example a few minutes of work on PomBase gave us confidence that we were onto something and saved us many weeks of work. Anonymous Principal Investigator PomBase...has saved me countless hours of fruitless experiments and helped open up many new, unexpected avenues of investigation. Gautam Dey, Postdoc, MRC LMCB UCL, UK “Over 300 testimonials have been received from across the research community..Quite simply, without it, many significant discoveries would simply not have been made….” “Ultimately, this integrated data is driving science forward in novel ways by enabling the community to make connections between new and existing data…” Paul Nurse, Director, Crick Institute
  • 33. Acknowledgements The PomBase team Midori Harris (curator, ontology developer) Antonia Lock (curator) Kim Rutherford (developer) Collaborators Gene Ontology editorial team Pascale Gaudet David Hill, Kimberley Van Auken Harold Drabkin Chris Mungall Seth Carbon
  • 35. Intersections in a simple eukaryote cytoplasmic translation, RNA metabolism ribosome biogenesis nucleocytoplasmic transport TOTAL 1359 cell wall organization glycosylation lipid metabolic process membrane organization vesicle-mediated transport TOTAL 722 Intermodule only 9 shared genes Using co-annotation and biological knowledge as a QC procedure for annotation
  • 36. Step 1 Annotations shared between sets of GO terms are explored and annotation intersections (number of genes annotated) are noted.
  • 37. Step 2 Rules are created for “zero intersects” based on known biology: • (“cellular amino acid meta. proc.” ∩ “DNA recombination”) = 0 • (“lipid meta. proc.” ∩ “carbohydrate meta. proc.”) = 0 Step 3 Identify new annotations violating existing rules. Report to contributing database(s) for validation.
  • 38. Step 4 Annotations critically inspected, leading to one of two outcomes: A: Violation identified: contributing database corrects annotation B: Annotation confirmed: rules are extended to allow specific exceptions. Steps 1-4 form an iterative process: explore co-annotation, create biological “rules”, identify and report, correct or modify.
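The four steps can be illustrated with a small rule object that records a "zero intersect" and its allowed exceptions. Everything here is an assumption made for illustration: `ZeroIntersectRule` is a hypothetical class, the gene identifiers are toy data, and the actual GO rule base may represent rules quite differently.

```python
from dataclasses import dataclass, field


@dataclass
class ZeroIntersectRule:
    """States that no gene should be annotated to both terms, barring exceptions."""
    term_a: str
    term_b: str
    exceptions: set = field(default_factory=set)

    def violations(self, term_lists):
        """Step 3: genes co-annotated to both terms that are not known exceptions."""
        shared = term_lists.get(self.term_a, set()) & term_lists.get(self.term_b, set())
        return shared - self.exceptions


# Step 2: a rule based on known biology (term names expanded from the slide).
rule = ZeroIntersectRule("lipid metabolic process", "carbohydrate metabolic process")

# Toy annotation lists (hypothetical gene identifiers).
term_lists = {
    "lipid metabolic process": {"g1", "g2"},
    "carbohydrate metabolic process": {"g2", "g3", "g4"},
}

before = rule.violations(term_lists)
print("to report for validation:", sorted(before))

# Step 4, outcome B: the annotation is confirmed, so the rule gains an exception.
rule.exceptions.add("g2")
after = rule.violations(term_lists)
print("after extending the rule:", sorted(after))
```

Outcome A would instead remove the gene from one of the term lists at the contributing database, after which the same check returns no violations without touching the rule.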
  • 39. 29 annotation errors corrected Multispecies exercise: cohesin complex vs. processes