Hidden in plain sight
Classifying the known and the unknown proteome
Valerie Wood, PomBase
We tend to study what we know
5054 protein coding
• 2154 published, small scale (blue)
• 2050 inferred from orthologs (red)
• 509 conserved (green)
• 321 Schizosaccharomyces specific (purple)
Steady progress in characterising proteins already studied in other organisms
‘Unknowns’ decreasing only gradually
Similar situation in other organisms (other organisms checked for annotation)
Progress in characterising proteins since 2006
Classifying as unknown
• The concept of “known function” is vague and somewhat arbitrary: we may know the molecular function (e.g. oxidase, protease) but nothing about the broader cellular role
• For fission yeast we use ‘unknown’ when there is no information about the broad cellular role in which a gene product participates, so it cannot be assigned to a ‘biological process’ in the ‘GO slim’ (e.g. transcription, translation, replication, amino acid metabolism)
• People tend to work on processes, so assigning unknowns to a process makes them accessible as candidates for follow-up
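The ‘unknown’ rule above can be sketched in a few lines. This is only an illustration: the slim terms listed, the gene names and the `is_unknown` helper are hypothetical, not PomBase data or code.

```python
# ASSUMPTION: a tiny illustrative subset of GO slim process terms.
GO_SLIM_PROCESSES = {
    "transcription",
    "translation",
    "DNA replication",
    "amino acid metabolism",
}


def is_unknown(process_annotations):
    """Return True if none of a gene's process annotations is a GO slim term."""
    return not (set(process_annotations) & GO_SLIM_PROCESSES)


# Hypothetical genes: the molecular function may be known even when the
# broader cellular role (the process) is not.
genes = {
    "geneA": {"function": "protease", "processes": []},
    "geneB": {"function": "protein kinase", "processes": ["transcription"]},
    "geneC": {"function": None, "processes": []},
}

unknowns = sorted(name for name, g in genes.items() if is_unknown(g["processes"]))
print(unknowns)
```

Note that geneA counts as unknown here despite having a molecular function, which is the point of the definition.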
Taxonomic conservation
Unknown 830
• 509 conserved in other organisms
  - vertebrate, eukaryote only; vertebrate, bacteria: 179 (have clear human ortholog)
  - other non-fungal eukaryote, no vertebrate
  - fungi only 186
  - fungi and bacteria
  - horizontal transfer 19
• 321 fission yeast specific
  - Schizosaccharomyces only 178
  - Schizosaccharomyces pombe only 143
• To make useful inferences for ‘unknowns’ we need a clear and accurate picture of what we know
• A “GO slim” is a way of summarizing the biological roles of an organism's gene products
• This “matrix” shows genes co-annotated to pairs of GO slim terms
• Many GO slim terms do not share annotations with other slim terms
• Also used for QC, to identify ontology and annotation errors (especially in electronic annotation)
What is known?
DNA replication, recombination and repair frequently intersect with each other and with chromosome organization and mitotic cell cycle regulation
They rarely (currently never) intersect with:
• carbohydrate metabolism
• amino acid metabolism
• cytokinesis
• lipid metabolism
• nucleocytoplasmic transport
• protein glycosylation
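A co-annotation matrix of this kind could be computed roughly as follows. This is a minimal sketch on made-up data: the gene names and annotations are hypothetical, and the two helper functions are not part of any PomBase tool.

```python
from collections import Counter
from itertools import combinations

# ASSUMPTION: hypothetical gene -> GO slim term annotations.
annotations = {
    "gene1": {"DNA replication", "DNA repair"},
    "gene2": {"DNA replication", "chromosome organization"},
    "gene3": {"lipid metabolism"},
    "gene4": {"DNA repair", "chromosome organization"},
    "gene5": {"cytokinesis"},
}


def co_annotation_counts(annotations):
    """Count genes annotated to both terms of every unordered slim-term pair."""
    counts = Counter()
    for terms in annotations.values():
        for pair in combinations(sorted(terms), 2):
            counts[pair] += 1
    return counts


def never_intersect(annotations, counts):
    """List slim-term pairs that share no genes (candidates for QC rules)."""
    all_terms = sorted(set().union(*annotations.values()))
    return [pair for pair in combinations(all_terms, 2) if counts[pair] == 0]


counts = co_annotation_counts(annotations)
empty_pairs = never_intersect(annotations, counts)
print(counts[("DNA repair", "DNA replication")])
print(("cytokinesis", "lipid metabolism") in empty_pairs)
```

Pairs that are expected never to intersect are useful for QC: a new annotation that lands in one of them is worth checking by hand.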
[Figure: functional modules of the proteome with gene counts per module and submodule (TOTAL 5054). Counts on the intersections between modules are not recoverable from the extracted text.]
CELL DIVISION 751
• cytoskeleton org 206
• nuclear DNA replication, recombination, repair 305
• mitotic chromosome segregation 184
• regulation of mitotic cell cycle 232
• cytokinesis 110
MITOCHONDRIAL ORG/EXP 280
MEMBRANES, TRAFFICKING, CELL SURFACE 787
• cell wall org 130
• lipid met 222
• vesicle mediated transport 324
• glycosylation, polysacc met 140
• membrane org 199
SMALL MOLECULE TM TRANSPORT 288
detox
CENTRAL MET, ENERGY AND BUILDING BLOCKS 549
• AA & sulfur met 220
• vitamin, cofactor met
• nucleobase/side/tide met 219
• small sugar met 77
• nitrogen
• other energy generation
EXPRESSION 1294
• EXPRESSION submod 863
• ribosome biogenesis 317
• RNA metabolism 772
• cytoplasmic translation 249
• nucleocyto transport 110
• transcription 479
PROTEIN ASSEMBLY/STABILITY 765
• protein catabolism & autophagy 251
• ubiquitination 192
• folding 102
• complex assembly 325
signalling 404
sexual reproductive process 262 (many intersections)
Other 290 (no intersections; includes adhesion, many proteases, peroxins)
Unknown 830
Annotation on one intersection: all cardiolipin synthesis
This covers the known “process options” for a single-celled eukaryote
- First step for the unknowns: assign each to a broad process (to bring it to the notice of interested researchers)
- If we can predict a strong association with one module or submodule, the gene is unlikely to be associated with others (caveat)
- Provides a ‘framework’ to begin to partition “unknowns” based on general or specific non-process characteristics (to constrain predictions, and evaluate them, based on existing knowledge)
New biology?
Function prediction
1. Find features informative for known processes
• Phenotypes
• Taxonomic distribution
• Location
• Catalytic grouping
2. Identify these informative features in each unknown protein
3. Cluster similar features
4. Ask “which known genes best match these profiles?”
5. Look for matching processes
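Steps 2 to 5 can be illustrated with a toy profile match. Everything here is an assumption made for the sketch: the feature sets, the gene names, the process labels, the use of Jaccard similarity and the 0.5 threshold. The slides do not specify a similarity measure.

```python
from collections import Counter

# ASSUMPTION: hypothetical feature profiles for genes of known process.
known = {
    "knownA": {"features": {"ER", "2-3 TM", "vertebrate"},
               "process": "lipid metabolism"},
    "knownB": {"features": {"ER", "2-3 TM", "vertebrate", "no S. cerevisiae"},
               "process": "lipid metabolism"},
    "knownC": {"features": {"nucleus", "methyltransferase", "bacteria"},
               "process": "tRNA metabolism"},
}
# ASSUMPTION: hypothetical feature profile shared by a cluster of unknowns.
unknown_profile = {"ER", "2-3 TM", "vertebrate", "no S. cerevisiae"}


def jaccard(a, b):
    """Jaccard similarity between two feature sets."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


def best_matches(profile, known, min_similarity=0.5):
    """Return (gene, similarity) for known genes resembling the profile, best first."""
    scored = [(jaccard(profile, g["features"]), name) for name, g in known.items()]
    return [(name, s) for s, name in sorted(scored, reverse=True) if s >= min_similarity]


matches = best_matches(unknown_profile, known)
# Step 5: which processes do the best-matching known genes share?
process_votes = Counter(known[name]["process"] for name, _ in matches)
print(matches)
print(process_votes.most_common(1))
```

A real analysis would weight features by how informative they are for each process, rather than treating them equally as this sketch does.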
Classification/clustering of unknowns (conserved to human subset)
1. Identify informative features for each unknown protein:
• Phenotypes
• Location
• Taxonomic distribution
• Catalytic grouping
2. Group by similar features
For 100/179 of the conserved to human subset see poster 144
1. ER localization
2. >1 <4 TM domain
3. Absent from S. cerevisiae
4. Conserved in vertebrates
Ask “which known genes best match these profiles?”
Query using PomBase advanced search tool:
Look for matching processes
“What are these genes enriched for?”
1. Present in nucleus
2. Methyltransferase domain
3. Conserved in bacteria
4. Conserved in vertebrates
11/15 are tRNA metabolism, 3/15 rRNA metabolism
… there are orphan tRNA enzymes
Adding another feature, “HU (hydroxyurea) sensitivity”, increases specificity for tRNA metabolism
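One common way to ask “what are these genes enriched for?” is a hypergeometric test. The sketch below uses the 11/15 result and the 5054 gene total from the slides. The genome-wide number of tRNA metabolism genes (`K`) is a hypothetical placeholder, and the slides do not say which statistic was actually used.

```python
from math import comb


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the annotation."""
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


N = 5054       # protein-coding genes (from the slides)
n, k = 15, 11  # genes matching the profile, and how many are tRNA metabolism (from the slides)
K = 170        # ASSUMPTION: hypothetical number of tRNA metabolism genes genome-wide

p_value = hypergeom_tail(k, n, K, N)
print(f"P(X >= {k}) = {p_value:.3g}")
```

Adding a further feature to the query changes `n` and `k`; comparing the resulting tail probabilities is one way to judge whether the extra feature sharpens the association.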
Real example
SPCC1840.09: recently characterised as coq11 in S. cerevisiae
4/9 genes with these phenotypes are ubiquinone biosynthesis
1 transcription
1 (SPAC823.10c) indirect (reannotated to heme transport)
Guilt by association
All unknowns in STRING (http://string-db.org/)
Human AMMECR1
Human MEMO1
Using AnGeLi
The AMMECR subnetwork has connections to the meiotic cell cycle (7/8), and its genes are upregulated in response to caffeine and rapamycin, and to stress
What do we need to make good predictions?
CURATION
• Accurate prediction requires high-quality curation: continual removal of known false-positive annotations (arising from ontology errors, incorrect experiments, manual curation errors and incorrect automated mapping) from the ‘true positives’ training set
FUNCTION PREDICTION METHODS
• Many pipelines for function prediction produce lots of false positives because they are not fully constrained by existing knowledge
• An integrated approach to prediction that combines all methods: different methods will suit different processes (no one size fits all)
• Identify more informative features (e.g. phenotypes) that correlate strongly and specifically with known processes (some phenotypes, e.g. ‘abnormal shape’, may be enriched for some processes but are non-specific)
EXPERIMENTAL DATA
• More datasets that provide strong positive and negative discriminators
• More high-quality physical interactions
ACCESS
• Make predictions prominently accessible so that they can be validated in small-scale follow-up
The future
Acknowledgements
• Midori Harris
• Antonia Lock
• Jürg Bähler
• Danny Bitton (AnGeLi)
• Steve Oliver
Spare slides
[Figure: Venn diagram of genes annotated to the three GO aspects (Total 5054). Label “No biological role”; counts as extracted, region assignment not recoverable: 167, 3436, 831, 455, 15, 1842, 90]
Biological Process, e.g.
• Cell cycle
• Transcription
• DNA replication
• Transport
• Regulation of process
Molecular Function, e.g.
• transporter
• enzyme (protein kinase, ubiquitin ligase, oxidoreductase, protease)
• binding functions
• enzyme regulator (direct)
Cellular Component (location or complex)
All 3 aspects unknown: 717, unknown role
Plus 113 where the process is not very informative: total 830
Find “non process” features that correlate with processes
e.g. “Mitochondrial organization” (a GO slim term)
More likely to be: Location, Phenotype, Phenotype, Expression
Less likely to be: Protein feature
Using AnGeLi http://bahlerweb.cs.ucl.ac.uk/cgi-bin/GLA/GLA_input
275/280 genes involved in mitochondrion organization are mitochondrial, BUT not all mitochondrial genes (732) are involved in mitochondrial organization
10/10 genes which show decreased population growth on galactose are mitochondrial organization
Less likely to be periodic (8/497) or abnormal cell cycle (4/695)
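The mitochondrial figures above show why a feature can be nearly universal within a process yet still be a weak predictor of it. A short worked calculation, using only the counts quoted on the slide and assuming the 275 genes are a subset of the 732 mitochondrial genes:

```python
# Counts taken from the slide.
process_genes = 280        # genes annotated to mitochondrion organization
process_and_feature = 275  # of those, genes with a mitochondrial location
feature_genes = 732        # all mitochondrial genes

# Fraction of process genes that carry the feature.
recall = process_and_feature / process_genes
# Fraction of feature genes that belong to the process
# (ASSUMPTION: the 275 genes are counted among the 732).
precision = process_and_feature / feature_genes

print(f"{recall:.1%} of mitochondrion organization genes are mitochondrial")
print(f"{precision:.1%} of mitochondrial genes are in mitochondrion organization")

# Depleted features quoted on the same slide, as plain ratios.
for label, hits, total in [("periodic", 8, 497), ("abnormal cell cycle", 4, 695)]:
    print(f"{label}: {hits / total:.1%}")
```

Location alone is therefore a good filter but a poor predictor, which is why it is combined with more specific features such as the galactose growth phenotype.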
Between-module intersections
• Numbers of between-module intersections are low
• Excluding signalling co-annotation, the intersections between modules are minimal (mainly Ub-mediated protein degradation)

Hidden in plain sight

  • 1. Hidden in plain sight: Classifying the known and the unknown proteome. Valerie Wood, PomBase
  • 2. We tend to study what we know. 5054 protein coding. 2154 published, small scale (blue); 2050 inferred from orthologs (red). Steady progress in characterizing proteins already studied in other organisms. ‘Unknowns’ decreasing only gradually: 509 conserved (green), 321 Schizosaccharomyces specific (purple). Similar situation in other organisms (other organisms checked for annotation). Progress in characterising proteins since 2006
  • 3. Classifying as unknown • The concept of “known function” is vague and arbitrary: we may know the molecular function (oxidase/protease) but nothing about the broader cellular role • For fission yeast we use ‘unknown’ if there is no information about the broad cellular role in which a protein participates, so that it cannot be assigned to a ‘biological process’ in the ‘GO slim’ (e.g. transcription, translation, replication, amino acid metabolism) • People tend to work on processes, so assigning a process makes proteins accessible as candidates for follow-up
  • 4. Taxonomic conservation. Unknown: 830. 321 fission yeast specific: Schizosaccharomyces only 178; Schizosaccharomyces pombe only 143. 509 conserved in other organisms: vertebrate, eukaryote only, and vertebrate, bacteria (together 179, have clear human ortholog); other non-fungal eukaryote, no vertebrate; fungi only 186; fungi and bacteria; horizontal transfer 19
  • 5. What is known? • To make useful inferences for ‘unknowns’ we need a clear and accurate picture of what we know • A “GO slim” is a way of summarizing the biological roles of an organism’s gene products • This “matrix” shows genes co-annotated to pairs of GO slim terms • Many GO slim terms do not share annotations with other slim terms • Used for QC: identifies ontology and annotation errors (especially in electronic annotation)
  • 6. DNA replication, recombination and repair frequently intersect with each other and with chromosome organization and mitotic cell cycle regulation. They rarely (currently never) intersect with: carbohydrate metabolism, amino acid metabolism, cytokinesis, lipid metabolism, nucleocytoplasmic transport, protein glycosylation
  • 7. [Euler diagram of GO slim modules and submodules with gene counts; the small numbers marking intersections are not recoverable from the extracted text.] TOTAL 5054. Unknown 830. Modules: CELL DIVISION 751; MITOCHONDRIAL ORG/EXP 280; MEMBRANES, TRAFFICKING, CELL SURFACE 787; SMALL MOLECULE TM TRANSPORT 288; CENTRAL MET, ENERGY AND BUILDING BLOCKS 549; signalling 404; sexual reproductive process 262 (many intersections); Other 290 (no intersections; includes adhesion, many proteases, peroxions); EXPRESSION 1294; EXPRESSION submod 863; PROTEIN ASSEMBLY/STABILITY 765. Submodules: cytoskeleton org 206; nuclear DNA replication, recombination, repair 305; mitotic chromosome segregation 184; regulation of mitotic cell cycle 232; cytokinesis 110; lipid met 222; vesicle mediated transport 324; glycosylation, polysacc met 140; membrane org 199; AA & sulfur met 220; nucleo-base/side/tide met 219; small sugar met 77; ribosome biogenesis 317; RNA metabolism 772; cytoplasmic translation 249; nucleocyto transport 110; transcription 479; protein catabolism & autophagy 251; ubiquitination 192; folding 102; complex assembly 325. Submodules whose counts are not recoverable: cell wall org; detox; vitamin cofactor met; nitrogen; other energy generation. Intersection label: all cardiolipin synthesis
  • 8. Modules: MITOCHONDRIAL ORG/EXP 280; MEMBRANES, TRAFFICKING, CELL SURFACE 787; signalling 404; sexual reproductive process 262; Other 290; PROTEIN ASSEMBLY/STABILITY 765; CELL DIVISION 751; SMALL MOLECULE TM TRANSPORT 288; CENTRAL MET, ENERGY AND BUILDING BLOCKS 549; EXPRESSION 1294; EXPRESSION submod 863; Transcription 479; Unknown 830; TOTAL 5054. This covers the known “process options” for a single-celled eukaryote - First step for the unknowns: assign each to a broad process (to bring it to the notice of interested researchers) - If we can predict a strong association with one module or submodule, the protein is unlikely to be associated with others (caveat) - Provides a ‘framework’ to begin to partition “unknowns” based on general or specific non-process characteristics (constrain predictions, and evaluate them, based on existing knowledge) New biology?
  • 9. Function prediction 1. Find features informative for known processes: phenotypes, taxonomic distribution, location, catalytic grouping 2. Identify these informative features in unknown proteins 3. Cluster similar features 4. Ask “which known genes best match these profiles?” 5. Look for matching processes
  • 10. Classification/clustering of unknowns (conserved to human subset) 1. Identify informative features for each unknown protein: phenotypes, location, taxonomic distribution, catalytic grouping 2. Group by similar features. For 100/179 of the conserved to human subset see poster 144
  • 11. 1. ER localization 2. More than 1 and fewer than 4 TM domains 3. Absent from S. cerevisiae 4. Conserved in vertebrates. Ask “which known genes best match these profiles?”
  • 12. Query using PomBase advanced search tool:
  • 13. Look for matching processes “What are these genes enriched for?”
  • 14. 1. Present in nucleus 2. Methyltransferase domain 3. Conserved in bacteria 4. Conserved in vertebrates
  • 15. 11/15 are tRNA metabolism, 3/15 rRNA metabolism, … there are orphan tRNA enzymes
  • 16. Adding another feature, “HU sensitivity”, increases specificity for tRNA metabolism
  • 17. Real example: SPCC1840.09 was recently characterised as coq11 in S. cerevisiae. 4/9 with these phenotypes are ubiquinone biosynthesis; 1 transcription; 1 (SPAC823.10c) indirect (reannotated to heme transport)
  • 18. Guilt by association: all unknowns in STRING (http://string-db.org/). Human AMMECR1, Human MEMO1
  • 19. Using AnGeLi: the AMMECR subnetwork has connections to meiotic cell cycle (7/8), and its genes are upregulated in response to caffeine, rapamycin and stress
  • 20. What do we need to make good predictions? CURATION • Accurate prediction requires high-quality curation: continual removal of known false positive annotations (arising from ontology errors, incorrect experiments, manual curation errors and incorrect automated mapping) from the ‘true positives training set’ FUNCTION PREDICTION METHODS • Many function prediction pipelines produce lots of false positives because they are not fully constrained by existing knowledge • An integrated approach to prediction that combines all methods; different methods will suit different processes (no one size fits all) • Identify more informative features (e.g. phenotypes) which correlate strongly and specifically with known processes (some phenotypes, e.g. ‘abnormal shape’, may be enriched for some processes but are non-specific) EXPERIMENTAL DATA • More datasets are required which provide strong positive and negative discriminators • More high-quality physical interactions ACCESS • Make predictions prominently accessible so they can be validated in small-scale follow-up
  • 22. Acknowledgements • Midori Harris • Antonia Lock • Jurg Bahler • Danny Bitton (AnGeLi) • Steve Oliver
  • 24. [Venn diagram of annotation across the three GO aspects; region counts as extracted: 167, 3436, 831, 455, 15, 1842; total 5054.] Biological Process, e.g. cell cycle, transcription, DNA replication, transport, regulation of process. Molecular Function, e.g. transporter, enzyme (protein kinase, ubiquitin ligase, oxidoreductase, protease), binding functions, enzyme regulator (direct). Cellular Component (location or complex). No biological role: 717 unknown role, plus 113 where the process is not very informative, total 830. All 3 aspects unknown: 90
  • 25. Find “non process” features that correlate with processes, e.g. “Mitochondrial organization” (a GO slim term). More likely to be: Location, Phenotype, Phenotype, Expression. Less likely to be: Protein feature. Using AnGeLi (http://bahlerweb.cs.ucl.ac.uk/cgi-bin/GLA/GLA_input): 275/280 genes involved in mitochondrion organization are mitochondrial, BUT not all mitochondrial genes (732) are involved in mitochondrial organization; 10/10 genes which show decreased population growth on galactose are mitochondrial organization; less likely to be periodic (8/497) or abnormal cell cycle (4/695)
  • 26. Between module intersections Numbers of between module intersections are low
  • 27. Between module intersections Excluding signalling co-annotation the intersections between modules are minimal (mainly Ub mediated protein degradation)
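
The co-annotation “matrix” described on slide 5 can be sketched in a few lines of Python. This is a minimal illustration and not PomBase's implementation: the gene names and annotations are invented toy data, and `co_annotation_counts` and `pair_count` are hypothetical helper names.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: gene -> set of GO slim terms it is annotated to.
ANNOTATIONS = {
    "gene1": {"DNA replication", "DNA repair"},
    "gene2": {"DNA replication", "chromosome organization"},
    "gene3": {"lipid metabolism"},
    "gene4": {"DNA repair", "DNA replication"},
}


def co_annotation_counts(annotations):
    """Count, for every pair of slim terms, the genes annotated to both."""
    counts = Counter()
    for terms in annotations.values():
        # Sort so each unordered pair is always stored under the same key.
        for pair in combinations(sorted(terms), 2):
            counts[pair] += 1
    return counts


def pair_count(counts, term_a, term_b):
    """Look up a pair regardless of the order the terms are given in."""
    return counts[tuple(sorted((term_a, term_b)))]


counts = co_annotation_counts(ANNOTATIONS)
print(pair_count(counts, "DNA replication", "DNA repair"))
print(pair_count(counts, "DNA replication", "lipid metabolism"))
```

Pairs with a count of zero are the slim terms that never share a gene. When two processes are not expected to intersect, a non-zero cell is worth inspecting as a possible ontology or annotation error.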

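The profile queries on slides 9 to 16 amount to asking which known genes carry every feature in a profile, then tallying the processes of those matches. The sketch below assumes invented toy data: the gene names, feature labels and process assignments are hypothetical, as are the helper names `matching_known_genes` and `process_counts`. The queries shown on the slides were run with the PomBase advanced search tool, not with this code.

```python
from collections import Counter

# Hypothetical toy data: known gene -> (set of features, GO slim process).
KNOWN = {
    "known1": ({"nucleus", "methyltransferase_domain",
                "conserved_in_bacteria", "conserved_in_vertebrates"},
               "tRNA metabolism"),
    "known2": ({"nucleus", "methyltransferase_domain",
                "conserved_in_bacteria", "conserved_in_vertebrates",
                "HU_sensitive"},
               "tRNA metabolism"),
    "known3": ({"nucleus", "methyltransferase_domain",
                "conserved_in_bacteria", "conserved_in_vertebrates"},
               "rRNA metabolism"),
    "known4": ({"ER", "conserved_in_vertebrates"}, "lipid metabolism"),
    "known5": ({"nucleus", "methyltransferase_domain",
                "conserved_in_vertebrates"},
               "transcription"),
}

# Feature profile of an unknown protein, modelled on slide 14.
PROFILE = {"nucleus", "methyltransferase_domain",
           "conserved_in_bacteria", "conserved_in_vertebrates"}


def matching_known_genes(profile, known):
    """Return known genes that carry every feature in the profile."""
    return sorted(gene for gene, (features, _) in known.items()
                  if profile <= features)


def process_counts(genes, known):
    """Tally the processes of the matching genes."""
    return Counter(known[gene][1] for gene in genes)


matches = matching_known_genes(PROFILE, KNOWN)
print(process_counts(matches, KNOWN))

# Adding one more feature narrows the match set, as on slide 16.
narrowed = matching_known_genes(PROFILE | {"HU_sensitive"}, KNOWN)
print(process_counts(narrowed, KNOWN))
```

Requiring every feature (a subset test) keeps the query strict. A scored partial match would be an alternative design when feature data are incomplete.
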
Editor's Notes

  1. Well-studied proteins become better studied. Funders are risk averse.
  2. It is difficult to classify ‘unknowns’
  3. How are the unknowns conserved in other species?
  4. It is difficult to process all of the information in the matrix. Converted into an Euler diagram, grouping the most biologically related processes into modules. Between-module intersections are mainly the red diamonds. Otherwise there are very small numbers of intersections between “modules”, easily explainable by biology.
  5. Here we have phenotypes (mainly morphology and drug sensitivity), location, presence of TM domains, some taxonomic distribution (present in bacteria / absent from S. cerevisiae; all of these are conserved to human) and catalytic activity.
  6. Add results over top animated
  7. Three phenotypes (sensitive to cadmium, duboredoxin and decreased mating efficiency) strongly implicated SPAC823.10c and SPCC1840.09 in coenzyme Q biosynthesis. SPCC1840.09 is now characterised as coq11.
  8. We can look for potential functional interactions, demonstrated here with STRING, which includes information from expression and from interactions across species. It seems likely that there are functional connections between some of the unknowns. There is a strong subnetwork centred around the human AMMECR1 (disease-associated) and MEMO1 genes.
  9. Currently trying to make sense of this cluster by inspecting orthologs more closely. Membrane or organelle organization…
  10. People work on processes, so a protein needs a process to bring it onto someone’s radar. PomBase uses strict criteria for assigning GO process terms. Of the “90” with nothing, 29 are Schizosaccharomyces specific and 24 are pombe specific (some may not be real).
  11. Better if genome-wide data are available. Functionally related genes tend to phenocopy each other. Update using ‘known’ background?
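
The mitochondrial organization numbers on slide 25 make a small worked example of why a feature can cover a process well without being specific to it. The sketch below uses only counts quoted on the slides (5054 protein-coding genes, 732 mitochondrial genes, 280 mitochondrion organization genes, 275 in both). The hypergeometric test is an assumption here, chosen as a common enrichment statistic and not necessarily what AnGeLi computes. Using all 5054 genes as the background is also an assumption, and `hypergeom_tail` is a hypothetical helper name.

```python
from math import comb


def hypergeom_tail(k, n, K, N):
    """P(X >= k), where X counts feature-carrying genes among n genes
    drawn without replacement from N genes, K of which carry the feature."""
    numerator = sum(comb(K, i) * comb(N - K, n - i)
                    for i in range(k, min(n, K) + 1))
    return numerator / comb(N, n)


TOTAL_GENES = 5054    # protein-coding genes (slide 2)
MITOCHONDRIAL = 732   # genes with mitochondrial location (slide 25)
MITO_ORG = 280        # genes in mitochondrion organization (slide 25)
OVERLAP = 275         # mitochondrion organization genes that are mitochondrial

# How much of the process the feature covers.
fraction_of_process_with_feature = OVERLAP / MITO_ORG
# How much of the feature set belongs to the process.
fraction_of_feature_in_process = OVERLAP / MITOCHONDRIAL

# Assumed background: all protein-coding genes.
p_value = hypergeom_tail(OVERLAP, MITO_ORG, MITOCHONDRIAL, TOTAL_GENES)

print(f"{fraction_of_process_with_feature:.3f}")
print(f"{fraction_of_feature_in_process:.3f}")
print(p_value)
```

The location feature covers almost the whole process but fewer than half of mitochondrial genes belong to it, so location alone constrains a prediction without settling it. Changing the background from all genes to only genes with known roles, as the last note suggests, would mean changing `TOTAL_GENES` and the counts drawn from it.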