Gene Ontology
Slimming tips
Val Wood GO consortium meeting Cambridge Oct 2017
Common Uses of GO Slims
Whole genome slims
● Provide a summary of an organism’s biology
● As a resource to plan curation (unannotated genes, intersections); to identify “unknown/uncharacterised genes”
■ Need to be biologically relevant; reduced redundancy is better
■ Need as complete coverage as possible
Single gene overview (Alliance of Genome Resources ribbon)
○ Shows a database user which branches of GO apply to a single gene product (filter)
■ Usually higher-level, general grouping terms (redundancy is less critical)
Slimming results sets (subsets of genes)
● To interpret analysis (slimming prior to enrichment helps to interpret and orient results)
● Summarize/display experimental results: smallest possible number of terms, but specific enough to convey results
● Taxon-specific slim
● There is no “one size fits all”; different use cases need different slims.
● With a ‘generic slim’ we should provide instructions on how to refine it.
Coverage 1: Only slim one aspect at a time!
● All 3 aspects “unknown” = 103
● Biological Process unknown = 750 (103+195+23+429)
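A minimal sketch of why the aspects are best counted separately, assuming annotations are held as a mapping from gene ID to a set of GO IDs per aspect. The gene names and the `annotations` structure are hypothetical; the three root IDs are the real GO root terms.

```python
# GO root terms for the three aspects.
ROOTS = {
    "BP": "GO:0008150",  # biological_process
    "MF": "GO:0003674",  # molecular_function
    "CC": "GO:0005575",  # cellular_component
}

def unknown_in_aspect(annotations, aspect):
    """Genes with no annotation in `aspect` other than the root term."""
    root = ROOTS[aspect]
    return {
        gene
        for gene, by_aspect in annotations.items()
        if not (by_aspect.get(aspect, set()) - {root})
    }

def unknown_in_all_aspects(annotations):
    """Genes that are unknown in every one of the three aspects."""
    per_aspect = [unknown_in_aspect(annotations, a) for a in ROOTS]
    return set.intersection(*per_aspect)

# Hypothetical toy data: gene -> aspect -> set of GO IDs.
annotations = {
    "geneA": {"BP": {"GO:0008150"}, "MF": {"GO:0003674"}, "CC": {"GO:0005575"}},
    "geneB": {"BP": {"GO:0008150"}, "MF": {"GO:0016301"}, "CC": {"GO:0005634"}},
    "geneC": {"BP": {"GO:0006412"}, "MF": {"GO:0003674"}, "CC": {"GO:0005737"}},
}

bp_unknown = unknown_in_aspect(annotations, "BP")
all_unknown = unknown_in_all_aspects(annotations)
```

In this toy set, geneB has a known function and location but no known process, so it only shows up when Biological Process is examined on its own.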
Coverage 2: Distinguish unannotated/unknown/unslimmed
Pombe using PomBase slim (figure labels: Unannotated, Unknown, Unslimmed)
● Unannotated = IDs not recognised by the slim tool (i.e. not in the GO database)
● http://go.princeton.edu/cgi-bin/GOTermMapper will provide all 3 numbers (and can use your own slim)
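The three categories can be sketched as a simple classification. This is an illustration under stated assumptions, not how GO Term Mapper is implemented: the function name `classify`, the input structures and the small hand-written ancestor table are all hypothetical, and "unknown" is taken here to mean "only root annotations".

```python
BP_ROOT = "GO:0008150"  # biological_process root

def classify(gene_ids, annotations, ancestors, slim_terms):
    """Sort identifiers into the buckets a slim mapper typically reports."""
    buckets = {"unannotated": set(), "unknown": set(),
               "unslimmed": set(), "slimmed": set()}
    for gene in gene_ids:
        if gene not in annotations:
            buckets["unannotated"].add(gene)  # ID not recognised at all
            continue
        non_root = annotations[gene] - {BP_ROOT}
        if not non_root:
            buckets["unknown"].add(gene)  # only root annotations
            continue
        # A term maps to the slim if it, or any ancestor, is a slim term.
        reachable = set()
        for term in non_root:
            reachable |= {term} | ancestors.get(term, set())
        key = "slimmed" if reachable & slim_terms else "unslimmed"
        buckets[key].add(gene)
    return buckets

# Simplified, hand-written ancestor table (assumption, not loaded from GO).
ancestors = {
    "GO:0006412": {"GO:0010467", BP_ROOT},  # translation -> gene expression
    "GO:0006468": {"GO:0016310", BP_ROOT},  # protein phosphorylation -> phosphorylation
}
annotations = {
    "g1": {"GO:0006412"},
    "g2": {"GO:0006468"},
    "g3": {BP_ROOT},
}
slim_terms = {"GO:0010467"}  # gene expression only

buckets = classify(["g1", "g2", "g3", "g4"], annotations, ancestors, slim_terms)
```

Keeping the buckets separate matters because each one calls for a different action: fixing identifiers, curating genes, or revising the slim.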
Coverage 3: Minimise “unslimmed”
These 365 identifiers were not annotated in the slim, but they had non-root annotations that were not in the slim
These 734 identifiers had no non-root annotations
Total 1099 un-slimmed
Pombe using AGR slim
This number should be small in a slim with good coverage
Pombe using PomBase slim
It is difficult to define a slim that covers all annotated gene products without including terms with:
■ very small numbers of annotations,
■ or high-level or biologically uninformative terms
Relevance 1: A balance between coverage and content
PomBase AGR slim summary in matrix http://amigo.geneontology.org/matrix (terms are on both axes, totals on the diagonal)
Some terms are not biologically informative for a generic slim because they can apply to *any* biological process. This is indicated by intersections with every process.
Matrix labels: Information content low; Non-specific; Exact subset; OK
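The matrix view can be approximated with a few lines of set arithmetic. This is a sketch that assumes a hypothetical `term_to_genes` mapping from slim term to the set of gene products annotated to it; it does not call the AmiGO matrix tool, and the gene IDs are invented.

```python
def overlap_matrix(term_to_genes):
    """Pairwise intersection counts; the diagonal holds each term's total."""
    terms = sorted(term_to_genes)
    return {
        (a, b): len(term_to_genes[a] & term_to_genes[b])
        for a in terms
        for b in terms
    }

def intersects_everything(term_to_genes):
    """Terms that share at least one gene with every other term."""
    terms = list(term_to_genes)
    return {
        a for a in terms
        if all(term_to_genes[a] & term_to_genes[b] for b in terms if b != a)
    }

# Hypothetical toy data.
term_to_genes = {
    "regulation of biological process": {"g1", "g2", "g3", "g4"},
    "transmembrane transport": {"g1", "g5"},
    "cytoskeleton organization": {"g2", "g6"},
    "ribosome biogenesis": {"g3", "g7"},
}

matrix = overlap_matrix(term_to_genes)
non_specific = intersects_everything(term_to_genes)
```

A term flagged by `intersects_everything` is a candidate for the "non-specific" label, though the decision to drop it remains a biological judgement.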
Relevance 1: Avoid going “too high”
Broad groupings are good for ribbon diagrams (display) but not good for summarizing biology.
“Response to stimulus” is not very informative about biology (but covers >8000 (33%) mouse gene products).
Regulation of biological process: 50%
Mouse AGR slim in matrix http://amigo.geneontology.org/matrix (terms are on both axes, totals on the diagonal)
Relevance 2: Minimising overlaps (redundancy)
Mouse using AGR slim and GO term slim mapper
Your input list contains 22928 genes.
These 2037 identifiers were found to be unannotated
These 420 identifiers were not annotated in the slim, but they had non-root annotations that were not in the slim
These 4037 identifiers had no non-root annotations
Deleting “cell differentiation” loses 0 (it is a descendant of development).
Deleting “cell proliferation” loses only 5; most are covered by development and cell cycle.
Deleting “regulation of biological process” loses only 51, even though over half (11658) of the proteins are annotated to it.
These 476 identifiers were not annotated in the slim, but they had non-root annotations that were not in the slim
Because many gene products are annotated to multiple terms, it is not possible to create a slim with no overlaps.
If removing a term doesn’t change the number slimmed, it might not be useful for a slim.
Complete subset terms should be avoided.
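The effect of deleting a term can be estimated before touching the slim. The sketch below assumes a hypothetical `term_to_genes` mapping with invented gene IDs; it counts the gene products that only the removed term covers and lists complete-subset pairs. The mouse figures above are consistent with this reading: 420 + 0 + 5 + 51 = 476.

```python
def covered(term_to_genes, terms):
    """Union of genes covered by the given slim terms."""
    result = set()
    for term in terms:
        result |= term_to_genes[term]
    return result

def loss_if_removed(term_to_genes, term):
    """Number of genes that would become unslimmed if `term` were dropped."""
    others = [t for t in term_to_genes if t != term]
    return len(term_to_genes[term] - covered(term_to_genes, others))

def complete_subsets(term_to_genes):
    """Pairs (a, b) where every gene under a is also under b."""
    return {
        (a, b)
        for a in term_to_genes
        for b in term_to_genes
        if a != b and term_to_genes[a] <= term_to_genes[b]
    }

# Hypothetical toy data.
term_to_genes = {
    "development": {"g1", "g2", "g3"},
    "cell differentiation": {"g1", "g2"},
    "cell cycle": {"g3", "g4", "g6"},
    "cell proliferation": {"g3", "g4", "g5"},
}

loss_diff = loss_if_removed(term_to_genes, "cell differentiation")
loss_prolif = loss_if_removed(term_to_genes, "cell proliferation")
subsets = complete_subsets(term_to_genes)
```

Losses are computed one term at a time against the full slim, so removing several terms together can lose more than the sum of the individual figures suggests.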
Relevance 3: Lumping vs. splitting with common parent
Very little intersection between/within modules
Largely unconnected but have a common parent in the GO:
● nuclear and mitochondrial gene expression
● transmembrane, vesicle-mediated and nucleocytoplasmic transport
Relevance 3: Value of splitting, example with real data
Gene expression
From Hayles et al A genome-wide resource of cell cycle and cell shape genes of fission yeast.
Relevance 3: Lumping vs. splitting with common parent
Current PomBase slim in matrix: overlaps low, information content high
Zero overlaps between vesicle-mediated transport, nucleocytoplasmic transport and transmembrane transport; they are not biologically connected.
Relevance 4: Avoid single step processes
● GO:0016570 histone modification
● GO:0006468 protein phosphorylation
● GO:0006470 protein dephosphorylation
● GO:0043543 protein acylation
● GO:0016310 phosphorylation
● GO:0016311 dephosphorylation
● GO:0055114 oxidation-reduction process
● GO:0006464 cellular protein modification process
● GO:0043086 negative regulation of catalytic activity
All are examples of molecular function grouping terms in the BP ontology.
They are not informative about physiological role, only biochemical role.
For this reason “protein metabolism”, the ancestor of the protein modifications, should also be avoided in the generic slim.
Proposed Iterative procedure
Evaluate individual species coverage of existing generic slim (BP)
What is missing? Add terms to cover
Evaluate species coverage
Which terms could be removed without affecting coverage? Remove
Test (evaluate species coverage changes)
What is missing? Add terms to cover
Evaluate species coverage
Which terms should be split to improve biological relevance? Split
Check coverage was not affected (or recommend improved annotation specificity)
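The add and remove steps of the procedure can be sketched as a greedy pass. This is only an illustration under assumptions: `refine_slim`, its arguments and the toy data are hypothetical, the greedy choice is one of several reasonable strategies, and the "split" step is left out because it depends on biological judgement, not counts.

```python
def refine_slim(slim, candidates, term_to_genes, annotated_genes):
    """One add-then-remove pass over a slim (greedy, illustrative only)."""
    slim = set(slim)

    def covered(terms):
        genes = set()
        for term in terms:
            genes |= term_to_genes[term]
        return genes & annotated_genes

    # Add: pick the candidate covering the most missing genes, repeat.
    while True:
        missing = annotated_genes - covered(slim)
        pool = sorted(candidates - slim)
        if not missing or not pool:
            break
        best = max(pool, key=lambda t: len(term_to_genes[t] & missing))
        if not term_to_genes[best] & missing:
            break  # no remaining candidate helps
        slim.add(best)

    # Remove: drop terms whose removal leaves coverage unchanged.
    for term in sorted(slim):
        if covered(slim - {term}) == covered(slim):
            slim.discard(term)
    return slim

# Hypothetical toy data.
term_to_genes = {
    "development": {"g1", "g2"},
    "cell differentiation": {"g1"},
    "ribosome biogenesis": {"g3"},
    "detoxification": {"g4"},
}
refined = refine_slim(
    slim={"development", "cell differentiation"},
    candidates={"ribosome biogenesis", "detoxification"},
    term_to_genes=term_to_genes,
    annotated_genes={"g1", "g2", "g3", "g4"},
)
```

Running the pass once per species and comparing the results is one way to see which additions and removals hold up across organisms.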
Spares
Possible changes to evaluate
Remove
● cell proliferation
● cell differentiation
● cellular component organization and biogenesis
● RNA processing (see Gene expression)
● regulation of biological process
Add
● cytoskeleton organization
● chromatin organization
● ribosome biogenesis (>1000 annot)
● tRNA metabolic process (1157 annot)
● gene expression (includes translation)
Not covered currently
● detoxification
● amino acid metabolic process, vitamin metabolic process, cofactor metabolic process (or small molecule?)
