Tyler functional annotation thurs 1120
  1. Functional Annotation and the Gene Ontology. Brett Tyler, Virginia Bioinformatics Institute
  2. What is Annotation?
     • Comments, notes, explanations, or other types of external remarks that can be attached to a document……
     • For genomics, functional annotation means attaching biological information to sequences
  3. Functional Annotation (workflow diagram): Structural Annotation; Searches: Nucleotide/Protein Databases, Domain/Motifs; Assignments: EC Number, Metabolic Pathways, GO; Automated; Manual curation
  4. Functional Annotation (workflow diagram): Structural Annotation; Searches: Nucleotide/Protein Databases, Domain/Motifs; Assignments: EC Number, Metabolic Pathways, GO; Automated; Manual curation
  5. Automated Searches
     • Search programs can be downloaded and run locally on a Unix system
     • Graphical user interfaces are available but normally accept only a limited number of sequences
  6. Homology or similarity based searches
     • Local pairwise alignment tools look for any regions of similarity within the proteins that score well.
       – BLAST: fast
     • Global pairwise alignment tools take two sequences and attempt to find an alignment of the two over their full lengths.
       – Needleman-Wunsch: finds the best out of all possible alignments
     • Multiple alignment tools try to align 3 or more proteins so that the maximal number of amino acids from each protein are matched in the alignment; this may or may not include the full length of some or all of the proteins.
       – ClustalW
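The difference between global and local pairwise alignment can be illustrated with a minimal sketch. It computes scores only (no traceback), and the scoring values (match +1, mismatch -1, gap -1) are illustrative assumptions, not values from the slides. BLAST itself uses a faster heuristic; the exact Smith-Waterman recurrence is swapped in here as the local counterpart of Needleman-Wunsch.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch score: both sequences aligned over their full lengths."""
    # Row 0: aligning an empty prefix of `a` against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-1):
    """Smith-Waterman score: best-scoring pair of substrings of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at 0 lets an alignment restart anywhere.
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best
```

With these assumed scores, `local_score("XXXXGATTACAYYYY", "GATTACA")` is 7 because the shared region is found wherever it lies, while `global_score` on the same pair is -1 because the eight unmatched flanking residues must be paid for as gaps.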
  7. BLAST Programs
     • blastn: search a nucleotide database using a nucleotide query
     • blastp: search a protein database using a protein query
     • blastx: search a protein database using a translated nucleotide query
     • tblastn: search a translated nucleotide database using a protein query
     • tblastx: search a translated nucleotide database using a translated nucleotide query
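The choice among the five programs follows directly from the query type and the database type. The helper below is a hypothetical illustration of that lookup (the function name and its arguments are not part of BLAST); it only returns the program name and does not run a search.

```python
def choose_blast_program(query, database, translated=False):
    """Return the BLAST program name for a query/database type pair.

    `query` and `database` are "nucleotide" or "protein". `translated` only
    matters when both are nucleotide: it selects tblastx over blastn.
    """
    if query == "protein" and database == "protein":
        return "blastp"
    if query == "nucleotide" and database == "protein":
        return "blastx"   # query is translated before comparison
    if query == "protein" and database == "nucleotide":
        return "tblastn"  # database is translated before comparison
    if query == "nucleotide" and database == "nucleotide":
        return "tblastx" if translated else "blastn"
    raise ValueError(f"unknown sequence types: {query!r}, {database!r}")
```

For example, a predicted protein searched against unannotated genomic contigs calls for `choose_blast_program("protein", "nucleotide")`, which returns `"tblastn"`.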
  8. Example of BLAST output
     • Top row is the search protein (query) and the bottom row is the match protein (subject)
     • Middle row is the consensus
     • + indicates similar amino acids
     • Numbers indicate amino acid position in the sequence
  9. Functional Annotation (workflow diagram): Structural Annotation; Searches: Nucleotide/Protein Databases, Domain/Motifs; Assignments: EC Number, Metabolic Pathways, GO; Automated; Manual curation
  10. Domain Search: Hidden Markov Models
     • Statistical models of the primary structure consensus of a sequence family
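The idea of a statistical model of a family's consensus can be sketched with a much simpler stand-in: a position-specific frequency profile built from a gap-free alignment. This is not a full profile HMM (it has no insert or delete states), and the uniform background, the pseudocount, and the toy family below are assumptions made for illustration only.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumes only the 20 standard residues


def build_profile(alignment, pseudocount=1.0):
    """Per-column residue probabilities from equal-length, gap-free sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        profile.append(
            {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return profile


def log_odds_score(profile, sequence):
    """Sum of log(profile probability / uniform background) over all columns."""
    if len(sequence) != len(profile):
        raise ValueError("sequence length must match the profile length")
    background = 1.0 / len(AMINO_ACIDS)
    return sum(math.log(col[aa] / background) for col, aa in zip(profile, sequence))


# Toy family (hypothetical sequences): members score above unrelated sequence.
family = ["ACDE", "ACDF", "ACDE"]
profile = build_profile(family)
member_score = log_odds_score(profile, "ACDE")
unrelated_score = log_odds_score(profile, "WWWW")
```

A sequence that resembles the family gets a positive log-odds score and an unrelated one gets a negative score, which is the same accept/reject logic a profile HMM search applies with a richer model.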
  11. Pfam (http://pfam.sanger.ac.uk/)
     • Large collection of protein families represented by multiple sequence alignments and HMMs
     • Analyze protein sequences for Pfam matches
     • Look at multiple alignments of members of the gene family
  12. InterPro (http://www.ebi.ac.uk/interpro/)
     • Database of protein families, domains and sites in which features identified in known proteins can be applied to new protein sequences
     • Collects protein families from other databases such as Pfam, UniProtKB and TIGRFAMs
     • Sequence search is done with InterProScan
       – Downloadable (runs faster on own server, large sets)
       – GUI (limited number of sequences)
  13. Subcellular localization
     • SignalP: predicts the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms
     • TMHMM: predicts transmembrane helices
     • TargetP: predicts subcellular location based on chloroplast transit peptide and mitochondrial targeting sequence
  14. SignalP Search: http://www.cbs.dtu.dk/services/SignalP/
  15. Sample SignalP Output: CRN2… confirmed with proteomics
  16. Sample SignalP Output: CRN2… confirmed with proteomics
  17. Search EC numbers: http://ca.expasy.org/enzyme/
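An EC number is a four-level code (class, subclass, sub-subclass, serial number), and partial assignments use `-` for unresolved levels. The parser below is a minimal sketch with a hypothetical function name; it covers the common `EC a.b.c.d` form only and rejects anything else.

```python
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",
}


def parse_ec(ec):
    """Split an EC number into its four levels; '-' becomes None."""
    text = ec.strip()
    if text.upper().startswith("EC"):
        text = text[2:].lstrip(": ")
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four levels in {ec!r}")
    levels = []
    for part in parts:
        if part == "-":
            levels.append(None)       # level not yet assigned
        elif part.isdigit():
            levels.append(int(part))
        else:
            raise ValueError(f"unexpected level {part!r} in {ec!r}")
    if levels[0] not in EC_CLASSES:
        raise ValueError(f"unknown top-level class in {ec!r}")
    return {"levels": tuple(levels), "class_name": EC_CLASSES[levels[0]]}
```

Keeping partial numbers such as `2.7.-.-` as valid input reflects the conservative approach to annotation: a protein can be assigned to a class and subclass without claiming a specific reaction.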
  18. Functional Annotation (workflow diagram): Structural Annotation; Searches: Nucleotide/Protein Databases, Domain/Motifs; Assignments: EC Number, Metabolic Pathways, GO; Automated; Manual curation
  19. Metabolic Pathways
     • Help improve annotation by showing missing genes in essential pathways
     • Useful for comparative genomics
     • KEGG: http://www.genome.jp/kegg/pathway.html
     • Reactome: http://www.reactome.org
     • MetaCyc: http://www.metacyc.org
     • Add lots of others
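Finding missing genes in a pathway amounts to a set difference between the enzymes a pathway requires and the enzymes already annotated in the genome. The sketch below assumes pathways are given as sets of EC numbers; the function name, the pathway subset, and the annotated set are hypothetical toy data, not taken from KEGG, Reactome, or MetaCyc.

```python
def pathway_gaps(pathways, annotated_ec_numbers):
    """Map each pathway name to the sorted EC numbers it needs but the genome lacks."""
    annotated = set(annotated_ec_numbers)
    report = {}
    for name, required in pathways.items():
        missing = sorted(set(required) - annotated)
        if missing:
            report[name] = missing
    return report


# Toy input: four glycolytic enzymes and a partially annotated genome.
pathways = {
    "glycolysis (subset)": {"2.7.1.1", "5.3.1.9", "2.7.1.11", "2.7.1.40"},
}
annotated = {"2.7.1.1", "2.7.1.40", "3.4.21.4"}
gaps = pathway_gaps(pathways, annotated)
```

A reported gap is a prompt for a targeted search (for example, a tblastn search with a known enzyme sequence), since the gene may be present but missed or mis-annotated rather than truly absent.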
  20. KEGG: Kyoto Encyclopedia of Genes and Genomes (http://www.genome.jp/kegg/pathway.html)
  21. Functional Annotation (workflow diagram): Structural Annotation; Searches: Nucleotide/Protein Databases, Domain/Motifs; Assignments: EC Number, Metabolic Pathways, GO; Automated; Manual curation
  22. Some initial PAMGO Biological Process Terms
     • Included in initial 35 terms added Jan 2005
     • First set of terms
     • These processes are general to all associations
  23. GO terms (figure labels: bacterium, oomycete)
     • GO:0052048 interaction with host via secreted substance
     • GO:0052044 induction by symbiont of host programmed cell death
  24. GO terms (figure labels: bacterium, oomycete)
     • GO:0052048 interaction with host via secreted substance
     • GO:0052044 induction by symbiont of host programmed cell death
     • GO:0052048 interaction with host via secreted substance
     • GO:0052044 induction by symbiont of host programmed cell death
     • GO:0009405 pathogenesis
  25. Functional Annotation (workflow diagram): Structural Annotation; Searches: Nucleotide/Protein Databases, Domain/Motifs; Assignments: EC Number, Metabolic Pathways, GO; Automated; Manual curation
  26. Why manual annotation?
     • Combine all search information and evidence
     • Manually look through all information
     • Add experimental data from literature when available
     • Approach conservatively
     • Drawback: time-consuming and more expensive