Tyler functional annotation thurs 1120

Published in: Education, Technology

  1. Functional Annotation and the Gene Ontology. Brett Tyler, Virginia Bioinformatics Institute
  2. What is Annotation?
     • Comments, notes, explanations, or other types of external remarks that can be attached to a document…
     • For genomics, functional annotation means attaching biological information to sequences
  3. Functional Annotation (roadmap diagram): Structural Annotation; Searches; Nucleotide/Protein Databases; Domain/Motifs; Assignments; EC Number; Metabolic Pathways; GO; Automated; Manual curation
  4. Functional Annotation (roadmap diagram): Structural Annotation; Searches; Nucleotide/Protein Databases; Domain/Motifs; Assignments; EC Number; Metabolic Pathways; GO; Automated; Manual curation
  5. Automated Searches
     • Search programs can be downloaded and run locally on a Unix system
     • Graphical user interfaces are available but normally accept only a limited number of sequences
  6. Homology or similarity based searches
     • Local pairwise alignment tools look for any regions of similarity within the proteins that score well.
       – BLAST: fast
     • Global pairwise alignment tools take two sequences and attempt to find an alignment of the two over their full lengths.
       – Needleman-Wunsch: finds the best out of all possible alignments
     • Multiple alignment tools try to align 3 or more proteins so that the maximal number of amino acids from each protein are matched in the alignment; this may or may not include the full length of some or all of the proteins.
       – ClustalW
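To make the global alignment idea concrete, here is a minimal sketch of Needleman-Wunsch dynamic programming. The scoring values (match +1, mismatch -1, linear gap -1) and the function name are assumptions chosen for illustration; real protein alignments normally use a substitution matrix and affine gap penalties, which the slides do not specify.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b; return (score, aligned_a, aligned_b).

    Scoring parameters are illustrative assumptions, not values from the slides.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for pairing a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # pair the two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "ACT"))
```

Because every cell of the table is filled, the result is the best score over all possible full-length alignments, which is also why this approach is slower than a heuristic local search such as BLAST.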
  7. BLAST Programs
     • blastn: search a nucleotide database using a nucleotide query
     • blastp: search a protein database using a protein query
     • blastx: search a protein database using a translated nucleotide query
     • tblastn: search a translated nucleotide database using a protein query
     • tblastx: search a translated nucleotide database using a translated nucleotide query
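The five program descriptions amount to a lookup from (query type, database type) to a program name. The sketch below encodes that table; the helper name `choose_blast_program` and the type labels are hypothetical, introduced only for illustration.

```python
# (query type, database type) -> BLAST program, following the slide's list.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("translated nucleotide", "protein"): "blastx",
    ("protein", "translated nucleotide"): "tblastn",
    ("translated nucleotide", "translated nucleotide"): "tblastx",
}


def choose_blast_program(query_type, database_type):
    """Return the BLAST program for a query/database pairing.

    Hypothetical helper: raises ValueError for pairings the slide does not list.
    """
    try:
        return BLAST_PROGRAMS[(query_type, database_type)]
    except KeyError:
        raise ValueError(
            f"no BLAST program listed for {query_type!r} query "
            f"against {database_type!r} database"
        ) from None


if __name__ == "__main__":
    # A protein query against a nucleotide database searched in translation.
    print(choose_blast_program("protein", "translated nucleotide"))
```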
  8. Example of BLAST output
     • Top row is the search protein (query) and the bottom row is the match protein (subject)
     • Middle row is the consensus; + indicates similar amino acids
     • Numbers indicate amino acid position in the sequence
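As a rough illustration of how the middle row can be read, the sketch below counts identities and similar positions from a consensus line. It assumes the usual pairwise text layout, where a letter in the middle row marks an identical residue, `+` marks a similar one, and a space marks neither. The function name and the three strings are made up for the example and are not taken from the slide.

```python
def summarize_midline(midline):
    """Count identities and positives in a BLAST-style consensus (middle) row.

    Assumes letters mark identical residues and '+' marks similar residues.
    Positives include identities, as in BLAST's own summary line.
    """
    identities = sum(1 for ch in midline if ch.isalpha())
    similar = midline.count("+")
    return {
        "length": len(midline),
        "identities": identities,
        "positives": identities + similar,
    }


if __name__ == "__main__":
    # Toy alignment, invented for illustration only.
    query = "MKTAYIAK"
    midline = "MK+AY AK"
    subject = "MKSAYGAK"
    print(query, midline, subject, sep="\n")
    print(summarize_midline(midline))
```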
  9. Functional Annotation (roadmap diagram): Structural Annotation; Searches; Nucleotide/Protein Databases; Domain/Motifs; Assignments; EC Number; Metabolic Pathways; GO; Automated; Manual curation
  10. Domain Search: Hidden Markov Models
      • Statistical models of the primary structure consensus of a sequence family
  11. Pfam: http://pfam.sanger.ac.uk/
      • Large collection of protein families represented by multiple sequence alignments and HMMs
      • Analyze protein sequences for Pfam matches
      • Look at multiple alignments of members of the gene family
  12. InterPro: http://www.ebi.ac.uk/interpro/
      • Database of protein families, domains and sites in which features identified in known proteins can be applied to new protein sequences
      • Collects protein families from other databases such as Pfam, UniProtKB and TIGRFAMs
      • Sequence search is done with InterProScan
        – Downloadable (runs faster on your own server; suited to large sets)
        – GUI (limited number of sequences)
  13. Subcellular localization
      • SignalP: predicts the presence and location of signal peptides and their cleavage sites in an organism's proteins
      • TMHMM: predicts transmembrane helices
      • TargetP: predicts subcellular location based on chloroplast transit peptide and mitochondrial targeting sequence
  14. SignalP Search: http://www.cbs.dtu.dk/services/SignalP/
  15. Sample SignalP Output: CRN2… confirmed with proteomics
  16. Sample SignalP Output: CRN2… confirmed with proteomics
  17. Search EC numbers: http://ca.expasy.org/enzyme/
  18. Functional Annotation (roadmap diagram): Structural Annotation; Searches; Nucleotide/Protein Databases; Domain/Motifs; Assignments; EC Number; Metabolic Pathways; GO; Automated; Manual curation
  19. Metabolic Pathways
      • Help improve annotation by showing missing genes in essential pathways
      • Useful for comparative genomics
      • KEGG: http://www.genome.jp/kegg/pathway.html
      • Reactome: http://www.reactome.org
      • MetaCyc: http://www.metacyc.org
      • Add lots of others
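One way to picture "showing missing genes in essential pathways" is a set difference between the enzyme activities a pathway requires and those already assigned to genes. The sketch below assumes annotations are available as a mapping from gene ID to EC numbers. The gene IDs, the function name, and the four-step pathway fragment are hypothetical toy inputs, not data from KEGG or from the slides.

```python
def missing_pathway_steps(pathway_ecs, gene_annotations):
    """Return the EC numbers a pathway requires that no annotated gene covers.

    pathway_ecs: iterable of EC number strings required by the pathway.
    gene_annotations: dict mapping gene ID -> iterable of EC number strings.
    """
    covered = set()
    for ec_numbers in gene_annotations.values():
        covered.update(ec_numbers)
    return sorted(set(pathway_ecs) - covered)


if __name__ == "__main__":
    # Toy pathway fragment (first steps of glycolysis), for illustration only.
    pathway = ["2.7.1.1", "5.3.1.9", "2.7.1.11", "4.1.2.13"]
    # Hypothetical gene IDs and EC assignments.
    annotations = {
        "gene_001": ["2.7.1.1"],
        "gene_002": ["5.3.1.9"],
        "gene_003": ["4.1.2.13"],
    }
    # Steps with no annotated gene are candidates for a second look.
    print(missing_pathway_steps(pathway, annotations))
```

A gap reported this way is only a prompt for re-examination: the gene may be missing from the assembly, mis-annotated, or genuinely absent in the organism.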
  20. KEGG: Kyoto Encyclopedia of Genes and Genomes. http://www.genome.jp/kegg/pathway.html
  21. Functional Annotation (roadmap diagram): Structural Annotation; Searches; Nucleotide/Protein Databases; Domain/Motifs; Assignments; EC Number; Metabolic Pathways; GO; Automated; Manual curation
  22. Some initial PAMGO Biological Process Terms
      • Included in initial 35 terms added Jan 2005
      • First set of terms
      • These processes are general to all associations
  23. GO:0052048 interaction with host via secreted substance; GO:0052044 induction by symbiont of host programmed cell death (diagram labels: bacterium, oomycete)
  24. GO:0052048 interaction with host via secreted substance; GO:0052044 induction by symbiont of host programmed cell death (diagram labels: bacterium, oomycete; both terms appear twice in the diagram); GO:0009405 pathogenesis
  25. Functional Annotation (roadmap diagram): Structural Annotation; Searches; Nucleotide/Protein Databases; Domain/Motifs; Assignments; EC Number; Metabolic Pathways; GO; Automated; Manual curation
  26. Why manual annotation?
      • Combine all search information and evidence
      • Manually look through all information
      • Add experimental data from literature when available
      • Approach conservatively
      • Setback: time-consuming and more expensive
