
Chibucos annot go_final


Gene Ontology

Published in: Education, Technology

  1. 1. Marcus C. Chibucos, Ph.D.<br />Ontology<br />Evidence<br />Annotation<br />Arabidopsis thaliana ATPase<br />HMA4 zinc binding domain<br />GO:0006829 : zinc ion transport (BP)<br />GO:0005886 : plasma membrane (CC)<br />GO:0005515 : protein binding (MF)<br />Gene Annotation And Ontology<br />
  2. 2. Outline of this talk<br />2<br /><ul><li>Background: the language of biology
  3. 3. Gene Ontology: overview, terms & structure
  4. 4. Annotating with GO and Evidence
  5. 5. Using annotation to facilitate your research</li></li></ul><li>About screenshots in this talk<br />3<br />AmiGO web-based ontology browser<br /><br />OBO-Edit stand-alone editor<br /><br />
  6. 6. What is annotation? Who is involved?<br />Term confusion (what’s in a name?)<br />Scale: the sea of data<br />Controlled vocabularies & ontologies<br />The Gene Ontology Consortium<br />Background: the language of biology<br />4<br />
  7. 7. Annotation<br />5<br />annotate – to make or furnish critical or explanatory notes or comment.<br /> (Merriam-Webster dictionary)<br />genome annotation – the process of taking the raw DNA sequence produced by the genome-sequencing projects and adding the layers of analysis and interpretation necessary to extract its biological significance and place it into the context of our understanding of biological processes. <br /> (Lincoln Stein, PMID 11433356)<br />Gene Ontology annotation – the process of assigning GO terms to gene products… according to two general principles: first, annotations should be attributed to a source; second, each annotation should indicate the evidence on which it is based. <br /> (<br />
  8. 8. Diverse parties involved<br />6<br />End-users, including various researchers<br />Small-scale laboratory projects<br />Whole genome sequencing projects<br />Annotators<br />From reading papers to computational analysis <br />Ontology developers<br />Create terms that reflect scientific knowledge<br />Make interoperable ontologies, database links<br />Developers of tools & resources<br />Standards for storing & sharing data<br />Web interfaces for data analysis & sharing<br />Many areas of expertise<br />Laboratory sciences – biology, chemistry, medicine, and many other disciplines<br />Computational science – bioinformatics, genomics, statistics<br />Software development & web design<br />Philosophy – ontology & logic<br />
  9. 9. Term confusion: synonyms<br />7<br />Do biologists use precise & consistent language?<br />Mutually understood concepts – DNA, RNA, or protein<br />Synonym (one thing known by more than one name) – translation and protein synthesis<br />Enzyme Commission reactions<br />Standardized id, official name & alternative names<br /><br />
  10. 10. Term confusion: homonyms<br />8<br />Homonyms common in biology – different things known by the same name<br />Sporulation<br />Vascular (plant vasculature, i.e. xylem & phloem, or vascular smooth muscle, i.e. blood vessels?)<br />Endospore formation Bacillus anthracis<br />“Sporulation”<br />Reproductive sporulation<br />Asci & ascospores, Morchella elata (morel)<br /><br />©L Stauffer 2003 (accessed 17-Sep-09)<br /><br />©PG Warner 2008 (accessed 17-Sep-09)<br />
  11. 11. Term confusion: homonyms and biological complexity <br />9<br />AmiGO query “vascular” → 51 terms<br />In biology, many related phenomena are described with similar terminology<br />
  12. 12. The problem of scale<br />10<br /><ul><li>Small data sets, small experiments & isolated scientific communities?
  13. 13. Enormous data sets
  14. 14. Microarray experiments
  15. 15. Whole genome sequencing projects
  16. 16. Comparative genomics of multiple diverse taxa
  17. 17. Computers don’t understand nuance
  18. 18. Millions of proteins to annotate
  19. 19. How to effectively search?
  20. 20. How to draw meaningful comparisons?</li></ul><br />(accessed 17-Sep-09)<br />
  21. 21. The Gene Ontology (GO)<br />11<br />Way to address the problems of synonyms, homonyms, biological complexity, increasing glut of data<br />GO provides a common biological language for protein functional annotation<br /> <br />
  22. 22. Controlled vocabulary (CV)<br />12<br />An official list of precisely defined terms that can be used to classify information and facilitate its retrieval<br />Think of flat list like a thesaurus or catalog <br />Benefits of CVs<br />Allow standardized descriptions of things<br />Remedy synonym & homonym issues<br />Can be cross-referenced externally<br />Facilitate electronic searching<br />A CV can be “…used to index and retrieve a body of literature in a bibliographic, factual, or other database. An example is the MeSH controlled vocabulary used in MEDLINE and other MEDLARS databases of the NLM.”<br /><br />
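A minimal sketch of how a controlled vocabulary can remedy synonym and homonym issues during searching. The identifiers (`CV:0001` and so on), the dictionary layout, and the `lookup` helper are hypothetical and only illustrate the idea; the labels come from the synonym and homonym slides earlier in the talk.

```python
# Hypothetical mini-vocabulary: identifiers and layout are made up for illustration.
VOCABULARY = {
    "CV:0001": {"name": "translation", "synonyms": {"protein synthesis"}},
    "CV:0002": {"name": "endospore formation", "synonyms": {"sporulation"}},
    "CV:0003": {"name": "reproductive sporulation", "synonyms": {"sporulation"}},
}


def lookup(label):
    """Return the IDs of every term whose official name or synonym matches."""
    wanted = label.strip().lower()
    return sorted(
        term_id
        for term_id, term in VOCABULARY.items()
        if wanted == term["name"] or wanted in term["synonyms"]
    )


synonym_hit = lookup("Protein synthesis")   # resolves to the single official term
homonym_hits = lookup("sporulation")        # two distinct terms share this label
```

A synonym resolves to one identifier, while a homonym returns several identifiers, which makes the ambiguity visible instead of silently mixing two concepts.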
  23. 23. Ontology is a type of CV with defined relationships<br />13<br />Ontology – formalizes knowledge of a subject with precise textual definitions<br />Networked terms where child more specific (“granular”) than parent<br />Less specific<br />GO terms describe biological attributes of gene products…<br />More granular<br />
  24. 24. How GO works<br />14<br />GO Consortium develops & maintains:<br />Ontologies and cross-links between ontologies and different resources<br />Tools to develop and use the ontologies<br />SourceForge tracker for development<br />People studying organisms at databases annotate gene products with GO terms<br />Groups share files of annotation data about their respective organisms<br />Because a common language was used to describe gene products and this information was shared amongst databases…<br />We can search uniformly across databases<br />Do comparative genomics of diverse taxa<br />
  25. 25. GO on<br />15<br />
  26. 26. The Gene Ontology Consortium<br />16<br />Collaboration began 1998 among model organism databases mouse (MGI), fruit fly (FlyBase) and baker’s yeast (SGD)<br />Michael Ashburner of FlyBase contributed the base vocabulary<br />Today > 20 members & associates<br />First publication 2000 (PMID 10802651)<br />Today, PubMed query “gene ontology” yields 3,347 papers (27-Jun-2011)<br />Organisms represented by GO annotations from every kingdom of life<br />Many groups use GO in many different ways for their research<br />Among eight OBO-Foundry ontologies<br />ZFIN<br />Reactome<br />IGS<br />
  27. 27. OBO Foundry<br />17<br />Collaboration among developers of science-based ontologies<br />Establish principles for ontology development<br />Goal of creating a suite of orthogonal interoperable reference ontologies in the biomedical domain.<br />many others…<br />
  28. 28. What the GO is not<br />GO comprises three ontologies<br />Anatomy & storage of GO terms<br />Ontology structure<br />Detail of a term in AmiGO<br />True path rule<br />Gene Ontology: overview, terms & structure<br />18<br />
  29. 29. Caveats – what GO is not<br />19<br />Not gene naming system or gene catalog<br />GO describes attributes of biological objects – “oxidoreductase activity” not “cytochrome c”<br />The three ontologies have limitations<br />No sequence attributes or structural features<br />No characteristics unique to mutants or disease<br />No environment, evolution or expression<br />No anatomy features above cellular component<br />Not dictated standard or federated solution<br />Databases share annotations as they see fit<br />Curators evaluate differently<br />GO is evolving as our knowledge evolves<br />New terms added on daily basis<br />Incorrect/poorly defined terms made obsolete<br />Secondary ids – terms with same meaning merged<br />
  30. 30. GO comprises three ontologies<br />20<br />Cellular component ontology (CC) <br />“cytoplasm”<br />Molecular function ontology (MF)<br />“protein binding”<br />“peptidase activity”<br />“cysteine-type endopeptidase activity”<br />Biological process ontology (BP)<br />“proteolysis”<br />“apoptosis”<br />Terms describe attributes of gene products (GPs)<br />Any protein or RNA encoded by a gene<br />Species-independent context, e.g. “ribosome”<br />Could describe GPs found in limited taxa, e.g. “photosynthesis” or “lactation”<br />One GP can be associated with ≥ 1 CC, BP, MF<br />Example: Caspase-6 from Bos taurus<br />
  31. 31. Cellular component ontology<br />21<br />Describes location at level of subcellular structure & macromolecular complex<br />GP subcomponent of or located in particular cellular component, with some exceptions:<br />No individual proteins or nucleic acids<br />No multicellular anatomical terms<br />For annotation purposes, a GP can be associated with or located in ≥ one cellular component<br /><ul><li>Multi-subunit enzyme or protein complex
  32. 32. ribosome
  33. 33. proteasome
  34. 34. ubiquitin ligase complex
  35. 35. Anatomical structure
  36. 36. rough endoplasmic reticulum
  37. 37. nucleus
  38. 38. nuclear inner membrane</li></li></ul><li>Molecular function ontology<br />22<br />Describe gene product activity at molecular level<br />Describes attributes of entities<br />Adenylate cyclase (E.C.<br />Catalyzes a specific reaction:<br />ATP = 3',5'-cyclic AMP + diphosphate<br />Described by the Gene Ontology term:<br />“adenylate cyclase activity” (GO:0004016)<br /><br />[accessed 4-Feb-2010]<br /><ul><li>Usually single GP, sometimes a complex
  39. 39. “ferritin receptor activity”
  40. 40. Definition: “combining with ferritin, an iron-storing protein complex, to initiate a change in cell activity”
  41. 41. Broad functions
  42. 42. “catalytic activity”
  43. 43. “transporter activity”
  44. 44. “binding”
  45. 45. Specific functions
  46. 46. “adenylate cyclase activity”
  47. 47. “protein-DNA complex transmembrane transporter activity”
  48. 48. “Fc-gamma receptor I complex binding”</li></li></ul><li>Biological process ontology<br />23<br />Describes recognized series of events or molecular functions with a defined beginning and end<br />“GO does not try to represent the dynamics or dependencies that would be required to fully describe a pathway” (from GO documentation)<br />Mutant phenotypes often reflect disruptions in BP<br /><ul><li>Specific process
  49. 49. “pyrimidine metabolism”
  50. 50. “α-glucosidase transport”</li></ul>General considerations<br />The Cell Cycle<br />The Development Node<br />Multi-Organism Process<br />Metabolism<br />Regulation<br />Detection of and Response to Stimuli<br />Sensory Perception<br />Signaling Pathways<br />Transport and Localization<br />Transporter activity (molecular function)<br />Other Misc. Standard Defs<br /><ul><li>Broad process
  51. 51. “cellular physiological process”
  52. 52. “signal transduction”</li></ul><br />
  53. 53. Anatomy of a GO term<br />24<br />Term name<br />goid (unique numerical identifier)<br />Synonyms (broad or narrow) for searching, alternative names, misspellings… <br />Precise textual definition with reference stating source<br />GO slim<br />Ontology placement<br />
  54. 54. Storage and cross referencing of GO terms<br />25<br /><ul><li>Storage in flat file (text)
  55. 55. Database cross reference for mappings to GO
  56. 56. GO term identical to object in other database</li></li></ul><li>Ontology structure: parent-child relationship<br />26<br />Parent term (broader)<br />Child term (specialized)<br />hexose metabolism<br />monosaccharide biosynthesis<br />hexose biosynthesis<br /><ul><li>Up in the tree is more general; down in the tree is more specific:
  57. 57. Annotation of genes
  58. 58. Start with terms denoting broad functional categories
  59. 59. Use more specific term as knowledge warrants</li></li></ul><li>Ontology structure: terms arranged in DAGs<br />27<br />GO terms structured as hierarchical-like directed acyclic graphs (DAGs)<br />Tree-like, but each term can have more than one parent (pseudo-hierarchy)<br />Each term may have one or more child terms (“siblings” share same parent)<br />parents<br />child term<br />parent<br />child terms<br />“siblings”<br />
  60. 60. GO has three term relationships<br />28<br />is_a - child is instance of parent (“A is_a B”)<br />Class-subclass relationship<br />part_of - child part of parent (“C part_of D”)<br />When C present, part of D; but C not always present<br />Nucleus always part_of cell; not all cells have nuclei<br />regulates<br />Child term regulates parent term<br />(Zoomed in view of biological process ontology depicted here.)<br />
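The DAG structure and typed relationships can be sketched as a list of (child, relationship, parent) edges. This is an illustration only: the edge list is a small hand-built fragment using term names from the slides, the link from "monosaccharide biosynthesis" to "monosaccharide metabolism" is an assumption added to close the graph, and `ancestors` is a hypothetical helper, not part of any GO tool.

```python
# Illustrative fragment of a DAG; each edge is (child, relationship, parent).
EDGES = [
    ("hexose biosynthesis", "is_a", "hexose metabolism"),
    ("hexose biosynthesis", "is_a", "monosaccharide biosynthesis"),
    ("hexose metabolism", "is_a", "monosaccharide metabolism"),
    ("monosaccharide biosynthesis", "is_a", "monosaccharide metabolism"),  # assumed
    ("nucleus", "part_of", "cell"),
]


def ancestors(term, edges=EDGES):
    """Collect every term reachable by following parent links upward."""
    parents = {}
    for child, relationship, parent in edges:
        parents.setdefault(child, []).append((relationship, parent))
    seen, stack = set(), [term]
    while stack:
        for _relationship, parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


hexose_ancestors = ancestors("hexose biosynthesis")
```

Because "hexose biosynthesis" has two parents, the walk reaches "monosaccharide metabolism" along two different paths but records it only once.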
  61. 61. AmiGO for viewing terms<br />29<br />Open source HTML-based application developed by the GO Consortium<br />Interface for browsing, querying and visualizing OBO data<br />Users can search GO terms or annotations<br />Available via website or download for local install<br /><br />Example query with<br />keyword “hemolysis” or goid GO:0019836<br />GO:0019836<br />
  62. 62. AmiGO search results<br />30<br />Click<br />
  63. 63. Term information in AmiGO<br />31<br />Webpage continues…<br />
  64. 64. AmiGO view continued<br />32<br />Several informative views<br />Click<br />Number of gene products in GO annotation collection annotated to that term or one of its child terms<br />Relationship between term and its parent<br />Our term is much further down…<br />
  65. 65. Graph view<br />33<br /><ul><li>Alternative view of network of terms</li></li></ul><li>A term with two parents<br />34<br />amine group<br />carboxylic acid group<br />generic amino acid<br /><ul><li>Name: amino acid transmembrane transporter activity
  66. 66. ID number: GO:0015171
  67. 67. Definition: Catalysis of the transfer of amino acids from one side of a membrane to the other. Amino acids are organic molecules that contain an amino group and a carboxyl group. [source: GOC:ai, GOC:mtg_transport, ISBN:0815340729]
  68. 68. parent term: amine transmembrane transporter activity (GO:0005275)
  69. 69. relationship to parent: “is_a”
  70. 70. parent term: carboxylic acid transmembrane transporter activity (GO:0046943)
  71. 71. relationship to parent: “is_a”</li></li></ul><li>Multiple paths to root: graphical view in OBO-Edit<br />35<br />
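The amino acid transporter term above could be stored in a text flat file roughly as follows. This is a sketch assuming the OBO "tag: value" stanza layout; the stanza is abbreviated, the `namespace` line is an assumption, and `parse_stanza` is a hypothetical helper rather than a function from OBO-Edit or AmiGO.

```python
from collections import defaultdict

# Abbreviated stanza built from the term details on the slide.
STANZA = """\
[Term]
id: GO:0015171
name: amino acid transmembrane transporter activity
namespace: molecular_function
is_a: GO:0005275 ! amine transmembrane transporter activity
is_a: GO:0046943 ! carboxylic acid transmembrane transporter activity
"""


def parse_stanza(text):
    """Collect 'tag: value' pairs from one stanza; a tag may repeat."""
    fields = defaultdict(list)
    for line in text.splitlines():
        if not line or line.startswith("["):
            continue
        tag, _, value = line.partition(": ")
        # Drop the trailing '! human-readable name' comment, if present.
        fields[tag].append(value.split(" ! ")[0].strip())
    return dict(fields)


term = parse_stanza(STANZA)
parent_ids = term["is_a"]
```

The repeated `is_a` tag is what gives the term its two parents.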
  72. 72. “True path rule”<br />36<br />The pathway from a term all the way up to its top-level parent(s) must always be true for any gene product that could be annotated to that term (“if true for the child, then true for the parent”)<br />Incorrect for Bacteria<br />cell<br /> organelle<br />mitochondrion<br /> proton-transporting ATP synthase complex<br />Correct for Bacteria (and Eukaryotes)<br />cell<br /> intracellular<br /> proton-transporting ATP synthase complex<br /> plasma membrane proton-transporting ATP synthase complex<br /> mitochondrial proton-transporting ATP synthase complex<br /> membrane<br /> plasma membrane<br /> plasma membrane proton-transporting ATP synthase complex<br /> organelle<br /> mitochondrion<br /> mitochondrial inner membrane<br /> mitochondrial proton-transporting ATP synthase complex<br />(Abbreviated versions of the actual trees)<br />
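A rough sketch of the true path rule as a check. The parent map is one possible reading of the "correct" tree on the slide (the indentation was lost in this transcript, so the exact nesting is an assumption), and `ABSENT_IN_BACTERIA` is a hypothetical set chosen for illustration.

```python
# One reading of the abbreviated "correct" tree; nesting is partly assumed.
PARENTS = {
    "plasma membrane proton-transporting ATP synthase complex": [
        "proton-transporting ATP synthase complex",
        "plasma membrane",
    ],
    "mitochondrial proton-transporting ATP synthase complex": [
        "proton-transporting ATP synthase complex",
        "mitochondrial inner membrane",
    ],
    "proton-transporting ATP synthase complex": ["intracellular"],
    "intracellular": ["cell"],
    "plasma membrane": ["membrane"],
    "membrane": ["cell"],
    "mitochondrial inner membrane": ["mitochondrion"],
    "mitochondrion": ["organelle"],
    "organelle": ["cell"],
}

# Hypothetical list of components a bacterial gene product cannot be in.
ABSENT_IN_BACTERIA = {"mitochondrion", "mitochondrial inner membrane"}


def implied_terms(term):
    """Return the term plus every ancestor reachable through any parent."""
    found, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(PARENTS.get(current, []))
    return found


def true_path_ok(term, absent=ABSENT_IN_BACTERIA):
    """An annotation is acceptable only if no implied ancestor is impossible."""
    return not (implied_terms(term) & absent)
```

Under these assumptions, `true_path_ok("plasma membrane proton-transporting ATP synthase complex")` holds for a bacterial protein, while the mitochondrial complex fails because it implies "mitochondrion".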
  73. 73. What is GO annotation?<br />Literature curation at model organism databases<br />The annotation file<br />Evidence – critical for annotation<br />Sequence similarity-based annotation<br />Annotation specificity<br />Annotating with GO and Evidence<br />37<br />
  74. 74. GO annotation overview<br />38<br />Associating a GO term with a gene product<br />Goal is to select GO terms from all three ontologies to represent what, where, and how<br />Linking a GO term to a gene product asserts that it has that attribute<br />For example, 6-phosphofructokinase<br />Molecular function<br />GO:0003872 6-phosphofructokinase activity<br />Biological process<br />GO:0006096 glycolysis<br />Cellular component<br />GO:0005737 cytoplasm<br />Annotation, whether based on literature or computational methods, always involves:<br />Learning something about a gene product<br />Selecting an appropriate GO term<br />Providing an appropriate evidence code<br />Citing a [preferably open access] reference<br />Entering information into GO annotation file<br />
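The pieces of an annotation listed above can be sketched as a small record type. The field names, the `missing_aspects` helper, the evidence codes, and the placeholder reference `PMID:0000000` are all hypothetical; only the GO IDs and term names come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    gene_product: str
    go_id: str
    term_name: str
    aspect: str      # "F" (function), "P" (process) or "C" (component)
    evidence: str    # evidence code; values below are placeholders
    reference: str   # placeholder citation, not a real paper


def missing_aspects(annotations):
    """Report which of the three ontologies have no annotation yet."""
    return {"F", "P", "C"} - {a.aspect for a in annotations}


pfk = [
    Annotation("6-phosphofructokinase", "GO:0003872",
               "6-phosphofructokinase activity", "F", "IDA", "PMID:0000000"),
    Annotation("6-phosphofructokinase", "GO:0006096",
               "glycolysis", "P", "IDA", "PMID:0000000"),
    Annotation("6-phosphofructokinase", "GO:0005737",
               "cytoplasm", "C", "IC", "PMID:0000000"),
]
```

With all three records present, `missing_aspects(pfk)` is empty; dropping the last record reports that the cellular component is still unannotated.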
  75. 75. Chaperone DnaK, one protein/multiple annotations<br />39<br />Molecular function<br />ATP binding (GO:0005524)<br />ATPase activity (GO:0016887)<br />unfolded protein binding (GO:0051082)<br />misfolded protein binding (GO:0051787)<br />denatured protein binding (GO:0031249)<br />Biological process<br />protein folding (GO:0006457)<br />protein refolding (GO:0042026)<br />protein stabilization (GO:0050821)<br />response to stress (GO:0006950)<br />Cellular component<br />cytoplasm (GO:0005737)<br />
  76. 76. Literature curation performed at model organism databases<br />40<br />From the abstract:<br />
  77. 77. Results section indicates a “direct assay” annotation<br />41<br />They document the findings of a direct assay performed on purified protein:<br />They further document the methods used, and evaluate the findings in the Discussion section…<br />
  78. 78. Query AmiGO with “DNA ligase” & “DNA ligation”<br />42<br />All “ligation” in biological process ontology<br />
  79. 79. Resulting annotations<br />43<br />Name: DNA ligase (stated in paper)<br />Gene symbol: ligA (stated in paper)<br />EC: (queried enzyme for “DNA ligase”)<br />
  80. 80. Gene annotation file captures annotations<br />44<br />Evidence<br />
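A minimal sketch of reading one line of a gene annotation file, assuming the tab-separated 17-column GAF 2.x layout. The column keys are informal names chosen here, and the example record (database, identifier, symbol, reference, taxon, date) is made up for illustration; only the GO ID comes from an earlier slide.

```python
# Informal names for the 17 tab-separated columns (GAF 2.x layout assumed).
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence_code", "with_from", "aspect",
    "db_object_name", "db_object_synonym", "db_object_type",
    "taxon", "date", "assigned_by", "annotation_extension",
    "gene_product_form_id",
]


def parse_gaf_line(line):
    """Turn one annotation line into a dict; header lines start with '!'."""
    if line.startswith("!"):
        return None
    values = line.rstrip("\n").split("\t")
    values += [""] * (len(GAF_COLUMNS) - len(values))  # pad short lines
    return dict(zip(GAF_COLUMNS, values))


# Entirely hypothetical record, apart from the GO ID.
example_fields = [
    "ExampleDB", "EX0001", "pfkA", "", "GO:0003872",
    "PMID:0000000", "IDA", "", "F",
    "6-phosphofructokinase", "", "protein",
    "taxon:0000", "20110627", "ExampleDB", "", "",
]
record = parse_gaf_line("\t".join(example_fields))
```

The evidence code and the reference sit next to the GO ID in every row, which is what keeps each annotation traceable.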
  81. 81. Evidence<br />45<br />Essential to base annotation on evidence<br />Conclusions more robust and traceable<br />With evidence, a GO annotation is standard operating procedure (SOP)-independent<br />Many types of evidence exist<br />For example, experiment described in literature<br />What method (e.g. direct assay, mutant phenotype, et cetera) was used?<br />Did author cite references?<br />Did author provide details of analyses?<br />Perhaps you used a sequence-based method<br />What were the methods of manual curation?<br />Give accession numbers of similar sequences<br />Provide any references describing methods<br />Controlled vocabularies help here, too!<br />
  82. 82. GO standard references<br />46<br />GO_REF:0000011 <br />A Hidden Markov Model (HMM) is a statistical representation of patterns found in a data set. When using HMMs with proteins, the HMM is a statistical model of the patterns of the amino acids found in a multiple alignment of a set of proteins called the "seed". Seed proteins are chosen based on sequence similarity to each other. Seed members can be chosen with different levels of relationship to each other. They can be members of a superfamily (ex. ABC transporter, ATP-binding proteins), they can all share the same exact specific function (ex. biotin synthase) or they could share another type of relationship of intermediate specificity (ex. subfamily, domain). New proteins can be scored against the model generated from the seed according to how closely the patterns of amino acids in the new proteins match those in the seed. There are two scores assigned to the HMM which allow annotators to judge how well any new protein scores to the model. Proteins scoring above the "trusted cutoff" score can be assumed to be part of the group defined by the seed. Proteins scoring below the "noise cutoff" score can be assumed to NOT be a part of the group. Proteins scoring between the trusted and noise cutoffs may be part of the group but may not. One of the important features of HMMs is that they are built from a multiple alignment of protein sequences, not a pairwise alignment. 
This is significant, since shared similarity between many proteins is much more likely to indicate shared functional relationship than sequence similarity between just two proteins. The usefulness of an HMM is directly related to the amount of care that is taken in choosing the seed members, building a good multiple alignment of the seed members, assessing the level of specificity of the model, and choosing the cutoff scores correctly. In order to properly assess what functional relevance an above-trusted scoring HMM match has to a query, one must carefully determine what the functional scope of the HMM is. If the HMM models proteins that all share the same function then it is likely possible to assign a specific function to high-scoring match proteins based on the HMM. If the HMM models proteins that have a wide variety of functions, then it will not be possible to assign a specific function to the query based on the HMM match, however, depending on the nature of the HMM in question, it may be possible to assign a more general (family or subfamily level) function. In order to determine the functional scope of an HMM, one must carefully read the documentation associated with the HMM. The annotator must also consider whether the function attributed to the proteins in the HMM makes sense for the query based on what is known about the organism in which the query protein resides and in light of any other information that might be available about the query protein. After carefully considering all of these issues the annotator makes an annotation.<br />
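The trusted and noise cutoffs described in GO_REF:0000011 can be sketched as a three-way decision. The function name, the return labels, and the handling of scores that fall exactly on a cutoff are assumptions made here; the reference text only describes scores above, below, and between the cutoffs.

```python
def classify_hmm_hit(score, trusted_cutoff, noise_cutoff):
    """Classify a protein's HMM score relative to the model's two cutoffs."""
    if noise_cutoff > trusted_cutoff:
        raise ValueError("noise cutoff should not exceed trusted cutoff")
    if score >= trusted_cutoff:
        return "member"        # can be assumed to belong to the seed's group
    if score < noise_cutoff:
        return "not a member"  # can be assumed NOT to belong
    return "uncertain"         # between cutoffs: needs other evidence
```

A score in the "uncertain" band is where the manual checks described in the reference text (functional scope of the model, organism context) matter most.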
  83. 83. GO evidence<br />47<br />EXP - inferred from experiment<br />IDA - inferred from direct assay<br />IEP inferred from expression pattern<br />IGI - inferred from genetic interaction<br />IPI - inferred from physical interaction<br />IMP - inferred from mutant phenotype<br />ISS - inferred from sequence or structural similarity<br />ISA - inferred from sequence alignment<br />ISO - inferred from sequence orthology<br />ISM - inferred from sequence model<br />IGC - inferred from genomic context<br />ND - no biological data available<br />IC - inferred by curator<br />TAS - traceable author statement<br />NAS - non-traceable author statement<br />IEA - inferred from electronic annotation<br />GO codes are a subset of yet another ontology!<br />
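The evidence codes above can be held in a lookup table and grouped for filtering. The code meanings are taken from the slide; the five category names and the grouping are an assumption based on how evidence codes are commonly organized, and `category_of` is a hypothetical helper.

```python
EVIDENCE_CODES = {
    "EXP": "inferred from experiment",
    "IDA": "inferred from direct assay",
    "IEP": "inferred from expression pattern",
    "IGI": "inferred from genetic interaction",
    "IPI": "inferred from physical interaction",
    "IMP": "inferred from mutant phenotype",
    "ISS": "inferred from sequence or structural similarity",
    "ISA": "inferred from sequence alignment",
    "ISO": "inferred from sequence orthology",
    "ISM": "inferred from sequence model",
    "IGC": "inferred from genomic context",
    "ND": "no biological data available",
    "IC": "inferred by curator",
    "TAS": "traceable author statement",
    "NAS": "non-traceable author statement",
    "IEA": "inferred from electronic annotation",
}

# Assumed grouping, for illustration only.
CATEGORIES = {
    "experimental": {"EXP", "IDA", "IEP", "IGI", "IPI", "IMP"},
    "computational": {"ISS", "ISA", "ISO", "ISM", "IGC"},
    "author statement": {"TAS", "NAS"},
    "curator": {"IC", "ND"},
    "electronic": {"IEA"},
}


def category_of(code):
    """Return the (assumed) category for an evidence code."""
    for name, codes in CATEGORIES.items():
        if code in codes:
            return name
    raise KeyError(code)
```

A user who wants only annotations reviewed by a person could, for example, drop every record whose category is "electronic".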
  84. 84. Types of sequence similarity-based annotations<br />48<br />Find similarity between gene product & one that is experimentally characterized<br />BLAST-type alignments<br />Shared synteny to establish orthology of genomic regions between species<br />Find similarity between gene product and defined protein family<br />HMMs (Pfam, TIGRFAMS)<br />Prosite<br />InterPro<br />Find motifs in gene product with prediction tools<br />TMHMM <br />SignalP<br />Much (most?) of the information you find is based on transitive annotation, and much of it has never been looked at by a human being!<br />
  85. 85. Evaluation of sequence similarity-based information<br />49<br />Visually inspect alignments & criteria<br />Length & identity<br />Conservation of catalytic sites<br />Check HMM scores with respect to cutoff<br />Look at available metabolic analysis<br />Pathways, complexes?<br />Information from neighboring genes<br />Gene in an operon (common in prokaryotes) can supplement weak similarity evidence<br />Sequence characteristics<br />Transmembrane regions?<br />Signal peptide?<br />Known motifs that give a clue to function?<br />Paralogous family member<br />
  86. 86. An example: HI0678, a protein from H. influenzae…<br />...high quality alignment to experimentally characterized triosephosphate isomerase from Vibrio marinus<br />50<br />
  87. 87. Information from Swiss-Prot database on experimentally characterized match protein<br />further down the page<br />51<br />
  88. 88. High quality…..<br />…. full-length match, high percent identity (67.8%), conserved active and binding sites (boxed in red).<br />52<br />
  89. 89. Resulting annotations<br />53<br />name: triosephosphate isomerase<br />gene symbol: tpiA<br />EC:<br />(This, and the following annotations, came from the match protein.)<br />
  90. 90. KEGG pathway for glycolysis core<br />54<br />
  91. 91. KEGG pathway for glycolysis core<br />55<br />
  92. 92. Resulting annotations<br />56<br />name: triosephosphate isomerase<br />gene symbol: tpiA<br />EC:<br />
  93. 93. And another annotation<br />57<br />The biologist knows that glycolysis takes place in the cytoplasm in bacteria, and so infers a cytoplasmic location for that protein (“inferred by curator” evidence code). <br />
  94. 94. Annotation specificity should reflect knowledge<br />58<br />GO trees <br />(very abbreviated)<br />Function<br />catalytic activity<br />kinase activity<br /> carbohydrate kinase activity<br />ribokinase activity<br />glucokinase activity<br />fructokinase activity<br />Process<br /> metabolism<br /> carbohydrate metabolism<br /> monosaccharide metabolism<br />hexose metabolism<br />glucose metabolism<br /> fructose metabolism<br /> pentose metabolism<br /> ribose metabolism<br />Available evidence for three genes<br />#1<br />-good match to an HMM for “kinase”<br />#2<br />-good match to an HMM for “kinase”<br />-a high-quality BER match to an experimentally characterized “glucokinase” AND a “fructokinase”<br />#3<br />-good match to an HMM specific for “ribokinase”<br />-a high-quality BER match to an experimentally characterized ribokinase<br />
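One plausible way to express "annotation specificity should reflect knowledge" for the function tree above: keep only the most specific terms the evidence supports, then annotate to the deepest term they all share. The parent map follows the abbreviated tree on the slide; the helper names and the rule itself are assumptions for illustration, not a documented curation procedure.

```python
# Single-parent map taken from the abbreviated function tree.
PARENT = {
    "kinase activity": "catalytic activity",
    "carbohydrate kinase activity": "kinase activity",
    "ribokinase activity": "carbohydrate kinase activity",
    "glucokinase activity": "carbohydrate kinase activity",
    "fructokinase activity": "carbohydrate kinase activity",
}


def path_to_root(term):
    """List the term followed by each successive parent up to the root."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def choose_term(supported_terms):
    """Pick the deepest term consistent with every piece of evidence."""
    # Discard terms that are merely ancestors of a more specific hit.
    specific = [
        t for t in supported_terms
        if not any(t in path_to_root(other)[1:] for other in supported_terms)
    ]
    paths = [path_to_root(t) for t in specific]
    shared = set(paths[0]).intersection(*paths[1:])
    # The first shared term on any path is the deepest one.
    return next(t for t in paths[0] if t in shared)


gene1 = choose_term(["kinase activity"])
gene2 = choose_term(["kinase activity", "glucokinase activity", "fructokinase activity"])
gene3 = choose_term(["ribokinase activity", "ribokinase activity"])
```

Under this rule gene #2, which matches both a glucokinase and a fructokinase, is annotated no more specifically than "carbohydrate kinase activity".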
  95. 95. Using shared annotations<br />Search for GO terms at databases<br />Slims for broad classification<br />GO tools<br />Working with GO-limited data sets<br />Summary<br />Using annotation to facilitate your research<br />59<br />
  96. 96. Sharing annotations<br />60<br />Annotation file sent to GO, put in repository<br />All these data free to anyone<br />Hundreds of thousands of GP annotations<br />Annotation files all in same format<br />Facilitates easy use of data by everyone<br />Most of your favorite organism databases use these annotation files<br />
  97. 97. Searching for GO terms at EuPathDB<br />61<br />
  98. 98. 62<br />Ontology slim<br /><br />Slim is a distilled (reduced) ontology <br />Made by manually pruning low-level terms with an ontology editor<br />Selected high-level terms remain<br />Slims reduce ontology complexity<br />Reduce clutter & see general trends<br />Microarray experiments<br />Comparative whole genome analyses<br />Remove irrelevant terms<br />Looking at specific taxa, such as yeast or plant<br />GO offers a script to bin more granular annotations up to higher levels<br />
  99. 99. Comparing genomes with a GO slim<br />63<br /><ul><li>High-level biological process terms used to compare Plasmodium and Saccharomyces</li></ul>MJ Gardner, et al. (2002) Nature 419:498-511<br />
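Binning granular annotations up to slim terms can be sketched as a walk toward the root that stops at the first slim term. This is not the GO Consortium's script; the parent map reuses the abbreviated process tree from the annotation specificity slide, and the slim set and helper names are hypothetical.

```python
from collections import Counter

# Single-parent map from the abbreviated process tree.
PARENT = {
    "carbohydrate metabolism": "metabolism",
    "monosaccharide metabolism": "carbohydrate metabolism",
    "hexose metabolism": "monosaccharide metabolism",
    "glucose metabolism": "hexose metabolism",
    "fructose metabolism": "hexose metabolism",
    "pentose metabolism": "monosaccharide metabolism",
    "ribose metabolism": "pentose metabolism",
}

# Hypothetical slim: the high-level terms we chose to keep.
SLIM = {"hexose metabolism", "pentose metabolism"}


def to_slim(term, slim=SLIM):
    """Return the nearest slim term at or above `term`, or None."""
    while term is not None:
        if term in slim:
            return term
        term = PARENT.get(term)
    return None


def slim_counts(terms, slim=SLIM):
    """Count annotations per slim term, skipping terms above the slim."""
    return Counter(s for s in (to_slim(t, slim) for t in terms) if s is not None)


counts = slim_counts(
    ["glucose metabolism", "fructose metabolism", "ribose metabolism", "metabolism"]
)
```

The resulting counts per slim term are the kind of summary used to compare broad functional categories between genomes.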
  100. 100. GO slim: manual/orthology-based gene annotations<br />64<br />Nucleic Acids Res. 2010 January; 38(Database issue): D420–D427. <br />
  101. 101. GO<br />65<br />The real challenge is finding the right one for your needs<br />For example, statistical representation of GO terms:<br /><br />
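One common way to ask whether a GO term is statistically over-represented in a gene list is a hypergeometric test. The sketch below is a generic illustration and is not the method of any particular tool mentioned in this talk; the function and parameter names are chosen here.

```python
from math import comb


def hypergeom_pvalue(population, annotated, sample, hits):
    """P(X >= hits) when `sample` genes are drawn without replacement from
    `population` genes, of which `annotated` carry the GO term."""
    total = comb(population, sample)
    upper = min(annotated, sample)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Toy numbers: 10 genes, 5 annotated to the term, 5 selected, all 5 annotated.
p_value = hypergeom_pvalue(population=10, annotated=5, sample=5, hits=5)
```

In practice many terms are tested at once, so the resulting p-values need a multiple-testing correction before they are interpreted.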
  102. 102. GO & analysis of RNA-seq data<br />66<br />Young et al. Genome Biology 2010, 11:R14<br />We present GOseq, an application for performing Gene Ontology (GO) analysis on RNA-seq data. GO analysis is widely used to reduce complexity and highlight biological processes in genome-wide expression studies, but standard methods give biased results on RNA-seq data due to over-detection of differential expression for long and highly expressed transcripts. Application of GOseq to a prostate cancer data set shows that GOseq dramatically changes the results, highlighting categories more consistent with the known biology.<br />
  103. 103. When GO is limited<br />67<br />Food for thought: what happens when we have limited GO (or other) annotation data?<br />New and interesting genomes often see this problem<br />
  104. 104. Comparative analysis of orthologs in syntenic blocks<br />68<br />The more genomes we have at our disposal, the better<br />Structural rearrangements, absence of intron, gene duplication, intron structure, gene deletion/creation<br />Nucleic Acids Res. 2010 January; 38(Database issue): D420–D427. <br />
  105. 105. Summary GO analyses<br />69<br />GO remedies problems of synonyms & homonyms in biological nomenclature<br />Queries based on IDs linked to precise definitions, not less reliable text-matching<br />GO can help you to:<br />Find all genes that share a particular function regardless of sequence<br />Do comparisons across any species annotated with GO<br />Summarize major classes of genes in a newly sequenced genome<br />Characterize expressed genes in a study<br />Drive hypotheses to test in the laboratory<br />GO is not a panacea but it should be a valuable tool in your genomics toolbox<br />
  106. 106. The title slide revisited…<br />Ontology<br />Evidence<br />Annotation<br />Arabidopsis thaliana ATPase<br />HMA4 zinc binding domain<br />GO:0006829 : zinc ion transport (BP)<br />GO:0005886 : plasma membrane (CC)<br />GO:0005515 : protein binding (MF)<br />Thank you.<br />