Presentation from the Biocuration conference describing an extension to the GO annotation formalism that allows curators to capture more detailed biological context and specificity at the time of annotation. Features Portuguese Man-o-War assaults.
This document discusses various topics relating to protein structure and bioinformatics. It begins with an overview of protein structure and why understanding protein structure is important. It then discusses the different levels of protein structure from primary to quaternary structure. Methods for determining protein structure like X-ray crystallography and NMR are mentioned. Databases for storing protein structures like the Protein Data Bank are also summarized. The document touches on topics like protein folding, domains, membrane protein topology, and secondary structure prediction methods.
The document summarizes Monica Munoz-Torres' thought process and analysis of two protein families - PTHR16771 and PTHR23074 - in relation to the GO term "recombinational repair". For PTHR16771 (DSS1), she inspects literature linking it to homologous recombination and prunes three species from the family. For PTHR23074 (AAA ATPase), she analyzes FIGNL1 and its role in double-strand break repair, and decides to propagate the relevant GO term only to the MRCA of FIGNL1 homologs. She also notes software issues with PAINT.
Lauren H. Silagy has over 15 years of experience in furniture design, fabrication, and operations management. She received her BFA in Furniture Design from Rhode Island School of Design in 2005. Since 2006, she has served as Operations Manager for Kboom Culture LLC, where she oversees manufacturing, client relations, and installations. She also freelanced as an illustrator and writer for ANTENNA Magazine from 2008-2010. Her skills include design, woodworking, metalworking, and computer-aided design programs. She has exhibited work nationally and internationally since 2005.
Uberon is an integrative multi-species anatomy ontology that contains over 11,000 classes describing anatomical structures across multiple animal species, with a focus on chordates and mammals. It uses multiple relationship types like subclass, part-of, and develops-from to connect these classes in a structured ontology. Uberon aims to bridge between existing species-specific anatomy ontologies like the Mouse Anatomy ontology and the Foundational Model of Anatomy for human. It allows cross-referencing between these ontologies and helps integrate anatomical knowledge across models and humans.
Mapping Phenotype Ontologies for Obesity and Diabetes - Chris Mungall
This document discusses approaches to mapping phenotype ontologies across species and categories. It describes using OWL axioms to define phenotypes in a machine-interpretable way and create bridges between ontologies. This enables cross-ontology queries and integrated views of data. Challenges include modeling complex phenomena accurately in OWL and a lack of tools integrated into the ontology development process. The Monarch Initiative aims to address these issues by developing tools like TermGenie and providing integrated views of data from multiple ontologies.
Function and Phenotype Prediction through Data and Knowledge Fusion - Karin Verspoor
The biomedical literature captures the most current biomedical knowledge and is a tremendously rich resource for research. With over 24 million publications currently indexed in the US National Library of Medicine’s PubMed index, however, it is becoming increasingly challenging for biomedical researchers to keep up with this literature. Automated strategies for extracting information from it are required. Large-scale processing of the literature enables direct biomedical knowledge discovery. In this presentation, I will introduce the use of text mining techniques to support analysis of biological data sets, and will specifically discuss applications in protein function and phenotype prediction, exploring the integration of literature data with complementary structured resources.
The Gene Ontology & Gene Ontology Annotation resources - Melanie Courtot
The Gene Ontology (GO) provides structured controlled vocabularies for describing gene and gene product attributes across species. It includes three ontologies for molecular function, biological process, and cellular component. The GO is manually developed and electronically annotated to gene products to capture biological knowledge in a computable form. The GO Consortium aims to develop and maintain the GO through manual and computational methods, and to provide public GO annotation data and tools.
Cross Product Extensions to the Gene Ontology - Chris Mungall
The document discusses extending the Gene Ontology (GO) through assigning logical computable definitions to GO classes. This involves partitioning GO classes into "cross product" sets based on the ontologies used in the definitions. Over 13,000 GO classes now have provisional logical definitions assigned using this approach, covering molecular function, biological process, cellular component, and other ontologies. The logical definitions allow for nested descriptions and reasoning over GO classes. Anatomy classes are standardized in the Uberon cross-species anatomy ontology.
Workflows supporting drug discovery against malaria - Barry Hardy
The goal of Scientists Against Malaria (SAM) is the discovery of novel anti-malarial compounds. SAM supports virtual drug discovery organizational structures collaborating on target selection and modeling, protein expression and assay development, computational drug design, and screening. A combination of interoperable information systems, ontologies and web services were designed and deployed to manage the data, documents, computational and assay results, activity and toxicology predictions, as well as dashboards to track project progress and to support decision making. Workflows were developed for consensus virtual screening of candidate malarial kinase inhibitors including docking, pharmacophore-based screening and free energy-based molecular simulations. The models were applied to the discovery of active ligands against a novel target with previously unknown structure or ligands. The workflows were extended to include OpenTox model web services to prioritize drug candidates according to their predicted toxicities, supporting a weight of evidence categorization of candidate molecules according to their activity and toxicity profiles.
PomBase conventions for improving annotation depth, breadth, consistency and ... - Valerie Wood
PomBase uses a combination of annotation conventions and QC mechanisms. In addition to identifying annotation inconsistencies and errors, these combined methods improve information content, annotation coverage, depth or specificity and redundancy.
Avacta Life Sciences Affimers Presentation Global Protein Engineering Summit ... - AvactaLifeSciences
Avacta Life Sciences Exhibits Affimers at Global Protein Engineering Summit
Avacta Life Sciences exhibited recently at the Global Protein Engineering Summit ("PEGS") where it presented its Affimer technology.
You can read more about Affimer technology here http://www.avactalifesciences.com
PEGS is considered to be the essential protein engineering meeting where commercial and academic progress in protein engineering is showcased and this year it attracted over 1800 delegates from across the globe to Boston. Avacta Life Sciences presented its Affimer technology for the first time at a PEGS meeting with technical exhibits and a presentation by the CSO, Paul Ko Ferrigno, entitled "Biological Recognition: Beyond the Antibody."*
The exhibition booth was busy, with over 80 delegates talking to the Avacta Life Sciences management team over the four days of the summit. The feedback on the Affimer technology was very positive; in particular, delegates highlighted the short development times and excellent stability as key advantages of Affimers over antibodies. There was also strong interest in Affimers from the management of companies developing biological therapeutics, who were keen to learn more about the potential of Affimers as novel therapeutics.
In addition, several companies were interested in the use of Affimers as an alternative to antibodies in diagnostic devices, mainly because they could generate binders against new biomarkers much more quickly and evaluate them in higher numbers.
The benefits of Affimer microarrays for biomarker discovery also resonated with diagnostic developers who appreciated the advantage of being able to evaluate significantly larger numbers of potential biomarkers more cost and time effectively than by mass spectrometry. The potential of the arrays for multiplexed solutions for clinical diagnosis and monitoring during drug trials was also something that generated interest amongst those delegates.
Matt Johnson, Chief Technical Officer of Avacta Life Sciences commented: "It was great to experience face to face the level of interest in Affimers. The majority of people I spoke to were either having problems raising antibodies to their target of interest or just couldn't use antibodies because of the type of assays they wanted to perform. Many of the presentations focused around the use of antibody fragments for intra-cellular studies which is a rapidly growing area that holds great interest for drug and diagnostics developers. It is an area where there are clear advantages for Affimers over antibody fragments which don't behave well in the cytoplasm.
"The general enthusiasm around Affimers was very encouraging and the amount of interest generated by the potential of Affimers as therapeutics and by the Affimer arrays for biomarker discovery only reinforces my excitement around this new technology."
Translating research data into Gene Ontology annotations - Pascale Gaudet
This document discusses Gene Ontology (GO) annotations, which are statements linking genes to aspects of their functions as represented by GO terms. Annotations are generated through manual literature curation, manual sequence analysis, or algorithmic prediction. They are assigned evidence codes and references and represent the normal biological roles of gene products based on molecular function, biological process, and cellular component. Guidelines are provided for producing high-quality, literature-based annotations that accurately reflect experimental conclusions and biological context.
Modeling exposure events and adverse outcome pathways using ontologies - Chris Mungall
This document discusses using ontologies to model exposure events, adverse outcome pathways, and phenotypes in order to support predictive toxicology. It describes existing ontologies like the Environment, Conditions, and Treatments Ontology (ECTO) and Gene Ontology Causal Activity Models (GO-CAMs) that can be used to represent exposure mechanisms and adverse outcomes. The document also presents challenges for developing an open predictive toxicology framework that leverages ontologies and linked data to make toxicology data more findable, accessible, interoperable, and reusable.
This document discusses three possible strategies for identifying biological knowledge from scientific literature: 1) Allowing authors to validate biological entities during the writing process, 2) Performing discourse analysis to understand persuasive elements and relationships between ideas, and 3) Encouraging collaboration between authors and databases to identify hypotheses. It focuses on the challenges of current fact extraction techniques and the potential for modeling discourse and rhetorical moves to improve knowledge representation.
NCBI has developed a powerful suite of online biomedical and bioinformatics resources, including old friends like PubMed and OMIM and newer resources such as Genome. This collection of databases and tools is widely used by scientists and medical professionals across the world. With such a wealth of information, it is easy to get overwhelmed. Join us for an overview of NCBI resources for the information professional with an emphasis on biodata connectivity. No science degree required!
Background of the project and simple use cases of using the Open PHACTS API and KNIME to extract compound, target and indication entities from millions of patent documents and infer meaningful links among them. Open PHACTS Linked Data meeting in Vienna.
The document discusses analyzing Gene Ontology (GO) annotations using Zipf's law, which states that word frequencies follow a power law distribution. The author:
1) Analyzed GO annotations from several species and found they generally follow a power law, indicating GO acts like a language. Exponents were in the normal range for communication.
2) Biological process annotations had higher exponents, suggesting more precise encoding, while molecular function and cellular component annotations favored the speaker with lower exponents.
3) High confidence annotations fit power laws better than low confidence ones, indicating higher quality communication.
The analysis provides a way to rapidly assess the "language-like" qualities and potential quality of GO annotations through the power law
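As a rough illustration of the kind of rank-frequency analysis summarized above, the sketch below estimates a power-law exponent from a list of annotated term identifiers. It is not the author's method: it assumes a plain least-squares fit on the log-log rank-frequency curve, which is a simple stand-in (maximum-likelihood estimators are generally preferred for real power-law fitting). The function name `zipf_exponent` and the placeholder term labels are hypothetical.

```python
import math
from collections import Counter


def zipf_exponent(term_ids):
    """Estimate a Zipf-style exponent from annotation term usage.

    Counts how often each term is used, ranks terms by frequency, and
    fits a straight line to log(frequency) vs. log(rank) by ordinary
    least squares. Returns the negated slope, so a frequency ~ 1/rank
    distribution gives a value close to 1.
    """
    counts = sorted(Counter(term_ids).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct terms to fit a line")

    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -(sxy / sxx)


# Synthetic example (placeholder labels, not real GO identifiers):
# the term at rank r is used 60 // r times, i.e. frequency ~ 1/rank.
demo_terms = [f"term_{r}" for r in range(1, 7) for _ in range(60 // r)]
print(round(zipf_exponent(demo_terms), 3))
```

In practice the input would be the term column of an annotation file, split by ontology aspect or evidence type if the exponents are to be compared as in the summary above.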
The document discusses the functional interaction between mGlu1a and GABAB receptors. It first provides background on G protein-coupled receptors and their importance as drug targets. It then reviews evidence that mGlu1a and GABAB receptors physically interact in cortical neurons and Purkinje cells based on co-localization and co-immunoprecipitation studies. The study aims to further investigate the physical interaction between the receptors and the mechanism underlying their functional cross-talk using biophysical techniques like BRET, TR-FRET, and cell-surface co-immunoprecipitation. Preliminary data suggests mGlu1a and GABAB do not form complexes at the cell surface but may oligomerize intracellularly.
This document discusses using stochastic models to understand principles of gene regulation from regulatory DNA architecture. It summarizes that regulating promoters downstream of irreversible assembly steps reduces molecular noise compared to regulating initiation rates. Distributed binding sites across enhancers also helps reduce expression noise. The document proposes relating complex biochemical architectures of promoters and enhancers to their transcriptional properties using finite Markov chain approaches.
Paprica is a pipeline that predicts metabolic pathways and enzymes from 16S rRNA gene sequences. The document outlines a schedule for a short course on using paprica, including installing dependencies, running tutorials to analyze sample data, and building a custom paprica database. The tutorials will demonstrate the paprica workflow, analyzing samples and visualizing output files containing predicted pathways, enzymes, and confidence scores.
Inferring microbial gene function from evolution of synonymous codon usage bi... - Fran Supek
Introduction: Thousands of microbial genomes are available, yet even for the model organisms, a sizable portion of the genes have unknown function. Phyletic profiling is a technique that can predict their function by comparing the presence/absence profiles of their homologs across genomes. In addition, prokaryotic genomes contain an evolutionary signature of gene expression levels in the codon usage biases, where highly expressed genes prefer the codons better adapted to the cellular tRNA pools.
Objectives: We aimed to augment the existing phyletic profiling approaches by incorporating more detailed knowledge of gene evolutionary history, and to create a very large database of predicted gene functions directly usable by microbiologists.
Materials & methods: We used the OMA groups of orthologs and the paralogy relationships inferred through OMA's "witness of non-orthology" rule. Genes were assigned to Gene Ontology categories and the phyletic profiles compared using the CLUS classifier, which performs hierarchical multilabel classification using decision trees. We quantified significant codon biases using a Random Forest randomization test that compares against the composition of intergenic DNA. Codon biases in COG gene families were contrasted between microbes inhabiting different environments, while controlling for phylogenetic inertia.
Results: The genomic co-occurrence patterns of both the orthologs and the paralogs (the homologs separated by a speciation and by a duplication event, respectively) were informative and synergistic in a phylogenetic profiling setup, even though paralogy relationships are thought to conserve function less well. The resulting ~400,000 gene function predictions for 998 prokaryotes (at FDR<10%) ... method to systematically link codon adaptation within COG gene families to microbial phenotypes and environments (thus functionally characterizing the COGs) and experimentally validated the predictions for novel E. coli genes relevant for surviving oxidative, thermal or osmotic stress.
Conclusion: Our work towards enhancing phylogenetic profiling, as well as developing complementary genomic context approaches, will contribute to prioritizing experimental investigation of microbial gene function, cutting the time and cost needed for discovery.
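The basic idea of phyletic profiling mentioned in the introduction above, comparing presence/absence profiles of homologs across genomes, can be sketched in a few lines. This is only a toy nearest-profile lookup using Jaccard similarity; it is not the CLUS decision-tree classifier the abstract describes, and all gene and genome names here are hypothetical.

```python
def jaccard(profile_a, profile_b):
    """Jaccard similarity between two sets of genomes in which a gene has a homolog."""
    a, b = set(profile_a), set(profile_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def nearest_profile(query_profile, annotated_profiles):
    """Return (gene, similarity) for the annotated gene with the most similar profile.

    annotated_profiles maps a gene name to the set of genomes where it is present.
    """
    return max(
        ((gene, jaccard(query_profile, profile))
         for gene, profile in annotated_profiles.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical presence profiles: each set lists the genomes containing a homolog.
annotated = {
    "geneA": {"genome1", "genome2", "genome3"},
    "geneB": {"genome4", "genome5"},
}
unknown_gene = {"genome1", "genome2", "genome3", "genome4"}

best_gene, similarity = nearest_profile(unknown_gene, annotated)
print(best_gene, similarity)
```

A gene of unknown function whose profile closely tracks that of an annotated gene becomes a candidate for sharing its function; a real pipeline would also need to account for phylogenetic relatedness among the genomes, as the abstract notes.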
caron.ppt educate the patient on the usesomar97227
This document summarizes Marc Caron's research on the neuronal plasticity associated with drugs of abuse. Some key points:
- Caron used mouse models and genetic approaches to study addiction at the molecular level. He identified behaviors like sensitization and withdrawal that model aspects of human addiction.
- Microarray analysis of genetically sensitized mice found changes in genes like PSD-95, involved in synaptic plasticity. Knockout of PSD-95 enhanced responses to drugs, suggesting it plays a role in addiction.
- Further work showed that cocaine reduces PSD-95 levels in the striatum, enhancing long-term potentiation there. This may underlie the plasticity induced by drugs of abuse.
This document discusses experiments in semantic enrichment. It describes work done with FEBS Letters to semantically enrich their structured digital abstracts with entities and relationships from databases. It also discusses the OKKAM entity repository project which aimed to create an entity-centric knowledge architecture. Finally, it explores efforts to analyze the discourse structure of biology articles to identify hypotheses and evidence, and how hypotheses can erode into facts over time as more evidence is accumulated.
The document discusses sources affecting next-generation sequencing (NGS) quality and how to identify problematic NGS samples. It analyzes base sequencing quality, quality trimming, biases from base composition, potential contaminations, and gene content of two samples (A and B). Sample B showed poorer base quality, more unmapped reads, and evidence of Proteobacteria contamination compared to Sample A. Further quality control is recommended to identify issues before assembly.
Autophagy is recognized as the main tool to degrade damaged organelles and misfolded proteins.
Slideshow includes:
Autophagic Degradation
Beclin 1 Antibody
LC3 Antibody
ATG5 Antibody
Related Antibodies
This document summarizes research on molecular mechanisms behind lameness in meat chickens. The research found alterations to bone homeostasis and bacterial immune responses that contribute to lameness. Specifically, it was found that bacterial infection dysregulates genes involved in mitochondrial function, dynamics, and biogenesis in bone cells, leading to mitochondrial dysfunction, increased cell death, and disruption of cellular processes. Additionally, genes related to the autophagy pathway were downregulated in lame chickens, suggesting bacterial infection impairs autophagy in bone tissue. The research provides insights into how bacteria may cause lameness at the molecular level by interfering with mitochondrial health and autophagy in leg bones.
Demonstration of the applicability of the Linked Data Modeling Language and CHEMROF (https://chemkg.github.io/chemrof/) for semantic chemical sciences. Presented at MADICES 2022. https://github.com/MADICES/MADICES-2022
Scaling up semantics; lessons learned across the life sciences - Chris Mungall
Semantic modeling is key to understanding the biological processes underpinning the health of humans and the health of ecosystems on this planet. There are a number of different approaches to semantic modeling, varying from modeling of *things* in the form of knowledge graphs, to modeling of *data structures* in the form of semantic schemas, to modeling of *words* in the form of ultra-large language models. Taking the metaphor of modeling paradigms as planets in a semantic solar system, I will take us on a tour through the solar system, exploring the strengths of each approach, and looking through a historic lens at how we keep iterating over similar solutions with each rotation around the sun. As an alternative to the dichotomy of either resisting change or starting afresh, I urge an approach where we embrace change and adapt with each revolution. I will look specifically at how the OBO community has built powerful knowledge graphs of biological concepts, how the LinkML modeling language incorporates aspects of both frame languages and shape languages, and how language models can be integrated with semantic ontological approaches through the OntoGPT framework.
Similar to Increased Expressivity of Gene Ontology Annotations - Biocuration 2013
Cross Product Extensions to the Gene OntologyChris Mungall
The document discusses extending the Gene Ontology (GO) through assigning logical computable definitions to GO classes. This involves partitioning GO classes into "cross product" sets based on the ontologies used in the definitions. Over 13,000 GO classes now have provisional logical definitions assigned using this approach, covering molecular function, biological process, cellular component, and other ontologies. The logical definitions allow for nested descriptions and reasoning over GO classes. Anatomy classes are standardized in the Uberon cross-species anatomy ontology.
Workflows supporting drug discovery against malariaBarry Hardy
The goal of Scientists Against Malaria (SAM) is the discovery of novel anti-malarial compounds. SAM supports virtual drug discovery organizational structures collaborating on target selection and modeling, protein expression and assay development, computational drug design, and screening. A combination of interoperable information systems, ontologies and web services were designed and deployed to manage the data, documents, computational and assay results, activity and toxicology predictions, as well as dashboards to track project progress and to support decision making. Workflows were developed for consensus virtual screening of candidate malarial kinase inhibitors including docking, pharmacophore-based screening and free energy-based molecular simulations. The models were applied to the discovery of active ligands against a novel target with previously unknown structure or ligands. The workflows were extended to include OpenTox model web services to prioritize drug candidates according to their predicted toxicities, supporting a weight of evidence categorization of candidate molecules according to their activity and toxicity profiles.
PomBase conventions for improving annotation depth, breadth, consistency and ...Valerie Wood
PomBase uses a combination of annotation conventions and QC mechanisms. In addition to identifying annotation inconsistencies and errors, these combined methods improve information content, annotation coverage, depth or specificity and redundancy.
Avacta Life Sciences Affimers Presentation Global Protein Engineering Summit ...AvactaLifeSciences
Avacta Life Sciences Exhibits Affimers at Global Protein Engineering Summit
Avacta Life Sciences exhibited recently at the Global Protein Engineering Summit ("PEGS") where it presented its Affimer technology.
You can read more about Affimer technology here http://www.avactalifesciences.com
PEGS is considered to be the essential protein engineering meeting where commercial and academic progress in protein engineering is showcased and this year it attracted over 1800 delegates from across the globe to Boston. Avacta Life Sciences presented its Affimer technology for the first time at a PEGS meeting with technical exhibits and a presentation by the CSO, Paul Ko Ferrigno, entitled "Biological Recognition: Beyond the Antibody."*
The exhibition booth was busy with over 80 delegates talking to the Avacta Life Science management team over the four days of the summit. The feedback on the Affimer technology was very positive, in particular, the short development times and excellent stability were highlighted by delegates as key advantages of Affimers over antibodies. There was also a strong interest in Affimers from the management of companies developing biological therapeutics who were keen to learn more about the potential of Affimers as novel therapeutics.
In addition, several companies were interested in the use of Affimers as an alternative to antibodies in diagnostic devices, mainly because they could generate binders against new biomarkers much more quickly and evaluate them in higher numbers.
The benefits of Affimer microarrays for biomarker discovery also resonated with diagnostic developers who appreciated the advantage of being able to evaluate significantly larger numbers of potential biomarkers more cost and time effectively than by mass spectrometry. The potential of the arrays for multiplexed solutions for clinical diagnosis and monitoring during drug trials was also something that generated interest amongst those delegates.
Matt Johnson, Chief Technical Officer of Avacta Life Sciences commented: "It was great to experience face to face the level of interest in Affimers. The majority of people I spoke to were either having problems raising antibodies to their target of interest or just couldn't use antibodies because of the type of assays they wanted to perform. Many of the presentations focused around the use of antibody fragments for intra-cellular studies which is a rapidly growing area that holds great interest for drug and diagnostics developers. It is an area where there are clear advantages for Affimers over antibody fragments which don't behave well in the cytoplasm.
"The general enthusiasm around Affimers was very encouraging and the amount of interest generated by the potential of Affimers as therapeutics and by the Affimer arrays for biomarker discovery only reinforces my excitement around this new technology."
Increased Expressivity of Gene Ontology Annotations - Biocuration 2013
1. Increased Expressivity of Gene
Ontology Annotations
Huntley RP, Harris MA, Alam-Faruque Y, Carbon SJ,
Dietze H, Dimmer E, Foulger R, Hill DP, Khodiyar V,
Lock A, Lomax J, Lovering RC, Mungall CJ, Mutowo-
Muellenet P, Sawford T, Van Auken K, Wood V
2. The Gene Ontology
• A vocabulary of 37,500* distinct, connected
descriptions that can be applied to gene
products
gene 1
gene 2
• That’s a lot…
– How big is the space of possible descriptions?
*April 2013
3.
4. Current descriptions miss details
• Author:
– LMTK1 (Aatk) can negatively control axonal outgrowth in
cortical neurons by regulating Rab11A activity in a Cdk5-
dependent manner
– http://www.ncbi.nlm.nih.gov/pubmed/22573681
• GO:
– Aatk: GO:0030517 negative regulation of axon extension
• GO terms will always be a subset of the total set of possible
descriptions
– We shouldn’t attempt to make a term for everything
5. • T63 Toxic effect of contact with venomous
animals and plants
Term from ICD-10, a hierarchical medical billing code system
used to ‘annotate’ patient records
6. • T63 Toxic effect of contact with venomous
animals and plants
– T63.611 Toxic effect of contact with Portuguese
Man-o-war, accidental (unintentional)
7. • T63 Toxic effect of contact with venomous
animals and plants
– T63.611 Toxic effect of contact with Portugese
Man-o-war, accidental (unintentional)
– T63.612 Toxic effect of contact with Portugese
Man-o-war, intentional self-harm
8. • T63 Toxic effect of contact with venomous
animals and plants
– T63.611 Toxic effect of contact with Portugese
Man-o-war, accidental (unintentional)
– T63.612 Toxic effect of contact with Portugese
Man-o-war, intentional self-harm
– T63.613 Toxic effect of contact with Portugese
Man-o-war, assault
9. • T63 Toxic effect of contact with venomous
animals and plants
– T63.611 Toxic effect of contact with Portugese
Man-o-war, accidental (unintentional)
– T63.612 Toxic effect of contact with Portugese
Man-o-war, intentional self-harm
– T63.613 Toxic effect of contact with Portugese
Man-o-war, assault
• T63.613A Toxic effect of contact with Portugese Man-
o-war, assault, initial encounter
• T63.613D Toxic effect of contact with Portugese Man-
o-war, assault, subsequent encounter
• T63.613S Toxic effect of contact with Portugese Man-
o-war, assault, sequela
10. Post-composition
• Curators need to be able to compose their
complex descriptions from simpler
descriptions (terms) at the time of annotation
• GO annotation extensions
• Introduced with Gene Association Format (GAF) v2
– Also supported in GPAD
• Has underlying OWL description-logic model
http://www.geneontology.org/GO.format.gaf-2_0.shtml
11. “Classic” annotation model
• Gene Association Format (GAF) v1
– Simple pairwise model
– Each gene product is associated with an (ordered) set
of descriptions
• Where each description == a GO term
http://www.geneontology.org/GO.format.gaf-1_0.shtml
12. GO annotation extensions
• Gene Association Format (GAF) v1
– Simple pairwise model
– Each gene product is associated with an (ordered) set of
descriptions
• Where each description == a GO term
• Gene Association Format (GAF) v2 (and GPAD)
– Each gene product is (still) associated with an (ordered) set of
descriptions
– Each description is a GO term plus zero or more relationships
to other entities
• Entities from GO, other ontologies, databases
• Description is an OWL anonymous class expression (aka description)
http://www.geneontology.org/GO.format.gaf-2_0.shtml
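The difference between the two models can be sketched as a small data structure. The following is a minimal illustration in Python, not the GAF specification: the class and field names (`Annotation`, `Extension`, `is_classic`) are hypothetical, and the identifiers are borrowed from the sty1 example used in this talk.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Extension:
    """One relationship from a description to another entity."""
    relation: str   # e.g. "happens_during"
    target: str     # ID from GO, another ontology, or a database


@dataclass(frozen=True)
class Annotation:
    """A gene product paired with a description (hypothetical field names)."""
    gene_product: str
    go_term: str
    extensions: Tuple[Extension, ...] = ()  # zero or more relationships

    def is_classic(self) -> bool:
        # A GAF v1-style annotation is just a GO term with no extensions.
        return len(self.extensions) == 0


classic = Annotation("sty1", "GO:0034504")
extended = Annotation(
    "sty1",
    "GO:0034504",
    (Extension("happens_during", "GO:0034599"),),
)
```

In this sketch the "classic" model is simply the special case where the extension tuple is empty, which mirrors the "zero or more relationships" wording above.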
13. “Classic” GO annotations are unconnected
[Figure: sty1 and pap1 shown with separate, unlinked GO terms: protein localization to nucleus [GO:0034504]; cellular response to oxidative stress [GO:0034599]; positive regulation of transcription from pol II promoter in response to oxidative stress [GO:0036091]]
DB       Object                 Term        Ev   Ref           ..
PomBase  sty1 (SPAC24B11.06c)   GO:0034504  IMP  PMID:9585505  ..
PomBase  sty1 (SPAC24B11.06c)   GO:0034599  IMP  PMID:9585505  ..
PomBase  pap1 (SPAC1783.07c)    GO:0036091  IMP  PMID:9585505  ..
14. Now with annotation extensions
[Figure: sty1 and pap1 each link to an <anonymous description>. Node labels: protein localization to nucleus [GO:0034504]; cellular response to oxidative stress [GO:0034599]; positive regulation of transcription from pol II promoter in response to oxidative stress [GO:0036091]. Edge labels: happens during, has input, has regulation target]
DB       Object                 Term                                          Ev   Ref           Extension
PomBase  sty1 (SPAC24B11.06c)   GO:0034504 (protein localization to nucleus)  IMP  PMID:9585505  happens_during(GO:0034599), has_input(SPAC1783.07c)
PomBase  pap1 (SPAC1783.07c)    GO:0036091                                    IMP  PMID:9585505  has_regulation_target(…)
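The Extension column above can be read mechanically. The sketch below assumes the column is a comma-separated list of `relation(ID)` expressions, as in the table; the function name `parse_extensions` is hypothetical, and the GAF 2.0 format page remains the authority on the exact syntax.

```python
import re
from typing import List, Tuple

# One relation(ID) expression, e.g. happens_during(GO:0034599)
_EXPR = re.compile(r"^\s*([A-Za-z_]+)\(([^()]+)\)\s*$")


def parse_extensions(column: str) -> List[Tuple[str, str]]:
    """Split an extension column into (relation, target) pairs."""
    pairs = []
    for part in column.split(","):
        if not part.strip():
            continue  # tolerate an empty column or a trailing comma
        match = _EXPR.match(part)
        if match is None:
            raise ValueError(f"not a relation(ID) expression: {part!r}")
        pairs.append((match.group(1), match.group(2).strip()))
    return pairs


pairs = parse_extensions("happens_during(GO:0034599), has_input(SPAC1783.07c)")
```

Each pair keeps the relation name and the target identifier separate, so a tool can resolve the target against GO or PomBase as appropriate.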
21. Curation tool support
• Supported in
– Protein2GO (GOA, WormBase) [poster#97]
– CANTO (PomBase) [poster#110]
– MGI curation tool
22. Analysis tool support
• Currently: Enrichment tools do not yet support
annotation extensions
– Annotation extensions can be folded into an
analysis ontology - http://galaxy.berkeleybop.org
• Future: Analysis tools can use extended
annotations to their benefit
– E.g. account for other modes of regulation in their
model
– Tool developers: contact us!
23. Challenge: pre vs post composition
• Curator question: do I…
– Request a pre-composed term via TermGenie[*]?
– Post-compose using annotation extensions?
See Heiko’s TermGenie talk tomorrow & poster #33
24. Challenge: pre vs post composition
• Curator question: do I…
– Request a pre-composed term via TermGenie?
– Post-compose using annotation extensions?
• From a computational perspective:
– It doesn’t matter, we’re using OWL
– 40% of GO terms have OWL equivalence axioms
[Figure: protein localization to nucleus [GO:0034504] ≡ protein localization [GO:0008104] ⊓ end_location Nucleus [GO:0005634]]
http://code.google.com/p/owltools/wiki/AnnotationExtensionFolding
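The equivalence on this slide is what makes folding possible: a post-composed description whose parts match a term's equivalence axiom can be replaced by that term. The following is a toy lookup-based sketch of that idea, not how owltools does it (a real implementation would rely on an OWL reasoner); the names `EQUIVALENCE_AXIOMS` and `fold` are hypothetical, and the single axiom is the one shown on the slide.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

Restriction = Tuple[str, str]                    # (relation, filler ID)
Definition = Tuple[str, FrozenSet[Restriction]]  # (genus term, restrictions)

# Hypothetical table holding the one axiom shown on the slide:
# GO:0034504 ≡ GO:0008104 ⊓ end_location GO:0005634
EQUIVALENCE_AXIOMS: Dict[Definition, str] = {
    ("GO:0008104", frozenset({("end_location", "GO:0005634")})): "GO:0034504",
}


def fold(go_term: str, extensions: Iterable[Restriction]) -> str:
    """Return the pre-composed term equivalent to the description, if any.

    Falls back to the original term when no axiom matches exactly.
    """
    key = (go_term, frozenset(extensions))
    return EQUIVALENCE_AXIOMS.get(key, go_term)


folded = fold("GO:0008104", [("end_location", "GO:0005634")])
unchanged = fold("GO:0008104", [("has_input", "SPAC1783.07c")])
```

Exact matching is a deliberate simplification: it only recognises descriptions that are identical to an axiom, whereas a reasoner would also infer more general or more specific classifications.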
25. Curation Challenges
• Manual Curation
– Fewer terms, but more degrees of freedom
– Curator consistency
• OWL constraints can help
• Automated annotation
– Phylogenetic propagation
– Text processing and NLP
26. Similar approaches and future
directions
• Post-composition has been used extensively
for phenotype annotation
– ZFIN [poster#95]
– Phenoscape [next talk]
• Future:
– A more expressive model that bridges GO with
pathway representations
27. Conclusions
• Description space is huge
– Context is important
– Not appropriate to make a term for everything
– OWL allows us to mix and match pre and post
composition
• Number of extension annotations is growing
• Annotation extensions represent untapped
opportunity for tool developers
28. Acknowledgments
• GO Consortium, model organism and UniProtKB curators
• GO Directors
• PomBase developers:
– Mark McDowell, Kim Rutherford
• Funding
– GO Consortium NIH 5P41HG002273-09
– UniProtKB GOA NHGRI U41HG006104-03
– British Heart Foundation grant SP/07/007/23671
– Kidney Research UK RP26/2008
– PomBase - Wellcome Trust WT090548MA
– MGD NHGRI HG000330
Editor's Notes
10 mins. GAF2.0
1
Sweet spot in a large galaxy
Not ad-hoc – OWL description
Key point: logically equivalent to an annotation to a term in the <anon desc> box, with the same links out.