The document discusses challenges and opportunities in predicting protein function and subcellular localization from sequence data alone. It outlines issues with current orthology-based and pathway-based prediction methods, and ways to improve functional predictions by differentiating true orthologs from non-orthologous relationships and developing better pathway signatures. The author advocates for databases like OrtholugeDB that pre-compute ortholog predictions across many genomes to facilitate large-scale evaluation of prediction methods.
Protein-protein interactions are important for many biological processes. There are various types of interactions depending on their composition and duration. Methods to study interactions include yeast two-hybrid, co-immunoprecipitation, affinity chromatography, and chromatin immunoprecipitation. Databases such as IntAct and MINT provide repositories for protein interaction data.
Alexey Ball has 15 years of experience in oncology research, from basic target exploration to biomarker discovery and clinical application. He has a track record of developing assays, including multiparametric flow cytometry, to characterize cancer targets, biomarkers, and immunotherapies using primary tumor cells and patient samples. Ball also has experience designing studies, presenting data, and collaborating with external partners.
This document presents information on multiplex assays and protein microarrays. It begins with an introduction to multiplex assays, which allow simultaneous measurement of multiple analytes. It then discusses the historical development and types of multiplex assays, including protein and antibody microarrays. Protein microarrays are described as a tool that allows high-throughput analysis of protein expression, functions, and interactions. Applications of protein and antibody microarrays include disease diagnostics, biomarker identification, and functional proteomics. The document concludes by emphasizing the utility of protein microarray technology for multiplexed detection and proteomics studies.
This document provides an overview of comparative genomics. It begins by defining genomics and its subfields, including comparative genomics which compares complete genome sequences across species. Tools for comparative genomics like BLAST and synteny are discussed. The history of comparative genomics from early virus comparisons to current eukaryote analyses is summarized. Methods for comparative analysis include examining genome structure, coding regions, protein content, and non-coding regions. General databases useful for comparative genomics are also listed.
This document summarizes a study that systematically analyzed 3,158 druggable human genes to identify those that lack orthologous (equivalent) genes in mouse, rat, and dog. The researchers used several databases and tools to map human genes to orthologs in these species. They identified 41 genes that lack orthologs in all three species, as well as 22 genes that are missing orthologs in mouse and rat but have them in dog. The authors discuss implications for toxicity testing and drug development for targets lacking rodent orthologs.
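The two categories reported in this summary (genes with no ortholog in any of the three species, and genes missing in both rodents but present in dog) amount to simple set tests over an ortholog mapping. The following is a minimal sketch under stated assumptions: the gene names, the mapping, and the `classify` helper are hypothetical placeholders, not data or code from the study.

```python
# Hypothetical ortholog table: human gene -> species with a mapped ortholog.
# Gene names and mappings are illustrative placeholders, not study data.
orthologs = {
    "GENE_A": {"mouse", "rat", "dog"},
    "GENE_B": set(),
    "GENE_C": {"dog"},
    "GENE_D": {"mouse", "dog"},
}

RODENTS = {"mouse", "rat"}
ALL_SPECIES = RODENTS | {"dog"}


def classify(table):
    """Return (genes lacking orthologs in all three species,
    genes lacking orthologs in mouse and rat but present in dog)."""
    missing_all = sorted(g for g, sp in table.items() if not sp & ALL_SPECIES)
    rodents_missing_dog_present = sorted(
        g for g, sp in table.items() if "dog" in sp and not sp & RODENTS
    )
    return missing_all, rodents_missing_dog_present


missing_all, dog_only = classify(orthologs)
print(missing_all, dog_only)
```

A gene such as `GENE_D`, which has a mouse ortholog but no rat ortholog, falls into neither category under this reading.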
There are two main types of genetic association studies: pedigree-based methods and pedigree-independent methods. Pedigree-based methods include positional cloning and the founder gene approach which use linkage analysis and genetic mapping of families. Pedigree-independent methods include the candidate gene approach and genome-wide association studies which examine associations between genetic variants and phenotypes across many individuals.
Database Of Rose Varieties Eucarpia Leiden 2009 renesmulders
A presentation on the use of microsatellite markers to genotype over 700 rose varieties for identification purposes, given at the 23rd Intl. Eucarpia Symp. (Sec. Ornamentals) on “Colourful Breeding and Genetics” in Leiden, The Netherlands, September 2009. Published in Acta Horticulturae (ISHS) 836: 169-174 (2009).
Structural genomics is a field that aims to determine the 3D structures of all proteins encoded by a genome. It involves determining structures on a large scale using techniques like X-ray crystallography and NMR. This allows identification of novel protein folds and potential drug targets. Comparative genomics compares genomic features between organisms and provides insights into evolution and conserved sequences and functions. It is a key tool in fields like medicine and agriculture.
Prediction of mi-RNA related to late blight disease of potato Animesh Kumar
The document summarizes the process of predicting miRNAs related to late blight disease of potato using bioinformatics approaches. It discusses how miRNAs play an important role in host-pathogen interactions and regulate genes. The author proposes to identify potential pathogenic miRNAs and their targets in Solanum tuberosum in response to Phytophthora infestans infection using available EST sequences. A multi-step computational approach including screening ESTs, identifying pre-miRNAs, predicting secondary structures, and identifying targets is outlined. Relevant literature on plant miRNA prediction and P. infestans is also reviewed.
The document discusses the field of proteomics, which is the large-scale study of proteins, including their functions and structures. It defines proteomics and describes several areas within it, such as functional proteomics, expressional proteomics, and structural proteomics. It outlines typical proteomics experiments and some key methods used, including two-dimensional electrophoresis, mass spectrometry, and protein-protein interaction prediction methods like phylogenetic profiling.
Proteomics is the study of the proteome, which is the entire set of proteins expressed by a genome, cell, tissue or organism. This document discusses several techniques used in proteomics including 2D gel electrophoresis, mass spectrometry, and protein databases. It provides examples of applications such as biomarker identification for disease diagnosis and drug target discovery. Limitations include the complexity of proteomes and no single technique being adequate for complete analysis. Overall, proteomics techniques help further our understanding of protein structure, function and interactions to gain insights into biological processes and diseases.
Drug Repositioning Conference Washington DC 20190923 Tudor Oprea
Discussing the knowledge-based classification of human proteins and its applications in target repurposing discovery, with potential applications for Rare Diseases
Covering our on-going Machine Learning efforts using Protein Knowledge Graphs and MetaPath / XGBoost to predict novel protein-disease associations. Specific Examples for Type 2 Diabetes.
Computational Drug Repositioning Workflow.
Addressing the limitations and potential of machine learning in target and drug repurposing.
Drug Repositioning Candidates: Alprazolam / Glycopyrronium / Oteracil.
Introduction to Gene Mining Part A: BLASTn-off! adcobb
In this lesson, students will learn to use bioinformatics portals and tools to mine plant versions of human genes. Student handout and teacher resource materials are available at www.Araport.org, Teaching Resources (Community tab). Suitable for grades 9-12 or first year undergraduate students.
The document describes a lab for bioinformatics and computational genomics at Ghent University. It has over 100 people including engineers, mathematicians, and molecular biologists. The lab uses bioinformatics approaches like sequence analysis, data mining, and computational biology to analyze large genomic datasets. One goal is developing an app for personal genomic analysis and interpretation.
This document discusses various topics related to drug discovery through bioinformatics. It begins by describing how genome-wide RNAi screening in the nematode C. elegans can be used to identify genes involved in biological pathways related to diseases like type-2 diabetes. It then discusses topics like structural genomics, target identification and validation, high-throughput screening approaches and facilities, sources for screening libraries, criteria for hit and lead compounds, and computational methods used in hit identification and optimization like pharmacophore modeling and evaluating compounds against the "rule of five". Descriptors that can be used for characterizing compounds are also listed.
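The "rule of five" mentioned above is usually stated as four thresholds: molecular weight of at most 500 Da, logP of at most 5, at most 5 hydrogen-bond donors, and at most 10 hydrogen-bond acceptors. The following is a minimal sketch of such a filter, assuming the descriptors have already been computed elsewhere. The `Compound` class, the function names, and the tolerance of one violation are illustrative choices, not taken from the document.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    # All fields are precomputed descriptors supplied by the caller.
    name: str
    mol_weight: float       # molecular weight in daltons
    log_p: float            # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(c: Compound) -> int:
    """Count how many of the four rule-of-five thresholds are exceeded."""
    return sum([
        c.mol_weight > 500,
        c.log_p > 5,
        c.h_bond_donors > 5,
        c.h_bond_acceptors > 10,
    ])


def passes_rule_of_five(c: Compound, max_violations: int = 1) -> bool:
    """Assumed convention: tolerate at most `max_violations` violations."""
    return rule_of_five_violations(c) <= max_violations


# Hypothetical compounds with made-up descriptor values.
small = Compound("hypothetical_small", 300.0, 2.0, 2, 4)
large = Compound("hypothetical_large", 800.0, 6.5, 7, 12)
print(passes_rule_of_five(small), passes_rule_of_five(large))
```

Setting `max_violations=0` gives the stricter reading in which every threshold must hold.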
Envisioning a world where everyone helps solve disease mhaendel
Keynote presented at the Semantic Web for Life Sciences conference in Cambridge, UK, December 9th, 2015
http://www.swat4ls.org/
The talk focuses on the use of ontologies for data integration to support rare disease diagnostics, and on how many people are involved in a diagnosis, often unbeknownst to the patient or even to the researchers creating the data.
The document discusses protein-protein interactions (PPIs), including an introduction to PPIs, the types of interactions, techniques used to study them like X-ray crystallography, NMR spectroscopy and cryo-electron microscopy, and factors that affect PPIs. It also covers methods to investigate PPIs such as affinity purification coupled with mass spectrometry and yeast two-hybrid screening. Applications of understanding PPIs include developing therapeutic drugs and identifying functions of unknown proteins.
Introduction to Gene Mining: Part B: How similar are plant and animal version... adcobb
In this lesson, students will navigate BLASTp and www.Araport.org to determine whether plant and animal versions of genes and proteins are homologous. Student handout and teacher resources are available at www.Araport.org, teacher resources page (under Community). Suitable for grades 9-12 or first year undergraduate students.
Proteomics uses techniques from molecular biology, biochemistry, and genetics to analyze proteins produced by genes. Mass spectrometry is commonly used in proteomics to identify proteins. Techniques like isotope-coded affinity tags (ICAT) allow comparative analysis of protein expression between samples by labeling proteins with stable isotopes before mass spectrometry analysis. ICAT involves labeling cysteine-containing peptides from two samples with either light or heavy isotopic reagents, mixing the samples, then using mass spectrometry to quantify differences in protein expression between the original samples based on mass shifts between labeled peptides.
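The quantification step described above reduces to comparing the signal of each heavy-labeled peptide with its light-labeled partner and then summarizing those ratios per protein. The following is a minimal sketch under stated assumptions: the peptide intensities and protein names are made up, and the use of a per-protein median is an illustrative choice rather than the method of any particular ICAT workflow.

```python
from collections import defaultdict
from statistics import median

# Hypothetical peptide-level measurements:
# (protein, light-label intensity, heavy-label intensity).
peptide_pairs = [
    ("PROT_1", 1000.0, 2000.0),
    ("PROT_1", 500.0, 1100.0),
    ("PROT_1", 800.0, 1600.0),
    ("PROT_2", 1200.0, 600.0),
]


def protein_ratios(pairs):
    """Return the median heavy/light intensity ratio for each protein.

    Pairs with a non-positive intensity are skipped, since no ratio
    can be formed from them.
    """
    ratios = defaultdict(list)
    for protein, light, heavy in pairs:
        if light > 0 and heavy > 0:
            ratios[protein].append(heavy / light)
    return {protein: median(values) for protein, values in ratios.items()}


result = protein_ratios(peptide_pairs)
print(result)
```

A ratio above 1 suggests higher abundance in the heavy-labeled sample and a ratio below 1 suggests higher abundance in the light-labeled sample.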
Protein microarrays are high-throughput methods that allow researchers to study protein interactions and functions on a large scale. There are three main types of protein microarrays: analytical microarrays use antibodies to detect specific proteins in samples; functional microarrays examine protein-protein and other molecular interactions; and reverse-phase protein microarrays profile protein expression levels and post-translational modifications by immobilizing cell or tissue lysates. Protein microarrays have applications in diagnostics, proteomics, studying protein functions, and analyzing antibodies.
This document discusses NIST's work in developing genomic reference materials and methods to evaluate microbial genomics measurements. It describes three projects: 1) assessing genomic purity by detecting low levels of contaminants using sequencing and classification, 2) evaluating SNP calling methods using reference materials and replicates to establish confidence, and 3) developing characterized genomic reference materials for public health pathogens. The overall aim is to build an infrastructure to support genome-based characterization of microbial samples.
This document provides an overview of common proteomics techniques. It describes proteomics as the study of proteins including their roles, structures, localization, interactions and other factors. The key techniques discussed include molecular techniques like DNA microarrays and yeast two-hybrid analysis, separation techniques like gel electrophoresis and chromatography, protein identification methods like mass spectrometry and Edman sequencing, and protein structure determination methods like NMR, X-ray crystallography and computational prediction. The document provides examples and details of several of these techniques.
Target Validation / Biochemical and Cellular Assay Development OSUCCC - James
Target validation and assay development are essential steps in the drug discovery process. This document discusses several approaches to target validation, including using genetic tools like CRISPR/Cas9 and RNAi to interrogate targets. It also provides an example of developing a cellular assay using patient-derived cells to validate a target for cystic fibrosis. Additionally, the document describes a case study where phenotypic screening was used to discover a small molecule that restores function of a mutant protein associated with Usher Syndrome type III.
Protein Interaction Reporters : Protein-Protein Interactions (PPI) elucidated... Lorenz Lo Sauer
This document discusses protein interaction reporters (PIRs), a crosslinking strategy to study protein-protein interactions (PPIs) using mass spectrometry. PIRs chemically crosslink interacting proteins in their native state, then use a cleavable linker and mass spectrometry to identify and sequence the interacting proteins. Key advantages of PIRs include their ability to provide system-wide snapshots of PPI networks, introduce isotopic labels for relative quantification, and enrich for crosslinked peptides to reduce data complexity challenges. Future directions may include developing PIRs targeted to specific classes of proteins or reaction mechanisms to gain more functional insight into PPIs.
This document discusses protein-protein interaction prediction. It begins with an introduction to protein-protein interactions and databases of known interactions. It then describes methods that can be used to predict interactions, including using gene co-expression, orthology, domain co-occurrence, and other individual features. A naive Bayesian model is used to combine these features. The accuracy of using individual features and combined features is evaluated using cross-validation. Comparisons are made to other known interaction datasets to validate the predictions. In conclusion, over 37,000 novel human protein interactions are predicted beyond what is in current databases.
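In a naive Bayesian combination, each feature contributes a likelihood ratio, and under the assumption that features are conditionally independent the posterior odds of an interaction are the prior odds multiplied by all of those ratios. The following is a minimal sketch of that arithmetic. The function name, the prior, and the likelihood-ratio values are hypothetical and are not taken from the document.

```python
import math


def posterior_probability(prior, likelihood_ratios):
    """Combine independent evidence naive-Bayes style.

    prior: prior probability that a protein pair interacts (0 < prior < 1).
    likelihood_ratios: one positive ratio per feature,
        P(feature value | interacting) / P(feature value | not interacting).
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    # Work in log space so that many ratios multiply without overflow.
    log_odds = math.log(prior / (1.0 - prior))
    log_odds += sum(math.log(lr) for lr in likelihood_ratios)
    odds = math.exp(log_odds)
    return odds / (1.0 + odds)


# Hypothetical ratios for co-expression, orthology, and domain co-occurrence.
p = posterior_probability(prior=0.001, likelihood_ratios=[8.0, 5.0, 12.0])
print(round(p, 3))
```

Ratios above 1 raise the posterior and ratios below 1 lower it, so a feature with no discriminating power contributes a ratio of 1 and leaves the estimate unchanged.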
Structural genomics aims to determine the 3D structures of all proteins encoded by genomes through high-throughput methods. It uses a genome-based approach to solve protein structures rapidly and cost-effectively. Major initiatives like the Protein Structure Initiative have made progress in determining thousands of protein structures. Challenges include expressing membrane and eukaryotic proteins, as well as determining remaining novel folds. Determining protein structures through structural genomics increases understanding of protein function and facilitates drug discovery.
B.sc biochem i bobi u 3.3 homologous and heterologous Rai University
This document defines and compares heterologs, homologs, analogs, orthologs, and paralogs. Heterologs differ in origin and activity, while homologs have a common origin but not necessarily common activity. Sequence similarity is a quantitative measure of how many bases match between two aligned sequences. Analogs have common activity but different origins, evolving convergently. Orthologs are homologs that evolved from a common ancestral gene through speciation, often retaining the same function. Paralogs are homologs produced through gene duplication within a genome, and may evolve new functions.
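The match-counting measure described above can be illustrated as percent identity over a pairwise alignment. The following is a minimal sketch, assuming the two sequences are already aligned to equal length with `-` as the gap character. The function name and the choice to exclude gapped columns from the denominator are assumptions; other conventions for the denominator exist.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of ungapped alignment columns where both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy alignment: 8 ungapped columns, 7 of them identical.
identity = percent_identity("ACGT-ACGT", "ACGTTACCT")
print(identity)  # 87.5
```

High identity is evidence of homology but does not by itself distinguish orthologs from paralogs, which depends on whether the genes diverged through speciation or duplication.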
1. The study characterized Echinococcus granulosus genotypes from cyst samples collected from buffalo, sheep, and humans in Bangladesh using PCR of mitochondrial genes.
2. Two genotypes were identified: the common sheep strain G1 and the buffalo strain G3. Nine of 15 buffalo samples and 6 of 9 sheep samples tested positive for G1 using 12S rRNA, while 3 buffalo tested positive for G3 using COX1.
3. Sequence analysis revealed close identity between Bangladeshi isolates and reference sequences for G1 and G3 from other countries. This is the first molecular characterization of Echinococcus spp. in Bangladesh.
This is the second presentation of the BITS training on 'Mass spec data processing'.
It reviews the methods for separating protein mixtures prior to further analysis.
Thanks to the Compomics Lab of the VIB for contribution.
Three groups annotated the genome of Mycoplasma genitalium and found inconsistencies in their annotations. Of the 468 genes, 318 were annotated consistently by all three groups but 45 had conflicting annotations. Errors likely arose from insufficient sequence similarity to determine homology accurately or incorrectly inferring function based on homology alone. Database curation is needed to prevent propagation of erroneous annotations.
The document summarizes key aspects of amino acids and protein structure in 3 paragraphs or less:
Amino acids are the building blocks of proteins. They contain common structural features and exist in L- and D-forms. In proteins, amino acids are exclusively in the L-conformation. Amino acids are classified based on the properties of their side chains into nonpolar, aromatic, polar, positively charged, and negatively charged categories.
Protein structure is hierarchical, progressing from primary to secondary, tertiary, and quaternary levels. The primary structure is the amino acid sequence. Secondary structures include alpha helices, beta sheets, and turns formed by hydrogen bonding. Tertiary structure refers to the overall 3
Homologous genes are genes that have descended from a common ancestral gene. There are two main types of homologous genes:
1. Orthologous genes are homologous genes in different species that arose due to speciation. For example, the human and mouse eyeless genes are orthologs that descended from the eyeless gene in their last common ancestor.
2. Paralogous genes are homologous genes within the same species that arose due to a gene duplication event. For example, the fruit fly eyeless and twin of eyeless genes are paralogs that descended from a duplication of the eyeless gene in a fruit fly ancestor.
Homologous genes can differ in their sequences due
The document discusses various methods for predicting protein function, including homology-based transfer of annotation and prediction of functional motifs and domains. Homology-based transfer can infer molecular function from sequence similarity, but biological process is only transferable between orthologs. Orthologs can be detected through phylogenetic trees or automated methods like InParanoid. Each protein domain contributes to molecular function, while short motifs like phosphorylation sites are also important. Functional annotation involves describing proteins at the molecular, biological process, and cellular component levels.
Phylogenetic trees reconstruct evolutionary relationships by grouping taxa with shared derived characteristics inherited from recent common ancestors. This document discusses methods for building phylogenetic trees, including cladistics which uses shared derived homologies (synapomorphies) to determine relationships. It also examines evidence for the evolutionary relationships of whales. Molecular studies of transposable elements and additional fossil evidence support whales evolving from artiodactyl ancestors, rather than being the sister group to artiodactyls.
The document describes the process of constructing a phylogenetic tree from mitochondrial DNA sequence data retrieved from online databases. It involves the following steps:
1) Retrieving sequence data from NCBI for humans, Neanderthals, chimpanzees, gorillas and other primates.
2) Aligning the sequences using ClustalW and viewing the alignment.
3) Using programs from the Phylip package like Seqboot, Dnadist, Neighbor and Consense to generate bootstrap trees, distance matrices and consensus trees.
4) Viewing and saving the phylogenetic trees in TreeView.
Genetics is the study of genes, heredity, and variation in living organisms. It is a broad discipline that includes molecular genetics, transmission genetics, population genetics, and many other fields. Some key areas of genetics are molecular genetics, which studies genes at the molecular level; transmission genetics, which explores inheritance patterns; population genetics, which studies genetic variation in populations; and quantitative genetics, which examines continuously measured traits. Genetics interfaces with disciplines like biochemistry, molecular biology, and evolution and has applications in areas such as agriculture, medicine, and conservation.
A phylogenetic tree is used to represent evolutionary relationships between organisms believed to have a common ancestry. Charles Darwin first represented evolutionary relationships as trees in his book On the Origin of Species. A phylogenetic tree connects organisms based on how closely related they are, with the distance between organisms on the tree indicating how long ago their common ancestor lived. While trees can provide insight, they have limitations and do not fully represent species histories due to factors like gene transfers.
This document provides an overview of phylogenetic analysis, including:
1) Phylogenetic analysis involves inferring evolutionary relationships between taxa by building phylogenetic trees and analyzing character evolution.
2) Phylogenetic trees show the branching patterns and relationships between taxa, with internal nodes representing hypothetical ancestors.
3) Phylogenetic analysis can provide insights into questions like human evolution, disease transmission, and the origins of genetic elements.
This document outlines and provides examples of different phylogenetic tree construction methods, including UPGMA and neighbor joining. UPGMA assumes a constant mutation rate and joins clusters based on average distances. Neighbor joining does not assume a constant rate and finds the tree that best satisfies the four-point criterion of additive distances. The examples demonstrate the step-by-step process of applying these methods to distance matrices to build phylogenetic trees through an iterative clustering approach.
Proteomics and its applications
Proteomics involves the analysis of the entire complement of proteins in a cell, tissue or organism. It assesses protein activities, modifications, localization and interactions. Proteomics uses techniques like gel electrophoresis, mass spectrometry and liquid chromatography to separate and identify proteins. These techniques can be applied to discover disease biomarkers, develop diagnostic tools, and gain insights into disease pathogenesis and treatment. Proteomics has applications in studying various diseases including cancer, diabetes and infections. It provides insights into cellular processes and systems biology.
1. Recombinant DNA technology uses restriction enzymes and DNA ligase to cut and join DNA from different sources, allowing genes to be transferred between organisms.
2. Polymerase chain reaction (PCR) amplifies specific DNA sequences, enabling rapid copying of genes. It is used in DNA fingerprinting for identification.
3. Transgenic organisms have foreign genes inserted, allowing production of useful proteins like insulin from bacteria and growth hormones from animals and plants.
Proteomics is the study of the structure and function of proteins. It involves identifying and quantifying the proteins expressed by a genome or cell type. Key aspects of proteomics include protein separation techniques like gel electrophoresis, mass spectrometry to identify proteins, and analyzing protein interactions and post-translational modifications. While genomes provide the blueprint, proteomics helps understand the diversity of proteins expressed and how they function together to direct cellular activities. It is a promising tool for disease diagnosis by identifying protein biomarkers.
The document discusses genetic variation and the genome. It defines key terms like genes, which are units of heredity transmitted from parents to offspring via DNA. The largest human chromosome is chromosome 1, which contains a section of DNA code from a single gene on its long arm. This gene is important as it helps determine physical traits and can influence disease susceptibility. Next-generation sequencing allows researchers to efficiently read entire genomes and identify genetic variations between individuals.
Proteomics aims to comprehensively describe the biological systems of a species by obtaining a sufficient density of observations on all the proteins expressed by a cell or tissue. Initial goals were to rapidly identify all proteins expressed, but this has yet to be achieved for any species due to the large and complex nature of proteomes. Various technologies are used to study proteins including gene ontology, biochemical analysis, tagging, mass spectrometry, and protein interaction mapping to better understand protein functions, localizations, modifications, and connections within cellular pathways and systems.
Introduction
Transcriptome analysis
Goal of functional genomics
Why we need functional genomics
Technique
1. At DNA level
2.At RNA level
3. At protein level
4. loss of function
5. functional genomic and bioinformatics
Application
Latest research and reviews
Websites of functional genomics
Conclusions
Reference
A systematic approach to Genotype-Phenotype correlationsfisherp
It is increasingly common to combine Microarray and Quantitative Trait Loci data to aid the search for candidate genes responsible for phenotypic variation. Workflows provide a means of systematically processing these large datasets and also represent a framework for the re-use and the explicit declaration of experimental methods. Here we highlight the issues facing the manual analysis of microarray and QTL data for the discovery of candidate genes underlying complex phenotypes. We show how automated approaches provide a systematic means to investigate genotype-phenotype correlations. This methodology was applied to a use case of resistance to African trypanosomiasis in the mouse. Pathways represented in the results identified Daxx as one of the candidate genes within the Tir1 QTL region.
gene mapping, clonning of disease gene(1).pptxRajesh Yadav
1) The document discusses gene mapping, cloning of disease genes, and genetic variation in drug transporters. It provides details on techniques for gene mapping like genetic mapping and physical mapping.
2) Key points about cloning disease genes include using gene mapping to identify disease-causing genes, and two approaches for cloning genes - for genes of known function and unknown function.
3) The document also discusses genetic variation in drug transporter proteins like ABC transporters and solute carrier proteins, how polymorphisms can affect drug disposition and response.
The document discusses using genomic context analysis and high-throughput data to construct and interpret networks of functional associations between genes and proteins. It describes the STRING database, which uses genomic context evidence from 110 species to predict functional links. It also discusses integrating various high-throughput data types, like protein-protein interaction data and gene expression data from microarrays, to improve the coverage and accuracy of predicted functional associations in STRING. Normalization methods and singular value decomposition are used to analyze and combine expression data from multiple experiments.
Molecular markers for measuring genetic diversity Zohaib HUSSAIN
Molecular markers for measuring genetic diversity
Introduction:
The molecular basis of the essential biological phenomena in plants is crucial for the effective conservation, management, and efficient utilization of plant genetic resources (PGR).
Determining genetic diversity can be based on morphological, biochemical, and molecular types of information. However, molecular markers have advantages over other kinds, where they show genetic differences on a more detailed level without interferences from environmental factors, and where they involve techniques that provide fast results detailing genetic diversity
Comparison of different methods
Morphological characterization does not require expensive technology but large tracts of land are often required for these experiments, making it possibly more expensive than molecular assessment. These traits are often susceptible to phenotypic plasticity; conversely, this allows assessment of diversity in the presence of environmental variation.
Biochemical analysis is based on the separation of proteins into specific banding patterns. It is a fast method which requires only small amounts of biological material. However, only a limited number of enzymes are available and thus, the resolution of diversity is limited.
Molecular analyses comprise a large variety of DNA molecular markers, which can be employed for analysis of variation. Different markers have different genetic qualities (they can be dominant or co-dominant, can amplify anonymous or characterized loci, can contain expressed or non-expressed sequences, etc.).
Genetic marker
The concept of genetic markers is not a new one; in the nineteenth century, Gregor Mendel employed phenotype-based genetic markers in his experiments. Later, phenotype-based genetic markers for Drosophila melanogaster led to the founding of the theory of genetic linkage. A genetic marker is an easily identifiable piece of genetic material, usually DNA that can be used in the laboratory to tell apart cells, individuals, populations, or species. The use of genetic markers begins with extracting proteins or chemicals (for biochemical markers) or DNA (for molecular markers) from tissues of the plant (for example, seeds, foliage, pollen, sometimes woody tissues).
Molecular markers In genetics, a molecular marker (identified as genetic marker) is a fragment of DNA that is associated with a certain location within the genome. Molecular markers which detect variation at the DNA level such as nucleotide changes: deletion, duplication, inversion and/or insertion. Markers can exhibit two modes of inheritance, i.e. dominant/recessive or co-dominant. If the genetic pattern of homozygotes can be distinguished from that of heterozygotes, then a marker is said to be co-dominant. Generally co-dominant markers are more informative than the
This document discusses databases that define the druggable proteome - the portion of the human proteome that can bind small molecules with sufficient affinity for modulating protein function. Four databases - ChEMBL, BindingDB, DrugBank, and IUPHAR/BPS Guide to PHARMACOLOGY - provide evidence-supported links between human proteins and drug targets. Their intersection identifies ~490 proteins (13% of the union of targets) as the most precisely defined druggable proteome. Comparative analyses examine distributions of targets by function and other attributes. Initiatives aim to expand knowledge of currently unannotated but potentially druggable protein families to broaden therapeutic opportunities.
STRING - Prediction of a functional association network for the yeast mitocho...Lars Juhl Jensen
The document discusses predicting functional associations between proteins in the yeast mitochondrial system using the STRING database. It summarizes how STRING integrates genomic context, experimental data, and evidence from other species to infer functional links. It then describes applying these methods to predict mitochondrial proteins in yeast and build an association network for the yeast mitochondrial system, identifying functional modules within it.
Rice stress related gene expression analysisRonHazarika
The document summarizes research on stress response proteins in rice (Oryza sativa). It identifies 17 proteins commonly up-regulated and 3 commonly down-regulated in response to drought, heat, and salinity stress. It analyzes the protein with the highest interaction for each group and identifies 10 similar proteins in each family. It examines the proteins' physicochemical properties, 3D structures, and functions in plant defense. The study finds the proteins structurally similar but functionally diverse, concluding they help rice cope with stress through complex regulatory interactions.
Review Class on Introduction to BioinformaticsSyed Lokman
This document provides an overview of topics covered in a review class for an introduction to bioinformatics course, including database BLAST, comparative genomics, alignment algorithms, multiple sequence alignment, phylogenetic tree construction, protein analysis, and selection analysis. It also defines open reading frames, explains the difference between the sense and antisense strands of DNA, discusses mutation, and describes population differentiation and speciation. The document concludes with a note about questions and answers and a thank you.
Genomic gene expression changes resulting from Trypanosomiasis: a horizontal study Examining expression changes elucidated by micro arrays in seminal tissues associated with the pathophysiology of Trypanosomiasis during disease progression
This document describes MORPH-R, a tool for predicting missing genes in biological pathways. MORPH-R is available both as a web server and standalone software. It uses the MORPH algorithm to rank candidate genes for their likelihood of participating in or affecting biological pathways of interest. MORPH-R was tested on potato data and achieved high performance, identifying novel candidate genes for potato pathways and gene ontology categories. The tool allows users to analyze pathways in tomato, Arabidopsis, rice and potato, and can be customized to analyze additional organisms by adding new gene expression and interaction network data.
STR DNA profiling is now a powerful, inexpensive tool that can generate unique DNA signatures that can be used to authenticate cell lines and detect contamination of more than one cell type. This presentation will talk about why scientists need cell authentication, what is STR profile and STR profile workflow from Creative Bioarray.
Similar to Making Protein Function and Subcellular Localization Predictions: Challenges and Opportunities (20)
Making Protein Function and Subcellular Localization Predictions: Challenges and Opportunities
1. Making protein function and subcellular localization predictions: challenges and opportunities
Fiona Brinkman
Department of Molecular Biology and Biochemistry (Associate, Faculty of Health Sciences and School of Computing Sciences)
Simon Fraser University
Greater Vancouver, BC, Canada
April 2014
2. What we MUST do to move AFP forward….
• Improving seq similarity/orthology-based predictions: a keystone of many predictors
• Improving pathway/network-based analysis to identify protein functions
• Future challenges and opportunities (using protein localization as an example of what is to come)
3. One-to-one orthologs are, in particular, more functionally similar to each other, vs other orthologs, paralogs, when >80% seq identity
Functional similarity measured by GO annotation similarity (13 species)
Altenhoff AM et al. PLoS Comput Biol. 2012
4. One-to-one orthologs are, in particular, more functionally similar to each other, vs other orthologs, paralogs, when >80% seq identity
Functional similarity measured by GO annotation similarity (13 species)
Altenhoff AM et al. PLoS Comput Biol. 2012
5.
6. RBBH (Reciprocal Best Blast Hit) FAIL
If true ortholog is missing… (gene loss, or incomplete genome)
One of the orthologous genes diverges faster…
[Figure: species tree (Ingroup1, Ingroup2, Outgroup) and two gene trees (Ingroup1, Ingroup2, Outgroup; Ingroup1, Outgroup, Ingroup2), with labels "Usual Divergence", "Paralog" and "RBBH"]
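The RBBH idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the talk: the function names, the score dictionaries and the gene names (`xA`, `yA1`, `xB`, `yB2`) are all hypothetical, and the scores stand in for BLAST bit scores. The toy data show the failure case: if `yA1` and `yB2` are paralogs whose true orthologs were lost, they still come out as reciprocal best hits.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene.

    scores: dict {(query, subject): score}, e.g. BLAST bit scores.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical toy scores. xA/xB are true orthologs; yA1 and yB2 are paralogs
# whose true orthologs are missing (gene loss or incomplete genome).
a_vs_b = {("xA", "xB"): 520.0, ("xA", "yB2"): 90.0,
          ("yA1", "yB2"): 210.0, ("yA1", "xB"): 80.0}
b_vs_a = {("xB", "xA"): 515.0, ("xB", "yA1"): 80.0,
          ("yB2", "yA1"): 205.0, ("yB2", "xA"): 90.0}

rbbh_pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(rbbh_pairs)  # the paralog pair is reported alongside the true orthologs
```

RBBH alone cannot tell the two pairs apart, which is the motivation for an additional check against an outgroup.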
7. Ortholuge
Uses phyletic ratios to differentiate Supporting Species Divergence (SSD) orthologs vs proteins more divergent than expected (non-SSD)
Ratio1 = distance{ingroup1-ingroup2} / distance{ingroup1-outgroup}
Ratio2 = distance{ingroup1-ingroup2} / distance{ingroup2-outgroup}
[Figure: trees over Ingroup1, Ingroup2, Outgroup; Ortholuge analysis comparing Burkholderia cepacia & B. cenocepacia (outgroup: B. pseudomallei), showing SSD and Non-SSD]
Whiteside et al 2013 PMID 23203876
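The two ratios above can be computed directly from three pairwise distances. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the distances are made-up numbers, and the fixed `cutoff=1.0` is a placeholder only. The slide does not give the cutoff Ortholuge actually uses, so the classification rule here should not be read as the tool's real decision procedure.

```python
def ortholuge_ratios(d_in1_in2, d_in1_out, d_in2_out):
    """Return (Ratio1, Ratio2) as defined on the slide.

    Ratio1 = distance(ingroup1, ingroup2) / distance(ingroup1, outgroup)
    Ratio2 = distance(ingroup1, ingroup2) / distance(ingroup2, outgroup)
    """
    if d_in1_out <= 0 or d_in2_out <= 0:
        raise ValueError("outgroup distances must be positive")
    return d_in1_in2 / d_in1_out, d_in1_in2 / d_in2_out


def classify(ratio1, ratio2, cutoff=1.0):
    """HYPOTHETICAL rule: call a pair SSD if both ratios fall below the cutoff.

    Intuition: ingroup genes that diverged with their species should be closer
    to each other than either is to the outgroup gene, so both ratios are small.
    """
    return "SSD" if max(ratio1, ratio2) < cutoff else "non-SSD"


# Made-up distances for two candidate ortholog pairs.
expected_pair = ortholuge_ratios(0.10, 0.40, 0.42)
divergent_pair = ortholuge_ratios(0.50, 0.40, 0.45)
print(classify(*expected_pair), classify(*divergent_pair))
```

A pair whose ingroup-to-ingroup distance exceeds its ingroup-to-outgroup distances is the "more divergent than expected" case the slide calls non-SSD.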
8. [Figure: Predicted Orthologs in 600 Pairs of Bacterial Species. Y-axis: Proportion (0 to 1). Categories: KEGG Orthology, Pfam Domains, Tigrfam Annotations, Subcellular Localizations. Series: SSD Ortholog, Non-SSD. * p-value < 0.05]
[Figure: One or more homologs (based on BLAST hits). Y-axis: Proportion (0 to 0.7). Series: SSD orthologs, Non-SSD. * p-value < 0.05]
Non-SSD “Orthologs” more likely:
- Functionally dissimilar
- Have one or more homologs
9. A Database of Ortholuge Evaluations
OrtholugeDB (tinyurl.com/ortholugeDB)
• Provides pre-computed ortholog predictions for >1400 bacteria and archaea (update coming next month!), with further Ortholuge assessments
• Covers all genes in fully sequenced bacterial and archaeal genomes
• Facilitates visualization and evaluation of ortholog predictions
10. Similar issue with initial metagenomics sequence functional evaluation
1. Simulated reads from Pseudomonas aeruginosa PAO1
2. Created databases at different levels of clade exclusion
• E.g. for species clade exclusion, removed all Pseudomonas aeruginosa genomes from the database
3. Used RAPSearch2 and MEGAN5 to assign functional categories to the simulated reads
4. Calculated the proportion of reads assigned to each functional category relative to how many reads were expected
• E.g.:
Category | Expected # assigned | Actual # assigned | Relative Proportion
Membrane Transport | 567 | 583 | 1.02822
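Step 4 is a simple ratio of actual to expected counts. A minimal sketch using the Membrane Transport row from the slide; the function name and the dictionary layout are assumptions for illustration.

```python
def relative_proportion(expected: int, actual: int) -> float:
    """Reads actually assigned to a category divided by reads expected there."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return actual / expected


# category -> (expected # assigned, actual # assigned); values from the slide
counts = {"Membrane Transport": (567, 583)}

ratios = {
    category: round(relative_proportion(expected, actual), 5)
    for category, (expected, actual) in counts.items()
}

if __name__ == "__main__":
    for category, ratio in ratios.items():
        print(f"{category}: {ratio}")
```

A value near 1 means the category is recovered about as often as expected; values notably above 1 indicate overprediction.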
11. Most functional categories are predicted well but some are overpredicted (ratio notably >1)
[Figure: ratio of assigned relative to expected (axis 0 to 2.5) per functional category, at each level of clade exclusion: None, Species, Family, Class]
E.g. Endocrine system: 3 problematic orthology groups, all with high numbers of proteins (one has 3538 when the median is 54!)
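One way to look for such problematic orthology groups is to compare each group's size with the median group size. The sketch below is only an illustration: the group identifiers, most of the sizes and the 10-fold threshold are made up, and only the 3538 and 54 figures come from the slide.

```python
from statistics import median


def oversized_groups(sizes: dict[str, int], fold: float = 10.0) -> dict[str, float]:
    """Return groups whose protein count exceeds `fold` times the median,
    mapped to their size relative to the median. `fold` is a hypothetical cutoff."""
    med = median(sizes.values())
    return {group: n / med for group, n in sizes.items() if n > fold * med}


# Hypothetical orthology groups; only 3538 and the median of 54 are from the slide.
group_sizes = {"grp_a": 50, "grp_b": 52, "grp_c": 54, "grp_d": 60, "grp_big": 3538}

if __name__ == "__main__":
    for group, ratio in oversized_groups(group_sizes).items():
        print(f"{group}: {ratio:.1f}x the median size")
```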
12. The relative proportions of functional categories stay relatively consistent as clade exclusion level increases
[Figure: stacked bars of the proportion of reads assigned (0% to 100%) at each clade exclusion level (None, Species, Family, Class). Legend: Xenobiotics Biodegradation and Metabolism; Transcription; Signal Transduction; Replication and Repair; Infectious Diseases; Nucleotide Metabolism; Neurodegenerative Diseases; Metabolism of Other Amino Acids; Metabolism of Cofactors and Vitamins; Membrane Transport; …]
13. Improving pathway-based analysis
Issue: Biomolecular pathway classifications can bias analyses of pathways found to be upregulated or downregulated by transcriptome (or other omics-level) analysis
What you identify depends on how everything is classified…
Need better “signatures” of pathways…
14. Dealing with PART of the issue…
[Figure: distribution of the number of associated pathways for human genes in KEGG. Categories: 1, 2, 3, 4, 5, 6, 7-45]
Membership of a gene in multiple pathways is the norm, not the exception…
Foroushani et al, 2014 PMCID: PMC3883547
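The distribution described above can be tallied from any pathway-to-genes mapping. A minimal sketch, assuming the mapping is already loaded as a dictionary; the toy pathways and gene names are hypothetical and are not KEGG data.

```python
from collections import Counter, defaultdict


def pathways_per_gene(pathways: dict[str, set[str]]) -> Counter:
    """Map 'number of pathways a gene belongs to' -> 'number of such genes'."""
    membership: dict[str, int] = defaultdict(int)
    for genes in pathways.values():
        for gene in genes:
            membership[gene] += 1
    return Counter(membership.values())


# Hypothetical toy annotation, for illustration only.
toy_pathways = {
    "P1": {"geneA", "geneB", "geneC"},
    "P2": {"geneB", "geneD"},
    "P3": {"geneC", "geneD"},
}

if __name__ == "__main__":
    for n_pathways, n_genes in sorted(pathways_per_gene(toy_pathways).items()):
        print(f"{n_genes} gene(s) in {n_pathways} pathway(s)")
```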
15. Not all genes are equal…
[Figure legend: Maroon: pathway member; White: no membership]
All genes are not equivalent signatures of a given pathway
Foroushani et al, 2014 PMCID: PMC3883547
16. Individual Gene ORA
Antigen processing and presentation
Graft-versus-host disease
Natural killer cell mediated cytotoxicity
Viral myocarditis
Allograft rejection
Cell adhesion molecules (CAMs)
Chemokine signaling pathway
Type I diabetes mellitus
Toll-like receptor signaling pathway
Cytokine-cytokine receptor interaction
Example: Treated vs Untreated Mouse Severe Inflammation – Gene Expression Dataset
Standard Over-Representation Analysis (ORA) and Gene Set Enrichment Analysis (GSEA) treat all genes in a given pathway as equal indicators that that pathway is significant.
→ Emphasizes generalist genes/pathways
Foroushani et al, 2014 PMCID: PMC3883547
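To make the "all genes count equally" point concrete, here is a sketch of the usual hypergeometric form of individual-gene ORA, where every pathway gene in the list adds the same amount of evidence. The function name, argument names and example numbers are assumptions for illustration, not taken from the slides.

```python
from math import comb


def ora_p_value(universe: int, pathway: int, selected: int, overlap: int) -> float:
    """Upper-tail hypergeometric p-value: the chance of seeing at least
    `overlap` pathway genes when `selected` genes are drawn at random from a
    universe of `universe` genes, of which `pathway` belong to the pathway."""
    total = comb(universe, selected)
    upper = min(pathway, selected)
    tail = sum(
        comb(pathway, k) * comb(universe - pathway, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 20 genes measured, 5 in the pathway,
    # 4 differentially expressed, 3 of those in the pathway.
    print(ora_p_value(universe=20, pathway=5, selected=4, overlap=3))
```

Because each gene counts the same, a gene shared by many pathways raises the score of all of them, which is how generalist genes and pathways come to dominate the results.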
17. Pathway Signatures using SIGORA: Identifying genes/gene pairs uniquely associated with a single pathway
SIGORA identifies statistically significant enrichment of
Pathway Signatures in a gene list of interest.
Foroushani et al, 2014 PMCID: PMC3883547
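The idea of a pathway signature can be illustrated with a simplified sketch: find single genes that occur in exactly one pathway, then pairs of shared genes that occur together in exactly one pathway. This is not the SIGORA implementation (no weighting or statistics are included), and the function name and toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def pathway_signatures(pathways: dict[str, set[str]]):
    """Return (single-gene signatures, gene-pair signatures), each mapping a
    gene or a sorted gene pair to the one pathway it is unique to."""
    gene_to_pathways = defaultdict(set)
    for name, genes in pathways.items():
        for gene in genes:
            gene_to_pathways[gene].add(name)
    singles = {g: next(iter(p)) for g, p in gene_to_pathways.items() if len(p) == 1}

    # Only genes shared between pathways are considered for pair signatures.
    pair_to_pathways = defaultdict(set)
    for name, genes in pathways.items():
        shared = sorted(g for g in genes if g not in singles)
        for pair in combinations(shared, 2):
            pair_to_pathways[pair].add(name)
    pairs = {pair: next(iter(p)) for pair, p in pair_to_pathways.items() if len(p) == 1}
    return singles, pairs


# Hypothetical toy annotation, for illustration only.
toy = {"P1": {"a", "b", "c"}, "P2": {"b", "d"}, "P3": {"c", "d"}}

if __name__ == "__main__":
    singles, pairs = pathway_signatures(toy)
    print(singles)
    print(pairs)
```

In the toy data, genes b, c and d each sit in two pathways and are uninformative alone, yet each pair of them points to exactly one pathway.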
18. Example: Treated vs Untreated Mouse Severe Inflammation –
Gene Expression Dataset
SIGORA avoids many biologically less plausible results seen by other methods that over-emphasize generalist genes/pathways.
For example, 6/8 up-regulated genes in “Type I diabetes mellitus”
pathway are also in the "Antigen processing and presentation" pathway.
Individual Gene ORA | SIGORA
Antigen processing and presentation | Antigen processing and presentation
Graft-versus-host disease | Natural killer cell mediated cytotoxicity
Natural killer cell mediated cytotoxicity | Complement and coagulation cascades
Viral myocarditis | Toll-like receptor signaling pathway
Allograft rejection | Cytokine-cytokine receptor interaction
Cell adhesion molecules (CAMs) | Leukocyte transendothelial migration
Chemokine signaling pathway | Cell adhesion molecules (CAMs)
Type I diabetes mellitus | Cytosolic DNA-sensing pathway
Toll-like receptor signaling pathway | Chemokine signaling pathway
Cytokine-cytokine receptor interaction |
19. Future challenges and opportunities (using bacterial protein localization as an example of what is to come)
(Gardy & Brinkman 2006 Nature Reviews Microbiology 4:741)
20. Bacterial protein subcellular localization prediction
• Aids genome annotation and prediction of protein function
• Used to identify cell surface/secreted targets for drugs and diagnostics, as well as potential vaccine components
• Many pathogen-associated virulence factors predicted as secreted
(Gardy & Brinkman 2006 Nature Reviews Microbiology 4:741)
21. PSORTb: bacterial protein subcellular localization (SCL) prediction software
Signal peptides: Non-cytoplasmic
Amino acid composition/patterns: All localizations
- Support Vector Machines trained with amino acid compositions or frequent subsequences
Transmembrane helices: Cytoplasmic membrane
- HMMTOP
PROSITE motifs with 100% precision: All localizations
Outer membrane motifs: Outer membrane
- Identified by association-rule mining
Homology to proteins of experimentally known localization: All loc.
- “SCL-BLAST” against proteins of known localization
- E=10e-10 and length restriction for precision
Integration with a Bayesian Network
Yu et al (2010) Bioinformatics 26:1608
22. PSORTb: version 3
Sub-category localization predictions:
• Type III secretion apparatus
• Pili/fimbria
• Host-associated SCL
• Flagellum
• Spore
• Gas vesicle
Main localizations predicted
Bacteria and Archaea predictions
24. Challenge: Organismal diversity
Classic Gram positive bacteria, monoderms: thick peptidoglycan, no outer membrane
Classic Gram negative bacteria, diderms: thin peptidoglycan + outer membrane
…but can have Gram negatives with no outer membrane (e.g. Mycoplasma) or a different outer membrane (Synergistetes, Sphingomonas), or Gram positive (thick peptidoglycan) with a different outer membrane (Deinococcus – 6 layers in cell envelope!), or “acid fast” with asymmetric lipid-containing thick cell wall (Mycobacteria)
Plus bacterial organelles and other substructures (e.g. magnetosome of Magnetospirillum)...
Solution:
- For whole genome (deduced-proteome) analysis, detect key protein markers of a particular cell type (e.g. Omp85, essential for classic Gram negative membrane)
- For single protein analysis, learn from the above analysis, plus literature curation, the most likely cell type for a given phylum
…then make predictions assuming that cell “type”
Reproduced under Fair Use
25. Challenge: Temporal, contextual diversity
Proteins can be associated with multiple subcellular localizations
e.g. cell division proteins, autotransporters, “protein A dependent on protein B”
Solution: Note all possible localizations, since temporal, contextual predictions are non-trivial – not enough knowledge for most
Kjærgaard K et al. J. Bacteriol. 2000;182:4789-4796
26. Challenge: Metagenomics
High demand for PSORTb to be able to analyze metagenomic sequences … under development
Need taxonomy data to aid predictions (then enable appropriate cell type analysis)
27. Through over a decade of curating for, making and evaluating predictors of protein localization, genomic islands, etc.
What makes a great predictor?
28. Through over a decade of curating for, making and evaluating predictors of protein localization, genomic islands, etc.
What makes a great predictor? (besides it being right) ☺
29. Bioinformatics Predictor’s Code of Conduct
- Never force predictions - always have a prediction option/category of “unknown”
Inspired by the classic “Data Provider’s Code of Conduct” in Stein (2002) Nature 417, 119-120
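The rule above can be sketched as a final decision step that returns “Unknown” unless one localization clearly wins. The function name, the score scale (0 to 1) and both thresholds are hypothetical choices for illustration and do not describe any particular predictor.

```python
def call_localization(
    scores: dict[str, float],
    min_score: float = 0.75,   # hypothetical minimum score for a call
    min_margin: float = 0.10,  # hypothetical lead required over the runner-up
) -> str:
    """Return the best-scoring localization, or "Unknown" when the evidence
    is weak or two sites are too close to separate."""
    if not scores:
        return "Unknown"
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best_site, best_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    if best_score < min_score or best_score - runner_up < min_margin:
        return "Unknown"
    return best_site


if __name__ == "__main__":
    print(call_localization({"Cytoplasmic": 0.90, "Outer membrane": 0.05}))
    print(call_localization({"Cytoplasmic": 0.50, "Periplasmic": 0.45}))
```

Declining to call weak cases trades recall for precision, and leaves the user to judge the ambiguous proteins.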
30. Example of forced predictions: PSORT I prediction method
Nakai & Kanehisa, Proteins: Structure, Function, Genetics (1991) Overall Accuracy = 69%
What’s wrong here?
31. Example of forced predictions: PSORT I prediction method
Nakai & Kanehisa, Proteins: Structure, Function, Genetics (1991) Overall Accuracy = 69%
No secreted/extracellular localization!
32. Bioinformatics Predictor’s Code of Conduct
Inspired by the classic “Data Provider’s Code of Conduct” in Stein (2002) Nature 417, 119-120
- Never force predictions - always have “unknown” option/category
- Ensure open source - enable viewing of prediction method details
- Predictor should easily be trainable with different datasets (if applicable; so others can robustly evaluate accuracy)
- Have ability to run locally or over web (an API is preferred)
- Provide access to old versions (at minimum when transitioning to new version)
- Encourage continuing curation from the literature/lab experiments!
Incorporate some curation efforts into predictor funding applications
33. Bioinformatics Predictor’s Code of Conduct - evaluation
- Evaluate precision and recall (and accuracy measure combos thereof) with x-fold cross validation and/or new datasets (like CAFA!)
- ID errors, biases and provide guidance to users re issues to watch for
  - bias in training and/or testing datasets (“homology reduction”, “clade exclusion” may help)
  - errors in “gold standard” lab-based measure
  - contextual/temporal changes in proteins, impacting prediction (e.g. function changes when another protein/compound present)
What we MUST do: Guide users to not just blindly use a predictor and its default output.
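A minimal sketch of the evaluation described on this slide: per-class precision and recall, plus a plain x-fold split of sample indices. The function names and the toy labels are hypothetical. An “Unknown” prediction never counts as a positive call, so it lowers recall but not precision.

```python
from typing import Iterator


def precision_recall(truth: list[str], predicted: list[str], label: str) -> tuple[float, float]:
    """Precision and recall for one class; NaN when undefined."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if p == label and t == label)
    fp = sum(1 for t, p in pairs if p == label and t != label)
    fn = sum(1 for t, p in pairs if p != label and t == label)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall


def k_fold_indices(n: int, k: int) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train, test) index lists; each sample is tested exactly once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


if __name__ == "__main__":
    # Hypothetical labels: OM = outer membrane, CM = cytoplasmic membrane.
    truth = ["OM", "OM", "CM", "CM", "OM"]
    predicted = ["OM", "Unknown", "OM", "CM", "OM"]
    print(precision_recall(truth, predicted, "OM"))
    print(list(k_fold_indices(5, 2)))
```

The interleaved split here is the simplest possible; it does nothing about homologous sequences landing in both train and test folds, which is the bias that homology reduction or clade exclusion is meant to address.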
34. Bioinformatics Predictor’s Code of Conduct
What we MUST do: Guide users to not just blindly use a predictor and its default output.
Curators, experimentalists, and automated function predictor developers must coordinate efforts more
• Experimentalists working on what they think best…
• Curators curating what they prioritize…
• Function predictors optimizing prediction using existing data…
Function predictors/bioinformaticists need to get in the driver’s seat more for research
35. Brinkman Lab Kayaking Trip, Summer 2013 (Next up, Archery Tag!)
Amir Foroushani
Matthew Laird
David Lynn
Raymond Lo
Mike Peabody
Thea Van Rossum
Matthew Whiteside
Nancy Yu