Genetic variations in G protein-coupled receptors (GPCRs) can alter receptor function and cause diseases. Single nucleotide polymorphisms and other mutations have been linked to impaired or enhanced receptor signaling. For example, a mutation in the vasopressin V2 receptor causes nephrogenic diabetes insipidus by decreasing ligand binding and receptor expression. Similarly, mutations in chemokine receptors CCR5 and CCR2 impact HIV infection by altering receptor function or interaction with other coreceptors. Overall, GPCR polymorphisms are associated with diseases by changing ligand binding, receptor activation, trafficking, and coupling to downstream signaling pathways.
Molecular docking involves sampling ligand conformations and orientations within a protein binding site, and scoring the protein-ligand complexes to identify favorable binding geometries. Key aspects of docking include accounting for ligand and protein flexibility using methods like Monte Carlo, molecular dynamics, or genetic algorithms. Scoring functions evaluate shape complementarity and chemical interactions such as hydrogen bonding and hydrophobic effects, often adding empirical or knowledge-based terms fitted to binding affinity data. Consensus scoring integrates multiple scoring schemes to improve virtual screening accuracy over single functions. While docking false positive rates remain high, consensus approaches combined with experimental validation can suggest novel protein-ligand interactions.
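The consensus-scoring idea above can be sketched with rank averaging: each scoring function ranks the poses, and the ranks are averaged so that a pose favored by all functions floats to the top. This is a minimal illustration with invented scores, not the output of a real docking program:

```python
# Consensus scoring sketch: combine two hypothetical docking scoring
# functions by averaging the rank each one assigns to every pose.
# All scores below are invented for illustration.

def rank_scores(scores):
    """Return the rank of each score (1 = best, i.e. lowest energy)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks

def consensus_rank(score_lists):
    """Average the per-function ranks; lower consensus rank = better pose."""
    all_ranks = [rank_scores(s) for s in score_lists]
    n = len(score_lists[0])
    return [sum(r[i] for r in all_ranks) / len(all_ranks) for i in range(n)]

# Two made-up scoring functions evaluating the same four poses
empirical = [-9.2, -7.5, -8.8, -6.1]       # e.g. an empirical score
knowledge = [-41.0, -55.0, -48.0, -30.0]   # e.g. a knowledge-based score
print(consensus_rank([empirical, knowledge]))  # [2.0, 2.0, 2.0, 4.0]
```

Note that the two functions disagree on the best pose, but the consensus cleanly rejects pose 4, which both functions rank last.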
Target Validation
Introduction, Drug discovery, Target identification and validation, Target validation and techniques
By
Ms. B. Mary Vishali
Department of Pharmacology
A lecture on molecular docking that I give to master's students at University Paris Diderot.
Warning: this presentation has numerous animations that are not included in the SlideShare document.
https://florentbarbault.wordpress.com/
This document discusses gene mapping and sequencing. It begins by defining genomics and genetic markers such as RFLP, SSLP, and SNP that are used to track inheritance. Gene mapping involves determining the locus and distance between genes on chromosomes, which is important for diagnosing genetic diseases. There are two main types of gene mapping: linkage mapping which measures recombination frequency to determine if genes are linked, and physical mapping which precisely locates DNA sequences on chromosomes using techniques like fluorescence in situ hybridization. The document also discusses methods for gene sequencing, including Sanger sequencing and Maxam-Gilbert sequencing, as well as newer techniques like shotgun sequencing and Illumina sequencing.
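The linkage-mapping idea mentioned above rests on a small calculation: the fraction of recombinant offspring between two loci estimates their genetic distance, with 1% recombination corresponding to roughly 1 centimorgan. A minimal sketch with invented offspring counts:

```python
# Linkage mapping sketch: estimate recombination frequency between two
# loci from a test cross, then convert to map distance in centimorgans.
# The offspring counts are invented for illustration.

def recombination_frequency(parental, recombinant):
    """Fraction of recombinant offspring among all offspring."""
    return recombinant / (parental + recombinant)

def map_distance_cM(rf):
    """1% recombination ~ 1 centimorgan (valid for small distances)."""
    return rf * 100

rf = recombination_frequency(parental=850, recombinant=150)
print(rf)                   # 0.15
print(map_distance_cM(rf))  # 15.0 (cM)
```

For larger distances a mapping function (e.g. Haldane or Kosambi) is needed, because double crossovers make the simple proportionality break down.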
The document summarizes various mechanisms of regulating protein synthesis at the translational level in prokaryotes. These include controlling translation initiation through proteins or RNA binding near the ribosome-binding site. Another mechanism is the stringent response, where the unusual nucleotides ppGpp and pppGpp accumulate during starvation and inhibit transcription elongation. Riboswitches can also regulate translation by binding small molecules and inducing structural changes that influence gene expression. Translation can also be used to detect defective mRNAs and target them for degradation.
Phospholipase C (PLC) is an enzyme that cleaves phospholipids just before the phosphate group. It plays an important role in signal transduction downstream of G protein-coupled receptors. There are several isozymes of PLC, classified by type, and they are activated by heterotrimeric G proteins, protein tyrosine kinases, small G proteins, calcium, and phospholipids. PLC cleaves phosphatidylinositol 4,5-bisphosphate to produce inositol 1,4,5-trisphosphate and diacylglycerol.
Pharmacy students will learn the principles of drug discovery and molecular docking through a systematic and simple approach.
Molecular recognition plays a key role in drug-receptor interactions; drug activity depends on the molecular binding of the ligand to the receptor binding site.
Genetic variation in G protein coupled receptors (SachinGulia12)
This document discusses genetic variation in G protein-coupled receptors (GPCRs). It notes that GPCRs are membrane proteins involved in cell signaling and the targets of many drugs. Genetic variations can influence drug efficacy and safety through several mechanisms: by altering the structure and function of GPCRs, their coupling to G proteins and signaling pathways, their binding pockets, and their spontaneous signaling. Specific examples of genetic variations in GPCRs that are associated with human diseases are also provided.
1. Gene mapping is the process of establishing the locations of genes on chromosomes, while gene sequencing determines the order of nucleotides within genes.
2. There are two main types of genome mapping - genetic mapping, which provides evidence that a disease is linked to genes, and physical mapping, which constructs maps by characterizing and assembling DNA fragments.
3. Restriction mapping uses restriction enzymes to cut DNA at specific sites, allowing genes to be mapped based on the resulting fragment sizes.
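The restriction-mapping step above can be sketched directly: find every recognition site of an enzyme in a DNA string and derive the fragment sizes a digest would produce. A minimal sketch using EcoRI (recognition site GAATTC, cutting between G and AATTC) on an invented sequence:

```python
# Restriction mapping sketch: locate every EcoRI site (GAATTC) in a DNA
# string and derive the fragment sizes a digest of linear DNA produces.
# The sequence is invented; EcoRI cuts the top strand after the G.

def cut_positions(seq, site="GAATTC", cut_offset=1):
    """0-based positions where the enzyme cuts the top strand."""
    positions, start = [], 0
    while True:
        i = seq.find(site, start)
        if i == -1:
            return positions
        positions.append(i + cut_offset)
        start = i + 1

def fragment_sizes(seq, cuts):
    """Sizes of the linear fragments produced by cutting at `cuts`."""
    bounds = [0] + sorted(cuts) + [len(seq)]
    return [bounds[i + 1] - bounds[i] for i in range(len(bounds) - 1)]

seq = "AAAGAATTCTTTTGAATTCCC"
cuts = cut_positions(seq)
print(cuts)                       # [4, 14]
print(fragment_sizes(seq, cuts))  # [4, 10, 7]
```

Comparing predicted fragment sizes against the bands seen on a gel is exactly how a restriction map is refined in practice.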
Nuclear receptors regulate gene expression by binding ligands that pass through the cell membrane by simple diffusion. These receptors are located in the cytoplasm or nucleus and bind small-molecule ligands such as steroids, lipids, vitamins, and thyroid hormone, functioning as ligand-activated transcription factors. Nuclear receptor ligands come in various structures, including steroids such as estrogen, progesterone, and androgens, as well as non-steroidal lipophilic hormones such as vitamin D, retinoic acid, fatty acids, and thyroid hormone. Some receptors have unknown ligands and are called "orphan" receptors. Steroid receptors differ from peptide receptors in their half-life, speed of action, duration of effect, location, and degree of post-receptor regulation.
3D QSAR techniques like CoMFA and CoMSIA can quantitatively correlate the biological activity of a series of compounds to their 3D molecular properties. CoMFA generates 3D interaction fields around aligned molecules using steric and electrostatic probes, while CoMSIA additionally considers hydrophobic and hydrogen bonding interactions. The document discusses these techniques and provides an example case study applying CoMFA to develop a QSAR model for human eosinophil phosphodiesterase inhibitors with a cross-validated R2 of 0.565. In conclusion, 3D QSAR is a valuable tool for understanding structure-activity relationships and aiding drug design and discovery efforts.
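The cross-validated R² (q²) quoted for the CoMFA model can be illustrated with a leave-one-out calculation: refit the model with each compound held out, predict the held-out activity, and compare the prediction error (PRESS) to the total variance. A minimal sketch, using a single invented descriptor and invented activities in place of a full PLS field model:

```python
# Cross-validated R^2 (q^2) sketch, as used to judge 3D-QSAR models.
# A one-descriptor ordinary-least-squares model stands in for a full
# CoMFA/PLS model; descriptor and activity values are invented.

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

def q2_loo(xs, ys):
    """Leave-one-out cross-validated R^2."""
    press = 0.0
    for i in range(len(xs)):
        tx = xs[:i] + xs[i + 1:]
        ty = ys[:i] + ys[i + 1:]
        a, b = fit_line(tx, ty)
        press += (ys[i] - (a * xs[i] + b)) ** 2
    my = sum(ys) / len(ys)
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - press / ss_tot

descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]    # e.g. a steric field value
activity   = [2.1, 3.9, 6.2, 7.8, 10.1]   # e.g. pIC50 (invented)
print(round(q2_loo(descriptor, activity), 3))
```

A q² above about 0.5, as in the case study's 0.565, is the conventional threshold for a 3D-QSAR model with some predictive value.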
This document discusses gene mapping and sequencing. It defines key terms like gene, genome, and gene mapping. It describes different types of gene mapping including linkage mapping and physical mapping. It also discusses various genetic markers used in mapping like RFLPs, SNPs, AFLPs, RAPDs, SSLPs, microsatellites, and minisatellites. Details are provided on techniques like RFLP analysis, RAPD, AFLP, and their advantages and limitations. The document also covers Sanger sequencing, the chain termination method, and the chemical cleavage method developed by Maxam and Gilbert.
Pharmacogenomics is a new drug discovery approach. It is the study of how genes affect a person's response to drugs, combining pharmacology and genomics.
Functional genomics, Pharmacogenomics, and Metagenomics (sangeeta jadav)
Functional genomics, pharmacogenomics, and metagenomics are methods for determining the roles of genes. Functional genomics examines gene function through techniques like gene knockouts, microarrays, and RNA sequencing. Pharmacogenomics studies how genes affect individual responses to drugs in terms of metabolism and efficacy. Metagenomics analyzes the collective genomes of microbial communities through direct environmental DNA sequencing without requiring isolation of individual species.
Antisense technologies and antisense oligonucleotides (RangnathChikane)
Antisense oligonucleotides are short strands of nucleic acids that bind to messenger RNA (mRNA) and inhibit gene expression. There are three main approaches for antisense oligonucleotides based on their ability to activate RNase H enzyme and resist nucleases. Antisense oligonucleotides show potential for treating various diseases like cancer, neurological disorders, and viral infections by hybridizing to target mRNA. However, challenges remain around protecting the oligonucleotides from degradation and improving cellular uptake and target specificity. Overall, antisense technology provides a novel approach for modulating gene expression and developing new therapies.
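The hybridization step described above follows from base-pairing: an antisense oligo is the reverse complement of its target mRNA region (written as DNA, so U pairs with A and the oligo carries T). A minimal sketch with an invented target sequence:

```python
# Antisense oligonucleotide sketch: the antisense DNA oligo is the
# reverse complement of the target mRNA stretch, written 5'->3'.
# The target fragment below is invented for illustration.

COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}

def antisense_oligo(mrna):
    """DNA oligo (5'->3') complementary to an mRNA stretch (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(mrna))

target = "AUGGCCUUC"            # 5'->3' mRNA fragment
print(antisense_oligo(target))  # GAAGGCCAT
```

Real design additionally screens candidates for length, GC content, secondary structure, and off-target matches, none of which this sketch attempts.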
This document summarizes a seminar presentation on the role of pharmacogenetics and pharmacogenomics in drug development and regulatory review. It discusses how genetic differences can impact how individuals metabolize and respond to drugs in terms of pharmacokinetics and pharmacodynamics. Specific polymorphisms in enzymes, receptors, and transporters were highlighted as factors that can influence a drug's effects. The importance of considering pharmacogenomic data during drug trials and regulatory reviews was emphasized in order to optimize dosing and labeling recommendations to improve drug efficacy and safety.
Genetic variation and its role in health pharmacology (Deepak Kumar)
Genetic variation exists at different scales, from single nucleotide polymorphisms within individuals of a species to larger structural differences between species. Genetic variation arises through mutations, recombination, gene flow, genetic drift, and the interaction of these processes over time. The effective population size of a species influences how genetic variation is shaped by these evolutionary forces.
This document discusses protein structure prediction and molecular modeling. It begins with an overview of the druggable genome and protein structure prediction approaches such as ab initio modeling, threading, and homology modeling. It then provides details on homology modeling steps including searching databases, selecting templates, aligning sequences, building models, and model evaluation. The document also discusses protein-ligand docking, scoring functions, assessing docking performance, and practical aspects of docking such as protein and ligand preparation.
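The template-selection step in homology modeling can be illustrated by ranking candidate templates on percent sequence identity to the target over a precomputed alignment. A minimal sketch with invented sequences and hypothetical template names (`1abc`, `2xyz`); `-` marks an alignment gap:

```python
# Homology-modeling template selection sketch: rank candidate templates
# by percent identity to the target across aligned, non-gap columns.
# Sequences and template identifiers are invented for illustration.

def percent_identity(aln_a, aln_b):
    """Identity over aligned columns where neither sequence has a gap."""
    pairs = [(a, b) for a, b in zip(aln_a, aln_b) if a != "-" and b != "-"]
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)

target = "MKTAYIAK-QR"
templates = {
    "1abc": "MKTAYLAKGQR",
    "2xyz": "MSTQYLVKGQK",
}
ranked = sorted(templates,
                key=lambda t: percent_identity(target, templates[t]),
                reverse=True)
print(ranked)  # ['1abc', '2xyz']
```

In practice identity is only a first filter; template resolution, coverage, and conserved functional residues also drive the choice.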
G-protein coupled receptors (GPCRs) are the largest family of membrane receptors and the target of many drugs. They have seven transmembrane domains and signal through G proteins and second messengers like cAMP or IP3. Upon ligand binding, the GPCR activates a G protein that then activates or inhibits downstream effectors to produce a cellular response. GPCRs regulate many physiological functions and half of all drugs target these receptors. Recent research focuses on deorphaning orphan GPCRs and understanding how GPCR mutations can cause disease.
This document discusses the use of single nucleotide polymorphisms (SNPs) in pharmacogenomic studies. It begins by introducing personalized medicine and pharmacogenetics/pharmacogenomics. SNPs are described as the most common type of human genetic variation and are important in pharmacogenomic studies as they can affect drug metabolism and response. Methods for detecting SNPs like DNA sequencing and microarrays are presented. Examples are given of how SNPs in genes like TPMT, CYP2D6, and UGT1A1 can affect drug metabolism and dosing for medications like 6-mercaptopurine, codeine, and irinotecan. The SNP Consortium is summarized as a public effort to map SNPs to aid pharmacogenomic research.
This document discusses G-protein coupled receptors (GPCRs):
- GPCRs are the largest family of transmembrane receptors and play important roles in human senses and physiology. They respond to a wide range of ligands.
- GPCRs are classified into families based on sequence homology and functional similarity. The largest family is rhodopsin-like receptors which have 7 transmembrane domains.
- GPCRs are activated via conformational changes in the transmembrane domains, primarily involving movements of "microswitches" that allow the receptor to activate associated G proteins and downstream signaling.
3D QSAR approaches relate the biological activity of compounds to their 3D structural properties using statistical analysis. CoMFA is a commonly used 3D-QSAR method that involves aligning molecules, placing them in a grid, calculating electrostatic and steric field properties at each point, and correlating these descriptors to biological activity using PLS analysis. CoMFA results are often displayed as contour plots that identify regions where certain molecular properties increase or decrease activity. X-ray crystallography and NMR spectroscopy can provide experimental data on bioactive conformations.
The objectives are an understanding of:
▶ How “function” in a genomic context can be defined. This is a
surprisingly philosophical problem.
▶ Some strategies for determining if a genomic region is likely to
be functional or not.
This document provides background information on genetic sequencing techniques. It begins with a brief history of Sanger sequencing and its role in decoding genetic sequences. It then discusses how DNA can be separated by size using gel electrophoresis, noting that polyacrylamide gels allow for greater resolution than agarose gels. The document goes on to explain how Sanger sequencing works and some improvements that were made over time. It also introduces next-generation sequencing techniques and discusses their advantages over Sanger sequencing in providing massively parallel sequencing at lower cost.
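The core of Sanger sequencing, reading terminal bases off size-separated chain-terminated fragments, can be sketched as follows. The fragment data are invented and stand in for a gel lane or capillary trace:

```python
# Sanger-read sketch: each chain-terminated fragment ends in a labeled
# ddNTP; sorting fragments by length (as electrophoresis does) and
# reading each terminal base reconstructs the sequence.
# Fragment lengths and terminal bases are invented for illustration.

def read_gel(fragments):
    """fragments: list of (length, terminal_base); shortest runs first."""
    return "".join(base for _, base in sorted(fragments))

# Fragments of lengths 1..6 with their terminating ddNTPs
fragments = [(3, "T"), (1, "A"), (6, "G"), (2, "C"), (5, "A"), (4, "T")]
print(read_gel(fragments))  # ACTTAG
```

This also shows why resolution matters: the method only works if fragments differing by a single nucleotide can be separated, which is why polyacrylamide gels (and later capillaries) are used rather than agarose.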
The document provides information about genomics and the Human Genome Project. It defines genomics as the study of the structure and function of entire genomes. It describes the goals of the Human Genome Project as identifying all human genes, determining DNA sequences, and making the data publicly available. Sequencing techniques used include shotgun sequencing and Sanger sequencing. The document also discusses how DNA is amplified and prepared for sequencing.
The document provides an overview of the Human Genome Project (HGP). It describes the HGP's goal of mapping and sequencing the entire human genome. The HGP was an international research effort that ran in parallel with a private company, Celera Genomics, to complete a rough draft of the human genome by 2000. The completion of the HGP marked a major scientific achievement and has transformed fields like medicine, biotechnology, and genetics by providing a comprehensive map of the human genetic code.
This was done by me and my teammates at Wayamba University, Sri Lanka, for our project. We have now decided to allow this file to be downloaded. I would be grateful if you could send your comments.
I'm willing to help you with similar work. I'm in the final year of my degree (BSc Biotechnology).
pubudu_gokarella@yahoo.com
The document summarizes research characterizing promoter-associated short tandem repeats (STRs) in the human genome and investigating their potential functional effects by analyzing associations with gene expression (eQTLs) and DNA methylation (mQTLs). The researchers targeted over 6,000 promoter-associated STRs in 120 individuals and identified STR variants correlated with local transcription and methylation, providing evidence that STRs can influence gene regulation and represent an overlooked source of genetic and phenotypic variation.
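The eQTL analysis described above boils down to testing, for each STR, whether repeat number correlates with the nearby gene's expression across individuals. A minimal sketch using the Pearson correlation on invented values:

```python
# eQTL-style sketch: Pearson correlation between STR repeat number and
# a gene's expression level across individuals. All values are invented.
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

repeats    = [8, 10, 12, 14, 16]        # STR allele length per individual
expression = [5.1, 6.0, 7.2, 7.9, 9.0]  # normalized expression (invented)
print(round(pearson_r(repeats, expression), 3))
```

A real study would use many more individuals, regress out covariates, and correct for the thousands of tests performed across loci.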
Introduction to Apollo: A webinar for the i5K Research CommunityMonica Munoz-Torres
This document provides an introduction and outline for a webinar on using the Apollo genome annotation editing tool. It was presented by Monica Munoz-Torres of BBOP to the i5K Research Community. The webinar aimed to help participants better understand genome curation in the context of automated and manual annotation. It also aimed to familiarize participants with Apollo's functionality and how to identify homologs of known genes, corroborate gene models using evidence, and modify automated annotations in Apollo. The document includes sections on genome sequencing projects, the objectives and uses of genome annotation, and a biological refresher on concepts relevant to manual annotation like genes, transcription, translation, and genome curation steps.
This document discusses challenges and opportunities in applying mRNA sequencing (mRNAseq) to non-model organisms. It describes using digital normalization as a way to cope with having a massive amount of lamprey mRNAseq data but an incomplete genomic reference. Digital normalization enables assembly and analysis that would otherwise not be possible. The document also discusses applying digital normalization to ascidian mRNAseq data, where it results in substantial time savings and comparable transcriptomes to those assembled without normalization. Finally, it discusses next steps for lamprey transcriptome analysis including enabling various evolutionary and biological studies.
Comparative genomics is the study of genome structure and function across species. Sequencing every genome in full is impractical: there are vast numbers of species, genomes are large, and individuals within a species have genetically distinct genomes. Comparative genomics addresses these issues with approaches such as gene prediction algorithms that exploit features of protein-coding regions, and tools that find putative genes or syntenic regions between genomes.
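The simplest protein-coding feature a gene predictor can exploit is an open reading frame: an ATG start codon followed, in the same frame, by an in-frame stop. A minimal sketch on an invented sequence (real predictors add codon bias, splice signals, and cross-species homology):

```python
# Gene-prediction sketch: scan the forward strand for open reading
# frames (ATG ... in-frame stop). The sequence is invented.

STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=2):
    """(start, end) of forward-strand ORFs; end excludes the stop codon."""
    orfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        for end in range(start + 3, len(seq) - 2, 3):
            if seq[end:end + 3] in STOPS:
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end))
                break
    return orfs

print(find_orfs("CCATGGCTTGATTTATGAAATAA"))  # [(2, 8), (14, 20)]
```

The `min_codons` threshold mirrors a real heuristic: very short ORFs occur by chance everywhere, so predictors require a minimum length before calling a gene candidate.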
Investigation of phylogenetic relationships of shrew populations using genetic... (Juan Barrera)
This study investigated the phylogenetic relationships of shrew populations using genetic markers. DNA was extracted from tissue samples of shrews from different geographic regions. Two mitochondrial genes, cytochrome c oxidase subunit 1 (COI) and cytochrome b (cyt-b), were amplified via PCR and sequenced. Genetic markers were used to identify species and construct a phylogenetic tree. The cyt-b gene was found to be more accurate for species identification and phylogenetic analysis of shrews. Some inconsistencies between the COI and cyt-b trees and BLAST results suggest that cyt-b data is more abundant in genetic databases for shrew species comparison.
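The first step of a marker-based phylogenetic analysis like this one is a pairwise distance matrix: for aligned sequences, the p-distance is simply the fraction of sites that differ, which then feeds a tree-building method such as neighbor joining. A minimal sketch with invented aligned sequences:

```python
# Phylogenetics sketch: pairwise p-distance (fraction of differing
# sites) between aligned sequences, the raw input to distance-based
# tree building. Sequences and names are invented for illustration.

def p_distance(a, b):
    """Fraction of aligned sites at which the two sequences differ."""
    diffs = sum(1 for x, y in zip(a, b) if x != y)
    return diffs / len(a)

seqs = {
    "shrew_A": "ATGCATGCAT",
    "shrew_B": "ATGCATGGAT",
    "shrew_C": "ATGTTTGGAT",
}
names = sorted(seqs)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        a, b = names[i], names[j]
        print(a, b, p_distance(seqs[a], seqs[b]))
```

Real analyses replace the raw p-distance with a substitution model (e.g. Kimura two-parameter) to correct for multiple hits at the same site.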
This document discusses challenges and opportunities in applying mRNA sequencing (mRNAseq) to non-model organisms. It describes using digital normalization to cope with large amounts of lamprey mRNAseq data that would otherwise be too computationally intensive to assemble. Digital normalization was applied successfully to Molgula ascidian mRNAseq data, enabling transcriptome analysis. The lamprey transcriptome was assembled from over 5 billion reads from 50 tissues, producing over 600,000 transcripts. Next steps include addressing contamination issues and using the more complete transcriptome to enable various evolutionary and biological studies of lamprey. The document advocates making protocols and data openly available to help characterize genes in non-model organisms.
What is bioinformatics?
About human genome
Human genome project
Aim of human genome project
History
Sequencing Strategy
Benefits of Human Genome Project research
Disadvantages of human genome project
Conclusion
References
This Presentation will be helpful to undergraduate and postgraduate students of biology and biotechnology in understanding the significance of COT curves in determination of gene and genome complexity amoug various organisms
Apollo - A webinar for the Phascolarctos cinereus research communityMonica Munoz-Torres
This document provides an overview of a webinar introducing the Apollo genome annotation tool. The webinar aims to help the koala genome research community better understand genome curation processes involving automated annotation and manual curation using Apollo. It outlines the webinar topics which will explain gene prediction, the Apollo interface for collaborative curation, and demonstrations of identifying gene homologs and modifying automated annotations. The webinar aims to familiarize participants with genome curation concepts and the Apollo tool.
Dan Graur - Can the human genome be 100% functional?Andrei Afanasiev
(1) The document discusses the concept of genomic function and argues that not all parts of the genome are functional. It defines functional DNA as DNA whose sequence is under selection, and whose absence would impair fitness.
(2) The document discusses different concepts of biological function and argues the appropriate concept is "selected effect function" - a part's historical role in adaptation and survival, not just causal roles.
(3) It argues activity like transcription does not prove function, and most genome is likely nonfunctional or neutral "rubbish". Function should be defined by its evolutionary role, not biochemical properties.
The document discusses DNA methylation, repetitive DNA sequences, the C-value paradox, and their relationships. It provides background on DNA methylation, how it regulates gene expression and is involved in diseases. It describes highly repetitive and satellite DNA sequences. It explains that the C-value paradox stemmed from the observation that genome size did not correlate with complexity, but this was later resolved by discovering non-coding DNA. The paradox questioned how genome size related to gene number.
Similar to ppgardner-lecture03-genomesize-complexity.pdf (20)
2. Introduction
▶ I started at Otago in 2018.
▶ Taught bioinformatics, genomics & biostatistics for 6 years at
the University of Canterbury.
▶ Worked for ∼ 10 years in Europe. I’ve lived in Bielefeld,
Copenhagen and Cambridge. Taught courses in Denmark,
Cambridge, Poland, Portugal, Australia, USA, and NZ.
▶ Originally from Whāngārā, married with three children.
6. Ohno S (1972) So Much “Junk DNA” in our Genome. Brookhaven Symposium on Biology. – available on
Blackboard.
7. Summary of Susumu Ohno’s main points about junk DNA
▶ The human genome is roughly 3 × 10⁹ bases, ≈ 750 times the size of the E. coli genome.
▶ Assuming the same gene density as E. coli, humans would have ≈ 3 million genes.
▶ HOWEVER, if gene density is preserved, then salamanders have 10 times the number of genes that we do.
▶ There may be a strict upper limit to the number of gene loci.
▶ Therefore, only a fraction of human DNA is “genic”.
▶ Argues, based upon mutation rates, that the number of genes must be less than 1/(mutation rate) to avoid mutational meltdown.
▶ The number of genes is ≈ 30,000, therefore about 6% of our DNA is genic.
▶ Speculates about some hypothetical functions of non-coding
DNA, e.g. chromosome structure, promoter/operators,
inhibiting non-sense/frameshift mutations, ...
▶ Speculates that our genome is littered with the “fossil”
remains of extinct genes.
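Ohno’s back-of-the-envelope reasoning above can be reproduced in a few lines. The E. coli figures, the per-locus mutation rate and the genic length per locus below are illustrative assumptions chosen to reproduce the slide’s numbers, not values taken from the 1972 paper.

```python
# Sketch of Ohno's (1972) gene-number arithmetic; the E. coli figures,
# mutation rate and per-locus genic length are illustrative assumptions.

human_genome = 3e9   # bases
ecoli_genome = 4e6   # bases, roughly 750-fold smaller
ecoli_genes = 4_000  # approximate E. coli gene count

# 1. Naive extrapolation: assume human gene density equals E. coli's.
gene_density = ecoli_genes / ecoli_genome          # genes per base
naive_human_genes = gene_density * human_genome    # ~3 million genes

# 2. Mutational-load argument: gene number must stay below
#    1 / (deleterious mutation rate per locus) to avoid meltdown.
mutation_rate = 1e-5                 # illustrative per-locus rate
max_genes = 1 / mutation_rate        # ~100,000 loci

# 3. With ~30,000 genes at ~6 kb of genic DNA per locus,
#    only a small fraction of the genome is genic.
genes, genic_len = 30_000, 6_000
genic_fraction = genes * genic_len / human_genome  # ~6%

print(f"naive gene count: {naive_human_genes:,.0f}")
print(f"mutational-load limit: {max_genes:,.0f} loci")
print(f"genic fraction: {genic_fraction:.0%}")
```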
8. Vertebrate genomes & junk DNA
▶ Vary in length by two orders of magnitude, e.g. bird genomes
are ≈ 1Gb, while salamander genomes are ≈ 32Gb (giant
lungfish genome is ≈ 43Gb).
▶ Variation largely driven by decaying remnants of transposons.
▶ ≈ 300Mb of sequence is conserved across the vertebrates.
▶ Randomly generated sequences, when inserted into genomes, are also transcribed; promoters are very degenerate.
▶ I.e. many different sequences can be bound by a transcription
factor
▶ Which suggests the number of functional elements should not
scale with genome length.
Ohno S (1972) So much ‘junk’ DNA in our genome. In Evolution of Genetic Systems, Brookhaven Symp. Biol.
9. Mammalian genome variation & junk DNA
▶ But salamanders are special... (if genome sizes were larger in birds than in salamanders, we could rationalise how complex birds are).
▶ Let’s look at just the mammalian extremes of genome size then... the red viscacha rat, the mountain viscacha rat and a bat!
Which of these mammals is more complex?
Southern bent-wing bat (Miniopterus orianae bassanii): ~1.6 Gb genome
Mountain viscacha rat (Octomys mimax): ~3.2 Gb genome
Red (plains) viscacha rat (Tympanoctomys barrerae): ~6.2 Gb genome
Evans et al. (2017) Evolution of the Largest Mammalian Genome. GBE.
10. 30 years later we forgot about Ohno’s line of reasoning...
Editorial (2000) The nature of the number. Nature genetics.
Willyard (2018) New human gene tally reignites debate. Nature.
Hatje et al. (2019) The Protein-Coding Human Genome: Annotating High-Hanging Fruits. BioEssays.
11. Genome sizes & number of genes
[Figure: number of protein-coding genes (10¹–10⁵) plotted against genome length (10⁴–10¹⁰ bases) across a transect from phage, bacteria and archaea to protists, fungi, plants and animals; labelled taxa include Escherichia phage, Nanoarchaeum, Mycoplasma, Escherichia, Lokiarchaeum, Gemmata, Saccharomyces, Arabidopsis, Homo and Paramecium. A second panel depicts cellular compartments (nucleus, cytoplasm, ER, Golgi, lysosome, mitochondria) and their interactions.]
▶ Prediction: “In fact, there seems to be a strict upper limit for
the number of genes loci which we can afford to keep in our
genome” (Ohno, 1972)
▶ Eukaryotic genomes can vary widely in size, with little change
in the number of protein-coding genes. Gardner, Gumy & Fineran (2019)
Unpublished.
12. Related to junk DNA, is the “C-value paradox”
▶ C-values (a.k.a. genome size) vary enormously between
closely related species.
▶ Vertebrates: birds (1 Gb), bats (1.6 Gb), human (3 Gb),
tuatara (5 Gb), salamander (32 Gb), lungfish (130 Gb).
▶ Allium (onions): 7 Gb to 32 Gb.
▶ There is little relationship between the “perceived
complexity” (or the corresponding number of genes) of a
species and the size of a genome.
▶ Given that C-values are relatively constant within species/cells, yet bear no relationship to complexity or the presumed number of genes, this is somewhat of a paradox.
https://en.wikipedia.org/wiki/C-value#C-value_paradox
Eddy, SR (2012) The C-value paradox, junk DNA and ENCODE. Current Biology.
Palazzo & Gregory (2014) The Case for Junk DNA. PLoS Genetics.
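The disconnect between genome size and gene number is easy to tabulate. The sizes and gene counts below are rough, commonly cited approximations, included purely for illustration.

```python
# C-value paradox in miniature: genome size spans ~4 orders of magnitude
# here, while protein-coding gene counts span less than one order.
# All values are rough, commonly cited approximations.

genomes = {  # name: (genome size in Mb, approx. protein-coding genes)
    "E. coli":         (4.6,     4_300),
    "S. cerevisiae":   (12,      6_000),
    "D. melanogaster": (140,    14_000),
    "H. sapiens":      (3_200,  20_000),
    "A. mexicanum":    (32_000, 23_000),  # axolotl (a salamander)
}

for name, (size_mb, n_genes) in genomes.items():
    density = n_genes / size_mb  # gene density collapses as size grows
    print(f"{name:16s} {size_mb:>8,.0f} Mb  {n_genes:>6,d} genes  "
          f"{density:7.1f} genes/Mb")
```

Gene density, not gene number, is what varies over orders of magnitude, which is exactly what one expects if the extra DNA is largely non-genic.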
13. Drivers of genome size variation
C-value variation is largely driven by transposable elements and
other repetitive DNA elements
https://sandwalk.blogspot.com/2018/03/whats-in-your-genome-pie-chart.html
http://www.dfam.org/entry/DF0000001/hits
14. Protein functions, 2023 – still lots of unknowns
cytoskeletal protein 3% (627)
transporter 5% (1031)
scaffold/adaptor protein 4% (746)
protein-binding activity modulator 4% (793)
RNA metabolism protein 4% (827)
gene-specific transcriptional regulator 7% (1516)
defense/immunity protein 3% (664)
metabolite interconversion enzyme 9% (1939)
protein modifying enzyme 8% (1622)
transmembrane signal receptor 6% (1151)
Unclassified 33% (6695)
Other 14% (2980)
Unknown 0% (0)
20,851 Human Proteome Functions 2023
Gardner (2023)
Data from Panther: http://www.pantherdb.org
15. The ENCODE Project 2012
▶ 443 authors, from genome/analysis centres around the world
▶ Aim to map functional elements across the human genome
▶ Generated LOTS of important data that we are still using:
▶ Transcription: RNA-seq, CAGE (Cap Analysis Gene
Expression), RNA-PET (paired end tag)
▶ Translation: mass spectrometry
▶ Transcription-factor-binding sites: ChIP-seq and DNase-seq
▶ Chromatin structure: DNase-seq, FAIRE-seq, histone
ChIP-seq and MNase-seq
▶ DNA methylation sites: RRBS assay
▶ Largely used 3 cell lines: erythroleukaemia K562;
B-lymphoblastoid GM12878 & H1 embryonic stem cells
The ENCODE Project Consortium (2012) An integrated encyclopedia of DNA elements in the human genome.
Nature.
17. ENCODE 2012: LOTS of cool results (30 papers)
E.g. transcription is correlated (r = 0.9) with transcription factor
binding (left) and cancer associated SNPs are linked to chromatin
structure, which may influence gene expression (right).
18. ENCODE 2012
▶ Acknowledged that “Comparative genomic studies suggest
that 3% to 8% of bases are under purifying (negative)
selection”
▶ Yet conclude “The vast majority (80.4%) of the human
genome participates in at least one biochemical RNA- and/or
chromatin-associated event in at least one cell type.”
▶ “Many non-coding variants in individual genome sequences lie
in ENCODE-annotated functional regions; this number is at
least as large as those that lie in protein-coding genes.”
▶ “Single nucleotide polymorphisms (SNPs) associated with
disease by GWAS are enriched within non-coding functional
elements”
20. And there was a predictable scientific backlash:
▶ Eddy, SR (2012) The C-value paradox, junk DNA and
ENCODE. Current Biology.
▶ Q&A style discussion of C-values, junk DNA and ENCODE
▶ Eddy, SR (2013) The ENCODE project: Missteps
overshadowing a success. Current Biology.
▶ “all reproducible biochemical events were claimed to be
‘critical’ and ‘needed’.”
▶ “Far from disproving junk DNA, ENCODE’s operationalized
definition of function included junk DNA”.
▶ “genomes contain a lot of DNA that is not critically important
for host functions, much of which arises from mobile element
replication”.
▶ Sean Eddy proposes “The Random Genome Project: the missing negative control”.
▶ “Biology is noisy” therefore an assessment of noise in
transcription, translation & DNA binding would allow an
assessment of how much of the ENCODE results are noise,
and how much is genuinely functional.
▶ Recent progress: Deciphering eukaryotic gene-regulatory logic
with 100 million random promoters
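A minimal sketch of that missing negative control, assuming a uniformly random genome: random DNA already contains many ORF-like features, so biochemical “activity” alone is weak evidence of function. The genome length and ORF threshold here are arbitrary illustrative choices.

```python
import random

# Toy "random genome" negative control (after Eddy's proposal):
# count ORF-like features in uniformly random DNA. The sequence
# length and ORF threshold are arbitrary illustrative choices.

random.seed(42)
genome = "".join(random.choice("ACGT") for _ in range(100_000))

STOPS = {"TAA", "TAG", "TGA"}

def count_orfs(seq, min_codons=30):
    """Count ATG-to-stop open reading frames of >= min_codons codons."""
    n = 0
    for frame in range(3):           # forward reading frames only
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon == "ATG" and start is None:
                start = i            # first ATG opens a candidate ORF
            elif codon in STOPS and start is not None:
                if (i - start) // 3 >= min_codons:
                    n += 1
                start = None         # stop codon closes the candidate
    return n

# Random sequence still yields "ORFs" above the length threshold.
print(count_orfs(genome, min_codons=30))
```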
21. And further responses
▶ Doolittle WF (2013) Is junk DNA bunk? A critique of
ENCODE. PNAS.
▶ If the number of functional elements were to rise significantly
with C-value then
▶ (i) organisms with larger genomes are more complex
phenotypically
▶ NB. Organism/phenotypic complexity is generally not well
defined.
▶ (ii) ENCODE’s definition of a functional element identifies
many sites that would not be considered functional or
phenotype-determining by standard uses in biology
▶ (iii) the same phenotypic functions are often determined in a
more diffuse fashion in larger-genomed organisms
▶ Graur D et al. (2013) On the immortality of television
sets:“function” in the human genome according to the
evolution-free gospel of ENCODE. Genome biology and
evolution.
▶ “a very forcefully worded critique” – W. Ford Doolittle
▶ “angry, dogmatic, scattershot, sometimes inaccurate” – Sean
Eddy
22. Note: Transcription & translation are likely to be noisy
processes
[Diagram: the genome is partitioned into functional and junk DNA; each fraction can be transcribed or untranscribed, and transcripts translated or untranslated, feeding into the observed transcriptome and proteome.]
23. Promoters/Transcription factor binding motifs are
degenerate
▶ “Degenerate” motifs refers to the fact that many different
sequences can promote transcription
▶ This implies there is lots of potential for random, noisy
transcription
http://jaspar.genereg.net/
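This potential for noisy transcription can be quantified with a toy scan. The TATAWAW pattern (W = A or T) used below is a simplified stand-in for a degenerate TATA-box-like consensus; the sequence length and motif choice are illustrative assumptions.

```python
import random
import re

# How often does a degenerate promoter-like motif hit random DNA?
# TATAWAW (W = A/T) is a simplified TATA-box-style consensus used
# here purely for illustration.

random.seed(1)
seq = "".join(random.choice("ACGT") for _ in range(1_000_000))

tata = re.compile("TATA[AT]A[AT]")   # degenerate consensus
hits = tata.findall(seq)

# Expected by chance: 1e6 * (1/4)^5 * (1/2)^2 ~ 244 matches,
# i.e. roughly one candidate "promoter" every ~4 kb of random DNA.
print(len(hits))
```

Because many distinct sequences satisfy a degenerate consensus, any genome-length sequence presents a very large number of candidate binding sites by chance alone.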
24. How would you determine if a genomic region is
“functional”?
▶ Lecture #7
25. Humans are not the most (or least) complex species!
Big brains, hands, sure. But we can’t breathe underwater, regrow limbs, don’t have tentacles, can’t fly unassisted, can’t photosynthesise; we melt under mild radiation, turn inside out in a vacuum, get cancers, circulatory disease and infectious disease, do stupid things, get too hot and too cold, ...
Humans are weak and puny!
Great Chain of Being
https://en.wikipedia.org/wiki/Category:Obsolete biological theories
26. The main points
▶ C-values (genome sizes) vary dramatically, even between
closely related species.
▶ The “C-value paradox” is that genome size is not proportional
to the perceived organism “complexity”.
▶ Many “functional” non-coding elements exist in the genome
(e.g. ncRNAs, promoters, etc.)
▶ The majority of eukaryotic genome sequence is littered with
“junk” DNA.
▶ These are largely pseudogenised transposable elements and
endogenous retroviruses.
▶ NB. junk ≠ inactive AND non-coding ≠ junk
▶ ENCODE is a valuable resource for identifying “biochemical”
activity in cell lines.
▶ Some of the ENCODE conclusions and press are controversial.
▶ Reports of the death of junk DNA are greatly exaggerated.
27. Self-evaluation exercises
▶ Outline the C-value paradox, include in your answer a
definition of junk DNA.
▶ What are the main sources of junk DNA?
▶ Describe the ENCODE project and main conclusions.
▶ Outline an approach to distinguish between functional and
noisy cellular processes.
▶ The below figures have been used to associate proportions of
noncoding DNA with “complexity”. Using your knowledge of
the C-value paradox, write a critique of this result.
Mattick (2004) The hidden genetic program of complex organisms. Scientific American.
Taft et al. (2007) The relationship between non-protein-coding DNA and eukaryotic complexity. BioEssays.
28. Self-evaluation exercises
▶ Use chatGPT to generate an essay on “genome function and
junk DNA”. Critique the essay based upon the references
presented in this lecture.
AI generated art by https://creator.nightcafe.studio/studio, “Junk DNA”, Preset: Surreal
29. Suggested reading
▶ Important: Ohno S (1972) So Much “Junk DNA” in our
Genome. Brookhaven Symposium on Biology.
▶ Important: Eddy, SR (2013) The ENCODE project: Missteps
overshadowing a success. Current Biology.
▶ Extra: Palazzo & Gregory (2014) The Case for Junk DNA.
PLOS Genetics.
30. Questions relating to my lectures can be asked & viewed
here:
https://docs.google.com/document/d/1PQd dp7C 0cXA8SwUv-
qrkTOj8c8fUAt-U Z5dg2yc8/edit?usp=sharing