This document provides an overview of molecular modelling techniques used for in silico drug discovery. It discusses using computational approaches to model small molecule and protein interactions to assess drug safety and efficacy. The key techniques covered include obtaining protein structures from databases like PDB, simulating molecular interactions through docking and screening, and considering factors like binding affinity, pharmacokinetics and toxicity during the drug design process. Computational protein structure prediction is also discussed as an important technique when experimental structures are unavailable.
1. Molecular modelling for In silico drug
discovery:
modelling small molecules and proteins
Dr Lee Larcombe
leelarcombe@gmail.com
2. Lecture Aim
This lecture aims to provide a basic understanding of
the concept of protein and molecular in silico
engineering/design as part of the drug development
process:-
Introducing theory and approaches, drivers, databases
and software – and with a focus on safety and efficacy.
3. This Lecture Covers
• Drivers for use of computational approaches
• Getting protein structures
• Simulation of molecular interactions
• Considering safety during design
• We will also highlight key software or data sources along
the way
5. Business
[Figure: drug discovery pipeline with the relative cost (£ to ££) of each stage]
• Stages: target identification, lead selection, lead refinement, pre-clinical phases
• Approaches shown: genomics, proteomics/metabolomics, interaction networks, molecular modelling, protein modelling, chemoinformatics, data modelling, systems biology, in vitro, in vivo
6. Ethics Drivers
• Use of animals in research
• 3Rs – Refine, Reduce, Replace
• Relevance of animal data for human use
• Extrapolation across species
• Improvement of safety for subsequent trials
• Regulatory requirements and change
7. Extrapolation of data across
species
How relevant is animal physiology to human physiology ?
Models not available for all diseases
Choice of species can be important
• 30% attrition due to no efficacy in man
• 10% attrition due to toxicity
For biologics, even more difficult to predict
9. Safety and Efficacy of Small Molecule
Drugs
• Safety: safety issues primarily focus on the potential of
the small molecule to have off-target effects,
metabolite/breakdown product toxicity, or build-up/non-clearance
• Efficacy: efficacy issues focus on bioavailability and good
binding kinetics to the right target protein – including
variations of that protein (SNPs/mutants)
10. 1st we need a source of molecules:
Chemical Repositories
• Databases with safety information (GRS, CAS)
• Databases with structure and vendor/price – individual
chemical supply companies - ZINC
• Databases with multiple information types – ChEMBLdb,
PubChem, KEGG
11. ChEMBLdb
“The ChEMBL database (ChEMBLdb) contains medicinal chemistry bioassay data,
integrated from a wide variety of sources (the literature, deposited data sets, other
bioassay databases). Subsets of ChEMBLdb, relating to particular target classes, or
disease areas, are exported to smaller databases, These separate data sets, and the
entire ChEMBLdb, are available either via ftp downloads, or via bespoke query interfaces,
tailored to the requirements of the scientific communities with a specific interest in these
research areas”
• Targets: 10,579
• Compound records: 1,638,394
• Distinct compounds: 1,411,786
• Activities: 12,843,338
• Publications: 57,156
(release 19)
13. Basic Requirements for modelling
1. Representation of atomic coordinates
2. Scoring
3. Searching
14. Structure Representation
• How much information do you want to include?
• atoms present
• connections between atoms
• bond types
• stereochemical configuration
• charges
• isotopes
• 3D-coordinates for atoms
C8H9NO3
15. Structure Representation
• 3D-coordinates for atoms
• connections between atoms
• bond types
[Figure: 2D structure diagram of the example molecule]
16. Structure Representation - InChI
http://en.wikipedia.org/wiki/International_Chemical_Identifier
Morphine
InChI=1S/C17H19NO3/c1-18-7-6-17-10-3-5-13(20)16(17)21-15-12(19)4-2-9(14(15)17)8-11(10)18/h2-5,10-11,13,16,19-20H,6-8H2,1H3/t10-,11+,13-,16-,17-/m0/s1
The condensed, 27-character standard InChIKey is a hashed version of the full standard InChI (using the SHA-256 algorithm), designed to allow for easy web searches of chemical compounds.
BQJCRHHNABKAKU-KBQPJGBKSA-N
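The layered form of the InChI above can be seen by splitting the string on "/". This is a minimal sketch using only the Python standard library; the function name split_inchi_layers and the dictionary keys are made up for this example and are not part of any InChI toolkit. It only separates the layers; it does not validate chemistry or compute an InChIKey.

```python
# Illustrative sketch: split a standard InChI string into its "/"-separated layers.
# split_inchi_layers and LAYER_NAMES are hypothetical names for this example only.

LAYER_NAMES = {
    "c": "connections",   # which atoms are bonded to which
    "h": "hydrogens",     # where the hydrogens sit
}

def split_inchi_layers(inchi):
    """Return a dict of InChI layers keyed by a readable name or the raw prefix."""
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    parts = inchi[len("InChI="):].split("/")
    layers = {"version": parts[0], "formula": parts[1]}
    for part in parts[2:]:
        prefix = part[0]
        # Stereochemistry sublayers (t, m, s) are kept under their raw prefix letter.
        layers[LAYER_NAMES.get(prefix, prefix)] = part[1:]
    return layers

MORPHINE_INCHI = (
    "InChI=1S/C17H19NO3/c1-18-7-6-17-10-3-5-13(20)16(17)21-15-12(19)4-2-9"
    "(14(15)17)8-11(10)18/h2-5,10-11,13,16,19-20H,6-8H2,1H3"
    "/t10-,11+,13-,16-,17-/m0/s1"
)
MORPHINE_INCHIKEY = "BQJCRHHNABKAKU-KBQPJGBKSA-N"

layers = split_inchi_layers(MORPHINE_INCHI)
print(layers["formula"])        # molecular formula layer
print(len(MORPHINE_INCHIKEY))   # length of the hashed key, hyphens included
```

The formula layer alone corresponds to the least detailed representation on slide 14; each further layer adds connectivity, hydrogens and stereochemistry.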
17. Scoring (Energy Functions & Force Fields)
Energy can be broken into a sum of potential energy terms
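$$E = E_{bonds} + E_{angles} + E_{torsions} + E_{vdW} + E_{electrostatic}$$
• E_str: stretch
• E_bend: bend
• E_tors: torsion
• E_vdW: van der Waals
• E_el: electrostatic
• E_pol: polarization
[Figure: charge arrangements illustrating electrostatic repulsion and attraction; bond angle θ and torsion angle φ]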
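To make the sum of energy terms concrete, here is a minimal sketch in Python. The slides do not give functional forms, so the ones below (harmonic stretch and bend, a cosine torsion, 12-6 Lennard-Jones, Coulomb) are assumptions chosen because they are common in classical force fields. All parameter values are illustrative and not taken from any real force field.

```python
import math

# Assumed functional forms for each term; parameter values are illustrative only.

def bond_energy(r, r0, k):
    """Harmonic stretch: 0.5 * k * (r - r0)^2."""
    return 0.5 * k * (r - r0) ** 2

def angle_energy(theta, theta0, k):
    """Harmonic bend around the equilibrium angle theta0 (radians)."""
    return 0.5 * k * (theta - theta0) ** 2

def torsion_energy(phi, barrier, periodicity, phase=0.0):
    """Cosine torsion term: 0.5 * barrier * (1 + cos(n*phi - phase))."""
    return 0.5 * barrier * (1.0 + math.cos(periodicity * phi - phase))

def vdw_energy(r, epsilon, sigma):
    """12-6 Lennard-Jones: repulsive at short range, attractive further out."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def electrostatic_energy(r, q1, q2, coulomb_constant=332.06):
    """Coulomb term; the constant is approximate, for kcal/mol with Angstrom and e."""
    return coulomb_constant * q1 * q2 / r

# Total energy is simply the sum of the individual terms.
terms = {
    "bond": bond_energy(1.10, r0=1.09, k=340.0),
    "angle": angle_energy(math.radians(110.0), math.radians(109.5), k=50.0),
    "torsion": torsion_energy(math.radians(60.0), barrier=1.4, periodicity=3),
    "vdw": vdw_energy(3.8, epsilon=0.1, sigma=3.4),
    "electrostatic": electrostatic_energy(3.8, q1=0.4, q2=-0.4),
}
total_energy = sum(terms.values())
print(terms)
print(total_energy)
```

Opposite charges give a negative (attractive) electrostatic term and like charges a positive (repulsive) one, which is the behaviour the repulsion/attraction diagram illustrates.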
18. Searching
Mol Mechanics (static) – minimisation
Mol Dynamics (dynamic) – laws of motion
MD a bit more complicated … need to know about:
• Classical mechanics
• classical equations of motion (EOM)
• e.g. Newton’s equations of motion
If we know these equations we *could* try to search for ALL possible
structures of Proteins and how they fold e.g. Protein Folding
19. Energy Minimisation Theory
• Treat molecule as a set of balls (with mass) connected by rigid
rods and springs
• Rods and springs have empirically determined force constants
• Allows one to treat atomic-scale motions in proteins as classical
physics problems (OK approximation)
20. Energy Minimisation
Local minimum vs global minimum
Many local minima; only ONE global minimum
Methods: Steepest descent, Conjugate gradient, others…
• Efficient way of “polishing and shining” your model
• Removes atomic overlaps and unnatural strains in the structure
• Stabilizes or reinforces strong hydrogen bonds, breaks weak
ones
• Brings molecule to lowest energy
22. Steepest Descent Minimisation
[Figure: energy landscape running from high energy to low energy]
Makes small locally steep moves down gradient
Sufficient if starting point already close to optimal solution (e.g.
refinement of experimental structure)
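Steepest descent and the local-versus-global minimum problem can be sketched on a one-dimensional toy energy function. The function, step size and starting points below are assumptions made for illustration; a real minimiser works on all atomic coordinates at once and usually adapts its step size.

```python
# Toy 1D "energy landscape" with one local and one global minimum (illustrative only).

def energy(x):
    return x ** 4 - 3.0 * x ** 2 + x

def gradient(x):
    return 4.0 * x ** 3 - 6.0 * x + 1.0

def steepest_descent(x, step=0.01, tolerance=1e-8, max_steps=10000):
    """Repeatedly take a small move down the gradient until it is nearly flat."""
    for _ in range(max_steps):
        g = gradient(x)
        if abs(g) < tolerance:
            break
        x -= step * g
    return x

# The minimum reached depends entirely on the starting point.
x_from_right = steepest_descent(2.0)    # ends in the local minimum
x_from_left = steepest_descent(-2.0)    # ends in the global minimum
print(x_from_right, energy(x_from_right))
print(x_from_left, energy(x_from_left))
```

Starting on the right, the search settles in the shallower minimum and never finds the deeper one. This is why the method is sufficient only when the starting structure is already close to the optimum.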
23. I have no idea where this image came from, but it is a very nice illustration of the comparison. If anyone
knows where it is from please let me know!
Molecular Mechanics vs Dynamics
MM calculates just minimum energy state.
MM ignores kinetic energy, does only potential energy
Molecules, especially proteins, are not static.
• Dynamics can be important to function
• Trajectories, not just minimum energy state.
MD takes same force model, but calculates F=ma and calculates
velocities of all atoms (as well as positions)
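The step from F=ma to a trajectory can be sketched for a single particle on a spring. The slides do not name an integration scheme; velocity Verlet is used here as an assumption because it is a common choice in MD codes. The mass, force constant and time step are arbitrary illustrative values.

```python
# One particle on a harmonic spring, integrated with velocity Verlet (assumed scheme).

def force(x, k=1.0):
    """Hooke's law: the same potential a minimiser would use, here driving motion."""
    return -k * x

def velocity_verlet(x, v, mass=1.0, dt=0.01, n_steps=628, k=1.0):
    """Return the trajectory as a list of (position, velocity) pairs."""
    trajectory = [(x, v)]
    a = force(x, k) / mass                      # a = F / m
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt      # update position
        a_new = force(x, k) / mass              # force at the new position
        v = v + 0.5 * (a + a_new) * dt          # update velocity
        a = a_new
        trajectory.append((x, v))
    return trajectory

def total_energy(x, v, mass=1.0, k=1.0):
    """Kinetic plus potential energy; MM alone would only see the potential part."""
    return 0.5 * mass * v * v + 0.5 * k * x * x

trajectory = velocity_verlet(x=1.0, v=0.0)
energies = [total_energy(x, v) for x, v in trajectory]
print(trajectory[-1])
print(min(energies), max(energies))
```

The output is a time series of positions and velocities rather than a single minimum-energy structure, and the total energy stays close to its starting value along the way.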
24. Why simulate the dynamics of (molecular) systems?
• Molecular systems are not static
• Molecules are in dynamic equilibrium
• Properties are averages over dynamic behaviour of
molecules
• Molecular processes are not instantaneous
• Time course (kinetics) of events is important
• Time dependence essential to understand development
and regulation
25. What can we do with chemical
models?
We can investigate structure and similarities of structure
between molecules
We can map structural characteristics to properties (SARs)
We can study molecular interactions – particularly with
proteins
26. Interactions – Docking & Screening
• Computation to assess binding affinity
• Looks for conformational and electrostatic "fit" between
proteins and other molecules
• Optimization: Does position and orientation of the two
molecules minimise the total energy? (Computationally
intensive)
• Docking small ligands to proteins is a way to find potential
drugs. Industrially important!
27. Virtual Screening
• Docking small ligands to proteins is a way to find potential
drugs. Industrially important
• A small region of interest (pharmacophore) can be identified,
reducing computation
• Empirical scoring functions are not universal
• Various search methods:
• Rigid: provides score for whole ligand (accurate)
• Flexible: breaks ligands into pieces and docks them individually
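The idea of searching over poses and scoring the whole ligand can be sketched as a toy rigid search in two dimensions. Everything here is an assumption for illustration: the coordinates, the Lennard-Jones scoring function, and the translation-only grid (no rotations, no flexibility). Real docking programs use far richer scoring and search.

```python
import math

# Toy rigid "docking": slide a rigid ligand over a grid and keep the best-scoring pose.
# Coordinates, scoring function and grid are illustrative assumptions.

def lj_pair(r, epsilon=1.0, sigma=1.0):
    """12-6 Lennard-Jones score for one atom pair; exact overlaps score +infinity."""
    if r < 1e-9:
        return math.inf
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def pose_score(receptor, ligand):
    """Score the whole ligand: sum over every receptor-ligand atom pair."""
    return sum(lj_pair(math.dist(a, b)) for a in receptor for b in ligand)

def translate(ligand, dx, dy):
    """Move the ligand rigidly; its internal geometry never changes."""
    return [(x + dx, y + dy) for x, y in ligand]

def rigid_grid_search(receptor, ligand, lo=-8, hi=20, step=0.25):
    """Try every translation on the grid and return (best_shift, best_score)."""
    best_shift, best_score = None, math.inf
    for i in range(lo, hi + 1):
        for j in range(lo, hi + 1):
            shift = (i * step, j * step)
            score = pose_score(receptor, translate(ligand, *shift))
            if score < best_score:
                best_shift, best_score = shift, score
    return best_shift, best_score

RECEPTOR = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0), (3.0, 3.0)]   # four "pocket" atoms
LIGAND = [(0.0, 0.0), (1.0, 0.0)]                             # rigid two-atom ligand

best_shift, best_score = rigid_grid_search(RECEPTOR, LIGAND)
print(best_shift, best_score)
```

Even this toy case evaluates several hundred poses for a two-atom ligand, which hints at why docking is computationally intensive and why restricting the search to a small region of interest helps.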
28. So – we need protein (target)
structures
http://www.rcsb.org/
29. The PDB
The PDB was established in 1971 at Brookhaven National
Laboratory and originally contained 7 structures. In 1998,
the Research Collaboratory for Structural Bioinformatics
(RCSB) became responsible for the management of the
PDB.
Last year (2013), 9,597 structures were deposited by scientists from all over the world; so far this year (2014), 8,391.
The archive now totals 105,839 structures (as of yesterday).
31. What if there is no structure available?
Can we predict structures?
Tertiary structure is dependent on ‘folding’ of the protein.
Recognition, characterisation, and assignment of domains
and folds is a major area of structural bioinformatics.
Predicting structure from sequence is one of the biggest
challenges...
32. Historical perspective?
Basic secondary structure prediction
Basic methods of secondary structure prediction rely on
statistical applications of ‘propensity’
The propensity/inclination/tendency of an amino acid to
be in a particular structure based on observation of
known datasets
33. Propensity
$$P = \frac{n[I][s] \,/\, n[I]}{n[s] \,/\, n}$$
P = propensity
I = residue of interest
n[I] = number of residues [I] in the database
n = total number of residues in the database
n[I][s] = number of residues [I] in state of interest i.e. helices
n[s] = number of all residues in the database in the state of interest.
34. Example
$$P[A] = \frac{124 / 1640}{1246 / 10136} \approx 0.615$$
So, the helical propensity for Alanine where:
• the number of alanines in the database is 1640,
• and the total number of residues in the database is 10136,
• and where the number of alanines found in helices is 124,
• and the total number of residues found in helices is 1246,
would be approximately 0.615
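The propensity calculation can be written as a short Python function. The function and argument names are made up for this sketch; the counts are the ones from the alanine example.

```python
# Propensity = (fraction of residue I found in the state) / (fraction of all residues in the state).
# Function and argument names are hypothetical, chosen for this example.

def propensity(n_residue_in_state, n_residue, n_state, n_total):
    """Return how over- or under-represented a residue is in a structural state."""
    fraction_of_residue_in_state = n_residue_in_state / n_residue
    fraction_of_all_in_state = n_state / n_total
    return fraction_of_residue_in_state / fraction_of_all_in_state

# Alanine in helices, using the counts from the example.
p_ala_helix = propensity(
    n_residue_in_state=124,   # alanines found in helices
    n_residue=1640,           # alanines in the database
    n_state=1246,             # all residues found in helices
    n_total=10136,            # all residues in the database
)
print(round(p_ala_helix, 3))
```

A value above 1 means the residue occurs in that state more often than average, and a value below 1 means less often.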
35. Sliding windows
Propensity values are often assigned using sliding
window methods
Sequence: A G T W Y K M C Q N P V
window 1: A G T W Y K M average applied to W
window 2: G T W Y K M C average applied to Y
window 3: T W Y K M C Q average applied to K
Theory that neighboring residues affect local structure
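The windowing scheme can be sketched in a few lines of Python. The width of 7 matches the example windows; the function name is made up for this sketch. It only lists each window with the central residue it is assigned to; averaging a property over the window is a separate step.

```python
# Enumerate sliding windows and the central residue each window's average is assigned to.
# sliding_windows is a hypothetical helper; a width of 7 matches the example above.

def sliding_windows(sequence, width=7):
    """Return (window, centre_residue) pairs for every full-width window."""
    half = width // 2
    pairs = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        pairs.append((window, window[half]))
    return pairs

windows = sliding_windows("AGTWYKMCQNPV")
for number, (window, centre) in enumerate(windows[:3], start=1):
    print(f"window {number}: {' '.join(window)}  average applied to {centre}")
```

Note that the first and last three residues never sit at the centre of a full window, so they receive no value unless the ends are padded or the window is shortened.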
36. GOR
Method by Garnier, Osguthorpe & Robson (1978).
Uses propensity values for Helix, Sheet, Coil, Turn for each residue from
experimentally-determined structures
Analysis done for each state, most probable state is assigned
Sequence EVSAEEIKKHEEKWNKYYGVNAFNLPKELFSKVDEKDRQKYPYNTIGNVFVKGQTSATGV
GOR Sheet ---------------------------------------------SSSSSSSS---SSSS
GOR Helix --------HHHHHHHHH----HHHHHHHHHHHHHHHH-----------------------
GOR Coil --CCCC---------------------------------CCCC-----------------
GOR Turn ------------------TT----------------------------------------
37. Hydrophobicity
Method by Kyte & Doolittle (1982)
Uses values representing hydrophobicity of residues rather
than structural propensity
Applied with a sliding window method
38. Hydrophobicity
Often helices tend to be more hydrophobic
Internalised regions of a protein are more hydrophobic
Transmembrane domains are hydrophobic
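A hydrophobicity profile can be sketched by averaging the Kyte & Doolittle hydropathy values over a sliding window. The window width of 7 and the function name are assumptions for this example; the published method leaves the width to the user, and longer windows are often chosen when looking for transmembrane segments.

```python
# Kyte & Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(sequence, width=7):
    """Return (centre_index, mean_hydropathy) for every full-width window.

    hydropathy_profile is a hypothetical helper; width=7 is an assumed default.
    """
    half = width // 2
    profile = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        mean = sum(KYTE_DOOLITTLE[residue] for residue in window) / width
        profile.append((start + half, mean))
    return profile

# Short made-up sequence: a charged stretch followed by a hydrophobic stretch.
example_profile = hydropathy_profile("KKDEKRKLIVLLAIV")
for index, value in example_profile:
    print(index, round(value, 2))
```

Stretches where the windowed average stays high are candidates for buried or membrane-spanning regions, which is the reasoning behind the three statements above.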
40. One more - Hydrophilicity
A method by Hopp & Woods (1981)
Experimentally derived values representing residue
hydrophilicity
Attempts to determine surface/solvent accessibility, which
may in turn indicate antigenicity
41. Problems
Many of these tools are old - and rely on statistical values
from small datasets
They generally cannot achieve better than 60% accuracy
(depending on how it is measured)
60% right is still 40% wrong!
However, they are still in common use
(e.g. EMBOSS tools: garnier & antigenic)
43. Application example: Stability
There can be some benefit to using these scales in
combination (similar to antigenic) – here using scales for
order/disorder, aggregation potential and hydrophobicity to
look at protein stability in the absence of structural
information
44. Folding is Complex:
Is a truly random approach possible?
Levinthal’s paradox (1969)
100 residues = 99 peptide bonds,
therefore 198 different phi and psi bond angles
3 stable conformations per bond angle = $3^{198}$
possible conformations
At a nanosecond/picosecond sampling rate proteins
would not find the correct structure for a long
time (longer than the age of the Universe!)
Proteins fold on a millisecond to microsecond timescale – this is the paradox...
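The scale of Levinthal's estimate can be checked numerically. The sketch below assumes one conformation sampled per picosecond and an age of the Universe of about 13.8 billion years. Both figures are assumptions added here for illustration, as are the variable names.

```python
residues = 100
angles = 2 * (residues - 1)          # one phi and one psi per peptide bond
states_per_angle = 3
conformations = states_per_angle ** angles   # 3**198 as an exact integer

samples_per_second = 10 ** 12        # assumed: one conformation per picosecond
seconds_per_year = 365.25 * 24 * 3600
age_of_universe_years = 1.38e10      # assumed: about 13.8 billion years

search_years = conformations / samples_per_second / seconds_per_year
print(f"conformations ~ {conformations:.2e}")
print(f"exhaustive search ~ {search_years:.2e} years")
print(f"ratio to age of Universe ~ {search_years / age_of_universe_years:.2e}")
```

Even at this sampling rate an exhaustive search exceeds the age of the Universe by many orders of magnitude, which is the point of the paradox.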
45. How does it work at all?
1. proteins do NOT fold from random conformations,
which was an assumption of Levinthal's calculation
2. instead, they fold from denatured states that retain
substantial secondary (2°), and possibly tertiary (3°), structure
Why are folding simulations so difficult?
• Simulations are computationally expensive
• Gross approximations in simulations
• Nature uses tricks such as
• Posttranslational processing
• Chaperones
• Environment change
46. Complexity & Diversity –
potential vs reality
If the average protein contains about 300 amino acids, then
there could be a possible $20^{300}$ different proteins
(Apparently) this is more than the number of atoms in the universe!
Yet a human (a complex organism) has only 30,000 proteins
All proteins so far appear to be represented by between
1000 - 5000 fold types
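The size of this sequence space can be made concrete with logarithms. The figure of roughly $10^{80}$ atoms in the observable universe is a commonly quoted estimate and is an assumption added here, not a value from the slides.

```latex
\log_{10}\left(20^{300}\right) = 300 \log_{10} 20 \approx 300 \times 1.301 = 390.3
\quad\Rightarrow\quad
20^{300} \approx 2 \times 10^{390} \gg 10^{80}
```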
47. Two reasons for limited fold space
Convergent evolution
Certain folds are biophysically favourable and may
have arisen in multiple cases
Divergent evolution
The number of folds seen is limited because they have
evolved from a limited number of common ancestor
proteins
Despite the evolutionary limitation of the number of existing folds (fold
space) it is still complex enough to make classification and
comprehension difficult
48. Why is Folding Difficult to do?
It's amazing that not only do proteins self-assemble -- fold -- but they do
so amazingly quickly: some as fast as a millionth of a second. While this
time is very fast on a person's timescale, it's remarkably long for
computers to simulate.
In fact, it takes about a day to simulate a nanosecond (1/1,000,000,000 of
a second) of dynamics for a reasonably sized protein (e.g. on an Intel Core i7,
2.66 GHz).
Unfortunately, proteins fold on the tens of microseconds timescale (10,000
nanoseconds). Thus, it would take 10,000 CPU days to simulate folding,
i.e. roughly 27 CPU years! That's a long time to wait for one
result!
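The arithmetic behind this estimate is short enough to write out. The helper name `cpu_years` is hypothetical, and the default rate of one CPU day per simulated nanosecond is the figure given above. The result is a little over 27 CPU years, the same order of magnitude as the figure quoted.

```python
def cpu_years(folding_time_ns, cpu_days_per_ns=1.0):
    """Estimate the CPU years needed to simulate one folding event."""
    cpu_days = folding_time_ns * cpu_days_per_ns
    return cpu_days / 365.25


# 10 microseconds = 10,000 ns, at one CPU day per simulated nanosecond.
print(f"{cpu_years(10_000):.1f} CPU years")
```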
50. Similar Project:
http://boinc.bakerlab.org/rosetta/
(BOINC: Berkeley Open Infrastructure for Network Computing)
Ab initio protein tertiary structure prediction, based on the approach that sequence-dependent
local interactions limit or bias segments of the chain to form only distinct
sets of local structures,
and that non-local interactions select the lowest free-energy tertiary conformations
compatible with the local biases.
Different models are used to treat the local and non-local interactions.
Rather than attempting a physical model for local sequence-structure relationships,
the approach turns to the protein database to look at the distribution of local structures
adopted by short sequence segments (fewer than 10 residues in length) in known
three-dimensional structures.
52. Some Rosetta@home results
A: Left, crystal structure of the MarA transcription
factor bound to DNA; right, our best submitted
model in CASP3. Despite many incorrect details, the
overall fold is predicted with sufficient accuracy to
allow insights into the mode of DNA binding.
B: Left, the crystal structure of bacteriocin AS-48;
middle, our best submitted model in CASP4; right,
a structurally and functionally related protein (NK-lysin)
identified using this model in a structure-based
search of the Protein Data Bank (PDB). The
structural and functional similarity is not
recognizable using sequence comparison methods
(the identity between the two sequences is only 5
percent).
C: Left, crystal structure of the second domain of
MutS; middle, our best submitted model for this
domain in CASP4; right, a structurally related
protein (RuvC) with a related function recognized
using the model in a structure-based search of the
PDB. The similarity was not recognized using
sequence comparison or fold recognition methods.
56. A compromise: Homology modelling
If there is no structure for your protein - perhaps there is
one for a similar protein.
Sequence alignment tools can be used to compare this to
your sequence of unknown structure
Homology searching and sequence alignment are now the
first step in protein structure prediction
If homologous proteins with structures are found, the unknown
can be ‘overlaid’ and its structure inferred
57. Homology Modeling
Based on two assumptions:
1. The structure of a protein is determined by its amino acid
sequence alone
2. With evolution, the structure changes more slowly than
the sequence - similar sequences may adopt the same
structure
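How similar two sequences are is commonly summarised as percent identity over an alignment. The sketch below assumes the two sequences are already aligned and counts only columns where neither has a gap. Other definitions of the denominator exist, so this is one convention rather than the only one. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        matches += a == b
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical toy alignment, not real sequence data.
print(round(percent_identity("MKT-AYIAKQR", "MKSGAYLAK-R"), 1))
```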
58. Sequence alignment
TEX19 – human protein without a
structure.
PDB 2AAM: Crystal structure of a
putative glycosidase (tm1410) from
Thermotoga maritima
63. Using the Models – Docking/Screening
• Choose and prepare target protein
• Identify binding pocket
• Fit ligand to pocket
• Score
• (for screening – repeat!)
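The repeat step for screening can be sketched as a loop around whatever docking program is used. Everything below is a placeholder: `virtual_screen`, the `dock_and_score` callback and the ligand names and scores are hypothetical, and the sketch assumes that lower scores mean better predicted binding, as with estimated binding energies.

```python
from typing import Callable, Iterable, List, Tuple


def virtual_screen(
    ligands: Iterable[str],
    dock_and_score: Callable[[str], float],
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Dock every ligand, then return the top_n best-scoring ones.

    Assumes lower scores are better.
    """
    scored = [(ligand, dock_and_score(ligand)) for ligand in ligands]
    scored.sort(key=lambda pair: pair[1])
    return scored[:top_n]


# Placeholder scores standing in for a real docking program's output.
fake_scores = {"lig_a": -7.2, "lig_b": -9.1, "lig_c": -5.4, "lig_d": -8.0}
hits = virtual_screen(fake_scores, fake_scores.__getitem__, top_n=2)
print(hits)
```

In practice the callback would wrap the fit and score steps for a prepared target and pocket.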
64. Identify the Binding Pocket
• Could identify this by the location of an existing co-crystallised
ligand
• Or use surface sphere clusters
• Or identify it by clustering of solvent molecules (normally
water)
• Perhaps identify it by clustering of fragments (Surflex-Dock
protomol)
65. Binding site based on existing ligand
• Most methods allow you to specify where the site is – perhaps by identifying key residues or based on an existing ligand
• Could use the ‘hole’ left by the ligand as a pocket, or use the ‘surface’ of the ligand as a protomol
66. Surface Sphere generation
• Generate the surface of the target – Connolly surface
• ‘Rolls’ a sphere the radius of water across the van der Waals surface of the target
• Each atom’s centre of van der Waals radius acts as a sitepoint for the
generation of a sphere on the surface, whose centre lies along the surface
normal at the sitepoint.
• Spheres are then clustered – each cluster is a potential pocket
68. Prepare the ligand
• The ligand needs to be prepared too
• Drawn & minimised
• From a database - & minimised
• Extracted from another/the same binding site
• Hydrogens added etc
• Minimised/optimised – ready to dock
69. Docking
• Rigid docking -> ligand is fixed conformationally
• Flexible docking -> ligand is conformationally flexible
• Posable -> ligand is rigid, but moved spatially
70. Rigid Ligand docking
• Centres of spheres representing the binding pocket act as ‘Site Points’
• The atoms of the ligand are matched to the site points
• Once the orientation is made, the interaction is possibly minimised: receptor kept rigid and ligand flexible
71. Alternatives
Flexible Docking
• Rings treated as flexible
• Other bonds treated as flexible/rotamers
• Rings treated as rigid – ligand fragmented
Posable Docking
• Rigid docking, but ligands posed conformationally
  • Rotated
  • Twisted
  • Flipped etc
• And repetitively docked to find best fit
73. Virtual Screening
• Docking – but repeated with many potential ligands
• Libraries can come from resources such as
PubChem/ChEMBLdb – vendors – or other in-house
sources
• From specialised databases holding structures suitable for
docking
• It is important to have a diverse library, especially for
rigid docking!
74. Considering safety & efficacy – “Drug-like”
Lipinski rule of 5 (or Pfizer rule)
‘Compounds which violate at least two of the following conditions have
a very low chance of being orally bioavailable’
• MW ≤ 500 Da
• log P (lipophilicity) ≤ 5
• number of H bond donors ≤ 5
• number of H bond acceptors ≤ 10
Works well once you have descriptions of small molecules – can be
search criteria in databases...
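As a search criterion the rule reduces to counting violations over four precomputed descriptors. The sketch below is one possible form: the `Descriptors` container and both function names are hypothetical, and the thresholds are applied inclusively (a value exactly at the limit passes), which is an assumption about boundary cases. The aspirin values are approximate.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed small-molecule descriptors (hypothetical container)."""
    mol_weight: float       # Da
    log_p: float            # lipophilicity
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five conditions are violated."""
    checks = [
        d.mol_weight <= 500,
        d.log_p <= 5,
        d.h_bond_donors <= 5,
        d.h_bond_acceptors <= 10,
    ]
    return sum(not ok for ok in checks)


def likely_orally_bioavailable(d: Descriptors) -> bool:
    """Flag compounds with fewer than two violations."""
    return lipinski_violations(d) < 2


# Approximate descriptor values for aspirin, for illustration only.
aspirin = Descriptors(mol_weight=180.16, log_p=1.2, h_bond_donors=1, h_bond_acceptors=4)
print(lipinski_violations(aspirin), likely_orally_bioavailable(aspirin))
```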
75. ADME / ADME-Tox
• Lipinski rule is really the 1st step in ADME (absorption,
distribution, metabolism, excretion) modelling
• Structure Activity Relationships (SARs) – similar
molecules will behave in similar ways, i.e. have similar
effects.
• Allows for knowledge-based comparative analysis – Tox
databases
82. Explore the following Software Tools
As well as resources mentioned in the slides!
Homology Modelling: Modeller, Phyre, SwissModel
Model Viewers: Pymol, Jmol, Rasmol
Molecular Simulation etc: Gromacs, Tinker, Amber, NAMD, Charmm
Docking/Screening: Surflex-Dock, Dock, AutoDock, Vina
Graphical Tools/builders/interfaces: Chimera, Maestro, Ghemical, VMD, DeepView
Suites (companies): Tripos, Accelrys, OpenEye, ChemAxon, Schrödinger, MOE, Yasara
Some are free for academic use, but cost for commercial use. Take note and beware!
83. Workflow example – free vs paid
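[Figure: workflow for ligand and target, comparing a commercial suite (Discovery Studio, £££ $$$) with free tools. Steps: get structures, preparation, minimisation, dynamics, docking, evaluation. Structure sources: ChEMBL, PDB. Free tools shown: Marvin Sketch, Chimera, Gromacs, Dock, Chimera.]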