Protein Structure Prediction
1. Digitally signed by Rohit Jhawer (rohit_jhawer@hotmail.com), Date: 2007.03.09 14:10:44 +05'30'
Lecture 14: Protein Structure Prediction
CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2004 Dr. Eric Rouchka
2. Review of Proteins
• Proteins: polypeptides with a three-dimensional structure
• Primary structure – sequence of amino acids constituting the polypeptide chain
• Secondary structure – local organization of the polypeptide chain into secondary structures such as α helices and β sheets
3. Review of Proteins
• Tertiary structure – three-dimensional arrangement of the amino acids as they interact with one another through polarity and interactions between side chains
• Quaternary structure – interaction of several protein subunits
4. Protein Structure
• Proteins: chains of amino acids joined by peptide bonds
• Amino acids:
– Polar (separate positively and negatively charged regions)
– free C=O group (carbonyl), can act as hydrogen bond acceptor
– free N-H group (amide), can act as hydrogen bond donor
6. Protein Structure
• Many conformations are possible due to rotation around the bonds to the alpha-carbon (Cα) atom
• Conformational changes lead to differences in the three-dimensional structure of the protein
7. Protein Structure
• Polypeptide chain has the pattern N-Cα-C repeated
• PHI (φ) angle: torsion angle of rotation about the bond between the amino nitrogen and Cα (N-Cα); PSI (ψ) angle: torsion angle of rotation about the bond between Cα and the carbonyl carbon (Cα-C)
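As a minimal sketch (not part of the original lecture), φ and ψ can be computed as dihedral angles from backbone coordinates: φ of residue i uses C(i-1), N(i), Cα(i), C(i), and ψ of residue i uses N(i), Cα(i), C(i), N(i+1). The helper names are hypothetical; the coordinates are the VAL 1 and LEU 2 backbone atoms from the 3HHB excerpt on slide 36.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (-180 to 180) about the p1-p2 bond. Positive when,
    looking from p1 toward p2, the near bond turns clockwise to eclipse the far bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / length, b1[1] / length, b1[2] / length)
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# Backbone atoms of VAL 1 and LEU 2 from the 3HHB excerpt (slide 36)
n1, ca1, c1 = (6.452, 16.459, 4.843), (7.060, 17.792, 4.760), (8.561, 17.703, 5.038)
n2, ca2, c2 = (9.333, 18.209, 4.095), (10.785, 18.159, 4.237), (11.247, 19.305, 5.133)

psi_val1 = dihedral(n1, ca1, c1, n2)  # psi(i): N, CA, C of residue i and N of i+1
phi_leu2 = dihedral(c1, n2, ca2, c2)  # phi(i): C of residue i-1 and N, CA, C of i
```

The same function serves for both angles; only the choice of the four atoms differs.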
9. Differences between Amino Acids
• The 20 amino acids differ in their R side chains
• Amino acids can be separated based on the chemical properties of the side chains:
– Hydrophobic
– Charged
– Polar
10. Differences between Amino Acids
• Hydrophobic: Alanine (A), Valine (V), Phenylalanine (F), Proline (P), Methionine (M), Isoleucine (I), and Leucine (L)
• Charged: Aspartic acid (D), Glutamic acid (E), Lysine (K), Arginine (R)
• Polar: Serine (S), Threonine (T), Tyrosine (Y), Histidine (H), Cysteine (C), Asparagine (N), Glutamine (Q), Tryptophan (W)
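The grouping on this slide can be written down as a lookup table. The sketch below (hypothetical helper `class_composition`) reports what fraction of a sequence falls into each class; glycine is not listed on the slide, so it and any unlisted letter fall into an assumed "other" bucket.

```python
from collections import Counter

# Side-chain classes as listed on this slide (glycine is not listed).
AA_CLASS = {
    **dict.fromkeys("AVFPMIL", "hydrophobic"),
    **dict.fromkeys("DEKR", "charged"),
    **dict.fromkeys("STYHCNQW", "polar"),
}

def class_composition(sequence):
    """Fraction of residues in each class; unlisted residues count as 'other'."""
    counts = Counter(AA_CLASS.get(aa, "other") for aa in sequence.upper())
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}

composition = class_composition("MKVLAAGDE")
```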
11. Secondary Structure
• Image source: http://www.ebi.ac.uk/microarray/biology_intro.html
12. Secondary Structures
• The core of each protein is made up of regular secondary structures
• Regular patterns of hydrogen bonds are formed between neighboring amino acids
• Amino acids in a given secondary structure have similar φ and ψ angles
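To illustrate how similar φ/ψ angles point to a secondary structure, the sketch below compares a residue's angles with textbook reference values (α helix about -57°/-47°, antiparallel β about -139°/+135°, parallel β about -119°/+113°). The 30° tolerance and the function names are assumptions, and this is only a rough heuristic: angles alone do not establish the hydrogen-bond pattern that defines a helix or sheet.

```python
REFERENCE = {
    "alpha helix": (-57.0, -47.0),
    "antiparallel beta": (-139.0, 135.0),
    "parallel beta": (-119.0, 113.0),
}

def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)

def nearest_conformation(phi, psi, tolerance=30.0):
    """Closest reference conformation, or 'other' if none is within tolerance."""
    best, best_dist = "other", float("inf")
    for name, (ref_phi, ref_psi) in REFERENCE.items():
        dist = max(angle_diff(phi, ref_phi), angle_diff(psi, ref_psi))
        if dist < best_dist:
            best, best_dist = name, dist
    return best if best_dist <= tolerance else "other"
```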
13. Secondary Structures
• Secondary structures act to neutralize the polar groups on each amino acid
• Secondary structures are tightly packed in the protein core, in a hydrophobic environment
• Each amino acid side group has a limited space to occupy, and therefore a limited number of possible interactions
14. Types of Secondary
Structures
• α Helices
• β Sheets
• Loops
• Coils
15. α Helix
• Most abundant secondary structure
• 3.6 amino acids per turn
• Hydrogen bond formed between the C=O of one residue and the N-H of the residue four positions further along (i to i+4)
• Average length: 10 amino acids, or about 3 turns
• Varies from 5 to 40 amino acids
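The helix numbers above translate into simple arithmetic. This sketch assumes the standard rise of 1.5 Å per residue along the helix axis (not stated on the slide) and lists the i to i+4 backbone hydrogen bonds of an ideal helix; for the 10-residue average helix it gives about 2.8 turns, a length of about 15 Å, and 6 backbone hydrogen bonds. Function names are hypothetical.

```python
RESIDUES_PER_TURN = 3.6  # from the slide
RISE_PER_RESIDUE = 1.5   # angstroms along the helix axis (assumed standard value)

def helix_hbond_pairs(n_residues):
    """(donor, acceptor) pairs, 0-based: N-H of residue i+4 bonds to C=O of residue i."""
    return [(i + 4, i) for i in range(n_residues - 4)]

def helix_geometry(n_residues):
    return {
        "turns": n_residues / RESIDUES_PER_TURN,
        "length_angstrom": n_residues * RISE_PER_RESIDUE,
        "backbone_hbonds": len(helix_hbond_pairs(n_residues)),
    }

average_helix = helix_geometry(10)
```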
Image source: http://www.hhmi.princeton.edu/sw/2002/psidelsk/scavengerhunt.htm; http://www4.ocn.ne.jp/~bio/biology/protein.htm
16. α Helix
• Normally found on the surface of protein
cores
• Interact with aqueous environment
– Inner facing side has hydrophobic amino
acids
– Outer-facing side has hydrophilic amino
acids
17. α Helix
• Every third or fourth amino acid tends to be hydrophobic (following the 3.6-residue turn)
• Pattern can be detected computationally
• Rich in alanine (A), glutamic acid (E), leucine (L), and methionine (M)
• Poor in proline (P), glycine (G), tyrosine (Y), and serine (S)
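One common way to detect this periodicity computationally is the hydrophobic moment: each residue is placed 100° further around a helical wheel (360°/3.6) and the per-residue hydrophobicity vectors are summed. As a simplifying assumption, this sketch scores residues 1 or 0 using the hydrophobic list from slide 10 instead of a published hydrophobicity scale; a large value suggests the segment has one hydrophobic face.

```python
import math

# Hydrophobic residues as listed on slide 10 (binary 1/0 scale is an assumption).
HYDROPHOBIC = set("AVFPMIL")

def hydrophobic_moment(segment, degrees_per_residue=100.0):
    """Length-normalised magnitude of the summed hydrophobicity vectors
    when residues are placed around a helical wheel."""
    if not segment:
        return 0.0
    sx = sy = 0.0
    for n, aa in enumerate(segment.upper()):
        h = 1.0 if aa in HYDROPHOBIC else 0.0
        angle = math.radians(n * degrees_per_residue)
        sx += h * math.cos(angle)
        sy += h * math.sin(angle)
    return math.hypot(sx, sy) / len(segment)
```

A uniformly hydrophobic segment scores near zero because its vectors cancel around the wheel, while an amphipathic segment scores higher.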
18. β Sheet
Image source: http://broccoli.mfn.ki.se/pps_course_96/ss_960723_12.html;
http://www4.ocn.ne.jp/~bio/biology/protein.htm
19. β Sheet
• Hydrogen bonds between 5-10
consecutive amino acids in one portion
of the chain with another 5-10 farther
down the chain
• Interacting regions may be adjacent
with a short loop, or far apart with other
structures in between
20. β Sheet
• Directions:
– Same: Parallel Sheet
– Opposite: Anti-parallel Sheet
– Mixed: Mixed Sheet
• Pattern of hydrogen bond formation in
parallel and anti-parallel sheets is
different
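A small sketch of what strand direction means for residue pairing: in a parallel pair both strands run the same way, so residue k of one strand faces residue k of the other; in an antiparallel pair the second strand is reversed. The function is hypothetical and only shows which residues face each other, not the distinct hydrogen-bond geometry of the two sheet types.

```python
def strand_pairs(start_a, start_b, length, antiparallel):
    """Hypothetical helper: (residue in strand A, residue in strand B) pairs that
    face each other when two strands of equal length are paired."""
    if antiparallel:
        # Strand B runs in the opposite direction: A's first residue faces B's last.
        return [(start_a + k, start_b + length - 1 - k) for k in range(length)]
    # Parallel: both strands run N- to C-terminus in the same direction.
    return [(start_a + k, start_b + k) for k in range(length)]

# Two 5-residue strands, residues 10-14 and 30-34
antiparallel_pairs = strand_pairs(10, 30, 5, antiparallel=True)
parallel_pairs = strand_pairs(10, 30, 5, antiparallel=False)
```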
21. β Sheet
• Slight counterclockwise rotation
• Alpha carbons (as well as R side
groups) alternate above and below the
sheet
• Prediction difficult, due to wide range of
φ and ψ angles
22. Interactions in Helices and
Sheets
23. Loop
• Regions between α helices and β
sheets
• Various lengths and three-dimensional
configurations
• Located on surface of the structure
24. Loop
• Hairpin loops: complete turn in the polypeptide chain, joining two antiparallel β strands
• More variable in sequence and structure
• Tend to have charged and polar amino acids
• Frequently a component of active sites
25. Coil
• Region of secondary structure that is
not a helix, sheet, or loop
26. Secondary Structure
• Image source: http://www.ebi.ac.uk/microarray/biology_intro.html
27. 6 Classes of Protein Structure
1) Class α: bundles of α helices connected by
loops on surface of proteins
2) Class β: antiparallel β sheets, usually two
sheets in close contact forming sandwich
3) Class α/β: mainly parallel β sheets with
intervening α helices; may also have mixed β
sheets (metabolic enzymes)
28. 6 Classes of Protein Structure
4) Class α+β: mainly segregated α helices and antiparallel β sheets
5) Multidomain (α and β) proteins: more than one of the above four domain types
6) Membrane and cell-surface proteins and peptides, excluding proteins of the immune system
29. α Class Protein (hemoglobin)
• http://www.rcsb.org/pdb/cgi/explore.cgi?job=graphics;pdbId=3hhb;page=;pid=&opt=show&size=250
30. β Class Protein (T-Cell CD8)
• http://www.rcsb.org/pdb/cgi/explore.cgi?job=graphics;pdbId=1cd8;page=;pid=&opt=show&size=500
31. α/β Class Protein
(tryptophan synthase)
• http://www.rcsb.org/pdb/cgi/explore.cgi?job=graphics;pdbId=2wsy;page=;pid=&opt=show&size=500
32. α+β Class Protein
(1RNB)
• http://www.rcsb.org/pdb/cgi/explore.cgi?job=graphics;pdbId=1rnb;page=;pid=&opt=show&size=500
33. Membrane Protein (1OPF)
• http://www.rcsb.org/pdb/cgi/explore.cgi?job=graphics;pdbId=1opf;page=;pid=&opt=show&size=500
34. Protein Structure Databases
• Databases of three dimensional structures of
proteins, where structure has been solved
using X-ray crystallography or nuclear
magnetic resonance (NMR) techniques
• Protein Databases:
– PDB
– SCOP
– Swiss-Prot (sequence database)
– PIR (sequence database)
35. Protein Structure Databases
• Most extensive for 3-D structure is the
Protein Data Bank (PDB)
• Current release of PDB (April 8, 2003)
has 20,622 structures
36. Partial PDB File
ATOM 1 N VAL A 1 6.452 16.459 4.843 7.00 47.38 3HHB 162
ATOM 2 CA VAL A 1 7.060 17.792 4.760 6.00 48.47 3HHB 163
ATOM 3 C VAL A 1 8.561 17.703 5.038 6.00 37.13 3HHB 164
ATOM 4 O VAL A 1 8.992 17.182 6.072 8.00 36.25 3HHB 165
ATOM 5 CB VAL A 1 6.342 18.738 5.727 6.00 55.13 3HHB 166
ATOM 6 CG1 VAL A 1 7.114 20.033 5.993 6.00 54.30 3HHB 167
ATOM 7 CG2 VAL A 1 4.924 19.032 5.232 6.00 64.75 3HHB 168
ATOM 8 N LEU A 2 9.333 18.209 4.095 7.00 30.18 3HHB 169
ATOM 9 CA LEU A 2 10.785 18.159 4.237 6.00 35.60 3HHB 170
ATOM 10 C LEU A 2 11.247 19.305 5.133 6.00 35.47 3HHB 171
ATOM 11 O LEU A 2 11.017 20.477 4.819 8.00 37.64 3HHB 172
ATOM 12 CB LEU A 2 11.451 18.286 2.866 6.00 35.22 3HHB 173
ATOM 13 CG LEU A 2 11.081 17.137 1.927 6.00 31.04 3HHB 174
ATOM 14 CD1 LEU A 2 11.766 17.306 .570 6.00 39.08 3HHB 175
ATOM 15 CD2 LEU A 2 11.427 15.778 2.539 6.00 38.96 3HHB 176
37. Description of PDB File
• second column: atom serial number
• fourth column: current amino acid (residue name)
• sixth column: amino acid position in the polypeptide chain
• Columns 7, 8, and 9: x, y, and z coordinates (in angstroms)
• The 11th column: temperature factor, which can be used as a measure of uncertainty
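The excerpt on slide 36 can be read with a few lines of code. This sketch splits on whitespace, which works for the excerpt as printed but is an assumption: real PDB files are defined by fixed character columns and adjacent fields can run together, so a production parser should slice by column. Column 2 is read as the atom serial number and column 6 as the residue number; column 10 is skipped because the slide does not describe it. The `Atom` and `parse_atom_line` names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Atom:
    serial: int      # column 2: atom serial number
    name: str        # column 3: atom name (N, CA, C, O, CB, ...)
    res_name: str    # column 4: amino acid (residue) name
    chain: str       # column 5: chain identifier
    res_seq: int     # column 6: residue position in the chain
    x: float         # columns 7-9: coordinates in angstroms
    y: float
    z: float
    b_factor: float  # column 11: temperature factor

def parse_atom_line(line):
    """Parse one whitespace-separated ATOM line as printed in the excerpt."""
    f = line.split()
    if not f or f[0] != "ATOM":
        raise ValueError("not an ATOM record: " + line)
    return Atom(int(f[1]), f[2], f[3], f[4], int(f[5]),
                float(f[6]), float(f[7]), float(f[8]), float(f[10]))

excerpt = """\
ATOM 1 N VAL A 1 6.452 16.459 4.843 7.00 47.38 3HHB 162
ATOM 2 CA VAL A 1 7.060 17.792 4.760 6.00 48.47 3HHB 163
ATOM 8 N LEU A 2 9.333 18.209 4.095 7.00 30.18 3HHB 169
ATOM 9 CA LEU A 2 10.785 18.159 4.237 6.00 35.60 3HHB 170"""

atoms = [parse_atom_line(line) for line in excerpt.splitlines()]
ca_atoms = [a for a in atoms if a.name == "CA"]  # one C-alpha per residue
```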
38. Protein Structure
Classification Databases
• Structural Classification of proteins
(SCOP)
• based on expert definition of structural
similarities
• SCOP classifies by class, fold, superfamily, and family
• http://scop.mrc-lmb.cam.ac.uk/scop/
39. Protein Structure
Classification Databases
• Classification by class, architecture,
topology, and homology (CATH)
• Classifies proteins into hierarchical levels by
class
• α/β and α+β are considered to be a single class
• http://www.biochem.ucl.ac.uk/bsm/cath/
40. Protein Structure
Classification Databases
• Molecular Modeling Database (MMDB)
• structures from PDB categorized into structurally related groups using VAST
• looks for similar arrangements of secondary
structural elements
• http://www.ncbi.nlm.nih.gov/Entrez
41. Protein Structure
Classification Databases
• Spatial Arrangement of Backbone
Fragments (SARF)
• categorized on structural similarities,
similar to the MMDB
• http://www-lmmb.ncifcrf.gov/~nicka/sarf2.html
42. Visualization of Proteins
• A number of programs convert atomic
coordinates of 3-d structures into views of the
molecule
• allow the user to manipulate the molecule by
rotation, zooming, etc.
• Critical in drug design -- yields insight into
how the protein might interact with ligands at
active sites
43. Visualization of Proteins
• Most popular program for viewing 3-
dimensional structures is Rasmol
Rasmol: http://www.umass.edu/microbio/rasmol/
Chime: http://www.umass.edu/microbio/chime/
Cn3D: http://www.ncbi.nlm.nih.gov/Structure/
Mage: http://kinemage.biochem.duke.edu/website/kinhome.html
Swiss 3D viewer: http://www.expasy.ch/spdbv/mainpage.html
44. Alignment of Protein Structure
• Three-dimensional structure of one protein
compared against three-dimensional
structure of second protein
• Atoms fit together as closely as possible to
minimize the average deviation
• Structural similarity between proteins does
not necessarily mean evolutionary
relationship
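"Fit together as closely as possible" is usually quantified as the root-mean-square deviation (RMSD) of matched atoms. As a minimal sketch, the code below removes only the translation by centering both atom sets on their centroids; finding the optimal rotation as well (for example with the Kabsch algorithm) is left out, so the result is an upper bound on the best-superposition RMSD. The atom pairing between the two proteins is assumed to be given.

```python
import math

def centroid(coords):
    n = len(coords)
    return tuple(sum(p[k] for p in coords) / n for k in range(3))

def rmsd_after_centering(a, b):
    """RMSD of matched atoms after moving both sets to their centroids.
    No rotation is applied, so this is an upper bound on the best-fit RMSD."""
    if not a or len(a) != len(b):
        raise ValueError("need two non-empty coordinate lists of equal length")
    ca, cb = centroid(a), centroid(b)
    total = 0.0
    for p, q in zip(a, b):
        total += sum(((p[k] - ca[k]) - (q[k] - cb[k])) ** 2 for k in range(3))
    return math.sqrt(total / len(a))
```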
45. Alignment of Protein Structure
• Positions of atoms in three-dimensional
structures compared
• Look for positions of secondary
structural elements (helices and
strands) within a protein domain
46. Alignment of Protein Structure
• Distances between carbon atoms are examined to determine the degree to which the structures may be superimposed
• Side chain information can be incorporated
– Buried or exposed
47. SSAP
• Sequential Structure Alignment Program
• Incorporates double dynamic
programming to produce a structural
alignment between two proteins
48. Steps in SSAP
• 1) Calculate vectors from Cβ of one amino
acid to set of nearby amino acids
– Vectors from two separate proteins compared
– Difference (expressed as an angle) calculated,
and converted to score
• 2) Matrix for scores of vector differences
from one protein to the next is computed.
49. Steps in SSAP
• 3) Optimal alignment found using
global dynamic programming, with a
constant gap penalty
• 4) The next amino acid residue is considered, and the optimal path to align this amino acid to the second sequence is computed
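Step 3 on its own is ordinary global dynamic programming. The sketch below aligns two structures given a precomputed residue-by-residue score matrix (higher means more similar), charging an assumed fixed penalty for every gap position; it illustrates only this inner step, not SSAP's double dynamic programming or its vector-based scores. The function name and default penalty are hypothetical.

```python
def global_align(score, gap=-1.0):
    """Global dynamic programming over a residue-by-residue score matrix with a
    fixed gap penalty. Returns (best total score, aligned (i, j) index pairs)."""
    n, m = len(score), len(score[0]) if score else 0
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + score[i - 1][j - 1],  # match i with j
                          F[i - 1][j] + gap,                      # gap in structure B
                          F[i][j - 1] + gap)                      # gap in structure A
    # Trace back from the bottom-right corner to recover the aligned pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if F[i][j] == F[i - 1][j - 1] + score[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return F[n][m], pairs[::-1]

total, pairs = global_align([[2, -1], [-1, 2]])  # (4.0, [(0, 0), (1, 1)])
```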
50. Steps in SSAP
• 5) Alignments transferred to
summary matrix
– If paths cross same matrix position, scores
are summed
– If part of alignment path found in both
matrices, evidence of similarity
51. Steps in SSAP
• 6) Dynamic programming alignment
is performed for the summary matrix
– Final alignment represents optimal
alignment between the protein structures
– Resulting score converted so it can be
compared to see how closely related two
structures are
52. Distance Matrix Approach
• Uses graphical procedure similar to dot
plots
• Identifies atoms that lie most closely
together in three-dimensional structure
• Two sequences with similar structure
can have dot plots superimposed
53. Distance Matrix Approach
• Values in distance matrix represent distance
between the Cα atoms in the three
dimensional structure
• positions of closest packing atoms marked
with a dot to highlight regions of interest
• Similar groups superimposed as closely as
possible by minimizing sum of atomic
distances
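A minimal sketch of the distance matrix and its dot-plot view, assuming one Cα coordinate per residue. The 8 Å cutoff for "closely packed" is an assumed value, not one given in the lecture, and the function names are hypothetical.

```python
import math

def distance_matrix(coords):
    """Pairwise distances (in angstroms) between C-alpha coordinates."""
    return [[math.dist(p, q) for q in coords] for p in coords]

def contact_dots(dmat, cutoff=8.0):
    """Dot-plot view: '*' where two C-alpha atoms lie within the cutoff."""
    return ["".join("*" if d <= cutoff else "." for d in row) for row in dmat]

# Toy coordinates: residues 0 and 1 are close, residue 2 is far away.
coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (30.0, 0.0, 0.0)]
dots = contact_dots(distance_matrix(coords))  # ["**.", "**.", "..*"]
```

Comparing two such dot plots for different proteins is what the slide means by superimposing them.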
54. DALI
• Distance matrix ALIgnment (DALI)
• Uses distance matrix method to align protein
structures
• Assembly step uses Monte Carlo simulation
to find submatrices that can be aligned
• Existing structures that have been compared
are organized into the FSSP database
55. Fast Structural Similarity
Search
• Compare types and arrangements of
secondary structures within two proteins
• If elements similarly arranged, three-
dimensional structures are similar
• VAST and SARF are programs that use
these fast methods
56. Structural Motifs Based on Sequence Analysis
• Some structural elements can be determined by looking at sequence composition
– zinc finger motifs
– leucine zippers
– coiled-coil structures
57. Zinc Finger Motifs
• Found by looking at the order and spacing of cysteine and histidine residues
• Typical zinc finger motifs are composed of two cysteines followed by two histidines
Image source: www.bmb.psu.edu/faculty/tan/lab/tanlab_gallery_protdna.html
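Because the motif is defined by the order and spacing of cysteines and histidines, it can be searched for with a regular expression. The spacing below is modelled on the PROSITE pattern for classical C2H2 zinc fingers (C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H); treat the exact spacing as an assumption to check against PROSITE, and the function name as hypothetical.

```python
import re

# Assumed C2H2 spacing: C, 2-4 residues, C, 3 residues, a hydrophobic residue,
# 8 residues, H, 3-5 residues, H.
C2H2 = re.compile(r"C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H")

def find_zinc_fingers(seq):
    """Return (start, end) spans of candidate C2H2 zinc-finger motifs in `seq`."""
    return [m.span() for m in C2H2.finditer(seq)]
```

A pattern match only flags a candidate; it says nothing about whether the segment actually folds around a zinc ion.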
58. Leucine Zippers
• Found by looking for two parallel alpha helices held together
• by interactions between hydrophobic leucine residues found at every seventh position in each helix
Image source: ww2.mcgill.ca/biology/undergra/c200a/sec3-5.htm
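The every-seventh-position spacing suggests a simple scan: report runs of leucines spaced exactly seven residues apart. This is a hypothetical heuristic (the name `heptad_leucine_runs` and the default of four leucines are assumptions); dedicated coiled-coil predictors score whole heptad repeats rather than leucines alone.

```python
def heptad_leucine_runs(seq, min_repeats=4):
    """Find runs of leucines spaced exactly seven residues apart.

    Returns (start, count) for each run of at least `min_repeats` leucines,
    reporting only the first leucine of each run.
    """
    hits = []
    for start, aa in enumerate(seq):
        # Skip positions that merely continue a run begun seven residues earlier
        if aa != "L" or (start >= 7 and seq[start - 7] == "L"):
            continue
        count = 0
        while start + 7 * count < len(seq) and seq[start + 7 * count] == "L":
            count += 1
        if count >= min_repeats:
            hits.append((start, count))
    return hits
```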
59. Transmembrane Proteins
• Cross the membrane back and forth through alpha helices
• Typical length of a transmembrane helix: 20-30 residues
• Transmembrane alpha helices have hydrophobic residues on their inside-facing portions and hydrophilic residues on the outside
Image source: http://www.northwestern.edu/neurobiology/faculty/pinto2/pinto_12big.jpg
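The hydrophobic stretch of a membrane-spanning helix is often detected with a sliding-window hydropathy plot. The sketch below swaps in the Kyte-Doolittle hydropathy scale with a 19-residue window and a 1.6 threshold (values commonly used with that scale); it is not how PHDhtm or TMpred work, and the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic)
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydrophobic_windows(seq, window=19, threshold=1.6):
    """Return start indices of windows whose mean hydropathy exceeds `threshold`.

    Such windows are candidate transmembrane helix segments. Assumes `seq`
    uses only the 20 standard one-letter codes.
    """
    starts = []
    for i in range(len(seq) - window + 1):
        mean = sum(KD[aa] for aa in seq[i:i + window]) / window
        if mean > threshold:
            starts.append(i)
    return starts
```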
60. Membrane Prediction Programs
• PHDhtm: employs a neural network approach; the network is trained to recognize sequence patterns and variations of helices in transmembrane proteins of known structure
• TMpred: searches a protein against a sequence scoring matrix obtained by aligning the sequences of all known transmembrane alpha-helix regions
70. Chou-Fasman Method
• Based on analyzing the frequency of each amino acid in different secondary structures
– A, E, L, and M are strong predictors of alpha helices
– P and G are predictors of helix breaks
• A table of predictive values is created for alpha helices, beta sheets, and loops
• The structure with the greatest overall predictive value, provided it is greater than 1, is assigned
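The "greatest value above 1" rule can be sketched as window-averaged propensities. This is a simplified version, not the full Chou-Fasman nucleation and extension rules. The propensities for the handful of residues below are meant to follow the published table, with the loop column taken from Chou and Fasman's turn parameters, but treat them as assumptions to verify; treating unlisted residues as neutral (1.0) and the function name are hypothetical choices.

```python
# Propensities (helix, sheet, turn/loop) for a few residues only. ASSUMPTION: these
# are meant to follow the published Chou-Fasman table; verify before real use.
PROPENSITY = {
    "A": (1.42, 0.83, 0.66), "E": (1.51, 0.37, 0.74),
    "L": (1.21, 1.30, 0.59), "M": (1.45, 1.05, 0.60),
    "V": (1.06, 1.70, 0.50), "I": (1.08, 1.60, 0.47),
    "P": (0.57, 0.55, 1.52), "G": (0.57, 0.75, 1.56),
}
NEUTRAL = (1.0, 1.0, 1.0)  # hypothetical: unlisted residues express no preference
STATES = ("H", "E", "T")   # helix, beta strand (sheet), turn/loop

def chou_fasman_sketch(seq, half=3):
    """Average each state's propensity over a (2*half + 1)-residue window
    around every residue; assign the state with the highest average if it
    exceeds 1, otherwise call the residue coil ('C')."""
    result = []
    for i in range(len(seq)):
        window = seq[max(0, i - half): i + half + 1]
        props = [PROPENSITY.get(aa, NEUTRAL) for aa in window]
        averages = [sum(p[k] for p in props) / len(props) for k in range(3)]
        best = max(range(3), key=lambda k: averages[k])
        result.append(STATES[best] if averages[best] > 1.0 else "C")
    return "".join(result)
```

A run of A, E, L, and M comes out as helix, while P/G-rich stretches fall to turn, matching the tendencies listed on the slide.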
71. GOR Method
• Improves upon the Chou-Fasman method
• Assumes that the amino acids surrounding a central amino acid influence which secondary structure the central amino acid is likely to adopt
• The scoring matrices used in the GOR method incorporate information theory and Bayesian statistics
• Mount, pp. 450-451
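The GOR idea can be sketched by estimating, from sequences of known structure, how much information a residue m positions away carries about a position's state, then summing those values across a window. This is a minimal GOR I-style sketch (single residues only, add-one pseudocounts, and a 17-residue window by default); later GOR versions add residue-pair terms, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_gor(seqs, labels, half=8):
    """Estimate information values I(S; R at offset m) = log(P(S | R, m) / P(S)).

    seqs/labels: training sequences and their per-residue state strings.
    half=8 gives a 17-residue window.
    """
    states = sorted(set("".join(labels)))
    state_counts = Counter("".join(labels))
    total = sum(state_counts.values())
    # pair_counts[(m, residue)][state]: how often `residue` sits m positions
    # away from a residue observed in `state`
    pair_counts = defaultdict(Counter)
    for seq, lab in zip(seqs, labels):
        for j, state in enumerate(lab):
            for m in range(-half, half + 1):
                if 0 <= j + m < len(seq):
                    pair_counts[(m, seq[j + m])][state] += 1
    info = {}
    for (m, residue), counts in pair_counts.items():
        n = sum(counts.values())
        for state in states:
            p_given = (counts[state] + 1) / (n + len(states))  # add-one pseudocount
            info[(m, residue, state)] = math.log(p_given / (state_counts[state] / total))
    return states, info

def predict_gor(seq, states, info, half=8):
    """Assign each residue the state with the largest summed information."""
    result = []
    for j in range(len(seq)):
        scores = dict.fromkeys(states, 0.0)
        for m in range(-half, half + 1):
            if 0 <= j + m < len(seq):
                for state in states:
                    scores[state] += info.get((m, seq[j + m], state), 0.0)
        result.append(max(states, key=scores.get))
    return "".join(result)
```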
72. Neural Network Models
• Programs are trained to recognize amino acid patterns located in known secondary structures
• and to distinguish these patterns from patterns not located in such structures
• PHD and NNPREDICT use neural networks
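A toy version of the idea: encode a window of residues as one-hot inputs and train a single-layer perceptron to separate residues in a chosen state from all others. This is far simpler than the multi-layer networks used by PHD; the window size, learning rate, epoch count, and function names are assumptions for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SLOT = len(AMINO_ACIDS) + 1  # 20 residues + 1 unit for "beyond the chain end"

def encode_window(seq, i, half):
    """One-hot encode the (2*half + 1)-residue window centred on position i."""
    x = []
    for j in range(i - half, i + half + 1):
        unit = [0] * SLOT
        if 0 <= j < len(seq) and seq[j] in AMINO_ACIDS:
            unit[AMINO_ACIDS.index(seq[j])] = 1
        else:
            unit[-1] = 1  # off the end of the chain (or a non-standard residue)
        x.extend(unit)
    return x

def score(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_perceptron(seqs, labels, target="H", half=6, epochs=20, lr=1.0):
    """Learn weights that fire for residues labelled `target` and not otherwise."""
    w, b = [0.0] * (SLOT * (2 * half + 1)), 0.0
    for _ in range(epochs):
        for seq, lab in zip(seqs, labels):
            for i in range(len(seq)):
                x = encode_window(seq, i, half)
                # +1 if it missed a target residue, -1 if it fired wrongly, 0 if correct
                error = int(lab[i] == target) - int(score(w, b, x) > 0)
                if error:
                    w = [wi + lr * error * xi for wi, xi in zip(w, x)]
                    b += lr * error
    return w, b

def predict(seq, w, b, half=6, target="H", other="C"):
    return "".join(target if score(w, b, encode_window(seq, i, half)) > 0 else other
                   for i in range(len(seq)))
```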
73. Nearest-Neighbor Methods
• Machine learning method
• The secondary structure conformation of an amino acid is calculated by identifying sequences of known structure that are similar to the query, judged by the surrounding amino acids
• Nearest-neighbor programs include PSSP, Simpa96, SOPM, and SOPMA
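A nearest-neighbor prediction can be sketched by comparing the window around each query residue with every window in a small database of sequences of known secondary structure and letting the k most similar windows vote. Counting identical residues as the similarity measure, the k and window defaults, and the function names are assumptions; published tools use richer scoring.

```python
from collections import Counter

def windows(seq, half):
    """Residue windows around each position, padded with '-' past the chain ends."""
    padded = "-" * half + seq + "-" * half
    return [padded[i:i + 2 * half + 1] for i in range(len(seq))]

def predict_nearest_neighbor(query, database, k=3, half=3):
    """database: list of (sequence, secondary_structure_string) pairs of known structure."""
    examples = [(w, state)
                for seq, states in database
                for w, state in zip(windows(seq, half), states)]
    prediction = []
    for qw in windows(query, half):
        # Similarity = number of identical residues at matching window positions
        ranked = sorted(examples,
                        key=lambda ex: sum(a == b for a, b in zip(qw, ex[0])),
                        reverse=True)
        votes = Counter(state for _, state in ranked[:k])
        prediction.append(votes.most_common(1)[0][0])
    return "".join(prediction)
```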
74. Prediction of 3D Structures
• Threading is the most robust technique
• It is time consuming
• It requires a library of known protein structures
75. Threading
• Searches for structures with similar folds even when there is no sequence similarity
• Threading takes a sequence of unknown structure and threads it through the coordinates of a template protein whose structure has been solved by
– X-ray crystallography
– NMR spectroscopy
76. Threading
• Each position is considered in turn, subject to predetermined constraints
• Thermodynamic calculations are made to determine the most energetically favorable and conformationally stable alignment
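A heavily simplified threading sketch: slide the query along a template without gaps, score each placement by summing a per-position energy for putting residue type a into template environment e, and keep the lowest-energy placement. Real threading allows gaps and usually adds pairwise contact terms; the environment labels, energy table, and function name here are hypothetical.

```python
def thread_ungapped(query, template_envs, energy):
    """Try every ungapped placement of `query` on the template.

    template_envs: environment label for each template position.
    energy: dict mapping (environment, residue) -> energy (lower = more favourable);
            pairs missing from the table are scored 0.0 (an assumption).
    Returns (best_offset, best_energy), or None if the query is longer than the template.
    """
    best = None
    for offset in range(len(template_envs) - len(query) + 1):
        total = sum(energy.get((template_envs[offset + i], aa), 0.0)
                    for i, aa in enumerate(query))
        if best is None or total < best[1]:
            best = (offset, total)
    return best
```

With a toy table that rewards buried leucine and exposed lysine, "LK" lands where a buried position is followed by an exposed one.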
77. Environmental Template
• The environment of each amino acid in each known structural core is determined by
– its secondary structure
– the area of its side chain buried by closeness to other atoms
– the types of nearby side chains
78. Environmental Template
• Each position is classified into one of 18 types
– 6 levels of increasing residue burial
– combined with three classes of secondary structure (alpha helices, beta sheets, and loops), giving 6 × 3 = 18 types
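The 18 environment types can be represented as a single index that combines a burial level with a secondary-structure class. This is a minimal sketch, assuming burial is summarized as the fraction of the side chain that is buried and split into six equal-width bins; the original method's class boundaries are not reproduced, and the function names are hypothetical.

```python
SS_CLASSES = ("H", "E", "L")  # alpha helix, beta sheet, loop
N_BURIAL_LEVELS = 6

def burial_level(fraction_buried):
    """Map the buried fraction of a side chain (0.0-1.0) to a level 0..5.

    Equal-width bins are a hypothetical choice for illustration.
    """
    return min(int(fraction_buried * N_BURIAL_LEVELS), N_BURIAL_LEVELS - 1)

def environment_class(fraction_buried, ss):
    """Combine burial level and secondary structure into one of 18 classes (0..17)."""
    return burial_level(fraction_buried) * len(SS_CLASSES) + SS_CLASSES.index(ss)
```

Each template position then carries one class index, and a scoring table indexed by (class, amino acid) can be used to judge how well a query residue fits that position.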
79. Upcoming Seminars
• Topic TBA
– Rafael Irizarry, Johns Hopkins University
• Friday, 4/23/2004
• 8:30 AM – 9:30 AM
• LOCATION: K-Building Room 2036 (HSC Campus)
80. Presentations
• 4:45 – 5:00 Richard Jones
• 5:00 – 5:15 Steven Xu
• 5:15 – 5:30 Olutola Iyun
• 5:30 – 5:45 Frank Baker
• 5:45 – 6:00 Guanghui Lan
• 6:00 – 6:15 Tim Hardin
• 6:15 – 6:30 Satish Bollimpalli & Ravi Gundlapalli