Protein Structure, Databases and Structural Alignment
Saramita De Chakravarti
Research Scientist, II (i)
Chembiotech Research Laboratories
1
2
Basics of protein structure
3
Why Protein Structure?
• Proteins are fundamental components of all living
cells, performing a variety of biological tasks.
• Each protein has a particular 3D structure that
determines its function.
• Protein structure is more conserved than protein
sequence, and more closely related to function.
4
Protein Structure
Protein core - usually conserved.
Protein loops - variable regions.
[Figure labels: hydrophobic core, surface loops]
5
Supersecondary structures
Assemblies of secondary structure elements that are
shared by many protein structures.
Beta hairpin
Beta-alpha-beta unit
Helix hairpin
6
Hemoglobin (1bab)
Fold: General structure composed of
sets of Supersecondary structures
7
How Many Folds Are There?
http://scop.berkeley.edu/count.html
8
Structure – Sequence Relationships
• Two conserved sequences → similar structures
• Two similar structures → conserved sequences?
There are cases of proteins with the same
structure but no clear sequence similarity.
9
Principles of Protein Structure
• Today's proteins reflect millions of years of
evolution.
• 3D structure is better conserved than sequence
during evolution.
• Similarities among sequences or among
structures may reveal information about shared
biological functions of a protein family.
10
The Levinthal paradox
Assume a protein is composed of 100 amino acids (AAs) and that
each AA can take up 10 different conformations.
Altogether we get $10^{100}$
(i.e., a googol) conformations.
If each conformation were sampled in the shortest
possible time (the time of a molecular vibration, ~$10^{-13}$ s),
it would take an astronomical amount of time (~$10^{79}$
years) to sample all possible conformations in order
to find the native state.
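The estimate above can be checked with a few lines of arithmetic. This is a minimal sketch using only the slide's stated assumptions (100 residues, 10 conformations each, one sample per ~1e-13 s); the variable names and the 365.25-day year are assumptions of this sketch, not part of the original slide.

```python
# Back-of-the-envelope Levinthal estimate under the slide's assumptions.
n_residues = 100             # amino acids in the chain
states_per_residue = 10      # conformations available to each amino acid
seconds_per_sample = 1e-13   # roughly one molecular vibration

# Assumed length of a year in seconds (365.25 days).
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Total number of chain conformations: 10 ** 100 (kept as an exact integer).
n_conformations = states_per_residue ** n_residues

# Time needed to visit every conformation once.
total_seconds = n_conformations * seconds_per_sample
total_years = total_seconds / SECONDS_PER_YEAR

print(f"conformations: 10^{len(str(n_conformations)) - 1}")
print(f"years to enumerate them all: {total_years:.1e}")
```

With these inputs the exhaustive search comes out at roughly 3e79 years, many orders of magnitude longer than the age of the universe.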
11
The Levinthal paradox
Luckily, nature copes with these sorts of
numbers, and the correct conformation of a protein
is reached within seconds.
12
How is the 3D Structure Determined?
Experimental methods (best approach):
• X-ray crystallography.
• NMR.
• Others (e.g., neutron diffraction).
13
How is the 3D Structure Determined?
In-silico methods
Ab-initio structure prediction given only the
sequence as input - not always successful.
14
A note on ab-initio predictions: The
current state is that “failure can no
longer be guaranteed”…
15
A note on ab-initio secondary structure
prediction: Success ~70%.
16
How is the 3D Structure Determined?
In-silico methods
Threading = sequence-structure alignment. The
idea is to search existing databases of 3D structures
for a structure that fits the query sequence, using
similarity of sequences + information on the
structures to find the best predicted structure.
17
Comments
• X-ray crystallography is the most widely
used method.
• Quaternary structure of large assemblies
(ribosomes, virus particles, etc.) can be
determined by cryo-electron microscopy
(cryo-EM).
18
Protein Databases
19
PDB: Protein Data Bank
• Holds 3D models of biological macromolecules
(protein, RNA, DNA).
• All data are available to the public.
• Obtained by X-Ray crystallography (84%) or NMR
spectroscopy (16%).
• Submitted by biologists and biochemists from
around the world.
20
PDB: Protein Data Bank
• Founded in 1971 by Brookhaven National
Laboratory, New York.
• Transferred to the Research Collaboratory
for Structural Bioinformatics (RCSB) in 1998.
• Currently it holds > 49,426 released
structures (the slide also shows the figure 61,695).
21
PDB - model
• A model defines the 3D positions of atoms in
one or more molecules.
• There are models of proteins, protein
complexes, proteins and DNA, protein
segments, etc …
• The models also include the positions of ligand
molecules, solvent molecules, metal ions, etc.
22
PDB – Protein Data Bank
http://www.pdb.org/pdb/home/home.do
23
The PDB file – text format
24
The PDB file – text format
[Figure: annotated excerpt of a PDB file. Record types: ATOM (usually protein or DNA) and HETATM (usually ligand, ion, or water). Labeled fields: atom number, atom identity, residue identity, chain, residue number, and the X, Y, Z coordinates of each atom in the structure.]
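The fields labeled on this slide sit at fixed column positions in each ATOM/HETATM line of the legacy PDB text format. The following is a minimal sketch of a parser for one such line; the `AtomRecord` type, the `parse_atom_line` function, and the example atom are hypothetical names and made-up values introduced here for illustration, not taken from a real PDB entry.

```python
from typing import NamedTuple


class AtomRecord(NamedTuple):
    record: str    # "ATOM" or "HETATM"
    serial: int    # atom number
    name: str      # atom identity, e.g. "CA"
    res_name: str  # residue identity, e.g. "LYS"
    chain: str     # chain identifier
    res_seq: int   # residue number
    x: float       # coordinates in angstroms
    y: float
    z: float


def parse_atom_line(line: str) -> AtomRecord:
    """Parse one fixed-width ATOM/HETATM line (legacy PDB text format)."""
    record = line[0:6].strip()
    if record not in ("ATOM", "HETATM"):
        raise ValueError(f"not a coordinate record: {record!r}")
    return AtomRecord(
        record=record,
        serial=int(line[6:11]),       # columns 7-11
        name=line[12:16].strip(),     # columns 13-16
        res_name=line[17:20].strip(), # columns 18-20
        chain=line[21],               # column 22
        res_seq=int(line[22:26]),     # columns 23-26
        x=float(line[30:38]),         # columns 31-38
        y=float(line[38:46]),         # columns 39-46
        z=float(line[46:54]),         # columns 47-54
    )


# Made-up example line, assembled field by field so the columns line up.
example = (
    "ATOM  " + f"{2:>5}" + " " + " CA " + " " + "LYS" + " " + "A"
    + f"{10:>4}" + "    "
    + f"{11.104:8.3f}{13.207:8.3f}{2.1:8.3f}"
    + f"{1.0:6.2f}{20.0:6.2f}"
)
atom = parse_atom_line(example)
print(atom)
```

Slicing by column rather than splitting on whitespace matters because adjacent fields in real files can run together without a separating space.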
25
Structural Alignment
26
Why structural alignment?
• Structural similarity can point to remote
evolutionary relationship
• Shared structural motifs among proteins
suggest similar biological function
• Getting insight into sequence-structure
mapping (e.g., which parts of the protein
structure are conserved among related
organisms).
27
As in any alignment problem, we can
search for GLOBAL ALIGNMENT or for
LOCAL ALIGNMENT
28
Human Myoglobin
pdb:2mm1
Human Hemoglobin
alpha-chain
pdb:1jebA
Sequence id: 27%
Structural id: 90%
29
What is the best transformation that
superimposes the unicorn on the lion?
30
Solution:
Regard the shapes as sets of points
and try to “match”
these sets using a transformation
31
This is not a good result….
32
Good result:
33
Kinds of transformations:
• Rotation
• Translation
• Scaling
and more….
34
Translation:
[Figure: translation shown on X-Y axes]
35
Rotation:
[Figure: rotation shown on X-Y axes]
36
Scale:
[Figure: scaling shown on X-Y axes]
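The three kinds of transformation named on these slides can be sketched for 2D point sets as follows. This is an illustrative sketch only; the function names and the example triangle are assumptions introduced here. Rotation is taken about the origin and scaling is uniform.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def translate(points: List[Point], dx: float, dy: float) -> List[Point]:
    """Shift every point by the vector (dx, dy)."""
    return [(x + dx, y + dy) for x, y in points]


def rotate(points: List[Point], angle_rad: float) -> List[Point]:
    """Rotate every point counter-clockwise about the origin."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def scale(points: List[Point], factor: float) -> List[Point]:
    """Scale every point uniformly about the origin."""
    return [(factor * x, factor * y) for x, y in points]


# Hypothetical shape: a right triangle with one corner at the origin.
triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

# Rotate by 90 degrees, then shift by (2, 3).
moved = translate(rotate(triangle, math.pi / 2), 2.0, 3.0)
print(moved)
```

Rotation and translation preserve all distances between points, whereas scaling does not, which is why rigid superposition of two structures uses only the first two.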
37
We represent a protein as a geometric
object in 3D space.
The object consists of points represented
by coordinates (x, y, z).
[Figure: residues Thr, Lys, Met, Gly, Glu, Ala drawn as points]
38
The aim:
Given two proteins,
find the transformation that produces
the best superimposition of one protein
onto the other.
39
Correspondence is Unknown
Given two configurations of points in
three-dimensional space:
40
Find those rotations and translations of one of the point
sets which produce “large” superimpositions of
corresponding 3-D points.
41
The best transformation:
T
42
Simple case – two closely related proteins with the
same number of amino acids.
Question:
how do we assess the
quality of the
transformation?
43
Scoring the Alignment
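Two point sets: $A = \{a_i\},\ i = 1, \dots, n$ and $B = \{b_j\},\ j = 1, \dots, m$
• Pairwise correspondence:
$(a_{k_1}, b_{t_1}), (a_{k_2}, b_{t_2}), \dots, (a_{k_N}, b_{t_N})$
(1) Bottleneck: $\max_{i} \lVert a_{k_i} - b_{t_i} \rVert$
(2) RMSD (Root Mean Square Distance): $\sqrt{\frac{1}{N} \sum_{i=1}^{N} \lVert a_{k_i} - b_{t_i} \rVert^2}$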
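Both scores can be computed directly once a pairwise correspondence is fixed. The sketch below assumes the correspondence is given as a list of index pairs (k, t) into A and B; the function names and the toy coordinates are made up for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]
Pairs = Sequence[Tuple[int, int]]


def paired_distances(a: Sequence[Point3], b: Sequence[Point3],
                     pairs: Pairs) -> List[float]:
    """Euclidean distance between a[k] and b[t] for each pair (k, t)."""
    return [math.dist(a[k], b[t]) for k, t in pairs]


def bottleneck(a: Sequence[Point3], b: Sequence[Point3], pairs: Pairs) -> float:
    """Largest distance over all matched pairs."""
    return max(paired_distances(a, b, pairs))


def rmsd(a: Sequence[Point3], b: Sequence[Point3], pairs: Pairs) -> float:
    """Root mean square distance over the N matched pairs."""
    dists = paired_distances(a, b, pairs)
    return math.sqrt(sum(d * d for d in dists) / len(dists))


# Toy point sets: n = 3 and m = 4, with N = 3 matched pairs.
A = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
B = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 2.0), (5.0, 5.0, 5.0)]
pairs = [(0, 0), (1, 1), (2, 2)]

worst = bottleneck(A, B, pairs)   # matched distances are 1, 0 and 2
average = rmsd(A, B, pairs)       # sqrt((1 + 0 + 4) / 3)
print(worst, average)
```

The bottleneck score is set entirely by the single worst pair, while RMSD averages over all pairs, so one outlier affects the two scores very differently.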
44
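RMSD – Root Mean Square Deviation
Given two sets of 3-D points:
$P = \{p_i\},\ Q = \{q_i\},\ i = 1, \dots, n$;
$\mathrm{rmsd}(P, Q) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \lVert p_i - q_i \rVert^2}$
Find a 3-D transformation $T^*$ such that:
$\mathrm{rmsd}(T^*(P), Q) = \min_T \sqrt{\frac{1}{n} \sum_{i=1}^{n} \lVert T(p_i) - q_i \rVert^2}$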
Find the highest number of atoms aligned with the lowest RMSD
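To make the minimization concrete, here is a sketch of the 2D analogue, where the optimal rigid transformation has a closed form: move both centroids to the origin, then rotate by the angle that maximizes the overlap of the centered point sets. This is a deliberate simplification and not the method of the slides; in 3D the rotation step is usually solved with an SVD-based procedure such as the Kabsch algorithm. The function names and test points are assumptions of this sketch, and the correspondence p_i to q_i is taken as known.

```python
import math
from typing import List, Tuple

Point2 = Tuple[float, float]


def centroid(points: List[Point2]) -> Point2:
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def superpose_2d(p: List[Point2], q: List[Point2]) -> Tuple[float, List[Point2], float]:
    """Rigidly fit p onto q; return (rotation angle, fitted p, minimal RMSD)."""
    if not p or len(p) != len(q):
        raise ValueError("point sets must be non-empty and of equal size")
    (cpx, cpy), (cqx, cqy) = centroid(p), centroid(q)
    pc = [(x - cpx, y - cpy) for x, y in p]   # p centered at the origin
    qc = [(x - cqx, y - cqy) for x, y in q]   # q centered at the origin
    # The sum of q_i . R(theta) p_i equals a*cos(theta) + b*sin(theta),
    # which is largest at theta = atan2(b, a).
    a = sum(px * qx + py * qy for (px, py), (qx, qy) in zip(pc, qc))
    b = sum(px * qy - py * qx for (px, py), (qx, qy) in zip(pc, qc))
    theta = math.atan2(b, a)
    c, s = math.cos(theta), math.sin(theta)
    # Rotate the centered p, then translate onto the centroid of q.
    fitted = [(c * x - s * y + cqx, s * x + c * y + cqy) for x, y in pc]
    sq = sum((fx - qx) ** 2 + (fy - qy) ** 2 for (fx, fy), (qx, qy) in zip(fitted, q))
    return theta, fitted, math.sqrt(sq / len(p))


# Check: Q is P rotated by 30 degrees and shifted by (5, -2).
P = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 3.0)]
ang = math.radians(30)
Q = [(math.cos(ang) * x - math.sin(ang) * y + 5.0,
      math.sin(ang) * x + math.cos(ang) * y - 2.0) for x, y in P]

theta, fitted, best_rmsd = superpose_2d(P, Q)
print(math.degrees(theta), best_rmsd)
```

Because Q here is an exact rigid copy of P, the recovered angle is 30 degrees and the residual RMSD is zero up to floating-point error; for two different proteins the residual stays positive and serves as the alignment score.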
45
Pitfalls of RMSD
• all atoms are treated equally
(residues on the surface have a higher degree of
freedom than those in the core)
• best alignment does not always mean minimal
RMSD
• does not take into account the attributes of the
amino acids
46
Flexible alignment vs. Rigid
alignment
Rigid alignment
Flexible alignment
47
Some more issues
48
Does the fact that so many proteins contain alpha
helices indicate that they are all evolutionarily
related?
No. Alpha helices reflect physical constraints,
as do beta sheets.
For structures, it is sometimes difficult
to separate convergent evolution from
evolutionary relatedness.
49
Structural genomics: solve or predict the 3D structures of
all proteins of a given organism (X-ray, NMR,
and homology modelling).
Unlike in traditional structural biology, the 3D structure is
often solved before anything is known about
the protein in question. A new challenge
emerged: predicting a protein’s function from
its 3D structure.
50
CASP: a competition for predicting 3D
structures.
Instead of rushing to publish a new 3D
structure, the AA sequence is published first and
each group is invited to submit its
predictions.
51
CAPRI: same as CASP, but for docking.
52
Homology modeling: predicting the
structure from a closely related known
structure.
This can be important for example to
predict how a mutation influences the
structure
53

Protein Structure, Databases and Structural Alignment

  • 1. Protein Structure, DatabaSeS anD Structural alignment Saramita De Chakravarti Research Scientist, II (i( Chembiotech Research Laboratories 1
  • 2. 2 Basics of proteinBasics of protein structurestructure
  • 3. 3 Why Proteins Structure ?Why Proteins Structure ?  Proteins are fundamental components of all living cells, performing a variety of biological tasks.  Each protein has a particular 3D structure that determines its function.  Protein structure is more conserved than protein sequence, and more closely related to function.
  • 4. 4 Protein core - usually conserved. Protein loops - variable regions Hydrophobic core Surface loops Protein Structure
  • 5. 5 Supersecondary structures Assembly of secondary structures which are shared by many structures. Beta hairpin Beta-alpha-beta unit Helix hairpin
  • 6. 6 Hemoglobin (1bab( Fold: General structure composed of sets of Supersecondary structures
  • 7. 7 http://scop.berkeley.edu/count.html How Many Folds Are There ?How Many Folds Are There ?
  • 8. 8 • Two conserved sequences similar structures • Two similar structures conserved sequences? Structure – Sequence RelationshipsStructure – Sequence Relationships There are cases of proteins with the same structure but no clear sequence similarity.
  • 9. 9 Principles of Protein Structure •Today's proteins reflect millions of years of evolution. •3D structure is better conserved than sequence during evolution. •Similarities among sequences or among structures may reveal information about shared biological functions of a protein family.
  • 10. 10 The Levinthal paradox Assume a protein is comprised of 100 AAs and that each AA can take up 10 different conformations. Altogether we get:10100 (i.e. google( conformations. If each conformation were sampled in the shortest possible time (time of a molecular vibration ~ 10-13 s( it would take an astronomical amount of time (~1077 years( to sample all possible conformations, in order to find the Native State.
  • 11. The Levinthal paradox. Luckily, nature copes with these sorts of numbers, and the correct conformation of a protein is reached within seconds.
  • 12. How is the 3D structure determined? Experimental methods (best approach): • X-ray crystallography. • NMR. • Others (e.g., neutron diffraction).
  • 13. How is the 3D structure determined? In-silico methods: ab-initio structure prediction, given only the sequence as input. Not always successful.
  • 14. A note on ab-initio predictions: the current state is that “failure can no longer be guaranteed”…
  • 15. A note on ab-initio secondary structure prediction: success ~70%.
  • 16. How is the 3D structure determined? In-silico methods: threading = sequence-structure alignment. The idea is to search for a structure and sequence in existing databases of 3D structures, and to use the similarity of the sequences plus information on the structures to find the best predicted structures.
  • 17. Comments. • X-ray crystallography is the most widely used method. • The quaternary structure of large assemblies (ribosomes, virus particles, etc.) can be determined by cryo-electron microscopy (cryo-EM).
  • 19. PDB: Protein Data Bank. • Holds 3D models of biological macromolecules (protein, RNA, DNA). • All data are available to the public. • Obtained by X-ray crystallography (84%) or NMR spectroscopy (16%). • Submitted by biologists and biochemists from around the world.
  • 20. PDB: Protein Data Bank. • Founded in 1971 at Brookhaven National Laboratory, New York. • Transferred to the Research Collaboratory for Structural Bioinformatics (RCSB) in 1998. • Currently it holds > 49,426 released structures (updated figure: 61,695).
  • 21. PDB model. • A model defines the 3D positions of atoms in one or more molecules. • There are models of proteins, protein complexes, proteins and DNA, protein segments, etc. • The models also include the positions of ligand molecules, solvent molecules, metal ions, etc.
  • 22. PDB – Protein Data Bank. http://www.pdb.org/pdb/home/home.do
  • 23. The PDB file – text format
  • 24. The PDB file – text format. ATOM: usually protein or DNA. HETATM: usually ligand, ion, or water. Each record gives the atom number, atom identity, residue identity, chain, residue number, and the X, Y, Z coordinates of that atom in the structure.
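The fields named on this slide sit at fixed character columns in each ATOM/HETATM line. The sketch below shows one way to read them, assuming the standard fixed-column layout of the PDB text format; `AtomRecord`, `parse_atom_line`, and the sample values are hypothetical names and data chosen for illustration.

```python
from typing import NamedTuple, Optional


class AtomRecord(NamedTuple):
    record: str      # "ATOM" or "HETATM"
    serial: int      # atom number
    atom_name: str   # atom identity, e.g. "CA"
    res_name: str    # residue identity, e.g. "MET"
    chain: str       # chain identifier
    res_seq: int     # residue number
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> Optional[AtomRecord]:
    """Parse one ATOM/HETATM line; return None for any other record type."""
    record = line[0:6].strip()
    if record not in ("ATOM", "HETATM"):
        return None
    return AtomRecord(
        record=record,
        serial=int(line[6:11]),         # columns 7-11
        atom_name=line[12:16].strip(),  # columns 13-16
        res_name=line[17:20].strip(),   # columns 18-20
        chain=line[21],                 # column 22
        res_seq=int(line[22:26]),       # columns 23-26
        x=float(line[30:38]),           # columns 31-38
        y=float(line[38:46]),           # columns 39-46
        z=float(line[46:54]),           # columns 47-54
    )


# Made-up sample line, assembled field by field so the columns are explicit.
SAMPLE_LINE = (
    "ATOM  " "    1" " " " N  " " " "MET" " " "A" "   1" "    "
    "  27.340" "  24.430" "   2.614"
)

if __name__ == "__main__":
    print(parse_atom_line(SAMPLE_LINE))
```

Slicing by column rather than splitting on whitespace matters because adjacent fields may touch when values are wide (for example, large coordinates or four-character atom names).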
  • 26. Why structural alignment? • Structural similarity can point to a remote evolutionary relationship. • Shared structural motifs among proteins suggest similar biological function. • Getting insight into sequence-structure mapping (e.g., which parts of the protein structure are conserved among related organisms).
  • 27. As in any alignment problem, we can search for a GLOBAL ALIGNMENT or for a LOCAL ALIGNMENT.
  • 29. What is the best transformation that superimposes the unicorn on the lion?
  • 30. Solution: regard the shapes as sets of points and try to “match” these sets using a transformation.
  • 31. This is not a good result…
  • 33. Kinds of transformations: • Rotation • Translation • Scaling, and more…
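As an illustration of these transformation types, the sketch below applies a rotation, an optional uniform scaling, and a translation to a list of 3-D points. The function names and the choice of a rotation about the z axis are assumptions made for the example; with `scale=1.0` the transformation is rigid.

```python
import math


def rotation_z(theta):
    """3x3 matrix for a counter-clockwise rotation by theta radians about z."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]


def transform(points, rotation, translation, scale=1.0):
    """Return scale * (rotation @ p) + translation for every point p."""
    return [
        tuple(
            scale * sum(rotation[i][k] * p[k] for k in range(3)) + translation[i]
            for i in range(3)
        )
        for p in points
    ]


if __name__ == "__main__":
    pts = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
    print(transform(pts, rotation_z(math.pi / 2), (1.0, 2.0, 3.0)))
```

Rotation and translation preserve all inter-point distances, which is why they are the natural choice for superimposing one molecule on another; scaling does not.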
  • 37. We represent a protein as a geometric object in space. The object consists of points represented by coordinates (x, y, z). Example residues shown: Thr, Lys, Met, Gly, Glu, Ala.
  • 38. The aim: given two proteins, find the transformation that produces the best superimposition of one protein onto the other.
  • 39. Correspondence is unknown. Given two configurations of points in three-dimensional space:
  • 40. Find those rotations and translations of one of the point sets that produce “large” superimpositions of corresponding 3-D points.
  • 42. Simple case: two closely related proteins with the same number of amino acids. Question: how do we assess the quality of the transformation?
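  • 43. Scoring the alignment. Two point sets: $A = \{a_i\},\ i = 1, \dots, n$ and $B = \{b_j\},\ j = 1, \dots, m$. • Pairwise correspondence: $(a_{k_1}, b_{t_1}), (a_{k_2}, b_{t_2}), \dots, (a_{k_N}, b_{t_N})$. (1) Bottleneck: $\max_i \lVert a_{k_i} - b_{t_i} \rVert$. (2) RMSD (root mean square distance): $\sqrt{\sum_{i=1}^{N} \lVert a_{k_i} - b_{t_i} \rVert^2 / N}$.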
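Both scores are straightforward to compute once a correspondence is given. A minimal sketch, assuming the correspondence is supplied as a list of index pairs (k, t) into A and B; the function names are illustrative.

```python
import math


def bottleneck(A, B, pairs):
    """Largest distance between any pair of corresponding points."""
    return max(math.dist(A[k], B[t]) for k, t in pairs)


def rmsd(A, B, pairs):
    """Root mean square distance over the N corresponding pairs."""
    total = sum(math.dist(A[k], B[t]) ** 2 for k, t in pairs)
    return math.sqrt(total / len(pairs))


if __name__ == "__main__":
    A = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0)]  # n = 3
    B = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]                   # m = 2
    pairs = [(0, 0), (1, 1)]                                 # N = 2
    print(bottleneck(A, B, pairs), rmsd(A, B, pairs))
```

The bottleneck score is dominated by the single worst pair, whereas RMSD averages over all pairs, so one badly placed atom affects the two scores very differently.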
  • 44. RMSD – root mean square deviation. Given two sets of 3-D points $P = \{p_i\}$, $Q = \{q_i\}$, $i = 1, \dots, n$: $\mathrm{rmsd}(P, Q) = \sqrt{\sum_i \lvert p_i - q_i \rvert^2 / n}$. Find a 3-D transformation $T^*$ such that $\mathrm{rmsd}(T^*(P), Q) = \min_T \sqrt{\sum_i \lvert T(p_i) - q_i \rvert^2 / n}$. Goal: find the highest number of atoms aligned with the lowest RMSD.
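For a known one-to-one correspondence, the minimum over rigid transformations has a closed-form solution. The slides do not say which method is used; the sketch below uses Horn's quaternion formulation as one possible choice, with a simple power iteration standing in for a proper eigenvalue solver. It returns only the minimal RMSD, and `minimal_rmsd` and `center` are illustrative names.

```python
import math


def center(points):
    """Translate a point set so that its centroid sits at the origin."""
    n = len(points)
    c = [sum(p[k] for p in points) / n for k in range(3)]
    return [[p[k] - c[k] for k in range(3)] for p in points]


def minimal_rmsd(P, Q, iterations=2000):
    """RMSD between P and Q after the best rotation + translation of P."""
    if len(P) != len(Q) or not P:
        raise ValueError("P and Q must be non-empty and of equal length")
    P, Q = center(P), center(Q)  # optimal translation aligns the centroids
    n = len(P)
    # Correlation sums S[a][b] = sum_i P_i[a] * Q_i[b]
    S = [[sum(p[a] * q[b] for p, q in zip(P, Q)) for b in range(3)]
         for a in range(3)]
    (Sxx, Sxy, Sxz), (Syx, Syy, Syz), (Szx, Szy, Szz) = S
    # Horn's symmetric 4x4 matrix; its largest eigenvalue equals the maximum
    # of sum_i q_i . (R p_i) over all proper rotations R.
    N = [
        [Sxx + Syy + Szz, Syz - Szy, Szx - Sxz, Sxy - Syx],
        [Syz - Szy, Sxx - Syy - Szz, Sxy + Syx, Szx + Sxz],
        [Szx - Sxz, Sxy + Syx, -Sxx + Syy - Szz, Syz + Szy],
        [Sxy - Syx, Szx + Sxz, Syz + Szy, -Sxx - Syy + Szz],
    ]
    # Shift by a Gershgorin bound so every eigenvalue of N + shift*I is >= 0;
    # power iteration then converges toward the largest eigenvalue of N.
    shift = max(sum(abs(v) for v in row) for row in N)
    v = [1.0, 0.5, 0.25, 0.125]  # arbitrary starting vector
    for _ in range(iterations):
        w = [sum(N[i][j] * v[j] for j in range(4)) + shift * v[i]
             for i in range(4)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # degenerate input: all points coincide
            break
        v = [x / norm for x in w]
    # Rayleigh quotient of the (unit) vector gives the eigenvalue estimate.
    lam = sum(v[i] * sum(N[i][j] * v[j] for j in range(4)) for i in range(4))
    g = sum(x * x for p in P for x in p) + sum(x * x for q in Q for x in q)
    # min_R sum |R p_i - q_i|^2 = sum |p_i|^2 + sum |q_i|^2 - 2 * lam
    return math.sqrt(max(0.0, (g - 2.0 * lam) / n))


if __name__ == "__main__":
    P = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (0, 0, 3)]
    # Q is P rotated 90 degrees about z and then translated.
    Q = [(-y + 5, x - 1, z + 2) for x, y, z in P]
    print(minimal_rmsd(P, Q))
```

The eigenvector found by the iteration is the unit quaternion of the optimal rotation, so the same computation can be extended to return the transformation itself. When the correspondence is unknown or the proteins differ in length, this step is only the inner scoring routine of a larger search.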
  • 45. Pitfalls of RMSD. • All atoms are treated equally (residues on the surface have a higher degree of freedom than those in the core). • The best alignment does not always mean minimal RMSD. • It does not take into account the attributes of the amino acids.
  • 46. Flexible alignment vs. rigid alignment
  • 47. Some more issues
  • 48. Does the fact that all proteins have alpha helices indicate that they are all evolutionarily related? No. Alpha helices reflect physical constraints, as do beta sheets. For structures, it is sometimes difficult to separate convergent evolution from evolutionary relatedness.
  • 49. Structural genomics: solve or predict the 3D structure of all proteins of a given organism (X-ray, NMR, and homology modelling). Unlike in traditional structural biology, the 3D structure is often solved before anything is known about the protein in question. A new challenge has emerged: predicting a protein’s function from its 3D structure.
  • 50. CASP: a competition for predicting 3D structures. Instead of rushing to publish a new 3D structure, the AA sequence is published and each group is invited to give its predictions.
  • 51. CAPRI: the same as CASP, but for docking.
  • 52. Homology modeling: predicting the structure from a closely related known structure. This can be important, for example, for predicting how a mutation influences the structure.

Editor's Notes

  1. Atoms on the surface have a higher degree of freedom than those in the core