FBW
             27-11-2012




Wim Van Criekinge
Course contents: Bioinformatics




                                NO CLASS
Biobix: Applied Bioinformatics Research


• Thesis topics
    – Ongoing research
         •   Biomarker prediction / Methylation
         •   Metabonomics
         •   Peptidomics
         •   Translational biotechnology (text mining)
         •   Structural Genomics
         •   miRNA prediction / Target Prediction
         •   Exploring genomic dark matter (junk mining)
    – Collaboration with various institutes
    – Ambition to publish in peer-reviewed journals
The reason for “bioinformatics” to exist ?

                  • empirical finding: if two biological
                    sequences are sufficiently similar, almost
                    invariably they have similar biological
                    functions and will be descended from a
                    common ancestor.
                  • (i) function is encoded into sequence,
                    this means: the sequence provides the
                    syntax and
                  • (ii) there is a redundancy in the
                    encoding, many positions in the
                    sequence may be changed without
                    perceptible changes in the function, thus
                    the semantics of the encoding is robust.
Protein Structure

                    Introduction
                      Why ?
                      How do proteins fold ?
                    Levels of protein structure
                      0,1,2,3,4
                    X-ray / NMR
                    The Protein Database (PDB)
                    Protein Modeling
                    Bioinformatics & Proteomics
                    Weblems
Why protein structure ?


  • Proteins perform a variety of cellular
    tasks in the living cells
  • Each protein adopts a particular folding
    that determines its function
  • The 3D structure of a protein can bring
    into close proximity residues that are far
    apart in the amino acid sequence
  • Catalytic site: Business End of the
    molecule
Rationale for understanding protein structure and function

  • Protein sequence
      - large numbers of sequences, including whole genomes
  • Protein structure
      - three dimensional
      - complicated
      - mediates function
  • Protein function
      - rational drug design and treatment of disease
      - protein and genetic engineering
      - build networks to model cellular pathways
      - study organismal function and evolution
  • Sequence → structure: structure determination, structure prediction
  • Structure → function: homology, rational mutagenesis, biochemical analysis, model studies
  • Sequence → function: ?
About the use of protein models (Peitsch)

• Structure is preserved under evolution when
  sequence is not
     – Interpreting the impact of mutations/SNPs and conserved
       residues on protein function. Potential link to disease
         • Function ?
              – Biochemical: the chemical interactions occurring in a protein
              – Biological: role within the cell
              – Phenotypic: the role in the organism
         • Gene Ontology functional classification !
     – Prioritization of residues to mutate to determine protein
       function
     – Providing hints for protein function: catalytic mechanisms
       of enzymes often require key residues to be close
       together in 3D space
     – (protein-ligand complexes, rational drug design, putative
       interaction interfaces)
Missense mutation
e.g. sickle cell anaemia
Cause: defective haemoglobin due to a mutation in the β-globin gene
Symptoms: severe anaemia and death in homozygotes
Normal β-globin (146 amino acids)
val - his - leu - thr - pro - glu - glu - ---------
  1     2     3     4     5     6     7

                 Normal gene (aa 6)    Mutant gene
DNA (template)   CTC                   CAC
mRNA             GAG                   GUG
Product          Glu                   Val

Mutant β-globin
val - his - leu - thr - pro - val - glu - ---------
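
The codon change above can be traced with a few lines of Python. This is a minimal sketch, assuming the DNA triplets on the slide are template-strand bases that pair position by position with the mRNA codon; the function names and the two-entry codon table are illustrative only.

```python
# Base pairing used when a template-strand triplet is transcribed to mRNA.
TEMPLATE_TO_MRNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

# Only the two codons from the slide; a full genetic-code table is omitted.
CODON_TO_AA = {"GAG": "Glu", "GUG": "Val"}


def transcribe_template(triplet):
    """Return the mRNA codon that pairs base by base with a template triplet."""
    return "".join(TEMPLATE_TO_MRNA[base] for base in triplet)


def translate_template(triplet):
    """Return the amino acid encoded by a template-strand triplet."""
    return CODON_TO_AA[transcribe_template(triplet)]


for label, triplet in (("normal", "CTC"), ("mutant", "CAC")):
    print(label, triplet, transcribe_template(triplet), translate_template(triplet))
```

A single A-for-T substitution in the template is enough to swap a charged glutamate for a hydrophobic valine at position 6.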
Protein Conformation


 • Christian Anfinsen
   Studies on reversible denaturation
   “Sequence specifies conformation”
 • Chaperones and disulfide
   interchange enzymes:
   involved but not controlling final state, they
   provide environment to refold if misfolded
 • Structure implies function: The amino
   acid sequence encodes the protein’s
   structural information
How does a protein fold ?

• by itself:
   – Anfinsen had developed what he called his
     "thermodynamic hypothesis" of protein folding to explain
     the native conformation of amino acid structures. He
     theorized that the native or natural conformation occurs
     because this particular shape is thermodynamically the
     most stable in the intracellular environment. That is, it
     takes this shape as a result of the constraints of the
     peptide bonds as modified by the other chemical and
     physical properties of the amino acids.
   – To test this hypothesis, Anfinsen unfolded the RNase
     enzyme under extreme chemical conditions and observed
     that the enzyme's amino acid structure refolded
     spontaneously back into its original form when he returned
     the chemical environment to natural cellular conditions.
   – "The native conformation is determined by the totality of
     interatomic interactions and hence by the amino acid
     sequence, in a given environment."
The Basics

 • Proteins are linear heteropolymers: one or more
   polypeptide chains
 • Below about 40 residues the term peptide is frequently
   used.
 • A certain number of residues is necessary to perform a
   particular biochemical function, and around 40-50
   residues appears to be the lower limit for a functional
   domain size.
 • Protein sizes range from this lower limit to several
   thousand residues in multi-functional proteins.
 • Three-dimensional shapes (folds) adopted vary
   enormously
 • Experimental methods:
     –   X-ray crystallography
     –   NMR (nuclear magnetic resonance)
     –   Electron microscopy
     –   Ab initio calculations …
Levels of protein structure

                  • Zeroth: amino acid composition
                    (proteomics, %cysteine, %glycine)
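
A minimal sketch of the zeroth level, assuming the protein is available as a string of one-letter amino acid codes. The function name `composition` and the ten-residue peptide are hypothetical and serve only to show the calculation.

```python
from collections import Counter


def composition(seq):
    """Return the percentage of each residue type in a one-letter sequence."""
    seq = seq.upper()
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


# Hypothetical peptide, used only to illustrate the call.
comp = composition("GCGACGKLCG")
print(f"%cysteine = {comp['C']:.1f}, %glycine = {comp['G']:.1f}")
```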
Amino Acid Residues




    The basic structure of an α-amino acid is quite simple. R denotes any one of the
    20 possible side chains (see table below). We notice that the Cα atom has 4
    different ligands (the H is omitted in the drawing) and is thus chiral. An easy
    trick to remember the correct L-form is the CORN rule: when the Cα atom is
    viewed with the H in front, the residues read "CO-R-N" in a clockwise
    direction.
Amino Acid Residues
Levels of protein structure


                    • Primary: This is simply the order of
                      covalent linkages along the
                      polypeptide chain, i.e. the sequence
                      itself
Backbone Torsion Angles
Levels of protein structure


   • Secondary
        – Local organization of the protein backbone: alpha-helix,
          beta-strand (strands assemble into beta-sheets),
          turn and interconnecting loop.
Ramachandran / Phi-Psi Plot
The alpha-helix
A Practical Approach: Interpretation

                     • Residues with hydrophobic properties
                       conserved at i, i+2, i+4, separated by
                       unconserved or hydrophilic residues,
                       suggest surface beta-strands.
                     • A short run of hydrophobic amino acids
                       (4 residues) suggests a buried
                       beta-strand.
                     • Pairs of conserved hydrophobic amino
                       acids separated by pairs of
                       unconserved or hydrophilic residues
                       suggest an alpha-helix with one face
                       packing in the protein core. Likewise,
                       an i, i+3, i+4, i+7 pattern of conserved
                       hydrophobic residues.
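
The two periodicities named above can be scanned for mechanically. The sketch below is a simplification under stated assumptions: it works on a single sequence, so it ignores conservation and does not check that the intervening residues are hydrophilic, and the `HYDROPHOBIC` set is one common choice rather than a definition from the lecture.

```python
# Assumption: one common choice of hydrophobic residues (one-letter codes).
HYDROPHOBIC = set("AVLIMFWC")

BETA_LABEL = "possible surface beta-strand (i, i+2, i+4)"
HELIX_LABEL = "possible alpha-helix (i, i+3, i+4, i+7)"


def matches_pattern(seq, start, offsets):
    """True if every position start+offset exists and is hydrophobic."""
    return all(
        start + off < len(seq) and seq[start + off] in HYDROPHOBIC
        for off in offsets
    )


def scan(seq):
    """Return (position, label) pairs for the two hydrophobic periodicities."""
    hits = []
    for i in range(len(seq)):
        if matches_pattern(seq, i, (0, 2, 4)):
            hits.append((i, BETA_LABEL))
        if matches_pattern(seq, i, (0, 3, 4, 7)):
            hits.append((i, HELIX_LABEL))
    return hits


# Hypothetical sequences, chosen only to trigger each pattern once.
print(scan("VSVSVSS"))
print(scan("LSSLLSSL"))
```

In practice the same test would be applied column by column to a multiple alignment, so that only conserved hydrophobic positions count.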
Beta-sheets
Topologies of Beta-sheets
Secondary structure prediction ?
Secondary structure prediction:CHOU-FASMAN




   • Chou, P.Y. and Fasman, G.D. (1974).
   Conformational parameters for amino acids in helical, β-sheet,
   and random coil regions calculated from proteins.
   Biochemistry 13, 211-221.
   • Chou, P.Y. and Fasman, G.D. (1974).
   Prediction of protein conformation.
   Biochemistry 13, 222-245.
Secondary structure prediction:CHOU-FASMAN




   • Method
     • Assign a set of prediction values to each
     residue, based on statistical analysis of 15
     proteins
     • Apply a simple algorithm to those
     numbers
Secondary structure prediction:CHOU-FASMAN


    Calculation of preference parameters
    For each of the 20 residues and each secondary structure
    (α-helix, β-sheet and β-turn):
    • P = log(observed counts / expected counts) + 1.0
    • Preference parameter > 1.0 → the residue has a
    preference for the specific secondary structure.
    • Preference parameter = 1.0 → the residue neither
    prefers nor dislikes the specific secondary
    structure.
    • Preference parameter < 1.0 → the residue dislikes the
    specific secondary structure.
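
As a minimal sketch, the formula can be evaluated exactly as the slide writes it. The slide does not state the base of the logarithm, so base 10 is an assumption here, and the counts in the example are made up for illustration.

```python
import math


def preference(observed, expected, base=10):
    """P = log(observed / expected) + 1.0, as written on the slide.

    The logarithm base is not given in the source; base 10 is an assumption.
    """
    return math.log(observed / expected, base) + 1.0


def interpret(p, tol=1e-9):
    """Map a preference parameter to the three cases listed on the slide."""
    if abs(p - 1.0) <= tol:
        return "no preference"
    return "prefers" if p > 1.0 else "dislikes"


# Hypothetical counts for one residue in one secondary-structure class.
for observed, expected in ((80, 50), (50, 50), (20, 50)):
    p = preference(observed, expected)
    print(observed, expected, round(p, 3), interpret(p))
```

Whatever the base, equal observed and expected counts give exactly 1.0, which is why 1.0 marks the neutral point.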
Secondary structure prediction:CHOU-FASMAN


         Preference parameters
            Residue   P(a)   P(b)   P(t)    f(i)   f(i+1)   f(i+2)   f(i+3)
              Ala     1.45   0.97   0.57   0.049   0.049    0.034    0.029
              Arg     0.79   0.90   1.00   0.051   0.127    0.025    0.101
              Asn     0.73   0.65   1.68   0.101   0.086    0.216    0.065
              Asp     0.98   0.80   1.26   0.137   0.088    0.069    0.059
              Cys     0.77   1.30   1.17   0.089   0.022    0.111    0.089
              Gln     1.17   1.23   0.56   0.050   0.089    0.030    0.089
              Glu     1.53   0.26   0.44   0.011   0.032    0.053    0.021
              Gly     0.53   0.81   1.68   0.104   0.090    0.158    0.113
              His     1.24   0.71   0.69   0.083   0.050    0.033    0.033
              Ile     1.00   1.60   0.58   0.068   0.034    0.017    0.051
              Leu     1.34   1.22   0.53   0.038   0.019    0.032    0.051
              Lys     1.07   0.74   1.01   0.060   0.080    0.067    0.073
              Met     1.20   1.67   0.67   0.070   0.070    0.036    0.070
              Phe     1.12   1.28   0.71   0.031   0.047    0.063    0.063
              Pro     0.59   0.62   1.54   0.074   0.272    0.012    0.062
              Ser     0.79   0.72   1.56   0.100   0.095    0.095    0.104
              Thr     0.82   1.20   1.00   0.062   0.093    0.056    0.068
              Trp     1.14   1.19   1.11   0.045   0.000    0.045    0.205
              Tyr     0.61   1.29   1.25   0.136   0.025    0.110    0.102
              Val     1.14   1.65   0.30   0.023   0.029    0.011    0.029
Secondary structure prediction:CHOU-FASMAN


     Applying the algorithm (thresholds on the same scale as the table, where 1.00 is neutral)
1.   Assign parameters to each residue.
2.   Identify regions where 4 out of 6 residues have P(a)>1.00: α-helix nucleus. Extend
     the helix in both directions until four contiguous residues have an average
     P(a)<1.00: end of α-helix. If the segment is longer than 5 residues and P(a)>P(b):
     α-helix.
3.   Repeat this procedure to locate all of the helical regions.
4.   Identify regions where 3 out of 5 residues have P(b)>1.00: β-sheet nucleus. Extend
     the sheet in both directions until four contiguous residues have an average
     P(b)<1.00: end of β-sheet. If P(b)>1.05 and P(b)>P(a): β-sheet.
5.   Rest: P(a)>P(b) → α-helix. P(b)>P(a) → β-sheet.
6.   To identify a bend at residue number i, calculate the following value:
     p(t) = f(i) × f(i+1) × f(i+2) × f(i+3)
     If (1) p(t) > 0.000075; (2) average P(t)>1.00 in the tetrapeptide; and (3)
     averages for the tetrapeptide obey P(a)<P(t)>P(b): β-turn.
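
Two of the steps above are easy to sketch in Python: the helix nucleation scan of step 2 and the bend product of step 6. This is a partial illustration, not a full Chou-Fasman implementation: extension, overlap resolution and the β-sheet scan are left out. Thresholds use the table's scale, where 1.00 is neutral (the same cutoff is sometimes written as 100). The function names and test sequences are hypothetical; the numbers are copied from the preference-parameter table.

```python
import math

# P(a) values copied from the preference-parameter table (one-letter codes).
P_ALPHA = {
    "A": 1.45, "R": 0.79, "N": 0.73, "D": 0.98, "C": 0.77,
    "Q": 1.17, "E": 1.53, "G": 0.53, "H": 1.24, "I": 1.00,
    "L": 1.34, "K": 1.07, "M": 1.20, "F": 1.12, "P": 0.59,
    "S": 0.79, "T": 0.82, "W": 1.14, "Y": 0.61, "V": 1.14,
}

# Bend frequencies (f(i), f(i+1), f(i+2), f(i+3)) for four residues only,
# copied from the same table to keep the example short.
BEND_F = {
    "N": (0.101, 0.086, 0.216, 0.065),
    "P": (0.074, 0.272, 0.012, 0.062),
    "G": (0.104, 0.090, 0.158, 0.113),
    "S": (0.100, 0.095, 0.095, 0.104),
}


def helix_nuclei(seq, window=6, needed=4, threshold=1.00):
    """Return 0-based starts of windows where at least `needed` of `window`
    residues have P(a) above `threshold` (nucleation part of step 2)."""
    starts = []
    for i in range(len(seq) - window + 1):
        formers = sum(1 for aa in seq[i:i + window] if P_ALPHA[aa] > threshold)
        if formers >= needed:
            starts.append(i)
    return starts


def turn_probability(tetrapeptide):
    """p(t) = f(i) * f(i+1) * f(i+2) * f(i+3) for a four-residue window."""
    return math.prod(BEND_F[aa][k] for k, aa in enumerate(tetrapeptide))


# Hypothetical sequences chosen only to exercise the two functions.
print(helix_nuclei("GGAEAEAEGG"))
print(turn_probability("NPGS") > 0.000075)
```

Overlapping nucleation windows would be merged and extended in a complete implementation before the P(a) versus P(b) comparison is made.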
Secondary structure prediction:CHOU-FASMAN


   Successful method?
   19 proteins evaluated:
   • Successful in locating 88% of helical and 95% of
       β regions
   • Correctly predicting 80% of helical and 86% of
     β-sheet residues
   • Accuracy of predicting the three conformational
     states for all residues (helix, β, and coil) is 77%
   Chou & Fasman: successful method
   After 1974: improvement of preference parameters
Sander-Schneider: Evolution of overall structure


• Naturally occurring sequences with more than
  20% sequence identity over 80 or more
  residues always adopt the same basic
  structure (Sander and Schneider 1991)
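
The rule of thumb above can be written as a simple check on a pairwise alignment. This is a minimal sketch under stated assumptions: the two sequences are already aligned and of equal length, gapped columns are skipped, and the 20% and 80-residue cutoffs are the values quoted on the slide, passed as parameters. Function names and sequences are hypothetical.

```python
def percent_identity(a, b):
    """Return (percent identity, number of compared columns) for two aligned,
    equal-length strings. Columns with a gap ('-') in either sequence are skipped."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0, 0
    same = sum(1 for x, y in pairs if x == y)
    return 100.0 * same / len(pairs), len(pairs)


def likely_same_fold(a, b, min_identity=20.0, min_length=80):
    """Apply the identity and length thresholds quoted on the slide."""
    identity, length = percent_identity(a, b)
    return identity > min_identity and length >= min_length


# Hypothetical aligned pair: 100 columns, first 30 identical.
query = "ACDE" * 25
target = query[:30] + "W" * 70
print(percent_identity(query, target), likely_same_fold(query, target))
```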
Sander-Schneider

• HSSP: homology-derived secondary structure of proteins
Structural Family Databases



                    • SCOP:
                         – Structural Classification of
                           Proteins
                    • FSSP:
                         – Families of Structurally Similar
                           Proteins
                    • CATH:
                         – Class, Architecture, Topology,
                           Homology
Levels of protein structure


                  • Tertiary
                      – Packing of secondary structure
                        elements into a compact spatial unit
                      – Fold or domain: this is the level at
                        which structure prediction is currently possible
Domains
Protein Architecture
Domains



   • Protein dissection into domains
   • Conserved Domain Architecture
     Retrieval Tool (CDART) uses
     information in Pfam and SMART to
     assign domains along a sequence
   • (runs automatically with a BLAST search)
Domains


• From the analysis of alignments of protein
  families
• Conserved sequence features, usually
  associated with a specific function
• PROSITE database of protein
  “signatures” (large number of false positives and
  false negatives)
• From alignment of homologous sequences
  (PRINTS/PRODOM)
• From Hidden Markov Models (PFAM)
• Meta approach: INTERPRO
Protein Architecture
Levels of protein structure: Topology
Hydrophobicity Plot

P53_HUMAN (P04637) human cellular tumor antigen p53
      Kyte-Doolittle hydropathy, window=19
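
A plot of this kind is a sliding-window average over a per-residue scale. The sketch below uses the Kyte-Doolittle hydropathy values and the window of 19 named on the slide; the function name and the test sequence are hypothetical, and the p53 sequence itself is not reproduced here.

```python
# Kyte-Doolittle hydropathy index per residue (one-letter codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Return the mean hydropathy of every full window along the sequence.

    Element k is the average over residues k .. k+window-1 and is usually
    plotted at the window centre, k + window // 2.
    """
    values = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical sequence: a hydrophobic stretch flanked by charged residues.
profile = hydropathy_profile("KKDE" + "L" * 19 + "DEKK")
print(max(profile), min(profile))
```

A window of about 19 residues is close to the length of a membrane-spanning helix, so peaks in the profile point at candidate transmembrane segments.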
The ‘positive inside’ rule
(EMBO J. 5:3021; EJB 174:671,205:1207; FEBS lett. 282:41)



                                Bacterial IM
                                In: 16% KR out: 4% KR
                                Eukaryotic PM
                                In: 17% KR out: 7% KR
                                Thylakoid membrane
                                In: 13% KR out: 5% KR
                                Mitochondrial IM
                                In: 10% KR out: 3% KR
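
The rule can be turned into a simple orientation guess: pool the loops on each side of the membrane and call the side with the higher Lys+Arg (KR) fraction "inside". This is a minimal sketch; the function names and loop sequences are hypothetical, and real topology predictors combine this signal with hydropathy.

```python
def kr_fraction(loop):
    """Fraction of Lys (K) and Arg (R) residues in a loop sequence."""
    if not loop:
        return 0.0
    return sum(loop.count(aa) for aa in "KR") / len(loop)


def predict_inside(loops_side_a, loops_side_b):
    """Return 'A' or 'B' for the side whose pooled loops are richer in K+R."""
    frac_a = kr_fraction("".join(loops_side_a))
    frac_b = kr_fraction("".join(loops_side_b))
    return "A" if frac_a >= frac_b else "B"


# Hypothetical loops on the two sides of a membrane protein.
side_a = ["KRKA", "RRSG"]
side_b = ["SGTA", "DENQ"]
print(kr_fraction("".join(side_a)), kr_fraction("".join(side_b)))
print(predict_inside(side_a, side_b))
```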
GPCR Topology


   • Membrane-bound receptors

   • Transduce signals as diverse as photons, organic
   odorants, nucleotides, nucleosides, peptides, lipids and
   proteins.
   • 6 different families

   • A very large number of different domains, both to
   bind their ligands and to activate G proteins.

   • Pharmaceutically the most important class

   • Challenge: methods to find novel GPCRs in the human genome
   …
GPCR Topology
GPCR Topology / GPCR Structure




                • Seven transmembrane regions
                • Hydrophobic/ hydrophilic domains
                • Conserved residues and motifs (e.g. NPXXY)
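
A motif such as NPXXY, where X stands for any residue, maps directly onto a regular expression. The sketch below is illustrative: the function name and sequences are hypothetical, and a lookahead is used so that overlapping occurrences are also reported.

```python
import re

# N, P, any two residues, Y. The lookahead keeps overlapping matches.
NPXXY = re.compile(r"(?=NP..Y)")


def find_npxxy(seq):
    """Return 0-based start positions of every NPxxY occurrence."""
    return [m.start() for m in NPXXY.finditer(seq)]


# Hypothetical sequences used only to show the call.
print(find_npxxy("AAANPLIYAA"))
print(find_npxxy("NPNPYAY"))
```

A motif hit on its own is weak evidence; it becomes informative when it falls in the expected place relative to the seven predicted transmembrane regions.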
GPCR Topology




          E.g. plot conserved residues (or multiple alignment: MSA to SSA)
Levels of protein structure




                              • Quaternary
                              • Difficult to predict
                              • Functional units:
                                apoptosome, proteasome
What is X-ray Crystallography

• X-ray crystallography is an experimental
  technique that exploits the fact that X-rays are
  diffracted by crystals.
• X-rays have the proper wavelength (in the
  Ångström range, ~10⁻⁸ cm) to be scattered by
  the electron cloud of an atom of comparable
  size.
• Based on the diffraction pattern obtained from
  X-ray scattering off the periodic assembly of
  molecules or atoms in the crystal, the electron
  density can be reconstructed.
• A model is then progressively built into the
  experimental electron density, refined against
  the data and the result is a quite accurate
  molecular structure.
NMR or Crystallography ?

 • NMR uses protein in solution
     – Can look at the dynamic properties of the protein structure
     – Can look at the interactions between the protein and
       ligands, substrates or other proteins
     – Can look at protein folding
     – Sample is not damaged in any way
     – The maximum size of a protein for NMR structure determination is ~30
       kDa. This eliminates ~50% of all proteins
     – High solubility is a requirement

 • X-ray crystallography uses protein crystals
     –   No size limit: As long as you can crystallise it
     –   Solubility requirement is less stringent
     –   Simple definition of resolution
     –   Direct calculation from data to electron density and back again
     –   Crystallisation is the process bottleneck and is binary (all or nothing)
     –   Phase problem: relies on heavy atom soaks or SeMet incorporation
 • Both techniques require large amounts of pure protein and require
   expensive equipment!
PDB
Visualizing Structures




      Cn3D version 4.0 (NCBI)
Visualizing Structures




        Ball: van der Waals radius; colours: N blue, O red, S yellow, C gray (or green)
        Stick: length joins atom centres
Visualizing Structures




                         From N to C
Visualizing Structures


                   • Demonstration of Protein explorer
                   • PDB, install Chime
                   • Search helicase (select structure where
                     DNA is present)
                   • Stop spinning, hide water molecules
                   • Show basic residues, interact with
                     negatively charged backbone

                   • RASMOL / Cn3D
Modeling
Protein Structure
     Molecular Modeling:
building a 3D protein structure
       from its sequence
Modeling

  • Finding a structural homologue
  • BLAST
     – versus the PDB database, or
       PSI-BLAST (E<0.005)
     – Domain coverage at least 60%
  • Avoid gaps
     – Prefer few gaps and
       reasonable similarity scores
       over lots of gaps and high
       similarity scores
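
The selection criteria above amount to a filter followed by a ranking. The sketch below assumes search hits have already been parsed into simple records; the `Hit` class, its field names and the example values are hypothetical and not tied to any BLAST output format.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    template_id: str   # hypothetical identifier of the template structure
    e_value: float
    coverage: float    # fraction of the query domain covered, 0..1
    gaps: int
    score: float


def select_templates(hits, max_e=0.005, min_coverage=0.60):
    """Keep hits passing the E-value and coverage cutoffs from the slide,
    then rank them with fewer gaps first and higher score second."""
    kept = [h for h in hits if h.e_value < max_e and h.coverage >= min_coverage]
    return sorted(kept, key=lambda h: (h.gaps, -h.score))


# Made-up hits for illustration only.
hits = [
    Hit("tmplA", 1e-30, 0.95, 12, 410.0),
    Hit("tmplB", 1e-20, 0.90, 2, 300.0),
    Hit("tmplC", 0.01, 0.99, 0, 50.0),     # fails the E-value cutoff
    Hit("tmplD", 1e-25, 0.40, 1, 350.0),   # fails the coverage cutoff
]
print([h.template_id for h in select_templates(hits)])
```

Ranking by gap count before score reflects the advice to accept a somewhat lower similarity score in exchange for a cleaner alignment.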
Modeling

• Extract “template” sequences and align them with the query

•   Watch out for missing data (PDB file) and complement with additional
    templates
•   Try to get as much information as possible, X-ray/NMR

•   The sequence alignment derived from structure comparison of templates (SSA) can
    differ from a simple sequence alignment

•   >40% identity: any alignment method is OK
•   <40%: checks are essential
     –   Residue conservation checks in functional regions (patterns/motifs)
     –   Indels: combine gaps separated by few residues
     –   Manual editing: move gaps from secondary structure elements to loops
     –   Within loops, move gaps to loop ends, i.e. the turnaround point of the backbone

•   Align templates structurally, extract the corresponding SSA or QTA
    (query/template alignment)
Modeling


 Input for model building

 • Query sequence (the one you want the 3D
   model for)
 • Template sequences and structures
 • Query/template(s) (structure) sequence
   alignment
Modeling


 • Methods (for details, see the paper):
    – WHATIF,
    – SWISS-MODEL,
    – MODELLER,
    – ICM,
    – 3D-JIGSAW,
    – CPH-models,
    – SDC1
Modeling



   • Model evaluation (how good is the
     prediction, and how much could the algorithm
     rely on and extract from the provided templates?)
       – PROCHECK
       – WHATIF
       – ERRAT

   • CASP (Critical Assessment of Structure
     Prediction)
       – Best method is manual alignment editing!
Comparative modelling at CASP

                          BC          CASP1      CASP2      CASP3      CASP4
   alignment              excellent   poor       fair       fair       fair
   side chain             ~80%        ~50%       ~75%       ~75%       ~75%
   short loops            1.0 Å       ~3.0 Å     ~1.0 Å     ~1.0 Å     ~1.0 Å
   longer loops           2.0 Å       >5.0 Å     ~3.0 Å     ~2.5 Å     ~2.0 Å

   CASP4: overall model accuracy ranging from 1 Å to 6 Å for 50-10% sequence identity

   T128/sodm: 1.0 Å (198 residues; 50%)
   T111/eno:  1.7 Å (430 residues; 51%)
   T122/trpa: 2.9 Å (241 residues; 33%)
   T125/sp18: 4.4 Å (137 residues; 24%)
   T112/dhso: 4.9 Å (348 residues; 24%)
   T92/yeco:  5.6 Å (104 residues; 12%)
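
The slide does not say how the Å figures were computed. Assuming they are root-mean-square deviations (RMSD) between matched atoms of model and experimental structure, a minimal sketch looks like this. It further assumes the two coordinate sets are already superimposed and listed in the same atom order; the function name and coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z).

    No superposition is performed; the structures are assumed to be
    pre-aligned and their atoms matched one to one.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


# Hypothetical coordinates in Å: every atom of the model is shifted by 1 Å.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
target = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
print(rmsd(model, target))
```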
Protein Engineering / Protein Design

More Related Content

What's hot

Structure-Function Analysis of POR Mutants
Structure-Function Analysis of POR MutantsStructure-Function Analysis of POR Mutants
Structure-Function Analysis of POR Mutants
AYang999
 
BT631-8-Folds_proteins
BT631-8-Folds_proteinsBT631-8-Folds_proteins
BT631-8-Folds_proteins
Rajesh G
 
Structure-Function Analysis of POR Mutants
Structure-Function Analysis of POR MutantsStructure-Function Analysis of POR Mutants
Structure-Function Analysis of POR Mutants
AYang999
 
Jack Tuszynski
Jack TuszynskiJack Tuszynski
Jack Tuszynski
agrilinea
 
BT631-6-structural_motifs
BT631-6-structural_motifsBT631-6-structural_motifs
BT631-6-structural_motifs
Rajesh G
 
14 proteins
14 proteins14 proteins
14 proteins
MUBOSScz
 

What's hot (20)

Structure-Function Analysis of POR Mutants
Structure-Function Analysis of POR MutantsStructure-Function Analysis of POR Mutants
Structure-Function Analysis of POR Mutants
 
BT631-8-Folds_proteins
BT631-8-Folds_proteinsBT631-8-Folds_proteins
BT631-8-Folds_proteins
 
Structure-Function Analysis of POR Mutants
Structure-Function Analysis of POR MutantsStructure-Function Analysis of POR Mutants
Structure-Function Analysis of POR Mutants
 
Conformational study of polynucleotide
Conformational study of polynucleotideConformational study of polynucleotide
Conformational study of polynucleotide
 
Protein folding @ sid
Protein folding @ sidProtein folding @ sid
Protein folding @ sid
 
Membrane proteins
Membrane proteinsMembrane proteins
Membrane proteins
 
Jack Tuszynski
Jack TuszynskiJack Tuszynski
Jack Tuszynski
 
Proteins structure and role in gene expression
Proteins structure and role in gene expressionProteins structure and role in gene expression
Proteins structure and role in gene expression
 
Supersecondary structure ppt
Supersecondary structure pptSupersecondary structure ppt
Supersecondary structure ppt
 
BT631-6-structural_motifs
BT631-6-structural_motifsBT631-6-structural_motifs
BT631-6-structural_motifs
 
Protein Structures
Protein StructuresProtein Structures
Protein Structures
 
non ribosomal peptide synthesis (molecular biology)
non ribosomal peptide synthesis (molecular biology)non ribosomal peptide synthesis (molecular biology)
non ribosomal peptide synthesis (molecular biology)
 
Protein stability(molecular biology)
Protein stability(molecular biology)Protein stability(molecular biology)
Protein stability(molecular biology)
 
Protein folding
Protein foldingProtein folding
Protein folding
 
Protein structure, Protein unfolding and misfolding
Protein structure, Protein unfolding and misfoldingProtein structure, Protein unfolding and misfolding
Protein structure, Protein unfolding and misfolding
 
Protein
ProteinProtein
Protein
 
14 proteins
14 proteins14 proteins
14 proteins
 
The World of Nonribosomal Peptides
The World of Nonribosomal PeptidesThe World of Nonribosomal Peptides
The World of Nonribosomal Peptides
 
Protein structure and_stability-1
Protein structure and_stability-1Protein structure and_stability-1
Protein structure and_stability-1
 
Pro fold
Pro foldPro fold
Pro fold
 

Viewers also liked

Bioinformatica 10-11-2011-t5-database searching
Bioinformatica 10-11-2011-t5-database searchingBioinformatica 10-11-2011-t5-database searching
Bioinformatica 10-11-2011-t5-database searching
Prof. Wim Van Criekinge
 
2014 03 28_next_generation_epigenetic_profling_v_les_epigenetica_vweb
2014 03 28_next_generation_epigenetic_profling_v_les_epigenetica_vweb2014 03 28_next_generation_epigenetic_profling_v_les_epigenetica_vweb
2014 03 28_next_generation_epigenetic_profling_v_les_epigenetica_vweb
Prof. Wim Van Criekinge
 

Viewers also liked (20)

2012 12 12_adam_v_final
2012 12 12_adam_v_final2012 12 12_adam_v_final
2012 12 12_adam_v_final
 
Bioinformatics v2014 wim_vancriekinge
Bioinformatics v2014 wim_vancriekingeBioinformatics v2014 wim_vancriekinge
Bioinformatics v2014 wim_vancriekinge
 
Bioinformatics t5-databasesearching v2014
Bioinformatics t5-databasesearching v2014Bioinformatics t5-databasesearching v2014
Bioinformatics t5-databasesearching v2014
 
NXTGNT kick off
NXTGNT kick offNXTGNT kick off
NXTGNT kick off
 
2015 bioinformatics bio_cheminformatics_wim_vancriekinge
2015 bioinformatics bio_cheminformatics_wim_vancriekinge2015 bioinformatics bio_cheminformatics_wim_vancriekinge
2015 bioinformatics bio_cheminformatics_wim_vancriekinge
 
Bioinformatica t3-scoring matrices
Bioinformatica t3-scoring matricesBioinformatica t3-scoring matrices
Bioinformatica t3-scoring matrices
 
Bioinformatica 10-11-2011-t5-database searching
Bioinformatica 10-11-2011-t5-database searchingBioinformatica 10-11-2011-t5-database searching
Bioinformatica 10-11-2011-t5-database searching
 
2012 12 02_epigenetic_profiling_environmental_health_sciences
2012 12 02_epigenetic_profiling_environmental_health_sciences2012 12 02_epigenetic_profiling_environmental_health_sciences
2012 12 02_epigenetic_profiling_environmental_health_sciences
 
2015 bioinformatics python_io_wim_vancriekinge
2015 bioinformatics python_io_wim_vancriekinge2015 bioinformatics python_io_wim_vancriekinge
2015 bioinformatics python_io_wim_vancriekinge
 
2015 bioinformatics bio_python_part4
2015 bioinformatics bio_python_part42015 bioinformatics bio_python_part4
2015 bioinformatics bio_python_part4
 
Bioinformatics t9-t10-biocheminformatics v2014
Bioinformatics t9-t10-biocheminformatics v2014Bioinformatics t9-t10-biocheminformatics v2014
Bioinformatics t9-t10-biocheminformatics v2014
 
Jose María Ordovás-El impacto de las ciencias ómicas en la nutrición, la medi...
Jose María Ordovás-El impacto de las ciencias ómicas en la nutrición, la medi...Jose María Ordovás-El impacto de las ciencias ómicas en la nutrición, la medi...
Jose María Ordovás-El impacto de las ciencias ómicas en la nutrición, la medi...
 
2014 03 28_next_generation_epigenetic_profling_v_les_epigenetica_vweb
2014 03 28_next_generation_epigenetic_profling_v_les_epigenetica_vweb2014 03 28_next_generation_epigenetic_profling_v_les_epigenetica_vweb
2014 03 28_next_generation_epigenetic_profling_v_les_epigenetica_vweb
 
2015 bioinformatics alignments_wim_vancriekinge
2015 bioinformatics alignments_wim_vancriekinge2015 bioinformatics alignments_wim_vancriekinge
2015 bioinformatics alignments_wim_vancriekinge
 
Thesis bio bix_2014
Thesis bio bix_2014Thesis bio bix_2014
Thesis bio bix_2014
 
Van criekinge next_generation_epigenetic_profling_vlille
Van criekinge next_generation_epigenetic_profling_vlilleVan criekinge next_generation_epigenetic_profling_vlille
Van criekinge next_generation_epigenetic_profling_vlille
 
2015 bioinformatics go_hmm_wim_vancriekinge
2015 bioinformatics go_hmm_wim_vancriekinge2015 bioinformatics go_hmm_wim_vancriekinge
2015 bioinformatics go_hmm_wim_vancriekinge
 
2014 05 21_personal_genomics_v_n2n_vfinal
2014 05 21_personal_genomics_v_n2n_vfinal2014 05 21_personal_genomics_v_n2n_vfinal
2014 05 21_personal_genomics_v_n2n_vfinal
 
Bio ontologies and semantic technologies
Bio ontologies and semantic technologiesBio ontologies and semantic technologies
Bio ontologies and semantic technologies
 
Bioinformatics t9-t10-bio cheminformatics-wimvancriekinge_v2013
Bioinformatics t9-t10-bio cheminformatics-wimvancriekinge_v2013Bioinformatics t9-t10-bio cheminformatics-wimvancriekinge_v2013
Bioinformatics t9-t10-bio cheminformatics-wimvancriekinge_v2013
 

Bioinformatica t7-protein structure

  • 2. FBW 27-11-2012 Wim Van Criekinge
  • 5. Biobix: Applied Bioinformatics Research • Thesis topics – Ongoing research • Biomarker prediction / Methylation • Metabonomics • Peptidomics • Translational biotechnology (text mining) • Structural Genomics • miRNA prediction / Target Prediction • Exploring genomic dark matter (junk mining) – Collaboration with various institutes – Ambition to publish in peer-reviewed journals
  • 6. The reason for “bioinformatics” to exist ? • empirical finding: if two biological sequences are sufficiently similar, almost invariably they have similar biological functions and will be descended from a common ancestor. • (i) function is encoded into sequence, this means: the sequence provides the syntax and • (ii) there is a redundancy in the encoding, many positions in the sequence may be changed without perceptible changes in the function, thus the semantics of the encoding is robust.
  • 7. Protein Structure Introduction Why ? How do proteins fold ? Levels of protein structure 0,1,2,3,4 X-ray / NMR The Protein Database (PDB) Protein Modeling Bioinformatics & Proteomics Weblems
  • 8. Why protein structure ? • Proteins perform a variety of cellular tasks in the living cells • Each protein adopts a particular folding that determines its function • The 3D structure of a protein can bring into close proximity residues that are far apart in the amino acid sequence • Catalytic site: Business End of the molecule
  • 9. Rationale for understanding protein structure and function • Protein sequence: large numbers of sequences, including whole genomes • Sequence to structure: structure determination, structure prediction • Protein structure: three dimensional, complicated, mediates function • Structure to function (?): homology, rational mutagenesis, biochemical analysis, model studies • Protein function: rational drug design and treatment of disease; protein and genetic engineering; build networks to model cellular pathways; study organismal function and evolution
  • 10. About the use of protein models (Peitsch) • Structure is preserved under evolution when sequence is not – Interpreting the impact of mutations/SNPs and conserved residues on protein function. Potential link to disease • Function ? – Biochemical: the chemical interactions occurring in a protein – Biological: role within the cell – Phenotypic: the role in the organism • Gene Ontology functional classification ! – Prioritisation of residues to mutate to determine protein function – Providing hints for protein function: catalytic mechanisms of enzymes often require key residues to be close together in 3D space – (protein-ligand complexes, rational drug design, putative interaction interfaces)
  • 11. MIS-SENSE MUTATION e.g. Sickle Cell Anaemia Cause: defective haemoglobin due to mutation in β-globin gene Symptoms: severe anaemia and death in homozygote
  • 12. Normal β-globin (146 amino acids): val - his - leu - thr - pro - glu - glu - ... (positions 1 to 7) • Codon for amino acid 6, normal gene: DNA CTC, mRNA GAG, product Glu • Codon for amino acid 6, mutant gene: DNA CAC, mRNA GUG, product Val • Mutant β-globin: val - his - leu - thr - pro - val - glu - ...
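The single-base change on this slide can be traced with a short sketch. It assumes the DNA triplets shown (CTC, CAC) are written for the template strand, so the mRNA codon follows by base-wise complementing. The helper names and the two-entry codon table are illustrative, not part of the lecture.

```python
# Complement each template-strand DNA base into the matching mRNA base.
DNA_TO_MRNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

# Only the two codons needed for this example (standard genetic code).
CODON_TABLE = {"GAG": "Glu", "GUG": "Val"}


def transcribe_template(triplet: str) -> str:
    """Return the mRNA codon paired with a template-strand DNA triplet."""
    return "".join(DNA_TO_MRNA[base] for base in triplet)


def translate(codon: str) -> str:
    """Look up the amino acid for an mRNA codon in the small table above."""
    return CODON_TABLE[codon]


for label, dna in (("normal", "CTC"), ("mutant", "CAC")):
    mrna = transcribe_template(dna)
    print(label, dna, mrna, translate(mrna))
```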
  • 13. Protein Conformation • Christian Anfinsen: studies on reversible denaturation, “Sequence specifies conformation” • Chaperones and disulfide interchange enzymes: involved but not controlling final state, they provide environment to refold if misfolded • Structure implies function: the amino acid sequence encodes the protein’s structural information
  • 14. How does a protein fold ? • by itself: – Anfinsen had developed what he called his "thermodynamic hypothesis" of protein folding to explain the native conformation of amino acid structures. He theorized that the native or natural conformation occurs because this particular shape is thermodynamically the most stable in the intracellular environment. That is, it takes this shape as a result of the constraints of the peptide bonds as modified by the other chemical and physical properties of the amino acids. – To test this hypothesis, Anfinsen unfolded the RNase enzyme under extreme chemical conditions and observed that the enzyme's amino acid structure refolded spontaneously back into its original form when he returned the chemical environment to natural cellular conditions. – "The native conformation is determined by the totality of interatomic interactions and hence by the amino acid sequence, in a given environment."
  • 15. Protein Structure Introduction Why ? How do proteins fold ? Levels of protein structure 0,1,2,3,4 X-ray / NMR The Protein Database (PDB) Protein Modeling Bioinformatics & Proteomics Weblems
  • 16. The Basics • Proteins are linear heteropolymers: one or more polypeptide chains • Below about 40 residues the term peptide is frequently used. • A certain number of residues is necessary to perform a particular biochemical function, and around 40-50 residues appears to be the lower limit for a functional domain size. • Protein sizes range from this lower limit to several hundred residues in multi-functional proteins. • Three-dimensional shapes (folds) adopted vary enormously • Experimental methods: – X-ray crystallography – NMR (nuclear magnetic resonance) – Electron microscopy – Ab initio calculations …
  • 17. Levels of protein structure • Zeroth: amino acid composition (proteomics, %cysteine, %glycine)
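As a minimal sketch of the zeroth level, amino acid composition is just a percentage count per residue type. The function name and the example sequence below are hypothetical.

```python
from collections import Counter


def composition(sequence: str) -> dict[str, float]:
    """Return the percentage of each residue type in a one-letter sequence."""
    seq = sequence.upper()
    if not seq:
        return {}
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


example = "MCGGACCKGL"  # hypothetical 10-residue peptide
comp = composition(example)
print(f"%Cys = {comp.get('C', 0.0):.1f}, %Gly = {comp.get('G', 0.0):.1f}")
```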
  • 18. Amino Acid Residues The basic structure of an α-amino acid is quite simple. R denotes any one of the 20 possible side chains (see table below). We notice that the Cα-atom has 4 different ligands (the H is omitted in the drawing) and is thus chiral. An easy trick to remember the correct L-form is the CORN-rule: when the Cα-atom is viewed with the H in front, the residues read "CO-R-N" in a clockwise direction.
  • 24. Levels of protein structure • Primary: This is simply the order of covalent linkages along the polypeptide chain, i.e. the sequence itself
  • 27. Levels of protein structure • Secondary – Local organization of the protein backbone: alpha-helix, beta-strand (which assemble into beta-sheets), turn and interconnecting loop.
  • 30. A Practical Approach: Interpretation • Residues with hydrophobic properties conserved at i, i+2, i+4 separated by unconserved or hydrophilic residues suggest surface beta-strands. A short run of hydrophobic amino acids (4 residues) suggests a buried beta-strand. Pairs of conserved hydrophobic amino acids separated by pairs of unconserved or hydrophilic residues suggest an alpha-helix with one face packing in the protein core. Likewise, an i, i+3, i+4, i+7 pattern of conserved hydrophobic residues suggests such a helix.
  • 34. Secondary structure prediction: CHOU-FASMAN • Chou, P.Y. and Fasman, G.D. (1974). Conformational parameters for amino acids in helical, β-sheet, and random coil regions calculated from proteins. Biochemistry 13, 211-221. • Chou, P.Y. and Fasman, G.D. (1974). Prediction of protein conformation. Biochemistry 13, 222-245.
  • 35. Secondary structure prediction: CHOU-FASMAN • Method • Assigning a set of prediction values to a residue, based on statistical analysis of 15 proteins • Applying a simple algorithm to those numbers
  • 36. Secondary structure prediction: CHOU-FASMAN Calculation of preference parameters For each of the 20 residues and each secondary structure (α-helix, β-sheet and β-turn): P = log(observed counts / expected counts) + 1.0 • Preference parameter > 1.0 → specific residue has a preference for the specific secondary structure. • Preference parameter = 1.0 → specific residue does not have a preference for, nor dislikes the specific secondary structure. • Preference parameter < 1.0 → specific residue dislikes the specific secondary structure.
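The preference parameter as written on this slide can be sketched in a few lines. The base of the logarithm is not given in the slide, so base 10 is an assumption here, and the counts used are made-up illustrations rather than data from the 1974 papers.

```python
import math


def preference(observed: float, expected: float) -> float:
    """P = log(observed / expected) + 1.0, using log base 10 (an assumption)."""
    return math.log10(observed / expected) + 1.0


def interpret(p: float, tol: float = 1e-9) -> str:
    """Map a preference parameter onto the three cases listed on the slide."""
    if p > 1.0 + tol:
        return "prefers this structure"
    if p < 1.0 - tol:
        return "dislikes this structure"
    return "no preference"


# Hypothetical counts: a residue seen 20 times in helices where 10 were expected.
p_helix = preference(observed=20, expected=10)
print(round(p_helix, 3), interpret(p_helix))
```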
  • 37. Secondary structure prediction: CHOU-FASMAN Preference parameters
      Residue  P(a)  P(b)  P(t)  f(i)   f(i+1)  f(i+2)  f(i+3)
      Ala      1.45  0.97  0.57  0.049  0.049   0.034   0.029
      Arg      0.79  0.90  1.00  0.051  0.127   0.025   0.101
      Asn      0.73  0.65  1.68  0.101  0.086   0.216   0.065
      Asp      0.98  0.80  1.26  0.137  0.088   0.069   0.059
      Cys      0.77  1.30  1.17  0.089  0.022   0.111   0.089
      Gln      1.17  1.23  0.56  0.050  0.089   0.030   0.089
      Glu      1.53  0.26  0.44  0.011  0.032   0.053   0.021
      Gly      0.53  0.81  1.68  0.104  0.090   0.158   0.113
      His      1.24  0.71  0.69  0.083  0.050   0.033   0.033
      Ile      1.00  1.60  0.58  0.068  0.034   0.017   0.051
      Leu      1.34  1.22  0.53  0.038  0.019   0.032   0.051
      Lys      1.07  0.74  1.01  0.060  0.080   0.067   0.073
      Met      1.20  1.67  0.67  0.070  0.070   0.036   0.070
      Phe      1.12  1.28  0.71  0.031  0.047   0.063   0.063
      Pro      0.59  0.62  1.54  0.074  0.272   0.012   0.062
      Ser      0.79  0.72  1.56  0.100  0.095   0.095   0.104
      Thr      0.82  1.20  1.00  0.062  0.093   0.056   0.068
      Trp      1.14  1.19  1.11  0.045  0.000   0.045   0.205
      Tyr      0.61  1.29  1.25  0.136  0.025   0.110   0.102
      Val      1.14  1.65  0.30  0.023  0.029   0.011   0.029
  • 38. Secondary structure prediction: CHOU-FASMAN Applying algorithm (the thresholds 100 and 105 are on the ×100 scale, i.e. 1.00 and 1.05 in the preference table)
      1. Assign parameters to residue.
      2. Identify regions where 4 out of 6 residues have P(a)>100: α-helix. Extend helix in both directions until four contiguous residues have an average P(a)<100: end of α-helix. If segment is longer than 5 residues and P(a)>P(b): α-helix.
      3. Repeat this procedure to locate all of the helical regions.
      4. Identify regions where 3 out of 5 residues have P(b)>100: β-sheet. Extend sheet in both directions until four contiguous residues have an average P(b)<100: end of β-sheet. If P(b)>105 and P(b)>P(a): β-sheet.
      5. Rest: P(a)>P(b) → α-helix. P(b)>P(a) → β-sheet.
      6. To identify a bend at residue number i, calculate the following value: p(t) = f(i)f(i+1)f(i+2)f(i+3). If: (1) p(t) > 0.000075; (2) average P(t)>1.00 in the tetrapeptide; and (3) averages for tetrapeptide obey P(a)<P(t)>P(b): β-turn.
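The nucleation rules in steps 2 and 4 can be sketched as a sliding-window count. This is only the seeding step, not the full method: extension, overlap resolution and turn prediction are left out. It assumes the threshold of 100 in the algorithm corresponds to 1.00 in the preference table, uses the P(a) and P(b) columns of that table keyed by one-letter code, and the test peptide is hypothetical.

```python
RESIDUES = "ARNDCQEGHILKMFPSTWYV"  # Ala, Arg, Asn, ... Val in table order

# P(a) and P(b) columns from the preference table on the previous slide.
P_ALPHA = dict(zip(RESIDUES, [1.45, 0.79, 0.73, 0.98, 0.77, 1.17, 1.53, 0.53,
                              1.24, 1.00, 1.34, 1.07, 1.20, 1.12, 0.59, 0.79,
                              0.82, 1.14, 0.61, 1.14]))
P_BETA = dict(zip(RESIDUES, [0.97, 0.90, 0.65, 0.80, 1.30, 1.23, 0.26, 0.81,
                             0.71, 1.60, 1.22, 0.74, 1.67, 1.28, 0.62, 0.72,
                             1.20, 1.19, 1.29, 1.65]))


def nucleation_sites(seq, table, window, min_hits, threshold=1.00):
    """Return start indices of windows with at least min_hits residues above threshold."""
    sites = []
    for start in range(len(seq) - window + 1):
        hits = sum(table[aa] > threshold for aa in seq[start:start + window])
        if hits >= min_hits:
            sites.append(start)
    return sites


peptide = "GPEALMAEGP"  # hypothetical sequence
helix_seeds = nucleation_sites(peptide, P_ALPHA, window=6, min_hits=4)
sheet_seeds = nucleation_sites(peptide, P_BETA, window=5, min_hits=3)
print("helix seeds:", helix_seeds)
print("sheet seeds:", sheet_seeds)
```

Each seed would then be extended in both directions until a run of four residues averages below the threshold, as described in the steps above.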
  • 39. Secondary structure prediction: CHOU-FASMAN Successful method? 19 proteins evaluated: • Successful in locating 88% of helical and 95% of β regions • Correctly predicting 80% of helical and 86% of β-sheet residues • Accuracy of predicting the three conformational states for all residues, helix, β, and coil, is 77% Chou & Fasman: successful method After 1974: improvement of preference parameters
  • 41. Sander-Schneider: Evolution of overall structure • Naturally occurring sequences with more than 20% sequence identity over 80 or more residues always adopt the same basic structure (Sander and Schneider 1991)
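A minimal sketch of applying this rule to a pairwise alignment: count identical positions over the aligned (ungapped) columns and compare against the thresholds quoted on the slide (more than 20% identity over 80 or more residues). The function names are hypothetical, and treating "residues" as the number of ungapped aligned columns is an assumption.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> tuple[float, int]:
    """Return (% identity, number of ungapped aligned columns) for two aligned strings."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0, 0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs), len(pairs)


def same_fold_expected(aligned_a: str, aligned_b: str,
                       min_identity: float = 20.0, min_length: int = 80) -> bool:
    """Apply the slide's thresholds: identity above 20% over at least 80 residues."""
    identity, length = percent_identity(aligned_a, aligned_b)
    return identity > min_identity and length >= min_length


print(percent_identity("AC-E", "ACDE"))
print(same_fold_expected("ACDE" * 25, "ACDE" * 25))
```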
  • 42. Sander-Schneider • HSSP: homology-derived secondary structure of proteins
  • 43. Structural Family Databases • SCOP: – Structural Classification of Proteins • FSSP: – Family of Structurally Similar Proteins • CATH: – Class, Architecture, Topology, Homology
  • 44. Levels of protein structure • Tertiary – Packing of secondary structure elements into a compact spatial unit – Fold or domain – this is the level at which structure prediction is currently possible
  • 47. Domains • Protein dissection into domains • Conserved Domain Architecture Retrieval Tool (CDART) uses information in Pfam and SMART to assign domains along a sequence • (automatic when blasting)
  • 48. Domains • From the analysis of alignment of protein families • Conserved sequence features, usually associated with a specific function • PROSITE database of protein “signatures” (large number of false positives and false negatives) • From alignment of homologous sequences (PRINTS/PRODOM) • From Hidden Markov Models (PFAM) • Meta approach: INTERPRO
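PROSITE signatures are short patterns that map closely onto regular expressions, which also hints at why they produce many false positives. The sketch below converts the basic PROSITE pattern syntax to a Python regex; it covers only the common elements (x, [..], {..}, repeat counts, the < and > anchors), and the function name and test sequence are hypothetical. The pattern used is the N-glycosylation site signature, N-{P}-[ST]-{P}.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE pattern into an equivalent regular expression."""
    pattern = pattern.rstrip(".")
    return (pattern.replace("-", "")                      # element separators
                   .replace("{", "[^").replace("}", "]")  # {P}  -> [^P]
                   .replace("(", "{").replace(")", "}")   # x(2,4) -> x{2,4}
                   .replace("x", ".")                     # any residue
                   .replace("<", "^").replace(">", "$"))  # terminus anchors


n_glyc = prosite_to_regex("N-{P}-[ST]-{P}.")
match = re.search(n_glyc, "MKNGSAA")  # hypothetical sequence
print(n_glyc, match.start(), match.group())
```

A four-residue pattern like this one matches by chance in many unrelated proteins, which is the false-positive problem noted on the slide.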
  • 50. Levels of protein structure: Topology
  • 51. Hydrophobicity Plot P53_HUMAN (P04637) human cellular tumor antigen p53 Kyte-Doolittle hydrophilicity, window=19
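A hydropathy plot of this kind is a sliding-window average over a per-residue scale. The sketch below uses the Kyte-Doolittle hydropathy values and the window of 19 named on the slide. The function name and the toy sequence are hypothetical; the p53 sequence itself is not reproduced here.

```python
# Kyte-Doolittle hydropathy index (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Return the mean hydropathy of every full window along the sequence."""
    if window > len(seq):
        return []
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


toy = "DDKK" + "L" * 19 + "DDKK"  # hypothetical: a hydrophobic stretch flanked by charged residues
profile = hydropathy_profile(toy)
peak = max(range(len(profile)), key=profile.__getitem__)
print(f"peak window starts at residue {peak + 1}, mean hydropathy {profile[peak]:.2f}")
```

Windows with a high mean over about 19 residues are the usual candidates for transmembrane helices, which is why that window size is chosen.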
  • 53. The ‘positive inside’ rule (EMBO J. 5:3021; EJB 174:671, 205:1207; FEBS Lett. 282:41) • Bacterial IM: in 16% KR, out 4% KR • Eukaryotic PM: in 17% KR, out 7% KR • Thylakoid membrane: in 13% KR, out 5% KR • Mitochondrial IM: in 10% KR, out 3% KR
  • 55. GPCR Topology • Membrane-bound receptors • Transducing messages such as photons, organic odorants, nucleotides, nucleosides, peptides, lipids and proteins. • 6 different families • A very large number of different domains both to bind their ligand and to activate G proteins. • Pharmaceutically the most important class • Challenge: Methods to find novel GPCRs in human genome …
  • 57. GPCR Topology GPCR Structure • Seven transmembrane regions • Hydrophobic/hydrophilic domains • Conserved residues and motifs (e.g. NPXXY)
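A conserved motif such as NPXXY can be located with a simple pattern search, where X stands for any residue. The sketch below uses a lookahead so that overlapping occurrences are also reported; the function name and test sequences are hypothetical.

```python
import re

# N, P, any two residues, then Y; the lookahead allows overlapping hits.
NPXXY = re.compile(r"(?=NP..Y)")


def find_npxxy(seq: str) -> list[int]:
    """Return 0-based start positions of every NPXXY occurrence."""
    return [m.start() for m in NPXXY.finditer(seq)]


print(find_npxxy("AAANPLIYAA"))
print(find_npxxy("NPAAYNPGGY"))
```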
  • 58. GPCR Topology E.g. plot conserved residues (or multiple alignment: MSA to SSA)
  • 59. Levels of protein structure • Quaternary – Difficult to predict – Functional units: apoptosome, proteasome
  • 60. Protein Structure Introduction Why ? How do proteins fold ? Levels of protein structure 0,1,2,3,4 X-ray / NMR The Protein Database (PDB) Protein Modeling Bioinformatics & Proteomics Weblems
  • 61. What is X-ray Crystallography • X-ray crystallography is an experimental technique that exploits the fact that X-rays are diffracted by crystals. • X-rays have the proper wavelength (in the Ångström range, ~10⁻⁸ cm) to be scattered by the electron cloud of an atom of comparable size. • Based on the diffraction pattern obtained from X-ray scattering off the periodic assembly of molecules or atoms in the crystal, the electron density can be reconstructed. • A model is then progressively built into the experimental electron density, refined against the data and the result is a quite accurate molecular structure.
  • 62. NMR or Crystallography ? • NMR uses protein in solution – Can look at the dynamic properties of the protein structure – Can look at the interactions between the protein and ligands, substrates or other proteins – Can look at protein folding – Sample is not damaged in any way – The maximum size of a protein for NMR structure determination is ~30 kDa. This eliminates ~50% of all proteins – High solubility is a requirement • X-ray crystallography uses protein crystals – No size limit: as long as you can crystallise it – Solubility requirement is less stringent – Simple definition of resolution – Direct calculation from data to electron density and back again – Crystallisation is the process bottleneck, binary (all or nothing) – Phase problem: relies on heavy atom soaks or SeMet incorporation • Both techniques require large amounts of pure protein and require expensive equipment!
  • 63. Protein Structure Introduction Why ? How do proteins fold ? Levels of protein structure 0,1,2,3,4 X-ray / NMR The Protein Database (PDB) Protein Modeling Bioinformatics & Proteomics Weblems
  • 64. PDB
  • 68. Visualizing Structures Cn3D version 4.0 (NCBI)
  • 69. Visualizing Structures Ball: Van der Waals radius; colours: N blue, O red, S yellow, C gray (green). Stick: length joins centers
  • 70. Visualizing Structures From N to C
  • 71. Visualizing Structures • Demonstration of Protein explorer • PDB, install Chime • Search helicase (select structure where DNA is present) • Stop spinning, hide water molecules • Show basic residues, interact with negatively charged backbone • RASMOL / Cn3D
  • 72. Protein Structure Introduction Why ? How do proteins fold ? Levels of protein structure 0,1,2,3,4 X-ray / NMR The Protein Database (PDB) Protein Modeling Bioinformatics & Proteomics Weblems
  • 74. Protein Structure Molecular Modeling: building a 3D protein structure from its sequence
  • 75. Modeling • Finding a structural homologue • BLAST versus the PDB database, or PSI-BLAST (E<0.005) – Domain coverage at least 60% • Avoid gaps – Prefer few gaps and reasonable similarity scores over lots of gaps and high similarity scores
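The template selection criteria on this slide (E<0.005, domain coverage of at least 60%, few gaps preferred over raw score) can be sketched as a filter followed by a sort. The `Hit` record, its field names and the example hits are all hypothetical; a real search would fill them from parsed BLAST or PSI-BLAST output.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    template_id: str   # hypothetical identifier of a PDB template
    e_value: float     # E-value reported by the search
    coverage: float    # fraction of the query domain covered (0 to 1)
    gaps: int          # number of gaps in the alignment
    score: float       # similarity score of the alignment


def select_templates(hits, max_e=0.005, min_coverage=0.60):
    """Keep hits passing the E-value and coverage cut-offs; fewest gaps first, then score."""
    kept = [h for h in hits if h.e_value < max_e and h.coverage >= min_coverage]
    return sorted(kept, key=lambda h: (h.gaps, -h.score))


hits = [
    Hit("tmplA", 1e-30, 0.95, 12, 310.0),
    Hit("tmplB", 1e-20, 0.80, 2, 250.0),
    Hit("tmplC", 0.5, 0.90, 0, 40.0),    # fails the E-value cut-off
    Hit("tmplD", 1e-10, 0.40, 1, 120.0),  # fails the coverage cut-off
]
ranked = [h.template_id for h in select_templates(hits)]
print(ranked)
```

Sorting on gaps before score mirrors the slide's advice that an alignment with few gaps and a reasonable score is a better modelling template than a gappy one with a higher score.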
  • 76. Modeling • Extract “template” sequences and align with query • Watch out for missing data (PDB file) and complement with additional templates • Try to get as much information as possible, X/NMR • Sequence alignment from structure comparison of templates (SSA) can be different from a simple sequence alignment • >40% identity, any alignment method is OK • <40%, checks are essential – Residue conservation checks in functional regions (patterns/motifs) – Indels: combine gaps separated by few residues – Manual editing: move gaps from secondary elements to loops – Within loops, move gaps to loop ends, i.e. turnaround point of backbone • Align templates structurally, extract the corresponding SSA or QTA (Query/template alignment)
  • 77. Modeling Input for model building • Query sequence (the one you want the 3D model for) • Template sequences and structures • Query/Template(s) (structure) sequence alignment
  • 78. Modeling • Methods (for details on these, see paper): – WHATIF, – SWISS-MODEL, – MODELLER, – ICM, – 3D-JIGSAW, – CPH-models, – SDC1
  • 79. Modeling • Model evaluation (how good is the prediction, how much can the algorithm rely on and extract from the provided templates) – PROCHECK – WHATIF – ERRAT • CASP (Critical Assessment of Structure Prediction) – Best method is manual alignment editing !
  • 80. Comparative modelling at CASP
                      BC         CASP1    CASP2    CASP3    CASP4
      alignment       excellent  poor     fair     fair     fair
      side chain      ~80%       ~50%     ~75%     ~75%     ~75%
      short loops     1.0 Å      ~3.0 Å   ~1.0 Å   ~1.0 Å   ~1.0 Å
      longer loops    2.0 Å      >5.0 Å   ~3.0 Å   ~2.5 Å   ~2.0 Å
      CASP4: overall model accuracy ranging from 1 Å to 6 Å for 50-10% sequence identity
      T128/sodm – 1.0 Å (198 residues; 50%)
      T111/eno – 1.7 Å (430 residues; 51%)
      T122/trpa – 2.9 Å (241 residues; 33%)
      T125/sp18 – 4.4 Å (137 residues; 24%)
      T112/dhso – 4.9 Å (348 residues; 24%)
      T92/yeco – 5.6 Å (104 residues; 12%)
  • 82. Protein Engineering / Protein Design