Protein Evolution: Structure, Function, and Human Health



Guest Lecture, Protein Biochemistry course on basics of evolution at the protein level and some applications.

Presentation Transcript

  • Protein Evolution: Structure, Function, and Human Health. 11/28/2013. Dr. Daniel Gaston, Department of Pathology
  • So, about this evolution thing? Why should I care? What use is it?
  • Lots of reasons
      • Knowledge for its own sake is good
          – Otherwise, why do science at all?
      • Shapes our understanding of ecology and biological diversity
      • Practical reasons
          – Antibiotic resistance
          – Microbiome: fecal transplantation
          – Cancer
          – Predicting gene/protein function
          – Predicting the impact of mutations for potential to cause human disease (Genotype:Phenotype)
  • Evolution of Life on Earth: A (Very) Brief Overview
  • [Figure: rooted tree of life showing Eukaryota, Eubacteria, and Archaebacteria, with the position of the ROOT marked (Iwabe et al. 1989; Gogarten et al. 1989)]
  • You are here
  • A Brief History of Cells and Molecules
      • Origin of the earth: ~4.5 billion years ago
      • Origin of life: ~3.0-4.0 billion years ago
          – Origin of self-replicating entities
          – The RNA world (?)
          – Origin of the first genes, proteins & membranes
          – Gave rise to the first cells
          – The Last Universal Common Ancestor (LUCA) of all cells
          – Probably had 500-1000 genes
      • First microfossils of bacteria: ~3.5 billion years ago (controversial), ~2.7 billion years ago (for certain)
      • Oxygenation of the atmosphere: 2.3-2.4 billion years ago (by photosynthetic bacteria)
      • Origin of eukaryotes: ~1.0-2.2 billion years ago (probably 1.5)
      • Origin of animals: ~0.6-1.0 billion years ago
  • Some Definitions
      • Homology = descent from a common ancestor
          – Homology is all or nothing: sequences are either homologous (related) or not homologous (not related)
          – Not the same as “similarity” (degrees of similarity are possible)
  • Some Definitions
      • Divergence = change in two sequences over time (after splitting from a common ancestor)
          [Figure: an ancestral sequence splitting into Sequence 1 and Sequence 2, with branches labelled T]
      • Convergence = similarity due to independent evolutionary events
          – On the amino acid sequence level, it is relatively rare & difficult to prove (but see an example later)
  • How does evolutionary change happen in proteins?
  • Evolution: Two Groups of Processes
      • Mutation
          – Many different processes generate mutations
          – Mutations are the raw materials needed for evolution to happen
      • Selection and Drift
          – Mutations happen in individuals
          – Evolution happens in populations of organisms
          – Selection and Genetic Drift affect the frequency of mutations in a population over time
  • Mutations
  • Point Mutations
      [Figure: a duplex carrying an unrepaired mispaired base (AGGTTCCAATTAA over TCCAAGGTCAATT) goes through REPLICATION (meiotic or mitotic division). The daughter duplexes are the wild-type allele (AGGTTCCAATTAA over TCCAAGGTTAATT) and the mutant allele (AGGTTCCAGTTAA over TCCAAGGTCAATT). For multicellular organisms these end up in a wild-type gamete and a mutant gamete.]
  • Larger Scale Mutations
  • Exon shuffling and Protein Domains
      [Figure, before shuffling: Exon 1, Exon 2, Exon 3, with Domain 1 and Domain 2. After shuffling: Exon 2, Exon 1, Exon 3, with Domain A and Domain 2.]
  • Genomic Scale Mutations
      [Figure: Gene 1 and Gene 2]
  • Gene Duplication
      [Figure: Gene 1 and Gene 2 before duplication; Gene 1, Gene 1a, and Gene 2 after duplication]
  • Genetic Drift and Selection
  • Mutations vs. substitutions
      • Mutations happen in individual organisms
      • A nucleotide ‘substitution’ occurs IF, after many generations, all individuals in the population harbour the ‘mutation’
      • This process is called “fixation of mutations”
      • Substitution = fixed mutation
      • When comparing homologous protein sequences between species, we are looking at amino acid substitutions
  • Fixation of alleles
      [Figure: a population with two alleles. At the start, one allele has a proportion of 1/14 (7.1%) and the other 13/14 (93%). After N generations, one allele has a proportion of 1.0 (100%).]
      • This is the same as saying that the allele was fixed in the population in N generations
      • The ‘mutation’ became a ‘substitution’ after it was fixed in the population
  • Natural selection and Neutral drift
      • Positive selection
          – Mutation confers fitness advantage (more offspring that survive)
          – RARE
      • Purifying selection (negative selection)
          – Mutation confers fitness disadvantage (fewer offspring or no viable offspring, e.g. lethal)
          – FREQUENT
      • Neutral evolution (genetic drift)
          – Mutation has very little fitness effect
          – Will drift in frequency in the population due to random sampling effects
          – VERY FREQUENT
  • Nearly-neutral theory 
  • Common Examples of Positive Selection
      • MHC Genes
          – Diversity = Good
          – Very polymorphic in humans
      • Envelope (gp120) of HIV
          – Immune system evasion
      • Enzymes involved in human dietary metabolism
          – Accelerated positive selection over last ~10,000 years
  • Genetic Drift
      • Select a marble randomly from a jar and “copy” it into the next generation
      [Figure: fixation of the plain blue allele in 5 generations]
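The marble-jar picture can be sketched as a small simulation. This is a minimal illustration rather than anything from the lecture: the function names, the jar size of 14 (borrowed from the 1/14 starting proportion on the fixation slide), and the choice to resample with replacement every generation are all assumptions.

```python
import random


def generations_to_fixation(n_marbles, n_mutant, rng):
    """Resample the jar each generation until one allele is fixed.

    Every marble in the next jar is a copy of a marble drawn at random
    (with replacement) from the current jar, so allele counts change
    only through sampling noise. Returns (surviving_allele, generations).
    """
    count = n_mutant
    generations = 0
    while 0 < count < n_marbles:
        freq = count / n_marbles
        count = sum(1 for _ in range(n_marbles) if rng.random() < freq)
        generations += 1
    allele = "mutant" if count == n_marbles else "wild-type"
    return allele, generations


def fixation_fraction(n_marbles, n_mutant, trials, seed=0):
    """Fraction of independent jars in which the mutant allele is fixed."""
    rng = random.Random(seed)
    fixed = sum(
        generations_to_fixation(n_marbles, n_mutant, rng)[0] == "mutant"
        for _ in range(trials)
    )
    return fixed / trials


if __name__ == "__main__":
    # One mutant marble among 14 (hypothetical jar size).
    print(generations_to_fixation(14, 1, random.Random(1)))
    print(fixation_fraction(14, 1, trials=2000))
```

With no fitness difference between the marbles, the mutant is usually lost, and over many jars the fraction in which it is fixed should come out close to its starting frequency (here 1/14).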
  • Polymorphism
      • Polymorphisms are sites with more than one allele present in a population
          – Mutations that have not yet been fixed
  • Mutation and Codons: Not all mutations are created equal
  • Point mutations in protein genes are classified according to the genetic code: The genetic code is degenerate: more than one codon often specifies a single amino acid. E.g. Serine has 6 codons, Tyrosine has 2 codons and Tryptophan has one codon!
  • Point mutations in protein-coding genes
      • Synonymous (silent) substitutions: cause interchange between two codons that code for the same amino acid
          – e.g. CTG --> CTA = Leu --> Leu
          – Mostly invisible to selection
      • Non-synonymous (replacement) mutations: cause change between codons that code for different amino acids (missense) or to stop codons (nonsense)
          – e.g. CTG --> ATG = Leu --> Met
          – e.g. TGG --> TGA = Trp --> Stop
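The classification on this slide can be written as a lookup in the standard genetic code. The sketch below is an illustration only: the names `CODON_TABLE` and `classify_point_mutation` are made up here, and the "stop-loss" label for a stop codon mutating to a sense codon is an extra case that the slide does not mention.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_point_mutation(codon_before, codon_after):
    """Label a single-base codon change as on the slide."""
    differences = sum(a != b for a, b in zip(codon_before, codon_after))
    if len(codon_before) != 3 or len(codon_after) != 3 or differences != 1:
        raise ValueError("expected two codons differing at exactly one base")
    before = CODON_TABLE[codon_before]
    after = CODON_TABLE[codon_after]
    if before == after:
        return "synonymous"
    if after == "*":
        return "nonsense"
    if before == "*":
        return "stop-loss"  # not on the slide; included for completeness
    return "missense"


if __name__ == "__main__":
    # The three examples from the slide.
    for before, after in [("CTG", "CTA"), ("CTG", "ATG"), ("TGG", "TGA")]:
        print(before, "-->", after, classify_point_mutation(before, after))
```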
  • 8 kinds of 1st codon-position synonymous mutation: R-->R and L-->L
  • 126 kinds of 3rd-codon position synonymous mutation:
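The two counts above can be reproduced by enumerating every single-base change in the standard genetic code. The sketch assumes one particular counting convention, which is a guess about how the slides count: each direction of a change is counted separately (CTA --> TTA and TTA --> CTA are two kinds), and stop codons are left out. The function name and the `include_stops` flag are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_synonymous_changes(position, include_stops=False):
    """Count directed single-base changes at one codon position (0, 1, 2)
    that leave the encoded amino acid unchanged."""
    count = 0
    for codon, aa in CODON_TABLE.items():
        if aa == "*" and not include_stops:
            continue
        for base in BASES:
            if base == codon[position]:
                continue
            mutant = codon[:position] + base + codon[position + 1:]
            if CODON_TABLE[mutant] == aa:
                count += 1
    return count


if __name__ == "__main__":
    for position in range(3):
        print("codon position", position + 1, count_synonymous_changes(position))
```

Under that convention the enumeration gives 8 for the first position (the Arg and Leu codon pairs), 0 for the second, and 126 for the third, which is why most silent changes are third-position changes.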
  • A Note on Indels
      • Ignored because indels are far more likely to be deleterious
          – More likely to result in frameshifts
      • Can still be non-deleterious
          – Particularly if in multiples of three
          – Over evolutionary time, indels are more often observed in loops than in more constrained structural elements
  • Evolutionary Rates: Speed of Evolution
  • Rates of protein evolution (i.e. rates at which individual amino acids are substituted)
      • Different regions in proteins have different rates of evolution (functional constraints)
      • Different proteins have different overall rates of evolution
  • Enolase
      • Ubiquitous glycolytic enzyme, highly conserved throughout evolution
      • TIM Barrel family doing an α-proton abstraction
      [Figure labels: cMLE, MLE, Euks, Archaea, Bacteria, α, β, γ]
  • All Eukaryotes site rates (63 taxa) mapped on Lobster Enolase low rates blue high rates red
  • Site rate categories 1 and 2 (slowest sites)
  • Site rates Categories 3 and 4
  • Site rates Categories 5 and 6
  • Site rates Categories 7 and 8 (fastest sites)
  • Evolutionary rates as a function of enolase structure/function
      • Rates of evolution increase from the centre of the molecule (slow) to the surface (fast)
      • The pattern is probably due to:
          – Distance from the catalytic centre --> catalytic residues don’t change (slowest), residues that interact with catalytic residues are constrained (slow)
          – Geometric constraints: residues in the centre of the molecule have restricted ‘space’ around them that constrains them. At the surface, there are fewer such constraints
          – Hydrophobic core in centre
          – More loops and alpha helices on surface
      • NOTE: this pattern seems to work for soluble globular enzymes with the catalytic centre in the centre of mass. It does not hold for structural proteins like tubulin, actin etc.
  • Rates of evolution of sites versus their structural position
      • There are no completely general rules!
          – It depends on what the protein is doing and where
      • Functional sites (catalytic sites) or sites at interfaces (protein-protein interactions) are conserved
      • Geometric, chemical, folding and functional constraints (catalysis, binding) determine evolutionary constraints
  • Detecting and Quantifying Evolutionary Relationships
  • How do we know if two proteins are homologous?
      (A) If sequences >100 amino acids long are >25% identical --> they are probably significantly similar and very likely to be homologous
          – BLAST, FASTA, and Smith-Waterman algorithms are likely to find them “significantly similar” (E-value << 1x10^-4)
      (B) If they are >100 amino acids long and 15-25% identical (Twilight Zone) --> probably homologous BUT need to rigorously test it
          – A number of methods are available, e.g. a permutation test
      (C) If they are <15% identical --> difficult to prove homology
          – Test it
          – If it's not significant, look for motifs in multiple alignments
          – Look at tertiary structure
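The rule of thumb above can be sketched as two small helpers. Several things here are assumptions rather than part of the lecture: the sequences are taken to be already aligned with "-" as the gap character, identity is computed over columns where neither sequence has a gap (other denominators are in use), and the function names and return strings are made up. This is a first-pass filter, not a substitute for the significance tests the slide asks for.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [
        (a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap
    ]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


def homology_zone(identity, length):
    """Map percent identity and sequence length onto cases (A) to (C)."""
    if length <= 100:
        return "too short for this rule of thumb"
    if identity > 25:
        return "A: very likely homologous"
    if identity >= 15:
        return "B: twilight zone, test rigorously"
    return "C: hard to prove, check motifs and tertiary structure"


if __name__ == "__main__":
    # Toy alignment, far too short for the rule; shown only for the arithmetic.
    print(percent_identity("ACDE-F", "ACDQGF"))
    print(homology_zone(19.0, 240))
```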
  • [Figure annotation: bracket marking 15-23% identity]
  • Applications
      • Evolutionary methods for studying protein function
          – Annotating novel proteins
          – Functional divergence
      • Predicting pathogenicity of mutations
          – Mendelian disease
          – Cancer
      • Informing protein structure prediction
  • Applications of Evolutionary Biology to Medicine: Inherited Genetic Diseases and Cancer
  • Lynch Syndrome
      • Autosomal dominant cancer syndrome
      • Increased risk for many cancers, mostly colorectal cancer, due to mismatch repair defects
  • Mutator Phenotype
      • Inactivation of mismatch repair (MMR) genes led to mutator phenotypes in E. coli and yeast
      • Included microsatellite instability
      • Careful research identified human homologs
          – MLH1 and MSH2
          – Defects in these genes cause Lynch Syndrome
  • Mismatch Repair
      • Mismatch repair defect -> Microsatellite instability -> Cancer
      • Most microsatellites are spread throughout the genome in non-genic regions
      • But some are found in important tumor suppressor genes
  • Applications of Evolutionary Biology to Medicine: Predicting Pathogenicity and Impact of Human Mutations
  • The Sequencing Revolution
  • Problem
      • Often left with hundreds to thousands of potential mutations in a family that “track” with the disease
          – Needle in a “stack of needles” problem
      • Must discriminate neutral missense mutations from pathogenic ones
  • Evolution at Work
      • Many programs exist to make these predictions:
          – PolyPhen
          – MutationTaster
          – EvoD
          – SIFT
          – PROVEAN
          – FATHMM
          – etc.
  • Evolution at Work
      • Important amino acids have low evolutionary rates
          – Higher conservation
      • The more important the protein, the more likely it is to be broadly found among eukaryotes
          – Also higher overall conservation
      • However, many important proteins in humans are only found in primates, mammals, or animals
  • Evolution at Work
      • Reference Sequence: …RPLAHTY…
      • Multiple Sequence Alignment: …RPLAHTY…, …RPLVHTY…, …RPIAHTY…, …RPIGHTY…, …RPIICTY…, …RPLACTY…, …RPLLCTY…
      • Compute an Evolutionary Conservation Score for Each Position
  • Evolution at Work
      • Reference Sequence: …RPLACTY…
      • Multiple Sequence Alignment: …RPLAHTY…, …RPLVHTY…, …RPIAHTY…, …RPIGHTY…, …RPIICTY…, …RPLACTY…, …RPLLCTY…
      • Conservative changes more likely to be neutral
  • Evolution at Work
      • Reference Sequence: …RPLACTP…
      • Multiple Sequence Alignment: …RPLAHTY…, …RPLVHTY…, …RPIAHTY…, …RPIGHTY…, …RPIICTY…, …RPLACTY…, …RPLLCTY…
      • Radical changes more likely to be deleterious
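A per-position conservation score for the alignment on these slides can be sketched with Shannon entropy, where 0 means a fully conserved column. This is a stand-in for illustration, not the scoring used by PolyPhen, SIFT, or the other tools named earlier, and the helper names are made up. The second helper uses an even cruder assumption: a residue never seen in its alignment column is treated as a candidate radical change.

```python
import math
from collections import Counter

# The seven aligned fragments shown on the slides.
ALIGNMENT = [
    "RPLAHTY",
    "RPLVHTY",
    "RPIAHTY",
    "RPIGHTY",
    "RPIICTY",
    "RPLACTY",
    "RPLLCTY",
]


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column; 0 = fully conserved."""
    total = len(column)
    return sum(
        -(n / total) * math.log2(n / total) for n in Counter(column).values()
    )


def conservation_profile(alignment):
    """Entropy of every column, in alignment order."""
    return [column_entropy(column) for column in zip(*alignment)]


def unseen_substitutions(variant, alignment):
    """(index, residue) pairs where the variant's residue never occurs
    in that alignment column."""
    return [
        (index, residue)
        for index, (residue, column) in enumerate(zip(variant, zip(*alignment)))
        if residue not in column
    ]


if __name__ == "__main__":
    print([round(score, 2) for score in conservation_profile(ALIGNMENT)])
    print(unseen_substitutions("RPLACTY", ALIGNMENT))
    print(unseen_substitutions("RPLACTP", ALIGNMENT))
```

On this alignment the first two and last two columns have entropy 0 and the fourth column is the most variable. The sequence …RPLACTY… has no unseen residues, because C already occurs at the fifth position, while …RPLACTP… is flagged at the last position, where every aligned sequence has Y.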
  • Applications of Evolutionary Biology to Protein Function: Functional Divergence
  • Functional Divergence
      [Figure: Gene 1, Gene 1a, and Gene 2]
      • Gene 1 and Gene 1a are known as paralogs, a subset of homologs
      • Over evolutionary time scales they can diverge from one another in sequence, as well as function
  • Types of Functional Divergence
      • Subfunctionalization
          – Paralog specializes and retains only a subset of ancestral function
      • Neofunctionalization
          – Paralog gains a new function, and loses old function(s)
      • Subneofunctionalization
          – Paralog undergoes rapid subfunctionalization but then undergoes neofunctionalization
  • Functional Divergence
      [Figure: Gene A giving rise to Family A and Family B]
  • Functional Divergence
      Species      Family A    Family B
      Species 1    …A L H…     …R A H…
      Species 2    …A L H…     …R R H…
      Species 3    …A L H…     …R C H…
      Species 4    …A L H…     …R A H…
      Species 5    …A L H…     …R A H…
      Species 6    …A L H…     …R Y H…
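The contrast between the two families can be made explicit by comparing each aligned column across them. The sketch below is a hedged illustration using the three-residue fragments from the slide. The function name and the four labels are invented, and real functional-divergence methods use statistical models of site rates rather than this simple set comparison.

```python
# Three-residue fragments from the slide, one per species.
FAMILY_A = ["ALH", "ALH", "ALH", "ALH", "ALH", "ALH"]
FAMILY_B = ["RAH", "RRH", "RCH", "RAH", "RAH", "RYH"]


def classify_sites(family_a, family_b):
    """Label each aligned column by how its conservation differs
    between the two paralog families."""
    labels = []
    for column_a, column_b in zip(zip(*family_a), zip(*family_b)):
        residues_a, residues_b = set(column_a), set(column_b)
        if len(residues_a) == 1 and len(residues_b) == 1:
            if residues_a == residues_b:
                labels.append("conserved, same residue")
            else:
                labels.append("conserved, different residue")
        elif len(residues_a) == 1 or len(residues_b) == 1:
            labels.append("conserved in one family only")
        else:
            labels.append("variable in both")
    return labels


if __name__ == "__main__":
    for index, label in enumerate(classify_sites(FAMILY_A, FAMILY_B), start=1):
        print("site", index, label)
```

Here the third site (H in both families) looks like a shared constraint, while the first site (A versus R) and the second site (L in Family A, variable in Family B) are the kind of positions one would examine as candidates for functional divergence.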
  • Glyceraldehyde-3-Phosphate Dehydrogenase
      [Figure: Glyceraldehyde-3-Phosphate + NAD+ + Pi ⇌ 1,3-Bisphosphoglycerate + NADH + H+ (Cytosol: Glycolysis)]
  • Glyceraldehyde-3-Phosphate Dehydrogenase
      [Figure: Glyceraldehyde-3-Phosphate + NADP+ + Pi ⇌ 1,3-Bisphosphoglycerate + NADPH + H+ (Plastid: Calvin Cycle)]
  • GAPDH Evolution
      [Figure labels: Cytosolic GapC, Green Plants, Cyanobacteria, ‘Chromalveolates’, Cytosolic GapC]
  • GAPDH Structure
  • NADPH Binding Necessary for Calvin Cycle Function