Protein Evolution: Structure, Function, and Human Health
11/28/2013
Dr. Daniel Gaston, Department of Pathology
  
So, about this evolution thing?
Why should I care? What use is it?
  
Lots of reasons
•  Knowledge for its own sake is good
–  Otherwise, why do science at all?
•  Shapes our understanding of ecology and biological diversity
•  Practical reasons
–  Antibiotic resistance
–  Microbiome: fecal transplantation
–  Cancer
–  Predicting gene/protein function
–  Predicting the impact of mutations on their potential to cause human disease (genotype:phenotype)
  
Evolution of Life on Earth
A (Very) Brief Overview
  
[Figure: rooted universal tree of life with three domains, Eubacteria, Archaebacteria and Eukaryota; root position after Iwabe et al. 1989 and Gogarten et al. 1989. A follow-up slide marks “You are here” on the tree.]
  
A Brief History of Cells and Molecules
•  Origin of the earth: ~4.5 billion years ago
•  Origin of life: ~3.0-4.0 billion years ago
–  Origin of self-replicating entities
–  The RNA world (?)
–  Origin of the first genes, proteins & membranes
–  Gave rise to the first cells
–  The Last Universal Common Ancestor (LUCA) of all cells, which probably had 500-1000 genes
•  First microfossils of bacteria: ~3.5 billion years ago (controversial), ~2.7 billion years ago (for certain)
•  Oxygenation of the atmosphere: 2.3-2.4 billion years ago (by photosynthetic bacteria)
•  Origin of eukaryotes: ~1.0-2.2 billion years ago (probably ~1.5)
•  Origin of animals: ~0.6-1.0 billion years ago
Some Definitions
•  Homology = descent from a common ancestor
–  Homology is all or nothing: sequences are either homologous (related) or not homologous (not related)
–  Not the same as “similarity” (degrees of similarity are possible)
Some Definitions
•  Divergence = change in two sequences over time (after splitting from a common ancestor)
[Figure: an ancestral sequence splits into Sequence 1 and Sequence 2, which then diverge over time T]
•  Convergence = similarity due to independent evolutionary events
–  At the amino acid sequence level, it is relatively rare and difficult to prove (but see an example later)
How does evolutionary change
happen in proteins?
Evolution: Two Groups of Processes
•  Mutation
–  Many different processes generate mutations
–  Mutations are the raw material needed for evolution to happen
•  Selection and Drift
–  Mutations happen in individuals
–  Evolution happens in populations of organisms
–  Selection and genetic drift affect the frequency of mutations in a population over time

Mutations
Point Mutations
[Figure: an unrepaired mispaired base is passed on through DNA replication]
Parental duplex with an unrepaired mispaired base:
AGGTTCCAATTAA
TCCAAGGTCAATT
REPLICATION (meiotic or mitotic division) gives two daughter duplexes:
AGGTTCCAATTAA / TCCAAGGTTAATT: wild-type allele (wild-type gamete, for multicellular organisms)
AGGTTCCAGTTAA / TCCAAGGTCAATT: mutant allele (mutant gamete, for multicellular organisms)
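The replication step in the figure can be sketched in a few lines of Python. This is a minimal model, assuming both strands are written position by position as in the figure (no 5'/3' bookkeeping) and that each parental strand templates a correctly paired new partner; the function names are illustrative and not taken from any library.

```python
# Watson-Crick base-pairing rules.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand):
    """Base-by-base complement, kept in the same left-to-right layout as the figure."""
    return "".join(COMPLEMENT[base] for base in strand)


def mismatches(top, bottom):
    """0-based positions where the two strands of a duplex are not correctly paired."""
    return [i for i, (a, b) in enumerate(zip(top, bottom)) if COMPLEMENT[a] != b]


def replicate(top, bottom):
    """Semi-conservative replication: each parental strand gets a newly made partner."""
    daughter_from_top = (top, complement(top))
    daughter_from_bottom = (complement(bottom), bottom)
    return daughter_from_top, daughter_from_bottom


parent_top = "AGGTTCCAATTAA"
parent_bottom = "TCCAAGGTCAATT"  # carries the unrepaired C opposite an A

print(mismatches(parent_top, parent_bottom))  # [8], i.e. the 9th base pair
for duplex in replicate(parent_top, parent_bottom):
    print(duplex)
# ('AGGTTCCAATTAA', 'TCCAAGGTTAATT')  wild-type allele
# ('AGGTTCCAGTTAA', 'TCCAAGGTCAATT')  mutant allele (A:T became G:C)
```

Both daughter duplexes are perfectly paired, which is why the change is now a heritable mutation rather than a repairable mismatch.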
[Figure: types of mutation, illustrated on the sequence AGTCCAAGGCCTTAA]
•  Point mutation: AGTCCAAGGCCTTAA → AGTTCAAGGCCTTAA
•  Insertion (of CCTTA): AGTCCAAGGCCTTAA → AGTCCAAGGCCTTACCTTAA
•  Deletion (of AAGG): AGTCCAAGGCCTTAA → AGTCC-CCTTAA
•  Inversion (AAGG replaced by its reverse complement, CCTT): AGTCCAAGGCCTTAA → AGTCCCCTTCCTTAA
•  Translocation: AGTCCAAGGCCTTAA + GGTCCTGGAATTCAG → AGTCCAAGGCC + GGTCCTGGAATTCAGTTAA
•  Duplication: AGTCCAAGGCC → AGTCCAAGGCCAGTCCAAGGCC
•  Recombination (AAGG / AGGC): AGTCCAAGGCCTTAA → AGTCCAAAGGCTTAA
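The single-molecule rearrangements above can be reproduced with simple string operations. A minimal sketch, assuming 0-based, end-exclusive coordinates (an arbitrary choice for illustration); translocation and recombination involve two molecules and are left out. The inversion swaps a segment for its reverse complement, which is how an inverted segment reads on the same strand.

```python
# Complement rules used to build the reverse complement for an inversion.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def point_mutation(seq, pos, base):
    """Replace the base at position pos."""
    return seq[:pos] + base + seq[pos + 1:]


def insertion(seq, pos, segment):
    """Insert segment immediately before position pos."""
    return seq[:pos] + segment + seq[pos:]


def deletion(seq, start, end):
    """Remove seq[start:end]."""
    return seq[:start] + seq[end:]


def inversion(seq, start, end):
    """Flip seq[start:end]; read on the same strand it becomes its reverse complement."""
    return seq[:start] + reverse_complement(seq[start:end]) + seq[end:]


def duplication(seq, start, end):
    """Tandem duplication: a copy of seq[start:end] placed right after the original."""
    return seq[:end] + seq[start:end] + seq[end:]


s = "AGTCCAAGGCCTTAA"
print(point_mutation(s, 3, "T"))          # AGTTCAAGGCCTTAA
print(insertion(s, 14, "CCTTA"))         # AGTCCAAGGCCTTACCTTAA
print(deletion(s, 5, 9))                 # AGTCCCCTTAA (AAGG removed)
print(inversion(s, 5, 9))                # AGTCCCCTTCCTTAA
print(duplication("AGTCCAAGGCC", 0, 11))  # AGTCCAAGGCCAGTCCAAGGCC
```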
Larger Scale Mutations

Exon shuffling and Protein Domains
[Figure: a gene made of Exon 1, Exon 2 and Exon 3 encodes Domain 1 and Domain 2. After exon shuffling the exon order becomes Exon 2, Exon 1, Exon 3, producing a new domain (Domain A) while Domain 2 is retained.]
Genomic Scale Mutations
[Figure: a genomic region carrying Gene 1 and Gene 2]

Gene Duplication
[Figure: duplication of Gene 1 produces an extra copy, Gene 1a, so the region now carries Gene 1, Gene 1a and Gene 2]

Genetic Drift and Selection
Mutations vs. substitutions
•  Mutations happen in individual organisms
•  A nucleotide ‘substitution’ occurs if, after many generations, all individuals in the population harbour the ‘mutation’
•  This process is called “fixation of mutations”
•  Substitution = fixed mutation
•  When we compare homologous protein sequences between species, we are looking at amino acid substitutions
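Fixation of alleles
[Figure: a population with two alleles and, N generations later, a population carrying only one of them]
•  Population with two alleles: proportion of one allele = 1/14 (7.1%), proportion of the other = 13/14 (93%)
•  After N generations: proportion of one allele = 1.0 (100%)
•  This is the same as saying that this allele was fixed in the population in N generations
•  The ‘mutation’ became a ‘substitution’ after it was fixed in the population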
Natural selection and Neutral drift
•  Positive selection
–  Mutation confers a fitness advantage (more offspring that survive)
–  RARE
•  Purifying selection (negative selection)
–  Mutation confers a fitness disadvantage (fewer offspring, or ‘no’ viable offspring, e.g. lethal)
–  FREQUENT
•  Neutral evolution (genetic drift)
–  Mutation has very little fitness effect
–  Will drift in frequency in the population due to random sampling effects
–  VERY FREQUENT
Nearly-neutral theory
Common Examples of Positive Selection
•  MHC genes
–  Diversity = good
–  Very polymorphic in humans
•  Envelope (gp120) of HIV
–  Immune system evasion
•  Enzymes involved in human dietary metabolism
–  Accelerated positive selection over the last ~10,000 years
Genetic Drift
Select a marble randomly from a jar and “copy” it into the next generation's jar.
[Figure: fixation of the plain blue allele in 5 generations]
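The marble-jar procedure is the Wright-Fisher model of neutral drift. A minimal sketch, assuming a population of 14 gene copies (matching the 1/14 vs 13/14 proportions on the fixation slide) and no selection; the function names and the seed are illustrative. For a neutral allele, the probability of eventual fixation equals its starting frequency, so the estimate should land near 1/14.

```python
import random


def wright_fisher(n_copies, initial_count, rng):
    """Neutral drift: allele counts per generation until the allele is lost or fixed.

    Each generation, every one of the n_copies marbles in the new jar is a copy of a
    marble drawn at random (with replacement) from the previous jar.
    """
    counts = [initial_count]
    while 0 < counts[-1] < n_copies:
        p = counts[-1] / n_copies
        counts.append(sum(rng.random() < p for _ in range(n_copies)))
    return counts


def fixation_probability(n_copies, initial_count, replicates=2000, seed=1):
    """Fraction of independent runs in which the allele reaches 100% frequency."""
    rng = random.Random(seed)
    fixed = sum(
        wright_fisher(n_copies, initial_count, rng)[-1] == n_copies
        for _ in range(replicates)
    )
    return fixed / replicates


print(wright_fisher(14, 1, random.Random(42)))  # one run, ending at 0 (lost) or 14 (fixed)
print(fixation_probability(14, 1))              # should be close to 1/14 (about 0.071)
```

Most runs end with the new allele lost; the few that end at 14 are the ‘mutations’ that became ‘substitutions’.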
  
Polymorphism
•  Polymorphisms are sites with more than one allele present in a population
–  Mutations that have not yet been fixed

Mutation and Codons
Not all mutations are created equal
Point mutations in protein genes are
classified according to the genetic code:

The genetic code is degenerate: more than one codon often specifies a single
amino acid.
E.g. Serine has 6 codons, Tyrosine has 2 codons and Tryptophan has one codon!
Point mutations in protein-coding genes
•  Synonymous (silent) substitutions: interchange between two codons that code for the same amino acid
–  e.g. CTG → CTA = Leu → Leu
–  Mostly invisible to selection
•  Non-synonymous (replacement) mutations: change between codons that code for different amino acids (missense) or to a stop codon (nonsense)
–  e.g. CTG → ATG = Leu → Met (missense)
–  e.g. TGG → TGA = Trp → Stop (nonsense)
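The classification follows mechanically from the genetic code. A minimal sketch, assuming the standard code (NCBI translation table 1), written compactly as a 64-letter string in TCAG codon order; the extra "stop-loss" label (a stop codon mutated to a sense codon) is an addition for completeness, not a category from the slide.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order with the first base varying slowest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_point_mutation(codon, mutant):
    """Classify a single-codon change by comparing the encoded amino acids ('*' = stop)."""
    before, after = CODE[codon], CODE[mutant]
    if before == after:
        return "synonymous"
    if after == "*":
        return "nonsense"
    if before == "*":
        return "stop-loss"
    return "missense"


print(classify_point_mutation("CTG", "CTA"))  # synonymous (Leu -> Leu)
print(classify_point_mutation("CTG", "ATG"))  # missense (Leu -> Met)
print(classify_point_mutation("TGG", "TGA"))  # nonsense (Trp -> Stop)
```

The same table reproduces the degeneracy quoted earlier: six codons map to S (Ser), two to Y (Tyr) and one to W (Trp).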
8 kinds of synonymous mutation at the 1st codon position: R → R (Arg) and L → L (Leu)
126 kinds of synonymous mutation at the 3rd codon position
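Both counts can be checked by enumerating every single-base change. This sketch assumes that "kinds" means ordered codon-to-codon changes among the 61 sense codons of the standard code (so CTA → TTA and TTA → CTA count separately); under that reading the enumeration reproduces 8 and 126, and also shows that no single change at the 2nd position is synonymous.

```python
from itertools import product

# Standard genetic code in TCAG order ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_changes_by_position():
    """Count single-base sense->sense changes that keep the amino acid, per codon position."""
    counts = [0, 0, 0]
    for codon, aa in CODE.items():
        if aa == "*":
            continue  # only the 61 sense codons are considered
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == aa:
                    counts[pos] += 1
    return counts


print(synonymous_changes_by_position())  # [8, 0, 126]
```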
A Note on Indels
•  Ignored here because indels are far more likely to be deleterious
–  More likely to result in frameshifts
•  Can still be non-deleterious
–  Particularly if their length is a multiple of three
–  Over evolutionary time, indels are more often observed in loops than in more constrained structural elements
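Why length matters can be shown by translating a short coding sequence before and after a deletion. A minimal sketch, assuming the standard genetic code and a made-up coding sequence (`cds` is hypothetical); translation simply reads complete codons from the start and stops at the first stop codon.

```python
from itertools import product

# Standard genetic code in TCAG order ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds):
    """Translate complete codons from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def indel_effect(length):
    """An indel whose length is not a multiple of three shifts the reading frame."""
    return "in-frame" if length % 3 == 0 else "frameshift"


cds = "ATGGCTGCAAAAGGGTAA"           # hypothetical coding sequence
one_base_del = cds[:4] + cds[5:]     # delete 1 nt inside the 2nd codon
three_base_del = cds[:3] + cds[6:]   # delete the whole 2nd codon (GCT)

print(translate(cds))                               # MAAKG
print(indel_effect(1), translate(one_base_del))     # frameshift MVQKG
print(indel_effect(3), translate(three_base_del))   # in-frame MAKG
```

After the 1-nt deletion every downstream codon is read in a new frame and the original stop codon is no longer reached, whereas the 3-nt deletion removes exactly one residue.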
  
Evolutionary Rates
Speed of Evolution
Rates of protein evolution
(i.e. the rates at which individual amino acids are substituted)

	


•  Different regions in proteins have different
rates of evolution (functional constraints)	

•  Different proteins have different overall rates
of evolution
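Enolase
•  Ubiquitous glycolytic enzyme, highly conserved throughout evolution
•  Member of a TIM-barrel family whose members catalyse an α-proton abstraction
[Figure: phylogeny of enolase and related enzymes, with labels Euks, Archaea, Bacteria, α, β, γ, MLE and cMLE]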
Site rates for all eukaryotes (63 taxa) mapped onto the lobster enolase structure
[Figure: low rates in blue, high rates in red]
Site-rate categories 1 and 2 (slowest sites)
Site-rate categories 3 and 4
Site-rate categories 5 and 6
Site-rate categories 7 and 8 (fastest sites)
Evolutionary rates as a function of
enolase structure/function	

•  Rates of evolution increase from the centre of the molecule
(slow) to the surface (fast)	

•  The pattern is probably due to:
–  Distance from the catalytic centre: catalytic residues barely change (slowest), and residues that interact with catalytic residues are constrained (slow)
–  Geometric constraints: residues in the centre of the molecule have restricted ‘space’ around them, which limits the substitutions they can tolerate; at the surface there are fewer such constraints
–  The hydrophobic core lies in the centre
–  More loops and alpha helices lie on the surface
•  NOTE: this pattern seems to hold for soluble globular enzymes whose catalytic centre is near the centre of mass. It does not hold for structural proteins such as tubulin, actin, etc.
Rates of evolution of sites versus their
structural position	

•  There are no completely general rules!	

–  It depends on what the protein is doing and where.	


•  Functional sites (catalytic sites) or sites at
interfaces (protein-protein interactions) are
conserved	

•  Geometric, chemical, folding and functional
constraints (catalysis, binding) determine
evolutionary constraints
Detecting and Quantifying Evolutionary Relationships
How do we know if two proteins are homologous?
(A) If sequences more than 100 amino acids long are more than 25% identical:
-- they are probably significantly similar and very likely to be homologous
-- BLAST, FASTA and Smith-Waterman algorithms are likely to find them “significantly similar” (E-value < 1×10⁻⁴)
(B) If they are more than 100 residues long and 15-25% identical (the Twilight Zone):
-- probably homologous, BUT this needs to be tested rigorously
-- a number of methods are available, e.g. a permutation test
(C) If they are less than 15% identical, homology is difficult to prove:
-- test it
-- if it is not significant, look for motifs in multiple alignments
-- look at tertiary structure
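The identity zones and the permutation test can be sketched together. This is a toy illustration, assuming a simple match +2 / mismatch -1 / gap -2 scoring scheme and identity computed over ungapped alignment columns (both are illustrative choices); real searches use substitution matrices such as BLOSUM62, affine gap penalties and E-values. The permutation test asks how often shuffled versions of one sequence, which keep its composition but destroy its order, align as well as the real pair.

```python
import random


def percent_identity(aligned_a, aligned_b):
    """Percent identity over alignment columns where neither sequence has a gap ('-')."""
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def identity_zone(pid):
    """Rough guide from the slide, intended for sequences of ~100+ residues."""
    if pid > 25:
        return "(A) probably homologous"
    if pid >= 15:
        return "(B) twilight zone: test rigorously"
    return "(C) homology difficult to prove"


def sw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (Smith-Waterman with a linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def permutation_p_value(a, b, n_shuffles=200, seed=0):
    """Fraction of shuffled b sequences scoring at least as well as the real pair."""
    rng = random.Random(seed)
    observed = sw_score(a, b)
    residues = list(b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # same composition, random order
        if sw_score(a, "".join(residues)) >= observed:
            hits += 1
    return (hits + 1) / (n_shuffles + 1)


print(percent_identity("AC-DE", "ACKDF"))  # 75.0
print(identity_zone(20.0))                 # (B) twilight zone: test rigorously
```

A small p-value means the real alignment scores better than almost all shuffles, which supports (but does not by itself prove) homology.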
[Figure: example alignment at 15-23% identity]
Applications
•  Evolutionary methods for studying protein function
–  Annotating novel proteins
–  Functional divergence
•  Predicting pathogenicity of mutations
–  Mendelian disease
–  Cancer
•  Informing protein structure prediction
Applications of Evolutionary Biology to Medicine
Inherited Genetic Diseases and Cancer

Lynch Syndrome
•  Autosomal dominant cancer syndrome
•  Increased risk for many cancers, mostly colorectal cancer, due to mismatch repair defects
Mutator Phenotype
•  Inactivation of mismatch repair (MMR) genes led to mutator phenotypes in E. coli and yeast
•  These phenotypes included microsatellite instability
•  Careful research identified the human homologs
–  MLH1 and MSH2
–  Defects in these genes cause Lynch Syndrome
Mismatch Repair
•  Loss of mismatch repair →
•  Microsatellite instability →
•  Cancer
Most microsatellites are spread throughout the genome in non-genic regions
But some are found in important tumor suppressor genes
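Microsatellites are short tandem repeats, and instability shows up as a change in their copy number. A minimal sketch, assuming a plain regular-expression scan for repeat units of 1-6 nt present in at least four copies (both thresholds are arbitrary illustrative choices, not a clinical MSI definition); the "normal" and "tumour" sequences are hypothetical.

```python
import re


def find_microsatellites(seq, max_unit=6, min_copies=4):
    """Report (start, repeat unit, copy number) for simple tandem repeats.

    The lazy {1,max_unit}? quantifier makes the regex prefer the shortest repeat unit,
    and the backreference \\1 requires that unit to repeat.
    """
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


normal = "GGCACACACACATT"  # hypothetical: (CA) x 5
tumour = "GGCACACACATT"    # one CA unit lost
print(find_microsatellites(normal))  # [(2, 'CA', 5)]
print(find_microsatellites(tumour))  # [(2, 'CA', 4)]
```

Losing one CA unit removes 2 nt; inside a coding region of a tumor suppressor gene that would shift the reading frame, as in the indel example earlier.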
  
Applications of Evolutionary Biology to Medicine
Predicting Pathogenicity and Impact of Human Mutations

The Sequencing Revolution
Problem
•  Often left with hundreds to thousands of potential mutations in a family that “track” with the disease
–  A needle in a “stack of needles” problem
•  Must discriminate neutral missense mutations from pathogenic ones
Evolution at Work
•  Many programs exist to make these predictions:
–  PolyPhen
–  MutationTaster
–  EvoD
–  SIFT
–  PROVEAN
–  FATHMM
–  etc.
Evolution at Work
•  Important amino acids have low evolutionary rates
–  Higher conservation
•  The more important the protein, the more likely it is to be found broadly among eukaryotes
–  Also higher overall conservation
•  However, many proteins that are important in humans are found only in primates, mammals, or animals
Evolution at Work
Reference sequence:
…RPLAHTY…
Multiple sequence alignment:
…RPLAHTY…
…RPLVHTY…
…RPIAHTY…
…RPIGHTY…
…RPIICTY…
…RPLACTY…
…RPLLCTY…
Compute an Evolutionary Conservation Score for Each Position
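One simple conservation score is the Shannon entropy of each alignment column: 0 bits when every sequence has the same residue, higher when the column is variable. A minimal sketch, assuming the seven sequences from the slide with the "…" flanks dropped; published tools typically add sequence weighting, pseudocounts and amino-acid similarity, so this is only the core idea.

```python
import math
from collections import Counter

# Alignment from the slide (flanking "…" omitted).
ALIGNMENT = ["RPLAHTY", "RPLVHTY", "RPIAHTY", "RPIGHTY", "RPIICTY", "RPLACTY", "RPLLCTY"]


def column_entropy(column):
    """Shannon entropy in bits: 0 for a fully conserved column, higher for variable ones."""
    n = len(column)
    return sum((c / n) * math.log2(n / c) for c in Counter(column).values())


def conservation_scores(alignment):
    """Entropy of each column; zip(*alignment) walks the alignment column by column."""
    return [column_entropy(col) for col in zip(*alignment)]


for pos, score in enumerate(conservation_scores(ALIGNMENT), start=1):
    print(pos, round(score, 2))
# 1 0.0 / 2 0.0 / 3 0.99 / 4 2.13 / 5 0.99 / 6 0.0 / 7 0.0
```

Positions 1, 2, 6 and 7 are invariant, position 4 is the most variable, and positions 3 and 5 each carry two residue types.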
  
Evolution at Work
Reference sequence with a variant: …RPLACTY… (H → C at position 5 of the segment shown)
(Same multiple sequence alignment as above)
Conservative changes are more likely to be neutral: C already occurs at this position in 3 of the 7 aligned sequences
Evolution at Work
Reference sequence with variants: …RPLACTP… (H → C at position 5 and Y → P at position 7 of the segment shown)
(Same multiple sequence alignment as above)
Radical changes are more likely to be deleterious: position 7 is Y in every aligned sequence, and P is never observed there
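The reasoning on the last two slides can be expressed as a lookup: for each changed position, how often is the new residue already seen in the alignment? This is a deliberately crude sketch, loosely in the spirit of tools such as SIFT but not the algorithm of any published tool (it ignores sequence weighting and amino-acid similarity); the function names are hypothetical.

```python
from collections import Counter

# Alignment from the slide (flanking "…" omitted).
ALIGNMENT = ["RPLAHTY", "RPLVHTY", "RPIAHTY", "RPIGHTY", "RPIICTY", "RPLACTY", "RPLLCTY"]


def observed_fraction(alignment, position, residue):
    """Fraction of aligned sequences carrying `residue` at 0-based `position`."""
    column = [seq[position] for seq in alignment]
    return Counter(column)[residue] / len(column)


def assess_variant(reference, variant, alignment):
    """For each changed site: (1-based position, ref residue, new residue, fraction observed)."""
    return [
        (i + 1, ref, alt, round(observed_fraction(alignment, i, alt), 2))
        for i, (ref, alt) in enumerate(zip(reference, variant))
        if ref != alt
    ]


print(assess_variant("RPLAHTY", "RPLACTY", ALIGNMENT))
# [(5, 'H', 'C', 0.43)]  C is already tolerated at this site
print(assess_variant("RPLAHTY", "RPLACTP", ALIGNMENT))
# [(5, 'H', 'C', 0.43), (7, 'Y', 'P', 0.0)]  P never seen where Y is invariant
```

A fraction of zero at an invariant column flags the Y → P change as the more likely deleterious one.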
  
Applications of Evolutionary Biology to Protein Function
Functional Divergence
Functional Divergence
[Figure: Gene 1 is duplicated to give Gene 1a; Gene 2 lies alongside]
Gene 1 and Gene 1a, which arose by duplication, are known as paralogs, a subset of homologs.
Over evolutionary time scales they can diverge from one another in sequence as well as in function.
Types of Functional Divergence
•  Subfunctionalization
–  Paralog specializes and retains only a subset of the ancestral function
•  Neofunctionalization
–  Paralog gains a new function and loses the old function(s)
•  Subneofunctionalization
–  Paralog undergoes rapid subfunctionalization, then undergoes neofunctionalization
Functional Divergence
[Figure: gene tree with two paralogous families, Family A and Family B; Gene A is marked]
Functional Divergence
Family A
Species 1  …A L H…
Species 2  …A L H…
Species 3  …A L H…
Species 4  …A L H…
Species 5  …A L H…
Species 6  …A L H…
Family B
Species 1  …R A H…
Species 2  …R R H…
Species 3  …R C H…
Species 4  …R A H…
Species 5  …R A H…
Species 6  …R Y H…
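The two families can be compared site by site. A minimal sketch, assuming the three aligned positions shown above (spaces removed) and treating a column as conserved only if all six species share one residue; the labels are illustrative. In the functional-divergence literature, "conserved in one family only" and "conserved in both, different residues" correspond to what are commonly called type I and type II divergence.

```python
FAMILY_A = ["ALH", "ALH", "ALH", "ALH", "ALH", "ALH"]  # species 1-6
FAMILY_B = ["RAH", "RRH", "RCH", "RAH", "RAH", "RYH"]  # species 1-6


def is_conserved(column):
    """A column is conserved here only if every species has the same residue."""
    return len(set(column)) == 1


def classify_sites(family_a, family_b):
    """Label each aligned position by how its conservation differs between two families."""
    labels = []
    for col_a, col_b in zip(zip(*family_a), zip(*family_b)):
        cons_a, cons_b = is_conserved(col_a), is_conserved(col_b)
        if cons_a and cons_b:
            labels.append("conserved in both, same residue" if col_a[0] == col_b[0]
                          else "conserved in both, different residues")
        elif cons_a or cons_b:
            labels.append("conserved in one family only")
        else:
            labels.append("variable in both")
    return labels


print(classify_sites(FAMILY_A, FAMILY_B))
# ['conserved in both, different residues', 'conserved in one family only',
#  'conserved in both, same residue']
```

The middle site (L in every Family A species, variable in Family B) is the kind of position that hints the two paralogs are under different functional constraints.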
Glyceraldehyde-3-Phosphate Dehydrogenase
Cytosol: Glycolysis
Glyceraldehyde-3-phosphate + NAD+ + Pi ⇌ 1,3-bisphosphoglycerate + NADH + H+
  
Glyceraldehyde-­‐3-­‐Phosphate	
  
Dehydrogenase	
  
NADP+ 	
  NADPH	
  
+Pi 	
  +H+	
  

NADP+ 	
  NADPH	
  
+Pi 	
  +H+	
  
Glyceraldehyde-­‐3-­‐Phosphate	
  

1,3-­‐Biphosphoglycerate	
  

Plas-d:	
  Calvin	
  Cycle	
  
GAPDH Evolution
[Figure: GAPDH phylogeny with clades labelled cytosolic GapC (twice), green plants, cyanobacteria and ‘chromalveolates’]
GAPDH Structure
NADPH Binding Necessary for Calvin Cycle Function
