Bioinformatic approaches to the discovery of apoptotic proteins

Bioinformatic Approaches to the Discovery of
Apoptotic Proteins
Stephane Acoca
Department of Biochemistry
McGill University
Montreal, Quebec, Canada
A thesis submitted to McGill University in partial fulfillment of the
requirements for the degree of Master of Science
© Stephane Acoca, February, 2005

ISBN: 0-494-12385-0
Abstract
Bioinformatics can be broadly defined as the interface between the biological and
computational sciences. The wealth of sequencing information brought about by the
completion of various genome projects in the past years has created a need for the
implementation of increasingly sophisticated computer systems designed for the
management, organization, and indexing of this information. Bioinformatics further
extends to the use of computational methods to facilitate and expedite biological research.
The problem of sequence homology detection is that of analyzing protein databases in an
attempt to detect significant homology between a query sequence (or set of sequences)
and a candidate sequence, and thereby possibly infer some evolutionary and functional
relationship between them.
Apoptosis, or programmed cell death, is a genetically programmed pathway that
allows the targeted and careful elimination of cells in an organism. Over the past years,
numerous implications of the pathway in several human disease states have prompted
much research into the mechanisms which regulate it. In particular, the Bcl-2 family of
proteins, named after the original discovery of Bcl2 as an overexpressed gene in human
B-cell lymphomas, forms a core machinery which mediates and regulates several irreversible
steps in its execution.
The core of this research project revolves around the development of knowledge
and expertise in the field of sequence homology identification techniques, as applied
towards the discovery of Bcl2 family members. The use of such methods resulted in the
uncovering of a Bcl2 Homology 3 (BH3) domain in a ubiquitin ligase enzyme known as
upstream regulatory binding protein 1 (UreB1). A complete biochemical analysis
unequivocally demonstrated binding of the UreB1 BH3 domain to Mcl-1, an
antiapoptotic member of the Bcl2 protein family. Furthermore, the discovery of a BH3
domain in UreB1 may provide a link in the established involvement of the ubiquitin
pathway in the degradation of Mcl-1 following certain apoptotic stimuli. In addition,
using an independent domain modelling strategy, we describe the development of BISA,
a web-accessible software package for sequence homology detection.
Résumé
The field of bioinformatics can be broadly described as the interface between the
biological and computational sciences. The impressive growth in the number of known
sequences over recent years, largely due to the completion of several genome sequencing
projects, has created a growing need for the implementation of sophisticated computer
systems dedicated to the management, organization, and indexing of this information.
Bioinformatics extends to the use of computational methods to facilitate and expedite
biological research. The problem of sequence homology detection is that of analyzing
sequence databases in an attempt to detect a significant resemblance between a query
sequence (or set of sequences) and a candidate sequence, and possibly to infer a shared
evolutionary history and a functional relationship between them.
Apoptosis, or programmed cell death, is a genetic program that allows the targeted and
careful elimination of cells in an organism. Over recent years, the numerous implications
of apoptosis in several human diseases have prompted much research into the mechanisms
that regulate it. In particular, the Bcl2 family of proteins, named after the initial
discovery of Bcl2 as a gene overexpressed in human B-cell lymphomas, forms a core
machinery that regulates several irreversible steps in its execution.
Most of this research project revolves around the development of greater knowledge and
expertise in sequence homology detection techniques, as applied to the discovery of
proteins of the Bcl2 family. The use of these methods led to the discovery of a Bcl2
homology domain named BH3 (third Bcl2 homology domain) in a ubiquitin ligase enzyme
known as UreB1 (upstream regulatory binding protein 1). A complete analysis demonstrated
beyond dispute the binding of the UreB1 BH3 domain to Mcl-1, an apoptosis-inhibiting
protein belonging to the Bcl2 family. Indeed, the discovery of a BH3 domain in UreB1
offers a link between the ubiquitin degradation pathway and the lowering of cellular
Mcl-1 levels during certain apoptotic signalling events. In addition, using an independent
domain modelling approach, we describe the development of BISA, an application accessible
over the Internet for sequence homology detection.
Thanks!
Professor Mathieu Blanchette for all of his support, assistance, and guidance
throughout this project, without which most of this work would not have been feasible.
Professor Gordon Shore for his endless encouragement and support! Professor
Michael Hallett for being the first to introduce me to this wonderful field! All of the
members of the McGill Centre for Bioinformatics, who have been incredibly helpful
during the entire time and have made the experience at the MCB so memorable. I should
thank especially François Pépin (the endless source of knowledge to all the members of
the lab!), Michelle Scott, and Greg Finak for having been such incredibly supportive
friends during my stay! To all of my friends (Anna, David, Michael, Patrick) who have
always been there for me and keep putting up with me. Most of all, thanks to my family,
which keeps being a great source of inspiration and love to me.
Table of Contents
Abstract
Résumé
Acknowledgements
Table of Contents
List of Figures and Tables
Abbreviations
Contributions of Authors
Chapter 1: General Introduction
1.1 Apoptosis
1.1.1 Molecular and Morphological changes in Apoptosis
1.1.2 Caenorhabditis Elegans: an introduction to the apoptotic pathway
1.1.3 The Caspases: The essential executioners of programmed cell death
1.1.4 The Death Domains: Evolution of the signalling machinery
1.1.5 The Death Receptors
1.1.5.1 The CD95/Fas Receptor
1.1.5.2 The TNFR1 Receptor
1.1.5.3 The Death Receptor 3 (DR3)
1.1.5.4 The Death Receptors 4 & 5 (DR4 & DR5)
1.1.6 Mitochondria and Apoptosis
1.1.6.1 The Bcl2 Family of Apoptotic Proteins
1.2 Bioinformatic approaches to sequence homology identification
1.2.1 Pairwise Homology Screening
1.2.2 Automated Profile Screening
1.2.3 Profile Screening: Hidden Markov Models (HMMs)
1.2.3.1 Profile Hidden Markov Models
1.2.3.2 Motif-based Hidden Markov Models
1.3 Thesis Objectives
Chapter 2: Computational approaches for protein domain screening
2.1 Position Weight Matrices
2.1.1 The Method
2.1.2 The Score
2.1.3 The Results
2.2 Development of BISA
2.2.1 The Hidden Markov Model
2.2.1.1 Architecture
2.2.1.2 Parameter Estimation
2.2.1.2.1 Transition Probabilities
2.2.1.2.2 Emission Probabilities
2.2.1.3 Redefining pseudocounts: Protein blocks
2.2.2 The Background Hidden Markov Model
2.2.3 Random Sequence Generator Model
2.2.4 The Viterbi Algorithm
2.2.5 P-value estimates
2.2.5.1 The Gumbel Extreme Value Distribution
2.2.5.2 μ and λ Determination
2.2.5.2.1 Maximum-Likelihood estimation of μ and λ
2.2.5.2.2 The Curve-Fitting Algorithm
2.2.5.3 Combining the scores
2.3 Discussion
Chapter 3: An informatics search identifies the HECT-domain protein Ureb1 as a BH3-only protein that interacts with Mcl-1
3.1 Abstract
3.2 Introduction
3.3 Materials and Methods
3.4 Results
3.4.1 Ureb1
3.4.2 BH3 of Ureb1 Associates with Mcl-1
3.4.3 BH3 of Ureb1 Selectively Associates with Mcl-1
3.5 Discussion
Chapter 4: BISA: A novel bioinformatics search application for the discovery of true protein sequence homology
4.1 Abstract
4.2 Introduction
4.3 Description
4.4 Materials and Methods
4.5 Results
4.6 Discussion
4.7 Future Considerations
4.8 Conclusion
4.9 Acknowledgements
Conclusion
References

List of Figures and Tables
Chapter 1
Figure 1.1: Activation of Programmed Cell Death in C. Elegans
Figure 1.2: Procaspase activation by upstream caspases
Figure 1.3: The caspase family of proteins
Figure 1.4: Superposition of Death Fold Domain
Figure 1.5: Apoptosis signalling by the Fas receptor
Figure 1.6: Pro and Antiapoptotic signalling by TNFR1
Figure 1.7: Apoptosis signalling by DR4 and DR5
Figure 1.8: The antiapoptotic Bcl2 family members
Figure 1.9: The proapoptotic Bcl2 family members
Figure 1.10: The Bcl-XL/Bak BH3 peptide complex
Figure 1.11: Involvement of Bid- and Bad-like proteins in the mitochondrial apoptotic pathway
Figure 1.12: A small Profile Hidden Markov Model
Figure 1.13: Sample architecture of a Motif-based Hidden Markov Model
Chapter 2
Table 2.1: Results obtained through scoring of human proteins from the NCBI RefSeq database using a Position Weight Matrix representing the BH3 domain
Table 2.2: Results of one-fold cross validation experiments with proteins containing a BH3 domain
Figure 2.1: Alignment of the most widely recognized BH3 domains
Figure 2.2: Section of an HMM modelling the BH3 domain
Figure 2.3: Alignment of BH3-like domain found in the Map1 protein with the BH3 domains of Bik, Bcl2, and Bak
Figure 2.4: Section of an HMM representing the BH3-like domain alignment
Figure 2.5: Complete HMM representing the BH3-like domain alignment of Fig 2.3
Figure 2.6: The probability associated with state transitions from states containing a single state-transition is 1
Figure 2.7: The probability associated with state transitions representing gaps is equivalent to the proportion of the domain sequences containing the gap in the domain alignment
Figure 2.8: A 0-order HMM for generating random amino acid sequences
Figure 2.9: Comparison of scores retrieved for the NACHT and BH3 domains: A measure of scores as a function of size and complexity
Figure 2.10: Illustration of the effect of length and complexity on the difference in HMM scores: average scores retrieved for a single BH3 domain, a double BH3 domain, and the NACHT domain
Figure 2.11: The density and distribution functions of the Gumbel (Extreme Value) Distribution
Figure 2.12: The effects of the μ and λ parameters on the Extreme Value Distribution
Figure 2.13: Lambda values obtained from a sample simulation using the BH3 domain at variable lengths
Figure 2.14: Mu values obtained from a sample simulation using the BH3 domain at variable lengths
Figure 2.15: Distribution of Scores with respect to sample size
Figure 2.16: Error Rate in estimation of the Extreme Value Distribution Parameters using the Maximum Likelihood approach
Figure 2.17: Variations of Extreme Value Distribution with respect to length of protein sequences
Figure 2.18: Mu values obtained from a sample simulation using the BH3 domain at variable lengths
Figure 2.19: Percentage in errors obtained from a sample simulation at various slope differential limits using the BH3 domain
Figure 2.20: Number of approximated sequence lengths at various slope differential limits from a sample simulation using the BH3 domain
Table 2.3: Effect of different P-value scores on the combined final score
Table 2.4: Effect of different P-value scores on the combined final score with adjustments for individual P-value limits
Figure 2.21: Position-Specific Conservation Indexes for the BH3 domain
Figure 2.22: Depiction of the side chains at the interaction interface of the Bcl-XL-Bak BH3 complex
Table 2.5: Effect of Position-Specific Conservation Indexes on scoring
Figure 2.23: An example of the original Krogh/Haussler profile HMM architecture
Figure 2.24: The HMMer Plan7 architecture
Figure 2.25: Sequence weighting schemes available as options in the HMMer software package
Chapter 3
Figure 3.1: Position Weight Matrix search results for the BH3 domain
Figure 3.2: Schematic of the domains in human UREB1 including the newly identified BH3 domain
Figure 3.3: Association of the Ureb1 BH3 domain with Mcl-1
Figure 3.4: Selective Association of the Ureb1 BH3 domain with Mcl-1
Chapter 4
Figure 4.1: The BISA web-interface
Figure 4.2: The BISA results page
Figure 4.3: The sequence result information page

Abbreviations
aa          amino acid
ANT         adenine nucleotide translocator
APAF        apoptotic protease activating factor
BH          bcl2 homology
C.Elegans   Caenorhabditis Elegans
CARD        caspase recruitment domain
CsA         cyclosporine A
DcR         decoy receptor
DD          death domain
DED         death effector domain
EVD         extreme value distribution
HMM         hidden Markov model
IMM         inner mitochondrial membrane
FLIP        FLICE-like inhibitory protein
MOMP        mitochondrial outer membrane permeabilization
ML          maximum likelihood
PHMM        profile hidden Markov model
PS          phosphatidylserine
PTP         permeability transition pore
PWM         position weight matrix
OMM         outer mitochondrial membrane
RIP         receptor interacting protein
ROS         reactive oxygen species
TNF         tumor necrosis factor
TNFR        TNF receptor
TRAIL       TNF related apoptosis inducing ligand
TRAF        TNFR associated factor
VDAC        voltage-dependent anion channel
Contributions of Authors
This thesis includes two research articles that are in the process of being
published (Chapters 3 and 4). For convenience, the references from all chapters were
placed into a single reference section at the end of the thesis.
Chapter 2:
While almost all of the work in this chapter was done by myself, credit
must be given to Hui Chen for the work on amino acid groupings that were used to
calculate pseudocounts (Section 2.2.1.3). In addition, the Maximum-Likelihood
algorithm employed for deriving the μ and λ parameters of the Gumbel distribution from
a sample distribution set was based on a technical report from Sean Eddy for his HMM
modelling software HMMer (Eddy, 1997).
Chapter 3:
Acoca, S., Warr, M., Germain, M., Shore, G.C., and Blanchette, M. An informatics
search identifies the HECT-domain protein Ureb1 as a BH3-only protein that interacts
with Mcl-1. Apoptosis. To be submitted shortly.
Credit and thanks must be extended to Mathew Warr for the experimental verification of
UREB1 binding to MCL-1. Therefore, the sections of the article and experiments under
"Plasmids", "Transient Transfection", "Immunoprecipitation", "BH3 of UREB1
Associates with MCL-1", and "BH3 of UREB1 Selectively Associates with MCL-1"
were written and performed by him. In addition, the discussion was written as part of a
collaborative effort between myself and Mathew.
Chapter 4:
Acoca, S., Blanchette, M., Chen, H. BISA: A novel bioinformatics search application for
the discovery of true protein sequence homology. Nucleic Acids Res. to be completed
and submitted shortly.
All of the work in this paper was done by myself. However, supervision by professor
Blanchette has been invaluable in achieving the work required for the design and
implementation of the software described in this paper. As mentioned previously, the
project involving the gathering of amino acid groupings was a contribution from Hui
Chen of the bioinformatics group supervised by professor Blanchette.
Chapter 1
General Introduction
1.1 Apoptosis
The discovery that physiological cell death in multicellular organisms was under
genetic control and the demonstration that abnormalities of the process were implicated
in disease states brought about the widespread recognition of apoptosis as an important
pathway in the regulation of cell production and maintenance of homeostasis in an
organism.
Programmed cell death, commonly called apoptosis, may serve several purposes
during the lifespan of an organism. For instance, apoptosis will frequently be initiated to
eliminate cells which may be described as having no function. In C. Elegans for example,
many cells which are formed die before differentiation has taken place and live less than
an hour (Sulston et al., 1983). Similarly, in mammals, many thymocytes die before they
mature into functional T-cells. A number of such cells that die as part of normal
development are evolutionary remnants that may have been functional in ancestral
species but are now programmed to die (Ellis et al., 1991). Such cellular death can
facilitate evolutionary change in three ways. First, cell death is known to have a
morphological function which allows the alteration of the shape of a structure in order for
it to be sculpted into a more adaptive form. For instance, in the embryo, a simple vesicle
gives rise to the complex structure of the adult inner ear through a number of morphogenetic
processes, including a specific distribution of apoptotic spots (Leon et al., 2004).
Secondly, cell death can modify a structure that has been duplicated during evolution,
giving it a novel identity. An example of this is clearly seen in the grasshopper, where the
ganglia found in the thoracic and abdominal segments originate from identical sets of
neuroblasts and precursor cells. The ganglia then diverge by subsequent apoptotic death,
which gives each of them a distinct composition (Ellis et al., 1991). Lastly, cell
death in one sex and not in the other can create sexually dimorphic traits. In Drosophila,
the male-specific somatic gonadal precursors (msSGPs) are specified in both males and
females but are eliminated in females via programmed cell death through the activity of
the sex determination regulatory gene doublesex (DeFalco et al., 2003). In sum, each of
these three cases demonstrates a possible avenue by which cellular death may occur
with a view to modifying processes or structures that already exist, instead of
creating new ones.
A second driving force behind cellular apoptosis resides in the maintenance of
homeostasis, the tendency of a biological system to seek and maintain a condition of
balance or equilibrium within its internal environment despite changes in its external
milieu. One means through which this is accomplished is the elimination of cells
which are produced in excess. One example occurs during the developmental stages of the
chick, where spinal motor neurons are produced in excess. The population of motor
neurons is then approximately halved through apoptosis, thereby possibly ensuring that
the number of available target sites closely matches the population of the cell
type. In addition, through the course of the development of the organism, cells that do
not develop in the intended manner should be eliminated. In vertebrates, for instance, the
development of the visual system is subject to selective apoptosis in an attempt to remove
neurons which have formed unsuitable connections. Other cells may have a transient
function in an organism's developmental stages and then undergo apoptosis after
their purpose has been served. In the nematode C. Elegans, the tapering tail is partly
shaped by filaments produced by a tail-spike cell which is then programmed to die.
Moreover, apoptosis can also function in the maintenance of homeostasis through the
removal of cells that are deemed potentially harmful to the organism. Thymocytes are
known to sometimes carry T-cell receptors that may injure an organism's tissues. In such
cases, their self-reactive nature ensures that they are induced to undergo cell death in the
thymus, before they are able to mature (Ellis et al., 1991). In addition, cells which have
sustained DNA damage can either halt their cell cycle and activate repair mechanisms
or, as a last line of defense, induce apoptosis in order to prevent cells that have undergone
significant genetic alteration from proliferating, thereby reducing the risk of
tumor formation (Hofmann, 1999). Cells that are under extensive oxidative stress can
also attempt to enter apoptosis as an alternative to cellular necrosis. Furthermore, in order
to prevent further spread of a viral, bacterial or fungal infection, affected cells can be
targeted for apoptosis. This form of pathogen-induced cell death is of particular
importance for plants, since they lack the defense mechanism that is provided by an
immune system. It is therefore not surprising that viruses have evolved elaborate
strategies to evade host cell apoptosis, at least until the assembly of the viral progeny
has been completed (Hofmann, 1999).
1.1.1 Molecular and Morphological changes in Apoptosis
At various stages along the initiation of the apoptotic pathway, a number of
molecular and morphological features distinguish apoptotic cells. Early apoptotic events
give most cells a more rounded morphology by releasing extracellular attachments
and reorganizing focal adhesions. The reorganization of actin filaments initiates the
formation of a peripheral membrane-associated ring which, through Myosin II-dependent
contractions, brings about the blebbing stage characterized by periods of sustained
protrusion and retraction of the plasma membrane (Mills et al., 1999). The dissolution of
the polymerized actin filaments correlates with the condensation of the cell into small,
membrane-enclosed particles called apoptotic bodies (Mills et al., 1999).
The loss of membrane phospholipid asymmetry in apoptotic cells is followed by
the exposure of phosphatidylserine (PS), an aminophospholipid which is normally
restricted to the inner leaflet of the plasma membrane. Exposure of PS is the most widely
accepted surface change which plays a direct role in recognition by phagocytes (Fadok et al.,
1998). In addition, changes in surface sugars have also been shown to be recognized by
phagocyte lectins and may also play an important role in the phagocytosis of the
apoptotic bodies (Fadok et al., 1998). All of these surface signals ensure efficient
elimination by phagocytosis and prevent any inflammatory response from being initiated,
a key factor which differentiates apoptosis from cellular necrosis.
Other apoptotic features of interest include cytoplasmic, nuclear and chromatin
condensation along with fragmentation of the nuclear membrane and of the genomic
content. Of note is the formation of one of the most extensively employed markers for
apoptosis: a nucleosomal ladder generated by cleavage of genomic DNA located
between nucleosomes into fragments whose lengths are integer multiples of 180 base pairs.
1.1.2 Caenorhabditis Elegans: an introduction to the apoptotic pathway
The regulatory network of programmed cell death in the nematode C. Elegans
provides a simple molecular framework for our initial understanding of the regulation
involved in the decision steps initiated before commitment of a cell to undergo apoptosis
and of the proteins implicated in its execution.
The genes involved in the apoptotic pathway of any organism can be separated
into positive and negative regulators of the apoptotic pathway. Of the four genes
regulating the apoptotic pathway in C. Elegans, only one, ced-9, functions as an
antagonist of the pathway whereas three, egl-1, ced-3, and ced-4, function to promote it
(Metzstein et al., 1998). This pathway is initiated by Egl-1, which inhibits the
antiapoptotic activity of Ced-9. Ced-9's activity is directed at the inhibition of Ced-4,
which is an activator of Ced-3, a cysteine protease that functions as the executioner of
apoptosis in C. Elegans (Metzstein et al., 1998).
Fig 1.1: Activation of Programmed Cell Death in C. Elegans. a) Upstream signals initiate the activation
and/or production of Egl-1, which binds to and inactivates the Ced-9 protein, allowing the release of Ced-4
from Ced-9. b) The subsequent step involves proteolytic cleavage of the pro-enzyme form of Ced-3,
releasing its two catalytic subunits, which assemble into the active form. (Taken from Metzstein et al., 1998)
Loss-of-function mutations in any of the positive regulators of the pathway or
gain-of-function mutations in ced-9 block the cell's ability to undergo apoptosis (Chen et
al., 2000). Genetic studies revealed that Ced-4 is a mitochondrial protein that undergoes
relocalization to the nucleus in apoptotic cells (Chen et al., 2000). This change in
localization is brought about by the inhibitory activity of Egl-1 on Ced-9, which releases
Ced-4 from its interaction with Ced-9 located at the mitochondrial surface and allows its
relocalization to the nuclear membrane, where it can exert its apoptotic activity (Chen et
al., 2000). Hence it seems that part of the inhibitory activity of Ced-9 lies in its capacity
to sequester Ced-4 at the mitochondria and prevent its localization to the nucleus, where it
can bind directly to the pro-enzyme form of Ced-3 to facilitate its proteolytic cleavage
into the active enzyme form (Chen et al., 2000).
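The regulatory logic described above (Egl-1 inhibits Ced-9, Ced-9 sequesters Ced-4, and free Ced-4
promotes Ced-3 activation) can be sketched as a small Boolean model. The following Python sketch is
only an illustration under simplifying assumptions: each gene product is treated as either functional
or not, the names Genotype and undergoes_apoptosis are hypothetical, and the ced-9 gain-of-function
allele is assumed to make Ced-9 insensitive to Egl-1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Genotype:
    """Which gene products are functional (a toy, all-or-nothing view)."""
    egl1: bool = True
    ced9: bool = True
    ced9_gof: bool = False  # assumption: gain of function = Ced-9 ignores Egl-1
    ced4: bool = True
    ced3: bool = True


def undergoes_apoptosis(g: Genotype, death_signal: bool) -> bool:
    """Return True if Ced-3 becomes active in this simplified pathway."""
    # Upstream signals activate and/or produce Egl-1.
    egl1_active = g.egl1 and death_signal
    # Ced-9 keeps sequestering Ced-4 unless Egl-1 inactivates it.
    ced9_blocking = g.ced9 and (g.ced9_gof or not egl1_active)
    # Released Ced-4 is free to promote cleavage of the Ced-3 pro-enzyme.
    ced4_free = g.ced4 and not ced9_blocking
    return g.ced3 and ced4_free


if __name__ == "__main__":
    cases = {
        "wild type": Genotype(),
        "egl-1 loss of function": Genotype(egl1=False),
        "ced-4 loss of function": Genotype(ced4=False),
        "ced-3 loss of function": Genotype(ced3=False),
        "ced-9 gain of function": Genotype(ced9_gof=True),
    }
    for name, genotype in cases.items():
        print(name, undergoes_apoptosis(genotype, death_signal=True))
```

Under these assumptions the model reproduces the qualitative observations reported above: a loss of
function in egl-1, ced-4 or ced-3, or a gain of function in ced-9, prevents the cell from dying even
when a death signal is present.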
The characterization of the genes implicated in the apoptotic pathway of C.
Elegans has proven to be key in understanding programmed cell death in higher
organisms. Ced-3 was discovered to be part of a family of proteases called caspases that
induce the morphological changes seen in apoptotic cells (refer to Section 1.1.3) and was
instrumental in identifying the roles played by caspases in the regulation of apoptosis in
mammalian cells. As well, APAF-1, a mammalian protein, was found to be closely
related to Ced-4 and also appeared to function in the regulation of programmed cell
death through caspase activation, with a number of similar characteristics (Metzstein et al.,
1998). Upon the discovery of the bcl2 gene through gain-of-function mutations in human
follicular B-cell lymphomas (Tsujimoto et al., 1986), its identification as an antiapoptotic
protein implicated in programmed cell death in mammalian cell-culture experiments,
along with the discovery that it belongs to a multidomain family of pro- and antiapoptotic
proteins, made it straightforward to draw parallels between apoptosis in the nematode and
in higher organisms (see Section 1.1.6.1). The cloning of the ced-9 gene
revealed that its protein product showed close homology to Bcl-2. Similarly, the Egl-1
protein was also found to contain a BH3 domain and to belong to a proapoptotic subclass
of the family whose members essentially rely on a sole BH3 domain for their proapoptotic
function. Thus the simplified view of apoptosis through the C. Elegans regulatory
network provided a means to understand how the Bcl-2 proteins work in regulating the
activity of the caspases in the execution of apoptosis.
1.1.3 The Caspases: Essential executioners of programmed cell death
The caspases form a group of highly conserved enzymes called cysteine
aspartate-specific proteases which are indispensable for the execution of programmed
cell death (Strasser et al., 2000). Their importance is such that the cleavage of their
highly limited subset of target polypeptides in the cell is sufficient to account for the
variety of morphological and cellular events that occur in the course of programmed cell
death (Nicholson, 1999). So far, 14 caspases have been identified in mammals (Strasser
et al., 2000). They are initially synthesized in their zymogen form, the inactive form of the
enzyme which contains a prodomain and has very low intrinsic enzymatic activity. The
specificity of each of these is defined by the four amino acid (tetrapeptide) motif
upstream of the target cleavage site, which invariably culminates at the carboxyl terminal
of an aspartate residue, the caspase cleavage site.
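The recognition rule described above, a tetrapeptide motif ending in the aspartate after which
cleavage occurs, can be illustrated with a simple sequence scan. The sketch below is a hedged
illustration rather than a method used in this thesis: the function name find_cleavage_sites, the
use of a lowercase x as a wildcard, the default motif DExD and the toy sequence are all assumptions
made for the example.

```python
import re
from typing import List


def find_cleavage_sites(sequence: str, motif: str = "DExD") -> List[int]:
    """Return 1-based positions of the final motif residue for every match.

    A lowercase 'x' in the motif matches any residue (an assumption of this
    sketch). Cleavage would occur on the carboxyl side of the reported
    aspartate.
    """
    # Turn the motif into a regular expression; wrapping it in a lookahead
    # lets overlapping matches be reported as well.
    pattern = re.compile("(?=(" + motif.replace("x", ".") + "))")
    sites = []
    for match in pattern.finditer(sequence.upper()):
        # match.start() is the 0-based index of the first motif residue, so
        # the last motif residue sits at 1-based position start + len(motif).
        sites.append(match.start() + len(motif))
    return sites


if __name__ == "__main__":
    toy_sequence = "MKDEVDGAAADEHDK"  # made-up sequence, not a real protein
    print(find_cleavage_sites(toy_sequence))          # every DExD-like site
    print(find_cleavage_sites(toy_sequence, "DEVD"))  # one specific tetrapeptide
```

A plain pattern match of this kind only flags candidate sites; it says nothing about whether a
given site is accessible to, or actually cleaved by, a caspase in the cell.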
Functional caspases are organized into a heterotetramer composed of two
identical large subunits of approximately 20 kDa each (p20 subunits) and two identical small
subunits of about 10 kDa each (p10 subunits). Figure 1.2 shows caspases in their
tripartite pro-enzyme form, composed of a prodomain and two non-functional subunits,
along with their activated form. Caspase-mediated cleavage at the two aspartate residues
allows the release of the subunits and the formation of the activated caspase.
Figure 1.2 Procaspase activation by proteolytic cleavage from upstream caspases. The resulting
release of the p20 and p10 subunits allows the formation of an active caspase complex composed
of two p10 and two p20 subunits.
Figure 1.3 The caspase family. For each caspase, the figure lists its other names, its domain
layout, its preferred substrate and its functional group (apoptosis initiator, apoptosis effector,
or cytokine maturation). The aspartate cleavage sites between the large (yellow bars) and small
(orange bars) subunits are indicated. Death effector domains (DED) and caspase recruitment domains
(CARD) are shown. Human caspases 1-10, mouse caspases 11-14, and the C. Elegans caspase Ced-3 are
shown. (Strasser et al., 2000)
ICE: The prototype caspase
The prototype caspase is ICE (interleukin-1β converting enzyme), also
known as caspase-1, which is responsible for the conversion of pro-interleukin-1β
(proIL-1β), an inactive pro-inflammatory cytokine precursor, to its activated form
(Nicholson et al., 1997). From a catalytic perspective, ICE was notable for its near
absolute requirement of an aspartate residue at the cleavage position, substitution of
which causes a greater than 100-fold decrease in the kcat/Km value (Nicholson et al., 1997).
ICE also tends to be more selective for hydrophobic residues at the 4th amino acid
upstream of the aspartate residue but does tolerate other amino acids reasonably well.
The initial cloning of ICE in 1992 was a notable discovery in that the enzyme was found
to be unrelated to any known mammalian protein. Closer inspection did reveal homology to
Ced-3, which prompted a successful search for a number of other mammalian homologs
(Creagh et al., 2001). From an evolutionary standpoint, phylogenetic analysis indicated
that the 14 known mammalian caspases can be classified into two groups according to their
relatedness to ICE or Ced-3 (Nicholson, 1999). Evidence indicates that the members of the
ICE subfamily are primarily involved in inflammatory processes (the third group in fig.
1.3) whereas the Ced-3 subfamily (composed of the first and second groups of fig. 1.3) is
almost exclusively involved in apoptosis. Further classification of the caspases based on
their substrate specificities resulted in three groupings (fig. 1.3 displays the functional
classification of the 14 caspases; Nicholson, 1999). Group I consists of caspases 1, 4, 5
and 13, which have a preference for large hydrophobic amino acids at the fourth position
upstream of the cleavage site (P4). This amino acid preference is consistent with their
role in cytokine processing but not in apoptosis, since none of the target proteins cleaved
during apoptosis contain hydrophobic residues at P4. Group II (caspases 2, 3 and 7) is much
more specific in its requirement for an Asp at P4 and has a specificity which is virtually
identical to that of the C. elegans caspase Ced-3. The DExD motif recognized by Group II
also occurs in a number of proteins targeted for cleavage during apoptosis, supporting the
role of Group II as the major effectors of programmed cell death (Nicholson et al., 1999).
Group III (caspases 6, 8, 9 and 10) has a preference for branched-chain aliphatic amino
acids at P4, which are found at the activation cleavage sites of most Group II and III
caspases, a specificity which is consistent with their role as initiators that activate
the Group II caspases during apoptosis (Nicholson, 1999). Two exceptions to this general
classification scheme are caspases 2 and 6. The former appears to be a self-activating
effector caspase while the latter plays an effector role in addition to the putative
activation role (Nicholson, 1999). These exceptions account for the slight differences
between the substrate-specificity classification scheme and the functional classification
shown in figure 1.3.
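The substrate-specificity grouping above lends itself to a small computational check. The following is a minimal sketch in Python, not a method taken from the cited work: the residue sets assigned to each group (aromatic residues for Group I, Asp for Group II, Ile/Leu/Val for Group III) are simplifying assumptions, and the function names and the toy sequence are hypothetical. It classifies a P4-P1 tetrapeptide by its P4 residue and scans a sequence for the DExD motif.

```python
import re

# Assumed P4 residue sets; a simplification of the three groups described above.
P4_GROUPS = {
    "Group I": set("WYF"),    # large hydrophobic (aromatic) residues, assumed
    "Group II": set("D"),     # aspartate, as in the DExD motif
    "Group III": set("ILV"),  # branched-chain aliphatic residues
}


def classify_tetrapeptide(motif):
    """Return the specificity group for a P4-P1 tetrapeptide, or None."""
    motif = motif.upper()
    if len(motif) != 4 or motif[3] != "D":
        return None  # caspases require Asp at the cleavage position (P1)
    for group, residues in P4_GROUPS.items():
        if motif[0] in residues:
            return group
    return None


def find_dexd_sites(sequence):
    """List (start index, motif) for every D-E-x-D match, overlaps included."""
    pattern = re.compile(r"(?=(DE.D))")  # lookahead keeps overlapping matches
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


# Hypothetical toy sequence, used only to exercise the scan.
toy_sequence = "MKDEVDGAADETDS"
print(classify_tetrapeptide("DEVD"), find_dexd_sites(toy_sequence))
```

A scan of this kind only flags candidate sites; whether a site is actually cleaved also depends on structural accessibility, which the sketch ignores.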
1.1.4 The Death Domains: Evolution of the signalling machinery
A considerable number of proteins have been identified that act on the upstream part of
the signalling pathway, from the activated receptor initializing a death signal to the
activation of the initiator caspases. These "adaptor" proteins provide an essential link
between the cell death effectors, the caspases, and the cell death regulators, the death
receptors (Section 1.1.5). Through physical associations, the adaptor proteins provide
bridges between the caspases and the regulators of apoptosis. Based on localized regions
of sequence similarity, these proteins are known to consist of several independently
folded domains (Hofmann K., 1999). The analogies discovered between these domains have
helped us understand the underlying architecture of the pathway.
Nature has commonly used domains as modules that can be mixed and matched to create
proteins with novel functions (Pawson and Nash, 2003). As an example, the Death Effector
Domain (DED) has long been one of the most studied of the apoptotic domains. This
structure, however, rarely appears by itself in proteins. The reason is simply that, upon
activation of the appropriate death receptor, its purpose in the pathway is to bind
adaptor proteins at the cytoplasmic surface of the cell and activate the appropriate
downstream pathways. What follows is a brief account of each of the domains commonly
known to play an intrinsic part in the evolution of the apoptotic machinery.
The Death Domain Fold Superfamily
The associations between apoptotic proteins are mediated by members of the Death Domain
Fold Superfamily, whose members include the Death Effector Domain (DED), the Death Domain
(DD) and the Caspase Recruitment Domain (CARD), all of which play prominent roles in the
intermediate steps between activation of the death receptor and caspase activation
(Hofmann K., 1999). Pyrin, a recently uncovered member of the Death Domain Fold
Superfamily, is closely involved in linking the inflammatory processes of a cell to the
apoptotic machinery, a topic which will not be covered in this thesis (Stehlik et al.,
2002). Members of the Death Domain Fold Superfamily are homotypic interaction domains,
through which domains of the same class can interact, and are characterized by five, six
(rarely even seven) tightly coiled antiparallel alpha helices arranged in what is called a
Greek-key fold, which is illustrated in figure 1.4 (Lahm et al., 2003). Early on in
evolution, homotypic interaction domains first evolved as a means of self-assembly of
proteins into structures composed of multiple units of the same protein (Lahm et al.,
2003). The peculiarity of the Death Domain Fold Superfamily lies in there being no known
example of one of its members binding to itself. That is, none of the previously reported
interactions are between the death-fold domains of the same protein (e.g. the DD of Fas
with the DD of Fas), only between death-fold domains of the same subfamily from different
proteins (e.g. the DD of Fas and the DD of FADD).
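The homotypic interaction rule described above (same domain subfamily, different proteins) can be written down as a simple predicate. The sketch below is illustrative only: the class name, the function name and the family labels are hypothetical choices, and the rule deliberately ignores binding affinity and cellular context.

```python
from typing import NamedTuple


class DeathFoldDomain(NamedTuple):
    protein: str  # protein carrying the domain
    family: str   # "DD", "DED", "CARD" or "PYRIN"


def may_interact(a, b):
    """Homotypic rule: same domain family, but on two different proteins."""
    return a.family == b.family and a.protein != b.protein


# Example domains named in the text.
fas_dd = DeathFoldDomain("Fas", "DD")
fadd_dd = DeathFoldDomain("FADD", "DD")
fadd_ded = DeathFoldDomain("FADD", "DED")
casp8_ded = DeathFoldDomain("caspase-8", "DED")

print(may_interact(fas_dd, fadd_dd))    # DD with DD, different proteins
print(may_interact(fas_dd, fadd_ded))   # DD with DED, different families
```

Such a predicate is a necessary condition at best: it can rule out candidate pairings but cannot by itself establish that two proteins interact.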
Role in caspase activation
Activation of some caspases that contain a long prodomain harbouring a member of the
Death Domain Fold Superfamily is achieved through a process involving proteins containing
one or more of the death fold domains as part of their structure. The long prodomains of
caspases 1, 2, 4 and 9 are known to contain a CARD domain, whereas those of caspases 8
and 10 contain two tandem DED domains, as illustrated in figure 1.3. Certain cells
contain plasma membrane receptors that trigger cell death pathways upon binding of their
cognate ligands (please refer to Section 1.1.5). In the course of this apoptotic pathway,
transfer of the upstream death stimulus to the downstream caspases is achieved through
adaptor proteins. Adaptor proteins invariably associate with a number of the receptors
involved in triggering cellular death and relay the signal either to another adaptor
protein or to an initiator caspase. In any event, the final interaction in each of the
death adaptor cascades is the association of an adaptor protein with a caspase prodomain
through one of two death fold domains, the caspase recruitment domain (CARD) or the death
effector domain (DED) (Hofmann, 1999).
Fig 1.4: Superposition of CARD, DD, and DED death fold domain structures (panels:
APA1-CARD, PRO-CASP9 CARD, PELLE-DD, RAIDD CARD, P75-DD, FADD DED, FADD-DD). For comparison
purposes the structures have been separated while maintaining their relative orientation.
All structures conserve the 6 or 7 alpha-helical central motif but maintain a certain
variability which represents their adaptation to specific biological functions in the
cell. The RAIDD helices were labeled H1 to H6 as a reference point. APA1 and PRO-CASP9 are
abbreviations for Apaf1 and Procaspase 9 respectively. All domains are illustrated as
present in the FSSP database (Holm et al., 1996). (Illustration taken from Lahm et al.,
2003)
1.1.5 The Death Receptors
Mammals have evolved a mechanism which allows the organism to actively instruct
individual cells to self-destruct, a mechanism which is vital for the normal, controlled
functioning of the immune system (Ashkenazi et al., 1998). All such signals are mediated
through a class of receptors collectively known as the death receptors, which belong to
the Tumor Necrosis Factor (TNF) family of receptors. These proteins form a large family of
multimeric receptors that are defined by significant homology in their cysteine-rich
extracellular domains and can activate cell survival or cell death signals upon binding of
their respective ligands (Dragovich et al., 1998). In addition, the death receptors are
known to contain a homologous cytoplasmic region known as the Death Domain, which
generally functions in the activation of downstream caspases but can in some instances
function to negatively regulate apoptosis (Ashkenazi et al., 1998).
The best characterized death receptors are CD95 (also called Fas or Apo1) and TNFR1 (also
known as p55). Other known death receptors include DR3 (Apo3), DR4 and DR5 (Apo2). The
ligands which activate the death receptors (CD95 ligand for CD95, TNF and lymphotoxin-α
for TNFR1, Apo3 ligand for DR3 and Apo2 ligand for DR4 and DR5) are all structurally
related molecules belonging to the TNF gene superfamily (Ashkenazi et al., 1998). As
expected, the TNF-related ligands and receptors are all expressed on, but not confined to,
activated T-cells and macrophages, and are required during T-cell mediated immune
responses (Baker and Reddy, 1998).
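For annotation purposes, the receptor-ligand pairings and alternative names listed above can be captured in a small lookup table. This is a minimal sketch: the pairings are those given in the text, while the dictionary layout, the key spellings (for example "lymphotoxin-alpha", "Apo2L") and the helper names are illustrative assumptions.

```python
# Pairings as listed in the text; the key spellings are illustrative choices.
LIGANDS_BY_RECEPTOR = {
    "CD95": ("CD95L",),
    "TNFR1": ("TNF", "lymphotoxin-alpha"),
    "DR3": ("Apo3L",),
    "DR4": ("Apo2L",),
    "DR5": ("Apo2L",),
}

# Alternative receptor names mentioned in the text, mapped to the keys above.
ALIASES = {"Fas": "CD95", "Apo1": "CD95", "p55": "TNFR1", "Apo3": "DR3", "Apo2": "DR5"}


def ligands_for(receptor):
    """Return the ligands of a receptor, accepting either name or alias."""
    canonical = ALIASES.get(receptor, receptor)
    return LIGANDS_BY_RECEPTOR.get(canonical, ())


def receptors_for(ligand):
    """Return the sorted list of receptors activated by a ligand."""
    return sorted(r for r, ligs in LIGANDS_BY_RECEPTOR.items() if ligand in ligs)


print(ligands_for("Fas"))
print(receptors_for("Apo2L"))
```

Resolving aliases before lookup matters in practice because the same receptor appears under different names across the literature cited here.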
1.1.5.1 The CD95/Fas Receptor
The initial evidence of the physiological importance of the Fas receptor and ligand came
from human patients and from the mouse mutations lpr (lymphoproliferation) and gld
(generalized lymphoproliferative disease), in which the genes for CD95 and/or CD95L were
found to be defective (Ashkenazi et al., 1998; Nagata, 1997). The resulting phenotype can
involve accumulation of peripheral lymphoid cells as well as a fatal autoimmune syndrome
characterized by massive enlargement of the lymph nodes (Ashkenazi et al., 1998).
Functional Roles
Three main physiological roles have been attributed to the Fas receptor and ligand
complex. The first involves the removal of peripheral activated T-cells at the end of an
immune response. T-lymphocytes are generally responsible for the removal of virally
infected and cancerous cells and die at a number of stages in the course of their
development. Most immature T-cells are of no functional use to the organism due to
incorrect rearrangement of the T-cell receptor and/or may potentially be self-reactive and
therefore detrimental to the organism (more than 95% of thymocytes immigrating to the
thymus are eliminated during their development; Nagata, 1997). In addition, peripheral
mature T-cells that recognize self-antigens are also deleted. When a target cell is
encountered, mature T-cells are activated to proliferate and mount the appropriate immune
response. Once the response is complete, the activated T-cells must be removed. Failure to
die after activation leads to an accumulation of activated cells in the spleen and lymph
nodes of the affected organism, a phenotype seen in the gld and lpr mice (Nagata, 1997).
Secondly, cytotoxic T lymphocytes (CTL) and natural killer (NK) cells recognize and kill
virus- or bacterium-infected cells and cancerous cells respectively through a
Fas-dependent mechanism (Nagata, 1997). CTL activation through the interaction of the
T-cell receptor with viral antigens causes the expression of the Fas ligand (FasL) at the
cell surface. FasL then binds to the Fas receptor on the target cells and activates
caspases, thereby causing apoptosis. The third role revolves around "immune privilege".
Immune reactions are accompanied by inflammatory responses that can cause non-specific
damage to neighbouring tissues. Acceptable as this may be for most organs, it is not for
others such as the testis and the eye (Nagata, 1997). These are "immune privileged sites"
that, through constitutive expression of functional Fas ligand, enable the immediate
killing of activated inflammatory cells entering these organs (Nagata, 1997). The numerous
functional roles assumed by the Fas receptor and Fas ligand therefore make this system a
valuable tool for the proper functioning of the immune system, both in maintaining
inflammatory processes and in preventing them from occurring in "immune privileged sites".
Signalling
The CD95 ligand, like other TNF family members, forms a homotrimeric molecule at the cell
surface which binds to a homotrimeric complex of the CD95 receptor on the target cell,
leading to aggregation of the receptor's death domains (fig 1.5). The adaptor molecule
FADD recognizes the death domain cluster at the receptor and binds to it through its own
death domain. Through another apoptotic domain, the Death Effector Domain (DED), FADD
then recruits the zymogen (inactive) form of caspase-8 by binding the tandemly repeated
DEDs in the caspase-8 prodomain. The caspase-8 oligomerization which results from its
recruitment to the activated Fas complex initiates and drives its activation through
self-cleavage (Ashkenazi and Dixit, 1998). Studies involving dominant negative deletion
mutants of FADD and FADD knockout mice further established FADD as an essential link for
apoptosis induction by the Fas receptor (Ashkenazi and Dixit, 1998). Activation of
caspase-8 results in the activation of downstream effector caspases such as caspase-3,
which effectively commits the cell to the apoptotic program.
Fig 1.5: Apoptosis signalling by CD95. (Ashkenazi and Dixit, 1998)
Viral proteins known as vFLIPs (viral FLICE-like inhibitory proteins) and a related
cellular protein called cFLIP have been compared to catalytically inactive caspase-8
molecules and regulate apoptosis signalling through the Fas receptor by means of two DED
motifs that are similar to the ones contained in caspase-8. cFLIP is found as a long
(cFLIP-long) and a short (cFLIP-short) isoform. The long form also contains a region
homologous to the catalytic domain of caspase-8, but several changes in the sequence
render it catalytically inactive (Thorburn A., 2004). However, it seems that FLIP proteins
may have a dual role as apoptotic regulators, acting both as inhibitors, by competing with
caspase-8 for FADD, and as activators of caspase-8-induced apoptosis (Thorburn A., 2004).
The activation function of FLIP proteins is mediated by the long FLIP isoform, which can
dimerize with caspase-8 at the death-inducing signalling complex (DISC) and activate its
proteolytic activity (Thorburn A., 2004). FADD-independent pathways are also known to
exist which can mediate apoptosis through the Fas receptor in some cells. For instance,
cytoplasmic proteins such as Daxx bind the Fas death domain and can activate the
stress-activated c-Jun kinase pathway (Ashkenazi and Dixit, 1998).
There also exists a FasL-neutralizing decoy receptor known as DcR3, which lacks an
apparent transmembrane domain and is secreted (Bhardwaj et al., 2003). DcR3 can bind FasL
with an affinity equal to that of the Fas receptor but cannot transmit any intracellular
death signals. DcR3 therefore competes with Fas for binding to FasL. In addition, DcR3
mRNA is frequently overexpressed in tumor cell lines and the DcR3 gene is amplified in
50% of primary lung and colon tumors, suggesting a mechanism used by tumor cells for
escaping immune cytotoxic attacks (Bhardwaj et al., 2003).
1.1.5.2 The TNFR1 Receptor
TNFR1 activation results in activation of NF-κB and AP-1, two transcription factors which
induce expression of genes promoting inflammation and modulating immune function
(Ashkenazi and Dixit, 1998). In addition, the observation that TNFR1 activation can also
trigger apoptosis when protein synthesis is inhibited suggested that cellular factors
suppress the apoptotic stimulus from TNF (Ashkenazi and Dixit, 1998). Further evidence
that inhibition of either pathway sensitized cells to TNF-induced apoptosis provided proof
of such an inhibitory mechanism (Ashkenazi and Dixit, 1998). More importantly, however, it
is the suppression of apoptosis which augments the inflammatory response to TNFα. In
fact, the TNFα promoter itself contains NF-κB and AP-1 binding sites and is subject to
positive autoregulation, thereby allowing an amplification of the inflammatory response
(Baud et al., 2001).
Fig 1.6: Proapoptotic and antiapoptotic signalling by TNFR1. (Ashkenazi and Dixit, 1998)
Activation of the TNFR1 receptor follows a series of events similar to that of the Fas
receptor. Trimerization of the TNFR1 receptor is brought about upon binding of TNF. The
subsequent association of the receptor's intracellular Death Domain with the Death Domain
of the adaptor protein TRADD (TNFR-associated death domain) functions as a platform for
the recruitment of several signalling molecules. TNFR-associated factor-2 (TRAF2) and
receptor-interacting protein (RIP) stimulate the pathways leading to NF-κB and JNK/AP-1.
Upon binding to the activated receptor complex, TRAF2 and RIP activate NF-κB-inducing
kinase (NIK). Activated NIK in turn leads to phosphorylation, by the IκB kinase (IKK)
complex, of the IκBs, the inhibitory proteins which prevent NF-κB from entering the
nucleus. This phosphorylation ultimately results in the degradation of the IκBs and the
nuclear translocation of the released NF-κB (Baud et al., 2001). Nuclear NF-κB
transcriptional activity can be further modulated through phosphorylation by a number of
TNFα-responsive protein kinases, providing a meeting point for cross-talk between
signalling pathways. AP-1, on the other hand, is a heterogeneous collection of dimeric
transcription factors including Jun, Fos and ATF subunits. Activity of AP-1 subunits is
regulated in part through phosphorylation. Activation of TNFR1 will induce a number of
kinases, most relevantly the c-Jun amino-terminal kinases (JNKs), to enter the nucleus and
phosphorylate DNA-bound transcription factors such as AP-1, an important step in
TNF-mediated induction of AP-1 activity (Baud et al., 2001).
FADD may bind to the activated TNFR1-TRADD complex, resulting in caspase-8 activation and
initiation of apoptosis (Ashkenazi and Dixit, 1998). In addition to FADD, an adaptor
protein may bind to RIP at the activated complex through a death domain and mediate
apoptosis by recruitment of caspase-2 through a CARD domain (Ashkenazi et al., 1998).
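The TNFR1 signalling events described in this section can be summarized as a directed graph and queried for reachability. The sketch below is a simplified reading of the text rather than a curated pathway model: the node names are illustrative, intermediate steps such as IκB degradation are collapsed into single edges, and the unnamed RIP-binding adaptor is kept as a placeholder node.

```python
from collections import deque

# Directed edges read off the description above; node names are illustrative.
TNFR1_PATHWAY = {
    "TNF": ["TNFR1"],
    "TNFR1": ["TRADD"],
    "TRADD": ["TRAF2", "RIP", "FADD"],
    "TRAF2": ["NIK", "JNK"],
    "RIP": ["NIK", "JNK", "RIP-binding adaptor"],
    "NIK": ["IKK"],
    "IKK": ["NF-kB"],  # via phosphorylation and degradation of the IkBs
    "JNK": ["AP-1"],
    "FADD": ["Caspase-8"],
    "RIP-binding adaptor": ["Caspase-2"],
    "Caspase-8": ["Apoptosis"],
    "Caspase-2": ["Apoptosis"],
}


def downstream(start, graph=TNFR1_PATHWAY):
    """Breadth-first search for every node reachable from `start`."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


print(sorted(downstream("TRAF2")))
```

A reachability query of this kind makes the branching explicit: the same receptor feeds both the pro-inflammatory (NF-κB, AP-1) and the pro-apoptotic (caspase) arms, which diverge at TRADD.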
1.1.5.3 The Death Receptor 3 (DR3)
Like TNFR1, DR3 mediates responses such as NF-κB activation and apoptosis, and employs
similar mechanisms for their activation. Therefore, with respect to the induction of
apoptosis and activation of NF-κB, Apo3L and TNF, the endogenous ligands of the DR3 and
TNFR1 receptors respectively, show few differences. The most notable distinction lies in
the expression patterns of TNF and Apo3L and of their respective receptors TNFR1 and DR3.
Unlike TNF, which is expressed mainly in activated macrophages and lymphocytes, Apo3L is
transcribed constitutively in many tissues (Ashkenazi and Dixit, 1998). With respect to
their cognate receptors, the opposite occurs: whereas TNFR1 is expressed in all cells, DR3
transcripts are induced by activation in T-cells, suggesting distinct biological roles for
the Apo3L-DR3 and TNF-TNFR1 interactions.
1.1.5.4 The Death Receptors 4 and 5 (DR4 & DR5)
DR4 and DR5 are two identified receptors for TNF-related apoptosis-inducing ligand
(TRAIL), a cytokine known to induce apoptosis in tumor cell lines (Bhardwaj et al., 2003).
Because of numerous contradictory accounts, their signalling mechanisms and effects remain
poorly understood. In addition, TRAIL has been shown to bind to three other receptors with
no signalling consequence (Bhardwaj et al., 2003). In fact, two of these receptors, known
as DcR1 and DcR2, contain extracellular domains that can bind to TRAIL but cannot transmit
an apoptotic signal, either because the protein lacks a death domain at its cytoplasmic
end (DcR1) or because it carries an inactive, truncated one (DcR2) (Bhardwaj et al.,
2003). DcR1 is a glycosyl phosphatidylinositol (GPI)-anchored cell surface protein. Since
DcR1 and DcR2 are expressed in most normal tissues but are rarely seen in many tumor
tissues, it has been suggested that they serve as decoy receptors in normal cells that
protect against the cytotoxic effects of TRAIL, which may account in part for the
resistance of normal cell lines to TRAIL-induced apoptosis (Bhardwaj et al., 2003).
Functional roles
TRAIL has been implicated in physiological roles such as peripheral T-cell deletion and
the killing of virus-infected cells (Ashkenazi et al., 1997).
Fig 1.7: Apoptosis signalling by DR4/DR5 and its modulation by the decoy receptors.
(Ashkenazi and Dixit, 1998)
1.1.6 Mitochondria and Apoptosis
The mitochondria's involvement in the apoptotic process is well established (Green et al.,
1998; Scorrano et al., 2003; Hengartner et al., 2000; Green et al., 2004). Indeed,
mitochondrial outer membrane permeabilization (MOMP) and leakage of proteins from the
mitochondrial intermembrane space are not only associated with apoptosis through caspase
activation but lead to cellular death even in a state of caspase inhibition (Green et al.,
1998; Green et al., 2004). For example, abolishing caspase activity through the use of
caspase inhibitors does not always block cellular death induced by proapoptotic stimuli.
In fact, in the presence of apoptotic stimuli, caspase inhibition may well prevent
certain, if not most, morphological changes associated with apoptosis. Nevertheless, the
cells do not retain their replicative or clonogenic potential and ultimately die by
non-apoptotic means. On the other hand, Bcl-2 and Bcl-XL, antiapoptotic members of the
Bcl2 family of apoptotic proteins, can effectively inhibit apoptosis induced by
proapoptotic stimuli by protecting the mitochondria from damage induced by death signals.
The net result is a cell that can survive and maintain its clonogenic capabilities in the
face of death, an advantage widely exploited by cancer cells. In addition, proapoptotic
molecules of the Bcl2 family such as Bax (see Section 1.1.6.1) can induce mitochondrial
damage and lead to cellular death independently of caspases (Green et al., 1998).
Therefore, although numerous pathways which function in the induction of apoptosis require
caspase activation, mitochondrial function plays a central role in the induction of
apoptosis and cellular death.
Mechanisms
The mitochondrion supports cellular requirements for adenosine triphosphate (ATP) through
a process known as oxidative phosphorylation. Oxidative phosphorylation involves the
successive transport of electrons at the inner mitochondrial membrane, through a set of
proteins collectively known as the electron transport chain, coupled to the formation of a
proton gradient. This proton gradient serves as fuel for the phosphorylation of adenosine
diphosphate (ADP) to ATP. Disruption of the electron transport chain has been recognized
as an early marker of apoptosis. In fact, both γ-irradiation and ligation of Fas lead to
disruption of electron transport at the level of cytochrome c, a member of the electron
transport chain (Earnshaw et al., 1999). However, since decreases in ATP production appear
late in the apoptotic process, they cannot be solely responsible for mitochondria-mediated
apoptosis (Green et al., 1998). In fact, ATP is required for some downstream events in
apoptosis. Nevertheless, the release of cytochrome c brought about by MOMP is implicated
in the induction of apoptosis through the formation of a caspase-activating complex known
as the "apoptosome". The apoptosome is a complex formed by cytosolic cytochrome c,
apoptotic protease activating factor 1 (Apaf-1), and procaspase-9 (Scorrano et al., 2003).
The triggering of this post-mitochondrial pathway activates caspase-9, which subsequently
cleaves and activates the effector caspases 3 and 7.
Numerous proapoptotic stimuli, including Bax overexpression and UV irradiation, proceed
through MOMP for their induction of apoptosis and are inhibited by the presence of Bcl2 at
the mitochondria but not by caspase inhibition (Green et al., 1998). Similarly, deficiency
in any component of the apoptosome, the formation of which is initiated by MOMP, creates
defects in the ability of the cell to induce apoptosis through intrinsic death signals. In
addition, MOMP is associated with the eventual collapse of the electron transport chain
due to the release of cytochrome c from the intermembrane space of the mitochondrion, and
with the subsequent necrosis of cells in a state of caspase inhibition (Green et al.,
1998). Therefore, MOMP is a pivotal event in cellular death, which proceeds either by
apoptosis, through the formation of the apoptosome and caspase activation, or by necrosis,
due to the collapse of the electron transport chain. The path taken may very well be
cell-type dependent. For instance, when cytochrome c stores are sufficient, leakage of
cytochrome c from the mitochondrial intermembrane space into the cytosol can activate
caspases while leaving the electron transport chain and ATP production unaffected. In
other cases, the presence of endogenous caspase inhibitors may prevent such apoptotic
activity following cytochrome c release and instead, the inevitable loss of function of
the electron transport chain may slowly drive the cell towards a necrotic cellular death
(Green et al., 1998). Besides cytochrome c, several other proapoptotic proteins are
released from the mitochondria through MOMP, such as procaspases 2, 3, 8, and 9, IκBα (an
NF-κB inhibitor), Hsp10 (a coactivator of the apoptosome), and apoptosis-inducing factor
(AIF, whose release is a caspase-dependent event and which induces chromatin condensation
and large-scale (50 kbp) DNA fragmentation) (Arnoult et al., 2003; Green et al., 2004).
In addition, reactive oxygen species (ROS) such as the superoxide anion (O2-) are
generated through electron transport, since 1-5% of electrons are lost in the course of
the transfer and participate in O2- formation (Green et al., 2004). The loss of cytochrome
c during MOMP causes a net decrease in the efficiency of electron transport and a
concomitant increase in ROS. ROS assist apoptosis by direct activation of procaspase-9
(Green et al., 2004). Two observations further place ROS production as a possible
amplificatory mechanism in mitochondria-induced apoptosis: 1) ROS production from the
mitochondrion may be responsible for MOMP and 2) inhibition of ROS through antioxidant
treatment and/or overexpression of the mitochondrial antioxidant enzyme Mn-SOD abrogated
the release of cytochrome c and apoptosis (Li et al., 2003).
1.1.6.1 The Bcl2 Family of apoptotic Proteins
The second class of mechanisms involving the induction of MOMP during apoptosis is
dependent on the Bcl2 family of proteins. Bcl2, the founding member of the family, is an
anti-apoptotic protein first discovered as a gene activated through chromosome
translocation in B-cell lymphomas and unexpectedly found to permit survival of
cytokine-dependent haematopoietic cells in the absence of cytokine (Tsujimoto et al.,
1986; Vaux et al., 1988). Its discovery established that the genetic controls which
mediate cell survival and cell proliferation are distinct, and that both can be
contributing factors to neoplasia (Vaux et al., 1988; Adams et al., 1998). The Bcl2
family of proteins are critical regulators of cellular death. The family is composed of
pro- and anti-apoptotic members that are balanced in equilibrium in normal cells, negating
each other's function. A shift in equilibrium towards the pro-apoptotic or the
anti-apoptotic members strongly favours cellular death or survival respectively.
Three groups of Bcl2-related proteins are known. Sequence conservation among the three
groups of proteins is limited to at most four short α-helical homology regions known as
Bcl2 Homology (BH) domains 1 to 4 (BH1 to BH4). Each group is distinguished from the
others by the number of such BH domains it contains and its function with respect to
apoptosis. Members of the pro-survival grouping are the only ones known to contain the BH4
domain and must contain at least the BH1 and BH2 domains, which are essential for their
cell survival properties (Adams et al., 1998). Two other groupings can be considered when
subdividing the pro-apoptotic members of the Bcl2 family. The first, the BH3-only
proteins, comprises those whose only homology to other known Bcl2 proteins resides in the
15 residues contained in the BH3 domain and which are otherwise unrelated to other known
protein families. The second grouping, the multi-domain pro-apoptotic proteins,
incorporates all other pro-apoptotic family members, all of which are known to contain
more than one Bcl2 Homology region. In both groups, the BH3 domain is a critical factor
for induction of cellular death and is found to be sufficient for pro-apoptotic activity
(Cosulich et al., 1997; Letai et al., 2002). Regardless of the grouping, most of the
members of the Bcl2 superfamily contain a carboxy-terminal hydrophobic domain which
targets these proteins to the cytosolic face of the outer mitochondrial membrane, the
endoplasmic reticulum and the outer nuclear envelope, where they may act as damage
registers of these membranes (Adams et al., 1998; Kelekar et al., 1998). Additionally,
while members such as Bcl2 and Bak seem to be constitutively localized at the outer
mitochondrial membrane, only a fraction of Bcl-XL is known to actually reside on membrane
surfaces, and it may preferentially be cytosolic. For Bax, localization from the cytosol
to the membrane surfaces depends on the initiation of an apoptotic stimulus, an event
which is also linked to MOMP (see Section 1.1.6) (Hsu et al., 1997; Sharpe et al., 2004).
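The three groupings described above can be expressed as a rule on the set of BH domains a protein carries. The following is a hedged sketch, not a validated classifier: the function name is hypothetical, the example domain sets for Bcl-2, Bax and Bad are commonly cited compositions used here as assumptions, and the rule would misassign any pro-survival member that lacks a recognizable BH4 domain, since domain content alone cannot separate such a protein from the multi-domain pro-apoptotic group.

```python
def classify_bcl2_member(bh_domains):
    """Assign a Bcl2-family grouping from the set of BH domains present."""
    domains = {d.upper() for d in bh_domains}
    if "BH4" in domains and {"BH1", "BH2"} <= domains:
        return "pro-survival"
    if domains == {"BH3"}:
        return "BH3-only pro-apoptotic"
    if "BH3" in domains and len(domains) > 1:
        # Caveat: a BH4-less pro-survival protein would also land here.
        return "multi-domain pro-apoptotic"
    return "unclassified"


# Assumed domain compositions, for illustration only.
examples = {
    "Bcl-2": {"BH1", "BH2", "BH3", "BH4"},
    "Bax": {"BH1", "BH2", "BH3"},
    "Bad": {"BH3"},
}
for name, domains in examples.items():
    print(name, classify_bcl2_member(domains))
```

In practice the functional assignment comes from experiment; a rule like this is only useful as a first-pass annotation of candidate sequences whose BH domains have already been detected.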
Figure 1.8: The antiapoptotic Bcl2 family members. [Diagram: domain organization (BH4,
BH3, BH1, BH2 and membrane anchor) of Bcl-2, Bcl-xL, Bcl-w, Mcl-1, A1, NR-13, BHRF1,
LMW5-HL, ORF16, KS-Bcl-2, E1B-19K and CED-9.] (Adams et al., 1998)
The genetic analysis of the nematode C. elegans established one locus, ced-9, which could
counteract the action of the caspase Ced-3 and its activator Ced-4 (see Section 1.1.2).
Ced-9 proved to be a structural and functional homolog of Bcl-2, while Egl-1, another C.
elegans protein, belonged to the BH3-only grouping of pro-apoptotic proteins and could
functionally antagonize the action of Ced-9.
Pro- and antiapoptotic proteins can heterodimerize, which suggested that the relative
abundance of these competing groups of molecules may determine whether the cell lives or
dies (Adams et al., 1998). Through mutagenesis studies, residues within the BH1 and BH2
domains were shown to be vital for the survival functions of the anti-apoptotic proteins
and for their binding to pro-apoptotic members such as Bak or Bax (Adams et al., 1998). In
contrast, the heterodimerization properties of pro-apoptotic molecules were found to be
mediated by the BH3 domain. Determination of the three-dimensional structure of Bcl-XL
bound to a Bak BH3 peptide provided the final explanation by showing that the BH1, BH2,
and BH3 α-helices of anti-apoptotic members form a hydrophobic binding pocket to which the
BH3 domains of pro-apoptotic proteins can bind (Sattler et al., 1997). However, which of
the two classes provided the primary function that would be opposed by the other remained
an open question. Initial analysis of the NMR structure of the Bcl-XL-Bak complex
identified residues within the hydrophobic cleft which were involved in BH3 binding and
resulted in the creation of a series of mutants with altered BH3-binding capabilities
(Kelekar et al., 1997). These results suggested that the survival-promoting properties of
Bcl-XL were mediated through dimerization with pro-apoptotic members. However, the
development of mutants which lost their ability to heterodimerize with pro-apoptotic
members yet still retained much of their cellular survival properties proved otherwise
(Cheng et al., 1996). These studies showed that the anti-apoptotic activity of Bcl-XL was
mediated in a Bax- and Bad-independent manner and suggested that the pro-apoptotic
activity of at least some BH3-only proteins such as Bad was exerted through their binding
and subsequent inhibition of the anti-apoptotic counterpart (Kelekar et al., 1997; Cheng
et al., 1996).
Figure 1.9: The proapoptotic Bcl2 family members. [Diagram: Bax subfamily (Bax, Bak, Bok)
and BH3 subfamily (Bik, Blk, Hrk, BNIP3, BimL, Bad, Bid, EGL-1).] (Adams et al., 1998)
Fig 1.10: The Bcl-XL/Bak BH3 peptide complex. (Adams and Cory, 1998)
APAF-1 Inhibition
Pro-survival Bcl-2 proteins may act by binding directly to Ced-4-like molecules such as
APAF-1 and preventing their activation of initiator caspases (Adams et al., 1998). In
fact, both Bcl-XL and Ced-9 were found to bind to Ced-4 and prevent its apoptotic activity
through association with Ced-3 (Adams et al., 1998). In addition, Bcl-XL can bind to the
Ced-4-like portion of APAF-1 and hinder its association with procaspase-9 at its
amino-terminal CARD domain (Adams et al., 1998). In contrast, the pro-apoptotic members
may mediate their death-inducing properties by inhibiting these interactions (Adams et
al., 1998). In addition, Bcl2 may preserve the integrity of the mitochondria and act in
preventing the efflux of cytochrome c and the formation of the apoptosome (Adams et al.,
1998).
BH1- and BH2-mediated Pore Formation
The BH1 and BH2 domains are contained in all cell survival members but only in
one group of the pro-apoptotic proteins. Of particular significance is the structural
similarity of the alpha 5 and 6 helices (the helices flanked by the BH1 and BH2 domains)
of Bcl-XL to the pore-forming domains of bacterial toxins such as colicin A1 and
diphtheria toxin (Adams et al., 1998; Kelekar et al., 1998). The possible implication that
the BH1- and BH2-containing Bcl2 members could form channels through cellular
membranes prompted experimental investigation. In vitro studies confirmed the formation
of ion-conducting channels by these proteins (Kelekar et al., 1998). Furthermore, in
contrast to those formed by Bcl-XL, the pores formed by Bax were unstable and resulted in
membrane rupture (Sharpe et al., 2004). In addition, the channel selectivity of the pores
formed markedly differed and displayed multiple conductance states, which may be due to
the different charge distributions and residues contained in the BH1 and BH2 domains
flanking the pores and to whether the proteins are in a monomeric or oligomeric state
(Kelekar et al., 1998). These data suggested that Bax interacted primarily with lipids at
the membrane bilayer, disrupting its structure and forming pores with both protein and
lipids (Sharpe et al., 2004).
Bax exists as a monomer in healthy cells. In the presence of an apoptotic
stimulus, cytosolic Bax relocalizes to the outer membrane of the mitochondrion as a
consequence of the exposure of the C-terminal membrane-anchor domain and forms large
oligomeric complexes important for the permeabilization of the mitochondrial outer
membrane (Sharpe et al., 2004). In addition, the exposure of a 6 amino acid N-terminal
epitope (amino acids 13 to 19) is also required prior to Bax oligomerization at the MOM
and may also be a regulating step for Bax activation (Sharpe et al., 2004). Thus the
accumulating evidence indicates that a multitude of steps must occur prior to Bax
oligomerization and that at each of these, numerous factors can intervene to assist and/or
inhibit the process. Interestingly, binding of Bid to Bak causes exposure of a related
N-terminal epitope prior to Bak oligomerization. In addition, Bcl-2 at the mitochondrial
outer membrane can also bind to the exposed N-terminal epitope and thus prevents Bak
oligomerization along with MOMP (Sharpe et al., 2004). In addition to this pore-formation
property, Bax may also mediate MOMP through specific interactions with the
mitochondrial PT pore complex (Sharpe et al., 2004). In support of this claim, there have
been reports of Bax-mediated VDAC opening in synthetic liposomes, of inefficient
Bax-induced cytochrome c release in yeast VDAC-1 knockout cells, and of prevention of
Bax-mediated cytochrome c release, mitochondrial swelling, and loss of mitochondrial
transmembrane potential by the PTP inhibitors cyclosporine A (CsA) and bongkrekic acid
(Sharpe et al., 2004; Narita et al., 1998).
Bid is a member of the BH3-Only group of Bcl2 proteins and has been suggested
for quite some time to promote apoptosis through Bax activation (Sharpe et al., 2004). As
previously mentioned in Section 1.1.5, many death receptor signalling pathways
converge on the activation of caspase-8. Upon activation, caspase-8 cleaves part of the
Bid amino terminus to create a truncated form of Bid referred to as tBid (other
posttranslational modifications can further activate the protein (Zha et al., 2000)). tBid
may then directly induce Bax N-terminal exposure, Bax/Bak oligomerization, and
MOMP (Sharpe et al., 2004). Although several reports do indicate a synergistic
relationship between Bid and Bax for the induction of MOMP (Kuwana et al., 2002),
along with Bid mediating appropriate conformational changes in Bax (Desagher et al., 1999)
and inducing apoptosis in a Bax/Bak-dependent manner (Letai et al., 2002), inferring a
direct in vivo relationship between Bid and Bax is difficult due to the multiple
signals which may result in Bax activation (Sharpe et al., 2004). Nevertheless, current
models (described below) for the involvement of BH3-Only proteins do involve direct
interaction between Bid and Bax/Bak as well as Bid and the antiapoptotic Bcl2.
Current Models: Involvement of BH3-Only proteins
Two distinct subsets of BH3-Only proteins have been described which differ in
their role in the induction of apoptosis (Letai et al., 2002). One group, the Bid-like
proteins, is distinguished by its ability to induce oligomerization of Bak/Bax and release
of cytochrome c from the mitochondria (Letai et al., 2002). This group consists of
BH3-Only proteins such as Bid (as the name Bid-like implies) and Bim. When isolated,
peptides containing the BH3 domain of these BH3-Only proteins can induce the release
of cytochrome c from the mitochondria in a Bak/Bax-dependent manner, while peptides
derived from the BH3 domains of the BH3-Only proteins Bad, Bik and Noxa could not,
even at 10x the initial concentration. This difference in activity was not correlated with
the ability of these peptides to form alpha-helices as they do in their native proteins
(Letai et al., 2002). Similarly, the BH3 peptides derived from the antiapoptotic protein
Bcl2 initiated no such response. This provided the first distinction between groupings of
BH3-Only proteins.
The observation that Bcl2 could inhibit the Bid-mediated release of cytochrome c
and oligomerization of Bak/Bax suggested that its function was to disable the
pro-apoptotic activity of Bid-like BH3-Only proteins, working upstream of the involvement
of Bid-like proteins with Bak or Bax (Letai et al., 2002). Curiously, Bad was found to
restore, in the presence of Bcl2, the ability of BH3 peptides from Bid-like proteins to
induce cytochrome c release and Bax/Bak oligomerization. The reasoning became that these
Bad-like proteins may function to interfere with the antiapoptotic protection of Bcl2 by
occupying the hydrophobic pocket of Bcl2 and freeing bound Bid-like proteins. In fact, Bad
demonstrated a 5-fold increase in affinity for the Bcl2 binding site with respect to Bid
(Letai et al., 2002). Bik also showed the ability to overcome Bcl2 protection, although
with lesser potency than Bad, and induce apoptosis in Bcl2-overexpressing mitochondria
that were treated with activated Bid (Letai et al., 2002). Fluorescence polarization
analysis then further confirmed displacement of the Bid BH3 peptide from the Bcl2
binding site by the Bad BH3 peptide (Letai et al., 2002). Therefore, Bad and Bid seem to
have a synergistic relationship in the induction of apoptosis such that, in the presence of
Bad, an initial trickle of activated Bid which results from caspase-8 cleavage and
posttranslational modifications would suffice to initiate Bax/Bak oligomerization and
release of cytochrome c along with other proapoptotic factors from the mitochondria and
promote apoptosome formation.
[Figure: diagram with the labels Cell Death Signal; Bid-like BH3 "Activator" (Bid, Bim);
Bad-like BH3 "Enabler" (Bad, Bik); Multi-BH domain "Effector" (Bax, Bak); Multi-BH domain
"Regulator" (Bcl-2, Bcl-xL); Apoptosis.]
Fig 1.11: Involvement of Bid- and Bad-like proteins in the mitochondrial apoptotic pathway.
(Illustration from Chittenden T., 2002).
Recent reports have suggested that a second, BH3-independent pathway exists for
Bid-mediated cytochrome c release (Scorrano et al., 2003). The cytochrome c release in
response to Bid-mediated Bax activation is incomplete, with only about 15% of
cytochrome c released. However, analysis of cytochrome c release during apoptosis
indicates that the extent of release is remarkably complete (Scorrano et al., 2003). When
the ultrastructure of the mitochondrion is taken into account, about 85% of the
cytochrome c stores, along with the majority of the oxidative phosphorylation complexes,
are located in the cristae while only 15% is localized in the intermembrane space. This
distribution accounts for the limited Bid-induced cytochrome c release but does not explain
how the remainder is discharged during apoptosis since, in most cases, swelling of the
mitochondria does not occur. Evidence indicates that a BH3- and Bax-independent pathway of
mitochondrial remodelling occurs in response to tBid which allows the mobilization of
cytochrome c stores for release across the outer membrane. The triggering of mitochondrial
inner membrane remodelling is also associated with a transient permeability transition (PT)
and is blocked by the PT inhibitor CsA (Scorrano et al., 2003).
1.1.7 Summary
Apoptosis is an intrinsic mechanism for the regulated destruction of a cell,
characterized by chromatin condensation, fragmentation of the genomic content, and
dynamic blebbing of membrane-enclosed cellular debris. The apoptotic pathway evolved
from the simpler form found in the nematode C. elegans to a complex modular signalling
pathway whose regulation and execution are dependent on a number of specialized regions of
protein sequences called domains. These domains are found at every level of the
apoptotic pathway, from its signalling through the death receptors to its execution by the
mitochondrial and/or caspase-dependent pathways (see Sections 1.1.3 and 1.1.6). Due to
the nature of the project, the Death Domains reviewed in Section 1.1.4 and the Bcl2
Homology Domains (Section 1.1.6) will be of particular interest for an understanding
of the results discussed in the following chapters of this thesis.
1.2 Bioinformatic approaches to sequence homology identification
The genome sequencing projects have provided in recent years an increasing
amount of biological sequence information. For the vast majority of these sequences, the
deciphering and annotation of protein function has remained an important problem. The
field of bioinformatics has provided some solutions to the problem through a variety of
homology searching protocols (the detection of functional domains within unannotated
proteins through the use of annotated ones), each with its shortcomings and benefits. An
important segment of the research described within this thesis was aimed at the development
of such a utility. Hence, we first briefly review the methodologies developed
in the field before introducing the one produced as part of this research project (see
Chapter 2, Section 2; Chapter 4).
1.2.1 Pairwise Homology Screening
The first homology searching tools, such as the BLAST (Altschul et al., 1990) and
FASTA (Pearson et al., 1988) families of programs, are essentially pairwise comparison
tools which have helped tremendously in the discovery of novel apoptotic proteins
(Gururajan et al., 1999). Simply put, by comparing the similarities between two sequences,
whether they are of a proteomic or genomic nature, a likelihood score of homology
between the sequences can be computed. The procedure followed by the majority
involves the generation of an alignment of the two sequences (or of regions of strong
similarity between the two sequences). The manner in which insertions and deletions
are scored and added in the alignment accounts for some of the differences between the
approaches. Additionally, most of these methods employ a substitution matrix, such as
the PAM (Percent Accepted Mutation; Dayhoff et al., 1978) or BLOSUM (Henikoff and
Henikoff, 1992) matrices, to evaluate the probability that point mutations could be
responsible for the differences between the aligned sequences. The original solution to
the problem came from dynamic programming algorithms, which are used to perform a
thorough search and obtain a global (Needleman and Wunsch, 1970) or local¹
(Waterman, 1984) optimal alignment (the optimality is defined by the scoring scheme
used in each of the cases). However, the computational demands of these algorithms,
especially of the global optimization approach, make them inefficient for use with the
large databases encountered today. The development of heuristics which approximated
the dynamic programming approaches revolutionized the field by allowing large
databases to be screened within seconds. For example, the BLAST program works under
the assumption that regions of homology in protein families are strongly conserved
throughout evolution and are unlikely to require insertions or deletions when aligned.
Therefore, by searching for the highest-scoring regions of identical-length homologies
between sequences, the BLAST (Basic Local Alignment Search Tool; Altschul et al., 1990)
heuristic can rapidly calculate a value describing the expected occurrence of such a local
alignment score in random sequences. The original BLAST algorithm did not permit the
insertion of gaps in the aligned homology region and was therefore likely to fail in the
detection of homology segments with a significant amount of insertion and deletion.
Current versions of BLAST have partially circumvented this problem by forming a
gapped local alignment from small, ungapped regions of strong homology (Altschul et
al., 1997). The search for ungapped matches to a query sequence implemented by
BLAST inevitably accelerates the local alignment procedure and, in either case, results in
the fastest applications available today. Nevertheless, pairwise alignment methods
invariably have their drawbacks, since a significant percentage of similarities are
routinely missed by these search methods (Gururajan et al., 1999). First, sequence homology
is much more likely to be detected for close relatives of the sequence, which allows for
minimal discrimination of more distantly related members and/or of domains with weak
pairwise sequence homology. In addition, such methods generally assume an equal importance
for all amino acid positions, an assumption which truly limits their power since a growing
amount of position-specific information is usually available for protein families. All of
these drawbacks have prompted increasingly sophisticated bioinformatic solutions to the
problem.
¹ A web-accessible implementation of the Smith-Waterman algorithm is available through SSEARCH
(Pearson W.R., 1996).
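To make the dynamic programming idea concrete, the following Python sketch computes the score of the best local alignment between two sequences in the Smith-Waterman style. It is only an illustration under simplifying assumptions: the function name is hypothetical, and the match, mismatch and linear gap scores are arbitrary stand-ins for a PAM or BLOSUM matrix and a realistic gap model.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between strings a and b.

    Toy scoring (assumed here): +2 match, -1 mismatch, -2 per gap position.
    A real search would use a substitution matrix such as BLOSUM62.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # gap inserted in b
            left = H[i][j - 1] + gap    # gap inserted in a
            # The floor at 0 is what makes the alignment local rather than global
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best
```

The table has one cell per pair of positions, so the cost grows with the product of the two sequence lengths, which is the computational burden that heuristics such as BLAST avoid.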
1.2.2 Automated Profile Screening
Multiple alignments offer an interesting approach to homology searching, since
they give a better perspective on the relative importance of position-specific amino acid
residues. The PSI-BLAST approach extends the BLAST algorithm to
integrate this idea through an iterative profiling of statistically significant alignments to
the query sequence (Altschul et al., 1997). The results yielded improvements over its
predecessors and brought about the identification of domains in proteins which could not
previously be detected using the standard search protocol (Gururajan et al., 1999). While
the idea is of significant value, the results were not always as good as one might desire.
These approaches had been optimized for convenience and speed. The problem with
PSI-BLAST is one that is inherent to any iterative profiling approach. Through the automated
addition of statistically significant matches to a query sequence, it generates a multiple
alignment at every iteration and conveniently bypasses the requirement for user
intervention. During any iteration, a statistically significant but random match could be
incorporated into the profile. The mistaken similarity will then be used as part of the
profile for the next iteration of the 'training' process, thereby leading to an
amplification of false positives in the search itself. Hence, the addition of biologically
irrelevant data at any iteration of the profile training process decreases the overall
sensitivity of the method.
1.2.3 Profile Screening: Hidden Markov Models (HMMs)
In contrast to the automated screening procedures, the use of alternative, more
sophisticated methods such as generalized profiling techniques (Bucher et al., 1996) and
Hidden Markov Models (Eddy et al., 1996; Eddy et al., 1998) provides a more
informative, flexible and successful approach to representing any given set of
biologically related protein sequences. Due to the relationship of these approaches to the
sections of the project described in Chapters 2 and 4, the use of HMMs in database
screening strategies will be described in more detail than the pairwise or
automated approaches.
In 1994, the work produced by David Haussler and Anders Krogh at the University
of California brought HMMs to the attention of the field as a tool for modeling
biological sequences (Haussler et al., 1994; Eddy et al., 1996). HMMs are a general
statistical framework for modeling linear data sets and have been in extensive use in the
field of speech recognition. They provide a means of applying a fully probabilistic
method to the problem of sequence alignments.
A Hidden Markov Model is a probabilistic model for generating sequences. It is
formed by a set of states and transitions between these states. Each state is said to 'emit' a
symbol from a given alphabet, randomly chosen from a given distribution. Each
transition between states A and B is associated with the probability of
going to state B given that we are in state A. In the specific case of sequence homology,
each of the states represents a given position in a sequence alignment of the protein
family of interest and emits a symbol corresponding to an amino acid residue (see Fig
1.12 for a small example). The associated emission probabilities are determined by the
relative frequency of any given amino acid at each position, to which a predetermined
pseudocount value may be added. State-transition probabilities are then estimated with
respect to the specifics of the implemented architecture of the HMM along with available
training data (for a complete introductory review on HMMs, please refer to Eddy SR.,
2004).
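The generative view described above can be illustrated with a short Python sketch that samples one sequence from a toy HMM. Everything in it is an assumption made for illustration: the function and variable names are hypothetical, the model has only three match states chained in a row, and the emission probabilities are plain relative frequencies taken from a five-sequence, three-column alignment (CAF, CGW, CDY, CVF, CKY) with no pseudocount added.

```python
import random

def sample_sequence(start_p, trans_p, emit_p, end_state="end", rng=None):
    """Generate one (sequence, state path) pair from a toy HMM."""
    rng = rng or random.Random()

    def draw(dist):
        # Pick one key of `dist` with probability proportional to its value
        keys, weights = zip(*dist.items())
        return rng.choices(keys, weights=weights)[0]

    state = draw(start_p)
    path, emitted = [], []
    while state != end_state:
        path.append(state)
        emitted.append(draw(emit_p[state]))  # the current state emits a symbol
        state = draw(trans_p[state])         # then a transition is taken
    return "".join(emitted), path

# Toy linear model: m1 -> m2 -> m3 -> end (hypothetical parameters)
start_p = {"m1": 1.0}
trans_p = {"m1": {"m2": 1.0}, "m2": {"m3": 1.0}, "m3": {"end": 1.0}}
emit_p = {
    "m1": {"C": 1.0},
    "m2": {"A": 0.2, "G": 0.2, "D": 0.2, "V": 0.2, "K": 0.2},
    "m3": {"F": 0.4, "W": 0.2, "Y": 0.4},
}
sequence, path = sample_sequence(start_p, trans_p, emit_p, rng=random.Random(0))
```

Only the emitted sequence would be visible in practice; the state path that produced it is the 'hidden' part of the model.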
Three problems are encountered when dealing with HMMs. First, the alignment
problem addresses the issue of knowing what optimal state sequence could have been
used to generate the observed sequence. The Viterbi algorithm (Forney, 1973), in
conjunction with the forward and backward algorithms, offers a practical dynamic
programming solution to the problem by maximizing such probabilities over the
possible sequences of states that could have generated the sequence. Secondly, the
scoring problem addresses the issue of estimating the probability that an observed
sequence was in fact generated by a given HMM. Typically the problem is solved by
measuring the likelihood that the observed sequence in question was generated by the
given HMM versus a null model HMM specifically designed for this purpose. In order
for the likelihood to be measured, a log-probability score is calculated for the state
sequence returned by the Viterbi algorithm for each of the two HMMs. The score difference
(a log-odds score) between the HMM modeling the domain of interest and the null model HMM
is then used to indicate which model best describes the observed sequence in question
(Chapter 2, Sections 2.2.1.4 and 2.2.1.5). The third problem, referred to as the training
problem, is that of inferring the optimal parameters of an HMM that could best account for
the data. Algorithms such as the Baum-Welch Expectation-Maximization method offer an
interesting solution to the problem by allowing untrained HMMs to be optimized from a
given set of starting parameters (Baum, 1972). The caveat, however, is that the optimization
can only reach a local maximum for any given set of starting parameters. Since the
local maximum may in several cases not be globally optimal, several heuristics have been
proposed to overcome this typical drawback (Eddy et al., 1998). Therefore, despite the
usefulness of the Baum-Welch algorithm, some parameters should always be estimated
before an optimization is set in place.
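The alignment and scoring problems can be sketched together in a few lines of Python. The example below is a minimal illustration, not the scoring scheme of Chapter 2: the two-state model, its parameters and the two-letter alphabet are invented, and the null model is simplified to one that emits every symbol independently from a background distribution.

```python
import math

def viterbi(observed, states, start_p, trans_p, emit_p):
    """Return (log probability, state path) of the most likely state sequence."""
    # V[t][s] = best log probability of any path ending in state s at position t
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][observed[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(observed)):
        V.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, V[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            V[t][s] = score + math.log(emit_p[s][observed[t]])
            back[t][s] = prev
    # Trace back from the best final state
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(observed) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return V[-1][last], path

def null_log_prob(observed, background):
    """Log probability under a null model emitting each symbol independently."""
    return sum(math.log(background[x]) for x in observed)

# Toy two-state model over a two-letter alphabet (all numbers are made up)
states = ("motif", "spacer")
start_p = {"motif": 0.5, "spacer": 0.5}
trans_p = {"motif": {"motif": 0.8, "spacer": 0.2},
           "spacer": {"motif": 0.2, "spacer": 0.8}}
emit_p = {"motif": {"H": 0.9, "P": 0.1},
          "spacer": {"H": 0.2, "P": 0.8}}
background = {"H": 0.5, "P": 0.5}

log_p, path = viterbi("HHHPPP", states, start_p, trans_p, emit_p)
log_odds = log_p - null_log_prob("HHHPPP", background)
```

Working in log space avoids numerical underflow on long sequences, and a positive `log_odds` would indicate that the domain model explains the sequence better than the null model does.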
An additional advantage comes from the multiple alignment generated by the
alignment of sequences to an HMM. Indeed, HMM-derived multiple alignments are
conceptually different from most multiple alignment heuristics in use today. Instead of the
computationally intractable many-to-many multiple alignment problem, the many-to-one
sequence-HMM alignment problem is much closer to the intuitive idea of what a multiple
alignment should be: an alignment of sequences to a common consensus model.
Recently, HMM-based progressive multiple alignment algorithms have been proposed
(Löytynoja et al., 2003). However, although the use of HMMs for multiple alignments
holds much promise, it has yet to surpass the standard heuristics commonly used today.
1.2.3.1 Profile Hidden Markov Models
The grouping of proteins into families depends on a common evolutionary relationship
that has through time preserved features essential to the role and function of these
proteins. Such conserved protein segments are commonly referred to as motifs or
domains. Within such a motif, different residues display variable degrees of conservation
and are subject to different selective pressures. Well-designed multiple alignments of
these protein families reveal such conservation patterns: some positions may be
conserved strictly on a particular property (hydrophobicity, size, etc.), while others may
have much more stringent evolutionary requirements. Similarly, some positions of the
alignments are much more prone to insertions and deletions than others, where the size
of segments separating residues is of some functional importance.
The advent of Profile HMMs (PHMMs) made a major contribution by
allowing for the justified, statistical use of insertions and deletions for generating an
optimal alignment of an HMM to a query sequence (Eddy et al., 1996). PHMMs are built
just as traditional HMMs, with the exception that each match state is accompanied by a
deletion and an insertion state. The match states refer to the standard emission states
discussed above and each represents a column in a multiple alignment of the training data.
The deletion state is an empty state in that it emits nothing and skips over any given match
state. On the other hand, an insertion state, which contains a state-transition to itself,
exists between every pair of match states and allows for symbols to be inserted using
some background probability. The handling of such insertions and deletions in standard
alignment protocols has been a limiting factor in the efficiency and sensitivity of
such methods in detecting distantly related homologs and is thus a driving force behind
the extensive use of HMMs in the field. Profile HMMs essentially turn a multiple
sequence alignment into a position-specific scoring system that can be used relatively
easily for database screening of distantly related homologous sequences (Eddy et al.,
1998).
123
CAF
CGW
CDY
CVF
CKY
Fig 1.12: A sample profile HMM representing a multiple alignment of five sequences with three
columns (left). Each of the columns in the alignment corresponds to a single match state in the profile
HMM (squares labeled as m1, m2 and m3). Each match state contains an emission distribution over the 20
amino acids in the proteome that is representative of the amino acid distribution in the alignment column.
Insertion states and delete states are labeled as diamonds i0, i1, i2 and i3 and circles d1, d2 and d3
respectively. The circles labeled b and e represent the non-emitting begin and end states respectively.
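As a rough illustration of the profile HMM layout, the Python sketch below enumerates the states and permitted transitions for a profile of n match states. The function name is hypothetical, and the transition set is one common textbook choice in which match, insert and delete states at position k may each lead to the next match state, the insert state at k, or the next delete state; actual implementations differ in which of these they allow.

```python
def profile_hmm_topology(n):
    """List the states and allowed transitions of a length-n profile HMM.

    Follows a begin/match/insert/delete/end layout; which transitions are
    permitted varies between implementations (an assumption of this sketch).
    """
    # Treat the begin state as m0 and the end state as m(n+1)
    match = ["b"] + [f"m{k}" for k in range(1, n + 1)] + ["e"]
    transitions = []
    for k in range(n + 1):
        sources = [match[k], f"i{k}"] + ([f"d{k}"] if k >= 1 else [])
        targets = [match[k + 1], f"i{k}"] + ([f"d{k + 1}"] if k < n else [])
        transitions += [(s, t) for s in sources for t in targets]
    states = match + [f"i{k}" for k in range(n + 1)] + [f"d{k}" for k in range(1, n + 1)]
    return states, transitions

states, transitions = profile_hmm_topology(3)
```

For n = 3 this yields the begin and end states, three match states, four insertion states (i0 to i3, each with a self-transition) and three deletion states.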
The profile HMM approach to database screening has been extensively
employed as part of a successful effort to create an HMM database. The
database, known as Pfam (Bateman et al., 2004; Sonnhammer et al., 1998), currently
contains two sets of alignments for each entered protein domain along with the profile
HMM. The first is a seed alignment maintaining a manually verified representative set of
domain sequences (Sonnhammer et al., 1998). A program known as HMMER (Eddy et al.,
1998) is then used to generate a profile HMM from the seed alignment (Sonnhammer et
al., 1998). The second alignment, called the full alignment, is automatically generated by
searching Swiss-Prot, a curated protein sequence database (Bairoch et al., 2004), and
aligning matches to the HMM profile (Sonnhammer et al., 1998).
1.2.3.2 Motif-based Hidden Markov Models
The motivation behind the use of a Motif-based HMM approach to protein
annotation arose from the requirements of modeling protein families. For such models,
the number of required parameters demands a large training set in order for them to be
accurately estimated and representative of the protein family the model attempts to
characterize. In the field of speech recognition, in order to reduce the number of
trainable parameters, researchers have successfully reduced the size of their models and
simplified them (Woodland et al., 1994). Similarly, through the use of smaller HMMs which
focus on conserved segments, called motifs, the demand for a large training set and the
complexity of the representative model are reduced to within a reasonable range
(Grundy et al., 1997). In any given training set, a priori knowledge of the motif
structure of the family is not always available. Lawrence and Reilly initially introduced
the use of expectation maximization (EM) methods as a means of solving the motif
learning problem (Bailey et al., 1995). The concept was further developed by the
Multiple EM for Motif Elicitation (MEME) algorithm, which adapts the method for the
identification of multiple motifs in biological polymers (Bailey et al., 1995).
Meta-MEME develops motif-based models that rely on the motif results generated by the
MEME algorithm. However, previous EM approaches to the motif-learning problem had
several limitations. The first was simply the choice of a correct starting point for the
motifs within the sequences. The second arose from the one-occurrence-per-sequence
model, which assumed that each of the sequences in the training set contains one
occurrence of a motif. The consequence was that multiple appearances of a motif would
under-contribute to the model whereas sequences with no occurrence of a motif would
over-contribute to the model. Meta-MEME also overcame many of these limitations by:
1) building an n-occurrences-per-sequence model and 2) systematically choosing starting
points based on all the subsequences of the dataset. Through this, Meta-MEME can
characterize, within reasonable probabilistic standards, any given set of biologically
related sequences.
Fig 1.13: The Meta-MEME architecture: A motif-based hidden Markov model architecture. The numbered
squares indicate match states while the diamond shapes indicate insertion states. The circles represent
special, non-emitting begin and end states. (Eddy SR., 1998)
Several advantages can be gathered from the Meta-MEME technique. The use
of an n-occurrences-per-sequence model, the choice of a correct start site for each motif
within each of the sequences, and the use of a motif-based HMM for the characterization of
protein families are all factors which allow for an increased sensitivity in database
searches. However, the use of EM methods is still susceptible, albeit to a lesser degree, to
inconveniences such as local optima, which makes it always preferable to
incorporate as much of the known biological information within the models as possible.
In addition, Meta-MEME is specific to only a select class of motifs known as contiguous
motifs, where insertions and deletions are not allowed. For complex motif structures such
as the death-fold domains, where the sequence conservation is minimal and multiple
alignments reduce to short segments of identifiable sequences, the use of such models
would discard much of the information contained within the spacing regions between the
conserved segments. This would not properly model the domain, since all of these
structures rely on a specific tertiary structure, the specifics of which are contained within
the sequences of the spacing regions. Furthermore, the evolution of cellular pathways is
such that the cassette-like mix-and-match approach of coupling pathways by joining
motifs may allow for novel matches to be uncovered and should at least be taken under
consideration. The models generated by a learning approach may not account for nature's
flexibility in combining the involved motifs.
1.3 Thesis Objectives
The initial assignment of this project involved the discovery of novel members of
the Bcl2 family of apoptotic proteins. This goal stemmed from the
research conducted in Professor Shore's laboratory on various aspects of apoptotic
regulation, including the Bcl2 family (Ruffolo and Shore, 2003) and the resident ER
protein BAP31 (Wang et al., 2004). Our search strategies were specifically directed
towards the discovery of novel BH3-Only proteins.
In an effort to generalize our search towards an entire spectrum of apoptotic
proteins and concurrently create an improved strategy for the discovery of BH3-Only
proteins, a more flexible approach to database screening was devised. The resulting search
utility incorporates many elements which have been successful in the field so far.
Additionally, the utility performs better at identifying true positives than its predecessors
due to novelties in the modeling and scoring processes (Chapter 2, Section 2.2; Chapter
4).

Chapter 2
Computational Approaches for the Discovery of Novel Apoptotic Proteins

2.1 Position-Weight Matrices
One of the uses of matrices in bioinformatics is as a tool for assembling statistical
information on specific protein or DNA sequences. From that perspective, domain specificities can
be neatly organized into a matrix that describes the probability of finding any given
amino acid at any given position in the protein sequences of interest. Initially, this
project revolved around the discovery of novel BH3-only proteins. Position weight matrices
(Stormo et al., 1989) proved to be ideal for this purpose because they allow gapless domains to be
modelled accurately. Using the sequences of known BH3 domains, a PWM can then be built
to describe the probability that any protein segment of length equal to that of the domain of interest
could be a new functional instance of the domain. This provides a practical means of
evaluating all protein segments for their potential to function as a BH3 domain.
2.1.1 The Method
A PWM is a representation of the sequence of a protein domain that captures the frequency
of each amino acid at each position of the domain. Such a matrix is built from a set of known
domain instances and can be used to scan a protein for new candidate domains. Therefore, using
the data for any ungapped domain of size n, a 20 by n matrix can be generated that gives the
probability of each amino acid at each of the n positions observed in the data.
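
The construction described above can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the function names (build_pwm, segment_probability, scan), the pseudocount used to avoid zero probabilities, and the toy input sequences are all assumptions introduced here for illustration, and the sketch uses plain per-position probabilities without any background correction.

```python
from collections import Counter

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pwm(instances, pseudocount=1.0):
    """Build a 20 by n probability matrix from ungapped domain instances.

    Returns a list of n dicts (one per position) mapping each amino acid
    to its estimated probability. The pseudocount is an assumption of
    this sketch, added so unseen residues do not get probability zero.
    """
    n = len(instances[0])
    if any(len(seq) != n for seq in instances):
        raise ValueError("all domain instances must have the same length")
    total = len(instances) + pseudocount * len(AMINO_ACIDS)
    pwm = []
    for pos in range(n):
        counts = Counter(seq[pos] for seq in instances)
        pwm.append({aa: (counts[aa] + pseudocount) / total
                    for aa in AMINO_ACIDS})
    return pwm


def segment_probability(pwm, segment):
    """Product of per-position probabilities for one n-long segment."""
    p = 1.0
    for column, aa in zip(pwm, segment):
        # Non-standard residues (e.g. 'X') are given probability 0 here.
        p *= column.get(aa, 0.0)
    return p


def scan(pwm, protein):
    """Evaluate every n-long window of a protein; return (start, prob)."""
    n = len(pwm)
    return [(i, segment_probability(pwm, protein[i:i + n]))
            for i in range(len(protein) - n + 1)]


if __name__ == "__main__":
    # Made-up 5-residue "domain instances", NOT real BH3 sequences.
    toy_instances = ["LRRIG", "LKRIG", "LRQIG"]
    pwm = build_pwm(toy_instances)
    for start, prob in scan(pwm, "MALRRIGDE"):
        print(start, prob)
```

In practice, multiplying many small probabilities underflows for long domains, so summing logarithms of the matrix entries is a common alternative to the raw product shown here.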
Given a domain d composed of n amino acids, a matrix M of size 20 by n is built which
describes the probability that any n-long polypeptide is an occurrence of the domain d. Let f_B(j)
be the background frequency (the frequency of the amino acid that is observed in a proteome