Predicting peptide interactions using protein building blocks

Faculty of Science and Bio-engineering Sciences
Department of Bio-engineering Sciences
Predicting peptide interactions
using protein building blocks
Thesis submitted in partial fulfilment of the requirements for the degree of
Doctor in Bio-engineering Sciences
Peter Vanhee
Promotor: Prof. Dr. Frederic Rousseau
Co-promoter: Prof. Dr. Joost Schymkowitz
March 4th, 2011

Published by the VIB Switch Laboratory
SWIT, Department of Bio-engineering Sciences
Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussel
Apart from any fair dealing for the purpose of research, private study, criticism or review, this
publication may not be reproduced, stored in a retrieval system, or transmitted in any form, by any
means, electronic, mechanical, photocopying, recording, scanning, or otherwise, without the prior
permission in writing of the publisher.
Peter Vanhee was funded by a PhD grant from the Institute for the Promotion of Innovation
through Science and Technology in Flanders (IWT-Vlaanderen), Belgium.
Predicting peptide interactions using protein building blocks. Peter Vanhee. PhD
dissertation, Vrije Universiteit Brussel, Brussels, Belgium, March 2011.
Cover: design by Antonio De Marco and Peter Vanhee.
© Vrije Universiteit Brussel, all rights reserved.

Summary
Proteins are by far the most versatile and complex molecules in the cell. It is
commonly accepted that protein function directly relates to three-dimensional
structure, which in turn depends on the specific amino acid sequence of the
protein. Peptides are short sequences of amino acids that perform a myriad of
functions and are estimated to be involved in up to 40% of all protein-protein
interactions. However, the lack of structural evidence for many of these peptide
interactions has hindered the functional annotation of this important class of
molecules and the development of peptides as therapeutics. In this thesis, we
propose the use of small, recurrent polypeptide fragments as one way of compensating
for the lack of protein-peptide structures. We show that protein-peptide binding sites
can be modeled at high resolution using fragment interactions, and we provide two
methods for the de novo prediction of protein loop and peptide structure. The
developments presented in this work provide a valuable alternative to experimental
high-resolution structure elucidation of target protein-peptide complexes, bringing
closer the possibility of in silico designed peptides for therapeutic applications.

Samenvatting (Dutch summary)
Proteins are by far the most powerful and complex biological molecules in the
cell. They are essential and present in all forms of life, from viruses and
bacteria to plants and animals. It is generally accepted that the function of
a protein depends on its three-dimensional structure, which in turn relates
directly to the sequence of amino acids from which the protein is built.
Proteins interact with all kinds of molecules, such as other proteins, DNA,
RNA or peptides.
Peptides are molecules that consist of short sequences of amino acids. It is
estimated that protein-peptide interactions play a role in more than 40% of
all protein-protein interactions inside and outside the cell. However, the lack
of data on the three-dimensional structure of these peptide interactions means
that the function of many of them is still unknown; the lack of structural
data is also an obstacle to the development of this class of molecules as new
and powerful drugs.
In this thesis we propose the use of protein fragments to predict the complex
structure of peptide interactions and thereby circumvent the lack of
high-resolution structures. To this end we use BriX, a database of more than
7 million protein fragments of 4 to 14 amino acids each, in which approximately
2000 canonical protein fragments can be identified. We show that the binding
surfaces between proteins and peptides strongly resemble the interactions
between protein fragments extracted from different, unrelated protein
structures. This insight allows us to use the enormous amount of structural
data contained in these protein fragments to predict the interaction between
protein and peptide. We show, for example, that we can accurately predict the
structure of peptides that bind to modular domains, such as PDZ domains or the
LBD domain of the estrogen receptor.
The developments presented in this work offer an alternative to the
experimental determination of high-resolution structures of protein-peptide
interactions, and bring us one step closer to the design of peptides for
therapeutic purposes.

Preface
This thesis deals with the topic of protein and peptide structure prediction and
design. Parts of this research have been published or are currently in the process
of publication. It is the objective of this thesis to unite and present all the individual
findings obtained and described in each manuscript. I have, however, taken the
liberty to edit each of these manuscripts to fit the flow of the thesis. Each chapter,
with the exception of the first and the last, contains an introduction, a results section, a
materials and methods section, and a conclusion.
Chapter 1 introduces proteins and peptides and the role of structure. We have
made an attempt to bring an objective overview of the field of protein modeling and
design that is relevant to many of the concepts and applications presented here.
We also introduce the field of peptide prediction and design and its relevance for
therapeutics (Vanhee et al., 2011).
Chapter 2 introduces the ‘protein fragment paradigm’ that is key to this work.
Two databases of protein fragments – BriX and Loop BriX (http://brix.crg.es)
– are described that provide a vast resource for fragment-based protein structure
prediction and design (Vanhee et al., 2011).
Chapter 3 describes LoopX (http://loopx.crg.es), a method for de novo prediction
of protein loops, the most variable parts of the protein structure and
notoriously difficult to predict. We describe how LoopX outperforms state-of-the-art
methods, combining a 100-fold speed increase with excellent prediction accuracy
and coverage for loops up to 12 residues. Moreover, we demonstrate that LoopX
can model conformational ensembles adopted by protein loops.
Chapter 4 provides a comprehensive overview of the structural landscape of
protein-peptide interactions, conveniently stored in the PepX database
(http://pepx.switchlab.org). Protein-peptide complexes are classified based on the
architecture of their binding sites and annotated with both structural and biological
information (Vanhee et al., 2010).
Chapter 5 provides a key insight into the structure of protein-peptide
interactions, relating the architecture of monomeric proteins to the architecture of
protein-peptide complexes. Our analysis, building on both the BriX and PepX
databases, suggests that the wealth of structural data on monomeric proteins can
be harvested to model peptide interactions (Vanhee et al., 2009).
Chapter 6 puts many of the developed insights into practice by describing a peptide
structure prediction algorithm that is able to model peptide interactions without
prior knowledge of the complex structure. We provide two in-depth case studies
on the PDZ domain and the ligand-binding domain of estrogen receptor α,
demonstrating the potential for structure prediction of peptide motifs.
Finally, Chapter 7 provides a discussion on the general topic of this thesis.

Publications
Protein-Peptide Interactions Adopt the Same Structural Motifs as Monomeric Protein
Folds. Peter Vanhee, Francois Stricher, Lies Baeten, Erik Verschueren, Tom Lenaerts,
Luis Serrano, Frederic Rousseau and Joost Schymkowitz. Structure, August 2009.
PepX: a structural database of non-redundant protein-peptide complexes. Peter
Vanhee, Joke Reumers, Francois Stricher, Lies Baeten, Luis Serrano, Joost Schymkowitz,
Frederic Rousseau. Nucleic Acids Research, January 2010.
Modeling protein-peptide interactions using protein fragments: fitting the pieces?
Peter Vanhee, Francois Stricher, Lies Baeten, Erik Verschueren, Luis Serrano, Frederic
Rousseau and Joost Schymkowitz. BMC Bioinformatics, December 2010.
BriX: a database of protein building blocks for structural analysis, modeling and
design. Peter Vanhee*, Erik Verschueren*, Lies Baeten, Francois Stricher, Luis Serrano,
Frederic Rousseau and Joost Schymkowitz. Nucleic Acids Research, January 2011.
Computational design of peptide ligands. Peter Vanhee, Almer van der Sloot, Erik
Verschueren, Luis Serrano, Frederic Rousseau and Joost Schymkowitz. Trends in
Biotechnology, May 2011.

Acknowledgements
This thesis is the result of the hard work of many, many different people I have
worked together with during the course of my PhD. Here, I would like to express
my gratitude towards them.
First of all, I would like to thank my supervisors, Joost Schymkowitz and Frederic
Rousseau, who have been tremendously helpful during the course of this work.
They introduced me to the complex maze the field of biology really is, encouraged
me to do research at the forefront of science, and supervised this project from
start to end. They have motivated me, at the SWITCH lab in which I started this
PhD, to develop a deep interest in molecular biology. I also would like to thank
Tom Lenaerts, my master thesis supervisor, who encouraged me to start a PhD and
introduced me to the SWITCH lab. He has always been a source of help and
advice.
I am very grateful to Luis Serrano, who opened the doors of his lab at the
Centre for Genomic Regulation (CRG) in Barcelona. Despite his hectic agenda, he has
been instrumental in all parts of this work, continuously throwing in new ideas
and providing me with the opportunity to work in one of the leading institutes in
biomedical sciences in Europe.
I have been very fortunate with the people with whom I have been working
side by side in this project. Lies Baeten, who graduated in computer science like
me, initiated the BriX project during her PhD. Sharing the same background,
she has contributed to many of the ideas and tools we developed together during
this work. One year later, Erik Verschueren joined the SWITCH laboratory and
continued his work in the CRG in Barcelona. Since we met each other again in the
CRG, we have been working together on a daily basis, sharing many moments of
frustration and euphoria. Without both Lies and Erik, I believe this work would not
have been the same.
Many more people have been important to this project. For example, Almer
van der Sloot, whom I met in the lab of Luis Serrano, has often shared his broad
knowledge in cellular biology with me; I was very happy to write with him a
review on computational peptide design, that shaped the introductory chapter of
this thesis. François Stricher often helped me understand the nitty-gritty of
protein structure and stability, and his contributions to the FoldX force field have
been essential to this work. Joke Reumers, a former member of the SWITCH lab,
motivated me to work on the database of protein-peptide complexes, and together
we published a paper which has left me hungry for more publications. I also
really enjoyed working together with Joost Van Durme; we pushed the project of
protein loop prediction, which was originally started by Lies Baeten, to the next
level. Programming with Javier Delgado, originally a SWITCH member and now
post-doc at the lab of Luis Serrano, on the FoldX suite has been very pleasant as
well.
I wish to thank all the members in both the SWITCH group and the group of
Luis Serrano at the CRG. Besides being great colleagues, many of you have also
become good friends. I’d like to thank Ivo, a former member of the SWITCH lab,
for giving critical advice and support. Outside the context of this PhD, I have
often worked together with Antonio, Christof and Andrea. I believe what we did
together has helped me make this project successful, and I hope we will be
working together again in the future.
The financial support for performing this study was given by IWT, FWO and
EMBO. It goes without saying that without their funds, this thesis would not have
been possible.
Finally, I would like to thank my family and my friends, and in particular my
parents for giving their unconditional support. I also wish to thank everyone
who welcomed me with open arms in Barcelona and with whom I spent many
unforgettable moments. And thanks to you Camilla, for sharing both the difficult
and the great moments I went through while working on this thesis. Nothing here
would have been possible without all of you.

Contents
1 Introduction
1.1 Proteins and peptides
1.1.1 Protein function
1.1.2 Protein building blocks
1.1.3 Protein folding and stability
1.1.4 Biological function of protein-peptide interactions
1.1.5 Peptides as therapeutics
1.1.6 Protein Structure
1.2 Protein structure prediction and design
1.2.1 Comparative modeling
1.2.2 Ab initio structure prediction
1.2.3 Predicting protein dynamics
1.2.4 Computational protein design
1.2.5 Protein docking
1.3 Computational design of peptide ligands
1.3.1 A better understanding of protein-peptide interactions
1.3.2 Peptide design based on sequence motifs
1.3.3 Protein complexes as a source of active peptides
1.3.4 Protein docking and fragment based docking as tools for peptide design
1.3.5 Peptide design using protein-peptide complexes
1.3.6 Remedying the lack of structural information
References
2 Fragmenting protein space
2.1 Introduction
2.2 Contents of the BriX database
2.2.1 Update of the BriX database
2.2.2 BriX Statistics
2.2.3 Creation of the Loop BriX database
2.2.4 Loop BriX Statistics
2.2.5 Applications of the BriX database
2.3 Database access
2.3.1 Database availability
2.3.2 User Interface
2.3.3 Covering or bridging of protein structures
2.4 Discussion
References
3 Predicting loop structure
3.1 Introduction
3.2 Results
3.2.1 Comparison with the state-of-the-art loop reconstruction algorithm
3.2.2 Loop homology is no prerequisite for loop reconstruction accuracy
3.2.3 Loop ensemble prediction
3.2.4 Comparison with MODELLER, RAPPER, PLOP and FREAD
3.3 Materials and Methods
3.3.1 LoopX Algorithm
3.3.2 Reconstruction accuracy
3.3.3 Benchmark datasets
3.3.4 LoopX Webserver
3.4 Discussion
References
4 The structural landscape of protein-peptide interactions
4.1 Introduction
4.2 Contents of the PepX database
4.2.1 Construction of a non-redundant data set of protein-peptide complexes
4.2.2 Statistics on structural protein-peptide complexes
4.2.3 Ligand annotation with structural variants for peptide design
4.3 Database Access
4.3.1 Database Availability
4.3.2 User interface
4.4 Discussion
References
5 Protein-peptide interactions resemble monomeric protein interactions
5.1 Introduction
5.2 Results
5.2.1 InteraX: a database of interacting protein fragments
5.2.2 Reconstruction of protein-peptide interactions from interacting fragment pairs derived from monomeric proteins
5.2.3 Reconstruction of peptide binding motifs by using multiple fragment pairs observed in monomeric proteins
5.2.4 Statistical analysis of the factors that determine reconstruction accuracy
5.3.1 Construction of a non-redundant data set of protein-peptide complexes
5.3.2 The dataset of protein fragments
5.3.3 InteraX database
5.3.4 Covering algorithm
5.3.5 FoldX force field
5.3.6 Statistical analysis
5.4 Discussion
References
6 Predicting peptide structure and specificity
6.1 Introduction
6.2 Results
6.2.1 Peptide docking using interaction patterns from InteraX
6.2.2 De novo peptide structure prediction using interaction patterns from InteraX
6.2.3 Case study: PDZ peptide design and specificity
6.2.4 Case study: helical peptide design for the estrogen receptor ligand-binding domain
6.3.1 A constraints-based framework for peptide design
6.3.2 Local backbone moves using BriX
6.3.3 Binding site prediction
6.4 Discussion
References
7 Discussion
References
List of Figures
List of Tables

1 Introduction
Parts of this chapter are based on:
Computational design of peptide ligands. Peter Vanhee, Almer van der Sloot, Erik Verschueren,
Luis Serrano, Frederic Rousseau and Joost Schymkowitz. Trends in Biotechnology, May 2011.
Proteins are the most versatile and complex molecules in the cell, giving rise
to most of life’s extraordinary shapes and processes. Peptides – short
sequences of ∼4-40 amino acids – are key components of protein-protein interaction
networks, regulating many important cellular processes. It is commonly accepted
that protein function directly relates to the three-dimensional structure of these
molecules, yet high-resolution structures are often lacking. For therapeutic usage,
peptides possess several attractive features when compared to small molecule and
protein drugs: they show a high structural compatibility with target proteins,
are able to disrupt protein-protein interfaces and have a small size. The lack of
efficient rational methods for structure prediction and design of high-affinity
peptide ligands has been a major obstacle to the development of this potential
drug class. However, structural insights into the architecture of protein-peptide
interfaces have recently culminated in a number of computational approaches for
the rational design of peptides targeting proteins. These methods provide a valuable
alternative to high-resolution structures of target protein-peptide complexes, bringing closer
the possibility of in silico designed peptides.

1. INTRODUCTION
1.1 Proteins and peptides
1.1.1 Protein function
Figure 1.1: Proteins interacting with antibodies, small molecules and peptides.
Structural models of protein interactions relevant for therapeutics. (A) The monoclonal
antibody (mAb) cetuximab inhibits the extracellular domain of the epidermal growth
factor receptor (EGFR) (PDB 1YY8). This therapeutic mAb is used in the treatment of
colorectal cancer (Citri & Yarden, 2006). (B) The small molecule gefitinib (Iressa,
AstraZeneca) occupies the ATP-binding pocket of the intracellular kinase domain
of EGFR, thus preventing phosphorylation and inhibiting tumor growth (PDB 2ITY)
(Yun et al., 2007). (C) A phosphotyrosine peptide interacting with the SH2 domain of
GRB2, which binds the intracellular domain of EGFR (PDB 1JYR) (Huang et al., 2008).
Proteins are present in all forms of life, from plants, bacteria and viruses to
animals. They are the cell’s workhorses, putting the genetic information (DNA) of
the cell into action. There are many different types of proteins, making up most
of the cell’s dry mass. Proteins are involved in almost all of the processes going
on in the body; they transport nutrients through the blood, break them down to
power muscles and send signals to the brain. Many proteins act as enzymes that
catalyze reactions to form and break covalent bonds, directing the vast majority of
all major chemical processes in the cell.
Here is a small sample of the roles proteins play:
• Enzymes facilitate biochemical reactions. For example, alcohol dehydrogenase
converts alcohol into a non-toxic form that the body can use as an energy
source, and lactase breaks down lactose, the sugar found in milk.
• Transport proteins move molecules from one place to another. For example,
hemoglobin carries oxygen through the blood and cytochromes operate in
the electron transport chain as electron carrier proteins.
• Structural proteins give structural features to the cell and provide support.
For example, keratin strengthens protective coverings such as hair, and
collagen gives structure and support to the skin and the bones.
• Hormonal proteins are messenger proteins that coordinate important
processes in the cell and facilitate cell-cell communication. For example, insulin
regulates glucose metabolism by controlling blood-sugar concentrations,
and growth hormone helps regulate growth.
• Contractile proteins provide movement to the cell. For example, actin and
myosin are responsible for muscle contraction.
• Antibodies defend the body from foreign invaders by tightly binding to
antigens such as viruses or bacteria. Antigens are bound by the Major
Histocompatibility Complex (MHC) and presented to a T-cell receptor, after
which white blood cells can be recruited to destroy the invaders. The
structural diversity of antibodies (and in particular of the loops that bind the
antigen) is immense: it has been estimated that humans can generate around
10 billion different antibodies, able to recognize virtually any foreign invader.
To perform all these functions, proteins do not act alone. Instead, they can
associate with themselves or with other proteins as dimers or as multi-subunit
complexes, creating networks of protein interactions (Figure 1.1). These interactions –
for example between proteins and other proteins, small molecules, peptides,
metals, lipids, DNA or RNA – are fundamental to understanding the relation between
genotype and phenotype at all levels, from the molecule up to the
organism itself. In Saccharomyces cerevisiae (baker’s yeast) – for which most of
the protein-protein interaction studies have been carried out – nearly every protein
is involved in an interaction (Han et al., 2004). High-throughput studies that
elucidate the complete protein-protein interaction network of an organism are now
within reach, as was shown recently for the bacterium Mycoplasma
pneumoniae (Kühner et al., 2009).
1.1.2 Protein building blocks
[Figure 1.2 diagram: overlapping sets labeled aliphatic, tiny, small, polar, charged, positive, aromatic and hydrophobic, containing the one-letter amino acid codes.]
Figure 1.2: Amino acids grouped by properties. A Venn diagram grouping amino
acids according to their properties. This is just one of the many possible classifications
of amino acids. The figure was adapted from Taylor (1986).
Proteins are made of amino acids, small molecules composed of carbon, oxygen, nitrogen,
sulfur and hydrogen. To make a protein, amino acids are connected by
peptide bonds into a chain that folds into a three-dimensional structure according to the
chemical properties of the amino acids. Each of the amino acids has a small group
of atoms (the ‘sidechain’) branching off the main chain (the ‘backbone’), which
contributes its unique properties to the nascent protein. There are 20 naturally occurring
amino acids, each of them with a slightly different chemical structure. Based on
their chemical properties, they can be organized in different categories: basic or
acidic, polar (hydrophilic or ‘water-loving’) or hydrophobic (‘greasy’), charged or
uncharged, aliphatic or aromatic (Figure 1.2).
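The overlapping categories described above can be represented as sets, which is one way to make the Venn-diagram view concrete. The sketch below is illustrative only: the names `GROUPS` and `properties` are hypothetical, and the group memberships are an assumption based on a common simplified reading of Taylor-style classifications, covering only a few of the groups. Other schemes assign some residues differently.

```python
# Simplified property groups for amino acids (one-letter codes).
# ASSUMPTION: memberships follow a common simplified reading of
# Taylor-style Venn diagrams; they are not taken from the thesis.
GROUPS = {
    "aliphatic": set("ILV"),
    "aromatic": set("FWYH"),
    "positive": set("KRH"),
    "negative": set("DE"),
    "tiny": set("AGS"),
}
# Charged residues are the union of the positive and negative groups.
GROUPS["charged"] = GROUPS["positive"] | GROUPS["negative"]


def properties(residue):
    """Return the sorted names of all groups that contain the residue."""
    return sorted(name for name, members in GROUPS.items() if residue in members)


if __name__ == "__main__":
    for aa in "HLD":
        print(aa, properties(aa))
```

Because a residue can belong to several sets at once (histidine, for example, is listed here as aromatic, positive and charged), set membership captures the overlap that a flat one-label-per-residue table would lose.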
[Figure 1.3 panels: primary structure (amino acid sequence); secondary structure (regular sub-structures: α-helix, β-sheet); tertiary structure (3-dimensional structure, p13 protein); quaternary structure (complex of protein molecules, hemoglobin).]
Figure 1.3: Four levels of protein structure.
As shown in Figure 1.3, proteins have different levels of structure. The primary
structure is the amino acid sequence. The secondary structure comprises the local
substructures (α-helices and β-strands) that are stabilized by an organized network of
hydrogen bonds. The tertiary structure is the entire protein folded into a complete
three-dimensional structure, and the quaternary structure is the arrangement of
multiple folded units that come together to form a larger complex.
1.1.3 Protein folding and stability
As Anfinsen showed, the structure of a protein is uniquely defined by its sequence
(Anfinsen, 1973). The folding of the polypeptide chain is a complex process
that turns the essentially unstructured and elongated polypeptide chain into a
compact, stable and unique protein fold, held together by mainly non-covalent
interactions (Onuchic & Wolynes, 2004). In the late sixties, Levinthal famously
pointed out that it seemed impossible that a protein could fold spontaneously
following a random process in a reasonable timeframe, suggesting the existence
of folding pathways (Levinthal, 1969). Through a combination of technologies –
most notably, protein recombinant technologies, NMR and X-Ray technologies and
computer simulations – we now commonly accept that the major force of protein
folding is the hydrophobic collapse of the polypeptide chain (Dill, 1990; Chandler,
2005).
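Levinthal's argument can be illustrated with a back-of-the-envelope calculation. The sketch below is a minimal illustration under stated assumptions, not a reproduction of Levinthal's own numbers: it assumes 3 possible conformations per residue, a chain of 100 residues and a sampling rate of 10^13 conformations per second. The function name `levinthal_search_time` is hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def levinthal_search_time(n_residues, states_per_residue=3, samples_per_second=1e13):
    """Estimate the time needed to enumerate every chain conformation.

    ASSUMPTION: each residue independently adopts one of
    `states_per_residue` conformations, and the chain samples
    `samples_per_second` conformations per second.
    Returns (number_of_conformations, years_to_sample_all).
    """
    conformations = states_per_residue ** n_residues
    seconds = conformations / samples_per_second
    return conformations, seconds / SECONDS_PER_YEAR


if __name__ == "__main__":
    conformations, years = levinthal_search_time(100)
    print(f"conformations: {conformations:.2e}")
    print(f"years for an exhaustive random search: {years:.2e}")
```

Under these assumptions the exhaustive search would take many orders of magnitude longer than the age of the universe, whereas real proteins typically fold in milliseconds to seconds. This gap is what motivates the idea of folding pathways.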
The large majority of proteins are only marginally stable in their folded state,
meaning that the energy difference between the unfolded and folded state of
proteins is relatively small. The breaking of a single hydrogen bond caused by
a single amino acid mutation might lead to the collapse of the entire protein.
Different forces contribute to the free energy of the protein, commonly expressed
as a variation of free energy (∆G) between the unfolded and folded states. The
non-covalent forces that contribute to the stability of proteins can be described as
follows:
• Van der Waals interactions are weak, attractive or repulsive interactions
that occur between atoms and molecules, whether charged, polar or non-polar.
They include the London dispersion forces, dipole-dipole interactions and
hydrogen bonding, and are often calculated using (6-12) potentials such as
the Lennard-Jones potential.
• Hydrogen bonding occurs when two electronegative atoms compete for the
same hydrogen atom. The proton donor is covalently bound to the hydrogen
atom, while the proton acceptor interacts favorably with the hydrogen atom.
Originally observed in 1936 by Pauling and Mirsky, before the first protein
structures became available, hydrogen bonds ‘hold together’ the
folded polypeptide chain, giving rise to both α-helices and β-sheets.
• Hydrophobic interactions are a phenomenon observed when non-polar
compounds collapse into aggregates when surrounded by water. Almost
half of the amino acids are hydrophobic (Figure 1.2) and tend to cluster
together to form the hydrophobic core of the protein.
• Electrostatic interactions are long-distance cohesive forces that appear
between oppositely charged atoms. Salt bridges are a special kind of hydrogen
bond that occurs between charged functional groups.
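Two of the quantities mentioned above can be made concrete with a short sketch. The first function evaluates the standard 12-6 Lennard-Jones form, V(r) = 4ε[(σ/r)^12 − (σ/r)^6]. The second converts a folding free energy into the fraction of molecules that are folded, assuming a simple two-state equilibrium. All names and parameter values are illustrative assumptions. In particular, ∆G is taken here as G(folded) − G(unfolded), so negative values favor the folded state.

```python
import math

GAS_CONSTANT = 1.987204e-3  # kcal / (mol K)


def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """12-6 Lennard-Jones energy at distance r.

    epsilon is the well depth and sigma the distance at which the
    energy is zero (both illustrative values, not fitted parameters).
    """
    ratio6 = (sigma / r) ** 6
    return 4.0 * epsilon * (ratio6 ** 2 - ratio6)


def fraction_folded(delta_g, temperature=298.15):
    """Two-state folded fraction for delta_g = G(folded) - G(unfolded).

    ASSUMPTION: only a folded and an unfolded state are populated;
    delta_g is in kcal/mol and temperature in kelvin.
    """
    rt = GAS_CONSTANT * temperature
    return 1.0 / (1.0 + math.exp(delta_g / rt))


if __name__ == "__main__":
    r_min = 2 ** (1 / 6)  # location of the energy minimum for sigma = 1
    print("LJ energy at minimum:", lennard_jones(r_min))
    for dg in (-10.0, -5.0, -1.0, 0.0):
        print(f"dG = {dg:6.1f} kcal/mol -> folded fraction {fraction_folded(dg):.4f}")
```

The second function shows why marginal stability matters: under this two-state assumption, a change of a few kcal/mol in ∆G, comparable to the contribution of a small number of non-covalent interactions, noticeably shifts the balance between the folded and unfolded populations.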
It came somewhat as a surprise to discover that an estimated 40% of all proteins
in the human proteome are intrinsically disordered and become fully or partly
structured only upon binding to their partners in the cell (Gianni et al., 2003). Short
motifs or peptides are often recognized within these unstructured proteins and
play important roles in protein regulatory networks and signaling pathways.
1.1.4 Biological function of protein-peptide interactions
Peptides and short peptide stretches in larger proteins (∼4-40 amino acids long)
perform a myriad of functions both in cell-to-cell and intracellular communication:
they are important mediators in many signalling pathways and regulatory networks
(Neduva & Russell, 2006). A great variety of endogenous regulatory peptides of
variable length act as peptide hormones and/or neurotransmitters and are involved
in inter-cellular communication (Brunton et al., 2006). These peptides show a wide
range of physiological activities and are important in maintaining homeostasis.
Examples are the potent blood pressure regulators angiotensin II (8 amino acids,
a.a.) and vasopressin (9 a.a.), the appetite regulators ghrelin (28 a.a.) and obestatin
(23 a.a.), and glucagon (29 a.a.), a regulator of glucose metabolism. They act in an
endocrine, paracrine or autocrine fashion by binding cell surface receptors, such as
G-protein coupled receptors (GPCRs). Typically, these peptides are produced by
differential processing of a precursor protein by endopeptidases to yield biologically
active peptides. Many higher organisms, from amphibians to humans, also rely
on peptides as an integral part of their host-defense mechanism against microbial
assault, and although the use of peptides in antimicrobial therapy is rather limited at
the moment, peptides are increasingly being considered as antibacterials, antivirals
and antifungals in clinical settings (Hancock & Sahl, 2006; Easton et al., 2009).
Short, often unstructured peptide stretches or ‘motifs’ of larger proteins are
also important players in intracellular signaling networks (Russell & Gibson, 2008).
These motifs are recognized by globular protein domains, such as the SH3 domain
that binds short polyproline-rich motifs, the SH2 domain that recognizes peptides
containing phosphorylated tyrosine, or the PDZ domain that binds C-terminal motifs
(Pawson & Nash, 2003). It is believed that up to 40% of all protein interactions in
the cell are either directly or indirectly influenced by peptide-mediated interactions
(Neduva & Russell, 2006; Petsalaki & Russell, 2008). Given the importance of these
protein-peptide interactions in both inter- and intracellular signalling they provide
important targets for therapeutic intervention in a range of diseases. Modulating
these interactions with peptide-like agonists and antagonists therefore constitutes
an attractive therapeutic strategy.
1.1.5 Peptides as therapeutics
The vast majority of therapeutic compounds achieve their effects by binding to and
altering the function of target protein molecules (Figure 1.4). Traditionally, the main
source of successful therapeutics has been small organic molecules, which usually
bind in small cavities of the target protein and inhibit or ‘block’ specific catalytic
centers or the binding sites of natural substrate analogues (Drews, 2000) (Figure
1.4C). The recent focus on protein-protein interaction networks has shifted the goal
of drug targeting increasingly towards disruption of protein-protein interactions, a
feat for which classical small molecules are not always ideally suited (Arkin & Wells,
2004; Wells & McClendon, 2007). The newest additions to the pharmaceutical
arsenal are protein-based therapeutics, which are generally improved recombinant
replacements of endogenous proteins or monoclonal antibodies directed against
a wide variety of targets (Walsh, 2010) (Figure 1.4B). Although the introduction of
protein therapeutics – in particular monoclonal antibodies – has been tremendously
successful, their use is mainly limited to extracellular targets, such as membrane-
bound receptors and secreted proteins, because uptake of these large molecules
into intracellular compartments remains cumbersome (Patel et al., 2007).
Peptides are generally considered ‘poor drugs’ because of cumbersome deliv-
ery, prohibitively short in vivo lifetimes and bad overall bio-availability (Antosova
et al., 2009; Audie & Boyd, 2010). However, recent technological innovations in
formulation, delivery and chemistry have sparked greater interest in peptide ther-
apeutics (Walensky et al., 2004; Timmerman et al., 2005; Tan et al., 2010). Their
chemical structure makes them by definition highly compatible with the proteins
they target and their intermediate size enables them to disrupt protein-protein
interfaces, whilst remaining sufficiently small for intracellular targeting (Patel et al.,
2007) (Figure 1.4D). Presently, more than 50 peptide-based products are approved
for clinical use in the United States and other countries (Table 1.1) (Pechon et al.,
2010), underlining the tremendous market potential of peptidic drugs. This has
spurred a great interest in technologies capable of providing new peptide sequences
with high affinity and specificity towards therapeutically relevant targets.
In the remainder of this chapter and throughout this thesis, we will discuss recent
technological advances that could lead to rationally designed peptides targeting
proteins.

Figure 1.4: Targeting the cell with different molecules. Overview of different
drug strategies targeting protein signaling pathways: (A) normal scenario of a generic
pathway, (B) therapeutic antibodies, (C) small molecules and (D) peptides.

Peptides and redefining ‘druggability’
Given the current success of recombinant protein-based therapeutics, we are al-
ready witnessing the erosion of the long-standing and relatively narrow definition
of what constitutes a ‘druggable target’ (Hopkins & Groom, 2002) (i.e. a protein
that can be modulated by an orally administered active small molecule, adhering
to the ‘rule of five’ proposed by Lipinski et al. (2001)). The definition of ‘drugga-
bility’ has widened to include targets whose activity can be modulated by larger
molecules, such as proteins and peptides.

Name | Approval date (US) | Disease | # A.A. | Origin of mimetics | Company | Global sales (US $ million)
-----|--------------------|---------|--------|--------------------|---------|----------------------------
Glatiramer, Copaxone® | 1996 | Multiple Sclerosis | 4 | Myelin protein | Teva | 3200
Leuprolide, Lupron® | 1985 | Prostate and breast cancer (mainly) | 9 | Gonadotropin Releasing Hormone (GnRH) mimetic | Abbott (amongst others) | 1900
Goserelin, Zoladex® | 1989 | Prostate and breast cancer (mainly) | 10 | Luteinising Releasing Hormone (LRH) mimetic | AstraZeneca | 1146
Octreotide, Sandostatin® | 1998 | Acromegaly, carcinoid syndrome | 8 | Somatostatin hormone mimetic | Novartis | 1123
Teriparatide, Forteo® | 2002 | Osteoporosis | 34 | Parathyroid hormone (84 residues, residues 1-34) | Eli Lilly | 779
Exenatide, Byetta® | 2005 | Diabetes Type 2 | 39 | Exendin-4 hormone (incretin mimetic) | Amylin / Eli Lilly | 750
Enfuvirtide, Fuzeon® | 2003 | AIDS (HIV-1 infection) | 36 | Viral glycoprotein (gp41) | Roche | 167

Table 1.1: Leading examples of peptide therapeutics currently on the market.
Data is extracted from the annual Peptide Report issued by the Peptide Therapeutics
Foundation (http://www.peptidetherapeutics.org/annual-report.html) (Pechon et al., 2010).

Current small-molecule drugs target only a fraction of all proteins inside and
outside the cell. Typical targets include GPCRs, enzymes, nuclear hormone receptors
and ion channels, all of which have natural small-molecule substrates
(Brunton et al., 2006). Most of these drugs target the binding pocket of the substrate
directly, but also other, allosteric cavities can be targeted. On average, the
contact surface between a small-molecule ligand and its protein receptor is between
300-1000 Å² (Smith et al., 2006). In contrast, the contact surface between
two interacting proteins is generally much flatter, larger (1200-3000 Å²) (Conte
et al., 1999; Jones & Thornton, 1996), and discontinuous in sequence. Most of
the free energy of binding is contributed by a limited number of amino acids in the
interface (‘hotspot’ residues) (Clackson & Wells, 1995). Interaction networks are
distributed and constructed with a modular architecture, showing tight coopera-
tive interactions within a module and additive interactions between the modules
(Reichmann et al., 2005). Conversely, protein-peptide interfaces display a smaller
contact surface and a more continuous architecture and often target well-outlined,
large hydrophobic pockets on a protein (London et al., 2010). These pockets are
larger than the typical clefts targeted by small molecules, but smaller than large
protein-protein interfaces.
Given the large, shallow and distributed nature and lack of pockets and cavities
in protein-protein interfaces, these interfaces are often considered to be hard
to target with small-molecule drugs. Although progress has been made – e.g.
Thorsen et al. (2010) identified a small molecule that targets PDZ domains with
micromolar affinity similar to the endogenous peptide – disrupting protein-protein
interfaces with classical small molecule compounds remains a difficult task (Wells
& McClendon, 2007). Peptide-like drugs are likely to be more suitable candidates
to act as competitive inhibitors of protein-protein interactions, considering their
similar binding mode.
Alternatively, peptide-like ligands could target protein-protein interactions in
a non-competitive manner by acting as allosteric modulators. This concept
has received a lot of attention owing to the success of small-molecule allosteric
modulators (Conn et al., 2009; Eglen & Reisine, 2010), but is also well-established
in protein-mediated interactions (Alvarado et al., 2010). One limitation to this
allosteric approach is the need to identify the ‘pressure points’ in a target protein
structure that should be ‘hit’ in order to affect its function. Several methods to map
the dynamics of the amino acid interaction network that constitutes the protein
structure have been developed, and these have, at a minimum, the potential to
reveal sites on the protein surface with allosteric modulatory power (Lee et al.,
2008; Lenaerts et al., 2008, 2009; Haliloglu & Erman, 2009; Haliloglu et al., 2010).
In short, targeting protein-protein interactions with peptide-based competitive
inhibitors or – albeit more challenging – peptide-based allosteric modulators extends
the definition of druggability by expanding the potential classes of druggable
targets.
1.1.6 Protein Structure
Structural biology gathers structural information from atoms to cells at different
levels of the biological hierarchy into a common framework. Elucidating the
structure of protein molecules is therefore key. Since the determination of the first
three-dimensional structure of myoglobin – an oxygen-carrying protein found in muscle
tissue – in 1958 (Kendrew et al., 1958), our understanding of biology has radically
changed (Figure 1.5). Experimental structure determination is often carried out
either by X-ray crystallography or by nuclear magnetic resonance (NMR).
Figure 1.5: History of experimental structure determination from the last fifty
years. The timeline spans 1958 to 2001 and marks the following milestones: Kendrew’s
first low-resolution structure of myoglobin; Perutz’s low-resolution structure of
haemoglobin; Anfinsen’s deduction from experiments that the native conformation of a
protein is determined by its amino-acid sequence; Phillips’ high-resolution structure
of hen egg-white lysozyme; the method of energy refinement introduced by Levitt and
Lifson; the recombinant DNA technology developed by Cohen, Boyer and colleagues; the
first molecular dynamics simulations published by Karplus; the DNA sequencing methods
of Sanger and of Maxam and Gilbert; the site-directed mutagenesis method of Hutchison
and Smith; Wüthrich’s first NMR structure of a protein; Fersht’s Φ-value analysis of
binding and folding (1987–1989); and the preliminary publication of the human genome
sequence. Figure adapted from (Fersht, 2008).
Many structures of globular proteins exist in public databases such as the Protein
Data Bank (PDB, Berman et al. (2000)), which held approximately 69,900 structures as of
December 2010. Even though special care is taken for the resolution of fibrous proteins
such as membrane proteins, currently only a few such structures are available in public
databases because of the complexities associated with crystallizing these aggregating
proteins.
Protein structures are not ‘static pictures’. Instead, proteins undergo dynamic
excursions from their ‘ground states’ and these fluctuations have increasingly been
associated with important biological processes such as protein folding,
enzyme catalysis and signal transduction (Eisenmesser et al., 2002; Lange et al.,
2008). These fluctuations are typically modeled using Nuclear Magnetic Resonance
(NMR) experiments, but computational techniques have also been a major aid in
the determination of protein dynamics (Duan & Kollman, 1998; Shaw et al., 2010).
Nowadays, high-throughput structural genomics initiatives strive towards ex-
perimentally solving selected sets of proteins, such that all proteins of unknown
structure have at least one neighbor in protein classification systems such as
SCOP and CATH (Chandonia & Brenner, 2006). Since their inception, structural
genomics initiatives have resolved thousands of novel protein structures, mainly using
X-ray diffraction. Yet it has been estimated that at least 16,000 carefully selected
structures would need to be solved for comparative modeling to cover
90% of all protein domain families (Chruszcz et al., 2010). Solving this ‘structural
data’ problem is key to the general understanding of proteins in the cell, and since
more than 70% of all known proteins still lack a determined structure –
let alone complexes of protein-protein, protein-DNA and protein-RNA interactions –
the computational modeling of protein structure has become a field of its own.
In the remainder of this introductory chapter, we introduce this field and focus in
particular on the prediction of peptide interactions.
1.2 Protein structure prediction and design
Following Anfinsen’s dogma, the structure of a protein is uniquely defined by
its sequence (Anfinsen, 1973). Since experimental determination of a protein’s
structure is an often expensive and time-consuming task, computational biologists
have embraced the challenge to predict secondary and tertiary structure directly
from sequence. However, the problem turns out to be far from trivial: the
structural variety the polypeptide chain can adopt is virtually unlimited. Creighton
estimated that a protein of 100 amino acids can adopt up to 10^100 alternative
conformations (Creighton, 1984), more than the estimated number of atoms in
the observable universe.
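As a rough illustration of where a number of this magnitude comes from, consider the following back-of-the-envelope estimate. It assumes, purely for illustration and not as Creighton’s own derivation, that every residue independently chooses between k = 10 backbone conformations and that a chain could try 10^13 conformations per second:

```latex
N_{\mathrm{conf}} = k^{n} = 10^{100} \quad (k = 10,\; n = 100), \qquad
t_{\mathrm{search}} = \frac{N_{\mathrm{conf}}}{10^{13}\,\mathrm{s}^{-1}} = 10^{87}\,\mathrm{s}
```

For comparison, the age of the universe is on the order of 4 × 10^17 s, so under these assumptions exhaustive enumeration is out of reach and any practical method has to sample selectively.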
Many computational methods have been developed and used to increase structural
coverage (Baker & Sali, 2001). Every design methodology essentially has two
components: a sampling component that explores the space of possible conformations
and a scoring component that ranks the sampled solutions.
The sampling problem is often simplified by making a ‘fixed-backbone’ assumption,
although recent years have seen an increasing number of methods that
introduce protein backbone flexibility, e.g. by using an ensemble of backbone
conformations, robotic-arm inspired moves or iterative backbone and sequence
optimization, amongst others (Mandell & Kortemme, 2009a). Sidechain confor-
mations are subsequently sampled on a given backbone, often using a library of
rotamers that represent the different conformations the sidechain can adopt on the
given backbone template. Complete enumeration of these conformations remains
a herculean task, such that both stochastic methods (e.g. Monte Carlo simulation,
Kuhlman & Baker (2000)) and deterministic methods (e.g. the popular dead-end
elimination technique, Desmet et al. (1992)) are employed. Finally, scoring func-
tions - either statistical or physics-based - are used to rank the final set of solutions,
mainly relying on energetic terms such as Van der Waals packing constraints, hy-
drophobic interactions, hydrogen bonding, solvation and electrostatic interactions
(Section 1.1.3). In our work, we rely on the empirical force field FoldX to weigh
these components and output a final ranking based on the total free energy estimate
of the model (Schymkowitz et al., 2005).
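The interplay between the sampling and scoring components can be sketched with a small Metropolis Monte Carlo search over rotamer states on a fixed backbone. This is a minimal sketch, not the protocol used in this thesis: the energy tables are made-up integers rather than FoldX terms, and the names `total_energy`, `monte_carlo_rotamers`, `SELF_ENERGY` and `PAIR_ENERGY` are hypothetical.

```python
import math
import random


def total_energy(assignment, self_energy, pair_energy):
    """Toy score: per-position rotamer energies plus nearest-neighbour pair terms."""
    energy = sum(self_energy[pos][rot] for pos, rot in enumerate(assignment))
    for pos in range(len(assignment) - 1):
        energy += pair_energy[assignment[pos]][assignment[pos + 1]]
    return energy


def monte_carlo_rotamers(self_energy, pair_energy, steps=5000, temperature=1.0, seed=0):
    """Metropolis Monte Carlo over discrete rotamer assignments on a fixed backbone."""
    rng = random.Random(seed)
    n_positions = len(self_energy)
    n_rotamers = len(self_energy[0])
    current = [rng.randrange(n_rotamers) for _ in range(n_positions)]
    current_e = total_energy(current, self_energy, pair_energy)
    best, best_e = list(current), current_e
    for _ in range(steps):
        # Sampling: propose a rotamer state at one randomly chosen position.
        trial = list(current)
        trial[rng.randrange(n_positions)] = rng.randrange(n_rotamers)
        # Scoring: evaluate the proposal with the (toy) energy function.
        trial_e = total_energy(trial, self_energy, pair_energy)
        delta = trial_e - current_e
        # Metropolis criterion: always accept downhill, sometimes accept uphill.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_e = trial, trial_e
            if current_e < best_e:
                best, best_e = list(current), current_e
    return best, best_e


# Hypothetical example: 4 positions, 3 rotamers each, made-up integer energies.
SELF_ENERGY = [[2, 0, 3], [1, 4, 0], [0, 2, 2], [3, 1, 0]]
PAIR_ENERGY = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
best_assignment, best_score = monte_carlo_rotamers(SELF_ENERGY, PAIR_ENERGY)
```

For a problem this small the result can be checked against complete enumeration; for realistic numbers of positions and rotamers enumeration is no longer feasible, which is where stochastic sampling or dead-end elimination becomes necessary.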
Probably the most important question in modeling is when structure prediction
becomes biologically useful (Zhang, 2009). This depends on the purpose of the
model: highly accurate models (< 1-2 Å root mean square deviation, RMSD, versus
the crystallographic model) can in some cases be used for ligand-binding studies
or even virtual screening, while medium-resolution models (2.5-5 Å) could provide
an idea about functionally important residues, active sites or disease-associated
mutations. Low resolution models (> 5 Å) could be used for topology recognition
or for determining the protein boundaries.
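The RMSD criterion used above can be made concrete with a short sketch. It assumes the two coordinate sets have already been optimally superimposed and contain the same atoms in the same order; the function names and the exact cut-offs of 2 Å and 5 Å (the text leaves the range between 2 and 2.5 Å unassigned) are our own illustrative choices.

```python
import math


def rmsd(model, reference):
    """Root mean square deviation (in Å) between two pre-superimposed coordinate lists."""
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = 0.0
    for (xm, ym, zm), (xr, yr, zr) in zip(model, reference):
        squared_sum += (xm - xr) ** 2 + (ym - yr) ** 2 + (zm - zr) ** 2
    return math.sqrt(squared_sum / len(model))


def model_category(rmsd_angstrom):
    """Map an RMSD to the usage tiers named in the text (cut-offs are assumptions)."""
    if rmsd_angstrom <= 2.0:
        return "high accuracy: ligand-binding studies, virtual screening"
    if rmsd_angstrom <= 5.0:
        return "medium resolution: functional residues, active sites, mutations"
    return "low resolution: topology recognition, protein boundaries"
```

For instance, two atoms of which one is displaced by 2 Å give an RMSD of √2 ≈ 1.41 Å, which this sketch would place in the high-accuracy tier.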
Measuring progress in the field of structure prediction is the aim of the biennial
competition CASP (Critical Assessment of techniques for protein Structure
Prediction) (Moult et al., 1995). Research groups are given an amino acid sequence
for which no native structure is known yet, but for which one will be determined soon. Since
its inception in the mid-1990s, this community-wide effort has already instigated
many novel design protocols, with over 100 different groups participating in the
competition.
The available range of modeling methods can roughly be divided in two cate-
gories, although an increasing number of hybrid methods is being developed as
well: methods that rely on comparative modeling or threading and methods that
predict structure ab initio. Typically, the first category of methods has provided
the more accurate models but is limited by the amount of structural data available,
while the latter is unconstrained but limited by the huge number of possible confor-
mations, idealized model systems and approximate free energy estimates. In what
follows, we will discuss a selection of recent advances in these methodologies and
also focus on a related but slightly different field in computational biology: that of
computational protein design. We conclude with a short discussion of the docking
problem, in which the structure of two interacting proteins is sought, given their
structures in isolation.
1.2.1 Comparative modeling
Comparative modeling – also called homology modeling – relies on the observation
that sequence similarity suggests structural similarity, often because in the process
of evolution, structure (and thus function) is not radically altered. Homology
modeling thus searches for proteins which share a certain amount of sequence
similarity with the target protein. Highly accurate models are often generated
when more than 50% sequence identity exists between the target protein and the
templates. These models might have a root-mean-square (RMS) error of 1 Å on
the main-chain atoms, which is comparable to the difference between X-ray and
NMR methods (Baker & Sali, 2001). Between 30 and 50% sequence identity, homology
modeling will still give reliable models, but loops and other variable regions in
the protein structure in particular might deviate from the template structures. Finally,
when comparative models are based on less than 30% sequence identity, errors
accumulate rapidly in the model. Frequent sidechain packing errors, distortion of
the protein core, loop modeling errors and other severe problems might render the
model useless. These bottlenecks in comparative modeling - especially when
sequence identity is low - might be partly remedied by means of all-atom force
field refinements, multiple template structures or specialized loop reconstruction
methods (see Box).
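The sequence identity regimes described above can be summarized in a few lines of code. This is a sketch under stated assumptions: identity is computed over aligned, non-gap columns of a precomputed pairwise alignment (other denominators are in use), and the function names and regime labels are hypothetical.

```python
def percent_identity(aligned_target, aligned_template):
    """Percent identity over the non-gap columns of two aligned sequences ('-' is a gap)."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    identical = 0
    for a, b in zip(aligned_target, aligned_template):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        aligned += 1
        if a == b:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


def modeling_regime(identity):
    """Expected comparative-model quality for a given target-template identity."""
    if identity > 50.0:
        return "high accuracy (main-chain RMS error around 1 Å)"
    if identity >= 30.0:
        return "reliable core, loops and variable regions may deviate"
    return "error-prone: packing, core and loop errors accumulate"
```

As a usage example, `percent_identity("ACDE-G", "ACDFHG")` compares five aligned columns, four of which are identical, giving 80%.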
To better understand the mechanisms of folding, Rose and Creamer put forward
the challenge to find two proteins with high sequence similarity but a different
fold (Rose & Creamer, 1994). Recently, two small proteins of 56 residues each,
GA88 and GB88, were designed that share 88% sequence identity and differ in only
7 residues. The structures, solved by NMR, revealed two distinct
folds: GA88 adopted a 3-α fold while GB88 adopted an α/β fold, showing that
conformational switching between two folds could be effected with just a handful
of mutations (He et al., 2008; Alexander et al., 2009). These designs pose interesting
problems for homology-based methods, since they challenge our interpretation of
sequence-structure relationships, yet they might not be representative of a large
number of proteins.

Protein loop reconstruction with LoopX
In Chapter 3 we present LoopX, a loop reconstruction method that
combines a database backbone template search with sidechain reconstruction.
We demonstrate the competitiveness of the method by comparing it to
various state-of-the-art loop prediction methods, including the robotics-inspired
KIC, which was recently shown to reconstruct loops of up to
12 residues with sub-angstrom accuracy (Mandell et al., 2009). Additionally, we
demonstrate that LoopX can model the conformational ensemble adopted by
protein loops and induced by ligand binding in the case of the PDZ-peptide
and meganuclease-DNA interactions.

Protein threading. Threading methods attempt to fit sequences to a known structure
from a library of folds, especially for cases in which no evolutionary relation
is obvious (Bowie et al., 1991). This is accomplished by ‘threading’ the sequence
along the backbone of the template model, followed by a scoring function which
evaluates the placement of the amino acids in the backbone. Threading methods
are less constrained since no homology is required, yet they still rely on correctly
selecting a series of templates in which to fit the sequence. One such success-
ful (hybrid) approach is I-TASSER (Roy et al., 2010). The algorithm performs a
PSI-BLAST search (Altschul et al., 1997) with the query sequence to identify evolutionary
relatives, followed by secondary structure assignment to construct an initial
scaffold. That scaffold is then used to select a series of models from the PDB
for threading using a series of state-of-the-art threading programs (Wu & Zhang,
2007). In subsequent steps, fragments from the structural models are determined
and combined with ab initio predictions for badly aligned regions (particularly,
loops), to result in a structural model for which functional properties can be de-
rived.
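The basic idea of threading, placing a sequence onto template positions and scoring the fit, can be illustrated with a deliberately simplified sketch. It is not the I-TASSER procedure: the alignment is gapless, each template position is described only as buried ('B') or exposed ('E'), and the hydrophobic residue set, the scoring rule and the function names are assumptions made for this example.

```python
HYDROPHOBIC = set("AVILMFWC")  # assumed hydrophobic residue set for this toy score


def threading_score(sequence, environments):
    """+1 when a hydrophobic residue is buried or a polar one exposed, -1 otherwise."""
    score = 0
    for residue, environment in zip(sequence, environments):
        is_hydrophobic = residue in HYDROPHOBIC
        is_buried = environment == "B"
        score += 1 if is_hydrophobic == is_buried else -1
    return score


def best_threading(sequence, templates):
    """Slide the sequence along every template; return (score, template name, offset)."""
    best = None
    for name, environments in templates.items():
        for offset in range(len(environments) - len(sequence) + 1):
            window = environments[offset:offset + len(sequence)]
            score = threading_score(sequence, window)
            if best is None or score > best[0]:
                best = (score, name, offset)
    return best


# Hypothetical fold library: one environment string per template.
FOLD_LIBRARY = {"fold1": "EEBEBE", "fold2": "BBBBBB"}
```

Threading the sequence `"LKVE"` through this library selects `fold1` at offset 2, where both hydrophobic residues land on buried positions and both polar residues on exposed ones.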
Fragment-based structure prediction. Another class of comparative modeling
tools is formed by fragment-based methods. They differ in their definition of the smallest
unit used to infer sequence similarity: instead of the entire fold, they consider
stretches of residues, effectively expanding the number of templates that can be
modeled with. Many different fragment-based methods have been successfully ap-
plied to protein structure prediction, often in combination with ab initio sidechain
prediction and energy evaluation. A successful fragment-based approach is used
in Rosetta to bootstrap structure prediction (Section 1.2.2). In our own work,
we have used the ‘fragment paradigm’ as an effective way to solve the sampling
problem in protein structure prediction (see Box).
The BriX fragment paradigm
In our own work, we have used the ‘fragment paradigm’ as an effec-
tive way to solve the sampling problem in protein structure prediction (Baeten
et al., 2008; Vanhee et al., 2011). We used short protein backbone fragments
to reconstruct (parts of) proteins (Chapter 2), deduce structural relationships
between single proteins and protein-peptide interfaces (Chapter 5) and
perform blind reconstruction of peptide interactions (Chapter 6).
Comparative modeling entirely depends on the quality of the templates avail-
able: many structural templates obviously lead to better homology modeling and
thus to better models. In the limit, and with the increasing body of structural
data that is being deposited into public databases, comparative modeling will ulti-
mately cover the entire structural space, since the number of unique folds in nature
is expected to be limited (Grant et al., 2004).
1.2.2 Ab initio structure prediction
Ab initio methods differ from comparative modeling techniques since they remove
the requirement of having at least one related structure. Yet, some of the most
successful ab initio techniques - such as the popular Rosetta framework (Rohl
et al., 2004) - follow a hybrid approach. For example, Rosetta scans the PDB for
small structural fragments with similar sequence signatures to the target sequence
using a Bayesian probability distribution. It then iteratively assembles these frag-
ments using Monte Carlo sampling, optimizing packing between the fragments
and favoring β-sheet formation. In a fine-grained step the sidechains are rebuilt
with backbone-dependent rotameric libraries. Rosetta has also been applied to design:
an enzymatic active site was designed using a series of minimal active site templates
(termed ‘theozymes’) that could accommodate the four different catalytic motifs used to
catalyze the breaking of a carbon-carbon bond, and the model was later confirmed by
X-ray crystallography (Jiang et al., 2008). In other recent work, the methodology
was repeated to design an enzyme that catalyzes the Diels-Alder reaction, a
carbon-carbon bond-forming reaction for which supposedly
no natural enzyme exists (Siegel et al., 2010). Both cases are milestones for our
current ability to design proteins with desired properties using semi-computational
approaches.
Often, interleaving experimental information in the computational method re-
sults in better designs, by constraining for example the design towards certain
active sites, or allowing more conformational freedom in one part of the protein
than in another. Among many different approaches, the use of NMR chemical shift
data is particularly appealing. In CS-Rosetta, the chemical shift data was used to
select fragments with similar resonance profiles in combination with the traditional
sequence similarity from public databases, thus constraining the fragments used in
the stochastic sampling and improving the final models (Shen et al., 2008). A completely
different but highly innovative approach is to use the human eye to solve
hard three-dimensional problems. By means of an online computer game, Baker
and co-workers not only managed to explain protein folding to a large community
outside the protein design field, but also showed a significant improvement in
CASP models that could not have been achieved using computational algorithms
alone (Cooper et al., 2010).
1.2.3 Predicting protein dynamics
Methods that sample the conformational space according to first principles are
much slower and thus limited in use. Probably the best known of these methods
is Molecular Dynamics (MD) simulation (McCammon et al., 1977). In MD simulations,
biophysical forces are described explicitly at the all-atom level – e.g. with CHARMM (Brooks
et al., 1983) or AMBER (Cornell et al., 1995) – as opposed to the statistics-based
force field descriptions often used in approximate or homology-based methods
(Klepeis et al., 2009). The advantage of MD simulations over other structure
prediction methods is that they not only capture the ground state of the folded
protein, but they also hint at the dynamics of the protein and its folding process,
often important for understanding the function of the protein. Recently, David
Shaw and co-workers modelled the folding and unfolding of the small WW domain
and the BPTI protein, the latter on the millisecond time scale (Shaw et al., 2010). Important
biological conformational changes, such as folding, often take place on a time
scale between 10 µs and 1 ms, and this achievement - made possible through the
use of a customized supercomputer - extended the time scale accessible to such methods
about 100-fold.
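To give a feeling for why these time scales are so demanding, the sketch below shows the core of an MD step, here the velocity Verlet integrator applied to a single particle on a harmonic spring in reduced units, together with a count of the integration steps needed for a given simulated time. The harmonic force, the function names and the 2 fs time step used in the usage note are illustrative assumptions, not parameters of CHARMM, AMBER or the simulations cited above.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate one particle in 1D; returns the list of (position, velocity) pairs."""
    trajectory = [(x, v)]
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # second half-step velocity update
        trajectory.append((x, v))
    return trajectory


def harmonic_force(x, k=1.0):
    """Restoring force of a spring with force constant k (toy potential)."""
    return -k * x


def oscillator_energy(x, v, mass=1.0, k=1.0):
    """Kinetic plus potential energy of the toy oscillator."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


def steps_needed(simulated_time_s, timestep_s):
    """Number of integration steps required to cover the simulated time."""
    return round(simulated_time_s / timestep_s)
```

With an assumed time step of 2 fs, `steps_needed(1e-3, 2e-15)` shows that one millisecond of simulated time corresponds to 5 × 10^11 force evaluations over all atoms, which explains the need for dedicated hardware.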
1.2.4 Computational protein design
Computational protein design deals with finding a compatible sequence for a given
protein fold, and as such, is often termed the ‘inverse folding problem’. It shares
many of the same challenges posed by protein structure prediction and both
require an understanding of the often complicated relationship between sequence
and structure (Mandell & Kortemme, 2009b).
Traditionally, changing the properties of a protein is accomplished using ‘rational
design’, in which researchers manually tinker with proteins, or directed evolution,
an experimental technique that harnesses natural selection at the molecular level
to customize proteins to meet specific requirements (Romero & Arnold, 2009). In the
last twenty years however, computer algorithms have entered this field to produce
in-silico models of optimized proteins that are then subjected to experimental
analysis (van der Sloot et al., 2009; Lutz, 2010). Protein design has an enormous
number of applications both in academia and industry: protein design techniques
for example have been used to increase the thermostability of an enzyme whilst re-
taining enzymatic activity (Korkegian et al., 2005); the affinity and specificity of the
family of leucine zipper transcription factors was altered (Grigoryan et al., 2009);
pathways in cellular systems have been artificially rewired, following a synthetic
biology approach, making use of the modular architecture of signaling pathways
in eukaryotes (Pryciak, 2009).
1.2.5 Protein docking
Protein docking deals with finding the structure of the interaction rather than the
structure of the individual proteins, that is, predicting the quaternary structure
(Figure 1.3). Since proteins exercise their functions through the way they interact
with other proteins, an atomistic understanding is often required to infer functional
relationships between proteins, decipher signaling pathways and so on. Usually,
docking of two unbound proteins proceeds in two phases: first, a putative binding
mode is detected using geometric complementarity or Fast Fourier Transforms
(e.g. using PatchDock (Schneidman-Duhovny et al., 2005) or ZDock (Chen
et al., 2003)). Second, a fine-grained refinement protocol fits both structures, sometimes
allowing for slight conformational flexibility in the backbone and sidechains
of the proteins (e.g. Haddock (Dominguez et al., 2003) or RosettaDock (Gray
et al., 2003)). Similar to CASP, the recurring competition CAPRI (‘Critical Assessment
of PRedicted Interactions’) measures the progress in the field using blind
predictions (Janin et al., 2003). While CAPRI largely seems to be an academic
exercise at this point and is limited by the need for starting structures, an increase in
the prediction accuracy of the methods can be observed. Most current improvements
aim at introducing backbone flexibility in the search process (Lensink
et al., 2007).
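The first, rigid-body phase can be illustrated with a toy translational scan on a two-dimensional grid. The sketch scores every translation explicitly; FFT-based programs obtain the same kind of correlation for all translations at once, and also scan rotations, which are omitted here. The grid values (negative for the receptor interior, positive for favourable surface cells), the grids themselves and the function names are assumptions made for this example, not the scoring scheme of PatchDock or ZDock.

```python
def overlap_score(receptor, ligand, dx, dy):
    """Sum of receptor grid values under each occupied ligand cell, shifted by dx rows and dy columns."""
    return sum(
        receptor[i + dx][j + dy]
        for i, row in enumerate(ligand)
        for j, occupied in enumerate(row)
        if occupied
    )


def scan_translations(receptor, ligand):
    """Score all in-bounds translations of the ligand; returns (best_score, dx, dy)."""
    best = None
    for dx in range(len(receptor) - len(ligand) + 1):
        for dy in range(len(receptor[0]) - len(ligand[0]) + 1):
            score = overlap_score(receptor, ligand, dx, dy)
            if best is None or score > best[0]:
                best = (score, dx, dy)
    return best


# Hypothetical grids: -5 marks receptor interior (clash), +1 favourable surface cells.
RECEPTOR = [
    [-5, -5, -5, -5],
    [-5, 1, 1, -5],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
]
LIGAND = [[1, 1]]
```

In this example the two-cell ligand scores best when placed in the notch formed by the two favourable cells of the second row, while placements overlapping the receptor interior are penalized.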
1.3 Computational design of peptide ligands
The difficulty of efficiently designing high-affinity peptide ligands via rational methods
has been a major obstacle to the development of peptides as therapeutics. However,
structural insights into the architecture of protein-peptide interfaces have recently
culminated in a number of computational approaches for the rational design of
peptides targeting proteins (Vanhee et al., 2009; London et al., 2010). These
methods provide a valuable alternative to experimental high-resolution structures
of target protein-peptide complexes, bringing closer the dream of in silico designed
peptides for therapeutic applications. Here we provide an extensive review of these
methods (Figure 1.7).
1.3.1 A better understanding of protein-peptide interactions
With the increase of high-resolution structures of protein-peptide complexes in
the Protein Data Bank (PDB, http://www.pdb.org) (Berman et al., 2000), and in
complementary databases such as the database of three-dimensional interact-
ing domains (3did, http://3did.irbbarcelona.org) (Stein et al., 2010) and
the non-redundant database of protein-peptide complexes (PepX, http://pepx.
switchlab.org) (Vanhee et al., 2010), large-scale structural studies have at-
tempted to describe the key properties of peptide binding (Vanhee et al., 2009;
London et al., 2010) (Chapter 5). For example, we have identified 505 unique
structural peptide-mediated interactions from a set of 1431 high-resolution struc-
tures, with a high over-representation of well-studied peptide interactions, such
as MHC-peptide complexes, thrombin-bound peptides, or peptides bound to the
α-ligand binding domain of the estrogen receptor (Vanhee et al. (2009), Chapter
4). In a set of 103 peptide complexes, it has been noted that many interfaces ex-
hibit tighter packing and more main chain hydrogen bonds than normally found in
protein-protein interfaces (London et al., 2010). This difference is logical: peptides
in isolation cannot be too hydrophobic, for they would aggregate. Therefore, part
of the binding energy needed to compensate for the loss of entropy upon binding has to be
derived from main-chain/main-chain and main-chain/side-chain hydrogen bonds.
In silico mutagenesis on these interaction interfaces has revealed that peptide
interfaces contain ‘hotspot’ residues, reminiscent of those found in protein-protein
interfaces (Clackson & Wells, 1995). Peptides that are 6-8 residues long typically
contain two hotspot residues, whereas 3 hotspots are typical for peptides of length
9-11 (London et al., 2010). In general, peptides often exhibit an elongated struc-
ture upon binding (Stein & Aloy, 2010) and do not appear to induce any large
conformational changes in their binding partners in order to reduce the entropic
cost of complex formation (London et al., 2010). In contrast, many of the peptide
motifs are located in structurally disordered regions of proteins and only adopt a
stable fold upon binding to their protein partner (‘fold-on-binding’).
[Figure 1.6, panels A-C. Labels: hydrophobic, hydrophilic, hydrogen bonds, β2.]
Figure 1.6: PDZ-peptide interactions and peptide specificity. (A) The PDZ domain
of Erbin (PDB 1N7T) binds peptides in an elongated way, with multiple residues
contributing to the interaction. (B) The carboxy-terminus of the PDZ peptide binds
tightly in a hydrophobic pocket of the PDZ domain. (C) Distribution of 74 PDZ domains
in selectivity space, after singular value decomposition of correlated positions in the
peptide. The contributions of different peptide positions allow the PDZ domains to
optimize their specificity while avoiding cross-reactivity, revealing an even distribution
throughout selectivity space (Figure adapted from Stiffler et al. (2007)).
Despite their limited size, peptide interactions can be highly specific. For
example, many C-terminal peptides exhibit high specificity in vivo for particular
PDZ domains, while avoiding cross-reactivity (Stiffler et al., 2007) (Figure 1.6).
Interestingly, peptide specificity across 157 mouse PDZ domains matched with
217 peptide ligands could not be captured in discrete classes but instead showed a
more even distribution in selectivity space. Specificity in peptide interactions can
also be introduced by engineering approaches even when not observed in nature
(Reina et al., 2002; Grigoryan et al., 2009). One such example is the basic-region
leucine zipper (bZIP) family of transcription factors, whose members share a high
degree of structural and sequence similarity and bind DNA upon homo- and/or
heterodimerization with an identical or related bZIP monomer subunit. By replacing one of the
wild-type monomer subunits with a variant that has the basic-region substituted
by an acidic region, DNA binding is prevented and consequently the activity of the
transcription factor will be inhibited. As these acidic variants inherit the dimeriza-
tion properties from the wild-type, it is difficult to inhibit one specific bZIP family
member due to intrinsic heterodimerization properties. Keating and co-workers
showed recently that it was feasible to design anti-bZIP peptide variants that bind
specifically to only a single member of the human bZIP family using a computa-
tional design approach (Grigoryan et al., 2009). An algorithm was employed that
explicitly considers both target and non-target interactions by selecting sequences
that minimize the loss of affinity for the target while maximizing the difference in
affinity between the target and any non-target member. Out of the 20 targeted bZIP
families, 10 designed peptides bound the representative member of their target family
with considerably higher affinity than any non-target competitor, demonstrating
peptide specificity. This study and related, albeit smaller-scale, computational design
studies (Reina et al., 2002; van der Sloot et al., 2006) demonstrate that specific
binding partners can be designed even in situations where there is a high degree
of sequence- and structural similarity between target and non-target molecules.
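The trade-off described above, retaining affinity for the target while opening a gap to all non-targets, can be illustrated with a toy selection step. This is only a minimal sketch and not the algorithm of Grigoryan et al. (2009): the candidate names, the binding scores and the `max_affinity_loss` threshold are hypothetical, and lower scores are assumed to mean tighter binding.

```python
from typing import Dict, Optional

# Hypothetical predicted binding scores (lower = tighter binding).
# Outer key: candidate peptide; inner key: binding partner.
Scores = Dict[str, Dict[str, float]]


def specificity_gap(scores: Dict[str, float], target: str) -> float:
    """Score of the best off-target minus the score of the target."""
    best_off_target = min(s for name, s in scores.items() if name != target)
    return best_off_target - scores[target]


def pick_specific_design(candidates: Scores, target: str,
                         best_target_score: float,
                         max_affinity_loss: float) -> Optional[str]:
    """Among candidates that lose at most `max_affinity_loss` relative to the
    best known target binder, return the one with the largest gap."""
    allowed = {
        pep: sc for pep, sc in candidates.items()
        if sc[target] - best_target_score <= max_affinity_loss
    }
    if not allowed:
        return None
    return max(allowed, key=lambda pep: specificity_gap(allowed[pep], target))


# Illustrative numbers only.
candidates = {
    "design_A": {"T": -10.0, "off1": -9.5, "off2": -9.0},  # tight but unspecific
    "design_B": {"T": -9.0, "off1": -5.0, "off2": -6.0},   # slightly weaker, specific
    "design_C": {"T": -4.0, "off1": 0.0, "off2": 1.0},     # specific but too weak
}
best = pick_specific_design(candidates, "T",
                            best_target_score=-10.0, max_affinity_loss=2.0)
print(best)  # design_B
```

Treating target affinity as a constraint rather than folding it into a single weighted score keeps the two goals separate, which makes it easier to see which candidates were rejected for weak binding and which for poor specificity.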
1.3.2 Peptide design based on sequence motifs
If structural information is available for a drug target – either from the structure of
the target alone or from the target in complex with its ligand – this information can be used in the
drug discovery process to speed up lead identification (Murray & Blundell, 2010).
Unfortunately, structural information is available for only an estimated 50% of all
[Figure 1.7 panels. A | Structure-free peptide design: phage-display library screening;
quantitative peptide assays; sequence motif scanning. B | Structure-based peptide design:
peptides derived from protein complex structures; de novo peptide design with structural
scaffolds. Further labels: optimizing binding affinity; peptide docking & de novo design;
selection, sequencing, direct read-out, consensus motif, best binding motif, PWM heatmap,
position-specific mutagenesis. Workflows are marked experimental, experimental/computational
or computational.]
Figure 1.7: Example workflows for peptide design. (A) Structure-free peptide design
and (B) structure-based peptide design.
drug targets (Tanrikulu & Schneider, 2008), with a significant underrepresentation
of targets of high therapeutic importance such as membrane proteins (Baker,
2010). As a result, many research groups use existing information compiled in
databases of protein-peptide interactions to derive sequence-binding motifs that
could be used to design peptides.
The most obvious cases are the well-studied SH2, SH3, PDZ and WW domains,
for which peptide templates can be designed using simple sequence-based rules (e.g.
the well-known PxxP motif for SH3 domains, with x any amino acid, or
T/S-x-I/V/L-COOH for class I PDZ domains), even though the discrete classification
into motifs has been disputed (Stiffler et al., 2007). These templates can be randomized
at the non-key positions, and specific peptides can then be found using screening
methods such as yeast two-hybrid (Y2H) or phage display (Tonikian et al., 2008;
Giordano et al., 2010).
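The two sequence rules quoted above can be written as regular expressions. The sketch below is a deliberately simplified illustration: it encodes only the consensus patterns as stated in the text, ignores flanking residues and motif subclasses, and the function names and the first test sequence are made up for the example. The sequence EETSV is the PDZ ligand shown in Figure 1.10.

```python
import re
from typing import List, Tuple

# Simplified consensus patterns taken from the text above.
# The lookahead lets finditer report overlapping PxxP occurrences.
SH3_PXXP = re.compile(r"(?=(P..P))")
# The class I PDZ motif must sit at the free C-terminus, hence the "$" anchor.
PDZ_CLASS_I = re.compile(r"[ST].[IVL]$")


def find_pxxp(sequence: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched residues) for every PxxP occurrence."""
    return [(m.start(), m.group(1)) for m in SH3_PXXP.finditer(sequence)]


def has_pdz_class_i_tail(sequence: str) -> bool:
    """True if the last three residues match T/S-x-I/V/L."""
    return PDZ_CLASS_I.search(sequence) is not None


print(find_pxxp("AAPPLPPRAA"))        # [(2, 'PPLP'), (3, 'PLPP')]
print(has_pdz_class_i_tail("EETSV"))  # True
print(has_pdz_class_i_tail("EETSA"))  # False
```

A match of this kind only flags a candidate; as noted above, discrete motif classes do not capture the full specificity of these domains.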
For cases in which enough information on peptide binding is available, other
more sophisticated approaches can be used. For example, an artificial neural
network, capable of learning to recognize non-linearity in complex datasets, was
trained on 650 peptides derived from T-cell epitopes and known to bind the Major
Histocompatibility Complex (MHC) class II molecule (Honeyman et al., 1998).
The neural network then was used to speed up epitope screening by reducing the
experimental T-cell assay from 68 to 22 peptides, with only a potential loss of 5
out of 17 epitopes. In more recent work, prediction was combined with genetic
algorithms, hidden Markov models or other motif discovery algorithms (Lin et al.,
2008). Predicting from sequence alone is often difficult because of permissive
binding modes (MHC Class II for example accommodates from 9 to 18 residues,
although longer peptides have been observed too), multiple binding cores and
insufficient high-quality binding data, all leading to noisy and often inaccurate pre-
dictions. Adding structural information to the prediction process – approximately
169 X-ray structures of MHC in complex with an antigenic peptide are available in
PepX (Vanhee et al. (2010) and Chapter 4) – could increase prediction accuracies,
yet structure-based methods are still too slow for genome-wide screening (Lin
et al., 2008).
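One reason why permissive binding modes complicate sequence-based prediction is that the binding core has to be located inside a longer peptide before it can be scored. A common way to handle this is to slide a position weight matrix (PWM) along the peptide and keep the best-scoring window. The sketch below assumes a toy three-position PWM with invented log-odds values and an arbitrary penalty for unlisted residues; an actual MHC class II matrix would cover a longer core and be derived from binding data.

```python
from typing import Dict, List, Tuple

# One {residue: log-odds score} dictionary per core position (toy values).
PWM = List[Dict[str, float]]


def score_core(core: str, pwm: PWM, unseen: float = -1.0) -> float:
    """Sum the per-position scores; residues absent from a column get `unseen`."""
    return sum(column.get(aa, unseen) for aa, column in zip(core, pwm))


def best_core(peptide: str, pwm: PWM) -> Tuple[float, int, str]:
    """Return (score, 0-based start, core) of the best-scoring window."""
    width = len(pwm)
    if len(peptide) < width:
        raise ValueError("peptide is shorter than the binding core")
    windows = [
        (score_core(peptide[i:i + width], pwm), i, peptide[i:i + width])
        for i in range(len(peptide) - width + 1)
    ]
    return max(windows, key=lambda w: w[0])


toy_pwm: PWM = [{"F": 2.0, "Y": 1.5}, {"A": 0.5}, {"L": 1.0, "V": 1.0}]
print(best_core("GSFALKK", toy_pwm))  # (3.5, 2, 'FAL')
```

When several windows score similarly, the peptide has multiple plausible binding cores, which is one source of the noisy predictions mentioned above.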
While these motif-scanning methodologies can lead to the discovery of novel
peptides, it is unclear whether they generalize to cases in which little is known
about the target protein.
1.3.3 Protein complexes as a source of active peptides
Peptide fragments derived from the crystallographic interface of a protein-protein
interaction are a major source of leads for rational drug design (Watt, 2006). In 2003,
the 36-residue anti-HIV peptide enfuvirtide (Fuzeon®) became the first peptide derived
from an extracellular protein interface to receive FDA approval (Table 1.1), which
represented a landmark in the field of peptide therapeutics (Naider & Anglister,
2009). Intracellular targets associated with HIV infection have been targeted with
peptides as well.
Transcription factors (TF), regarded as ‘undruggable’ by classical small molecule
drugs owing to their large protein-protein interfaces (Section 1.1.5), have now
been targeted with peptides too. The original discovery of a 59-mer peptide frag-
ment from the co-activator of the Mastermind-like family (MAML-1) required for
NOTCH signaling marked the start for structure-based inhibitor design (Weng
et al., 2003). The protein-peptide complex bound to DNA has been solved re-
cently by two independent groups, showing that the Mastermind peptide binds
as a twisted helix in the shallow protein-protein groove (Nam et al., 2006) (Figure
1.8A). Using a technique termed ‘peptide stapling’ (Schafmeister et al., 2000), a
16-residue peptide has been designed in which two residues are linked by a
hydrocarbon staple; the staple constrains the peptide in its helical conformation
while improving binding affinity (Moellering et al., 2009). The inhibitory α-helical
peptide is able to penetrate the cell membrane and bind to a shallow groove formed
by the intracellular domain of NOTCH and a DNA-bound TF, thereby blocking the
interaction with the co-activator MAML-1, which is required for recruiting the
transcription machinery. As a consequence, proliferation of T-cell acute lymphoblastic
leukemia cells was stopped.
In an entirely different class of proteins, stapled α-helical peptides have been
shown to be effective as well, inhibiting members of the anti-apoptotic BCL2 family
(Stewart et al., 2010). These anti-apoptotic proteins contain a hydrophobic groove
that engages the death-promoting BH3 helix. Molecular mimicry of that helix with
a stapled peptide led to selective inhibition of the anti-apoptotic protein MCL-1
(Figure 1.8B). Both examples of successful helical peptide design suggest that nature’s
use of peptides in protein-protein interfaces provides exciting opportunities for peptide
[Figure 1.8, panels A and B. Legend: hydrocarbon staple; hydrophobic; positive charge;
negative charge; hydrophilic.]
Figure 1.8: Stapled helical peptides as potent therapeutic peptides. (A) Design of
MAML-1-derived peptides by taking different portions of the MAML-1 helix and turning
them into peptides (sliding window; the orange, red, pink and orange segments indicate
the different peptides used for stabilization; PDB 2F8X). The 16 a.a. sequence of MAML-1
targeting ICN1 and CSL is shown in red and was used to design the stapled peptide.
Figure adapted from Moellering et al. (2009). (B) Crystal structure of the complex
between the stapled helix and MCL-1 (PDB 3MK8). The stapled helix binds in the canonical
binding groove. Hydrophobic interactions at the binding interface are reinforced by a
complementary polar interaction network. The side chains of hydrophobic (yellow),
positively charged (blue), negatively charged (red) and hydrophilic (green) residues
are shown. Figure adapted from Stewart et al. (2010).
therapeutics. Scanning the entire PDB for interfaces involving helical segments
has revealed many potentially interesting interfaces in which α-helical interactions
play an important role, such as nuclear hormone receptors or other transcription
factor-cofactor interfaces (Jochim & Arora, 2009). Roche’s deal for access to Aileron’s
peptide stapling technology in August 2010 underlines the potential
of these stabilized α-helical peptides as a new class of powerful peptide therapeutics
(Sheridan, 2010).
Yet, so far successful peptide designs seem to be largely limited to the α-helical
scaffold. One reason for this may be the large entropic cost associated with
structuring a peptide upon binding, a cost that is easier to reduce by pre-organizing
α-helical peptides. For example, a leucine-zipper scaffold can be used to fix a helix
bundle (Grigoryan et al., 2009), and chemical stapling of side chains can fix a single
helix (Schafmeister et al., 2000). For hairpin structures, cyclization has
also been employed (Craik et al., 2007). Yet another way to extend the structural
stability of peptides is to incorporate them in a highly stable mini-protein, such as
knottins or other scaffolds (Gebauer & Skerra, 2009).
In conclusion, peptides derived from protein complexes, combined with scaffold
designs, are currently the most successful route to therapeutic peptide design.
1.3.4 Protein docking and fragment-based docking as tools for
peptide design
A generalist approach for peptide design uses structures or homology models of the
target in combination with docking algorithms to construct peptides along a chosen
path on the target surface. Several tools can be used to structurally detect putative
binding sites, for example using geometric amino acid-dependent preferences
derived from a set of structural binding modes (Petsalaki et al., 2009). AutoDock
– a popular small-molecule docking algorithm – was used in combination with a
genetic algorithm to design tetrapeptides against a selected hydrophobic region
of α-synuclein, a protein associated with aggregation diseases (Abe et al., 2007).
Upon experimental validation, several binding peptides with µM dissociation
constants were identified that could be used as leads for further screening. Another
approach uses a Gaussian Network Model to identify the binding site and then uses
AutoDock to dock a series of dipeptides in a pairwise fashion on a grid along a flexible
binding path, finally yielding an optimal peptide sequence for a given surface
(Unal et al., 2010).
While these methods work well for peptides comparable in size to small
molecules (typically no longer than 3-4 a.a.), the design of longer peptides still
poses a major combinatorial problem.
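The scale of this combinatorial problem follows directly from the number of possible sequences, which grows as 20 to the power of the peptide length when only the 20 standard amino acids are considered. The short calculation below illustrates this; it counts sequences only, so it is a lower bound on the search space, because every sequence also has to be sampled in many backbone and side-chain conformations.

```python
AMINO_ACIDS = 20  # standard amino acids only


def sequence_space(length: int) -> int:
    """Number of distinct peptide sequences of the given length."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return AMINO_ACIDS ** length


for n in (2, 4, 8, 12):
    print(f"{n:2d} residues: {sequence_space(n):,} sequences")
#  2 residues: 400 sequences
#  4 residues: 160,000 sequences
#  8 residues: 25,600,000,000 sequences
# 12 residues: 4,096,000,000,000,000 sequences
```

Exhaustive docking is therefore feasible for di- to tetrapeptides, whereas longer peptides require either stepwise construction or a search heuristic such as the genetic algorithm mentioned above.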
1.3.5 Peptide design using protein-peptide complexes
Structural information on the protein-peptide binding interface can often be
exploited when modeling the protein-peptide interaction. Most approaches
fall into three main scenarios:
1. Use a structure with a peptide ligand as a template to model a domain-related
sequence by homology, then mutate the amino acid side chains of the peptide
in silico with a protein design algorithm in order to change
specificity, while keeping the peptide backbone coordinates fixed (Reina et al.,
2002; van der Sloot et al., 2006).
2. Use a structure with a peptide ligand to model by homology a domain-related
sequence while allowing peptide backbone flexibility. The crudest approach
uses different domain-peptide complexes of the same family to generate
different ligand backbone structures that can then be superimposed on the
target structure (Fernandez-Ballester et al., 2009). This was probed recently
for the PDZ domain, for which peptide specificity was computationally re-
designed using all available structures from the PDZ domain and compared
with large-scale phage display experiments (Smith & Kortemme, 2010). An-
other approach introduces backbone flexibility in the peptide starting from
a series of perturbed X-ray protein-peptide complexes (Raveh et al., 2010).
This protocol was validated on a set of 89 peptide complexes and it produced
models that showed sub-angstrom deviation from the native structure.
3. Use a structure while only knowing the approximate binding site of the
peptide ligand, for example based on evidence from related domains. The
PepSpec algorithm does not rely on a structural model of the peptide (King &
Bradley, 2010). Instead, it only needs a single anchor residue positioned in
the binding pocket and introduces implicit backbone movements in the re-
ceptor through ensemble modeling. Evaluation was carried out on a series of
peptide-binding domain families, such as PDZ, SH2 and SH3. In the absence
of an experimentally obtained structural model of the domain and relying on
a model based on a homologous domain, the algorithm captured some of
the peptide specificities that were matched with experimental phage display
libraries. However, long simulation times that scale unfavourably with peptide
length were reported (∼100-300 hours per peptide). We have made some progress
in this field too: peptides were designed for the PDZ domain,
the α-ligand binding domain and the SH2 domain with sub-angstrom accuracy,
using structural data from BriX interaction patterns in combination with
the FoldX force field for side-chain placement and energy evaluation (Figure
1.10 and Chapter 6).
To summarize, fixed-backbone peptide design (scenario 1) can be used successfully,
and at minimal computational cost, when a high degree of sequence and structural
similarity exists – or can be assumed – between the template complex and the target
complex. When changes in backbone conformation are expected to play a greater role
(e.g. with decreasing sequence and structural similarity, or when insertions or
deletions relative to the template structure have to be modeled), one of the
approaches under scenario 2 can be employed. When the exact binding site of the
peptide is not known, one of the approaches under scenario 3 has to be employed.
For now, the computational cost of these methods limits their use to selected
(design) case studies rather than proteome-wide screening.
1.3.6 Remedying the lack of structural information
A recently reported method addresses the lack of structural information on mem-
brane proteins by employing a database of helix-helix interaction scaffolds to initiate
de novo peptide design (Yin et al., 2007) targeting integrins. Integrins are important
receptor proteins in mammalian cells, with a flexible domain of transmembrane
(TM) helices in the phospholipid bilayer. Integrins process extracellular signals,
transmitting them to the interior of the cell, thus making them attractive targets for
tumor therapy (Desgrosellier & Cheresh, 2010).
Peptides selectively targeting integrins αIIbβ3 and αVβ3 have been computa-
tionally designed with a new approach for rational peptide design (Yin et al., 2007).
Typical peptide designs are derived from the crystal structure of the target
protein-protein complex (Section 1.3.3), so that the design task mainly consists of
stabilizing the hot-spot interactions with a peptide. However, because in most cases no
crystal structure of the interface is available, the authors of this study relied on a
repertoire of over 400 naturally occurring TM-helix interactions with recognizable
sequence signatures (Walters & DeGrado, 2006).
The computational design was divided in two steps: (1) the helix-helix inter-
action motifs served as realistic backbone templates (Figure 1.9A/B) – as opposed
to idealized helix pairs often used in protein design – and were selected based
on sequence compatibility with the target TMs; (2) the authors threaded the se-
[Figure 1.9 panels: (A) TM helix-pairs cluster; (B) anti-αIIb scaffold (PDB 1JB0);
(C) αIIb peptide threaded on scaffold and repacked.]
Figure 1.9: Design of helices targeting transmembrane (TM) proteins. (A) A library
of TM helix pairs from (B) unrelated structures (e.g. PDB 1JB0) is used
to (C) design novel peptide ligands. Figure adapted from Yin et al. (2007).
quence of the target TM helix on one helix of the helix pair; they then selected a
compatible set of side chains for the peptide on the other helix, using a
side-chain repacking algorithm (Figure 1.9C). The computationally designed peptides
were subsequently validated in micelles, bacterial membranes and finally mammalian cells,
where they inhibited the binding between the TMs of the α- and β-subunits, thus
activating the integrin.
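The two design steps, template selection followed by threading, can be sketched in a few lines. The sketch rests on loud assumptions: plain sequence identity stands in for the sequence-compatibility criterion used by Yin et al. (2007), all sequences and template names are invented, and the side-chain repacking of the partner helix is not modeled at all.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HelixPair:
    name: str
    helix_a: str  # helix that will carry the target TM sequence
    helix_b: str  # partner helix, the starting point for the peptide


def identity(a: str, b: str) -> float:
    """Fraction of identical residues between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def rank_templates(target_tm: str, library: List[HelixPair]) -> List[HelixPair]:
    """Step 1: order same-length templates by compatibility with the target."""
    usable = [p for p in library if len(p.helix_a) == len(target_tm)]
    return sorted(usable, key=lambda p: identity(target_tm, p.helix_a),
                  reverse=True)


def thread(target_tm: str, template: HelixPair) -> HelixPair:
    """Step 2: place the target sequence on helix A; helix B keeps the
    template sequence and would next be repacked (not modeled here)."""
    return HelixPair(template.name + "_threaded", target_tm, template.helix_b)


# Invented sequences, for illustration only.
library = [
    HelixPair("pair1", "GAVLGLIL", "AGLIAVLG"),
    HelixPair("pair2", "LLLLLLLL", "AAAAAAAA"),
    HelixPair("pair3", "GAVLGA", "LLIAVG"),  # wrong length, skipped
]
ranked = rank_templates("GAVLGAIL", library)
model = thread("GAVLGAIL", ranked[0])
print([p.name for p in ranked], model.name)
```

Using naturally occurring helix pairs as templates means that the inter-helical geometry comes from observed structures and does not have to be modeled from scratch.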
Multiple advances reported in this study are noteworthy. First, the authors
showed that peptides possess the capacity to integrate within the lipid bilayer and
selectively interact with and activate α-β-integrins in mammalian cells; this had
previously been difficult to accomplish owing to the lack of a solvent-exposed
binding site. Second, by using a library of naturally constrained helix-helix inter-
action motifs, they circumvented the need to model computationally expensive
inter-helical hydrogen bonding patterns and deviations from idealized helical ge-
ometry. Finally, this study provides exciting opportunities for designing peptide
inhibitors, getting around the need for high-resolution structures of the interface.
We have taken a radically different approach toward remedying the lack of
structural data on protein-peptide interactions (Chapter 5). Peptide binding motifs
often resemble intramolecular packing motifs, suggesting that the wealth of data
[Figure 1.10 panels. (A) 1: helix-helix motif; 2: helix-loop motif; 3: cation-PI motif.
(B) 1: target without ligand, binding site marked; 2: monomeric interaction motif;
3: designed ligand, WT versus design.]
Figure 1.10: Innovative structural approaches in peptide design using BriX and
InteraX. (A) Examples of monomeric interaction motif mining in structures (red): (1)
a helix-helix interaction motif (PDB 153L); (2) a helix-loop interaction motif (PDB
153L); (3) a cation-PI interaction motif (PDB D1GA). See Chapter 5. (B) Sub-angstrom
design of peptide interactions using monomeric structures. (1) Structure of a PDZ
domain (PDB 2I1N) without its ligand and the helix and β2 strand forming the interface
(blue). (2) Identification of an intra-molecular helix-strand-strand motif (red) from an
unrelated structure (PDB 1GSA). (3) Comparison between the structures of the wild-
type sequence (EETSV) designed on the intra-molecular scaffold (red) and the original
ligand (gold). See Chapter 6.
on single-chain proteins could be used to model peptide interactions (Figure 1.10
and Chapter 5). Through analysis of a representative set of 301 protein-peptide
binding interfaces, we showed that more than half of all peptide interaction motifs
could be reliably modeled from sets of interacting fragments from the BriX database
of protein fragments (http://brix.crg.es and Chapter 2) (Baeten et al., 2008;
Vanhee et al., 2011). As a result, the number of structural peptide interaction
motifs increased from a few hundred to over 100,000 fragment interactions. The
use of these intramolecular ‘fragment interaction motifs’ that have pre-optimized
packing represents an important conceptual breakthrough because it transforms
the whole database of protein structures into learning data for computer algorithms
that design peptide substrates de novo, as we describe in Chapter 6. We expect
that such algorithms will start to appear in the near future, making large-scale
virtual peptide screening a realistic option.
References
Abe, K., Kobayashi, N., Sode, K. & Ikebukuro, K. (2007). Peptide ligand screening of alpha-synuclein aggregation modulators by in silico panning. BMC Bioinformatics, 8, 451.
Alexander, P.A., He, Y., Chen, Y., Orban, J. & Bryan, P.N. (2009). A minimal sequence code for switching protein structure and function. Proceedings of the National Academy of Sciences, 106, 21149–54.
Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D.J. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 25, 3389–402.
Alvarado, D., Klein, D.E. & Lemmon, M.A. (2010). Structural basis for negative cooperativity in growth factor binding to an EGF receptor. Cell, 142, 568–79.
Anfinsen, C. (1973). Principles that govern the folding of protein chains. Science.
Antosova, Z., Mackova, M., Kral, V. & Macek, T. (2009). Therapeutic application of peptides and proteins: parenteral forever? Trends in Biotechnology, 27, 628–35.
Arkin, M.R. & Wells, J.A. (2004). Small-molecule inhibitors of protein-protein interactions: progressing towards the dream. Nat Rev Drug Discov, 3, 301–17.
Audie, J. & Boyd, C. (2010). The synergistic use of computation, chemistry and biology to discover novel peptide-based drugs: the time is right. Curr Pharm Des, 16, 567–82.
Baeten, L., Reumers, J., Tur, V., Stricher, F., Lenaerts, T., Serrano, L., Rousseau, F. & Schymkowitz, J. (2008). Reconstruction of protein backbones from the BriX collection of canonical protein fragments. PLoS Comput Biol, 4, e1000083.
Baker, D. & Sali, A. (2001). Protein structure prediction and structural genomics. Science, 294, 93–6.
Baker, M. (2010). Making membrane proteins for structures: a trillion tiny tweaks. Nature Publishing Group, 7, 429–434.
Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N. & Bourne, P.E. (2000). The Protein Data Bank. Nucleic Acids Research, 28, 235–42.
Bowie, J.U., Lüthy, R. & Eisenberg, D. (1991). A method to identify protein sequences that fold into a known three-dimensional structure. Science, 253, 164–70.
Brooks, B., Bruccoleri, R., Olafson, B.D., States, D.J., Swaminathan, S. & Karplus, M. (1983). CHARMM: a program for macromolecular energy, minimization, and dynamics calculations. J Comput Chem.
Brunton, L., Lazo, J. & Parker. . . , K. (2006). Goodman & Gilman’s the pharmacological basis of therapeutics. mcgraw-hill.co.uk.
Chandler, D. (2005). Interfaces and the driving force of hydrophobic assembly. Nature, 437, 640–7.
Chandonia, J.M. & Brenner, S.E. (2006). The impact of structural genomics: expectations and outcomes. Science, 311, 347–51.
Chen, R., Li, L. & Weng, Z. (2003). ZDOCK: an initial-stage protein-docking algorithm. Proteins, 52, 80–7.
Chruszcz, M., Domagalski, M., Osinski, T., Wlodawer, A. & Minor, W. (2010). Unmet challenges of structural genomics. Curr Opin Struct Biol.
Citri, A. & Yarden, Y. (2006). EGF-ERBB signalling: towards the systems level. Nat Rev Mol Cell Biol, 7, 505–16.
Clackson, T. & Wells, J.A. (1995). A hot spot of binding energy in a hormone-receptor interface. Science, 267, 383–6.
Conn, P.J., Christopoulos, A. & Lindsley, C.W. (2009). Allosteric modulators of GPCRs: a novel approach for the treatment of CNS disorders. Nat Rev Drug Discov, 8, 41–54.
Conte, L.L., Chothia, C. & Janin, J. (1999). The atomic structure of protein-protein recognition sites. Journal of Molecular Biology, 285, 2177–98.
Cooper, S., Khatib, F., Treuille, A., Barbero, J., Lee, J., Beenen, M., Leaver-Fay, A., Baker, D., Popović, Z. & Foldit Players (2010). Predicting protein structures with a multiplayer online game. Nature, 466, 756–60.
Cornell, W.D., Cieplak, P., Bayly, C.I., Gould, I.R., Merz, K.M., Ferguson, D.M., Spellmeyer, D.C., Fox, T., Caldwell, J.W. & Kollman, P.A. (1995). A second generation force field for the simulation of proteins, nucleic acids, and organic molecules. J. Am. Chem. Soc.
Craik, D.J., Clark, R.J. & Daly, N.L. (2007). Potential therapeutic applications of the cyclotides and related cystine knot mini-proteins. Expert Opinion on Investigational Drugs, 16, 595–604.
Creighton, P. (1984). Structures and molecular principles. Proteins.
Desgrosellier, J.S. & Cheresh, D.A. (2010). Integrins in cancer: biological implications and therapeutic opportunities. Nat Rev Cancer, 10, 9–22.
Desmet, J., Maeyer, M., Hazes, B. & Laster, I. (1992). The dead-end elimination theorem and its use in protein side-chain positioning. Nature.
Dill, K.A. (1990). Dominant forces in protein folding. Biochemistry, 29, 7133–55.
Dominguez, C., Boelens, R. & Bonvin, A.M.J.J. (2003). HADDOCK: a protein-protein docking approach based on biochemical or biophysical information. Journal of the American Chemical Society, 125, 1731–7.
Drews, J. (2000). Drug discovery: a historical perspective. Science, 287, 1960–4.
Duan, Y. & Kollman, P.A. (1998). Pathways to a protein folding intermediate observed in a 1-microsecond simulation in aqueous solution. Science, 282, 740–4.
Easton, D.M., Nijnik, A., Mayer, M.L. & Hancock, R.E.W. (2009). Potential of immunomodulatory host defense peptides as novel anti-infectives. Trends in Biotechnology, 27, 582–90.
Eglen, R. & Reisine, T. (2010). Human kinome drug discovery and the emerging importance of atypical allosteric inhibitors. Expert Opinion on Drug Discovery, 5, 277–290.
Eisenmesser, E.Z., Bosco, D.A., Akke, M. & Kern, D. (2002). Enzyme dynamics during catalysis. Science, 295, 1520–3.
Fernandez-Ballester, G., Beltrao, P., Gonzalez, J.M., Song, Y.H., Wilmanns, M., Valencia, A. & Serrano, L. (2009). Structure-based prediction of the Saccharomyces cerevisiae SH3-ligand interactions. Journal of Molecular Biology, 388, 902–16.
Fersht, A.R. (2008). From the first protein structures to our current knowledge of protein folding: delights and scepticisms. Nat Rev Mol Cell Biol, 9, 650–654.
Gebauer, M. & Skerra, A. (2009). Engineered protein scaffolds as next-generation antibody therapeutics. Curr Opin Chem Biol, 13, 245–55.
Gianni, S., Guydosh, N.R., Khan, F., Caldas, T.D., Mayor, U., White, G.W.N., DeMarco, M.L., Daggett, V. & Fersht, A.R. (2003). Unifying features in protein-folding mechanisms. Proceedings of the National Academy of Sciences of the United States of America, 100, 13286–91.
Giordano, R.J., Cardó-Vila, M., Salameh, A., Anobom, C.D., Zeitlin, B.D., Hawke, D.H., Valente, A.P., Almeida, F.C.L., Nör, J.E., Sidman, R.L., Pasqualini, R. & Arap, W. (2010). From combinatorial peptide selection to drug prototype (I): targeting the vascular endothelial growth factor receptor pathway. Proceedings of the National Academy of Sciences of the United States of America, 107, 5112–7.
Grant, A., Lee, D. & Orengo, C. (2004). Progress towards mapping the universe of protein folds. Genome Biol, 5, 107.
Gray, J.J., Moughon, S., Wang, C., Schueler-Furman, O., Kuhlman, B., Rohl, C.A. & Baker, D. (2003). Protein-protein docking with simultaneous optimization of rigid-body displacement and side-chain conformations. Journal of Molecular Biology, 331, 281–99.
Grigoryan, G., Reinke, A.W. & Keating, A.E. (2009). Design of protein-interaction specificity gives selective bZIP-binding peptides. Nature, 458, 859–64.
Haliloglu, T. & Erman, B. (2009). Analysis of correlations between energy and residue fluctuations in native proteins and determination of specific sites for binding. Phys. Rev. Lett., 102, 088103.
Haliloglu, T., Gul, A. & Erman, B. (2010). Predicting important residues and interaction pathways in proteins using Gaussian network model: binding and stability of HLA proteins. PLoS Comput Biol, 6, e1000845.
Han, J.D.J., Bertin, N., Hao, T., Goldberg, D.S., Berriz, G.F., Zhang, L.V., Dupuy, D., Walhout, A.J.M., Cusick, M.E., Roth, F.P. & Vidal, M. (2004). Evidence for dynamically organized modularity in the yeast protein-protein interaction network. Nature, 430, 88–93.
Hancock, R.E.W. & Sahl, H.G. (2006). Antimicrobial and host-defense peptides as new anti-infective therapeutic strategies. Nat Biotechnol, 24, 1551–7.
He, Y., Chen, Y., Alexander, P., Bryan, P.N. & Orban, J. (2008). NMR structures of two designed proteins with high sequence identity but different fold and function. Proceedings
Honeyman, M.C., Brusic, V., Stone, N.L. & Harrison, L.C. (1998). Neural network-based prediction of candidate T-cell epitopes. Nat Biotechnol, 16, 966–9.
Hopkins, A.L. & Groom, C.R. (2002). The druggable genome. Nat Rev Drug Discov, 1, 727–30.
Huang, H., Li, L., Wu, C., Schibli, D., Colwill, K., Ma, S., Li, C., Roy, P., Ho, K., Songyang, Z., Pawson, T., Gao, Y. & Li, S.S.C. (2008). Defining the specificity space of the human Src homology 2 domain. Mol Cell Proteomics, 7, 768–84.
Janin, J., Henrick, K., Moult, J., Eyck, L.T., Sternberg, M.J.E., Vajda, S., Vakser, I., Wodak, S.J. & Critical Assessment of PRedicted Interactions (2003). CAPRI: a critical assessment of predicted interactions. Proteins, 52, 2–9.
Jiang, L., Althoff, E.A., Clemente, F.R., Doyle, L., Rothlisberger, D., Zanghellini, A., Gallaher, J.L., Betker, J.L., Tanaka, F., Barbas, C.F., Hilvert, D., Houk, K.N., Stoddard, B.L. & Baker, D. (2008). De novo computational design of retro-aldol enzymes. Science, 319, 1387–1391.
Jochim, A.L. & Arora, P.S. (2009). Assessment of helical interfaces in protein-protein interactions. Mol Biosyst, 5, 924–6.
Jones, S. & Thornton, J.M. (1996). Principles of protein-protein interactions. Proceedings
Kendrew, J., Bodo, G., Dintzis, H., Parrish, R. & Wyckoff, H. (1958). A three-dimensional model of the myoglobin molecule obtained by x-ray analysis. Nature.