Drug Design
Methodology, Concepts, and Mode-of-Action
With 494 Figures and 44 Tables
Gerhard Klebe
Institute of Pharmaceutical Chemistry
Philipps-University Marburg
Marburg, Germany
Translator
Leila Telan
Düsseldorf, Germany
ISBN 978-3-642-17906-8 ISBN 978-3-642-17907-5 (eBook)
ISBN 978-3-642-17908-2 (print and electronic bundle)
DOI 10.1007/978-3-642-17907-5
Springer Heidelberg New York Dordrecht London
This work is based on the second edition of “Wirkstoffdesign”, by Gerhard Klebe, published
by Spektrum Akademischer Verlag 2009, ISBN: 978-3-8274-2046-6
Library of Congress Control Number: 2013933987
© Springer-Verlag Berlin Heidelberg 2013
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or
information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts
in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being
entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication
of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the
Publisher’s location, in its current version, and permission for use must always be obtained from
Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center.
Violations are liable to prosecution under the respective Copyright Law.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt
from the relevant protective laws and regulations and therefore free for general use.
While the advice and information in this book are believed to be true and accurate at the date of
publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for
any errors or omissions that may be made. The publisher makes no warranty, express or implied, with
respect to the material contained herein.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
Preface
The present handbook on drug design builds on the German version first written by
Hans-Joachim Böhm, Hugo Kubinyi, and me in 1996. After 12 years of success on
the market, the German version of this handbook was entirely rewritten and
significantly extended, this time with me as the sole author. The new edition pays
particular attention to novel approaches in drug discovery and to the many successful
examples reported in the literature on structure-based drug design and mode-of-action
analysis. This new version appeared on the German market in 2009. Several attempts were
made to translate the book into English to make it available to a wider audience.
This intention was driven by the fact that the author was repeatedly asked
why such a successful book was not available in English. An analysis of the textbook
market showed that no similar compendium covering the same field was (and still is)
available. Finally, Springer agreed to the translation project, and Dr. Leila Telan,
a gifted bilingual medicinal chemist and physician, was willing to take on the task of
producing a first draft of a cover-to-cover translation of the German original. This
version was corrected, and some chapters were extended, by the author. The book is meant
for students of chemistry, pharmacy, biochemistry, biology, chemical biology, and
medicine interested in the design of new active agents and the structural foundations
of drug action. It is also tailored to experts in the pharmaceutical industry who want to
obtain a more comprehensive overview of various aspects of the drug discovery
process.
Such a book project would not have been possible without the help of many
friends and colleagues. First of all, I want to express my sincere thanks to Dr. Leila
Telan, Düsseldorf, Germany, who produced the first version of this translation. Her
version and the modifications of the author have been carefully proofread by many
colleagues in the field. Their help is highly appreciated. Furthermore, I would like
to acknowledge the help of Prof. Dr. Hugo Kubinyi, Heidelberg, Germany, who
assisted in correcting the first version of the English translation. Particular thanks
go to Dr. Simon Cottrell, Cambridge, England, and to Dr. Nathan Kilah, Hobart,
Tasmania, Australia, for their excellent and very thorough proofreading of the
different chapters. The project was ideally guided by Dr. Daniel Quinones and
Dr. Sylvia Blago, Springer, Heidelberg, Germany. The author is grateful to the
publisher for their assistance and technical support in producing the electronic and
printed version of this handbook.
Marburg, Germany, May 2013 Gerhard Klebe
Contents
Part I Fundamentals in Drug Research
1 Drug Research: Yesterday, Today, and Tomorrow
2 In the Beginning, There Was Serendipity
3 Classical Drug Research
4 Protein–Ligand Interactions as the Basis for Drug Action
5 Optical Activity and Biological Effect
Part II The Search for the Lead Structure
6 The Classical Search for Lead Structures
7 Screening Technologies for Lead Structure Discovery
8 Optimization of Lead Structures
9 Designing Prodrugs
10 Peptidomimetics
Part III Experimental and Theoretical Methods
11 Combinatorics: Chemistry with Big Numbers
12 Gene Technology in Drug Research
13 Experimental Methods of Structure Determination
14 Three-Dimensional Structure of Biomolecules
15 Molecular Modeling
16 Conformational Analysis
Part IV Structure–Activity Relationships and Design Approaches
17 Pharmacophore Hypotheses and Molecular Comparisons
18 Quantitative Structure–Activity Relationships
19 From In Vitro to In Vivo: Optimization of ADME and Toxicology Properties
20 Protein Modeling and Structure-Based Drug Design
21 A Case Study: Structure-Based Inhibitor Design for tRNA-Guanine Transglycosylase
Part V Drugs and Drug Action: Successes of Structure-Based Design
22 How Drugs Act: Concepts for Therapy
23 Inhibitors of Hydrolases with an Acyl–Enzyme Intermediate
24 Aspartic Protease Inhibitors
25 Inhibitors of Hydrolyzing Metalloenzymes
26 Transferase Inhibitors
27 Oxidoreductase Inhibitors
28 Agonists and Antagonists of Nuclear Receptors
29 Agonists and Antagonists of Membrane-Bound Receptors
30 Ligands for Channels, Pores, and Transporters
31 Ligands for Surface Receptors
32 Biologicals: Peptides, Proteins, Nucleotides, and Macrolides as Drugs
Appendix
Illustration Source References
Name Index
Subject Index
Introduction
Drug design is a science, a technology, and an art all in one. An invention is the
result of a creative act, and a discovery is the detection of an already-existing
reality. Design encompasses the two processes with emphasis on a targeted
approach based on the available knowledge and technology. Furthermore, the
creativity and intuition of the researcher play a decisive role.
Drugs are all substances that affect a system by inducing a particular effect. In
the context of this book, drugs are substances that exhibit a biochemical or
pharmacological effect, in most cases medications, that achieve a therapeutic result
in humans.
The idea of rational drug design is not new. Organic compounds were prepared
more than a century ago with the goal of attaining new medicines. The sedatives
chloral hydrate (1869) and urethane (1885), and the antipyretics phenacetin
(1888) and acetylsalicylic acid (1897) are early examples of how compounds with
favorable therapeutic properties can be made in a targeted way by starting from
a working hypothesis. The fact that the hypotheses in all four cases were more or
less incorrect (▶ Sects. 2.1, ▶ 2.2, and ▶ 3.1) simultaneously demonstrates one of
the main problems of drug design.
In the case of the artistic design of a poster or commodity, or, in the case of
engineering, the design of an automobile, a computer, or a machine, the result is
usually predictable. In contrast, the design of a drug is even today not completely
foreseeable. The consequences of the smallest structural changes of a drug on its
biological properties and target tissue are too multifaceted and at present too poorly
understood.
Until modern times, scientists worked on the principle of trial and error to
find new medicines. In doing so, they derived mostly empirical rules that have
contributed to a knowledge base for rational drug design, which individual researchers
have translated more or less successfully into practice. Today new technologies
are available for drug research, for instance, combinatorial chemistry, gene
technology, automated high-throughput screening methods, protein crystallography,
fragment screening, virtual screening, and the application of bio-
and chemoinformatics.
In many cases the molecular mechanisms of the mode of action of medicines are
fairly well understood, but in other cases we are at the threshold of comprehension.
Many of these mechanisms will be discussed in this book. Progress in protein
crystallography and NMR spectroscopy allows the determination of the three-
dimensional structure of protein–ligand complexes on a routine basis. As is
shown in many of the illustrations in this book (for a general explanation of
“reading” these illustrations, see the appendix at the end of this book), these
structures make a decisive contribution to the targeted design of drugs. 3D structures
with up to atomic resolution are known for approximately 550,000 small
molecules and more than 85,000 proteins and protein–ligand complexes, and the
numbers are increasing exponentially. Methods for the prediction of the 3D struc-
tures of small molecules are now mature, and semiempirical and ab initio quantum
chemical calculations on drugs are now routinely performed. The sequencing of the
human genome is complete, and the genomes of other organisms are reported nearly
every week, including those of important human pathogens. The age of structural
genomics has begun, and it is only a matter of time before the 3D structures of entire
gene families are available. Given enough sequence homology, modeling programs
can nowadays achieve an impressive reliability. In the meantime, the composition of
entire genomes is being processed with structure-prediction programs. There are
already interesting approaches for the de novo prediction of 3D protein structures,
and the first correct 3D structural predictions have been successfully accomplished.
Structure-based and computer-aided design of new drugs is here to stay in
practical drug research. Computer programs serve the search for, modeling of,
and targeted design of new drugs. In countless cases these techniques have assisted
the discovery and optimization of new drugs. On the other hand, a too-strict and
one-sided focus on the computational results bears the danger of losing sight of the
available knowledge of the relationship between the chemical structure and bio-
logical activity. Another danger is the limited consideration of an active agent only
with respect to its interaction with one single target without considering the other
essential requirements for a drug, for instance, the pharmacokinetic and toxicolog-
ical properties. In the last decade, intensive research effort has gone into the
compilation of empirical guidelines to predict bioavailability, toxicological pro-
files, and metabolic properties (ADME parameters). The ability to predict the
metabolic profile for a given xenobiotic by the arsenal of cytochrome P450
enzymes or to predict for each individual patient the metabolic peculiarities is
still a dream. Nonetheless, just such an individually adjusted therapy and dosing
regime is within the realm of possibilities. It is also conceivable that in the
foreseeable future, gene sequencing of each of us will be financially feasible and
will require a manageable and justifiable amount of time and effort. This will open
entirely new perspectives for drug research. Whether this pushes open the gate to
individualized personal medicines will be a question of cost. The aim of this book
is to introduce the methods required for drug design, particularly those based on
structural and mechanistic evidence. Using well-selected examples, the route to the
discovery and development of new medicines is discussed and considered
in light of constantly changing conditions.
Drug research is a multidisciplinary field in which chemists, pharmacists,
technologists, molecular biologists, biochemists, pharmacologists, toxicologists,
and clinicians work together to pave the way for a substance to become
a therapeutic. Because of this, the majority of drug development is carried out in an
industrial setting. It is only there that the financial requirements and structural
organization are in place to allow a successful cooperation of all disciplines that are
necessary to channel the research in the required manner toward a common goal.
The fundamentals and future-oriented innovations of drug research are, however,
increasingly being established in academia. Interestingly, an increasing number of
research activities at universities have recently been devoted to drug development
for infectious diseases and for diseases that particularly afflict developing
countries, which have been sorely neglected by the profit-oriented pharmaceutical
industry of the industrialized world. This is even more alarming when we consider
that our improved quality of life and prolonged life expectancy are attributable to,
above all else, a victory over devastating infectious diseases. We can only hope
that politicians recognize this situation in time and make the resources and
organizational infrastructure available so that the academic research groups can
step into the breach in an efficient and goal-oriented way.
The rising costs of research and development, an already high standard of health
care in many indications, and distinctly increased safety awareness and the
concomitant demanding standards of the regulatory authorities have caused the number
of new chemical entities (NCE) to decrease steadily over the last decades: from
70–100 per year from 1960 to 1969, to 60–70 from 1970 to 1979, to an average of 50
between 1980 and 1989, to 40–45 in the 1990s, and even fewer in the new millennium.
Despite this, there have still been new developments, and distinct progress
has been made in the therapy of, for example, psychiatric diseases, arterial
hypertension, gastrointestinal ulcers, and leukemia, in addition to the broadening of
indications for older compounds. Among the blockbusters of recent years,
a disproportionately large percentage was found by using a rational approach.
The cost of developing and launching a new drug has increased continuously; it is
currently between US $800 million and $1,600 million. Only large pharmaceutical companies
can still afford these costs, with the associated risk of failure in the last phases of
clinical trials, or a misjudgment of the therapeutic potential of a new drug.
There is talk nowadays of a paradigm shift in pharmaceutical research. In
research this refers to the use of new technologies; in the market place this refers
to a concentration process of corporate mergers and acquisitions. The last decade
brought about many such “mega-mergers.” Larger and larger sales figures are being
achieved by fewer and fewer companies. In parallel to this, a very dynamic and
hardly insignificant scene has developed of small- to medium-sized, highly flexible
biotech companies. The areas of gene technology, combinatorial chemistry, sub-
stance profiling, and rational design are particularly well represented in numerous
such companies. Larger companies try to outsource their riskier research concepts
to these companies and contract their services for everything up to the development
of clinical candidates. However, the success of this scene has led to the result that
the “good” companies have been swallowed by the “big” companies. Many former
employees of “big pharma” have established their own small companies with an
innovative idea. If the idea was good and successful, after a few years these
innovators find themselves once again incorporated into the organization of
a “big pharma” company.
At the same time the prescribing practices in all areas of health care have
changed. Formerly it was the physician alone, occasionally in consultation with
a pharmacist, who was responsible for the pharmacological therapy of the patient.
Today cost-cutting measures, “negative lists,” health insurance, the purchasing
departments of hospitals and pharmacies, the ubiquitous Internet, and even public
opinion influence therapies to an ever larger extent.
The drug market, worth US $600 billion, is extremely attractive.
Furthermore, it is characterized by dynamic growth that is decidedly
stronger than in other markets. The best-selling drug in 2005, Lipitor® (Sortis® in
Europe; atorvastatin), achieved US $12.2 billion in annual sales. Only illegal
narcotics like heroin and cocaine have higher sales figures.
Tailored medications – Will the latest technologies really deliver on this prom-
ise? What makes drug research so difficult? To use a parable, it is something like
playing against an almighty chess computer. The rules are known to both sides, but
it is very difficult to comprehend the consequences of each individual move during
a complicated middle game. A biological organism is an extremely complicated
system. The effect of a drug on the system and the effect of the system on the drug
are multifaceted. Every structural change made with the goal of optimizing one
particular characteristic simultaneously changes the finely tuned equilibrium of the
other characteristics of the drug.
The knowledge of the interplay between the chemical structure and the biolog-
ical effect must be united with the newest technology and results of genetic research
to purposefully develop new medicines. It is also necessary to define the range of
applications and the limitations of new technologies. Theory and modeling cannot
exist detached from experiment. The results of calculations depend strongly on the
boundary parameters of the simulation. The results obtained for one system are only
conditionally transferable to other systems. Only an experienced specialist is in
a position to fully exploit the special potential of theoretical approaches. The claims
that some software and venture capital companies make, that their results automatically
lead to success, should be considered with some skepticism. This book should
be helpful in these situations too, to separate the wheat from the chaff and to
identify the application range of these methods as well as their limitations.
This book is about drug research and the mode of action of medicines. It is
different from classical textbooks on pharmaceutical chemistry in its structure and
goals. The principles, methods, and problems associated with the search for new
medicines are the themes. Classes of drugs are not discussed, but rather the way that
these drugs were discovered and some insights into the structural requirements for
their action on a particular target protein. As the title suggests, the book is meant for
students of chemistry, pharmacy, biochemistry, biology, and medicine who are
interested in the art of designing new medicines and the structural fundamentals of
how drugs act on their targets.
In the first section, after an introduction to the history of medicines and the
concept of serendipity as an unpredictable but always very successful concept in
drug research, examples from classical drug research will be presented.
A discussion about the fundamentals of drug action, the ligand–receptor interaction,
and the influence of the three-dimensional structure on the efficacy of a drug round
the section out. In the second section, the search for lead structures and their
optimization and the use of prodrug strategies are introduced. New screening
technologies but also the systematic modification of structures by using the concept
of bioisosteres and a peptidomimetic approach are discussed. In the third section,
experimental and theoretical methods applied in drug research are described.
Combinatorial chemistry has afforded access to a wide variety of test substances.
Gene technology has produced the target proteins in their pure form, and has helped
to characterize these proteins’ properties and function from the molecular level to
the cellular assembly, all the way to the organism level. It has built a bridge between
understanding the effects of a drug therapy on the complex microstructure of a cell
and in systems biology of an organism. The spatial structures of proteins and
protein–ligand complexes are accessible through NMR spectroscopy and X-ray
crystallography. Their structural principles are becoming better understood and are
increasingly allowing us access to the binding geometry of the drugs. Computational
methods such as conformational analysis and molecular dynamics simulations
have also sharpened our understanding of targeted drug design. The fourth section
introduces design techniques such as pharmacophore and receptor modeling, and
discusses the methods of, and uses for, quantitative structure–activity relationships
(QSAR). Insights into the transport and distribution of drugs in biological systems
are given, and different techniques for structure-based design are presented.
A drug-design case study from the author’s research closes the section. The fifth section of
this book focuses on the core question of pharmacology: How do drugs actually work?
Enzymes, receptors, channels, transporters, and surface proteins are each treated in
individual chapters and discussed as groups of target proteins. The spatial structure
of the protein and modes of action are used to elucidate in detail why a drug works
and why it must exhibit a particular geometry and structure to work. By way of example,
the contributions of structure-based and computer-aided design to the discovery of
new drugs are presented in these chapters, and other aspects are also brought into the
spotlight.
Because of the concept of this book, many important drugs are not considered or
are only fleetingly mentioned. The same is true of receptor theory, pharmacokinet-
ics and metabolism, the basics of gene technology, and statistical methods. The
biochemical, molecular biological, and pharmacological fundamentals of the mode
of action of drugs, which are important for the understanding of the theme of drug
design, are only commented upon in outline form. Other disciplines that are critical
for the development of an active substance to a medicine and application to
patients, such as pharmaceutical formulations, toxicological testing, and clinical
trials, are not themes that are covered in this book.
The selection of examples from therapeutic areas was made subjectively and for
didactic reasons based on case studies and to bring other aspects of drug research to
the foreground. A balanced presentation of the methods of drug design and their
practical application was attempted. The interested reader does not have to read the
book in sequence. If the reader’s interest is purely in drugs and their mode of
action, then they can also begin with ▶ Chap. 22. There are many cross references
in the text to help the reader find the passages in other parts of the book that are
necessary for an exact comprehension of what is being discussed at any given point.
The references and literature suggestions that follow cite particularly recommendable
monographs and are ordered alphabetically; journals and series on the themes
that are discussed in later chapters are not mentioned specifically again.
Literature
Monographs
Brunton L, Lazo J, Parker K (2005) Goodman & Gilman’s the pharmacological basis of therapeutics, 11th edn. McGraw-Hill, Europe
Ganellin CR, Roberts SM (eds) (1993) Medicinal chemistry. The role of organic chemistry in drug research, 2nd edn. Academic Press, London
King FD (ed) (2003) Medicinal chemistry: principles and practice, 2nd edn. The Royal Society of Chemistry, Cambridge
Krogsgaard-Larsen P, Bundgaard H (eds) (1991) A textbook of drug design and development. Harwood Academic Publishers, Chur, Switzerland
Lednicer D (ed) (1993) Chronicles of drug discovery, vol 3. American Chemical Society, Washington, DC and earlier volumes from this series
Lemke TL, Williams DA (2008) Foye’s principles of medicinal chemistry, 6th edn. Williams & Wilkins, Baltimore
Mannhold R, Kubinyi H, Folkers G (eds) Methods and principles in medicinal chemistry. Wiley-VCH, Weinheim, Series with Guest Editors
Maxwell RA, Eckhardt SB (1990) Drug discovery. A casebook and analysis. Humana Press, Clifton
Mutschler E, Derendorf H (1995) Drug action, basic principles and therapeutic aspects. CRC Press, Boca Raton/Ann Arbor/London/Tokyo
Silverman RB (2004) The organic chemistry of drug design and drug action, 2nd edn. Elsevier/Academic Press, Burlington
Wermuth CG, Koga N, König H, Metcalf BW (eds) (1992) Medicinal chemistry for the 21st century. Blackwell Scientific, Oxford
Journals and Series
Annual Reports in Medicinal Chemistry
Chemistry & Biology
ChemMedChem
Drug Discovery Today
Drug News and Perspectives
Journal of Computer-Aided Molecular Design
Journal of Medicinal Chemistry
Methods and Principles in Medicinal Chemistry
Nature
Nature Reviews Drug Discovery
Perspectives in Drug Discovery and Design
Pharmacochemistry Library
Progress in Drug Research
Quantitative Structure-Activity Relationships
Reviews in Computational Chemistry
Science
Scientific American
Trends in Pharmacological Sciences
Nowadays the Internet, discussion platforms, and the tremendously valuable tool of Wikipedia are
available to everyone and provide access to an enormous source of information.
Part I
Fundamentals in Drug Research
This colored copper plate engraving from arguably the most beautiful plant book,
the Hortus Eystettensis by Basilius Besler, Eichstätt, 1613, shows the squill, Scilla
alba (modern name: Urginea maritima L.). This plant was known to the ancient
Egyptians, Greeks, and Romans as a remedy for many ailments, but especially
dropsy (today: congestive heart failure). It was venerated faithfully as a general
defense against harm. It was not until our century that the active components of
squill, the glycosides scillaren and proscillaridin, were isolated in their pure form,
and a derivative with improved bioavailability, meproscillarin (Clift®), became
available for pharmaceutical therapy.
1 Drug Research: Yesterday, Today, and Tomorrow
The targeted route to medicines is an old dream of humanity. Even the alchemists
sought after the Elixir, the Arcanum that was meant to heal all disease. It still has
not been found today. On the contrary, drug therapy has become even more
complicated as our knowledge of the different disease etiologies has become
more complex.
Nonetheless, the success of drug research is impressive. For hundreds of years,
alcohol, opium, and Solanaceae alkaloids (from thorn apples) were the only preparatory
measures for surgery. Today general anesthesia, neuroleptanalgesia, and
local anesthetics allow absolutely pain-free surgical and dental procedures to be
carried out. Until this century, plagues and infectious diseases killed more
people than all wars. Today, thanks to hygiene, vaccines, chemotherapeutics, and
antibiotics, these diseases have been suppressed, at least in industrialized countries.
The dangerously increasing numbers of therapy-resistant bacterial and viral pathogens
(e.g., those causing tuberculosis) have presented new problems and make the development
of new medications urgently necessary. The H2-receptor inhibitors and proton-pump
inhibitors have drastically reduced the number of surgical procedures to treat
gastric and duodenal ulcers. Combinations of these inhibitors with antibiotics have
brought even more advances in that they allow a causal therapy (▶ Sect. 3.5).
Cardiovascular diseases, diabetes, and psychiatric diseases (diseases of the central
nervous system, CNS) are treated mostly symptomatically, that is, the cause of the
disease is not addressed, but rather the negative effects of the disease on the
organism. Often the therapy is limited to slowing the progression of these diseases
or increasing the quality of life. Synthetic corticosteroids have led to significant
pain reduction and retardation of the pathological bone degeneration associated
with chronic inflammatory diseases (e.g., rheumatoid and chronic polyarthritis).
The spectrum of cancer therapy ranges from healing, particularly in combination
with surgical and radiation therapy, all the way to complete failure of all therapeutic
measures.
The history of drug research can be divided into several sequential phases:
• the beginning, when empirical methods were the only source of new medicines,
• targeted isolation of active compounds from plants,
G. Klebe, Drug Design, DOI 10.1007/978-3-642-17907-5_1, © Springer-Verlag Berlin Heidelberg 2013
• the beginning of a systematic search for new synthetic materials with biological
effects and the introduction of animal models as surrogates for patients,
• the use of molecular and other in vitro test systems as precise models and as
a replacement for animal experiments,
• the introduction of experimental and theoretical methods such as protein
crystallography, molecular modeling, and quantitative structure–activity rela-
tionships for the targeted structure-based and computer-supported design of
drugs, and
• the discoveries of new targets and the validation of their therapeutic value
through genomic, transcriptomic, and proteomic analysis, knock-in and knock-
out animal models, and gene silencing with siRNA.
Each preceding phase loses its importance with the arrival of the next phase.
Interestingly, in modern drug research individual phases run in the opposite direc-
tion. That is, first a target structure is discovered in the sequenced genome of an
organism and its function is modulated to validate it as a candidate for drug therapy.
Then the structure-based and computer-aided design of an active substance is
undertaken in close cooperation with multiple in vitro tests to clarify the activity
and the activity spectrum. Next, the animal experiments substantiate the clinical
relevance, and in the final step clinical trials confirm a test substance’s suitability as
a medicine for patients.
1.1 It All Began with Traditional Medicines
The beginnings of drug therapy can be found in traditional medicines. The narcotic
effect of the milk of the poppy, the use of autumn crocus (Colchicum autumnale)
for gout, and the diuretic effect of squill (Urginea maritima) for dropsy (today:
congestive heart failure) have been known since antiquity. The dried herbs and
extracts from these and other plants have served as the most important source of
medicines for more than 5,000 years. The oldest written records of these uses are
from 3000 BC.
Around 1550 BC the ancient Egyptian Papyrus Ebers listed approximately 800
prescriptions, of which many contained additional rituals to invoke the help of the
gods. The five-volume book De Materia Medica of Dioskurides (Greek physician,
first century AD) is the most scientifically rigorous work of antiquity. It contains
descriptions of 800 medicinal plants, 100 animal products, and 90 minerals. Its
influence reached into the late Arabic medicine and the early modern age.
The most famous medicine of antiquity was undoubtedly Theriac. Its precursor,
Mithridatum, served the King of Pontus, Mithridates VI (120–63 BC) as an antidote
for poisonings of all kinds. Theriac can be traced to Andromachus, the private
physician of the emperor Nero, and originally contained 64 ingredients. This
preparation remained very widespread even into the eighteenth century. It was
prepared in many variations with up to 100 ingredients. In some cities it was even
prepared under state control to ensure that none of the ingredients were left out! Its
use evolved into a panacea for all diseases. In addition, every imaginable wonder
drug was in use; examples include earthworm oil, unicorn powder, gastric
calculus stones, human cranium powder (Lat. cranium, skull), mummy dust, and
many more.
Traditional Chinese medicine was very advanced even in ancient times.
A special feature of its formulations was, and is, that four components of different
quality are responsible for the effect. The chief (jun) is the carrier of the effect;
the adjutant (chen) supports the effect or induces a different effect. The assistant
(zuo) can also support the main effect or can serve to ameliorate side effects, and
one or more messengers (shi) moderate the desired effect. The Chinese Pen-Ts’ao
school (first and second century AD), whose goal it was to live for as long as
possible without aging (!), recommended the following dosing regime:
When treating a disease with a medicine, if a strong effect is desired, one should begin with
a dose that is not larger than a grain of millet. If the disease is healed, no more medicine
should be given. If the disease is not healed, the dose should be doubled. If that does not
heal the disease, the dose should be increased tenfold. When the disease is healed, the
therapy should always be discontinued.
The Chinese Materia Medica published by Li Shizhen in 1590 is made up of
52 volumes. It contains almost 1,900 medical principles, plants, insects, animals,
and minerals incorporated into 10,000 detailed recipes for their preparation. The
Chinese Pharmacopeia from 1990 contains only two volumes. One of those
volumes contains 784 traditional medicines; the other contains 967 medications
from “Western” medicine.
Paracelsus (born Theophrastus Bombastus von Hohenheim; 1493/1494–1541)
made a great breakthrough for scientific medical research. He understood the
human to be a “chemical laboratory” and held the ingredients of drugs themselves,
the Quinta essentia, responsible for their healing effects. Despite this, up until
the beginning of the nineteenth century all therapeutic principles were based on
extracts from plants, animal ingredients, or minerals; only in the rarest
cases were pure organic compounds used. That changed fundamentally with the
advent of organic chemistry. The great age of natural products from plants (for
examples see 1.1–1.9, Fig. 1.1) and of the active substances that were derived from
them had begun. Premature hopes that were invested in some of these substances
around the turn of the previous century, for example in heroin (▶ Sect. 3.3) or
cocaine (▶ Sect. 3.4), were very quickly dashed, but natural products from
plants established the foundations of, and still form an exceedingly large part of, our
modern pharmacy. Natural products and their analogues and derivatives are also
well represented among the best-selling drugs today.
1.2 Animal Experiments as a Starting Point for Drug Research
The wealth of experience gained by traditional medicine is based on many thousands
of years of sometimes accidental, sometimes intentional observations of
therapeutic effects on humans. Planned investigations on animals were relatively
rare. The biophysical experiment of Luigi Galvani, an anatomy professor in
Fig. 1.1 Many important natural products were isolated in the nineteenth century, and a few were
synthesized. Morphine 1.1 was isolated from opium by Friedrich Wilhelm Adam Sertürner in
1806, caffeine 1.2 was isolated from coffee, and quinine 1.3 was isolated from cinchona bark by
Friedlieb Runge in 1819. Quinine was discovered independently by Pierre Joseph Pelletier and
Joseph Bienaimé Caventou, who one year later isolated colchicine 1.4 from autumn crocus.
Cocaine 1.5 was extracted from coca leaves by Albert Niemann in 1860, and ephedrine 1.6 was
extracted from the Chinese plant Ma Huang (Ephedra vulgaris) by Nagayoshi Nagai. In 1886 the
first alkaloid, coniine 1.7, which is found in hemlock, was synthesized by Albert Ladenburg; in
1901 atropine 1.8 from Deadly Nightshade was synthesized by Richard Willstätter. Reserpine 1.9,
from Rauwolfia serpentina, was first prepared in the middle of the twentieth century, and its
structure was elucidated.
Bologna, which was first described in his book De viribus electricitatis in motu
musculari in 1791, has become famous. As early as 1780 his students had observed
that frog legs twitched when the nerve was dissected while a static electricity
generator was in use; such devices were standard equipment in many laboratories
at the time. He wanted to demonstrate in standardized
experiments whether the twitching was also caused by thunderstorms. He hung the
legs on an iron window grill with a copper hook, and they twitched simply upon
contact with the grill. The voltage difference between the two metals was enough to
stimulate the nerve, even without an electrical discharge.
The systematic investigation of the biological effects in animals of plant
extracts, animal venoms, and synthetic substances began in the nineteenth century.
In 1847 the first pharmacology department was founded at the Imperial University
in Dorpat (today: Tartu, Estonia). The famous pharmacologist Sir James W. Black,
who developed the first β-blocker (an antihypertensive, ▶ Sect. 29.3) at ICI and
later took part in the development of the first H2 antagonists (see gastrointestinal
ulcer medications, ▶ Sect. 3.5) at Smith, Kline & French, compared pharmacological
testing to a prism: what pharmacologists see in their substances’ properties
depends directly on the model that was used to test the substances.
Just as a prism would, the models distort our vision in different ways. There is no
such thing as a depressed rabbit or a schizophrenic rat. Even if there were such
animals, they would not be able to share their subjective perceptions and emotions
with us. Gene-modified animals (▶ Sect. 12.5), such as the Alzheimer mouse, are also
approximations of reality that have been distorted through a different prism, to use
Black’s analogy. This fact is often underestimated in industrial practice. Scientists
tend to optimize their experiments on a particular, isolated model. In doing so,
many factors and characteristics that are essential for a medicine, for instance
selectivity or bioavailability, are inadequately considered.
There is no way out of this dilemma. We need simple in vitro models (Sect. 1.5)
to be able to test large series of potentially active compounds, and we need the
animal models to correlate the data and to make predictions about the therapeutic
effects on humans. In the past, therapeutic progress was preferentially achieved
when a new in vivo or in vitro pharmacological model was available for a new effect
(see the H2 receptor antagonists, ▶ Sect. 3.5).
Typical mistakes in the selection of models and interpretation and comparison of
experimental results arise from different modes of application and the correlation of
results obtained in different species of animals. It does not make sense to optimize
the therapeutic range of a substance in one species, and the toxicology in another.
Further, comparing effects after a fixed dose without determining an effective dose
also distorts the results because very potent and very weak substances fall outside the
measurement range. Measuring the effect strictly according to a fixed schedule is also
questionable because neither the latency period, that is, the time before an effect is
seen, nor the time of maximum biological effect is recorded. In whole-animal
models, auxiliary medications are usually applied, which can also influence the
experimental results. Anesthetized animals often give entirely different results from
conscious animals.
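The fixed-dose problem can be made concrete with a small numerical sketch. It is an illustration only: the Hill-type dose–response curve, the five compounds A to E, their ED50 values, and the fixed dose of 10 mg/kg are all assumptions chosen for this example and are not data from any experiment.

```python
def effect_percent(dose, ed50, hill=1.0):
    """Percent of maximal effect from a Hill-type dose-response curve."""
    return 100.0 * dose**hill / (ed50**hill + dose**hill)


# Hypothetical compounds with illustrative ED50 values in mg/kg.
ED50_MG_PER_KG = {"A": 0.01, "B": 0.1, "C": 10.0, "D": 1000.0, "E": 10000.0}

FIXED_DOSE = 10.0  # mg/kg, the same dose given to every animal (assumed)


def fixed_dose_readout(ed50_by_name, dose=FIXED_DOSE):
    """Effect (in percent of maximum) of each compound at one fixed dose."""
    return {name: effect_percent(dose, ed50) for name, ed50 in ed50_by_name.items()}


if __name__ == "__main__":
    for name, effect in fixed_dose_readout(ED50_MG_PER_KG).items():
        print(f"{name}: ED50 = {ED50_MG_PER_KG[name]:>8g} mg/kg, "
              f"effect at {FIXED_DOSE:g} mg/kg = {effect:5.1f} %")
```

Under these assumptions, compounds A and B differ tenfold in potency, yet both give more than 99 % of the maximal effect at the fixed dose and are practically indistinguishable. D and E likewise both remain below 1 %. Only compound C, whose ED50 happens to equal the chosen dose, falls inside the measurement range. Determining the effective dose for each compound avoids this loss of information.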
1.3 The Battle Against Infectious Disease
Plagues and infectious diseases, with malaria and tuberculosis at the top of the
list, have killed more people over the ages than all of the wars in the history of
humanity. Twenty-two million people died during the first wave of the 1918
influenza (“Spanish flu”). Up until the middle of the twentieth century, millions
of people died every year of malaria, and unfortunately, today these numbers are
shooting up again (▶ Sect. 3.2). Until the turn of the twentieth century, ipecac
(Psychotria ipecacuanha) and cinchona (Cinchona officinalis L.) were the only
therapeutic approaches to this disease. The impressive successes in the fight against
plagues came in large part from the last 80 years of drug research. We have the
sulfonamides (▶ Sect. 2.3) and their combinations with dihydrofolate reductase
inhibitors (▶ Sect. 27.2), the antibiotics (▶ Sects. 2.4, ▶ 6.4, and ▶ 32.6), and
the synthetic tuberculostatic medicines (▶ Sect. 6.5) to thank for this. When
Selman A. Waksman (1888–1973) received the Nobel Prize for the discovery of
streptomycin (▶ Sect. 6.4), a little girl congratulated him with a bouquet of
flowers. She was the first patient with meningeal tuberculosis to be healed with
streptomycin. Today we can no longer appreciate the atmosphere in a tuberculosis
hospital from our own experience, but only from Thomas Mann’s The Magic
Mountain (German: Der Zauberberg).
However, the infectious diseases, including tuberculosis, are on the advance
again. In the past many antibiotics were too broadly used. This and the spread of
resistant pathogens in hospitals have led to the situation that many cases are only
treatable with very specific antibiotics. If resistance develops to these antibiotics, all
of our weapons are blunted. New viral infections are looming. Before the advent of the
immune disease AIDS (acquired immune deficiency syndrome) there were very
few cases of pneumonia from the fungus Pneumocystis jirovecii (formerly
Pneumocystis carinii); nowadays the numbers have increased tremendously. This
type of pneumonia is the primary cause of death of AIDS patients and
immunosuppressed patients after organ transplantation. A great effort has been
made to find drugs for AIDS and its complications. On the other hand, many
widespread tropical diseases, for instance malaria and Chagas disease, have been
inadequately researched, and expanding resistance to the currently available med-
ications represents an increasing worldwide problem. Because these diseases are
rampant in parts of the world where people lack the economic resources to finance
chemotherapy, more and more pharmaceutical companies have withdrawn from
these research areas for financial reasons. The chances of recovering the development
costs from this social stratum are poor. Here global politics must establish
structures so that these people are able to benefit from the technological
progress made by modern drug research. An example of this is the Bill and Melinda
Gates Foundation, which is dedicated to the treatment and eradication of diseases
around the entire world, but with particular emphasis on developing countries.
Improved hygiene has also helped to reduce the risk of infection, for
instance traumatic fever or Shigella dysentery (discussed in ▶ Chap. 21, “A Case
Study: Structure-Based Inhibitor Design for tRNA-Guanine Transglycosylase”).
Above all else, it was the vaccines that contributed to the eradication of many
infectious diseases. Now as before, hopes rest on new and combined vaccines for
the prevention of AIDS, malaria, and gastrointestinal ulcers, the latter of which we
now know to be caused by the bacterium Helicobacter pylori (▶ Sect. 3.5).
1.4 Biological Concepts in Drug Research
Acetylcholine 1.10 (Fig. 1.2), which was synthesized in 1869 by Adolf v. Baeyer, is
a neurotransmitter, that is, a transfer agent for nerve impulses. In 1921 Otto
Loewi, a pharmacologist, proved its biological effect in an elegant experiment.
Two isolated frog hearts were perfused with the same solution. The vagal nerve of
one of the hearts was stimulated, leading to a slowing of the heart rate, a so-called
bradycardia. Shortly afterward, the second heart also began to beat more slowly,
which was a clear indication of a humoral (Lat. humor, umor, fluid) signal transfer.
Soon after that acetylcholine was recognized as the responsible “Vagus Stoff”.
Acetylcholine is itself not usable as a therapeutic because it is metabolized too
quickly by acetylcholine esterases (▶ Sect. 23.7).
In 1901 Thomas Bell Aldrich (1861–1938) and Jokichi Takamine isolated the
first human hormone, adrenaline 1.11 (Fig. 1.2). This hormone and its N-desmethyl
derivative, noradrenaline 1.12 (Fig. 1.2), are produced in a central location, the
adrenal glands, and are released under stress conditions into the entire system with
the exceptions of the CNS and the placenta, which have their own barriers against
most polar compounds. These substances cause different reactions in different parts
of the organism, where they react with the relevant receptors. The specificity is
poor, and a plethora of pharmacodynamic effects result: pulse and blood pressure
rise, and the organism is prepared for “flight” – which has been an exceedingly
important function over the course of evolution.
Noradrenaline and adrenaline (also called norepinephrine and epinephrine,
respectively) are also neurotransmitters (▶ Sect. 29.3), just like acetylcholine, the
biogenic amines 1.13–1.15, the amino acids 1.16–1.19, and peptides, such as 1.20
and 1.21 (Fig. 1.2). Neurotransmitters are produced locally in the nerve cells,
stored, and upon stimulation of the nerve, released. After interaction with receptors
on the neighboring nerve cell, they are quickly metabolized or taken up again by the
same neuron that released them. Depending on the name of the neurotransmitter,
one speaks of the adrenergic, cholinergic, and dopaminergic (etc.) systems. The
effect that adrenaline invokes is referred to as adrenergic, and an antagonist to this
system is called antiadrenergic. However, this nomenclature is not always strictly
adhered to. It is common to see combinations of the name of the neurotransmitter
with the term agonist or antagonist, or sometimes blocker instead of antagonist, for
instance a dopamine agonist, a histamine antagonist, or a β-blocker for antagonists
of β-adrenergic receptors. A plethora of drugs have arisen from the structural
variations of neurotransmitters.
At the end of the 1920s the steroid hormones were isolated, and their structures
were determined in short order (▶ Sect. 28.5). Altogether the discoveries of the
mid-twentieth century heralded the “golden age” of drug research. The systematic
variation of the principles responsible for biological activity and our increasing
knowledge of the mode of action have led to the synthesis of enzyme inhibitors and
receptor agonists and antagonists, which together with natural product derivatives
from plants make up the largest part of our modern pharmacy.
Fig. 1.2 The natural hormones and neurotransmitters acetylcholine 1.10, adrenaline 1.11,
noradrenaline 1.12, dopamine 1.13, histamine 1.14, and serotonin 1.15, the excitatory amino acids
glutamic acid 1.16 and aspartic acid 1.17, the inhibitory amino acids glycine 1.18 and
γ-aminobutyric acid (GABA) 1.19, and several peptides, such as the enkephalins 1.20 and 1.21,
substance P and others serve as lead structures for drugs for a variety of cardiovascular and CNS
diseases (see ▶ Chaps. 3, “Classical Drug Research”; ▶ 29, “Agonists and Antagonists of
Membrane-Bound Receptors”; and ▶ 30, “Ligands for Channels, Pores, and Transporters”).
1.5 In Vitro Models and Molecular Test Systems
Around 40 years ago, we began to think about testing substances in simple in vitro
models. With these models biological testing takes place in test tubes rather than
animals. There are many compelling reasons to avoid animal experiments. They
increasingly provoke public criticism and are time and cost intensive. In the
beginning cell culture models were preferentially employed, for example tumor
cell cultures for testing cytostatic therapies, or embryonic chicken heart cells for
cardio-active compounds. Later these were joined by receptor-binding studies. The
first molecular test models were enzyme-inhibitor assays in which the inhibitory
activity of a molecule could be evaluated on one particular target protein in the
absence of interfering side effects (▶ Chap. 7, “Screening Technologies for
Lead Structure Discovery”). With the progress of gene technology methods
(▶ Chap. 12, “Gene Technology in Drug Research”), not only is the preparation
of the enzyme simplified, but also receptor-binding studies can be carried out on
standardized materials. Today it is possible to achieve an exact evaluation of the
entire activity spectrum of any substance on any enzyme, receptors of all types and
subtypes, ion channels, and transporters. In the meantime, in industrial drug dis-
covery this procedure has become routine. Before biological screening begins, the
following questions have to be answered: what therapeutic goal should be achieved
and is this goal achievable? Therapeutic concepts are established based on the
pathophysiology and the causes of its alteration. Regulatory interventions with
drugs should re-establish the normal physiological conditions as closely as possible.
In doing so, a distinct problem occurs. Nature works on two orthogonal principles: the
specificity of the mode of action and a pronounced spatial separation of effects,
the compartmentalization. Adrenaline that is produced in the adrenal glands acts
on the entire body except for the brain. If it is released in the brain, it acts only in the
synapse between two nerve cells. As far as specificity goes, chemists can beat
nature most of the time, but when it comes to spatial separation they fail by a wide
margin.
Through the progress made in gene technology (▶ Chap. 12, “Gene Technology
in Drug Research”) we can investigate active substances much more exactly than
before; but by using isolated enzymes and binding studies we are a long way away
from the reality of animal models, and even further away from humans. In analogy
to the difference between an animal experiment and an isolated-organ experiment,
a well-established correlation between the results obtained in cell culture and an
in vitro test and the desired therapeutic effect is a prerequisite to successfully using
the in vitro model. The quantitative relationship between different biological effects
(▶ Chap. 18, “Quantitative Structure–Activity Relationships”) establishes the con-
nection between animal models and humans.
One modern researcher stands out in the area of CNS-active compounds
especially, but also in areas of cardiovascular-active compounds and antihista-
mines. Paul Janssen (1926–2003) was the director of the company Janssen
Pharmaceuticals in Beerse, Belgium. In the years after World War II, his company
discovered over 70 new active substances, carried out the preclinical and clinical
development, and established them as therapies. In doing so, his company
established itself as the most successful in pharmaceutical history. His recipe
for success was not a secret. Paul Janssen was a master of structural variation,
a Beethoven of drug discovery. The systematic combination of pharmacologically
interesting structural fragments, and the elegant evaluation of receptor-binding
studies, in vitro models, and animal experiments were the foundation of
his successes.
1.6 The Successful Therapy of Psychiatric Illness
Up until the middle of the last century psychiatric hospitals were purely custodial care
facilities; they were almost indistinguishable from prisons in terms of the restriction of
personal freedom of the individual. The discovery of neuroleptics, antidepressants,
anticonvulsants, and sedatives revolutionized psychiatry. Typical examples of these
classes of drugs are depicted in Fig. 1.3. With the repertoire of drugs that are available
today, schizophrenia, chronic anxiety, and depression are treated predominantly in open-ward
psychiatry. Many patients can be treated in an ambulatory setting. In 1933 Manfred Sakel
(1901–1957), who worked at the psychiatric university hospital in Vienna, noticed that
when schizophrenics were given insulin to stimulate their appetites, they became
calmer. Encouraged by this result, he increased the dose to the point of hypoglycemic
coma, which is a form of deep unconsciousness induced by too little blood sugar. Insulin
shock, pentetrazole, and electroshock became the standard treatment over the next two
decades for psychotic illness, an impressive and frightening proof of the absence of
therapeutic alternatives.
This situation changed in the 1950s with the discovery of reserpine 1.9 (Fig. 1.1,
Sect. 1.1), a herbal natural product. This substance exerts its effect by emptying the
reserves of the neurotransmitters noradrenaline, serotonin, and dopamine in nerve
cells. Reserpine was the first substance to display a prominent neuroleptic effect,
that is, it is sedating and calming, and it was the first compound to be used for
psychotic illness, for which the biological effect could be explained by a mode of
action. In addition, reserpine was used as an antihypertensive medication. Because
of its very broad and unspecific effect it is rarely used today for psychiatric illness
or arterial hypertension.
The role of dopamine 1.13 (Fig. 1.2, Sect. 1.4) in the etiology of schizophrenia
became clear with the discovery of chlorpromazine 1.22 (Fig. 1.3, ▶ Sects. 8.5 and
▶ 19.10), a substance that showed a favorable clinical effect. In contrast to the
unspecific reserpine, chlorpromazine is a pure dopamine antagonist. The applica-
tion of chlorpromazine and analogous tricyclic neuroleptics caused symptoms that
occur in Parkinson’s disease. This was the first indication that an endogenous
dopamine deficiency is the cause of that disease.
Chlordiazepoxide (Librium®, ▶ Sect. 2.7), the first tranquilizer of the group of
benzodiazepines, was found by accident. Only one year after its introduction and
for many years after that, the chemically closely related medication diazepam 1.23
(Valium®, Fig. 1.3) was the worldwide best-selling drug. The Rolling Stones
commemorated it in their multifaceted song “Mother’s Little Helper.” Many companies
started generously funded synthesis programs, and chemists and pharmacologists
applied their entire arsenal of methods. Their success justified their efforts.
Substances with different modes of action resulted: further tranquilizers, sedatives,
hypnotics, and even antagonists. Even today the benzodiazepines (▶ Sect. 30.5)
are among the most popular and widespread medications.
The first antidepressant, iproniazid (▶ Sects. 6.7 and ▶ 27.8), was also an accidental
discovery. It blocks the metabolism of the biogenic amines
dopamine, serotonin, noradrenaline, and adrenaline by inhibiting the enzyme
monoamine oxidase (▶ Sect. 27.8). In addition to other severe side effects, the first
nonspecific representatives caused hypertensive crises, and a few fatalities occurred
when they were taken with certain foods. Tyramine, a substance found in cheese, wine, and beer
(hence the term “cheese effect”), was not duly metabolized. This caused a life-
threatening rise in noradrenaline, a hormone that raises blood pressure.
The antidepressant imipramine 1.24 (Fig. 1.3, ▶ Sect. 8.5) resulted from the
synthesis of analogues of chlorpromazine. Interestingly and despite its close struc-
tural relationship, it is not a neuroleptic but rather it works in the opposite way.
It blocks the transporter for noradrenaline and serotonin, and this prevents the
Fig. 1.3 A revolution in the therapy of psychiatric illness was brought about by the discovery of
potent neuroleptics such as chlorpromazine 1.22, tranquilizers such as diazepam 1.23, and
antidepressants such as imipramine 1.24. For the first time, these compounds allowed
a purposeful treatment of schizophrenia, chronic anxiety, and depression. Examples of newer
antidepressants with specific modes of action on transport systems (▶ Sect. 4.6) for noradrenaline
and serotonin are desipramine 1.25 and fluoxetine 1.26, respectively.
reuptake of these neurotransmitters from the synaptic gap. Desipramine 1.25 and
fluoxetine 1.26 are even more selective in that they inhibit only the noradrenaline or
the serotonin transporter of nerve cells.
1.7 Modeling and Computer-Aided Design
An extremely capable tool is available for modeling the properties and reactions of
molecules, and particularly their intermolecular interactions: the computer. In
addition to processing complex numerical problems, it translates the
results into color graphics, which suits the human ability to
grasp pictures faster and more easily than text or columns of numbers. That is not
a surprise. Our brains process text sequentially, but pictures are comprehended in
parallel. X-ray crystallography and multidimensional NMR spectroscopic tech-
niques (▶ Chap. 13, “Experimental Methods of Structure Determination”) contrib-
ute to our understanding of molecules as much as quantum mechanical and force
field calculations (▶ Chap. 15, “Molecular Modeling”).
Is molecular modeling an invention of modern times? Yes and No. Friedrich
August Kekulé (1829–1896) supposedly derived his cyclic structure for benzene
from a vision of a snake that circled upon itself and bit its own tail (incidentally, the
snake Ouroboros is an age-old alchemical symbol). This now-famous dream may,
however, be traced to a memory of the book Constitutionsformeln der Organischen
Chemie by the Austrian schoolteacher Joseph Loschmidt (1821–1895; Fig. 1.4).
Loschmidt would surely take pleasure in contemplating pictures of modern models, which
are quite similar to his own. More and more today we place the three-dimensional
structure, the steric dimensions, and the electronic qualities of molecules in the
foreground. Advances in theoretical organic chemistry and X-ray crystallography
have made this possible. The first structure-based design was carried out on
hemoglobin, the red blood pigment, in the research group of Peter Goodford.
Hemoglobin’s affinity for oxygen is modulated by so-called allosteric effector
molecules that bind in the core of the tetrameric protein. From the three-
dimensional structure he deduced simple dialdehydes and their bisulfite addition
products. These substances bind to hemoglobin in the predicted way and shift the
oxygen-binding curve in the expected direction.
The first drug developed by using a structure-based approach is the antihy-
pertensive agent captopril, an angiotensin-converting enzyme (ACE) inhibitor
(▶ Sect. 25.4). Although the lead structure was a snake venom, the decisive
breakthrough was made after modeling the binding site. For this, the binding site
of carboxypeptidase, another zinc protease, was used because its three-dimensional
structure was known at the time.
The road to a new drug is difficult and tedious. A nested overview of the interplay
between the different methods and disciplines from a modern point of view is
illustrated in the scheme in Fig. 1.5. In the last few years molecular modeling
(▶ Chap. 15, “Molecular Modeling”) and particularly the modeling of ligand–
receptor interactions (▶ Chap. 4, “Protein–Ligand Interactions as the Basis
for Drug Action”), have gained importance. Although modeling is employed
predominantly for the targeted structure modification of lead compounds, it is
also suitable for the structure-based and computer-aided design of drugs
(▶ Chap. 20, “Protein Modeling and Structure-Based Drug Design”) and lead
structure discovery (▶ Sect. 7.6). Examples of these approaches are given in
▶ Chaps. 23, “Inhibitors of Hydrolases with an Acyl–Enzyme Intermediate”;
▶ 24, “Aspartic Protease Inhibitors”; ▶ 25, “Inhibitors of Hydrolyzing
Metalloenzymes”; ▶ 26, “Transferase Inhibitors”; ▶ 27, “Oxidoreductase Inhibi-
tors”; ▶ 28, “Agonists and Antagonists of Nuclear Receptors”; ▶ 29, “Agonists and
Antagonists of Membrane-Bound Receptors”; ▶ 30, “Ligands for Channels, Pores,
and Transporters”; ▶ 31, “Ligands for Surface Receptors”; ▶ 32, “Biologicals:
Peptides, Proteins, Nucleotides, and Macrolides as Drugs”.
In addition to modeling and computer-aided design, structure–activity relation-
ship analysis (▶ Chap. 18, “Quantitative Structure–Activity Relationships”) has
contributed to the understanding of the correlation between the chemical structure
of compounds and their biological effects. By using these methods, the influence
of lipophilic, electronic, and steric factors on the variation of the biological
activity, transport, and distribution of drugs in biological systems could be
systematized for the first time on statistically significant foundations.
Fig. 1.4 Loschmidt’s book Constitutionsformeln der Organischen Chemie (1861) contains
structures that anticipate both the formulation of the benzene ring and the modern modeling
structure. Kekulé must have known about this book because he disparaged it in a letter to Emil
Erlenmeyer in January 1862, referring to it as “Confusionsformeln.” Loschmidt did not
become famous for his book, but rather because he carried out an experiment in 1865 that
determined the number of molecules in a mole to be 6.02 × 10²³, a constant that was later
named after him.
1.8 The Results of Drug Research and the Drug Market
The development of different methods in drug research has already been described
in the last section. Table 1.1 gives a short historical overview of the most prominent
results.
[Fig. 1.5 scheme. Routes to lead structures: identification of a biological target, proof
of principle, and molecular test system; literature, patents, and competitor products
(“me too” research); biological concept and clinical side effects; screening of natural
products, synthetics, peptides, and combinatorial chemistry. Design cycle: experimental
and synthetic design; synthesis; biological testing; structure–activity relationships,
QSAR, and molecular modeling; computer-aided design (protein crystallography, NMR,
3D database searches, de novo design). Outcome: candidate for further development,
developmental substance, formulation, drug.]
Fig. 1.5 The way to a drug is long. The upper part of the figure shows routes to lead structures.
The middle part describes the design cycle, which in practically all cases must be run through
many times. Each of these phases is described in detail in the following chapters. The result of
iterative optimization is candidates for further development, which includes preclinical and
toxicological studies. It is from these studies that the actual candidates are selected.
Formulation, clinical trials, and registration then lead to a new medicine. The last phases are
not presented in this book.
Table 1.1 Important milestones in drug research
Year  Substance             Indication/Mode of action
1806  Morphine              Hypnotic
1875  Salicylic acid        Anti-inflammatory
1884  Cocaine               Stimulant, local anesthetic
1888  Phenacetin            Analgesic and antipyretic
1889  Acetylsalicylic acid  Analgesic and antipyretic
1903  Barbiturate           Sedative
1909  Arsphenamine          Anti-syphilitic
1921  Procaine              Local anesthetic
1922  Insulin               Antidiabetic
1928  Estrone               Female sex hormone
1928  Penicillin            Antibiotic
1935  Sulfamidochrysoidine  Bacteriostatic
1944  Streptomycin          Antibiotic
1945  Chloroquine           Antimalarial
1952  Chlorpromazine        Neuroleptic
1956  Tolbutamide           Oral antidiabetic
1960  Chlordiazepoxide      Tranquilizer
1962  Verapamil             Calcium channel blocker
1963  Propranolol           Antihypertensive (β-blocker)
1964  Furosemide            Diuretic
1971  L-DOPA                Parkinson’s disease
1973  Tamoxifen             Breast cancer (estrogen receptor antagonist)
1975  Nifedipine            Calcium channel blocker
1976  Cimetidine            Gastrointestinal ulcer (H2 blocker)
1981  Captopril             Antihypertensive (ACE inhibitor)
1981  Ranitidine            Gastrointestinal ulcer (H2 blocker)
1983  Ciclosporin A         Immunosuppressant
1984  Enalapril             Antihypertensive (ACE inhibitor)
1985  Mefloquine            Antimalarial
1986  Fluoxetine            Antidepressant (5-HT-transport inhibitor)
1987  Artemisinin           Antimalarial
1987  Lovastatin            Cholesterol biosynthesis inhibitor
1988  Omeprazole            Gastrointestinal ulcer (H+/K+-ATPase inhibitor)
1990  Ondansetron           Antiemetic (5-HT3 blocker)
1991  Sumatriptan           Migraine (5-HT1B/1D agonist)
1993  Risperidone           Antipsychotic (D2/5-HT2 blocker)
1994  Famciclovir           Antiviral/herpes (DNA polymerase inhibitor)
1995  Losartan              Arterial hypertension (ATII antagonist)
1995  Dorzolamide           Glaucoma (carbonic anhydrase inhibitor)
1996  Saquinavir            HIV protease inhibitor
1996  Ritonavir             HIV protease inhibitor
1996  Indinavir             HIV protease inhibitor
1996  Nevirapine            HIV reverse transcriptase inhibitor
(continued)
The assessment of the efficacy and safety of a drug has reached an extraordinarily
high standard today. To some extent this development accompanies our
goal of finding new medicines, but it is also a hindrance. Acetylsalicylic acid
(Aspirin®) is without any doubt a valuable drug. Today this compound would
have great difficulty passing clinical trials. Acetylsalicylic acid is an irreversible
enzyme inhibitor, it has relatively weak efficacy, it causes gastric bleeding in high
doses, and it has a very short biological half-life. Each of these problems would be
a profound argument against its continued development today. It probably would
have already failed in screening. In a risk–benefit analysis, however, it is better than
most of the alternatives. Where is the problem? It probably lies in the analytical–
deterministic mindset that dominates science, and therefore also drug research. It is
often overlooked that a system as complicated and complex as a human being, to
whom we apply a drug therapy, cannot always be adequately addressed by such
an approach.
Table 1.1 (continued)
Year  Substance             Indication/Mode of action
1997  Sibutramine           Obesity (uptake inhibitor)
1997  Orlistat              Obesity (lipase inhibitor)
1997  Tolcapone             Parkinson’s disease (COMT inhibitor)
1998  Sildenafil            Erectile dysfunction (PDE5 inhibitor)
1998  Montelukast           Broncholytic (leukotriene receptor antagonist)
1999  Infliximab            Antirheumatic (TNFα antagonist)
2000  Celecoxib             Analgesic (COX-2 inhibitor)
2000  Verteporfin           Macular degeneration (photodynamic therapy)
2001  Imatinib              Chronic myeloid leukemia (kinase inhibitor)
2002  Bosentan              Pulmonary arterial hypertension (endothelin-1 receptor antagonist)
2002  Aprepitant            Antiemetic (neurokinin receptor antagonist)
2003  Enfuvirtide           HIV fusion inhibitor (oligopeptide)
2004  Ximelagatran          Coagulation inhibitor (thrombin inhibitor)
2004  Bortezomib            Multiple myeloma (proteasome inhibitor)
2005  Bevacizumab           Cytostatic (angiogenesis inhibitor)
2006  Natalizumab           Multiple sclerosis (monoclonal antibody; integrin inhibitor)
2006  Aliskiren             Antihypertensive (renin inhibitor)
2007  Maraviroc             HIV fusion inhibitor (CCR5 antagonist)
2007  Sitagliptin           Type-II diabetes (DPP-IV inhibitor)
2008  Raltegravir           HIV integrase inhibitor
2009  Rivaroxaban           Oral anticoagulant (FXa inhibitor)
2010  Mifamurtide           Osteosarcoma (bone cancer)
2011  Fingolimod            Immunomodulating drug (multiple sclerosis treatment)
Despite public healthcare systems that constitute a barrier between the supplier
and the consumer, the drug market, with worldwide sales of more than US$880
billion, is highly competitive. Two forces affect this market: the state of science
and technology and the needs of patients. A few drugs command a large portion of
sales. Constantly changing “hit lists” of the best-selling drugs can be found on the
internet. Because of the mergers of established pharmaceutical companies in
recent years, the market has consolidated into fewer, larger companies. It is frequently
the case that a single drug can make or break a company. Often only two to three
drugs make up more than 50% of a large company’s sales. A historical example is
Glaxo. This company made its way from the middle of the field to the top with ranitidine.
Astra experienced a similar boom with omeprazole. Today, after the merger with
Zeneca, it is among the largest companies in this field. Sankyo also had
a single drug, pravastatin, that boosted sales enormously. With its drugs sildenafil
(Viagra®) and atorvastatin (Sortis®/Lipitor®), Pfizer’s profits shot to unimaginable
highs. In recent years in particular we have seen an increasing concentration of
pharmaceutical companies, so that the market is making a transition to an oligopoly
dominated by multinational corporations. Keep in mind that sales giants such
as GlaxoSmithKline (GSK), Novartis, Sanofi-Aventis, Bayer HealthCare, Bristol-Myers
Squibb, or AstraZeneca came into being only in the last 10 years through
mergers. Companies such as Pfizer and Roche have grown significantly through
acquisitions. The role research plays for pharmaceutical companies is apparent when one
considers that typically 15–20% of turnover is invested in this area. It is certain that
the concentration of the pharmaceutical market is not complete. We can only wait and
see how the landscape continues to shift and adapt at an almost annual pace.
1.9 Controversial Drugs
Drugs remain a focal point of public interest. Whereas for decades it was the
physician alone who prescribed medication, today it is the patient, frightened by the
lay press or better informed through labeling or reputable literature, who wants to
take control of, or at least share in, the decision making.
The issues can be illustrated by one example. Psychotropic pharmaceuticals
exert an impressive effect on personality and behavior. At least since the
introduction of Valium® (diazepam) these drugs have been in the media spotlight.
They are invaluable for the treatment of psychiatric illness. On the other hand, the
danger of misuse and addiction is particularly high. Some of these drugs are even
used as self-medication, without strict adherence to the indication guidelines.
Fluoxetine 1.26 (Prozac®, Fig. 1.3, Sect. 1.6) was introduced in 1988 by Eli Lilly
and brought unequivocal progress in the treatment of depression. On this one
medication alone there are now over ten popular science books with controversial
content. Peter Kramer’s book Listening to Prozac takes an overall sympathetic
tone with the assertion that depressed patients feel better and more “in harmony”
with their personality after treatment with fluoxetine. This book was on the New
York Times bestseller list for over 21 weeks. Peter Breggin’s book Talking Back
to Prozac criticized fluoxetine, the company Eli Lilly, and the U.S. Food and Drug
Administration (FDA) polemically. The side effects, risks, and particularly the
addictive potential were placed in the foreground. Both books contain correct
assertions, and both books lead to the wrong conclusions. Prozac® is a valuable
medicine for the treatment of clinically manifest depression; for the treatment
of mundane unhappiness or as a general stimulant, however, it is a drug with
many risks.
To make a risk–benefit analysis of a medication, it is important to consider not
only the desired effect but also the severity of the illness and the objective and
subjective side effects. In oncology one accepts even severe side effects for the
possibility of improving the patient’s condition. If an end-stage cancer patient is
refused an effective pain therapy because of the risk of addiction, then that must be
seen as malpractice. On the other hand, many people handle highly potent
medications recklessly. The misuse of antibiotics, the faith in the almighty power of
tranquilizers and antidepressants, and the chronic use of analgesics and laxatives
do more harm than good.
1.10 Synopsis
• Drug research can be divided into several sequential phases starting with empirical
observations of the uptake of natural products from food, the development of
in vitro test systems, increasing understanding of structures and modes of action,
to in vivo models and gene technology.
• It all started with traditional medicines. The first prescriptions date back to the
ancient Egyptians and to traditional Chinese medicine.
• Paracelsus founded scientific medical research and understood humans to be
a “chemical laboratory.” For the first time, the ingredients of drugs were held
responsible for their healing effects.
• With the advent of organic chemistry, the first therapeutic principles based on
pure organic compounds became available. The great age of natural products
from plants and their active ingredients began.
• Systematic studies on animals began in the nineteenth century and can be
seen as a starting point for drug research. In vitro models are needed to
test large series of potentially active compounds, but animal models are required
to correlate the data and make predictions about the therapeutic effects
in humans.
• Our present life expectancy would not be possible without the successful
fight against infectious diseases. The broad application of antibiotics and
the spread of resistant pathogens, however, have led to situations in which
the best weapons against infectious diseases are becoming increasingly blunt.
Research against widespread tropical diseases has been neglected, and
the currently increasing resistance to available medications represents a
worldwide problem.
• The elucidation of biological concepts, pathways, and regulatory cycles by
endogenous compounds has strongly stimulated drug research. Many developed
drugs have arisen from structural variations of neurotransmitters, hormones,
steroids, or natural substrates.
• Systematic substance testing began with the establishment of in vitro models
that replaced biological testing on animals by assays in test tubes. Gene
technology has made it possible to prepare sufficient amounts of pure proteins
for testing.
• The discovery of neuroleptics, antidepressants, anticonvulsants, and sedatives
has revolutionized the treatment of psychiatric diseases.
• Molecular modeling and computer-aided design along with structural
biology give access to rational considerations on drug action. The first
structure-based design project was carried out on hemoglobin, and the first
drug developed by using a structure-based approach was the antihyperten-
sive captopril.
• The assessment of drug efficacy and safety has reached an extraordinarily high
standard today. The worldwide drug market, with nearly a thousand billion
US dollars in sales per year, is large and highly competitive. Only a few
drugs command a large portion of the sales and determine the particular
dynamics in the market; the current tendency is corporate contraction to
fewer and bigger companies. Often a single drug can make or break
a company.
• Drugs remain a focal point of public interest. It is no longer the physician
alone who influences the prescription of medication; multiple sources of infor-
mation have an impact and inform the patient. A proper risk–benefit analysis of
a medication, taking into consideration not only the desired therapeutic effect
but also the severity of an illness, is needed.
Bibliography
General Literature
Barondes SH (1993) Molecules and mental illness, Scientific American Library. W. H. Freeman
and Company, New York
Beddell CR (ed) (1992) The design of drugs to macromolecular targets. Wiley, Chichester
Fischer D, Breitenbach J (eds) (2003) Die Pharmaindustrie. Spektrum Akademischer Verlag,
Heidelberg/Berlin
Friedrich C, Müller-Jahncke W-D (2005) Von der Frühen Neuzeit bis zur Gegenwart, vol 2.
GOVI-Verlag, Eschborn
Higby G (ed) (1997) The inside story of medicine. A Symposium. Madison, Wi
Herrmann EC, Franke R (eds) (1995) Computer-aided drug design in industrial research, Ernst
Schering research foundation workshop 15. Springer, Berlin
Müller K (ed) (1995) De novo design. Persp Drug Discov Design, vol 3. Escom, Leiden
Müller-Jahncke W-D, Friedrich C (2005) Arzneimittelgeschichte. Wissenschaftliche
Verlagsgesellschaft, Stuttgart
Perun TJ, Propst CL (eds) (1989) Computer-aided drug design. Methods and applications. Marcel
Dekker, New York
Porter R, Teich M (eds) (1995) Drugs and narcotics in history. Cambridge
Restak RM (1994) Receptors. Bantam Books, New York
Schmitz R (1998) Geschichte der Pharmazie, vol 1. GOVI-Verlag, Eschborn
Verband Forschender Arzneimittelhersteller e.V. (2009) http://www.vfa.de/de/presse/statcharts/arzneimittelmarkt/. Accessed 22 Nov 2011
Werth B (1994) The Billion-Dollar Molecule. One Company’s Quest for the Perfect Drug.
Touchstone, New York
Special Literature
Beddell CR, Goodford PJ, Norrington FE et al (1976) Compounds designed to fit a site of known
structure in human hemoglobin. Br J Pharmac 57:201–209
Breggin PR, Breggin GR (1994) Talking back to Prozac. St. Martin’s Press, New York
Kramer P (1993) Listening to Prozac. Viking, New York
Mutschler E (1987) Arzneimittel – Erfolge, Misserfolge, Hoffnungen. Deutsche Apoth-Ztg
127:2025–2033
Newman DJ, Cragg GM (2007) Natural products as sources of new drugs over the last 25 years.
J Nat Prod 70:461–477
Noe CR, Bader A (1993) Facts are better than dreams. Chem Brit 29:126–128, Kekulés and
Loschmidts Formeln
2 In the Beginning, There Was Serendipity
“A lucky accident dropped the medicine into our hands”; this is how a publication on
August 14, 1886, from Arnold Cahn and Paul Hepp in the Centralblatt für Klinische
Medizin began. The history of drug research is punctuated by lucky accidents.
As a general rule, detailed knowledge of biological systems was absent. So it is not
surprising that the working hypotheses were often wrong, and the obtained results
differed from the expectations. The role of accidental success receded into the
background over time. Today happenstance as a strategy has been replaced by the arduous
and ambitious goal of preparing drugs by using a straightforward approach. The only
exception to this is the kind of shotgun-style testing of large and diverse chemical
compound libraries, including microbial and plant extracts, that is done with the goal
of finding new lead structures. In this case, serendipity is desired in order to find as
large and diverse a palette of lead structures as possible (▶ Chaps. 6, “The Classical
Search for Lead Structures” and ▶ 7, “Screening Technologies for Lead Structure
Discovery”) with potential for further optimization (▶ Chaps. 8, “Optimization of Lead
Structures” and ▶ 9, “Designing Prodrugs”).
2.1 Acetanilide Instead of Naphthalene: A New, Valuable Antipyretic
Back to Cahn and Hepp. What happened? There are several legends about this
lucky accident. The most plausible version is that the antipyretic effect of naph-
thalene, which was widely available from coal tar, was tested. The substance indeed
showed fever-lowering qualities. The responsible substance however, was not naph-
thalene but rather something entirely different: acetanilide 2.1 (Fig. 2.1). Further
experiments confirmed the efficacy. Shortly thereafter, the company Kalle & Co.
introduced it to the market under the name “Antifebrin.”
G. Klebe, Drug Design, DOI 10.1007/978-3-642-17907-5_2, © Springer-Verlag Berlin Heidelberg 2013
Phenacetin 2.2 (Fig. 2.1) was subsequently developed based upon a targeted
approach. At the time, Bayer in Elberfeld had 30 t of p-nitrophenol, a side product
from dye production, on their waste heap. The then 25-year-old Carl Duisberg, who
later became the chairman of Bayer Farbenfabriken AG and who also took a leading
role in the foundation of I.G. Farbenindustrie in 1924, wanted to use it for the
preparation of acetanilide as it could easily be reduced to p-aminophenol. The
known toxicity of phenol groups led to the design of p-ethoxyacetanilide 2.2 (phen-
acetin), which actually did have the desired qualities and served as an analgesic for
headaches and as an antipyretic for a century. Unfortunately its metabolite 2.4, which
still contains the ethoxy group, leads to the production of methemoglobin, an
oxidized form of the red blood pigment that is incapable of carrying oxygen.
Furthermore, chronic misuse by, for instance, taking kilogram quantities of phenac-
etin over a lifetime, leads to kidney damage. Paradoxically, the main metabolite of
phenacetin, p-hydroxyacetanilide 2.5 (Fig. 2.1, acetaminophen in American English,
or paracetamol in UK English) is actually responsible for the effect, and it is less toxic
and better tolerated. In the USA alone, paracetamol achieved over US$1.3 billion in
annual sales. This is even more than for acetylsalicylic acid.
2.2 Anesthetics and Sedatives: Pure Accidental Discovery
In 1799 Humphry Davy (1778–1829) discovered the euphoric effect of nitrous oxide
(N2O), which was appropriately named “laughing gas.” The dentist Horace Wells
(1815–1848) saw a traveling theater production of a “sniffing party” with N2O in
1844 in which a participant suffered from a flesh wound, apparently without pain. To
test this effect, Wells had one of his own teeth extracted, also without pain. He then
repeated the procedure on many people, with success. A public demonstration went
wrong though, and this drove him to suicide four years later. The same effect was
observed in 1842 by Crawford W. Long (1815–1878) with ether, but he did not report
it immediately. After administering ether, he was able to remove a tumor from the
neck of a volunteer. William T. Morton (1819–1868) successfully carried out the first
ether anesthesia in the same hospital as Wells. Starting in 1847, chloroform was used
as an anesthetic. A few years later anesthesia became standard for surgical
procedures, a real blessing for suffering humanity.
Fig. 2.1 Starting from the accidentally discovered acetanilide 2.1, Carl Duisberg planned
the synthesis of phenacetin 2.2 from p-nitrophenol 2.3. In contrast to the toxic metabolite 2.4,
the main metabolite, paracetamol (Amer. acetaminophen) 2.5, is well tolerated.
Oskar Liebreich (1839–1908) wanted to develop a depot form of chloroform 2.6
in 1868. Because chloral hydrate can be cleaved with base in an aqueous milieu, he
hoped that this could also happen in the body. Chloral hydrate is in fact a sedative,
but this is because of its active metabolite, trichloroethanol 2.8 (Fig. 2.2), and not
because it releases chloroform.
In 1885 Oswald Schmiedeberg (1838–1921) tested urethane 2.9 (ethylcarbamate,
Fig. 2.3) because he thought that it would release ethanol in the organism. In fact,
urethane itself is the active agent. Its optimization later led to isoamylcarbamate 2.10
(Hedonal®, 1899). Based on this, open and cyclic carbamates and ureas were
investigated. In 1903 the first barbiturate sedative, barbital 2.11 (Veronal®), resulted.
In the decades that followed, a wealth of better-tolerated barbiturates with a broader
pharmacokinetic spectrum was introduced.
Fig. 2.2 The anesthetic chloroform 2.6 is formed upon treatment of chloral hydrate 2.7 with base.
This reaction does not work in vivo, however. The active metabolite of 2.7 is trichloroethanol 2.8.
Fig. 2.3 The hypothetical “prodrug” of ethanol, urethane 2.9 (R = −CH2CH3), led to the
development of isoamylcarbamate 2.10 (R = −CH2CH2CH(CH3)2), which in turn led to the first
barbiturate, barbital 2.11.
2.3 Fruitful Synergies: Dyes and Pharmaceuticals
Dyes and pharmaceuticals have stimulated each other. The first synthetic dye was
the result of a failed drug synthesis. In 1856 August Wilhelm von Hofmann assigned
the task of synthesizing quinine, an alkaloid used for treating malaria (▶ Sects. 1.1
and ▶ 3.2), to the then 17-year-old William Henry Perkin (1838–1907); starting
with only the molecular formula, it was anticipated that the oxidation of an
allyl-substituted toluidine would deliver the desired product. Now that the structural
formula is known, we understand that this could not possibly have worked! Upon
oxidation of aniline that was contaminated with o- and p-toluidine, Perkin isolated
a dark precipitate. It contained a dye, mauveine 2.12 (Fig. 2.4), that colored silk
a brilliant mauve. Other dyes were prepared in rapid succession. The development
and later proliferation of the dye industry in England and Germany in the second
half of the nineteenth century can be traced back to this accidental discovery.
Toward the end of the nineteenth century, increasing competition and a difficult
economic situation in the dye market prompted, as a reaction, an expansion into
industrial pharmaceutical research. In 1896 a pharmaceutical research laboratory
was founded in the 33-year-old Bayer Farbenfabrik. At that time innumerable
synthetic dyes were known; it is therefore not surprising that these substances
were tested for pharmacological effects.
Of all people, wine adulterators played an important role in the discovery of the
first synthetic laxative. To stop people from selling pomace wine (Tresterwein, so-called
Nachwein) as a natural wine (Naturwein), in 1900 the dye phenolphthalein was
added as an easily detectable indicator. The Hungarian pharmacologist Zoltán
von Vámossy (1868–1953) investigated the effects of this compound. Back then,
the customs of pharmacologists were still rather primitive. The intravenous
administration of 0.01–0.03 g to rabbits caused death “with loud shrieking,
convulsions, and paralysis”. Vámossy then decided to feed 1–2 g to a rabbit and 5 g to a
4 kg lap dog. Because these oral doses were all well tolerated, Vámossy took 1.5 g
of phenolphthalein himself, and a friend took 1.0 g. The effects were explosive:
rumbling in the bowels, diarrhea, and for two additional days loose stools. It was
later established that 150–200 mg would have been a therapeutic dose.
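The size of this self-administered overdose follows from simple arithmetic on the doses quoted above. The following is only a minimal sketch in Python; the function name is hypothetical and chosen for illustration, and the only inputs are the figures given in the text (1.5 g and 1.0 g taken, 150–200 mg therapeutic).

```python
def overdose_factor(dose_taken_mg: float, therapeutic_dose_mg: float) -> float:
    """Return how many times the therapeutic dose was taken (illustrative helper)."""
    return dose_taken_mg / therapeutic_dose_mg


# Doses from the text: Vámossy took 1.5 g (1500 mg), his friend 1.0 g (1000 mg).
# The later-established therapeutic range is 150-200 mg.
THERAPEUTIC_LOW_MG = 150.0
THERAPEUTIC_HIGH_MG = 200.0

for name, dose_mg in [("Vámossy", 1500.0), ("friend", 1000.0)]:
    # Smallest factor: compare against the upper end of the therapeutic range.
    low = overdose_factor(dose_mg, THERAPEUTIC_HIGH_MG)
    # Largest factor: compare against the lower end of the therapeutic range.
    high = overdose_factor(dose_mg, THERAPEUTIC_LOW_MG)
    print(f"{name}: {low:.1f}x to {high:.1f}x the therapeutic dose")
```

On these numbers, both men took roughly five to ten times what was later considered a therapeutic dose.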
[Fig. 2.4 scheme. Upper reaction (attempted): 2 C10H13N (allyltoluidine) + 3 [O] →
C20H24N2O2 (quinine) + H2O. Lower reaction: oxidation [O] of impure aniline to
mauveine 2.12; R = H or o-, p-methyl.]
Fig. 2.4 An unsuccessful quinine synthesis founded the dye industry. The structures of many
organic compounds were still entirely unknown in the middle of the nineteenth century. The
attempt to prepare quinine via a simple route (upper reaction) could not have worked. The
oxidation of an impure aniline (below) gave mauveine 2.12 in 1856, which was used to dye silk
a brilliant mauve color. It was the first synthetic dye!
An entire range of antibacterial and antiparasitic dyes is based on the work
of Robert Koch (1843–1910). He showed that bacteria and parasites accumulate
dyes specifically. Based on this, Paul Ehrlich (1854–1915) hoped to kill pathogens
selectively with suitably chosen dyes. In 1891 he cured two mild cases of malaria by
treating the patients with methylene blue. In the following years he tested hundreds of
different dyes, and thousands more analogues were later synthesized in the
laboratories of Bayer and Hoechst. In 1909 Paul Ehrlich pursued a rational design
when he exchanged both of the nitrogen atoms of the –N═N– group of an azo dye
for arsenic atoms. Arsphenamine 2.14 (Salvarsan®, Fig. 2.5) was the first effective
compound to treat syphilis and the first chemotherapeutic. It became an extraordinary
economic success for the company Hoechst.
Fig. 2.5 The laxative effect of phenolphthalein 2.13 became apparent while testing it as an
additive for cheap wines. The antisyphilis compound arsphenamine 2.14 (Salvarsan®, here shown
as monomer, × 2 HCl) is simply an azo dye in which the –N═N– group was exchanged for an
–As═As– group.
The breakthrough with chemotherapeutics was made by the physician Gerhard
Domagk (1895–1964). At the age of 31, he took over the newly formed department
of experimental pathology at Bayer in Elberfeld. Azo dyes bearing sulfonamide
groups had already been designed by the chemists Fritz Mietzsch and Josef Klarer,
but they showed no in vitro activity; Domagk tested these substances in
streptococci-infected mice. By using this model, he found the first active substances in
1932. Sulfamidochrysoidine 2.15 (Prontosil®, Fig. 2.6), a dark-red dye that could
treat even severe streptococcal infections, was introduced in 1935. The sulfonamides
became world famous a year later when the son of the US president, Franklin
D. Roosevelt, Jr., was treated with one to cure a severe sinus infection. But
even here a false hypothesis led to success. It was not the azo dye itself, but rather
its metabolite, sulfanilamide 2.16, that was effective. Sulfanilamide replaces
p-aminobenzoic acid 2.17 (Fig. 2.6), which is needed for the bacterial synthesis
of an enzymatic cofactor, dihydrofolic acid.
Fig. 2.6 The red azo dye sulfamidochrysoidine 2.15 is effective only after cleavage to the
colorless sulfanilamide 2.16, which is a bacterial antimetabolite of p-aminobenzoic acid 2.17.
2.4 Fungi Kill Bacteria and Help with Syntheses
The discovery of the antibiotic effect of Penicillium notatum by Alexander Fleming
(1881–1955) in 1928 is the most famous example of a serendipitous discovery.
Fleming noticed that a spoiled staphylococcus culture had been contaminated with
a fungus. In the area around the fungus, no bacteria could grow. Further
investigations showed that this fungus could also curb other bacteria. Fleming
called the still-unknown agent penicillin. It was not until 1940 that it was isolated
and characterized by Ernst Boris Chain (1906–1979) and Howard Florey
(1910–1985). In 1941 an English policeman was the first patient to be treated with
penicillin. Despite a temporary improvement, and even though penicillin could be
isolated from his urine, he died after a few days as no more penicillin was available
for his continued therapy. The fungus Penicillium chrysogenum, which produces
more penicillin than Penicillium notatum and is easier to cultivate, was isolated
from a moldy melon in Illinois. The tedious route to the structural elucidation of
penicillin and the successful work to systematically vary its structure are scientific
masterworks of the first order. There were even more difficult problems to conquer
in optimizing its biotechnological mass production. Today the
modified penicillins 2.18 and cephalosporins 2.19 (Fig. 2.7), which make up a broad
range of antibiotics with outstanding bioavailability, are available. The newer
analogues have a broader spectrum of activity against many pathogens and are
distinguished by a generally improved stability to the penicillin-degrading enzyme
β-lactamase. Fleming was a researcher to whom Pasteur’s thesis “chance favors the
prepared mind” fully applies. One day in 1921 while working in his laboratory with
a cold, he tried a rather unconventional experiment. He added a drop of his own nasal
mucus to a bacterial culture and found a few days later that the bacteria had been
killed. This “experiment” led to the discovery of lysozyme, an enzyme that
hydrolyzes the bacterial cell wall. As a therapy it is unfortunately unsuitable because it
does not attack most human pathogens.
Fig. 2.7 Fleming’s accidental discovery of the antibiotic effects of a fungus has delivered a wide
palette of penicillins 2.18 and cephalosporins 2.19, each with different R groups.
Chance and a fungus played an important role in the industrial synthesis of
corticosteroids. An important step in the synthesis is the introduction of an oxygen
atom at a particular position in the steroid scaffold, position 11. In 1952 chemists at
the Upjohn company were searching for a soil bacterium that could hydroxylate a steroid in
this position. Just when they finally decided to set an agar plate on the windowsill
of the laboratory, Rhizopus arrhizus landed exactly there. This fungus transforms
progesterone (▶ Sect. 28.5) to 11α-hydroxyprogesterone. With its help the yield
could be increased to 50%. The closely related fungus Rhizopus nigricans even
afforded 90% of the desired product.
2.5 The Discovery of the Hallucinogenic Effect of LSD
In the 1930s Albert Hofmann (1906–2008) was working on the partial synthesis of
ergoline alkaloids at Sandoz. In 1938 he wanted to find a way to transfer the
respiratory and cardiovascular stimulatory effect of N,N-diethyl nicotinamide
2.20 onto this class of compounds. In analogy to 2.20, he prepared N,N-diethyl
lysergamide 2.21 (Fig. 2.8) with the hope of maintaining the stimulatory circulatory
and respiratory effects. Apart from the experimental animals becoming agitated
under anesthesia, the substances showed no particular effect. Therefore they were
not pursued at first. Hofmann prepared the substances a second time five years
later because he wanted to investigate them more thoroughly. During purification
and recrystallization he reported feeling “a strange agitation combined
with a slight dizziness.” At home he fell into “a not-unpleasant inebriated condition
that was characterized by extremely animated fantasies . . . after about 2 hours, the
condition went away.” Hofmann suspected a connection to the compounds he
had prepared and conducted a self-experiment with 0.25 mg a few days later. That
was the smallest dose with which he expected to see an effect. The outcome was
dramatic; the experience was the same as the first time, but much more intense. He
had a technician accompany him home on his bicycle. During the ride, his condition
took on a threatening form, and he fell into a severe crisis dominated by dizziness
and anxiety. The world took on a grotesque form. Later it was determined that
0.02–0.1 mg is enough to cause hallucinations. The substance was temporarily
marketed as Delysid® for use in psychotherapy and to treat anxiety and compulsive
disorders.
Fig. 2.8 N,N-Diethyl nicotinamide 2.20 is a centrally active derivative of nicotinic acid.
Hofmann wanted to synthesize a general stimulant analogously by preparing the N,N-diethyl
amide of lysergic acid. The result was the hallucinogen lysergic acid diethyl amide 2.21 (LSD).
2.6 The Synthetic Route Determines the Structure
The structure of the first calcium channel blocker, verapamil 2.22, was determined
by its synthesis (Fig. 2.9). Verapamil counteracts the effects of β-adrenergic
agonists, but it is not a β-blocker. It was only after its introduction to the market
that Albrecht Fleckenstein clarified its mode of action: it blocks the
membrane-voltage-dependent inward flow of calcium ions through the calcium channels
(▶ Sect. 30.1) in heart and endothelial cells. The hypotensive effect was initially
seen as a side effect, but in the following years it became the most important reason
for use. The second therapeutically important group of calcium channel blockers,
represented by nifedipine 2.23, was inspired by a synthetic principle: a reaction from 1882,
the Hantzsch synthesis of dihydropyridines (Fig. 2.9). Remarkably, the pharmacological
experiments on nifedipine had to be carried out in a darkened room because
of its photosensitivity. It is all the more impressive that it was developed into
a medicine despite this characteristic.
Fig. 2.9 Ferdinand Dengel, a chemist at the former Knoll AG, wanted to prepare a cardiovascular
therapeutic by alkylating a nitrile. To avoid a double substitution, he started with the sterically
demanding isopropyl group. The result was the first calcium channel blocker, verapamil 2.22. The
isopropyl group is the optimal alkyl group because it stabilizes the biologically active conformation.
The synthetic route played an important role in the development of the second calcium
channel blocker, nifedipine 2.23. In 1948, Friedrich Bossert at Bayer was given the task of finding
new substances that dilate the coronary arteries. After years of work, in 1964 he turned to the easily
prepared dihydropyridines, which surprisingly displayed the desired effects. In this case, the
space-filling nitro group promotes the biologically active conformation (▶ Sect. 17.9).
2.7 Surprising Rearrangements Lead to Medicines
Leo Sternbach (1908–2005), a chemist at Hoffmann-La Roche, was involved in
a program in the mid-1950s to find structurally novel tranquilizers. Sternbach
remembered a synthetic program on pigments from a decade before in which
N-oxide 2.24 (Fig. 2.10) was also prepared. Its reaction with secondary amines
delivered the expected products, which were pharmacologically entirely
uninteresting. The work was practically ended in 1957, and the laboratory was
being cleaned up when it was noticed that a crystalline base and its hydrochloride
salt had precipitated from a solution. The substance was the product of a reaction
between N-oxide 2.24 and methylamine, but it had never been tested due to other
priorities. The subsequent pharmacological testing convincingly showed outstanding
qualities. It was only later established that an unexpected ring rearrangement reaction
had occurred to afford chlordiazepoxide 2.25 (Librium®, Fig. 2.10).
Fig. 2.10 Treatment of 2.24 with methylamine delivers the rearrangement product chlordiazepoxide
2.25 (Librium®) instead of the expected one. This first test compound became the first of
the benzodiazepine class to be marketed.
There are other examples of this sort. In 1974 W. Berney was working on
spirodihydronaphthalenes 2.26 (Fig. 2.11) with the goal of preparing CNS-active
substances. Upon acid treatment, he obtained a compound that was highly potent
in vitro and in vivo against a series of human-pathogenic fungi in a routine broad
screening at Sandoz Research Institute in Vienna. In 1985 the substance was
introduced as naftifine 2.27, and later a more potent analogue, terbinafine 2.28
(Fig. 2.11), followed. Both substances showed a previously unknown mode of
action. They damage the fungal membrane by blocking ergosterol
biosynthesis at a very early step, through inhibition of the
enzyme squalene epoxidase.
2.8 A Long List of Accidents
The list of accidental discoveries, of which only a few are described here, could be
extended ad infinitum. A few more examples are briefly mentioned without
chemical formulae.
• Pethidine (▶ Sect. 3.3), the first fully synthetic opiate analgesic, was synthesized
in the 1930s as part of an anticonvulsives research program, by starting from
atropine.
• The suitability of antihistamines for the prevention of motion sickness was
discovered in Boston because of a treatment for a skin rash. A patient reported
that her motion sickness, which always occurred when riding a Boston streetcar,
went away. The “clinical trial” was carried out in 1947 on hundreds of sailors on
the transatlantic voyage of the USNS General Ballou.
• Haloperidol (▶ Sect. 3.3) was meant to be an analgesic; it turned out to be
a neuroleptic.
• Imipramine is structurally very similar to the neuroleptic chlorpromazine
(▶ Sects. 1.6 and ▶ 8.5). Nonetheless it has the opposite effect and is an
antidepressant.
• Phenylbutazone was meant to be an additive used to dissolve the anti-
inflammatory aminophenazone. The substance turned out to be an anti-
inflammatory agent itself as did its metabolite, oxyphenbutazone.
Fig. 2.11 Instead of CNS activity, naftifine 2.27, prepared from spiro-compound 2.26, is an
antimycotic. A comparison with the more potent terbinafine 2.28 shows that the phenyl group can
advantageously be replaced with a tert-butylethynyl group.
• An attempt to isolate the causative agent of bipolar disorder from the urine
of patients afforded only uric acid. Because uric acid is poorly soluble,
lithium urate was tested. This led to the discovery of the antidepressant effect
of lithium salts.
• Clonidine was meant to be a local treatment for the runny nose that accompanies
the common cold. Instead of the expected effect, a profound hypotensive
effect was surprisingly found. Despite intensive structural variations, none of
clonidine’s analogues have surpassed its potency.
• Levamisole was developed as a broad-spectrum anthelmintic (anti-worm agent).
Instead, an immunomodulatory effect was accidentally found that now stands in
the therapeutic foreground.
• Praziquantel was originally meant to be an antidepressant. Because of its high
polarity, it cannot cross the blood–brain barrier. An outstanding suitability for
the treatment of the tropical disease bilharziasis was found through broad
biological testing.
• A chemist at Searle who was working on dipeptides licked his fingers while
flipping through the pages of a book. The sweet taste that he noticed turned out to
be caused by the artificial sweetener aspartame. Saccharin was also found in
a very similar way. In the case of cyclamate, a smoker noticed a sweet taste to his
cigarettes.
• Even today when one would think that rational concepts dominate drug research,
the lucky accident still helps to make “blockbusters.” In the pursuit of a
phosphodiesterase inhibitor to hinder the degradation of cyclic guanosine
monophosphate (cGMP), an improved treatment for angina pectoris was not
found (▶ Sect. 25.8). Instead it became conspicuous that the male subjects in the
clinical trial did not want to give up the substance. After the side effect of
a stronger penile erection was recognized, the side effect became the main effect.
The compound sildenafil was marketed for the treatment of erectile dysfunction
as Viagra® and developed into a billion-dollar product.
2.9 Where Would We Be Without Serendipity?
In the English-speaking world, a word is in use that is difficult to translate into other
languages: serendipity. This term, as an expression of a lucky accident, was coined
by Sir Horace Walpole in 1754. It is derived from a Persian fairytale in which three
princes of Serendip (earlier Ceylon, today Sri Lanka) have accidental and unex-
pected luck and make interesting discoveries entirely analogously to the many
examples in this chapter. Serendipity has played an exceedingly important role in
general in science, and especially in drug research. What would our modern
supply of medicines look like without all of these lucky accidents? By no means should
research proceed arbitrarily and simply count on an accidental discovery. To the
contrary, chemists and pharmacologists have always developed concrete ideas as
to how and why particular structural variations on a lead compound should be
pursued. Some of these hypotheses were correct, and others were false. What the
successful researchers always had in common was that when
a hypothesis failed, or an unexpected result was found, they recognized the potential
consequences of the result, drew the correct conclusions, and did the right
things. The following chapters will show numerous examples of successful targeted
drug design in cases in which the correct working hypothesis was realized. The
search for a new active substance is, however, not a process that can be pushed
through by a purely technically oriented management. As a general rule, short-term
planning and bureaucratic control have only negative consequences. On the other
hand the search for new medicines requires a concerted effort from many different
groups of specialists, who must work together in a suitable organizational structure.
The subsequent preclinical and clinical development of a newly found active
substance is an extremely expensive and time-consuming process that must be
carefully planned, carried out, and controlled. This requires different instruments
from those used for drug discovery.
2.10 Synopsis
• The history of early drug research is full of lucky accidents. Many active
principles of substances were discovered by serendipity, but mostly success
can be attributed to an outstanding researcher with a “prepared mind” who
observed important effects.
• Dyes and pharmaceuticals, both developed in the early stages of the emerging
chemical industry, stimulated each other in very fruitful synergies.
• The discovery by Alexander Fleming of the first antibiotic principle, the peni-
cillins, as a defense mechanism of a fungus against bacteria, is one of the most
famous examples of a serendipitous discovery.
• The partial synthesis of ergoline alkaloids led to the discovery of the hallucinogenic
effects of LSD. In those days, researchers frequently conducted self-experiments
to first test active principles in humans.
• Unexpected synthetic products, surprising structural rearrangements, and initially
false working hypotheses produced new, pharmacologically interesting
substances with surprising or outstanding qualities.
• Even today, when rational concepts and the understanding of mode-of-action
dominate drug research, the lucky accident can still help to make “blockbusters,”
as proven recently by the example of sildenafil (Viagra®).
3 Classical Drug Research
The hundred years of pharmaceutical research from 1880 to 1980 were punctuated
by trial and error, but also by elegant ideas and their translation into therapeutically
valuable principles. Many lead structures were found by accident (see ▶ Chap. 2,
“In the Beginning, There Was Serendipity”), others came from traditional medi-
cines or from biochemical concepts. In contrast to modern drug research, classical
design was the result of rather limited knowledge of the pathophysiology and
cellular and molecular etiology of disease, and was restricted to animal experi-
ments. Nonetheless, this phase, and particularly the last 50 years, has been excep-
tionally successful. The targeted fight against infectious diseases and the successful
treatment of many psychiatric and other important diseases can be attributed to this
period in drug development. With this came a significant increase in quality of life
and life expectancy. In the following sections, selected examples are used to
demonstrate different aspects of classical pharmaceutical research.
3.1 Aspirin: A Never-Ending Story
The history of acetylsalicylic acid (ASA, Aspirin®) reflects the progress of pharmaceutical
research like no other example. This is especially true for the elucidation
of the mode of action, and the newly found targeted therapies that resulted.
Willow bark extracts have been used since antiquity for the treatment of inflammation.
When Napoleon marched across Europe between 1806 and 1813, the bark was
even used as a substitute for cinchona bark (Sect. 3.2). Salicin 3.1, a glucoside of the
o-hydroxybenzyl alcohol saligenin, is responsible for the effect. Upon hydrolysis
and oxidation, the actual active compound, salicylic acid 3.2 (Fig. 3.1), is formed.
In 1897 the then 29-year-old Bayer chemist Felix Hoffmann began a systematic
search for derivatives of salicylic acid. His father, who suffered from severe
rheumatoid arthritis, had asked him to. High doses of salicylic acid caused unpleas-
ant gastric irritation and vomiting. Hoffmann prepared simple derivatives of
salicylic acid, and was successful within the year. On October 10, 1897 he synthe-
sized acetylsalicylic acid 3.3 (ASA, Fig. 3.1) for the first time in a pure form.
G. Klebe, Drug Design, DOI 10.1007/978-3-642-17907-5_3,
© Springer-Verlag Berlin Heidelberg 2013
It was a lucky strike. Although ASA has a very short half-life in plasma, it is
highly analgesic, antipyretic, and anti-inflammatory. The clinical trial was
carried out at the Diakonissenkrankenhaus in Halle an der Saale on 50 patients. On
February 1, 1899 Bayer registered ASA as Aspirin® (A for acetyl and spiraea,
another plant that contains salicylic acid) as a trademark under the number 36 433.
From then on it was sold as 1 g of powder in envelopes, and shortly thereafter as
tablets. Detractors alleged that it was only developed in tablet form so that Bayer
could emboss their famous Bayer cross onto it. Aspirin quickly gained a leading
place in drug therapy. One hundred years after its market introduction, 40,000 t of
ASA are produced and pressed into tablets every year, worldwide. At the end of
1994 the Bayer plant in Bitterfeld produced 400,000 Aspirin® tablets per hour, 3.5
billion per year. The importance that the trademark Aspirin had for Bayer became
clear in 1994 when the company paid US$1 billion to take over the self-medication
business from Sterling-Winthrop, which included the trademark rights for
Aspirin, which had been lost in 1918.
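The production figures quoted above can be cross-checked with a few lines of arithmetic. The following Python sketch is only an illustration: round-the-clock operation of the plant and an ASA content of 500 mg per tablet are assumptions made here, not figures from the text, and the function names are hypothetical.

```python
# Figures quoted in the text.
TABLETS_PER_HOUR = 400_000        # Bitterfeld plant, end of 1994
ASA_TONNES_PER_YEAR = 40_000      # worldwide annual ASA production

# Assumptions made for this sketch (not stated in the text).
HOURS_PER_YEAR = 24 * 365         # uninterrupted, round-the-clock operation
ASA_MG_PER_TABLET = 500           # hypothetical ASA content of one tablet

MG_PER_TONNE = 10**9              # 1 t = 1,000 kg = 10^9 mg


def plant_tablets_per_year() -> int:
    """Annual output implied by the hourly rate under continuous operation."""
    return TABLETS_PER_HOUR * HOURS_PER_YEAR


def worldwide_tablet_equivalents() -> int:
    """Number of assumed 500 mg tablets contained in the annual ASA tonnage."""
    return ASA_TONNES_PER_YEAR * MG_PER_TONNE // ASA_MG_PER_TABLET


if __name__ == "__main__":
    print(f"Plant output: {plant_tablets_per_year() / 1e9:.1f} billion tablets per year")
    print(f"Worldwide ASA: {worldwide_tablet_equivalents() / 1e9:.0f} billion tablet equivalents")
```

Under these assumptions the hourly rate reproduces the quoted 3.5 billion tablets per year, and the worldwide tonnage corresponds to roughly 80 billion tablets of the assumed size.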
The Spanish philosopher José Ortega y Gasset called the previous century the
“age of Aspirin.” In his book The Revolt of the Masses, he wrote:
The ordinary person lives today more easily, comfortably and safely than the most powerful
of the past. Why should he care that he is not richer than others when the world is and
roads, trains, hotels, telegraphs, personal safety, and Aspirin® are at his disposal.
Jaroslaw Hasek, Kurt Tucholsky, Giovanni Guareschi, Graham Greene, John
Steinbeck, Agatha Christie, Truman Capote, Hans Helmut Kirst, and Edgar
Wallace also wrote about Aspirin. The singer Enrico Caruso treated his headaches
with only “German Aspirin,” out of principle. Even Franz Kafka and Thomas Mann
raved about its outstanding effects in their letters. In 1986 on an official visit to
Germany, Queen Elizabeth II said that:
German successes span the entire breadth of human life. From philosophy, music and
literature, to the discovery of X-rays and the mass production of Aspirin®.
The compliment was wonderful, but one must also consider that all of these
scientific discoveries are slightly more than 100 years old! ASA was considered to
be a prodrug of salicylic acid and a drug of unknown mode of action until John Robert
Vane (Nobel Prize 1982) and Sergio H. Ferreira discovered in 1971 that salicylic
acid and other nonsteroidal anti-inflammatory drugs inhibit prostaglandin G/H
synthase (cyclooxygenase, COX). COX, a ubiquitously present, membrane-bound
enzyme, transforms arachidonic acid 3.4 via a cyclic endoperoxide into PGH2 3.5,
which in turn is transformed into prostacyclin 3.6, thromboxane A2 3.7, and other
prostaglandins. Large quantities of prostaglandins are produced in inflamed tissue, so
that the inhibition of cyclooxygenase intervenes in the cause of the process
itself (Fig. 3.2).
Fig. 3.1 Salicylic acid 3.2 is the oxidation and cleavage product of salicin 3.1, which is isolated
from willow bark. Acetylsalicylic acid (ASA) 3.3 is not simply a prodrug of salicylic acid, but
rather a drug with its own mode of action.
ASA is in fact a metabolic precursor of salicylic acid. In contrast to other anti-
inflammatory drugs, including salicylic acid, however, it has an astonishing mode
of action (▶ Sect. 27.9). It has been known for some time that ASA selectively
acetylates the hydroxyl group of the amino acid serine 530 of cyclooxygenase. In
1995 the three-dimensional structure of the complex with a bromine analogue was solved
for the first time. It confirms that ASA, analogously to other COX
inhibitors, docks near the arachidonic acid binding site (▶ Sect. 27.9). Therefore,
despite its relatively weak binding, ASA is in an outstanding position to acetylate
this serine. Serine 530 is not involved in the catalytic mechanism, but the additional
volume of the acetyl group impedes arachidonic acid’s entrance to the binding site
and therefore the synthesis of the prostaglandin precursors. A COX mutant that
carries an alanine instead of a serine at position 530 is enzymatically fully active
but is still inhibited by all other anti-inflammatory compounds. This mutant is, as
expected, only weakly inhibited by ASA.
Fig. 3.2 Arachidonic acid 3.4 undergoes an oxidative cyclization and a peroxidase reaction in the
prostaglandin biosynthesis to give the primary product PGH2 3.5. Finally prostacyclin synthase
transforms PGH2 into prostacyclin 3.6, which protects the gastric mucosa, dilates blood vessels,
and inhibits platelet (thrombocyte) aggregation. The platelet thromboxane synthase transforms
PGH2 into thromboxane A2 3.7, which promotes aggregation. ASA irreversibly inhibits cyclooxygenase.
By using low ASA doses, the thromboxane A2 synthesis in the platelets is more strongly
inhibited than the production of prostacyclin in the vascular walls.
Stimulation for the continued research on nonsteroidal anti-inflammatory drugs
was generated by the discovery in 1991 of a second cyclooxygenase, COX-2. All
anti-inflammatory drugs until then were unselective, or they exerted their effect
predominantly via COX-1 and only slightly via COX-2. The most important
side effect of ASA and other anti-inflammatory drugs is the gastrointestinal damage
that can occur at high doses; this results from the inhibition of the COX-1-
dependent synthesis of prostacyclin 3.6, which protects the gastric mucosa. In
contrast to the ubiquitously occurring COX-1, COX-2 is responsible for the fast
synthesis of prostaglandins in inflamed tissue. It has been possible to bring many
drugs to the market that are more than 1,000-fold more selective for COX-2 than
COX-1, for instance, 3.8 and 3.9 (Fig. 3.3 and ▶ Sect. 27.9).
But do not worry, Aspirin® will live forever. Its success is growing in another
market. Even at low doses ASA inhibits the synthesis of thromboxane A2 3.7, which
initiates the aggregation of platelets (thrombocytes). Because of its irreversible
inhibition of cyclooxygenase, and the inability of platelets to synthesize new
enzyme, a one-time contact with the substance is enough to suppress the synthesis
for the lifetime of the thrombocyte, that is, for about a week. In tissues other than
thrombocytes, the enzyme is replaced. Therefore the physiological
antagonist of thromboxane, the aggregation-inhibiting prostacyclin that is produced in the
walls of the vasculature, can be replenished (Fig. 3.2).
With regard to the condition of increased coagulation tendency, ASA adjusts the
biosynthesis away from the “bad” thromboxane in the direction of the “good”
prostacyclin. This effect is the basis for the therapeutic use of ASA in cases of
thrombosis susceptibility, for instance, before and after a heart attack or stroke.
Considering the now-known mechanism of the effect, the dose can be decreased
tenfold! That reduces the risk of gastrointestinal bleeding as a possible side effect.
Based on these observations, it is now recommended that ASA be taken
prophylactically before long-haul flights. The constricted sitting and lack of movement
coupled with the dry air and reduced pressure in the cabin lead to dehydration
and cause a “thickening” of the blood. The economy-class syndrome typically leads
to jet legs and increases the risk of embolism and venous thromboses. Here ASA can
offer a measure of protection. On the other hand, its use before surgical procedures
is not recommended. No surgeon wants an increased bleeding risk for the patient as
a result of diminished coagulation competence during a procedure.
Fig. 3.3 Celecoxib 3.8 and valdecoxib 3.9 are specific inhibitors of cyclooxygenase
COX-2, which, more than COX-1, is responsible for the fast
synthesis of prostaglandins in inflamed tissue.
Felix Hoffmann’s approach of using simple derivatization to improve the
tolerability of a substance led to a new therapeutic principle 100 years ago, the
value of which cannot be overestimated. The triumphal march of ASA was, and
is, unstoppable. A German/Austrian study on 13,300 patients showed that ASA
therapy reduces the mortality of a heart attack by 17%, and the number of non-fatal
repeat attacks by 30%. On October 9, 1985 the US FDA, a normally
conservative organization, announced that the daily consumption of ASA can
reduce the risk of a recurrent heart attack by 20%, and in some high-risk
populations by even more than 50%. A further study on 22,000 physicians investigated
the influence of regular ASA use on the risk of heart attack. Here, the
physicians were not the experimenters but the patients. The study was prematurely
ended when it was established that the control group had 18 lethal and 171 non-lethal
heart attacks, whereas the ASA-treated group had 5 lethal and 99 non-lethal
heart attacks: altogether a reduction of almost half. A study on 90,000 nurses showed the
same protective effect in women. The risk of a first heart attack was reduced by
30%. This marked the introduction of ASA as a “preventive medicine.”
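The event counts reported for the physicians' study can be turned into relative reductions with a short calculation. The Python sketch below is a rough illustration rather than a statistical analysis: it assumes that the control and ASA groups were of equal size (the text gives only the total of 22,000 participants), so that raw counts can stand in for event rates. The function name `relative_reduction` is hypothetical.

```python
# Heart-attack counts quoted in the text for the study on physicians.
control = {"lethal": 18, "non_lethal": 171}
treated = {"lethal": 5, "non_lethal": 99}


def relative_reduction(control_events: int, treated_events: int) -> float:
    """Fractional drop in events in the treated group relative to control.

    Assumption of this sketch: both groups have the same size, so the
    counts can be compared directly instead of computing event rates.
    """
    return (control_events - treated_events) / control_events


total_control = sum(control.values())   # all heart attacks, control group
total_treated = sum(treated.values())   # all heart attacks, ASA group

rows = [
    ("lethal", control["lethal"], treated["lethal"]),
    ("non-lethal", control["non_lethal"], treated["non_lethal"]),
    ("all heart attacks", total_control, total_treated),
]

for label, c, t in rows:
    print(f"{label}: {c} -> {t}, relative reduction {relative_reduction(c, t):.0%}")
```

With equal group sizes, the overall reduction comes to roughly 45%, and the relative effect is larger for lethal than for non-lethal heart attacks.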
A six-year study of 600,000 volunteers is worth an entry in the Guinness Book of
World Records. After the results were in, it appeared that ASA reduces the risk of
lethal colon cancer by 40%. Even this effect has a plausible explanation.
Malondialdehyde, a metabolite of prostaglandins, damages DNA. Mutations in
the so-called tumor-suppressor gene TP53 occur particularly frequently in human colon
tumors. This causes the cancer cells to lose the ability to regulate their
growth, and they grow uncontrollably. The explanation could also be entirely different. As a result
of gastrointestinal bleeding, a possible side effect of ASA, the treated group was
probably examined more frequently than the control group. It is entirely conceivable
that the colon cancer was therefore found at an earlier stage in which it was
more easily operable.
Since 1992 Aspirin® has been available as a chewable tablet. In this form it is buffered
with calcium carbonate, the absorption is much faster, and the side effects are
reduced. ASA has had an unbelievable career, particularly if one considers that it
would never have had a chance of being approved under modern criteria. Its short
plasma half-life, the irreversible protein inhibition, and the high doses would have
met today’s exclusion criteria. A definitive end point in its hypothetical modern
development would be the teratogenicity seen in rats. A pathological result in
toxicity studies with this animal model will definitely lead to discontinuation,
because who would dare to wager that a teratogenic effect occurs in rodents but
not in humans? Aspirin®: truly a never-ending story.
3.2 Malaria: Success and Failure
The therapy of malaria begins with the discovery of cinchona, around which there
are numerous legends. The nicest and most frequently cited version is that of the
fever-stricken Countess Cinchon, the wife of the Spanish viceroy in Lima, Peru,
who was healed by the doctor Juan de Vega in 1638. On the advice of the town
magistrate of Loja, Quinquina, the “bark of the barks” (hence the confusing name
“cinchona bark”), was brought in from 800 km away. The Countess was allegedly
healed and from then on distributed the powder herself. In the older works, the
cinchona bark was also called “Countess powder” or “Jesuit powder.” Perhaps it
was also true that the Indians, who were forced into compulsory service in the silver
mines by their Christian conquerors, chewed the bark to fight off shivering in the
cold. The clever Jesuits took note of these observations, and thought that chewing
the bark would also help with the shivering that comes from a malarial fever
episode. Cinchona then came back to Europe with the Jesuits.
Malaria, the remittent fever, is a widespread tropical and sub-tropical disease.
Because it is transmitted by the anopheles mosquito, it occurs particularly in
wetlands. Even the city Buenos Aires (Span. “good airs”) was badly hit by malaria
(Ital. mala aria = “bad airs”). Alexander the Great, the Gothic King Alaric, and
the German Emperors Otto II and Heinrich IV died of it. Even Albrecht Dürer
(1471–1528) apparently suffered from malaria. He sent his private physician
a drawing of himself in which he was wearing only a loincloth. His right hand is
over his spleen, with the additional text do der gelb Fleck ist vnd mit dem Finger
drawff dewt, do ist mir we (there where the yellow spot is and where the finger
points, is where it hurts). In Europe malaria was still widespread until the middle of
the last century. In the north of Germany, the last epidemics were in the years 1896,
1918, and 1926.
Miasmas, emissions from the ground, swamps, and corpses, were long seen
as the source of malaria and other epidemics. The Roman author Marcus Terentius
Varro (116–27 BC) suspected even then that small invisible organisms might be
responsible. Toward the end of the nineteenth century, the anopheles mosquito was
identified as the vector, and a plasmodium was recognized as the cause of malaria.
Around 1930 about 700 million people were infected, and in 2003 the number was
estimated to be 300–500 million. Up to 1.2 million people die every year, mostly
children under the age of 5, and many others retain permanent damage. Psychiatric
changes are also a consequence. The term “spleen” for eccentricity originally came
from the enlarged spleen that malaria causes.
It should not go unmentioned that heterozygous (i.e., genetically mixed) carriers
of sickle cell anemia are protected from malaria. This genetic form of anemia was
the first disease for which the molecular cause could be identified (▶ Sect. 12.12).
A single amino acid in the hemoglobin of those afflicted is mutated. This causes
hemoglobin to aggregate, and the erythrocyte shrinks together. The malaria parasite
cannot adequately reproduce in such an erythrocyte. This partial protection from
malaria has abetted the spread of sickle cell anemia in malaria-endemic areas, but
not in other areas.
The active substance in the cinchona bark, the alkaloid quinine 3.10 (Fig. 3.4)
was isolated in 1820. Aside from the positive therapeutic effects, it also had
considerable side effects; nonetheless up until a few years ago it was the most
important antimalarial, particularly for the parenteral treatment of severe malaria.
The first synthetic alternative, plasmoquine 3.11, became available in 1927, but it is
seldom used due to its side effects. The later-developed, more potent analogues
3.12–3.14 show a clear structural relationship to the lead structure quinine
(Fig. 3.4). It was only through the protection from malaria that the exploitation of
the colonies was possible. The World Health Organization, WHO, initiated a global
malaria-eradication program in 1955 mainly through the use of the insecticide
dichlorodiphenyltrichloroethane 3.16 (DDT, Fig. 3.5).
[Figure 3.4: chemical structures of 3.10 Quinine, 3.11 Plasmoquine, 3.12 Mepacrine, 3.13 Chloroquine, 3.14 Mefloquine, and 3.15 Amodiaquine]
Fig. 3.4 Simple synthetic analogues with antimalarial effects were derived from quinine 3.10.
Plasmoquine 3.11 still contains the methoxyquinoline ring of quinine, but it is in a different
position. The later-developed analogues mepacrine 3.12 and chloroquine 3.13 show strong
similarity to quinine. The newer derivatives mefloquine 3.14 and amodiaquine 3.15 are also
structurally closely related to quinine.
The success was overwhelming: the number of cases and fatalities was reduced
to practically zero (Table 3.1). In 1953 it was estimated that five million lives had
been saved since 1942. In India alone the number of cases went from 75 million to
750,000, and the number of annual fatalities was reduced to 1,500. DDT has saved
more lives than all antimalarial drugs put together! The acute toxicity of DDT is
actually not a problem for mammals and humans. Unfortunately, it turned out that
DDT decomposes extremely slowly in the environment, and it becomes more concentrated
as it moves up the food chain, especially in birds and fish. It also accumulates in human
fat and in breast milk. The chronic toxicity comes from long-term retention of
one year or more, and that is a serious problem.
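As a rough illustration of the scale reported in Table 3.1, the relative decline in case numbers can be computed directly from the before and after counts. The short Python sketch below uses a few rows of the table; the function name percent_reduction and the choice of rows are illustrative assumptions, the India figures are the approximate values given in the table, and no correction is made for the different reporting years or population sizes.

```python
# Case counts (before DDT, after DDT) copied from Table 3.1.
# The India entries are approximate values in the source.
CASES = {
    "Italy": (411_602, 37),
    "Turkey": (1_188_969, 2_173),
    "India": (75_000_000, 750_000),
    "Mauritius": (46_395, 17),
}


def percent_reduction(before: int, after: int) -> float:
    """Relative decline in reported cases, in percent of the 'before' value."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    for country, (before, after) in CASES.items():
        print(f"{country}: {percent_reduction(before, after):.2f} % fewer cases")
```

Even the least dramatic of these rows corresponds to a decline of about 99 %, which is the sense in which the text speaks of cases being reduced to practically zero.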
The moving book Silent Spring by Rachel Carson was published in 1962.
Despite warnings from experts, DDT spraying for mosquitoes was stopped
in Sri Lanka in 1963, and the number of malaria cases raced to 2.4 million by
1968/1969. By then it was too late to use DDT again because the mosquitoes had
become resistant, and this was certainly also partially due to the residual DDT that
remained in the environment in the intervening years.
Table 3.1 Number of malaria cases in different countries before and after the introduction
of DDT 3.16 (Fig. 3.5). The numbers in parentheses are the years (Jukes TH (1974) Naturwiss
61:6–16)
Country      Cases before DDT (year)    Cases after DDT (year)
Italy        411,602 (1946)             37 (1969)
Spain        19,644 (1950)              28 (1969) (a)
Yugoslavia   169,545 (1937)             15 (1969) (a)
Bulgaria     144,631 (1946)             10 (1969) (a)
Romania      338,198 (1948)             4 (1969) (a)
Turkey       1,188,969 (1950)           2,173 (1969)
India        ~75 million per year       ~750,000 (1969)
Sri Lanka    2.8 million (1946)         110 (1961)
                                        31 (1962)
                                        17 (1963)
                                        2.5 million (1968/1969) (b)
Taiwan       >1 million (1945)          9 (1969)
Venezuela    817,115 (1943)             800 (1958)
Mauritius    46,395 (1948)              17 (1969)
(a) Imported cases
(b) After DDT spraying was discontinued in 1963
[Figure 3.5: chemical structures of 3.16 DDT and 3.17 DDE]
Fig. 3.5 The insecticide p,p′-dichlorodiphenyltrichloroethane 3.16 (DDT) saved more human
lives than all antimalarials put together. The latest investigations show, though, that the
antiandrogenic effect of the main metabolite p,p′-dichlorodiphenyldichloroethylene 3.17
(DDE) is possibly the main culprit responsible for reproductive disorders found in animals,
including perhaps humans.
Further investigations showed that a DDT metabolite, dichlorodiphenyldichloroethylene
3.17 (DDE, Fig. 3.5), has surprisingly strong antiandrogenic effects, that
is, it blocks the effects of male hormones. DDE is therefore thought to be responsible for the
DDT-dependent reproductive and developmental disorders that are seen in some
species, perhaps also in humans. It is remarkable that the effect of this metabolite
was only discovered 50 years after DDT was introduced.
Not only did the mosquitoes become resistant to DDT; the parasite also became
resistant to the drugs. For this reason, the history of chemotherapeutic developments
for malaria has been a rollercoaster ride of promising new compounds followed by
the more or less rapid emergence and spread of resistant parasites.
Chloroquine 3.13 was prepared in 1934 in the Bayer laboratories, but was
judged to be “too toxic”; it was “rediscovered” by the Americans and deployed
as a malaria therapeutic par excellence. Efficacious, well tolerated, and above all
else inexpensive to produce, it, along with the above-described mosquito extermination
with DDT and landscaping measures, brought us within reach of a victory
over malaria. But resistant parasites emerged almost simultaneously and independently
of one another in the 1960s in different parts of Southeast Asia, Oceania,
and South America. They possessed a mutated transport protein in the membrane of
their digestive vacuole that recognizes chloroquine as a substrate. By using this protein they
were able to expel chloroquine from its site of action. In the meantime, resistant parasites
have spread throughout almost the entire geographic range of malaria. Chloroquine
lost its once phenomenal status for the therapy of malaria tropica. Since then,
researchers have sought an antimalarial with qualities similar to those of chloroquine,
so far without success. The structurally related
amodiaquine 3.15 (Fig. 3.4) is in fact effective against weakly chloroquine-resistant
strains, but it is largely ineffective against highly resistant strains (especially in
Southeast Asia). Moreover, upon long-term use as a prophylaxis, it carries the risk
of irreversible liver damage or a life-threatening agranulocytosis. For a short time,
it appeared that the antifolate combination of sulfadoxine/pyrimethamine 3.18/3.19
(Fansidar®, Fig. 3.6) could replace chloroquine, but the first resistance occurred
much faster than with chloroquine. Starting from the point of origin in Southeast
Asia, the resistance has spread over the entire world.
The wars of the last century have also promoted the search for new antimalarial
drugs. Tremendous effort was made at the Walter Reed Army Institute of
Research in the USA. Over the course of 40 years, and particularly during WWII
and the Vietnam War, more than 250,000 substances were tested for an antimalarial
effect. Measured against the effort expended, the success was modest:
the two aryl amino alcohols halofantrine 3.20 and mefloquine 3.14, and the
8-aminoquinoline tafenoquine 3.21, which still has not completed clinical trials,
were the result of strenuous labor. After its introduction, halofantrine was withdrawn
from the market because it caused lethal arrhythmias (▶ Sect. 30.3). In
Southeast Asia the resistance to mefloquine developed so quickly that it can only
be used in combination with artesunate 3.22. Elsewhere, because mefloquine has been used
sparingly owing to its price, most of the parasite strains are still sensitive to it. For this
reason, mefloquine is today one of the most important malaria prophylactics for
Western tourists. Artesunate is a semisynthetic derivative of dihydroartemisinin
3.24, which is itself derived from artemisinin, the active substance isolated from annual
mugwort (Artemisia annua). Artemisinin’s
very unusual endoperoxide structure is essential for its activity. Intense research
is currently devoted to clarifying whether the iron(II)-catalyzed production
of radicals, which then react with the immediate cell structures (iron-triggered
cluster bomb), or a specific calcium pump inhibition is its mode of action. At any
rate, these are the most potent medicines to fight malaria to date. Scientists consider
it to be only a matter of time until resistance to artemisinin develops.
The artemisinin-based combination therapy is the current recommendation of the
WHO. It is combined with whatever is available, even with substances that have
already established massive resistance. At the moment it is combined with the
Chinese-developed aryl amino alcohol lumefantrine 3.23, which is usually
still effective. The combinations of dihydroartemisinin/piperaquine 3.24/3.25
and artesunate/pyronaridine 3.22/3.26 (Fig. 3.6) are in advanced stages of clinical
trials.
Both combination partners were developed in China in the 1960s and 1980s,
respectively. They belong to the same class as chloroquine, even though
pyronaridine has an azaacridine instead of a quinoline scaffold. Resistance to both
of these compounds is already widespread in Southeast Asia. The combination of
dapsone/chlorproguanil (LapDap®) 3.27/3.28 was introduced only a few years ago
and both compounds are representatives of a long-used class: the antifolates. Even
in this case, the majority of Southeast Asian strains are already resistant. True
novelties in the mode of action are rare. In 1997 a very expensive combination
medication, atovaquone/proguanil 3.29/3.30 (Malarone®), was introduced that
synergistically inhibits the mitochondrial respiratory chain. Fosmidomycin 3.31, an
inhibitor of the parasite-specific mevalonate-independent isoprenoid synthesis
pathway, is currently in clinical trials. Increased efforts are necessary to find new
substances. Ideally, modes of action that have not been exploited yet should be
pursued. It is only in this way that we can be armed and ready for the time that
resistance to artemisinin spreads.
[Figure 3.6: chemical structures of 3.18 Sulfadoxine, 3.19 Pyrimethamine, 3.20 Halofantrine, 3.21 Tafenoquine, 3.22 Artesunate, 3.23 Lumefantrine, 3.24 Dihydroartemisinin, 3.25 Piperaquine, 3.26 Pyronaridine, 3.27 Dapsone, 3.28 Chlorproguanil, 3.29 Atovaquone, 3.30 Proguanil, and 3.31 Fosmidomycin]
Fig. 3.6 The latest research in antimalarials shows that many products can be used in
combination. First Fansidar®, a combination of sulfadoxine 3.18 and pyrimethamine 3.19, was the
drug of choice. The rapid development of resistance has since made this once-promising
treatment useless. To date, hopes rest on the artemisinin derivatives 3.22 and 3.24. A new
beacon of hope is found in fosmidomycin 3.31, which has a novel mode of action in that it inhibits
the mevalonate-independent biosynthetic route to isoprenoids.
3.3 Morphine Analogues: A Molecule Cut to Pieces
Research on the opiates has taught us how complex natural products can be
systematically simplified, and structurally abbreviated analogues can be prepared
that have the identical effect, but sometimes with even better specificity. It has also
shown that there is sometimes no obvious solution for a specific problem: the
analgesic and addictive qualities could not be separated, or only inadequately.
The narcotic, analgesic, and euphoric effects of opium, which is isolated from
poppies, have been known for at least 5,000 years. Opium was used for operations,
but is also a traditional drug of abuse. The importance of its abuse in the cultural
history of humanity is illustrated, among other places, in the “Opium Wars” of the
nineteenth century. In 1840 the Chinese wanted to stop the English from importing
opium and burned 20,000 cases of it; this led to a 2-year-long war between the two
countries.
In 1804/5 the pharmacy assistant Friedrich Wilhelm Adam Sertürner of the
Hof-Apotheke in Paderborn isolated the sleep-inducing principle.
He named it morphium (later morphine) after Morpheus, the Greek god of
dreams and son of Hypnos. Morphine addiction took on a whole new dimension
after 1853 and the invention of the hypodermic needle and syringe by Charles G.
Pravaz and Alexander Wood. As a result, morphine and heroin addiction spread
widely, and in the history of humanity it is one of many examples of the misuse of
a beneficial discovery.
Morphine 3.32 (Fig. 3.7) is one of the few examples of a natural product that is
still used today in its original form. It is among the most potent known analgesics.
If it is administered according to the correct dose and schedule, the danger of
addiction is low. The addictive potential is often overestimated by physicians such
that patients with severe pain are often inadequately treated with opiates. Morphine
is also a prime example of the success of systematic structural variation in the
direction of more-easily manufactured, simpler analogues as well as more selective
activity. The first modified products were simple derivatives such as the methyl
ether codeine 3.33, which is also found in the poppies. Codeine is weaker than
morphine, but it is bioavailable after oral administration. It has a pronounced
antitussive effect and a low addictive potential. Unfortunately, the opposite is
true for the potent, fast-acting diacetyl derivative heroin 3.34. It has enormous
addictive potential. Today it seems ironic that at the end of the nineteenth century
Heinrich Dreser, a senior pharmacologist at Bayer, wanted to discontinue the
development of Aspirin® because of a suspected cardiotoxicity in favor of developing
heroin as a well-tolerated and potent cough medicine (sic!), at least until he
realized the mistake. Of all the morphine derivatives, codeine and heroin are the
most widespread: codeine is in numerous combination preparations, and heroin is in
the drug scene. Some N-alkyl derivatives of morphine and close analogues, for
instance, naloxone 3.35, are opiate antagonists, that is, they inhibit the effect of
morphine (Fig. 3.7).
[Figure 3.7: chemical structures of 3.32 Morphine (R1 = R2 = H), 3.33 Codeine (R1 = Me, R2 = H), 3.34 Heroin (R1 = R2 = Acetyl), and 3.35 Naloxone]
Fig. 3.7 Morphine 3.32 and codeine 3.33 served as lead structures for heroin 3.34, which has
better CNS bioavailability, and naloxone 3.35, a morphine antagonist.
The structural elucidation of morphine took more than 120 years, and its total
synthesis, the ultimate structural proof, was completed in 1952 by Marshall Gates
and Gilg Tschudi. Morphine contains five rings: an aromatic benzene ring, two
unsaturated six-membered rings, the nitrogen-containing piperidine ring, and an
oxygen-containing five-membered ring. Systematic structural modifications had the
goal of simplifying the structure, for example, by opening one or more rings, or
removing them altogether.
In 1939, the potent analogue pethidine 3.36 (Fig. 3.8) was the first fully synthetic
analgesic, though it was originally based on the spasmolytic atropine 3.37. Despite
this, it is recognized to be a morphine analogue. In levomethadone 3.38 the
piperidine ring of pethidine is opened, an oxygen atom from the ester group is
removed, and another aromatic ring is added. There are thousands of other ana-
logues, some of which have been introduced to therapy. Aside from the decon-
struction of morphine, the construction of additional rings has surprisingly led to
analogues with more potency, for example, etorphine 3.39 (Fig. 3.8).
[Figure 3.8: chemical structures of 3.36 Pethidine, 3.37 Atropine, 3.38 Levomethadone, and 3.39 Etorphine]
Fig. 3.8 The architecture of morphine was dissected in many ways. The highly potent pethidine
3.36 was the first fully synthetic opiate analgesic, but it was discovered in the 1930s in a search for
anticonvulsives by varying the structure of atropine 3.37. It is nevertheless recognizable that pethidine
retains the benzene ring of morphine as well as its piperidine ring. Levomethadone 3.38 is derived
from pethidine. The addition of another ring led to substances whose potency surpasses that of
morphine by orders of magnitude. Etorphine 3.39 is 2,000–10,000-times more potent than
morphine in animals. Since 1963 it has been used in African wildlife preserves to immobilize large
animals such as elephants and rhinoceroses.
For a long time it was a complete mystery why our bodies would have extra
receptors for the contents of poppy plants, so-called opiate receptors. The solution
came with the discovery of the endogenous morphine-like peptides Met- and
Leu-enkephalin (▶ Sect. 10.2), which are the natural ligands for these receptors. The
discovery stimulated an intensive search for orally active peptides or
peptidomimetics devoid of addictive potential. The result of the work was more
than sobering. Although orally active analogues were found, their addictive potential
was identical to that of morphine and most morphine-derived analogues.
A few synthetic analogues have, in addition to agonistic activity, a weak antag-
onistic effect as well. The potential for these substances to be abused by addicts is
less than with the classical morphine analogues. Combination preparations of
agonists and antagonists are also available. With appropriate use, the analgesic
effect of the agonist dominates because it is present in excess. If the medicine is
injected intravenously, the more-strongly binding antagonist displaces the agonist,
and the desired euphoric effect never sets in.
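The displacement argument can be made concrete with the standard equilibrium expression for two ligands competing for a single binding site, in which the fraction of receptors occupied by the agonist A in the presence of the antagonist B is ([A]/K_A) / (1 + [A]/K_A + [B]/K_B). The Python sketch below is only an illustration under this simple model; the function name and all concentrations and dissociation constants are hypothetical values chosen for the example, not measured data for any particular drug.

```python
def agonist_occupancy(agonist_conc: float, agonist_kd: float,
                      antagonist_conc: float, antagonist_kd: float) -> float:
    """Fraction of receptors occupied by the agonist when an agonist and a
    competitive antagonist compete for the same site at equilibrium.

    Concentrations and dissociation constants must share one unit (e.g. nM).
    """
    if agonist_kd <= 0 or antagonist_kd <= 0:
        raise ValueError("dissociation constants must be positive")
    a = agonist_conc / agonist_kd        # agonist concentration relative to its Kd
    b = antagonist_conc / antagonist_kd  # antagonist concentration relative to its Kd
    return a / (1.0 + a + b)


if __name__ == "__main__":
    # Hypothetical numbers: the agonist is present in large excess (100 nM, Kd 10 nM).
    without_antagonist = agonist_occupancy(100.0, 10.0, 0.0, 0.1)
    # A tenfold lower concentration of a much more tightly binding antagonist (Kd 0.1 nM).
    with_antagonist = agonist_occupancy(100.0, 10.0, 10.0, 0.1)
    print(f"agonist occupancy without antagonist: {without_antagonist:.2f}")
    print(f"agonist occupancy with antagonist:    {with_antagonist:.2f}")
```

In this toy calculation a small amount of a tightly binding antagonist is enough to displace most of the agonist from the receptor, which is the qualitative behavior the paragraph describes for the injected combination.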
The work on improved selectivity was also successful. Today cough
medicines and antidiarrheal medicines, for example, loperamide 3.40 (Fig. 3.9), are
available that have no central morphine-like effects. This substance is able to pass
through the blood–brain barrier but is immediately expelled by an active transporter.
When this transporter is inhibited, for instance, by co-administered
quinidine, loperamide also has classical opiate effects. Its structure unites elements
of pethidine 3.36 and levomethadone 3.38.
In this section only a few representatives of the many thousand structural
modifications of morphine can be discussed. The approach of Paul Janssen should
not remain unmentioned though; he started with pethidine 3.36 with the goal of
preparing a strong analgesic, but instead experienced an unexpected success in
another area. The result was the neuroleptic haloperidol 3.41 (Fig. 3.9), a drug for
the treatment of schizophrenia, the mode of action of which is mediated by an
antagonistic effect at the dopamine D2 receptor (▶ Sect. 29.4).
[Figure 3.9: chemical structures of 3.40 Loperamide and 3.41 Haloperidol]
Fig. 3.9 Structural derivatives of morphine and its analogues have led to selective antidiarrheal
agents, loperamide 3.40, for instance, as well as neuroleptics such as haloperidol 3.41.
3.4 Cocaine: Drug and Valuable Lead Structure
No other substance sparkles in so many ways as cocaine. In the introduction it was
already mentioned that it is at the pinnacle of all illegal drugs. Cocaine was also the
chemical starting material for a wide palette of valuable local anesthetics and
antiarrhythmics. We can thank the lead structure cocaine for local anesthesia,
pain-free dentistry, and nerve-block anesthesia for smaller surgical procedures.
The translation of the quite positive central effects of cocaine onto analogues
devoid of addictive potential is still in progress. The example of morphine leads
one to fear that this goal might not be possible.
Coca leaves and cocaine 3.42 (Fig. 3.10) belong to the oldest known drugs.
Chewing dried coca leaves has a long tradition in Peru and Bolivia. In 1744
Garcilaso de la Vega wrote that coca “satisfies hunger, gives new energy to the
tired and exhausted, and lets the unhappy forget their troubles”. The Scottish
author, Robert Louis Stevenson (Treasure Island) wrote in his novella The Strange
Case of Dr. Jekyll and Mr. Hyde about a personality split that a doctor undergoes
under the influence of drugs; he wrote the first draft of this novella in only three
days and nights while under the influence of cocaine. In 1863 the French chemist
Angelo Mariani (1838–1914) patented a mixture of coca extract and wine as
Vin Mariani. It made him a rich man. In 1886 the pharmacist John S. Pemberton
developed a coca-containing stimulant and headache remedy that he named Coca-Cola.
He sold the rights in 1891 to a colleague, Asa G. Candler, who founded the
Coca-Cola Company one year later. Up until 1906 Coca-Cola indeed contained
a small amount of cocaine, but today it only contains the harmless stimulant
caffeine. Back at the turn of the last century, cocaine was already fashionable,
particularly in artistic circles. The Viennese psychiatrist Sigmund Freud (1856–
1939) experimented with cocaine intensively and rather uncritically. He considered
it to be a wonder drug, took it himself regularly, and recommended it generously for
use in therapy, for the treatment of stomach aches, and for a depressed mood. Later,
after massive criticism from his colleagues he turned away from it.
[Figure 3.10: chemical structures of 3.42 Cocaine, 3.43 Benzocaine, 3.44 Lidocaine, and 3.45 Mepivacaine]
Fig. 3.10 The local anesthetic effect of cocaine 3.42 was recognized early on. The independently
found lead structure benzocaine 3.43 and the basic moiety of cocaine were models for synthetic
local anesthetics. The structural relationship is clearly recognizable in lidocaine 3.44, which also
acts as an antiarrhythmic, and in mepivacaine 3.45.
Cocaine causes the release of dopamine from its transporter (see ▶ Sect. 30.7).
Usually it is sniffed, occasionally it is intravenously injected, or it is mixed in drinks
or taken orally. Sniffing delivers it quickly to the brain where it displaces dopamine
from the binding site of the transporter and this causes increased dopamine release
into the synaptic gap. The free base (crack), which is made by mixing the salt with sodium
bicarbonate, is absorbed very quickly through the lungs when it is smoked, and
causes a euphoria that is even more pronounced than when the salt (coke,
powder, snow) is sniffed. Because cocaine does not bind for long, the transporter
is quickly reloaded with dopamine. The same effect can be induced again after
a little while. Other cocaine analogues that bind for longer do not allow the effect to
be repeated for hours. Psychological dependence occurs very quickly, even after the
first use in the case of crack cocaine. Physical withdrawal symptoms, as seen with
heroin addicts, usually do not occur.
The credit for discovering the local anesthetic effect of cocaine does not go to
Freud but rather a friend of his, the ophthalmologist Carl Koller (1857–1944).
Freud had planned to investigate this effect, but in 1884 he first wanted to pay a quick visit to
his fiancée, Martha Bernays, in Wandsbek near Hamburg. Koller picked up on Freud’s
suggestion and carried out the decisive experiment on the eye in his absence. The
synthetic benzoic acid esters and anilides that were initially used as local anesthetics
were not derived from cocaine 3.42, but rather from p-aminobenzoic acid
esters; benzocaine 3.43 was already in use in therapy in 1902. A structural relationship
to cocaine is, however, easily seen in modern local anesthetics such as
lidocaine 3.44 and mepivacaine 3.45 (Fig. 3.10).
3.5 H2 Antagonists: Ulcer Therapy Without Surgery
The history of the treatment of gastroduodenal ulcers is long and educational. Basic
research clarified the important mechanisms without providing a new drug. The
development of the therapy occurred in several phases. Again and again, better was
the enemy of good. In the beginning the treatment consisted of antacids, and later
anticholinergics. In severe cases only surgery helped. The H2 antagonists made the
breakthrough to purely pharmaceutical treatment. Now we are experiencing the victory
lap of the proton-pump inhibitors, which are used in different combinations with
antibiotics. Perhaps in the future this will be augmented or even replaced by a vaccine.
Gastric and duodenal ulcers are usually chronic illnesses and are widespread in
the general population. Any damage to the mucosal membrane of the stomach leads
to damage to the underlying cells through proteolytic enzymes and gastric acid.
Acetylcholine 3.46, histamine 3.47, and gastrin, a mixture of peptides with 17 (little
gastrin) and 34 (big gastrin) amino acids, stimulate the production of acid.
For decades the treatment of gastroduodenal ulcers was based on reducing the
amount of acid, for instance, with sodium bicarbonate, calcium carbonate, magnesium
salts, and aluminum oxide hydrate. Advanced ulcers had to be treated surgically.
Anticholinergics, antagonists of the acetylcholine receptor, should, in
principle, have been suitable for ulcer treatment; however, unspecific antagonists
are out of the question because of their severe side effects. It was not until
pirenzepine 3.48 (Fig. 3.11), a selective so-called M1 antagonist, was developed
that this class could be used in therapy. Here the undesirable side effects of
unspecific anticholinergics are only apparent at relatively high doses.
The role of histamine in acid secretion was initially called into question because
the classical antihistamines, later defined as H1 antihistamines, did not reduce acid
secretion. These substances, for instance, diphenhydramine 3.49 (Fig. 3.11) antag-
onize histamine in the intestines, lungs, and in allergic reactions. Today a wide
palette of different histamine antagonists is available for the treatment of allergic
rhinitis (hay fever). The most important side effect, particularly with the older
substances, is a more or less pronounced sedation. Histamine-induced gastric acid
secretion, the effect on the heart, and uterus contractions are not inhibited by
diphenhydramine and other analogues. It was first suspected in 1948 that there
might be two different histamine receptors, H1 and H2. The H1-type is inhibited by
diphenhydramine, but the H2-type, which is responsible for the above-mentioned
effects is not. Both belong to the family of G protein-coupled receptors
(▶ Sect. 29.1). In the meantime two additional members of the family, the H3 and
H4 receptors, have been discovered. In 1964 James W. Black (1924–2010) at Smith
Kline & French in England began to develop three models to test the inhibition of
these H2-mediated effects of histamine. One was an in vivo model
measuring gastric perfusion on anesthetized rats, and two were in vitro models
evaluating the histamine-induced stimulation of a guinea pig heart and a rat uterus.
James Black later received not only the Nobel Prize, but was also knighted by Queen
Elizabeth II, two rather unusual honors for an industrial pharmaceutical researcher.
[Figure 3.11: chemical structures of 3.46 Acetylcholine, 3.47 Histamine, 3.48 Pirenzepine, and 3.49 Diphenhydramine]
Fig. 3.11 Acetylcholine 3.46 and histamine 3.47 stimulate the acid production in the stomach.
The acetylcholine receptor antagonist pirenzepine 3.48 was the first drug specifically for ulcer
therapy. Classical H1 antihistamines such as diphenhydramine 3.49 cannot antagonize histamine in
the stomach.
Despite all strategies that were available for the development of receptor antagonists,
the search for an H2 antagonist was to no avail for years. The American
management in Philadelphia became impatient and wanted to end the program. The
first promising result came just in the nick of time. Because all lipophilic analogues
were ineffective, the earlier more polar compounds that had already been investigated
were reinvestigated. A compound that had already been synthesized in 1928
and determined to be ineffective, Nα-guanylhistamine 3.50 (Fig. 3.12), now
appeared to be a weak antagonist. The effect had been overlooked because 3.50
is actually a partial agonist and therefore shows a weak histamine-like effect.
Within a few days the first lead structure, S-[2-(imidazol-4-yl)ethyl]isothiourea
3.51, with interesting activity was identified (Fig. 3.12).
The extension of the side chains of both of these compounds delivered partial
agonists, the antagonistic effects of which were too weak. It was only in 1972 after
they abandoned the hypothesis that the basic nitrogen in the side chain was
necessary for activity that they, after chain elongation and an N-methyl substitution
of the thiourea, arrived at the first clinically useful H2 antagonist burimamide 3.52.
Human trials confirmed the efficacy, but the bioavailability was poor. The next
milestone was achieved with the development of metiamide 3.53 (Fig. 3.12), which
is 5–10-times more potent than burimamide and clinically demonstrated the desired
ulcer-healing effect. In some patients, however, a granulocytopenia occurred,
which is a dangerous suppression of the white blood cells and cannot be tolerated.
The medical need was great. It was not foreseeable whether the observed effect
was a result of H2 antagonism. We have the company to thank for taking on the risk
of further research. The sulfur atom of the thiourea was suspect. An isosteric
exchange for an oxygen atom delivered a less-potent urea analogue. Exchange for
an ═NH group led back to a guanidine, which was strongly basic, but a potent
antagonist nonetheless. Substitution of the imino group for an NO2 or a CN group
led to less-basic analogues, the antagonistic potency of which was comparable to
metiamide. The somewhat more active of the two analogues, cimetidine 3.54
(Fig. 3.12) was clinically tested. In November 1976 and in August 1977 it was
introduced in England and the USA, respectively. By 1979 it was available in over
100 countries. Shortly thereafter, in 1983, cimetidine (Tagamet®) became the
most-prescribed drug in many countries, and its sales reached about US $1 billion.
[Figure 3.12: chemical structures of 3.50 (X = -NH-), 3.51 (X = -S-), 3.52 Burimamide (R = H, X = -CH2-), 3.53 Metiamide (R = CH3, X = -S-), and 3.54 Cimetidine]
Fig. 3.12 Nα-Guanylhistamine 3.50 and S-[2-(imidazol-4-yl)ethyl]isothiourea 3.51 served as
lead structures for the H2-type antihistamines. The first clinically tested H2 antagonists,
burimamide 3.52 and metiamide 3.53, were unsuitable for therapy. Only the development of
cimetidine 3.54 led to a breakthrough and an exceedingly successful therapy.
Such a successful drug makes other companies restless. There are many cases
in the history of pharmaceutical research in which a major new concept was taken up
and developed further by other companies. Other examples of this are the structurally
entirely different calcium channel blockers verapamil and nifedipine (▶ Sect. 2.6)
and the angiotensin-converting enzyme inhibitors captopril and enalapril
(▶ Sect. 25.4).
The same happened in the development of the H2 antagonists. Ulcer therapy had
been researched since 1960 at Allen and Hanburys, a subsidiary of Glaxo. One of
the first lead structures 3.55 (Fig. 3.13), an aminotetrazole with about the same
potency as burimamide, was systematically varied without success. Their research
management also wanted to stop the project to concentrate on the anticholinergics.
The breakthrough came upon replacement of the tetrazole ring with a furan. It was
not exactly an obvious idea because the previously synthesized compounds always
had at least one nitrogen atom in the ring. The —CH2SCH2CH2— chain was taken
over from metiamide 3.53, and a dimethylaminomethylene group was added to
improve water solubility; the result was AH 18665 3.56 (Fig. 3.13).
The chemists also synthesized a cyanoguanidine AH 18801 3.57 that was
comparable to cimetidine 3.54 in terms of potency. The substance’s characteristics
were, however, unsatisfactory: the melting point was too low. The nitrovinyl
analogue 3.58 brought success in this respect. It was synthesized and was an oil!
That was not seen as a prohibitive problem because it was redeemingly 10-times
more potent than cyanoguanidine 3.57 in the rat. Ranitidine 3.58 (Fig. 3.13) was
developed as a drug and introduced in 1981 as Zantac® and Sostril®. Compared to
cimetidine, ranitidine was 4–5-times more efficacious in humans and had the
advantage that it was more selective. In 1987 ranitidine overtook cimetidine. In
1994 with US $4 billion in sales, it became the most economically successful drug
in annual sales at that time. Within a few years, Glaxo was catapulted to the
pinnacle of the world rankings of pharmaceutical corporations. Glaxo used this
opportunity. The research of this company and its strategy in drug development
rank among “the finest” in the industry today. Through mergers with and acquisitions of
competitors, Glaxo, known today as “GSK,” has become one of the largest
pharmaceutical corporations in the market.
In the meantime, an antitumor effect in colon, gastric, and renal cancer has been
reported for cimetidine. Apparently it suppresses the tumor-mediated interleukin-1-
induced selectin activation (▶ Sect. 31.3).
It is understandable from the chemical structure that cimetidine has a high
affinity for cytochrome P450 enzymes, particularly CYP 3A4 (▶ Sect. 27.6). As
a consequence, interactions with other drugs that depend on CYP 3A4 for metabolism
are common. What was first seen as an indispensable imidazole moiety in
3.54 blocks the catalytic iron center in the P450 enzymes. Ranitidine 3.58 carries
a furan ring in the same position and lacks the P450 inhibition. After cimetidine and
ranitidine, very few other drugs have made their way to the market. Nizatidine 3.59
and famotidine 3.60 contain a thiazole ring as a heterocycle (Fig. 3.13). In 3.60, the
electron-withdrawing group of the guanidine moiety is replaced by a sulfonamide
group.
It is true even for the H2 blockers that good drugs are replaced by better ones.
After being stimulated to secrete acid, the cells use the enzyme H+/K+-ATPase
to pump protons out of the cell in exchange for potassium at the cost of
energy. If “the faucet is turned off ” at this step, not only the histamine-induced acid
production, but also the acetylcholine and gastrin-mediated acid production is
stopped. Omeprazole 3.61 is a prodrug that has been developed, which, upon
rearrangement, acts as an irreversible inhibitor of this proton pump (▶ Sect. 9.5).
The effect of omeprazole therefore lasts longer, and the reduction in acid secretion
is stronger than with the H2 antagonists. Gastric and duodenal ulcers heal more
quickly and reliably. These substances also hit it big. At the end of the last century,
Losec®, Antra® (both from Astra), and Prilosec® (Merck & Co., USA) had combined
global sales of over US $6 billion despite the fact that they were introduced to
the market much later than ranitidine. The enantiomerically pure form
esomeprazole (Nexium®) even reached US $7 billion in sales in 2007.

[Chemical structures: 3.55; 3.56 AH 18665, X = S; 3.57 AH 18801, X = N-CN; 3.58 Ranitidine, X = CH-NO2; 3.59 Nizatidine; 3.60 Famotidine; 3.61 Omeprazole]
Fig. 3.13 The lead structures 3.55–3.57 were steps on the way to ranitidine 3.58, which
in the 1980s was the economically most important drug. Nizatidine 3.59 and
famotidine 3.60 represent newer developments. Omeprazole 3.61 is a proton
pump inhibitor.

That is not even the end of the story. Although in principle it had been known
since 1983, the relevance of the bacterium Helicobacter pylori for the etiology of
ulcers was first discussed in 1994 at a conference of the US National Institutes of
Health (NIH). This bacterium infects a large portion of the population in childhood.
Frequently it is spread within a family; a kiss can be enough to infect someone. It
causes gastrointestinal damage in a portion of those infected, which can lead to an
ulcer. In the meantime it is held responsible not only for ulcers but also for at least
two different forms of gastric cancer. It survives assault by many antibacterial
agents as well as the acidic milieu of the stomach. It has a urease that releases
ammonia in its immediate vicinity, which in turn neutralizes the gastric acid.
The drugs of choice to treat such infections are combinations of H2 blockers,
proton pump inhibitors, and antibiotics. H. pylori seems to quickly develop antibiotic
resistance, though. Since the beginning of 1995, the first animal model has been
available, a mouse with a sustained H. pylori infection; this should promote further
research in this important area. There is a vaccine currently in development.
A portion of the vaccinated patients mounted enough of an immune response to
defend themselves from the bacteria. For practical use, however, its reliability must
be improved. Perhaps in the foreseeable future we will have an ulcer therapy that is
completely different, for instance, a swallowed vaccine that delivers life-long
protection. The revolution is in sight: a one-time treatment without repeated
gastroscopy. The patients will be delighted. Others will see this dramatic change
in therapy with mixed emotions.
3.6 Synopsis
• Even though the period of classical drug research was strongly governed by trial
and error, it has been exceptionally successful. Many leads were found by
accident or from traditional medicine, though limited knowledge of pathophys-
iology or molecular disease etiology was available.
• Acetylsalicylic acid or Aspirin® is one of our oldest but also most prototypical
drugs. Originating from bark extracts and chemically modified to improve taste
and tolerance, it achieves its actual potency and mode of action by irreversibly
inhibiting cyclooxygenase.
• Since then, two isoforms of cyclooxygenase have been characterized: one is
constitutively present, and the other is induced in inflamed tissue.
Acetylsalicylic acid inhibits both unselectively, giving rise to some undesirable
side effects.
• Due to irreversible inhibition of COX in platelets, Aspirin exerts an influence on
the ratio of synthesized thromboxane and prostacyclin, which has a depressing
effect on blood’s coagulation tendency. As a consequence, Aspirin is
recommended as “preventive medicine” to protect against thrombosis or to
reduce mortality of heart attack.
• Malaria is a widespread tropical/subtropical disease transmitted by the Anopheles
mosquito and caused by the Plasmodium parasite, which invades erythrocytes in
humans. The disease had been nearly eradicated by fighting the mosquito with
the insecticide DDT. One of the oldest active substances to hit the parasite is
quinine, isolated from cinchona bark.
• After DDT spraying against the mosquitoes was stopped, malaria raged again. Increasing
resistance of the parasite to known drugs occurred, and the development of new
chemotherapeutics for malaria has been a rollercoaster ride of promising compounds
and the development of resistant parasites.
• Morphine, isolated from poppies, is in use as the unchanged natural product, as
a potent analgesic. When administered correctly, the risk of addiction is low. Its
complex structure of five fused rings has been simplified and cut into pieces to
give more-easily accessible analogues with higher selectivity.
• Cocaine, which is the active ingredient in coca leaves, is one of our oldest drugs.
Its euphoric effect arises from the displacement of dopamine from its transporter in
the synaptic gap. The cocaine structure served as lead structure for
the development of anesthetics.
• Ulcer therapy went through several phases of drug development leading to active
substances with increasingly efficient mode of action to reduce production of
gastric acid.
• Starting with antacids and rather unspecific anticholinergics, selective H2 antagonists
were developed as a real breakthrough in the purely pharmaceutical treatment
of ulcers. They act upon the H2 receptor, a member of the G protein-coupled
receptors (GPCRs). A protein that pumps protons for acid release is stimulated
through these receptors. Proton-pump inhibitors such as omeprazole directly
block the function of the proton-secreting H+/K+-ATPase that builds up the acidic
milieu.
• The bacterium Helicobacter pylori causes gastrointestinal damage leading
to ulcers. It can be eradicated by a combination of a proton-pump inhibitor
with an antibiotic. A vaccine against the bacterium could deliver life-long
protection.
Bibliography
General Literature
Burger A (1983) A guide to the chemical basis of drug design. Wiley, New York
Ryan J, Newman A, Jacobs M (eds) (2000) The pharmaceutical century. Ten decades of drug
discovery. American Chemical Society, Washington, DC, Supplement to ACS Publications
Sneader W (1996) Drug prototypes and their exploitation. Wiley, Chichester
Sneader W (2005) Drug discovery. A history. Wiley, Chichester
Verg E (1988) Meilensteine: 125 Jahre Bayer, 1863–1988. Bayer AG, Leverkusen
Bibliography 59
Special Literature
Aspirin – eine unendliche Geschichte, Research. Das Bayer-Forschungsmagazin, Issue 6, S. 4–21
(1992) and other articles in this magazine
Battistini B, Botting R, Bakhle YS (1994) COX-1 and COX-2: toward the development of more
selective NSAIDs. Drug News Perspect 7:501–512
Kelce WR et al (1995) Persistent DDT Metabolite p, p’-DDE is a potent androgen receptor
antagonist. Nature 375:581–585
Patrono C (1989) Aspirin and human platelets: from clinical trials to acetylation of cyclooxygen-
ase and back. Trends Pharm Sci 10:453–458
Schlitzer M (2007) Malaria chemotherapeutics part I: history of antimalarial drug development,
currently used therapeutics, and drugs in clinical development. Chem Med Chem 2:944–986
Wiesner J, Ortmann R, Jomaa H, Schlitzer M (2003) New antimalarial drugs. Angew Chem Int Ed
Engl 42:5274–5293
60 3 Classical Drug Research
4 Protein–Ligand Interactions as the Basis for Drug Action
To purposefully design an active substance the following questions must first be
answered: How does a drug act? How does Aspirin® relieve headaches? Why do
β-blockers lower blood pressure? Where does a calcium channel blocker act? How
does cocaine work? How do sulfonamides prevent the proliferation of bacterial
pathogens? An active substance must bind to a very special target molecule in the
body to exert its pharmacological action. Usually this is a protein, but nucleic acids
in the form of RNA and DNA can also be target structures for active molecules. An
important prerequisite for the binding is that the active substance has the correct
size and shape to fit into a cavity on the surface of the protein, a binding pocket, as
well as possible. Furthermore, it is also necessary that the surface properties of
ligand and protein fit together so that specific interactions can form. In 1894, Emil
Fischer compared the exact fit of a substrate for the catalytic center of an enzyme to
the picture of a lock and key. In 1913, Paul Ehrlich formulated the principle Corpora non
agunt nisi fixata, which literally translated means “bodies do not act if they are not
bound.” With this he wanted to express that drugs that are meant to kill bacteria or
parasites must be “fixed,” that is, bound by certain structures. Both concepts form
the starting point for rational drug research. In the broadest sense, they are valid
even today. After being taken, a drug must arrive at its target tissue and enter into
interactions with biological macromolecules there. Specific active substances have
a high affinity to a binding site on these macromolecules and are adequately
selective. It is only in this way that the desired biological effect can be deployed
without extensive side effects.
The most important terms that have to do with the modes of action of drugs are
briefly defined in Table 4.1. These terms are described in ▶ Chaps. 23,
“Inhibitors of Hydrolases with an Acyl–Enzyme Intermediate”; ▶ 24, “Aspartic
Protease Inhibitors”; ▶ 25, “Inhibitors of Hydrolyzing Metalloenzymes”; ▶ 26,
“Transferase Inhibitors”; ▶ 27, “Oxidoreductase Inhibitors”; ▶ 28, “Agonists
and Antagonists of Nuclear Receptors”; ▶ 29, “Agonists and Antagonists of
Membrane-Bound Receptors”; ▶ 30, “Ligands for Channels, Pores, and Transporters”;
▶ 31, “Ligands for Surface Receptors”; and ▶ 32, “Biologicals: Peptides,
Proteins, Nucleotides, and Macrolides as Drugs” in detail with examples of target
G. Klebe, Drug Design, DOI 10.1007/978-3-642-17907-5_4,
© Springer-Verlag Berlin Heidelberg 2013
structures. Drugs often act as inhibitors of enzymes or as agonists or antagonists
on receptors. Enzyme inhibitors and receptor antagonists occupy a binding site
and prevent the substrate or endogenous ligand from docking there. Agonists
exhibit an additional quality, a so-called intrinsic effect. As a consequence,
the receptor adopts a three-dimensional structure that triggers a
response in a downstream process.
Although ion channels, pores, and transport systems are also receptors in the
broadest sense, they are considered as a separate group. Often the term “receptor” is
used rather loosely as a general term for any biological macromolecule that
interacts with a drug.
Biomolecules communicate frequently with one another by the recognition and
formation of large common surface contacts. It is over these contacts that the
primary attack and entry of viruses, bacteria, and parasites into the host cell take
place. Many cells receive a signal via surface receptors upon binding
a macromolecule. Even the rolling behavior of leukocytes in the vasculature is
governed by such surface receptors. These systems are increasingly being tapped
for drug therapy (▶ Chap. 31, “Ligands for Surface Receptors”) in that
active macromolecular substances, so-called biologicals or biopharmaceuticals
(▶ Chap. 32, “Biologicals: Peptides, Proteins, Nucleotides and Macrolides as
Drugs”), are more often finding application as therapeutics in our pharmaceutical
arsenal.

Table 4.1 Brief definitions of the most important terms
Term | Definition
Ligand | A (usually small) molecule that binds to a biological macromolecule
Enzyme | An endogenous biocatalyst that can transform one or more substrates into one or more products
Substrate | A ligand that is a starting material for an enzymatic reaction
Inhibitor | A ligand that prevents the binding of a substrate either directly (competitive) or indirectly (allosteric), reversibly or irreversibly
Receptor | A membrane-bound or soluble protein (or a protein complex) that initiates an effect after binding an agonist
Agonist | A receptor ligand that exhibits an intrinsic effect, that is, it causes a receptor response
Antagonist | A receptor ligand that either directly (competitive) or indirectly (allosteric) prevents the binding of an agonist
Partial agonist | A weak agonist that has a high affinity to the binding site, and in this manner acts as an antagonist
Inverse agonist | A ligand that stabilizes the inactive conformation of a receptor or ion channel
Functional antagonist | A substance that prevents a receptor response by another mode of action
Allosteric effector | A ligand that influences the function of a protein by causing a change in the 3D structure of the protein
Ion channel | A pore in a protein that allows specific ions to flow in and out across the cell membrane along a concentration gradient. Opening and closing is affected by binding a ligand or by a membrane potential change
Transporter | A protein that transports molecules or ions across the cell membrane against the concentration gradient by consuming energy
Antimetabolite | A substance that interferes with the biosynthesis of a central metabolic product either as a false substrate or as an inhibitor

4.1 The Lock-and-Key Principle
In the early 1880s, Emil Fischer used different enzymes to investigate the cleavage of
glucosides that differed only in the stereochemistry of the glycosidic carbon
atom. He noticed that particular glucosides could only be cleaved with one group of
enzymes. Other glucosides, on the other hand, could only be cleaved with another
group of enzymes. He drew the correct conclusions from his observation and in
1894 formulated them in an article in the Berichte der Deutschen Chemischen
Gesellschaft (Reports of the German Chemical Society):
The limited effect of enzymes on the glucosides can also be explained by the assumption that
a chemical process can be initiated only by those [enzymes] that have a similar geometric
construction that approximates that of the molecule [substrates]. To use a picture, I want to
say that enzymes and glucosides must fit together like a lock and key to be able to exert
a chemical effect upon one another. This idea has gained plausibility and value for
stereochemistry research after the phenomenon was transferred from the biological to the
chemical field.
In the same year he refined this picture:
Apparently here the geometrical construction exerts such a large influence on the play of
chemical affinities that the comparison of the two molecules undergoing an interaction
seems to me to be comparable to a lock and key. If the fact that some yeasts can ferment
a larger number of hexoses than others is to be explained, the picture can be completed by
differentiating between master and special keys.
Emil Fischer did not pursue this image any further, and later even complained that it
is often quoted out of context. The configuration of the sugars interested him, that of
the isomeric glucosides did not. He expressed a rather distanced attitude to purely
theoretical considerations. In 1912, he wrote in a letter “I myself take not so much
pleasure in theoretical things.” This is remarkably modest for a man who exerted such
a great influence with his image of a lock and key! Emil Fischer would have certainly
been pleased and proud if he had seen the results of the X-ray structural analysis of
protein–ligand complexes, for instance, of retinol (vitamin A) bound to the retinol-
binding protein, which is the transport protein for this molecule (Fig. 4.1).
Many binding sites can discriminate with exceeding specificity between analogues
that are chemically closely related. In protein biosynthesis, not even the smallest
error may occur. Friedrich Cramer more closely investigated the recognition
mechanism for the incorporation of the amino acids valine and isoleucine. These
amino acids differ in their side-chains only in that a methyl group is exchanged
for an ethyl group. The smaller valine residue should easily fit into the “lock” for
an isoleucine, though it might not bind as strongly. A clear distinction, which is
absolutely necessary for an error-free protein synthesis, can only occur through
repeated recognition. Indeed, that is the case. An energy-consuming, iterative, and
“skeptical” auditing process reduces the error quotient to less than 1:200,000.
Because of this harsh feedback and control process, even the correct binding partner
is sometimes unsuccessful. Over 80% are rejected as being “dubious.” The result is
a process with an accuracy of about 1:40,000.
The retinol-binding protein is less selective. In this case, such extreme precision is
apparently not necessary for flawless functioning. In addition to the “stretched” retinol
isomer, the “folded” retinol isomer and chemically related substances also bind to the
protein. Other proteins discriminate very little. Examples of less-selective proteins
include digestive enzymes (▶ Sect. 23.3), metabolic enzymes (e.g., cytochromes;
▶ Sect. 27.6), and glycoprotein GP 170, which is responsible for the drug resistance
of tumor cells (▶ Sect. 30.7). A bacterial transport protein, oligopeptide-binding
protein A, can bind any peptide with two to five amino acids with approximately the
same affinity; this represents an extreme case of “chemical promiscuity.”
Linus Pauling translated the “lock-and-key” principle to the transition states of
enzymatically catalyzed reactions. A flexible adaptation often occurs during the
binding of the substrate. The transition state of the reaction binds more strongly to
the enzyme than either the substrate or the product (▶ Sect. 22.3) and is stabilized
by the functional groups of the binding site. The “lock-and-key” principle has been
repeatedly challenged because of the mobility of the ligand in the binding site; but
even with a high-security lock, the pins are still mobile and play an essential role in
the mechanism.
In the 1950s, Daniel E. Koshland proposed the theory of “induced fit,” which says
that the ligand induces a conformational change in the protein by binding to it. The
theory works under the assumption of a particular effect, for instance, the enzymatic
cleavage of the substrate. This mechanism does not contradict the lock-and-key
principle because, as previously stated, even a high-security lock has mobile parts.
Small, induced adaptations play an essential role in the ligand–receptor complex.
Even the relocation of entire protein domains has been observed. As a rule,
the adaptability of the protein is related to its function. Proteins often have to be
adequately flexible to fulfill their biological functions.

Fig. 4.1 Like a key in a lock, vitamin A (retinol) fits into the binding pocket of its
transport protein. The surface of the ligand is green. The protein residues in the direct
vicinity of the binding pocket are visible. To improve the clarity, the back of the
binding site and the residues in front of the binding site have been omitted.

For the rational design of ligands, there are two fundamentally different starting
points that differ in the informational content of the system. Either the exact
three-dimensional structure of the binding site is known or it is unknown. In the first
case, the lock is known, and the key “only” has to be cut (▶ Chap. 20, “Protein
Modeling and Structure-Based Drug Design”). In the other case, the active and
inactive analogues represent the fitting and ill-fitting keys. It is through the comparison
of the keys and systematic variations that better-fitting keys can be designed
(▶ Chap. 17, “Pharmacophore Hypotheses and Molecular Comparisons”). In the
following section, the binding of a low-molecular-weight drug (“ligand”) to
a macromolecular receptor will be examined more closely. These target
structures for drugs can be outside or inside the cell, or they can be embedded in the
cell membrane. Therefore we will briefly address the construction and function of the
cell membrane before the protein–ligand interaction is brought to the foreground.
4.2 The Essential Role of the Membrane
The majority of biological processes in our body take place inside cells. These cells
are surrounded by a membrane that protects the cellular content from “leaking”.
The membrane also hinders undesirable xenobiotics from entering the cell and
mediates the contacts between cells. Membranes are also found within the cell,
where they form substructures (so-called compartments) and separate individual
cellular components from one another. In mammalian cells, the outer membrane is
made up of a lipid double-layer, in which proteins and cholesterol molecules are
embedded (Fig. 4.2). All molecules can move relatively freely; the membrane is therefore
called a “fluid mosaic membrane.”
Lipid membranes of this type function as barriers for polar substances and as
permeable layers for non-polar molecules. The importance of membranes for the
transport and distribution of drugs is presented in detail in ▶ Chap. 19, “From
In Vitro to In Vivo: Optimization of ADME and Toxicology Properties”. Here,
only the important function that the lipid membrane has for the activity of
drug molecules is discussed. Membrane-embedded proteins belong to entirely
different classes. Among them are the membrane-anchored and membrane-residing
enzymes, the large class of G protein-coupled receptors (▶ Chap. 29, “Agonists and
Antagonists of Membrane-Bound Receptors”), ion channels, pores and transporters
(▶ Chap. 30, “Ligands for Channels, Pores, and Transporters”), and surface recep-
tors (▶ Chap. 31, “Ligands for Surface Receptors”).
Due to the phosphate and ethanolamine head groups, both of the outer layers
of the lipid double layer are very polar. The alkyl chains are found on the inside,
where the membrane is non-polar. Many drugs are also non-polar and accumulate
here in higher concentration than in solution. Amphiphilic (soap-like) molecules,
that is, substances that have both non-polar and polar character, arrange themselves
in the membrane so that the non-polar portion is on the inside (Fig. 4.2).
This orientation within the membrane plays a particularly important role when the
polar group is a positively charged nitrogen atom that can form additional electro-
static interactions with the phosphate group of the lipids.
In the meantime, this concept has been proven experimentally with numerous
independent methods. For many receptors it is accepted that the ligand binds at
a site in the protein that is only accessible from the inner layer of the membrane
(e.g., lipases, ▶ Sect. 23.7; or cyclooxygenases, ▶ Sect. 27.9). Therefore the
enrichment and arrangement of an active molecule in the membrane plays an
important role for the optimal approach to the binding site. If the molecule, on
the other hand, assumes an incorrect orientation, its docking is hindered.
4.3 The Binding Constant Ki Describes the Strength of
Protein–Ligand Interactions
The binding of a ligand to its target protein is measurable. The extent of the binding
is characterized by the binding constant (Eq. 4.1). Strictly speaking, the dissociation
constant Kd is the reciprocal of the association constant Ka. With enzymes,
the so-called inhibition constant Ki is determined in a kinetic assay (▶ Sect. 7.2). At
low substrate concentration, it corresponds to the inhibitor concentration that is
necessary to reduce the rate of an enzyme reaction by one half. Although Ki is
therefore not exactly defined as a dissociation constant, the two quantities are
usually referred to interchangeably. In the following, the abbreviation Ki is used
in the same sense as a dissociation constant, which indicates the strength of the
interaction between protein and ligand. It is a thermodynamic equilibrium measure
that indicates what portion of the ligand is bound to the protein, on average.

[Figure labels: Polar drug; Amphiphilic drug; Membrane-embedded cholesterol molecule; Protein; Exterior of the membrane; Interior of the membrane; Non-polar drug; Polar head groups; Non-polar alkyl groups; Membrane lipids]
Fig. 4.2 Membranes from mammalian cells are constructed from a lipid double layer, in which
proteins (yellow) and individual cholesterol molecules (black) are embedded. The individual lipid
molecules (orange) point their polar groups to the exterior of the membrane, and their alkyl chains
to the interior. Therefore polar drugs (light blue) accumulate on the outside of the membrane.
Non-polar drugs (red) are enriched in the interior of the membrane. Amphiphilic drugs (violet) are
oriented into the membrane according to their structure. Despite this, all of the molecules can
move relatively freely. Therefore this is called a “fluid mosaic membrane”.

The law of mass action can be expressed as:

$$K_i = \frac{[\text{ligand}] \cdot [\text{protein}]}{[\text{ligand} \cdot \text{protein complex}]} \qquad (4.1)$$

Ki has the dimensions of a concentration with the units of mol/L (M). The smaller
the Ki value is, the more strongly the ligand binds to the protein. If the concentration
of the ligand is significantly lower than Ki, only a very small portion of the protein
molecules are occupied by ligand molecules. A biological effect like that of the
inhibition of an enzyme cannot be observed. If the ligand concentration is equivalent
to Ki, half of the available protein molecules are occupied by a ligand. The Gibbs free
energy can be derived from the binding constants by a thermodynamic relationship
(which is valid for equilibria under so-called standard conditions; Eq. 4.2):
$$\Delta G = RT \ln K_i \qquad (4.2)$$

in which R is the gas constant, and T is the absolute temperature in Kelvin.
A binding constant of Ki = 10⁻⁹ M = 1 nM, which is a respectable value for an
active substance, corresponds to a Gibbs free energy of −53.4 kJ/mol at body
temperature. A change in Ki of one order of magnitude means a change in the Gibbs
free energy of 5.9 kJ/mol (or 1.4 kcal/mol).
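
The numbers quoted above can be checked with a short calculation. The following Python sketch is only an illustration under stated assumptions: the function names are hypothetical, body temperature is taken as 310.15 K, Ki is expressed relative to a standard concentration of 1 M so that the logarithm is dimensionless, and the occupancy formula assumes that the ligand is present in large excess over the protein.

```python
import math

R = 8.314462618  # gas constant in J/(mol*K)


def fraction_bound(ligand_conc, ki):
    """Fraction of protein molecules occupied by ligand.

    Follows from Eq. 4.1, assuming the free ligand concentration is
    approximately the total ligand concentration (ligand in excess).
    Both arguments are in mol/L.
    """
    return ligand_conc / (ligand_conc + ki)


def delta_g_kj_per_mol(ki, temperature=310.15):
    """Gibbs free energy of binding from Eq. 4.2, in kJ/mol.

    ki is the dissociation constant in mol/L (relative to 1 M);
    temperature is in Kelvin (assumed body temperature by default).
    """
    return R * temperature * math.log(ki) / 1000.0


ki = 1e-9  # 1 nM
print(fraction_bound(ki, ki))              # half of the protein is occupied
print(round(delta_g_kj_per_mol(ki), 1))    # about -53.4 kJ/mol
# Change in free energy for one order of magnitude in Ki
print(round(delta_g_kj_per_mol(1e-8) - delta_g_kj_per_mol(1e-9), 1))
```

At a ligand concentration a thousand times below Ki, the same function returns an occupancy of roughly 0.1%, which is why no biological effect is observed in that range.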
Frequently, instead of a Ki value, a so-called IC50 value is given. In contrast to
the Ki value, the IC50 value depends on the concentration of the enzyme and the
substrate. The obtained value is affected by the affinity of the substrate for the
enzyme as substrate and inhibitor compete for the same binding site. The IC50
value can be transformed into a Ki value by use of the Cheng-Prusoff equation.
Experience has shown that both values in the first approximation run parallel to one
another so that the more easily determined IC50 value is well suited to characterize
a ligand in comparison to other compounds.
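
The conversion mentioned above can be sketched in a few lines of Python. The text does not spell out the equation; the form used here, Ki = IC50 / (1 + [S]/Km), is the commonly quoted Cheng-Prusoff relationship for a competitive inhibitor. The Michaelis constant Km of the substrate, the function name, and the example values are assumptions introduced for illustration only.

```python
def cheng_prusoff_ki(ic50, substrate_conc, km):
    """Estimate Ki from IC50 for a competitive inhibitor.

    ic50, substrate_conc and km must share the same concentration unit.
    km is the Michaelis constant of the substrate (an assumed input that
    has to be measured separately).
    """
    if km <= 0:
        raise ValueError("km must be positive")
    if substrate_conc < 0:
        raise ValueError("substrate_conc must not be negative")
    return ic50 / (1.0 + substrate_conc / km)


# Hypothetical assay: IC50 = 100 nM measured at a substrate concentration equal to Km
ki = cheng_prusoff_ki(ic50=100e-9, substrate_conc=50e-6, km=50e-6)
print(ki)  # half of the IC50 when [S] equals Km

# At very low substrate concentration, IC50 and Ki nearly coincide
print(cheng_prusoff_ki(ic50=100e-9, substrate_conc=1e-9, km=50e-6))
```

The second call illustrates why IC50 and Ki run roughly parallel: the correction factor depends only on the assay conditions, not on the inhibitor.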
Why is the Gibbs free energy used to describe the energetic relationships upon
complex formation? In chemistry and biology, processes run in open systems under
atmospheric pressure. Because the volume of the environment is enormous, it can be
assumed that the external pressure is unchanged even in processes in which produc-
tion of gas occurs. Therefore these processes are considered to be under constant-
pressure conditions. Nonetheless, a gas that was formed in the reaction must first find
space in the surrounding particles in the air. Therefore work must be performed. This
so-called pressure–volume work diminishes the maximum possible work to be
achieved by the system (internal energy, ΔU). The energy diminished by the
pressure–volume work is referred to as the enthalpy (ΔH). It is therefore the energy
converted during a process, corrected by the portion of the pressure–volume work.
The change in enthalpy is not the entire answer as to why a particular process,
such as the formation of a protein–ligand complex, spontaneously occurs. If we take
a hot and a cold chunk of metal and bring them into contact, everyone knows that
the heat will flow from the hot metal to the cold one. The opposite cannot be
observed, even though the energy content of the entire system would remain
unchanged for this process. Why does energy spontaneously flow from a hot to a
cold object and not the other way around? This has something to do with the tendency
of all natural processes to distribute energy evenly. The metal atoms in a hot metal block
vibrate very strongly around their resting positions. Therefore the piece of
metal is hot. Some vibrational degrees of freedom are strongly activated. If the cold
metal block is brought into contact with the hot metal, these vibrations are transmitted.
In the end, the metal atoms in both blocks vibrate around their resting positions, but on
average not as vigorously as the atoms in the hot block moved before. The sum of the
energy content has remained constant; it is, however, now distributed over many more
degrees of freedom. The system can be described as having gone into a more disor-
dered state (many more atoms are now vibrating on average than in the beginning).
This happens for all spontaneously occurring processes. The entropy, S, is used as
a measure to describe the uniform distribution or random disorder. To correctly
describe the process of the formation of a protein–ligand complex (Eq. 4.3), we
need not only the enthalpy (ΔH) that is exchanged between the two binding partners;
how the distribution of degrees of freedom changes and whether the system migrates
into a more disordered state must also be considered. Therefore the term free energy
(ΔG) has been introduced because it considers not only the energy balance of the
process. It also considers the changes in entropy (TΔS) that reflect the spontaneous
distribution of energy over the degrees of freedom of the system. Spontaneously
occurring processes are characterized by a negative value for ΔG.
ΔG = ΔH − TΔS    (4.3)
As shown in Eq. 4.3, ΔG is composed of an enthalpic component ΔH, and
an entropic component TΔS. The entropic component is weighted with the
temperature. It matters a great deal whether the entropy in a system is changed at
low temperature, where all the particles are largely in an ordered state, or whether it
occurs at high temperature where the disorder is already very high. Because of the
negative sign, an increase in the entropy causes a decrease in ΔG, and therefore
an increase in the binding affinity.
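The bookkeeping of Eq. 4.3 can be illustrated with a minimal sketch. The function names and the numerical values below are hypothetical; they are chosen only to show how the temperature weights the entropic term.

```python
def gibbs_free_energy(delta_h, delta_s, temperature):
    """Return dG = dH - T*dS (Eq. 4.3).

    delta_h in J/mol, delta_s in J/(mol K), temperature in K; result in J/mol.
    """
    return delta_h - temperature * delta_s


def is_spontaneous(delta_g):
    """A process runs spontaneously when dG is negative."""
    return delta_g < 0


# Hypothetical binding event: dH = -20 kJ/mol, dS = +50 J/(mol K)
delta_h = -20_000.0
delta_s = 50.0
for temperature in (280.0, 310.0):
    dg = gibbs_free_energy(delta_h, delta_s, temperature)
    print(f"T = {temperature:.0f} K: dG = {dg / 1000:.1f} kJ/mol, "
          f"spontaneous: {is_spontaneous(dg)}")
```

For a positive entropy change, the same ΔS lowers ΔG further at the higher temperature, in line with the temperature weighting of the entropic component.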
4.4 Important Types of Protein–Ligand Interactions
Organic molecules can bind to proteins by forming covalent chemical bonds between ligand
and protein as well as through non-covalent interactions. For example, a chemically modified
product of omeprazole reacts with its target protein and forms a covalent
bond (▶ Sect. 9.5). In this section, we want to limit ourselves to ligands that bind to
the protein by forming non-covalent interactions. For the following discussion, it
is helpful to classify protein–ligand interactions into different categories. The
different types of interactions are summarized in Fig. 4.3.
Hydrogen bonds (H-bonds) are very frequently observed between protein and
ligand. The proton-carrying partner in a biological system is usually an NH or
OH group, which is termed a hydrogen-bond donor. The opposite group is an electronegative
atom with a partial negative charge and is termed a hydrogen-bond acceptor.
Examples of hydrogen-bond acceptors are oxygen and nitrogen atoms. Hydrogen
bonds are predominantly electrostatic interactions. They achieve their extraordi-
nary strength because the hydrogen atom of the donor group is bound to a strongly
electronegative atom, whereby the electron density of the hydrogen atom is
shifted to the neighboring atom. The sphere of influence of the hydrogen
atom effectively becomes smaller. This, in turn, allows the acceptor to come closer
to the proton than the sum of the van der Waals radii should actually allow.
The electrostatic attraction between the partners therefore becomes larger. The
geometry of an H-bond is shown in Fig. 4.4. A hydrogen bond is characterized
by a pronounced distance and angle dependence. It is directional; its geometry is
defined within narrow limits.
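The distance and angle dependence can be made concrete with a small geometric check. The sketch below uses the limits quoted in the caption of Fig. 4.4 (N···O distance of 2.8 to 3.2 Å, N–H···O angle larger than 150°) as hard cut-offs, which is a simplifying assumption; the function names and coordinates are hypothetical.

```python
import math


def hbond_geometry(n_xyz, h_xyz, o_xyz):
    """Return the N...O distance (in angstrom) and the N-H...O angle (in degrees)."""
    distance_no = math.dist(n_xyz, o_xyz)
    # Vectors pointing from the hydrogen atom to the donor N and to the acceptor O
    h_to_n = [n - h for n, h in zip(n_xyz, h_xyz)]
    h_to_o = [o - h for o, h in zip(o_xyz, h_xyz)]
    dot = sum(a * b for a, b in zip(h_to_n, h_to_o))
    cos_angle = dot / (math.hypot(*h_to_n) * math.hypot(*h_to_o))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding errors
    return distance_no, math.degrees(math.acos(cos_angle))


def looks_like_hbond(n_xyz, h_xyz, o_xyz):
    """Apply the limits of Fig. 4.4 as simple cut-offs (an assumption of this sketch)."""
    distance_no, angle_nho = hbond_geometry(n_xyz, h_xyz, o_xyz)
    return 2.8 <= distance_no <= 3.2 and angle_nho > 150.0


# Hypothetical coordinates in angstrom: a linear and a strongly bent arrangement
linear = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.9, 0.0, 0.0))
bent = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 2.2, 0.0))
print(looks_like_hbond(*linear), looks_like_hbond(*bent))
```

The bent arrangement has an acceptable N···O distance but fails the angle criterion, which illustrates the directionality of the hydrogen bond.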
It is often found that the charged groups of the ligand bind to the oppositely
charged groups on the protein. Such ionic interactions (also known as salt bridges)
are particularly strong when the two groups are separated by 2.7–3.0 Å from one
another. Frequently an ionic interaction overlaps with a hydrogen bond. This is
called a charge-assisted hydrogen bond. We will see that in many protein–ligand
complexes, the association is determined in large part by such ionic interactions.
A few proteins contain metal ions as cofactors, for example, Zn²⁺ in metalloproteases
(▶ Chap. 25, “Inhibitors of Hydrolyzing Metalloenzymes”). It is often
the attractive interactions between the metal ion and the opposite charge in the
ligand that make a decisive contribution to the affinity in these structures. Furthermore,
there are a few groups that are particularly well suited to forming
complexes with transition metals. Among these are the thiols RSH, hydroxamic
acids RCONHOH, acid groups, and many nitrogen-containing heterocycles.
Fig. 4.3 Frequently occurring protein–ligand interactions (panels: hydrogen bonds,
ionic interactions (salt bridges), metal complexation, hydrophobic interactions,
cation–π interactions). Important polar interactions are hydrogen bonds and ionic
interactions. Metalloproteases contain zinc ions as a cofactor, the interaction of which
with a ligand often yields important contributions to the binding affinity. Non-polar
parts of the protein and ligand contribute hydrophobic interactions. Because of the
particular electron distribution in aromatic rings, the interaction between
unsaturated ring systems is particularly large.
Whether the charge can increase the affinity contribution of hydrogen bonds
depends strongly on the protonation state in which the involved functional groups
are found. Drugs are usually weak acids or bases, that is, they contain so-called
titratable groups (▶ Sect. 19.4). Whether these groups, for example, a carboxylic
acid, an acidic sulfonamide, or a nitrogen-containing heterocycle, can release
or accept a proton and transform into a charged state depends strongly on the pH.
The same can happen with functional groups of the acidic or basic amino acid
residues. Then these groups can form charge-assisted hydrogen bonds that provide
a higher contribution to the binding affinity (Sect. 4.8).
The pKa value is considered to estimate whether a group is in the protonated or
deprotonated state. It indicates at which pH value the two forms, which are in
equilibrium with one another, are present in equal amounts. The situation might
become even more complicated because the pKa value can be shifted by the local
environment. In a hydrophobic environment, adopting a charged state is less
favorable for acidic and basic groups, that is, a shift to less acidic or less basic character
is the consequence. If an already-protonated, positively charged group in the ligand
faces an amino acid of the protein with the same charge, its protonation becomes
even more difficult to accomplish. The group therefore behaves as a weaker base.
The opposite is the case when putative positively charged basic groups bind in
a protein environment with a negative charge. Here, the charged state is even more
easily formed, which corresponds to having stronger basic character. Entirely
analogous considerations result for acidic groups, just with opposite signs.
Here a positively charged protein environment shifts acidic groups toward higher
acidity, and a negatively charged environment makes them behave less
acidic. In this way the protein environment can induce a significant pKa shift
of the titratable groups of the ligand. Uncharged H-bonds can become
charge-supported contacts that significantly contribute to the binding affinity
(▶ Sect. 21.9). With the help of electrostatic calculations an attempt can be made
to estimate the pKa shift upon complex formation (▶ Sect. 15.4).
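How strongly a pKa shift changes the charged fraction of a titratable group can be estimated with the Henderson–Hasselbalch relation, which is assumed here and not derived in the text. The function names, the pH of 7.4, and the pKa values are hypothetical.

```python
def fraction_protonated(ph, pka):
    """Fraction of a titratable group in its protonated form (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def fraction_charged(ph, pka, kind):
    """Charged fraction: bases are charged when protonated, acids when deprotonated."""
    protonated = fraction_protonated(ph, pka)
    if kind == "base":
        return protonated
    if kind == "acid":
        return 1.0 - protonated
    raise ValueError("kind must be 'acid' or 'base'")


# Hypothetical basic group: pKa 6.4 in water, shifted to 8.4 next to a negative charge
for pka in (6.4, 8.4):
    charged = fraction_charged(7.4, pka, "base")
    print(f"pKa = {pka}: {100 * charged:.0f}% charged at pH 7.4")
```

In this hypothetical case a shift of two pKa units turns a group that is charged in only about one tenth of the molecules into one that is charged in about nine tenths, so that a charge-assisted hydrogen bond becomes the dominant situation. At pH = pKa, both forms are present in equal amounts.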
Fig. 4.4 Geometry of a hydrogen bond. The atoms N, H, and O adopt an almost linear orientation
to one another. The N···O distance is between 2.8 and 3.2 Å. The angle N–H···O is practically
always larger than 150°. A large variability is observed for the C═O···H angle. It is typically
between 100° and 180°.
Hydrophobic interactions form through the close proximity between non-polar
amino acid side chains of the protein and lipophilic groups on the ligand. Lipophilic
groups are aliphatic or aromatic hydrocarbon groups and also halogen substituents
(e.g., chlorine) and many heterocycles such as thiophene and furan (Fig. 4.5). All
areas that cannot form H-bonds or other polar interactions count as lipophilic parts
of the surface of a protein and ligand. In contrast to hydrogen bonds, hydrophobic
interactions are not directional. It does not matter in which relative orientation
the lipophilic groups are to each other. The interactions between aromatic rings, for
which there is indeed a preferred relative orientation, are an exception to this.
It has been shown that for ligands with large lipophilic groups hydrophobic
interactions often afford a significant contribution to the binding affinity. The
influence of direct attractive forces between the lipophilic groups is, however,
small. The hydrophobic interactions are mainly caused by the displacement, or
more correctly put, the liberation of water molecules from the lipophilic environ-
ment of the binding pocket. Moreover the ligand with its lipophilic substituents
leaves the bulk water phase in the vicinity of the protein. The solvent “cave”, in
which the ligand was hosted in water, collapses. This step is also coupled with
changes in the free energy. The role of water molecules is discussed in Sect. 4.6.
Yet another important interaction should be mentioned here. Evidently, quaternary
amines bind particularly well in binding pockets that are formed by the aromatic
side chains of the protein. This contact, the cation–π interaction of Fig. 4.3, is largely
based on the polarization interaction between the positive charge and the electronic
system of the aromatic rings.
4.5 The Strength of Protein–Ligand Interactions
When evaluating the strength of protein–ligand interactions, it is reasonable to first
consider the non-covalent interactions between isolated small molecules. Information
about these is available through quantum mechanical calculations (▶ Sect. 15.5) as
well as spectroscopic investigations. In this way, molecule pairs can be experimen-
tally investigated in the gas phase. The association energies that are obtained for the
molecules afford an impression about the strength of the direct interactions.
The influence of effects that originate from the liberation of the solvent water
(desolvation) is, of course, missing in such experiments. Some of these data are
summarized in Table 4.2.
Fig. 4.5 Typical lipophilic groups in ligands are aliphatic and aromatic hydrocarbons,
halogen substituents, as well as non-polar heterocycles such as furan and thiophene
(groups shown: iso-pentyl, cyclohexyl, phenyl, chlorophenyl, furanyl, thiophenyl, phenoxy).
The results show that electrostatic interactions are the dominating energetic
factor. The interaction between a cation and an anion in a vacuum is more than
400 kJ/mol. This corresponds to the strength of a covalent bond! This amount is
enormous compared to the typical protein–ligand interactions in water that are
summarized in Sect. 4.4. The binding of an ion pair in the gas phase, therefore, is
much larger than the typical strength of a protein–ligand interaction in water.
Two water molecules bind to each other with 22 kJ/mol. This interaction is also
overwhelmingly of electrostatic nature in that the large dipole moment is respon-
sible for the strong binding. Interactions between small, non-polar molecules are
much weaker. Two methane molecules bind to each other with about 2 kJ/mol. This
is less than 10% of an H2O···H2O interaction. Correspondingly, methane boils at
about 112 K whereas water is a liquid at room temperature. The direct interactions between
polar groups are therefore orders of magnitude stronger than those between non-
polar groups.
4.6 Blame It All on Water!
The data that were presented in the previous section could suggest that protein–
ligand interactions are mainly determined by H-bonds and ionic interactions.
All the more astonishing is the fact that the acetate ion, CH3COO⁻, does not
form a dimer with the guanidinium ion H2NC(═NH2⁺)NH2 in water. Likewise,
amides practically do not associate in water at all, even though hydrogen bonds
often occur between two amide groups in protein structures. How can that be? The
answer is: water is to blame for everything!
All biochemical reactions take place in water, and they only occur at all because
of this reason! The binding of a ligand to a protein occurs in an aqueous environ-
ment. At first, the “empty” binding pocket of the protein is filled with water. A few
water molecules form hydrogen bonds to the protein and are found in an energet-
ically favorable orientation. Other water molecules are in contact with lipophilic
areas on the protein surface and cannot build a perfect hydrogen-bond network.
The ligand is also solvated. When it diffuses into the binding pocket it displaces the
water molecules that are there and must additionally strip off its own solvation
shell. At the same time, the “cave” in which the ligand was situated in the water
phase collapses. Therefore not only are direct interactions between protein and
ligand formed, numerous H-bonds to water molecules are broken.
Table 4.2 Experimental or quantum mechanically determined association energies in the gas phase
Dimer               Binding energy in kJ/mol
CH4···CH4           2
C6H6···C6H6         10
H2O···H2O           22
NH3···NH3           18
Na⁺···H2O           90
NH4⁺···CH3COO⁻      400
Na⁺···Cl⁻           400
We want to consider the formation of a hydrogen bond as well as a lipophilic
contact between the protein and ligand more closely. Both processes are displayed
in Fig. 4.6. How are H-bonds formed between protein and ligand? Let us assume
that the polar groups of both partners are solvated. Then at least two water
molecules must be displaced to form an H-bond between the two partners.
The released water molecules can in turn form H-bonds with other water molecules.
In this way, exactly as many new H-bonds are formed as are broken. The total
number of H-bonds remains constant! The gain in free energy is determined by the
relative strength of the different H-bonds as well as the entropic contribution, which
is based on the change in the degree of order of the system (Sect. 4.7). The total
contribution to the free energy that results is difficult to quantitatively predict.
If a ligand manages to form more hydrogen bonds to the protein than were possible
with the solvent shell, very strong binding results. This is particularly the case if in
the binding pocket of the protein, the groups forming the polar H-bond are oriented
in a way that the water molecules alone cannot fully manage to satisfy all these
interactions. This is possible for the ligand because it has an optimal arrangement
of its donor and acceptor groups.
The formation of a hydrophobic contact also leads to the release of water
molecules (that were previously occupying the space) from the binding pocket.
Once released into the surrounding aqueous bulk phase, they form H-bonds with
each other (Fig. 4.6). Because previously H-bonds were possible neither to the
protein nor to the ligand, the total number of H-bonds now increases. Moreover, the
strength with which the water molecules were fixed in the binding pocket before
their release is decisive. If they were strongly fixed, the newly gained degrees of
translational freedom increase the disorder and therefore boost the entropy, which
is thermodynamically favorable for the free energy ΔG. If the displaced water
molecules were already severely disordered, their displacement causes very little
entropy gain. Newer findings have shown that the binding pocket does not
always need to be uniformly packed with water molecules. Narrow hydrophobic pockets in
particular are not perfectly solvated. This has consequences for the free energy
balance during binding because it is just this displacement of water molecules that
is decisive for the hydrophobic interactions.
Fig. 4.6 The influence of water molecules on the strength of protein–ligand interactions.
(a) Upon formation of an H-bond between protein and ligand, water molecules must be displaced.
These form hydrogen bonds to both protein and ligand prior to complex formation. The balance of
hydrogen bonds, that is, the number of H-bonds before and after binding, remains unchanged.
(b) Upon formation of hydrophobic contacts, water molecules are released from an environment
that was unfavorable for them into the bulk water phase. The number of H-bonds increases.
4.7 Entropic Contributions to Protein–Ligand Interactions
In addition to the energetic contributions, the entropic component must also be
considered in the evaluation of the strength of protein–ligand interactions.
As described previously, the entropy S is a measure of the order of a system.
This allows an estimate to be made as to over how many degrees of freedom
a particular amount of energy is distributed. A degree of freedom can mean, for
example, a particular vibration of the system or a rotation of individual groups
around one another. A highly ordered system in which the energy is distributed over
only a few degrees of freedom has little entropy; increasing the disorder increases
the entropy and concomitantly decreases the free energy G.
At room or body temperature, proteins and ligands can move in all spatial
directions. Furthermore, a water shell is, of course, also mobile; the water mole-
cules diffuse back and forth. A few of them are spatially fixed for a longer period of
time because they are bound to the protein by several H-bonds. Such water
molecules can be identified by X-ray crystallography of the protein. A spatial
fixation of a molecule is entropically unfavorable. Other water molecules are freely
mobile and are therefore not captured in an X-ray crystal structure. Such water
molecules are in an entropically favorable state because their TΔS contribution is
more positive than for a spatially fixed water molecule.
The hydrophobic protein–ligand interaction is, in many cases, of an entropic nature,
above all, when individual, previously fixed water molecules are displaced from the
binding pocket and released into the surrounding bulk water. The entropic contribution
to protein–ligand interactions is therefore based not on direct interactions but rather on
how the number of degrees of freedom for the protein–ligand–water system changes
upon ligand–protein binding. The more water molecules are released from the hydro-
phobic environment, the greater the contribution to the binding affinity. The number of
released molecules is, in a first approximation, proportional to the size of the hydro-
phobic surface that is no longer accessible to water upon ligand binding, that is, in
other words, “buried.” Therefore this surface contribution often serves as a benchmark
for the estimation of the entropic portion.
In addition to the release of fixed water molecules, there is a further entropic
contribution to the binding energy. The association of a ligand to the protein
leads to a loss in translational and rotational degrees of freedom, and therefore to
a loss in entropy. Before the association, the ligand and protein move freely and
independently of one another. They each have three degrees of translational
and three degrees of rotational freedom. After binding, the protein and ligand rotate
and diffuse together so that three degrees of translational and three degrees of
rotational freedom are lost. Furthermore, a freely mobile, flexible ligand takes on
different conformations (▶ Chap. 16, “Conformational Analysis”) and is therefore
entropically favored. Once bound to the protein the ligand is restricted in its
conformational degrees of freedom to one or a few conformations that fit into the
binding pocket of the protein. It finds itself in an entropically unfavorable state.
Different enthalpic and entropic binding contributions are summarized in Fig. 4.7.
It is first assumed that the entropic term −TΔS contributes positively and the enthalpy
ΔH contributes negatively to ΔG. If the negative enthalpic contribution over-compensates
entropic losses, an overall negative ΔG results (cf. Eq. 4.3). In fact,
such enthalpy-driven binding is very frequently observed, but there are also known
cases, especially with large lipophilic ligands, in which the binding is entropy driven.
This means that the ligand binding is enthalpically unfavorable, but the effect is
over-compensated by the marked entropy increase, that is, ΔG is overall negative.
Fig. 4.7 Illustration of the thermodynamic contribution to the free energy ΔG (labels in the
figure: receptor with bound H2O molecules; ligand in solution with loosely associated H2O
molecules and free rotation; ligand–receptor complex; H2O molecules that can move freely
in solution). Before binding, the
ligand can move freely; this gives rise to a certain translational and rotational entropy. Moreover,
the ligand is usually flexible, and adopts different conformations. Protein and ligand are solvated in
that H-bonds to water molecules are formed. Some water molecules are in loose contact with the
protein or the ligand without forming H-bonds. Translational and rotational degrees of freedom
are lost upon binding. The concomitant loss in entropy is unfavorable for the binding. Furthermore,
both the protein and the ligand must shed their water shells, which is also an unfavorable process for
the binding. The binding of the ligand leads to the formation of direct interactions to the protein and
it releases water molecules. Both of these are contributions that are favorable for the binding.
H-bonds are indicated by dashed lines and hydrophobic interactions by dotted lines.
The entropy gain occurs, as mentioned, because of the release of fixed water
molecules. This, however, is not the only entropy contribution that changes upon
ligand binding. The protein changes too. For example, many side chains in proteins
are distributed over multiple conformational states. Upon binding a ligand, this
distribution can change. According to the total balance, the entropy can increase or
decrease through this change. The same is true for the rotation of side chains,
especially methyl groups. If the rotational behavior changes, the total entropy of the
ligand-binding process is influenced. The picture can even be complicated in that
some areas of the protein transform into a more ordered state, and others become
less ordered. In this way the entropic contribution is partially compensated. It is
often assumed that the changes in the entropic portion of the binding within a series
of very similar ligands are the same. Then such contributions can be neglected in
a relative comparison of ligands. Unfortunately, this simplified picture has proven
to be a fallacy. Just such an example is introduced in Sect. 4.10.
4.8 What Is the Contribution of a Hydrogen Bond to the
Strength of Protein–Ligand Interactions?
Naturally, in any discussion about protein–ligand interactions, the question arises as
to how large the contribution of particular hydrogen bonds to the binding affinity
actually is. The question can be experimentally answered when two protein–ligand
complexes that are only different by one hydrogen bond are compared to one
another. Such a comparison is possible, for example, by using protein mutants in
which an amino acid that contributes an H-bond to the ligand is exchanged
for another amino acid that cannot do this. Alan Fersht conducted an elegant
experiment with the protein tyrosyl-tRNA synthetase in complex with the substrate tyrosyl
adenylate (Fig. 4.8). Numerous H-bonds are formed between the protein and
substrate, for example, between the phenolic OH group of tyrosine 34 and
the substrate. The mutant Tyr34 → Phe, in which tyrosine is replaced by a non-polar
phenylalanine, was prepared, and the binding of the substrate to the mutant
protein was tested. The binding was weakened by 2 kJ/mol. Analogously, other
mutants were investigated. The loss of a neutral H-bond led to a loss in binding
affinity between 2 and 6 kJ/mol. The H-bonds in which one partner is charged are
stronger. The mutation Tyr169 → Phe decreases the binding affinity by 15.6 kJ/mol.
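To relate these energy differences to binding constants, the loss in free energy can be translated into a factor in affinity. The sketch below assumes the standard relation between free energy and equilibrium constant, factor = exp(ΔΔG/RT), and a temperature of 298.15 K; neither the temperature nor the function name is taken from the experiments described.

```python
import math

R = 8.314462618  # gas constant in J/(mol K)


def affinity_factor(ddg_kj_per_mol, temperature=298.15):
    """Factor by which the binding constant changes for a given free energy difference.

    The default temperature is an assumption of this sketch.
    """
    return math.exp(ddg_kj_per_mol * 1000.0 / (R * temperature))


for ddg in (2.0, 6.0, 15.6):
    print(f"{ddg:5.1f} kJ/mol -> factor of about {affinity_factor(ddg):.0f}")
```

Under these assumptions, the loss of a neutral H-bond (2 to 6 kJ/mol) weakens binding by a factor of roughly 2 to 11, whereas the 15.6 kJ/mol of the charged contact correspond to a factor of several hundred.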
Fidarestat 4.1 is a potent aldose reductase inhibitor (▶ Sect. 27.5). It forms
a hydrogen bond to the NH function of the amide group of Leu300 with its
carboxamide group (Fig. 4.9). If the leucine is exchanged for a proline, the
possibility of forming the H-bond is lost because proline has no free NH group.
This exchange means a loss in free energy of 7.8 kJ/mol. When the partitioning of
enthalpy ΔH and entropy TΔS is measured by microcalorimetry, it can be seen
that the H-bond loss is largely of an enthalpic nature (▶ Sect. 7.7). In comparison,
the inhibitor sorbinil 4.2, in which the carboxamide group is missing, should
be considered. Interestingly, the free energy of binding for the wild-type protein
and the Leu300 → Pro mutant is practically identical. Because the group to form the
H-bond with the NH group of Leu300 is missing in sorbinil, the loss of the NH
function in the protein is hardly noticeable. This explains the practically unchanged
free energy of binding. Nonetheless, the sorbinil complexes with the wild-type
protein and the mutant are different. The binding with the wild type is enthalpically
more favorable, but it is entropically more expensive than with the mutant.
The crystal structure indicates that a water molecule mediates an H-bond between
the ether group of sorbinil and the NH function of Leu300 (Fig. 4.9). This yields
an enthalpy gain of about 5 kJ/mol. At the same time, the uptake of water
is entropically disfavored. This contribution of nearly 6 kJ/mol just compensates
the enthalpic gain so that there is practically no affinity gain in ΔG in the balance.
The proline mutant cannot form a water-mediated contact to sorbinil because of the
missing NH function. Therefore the enthalpic gain from the H-bond is lost. There is
also no entropic loss from capturing a water molecule.
Fig. 4.8 Numerous intermolecular hydrogen bonds are formed in the complex between
tyrosyl-tRNA synthetase and the substrate tyrosyl adenylate (residues labeled in the figure:
Tyr34, Cys35, Gly36, Asp38, His48, Thr51, Tyr169, Asp176, Gly192, Gln195). The exchange of
amino acid Tyr34 for Phe or Tyr169 for Phe leads to the situation that in each case the
hydrogen bond can no longer be formed. This results in a loss of binding affinity.
Fig. 4.9 Fidarestat 4.1 (left) forms a hydrogen bond with its carboxamide group to the NH
function of Leu300 (blue). By exchanging Leu for Pro (red), the H-bond can no longer be formed.
This leads to a ΔΔG loss of 7.8 kJ/mol, which is paid for mostly by the enthalpy (ΔΔH: 6.9 kJ/mol).
The carboxamide group is missing in sorbinil 4.2 (right). The exchange leucine → proline
leaves the free energy of binding ΔΔG practically unchanged. Sorbinil, however, binds to the wild
type (leucine, blue) enthalpically more favorably and entropically less favorably than to the
proline mutant (red). An entrapped water molecule mediates an H-bond between sorbinil and
Leu300. This brings an enthalpic advantage to the wild type of about 5 kJ/mol. At the same time,
the entrapment of a water molecule is entropically disadvantageous for the wild type (−TΔΔS: 6
kJ/mol) and compensates the enthalpic advantage. Values given in the figure: fidarestat 4.1,
ΔΔG: 7.8 kJ/mol, ΔΔH: 6.9 kJ/mol, −TΔΔS: 0.9 kJ/mol; sorbinil 4.2, ΔΔG: −0.8 kJ/mol,
ΔΔH: 5.1 kJ/mol, −TΔΔS: −5.9 kJ/mol.
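The thermodynamic values quoted in Fig. 4.9 can be checked against Eq. 4.3 written for differences, ΔΔG = ΔΔH − TΔΔS. The following sketch only verifies that the three numbers given for each inhibitor add up within the rounding of the reported values; the dictionary layout, the names, and the tolerance are hypothetical choices.

```python
# Differences between wild type and Leu300 -> Pro mutant, in kJ/mol, as given in Fig. 4.9
FIG_4_9 = {
    "fidarestat 4.1": {"ddG": 7.8, "ddH": 6.9, "minus_TddS": 0.9},
    "sorbinil 4.2": {"ddG": -0.8, "ddH": 5.1, "minus_TddS": -5.9},
}


def residual(entry):
    """Deviation of ddH + (-T*ddS) from the reported ddG, in kJ/mol."""
    return entry["ddH"] + entry["minus_TddS"] - entry["ddG"]


def is_consistent(entry, tolerance=0.1):
    """True if the three values satisfy ddG = ddH - T*ddS within the tolerance."""
    return abs(residual(entry)) <= tolerance


for name, entry in FIG_4_9.items():
    print(f"{name}: residual = {residual(entry):+.2f} kJ/mol, "
          f"consistent: {is_consistent(entry)}")
```

For fidarestat nearly the entire loss is enthalpic, whereas for sorbinil two contributions of opposite sign and similar size cancel almost completely. This is the enthalpy–entropy compensation described above.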
The three-dimensional structures of a large number of protein–ligand complexes
have been elucidated. Many of these complexes contain hydrogen bonds between
the protein and ligand. The entire issue of the contribution of hydrogen bonds
to the binding affinity becomes apparent in Fig. 4.10. Here the experimentally
determined binding constants for 80 protein–ligand complexes are plotted against
the number of hydrogen bonds. The measured binding constants spread over a
considerable range for a given number of hydrogen bonds. The contribution of
a single H-bond is therefore by no means constant, but rather it varies significantly.
The contribution of an H-bond can even reduce the binding affinity due to an
unfavorable desolvation effect. If two ligands are compared that are only different
in the functional group that forms the H-bonds with the protein, the affinity can
increase, remain the same, or even decrease.
Fig. 4.10 A plot of the binding constants Ki of 80 crystallographically investigated
protein–ligand complexes (axes: −lg Ki versus n) shows that Ki has no direct relationship
to the number of hydrogen bonds that exist between protein and ligand.
An impressive example of the importance of hydrogen bonds is displayed by the
inhibitors 4.3 of the metalloprotease thermolysin, which were synthesized in the
research group of Paul Bartlett. There, a phosphonamide ─PO2HN─ was replaced
by a phosphinate ─PO2CH2─ or a phosphonate ─PO2O─. The results of these
exchanges are summarized in Table 4.3. Although the X-ray structure shows that
the NH group forms an H-bond, it can nonetheless be replaced with a CH2 group
without loss of binding affinity. This result is understandable if we consider the
number of hydrogen bonds before and after ligand binding for the phosphonamide
and for the phosphinate, as we did in Fig. 4.6. In both cases the number of H-bonds
is unchanged. If the NH group is replaced by an oxygen atom, the binding affinity
decreases by a factor of 1,000. In water, the oxygen atom that is in the place of the
NH group can form a hydrogen bond to the bulk water. In the protein–ligand
complex of the phosphonate ─PO2O─, the electronegative oxygen atom is found
exactly opposite the oxygen of the carbonyl group of Ala113. Two acceptor groups
are directly facing one another. A hydrogen bond cannot be formed here. The
inventory of hydrogen bonds remains unbalanced. Furthermore, the two groups
repel one another, which results in a poorer binding. A similarly positioned case is
illustrated in Table 4.4. Here the binding affinity of three thrombin inhibitors
4.4 that were synthesized at Eli Lilly are compared with each other. The amine
(X ¼ ─NH─) can form an H-bond with Gly219 and binds the most strongly.
The ether (X = ─O─) binds 5,000 times more weakly because of an electrostatic repulsion
between the ether oxygen atom and the carbonyl group of the protein. The aliphatic
compound (X = ─CH2─) binds remarkably well: compared to X = ─NH─, its affinity
is reduced merely by a factor of eight (thrombin) and two (trypsin).

Table 4.3 Binding constants Ki for the thermolysin inhibitors 4.3, which contain either
a phosphonamide (X = ─NH─), a phosphonate (X = ─O─), or a phosphinate (X = ─CH2─)
group. The phosphonamide group ─PO2NH─ complexes the zinc ion and simultaneously forms an
H-bond with Ala113
[Structure diagram of 4.3 with the variable groups X and R, the Zn2+ ion, and Ala 113 not reproduced]
Binding constant Ki in µM
R         X = ─NH─   X = ─O─   X = ─CH2─
OH        0.76       660       1.4
Gly-OH    0.27       230       0.3
Phe-OH    0.08       53        0.07
Ala-OH    0.02       13        0.02
Leu-OH    0.01       9         0.01
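The size of these effects can be made tangible by converting the ratios of the Ki values
in Table 4.3 into free energy differences with ΔΔG = RT ln(Ki,new/Ki,ref). The following
is a minimal sketch, not part of the original study: the temperature of 298.15 K is an
assumption (the text does not state it), the Ki values are taken in µM, and the function
names are illustrative only.

```python
import math

R = 8.314    # gas constant in J/(mol K)
T = 298.15   # assumed temperature in K; not stated in the text

# Ki values from Table 4.3, taken in micromolar: (X = NH, X = O, X = CH2)
KI_UM = {
    "OH": (0.76, 660, 1.4),
    "Gly-OH": (0.27, 230, 0.3),
    "Phe-OH": (0.08, 53, 0.07),
    "Ala-OH": (0.02, 13, 0.02),
    "Leu-OH": (0.01, 9, 0.01),
}


def ddg_kj(ki_ref, ki_new, temperature=T):
    """Free energy change in kJ/mol when Ki changes from ki_ref to ki_new."""
    return R * temperature * math.log(ki_new / ki_ref) / 1000.0


def summarize(table=KI_UM):
    """Return affinity ratios and free energy differences relative to X = NH."""
    rows = {}
    for residue, (nh, o, ch2) in table.items():
        rows[residue] = {
            "ratio_O": o / nh,
            "ddG_O": ddg_kj(nh, o),
            "ratio_CH2": ch2 / nh,
            "ddG_CH2": ddg_kj(nh, ch2),
        }
    return rows


if __name__ == "__main__":
    for residue, row in summarize().items():
        print(f"{residue:8s} O/NH: {row['ratio_O']:6.0f}x ({row['ddG_O']:5.1f} kJ/mol)   "
              f"CH2/NH: {row['ratio_CH2']:4.1f}x ({row['ddG_CH2']:5.1f} kJ/mol)")
```

Under these assumptions, a loss in affinity by a factor of 1,000 corresponds to roughly
17 kJ/mol, whereas the exchange of NH for CH2 changes the free energy of binding by less
than 2 kJ/mol in every row of the table.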
4.9 The Strength of Hydrophobic Protein–Ligand Interactions
We have seen that the direct attractive forces between lipophilic groups are
considerably smaller than those between polar groups. Hydrophobic interactions
are mainly based on the displacement of water molecules. It has been shown in
many experiments that their contribution to the binding affinity is, as a first approx-
imation, proportional to the size of lipophilic surface that is buried upon ligand
binding and therefore no longer accessible to water. Typically, the contribution is
found to be between approximately 50 and 200 J/mol per Å² of lipophilic
contact area. An example of this is retinol. It binds to the retinol-binding protein
(Fig. 4.1) with a binding constant of 190 nM, exclusively through lipophilic
contacts. This corresponds to a free energy of binding of −39.8 kJ/mol. As a result of the
binding, a lipophilic area of 250 Å² is buried. The contribution per Å² amounts
to 39,800/250 = 159 J/mol Å².
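The retinol estimate can be retraced with the relationship ΔG = RT ln Kd. The sketch
below is only an illustration: the temperature of 310 K is an assumption that comes close
to the value quoted in the text (the text itself does not state a temperature), and the
function names are hypothetical.

```python
import math

R = 8.314  # gas constant in J/(mol K)


def binding_free_energy_kj(kd_molar, temperature_k):
    """Free energy of binding in kJ/mol from a dissociation constant in mol/L."""
    return R * temperature_k * math.log(kd_molar) / 1000.0


def contribution_per_area(dg_kj, buried_area_a2):
    """Magnitude of the free energy per buried lipophilic area in J/(mol A^2)."""
    return abs(dg_kj) * 1000.0 / buried_area_a2


if __name__ == "__main__":
    # Retinol: Kd = 190 nM, buried lipophilic area = 250 A^2 (values from the text)
    dg = binding_free_energy_kj(190e-9, 310.0)  # 310 K is an assumption
    print(f"dG = {dg:.1f} kJ/mol")
    print(f"per area = {contribution_per_area(dg, 250.0):.0f} J/(mol A^2)")
```

With the assumed temperature, the result lies within about 0.1 kJ/mol of the quoted free
energy, and the contribution per Å² falls inside the stated window of 50 to 200 J/mol.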
Six HIV protease inhibitors (▶ Sect. 24.6) are listed in Fig. 4.11. During the
course of a lead structure optimization, the hydrophobic surface of 4.5 was enlarged
by adding hydrophobic groups. It could be confirmed crystallographically that the
binding mode did not change. If the changes in the molecular volume in this series
are plotted against the affinity, a linear relationship is obtained. The binding affinity
increases by 65 J/mol Å².
In many cases, the hydrophobic interactions make a dominant contribution to the
free energy of binding. In Fig. 4.12 the lipophilic surface area that is buried upon
complex formation of the same 80 protein–ligand complexes as in Fig. 4.10 is
shown together with their experimentally determined binding constants. Here too,
the values are scattered over a broad range.

Table 4.4 Binding of 4.4 to the serine proteases thrombin and trypsin
[Structure diagram of 4.4 with the variable group X and Gly 216 not reproduced]
IC50 values in µg/mL
Enzyme     X = ─NH─   X = ─O─   X = ─CH2─
Thrombin   0.009      52        0.07
Trypsin    0.009      43        0.018
[Structure diagram of 4.5 not reproduced; substituents X = H, Cl, CH3, CF3, Br, I]
Fig. 4.11 The scaffold of the HIV protease inhibitor 4.5 was enlarged during the course of a lead
structure optimization by adding hydrophobic groups to the aromatic N-benzyl group. An
unchanged binding mode was evidenced crystallographically. The additional molecular volume
improved the binding affinity in a linear manner by about 65 J/mol Å².

[Plot not reproduced: −lg Ki (vertical axis) against the buried hydrophobic surface area in Å²
(horizontal axis)]
Fig. 4.12 In analogy to Fig. 4.10, a plot of the binding constants Ki of the 80
crystallographically investigated protein–ligand complexes against the buried
hydrophobic surface area shows that there is no simple function for this measure either.

4.10 Binding and Mobility: Compensation of Enthalpy and Entropy
According to Eq. 4.3, enthalpy and entropy are in a close physical relationship, and
their sum results in the free energy of binding. If the formation of protein–ligand
complexes is considered, the ΔG values of weakly binding millimolar complexes
and strongly binding nanomolar complexes fall within a window of ca. 35–55 kJ/mol. A lead
structure optimization (▶ Chap. 8, “Optimization of Lead Structures”) usually
covers an even smaller range. Typically, the binding constants are improved by
5–6 orders of magnitude, which corresponds to 25–30 kJ/mol. Upon exchanging
functional groups in a lead structure, the enthalpy ΔH usually varies over
a considerably broader range. If the change in ΔG is much smaller during the
course of this replacement, then for purely mathematical reasons the changes in the
enthalpy ΔH must be compensated by an opposite change in the entropy term TΔS. It is
only in this way that the large variations in the two properties can lead to the result
that ΔG remains in a small window. An important question is derived from this: Is
there a connection that causes the enthalpy and entropy, which are opponents, to
partially compensate one another during the optimization? How can it nonetheless
be achieved that both measures are optimized without canceling out the effects of
one another so that ΔG remains unchanged?
Entropic optimization aims at increasing the hydrophobic surface of a ligand that
becomes buried upon binding. The intuitive idea behind this is that the
enlarged ligand displaces an increasing number of water molecules upon binding.
The design of a rigid ligand with correctly frozen conformational degrees of freedom
usually leads to an improvement in the entropic binding contribution (▶ Sect. 24.6).
To increase the enthalpic binding of a ligand to the protein, above all, additional polar
interactions must be incorporated. As a rule, however, this comes at a price because the
additional polar groups must first release their solvation shell. This cost of
desolvation must be paid. If an amidine group is added to the para position of the
unsubstituted phenyl group of the thrombin inhibitor 4.6, a significant improvement
in the affinity is obtained in 4.7, which is accompanied by a strong increase in the
enthalpy (Fig. 4.13). The inhibitor forms a salt bridge with its benzamidine group to
an aspartate residue in thrombin. It is therefore strongly spatially fixed, which is
entropically unfavorable. The inhibitor 4.6, which lacks the polar group, binds with
a similar geometry. It cannot, however, form the salt bridge. The structure indicates
an increased residual mobility of the inhibitor in the binding pocket, which is
advantageous from an entropic point of view.
The two compounds 4.8 and 4.9 also represent thrombin inhibitors. They
differ in the size of the cycloalkyl group on the basic scaffold that fills
a hydrophobic pocket of the protein. Both inhibitors have practically the
same binding affinity for thrombin. However, the free energy of binding is
partitioned very differently into the enthalpy and entropy components. The
compound with the cyclopentyl substituent has an enthalpic advantage and an
entropic disadvantage compared to the six-membered-ring derivative. From
where does this surprising effect originate? The crystal structures of the two
derivatives with thrombin show an important difference with regard to the
cycloalkyl group. Whereas the five-membered ring is easily recognized in
the electron density (▶ Sect. 13.5), practically no density at all is visible
where the six-membered ring should be encountered. Such an observation in
an X-ray structure indicates a high degree of disorder in a particular moiety of
the protein–ligand complex. This disorder can be of a purely static nature
whereby the six-membered ring is scattered over many orientations. Alterna-
tively, it can also be the result of a much larger residual mobility in the protein-
bound state than observed for the five-membered-ring derivative. Molecular
dynamics simulations (▶ Sect. 15.7) confirmed this difference. In the case of the
five-membered ring compound, the cyclopentyl group remains in a hydrophobic
pocket and from time to time it undergoes a jump rotation. In doing so, the
virtually planar ring jumps between two orientations and exchanges its upper
and lower face. This practically does not change the placement of the ring in
the pocket. Furthermore, compound 4.8 does not form a hydrogen bond to the
carbonyl group of Gly216. The six-membered ring derivative 4.9 behaves entirely
differently. Here the cyclohexyl group moves out of the binding pocket during the
course of the simulation and returns after some time. At the same time, 4.9 forms
an intermittent hydrogen bond to Gly216. It is because of this that 4.9 maintains a
large amount of residual mobility.

[Structure diagrams of 4.6, 4.7, 4.8, and 4.9 not reproduced. Thermodynamic data given in the
figure, in kJ/mol, in the order printed:
ΔG: −31.7   ΔH: −13.6   −TΔS: −18.1
ΔG: −46.7   ΔH: −40.6   −TΔS: −6.1
ΔG: −36.2   ΔH: −10.5   −TΔS: −25.7
ΔG: −35.4   ΔH: −16.9   −TΔS: −18.5]
Fig. 4.13 Replacement of a phenyl group in 4.6 by a para-benzamidinophenyl group in 4.7 leads to
a significant improvement in the affinity of this thrombin inhibitor, which is largely due to an
enthalpic gain. This gain results from the formation of a salt bridge to Asp189 (▶ Sect. 23.3). The
homologous ligands 4.8 and 4.9 bind equally strongly to thrombin, but the binding affinity is divided
into the enthalpic and entropic contributions entirely differently. Compound 4.9 has a significantly
higher residual mobility in the binding pocket than 4.8, which results in an entropic advantage for
this derivative, even though the poorer contacts to the protein cause an enthalpic disadvantage.
This difference in the dynamic behavior of 4.8 and 4.9 explains the divergent
thermodynamic profile. The cyclopentyl derivative has an entropic disadvantage
because it is largely fixed in the binding pocket. The unambiguous orientation
provides an advantage for enthalpic interactions. The good and stable contacts to
the protein ensure an increased contribution to the interaction energy. The picture is
different for the six-membered-ring derivative. Its looser fixation in the binding
pocket means a smaller loss in degrees of freedom upon complex formation.
This causes an entropic advantage. Enthalpically, however, this behavior is
disadvantageous. Because the ring temporarily leaves the binding pocket, interactions
with the protein can only be formed with reduced strength.
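The compensation can be retraced numerically with the values printed in Fig. 4.13, using
ΔG = ΔH − TΔS. The following sketch is illustrative only. The assignment of the last two
value sets to 4.8 and 4.9 is an assumption that follows the description in the text (the
cyclopentyl derivative 4.8 is enthalpically favored, the cyclohexyl derivative 4.9
entropically favored); the function names are hypothetical.

```python
# (dH, -T*dS, dG) in kJ/mol, taken from Fig. 4.13.
# The assignment to 4.8 and 4.9 is an assumption based on the text.
PROFILES = {
    "4.6": (-13.6, -18.1, -31.7),
    "4.7": (-40.6, -6.1, -46.7),
    "4.8": (-16.9, -18.5, -35.4),
    "4.9": (-10.5, -25.7, -36.2),
}


def is_consistent(dh, minus_tds, dg, tol=0.05):
    """Check that dG = dH + (-T*dS) within a rounding tolerance."""
    return abs(dh + minus_tds - dg) <= tol


def difference(ref, new, profiles=PROFILES):
    """Change of each term (in kJ/mol) when going from ligand ref to ligand new."""
    dh_r, ts_r, dg_r = profiles[ref]
    dh_n, ts_n, dg_n = profiles[new]
    return {"ddH": dh_n - dh_r, "d(-TdS)": ts_n - ts_r, "ddG": dg_n - dg_r}


if __name__ == "__main__":
    for name, values in PROFILES.items():
        print(name, "consistent:", is_consistent(*values))
    for pair in (("4.6", "4.7"), ("4.8", "4.9")):
        changes = {k: round(v, 1) for k, v in difference(*pair).items()}
        print(pair, changes)
```

Under this assignment, the step from 4.8 to 4.9 costs about 6.4 kJ/mol in enthalpy and
gains about 7.2 kJ/mol in the entropic term, so that ΔG changes by less than 1 kJ/mol.
For 4.6 and 4.7, the enthalpic gain of 27 kJ/mol is partly offset by an entropic loss of
12 kJ/mol.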
What can be learned from this example? Even when ligands have a very similar
structure, the binding behavior can be significantly different. Their residual mobility in
the binding pocket can have decisive consequences for the thermodynamic binding
contributions. Obviously a mutual compensation of enthalpy and entropy leads to
an unchanged free energy. This interplay of residual mobility in the binding pocket
and quality of the formed interactions has, of course, consequences for the optimiza-
tion process. Medicinal chemists like to think in terms of group contributions to
binding affinity experienced during the exchange of particular functional groups.
Statistical analyses of such group contributions have been carried out and can be
applied as a rule of thumb to guide optimization strategies. The thinking is usually
done additively. How much is gained if a particular group is combined with another in
a molecule that is to be optimized? One should be careful with these considerations.
Small differences in the binding behavior cause such simple rules to fail.
The optimization of the thrombin inhibitor 4.10 to 4.11 may serve
as an example (Fig. 4.14). Two changes are made. First, a hydrophobic
substituent at the end of the molecular scaffold is enlarged from an
n-propyl to a phenylethyl group. This means a significant increase in the hydro-
phobic molecular surface area. Second, an amino group introduced next to the
hydrophobic group is meant to form a hydrogen bond to Gly216. Together, the two changes
from 4.10 to 4.11 lead to an improvement in the affinity of ΔΔG = −18.6 kJ/mol.
Both modifications can also be introduced sequentially via the intermediates
4.12 and 4.13. If the hydrophobic group is first enlarged from 4.10 to 4.12, only a
small amount of binding affinity is gained (ΔΔG = −3.1 kJ/mol). If 4.12 is further
optimized to 4.11, a significant affinity gain of ΔΔG = −15.5 kJ/mol is obtained. Does
the amino group really contribute so much affinity?
The reverse approach can also be taken, and the amino group can be introduced
into 4.10 first to give 4.13. For this change an improvement of only ΔΔG = −9.6 kJ/mol is
obtained. The final enlargement of the hydrophobic surface area from 4.13 to 4.11
provides another 9.0 kJ/mol of affinity gain.
This example shows that simple additivity rules fail. As in the example with the
five- and six-membered-ring derivatives 4.8 and 4.9, the balance of residual
mobility, partial solvation of the binding pocket, and quality of the formed
interactions exerts a decisive influence on the increase in affinity. The interplay of these
partially compensating enthalpic and entropic binding contributions is responsible
for this complex picture.
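The lack of additivity can be quantified by treating the four inhibitors as the corners of
a thermodynamic cycle. The sketch below uses the ΔG values given in Fig. 4.14; the
function names and the definition of the non-additivity term (the total gain minus the sum
of the two gains obtained when each change is made alone on 4.10) are illustrative
choices, not taken from the original study.

```python
# Free energies of binding in kJ/mol from Fig. 4.14
DG = {"4.10": -19.9, "4.12": -23.0, "4.13": -29.5, "4.11": -38.5}


def step(start, end, dg=DG):
    """Change in the free energy of binding (kJ/mol) from start to end."""
    return round(dg[end] - dg[start], 1)


def non_additivity(dg=DG):
    """Total gain minus the sum of the two single modifications made on 4.10."""
    hydrophobic_alone = step("4.10", "4.12", dg)  # n-propyl enlarged, no amino group
    amino_alone = step("4.10", "4.13", dg)        # amino group added, n-propyl kept
    total = step("4.10", "4.11", dg)
    return round(total - (hydrophobic_alone + amino_alone), 1)


if __name__ == "__main__":
    print("4.10 -> 4.12:", step("4.10", "4.12"))
    print("4.12 -> 4.11:", step("4.12", "4.11"))
    print("4.10 -> 4.13:", step("4.10", "4.13"))
    print("4.13 -> 4.11:", step("4.13", "4.11"))
    print("4.10 -> 4.11:", step("4.10", "4.11"))
    print("non-additivity:", non_additivity())
```

Both routes add up to the same total, as they must for a closed cycle, but the two
modifications made individually on 4.10 account for only 12.7 kJ/mol of the 18.6 kJ/mol
gained when they are combined. The remaining 5.9 kJ/mol reflects the cooperative effect
discussed in the text.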
[Structure diagrams of 4.10, 4.11, 4.12, and 4.13 not reproduced. Free energies of binding:
4.10 ΔG = −19.9 kJ/mol; 4.12 ΔG = −23.0 kJ/mol; 4.13 ΔG = −29.5 kJ/mol; 4.11 ΔG = −38.5 kJ/mol.
Steps: 4.10 → 4.12 ΔΔG = −3.1 kJ/mol; 4.12 → 4.11 ΔΔG = −15.5 kJ/mol; 4.10 → 4.13
ΔΔG = −9.6 kJ/mol; 4.13 → 4.11 ΔΔG = −9.0 kJ/mol; 4.10 → 4.11 ΔΔG = −18.6 kJ/mol]
Fig. 4.14 Optimization of the thrombin inhibitor 4.10 to 4.11 increases affinity by ΔΔG = −18.6
kJ/mol. This is achieved by increasing the size of the hydrophobic side chain (red) from n-propyl
to phenyl and attaching an amino group (blue). The changes can also be accomplished in a stepwise
fashion. Increasing the hydrophobic surface to give 4.12 enhances affinity by only 3.1 kJ/mol; the
major contribution of 15.5 kJ/mol is provided by the subsequently introduced amino
group. Adding the amino group first to give 4.13 contributes 9.6 kJ/mol, and the subsequent
enlargement of the hydrophobic substituent increases affinity by another 9.0 kJ/mol. The explanation
for the lack of additivity is found in the complex interplay of residual mobility, desolvation, and
strength of the formed enthalpic interactions.

4.11 Lessons for Drug Design
This chapter should not give the impression that a quantitative prediction about the
strength of protein–ligand interactions is impossible. Despite the complex character
of protein–ligand interactions, some simple rules should always be consulted first.
• Many strong protein–ligand interactions are characterized by extensive lipo-
philic contacts. An increase in the lipophilic contact area between the protein
and the ligand often leads to an improvement in the binding affinity. This means
that the search for unoccupied lipophilic pockets in the protein should be the first
step in the design and optimization of new ligands. Admittedly, this approach
should not be taken too far because a huge increase in the total lipophilicity of
a molecule increasingly reduces its water solubility.
• An additional H-bond does not guarantee an increase in the binding affinity.
An H-bond contributes to the total inventory if a stronger interaction of the
participating groups occurs in the protein–ligand complex compared to those in
bulk water. On the other hand, a buried polar atom that cannot be accommodated
with an H-bond almost always leads to a loss in binding affinity. It must be
ensured in ligand design that polar atoms find binding partners in case they
are no longer water-accessible in the formed protein–ligand complex.
• Each ligand displaces water molecules upon protein binding. There are binding
pockets in proteins that are formed in a way that they cannot be optimally
solvated by water. In these cases, a ligand can be in the position to form more
H-bonds to the protein than is possible with water. The binding affinity of such
ligands can be very high.
• Rigid ligands can bind more strongly than flexible ligands because the loss of
internal degrees of freedom is less for rigid ligands.
• Water can form strong H-bonds, but is often not as good a ligand for transition
metals as thiols, acids, hydroxamic acids, and related groups. Accordingly,
a direct interaction with the metal ion is important for most proteins that contain
a transition metal (▶ Chap. 25, “Inhibitors of Hydrolyzing Metalloenzymes”).
Generally, all protein–ligand interactions that either cannot at all or can only
very poorly be replaced by water contribute strongly to the affinity.
The relative contributions of enthalpy and entropy to the binding affinity ΔG,
the actual property that is to be optimized in drug design, are important for the
characterization of ligand binding. The affinity can be improved by increasing
the enthalpic or entropic contributions, or optimally both in parallel. To do so,
different parameters of the protein–ligand interaction must be addressed
(▶ Sect. 8.8). The question is open whether an enthalpically or an entropically
driven binding is advantageous for a particular drug. The break-through strategy
will depend on whether the binding of the active substance will show adequate
tolerance for quickly developing resistance mutations (▶ Sects. 24.5, ▶ 31.4, and
▶ 32.5), high target selectivity, or even the desired broad binding promiscuity
toward multiple members of a protein family (▶ Sect. 25.6, ▶ 26.4, and ▶ 27.4).
4.12 Synopsis
• Emil Fischer introduced the “lock-and-key” principle to describe the interaction
of a small molecule substrate and a macromolecular receptor. More than
50 years later, Koshland extended this picture by induced-fit considerations
that allow both binding partners to change conformations and mutually adapt to
one another to optimally interact.
• The cells are surrounded by a lipid double-layer membrane with polar
head groups on the exterior and hydrophobic alkyl chains in the interior.
This membrane is a barrier for polar substances, but sufficiently lipophilic
compounds can penetrate and even pass through the membrane.
• The strength of protein–ligand interactions is measured by the binding constant,
which quantifies the stability of a protein–ligand complex as a dissociation
constant according to the law of mass action for complex formation.
• The binding constant is logarithmically related to the Gibbs free energy of
binding. The free energy is composed of an enthalpic and entropic contribution.
The enthalpic part summarizes all terms that relate to the interaction energy
of the binding partners. The entropic part considers the order of the system and
how its energy content is distributed over the degrees of freedom of the system.
• Protein–ligand complexes usually form through non-covalent interactions, pre-
dominantly through hydrogen bonds. The strength of hydrogen bonds strongly
depends on the distributions of charges among the interacting functional groups.
Whether a group is charged or not depends on its protonation state, which is
defined by the pKa value of the titratable groups involved in the protein–ligand
interactions.
• Depending on the local environment in a binding pocket, the pKa values of
titratable groups can vary significantly and can, by this, transform a normal
H-bond into a much stronger charge-assisted H-bond.
• Hydrophobic interactions form through the close proximity of non-polar
functional groups of the binding partners. As direct interactions, they are rather
weak. Nevertheless they can afford a significant contribution to binding affinity
through the release of water molecules from either the lipophilic environment of
the binding pocket or from the ligand surface next to a lipophilic surface patch.
• The strength of protein–ligand interactions is strongly influenced by the water
environment. Both the protein binding pocket and the ligand are solvated before
complex formation and functional groups of protein and ligand will form
H-bonds to water molecules. The total balance of the hydrogen-bond inventory
before and after complex formation matters for binding affinity considerations.
Only if the newly formed hydrogen bonds in the complex are greater in
number and/or stronger than those previously formed to water does a net affinity
increase result.
• The release of water molecules from hydrophobic surface patches can increase
affinity by enthalpy and entropy. Release of fixed water molecules increases the
degrees of freedom and boosts entropy. Replacement of highly disordered water
molecules into the bulk water environment can contribute to an enthalpic gain.
• Entropic contributions to binding arise from an increase of the degrees of
freedom of the protein–ligand–water system and, as a first approximation,
correlate with the size of the hydrophobic surface buried in the formed complex.
• Free energy variations are observed over a window of about 30–55 kJ/mol in
protein–ligand complexes. Variations in enthalpy (ΔH) and entropy (TΔS) can
be much larger. This results from extensive enthalpy/entropy compensation.
Entropically favored increases in the degrees of freedom, release of water
molecules, or enhanced residual mobility are usually detrimental to
improvements in the enthalpy that result from strong interactions.
• The pronounced interdependence of enthalpy and entropy along with dynamic
versus interaction geometric phenomena causes simple additive rules about func-
tional group contributions to fail. Instead pronounced cooperative effects are in
operation.
Bibliography
General Literature
Andrews PR (1993) Drug-receptor interactions. In: Kubinyi H (ed) 3D-QSAR in drug design.
Theory, methods and applications. Escom, Leiden, pp 13–40
Andrews PR, Craik DJ, Martin JL (1984) Functional group contributions to drug-receptor
interactions. J Med Chem 27:1648–1657
Böhm HJ, Klebe G (1996) What can we learn from molecular recognition in protein-ligand
complexes for the design of new drugs? Angew Chem Int Ed Engl 35:2588–2614
Böhm H-J, Schneider G (2003) Protein-ligand interactions. From molecular recognition to drug
design. In: Mannhold R, Kubinyi H, Folkers G (eds) Methods and principles in
medicinal chemistry. Wiley-VCH, Weinheim
Creighton TE (1992) Proteins: structures and molecular properties, 2nd edn. W.H. Freeman,
New York
Gohlke H, Klebe G (2002) Approaches to the description and prediction of binding affinity of
small-molecule ligands to macromolecular receptors. Angew Chem Int Ed Engl 41:2644–2676
Kuntz ID, Chen K, Sharp KA, Kollman PA (1999) The maximal affinity of ligands. Proc Natl Acad
Sci USA 96:9997–10002
Special Literature
Ehrlich P (1913) Chemotherapeutics: scientific principles, methods and results. Lancet 182:445–451
Fersht AR, Shi JP, Knill-Jones J et al (1985) Hydrogen bonding and biological specificity analysed
by protein engineering. Nature 314:235–238
Gerlach C, Smolinski M et al (2007) Thermodynamic inhibition profile of a cyclopentyl- and
a cyclohexyl derivative towards thrombin: the same, but for deviating reasons. Angew Chem
Int Ed Engl 46:8511–8514
Lichtenthaler FW (1994) 100 Years “Schluessel-Schloss-Prinzip”: what made Emil Fischer use
this analogy? Angew Chem Int Ed Engl 33:2364–2374
Mason RP, Rhodes DG, Herbette LG (1991) Reevaluating equilibrium and kinetic binding
parameters for lipophilic drugs based on a structural model for drug interaction with biological
membranes. J Med Chem 34:869–877
Morgan BP, Scholtz JM, Ballinger MD, Zipkin ID, Bartlett PA (1991) Differential binding energy:
a detailed evaluation of the influence of hydrogen-bonding and hydrophobic groups on the
inhibition of thermolysin by phosphorus-containing inhibitors. J Am Chem Soc 113:297–307
Petrova T, Steuber H et al (2005) Factorizing selectivity determinants of inhibitor binding toward
aldose and aldehyde reductases: structural and thermodynamic properties of the aldose reduc-
tase mutant Leu300Pro-Fidarestat complex. J Med Chem 48:5659–5665
5 Optical Activity and Biological Effect
The three-dimensional shape of a molecule has a decisive influence on its biological
activity. The configuration of a molecule is made up of the bonds between the
atoms. Substances with an asymmetric center that are considered here are
optically active and exist in two different forms. They are asymmetrically built
and have a relationship to one another like that of an image and its mirror image. They
are called chiral. It is impossible to convert one form into the other without breaking
and remaking bonds. Chirality is often unimportant to chemists because the image
and mirror image behave exactly the same in a symmetrical environment. If they
are brought into an asymmetrical environment, for instance at the binding site of
a protein, that is not true anymore. The consequences of this for drug design and
therapy are the topic of this chapter.
At the beginning of the nineteenth century, Jean Baptiste Biot observed that
some quartz crystals rotated the plane of linearly polarized light to the right, and
others to the left. Macroscopically this optical activity is imprinted in the asym-
metric, handed (enantiomorphic) form of the crystals; they exist as left and right-
handed mirror-image forms. A little later, Biot found that not only crystals but also
organic compounds like turpentine oil or sugar solutions rotated polarized light in
a particular direction.
5.1 Louis Pasteur Sorts Crystals
The decisive experiment was carried out by the then 26-year-old Louis Pasteur in
Paris in 1848. Several literature reports were inconsistent with his theory that an
obvious relationship must exist between crystal forms and their optical properties.
During a careful investigation of the sodium–ammonium salt of the optically
inactive tartaric acid, he discovered that the crystals had different forms. They
were either right- or left-symmetrical and could be sorted by hand. The crystals of
the enantiomers 5.1 and 5.2 (Fig. 5.1) gave solutions that had an opposite rotational
direction. This confirmed his suspicion. Before Pasteur could present his results to
the Academy of Science, he had to repeat the experiment publicly (!) in Biot’s
presence at the Collège de France. He was lucky. It was only because his solutions
were allowed to slowly evaporate at room temperature that his experiment was
successful. Above the critical temperature of 28 °C, a stoichiometric 1:1 mixture of
both enantiomeric forms, a racemate, would have homogeneously crystallized
(Sect. 5.4).
G. Klebe, Drug Design, DOI 10.1007/978-3-642-17907-5_5,
© Springer-Verlag Berlin Heidelberg 2013
A few years later Pasteur managed another important observation: mold con-
tamination of a racemic tartaric acid solution caused optical activity to develop.
One enantiomer of tartaric acid is metabolized significantly faster than the other.
With this, he discovered two important methods to separate racemates into enan-
tiomers. Whereas mechanical sorting is limited to a very few examples, enzymatic
kinetic resolution of enantiomers has found broad applications (Sect. 5.4).
5.2 Structural Basis of Optical Activity
An explanation for optical isomerism became possible with the help of the theory of
tetrahedral carbon, which was independently developed in 1874 by Jacobus
Henricus van’t Hoff and Joseph-Achille Le Bel. When a carbon atom carries four
different substituents, an asymmetric or, as it is sometimes called, stereogenic
center is produced. This property is not limited to carbon; nitrogen atoms (in ammonium
salts) or silicon atoms with four different substituents, phosphorus atoms, for instance in
phosphonic or phosphoric acid esters, or even sulfur atoms in sulfoxides (with two
different substituents, oxygen, and the lone electron pair) can also be asymmetric. The
spatial arrangement in these compounds gives rise to two mirror-image isomers, each of
which rotates polarized light to the same degree but in the opposite direction. These forms
are called enantiomers (earlier antipodes). With the exception of their optical
activity, enantiomers are identical in all of their chemical and physicochemical
properties, but only as long as they are in an achiral environment.

[Structure diagrams not reproduced: 5.1 D-(−)-tartaric acid and 5.2 L-(+)-tartaric acid, related
by a mirror plane, and 5.3 meso-tartaric acid with inversion symmetry]
Fig. 5.1 Optical isomerism in tartaric acid. The enantiomers (−)-tartaric acid 5.1 (mp.
168–170 °C, [α]D²⁰ = −12°) and (+)-tartaric acid 5.2 (mp. 168–170 °C, [α]D²⁰ = +12°) cannot be
superimposed upon each other either in the plane of the paper or in 3D space. They have only
a twofold rotational axis (orange axes) that bisects the central C—C bond. Each mirror image
rotates the plane of polarized light in the opposite direction to the other. In contrast, meso-tartaric
acid 5.3 (mp. = 140 °C) has an inversion center of symmetry (the purple center on the central C—C
bond). Solutions of meso-tartaric acid have no optical activity because the contribution from each
stereogenic center compensates for the other. Racemic tartaric acid (mp. = 206 °C, no rotation) is
a 1:1 mixture of both enantiomers of tartaric acid 5.1 and 5.2. Such mixtures are optically inactive
and are called racemates (Lat. racemus, the grape; tartaric acid is found in grapes and wine).
Compounds with two chiral centers that are configured as an image and mirror
image within the same molecule do not exhibit optical activity macroscopically.
meso-Tartaric acid 5.3 (Fig. 5.1), an inversion-symmetrical molecule, exists as
a racemic mixture of chiral conformers. Each conformer exists as an “internal”
racemic mixture because in one energetically favored conformation the molecule
exhibits inversion symmetry. Its left part can be inverted by point reflection through
the center of the central C—C bond into its right part. Optical activity is also present in
other forms of molecular asymmetry. An example is any regular or irregular tetrahe-
dral orientation of different substituents on any other scaffold than a single carbon
atom. Another case can be found in compounds in which two groups are strongly
rotationally hindered around a common bond. An asymmetrical center results, giving
rise to optically active rotational isomers, so-called atropisomers (Fig. 5.2).
The experimentally determined direction of rotation, (+) or (−) (previously called
d or l), is used to characterize enantiomeric compounds. The spatial configuration
of a stereogenic center in a molecule is described as D or L (Lat. dextro, levo).
This notation is based on the Fischer convention and is related to the absolute
configuration of D- and L-glyceraldehyde, 5.7 and 5.8 (Fig. 5.3). Most sugars, for
instance glucose 5.9, can be traced back to D-glyceraldehyde 5.7, and the natural
amino acids of proteins, for instance alanine 5.10, can be traced back to
L-glyceraldehyde 5.8. For this reason, today the D/L nomenclature is still frequently
applied to sugars and amino acids. The enantiomers of tartaric acid correspond to
the D-(−) or L-(+) form.

[Structure diagrams of twistane 5.4, methaqualone 5.5, and 5.6 not reproduced]
Fig. 5.2 Even molecules without stereogenic centers can form an image–mirror-image pair
because of their spatial construction; an example is twistane 5.4. If rotation around the bonds is
limited, as in the case of the sedative methaqualone 5.5, enantiomers are separable (so-called
atropisomers). In non-planar fused ring systems like the dibenzocycloheptadiene derivative 5.6,
the enantiomeric separation depends on the barrier of inversion for the ring system.
The Cahn–Ingold–Prelog rule allows an unambiguous stereochemical assignment
(Fig. 5.4). According to the convention, the stereogenic center is oriented so that
the substituent with the lowest priority, that is, the smallest atomic number, is at
the back (e.g., a hydrogen atom or a lone pair of electrons). As an intuitive
explanatory model, this substituent can be pictured as the column of a steering
wheel. The other substituents then lie in the plane of the steering wheel. If these
substituents are regarded in descending order according to the atomic number, and
this sequence follows a rotation to the right (clockwise), the stereogenic center has
an R configuration; the opposite direction is the S configuration (from the Latin:
rectus and sinister). The only disadvantage to this nomenclature system is that the
assignment of the stereocenter can change just because of the atomic number,
valency, or oxidation state of a substituent. The homologous L-amino acids serine
and cysteine, which are stereochemical analogues that differ only in that an oxygen
is exchanged for a sulfur atom, are classified as (S)-serine and (R)-cysteine.
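The steering-wheel picture can be turned into a small geometric test. The following Python sketch is an illustration only: it assumes that the three highest-priority substituents have already been ranked and that Cartesian coordinates are available. The function name `cip_descriptor` and the idealized coordinates are hypothetical. With the lowest-priority substituent pointing away from the viewer, a clockwise sequence corresponds to a negative scalar triple product of the three bond vectors.

```python
def cip_descriptor(center, ranked):
    """Return "R" or "S" for a tetrahedral stereogenic center.

    center: (x, y, z) of the stereogenic atom.
    ranked: coordinates of the three highest-priority substituents,
            listed in descending priority (the fourth, lowest-priority
            substituent is assumed to point to the opposite side).
    """
    # Bond vectors from the center to the three ranked substituents.
    a, b, c = [tuple(p[i] - center[i] for i in range(3)) for p in ranked]
    # Scalar triple product a . (b x c): its sign encodes the handedness.
    cross = (b[1] * c[2] - b[2] * c[1],
             b[2] * c[0] - b[0] * c[2],
             b[0] * c[1] - b[1] * c[0])
    volume = a[0] * cross[0] + a[1] * cross[1] + a[2] * cross[2]
    if volume == 0:
        raise ValueError("substituents are coplanar with the center")
    # Negative volume: 1 -> 2 -> 3 runs clockwise when viewed with the
    # lowest-priority substituent at the back, i.e., R.
    return "R" if volume < 0 else "S"


# Idealized "steering wheel": viewer on the +z axis, column along -z.
hub = (0.0, 0.0, 0.0)
first = (0.0, 1.0, 0.33)
second = (0.866, -0.5, 0.33)
third = (-0.866, -0.5, 0.33)
print(cip_descriptor(hub, [first, second, third]))  # clockwise sequence
print(cip_descriptor(hub, [first, third, second]))  # counterclockwise sequence
```

Exchanging any two substituents inverts the sign of the triple product and therefore the descriptor, which mirrors the fact that swapping two substituents on a stereogenic center gives the other enantiomer.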
If one stereogenic center is present in a molecule, there are two enantiomers.
Each additional symmetry-independent stereocenter increases the number of
Fig. 5.3 The rotation (+ or −) and the Fischer assignment (D or L) are reported as part of the
characterization of optically active compounds. To determine the Fischer assignment, the longest
carbon chain is drawn vertically with the highest-oxidized carbon atom on top (e.g., 5.9). The
standard is set by the asymmetric carbon (red) of the D- and L-glyceraldehyde pair (5.7 and 5.8).
With sugars (e.g., glucose 5.9) or amino acids (e.g., alanine 5.10), the carbon that is marked with
the arrow decides whether the molecule is D or L.
stereoisomers by a factor of 2. For n asymmetric centers, there are $2^n$ optical isomers.
They occur as $2^{n-1}$ racemic mixtures because each has two isomers that behave as
mirror images of each other. Diastereomers cannot be superimposed onto each
other by any translation and rotation in space or by generating a mirror image
because the chirality of the stereocenters differs relative to each other. As a result
they have different physicochemical and chemical properties. All pairwise race-
mates of a diastereomeric mixture are present as a 1:1 mixture of enantiomers, but
their relative portions in the total composition can vary greatly. Labetalol 5.11
(Fig. 5.5) is just such a diastereomer pair that consists of two racemates, that is, two
enantiomeric pairs. As a mixed antagonist, it affects the α-, β1-, and β2-adrenergic
receptors (cf. ▶ Sect. 29.3). Because of the asymmetric architecture of biological
macromolecules, the individual components of this mixture vary significantly in
their quantitative and qualitative biological properties (Sect. 5.5, 5.7).
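The counting argument can be checked by simple enumeration. The sketch below is a minimal illustration that assumes symmetry-independent stereocenters (no meso forms); the helper names `stereoisomers`, `mirror`, and `racemate_pairs` are hypothetical. For n = 2, as in labetalol, it lists four stereoisomers that group into two enantiomeric pairs.

```python
from itertools import product


def stereoisomers(n):
    """All R/S assignments for n symmetry-independent stereocenters."""
    return list(product("RS", repeat=n))


def mirror(config):
    """The mirror image inverts every stereocenter."""
    return tuple("S" if center == "R" else "R" for center in config)


def racemate_pairs(n):
    """Group the stereoisomers into image/mirror-image pairs."""
    seen, pairs = set(), []
    for isomer in stereoisomers(n):
        if isomer in seen:
            continue
        partner = mirror(isomer)
        seen.update({isomer, partner})
        pairs.append((isomer, partner))
    return pairs


for n in (1, 2, 3):
    print(n, len(stereoisomers(n)), len(racemate_pairs(n)))
print(racemate_pairs(2))
```

Members of the same pair are enantiomers; members of different pairs, for example (R,R) and (R,S), are diastereomers.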
Cahn–Ingold–Prelog Rules
• Large atomic numbers have priority over low ones (e.g., Br > Cl > F > O > N > C > H)
• Free electron pairs always have the lowest priority
• Larger atomic masses have priority (e.g., for isotopes D > H)
• In case the first sphere is identical (i.e., C), the next sphere is considered:
C[C+C+C] > C[C+C+H] > C[C+H+H] > C[H+H+H]
• Multiple bonds are considered as multiple single bonds, e.g., aldehyde
CHO = C(O+O+H) > CH2OH = C(O+H+H)
• If the substituents are chiral, then R > S, and R,R > R,S and S,S > S,R
• In the case of differently configured double bonds, Z > E
(Z = zusammen = together and E = entgegen = opposite for the configuration of double bonds)
(R)-Glyceraldehyde 5.7 and (S)-Glyceraldehyde 5.8
Fig. 5.4 The R/S nomenclature that was proposed by R. S. Cahn, C. K. Ingold, and V. Prelog is
unambiguous. Priority rules for each of the four different substituents on the tetrahedral
stereogenic center were established. The substituent with the lowest priority is placed in the
back, and the sense of rotation is determined by following the remaining substituents in order of
decreasing priority.
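The first priority rules can be sketched as a sort key. The Python fragment below is a deliberately reduced illustration, not a full implementation of the Cahn–Ingold–Prelog system: it compares only the attached atom and the next sphere, and it enters a doubly bonded atom twice. The names `ATOMIC_NUMBER`, `priority_key`, and `rank` are hypothetical. Applied to serine and cysteine, it reproduces the change in ranking that turns L-serine into (S)-serine and L-cysteine into (R)-cysteine.

```python
ATOMIC_NUMBER = {"H": 1, "C": 6, "N": 7, "O": 8, "S": 16}


def priority_key(substituent):
    """Sort key: atomic number of the attached atom, then the atomic
    numbers of its own neighbors (second sphere) in descending order."""
    first_atom, second_sphere = substituent
    neighbors = sorted((ATOMIC_NUMBER[s] for s in second_sphere), reverse=True)
    return (ATOMIC_NUMBER[first_atom], tuple(neighbors))


def rank(substituents):
    """Return substituent labels from highest to lowest priority."""
    return sorted(substituents,
                  key=lambda name: priority_key(substituents[name]),
                  reverse=True)


# Each entry: (attached atom, second-sphere atoms); a doubly bonded O counts twice.
serine = {"NH2": ("N", "HH"), "COOH": ("C", "OOO"),
          "CH2OH": ("C", "OHH"), "H": ("H", "")}
cysteine = {"NH2": ("N", "HH"), "COOH": ("C", "OOO"),
            "CH2SH": ("C", "SHH"), "H": ("H", "")}

print(rank(serine))    # ['NH2', 'COOH', 'CH2OH', 'H']
print(rank(cysteine))  # ['NH2', 'CH2SH', 'COOH', 'H']
```

Because sulfur outranks oxygen, the CH2SH group moves ahead of the carboxyl group, and the descriptor changes although the spatial arrangement is the same.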
5.3 The Isolation, Synthesis, and Biosynthesis of Enantiomers
Racemic acids and bases can often be separated by using other enantiomerically
pure, optically active bases and acids, because the diastereomeric salts that form
have different solubilities. The chemical reaction of racemic acids, amines, and
alcohols with optically active alcohols or acids results in diastereomeric reaction
products. Because of their different characteristics, it is possible to separate them
and finally isolate the desired optically active product by chemical cleavage.
Syntheses that do not start with optically active starting materials, and that use
no optically active auxiliaries, always lead to racemic mixtures, that is, an exact
50:50 mixture of both enantiomers. Access to optically active compounds can be
obtained when synthetic reaction components are taken from the “chiral pool”.
Here, all optically active natural products, their derivatives, and degradation prod-
ucts that are available in an optically pure form can be used as easily accessible
synthetic building blocks. Syntheses with chiral catalysts are particularly elegant. In
most cases the optimization of the yield and enantiomeric purity, which is
expressed as the ee value (ee = enantiomeric excess), requires considerable process
development. The chromatographic separation of racemates on optically active solid
supports is more appropriate for semipreparative or analytical purposes.
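As a short illustration of the ee value, the following Python sketch uses the usual definition, the excess of the major over the minor enantiomer as a percentage of the total. The function name `enantiomeric_excess` is hypothetical, and the amounts may be given in any common unit.

```python
def enantiomeric_excess(major, minor):
    """Enantiomeric excess in percent from the amounts of both enantiomers."""
    total = major + minor
    if total <= 0:
        raise ValueError("total amount must be positive")
    return 100.0 * abs(major - minor) / total


print(enantiomeric_excess(50, 50))  # racemate
print(enantiomeric_excess(95, 5))   # 95:5 mixture
print(enantiomeric_excess(1, 0))    # pure enantiomer
```

A racemate has an ee of 0%, and a 95:5 mixture has an ee of 90%, so the ee value is always lower than the percentage of the major enantiomer unless the sample is pure.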
Enzymatic and biotechnological techniques have increasingly gained favor in
recent years. Proteases, esterases, lipases, or hydantoinases react more or less
selectively: the two enantiomers of a racemic mixture react at distinctly different
rates, so that preferentially only one of them is transformed to the product. The selectivity and yield
of such a reaction can be optimized through the careful selection of the medium and
other reaction conditions.
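How a difference in reaction rate translates into a resolution can be illustrated with a simple kinetic model. The sketch below assumes that both enantiomers are consumed in independent first-order reactions; the rate constants and the function name `kinetic_resolution` are hypothetical and are not taken from a particular enzyme.

```python
import math


def kinetic_resolution(k_fast, k_slow, t, fast0=0.5, slow0=0.5):
    """Return (conversion, ee of the remaining substrate) at time t.

    Both enantiomers are assumed to react in independent first-order
    reactions with the rate constants k_fast and k_slow.
    """
    fast = fast0 * math.exp(-k_fast * t)
    slow = slow0 * math.exp(-k_slow * t)
    conversion = 1.0 - (fast + slow) / (fast0 + slow0)
    ee_substrate = (slow - fast) / (slow + fast)
    return conversion, ee_substrate


# Hypothetical rate constants: one enantiomer reacts 100 times faster.
for t in (0.0, 0.01, 0.02, 0.05):
    c, ee = kinetic_resolution(k_fast=100.0, k_slow=1.0, t=t)
    print(f"t = {t:4.2f}  conversion = {c:5.1%}  ee(substrate) = {ee:5.1%}")
```

In this model the unreacted substrate becomes nearly enantiomerically pure shortly after 50% conversion, which is why the yield of each enantiomer in a classical kinetic resolution cannot exceed one half of the racemate.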
The production of optically pure ephedrine is an example of an industrial
application of biotechnological synthesis that has been in use for decades. This
phytopharmacon is found in combination preparations for the adjuvant therapy of
Fig. 5.5 Because it has two different asymmetric centers, labetalol 5.11 is a diastereomeric
mixture of four different compounds with different activities on the same receptor. The
antagonistic potency of the (R,R)-, (R,S)-, (S,R)-, and (S,S)-isomers ranks in the order
S,R, S,S, R,R, R,S on the α1 receptor; R,R, R,S, S,S, S,R on the β1 receptor; and
R,R, R,S, S,S, S,R on the β2 receptor.
rhinitis, bronchitis, and asthma. The synthetic intermediate 5.12 (Fig. 5.6) is
obtained from a mixture of benzaldehyde, sugar, and yeast. It is then transformed
to (1R,2S)-(−)-ephedrine 5.13, which is identical to the natural product in both of
its optical centers. The C1 isomer (1S,2S)-(+)-pseudoephedrine 5.14 is
a diastereomer of ephedrine. Its optical rotation, melting point, and biological
characteristics differ from those of ephedrine.
Innumerable other microbial syntheses deliver optically pure products with or
without the use of achiral, racemic, or enantiomerically pure starting materials. The
biotechnological syntheses of a variety of antibiotics, above all the penicillins and
cephalosporins (▶ Sects. 2.4 and ▶ 23.7), are of particular economic importance.
Even the biotechnological preparation of synthetic intermediates for chiral drugs is
gaining increasing importance.
5.4 Lipases Separate Racemates
Because of their asymmetric architecture, lipases are well suited to separate race-
mates. This can either happen if one of the two enantiomers binds as a substrate
better and reacts faster, or if a chemical reaction takes place in the binding pocket of
the protein with disparate efficiency. Lipases are often used for kinetic resolution
because their architecture and their lipophilic surface allow them to sustain their
reactivity in organic solvents. They belong to a larger family of hydrolyzing
enzymes (▶ Chap. 23, “Inhibitors of Hydrolases with an Acyl–Enzyme
Scheme: benzaldehyde + pyruvic acid (from sugar and yeast) → (R)-(−)-1-hydroxy-1-phenylacetone 5.12;
5.12 + CH3NH2/H2/Pt → (1R,2S)-(−)-ephedrine 5.13; diastereomer: (1S,2S)-(+)-pseudoephedrine 5.14
Fig. 5.6 The biotechnological production of ephedrine is accomplished by the fermentation of
sugar with baker’s yeast Saccharomyces cerevisiae to pyruvic acid. Pyruvic acid is coupled to
benzaldehyde with decarboxylation to form (R)-(−)-1-hydroxy-1-phenylacetone 5.12. Upon further
chemical transformation (1R,2S)-(−)-ephedrine 5.13 is obtained in optically pure form.
(1S,2S)-(+)-pseudoephedrine 5.14 is a diastereomer of ephedrine. The configuration of one of the
two chiral centers is different.
Intermediate”). A nucleophilic serine is present in the catalytic center that forms an
acyl–enzyme complex upon hydrolysis of an amide or ester substrate. The protein is
then itself converted to an ester through the OH group of the serine, the so-called
acyl form (▶ Sect. 23.2). Such a complex can then react with another nucleophile,
for instance an amine. The amine attacks the internal enzyme ester, the bond to the
serine oxygen atom is broken, and a new amide bond is formed. If a racemic amine
is offered, one of its two mirror-image forms reacts preferentially. In this
way, the racemate is resolved.
How does the enzyme manage to distinguish between both enantiomers of an
amine? The reaction of (R)- and (S)-phenylethylamine 5.15 and 5.16 with the lipase
from Candida antarctica was carefully investigated (Fig. 5.7). The energy barrier for the
faster-reacting R form is lower than for the slower S form. A more exact evaluation of
the kinetic parameters showed that this is above all due to an enthalpic advantage of
the (R)-amine. The S form has an entropic advantage. Altogether the enthalpic
component predominates, so that the free energy (ΔG) favors the R form (Fig. 5.7).
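The size of this preference can be estimated from the difference in activation free energy reported in Fig. 5.7. The following Python sketch assumes that both enantiomers follow transition-state theory with the same prefactor, so that the ratio of the rate constants is exp(−ΔΔG‡/RT). The temperature of 298.15 K is an assumption, because the text does not state it, and the function name `rate_ratio` is hypothetical.

```python
import math

R_GAS = 8.314462618e-3  # gas constant in kJ/(mol K)


def rate_ratio(ddg_kj_per_mol, temperature_k=298.15):
    """k_R / k_S for a difference in activation free energy (R minus S)."""
    return math.exp(-ddg_kj_per_mol / (R_GAS * temperature_k))


# Difference from Fig. 5.7: the R form lies 19.4 kJ/mol lower.
print(round(rate_ratio(-19.4)))
```

Under these assumptions the (R)-amine reacts faster by a factor of a few thousand, which is consistent with the statement that in practice only the (R)-amide is formed.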
How is this discrimination to be understood? Structural transition-state analogues
were synthesized. In the place of the unstable tetrahedral carbon atom intermediate,
a phosphorus atom was introduced (5.17 and 5.18, Fig. 5.8). This trick gives a stable
compound that is very similar to the transition state form at the carbon atom. These
analogues were synthesized with both enantiomeric amines, and complexes with the
lipase were prepared. Marco Bocola managed to get a crystal structure of both.
Fig. 5.7 The reaction of (R)- and (S)-phenylethylamine, 5.15 and 5.16, with Candida antarctica
lipase begins with the formation of an acyl–enzyme complex, E–A. The faster-reacting R-amine
5.15 (red) forms a lower-energy transition state that leads to the free enzyme and the R-amide
(E+R). Analogously, the S-amide (E+S) forms from the S-amine 5.16 via the higher-energy E–S
transition state (blue). The difference in ΔG‡ is 19.4 kJ/mol (diagram label: ΔR-SΔG = −19.4 ± 6 kJ/mol)
and favors the R form. The ΔG‡ difference is based on a combined enthalpic and entropic
contribution in which the R form is enthalpically favored and entropically disfavored. The S form
is enthalpically disfavored but has an entropic advantage.
Fig. 5.8 Shown above is a phosphorus-containing transition-state analogue 5.18 for the lipase with
the (S)-amine (a). The crystal structure and simulations indicate that it is less rigidly fixed in
the transition state and rarely adopts the geometry with an H-bond (purple) to histidine His224 (on
the lower edge of the binding pocket) that is necessary for the reaction to occur. The relevant
complex with the transition-state analogue 5.17 of the faster-reacting (R)-amine is shown in (b).
This substrate is highly restricted in the binding pocket. Its methyl group (above right) is
embedded in a small niche in the binding pocket. This substrate exclusively adopts the geometry
with the H-bond to histidine. This orientation is required for a successful substrate reaction.
Therefore, the (R)-amine 5.15 reacts with the enzyme faster.
Interestingly the transition-state analogue of the faster-reacting R form fits into the
binding pocket well (Fig. 5.8). On the other hand, the S form demonstrated great
residual mobility in the catalytic center. Computer simulations and molecular dynam-
ics with both forms confirmed the picture: whereas the R analogue had a well-defined
and temporally stable geometry, which is ideal for the reaction, the S analogue is very
mobile and rarely adopts an orientation that is productive for the catalytic reaction in
the lipase. Therefore a successful reaction of this substrate occurs much less often. On
the other hand, the R analogue, fixed in a vice-like clamp and waiting for its reaction,
forms good enthalpic contacts with the enzyme. It takes on a form that is practically
complementary to the enzyme pocket. This results in a large enthalpic advantage. The
fixation has its entropic price though. The methyl group on the stereogenic center
embeds itself in a small niche in the binding pocket. The S analogue does not have
this possibility because its methyl group is oriented in the mirrored direction. In this
case, the anchor that can be embedded in the binding pocket is missing. It has a high
mobility in the catalytic center and does not lose as many degrees of freedom
compared to the situation before enzyme binding. Entropically this is advantageous.
Enthalpically, however, the substrate loses a good interaction and the complementary
fit is rarely achieved. In the end, the enthalpic component prevails so that the
(R)-amine is transformed significantly faster. This is more than enough to ensure
that, in practice, only the (R)-amide is formed in high yield. This lipase can also be
immobilized onto a solid support and loaded into a glass column. After the acyl form
is prepared on the column, a racemic mixture of the amine needs only to be poured
onto the column. The (S)-amine and (R)-amide are then simply collected in
a flask. If the solvent is well chosen, the amide crystallizes directly from the solution,
and can be mechanically separated.
Interestingly, the enantiopreference of the kinetic resolution is lost with
increasing temperature or enlargement of the enzyme pocket. An enlargement can
be achieved by exchanging a tryptophan along the rim of the catalytic pocket for
a histidine. The higher temperature or increased space in the binding pocket
increases the mobility of both substrates in the lipase. The enthalpic advantage of
the faster-reacting R-amine is lost. The entropic difference of both substrates levels
out under these conditions.
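The temperature effect can be made plausible with the relation ΔΔG‡ = ΔΔH‡ − TΔΔS‡. The sketch below is a strongly simplified illustration: it treats the enthalpic and entropic differences as constants, whereas the text describes that both change with temperature. The numerical values of `DDH` and `DDS` are hypothetical and were chosen only so that the result near room temperature is of the same order as the value in Fig. 5.7.

```python
import math

R_GAS = 8.314462618e-3  # gas constant in kJ/(mol K)

# Hypothetical activation parameters (R minus S); not taken from the text.
DDH = -40.0   # kJ/mol, enthalpic advantage of the R form
DDS = -0.070  # kJ/(mol K), entropic disadvantage of the R form


def ddg(temperature_k, ddh=DDH, dds=DDS):
    """Difference in activation free energy (R minus S) in kJ/mol."""
    return ddh - temperature_k * dds


def selectivity(temperature_k, ddh=DDH, dds=DDS):
    """k_R / k_S at the given temperature."""
    return math.exp(-ddg(temperature_k, ddh, dds) / (R_GAS * temperature_k))


for t in (280.0, 298.15, 320.0, 350.0):
    print(f"{t:6.1f} K  ddG = {ddg(t):6.1f} kJ/mol  k_R/k_S = {selectivity(t):8.0f}")
```

Even in this crude model, an enthalpically favored but entropically disfavored reaction loses selectivity as the temperature rises, because the entropic term grows with T.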
This example shows on a molecular level how a lipase achieves kinetic resolution.
With knowledge of the energetic parameters and structural information, an attempt
can be made to tailor lipases for other transformations. Because of the importance of
such reactions, the targeted design of enzyme catalysts has developed into an ever
more important theme for the synthesis of chiral building blocks in new drugs.
5.5 Differences in the Activity of Enantiomers
Flora and fauna stand out because of their symmetry. Consider the face, the arms and
legs, the ribs, or an orchid flower. The exceptions, for instance a snail shell, are rare or
occur, as in the case of the flounder, only under special evolutionary conditions. The
inner organs of vertebrates are oriented partially paired and partially asymmetrically.
On the molecular level, there is no correlating symmetry: optically active
building blocks prevail. All specific interaction partners of biologically active
molecules are chiral. Enzymes and receptors are built of L-amino acids. Nucleic
acids are built on a scaffold of D-ribose or D-deoxyribose building blocks. Most
naturally occurring sugars have a D configuration. Important vitamins, hormones,
and messengers exist in an optically homogeneous form. Accordingly, it is to be
anticipated that enantiomers of an optically active ligand have different effects.
This has been proven with many thousands of examples. Enantiomers most often show
significant differences in their efficacy and the quality of their effect.
According to the suggestion of Everhardus J. Ariëns, biologically active enan-
tiomers are referred to as eutomers, and inactive enantiomers as distomers. The
quotient of both affinities or effects is defined as the eudismic ratio, and the
logarithm of this value is called the eudismic index. It should be considered that
this value must be determined on extremely pure compounds. As little as 1% of the
eutomer as an impurity in an entirely inactive distomer can simulate 1% relative
activity in the distomer!
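The influence of an impurity on the measured eudismic ratio can be sketched in a few lines of Python. The example assumes that the activities of both enantiomers in a sample simply add up in proportion to their fractions; the function names are hypothetical, and activities are given relative to the pure eutomer.

```python
import math


def eudismic_ratio(eutomer_activity, distomer_activity):
    """Quotient of the activities of eutomer and distomer."""
    return eutomer_activity / distomer_activity


def eudismic_index(eutomer_activity, distomer_activity):
    """Decadic logarithm of the eudismic ratio."""
    return math.log10(eudismic_ratio(eutomer_activity, distomer_activity))


def apparent_activity(true_activity, contaminant_activity, contaminant_fraction):
    """Activity measured for a sample containing some of the other enantiomer."""
    return ((1.0 - contaminant_fraction) * true_activity
            + contaminant_fraction * contaminant_activity)


# Entirely inactive distomer contaminated with 1% eutomer (relative activity 1.0).
measured = apparent_activity(true_activity=0.0, contaminant_activity=1.0,
                             contaminant_fraction=0.01)
print(measured)
print(eudismic_ratio(1.0, measured), eudismic_index(1.0, measured))
```

With 1% contamination the measurable eudismic ratio is capped at about 100 (eudismic index 2), no matter how inactive the distomer really is.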
The more the activities of enantiomers in a racemic pair differ, the stronger the
eudismic ratio drifts away from 1. Examples of this are given in the compounds
5.20–5.22 (Fig. 5.9). A eudismic ratio of 500,000 was measured for a chloride ion
transporter inhibitor. In this case the chemists pulled out all the stops for the
purification of the less-effective enantiomer. Theoretically, a nanomolar-effective
compound should give even higher values. A few naturally occurring peptide
antibiotics contain D-amino acids. This affords them better metabolic stability.
For the same reason, D-amino acids are incorporated into many synthetic peptide
molecules. In the best cases, a stronger and longer-acting analogue is obtained.
Synthetic analogues of peptides with a retro–inverso configuration represent
a special case. The direction of the peptide chain, or a part of the peptide chain is
reversed in these cases, that is, compared to the original peptide, the amino and
carboxyl groups of single amino acids are reversed. In order to maintain the relative
configuration, D-amino acids or their analogues are used instead of L-amino acids.
In this way it is possible to deceive some enzymes or receptors; they bind the
natural peptide and the retro–inverso peptide in the same way. This is true for
thiorphan 5.23 and its retro–inverso analogue 5.24, for two enzymes, but not for
a third one (Fig. 5.10). As a general rule retro–inverso peptides are metabolically
more stable than their original peptide analogues.
Enantiomers differ not only in the strength of their effects, but also in their quality.
These differences can manifest as undesirable side effects of the antipode, as in
the chiral barbiturate 5.25 (Fig. 5.11). The most severe drug side effect
of the last 50 years was the embryonic malformations that were caused by the
sleeping pill thalidomide 5.26 (Contergan®); these were caused by one of the two
enantiomers (Fig. 5.11). In the 1950s, thalidomide was claimed to be the best-
tolerated sleeping pill, with the fewest side effects. In 1957 it was introduced to the
market and was available in pharmacies as an over-the-counter drug. There were no
concerns that even women in the first months of their pregnancies were taking these
sleeping pills. In 1961 it was withdrawn from the market because of its teratogenic
effects. If drug testing had been then what it is today, this catastrophe would certainly
have been recognized earlier and probably largely avoided. It would not, however, have
been prevented by the administration of only one enantiomer. Both enantiomers
racemize in vitro, that is, one converts into the other even in a test tube.
Compound: effect, receptor, or center, with eudismic ratio
5.19 Propranolol: β-blockade, 100; membrane effect, 1
5.20 Methacholine: cholinergic effect, 320
5.21 Anticholinergic agent: ester group center, 50–100; amino alcohol center, 2–4
5.22 Butaclamol, (+)-enantiomer: α1 receptor, 73; D2 receptor, 1250; 5-HT1 receptor, 8;
5-HT2 receptor, 73; muscarinic receptor, 0.5
Fig. 5.9 Enantiomers have different biological effects. The eudismic ratio of propranolol 5.19 is
100 for β-antagonism, and for unspecific membrane interaction, it is, as expected, 1. Identical
partial structures can have entirely different eudismic ratios; for instance, compare the optical
center of the alcohol moiety of the cholinergic compound methacholine 5.20 with the identical
center on the anticholinergic compound 5.21. Compound 5.21 also proves that the eudismic ratios
of different centers in a compound are independent from each other. The example butaclamol 5.22
also shows that the same substance can have different eudismic ratios on different receptors.
Accordingly, the effect was confirmed in vivo: administration of the supposedly
safe enantiomer led to teratogenic effects in an animal model.
The “other” enantiomer can also open new therapeutic opportunities. The
enantiomer of a synthetic opiate, for instance propoxyphene 5.27 (Fig. 5.11) has
weak analgesic and narcotic effects, but good cough-suppressing effects. Enantio-
mers can also influence each other in their effects, and even cancel one another out.
In the case of the calcium channel ligand 5.28, one enantiomer is an agonist and the
other is an antagonist.
In the time period between 1983 and 2002, 38% of all approved drugs were
achiral, 39% were enantiomerically pure, and 23% were racemic or diastereo-
meric mixtures. The fact is that racemic mixtures of chiral drugs were much more
easily accepted in earlier decades than they are today. This was certainly not
caused by a stereophobia on the part of the chemical industry. It was more an
expression of inadequate understanding of the stereospecificity and side effects,
and perhaps also because economic considerations were in the foreground; kinetic
resolution and/or enantiomerically pure syntheses are very expensive. You can
certainly see that the proportion of enantiomerically pure drugs is gaining in the
marketplace (Fig. 5.12).
In the 1970s, Ariëns was the first to decisively come out against the use of
racemic mixtures in therapy. Racemates are, in his view, compounds with 50%
impurity. The non-active or less-active enantiomer is seen as enantiomeric ballast.
He used the diastereomeric mixture labetalol 5.11 (Fig. 5.5, Sect. 5.2) as a showcase
Compound              Enzyme        Ki value in µmol
5.23 Thiorphan        NEP 24.11     0.0019
                      Thermolysin   1.8
                      ACE           0.14
5.24 retro-Thiorphan  NEP 24.11     0.0023
                      Thermolysin   2.3
                      ACE           10
Fig. 5.10 Thiorphan 5.23 inhibits the metabolism of enkephalins and contains
a β-mercaptopropionic acid, the absolute configuration of which is analogous to L-phenylalanine.
Application of the retro–inverso concept gives aminothiol 5.24, the absolute configuration of
which corresponds to D-phenylalanine. The identical binding mode to the zinc protease was
determined for both thiorphan 5.23 and retro-thiorphan 5.24. Thermolysin and neutral endopeptidase
24.11 (NEP 24.11, previously referred to as enkephalinase) are inhibited by both compounds to the
same extent. On the other hand, angiotensin-converting enzyme (ACE), another zinc protease,
discriminates decidedly between these substances.
example, which is not a “mixed a,b-antagonist” but rather a mixture of four
different drugs. The effect of this “combination” is a result of the effects of each
enantiomer. In most cases Ariëns' criticism is fully justified.
It must be ensured that the biological activity is as specific as possible, and the
side effects are minimal in the design and development of new drugs. Compound
uniformity is usually easier to achieve for an enantiomer than for a racemate,
which is a mixture of two substances, or even for a diastereomeric mixture.
The choice of the correct enantiomer can even reduce or prevent undesirable side
effects of metabolites. Selegiline 5.29, a monoamine oxidase inhibitor, is metabolized
to the CNS-active compounds methamphetamine 5.30 and amphetamine
5.31 (Fig. 5.13). The more-active enantiomer of 5.29 luckily forms the less-active
enantiomers of these two metabolites! If the correct enantiomer of the racemate is used, the desired
effect is increased and the undesired CNS side effects are reduced.
There are also a few counterexamples. The (−)-enantiomer of the calcium
channel blocker verapamil (▶ Sect. 2.6) is more effective than the (+)-enantiomer.
The therapeutic spectrum of both enantiomers is practically identical. After
oral application, the (−)-enantiomer is quickly metabolized. Therefore the
(+)-enantiomer contributes substantially to the desired effect. In this case it would
not be economical to try to separate the racemic mixture.
Fig. 5.11 Enantiomers also differ in their mode of action. The (R)-(−)-enantiomer of barbiturate
5.25 (N-methyl-5-phenyl-5-propylbarbiturate) is a hypnotic agent, whereas the (S)-(+)-enantiomer
causes seizures. In rats and mice only the (S)-(−)-enantiomer of thalidomide 5.26 (Contergan®) is
teratogenic, that is, it causes embryopathies. Thalidomide 5.26 racemizes in vitro as well as in
rabbits. Therefore even the (R)-(+)-enantiomer is teratogenic in rabbits. Propoxyphene 5.27 is
a potent analgesic, the effect of which depends on the (2S,3R)-(+) enantiomer, dextropropoxyphene.
The (2R,3S)-(−)-enantiomer is a cough suppressant. The (R)-(+)-enantiomer of Bay K 8644 5.28 is
a weak calcium channel blocker. The (S)-(−)-enantiomer stabilizes calcium channels in the open
form and is therefore an agonist, that is, a calcium channel opener.
Ibuprofen 5.32, an anti-inflammatory drug of the arylpropionic acid class
(Fig. 5.14 and ▶ Sect. 27.9), is a special case. The potencies of the enantiomers are
very different in vitro. In vivo, however, the inactive (R)-(−)-enantiomer is
converted to a large extent to the (S)-(+)-enantiomer. The reverse reaction does
not take place. Therefore the racemate and each enantiomer are therapeutically
identical, even at the same dose. Only the side-effect spectrum is different because
the inversion of the (R)-(−)-enantiomer is not 100% complete.
Sometimes the effort to produce a pure enantiomer is hardly justifiable. In
such cases the effects and the side effects of both forms must be compared.
Fig. 5.12 The proportion of achiral, enantiomerically pure, and racemic or diastereomeric drugs
approved in the period from 1983 to 2003. In the meantime, the proportion of newly approved
drugs has shifted decidedly in the direction of enantiomerically pure compounds.
Fig. 5.13 Upon metabolism of the monoamine oxidase inhibitor selegiline 5.29, which is used to
treat Parkinson’s disease, the more potent (R)-(−)-enantiomer is converted to methamphetamine
5.30 (R = CH3) and amphetamine 5.31 (R = H). Because the (R)-(−)-enantiomer forms the less
CNS-active enantiomers of these stimulants, its side effects are less severe than those of the
less-active (S)-(+)-selegiline.
Depending on the result, in special cases the continued use of the racemate or the
development of an achiral analogue can be considered. At any rate, today these data
must be complete before the drug can receive approval.
5.6 Image and Mirror Image: Why Is It Different for the
Receptor?
Enantiomers and diastereomers have different biological characteristics because the
proteins to which they bind have a handedness. They occur naturally in only one
form. The amino acids with their chiral centers and the secondary structural
elements (▶ Sect. 14.2) with their helical rotational direction are responsible for
these properties. If a protein is offered a left or right-handed ligand, different
binding modes are to be expected, just as two right hands come together to shake
hands more easily than a right and a left hand can.
Up to now only a few successful examples of the structure determination of
protein–ligand complexes have been reported with the ligand bound in the left as
well as right-handed form. This is only possible when both enantiomers have
enough affinity for the target protein, that is, they both bind so strongly to the
protein that an X-ray crystal structure could be determined.
The R- and S-enantiomers of the compound BX5633 (5.33) inhibit the serine
protease trypsin (▶ Sect. 23.3) equally well. They have a stereogenic center next to
an acid group. The crystal structure determination explains this lack of discrimina-
tion. The inhibitor’s acid group is oriented outside of the binding pocket so that no
specific interaction is to be expected (Fig. 5.15). A stereopreference cannot exist.
Both enantiomers 5.34 and 5.35 bind to carbonic anhydrase II, a zinc hydrolase
(▶ Sect. 25.7). There is a difference of a factor of 100 in their affinities. As the
X-ray structure with both enantiomers shows, they have similar binding modes
(Fig. 5.16). All properties that relate to the solvation of the ligands must be the same
for both enantiomers. The difference in affinity is therefore only caused by differ-
ences in the binding mode. The sulfonamide groups of both enantiomeric ligands
Fig. 5.14 The (R)-(−)-enantiomer of ibuprofen 5.32 undergoes a metabolic inversion of its
stereocenter, and the (S)-(+)-enantiomer is formed; the (S)-(+)-form is not inverted. As
a cyclooxygenase inhibitor in vitro, the (S)-(+)-form is more potent than the (R)-(−)-form. The
less-active form is converted to the more-active enantiomer in vivo. Therefore both compounds
exhibit equal anti-inflammatory properties in animal models.
bind almost identically to the catalytic zinc. Further, the endocyclic SO2 group
forms very similar hydrogen bonds to Gln92. The hydrophobic isobutyl side chains
are in similar parts of the binding pockets. The six-membered ring, however, must
adopt a highly strained conformation in the case of the more weakly binding
enantiomer. The price for taking on this strained conformation is paid in the
reduced binding affinity to the enzyme.
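The factor of 100 in affinity can be converted into a free-energy difference. The worked equation below is a rough estimate that assumes the affinities are expressed as inhibition or dissociation constants and that the temperature is 298 K; neither is specified in the text.

```latex
\Delta\Delta G^{\circ}
  = RT \,\ln\frac{K_i(\text{weaker enantiomer})}{K_i(\text{stronger enantiomer})}
  = \left(8.314\ \mathrm{J\,mol^{-1}\,K^{-1}}\right)\left(298\ \mathrm{K}\right)\ln 100
  \approx 11.4\ \mathrm{kJ\,mol^{-1}}
```

Under these assumptions each factor of 10 in affinity corresponds to about 5.7 kJ/mol, so the strain imposed on the six-membered ring of the weaker enantiomer would cost on the order of 11 kJ/mol.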
The enantiomeric agonists 5.36 and 5.37 bind in the ligand-binding domain of
the retinoic acid receptor with a difference of a factor of 1,000 (▶ Sect. 28.2). The
receptor itself adopts the same geometry (Fig. 5.17). The alcohol function in the
middle of the molecule is at the stereogenic center. In both cases, the hydrogen
bond to Met272 is formed. As a result, the neighboring amide must take on
a deviating orientation in the binding pocket. On the “right” side, the tetralin
moiety for both stereoisomers is in a similar place. On the “left” side, the benzoic
acid moiety of both enantiomers forms a hydrogen-bond network with Arg278,
Ser289, and Leu233. In the two complexes, the fluorine-substituted benzene ring adopts
orientations that are flipped by 180° relative to each other. These different
orientations, together with the divergently oriented amide bond, are responsible for
the severe difference in the binding affinity of the mirror-image agonists.
5.7 An Excursion in the World of Antipodes
Experience has taught us that if an enantiomer crystallizes with a particular auxil-
iary base or acid, the other enantiomer will crystallize with the antipode of the
auxiliary in the same way if the identical reaction conditions are applied. Poly-
peptides composed of L-amino acids form right-handed helices, and polypeptides
made of D-amino acids form left-handed helices.
Fig. 5.15 The (R)- (gray) and (S)-enantiomers (beige) of the inhibitor BX5633 5.33 bind with the
same affinity to trypsin. Because the protein adopts practically the same geometry with both
inhibitors, only one structure is shown. The crystal structure shows that both have almost
identical binding modes. The acid function on the stereogenic center points out of the binding
pocket and into the surrounding aqueous medium. Therefore no stereochemical discrimination can
take place.
5.7 An Excursion in the World of Antipodes 105
Some naturally occurring peptides form ion channels in lipid layers. Their
synthetic antipodes are also able to do this. The more interesting question is: how
does the mirror image of an enzyme behave? In 1992 Stephen Kent and co-workers
prepared HIV protease, a homodimer made up of 2 × 99 amino acids, entirely from
D-amino acids. The naturally occurring protein was also prepared in parallel. The
L-enzyme reacts only with L-peptide substrates and the D-enzyme reacts only with
the all-D enantiomer. The same is true for chiral inhibitors of the HIV-1 protease.
An achiral inhibitor, on the other hand, inhibits both enzymes in the same way.
Rubredoxin, an electron-transport protein, was prepared as the D-protein for the sole
purpose of mixing it with the naturally occurring L-protein to obtain the racemate!
If the effort involved is considered, this is certainly an approach that takes some getting
used to. The reward for the work was very high-quality crystals. The racemate
crystallized in a centrosymmetric space group (▶ Sect. 13.2), which allowed a better
resolution of the 3D structure than was possible with the natural, all-L enantiomer.
Fig. 5.16 The enantiomeric sulfonamides 5.34 (gray) and 5.35 (beige) bind in a similar way to
the enzyme carbonic anhydrase. Because the protein adopts practically the same geometry with
both inhibitors, only one structure is shown. The zinc ion in the catalytic center (purple sphere) is
coordinated to the sulfonamide groups. The SO2 groups in the six-membered ring form a hydrogen
bond to Gln92 (green). The hydrophobic isobutylamino moieties on the chiral centers project into
a hydrophobic pocket and fill this out to the same extent. In doing this, the six-membered ring must
adopt a deviating conformation in both enantiomers. In one stereoisomer this conformation is
much more strained than in the other, and causes a loss in binding affinity.
What does a visit to the mirror-image world look like? Achiral drugs would have
an identical potency and mode of action. On the other hand, many enantiomerically
pure drugs would be useless. We would have to watch out for chiral barbiturates
such as 5.25. They would sooner cause a seizure than act as a sedative. In cases in
which chiral antibiotics were used to treat bacterial infections, it would first have to
be established whether the infecting bacteria came from the mirror-image world or
the normal world. The administration of trimethoprim (▶ Sect. 37.2) and
a sulfonamide (both achiral) would help at any rate.
There would be tremendous problems with nutrition. The carbohydrate and
protein metabolism would not work anymore, nor would the resorption of mono-
mers from the gastrointestinal tract. We would not be able to recognize some plants
by their smell. (S)-Carvone smells of caraway seeds, (R)-carvone smells of spearmint.
Our beloved sugar would have lost its sweet taste, and fruit juices and
lemonade would taste sour. Coffee, tea, and cola would retain their stimulatory
effects because caffeine is achiral. Diet drinks would have to be sweetened with
saccharin or cyclamate (both achiral) because aspartame is chiral.
Fig. 5.17 Both enantiomers of the agonists 5.36 (beige) and 5.37 (gray) bind the retinoic acid
receptor with 1,000-fold difference in affinity. Because the protein adopts practically the same
geometry with both ligands, only one structure is shown. Both ligands form H-bonds with their OH
groups to the sulfur in Met272. In doing so, the fluorine-substituted aromatic ring of the benzoic acid
moiety on the left with its central amide bond has to adopt a deviating orientation. The
tetrahydronaphthalene (tetraline) moiety, on the other hand, is positioned in the same way in both enantiomers.
Let us return to the normal world! But first, let us have a quick glass of vodka. It
could also be cognac, whisky, or a dry red wine. The taste would be the same as in
the normal world, or would it not? Despite the many hundred flavor components of
wine, the exchange of a single chiral center could have the consequence that
a connoisseur might no longer recognize the chateau. The euphoric effects would
be the same, though this would not be the case for the hard, optically active drugs
such as heroin, cocaine, or LSD.
5.8 Synopsis
• Compounds with an asymmetric or chiral center give rise to enantiomers, two
isomeric forms that relate to each other like an image and mirror image and
cannot be interconverted without breaking and reforming bonds.
• Enantiomers exhibit the same properties as long as they are found in a non-chiral
environment. If exposed to the asymmetric environment of a protein-binding
site, they experience different interactions and thus produce distinct biological
properties.
• Chiral centers are mostly found at atoms carrying four different substituents, but
also an overall handed scaffold can give rise to chirality. If n independent
stereocenters are present, $2^{n}$ isomers (diastereomers) are produced, occurring as
$2^{n-1}$ racemic mixtures (pairs of equally present enantiomers), as long as there is
no internal inversion, mirror, or improper rotation symmetry present.
• Chiral centers are named according to the Cahn–Ingold–Prelog priority rules that
bring the substituents in a unique sequence according to their atomic numbers.
The substituent with lowest priority has to be oriented to the back, and the
sense of rotation of the remaining substituents, followed in order of decreasing
priority, determines R/S.
• Enantiomers can be separated by fractional crystallization after being converted
into diastereomeric salts with appropriate chiral auxiliaries. Also enzymes such
as lipases, esterases, or proteases can be used for resolution because they
transform one enantiomer faster than the other for steric and kinetic reasons.
• Most natural products are optically active and occur in just one form. Biologi-
cally active enantiomers are called eutomers, inactive ones distomers.
• Biological activities of enantiomers and diastereomers can vary greatly in both
strength and quality. Application of racemates has to be examined carefully for
each individual case. Side effects, chemical stability, and deviating metabolism
can have decisive influence on the activity profile.
• On the molecular level the affinity discrimination of enantiomers is explained by
deviating binding modes in the binding pocket of the target protein resulting in
differences of the observed interaction pattern or strain of the adopted bound
conformation.
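The counting rule in the synopsis (2^n stereoisomers forming 2^(n-1) enantiomeric pairs) can be illustrated by enumerating R/S descriptors. The sketch below is a simplification under stated assumptions: it considers n independent stereocenters, ignores internal symmetry (so no meso forms arise), and treats the mirror image of a stereoisomer as the one in which every descriptor is inverted. All function names are illustrative.

```python
from itertools import product


def stereoisomers(n):
    """Return all R/S descriptor combinations for n independent stereocenters."""
    return ["".join(combo) for combo in product("RS", repeat=n)]


def mirror(descriptor):
    """Invert every center: the enantiomer of 'RS' is 'SR'."""
    return descriptor.translate(str.maketrans("RS", "SR"))


def enantiomer_pairs(n):
    """Group the 2**n stereoisomers into 2**(n-1) enantiomeric pairs."""
    pairs = set()
    for iso in stereoisomers(n):
        # Sorting makes (iso, mirror) and (mirror, iso) the same set element
        pairs.add(tuple(sorted((iso, mirror(iso)))))
    return sorted(pairs)


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, len(stereoisomers(n)), enantiomer_pairs(n))
```

Members of different pairs, for example RR and RS, are diastereomers of each other; each pair on its own would form one racemate.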
108 5 Optical Activity and Biological Effect
Bibliography
General Literature
Ariëns EJ, Soudijn W, Timmermans PBMWM (1983) Stereochemistry and biological activity of
drugs. Blackwell Scientific, Oxford
Brown C (ed) (1990) Chirality in drug design and synthesis. Academic, London
Caner H, Groner E, Levy L (2004) Trends in the development of chiral drugs. Drug Discov Today
9:105–110
Eichelbaum M, Testa B, Somogyi A (2002) Handbook of experimental pharmacology, stereo-
chemical aspects of drug action and disposition. Springer, Heidelberg
Holmstedt B, Frank H, Testa B (1990) Chirality and biological activity. Alan R. Liss, New York
Klebe G (2004) Differences in binding of stereoisomers to protein active sites. In: Pifat-Mrzljak
G (ed) Supramolecular structure and function 8. Kluwer Academic/Plenum, New York,
pp 31–53
Smith DF (ed) (1989) CRC handbook of stereoisomers: therapeutic drugs. CRC Press, Boca Raton
Special Literature
Ariëns EJ (1984) Stereochemistry, a basis for sophisticated nonsense in pharmacokinetics and
clinical pharmacology. Eur J Clin Pharmacol 26:663–668
Ariëns EJ (1993) Nonchiral, homochiral and composite chiral drugs. Trends Pharmacol Sci
14:68–75
Ariëns EJ et al (1976) Stereoselectivity and affinity in molecular pharmacology. Fortschr
Arzneimittelforsch 20:101–142
Bocola M, Stubbs MT, Sotriffer C, Hauer B, Friedrich T, Dittrich K, Klebe G (2003) Structural and
energetic determinants for enantiopreferences in kinetic resolution of lipases. Protein Eng
16:319–322
Greer J, Erickson JW, Baldwin JJ, Varney MD (1994) Application of the three-dimensional
structures of protein target molecules in structure-based drug design. J Med Chem
37:1035–1054
Jung G (1992) Proteins from the D-chiral world. Angew Chem Int Ed Engl 31:1457–1459
Klaholz BP, Mitschler A, Belema M, Zusi C, Moras D (2002) Enantiomer discrimination illus-
trated by high-resolution crystal structures of the human nuclear receptor hRARg. Proc Natl
Acad Sci USA 97:6322–6327
Mason S (1986) The origin of chirality in nature. Trends Pharmacol Sci 7: 20–23, and other articles
from the same author on pp. 60–64, 112–116, 155–158, 200–205, 227–230 and 281–285
Stinson SC (1994) Chiral drugs. Chem Eng News S. 38–72, and 9 Oct 1995, S. 44–74
Stubbs MT, Huber R, Bode W (1995) Crystal structures of factor Xa-specific Inhibitors in complex
with Trypsin: structural grounds for inhibition of factor Xa and selectivity against thrombin.
FEBS Lett 375:103–107
Bibliography 109
Part II
The Search for the Lead Structure
The starting point in the development of a new drug is the search for an appropriate
lead structure for a target protein. Next such a target structure within the genome or
proteome must be validated as being relevant as a therapeutic principle. The
production of the pure target structure is possible by using gene technology
methods. After a high-throughput screening assay is established, thousands of test
molecules can be evaluated for binding to the target protein. The X-ray crystal
structure is solved and serves both the search for, and optimization of lead struc-
tures. Without techniques such as bio- and chemoinformatics, molecular modeling,
and computational chemistry this type of search and optimization is unthinkable
(announcement poster from the research group of the author on the occasion of
a conference in 2003 in Rauischholzhausen, Marburg).
6 The Classical Search for Lead Structures
The starting point in the search for a new drug is the lead structure. Such
a substance already has a desirable biological effect, but some specific characteristics
are still inadequate for its therapeutic use. The definition of the term “lead
structure” also means that analogues can be prepared by targeted chemical varia-
tions which produce compounds better than the lead structure in, for instance, their
potency or selectivity. The goal is the optimization of all characteristics until a final
substance is ready for therapeutic use.
Most of our medicines originate directly or indirectly from natural
products, that is, from plants, animals, or microbial sources, or from endogenous
substances such as hormones and neurotransmitters. Only a few natural products
have become drugs themselves. Examples include morphine, codeine, papaverine,
digoxin, ephedrine, ciclosporin, and hirudin, the last of which was isolated from
leeches. Examples of endogenous drugs are the thyroid hormone T3, insulin,
coagulation factor VIII, erythropoietin, and further proteins for substitution therapy.
Most naturally occurring compounds serve as lead structures. They are chemically
manipulated with the goal of optimizing their desirable characteristics and mini-
mizing their side effects (▶ Chap. 8, “Optimization of Lead Structures”). Examples
are found in the many natural products and endogenous receptor agonists that have
been modified into selective agonists and antagonists (▶ Sects. 6.2, ▶ 6.3, ▶ 6.4, and
▶ 6.6). Drugs are also derived from enzyme substrates (▶ Sect. 6.6 and ▶ Chaps. 23,
“Inhibitors of Hydrolases with an Acyl–Enzyme Intermediate”; ▶ 24, “Aspartic
Protease Inhibitors”; ▶ 25, “Inhibitors of Hydrolyzing Metalloenzymes”;
▶ 26, “Transferase Inhibitors”; ▶ 27, “Oxidoreductase Inhibitors”). These are either
substrates of endogenous enzymes, for instance those that play a role in blood pressure
regulation or inflammation, or substrates of enzymes from viruses, bacteria,
or parasites whose metabolism is to be shut down specifically.
In the last 100 years preparative organic chemistry has played a decisive role not
only in the systematic variation of lead structures but also in lead structure discovery.
The search for new active substances has delivered many drugs that have no structural
relationship to endogenous examples. In other cases, the relationship between the
biological effect and the mode of action was clarified long after their discovery.
G. Klebe, Drug Design, DOI 10.1007/978-3-642-17907-5_6,
© Springer-Verlag Berlin Heidelberg 2013
6.1 How It Began: Hits by In Vivo Screening
The first example of an active principle discovered through testing dates from the
eighteenth century and concerns the effects of digitalis. The Scottish physician,
William Withering, while working in England, was consulted by a patient who
suffered from an extremely weak heart. After the doctor was unable to help him, the
patient consulted a gypsy woman, who prescribed a herbal therapy. Impressed by
the recovery of the patient, Withering sought out the woman and asked for the
recipe. He received it in exchange for a handsome fee. The mixture contained an
extract of the (poisonous) purple foxglove, Digitalis purpurea. The physician
investigated the potency of different preparations of this plant by giving
them to 163 patients! With this experiment, he established that the best
formulation was made up of the dried, powdered leaves. After the observation was
made that a toxic dose is quickly reached, he recommended that diluted
preparations be administered in repeated doses until the desired effect was
achieved. Even though digitalis is still used today for congestive heart failure,
no one would recommend that Withering’s experimental technique be used to
establish the therapeutic potential of a substance. This approach was neither
ethical nor practical.
6.2 Lead Structures from Plants
The example of the previous section shows that nature has furnished plants with
highly potent substances. A plethora of secondary metabolites, for example, alka-
loids, terpenes, flavones, and glycosides are also available. The contents of about
a hundred different plant species have either directly or indirectly, in the form of
analogues, found their way into human therapy. Traditional medicines use about
5,000–10,000 of the several hundred thousand already known species from the rich
plant kingdom. Morphine, caffeine, quinine, cocaine, ephedrine, coniine, atropine,
and reserpine were already mentioned in ▶ Sect. 1.1. Further plant-based pharma-
ceuticals that are used in therapy, or that have served as lead structures for the
development of medicines are compounds 6.1–6.7 (Fig. 6.1), and, in addition,
emetine, pilocarpine, podophyllotoxin, and the vinca alkaloids vinblastine and
vincristine.
Why do plants contain so many valuable therapeutic compounds? There is not
a human-related answer because plants did not evolve so that they could become
human medicines. The plants, however, had to respond to their environment, and
a competition with other species occurred. The decisive disadvantage of being
a plant is that it cannot run away! That is not a disadvantage when it comes to
reproduction. Bees take care of the first part, and aerodynamic seeds help with the
rest. An effective protective mechanism against, for instance, fungal infection and
pests such as caterpillars, sheep, and cattle served as a selection advantage for
some plants. The substances that offer an advantage taste bitter, hot, or are
toxic. They exert their effects by interacting with the enzymes or receptors
of the “enemy.” The stronger the effect, the better the protection. A successful
principle of evolution is the development of defensive substances that do not kill,
but cause an unpleasant experience for the predator, which in turn teaches the
enemy to stay away. That is how butterflies survive that accumulate poisonous
plant-based substances in their bodies, and even those others that just imitate the
appearance of these butterflies. After the first experience with the poisonous
species, birds give both species a wide berth.
Fig. 6.1 Natural products from plants that have been introduced to therapy or have served as lead
structures include, in addition to the substances introduced in ▶ Sect. 1.1, tubocurarine (curare)
6.1, papaverine 6.2, digitoxin 6.3 (R = H), digoxin 6.4 (R = OH), and the related cardiac glycosides.
Newer natural products from plants with great therapeutic potential include paclitaxel (Taxol®) 6.5
for tumor therapy, artemisinin 6.6 for malaria therapy (▶ Sect. 3.3), and the acetylcholinesterase
inhibitor huperzine A 6.7 for the potential treatment of Alzheimer’s disease.
Plant substances have already undergone a selection process on biologically
relevant proteins; during the course of evolution they have “seen” receptors and
binding sites. Further, the course of their biosynthesis takes place in the binding site
of a protein, that is, they have functionality that mediates affinity to a protein.
Certainly, there are many plant substances that coincidentally have a biological effect
in humans. Morphine contains a basic nitrogen, a phenolic hydroxyl group, an ether
bridge, and a hydrophobic domain: a medicinal chemist would also choose such
a mixture of functional groups, without the complicated ring structure, in the
conception of an active substance.
The isolation of natural products from plants for lead discovery has been valued
very differently over the last decades. Large pharmaceutical companies
have repeatedly started ambitious programs to elucidate the mechanism of action of
traditional medicines, only to abandon the area again disappointed. The disappoint-
ments are a result of an unfavorable relationship between effort and reward. All too
often only a toxin is isolated instead of a valuable lead structure, and all too often an
already-known principle is found. Nonetheless, the search continues. Nature offers
structural variation that the chemist can only dream of.
6.3 Lead Structures from Animal Venoms and Other
Ingredients
In contrast to the plants, the evolution of animal venoms occurred with the objective
of subduing prey or defending against an enemy. Many of these substances are
proteins, peptides, and alkaloids. They function as potent poisons that can quickly
paralyze or kill a victim. Because of this, many active substances from animals are
unsuitable for therapy, but others, for the exact same reason, are interesting lead
structures. Animal products offer many surprises, as illustrated in the following two
examples.
Despite its simple structure, epibatidine 6.8 (Fig. 6.2), which was isolated from
the Ecuadorian poison dart frog Epipedobates tricolor, is a 100-fold more-potent
analgesic than morphine! It does not affect the opiate receptor, but rather it is an
agonist at the nicotinic acetylcholine (nACh) receptor (▶ Sect. 30.4). That comes as
no surprise when its structural similarity to nicotine 6.9 is considered. Epibatidine
has a binding constant of 0.04 nM on the nACh receptor, which is 50-fold stronger
than nicotine. Unfortunately, its analgesic effects are coupled with a pronounced
body temperature reduction (hypothermia).
Dolastatin-15 6.10 (Fig. 6.2) was isolated from the wedge sea hare, Dolabella
auricularia, a marine snail. It is an interesting lead structure for antitumor com-
pounds. Synthetic analogues of 6.10 cause the complete disappearance of tumors in
some animal models. The diversity of marine animals in particular has historically
been a rich source of new and interesting lead structures and modes of action.
Other animal substances have gained importance in experimental pharmacology.
Among them are the poison of the notorious fugu fish, tetrodotoxin 6.11, and the
steroid alkaloid batrachotoxin 6.12 from the skin of the Colombian poison dart frog
(Fig. 6.2). Whereas tetrodotoxin specifically blocks sodium channels, batrachotoxin
stabilizes sodium channels in the open form.
Peptides from snake venom made a decisive contribution to the development of
the antihypertensive angiotensin-converting enzyme inhibitors (▶ Sect. 25.4).
Research in the area of thrombin inhibitors has in recent years turned toward
the active ingredient of leech saliva, hirudin. Aside from the direct use of hirudin,
longer-acting derivatives, shorter peptides that only bind on the fibrinogen-binding
site, and protein conjugates with other thrombin inhibitors have been derived from
the structure.
Fig. 6.2 Epibatidine 6.8, a non-opiate analgesic that binds 50-fold more potently to the nicotinic
acetylcholine receptor than nicotine 6.9, comes from a South American frog (▶ Sect. 30.4).
Dolastatin-15 6.10, which was isolated from a marine snail, is an interesting lead structure for
cancer therapeutics. The toxin of the fugu fish, tetrodotoxin 6.11, is not a lead structure but rather
a sodium channel blocker for experimental (in vitro) use. The steroid alkaloid batrachotoxin 6.12 is
the most potent animal venom known. The LD50 value in mice, that is, the dose necessary to kill
50% of the experimental animals within 24 h, is 200 ng/kg.
Animal and human proteins as well as polymeric carbohydrates are extraordinarily
important for substitution therapies. Insulin (isolated from the pig pancreas)
is at the top of the list, followed by aprotinin, a protease inhibitor
(isolated from cattle lungs), digestive enzymes, and the coagulation inhibitor
heparin. Now that the possibility of the gene-technological production of insulin
is available, its isolation from animal organs has become less important. Other
proteins, for example, the erythrocyte-stimulating hormone erythropoietin
(▶ Sect. 29.8), human growth hormone, tissue plasminogen activator (tPA), urokinase,
and factor VIII, are all manufactured by using gene technology nowadays
(▶ Sect. 32.1). In this way, these proteins are available in practically unlimited
quantities.
The protease ancrod, isolated from the venom of the Malayan pit viper
Agkistrodon rhodostoma, cleaves the precursor of fibrin, fibrinogen, to a product
that can no longer aggregate. Thus the viscosity and the coagulation ability of the
blood are reduced (▶ Sect. 23.4). An elevated thrombosis risk can be significantly
reduced through this mechanism. To isolate the active component of this venom,
several hundred snakes have to be “milked” regularly.
6.4 Lead Structures from Microbial Organisms
When speaking of active substances from microorganisms, antibiotics must be
mentioned first. The β-lactams penicillin and cephalosporin (▶ Sects. 2.4 and
▶ 23.7) are highlighted as particularly valuable lead structures. Aside from oral
bioavailability, the therapeutic goals were broad-spectrum activity and metabolic
stability. Tetracycline 6.13 (Fig. 6.3) was also intensively structurally modified. It
attacks the ribosome during protein biosynthesis (▶ Sect. 32.6). Other microbial
antibiotics, for instance, streptomycin 6.14, are used directly in therapy.
The immunosuppressants ciclosporin A (▶ Sects. 4.7 and ▶ 10.1), FK 506, and
rapamycin also originated from microorganisms. Ciclosporin A is a convincing
example of how difficult it is to predict the potential of a new therapeutic substance.
Sandoz almost abandoned its development because of “lack of market potential.”
This decision would have had fatal consequences because a large portion of the
success of transplantation surgery today can be attributed to this substance. Instead,
ciclosporin became one of the company’s best-selling products.
The fungus Claviceps purpurea, which grows in grain (ergot, Secale cornutum),
contains toxic alkaloids. For hundreds of years, the consumption of bread that had
been made from contaminated flour was the cause of severe poisonings. The
structures of these alkaloids, for example, ergotamine 6.15 (Fig. 6.3), were in
large part elucidated at Sandoz. Their systematic modification led to active sub-
stances for many indications, e.g., for inducing contractions during labor, migraine
therapy, perfusion disorders, and arterial hypertension. Today they have little
importance because of their limited therapeutic index. Another representative of
this class is the hallucinogen lysergic acid diethylamide (▶ Sect. 2.5), which was
discovered by accident.
Lovastatin and some analogues (▶ Sects. 9.2 and ▶ 27.3) are exceedingly
important therapeutic substances that were isolated from microorganisms; they
interfere with the biosynthesis of cholesterol. Cholecystokinin (CCK) is a peptide
hormone that acts at a G protein-coupled receptor (▶ Sect. 29.1). It induces
multifaceted effects in the central nervous system and gastrointestinal tract. The
non-peptide CCK antagonist asperlicin 6.16 (IC50 = 1.4 µM) originated from
extracts of Aspergillus alliaceus. After intensive structural variation, the much
simpler devazepide 6.17 (IC50 = 80 pM) was designed, which has more than
10,000-fold better affinity to the CCK receptor (Fig. 6.3). This antagonist is orally
bioavailable and is an appetite stimulator.
Fig. 6.3 Penicillins, cephalosporins (▶ Sects. 2.4 and ▶ 23.7), and tetracycline 6.13 were
important lead structures for even better antibiotics. In contrast, streptomycin 6.14 is used in therapy
itself. Ergotamine 6.15 is a typical representative of the ergot alkaloids, from which a plethora of
different drugs have been derived. Likewise, asperlicin 6.16 is a structurally complex microbial
natural product. The 10,000-fold more potent derivative devazepide 6.17 was derived from it.
The enzyme streptokinase for the dissolution of blood clots, and bacterial
collagenase for wound treatment are examples of therapeutically important proteins
that were isolated from microorganisms.
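The affinity gain from asperlicin 6.16 to devazepide 6.17 mentioned above can be checked with a few lines of arithmetic. The sketch takes the two IC50 values given for these compounds, read here as 1.4 µM and 80 pM, converts both to mol/L, and computes their ratio. The function names are illustrative, and comparing IC50 values directly assumes that both were measured under comparable assay conditions.

```python
import math

# Conversion factors from common concentration units to mol/L
UNIT_TO_MOLAR = {"mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def to_molar(value, unit):
    """Convert a concentration given in mM, uM, nM, or pM to mol/L."""
    return value * UNIT_TO_MOLAR[unit]


def fold_improvement(ic50_start, ic50_optimized):
    """Return how many times lower the optimized IC50 is than the starting one."""
    if ic50_start <= 0 or ic50_optimized <= 0:
        raise ValueError("IC50 values must be positive")
    return ic50_start / ic50_optimized


if __name__ == "__main__":
    asperlicin = to_molar(1.4, "uM")
    devazepide = to_molar(80, "pM")
    fold = fold_improvement(asperlicin, devazepide)
    print(f"{fold:,.0f}-fold, {math.log10(fold):.1f} orders of magnitude")
```

The ratio comes out at roughly 17,500, in line with the statement that devazepide has more than 10,000-fold better affinity.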
6.5 Dyes and Intermediates Lead to New Drugs
In 1903, Paul Ehrlich investigated hundreds of dyes in mice that had been infected
with trypanosomes. The result of this research was Nagana Red, the first drug for
infection with Trypanosoma brucei, the causative agent of cattle trypanosomiasis. Other
dyes followed, as did colorless compounds that contained amide instead of azo
groups. It was only in 1916, after Ehrlich’s death, that Bayer, after having investigated
more than a thousand analogues, produced its wonder drug suramin
(Germanin®) 6.18 (Fig. 6.4). The work in this area led to the discovery of the
antibacterial sulfonamides in the 1930s (▶ Sect. 2.3). Thousands, if not tens of
thousands, of analogues were synthesized and tested. Many were introduced to the
market. Depending on the structure, they cover an extraordinarily broad spectrum
of different pharmacokinetic characteristics.
No actual biological activity was expected from the synthetic intermediates.
They were seen merely as starting material for the desired end product. Despite this,
many intermediates were routinely tested for biological activity, and it was a good
thing too!
Fig. 6.4 Bayer’s suramin 6.18, which is also known as E 205 or Germanin®, had strategic
importance for the colonies. An English engineer who was suffering from African sleeping
sickness (trypanosomiasis) and was near death despite aggressive treatment with diverse
antimony and arsenic preparations, was cured after a few injections of this substance. The solvent for
the preparation of the intravenous injection solution was rain water in the tropical clinical trials(!).
After a short time, suramin was considered to be a “wonder drug.” Despite the fact that the
structure was kept secret, French researchers worked out their own synthesis within a short time.
Suramin is still used for the treatment of trypanosomiasis because it has good efficacy and a
long-lasting effect.
Gerhard Domagk, the discoverer of sulfonamides (▶ Sect. 2.3), investigated just
such a synthetic intermediate in addition to the many end-target substances and
found a surprisingly good effect against tuberculosis. Structural optimization
afforded thiacetazone 6.19 (Fig. 6.5), which unfortunately turned out to be
hepatotoxic. In the search for a follow-up substance, Bayer started a concerted
program with 5,000 compounds. In 1951 another synthetic intermediate showed
surprisingly potent tuberculostatic activity. Isoniazid 6.20 (Fig. 6.5) was 15 times
more active than the best antituberculosis antibiotic at the time, streptomycin 6.14
(Fig. 6.3). The discovery was in the air: two other research groups, both in the USA,
simultaneously and independently discovered the effect of this substance, which,
upon enzymatic radical generation, irreversibly binds to the cofactor NADH of
a fatty-acid-synthesizing enzyme of the tuberculosis bacillus. The hypothesis that
isoniazid is metabolically cleaved to isonicotinic acid 6.21, which in turn exerts its effect by acting
as an antimetabolite of nicotinic acid 6.22 (Fig. 6.5), was evidently wrong.
Inhibitors of the enzyme dihydrofolate reductase, for instance, methotrexate 6.23
(Fig. 6.6), are used in the treatment of leukemia (▶ Sect. 27.2). During the inves-
tigation of analogues, a simple synthetic intermediate, mercaptopurine 6.24 was
tested. It showed efficacy, but was too toxic. The further development delivered
azathioprine 6.25, which releases mercaptopurine in the organism (Fig. 6.6). As an
immunosuppressive, azathioprine was even better than the then-used corticoste-
roids (▶ Sect. 28.5). Until the introduction of ciclosporin (▶ Sect. 10.1) it was used
in all organ transplantations. Another intermediate from this class, allopurinol 6.26
(Fig. 6.6), is a xanthine oxidase inhibitor. It is used for the treatment of gout.
Fig. 6.5 Thiacetazone 6.19 and isoniazid 6.20 (R = −NH-NH2) are tuberculostatics that originated
as synthetic intermediates. Isoniazid penetrates the cell wall and irreversibly binds to the enzymatic
cofactor NADH after radical generation. The originally accepted hypothesis that, upon metabolic
degradation to isonicotinic acid 6.21 (R = −OH), it acts as an antimetabolite for nicotinic acid 6.22,
proved to be incorrect.
6.6 Mimicry: How to Copy Endogenous Ligands
As of the middle of the nineteenth century, biological substances, enzyme substrates,
neurotransmitters, and hormones were increasingly being used as
archetypes for new medicines. The directed design of drugs from these lead
structures led to the “golden age” of pharmaceutical research (▶ Sect. 1.4).
The approach is illustrated here using enzyme inhibitors as an example. Enzymes
catalyze chemical reactions by stabilizing the transition state of the reaction. In
doing so, they decrease the activation energy, and the reaction can proceed at a lower
temperature (▶ Sect. 22.3). This specificity can be exploited particularly well for
the optimization of enzyme inhibitors. Starting from knowledge of the reaction
mechanism, groups are assembled that are structurally analogous to the transition
state (Fig. 6.7). They imitate it but do not lead to a product. In this way, in a single
step, through an entirely purposeful chemical transformation, a substrate can be
converted into a potent and selective inhibitor. The correct inhibitor binding geometry
improves the affinity by several orders of magnitude. The two natural products
pentostatin 6.29 and nebularine 6.30 (Fig. 6.8) are inhibitors of the enzymatic
transformation of adenosine 6.27 to inosine 6.28 and impressive examples of
transition-state mimetics. The introduction of a hydroxyl group with the correct
stereochemistry increased the affinity of the ligand to the enzyme by many orders of
magnitude.
[Structural formulas: 6.23 Methotrexate; 6.24 Mercaptopurine; 6.25 Azathioprine; 6.26 Allopurinol]
Fig. 6.6 Simple synthetic intermediates to methotrexate 6.23 turned out to be new drugs.
Mercaptopurine 6.24 and azathioprine 6.25 are immunosuppressants, and allopurinol 6.26 is
used to treat gout.
[Structural formulas: substrate, transition state, and groups that imitate transition states (X = -CH2-, -NH-, -O-; X = -CF3, -CF2-, -Aryl)]
Fig. 6.7 Examples of substrate, transition state, and groups that imitate the enzymatic transition
state of an amide hydrolysis reaction. A few of the groups reversibly form covalent bonds to the
serine in the catalytic pocket of a serine protease (see ▶ Sect. 23.2).
Never before was the search for new drugs as successful as it was in the two to
three decades of the “golden age.” Subsequently the success rate fell. Research
became more expensive and laborious. How can this be explained? Because of the
success during this period, many indication areas achieved a very high standard of
care. That makes it difficult for modern research to be as successful as before, even
with the use of superior tools. Other reasons include higher requirements for
efficacy and safety.
[Structural formulas: 6.27 Adenosine; 6.28 Inosine; hypothetical transition state of the adenosine deaminase reaction; 6.29 Pentostatin; 6.30 Nebularine; hypothetical active form of 6.30]
Fig. 6.8 Pentostatin 6.29 and nebularine 6.30 inhibit the enzymatic transformation of adenosine
6.27 to inosine 6.28. The affinity of 6.29 is 7 orders of magnitude higher than that of the substrate
adenosine (Ki = 2.5 pM), and that of the active form of 6.30 is even 10 orders of magnitude higher
(Ki = 0.3 pM). The structures of pentostatin as well as the active form of nebularine correspond to
the transition state of the enzymatic reaction.
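The orders of magnitude quoted for the transition-state mimetics can be translated into binding free energies. The following is only a minimal sketch: it assumes the standard thermodynamic relation ΔG = RT ln Ki (with Ki relative to a 1 M standard state) and a temperature of 298.15 K, neither of which is stated in the text. The function names are illustrative.

```python
import math

R = 8.314462618  # gas constant in J/(mol*K)

def free_energy_gain_kj(orders_of_magnitude, temperature_k=298.15):
    """Free-energy difference (kJ/mol) that corresponds to an affinity
    gain of 10**orders_of_magnitude, using ddG = RT * ln(ratio)."""
    return R * temperature_k * math.log(10.0 ** orders_of_magnitude) / 1000.0

def binding_free_energy_kj(ki_molar, temperature_k=298.15):
    """Standard binding free energy (kJ/mol) from an inhibition constant,
    dG = RT * ln(Ki), with Ki given in mol/L (assumed 1 M standard state)."""
    return R * temperature_k * math.log(ki_molar) / 1000.0

# Values taken from the caption of Fig. 6.8: 7 orders of magnitude, Ki = 2.5 pM
gain_7 = free_energy_gain_kj(7)
dg_pentostatin = binding_free_energy_kj(2.5e-12)

print(f"7 orders of magnitude correspond to about {gain_7:.1f} kJ/mol")
print(f"Ki = 2.5 pM corresponds to about {dg_pentostatin:.1f} kJ/mol")
```

Under these assumptions each order of magnitude in affinity is worth roughly 5.7 kJ/mol, which illustrates how much binding energy a well-placed transition-state analogue can recover.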
6.7 Side Effects Indicate New Therapeutic Options
Many drugs came from the observation of side effects during clinical or practical
use (see ▶ Sect. 2.8). The diuretic effects of mercury compounds were discovered
purely by accident (▶ Sect. 30.9). In 1919 physicians in the First Medical Univer-
sity Hospital in Vienna were testing a new treatment for syphilis. It was observed in
a 21-year-old woman that her urine production increased from 200–500 mL a day to
1.2–2.0 L on the third day of treatment with the test substance. This result led to the
development of the first effective diuretic (medicine to increase urine production).
Fortunately, we are no longer dependent on extremely toxic mercury compounds
for the therapy of venereal disease or as diuretics!
In 1948 it was observed in vulcanization factories that the antioxidant disulfiram
6.31 (Fig. 6.9) caused workers to become intolerant of alcoholic drinks. This
discovery led to the use of the substance for the treatment of chronic alcoholism.
[Structural formulas: 6.31 Disulfiram; 6.32 Iproniazid; 6.33 Dicoumarol; 6.34 Warfarin; 6.35 Penicillamine]
Fig. 6.9 Tetraethylthiuram disulfide 6.31 or disulfiram, better known as Antabuse®, is an
aldehyde dehydrogenase inhibitor. The accumulation of the toxic acetaldehyde leads to nausea.
Iproniazid 6.32, a simple derivative of isoniazid 6.20 (Fig. 6.5), is a monoamine oxidase inhibitor
(▶ Sect. 27.8). It acts as an antidepressant by prolonging the effects of the biogenic amines. The rat
poison warfarin 6.34 is derived from dicoumarol 6.33. Even though the coagulation parameters
must be closely monitored, it is still the standard of therapy for diseases that are coupled with
a thrombosis risk, for example, heart attack or stroke. Penicillamine 6.35 is a complexation agent
for heavy metals; it is used, among other indications, for the treatment of Wilson’s disease, which
is an inherited disease that leads to the accumulation of copper in the tissues. It was only later that
its efficacy in chronic rheumatic diseases was discovered.
Under disulfiram, the metabolic intermediate of ethanol, acetaldehyde, is not metabolized
any further. This leads to generalized poisoning symptoms such as nausea, palpitations,
and cold sweats. The effect is, however, difficult to control. Alcohol consumption
after treatment has occasionally been fatal.
A classic example of the discovery of an important indication by observing side
effects can be found in the sulfonamides. The sulfonamide diuretics and the oral
antidiabetics (▶ Sect. 30.2), drugs of choice to treat certain forms of diabetes, were
found in this way (▶ Sect. 8.4).
Iproniazid 6.32 (Fig. 6.9) is a derivative of isoniazid 6.20 (Fig. 6.5). In 1957
a tuberculosis patient noticed a distinct brightening of mood, which led to its broad
use for the treatment of chronic depression. The substance had to be withdrawn
from the market a few years later due to severe side effects (▶ Sect. 27.8).
Sweet clover has been used in Europe to feed livestock for hundreds of years.
During its introduction in the 1920s to the USA and Canada, it was initially stored
inappropriately, with disastrous consequences. Massive bleeding and fatalities in
the cattle were attributed to the spoiled sweet clover (i.e., hemorrhagic sweet clover
disease). The active substance, dicoumarol 6.33 (Fig. 6.9), was introduced into
therapy in 1942, but its effects were unreliable. The Wisconsin Alumni Research
Foundation investigated 150 analogues and produced warfarin 6.34, which was sold
as a rat poison. The name is derived from the foundation’s acronym WARF, and the
ending “arin” from coumarin. In 1951 an American soldier attempted suicide with
a high dose of warfarin. Because he survived, a clinical trial was initiated. Despite
the need for frequent and tight control of the coagulation values, treatment with
warfarin is the standard therapy today after a heart attack or stroke.
Penicillamine 6.35 (Fig. 6.9) provides an example of an important indication extension.
It was introduced for the treatment of Wilson’s disease, an inherited metabolic
disease that leads to copper accumulation in tissue. Because 6.35 forms complexes well,
it is also appropriate for the treatment of heavy-metal poisonings. It was only later, after
its practical use, that its much larger importance as a basis therapy for rheumatic disease
was recognized. The mechanism of action remains largely unclear.
6.8 From the Traditional Search to the Screening of Large Compound Libraries
The approaches that are described in the previous sections are still used in industrial
pharmaceutical research today. Because of the enormous costs associated with the
development of drugs, the search for original lead structures is an increasingly
important goal. Large sums are paid for novel therapeutic approaches, test models,
or 3D structures of target proteins. This information can provide an advantage over
the competition that takes time to realize and must be zealously defended
until it is brought to fruition.
According to the principle of risk diversification and the maximal exploitation of
all imaginable resources, pharmaceutical companies today subscribe to a strategy of
broad screening of huge substance libraries of plant extracts, microbial fermentations,
and synthetically prepared compounds. The last category comes from in-house chemistry
as well as purchased compounds and combinatorial substance libraries (▶ Chap. 11,
“Combinatorics: Chemistry with Big Numbers”). Furthermore, a large part of the search
for new lead structures nowadays takes place by computer methods.
The identification of therapeutically relevant target proteins plays an ever-increasing
role in the discovery of new lead structures. The elucidation of the
human genome (▶ Sect. 12.3) has delivered the sequences of all human proteins.
By comparing the expression pattern between diseased and healthy cells, it is
possible to recognize particular proteins as a cause or consequence of a given
pathology (▶ Sect. 12.8). Should such a protein be detected, the next steps
are clear. The therapeutic concept is tested on a genetically modified animal
(▶ Sect. 12.5), or the gene is silenced (▶ Sect. 12.7), a molecular test system is
established, and the 3D structure of the protein is elucidated. In parallel, all
available techniques for lead structure search are employed. Because this process
chain is being carried out with increasingly high throughput, the capacity for
lead-structure searching must be constantly extended.
Many companies try to simultaneously develop chemically unrelated lead structures
for the same indication. The animal models for preclinical profiling and the
preparations for clinical testing are so elaborate, and require so much
labor and expense, that it seems hardly justifiable to start such a program with
only one compound class. Risk minimization and distribution are required for the
search as well as the development of a medicine. Techniques that are used for the
detection of new lead structures are presented in the next chapter (▶ Chap. 7,
“Screening Technologies for Lead Structure Discovery”).
6.9 Synopsis
• Many active substances originate from natural products found in plants, animals,
and microbial sources. Their mode of action has been copied as an active
principle for the development of drugs.
• Endogenous substances such as hormones and neurotransmitters also served as
references for drug development.
• Only a few natural products became drugs themselves.
• Usually targeted chemical variations are required to optimize a lead for meta-
bolic stability, half-life, or selectivity to be ready for therapeutic use.
• Plants contain many valuable therapeutic compounds usually developed as an
effective protective mechanism against all sorts of enemies.
• Nature offers a tremendous body of structural variations; however, ambitious
programs to elucidate mechanisms of action of traditional medicines all too often
only isolate toxins and discover already-known principles.
• Animals have developed venoms as attack or defense mechanisms to be
used as predators or against enemies. They are mostly proteins, peptides, or
alkaloids that either kill or paralyze a victim.
• Snake venoms served as references for the development of anti-hypertensive
drugs; active principles to block blood clotting (e.g., by leeches or bats) were
turned into active ingredients for anticoagulation drugs.
• Proteins for substitution therapy (such as insulin, erythropoietin, factor VII) are
manufactured by gene technology.
• Microorganisms have provided leads for antibiotics (e.g., penicillins), which had
to be optimized for oral availability, broad-spectrum activity, and metabolic
stability.
• The immunosuppressant ciclosporin A, a cyclic peptide; ergotamine, a toxic alka-
loid in ergot; lovastatin, an inhibitor of cholesterol biosynthesis; or streptokinase to
dissolve blood clots, are successful drugs originating from microorganisms.
• Dyes and many synthetic intermediates produced in chemical industry were
investigated for biological effects and provided important compound classes
such as the sulfonamides.
• Small but essential structural changes of endogenous ligands transform enzyme
substrates, neurotransmitters, and hormones into successful drugs.
• Many drugs originated from clinical observations of side effects during practical
use, for instance, the anti-diabetic effect of sulfonylureas from the observation
of side effects of sulfonamides.
• To exploit all imaginable resources to discover leads today, huge substance
libraries of plant extracts, microbial fermentations, and libraries of synthetically
prepared compounds are screened.
Bibliography
General Literature
Burger A (1983) A guide to the chemical basis of drug design. Wiley, New York
Sneader W (1990) Chronology of drug introductions. In: Hansch C, Sammes PG, Taylor JB (eds)
Comprehensive medicinal chemistry. vol 1, Kennewell PD (ed). Pergamon, Oxford, pp 7–80
Verg E (1988) Meilensteine. 125 Jahre Bayer 1863–1988. Bayer AG, Leverkusen
Special Literature
Badio B et al (1994) Epibatidine: discovery and definition as a potent analgesic and nicotinic
agonist. Med Chem Res 4:440–448 and other works (Special journal edition dedicated to
Epibatidine)
Buss AD, Waigh RD (1995) Natural products as leads for new pharmaceuticals. In: Wolff M (ed)
Burger’s medicinal chemistry and drug discovery. Wiley, New York, pp 983–1033
Hylands PJ, Nisbet LJ (1991) The search for molecular diversity (I): natural products. Ann Rep
Med Chem 26:259–269
Pettit GR et al (1993) Isolation of dolastatins 10–15 from the marine mollusc Dolabella
Auricularia. Tetrahedron 41:9151–9170
Suffness M (1993) Taxol: from discovery to therapeutic use. Ann Rep Med Chem 28:305–314
Tempesta MS, King SR (1994) Ethnobotany as a source for new drugs. Ann Rep Med Chem
29:325–330
Bibliography 127
7 Screening Technologies for Lead Structure Discovery
In the last chapter, examples were presented of how lead structures can be discovered
by purposefully searching, particularly by using examples from nature or compounds
with known modes of action. Even if a large number of natural products and synthetic
substances are available, it is not always easy to filter the active molecules out and to
assess their value for a given indication. This requires a time- and cost-intensive sorting
or screening of enormous substance libraries. “Screening” means the more or less
specific biological testing of compounds. Although today molecular test systems and
cell culture models are used almost exclusively, the cost for testing a compound is
between US $2 and US $5. Because typically millions of compounds are tested,
a screening campaign can cost a lot of money!
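The arithmetic behind this statement can be made explicit with a minimal sketch. The per-compound cost range of US $2 to US $5 is taken from the text; the library size of one million compounds is only a hypothetical example, and the function name is illustrative.

```python
def campaign_cost_usd(n_compounds, cost_per_test_usd):
    """Cost of testing every compound of a library once."""
    return n_compounds * cost_per_test_usd

library_size = 1_000_000  # hypothetical library size, not a value from the text

cost_low = campaign_cost_usd(library_size, 2.0)   # lower bound quoted in the text
cost_high = campaign_cost_usd(library_size, 5.0)  # upper bound quoted in the text

print(f"Primary screen of {library_size:,} compounds: "
      f"US ${cost_low:,.0f} to US ${cost_high:,.0f}")
```

Repeated testing to validate hits would add to these figures, so the numbers are best read as a lower estimate for a single pass through the library.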
The screening process can be divided into three phases. First there is an
automated initial screening, which is usually carried out by robots and
encompasses libraries of millions of compounds. The first substances that show
an interaction are identified as “hits” that have to be validated by repeated testing.
Next, a more detailed screening follows, in which the chemical space around the
identified compounds is explored. The goal is to establish a structure–activity
relationship (▶ Chap. 18, “Quantitative Structure–Activity Relationships”) and to
improve the pharmacological and physicochemical properties (▶ Chap. 19, “From
In Vitro to In Vivo: Optimization of ADME and Toxicology Properties”). Along the
way, lead structures (so-called “leads”) are discovered. Then, in the last phase,
lead optimization takes place through detailed biological testing, through which
a drug candidate is selected for clinical testing (▶ Chap. 8, “Optimization of Lead
Structures”). How can we find appropriate hits among the enormous number of test
candidates that have the potential to be developed into a medicine? The question is
answered by screening for biological effects.
G. Klebe, Drug Design, DOI 10.1007/978-3-642-17907-5_7, © Springer-Verlag Berlin Heidelberg 2013
7.1 Screening for Biological Activity by HTS
The prerequisite for large-scale screening was the development of in vitro test
systems as a surrogate for animal experiments. The first were carried out on
isolated enzymes and membrane homogenates for receptor-binding studies.
Later gene technology (▶ Sect. 12.6) made sufficient quantities of pure proteins
available for the development of molecular test systems. This offered the
advantage that homogeneous proteins, preferentially human proteins, could be
tested.
In the mid-1990s, automated test systems with an extremely high capacity
(high-throughput screening, HTS) led to a tremendous boom. The discovery of
candidates for drug development is now attempted by using the entire
methodological repertoire of biochemistry in a test tube. It is now also known how to
reprogram cells and organisms so that the function of single genes is highlighted.
The special trick with all of these test methods lies in translating the molecular
effect into a macroscopically visible signal.
Despite the enormous effort that is associated with HTS, and the not-always-
justifiable hit rate, HTS is here to stay in pharmaceutical research. There are
always interesting lead structures to be found in this way (▶ Chaps. 23, “Inhib-
itors of Hydrolases with an Acyl–Enzyme Intermediate”; ▶ 24, “Aspartic Pro-
tease Inhibitors”; ▶ 25, “Inhibitors of Hydrolyzing Metalloenzymes”; ▶ 26,
“Transferase Inhibitors”; ▶ 27, “Oxidoreductase Inhibitors”; ▶ 28, “Agonists
and Antagonists of Nuclear Receptors”; ▶ 29, “Agonists and Antagonists of
Membrane-Bound Receptors”; ▶ 30, “Ligands for Channels, Pores, and Trans-
porters”; ▶ 31, “Ligands for Surface Receptors”; ▶ 32, “Biologicals: Peptides,
Proteins, Nucleotides, and Macrolides as Drugs”). A weakness may be the
limited diversity of synthetic substances, compared with the structural com-
plexity of plant and microbial metabolites. Another limitation of in vitro test
systems is that neither the entire effect spectrum nor many other effects such as
transport, distribution, metabolism, and excretion (▶ Chap. 19, “From In Vitro
to In Vivo: Optimization of ADME and Toxicology Properties”) can be
assessed.
The composition of suitable screening libraries is exceedingly critical. Fre-
quently molecules and test candidates are used that were prepared during the
course of other drug-development projects. As such, these molecules already
have the size of a typical drug. Usually only modest, almost always micromolar
binding to the test receptor is found. To improve the properties of such a hit, it
must be structurally modified. As a general rule, this is accomplished by adding
more chemical groups. This means that the molecular weight can quickly reach or
exceed 500–600 Da, which is considered to be the upper threshold for good
bioavailability (▶ Sect. 9.1). The optimization of such a screening hit therefore
means that the size must be reduced first, so that it can be increased again during
a goal-oriented optimization. Yet the size reduction often comes with a loss in
binding. Therefore the criterion “ligand efficiency” was introduced to judge
a screening hit’s optimization potential. For this, the number of non-hydrogen
atoms of the hit is considered in relation to the binding affinity. Small substances
that have good binding in relation to their size are seen as particularly
promising candidates for an optimization program.
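The idea of ligand efficiency can be illustrated with a small calculation. This is only a sketch under stated assumptions: it uses the commonly quoted definition LE = −ΔG / N (binding free energy per non-hydrogen atom, with ΔG = RT ln Kd), which the text describes only qualitatively, and a temperature of 298.15 K. The two compounds and their values are hypothetical examples, not data from the text.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def ligand_efficiency(kd_molar, heavy_atoms, temperature_k=298.15):
    """Ligand efficiency in kcal/mol per non-hydrogen atom,
    assuming LE = -dG / N with dG = RT * ln(Kd)."""
    delta_g = R_KCAL * temperature_k * math.log(kd_molar)
    return -delta_g / heavy_atoms

# Hypothetical drug-sized screening hit: 10 micromolar, 35 non-hydrogen atoms
le_hit = ligand_efficiency(10e-6, 35)
# Hypothetical small substance: only 200 micromolar, but just 13 non-hydrogen atoms
le_small = ligand_efficiency(200e-6, 13)

print(f"drug-sized hit:  LE = {le_hit:.2f} kcal/mol per atom")
print(f"small substance: LE = {le_small:.2f} kcal/mol per atom")
```

In this example the weaker but much smaller binder has the higher ligand efficiency, which is the sense in which small substances with good binding relative to their size are considered promising starting points.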
7.2 Color Change Demonstrates Activity
Important target proteins for drug development are proteases and esterases, which
are enzymes that cleave peptide and ester bonds (▶ Chaps. 23, “Inhibitors of
Hydrolases with an Acyl–Enzyme Intermediate”; ▶ 24, “Aspartic Protease Inhibitors”;
▶ 25, “Inhibitors of Hydrolyzing Metalloenzymes”). How can their enzymatic
activity be visualized? One prepares synthetic substrates that are similar to
the natural substrate. They carry, however, a para-nitroanilide or a para-nitrophenolate
group coupled by a peptide or ester bond (Fig. 7.1). When the
enzyme cleaves this substrate, yellow nitrophenolate or nitroanilide is released,
and the absorption properties change measurably. This
is observed spectroscopically. If then, during screening, a compound acts as an
inhibitor, the enzymatic cleavage of the synthetic substrate is more or less
suppressed, and the yellow color is diminished. In this way the inhibition potency
of test substances can be determined (Fig. 7.1).
[Reaction schemes: enzymatic cleavage of a peptide p-nitroanilide and of a p-nitrophenyl ester (release of RCOO−); absorption spectrum (ε versus λ) with the maximum at 405 nm]
Fig. 7.1 A p-nitrophenolate or a p-nitroanilide group is added to the terminus of a natural protease
or esterase substrate. The enzyme cleaves the p-nitrophenolate or p-nitroanilide, which becomes
visible as a yellow-colored mesomerically stabilized anion (absorption maximum at 405 nm). If
a competitive inhibitor is added along with the substrate to the enzyme, the cleavage reaction rate
is suppressed depending on the binding strength. This is apparent by the more or less strong yellow
color of the solution and can be quantitatively measured.
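How the yellow color is turned into a number can be sketched as follows. The example assumes that the rate of absorbance increase at 405 nm is proportional to the enzymatic cleavage rate and that an uninhibited control and a blank without enzyme are measured alongside; all rates and compound names are hypothetical, and the function is illustrative rather than a prescribed evaluation protocol.

```python
def percent_inhibition(rate_with_compound, rate_uninhibited, rate_blank=0.0):
    """Percent inhibition from rates of absorbance increase at 405 nm.

    rate_uninhibited: enzyme plus substrate without test compound
    rate_blank:       substrate without enzyme (background hydrolysis)
    """
    window = rate_uninhibited - rate_blank
    if window <= 0:
        raise ValueError("uninhibited rate must exceed the blank rate")
    signal = rate_with_compound - rate_blank
    return 100.0 * (1.0 - signal / window)

# Hypothetical rates in absorbance units per minute
control_rate = 0.050
blank_rate = 0.002
measured = {"compound A": 0.012, "compound B": 0.045, "compound C": 0.050}

for name, rate in measured.items():
    inhibition = percent_inhibition(rate, control_rate, blank_rate)
    print(f"{name}: {inhibition:.1f} % inhibition")
```

Repeating such a measurement at several inhibitor concentrations would give the dose dependence from which the binding strength is then derived.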
A broad palette of chromophoric reactions has been developed that are
suitable for the characterization of enzymatic activity. Many enzymes, for example,
dehydrogenases, need NAD(P)H as a natural cofactor, which is subsequently
oxidized to NAD(P)+ (▶ Sect. 27.1). Because the NAD(P)H starting material, in
contrast to the product, absorbs at 340 nm, the progress of the enzymatic reaction
can be followed at this wavelength. As a variation, two enzymatic reactions can be
coupled to one another. This possibility is interesting when the substrate that is
easily followed spectroscopically is produced in an upstream reaction. In this case
the reaction of the enzyme of interest is not observed directly. Rather, the
activity of interest is registered based on the consumption of the upstream reaction
products in the subsequent enzyme reaction. Although absorption spectroscopic
assays are preferred for technical reasons, tests that are based on the reaction of
radiolabeled compounds still play an important role. The activity of kinases
is, for example, followed by using 32P-labeled adenosine triphosphate. The terminal
phosphate group of the labeled substrate is transferred by the kinase to the protein
that is to be phosphorylated (▶ Sect. 26.3). The incorporation rate serves as a measure
of the kinase activity. Receptor-binding studies are carried out with a known radioactively
labeled ligand. The assay investigates to what extent test compounds can displace
the radioactively labeled ligand from the receptor-binding site. This type of test
does not necessarily represent a functional assay though. Agonistic and antagonistic
binding (▶ Chaps. 28, “Agonists and Antagonists of Nuclear Receptors” and ▶ 29,
“Agonists and Antagonists of Membrane-Bound Receptors”) must still be
distinguished.
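For the NAD(P)H-based assays, the decrease in absorbance at 340 nm can be converted into a consumption rate with the Beer–Lambert law. The sketch below assumes the commonly used literature value of 6220 L/(mol·cm) for the molar absorption coefficient of NADH at 340 nm and a path length of 1 cm; neither number appears in the text, and the measured slope is a hypothetical example.

```python
EPSILON_NADH_340 = 6220.0  # L/(mol*cm); commonly used literature value, not from the text

def nadh_consumption_rate(delta_a340_per_min, path_length_cm=1.0):
    """NADH consumption in micromol/L per minute from the change in A340.

    Beer-Lambert law: A = epsilon * c * d, hence dc/dt = (dA/dt) / (epsilon * d).
    A falling absorbance (negative slope) gives a positive consumption rate.
    """
    molar_per_min = -delta_a340_per_min / (EPSILON_NADH_340 * path_length_cm)
    return molar_per_min * 1e6

# Hypothetical measurement: A340 falls by 0.0622 absorbance units per minute
rate = nadh_consumption_rate(-0.0622)
print(f"NADH consumption: {rate:.1f} micromol/L per minute")
```

In a microtiter plate the path length depends on the fill volume, so `path_length_cm` would have to be adjusted accordingly.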
7.3 Getting Faster and Faster: More and More Compounds by Using Less and Less Material
Antibodies play an important role in assay development. The enormous specificity
of antibody–antigen interactions can be exploited as a highly sensitive system
(▶ Sect. 32.3). In classical immunoassays, either the release of a radioactively
labeled substance is followed (radioimmunoassay, RIA), or an enzymatic reaction
is provoked (enzyme-linked immunosorbent assay, ELISA). The latter technique
has enjoyed a distinctly larger application range, mostly because radioactivity is
best avoided as a measured quantity. Because they only recognize a single molec-
ular species, immunoassays are not only highly specific but also versatile.
Screening techniques are optimized to be automated and miniaturized. Driven
by the desire for higher capacity, these tests are hardly ever carried out in 96-well
(8 × 12) microtiter plates anymore. The wells of these plates hold a reaction volume
of about 0.3 mL. Instead, 384-well (16 × 24) or even 1536-well (32 × 48) microtiter
plates are used, the volumes of which are only a few microliters
per well. The aggregation behavior of hydrophobic test compounds poses a large
problem. The aqueous buffer solutions that are used for these assays can cause these
compounds to aggregate. This aggregation generates hydrophobic surfaces, on
which the proteins can adsorb. The concentration of free protein is reduced,
which can make it appear as though the protein is well inhibited. The addition of
detergents can reverse this effect.
By using a sophisticated robot system, 100,000 assays a day can be carried
out. This leads to an enormous flood of data to be evaluated. The reduced test
volume has the advantage that much less material is consumed. Furthermore, the
measurements can be carried out quickly. At the same time the sample manipulation
has become ever more difficult. To appreciate the difficulty, one only has to
consider the evaporation of such small amounts of solution, the enormously
increasing logistics of handling so much data in parallel, the reproducibility of
the results, and the sensitivity that is necessary to measure weak signals with
certainty.
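The plate formats and the throughput figure mentioned above can be combined in a short back-of-the-envelope calculation. The plate geometries and the 100,000 assays a day are taken from the text; the library size is a hypothetical example, and the sketch ignores control wells and repeat measurements.

```python
import math

# Plate formats from the text: wells -> (rows, columns)
PLATE_FORMATS = {96: (8, 12), 384: (16, 24), 1536: (32, 48)}

def plates_needed(n_compounds, wells_per_plate):
    """Number of plates if every well holds one compound (no controls assumed)."""
    return math.ceil(n_compounds / wells_per_plate)

def screening_days(n_compounds, assays_per_day=100_000):
    """Days for one pass through the library at the stated daily throughput."""
    return math.ceil(n_compounds / assays_per_day)

library_size = 1_000_000  # hypothetical library size

for wells, (rows, cols) in PLATE_FORMATS.items():
    assert rows * cols == wells  # the geometry matches the well count
    print(f"{wells:>4}-well plates: {plates_needed(library_size, wells):,} needed")

print(f"One pass at 100,000 assays a day: {screening_days(library_size)} days")
```

The sixteen-fold difference in plate count between the 96-well and 1536-well formats shows why miniaturization is pursued despite the handling problems described above.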
In order to improve this last aspect, ever more sensitive detection procedures are
used. Fluorescence measuring techniques are particularly sensitive. In the simplest
case, a fluorescent group such as coumarin (▶ Sect. 14.6) is incorporated in
the place of para-nitroanilide. The protein–ligand binding can also be followed by
fluorescence anisotropy (or polarization). A known ligand is coupled to
a fluorophore and excited with polarized light. The emitted fluorescence is in this
case also polarized. During the time in which the excited molecule can freely tumble in
solution, the extent of the induced polarization decreases. Because a small molecule
tumbles much faster than a big one, its polarization signal decreases much faster
than if it were bound to a large protein. The measurable difference arises because the
bound ligand takes on the slow diffusion behavior of the large protein.
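The size dependence of the polarization signal can be estimated with a minimal sketch. It assumes the Perrin equation r = r0 / (1 + τ/θ) and the Stokes–Einstein–Debye relation θ = ηV / (kB·T) for a spherical particle, which are standard relations but are not given in the text. The radii, the fluorescence lifetime of 4 ns, the limiting anisotropy of 0.4, and the water-like viscosity are illustrative assumptions only.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K

def rotational_correlation_time(radius_m, viscosity_pa_s=1.0e-3, temperature_k=298.15):
    """Rotational correlation time (s) of a sphere: theta = eta * V / (k_B * T)."""
    volume = 4.0 / 3.0 * math.pi * radius_m ** 3
    return viscosity_pa_s * volume / (K_B * temperature_k)

def anisotropy(r0, lifetime_s, theta_s):
    """Steady-state anisotropy from the Perrin equation r = r0 / (1 + tau / theta)."""
    return r0 / (1.0 + lifetime_s / theta_s)

R0 = 0.4          # assumed limiting anisotropy
LIFETIME = 4e-9   # assumed fluorescence lifetime in seconds

theta_free = rotational_correlation_time(0.6e-9)   # hypothetical free labeled ligand
theta_bound = rotational_correlation_time(3.0e-9)  # hypothetical protein-ligand complex

r_free = anisotropy(R0, LIFETIME, theta_free)
r_bound = anisotropy(R0, LIFETIME, theta_bound)

print(f"free ligand:  theta = {theta_free * 1e9:.2f} ns, r = {r_free:.3f}")
print(f"bound ligand: theta = {theta_bound * 1e9:.1f} ns, r = {r_bound:.3f}")
```

With these assumed values the free ligand loses almost all of its polarization within the fluorescence lifetime, whereas the bound ligand retains most of it, which is the contrast that the assay exploits.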
Even better sensitivity can be achieved with so-called FRET measuring
techniques (fluorescence resonance energy transfer). A resonance energy transfer
can occur between donor and acceptor fluorophores with overlapping spectra if both
are separated by no more than 50 Å. If, for example, a phosphatase assay is desired,
a phosphorylated peptide substrate must be coupled with a covalently bound donor
fluorophore. The substrate is added with the test compound. Depending on how
potent the inhibiting test compound is, the enzyme’s activity is reduced, and less
substrate is cleaved. Then an antibody is added that binds to the phosphorylated
substrate. The antibody is coupled to a fluorescence acceptor, the absorption
maximum of which overlaps with the emission spectrum of the donor fluorophore.
If a fair amount of phosphorylated substrate is still present, that is, the test
compound is a potent inhibitor, the spatial proximity of the donor and acceptor
leads to a strong FRET signal. This can be quantitatively measured.
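Why the FRET signal depends so sharply on the proximity of donor and acceptor can be shown with the Förster relation E = 1 / (1 + (r/R0)^6). This relation is standard but is not stated in the text. The Förster radius R0 of 50 Å is an assumption chosen here only because the text names 50 Å as the relevant distance scale; real donor–acceptor pairs have their own R0.

```python
def fret_efficiency(distance_angstrom, forster_radius_angstrom=50.0):
    """Energy-transfer efficiency E = 1 / (1 + (r / R0)**6)."""
    ratio = distance_angstrom / forster_radius_angstrom
    return 1.0 / (1.0 + ratio ** 6)

# Efficiency at several donor-acceptor distances (R0 = 50 angstrom assumed)
for distance in (25.0, 50.0, 75.0, 100.0):
    print(f"r = {distance:5.1f} angstrom: E = {fret_efficiency(distance):.3f}")
```

Because of the sixth-power dependence, the transfer is nearly complete at half the Förster radius and almost vanishes at twice that distance, so only the antibody-bound substrate contributes appreciably to the signal.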
By now, progress in assay miniaturization allows the detection of single
molecules. This is possible by using fluorescence correlation spectroscopy (FCS).
A confocal laser microscope irradiates approximately a femtoliter of test solution.
If a single fluorophore diffuses through the volume of interest, it causes a
time-resolved fluctuation in the fluorescence signal. An exact analysis of these signals
delivers information about the concentration and diffusion constants. The diffusion
velocity, in turn, depends on whether the fluorescence-marker-labeled
substance is bound to a protein or not. If the proteins as well as the ligands are
tagged with different markers, the association and dissociation can even be
followed.
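That a femtoliter observation volume really corresponds to single molecules follows from a simple count, N = c · V · N_A. The volume of one femtoliter is taken from the text; the concentrations are hypothetical examples.

```python
AVOGADRO = 6.02214076e23  # particles per mole

def mean_molecules(concentration_molar, volume_liters=1e-15):
    """Average number of molecules in the observation volume (default 1 fL)."""
    return concentration_molar * volume_liters * AVOGADRO

# Hypothetical concentrations of the fluorescently labeled species
for label, conc in (("1 nM", 1e-9), ("10 nM", 1e-8), ("1 µM", 1e-6)):
    print(f"{label:>5}: on average {mean_molecules(conc):.1f} molecules per femtoliter")
```

At nanomolar concentrations the observation volume thus contains, on average, about one molecule or fewer, so each passage of a fluorophore shows up as a distinct fluctuation.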
7.4 From Binding to Function: Testing in Entire Cells
The binding of a ligand to a protein says nothing about the concomitant function or
change in function. Often it is easy to relate the observed inhibition in an enzyme
assay to a function. The correlation is less obvious with receptors and ion channels
(▶ Chaps. 28, “Agonists and Antagonists of Nuclear Receptors”; ▶ 29, “Agonists
and Antagonists of Membrane-Bound Receptors”; ▶ 30, “Ligands for Channels,
Pores, and Transporters”). If the biochemical pathways and cell cycle regulation
are considered, it becomes even more complex to assign a function to enzymes. This
correlation is not so easily reproduced in a test tube. Therefore assays must also be
developed to study function that allow the response of an entire cell to be measured
upon ligand binding. It is possible to culture cells from many different tissues, which
then allows the study of tissue-specific receptors.
Typically the activity of ion channels can be investigated by using binding tests or
radioactive assays. The so-called patch–clamp technique allows the influence of
a drug candidate to be characterized even better. An electrode is attached to the surface
of a cell, and a voltage or current is applied. In this way the opening or closing of single
channels can be registered, particularly when a test molecule is added. This technique
certainly does not reach the throughput of the high-throughput techniques. It is
better used to elucidate the function of hits from a prescreening. Fluorescence methods
are more popular for the first step. As an example, Ca2+-channel function can be
assessed by measuring an increase in intracellular calcium levels by using a dye that
fluoresces in the presence of calcium ions.
Other tests employ the coupling to a reporter gene. Receptor stimulation
initiates a signaling cascade that, for some receptors, leads to the transcription of
gene products that are controlled by the relevant promoters (▶ Sect. 28.1). If the
sequence of the relevant gene is replaced with that of a reporter, such as
β-galactosidase, luciferase, or green-fluorescent protein (GFP), then these proteins are
produced by the cell instead. This can subsequently be observed as an easily
detectable signal (Fig. 7.2). As examples, if the produced β-galactosidase cleaves
X-gal, a blue dye is released, luciferase develops an ATP-dependent chemiluminescence,
and the green-fluorescent protein is detectable because of its own intrinsic
fluorescence.
7.5 Back to Whole-Animal Models: Screening on Nematodes
Primary substance testing on animals as it was once carried out is ethically
unjustifiable today. Further, an animal model is not predictive for target-oriented
optimization. Nevertheless it does have advantages. The reaction of an entire
organism to a substance is immediately transparent, the bioavailability is directly
measured, and side effects as well as synergistic effects are straightaway obvious.
Back in 1963, Sydney Brenner recognized the complexity of molecular biology in
that he emphasized the biochemical control of cellular development. He proposed
that the pinworm (the nematode Caenorhabditis elegans) would be the simplest
multicellular organism to investigate. This nematode normally lives in soil and
feeds on bacteria. It can also be easily cultured in microtiter plates and fed with
Escherichia coli bacteria. It is a hermaphrodite, has a short lifespan, reproduces
within 3 days, can be conserved in liquid nitrogen, and is transparent.
Homologous genes have been found in humans for 60–80% of its genes. The
nematode genome has been sequenced, and we now understand how to easily
manipulate it. Because the animal is transparent, internal changes can be easily observed,
for instance, by tagging proteins with fluorescent markers. Its 959
somatic cells form many different organs, including a nervous system with 302
neurons. Can substance testing be carried out in such a life form? The ethical
threshold may be set lower in this case. But then, how predictive would any tests
be? Can such an animal be used to predict mood changes, depression, or appetite
and its relation to obesity? This is only possible if the causes of these diseases are
known on the molecular level, for example, a defect caused by an altered serotonin-
mediated signaling. In such a situation the worm can serve as a model. A first step
toward the discovery of a potential target is selective gene silencing. This is
possible by using RNA interference (▶ Sect. 12.7). If the nematode is
exposed to a substance library, changes in appearance or
behavior can be observed. Is the life expectancy lengthened or shortened? These are indications that
the compounds could interfere with the aging process or are toxic. If there are
changes in muscle cells, a compound might perhaps be useful in neurodegenerative muscle
[Fig. 7.2, schematic: preparation of the construct (promoter for gene A coupled to the GFP gene), cell penetration, test model, activation by active substances, registered signal]
Fig. 7.2 Genes are controlled by promoters. Promoter-initiated gene activation leads to the
synthesis of the relevant protein. By using green fluorescent protein (GFP), an easily observed
assay can be constructed based on this principle. For this, the gene promoter that is activated by
agonist binding is coupled to the GFP gene. Activation of the promoter then delivers not the
original gene product, but rather GFP. The presence of GFP is easily observed
because of its fluorescence upon excitation with ultraviolet light.
disease. Aside from macroscopic changes in the body form, changes in the gene
expression pattern can also be analyzed (▶ Sect. 12.9). Are mutations in proteins
apparent? Certainly the worm does not have the same metabolic pathways as we do.
Even its disease models only partially represent the pathophysiology that is seen in
human disease. Nonetheless the direct testing of compounds on the nematode seems
to afford a new perspective for screening substance libraries. As alternatives, the
fruit fly (Drosophila melanogaster) and the zebrafish (Danio rerio) are also available
as test organisms. They help to test the validity of a therapeutic approach early in
a program.
7.6 In Silico Screening of Virtual Libraries
As described in the previous section, experimental high-throughput screening
(HTS) has been automated with great effort. When fed with compounds from
combinatorial chemistry (▶ Chap. 11, “Combinatorics: Chemistry with Big Num-
bers”), several hundred thousand substances can be scanned by using HTS. At first
it seemed that this would be the end of all rational structure-based techniques. In
view of the enormous financial investment and the disappointingly low hit rate, the
initial euphoria soon waned. As an alternative, the technique
of searching huge databases on the computer by fitting small molecules into
a predefined binding pocket (docking, ▶ Sect. 20.8) was developed (virtual
screening).
The unsatisfactory hit rate from HTS is attributed to the size, structural diversity,
and poorly selected composition of the substance library with respect to the actual
properties of the target protein. The recognition of false-positive and
false-negative hits in biological systems causes large problems. Disappointing hit
rates have been reported for the translation of initial hits into potential lead
structures for lead optimization. This is all the more reason to attempt to develop
virtual screening techniques into a complementary and alternative method. The
prerequisites for the successful use of these techniques are entirely different
from those of the technology-driven HTS: virtual screening can only reasonably be
applied if the factors that are responsible for a putative drug to bind to its target
protein are understood on the molecular level.
The starting point for this is the spatial structure of the target protein, which is
usually determined by NMR spectroscopy (▶ Chap. 13, “Experimental Methods of
Structure Determination”) or X-ray structure analysis (Fig. 7.3). Models can be
increasingly derived from structurally homologous proteins of known geometry
(▶ Sect. 20.5). To successfully bind to a protein, the ligand must adopt a shape that
is complementary to the binding pocket. Molecules are flexible and can change
their shape through bond rotations that require very little energy (▶ Chap. 16,
“Conformational Analysis”). In addition to spatial fit in a suitable conformation,
the functional groups of a potential ligand must find complementary functional
groups in the binding pocket of the protein. Hydrogen bonds must be formed
between ligand and protein, and hydrophobic molecular portions must find their
counterpart in the protein (▶ Chap. 4, “Protein–Ligand Interactions as the Basis for
Drug Action”). For this, the protein binding pocket is analyzed to highlight the
areas that are essential for binding.
For a particular atom type, for instance, a hydrogen-bond donor or acceptor, the
binding pocket is systematically scanned. By using computer graphics, it is possible
to see where functional groups attached to a candidate ligand might be optimally
placed (▶ Sect. 17.10). The composite picture of all such placed atom types in the
binding pocket that are indicated by this analysis reveals a spatial pattern of
physicochemical properties that a ligand must meet to successfully bind to the
protein (“hot spots” ▶ Sect. 17.1 and ▶ 17.10). With these criteria in hand,
a molecular database can be searched that is composed of already-synthesized
compounds or compounds that have been virtually assembled on the computer.
[Fig. 7.3, schematic of the virtual screening workflow, panels a–h, including the computer screening of a compound database]
Fig. 7.3 The spatial structure of a protein is the starting point for virtual screening (a). The binding
pocket is explored with a variety of different probe atoms, for instance, for hydrogen bond acceptors
or donors (b). Regions that are particularly favorable for such interacting groups are highlighted on
the computer graphics. If the “hot spots” in these areas are summarized, a spatial pattern of properties
that a potential ligand should have becomes apparent (c). This pattern is called a “pharmacophore” and
serves as the search criterion for a database retrieval (d). Potential ligands from a large database are
filtered and energetically evaluated by docking (e). The found hits are either commercially available
or synthesized in the laboratory (f). Next biological testing takes place (g), and if the binding is
successful, the lead structure is crystallized with the protein. The subsequent structural determination
(h) serves as a starting point for further design cycles.
In case a hit from the latter group is found, the compound can be subsequently
synthesized. The search is divided into multiple filtering steps that become increas-
ingly stringent and sophisticated with successive reduction of the search quantity.
With the help of fast docking programs (▶ Sect. 20.8), molecules are fitted into the
binding pocket and a binding geometry is generated, from which the expected
binding affinity can be estimated. This step is the decisive one, but unfortunately it
is also the most difficult (▶ Sect. 20.9). In ▶ Chap. 21, “A Case Study: Structure-
Based Inhibitor Design for tRNA-Guanine Transglycosylase”, examples are
presented that were found by virtual screening.
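The successive filtering described above can be pictured with a minimal Python sketch. Everything in it is an illustrative assumption rather than a method from this chapter: the `Candidate` fields, the threshold values, and in particular the precomputed `docking_score`, which stands in for the result of a real docking program. The point is only the order of the steps: cheap filters first, the expensive evaluation last and only on the survivors.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    mol_weight: float      # molecular weight in Da
    n_donors: int          # number of hydrogen-bond donors
    n_acceptors: int       # number of hydrogen-bond acceptors
    docking_score: float   # hypothetical score, lower means better


def cascade(library, max_weight=500.0, min_donors=1, min_acceptors=2, top_n=2):
    """Apply cheap filters first, then rank the survivors by the costly score."""
    # Step 1: a cheap property filter removes part of the library.
    step1 = [c for c in library if c.mol_weight <= max_weight]
    # Step 2: a crude pharmacophore-like check on required functional groups.
    step2 = [c for c in step1
             if c.n_donors >= min_donors and c.n_acceptors >= min_acceptors]
    # Step 3: only the remaining candidates are ranked by the docking score
    # (in a real workflow the score would be computed here, not looked up).
    ranked = sorted(step2, key=lambda c: c.docking_score)
    return ranked[:top_n]


# Made-up library entries for illustration only.
library = [
    Candidate("A", 320.0, 2, 3, -7.1),
    Candidate("B", 610.0, 3, 5, -9.0),  # too heavy, removed in step 1
    Candidate("C", 280.0, 0, 4, -8.2),  # no donor, removed in step 2
    Candidate("D", 450.0, 1, 2, -8.5),
    Candidate("E", 390.0, 2, 2, -6.0),
]
hits = cascade(library)
print([c.name for c in hits])
```

Compound B would have the best score, but it never reaches the ranking step because it fails the first filter; this is the intended behavior of a cascade that reduces the search quantity before the expensive step.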
The evaluation of the generated binding geometries is accomplished with suffi-
cient accuracy in about 70% of cases nowadays. An improvement in predictive
power requires that we understand the ligand–protein recognition process better
(▶ Chap. 4, “Protein–Ligand Interactions as the Basis for Drug Action”). The role
of water in the binding, the induced steric and dielectric adaptation, the plastic
behavior and residual mobility of proteins and bound ligands, and the dynamic
changes during complex formation are still poorly understood. The composition of
the databases themselves plays a decisive role in the search’s success. Enlarging the
database alone is not enough. The enrichment of the compounds that could fulfill
the requirements is crucial. Screening is often compared to the search for a needle
in a haystack. When looking for such a needle, it is not helpful to simply double the
size of the haystack! The haystack must be spiked with more promising needles. To
achieve this, all available knowledge about the structure, function, and dynamic
behavior of the target protein must be used to define the database search. Compar-
isons between proteins and protein binding pockets, especially among members of
the same protein family, can offer decisive information (▶ Sects. 20.3, ▶ 20.4,
▶ 20.5, ▶ 20.6). In principle, all of the data that are needed about the composition
of a suitable compound library for a virtual screening are already intrinsically coded
in the structure and geometric interaction properties of the binding pocket. It is only
a question of applying it correctly. Another decisive criterion for a hit is an adequate
pharmacokinetic profile so that satisfactory bioavailability can be achieved
(▶ Chap. 19, “From In Vitro to In Vivo: Optimization of ADME and Toxicology
Properties”).
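The haystack argument can be made concrete with a small Python sketch. The numbers are invented for illustration, and the assumption that doubling the library with arbitrary compounds adds no further actives is a deliberate simplification. The enrichment factor used here is the usual ratio of the hit rate in a selected subset to the hit rate in the whole library.

```python
def hit_rate(n_actives, n_compounds):
    """Fraction of compounds in a set that are active."""
    return n_actives / n_compounds


def enrichment_factor(actives_selected, n_selected, actives_total, n_total):
    """Hit rate of the selected subset relative to the full library."""
    return hit_rate(actives_selected, n_selected) / hit_rate(actives_total, n_total)


# Made-up library: 50 actives hidden among 100,000 compounds.
n_total, actives_total = 100_000, 50
rate_original = hit_rate(actives_total, n_total)

# Doubling the haystack without adding needles (simplifying assumption)
# halves the hit rate.
rate_doubled = hit_rate(actives_total, 2 * n_total)

# A focused subset of 1,000 compounds that contains 20 of the actives.
ef = enrichment_factor(20, 1_000, actives_total, n_total)

print(rate_original, rate_doubled, round(ef, 1))
```

A knowledge-based preselection of this kind raises the chance per tested compound, whereas mere enlargement of the database lowers it.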
7.7 Biophysics Supports Screening
Surface plasmon resonance techniques are being increasingly used to screen for
new lead structures. For this a target molecule is anchored onto the gold-coated
surface of a sensor chip. The underside of a glass carrier is irradiated with light
(Fig. 7.4). Changes in the refractive index, which are measured as a shift in the resonance
angle of the totally internally reflected light, are a measure of the change in mass on the sensor surface.
If a compound binds, the resulting change in mass on the gold surface can be
registered. Because the technique is fast and time resolved, other kinetic parameters
such as the association or dissociation rate constants of the binding event can be
measured in addition to the stoichiometry. One problem associated with screening
in microtiter plates is the huge amount of time that is needed to load the plate with
compounds. One way around this bottleneck is to apply the entire compound library
to a sensor chip in a microarray format by using spraying techniques. This means
now all the low-molecular weight ligands are anchored on the chip. If a test receptor
protein is added to such a chip, a mass difference is detected where the protein
binds. Because of the spatial resolution of the chip, it can easily be determined
which library compound is responsible for the interaction. The disadvantage of the
method is that the test compounds must be attached with a chemical anchor that
allows them to be immobilized on the chip surface. Surface plasmon resonance has
meanwhile achieved a sensitivity that allows the detection of even very small test
compounds with a mass as small as 100 Da. Therefore the approach can be
reversed: Now the protein is immobilized at the surface and ligand binding from
solution can be recorded.
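How the association and dissociation rate constants mentioned above relate to the measured signal can be sketched in Python. The sketch assumes the simplest case, a 1:1 binding model without mass-transport effects; the function names and the numerical values are illustrative assumptions and are not taken from any instrument software.

```python
import math


def dissociation_constant(k_on, k_off):
    """K_d in mol/L from k_on (L mol^-1 s^-1) and k_off (s^-1)."""
    return k_off / k_on


def association_response(t, conc, k_on, k_off, r_max):
    """Signal during ligand injection for a 1:1 binding model."""
    k_obs = k_on * conc + k_off            # observed rate constant
    r_eq = r_max * k_on * conc / k_obs     # plateau at this concentration
    return r_eq * (1.0 - math.exp(-k_obs * t))


def dissociation_response(t, r_start, k_off):
    """Signal decay after the ligand flow is replaced by buffer."""
    return r_start * math.exp(-k_off * t)


# Made-up example values.
k_on, k_off, r_max = 1.0e5, 1.0e-2, 100.0
k_d = dissociation_constant(k_on, k_off)

# At a ligand concentration equal to K_d the plateau is half of r_max.
plateau = association_response(1.0e4, k_d, k_on, k_off, r_max)
print(k_d, plateau)
```

Two ligands with the same equilibrium constant can differ widely in their rate constants, which is why the time-resolved sensorgram carries more information than an end-point measurement.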
In Sect. 7.1, the concept of “ligand efficiency” was introduced. To take the latter
aspect into consideration, test libraries are being increasingly supplied with com-
pounds that have a molecular weight of less than 250 Da. In the meantime the term
chemical “fragment” has become popular for these search candidates. The term is
a bit unfortunate because the molecules are actually “complete” small molecules
and not, as the term might suggest, simply a “fragment”, that is, an
additional building block to be attached to a lead structure.
Proteins denature when they are heated. The temperature at which the
unfolding process (▶ Sect. 14.2) occurs is defined as the “melting temperature”. This temperature can be
measured very sensitively with a thermal sensor. The binding of a ligand to a protein
[Fig. 7.4, schematic: light source, polarized light, prism, sensor chip with gold film, flow channel with target protein and test ligand, reflected light, detector; plot of intensity versus angle for states I and II; sensorgram of resonance signal versus time with kon and koff phases]
Fig. 7.4 The principle of surface plasmon resonance (SPR). The method registers changes in the
refractive index on the surface of a sensor chip (green). The extent of the changes on the gold surface
that are caused by the binding of the substrate molecule (yellow) onto an anchored receptor (red) leads
to a shift in the resonance angle of the reflected light (I and II). That way, not only the binding affinity
but also the kinetic association (kon) and dissociation (koff) parameters are measured.
changes this melting point. As described in Sect. 7.3, fluorescence measurements
are extremely sensitive indicators. Melting can be registered because
the unfolded protein interacts with a fluorescent dye, and the resulting change in the fluorescence
signal can be detected. The temperature shift caused by ligand binding can be used
as evidence as to whether a ligand is bound to a protein or not. It has also been
possible to construct quantitative binding assays exploiting this effect. This very
sensitive technique is also suitable to detect weakly binding fragments.
Mass spectrometry has developed significantly in the last decades. By applying
very gentle bombardment conditions it is possible to detach single electrons from
huge biomacromolecules, or even to generate negatively charged species. In the
best case, it is possible to detect the investigated protein in its intact form as
a singly charged ion. The charged particles are then accelerated between charged,
parallel capacitor plates. The beam of charged particles can be bent by
the application of a magnetic field. The flight path of a particular particle depends
on its mass and charge. In this way it is possible to separate and detect particles
based on their mass-to-charge ratio. This principle has been refined with the most
sophisticated technology and adept combination of electrical and magnetic fields so
that it is now possible to detect mass differences of only a few daltons among
even huge proteins. Clever experimental conditions allow a given situation in
solution, for instance, a protein–ligand complex, to be carried across into the gas
phase without decomposition. There it is ionized and detected in the mass spectrom-
eter. With this technique an assay is at our disposal that can be used to detect the
binding of very small ligands to proteins. It is even possible to cause the tailored
decomposition of the complexes by varying the acceleration voltage. By registering
the voltage at which the decomposition occurs, the strength of the protein–ligand
complex can be assessed. Because the decomposition occurs in the gas phase,
information about the binding strength of such complexes in a water-free environ-
ment is available.
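The dependence of the flight path on mass and charge can be written out for the simplest textbook case, a single magnetic sector behind an accelerating voltage. This is an idealized sketch and not a description of the combined electric and magnetic field arrangements mentioned above. The symbols are assumptions introduced here: $m$ is the ion mass, $z$ its charge number, $e$ the elementary charge, $U$ the acceleration voltage, $v$ the ion velocity, $B$ the magnetic flux density, and $r$ the radius of the circular path.

```latex
\begin{align}
  z e U   &= \tfrac{1}{2}\, m v^{2}
          && \text{(energy gained during acceleration)} \\
  z e v B &= \frac{m v^{2}}{r}
          && \text{(magnetic force supplies the centripetal force)} \\
  \frac{m}{z} &= \frac{e\, B^{2} r^{2}}{2U}
          && \text{(after eliminating } v\text{)}
\end{align}
```

At fixed $B$ and $U$, ions with a larger mass-to-charge ratio therefore travel on a path with a larger radius, which is the basis of the separation.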
Ligands can also be “fished” with proteins. For this, a protein for which a ligand
is sought, is exposed to an entire library of test compounds in aqueous solution.
Whatever compounds from the library bind to the protein are captured. The protein
is then separated with a microfilter, and the bound ligands are released by chemically
denaturing the protein. The solution with the released ligands is then
processed, and a micro-HPLC separation is carried out. The chromatographically
separated ligands are then subjected to a very sensitive analysis to determine which
members of the original library were fished out by the protein.
The binding process of a ligand to a protein represents a chemical reaction. As
with all chemical reactions, a more or less pronounced heat of reaction can be
observed. The process can either release (exothermic) or absorb heat (endothermic).
This heat signal can be recorded to register the binding event of a ligand to a protein.
A very sensitive calorimeter is required. When equipped with an electronically
controlled compensatory heating, these devices can achieve astonishing sensitivity.
As an example, such a device was built to study the activity of a butterfly that was
being enticed with different pheromones. The heat that was generated with the stroke
of the wing was detected as a signal by the calorimeter.
A dissolved ligand can be titrated by dropwise injection into the solution of
a target protein in such a calorimeter. Each drop results in a heat signal. Upon
increasing saturation of the protein, the heat signal decreases so that a curve can be
generated from which the binding constant of the ligand can be deduced (Fig. 7.5).
If all the signals are integrated over the entire titration, the total heat of reaction for
the binding event is determined. With this, two different thermodynamic binding
characteristics are measured. The free energy ΔG is determined from the equilibrium
constant, and the enthalpy ΔH is given by the integrated heat signal (▶ Sect.
4.3). By using Eq. 4.3, the entropy of binding can be calculated. It is important that
in addition to the proof of ligand binding to the protein, the most relevant thermodynamic
parameters ΔG, ΔH, and ΔS are assessable in one experiment at one
temperature.
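The arithmetic behind this evaluation is short enough to sketch in Python. The sketch uses the two relationships given with Fig. 7.5, ΔG = RT ln Kd and ΔG = ΔH − TΔS; the function name, the temperature of 298.15 K, and the example values for Kd and ΔH are assumptions chosen for illustration, not measured data.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol K)


def binding_thermodynamics(k_d, delta_h, temperature=298.15):
    """Return (delta_g, minus_t_delta_s, delta_s) for a binding event.

    k_d         dissociation constant in mol/L (relative to a 1 M standard state)
    delta_h     binding enthalpy in kJ/mol from the integrated heat signal
    temperature absolute temperature in K (assumed value by default)
    """
    delta_g = R * temperature * math.log(k_d)   # free energy from K_d
    minus_t_delta_s = delta_g - delta_h         # rearranged: dG = dH - T dS
    delta_s = -minus_t_delta_s / temperature    # entropy in kJ/(mol K)
    return delta_g, minus_t_delta_s, delta_s


# Made-up example: a micromolar binder with an exothermic signature.
dg, mtds, ds = binding_thermodynamics(k_d=1.0e-6, delta_h=-40.0)
print(round(dg, 2), round(mtds, 2))
```

In this invented example the binding is enthalpically driven: ΔH is more negative than ΔG, so the entropic term −TΔS is positive and opposes binding.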
The method of isothermal titration calorimetry is not for high throughput. It is
better for the analysis and description of the binding process. Because of its
importance, particularly with the optimization of ligands in mind, this method is
considered again in ▶ Sect. 8.8.
[Fig. 7.5, plots: heat signal (μJ/s) versus time (min) and integrated heat (kJ/mol) versus molar ratio, annotated with the stoichiometry, ΔG, and ΔH]
Fig. 7.5 In isothermal titration calorimetry, a solution of a ligand is added dropwise to a solution
of a protein. The binding to the protein leads to an exothermic or an endothermic reaction.
The heat that evolves upon the addition of each drop is the area under the single signal peaks.
The total integral of all signal peaks is the binding enthalpy ΔH. With increasing amount of
ligand the protein becomes saturated so that the signal intensity of the heat signal decreases.
The binding constant (dissociation constant) can be derived from the shape of the curve and
the free energy ΔG can be obtained from the relationship $\Delta G = RT \ln K_\mathrm{d}$. The stoichiometry
of the reaction is simultaneously obtained. The entropy is calculated by using the equation:
$\Delta G = \Delta H - T\Delta S$.
7.8 Screening by Using Nuclear Magnetic Resonance
The method of NMR spectroscopy is presented in ▶ Sect. 13.7 in greater depth.
Here it suffices to say that it has to do with the orientation of magnetic moments of
the nuclei in a substance sample. By applying a carefully chosen spatial and time-
resolved sequence of electromagnetic fields, it is possible to specifically activate
nuclei that are oriented within these magnetic fields. This can be carried out for one
type of nucleus in a protein. If a solution of test ligands or an entire mixture of
ligands is added to such a solution, protein binding can occur, assuming that the
ligands are suitable. According to their binding strength, they reside for a particular
length of time on the magnetically saturated protein. In doing so the magnetic
signal is transferred from the protein to the ligand. Upon dissociation, the changed
magnetic characteristics can be spectroscopically detected because the transferred
magnetization relaxes only slowly in the uncomplexed state. The
solution is measured with the saturated and with the unsaturated protein. Then the difference
between the spectra is evaluated. Signals are recognizable only for ligands that
had been bound to the protein in the interim and have therefore experienced
magnetization transfer. The so-called saturation transfer difference (STD) spec-
trum can be used to screen for possible ligands (Fig. 7.6). Many different variations
and elaborate experimental protocols have been developed for the above-described
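[Fig. 7.6, schematic: selective RF saturation of the protein, fast exchange of the ligands, and the difference spectrum (saturated minus unsaturated)]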
Fig. 7.6 To determine the saturation transfer difference (STD) with NMR spectroscopy, a library
of test ligands is added to a target protein (ellipse). Potential binders reside for
a finite time span bound to the protein. If the nuclear spin of one type of nucleus in the protein is
selectively saturated (red) by using a suitable resonance frequency (RF), the protein magnetization
can be transferred (nuclear Overhauser effect, see ▶ Sect. 13.7) to the ligand that was bound in the
meantime. These ligands become apparent because their spectrum is altered even though they
have already dissociated from the protein. If the difference between the spectra in the presence of the
saturated and unsaturated protein is displayed, it is possible to determine which ligands were
transiently bound to the protein. Many variations and sophisticated experimental protocols have
been developed for the principle of magnetization transfer.
principle of magnetization transfer. Even so-called reporter or spy
ligands, which have an easily measured NMR signal, are used. The resonance of
fluorine atoms is particularly well suited. For this, a fluorine-containing reporter
ligand that binds to the protein is needed; its binding should not be too strong
though. The ligand should be easily displaced from the protein by the test ligand.
This release is detectable as a change in the fluorine NMR spectrum, and reveals the
binding of the test ligand in this way. As is explained in detail in ▶ Sect. 13.7,
the spatial structure of proteins can be determined by isotopic labeling and the
measurement of mutually coupled NMR spectra. By such means, where a test
ligand binds to a protein can be accurately determined by evaluating the specific
resonance shifts of the labeled protein. In the best case, it is even possible to see two
ligands binding at once, or two different ligands binding at different,
non-overlapping positions in the binding pocket. The research group of Steven Fesik
at Abbott developed this method. It is known as “SAR by NMR” (SAR stands for
structure–activity relationship) and is used for lead-structure identification and
optimization. A nanomolar inhibitor for the matrix metalloproteinase stromelysin
(▶ Sect. 25.6) was found with this method. First a potent head group was sought
that could bind to the zinc ion in the catalytic center of this protease. Just such
a molecule, acetohydroxamic acid 7.1, was found with an admittedly weak but
specific binding of Kd = 17 mM (Fig. 7.7). After the discovery of this ligand, the
binding site on zinc was saturated by this compound. Further NMR measurements
concentrated on the search for a ligand suited to fill the neighboring S1′ binding
pocket. For this, a small library of heteroarylphenyl and biphenyl derivatives was
employed. 4-Cyano-4′-hydroxybiphenyl 7.2 was identified as a hit. On the right
side of Fig. 7.7 both ligands are shown in the binding pocket. The evaluation of the
structural data showed that the hydroxylated phenyl ring binds in proximity to the
methyl group of the acetohydroxamic acid. Therefore connecting the fragments was
the next obvious thing to do. An ethyleneoxy group was used as a bridge and was
coupled to the cyanobiphenyl moiety. NMR spectroscopy confirmed this structural
hypothesis and an inhibitor, 7.3, with an affinity of 25 nM was produced.
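The gain achieved by linking can be put into numbers with a short Python sketch. It takes the affinities quoted for Fig. 7.7 (Kd = 17 mM for 7.1, Kd = 0.02 mM for 7.2, and 25 nM for 7.3) and compares the linked inhibitor with a naive estimate in which the free energies of the two fragments simply add. Several points are assumptions made here for illustration: the temperature of 298.15 K, the use of a 1 M standard state, and the treatment of the 25 nM value, which the figure gives as an IC50, as if it were a dissociation constant. Simple additivity also ignores the linker and entropic effects, so the comparison is only a rough guide.

```python
import math

R = 8.314462618e-3   # gas constant in kJ/(mol K)
T = 298.15           # assumed temperature in K


def delta_g(k_d):
    """Free energy of binding in kJ/mol from K_d in mol/L (1 M standard state)."""
    return R * T * math.log(k_d)


kd_fragment_1 = 17e-3     # acetohydroxamic acid 7.1
kd_fragment_2 = 0.02e-3   # biphenyl 7.2
affinity_linked = 25e-9   # inhibitor 7.3, IC50 used as a stand-in for K_d

# Naive additivity: free energies add, so the K_d values multiply.
dg_additive = delta_g(kd_fragment_1) + delta_g(kd_fragment_2)
kd_additive = math.exp(dg_additive / (R * T))

# Difference between the linked compound and the additive estimate.
dg_linked = delta_g(affinity_linked)
extra_gain = dg_linked - dg_additive

print(kd_additive, round(extra_gain, 1))
```

Under these assumptions the additive estimate corresponds to 17 mM × 0.02 mM, that is, about 340 nM, so the linked compound binds more tightly than the sum of its parts would suggest.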
7.9 Crystallographic Screening for Small Molecular
Fragments
Crystal structure analysis delivers the most exact spatial position of a molecule in
the binding pocket of a protein. Even the geometry of small, very weakly binding
molecules is easily recognized. In structures that have a resolution better than
2–2.5 Å (▶ Sect. 13.5), water molecules are usually still recognizable as discrete
density maxima. Often, they indicate sites in the binding pocket that can be
equivalently accommodated by polar functional groups of ligands (Fig. 7.8). In
the early 1990s, Dagmar Ringe in the research group of Greg Petsko exposed
protein crystals intentionally to solvent molecules to allow the solvent to diffuse
into the crystals (▶ Sect. 20.2). The solvent molecules can act as probes in that they
populate binding regions of the protein pockets. As an example, the areas where
isopropanol, acetonitrile, or acetone are encountered in thermolysin, a zinc prote-
ase, are shown in Fig. 7.8.
Even phenol, a small organic molecule, manages to diffuse into the binding pocket.
Benzylsuccinic acid, a lead structure with a typical fragment size, binds to the zinc
protease. Its binding position has been determined by crystallography. The phenyl
ring of this molecule sits in the position that is also explored by phenol. One of the
acid groups of the succinic acid is in the position that was indicated by the carbonyl
carbon of acetone. The second acid group coordinates to the zinc ion and occupies
positions where water molecules resided in the uncomplexed state (Fig. 7.8). There
are many protein–ligand complexes in which small molecules from the crystallization
solution or cryobuffer were adsorbed. These can be used as probes to map out
[Fig. 7.7, panels a–e: stromelysin binding site with Zn2+ and the S1′ pocket; (a) 7.1, Kd = 17 mM; (b) 7.1 (Kd = 17 mM) together with 7.2 (Kd = 0.02 mM); (c) 7.3, IC50 = 25 nM; (d, e) binding-pocket views with His205, His211, Val163, and Zn2+]
Fig. 7.7 In the “SAR by NMR” method, ligands with weak affinity to a protein, in this case
stromelysin, are sought from a large complex mixture. ¹⁵N-labeled protein is used and so-called
¹H-¹⁵N HSQC spectra are measured. If a ligand such as acetohydroxamic acid 7.1 becomes
apparent through a shift in the resonance of specific amino acids that protrude into the binding
pocket, the binding geometry can be deduced (a, d). Later the binding site is saturated with these
ligands. Further NMR measurements are carried out to identify ligands for neighboring binding
positions. These are revealed by the shift in the resonances of neighboring amino acids. That is how
4-cyano-4′-hydroxybiphenyl 7.2 was discovered (b, d). A chemical coupling of both hits 7.1 and
7.2 with a –CH2CH2O– linker produced 7.3, which is a nanomolar inhibitor of the protease
stromelysin (c, e).
[Fig. 7.8, panels a and b: thermolysin binding pocket with Phe114, Asn112, Arg203, and Zn2+; probe molecules water, acetone, acetonitrile, isopropanol, and phenol; benzylsuccinic acid]
Fig. 7.8 It was possible to soak small probe molecules (so-called “fragments”) into crystals of the
protease thermolysin. (a) Superposition of multiple structures in which water (red spheres),
isopropanol (C atoms are gray), acetone (C atoms are light blue), acetonitrile (C atoms are
green), and phenol (C atoms are violet) had penetrated the crystals. They describe potential
positions for functional groups of putative ligands. The structure of benzylsuccinic acid, a weakly
binding inhibitor of thermolysin, is also shown in (b). That molecule coordinates with one of its
acid groups to the catalytic zinc ion (upper row). Both oxygen atoms of the acid group displace
two water molecules that are present in the non-complexed structure. The other carboxylate group
forms a salt bridge with the neighboring Arg203. The oxygen of an acetone molecule was found at
almost the same position. The phenyl ring of benzylsuccinic acid occupies nearly the same position
as the phenol molecule detected in the fragment structure. Benzylsuccinic acid can be used as
a starting structure for further optimization.
a binding pocket. A creative scientist will directly exploit their position for the design
of new drug candidates. From there, it was obvious to use crystal structure analysis as
a method to screen small molecules or “fragments” (MW < 250 Da).
Even today a crystal structure determination is fairly laborious. All the same, it
can be largely automated so that a few hundred molecules can be processed. In
addition, the tendency of small molecules to diffuse into mature protein crystals can
also be used (so-called “soaking”; ▶ Sect. 13.9). If a “cocktail” of multiple test
substances is used, the screening can be accelerated. A protein crystal can be
exposed to up to 10 compounds at once. The composition of the cocktails is
designed so that a mixture of different shapes (long and stretched, angular, spherical,
etc.) is present. This makes it easier to distinguish them later in the electron
density (see ▶ Sect. 12.5). To optimize the effort-to-yield ratio for the crystallo-
graphic screening, often a different screening method is carried out first to pre-filter
possible hits. Only compounds that have been identified as hits in the first screening
are used in the subsequent crystallographic screening. However, only a few tech-
niques that have been described in the previous section are really suitable to find
a small, weakly binding candidate from a fragment library. Frequently this concerns
only millimolar-binding candidates.
The hits from the crystallographic fragment screening can be further developed
(▶ Sect. 20.7). One possibility is to probe the different regions of the binding
pocket and then connect the pieces with a linker, analogously to what was
described in Sect. 7.8 in the “SAR by NMR” method. In another, usually more
successful variation, the fragment hits are chemically elaborated upon. For this
approach additional moieties are added on the basis of the crystal structure. In this
way the original hit, which serves as a seed, can be enlarged to bind more strongly to
the protein.
7.10 Tethered Ligands Explore Protein Surfaces
Ligands bind with very poor affinity to flat pockets that are open to the surrounding
solvent. Therefore, it is extremely difficult to detect their binding or obtain
a crystal structure with a ligand bound in such an area. James Wells and his
colleagues at the Sunesis company in San Francisco developed the idea to tether
ligands for this type of binding. From a chemical point of view, this means that
a reaction is carried out with the exposed thiol of a cysteine residue on the protein’s
surface. Such a cysteine must be available in the native protein, or it is appropriately
introduced by mutagenesis (▶ Sect. 12.2). Under suitable reaction conditions, the
ligand is anchored with a disulfide bond, which is formed through the thiol group
of the exposed cysteine (Fig. 7.9). Only those test candidates from the compound
library will react that are able to form an interaction with the surface in the vicinity
of the cysteine thiol group. For all intents and purposes, they explore the surround-
ing region, react with the cysteine, and remain coupled to the surface by the
disulfide bridge. Successfully formed complexes are then detected by mass
spectrometry. James Wells and Robert Stroud chose thymidylate synthase as their
146 7 Screening Technologies for Lead Structure Discovery
first test example. This enzyme plays an important role in the de novo synthesis of
thymidine, an essential building block for DNA. Cells with a high division rate
especially need this building block so that inhibition of this enzyme might represent
potent anti-infective agents or antitumor compounds (▶ Sect. 27.2).
Thymidylate synthase has a cysteine residue in position 146, in the vicinity of
the catalytic site. From a library of 1200 disulfides, compounds 7.4–7.7 proved to be
binders whereas the very similar derivatives 7.8–7.11 were not selected (Fig. 7.10).
Accordingly, the phenylsulfonamide together with the proline moiety seemed to be
essential for binding. Next the disulfide anchor was removed, and the binding
constant for N-tosyl-D-proline 7.12 was measured to be 1.1 mM (Fig. 7.11). To
further test the concept, Cys146 was exchanged for a serine (Fig. 7.12). When no
binding was apparent with this mutant, the neighboring His147 was mutated to
a cysteine, but this mutant could not fish out the N-tosylproline moiety either. In
contrast, the position-143 mutant was successful (Fig. 7.12). In that case a leucine
was exchanged for a cysteine. The subsequently determined crystal structure
showed that the N-tosylprolyl moiety was almost identically bound in both covalently
anchored complexes, just as it is without an S–S anchor (Fig. 7.12). This
is convincing proof that the covalent coupling is not responsible for the binding
geometry. In fact, the technique allows small, initially weakly binding ligands to be
fished out of a large library. Starting from the original millimolar hit 7.12, the side
chain of the natural cofactor methylenetetrahydrofolic acid 7.13 could be transferred
to give 7.14; in this way, the hit was developed into the nanomolar inhibitor 7.15 in
two steps.
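The affinity gain along this series can be put into energetic terms with a minimal sketch. It assumes the standard relation ΔG° = RT ln Ki at 298.15 K and takes the Ki values for 7.12, 7.14, and 7.15 as they are labeled in Fig. 7.11. The heavy-atom count of 18 for N-tosyl-D-proline is counted here from its formula C12H15NO4S and is not stated in the text; the function names are illustrative.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol*K)


def binding_free_energy(ki_molar, temperature=298.15):
    """Standard binding free energy in kJ/mol from Ki in mol/L (dG = RT ln Ki)."""
    return R * temperature * math.log(ki_molar)


def ligand_efficiency(ki_molar, heavy_atoms, temperature=298.15):
    """Binding free energy per non-hydrogen atom, in kJ/mol per atom."""
    return -binding_free_energy(ki_molar, temperature) / heavy_atoms


# Ki values as labeled in Fig. 7.11, converted to mol/L
ki = {"7.12": 1.1e-3, "7.14": 24e-6, "7.15": 330e-9}

for name, value in ki.items():
    print(f"{name}: Ki = {value:.2e} M, dG = {binding_free_energy(value):.1f} kJ/mol")

# Assumption: 18 heavy atoms for 7.12, counted from C12H15NO4S
print(f"LE(7.12) = {ligand_efficiency(ki['7.12'], 18):.2f} kJ/mol per heavy atom")
print(f"Affinity gain 7.12 -> 7.15: {ki['7.12'] / ki['7.15']:.0f}-fold")
```

Even a millimolar fragment can show a respectable free energy contribution per atom, which is why such weak hits are worth elaborating.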
The method of “tethering” can be fairly generally applied. It has especially
achieved success in the search for ligands that disrupt the formation of protein–
protein surface contacts (▶ Sect. 10.6). A great advantage of the technique is that
it is not necessary to develop an additional biochemical binding assay. Weakly
binding ligands are covalently “tethered” and cannot be washed away as happens in
the case of simple complex formation. Further, the covalently bound chemical
probes allow the adaptive capacity of the surface region to be explored.
Fig. 7.9 The thiol group of the exposed cysteine is used as an anchor group for the formation of
disulfide bonds with ligand candidates from a compound library. Only those ligands react that
are also able to interact with the surface region in the vicinity of the cysteine thiol. A crystal
structure was determined from just such a covalently linked complex (Fig. 7.12). After optimization
of the initially discovered hit, the disulfide anchor can be discarded and a non-covalent
inhibitor can be developed.
7.11 Synopsis
• Large substance libraries are screened for biological effects to filter out active
molecules and assess their value for a given indication.
• Three phases are distinguished, a broad automatic introductory screening for
hits, a more detailed screening of chemical analogues around a hit to establish
the first structure–activity relationship, and a lead optimization to find candidates
for clinical testing.
• A prerequisite for high-throughput screening was the development of in vitro
test systems using pure proteins produced by gene technology along with the
entire arsenal of biochemical methods in the test tube so that the function of
single-gene products can be recorded.
• As a disadvantage, high-throughput screening does not assess the entire effect
spectrum and ignores effects such as transport, distribution, metabolism, and
excretion.
• Screening libraries are frequently assembled from molecules from other drug
development projects; as such, they are rather inefficient with regard to their
molecular size and their modest screening hit activity in the micromolar range.
Small substances with high ligand efficiency and sufficient space for structural
optimization are particularly promising.
Fig. 7.10 From a library of 1,200 disulfides, compounds 7.4–7.7 (left) proved to be
binders, whereas the structurally similar derivatives 7.8–7.11 (right) did not bind
to the protein.
• Enzymatic function and its inhibition can be recorded by the production of
chromophoric reaction products.
• Radioactively labeled compounds or enzyme-linked immunosorbent assays are
versatile techniques to record protein function on the molecular level.
• Progress in assay miniaturization calls for sophisticated robotic systems, ever-
improving sensitivity of the read-out, including fluorescence measuring tech-
niques, and reliable logistics to handle the enormous data flow.
• Aggregate formation of hydrophobic test compounds can exert significant influ-
ence on the assay read-out or even cause false positive or negative hits.
• Testing on cell-based assays is performed to study changes in cellular or
organism-related function beyond pure binding of a test compound to a given
protein target.
Fig. 7.11 By transferring a side chain from the natural cofactor methylenetetrahydrofolic acid
7.13, N-tosyl-D-proline 7.12, a millimolar inhibitor, could be transformed into the nanomolar
inhibitor 7.15 in two steps. Ki values given in the figure: 7.12, 1.1 mM; 7.14, 24 μM;
7.15, 330 nM.
• Primary animal testing in vertebrates has been abolished today for ethical
reasons, and it is being increasingly replaced by whole-animal screening that
uses nematodes as the simplest multicellular organisms to record synergistic
and side effects.
• As a complementary and alternative method, virtual computer screening has
been developed to screen large compound libraries by docking ligand candidates
into the known spatial structure of a target protein.
• Binding events are recorded by biophysical methods such as surface plasmon
resonance, thermal stability shifting, mass spectrometry, or microcalorimetry.
They are used to detect ligands as potential binders.
• NMR spectroscopy can be used to detect ligand binding by magnetization
transfer. Multiple binders can be chemically linked to more strongly binding
ligands according to the SAR by NMR technique.
• Exposure of small molecular probes and fragments to protein crystals allows for
the structural characterization of the binding modes of weakly binding fragments
as a versatile starting point to lead optimization.
• Small-molecule fragments tethered to a protein through covalent attachment to
the exposed thiol group of a cysteine residue allow the exploration of the binding
properties of flat, solvent-exposed surface depressions and serve as a starting
point to develop antagonists to perturb the protein–protein interface in complex
formation.
Fig. 7.12 Superpositions of crystal structures of the enzyme thymidylate synthase with two
tethered ligands, one bound to Cys143 (C atoms of ligand 7.4 are green) and the other to
Cys146 (C atoms of ligand 7.4 are violet), both of which are N-tosyl-D-proline derivatives and
which are covalently anchored through S–S bridges. Upon cleavage of the disulfide anchor, the
free N-tosyl-D-proline (C atoms are gray, 7.12) proved to be a ligand with an affinity of 1.1 mM. Its
binding geometry is very similar to both of the covalently anchored derivatives.
Bibliography
General Literature
Blundell TL, Jhoti H, Abell C (2002) High-throughput crystallography for lead discovery in drug
design. Nat Rev Drug Discov 1:45–54
Hajduk PJ, Greer J (2007) A decade of fragment-based drug design: strategic advances and lessons
learned. Nat Rev Drug Discov 6:211–219
Jahnke W, Erlanson DA (2006) Fragment-based approaches in drug discovery. In: Mannhold R,
Kubinyi H, Folkers G (eds) Methods and principles in medicinal chemistry, vol 34. Wiley-
VCH, Weinheim
Jones AK, Buckingham SD, Sattelle DB (2005) Chemistry-to-gene screens in Caenorhabditis
elegans. Nat Rev Drug Discov 4:321–330
Klebe G (2006) Virtual ligand screening: strategies, perspectives and limitations. Drug Discov
Today 11:580–592
Löfås S (2004) Optimizing the hit-to-lead process using SPR analysis. Assay Drug Dev Technol
2:407–415
Siegel MM (2002) Early discovery drug screening using mass spectrometry. Curr Topics Med
Chem 2:13–33
Sotriffer C (2010) Virtual screening. In: Mannhold R, Kubinyi H, Folkers G (eds) Methods and
principles in medicinal chemistry, vol 48. Wiley-VCH, Weinheim
Vogtherr M, Fiebig K (2003) NMR-based screening methods for lead discovery. In: Hillisch A,
Hilgenfeld R (eds) Modern methods of drug discovery. Birkhäuser Verlag, Boston, pp S183–
S120. ISBN 376436081X
Special Literature
Hajduk PJ, Sheppard G, Nettesheim DG, Olejniczak ET, Shuker SB, Meadows RP, Steinman DH,
Carrera GM Jr, Marcotte PA, Severin J, Walter K, Smith H, Gubbins E, Simmer R, Holzman
TF, Morgan DW, Davidsen SK, Summers JB, Fesik SW (1997) Discovery of potent nonpeptide
inhibitors of stromelysin using SAR by NMR. J Am Chem Soc 119:5818–5827
Erlanson DA, Braisted AC, Raphael DR, Randal M, Stroud RM, Gordon EM, Wells JA
(2000) Site-directed ligand discovery. Proc Natl Acad Sci USA 97:9367–9372
8 Optimization of Lead Structures
A lead structure is the starting point on the way to a drug. The potency, specificity,
and duration of effect must be optimized, and the side effects and toxicity must be
minimized in a usually elaborate, iterative process. Every change in the chemical
structure modulates the 3D structure of the molecule, its physicochemical prop-
erties, and the activity spectrum. The isosteric replacement of atoms or groups,
the introduction of hydrophobic building blocks, the dissection of rings or the
restriction of flexible molecular portions into cyclic structures, and the optimiza-
tion of the substitution pattern are all possibilities to purposefully modify a target
structure.
Creativity and luck are always important prerequisites for success in pharmaceu-
tical research. Nonetheless, there is a treasure chest of decades of accumulated
experience that can be exceedingly supportive to the rational optimization process.
The computer-aided methods can contribute to their full capability in this field in
particular. Several general considerations and approaches to lead optimization are
presented in the sections of this chapter. A discussion of the structure-based
and computer-aided optimization of lead structures is presented in ▶ Chaps. 17,
“Pharmacophore Hypotheses and Molecular Comparisons” and ▶ 20, “Protein
Modeling and Structure-Based Drug Design”; examples for its application to differ-
ent therapeutic areas are presented in ▶ Chaps. 23, “Inhibitors of Hydrolases with an
Acyl–Enzyme Intermediate”; ▶ 24, “Aspartic Protease Inhibitors”; ▶ 25, “Inhibitors
of Hydrolyzing Metalloenzymes”; ▶ 26, “Transferase Inhibitors”; ▶ 27, “Oxidore-
ductase Inhibitors”; ▶ 28, “Agonists and Antagonists of Nuclear Receptors”; ▶ 29,
“Agonists and Antagonists of Membrane-Bound Receptors”; ▶ 30, “Ligands for
Channels, Pores, and Transporters”; ▶ 31, “Ligands for Surface Receptors”; ▶ 32,
“Biologicals: Peptides, Proteins, Nucleotides, and Macrolides as Drugs”.
8.1 Strategies for Drug Optimization
The optimization of active substances follows a process that is best characterized by
the words of the philosopher Sir Karl Popper:
The truth is objective and absolute. But we can never be sure that we have found it. Our
knowledge is always an assumed knowledge. Our theories are hypotheses. We test for the
truth in that we exclude what is false. (Objective Knowledge, 1972)
Accordingly the optimization of a compound’s potency follows a working
hypothesis, while an iterative process of trial and error refines the hypothesis. The
assembled data about the relationship between chemical structure and biological
activity serve the design of new structures. These are synthesized and tested, and
the working hypothesis is modified as appropriate. In negative cases, the
hypothesis is discarded and a new one is formulated that fits more harmoniously
with the biological data. The following qualities in the structure of the active
substance are distinguished from one another:
• The actual pharmacophore (Sects. 8.7 and ▶ 17.1) that is responsible for the
specific binding and upon which only limited chemical modification can be
carried out,
• The additional groups (adhesion groups) that improve the affinity and biolog-
ical activity,
• Further groups that do not influence the binding but rather the lipophilicity of
the molecule and with it the transport and distribution in biological systems
(▶ Chap. 19, “From In Vitro to In Vivo: Optimization of ADME and Toxicology
Properties”),
• The groups that must be cleaved or modified in the organism to release the
actual active form (▶ Chap. 9, “Designing Prodrugs”).
The most important steps in the optimization of lead structures are the systematic
changes in the shape and form, that is, the three-dimensional structure, and/or the
physicochemical properties. Single steps along this route are:
• Changes in the lipophilicity and the electronic properties through the introduc-
tion or removal of hydrophobic or hydrophilic groups,
• Variations of substituents at aromatic or heteroaromatic rings,
• Introduction or elimination of heteroatoms in chains or rings,
• Changes in chain length of aliphatic groups or linkers,
• Introduction of space-filling substituents to stabilize a particular conformation,
• Changes in the ring size of alicyclic or heterocyclic rings,
• Incorporation of flexible partial structures in rings,
• Incorporation of branches or attachments to rings (rigidifying),
• Opening of rings,
• Elimination of chiral centers to simplify a structure,
• Addition of chiral centers to increase the selectivity, or
• Shifts in the thermodynamic binding profile and the drug’s residence time at the
target protein.
These processes are usually unidirectional in classical drug optimization,
that is, the optimization takes place on one position of the molecule at a time, in
one single direction. In the past, such unidirectional optimization has led to many
disappointments because interdependent influences of the structural changes were
neglected, or the optimal lipophilicity was exceeded. John Topliss developed
a scheme for the variation of aromatic substituents that allows the biological activity
to be optimized in a minimum number of steps (Sect. 8.3). The application of
experimental design, simultaneously changing multiple parts of a molecule, and the
evaluation of the results by using quantitative structure–activity relationships
(▶ Chap. 18, “Quantitative Structure–Activity Relationships”) usually allows a fast
and effective optimization. In structure-based and computer-aided optimization, the
3D structure of the target protein and its complexes leads to directed structural
variations of the active substances. Here again, the aspects of total lipophilicity and
metabolism should not be neglected.
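The contrast between unidirectional optimization and an experimental design that varies several positions at once can be sketched as follows. The variation sites and substituent choices are hypothetical and serve only to show how many analogues each strategy covers; a full factorial enumeration is used here for simplicity, whereas practical designs usually select a subset.

```python
from itertools import product

# Hypothetical variation sites and substituent choices (illustrative only)
sites = {
    "R1": ["H", "Cl", "CH3"],
    "R2": ["H", "OCH3"],
    "linker": ["CH2", "O", "NH"],
}


def one_at_a_time(sites):
    """Vary a single site per analogue, keeping the first option elsewhere."""
    reference = {site: options[0] for site, options in sites.items()}
    designs = [dict(reference)]
    for site, options in sites.items():
        for option in options[1:]:
            designs.append({**reference, site: option})
    return designs


def full_factorial(sites):
    """Combine every option at every site with every option at the others."""
    names = list(sites)
    return [dict(zip(names, combo)) for combo in product(*sites.values())]


print(len(one_at_a_time(sites)), "analogues in the one-at-a-time series")
print(len(full_factorial(sites)), "analogues in the full factorial design")
```

Only the combined design contains analogues in which two or more positions differ from the reference, so only it can reveal interdependent effects of structural changes.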
8.2 Isosteric Replacement of Atoms and Functional Groups
Isosteric replacement is the exchange of particular groups in a molecule for
sterically and electronically related groups. If the biological effect is essentially
maintained, the term bioisosteric replacement (Fig. 8.1) is used. In the simplest
case a single atom is exchanged, for instance, a Cl (lipophilic, weakly electron
withdrawing) is replaced by a Br (same characteristics as Cl) or methyl (lipophilic,
weakly electron donating), or an –O– (polar, H-bond acceptor) is exchanged for an
NH (polar, H-bond donor) or a –CH2– (lipophilic, unable to form H-bonds).
Furthermore, bioisosteric replacement also means the exchange of entire groups.
Fig. 8.1 A few possibilities for the isosteric replacement of atoms and/or groups.
Substituents: F-, Cl-, Br-, CF3-, NO2-; Methyl-, Ethyl-, Isopropyl-, Cyclopropyl-, tert-Butyl-;
-OH, -SH, -NH2, -OMe, -N(Me)2
Bridging Groups: -CH2-, -NH-, -O-; -COCH2-, -CONH-, -COO-;
C=O, C=S, C=NH, C=NOH, C=NOAlkyl
Atoms and Groups in Rings: -CH=, -N=; -CH2-, -NH-, -O-, -S-;
-CH2CH2-, -CH2-O-, -CH=CH-, -CH=N-
Larger Groups: -NHCOCH3, -SO2CH3; -COOH, -CONHOH, -SO2NH2, and heterocyclic ring
structures that are drawn in the original figure
For example, –COOH, an H-bond acceptor and donor, can be replaced with other
groups that have the same or modified properties, for instance, with the similarly
acidic tetrazole. Another example can be found in the exchange of a phenyl ring for
a thiophene or a furan building block (Fig. 8.1). The potential of isosteric replacement
is illustrated in the exchange of all three iodine atoms of triiodothyronine T3
8.1 for alkyl groups to give 3,5-dimethyl-3′-isopropylthyronine 8.2, which in turn
retains impressive affinity and agonistic activity on the thyroid hormone receptor.
In contrast to triiodothyronine, which is both iodinated and metabolized by
a deiodinase, the alkyl groups of 8.2 are no longer metabolically cleavable.
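The group lists of Fig. 8.1 lend themselves to a simple lookup. The following sketch transcribes the text entries of the figure (ring structures are omitted) and proposes candidate replacements for a given group. Treating each row of the figure as a set of mutually exchangeable groups is an assumption made here for illustration, and the dictionary and function names are hypothetical.

```python
# Text entries transcribed from Fig. 8.1; the row grouping is an assumption
FIG_8_1 = {
    "substituents": [
        ["F", "Cl", "Br", "CF3", "NO2"],
        ["methyl", "ethyl", "isopropyl", "cyclopropyl", "tert-butyl"],
        ["OH", "SH", "NH2", "OMe", "N(Me)2"],
    ],
    "bridging groups": [
        ["CH2", "NH", "O"],
        ["COCH2", "CONH", "COO"],
        ["C=O", "C=S", "C=NH", "C=NOH", "C=NOAlkyl"],
    ],
    "ring atoms and groups": [
        ["CH=", "N="],
        ["CH2", "NH", "O", "S"],
        ["CH2CH2", "CH2O", "CH=CH", "CH=N"],
    ],
}


def suggest_replacements(context, group):
    """Return the other groups listed in the same row(s) as the given group."""
    candidates = []
    for row in FIG_8_1[context]:
        if group in row:
            candidates.extend(other for other in row if other != group)
    return candidates


print(suggest_replacements("bridging groups", "O"))
print(suggest_replacements("substituents", "Cl"))
```

Such a table only generates ideas; whether a replacement is truly bioisosteric has to be shown by testing the analogue.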
Bioisosteric replacement was and is one of the most important strategies in
pharmaceutical research. Nonetheless, surprises sometimes occur. The replacement
of an ester by an amide group in the local anesthetics (▶ Sect. 3.4) improved the
metabolic stability, as expected. In the case of acetylsalicylic acid 8.3 (Fig. 8.2)
this exchange cannot be made. An analogous exchange of the –COO– group for
a –CONH– group results in a complete activity loss because the amide can no
longer acylate the cyclooxygenase enzyme (▶ Sect. 27.9). In the case of
p-aminobenzoic acid (R = –COOH, Fig. 8.2) the exchange of a carboxyl group
for a sulfonamide group gives sulfanilamide 8.4 (R = –SO2NH2), which is an
antimetabolite of p-aminobenzoic acid (▶ Sect. 2.3).
Fig. 8.2 Isosteric replacement with retention, loss, and reversal of the biological activity. All
three iodine atoms of the thyroid hormone triiodothyronine 8.1 can be replaced with alkyl groups and
compound 8.2 is still active. In the case of acetylsalicylic acid 8.3, the exchange of the –OCOCH3
for an –NHCOCH3 group led to the loss of the acylating ability and therefore a nearly complete loss of
the biological activity. The antimetabolite sulfanilamide 8.4 (R = SO2NH2) is derived from
p-aminobenzoic acid 8.4 (R = COOH), which is a critical intermediate in the bacterial dihydrofolate
synthesis; 8.4 (R = SO2NH2) is the result of the exchange of a carboxyl group for an isosteric
sulfonamide group.
A lead structure is rarely studied exclusively by one research group. Other
companies adopt successful examples, at the very latest after the economic success
of a new medicine. The goal of this so-called “me-too” research is to modify the
competitor’s lead structure to arrive at patent-free analogues that are more efficacious,
more selective, or better tolerated. It must be accepted that even this form of
competition has led to the therapeutically most valuable compounds in many therapeutic
areas. On the one hand, a plethora of duplicate work has been performed,
while on the other hand, new analogues with improved properties have been produced
and introduced to therapy which turned out to be successful in the long run. Penicillins
of the third and fourth generation with broad-spectrum activity and metabolic
stability, β-blockers with improved selectivity, and many other specific drugs would
simply not exist if it were not for the much-disparaged “me-too” research.
8.3 Systematic Variation of Aromatic Substituents
The goal of lead structure optimization has an impact on the planning of the
relevant experimental series. If the biological consequences of structural changes
are to be evaluated with minimal effort, careful design must precede the synthesis
of the substances. Here an almost unsolvable problem emerges in that, as a general
rule, the exchange of a substituent or group leads to complex changes in multiple
properties. The exchange of an ethyl group for a methyl group changes only the
lipophilicity and size of the substituent. If a methyl group is exchanged for
a chlorine atom, the polarizability, the electronic properties, and moreover the
metabolism are altered. Other substituents could then change the H-bond donor and
acceptor properties as well as the ionization and dissociation.
In 1971, Paul Craig proposed the use of a simple diagram for the structural variation
of aromatic substituents, with which the important characteristics of these substitu-
ents, for instance, lipophilicity and electronic properties, are plotted against each
other. The selection of substituents from different quadrants of this diagram allows
an evaluation of different combinations of properties. The concept can be extended to
multiple dimensions, possibly with the aid of mathematical and statistical methods.
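The idea of the diagram can be sketched in a few lines. The example below assigns substituents to quadrants according to the sign of a lipophilicity parameter (Hansch π) and an electronic parameter (Hammett σ for the para position) and then picks one representative per quadrant. The numerical constants are approximate values quoted from memory of standard tabulations and should be verified before any real use; the selection rule (largest distance from the origin) is an assumption made for illustration.

```python
import math

# Approximate para-substituent constants (Hansch pi, Hammett sigma_p).
# Assumption: values quoted from memory; check against a tabulation before use.
SUBSTITUENTS = {
    "H": (0.00, 0.00),
    "F": (0.14, 0.06),
    "Cl": (0.71, 0.23),
    "Br": (0.86, 0.23),
    "CH3": (0.56, -0.17),
    "OCH3": (-0.02, -0.27),
    "OH": (-0.67, -0.37),
    "NH2": (-1.23, -0.66),
    "CF3": (0.88, 0.54),
    "NO2": (-0.28, 0.78),
    "CN": (-0.57, 0.66),
}


def quadrant(name):
    """Classify a substituent by the signs of pi and sigma."""
    pi, sigma = SUBSTITUENTS[name]
    if pi == 0.0 and sigma == 0.0:
        return "origin"
    return ("+pi" if pi > 0 else "-pi") + "/" + ("+sigma" if sigma > 0 else "-sigma")


def one_per_quadrant():
    """Pick, for each quadrant, the substituent farthest from the origin."""
    best = {}
    for name, point in SUBSTITUENTS.items():
        q = quadrant(name)
        if q == "origin":
            continue
        if q not in best or math.hypot(*point) > math.hypot(*SUBSTITUENTS[best[q]]):
            best[q] = name
    return best


print(one_per_quadrant())
```

Choosing substituents from all four quadrants separates lipophilic from electronic influences, which a series of, say, only halogens cannot do.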
In 1972, John Topliss made a suggestion that went further, which would be
called today an evolutionary strategy. One substituent at a time (e.g., hydrogen for
chlorine) is exchanged in the optimization of the substitution pattern of an aromatic
compound. The next compound is planned based on which of the first two com-
pounds demonstrated better effects. If the new substituent improves the effect,
a new substituent is chosen that has the same physicochemical properties, in larger
measure, or more of these substituents are added. If the new substituents impair the
biological activity, then a substituent is chosen that has the opposite physicochem-
ical properties. If two different substituents produce the same effect, it should be
evaluated whether changes in the physicochemical properties influence the activity
in the opposite direction. Despite its elegance, this strategy often fails for the
mundane reason that it is too time consuming to take such a stepwise approach.
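The stepwise logic described above can be illustrated with a small sketch. It is not the published Topliss scheme but a simplified heuristic written for this purpose: the change in lipophilicity (π) and electronic character (σ) between the last two compounds is extended if the activity improved, reversed if it deteriorated, and, if the activity stayed the same, the lipophilic change is kept while the electronic change is inverted. The last rule is one possible reading of the "same effect" case. The substituent constants are approximate values quoted from memory, and all names are hypothetical.

```python
import math

# Approximate (Hansch pi, Hammett sigma_p) values; assumption, verify before use
SUBSTITUENTS = {
    "H": (0.00, 0.00), "F": (0.14, 0.06), "Cl": (0.71, 0.23),
    "Br": (0.86, 0.23), "CH3": (0.56, -0.17), "OCH3": (-0.02, -0.27),
    "OH": (-0.67, -0.37), "NH2": (-1.23, -0.66), "CF3": (0.88, 0.54),
    "NO2": (-0.28, 0.78), "CN": (-0.57, 0.66),
}


def propose_next(previous, current, outcome, tested):
    """Suggest the next substituent from the outcome of the last exchange.

    outcome is "better", "worse", or "same" (current relative to previous).
    """
    p_pi, p_sigma = SUBSTITUENTS[previous]
    c_pi, c_sigma = SUBSTITUENTS[current]
    d_pi, d_sigma = c_pi - p_pi, c_sigma - p_sigma
    if outcome == "better":      # same properties, in larger measure
        target = (c_pi + d_pi, c_sigma + d_sigma)
    elif outcome == "worse":     # opposite properties
        target = (p_pi - d_pi, p_sigma - d_sigma)
    elif outcome == "same":      # the two influences may cancel: invert one
        target = (p_pi + d_pi, p_sigma - d_sigma)
    else:
        raise ValueError("outcome must be 'better', 'worse', or 'same'")
    candidates = [name for name in SUBSTITUENTS if name not in tested]
    return min(candidates, key=lambda name: math.dist(SUBSTITUENTS[name], target))


for outcome in ("better", "worse", "same"):
    print(outcome, "->", propose_next("H", "Cl", outcome, tested={"H", "Cl"}))
```

Each proposal requires the previous compound to be synthesized and tested first, which makes the time cost of the stepwise approach apparent.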
As a consequence of the work of Craig and Topliss, further design methods were
developed. None of these methods should be interpreted too closely. Synthetic
planning must be oriented on both the accessibility of the compounds as well as
achieving the largest possible structural variation, that is, a diversity of physico-
chemical properties and 3D structure. Since the introduction of combinatorial
chemistry (▶ Chap. 11, “Combinatorics: Chemistry with Big Numbers”), the ratio-
nal design of diverse substance libraries has taken on entirely new possibilities and
perspectives.
8.4 Optimizing the Activity and Selectivity Profile
The structural variation of a lead structure influences not only the activity strength
but also the activity spectrum. That can be thoroughly advantageous, but it also
brings with it the risk that the selectivity can deteriorate. A simple rule of thumb is
that enlarging the molecule, introducing optically active centers, and rigidification
improve the selectivity, assuming that the activity is not entirely lost. On the other
hand, removing a chiral center, establishing more flexibility, or reducing the size of
the molecule usually results in unspecific and weaker activity.
Because of the sequencing of the human genome, the gene family to which
a target protein belongs is known, as is the number of members of the gene family.
By using gene technology it is possible to construct single isoform test systems
(assays). As a result, today pharmaceutical research is in a position to make
a predictive selectivity profile. This has stimulated efforts to develop selective
drugs. An interesting corollary to these efforts is the fact that the molecular weight
of drugs has increased in recent years, as statistics show, a confirmation of the
above-mentioned rule of thumb.
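A selectivity profile across the members of a gene family can be expressed as simple ratios. The sketch below divides the inhibition or binding constant measured on each off-target isoform by the constant measured on the intended target, so that larger numbers mean higher selectivity. The isoform names and Ki values are hypothetical and serve only as an illustration.

```python
def selectivity_profile(ki_by_isoform, target):
    """Return Ki(off-target) / Ki(target) for every isoform except the target."""
    reference = ki_by_isoform[target]
    return {
        isoform: ki / reference
        for isoform, ki in ki_by_isoform.items()
        if isoform != target
    }


# Hypothetical Ki values in mol/L from single-isoform assays
measured = {"isoform A": 2e-9, "isoform B": 4e-7, "isoform C": 1e-6}

for isoform, ratio in selectivity_profile(measured, "isoform A").items():
    print(f"{isoform}: {ratio:.0f}-fold weaker than on the target")
```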
For drugs that are meant to act on neuroreceptors in the brain, the polarity is critical
to whether they can cross the blood–brain barrier. Polar compounds are unable to do
this and act only in the periphery, for instance, on the circulatory system. Examples of
this are adrenaline 8.5 and dopamine 8.6 (Fig. 8.3). The stepwise removal or masking
of polar groups brings the central effects into the foreground. Ephedrine 8.7 acts in the
brain and in the periphery; it is centrally stimulating and raises the blood pressure.
Amphetamine 8.8 (“speed”) and the intoxicant MDMA 8.9 (the designer drug
“ecstasy”) are weak bases. Their relatively nonpolar neutral forms easily overcome
the blood–brain barrier and their CNS effects dominate (Fig. 8.3).
There are exceptions even here. L-DOPA 8.10 (Fig. 8.3) is an extremely polar
amino acid. It could never cross the blood–brain barrier by passive diffusion alone.
Instead it is recognized by an amino acid transporter and actively transported over
the membrane and into the brain. This simultaneously solves the problem of
bringing dopamine 8.6, which is used to treat Parkinson’s disease, into the brain
because L-DOPA is decarboxylated to dopamine there (▶ Sects. 9.4 and ▶ 27.8).
The decisive influence that even the smallest changes in the structure can have is
seen in the effect spectrum of the hormones and neurotransmitters noradrenaline and
adrenaline and their synthetic analogues. Whereas noradrenaline 8.11 (Fig. 8.4)
predominantly affects the α-adrenergic receptors, its N-methyl derivative adrenaline 8.5
(Fig. 8.3) acts on α and β receptors as a mixed α/β agonist. This difference was used to
enlarge the N-alkyl group to arrive at the specific β-agonist isoprenaline 8.12
(Fig. 8.4). Further differentiation of the effects could be achieved within the class
of β-adrenergic substances. Dobutamine 8.13 is missing the alcoholic hydroxyl
group of adrenaline. Despite its structural relationship to dopamine 8.6 (Fig. 8.3) it
is a β1 agonist with cardioselective effects. Specific β2 agonists, for instance
salbutamol 8.14 and clenbuterol 8.15 (Fig. 8.4), are used to treat asthma because
they are bronchodilators without the cardio-stimulatory effects of the unspecific β
agonists (▶ Sect. 29.3).
Fig. 8.3 The polar compounds adrenaline 8.5 and dopamine 8.6 are cardiovascularly active in the
periphery after intravenous administration. Ephedrine 8.7 is more lipophilic and therefore shows
both peripheral and central effects. The more nonpolar compound amphetamine 8.8 (“speed”) has
an overwhelmingly stimulatory effect in the CNS. 3,4-Methylenedioxymethamphetamine 8.9
(MDMA; “ecstasy”) is hallucinogenic. Polar groups are red and neutral or lipophilic groups are
blue. In the structures, R = H for dopamine 8.6 and R = COOH for L-DOPA 8.10.
Fig. 8.4 Noradrenaline 8.11, adrenaline 8.5, and isoprenaline 8.12 act to different extents on the
α and β receptors. Selective β1 and β2 agonists, for instance, 8.13, 8.14, and 8.15, act specifically as
cardiac stimulants or bronchodilators. In the structures, R = H for 8.11, R = CH3 for 8.5, and
R = –CH(CH3)2 for 8.12.
The sulfonamides are a prime example of the targeted optimization of lead
structures in different therapeutic indications. From the first antibacterial examples,
the diuretics as well as hypoglycemics (antidiabetics) resulted. It had already been
noticed in 1940 that sulfanilamide (▶ Sect. 2.3) inhibits the enzyme carbonic
anhydrase, and therefore should lead to increased urine production (▶ Sect. 25.7).
Among other substances, hydrochlorothiazide 8.16, furosemide 8.17 (Fig. 8.5), and
structurally related compounds gained therapeutic importance. In the early 1940s,
the hypoglycemic effects of a few sulfonamides were clinically observed. The
antibacterial and simultaneously hypoglycemic carbutamide 8.18 was introduced
to therapy in 1955; the lipophilic and therefore more bioavailable tolbutamide 8.19
was introduced later. Systematic structural variation finally led to glibenclamide
8.20 (Fig. 8.5 and ▶ Sect. 30.2), which is much more potent and specific.
Fig. 8.5 The sulfonamides hydrochlorothiazide 8.16, furosemide 8.17, and related diuretics are
different from most antibacterial analogues because of the unsubstituted sulfonamide group.
Carbutamide 8.18 (R = NH2) and tolbutamide 8.19 (R = CH3) were the first unspecific sulfonamides
with hypoglycemic effects that were later replaced with specific hypoglycemics of the
glibenclamide-type 8.20.
8.5 From Agonists to Antagonists
There is no general recipe for the transformation of an agonist into an antagonist. An
example of this is found in the tedious route from the agonist histamine to the H2
antagonist, as is described in detail in ▶ Sect. 3.5. There are, however, recognized
principles that have proven to be of value. For example, the exchange of polar for nonpolar
substituents or the introduction of large groups such as additional aromatic rings
changes some receptor agonists to antagonists. The exchange of both phenolic
hydroxyl groups in isoprenaline 8.12 for two chlorine atoms (DCI, 8.21) or additional
aromatic rings (pronethalol, 8.22) delivered the first β-adrenergic antagonists, the
β-blockers. The introduction of an oxygen atom in the side chain and further structural
optimization afforded the first β1-selective antagonists, for example, practolol 8.23 and
metoprolol 8.24. The β1-selective partial agonist xamoterol 8.25 is a blocker as well as
an agonist (Fig. 8.6). It occupies β1 receptors and displays a moderately stimulating
effect. By occupying the receptor, it protects it from an excessive response upon
elevated adrenaline release, for instance, from exercise or stress.
Fig. 8.6 3,4-Dichloroisoprenaline 8.21 (DCI) and pronethalol 8.22, the first unspecific
β-blockers, were derived from isoprenaline 8.12. Practolol 8.23 (R = –NHCOCH3) and metoprolol
8.24 (R = –CH2CH2OMe) are specific β1 antagonists. Xamoterol 8.25 is a partial β1 agonist,
a combined agonist and antagonist.
Analogously, the exchange of the imidazole ring of histamine 8.26 for large
hydrophobic groups led to the first H1 antagonists, for instance, diphenhydramine
8.27 (Fig. 8.7). Sedation is the most troublesome side effect of the classic H1
antagonists, which are used to treat allergies. The non-sedating terfenadine
8.28 (R = CH3) can cross the blood–brain barrier because of its high lipophilicity, but
is immediately expelled by a transporter. Because of its cardiotoxicity, terfenadine
has since been withdrawn from the market and replaced by its active
metabolite fexofenadine 8.28 (R = COOH). The sedating side effects of antihistamines
also led to neuroleptics and antidepressants (▶ Sect. 1.6). Here, however, the
limits of rational drug optimization are apparent. Promethazine 8.29 is an antihistamine
with antiallergic action and sedating side effects. The neuroleptic chlorpromazine
8.30 is a central depressant and therefore an antipsychotic; the structurally
extraordinarily similar imipramine 8.31 acts, on the other hand, as
a stimulant and is an antidepressant (Fig. 8.8). All three substances have different
mechanisms of action. The introduction of additional aromatic rings to other
receptor agonists, for instance, to the neurotransmitters acetylcholine and dopamine,
has led to antagonists (Fig. 8.9).
8.6 Optimizing Bioavailability and Duration of Effect
The absorption of most pharmaceuticals depends primarily on their
lipophilicity. The more polar the drug, the more poorly it can penetrate the lipid
membrane, and the lower the absorption (▶ Sect. 19.6). Increasing the lipophilicity
improves the absorption, but only up to a point: extremely lipophilic compounds are
insoluble in water, and their absorption is too slow. Lipophilic acids and bases offer
advantages here if their pKa value is not too far from the neutral point,
pH 7. In their ionized form they are highly water soluble, while in their neutral form,
with which they are in equilibrium, they are lipophilic and able to penetrate membranes.
[Structures: 8.26 histamine (H agonist); 8.27 diphenhydramine (non-polar H1 antagonist, sedating); 8.28 terfenadine, R = CH3 (polar H1 antagonist, non-sedating), and its active metabolite fexofenadine, R = -COOH]
Fig. 8.7 Starting from histamine 8.26, the introduction of large hydrophobic groups gave the H1
antagonists, for instance, diphenhydramine 8.27. The non-sedating terfenadine 8.28 (R = CH3)
crosses the blood–brain barrier but is immediately expelled by a transporter. The active
metabolite, fexofenadine with R = COOH, is now on the market.
These relationships are discussed in detail in ▶ Sect. 19.5. Molecular size influences
bioavailability insofar as substances with a molecular weight above
500–600 Da are captured by the liver on the sole grounds of their size and
are quickly excreted with the bile. Aside from this, there are substances that penetrate
the membrane regardless of their polarity. These are taken up into the cell or are
eliminated from the cell by transporters (▶ Sect. 30.7). Among these are structural
analogues of amino acids and nucleosides. Classical strategies to extend the duration
of action are the conversion of free hydroxyl groups to ethers (see ▶ Sect. 9.2), the
replacement of esters with amides, and the replacement of metabolically labile amide
groups with isosteres. In a few cases, such structural changes are associated with
a reduction in potency, which is more than compensated for by a longer duration of
action. In the case of peptides the replacement of L-amino acids with D-amino acids,
the inversion of amide groups, and the replacement of larger structural elements with
peptidomimetic groups (▶ Sect. 10.4) have all proven successful.
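The remark above on lipophilic acids and bases can be made concrete with the Henderson–Hasselbalch relation. The following Python sketch is an illustration and not part of the original text; the function name `neutral_fraction` and the example pKa values are assumptions chosen for demonstration. It estimates which fraction of a monoprotic acid or base is present in the neutral, membrane-permeable form at a given pH.

```python
def neutral_fraction(pka: float, ph: float, kind: str) -> float:
    """Fraction of a monoprotic compound present in its neutral form.

    kind = "acid": HA <-> A(-) + H(+), the neutral form is HA
    kind = "base": BH(+) <-> B + H(+), the neutral form is B
    Derived from the Henderson-Hasselbalch relation.
    """
    if kind == "acid":
        return 1.0 / (1.0 + 10.0 ** (ph - pka))
    if kind == "base":
        return 1.0 / (1.0 + 10.0 ** (pka - ph))
    raise ValueError("kind must be 'acid' or 'base'")


# Hypothetical bases with increasing pKa, evaluated at pH 7.
for pka in (5.0, 7.0, 9.0, 11.0):
    percent = 100.0 * neutral_fraction(pka, ph=7.0, kind="base")
    print(f"base with pKa {pka:4.1f}: {percent:8.4f} % neutral at pH 7")
```

Under these assumptions, a base with pKa 9 still has about 1% of its molecules in the neutral form at pH 7, whereas a base with pKa 11 has only about 0.01%. This illustrates why a pKa close to the neutral point helps to combine solubility with membrane permeation.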
The metabolism of aliphatic amino groups can be suppressed by alkyl substitution
or branching at the α carbon. Secondary alcohols can be converted to the
more bioavailable tertiary alcohols by introducing an ethynyl group at the same
carbon atom (▶ Sect. 28.5). The introduction of an isosteric fluorine atom in the
para position as a replacement for hydrogen prevents hydroxylation in this position.
If steric considerations do not play a role, the para position can also be blocked
[Structure: 8.26 histamine (positively charged form at pH = 7) and the derived pharmacophore with the points A, D, and P]
Fig. 8.9 The active substance histamine 8.26 and the pharmacophores that are attributed to it
(A acceptor, D donor, P positively charged group).
[Structures: 8.29 promethazine (H1 antagonist); 8.30 chlorpromazine (neuroleptic); 8.31 imipramine (antidepressant)]
Fig. 8.8 Closely related structures of active substances can have very different qualitative
activity. Chlorpromazine 8.30, a dopamine antagonist with neuroleptic activity, and imipramine
8.31, a dopamine transporter inhibitor with antidepressant activity, are both derived from
promethazine 8.29, an H1 antagonist with antiallergic activity.
with a larger group, such as a chlorine atom or a methoxy group. For the
neurotransmitters dopamine, adrenaline, and noradrenaline, which are hydroxylated in the
3- and 4-positions, conversion to the monohydroxylated analogues, to 3,5-dihydroxy compounds, or to
the NH-isosteric indole group (Fig. 8.1, Sect. 8.2) led to metabolically more stable
and therefore longer-acting compounds.
8.7 Variations of the Spatial Pharmacophore
Rational design is characterized by the fact that the common features of all
active compounds and the differences from less potent or inactive analogues
can be derived from the structure of the pharmacophore. A pharmacophore
(Sect. 8.9) is defined as a spatial arrangement of particular functionalities that
are common to more than one drug and form the basis of the biological activity
(▶ Sect. 17.1).
During the course of rational optimization the molecular scaffold and the substituents
at a pharmacophore are changed to maintain the principal function while
arriving at higher potency or better selectivity. Many computer methods have been
developed to generate ideas for the spatial isomorphic replacement of ligand scaf-
folds. By considering the conformational aspects of the molecules (▶ Chap. 16,
“Conformational Analysis”), they scan databases to find possible candidates that,
despite a different parent scaffold, can place the side chains and interacting groups
in the same spatial orientation. Examples of such approaches are presented in
▶ Sect. 10.8 and ▶ Chap. 17, “Pharmacophore Hypotheses and Molecular Com-
parisons”. But an indirect approach using the protein structure has also been tried.
For this, the spatial structure of the protein–ligand complex is the starting point
from which a part of the binding pocket is cut out, and new building blocks for the
ligand are sought. Subsequently the form and interaction properties of the cut-out
pocket are compared with a database of all known protein–ligand complexes
(▶ Sect. 20.4). If a subpocket is discovered that has similarities to the sought-
after pocket, then ligands that bind there provide an interesting design hypothesis.
The structure of the building blocks that occupy the newly discovered pocket can
generate ideas for isosteric structural elements in a modified ligand.
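The idea of searching for a different scaffold that places the interacting groups in the same spatial orientation can be sketched in a few lines of Python. This is a deliberately simplified illustration and not one of the methods referred to above: it assumes that each pharmacophore consists of single, uniquely labelled points (A acceptor, D donor, P positively charged group, as in Fig. 8.9), that one fixed conformation per molecule is given, and that all coordinates are hypothetical. Two arrangements are treated as equivalent if all pairwise distances between equally labelled points agree within a tolerance.

```python
from itertools import combinations
from math import dist


def distance_signature(features):
    """Map each pair of feature labels to its inter-feature distance (in Å)."""
    return {
        tuple(sorted((a, b))): dist(features[a], features[b])
        for a, b in combinations(features, 2)
    }


def matches(query, candidate, tolerance=0.5):
    """True if the candidate reproduces all pairwise distances of the query.

    Distances are invariant to rotation and translation, so no alignment is
    needed. Note that mirror images cannot be distinguished in this way.
    """
    q = distance_signature(query)
    c = distance_signature(candidate)
    if q.keys() != c.keys():
        return False
    return all(abs(q[pair] - c[pair]) <= tolerance for pair in q)


# Hypothetical coordinates in Å; labels follow Fig. 8.9.
query = {"A": (0.0, 0.0, 0.0), "D": (2.2, 0.0, 0.0), "P": (4.0, 3.0, 0.0)}
# Same geometry on a different (rotated and shifted) scaffold.
scaffold_hop = {"A": (10.0, 5.0, 1.0), "D": (10.0, 7.2, 1.0), "P": (7.0, 9.0, 1.0)}
# Charged group placed too far away from the acceptor.
stretched = {"A": (0.0, 0.0, 0.0), "D": (2.2, 0.0, 0.0), "P": (9.0, 0.0, 0.0)}

print(matches(query, scaffold_hop))
print(matches(query, stretched))
```

A real database search would additionally have to enumerate conformers and handle several features of the same type; the sketch only shows why a distance-based comparison is independent of the parent scaffold.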
A different strategy that also considers the pharmacophore can be successful.
In this approach the pharmacophore is retained and only those groups are modified
that affect the pharmacokinetic properties, that is, the transport, distribution,
metabolism, and excretion of a molecule. An efficient and pragmatic strategy is
important. For this, it is essential that not too many changes are made at the same
time, and the changes should not be too biased. With little synthetic effort, a broad
spectrum of physicochemical properties and spatial arrangements should be
covered.
It has since been established that binding to human plasma proteins
such as serum albumin and the acidic α1-glycoprotein is of decisive importance for
the transport and pharmacokinetic properties of a drug. Therefore binding to these
proteins is considered even in the early phase of drug development (▶ Chap. 19,
“From In Vitro to In Vivo: Optimization of ADME and Toxicology Properties”). On
the other hand, binding to hERG ion channels (a so-called “antitarget”) must be
avoided because blocking these channels can lead to arrhythmias (▶ Sect. 30.3).
Drug metabolism is in itself a very important theme and must be considered in
earlier phases of development. The cytochrome P450 enzymes are responsible
for the vast majority of chemical transformations that occur on xenobiotics
(▶ Sect. 27.6). To be able to predict the behavior of drug candidates at this stage
of the development process, the expected interactions with these metabolic
enzymes are evaluated in an early phase of optimization. The expression of P450
enzymes can also be induced by xenobiotics. The trigger for this could be the
binding to a transcription factor like the PXR receptor (▶ Sect. 28.7). Drug candi-
dates binding to this transcription factor can be evaluated early in their development
to avoid this undesirable enhanced metabolism.
8.8 Optimizing Affinity, Enthalpy, and Entropy of Binding
and Binding Kinetics
Generally, the binding affinity to a target protein is primarily improved during
the course of optimization. If multiple candidates are available, the ligand
efficiency (▶ Sect. 7.1) in addition to the chemical accessibility leads the way.
Small, potent lead structures offer legitimate hope that they can be well optimized.
Very small compounds that have nanomolar affinity, despite their low molecular
weight, can be problematic. Most of the time an optimal interaction pattern
is already established. It is then almost impossible to transfer this pattern to another
molecular scaffold. Medicinal chemists have established a set of rules based
on experience (▶ Sect. 4.10). According to these rules it is possible to judge
how much a particular group, if correctly placed, can contribute to the binding
affinity.
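The role of ligand efficiency in choosing among candidates can be illustrated with a short calculation. The sketch below assumes the commonly used definition, the negative free energy of binding divided by the number of non-hydrogen atoms; the definition used in ▶ Sect. 7.1 is not reproduced here, and the two candidates and their values are hypothetical.

```python
def ligand_efficiency(delta_g_kj_per_mol: float, heavy_atoms: int) -> float:
    """Binding free energy per heavy atom, in kJ/mol per atom.

    Assumed definition: LE = -ΔG / (number of non-hydrogen atoms).
    """
    if heavy_atoms <= 0:
        raise ValueError("heavy_atoms must be a positive integer")
    return -delta_g_kj_per_mol / heavy_atoms


# Hypothetical candidates: (ΔG in kJ/mol, number of heavy atoms).
candidates = {
    "small hit": (-30.0, 15),
    "large hit": (-36.0, 30),
}

# Rank by ligand efficiency, most efficient first.
ranked = sorted(
    candidates,
    key=lambda name: ligand_efficiency(*candidates[name]),
    reverse=True,
)
for name in ranked:
    print(f"{name}: LE = {ligand_efficiency(*candidates[name]):.2f} kJ/mol per heavy atom")
```

In this made-up case the larger hit binds more strongly in absolute terms, but the smaller hit achieves more binding energy per atom and would therefore leave more room for growth during optimization.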
It was shown in ▶ Sect. 4.10 that the affinity is a combination of the enthalpic
and entropic contributions. Usually one begins with a lead structure that has
a binding affinity in the micromolar range. Expressed as the Gibbs free
energy ΔG, this is usually about −30 kJ/mol. An increase in the binding affinity of
4–5 orders of magnitude corresponds to an improvement in ΔG of 20–30 kJ/mol. Where
should the screw be turned to optimize a lead structure? Does it make more sense to
improve the binding enthalpy, or is one better advised to improve the binding
entropy? Given the enthalpy/entropy compensation described in ▶ Sect. 4.10, is it
even possible to attempt optimization of both values independently? The prereq-
uisite for using such a concept in the optimization is the determination of both
values of a lead structure. Does this help in the choice of the right candidate for
optimization? In the case that the thermodynamic binding profiles of multiple
alternative lead candidates are known, should enthalpically or entropically
driven binders be chosen for optimization? It is very interesting to compare the
thermodynamic signatures of multiple generations of marketed products. The
binding profiles for HIV protease inhibitors (▶ Sect. 24.3) and HMG-CoA
reductase inhibitors (▶ Sect. 27.3) are displayed in Fig. 8.10. Notably, the profile was
successfully shifted from initially strongly entropically driven binders to
enthalpically driven ones. This observation suggests that it is initially simpler to
optimize a substance’s entropic binding contribution than its enthalpic contribu-
tion. Most of the time this can be seen in the first lead structure upon which an
enlargement of the hydrophobic surface area leads to better binding. The affinity
that is gained is explained by the displacement of ordered water molecules
(▶ Sect. 4.6). Such contributions are assumed to be entropically favorable.
A strategy of introducing rigid rings can also be pursued. In doing so, the com-
pound loses degrees of freedom. If the geometry of the bound state is correctly
frozen, the binding is improved for entropic reasons. An example of this is the
[Bar charts of ΔG, ΔH, and −TΔS in kcal/mol (axis from −20 to 5). Upper panel: indinavir, saquinavir, nelfinavir, ritonavir, amprenavir, lopinavir, atazanavir, tipranavir, darunavir. Lower panel: fluvastatin, pravastatin, cerivastatin, atorvastatin, rosuvastatin.]
Fig. 8.10 Between 1995 and 2006, the thermodynamic signatures of successive development
generations of HIV protease inhibitors (upper, for formulae see ▶ Fig. 24.15) and of statins as
HMG-CoA reductase inhibitors (lower, for formulae see ▶ Fig. 27.13) were optimized, that
is, the extent to which their binding is driven by entropy or enthalpy. The free energy ΔG is shown in red,
the enthalpy ΔH in blue, and the entropic contribution −TΔS in green. The more negative a
column becomes, the stronger the binding affinity and the more the profile is determined by
enthalpy or entropy. The initially developed compounds such as indinavir, saquinavir, nelfinavir,
and pravastatin were entropic binders; in contrast, the newer derivatives such as darunavir or
rosuvastatin have an improved enthalpic profile.
binding of the largely rigid thrombin inhibitor 8.32, which binds in an almost
exclusively entropically driven manner to the protein (Fig. 8.11). In contrast, the
decidedly more flexible ligand 8.33 displays a large enthalpic binding contribu-
tion. Compound 8.32 represents the result of an optimization that led to a substance
with single-digit nanomolar binding and an optimal shape complementarity for the
binding pocket of thrombin.
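The orders of magnitude quoted earlier in this section (a micromolar lead corresponding to roughly −30 kJ/mol, and 4–5 orders of magnitude in affinity corresponding to 20–30 kJ/mol) follow from the relation ΔG = RT ln K_d. The Python sketch below is an illustration under stated assumptions: a temperature of 298.15 K, a dissociation constant expressed relative to a 1 M standard state, and function names chosen freely for this example.

```python
from math import exp, log

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol K)


def delta_g_from_kd(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Binding free energy in kJ/mol from a dissociation constant in mol/L."""
    return R_KJ * temperature_k * log(kd_molar)


def kd_from_delta_g(delta_g_kj: float, temperature_k: float = 298.15) -> float:
    """Dissociation constant in mol/L from a binding free energy in kJ/mol."""
    return exp(delta_g_kj / (R_KJ * temperature_k))


micromolar = delta_g_from_kd(1e-6)
per_decade = delta_g_from_kd(1e-6) - delta_g_from_kd(1e-7)

print(f"ΔG for K_d = 1 µM: {micromolar:.1f} kJ/mol")
print(f"gain per order of magnitude: {per_decade:.1f} kJ/mol")
print(f"gain for 4 to 5 orders: {4 * per_decade:.1f} to {5 * per_decade:.1f} kJ/mol")
```

At this temperature each order of magnitude in affinity is worth about 5.7 kJ/mol, so 4–5 orders of magnitude amount to roughly 23–29 kJ/mol, in line with the range given in the text.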
It seems that generally applicable concepts exist for entropy-driven optimization.
If one can “always win entropically,” then for theoretical reasons enthalpically
favored lead structures should be preferred as a starting point for optimization.
However, caution is called for here. Why a ligand has a particular thermody-
namic profile must be clarified. The inhibitors 8.34 and 8.35 were discovered in
a virtual screening as aldose reductase inhibitors (Fig. 8.12). The chemical struc-
tures of both ligands are very similar. Nevertheless one is an enthalpically driven
binder, and the other is an entropically driven binder. The crystal structures of both
ligands in complex with the protein revealed the reason: the enthalpically preferred inhibitor
8.34 entraps a water molecule, which mediates binding between the ligand and the
protein, whereas the other one does not. The incorporation of a water molecule is
entropically disfavored, and therefore the profile appears to be that of an enthalpic
binder. A resistance profile for inhibitors against mutants of the viral HIV protease
was investigated in the research group of Ernesto Freire at The Johns Hopkins
University in Baltimore (▶ Sect. 24.5). Interestingly, the result was that resistance
to the entropically favored inhibitors could be developed much faster than to
inhibitors with enthalpic advantages. This observation indicates that it is worth-
while to concentrate on enthalpically favored binders in cases in which resistance
can be expected to develop. In the investigated example the enthalpically driven
[Structures of 8.32 and 8.33 with thermodynamic data. 8.32: ΔG −42.3 kJ/mol, ΔH −6.2 kJ/mol, −TΔS −36.1 kJ/mol. 8.33: ΔG −49.2 kJ/mol, ΔH −48.5 kJ/mol, −TΔS −0.7 kJ/mol.]
Fig. 8.11 The rigid thrombin inhibitor 8.32 only has a small number of rotatable bonds. It has an
optimal shape complementarity to the binding pocket of thrombin. Its binding is, for the most part,
entropically driven. On the other hand, the considerably more flexible ligand 8.33 has a higher
enthalpic binding contribution.
binder 8.33 had a less-rigid scaffold (Fig. 8.11). This allows it to more easily elude
changes that are caused by mutations. It is much more difficult for rigid ligands that
bind for entropic reasons to adapt to such steric modifications.
On the other hand, entropic binders can also have an advantage in escaping
resistance. If a ligand is entropically favored because it adopts multiple binding
modes, and even exhibits large residual mobility in the binding pocket when bound,
this can prove to be beneficial! If the protein tries to change the shape of its binding
pocket through resistance mutations to this inhibitor, an incorporated ligand that is
able to adopt multiple binding modes is left with alternative orientations, which,
despite the mutation, still offer good binding.
If it is clear that a lead structure is an enthalpically driven binder, and
superimposed effects such as the entrapment of water molecules have not distorted
the profile, how is the binding of an enthalpically driven binder optimized? Let us
remember the consideration in ▶ Sects. 4.5 and ▶ 4.8: hydrogen bonds, electrostatic
interactions, and van der Waals contacts determine the binding enthalpy. However,
a change in such an interaction property of a molecule is often coupled with
a compensation of enthalpy and entropy. The result is that ΔG and the binding
affinity do not change at all! The optimization process can be compared to the act of
getting around the inherent enthalpy/entropy compensation. Enthalpically favorable
hydrogen bonds should have an optimal geometry and should not induce severe
structural changes in the protein environment. Otherwise this can lead to an entropic
compensation by causing a shift in the dynamic degrees of freedom. It seems to be
more favorable to strengthen the hydrogen bonds in structurally rigid regions of the
binding pocket. There, enthalpy is better gained because the compensatory shift in
dynamic parameters is less likely. Newly introduced hydrogen bonds should also not reduce
the degree of desolvation of a bound ligand by inducing small structural
changes in the binding geometry of hydrophobic groups that leave them more strongly
[Structures of 8.34 and 8.35 with thermodynamic data. 8.34: ΔG −35.4 kJ/mol, ΔH −25.6 kJ/mol, −TΔS −9.8 kJ/mol. 8.35: ΔG −31.3 kJ/mol, ΔH −8.7 kJ/mol, −TΔS −22.6 kJ/mol.]
Fig. 8.12 Compounds 8.34 and 8.35 were discovered in a virtual screening as lead structures for
the inhibition of aldose reductase. Although they are structurally similar, 8.34 is a stronger
enthalpic binder and 8.35 is an entropic binder. The subsequent crystal structure analysis of the
complex with the reductase showed that 8.34 traps a water molecule upon binding, whereas this
was not observed with 8.35. Because the entrapment of a water molecule is entropically unfavor-
able, the binding of 8.34 is enthalpically preferred.
exposed to the surrounding solvent environment. It is also important that the local
water structure in the binding pocket remains unchanged.
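The thermodynamic profiles quoted in Figs. 8.11 and 8.12 can be checked against the relation ΔG = ΔH − TΔS and classified as enthalpically or entropically driven. The following Python sketch is only an illustration: the class and method names are assumptions, the assignment of the two data sets in Fig. 8.11 to compounds 8.32 and 8.33 is inferred from the figure caption, and "driven" is taken here simply to mean the larger (more negative) of the two contributions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    """Thermodynamic binding profile; all values in kJ/mol."""
    name: str
    delta_g: float
    delta_h: float
    minus_t_delta_s: float

    def is_consistent(self, tolerance: float = 0.15) -> bool:
        # ΔG should equal ΔH + (−TΔS) within rounding of the quoted values.
        return abs(self.delta_h + self.minus_t_delta_s - self.delta_g) <= tolerance

    def driver(self) -> str:
        # The more negative contribution dominates the profile.
        return "enthalpy" if self.delta_h < self.minus_t_delta_s else "entropy"


# Values as quoted in Figs. 8.11 and 8.12.
profiles = [
    Signature("8.32", -42.3, -6.2, -36.1),
    Signature("8.33", -49.2, -48.5, -0.7),
    Signature("8.34", -35.4, -25.6, -9.8),
    Signature("8.35", -31.3, -8.7, -22.6),
]

for p in profiles:
    print(f"{p.name}: consistent={p.is_consistent()}, driven by {p.driver()}")
```

Such a classification says nothing about the cause of a profile. As the aldose reductase example shows, an apparently enthalpic signature can result from a trapped water molecule, so the structural reason still has to be clarified.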
Another essential question has to do with the optimal interaction kinetics that
a ligand should have. Surface plasmon resonance was introduced in ▶ Sect. 7.7.
The question of whether a ligand binds quickly or slowly to a protein and with what
rate it is released again can be determined with this method. Ideally, how long
should a ligand stay bound to a protein, what is the optimal residence time? The
binding affinity is determined by the relative ratio of the association rate (kon) and
the dissociation rate (koff). It has been shown that structurally similar ligands can
have entirely different kinetic profiles. Which profile is optimal? A loss in affinity
can manifest itself as an increased dissociation rate, or a slower association rate, as
well as a combination of both effects. It was shown in the research group of Helena
Danielson in Uppsala that different binding profiles of therapeutically used HIV
protease inhibitors correlate with the development of resistance to mutants of the
protease. They also demonstrated that resistance forms more rapidly against drugs
that have a higher dissociation rate. This is a decisive criterion to direct drug
optimization in the correct direction. Certainly the kinetic binding profile must be
granted a greater priority in the future. Therefore, a more comprehensive correla-
tion between the structure and the binding is necessary so that this knowledge can
be used for targeted design. Until now, what differentiates a “fast” or “slow” binder
has been understood in only a very few cases. The relevant parameters have to do
with the induced-fit adaptations of the protein. They can also involve the ease with
which the desolvation of the previously uncomplexed binding pocket takes place
or the kinetics with which a ligand in the solvated state sheds its own water
shell. More attention must be paid to these protein- and ligand-based properties.
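The statement that affinity is determined by the ratio of the two rate constants, while the kinetic profiles can still differ, can be illustrated numerically. The sketch below assumes a simple one-step binding model with K_d = koff/kon and a residence time of 1/koff; the two ligands and their rate constants are hypothetical and are not taken from the studies mentioned above.

```python
from math import isclose


def dissociation_constant(k_on: float, k_off: float) -> float:
    """K_d in mol/L from k_on in 1/(M s) and k_off in 1/s (one-step model)."""
    return k_off / k_on


def residence_time(k_off: float) -> float:
    """Mean lifetime of the complex in seconds, taken as 1/k_off."""
    return 1.0 / k_off


# Hypothetical ligands: (k_on in 1/(M s), k_off in 1/s).
ligands = {
    "fast on / fast off": (1.0e7, 1.0e-1),
    "slow on / slow off": (1.0e5, 1.0e-3),
}

for name, (k_on, k_off) in ligands.items():
    kd_nm = dissociation_constant(k_on, k_off) * 1e9
    print(f"{name}: K_d = {kd_nm:.1f} nM, residence time = {residence_time(k_off):.0f} s")
```

Both hypothetical ligands have the same affinity of 10 nM, yet the second one remains bound a hundred times longer. An affinity measurement alone therefore cannot distinguish the two profiles.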
8.9 Synopsis
• A lead structure is only the starting point on the way to a drug; potency,
specificity, and duration of action have to be optimized concurrently to minimize
side effects and toxicity.
• The structure of an active substance is determined by its pharmacophore, which
is responsible for target binding. Its adhesion groups enhance potency and
biological activity, its lipophilicity is responsible for transport and distribution,
and groups to be cleaved or modified release the active form.
• Multiple concepts to modify the chemical structure of a lead can be planned;
however, optimization is multifactorial because the influences of the
attempted changes are highly correlated.
• Bioisosteric functional group replacement attempts the exchange of groups on
a given skeleton for sterically and electronically related groups that maintain
activity but improve other drug properties.
• Me-too research follows the goal of modifying the competitor’s lead structures
to arrive at patent-free analogues with improved properties.
• Assuming unchanged activity, enlarging a molecule, adding chiral centers,
and rigidification usually improve selectivity, whereas removing chiral
centers, allowing more flexibility, and reducing the size make a drug less
selective.
• The activity spectrum of a substance can be tailored even by the smallest
structural changes that modulate affinity, transportation, distribution, or metab-
olism. Therefore a particular compound class can show activity in quite different
therapeutic indications.
• Transforming agonists to antagonists does not follow clear-cut rules; however,
increasing the size and attaching hydrophobic groups such as aromatic
rings often shifts the profile.
• The more polar a drug, the more poorly it can penetrate lipid membranes, and the
lower the absorption. On the other hand, special transporters can assist
penetration.
• Extension of the duration of action is mostly achieved by replacement of
metabolically labile groups with more stable isosteres, the introduction of
more branching groups, blockage of metabolically labile positions at aromatic
rings by F or Cl, or by exchanging L- for D-amino acids concurrently with the
inversion of amide groups.
• Molecular databases can be screened to detect other scaffolds or substitution
patterns that represent a given pharmacophore in an alternative fashion.
• In the early phase of drug development undesired binding to plasma proteins,
antitargets such as the hERG ion channel or preferred binding, inhibition, or
activation of transcription factors or metabolizing cytochrome P450 enzymes are
examined and possibly avoided.
• Proper adjustment of the thermodynamic binding profile can be essential for the
optimization of binding affinity and to endow a drug with the required target-
specific properties. Similarly the interaction kinetics determining binding on and
off rates or residence times are of decisive importance to develop drugs with, for
example, an optimal resistance profile.
Bibliography
General Literature
Sneader W (1985) Drug discovery: the evolution of modern medicines. Wiley, New York
Taylor JB, Triggle DJ (eds) (2007) Comprehensive medicinal chemistry II. Elsevier, Oxford
Wermuth CG (ed) (2008) The practice of medicinal chemistry, 3rd edn. Elsevier-Academic,
New York
Special Literature
Burger A (1991) Isosterism and bioisosterism in drug design. Fortschr Arzneimittelforsch
37:287–371
Copeland RA, Pompliano DL, Meek TD (2006) Drug–target residence time and its implications
for lead optimization. Nat Rev Drug Discov 5:730–740
Fokkens J, Klebe G (2006) A simple protocol to estimate protein binding affinity differences for
enantiomers without prior resolution of racemates. Angew Chem Int Ed Engl 45:985–989
Hansch C (1974) Bioisosterism. Intra-Science Chem Rept 8:17–25
Lipinski CA (1986) Bioisosterism in drug design. Ann Rep Med Chem 21:283–291
Ohtaka H, Freire E (2005) Adaptive inhibitors of the HIV-1 protease. Prog Biophys Mol Biol
88:193–208
Shuman CF, Markgren P-O, Hämäläinen M, Danielson UH (2003) Elucidation of HIV-1 protease
resistance by characterization of interaction kinetics between inhibitors and enzyme variants.
Antiviral Res 58:235–242
Steuber H, Heine A, Klebe G (2007) Structural and thermodynamic study on aldose reductase:
nitro-substituted inhibitors with strong enthalpic binding contribution. J Mol Biol 368:618–638
Thornber CW (1979) Isosterism and molecular modification in drug design. Chem Soc Rev
8:563–580
9 Designing Prodrugs
After the optimization of a lead structure there are still problems. Many substances
lack important characteristics that are required for therapy in humans, for instance,
adequate bioavailability, duration of action and metabolic stability, the ability to
penetrate the blood–brain barrier, selectivity, or good tolerability. Often it proves
impossible to address or improve these properties through structural variation. A
solution to this problem can be found through special preparations, for instance to
be used for poorly water-soluble substances, or via a derivatization to a prodrug.
This term refers to a non-active or poorly active precursor or derivative of an active
molecule. In the organism this form is converted to the actual active substance. In
most cases, this is achieved by enzymatic reactions, in a few cases it happens by
spontaneous chemical decomposition.
Aside from this, the metabolites of some drugs also show favorable therapeutic
properties. In some cases this has led to new and improved drugs, in other cases the
original substance was retained as a prodrug.
9.1 Foundations of Drug Metabolism
Multiple factors have crucial importance for the absorption, bioavailability, and
duration of action of an active substance. The most important are the solubility and
lipophilicity of the drug, which are nearly equal in importance, followed by the
molecular size and the metabolic stability. The terms absorption and bioavailability
have very different meanings. Absorption refers to the amount of active substance
that is taken up by the entire gastrointestinal tract. The bioavailability refers to just
the portion of the active substance that is available in the circulation after the first
pass through the liver.
After oral administration, the metabolism of the substance by enzymes begins.
Ester and amide bonds are hydrolyzed, often already in the stomach and intestines,
or by passage through the stomach and intestinal wall. The entire blood volume
that flows through the intestines goes first to the liver via the portal vein (Fig. 9.1).
This passage is called “first pass”. Because of its rich spectrum of hydrolyzing,
G. Klebe, Drug Design, DOI 10.1007/978-3-642-17907-5_9,
# Springer-Verlag Berlin Heidelberg 2013
oxidizing, reducing, and conjugating enzymes, the liver is the main site of drug
degradation, that is, metabolism. A drug can have poor bioavailability despite
good absorption because of fast and pronounced metabolism in the liver. For many
substances, the first pass is already ‘the end of the road’. They are well absorbed,
but are immediately metabolized or excreted in the bile. The “first-pass effect”
refers to cases of successful and extensive metabolism in the very first passage.
Lipophilic active substances and those with a molecular weight of more than
500–600 Daltons (Da) are susceptible to particularly intense first-pass effects. Of
course, blood flows continuously through the liver, and metabolism carries on. The
substances are no longer in the blood stream at as high a concentration as they were
before the first liver passage because they have been distributed to the tissue. In
general, the hydrolytic cleavage of ester or amide groups leads to highly water-
soluble metabolites that can be excreted by the kidneys. Conjugation, that is, the
coupling of the substance with native polar substances, for instance, with sulfate
groups, the amino acid glycine, or the glucose oxidation product glucuronic acid,
leads to easily excreted products. In humans, conjugation has great importance. The
situation is more critical if a substance has neither easily degradable functional groups nor
positions suitable for conjugation. Even so, humans have enzymes that can metabolize such
xenobiotics. Among these, the cytochrome P450 isoenzymes are particularly important
because they are able to chemically change a molecule oxidatively at various posi-
tions. Usually this leads to better water solubility and therefore better-excretable
substances. Because these enzymes cannot predict what properties the metabolites
of these biotransformations will possess, it can occasionally happen that toxic com-
pounds ensue that have mutagenic or carcinogenic properties (▶ Sect. 27.6).
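The difference between absorption and bioavailability described above can be expressed as a simple product of surviving fractions. The sketch below uses a common textbook simplification that is not given in the original text: the bioavailable fraction is the fraction absorbed multiplied by the fractions that escape metabolism in the gut wall and in the liver during the first pass. The function name and all numerical values are hypothetical.

```python
def oral_bioavailability(
    fraction_absorbed: float,
    gut_wall_extraction: float,
    hepatic_extraction: float,
) -> float:
    """Fraction of an oral dose that reaches the systemic circulation.

    Assumed simplification:
    F = f_abs * (1 - E_gut) * (1 - E_hepatic), all arguments between 0 and 1.
    """
    for value in (fraction_absorbed, gut_wall_extraction, hepatic_extraction):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all fractions must lie between 0 and 1")
    return fraction_absorbed * (1.0 - gut_wall_extraction) * (1.0 - hepatic_extraction)


# Hypothetical drug: 90% absorbed, no gut-wall metabolism,
# but 80% removed by the liver during the first pass.
f = oral_bioavailability(0.9, 0.0, 0.8)
print(f"bioavailable fraction: {f:.2f}")
```

In this made-up example a well-absorbed drug ends up with a bioavailability of only 18%, which corresponds to the case described in the text of good absorption combined with a pronounced first-pass effect.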
Evolution has had time over millions of years to hone the degradation and
excretion of foreign substances. For many compounds however, the system fails.
Instead of detoxifying, the opposite happens, a “poisoning”. The carcinogenic
effect of polycyclic hydrocarbons is attributed to an oxidative assault, just as is
[Scheme with the labels: drug, gastrointestinal wall, portal vein, liver, circulatory system, organs, kidney, urine, bile, feces, metabolites]
Fig. 9.1 Schematic sketch of the “lifecycle” of a drug after oral administration. The drug is
already metabolized during the passage through the stomach or intestinal wall, and above all, at the
first pass through the liver. Lipophilic drugs and substances with a molecular weight of more than
500–600 Da are excreted with the bile. Polar substances and conjugated and/or metabolic products
(metabolites) are excreted by the kidneys.
the bone marrow damage and blood disease that are caused by benzene 9.1. Toluene 9.2,
the simplest alkyl homologue of benzene, is less toxic simply
because it can be oxidized to benzoic acid 9.3, which, after conjugation with the
amino acid glycine, can be excreted as hippuric acid 9.4 (Fig. 9.2). There are even
more conjugation possibilities available for the benzoic acid intermediate.
One can speculate as to why no multienzyme complexes have evolved to
immediately convert toxic intermediates into polar, nontoxic metabolites. In any
case, it is an almost unsolvable problem because the properties of the metabolites
would have to be predicted for each xenobiotic. A modification that leads to
improved water solubility in one compound can cause a mutagenic effect in
another. For their own protection humans have, in fact, mechanisms for trapping
reactive metabolites. Here glutathione and glutathione transferase must be men-
tioned because they can detoxify electrophiles particularly well (▶ Sect. 27.7).
Perhaps toxic or carcinogenic effects were not a particularly decisive theme for
evolution until now. Tumors play a secondary role for most animals because of their
short lifespan. Up until just a few generations ago, war and infectious diseases were
the primary causes of death in humans. It has only been in recent times that the
average life expectancy increased. In the sense of evolution, aging individuals play
only a secondary role. Once reproduction is complete, the parents are only neces-
sary for the care of their young until early adulthood. One only needs to think of
female spiders that consider their mates to be nothing more than their next prey
immediately after copulation!
From the above-described examples of toxic chemicals, the wrong conclusion
should not be drawn that only human-made substances can cause cancer. That is
true for a few natural products as well, for instance, aflatoxins. These microbial
secondary metabolites, which form in spoiled nuts and other foodstuffs, are potent
carcinogens. Certain alkaloids, for example, from the Spurge family (Euphorbiaceae)
are also strongly cancer-promoting substances; they are so-called tumor promoters.
The principle of nil nocere (Lat. do not harm) is strictly applied to medicines,
and only slowly have these standards been applied to other materials in our
[Figure 9.2: structural formulas of benzene 9.1 and its epoxide (conjugation with macromolecules, further metabolization), toluene 9.2, benzoic acid 9.3, and hippuric acid 9.4]
Fig. 9.2 The oxidation of benzene 9.1 leads to a reactive and toxic intermediate. In contrast, the
oxidation of toluene 9.2 affords benzoic acid 9.3, which can be excreted by the kidney as its
nontoxic glycine conjugate 9.4.
9.1 Foundations of Drug Metabolism 175
environment. For the testing and development of active compounds, this means that
particularly rigorous tests for carcinogenic, mutagenic, and teratogenic effects must
be conducted. A well-founded suspicion that a compound or one of its
possible metabolites displays such effects is enough to halt its further
development.
9.2 Esters Are Ideal Prodrugs
Establishing satisfactory water solubility in substances that are simultaneously
suitable for passive transport across membranes is a special challenge in pharma-
ceutical optimization. Nowadays attention is paid to the correct balance of
these parameters already in the early phase of development (▶ Chap. 19, “From
In Vitro to In Vivo: Optimization of ADME and Toxicology Properties”). If it is not
possible to achieve this optimum with the actual active substance, esters are often
produced as suitable prodrugs. Esters are easily cleaved by ubiquitously occurring
esterases. The improved lipophilicity helps with the passive transport through
diffusion over membrane barriers, as found in the intestines and above all else
the blood–brain barrier. One prodrug that has sadly achieved infamy is heroin 9.5
(Fig. 9.3), the diacetyl ester of morphine (▶ Sect. 3.3). Because of its markedly
increased lipophilicity, heroin penetrates the blood–brain barrier quickly. The
pharmacologist Heinrich Dreser, who tested acetylsalicylic acid at Bayer, intro-
duced heroin to therapy in 1898 as a pain and cough medicine because of its
minimal respiratory depression. But heroin belongs to the substances with the
highest addictive potential. Its abuse is an enormous social problem in many
countries. It is used therapeutically in exceptional cases, for instance, for pain
therapy in cancer patients, particularly those who have exhausted other therapeutic
options.
Many other prodrugs are also esters. The transformation from an acid or alcohol
group to an ester usually leads to a better-absorbable product. The formerly used
antilipidemic clofibrate 9.6 (▶ Sect. 28.6) is just such an example of a bioavailable
ester of a biologically active free acid 9.7. The angiotensin-converting enzyme
inhibitor enalapril 9.8 (▶ Sect. 25.4) and its analogues are also prodrugs. The free
acid 9.9 is not absorbed, but it is the active form in vitro (Fig. 9.3). The diester is
chemically unstable and quickly forms the inactive diketopiperazine 9.10. It is
essential that only one of the acid groups is esterified to prevent the formation of
this side product. The monoester 9.8 is “interpreted” as a dipeptide and is transported
over the cell membrane by an oligopeptide transporter (▶ Sect. 30.7). The β-lactam
antibiotics (▶ Sect. 23.7) are also taken up by this transporter.
Hydroxymethylglutaryl-coenzyme A 9.11 (HMG-CoA) is enzymatically
reduced to mevalonic acid 9.12 in the biosynthesis of cholesterol (Fig. 9.4). The
antilipidemic lovastatin 9.13 (▶ Sect. 27.3) prevents this reaction by inhibiting
HMG-CoA reductase. It contains a lactone ring, which is transformed to its active
form 9.14 by hydrolysis. This form is structurally very similar to the product of the
enzymatic reaction, mevalonic acid 9.12.
Other ester prodrugs were developed for depot formulations to achieve a longer
duration of action after subcutaneous or intramuscular administration.
The phenolic hydroxyl group of bambuterol 9.15 is masked as a carbamate.
Terbutaline 9.16 (Fig. 9.5) is formed from this prodrug after hydrolysis by
unspecific cholinesterases (▶ Sect. 23.7). By using this prodrug strategy it was
possible to make a long-acting bronchospasmolytic that only needs to be adminis-
tered once daily in contrast to the actual active substance, which must be admin-
istered three times daily.
Occasionally, a prodrug can be used to improve the taste, for instance, in the case
of the extremely bitter chloramphenicol 9.17. By converting it to the palmitate 9.18
(Fig. 9.5) the water solubility is strongly reduced, but the substance no longer tastes
bitter. The concomitant reduction in the absorption is of no consequence. The
substance is hydrolyzed to the highly soluble and easily absorbed chloramphenicol
in the duodenum by the pancreatic lipase enzymes.
The glucoside salicin (▶ Sect. 3.1) represents a true prodrug that after hydrolysis
and oxidation is converted to the anti-inflammatory salicylic acid. In contrast,
acetylsalicylic acid (ASA) is a mixed type. It has its own activity through the
[Figure 9.3: structural formulas of heroin 9.5; clofibrate 9.6 (R = Et) and clofibric acid 9.7 (R = H); enalapril 9.8 (R = Et) and enalaprilat 9.9 (R = H); diketopiperazine 9.10 (R1 = phenethyl, R2 = Me)]
Fig. 9.3 Heroin 9.5, the diacetyl derivative of morphine, acts reliably and quickly, “heroically.”
Like morphine, it is slowly and inefficiently absorbed, but after intravenous application it crosses
the blood–brain barrier 100 times faster than morphine. There, the ester is converted by the
enzyme pseudocholinesterase to morphine, which can no longer leave the brain because of its
higher polarity. The cholesterol-lowering drug clofibrate 9.6 is a prodrug of the actual active
compound, the free acid 9.7. The antihypertensive enalapril 9.8 is also a prodrug of the active
compound 9.9. Here it is not high lipophilicity that is responsible for the absorption; rather, the
compound is actively transported by binding to a dipeptide transporter. The diester of enalapril is
unsuitable as a drug because it spontaneously forms the inactive diketopiperazine 9.10.
irreversible inhibition of cyclooxygenase, above all as a coagulation-inhibiting
substance. On the other hand, ASA has prodrug character because the metabolic
release of salicylic acid contributes a small part to the anti-inflammatory effect
(▶ Sect. 27.9). Furthermore, ASA is less irritating to the mucous membranes and
tastes less unpleasant than salicylic acid. For a drug with a molecular weight of
180 Da, this combination of favorable characteristics in one structure is a proud
achievement.
Esterification can also help with inadequate water solubility of an active sub-
stance. For this, esters with phosphoric acid or hemiesters with dicarboxylic acids
such as succinic acid are formed. The added groups carry a charge and increase the
water solubility of the active substance. In the organism, the esters are easily
hydrolyzed again. The anticonvulsive compound phenytoin could be converted to
a more-hydrophilic phosphate prodrug 9.19 (Fig. 9.5), which is easily hydrolyzed
by phosphatases (▶ Sect. 26.8). If a terminal sulfonamide group, as found in the
prodrugs of celecoxib (9.20, 9.21; Fig. 9.5), is acylated, water-soluble salts are more
easily formed. The acyl group is also easily hydrolyzed in the intestines.
Esterification with polyethylene glycol (PEG) can also be used to enhance
solubility. This very water-soluble polymer has been coupled through an ester
group to the natural product paclitaxel (▶ Sect. 6.2, ▶ 6.5). As PEG-paclitaxel,
this compound can be used as an intravenous chemotherapeutic.
9.3 Chemically Well Wrapped: Multiple Prodrug Strategies
The antibacterial sulfonamide, sulfamidochrysoidine (▶ Sect. 2.3) is a prodrug. It is
only after cleavage of the azo bond that the metabolic product, sulfanilamide, acts
as an antimetabolite of p-aminobenzoic acid, which is critical for microorganisms.
[Figure 9.4: reaction scheme showing the reduction of HMG-CoA 9.11 to mevalonic acid 9.12 by HMG-CoA reductase, together with the structures of lovastatin 9.13 and its active metabolite 9.14]
Fig. 9.4 The enzymatic reduction of hydroxymethylglutaryl-coenzyme A 9.11 (HMG-CoA) to
mevalonic acid 9.12 is inhibited by the lactone-ring-opened active metabolite 9.14 of lovastatin
9.13 (▶ Sect. 27.3).
[Figure 9.5: structural formulas and bioactivation schemes for bambuterol 9.15 and terbutaline 9.16; chloramphenicol 9.17 (R = H) and its palmitate 9.18 (R = CO(CH2)14CH3); fosphenytoin 9.19; the celecoxib prodrugs 9.20 (R = methyl) and 9.21 (R = ethyl); proguanil 9.22 and cycloguanil 9.23; sulindac 9.24 and its sulfide 9.25]
Fig. 9.5 Bambuterol 9.15 is a carbamate-masked prodrug of the bronchospasmolytic terbutaline
9.16. It is transformed to the active compound slowly, by hydrolysis. The prodrug 9.18 of
chloramphenicol 9.17 masks only its extremely bitter taste. Phenytoin can be converted to
a phosphoric acid ester 9.19, which is significantly more water-soluble. The cyclooxygenase
inhibitor celecoxib can be converted to prodrugs (9.20–9.21) by adding acyl groups; these have
much-improved water solubility. The antimalarial cycloguanil 9.23 is formed by a metabolic
cyclization of the inactive precursor proguanil 9.22. The anti-inflammatory sulindac 9.24 has
100 times better water solubility than its actual active form, the sulfide 9.25. In addition to this
reversible enzymatic reduction an irreversible enzymatic oxidation to a biologically inactive
sulfone also occurs.
Additional prodrugs are proguanil 9.22, which is converted to cycloguanil 9.23
(▶ Sect. 27.2), or the anti-inflammatory sulindac 9.24, which is metabolically
converted to the active sulfide 9.25 (Fig. 9.5).
Amidines are used as building blocks in thrombin inhibitors and antagonists of
the integrin receptor αIIbβ3 (▶ Sect. 31.2). These strongly basic groups are detrimental
to good bioavailability. Through oxidation to the corresponding
amidoximes, a less-basic group is formed that is not protonated under physiolog-
ical conditions. Reductases, which are present in the liver, kidney, lung, or brain,
release the original amidine structure. This concept, together with the esterifica-
tion of the terminal acid function, was applied in a double-prodrug strategy for the
thrombin inhibitor ximelagatran 9.26 and the receptor antagonist sibrafiban 9.27
(Fig. 9.6).
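The link between basicity and charge state described above can be illustrated with the Henderson–Hasselbalch relationship. The following Python sketch estimates which fraction of an amidine and of an amidoxime is protonated at pH 7.4. The pKa values of the conjugate acids (about 11.5 for an amidine, about 5 for an amidoxime) are assumed, order-of-magnitude figures that are not taken from the text, and the function name is hypothetical.

```python
def fraction_protonated(pka: float, ph: float = 7.4) -> float:
    """Fraction of a base present as its protonated (charged) conjugate acid.

    Follows from the Henderson-Hasselbalch equation:
    [BH+] / ([B] + [BH+]) = 1 / (1 + 10**(pH - pKa)).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Assumed, illustrative pKa values of the conjugate acids (not from the text).
ASSUMED_PKA = {"amidine": 11.5, "amidoxime": 5.0}

for group, pka in ASSUMED_PKA.items():
    share = fraction_protonated(pka)
    print(f"{group}: {share:.2%} protonated at pH 7.4")
```

Under these assumed values the amidine is almost completely charged at physiological pH, whereas the amidoxime is almost completely neutral, which is consistent with the rationale given for the amidoxime prodrugs.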
The bombing of an allied ship that was docked in an Italian harbor in 1943 with
100 t of mustard gas 9.28 (bis-β-chloroethyl sulfide, Fig. 9.7) led to the observation
that many of those who were poisoned experienced a severe reduction in their white
blood cell counts. This severe toxicity for cells that quickly divide could be used for
killing tumor cells. The cytotoxic effect arises from multiple alkylations of DNA.
Consequently, replication and subsequent cell division are affected. A purposeful
search for analogues of mustard gas with less toxicity led via the N-analogue 9.29 to
the aromatic-substituted derivative 9.30, which still had inadequate tolerability and
tumor specificity. Tumor cells are especially rich in phosphatases. Because of this,
H. Arnold at the German company Chemie Grünenthal reasoned that phosphoric
acid derivatives of nitrogen mustard (N-lost) might be suitable for a tumor-specific therapy. The most
interesting compound was cyclophosphamide 9.31, a substance that can cause the
complete disappearance of tumors in animal experiments. The originally assumed
mechanism is not correct because the substance is inactive in vitro in cell cultures of
tumors. The metabolic activation occurs outside of the tumor in the liver through
oxidation (Fig. 9.7).
[Figure 9.6: structural formulas of ximelagatran 9.26 and sibrafiban 9.27]
Fig. 9.6 Ximelagatran 9.26 and sibrafiban 9.27 were developed to improve oral bioavailability, and
contain both an uncharged amidoxime group and an ester function as a double prodrug.
In the case of the cancer therapeutic 5-fluorouracil 9.33, the activation occurs
through tumor-specific enzymes. The triple-prodrug capecitabine 9.34 is initially
activated to 9.35 by a carboxylesterase in the liver (Fig. 9.8). Then cytidine
deaminase cleaves an amino group to give 9.36 in the liver as well as in the
tumor. Lastly thymidine phosphorylase releases the active substance 9.33 in the
tumor cell. There, the compound unleashes its effect by blocking thymidylate
synthase, an enzyme that plays an important role in the thymine biosynthesis
(▶ Sect. 27.2) in that it delivers building blocks for DNA synthesis. Because cancer
cells divide more quickly than healthy cells, they are more dependent on the activity
of thymidylate synthase.
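The stepwise activation described above can be pictured as a chain of consecutive reactions, prodrug → 9.35 → 9.36 → 5-fluorouracil. The following Python sketch is a deliberately simplified toy model: it treats each step as an irreversible first-order reaction in a single well-mixed compartment and ignores the distribution between liver and tumor. The rate constants and the function name are hypothetical and are not taken from the text.

```python
def simulate_cascade(rate_constants, t_end=10.0, dt=0.001):
    """Integrate a linear chain A -> B -> C -> ... with explicit Euler steps.

    rate_constants[i] is the first-order rate constant of step i.
    Returns the relative amounts of all species at time t_end,
    starting from pure prodrug (amount 1.0).
    """
    amounts = [1.0] + [0.0] * len(rate_constants)
    steps = int(round(t_end / dt))
    for _ in range(steps):
        # Flux through each step, evaluated at the start of the time step.
        fluxes = [k * a for k, a in zip(rate_constants, amounts)]
        for i, flux in enumerate(fluxes):
            amounts[i] -= flux * dt      # consumed by step i
            amounts[i + 1] += flux * dt  # formed by step i
    return amounts


# Hypothetical rate constants (arbitrary time units) for the three steps.
labels = ["prodrug", "intermediate 1", "intermediate 2", "active drug"]
for label, amount in zip(labels, simulate_cascade([2.0, 1.0, 0.5])):
    print(f"{label}: {amount:.3f}")
```

In such a chain the slowest step limits how quickly the active substance appears. In the real case the location of each enzyme matters as well, which this sketch does not capture.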
9.4 L-DOPA Therapy: A Clever Prodrug Concept
The neurotransmitters dopamine and acetylcholine fulfill different tasks in partic-
ular parts of the central nervous system. Parkinson’s disease, also called
the “shaking palsy,” is a result of the degeneration of dopamine-producing cells
in the Substantia nigra in the midbrain. The ensuing disproportion between the
[Figure 9.7: structural formulas of mustard gas 9.28, the N-analogue 9.29 (R = CH3), the N-aryl analogue 9.30 (R = aryl), and cyclophosphamide 9.31, with the metabolic activation in the liver that yields the active form 9.32 and acrolein]
Fig. 9.7 The cytostatic N-methyl and N-aryl compounds 9.29 and 9.30 are derived from mustard
gas 9.28. The first step in the activation of the prodrug cyclophosphamide 9.31 is a metabolic
hydroxylation of the carbon next to the nitrogen atom. The biologically active agent 9.32 and the
toxic side product acrolein come from a labile intermediate that is formed by enzymatic degrada-
tion and spontaneous decomposition.
dopaminergic and cholinergic nerve impulses leads to episodic chronic movement
disorders such as rigidity, tremor, shaking, and an inability to move normally.
Similar side effects are caused by substances that block the dopamine receptors,
for instance, the tricyclic neuroleptics (▶ Sect. 1.6). Intravenous administration of
dopamine 9.37 (Fig. 9.9) does not lead to the desired effect because the substance
cannot penetrate the blood–brain barrier. Because of its purely peripheral effect,
undesirable side effects on the heart and circulation are observed, for example, an
increase in heart rate and blood pressure.
The desired equilibrium in the brain can also be established by suppressing
the cholinergic system. This route is taken by giving anticholinergics, that is,
antagonists of the cholinergic receptors. The administration of the amino acid
L-DOPA 9.38 (Fig. 9.9) is a more elegant possibility for dopamine substitution.
This metabolic precursor of dopamine is an orally bioavailable, CNS-effective
medicine. It is even more polar than dopamine and can neither be absorbed from the
gastrointestinal tract nor cross the blood–brain barrier by passive diffusion alone.
Because it is an amino acid, it uses an amino acid transporter (▶ Sect. 30.7).
With this, the first goal, CNS activity, is achieved. Oral L-DOPA administration,
however, still causes too many side effects in the peripheral nervous system.
Furthermore, L-DOPA is very short acting as dopamine is quickly metabolized in
the brain. Therefore, one must try to prevent the metabolism of the substance while
simultaneously reducing its concentration in the periphery. The combination of
[Figure 9.8: reaction scheme from capecitabine 9.34 via 9.35 (carboxylesterase, liver) and 9.36 (cytidine deaminase, liver and tumor) to 5-fluorouracil 9.33 (thymidine phosphorylase, tumor)]
Fig. 9.8 The triple-prodrug capecitabine 9.34 is activated to 9.35 by a carboxylesterase in the liver,
then it is transformed into 9.36 by a cytidine deaminase in the liver and in the tumor, and finally
a thymidine phosphorylase releases the cancer therapeutic 5-fluorouracil 9.33 in the tumor.
L-DOPA with the peripheral decarboxylase inhibitor benserazide 9.39 and the CNS-effective
monoamine oxidase inhibitor selegiline 9.40 (▶ Sect. 27.8) largely solves
this problem. The peripheral side effects are reduced and the CNS effects are
extended (Fig. 9.9). Despite this tour de force of drug design, which has led to
significant therapeutic progress, the metabolically produced dopamine still acts in
too many places. Aside from the residual peripheral side effects, sudden changes
between excessive movement, normal movement, and rigidity, insomnia, agitation,
and hallucinations are all manifestations of the generalized CNS activity.
It has been speculated in conjunction with this observation, whether, in addition
to endogenous and genetic factors, environmental factors, for example, the meta-
bolic transformation of structurally analogous foreign substances, might be respon-
sible for triggering Parkinson’s disease.
9.5 Drug Targeting, Trojan Horses, and Pro-prodrugs
The design of active substances that exert their effect only in, or overwhelmingly in,
one particular organ is called drug targeting. Aside from general principles, for
example an optimal lipophilicity as a prerequisite for crossing the blood–brain
barrier, specific metabolic transformations are used. The Parkinson’s disease drug
L-DOPA, which was introduced in the previous section, is such a prodrug. The
anticonvulsive medicine progabide 9.41 is a double prodrug because both func-
tional groups of the neurotransmitter are masked. After crossing the blood–brain
barrier and release of the amino and carboxyl groups, the actual active compound,
γ-aminobutyric acid (GABA, Fig. 9.10), is formed.
[Figure 9.9: structural formulas of dopamine 9.37, L-DOPA 9.38, benserazide 9.39 (racemate), and selegiline 9.40]
Fig. 9.9 Because dopamine 9.37 cannot enter the central nervous system, the metabolic precursor
L-DOPA 9.38 is used. To reduce the cardiovascular effects of dopamine, L-DOPA is combined
with a peripherally active decarboxylase inhibitor benserazide 9.39. The administration of
a monoamine oxidase inhibitor, for example, selegiline 9.40, prevents the fast degradation of
dopamine.
The ability of the blood–brain barrier to exclude polar substances can also be used
as a prodrug concept. For this an active compound with a metabolically labile group
can be coupled to a dihydropyridine. The neutral conjugate 9.43 can cross the blood–
brain barrier. Oxidation leads to a permanently charged compound 9.44, which can
no longer leave the brain. Upon metabolic cleavage the free active compound is
released in situ (Fig. 9.11). If oxidation takes place in the periphery, the highly water-soluble
conjugate is excreted before the actual active substance is released. As nice as
this principle seems, it has not found its way into therapy yet.
[Figure 9.10: structural formulas of progabide 9.41 and GABA 9.42 on either side of the blood–brain barrier]
Fig. 9.10 Because it is a lipophilic neutral molecule, progabide 9.41 can cross the blood–brain
barrier. It is transformed into the neurotransmitter γ-aminobutyric acid (GABA) 9.42 upon
metabolic release of the amino and carboxyl groups.
[Figure 9.11: scheme showing the neutral, lipophilic drug–dihydropyridine conjugate 9.43 crossing the blood–brain barrier, its metabolic oxidation to the charged, polar conjugate 9.44 (fast elimination from the periphery), and the metabolic cleavage that releases the free drug in the brain]
Fig. 9.11 Drug targeting in the brain is accomplished with a drug–dihydropyridine conjugate
9.43. This substance can easily enter the central nervous system. Metabolic oxidation leads to
a permanently charged pyridine 9.44, which cannot cross the blood–brain barrier. The active
compound is released in the brain, and the polar conjugate is quickly excreted from the periphery.
Several analogues of nucleoside bases and nucleosides are Trojan horses. The
anti-herpes medicine aciclovir 9.45 enters the cell as its inactive form. The first
monophosphorylation occurs only in virus-infected cells by a virus-specific thymi-
dine kinase. Next cellular kinases carry out the formation of the triphosphate, the
actual active substance. Because of this aciclovir acts as a targeted antiviral. The
compound is, however, poorly absorbed. The more suitable valaciclovir 9.46
(Fig. 9.12) is understood to be a pro-prodrug. In the organism it is initially
hydrolyzed to aciclovir and then transformed into the active form by the viral
enzyme. Valaciclovir is more lipophilic than aciclovir, but despite this it is more
soluble in water and approximately 55% bioavailable.
Omeprazole 9.47 is the prodrug of an irreversible inhibitor of the H+/K+-ATPase,
the so-called proton pump. Only under strongly acidic conditions, in the
acid-producing cells of the stomach, is it transformed into sulfenic acid 9.48, which
is in equilibrium with cyclic sulfenamide 9.49 (Fig. 9.13). This reacts irreversibly
with an SH group of the enzyme to form a disulfide. Omeprazole is more effective
than the H2 antagonists (▶ Sect. 3.5) because it blocks not only the histamine-
induced acid secretions but rather all forms of acid secretion.
The different metabolic activity in different tissues can be used to achieve
a selective effect in one specific organ. In principle, adrenaline (▶ Sect. 1.4) as
well as some β-blockers are suitable for the treatment of glaucoma, because they
can normalize elevated intraocular pressure. However, they have substantial unde-
sirable side effects on the heart function and circulation. This can be avoided by the
administration of prodrugs that are metabolized more quickly in the eye, or only in
the eye, for example, a particularly robust ester 9.50 of adrenaline 9.51, or a ketone–
oxime ether 9.52 of timolol 9.53 (Fig. 9.14).
The area of drug targeting has developed into an exciting field in recent years.
Aside from the above-described prodrugs that release active compounds in the
target area, the concept of antibody-coupled drugs has been pursued especially
for the development of novel cancer therapeutics. Another approach is the
coupling of drugs to a cell-specific recognition sequence. The goal of this work
is to trick the membrane transporters of very specific cells so that the drug
conjugate gains entry. Tumor therapeutics that were derived from nitrogen mustard (N-lost) were
introduced in Sect. 9.3. These cytotoxic alkylating compounds, however, are very
reactive and should only be activated in the desired target tissue. For this, the
[Figure 9.12: structural formulas of aciclovir 9.45 (X = H) and valaciclovir 9.46]
Fig. 9.12 Aciclovir 9.45 is a Trojan horse. An enzymatic phosphorylation of its hydroxyl group
by a viral kinase affords its monophosphorylated form in virus-infected cells only, which is then
transformed to the triphosphate derivative by the cellular kinases. Valaciclovir 9.46 is a
pro-prodrug because it is first transformed to aciclovir by hydrolysis and subsequently activated.
[Figure 9.13: reaction scheme showing the acid-induced rearrangement of omeprazole 9.47 to the sulfenic acid 9.48 and the cyclic sulfenamide 9.49, followed by disulfide formation with ATPase-SH]
Fig. 9.13 In the presence of acids, omeprazole 9.47 is rearranged to a sulfenic acid 9.48, which is
in equilibrium with a cyclic sulfenamide 9.49. This reacts irreversibly with an SH group on the
H+/K+-ATPase, the so-called proton pump.
[Figure 9.14: structural formulas of dipivefrin 9.50 (R = COC(CH3)3) and adrenaline 9.51 (R = H); the oxime ether 9.52 (X = N-OCH3), the corresponding ketone (X = O), and timolol 9.53 (X = H, OH)]
Fig. 9.14 The metabolic peculiarities of the eye are exploited for drug targeting in glaucoma
therapy. After penetrating the cornea, the bis-pivaloyl ester of adrenaline 9.51, dipivefrin 9.50, is
hydrolyzed 20 times faster than it is in the periphery. The oxime ether of timolol 9.52 is
metabolized through the ketone to the active form, timolol 9.53, only in the eye.
following strategies were developed. The aromatic nitrogen mustard (N-lost) derivative 9.55 (Fig. 9.15) is
released from prodrug 9.54 by specific peptide cleavage with carboxypeptidase
G2, an enzyme that only exists in bacteria. This enzyme was coupled to
a monoclonal antibody (▶ Sect. 32.3) that specifically recognizes human colorec-
tal cancer cells. With this, the enzyme that “arms” the cancer drug is brought in
the immediate vicinity of the cancer cell. In the future, this antibody-guided
enzyme-activated prodrug therapy could make cancer therapy more tolerable
and less toxic by releasing the active substance locally and in a distinctly more
targeted way.
9.6 Synopsis
• If it is impossible to achieve sufficient bioavailability, duration of action,
membrane penetration or metabolic stability by chemical modifications,
a prodrug can be developed that corresponds to a non- or poorly active precursor
or derivative that is converted in the organism to its active form.
• After absorption, a drug is transported to the liver and exposed to degrading
enzymes that make it more water-soluble for excretion. The amount of the drug
that survives this first liver pass is referred to as the bioavailable portion and can
be distributed in the organism.
• Esters are often used as prodrugs to mask polar acid groups; they are cleaved by
ubiquitously present esterases.
• A large variety of chemical modifications have been applied to modulate the
physicochemical properties of drug molecules; however, they require special
enzymes in the targeted cells or organs for metabolic activation.
• L-DOPA, an amino acid analogue of dopamine, is delivered to the brain via an
amino acid transporter and rapidly decarboxylated. To avoid side effects in the
periphery, a combination with polar decarboxylase inhibitors is advisable.
• Drug targeting to particular organs or cells exploits specific metabolic trans-
formations only present in these compartments of the body.
[Figure 9.15: structural formulas of prodrug 9.54 and the released derivative 9.55, with cleavage by carboxypeptidase]
Fig. 9.15 The highly reactive cancer therapeutic derivative 9.55 is released from prodrug 9.54, which is
activated by a specific carboxypeptidase. The carboxypeptidase is bound to an antibody that is
targeted to the cancer cell.
• Antibody-coupled drugs are specifically delivered to those compartments or
organs that present the antibody-specific recognition site on the surface of
disease-related cells. To trick membrane transporters, drugs can be coupled to
cell-specific recognition sequences and thus gain entry to the cells.
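The synopsis point on the first liver pass can be made concrete with a small calculation. In a common simplification, oral bioavailability is written as the product of the fraction absorbed and the fractions that escape metabolism in the gut wall and in the liver. The following Python sketch applies this product; the function name and all numerical values are hypothetical and serve only as an illustration.

```python
def oral_bioavailability(fraction_absorbed, gut_wall_extraction, hepatic_extraction):
    """Estimate oral bioavailability F as a product of surviving fractions.

    F = fraction absorbed * (1 - gut wall extraction) * (1 - hepatic extraction)
    All arguments are fractions between 0 and 1.
    """
    inputs = {
        "fraction_absorbed": fraction_absorbed,
        "gut_wall_extraction": gut_wall_extraction,
        "hepatic_extraction": hepatic_extraction,
    }
    for name, value in inputs.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1, got {value}")
    return fraction_absorbed * (1.0 - gut_wall_extraction) * (1.0 - hepatic_extraction)


# Hypothetical drug: 90% absorbed, 10% lost in the gut wall, 50% lost in the liver.
f_oral = oral_bioavailability(0.9, 0.1, 0.5)
print(f"Estimated oral bioavailability: {f_oral:.1%}")
```

A prodrug can raise the first factor (better absorption) without necessarily changing the other two, which is why both absorption and first-pass metabolism have to be considered together.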
Bibliography
General Literature
Balant LP, Doelker E (1995) Metabolic considerations in prodrug design. In: Wolff ME (ed)
Burger’s medicinal chemistry, vol I, 5th edn. Wiley, New York, pp 949–982
Bodor N (1987) Prodrugs and site-specific chemical delivery systems. Annu Rep Med Chem
22:303–313
Bundgaard H (ed) (1985) Design of prodrugs. Elsevier, Amsterdam
Bundgaard H (1991) Design and application of prodrugs. In: Krogsgaard-Larsen P,
Bundgaard H (eds) A textbook of drug design and development. Harwood Academic, Chur,
pp 113–191
Ettmayer P, Amidon GL, Clement B, Testa B (2004) Lessons learned from marketed and investigational
prodrugs. J Med Chem 47:2394–2404
Gibson GG (1994) Introduction to drug metabolism. Blackie, London
Rautio J (2012) Prodrugs and targeted delivery—towards better ADME properties. In:
Mannhold R, Kubinyi H, Folkers G (eds) Methods and principles in medicinal chemistry,
vol 47. Wiley-VCH, Weinheim
Silverman RB (2004) The organic chemistry of drug design and drug action, 2nd edn. Elsevier
Academic, Oxford, Chapter 7, Drug metabolism, and Chapter 8, Prodrugs and drug delivery
systems
Stella VJ, Borchardt RT, Hageman MJ, Oliyai R, Maag H, Tilley JW (eds) (2007) Prodrugs:
challenges and rewards, vol 2. Springer, New York
Testa B (2007) Prodrug and soft drug design. In: Taylor JB, Triggle DJ (eds) Comprehensive
medicinal chemistry II, vol 5. Elsevier, Oxford, pp 1009–1041
Testa B, Mayer JM (2003) Hydrolysis in drug and prodrug metabolism – chemistry, biochemistry
and enzymology. Wiley-VHCA, Zürich
Special Literature
Bodor N, Buchwald P (2005) Ophthalmic drug design based on the metabolic activity of the eye:
soft drugs and chemical delivery systems. AAPS J 7:E820–E833
Brewster ME, Pop E, Bodor N (1993) Chemical approaches to brain-targeting of biologically
active compounds. In: Kozikowski AP (ed) Drug design for neuroscience. Raven, New York
Napier MP, Sharma SK et al (2000) Antibody-directed enzyme prodrug therapy: efficacy and
mechanism of action in colorectal carcinoma. Clin Cancer Res 6:765–772
10 Peptidomimetics
Peptides are open-chain polymers made up of amino acids (Fig. 10.1). The main
chain is constructed of alternating amide groups —CONH— and aliphatic
carbon atoms, which are labeled Cα. The side chains branch from the main chain
at the Cα atom. The amide group is barely flexible (▶ Sect. 14.1). In contrast,
a rotation around the Cα–Cβ bond is possible. The side chains are flexible as well.
Because of this, each amino acid can take on multiple conformations. As a
consequence, peptides are very flexible molecules with many rotatable bonds and
a multitude of possibilities to adopt different spatial configurations. Formally, there
is no difference between the construction of peptides and proteins. Nonetheless,
oligomers of amino acids up to a size of 30–50 monomer building blocks are called
peptides, and the term protein is preferred for any members of this substance class
that are above this limit.
10.1 The Therapeutic Relevance of Peptides
Peptides are responsible for numerous biological functions in humans as enzyme
substrates and hormones. A few important examples are summarized in Table 10.1.
Accordingly, peptides are interesting for therapeutic purposes, and in fact, several
important drugs are peptides (Fig. 10.2).
The use of peptides as drugs is significantly limited by several factors:
• Peptides are poorly absorbed after oral administration; this is mostly because of
their high molecular weight and pronounced polarity.
• Peptides are easily degraded by proteases in the gastrointestinal tract and are
therefore metabolically unstable.
• The body is able to very quickly excrete peptides via the liver and kidneys.
Because peptides accomplish so many biological functions in our bodies,
there is tremendous interest in finding active substances that do not have the
above-mentioned detrimental properties, but that bind to the same receptors
analogously to peptides or block enzymes that transform peptide substrates.
A stepwise approach is taken in the search for such compounds. Peptide structures
G. Klebe, Drug Design, DOI 10.1007/978-3-642-17907-5_10,
© Springer-Verlag Berlin Heidelberg 2013
[Figure 10.1: structural formula of Leu-enkephalin (Tyr-Gly-Gly-Phe-Leu) with an enlarged phenylalanine residue showing the torsion angles ω, φ, ψ, and χ and the atoms Cα, Cβ, and Cγ]
Fig. 10.1 The pentapeptide Leu-enkephalin as an example of a peptide structure. The left side
with the free NH2 group is the N terminus, and the other is the C terminus. Each amino acid
contributes three atoms to the peptide chain. Nature almost exclusively uses the 20 natural
(proteinogenic) L-amino acids for the construction of peptides (see Appendix 1). Depending on
the functional groups in the side chains, the distinction is made between hydrophilic acidic and
basic amino acids and those with hydrophobic aliphatic and aromatic side chains. The amino acids
are abbreviated with three-letter codes. A one-letter code is also used. The definition of the torsion
angles ω, φ, ψ, and χ is shown on the example of the amino acid phenylalanine. The angle ω is
practically always close to 180°. The spatial course of the peptide backbone is determined by the φ
and ψ angles (see ▶ Sect. 14.2). The first atom in the side chain is called the Cβ atom, and the next
is given the index γ.
Table 10.1 Several important peptide hormones.
Peptide | Function
Leu-Enkephalin, Met-Enkephalin | Opiate receptor ligands, analgesics
Fibrinogen | Platelet aggregation
Angiotensin II | Blood pressure increase
Endothelin | Blood pressure increase (among other actions)
Neuropeptide Y | Blood pressure increase (among other actions)
Substance P | Bronchoconstriction and pain mediation
are replaced with isosteric building blocks so that the molecular recognition
properties of the peptide remain, but the undesirable characteristics are reduced.
Such peptidomimetics should have the following qualities:
• Few or no cleavable amide bonds to improve metabolic stability.
• Reduced molecular weight to improve oral bioavailability.
• The same spatial orientation of groups responsible for strong binding to the
receptor or enzyme as in the peptide.
Bacteria are the true masters of constructing peptide structures that frequently
achieve the desired metabolic stability. They incorporate amino acids that do not
belong to the typical 20 residues that are usually used for the construction of
proteins. Stereochemically inverted amino acids are also employed, and many of
these structures have a cyclic architecture. They have even evolved a dedicated
synthesis machinery for this: nonribosomal peptide synthesis (▶ Sect. 32.6). This
system of modular, coupled enzymes works like an assembly line. Depending on
the desired product, different enzymatic functional units are lined up, one after the
other, to successively assemble the amino acids; the product is cyclized in the final
step. The exchange of an enzymatic synthesis unit causes other amino acids to be
incorporated into the otherwise unchanged peptide. Even ester bonds can be
constructed with a very similar multienzyme complex. Many lead structures, and
even complete drugs, can be derived from these originally bacterial peptides, such
as ciclosporin in Fig. 10.2, which is a very important immunosuppressant. A large
number of macrolide antibiotics (▶ Sect. 32.6) are also synthesized in this way.
Recently a so-called chemoenzymatic synthetic strategy has been developed for
the construction of such macrolides. As discussed in ▶ Sect. 11.6, linear oligopeptides
can easily be synthesized by using the Merrifield synthesis. Non-natural amino acids
Oxytocin: H-Cys-Tyr-Ile-Gln-Asn-Cys-Pro-Leu-Gly-NH2
Ciclosporin: [structural formula of the macrocyclic peptide with N-methylated amide bonds]
Leuprolide: pGlu-His-Trp-Ser-Tyr-D-Leu-Leu-Arg-Pro-NHEt
Fig. 10.2 Peptides as drugs. Oxytocin is used to induce and strengthen contractions during labor.
The immunosuppressive ciclosporin prevents organ rejection after transplantation. Leuprolide
(pGlu = pyroglutamate) is an analogue of LHRH (luteinizing hormone releasing hormone), one
of the hypothalamic hormones that, via LH (luteinizing hormone), controls the synthesis of male
and female sex hormones. Leuprolide is used to treat advanced-stage prostate cancer.
with L and D configurations can also be used to generate high combinatorial diversity.
It is very difficult to cyclize these linear oligopeptides to the desired macrocycle by
using chemical-synthetic methods. Here the nonribosomal peptide synthetic machin-
ery is of service. The synthetically prepared peptides are then funneled into the
enzymatic process chain and the cyclization domain from the bacteria catalyzes
the ring closure of the peptide: a perfect symbiosis between synthetic chemistry
and enzyme biology!
10.2 Designing Peptidomimetics
At the beginning of the 1980s, there was only one generally accepted example of
a low-molecular-weight active substance that takes over the function of an
endogenous peptide: the opiates. It is assumed that morphine 10.1 is a mimetic of the
endogenous peptide β-endorphin 10.2 (Fig. 10.3). A comparison of both structures
makes it immediately clear that morphine cannot possibly simulate all of the
functional groups of the peptide. Obviously not all are necessary for the biological
activity. This underscores the suspicion that other peptides also bind to receptors
with only a few functional groups. If this hypothesis is true, it should be possible to
identify the essential functional groups and find a small organic molecule that has
the necessary functional groups in the correct relative orientation.
The starting point for the design of peptidomimetics is the identification of the
biologically active peptide, the function of which is to be imitated. In the first step,
single amino acids are excluded to determine whether a portion of the peptide
retains sufficient activity. Next the importance of the individual side chains is
investigated. In a so-called alanine scan (Sect. 10.7), each amino acid is succes-
sively replaced with alanine. A severe loss of activity is an indication that the
removed side chain is important. Up to this point, only peptides made up of the 20
natural amino acids have been investigated. In the next step, structural elements are
introduced that do not occur in the 20 proteinogenic amino acids. In principle, the
following are possibilities for peptide structure modification:
• The use of D- instead of L-amino acids.
• Modifications of the side chain of amino acids.
[Structural formula: 10.1 Morphine]
10.2 β-Endorphin: Tyr-Gly-Gly-Phe-Met-Thr-Ser-Glu-Lys-Ser-Gln-Thr-Pro-Leu-Val-Thr-Leu-Phe-Lys-Asn-Ala-Ile-Ile-Lys-Asn-Ala-Tyr-Lys-Lys-Gly-Glu
Fig. 10.3 Morphine 10.1 is a peptidomimetic for the endogenous peptide β-endorphin 10.2 and
the enkephalins (▶ Sect. 1.4). It binds as an agonist to the opiate receptor.
• Changes on the peptide main chain.
• Cyclization to stabilize the conformation.
• The use of templates that enforce a particular secondary structure, or that allow
the attachment of side chains in a defined spatial orientation.
10.3 First Step to Variation: Modifying Side Chains
An improvement in a peptide’s binding properties can often be achieved by using
other side chains. For instance, in Fig. 10.4 a few analogues of the amino acid
phenylalanine are shown that could be used as possible replacements. An increase
in the binding affinity can be achieved if nonproteinogenic amino acids fill the
[Structural formulas of phenylalanine (α and β positions labeled) and of its analogues]
Fig. 10.4 Sterically demanding, conformationally fixed, or metabolically stable analogues of the
amino acid phenylalanine; the structural enhancements are indicated in red.
binding pocket more completely. Rigid analogues lead to improved binding if the
biologically active conformation, the one that is adopted in or at the receptor site,
is immobilized.
The introduction of nonproteinogenic amino acids can increase the metabolic
stability. The hydroxylation of aromatic side chains can be suppressed by using
a substituent, for example fluorine or a methoxy group, in the para position.
Stability to cleavage by the digestive enzyme chymotrypsin can be improved by
adding substituents to the Cβ atom because the modified side chain no longer fits
into the active site of this protease. A peptide’s proteolytic stability can also be
improved by exchanging L- for D-amino acids. As described above, bacteria have
already recognized this trick. Distributing D-amino acids randomly in the peptide
can furnish active substances with astonishing metabolic stability.
10.4 A More Courageous Step: Modifying the Main Chain
An important step in the design of peptidomimetics is the replacement of amide
bonds in the main chain. A few commonly used groups are summarized in
Fig. 10.5. For amide groups that form hydrogen bonds to the protein through both
their C═O and NH groups, it can be difficult or even impossible to find replacements
that do not markedly reduce the binding affinity. If the amides only bridge
functional groups to one another and do not form hydrogen bonds to the protein,
[Structural formulas: the amide bond and its replacements: N-methyl, ketomethylene, hydroxyethylene, (E)-ethylene, carba, ether, reduced amide, retro-inverso, and phosphonamides, phosphonates, phosphinates (X = -NH-, -O-, -CH2-)]
Fig. 10.5 Different functional groups that can serve as a replacement for amide bonds in
peptidomimetics.
then a large palette of different replacement groups is available. Substitution at the
amide nitrogen atom leads to metabolic stabilization because proteases can hardly
cleave N-methylated amide bonds. If the N-methylation of a main-chain amide
group leads to a loss in affinity, several different explanations come into question.
One is that the N-methylated compound can no longer donate a hydrogen bond, so that an
essential H-bond in which the NH group was involved is lost. Alternatively, the additional
methyl group might have caused an undesired conformational change, or it might
sterically block the binding onto
the protein. On the other hand, an improvement in binding as a result of N-methylation
indicates that the biologically active conformation is stabilized. At room temperature,
an amide bond is practically exclusively in the trans geometry. Therefore it can also be
substituted with an ester bond that takes on the same geometry. In doing so, however,
the hydrogen-bond-donating properties of the amide are lost.
An N-methyl substitution improves the stability of the 180°-rotated conformation
of the amide. In the case of proline, the only proteinogenic amino acid with an
N-alkyl substitution, both the cis and trans amide configurations can be found.
A 1,5-disubstituted tetrazole can replace the cis orientation of
a proline. In addition, trans-configured double bonds imitate the geometry of an
amide bond well. The polar characteristics, however, are lost. To a certain extent,
this can be compensated if the double bond is substituted with fluorine. The
reduction of an amide or an isosteric ester bond means the loss of the carbonyl
group and leads to increased flexibility. If the carbonyl group is exchanged
for an –S═O, –SO2, or –PO2 group, the H-bond-accepting characteristics
are amplified; however, the geometry changes as well. The exchange
of an amide for a thioamide results in a weakening of the H-bond-accepting
properties and can serve as a test of the possible importance of H-bonds to carbonyl
groups in the peptide backbone. Nonetheless, a measure of caution is warranted
because the desolvation of a thiocarbonyl group is less difficult than that of
a carbonyl group. This contribution is superimposed on the observed affinity and can mask the
effect of the loss of the H-bond. The retro-inverso exchange of an amide bond
can lead to marked improvement in the proteolytic stability without losing the
binding qualities (▶ Sect. 5.5).
An entirely different concept is the incorporation of β-amino acids
(▶ Sect. 31.7). In contrast to the proteinogenic α-amino acids, these residues have
four chain members per monomer unit. The amide bonds are separated by two
aliphatic carbon atoms. Peptides that are made from these amino acids also show
secondary structural characteristics (Sects. 10.5 and ▶ 14.2). They have already
successfully been incorporated into naturally occurring peptides as mimetics and
can simulate peptide–protein interactions. Because of the altered sequence of amide
bonds, they are stable to proteolytic degradation.
If the cleavable bond of a protease substrate is replaced with an isosteric, non-
cleavable group, a substrate can be converted to an inhibitor (▶ Sect. 6.6). If the
newly introduced group forms particularly favorable interactions with the active
site of an enzyme, an exceedingly potent enzyme inhibitor can result. An example is
found in the ketomethylene group in serine and cysteine protease inhibitors as a
possible replacement for the amide bond that is destined for cleavage (▶ Chap. 23,
“Inhibitors of Hydrolases with an Acyl–Enzyme Intermediate”). The
hydroxyethylene group is especially suitable for aspartic protease inhibitors
(▶ Chap. 24, “Aspartic Protease Inhibitors”). Phosphonamides, phosphonates,
and phosphinates are often strong inhibitors of metalloproteases (▶ Chap. 25,
“Inhibitors of Hydrolyzing Metalloenzymes”).
10.5 Rigidifying the Backbone by Fixing Conformations
An important aspect of the design of peptidomimetics is the peptide conformation.
Peptides are flexible molecules and can take on different conformations. It is known,
however, that certain conformations are preferably adopted in proteins and in some
peptides. Among these are the two most important secondary structural elements:
the α helix and the β sheet (▶ Sect. 14.2). Furthermore, there are loops and turns at
the ends of these secondary structural elements that also adopt preferred patterns,
particularly the β turn (Fig. 10.6).
A β turn is formed when a hydrogen bond exists between the carbonyl group of
the amino acid i and the NH group of the amino acid i + 3. It is obvious that such
hydrogen bonds can only form for certain combinations of the torsion angles φ and
ψ, which are determined by the amino acids in the i + 1 and i + 2 positions
(▶ Sect. 14.2).
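The dependence of the turn on the backbone torsion angles can be made concrete with a short calculation. The following Python sketch computes a torsion angle from four atom positions and compares the four angles of residues i + 1 and i + 2 with idealized turn geometries. The ideal values for type I and type II turns and the 30° tolerance are common textbook conventions and are assumptions here, not values given in this chapter; all function names are purely illustrative.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def torsion_angle(p0, p1, p2, p3):
    """Torsion angle in degrees about the p1-p2 bond (0 = cis, 180 = trans)."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Components of the two outer bonds perpendicular to the central bond
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))

# Idealized (phi i+1, psi i+1, phi i+2, psi i+2) in degrees.
# Assumed textbook values, not taken from this chapter.
IDEAL_TURNS = {
    "type I": (-60.0, -30.0, -90.0, 0.0),
    "type II": (-60.0, 120.0, 80.0, 0.0),
}

def angle_difference(a, b):
    """Smallest absolute difference between two angles in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)

def classify_beta_turn(angles, tolerance=30.0):
    """Return the first ideal turn type matched within the tolerance, else None."""
    for name, ideal in IDEAL_TURNS.items():
        if all(angle_difference(a, b) <= tolerance
               for a, b in zip(angles, ideal)):
            return name
    return None

# Hypothetical torsion angles measured for residues i+1 and i+2
print(classify_beta_turn((-65.0, -25.0, -95.0, 10.0)))
```

In practice the four angles would be obtained by applying `torsion_angle` to the backbone atoms C, N, Cα, C (for φ) and N, Cα, C, N (for ψ) of an experimental or modeled structure.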
β turns are especially interesting because many peptides bind to proteins in a
β-turn conformation. Let us assume that the main chain of the peptide only serves
to position the side chains so that an optimal receptor interaction can occur. Then it
should be possible to replace the peptide chain with an entirely different scaffold,
upon which functional groups are attached that adopt the same spatial orientation as
the amino acid side chains.
If a β-turn-configured peptide binds to a receptor, then a rigid analogue that
“freezes” the β-turn conformation should lead to improved binding. The simplest
way to fix a β turn is the incorporation of the necessary sequence in a small cyclic
peptide. It is known from experimental structure determination that cyclic penta-
and hexapeptides almost always contain a β turn. The conformations of these
peptides were investigated at length in the research group of Horst Kessler at the
Universities of Frankfurt and Munich. It could be shown that the position of a β turn
[Structural formula of a β turn: residues i to i + 3 with the side chains Ri to Ri+3 and the torsion angles φi+1, ψi+1, φi+2, and ψi+2]
Fig. 10.6 A β turn is a peptide conformation in which a hydrogen bond is formed between the amino
acids i and i + 3. Particular ranges for the values of the torsion angles φi+1, ψi+1, φi+2,
and ψi+2 are characteristic for the β turn.
in a sequence can be controlled. Proline as well as D-amino acids prefer the i + 1
position in these loops. The introduction of D-amino acids supports the formation of
a β turn over other possible conformations.
A β turn can also be enforced by a non-peptide template. Numerous β-turn
mimetics have been proposed for this (Fig. 10.7). Some of the structures serve
as templates that force two peptide chains into an antiparallel orientation.
However, introducing the R2 and R3 side chains on these templates is
synthetically difficult. Benzodiazepines are interesting scaffolds onto which all
four side chains R1–R4 can be coupled. Other peptide conformations can also be
fixed by the introduction of rigid groups. A few examples of
conformation-stabilizing ring systems are displayed in Fig. 10.8.
An especially convincing example of a scaffold mimic is the design of
a thyrotropin-releasing hormone (TRH) mimetic by P. N. Olson and co-workers.
[Structural formulas of β-turn mimetic templates]
Fig. 10.7 Typical β-turn mimics. The amino acids are added onto the template at the colored
positions.
[Structural formulas of conformation-stabilizing ring systems]
Fig. 10.8 The illustrated rings replace one or two amino acids and force a particular
conformation.
TRH is the tripeptide pGlu–His–Pro–NH2 10.3. The approach is shown in Fig. 10.9.
After deducing a pharmacophore hypothesis, a rigid scaffold molecule was sought
upon which the side chains could be appended in the correct relative orientation.
Cyclohexane was chosen as a scaffold. Compound 10.4 is a potent TRH receptor
ligand. The substance acts as an agonist and elicits the same effects as TRH. An
improvement in cognitive function could be seen in animal experiments after the
administration of 10.4.
10.6 Peptidomimetics to Interfere with Protein–Protein Interactions
Proteins communicate with one another and transmit information and signals by
forming complexes through shared contact surfaces. The area of the
shared contact surface is usually larger than a few thousand square Ångströms (Å²).
This is a large value when compared to the surface that a small organic molecule of
typical drug size occupies upon binding. Furthermore, the contact area between
two proteins is, as a general rule, not very jagged. It hardly resembles the deep
binding pockets in enzymes that can host small ligands. Nevertheless it would open
entirely new perspectives for drug therapy if such protein–protein contact surfaces
could be blocked with low-molecular-weight compounds. At first glance, this task
seems almost impossible. How can a small molecule bind to a flat, barely structured
[Structural formulas: 10.3 TRH, the derived pharmacophore, and the cyclohexane-based mimetic 10.4]
Fig. 10.9 By starting with the structure of tripeptide TRH 10.3 and a hypothesis for the functional
groups that are essential for binding, the non-peptidic molecule 10.4 was designed, which also
binds to the TRH receptor.
protein surface with an interaction that is strong enough not to be “washed away”
when the protein–protein contact forms? Furthermore, there is the problem that
amino acid residues on the convex surface of a protein have in general much more
space to flexibly adapt their conformation. A statistical analysis of the amino acid
composition across the contact surfaces in protein complexes showed a preference
for aromatic residues, aspartate, arginine and the aliphatic residues proline and
isoleucine. The selective exchange of amino acids in the contact surface also
showed that there are a few protruding residues that dominate the interaction
(so-called “hot spots,” ▶ Sect. 17.10). The search for possible binding sites of a
small molecule that can compete with the formation of the protein–protein interface
starts with a detailed analysis of the geometric complementarity of the contacting
surfaces. Are there clustered areas with charged residues, or does a structural
element such as a β turn or a helix penetrate a little more deeply into the opposite
contact surface? Next, the peptide sequence that corresponds to the contact surface
is synthesized. These can be segments that preferentially adopt a helical structure or that
can be fixed in a turn pattern, for example as a cyclopeptide. If an active peptide is found, it
must be structurally characterized in complex with the opposite contact surface.
The complex of the BCL-XL (B-cell lymphoma) protein with a 16-residue
peptide that was cut from the BAK protein is shown in Fig. 10.10. BCL-XL belongs
to the proteins that prevent programmed cell death (apoptosis). Its function is
regulated by binding to pro- and antiapoptotic factors such as BAK. Inhibitors of
this contact formation might therefore deliver potential drugs for an anticancer
therapy. The binding of the helical peptide takes place in a stretched-out groove.
Small molecules have been discovered that fill this crevice (Fig. 10.11). The group
Fig. 10.10 The NMR spectroscopic structure of the BCL-XL protein with the α-helical,
16-membered peptide fragment from the BAK protein (orange). The peptide binds in a deep groove
with the amino acids Ile85, Ile81, Leu78, Val74 (from left to right, side chains are in light blue).
The surface of the BCL-XL protein is shown in white. The hydrophobic amino
acids of the peptide all protrude into the cleft; their contact surface is indicated by the light-blue net.
of Andrew Hamilton at Yale University has been searching for a basic scaffold that
can imitate the characteristics of a helix and simultaneously hold the side chains on
one side. Terphenyl derivatives 10.5–10.7 were found that can arrange the side
chains in a staggered conformation analogous to a helix. An alanine scan along the
BAK peptide showed that four hydrophobic residues (Val74, Leu78, Ile81, and
Ile85) are essential for binding. In addition, Asp83 forms a salt bridge to BCL-XL.
The terphenyl scaffold was therefore furnished with an acidic group at the end and
decorated with alkyl and aryl residues in the ortho positions. Compound 10.6 binds
to the BCL-XL protein with an affinity of 114 nM.
A different approach was taken at Abbott. Small molecules that interact
with the BCL protein were sought by NMR spectroscopy (▶ Sect. 7.8). The
millimolar inhibitors para-fluorobiphenylcarboxylic acid 10.8 (Fig. 10.11) and
1-hydroxytetralin 10.9 were discovered. They bind to distinct but neighboring
positions. Compound 10.8 replaces Asp83 and Leu78 of the binding domain of the BAK peptide,
and 10.9 occupies the Ile85 position. From the two discovered fragments, the scientists
at Abbott developed compound 10.10, which had two-digit nanomolar affinity for the
protein. Further optimization led to 10.11 (ABT-737), a highly potent antagonist that blocks the
entire family of antiapoptotic BCL-2 proteins. The synergistic effect of ABT-737
together with radiation and chemotherapy was demonstrated in animal experiments.
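To put the affinities of the two fragments and of the linked compound on a common scale, binding constants can be converted into standard binding free energies with ΔG° = RT ln Kd. The Python sketch below does this for 10.8, 10.9, and 10.10 with the values quoted in Fig. 10.11. The temperature of 298.15 K, the treatment of the Ki of 10.10 as if it were a Kd, and the function name are assumptions made for illustration only.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol K)

def binding_free_energy(k_molar, temperature_k=298.15):
    """Standard binding free energy (kJ/mol) from a dissociation constant (mol/L)."""
    if k_molar <= 0:
        raise ValueError("the dissociation constant must be positive")
    return R_KJ_PER_MOL_K * temperature_k * math.log(k_molar)

# Affinities from Fig. 10.11, converted to mol/L.
# Assumption: the Ki of 10.10 is treated like a Kd.
affinities_molar = {"10.8": 0.3e-3, "10.9": 4.3e-3, "10.10": 36e-9}

free_energies = {name: binding_free_energy(k)
                 for name, k in affinities_molar.items()}
fragment_sum = free_energies["10.8"] + free_energies["10.9"]

for name, dg in free_energies.items():
    print(f"{name}: {dg:.1f} kJ/mol")
print(f"sum of the two fragments: {fragment_sum:.1f} kJ/mol")
```

Under these assumptions the two fragments together account for roughly -34 kJ/mol, whereas 10.10 reaches about -42 kJ/mol. Linking two weakly binding fragments can therefore yield more than the simple sum of their individual contributions.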
An analogous case was studied with the MDM2 protein at Roche. MDM2 is
overexpressed in many tumors. It binds to the tumor-suppressor protein p53, which
protects cells from converting to a malignant state. p53 is therefore the protein that is
most often inactivated during carcinogenesis. Inhibition of complex formation
between the overexpressed MDM2 protein and p53 could thus represent an
approach to a possible cancer therapy. Here too, an α-helical p53 peptide stretch
binds to a hydrophobic groove on the MDM2 protein. A cis-imidazoline with an
affinity of 100–300 nM was found in screening. A co-crystal structure was
determined with 10.12 (Fig. 10.11). The imidazoline scaffold imitates one side
of an α helix of the peptide from the p53 protein. The two p-bromophenyl rings
replace a Trp and a Leu. The ethyl ether group on the third aromatic ring orients into
the pocket that is filled with a phenylalanine in the peptide. The MDM2 protein is
blocked through this competitive binding, and the level of free p53 increases.
As a result, the p53 pathway in cancer cells is activated, and the cell cycle
comes to a complete stop. The cell may go into programmed cell death.
Tumor growth inhibition has already been demonstrated in animal models.
Another large class of proteins that is controlled by contacts with other proteins
is the integrins. Numerous low-molecular-weight inhibitors have been discovered
for this class. An example for the successful design of antagonists by starting from
cyclic peptides is presented in ▶ Sect. 31.2. Many G protein-coupled receptors
(▶ Sect. 29.1) are controlled by endogenous peptides or proteins. For this, the
peptide or protein binds to the receptor. The replacement of the peptide sequences
with an organic molecule that imitates the binding of the natural ligand has also
been attempted. An example of the design of such an active compound is given in
▶ Sects. 29.5 and ▶ 29.6. Although successful, the design concept that was followed
[Structural formulas with the affinities given in the figure: terphenyl derivatives 10.5–10.7 (Kd = 114 nM, 1.89 μM, and 2.70 μM; according to the text, 114 nM belongs to 10.6); 10.8 (Kd = 0.3 mM); 10.9 (Kd = 4.3 mM); 10.10 (Ki = 36 nM); 10.11, ABT-737 (Ki = 1 nM); 10.12]
Fig. 10.11 Different inhibitors of protein–protein contacts that imitate the α-helical structural
building blocks in the contact surface. The terphenyl derivatives 10.5–10.7 bind to the BCL-XL
protein in a pronounced crevice and block the binding site of a helix. The small fragments 10.8 and
10.9, which led to the development of inhibitors 10.10 and 10.11, were discovered in the same area
in an NMR spectroscopic screening. Compound 10.12 is a different helix mimetic that prevents the
interaction between the MDM2 and p53 proteins.
was wrong: the active peptide and the derived synthetic mimic do not bind in
an overlapping binding region of the receptor.
10.7 Tracing Selective NK Receptor Antagonists by Ala Scan
Tachykinins are neuropeptides that all contain the same lipophilic C terminus:
–Phe–X–Gly–Leu–Met–NH2. A well-investigated representative of the tachykinins
is substance P, Arg–Pro–Lys–Pro–Gln–Gln–Phe–Phe–Gly–Leu–Met–NH2 (10.13,
Table 10.2). Tachykinins bind to at least three different tachykinin receptors, the
NK1, NK2, and NK3 receptors. All three belong to the class of G protein-coupled
receptors (▶ Sect. 29.1). They mediate a variety of biological effects, for example,
bronchoconstriction or pain transmission. Consequently a receptor antagonist could
be helpful for the treatment of asthma as well as to fight pain.
The study that was carried out on the development of an NK2 receptor antagonist
at Parke–Davis in Cambridge is a classic example of conversion of a peptide to
a peptidomimetic (Table 10.2 and Fig. 10.12). A compound was sought that binds
to the same receptor as substance P. The starting point of the work was a hexapeptide,
Leu–Gln–Met–Trp–Phe–Gly–NH2 (10.14), known from the literature that binds to
the NK2 receptor with an affinity of 11.7 nM. In the first step each amino acid was
systematically exchanged for alanine (10.15–10.20). In a few cases the
Table 10.2 The rational design of NK2 receptor ligands.
Design step | No. | Structure | Ki (nM)
Substance P | 10.13 | Arg-Pro-Lys-Pro-Gln-Gln-Phe-Phe-Gly-Leu-Met-NH2 | 295
Minimal fragment | 10.14 | Leu-Gln-Met-Trp-Phe-Gly-NH2 | 11.7
Ala scan | 10.15 | Ala-Gln-Met-Trp-Phe-Gly-NH2 | 40
 | 10.16 | Leu-Ala-Met-Trp-Phe-Gly-NH2 | 138
 | 10.17 | Leu-Gln-Ala-Trp-Phe-Gly-NH2 | 156
 | 10.18 | Leu-Gln-Met-Ala-Phe-Gly-NH2 | 10,000
 | 10.19 | Leu-Gln-Met-Trp-Ala-Gly-NH2 | 8,300
 | 10.20 | Leu-Gln-Met-Trp-Phe-Ala-NH2 | 28
 | 10.21 | Leu-Gln-Met-Trp-Phe-NH2 | 200
Dipeptide | 10.22 | Z-Trp-Phe-NH2 | 2,700
Immobilization of the biologically active conformation | 10.23 | Z-Trp-(R,S)-(α-Me)Phe-NH2 | 327
N-Terminal optimization | 10.24 | (2,3-di-OCH3)C6H3CH2OCO-Trp-(R,S)-(α-Me)Phe-NH2 | 37.6
Stereochemical optimization | 10.25 | (2,3-di-OCH3)C6H3CH2OCO-Trp-(R)-(α-Me)Phe-NH2 | 10,000
 | 10.26 | (2,3-di-OCH3)C6H3CH2OCO-Trp-(S)-(α-Me)Phe-NH2 | 17.2
Addition of amino acid | 10.27 | (2,3-di-OCH3)C6H3CH2OCO-Trp-(S)-(α-Me)Phe-Gly-NH2 | 1.4
replacement with alanine resulted in only a weak decrease in the binding affinity.
As an example, the N-terminal leucine could be replaced with an alanine (10.15).
The conclusion was that the Leu side chain can only be of secondary importance for
receptor binding. The compounds in which tryptophan or phenylalanine was
replaced with alanine (10.18 and 10.19), however, showed very little affinity for the NK2 receptor.
This was the “smoking gun” showing that these two amino acids are essential for
binding. The removal of the C-terminal amino acid glycine (10.21) decreased the
affinity by a factor of 7. Obviously this amino acid also has some importance
for receptor binding. The testing of several N-terminal protected dipeptides led
to Z–Trp–Phe–NH2 (10.22, Ki = 2700 nM). With this, the first stage of the project
was accomplished: as a dipeptide, 10.22 represented an interesting lead structure
for further work.
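The logic of the alanine scan can be sketched in a few lines of Python: every position of the hexapeptide 10.14 is replaced by alanine in turn, and the Ki of each analogue is compared with that of the parent peptide. The Ki values are those of Table 10.2; the 100-fold cutoff used to flag a residue as essential and the function names are illustrative assumptions.

```python
PARENT = ["Leu", "Gln", "Met", "Trp", "Phe", "Gly"]  # hexapeptide 10.14
PARENT_KI_NM = 11.7
# Ki values (nM) of the analogues 10.15-10.20 from Table 10.2, in sequence order
ALA_ANALOGUE_KI_NM = [40.0, 138.0, 156.0, 10000.0, 8300.0, 28.0]

def alanine_scan(sequence):
    """Return (position, variant) pairs with one residue at a time set to Ala."""
    return [(i, sequence[:i] + ["Ala"] + sequence[i + 1:])
            for i in range(len(sequence))]

def essential_residues(sequence, parent_ki, analogue_ki, fold_cutoff=100.0):
    """Residues whose replacement weakens binding by at least fold_cutoff.

    The cutoff is an arbitrary illustrative choice, not a value from the study.
    """
    flagged = []
    for (position, _variant), ki in zip(alanine_scan(sequence), analogue_ki):
        if ki / parent_ki >= fold_cutoff:
            flagged.append(sequence[position])
    return flagged

for (position, variant), ki in zip(alanine_scan(PARENT), ALA_ANALOGUE_KI_NM):
    fold = ki / PARENT_KI_NM
    print("-".join(variant) + "-NH2", f"{fold:.1f}-fold weaker binding")

print(essential_residues(PARENT, PARENT_KI_NM, ALA_ANALOGUE_KI_NM))
```

With the tabulated values only the tryptophan and phenylalanine positions exceed the cutoff, in line with the conclusion drawn in the text.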
In the next stage, additional methyl groups were introduced at different
positions of the molecule. This limited the number of possible conformations.
A decrease in binding affinity was observed for many of the investigated
compounds with conformational restriction. A methyl group on the Cα atom of
phenylalanine, however, increased the binding affinity by a factor of 8 (10.23, Ki = 327 nM).
A possible explanation for this finding is that the conformation that is adopted in the
receptor is stabilized by the additional methyl group. Then the N-terminal part
of the molecule was varied. The replacement of the terminal phenyl ring with
a 2,3-dimethoxyphenyl group further increased the binding affinity by roughly a factor of 10
(10.24, Ki = 37.6 nM). This value refers to the compound with racemic
α-methylphenylalanine. The enantiomerically pure compound 10.26 with this building block in the
[Structural formulas: 10.22 (R = H, Ki = 2700 nM), 10.23 (R = CH3, Ki = 327 nM), 10.26 (R = H, Ki = 17.2 nM), and 10.27 (R = CH2CONH2, Ki = 1.4 nM)]
Fig. 10.12 Important intermediates on the way to the NK2 receptor antagonist 10.27.
S configuration binds with a Ki of 17.2 nM. The reintroduction of the C-terminal
glycine finally led to the highly potent compound 10.27 (Ki = 1.4 nM).
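The Ki values of the racemate 10.24 and of the pure enantiomers 10.25 and 10.26 can be checked against one another. If both enantiomers compete independently for the same binding site, a mixture should behave like a single ligand whose apparent Ki follows from the fraction-weighted reciprocal Ki values. This simple competition model is an assumption and not an analysis from the original study; the Ki values are taken from Table 10.2, and the function name is illustrative.

```python
def apparent_ki_of_mixture(ki_values, fractions):
    """Apparent Ki of a mixture of competitive ligands for one binding site.

    Assumed model: 1 / Ki_app = sum(fraction_i / Ki_i), with fractions summing to 1.
    """
    if len(ki_values) != len(fractions):
        raise ValueError("one fraction per Ki value is required")
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("the fractions must sum to 1")
    return 1.0 / sum(f / ki for f, ki in zip(fractions, ki_values))

KI_R_NM = 10000.0               # 10.25, R enantiomer
KI_S_NM = 17.2                  # 10.26, S enantiomer
KI_RACEMATE_MEASURED_NM = 37.6  # 10.24, racemic building block

predicted = apparent_ki_of_mixture([KI_R_NM, KI_S_NM], [0.5, 0.5])
print(f"predicted for the racemate: {predicted:.1f} nM, "
      f"measured: {KI_RACEMATE_MEASURED_NM} nM")
```

The model predicts about 34 nM for the racemate, close to the measured 37.6 nM. This is consistent with the S enantiomer carrying essentially all of the affinity, so that the racemate appears about half as potent as pure 10.26.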
Independent of the work at Parke–Davis, lead structure 10.28 was optimized to
the NK1-specific receptor antagonists 10.32 and 10.33 at Merck, Sharp & Dohme
(MSD). Although 10.28–10.32 were only effective in vitro, 10.33 is also active
in vivo because of its higher metabolic stability (Fig. 10.13). MSD was finally
successful with the structurally related aprepitant 10.34. The compound was intro-
duced as a medicine to prevent acute emesis (vomiting) during highly nausea-
inducing chemotherapy.
10.8 CAVEAT: Idea Generator for the Design of Peptidomimetics
In the previous sections it was often highlighted that the side chains of the amino
acids are responsible for the binding to receptors. Usually the main chain merely
plays the role of a scaffold that serves to bring the side chains into the necessary
spatial alignment for binding. As such, a rigid, non-peptidic scaffold onto which the
side chains can be attached in the same spatial orientation should be suitable to
design molecules with similar properties as peptides. This idea was embedded in
a computer program in the group of Paul Bartlett at the University of California in
Berkeley. The program CAVEAT allows the search for rigid molecules that
[Structural formulas with the data given in the figure: 10.28 (R = Et, X = H), 10.29 (R = H, X = H), 10.30 (R = H, X = 3,5-di-CH3), 10.31 (R = Ac, X = 3,5-di-CH3, IC50 = 67 nM), 10.32 (R = Ac, X = 3,5-di-CF3, IC50 = 1.6 nM), 10.33 (IC50 = 3 nM), and 10.34 aprepitant. The IC50 values 1533 nM, 10000 nM, and 3800 nM belong to 10.28–10.30; their individual assignment is not recoverable here.]
Fig. 10.13 The optimization of lead structure 10.28, which was found by screening, to selective
NK1 receptor antagonists 10.32 and 10.33. In contrast to the metabolically labile benzyl esters
10.28–10.32, ketone 10.33 is also active in animal experiments. The first NK1 receptor antagonist
aprepitant 10.34 was successfully brought to the market by MSD for the prevention of acute
emesis.
imitate a particular segment of a peptide scaffold. For this, the bonds on the peptide
backbone are described with vectors (Fig. 10.14). As a prerequisite, the 3D structure
of the peptide for which a peptidomimetic is sought must be known. The orientation
of the side chains is determined by the Cα–Cβ bond vectors. The relative
orientation of, for instance, three amino acid side chains is given by the positions
of the relevant Cα–Cβ bond vectors. With this spatial pattern of vectors, a 3D
database of molecular scaffolds is searched for entries that contain three
substitutable bonds oriented analogously to the three Cα–Cβ vectors. The result is
a list of rigid, usually cyclic molecular scaffolds, the free positions of which can
be coupled to the amino acid side chains.
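The vector comparison described above can be illustrated with a small sketch. It is not the CAVEAT implementation; it only assumes that each pair of bond vectors can be characterized by one distance, two angles, and one signed dihedral, all of which are unchanged when the molecule is rotated or translated. The function names, the tolerances, and the coordinates are hypothetical, and candidate bonds are assumed to be listed in the same order as the query bonds.

```python
import math
from itertools import combinations

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    return math.sqrt(dot(a, a))

def angle(a, b):
    c = dot(a, b) / (norm(a) * norm(b))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))

def pair_descriptor(bond1, bond2):
    """Describe two bonds (base, tip) by distance, two angles, signed dihedral."""
    (base1, tip1), (base2, tip2) = bond1, bond2
    link = sub(base2, base1)                  # base-to-base vector
    dir1, dir2 = sub(tip1, base1), sub(tip2, base2)
    # signed dihedral tip1-base1-base2-tip2 (distinguishes mirror images)
    n1 = cross(sub(base1, tip1), link)
    n2 = cross(link, dir2)
    m1 = cross(n1, tuple(x / norm(link) for x in link))
    dihedral = math.degrees(math.atan2(dot(m1, n2), dot(n1, n2)))
    return (norm(link), angle(dir1, link), angle(dir2, sub(base1, base2)), dihedral)

def pattern(bonds):
    """Pair descriptors for all bond pairs, in a fixed order."""
    return [pair_descriptor(bonds[i], bonds[j])
            for i, j in combinations(range(len(bonds)), 2)]

def matches(query, candidate, tol_dist=0.5, tol_angle=15.0):
    """True if every pair descriptor agrees within the (assumed) tolerances."""
    for q, c in zip(pattern(query), pattern(candidate)):
        if abs(q[0] - c[0]) > tol_dist:
            return False
        if abs(q[1] - c[1]) > tol_angle or abs(q[2] - c[2]) > tol_angle:
            return False
        if abs((q[3] - c[3] + 180.0) % 360.0 - 180.0) > tol_angle:
            return False
    return True

# Hypothetical query: three (base, tip) bonds standing in for Calpha-Cbeta vectors.
query = [((0, 0, 0), (0, 0, 1.5)),
         ((4, 0, 0), (4, 1.5, 0)),
         ((0, 5, 0), (1, 5, 1))]

def rotate_and_shift(p):      # 90 degree rotation about z, then a translation
    x, y, z = p
    return (-y + 10, x - 3, z + 2)

def mirror(p):                # reflection through the yz plane
    x, y, z = p
    return (-x, y, z)

rotated = [(rotate_and_shift(b), rotate_and_shift(t)) for b, t in query]
mirrored = [(mirror(b), mirror(t)) for b, t in query]
```

In this sketch a scaffold whose substitutable bonds reproduce the query pattern in any orientation is accepted, whereas its mirror image is rejected because the signed dihedrals change sign.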
10.9 Design of Peptidomimetics: Quo Vadis?
In this chapter the systematic approach to the design of peptidomimetics has been
described. The approaches have proven themselves in many cases and have led to
many attractive drugs. Nevertheless there are also difficulties. The first problem is
the stepwise approach. A peptide is systematically modified, and the synthesized
structures serve only to identify the essential functional groups. The synthesis of
the many resultant derivatives, that is, practically all in which an amide group was
Fig. 10.14 The principles of a 3D search for scaffold mimics with the CAVEAT program. First,
the relative orientation of the biologically active side chains in the peptide lead structure is defined
by the Cα–Cβ vectors. In this example the three essential amino acids Trp, Arg, and Tyr are taken.
The three vectors, A, B, and C are the essential information used to search the 3D database for rigid
scaffold structures that bear substitutable bonds in the same relative orientation. A list of cyclic
structures that represent possible templates for peptidomimetics is the result.
replaced by one of the structures in Fig. 10.4, is laborious. Furthermore these
compounds only serve as tools because most modified peptides have high molecular
weights, and this can result in poor oral bioavailability.
In the past, many new nonpeptidic active substances, especially as receptor
antagonists, were found in high-throughput screening, and these could frequently
be developed into clinical candidates in relatively little time. These successes have
pushed rational concepts for the development of peptidomimetics, which were once
in the foreground, somewhat into the background. Despite this, the design of
peptidomimetics remains an important research area in drug design. The terphenyl
scaffold helix mimetics serve as an example of this. Many enzyme inhibitors that
are introduced in ▶ Chaps. 23, “Inhibitors of Hydrolases with an Acyl–Enzyme
Intermediate”; ▶ 24, “Aspartic Protease Inhibitors” and ▶ 25, “Inhibitors of Hydro-
lyzing Metalloenzymes” continue to have peptidic character. Here the peptide
substrate is clearly the “gold standard” for the design of a mimetic. As always,
the peptidomimetic concept plays an important role in lead structure optimization.
10.10 Synopsis
• Peptides are open-chain polymeric molecules made up of amino acids that
are mutually linked by amide bonds. Side chains branch from the main chain
at the Cα atoms and show a high degree of flexibility. If such a polymer contains
up to 30–50 amino acids, it is called a peptide; beyond this limit, it is called
a protein.
• Peptides are responsible for many biological functions; their applicability as
drugs is limited due to size, polarity, and poor proteolytic stability.
• Due to their multiple functions, peptides can be mimicked by smaller – similarly
binding – and metabolically stable peptidomimetics.
• Peptidomimetic design starts with the identification of the minimal peptide
sequence responsible for a biological effect, followed by successive replacement
of each amino acid in the chain with alanine to detect the side chains responsible
for activity. Finally, individual amino acids are replaced by non-proteinogenic
ones or similar chemical building blocks.
• Multiple surrogates for amino acid side chains have been developed and can be
tested to reveal better binding and more stable peptidomimetics. If not involved
in direct binding, main-chain amide bonds can be replaced by a large variety of
substitutes that achieve a similar geometry.
• Peptides are flexible and adopt multiple conformations. If a particular fold is
adopted to correctly orient interacting side chains, the peptide backbone can be
replaced by an entirely different scaffold that correctly positions the essential
interacting groups.
• Peptides fold upon themselves through particular turn patterns. These turns
stabilize a required conformation and can be chemically replaced by rigid
structural surrogates that freeze a given turn conformation.
• Proteins communicate with each other through the formation of large, mutu-
ally shared surface patches. Small molecules designed to bind to such flat
surfaces can antagonize complex formation and interfere with protein–protein
communication.
• Design of small molecules to block protein–protein interfaces exploits depres-
sions on the surface that accommodate spatial patterns such as turns or helical
portions of the penetrating contact surface of the partner protein.
• Peptides bind to receptors mostly via side chains, and the backbone provides the
scaffold for their attachment. Computer programs can be used to screen struc-
tural databases to retrieve alternative scaffolds that are able to orient substituents
in very similar fashion.
Bibliography
General Literature
Ahn J-M, Boyle NA, MacDonald MT, Janda KD (2002) Peptidomimetics and peptide backbone
modifications. Mini Rev Med Chem 2:463–473
Gante J (1994) Peptidomimetics—tailored enzyme inhibitors. Angew Chem Int Ed Engl
33:1699–1701
Giannis A, Kolter T (1993) Peptidomimetics for receptor ligands—discovery, development, and
medical perspectives. Angew Chem Int Ed Engl 32:1244–1267
Hirschmann R (1991) Medicinal chemistry in the golden age of biology: lessons from steroid and
peptide research. Angew Chem Int Ed Engl 30:1278–1301
Marahiel MA (2009) Working outside the protein-synthesis rules: Insights into non-ribosomal
peptide synthesis. J Pept Sci 15:799–807
Special Literature
Howson W (1995) Rational design of Tachykinin receptor antagonists. Drug News Perspect
8:97–103
Lauri G, Bartlett PA (1994) CAVEAT: a program to facilitate the design of organic molecules.
J Comput Aided Mol Des 8:51–66
Lelais G, Seebach D (2004) β2-Amino acids: synthesis, occurrence in natural products, and
components of β-peptides. Biopolymers 76:206–243
MacLeod AM, Merchant KJ, Cascieri MA et al (1993) N-Acyl-L-tryptophan benzyl esters: potent
substance P receptor antagonists. J Med Chem 36:2044–2045
Merchant KJ, Lewis RT, MacLeod AM (1994) Synthesis of homochiral ketones derived from
L-tryptophan: potent substance P receptor antagonists. Tetrahedron Lett 35:4205–4208
Olson GL, Bolin DR, Bonner MP et al (1993) Concepts and progress in the development of peptide
mimetics. J Med Chem 36:3039–3049
Part III
Experimental and Theoretical Methods
A crystal is the prerequisite for the 3D-structure determination of a protein with
X-ray crystallography (▶ Chap. 13). The figure shows crystals of a complex of
protein kinase A that were used to elucidate the reaction mechanism of this class of
enzymes (▶ Chap. 26). (Reprinted with the kind permission of Dr. Dirk
Bossenmeyer, Deutsches Krebsforschungszentrum, Heidelberg, Germany.)
11 Combinatorics: Chemistry with Big Numbers
The search for new lead structures and the optimization of their activity profile by
systematic modification are among the most time- and cost-intensive steps in drug
research. The optimization of a small organic molecule can serve as an example.
Even if the number of different groups per position is limited to relatively few,
several million structures are possible, as shown by the example of the
multisubstituted tetrahydroisoquinoline carboxylic acid amide 11.1 (Fig. 11.1). The
combinatorial explosion of all imaginable substitution possibilities can no longer be
handled with classical chemical techniques. The diversity increases even more
when the different stereoisomers are considered. The number is already considerably
larger than the number of all compounds referenced in Chemical
Abstracts (33 million) or in Beilstein (10 million compounds).
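The arithmetic behind this combinatorial explosion can be retraced in a few lines. The sketch below uses the group counts quoted for R1–R10 in the caption of Fig. 11.1; the variable names are illustrative only.

```python
from math import prod

# Number of alternative groups at positions R1..R10 (values from Fig. 11.1)
groups_per_position = [5, 10, 10, 4, 5, 5, 5, 2, 2, 20]

building_blocks = sum(groups_per_position)    # reagents that must be available
compounds = prod(groups_per_position)         # every combination of one group per position
with_stereoisomers = compounds * 2 ** 2       # two stereocenters, two configurations each

print(building_blocks, compounds, with_stereoisomers)
```

The contrast between the sum (what has to be purchased) and the product (what can be made) is the core of the combinatorial argument.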
In the days when substances were tested on whole animals or in complex
pharmacological in vitro models, the biological tests were the rate-determining
step. The introduction of molecular test models, for example, enzyme or
receptor-binding tests, and extensive automation of screening has fundamentally
changed this situation. Testing of many thousands of compounds per day is
technically unproblematic (▶ Sect. 7.3). To use the capacity of these methods
to their fullest extent, the synthesis of thousands or even tens or hundreds of
thousands of different molecules is desirable. The strategy can then shift either to
automated parallel synthesis to cover a large number of single compounds or
the simultaneous production of compound mixtures by using combinatorial
chemistry.
11.1 How Nature Produces Chemical Multiplicity
Nature has shown a way to achieve combinatorial diversity with the nucleic acids
and with proteins. A 600-base-pair DNA sequence codes for a protein with 200 amino
acids. From the “pool” of four nucleotides that code for the 20 proteinogenic
amino acids in triplet sequences, 4^600 (a number with 362 digits!) different
G. Klebe, Drug Design, DOI 10.1007/978-3-642-17907-5_11,
# Springer-Verlag Berlin Heidelberg 2013
DNA sequences are possible. This translates to 20^200 (a number with 261 digits!)
different amino acid sequences for the resulting protein. Short peptides with
enormous structural variety can be constructed with just the 20 proteinogenic
amino acids. If instead of amino acid A, a manageable number of modified
amino acids M is used, the number of possible analogues increases even more
(Table 11.1).
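Because Python integers have arbitrary precision, the sizes of these sequence spaces can be checked directly. The following is a minimal sketch; the helper name `n_digits` is an assumption made for illustration.

```python
def n_digits(n: int) -> int:
    """Number of decimal digits of a positive integer."""
    return len(str(n))

dna_sequences = 4 ** 600         # 600 positions, 4 possible bases each
protein_sequences = 20 ** 200    # 200 positions, 20 possible amino acids each

# Peptide counts from the 20 natural amino acids (compare Table 11.1)
peptides = {length: 20 ** length for length in (2, 3, 4, 6)}
modified_hexapeptides = 100 ** 6  # palette of 100 modified amino acids

print(n_digits(dna_sequences), n_digits(protein_sequences))
print(peptides, modified_hexapeptides)
```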
Peptides play an important role in biological systems. They are found as
protein ligands in the free form or as simple derivatives. Peptide sequences exposed
on the surface of a protein determine the recognition properties of the protein at
a receptor. Nature exhausts the full combinatorial diversity of the variable
sequences on the surface regions (epitopes) of proteins for their selective recogni-
tion. This principle of Nature can be adopted to generate huge compound libraries
with highly variable composition.
Fig. 11.1 The tetrahydroisoquinoline carboxylic acid amide 11.1 is to be substituted in 10 positions.
The groups in these positions encompass a total of 68 building blocks
(R1–R10 = 5, 10, 10, 4, 5, 5, 5, 2, 2, 20 groups). Twenty million compounds can be constructed in
this way. If the structural diversity that results from the two stereocenters (*) is considered, this
number increases again by a factor of 4.
Table 11.1 Four hundred dipeptides, 8,000 tripeptides, 160,000 tetrapeptides, and
64 million hexapeptides can be generated from the 20 natural amino acids, A. If
the palette is expanded to 100 modified, non-natural amino acids, M, the
combinatorial diversity increases dramatically.

Compounds                                  Number
Natural amino acids, A                     20
Dipeptides, A–A                            400
Tripeptides, A–A–A                         8,000
Tetrapeptides, A–A–A–A                     160,000
Hexapeptides, A–A–A–A–A–A                  64,000,000
Modified amino acids, M                    100 (for example)
Modified hexapeptides, M–M–M–M–M–M         1,000,000,000,000
Number of known compounds                  33,000,000
11.2 Protein Biosynthesis as a Tool to Build Compound
Libraries
How can the biochemical synthesis machinery be used as a vehicle to generate
a multiplicity of peptide sequences? It is possible to connect short sequences to a
carrier protein so that they are exposed on the surface and can interact with the
target protein in a molecular test system. The test system is constructed in a way
that the binding to the target protein is monitored with an easily registered signal,
for instance, a fluorescence signal or a colorimetric reaction (▶ Sect. 7.2).
To exploit protein synthesis for the construction of such a library, information
about the randomly constructed peptide must be translated into the “genetic make-
up” of the DNA molecule. This codes the sequence of the protein on the surface of
which the library will be presented. Randomly assembled double-stranded DNA
sequences must be introduced in the correct position of the DNA. After production
of a large number of identical copies (clones), the gene can be expressed. A large
population of proteins is produced that carry the randomly composed peptide
sequence in a particular position, usually at the beginning or the end of the polymer
chain. These proteins are then investigated in a test system. The distribution of the
20 proteinogenic amino acids over the variable sequence section is not entirely
homogeneous. This is because some amino acids are coded by a single triplet
sequence (codon), whereas others are represented by up to six different codons
(▶ Sect. 32.7). Because of this, biased libraries are inevitably formed.
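The origin of this bias can be made explicit by counting codons in the standard genetic code. The sketch below assumes that all 64 triplets occur with equal probability in the randomized DNA section, which is a simplification; the compact 64-letter string is the usual listing of the standard code with the bases ordered T, C, A, G.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG; '*' = stop
AMINO_ACIDS_BY_CODON = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

codon_table = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS_BY_CODON)
}

# How many codons encode each amino acid (degeneracy of the code)
degeneracy = Counter(codon_table.values())

# Expected frequency of each residue if every triplet is equally likely
expected_frequency = {aa: n / 64 for aa, n in degeneracy.items()}

for aa, n in sorted(degeneracy.items(), key=lambda item: -item[1]):
    print(aa, n, round(expected_frequency[aa], 3))
```

Under this assumption, leucine, serine, and arginine are each expected six times as often as methionine or tryptophan, and three of the 64 triplets terminate the chain.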
The bacteriophage M13 is an extremely popular expression system. M13 is
a virus that infects Escherichia coli strains well. The virus carries six proteins on its
coat. Two of these coat proteins allow randomly assembled protein sections to be
added to their ends. A library of 20 million modified 15-residue peptides was
constructed with this M13 system. Their binding to the protein streptavidin was
tested. Fifty-eight candidates were identified as binding partners. They all carried
the ─His─Pro─Gln─ segment in common. A crystal structure of the streptavidin in
complex with this oligopeptide was successfully determined. The ─His─Pro─Gln─
segment of the peptide occupies the binding pocket that is normally used by biotin.
This proves that selectively binding peptide sequences can be found with this method.
The biochemical approach to generating and presenting compound libraries has
the overwhelming advantage that the high-capacity protein biosynthesis is
exploited. Furthermore the sophisticated protein and DNA synthesis techniques
and analytical methods that have been developed for such substances (Sect. 11.7)
can be used to characterize screening hits. But it also has disadvantages. The
molecular diversity is limited to the 20 proteinogenic L-amino acids, and only
peptides result as lead structures. Often these represent the starting point for
the development of an active substance. However, we want to get away from the
metabolically unstable, poorly bioavailable peptides. Therefore structures are
searched for by using classic organic molecular scaffolds. At the very least,
peptidomimetics or peptides with metabolically stable non-natural amino acids are
desired. Unfortunately the step away from peptides toward alternative scaffolds, with
retention of the biological activity, is not trivial (▶ Chap. 10, “Peptidomimetics”).
11.3 Organic Chemistry from a Different Angle: Random-
Guided Synthesis of Compound Mixtures
Organic preparative methods were devised as an alternative to the biological
approaches to generate compound libraries. Simple access to a compound library
is gained by starting with reactive molecular building blocks, such as oligofunctional
acid chlorides (11.2–11.4, Fig. 11.2). These components are simultaneously reacted
with numerous reagents, for example, amines or amino acids. A mixture of many
products is formed in an uncontrolled manner. Contrary to the general academic
opinion that organic reactions should only deliver homogenous products, in this
case as much product diversity as possible is desired. The advantage of this method
is that it is easily carried out, and automation is readily realized. This synthesis
Fig. 11.2 The oligofunctional acid chlorides of the central building blocks cubane 11.2, xanthene
11.3, and benzene 11.4 are treated with protected amino acids. A xanthene-containing library
inhibits the digestive enzyme trypsin. The active component of the library was deconvoluted and
characterized by targeted resynthesis. In the end the isomers 11.5 and 11.6 remained as the most
potent compounds. The derivative 11.5 inhibits trypsin with a Ki of 9.4 μM.
strategy also has disadvantages. The coupling partners have different reactivities. As
a result, the products are not evenly distributed. The transformation of a particular
functional group on the central building block can depend upon which components
the central molecule has already reacted with, and how this influences the other
functional groups.
The thus-generated library is then tested. If binding to the target protein is found,
the active substance in the mixture must be characterized, a task that is not
particularly simple. On the one hand, sophisticated analytical techniques such as
liquid chromatography coupled with NMR spectroscopy and mass spectrometry can be used.
On the other hand, an attempt can be made to “deconvolute” the library. For this, a targeted
resynthesis of the library is carried out in which a partial library is prepared by using
a defined selection of building blocks. This smaller library is then tested and the
composition of the active mixture is determined. This strategy must be followed
back to the level of a single defined reaction product.
11.4 What Is Contained in Chemical Space?
At this point the fundamental question must be asked: how many organic molecules
are possible in principle from which medicinal chemists can create their candidates?
What is the content of this, at first virtual, chemical space? Much has been speculated
about this question. Numbers between 10^20 and 10^200 possible molecules have been
named. The latter figure encompasses so many molecules that the entire mass of the
universe would not be enough to synthesize even one molecule of every compound!
We have to thank Tobias Fink and Jean-Louis Reymond of the University of
Bern for forming a concrete idea of how this chemical space is occupied in principle.
Beginning with mathematical graphs that describe simple hydrocarbon scaffolds,
molecules with up to 11 C, N, O, and F atoms were generated on the computer.
Heteroatoms and unsaturated bonds were distributed over the generated molecular
graphs in a combinatorial fashion. Different filters that consider the chemical
stability of the functional groups, the strain of the ring systems, and the formation of
tautomers produced a database of 26.4 million structures. If all possible stereoisomers
are generated, an average of 4.2 isomers per entry is formed. The database
finally encompassed 110 million molecules. It is interesting to see that the number of
entries increases exponentially with the square of the number of atoms. As a result,
90% of the database is already composed of molecules with 11 non-hydrogen atoms.
If the number of molecules that can be generated with 25 non-hydrogen atoms is
estimated, the result is 10^27 imaginable products. Twenty-five atoms represent
approximately the average size of a typical drug molecule.
It is worthwhile, however, to look at the database with entries of 11 non-
hydrogen atoms more closely. The average molecular mass in this database is
153 ± 7 Da. Molecules of this size fall into the range of typical fragments or
“lead-like” molecules (▶ Sect. 7.9). Exclusion criteria were proposed that emphasize
promising candidates for drug development. The so-called “rule of three” leans on
the “rule of five,” which was established by Chris Lipinski at Pfizer (▶ Sect. 19.7).
If the database is filtered with these rules, approximately half of the entries remain. Of
these, ca. 15% are acyclic compounds, and about 43% contain one ring. It is very
enlightening to see that only about 55% of the ring systems in the virtual database
have been described in Chemical Abstracts or Beilstein. Comparison with a data
collection of already-synthesized molecules of the same size makes clear where the
chemical space has been only sketchily explored. It seems that very big gaps still
exist! Over 99.8% of the entries in the virtual database are waiting to be synthesized.
A comparison of the physicochemical properties of the molecules in both databases
suggests that very broad areas still remain that until now have not been explored.
If the chemical space is limited to compounds with 7, 8, or 9 atoms, it seems that the
chemical space is well covered with already prepared molecules. Approximately 2/3
of the molecules with 10 or 11 atoms in the virtual database are chiral. In this group
particularly, there are many candidates that meet the “lead-like” criteria. This is a real
challenge for synthetic chemists. Chiral fused carbo- and heterocycles are difficult to
make. Nevertheless, Nature has led the way: many biologically active natural prod-
ucts contain just these building blocks.
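Filtering a database with the “rule of three” mentioned above amounts to a simple property check per entry. The sketch below is an illustration only: the thresholds are the commonly quoted ones (molecular mass up to 300 Da, at most three hydrogen-bond donors, at most three acceptors, calculated log P up to 3) and are not taken from this chapter, and the three database entries are hypothetical values invented for the example.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    name: str
    mol_weight: float   # Da
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors
    clogp: float        # calculated octanol/water partition coefficient

def passes_rule_of_three(e: Entry) -> bool:
    """Commonly quoted fragment criteria (assumed thresholds, see text)."""
    return (e.mol_weight <= 300
            and e.h_donors <= 3
            and e.h_acceptors <= 3
            and e.clogp <= 3)

# Hypothetical entries, for illustration only
database = [
    Entry("fragment_a", 153.0, 2, 3, 1.2),
    Entry("fragment_b", 320.0, 1, 2, 2.0),   # too heavy
    Entry("fragment_c", 150.0, 4, 2, 0.5),   # too many donors
]

lead_like = [e.name for e in database if passes_rule_of_three(e)]
fraction_kept = len(lead_like) / len(database)
print(lead_like, fraction_kept)
```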
11.5 Compound Libraries on Solid Support: Complete Yield
and Easy Purification
An interesting variation on classical chemistry in solution is found in the synthesis
of compound libraries on solid supports. Organic polymers, usually cross-linked
polystyrenes, are used as carriers. This material is chemically modified so that
it carries numerous reactive functional groups of a particular sort, for example,
chloromethyl, carboxylate, or amino groups. Through these groups the reaction
product remains covalently attached to the insoluble polymer during the synthetic
steps. Stepwise growth of the product is accomplished by coupling with appropri-
ately protected building blocks (e.g., amino acids) and subsequent cleavage of these
protecting groups. Large excess of reagents causes fast and nearly complete trans-
formations. Unreacted starting materials can be removed by simple washing. After
assembly of the target molecule, all protecting groups are removed. At the end of
the synthesis, the product is either tested directly on the support or it is cleaved and
its biological activity is tested in solution (Sect. 11.7).
The technique can be easily automated. At the beginning of the 1960s, Robert
Bruce Merrifield developed solid-phase synthesis for peptides and small proteins
(Fig. 11.3). At the beginning of the 1980s the idea of using synthetic combinatorial
principles for peptide synthesis emerged for the first time. H. Mario Geysen
devised a multipin synthesis of peptides. By using a conventional Merrifield
solid-phase synthesis, 96 different peptides or defined peptide mixtures were
prepared in an 8 × 12 format on polymer pins. This concept was so revolutionary
that the originally submitted manuscript was rejected for publication in 1984. The
referees were too severely restricted by their traditional thinking. For Geysen,
absolute control of stoichiometry and yield was less important than
the creation of combinatorial diversity with minimal effort.
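[Fig. 11.3, reaction scheme. Reagents shown along the scheme, in order: ClCH2OCH3 + SnCl4, or ZnCl2 + HCHO (chloromethylation of the resin); Boc-NHCHR1COO− (attachment of the first amino acid); HCl/CH3COOH (removal of the Boc group); Boc-NHCHR2COOH/DCCI/DMF (coupling of the second amino acid); cleavage with strong acid, HBr/CF3COOH.]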
Fig. 11.3 The Merrifield peptide synthesis is carried out on a polymeric resin that is
functionalized in an appropriate way. The first N-terminally protected amino acid is coupled to
the chloromethyl group (Boc = tert-butoxycarbonyl protecting group). Then the amino group
is released and coupled with a second amino acid that has been activated with
dicyclohexylcarbodiimide (DCCI). The N terminus of the resulting dipeptide can be deprotected
and elongated. The peptide can also be cleaved from the resin under strongly acidic conditions.
In this way, thousands of different peptides could be prepared weekly. Entire
libraries of compounds could be prepared and tested. The new methods were
originally used for “epitope mapping,” that is, the structural probing of the surface
of a protein with different antibodies (▶ Sect. 32.1). This technique allows the
recognition of areas in a polypeptide chain that are exposed on the surface of
a protein. Later it was used to search for optimal sequences of protease substrates
(▶ Sect. 14.6) and for the synthesis of biologically active peptides. In addition to
the multipin method, high-efficiency methods have been established, for instance,
the teabag method. Support beads are filled into teabags and dipped into solutions
of protected amino acids with which their peptide sequence is to be elongated.
11.6 Compound Libraries on Solid Support Require Contrived
Synthetic Strategies
An especially sophisticated synthetic strategy is needed for the construction of
compound libraries. Hexapeptides are considered as an example. In principle, all
20 proteinogenic amino acids could be used and 20^6 = 64 million hexapeptides
prepared and individually tested, an impossible undertaking. Therefore intelligent
strategies are needed to quickly identify the biologically active sequences. To this
end, the 64 million peptides are grouped into partial libraries that contain
constant amino acids in fixed positions. For example, all 400 partial libraries
are prepared for the hexapeptides of the form
XXABXX (A, B = predefined amino acids; X = a mixture of all natural amino
acids). After these 400 mixtures have been tested, the most biologically active mixture is the
starting point for the second round of synthesis. Another 400 libraries are generated,
this time with the form XA(Aa1)(Aa2)BX. Aa1 and Aa2 are the amino acids from the
most active mixture of the first round. These libraries are also tested, and the
“best” amino acids for positions 2 and 5 are found. This strategy is pursued step by
step until the most active sequence is identified.
In a simpler procedure the amino acids are varied in one position at a time. Starting
with 20 libraries AXXXXX, the most active amino acid is determined in the
first position. The starting point for the next synthetic cycle is the most active
mixture Aa1XXXXX. By varying the adjacent position, the second amino acid is
ascertained. This is repeated, and 6 × 20 = 120 hexapeptide libraries are prepared in
the form of AXXXXX, Aa1AXXXX, . . . Aa1Aa2Aa3Aa4Aa5A until the “best”
amino acids in all positions are determined.
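The position-by-position procedure can be simulated with a toy model. The sketch below rests on a strong assumption that is not part of the text: each position contributes independently to the activity, so that the mean activity of a mixture can be computed without enumerating its members. The target sequence and all names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # one-letter codes of the 20 proteinogenic amino acids

def mixture_activity(pattern: str, target: str) -> float:
    """Mean activity of a partial library such as 'WRXXXX'.

    'X' stands for an equimolar mixture of all 20 amino acids.
    Toy model (assumption): every position that matches the optimal
    sequence `target` contributes one unit of activity.
    """
    score = 0.0
    for p, t in zip(pattern, target):
        if p == "X":
            score += 1 / len(AMINO_ACIDS)   # 1 in 20 members matches here
        elif p == t:
            score += 1.0
    return score

def deconvolute(target: str):
    """Fix one position per cycle; return the found sequence and libraries tested."""
    fixed = ""
    libraries_tested = 0
    length = len(target)
    for position in range(length):
        best_aa, best_score = None, -1.0
        for aa in AMINO_ACIDS:
            pattern = fixed + aa + "X" * (length - position - 1)
            score = mixture_activity(pattern, target)
            libraries_tested += 1
            if score > best_score:
                best_aa, best_score = aa, score
        fixed += best_aa
    return fixed, libraries_tested

found, n_libraries = deconvolute("WRYGAL")   # hypothetical optimal hexapeptide
print(found, n_libraries)
```

In this idealized model the optimal hexapeptide is recovered after 120 library tests instead of 64 million individual ones. With real binding data, where positions influence one another, the greedy choice made in an early cycle need not lead to the best overall sequence.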
Another method allows the targeted construction of a library in few working
steps. The conceptual design of the synthesis ensures that a defined compound is
produced on each polymeric support bead. This is achieved by using the so-called
“split-and-combine” technique (Fig. 11.4). For example, it is possible to synthesize
all 8,000 possible tripeptides from the 20 proteinogenic amino acids in only
60 reaction steps. They are produced as 20 mixtures of 400 substances each. In the
end, one definite compound is located on each polymer bead. The individual beads
are available as a batch that is easily separated mechanically and individually tested.
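The split-and-combine scheme can be followed bead by bead in a short simulation. This is a sketch under simplifying assumptions: every coupling goes to completion, the beads are split into equal portions after random mixing, and three building blocks stand in for the 20 amino acids. The function name and the bead count are chosen for illustration.

```python
import random

def split_and_combine(building_blocks, rounds, n_beads, seed=0):
    """Simulate split-and-combine; return the sequence on every bead and the
    number of coupling reactions that were carried out."""
    rng = random.Random(seed)
    beads = [""] * n_beads              # each bead starts without any building block
    n_vessels = len(building_blocks)
    couplings = 0
    for _ in range(rounds):
        rng.shuffle(beads)              # combine and mix
        vessels = [beads[i::n_vessels] for i in range(n_vessels)]   # split evenly
        beads = []
        for block, vessel in zip(building_blocks, vessels):
            # one reaction per vessel: the same block is added to every bead in it
            beads.extend(sequence + block for sequence in vessel)
            couplings += 1
    return beads, couplings

beads, couplings = split_and_combine("ABC", rounds=3, n_beads=2700)
library = set(beads)
print(len(library), couplings)
```

Each bead passes through exactly one vessel per round and therefore carries a single sequence, while the pool as a whole contains the complete library. With 20 building blocks and three rounds, the same bookkeeping gives 3 × 20 = 60 coupling reactions for 20^3 = 8,000 tripeptides. Whether every sequence is actually represented depends on having many more beads than library members.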
Fig. 11.4 The construction of a compound library according to the split-and-combine technique
starts with a certain amount of resin beads. These are evenly distributed among n reaction vessels.
Only three are considered here for the sake of simplicity. In the first flask reagent A (e.g., amino
acid A) is coupled to the resin. Reagents B and C are analogously added to flask 2 and 3. In the next
step a dipeptide is constructed. To solve the problem of different reaction rates between the
different amino acids A, B, and C, only one soluble reaction partner is added in excess to the
mixture of solid-phase-bound starting materials. After the first reaction step, the resin, which is
now loaded with an amino acid is combined and mixed. It is again distributed between 3 (or more)
reaction flasks. The next reaction is carried out. In the case of a peptide synthesis, amino acid A is
added to flask 1, B to flask 2, and C to flask 3. The resin is combined and mixed thoroughly. In the
meantime all nine possible dipeptides are on the beads. After separating the beads again, the third
step follows. In case the peptide chain is to be extended by another amino acid, amino acid A is
added to flask 1, B to flask 2, and C to flask 3. Now all 27 imaginable sequential tripeptides are on
the resin after three parallel reaction steps. A clearly identifiable compound is found on each resin
bead. The library can be tested directly on the polymer or it can be tested in solution after cleavage
from the support.
11.7 Which Compound in the Solid Support Combinatorial
Library Is Biologically Active?
The libraries that were generated on the solid support are biologically tested. This
can be done directly on the polymer-immobilized compounds. As with testing
the libraries from bacteriophages, there is a danger that the support material
influences the test, for example, through steric hindrance or unspecific interactions.
Furthermore, it is important that the test protein is in a soluble form. Membrane-
bound receptors therefore elude testing. Alternatively, the compound library can be
cleaved from the resin. For this, the coupling between the resin and the library
component must be made by using an appropriate “linker,” which allows the library
to be selectively released. This linker is cleaved off, for instance, at a low pH or
photochemically with UV light. It must not interfere, however, with the synthetic
assembly of the library, and must not be cleaved during the synthesis.
The final cleavage from the resin must not destroy the products. Testing the
cleaved products certainly correlates better with physiological conditions. Spreading
the cleaved compounds over a large area or embedding them in a gel
achieves a spatial separation so that compounds interacting with the test protein
occur in high local concentrations. In this way, binding to insoluble proteins (e.g.,
membrane-bound receptors) can be tested. However, the advantage of
the mechanical manipulation of a polymer-bound compound library is lost
upon release.
If biological activity is found in the test, it remains to be determined which
compound from the library is responsible. If the library is precisely defined through
the synthetic program, then it is known which compounds were tested. Active
components are narrowed down by deconvolution and the resynthesis of partial
libraries. With the one-bead-one-compound technique, only one defined compound is
produced on each resin bead. It is not known, however, which one. Only
after activity has been found is characterization of the compound attempted. There are
several ways to do this. The relevant resin beads can be separated and the compounds
on them analyzed. If the library consists of peptides or oligonucleotides,
peptide sequencing by Edman degradation (which works even with 0.1 pmol!) is
carried out, or the polymerase chain reaction (▶ Sect. 12.1) allows amplification and
enrichment of oligonucleotides.
Even more elaborate techniques are also used. During synthesis, the library is
allowed to “grow” on multiple different linkers. The single library compounds can
be released from these linkers under different conditions (e.g., different pH values,
or photochemically at different wavelengths). First the compound is cleaved from
the first linker to carry out testing. The cleavage from the second linker is performed
after mechanical separation of the desired resin beads. This method serves to
practically “label” the resin beads. The technique is therefore an elegant variation
on library testing in a detached state. The different linker-bound compounds on the
resin bead need not be identical. Therefore a test library of peptides can be linked to
the resin bead by oligonucleotides, which are used as labels. Halogenated aromatics
were also proposed as labels because they can be easily identified by mass
spectrometry even in the smallest quantities. The labels can even be encoded based
on their sequence or the number of monomer building blocks with an appropriate
binary code.
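How a binary code can label a resin bead may be easier to see in a small sketch. The scheme below is one possible encoding chosen for illustration, not the method of any particular laboratory: each synthesis step owns a few tag molecules, and the presence or absence of a tag is read as one bit. The building-block names and the `(step, bit)` tag representation are hypothetical.

```python
def encode(sequence, blocks_per_step):
    """Return the set of (step, bit) tags that labels one building-block sequence."""
    tags = set()
    for step, (block, blocks) in enumerate(zip(sequence, blocks_per_step)):
        code = blocks.index(block) + 1  # 1-based, so "no tag" never encodes a block
        for bit in range(len(blocks).bit_length()):
            if (code >> bit) & 1:
                tags.add((step, bit))
    return tags


def decode(tags, blocks_per_step):
    """Read the building-block sequence back from the tags found on a bead."""
    sequence = []
    for step, blocks in enumerate(blocks_per_step):
        code = sum(1 << bit for tag_step, bit in tags if tag_step == step)
        sequence.append(blocks[code - 1])
    return sequence


# Hypothetical library: 4 x 3 x 5 = 60 compounds built in three steps.
blocks_per_step = [
    ["a1", "a2", "a3", "a4"],
    ["b1", "b2", "b3"],
    ["c1", "c2", "c3", "c4", "c5"],
]
sequence = ["a3", "b2", "c5"]
tags = encode(sequence, blocks_per_step)
recovered = decode(tags, blocks_per_step)
n_tag_molecules = sum(len(blocks).bit_length() for blocks in blocks_per_step)
print(sorted(tags), recovered, n_tag_molecules)
```

With this encoding, eight distinct tag molecules are enough to label all 60 members of the hypothetical library.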
The techniques of labeling the resin bead require considerable synthetic effort
for the library preparation. The transformation steps for the assembly of the library
and the labeling must not disturb one another. Even the final reading of the labels
can require multiple working steps. The alternative route using the programmed
synthesis concept with deconvolution and resynthesis also means increased effort
for the repeated construction of the library components. However, the same working
steps are always used; they are just carried out with different reagent
compositions. With respect to automation, this is certainly an advantage.
11.8 Combinatorial Libraries with Large Diversity: A Challenge
for Synthetic Chemistry
Another aspect speaks for the last above-mentioned concept. In the meantime
a large number of organic reactions have been transferred to solid-phase synthesis.
For each solid-phase synthesis, a special strategy, a specific linker, and a suitable
cleavage method must be developed. Each single synthetic step must be compatible
with the protecting groups, the polymer support, and the linker. However, a far
greater chemical diversity becomes available than is possible with
peptides and nucleotides.
Careful design of the target molecules to be synthesized is indispensable for
combinatorial chemistry. Limitations arise from the accessibility, that is, the devel-
opment of an appropriate synthetic scheme, and furthermore from the desired
structural diversity of the resulting library. Computer methods help to find
a “reasonable” selection of synthetic components. How is the optimal composition
obtained? This highly depends on what the constructed library should be tested for.
A library can be developed for general-purpose screening. It should then be
“optimally diverse.” Its composition is outlined according to generally accepted
criteria such as molecular weight, total lipophilicity, an even distribution of H-bond
donors and acceptors, as well as the size of the hydrophobic surface area.
These characteristics are important for the similarity or diversity of active com-
pounds in the library (▶ Chap. 17, “Pharmacophore Hypotheses and Molecular
Comparisons”). The desired library diversity can also be considered in relation to
the biological properties of a receptor (target oriented). Criteria that make
a molecule “similar” or “diverse” for one receptor are not necessarily the same
for another receptor (▶ Sect. 17.7). In view of the broad palette of proteins for
which combinatorial libraries should be tested, there is no absolute measure of
diversity. Therefore, combinatorial chemistry plays an important role in the establishment
of structure–activity relationships for a target protein. For this, chemical
variations at different positions of a suitable lead structure, once discovered, must
be carried out very quickly. The design and synthesis of such targeted compound
libraries quickly opens the way to these relationships.
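How a computer method might pick a "reasonably" diverse set of building blocks can be illustrated with a greedy max-min selection, a generic technique that is swapped in here for illustration and is not described in the text. The sketch assumes that each candidate is characterized by a few of the criteria mentioned above (molecular weight, lipophilicity, H-bond donors, H-bond acceptors); all names and descriptor values are invented.

```python
import math


def standardize(rows):
    """Scale every descriptor column to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [sum(c) / len(c) for c in columns]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(columns, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def maxmin_pick(rows, k, start=0):
    """Greedily pick k rows so that each new pick is as far as possible
    from the rows already chosen (Euclidean distance on scaled descriptors)."""
    points = standardize(rows)
    picked = [start]
    while len(picked) < k:
        remaining = [i for i in range(len(points)) if i not in picked]
        best = max(remaining,
                   key=lambda i: min(math.dist(points[i], points[j]) for j in picked))
        picked.append(best)
    return picked


# Hypothetical candidates: (molecular weight, logP, H-bond donors, H-bond acceptors)
names = ["R1", "R2", "R3", "R4", "R5"]
descriptors = [
    (150, 1.0, 1, 2),
    (155, 1.1, 1, 2),
    (320, 4.5, 0, 1),
    (160, 0.9, 1, 2),
    (240, -0.5, 4, 6),
]
picked = maxmin_pick(descriptors, 3)
selection = [names[i] for i in picked]
print(selection)
```

The three near-duplicates R1, R2, and R4 contribute only one member to the selection. For a target-oriented library the descriptors and their weighting would have to be chosen with the receptor in mind, as the text points out.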
11.9 Nanomolar Ligands for G Protein-Coupled Receptors
Chemists at the company Chiron synthesized a library of trimeric N-substituted
oligoglycines (peptoids) by using the split-and-combine method (Fig. 11.5). In their
design of the nitrogen substituents the scientists had G protein-coupled receptors in
mind. These receptors are the targets of many neurotransmitters and hormones. In
the construction of their peptoids they combined at least one aromatic group and
a side chain with an H-bond donor in the form of a hydroxyl group (Fig. 11.5,
Groups A and O). Furthermore, a basic nitrogen atom is present in the molecules
with X = H. These groups match those also found in neurotransmitters and
hormones. They chose as diverse a substituent composition as possible for
the remaining third substituent (Group D). A peptoid library with ca. 5,000 di- and
tripeptoids was prepared from these groups.
Different mixtures were tested on the adrenergic receptors. The H-ODA-NH2
partial library was identified as the most active. It served as a starting point for the
stepwise deconvolution of the library. Partial libraries were resynthesized, first by
keeping the hydroxy side chain O constant, then the members of the diverse group D,
and finally the aromatic substituent A. In the end, 11.7 remained as a nanomolar
ligand (Fig. 11.6).
The same peptoid library was tested on another GPCR, the opiate receptor. In
this case the most active partial library H-ADO-NH2 was found in the first step. The
relevant deconvolution through resynthesis delivered 11.8 as a nanomolar ligand.
The molecule has a p-hydroxyphenylethyl moiety and a diphenylmethane group on
both ends of the tripeptoid. It is known from detailed studies on Met-enkephalin
11.9 that the amino acids tyrosine and phenylalanine are essential for the activity.
There are analogous groups for both moieties on the tripeptoid (Fig. 11.6).
11.10 More Potent than Captopril: A Hit from a Combinatorial
Library of Substituted Pyrrolidines
The Affymax company prepared a library of ca. 500 differently substituted
pyrrolidines by 1,3-dipolar cycloaddition. In the first step, the resin was loaded
with protected amino acids (Gly, Ala, Leu, and Phe; Fig. 11.7). Then the transfor-
mation to an imine was made with four different aromatic aldehydes. Cycloaddition
with five different alkenes led to five-membered-ring heterocycles. In the last step,
the pyrrolidines were N-substituted with three different thiols.
This last step was done in view of testing these ligands on the angiotensin-
converting enzyme (ACE, ▶ Sect. 25.4). Inhibitors of this enzyme contain a
functionalized proline residue at their C terminus. The iterative deconvolution of
the library afforded 11.10 as a potent ACE inhibitor (Fig. 11.7; Ki = 160 pM). It is
a distinctly stronger binder than the marketed product captopril and is among the
most potent thiol-containing ACE inhibitors.
Fig. 11.5 Peptoids are oligoglycines that are substituted at nitrogen. A library of di- and
tripeptoids was constructed according to the split-and-combine technique. Three X groups
were added to the N terminus. Three groups O with a hydroxy function, 4 groups A with an
aromatic ring, and 17 groups D with diverse groups were used as nitrogen side chains. Eighteen
mixtures (6 permutations of A, O, and D with 3 end groups) gave ca. 5,000 di- and tripeptoids.
The H-ODA-NH2 library showed activity on the α-adrenergic receptor. First, the hydroxy
groups O were deconvoluted. The compounds with p-hydroxyphenethyl groups were the most
active ones. In the next synthesis round, 17 partial libraries were composed with this O group
held constant, and defined groups were used from the diverse D group. Compounds with
a diphenyl or diphenyl ether group were particularly active. With these groups in the
D position, the work was continued. The division of the aromatic side chains A in the last
position led to eight individual compounds.
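The numbers in this caption can be checked with a short calculation. The sketch below counts only the tripeptoids; the caption does not say how the dipeptoids were enumerated, so they are left out. The three deconvolution rounds are taken from the caption (3 hydroxy groups, 17 partial libraries, 8 individual compounds).

```python
from itertools import permutations
from math import prod

group_sizes = {"O": 3, "A": 4, "D": 17}  # side-chain groups named in the caption
n_end_groups = 3                         # X groups at the N terminus

orders = ["".join(p) for p in permutations("AOD")]   # ADO, AOD, DAO, ...
tripeptoids_per_mixture = prod(group_sizes.values())
n_mixtures = len(orders) * n_end_groups
total_tripeptoids = tripeptoids_per_mixture * n_mixtures

# Deconvolution of the H-ODA-NH2 mixture as described in the caption.
deconvolution_rounds = [3, 17, 8]
n_resyntheses = sum(deconvolution_rounds)

print(n_mixtures, tripeptoids_per_mixture, total_tripeptoids, n_resyntheses)
```

The tripeptoids alone account for 3,672 compounds in 18 mixtures of 204 members each; the remainder of the ca. 5,000 compounds would then be dipeptoids. Deconvoluting the active mixture took 28 partial libraries or single compounds, far fewer than its 204 members.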
11.11 Parallel or Combinatorial, in Solution or on Solid
Support?
Combinatorial chemistry on solid support has enabled the automated synthesis of
numerous molecules, but it also faces problems. The difficulties associated with
testing on resins or the deconvolution and resynthesis of libraries have already
been mentioned. Labeling is an elegant but laborious alternative. Another way to
Fig. 11.6 The derivative 11.7 is the most potent compound from the H-ODA-NH2 library with
a Ki = 5 nM on the α-adrenergic receptor. Testing on the opiate receptor gave compound 11.8 as the
candidate with highest affinity (Ki = 6 nM) from the H-ADO-NH2 library after deconvolution. Met-
enkephalin 11.9 is a potent opiate receptor ligand. The relationship between the p-hydroxyphenyl
group in 11.8 and the tyrosine side chain in 11.9, and between a phenyl portion of the diphenylmethane
group of 11.8 and the benzyl group of phenylalanine in 11.9, is obvious. Tyr and Phe are essential
for the activity of Met-enkephalin.
avoid deconvolution of a library but which still uses the advantages of combinato-
rial chemistry is parallel synthesis in spatially separated reaction vessels. It remains
clear along the entire reaction sequence which reactant and product is in each
vessel. A laborious deconvolution is omitted. At first this strategy seems to be
impractical. How should a thousand reaction components be reasonably
transformed in a thousand reaction flasks? For this purpose, the reaction flasks
Fig. 11.7 The amino acids Aa = Gly, Ala, Leu, or Phe are coupled to the support resin (a). Next,
they are transformed to imines with four different aromatic aldehydes (Ar-CHO; b), which react
with alkenes under 1,3-dipolar cycloaddition conditions to give pyrrolidines (c). In the last step the
free NH group on the heterocycle is treated with different thiol compounds (thio-COCl; d). The
library, assembled with the help of the split-and-combine technique, is cleaved from the polymer with
release of an acid function. Its ability to inhibit the angiotensin-converting enzyme was tested. By
resynthesis and renewed testing, the library was deconvoluted to the active compound. In doing
so, compound 11.10 was identified as a high-affinity inhibitor.
should not be thought of in the classical organic chemistry sense. Rather,
miniaturized reaction “automats” are developed in which all reaction steps are
carried out in parallel. Alternatively, methods have been developed in which the
resin beads are filled into many small reaction capsules (e.g., so-called teabags or
“Kans™”). These are permeable to the solution phase for compound transport, but the
beads are mechanically enclosed. Each capsule is fitted with a label that can be read
with a radio transmitter. All of the capsules are then placed in a classical round-
bottomed flask and the usual chemistry is carried out. The capsules can be mechan-
ically separated and brought into contact with different reagents. Which reaction
sequence is performed on which capsule is followed by the registration system with
the radio transmitter. In this way, one molecule can clearly be prepared by combi-
natorial principles per reaction capsule, practically as it is in parallel synthesis. The
single compounds are then available for testing.
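The bookkeeping behind the labeled capsules can be sketched in a few lines. This is a simplified illustration with hypothetical names: every capsule ID is assigned one target sequence of building blocks, and before each reaction step the capsules are sorted into the vessel that holds the reagent they need next.

```python
from collections import defaultdict
from itertools import product


def plan_sorting(blocks_per_step):
    """Assign one target sequence per capsule and list, for every step,
    which capsule IDs go into which reagent vessel."""
    targets = dict(enumerate(product(*blocks_per_step)))  # capsule ID -> sequence
    plan = []
    for step in range(len(blocks_per_step)):
        vessels = defaultdict(list)
        for capsule_id, target in targets.items():
            vessels[target[step]].append(capsule_id)
        plan.append(dict(vessels))
    return targets, plan


# Hypothetical two-step library: 2 x 3 = 6 capsules, one compound each.
blocks_per_step = [["a1", "a2"], ["b1", "b2", "b3"]]
targets, plan = plan_sorting(blocks_per_step)
n_capsules = len(targets)
n_vessels = sum(len(vessels) for vessels in plan)
print(plan[0])  # capsules grouped by the first building block
print(plan[1])  # capsules regrouped for the second step
```

Six discrete compounds are obtained from five reaction vessels in this small case; the saving grows quickly with the number of steps and building blocks, while each capsule still contains a single, known compound.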
Synthesis on a solid support material has disadvantages compared to chemistry
in the solution phase. Usually transformations are slower and the analysis to follow
the reactions is considerably more laborious to carry out. Coupling to the solid
support requires a suitable linker. Such an anchoring group should be removed from
the library before testing as tracelessly as possible (“traceless linkers”). Above all else, upon
removal of the linker no functional groups should remain in the library that
might unintentionally be part of the pharmacophore. The chemistry to attach
and remove the linker must be compatible with all of the other reactions in the
synthesis of the solid-supported library. This can lead to limitations in the usable
chemistry. In preparative chemistry, molecules are preferably constructed by using
a convergent synthesis strategy. For this, an approach is developed in which the
components of the final product are prepared in separate steps, each in parallel. In
the subsequent reaction steps, the previously prepared components are brought
together and coupled with one another to produce the final product. Such a strategy
is more efficient and leads to a higher yield than a linear synthetic route.
A convergent strategy, however, cannot be carried out by sequential construction
on a resin. Therefore, the tables have been turned for some syntheses. The prepared
libraries are not bound to the solid support, but rather the reagents with which they
are treated. The advantage of carrying out reactions on the solid support is retained.
Good mechanical separation of reaction components, working simply with large
excesses of reagent, and automated reactions belong to this technique. An advan-
tage is that it is now possible to carry out convergent syntheses. Even toxic reagents
can be used as their separation is ensured by their firm adhesion to a solid support.
The usual analytical methods that are typically applied for the solution phase can
also be used.
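The statement that a convergent strategy gives a higher yield than a linear route can be made concrete with a small calculation. The sketch assumes, purely for illustration, that every coupling step proceeds with the same yield and that the overall yield is counted along the longest linear sequence; the numbers are not taken from the text.

```python
def linear_yield(n_fragments, step_yield):
    """Overall yield when the fragments are attached one after another."""
    return step_yield ** (n_fragments - 1)


def convergent_yield(n_fragments, step_yield):
    """Yield along the longest linear sequence of a fully pairwise assembly."""
    depth = (n_fragments - 1).bit_length()  # equals ceil(log2(n_fragments))
    return step_yield ** depth


n_fragments, step_yield = 8, 0.8  # illustrative values, not from the source
linear = linear_yield(n_fragments, step_yield)
convergent = convergent_yield(n_fragments, step_yield)
print(f"linear: {linear:.3f}, convergent: {convergent:.3f}")
# prints "linear: 0.210, convergent: 0.512"
```

With eight fragments and 80% yield per step, the longest sequence shrinks from seven steps to three, which more than doubles the overall yield under these assumptions.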
Some reactions, especially ring-closure reactions or condensations, are in
competition with intermolecular transformations. To avoid these, highly diluted
solution conditions are used. If a solid-supported reactant is used, the local con-
centration of the reactant will be reduced as it is fixed to the solid support and
spatially separated. Reactions that occur over a trapped reaction product can be
simplified if the trapping reagent is coupled to a solid support. Mechanical filtering
is enough to separate the trapped components. Similarly, the products can be
separated and purified by trapping them on a solid support. Acids and bases can be
separated for purification by treatment with an immobilized amine or a sulfonic
acid. In the meantime, the adhesion of metal-complexing groups or hydrophobic
adhesion groups is already used for the purification of combinatorially produced
compound libraries.
How will combinatorial chemistry develop further? The miniaturization
of reaction vessels and synthetic automats seems to be a seminal perspective. The
“lab-on-a-chip” concept is already intensively used for bioanalytical methods. Small
reaction volumes, integrated separation columns, miniaturized valves, and pumps
that are controlled by piezo elements are integrated on small chip cards. We can only
wait and see whether such serial reaction automats are the laboratories of the future.
11.12 The Protein Finds Its Own Optimal Ligand: Click
Chemistry and Dynamic Combinatorial Chemistry
Could a protein simply produce its own best inhibitor itself? With the ideal
geometry, it should be able to form the optimal interactions directly in the binding
pocket of the target enzyme. Which chemical reactions might be best suited for
such a concept? It would have to be a reaction that can be conducted in aqueous
medium, is reliably enthalpically driven, is fast, and gives complete turnover.
Reactions of this kind, named “click chemistry,” were investigated in detail in the group of
Barry Sharpless in La Jolla, California, in recent years. Cycloadditions of unsaturated
compounds (1,3-dipolar cycloadditions, Diels–Alder reactions); nucleophilic
substitutions, particularly ring-opening reactions; non-aldol-like carbonyl reac-
tions; and additions to C─C multiple bonds fulfill these requirements. These
can be applied by using combinatorial principles. The 1,3-dipolar cycloaddition
(Huisgen reaction) is particularly well suited to build five-membered triazole
and tetrazole heterocycles (Fig. 11.8). 1,4-Disubstituted 1,2,3-triazoles can be
regiospecifically produced by the reaction of an azide and alkyne in the presence
of Cu(I) salts at room temperature. 1,5-Disubstituted triazoles are formed when
copper ions are excluded or other ions such as ruthenium are added. The reaction
runs in a broad pH range between 4 and 12. The reaction type can be extended to
tetrazoles. For this, nitriles are needed as dipolarophile reaction partners in the
presence of zinc ions.
The research group of Jean-Marie Lehn in Strasbourg chose another way. They
developed “dynamic combinatorial chemistry” through the spontaneous construction
of molecules from suitable starting materials and reversible reactions
(Fig. 11.9). All imaginable combinatorial products form from a mixture of different
building blocks. A dynamic exchange equilibrium is established between them. The
target receptor (e.g., a protein) is added to such an equilibrium system. This way
the mixture components with the best protein-binding characteristic have an advan-
tage, as the protein captures the best binders and shifts the equilibrium. It leads to
a self-perpetuating choice of the ligands that fit best into the binding pocket. In this
way the added protein practically seeks its own best inhibitor.
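How the added protein shifts the equilibrium toward the best binder can be illustrated with a deliberately simple numerical model. The assumptions are mine and are chosen only to keep the sketch short: the library members interconvert freely like isomers with fixed equilibrium weights, each binds the receptor 1:1 with its own dissociation constant, and all concentrations are in the same arbitrary unit.

```python
def dcl_distribution(weights, kds, library_total, receptor_total, iterations=200):
    """Return the fraction of the library present as each member (free + bound).

    weights: relative equilibrium populations of the free members.
    kds: dissociation constant of each member for the receptor.
    """
    def free_scale(r):
        # Free concentration of member i is weights[i] * free_scale(r),
        # where r is the concentration of free receptor.
        return library_total / sum(w * (1 + r / kd) for w, kd in zip(weights, kds))

    def receptor_excess(r):
        f = free_scale(r)
        bound = sum(w * f * r / kd for w, kd in zip(weights, kds))
        return r + bound - receptor_total

    low, high = 0.0, receptor_total
    for _ in range(iterations):  # bisection on the free receptor concentration
        mid = (low + high) / 2
        if receptor_excess(mid) > 0:
            high = mid
        else:
            low = mid
    r = (low + high) / 2
    f = free_scale(r)
    return [w * f * (1 + r / kd) / library_total for w, kd in zip(weights, kds)]


weights = [1.0, 1.0]   # two members, equally populated without receptor
kds = [0.001, 10.0]    # member 0 binds 10,000 times more tightly (invented values)
without_receptor = dcl_distribution(weights, kds, 1.0, 0.0)
with_receptor = dcl_distribution(weights, kds, 1.0, 1.0)
print(without_receptor, with_receptor)
```

Without receptor both members make up half of the mixture. With one equivalent of receptor, the tight binder is amplified to well over 90% of the library in this model, because material captured by the protein is replenished from the exchanging pool.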
Even click chemistry can be directed toward such a self-selecting synthetic
process. Acetylcholinesterase (AChE, ▶ Sect. 23.7) was added as a target protein to
a mixture of potential azides and alkynes as starting materials for the Huisgen
reaction. From the multiplicity of imaginable reaction products
a femtomolar inhibitor was selected! When decorated on one end by
a phenylphenanthridine group, and a tacrine head group suited for the shallow
entrance, the azide and alkyne react in the middle of the hose-shaped binding
pocket to afford a triazole (Fig. 11.10). Very few products form. They are
predetermined by the possible position of the starting compounds. The crystal
structures could be determined with two potent products. The newly produced
triazole ring forms an H-bond that is mediated by a water molecule with Ser203
in the catalytic center of the protein. The triazole ring does not form preferentially
as the entropically favored product of a simple linker; rather, the polar interaction with
Ser203 appears to be the driving force.
In a similar way, carbonic anhydrase II (▶ Sect. 25.7) was used as a target protein
for the selection of suitable reactants in a Huisgen reaction. In this case the alkyne
component was initially coupled to the catalytic zinc ion over a benzylsulfonamide
anchor. Later, structurally fitting azide components could be brought to react in the
funnel-shaped binding pocket. Nanomolar inhibitors were produced. Analogously,
Fig. 11.8 The 1,3-dipolar cycloaddition (Huisgen reaction) is a typical click chemistry reaction
and leads to five-membered triazole and tetrazole heterocycles. In the presence of Cu(I) salts,
azides and alkynes react regiospecifically at room temperature to form 1,4-disubstituted 1,2,3-
triazoles; in the absence of copper but with ruthenium ions, 1,5-disubstituted products are formed.
If a nitrile is used instead of the alkyne component, and the reaction is catalyzed by zinc ions, 1,5-
disubstituted tetrazoles are obtained as products.
success has been achieved with HIV protease (▶ Sect. 24.3) and ACh-binding
protein (▶ Sect. 30.5). Goal-oriented combinatorial libraries are used as starting
materials for these reactions. Time will tell what significance this in situ inhibitor
synthesis will gain for practical drug research.
11.13 Synopsis
• As a consequence of the tremendous acceleration of automated compound
screening for biological activity, the amount of compounds required for testing
significantly increased and stimulated the development of automated parallel
synthesis and combinatorial chemistry.
• Nature produces an enormous chemical multiplicity by combining either amino
acids or nucleic acids to yield polymers that fold into 3D arrangements.
• The chemical space of organic molecules with up to 25 non-hydrogen atoms and
that satisfy the requirements of drug-likeness has been estimated to host about
10²⁷ imaginable candidates.
• Chemical reactions on solid support, usually organic polymer resins such as
cross-linked polystyrene, follow a stepwise synthesis strategy to build up mol-
ecules sequentially on the solid phase. Complete yields and easy purification can
be achieved, and product release from the solid phase is accomplished as the
final step.
Fig. 11.9 A mixture of different library components is furnished that interact under equilibrium
conditions in dynamic combinatorial chemistry. Numerous products can form in the equilibria.
They represent potential “keys” that can fit in the “lock” of the target protein. The added receptor
protein binds to the best-fitting ligands from the compound mixture and shifts the equilibrium in
the direction of increased formation of this product. It is then removed from the equilibrium by the
protein binding (according to O. Ramström and J.-M. Lehn).
Fig. 11.10 The library is produced from alkynes bearing an AChE-suitable tacrine side chain
and phenylphenanthridine-substituted azides custom-made for AChE. In the presence of
acetylcholinesterase (AChE) the products 11.11 (green) and 11.12 (gray) are formed, which proved to
be potent enzyme inhibitors. They differ in the topology on the five-membered ring. Crystal
structure determinations were accomplished with both inhibitors. The surface around the
protein is shown with the bound ligand 11.12. Both ligands occupy the tube-shaped binding
pocket of AChE. Compound 11.11 binds via a water molecule (red sphere) to the hydroxy function
of Ser203.
• Sophisticated synthetic strategies have been developed to generate multiple
products on the solid support from reagent mixtures in a limited number of
reaction steps. Elaborate protocols have been established to keep track of
product formation that also use elaborate chemical labeling techniques.
• The biological activity testing of compound libraries generated by combinatorial
chemistry on a solid support requires sophisticated protocols to detach and
deconvolute the library.
• The design and selection of building blocks used for library synthesis are
purpose-oriented and consider the properties of the target(s) at which the library
is subsequently screened.
• Multiple protocols have been developed for both combinatorial chemistry and
parallel synthesis: either the library substrates are immobilized on the solid phase,
or the reagents are immobilized and the library is developed in the solution
phase.
• The target protein can be added to a mixture of reagents in click chemistry and
dynamic combinatorial chemistry. From a large variety of possible reaction
products, the protein binding pocket selects the best binder as a potent inhibitor
or antagonist of the target protein.
Bibliography
General Literature
Balkenhohl F, von dem Bussche-Hünnefeld C, Lansky A, Zechel C (1996) Combinatorial syn-
thesis of small organic molecules. Angew Chem Int Ed Engl 35:2288–2337
Bannwarth W, Hinzen B (2006) Combinatorial chemistry. From theory to application. In:
Mannhold R, Kubinyi H (eds) Methods and principles in medicinal chemistry, 26th edn.
Wiley-VCH, Weinheim
Baum RM (1994) Combinatorial approaches provide fresh leads for medicinal chemistry. Chem
Eng News 72:20–26
Beck-Sickinger AG, Weber P (2002) Combinatorial strategies in biology and chemistry. Wiley,
Weinheim
Bunin BA (1998) The combinatorial index. Academic, San Diego
Gallop MA, Barrett RW, Dower WJ, Fodor SPA, Gordon EM (1994) Applications of combina-
torial technologies to drug discovery. 1. Background and peptide combinatorial libraries.
J Med Chem 37:1233–1251
Gordon EM, Barrett RW, Dower WJ, Fodor SPA, Gallop MA (1994) Applications of combina-
torial technologies to drug discovery. 2. Combinatorial organic synthesis, library screening
strategies, and future directions. J Med Chem 37:1385–1401
Jung G (1999) Combinatorial chemistry. Wiley-VCH, Weinheim
Jung G, Beck-Sickinger AG (1992) Multiple peptide synthesis methods and their applications.
New synthetic methods. Angew Chem Int Ed Engl 31:367–383
Kay BK (1994) Biologically displayed random peptides as reagents in mapping protein–protein
interactions. Persp Drug Discov Design 2:251–268
Kolb HC, Finn MG, Sharpless KB (2001) Click chemistry: diverse chemical function from
a few good reactions. Angew Chem Int Ed Engl 40:2004–2021
Ley SV, Baxendale IR (2002) New tools and concepts for modern organic synthesis. Nat Rev Drug
Discov 1:573–586
Madden D, Krchnak V, Lebl M (1994) Synthetic combinatorial libraries: views on techniques and
their application. Persp Drug Discov Design 2:269–285
Moos WH, Green GD, Pavia MR (1993) Recent advances in the generation of molecular diversity.
Annu Rep Med Chem 28:315–324
Nicolaou KC, Hanko R, Hartwig W (2002) Handbook of combinatorial chemistry. Drugs, cata-
lysts, materials. Wiley-VCH, Weinheim
Ramström O, Lehn J-M (2002) Drug discovery by dynamic combinatorial libraries. Nat Rev Drug
Discov 1:27–36
Seneci P (2000) Solid-phase synthesis and combinatorial technologies. Wiley-Interscience,
New York
Special Literature
Bourne Y, Kolb HC, Radic Z, Sharpless KB, Taylor P, Marchot P (2004) Freeze-frame inhibitor
captures acetylcholinesterase in a unique conformation. Proc Natl Acad Sci USA 101:1449–1454
Carell T, Wintner EA, Sutherland AJ, Rebek J, Dunayevskiy YM, Vouros P (1995) New promise
in combinatorial chemistry: synthesis, characterization, and screening of small-molecule
libraries in solution. Chem Biol 2:171–183
Dooley CT, Chung NN, Schiller PW, Houghten RA (1993) Acetalins: opioid receptor antagonists
determined through the use of synthetic peptide combinatorial libraries. Proc Natl Acad Sci
USA 90:10811–10815
Fink T, Reymond J-L (2007) Virtual exploration of the chemical universe up to 11 atoms of C, N,
O, F: assembly of 26.4 million structures (110.9 million stereoisomers) and analysis for new
ring systems, stereochemistry, physicochemical properties, compound classes, and drug dis-
covery. J Chem Inf Model 47:342–353
Geysen HM, Meloen R, Barteling S (1984) Use of peptide synthesis to probe viral antigens for
epitopes to a resolution of a single amino acid. Proc Natl Acad Sci USA 81:3998–4002
Murphy MM, Schullek JR, Gordon EM, Gallop MA (1995) Combinatorial organic synthesis of
highly functionalized pyrrolidines: identification of a potent angiotensin converting enzyme
inhibitor from a mercaptoacyl proline library. J Am Chem Soc 117:7029–7030
Zuckermann RN, Martin EJ, Spellmeyer DC et al (1994) Discovery of nanomolar ligands for
7-transmembrane G-protein-coupled receptors from a diverse N-(substituted)glycine peptoid
library. J Med Chem 37:2678–2685
12 Gene Technology in Drug Research
Engineers and writers have predicted many developments in science and technol-
ogy. In addition to other sophisticated machines, Leonardo da Vinci described the
principle of the helicopter. In the early 1820s, Charles Babbage designed an
automatic calculator long ahead of its time. Over 160 years later, the mechanical
precursor of a programmable computer was in fact built, and it worked! Jules Verne
described submarines and a journey to the moon, and Hans Dominik described
obtaining energy by splitting the atom. All of these visions have become reality.
Only a single application was preconceived for gene technology, the most seminal
invention of our time: the cloning of two genetically identical individuals in Aldous
Huxley’s Brave New World. It remains a hope that researchers will respect ethical
boundaries, and despite the feasibility, never actually use Huxley’s idea.
With the methods of gene technology it is possible to bring new genes into the cell,
multiply them, and exchange or remove them. If they are removed or changed, the cell
can no longer produce the original protein derived from that gene. By introducing a new
gene and using a clever choice of method, the cell manufactures a foreign product,
either a purposefully modified protein, or an entirely new one. For many diseases, the
molecular cause is known to be the absence of a protein, or a genetically caused
mutation in a protein. Only a few generally known examples are mentioned here:
• Diabetes as a result of insulin deficiency,
• Particular, hereditary cancer forms (e.g., familial colon cancer, malignant
melanoma),
• Huntington’s chorea, a chronic form of brain atrophy,
• Sickle cell anemia, a genetic disease producing malformed red blood cells
(Sect. 12.14), and
• Bleeding disorders that are caused by the absence of particular coagulation
factors (see Sect. 12.14).
The possibility of purposefully producing arbitrary proteins has yielded the
following main applications of gene technology.
• The identification of genes and proteins that could play a role in the treatment of
a disease,
• The development of animal models to test a therapeutic principle,
G. Klebe, Drug Design, DOI 10.1007/978-3-642-17907-5_12,
© Springer-Verlag Berlin Heidelberg 2013
• The production of proteins for therapies in which a particular protein is missing,
• The manufacture of monoclonal antibodies and vaccines,
• The manufacture of proteins for molecular test systems, and the determination of
the 3D structures of enzymes and other soluble proteins,
• The generation of proteins of which a targeted mutagenesis has been undertaken
to exchange one or more amino acids for the elucidation of the mode of action of
enzymes and for the characterization of receptor binding sites,
• Somatic individual gene therapy for specific patients.
Other application possibilities, for example, manipulation of the human
germline, or genetic changes in crops to achieve herbicide resistance, or to prolong
the shelf life of fruits, are only briefly mentioned here.
12.1 The History and Basics of Gene Technology
The foundations of gene technology were first established in the middle of the
twentieth century. The starting shot was made in 1953. Back then, James Watson
and Francis Crick elucidated the three-dimensional structure of the hereditary
substance of all living things, deoxyribonucleic acid, DNA. Immediate indications
were obtained from the structure about the mechanism of our hereditary transfer
and about the genetic code for the biosynthesis of proteins. A few years later,
Werner Arber found enzymes, restriction enzymes that attack a very specific
position on the double helix to sequence-specifically cleave the DNA. What was
initially seen as a curiosity proved to be an exceedingly important discovery for
gene technology. It is possible to selectively cleave DNA with these enzymes and to
introduce new fragments. Next, the merging of new information with the original
DNA, the recombination of the genetic constitution, is accomplished with ligases
from special viruses called bacteriophages. The techniques for DNA sequencing
have also made decisive progress. Soon afterward, the amino acid sequence of
a protein was no longer directly determined, but rather deduced from the analysis of
the corresponding DNA. Today, for sequencing the detour over the cDNA is used,
which is complementary to the RNA (Sect. 12.6).
In 1973 Stanley Cohen and Herbert Boyer managed to recombine the genome of
a bacterium for the first time (Fig. 12.1). Events then followed in quick succession:
two years later the bacterial strain Escherichia coli K12, which is still used today,
was developed. A part of its genetic constitution is missing so that it is only viable
under laboratory conditions. This bacterium can be genetically manipulated at will
without the worry that it could be harmful. The British scientists H. Williams-Smith
and E. S. Anderson independently carried out self-experiments
in which they swallowed Escherichia coli K12. They proved that these
bacteria survive in the gastrointestinal tract for only a short period of time, and that the K12
gene that confers antibiotic resistance for the selection of transformed cells
cannot be transferred to the normal Escherichia coli that is found in the intestinal
flora. Experts discussed the possible dangers of gene experiments at a conference in
Asilomar, California, and defined different risk and safety classes. In 1976 the
company Genentech was established. Its founder, Herbert Boyer, had to borrow
US $500 as start-up capital! When the company was first traded on the
stock market in 1980, he became a millionaire within a few minutes because of the value of
his stock. As early as 1982 Genentech introduced to the market the first medication
that was manufactured by using gene technology: human insulin (Humulin®).
In 1983 Kary Mullis made a decisive contribution to gene technology
when he developed the polymerase chain reaction (PCR) while working at the
California company Cetus, which was founded in 1971. Heating melts double-stranded
DNA into its single strands. The four DNA nucleotides are then added, as
are two short single-stranded DNA pieces, the so-called primers, which are
complementary to the regions at the ends of the DNA segment to be copied.
A polymerase can then be used to
synthesize new DNA in a test tube. This means that, starting from the primers,
a new double strand is formed (Fig. 12.2). A heat-stable DNA polymerase from the
bacterium Thermus aquaticus, which lives in the hot springs of Yellowstone
National Park, is used for the DNA synthesis. Each repetition of this step doubles the
amount of DNA. Within a few hours, billions and even trillions of DNA molecules
can be manufactured from a single starting molecule. This amount is enough for
sequencing the relevant DNA segment.
[Figure 12.1, panel labels: DNA vector (plasmid) from a bacteria cell; cutting the plasmid into a linear, double-stranded DNA; target DNA is added; ligases fuse the vector and target DNA (recombinant DNA); the plasmid is introduced into the cell (transformed cell).]
Fig. 12.1 The principle of gene technological recombination of hereditary information. In addition
to their “chromosome,” bacteria often contain further genetic material in the form of ring-shaped
plasmids; these are used in gene technology as vectors to introduce foreign genes. Plasmids
are removed from the cell and cut sequence-specifically with so-called restriction enzymes,
which come from bacteria. The target DNA that carries the desired genes, which has typically
been treated with the same restriction enzyme, is bound in vitro to the overlapping single-stranded
DNA ends. The DNA ends are coupled with the enzyme DNA ligase, and the modified, recombinant
plasmid is brought into the bacterial cell. In addition to the DNA segment that is necessary
for replication, plasmid vectors that are used in gene technology carry information that
allows the recognition and selection of the transformed cells (usually an antibiotic-resistance
gene). In the presence of the selecting agent, only plasmid-containing cells grow.
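The cut-and-ligate principle of Fig. 12.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: all sequences are invented, only the top strand is modeled, and EcoRI (recognition site GAATTC, cut after the first G) is chosen as an example enzyme; it is not named in the text.

```python
# Illustrative sketch: invented sequences, top strand only.
# Assumption: EcoRI as example enzyme (site GAATTC, cut between G and AATTC).
SITE = "GAATTC"
CUT_OFFSET = 1


def digest(seq: str) -> list[str]:
    """Cut a linear sequence at every occurrence of the recognition site."""
    fragments, start = [], 0
    pos = seq.find(SITE)
    while pos != -1:
        fragments.append(seq[start:pos + CUT_OFFSET])
        start = pos + CUT_OFFSET
        pos = seq.find(SITE, pos + 1)
    fragments.append(seq[start:])
    return fragments


def linearize(plasmid: str) -> str:
    """Open a circular plasmid (written as a string) at its first site."""
    cut = plasmid.find(SITE) + CUT_OFFSET
    return plasmid[cut:] + plasmid[:cut]


def ligate(vector: str, insert: str) -> str:
    """Join the compatible ends; the result is read as circular again."""
    return vector + insert


def count_sites_circular(seq: str) -> int:
    """Count recognition sites, including one spanning the string ends."""
    wrapped = seq + seq[:len(SITE) - 1]
    return sum(wrapped.startswith(SITE, i) for i in range(len(seq)))


plasmid = "ATGCCGAATTCTTAGGC"                 # hypothetical vector, one site
target = "CCGAATTCATGAAACCCGGGTAAGAATTCGG"    # hypothetical gene between two sites
insert = digest(target)[1]                    # middle fragment carries the gene
recombinant = ligate(linearize(plasmid), insert)
print(recombinant, count_sites_circular(recombinant))
```

Because vector and insert were cut with the same enzyme, both junctions of the recombinant plasmid restore the recognition site, so the insert could be cut out again with the same enzyme.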
PCR methods have many applications. The entire genetic information of an individual
can be derived from a single DNA molecule. In medical diagnostics, this
serves to detect genetic disorders, cancer, infectious diseases, and risk factors.
PCR methods are also used to establish a genetic fingerprint in paternity tests and in
forensic science.
[Figure 12.2, panel labels: double-stranded DNA molecule; heating at 95 °C; two single strands + two primers; excess nucleotide + heat-stable polymerase (arrows: direction of the DNA synthesis); DNA + copy; heating at 95 °C, repeating the PCR cycle; four single strands; DNA + 3 copies; multiple repetitions of the PCR cycle.]
Fig. 12.2 The polymerase chain reaction allows a practically unlimited number of identical copies
of a DNA molecule to be manufactured. For this, the DNA is heated to separate the double-stranded
DNA into complementary single strands. Synthetic oligonucleotides of approximately 20 bases, so-called primers,
which are complementary to these DNA strands, hybridize with the corresponding strand. Each
primer must bind to one end of the two DNA strands. The primers set the boundaries of the
amplified DNA. Furthermore, an excess of primer must be used because in each cycle one primer
pair is needed for each DNA double strand; the primers are not regenerated in later cycles. The
primers are necessary to start the new synthesis of DNA in the presence of the DNA
polymerase and an excess of the four different nucleotides. On the second strand, synthesis runs in the reverse direction
(dashed arrow) because of the antiparallel orientation of the DNA strands and the specificity of the
polymerase. The newly synthesized DNA segment can be a few hundred to several thousand base
pairs long. The result is two identical double-stranded DNA molecules. After heating, single
strands are obtained and the above-described procedure is repeated. Because the DNA polymerase
is heat stable, it does not need to be added repeatedly. Each repeat of the above-described steps
doubles the number of DNA molecules, which therefore grows exponentially. Ten cycles lead to
about 1,000 DNA molecules, 20 to about a million, and 30 to about a billion. In this way, a single DNA
molecule can be multiplied into a quantity that is biochemically analyzable.
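The exponential growth described in the caption can be checked with a short calculation. The sketch below assumes ideal doubling in every cycle; the `efficiency` parameter is a hypothetical addition (not taken from the text) that allows less than perfect doubling to be explored.

```python
import math


def pcr_copies(cycles: int, start: int = 1, efficiency: float = 1.0) -> float:
    """Copies after `cycles` rounds; efficiency=1.0 means perfect doubling."""
    return start * (1.0 + efficiency) ** cycles


def cycles_needed(target: float, start: int = 1, efficiency: float = 1.0) -> int:
    """Smallest number of cycles that reaches at least `target` copies."""
    return math.ceil(math.log2(target / start) / math.log2(1.0 + efficiency))


for n in (10, 20, 30):
    print(n, f"{pcr_copies(n):,.0f}")   # 1,024 / 1,048,576 / 1,073,741,824

print(cycles_needed(1e9))               # 30 cycles for a billion copies
```

With an assumed efficiency of 0.9 instead of 1.0, about 33 cycles would be needed for the same billion copies, which shows how sensitive the yield is to the per-cycle efficiency.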
New genetic information can be brought not only into bacterial cells, but also
into yeasts, virus-infected insect cells, and even mammalian cells. As a first
approximation, however, the more complex the organism is, from
bacteria all the way to mammalian cells, the more difficult it is to produce proteins
in these cells. On the other hand, insect and mammalian cells have the advantage
that they produce not only small proteins but also more complex ones (e.g.,
glycosylated proteins) in a functional form. In many cases, one therefore has to
rely on such organisms.
12.2 Gene Technology: A Key Technology in Drug Design
The 1970s and 1980s were the grand age of receptor-binding tests with membrane
preparations. Radioactively labeled ligands were used to determine the specific
binding of new substances. The most important receptors for hormones and
neurotransmitters were known and in some cases, the difference between pre-
and postsynaptic receptors as well. The different subtypes and their amino acid
sequences were not known. Correspondingly, the results of the investigations
were inaccurate.
Gene-technology methods allow the preparation of homogeneous recombinant
proteins in practically unlimited quantities. They play an important role at the very
first step of drug design: the identification of a target protein. Progress with the
methodology led to the discovery of new receptors with partially unknown function
or specificity. The next step is the testing of the therapeutic concept
on genetically altered animals. Another important contribution is the preparation
of proteins for molecular test systems and the isolation of adequate material for
the elucidation of the 3D protein structure (▶ Chap. 13, “Experimental Methods of
Structure Determination”). With the possible exception of a very few proteins that
can be isolated from blood or other natural sources, the production of large
quantities of protein is dependent on gene technology. Nowadays proteins are
purified from animal or human blood only reluctantly because the risk of
transmitting viruses or other infections is deemed to be too high.
Gene technology offers the possibility to selectively produce structural variants
of proteins. The generation of point mutations (site-directed mutagenesis) allows
particular properties in proteins to be improved, and the binding and catalytic
properties of enzymes to be purposefully changed. Membrane-bound receptors
can be probed position by position to establish which amino acids are responsible
for the maintenance of the 3D structure, the adoption of a particular conformation,
or are of critical importance with respect to binding of a ligand. Three-dimensional
structural models of receptors can be generated in this way, or their relevance can
be appraised.
In many cases, it has also proven worthwhile to introduce point mutations that
change the surface properties of proteins and help to elucidate the 3D structure of
proteins. Sometimes the charge on individual amino acids must be changed for the
sake of the protein crystallization. In the case of proteins in which a part of their
sequence is anchored in the membrane, the membrane anchor, which would impede
crystallization, is removed before the experiment. With soluble receptors it has
proven worthwhile to remove individual domains, crystallize them, and determine
their structure. Of course such modified proteins must still fulfill their particular
functions, that is, ligand binding or DNA docking. Once the difficult crystallization
step is accomplished, the actual structural elucidation is nowadays usually only
a matter of a few weeks (▶ Chap. 13, “Experimental Methods of
Structure Determination”).
If the contributions to humanity that come from all of this
progress are considered, the question inevitably arises: why are such broad segments of society
so afraid of gene technology? It takes a little effort to understand these reservations.
With the use of gene technology, almost everything that is theoretically imaginable
in the field of genetics becomes possible. The trust that people have in science is, however,
not as unshakable as it was before the atom bomb. Now, when the opportunities significantly
outweigh the risks, the sins of our forefathers have come back to haunt us.
Scientists have all too often underestimated possible risks in the past and put their
ethical concerns on hold. Scientists have still not managed to assuage public fears.
We must take these fears seriously and build new trust by behaving responsibly.
12.3 Genome Projects Decipher Biological Constructions
The entire human genome is organized on 23 chromosomes. In 1990 the Human
Genome Organization (HUGO), equipped with a budget of US $3 billion, started
with the then exceedingly ambitious goal of sequencing the entire human genetic
code from DNA within 15 years. By the end of 1993, the first annotated genomic
maps became available, which were later refined. By 2001 the work had advanced so far
that the entire genome was published in parallel in Science and Nature by two
consortia.
The two competing consortia followed different strategies. The publicly funded
international consortium chose a stepwise, progressively narrowing approach:
the genome was digested in stages, and the sequences were elucidated
systematically to obtain a complete genomic analysis. In humans, this means that in
addition to the 5% of DNA that corresponds to genes, the other 95% of the
DNA was also sequenced. Because its function was unknown, it was given the somewhat
derogatory label “junk DNA.” Today it is known that these areas take on important
tasks in the regulation of gene expression (Sect. 12.7). The second strategy, which
was pursued by the privately financed consortium, made use of the so-called shotgun
method. For this, a longer DNA strand was amplified and then cleaved into many
small, random segments. After these segments were sequenced, the original long
DNA strand was reconstructed from their sequences by using a powerful computer program.
This can only work, of course, when the sequences of the cleaved segments
overlap adequately. This technique proved to be significantly faster than the
usual systematic sequencing methods. Above all, it benefited from the development
of faster and faster sequencing machines and powerful bioinformatic programs. In
the end it was no disadvantage that, because of the high redundancy of the method,
the genome had to be sequenced multiple times. Interestingly,
the international consortium that followed the systematic approach also used the
shotgun method at the end to elucidate local sequence areas. Because the
initial intent of the private enterprise was to patent the sequenced genome, the
competition between the two initiatives was fierce. In March 2000, the American
President Bill Clinton declared the human genome to be not patentable, and spoke out for
its use by everyone for the common good.
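The reconstruction step of the shotgun method, in which overlapping fragments are joined back into the original strand, can be illustrated with a toy example. The following is a minimal greedy sketch under strong assumptions (error-free reads, one strand only, no repeats, invented sequences); it is not the algorithm used by either consortium.

```python
from itertools import permutations


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_k, best_a, best_b = 0, None, None
        for a, b in permutations(reads, 2):
            k = overlap(a, b, min_len)
            if k > best_k:
                best_k, best_a, best_b = k, a, b
        if best_k == 0:
            break  # no adequate overlap left: the contigs stay separate
        reads.remove(best_a)
        reads.remove(best_b)
        reads.append(best_a + best_b[best_k:])
    return reads


# Hypothetical 21-base "genome" cut into three overlapping fragments.
fragments = ["TTAGCCTAGGA", "ATGGCGTACG", "GTACGTTAGC"]
print(greedy_assemble(fragments))
```

If two fragments share no sufficiently long overlap, the function returns several separate contigs, which mirrors the remark above that the method only works when the cleaved segments overlap adequately.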
How did it come about that a competing private initiative started to sequence the
genome? In spring 1995 Craig Venter and his group determined the entire genome of
the bacterium Haemophilus influenzae by using the shotgun method. The enormous
number of 1,830,121 base pairs that code for 1,749 genes was sequenced. The
complete genomes of individual viruses were already known, but this was the first
decoding of the genetic information of a self-sufficient living organism. The subsequent
decoding of the sequence of 580,067 base pairs of the Mycoplasma genitalium
genome by Venter’s wife, Claire Fraser, took only four months.
Venter and his group worked with the shotgun method on the entire genome, the
so-called “whole-genome shotgun sequencing.” The statistical approach that was
followed by Venter initially seemed so unusual and utopian that his application for
a research grant from the American National Institutes of Health (NIH) was
rejected. This brought about the founding of The Institute for Genomic Research
(TIGR) and the Celera Genomics company. There, Venter could pursue his
research according to his ideas and plans. Finally, the success proved the feasibility
of the proposed strategy.
Whose genome was actually sequenced? In both initiatives the DNA of multiple
individuals was mixed and the individual differences were purposefully averaged
out. In this way the “consensus sequence” of the human genome was determined.
But it did not stop with the human genome. The genomes of baker’s
yeast Saccharomyces cerevisiae, the thale cress Arabidopsis thaliana,
the rice plant Oryza sativa, the nematode Caenorhabditis elegans, the fruit fly
Drosophila melanogaster, the chimpanzee Pan troglodytes, the mouse Mus
musculus, and many other organisms (Table 12.1) have been completely sequenced. In the
meantime new ones emerge weekly. This raises new questions: how should this
plethora of information be managed? How can the genetic information be translated
into useful knowledge? The field of bioinformatics has been challenged.
Computer programs for the intelligent comparison of sequences and the analysis of
metabolic pathways and signaling cascades already exist. New initiatives were
founded that have the goal of determining the spatial structure of all or at least
many sequences. The structural space of all real, naturally occurring proteins is
filling slowly. The crystal structures of all representatives of some protein families
of the human genome have been determined. Therefore it is only a question of time
until we can lay spatial blueprints next to the catalogue of all sequences in our
genome.
12.4 What Is Contained in the Biological Space of the Human Proteome?
After the human genome was sequenced, the exciting question arose as to which
gene products all of these DNA sequences code for. First it must be remarked
that the genome is not static; it is constantly changing. Only in this way can the
genetic variations occur that make up the diversity of all creatures. In the course
of evolution, the genetic constitution has expanded. Simple single-cell organisms
without cell nuclei (prokaryotes) have a circular genome that contains only coding
genes. Single-celled organisms with a cell nucleus (eukaryotes), such as yeast, have
a larger genome, of which about 20% represents coding genes. Multicellular
organisms such as humans have a genome that is 200 times larger than that of
yeast (Table 12.1). The number of coding genes, however, is not correspondingly larger. There are
even organisms such as the amoeba that have a genome that is 200 times larger than
that of humans. Even the minuscule water flea numerically overshadows us with its
31,000 genes. So the alleged masterpiece of creation does not necessarily also have
the largest genome. Obviously only a small number of additional DNA sequences
have accrued during the course of evolution that in fact code for additional gene
products. Many genes in higher organisms are similar to those in simpler species. If
the number of coding genes has hardly grown from the single-cell organisms to
humans, and even the gene products that are coded for are similar, what is the
explanation for the massive increase in complexity of the genome in higher-
developed organisms? The answer is not in the diversity of the needed gene
products, but much more in the finely tuned regulation of gene expression
(Sect. 12.13). In higher organisms, it is of decisive importance where and at what
time particular genes are expressed and gene products are synthesized. The 95% of
Table 12.1 Examples of the sequenced genomes of different organisms
Organism                                        Genome size (a)    Genes
HI virus                                        9.2 × 10^3 (b)     9
Virus, phage λ                                  4.85 × 10^4        70
Intestinal bacterium, Escherichia coli          4.6 × 10^6         4,800
Baker’s yeast, Saccharomyces cerevisiae         2 × 10^7           6,275
Nematode, Caenorhabditis elegans                8 × 10^7           19,000
Thale cress, Arabidopsis thaliana               1 × 10^8           25,500
Fruit fly, Drosophila melanogaster              2 × 10^8           13,600
Green blow fish, Tetraodon nigroviridis         3.85 × 10^8
Human, Homo sapiens                             3.2 × 10^9         25,000
Common newt, Triturus vulgaris                  2.5 × 10^10
Ethiopian lung fish, Protopterus aethiopicus    1.3 × 10^11
Amoeba, Amoeba dubia                            6.70 × 10^11
(a) Number of base pairs
(b) Single-stranded RNA
human DNA that does not code for proteins contains numerous sequences and
signals that control this regulation. Therefore the total number of genes in higher-developed
creatures does not seem to increase, but rather the gene density
decreases. On average, 12 genes per one million base pairs are found in the
human genome, whereas this number is 118 in the fruit fly, 197 in the nematode,
and 221 in the thale cress. Furthermore, the genes in the human genome are widely
scattered. It seems that it is not the number of genes but rather how they are used
and how their activation is regulated that is decisive for the developmental state of
the organism. It must also be considered that multicellular organisms need
a great deal of cell differentiation into different organs. These processes must be
reliably regulated and controlled. Moreover, higher organisms achieve a much
larger diversity in their protein composition by so-called alternative splicing.
Posttranslational modification after the biosynthesis also plays a role. This is
observed to a much smaller extent in, for instance, prokaryotes. The splicing
process cuts the portions that do not code for proteins (introns) out of the RNA
transcript after the transcription of DNA into RNA. During alternative splicing, it is decided what
is cut out and what is translated. In this way, one DNA sequence can code for
multiple different proteins.
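How one gene can give rise to several transcripts by alternative splicing can be made concrete with a small enumeration. The sketch below is a simplification under stated assumptions: the exon names and sequences are invented, and only exon skipping is modeled (exons keep their order, "constitutive" exons are always retained, the others may be left out).

```python
from itertools import combinations


def splice_variants(exons: list[tuple[str, str]], constitutive: set[str]) -> dict:
    """Map each retained-exon combination to the resulting mRNA sequence."""
    optional = [name for name, _ in exons if name not in constitutive]
    variants = {}
    for r in range(len(optional) + 1):
        for kept in combinations(optional, r):
            names = tuple(n for n, _ in exons if n in constitutive or n in kept)
            variants[names] = "".join(seq for n, seq in exons if n in names)
    return variants


# Hypothetical gene with four exons; E2 and E3 may be skipped.
GENE = [("E1", "AUGGCU"), ("E2", "GAAUUC"), ("E3", "CCGGGA"), ("E4", "UAA")]

for names, mrna in splice_variants(GENE, constitutive={"E1", "E4"}).items():
    print("-".join(names), mrna)
```

With two optional exons, the single gene yields four different transcripts; each additional optional exon doubles that number in this simplified model.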
To date, the largest genome of a single-celled parasite that has been found belongs to the
pathogenic protozoan Trichomonas vaginalis. It consists of 160 million base pairs.
This pathogen is usually transmitted between humans by sexual intercourse and causes
urinary tract infections. Its enormous genome is accompanied by a disproportionately
large cell. This could create an advantage for the pathogen because its large
surface area adheres to the vaginal mucosa better. Furthermore, the immune system
has trouble attacking and destroying such an oversized parasite. The genome of the soil
bacterium Sorangium cellulosum with 13 million bases and 10,000 genes is four times
as large as the average genome of other bacteria. This might have something to do
with the fact that this soil bacterium is able to carry out special tasks that make its
therapeutic use interesting. It is a versatile producer of complex natural products
such as the epothilones, which are potent chemotherapeutics that have great potential
in the treatment of cancer.
According to an analysis carried out in 2007, the human genome encompasses
3.25 billion bases. It contains around 25,000 genes, a few thousand of which are
recognized as RNA genes (even today the number cannot be stated exactly because
only 92% has been fully sequenced). The earlier textbook knowledge that one gene
product is behind each DNA sequence must be expanded. It must not be
overlooked that our genome contains many thousands of genes for non-coding
RNA segments. The resulting RNA molecules accomplish important functions
in our bodies. The large group of tRNAs that serve as adapter molecules for
the reading and translation of base triplets in the genome into the correct amino
acid sequence deserves special mention. Furthermore it has been shown that the
ribosome itself, which is the molecular machinery for protein synthesis, consists
largely of RNA. The spliceosome, the complex machinery for the removal of non-coding
segments from the RNA transcript, contains RNA molecules, so-called snRNAs.
There are even more small RNA molecules (snoRNAs) that are responsible for the
processing and modification of other RNA molecules.
Since then, it has been known that over 21,500 genes in our genome are translated into
proteins. It is not known, however, what functions all of these proteins fulfill.
Bioinformatics has contributed a great deal to the classification of their biochemical
function, that is, whether the protein is an enzyme (e.g., a protease, kinase, or
oxidoreductase) or whether it is a receptor, ion channel, or transporter. The function
of a new sequence, or the protein class to which it belongs, can be discovered by sequence
comparisons to already annotated proteins. Often a significant similarity can be
recognized by making so-called multiple sequence comparisons within a protein family.
The information about the spatial architecture and folding (▶ Sect. 14.2) can
also be inferred from such relationships because the spatial geometry of proteins has
been much more strongly conserved than the sequential composition of the folded
protein chain. Often individual motifs or characteristic sequence segments
disclose a particular biochemical function of a protein. Another tool in this detective
tour de force of functional annotation has proven to be protein sequence comparisons
between the genomes of other species.
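The two ideas in this paragraph, comparison with annotated sequences and recognition of characteristic motifs, can be sketched as follows. Everything in the example is an assumption made for illustration: the motif patterns are simplified stand-ins (a C2H2 zinc-finger-like spacing of Cys and His residues and the GDSGGP stretch around the active serine of trypsin-like proteases), and the test sequences are invented. Real annotation relies on curated pattern databases and statistical alignment methods.

```python
import re

# Simplified, illustrative motif patterns (assumptions, not curated entries).
MOTIFS = {
    "C2H2 zinc finger-like": re.compile(r"C.{2,4}C.{12}H.{3,5}H"),
    "trypsin-like serine protease-like": re.compile(r"GDSGGP"),
}


def annotate(sequence: str) -> list[str]:
    """Return the names of all motifs found in a protein sequence."""
    return [name for name, pattern in MOTIFS.items() if pattern.search(sequence)]


def percent_identity(a: str, b: str) -> float:
    """Identity of two pre-aligned, equally long sequences ('-' marks a gap)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y and x != "-" for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Invented example sequences.
print(annotate("MKPYKCPECGKSFSQKSNLQKHQRTHTG"))
print(annotate("IVGGYTCAANSIPYQVSLNSGDSGGPLVCNGQLQG"))
print(annotate("MAAAKLLVVA"))
print(round(percent_identity("ACDEFG", "ACDQFG"), 1))
```

A motif hit only suggests a protein class; as the following paragraphs make clear, it does not yet say what the protein means for the organism.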
The assignment of a biochemical function to a protein sequence affords a
glimpse into its molecular function. It shows whether, for example, it cleaves
a peptide sequence as a catalyst, carries out a metabolic reduction, or transduces
a signal to the cell as a receptor. What this regulation and control mean for
the organism remains to be resolved. Whether a particular protein causes
a disease by either a defective function or by dysregulation is just as unclear.
The correction of such a defect could lead to a successful pharmaceutical
therapy.
In the Science publication from the Venter group in 2001, it was assumed that the
genome coded for more than 26,500 proteins. At that time, a definitive function
could not be assigned to 40% of the sequences. Among the remainder, about 10%
were identified as enzymes. Another 12% proved to be involved in signal
transduction, and 13.5% were nucleic acid binding proteins. The large remaining
group was scattered across many different functions such as proteins of the cytoskeleton,
surface receptors, ion channels, transporters, extracellular matrix proteins,
immune system proteins, or chaperones. Seven years later this picture could be
refined. The largest protein family, with more than 7,000 members, contains the zinc
finger domain (▶ Sect. 28.2). These proteins assume an important role in transcribing
sequence segments of the DNA into RNA. Most zinc finger proteins belong
to the group of transcription factors. Another large protein family contains the
immunoglobulin domains (▶ Sect. 32.1). These domains, which are constructed from
β-pleated sheets, occur in antibodies. A few protein families are listed in Table 12.2
and are presented in more detail in ▶ Chaps. 23, “Inhibitors of Hydrolases with
an Acyl–Enzyme Intermediate”; ▶ 24, “Aspartic Protease Inhibitors”;
▶ 25, “Inhibitors of Hydrolyzing Metalloenzymes”; ▶ 26, “Transferase Inhibitors”;
▶ 27, “Oxidoreductase Inhibitors”; ▶ 28, “Agonists and Antagonists of Nuclear
Receptors”; ▶ 29, “Agonists and Antagonists of Membrane-Bound Receptors”;
▶ 31, “Ligands for Surface Receptors”; and ▶ 32, “Biologicals: Peptides, Proteins,
Nucleotides, and Macrolides as Drugs” of this book. A compilation of which
protein families are frequently associated with disease (Fig. 12.3) is interesting.
This list is led by the protein kinases (▶ Chap. 26, “Transferase Inhibitors”).
Therefore it is not surprising that current basic research in the pharmaceutical
industry is intensively concentrated on the control and inhibition of protein kinases.
The cadherins follow this group. These proteins are important for the stabilization
of cell–cell contacts. They play a role in the embryonic morphogenesis, signal
transduction, and intervene in the construction of the cytoskeleton in cells. The
G protein-coupled receptors, ion channels, trypsin-like serine proteases, or RAS
proteins also belong to this list of proteins that are potentially associated with
disease, especially when genetically altered.
Finally, how the human genome differs from those of other eukaryotes should also
be considered. Of the more than 2,200 protein families that have been discovered in
organisms with a cell nucleus, over 1,000 are missing from the human genome.
Most of these families assume specific tasks in the relevant organisms or are
explained phylogenetically. Among these are, for example, venoms such as found
in snakes, scorpions, or insects. Proteins occur in plants that assume a very specific
function for the plant, for example nutrient storage in the seeds, or defense against
disease. As a rule, the proteins that are absent in humans assume biochemical
functions that are irrelevant for our organism, or they assume a highly specific
task in the lower eukaryotes.
Table 12.2 Examples of protein families in the human genome and the number of their members
Protein superfamily                                        Number
Zinc finger (C2H2 and C2HC)                                7,707
Protein kinase-like                                        876
G protein-coupled receptor-like                            784
α/β-Hydrolases                                             151
Cysteine proteases                                         164
Trypsin-like serine proteases                              155
Metalloprotease (“Zincins”), catalytic domains             132
FAD/NAD(P)-binding domains                                 79
Cytochrome P450                                            79
Integrin α, N-terminal domains                             51
Cytokines                                                  52
Cyclic nucleotide phosphodiesterase, catalytic domains     50
Caspase-like                                               39
Carbonic anhydrases                                        23
Aquaporin-like                                             20
Integrin domains                                           18
Aspartic proteases                                         16
ClC-chloride channel                                       16
Subtilisin-like                                            14
http://hodgkin.mbu.iisc.ernet.in/human/
12.5 Knock In, Knock Out: Validation of Therapeutic Concepts
Molecular biology delivers a plethora of information about how diseases
develop and how their course can be influenced. The long route of searching
for and developing a new drug is based on this. In the end it might be
determined that the result, even though it was so well planned, did not lead to
the desired clinical success. It is therefore important to have an animal model
available upon which the therapeutic concept can be validated early on. Classical
test models are often not available because the corresponding disease does
not occur in animals.
Since the 1980s, transgenic animals have been increasingly used in pharmacological
research. These are animals in which a particular gene is fully or partially
turned off or is replaced by a human gene. An animal in which the gene is
completely turned off corresponds to an animal in which the relevant protein is
absent or non-functional. A heterozygous animal, which has inherited the intact gene
from only one parent, corresponds to an animal in which the protein is partially blocked. If
the gene for an enzyme or receptor is affected, the influence of an inhibitor or an
antagonist can be simulated. The development and progress of a disease or the
influence of protein inhibition on a disease can be observed in such an animal. In
this way some assurance about the relevance of a therapeutic concept is established
before an exceedingly long research and development process is started. The increased
[Figure 12.3, bar chart: frequency (axis from 0 to 300) for protein kinases, cadherins, GPCR, fibronectin III, homeobox, spectrin, MHC I, ion transport, myosin, RRM, trypsin-like, laminin EGF, Ras, and SH2.]
Fig. 12.3 The composition of protein families that are particularly often associated with human
diseases (GPCR: G-protein-coupled receptor; fibronectin: extracellular glycoproteins in tissue
construction; homeobox: proteins that influence the morphogenetic development; spectrin: cytoskeletal
proteins; MHC I: major histocompatibility complex proteins that are involved in immune-recognition
processes; myosin: motor protein in muscle control; RRM: RNA-recognition motif
transcription factor; trypsin-like: serine proteases; laminin EGF: a growth factor in the extracellular
matrix; Ras: oncoprotein in tumorigenesis; SH2: protein domains in the phosphorylation
signal cascade).
production of a particular protein can be induced by multiplying a gene. If the
absence of a gene causes the overexpression of another gene, this will also become
apparent. The gene product that is then produced in increased quantities can take
over the missing function of the turned-off product. In such a case the planned
therapeutic principle would only work if the other gene product’s function were
also blocked. This question plays an important role in the inhibition of kinases
(▶ Sect. 26.2).
A very specific gene is turned off in the so-called knock-out method. The
technique was developed in 1987 by Mario Capecchi at the University of Utah.
The sequence of the gene to be turned off must be known. A structurally homologous
gene that is not functional, for example because of the insertion of a stop signal,
is generated. This gene is introduced into an animal, where it replaces the intact gene at
exactly the same position. The process is called homologous recombination or
gene targeting. Mice are particularly well suited because the technology to
manipulate their embryonic stem cells is especially advanced. A foreign gene, for
example a human gene, can also be introduced. Mice are also well suited for this
because their genome and the human genome are surprisingly similar.
To generate a transgenic mouse, the female mice are treated so that they produce
a large number of egg cells. After fertilization, stem cells are extracted from the
embryos at a very early stage, the blastocyst stage. They are cultured in vitro, and
the desired gene is injected into the cells. This procedure only works in low yield.
A technique was therefore developed with which transfected cells can be distinguished
from non-transfected cells. For this, the gene to be transferred is coupled beforehand
with a gene that confers resistance to the cell toxin neomycin. When the cells are
treated with neomycin, only the transformed cells survive. The surviving cells are
combined with blastocysts of other mice, and the altered embryos are carried to term
by surrogate mothers. The offspring of the surrogate mothers are chimeric, that is, they carry the
genetic information from the donor as well as the acceptor mice. Here mice with
differently colored fur are chosen so that the transformed mice can be easily
recognized by their spotted fur.
Alternatively, foreign DNA can be injected directly into an embryo at an early
stage. A disadvantage of the random incorporation of a gene is the
possibility of destroying another gene, a lack of expression of the new gene, or
multiple incorporations. Animals from the first litter are bred to generate both
genetically mixed, heterozygous animals, and genetically homogenous, homozy-
gous animals. Particularly sophisticated techniques even allow the selective turning
on and off of the new genes.
In this way transgenic animals are generated in which hereditary diseases, for
instance, cystic fibrosis, Crohn’s disease, phenylketonuria, and others, can be
studied. Relevant animal models also exist nowadays for diseases that have
different or multiple causes such as cancer, diabetes, rheumatoid arthritis, and
Alzheimer’s disease. Ever since the American Patent Office first granted a patent
for a transgenically altered mouse in 1988, there has been controversy as to whether
a living creature can be patented at all. In the meantime there are whole series of
patents for gene-technologically altered animals, including European and German
patents, and the conflict about whether these patents are ethically or legally valid
continues.
12.6 Recombinant Proteins for Molecular Test Systems
Early on, pure or enriched enzymes were available for in vitro tests, but only in
those cases in which the material was easily available, for instance, human throm-
bin from blood. In other cases, animal material had to be used with all the risks that
come with it considering the relevance for rational design (see ▶ Sect. 19.11).
There are many proteins that cannot be isolated in adequate amounts or in
a homogeneous form. The sequence determination and the production of such
proteins are simple today. The unbelievably small amount of a few picomoles
(1 pmol = 10⁻¹² mol) is enough to determine the primary structure of a short
sequence. From the amino acid sequence determined in this way, the corresponding
gene sequence can be reconstructed by back-translation with the genetic code. In doing
so, it must be considered that multiple base triplets can stand for a particular amino
acid (the so-called degeneracy of the code, ▶ Sect. 32.7). A set of single-stranded
oligonucleotides is therefore synthesized that covers all base sequences that could
theoretically code for the original peptide segment. These molecules can be used to
find a complementary sequence in a cDNA library. cDNA (complementary DNA) is the
DNA that is complementary to the mRNA (messenger RNA). It is obtained from the mRNA,
which merely contains the sequence that is needed for the biosynthesis of proteins, by
transcription with a reverse transcriptase (▶ Sect. 32.5). Finally the gene is produced in larger
quantities by using PCR techniques, and the amino acid sequence is determined via
its base sequence simply because oligonucleotides are much easier to sequence.
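The effect of codon degeneracy on the size of such an oligonucleotide set can be illustrated with a short calculation. The following Python sketch is only an illustration: the function names and the example peptide are hypothetical and not taken from any published protocol. It multiplies the number of synonymous codons of the standard genetic code for each residue, and it picks the stretch of a peptide that requires the smallest set.

```python
# Number of synonymous codons per amino acid in the standard genetic code
# (one-letter code; the 61 sense codons are distributed over 20 amino acids).
CODON_COUNT = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def probe_pool_size(peptide: str) -> int:
    """Number of distinct oligonucleotides needed to cover every base
    sequence that could code for the given peptide."""
    size = 1
    for residue in peptide.upper():
        size *= CODON_COUNT[residue]
    return size


def least_degenerate_window(peptide: str, length: int) -> tuple[str, int]:
    """Return the stretch of `length` residues that needs the smallest
    oligonucleotide set (assumes length <= len(peptide))."""
    windows = [peptide[i:i + length] for i in range(len(peptide) - length + 1)]
    best = min(windows, key=probe_pool_size)
    return best, probe_pool_size(best)


if __name__ == "__main__":
    example = "LSRMWKD"  # hypothetical peptide segment
    print(probe_pool_size(example))
    print(least_degenerate_window(example, 3))
```

Segments rich in methionine and tryptophan, which have only one codon each, keep the set small, whereas leucine, serine, and arginine with six codons each enlarge it quickly.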
Next, the gene is brought into cells that are allowed to reproduce. There can be
difficulties in a few cases with this step. In bacteria, such as the intestinal bacterium
Escherichia coli, or in yeast cells, only soluble proteins can be produced. Some
proteins accumulate in inclusion bodies. They must be extracted, dissolved, and
refolded under specific conditions. The gene segment for a small protein is often
coupled with the information for another protein and both are then expressed
together. The large protein conjugate that forms in the cell is better protected from
metabolic degradation than small proteins. In the preparation, the non-essential part is
cleaved from the protein conjugate. There can be problems if the folding of the
protein is not correctly accomplished, or if multiple chains (as in insulin) must be
coupled over disulfide bridges. Larger proteins that must be furnished with sugar
groups to accomplish their function (glycosylation) must be produced in cells from
higher organisms, for example in mammalian cells. The manufacture of complex
proteins in insect cells has become particularly attractive. These cells are infected
with a so-called baculovirus into whose genome the desired information has been
incorporated. The virus codes for the protein and the insect cells provide the
production and subsequent glycosylation abilities. Not only enzymes, but also recep-
tors, ion channels, and entire signaling cascades can be produced in cells in this way.
12.7 Silencing Genes by RNA Interference
Section 12.5 introduced how intervention in the germline can generate genetically
altered organisms in which a particular gene, and therefore its gene product, is
switched off or replaced. The function of particular genes for an
organism can be studied in this way. The consequences for the organism, for
example, of blocking a particular gene product, become apparent prior to the
development of a potent active substance. In the late 1990s, another technique was
discovered that allows genes to be silenced without intervening in the genome
of an organism. This work was carried out by Andrew Fire and
Craig Mello. For their accomplishments, which are only slowly being validated,
they were awarded the Nobel Prize in 2006.
Genes are archived in DNA. For gene expression the coding part of the genome
is transcribed into mRNA. Based on this copied information, the ribosome translates
the base sequence into a peptide sequence. In the early 1980s the idea
emerged to trap the transcribed information on the single-stranded mRNA by adding
an inversely arranged RNA complement strand, the so-called antisense strand. The
two strands can hybridize, that is, they can bind to form a matching double strand.
Double-stranded RNA is then the result. This antisense principle (▶ Sect. 32.4) did
not deliver the hoped-for breakthrough. Genes were only partially or weakly
suppressed; moreover, even the addition of a normal (sense) RNA strand could achieve
suppression. Fire and Mello suspected that neither the normal nor the antisense
strand caused the gene blockade, but rather the double-stranded form that had been
added inadvertently as an impurity. Renewed experiments confirmed the assumption.
Interestingly, even small amounts of double-stranded RNA are enough to take
many mRNA molecules out of action. When using the antisense strands, on the
other hand, stoichiometric amounts are necessary. It was also shown that short, ca.
20-nucleotide-long double-stranded RNA fragments are enough to silence an entire
mRNA. Fire and Mello named the phenomenon RNA interference.
What had happened? An enzyme with the name dicer cleaved the double-stranded
RNA into 21–23 base-long pieces that then caused the blockade. For this, double-
stranded RNA pieces are incorporated in an enzyme complex called RISC (RNA-
induced silencing complex) and separated into single strands. One strand breaks
away from the complex while the other remains there to act as a template to capture
mRNA molecules.
The sequence of the retained strand allows the RISC complex to recognize all
mRNA molecules with a complementary base sequence and to cleave them one after another.
Finally, they are digested by enzymes in the cytoplasm. The cell selectively
eliminates only the mRNAs that contain the sequence pattern that is complementary
to the short RNA strands in the RISC complex. In practice, this gene blockade has
proven to be simpler and more reliable than the antisense technique. RNA interfer-
ence even allows discovered genes to be systematically blocked to draw conclu-
sions about the resulting consequences for the organism. RNA interference serves
not only analytical purposes. There are already biotech companies that want to turn
off disease-causing genes with small RNA fragments.
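The recognition step can be pictured as a simple search for complementary sequences. The following Python sketch is a strong simplification under stated assumptions: it requires perfect Watson-Crick complementarity over the whole guide strand, the function names are hypothetical, and the toy sequences are much shorter than the 21–23 bases of a real fragment.

```python
# Watson-Crick pairing partners for RNA bases.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the strand that pairs with `rna`, written 5' to 3'."""
    return rna.translate(COMPLEMENT)[::-1]


def find_target_sites(mrna: str, guide: str) -> list[int]:
    """Return all 0-based start positions in `mrna` that are perfectly
    complementary to the guide strand held in the silencing complex."""
    target = reverse_complement(guide)
    sites = []
    start = mrna.find(target)
    while start != -1:
        sites.append(start)
        start = mrna.find(target, start + 1)
    return sites


if __name__ == "__main__":
    mrna = "AUGGCUAGCUAGGAUCCGAUAA"  # toy mRNA, illustration only
    guide = "CCUAGC"                 # toy guide strand, illustration only
    print(find_target_sites(mrna, guide))
```

An mRNA without a matching segment yields an empty list and would be left untouched, which mirrors the selectivity described above.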
There is, however, another, bigger problem: how is a 22-base RNA molecule to be
transported into the cell to the place where it should act? Strongly charged molecules
cannot cross the cell membrane. A special delivery system is therefore needed
for this task. Intensive research is taking place on the development of such
systems, but the problem is a long way from being solved. A reliable and highly
efficient system that can selectively transfer such polar and nuclease-sensitive
molecules into the cell interior is likely to open a totally new and presently
unforeseeable perspective for the therapy of disease.
The goal is to construct delivery systems that can pack the fragile and polar
freight of RNA molecules and dock onto the cell. There, the coat of these carriers
must merge with the cell membrane or selectively achieve penetration to arrive
in the interior of the cell. One concept follows the packaging and compartmental-
ization of RNA in polymers such as polyethylenimine. The positive charges on
the polymer backbone can bind and encapsulate a negatively charged polymer
molecule such as RNA or DNA building blocks. Other systems try to make the
RNA or DNA molecules bioavailable for the cell by encapsulating them in
a membrane-like coat. This packaging in liposomes leads to a selective adhesion
of the artificial vesicle to the membrane of the target cell, and then the liposome fuses
with the target cell in an endocytosis-like process.
A further problem is the danger that small interfering RNA molecules (siRNAs)
could cause an immune response. One solution to this dilemma is the
chemical modification of the siRNAs. The RNA molecules are modified so
that they can still optimally hybridize to the addressed segment in the mRNA, but
have better properties in terms of transport, immunogenicity, and stability. To this end,
the 2'-OH groups of the ribose building blocks of the nucleotides have been exchanged for
fluorine, methoxy, or hydrogen.
siRNA research certainly is still in its infancy. The potential of the methods seems
to be impressive, as it uses the principles applied in Nature for gene regulation. As
previously described, we have genes in our genome that code for microRNAs and
that show sequence complementarity over long stretches. Structurally, they exist as
double strands. They are cut to size by the dicer protein and can serve to interfere with
RNA: as a result, this leads to an alternative means of gene regulation. For a broad
therapeutic application of externally administered RNA fragments there are certainly
important prerequisites to fulfill such as transport into the host cell and the prevention
of an immune response. Currently the technology serves the construction of model
organisms, to study the consequence of turning off genes. Nonetheless, the validation
of the method for use under in vivo conditions has long since begun.
12.8 Proteomics and Metabolomics
The approaches that were described in Sects. 12.5 and 12.7 pursue the goal of
turning off a disease-causing gene, or a gene that plays a role in a disease. But how
is it to be recognized whether a particular gene or gene product is involved in
a disease process at all? Decisive indicators to answer this question can be extracted
from the protein composition in the cell. This composition changes dynamically. It
is termed the proteome, and reflects the totality of all proteins in a cell, or indeed in the
entire organism, at a given time under exactly defined conditions. If we concentrate
on the protein pattern of a cell from a particular organ, important variables are the
metabolic state, the developmental stage of the organism, the time point in the cell
cycle, or the surrounding temperature. Disease processes and pharmaceutical therapy
also change this pattern. In the genome, all theoretically expressible
proteins are coded as static hereditary information. In contrast, the proteome
reflects the protein composition at a particular time point. The difference between
a butterfly in its caterpillar and adult phases serves as an impressive example of the
difference between the genome and the proteome. The genome is the same for both,
but the proteome is significantly changed, which is expressed in the form of
a completely different phenotype.
In view of disease processes or a pharmaceutical therapy thereof, the proteome
can be used to compare the state of cells that are healthy, diseased, or under
the influence of a drug therapy. Initially this seems like an extremely complex,
barely solvable task. A cell contains thousands of proteins, of which many are
modified after their expression. For example, the first amino acids in a sequence are
cleaved (▶ Sect. 25.9), phosphate groups are transferred (▶ Sect. 26.3), sugar
building blocks are added on, disulfide bridges are coupled, prosthetic groups are
added, and ubiquitin or prenyl groups are added (▶ Sect. 26.10). In addition,
alternative RNA splicing occurs, which is carried out as a mechanism of gene
regulation and further increases the diversity of the proteome on the basis of
a comparatively small number of genes. All of this dramatically increases the
diversity of the protein composition, probably by a factor of 5–10 compared to
the genome composition. Nevertheless, a sophisticated analytical method has been
developed with which it is possible to analyze the proteome of a cell at a particular
time point. First a cell must be denatured in a way that all modifying processes are
abruptly stopped so that conclusions can be drawn about the cell contents. The cell
lysate is then subjected to separation. Proteins contain many acidic and basic amino
acids so that an exactly defined pH value exists for each protein at which the
protonation or deprotonation arrives at a state at which the protein appears to be
overall electrically neutral. This pH value is specific for each protein and depends
on the amino acid composition (isoelectric point). The protein mixture is applied to
a solid support (a polyacrylamide gel) similar to those typically used for chromatography
purposes. Then a voltage is applied. If the proteins carry a charge, they migrate
over this solid support in the direction of the oppositely charged pole. The gel is
constructed in such a way that a continuous pH gradient is established from one end
to the other. At some point in their migration over the gel, the applied
proteins therefore reach a position where they appear to be uncharged overall. Once this
position is reached on the solid support, the proteins no longer migrate. Proteins are
thus separated according to their isoelectric point by using this so-called isoelectric
focusing. All proteins with the same isoelectric point migrate the same distance and
occur as a mixture. Then the gel is turned by 90°, and the proteins
are separated again, but now according to a different principle. For this, the
proteins are thermally denatured and their charges are masked with sodium
dodecylsulfate, a strongly charged anionic surfactant, so that all are virtually
equally charged on the exterior (SDS-PAGE). The denatured proteins migrate
again by the application of an electrical field. Now the migration speed is, however,
dependent on the mass of the proteins. The direction of the migration, which is
perpendicular to the first isoelectric separation, causes the originally applied pro-
teome to be broadly distributed and well separated on the solid support in the end.
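The position at which a protein comes to rest in the first dimension can be estimated from its sequence. The following Python sketch is a rough illustration: the pKa values are approximate assumptions (published scales differ), the influence of the folded environment on the ionizable groups is ignored, and the function names are hypothetical. The net charge is computed with the Henderson–Hasselbalch relation, and the pH at which it vanishes is located by bisection between pH 0 and 14.

```python
# Approximate pKa values of the ionizable groups (assumed values).
POSITIVE_PKA = {"K": 10.5, "R": 12.5, "H": 6.0}            # protonated form is charged
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}   # deprotonated form is charged
N_TERMINUS_PKA = 9.0
C_TERMINUS_PKA = 2.3


def net_charge(sequence: str, ph: float) -> float:
    """Net charge of a peptide chain at the given pH."""
    positive = [N_TERMINUS_PKA] + [POSITIVE_PKA[r] for r in sequence if r in POSITIVE_PKA]
    negative = [C_TERMINUS_PKA] + [NEGATIVE_PKA[r] for r in sequence if r in NEGATIVE_PKA]
    charge = sum(1.0 / (1.0 + 10.0 ** (ph - pka)) for pka in positive)
    charge -= sum(1.0 / (1.0 + 10.0 ** (pka - ph)) for pka in negative)
    return charge


def isoelectric_point(sequence: str) -> float:
    """pH between 0 and 14 at which the net charge is zero (bisection).
    The net charge falls steadily with rising pH, so one crossing is sought."""
    low, high = 0.0, 14.0
    for _ in range(60):
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0.0:
            low = mid    # still positively charged: the neutral point lies at higher pH
        else:
            high = mid
    return (low + high) / 2.0


if __name__ == "__main__":
    for peptide in ("DDDD", "ACDEKR", "KKKK"):  # toy sequences, illustration only
        print(peptide, round(isoelectric_point(peptide), 2))
```

Acidic sequences come to rest at the low-pH end of the gradient and basic sequences at the high-pH end. The second dimension then spreads proteins of equal isoelectric point according to their mass.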
By using this 2D electrophoresis, it is possible to separate many thousands of
proteins. The quantity and sequential composition of the separated proteins must be
characterized. Many different staining and fluorescence techniques have been
developed for the quantitative analysis. They allow quantitative determinations,
especially in comparison to the proteomes of analogous cells that are in a different
state. This is how the quantitative comparison of the protein composition in
a diseased and healthy state is accomplished. How the proteome changes under
the influence of a drug (Fig. 12.4) can also be determined. But how does one
recognize what is hidden in each individual protein spot on such a 2D gel? For
this the proteins are extracted from the plate and digested with trypsin. This
protease (▶ Sect. 23.3) cleaves the denatured proteins into small peptide fragments,
which are finally analyzed by mass spectrometry. Sophisticated technologies
together with computer analyses of precalculated fragmentation patterns of proteins
allow the proteins to be reconstructed and characterized with regard to their
sequence. Proteins in the proteome that have either been up- or down-regulated
due to a disease process can be detected in this way. Whether, however, the altered
expression pattern causes the pathological state or is a consequence of it is
a question that remains to be answered by independent experiments.
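The principle behind the computer analysis of such digests can be sketched in a few lines. The following Python example is an illustration under simplifying assumptions: it uses the common rule that trypsin cleaves after lysine or arginine unless proline follows, it ignores missed cleavages and post-translational modifications, and the function names and example sequence are hypothetical. It predicts the peptides of a given sequence and their monoisotopic masses, which can then be compared with measured masses.

```python
# Monoisotopic residue masses in Da (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.01056


def tryptic_digest(sequence: str) -> list[str]:
    """Cleave after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if residue in "KR" and not is_last and sequence[i + 1] != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


def peptide_mass(peptide: str) -> float:
    """Monoisotopic mass of the uncharged peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER_MASS


if __name__ == "__main__":
    protein = "AGKRPEKLR"  # toy sequence, illustration only
    for fragment in tryptic_digest(protein):
        print(fragment, round(peptide_mass(fragment), 4))
```

A set of measured peptide masses that agrees with the predicted masses of one database entry, and with no other, points to the identity of the protein in the spot.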
As described, the proteome of a cell can change upon therapy with a drug. What
are the interaction partners for a given drug? Are the induced effects always the
Fig. 12.4 2D-Gel electrophoresis for cellular proteome analysis: (a) proteome of a normal cell.
(b) Proteome for a pathologically altered cell. (c) Proteome of a pathologically altered cell after
treatment with a drug. Changes in the protein concentration are indicated by red circles. Above all,
the proteins at positions 3, 6, and 7 are clearly up-regulated in the diseased state. A few of the
pathological changes are corrected by the drug therapy, but new changes in the proteome (e.g.,
2, 8, and 10) might be induced by side effects (Figure taken from Lottspeich F (1999) Angew
Chem Intl Ed Engl 38:2476–2492).
same if drugs from the same compound class are used? The properties of three
kinase inhibitors that were developed for the treatment of chronic myeloid leukemia
(▶ Sect. 26.5) were investigated in detail in the research group of Giulio Superti-
Furga in Vienna, Austria. For this, the drug first had to be equipped with
a chemically inert anchor group. It is certainly quite a sophisticated challenge to
find the correct position to place an anchor on such an active substance so that the
mode of action is not significantly perturbed. As a rule, multiple positions along the
molecular scaffold must be tried for this purpose. Finally the drug is irreversibly
and covalently coupled via the attached anchor group to a chromatography column.
Once the column has been equipped with these “baits,” the proteome from the lysate of a cell is added to
it. Proteins that have affinity to the immobilized drug stick to the column.
Finally the binding partners that were detected in this pull-down experiment must be
released from the column, separated, and characterized analogously to the above-
described technique. The composition of all proteins that have affinity to the active
substance is obtained. It is difficult to initially extract quantitative conclusions about
the affinity of the binding partners, above all because the protein quantities and their
composition in the lysate are highly variable. It is possible, however, to construct
a profile for each active substance according to its protein interaction partners. This
led to the surprising result that even drugs that belong to the same or similar substance
classes and were developed for the same therapeutic indication can well display
significantly different interaction profiles in the cell. This is an impressive observa-
tion, the evaluation and application of which will require great research effort. We
will see in the next section that the different efficacy, therapeutic deviations, and
variable side-effect profiles in patients can be explained by this.
Proteome analytical techniques (proteomics) can also be used in clinical diag-
nostics. Without exactly resolving the analyte, significant changes in the form of
a mass fingerprint can be recognized. Tumor diseases are revealed by changes in
their protein composition. These can be recognized at a very early point, which
should hopefully still allow a curative treatment for the tumor.
Another technique that is analogous to proteomics is the analysis of metabolites
that are produced in an organism. The term metabolome comprises all metabolites
(e.g., metabolic degradation products) that are present at a specific time point. The
techniques of metabolomics try to quantify the metabolite composition and to
draw conclusions about the condition of the cell based on this information. This
is particularly valid when the cell is exposed to foreign substances. If the metabolite
profile at a particular time point is studied, especially in pathophysiologically
or genetically changed conditions, the term metabonomics is used. The goal of this
technique is to draw conclusions about the molecular composition in cells from
body fluids such as urine, serum, or cerebrospinal fluid. This can lead to an
improved and more sophisticated diagnostic procedure, and therefore an easier
early detection of diseases. These techniques also serve to characterize proteins
for drug therapy or to analyze the greater influence of an event in the cell that is
being treated with a drug. The hope remains that these techniques will allow for
a better understanding of the total effects of the use of pharmaceuticals, and finally
achieve a higher safety standard for therapy.
12.9 Expression Patterns on a Chip: Microarray Technology
Thousands of molecules are found in the analysis of the genome, transcriptome,
proteome, or metabolome, the occurrence of which must be characterized. This
flood of data requires immense measurement capacity. Therefore in the late 1980s
the development of microarray technology was initiated. Thousands of molecules
that are to be analyzed in parallel in an automated fashion are attached to a support
that is only a few centimeters in size and is made of glass, silicon, gold, or nylon
(Fig. 12.5). Only very small quantities of the biomolecules are needed. In the
meantime, this technique has achieved a maturity that allows its use in routine
analytical procedures. In addition to the appropriate preparation of the surface, it is
also the art of reliable and standardized immobilization of the molecules needed
for the precise analysis that guarantees the success of the method. In addition to
proteins and protein domains, antibodies, antigens, and especially DNA, oligonu-
cleotides, and RNA can be immobilized. Proteins are often anchored in that the
protein of interest is co-expressed coupled to an anchoring protein such as
streptavidin as a so-called fusion protein. The streptavidin anchor is attached to
the surface via biotin. Further, chemistry with thiol groups is used. The coupling to
the surface, which was previously equipped with appropriate reactive groups, is
accomplished with disulfide bridges. Other strategies use amino groups, for exam-
ple of lysine, which are then coupled to a reactive aldehyde group on solid support
material. To test the composition of an analyte, a soluble mixture is added to
a premanufactured chip. If binding partners are found in this process, the
components from the analyte solution remain adhered to the surface.
Such binding must be easily detectable on the chip in a spatially resolved
manner. Initially, staining and fluorescence were the methods of choice (Fig. 12.5).
Fluorescent dyes, for example, green and red dyes, are used for this because
they can be excited and detected easily and in a spatially resolved way. If mixed
signals resulting from a simultaneous red and green fluorescence occur, a yellow
signal is obtained. In the meantime surface plasmon resonance has achieved
a greater significance (▶ Sect. 7.7). As an alternative, this technique is used for
the detection of binding. Moreover, techniques that function similarly to ELISA
methods are also used (▶ Sect. 7.3).
Frequently, microarrays are used to analyze the expression pattern of biolog-
ical systems. For this the transcriptome of a cell is investigated under different
conditions, for example in a diseased and healthy state. The first molecules to be
successfully anchored onto chips were single-stranded DNA oligonucleotides. To
study the coding mRNA of a cell in a particular state, these molecules are transcribed
into a complementary DNA segment, the so-called cDNA, by using reverse
transcriptase (Fig. 12.5). These cDNA molecules, or the fragmented sections of cDNA
that are obtained, are immobilized on a chip and separated into single strands. The
cell lysate with the single-stranded mRNA (transcriptome analysis), or the
cDNA that was prepared from it by reverse transcription, is added to such a chip, and the
complementary strand hybridizes with the oligonucleotide fragments that are
anchored there. It is important in this process that the samples to be analyzed are
equipped with different fluorescence dyes according to their origin. For example,
the mRNA from a healthy cell is labeled green and that from a diseased cell is red.
After the hybridization on the chip, there will be areas that fluoresce green, red, or
yellow upon excitation, and others that remain without fluorescence. Areas that
glow yellow under the fluorescent light indicate that mRNA molecules from
healthy as well as diseased tissue have been bound. Evidently the mRNA that
binds there is present to a similar extent in the diseased and healthy states. Areas where no
fluorescence is seen indicate that neither healthy nor diseased cells produced
mRNA that bound there. Areas that fluoresce either green or red are interesting
Fig. 12.5 Manufacture and testing of an expression pattern with microarray technology. Individ-
ual gene segments from an organism are cut out and amplified by using PCR (above left). Next
they are immobilized on a microchip support as single-stranded oligonucleotides (below left). In
addition to the isolated and amplified DNA, synthetically manufactured DNA building blocks or
cDNA molecules, which are obtained from reverse transcription can also be brought onto the
support. One sort of bait molecule is at each point on the support. mRNA molecules are isolated from
the cells of healthy (green) and diseased tissue (red) and reverse-transcribed
into cDNA. The cDNA is provided with a colored fluorescence marker. Then the
test molecule is added in a single-stranded form to the microarray plate, and if it is complementary,
a hybridization (below middle) results. Finally the binding is analyzed under fluorescent light
(below right). Yellow areas indicate that mRNA molecules from the healthy as well as the diseased
cells have bound. The mRNA that binds there is expressed in healthy as well as diseased states.
Areas that remain dark indicate that the mRNA is expressed neither in a healthy nor in
a diseased state. Areas that are either only green or only red fluorescent indicate a difference in
the expression pattern between cells from healthy and diseased tissue.
because they indicate differences in the expression pattern between healthy and
diseased cells. In this way, gene products can be discovered that are involved in
a disease process. If a misregulation is present, an attempt can be made to correct
this state with a pharmaceutical therapy.
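The color-based readout described above can be expressed as a simple decision rule on the two fluorescence intensities of each spot. The following Python sketch is an illustration only: the background level, the twofold threshold, the function name, and the example intensities are hypothetical choices and not values from a particular instrument or protocol.

```python
import math


def classify_spot(green: float, red: float,
                  background: float = 100.0, fold: float = 2.0) -> str:
    """Classify one microarray spot from its two fluorescence intensities.

    green: signal of the sample from healthy cells
    red:   signal of the sample from diseased cells
    """
    if green <= background and red <= background:
        return "dark"    # bound by neither sample
    log_ratio = math.log2(max(red, 1.0) / max(green, 1.0))
    if log_ratio >= math.log2(fold):
        return "red"     # more abundant in diseased cells
    if log_ratio <= -math.log2(fold):
        return "green"   # more abundant in healthy cells
    return "yellow"      # similar amounts in both states


if __name__ == "__main__":
    # Invented intensities (green, red) for four spots.
    spots = {"gene A": (2000.0, 2100.0), "gene B": (200.0, 3000.0),
             "gene C": (3000.0, 200.0), "gene D": (50.0, 60.0)}
    for name, (green, red) in spots.items():
        print(name, classify_spot(green, red))
```

Working with the logarithm of the intensity ratio treats an increase and a decrease by the same factor symmetrically, which is why a twofold change in either direction uses the same threshold here.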
12.10 SNPs and Polymorphism: What Makes us Different
What makes a single organism of a particular species different and leads to the
enriching diversity of a population? We speak of the human genome, but many
interesting deviations must be present so that we all look different and have
different features. Polymorphisms, that is, variations in the composition of the
genome, cause the observed diversity in or form the different phenotypes of
a species. The most obvious phenotypic difference is the division into male and
female individuals. Of course this is not the only difference that we recognize for
the human species. Many sequence variations occur within a population at the
genome level. If they occur in more than 1% of the population, they are referred to
as different alleles; otherwise they are attributed to mutations that have not yet
become established through evolution. Genetic polymorphisms are, for instance, observed as
insertions or deletions in which one or more nucleotides have been
incorporated or lost. However, single nucleotide exchanges are
the most common sequence variation. Here the term SNPs (pronounced “snips”) is
used, which is an abbreviation of single nucleotide polymorphisms. Compared to the
entire genome, polymorphisms encompass only a very small portion. They are
estimated to make up about 0.1% of the entire genome of roughly three billion bases,
that is, about three million bases. Of these,
SNPs are the overwhelming portion with a share of about 90%. Therefore the largest
part of our genome is identical across the entire human species, even though enormous
diversity in the phenotype is observed between us.
Within the SNPs, coding and non-coding changes are differentiated according to
whether these observed exchanges are translated into proteins or not. In the coding
regions of the genome the single exchange of a nucleotide can lead to an altered
protein sequence. In ▶ Sect. 32.7 the translation procedure of a base triplet into
a protein sequence is introduced. If a base in a coding triplet is changed, the triplet can either
still be translated into the same amino acid, or the change leads to the incorporation of a different
amino acid. This is related to the fact that multiple triplets often code for the same
amino acid. The incorporation of a different amino acid into a protein can change its
properties. For example, the amino acid composition of a glycosyltransferase is
decisive for the blood group that we have. An example is introduced in ▶ Sect. 29.7
of how the exchange of a few amino acids in a G protein-coupled
receptor can exert an influence on our sense of smell. Humans carry
different alleles that determine how intensely and with what quality they perceive particular smells.
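Whether a single base exchange in a coding triplet changes the protein can be checked directly against the standard genetic code. The following Python sketch is an illustration: the function name is hypothetical, codons are written as DNA triplets of the coding strand, and only the standard code is considered.

```python
from itertools import product

# Standard genetic code, codons enumerated in the order T, C, A, G
# at each position; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): amino_acid
               for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_snp(codon: str, position: int, new_base: str) -> str:
    """Classify the exchange of one base (position 0, 1, or 2) in a codon."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    if before == after:
        return "synonymous"   # same amino acid, protein unchanged
    if after == "*":
        return "nonsense"     # premature stop signal
    return "missense"         # a different amino acid is incorporated


if __name__ == "__main__":
    print(classify_snp("CTT", 2, "C"))  # leucine stays leucine
    print(classify_snp("GAG", 1, "T"))  # glutamate becomes valine
    print(classify_snp("TGG", 2, "A"))  # tryptophan becomes a stop signal
```

Exchanges in the third position of a triplet are often synonymous, which reflects the degeneracy of the code mentioned above.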
However, not only SNPs in coding regions lead to differences in our species.
SNPs in noncoding segments of the genome can lead to changes in gene regulation.
In the context of drug research and therapy, SNPs can also be relevant where they
have no immediate effect on the phenotype. It is assumed that some SNPs confer
susceptibility to diseases or influence the cellular response to a drug. It must be
considered at this point that SNPs can also occur in the region of the binding site of
a drug molecule, which may not necessarily be identical with that of the natural
substrate. Then they exert a direct influence on the affinity and the binding profile of
the active substance. As a result, an active substance can exert a stronger or weaker
inhibition of protein function in patients with an observed SNP than it would in
patients in which this SNP is not present.
12.11 The Personal Genome: Access to an Individual Therapy?
Genome sequencing and the analysis of SNPs and other polymorphisms have impressively
uncovered the source of disease predisposition and shown why drugs can have reduced
tolerability and different side-effect profiles. They have also offered an explanation for
why undesirably high variations in the efficacy of drugs can occur in different
patients. All the more reason to ask whether sequencing the individual
genome of each person would provide options for a tailored, individual, and
personalized therapy. It is in no way a utopian idea that in a few years the full
sequencing of each individual person will be possible at manageable prices and
within an acceptable time frame.
It has long been known in medicine that the blood groups of donor and recipient must
match for blood transfusions. A genome analysis would make the search for a
matching donor organ easier for transplantations. A particularly high density of
SNPs has been discovered in the genome, especially in regions coding for proteins
that present antigens in the immune system on their surface to stimulate an immune
response (▶ Sects. 31.7 and ▶ 32.2). An SNP analysis of each individual could
indicate the probability of developing a particular disease. Here, early detection of
this risk and possible lifestyle modification could be better than any therapy.
Even today, high-resolution DNA chips (Sect. 12.9) allow the simultaneous
determination of more than 500,000 genetic SNP markers. Discovered SNPs can
indicate, for instance, an elevated predisposition to developing Alzheimer’s
disease in old age. A simple screening of the individual DNA sequence would allow
a predisposition for a particular disease pattern to be recognized.
Craig Venter, whose company sequenced the human genome by the shotgun
method, had his own genome analyzed and published. From the analysis of
these data, a tendency toward obesity and cardiovascular disease was
identified. His own father died of a heart attack at the age of 59. Based on this
analysis, Venter decided to take a lipid-lowering agent from the statin class
preventatively. A doctor could simply read from a personal genome whether the
patient displays an SNP pattern that would lead one to expect an intolerance for
a particular drug therapy. Moreover the doctor could see what type of metabolizer
category (▶ Sect. 27.7) the patient belongs to. This could reduce intolerance upon
the simultaneous treatment with multiple drugs, and would allow a safe adjustment
of individual dosing. It can also help to choose the right drug for a therapy,
particularly if multiple drugs are available for one indication.
The dream of a development of “personalized medicines” for individual ther-
apy will be difficult to realize for cost reasons. Just the addition of one more methyl
group in a drug requires a full toxicological and pharmacological testing program to
achieve approval. It would devour millions in development costs. As always, the
determination of the individual genome and the elucidation of all imaginable pre-
dispositions for possible diseases has, however, its downside. In the hands of the
treating physician, this information is a blessing. But what would a future employer
read from these data about the prospect of hiring an employee? Insurance compa-
nies could accept only risk-free clients based on their genomic data—a chilling idea
that the individual genomic composition would decide an insurance premium!
For all the attention given to our genetic differences and their conceivable consequences
for drug therapy, it must not be forgotten that our gastrointestinal tract is home to
trillions of microorganisms. This flora exerts a decisive influence on our wellbeing,
the stability of our health, our metabolism, and also our response to drug therapy. The
individual gastrointestinal flora begins to build up at birth and is influenced to
a critical degree by the mother. It varies considerably with lifestyle, food culture,
and exposure to the regional microorganism landscape. In India, China, or Europe
a different microbial population is found than in, for instance, America. Interestingly, it
changes if a person moves from one continent to another. Other microorganisms
produce a different pattern of secondary metabolites and contribute to a shifted
health equilibrium. Presumably these differences between individuals are just as
important as the genetic diversity that makes us different.
12.12 When Genetic Difference Becomes Disease
Genetic diseases have a molecular origin. A gene is altered (an allele), sometimes both
copies inherited from the two parents. Each of us carries a large number of such
altered genes, which are the result of arbitrary base exchanges: the SNPs. The
principle of evolution is based on these random mutations. If a mutation makes
an individual better adapted to its environment, the chances of survival and
reproduction increase. Those genes are then passed on with increased probability.
So-called horizontal gene transfer has an accelerating effect on evolution in
asexually reproducing species. There, entire DNA fragments are exchanged between
individuals or even species. Crossover plays a comparably important role in
sexual reproduction. In this case, neighboring gene sequences of both parents
cross over arbitrarily and form new combinations. Without mutations and crossover,
all species would remain absolutely constant. In individual cases, this mechanism
of evolution produces many errors. Some of these errors are the cause of genetic
disease. In sickle cell anemia a single amino acid in hemoglobin, which gives
blood its red color, is exchanged: a glutamic acid in position 6 of the β chain of
hemoglobin A (HbA) is replaced by a valine. The altered hemoglobin aggregates: it
“sticks” together in the red blood cells. The cells collapse and take on
a characteristic sickle form. Homozygous carriers, that is, individuals in whom
the “sick” gene is inherited from the father and the mother, are not able to survive.
Heterozygous carriers who carry one “sick” and one “healthy” gene produce
normal and altered hemoglobin alongside one another. These people indeed have
a shorter life expectancy, but usually achieve reproductive maturity. In areas in
which malaria is endemic, there is a selection pressure for the genetic disease.
Heterozygous carriers of sickle cell anemia are more resistant to malaria than
healthy people (▶ Sect. 3.2). Here we are witnesses to Nature’s great experiment.
How will it end? Humans intervene as well. If malaria is successfully treated, wild-type
HbA carriers are no longer disadvantaged, and the evolutionary advantage of sickle cell
anemia and the consequent selection pressure in the direction of this disease
disappear. This genetic disease could become “extinct” after a few generations.
On the other hand, if sickle cell anemia is treated either conventionally or by gene
therapy, then these people would have entirely normal “healthy” red cells.
The malaria pathogen could reproduce well in them again. The protection from this
disease would disappear, and the susceptibility of these people to malaria would
rise to a normal risk level.
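The selection pressure described above can be sketched with the textbook one-locus selection model. This is a simplified illustration, not a quantitative description: it assumes random mating and a very large population, and the relative fitness values are hypothetical numbers chosen only to show the principle (wild-type homozygotes disadvantaged by malaria, heterozygotes set to 1, sickle cell homozygotes not surviving).

```python
def next_frequency(q: float, w_aa: float, w_as: float, w_ss: float) -> float:
    """Sickle allele frequency after one generation of selection (random mating)."""
    p = 1.0 - q
    mean_fitness = p * p * w_aa + 2 * p * q * w_as + q * q * w_ss
    return (p * q * w_as + q * q * w_ss) / mean_fitness


def simulate(q0: float, w_aa: float, w_as: float, w_ss: float, generations: int) -> float:
    """Iterate the selection step for a number of generations."""
    q = q0
    for _ in range(generations):
        q = next_frequency(q, w_aa, w_as, w_ss)
    return q


# HYPOTHETICAL relative fitness values, for illustration only.
MALARIA = dict(w_aa=0.85, w_as=1.0, w_ss=0.0)    # wild-type carriers suffer from malaria
NO_MALARIA = dict(w_aa=1.0, w_as=1.0, w_ss=0.0)  # malaria treated: no disadvantage left

q_balanced = simulate(0.01, generations=500, **MALARIA)
q_after = simulate(q_balanced, generations=10, **NO_MALARIA)

print(f"sickle allele frequency with malaria:     {q_balanced:.3f}")
print(f"ten generations after malaria is removed: {q_after:.3f}")
```

In this model the altered allele is maintained at a stable frequency as long as heterozygous carriers have an advantage, and its frequency falls once the disadvantage of wild-type carriers is removed. How quickly it falls depends entirely on the assumed fitness values.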
In addition to sickle cell anemia, around four thousand other diseases and their
molecular causes are known. Some, for example cystic fibrosis, phenylketonuria,
and inherited coagulopathies occur relatively frequently. Many others are rare and
some have been described only once. In recent years a multifactorial genetic cause
has been established for an increasing number of diseases, for example diabetes,
rheumatoid arthritis, some cancers, asthma, and Alzheimer’s disease. The occurrence
of these diseases is brought about by the coincidence of
multiple genetic alterations, or is at least fostered by them.
The mechanisms of evolution are also responsible for the development of
resistance (▶ Sect. 4.8). Here, the selection pressure is exerted by a drug or an
insecticide (e.g., to exterminate malaria-carrying mosquitoes). Because of their
rapid reproduction, bacteria and viruses adapt quickly to a “hostile” environment.
The true masters are the retroviruses, which can develop resistance particularly
quickly because of their high mutation rates, and can therefore annihilate the
success of a drug with one stroke (▶ Sect. 24.5).
12.13 Epigenetics: Lifestyle and Environment Influence Gene
Activity as a Pen Would Make a Mark in the Book of Life
For the development of an organism, it is not only the kind of hereditary information
stored in the DNA that can be translated into gene products that is critical;
it is just as important that particular genes are only read in particular cells at
particular times. Even social factors and the environment influence the genes and
change their behavior. Scientists observed the following example with zebra
finches. If a male zebra finch hears the song of another male, the gene EGR-1 is
more strongly read. The unknown song of a potential rival leads to a much
stronger activity in EGR-1 than background bird song that the finch has already
heard. EGR-1 is itself a key gene in gene regulation so that a change in the social
surroundings of the finch leads to many shifts in the protein expression pattern of
the bird. This response helps the bird to adapt to the new changes because the
intrusion of a potential competitor into his own territory can be of essential
importance to him.
Pluripotent embryonic stem cells can differentiate into very different cell types.
For example, liver, brain, and muscle cells have the same chromosome set, yet they
are fundamentally different in their function. Many different phenotypes arise
from the identical genotype. This is true for the different cell types of an organism
at the same time as well as for the successive developmental stages of an
organism. Research on twins has produced remarkable results in this regard.
Comparative studies on identical twins, who are genetically identical, show that
with increasing age, and above all with different lifestyles, progressively larger
differences in the phenotype occur. There must therefore be mechanisms that lead
to changes in the phenotype that are passed along without changes in the genotype.
They regulate the transcription process and pass along this property to daughter
cells. This process is summarized under the term epigenetics. It leads to the
situation where an additional level of information is formed that regulates the
reading of the genes from the DNA.
The surroundings exert their effect on the genes through the epigenome.
Upbringing, childhood experiences, the effects of chemicals or intoxicants, and
stress are all epigenetic regulatory influences through which gene activity is
temporarily or even permanently changed. As the following example of the Agouti
mice shows, such information can even be passed along to subsequent generations.
Normally, these rodents are small, brown, thin, and very agile animals. Their genome
contains the so-called Agouti gene, which after activation causes the
animals to become ill: their coat turns yellow, and they become ravenous and fat.
The offspring of these ill mice are colored in exactly the same way and are just as frail as
their parents. The American molecular biologist Randy Jirtle at Duke University in
Durham, NC, fed pregnant Agouti females a special diet that was rich in dietary
supplements such as vitamin B12, folic acid, choline, and betaine. As a result, the
majority of the offspring of these females were brown, thin, and in the best of
health. The Agouti gene was turned off by the enriched diet, without requiring any
changes to the genome sequence of the rodents.
On the molecular level, it is in particular methylation and acetylation that
transmit the additional epigenetic information. In contrast to genetic changes
that cause mutations in the translated gene products, epigenetic changes have
a strong dynamic component and are, above all, reversible. In the stretched-out
state, there are more than two meters of DNA in the cell; this is wound into a highly
compact form onto small basic proteins: the histones. Lined up like pearls along
a string, they collectively make up the chromatin, which forms the chromosomes
in its maximally packed form. Histones are among the most strongly conserved
proteins in existence; for example, the 102-residue histone H4 proteins from the pea
and from the cow differ in only two positions.
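The two numbers in this paragraph can be reproduced with a rough calculation. It is a sketch under stated assumptions: the haploid genome size of about 3.2 billion base pairs is not given in this section, and 0.34 nm is the usual axial rise per base pair in B-form DNA.

```python
# Length of the stretched-out DNA in one diploid human cell.
HAPLOID_GENOME_BP = 3.2e9  # ASSUMPTION: haploid genome size in base pairs
COPIES_PER_CELL = 2        # a diploid cell carries two copies
RISE_PER_BP_NM = 0.34      # axial rise per base pair in B-form DNA (nanometres)

length_m = HAPLOID_GENOME_BP * COPIES_PER_CELL * RISE_PER_BP_NM * 1e-9
print(f"stretched-out DNA per cell: about {length_m:.1f} m")

# Sequence identity of histone H4 from pea and cow (2 differences in 102 residues).
H4_LENGTH = 102
H4_DIFFERENCES = 2
identity_percent = 100.0 * (H4_LENGTH - H4_DIFFERENCES) / H4_LENGTH
print(f"histone H4 identity, pea vs. cow: about {identity_percent:.0f}%")
```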
As one option, epigenetic changes modify the DNA itself: methyl groups are
transferred to cytosine by methyltransferases (see ▶ Sect. 26.9) to give
5-methylcytosine. The base pairing with guanine in the DNA is not affected by
this modification, and the genetic code remains unchanged. If a methylation occurs
in a promoter region of the DNA, this leads to a silencing of the corresponding
gene. The methylation makes the DNA inaccessible to the reading apparatus, which
is somewhat similar to password-protected computer data. If the promoters in these
gene segments are demethylated again by demethylases, the translation into the
corresponding protein is possible once more. As a second epigenetic change,
histone proteins can be modified. Methyl, acetyl, and phosphate groups can be
enzymatically transferred to lysine and arginine residues of these basic proteins
with, for example, histone acetyltransferases (HATs). The added acetyl groups
neutralize the positive charge on the Lys and Arg residues (the so-called “histone
tails”). They can no longer interact as efficiently with the negatively charged
phosphate groups of DNA. Added phosphate groups have an even more repulsive
effect. These changes lead to less densely packed chromatin, which makes the DNA
reading in particular regions easier. Transcription and gene expression are
regulated in this way. Conversely, the cleavage of acetyl groups by histone
deacetylases (HDACs) or the methylation of the Lys and Arg residues of the
histones causes the packing density of the chromatin to increase, and this diminishes
the probability that the DNA is read in the affected areas.
Misregulation of the described enzymes is associated with the development of
diverse cancers. Because epigenetic processes are fundamentally reversible, there is
a chance that a drug therapy could intervene in the misregulated function of these
transferases. For this reason, intensive research efforts are underway to find inhibitors
of different methyltransferases and histone deacetylases, the latter of which are
mechanistically comparable to metalloproteinases (▶ Chap. 25, “Inhibitors of
Hydrolyzing Metalloenzymes”). The hope remains that these inhibitors can sup-
press disease-causing epigenetic changes and become potent drugs for cancer
therapy in humans.
12.14 The Scope and Limitations of Gene Therapy
In September 1990 the 4-year-old Ashanti DeSilva was the first patient to be treated
with a gene therapy. The alleles of both parents for the enzyme adenosine deam-
inase were defective. Because this enzyme is critical for the function of the immune
system, the little girl suffered from severe immune insufficiency that could no
longer be classically treated. As a therapy, the white cells of the patient were
repeatedly infected with a virus that carried the correct information for the missing
enzyme. The patient, who previously was hospitalized and in constant danger of
infection, has developed into a person with entirely normal health.
The term gene therapy refers to any technology with which a gene is introduced
into a cell of a patient to replace a defective or missing gene. In principle it is very
simple. Viruses demonstrate it for us daily: they bring their own genetic informa-
tion into a foreign cell and use it to code for a few key enzymes that are necessary
for their own reproduction. For the rest they use the biosynthesis machinery of the
infected cell. The retroviruses, the genetic information of which is coded in RNA,
translate this information into DNA and integrate it into the host’s DNA. In gene
therapy, a nucleic acid segment is inserted into the genome of a virus that codes for
the protein that is to be substituted in the patient. The construct, which is what
these modified viral genes are called, is surrounded by the virus capsid and is
introduced into the cells of the patient. This can take place either outside of the
body, that is, in bone marrow or white blood cells that have already been
aspirated, or within the body, for example by injection into tumor tissue or into
a particular organ. Adenoviruses, herpes viruses, or retroviruses are all well suited
as carriers of the genes because these viruses incorporate their own genetic information
into mammalian DNA. Although retroviruses only transfer their genes
during cell division, adenoviruses can cause non-dividing cells to incorporate and
use foreign genetic information. Plasmids, complexes of DNA and liposomes, and pure DNA
constructs are also being experimented with. The rate of transfer of the new
information into cellular DNA is, however, significantly lower here than for the viruses.
Meanwhile, over 1,000 gene therapy clinical studies are underway, most in the
USA and overwhelmingly for tumor therapy. Cancer is admittedly not a hereditary
disease, but the genetic information that is inherited from cell to cell creates
a “local” genetic disease. Oncogenes code for a large group of proteins that are responsible
for the occurrence of cancer. Tumor-suppressor genes code for proteins that
interfere in the cell cycle and stop the division of cells. The quickly increasing
knowledge of the molecular structure of these proteins has afforded many
approaches for the gene therapy of tumors.
Other diseases can also be approached with gene therapy. The standard therapy
for cardiovascular diseases that are characterized by an excessive growth of endo-
thelial cells and consequent narrowing of the blood vessels is widening with
a balloon catheter. That helps, but only temporarily. After a few months the cells
proliferate anew and the blood flow in the downstream areas decreases threaten-
ingly. Here a gene therapy could be employed. Adenoviruses can be released
locally during the balloon catheter treatment. These carry the genetic information
for a protein that inhibits cell division, the so-called retinoblastoma protein. The
cells can then no longer proliferate.
AIDS patients die from infections because their immune systems are damaged.
The so-called T cells die. Bone marrow transplantation is a possible therapy. For this
it is decisive that the immunological properties of the donor and patient are as close as
possible. Many people are eliminated as possible donors, not to mention animals. Or
are they suited? A new approach for bone marrow transplantation and perhaps even
organ transplantation is the humanization of animals. For this, immature human
bone marrow cells (stem cells) are transplanted into an animal, for example, a baboon.
The rejection of the foreign cells is prevented by treatment with immunosuppressants.
It is not the human recipient who bears the risk of an immune reaction, but
rather the animal donor. After the proliferation of the human cells in the animal, the
cells can be safely transplanted into the human “pro-donor.”
Will gene therapy replace classical drug therapy? The answer is absolutely
certain: no. The technique is very laborious and each patient needs an individually
adapted therapy. Moreover, the results to date have been a bit disappointing and
sometimes devastating. Fatalities have been observed in the gene therapy of
pediatric leukemias. Gene therapy will conquer a place in the therapy of special
diseases because it is a curative and not a symptomatic therapy. With increasing
experience and better appraisal of the possible risks, interventions into the human
genome will become acceptable for such diseases because it would make it possible
to eliminate the genetic disease for the individual and his or her offspring once and
for all and eradicate it from the world.
Gene technology not only solves problems, it creates new ones too. The techni-
cal barrier to the creation of a Homo perfectus is as low as it has ever been in the
history of humanity. The door to possible misuse has been opened wide. We can
only hope that ethics and common sense prevent this from happening. Draconian
legal regulations damage the beneficial use of gene technology more than they
contribute to the prevention of misuse. Those in positions of responsibility have recognized
this and have established a framework in which gene technology can further
develop for the good of humanity.
12.15 Synopsis
• Gene technology has developed into a key technology in modern drug research
because it allows the production of pure proteins, targeted mutagenesis to
elucidate functional and mechanistic properties of proteins or to confirm or
disprove binding modes, the production of animal models by knocking particular
genes in or out, the activation or silencing of genes, and somatic
individual gene therapy.
• The elucidation of the genetic code, the recombinant production of genes and
gene products, and the polymerase chain reaction were milestones in the estab-
lishment of gene technology.
• Sequencing of the human genome revealed the constitution of our genes, the
number of gene products, and many functional insights. Meanwhile hundreds of
genomes of other species have been sequenced, and the genome analysis of
individuals is on the horizon.
• The human genome contains about 25,000 genes of which about 22,000–23,000
are translated into proteins. Some sequence segments code for non-coding RNAs,
which accomplish important functions in the organism (e.g., in the ribosome or
spliceosome). About 95% of the genome contains numerous sequences and
signals that control the regulation of the genome. A functional classification of
the gene products has been accomplished for a significant portion of the genome.
• To study the relevance of blocking the function of a gene product, that is,
a protein in a disease situation, a particular gene can be knocked out in an
animal model, mostly in mice. Genes can also be knocked in. Such turning on
and off of genes is of utmost importance in drug research because it provides
decisive information about the relevance of a planned therapeutic intervention.
• In vitro models for drug screening could only be developed once proteins could
be produced in pure form and high yield. Various expression systems from
bacterial up to mammalian cells can be used for the production of foreign
proteins, which are brought into cells via the corresponding coding DNA.
• Genes can be silenced by RNA interference. For this purpose, short segments of
double-stranded RNA, usually produced by the enzyme Dicer, are incorporated into the
enzyme complex RISC. RISC uses one strand of the double-stranded RNA segments as
a template to capture mRNA molecules with a complementary sequence and
cleaves them sequentially. By doing this, mRNAs with particular sequences are
eliminated.
• To copy this principle for therapy, one needs about 22-base RNA molecules that
have to be transported across the membrane into cells, a difficult task with fragile
and highly polar species. Furthermore, these molecules can cause an unwanted
immune response. Chemical modifications of the RNA molecules are aimed at
improvements in the transportation, immunogenicity, and stability properties.
• The proteome reflects the totality of all proteins in a cell at a given time under
precisely defined conditions. Its composition changes dynamically and differs
between healthy and diseased states and under the influence of therapeutic
treatments.
• The proteome can be analyzed at any given time by 2D gel electrophoresis; this
combines a separation by isoelectric focusing and SDS-PAGE analysis. Differ-
ences in the expression patterns indicate the involvement of proteins in a disease
situation. Back regulation under drug administration can indicate a possible
therapeutic strategy.
• Pull-down experiments with immobilized drug molecules on a chromatographic
solid support allow trapping of proteins that show interaction with the studied
drug molecules. Interaction profiles for drug molecules in the cell can be
determined.
• Biomolecules can be immobilized on microarray chips. In particular, RNA, DNA,
and oligonucleotides thereof are anchored on these chips to extract the
complementary RNA or DNA sequences from large mixtures. By appropriate
fluorescence labeling of the anchored bait sequences and the target sequences
to be “fished,” binding can be easily detected and recorded in automated fashion.
With this, expression patterns of cells can be studied.
• Polymorphisms, particularly single nucleotide polymorphisms (SNPs), are vari-
ations in the composition of the genome of a species. These changes make
individuals different, and some SNPs confer susceptibility or resistance to
diseases or influence the cellular response to a drug.
• Differences in the individual genomes might be the key to a tailored individual
and personalized drug therapy and can allow a susceptibility to a particular
disease pattern to be recognized. Intolerance to a given drug therapy could
become transparent or classification of an individual into different metabolizer
classes could be achieved.
• Genetic differences can be a reason for the development of diseases. In some
cases, they are caused by single amino acid exchanges in one gene product (e.g.,
sickle cell anemia), in other cases multifactorial genetic causes are responsible
for the disease development.
• Epigenetics regulates the transcription process not by altering the genetic
sequence of DNA but by regulating the reading of the genes from the DNA.
Lifestyle, experience, and environment exert their effect on the genes through
the epigenome.
• Methylations and acetylations transmit additional epigenetic information in
a reversible manner. Either the bases of DNA are directly methylated or the
packing density of stored DNA on the histone proteins is altered, making it more
or less accessible to the reading apparatus. The latter process modifies the
charges of positively charged Lys and Arg residues involved in packing via the
transfer of acetyl groups.
• Gene therapy tries to replace a defective or missing gene in the cells of
a patient. This would make it possible to eliminate the genetic disease for the
individual and his or her offspring. A nucleic acid segment is inserted into the
genome via viral carriers, and it codes for the protein that is to be substituted in
the patient. Gene therapy opens opportunities in special disease situations but it
also has its risks.
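The sequence recognition step of RNA interference summarized in the synopsis can be sketched in a few lines. This is only an illustration of the base-pairing principle: the sequences are invented, the function names are hypothetical, and perfect complementarity over the full guide length is assumed.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the strand that pairs with rna, read in the 5' to 3' direction."""
    return rna.translate(COMPLEMENT)[::-1]


def find_target_sites(guide: str, mrna: str) -> list[int]:
    """Return 0-based start positions in mrna that are fully complementary to guide."""
    site = reverse_complement(guide)
    positions = []
    start = mrna.find(site)
    while start != -1:
        positions.append(start)
        start = mrna.find(site, start + 1)
    return positions


# HYPOTHETICAL 22-base target site embedded in a short, invented mRNA.
target_site = "GCUAGCUUAGGCAUCGAUCCGA"
guide_strand = reverse_complement(target_site)  # the strand retained as the template
mrna = "AUGGC" + target_site + "UUAA"

print(find_target_sites(guide_strand, mrna))
```

Only mRNA molecules that contain a stretch complementary to the guide strand are recognized, which is why mRNAs with particular sequences can be eliminated selectively.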
Bibliography
General Literature
Cooper NG (ed) (1994) The human genome project. Deciphering the blueprint of heredity.
University Science, Mill Valley
Kiely JS (1994) Recent advances in antisense technology. Ann Rep Med Chem 29:297–306
Lander ES et al (2001) Initial sequencing and analysis of the human genome. Nature 409:860–921
Monastersky GM, Robel JM (eds) (1995) Strategies in transgenic animal science. Blackwell
Science, Oxford
Mullis KB, Ferré F, Gibbs RA (eds) (1994) The polymerase chain reaction. Birkhäuser, Boston
Pandit SB, Balaji S, Srinivasan N (2004) Structural and functional characterization of gene
products encoded in the human genome by homology detection. IUBMB Life 56:317–331
Post LE (1995) Gene therapy: progress, new directions, and issues. Ann Rep Med Chem 30:219–226
Slagboom PE, Meulenbelt I (2002) Organisation of the human genome and our tools for identi-
fying disease genes. Biol Psychol 61:11–31
Venter JC et al (2001) The sequence of the human genome. Science 291:1304–1351
Wolff JA (1994) Gene therapeutics. Methods and applications of direct gene transfer.
Birkhäuser, Boston
Special Literature
Adams MD et al (1995) Initial assessment of human gene diversity and expression patterns based
upon 83 million nucleotides of cDNA sequence. Nature 377(Suppl 6547):3–174 (85 authors
including JC Venter)
Carlton JM et al (2007) Draft genome sequence of the sexually transmitted pathogen Trichomonas
vaginalis. Science 315:207–212
Chang MW, Barr E, Seltzer J, Jiang Y-Q, Nabel GJ, Nabel EG, Parmacek MS, Leiden JM
(1995) Cytostatic gene therapy for vascular proliferative disorders with a constitutively active
form of the retinoblastoma gene product. Science 267:518–522
Craig C (1995) Bristol-Myers to Pay $2.7M for transgenic goats that make human antibodies.
BioWorld Today 6:1
Explore the Homo sapiens genome. http://www.ensembl.org/Homo_sapiens/index.html
Fleischmann RD et al (JC Venter et al) (1995) Whole genome random sequencing and assembly of
Haemophilus influenzae Rd. Science 269:496–512
Human genome database with functional predictions
Schneiker S et al (2007) Complete genome sequence of the Myxobacterium Sorangium
cellulosum. Nat Biotech 25:1281–1289
Seide RK, Giaccio A (1995) Patenting animals. Chem Ind 16:656–659
Sippl W, Jung M (2009) Epigenetic targets in drug discovery methods and principles in medicinal
chemistry. In: Mannhold R, Kubinyi H, Folkers G (eds) Methods and principles in medicinal
chemistry, vol 42. Wiley-VCH, Weinheim
13 Experimental Methods of Structure Determination
In this chapter we want to turn to the experimental structure determination methods of
ligands and proteins. There are two techniques in particular that deliver information
about the three-dimensional structure of small organic molecules all the way to
proteins: crystal structure analysis and high-resolution NMR spectroscopy. The
first technique is the older method. It goes back to an experiment of Max von Laue in
1912. Just 17 years earlier Wilhelm Röntgen had discovered a form of electromagnetic
radiation that was later named X-rays, or “Roentgen rays” in German in
his honor. Together with his collaborators Walter Friedrich and Paul Knipping,
Laue was able to demonstrate the wave nature of X-rays with a copper sulfate crystal.
At the same time they proved the lattice structure of crystals. Only one year later
William Lawrence Bragg and his father William Henry Bragg reaped the rewards of
these experiments. They determined the crystal structure of sodium chloride. The
technique has grown over the years. Today the structures of proteins with 4,000
amino acids have been determined. In recent years electron microscopy has proven
to be a very powerful diffraction tool for the structure elucidation of
membrane-bound proteins and viruses. NMR spectroscopy is likewise a relatively
young technique. In 1945 the research groups of Felix Bloch and Edward Purcell in the
USA observed the resonance absorption of hydrogen atom nuclei in a magnetic field
for the first time. From this experiment, the technique has grown, mostly due to
progress with the instrumentation, to the extent that the structure determination of
proteins with more than 800 amino acids has been accomplished. For this purpose,
however, the protein must be extensively labeled with different isotopes.
13.1 Crystals: Aesthetic on the Outside, Periodic on the Inside
The term “crystal” causes one to immediately think of well-formed minerals or
sparkling gemstones with a magnificent cut. The association of crystals with the
structures of the molecules that determine our lives occurs to us only as an
afterthought. The crystal is typically associated with “dead” material. When Jack Dunitz
took over his chair as professor of organic chemistry at the ETH in Zurich at the end
G. Klebe, Drug Design, DOI 10.1007/978-3-642-17907-5_13,
© Springer-Verlag Berlin Heidelberg 2013
of the 1950s, the famous natural product chemist Leopold Ruzicka dismissively
told him that crystals are a “chemical graveyard.” Nonetheless, Dunitz and his
research group showed over many years that a crystal in no way belongs in
a “graveyard,” but rather is the key to understanding the structure, dynamics, and
reactivity of molecules.
If a mineral is considered, the regular construction of the single crystals stands
out. Even organic materials have the ability to form shapely crystals. One must only
think of the fascinating crystals of candied sugar. Is this external regularity
a representation of the inner structure? Before this question is answered, the way
that crystals are obtained should be clarified. A mineralogist has it easy. Nature has
already provided well-formed crystals over thousands or millions of years. Organic
molecules and proteins rarely occur in Nature in a crystalline state. Conditions must
be found under which they crystallize.
In general, crystals are grown from a solution. For simple organic substances this
can also be accomplished from liquid material or by sublimation. Both crystalliza-
tion methods are known from water when a lake freezes to ice, or from beautiful
crystals of frost. For crystallization from solution a solvent is sought in which the
compound is adequately soluble. By changing the conditions, the saturation point
of the solution is exceeded. If this occurs slowly, small crystal nuclei form that can
grow to large crystals. As a rule the solubility of the compound decreases with
falling temperature, so the saturation point of the solution can be exceeded by
cooling. The solution can also be concentrated, that is, some of
the solvent is removed. Another possibility is the addition of a second solvent in
which the compound is less soluble. If the ratio of the two solvents is correctly
chosen, the saturation point can be slowly approached. For compounds with acidic
or basic groups, pH conditions can be found under which the compound exists as
a salt. Because of strong ionic interactions the salts often form better crystals. Compounds
can also be “salted out.” For this, a salt, for example, sodium chloride, is added to an
aqueous solution of the compound. The salt “uses up” the water molecules as it goes
into solution because its ions become surrounded by a solvation sphere of water molecules. In
doing so, water is withdrawn from the organic compound, which is itself surrounded by a sphere
of solvent water. The saturation point of the compound is
exceeded, and crystallization begins.
Proteins are complex entities that, as a general rule, are only soluble in water.
Because of their amino acid composition, they carry charged ionic groups on their
surfaces. Even with proteins it holds true that conditions must be found under which
they associate in a periodic array. This is accomplished by slowly changing the
amount of water in which the protein is dissolved. This can work in both directions.
Hydrophobic proteins begin to aggregate when the amount of water increases.
Proteins that have stronger polar groups on their surfaces aggregate when the
water molecules are removed from their surfaces. The pH value, the choice of
a suitable salt for salting out, and the temperature are the
conditions that must be optimized. In addition to salts, surface-active substances
(detergents) can also influence the solvent shell and support the crystallization.
Despite this, crystallization is a kind of fine art. The search for suitable conditions
requires creativity and diligence. Today, however, the crystallization methods are
so elaborate that the tedious work of setting up thousands of different test condi-
tions is carried out by robots.
Sometimes considerable effort is invested into structure determination. In 1995,
the crystallization and structure determination of HIV integrase, one of the key
enzymes in the replication cycle of the virus, was accomplished only after the
40th point mutation of the original protein. This point mutation was made with the
goal of changing the surface properties of the protein so that an orderly aggregation
to a crystal could occur.
Let us return to the original question of whether the orderly outward appearance of
a crystal is a reflection of the internal construction. Chemically, a crystal is
homogeneously composed. The organic molecule or the protein represents the basic
building block. It is only when these building blocks are spatially neatly organized
that a periodic array occurs that optimally fills the space. In daily life, many solutions
to these packing problems are easily seen, for example, sugar cubes that only fit into
the box if they are layered in the right direction, or paving stones that must be neatly
laid in a periodic fashion to completely cover the path without gaps (Fig. 13.1).
A single paving stone, when correctly fitted to the next, represents a repeating unit
in the lattice. A crystallographer refers to this unit as an elementary cell or unit cell, and to the
orderly setting of one unit upon another as periodic translation. In the
simplest organic crystal structure, the elementary cell contains one molecule (Fig. 13.2).
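The periodic translation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a crystallographic program: the function names and the cell dimensions are invented for this example, and placing the a edge along the x axis with the b edge in the xy plane is only one of several common conventions.

```python
import math

def cell_vectors(a, b, c, alpha, beta, gamma):
    """Cartesian edge vectors of a unit cell from edge lengths and angles in degrees.

    Assumed convention: a lies along x, b lies in the xy plane.
    """
    al, be, ga = (math.radians(v) for v in (alpha, beta, gamma))
    va = (a, 0.0, 0.0)
    vb = (b * math.cos(ga), b * math.sin(ga), 0.0)
    cx = c * math.cos(be)
    cy = c * (math.cos(al) - math.cos(be) * math.cos(ga)) / math.sin(ga)
    cz = math.sqrt(c * c - cx * cx - cy * cy)
    return va, vb, (cx, cy, cz)

def translate(point, vectors, steps):
    """Shift a Cartesian point by steps[0]*a + steps[1]*b + steps[2]*c."""
    return tuple(
        point[k] + sum(steps[i] * vectors[i][k] for i in range(3))
        for k in range(3)
    )

def cell_volume(vectors):
    """Volume of the parallelepiped spanned by the three edge vectors."""
    va, vb, vc = vectors
    cross = (
        vb[1] * vc[2] - vb[2] * vc[1],
        vb[2] * vc[0] - vb[0] * vc[2],
        vb[0] * vc[1] - vb[1] * vc[0],
    )
    return abs(sum(va[k] * cross[k] for k in range(3)))

# Hypothetical irregularly angled cell (lengths in Å, angles in degrees).
cell = cell_vectors(8.0, 9.5, 12.0, 85.0, 100.0, 95.0)
atom = (1.0, 2.0, 3.0)
# The same atom in the neighboring cells along a, b, and c.
copies = [translate(atom, cell, n) for n in ((1, 0, 0), (0, 1, 0), (0, 0, 1))]
```

Every atom of the crystal is reached in this way by integer multiples of the three edge vectors, which is all that is meant by "purely shifting" the molecule.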
13.2 Just Like Wallpaper: Symmetries Govern Crystal Packings
The contents of an elementary cell can also be more complexly composed, for
example, like a wallpaper pattern. A basic motif is repeated so that it fills the surface
area. Crystallographers call the basic motif the asymmetric unit. In Fig. 13.3 this
motif is a flower branch. Not all of the motifs can be generated simply by shifting
Fig. 13.1 Paving stones cover a surface without leaving holes (a). This is only possible if they are
derived from a particular basic geometric pattern, for instance a parallelogram, rectangle, square,
triangle, or hexagon. This basic pattern can be modulated by complementary bulges and recesses.
A path cannot be covered without holes if regular pentagons or octagons are used. If an
octagonal stone is combined with a square stone, however, the surface is completely covered. It
is immediately clear that if a square stone is cut along its two diagonals, four triangles result.
Adding four such pieces to an octagon completes it to a square (b).
the branch, some must be additionally reflected. A pair of image and mirror-image
branches represent the elementary cell. The surface can now be filled with this
building block by simply shifting it. In addition to reflecting, basic motifs can also
be rotated. By using reflections and rotations, both so-called symmetry operations,
the contents of the elementary cell are generated from the asymmetric unit. This cell
is stacked on itself in all three spatial directions to form an ordered crystal lattice.
Even as a three-dimensional entity, the elementary cell must take on a particular
Fig. 13.2 In the simplest case, the molecular packing is generated purely by
shifting the molecule in all three spatial directions. The resulting unit, the elementary cell, is
derived from an irregularly angled body, a parallelepiped (above right, violet). If a point near the
molecule is picked out and the equivalent points of all of the molecules in the crystal packing are
connected, a three-dimensional lattice results.
Fig. 13.3 An area can be covered not only by purely shifting an object, the asymmetric unit. Additional
symmetry operations such as reflection and rotation can also be used. This way multiple copies of the
object are generated. In the presented case, the flower branch along with its mirror image makes up the
unit (the elementary cell is outlined in red) that can be used to cover the surface simply by shifting
it regularly.
form to completely fill all of the space. If the basic types of elementary cells are
combined with all of the possible symmetry operations, 230 possibilities result for
the basic motif to fill the space. The crystallographer calls them the 230 space
groups. For chiral molecules, and proteins belong to this group, mirror reflection
does not occur. Therefore proteins only crystallize in 65 space groups.
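How the contents of an elementary cell follow from the asymmetric unit can be sketched as follows. The example assumes the simplest case of a single two-fold rotation axis along the b edge (space group P2), written as a rotation matrix plus a translation acting on fractional coordinates. The atom positions and all function names are invented for illustration. The small determinant test indicates why mirror operations are not available to proteins: they would convert the molecule into its mirror image.

```python
def apply_operation(operation, xyz):
    """Apply a symmetry operation (rotation matrix, translation) to fractional coordinates.

    The result is wrapped back into the unit cell, i.e., into the range 0 <= value < 1.
    """
    rotation, translation = operation
    return tuple(
        (sum(rotation[i][j] * xyz[j] for j in range(3)) + translation[i]) % 1.0
        for i in range(3)
    )

def changes_handedness(operation):
    """True if the rotation part has determinant -1 (mirror or inversion)."""
    m = operation[0]
    det = (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )
    return det < 0

IDENTITY = (((1, 0, 0), (0, 1, 0), (0, 0, 1)), (0.0, 0.0, 0.0))
# Two-fold rotation about b: (x, y, z) -> (-x, y, -z)
TWOFOLD_AXIS_B = (((-1, 0, 0), (0, 1, 0), (0, 0, -1)), (0.0, 0.0, 0.0))
# Mirror plane perpendicular to b: (x, y, z) -> (x, -y, z)
MIRROR_B = (((1, 0, 0), (0, -1, 0), (0, 0, 1)), (0.0, 0.0, 0.0))

P2_OPERATIONS = (IDENTITY, TWOFOLD_AXIS_B)

def fill_cell(asymmetric_unit, operations):
    """Generate all atoms in the unit cell from the atoms of the asymmetric unit."""
    return [apply_operation(op, atom) for op in operations for atom in asymmetric_unit]

# Hypothetical asymmetric unit with two atoms (fractional coordinates).
asymmetric_unit = [(0.10, 0.20, 0.30), (0.25, 0.40, 0.05)]
cell_contents = fill_cell(asymmetric_unit, P2_OPERATIONS)
```

With two operations and two atoms in the asymmetric unit, the cell contains four atoms. The lattice translations then repeat this cell in all three directions.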
13.3 Crystal Lattices Diffract X-Rays
Max von Laue used crystals to prove the wave nature of X-rays (Roentgen rays) by
diffracting them. For illustration, we shall consider a water wave. When a drop of
rain strikes a puddle, circular waves form that propagate from the center outward.
The drop generates a so-called elementary wave upon submersion. If two drops that
are separated by a particular distance simultaneously strike the water’s surface,
circular waves propagate outwardly from both submersion points. This experiment is
easier to observe if the water’s surface is constantly being “excited,” for
instance, with a constantly dripping tap. The circular outwardly spreading wave
fronts meet each other at some point. What happens? A lamellar pattern forms, parts
of the water’s surface remain at rest and other parts seem to move vigorously
(Fig. 13.4). In the cross section the water surface moves sinusoidally (Fig. 13.5).
How do two waves behave that collide and superimpose with one another? If the
wave peak and another wave peak or the wave trough and another wave trough
meet, the wave is amplified. If, on the other hand, a wave peak meets a trough, they
cancel one another out. The water surface remains calm. The lamellar pattern of
moving and still water surface between waves that are moving outwardly and
inwardly is caused by this superimposition. It is called interference. The band
density depends on the distance between the submersion points of the drops. The
ensuing interference pattern therefore contains information about the relative posi-
tion of the points from which the elementary waves were generated.
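The amplification and cancellation of two superimposed waves can be put into numbers. The following sketch assumes two sine waves of equal wavelength and uses the standard expression for the amplitude of their sum. The function names are chosen here for illustration only.

```python
import math

def phase_from_shift(shift, wavelength):
    """Phase difference in radians for a shift given in the same unit as the wavelength."""
    return 2.0 * math.pi * shift / wavelength

def superposed_amplitude(a1, a2, phase_difference):
    """Amplitude of the sum of two sine waves with equal wavelength."""
    squared = a1 * a1 + a2 * a2 + 2.0 * a1 * a2 * math.cos(phase_difference)
    return math.sqrt(max(squared, 0.0))  # guard against tiny negative rounding errors

# Two waves of amplitude 1 that are shifted by 0, 1/2, and 1/4 of a wavelength.
in_phase = superposed_amplitude(1.0, 1.0, phase_from_shift(0.0, 1.0))
half_wave = superposed_amplitude(1.0, 1.0, phase_from_shift(0.5, 1.0))
quarter_wave = superposed_amplitude(1.0, 1.0, phase_from_shift(0.25, 1.0))
```

Peak on peak doubles the amplitude, peak on trough extinguishes it, and every other shift gives a value in between.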
Fig. 13.4 Two raindrops strike the surface of the water and form circular, outwardly moving water waves.
These superimpose on one another to give a band-formed interference pattern. There are areas along these
bands where the water surface is quiet. In other areas it moves that much more strongly.
If parallel water waves (e.g., a wave front at the coast) collide with a barrier that
has a small opening (e.g., a harbor entrance) semicircular waves spread outward
from the backside. If this barrier has two neighboring openings (double slit),
a semicircular wave develops behind each opening. The same picture as with the
two raindrops is achieved (Fig. 13.4). The waves interfere with one another behind
the double-slit barrier, and a diffraction pattern forms. The density of this pattern,
that is, the progression of the bands, depends on the geometry of the double slit.
Formally, the diffraction process at the crystal lattice is analogous. The same
principles are valid, but the superimposition is more complex. A very simple lattice
shall be considered that only has one type of atom. An X-ray runs as a parallel wave
toward this crystal. It collides with an array of atoms and initiates an interaction that
is comparable to that between the raindrop and the puddle. Each atom generates
a spherical wave because of the interaction between the atom’s electrons and the
X-ray. The circular wave on the water’s surface represents therefore the spherical
wave in space. The spreading spherical waves superimpose on one another and
form a wave that leaves the crystal in a changed direction (Fig. 13.6). Formally
seen, the incoming and outgoing waves have an angular relationship to one another
that is equivalent to the reflection of the wave in a plane perpendicular to the
Fig. 13.5 The waves run in a sinusoidal manner in cross section. The distance between two wave
peaks is called the wavelength. The height of the water wave at the summit is called the amplitude.
The position at which the wave crosses the resting position determines the phase. (a) If two wave
trains with the same phase meet, they add to one another and the amplitude doubles. This situation
is found at the places in Fig. 13.4 where the water’s surface moves more strongly. (b) If there is a phase
difference of exactly one half of a wavelength, the wave peaks meet with the troughs. Both waves
cancel one another out. This represents the parts of Fig. 13.4 where the water surface is very still.
(c) Any other superimposed phase shift causes a wave, the amplitude of which is somewhere
between the extremes in (a) and (b).
considered atom row. Therefore, the diffraction of the three-dimensional crystal
lattice can be treated formally as a reflection at a plane in the lattice.
Many parallel sets of such lattice planes can be inscribed on a crystal with
differing relative separation from one another and relative occupation density with
atoms (Fig. 13.7). The reflected waves contain the information about the geometry
(distance) and the relative occupancy (scattering power) in this plane. To record the
diffraction properties of a crystal, each set of parallel planes of the crystal must be
oriented in the X-ray beam so that a reflection is possible. This laborious work is
taken over by a computer-controlled diffractometer.
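The picture of a reflection at lattice planes corresponds to the relation usually known as Bragg's law, n·λ = 2·d·sin θ, which is not written out in the text above. The sketch below assumes this relation. The function names are chosen for illustration, and the wavelength of 1.5418 Å is the value commonly quoted for copper Kα radiation from laboratory X-ray sources.

```python
import math

def bragg_angle(d_spacing, wavelength, order=1):
    """Glancing angle theta in degrees that satisfies n*lambda = 2*d*sin(theta)."""
    sine = order * wavelength / (2.0 * d_spacing)
    if sine > 1.0:
        raise ValueError("no reflection possible: n*lambda exceeds 2*d")
    return math.degrees(math.asin(sine))

def d_spacing_from_angle(wavelength, theta_degrees, order=1):
    """Distance between lattice planes that reflect at the given glancing angle."""
    return order * wavelength / (2.0 * math.sin(math.radians(theta_degrees)))

# Planes that are 3.0 Å apart, measured with copper K-alpha radiation (assumed example).
theta = bragg_angle(3.0, 1.5418)
```

Closely spaced planes reflect at large angles. This is why fine structural detail is contained in the outer regions of a diffraction pattern, and why planes that are closer together than half the wavelength cannot reflect at all.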
13.4 Crystal Structure Analysis: Evaluating the Spatial
Arrangement and Intensity of Diffraction Patterns
To demonstrate that different lattices indeed generate different diffraction patterns,
a simple experiment should be considered. For this purpose a laser pointer and
different pinhole filters are needed. The pinhole filters can easily be made. A black
and white printout of the periodic arrangement shown in Fig. 13.8 can be reduced
and transferred to high-resolution photographic film. This homemade aperture represents
a two-dimensional periodic lattice. The laser beam is diffracted by the pinhole
mask and generates the diffraction pattern on a screen that is shown in Fig. 13.8.
Fig. 13.6 If a wave front (blue) in one plane meets with a row of atoms (black points on the dotted
lines), each atom in this row becomes the starting point for a circular wave. This is analogous to
those created when the raindrop hits the surface of a puddle. The circular waves that formed from
the back row of atoms superimpose upon one another just as in the case with the water waves
(Fig. 13.4). All circular waves are generated with the same phase in the indicated direction of the
incoming wave (a). As a result of this superimposition, a new wave front forms (red) that leaves
the crystal in an altered direction. Relative to the direction of the incoming wave, it makes an
angle that formally corresponds to a reflection of the incoming wave front at the atom row that is marked with
the green line. If a different incoming direction is taken the circular waves are not generated from
the same place (b), that is, there is a phase difference between them. Their superimposition does
not lead to a new wave front.
In the first two masks the spacing and symmetry of the holes are changed.
In the third and fourth mask the repeating motif of the three or five differently
sized holes represent a molecule that has two types of atoms. These motifs
produce a periodic lattice when lined up next to each other. They have the same
dimension as is found in the first image on the left. If the diffraction pictures are
compared, the distribution of the intensity of the light points is different. That is
Fig. 13.7 A cluster of parallel planes can be laid through the atoms of a crystal lattice (a, b, c).
Their relative distance from one another and their atomic occupation density varies. Each one can
give rise to “reflections” in an X-ray diffraction experiment. For this the crystal must be brought
into the correct orientation for the incoming beam each time. The X-ray counter is positioned so
that it captures the out-going X-ray beam. It is from this geometry that the spatial orientation of the
cluster of planes in the crystal is determined. The occupation density of the atoms decides how
“well” a particular plane cluster reflects. This information is contained in the intensity (amplitude)
of the outgoing wave. (d) Different types of atoms in a molecular crystal have different spatial
relationships to one another. A parallel cluster of planes can be placed through each atom in the
molecule (here a three-atom molecule). The amplitude of the outgoing beam results from the
superimposition of wave trains that are reflected at these planes.
Fig. 13.8 A perforated mask can be used for a diffraction experiment with a laser pointer. For this the
displayed hole patterns (above) must be brought to the size of the wavelength of laser light. The
diffraction patterns below were generated from the masks. The holes in the two left masks are all the
same size, which is comparable to having only one type of atom. The hole pattern changes from
wide-meshed squares to an angular orientation. The diffraction patterns reflect the symmetry and
distance of the holes to one another. In the two masks on the right, the distance between the repeating
units is identical to the first masks. The composition of the motif in the repeating unit, however,
varies. It is made up of multiple holes and can be compared to the different atoms in a molecule. The
distance between the diffracted light reflections (lower row) is identical for the first, third, and
fourth masks. The intensity of the diffracted radiation, however, varies from reflection to reflection.
It contains the information about the composition and the geometry of the original motif.
what contains the information about the construction of the motif that generated the
lattice. It is just this information that is used to determine the crystal structure.
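The outcome of the laser pointer experiment can be imitated numerically. The sketch below is a one-dimensional simplification, whereas the masks in Fig. 13.8 are two-dimensional. It assumes that the diffracted intensity far behind the mask is the squared magnitude of the Fourier transform of the mask. The hole positions and sizes are invented for illustration, as are the function names.

```python
import cmath

def periodic_mask(motif, period, repeats):
    """Row of grid points: motif maps a position inside one period to a hole size."""
    cell = [0.0] * period
    for position, size in motif.items():
        cell[position] = size
    return cell * repeats

def diffraction_intensities(mask):
    """Squared magnitude of the discrete Fourier transform of the mask."""
    n = len(mask)
    intensities = []
    for k in range(n):
        amplitude = sum(
            value * cmath.exp(-2j * cmath.pi * k * x / n)
            for x, value in enumerate(mask)
        )
        intensities.append(abs(amplitude) ** 2)
    return intensities

def spot_positions(intensities, threshold=1e-6):
    """Indices at which a diffraction spot is observed."""
    return [k for k, value in enumerate(intensities) if value > threshold]

# Same repeat distance (8 grid points), different motif in the repeating unit.
one_hole = periodic_mask({0: 1.0}, period=8, repeats=4)
two_holes = periodic_mask({0: 1.0, 3: 0.5}, period=8, repeats=4)

pattern_one = diffraction_intensities(one_hole)
pattern_two = diffraction_intensities(two_holes)
```

Both masks produce spots at the same positions because the repeat distance is the same. For the mask with a single hole all spots are equally strong, whereas the second hole in the motif modulates the intensity from spot to spot.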
The reflections, that is, the intensity of the individual light points in the
diffraction pattern, contain the information about the form of the molecule. There
is a mathematical technique, the Fourier transform, which can be used to translate
the diffraction pattern back to the generating motif. A Fourier transform is the
superimposition of many sine and cosine functions. The intensity of the diffraction
reflections determines the contribution of the functions, as does the phasing. The
importance of these aspects was already underscored in the interference of the waves
(Fig. 13.5). Unfortunately just this information about the relative phasing is lost in the
diffraction experiment. The diffractometer only registers the intensity of the reflections.
The missing information is referred to as the phase problem of crystal
structure determination. It must be reconstructed for the individual reflections
by computational methods and by using appropriate measuring conditions. Frequently
large electron-rich elements (e.g., heavy-metal ions) are embedded in the
protein (e.g., by coordinating to histidine or cysteine). These heavy atoms dominate
the diffraction pattern, and in doing so, they betray their position in the crystal lattice.
Another method takes advantage of the so-called anomalous scattering. This effect is
based on the interaction of X-rays with the electrons of heavy atoms in particular.
This leads to the situation that a spherical wave that is propagating toward an atom is
reflected with a phase shift. Simply stated, it is returned with a delay. The effect is
dependent on the wavelength and can be exploited to determine the phasing. The
crystal is measured at a synchrotron (a particle accelerator that also produces electromagnetic
radiation in a broad wavelength range, including X-rays) and the diffraction
experiment is carried out with multiple different wavelengths. Anomalous scatter-
ing requires that a heavy atom is contained in the protein structure. This is already
the case for metalloproteins. Often another approach is taken. Proteins that are
produced in a special expression system (▶ Sect. 12.6), can be generated with
selenomethionine instead of methionine. The heavier selenium serves as an anoma-
lous scatterer in the diffraction experiment. There are methods for small molecules
that allow a straightforward reconstruction of the phase information from the inten-
sity distribution, the so-called “direct methods.” The development of such methods is
being worked on for protein structural determination. Often an already-solved,
related protein structure can be utilized as a starting model for a structure determi-
nation (molecular replacement method). The model is translated and rotated in the
elementary cell by computer simulations until a calculated diffraction pattern is
produced that matches the diffraction pattern of the unknown protein.
The phasing obtained at the beginning of the structural analysis with this method
is only approximate. Altogether the regeneration of the phasing information is not
trivial. As late as the 1960s, phasing calculations kept a scientist busy for several
years. The methodical progress and the increased performance of computers now
allow this to be accomplished in a few minutes. Even today, however, this step can
still be very challenging for proteins. It is becoming apparent though, that the
structure determination of medium-sized proteins is becoming routine. Historically,
the time span from crystallization to structure determination could be quite long.
Urease is certainly a curiosity. It was the first protein to be successfully crystallized.
James B. Sumner accomplished this back in 1926. Its 3D structure, however, was
first elucidated in 1995, that is, 70 years later!
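The phase problem described in this section can be made tangible with a one-dimensional toy crystal. The sketch assumes two point-like atoms in a cell of unit length. Their positions, their scattering powers, and the function names are invented for illustration. The structure factors are calculated from the atoms and then transformed back, once with the correct phases and once with the amplitudes alone, as a detector would deliver them.

```python
import cmath

def structure_factors(atoms, h_max):
    """Complex structure factors F(h) for h = -h_max..h_max.

    atoms is a list of (fractional position, scattering power) pairs.
    """
    return {
        h: sum(power * cmath.exp(2j * cmath.pi * h * x) for x, power in atoms)
        for h in range(-h_max, h_max + 1)
    }

def fourier_synthesis(factors, n_grid):
    """Density on n_grid equally spaced points, summed from the given factors."""
    density = []
    for i in range(n_grid):
        x = i / n_grid
        total = sum(f * cmath.exp(-2j * cmath.pi * h * x) for h, f in factors.items())
        density.append(total.real)
    return density

def highest_peak(density):
    """Grid index of the largest density value."""
    return max(range(len(density)), key=lambda i: density[i])

# Hypothetical cell with a heavier atom at x = 0.20 and a lighter one at x = 0.55.
atoms = [(0.20, 8.0), (0.55, 6.0)]
factors = structure_factors(atoms, h_max=10)

with_phases = fourier_synthesis(factors, n_grid=100)

# A diffraction experiment delivers only the amplitudes |F(h)|.
amplitudes_only = {h: abs(f) for h, f in factors.items()}
without_phases = fourier_synthesis(amplitudes_only, n_grid=100)
```

With the phases, the highest maximum of the density lies at the position of the heavier atom. Without them, the synthesis piles up at the origin of the cell, and the atomic positions are no longer directly readable.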
13.5 Diffraction Power and Resolution Determine the Accuracy
of a Crystal Structure
A picture of the contents of the unit cell is the result of the Fourier transform. It
is portrayed in terms of the electron density in space (Fig. 13.9). The detail with
which the electron density can be determined depends on the spatial resolution
with which the diffraction pattern was measured. In relation to the Fourier trans-
form, this is a question of the number of different wave fronts that were
superimposed upon one another in the correct amplitude and phase. It can be seen
in the diffraction pattern created with the laser beam (Fig. 13.8) that the intensity
clearly weakens toward the edges. The extent to which the diffraction pattern is
Fig. 13.9 View of a crystal structure of aldose reductase (▶ Sect. 27.4). The electron density (the
so-called 2Fo–Fc density at the 1σ level) is displayed as a blue mesh at the predefined contour level
around a tryptophan residue. In (a) the diffraction data were obtained at a resolution of 4 Å, and
a Fourier transform was used to calculate the electron density. The resolution increases from (a)
4 Å to (b) 3 Å, to (c) 2 Å, and to (d) 0.66 Å. The resolution in the last-shown contour density is so
high that hydrogen atoms can be recognized as single density peaks in the difference density map
(positive is yellow, negative is violet Fo–Fc difference density, 2σ level). The electron density is so
clearly structured at 2 Å that it is simple to fit the indole building block in place. At 4-Å resolution
this assignment is problematic and can easily lead to errors.
perceivable in the edges limits the accuracy with which the generated motif can be
spatially resolved. For small organic molecules, this resolution is easily achieved in
that the atoms are visible as distinct maxima in the electron density. If the crystal’s
quality is diminished due to lattice defects or disorder, the resolution is poorer. The
resolution in protein crystals is usually between 1.5 and 3 Å. In the best case,
a resolution is achieved that is in the order of magnitude of a bond length. The
upper limit falls into the range of the cross section of a benzene ring. Resolutions
better than 1 Å, however, have been achieved (Fig. 13.9). In those cases many details are
recognizable, such as single hydrogen atoms or multiple arrangements of side chains.
At higher resolution the electron density maxima are directly assigned to the atoms
in the molecule (Fig. 13.10). In the beginning this assignment is crude, the phases used
in the Fourier transform are only approximate. The position of the detected maxima
must still be optimized. This is called “refinement of the structure.” For this the
experimentally observed diffraction pattern is compared with the diffraction pattern
that is calculated from the atomic positions of the preliminary model. If the measure-
ment is very accurate, the density of a “pseudomolecule” with spherical atoms can be
subtracted from the observed electron densities at the end of the structure determina-
tion. What remains is the electron distribution of the bonds between the atoms in the
molecule (Fig. 13.10). This is, however, only possible with very high-resolution
measurements. At lower resolution, as is the case in moderately resolved protein
structure determinations, a direct assignment of the atoms of the protein to the
electron density maxima cannot be made (Fig. 13.11). More commonly the course
of the chains is fitted to the electron density. Because proteins are constructed from
20 different amino acids that prefer to take on typical geometries, the interpretation
of the electron density is simplified (Fig. 13.11). As with low-molecular-weight
structures the model is iteratively refined, and the structural data improved.
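The comparison between observed and calculated diffraction patterns during refinement needs a measure of agreement. The text does not name one. The sketch below assumes the conventional crystallographic R factor, the sum of the absolute differences between observed and calculated amplitudes divided by the sum of the observed amplitudes. Scaling between the two data sets is omitted for brevity, and the numbers are invented.

```python
def r_factor(f_observed, f_calculated):
    """R = sum(| |Fo| - |Fc| |) / sum(|Fo|) over all reflections."""
    if len(f_observed) != len(f_calculated):
        raise ValueError("both lists must contain the same reflections")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_observed, f_calculated))
    denominator = sum(abs(o) for o in f_observed)
    return numerator / denominator

# Hypothetical amplitudes for three reflections.
observed = [10.0, 20.0, 30.0]
crude_model = [14.0, 15.0, 33.0]
refined_model = [10.5, 19.5, 30.0]

r_before = r_factor(observed, crude_model)
r_after = r_factor(observed, refined_model)
```

Refinement shifts the atoms of the model so that a value of this kind decreases from cycle to cycle.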
Electrons scatter X-rays. Therefore, the number of electrons around an atom
determines how well it is detected in the resulting density. Hydrogen atoms have
only one electron in their shell. As a consequence, they are often not located or are
located with poor accuracy in the electron density. Hydrogen atoms can be recog-
nized as densities in the structure determination of small molecules, but this is only
possible in protein structures if the resolution is better than 1 Å. This is unproblematic as
long as it only concerns hydrogen atoms at positions that correspond to spatially
fixed positions at a rigid molecular scaffold, for instance, hydrogen atoms on phenyl
rings. It is more difficult if the hydrogen atom is on a conformationally flexible
group or groups that can be protonated or deprotonated. It is good to know if
a carboxyl group is ionized, or if it exists as the free acid, and in which direction
the hydrogen atom is oriented. This information can only be indirectly gleaned from
the protein structure through an exact analysis of the spatial orientation of the
surrounding hydrogen-bonding partners.
The accuracy of the structure determination depends on the resolution of the
data that was obtained from a crystal. Even if the structure of the protein is
displayed on the computer screen like that of an organic molecule, its geometry is
much less accurately determined. The error margins in small molecule determinations
are approximately 0.01 Å for bond lengths, 0.1° for bond angles, and 1–2° for
Fig. 13.10 Crystals with an edge of 0.1–0.3 mm are needed for the structure determination
of small organic molecules. (a) A diffraction pattern is obtained in an X-ray beam (compare
Fig. 13.8) that is displayed on a photographic plate or (b) is registered with a diffractometer
counter. The molecule that generated this diffraction pattern, which is periodically arranged
in the crystal is back-calculated from the reflections. (c) A Fourier transform is carried out
with approximate phasing, and a map of the electron density in space is obtained that is
contoured according to its height. The maxima are assigned to the atoms in the molecule
(here oxalic acid). (d) The spatial blurring of the electron density is associated with thermal
motion of the atoms. It is displayed with ellipsoids that represent the 50% probability of the
occupancy of each atom. (e) Crystals that scatter well allow the determination of the electron
density in the bonds between atoms. (f) The application of symmetry operations generates the
molecular packing in the crystal lattice. It delivers information about noncovalent interactions
between molecules.
Fig. 13.11 (a) The diffraction pattern of a protein crystal clearly shows more reflections. As they
are made up by larger molecules the unit cells comprise a bigger volume and exhibit more lattice
planes and therefore reflections. However, due to high solvent content and inherent flexibility of
the more complicated macromolecules the crystals give rise to poorer diffraction quality and the
data are registered to a lower resolution. (b) The enormous data flood is registered with an area
detector on a diffractometer. This allows the simultaneous registration of many diffracted inten-
sities. (c) A Fourier transform performed with phases from the first model delivers the distribution
of the electron density in space (blue mesh). Because no atomic centers are resolved in this density,
the trace of the protein chain (here a segment from a β sheet of tumor necrosis factor, TNF) is fitted
to the electron density distribution. (d) Similarly to small molecules, the obtained model is refined
until all of the atoms of the protein fit optimally into the density. (e) The color-coded thermal
motion of the molecule is shown over the entire molecule. Blue to yellow to red color changes
show the transition from mild to severe movement. (f) Symmetry operations generate the molec-
ular packing in the crystal lattice. There are “empty” areas that are occupied by numerous water
molecules. Because of the strong thermal motion and the disorder that it causes, they are not found
in the electron density map.
dihedral angles (▶ Chap. 16, “Conformational Analysis”). For protein structures,
significantly larger errors must be assumed, and they are difficult to quantify.
They depend on how the structure was refined. The electron density does not
allow individual atoms to be resolved. Therefore amino acids are placed with
idealized bond lengths and angles in the electron density. Their geometry is left at
the predefined knowledge-based values for the subsequent refinement. The
assignment of atom types for the placement of the side chains is partially based
on assumptions. Knowledge-based values are used, or attempts are made to keep
the hydrogen-bonding network consistent. These aspects are to be considered
when judging the accuracy of a protein structure. The result of the crystal
structure determination is a spatially and time-averaged picture of one
“mean” molecule that represents the whole crystal. Often it is discovered that the
electron density in some areas indicates only a reduced occupancy of a side chain
or a part of a bound ligand. Furthermore, alternative orientations (conformations)
are recognizable. Sometimes the electron density from entire areas is missing.
This is indicative of “disorder,” and argues for a distribution over multiple
orientations in the crystal. This disorder can be dynamic, that is, the relevant
groups jump back and forth between two or more orientations. Or the disorder is
static, which means that several orientations are present side by side in a crystal.
Because the structure is an averaged picture, these arrangements are scattered
throughout the crystal with different orientations. If a part of the molecule is
entirely disordered, that is, scattered over numerous orientations, the electron
density is usually not visible. Today, if only to reduce the damage due to radiation
exposure, structures are measured at 100 K in a stream of cold nitrogen gas. At
this temperature many movements in the crystal are frozen and static disorder can
be observed. Despite this, it has been shown that the determined structure corre-
sponds well to the situation at room or body temperature. These conclusions can
be drawn by comparing the results to the analogous determination from NMR
spectroscopy (Sect. 13.7) and molecular dynamics simulations (▶ Sect. 15.8).
13.6 Electron Microscopy: Using Two-Dimensional Crystals to
Trace Membrane Proteins
Cryoelectron microscopy represents an ideal complement to X-ray structure deter-
mination because it makes the structure of very large membrane-bound proteins
accessible. Electrons are used as the radiation source. They penetrate the
crystalline sample only slightly and are absorbed more strongly than X-rays. Molecules
also scatter electrons much more strongly than X-rays. Therefore much smaller crystals
can be used. Even crystals that are razor blade thin and are made up of only a few
molecular layers are sufficient. Single molecules can even be imaged, but their
molecular mass must exceed several million Daltons. Smaller molecular weights
make periodically organized arrays of multiple molecules necessary. In the mean-
time, membrane protein crystals have been successfully grown in two-dimensional
periodic molecular orientation. The attempt to grow crystals of such proteins that
are large enough for an X-ray structure analysis has only worked a few times and
requires very special additives for the crystallization.
In recent times crystallization of membrane proteins has been successful in
lipidic cubic phases. Sophisticated mixtures of lipid, water, and protein can form
structured three-dimensional lipidic arrays that are pervaded by water channels.
Protein molecules diffuse into this structured yet flexible matrix, which facilitates
crystal nucleation and growth.
In addition to the work with readily obtained crystals, electron radiation has
another advantage over X-rays. It can be used for a diffraction experiment as well as
for the direct visualization of an object. The microscopic visualization is unfortu-
nately not possible with X-rays because a convergent lens cannot be built for
X-rays. This is successful for electrons because they can be focused by using
magnetic fields. Why not use an electron microscope to visualize molecules in
general? Despite the reduced radiation, electrons still damage the samples con-
siderably. Furthermore the crystals that are used represent about a millionth the
sample size that is used for X-ray structure analysis. The data for an X-ray
structure can be collected on one single crystal. In contrast, several hundred to
thousand tiny crystals, often only 5 µm in size, are needed for electron microscopy.
They are shock-frozen under high vacuum and directly exposed to the electron
beam. Proteins can only withstand these conditions after special preparation.
Only a very low radiation dose can be used. Because of this, the images are very
noisy and must be averaged over many observations. To obtain a detailed reso-
lution in the plane perpendicular to the crystal’s plane, the crystal must be
measured in many orientations. Fine structural details are lost in doing this. The
analogous patterns in the electron diffraction diagram, as would be obtained in an
X-ray experiment, can be corrected by computational methods. With the help of
the Fourier transform, an electron density map of the molecule is obtained. Its
interpretation or refinement is accomplished in the same way as for the X-ray
experiment. The phasing that is necessary for the transform can be determined
from the images in electron microscopy.
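Why averaging over many noisy observations helps can be illustrated with a small numerical sketch. It is not the image-processing procedure used in electron microscopy. It only assumes independent Gaussian noise added to a fixed, hypothetical signal, and the function names are invented for this illustration. Under that assumption the residual noise falls roughly with the square root of the number of averaged observations.

```python
import math
import random

def noisy_copy(signal, sigma, rng):
    """Return the signal with independent Gaussian noise added to every point."""
    return [s + rng.gauss(0.0, sigma) for s in signal]

def average(observations):
    """Point-by-point mean over many noisy observations of the same signal."""
    n = len(observations)
    return [sum(values) / n for values in zip(*observations)]

def rms_error(estimate, signal):
    """Root-mean-square deviation between an estimate and the true signal."""
    squared = sum((e - s) ** 2 for e, s in zip(estimate, signal))
    return math.sqrt(squared / len(signal))

rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Hypothetical "true" image row; the noise is much stronger than the signal.
signal = [math.sin(2 * math.pi * i / 50) for i in range(200)]
sigma = 2.0

error_single = rms_error(noisy_copy(signal, sigma, rng), signal)
error_averaged = rms_error(
    average([noisy_copy(signal, sigma, rng) for _ in range(400)]), signal
)
print(f"single observation: {error_single:.2f}, 400 averaged: {error_averaged:.2f}")
```

With 400 averaged observations the error should shrink by a factor of about 20, which is why so many tiny crystals have to be measured.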
The technique is relatively young and the methods are developing further.
There is more work to be done. Structural determination still takes several years,
and only a few laboratories have adequately powerful microscopes. Nonetheless,
the knowledge that we have about the structure of membrane-bound receptors
today is often based on the results that were achieved with this method
(▶ Chap. 30, “Ligands for Channels, Pores, and Transporters”).
13.7 Structures in Solution: The Resonance Experiment in NMR
Spectroscopy
Many atomic nuclei have an angular momentum, or spin. The nuclei that occur in
biological systems that have a nuclear spin are the hydrogen isotope ¹H, the carbon
isotope ¹³C, the nitrogen isotope ¹⁵N, the fluorine isotope ¹⁹F, and the phosphorus
isotope ³¹P. Just as a top would, these nuclei rotate about their axes. As long as no
magnetic field is applied, the tops orient in all possible spatial directions. In
a magnetic field they are forced into alignment (Fig. 13.12). If a toy top is spun,
it moves in the gravitation field. This field has one preferred direction. If the
alignment of the rotation axis of the top and the direction of the gravitation field,
which is oriented toward the center of the Earth, are not exactly the same, the top
wobbles. The end of the rotation axis performs a circular movement, an arc, with
a very precise rotational speed. It depends on the mass and geometry of the top. In
physics this movement is known as precession.
Atomic nuclei with a spin behave in a very similar way. In contrast to the
macroscopic top, they obey the laws of quantum mechanics. This means that the
rotation axes that their precession movement takes on can only adopt very specific
angles with respect to the applied field direction. The result for the ¹H, ¹³C, ¹⁵N, ¹⁹F,
and ³¹P nuclei is that the rotation axis for the precession arc can only be parallel
or antiparallel to the direction of the field. The orientation in the direction of the
field is energetically somewhat more favorable than the rotation antiparallel to
Fig. 13.12 Atomic nuclei with a rotational momentum behave like a spinning top. In the absence
of an external magnetic field, they orient in all possible directions randomly (a). Upon application
of a magnetic field, they orient their rotation axes parallel or antiparallel to the direction of the field
(b). The precession movement is oriented in an arc around the applied field direction. The two
orientations, parallel or antiparallel, with respect to the direction of the field are energetically
different. Because of this, there is a small difference in occupancy between the two states. By
applying an electromagnetic field with a frequency that corresponds to the rotational speed of the
top’s axis, the occupancy can be inverted. This resonance absorption, the exact frequency of which
depends on the type of nucleus and its immediate chemical environment, is registered with
a spectrometer.
the direction of the field. Statistically, therefore, more nuclear spins in the substance
sample will align with the direction of the field. If an additional magnetic field
is applied to the outer magnetic field, and its frequency corresponds to the
precession frequency of the nuclear spin, the occupancy of “parallel” to “antipar-
allel” spinning nuclei can be reversed and a resonance absorption for the sample can
be registered. After a particular time span, the original situation is restored
(relaxation).
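How small the occupancy difference between the two orientations is can be estimated with a short calculation. The sketch below assumes the standard two-level Boltzmann distribution and rounded literature values for the gyromagnetic ratios (γ/2π of about 42.577 MHz/T for ¹H and 10.708 MHz/T for ¹³C). The field strength of 14.1 T and the temperature of 298 K are illustrative choices, not values from the text.

```python
import math

PLANCK = 6.62607015e-34    # J s
BOLTZMANN = 1.380649e-23   # J/K

# Gyromagnetic ratios divided by 2*pi, in Hz per tesla (rounded literature values).
GAMMA_BAR = {"1H": 42.577e6, "13C": 10.708e6}

def larmor_frequency(nucleus, field_tesla):
    """Precession (resonance) frequency in Hz for a nucleus in a given field."""
    return GAMMA_BAR[nucleus] * field_tesla

def population_excess(nucleus, field_tesla, temperature_kelvin):
    """Fractional excess of spins in the lower-energy orientation.

    Returns (N_low - N_high) / (N_low + N_high) for a two-level system.
    """
    delta_e = PLANCK * larmor_frequency(nucleus, field_tesla)
    ratio = math.exp(-delta_e / (BOLTZMANN * temperature_kelvin))  # N_high / N_low
    return (1.0 - ratio) / (1.0 + ratio)

for nucleus in ("1H", "13C"):
    freq_mhz = larmor_frequency(nucleus, 14.1) / 1e6
    excess = population_excess(nucleus, 14.1, 298.0)
    print(f"{nucleus}: {freq_mhz:.1f} MHz, population excess {excess:.2e}")
```

For ¹H under these assumptions the excess is only about 5 spins in 100,000, which is one reason why NMR needs comparatively concentrated samples.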
The rotational speed of the top’s axis for precession movements is character-
istic for each type of nucleus. It depends further on the composition of the
chemical environment in which the nucleus resides. A carbon atom of a phenyl
ring has a different resonance frequency than that of an aliphatic chain. The
relative position of the resonance absorption in relation to a standard reference is
also called the chemical shift. Furthermore the individual nuclei can perceive the
spin orientation of the neighboring nuclei. An alignment in the same direction as
a neighboring nucleus is energetically different from that of an antiparallel
orientation. This influence also modulates the rotational speed of the spin on
the observed nucleus. The information transfer regarding the orientation or the
magnetic state of the nuclei in the vicinity can be transmitted over several bonds.
This transfer can even occur through space without any direct covalent
connection.
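The chemical shift mentioned above is conventionally reported in parts per million (ppm) relative to the reference resonance, which makes it independent of the spectrometer field. The following sketch assumes that usual definition. The frequencies are invented for illustration.

```python
def chemical_shift_ppm(nu_sample_hz, nu_reference_hz):
    """Chemical shift in ppm: relative frequency offset from the reference."""
    return (nu_sample_hz - nu_reference_hz) / nu_reference_hz * 1e6

def offset_hz(shift_ppm, nu_reference_hz):
    """Frequency offset in Hz that corresponds to a shift in ppm."""
    return shift_ppm * nu_reference_hz / 1e6

# Hypothetical proton signal 4320 Hz above the reference on a 600 MHz instrument.
shift = chemical_shift_ppm(600.0e6 + 4320.0, 600.0e6)
print(f"shift: {shift:.2f} ppm")
print(f"same shift on a 300 MHz instrument: {offset_hz(shift, 300.0e6):.0f} Hz offset")
```

The same nucleus therefore appears at the same ppm value on any instrument, although the offset in hertz scales with the field.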
To measure an NMR spectrum (nuclear magnetic resonance), a solution of the
substance has to be placed in a strong magnetic field. In addition, a variable
electromagnetic field is applied to the sample. The frequencies at which the nuclei
in the sample have resonance, meaning when they flip from parallel to antiparallel,
are recorded. The resulting spectrum discloses information about the composition
and the chemical environment around the studied nuclei. It contains information
about the spatial structure of the molecules under investigation. Based on the work
of Richard Ernst, multidimensional NMR techniques have been developed in the
last 30 years. By using suitable measuring conditions and selectively irradiating
electromagnetic fields, information about the mutual influence of resonance
frequencies among individual nuclei is separated and analyzed. This mutually
induced information transfer about the magnetic state of neighboring nuclei is
apparent from the signal form of multidimensional spectra, where it is registered
as cross peaks. Only the hydrogen isotope ¹H occurs in nearly 100%
natural abundance. Therefore, it can be assumed that for statistical reasons, two ¹H
nuclei will always be adjacent to each other in a molecule. In contrast, the ¹³C and
¹⁵N isotopes are scarce. As a result, statistically they are only very rarely found in
the direct vicinity of one another. Data on the mutual influence of the magnetization
of these nuclei are required for the spectra. Therefore it is necessary to enrich the
proteins with the appropriate isotopes. For this, bacteria are fed with isotopically
labeled substrates such as glucose or ammonium chloride and will then produce
proteins that are isotopically enriched. It is even necessary to produce deuterated
proteins for the structural investigation of very large proteins. Today, by using
numerous spectroscopic techniques, spectra from proteins of more than 800 amino
acids have been successfully interpreted. The following questions can be addressed
by NMR analysis:
• Which atomic nuclei occur in which chemical environment?
• What is in the immediate, covalently connected neighborhood of these nuclei?
Information about the spatial orientation of atoms in the vicinity is also
contained within these spectral parameters.
• Which geometric relationships are given between different segments of the
polypeptide chain? This results from information transfer about magnetic states
of nuclei that are not directly connected by covalent bonds.
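The statistical argument for isotope enrichment made earlier in this section can be put into numbers. The sketch below assumes approximate natural abundances (about 1.1% for ¹³C and 0.37% for ¹⁵N) and independent labeling of neighboring positions. The enrichment level of 99% is a hypothetical value chosen for illustration.

```python
# Approximate natural abundances as fractions (rounded literature values).
NATURAL_ABUNDANCE = {"1H": 0.9998, "13C": 0.011, "15N": 0.0037}

# Hypothetical abundances after feeding isotopically labeled substrates.
ENRICHED = {"1H": 0.9998, "13C": 0.99, "15N": 0.99}

def pair_probability(isotope_a, isotope_b, abundance):
    """Chance that two given neighboring positions both carry the named isotopes.

    Assumes that the isotopes at the two positions are independent of each other.
    """
    return abundance[isotope_a] * abundance[isotope_b]

natural_hh = pair_probability("1H", "1H", NATURAL_ABUNDANCE)
natural_cc = pair_probability("13C", "13C", NATURAL_ABUNDANCE)
natural_cn = pair_probability("13C", "15N", NATURAL_ABUNDANCE)
enriched_cn = pair_probability("13C", "15N", ENRICHED)

print(f"1H-1H pair, natural:   {natural_hh:.4f}")
print(f"13C-13C pair, natural: {natural_cc:.2e}")
print(f"13C-15N pair, natural: {natural_cn:.2e}")
print(f"13C-15N pair, labeled: {enriched_cn:.4f}")
```

Under these assumptions only a few pairs in 100,000 carry both ¹³C and ¹⁵N at natural abundance, whereas almost every pair does after enrichment.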
13.8 From Spectra to Structure: Distance Maps Evolve into
Spatial Geometries
This last-mentioned observation, which results from the nuclear Overhauser
effect (NOE), yields intramolecular distances of spatially neighboring but not
directly covalently bound atoms. The entire connectivity, that is, the list of all
covalent bonds within a molecule, and a list of the recorded intramolecular
noncovalent distances are applied to generate the structure for the molecule
(Fig. 13.13). For this purpose, so-called distance–geometry calculations are
used to create the spatial coordinates of the atoms.
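A common way of turning cross-peak intensities into distances, which the text does not spell out, is the isolated spin-pair approximation: the NOE intensity is taken to fall off with the inverse sixth power of the distance. The sketch below assumes that approximation and a calibration against a reference pair of known distance. The intensities and the reference distance of 1.8 Å are illustrative values, and `noe_distance` is a name invented here.

```python
def noe_distance(intensity, reference_intensity, reference_distance):
    """Estimate a distance from an NOE intensity, assuming I is proportional to r**-6.

    The reference pair (known distance, measured intensity) calibrates the scale.
    """
    if intensity <= 0 or reference_intensity <= 0:
        raise ValueError("intensities must be positive")
    return reference_distance * (reference_intensity / intensity) ** (1.0 / 6.0)

reference_intensity = 64.0   # hypothetical cross-peak volume of the reference pair
reference_distance = 1.8     # Å, illustrative calibration distance

for intensity in (64.0, 8.0, 1.0):
    r = noe_distance(intensity, reference_intensity, reference_distance)
    print(f"intensity {intensity:5.1f} -> about {r:.2f} Å")
```

Because of the sixth-power dependence, a cross peak that is 64 times weaker corresponds to only twice the distance, which is why NOE data are usually treated as approximate distance bounds rather than exact values.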
Often multiple equally good structural models fulfill the experimentally determined
distance conditions in complex molecules. If the spectral parameters for a
section of the structure are too sparsely distributed, with distances that are too
large, it is very difficult to achieve a unique spatial configuration of the atoms. Therefore, the
generation of a structural model is coupled with molecular dynamics simulations
(▶ Sect. 15.7). These calculations deliver geometries of molecules that represent
energetically favorable 3D structures consistent with the spectral parameters.
Multiple slightly divergent models are given in areas with few spectral conditions.
Therefore, NMR spectroscopists always report a bundle of structural solutions
(Fig. 13.14).
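One ingredient of distance–geometry calculations shows why sparse data leave a structure underdetermined: triangle-inequality bound smoothing. An upper distance bound between two atoms can never exceed the sum of the upper bounds via a third atom. The sketch below applies this to a toy chain of four atoms with hypothetical bounds in Å. It is not the algorithm of any particular NMR program.

```python
import math

def smooth_upper_bounds(n_atoms, upper):
    """Tighten upper distance bounds with the triangle inequality.

    `upper` maps atom pairs (i, j) to a known upper bound in Å. Pairs without
    any information start at infinity and are tightened Floyd-Warshall style.
    """
    u = [[0.0 if i == j else math.inf for j in range(n_atoms)] for i in range(n_atoms)]
    for (i, j), bound in upper.items():
        u[i][j] = u[j][i] = bound
    for k in range(n_atoms):
        for i in range(n_atoms):
            for j in range(n_atoms):
                if u[i][k] + u[k][j] < u[i][j]:
                    u[i][j] = u[i][k] + u[k][j]
    return u

# Four atoms in a chain: only direct neighbors are restrained.
chain_only = {(0, 1): 1.5, (1, 2): 1.5, (2, 3): 1.5}
# The same chain plus one hypothetical long-range NOE between atoms 0 and 3.
with_noe = dict(chain_only)
with_noe[(0, 3)] = 3.0

loose = smooth_upper_bounds(4, chain_only)
tight = smooth_upper_bounds(4, with_noe)
print(f"upper bound 0-3 without NOE: {loose[0][3]} Å")
print(f"upper bound 0-3 with NOE:    {tight[0][3]} Å")
```

Without the long-range restraint the ends of the chain may be up to 4.5 Å apart. The single NOE narrows this to 3.0 Å, which mirrors how regions with many NOEs are well defined while loops and termini fan out.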
Attempts are often made to compare the quality of X-ray and NMR structures.
Both methods measure different properties, and the structures are derived from
different measured variables. This must be considered when making a direct
comparison. The accuracy of an NMR structure fluctuates with the density and
frequency of spectral distance constraints, while that of an X-ray structure mainly
depends on the resolution of the diffraction experiment.
13.9 How Relevant Are Structures in a Crystal or NMR Tube to
a Biological System?
The discussed structure determination techniques investigate molecules in a crystal
assembly or in solution in an NMR tube. Are these conditions at all relevant for the
biological conditions in an organism? Small flexible molecules change their geom-
etry depending on the environment. They will adopt a different shape in a crystal, in
solution, or in the binding pocket of a protein. Therefore the question can be asked
whether the data from a small-molecule crystal structure are suitable to deliver
information about the molecular geometry in a binding pocket. From the numerous
known crystal structures, which by now number more than 500,000, some general
principles about the molecular architecture of organic compounds can be deduced.
All of the published crystal structures are electronically archived at the Cambridge
Crystallographic Data Centre in England. They can be retrieved and compared with
one another. It will be shown in ▶ Chaps. 14, “Three-Dimensional Structure of
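[Graphic for Fig. 13.13: two-dimensional NMR spectrum with both axes in ppm (0.0 to 10.0); cross peaks connect the signals of nuclei A and B in the polypeptide chain, which is drawn from the H2N terminus to the COOH terminus.]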
Fig. 13.13 A multidimensional NMR spectrum contains information about the spatial vicinity of
atomic nuclei in a molecule (here, the trypsin inhibitor from bovine pancreas). It is expressed in so-
called cross peaks. Information can be extracted about the distance between non-covalently bound
atoms in a molecule. The individual signals of the spectra are assigned to atoms in the molecule
(e.g., A and B). The positions that these atoms have in the polypeptide chain are known from the
sequence of the protein (above left). The intensity of the cross peak indicates which spatial distance
is found between nuclei A and B in the folded polypeptide chain (above right). Just as was done for
A and B, the many other cross peaks are evaluated and translated into distance conditions.
Biomolecules” and ▶ 16, “Conformational Analysis” that valuable information
about possible molecular and interaction geometries is available through a
statistical evaluation of these data, which also provides insights relevant for the
conditions in a protein binding pocket.
Nevertheless, are the structures in the crystal of the protein too remote from the
conditions in a biological system, much further than, for instance, the solution-
phase state? A good many structure determinations that were carried out in solution
and in the crystal in parallel are available. Experience has shown that the agreement
is usually very good. Deviations are found mainly on the surface of
proteins. There, the amino acid side chains form interactions with the environment.
Therefore, these deviations are not surprising. The crystal packing of tumor necro-
sis factor (TNF) is presented in Fig. 13.11. Large holes are conspicuous in the
crystal packing. These areas are filled with water molecules that are so loosely
incorporated into the crystal that they can freely move to a large extent. Therefore,
they are not locatable in the electron density. Channels filled with water in protein
Fig. 13.14 The accuracy of an NMR structure depends on the density of the experimentally
determined atomic distances. These come from experiments that deliver information about the
exchange of the magnetic state of spatially adjacent, but not directly connected atoms (so-called
nuclear Overhauser effect, NOE). With the connectivity list and the NOE conditions, multiple
structural models are generated. These models represent the low-energy geometries that agree with
the spectral parameters. In the left part of the figure (a) the experimentally measured NOEs (black
dashed lines) are distributed over the 3D structure of a domain of the guanine nucleotide exchange
factor. For the sake of clarity, only the long-range NOEs are shown. Most of the amino acid side
chains are also suppressed; many of these NOEs therefore indicate the positions of atoms that are
not shown. In areas in which very few distances could be determined (e.g., in the green loop areas
or at the termini), the model is ambiguously defined. Multiple models are consistent with the
experimental data (b). The main chain of the protein fans out. In areas where a large number of
NOE conditions are found (e.g., the helices and the central β strand), the structural models diverge
only slightly from one another.
crystals can make up to 70% of the crystal’s mass! Therefore, the crystal can also be
considered as a highly concentrated, ordered solution. NMR measurements also
require high concentrations. They are considerably higher than in biological
systems, but are still 10–100 times lower than in protein crystals.
The high water content of protein crystals offers the possibility to allow small
molecules to diffuse into the crystals. In the water channels, they move as they
would in an aqueous solution. In favorable cases, the binding pocket of the protein
is directly accessible from one of these channels. By placing the protein crystal
directly in a solution of the active substance (soaking), the latter can penetrate the
crystal through the channels, diffuse into the binding pockets, and dock there. Then
a new diffraction experiment is carried out with the loaded crystal. The reflections
are measured, and, based on the known structure of the protein, the electron density
map is generated. The density of the uncomplexed protein is subtracted from that
map. The difference density of the incorporated ligand remains. This information is
of essential importance for understanding the interactions between small molecules
and proteins. The question of whether the experimental structure is really relevant
for the biological conditions has still not been answered. Crystalline hemoglobin is
able to reversibly take up and release oxygen. It could be shown on crystals of
purine nucleoside phosphorylase (PNP) that the enzyme is still catalytically active
in the crystal (Fig. 13.15).
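The subtraction step of the soaking experiment can be sketched on a toy example. The code assumes that both maps are sampled on the same grid and are on the same scale. The one-dimensional values and the threshold are invented. In practice the subtraction is usually carried out on the structure-factor amplitudes before the Fourier transform, so the real-space version here only illustrates the idea.

```python
def difference_density(complex_map, apo_map):
    """Subtract the density of the uncomplexed protein from that of the complex."""
    if len(complex_map) != len(apo_map):
        raise ValueError("maps must be sampled on the same grid")
    return [c - a for c, a in zip(complex_map, apo_map)]

def peaks_above(density, threshold):
    """Grid indices at which the difference density exceeds a threshold."""
    return [i for i, value in enumerate(density) if value > threshold]

# Hypothetical density values along one line through the binding pocket.
apo_map     = [0.1, 0.9, 1.0, 0.2, 0.1, 0.1, 0.8, 0.9]
complex_map = [0.1, 0.9, 1.0, 0.2, 0.7, 0.8, 0.8, 0.9]

diff = difference_density(complex_map, apo_map)
ligand_points = peaks_above(diff, 0.3)
print("grid points with ligand density:", ligand_points)
```

Only the grid points that were empty in the uncomplexed protein and are occupied after soaking remain in the difference map.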
The research group of Malcolm Walkinshaw at the University of Edinburgh
was even able to show, using the example of the enzyme Cyp3, a peptidylproline
isomerase, that there is a quantitative agreement between the crystalline and solution
states. Different concentrations of an inhibiting prolyldipeptide were allowed to
diffuse into the crystal. Afterward, the occupancy of this inhibitor obtained from
the differently concentrated soaking solutions was determined in a crystallo-
graphic experiment. The binding constants were then ascertained from this
occupancy data. They quantitatively agreed with the inhibition constants that
were determined in a functional assay in solution.
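How a binding constant can follow from occupancy data is sketched below. The code assumes the simplest possible model, a single-site binding isotherm in which the occupancy equals [L]/(Kd + [L]). The concentrations and occupancies are invented, and the procedure is not claimed to be the one used in the cited study.

```python
def occupancy(ligand_conc, kd):
    """Fractional occupancy of a single binding site at a given ligand concentration."""
    return ligand_conc / (kd + ligand_conc)

def estimate_kd(concentrations, occupancies):
    """Average of the per-point estimates Kd = [L] * (1 - occ) / occ."""
    estimates = [
        c * (1.0 - o) / o
        for c, o in zip(concentrations, occupancies)
        if 0.0 < o < 1.0
    ]
    if not estimates:
        raise ValueError("need at least one occupancy strictly between 0 and 1")
    return sum(estimates) / len(estimates)

# Hypothetical soaking concentrations (mM) and the occupancies they would give
# for an assumed dissociation constant of 2 mM.
soak_conc_mM = [0.5, 1.0, 2.0, 5.0, 10.0]
assumed_kd_mM = 2.0
observed = [occupancy(c, assumed_kd_mM) for c in soak_conc_mM]

kd_mM = estimate_kd(soak_conc_mM, observed)
print(f"estimated Kd: {kd_mM:.2f} mM")
```

With real data the refined occupancies carry experimental error, so a least-squares fit over all soaking concentrations would be preferable to the simple average used here.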
The diffraction data can be very quickly collected with even more intense,
so-called white X-rays from a synchrotron source (the so-called Laue technique).
With this experiment, it was possible to observe stable intermediates of enzyme
reactions. Structural changes of the two-dimensional crystals of the acetylcholine
receptor (▶ Sect. 30.4) could be observed with electron microscopy after loading
with the natural ligand. These and other experiments have shown that proteins exist
in the crystal lattice in a form that must be, at the very least, very similar to the
biologically active form.
13.10 Synopsis
• The most powerful methods to determine the spatial structure of molecules are
X-ray crystallography and NMR spectroscopy. The former requires the bio-
molecules to be arranged in periodic arrays in a crystal, and the latter studies
them in solution, usually in an isotopically labeled form.
• Crystals need special conditions to grow from saturated solutions. They spatially
arrange in periodic arrays, and the molecules pack through translational sym-
metry in three dimensions. In addition to the pure shifting of basic motifs,
usually one molecule that represents the asymmetric unit, symmetry operations
such as mirror reflection, two-, three-, four-, and six-fold rotation or inversion
can be applied.
• Crystal lattices diffract X-rays and the diffraction experiment can be understood
as a three-dimensional interference of elementary spherical waves generated
at the positions of the atoms in the lattice. The diffraction phenomenon
at a 3D lattice can be treated formally as multiple reflections at crystal planes
in the lattice.
• Because the relative phases of the generated elementary spherical waves,
superimposed in the various reflections, are not accessible by experiment, they
[Graphic for Fig. 13.15: reaction scheme for the enzyme PNP and a plot of reaction rate versus time, with alternating phases labeled “Crystal soaked” and “Crystal removed”.]
Fig. 13.15 The enzyme purine nucleoside phosphorylase (PNP) transforms guanosine and
phosphate to guanine and ribose-1-phosphate. If a protein crystal is placed in a solution of the
substrate, the reaction begins. This could also have been caused by a partial dissolution of the
enzyme crystal. If the crystal is removed from the solution, the reaction stops. If the crystal is
brought back into the solution, the reaction carries on. This experiment demonstrates that even
crystalline enzymes are catalytically active. Therefore, a geometry must be present in the crystal
that corresponds to the biologically active form.
must be regenerated by sophisticated phasing methods. Only then can a Fourier
transform be calculated from the measured reflections that represents the spatial
distribution of the electron density in the crystal. A model of the crystallized
molecules is assigned to this electron density.
• The diffraction power and resolution of the crystals determine the accuracy of the
resolved structure. For proteins, a resolution of 1.5–3 Å is usually achieved. At the
lower end, molecular building blocks such as phenyl rings are well resolved, and
individual water molecules are visible. At the upper limit, only the overall
topology is determined, and the water molecules usually cannot be assigned.
• The crystal structure is an average structure over space and time. Enhanced
B-factors give an estimate of the residual mobility of molecular portions in
a molecule.
• Cryoelectron microscopy is an alternative method to determine the structure of
membrane-bound proteins in particular by diffraction experiments. Data are
collected from thousands of tiny razor blade-thin crystals.
• NMR spectroscopy records the resonance of magnetic nuclei such as ¹H, ¹³C, or
¹⁵N oriented in a strong magnetic field. The transition between parallel and
antiparallel orientation of the nuclear spins can be induced by additional fields.
Because the frequency at which these transitions take place depends on the
chemical environment in a molecule, the spectral parameters contain informa-
tion about the 3D structure of the molecules in solution.
• The multiplicity of the recorded spectral parameters can be transformed into
distance maps. They can be translated into the spatial structure of the protein by
using a distance geometry approach coupled with molecular dynamics simulations.
• It could be shown for many cases that the NMR structure of a protein in solution
and the X-ray structure in a crystal largely coincide with one another. Differ-
ences are observed for the surface-exposed residues.
• Protein crystals contain up to 70% water and exhibit large water channels that
pass through the crystal. Small molecules can diffuse and access binding sites
through these channels, particularly if these sites are accessible from one of these
channels. The binding modes of small-molecule ligands can be easily deter-
mined by using soaking techniques.
• The significance of the architecture of proteins determined in a crystalline envi-
ronment for biologically relevant conditions has been demonstrated. For example,
enzyme reactions also take place when the protein is arranged in a crystalline state.
Bibliography
General Literature
Blundell TL, Johnson LN (1976) Protein crystallography. Academic, London
Drenth J (1994) Principles of protein X-ray crystallography. Springer, Berlin
Dunitz JD (1979) X-ray analysis and the structure of organic molecules. Cornell University Press,
Ithaca
Friebolin H (2010) Basic one- and two-dimensional NMR spectroscopy. Wiley-VCH, Weinheim
Glusker JP, Trueblood KN (1985) Crystal structure analysis, a primer, 2nd edn. Oxford University
Press, New York
Glusker JP, Lewis M, Rossi M (1994) Crystal structure analysis for chemists and biologists. VCH
Publishers, New York
Pellecchia M, Bertini I, Cowburn D et al (2008) Perspectives on NMR in drug discovery:
a technique comes of age. Nature Rev Drug Discov 7:738–745
Wüthrich K (1986) NMR of proteins and nucleic acids. Wiley, New York
Special Literature
DeRosier DJ (1993) Turn-of-the-century electron microscopy. Curr Biol 3:690–692
Wear MA, Kan D, Rabu A, Walkinshaw MD (2007) Experimental determination of van der Waals
energies in a biological system. Angew Chem Int Ed 46:6453–6456
14 Three-Dimensional Structure of Biomolecules
In drug design, the focus is on the ligand, which is generally a small organic molecule
with a molecular weight of under 500 Da. It undergoes interactions with
a macromolecular receptor and exerts an influence on the receptor’s characteristics.
On the other hand, the surrounding receptor can also determine the properties of the
bound active ligand. Selective interference in these interactions requires an
understanding not only of the ligand but also of the receptor. After the methods for the
structural determination of biomolecules were introduced in the last chapter, we
want to take a look at what can be learned about the construction principles and
characteristics of these molecules. Proteins are made up of 20 basic building
blocks, the amino acids (see Appendix 1). A dipeptide is formed by coupling two
amino acids through an amide bond. Larger peptides and proteins are formed by the
addition of further amide bonds.
14.1 The Amide Bond: Backbone of Proteins
The simplest molecule with an amide bond is formamide 14.1. Its structure is
shown in Fig. 14.1. This connection occurs many hundreds of times in proteins,
for instance, over 50,000 times in the shell of the rhinovirus. The bond lengths
between the carbon, oxygen, and nitrogen atoms can be obtained from the crystal
structure of formamide. The microwave spectrum of gas-phase formamide also
affords bond lengths, but different values are obtained. In the gas phase, formamide
is “isolated,” that is, it does not “perceive” any neighbors in its immediate vicinity.
The C═O double bond is shorter, and the C–N single bond is longer than in
crystalline formamide. In the crystal assembly, the individual formamide molecules
are not “alone.” They are connected to neighboring molecules by hydrogen bonds.
A hydrogen bond is a non-covalent interaction. It couples a functional group
carrying a hydrogen atom (e.g., NH or OH) with an electronegative heteroatom
(e.g., N, O; ▶ Sect. 4.2). Obviously, incorporating a molecule into a network of
G. Klebe, Drug Design, DOI 10.1007/978-3-642-17907-5_14,
© Springer-Verlag Berlin Heidelberg 2013
hydrogen bonds causes a change in its geometry. The electron density between the
atoms is shifted so that the C═O double bonds are longer and consequently weaker.
Simultaneously, the C–N single bonds become shorter and stronger. Twisting the
molecule around this bond away from planarity is therefore made difficult.
The amide bond is the fundamental building block of proteins. Every third bond
in the polymer chain is an amide bond. As we have seen in formamide, it has
a planar geometry, that is, a plane can be defined through its atoms. The folding of
the polymer chain and the concomitant spatial construction of the protein are
determined by the torsion angles of the amide-bond planes against one another
(Fig. 14.2). The rigidity and planarity of the amide bond are decisive for the stability
of the spatially folded protein. In proteins, the amide bonds are almost exclusively in
the trans configuration. Only the rotation of the amide-bond planes against one another
remains as a degree of freedom for the polymer chain. These torsions (▶ Chap. 16,
“Conformational Analysis”) occur around the bonds that adjoin the Cα carbon atoms. As
was shown in the bond-length comparison between the gaseous and crystalline
formamide, the decisive additional stiffening of the amide bond is caused by its
incorporation into a hydrogen-bonding network.
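A torsion (dihedral) angle of the kind discussed above can be computed from the coordinates of four consecutive atoms. The sketch below uses the standard atan2 formulation and assumes the usual sign convention (0° for cis, 180° for trans, positive for a clockwise twist when looking along the central bond). The coordinates are invented test points, not atoms of a real peptide.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

def dihedral_degrees(p0, p1, p2, p3):
    """Torsion angle in degrees about the p1-p2 bond for four points in space."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    # y is proportional to sin(angle), x to cos(angle); atan2 keeps the sign.
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Central bond along the z axis; the first atom lies on the x axis.
p0, p1, p2 = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
print("cis:    ", dihedral_degrees(p0, p1, p2, (1.0, 0.0, 1.0)))
print("trans:  ", dihedral_degrees(p0, p1, p2, (-1.0, 0.0, 1.0)))
print("twisted:", dihedral_degrees(p0, p1, p2, (0.0, 1.0, 1.0)))
```

Applied to four consecutive backbone atoms of a peptide, the same function would give the backbone torsion angles that describe the course of the chain.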
Bond length in Å      C=O      C–N
Crystal assembly      1.241    1.318
Gas phase             1.219    1.352
Fig. 14.1 Formamide 14.1 is the smallest molecule that has an amide group. Its molecular
structure is shown in the lower part. Because of thermal motion in the solid state, the molecule
carries out vibrational movements. Its electron density is therefore distributed over a larger area.
This is described by using ellipsoids that encompass the 50% probability of occurrence of the atom.
Two hydrogen bonds are formed between the carbonyl group and the amide group of
a neighboring molecule in the crystal packing. An extended H-bond network stabilizes the crystal
structure and polarizes the amide group. The bond lengths (in Å) are different in the crystal
assembly and in the gas phase (upper part).
[Graphic for Fig. 14.2: peptide backbone fragment with the dihedral angles φ and ψ marked at the Cα atom (above); Ramachandran plot with φ on the horizontal and ψ on the vertical axis, each running from −180 to 180 (below).]
Fig. 14.2 The spatial course of a polypeptide chain is determined by the relative orientation of the
planar peptide bonds (above). The twist of these planes against one another is measured on
the basis of the two twisting or dihedral angles φ and ψ. These cannot assume arbitrary values around
the bond axes, but rather are limited to a few combinations of value ranges. In the diagram, a
so-called Ramachandran plot, the values for both angles (below) along the peptide chain are
plotted. The angle combinations for an α helix are found in the middle left (Fig. 14.3), and those for
a β-pleated sheet in the top left (Fig. 14.4).
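The reading of a Ramachandran plot described in the caption above can be mimicked with a coarse classifier. The rectangular regions in the sketch are assumptions made for illustration. They are placed around commonly quoted typical values (roughly φ = −60°, ψ = −45° for a right-handed α helix and φ = −120°, ψ = +130° for β strands) and do not reproduce the contours of Fig. 14.2.

```python
def classify_phi_psi(phi, psi):
    """Assign a (phi, psi) pair in degrees to a coarse, illustrative region."""
    if not (-180.0 <= phi <= 180.0 and -180.0 <= psi <= 180.0):
        raise ValueError("angles must lie between -180 and 180 degrees")
    # Assumed box for the right-handed alpha helix (middle left of the plot).
    if -100.0 <= phi <= -30.0 and -80.0 <= psi <= -10.0:
        return "alpha helix"
    # Assumed box for beta strands (top left of the plot).
    if -180.0 <= phi <= -45.0 and 90.0 <= psi <= 180.0:
        return "beta sheet"
    return "other"

# Hypothetical backbone angles of three residues.
for phi, psi in [(-60.0, -45.0), (-120.0, 130.0), (60.0, -150.0)]:
    print(f"phi={phi:7.1f}, psi={psi:7.1f} -> {classify_phi_psi(phi, psi)}")
```

Real validation tools use empirically derived contour maps rather than boxes, so this sketch only conveys the idea that most residues fall into a few allowed value ranges.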
14.2 Proteins Fold in Space to Form α Helices and β Strands
Typically, the angles named φ and ψ are used to describe the two dihedral angles
around the Cα carbon atom, and these angles usually take on value pairs from two
ranges. These ranges are related to a helical or sheet-like course of the polymer
chain (Fig. 14.2). In an α helix with a right-handed turn, all CO and NH groups
orient in the same direction (Fig. 14.3). They form a network of H-bonds among
themselves. Each amino acid in the helix is in contact with the fourth amino
acid that follows it in the sequence. This unidirectional orientation of the polar groups of the
Fig. 14.3 The α helix is a commonly found secondary structure. The polypeptide chain forms
a right-handed spiral with a pitch of 5.4 Å and 3.6 amino acids per turn (a). All carbonyl groups
(oxygen is red) are oriented parallel to the helix axis in the same direction. The NH functionalities
(nitrogen is blue, hydrogen is light blue) are oriented in the opposite direction. The groups form
a pronounced hydrogen-bond network (violet dashed line) between themselves (b). The side chains
(R) on the Cα atoms are on the outside, pointing away from the helix axis. This forms a typical
furrow pattern that runs in a spiral over the surface. This “ridge and groove” pattern determines the
mutual packing of α helices in proteins.
amide bonds in an α helix has consequences for the electrostatic characteristics
(▶ Sect. 15.4). Whereas a helix is made up of amino acids from a single segment of
the peptide, amino acids from at least two sequence sections must come together to
form a β-pleated sheet. The strands can be hydrogen-bonded to each other in either
a parallel or antiparallel orientation relative to the polymer chain (Fig. 14.4). The
H-bond network follows a different pattern in the two orientations. The side
[Figure 14.4: hydrogen-bonding schemes of β strands in antiparallel and parallel orientation; a distance of 7 Å is marked.]
Fig. 14.4 A second important secondary structure, the β strand, is composed of multiple sections
of the polymer chain that exist in a stretched conformation (top). The strands can run parallel or
antiparallel. They are crosslinked to each other via hydrogen bonds (violet). The sheet-like
structure displays a zigzag wrinkle and is called a β-pleated sheet. The side chains (R) of the
amino acids point away from the pleated sheet, alternating above and below it.
chains alternate above and below the pleated sheet. The entire strand is slightly
twisted upon itself. Because of this, a pleated sheet of multiple strands has a twist to
it when viewed from the side (Fig. 14.5).
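The relationship between backbone dihedral angles and secondary structure described above can be sketched in a few lines of Python. The sketch assumes the usual atom quadruples (φ from C of the preceding residue, N, Cα, C; ψ from N, Cα, C, N of the following residue). The function names and the rectangular region boundaries in `classify` are illustrative assumptions chosen to match the "middle left" and "top left" areas of the Ramachandran plot; they are not boundaries given in the text.

```python
import math

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points given as (x, y, z)."""
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b0 = sub(p0, p1)  # from the second atom back to the first
    b1 = sub(p2, p1)  # central bond
    b2 = sub(p3, p2)  # from the third atom on to the fourth
    length = math.sqrt(dot(b1, b1))
    b1 = tuple(x / length for x in b1)
    # Keep only the components perpendicular to the central bond.
    v = sub(b0, tuple(dot(b0, b1) * x for x in b1))
    w = sub(b2, tuple(dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(dot(cross(b1, v), w), dot(v, w)))

def classify(phi, psi):
    """Assign a (phi, psi) pair in degrees to a rough Ramachandran region.

    The rectangular limits are assumptions for illustration only.
    """
    if -100 <= phi <= -30 and -80 <= psi <= -10:
        return "alpha helix"
    if -180 <= phi <= -45 and 90 <= psi <= 180:
        return "beta strand"
    return "other"
```

Applied to every residue of a chain, `classify(dihedral(...), dihedral(...))` gives a crude per-residue assignment. Real assignment programs also evaluate the hydrogen-bond pattern, which this sketch ignores.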
Aside from these two common secondary structures, other typical combinations
of torsion angles occur. A polymer chain that folds to a globular structure in space
must reverse its direction. This is achieved in the so-called turn or loop region.
Turns can be classified according to the number of involved amino acids and the
type of interaction that closes the turn. Loops that form a C═O···H–N hydrogen
bond in the direction of the polymer chain, inverse turns with hydrogen bonds in the
reversed orientation, and open turns in the chain that are held together by van der
Waals interactions and polar interactions can be distinguished from one another
(Fig. 14.6). A total of 158 turn classes were summarized in a recent evaluation by
Oliver Koch.
What force drives the organization of a protein? Amino acids possess
hydrophilic and hydrophobic side chains. Hydrophobic groups avoid aqueous
environments (▶ Sect. 4.2). During the folding of the polymer chain in an aqueous
medium, the hydrophobic amino acids aggregate to diminish their common hydrophobic
surface. That is why the hydrophobic amino acids are predominantly
found on the inside of a folded protein. The polar groups of the amide bonds of
the main chain become saturated in the secondary structure by hydrogen bonds.
The side chains of polar amino acids are only found on the inside of a protein if
they can form a polar interaction with another amino acid in the vicinity. Otherwise
they orient themselves on the outside of the protein, where they protrude into the
surrounding water. Proteins can also span a cell membrane. In those areas where
they have contact with the membrane, they have a large, cohesive hydrophobic
surface (Sect. 14.7). If the packing density in the interior of the protein is
Fig. 14.5 Within a β-pleated sheet of multiple strands, here shown with a parallel orientation,
a right-handed twist occurs. For simplification, the single β strands are indicated with an arrow.
The twist can be seen by the internal rotation of the arrow. The pleated sheet here is shown in two
perpendicular views.
considered, it is on the same scale as is found in crystals of small organic
molecules. The interactions that determine the molecular packing are identical
in both cases.
14.3 From Secondary Structure Via Motifs and Domains to
Tertiary and Quaternary Structure
Proteins organize their secondary structural segments in motifs. As an example,
the sequence of an α helix, a β strand, and another α helix makes up one motif.
Multiple motifs fold into domains to yield the tertiary structure of a protein.
Domains can be constructed predominantly from helices, pleated sheets, or
a combination of both building blocks. Often the domain has a particular function.
Many proteins are made up of a single domain. Complex proteins can be built
from multiple domains. If a complex assembly of multiple separate polymer
chains forms (e.g., as in hemoglobin), this is referred to as the quaternary
structure.
Despite the enormous multiplicity that can be achieved by combining the 20
amino acids into sequences, there seems to be a rather limited number of folding
possibilities for the domains. How many folding patterns exist in total is a matter of
speculation. Among the crystal structures that are known today, 1,150 different folding
patterns have been found. Because no new examples have been found in recent years
despite intense efforts, it can be assumed that there are perhaps 1,200 stable patterns.
This number is essentially based on data from globular enzymes and transport proteins.
Approximately 30% belong to one of the classes shown in Fig. 14.7. To date
perhaps only 100 structures are known from the group of membrane-bound proteins.
On the basis of these examples it seems difficult to make an estimate about possible
additional folding classes to be found in membrane proteins.
Drug design concentrates on the interaction of a ligand with a protein. Therefore
the structural considerations of chemists are usually limited to the amino acid
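[Figure 14.6: normal turns, CO(i)–NH(i+n), 3–6 amino acids (left); inverse turns, NH(i)–CO(i+n), 2–6 amino acids (middle); open turns, Cα(i)–Cα(i+n), 4–6 amino acids (right).]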
Fig. 14.6 The polymer chain of a globular protein reverses its direction in the loop or turn area.
Numerous turn patterns have been found. They are made up of 2–6 amino acids. Normal turns
(left) form a C═O···H–N hydrogen bond (violet) in the direction of the polymer chain. This
hydrogen bond runs in the reversed orientation in inverse turns (middle). Another group of open turns
(right) is held together by van der Waals contacts and polar interactions.
Fig. 14.7 The course of the polypeptide chains is symbolized with spirals for α helices, with
arrows for β-pleated sheets, and with threads for different turn segments. Approximately 30% of
the structurally known proteins can be assigned to one of the nine shown folding classes. The first
folding pattern (bottom left) is a “TIM barrel,” and the one above is an open pleated-sheet structure.
groups that protrude into the binding pocket. The folding pattern in the vicinity
of the binding pocket, however, exerts an influence on the properties that are
found there. For example, a helix that is arranged toward the binding pocket
decisively determines the local electrostatic potential. Even this can be exploited
for the design of selective ligands that bind only to proteins of a particular
folding class.
Despite progress in structure-determination techniques, it can
happen that the structure analysis of an important protein fails, but the structure of,
for example, a related protein can be solved. A model of the desired protein can be
built on this basis (▶ Sect. 20.5). Information about the construction and folding
principles of proteins is needed for this purpose. It allows an understanding of
which parts of the protein stabilize the scaffold, which parts determine function, and
which parts make up the differences between homologues.
An in-depth discussion of these principles is beyond the scope of this chapter. As an example,
the folding pattern of the β barrel is examined. A stretched-out sheet of
multiple β strands has an internal twist (cf. Fig. 14.5). If, for example, eight such
strands are lined up next to one another, a cylinder is formed. This barrel-like
folding pattern of eight or more strands is often observed. Several variations of
this folding pattern are displayed in Fig. 14.8; they show how, and according to
which principles, a polypeptide can fold in space.
A loop acts as a connecting element between the pleated sheet strands of the β
barrel in the example in Fig. 14.8. α Helices can also serve as connecting elements
(Fig. 14.7). A barrel-like structure forms, and the bridging α helices
align on its surface. This folding pattern was first discovered in triosephosphate
isomerase. It is therefore called a TIM barrel (Fig. 14.7). Another important
folding class that is made up of α-helical and β-pleated sheet segments is the
open-sheet structure (Fig. 14.7). In this class the pleated sheet does not close to
a cylinder but rather remains open. Helices group above and below the sheet.
14.4 Are the Fold Structure and Biological Function of Proteins
Correlated?
How is the structure of a protein coupled to its function? Do all proteases, for example,
display the same folding pattern? A large number of enzymes that have distinctly
different functions all belong to the TIM barrel type, or the open-sheet structure.
There are many oxidases, isomerases, kinases, aldolases, synthases, dehydrogenases,
or proteases that can be assigned to these two classes. Here, Nature started from
a common origin and developed divergently. Consequently, the function of a protein
is not necessarily coupled to a particular folding pattern. If the construction of the
enzyme is analyzed further, it turns out that the catalytic sites of the proteins of
a folding class are at the same position. This is found at the C-terminal end of the
barrel in the TIM-barrel structure, and at the topological switch of the connecting
helices from the upper to the lower side of the open-sheet structure (Fig. 14.9).
Fig. 14.8 The folding pattern of different β-barrel structures can be thought of as
a polymer chain with eight separate β strands (arrows). These are separated by loop areas.
(a) An up-and-down barrel forms when the folding of the polymer chain of eight β strands follows
a zigzag pattern. The antiparallel sections form hydrogen bonds between themselves that close
up to form a cylinder. (b) The four-β-strand polypeptide chains lie next to one another so that the
first chain interacts with the fourth, and the second interacts with the fifth. Then the double
strand folds and the first pair comes to lie next to the second. Because the course of the polymer
chain is reminiscent of the engravings on Greek vases, the pattern is called a Greek key. Two
such patterns can come together into a cylinder-like orientation and form a Greek-key barrel.
(c) Another folding pattern is formed from a double strand that is placed together with
an internal twist. The double strand wraps itself into a cylinder-like structure that is called a
jelly roll.
The function-determining amino acids occur in the loop area between neighboring
pleated sheets and helices. Why would Nature follow this principle of separating the
folding structure from the function? The amino acids that enable the stable folding of
a domain are separated from those that induce a specific function. This approach is
a very efficient evolutionary strategy. Two areas were simultaneously optimized:
• The stability of the protein scaffold in special folding patterns
• The layout of the amino acid sequence to serve a special function.
Spatially separating and displacing the function-carrying groups in the structur-
ally less-committed loop areas allowed the two tasks to be optimized in parallel.
Exchanging a single amino acid in a secondary structure element could destabilize
the entire folding pattern and stop the folding. This is avoided if the amino acid
sequence that is to be functionally optimized is placed on a stable scaffold that does
not interfere with the optimization.
A protein class that implements this principle to perfection is the immunoglobulins.
As antibodies they recognize and bind to xenobiotics, the antigens. To remove
an antigen, immunoglobulins with highly specific binding pockets and high affinity
must be available within a few days. The recognized substances could be anything
from small organic molecules to large proteins. It is estimated that
about 10¹² different variable sequences are formed, although there are only about 25,000
human genes. The difficult task of achieving such high diversity is solved by
immune-system cells by using a combination of different variable gene segments
and extensive amino acid exchange in these segments during lymphocyte maturation.
In this way, variable loop areas are formed that are set upon a stable scaffold of
Fig. 14.9 The folding-pattern-determining and function-carrying amino acid groups are found in
different regions of proteins. (a) The catalytic site (yellow spheres), which binds and transforms
substrates, lies in a TIM-barrel-type structure (α helices: red cylinders, β strands: light-blue arrows)
at the end of the barrel where one would expect to find a lid. The loops of the polymer chain that
surround this “lid” (gray and green threads) carry the function-determining amino acids. (b) In the
open-pleated-sheet structure, the function-determining amino acids occur in the loop area
where the attached helices change from the top to the bottom of the pleated sheet.
barrel-like pleated sheet structures (Fig. 14.10). The therapeutic value of such bio-
molecules (so-called biologicals) has been recognized. Many humanized antibodies
can be found in development as therapeutics (▶ Sect. 32.3).
14.5 Proteases Recognize and Cleave Substrates in
Well-Tailored Pockets
Proteases cleave polypeptide chains during enzymatic degradation or upon the
release of an active protein or peptide from an inactive precursor form. For this,
the enzymes possess a catalytic site in which the cleavage takes place (Sect. 14.6
and ▶ Chaps. 23, “Inhibitors of Hydrolases with an Acyl–Enzyme Intermediate”;
▶ 24, “Aspartic Protease Inhibitors”; and ▶ 25, “Inhibitors of Hydrolyzing
Metalloenzymes”). To recognize a particular substrate specifically, multiple
binding pockets are on their surface. These are structurally complementary to
the side chains of the substrate that orient themselves around the catalytic sites.
In 1967 Israel Schechter and Arieh Berger proposed a system of nomenclature to
describe these pockets (Fig. 14.11). The positions of the amino acids of the
peptide substrate are described as P3, P2, P1, P1′, P2′, P3′, and so forth. Starting
at the N terminus, the position P1 is immediately before and the position P1′ is
immediately after the cleavage site. The binding pocket of the enzyme for the
side chain of the amino acid P1 is called S1, and the same goes for the other side
chains. This very useful nomenclature is initially purely formal. The transfer
of these labels to a particular enzyme does not mean that the named binding
pocket really exists. Two binding pockets can appear as one large binding pocket
Fig. 14.10 The immunoglobulins form a highly specific binding pocket in which they recognize
antigens, which are exogenous substances. The enormously large structural variety of these binding
pockets is achieved by variations in the amino acids in the loop areas. The immunoglobulins
have a Y-like form that is divided into a trunk (constant Fc domain) and two identical Fab branches
(a). The course of the polymer chain in these branches corresponds to the barrel type. The antigen-
binding site is indicated by an arrow. Picture (b) is an enlargement of the circled branch in
(a). Loops are found at the right end (colored) that are responsible for the recognition of exogenous
substances. They grasp the antigen (here dark red) like the fingers of two hands.
in the 3D structure. The S3 and S4 binding pockets in the serine protease thrombin
are really only one large pocket (▶ Sect. 23.3). It can also happen that a substrate
amino acid has no complementary binding pocket in the enzyme. It then pro-
trudes into the water.
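The Schechter and Berger labeling scheme is purely positional, so it can be written down as a small function. The following Python sketch is only an illustration: the function name, the example peptide, and the use of an ASCII apostrophe for the prime are assumptions, not part of the original nomenclature paper. As stressed above, a label such as S3 is formal and does not guarantee that a distinct pocket exists in a given enzyme.

```python
def schechter_berger_labels(peptide, residues_before_cleavage):
    """Label each residue of a peptide substrate with its P position and S pocket.

    peptide: one-letter amino acid codes, written from N to C terminus.
    residues_before_cleavage: number of residues on the N-terminal side
        of the scissile bond (the last of them is P1).
    Returns a list of (residue, position, pocket) tuples.
    """
    if not 0 < residues_before_cleavage < len(peptide):
        raise ValueError("the cleavage site must lie inside the peptide")

    labels = []
    for index, residue in enumerate(peptide):
        if index < residues_before_cleavage:
            # N-terminal side: count down towards the cleavage site.
            number = str(residues_before_cleavage - index)
        else:
            # C-terminal side: count up away from the cleavage site, primed.
            number = str(index - residues_before_cleavage + 1) + "'"
        labels.append((residue, "P" + number, "S" + number))
    return labels


# Hypothetical hexapeptide, cleaved between the third and fourth residue.
for residue, position, pocket in schechter_berger_labels("AGRSLV", 3):
    print(residue, position, pocket)
```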
14.6 From Substrate to Inhibitor: Screening of Substrate
Libraries
Peptides are easily synthesized with enormous diversity (▶ Sect. 11.5). If the
peptide is attached to a probe that changes its color or fluorescence upon release
(▶ Sect. 7.2), the labeled peptide can be used to ascertain the substrate profile of
the protease. For this purpose a large library (▶ Sect. 11.1) of these peptides is
offered to the protease, and the members that are well cleaved are identified.
Fig. 14.12 shows the amino acid composition of labeled tetrapeptides that are
preferentially cleaved by the proteases trypsin, factor Xa, plasmin, and chymotrypsin.
Peptides with basic groups such as arginine or lysine are preferentially cleaved
by trypsin, plasmin, and factor Xa. Factor Xa converts peptides with arginine in
the P1 position almost exclusively. Chymotrypsin behaves entirely differently. It
prefers to have aromatic amino acids such as tyrosine, phenylalanine, and tryptophan
in the P1 position. The selectivity at the positions P2 to P4 is not nearly as
pronounced. Trypsin transforms tetrapeptides that have branched groups at P2
such as Phe, Tyr, Trp, Ile, or Val much more poorly if an arginine is at the P1
position. Basic groups are also less preferred. Trypsin shows virtually no selectivity
at the P3 and P4 positions. Factor Xa has a particular preference for the small
glycine at position P2, but hardly any difference at all is seen for the groups in the
[Figure 14.11: peptide substrate with the side chains R3, R2, R1, R1′, R2′, R3′ at positions P3 to P3′ and the corresponding binding pockets S3 to S3′.]
Fig. 14.11 The side chains of a peptide substrate and the binding pockets that they belong to
are classified on the N-terminal side of the peptide as P3, P2, P1... or S3, S2, S1... (left); on the
C-terminal side they are classified as P1′, P2′, P3′... or S1′, S2′, S3′... (right).
[Figure 14.12: (a) tetrapeptide library varied at P1, cleavage profiles for trypsin, factor Xa, plasmin, and chymotrypsin; (b) P1 held constant, profiles at P4, P3, and P2 for trypsin and factor Xa.]
Fig. 14.12 A tetrapeptide library, held constant in positions P2 to P4, was varied at position P1 with
19 amino acids (one-letter notation; n = norleucine). It is cleaved by trypsin after arginine and lysine,
by factor Xa after arginine, and by plasmin after lysine (a). If arginine is held in position P1 and the
remaining three positions are varied, trypsin shows practically no selectivity for the amino acids at
P2, P3, and P4. On the other hand, factor Xa prefers a glycine in position P2 (b).
P3 position for this enzyme. On the other hand, different groups in the P4 position
are more strongly selected. The substrate-binding profile helps to expose the
selectivity characteristics of enzymes. It reflects the complementary properties
of the binding pocket and helps to inspire the first ideas about the design of
possible inhibitors.
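The P1 preferences described for the four proteases can be turned into a very crude cleavage-site predictor. The Python sketch below encodes only the qualitative P1 rules stated in the text and in Fig. 14.12 (trypsin: Arg or Lys; factor Xa: Arg; plasmin: Lys; chymotrypsin: Tyr, Phe, Trp). The dictionary, the function name, and the example sequence are assumptions made for illustration; the weaker P2 to P4 preferences are deliberately ignored, so the output should not be read as a realistic prediction.

```python
# Preferred P1 residues (one-letter codes) as described qualitatively in the text.
P1_PREFERENCES = {
    "trypsin": {"R", "K"},
    "factor Xa": {"R"},
    "plasmin": {"K"},
    "chymotrypsin": {"Y", "F", "W"},
}


def predicted_p1_positions(sequence, protease):
    """Return 1-based positions of residues that could occupy P1.

    The bond following each returned position is a candidate cleavage site.
    The last residue is skipped because no peptide bond follows it.
    """
    preferred = P1_PREFERENCES[protease]
    return [index + 1
            for index, residue in enumerate(sequence[:-1])
            if residue in preferred]


# Hypothetical test peptide.
for name in P1_PREFERENCES:
    print(name, predicted_p1_positions("GARFKAWL", name))
```

A profile measured with a substrate library would replace the yes/no sets by relative cleavage rates for each residue at each position.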
This concept was applied to cysteine proteases in the research group of
Jonathan Ellman at the University of California at Berkeley. Substrate molecules
were synthesized that carried a fluorescence marker at the end of an amide bond
that was to be cleaved. Different organic building blocks were placed on the other
side. If such a substrate molecule is cleaved by the protease, the organic part
must be bound in the binding pocket of the enzyme. Therefore, the transformation
indicates the binding of a test molecule. The method can be optimally used
for screening. A hit that is discovered in this way can easily be chemically
transformed from a substrate molecule to an inhibitor. If the cleaved amide bond
is replaced with, for instance, an aldehyde function, a cysteine-protease inhibitor
(▶ Sect. 23.9) can be developed that has very little in common with the peptide
substrate.
14.7 When Crystals Learn to Walk: From Static Crystal
Structures to Dynamics and Reactivity
What kind of information about the dynamics and reactivity of molecules can be
extracted from a crystal structure? Molecular vibrations are visible even in the solid
state. This is reflected in the blurriness of the electron density. If a molecule takes
part in a reaction, bonds are broken and new ones are formed. The formation and
cleavage of amide bonds is a central task in biochemical processes. The molecule
14.2 contains an amide and an ester group (Fig. 14.13). If a crystal of this compound
is exposed to thermal energy, a reaction takes place in the solid state to form 14.3.
In the starting crystal structure the molecule already adopts a geometry that is
conducive to entry into the reaction pathway.
Having information about changes in the geometric orientation of functional
groups in the chemical reaction is decisive for understanding the concomitant
structural changes that occur. This knowledge is a prerequisite for the design of
transition-state-analogue inhibitors (▶ Sects. 6.6 and ▶ 22.3). In view of the for-
mation or cleavage of an amide bond, the question is posed: from which direction
does the amino group attack the carbonyl carbon in the course of the nucleophilic
addition to form a new bond?
In the early 1970s Hans-Beat Bürgi and Jack Dunitz began to extract information
about the geometric changes along such reaction steps from crystal structures. Before
there were movies and television, people developed creative ideas to bring pictures to
movement, for example, with flip-books (Fig. 14.14). These impart the impression of
the dynamic sequence of a story. Let us imagine that because of frequent use, the
pages of the little book have fallen apart and are now in disarray. You must bring
them into the correct order again. Ordering criteria are needed in this case. A similar
task is posed for the organization of structural data to describe a reaction. Particular
crystal structures are sought from databases of known crystal structures (▶ Sect. 13.9)
in which an amino group is in the vicinity of a carbonyl group, as in the structure of
14.2. Finally they are brought into a logical order (Fig. 14.15).
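The ordering step can be illustrated with a short Python sketch. Each "page" of the flip-book is a crystal-structure fragment with a nitrogen atom near a carbonyl group; the N···C distance serves as the ordering criterion, and the N···C═O angle describes the direction of approach. The coordinates below are invented for illustration and are not taken from any database entry. The function and variable names are likewise assumptions.

```python
import math


def approach_geometry(n, c, o):
    """Return the N...C distance (Å) and the N...C=O angle (degrees)."""
    distance = math.dist(n, c)
    cn = [a - b for a, b in zip(n, c)]  # vector from C to N
    co = [a - b for a, b in zip(o, c)]  # vector from C to O
    cos_angle = sum(a * b for a, b in zip(cn, co)) / (distance * math.dist(o, c))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return distance, math.degrees(math.acos(cos_angle))


# Hypothetical fragments in arbitrary order (coordinates in Å, made up).
fragments = {
    "page_3": {"N": (-0.58, 0.0, 1.91), "C": (0.0, 0.0, 0.0), "O": (1.22, 0.0, 0.0)},
    "page_1": {"N": (-0.88, 0.0, 2.87), "C": (0.0, 0.0, 0.0), "O": (1.22, 0.0, 0.0)},
    "page_4": {"N": (-0.44, 0.0, 1.43), "C": (0.0, 0.0, 0.0), "O": (1.22, 0.0, 0.0)},
    "page_2": {"N": (-0.73, 0.0, 2.39), "C": (0.0, 0.0, 0.0), "O": (1.22, 0.0, 0.0)},
}


def order_along_pathway(fragments):
    """Sort fragments from van der Waals contact towards bond formation."""
    rows = []
    for name, atoms in fragments.items():
        distance, angle = approach_geometry(atoms["N"], atoms["C"], atoms["O"])
        rows.append((name, distance, angle))
    return sorted(rows, key=lambda row: row[1], reverse=True)


for name, distance, angle in order_along_pathway(fragments):
    print(f"{name}: d(N...C) = {distance:.2f} Å, angle(N...C=O) = {angle:.1f}°")
```

In a real analysis the out-of-plane displacement of the carbonyl carbon would be evaluated as well, which requires the coordinates of its other two substituents.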
The systematic comparison of crystal structure data affords a first understanding
of structural molecular properties, for instance, about the preferred conformation
(▶ Sect. 16.4). The geometry of non-covalent interactions can also be evaluated this
way. The side chain of the amino acid histidine contains an imidazole ring with its
two nitrogen atoms. In the neutral state one of these nitrogen atoms is a hydrogen-
bond acceptor, and the other is a donor. There are hundreds of molecules with an
imidazole ring in the database of low-molecular-weight crystal structures. In these
structures the imidazole ring has, in fact, acceptor and donor interactions, usually
with neighboring molecules. All these structures are superimposed upon one another
based on their common imidazole ring (Fig. 14.16). The superposition shows in which spatial
directions the hydrogen-bonding partners of the imidazole nitrogen atoms are found. The
task of estimating the possible interaction positions in the binding site of the protein
for the functional groups of a ligand is undertaken in the course of de novo drug
design (▶ Chap. 20, “Protein Modeling and Structure-Based Drug Design”). Fur-
ther, this information is needed for comparing the binding properties of molecules
(▶ Chap. 17, “Pharmacophore Hypotheses and Molecular Comparisons”) or for the
exploration of binding pockets for their preferred ligand-binding sites (hot spots).
[Figure 14.13: (a) reaction of 14.2 to 14.3; (b) crystal structure of 14.2 with the atom labels C(1) to C(9), N(1), and O(1) to O(3).]
Fig. 14.13 If thermal energy is applied to a crystal of 14.2, the carbonyl group of the ester
function reacts with the amide NH2 group and an imide bond is formed between N(1) and C(8) to give
14.3 (a). A vibrational motion (b) that leads into the reaction must be assumed. Simultaneously
the ester bond between C(8) and O(2) is cleaved during the reaction steps.
Fig. 14.14 A story is shown in static pictures in flip-books. If the different pages of this story
flip past the eyes quickly enough, the impression of a dynamic process is given.
The database IsoStar, assembled at the Cambridge Crystallographic Data Centre,
makes numerous such contact geometries and spatial distributions available.
14.8 Solutions to the Same Problem: Serine Proteases with
Differing Folds Have Identical Function
It was shown in Sect. 14.4 that the amino acids that determine the folding
and function of a protein occur in separate parts of the structure. For enzymes
with the same function, Nature has come to the same solution, however, by different
folding.
The function and therapeutic significance of serine proteases will be discussed in
more detail in ▶ Chap. 23, “Inhibitors of Hydrolases with an Acyl–Enzyme Inter-
mediate”. A unit of three amino acids, the so-called catalytic triad, plays a key role
in accelerating the hydrolysis of amide bonds by these enzymes. The two amino
acids serine and histidine, and an acidic amino acid, such as aspartic or glutamic
acid, are found in a characteristic spatial orientation. They are defined by the narrow
Fig. 14.15 The formation or cleavage of an amide bond occurs by nucleophilic addition.
A nucleophile, for instance, an oxygen or nitrogen atom, approaches the planar carbonyl carbon
atom. During the reaction the carbon atom rises out of the plane of its three neighbors and adopts a tetrahedral
configuration. Examples were sought from low-molecular-weight crystal structures in which
a nitrogen atom approaches a carbonyl group at a distance between that of a single bond and that of a van der Waals
contact in the crystal packing. By superimposing these data it is recognizable that the approach
of the nucleophilic nitrogen towards the carbonyl group is “performed from back and behind.” With
this approach the carbon migrates out of the plane in the direction of the nucleophile. The
geometry of this reaction step also determines the structural composition of the catalytic center
of a variety of hydrolases (▶ Sect. 22.3).
boundaries that are established by the reaction geometry required for a nucleophilic
addition (Sects. 14.6 and ▶ 23.2). Their composition is ideally suited for the
cleavage of amide bonds.
The enzyme trypsin is constructed from two barrel-like subunits (Fig. 14.17a).
The catalytic site is located at the interface of these two subunits. Subtilisin is
another serine protease that belongs to the class of open-pleated-sheet structures.
The catalytic triad occurs in a loop area at the edge of the pleated sheet
(Fig. 14.17b). If the amino acids that are involved in the catalysis are removed
from the protein and superimposed in space, the identical geometry of the triad is
obvious. In addition to the mentioned enzymes, this catalytic triad is also encoun-
tered in lipases and esterases (▶ Sect. 23.7), which also cleave peptide or ester
bonds. Although they display divergent scaffold folding, the geometric orientation
of their triads is once again identical.
14.9 DNA as a Target Structure of Drugs
Our genetic information is encoded on the DNA molecule. It is a thread-like
molecule approximately 20 Å in diameter and reaches a length of up to 2 m in
the extended form. It is constructed as a double helix (Fig. 14.18). On the outside,
polymer chains of sugar and phosphate building blocks wind
like a guardrail around the base pairs. The bases form a complementary pair
on each step. The bases of a pair are coupled to one another by a hydrogen-bond
pattern. In doing so, a purine base (adenine A and guanine G) always interacts with
Fig. 14.16 The crystal packing of low-molecular-weight compounds affords an overview of
possible interaction geometries of hydrogen-bond donors (left) and acceptors (right) around the
nitrogen atoms of an imidazole ring. Accordingly all structures with an imidazole ring were sought
in which at least one of the two nitrogen atoms participates in a hydrogen bond. The superposition
of the structures shows where the positions of the interacting partners can be expected.
a pyrimidine base (cytosine C and thymine T; in the related RNA molecule,
thymine is replaced by uracil U; Figs. 14.18 and 14.19). The spiral staircase that
is formed has a pitch of 34 Å and completes a full turn after ten steps. The two
mutually wound polymer strands form two grooves of different sizes on their
surface (Fig. 14.18). If the DNA is examined from the side along the steps
at the major and minor groove, the characteristics of the base pairs become visible.
There are three functionalities in the minor groove that determine the interaction
with other molecules. In the major groove there are four. Interestingly, the pattern
that is read in the major groove is unambiguous because of the exposed properties
of each base pair on a step. In the minor groove, only AT/TA can be distinguished
from GC/CG (Fig. 14.19).
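The dimensions quoted for the double helix (pitch 34 Å, ten steps per turn, diameter about 20 Å, extended length up to 2 m) allow a quick back-of-the-envelope check. The following Python sketch simply carries out this arithmetic; the variable names are illustrative, and the numbers are the rounded values from the text, so the results are order-of-magnitude estimates only.

```python
# Values taken from the text (rounded).
PITCH_ANGSTROM = 34.0       # height of one full helical turn
STEPS_PER_TURN = 10         # base pairs per turn
DIAMETER_ANGSTROM = 20.0    # diameter of the double helix
EXTENDED_LENGTH_M = 2.0     # length in the extended form
ANGSTROM_IN_M = 1e-10

# Rise and rotation contributed by a single base-pair step.
rise_per_step_angstrom = PITCH_ANGSTROM / STEPS_PER_TURN
twist_per_step_degrees = 360.0 / STEPS_PER_TURN

# Number of steps needed to span the extended length.
base_pairs = EXTENDED_LENGTH_M / (rise_per_step_angstrom * ANGSTROM_IN_M)

# Ratio of length to diameter of the thread.
aspect_ratio = EXTENDED_LENGTH_M / (DIAMETER_ANGSTROM * ANGSTROM_IN_M)

print(f"rise per step: {rise_per_step_angstrom:.1f} Å")
print(f"twist per step: {twist_per_step_degrees:.0f}°")
print(f"base pairs in 2 m: {base_pairs:.2e}")
print(f"length / diameter: {aspect_ratio:.0e}")
```

The estimate of several billion base-pair steps illustrates how extreme the aspect ratio of the molecule is: the thread is about a billion times longer than it is wide.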
The base pairs on each three neighboring steps code for an amino acid
(▶ Sect. 32.6). To read this information unambiguously from the DNA, proteins
that regulate gene expression (so-called transcription factors) read the information
Fig. 14.17 Trypsin (a, red) and subtilisin (b, green) are serine proteases. They have the same
catalytic triad of serine, histidine, and aspartic acid. These function-determining amino acids are,
however, placed upon entirely different folding patterns. In the above-right picture, the courses of
the chains of both proteins are superimposed upon one another (c). Despite the different folds, the side chains of the
amino acids of the catalytic triad are in the same spatial position (d). The course of the polymer
chains is shown with colored ribbons, together with the spatial orientation of the side chains of the
three catalytic amino acids.
from the major groove, from the side (cf., ▶ Sect. 28.2). Only there is it possible to
read the prescribed code (AT, TA, GC, CG) unambiguously. Due to the many
outwardly oriented phosphate groups, the DNA molecule is heavily charged.
This charge is neutralized by the formation of ion pairs, mostly with magnesium.
Because of its important role in the mediation of genetic information, several
important drugs act on DNA. Two examples are briefly mentioned here. Cisplatin
14.4 is a reactive metal complex that can react with the nitrogen atoms of
two nucleobases on two adjacent steps of the DNA by exchanging both chlorine
substituents (Fig. 14.20). This crosslinking distorts the DNA in such a way that the
sequence information is no longer readable. Cisplatin and analogous derivatives such
as carboplatin are used in cancer therapy as potent chemotherapeutics. Daunorubicin
14.5 is a representative with a somewhat different mode of action, but it also prevents
the reading of the DNA base pairs. By slightly spreading the DNA along the chain the
planar molecular part of 14.5 slips largely between two adjacent base pairs and causes
a structural distortion of the DNA (intercalation). This intravenously administered
cytostatic is used in combination regimens for the treatment of acute
leukemias. Many natural products also use this so-called intercalation mechanism for
Fig. 14.18 The DNA molecule is built of single stair steps. A base pair forms each step.
The sugar phosphate chain suspends the steps like a double banister. It forms a major and
a minor groove on the surface. (a) A segment of DNA with 14 base pairs, (b) a schematic
representation with the sugar phosphate backbone as a gray arrow, thymine (light blue), adenine
(red), cytosine (violet), and guanine (light green). (c) A model of a DNA surface in which the
size difference between the minor and major grooves is emphasized. The individual bases
align according to their interaction properties (blue: H-bond donor, red: H-bond acceptor, gray:
hydrophobic contact).
their antibacterial activity spectrum. Other pharmaceutical research approaches try to
use segments of DNA themselves for therapy. Such modified-oligonucleotide thera-
peutics are discussed in ▶ Sect. 32.4.
14.10 Synopsis
• Every third bond in the polymer chain of a protein is an amide bond. It is the
fundamental building block in the protein backbone and the mutual spatial
arrangement of the sequential planar amide bonds determines the overall archi-
tecture of a protein.
Fig. 14.19 The DNA base pairs of cytosine (C) with guanine (G) and thymine (T) with adenine
(A) on the individual steps are formed by complementary hydrogen bonds. Each base carries a sugar
phosphate group that is coupled with the polymer chain. It affords a double-helical construction
with a minor (green) and major (yellow) groove (cf. Fig. 14.18). If viewed parallel to the steps,
four groups can be seen in the major groove that possess either hydrogen bond donors (blue),
acceptors (red), or hydrophobic properties (gray). Three such groups are aligned in the minor
groove. If an attempt is made to read the interaction pattern from this side, a GC pair cannot be
distinguished from a CG pair, nor an AT pair from a TA pair, because the orientation of the
interaction pattern cannot be recognized. In the major groove, on the other hand, the pattern of
exposed interactions is unambiguous. Therefore, proteins read the information on the DNA from the major groove.
• Typical arrangements involving the amide NH and C═O groups in hydrogen
bonds lead to α-helical and β-pleated sheet structures. Reversal of the polymer
chain in space is achieved in turns that can adopt a variety of distinct geometries.
• Helices, sheets, and turns, the secondary structure elements, assemble into
motifs and domains to form the tertiary and quaternary structure of proteins.
• The function of a protein is not necessarily coupled to a particular folding
pattern; however, the catalytic and ligand-functional sites within a folding
class are found at the same position.
• Nature separates fold-stabilizing residues from function-carrying amino acids to
keep the dual optimization problem separated.
• Proteases recognize peptide sequences specifically via the binding in well-
tailored pockets on both sides of the cleavage site.
• Peptide libraries with an attached photometric or fluorescent label that can be
cleaved by the protease reaction help to elucidate the substrate profile of different
proteases.
Fig. 14.20 Crystal structure of an oligomeric DNA segment after a reaction with cisplatin 14.4
(a) or intercalation with daunorubicin 14.5 (b). In both cases the DNA molecule is severely
distorted and the genetic information on the DNA cannot be read for cell division. Cisplatin reacts
with the nitrogen atoms of two nucleobases (here guanine) of the DNA on neighboring steps with
substitution of both chlorine atoms. With its planar tetracyclic ring system, daunorubicin interca-
lates between two neighboring base pairs by spreading the DNA along the helix axis. The
compound’s amino sugar is accommodated in the DNA minor groove.
• Structural arrangements of molecular portions found in multiple crystal struc-
tures can be arranged sequentially in a kinematic order to provide an idea of
a dynamic process.
• The spatial arrangement of amino acid residues exerting a particular chemical
transformation is highly conserved and can reside on protein architectures with
similar geometry that are constructed from deviating folds.
• The DNA molecule encodes our inheritance and forms a double helix of two
banister-like sugar-phosphate polymer chains wrapping around complementary
pairs of bases on successive steps. Through the H-bonding pattern of the central
bases each single DNA strand is complementary to the second strand. A minor
and a major groove are formed between the sugar-phosphate banisters. A unique
reading of the coding base pairs can be accomplished from the major
groove only.
Bibliography
General Literature
Branden C, Tooze J (1999) Introduction to protein structure, 2nd edn. Garland, New York
Bürgi HB, Dunitz JD (1994) Structure correlation, vol 1. VCH, Weinheim
Jeffrey GA, Saenger W (1991) Hydrogen bonding in biological structures. Springer, Berlin
Schulz GE, Schirmer RH (1978) Principles of protein structure. Springer, New York
Special Literature
Allen FH, Kennard O, Taylor R (1983) Systematic analysis of structural data as a research
technique in organic chemistry. Acc Chem Res 16:146–153
CSD Database: www.ccdc.cam.ac.uk/products/csd/
Klebe G (1994) The use of composite crystal-field environments in molecular recognition and the
de novo design of protein ligands. J Mol Biol 237:212–235
Koch O, Klebe G (2008) Turns revisited: a uniform and comprehensive classification of normal,
open, and reverse turn families minimizing unassigned random chain portions. Proteins: Struct
Funct Bioinform 74:353–367
Lario PI, Vrielink A (2003) Atomic resolution density maps reveal secondary structure dependent
differences in electronic distribution. J Am Chem Soc 125:12787–12794
Orengo CA, Jones DT, Thornton JM (1994) Protein superfamilies and domain superfolds. Nature
372:631–634
PDB Database: http://www.rcsb.org/pdb/home/home.do
Vyas K, Manohar H, Venkatesan K (1990) Thermally induced O to N acyl migration in
salicylamides. Thermal motion analysis of the reactants. J Phys Chem 94:6069–6073
Wood WJL, Patterson AW et al (2005) Substrate activity screening: a fragment-based method for
the rapid identification of nonpeptidic protease inhibitors. J Am Chem Soc 127:15521–15527
15 Molecular Modeling
Molecules are most commonly communicated in chemistry as two-dimensional
molecular representations. This formalism is tried and true and has proven to be
enormously fruitful. The ability of a chemist to quickly comprehend and intellec-
tually process structures should not be underestimated. The notation nonetheless
has its limitations. In particular, the three-dimensional shape of a molecule is not
directly apparent from the chemical formula. The geometry, however, is of great
importance for the physical, chemical, and biological properties of drugs and
consequently for drug design as well. Therefore structure determination (▶ Chap.
13, “Experimental Methods of Structure Determination”) is granted special impor-
tance. Whenever possible, the experimentally determined 3D structure of the active
substance and the target protein is consulted to explain the structure–activity
relationship. Often, however, these structures are not available. In these cases,
the explanation of the experimental results must rely on generated structural
models.
15.1 3D Structural Models as Well-Established Tools in Chemistry
Three-dimensional structure models have been used since Jacobus H. van’t Hoff
and Joseph Le Bel. Emil Fischer reported in his book Aus meinem Leben about
a vacation in Italy:
In the previous winter 1890/91 I was busy with the task of clarifying the configuration of
sugar, without entirely achieving my goal. Then the thought came to me in Bordighera that
the decision about the configuration of pentose has to do with its relation to trioxyglutaric
acid. Unfortunately for lack of a model I could not tell to what extent such acids are
possible according to theory and I therefore posed the question to Baeyer. He picks up such
things with great enthusiasm, and directly constructed carbon atoms from balls of bread
and toothpicks. But after many attempts he gave the cause up, ostensibly because it was too
hard. Later in Würzburg after considering good models at length, I managed to find the
conclusive solution.
G. Klebe, Drug Design, DOI 10.1007/978-3-642-17907-5_15,
© Springer-Verlag Berlin Heidelberg 2013
Linus Pauling was the first to propose the α helix as a secondary structure in
proteins.
The key to Linus’s success was his reliance on the simple laws of structural chemistry. The
α-helix had not been found by only staring at X-ray pictures. The essential trick, instead,
was to ask which atoms like to sit next to each other. In place of pencil and paper, the
main working tools for this work were a set of molecular models superficially resembling
the toys of pre-school children.
With these sentences the Nobel prize winner James Watson described the
approach of Pauling in his book The Double Helix. Pauling’s success was also
based upon well-founded proficiency in theoretical chemistry. That is how Pauling
knew that an amide bond is stiff and flat, whereas his rivals, William Bragg, Max
Perutz, and John Kendrew, mistakenly assumed that it would be flexible.
James Watson and Francis Crick went the same way as Pauling in the search for the
DNA structure:
We could thus see no reason why we should not solve the DNA problem in the same way [as
Pauling]. All we had to do was build a set of molecular models and begin to play—with luck
the structure would be a helix.
Working with molecular models cannot have been pure pleasure back then.
In one place in the book, for example, Watson writes:
Our first minutes with the models, though, were not joyous. Even though only about fifteen
atoms were involved, they kept falling out of the awkward pincers set up to hold them the
correct distance apart.
Later he mentions other problems:
No serious models were built, however, for several days. Not only did we lack the purine
and pyrimidine components, but we had never had the shop put together any phosphorus
atoms. Our machinist needed at least three days merely to turn out the simplest phosphorus
atoms. . .
Against this background the achievement of Watson and Crick seems even
more impressive. They were awarded the Nobel Prize in 1962 for the elucidation of
the double-helix structure of DNA. This example should underscore the importance
of models in science. To end with a word from Francis Crick: “A good model is
worth its weight in gold.”
15.2 Strategies in Molecular Modeling
In contrast to the 1950s and 1960s, computers are available today with impressive
graphical performance and high computing speed. Accordingly, programs are
available for working with molecular models. The new field of molecular model-
ing has been established. This term encompasses the display and manipulation of
realistic three-dimensional molecular structures along with the calculation of their
physicochemical properties. The most important methods that are employed in the
context of molecular modeling are summarized in Table 15.1.
In principle, molecular modeling can be approached from two sides. One
possibility is to extrapolate the geometry and physicochemical properties to the
investigated structure from known experimental data. In the other approach an
attempt is made to obtain as accurate a computed prediction as possible by starting
from first principles. Quantum chemical methods and force-field calculations
belong to this strategy. In practice both approaches are used in parallel and are
increasingly coupled to one another. When relevant experimentally determined
structures are available, it would be silly not to use these for the model construction.
On the other hand, the quantum chemical and molecular mechanical approaches are
broadly applicable and deliver reliable results.
The construction of a structural model is achieved in three steps:
• Generation of a starting model
• Optimization and analysis
• Work with the model.
It is advisable to stay as close to experimental structures as possible when
generating the starting model. For this, the crystal structure of an active
substance can be consulted. The Cambridge Crystallographic Database, in which
experimentally determined structures of small molecules are stored, is searched,
Table 15.1 Overview of the most important molecular modeling approaches in pharmaceutical research
Technique                                 Objective
Interactive computer graphics             Display of 3D structures
Modeling small molecules                  3D structure generation (CONCORD, CORINA)
                                          Molecular mechanics (force fields)
                                          Molecular dynamics
                                          Quantum mechanical techniques
                                          Conformational analysis
                                          Calculation of physicochemical properties
Comparing molecules                       Superimposition of molecules according to their similarity
                                          Volume comparisons
                                          3D-QSAR (e.g., CoMFA methods)
Protein modeling                          Sequence comparisons
                                          Protein homology modeling
                                          Protein-folding simulations
Modeling of protein–ligand interactions   Binding constant calculations
                                          Ligand docking
Ligand design                             Searches in 3D databases
                                          Structure-based ligand design
                                          De novo design
                                          Virtual screening
and the geometry of the hit most closely resembling the query molecule
is used. In the next step the molecule is optimized by a force-field calculation.
There are also standard programs for the generation of starting models that
translate a 2D structure formula into a 3D spatial structure according to the
principle of a molecular model kit. These “electronic molecule-construction kits”
have lists of bond lengths and angles as well as preferred fragment geometries
stored, and build molecules according to a sophisticated system of rules. In
fractions of a second they determine the 3D spatial structure for the 2D structural
formula. The programs CONCORD from Robert Pearlman in Austin, Texas, and
CORINA from Johann Gasteiger and Jens Sadowski at the University of Erlangen
are among the most important. Both programs are used to generate 3D structures of
small molecules. The 3D structure of a protein, however, cannot be built with these
programs. More sophisticated techniques are necessary for proteins (▶ Sect. 20.1).
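The idea of building a structure from stored bond lengths and angles can be illustrated with a minimal sketch. It places one atom at a time from a bond length, a bond angle, and a torsion angle. This is not the algorithm of CONCORD or CORINA, which rely on extensive rule systems and fragment geometries; the function names and the C–C bond length and C–C–C angle used for the butane skeleton are illustrative assumptions.

```python
import math


def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def unit(u):
    length = math.sqrt(dot(u, u))
    return tuple(a / length for a in u)


def place_atom(a, b, c, bond, angle_deg, torsion_deg):
    """Return d with |c-d| = bond, angle b-c-d and torsion a-b-c-d as given."""
    theta = math.radians(angle_deg)
    phi = math.radians(torsion_deg)
    bc = unit(sub(c, b))                 # axis of the central bond
    n = unit(cross(sub(b, a), bc))       # normal of the plane a-b-c
    m = cross(n, bc)                     # in-plane axis pointing toward a
    local = (-bond * math.cos(theta),
             bond * math.sin(theta) * math.cos(phi),
             bond * math.sin(theta) * math.sin(phi))
    return tuple(c[k] + local[0] * bc[k] + local[1] * m[k] + local[2] * n[k]
                 for k in range(3))


def torsion(a, b, c, d):
    """Torsion angle a-b-c-d in degrees (range -180 to 180)."""
    b1, b2, b3 = sub(b, a), sub(c, b), sub(d, c)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    return math.degrees(math.atan2(dot(cross(n1, n2), unit(b2)), dot(n1, n2)))


# Illustrative standard values (assumed): C-C bond in Å, C-C-C angle in degrees.
CC_BOND, CCC_ANGLE = 1.53, 111.0
theta = math.radians(CCC_ANGLE)
c1 = (0.0, 0.0, 0.0)
c2 = (CC_BOND, 0.0, 0.0)
c3 = (c2[0] - CC_BOND * math.cos(theta), CC_BOND * math.sin(theta), 0.0)

# Fourth carbon of a butane skeleton in the anti and gauche arrangement.
c4_anti = place_atom(c1, c2, c3, CC_BOND, CCC_ANGLE, 180.0)
c4_gauche = place_atom(c1, c2, c3, CC_BOND, CCC_ANGLE, 60.0)
```

A geometry generated in this way only serves as a starting model; as described above, it is subsequently optimized.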
15.3 Knowledge-Based Approaches
Perhaps the most often used technique for molecular modeling is the so-called
knowledge-based approach. Here an attempt is made to exploit the enormous
accumulated knowledge from experimentally determined molecular structures,
crystal packings, protein structures, protein sequences, and structure–activity rela-
tionships from protein–ligand complexes, etc., to efficiently solve the relevant
problem. Basically nothing more is done here than to imitate the approach that
a conscientious scientist would take with a computer program. Initially as much
experimental data as possible is collected and analyzed. Important information
sources are the Cambridge Crystallographic Database with over 500,000 crystal
structures of small molecules as well as the protein databank (PDB) with more than
80,000 protein and DNA structures. Physicochemical properties are also available
in databases. The Beilstein database, with almost 10 million chemical structures,
contains, for example, pKa values for more than 20,000 compounds. The challenge
lies in the extraction of the necessary data for the question at hand from the
enormous plethora of electronically available information. Furthermore, it must
be considered that the data comes from different sources and could be partially
erroneous.
The largest growth in electronically available data recently has occurred in the
area of DNA sequences. Hundreds of genomes have been sequenced, and new ones
are added weekly. The nearly endless number of sequences can only be conquered
with intelligent searching protocols. Knowledge-based approaches play a central
role in this area and in the modeling of protein structures.
15.4 Force-Field Methods
Force-field methods, also known as molecular mechanics, are empirical tech-
niques for the calculation of molecular geometries. The goal of a force-field
calculation is the determination of an energetically favorable three-dimensional
structure of a molecule, or of a complex of several molecules. The forces that act
between the atoms are described in an analytical form with the appropriate param-
eters. Covalent and non-covalent forces are considered. The central idea of molec-
ular mechanics is the assumption that the bond lengths and angles adopt values that
are close to standard values in molecules. Steric interactions, that is, the repulsion
of two atoms that are not directly connected to one another, can lead to the situation
that some bond lengths and angles cannot adopt their ideal values. These repulsive
interactions are also called van der Waals interactions. For the first time in 1946,
three terms, van der Waals interaction, bond stretching, and angle deformation,
were proposed that should be enough to calculate the structure and energy of
molecules. However, at that time the execution of such calculations was extremely
difficult. It was only after the availability of computers increased that molecular
mechanics calculations gained importance. In addition to the three originally
proposed terms, a typical force-field that is used today contains at least one
additional contribution that considers rotations around the dihedral angles
(Fig. 15.1). Furthermore, many force-fields use terms for electrostatic interactions.
For this, a partial charge must be assigned to each atom. The sum of these charges
results in the formal charge of the entire molecule. In most cases, this is set to zero.
Coulomb’s law is used to describe the forces that occur between charges. This
law states that the force between two charges is proportional to the product of the
charges and inversely proportional to the square of the distance between them; the
corresponding potential energy is inversely proportional to the distance. The
assignment of the charges and the correct choice of the dielectric constant are
critical for the correct treatment of electrostatic energy contributions. The
dielectric constant appears in the denominator of Coulomb’s law and can adopt values
between $\varepsilon = 80$ for water and $\varepsilon = 1$ for vacuum. Electrostatic
interactions are therefore strongly damped in water, whereas in a vacuum they reach
further. The choice of the correct dielectric constant for force-field calculations in
proteins is very difficult. Many values between $\varepsilon = 4$ and $\varepsilon = 20$
have been tried. The constant is sometimes assumed to depend on the environment so
that larger values are chosen near the surface than for the protein interior. The van
der Waals interactions are described by the Lennard–Jones potential. This potential
has an attractive term that falls off with $1/r^{6}$ and a repulsive term that falls
off with $1/r^{12}$ (Fig. 15.1). The combination of these terms yields a curve that
rises very steeply close to the atoms and approaches zero as the distance becomes
larger. In between, it passes through a potential energy minimum (▶ Fig. 18.5). In
addition to the $A/r^{12} - C/r^{6}$ form, other distance dependencies, for example
other exponents or exponential functions, are used in force-fields.
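The behavior of the two nonbonding terms can be checked with a few lines of code. The following sketch evaluates the Lennard–Jones and Coulomb contributions for a single atom pair. The well depth, minimum distance, and charges are hypothetical values chosen for illustration, and the conversion factor of about 332 kcal·Å/(mol·e²) is the usual constant for charges in units of the elementary charge and distances in Å.

```python
COULOMB_KCAL = 332.06  # kcal*Å/(mol*e^2); converts q_i*q_j/r to kcal/mol


def lennard_jones(r, a, c):
    """A/r^12 - C/r^6 for one atom pair (r in Å, energy in kcal/mol)."""
    return a / r**12 - c / r**6


def coulomb(r, qi, qj, eps=1.0):
    """Point-charge Coulomb energy, damped by the dielectric constant eps."""
    return COULOMB_KCAL * qi * qj / (eps * r)


def lj_minimum(a, c):
    """Position and depth of the minimum; dE/dr = 0 gives r^6 = 2A/C."""
    r_min = (2.0 * a / c) ** (1.0 / 6.0)
    return r_min, lennard_jones(r_min, a, c)


# Hypothetical pair parameters: minimum at 3.8 Å with a depth of 0.1 kcal/mol.
WELL_DEPTH, R_MIN = 0.1, 3.8
A = WELL_DEPTH * R_MIN**12
C = 2.0 * WELL_DEPTH * R_MIN**6
r_min, e_min = lj_minimum(A, C)

# Two opposite unit charges 3 Å apart in different dielectric environments.
e_vacuum = coulomb(3.0, +1.0, -1.0, eps=1.0)
e_protein = coulomb(3.0, +1.0, -1.0, eps=4.0)
e_water = coulomb(3.0, +1.0, -1.0, eps=80.0)
```

The same pair of charges interacts 80 times more weakly with $\varepsilon = 80$ than with $\varepsilon = 1$, which is why the choice of the dielectric constant has such a large influence on the result.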
A force-field is derived by calibrating upon the experimental data and upon the
results of high-level quantum mechanical calculations. For this the 3D structures of
small molecules as well as infrared and Raman spectroscopy-derived force con-
stants are used. It is clear that different parameters must be used for a single bond
between two carbon atoms than for a double bond. Therefore multiple different
atom types per element are used in a force-field. The crystal packing of small
organic molecules can be consulted for nonbonding interactions. Amino acids and
many functional groups of active compounds can occur in a protonated or
deprotonated state according to the applied pH conditions (so-called titratable
groups). The strength of the interactions strongly depends on the charge state of the
involved functional groups. The acidity or basicity of a given functional group is
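\[
E = E_{\text{bond length}} + E_{\text{bond angle}} + E_{\text{torsion}} + E_{\text{non-covalent}}
\]
\[
E = \sum_{\text{bonds}} \tfrac{1}{2} K_b (b - b_0)^2
  + \sum_{\text{bond angles}} \tfrac{1}{2} K_\Theta (\Theta - \Theta_0)^2
  + \sum_{\text{torsion angles}} \tfrac{1}{2} K_\Phi \bigl(1 + \cos(n\Phi - \delta)\bigr)
  + \sum_{\text{nonbonding atom pairs}} \left( A_{ij} r_{ij}^{-12} - C_{ij} r_{ij}^{-6} + \frac{q_i q_j}{\varepsilon r_{ij}} \right)
\]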
Fig. 15.1 E is the total energy of a molecule or a complex of several molecules. It is composed of
various contributions. The first term describes the energy change upon stretching or compressing
a chemical bond. In the example at hand, a so-called harmonic potential is used with the
force constant $K_b$ and the equilibrium bond length $b_0$ as parameters. The energy as a function of
the bond angle $\Theta$ is described in the second term. Here too, a harmonic potential is used, with the
force constant $K_\Theta$ and the equilibrium angle $\Theta_0$. The third contribution describes the change in
the energy upon changing the dihedral angle $\Phi$, and the last term stands for non-covalent interactions.
The sum of three terms is used for this last contribution. The first term $A_{ij}/r_{ij}^{12}$ is always positive
and rises quickly with decreasing distance. It describes the repulsion between atoms that come too
close together. The contribution from $-C_{ij}/r_{ij}^{6}$ is always negative and approaches zero with
increasing distance $r_{ij}$, though not as fast as the repulsive term. It describes attractive interactions,
which are also called dispersion interactions. Other attractive interactions exist between polar
molecules that are also proportional to $1/r_{ij}^{6}$ (for a description of the potentials see ▶ Sect. 18.12,
▶ Fig. 18.5). The last term $q_i q_j/(\varepsilon r_{ij})$ describes the electrostatic interactions according to Coulomb’s law,
based on a point-charge model. The dielectric constant is $\varepsilon$. The non-covalent
contribution to the total energy, without the electrostatic term, is called van der Waals energy.
determined by its pKa value. This indicates how easily a group accepts or releases
a proton. This property, in turn, depends heavily upon the partial charge that the
group carries and what other charges are in the immediate vicinity of the group.
Thus, the pKa value shifts if a functional group comes into an altered environment.
For example, carboxylic acids become more acidic when they are brought near
a positive charge. Conversely, they become less acidic if a partially
negatively charged group is nearby. This effect must be considered in a reliable
force-field calculation. An attempt can be made to predict the protonation state in
protein–ligand complexes with such calculations. For this, the contribution to the
energy content of the complex is determined by evaluating all possible combina-
tions of states of titratable groups. In this way the shift in the pKa values of
functional groups can be estimated.
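The enumeration of all protonation states can be sketched for a toy system. The model below is a strongly simplified assumption and not a published parameterization: each titratable group has an intrinsic pKa, and charged groups interact through a single pairwise coupling energy. The site definitions and the coupling of 1.5 kcal/mol are hypothetical.

```python
import itertools
import math

RT = 0.593              # kcal/mol at about 298 K
LN10 = math.log(10.0)


def state_energy(state, sites, ph, coupling):
    """Relative free energy (kcal/mol) of one protonation state.

    state[i] is 1 if site i is protonated and 0 otherwise.
    """
    energy = 0.0
    charges = []
    for protonated, site in zip(state, sites):
        if protonated:
            # Protonation costs free energy above the intrinsic pKa.
            energy += RT * LN10 * (ph - site["pka"])
        charges.append(site["charge_deprot"] + protonated)
    for i, j in itertools.combinations(range(len(sites)), 2):
        # coupling[(i, j)]: energy of two like unit charges on sites i and j.
        energy += coupling.get((i, j), 0.0) * charges[i] * charges[j]
    return energy


def protonated_fraction(index, sites, ph, coupling):
    """Boltzmann-weighted probability that one site carries its proton."""
    total = bound = 0.0
    for state in itertools.product((0, 1), repeat=len(sites)):
        weight = math.exp(-state_energy(state, sites, ph, coupling) / RT)
        total += weight
        if state[index]:
            bound += weight
    return bound / total


# Hypothetical example: a carboxylic acid next to an amine.
sites = [
    {"name": "carboxylic acid", "pka": 4.5, "charge_deprot": -1},
    {"name": "amine", "pka": 10.5, "charge_deprot": 0},
]
isolated = protonated_fraction(0, sites, ph=4.5, coupling={})
near_amine = protonated_fraction(0, sites, ph=4.5, coupling={(0, 1): 1.5})
```

At a pH equal to its intrinsic pKa, the isolated acid is half protonated. Next to the positively charged amine, the protonated fraction drops markedly, which corresponds to the increased acidity described above. Because the number of states doubles with every additional site, a full enumeration of this kind is only feasible for a small number of titratable groups.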
The importance of water as a binding partner in the formation of protein–ligand
complexes was emphasized in ▶ Chap. 4, “Protein–Ligand Interactions as the Basis
for Drug Action”. Complex formation causes a change in the solvation conditions
for the involved molecules. This must be considered in the force-field calculations.
For this, a force-field is combined with estimations for the contribution from
solvation. Newer methods such as the MM-PBSA or MM-GBSA methods try to
sum up these contributions over the local environment in a surface-dependent way.
The choice of a relevant starting geometry is important for any force-field
calculation. A force-field calculation leads to an energy minimization. By starting
from an energetically unfavorable geometry, the force field drives “downhill” to the
next local minimum on the multidimensional energy surface (▶ Sect. 16.2). If one
starts with two different geometries, the resultant minimized structure can also
be different. Many molecules and especially protein–ligand complexes can adopt
numerous energetically favorable conformations. It is therefore recommended that
multiple force-field calculations are performed by starting from different
geometries.
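The dependence on the starting geometry can be demonstrated with a one-dimensional example. The sketch below minimizes a toy torsion potential by steepest descent from two different starting angles. The functional form and its coefficients are assumptions chosen only to produce several minima; they do not belong to any particular force field.

```python
import math


def torsion_energy(phi):
    """Toy torsion potential (kcal/mol) with minima near 60, 180 and 300 degrees."""
    return 1.0 * (1.0 + math.cos(3.0 * phi)) + 0.5 * (1.0 + math.cos(phi))


def torsion_gradient(phi):
    """Analytical derivative of torsion_energy with respect to phi (radians)."""
    return -3.0 * math.sin(3.0 * phi) - 0.5 * math.sin(phi)


def minimize(phi_deg, step=0.05, max_iter=2000, tol=1e-8):
    """Steepest descent: always walk downhill to the nearest local minimum."""
    phi = math.radians(phi_deg)
    for _ in range(max_iter):
        gradient = torsion_gradient(phi)
        if abs(gradient) < tol:
            break
        phi -= step * gradient
    return math.degrees(phi)


# Two starting geometries end in two different minima.
gauche = minimize(50.0)
anti = minimize(170.0)
e_gauche = torsion_energy(math.radians(gauche))
e_anti = torsion_energy(math.radians(anti))
```

The run started at 50° ends in the local minimum close to 60°, although the minimum at 180° is lower in energy. Only the second run finds it, which is the reason for starting from several geometries.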
15.5 Quantum Chemical Methods
In quantum mechanical approaches, the electronic structure of molecules is calcu-
lated by using the Schrödinger equation. Its mathematically closed solution is,
however, only possible for simple cases such as the hydrogen atom or the molecular
ion of hydrogen, $\mathrm{H_2^+}$. For molecules with multiple electrons, approximate methods
must be used for the solution of the quantum mechanical “many-body problem.”
The most commonly used approximation is the so-called Hartree–Fock method.
Here, the many-body problem is reduced to multiple single-body problems. The
sum of the electron–electron interactions within a molecule is replaced with an
effective field that can be iteratively refined. It is from this that the commonly used
name, SCF (self-consistent field) is derived. Each electron in this model “sees”, in
addition to the potential of the nuclei, the averaged potential of the remaining
electrons. The state of each electron in a molecule is described by a single-particle
function, the so-called atomic orbital (AO) or, in a molecule, molecular orbital
(MO). The wave function of the entire molecule is constructed as the antisymmetric
product of these orbitals. The Hartree–Fock equation is obtained on the
condition that optimally chosen orbitals lead to minimal energy. The main defi-
ciency of the Hartree–Fock approach, namely, neglecting the electron correlation,
can be corrected with more elaborate methods, whereby the calculation time,
however, severely increases.
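The antisymmetric product described above is usually written as a Slater determinant. The following lines are a sketch in common textbook notation; the symbols $\chi_i$ for the single-particle functions, $\hat{F}$ for the effective one-electron (Fock) operator, and $\epsilon_i$ for the orbital energies are not taken from the text and are introduced here only for illustration.

```latex
% Wave function of N electrons as an antisymmetric product of orbitals
\Psi(x_1, \dots, x_N) = \frac{1}{\sqrt{N!}}
\begin{vmatrix}
\chi_1(x_1) & \chi_2(x_1) & \cdots & \chi_N(x_1) \\
\chi_1(x_2) & \chi_2(x_2) & \cdots & \chi_N(x_2) \\
\vdots      & \vdots      & \ddots & \vdots      \\
\chi_1(x_N) & \chi_2(x_N) & \cdots & \chi_N(x_N)
\end{vmatrix}

% One effective single-particle equation per orbital
\hat{F}\,\chi_i = \epsilon_i\,\chi_i , \qquad i = 1, \dots, N
```

Exchanging two electrons swaps two rows of the determinant and changes its sign, which is the required antisymmetry. Because $\hat{F}$ contains the averaged field of all other electrons and therefore depends on the orbitals themselves, the equations have to be solved iteratively until the field is self-consistent.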
Quantum mechanical ab initio calculations allow the calculation of the molec-
ular structure and electron density distribution as well as molecular properties
without the assumptions that are necessary for force-field calculations. In many
cases it is difficult to make predictions a priori based on the hybridization state of
the atoms. In the case of amines and sulfonamides, it is often impossible to predict
whether the atoms that are bound to nitrogen are in the same plane or whether
nitrogen is in a pyramidal environment. In a force-field calculation one must specify
from the very beginning what atom type is to be assigned to which atom (i.e., for
the above case, whether it should be a planar or a pyramidal nitrogen atom). If
the wrong atom type is chosen, the resulting structure is, of course, meaningless.
Quantum mechanical calculations require no such assumptions.
The majority of currently applied force-fields use a point-charge model to
describe the electrostatic interactions. One possibility to derive the atomic charges
is to calculate the electrostatic potential of a small molecule that contains the group
in question by using quantum-mechanical methods. Subsequently, a set of partial
charges is assigned to the various nuclei so that the quantum mechanically calcu-
lated potential is depicted as accurately as possible. These charges can then be
transferred to force-field calculations to be used in a large system.
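The fitting step can be illustrated for the simplest conceivable case, a neutral diatomic molecule, where only one independent charge has to be determined. In the sketch below a synthetic reference potential stands in for the quantum-mechanically calculated one. The atom positions, the grid, and the reference charge of 0.4 are assumptions, and atomic units are used throughout.

```python
import math


def potential(point, atoms, charges):
    """Electrostatic potential of point charges at one grid point (atomic units)."""
    return sum(q / math.dist(point, pos) for pos, q in zip(atoms, charges))


def fit_neutral_diatomic(atoms, grid, reference):
    """Least-squares charge q on atom 0 (and -q on atom 1) for a given potential."""
    numerator = denominator = 0.0
    for point, v_ref in zip(grid, reference):
        # Potential generated at this point by q = 1 under the neutrality constraint.
        basis = 1.0 / math.dist(point, atoms[0]) - 1.0 / math.dist(point, atoms[1])
        numerator += basis * v_ref
        denominator += basis * basis
    return numerator / denominator


# Hypothetical diatomic molecule along the x axis (distances in bohr).
atoms = [(0.0, 0.0, 0.0), (2.1, 0.0, 0.0)]
# Coarse grid of points outside the molecule.
grid = [(x, y, z) for x in (-4.0, 0.0, 4.0) for y in (-4.0, 4.0) for z in (-4.0, 4.0)]
# Synthetic stand-in for the quantum-mechanically calculated potential.
reference = [potential(p, atoms, (0.4, -0.4)) for p in grid]

q_fit = fit_neutral_diatomic(atoms, grid, reference)
```

For larger molecules the same idea leads to a linear least-squares problem with one unknown per atom and the total charge as a constraint.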
A further important application of quantum-mechanical calculations in drug
design is found in the calculation of conformational energies of small molecules
to calibrate force-fields. The force-fields that have been developed for proteins
and peptides are based on conformational energies that have been quantum-
mechanically calculated for small peptides.
In contrast to force-field methods, quantum-mechanical techniques are able to
consider the polarization of the electron density caused by the influence of neighboring
groups. For example, the amide bond dipoles in an α helix are all oriented in the same
direction so that they sum up to a significant total dipole moment. As a consequence,
such large cumulative dipoles can polarize other groups that are localized at the end of
the helix. Dipoles induced in this way are incompletely described by force-field
methods. For quantum-mechanical methods, this is not a problem. A further important
application area is chemical reactions for which force-fields are hardly parameterized
at all, with the exception of a few special cases. Here quantum mechanical methods
are the only possibility for theoretical description.
Quantum-mechanical methods are considerably more elaborate than force-field
methods. The most accurate methods, which also devour the most calculation time,
are the so-called ab initio methods. These techniques meet their limits however,
with very large systems. Therefore other less computationally demanding methods
were developed. In these so-called semiempirical methods, certain integrals, the
determination of which represents the rate-determining step in ab initio methods,
are replaced with adequate approximations that are quickly calculated. The drasti-
cally reduced calculation time that results, which is nevertheless accompanied by
reduced accuracy, allows the routine application of semiempirical calculations to
active molecules and proteins. Density functional theory represents another faster
ab initio technique. With this method, the position-dependent electron density
distribution is calculated in the ground state for a many-body system; the complete
solution to the Schrödinger equation for a many-body system is avoided. All of the
interesting properties are then derived from the electron density. Techniques have
been developed for large protein–ligand systems that treat the interesting areas, for
example, the binding site or the catalytic reaction center, quantum mechanically.
The surrounding areas are approximated with a faster force-field method (QM/MM
methods).
15.6 Computing Molecular Properties
The result of a molecular mechanics or quantum chemical calculation is at first a set
of atomic coordinates that define the three-dimensional shape of the molecule.
What can be done with this? An important application of the calculations is the
determination of conformational energies: this is the relative energy of a molecular
conformation in comparison to another (▶ Sect. 16.1).
Two further molecular properties can be calculated: the form and size of
a molecule along with its electronic characteristics. All of the currently used
graphics programs have multiple options for the display of the spatial structure of
molecules. The most important are summarized in Fig. 15.2.
The most often used representation is a line or stick representation (Dreiding
models); sometimes atoms are displayed as little spheres. As a general rule, color
coding is used to denote the atoms: nitrogen is blue, oxygen is red, sulfur is yellow,
fluorine is turquoise, chlorine is green, bromine is brown, and iodine is violet.
Hydrogen atoms are shown in white, but usually they are omitted for the sake of
clarity. Carbon atoms are generally shown in black or gray. In the majority of
figures in this book, carbon atoms that belong to protein are shown in orange, and
carbon atoms that belong to the ligand are shown in gray. Another display option is
the space-filling model, with which van der Waals surfaces are shown. For this
representation each atomic nucleus is shown with a sphere, the size of which
corresponds to the van der Waals radius. Values for these radii come from the
crystal packing or from very exact ab initio calculations. Such representations are
also known as CPK models (named after the scientists Corey, Pauling, and Koltun).
Furthermore there are other options for displaying surfaces (Fig. 15.3). The
solvent-accessible surface has proven particularly valuable for proteins. The
most-used protein-display form in this book is transparent-opaque white surfaces.
The van der Waals surfaces in Fig. 15.3a give the impression that a crack is present
at the position that is marked with the arrow. This crevice, however, is so narrow
that no other atom fits inside. Therefore the solvent-accessible surface (Fig. 15.3b)
is less misleading. It is generated by rolling a sphere with a radius of 1.4 Å, which
corresponds to the size of a water molecule, over the surface of the molecule. This
surface appears much smoother. Depressions that are still present mean that
small molecules – at least a water molecule – can really fit in there. The Lee–
Richards surface is less frequently used but very helpful. It is so chosen that ligand
atoms that come into contact with the examined surface lie directly on this surface
(Fig. 15.3c).
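The rolling-probe idea can be illustrated numerically. The following is a minimal sketch, assuming the Shrake–Rupley point-sampling scheme (a technique swapped in here for illustration, not one named in the text). Note that it estimates the area traced by the center of the 1.4 Å probe, which corresponds to what the text calls the Lee–Richards surface. The function names, the number of test points, and the atom radii are hypothetical.

```python
import math

PROBE_RADIUS = 1.4  # Å, the water-sized probe radius quoted in the text


def sphere_points(n):
    """Return n roughly evenly spread unit vectors (golden-spiral lattice)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        ring = math.sqrt(1.0 - z * z)
        phi = golden_angle * i
        points.append((ring * math.cos(phi), ring * math.sin(phi), z))
    return points


def probe_center_areas(atoms, probe=PROBE_RADIUS, n_points=500):
    """Estimate, per atom, the area (Å^2) that the probe center can reach.

    atoms: list of (x, y, z, vdw_radius) tuples in Å (illustrative input format).
    """
    unit_sphere = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        reach = ri + probe  # distance of the probe center from the atom center
        exposed = 0
        for ux, uy, uz in unit_sphere:
            px, py, pz = xi + reach * ux, yi + reach * uy, zi + reach * uz
            # A test point is buried if the probe there would overlap another atom.
            buried = any(
                (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2 < (rj + probe) ** 2
                for j, (xj, yj, zj, rj) in enumerate(atoms)
                if j != i
            )
            if not buried:
                exposed += 1
        areas.append(4.0 * math.pi * reach * reach * exposed / n_points)
    return areas
```

For a single isolated atom every test point is exposed, so the estimate equals the full sphere area 4π(r + 1.4 Å)². As soon as a second atom comes close, part of that sphere becomes unreachable for the probe, which is the numerical counterpart of a crevice that is too narrow for a water molecule.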
The surface can be colored too. For example, a color can be assigned to each
atom type, and then the color of the next-closest atom can be used for the surface.
A representation in which the molecule’s surface is colored according to other
properties, for example, electrostatic or hydrophobic potential, is very instructive.
Fig. 15.2 Different computer graphics representations of dopamine (▶ Sect. 1.4, Formula 1.13).
Carbon atoms are colored gray, hydrogen atoms are white, nitrogen atoms are blue, and oxygen
atoms are red. (a) Dreiding models. (b) Ball-and-stick models. (c) Space-filling models (CPK
representation). (d) Solvent-accessible surface. (e) Electrostatic potential projected on the surface
(positively charged areas are blue, negatively charged areas are red). (f) Highest-occupied
molecular orbitals (HOMO), calculated for the uncharged dopamine molecule. The blue or red
areas of the wave function indicate a different sign.
15.7 Molecular Dynamics: Simulation of Molecular Motion
None of the processes that are interesting to us run at 0 Kelvin, but rather at body
temperature, which is approximately 310 Kelvin. It is therefore clear that not only
the potential energy but also the kinetic energy must be considered. Molecules
move at room temperature. They diffuse and change their shape in that they adopt
different conformations. The flexibility and adaptability of both partners play a big
role in protein–ligand interactions. A prerequisite for protein binding is that the
ligand can take on a conformation that corresponds to the shape of the binding
pocket. On the other hand, the protein is flexible to a certain extent. For example,
side chains on the surface can adopt different conformations or entire domains can
move relative to one another. The mutual adaptation of protein and ligand shapes
plays an important role in the formation of protein–ligand complexes in particular.
The molecular dynamics simulation (MD) is a theoretical method to describe
these effects. In molecular dynamics simulations the movement of atoms and
molecules is followed under the influence of the chosen force-fields. It is assumed
in these calculations that the interactions between particles obey the laws of
classical mechanics. For this, the Newtonian equations of motion are solved in
parallel and stepwise for all particles simultaneously. Usually it is assumed that the
force between two particles is not influenced by other particles.
Fig. 15.3 Definitions of molecular surfaces. (a) van der Waals surface. The arrow marks a place
where a crevice is found, but it is too small to accommodate a water molecule. (b) Solvent-
accessible area. (c) Lee–Richards surface.
In practical applications, a starting geometry is generated at first (Fig. 15.4). If an
experimentally determined structure, for instance, the crystal structure of a protein–
ligand complex, is available, then that is the starting point. To take the surrounding
water shell into consideration, the complex is dipped into a “water bath,” that is,
a large number of water molecules enclose it. Further, an adequate number of ions is
added to keep the whole system in an electrically neutral state. To prevent boundary
effects on the “walls,” a trick called “periodic boundary conditions” is used on the
water bath. If the simulated protein complex approaches such a wall and wants to
leave the water bath, the process is handled on the computer as though the complex
had again entered from the opposite side. Formally, the boundary areas of the water
bath are eliminated.
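The re-entry rule can be written down in a few lines. This is a minimal sketch, assuming a cubic water bath whose edge length `box` is a free parameter; the function names are illustrative. The second function uses the common "minimum image" convention (an assumption not spelled out in the text) so that distances between particles are measured across the eliminated walls as well.

```python
def wrap(position, box):
    """Map a coordinate triple back into the box [0, box) along each axis."""
    return tuple(c % box for c in position)


def minimum_image(a, b, box):
    """Shortest displacement vector from b to a under periodic boundaries."""
    delta = []
    for ca, cb in zip(a, b):
        d = ca - cb
        d -= box * round(d / box)  # shift by whole box lengths
        delta.append(d)
    return tuple(delta)
```

With a box edge of 10 Å, a particle that drifts to x = 10.5 Å reappears at x = 0.5 Å, and two particles at x = 0.5 Å and x = 9.5 Å are treated as being only 1 Å apart.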
In the beginning of the actual simulation each atom is assigned a random starting
velocity with an arbitrary orientation. The velocities are chosen so that on average
they correspond to the desired temperature (Boltzmann distribution). Then all
forces from all surrounding atoms acting on a particular atom are calculated.
At set time intervals the next position is calculated with the Newtonian equations of
motion, and so forth. The step width is typically a femtosecond (1 fs = 10⁻¹⁵ s). This
small step width is necessary because there are many extremely fast processes that
occur on the molecular level. The development of the movement is followed for
multiple nanoseconds, and is shown in terms of a trajectory. Ten nanoseconds are
enough to follow the movement of side chains and sometimes even of protein
domains. It is not enough, however, to describe the diffusion of an active compound
into the binding pocket. For this, longer simulation times are necessary. The folding
of a protein is also difficult to simulate with this technique. The necessary time for
protein folding is on the actual time scale between 20 ms and 1 h. The calculation of
one time step (1 fs) still requires seconds of processing time on even the fastest
computers. Nonetheless new algorithms and computers with more specific archi-
tectures are being developed that will make such simulations possible in the
foreseeable future.
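The loop described above (random starting velocities, force calculation, stepwise update, saving of coordinates) can be sketched for the smallest possible system. The example below assumes the velocity Verlet integration scheme, which is one common choice and is not named in the text, and a one-dimensional two-atom "molecule" held together by a harmonic spring. The masses, spring constant, and bond length are illustrative values in the unit system amu, nm, ps, and kJ/mol. With a step width of 1 fs, a 10 ns trajectory corresponds to 10⁷ such steps.

```python
import math
import random

KB = 0.0083145  # Boltzmann constant in kJ/(mol K)


def bond_forces(x, k, r0):
    """Forces (kJ/(mol nm)) on two atoms joined by a harmonic spring."""
    stretch = (x[1] - x[0]) - r0
    return [k * stretch, -k * stretch]


def total_energy(x, v, masses, k, r0):
    """Kinetic plus potential energy in kJ/mol."""
    kinetic = sum(0.5 * m * vi * vi for m, vi in zip(masses, v))
    potential = 0.5 * k * ((x[1] - x[0]) - r0) ** 2
    return kinetic + potential


def run_md(n_steps=1000, dt=0.001, temperature=310.0, seed=1):
    """Integrate with velocity Verlet; dt is in ps (0.001 ps = 1 fs)."""
    rng = random.Random(seed)
    masses = [12.0, 12.0]  # amu (assumed: two carbon-like atoms)
    k, r0 = 3.0e5, 0.15    # assumed spring constant (kJ/(mol nm^2)) and length (nm)
    x = [0.0, 0.16]        # start with a slightly stretched bond
    # Random starting velocities whose spread matches the desired temperature.
    v = [rng.gauss(0.0, math.sqrt(KB * temperature / m)) for m in masses]
    f = bond_forces(x, k, r0)
    trajectory, energies = [], []
    for _ in range(n_steps):
        v = [vi + 0.5 * dt * fi / m for vi, fi, m in zip(v, f, masses)]  # half kick
        x = [xi + dt * vi for xi, vi in zip(x, v)]                       # move
        f = bond_forces(x, k, r0)                                        # new forces
        v = [vi + 0.5 * dt * fi / m for vi, fi, m in zip(v, f, masses)]  # half kick
        trajectory.append(tuple(x))                                      # save
        energies.append(total_energy(x, v, masses, k, r0))
    return trajectory, energies
```

A useful sanity check for any such integrator is that the total energy stays nearly constant along the trajectory. If the step width is made much larger than a femtosecond, the fast bond vibration is no longer resolved and the energy drifts.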
Another application of MD simulations, the calculation of binding affinity,
should be mentioned here. In principle the free energy ΔG for a given system can be
calculated. From the point of view of statistical thermodynamics the so-called
partition function (German: Zustandssumme) is determined for this, in which the
energetic contributions of all possible configurations of a system are considered.
The entropic component of the system is automatically calculated by determining
the distribution and relative population of the many states. Differences in the
binding free energy of different ligands are of particular interest in the context of
protein–ligand interactions. Experience has shown, however, that only differences in the
binding free energy between two similar ligands can be reliably calculated. In
modern applications (e.g., for screening purposes, ▶ Sect. 7.4), particularly large
amounts of data are evaluated. Therefore, the effort associated with MD
calculations to estimate the binding affinities can hardly be afforded. Furthermore,
many simple empirical methods allow a good affinity estimation to be made that is
of similar quality. Therefore these faster methods are more readily used.
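How a partition function turns a set of state energies into populations and a free energy can be shown with a toy calculation. The sketch below assumes a small, discrete list of state energies in kJ/mol, which is a strong simplification: production free-energy methods sample configurations along a trajectory rather than enumerate them. The function names and the default temperature of 310 K are illustrative choices.

```python
import math

R = 0.0083145  # gas constant in kJ/(mol K)


def populations(energies, temperature=310.0):
    """Boltzmann populations of discrete states with the given energies."""
    rt = R * temperature
    lowest = min(energies)
    weights = [math.exp(-(e - lowest) / rt) for e in energies]
    z = sum(weights)  # partition function relative to the lowest state
    return [w / z for w in weights]


def free_energy(energies, temperature=310.0):
    """Free energy -RT ln Z of the ensemble of discrete states (kJ/mol)."""
    rt = R * temperature
    lowest = min(energies)
    z = sum(math.exp(-(e - lowest) / rt) for e in energies)
    return lowest - rt * math.log(z)
```

Two states of equal energy give a free energy that lies RT ln 2 (about 1.8 kJ/mol at 310 K) below the energy of either state. This lowering is the entropic contribution that arises solely from the number of populated states.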
15.8 Dynamics of a Flexible Protein in Water
The most important application of molecular dynamics simulations is undoubtedly
the ability to follow the motion of one or more molecules in solution. For example,
which parts of a protein’s binding pocket or a ligand are rigid upon protein–ligand
complex formation and which are flexible, can be investigated.
The enzyme aldose reductase has proven to be a very flexible protein. It is
capable of adapting its binding pocket to the shape of a complexed ligand in
versatile ways. This property is related to the biological function of this protein.
It reduces a very broad palette of aldehyde substrates. Its exact function and role as
a target structure for a drug therapy is discussed in ▶ Sect. 27.5. Highly flexible and
adaptive proteins pose a special challenge to drug design. From the many crystal
structure determinations it became apparent that there are several parent confor-
mations for aldose reductase that are most likely in a dynamic equilibrium with one
another. A binding ligand picks out a conformation from this equilibrium that fits,
and the conformation becomes stabilized upon binding. These considerations are
applied to GPCRs in particular, which are introduced in ▶ Chap. 29, “Agonists and
Antagonists of Membrane-Bound Receptors”.
Fig. 15.4 Schematic course of a molecular dynamics simulation. Flowchart steps: generate
start coordinates; choose starting velocity; calculate forces (pair approximation); calculate
velocity and new coordinates; save coordinates; another step? (yes: repeat; no: end). The
starting geometry is either an experimentally determined structure or a geometry that was
optimized with force-fields. Each atom is assigned an appropriate starting velocity. Then the
equations of motion are solved stepwise with these starting conditions and the coordinates
are periodically saved.
Matthias Zentgraf carried out extensive molecular dynamics simulations on
aldose reductase. The profile that resulted was consistent with the crystallographic
structure determinations. Amino acids that are repeatedly found in many protein–
ligand complexes with modified geometries were shown to be very flexible in MD
simulations as well. If the trajectory of such simulations is evaluated, it is apparent
that the protein flips between the above-mentioned parent conformations. Addi-
tionally, many geometries occur that have only small but structurally critical
variations to these parent conformations. Small areas in the binding pocket are
thus opened that are able to accommodate, for example, an additional methyl group
or a phenyl ring on a ligand. Such information can be directly used for the design of
new inhibitors.
To provide an overview of the flexibility of a protein, the variation of the atom
positions is calculated from one simulation state to the next along a trajectory. Just
as with photographic film, these momentary pictures of complexes are called
“snapshots.” Above all, it becomes apparent whether a protein fluctuates for
a particular time in one conformation before it flips into another geometry. As the
simulation progresses, it can either return to the original geometry or flip into another
basis geometry. Such an orientation map is shown in Fig. 15.5. From this map,
it can be seen that the protein spends time in multiple parent conformations.
If representative snapshots from these clusters of basis conformations are
superimposed upon one another, a very good picture of which groups in the binding
pocket show enhanced flexibility is obtained. In the example at hand, the side
chains from two neighboring phenylalanines (Phe121 and Phe122, Fig. 15.6) are
particularly implicated. These can swing out of the way to open a new, previously
closed cavity in the binding pocket. In the context of drug design, such information
can be translated into the design of new inhibitors that can occupy new binding
pockets. In this way an improved affinity or selectivity for the target protein can be
achieved. A ligand is shown in Fig. 15.7 that has been furnished with an additional
benzyl group (red), that optimally fills the newly opened cavity in the snapshot (in
light blue) in Fig. 15.6.
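The map of pairwise deviations between snapshots can be sketched as a plain matrix calculation. The example assumes that all snapshots contain the same atoms in the same order and have already been superimposed onto a common reference frame; the fitting step itself is omitted. The function names are illustrative.

```python
import math


def rmsd(a, b):
    """Root-mean-square deviation between two snapshots (lists of xyz tuples)."""
    squared = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(a, b)
        for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(a))


def rmsd_matrix(snapshots):
    """Symmetric matrix of RMSD values between all pairs of snapshots."""
    n = len(snapshots)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = rmsd(snapshots[i], snapshots[j])
    return matrix
```

Plotting such a matrix with a color scale gives a 2D RMS diagram: blocks of small values along the main diagonal mark stretches of the trajectory that stay near one conformation, and small values far from the diagonal indicate a return to a geometry visited earlier.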
15.9 Model and Simulation: Where Are the Differences?
To conclude this chapter, the terms “model” and “simulation” should be briefly
compared and contrasted. Molecular models are used to approach questions that are
experimentally difficult or impossible to address. What different conformations can
a molecule adopt? This question is currently difficult to answer experimentally.
Does a possible drug candidate fit into a protein’s binding pocket? Even this
question is only answerable with laborious experiments. The use of models is
an elementary component of every scientific discipline. Models have always played
a central role in chemistry. It is shown in ▶ Chaps. 23, “Inhibitors of Hydrolases
with an Acyl–Enzyme Intermediate”; ▶ 24, “Aspartic Protease Inhibitors”;
▶ 25, “Inhibitors of Hydrolyzing Metalloenzymes”; ▶ 26, “Transferase Inhibitors”;
▶ 27, “Oxidoreductase Inhibitors”; ▶ 28, “Agonists and Antagonists of Nuclear
Receptors”; ▶ 29, “Agonists and Antagonists of Membrane-Bound Receptors”;
▶ 30, “Ligands for Channels, Pores, and Transporters”; ▶ 31, “Ligands for Surface
Receptors”; and ▶ 32, “Biologicals: Peptides, Proteins, Nucleotides, and
Macrolides as Drugs” how models, built on the basis of crystal structures of
protein–ligand complexes, afford important contributions to drug design, especially
in the preselection of possible molecular candidates for synthesis.
The term “simulation” describes the calculations with models. Multiple options
or variable combinations can be quickly evaluated on the computer for a given
mathematical model. Such investigations can contribute considerably to a better
understanding of the system. Next to theory and experiment, computer simulations
have been called the third pillar of exact science.
[Figure 15.5: 2D RMS diagram. Both axes: number of snapshots (0 to 600). Color scale: rmsd, 0 to 1.8 Å.]
Fig. 15.5 The development with time of the spatial deviations of various snapshots along the
simulation trajectory is visualized on this map. Large deviations are color-coded with red,
medium-sized deviations with green, and small deviations are colored blue. Green delineated
square areas are recognizable along the main diagonal. There the complex spends time near
a parent conformation. The transition to the next square represents a flip to a new geometry.
If sectors outside the main diagonal are colored increasingly red, the geometry deviates strongly
from the previously adopted conformation. If an area outside the diagonal is reached that is green,
the newly adopted geometry is not very different from a state that the system reached earlier.
With such a map it is possible to see which of the many parent conformations a complex
swings between.
Beware of too-high expectations in the area of drug design! It should not be
overlooked that the performance of a reasonable simulation requires that the
fundamental model is accurate and its limitations are well understood. This
prerequisite is indeed well met in many areas of engineering science so that a simulation
plays an important role in the design of automobiles or computer chips. Unfortu-
nately, in chemistry things are more complicated. The currently available molecular
models allow the assembly and ranking of compounds that are to be synthesized.
They can also be used to design ligands with improved binding properties. None-
theless, the present models are often not exact enough to allow detailed simulations
of protein–ligand complexes with sufficient accuracy to determine a binding
energy. In view of the importance of this field, this can only mean that more effort
must be exerted for the collection of experimental data for the development of
improved models.
Fig. 15.6 Representative snapshots were taken from the different square areas along the main
diagonal in Fig. 15.5 and superimposed upon one another. It can be seen that, above all, the
side chains of the phenylalanines Phe121 and Phe122 can undergo large movements in the
binding pocket. In doing so, they can also adopt conformations (e.g., the light-blue geometry) that
open a new hydrophobic cavity in the binding pocket.
15.10 Synopsis
• Models have been and still are used in chemistry in general, but in particular in
modern drug design. Computer graphics is a versatile tool to display structures
and models along with various properties assigned and/or geometrically
superimposed onto these molecules.
• Structures can be calculated by starting from first principles and by trying to
follow physics as closely as possible. This is done with quantum mechanical
calculations. Because these methods easily become elaborate and computationally
intractable, empirical approaches are used as an alternative. They are based on
much simpler physics, normally classical mechanics, and treat molecules as a set
of point charges in space interconnected by springs following harmonic
potentials.
• Empirical approaches can only be used if enough experimental data are available
to parameterize and calibrate the empirical concepts. Therefore large databases
assembling knowledge about molecular properties have been developed.
Fig. 15.7 Conformations occur along the trajectory of the protein that open a new hydrophobic
pocket when the side chain of a phenylalanine swings away (Fig. 15.6, i.e., light-blue geometry).
This pocket can be occupied by a ligand. For this a benzyl group was added to the scaffold of the
shown benzodiazepine-like inhibitor, which can occupy the opened pocket during the simulation.
• Molecular mechanics methods, which compute the geometry of molecules, are based on
empirical force-fields. They comprise multiple energy terms that describe
mutual interactions in the molecule either through bonds or through space.
Particular potentials are used to describe the torsional barrier to rotations around
single bonds. Furthermore nonbonded interactions are handled by special
potentials.
• The accuracy and required computational capacity of quantum chemical
approaches depend on the sophistication of the basis sets of atomic or molecular
orbitals used for the calculations. Parameterization of some parts of the
calculations with empirical data can significantly reduce the computational
requirements. Density functional theory is a faster approach and works with electron
density distributions instead of orbitals. Combinations of quantum chemical
methods and force-field approaches have been developed to handle large sys-
tems such as protein–ligand complexes.
• Properties such as charges can be displayed on the surface of molecules.
Different types of surfaces have been defined such as the van der Waals surface
or the solvent-accessible surface.
• Molecular dynamics simulations are normally based on potentials derived from
empirical force-fields. They consider the properties of a molecule under dynamic
conditions by solving Newtonian equations of motion. As a result, the motion of
a molecule can be evaluated with time by analyzing the so-called molecular
trajectory.
• Molecular dynamics simulations can be used to study the flexibility of a protein
next to its ligand-binding site. Such simulations can show multiple conforma-
tions of the protein that are competent to accommodate different ligands.
• Computer simulations allow the possible properties of molecules under differ-
ent test conditions to be enumerated. They help to interpret results from
experiments or help to predict properties of molecules to better plan the next
experiments.
Bibliography
General Literature
Barnickel G (1995) Molecular modelling – von der Theorie zur Wirklichkeit. Chemie in unserer
Zeit 29:176–185
Birner P, Hofmann HJ, Weis C (1979) MO-theoretische Methoden in der organischen Chemie.
Akademie-Verlag, Berlin
Burkert U, Allinger NL (1982) Molecular mechanics, ACS monograph 177. American Chemical
Society, Washington, DC
Goodfellow JM (ed) (1995) Computer modelling in molecular biology. VCH, Weinheim
Kunz RW (1991) Molecular modelling f€
ur Anwender, Teubner Studienb€
ucher
Leach A (2001) Molecular modelling: principles and applications, 2nd edn. Prentice Hall, New York
Lipkowitz KB, Boyd DB (eds) (1990) Reviews in computational chemistry. VCH, Weinheim
332 15 Molecular Modeling
Special Literature
Cornell WD et al (1995) A Second generation force field for the simulation of proteins, nucleic
acids, and organic molecules. J Am Chem Soc 117:5179–5197
Cram DJ (1988) The design of molecular hosts, guests, and their complexes. Angew Chem Int Ed
Eng 27:1009–1020
Fischer E (1922) Aus meinem Leben. Springer, Berlin, p 134
Pullman B (1990) Molecular modelling, with or without quantum chemistry. In: Rivail JL (ed)
Modelling of molecular structures and properties, vol 71, Studies in physical and theoretical
chemistry. Elsevier, Amsterdam, pp 1–15
van Gunsteren WF, Weiner PK (1989) Computer simulations of biomolecular systems. ESCOM,
Leiden
Watson JD (2010) The double helix, Phoenix, London; originally published by Weidenfeld 
Nicholson 1968
Bibliography 333
16 Conformational Analysis
Assembling a molecule with a modelling kit already makes it clear that rotations
around single bonds can be easily carried out. The molecule will achieve a different
shape, or as the chemists say, it is transformed into a different conformation. In a real
molecule, rotations around these bonds are not fully free. They are subject to
a potential, and during the rotation the molecule adopts particular, energetically
favorable arrangements. n-Butane represents the simplest case (Fig. 16.1). The central
torsion or dihedral angle determines the relative orientation of the two bonds to the
methyl groups to one another. If n-butane is rotated out of the arrangement with the two
bonds to the methyl groups in 180° orientation (trans), the methyl group at the “front”
carbon and the hydrogen atom at the “back” carbon will directly coincide with each
other at rotation angles of 120° and 240°; this arrangement is called “eclipsed”. In this
geometry, they come closer to one another, therefore this arrangement is unfavorable
for steric reasons. At rotation angles of 60° and 300° the groups are again in a staggered
geometry, which is an energetically more favorable situation. This arrangement is
somewhat less favorable than the staggered trans orientation because of the spatial
vicinity of the methyl groups, which are now said to be “gauche” to one another. Finally,
along the rotation path an orientation is adopted at 0° and 360° in which both methyl
groups are exactly behind one another. This is an even less favorable orientation.
16.1 Many Rotatable Bonds Create Large Conformational
Multiplicity
Multiple energy maxima and minima can be passed through during the course of
a full rotation through 360°, depending on which atoms and groups are attached to the
rotatable bond. They are at different energy levels relative to one another. The
lowest minimum is called the global minimum, and the energetically higher minima
are called local minima. Knowledge about these minima is important because
molecules adopt geometries that correspond to such energy minima. Calculations
are necessary to find these minima. A possible method is the systematic rotation
of all rotatable bonds, for instance in 10° steps. At each step the energy of the
molecule is calculated by using a force-field. All detected minima correspond to
possible conformations of the molecule.
G. Klebe, Drug Design, DOI 10.1007/978-3-642-17907-5_16,
© Springer-Verlag Berlin Heidelberg 2013
Most drug-like molecules have many single bonds and therefore exhibit more
than one rotatable bond. For these bonds, multiple values of the torsion angle can be
adopted. These values have to be combined for all rotatable bonds in the molecule.
The number of possible combinations increases multiplicatively. The molecule
n-hexane has three rotatable bonds. If, analogous to n-butane, three local minima
are assumed for each rotatable bond (60°, 180°, and 300°), we can expect 3 × 3 ×
3 = 27 minima. To perform a systematic search for these minima in 10° steps,
however, the evaluation of 36 × 36 × 36 = 46,656 positions would be necessary. In
principle, the energy must be calculated for each of these positions. Not all angle
positions will, however, lead to reasonable geometries. It can happen that parts of
the molecule fold back upon itself, and parts will mutually superimpose. Such
collisions can be recognized by computer programs, and the geometry is discarded
from consideration. It is also easily imaginable that with an increasing number of
rotatable bonds, the number of local minima and adoptable geometries can dramat-
ically increase in a systematic search.
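The two counts for n-hexane can be reproduced with a brute-force grid search. The sketch below makes two loudly simplifying assumptions: each rotatable bond follows the same hypothetical three-term cosine potential, whose coefficients were chosen here only to reproduce the energy values marked in Fig. 16.1 at 0°, 60°, 120°, and 180°, and the total energy is simply the sum over the bonds, so collisions between distant parts of the chain are ignored. On a 10° grid the gauche minima of this toy curve fall at 70° and 290°, close to but not exactly at 60° and 300°.

```python
import itertools
import math


def torsion_energy(angle_deg):
    """Hypothetical cosine-series potential (kJ/mol) for one rotatable bond."""
    t = math.radians(angle_deg)
    return (4.9 * (1.0 + math.cos(t))
            - 2.3667 * (1.0 - math.cos(2.0 * t))
            + 7.85 * (1.0 + math.cos(3.0 * t)))


def grid_search(n_bonds=3, step=10):
    """Evaluate every grid point and collect those lower than all axis neighbors."""
    angles = list(range(0, 360, step))
    e1d = [torsion_energy(a) for a in angles]
    n = len(angles)
    evaluated = 0
    minima = []
    for idx in itertools.product(range(n), repeat=n_bonds):
        energy = sum(e1d[i] for i in idx)  # additive toy energy of this geometry
        evaluated += 1
        # With an additive energy, moving along one axis changes only one term.
        if all(e1d[i] < e1d[i - 1] and e1d[i] < e1d[(i + 1) % n] for i in idx):
            minima.append((energy, tuple(angles[i] for i in idx)))
    return evaluated, minima
```

Calling `grid_search()` visits 36³ grid points and keeps 3³ of them as minima, with the all-trans geometry lowest in energy. Each additional rotatable bond multiplies the number of grid points by 36, which is why the systematic approach becomes impractical so quickly.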
[Figure 16.1: energy (kJ/mol) of butane plotted against the torsion angle τ (0° to 360°). The minima are labeled gauche, trans, and gauche; energy differences of 3.8 kJ, 14.6 kJ, and 25.5 kJ are marked.]
Fig. 16.1 Butane, CH3CH2CH2CH3, is made up of a linear chain of carbon atoms. If the terminal
methyl groups are covering one another after rotation around the central C–C bond, the torsion
angle about the central bond is 0°. At a 60° angle the “back” methyl group is halfway between the
“front” methyl group and a hydrogen atom. This situation is called a “gauche” orientation. At 120°
a methyl group and a hydrogen atom are eclipsed to one another. At 180° the terminal methyl
groups are exactly opposite one another. Here the energetically most favorable situation, the trans
orientation, is achieved. From there on, the course of the rotation is mirror symmetrical, and ends in
the starting position at 360°. The gauche orientations at 60° and 300° are energetically less favorable
than the 180° orientation by 3.8 kJ/mol, and the eclipsed orientations at 120° and 240° by
14.6 kJ/mol. The orientations at 0° and 360° are the least favorable ones and are 25.5 kJ/mol
higher in energy. If a minimization method is applied that can only run “downhill,” the three
minima on the potential curve can be reached by starting at the 110°, 130°, and 250° points.
16.2 Conformations Are the Local Energy Minima of
a Molecule
It was shown in ▶ Chap. 15, “Molecular Modeling” that the energy and geometry
of a molecule can be calculated with the help of a force field or a quantum
mechanical method. In this way every combination of angle values about the
rotatable bonds in a molecule can be found that corresponds to an energetically
favorable state. The mathematical method that is used to search for such
a minimum geometry can only move downhill on the potential energy surface
(▶ Sect. 15.5). For this, the potential of n-butane should be considered again
(Fig. 16.1). If an angle of 130° is used as a starting value, the minimization ends
with a trans geometry. If an angle of 110° is started with, which is only 20° distant, the
optimization will lead to a gauche orientation. By doing this, two of the three
possibilities are detected. The third minimum that mirrors the gauche conformation is
reached if an angle of 350° is started from. In this way, all three conformations are
found for the simplest possible case.
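The dependence of the result on the starting angle can be reproduced with a simple downhill walk. This is a minimal sketch, assuming plain gradient descent with a fixed step size and a hypothetical three-term cosine potential whose coefficients were chosen here only to reproduce the energy values marked in Fig. 16.1. It is not a force-field parameterization of n-butane, and its gauche minima lie near 67° and 293° rather than exactly at 60° and 300°.

```python
import math


def energy(angle_deg):
    """Hypothetical torsion potential (kJ/mol), fitted by eye to Fig. 16.1."""
    t = math.radians(angle_deg)
    return (4.9 * (1.0 + math.cos(t))
            - 2.3667 * (1.0 - math.cos(2.0 * t))
            + 7.85 * (1.0 + math.cos(3.0 * t)))


def slope(t):
    """Derivative of the potential with respect to the angle t in radians."""
    return (-4.9 * math.sin(t)
            - 4.7334 * math.sin(2.0 * t)
            - 23.55 * math.sin(3.0 * t))


def minimize(start_deg, rate=0.005, n_steps=500):
    """Walk downhill from the starting angle; returns the final angle in degrees."""
    t = math.radians(start_deg)
    for _ in range(n_steps):
        t -= rate * slope(t)  # always step against the slope, never uphill
    return math.degrees(t) % 360.0
```

Starting at 130° the walk ends at the trans minimum at 180°, starting at 110° it ends in the gauche minimum on the low-angle side, and starting at 350° it ends in the mirror-image gauche minimum. No starting point can cross a barrier, which is exactly the limitation described in the text.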
How are complex molecules to be approached? In principle, in exactly the same
way. Because it is not known which torsion angles of the individual single bonds
will give access to potential minima, that is, stable conformations, the minimization
must be started from numerous angles for each of the single bonds. From these
values the minimization always goes “downhill”. The minima on the potential
surface are found in this way. The art is to efficiently define the starting points
from which a given geometry is minimized. This is a very laborious task, particu-
larly with large molecules. It is akin to a hiker in the mountains searching for the
deepest valley.
Adenosine monophosphate 16.1 serves as an example (Fig. 16.2). The analysis
concentrates on the five-membered ribose ring, the bond to nitrogen in the adenine,
and the three bonds of the sugar phosphate side chain. What conformations can this
molecule adopt? Rotations are performed about the open-chain bonds in 10° steps.
In the systematic search for the ribose ring only those orientations are considered
that allow the ring to close. To get a rough overview of the hypothetically obtained
geometries, the distance between the center of the adenine scaffold and the phos-
phorus atom is measured in each generated geometry. This falls between 4.5 and
9.3 Å for the more than 300,000 generated geometries. To estimate the energy content
of a molecule in an arbitrary geometry, its van der Waals energy (▶ Chap. 15,
“Molecular Modeling”) is calculated. Such a calculation is quickly accomplished.
The energies of the 300,000 geometries are between 0 and 64 kJ/mol. The
so-generated structures are not yet in local potential minima. To achieve this, each
starting geometry must be minimized (cf., the potential energy curve of n-butane in
Fig. 16.1). The subsequently obtained conformations are compared to determine
whether the same local minima have been reached by starting from different points.
This is a rather laborious endeavor for 300,000 starting geometries! It is akin to letting
our hiker walk downhill from each level square to find the deepest valley. Hopefully
he is granted great longevity so that he lives long enough to see the results of the
search! Can this search be structured more effectively?
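The comparison step, deciding whether two minimized geometries are the same local minimum, can be sketched on the level of torsion angles. The example assumes that each conformer is described by a tuple of torsion angles in degrees and that two conformers count as identical when every angle agrees within a tolerance. The tolerance of 5° and the function names are illustrative choices.

```python
def angular_difference(a, b):
    """Smallest difference between two angles in degrees, respecting periodicity."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def same_conformer(c1, c2, tolerance=5.0):
    """True if all corresponding torsion angles agree within the tolerance."""
    return all(angular_difference(a, b) <= tolerance for a, b in zip(c1, c2))


def unique_conformers(conformers, tolerance=5.0):
    """Keep the first representative of every group of equivalent conformers."""
    unique = []
    for conformer in conformers:
        if not any(same_conformer(conformer, u, tolerance) for u in unique):
            unique.append(conformer)
    return unique
```

The periodic treatment matters: torsion angles of 359° and 1° differ by only 2° and must be recognized as the same arrangement. Each new geometry is compared against all representatives found so far, so the cost of this step also grows quickly with the number of minimized starting geometries.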
16.3 How to Scan Conformational Space Efficiently?
Sometimes rolling the dice is better than systematic probing! The hiker could
choose random places in the mountains from which to descend into the next valley.
With a little luck he will find the deepest valley with significantly less effort. Such
Monte Carlo methods are very popular in conformational analysis. For this the
starting angles for the conformation search are chosen purely randomly. Molecular
dynamics serves as another approach. The hiker would have to climb into an
airplane that flies at high speed between the mountains and changes its direction
with each obstruction. After set time intervals, the hiker jumps from the airplane
and hikes to the base of the valley upon landing. The higher the airplane flies,
the fewer mountain peaks are encountered and the faster the mountains can
be crisscrossed. In the course of molecular dynamics a molecular trajectory
(▶ Sect. 15.8) is followed, and geometries are saved at predefined time intervals
to use them as starting points for energy minimizations in a conformational
analysis. By increasing the temperature (i.e., flying higher) a larger area of
conformational space can be searched in a shorter period of time.
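The random-start strategy can be sketched for a chain with three rotatable bonds. The example assumes the same hypothetical cosine potential for every bond (coefficients chosen here only to reproduce the energy values marked in Fig. 16.1), an additive total energy without collisions, and plain gradient descent as the downhill method. The number of starts and the random seed are arbitrary.

```python
import math
import random


def energy(torsions_deg):
    """Additive toy energy (kJ/mol) of a set of torsion angles in degrees."""
    total = 0.0
    for angle in torsions_deg:
        t = math.radians(angle)
        total += (4.9 * (1.0 + math.cos(t))
                  - 2.3667 * (1.0 - math.cos(2.0 * t))
                  + 7.85 * (1.0 + math.cos(3.0 * t)))
    return total


def slope(t):
    """Derivative of the single-bond potential with respect to t in radians."""
    return (-4.9 * math.sin(t)
            - 4.7334 * math.sin(2.0 * t)
            - 23.55 * math.sin(3.0 * t))


def minimize(torsions_deg, rate=0.005, n_steps=300):
    """Walk downhill and return the minimum, rounded to whole degrees."""
    ts = [math.radians(a) for a in torsions_deg]
    for _ in range(n_steps):
        ts = [t - rate * slope(t) for t in ts]
    return tuple(round(math.degrees(t) % 360.0) for t in ts)


def random_search(n_bonds=3, n_starts=500, seed=0):
    """Minimize from random starting angles; return the distinct minima found."""
    rng = random.Random(seed)
    found = {}
    for _ in range(n_starts):
        start = [rng.uniform(0.0, 360.0) for _ in range(n_bonds)]
        conformer = minimize(start)
        found[conformer] = energy(conformer)
    return found
```

With a few hundred random starts the all-trans global minimum is found with near certainty for this small toy system, at a small fraction of the 46,656 evaluations of the systematic grid. There is no guarantee, however, that every local minimum has been visited, which is the price paid for rolling the dice.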
16.4 Is It Necessary to Search the Entire Conformational
Space?
Until now molecules have been considered in an isolated state. How does their
flexibility change when they are brought into an environment like the binding
pocket of a protein? In principle nothing changes in their conformational flexibility.
It could be that minima are found at different positions that have different relative
energies because of electrostatic and steric interactions in the binding pocket. This
raises the question of whether all ranges of the torsion angles must be searched for
a ligand that is in a binding pocket. If energy minima occur preferentially at
particular torsion angles, it is reasonable to limit the search to these angles. The
hiker could, for example, get the impression that the villages are predominantly
found in valleys and hardly ever on peaks or slopes. Because of this, all of the
villages would be worthwhile as starting points for his minimum search.
Fig. 16.2 Adenosine monophosphate 16.1 exhibits the conformationally flexible ribose ring and
four open-chain torsion angles, t1–t4. Rotations are performed around these torsion angles
during the conformational analysis. To get a rough description of the attained geometry, the
distance between the phosphorus atom in the side chain and the center of the adenine scaffold
is measured.
Ligands in the binding pocket of a protein are under the influence of directional
interactions from the amino acids that are located there. Similar conditions are
found for molecules in a crystal lattice. There, the environment is built of identical
copies of neighboring molecules (▶ Chap. 13, “Experimental Methods of Structure
Determination”). These undergo directional interactions with the molecule, analogously
to the amino acids in the binding pocket. Interestingly, the molecular
packing density in the interior of a protein is similar to that of organic molecules in
a crystal lattice. As was already mentioned in ▶ Sect. 13.9, the crystal structures
of numerous organic molecules are known and stored in a database. Experience has
unfortunately shown that the conformation of a flexible molecule in a crystal
structure is often neither identical nor even similar to the geometry of the molecule in
the binding pocket of a protein. The same is true for conformations that have been
found in solution.
The receptor-bound conformation of a molecule cannot be unambiguously
derived from its small-molecule crystal structure or from that in solution.
Nonetheless, much can be learned from crystal structures if individual torsion angles
are considered rather than the entire molecule. The potential
energy for the central torsion angle of n-butane is shown in Fig. 16.1. If the
angles for multiple C—CH2—CH2—C fragments are extracted from a database
of small-molecule crystal structures, they gather overwhelmingly in areas where the
potential energy curve shows local minima. Adenosine monophosphate 16.1 has
four open-chain torsion angles t1–t4 (Fig. 16.2). The bond between the ribose
ring and the adenine scaffold forms the torsion angle t4. A further fragment is the
phosphate group with the oxygen and the attached carbon in the chain (t3).
This fragment occurs in the database in a large variety of different structures.
A representative picture can be expected because this fragment occurs in very
many different environments when enough crystal structures are considered. The
results of such searches for the four torsion angles t1–t4 are shown in Fig. 16.4 as
frequency distributions, so-called histograms. Experience has shown that clearly
preferred values occur for many torsion angles. That is the case here for t1, t2, and
t3. The question can be raised as to why this statistical evaluation is not better
performed on ligands that are taking part in crystallographically studied protein–
ligand complexes. Unfortunately the diversity of these data is still limited, and the
data are usually not accurate enough for the desired evaluation. Nevertheless,
comparative studies have shown that the same torsion angles are preferentially
found in protein–ligand complexes and small-molecule crystal structures
(Fig. 16.3).
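
How such a frequency distribution is obtained from a list of extracted torsion angles can be sketched as follows. The function name and the bin width of 30° are assumptions made for illustration; the angles used in any call would have to come from a real database query, which is not shown here.

```python
def torsion_histogram(angles_deg, bin_width=30):
    """Return the relative frequency in percent for each bin between 0 and 360 degrees."""
    n_bins = 360 // bin_width
    counts = [0] * n_bins
    for a in angles_deg:
        # Map any angle (also negative or > 360) onto 0..360 before binning.
        index = int((a % 360.0) // bin_width) % n_bins
        counts[index] += 1
    total = len(angles_deg)
    if total == 0:
        return [0.0] * n_bins
    return [100.0 * c / total for c in counts]
```

Bins with a high percentage correspond to the preferred torsion-angle ranges; an almost flat histogram indicates that no range can be excluded.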
The experience that torsion angles prefer particular values can be used for the
conformational search. The angle t4 between the ribose ring and the adenine
scaffold shows a broad distribution over many possible values (Fig. 16.4).
Unfortunately, the search cannot be narrowed here. This looks better for the other
angles t1–t3. There, only specific values occur. If the systematic search is limited to
these areas, and a search in 10° steps is carried out around the average value, it would
only be necessary to generate 6,340 geometries. With 5.9–9.3 Å, almost the same range of
distances between phosphorus and adenine is covered as in the unrestricted search.
If a van der Waals energy calculation is carried out on these geometries, values
between 0 and 16.3 kJ/mol are obtained. In contrast to the results from Sect. 16.2, all
the geometries that correspond to the energetically unfavorable areas are discarded.
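
The saving achieved by such a restriction can be made tangible with a small sketch. The angle windows below are invented for illustration only; they are not the ranges used for adenosine monophosphate and do not reproduce the number of 6,340 geometries quoted above.

```python
from itertools import product

def torsion_grid(windows_per_torsion, step=10):
    """Enumerate all torsion combinations inside the given (low, high) windows in degrees."""
    per_torsion = []
    for windows in windows_per_torsion:
        values = []
        for low, high in windows:
            values.extend(range(low, high + 1, step))
        per_torsion.append(values)
    return list(product(*per_torsion))

# Hypothetical windows: three torsions with preferred ranges, one unrestricted torsion.
restricted = torsion_grid([
    [(40, 80), (160, 200), (280, 320)],  # t1 (assumed)
    [(160, 200)],                        # t2 (assumed)
    [(40, 80), (280, 320)],              # t3 (assumed)
    [(0, 350)],                          # t4: full circle in 10 degree steps
])
unrestricted = 36 ** 4  # four torsions, each scanned over the full circle in 10 degree steps
```

With these assumed windows the restricted grid is smaller than the full grid by a factor of about 60; the actual gain depends entirely on how narrow the preferred ranges are.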
How can it be confirmed that this restricted search also covers that part of the
conformational space that includes the receptor-bound conformations? Adenosine
monophosphate 16.1 often occurs as a substructure of cofactors in protein com-
plexes so that there is enough information about receptor-bound conformations for
this particular example. They come from crystal structures of proteins with these
bound cofactors. The distance range of 5.9–9.2 Å between the adenine scaffold and
the phosphorus in the receptor-bound structures covers the same range that was
detected in the enhanced systematic search. It can therefore be assumed that enough
geometries were generated that satisfactorily populate the local minima of the
bound state of adenosine monophosphate. Reflecting back to the initial butane
example (Fig. 16.1), this means that the starting points were well distributed so
that all minima were reached.
16.5 The Difficulty in Finding Local Minima Corresponding to
the Receptor-Bound State
As already described, the local minima in a systematic conformational search are
obtained by subjecting all of the generated geometries to a force-field optimization.
There can be problems with this approach. To explain this, a different molecule,
citric acid 16.2, can be considered in the binding pocket of citrate synthase. Seven
hydrogen bonds are formed by its three carboxylate groups and the hydroxyl group
to three histidine and two arginine residues of the protein (Fig. 16.5). If the free
citrate molecule, not bound to the protein, is considered and its geometry is minimized in
an isolated state, it takes on a conformation with internally saturated hydrogen
bonds (▶ Sect. 15.5). Of course, a different starting geometry can be chosen, but in
all cases, conformations with intramolecular hydrogen bonds will result upon
minimization. Such hydrogen bonds rarely occur in the protein-bound state.
Therefore the conformation that was obtained after minimization in the isolated
state has no relevance for the conditions in the protein.
Fig. 16.3 A value distribution for the torsion angles with clusters at 60°, 180°, and 300° is derived
from a database of small-molecule crystal structures for the C—CH2—CH2—C fragment. Most
values are found at 180°. Torsion angles between 0° and 360° are entered as the relative frequency
in percent. The maxima of the distribution are at the points where the potential curve of n-butane
(Fig. 16.1) shows its energy minima.
As a general rule, ligands rarely bind to proteins in a conformation exhibiting
intramolecular hydrogen bonds. The H-bond-forming groups are generally
involved in interactions with the protein.
To circumvent the problem of intramolecular H-bond formation, the minimization
of the generated starting structures can be omitted, and all geometries from the
systematic search can be used for further comparison (▶ Chap. 17, “Pharmacophore
Hypotheses and Molecular Comparisons”). Then, however, very many geometries
must be examined. This would severely limit the scope of such comparisons for
computational reasons. Furthermore, results generated in this way would likely describe
rather distorted geometries. Alternatively, the force-field terms responsible for the
formation of intramolecular H-bonds could be switched off. But how reliable would such
a reduced force field be? An attempt can also be made to summarize the geometries that were
generated in a systematic search so that groups with similar conformations are
described by one representative member.
Fig. 16.4 The frequency distribution of the torsion angles of the open-chain bonds of adenosine
monophosphate 16.1 (t1–t4) as found in the crystal structures of small organic molecules, plotted
as relative frequency [%] against the torsion angle t [°] from 0° to 360°. The torsion-angle
histograms are constructed for fragments that are representative for corresponding portions of the
test molecule. There are clearly preferred values for the angles t1–t3, but a broad distribution of all
possible angles is found for t4. This knowledge is used in the conformational analyses and limits
the search for t1–t3 to the preferred value ranges.
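
The idea of describing groups of similar conformations by one representative member can be sketched with a simple "leader" clustering on torsion angles. The choice of this clustering scheme, the root-mean-square torsion difference as similarity measure, and the threshold of 30° are assumptions made for illustration; the text does not prescribe a particular method.

```python
def angle_diff(a, b):
    """Smallest difference between two angles in degrees, respecting the 360 degree period."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def torsion_rms(conf_a, conf_b):
    """Root-mean-square difference between two conformations given as torsion tuples."""
    squares = [angle_diff(a, b) ** 2 for a, b in zip(conf_a, conf_b)]
    return (sum(squares) / len(squares)) ** 0.5

def leader_cluster(conformers, threshold=30.0):
    """Assign each conformer to the first representative within the threshold."""
    representatives = []
    members = []
    for conf in conformers:
        for idx, rep in enumerate(representatives):
            if torsion_rms(conf, rep) <= threshold:
                members[idx].append(conf)
                break
        else:
            # No representative is close enough: this conformer starts a new group.
            representatives.append(conf)
            members.append([conf])
    return representatives, members
```

The result of this simple scheme depends on the order in which the conformers are processed, so it should be read as a sketch of the principle rather than as a recommended procedure.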
16.6 An Effective Search for Relevant Conformations by Using
a Knowledge-Based Approach
A knowledge-based approach first analyzes the experimentally determined conformations
and then generates, for new molecules, only those conformations that are
consistent with this experimental knowledge base. In this way, many geometries are
never generated in the first place. The example of adenosine
monophosphate 16.1 is once again invoked. The approach recognizes a flexible
five-membered ring and four open-chain rotatable bonds. Energetically favorable
conformations of the ring are chosen from a database. This database contains many
different ring systems as they are found in, for example, crystal structures of
organic molecules. In the case at hand, the approach suggests the five energetically
most favorable ring conformations, of which two are in fact found in the
protein-bound cofactors. For the open-chain part of the molecule the method is guided by
the above-mentioned frequency distribution of the dihedral angle (Fig. 16.4). The
starting geometries are only generated in areas in which these distributions show
significant frequencies. The distribution is still rather crude. In a final step, the
generated geometries are optimized by readjusting the torsion angles. Clashes
between non-covalently bound atoms are avoided. At the same time the adjusted
dihedral angles are kept as close as possible to the preferred values. This approach
gets by with relatively few conformations. They are rather evenly distributed in the
part of the conformational space that is relevant for receptor-bound conformations
(Fig. 16.6).
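
The step of generating starting geometries only where the torsion-angle distributions show significant frequencies can be sketched as follows. The 5 % cutoff, the use of bin centers as starting values, and the function names are assumptions for illustration; the subsequent optimization and clash removal described above are not included.

```python
from itertools import product

def preferred_bins(hist_percent, min_percent=5.0):
    """Return the (low, high) angle ranges of all bins reaching the frequency cutoff."""
    bin_width = 360.0 / len(hist_percent)
    return [(i * bin_width, (i + 1) * bin_width)
            for i, freq in enumerate(hist_percent) if freq >= min_percent]

def starting_geometries(histograms, min_percent=5.0):
    """Combine the centers of the preferred bins of every torsion angle."""
    centers = []
    for hist in histograms:
        bins = preferred_bins(hist, min_percent)
        centers.append([(low + high) / 2.0 for low, high in bins])
    return list(product(*centers))
```

A sharply peaked histogram contributes only a few starting values, whereas a flat one, such as that described for t4, contributes all of its bins.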
Fig. 16.5 Interactions between citric acid 16.2 and the enzyme citrate synthase. The molecule is
bound by seven hydrogen bonds to three histidine and two arginine residues (labeled in the figure
as His 238, His 274, His 320, Arg 329, and Arg 401).
16.7 What Is the Outcome of a Conformational Search?
Many drug-like molecules are flexible. They can adopt markedly different confor-
mations depending on the surrounding environment. Usually the receptor-bound
geometry is not in the energetically most favorable conformation found for the
isolated state, but will fall in an energetically favorable area. For the conformational
analysis, this means that it is not necessarily the deepest minimum that is sought.
Rather, it should be the “relevant” minimum that corresponds to the bound state.
There is only a chance of finding it when the criteria for the search are known.
There is no difference in the difficulty of finding the energetically most
favorable conformation or the one that best “fits” the binding site. An important
tool in the search for novel lead structures is the docking of candidate molecules
into the binding pocket of a given protein. Programs that are able to use this
approach must be able to handle the conformation problem. Meanwhile, a large
variety of methods have been developed that allow efficient docking searches on
computer clusters, particularly for molecules of drug-like size.
Fig. 16.6 Eighty-one conformers (upper part) from experimentally determined protein–ligand
complexes are superimposed upon one another to illustrate the areas in space that adenosine
monophosphate 16.1 can adopt in a protein-bound state. The ribose ring is located in the center,
for which two ring conformations occur. The possible orientations of the adenine ring are shown
on the top, and the conformations of the flexible phosphate chain are on the bottom. Similar
coverage of the conformational space is achieved with a manageable number of 14 conformations
(lower part), which were generated by a knowledge-based approach.
16.8 Synopsis
• Drug-like molecules exhibit multiple rotatable bonds. Rotations around these
bonds drive the molecules into different conformations that correspond to local
minima on the energy surface of the molecule.
• The receptor-bound conformation of a drug-like molecule is the starting
point for any drug-design considerations. Therefore, many methods have been
developed to perform conformational analyses. Systematic searches by incremental
rotations about the torsion angle of each single bond will produce a huge
number of geometries that need to be optimized to the local minima on the
energy surface.
• The conformation of a drug-like molecule frequently changes with the environ-
ment. Usually the conformation in the protein-bound state differs from that in
solution, in the gas phase, or in the small-molecule structure.
• Considering torsional fragments in small molecules and analyzing them across
databases of crystal structures by statistical means reveals clear-cut torsional
preferences for many examples. Such knowledge can be exploited to perform
a conformational search more efficiently. Not all values around a rotatable bond
have to be tested, and the search can be limited to the ranges that are known to be
preferred.
• A further obstacle in the conformational search of the protein-bound conforma-
tion of a drug-like molecule is that the molecule will interact with its environ-
ment. This environment, the protein’s binding pocket, is often polar and will
involve the bound ligand in multiple hydrogen bonds.
• Using a knowledge base on torsional preferences of small organic molecules
can significantly enhance the conformational search, particularly during
docking, in molecular comparisons, or in database searches based on predefined
pharmacophores.
Bibliography
General Literature
Leach A (2001) Molecular modelling: principles and applications, 2nd edn. Prentice Hall,
Englewood Cliffs
Special Literature
Böhm HJ, Klebe G (1996) What can we learn from molecular recognition in protein–ligand
complexes for the design of new drugs? Angew Chem Intl Ed Eng 35:2588–2614
Klebe G, Mietzner T (1994) A fast and efficient method to generate biologically relevant
conformations. J Comput Aided Mol Design 8:583–606
Klebe G (1994) Structure correlation and ligand/receptor interactions. In: Bürgi HB, Dunitz JD
(eds) Structure correlation. VCH, Weinheim, pp 543–603
Klebe G (1995) Toward a more efficient handling of conformational flexibility in computer-
assisted modelling of drug molecules. Persp Drug Des Discov 3:85–105
Marshall GR, Naylor CB (1990) Use of molecular graphics for structural analysis of small
molecules. In: Hansch C, Sammes PG, Taylor JB (eds) Comprehensive medicinal chemistry,
4. Pergamon, Oxford, pp 431–458
Stegemann B, Klebe G (2011) Cofactor-binding sites in proteins of deviating sequence: Compar-
ative analysis and clustering in torsion angle, cavity, and fold space. Proteins 80:626–648
Part IV
Structure–Activity Relationships and
Design Approaches
Today drug design is supported by numerous computational approaches that, like
the pieces of a puzzle, all provide contributions from the development of a first
design hypothesis to a clinical candidate. (Announcement poster from the research
group of the author on the occasion of a conference in 2005 in Rauischholzhausen,
Marburg, Germany.)
17 Pharmacophore Hypotheses and Molecular
Comparisons
Emil Fischer’s lock-and-key principle (▶ Sect. 4.1) demonstrates the specific
interaction of an active compound with its receptor. With a key, it is the grooves
on the blade that interact with the wards in the keyway to open the lock. With active
substances it is a particular part of the molecule that undergoes an interaction with
the amino acids in the binding pocket. Similar molecules are frequently compared
in drug design to derive ideas for new structures. In this chapter, the criteria that
make such comparisons possible are compiled. Furthermore, these criteria can be
used to search in databases for alternative molecules that can bind in the same way
to the protein.
17.1 The Pharmacophore Anchors a Drug Molecule in the
Binding Pocket
The structure of the binding pocket determines which functional groups are neces-
sary for the ligand to bind. The spatial orientation of these functional groups in
ligands is referred to as the pharmacophore (▶ Sect. 8.7, Fig. 8.9). Because of its
importance for drug design and model hypothesis in medicinal chemistry, an
official IUPAC definition has been established by Camille G. Wermuth
(Table 17.1). The interacting groups that a ligand must possess to be able to
successfully interact with a protein define the pharmacophore in space and are
independent of the special molecular scaffold to which they are attached. The
hydrogen-bond-forming groups or hydrophobic parts are considered for this.
A more detailed examination differentiates between positively and negatively
charged groups in a molecule. When derived from a set of similarly binding ligands,
this generalized description is referred to as the ligand-based pharmacophore. On
the other hand, the protein structure can be the starting point. For this, an analysis is
made as to which amino acid functional groups are in the binding pocket. They
define the properties with which a ligand can bind to them. In this sense, the protein
structure determines how the pharmacophore of a ligand must be shaped to be
able to successfully bind to the protein. This description is referred to as the
G. Klebe, Drug Design, DOI 10.1007/978-3-642-17907-5_17,
© Springer-Verlag Berlin Heidelberg 2013
protein-based pharmacophore. In contrast to the lock-and-key picture, ligands
and proteins are flexible. In ligands the functional groups of the pharmacophore
must be oriented in the direction of the corresponding counter groups in the protein.
Therefore, detailed knowledge about the conformational properties of the ligand is
essential. Only then can it be predicted whether a ligand can potentially adopt
a geometry that satisfies the interactions with the protein. On the receptor side, the
geometry of the binding pocket can adapt to the shape of the ligand, similar to how
a glove fits on the hand of its wearer (induced fit, ▶ Sect. 4.1). Binding pockets are
indeed found in the interior or in buried grooves on the surface of proteins, and it is
there that the small but decisive conformational changes of the protein occur. An
example for the adaptability of a protein is presented in ▶ Sect. 15.8. An attempt is
made to describe these induced-fit adaptations by using molecular dynamics
simulations.
17.2 Structural Superposition of Drug Molecules
For the moment, we want to limit ourselves to an example with an unknown
receptor structure. All of the effects that ligand binding induces in the protein are
therefore neglected. An example should clarify this. The fruit of the shrub Anamirta
cocculus, the fishberry, contains the terpene picrotoxinin 17.1, which causes con-
vulsions. This compound affects chloride channels (▶ Sect. 30.5). Because of its
central stimulatory effect, it has been used in the past as an antidote to sleeping pill
overdoses. Due to its high toxicity, it has no therapeutic importance today. The
structure of picrotoxinin was determined by crystallography (Fig. 17.1).
Synthetic modifications of the cyclic core structure have led to active and inactive
derivatives (Fig. 17.2). The spatial structure of the individual derivatives can be
constructed on a computer from the crystal structure of the parent compound and
superimposed upon one another to recognize structural differences. The parts of the
molecule that are seen as equivalent in a ligand-based pharmacophore model are
placed upon one another for this superimposition. The superposition of all active and
inactive derivatives along with the common volumes of both classes is shown in
Fig. 17.3. The difference between both volumes is computed. It describes those areas
in space that are only occupied by the inactive molecules.
Table 17.1 Official IUPAC definition of a pharmacophore by Wermuth CG et al (1998) Pure
Appl Chem 70:1129–1143
A pharmacophore is the ensemble of steric and electronic features that is necessary to ensure the
optimal supramolecular interactions with a specific biological target structure and to trigger (or
to block) its biological response.
A pharmacophore does not represent a real molecule or a real association of functional groups,
but a purely abstract concept that accounts for the common molecular interaction capacities of
a group of compounds toward their target structure.
A pharmacophore can be considered as the largest common denominator shared by a set of active
molecules. This definition discards a misuse often found in the medicinal chemistry literature,
which consists of naming as pharmacophores simple chemical functionalities such as guanidines,
sulfonamides, or dihydroimidazoles (formerly imidazolines), or typical structural skeletons such
as flavones, phenothiazines, prostaglandins, or steroids.
A pharmacophore is defined by pharmacophoric descriptors, including H-bonding, hydrophobic,
and electrostatic interaction sites, defined by atoms, ring centers, and virtual points.
17.3 Logical Operations with Molecular Volumes
What information can be extracted from such comparative volumes? It is assumed
as a working hypothesis that a molecule can only be bound when its size does not
exceed the maximum available space. How large is the maximum available space?
To get an idea of this, the common volumes of all active derivatives are considered
and compared with the volumes of all inactive derivatives. A possible explanation
for the lack of activity of a molecule could then be that the area in the binding
pocket that the molecule would likely occupy is already taken by the protein.
Volume comparisons between active and inactive derivatives deliver informa-
tion about the possible shape of the receptor pocket. Such comparisons can be very
supportive in drug design. If the “forbidden” volume area for a compound class is
found, it can be checked before synthesis whether a compound really leaves the
“forbidden” area unoccupied.
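
The logical operations on molecular volumes can be sketched on a regular grid. The sketch assumes that each molecule is given as a list of atoms with coordinates and radii in Å and that all molecules are already superimposed; the grid spacing, the function names, and the simple point-in-sphere test are illustrative assumptions, not the procedure of a specific program.

```python
from itertools import product

def occupied_cells(atoms, spacing=0.5):
    """Return the grid points (as integer indices) that lie inside any atom sphere."""
    cells = set()
    for x, y, z, radius in atoms:
        low = [int((c - radius) // spacing) for c in (x, y, z)]
        high = [int((c + radius) // spacing) + 1 for c in (x, y, z)]
        for i, j, k in product(*(range(a, b + 1) for a, b in zip(low, high))):
            px, py, pz = i * spacing, j * spacing, k * spacing
            if (px - x) ** 2 + (py - y) ** 2 + (pz - z) ** 2 <= radius ** 2:
                cells.add((i, j, k))
    return cells

def union_volume(molecules, spacing=0.5):
    """Union of the volumes of several superimposed molecules."""
    cells = set()
    for atoms in molecules:
        cells |= occupied_cells(atoms, spacing)
    return cells

def forbidden_volume(actives, inactives, spacing=0.5):
    """Grid points occupied only by inactive molecules (inactive union minus active union)."""
    return union_volume(inactives, spacing) - union_volume(actives, spacing)

def clashes(candidate, forbidden, spacing=0.5):
    """True if a candidate molecule occupies any grid point of the forbidden volume."""
    return bool(occupied_cells(candidate, spacing) & forbidden)
```

A candidate for synthesis could be passed to `clashes` after superposition onto the reference molecules; a positive answer only flags a possible steric conflict and is no proof of inactivity.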
Fig. 17.1 Picrotoxinin 17.1 is responsible for the centrally stimulating effect of the extracts of
fishberries. Its structure and spatial architecture were proven by X-ray structure analysis.
Fig. 17.2 By starting with picrotoxinin 17.1, active and inactive derivatives were
synthesized.
Because of the rigidity of the molecule, it is simple to superimpose picrotoxinin
analogues on one another. For flexible molecules, however, it cannot be
expected that the transition from a 2D molecular representation to a 3D structure
(▶ Chaps. 15, “Molecular Modeling” and ▶ 16, “Conformational Analysis”)
delivers molecules in conformations in which all of the functional groups of the
pharmacophore are already analogously placed in space. Therefore, there are two
problems to solve:
• The groups that correspond to one another in different molecules and define the
pharmacophore must be determined.
• Techniques are needed that bring the molecules into conformations in which the
equivalent groups of the pharmacophores are analogously oriented in space.
17.4 The Pharmacophore Is Modified by Conformational
Transitions
To resolve the first problem, the role of the functional groups of the active
substance that form the contact with the receptor must be considered. They must
form hydrogen-bonding and hydrophobic interactions with the protein. In this
context, similarity of the functional groups means that they can form analogous
interactions with the protein. To define a pharmacophore in space, at least three
interacting groups are needed. This is immediately clear if one considers how many
fingers on a hand are needed to hold a randomly formed object (e.g., a potato) in
space. With only two fingers, the object can still rotate about an axis. In contrast, if
three anchor points are taken, its position is fixed in space. Practical experience
with a compound class is often helpful when assigning pharmacophoric groups.
For example, inhibitors of the angiotensin-converting enzyme (Fig. 17.4 and
▶ Sect. 25.5) need a terminal carboxylate group, a carbonyl group, and a group
that coordinates to the catalytic zinc ion.
Fig. 17.3 Superposition of the spatial structure of active (yellow) and inactive (blue) derivatives
of picrotoxinin. The united volumes around the active derivatives are shown by the red mesh. The
total volume around all inactive derivatives is shown in blue. A difference is formed between the
two volumes. The remaining volume (green) shows areas that are only occupied by inactive
derivatives. An explanation for the lack of activity of these derivatives can be that they try to
occupy volume areas that are already occupied by the receptor protein. This spatial clash does not
occur with the active derivatives.
Fig. 17.4 Inhibitors of the angiotensin-converting enzyme. A pharmacophore that consists of
a terminal carboxylate group, a carbonyl group, and a group that coordinates to the catalytic zinc is
necessary for binding to the enzyme. The latter function is assumed by a thiol, a phosphoric,
phosphonic, or carboxylic acid. The individual derivatives possess conformational flexibility in
different areas.
How can it be determined whether a common orientation for the assumed
equivalent groups in different molecules exists? In a computational method
these groups are assigned “virtual” springs that are coupled to one another. The
spatial overlap is reinforced by pulling these springs together. To avoid arriving at
an entirely distorted molecular geometry, a force-field is simultaneously taken into
consideration for each molecule (▶ Chap. 15, “Molecular Modeling”). The steroid
17.2 and three different inhibitors 17.3–17.5 (Fig. 17.5) are considered as an
example. They are ligands of an enzyme in the ergosterol biosynthesis. Spring
forces are applied between the marked atoms with the same numbers. The minimi-
zation of these forces along with the individual force-fields of the four molecules
leads to the superposition that is shown in Fig. 17.5.
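
The quantity that is minimized in such a superposition can be written down in a compact form. The expression below is a sketch under two assumptions that the text does not state: the springs are harmonic, and all of them share one force constant k. Here M is the number of molecules, E_FF^(m) is the force-field energy of molecule m, and P is the set of atom pairs (i, j) that carry the same number in different molecules.

```latex
E_{\text{total}}
  = \sum_{m=1}^{M} E_{\text{FF}}^{(m)}\!\left(\mathbf{r}^{(m)}\right)
  + \sum_{(i,j)\in P} \frac{k}{2}\,\bigl\lVert \mathbf{r}_i - \mathbf{r}_j \bigr\rVert^{2}
```

A large k pulls the equivalent atoms tightly onto one another at the price of more strained molecular geometries, whereas a small k preserves the individual conformations but leaves the overlap loose.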
Fig. 17.5 “Virtual” springs are coupled to the atoms that are marked with numbers around the
steroid 17.2 and the three derivatives 17.3–17.5. The structural superposition (bottom) that is
shown is determined by the force of these springs and the simultaneous consideration of molecular
force-fields.
Unfortunately the resulting solution depends on the starting conditions. If the
molecules are differently oriented in space at the beginning of the calculation, or if
they start from different conformations, different superpositions can result. At first
glance, this argument appears perhaps somewhat implausible. It should be kept in
mind that molecules are not only considered under the influence of “virtual” spring
forces but also under their own force fields. The multiple-minima problem of molecular
force-field calculations was already mentioned in ▶ Chap. 16, “Conformational
Analysis”. It plays an important role here too. The hiker from the last chapter should
help to explain this problem. He stands on a mountain peak and wants to descend into
the deepest valley possible. At the same time, he feels an “additional force” because he is
severely thirsty. He wants to meet his friends in a pub. The friends are coming from
different peaks in the mountains. He sees a pub in every valley. But which one should he choose?
For a common meeting point he would also accept a less deep valley. At the beginning
of his hike he looks for the steepest descent to come down quickly. After a while, the
other valleys fall from view. If he arrives at a different pub in the end, he no longer has
the energy to look for another one. If he had started from a different mountain
top, he might have reached only a comparably deep valley, but he would have found the pub of his
choice and met his friends at the same time. The problem with the choice of starting
conditions for molecular comparisons with “virtual” spring forces is similar. How
should it be checked whether the best possible solution was found? Here only an
experiment can help. For this it is necessary to synthesize molecules that are
conformationally rigid in particular parts because of the incorporation of rings. They
confer a fixed spatial arrangement to the pharmacophore. If they also possess activity,
their rigidified geometry indicates the correct pharmacophore (see Sect. 17.9).
17.5 Systematic Conformational Search and Pharmacophore
Hypothesis: The “Active-Analogue Approach”
In the last chapter conformational analysis was the central topic. Could the tech-
niques described there, for example, the systematic rotation around particular
bonds, be used in the search for the pharmacophore? Garland Marshall developed
such a technique, called the active-analogue approach, at the end of the 1970s.
First a pharmacophore must be assigned to all molecules in a data set. Then the
equivalency of groups must be defined, that is, which groups are equivalent to
which other groups. Then a systematic conformational search is carried out for the
first compound in the data set. For each geometry generated during the search, the
distances between the functional groups of the pharmacophore are determined. These distances
are saved. Because molecules cannot take on any arbitrary geometry, the distances
will occur in particular intervals. An analogous approach is taken for the second
molecule in the set. In principle, only the distance ranges of the first molecule must
be searched. It could be that all of the distances found with the second molecule
were already found with the first. It could also be that particular ranges are
excluded, and the “allowed” distance ranges are therefore limited. All of the
molecules in the data set are analyzed in this way.
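The bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration, not Marshall's original implementation. The function names, the bin width of 0.5 Å, and the use of the number of stored conformers as a stand-in for rigidity are assumptions made for this example. Each conformer is reduced to the coordinates of its pharmacophoric groups, listed in the same order of equivalent groups for every molecule.

```python
from itertools import combinations
from math import dist


def distance_pattern(points, bin_width=0.5):
    """Binned pairwise distances between the pharmacophore points of one conformer."""
    return tuple(round(dist(p, q) / bin_width) for p, q in combinations(points, 2))


def common_patterns(molecules, bin_width=0.5):
    """Intersect the distance patterns that every molecule can reach.

    molecules: list of molecules, each a list of conformers, each conformer a
    list of (x, y, z) coordinates of the pharmacophoric groups.
    Molecules with the fewest conformers are treated first (assumed proxy for
    rigidity), so the list of allowed patterns shrinks as early as possible.
    """
    allowed = None
    for conformers in sorted(molecules, key=len):
        patterns = {distance_pattern(c, bin_width) for c in conformers}
        allowed = patterns if allowed is None else allowed & patterns
        if not allowed:  # nothing in common any more, stop early
            break
    return allowed or set()


# Hypothetical data: one rigid molecule and one flexible molecule with two conformers
rigid = [[(0, 0, 0), (3, 0, 0), (0, 4, 0)]]
flexible = [
    [(1, 1, 1), (4, 1, 1), (1, 5, 1)],  # same distances as the rigid molecule
    [(0, 0, 0), (6, 0, 0), (0, 1, 0)],  # a geometry the rigid molecule cannot adopt
]
print(common_patterns([flexible, rigid]))
```

Only the distance pattern shared by both molecules survives the intersection. A real systematic search would generate the conformers by stepwise rotation about the open-chain single bonds instead of reading them from a list.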
356 17 Pharmacophore Hypotheses and Molecular Comparisons
If the conformational flexibility of the molecule is limited in one part of the
scaffold, there is a chance that the functional groups of the pharmacophore remain
in only one or a few different spatial patterns. The possible binding geometries of
the pharmacophoric groups of the ligand are derived from this. Afterward,
a geometry optimization can be carried out, in which case, the “virtual-springs”
approach is now ideally suited because the latter approach has approximated the
final solution rather closely. It is easy to imagine that the order with which the
molecules are investigated is decisive for the efficiency of the technique. Ideally,
the most rigid molecule from the data set is the first to be studied. With a little luck,
this limits a large part of the possible conformational space. The resulting list of
possible distances will remain small. By consistently using such limitations, in
1987 Garland Marshall and his research group were able to propose a model for the
receptor-bound conformation of the ACE inhibitors shown in Fig. 17.4. What could
be more rewarding than being able, years later, to validate the model personally and
to find that it was correct within an astonishingly small error margin! The validation
has since become possible because the crystal structures of the enzyme with
bound inhibitors from this data set were solved (see ▶ Sect. 25.5).
17.6 Molecular Recognition Properties and the Similarity of
Molecules
It is fair to ask whether the properties of the molecules were represented
appropriately in the comparisons attempted in the previous sections.
Deciding which functional groups belong to the
individual “teeth” of a pharmacophore is not easy. Analogous functional groups
must be oriented in a similar spatial direction in all molecules. In the case of the ACE
inhibitors (Fig. 17.4), a conflict arises already during the assignment of the functional
groups. Some analogues carry two carboxylate groups, which must be unambigu-
ously assigned to the pharmacophore prior to comparison with other inhibitors.
The binding of low molecular weight ligands to a protein is a mutual, targeted
recognition process. Both partners must fit together so that a strong interaction can
be formed. Parts of the ligand that have complementary recognition properties
determine the binding to the receptor. The term “recognition properties” refers to all
qualities that contribute to the specific interaction between molecules. Until now,
only properties and similarities have been considered that could be directly read
from the molecular scaffold. But is that sufficient? How would the world look if we
recognized ourselves only by our “scaffolds,” that is, only by the skeletons? Male
and female could not even be differentiated straightaway on these grounds! All of
the allure of interpersonal relationships that rests on personal appearance and
charisma would be lost. Until now, molecules have been considered only on the basis
of their “skeletons.” Why should ligand–receptor interactions be described at this
level? Molecules, too, recognize one another by their shapes and by the properties of the
surfaces that they expose to their immediate vicinity to form contacts. The following
example should clarify this point. Methotrexate 17.6 (MTX) and dihydrofolate
17.7 (DHF) bind to the enzyme dihydrofolate reductase (Fig. 17.6 and ▶ Sect.
27.2). The side chains of both molecules are nearly identical, but the heterocycles
are different. It is known from NMR spectroscopic investigations that the proton-
ated form of MTX binds to the protein. When considering the chemical formulae, it
is tempting to overlay the two heterocycles directly upon one another. Good
scaffold equivalence is achieved, and the heteroatoms in both molecules fall on
top of one another. The receptor, however, does not care about the apparent
equivalence of molecular skeletons. The interaction with the molecular surface is
much more important. Polar molecules such as MTX or DHF are bound to the
protein through hydrogen bonds. The arrows in Fig. 17.6 characterize the H-bond
donor and acceptor groups. The arrows are pointing to the molecule when an
acceptor property is exposed, and away in the case of donor groups. At the start, the
molecules are oriented in space so that they correspond in terms of a direct atom–
atom matching. For the moment, the basic molecular skeleton should be ignored,
and only the distribution of H-bond donor and acceptor groups is considered. The
equivalence achieved is not very convincing. Another variant is taken into consid-
eration in which the heterocycle of DHF is flipped over along the bond between the
heterocycle and the side chain. The spatial overlap of both molecules is no longer
optimal, but the pattern of exposed donor and acceptor groups for both molecules
shows much better agreement (Fig. 17.6). Once flipped into this other conformation,
the molecule has entirely different molecular recognition properties. In
cases such as this one, the difference can hardly be read from the chemical formulae,
even by a trained eye.
Models are nice, but are they also correct? Here only an experiment can provide an
answer. Luckily, in the present case, crystal structures are available for both ligands in
complex with DHFR. The observed binding geometries are shown in Fig. 17.7. One
aspartate and two carbonyl groups in the main chain and two water molecules are
responsible for recognition in the binding pocket. The water molecules mediate the H-
bonds between ligand and protein. The experimentally determined binding geometries
show that the conceptions about the similarity of the hydrogen bond properties led to
the correct conclusions. The orientation of the two ligands in the binding pocket, which at
first glance appears surprising and “non-equivalent,” is thus easily explained. The properties
that are responsible for the mutual recognition process must be compared to one
another. Only these count in the comparison! It is notable that this experimental
confirmation of the above-described ideas came eight years after the working hypothesis
was proposed. This is a nice example of the predictive power of model hypotheses.
Other properties, apart from hydrogen bonds, can serve as additional criteria to
define similarities in the molecular-recognition process. The electrostatic poten-
tial (▶ Chap. 15, “Molecular Modeling”) computed for the heterocyclic ring
systems of DHF and MTX (Fig. 17.7) suggests very similar conclusions. In addition
to the previously mentioned H-bonding properties and electrostatic potential, steric
space filling and the distribution of hydrophobic properties on the surface of both
ligands play an important role. When molecules are superimposed to predict their
putative geometries in the binding pocket, their conformational flexibility must also
be considered.
17.7 Automated Molecular Comparisons and Superpositioning
Based on Recognition Properties
Is it possible to consider all of the properties that were mentioned in the last section
in a method to superimpose molecules for a relative comparison? For this,
a measure of similarity for all properties must be calculated. This measure must
be related to a spatial distance function. Subsequently, an optimization of the spatial
superposition can be performed. At the same time, the maximum similarity of the
chosen properties is sought. The program SEAL from Simon Kearsley and Graham
Smith determines the spatial similarity of different properties distributed over
the molecular scaffold. It simultaneously ranks the similarity with respect to the
overlap volume of the molecules that is obtained in the superposition. In
this way the superposition of MTX and DHF is predicted correctly, in agreement with
experiment. The conformational flexibility is also considered in this analysis.
[Figure 17.6: structural formulae of methotrexate 17.6 and dihydrofolate 17.7 with arrows marking H-bond donor and acceptor groups; panels a, b, c]
Fig. 17.6 Methotrexate 17.6 and dihydrofolate 17.7 are ligands of dihydrofolate reductase. The
side chain R (see ▶ Sect. 27.2, Fig. 27.9) is identical for both except for a methyl group on the
nitrogen atom. The heterocycles are different. (a) Intuitively, superimposing the two heterocycles
directly upon one another appears reasonable when comparing the structures. The heteroatoms match
one another pairwise. (b) Arrows are distributed around the molecules to compare the
hydrogen-bonding properties. They point toward the molecule when an acceptor is present and
away from it for donor groups. If the molecular skeletons are masked out and attention is focused
on the distribution of H-bond donor and acceptor groups, the atom–atom overlap obtained via the direct
superposition of the rings shows rather unconvincing equivalence. (c) If instead the heterocycle in
17.7 is flipped about the bond between the heterocycle and the side chain R, the pattern of donor
and acceptor groups that is obtained exhibits convincing equivalence.
For this, precalculated conformers can be taken and compared successively with
one another. This is realized in the program ROCS from Anthony Nicholls at
OpenEye. A different approach was taken by Christian Lemmen at
GMD in St. Augustin, Germany, in the program FlexS. First a reference ligand
is represented by a set of Gaussian functions, each associated with a property.
The molecule is thus described as a density distribution of pharmacophore properties in space.
Then the molecule to be superimposed onto the reference ligand
is decomposed into fragments. A central base fragment is laid upon the
reference in such a way that its description with Gaussian functions overlaps
with that of the reference as well as possible. Then the other fragments are
attached to the base fragment until the complete ligand is reconstructed. During
this attachment, care is taken to fit each fragment equally well to the Gaussian
functions of the reference. At the same time the conformational flexibility of the ligand is
considered.
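How a similarity measure can be tied to a spatial distance function may be illustrated with a small Python sketch. It is loosely inspired by the Gaussian-weighted scoring idea behind SEAL and the Gaussian property description in FlexS, but it reproduces neither program. The function name, the damping parameter alpha, the weights, and the two atomic properties (a partial charge and a size-like steric value) are assumptions chosen for illustration.

```python
from math import exp


def gaussian_similarity(mol_a, mol_b, alpha=0.3, w_charge=1.0, w_steric=1.0):
    """Sum over all atom pairs of a property product damped by a Gaussian of the distance.

    Each molecule is a list of atoms (x, y, z, charge, steric). A larger score
    means that similar properties are found at similar places in space.
    """
    score = 0.0
    for xa, ya, za, qa, va in mol_a:
        for xb, yb, zb, qb, vb in mol_b:
            r2 = (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
            weight = w_charge * qa * qb + w_steric * va * vb
            score += weight * exp(-alpha * r2)
    return score


# Hypothetical two-atom "molecules": (x, y, z, charge, steric)
reference = [(0.0, 0.0, 0.0, -0.5, 1.0), (1.5, 0.0, 0.0, 0.5, 1.0)]
aligned = list(reference)  # same properties in the same places
flipped = [(1.5, 0.0, 0.0, -0.5, 1.0), (0.0, 0.0, 0.0, 0.5, 1.0)]  # charges exchanged
shifted = [(x + 5.0, y, z, q, v) for x, y, z, q, v in reference]  # far away

for name, mol in (("aligned", aligned), ("flipped", flipped), ("shifted", shifted)):
    print(name, round(gaussian_similarity(reference, mol), 3))
```

In this toy case the flipped placement overlaps the reference perfectly in space, yet it scores lower than the aligned one because the charges no longer coincide. A superposition program would maximize such a score over the rotation, translation, and conformation of the second molecule.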
One complication occurs during the similarity analysis of the molecules in this
method. Assuming that the relevant properties defining the similarity were found at
all, the question arises as to what is accepted as “sufficient” similarity to induce
Fig. 17.7 Experimentally determined binding geometries of methotrexate (green carbon atoms)
and dihydrofolate (gray carbon atoms) in dihydrofolate reductase. The heterocycles of the ligands
are bound through H-bonds to the carboxylate or carbonyl group of an amino acid that is oriented
into the binding pocket. Two water molecules (red spheres) mediate additional H-bonds between
the ligands and the protein. The difference in the binding mode that is discussed in Fig. 17.6 is
clearly recognized. On the right-hand side the electrostatic potentials around methotrexate (top)
and dihydrofolate are shown. The molecules are shown in the spatial orientation that was determined
by crystal structure analysis. Considered qualitatively, the electrostatic potentials of both
molecules in this orientation have a very similar form.
a comparable effect on the receptor. There is a toy with which children try to push
differently shaped pieces through preformed holes into a box, a so-called “shape
sorter.” For each block form, be it a cube, cuboid, round cylinder, or elliptical
cylinder, there is one preformed hole that it fits. In similarity considerations there
is a tendency to group cube and cuboid, or round and elliptical cylinder, into related
categories because of their similar form. If an attempt is made to push these parts
through the holes of the shape sorter, it is easily discovered that the cuboid will not
only fit through the square hole but also, with a bit of force, through the hole for the
elliptical cylinder. The cube fits through the square hole and is only slightly too big
to fit through the hole for the circular cylinder as well. Are the cuboid and
the elliptical cylinder, or the cube and the circular cylinder, therefore not more similar to one
another? The measure of similarity that is to be used for a molecule is calibrated
with respect to the receptor to which the molecule should fit. It is therefore always
a relative measure!
Thiorphan and retro-thiorphan (▶ Sect. 5.5, formulae 5.23 and 5.24) differ only
in the direction of the amide bond. They bind with almost identical affinity
to the zinc proteases thermolysin and NEP 24.11. Therefore, one would classify
them as very similar. The zinc protease ACE, however, binds thiorphan at least
100 times more strongly than retro-thiorphan (▶ Sect. 5.5, Fig. 5.10). Relative
to this enzyme, both substances must be called dissimilar. Another extreme is
seen in the oligopeptide-binding protein A (▶ Sect. 4.1). It binds every tri- to
pentapeptide comprising a central Lys-Xxx-Lys moiety with almost equal
affinity. In principle, a similarity analysis requires information about the shape
of the binding site. Only then can the requirements be adequately
defined. However, the structure of the receptor is still not known in many drug-design
projects. Here there is no choice: it is only through hypothesis and its
experimental testing in gradual steps that the structural requirements of the
receptor can be approximated.
17.8 Rigid Analogues Trace the Biologically Active
Conformation
The concepts in ▶ Chap. 16, “Conformational Analysis” showed that an enor-
mously large number of conformers can be easily generated for many drug-like
molecules. If comparison of all conformers is desired, the undertaking quickly
becomes computationally very intensive. When is there a chance of getting
an idea of the bound conformations? Either one compound in the data set is highly
rigid and constrains the putative arrangements of the pharmacophore in space, or
the considered molecules are rigid in different areas of their molecular scaffold. In
Fig. 17.8 the structural superposition of the steroid 17.2 with the above-described
inhibitors 17.3–17.5 is shown. This result was obtained from a similarity analysis
with multiple conformers. The achieved result is very similar to the calculation with
the “virtual” spring forces. It has, however, a decisive advantage: no preconceived
definitions of equivalent centers are necessary, between which the spring forces
are applied. These equivalences arise automatically through a similarity compari-
son of the properties that are distributed over the molecules.
17.9 If Rigid Analogues are Lacking: Model Compounds
Elucidate the Active Conformation
In the last example a largely rigid reference compound was furnished. How should
one proceed when no such reference compound is known? Only experiment can
help here. Rigidified analogues must be synthesized. These are tested for biological
activity. If they still exhibit affinity to the receptor, it can be assumed that the active
conformation was frozen.
An example should demonstrate how the receptor-bound conformation can be
probed by synthesizing rigid model compounds. The calcium channel blocker
nifedipine 17.8 (▶ Sect. 2.5) contains multiple rotatable bonds (Fig. 17.9). It can
therefore adopt numerous conformations. Which orientation does the phenyl ring,
for instance, take relative to the dihydropyridine ring? This question was very
elegantly clarified by Wolfgang Seidel at Bayer through the synthesis and crystal
structure determination of cyclized derivatives 17.9. An additional lactone ring
changes the biological activity of the derivative depending on the ring size. In
compounds with a six-membered lactone the phenyl and dihydropyridine rings lie
virtually in the same plane. Conversely, the phenyl ring stands perpendicular to the
dihydropyridine ring in the derivative with the twelve-membered ring. The affinity
of this compound is about five orders of magnitude higher than for the derivative
with the six-membered lactone. Therefore it must be assumed that nifedipine exerts
its effect in a conformation in which the phenyl and dihydropyridine rings are
perpendicular to one another.
After this question has been answered, more compounds can be designed.
A relevant superposition that corresponds to the conditions in the protein’s binding
pocket then becomes possible. Such superpositions have gained decisive importance in the
context of 3D structure–activity relationships. ▶ Sect. 29.4 shows an example
of how the structural fixation of the biologically active conformation of a ligand can
support the design process.
Fig. 17.8 Superposition of the steroid 17.2 and three inhibitors 17.3–17.5 according to a spatial
comparison of their molecular properties. In contrast to methods with “virtual” spring forces, this
method does not require a predefined equivalence of molecular groups. It is automatically
generated by the similarity comparison of many different conformations.
[Figure 17.9: structural formulae of nifedipine 17.8 and the lactones 17.9 with a (CH2)n bridge; plot of 90°-α (0 to 80) against Ki in nM (logarithmic axis, 1 to 100,000) for ring sizes 6 to 12]
Fig. 17.9 The calcium channel blocker nifedipine 17.8 contains multiple rotatable bonds. The phenyl
ring can lie in the plane of the dihydropyridine ring, or the two rings can be oriented perpendicular
to one another. To distinguish between these possibilities, lactones 17.9 with different ring
sizes were synthesized and their crystal structures were determined. The phenyl ring lies almost
parallel to the dihydropyridine ring (α ≈ 0°) in the compound with the six-membered-ring lactone
(orange). Upon increasing the ring size, the angle between the two rings grows so that
a perpendicular orientation (α ≈ 80°) is achieved in the twelve-membered-ring derivative (green). The
biological activity increases from virtually inactive, as in the six-membered ring, to
almost five orders of magnitude higher for the twelve-membered-ring derivative. The bioactive
conformation of nifedipine (gray) therefore requires a perpendicular orientation of the two rings.
17.10 The Protein Defines the Pharmacophore: “Hot Spot”
Analysis of the Binding Pocket
It was described in Sect. 17.1 that a pharmacophore can also be derived from the
protein structure. The computer program GRID from Peter Goodford is a tool that
is often used for this purpose. It calculates favorable positions for functional groups
on a putative ligand in the protein’s binding pocket. These could be, for instance,
a carboxylate group, a hydroxyl group, or an aliphatic carbon atom. The potential
function implemented in GRID has been calibrated on numerous functional
groups from crystal structures of organic molecules. The result of a GRID calculation
is a set of interaction energies assigned to the intersections of a regularly spaced grid
that is inscribed into the binding pocket. The energies are graphically displayed, for
instance, by contouring the spatial area at which the interaction energy reaches or
exceeds a certain predefined threshold. They indicate hot spots for the placement of
functional groups of a potential ligand. The areas in which the interactions with an
aromatic carbon atom or a hydroxyl oxygen atom are favorable are shown for the
enzyme thermolysin in Fig. 17.10. Such calculations are carried out with a set of
different probes, for instance, a water molecule, an aromatic carbon, a hydrogen-bond
acceptor or donor, or a positively or negatively charged group. The results provide
valuable information about the shape and electrostatic properties of the binding pocket.
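The principle of such a grid calculation can be sketched as follows. This is a toy model and not the GRID program: the energy function (a Lennard-Jones 12-6 term plus a Coulomb term with a distance-dependent dielectric), all parameter values, and the function names are assumptions made for illustration. Favorable interactions are counted as negative energies here, so hot spots are the grid points at or below the chosen threshold.

```python
from math import sqrt


def probe_energy(point, atoms, probe_charge=0.0, epsilon=0.2, r_min=3.5):
    """Toy interaction energy of a probe at 'point' with all protein atoms.

    atoms: list of (x, y, z, charge). All parameters are illustrative only.
    """
    energy = 0.0
    for x, y, z, q in atoms:
        r = sqrt((point[0] - x) ** 2 + (point[1] - y) ** 2 + (point[2] - z) ** 2)
        r = max(r, 0.5)  # avoid division by zero on top of an atom
        ratio6 = (r_min / r) ** 6
        energy += epsilon * (ratio6 * ratio6 - 2.0 * ratio6)  # Lennard-Jones 12-6
        energy += 332.0 * probe_charge * q / (4.0 * r * r)  # Coulomb, dielectric 4r
    return energy


def grid_map(atoms, origin, counts, spacing=1.0, **probe):
    """Evaluate the probe energy at every intersection of a regular grid."""
    ox, oy, oz = origin
    nx, ny, nz = counts
    return {
        (i, j, k): probe_energy(
            (ox + i * spacing, oy + j * spacing, oz + k * spacing), atoms, **probe
        )
        for i in range(nx)
        for j in range(ny)
        for k in range(nz)
    }


def hot_spots(energies, threshold=-1.0):
    """Grid points whose energy is at or below (more favorable than) the threshold."""
    return sorted(p for p, e in energies.items() if e <= threshold)


# Hypothetical pocket: a single negatively charged atom, probed with a positive group
pocket = [(0.0, 0.0, 0.0, -1.0)]
energies = grid_map(pocket, origin=(0.0, 0.0, 0.0), counts=(7, 1, 1), probe_charge=1.0)
print(hot_spots(energies, threshold=-5.0))
```

Repeating the calculation with other probe parameters, for instance an uncharged probe, gives a separate map for each probe type, which corresponds to the set of different probes mentioned above.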
Another way of analyzing protein structures is based on the idea that the physical
nature of non-bonding interactions in protein–ligand complexes and in the crystal
packing of small organic molecules is identical. The latter are particularly inter-
esting for this purpose because the crystal structures of small organic molecules are
regularly determined with great precision. There are over 500,000 crystal structures
stored in the Cambridge Database (▶ Sect. 13.9). This collection is ideal to obtain
relevant and reliable data via a statistical analysis for ligand-design purposes
(▶ Sect. 14.7). Let us assume that there is a carboxylate group -COO⁻ on the
protein that protrudes into the binding pocket. Where must a partner group be
positioned to form a favorable interaction? To answer this question, the Cambridge
Database was searched first for compounds with carboxylate groups, and then for
each of the retrieved groups, the position of the counter group that forms an H-bond
to the carboxylate was saved. Finally, all of the H-bonds found were
superimposed by placing the carboxylate groups of all examples
exactly onto one another. The distribution of H-bond-donor groups (Fig. 17.11)
offers a valuable picture of the allowed area of the H-bond geometry. Subsequently,
such a distribution can be superimposed onto the protein structure by matching with
the carboxylate group of the protein. Areas in which the distribution overlaps with
other atoms of the protein are discarded. In this way the energetically most
favorable areas for a counter group in the binding pocket are found. In Fig. 17.12
these distributions are compared with a protein–ligand complex. As expected, the
hydrogen-bond geometries found in the complex coincide nicely with the range that
was found in the crystal packings of organic molecules. A system of rules for non-
bonding interactions in protein–ligand complexes was obtained from the statisti-
cal evaluations of all groups that are found in proteins. These rules are compiled at
the Cambridge Crystallographic Data Centre in the IsoStar database. Once
superimposed onto the protein, the distributions can be contoured with the program
SuperStar to map hot spots of binding.
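The transfer of such a distribution onto a protein can be sketched in Python. The sketch is a strong simplification under stated assumptions and is not the IsoStar or SuperStar implementation: the contact positions are assumed to be stored in a local coordinate system of the carboxylate group, the helper names are hypothetical, and the clash distance of 2.5 Å is an arbitrary illustrative value.

```python
from math import dist, sqrt


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _unit(v):
    n = sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def carboxylate_frame(c, o1, o2):
    """Right-handed local axes: x bisects the O-C-O angle, y lies in the group plane."""
    mid = tuple((p + q) / 2 for p, q in zip(o1, o2))
    ex = _unit(_sub(mid, c))
    ez = _unit(_cross(ex, _sub(o1, c)))
    ey = _cross(ez, ex)
    return ex, ey, ez


def place_distribution(local_points, c, o1, o2, protein_atoms, clash=2.5):
    """Map contact points from the local frame onto the protein and drop clashing ones."""
    ex, ey, ez = carboxylate_frame(c, o1, o2)
    placed = []
    for a, b, k in local_points:
        p = tuple(c[i] + a * ex[i] + b * ey[i] + k * ez[i] for i in range(3))
        if all(dist(p, atom) >= clash for atom in protein_atoms):
            placed.append(p)
    return placed


# Hypothetical, idealized coordinates (not real bond lengths)
c, o1, o2 = (0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (1.0, -1.0, 0.0)
donor_positions = [(3.0, 1.0, 0.0), (3.0, -1.0, 0.0)]  # in the local frame
other_protein_atoms = [(3.0, -2.0, 0.0)]
print(place_distribution(donor_positions, c, o1, o2, other_protein_atoms))
```

In this example the second donor position lies too close to a neighboring protein atom and is discarded, so only the first position remains as a candidate area for a counter group.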
Knowledge-based potentials represent another approach for the display of
a protein-based pharmacophore. For this, the contact geometries in protein–ligand
complexes are evaluated. A histogram is compiled that shows how
often a particular contact occurs between a group of a ligand and an amino acid in
the protein. If such a statistical frequency distribution is related to a mean reference
[Figure 17.10: binding pocket of thermolysin with labels Phe114, Asn112, Arg203, and Zn2+; probe molecules shown: benzylsuccinic acid, acetone, water, acetonitrile, isopropanol, phenol]
Fig. 17.10 An analysis of the binding pocket of thermolysin. Areas of favorable interactions were
calculated for an aromatic carbon probe (white) and a hydroxyl oxygen atom (red). Also shown are the
fragments mentioned in Fig. 7.8, whose positions could be determined by allowing the probe molecules to
diffuse into the protein crystals. The calculated hot spots correspond well with the positions that
were determined crystallographically with the molecular probes.
state, an energy function can be calculated from it. In this function it is assumed that
contacts that occur more frequently than in the average distribution are energetically
favorable. If they occur rarely, they are considered unfavorable. These statistical
potentials have been integrated into the scoring function DrugScore. They can also
be used for the analysis of binding pockets and help to indicate hot spots of
ligand binding.
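The conversion of contact frequencies into an energy-like score can be sketched as follows. The sketch uses a generic inverse-Boltzmann-type relation, the negative logarithm of the ratio between observed and reference frequency. It is not the DrugScore function, whose exact form is not given here. The function name, the small pseudo-count, and the example counts are assumptions.

```python
from math import log


def statistical_potential(observed, reference, pseudo=1e-6):
    """Energy-like score per distance bin: -ln(p_observed / p_reference).

    observed:  contact counts per distance bin for one type of contact
    reference: counts per bin for the mean reference state
    Negative values mark contacts that occur more often than in the reference.
    """
    n_obs, n_ref = sum(observed), sum(reference)
    potential = []
    for o, r in zip(observed, reference):
        p_obs = o / n_obs + pseudo  # pseudo-count avoids log(0) for empty bins
        p_ref = r / n_ref + pseudo
        potential.append(-log(p_obs / p_ref))
    return potential


# Hypothetical counts in three distance bins
observed = [10, 60, 30]
reference = [30, 40, 30]
print([round(w, 2) for w in statistical_potential(observed, reference)])
```

The first bin is populated less often than in the reference state and receives a positive (unfavorable) value, the second is populated more often and receives a negative (favorable) value, and the third is neutral.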
The MCSS method was developed in the group of Martin Karplus. For this, several
thousand copies of probe molecules such as acetone, water, methanol, or benzene
are placed at random in a binding pocket. A computer simulation is started with
[Figure 17.11: distributions of hydrogen-bonding geometries around four central groups; panels a, b, c, d]
Fig. 17.11 Hydrogen-bonding geometries (carbon is green, oxygen is red, and hydrogen is white)
around a carboxylate group (a), ester group (b), carbonyl group (c), and ether group (d). Structures
with these central groups that form hydrogen bonds with OH donor groups were extracted from the
Cambridge database. These examples were superimposed based on the geometry of the central
group. It is obvious that there is considerable variability in the interaction geometry, but also that
preferred orientations are to be found. It is also shown that, for instance, the interaction pattern
around an ester group (b) is not simply a superimposition of the distribution around a carbonyl
group (c) and an ether group (d).
which the individual probe molecules are moved into optimal positions. They are driven
by the underlying force field. The probe molecules
experience the interaction with the protein, but they do not “see” one another. At
the end of the calculation a frequency distribution for the probe molecules is
obtained. Evaluating this distribution highlights the hot spots for interactions with the
protein. If the hot spots obtained in this way are compiled into a composite
picture, a protein-based pharmacophore is obtained.
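The essential idea, many probe copies that feel only the protein and not one another, can be shown with a toy Python sketch. It is not the MCSS program. The one-dimensional double-well function that stands in for the protein field, the plain gradient-descent minimizer, the step size, and the number of probes are all assumptions made for illustration.

```python
import random
from collections import Counter


def field_gradient(x):
    """Gradient of a toy 1D 'protein field' E(x) = (x^2 - 1)^2 + 0.3 x."""
    return 4.0 * x * (x * x - 1.0) + 0.3


def minimize(x, step=0.01, n_steps=2000):
    """Move one probe downhill; it never interacts with any other probe."""
    for _ in range(n_steps):
        x -= step * field_gradient(x)
    return x


def probe_distribution(n_probes=200, seed=0):
    """Start probes at random positions and count where they end up."""
    rng = random.Random(seed)
    finals = [minimize(rng.uniform(-2.0, 2.0)) for _ in range(n_probes)]
    return Counter(round(x, 1) for x in finals)


counts = probe_distribution()
for position, n in sorted(counts.items()):
    print(position, n)
```

All probes collect in the two wells of the toy field, near x = -1 and x = 1. The populated positions play the role of the hot spots, and the number of probes per position corresponds to the frequency distribution described above.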
17.11 The Search for Pharmacophore Patterns in Databases
Generates Ideas for Novel Lead Compounds
A pharmacophore can be used to search a database for promising candidates that
are able to be accommodated in a protein’s binding pocket. The reference
pharmacophore can be either derived from a set of superimposed ligands, or
a reference protein can define its properties. How such a database search is carried
out and what is discovered in the process depends on how much information is
stored in the database itself. If only 2D structures are collected, all examples can be
retrieved that possess a particular functional group or substructure. Based on the
topology, different criteria are defined to determine the degree of similarity
between molecules. If the pharmacophore is defined very generally,
for instance, as an aromatic compound with an acid group and a basic
[Figure 17.12: labeled residues Ala97, Leu4, Asp26, Tyr155]
Fig. 17.12 The distributions of H-bond-donor groups (carbon is white, oxygen is red, and nitrogen
is blue) around a carboxylate group or a carbonyl group are superimposed onto the 3D structure of
the complex of methotrexate with dihydrofolate reductase (Fig. 17.7). The distributions are
placed onto the acid group of Asp26 and the carbonyl groups of Leu4 and Ala97. The hydrogen
bonds formed between protein and ligand coincide geometrically with the ranges often found in the
crystal structures of small organic molecules.
nitrogen atom, then numerous hits will be found. However, it is important which
relative spatial distances are given between these groups. Such information is not
taken into account in searching a 2D database. Matthias Rarey and Scott Dixon
developed the Feature-Trees method, which can screen large databases according to
topological criteria. However, the connectivities of the chemical formulae are not
compared. Rather, the database entries are initially classified by the topological
sequences of particular characteristics, for instance, the presence of an H-bond-
donor group or a hydrophobic cyclic molecular portion. Such a method can
compare molecules and find candidates that have pharmacophore properties in
a comparable topological sequence extremely quickly.
Databases that contain 3D molecular geometries allow the search for the spatial
pattern of the pharmacophore. For example, the Cambridge Database of crystal
structures of small organic molecules (▶ Sect. 13.9) can be used for such a search.
Molecules are found with experimental geometries that satisfy the pharmacophore.
In the search for ligands for HIV protease (▶ Sect. 24.3) a pharmacophore pattern
was derived from the known crystal structure of the enzyme, and the Cambridge
Database was searched for molecules that match this pattern. The result of this
search is presented in ▶ Sect. 24.4 (Fig. 24.16) in detail. It inspired the researchers
at Dupont–Merck with the first ideas that led to the development of an entirely new
class of non-peptidic HIV-protease inhibitors.
These days databases containing 3D structures of molecules generated from 2D
structural formulae are commonly used alongside experimental structural databases.
In other approaches, the molecule’s spatial structure is generated on the fly during the
search (▶ Sect. 15.2). Here, as with most entries in the Cambridge Database, each
molecule is present in only one conformation. Molecules can, however, adopt many
different conformations (▶ Chap. 16, “Conformational Analysis”). It is therefore
usually the exception that a flexible molecule exists in the “right” conformation
required for the search. Therefore conformational flexibility must be considered
during the search. An elaborate search, for example, the active-analogue approach,
would demand too much computational time. Therefore fast algorithms have been
developed to figure out whether particular pharmacophoric groups on the molecules
could fall within predefined distances. It is enough to estimate the minimum or
maximum achievable distances. This concept has been realized, for example, in the program
UNITY from the company Tripos. One can start from a database holding multiple
precalculated conformers. Here it is critical that the stored conformers are distributed
as representatively as possible throughout the conformational space (▶ Sect. 16.6).
The single conformers are then checked to see whether they fit to the defined
pharmacophore. This concept is followed by the program Catalyst from the company
Accelrys.
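The two search strategies described above, a fast estimate of the achievable distance range followed by a check of stored conformers, can be combined in a small Python sketch. It does not reflect how UNITY or Catalyst are implemented. The data layout, the function names, the feature labels, and the assumption that the minimum and maximum achievable distances have been precomputed for each database entry are all hypothetical.

```python
from math import dist


def matches(conformer, query):
    """True if every queried feature pair lies inside its distance window."""
    return all(
        lo <= dist(conformer[a], conformer[b]) <= hi
        for (a, b), (lo, hi) in query.items()
    )


def bounds_allow(bounds, query):
    """Cheap pre-filter: can the achievable distance range overlap the window?"""
    return all(
        bounds[pair][0] <= hi and bounds[pair][1] >= lo
        for pair, (lo, hi) in query.items()
    )


def search(database, query):
    """Return the names of entries with at least one conformer matching the query."""
    hits = []
    for name, entry in database.items():
        if not bounds_allow(entry["bounds"], query):
            continue  # no conformation can satisfy the query, skip the conformers
        if any(matches(c, query) for c in entry["conformers"]):
            hits.append(name)
    return hits


# Hypothetical query: distance windows (in angstroms) between pharmacophoric groups
query = {("acid", "base"): (5.0, 7.0), ("acid", "aromatic"): (3.0, 4.5)}
database = {
    "mol_A": {
        "bounds": {("acid", "base"): (4.0, 8.0), ("acid", "aromatic"): (3.0, 5.0)},
        "conformers": [
            {"acid": (0, 0, 0), "base": (4, 0, 0), "aromatic": (0, 4, 0)},
            {"acid": (0, 0, 0), "base": (6, 0, 0), "aromatic": (0, 4, 0)},
        ],
    },
    "mol_B": {  # too short to span the acid-base distance at all
        "bounds": {("acid", "base"): (2.0, 4.0), ("acid", "aromatic"): (3.0, 5.0)},
        "conformers": [{"acid": (0, 0, 0), "base": (3, 0, 0), "aromatic": (0, 4, 0)}],
    },
    "mol_C": {  # bounds would allow a match, but no stored conformer fits
        "bounds": {("acid", "base"): (4.0, 8.0), ("acid", "aromatic"): (3.0, 5.0)},
        "conformers": [{"acid": (0, 0, 0), "base": (4.5, 0, 0), "aromatic": (0, 4, 0)}],
    },
}
print(search(database, query))
```

The entry mol_C illustrates why the stored conformers should cover the conformational space as representatively as possible: the molecule could satisfy the query in principle, but it is missed because the suitable conformer was not stored.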
It is not to be expected that such database searches directly deliver candidates for
clinical trials. As an idea generator, however, they can guide the drug researcher to
novel lead structures and can drive synthetic plans down entirely different path-
ways. Today database searches are carried out on a large scale during the course of
virtual screenings (▶ Sect. 7.6). For this, proprietary compound libraries are
screened, or collections of commercially available compounds are searched.
John Irwin and Brian Shoichet at UCSF in San Francisco have taken the initiative
with the ZINC database, which collects currently commercially available compounds
and makes the collection available for database searches. Preset filters help to sieve out
the desired subsets for the search at hand from the millions of compounds in the
databases. As a major advantage, the found hits can be purchased and experimentally
tested in an assay. Many candidates for new lead structures have already been
discovered by using this “lead discovery by shopping” strategy (see ▶ Sect. 21.7).
17.12 Synopsis
• The structure of the binding pocket determines which functional groups are
necessary on the ligand side for successful protein binding. Either the ligand
or the protein structure can be used as the starting point from which
a pharmacophore is derived.
• The superposition of active and inactive small molecule ligands from a series of
related compounds upon one another can be used to define the allowed and
forbidden areas in a hypothetical binding pocket. Logical operations on these volumes,
such as differences, give guidance for the design of optimized ligands.
• Flexible molecules that can adopt different conformations present a special
challenge in superpositions. The molecules must be energy-minimized as part
of the superposition procedure or, alternatively, multiple conformations must be
evaluated.
• Alternatively, a set of molecules can be superimposed by assigning
pharmacophoric groups, and through systematic rotations about all open-
chain single bonds a common alignment is found in the active-analogue
approach.
• Care must be taken to not be deceived by molecules that look similar with
respect to their chemical formulae. Instead, the interacting functional groups are
important for the molecular recognition at the binding pocket and not the
scaffold itself. The role of water in the binding must not be underestimated.
• Molecular recognition properties can also be considered to mutually superim-
pose molecules.
• The synthesis of a structurally rigid analogue (or analogues) can help to define
and validate the pharmacophore assignment and the determination of the bio-
logically active conformation.
• Binding “hot spots” can be found by examining the protein by mapping the
binding pocket with small molecules and probes with different properties. These
give some ideas as to what sort of molecule might show successful binding to the
target protein.
• The Cambridge Database of crystal structures provides valuable insights into
preferred interaction geometries and motifs. Such information is of high rele-
vance for protein–ligand complexes because the forces that are responsible for
crystal packing are the same as for non-bonding interactions between active
substances and proteins.
• A variety of databases are available that can be screened by using a 3D
pharmacophore as a search query. Usually, commercially available compounds
are screened first. If they show activity on a certain protein of interest, they can be
purchased and tested, and will hopefully provide a starting point for lead discovery.
Bibliography
General Literature
Klebe G (1993) Structural alignment of molecules. In: Kubinyi H (ed) 3D-QSAR in drug design,
Theory, methods and application. ESCOM, Leiden, pp 173–199
Langer T, Hoffmann RD (2006) Methods and principles in medicinal chemistry. In: Mannhold R,
Kubinyi H, Folkers G (eds) Pharmacophores and pharmacophore searches, vol 32.
Wiley-VCH, Weinheim
Marshall GR (1989) Computer-aided drug design. In: Richards WG (ed) Computer-aided
molecular design. IBC Technical Services, London, pp 91–104
Special Literature
Bolin JT, Filman DJ, Matthews DA, Hamlin RC, Kraut J (1982) Crystal structure of Eschericha
coli and Lactobacillus casei dihydrofolate reductase refined at 1.7 Å resolution. J Biol Chem
257(13):13650–13662
Kearsley SK, Smith GM (1990) An alternative method for the alignment of molecular structures:
maximizing electrostatic and steric overlap. Tetrahedron Comput Methodol 3:615–633
Klebe G, Mietzner T, Weber F (1995) Different approaches toward an automatic structural
alignment of drug molecules: applications to sterol mimics, thrombin and thermolysin inhib-
itors. J Comput-Aided Mol Des 8:751–778
Klunk WE, Kalman BL, Ferrendelli JA, Covey DF (1983) Computer-assisted modeling of the
picrotoxinin and g-butyrolactone receptor site. Mol Pharmacol 23:511–518
Kuster DJ, Marshall GR (2005) Validated ligand mapping of ACE active site. J Comput-Aided
Mol Des 19:609–615
Mackay MF, Sadek M (1983) The crystal and molecular structure of picrotoxinin. Aust J Chem
36:2111–2117
Marshall GR, Barry CD, Bossard HE, Dammkoehler RA, Dunn DA (1979) The conformational
parameter in drug design: the active analog approach. In: Olson EC, Christoffersen RE (eds)
Computer-assisted drug design, vol 112, ACS symposium series. American Chemical Society,
Washington, DC, pp 205–226
Martin YC (1992) 3D database searching in drug design. J Med Chem 35:2145–2154
Mayer D, Naylor CB, Motoc I, Marshall GR (1987) A unique geometry of the active site of
angiotensin-converting enzyme consistent with structure-activity studies. J Comput-Aided Mol
Des 1:3–16
Seidel W, Meyer H, Born L, Kazda S, Dompert W (1984) Rigid calcium antagonists of the
Nifedipine-type: geometric requirements for the dihydropyridine receptor. In: Seydel JK (ed)
QSAR as strategies in the design of bioactive compounds. VCH, Weinheim, pp 366–369
370 17 Pharmacophore Hypotheses and Molecular Comparisons
18 Quantitative Structure–Activity Relationships
Quantitative structure–activity relationships, QSAR (usually pronounced [ˈkjuːsar]),
attempt to describe and quantify the correlation between chemical structure
and biological activity. The investigated substances should come from a chemically
uniform series and must interact with the same biological target. They should also
display the same mode of action. For example, structurally analogous inhibitors of
a particular protein can be compared among themselves, but not different blood
pressure lowering drugs that have diverse modes of action on different target proteins.
The correlation of biological activity with the physicochemical properties is always
related to relative potency in a test model, but not to different effect qualities.
The foundation of quantitative correlations between chemical structure and
biological effect is the entirely reasonable assumption that the differences in the
physicochemical properties are responsible for the relative potency of the interac-
tions of the drug with biological macromolecules. It is assumed in the first approx-
imation that these contribute additively to the affinity of an active substance on its
receptor. The concept of describing the biological activity of substances with
mathematical models is derived from this approach.
For the system under investigation, it can be assumed that the simpler it is, the
more likely it will be that a quantitative structure–activity relationship can be
derived. To a certain extent this is valid for in vitro systems, such as the inhibition
of an enzyme or the binding to a receptor, where the assay records only the binding
of a compound to a protein. The more complex the system is, for example, central
nervous system effects on an animal after oral administration, the more different
processes must be considered. In this case the absorption, distribution, blood–brain
barrier penetration, further transport to the target tissue, metabolism, and elimina-
tion overlap with one another and with the actual effect on the receptor. In principle,
an individual structure–activity relationship is required for each of these events.
Establishing valid and relevant models for each of these steps requires
corresponding test systems that examine the different steps separately. In favorable
cases it might be possible to characterize a complex multistep process by one single
equation. This is only feasible if one step, for instance, the penetration through the
blood–brain barrier, dominates the entire structure–activity relationship.
G. Klebe, Drug Design, DOI 10.1007/978-3-642-17907-5_18,
© Springer-Verlag Berlin Heidelberg 2013
18.1 Structure–Activity Relationships of Alkaloids
The South American dart poison tubocurare (▶ Sect. 7.1) was the first therapeutic
principle for which the exact mode of action was elucidated. In 1852, Claude
Bernard recognized that this quaternary alkaloid causes muscle paralysis, but that
the nerve as well as the muscle remain independently excitable. Curare must
therefore act on the coupling between nerve and muscle. Scottish pharmacologists
Alexander Crum-Brown and Thomas Fraser occupied themselves somewhat more
exhaustively with the question of whether the quaternization of the nitrogen atom of
different alkaloids (Fig. 18.1) has an influence on their biological effects. In 1868,
from the entirely different effects observed before and after this transformation of the
alkaloids, they formulated a general equation to describe structure–activity
relationships (Eq. 18.1).
$$\Phi = f(C) \tag{18.1}$$
This equation is ingeniously simple, but it says only that Φ (Greek letter Phi), the
biological activity, is a function of C, the chemical structure. At that time, the
tetrahedral structure of the carbon atom had not been clarified, and the constitution
of many organic compounds, above all complex natural products, was entirely
unknown.
18.2 From Richet, Meyer, and Overton to Hammett and
Hansch
In 1893, Charles Richet published an investigation on the toxicity of organic
compounds. From the comparison of the water solubility of ethanol, diethyl ether,
urethane, paraldehyde, amyl alcohol, and absinthe extract(!) to the lethal dose in the
dog, he concluded “plus ils sont solubles, moins ils sont toxiques”, that is, the better the
solubility, the less the toxicity. This was the first evidence of a linear inverse
relationship between water solubility and biological activity.
[Figure 18.1: scheme relating the positively charged form and the neutral alkaloid (equilibrium labeled with pH 9) and the quaternary, perpetually charged form obtained with, e.g., CH3I.]
Fig. 18.1 The protonation of a tertiary amine depends on the pH value of the medium (left). On
the other hand, the quaternization of a nitrogen atom leads to a permanently positively charged
compound (right).
Around the turn of the twentieth century, pharmacologist Hans Horst Meyer and
botanist Charles Ernest Overton independently founded the lipid theory of anesthesia,
which unifies three important statements:
• All chemically unreactive substances that are lipophilic and can be distributed in
biological systems have anesthetic effects.
• The biological effect occurs in nerve cells because fat plays an important role in
their function.
• The relative potency of anesthetics depends on their partition coefficient
(▶ Sect. 19.2) in a mixture of fat and water.
The work of Crum-Brown, Fraser, and Richet, or the contribution of Meyer and
Overton can be seen as the origin of quantitative structure–activity relationships. In
fact after the formulation of the anesthesia theory, numerous other linear, and later
non-linear, dependencies on the lipophilicity, the “fat affinity” of active substances,
were found. But all of these activities were relatively unspecific “membrane”
effects.
In the middle of the 1930s Louis P. Hammett formulated a relationship between
the electronic properties of the substituents and the reactivity of aromatic compounds.
Accordingly, the relative contributions of electron-withdrawing and electron-donating
substituents to the electron density of the aromatic ring are always
constant. They are determined by the electronic parameter of the substituent, the
Hammett constant, σ. Electron-accepting substituents with positive σ values are,
among others, the nitro group, the cyano group, and the halogens. Electron-donating
substituents with negative σ values are hydroxyl and amino groups, the
methoxy group, and alkyl substituents. Acceptor substituents enhance the acidity of
benzoic acids and phenols, they reduce the basicity of anilines, and they accelerate
the basic hydrolysis of benzoic acid esters. Electron-donating substituents exert an
opposite influence.
However, an individual reaction constant ρ must be applied for each reaction
type of aromatic compounds. By using Eq. 18.2, later generally called the Hammett
equation, the equilibrium constant K for an arbitrary reaction can be calculated from
ρ and σ. R–X and R–H represent the relevant aromatic compounds substituted with
the group X, or unsubstituted, respectively.
$$\rho\sigma = \log K_{RX} - \log K_{RH} \tag{18.2}$$
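As a hedged illustration of Eq. 18.2, the following Python sketch estimates the pKa values of substituted benzoic acids. It assumes that K is the acid dissociation constant, that ρ = 1 for the ionization of benzoic acids in water at 25 °C (the reference reaction conventionally used to define σ), and it uses commonly tabulated σ values and a pKa of 4.20 for benzoic acid. None of these numbers is taken from this chapter, and the function name is hypothetical.

```python
# Sketch of the Hammett equation (Eq. 18.2): rho * sigma = log K_RX - log K_RH.
# All numbers below are commonly tabulated literature values and are
# assumptions of this example, not data from the chapter.

SIGMA = {
    "p-NO2": 0.78,    # strong electron acceptor
    "m-Cl": 0.37,     # acceptor
    "H": 0.00,        # unsubstituted reference
    "p-CH3": -0.17,   # weak electron donor
    "p-OCH3": -0.27,  # electron donor
}

PKA_BENZOIC_ACID = 4.20  # assumed pKa of the parent compound R-H
RHO_IONIZATION = 1.0     # assumed reaction constant for this reference reaction


def estimate_pka(substituent, pka_parent=PKA_BENZOIC_ACID, rho=RHO_IONIZATION):
    """Estimate the pKa of a substituted benzoic acid.

    Because pKa = -log K, Eq. 18.2 rearranges to
    pKa(R-X) = pKa(R-H) - rho * sigma.
    """
    return pka_parent - rho * SIGMA[substituent]


if __name__ == "__main__":
    for name in SIGMA:
        print(f"{name:7s} estimated pKa = {estimate_pka(name):.2f}")
```

Acceptor substituents (positive σ) lower the estimated pKa, that is, they enhance the acidity, whereas donor substituents raise it, in line with the qualitative statements above.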
Acceptor and donor substituents influence the electron density on the heteroatoms
and reduce or increase the ability to form hydrogen bonds. This, among other things,
explains the electronic influence of aromatic substituents on the biological activity of
drug molecules. The Hammett equation was therefore seen as a challenge to pharmaceutical
chemists and biologists to derive quantitative structure–activity relationships
from this concept. Many groups have made efforts to find relationships between
biological activity and the Hammett constants σ, or to derive σ- and/or ρ-analogous
substituent and test parameters for biological systems. Despite individually
interesting results, no generally valid concept could be established.
It was Corwin Hansch and Toshio Fujita who in 1964 published a work that
established the fundamentals for quantitative structure–activity relationships. In
this, they describe:
• The definition of a lipophilicity parameter π, analogous to the electronic term σ
in the Hammett equation.
• The combination of different parameters in a model.
• The formulation of a parabolic model for the description of non-linear
lipophilicity–activity relationships.
18.3 The Determination and Calculation of Lipophilicity
Corwin Hansch had previously investigated the structure–activity relationship of
phenoxyacetic acids, which show growth-stimulatory effects in plants. In addition
to their biological activity, he was particularly interested in their lipophilicity,
which can be measured by the partition coefficient in an octanol/water system
(▶ Sect. 19.1). It occurred to him while analyzing the data that the lipophilicity is
an additive molecular parameter. The logarithm of the octanol/water partition
coefficient P is given by the sum of the group contributions of the individual parts
of the molecule. Hansch defined a lipophilicity parameter π (Eq. 18.3), analogously
to the Hammett equation. R–X and R–H have the same meaning here as in
Eq. 18.2. The absence of a reaction-specific ρ term in Eq. 18.3 is because the π
value is based on a single distribution system: n-octanol and water.
$$\pi = \log P_{RX} - \log P_{RH} \tag{18.3}$$
n-Octanol was chosen for theoretical and practical reasons. It has a long aliphatic
chain and a hydroxyl group that is an H-bond donor as well as an acceptor. Its
structure therefore resembles the membrane lipids to some extent. It dissolves a
large number of organic compounds, and it has a low vapor pressure but can nonetheless
be easily removed. Its UV transparency over an extremely wide range is
particularly advantageous.
With the help of the lipophilicity parameter π, the log P values of new compounds,
and therefore their lipophilicity, can be calculated. For this the lipophilicity
of the basic scaffold and the π values of the substituents must be known. In this way
the biological activity can be correlated without the tedious experimental measurements
of each individual partition coefficient. In addition to the π values of all
important substituents, a very large number of experimentally determined octanol/
water partition coefficients are available in the literature.
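The additive estimate described above can be sketched in a few lines of Python. The log P of benzene (2.13) and the aromatic π values used here are commonly tabulated literature values, not numbers from this chapter, and the function name is hypothetical. The sketch also ignores interactions between neighboring substituents, which can limit simple additivity in practice.

```python
# Sketch of the additive log P estimate: log P(R-X) = log P(R-H) + sum of pi.
# The values below are commonly tabulated literature values for substituents
# on a benzene ring and are assumptions of this example.

LOG_P_BENZENE = 2.13  # assumed log P of the unsubstituted scaffold

PI = {
    "H": 0.00,
    "CH3": 0.56,
    "Cl": 0.71,
    "NO2": -0.28,
    "OH": -0.67,
}


def estimate_log_p(substituents, log_p_scaffold=LOG_P_BENZENE):
    """Estimate log P from the scaffold value and the pi values (Eq. 18.3)."""
    return log_p_scaffold + sum(PI[s] for s in substituents)


if __name__ == "__main__":
    for subs in (["Cl"], ["CH3"], ["OH"], ["Cl", "CH3"]):
        print(subs, f"estimated log P = {estimate_log_p(subs):.2f}")
```

Lipophilic substituents such as chloro or methyl raise the estimate, whereas polar groups such as hydroxyl lower it.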
18.4 Lipophilicity and Biological Activity
Lipophilicity has an overwhelming role in describing the dependence of biological
effects on chemical structure and therefore accounts for many quantitative
structure–activity relationships. This is easily understood because biological
systems consist of aqueous phases that are separated by lipid membranes. The
transport and the distribution of small molecules in such systems must therefore
depend on the lipophilicity. For polar substances the lipid membrane represents
a barrier that they cannot surmount. Only substances with moderate lipophilicity
have a good chance to “migrate” into the aqueous as well as the lipid phases to
arrive in adequate concentrations in the target tissue (▶ Chap. 19, “From In Vitro to
In Vivo: Optimization of ADME and Toxicology Properties”). Although soluble
proteins carry overwhelmingly polar amino acid residues on their surfaces, the more-
or-less buried binding site for ligands is constructed from polar and non-polar areas.
The hydrophobic parts of the ligand bind to the hydrophobic parts of the pocket.
The size of these hydrophobic surface areas is always limited. The size and form of
the lipophilic portion of the ligand must fit to the hydrophobic surfaces in the binding
pocket. Because the natural ligands that are normally bound in these pockets have
adequate water solubility themselves, the lipophilic areas in the binding pockets are of
limited size. Another reason for the complex, generally non-linear lipophilicity–activity
relationships results from this fact. Many linear and non-linear lipophilicity–activity
relationships describe relatively unspecific biological effects such as anesthetic,
bactericidal, fungicidal, and hemolytic effects. They shall not be further discussed
here. Other relationships describe the transport and distribution in a biological system.
Such structure–activity relationships are discussed in ▶ Chap. 19, “From In Vitro to
In Vivo: Optimization of ADME and Toxicology Properties”.
18.5 The Hansch Analysis and the Free–Wilson Model
In 1964 Corwin Hansch and Toshio Fujita derived, more intuitively than
theoretically, a mathematical model that can quantitatively describe structure–activity
relationships: the Hansch analysis (Eq. 18.4).
$$\log \frac{1}{C} = k_1 (\log P)^2 + k_2 \log P + k_3 \sigma + \dots + k \tag{18.4}$$
In Eq. 18.4, C is a molar concentration that induces a particular biological effect.
When related to a series of substances, it is the equieffective molar dose. Log P is the
logarithm of the octanol/water partition coefficient P, and σ is the Hammett constant.
The square of the log P term allows the quantitative description of non-linear
lipophilicity–activity relationships. This term is omitted when the dependence is
linear. Other terms such as polarizability and steric parameters can additionally occur.
The coefficients k1, k2, … and k are determined with the method of regression
analysis. The Hansch analysis therefore establishes a hypothetical model for quantitative
relationships between biological activity and physicochemical parameters.
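A short worked consequence of the parabolic term, offered as a sketch rather than as part of the original derivation: if all other terms of Eq. 18.4 are held constant, setting the derivative with respect to log P to zero gives the lipophilicity at which the modeled activity is extremal.

$$\frac{\partial \log (1/C)}{\partial \log P} = 2 k_1 \log P + k_2 = 0 \quad\Rightarrow\quad \log P_{\mathrm{opt}} = -\frac{k_2}{2 k_1}$$

This extremum is a maximum, that is, an optimal lipophilicity, only when the fitted coefficient k1 is negative.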
Biological data are flawed, and the same is true for physicochemical properties.
Despite this, the reliability of the latter parameters is usually greater than that of
the biological data. The result of a calculation is judged by the squared differences
between the measured biological data and the values that were calculated from the
model. The sum of these squared differences must show the smallest possible value
over all of the investigated compounds. It represents an important criterion for the
judgment of the quality of a model, or for the comparison of different models with
different qualities.
The quantitative structure–activity relationship of the antiadrenergic effect of
N,N-dimethyl-β-bromophenethylamines 18.1 (Table 18.1) is considered as an
example. Depending on their structure, these compounds reverse the agonistic
effect of an adrenaline dose to a greater or lesser extent. The value C is the dose
of an antagonist that blocks the adrenaline effect by 50%. The data can be described
with the Hansch model, which is illustrated in Fig. 18.2.
Table 18.1 The biological activity of meta- and para-substituted phenethylamines 18.1 (i.v.
application in the rat; C in mol/kg rat)
[Structure 18.1: N,N-dimethyl-β-bromophenethylamine (× HCl) with meta substituent X and para substituent Y]
meta (X)   para (Y)   log 1/C
H          H          7.46
H          F          8.16
H          Cl         8.68
H          Br         8.89
H          I          9.25
H          Me         9.30
F          H          7.52
Cl         H          8.16
Br         H          8.30
I          H          8.40
Me         H          8.46
Cl         F          8.19
Br         F          8.57
Me         F          8.82
Cl         Cl         8.89
Br         Cl         8.92
Me         Cl         8.96
Cl         Br         9.00
Br         Br         9.35
Me         Br         9.22
Me         Me         9.30
Br         Me         9.52

The description of the entire data set is possible with a mathematical model by
using the derived equations. A carbocation is formed upon cleavage of bromine,
and the substances bind irreversibly to the adrenergic receptor. Accordingly, the σ⁺
term is found in the Hansch equation (Fig. 18.2), which describes such reaction
types particularly well. Lipophilic substituents increase the biological activity
(positive π term) and electron-withdrawing substituents decrease it (negative σ⁺
term). Therefore lipophilic electron-donating substituents, for example, large alkyl
substituents, should be optimal for the activity. In addition, within certain limits, the
effect of further compounds can be predicted. Interpolations, that is, conclusions
that are drawn based upon very similar substituents, have a better reliability than
extrapolations, which are predictions made outside of the parameter space, for
instance, for considerably more lipophilic, more polar, or larger substituents. As
a first approximation, it can be said of the statistical parameters r, s, and F
(Fig. 18.2) that the correlation coefficient r should have values that are close to
1.00, the standard deviation, s, should be as small as possible, and the F value
should be as large as possible. The better these criteria are fulfilled, the better the
quantitative model will be; in other words, the better the experimental and calculated
values agree with one another.
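The statistical parameters are not independent of one another. As a sketch, the overall F value can be recomputed from r, the number of compounds n, and the number of fitted variables k with the standard regression formula. The formula itself is an assumption of this example because it is not given in the text. For the equation in Fig. 18.2, k = 2 (one lipophilicity and one electronic term).

```python
def f_value(r, n, k):
    """Overall F statistic of a regression model.

    r: correlation coefficient, n: number of compounds,
    k: number of fitted variables (excluding the constant term).
    """
    r2 = r * r
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


if __name__ == "__main__":
    # Values reported in Fig. 18.2: n = 22, r = 0.945, two variables.
    print(round(f_value(0.945, 22, 2), 1))
```

With the rounded r = 0.945 this gives about 79, close to the reported F = 78.6. The small difference is consistent with rounding of r.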
Fig. 18.2 A QSAR equation delivers individual parameters for a quantitative model for the
prediction of biological activity, in this case from substituted N,N-dimethyl-β-
bromophenethylamines (Table 18.1).
$$\log \frac{1}{C} = 1.15\,(\pm 0.2)\,\pi - 1.46\,(\pm 0.4)\,\sigma^{+} + 7.82\,(\pm 0.2)$$
(n = 22; r = 0.945; s = 0.196; F = 78.6)
• C is the molar concentration that invokes a particular biological effect; the logarithm of the reciprocal value gives the correct scaling.
• 1.15 and −1.46 are the regression coefficient values of the lipophilicity parameter π and the electronic parameter σ⁺; 7.82 is the constant term.
• The values in parentheses are the 95% confidence intervals for the coefficients and constants.
• n is the number of compounds.
• The correlation coefficient, r, is a measure for the quality of the model.
• The standard deviation, s, is a measure of the absolute quality of the model.
• The Fisher value F is a measure of the significance; it is often not reported.

Also in 1964 and independently of Hansch and Fujita, S. R. Free and J. W. Wilson
developed an entirely different model for structure–activity analysis. Because the
original approach is confusingly formulated and awkward to use, here only a variant
shall be discussed that was later proposed by Fujita and T. Ban, the Free–Wilson
analysis. The Free–Wilson analysis assumes that within a set of chemically related
substances, a reference compound, usually the unsubstituted starting compound,
makes per se a specific contribution μ to the biological effect. Each substituent on
this scaffold delivers an “additive and constitutive” contribution aᵢ to the biological
activity (Fig. 18.3). Additive, because there is no consideration of structural variation
in other positions in the molecule, and constitutive because it does matter on what
position of the molecule the specific structural change is undertaken. Despite these
relatively simple assumptions, the Free–Wilson analysis delivers good quantitative
models for many structure–activity relationships.
In contrast to the Hansch analysis, which compares properties, the Free–Wilson
analysis is a real “structure–activity analysis,” because the parameter that codes for
the structural information (1 for present, 0 for absent) correlates with biological
effects. It is easily carried out, but the structures and the biological data must be
known. Unfortunately, the Free–Wilson analysis also has disadvantages:
• The structural variation must be present on at least two different substitution
sites, because otherwise there will not be enough degrees of freedom to use
statistical methods.
• The usually large number of variables diminishes the predictive value and
reliability of the analyses.
• Predictions are only possible for combinations of substituents that have already
been considered in the analysis, and not for new substituents.
[Figure 18.3: scheme of a basic scaffold (contribution μ) bearing substituents X1, X2, …, Xn (contributions a1, a2, …, an), which together form the active substance. Free–Wilson model:]
$$\log \frac{1}{C} = \sum_i a_i + \mu$$
Fig. 18.3 The Free–Wilson analysis uses the additive nature of the group contributions to
describe the biological activity. Accordingly, the biological activity in the displayed equation is
made up of the activity of the basic scaffold, μ, and the constant group contributions aᵢ of the
substituents Xᵢ.

Table 18.2 Free–Wilson group contributions for phenethylamines
Position   H      F       Cl     Br     I      Me
meta       0.00   -0.30   0.21   0.43   0.58   0.45
para       0.00   0.34    0.77   1.02   1.43   1.26
μ = 7.82
(n = 22; r = 0.97; s = 0.19)ᵃ
ᵃ For an explanation of these values see Fig. 18.2

If the Free–Wilson analysis is applied to the above-mentioned antiadrenergic
phenethylamine example, the values in Table 18.2 are obtained for the scaffold and
the substituent contributions. Even after a quick glance, an increase in the values from
F to Cl and Br to I, that is, the influence of the lipophilicity, is obvious. Despite having
almost the same lipophilicity, the methyl and chloro substituents contribute differently.
This is explained by their different electronic properties. Differences in the electronic
influence between the meta and para positions can also be seen. Therefore the Free–Wilson
analysis indeed has advantages for the analysis of substituent effects.
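The additive model can be checked directly against the data. The following Python sketch recomputes log 1/C for all 22 compounds of Table 18.1 from the scaffold value and the group contributions of Table 18.2, and then derives r and s. Two assumptions are made here: the meta-F contribution is taken as -0.30, the value implied by the single meta-fluoro compound in Table 18.1 (7.52 - 7.82), and s is computed with n - 10 - 1 degrees of freedom (ten substituent contributions plus the scaffold term). All function names are hypothetical.

```python
import math

MU = 7.82  # contribution of the reference scaffold (Table 18.2)

# Group contributions from Table 18.2. Assumption: meta-F is -0.30, as implied
# by the only meta-fluoro compound in Table 18.1 (7.52 - 7.82).
META = {"H": 0.00, "F": -0.30, "Cl": 0.21, "Br": 0.43, "I": 0.58, "Me": 0.45}
PARA = {"H": 0.00, "F": 0.34, "Cl": 0.77, "Br": 1.02, "I": 1.43, "Me": 1.26}

# (meta substituent, para substituent, observed log 1/C) from Table 18.1
DATA = [
    ("H", "H", 7.46), ("H", "F", 8.16), ("H", "Cl", 8.68), ("H", "Br", 8.89),
    ("H", "I", 9.25), ("H", "Me", 9.30), ("F", "H", 7.52), ("Cl", "H", 8.16),
    ("Br", "H", 8.30), ("I", "H", 8.40), ("Me", "H", 8.46), ("Cl", "F", 8.19),
    ("Br", "F", 8.57), ("Me", "F", 8.82), ("Cl", "Cl", 8.89), ("Br", "Cl", 8.92),
    ("Me", "Cl", 8.96), ("Cl", "Br", 9.00), ("Br", "Br", 9.35), ("Me", "Br", 9.22),
    ("Me", "Me", 9.30), ("Br", "Me", 9.52),
]


def predict(meta, para):
    """Free-Wilson estimate: scaffold value plus the two group contributions."""
    return MU + META[meta] + PARA[para]


def fit_statistics(data, n_contributions=10):
    """Return (r, s) for the tabulated model applied to the data."""
    observed = [obs for _, _, obs in data]
    calculated = [predict(m, p) for m, p, _ in data]
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - c) ** 2 for o, c in zip(observed, calculated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    r = math.sqrt(1.0 - sse / sst)
    # Assumed degrees of freedom: n - (number of contributions) - 1
    s = math.sqrt(sse / (len(data) - n_contributions - 1))
    return r, s


if __name__ == "__main__":
    for meta, para, obs in DATA:
        print(f"{meta:2s} {para:2s} observed {obs:.2f} calculated {predict(meta, para):.2f}")
    r, s = fit_statistics(DATA)
    print(f"r = {r:.2f}, s = {s:.2f}")
```

Under these assumptions the recomputed statistics agree with the values reported in Table 18.2 to two decimal places. Because the tabulated contributions are rounded, small deviations in further digits are to be expected.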
18.6 Structure–Activity Relationships of Molecules in Space
As was shown in the previous section, an attempt is made to correlate structure–
activity relationships with substance-specific parameters. These parameters, for
example, volume, polarizability, or lipophilicity are properties that are calculated
or measured for the entire molecule or for specific groups of substituents. The 3D
structure of the molecules is only conditionally considered by these descriptors.
Therefore in the context of increasing knowledge of the spatial structure of protein–
ligand complexes, the QSAR methods focus on parameters that can be derived from
the 3D structure. As a general rule the goal of these approaches is to calculate
binding affinity. The techniques can also be applied for the description of
other biological properties such as the bioavailability or the metabolic reactivity
(▶ Chap. 19, “From In Vitro to In Vivo: Optimization of ADME and Toxicology
Properties”). To distinguish them from the above-described classical QSAR tech-
niques, these are referred to as 3D-QSAR methods.
Ideally, parameters are desired that can be read directly from the 3D structure of
an active substance and that can be used to draw conclusions about their binding
affinity. The interplay between these parameters and the activity is, however, very
complex and even today is by no means fully understood. Furthermore there are
still many other biological systems on which one would like to apply 3D-QSAR
methods, but the structures of the relevant target proteins are unknown. Many
pharmacologically relevant receptors are membrane bound, and their structure
determination has proven to be extremely difficult. The knowledge of their struc-
tures is, however, a prerequisite for a reasonable estimation of the binding affinity
of a ligand from the geometry of the formed complex (▶ Chap. 4, “Protein–Ligand
Interactions as the Basis for Drug Action”). As a consequence, no attempt is made
to calculate the absolute values of the binding affinities from these incomplete
data; instead, the focus is on relative affinity differences between active substances
in a data set. The gradual changes in the substance-specific parameters are
then correlated with the biological data.
18.7 Structural Alignment as a Prerequisite for the Relative
Comparison of Molecules
Assumptions about the spatial structure of molecules are already considered in
classical QSAR techniques. Different positions of substituents, for example, in the
meta or para position of an aromatic ring, are often described by individual
parameters. In this form they are regarded in the Hansch equation as well as the
Free–Wilson analysis (Sect. 18.5). Moreover, indicator variables for different
configurations of substituents, for example, the configuration of stereoisomers,
are defined in classical QSAR models. An analogous orientation of the molecule
in a hypothetical binding pocket is assumed for the use of these parameters. For
example, it is assumed that all ortho substituents are oriented toward the “same
side” in a series of ortho-substituted derivatives. Structure–activity
relationships that correlate the biological activity with properties of the 3D structure
require, as a prerequisite, a spatial superposition of the active substances. This superposition should
approximate the relative orientation in the binding pocket as accurately as possible.
A technique was discussed in ▶ Chap. 17, “Pharmacophore Hypotheses and Molec-
ular Comparisons” that can be used for the calculation of these spatial
superpositions.
18.8 Binding Affinities as Compound Properties
Which substance-specific characteristics can be used to correlate the properties of
the 3D structure with the binding affinity? As was discussed in ▶ Chap. 4, “Protein–
Ligand Interactions as the Basis for Drug Action,” the binding affinity is composed
of enthalpic and entropic components. The first contribution comprises everything
that depends on the direct energetic interaction. These are predominantly of a steric
(van der Waals potentials, ▶ Sect. 15.4) or electrostatic (Coulomb potentials)
nature. The second contribution concentrates on the degree of ordering and the
distribution of the energy over the different degrees of freedom of the studied
system. The ligands as well as the binding pockets of a protein are solvated by water
molecules in the uncomplexed state. Upon complex formation, the enthalpic inter-
actions to these water molecules are lost. They are replaced by direct interactions
between ligand and protein. Because only the relative differences between mole-
cules of a data set are of interest, any effects that are the same for all derivatives are
ignored. Among these effects are practically all influences that affect the protein.
This omission is certainly a rough simplification because the protein changes its
solvation state upon ligand binding. Water molecules are displaced from the
binding site. Ligand-induced adaptations of side chains in the binding pocket or
changes in the rotational degrees of freedom of methyl groups and side chains
(▶ Sect. 4.10) are imaginable. These effects are either not considered or are
accepted as being the same for all molecules in the data set. Presumably this
assumption is valid for many cases. Nonetheless, many new investigations clearly
show that changes affecting the protein or the dynamics of the ligand are often not
constant within a series of compounds. Here the methods will fail.
In the beginning only the steric and electrostatic interactions of an active
substance in the binding pocket should be taken into consideration. How can these
properties be compared for a series of ligands? A first approach to this was the
hypothetical interaction models from Hans-Dieter Höltje and Lemont B. Kier. The
decisive prerequisite of the latter models was the choice of and spatial positioning
for amino acid side chains around the ligands. These assumptions can be dropped
once the molecules are embedded in a lattice and can be explored with an interac-
tion probe. Richard Cramer and M. Milne proposed such a model in 1978. It took
another 10 years until the generally applicable CoMFA method (Comparative
Molecular Field Analysis) was established. Despite many theoretical and practical
deficiencies in its application, the method was quickly accepted. Today it is
applied in many different variations.
Before such an analysis can be practically carried out, a few basic consider-
ations should be made. Do steric and electrostatic interactions consider all
contributions to ligand binding that lead to a correct relative ranking of binding
affinity? As already mentioned, the binding affinity is composed of enthalpic and
entropic contributions. A sampling of the properties via probes to map interac-
tions certainly affords a measure for how well a molecule can undergo energet-
ically favorable interactions. How well are the entropic contributions considered?
A considerable portion is made up of solvation and desolvation processes
(▶ Sect. 4.6). These processes change the local water structure around the ligand
and in the binding pocket. The water structure in the immediate vicinity of the
hydrophobic surfaces of the ligand is more ordered in the solvated state than it is
in bulk water. The transition of such ligands out of the bulk water into the
protein’s binding pocket immediately causes a certain number of water molecules
to adopt a less-ordered state. This increases the entropy of the system and pro-
motes spontaneity in the binding process. The number of water molecules that are
involved in this process depends on the size of the hydrophobic surface of
the ligand. Furthermore the displacement of the water molecules from the
binding pocket upon ligand binding increases the disorder of the examined
system and also increases its entropy. In the above-mentioned approximation it
is assumed that this water-related effect is the same for all molecules in the data
set. Therefore, it is not considered in a relative comparison. Additionally,
a molecule can move “freely” in an aqueous solution and adopt different confor-
mations. In the binding pocket, however, it is fixed predominantly in one partic-
ular conformation. Rotational, translational, and conformational degrees of
freedom are lost, and the system loses entropy. All of these influences are to be
taken into consideration for the correct treatment of affinities.
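How strongly such contributions translate into measurable affinity can be illustrated with a short calculation. The following is only a sketch under stated assumptions: it uses the standard thermodynamic relations ΔG° = ΔH° − TΔS° and ΔG° = RT ln Ki, which are not derived in this section, a temperature of 298.15 K, and a hypothetical helper name `binding_free_energy`.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol·K)

def binding_free_energy(ki_molar, temperature=298.15):
    """Return ΔG° = RT ln(Ki) in kJ/mol for an inhibition constant given in mol/L."""
    return R * temperature * math.log(ki_molar)

# Two illustrative inhibition constants (micromolar and nanomolar).
for ki in (1e-6, 1e-9):
    print(f"Ki = {ki:.0e} M  ->  ΔG° = {binding_free_energy(ki):.1f} kJ/mol")

# Free-energy change that corresponds to one order of magnitude in Ki.
per_decade = binding_free_energy(1e-8) - binding_free_energy(1e-9)
print(f"one order of magnitude corresponds to {per_decade:.2f} kJ/mol")
```

At this temperature one order of magnitude in Ki corresponds to roughly 5.7 kJ/mol, so enthalpic or entropic effects of only a few kJ/mol that differ between compounds already change the relative ranking.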
18.9 How Is a CoMFA Analysis Performed?
The most important and most often used method for 3D structure–activity analysis
is the CoMFA method. The execution of a CoMFA study first requires the choice of
a data set of suitable compounds. This data set should encompass around 50–100
compounds with related overall geometry. It should also be ensured that all sub-
stances bind to the same protein at the same site, and that a binding affinity is
known for all of them. The ligands must possess sufficient diversity with regard to
their structural variation. Their binding affinities should span at least three
orders of magnitude. Conformations are generated for all of the molecules
(▶ Chap. 16, “Conformational Analysis”) and are superimposed by using one of
the techniques discussed in ▶ Chap. 17, “Pharmacophore Hypotheses and Molec-
ular Comparisons”. As a general rule, the spatial structure of the protein, if known,
is taken, and the ligands are mutually aligned in the binding pocket. Finally the
superimposed molecules are embedded in a lattice (Fig. 18.4) that encloses them
by a broad margin. The intersections of the lattice should show a grid spacing of 1 or
2 Å. A probe, that is, an atom with the properties of hydrogen, carbon, or oxygen, or a
particle with a formal charge, is placed at each of the grid points. The interaction
energies are calculated between this probe and each molecule in the data set. The
collective interaction contributions on the grid are referred to as the interaction field
of the molecule. This also gave rise to the name of the method. Finally the fields of the
molecules in the data set are compared with one another. If the box size is 10–20 Å,
and a grid spacing of 1–2 Å is applied, there are many thousands of field values per
molecule of the data set to be handled. This huge amount of data means that the
evaluation of the fields can be computationally very intensive.
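The embedding step and the resulting amount of data can be sketched in a few lines. This is a minimal illustration, not the procedure of any particular program: the function name `build_grid`, the margin of 4 Å, and the atom coordinates are assumptions chosen for the example.

```python
import itertools
import math

def build_grid(coords, margin=4.0, spacing=2.0):
    """Return all (x, y, z) grid points of a box enclosing the atoms by `margin` Å."""
    axes = []
    for dim in range(3):
        lo = min(atom[dim] for atom in coords) - margin
        hi = max(atom[dim] for atom in coords) + margin
        n_steps = math.ceil((hi - lo) / spacing)
        axes.append([lo + i * spacing for i in range(n_steps + 1)])
    return list(itertools.product(*axes))

# Hypothetical atom positions (Å) standing in for the superimposed ligands.
atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.2, 1.3, 0.0), (3.7, 1.3, 0.5)]

for spacing in (2.0, 1.0):
    grid = build_grid(atoms, margin=4.0, spacing=spacing)
    print(f"spacing {spacing} Å: {len(grid)} grid points per field and molecule")
```

Halving the grid spacing increases the number of points roughly eightfold. A cubic box with an edge of 20 Å and a spacing of 1 Å already contains 21 × 21 × 21 = 9261 points, and each additional field multiplies the number of values per molecule again.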
18.10 Molecular Fields as Criteria of a Comparative Analysis
Steric and electrostatic interactions are described by a Lennard-Jones or a Coulomb
potential (Fig. 18.5) in force-fields (▶ Sect. 15.4). If the distance between a probe
and an atom of the molecule approaches zero, the Lennard-Jones and Coulomb
potentials increase toward infinity. For like-charged particles the Coulomb potential
approaches positive infinity; for oppositely charged particles, negative infinity. These
values reach extremely high field contributions at the grid points that fall near the
surface or lie inside a molecule. They must be avoided in a CoMFA analysis.
Therefore the field contributions above and below a particular threshold are set
to a predefined cut-off value. According to these procedures, a Lennard-Jones or
a Coulomb potential can be calculated. Aliphatic carbon atoms, for example, can be
used as probes. These probes are given a positive or negative charge to study the
electrostatic properties of the molecules. The program GRID of Peter Goodford was
introduced in ▶ Sect. 17.10. Molecular fields can be calculated with this program
for numerous probes that describe different functional groups. For each predefined
probe there are areas in space at which favorable or unfavorable interactions
between the probe and the examined molecule are to be expected.
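The truncation of the potentials can be sketched for a single probe–atom pair. The sketch rests on several assumptions that are not taken from the text: a 12-6 form of the Lennard-Jones potential with a hypothetical well depth `epsilon` and minimum distance `r_min`, energies in kcal/mol, and a cut-off of 30 kcal/mol. In a real analysis the contributions of all atoms of a molecule are summed at each grid point.

```python
COULOMB_CONSTANT = 332.06  # kcal·Å/(mol·e²); converts q1*q2/r into kcal/mol

def lennard_jones(r, epsilon=0.1, r_min=3.4):
    """12-6 Lennard-Jones energy with a minimum of -epsilon at r = r_min."""
    ratio = r_min / r
    return epsilon * (ratio ** 12 - 2.0 * ratio ** 6)

def coulomb(r, q_probe, q_atom):
    """Coulomb energy of two point charges at distance r (Å)."""
    return COULOMB_CONSTANT * q_probe * q_atom / r

def truncate(energy, cutoff=30.0):
    """Clamp an energy to the interval [-cutoff, +cutoff]."""
    return max(-cutoff, min(cutoff, energy))

def steric_field(r, cutoff=30.0):
    if r < 1e-6:  # probe sits on the atom: the potential diverges
        return cutoff
    return truncate(lennard_jones(r), cutoff)

def electrostatic_field(r, q_probe, q_atom, cutoff=30.0):
    product = q_probe * q_atom
    if product == 0.0:
        return 0.0
    if r < 1e-6:  # sign of the divergence follows the sign of the charge product
        return cutoff if product > 0 else -cutoff
    return truncate(coulomb(r, q_probe, q_atom), cutoff)

# Positively charged probe approaching an atom with a partial charge of -0.4.
for r in (0.5, 1.0, 2.0, 3.4, 6.0):
    print(f"r = {r:3.1f} Å  steric {steric_field(r):7.2f}  "
          f"electrostatic {electrostatic_field(r, 1.0, -0.4):7.2f}")
```

With these assumed parameters the steric term is already clamped at 2 Å, whereas the electrostatic term falls below the cut-off only at larger distances, which illustrates why both fields reach their cut-off values at different distances from the molecule.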
Fields other than those that probe the steric and electrostatic properties of
molecules can also be defined. In ▶ Sect. 18.8 above, it was
discussed that the hydrophobic surface of a molecule represents a measure for the
entropic contribution, particularly upon transfer from the bulk water phase. Molec-
ular fields were developed in the group of Donald Abraham that allow the hydro-
phobic properties of molecules to be explored (program HINT). These are
calculated by using a very similar distance-dependent function. The resulting
molecular field describes the lipophilicity distribution on the surface of a molecule.
[Fig. 18.4, table: one row per compound with the columns Comp., –lg(Ki) (example entries 4.15, 3.89, 6.74, 8.83, 5.74), S1, S2, S3, ..., Sn, E1, E2, E3, ..., En]
–lg(Ki) = y + a S1 + b S2 + c S3 + ... + h Sn + k E1 + m E2 + n E3 + ... + z En
Fig. 18.4 A grid is generated for the calculation of molecular fields that broadly encompasses
a molecule. The grid points are color-coded with increasing distance from the ligand (red →
yellow → green → blue → gray). The contributions from the chosen fields are calculated at points of
the lattice, which have a grid spacing of 1–2 Å. The field contributions at each point in the grid (S1,
S2,. . .Sn, E1, E2, . . . En) are written into a table. The analysis is carried out for all molecules in the
data set. The binding affinities are incorporated into the table as, for instance, –log (Ki). The field
contributions are weighted with appropriate coefficients (a, b, . . .z) and using a special statistical
method, the PLS analysis, they are related to the affinity. A model is obtained in the form of an
equation that indicates at which grid points and with what weight the different field contributions
explain the biological activity.
18.11 3D-QSAR: Correlation of Molecular Fields with Biological
Properties
Let us assume that multiple molecular fields for each molecule in a data set have
been calculated, and a correlation of their differences with the binding affinity is
attempted. How are these differences expressed? For this we want to consider three
hypothetical examples of substituted phenyl derivatives.
• In the first case, the substituents on the phenyl ring in a compound series are
varied so that increasingly large field contributions result in the vicinity of the
substituent when it is scanned with a positively charged probe. If the binding
affinities increase in the same way as the field contributions become larger, this will
be reflected in the quantitative analysis. It means that derivatives with increasingly
positively charged groups in this molecular region are more potent substances.
[Fig. 18.5, plot: energy E(r) versus distance r, showing the Lennard-Jones potential, the Coulomb potential for like charges, the Coulomb potential for opposite charges, the two cut-off values, and a Gauss curve]
Fig. 18.5 The Lennard-Jones potential (green) is a model for describing the intermolecular
interactions of two atoms without considering their charge. Negative potential values correspond
to mutual attraction, positive values to repulsion of the particles. As the mutual distance
becomes infinite, the potential approaches zero. Upon approach it goes through a shallow minimum
due to mutual polarization. At even shorter distances it rises very steeply toward positive infinity
because of atom-atom repulsions. The Coulomb potential (blue) considers only electrostatic
interactions between charges that formally reside as point charges on the atomic nuclei. For
like-charged particles it also approaches infinity as the distance vanishes. For oppositely charged
atoms, negatively infinite values result. The hyperbolic form of the Coulomb potential is
considerably less steep, so that the
particles can still “feel” one another at larger distances. Boundary values are set for potentials in
a CoMFA analysis. A Gaussian function, which takes the course of a bell-shaped curve (here only
the right half of the “bell” is shown), describes the distance dependence of the interaction potential
between the particles in the context of the CoMSIA model. As the distance between the
particles vanishes, the curve reaches its maximum value, which remains finite.
• The second example is set up slightly differently. Now the phenyl
ring substituents are given positive or negative partial charges. Their variation
has no influence on the potency of the substances. The quantitative analysis
shows that the changes in the electrostatic field contributions have no correlation
with the biological activity. A possible explanation might be that this effect and
another property, for example, the size of the substituents, mutually cancel their
influences. It could also be that the biological activity is influenced through other
qualities of the substituents, for instance, their hydrophobic character.
• In the third case, the electrostatic properties of the substituents that are important
for binding to the receptor are hardly varied at all at the
examined position. Different substituents may be present; however,
they all have comparable partial charges. The model that analyzes the field
contributions in the vicinity of these groups does not recognize differences and
therefore also does not correlate with the binding affinity. It may well be that
a class of substituents at a particular position on a molecular scaffold is actually
very important for binding but nonetheless remains insignificant in the analysis.
This has to do with the fact that a QSAR analysis only performs relative
comparisons within a data set.
These examples are still easily manageable. The question can be posed whether
a tedious correlation method with the “detour” via molecular fields is really needed.
The situation is more complicated in practice, above all if molecules with different
scaffolds are considered. The substituents do not fall exactly on top of one another
in the molecular superposition. Their contribution must be described as a field in
space, and only as such can they be evaluated. At any rate, these examples underscore
the importance of careful planning of the analysis. The structures in the data
set must be chosen so that they have the largest possible variation of substituents
and their properties.
18.12 Graphical Interpretation of the Results of a Comparative
Molecular Field Analysis
If the full complexity of the field contributions is considered in terms of
a multidimensional matrix, there are far more field values than compounds, and
a straightforward regression analysis cannot be applied to extract how the dependent
variable, for example, the binding affinity, depends on them.
PLS analysis (partial least squares) is a statistical method that extracts relevant
and explanatory factors, so-called PLS vectors, out of the large quantities of data. In
CoMFA analysis these vectors describe the area of the fields that correlate best with
the experimentally determined affinity. The result is an equation that is analogous to
the results of the classical QSAR methods. It shows to what extent particular grid
points in the individual fields contribute to the binding affinities. Depending on how
many field points there are to be evaluated in the analysis, a strict monitoring of the
statistical significance of the derived results must be undertaken. This significance
is checked by a particular test: cross-validation.
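The extraction of PLS vectors can be sketched with the NIPALS variant for a single dependent variable (often called PLS1). This is a minimal illustration in plain Python, not the implementation used in CoMFA software; the function names `pls1_fit` and `pls1_predict` and all numbers in the table are hypothetical. The third column is deliberately the sum of the first two to mimic the strong correlation between neighboring grid points.

```python
def pls1_fit(X, y, n_components):
    """Fit a PLS1 model by NIPALS; returns column means, mean of y, and components."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered field values
    f = [v - y_mean for v in y]                                # centered affinities
    components = []
    for _ in range(n_components):
        # Weight vector: direction in field space with maximal covariance to f.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflation: remove what this component explains from E and f.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, load, q))
    return x_mean, y_mean, components

def pls1_predict(model, x):
    """Predict the dependent variable for one row of field values."""
    x_mean, y_mean, components = model
    e = [x[j] - x_mean[j] for j in range(len(x))]
    y_hat = y_mean
    for w, load, q in components:
        t = sum(e[j] * w[j] for j in range(len(e)))
        y_hat += q * t
        e = [e[j] - t * load[j] for j in range(len(e))]
    return y_hat

# Hypothetical table: five compounds, three field values each, and -lg(Ki).
fields = [[0.2, 1.0, 1.2], [0.9, 0.8, 1.7], [1.5, 0.1, 1.6],
          [2.1, 0.7, 2.8], [2.8, 0.3, 3.1]]
affinities = [4.2, 5.1, 5.9, 6.4, 7.3]

model = pls1_fit(fields, affinities, n_components=2)
for row, measured in zip(fields, affinities):
    print(f"measured {measured:.2f}   fitted {pls1_predict(model, row):.2f}")
```

Each component condenses many correlated field values into a single score, so that a handful of components can be fitted even when the number of grid points exceeds the number of compounds by orders of magnitude.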
One or more compounds are randomly extracted from the data set. A model is
constructed with the remaining derivatives, and the affinities of the removed compounds
are predicted with this model. The removal of compounds is repeated
several times, in the simplest case until every substance has been left out
once. The quality of the prediction is a measure of the reliability and
significance of the model. The result is expressed as the q² value, which
is calculated from the squared deviations between the predicted and the measured values. It cannot
exceed +1. A value of +1 indicates that a perfect model was
achieved. All predictions exactly agree with the measured binding affinities.
There is no deviation. A value of q² = 0 indicates that the predictions of the
model are no better than no model at all; it is just as good as the average of all
affinities. If q² takes on negative values, the model is worse than the average, that is,
worse than no model. A model is therefore only to be trusted when the q² value lies
above 0.4–0.5.
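The leave-one-out procedure can be sketched as follows. The sketch assumes the common definition q² = 1 − PRESS/SS, where PRESS is the sum of squared prediction errors for the left-out compounds and SS is the sum of squared deviations of the measured values from their mean. A one-descriptor regression line stands in for the PLS model, and all function names and numbers are hypothetical.

```python
def line_model(xs, ys):
    """Least-squares line through (x, y); returns a prediction function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def mean_model(xs, ys):
    """'No model': always predicts the average of the training affinities."""
    mean = sum(ys) / len(ys)
    return lambda x: mean

def q2_leave_one_out(xs, ys, fit):
    """Cross-validated q2 = 1 - PRESS/SS with each compound left out once."""
    y_mean = sum(ys) / len(ys)
    press = 0.0
    for i in range(len(xs)):
        predictor = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - predictor(xs[i])) ** 2
    ss = sum((y - y_mean) ** 2 for y in ys)
    return 1.0 - press / ss

# Hypothetical descriptor values and affinities for six compounds.
descriptor = [0.5, 1.0, 1.8, 2.4, 3.1, 3.9]
affinity = [4.4, 5.0, 5.3, 6.2, 6.4, 7.3]

print(f"regression line: q2 = {q2_leave_one_out(descriptor, affinity, line_model):.2f}")
print(f"mean only:       q2 = {q2_leave_one_out(descriptor, affinity, mean_model):.2f}")
```

In this sketch the mean-only baseline comes out below zero, because each left-out compound is predicted from the average of the remaining ones; the effect shrinks as the data set grows.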
Another step must be performed to check the predictive value of a trained model.
For this, a test data set of molecules is needed that are similar to the molecules in the
training data set, but that were not used for the training. The binding affinities are
predicted for these molecules. It is only if the correlation coefficient for this set is of
similar size to that of the training set that the model possesses adequate predictive
power.
The derived model can be used to estimate the affinity of new compounds
that have not yet been synthesized. The conformations of these compounds
are calculated and superimposed on the other structures. They must fall within
the grid that was defined in the training set. Next their field contributions are
calculated. Using the correlation derived by CoMFA for the training set, the
binding affinity of the new compounds is then computed from the field values at
the grid points that the model has identified as predictive.
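The prediction step amounts to a bounds check followed by the evaluation of the model equation. The following sketch uses hypothetical names (`inside_grid`, `predict_affinity`) and invented numbers; a real model contains one coefficient for every grid point and field.

```python
def inside_grid(atom_coords, grid_min, grid_max):
    """True if every atom lies within the box spanned by the training grid."""
    return all(
        grid_min[d] <= atom[d] <= grid_max[d]
        for atom in atom_coords
        for d in range(3)
    )

def predict_affinity(intercept, coefficients, field_values):
    """Evaluate -lg(Ki) = y + a*S1 + b*S2 + ... for one compound."""
    return intercept + sum(c * v for c, v in zip(coefficients, field_values))

# Hypothetical model and compound; none of these numbers come from a real study.
grid_min, grid_max = (-4.0, -4.0, -4.0), (8.0, 6.0, 6.0)
new_compound = [(0.1, 0.3, 0.0), (1.6, 0.2, 0.4), (2.9, 1.1, 0.2)]
intercept = 5.0
coefficients = [0.10, -0.05, 0.02]
field_values = [2.0, 4.0, -10.0]

if inside_grid(new_compound, grid_min, grid_max):
    value = predict_affinity(intercept, coefficients, field_values)
    print(f"predicted -lg(Ki) = {value:.2f}")
else:
    print("compound extends beyond the training grid; no prediction is made")
```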
CoMFA techniques establish a correlation between activity data and molecular
properties. A model can be derived that encompasses the properties of new mole-
cules, from the relative comparison within a training set. Relevant predictions are
only to be expected when the structural variations in the new molecule remain
within the scope of the model. In other words, the model cannot make predictions
about the influence of substituents that occur in areas in which there were no
structural variations in the training set. CoMFA models interpolate between field
contributions from molecules. An extrapolation to areas that were not covered by
the data set is not possible.
The results of a CoMFA analysis can be graphically evaluated. From the model
it is known at which grid points field contributions are obtained that contribute
significantly to explain the binding affinity. These contributions can be contoured
for the different fields according to their importance. They indicate volume areas
around the molecules in which changes in the field contributions run parallel or
opposite to the affinity changes in the data set. These contour maps significantly
support the design of new active substances (Sect. 18.14). They indicate the
position at which the properties of a lead structure have to be varied so that an
increase in affinity can be achieved.
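One way to decide which grid points are contoured can be sketched as follows. The weighting used here, the model coefficient multiplied by the standard deviation of the field values at that grid point, and the selected fraction of 20 % are assumptions made for this illustration; the text only states that the contributions are contoured according to their importance. The function name and all numbers are hypothetical.

```python
import statistics

def contour_points(field_table, coefficients, fraction=0.2):
    """Return indices of grid points with the largest positive and negative weight."""
    n_points = len(coefficients)
    importance = []
    for k in range(n_points):
        spread = statistics.pstdev([row[k] for row in field_table])
        importance.append(coefficients[k] * spread)
    ranked = sorted(range(n_points), key=lambda k: importance[k])
    n_select = max(1, round(fraction * n_points))
    unfavorable = [k for k in ranked[:n_select] if importance[k] < 0]
    favorable = [k for k in ranked[-n_select:] if importance[k] > 0]
    return favorable, unfavorable

# Hypothetical steric field values: three compounds (rows), five grid points (columns).
field_table = [
    [0.0, 1.0, 2.0, 5.0, 1.0],
    [0.0, 2.0, 2.0, 1.0, 3.0],
    [0.0, 3.0, 2.0, 3.0, 5.0],
]
coefficients = [0.5, 0.4, -0.9, -0.3, 0.1]

favorable, unfavorable = contour_points(field_table, coefficients)
print("field increase runs parallel to affinity at grid points:", favorable)
print("field increase runs opposite to affinity at grid points:", unfavorable)
```

Grid points 0 and 2 are not selected although they carry large coefficients: the field does not vary there within the data set, so no statement about these positions can be derived.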
18.13 Scope, Limitations, and Possible Expansions of the
CoMFA Analysis
Usually only steric and electrostatic field contributions are evaluated in CoMFA
analyses. A hydrophobic field can quantify the size of the hydrophobic surfaces and
therefore partially considers the entropic contribution to affinity. Because CoMFA
evaluations yield relevant models without the explicit use of hydrophobic fields,
these field contributions must be at least partially contained in Lennard-Jones and
Coulomb fields. The lipophilicity of a molecule increases upon enlarging an
uncharged, sterically demanding group, for instance, from methyl to butyl. Here
the changes in the steric field contributions can correctly reflect the lipophilic
surface. A correlation with electrostatic properties is also imaginable. Hydrophobic
molecular portions carry, as a general rule, only minor partial charges. Positively or
negatively charged groups represent hydrophilic regions. In this way the lipophilic
and hydrophilic surface regions can be quantified via differences in the charge.
The deviation that is not explained by a CoMFA model comprises, apart from
experimental errors, also all inadequately described binding contributions. These
include structural adaptations of the protein that are not identical for all compounds
in the data set. Entropic contributions that come from the conformational fixation of
the active substance in the binding pocket or the residual mobility of the ligand in
the binding pocket are also not considered in any of the fields.
In addition to these inadequacies, the fields themselves cause a few problems. Owing
to their functional form, very large positive or negative values are
reached at the surface or in the interior of the molecule (Fig. 18.5). Because the
Lennard-Jones potential rises faster than the Coulomb potential upon approaching the atoms,
the two reach their arbitrarily set cut-off values (Sect. 18.10) at different
distances from the molecule. Within a distance of 2 Å, which is the commonly chosen
grid spacing, the extremely steep Lennard-Jones potential can change from practically
zero to the cut-off value. These discontinuities and the neglected areas near the surface
can cause significant problems for the interpretation. Furthermore, they often cause
fragmented contour maps in the individual fields that are difficult to interpret.
The deficits in these fields have stimulated the search for other solutions. In one
method the similarity of molecules is investigated by use of their steric and
physicochemical properties in space and correlated to the binding affinity
(CoMSIA methods; Comparative Molecular Similarity Indices Analysis). The
molecules are superimposed just as they are in the CoMFA methods. Then their
relative similarity is determined through their relationship to a probe, a carbon atom
for instance, in that the similarity of each molecule is sampled with a probe at the
intersections of a surrounding grid. The measure of similarity between the probe
and the molecule is defined in a distance-dependent way. A Gaussian function
(Fig. 18.5) is chosen for this purpose. In contrast to the hyperbolic form of the
above-described potentials, the bell-shaped Gaussian curve approaches a finite value
rather than infinity as the distance decreases. Cut-off values need not be set. For many
different properties a similarity is determined at all grid points. The prerequisite is
that the properties must be described by atom-based values, for example, partial
charges or atomic volumes. The same distance dependency is used for all proper-
ties. Property-specific similarity fields are obtained. These are correlated with the
binding affinity. The interpretation of the field contributions is achieved analo-
gously to the CoMFA method. The advantage of this method lies, above all, in the
interpretability and the preserved contour maps. If a particular property in an area of
the superimposed molecules correlates significantly with binding affinity, this area
is enhanced. In contrast, the CoMFA method contours areas outside of the mole-
cules, where a property reveals changes in the field contributions that affect the
affinity positively or negatively. The setting of cut-off values, however, masks
entire areas of these field contributions near the surface (Fig. 18.5).
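The Gaussian distance dependence can be sketched for a single property. The sketch assumes that the contributions of all atoms are summed at a grid point, each weighted by exp(−αr²), with an attenuation factor α = 0.3 and a probe value of 1; the sign convention, the function name, and the partial charges are likewise assumptions made for this illustration.

```python
import math

def similarity_index(grid_point, atoms, probe_value=1.0, alpha=0.3):
    """Sum the Gaussian-weighted property values of all atoms at one grid point.

    atoms: list of ((x, y, z), property_value), e.g. a partial charge per atom.
    """
    total = 0.0
    for position, value in atoms:
        r_squared = sum((g - a) ** 2 for g, a in zip(grid_point, position))
        total += probe_value * value * math.exp(-alpha * r_squared)
    return total

# Hypothetical molecule fragment: positions in Å with partial charges.
fragment = [((0.0, 0.0, 0.0), 0.40), ((1.2, 0.0, 0.0), -0.35)]

# Move the probe along the x axis, straight through the first atom.
for x in (-3.0, -2.0, -1.0, 0.0):
    value = similarity_index((x, 0.0, 0.0), fragment)
    print(f"probe at x = {x:4.1f} Å: similarity index {value:+.3f}")
```

The value stays finite even when the probe coincides with an atom, so grid points at the surface and in the interior of the molecules can be evaluated without truncation.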
3D-QSAR analyses were first meant to establish structure–activity relationships in
cases in which the target protein’s structure was unavailable as a reference. Nowadays,
more and more crystal structures of target proteins are becoming available, so the
technique is increasingly used in cases in which this reference is actually known. The
structure then serves as a means of generating reasonable and relevant superpositions of the
substances to be compared in their biologically active conformations. It seems all the
more paradoxical to use the information about the surrounding protein environment
only to superimpose the molecules and then to relinquish this valuable data in the
comparative field analysis. Methods have been developed that consider this information.
The group of Rebecca Wade at EMBL in Heidelberg has developed the
COMBINE method. For this, a set of modeled protein–ligand complexes is used
to calculate a data table. It contains the interaction energies between individual ligand
atoms in the test molecules of the data set and the amino acid residues and water
molecules in the surrounding protein. The interpretation of this enormous data table is
achieved by using a technique that is similar to the CoMFA methods. The graphical
interpretation of the correlation model obtained by COMBINE indicates which
regions of the protein account for decisive contributions to explain the affinity
differences in the ligand data set. These are very valuable details, but they are of only
limited help in designing better molecules that achieve higher affinity.
Holger Gohlke in Marburg developed the variation AFMoC (Adaptation of
Fields for Molecular Comparison), with which it is possible to transfer information
about the protein environment into the field-based model. The advantage of the
intuitive interpretation of the field contributions with regard to the structural
optimization of the ligands is not lost. For this, atomic probes are placed at each point of
a CoMFA-like grid, and values are generated with the empirical scoring function
DrugScore (▶ Sect. 17.10). The resulting values reflect the
protein environment; the grid is thus “prepolarized.” By using a docking and
superposition technique, the ligands of the training set are then placed onto this
grid. Only when an atom of the ligand falls upon an area of the grid for which
the protein environment predicts this atom type to be advantageous is the field
contribution enhanced. In other cases the interaction contribution on the grid is
reduced. In this way a data table is generated for the entire training set analogously
to a CoMFA method. This table is accordingly evaluated and affords a QSAR
equation. The individual contributions can be shown on a grid. They indicate where
particular atom types increase or reduce affinity.
A similar field analysis is also used for the correlation and prediction of
selectivity differences between ligands. Many enzymes occur as isoforms. They
therefore have similarities in their binding pockets. As a consequence ligands show
graduated affinities or “selectivity profiles” to these isoforms. If a ligand is to be
optimized to improve selectivity, the positions at which a change in a property
results in an improved profile must be known. A 3D-QSAR model is constructed for
each isoenzyme. Either the difference in the affinity values can be calculated and
used for the model as values to be predicted, or alternatively, two correlation
models can be constructed and at each grid point the field contributions are
subtracted from one another. The models that are obtained with both approaches
can be graphically interpreted. Contour diagrams show where and how the mole-
cules are to be changed to improve their selectivity with regard to the one or other
isoenzyme.
18.14 A Glimpse Behind the Scenes: Comparative Molecular
Field Analysis of Carbonic Anhydrase Inhibitors
Today comparative field analyses belong to the standard repertoire in drug research.
As an example, the binding of inhibitors to carbonic anhydrase I and II shall be
examined. The biological function of this enzyme is described in detail in ▶ Sect.
25.7. The sequence identity of the isoforms is 60%. The ligands in the training data
set are derived from the parent structures shown in Fig. 18.6. First, a superposition
model is generated by docking the ligands into the protein (Fig. 18.7). The
enzyme’s funnel-shaped binding pocket is occupied by ligands in a large variety
of ways. A good correlation model is obtained with the three methods, CoMFA,
CoMSIA, and AFMoC. The models also achieve a convincing predictive power on
a test data set that was independent from the training set.
[Fig. 18.6, structures: the six parent scaffolds Thiadiazolsulfonamide, Thienothiopyransulfonamide, Benzothiazolsulfonamide, Phenylsulfonamide, Hydroxamate, and Hydroxysulfonamide, with the variable positions R1 and R2]
Fig. 18.6 The scaffolds of inhibitors that were used in different field analyses to establish affinity
(pKi[CAII]) and selectivity models (pKi[CAII] – pKi[CAI] = ΔpKi[CAII – CAI]) to describe the
inhibition of the carbonic anhydrases CAI and CAII. Different substituents were varied at the positions
that are marked as R1 and R2.
The contours for the acceptor properties with regard to the inhibition of carbonic
anhydrase II are shown in Fig. 18.8. Molecules in the data set that exhibit an
acceptor function in the areas marked in red have lower potency. On the other
hand, an acceptor function in the blue area improves potency. Compound 18.2,
which has both acceptor functions of an SO2 group oriented in the detrimental red
area, is a weak CAII inhibitor. Moreover its NH group is in the blue region, which
should be occupied by an acceptor. Compound 18.3, which is about four orders of
magnitude more potent, leaves the area that was occupied by an oxygen atom in
18.2 empty, and orients its thiadiazole ring in the direction of the desirable acceptor
function. It achieves considerably better inhibition of the target enzyme.
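The statement about four orders of magnitude follows directly from the pKi values given for the two compounds in Fig. 18.8. The short calculation below is only an illustration; the function name `potency_ratio` is hypothetical.

```python
def potency_ratio(pki_strong, pki_weak):
    """Ratio of the inhibition constants Ki(weak) / Ki(strong) from two pKi values."""
    return 10 ** (pki_strong - pki_weak)

# pKi values for CAII as given in Fig. 18.8.
pki_18_2 = 4.7
pki_18_3 = 8.7

ratio = potency_ratio(pki_18_3, pki_18_2)
print(f"18.3 binds to CAII about {ratio:.0f} times more strongly than 18.2")
```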
Just as for the acceptor properties, contour maps can be generated for steric,
electrostatic, hydrophobic, and hydrogen-bond-donor properties. Their evaluation
Fig. 18.7 The superposition of inhibitors from the data set in the funnel-shaped binding pocket of
CAII; the zinc ion is shown as the blue-gray sphere, carbon is light-yellow, oxygen is red, nitrogen
is blue, sulfur is orange, and hydrogen is white.
helps to make evident where particular properties improve or lower the binding
affinity. Such correlation analyses help the synthetic chemist to plan the optimiza-
tion of lead structures in a tailored way.
Contour maps for steric properties that cause a selectivity difference between
CAI and CAII are shown in Fig. 18.9. Occupancy of the green areas with an
inhibitor improves the selectivity for CAI. On the other hand, spatially filling the
yellow-colored regions improves the selectivity for CAII. Compound 18.4 binds
unselectively with the same affinity to both isoforms, but 18.5 can clearly discrim-
inate between the two. The shown model is purely derived from the correlation of
ligand binding data. The relative alignment of the molecules in the data set is
accomplished in the binding pocket of the protein. Therefore the protein environ-
ment around this binding pocket should be examined more closely, to see if the
derived contours are reasonable. If the amino acid replacements between the two
isoforms are compared, it is apparent that CAI has two large residues, Phe91 and
Leu131, that constrain the lower left portion of the binding pocket. The inhibitors
have less room in CAI than they do in CAII. In fact the comparative field analysis
[Fig. 18.8, structures: compound 18.2 (CAII pKi = 4.7) and compound 18.3 (CAII pKi = 8.7), each shown with the Zn2+ ion of the binding pocket]
Fig. 18.8 Contour map for the description of the binding contributions of H-bond acceptor
properties. Inhibitors that occupy the red contour areas with H-bond acceptor groups do not inhibit
CAII well; the occupancy of the blue areas with acceptor groups, however, leads to increasing
affinity values. Both oxygen atoms of the sulfonamide group of 18.2 occupy the red-contoured area, which
is unfavorable for acceptor properties. On the other hand, 18.3 leaves these areas unoccupied and
places its basic nitrogen in the vicinity of the blue-contoured region, which is favorable for
occupancy by acceptor groups. This explains the markedly better inhibition of CAII by 18.3.
in this region generates a yellow contour (near position 91), the occupancy of which
should be favorable for the inhibition of CAII. CAII also makes a large amount of
space available for inhibitors next to position 204, which is occupied by the less
bulky Leu204 instead of Tyr204. A yellow contour is seen that indicates
a favorable occupancy of this area. Inhibitor 18.5, which is considerably more
potent on CAII, orients its pentafluorophenyl group exactly in this region (Fig. 18.9,
right). In the vicinity of position 131 (Leu131/Phe131) a yellow and a green area
occur directly next to one another but spatially separated, the occupancy of which is
favorable for CAII or CAI inhibitors, respectively. Compound 18.4, which
can hardly distinguish between the two isoforms, occupies the upper edge of both
areas equally well. Moreover it leaves virtually all regions unoccupied that should
lead to a better inhibition of either CAI or CAII for steric reasons. Therefore it is
evident why this compound shows no particular selectivity.
[Fig. 18.9, structures and panels: compound 18.4 (CAI: pKi = 8.15; CAII: pKi = 8.10) and compound 18.5 (CAI: pKi = 6.70; CAII: pKi = 9.40); the panels "CAI selective" and "CAII selective" show the residues His200, Tyr204, Leu131, and Phe91 of CAI and Thr200, Leu204, Phe131, and Ile91 of CAII]
Fig. 18.9 The selectivity can be improved with regard to CAII inhibition by sterically filling
the yellow-contoured area. Filling the green area with a sterically demanding group causes an
increase in selectivity with regard to CAI (top left). Compound 18.4 occupies virtually no area
that is particularly selectivity discriminating; the compound is not isoenzyme specific (top left and
top right). On the other hand, 18.5 occupies a yellow-contoured area neighboring position 204,
which causes a selectivity enhancement for CAII. Compound 18.5 inhibits CAII decidedly more
potently than CAI.
Finally, the binding of the well-discriminating compound 18.6 should be con-
sidered (Fig. 18.10). The evaluation of the acceptor properties of the ligands in the
training data set shows that the occupancy of the red regions with H-bond-acceptor
groups shifts the selectivity to the benefit of CAII. Filling the blue contours with
this property achieves an increase in potency regarding CAI. Compound 18.6 places
[Fig. 18.10, panels and structure: the binding pockets of CAI and CAII with the contours labeled "CAI selective" and "CAII selective"; compound 18.6 (CAI: pKi = 4.30; CAII: pKi = 8.05)]
Fig. 18.10 Compound 18.6 inhibits CAII significantly more potently than CAI. Its sulfone
oxygen atom lies near a red-contoured area, the filling of which causes an increase in the
selectivity for CAII binding. Interestingly, Gln92 is found in this region in both isoforms.
However, it is only in CAII that this group is available to donate an H-bond to the inhibitor
that will contribute to binding affinity. The comparable residue in CAI is involved in a network of
H-bonds to neighboring amino acids. Therefore it is not available as a binding partner, and
a decrease in the affinity for CAI is the consequence.
the oxygen atoms of its endocyclic SO2 group in the vicinity of the red CAII-
selective areas. Furthermore, a glutamine is found at the neighboring position 92 in
both CAI and CAII. This amino acid can donate an H-bond to an acceptor group of the
inhibitor via the NH2 group of its carboxamide. However, only CAII provides the
structural conditions for this. In CAI, Gln92 neighbors Asn69 and Glu58. The carboxamide
group of Gln92 forms a continuous H-bond network with these residues and with His94.
Therefore its NH2 group is no longer available for interactions with a bound inhibitor. This is
expressed in the poorer binding affinity of inhibitors that place an acceptor function
at this position, as 18.6 does. The situation is entirely different in CAII. The
neighboring groups of Glu69 and Arg58 form an internal salt bridge with each
other. Therefore they are not available as H-bond partners for Gln92. The
carboxamide group of Gln92 involves His94 via its carboxamide CO group in an
H-bond, and its NH2 group is now available as a donor functionality to interact
with an acceptor group of a bound ligand. This results in considerably enhanced binding to CAII and is
expressed as a selectivity advantage.
Alexander Hillebrecht at the University of Marburg has performed yet another
evaluation of the data set of carbonic anhydrase inhibitors that underscores the
difference between 3D, 2D, and 1D QSAR analyses. First, 32 so-called one-
dimensional descriptors were calculated with the MOE program for all molecules
in the data set. These are surface-based descriptors that describe the lipophilicity (log
P), the molar refraction (and therefore the polarizability), and partial charges distributed
over the molecules. These 32 descriptors were correlated with the binding affinity
to CAII or the selectivity difference between CAI and CAII to establish a QSAR
model. In another model the connectivities in the chemical formulae (so-called
molecular graphs) were used as descriptors. For this a topological connectivity tree
of all bonds in a molecular formula was generated, and by “walking” along the bond
connections it was counted how often a particular connectivity, for instance, an N–S–
C–C–N or C–N–C–C–C sequence occurs (so-called MACCS keys). In all, the
frequency of 166 different connectivity fragments was evaluated.
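The idea of counting connectivity fragments by "walking" along the bonds can be illustrated with a minimal sketch. It is not the MACCS key implementation used in the study; the toy molecular graph, the atom labels, and the function name `count_paths` are assumptions made only for illustration, and bond orders are ignored.

```python
from collections import Counter

def count_paths(atoms, bonds, length):
    """Count linear atom paths with `length` atoms (length >= 2) in a molecular graph.

    atoms: dict mapping atom index -> element symbol
    bonds: iterable of (i, j) index pairs
    Each path is counted once, regardless of the direction in which it is walked.
    """
    neighbors = {i: set() for i in atoms}
    for i, j in bonds:
        neighbors[i].add(j)
        neighbors[j].add(i)

    counts = Counter()

    def walk(path):
        if len(path) == length:
            # Every path is found twice (forward and backward); keep one copy.
            if path[0] < path[-1]:
                forward = "-".join(atoms[i] for i in path)
                backward = "-".join(atoms[i] for i in reversed(path))
                counts[min(forward, backward)] += 1
            return
        for nxt in neighbors[path[-1]]:
            if nxt not in path:  # no atom is visited twice
                walk(path + [nxt])

    for start in atoms:
        walk([start])
    return counts

# Toy example (hypothetical): heavy atoms of a sulfonamide fragment C-S(=O)(=O)-N
atoms = {0: "C", 1: "S", 2: "O", 3: "O", 4: "N"}
bonds = [(0, 1), (1, 2), (1, 3), (1, 4)]
print(count_paths(atoms, bonds, 3))
```

For the toy fragment, the three-atom path C-S-O is found twice and C-S-N once. A table of such counts over all molecules of a data set is the kind of descriptor matrix that is then correlated with the affinity data.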
Such descriptors code indirectly for the molecular composition of the individual
inhibitors in the data set, as was introduced above in the Free–Wilson analysis (Sect.
18.5). These topological 2D descriptors were then related to the binding affinity or
selectivity data as described above. Good correlation models can be derived using 1D
as well as 2D descriptors. The models based on the 1D descriptors proved not to be
predictive. If an attempt was made to predict a molecule that was not in the data set,
the model failed. The topological descriptors give better results. They possess
a certain degree of predictive power, but they perform less well than the above-
described 3D descriptors in the comparative field analysis. This comparison makes
evident that the increase in the complexity of the model and the structural validity of
the descriptors increases their predictive power with regard to the binding properties
of new molecules that were not part of the training data set. But it is especially this
predictive power and the straightforward translation of the obtained correlation
model into the design of new or the modification of existing chemical structures
during the optimization that make QSAR models valuable for drug design.
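The difference between a good correlation and a predictive model can be made concrete with a small sketch. It compares the fitted r² with a leave-one-out cross-validated q², a check that is common in QSAR work. The validation protocol actually used in the study is not described here; the single-descriptor least-squares model and all function names below are assumptions chosen for brevity.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(xs, ys):
    """Squared correlation of the model fitted to all data points."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

def q_squared_loo(xs, ys):
    """Leave-one-out cross-validated q2 = 1 - PRESS / SS_tot.

    Each compound is predicted by a model that was fitted without it.
    """
    mean_y = sum(ys) / len(ys)
    press = 0.0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_tot
```

Both functions expect plain Python lists of descriptor values and activities. A model whose q² is far below its r², or even negative, reproduces the training data without being able to predict compounds it has not seen.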
18.15 Synopsis
• The concept of quantitative structure–activity relationships is not new. It was
first described in the nineteenth century qualitatively, and later more quantita-
tively by Hansch and Fujita. It is an attempt to describe structure–activity
relationships with mathematical models.
• Across a series of structurally closely related test compounds, the equieffective
dose that induces a particular biological effect is related, in a linear or quadratic
dependence, to the logarithm of the octanol/water partition coefficient and to the
Hammett constant, which describes the electronic properties of substituents at a
given scaffold. A mathematical correlation model is computed by regression
analysis.
• 3D QSAR methods have been developed to consider and correlate the spatial
structure of active substances beyond molecular topology.
• The mutually aligned test molecules are embedded in a regularly spaced lattice
and their properties are explored with an interaction probe. This is placed
systematically at all grid points and a molecular interaction field is computed
around the aligned molecules by using a distance-dependent property
potential.
• Usually, Lennard-Jones and Coulomb potentials are evaluated, and the gener-
ated data table for all molecules of the training data set is correlated by a partial
least-squares technique.
• The derived CoMFA correlation model can be used to predict the biological
properties of novel ligands not included in the training data set. Strict
criteria to monitor the statistical significance of the derived correlations must
be defined.
• Other property fields beyond Lennard-Jones and Coulomb potentials with
mathematically different functional forms can be applied. With respect to
the prediction of binding affinity, it must be kept in mind that this property
comprises an entropic contribution that is particularly difficult to reflect in
property fields.
• QSAR analysis only performs a relative comparison of molecules with regard to
the considered biological property. Any dependence on a particular descriptor
across a compound series can only be expected if the property related to this
descriptor is varied in the series. QSAR methods only interpolate and never
extrapolate beyond the scope of molecular properties reflected by the training
set.
• Comparative molecular field analyses can be evaluated graphically. Results are
displayed as contours around the molecules and indicate where the change of
a particular property runs either parallel or opposite to the changes in the
biological property in the data set.
• The graphical information can be directly translated into the design of modified
molecules and thus support the medicinal chemist in optimizing a given lead
structure in a systematic fashion.
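The lattice-and-probe procedure summarized above can be sketched in a few lines. This is only an illustration under stated assumptions: the 12-6 Lennard-Jones form, the conversion factor of about 332 for Coulomb energies in kcal/mol, the truncation of very large values at a cutoff, and all parameter values and function names are chosen here for demonstration and are not taken from a particular program.

```python
import math

def probe_energy(point, atoms, probe_charge=1.0, cutoff=30.0):
    """Steric (Lennard-Jones 12-6) and electrostatic (Coulomb) energy of a probe
    placed at one grid point.

    atoms: list of (x, y, z, charge, epsilon, r_min) tuples (illustrative parameters)
    Returns (steric, electrostatic), both truncated at +/- cutoff.
    """
    steric = 0.0
    electrostatic = 0.0
    for x, y, z, charge, epsilon, r_min in atoms:
        r = max(math.dist(point, (x, y, z)), 1e-3)  # avoid division by zero
        ratio6 = (r_min / r) ** 6
        steric += epsilon * (ratio6 ** 2 - 2.0 * ratio6)
        electrostatic += 332.06 * probe_charge * charge / r
    return min(steric, cutoff), max(-cutoff, min(electrostatic, cutoff))

def field(atoms, origin, n, spacing):
    """Evaluate the probe on a regularly spaced n x n x n lattice."""
    table = {}
    for i in range(n):
        for j in range(n):
            for k in range(n):
                point = (origin[0] + i * spacing,
                         origin[1] + j * spacing,
                         origin[2] + k * spacing)
                table[(i, j, k)] = probe_energy(point, atoms)
    return table
```

One such table is computed per aligned molecule. Writing the values of all molecules row by row yields the data table that is then correlated with the biological data, for example by the partial least-squares technique.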
Bibliography
General Literature
Hansch C, Leo A (1995) Exploring QSAR. Fundamentals and applications in chemistry and
biology, vol 2. American Chemical Society, Washington, DC
Kubinyi H (1993a) QSAR: Hansch analysis and related approaches. VCH, Weinheim
Kubinyi H (ed) (1993b) 3D-QSAR in drug design: theory, methods, and applications. ESCOM, Leiden
Kubinyi H, Folkers G, Martin YC (1998) 3D QSAR in drug design, vol 2 and 3. Kluwer/ESCOM,
Dordrecht/Boston/London
Ramsden CA (1990) Quantitative drug design. In: Hansch C, Sammes PG, Taylor JB (eds)
Comprehensive medicinal chemistry, vol 4. Pergamon Press, Oxford
van de Waterbeemd H (1995a) Chemometric methods in molecular design. VCH, Weinheim
van de Waterbeemd H (1995b) Advanced computer-assisted techniques in drug discovery. VCH,
Weinheim
Special Literature
Blaney JM, Hansch C, Silipo C, Vittoria A (1984) Structure–activity relationships of dihydrofolate
reductase inhibitors. Chem Rev 84:333–407
Cramer RD, Patterson DE, Bunce JD (1988) Comparative molecular field analysis (CoMFA). 1.
Effect of shape on binding of steroids to carrier proteins. J Am Chem Soc 110:5959–5967
DePriest SA, Mayer D, Naylor CB, Marshall GR (1993) 3DQSAR of angiotensin-converting
enzyme and thermolysin inhibitors: a comparison of CoMFA models based on deduced and
experimentally determined active site geometries. J Am Chem Soc 115:5372–5384
Gohlke H, Klebe G (2002) DrugScore meets CoMFA: adaptation of fields for molecular compar-
ison (AFMoC) or how to tailor knowledge-based pair-potentials to a particular protein. J Med
Chem 45:4153–4170
Goodford PJ (1985) A computational procedure of determining energetically favorable binding
sites on biologically important macromolecules. J Med Chem 28:849–857
Hansch C, Klein TE (1991) Quantitative structure–activity relationships and molecular graphics in
evaluation of enzyme–ligand interactions. Methods Enzymol 202:512–543
Hillebrecht A, Klebe G (2008) The use of 3D QSAR models for database screening: a feasibility
study. J Chem Inf Model 48:384–396
Hillebrecht A, Supuran CT, Klebe G (2006) Integrated approach using protein and ligand
information to analyze affinity and selectivity determining features of carbonic anhydrase
isozymes. ChemMedChem 1:839–853
Kellogg GE, Abraham DJ (1992) Key, lock and locksmith: complementary hydropathic map
predictions of drug structure from a known receptor-receptor structure from known drugs.
J Mol Graph 10:212–217
Klebe G, Abraham U, Mietzner T (1994) Molecular similarity indices in a comparative analysis
(CoMSIA) of drug molecules to correlate and predict their biological activity. J Med Chem
37:4130–4146
Ortiz AR, Pisabarro MT, Gago F, Wade RC (1995) Prediction of drug binding affinities by
comparative binding energy analysis. J Med Chem 38:2681–2691
Unger SH, Hansch C (1973) On model building in structure–activity relationships.
A reexamination of adrenergic blocking activity of β-Halo-β-arylalkylamines. J Med Chem
16:745–749
Weber A, Böhm M, Supuran CT, Scozzafava A, Sotriffer CA, Klebe G (2006) 3D QSAR
selectivity analyses of carbonic anhydrase inhibitors: insights for the design of isozyme
selective inhibitors. J Chem Inf Model 46:2737–2760
19 From In Vitro to In Vivo: Optimization of ADME and Toxicology Properties
The interaction between a substance and the binding site of a therapeutically
relevant biological macromolecule is the decisive prerequisite for suitability as
a drug. Another, no less important, prerequisite is the ability of the substance to
travel from the site of application, along an often rather tortuous path, to
the target tissue. The substance must penetrate aqueous phases and lipid membranes
for this to occur. According to its water and lipid solubility, it will arrive in different
compartments of the biological system. It is also changed by metabolic enzymes.
After conjugation or degradation it is finally eliminated via the kidney, the bile, and/or
the intestines (▶ Sect. 9.1).
In contrast to the biological activity of a drug, which is called pharmacodynam-
ics, the sum of all processes that affect the absorption, distribution, metabolism,
and excretion, so-called ADME parameters, is covered by the term pharmaco-
kinetics. Roughly simplified, pharmacodynamics can be thought of as “the effect of
the substance on the organism” and pharmacokinetics as “the effect of the organism
on the substance.” In recent years this clear separation of definitions has begun to
disappear. The term pharmacodynamics has expanded more and more to processes
of pharmacokinetics. Above all, this has to do with increasing knowledge that
transporters or enzyme systems are responsible for properties such as absorption,
distribution, or metabolism. More and more structures are being solved for these
enzymes, and structure–activity relationships have been established (▶ Sects. 27.6
and ▶ 30.7).
The pharmacokinetics of an arbitrary biological system and the dependence of
the absorption, distribution, and excretion processes on time are described with
mathematical models. The pharmacokinetics of every pharmaceutical is scrupulously
investigated and a dosing scheme is determined before entry into clinical
trials and especially during the clinical phases I and II, which evaluate the tolerability
and efficacy in humans. The isolation and structural elucidation of metabolic
products in humans help to find the animal model that is most similar to humans
in its metabolic properties. These species are then used for toxicology studies,
which are chosen to investigate possible teratogenic effects, and long-term studies
G. Klebe, Drug Design, DOI 10.1007/978-3-642-17907-5_19,
to investigate possible carcinogenic effects. In parallel, individual metabolites of
a pharmaceutical are investigated for their toxic side effects.
In the context of the rational design of new active substances, a substantial
problem arises from the pharmacokinetic parameters and the toxicity: these inves-
tigations are only carried out for very few compounds because of the enormous
experimental effort and the high costs, and only for those compounds that are
intended for clinical development. This approach comes with a serious danger:
poor pharmacokinetic properties are recognized only at very late development
stages, after considerable sums have already been invested in the
development of a new pharmaceutical. In the middle of the 1990s a study emerged
that tellingly showed that numerous unsuccessful development campaigns failed
because of unsatisfactory pharmacokinetics and intolerable toxicity. For these
reasons, an intensified search for in vitro models to predict ADME-tox properties
has taken place in the last 15 years. Therefore it is not the pharmacokinetics of
individual substances that are investigated in detail, but rather the dependence of
different pharmacokinetic parameters on the properties of many different sub-
stances. This allows a better comprehension of the interrelationship between chem-
ical structure and pharmacokinetics. At the same time, it leads to the derivation of
general rules and numerous computer models that are today applied early on in the
design of new drugs.
19.1 Rate Constants of Compound Transport
The distribution of a substance in phases of different lipophilicities is measured as
the partition coefficient P (▶ Sect. 18.3). This definition is valid for systems at
equilibrium. The distribution between the water and octanol phases is considered
as a model system. The ratio of the concentration of the non-ionized form of an
investigated compound in the two phases is considered. In addition, the pH value is
adjusted during the measurement so that the investigated compound occurs
overwhelmingly in its non-ionized form. As a general rule, log P, the logarithm of this
value, is used.
$$\log P_{\text{(octanol/water)}} = \log \frac{\text{concentration (dissolved compound)}_{\text{octanol}}}{\text{concentration (dissolved compound)}_{\text{non-ionized in water}}}$$
Biological systems are open systems that are kinetically controlled. They can be
temporarily found in a dynamic equilibrium. This condition can be compared to
a chromatographic process in which a substance is in a constant exchange between
the solid support and the mobile phase. Locally, equilibria occur that are disrupted
by the continuous progression of the mobile phase. In contrast to the relatively
simple conditions in chromatography, there is a plethora of different phases in
biological systems. A drug is distributed throughout all of these phases. Further-
more, metabolic processes are running in parallel that lead to different metabolites.
To analyze these dynamic equilibria, the rate constants of the
substance transport from the aqueous phases into the lipid phases and in the
reverse direction must be known. It is astonishing that such fundamental experi-
mental investigations on organic substances were first carried out by Bernard
Lippold in the mid–1970s, and later also by Han van de Waterbeemd. Lippold
used a three-phase system: water/n-octanol/water (Fig. 19.1). After the addition of
the substance in one of the two aqueous phases, the time dependence of the
substance concentration in the different phases was measured. From this the
rate constant k1 for the transport from the water into the octanol phase
and the rate constant k2 in the opposite direction were calculated.
The partition coefficient P is the ratio of the two rate constants (Eq. 19.1). In
addition, a very simple linear relationship between k1 and k2 has been shown
(Eq. 19.2); β and c are constants that depend on the system and not on the
structures of the substances.
$$P = \frac{k_1}{k_2} \tag{19.1}$$
$$k_2 = -\beta k_1 + c \tag{19.2}$$
The dependence of the rate constants k1 and k2 on the partition coefficient P
results from the combination of both equations (Eqs. 19.3 and 19.4).
$$\log k_1 = \log P - \log(\beta P + 1) + \text{constant} \tag{19.3}$$
$$\log k_2 = -\log(\beta P + 1) + \text{constant} \tag{19.4}$$
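The step from the first two equations to Eqs. 19.3 and 19.4 can be written out explicitly. The sketch below assumes that the linear relation of Eq. 19.2 is read as k2 = −βk1 + c with β > 0, which is the sign needed to reproduce Eqs. 19.3 and 19.4; under this assumption the additive constant is log c.

```latex
\begin{aligned}
k_1 &= P\,k_2 && \text{(Eq. 19.1)}\\
k_2 &= -\beta P\,k_2 + c
  \;\;\Longrightarrow\;\;
  k_2 = \frac{c}{\beta P + 1}, \qquad k_1 = \frac{c\,P}{\beta P + 1}\\
\log k_2 &= -\log(\beta P + 1) + \log c\\
\log k_1 &= \log P - \log(\beta P + 1) + \log c
\end{aligned}
```

Two limiting cases follow directly. For βP ≪ 1, k1 ≈ cP and k2 ≈ c, so that k1 rises linearly with P. For βP ≫ 1, k1 approaches the constant value c/β while k2 ≈ c/(βP) falls with increasing P.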
The k values that Han van de Waterbeemd determined experimentally for 20 different
sulfonamides and 15 further substances are shown
[Figure 19.1 scheme: aqueous phase A, octanol phase B, and aqueous phase C; k1 denotes the transfer from an aqueous phase into the octanol phase and k2 the transfer in the opposite direction.]
Fig. 19.1 Three-compartment system for the determination of the rate constants k1 and k2. At the
beginning of the experiment the substance is dissolved in aqueous phase A. Next the substance
concentration is measured in phases A, B, and C after different times until an equilibrium is
established between the individual phases.
in Figure 19.2. Among the latter are neutral, acidic, basic, and even quaternary
charged compounds with very different molecular weights. The characteristic
course of the curve says that the rate constant k1 for the transfer from the aqueous
phase into the organic phase depends on the partition coefficient P for relatively
polar substances. It is thermodynamically controlled, that is, it increases with
increasing lipophilicity. A point is reached, however, at which diffusion of
the substance limits k1 to its maximally achievable value. Even more lipophilic
substances simply cannot penetrate the organic phase any faster. Analogously, this is
valid for the opposite direction as well, in which the diffusion from the organic
phase into the aqueous phase is described by k2. The chemical structure plays
a role in both cases in that it determines the value of the partition coefficient P.
Because the rate constants are limited by diffusion, there must be an apparent
dependence on the molecular size in this area. According to Fick’s law of
diffusion, the diffusion rate should be inversely proportional to the radius of the
particle, that is, as a first approximation, to the cube root of its volume. Because of the
relatively low variability of the molecular size of organic drugs and their
conformational flexibility, this effect is probably lost by the noise level of
experimental error. Moreover, it must not be forgotten that the discussed
octanol/water system is very simple and it only slightly approximates the
complex structural relationships of real membrane systems. Therefore today
more relevant models to collect experimental distribution data, such as the so-
called PAMPA or Caco-2 models, are increasingly being used (Sect. 19.6).
Here more complex correlations are indicated. Obviously how a compound is
distributed and structurally oriented in the vicinity of membrane structures is
important. These properties simultaneously influence how the penetration, and
therefore the distribution, is to be described.
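The characteristic course of the two curves can be reproduced numerically from Eqs. 19.3 and 19.4. The following sketch is for illustration only; the values chosen for β and c are hypothetical, the additive constant is taken as log c, and the function name `rate_constants` is an assumption.

```python
import math

def rate_constants(log_p, beta, c):
    """Return (log k1, log k2) according to Eqs. 19.3 and 19.4,
    with the additive constant set to log c."""
    p = 10.0 ** log_p
    log_k2 = -math.log10(beta * p + 1.0) + math.log10(c)
    log_k1 = log_p + log_k2  # because P = k1 / k2
    return log_k1, log_k2

# Hypothetical system constants, chosen only to make the trend visible
BETA, C = 1.0, 1.0e-3

for log_p in (-3, -1, 0, 1, 3):
    log_k1, log_k2 = rate_constants(log_p, BETA, C)
    print(f"log P = {log_p:>2}: log k1 = {log_k1:6.2f}, log k2 = {log_k2:6.2f}")
```

For polar substances log k1 rises with log P while log k2 stays nearly constant. For lipophilic substances log k1 levels off at log(c/β), the diffusion-limited plateau, and log k2 falls with increasing log P.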
[Figure 19.2 plot: log k (y-axis, −7 to −3) versus log P (x-axis, −3 to 4); fitted curves for log k1 (r = 0.997) and log k2 (r = 0.998).]
Fig. 19.2 Experimentally determined rate constants k1 and k2 for the transport of 20 sulfonamides
and 15 further chemically different substances with molecular weights between 100 and 500 Da.
The curves and correlation coefficients r correspond to the fitting of the data with Eqs. 19.3 and 19.4.
19.2 Absorption of Organic Molecules: Model and
Experimental Data
The rate constant, k, for the penetration through the lipid membrane from the
aqueous phase is described by another equation, Eq. 19.5. Here, the rate constants
k1 and k2 also describe the entry into the organic phase and the transport in the
opposite direction, respectively.
$$\log k = \log k_1 + \log k_2 + \text{constant} \tag{19.5}$$
In the first approximation, this equation should also describe transportation
processes in multicompartment systems. Model calculations on arbitrary, complex
systems show that this is indeed the case. They confirm that there is bilinear
dependence of the transport in different phases on the total lipophilicity of
a substance. For multiple groups of drugs, for example, barbiturates, this was
demonstrated experimentally in simple in vitro model systems (Fig. 19.3, bottom).
The log k values increase linearly upon penetration through an organic membrane,
which correlates with the increase of k1 with constant k2. After passing through
a maximum, they decrease with a constant k1 value and decreasing k2 value. This
dependence was quantitatively summarized by Hugo Kubinyi in the so-called
bilinear model (Eq. 19.6); a, b, β, and c are constants that are determined by nonlinear
regression analysis.
$$\log k = a \log P - b \log(\beta P + 1) + c \tag{19.6}$$
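The bilinear model is easy to evaluate, and the position of its maximum follows from setting the derivative with respect to log P to zero, which gives P = a/(β(b − a)) for b > a > 0. The sketch below illustrates this under stated assumptions: the function names are made up and the parameter values in the usage note are hypothetical, not fitted to any data set.

```python
import math

def bilinear_log_k(log_p, a, b, beta, c):
    """Eq. 19.6: log k = a*log P - b*log(beta*P + 1) + c."""
    return a * log_p - b * math.log10(beta * 10.0 ** log_p + 1.0) + c

def optimum_log_p(a, b, beta):
    """log P at the maximum of Eq. 19.6.

    Setting d(log k)/d(log P) = a - b*beta*P/(beta*P + 1) to zero
    gives P = a / (beta * (b - a)); a maximum exists only for b > a > 0.
    """
    if not b > a > 0:
        raise ValueError("a maximum exists only for b > a > 0")
    return math.log10(a / (beta * (b - a)))
```

Inserting Eqs. 19.3 and 19.4 into Eq. 19.5 corresponds to the special case a = 1 and b = 2, for which the optimum lies at P = 1/β. With a hypothetical β = 0.01, `optimum_log_p(1.0, 2.0, 0.01)` places the maximum at log P = 2; the ascending branch then has the slope a and the descending branch the slope a − b.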
Entirely analogous dependencies are observed with the absorption of com-
pounds, that is, out of the stomach or intestines (Fig. 19.3, middle). Active
substances that are meant to be orally available should be neither very polar nor very
non-polar. Substances with intermediate lipophilicity can cross the blood–placenta
barrier more easily than very polar or very non-polar compounds (Fig. 19.3, top).
A nonlinear dependence on the lipophilicity for substance penetration through the
blood–brain barrier is particularly pronounced (Fig. 19.4). The optimum for this
barrier is in the range of log P = 1.5–2.5. For CNS-active substances, an optimal
lipophilicity around log P = 2 should be aimed for in order to facilitate penetration
across the blood–brain barrier.
19.3 The Role of Hydrogen Bonds
The simple concept about the dependence of absorption on the octanol/
water partition coefficients that was outlined above, has been questioned in the
last few years. Octanol is indeed a relevant model for lipid membranes in many
respects (▶ Sect. 4.2), but it can only incompletely model the influence of
hydrogen bonding. Upon establishing equilibrium in the octanol/water system,
the organic phase contains considerable amounts of water, corresponding to a molar ratio of
octanol/water = 4:1. Substances with polar, solvated groups therefore do not need
to fully release their water solvation shell upon entry into the octanol phase.
Entering into a biological membrane is obviously different. Aside from the dependence
on lipophilicity, membrane penetration becomes progressively worse the more
hydrogen bonds a substance can form. Similarly,
a ligand must release its water shell before it can be accommodated in the binding
site of a protein.
The system water/cyclohexane is more suitable for the description of such
processes. Because of the non-polar character of this hydrocarbon, upon transition
from water into cyclohexane the molecule cannot take its water shell with it. Many
years ago P. Seiler derived an increment IH (Eq. 19.7) from the differences in the
partition coefficients in cyclohexane/water (loss of water shell) and octanol/water
[Figure 19.3 plot: log k (y-axis, −5 to 1) versus log P (x-axis, −3 to 5); curves labeled blood–placenta penetration, intestinal resorption, gastric resorption, and penetration through an organic membrane.]
Fig. 19.3 The rate constant k for the transport of drugs depends nonlinearly on lipophilicity. This
is valid for simple in vitro models as well as for biological systems. The bottom curve describes the
log k values of the transport of barbiturates in an in vitro absorption model from an aqueous phase,
through an organic membrane into another aqueous phase. Both curves in the middle (gray points)
describe the dependence of the absorption rate constants k on the lipophilicity for the absorption of
homologous carbamates from the stomach (gastric absorption) or the gut (intestinal absorption) of
rats. The top curve was determined for the entry of different drugs into the placenta from the
circulation. In all cases an increase in log k dependent upon log P is seen, until a more-or-less-
pronounced maximum for substances with moderate lipophilicity. For very non-polar substances,
this curve falls, and in rare cases a plateau is reached. The curves for gastric and intestinal
absorption and for the penetration into the placenta run flatter than the curve for the in vitro
transport of barbiturates (below), because here no lipid barrier is present.
(no loss of water shell) for different functional groups. These IH values characterize
the tendency of groups to form hydrogen bonds.
$$\log P_{\text{cyclohexane}} + \sum I_H = 1.00\,\log P_{\text{octanol}} + 0.16 \tag{19.7}$$
The concept of Seiler remained largely ignored. In 1988 Robin Ganellin and co-
workers described the CNS bioavailability of different substances, that is, their
ability to cross the blood–brain barrier, as a linear function of a Δlog P value. This
Δlog P value is the difference between the log P values in the systems cyclohexane/
water and octanol/water. The bioavailability of peptides also runs in first approximation
parallel to the Δlog P value, or the number of groups that potentially
participate in hydrogen bonds. The methylation of all NH groups of a peptide
scaffold can, in fact, deliver substances with good bioavailability. The prerequisite
for good membrane penetration is similar to that for high affinity at the binding
site (▶ Chap. 4, “Protein–Ligand Interactions as the Basis for Drug Action”). Here
too, the requirement to release relatively strongly bound water molecules can also
have a detrimental influence on binding affinity.
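Both hydrogen-bond measures discussed here can be computed directly from the two partition coefficients. The following sketch assumes the sign convention Δlog P = log P(octanol/water) − log P(cyclohexane/water), so that larger values indicate a stronger tendency to form hydrogen bonds; the function names and the example values are hypothetical.

```python
def delta_log_p(log_p_octanol, log_p_cyclohexane):
    """Delta log P = log P(octanol/water) - log P(cyclohexane/water)
    (sign convention assumed here)."""
    return log_p_octanol - log_p_cyclohexane

def seiler_increment_sum(log_p_octanol, log_p_cyclohexane):
    """Sum of the I_H increments obtained by rearranging Eq. 19.7."""
    return 1.00 * log_p_octanol + 0.16 - log_p_cyclohexane

# Hypothetical compound: log P = 1.5 in octanol/water, -0.5 in cyclohexane/water
print(delta_log_p(1.5, -0.5))
print(seiler_increment_sum(1.5, -0.5))
```

Under this convention the two quantities differ only by the constant 0.16 of Eq. 19.7, which shows that Δlog P and the summed Seiler increments carry the same information about hydrogen-bonding capacity.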
Several other distribution systems, for instance, heptane/ethylene glycol, have
been proposed as alternatives to the octanol/water or cyclohexane/water systems
with regard to the simulation of penetration through a lipid membrane. But even
these systems cannot correctly reflect the architecture of membranes with an
interior lipophilic zone and a polar, negatively charged outer rim. Another option
[Figure 19.4 plot: log 1/C (y-axis, 0 to 4) versus log P (x-axis, −2 to 5); data points labeled MeOH, EtOH, AmOH, and DecOH.]
Fig. 19.4 The neurotoxicity (C = molar dose that induces a particular toxic effect) of homologous
primary alcohols in the rat is a measure of their ability to cross the blood–brain barrier. Polar
substances remain overwhelmingly in the blood circulation. In contrast, substances with moderate
lipophilicity reach the central nervous system easily. Accordingly, neither methanol (MeOH) nor
ethanol (EtOH) shows a pronounced neurotoxicity. The high general toxicity of methanol (blindness)
is caused not by the compound itself but rather by its severely toxic metabolic products
formaldehyde and formic acid (acidosis). Somewhat longer-chain alcohols such as amyl alcohol (AmOH) are
considerably more neurotoxic. The highly lipophilic decanol (DecOH) shows low toxicity.
is the determination of the membrane/water partition coefficient, which is, how-
ever, experimentally rather laborious. For this, artificial membranes or liposomes
are used as models.
19.4 Distribution Equilibria of Acids and Bases
Many drugs are acids (HA) or bases (B). They exist in two forms through dissoci-
ation (Eq. 19.8) or protonation (Eq. 19.9); one is usually a non-polar neutral form
and the other is a polar ionic form. The values of the partition coefficients of the
ionic species are generally three to five orders of magnitude less than the
corresponding neutral molecule.
$$\mathrm{HA + H_2O \rightleftharpoons A^- + H_3O^+} \tag{19.8}$$
$$\mathrm{B + H_3O^+ \rightleftharpoons BH^+ + H_2O} \tag{19.9}$$
The distribution equilibrium of an acid and its anion in a two-phase system
depends on the pKa value and the pH value of the aqueous phase, as well as the
partition coefficients Pu and Pi of the substance (Fig. 19.5). All components in each
phase must be in equilibrium with one another to establish equilibrium of the total
system. The dependence of the partition coefficient P on the pH value, the pH–
partition profile, usually takes on a sigmoidal (i.e., S-shaped) course. Plateaus are
observed for the uncharged neutral form and in case of pH values at which so little
of the neutral form exists that solely the transfer of the charged species into the
organic phase determines the measured partition coefficient (Fig. 19.6). The
charged species goes into the organic phase with a counterion as an ion pair. Either
the corresponding ion of the salt or the excess of ions in the aqueous buffer come
into play as counterions. The partition coefficient of the ion pair decidedly depends
on the lipophilicity of the counterion. The tetrabutylammonium salt of salicylic acid
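[Figure 19.5 scheme: octanol phase containing HA and A−; aqueous buffer with HA + H2O ⇌ A− + H3O+ (dissociation constant Ka); Pu describes the partitioning of HA and Pi that of A− between the two phases.]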
Fig. 19.5 Two-phase system with partition and dissociation equilibria for an acid HA (Eq. 19.8).
Ka is the dissociation constant, Pu and Pi are the partition coefficients of the non-dissociated and
ionic forms, that is, neutral and charged species, respectively. Because there is usually a difference
of several orders of magnitude between the Pu and Pi values, in many cases the Pi value can be
neglected. This leads to considerable simplification of the corresponding mathematical models.
has only a slightly lower partition coefficient than the neutral form of salicylic
acid. In contrast, the sodium salt of salicylic acid has absolutely no tendency to
cross over into the organic phase. Amino acids and other mixed acidic and basic
compounds afford pH–partition profiles with a maximum between the pKa values of
the two ionizable groups (Fig. 19.6), that is, when the zwitterionic form is present.
Knowledge about the log P value of the neutral form and the pKa value allows
the partition coefficient of a substance to be calculated at neutral pH. These
principles allow the estimation of absorption and distribution properties of new
substances. Of course, these considerations are only valid for drugs for which no
transporter exists that facilitates their membrane penetration (▶ Sects. 22.7 and
▶ 30.7).
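How the partition coefficient at a given pH follows from the log P value of the neutral form and the pKa value can be sketched for a monoprotic acid or base. The expressions below result from combining the dissociation equilibrium with the two partition equilibria of Fig. 19.5. The function names are assumptions, the partition coefficient Pi of the ionic species is neglected unless it is given explicitly, and the numbers in the usage note are illustrative rather than measured.

```python
import math

def log_d_acid(log_pu, pka, ph, log_pi=None):
    """Apparent log partition coefficient of a monoprotic acid HA at a given pH.

    D = (Pu + Pi * [A-]/[HA]) / (1 + [A-]/[HA]), with [A-]/[HA] = 10**(pH - pKa).
    """
    ratio = 10.0 ** (ph - pka)
    pu = 10.0 ** log_pu
    pi = 0.0 if log_pi is None else 10.0 ** log_pi
    return math.log10((pu + pi * ratio) / (1.0 + ratio))

def log_d_base(log_pu, pka, ph, log_pi=None):
    """Same for a monoprotic base B; pKa refers to the protonated form BH+."""
    ratio = 10.0 ** (pka - ph)  # [BH+]/[B]
    pu = 10.0 ** log_pu
    pi = 0.0 if log_pi is None else 10.0 ** log_pi
    return math.log10((pu + pi * ratio) / (1.0 + ratio))
```

For an acid with an assumed neutral-form log P of 2.0 and a pKa of 3.0, `log_d_acid(2.0, 3.0, 7.0)` lies about four log units below the value of the neutral form. If a value for `log_pi` is supplied, the curve levels off at that value at high pH, which corresponds to the second plateau of the sigmoidal pH–partition profile.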
Because of their importance, today pKa values are routinely measured by
potentiometric titration in pharmaceutical research. However, it is often overlooked
that the definition of pKa values of acids and bases is only valid for aqueous
solutions. The addition of an organic solvent, which changes the dielectric con-
stant, shifts this value (▶ Sect. 4.4). This is even more valid for the binding site of
a protein or the interior of a membrane. In individual cases, experimental values
have been determined by NMR spectroscopy and isothermal titration calorimetry.
[Figure 19.6 plot: log P versus pH (pH = 0, 7, 14); curves labeled acid AH, anion A−, acid/ion pair A− × N(R4)+, dibasic acid, base B, protonated base, and amino acid (+H3NCH(R)COOH, +H3NCH(R)COO−, H2NCH(R)COO−).]
Fig. 19.6 The pH dependence of the distribution equilibrium of acids and bases, the so-called pH
distribution profile, follows simple rules. Typically when an acid or a base is present, sigmoidal,
that is, S-shaped, curves are observed. For a dibasic acid, for example, oxalic acid, the decrease
in the partition coefficient continues with increasing pH values. In the presence of lipophilic
counterions, for example, the tetrabutylammonium salt of salicylic acid, the ion pair displays
a very high partition coefficient. Amino acids with neutral side chains carry one basic amino group
and an acidic carboxyl group. Accordingly, they go through a maximum in their partition
coefficient at the neutral point. Here the majority of the substance indeed exists as a zwitterion;
aside from that, however, a larger part is in the neutral form than is at lower or higher pH values.
19.5 Absorption Profiles of Acids and Bases
The absorption of an active substance, for example, out of the intestines into blood,
should depend on the pH of the surrounding medium and the pKa of the
substance, just as the distribution between an aqueous buffer system and an organic
phase does. The absorption should follow profiles very similar to those of the distribution. In
the 1950s, Brodie, Hogben, and Schanker formulated the pH–partition theory to
this effect. It says that the dependence of absorption on the pH value, the
pH–absorption profile, is identical to the pH–partition profile (Sect. 19.4). This
theory was confirmed by, among other things, the investigation of the rate constants
of absorption of a few acids and phenols from the colon of the rat at pH 6.8. The
neutral forms of the strong acids 5-nitrosalicylic acid (pKa = 2.3), salicylic acid
(pKa = 3.0), m-nitrobenzoic acid (pKa = 3.4), and benzoic acid (pKa = 4.2) display
comparable lipophilicity with log P values between 1.8 and 2.3. Under experimental
conditions near neutral pH, they are largely dissociated; well below 1% is in
the neutral form. They are therefore absorbed distinctly more slowly than the
comparably lipophilic, weakly acidic phenols p-hydroxypropiophenone
(pKa = 7.8) and m-nitrophenol (pKa = 8.2), which are more than 90% in their
neutral form at pH 6.8.
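The neutral fractions quoted for these compounds can be checked with the Henderson–Hasselbalch relation. The following is a minimal sketch, assuming each compound behaves as a simple monoprotic acid with the pKa given in the text; the function and variable names are illustrative only.

```python
def neutral_fraction_acid(pka: float, ph: float) -> float:
    """Fraction of a monoprotic acid HA present as neutral HA at the given pH."""
    # Henderson-Hasselbalch: [A-]/[HA] = 10**(pH - pKa)
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# pKa values as quoted in the text; the absorption experiment was run at pH 6.8.
PKA_VALUES = {
    "5-nitrosalicylic acid": 2.3,
    "salicylic acid": 3.0,
    "m-nitrobenzoic acid": 3.4,
    "benzoic acid": 4.2,
    "p-hydroxypropiophenone": 7.8,
    "m-nitrophenol": 8.2,
}


def neutral_percentages(ph: float = 6.8) -> dict[str, float]:
    """Percentage of each listed compound in its neutral form at the given pH."""
    return {
        name: 100.0 * neutral_fraction_acid(pka, ph)
        for name, pka in PKA_VALUES.items()
    }


if __name__ == "__main__":
    for name, percent in neutral_percentages().items():
        print(f"{name:24s} {percent:8.3f} % neutral")
```

Under these assumptions the four carboxylic acids come out far below 1% neutral at pH 6.8, whereas both phenols are more than 90% neutral.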
Neutral forms can diffuse through membranes; charged forms are readily soluble in
water. An equilibrium between the two forms is quickly established in an aqueous
medium and also at the phase boundaries. If the pKa value of a
substance is not more than 2–3 units from the neutral value of pH 7, the neutral
form is present in the aqueous phase at the entirely adequate concentration of about
0.1–1%. This neutral form penetrates into the membrane and is
immediately regenerated in the aqueous phase by the dissociation equilibrium. In a biological system
the distribution of such substances is accomplished quickly and effectively
(Fig. 19.7), and the closer the pKa value is to the neutral pH 7, the better.
This also explains why so many drugs are organic acids or bases. Because the
pH values in the stomach and intestines differ strongly, the conditions are right
somewhere along the gastrointestinal tract for a neutral substance, an acid, or
a base to be well absorbed. If the pKa values are too far from the physiological pH
values, as with amidines or guanidines with their extremely high pKa values,
absorption can become problematic. This is also true for zwitterionic compounds,
for example, amino acids, and for compounds with multiple acidic or basic groups
in the molecule. Because of the large volume available for distribution,
diffusion occurs overwhelmingly from the gastrointestinal tract into blood or tissue
and only to a negligible extent in the opposite direction (Fig. 19.7).
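The direction of net transport between two compartments of different pH can be illustrated with a small calculation. This is a sketch under strong simplifying assumptions that are not part of the text: only the neutral form crosses the membrane, the neutral form reaches the same concentration on both sides, and compartment volumes and blood flow are ignored. All names are hypothetical.

```python
def ionization_term(pka: float, ph: float, kind: str) -> float:
    """Ratio of total to neutral concentration for a monoprotic acid or base."""
    if kind == "acid":
        exponent = ph - pka   # [A-]/[HA] = 10**(pH - pKa)
    elif kind == "base":
        exponent = pka - ph   # [BH+]/[B] = 10**(pKa - pH)
    else:
        raise ValueError("kind must be 'acid' or 'base'")
    return 1.0 + 10.0 ** exponent


def equilibrium_total_ratio(pka: float, ph_from: float, ph_to: float, kind: str) -> float:
    """Total concentration in the 'to' compartment divided by that in the
    'from' compartment once the neutral form has equilibrated (assumption)."""
    return ionization_term(pka, ph_to, kind) / ionization_term(pka, ph_from, kind)


if __name__ == "__main__":
    # pH values and pKa values follow the scenarios of Fig. 19.7.
    print("acid, pKa 4, stomach -> blood:", equilibrium_total_ratio(4.0, 1.0, 7.4, "acid"))
    print("base, pKa 9, stomach -> blood:", equilibrium_total_ratio(9.0, 1.0, 7.4, "base"))
```

A ratio far above 1 means that ionization on the receiving side keeps pulling substance across, as described for the acid in the stomach; a ratio far below 1 means that the substance is held back in the compartment of origin, as for the strong base in the stomach.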
The absorption of strongly acidic compounds outside the range in which the
compound exists as a neutral molecule runs, to a first approximation, parallel to the
difference pH − pKa; for bases the relevant difference is pKa − pH. There are exceptions
to this approximation. Highly lipophilic compounds require a more detailed description
of the pH–absorption profile. The neutral forms of these substances enter the
lipid phase as soon as they come near the membranes. The neutral molecule is
constantly being removed from the dissociation equilibrium, which is established in the
[Fig. 19.7 diagram, panels a–d: (a) neutral substance N; (b) acid HA, pKa = 4; (c) weak base B, pKa = 5; (d) strong base B, pKa = 9. Each panel shows the neutral and ionized forms in the stomach (pH = 1), the intestines (pH = 6–8), and the blood circulation (pH = 7.4).]
Fig. 19.7 (a) A moderately polar neutral substance N is absorbed very well from the stomach as
well as from the intestines. It is quickly distributed in the circulation so that back-transport does
not play a notable role. (b) An organic acid HA (pKa = 4) is absorbed well from the stomach, as
long as it is not too polar, because it exists there overwhelmingly in the neutral form. The
absorption is facilitated by the fact that the free acid is present at a considerably lower concentration in
the blood than in the stomach. The formation of an anion shifts the concentration gradient in this
direction. The absorption is slower from the gut because there the equilibrium lies overwhelmingly
on the side of the ionized form. (c) A weak base (pKa = 5) is absorbed relatively poorly from the
stomach because it exists overwhelmingly in its polar, protonated form. It is well absorbed in the
intestines because it exists in its neutral form there. (d) A strong base with pKa = 9 cannot be
absorbed from the stomach. In the intestines the equilibrium indeed lies heavily on the side of the
protonated form, but the non-polar form is supplied in adequate quantities. Therefore the substance
can be absorbed. When the pKa value of a substance reaches 11, the concentration of the
neutral, bioavailable form is too low for good absorption.
aqueous phase. However, it is very quickly replenished by this equilibrium. On
balance, a continuous transport of substance from the aqueous phase into the
membrane is achieved. The small amount of uncharged neutral form is the door
through which the entire process takes place. The rate of the transition into the lipid
layer does not depend on the (often very low) concentration of the neutral form, but
rather on:
• The total concentration of the compound,
• The rate constants of the dissociation equilibrium,
• The diffusion constant of the compound.
Accordingly, for lipophilic acids and bases a shift in the pH–absorption profile
relative to the pH–partition profile is observed in biological systems, which is
referred to as the pH shift. This always occurs in the direction toward the neutral point,
that is, with acids to higher and with bases to lower pH values (Fig. 19.8). The
larger the lipophilicity of an acid or a base, the larger the observed shift in
the absorption profile. To judge how well a substance is absorbed,
the log P value and the pKa value must not be considered separately. Their
interplay is decisive. For the design of new drugs, this means that a substance
with unfavorable partition behavior, that is, with too high or too low a pKa
value, can be shifted in the desired direction by increasing its
lipophilicity. To describe the pH dependency of the distribution equilibrium,
a distribution coefficient D was introduced as a supplement to the partition
[Fig. 19.8 plot: amount of an acid AH distributed or absorbed against pH value. Curves: pH–distribution diagram and pH–absorption diagram, separated by Δ pH = pH shift.]
Fig. 19.8 The dependence of the absorption of lipophilic acids on the pH value, the absorption
profile (red curve), deviates markedly from the pH distribution curve (black curve, see Fig. 19.6).
Whereas the pH-distribution profile is valid for a system at equilibrium, a steady state
is established during absorption. Even at relatively high pH values, that is, when only small
concentrations of the neutral species are present, these few molecules are absorbed quickly.
Because of the high anion concentration and the continuous readjustment of the dissociation
equilibrium, a minimally necessary concentration of the neutral species is maintained. The shift
in the pH-absorption profile is referred to as the pH shift. Analogous shifts in the
opposite direction are observed for lipophilic bases.
coefficient P. For this, the ratio of the summed concentrations of all ionized and
non-ionized forms of an investigated compound in the two phases is considered. For the
measurement, the pH value is fixed with a buffer solution so that the addition of the
investigated compound does not shift the pH. Usually log D, the logarithm of the
distribution coefficient, is used.
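The pH dependence of log D can be sketched numerically. The code below is a simplified model, assuming a monoprotic acid or base whose ionized form does not enter the organic phase at all (no ion pairing); the example compound with log P = 2.0 and pKa = 4.0 is hypothetical, and the function name is illustrative.

```python
import math


def log_d(log_p: float, pka: float, ph: float, kind: str) -> float:
    """log D of a monoprotic acid or base at a given pH.

    Assumes that only the neutral form partitions into the organic phase.
    """
    if kind == "acid":
        exponent = ph - pka   # ionized/neutral ratio of an acid
    elif kind == "base":
        exponent = pka - ph   # ionized/neutral ratio of a base
    else:
        raise ValueError("kind must be 'acid' or 'base'")
    return log_p - math.log10(1.0 + 10.0 ** exponent)


if __name__ == "__main__":
    # Hypothetical acid: log P = 2.0, pKa = 4.0
    for ph in (1.0, 4.0, 7.0, 10.0):
        print(f"pH {ph:4.1f}: log D = {log_d(2.0, 4.0, ph, 'acid'):6.2f}")
```

Well below the pKa of an acid, log D approaches log P; above the pKa it falls by about one unit per pH unit, which reproduces the sigmoidal pH–distribution profile of Fig. 19.6 within this simplified model.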
19.6 What Is the Lipophilicity Optimum of Drugs?
Lipophilicity plays an important role in the appraisal of the therapeutic suitability of
a pharmaceutical. This holds for absorption, distribution, and metabolism as well
as excretion. With the exception of substances that are taken up via
a transporter, absorption is usually better the more lipophilic the compounds
are. This advantage is limited by the solubility in aqueous phases, which
decreases sharply as the lipophilicity increases. The solvation enthalpy and the rate
at which the solid active substance dissolves in the gastrointestinal tract are
also decisive for the bioavailability. These factors depend on the intermolecular
interactions in the crystalline solid and can vary strongly from one polymorphic
crystal modification to another. Therefore, correlations to predict bioavailability
include the melting point as an additional parameter alongside lipophilicity and
solubility. In addition to the solubility, the kinetics of dissolution are important
for galenic formulations, that is, the final drug preparation. They determine the amount
of substance that goes into solution during the gastrointestinal passage. This amount
can be increased by different measures such as:
• Increasing the surface area by grinding the crystals into minuscule particles
(micronization),
• Growing a modified crystal with better solubility properties,
• Crystallization under special conditions to afford a more uniform (usually
smaller) size, or crystals with lattice defects,
• Changing the salt form,
• Adding solubility-mediating additives,
• Embedding the drug as an amorphous solid solution in readily soluble polymers.
Because of its importance, techniques to measure solubility on a high-throughput
scale have been established in recent years.
Cell cultures are also increasingly used as in vitro models to record substance
absorption. A thin layer of cells from human colon carcinomas (so-called Caco-2,
HT29, or MFCH cell lines) is grown in a two-chamber system. The transport of
an active substance can be followed from either side, the so-called apical
or the basolateral side. Because these cells also express transporters, the involvement of
specific transportation mechanisms of substances can also be studied. These models
are less suitable for the study of the possible consequences of substance metabolism
because the metabolizing enzymes (▶ Sect. 27.6) are only expressed in diminished
quantities by these cells.
Relevant in vitro test models have also been developed to study blood–brain
barrier penetration. These models are experimentally relatively laborious, and the
results often can only be compared within a series of structurally related substances.
Assay systems with artificial membranes (PAMPA, from parallel artificial
membrane-permeability assay) can be constructed that allow high-throughput
screening. Moreover, the penetration behavior in liposomes can be evaluated by
surface plasmon resonance.
When the absorption of different substances is determined experimentally, results
obtained from saturated solutions should not be compared with
results from solutions with concentrations well below the saturation limit. In the first case,
the lower solubility of lipophilic compounds means that their saturated solutions have low
concentrations, which gives the false impression of poor absorption. In the second case,
with comparable concentrations for all test compounds, good or even improved absorption
is also found for the lipophilic substances. Comparing such different experimental
conditions leads to incorrect conclusions. Further confusion arises when the
terms absorption and bioavailability are applied incorrectly (▶ Sect. 9.1). The
absorption of a substance can be excellent, but the bioavailability is nonetheless
poor. Lipophilic compounds and substances with a molecular weight of more than
500–600 Da are often well absorbed, but suffer from very fast biliary (via the bile)
elimination. This usually happens during the first liver passage (first pass effect,
▶ Sect. 9.1) directly after absorption from the intestines. To achieve good bioavail-
ability, the lipophilicity must not be too high. The excretion path also depends on
the lipophilicity. In general, extremely lipophilic substances are more quickly
metabolized, but are also toxicologically worrisome. Hydrophilic substances and
polar metabolites, including those after conjugation with polar groups, are excreted
via the kidneys. The excretion of lipophilic substances usually occurs
hepatically and subsequently via the intestines. Such substances often undergo
oxidative metabolism, with the concomitant possibility of toxic metabolites being
produced.
Substances that interact with membrane-bound receptors or ion channels can
often access their targets more easily if they are enriched in the surrounding
membrane. For this, the substances should be lipophilic, or should carry a large
lipophilic group with which they can be anchored in the membrane (▶ Sect. 4.2,
▶ Fig. 4.2).
19.7 Computer Models and Rules to Predict ADME Parameters
Aside from the set-up of suitable test systems to systematically record the parameters
that determine pharmacokinetic properties, major effort has been spent on
establishing rules and computer models to predict favorable ADME properties.
First of all, the rule of five must be mentioned, which was developed by Chris
Lipinski at Pfizer. According to it, an active substance should not violate more than
two of the rule-of-five criteria in Table 19.1. These simple rules were derived from
experience and are almost exclusively used to preselect compounds for screening.
Tudor Oprea refined these rules further and extended them to cover the occurrence
of particular structural building blocks such as, for instance, the maximum number
of rings of a particular size. Programs such as CLOGP, ACD/pKa, and Pallas/pKa
have been developed to estimate lipophilicity and pKa values. To predict solubility,
attempts are made to calculate solvation enthalpies. Permeability, absorption, and
bioavailability predictions are based on empirical correlation models. For this,
experimental observations are related to the chemical structure of the investigated
molecules. The applied methods are derived from QSAR models presented
in ▶ Chap. 18, “Quantitative Structure–Activity Relationships.” The properties to
be predicted are described by models based on intuitively selected descriptors.
Usually molecular parameters are consulted that are frequently derived from the
molecular surface and are assumed to be decisive for the considered properties.
In addition to routine regression analyses, more recent mathematical models such as
neural networks, nearest-neighbor classifiers, decision trees, or machine-learning
techniques such as support vector machines are applied. In addition to the
easily evaluated rule of five, the following criteria should also be considered for
rational design: substances that are meant to act in the periphery, for instance,
cardiovascular drugs, should be relatively polar. Of course, a certain
minimal lipophilicity is necessary for their absorption. Because of the risk of central
side effects or the generation of toxic metabolites, this lipophilicity should not
be greatly exceeded. Here the following motto is valid: better to be a little
less potent than to have all the other problems! A good therapeutic window is much
more valuable for therapy than picomolar affinity to a protein. Substances
that act upon membrane-bound proteins and substances that act in the central
nervous system should have a moderate to high log P value of 1. To avoid the
development of toxic metabolites, the incorporation of the following is
recommended:
• Easily conjugated groups, for example hydroxyl, amino, or carboxyl groups,
• Predetermined metabolic cleavage points such as ester or amide bonds,
• Oxidizable groups that lead to nontoxic and easily excretable metabolites, for
example, methyl groups.
Of course, this strategy should not be exaggerated, otherwise the substances are
excreted too quickly. The biological half-life is then reduced to a value that makes
a therapeutic administration in humans impossible.
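The rule-of-five criteria of Table 19.1 lend themselves to a mechanical check. The following is a minimal sketch that assumes the four descriptors have already been computed or measured elsewhere; the class, the function, and the two example compounds are hypothetical, and the code deliberately reports the violated criteria rather than deciding how many violations are acceptable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed descriptors of one compound (how they are obtained is not shown)."""
    molecular_weight: float   # in Da
    log_p: float
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(d: Descriptors) -> list[str]:
    """Return the rule-of-five criteria that the compound violates."""
    violations = []
    if d.molecular_weight > 500:
        violations.append("molecular weight > 500 Da")
    if d.log_p > 5:
        violations.append("log P > 5")
    if d.h_bond_donors > 5:
        violations.append("more than 5 H-bond donors")
    if d.h_bond_acceptors > 10:
        violations.append("more than 10 H-bond acceptors")
    return violations


if __name__ == "__main__":
    # Hypothetical descriptor sets, not real compounds.
    small = Descriptors(350.0, 2.5, 2, 5)
    large = Descriptors(720.0, 6.1, 6, 12)
    print(rule_of_five_violations(small))
    print(rule_of_five_violations(large))
```

Returning the list of violated criteria leaves the choice of the acceptance threshold to the caller, which is convenient when the filter is used only to preselect compounds for screening.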
The structural consideration of properties that lead to optimal bioavailability,
adequate biological half-life, and non-toxic metabolites represents a problem in the
search for new active substances. Structure-based design of active substances
initially concentrates on the fitting of a ligand to its binding site. Often, aspects
that have to do with the pharmacokinetics and metabolism are not adequately
considered in this phase. Disappointments at the end of a successful optimization
in the preclinical phase, or at the very latest in the clinic, punish such a one-sided
Table 19.1 Criteria for the rule of five.
Molecular weight                ≤ 500 Da
Partition coefficient log P     ≤ 5
H-bond donor groups             no more than 5
H-bond acceptor groups          no more than 10
approach. Because the spatial structures of transporters, channels, and metabolic
enzymes are increasingly becoming available, structure-based design can be used to
test for cross-reactivity of proposed or developed ligands with these target structures.
Binding to the potassium-ion-transporting hERG ion channels leads to their block-
age. A consequence could be life-threatening cardiac arrhythmias (▶ Sect. 30.3).
For this reason, QSAR models were developed that can examine molecules for
a possible hERG channel binding. Methods for the direct docking of ligands in
structural models of the channel have also been developed. Another system that was
recently structurally characterized is the membrane-bound glycoprotein GP170. It is
a transporter that can expel drugs from the cell (▶ Sect. 30.8). It is desirable to avoid
interactions with this protein as much as possible. Another large family of enzymes
worthy of attention is that of the cytochrome P450 metabolic enzymes (▶ Sect. 27.6). Here
an attempt is made to estimate how drugs interact with these proteins and how they are
metabolized. A wide field is opened here for structure-based design.
19.8 From In Vitro to In Vivo Activity
Active substances are initially investigated in simple in vitro test models, for
instance, with respect to enzyme inhibition, receptor binding, in cell cultures, and
later in organs and animal models. As a general rule, the simplest model is chosen
for which the results are predictive of the effect that can be expected in an animal or
in humans. For this it is necessary to derive quantitative relationships between the
different test models, so-called activity–activity relationships. These describe the
relationship between biological activities, for instance, between in vitro and in vivo
data. In the best case, they even allow binding affinities from an inhibition assay
to be extrapolated to the therapeutic effect in humans.
The confirmation of a correlation between a simple test model and a therapeutic
effect is often more important than the derivation of a structure–activity relation-
ship. After finding the relevant, quantitative relationship, inexpensive and quickly
performed tests can be used instead of laborious animal experiments. The number
of animal tests is considerably reduced in this way. But that is not the only
advantage. The use of automated molecular testing systems allows the profile of
active substances to be reliably characterized.
19.9 Natural Ligands Are Often Unspecific
Prior to the biological testing of an active substance, the following questions must
be clarified: What therapeutic goal should be achieved? How is this goal to be
realized? Therapeutic concepts are derived from the pathophysiology of the disease
mechanism. Regulatory intervention with drugs should restore the original physiological
condition as far as possible. Problems can occur in the process: to imitate
natural ligands of enzymes and receptors, the active substance must demonstrate
adequate specificity and must reach the target site selectively.
Nature works with two orthogonal principles with respect to endogenous
substances: the specificity of the effect and a usually very pronounced spatial
compartmentalization. Hormones act overwhelmingly systemically, that is, they
are released at one site in the organism and transported through the circulation
to another, entirely different site. There they exert their action. Other substances,
for instance, neurotransmitters, act strictly locally. In the context of the picture
of the lock and key (▶ Sect. 4.1), Nature prefers to have a master key that can act
on different locks. It acts only at the site of its production and is removed as soon
as it has fulfilled its tasks. The neurotransmitters are synthesized in nerve cells,
stored, and released upon stimulation of the cell at the synaptic gap (▶ Sect. 22.5).
There they bind to specific receptors and exert a stimulation of the neighboring
nerve cell. The effect quickly subsides after reuptake in the cell or after decompo-
sition, for instance, by monoamine oxidases (amines), esterases (acetylcholine), or
peptidases.
The efficiency of Nature is documented especially impressively in the variety
with which small molecules, for example, adrenaline and noradrenaline (▶ Sect. 1.4),
can be used as hormones as well as neurotransmitters. A plethora of different
receptors and receptor subtypes are available for these substances, with which
entirely different effects can be induced with the same molecule. The recoding of
the amino acid sequence of a particular receptor, and therefore an alteration in its
binding site, is relatively easy to do on the gene level. The evolution of complex
biosynthetic pathways of non-peptide ligands, which often occur over multiple
enzyme-catalyzed steps, is considerably more intricate. Accordingly, almost all
neurotransmitters and many hormones are derived in a simple way from the central
intermediates of metabolism of, for instance, amino acids. On the other hand, the
steroid hormones (▶ Sect. 28.3) prove that Nature can achieve very different effects
with a set of chemically similar structures and evolutionarily and structurally related
receptors, for instance, as with the estrogens, gestagens, androgens, glucocorticoid
steroids, and mineralocorticoid steroids.
Frequently, the spatial distribution of the biosynthesis or release of a receptor
ligand, or the distribution of membrane-bound receptors or enzymes, plays
a decisive role for the specificity of an effect. Different effects are achieved by
the same ligand through locally restricted substance release or through the presence
of different receptors. This differentiation exists not only between
particular organs or areas, but also between individual cells and cell compartments.
For example, the dopamine concentration has been determined in different regions of the rat
brain. Whereas in some regions, for example, the caudate nucleus
(Lat.: Nucleus caudatus), an important synaptic site for the motor system and the
olfactory system, concentrations of up to 100 ng dopamine per mg protein are
reached, most of the other areas of the brain contain only between 0.2 and 10 ng/mg.
Even in the Substantia nigra of the mesencephalon, the dopamine level is only
5–6 ng/mg. The degeneration of dopaminergic neurons in this area leads to
Parkinson’s disease in humans. It is known from labeling experiments that the
distribution and population density of receptor subtypes in diverse areas of the brain
and other tissues can be very different.
19.10 Specificity and Selectivity of Drug Interactions
How specifically should a drug act? There is no absolute answer to this question.
Because active substances are almost always administered orally or intravenously,
they act systemically, that is, on the entire organism. The lack of limitation to
a particular organ or a particular compartment must be compensated for with
a higher specificity. At any rate the drug must act as specifically as necessary to
achieve a successful therapy with tolerable side effects.
In the case of enzyme inhibitors, substances are preferred that act so specifically
that only one particular enzyme is inhibited. Unspecific inhibitors that simultaneously
inhibit multiple serine or metalloproteases would wreak havoc in an organism.
A thrombin inhibitor, which should reduce an increased thrombosis risk, must
not act simultaneously as an inhibitor of the closely related plasmin, which causes
fibrinolysis, that is, the dissolution of blood clots that have already formed. The
situation with kinase inhibitors (▶ Sect. 26.3) is a bit different. Because of the
similarity among kinases one member of the family can take over the task of another
related kinase, which has been blocked. In doing so it reduces the therapeutic effect
to nothing. Here, a broad-spectrum kinase inhibitor might be desirable that can
simultaneously shut off an entire protein family. A broad-spectrum action that
inhibits multiple isoenzymes of a parasite equally well can also be beneficial for
antibacterial or antiparasitic compounds (e.g., plasmapepsins, ▶ Sect. 24.7).
Receptor agonists and antagonists should also display a high selectivity.
β-Agonists that are used to treat asthma (▶ Sect. 29.3) must be β2-specific so that
they do not induce an undesirable increase in the heart rate or blood pressure. Often
the necessary effect cannot be achieved with only one drug. The simultaneous
use of multiple drugs is often indicated for the treatment of arterial
hypertension (▶ Sect. 22.10). More complex, multifactor-induced disease processes
must be treated by addressing multiple mechanisms. Because of the low
dosing of the different components, the unspecific side effects of the individual
components fade into the background.
The specificity is critical for the effect of CNS-acting drugs. Progress in gene
technology has provided us with an explosion of knowledge about receptors, but
also a dilemma. We know the exact receptor profile of established substances. We
know what specificity must be achieved to imitate a particular type of action. But in
many cases, we do not know which profile should be present to achieve a better
therapeutic effect. An example should clarify this point. Neuroleptics and many
antidepressants (▶ Sect. 1.6) act on neuroreceptors. The classic neuroleptics
chlorpromazine 19.1 and haloperidol 19.2 (Fig. 19.9), which are used in the treatment
of schizophrenia, are relatively unspecific dopamine receptor antagonists
(Table 19.2). The mixed-type neuroleptic/antidepressant sulpiride 19.3 acts on the
D2 and D3 receptors simultaneously. All of these substances cause side effects on
the muscular–skeletal system, as is observed in Parkinson’s disease (Sect. 19.9),
which is caused by a dopamine deficiency. Because of their mode of action, it was
assumed that the side effects of the neuroleptics were inevitable consequences of
antagonism of the dopamine receptors. Then an atypical neuroleptic, clozapine 19.4,
came along (Fig. 19.9). It does not have the described side effects. Today we know
that clozapine, in contrast to the other neuroleptics, acts much more potently on the
D4 receptor than on the D2 and D3 receptors (Table 19.2). The concentration at
which clozapine acts on the D4 receptor, which was also measured in the
cerebrospinal fluid of treated patients, is sufficient for clozapine also to
bind to particular serotonin and muscarinic receptors, in part with even higher
affinity. It could therefore also be that the antagonistic effects of clozapine
on these receptors are responsible for the atypical effects.
Table 19.2 The natural neurotransmitter dopamine binds with higher affinity to dopamine
receptors of the D1-type. The classic neuroleptics chlorpromazine 19.1, haloperidol 19.2, and
(S)-sulpiride 19.3 are different from clozapine 19.4 (Fig. 19.9) in one point: they have no
comparable selectivity for the D4 receptor.
Binding to the dopamine receptors, Ki in nM
Substance               D1-Type             D2-Type
                        D1       D5         D2     D3     D4
Dopamine                0.9      0.9        7      4      30
Chlorpromazine 19.1     30       130        3      4      35
Haloperidol 19.2        80       100        1.2    7      2.3
(S)-Sulpiride 19.3      45,000   77,000     25     13     1,000
Clozapine 19.4          170      30         230    170    21
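The selectivity statement in the caption of Table 19.2 can be made explicit by forming ratios of the tabulated Ki values. The sketch below uses only the numbers from the table; expressing selectivity as a simple Ki ratio is a common convention adopted here as an assumption, and the names are illustrative.

```python
# Ki values in nM, copied from Table 19.2.
KI_NM = {
    "dopamine":       {"D1": 0.9,   "D5": 0.9,   "D2": 7,   "D3": 4,   "D4": 30},
    "chlorpromazine": {"D1": 30,    "D5": 130,   "D2": 3,   "D3": 4,   "D4": 35},
    "haloperidol":    {"D1": 80,    "D5": 100,   "D2": 1.2, "D3": 7,   "D4": 2.3},
    "(S)-sulpiride":  {"D1": 45000, "D5": 77000, "D2": 25,  "D3": 13,  "D4": 1000},
    "clozapine":      {"D1": 170,   "D5": 30,    "D2": 230, "D3": 170, "D4": 21},
}


def selectivity(substance: str, target: str, off_target: str) -> float:
    """Ki(off_target) / Ki(target); values above 1 mean tighter binding to target."""
    ki = KI_NM[substance]
    return ki[off_target] / ki[target]


def d4_selective_over_d2_and_d3() -> set[str]:
    """Substances that bind D4 more tightly than both D2 and D3."""
    return {
        name for name in KI_NM
        if selectivity(name, "D4", "D2") > 1 and selectivity(name, "D4", "D3") > 1
    }


if __name__ == "__main__":
    for name in KI_NM:
        print(f"{name:15s} Ki(D2)/Ki(D4) = {selectivity(name, 'D4', 'D2'):8.3f}")
    print(d4_selective_over_d2_and_d3())
```

With the tabulated values, clozapine is the only entry that binds the D4 receptor more tightly than both the D2 and the D3 receptor.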
[Fig. 19.9 structural formulas: 19.1 chlorpromazine, 19.2 haloperidol, 19.3 sulpiride, 19.4 clozapine.]
Fig. 19.9 Chlorpromazine 19.1, haloperidol 19.2, and sulpiride 19.3 are neuroleptics with typical
side effects that are associated with dopamine antagonists. Clozapine 19.4 is different from these
substances in its binding profile on the dopamine receptors (Table 19.2) as well as in its side
effects.
Many drugs are classified as “dirty drugs” because of their multifaceted action
on many, totally different receptors. From the pharmacologists’ point of view, such
a characterization is appropriate. A general statement about the therapeutic value
cannot be derived from that. It may well be that many dirty drugs are optimal for
therapy because of their balanced action on multiple receptors. Recently, these
compounds have been termed “rich in pharmacology” and they define a “polyphar-
macology.” The suitability or unsuitability of a drug is only decided in the clinical
testing and later by the experience from broad application in patients.
The differences between enzymes and receptors in different species also offer
a chance to achieve the desired selectivity therapeutically. Species differences play
a role if an undesired organism is to be killed, that is, with antibiotics,
antimycotics, antivirals, and antiparasitic drugs. To avoid side effects in humans,
the metabolic pathways of the bacteria, fungi, viruses, or parasites are purposefully
attacked either with adequate selectivity or by selecting a point of action that is
not present in higher organisms (see ▶ Sects. 23.7, ▶ 24.3, ▶ 27.2, or ▶ 30.8).
19.11 Of Mice and Men: The Value of Animal Models
Quantitative activity–activity relationships serve to draw conclusions about humans
from animals, but they are also valuable for comparing different biological models with
one another. From the plethora of examples described in the literature,
only a few typical relationships will be mentioned here.
Even before the characterization of the different dopamine receptors
(Sect. 19.10, Table 19.2), 25 clinically used neuroleptics were investigated to
[Fig. 19.10 plot: log (average clinical dose) against log (Ki, receptor binding). Data series: haloperidol (r = 0.87) and dopamine (r = 0.27).]
Fig. 19.10 The agonist dopamine preferably binds to the D1-type of dopamine receptors
(Table 19.2). It was clear very early, however, from binding studies on membrane
homogenates that the potency of clinically used neuroleptics correlated with the
displacement of haloperidol (r = 0.87) rather than with dopamine binding (r = 0.27).
unravel correlations between the results of in vitro models, animal experiments, and
the potency of these substances in humans. Two radioactively labeled ligands,
dopamine and haloperidol 19.2 (Sect. 19.10, Fig. 19.9), the former preferring the
D1-type and the latter the D2-type dopamine receptor, were used to characterize
binding. It was demonstrated that the average clinical dose correlated significantly
with the displacement of the D2-type ligand haloperidol 19.2. Significantly
higher concentrations were needed to displace the D1-type ligand dopamine,
and the clinical dose shows virtually no correlation with these data. Not only the clinical efficacy,
but also the data from animal models that are used to test for neuroleptic effects
correlate better with the displacement of haloperidol than with that of dopamine
(Table 19.3). In hindsight, the results suffer from a lack of ligand specificity for a
single receptor, and the preparations are affected by receptor heterogeneity because
the proportions of the different receptor subtypes were not standardized in the calf brain
homogenates that were used. All substances were investigated with dirty ligands in
dirty test models. Only now can the profile of active substances be unambiguously
assigned by using uniform receptor subtypes, which are produced by gene
technology (see Table 19.2).
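One way to read the correlation coefficients listed in Table 19.3 is to square them, which gives the fraction of the variance that a linear relationship accounts for. The following minimal sketch does this for the tabulated values; the function and variable names are illustrative and are not taken from the original study.

```python
# Correlation coefficients r as listed in Table 19.3:
# model -> (r with dopamine displacement, r with haloperidol displacement)
TABLE_19_3 = {
    "Mean clinical dose (human)": (0.27, 0.87),
    "Apomorphine stereotypy (rat)": (0.46, 0.94),
    "Amphetamine stereotypy (rat)": (0.41, 0.92),
    "Apomorphine-induced emesis (dog)": (0.22, 0.93),
}


def explained_variance(r: float) -> float:
    """Return r squared, the share of variance captured by a linear fit."""
    return r * r


def summarize(table: dict) -> dict:
    """Map each model to its rounded r^2 for dopamine and haloperidol."""
    return {
        model: (
            round(explained_variance(r_dop), 2),
            round(explained_variance(r_hal), 2),
        )
        for model, (r_dop, r_hal) in table.items()
    }


if __name__ == "__main__":
    for model, (dop, hal) in summarize(TABLE_19_3).items():
        print(f"{model:34s} dopamine r^2 = {dop:.2f}   haloperidol r^2 = {hal:.2f}")
```

For the mean clinical dose, haloperidol displacement accounts for roughly three quarters of the variance (0.87² ≈ 0.76), whereas dopamine displacement accounts for well under a tenth (0.27² ≈ 0.07).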
There are many cases in which the relationship between different test models
depends strongly on the species used. Investigations on isolated arteries and veins
from the lungs of rabbits, sheep, pigs, and humans show that the vascular
preparations from rabbits and humans react to noradrenaline in a comparable
way. Sheep and pig arteries are significantly less sensitive. Isolated pig veins
cannot be stimulated at all at comparable doses of noradrenaline. The experimental
results are even more inhomogeneous and difficult to interpret upon stimulation
with acetylcholine. It must not be forgotten that the metabolism in humans and in
animal species is also different and exerts an influence on the test results.
Tachykinins are short peptides that trigger a wealth of physiological and pathological
processes. Their central role in pain and asthma is well established. They act via the
NK1, NK2, and NK3 receptor subtypes, which also bind specifically to the three
Table 19.3 Correlation of the clinical efficacy (Fig. 19.10) of 25 different neuroleptics and their
potency in different animal models that are typically used for the evaluation of neuroleptic effects
with the displacement of dopamine or haloperidol 19.2. The clinical data and the results of the
animal models correlate conspicuously better with the displacement of the D2-type ligand
haloperidol than with the displacement of the D1-type ligand dopamine (r = correlation coefficient).
Model                                                                                   Correlation with dopamine displacement (r)   Correlation with haloperidol displacement (r)
Mean clinical dose in humans                                                            0.27                                         0.87
Inhibition of the stereotypical behavior after application of apomorphine (rat)        0.46                                         0.94
Inhibition of the stereotypical behavior after application of amphetamine (rat)        0.41                                         0.92
Protection from apomorphine-induced emesis (dog)                                        0.22                                         0.93
peptide agonists substance P, neurokinin A, and neurokinin B (▶ Sect. 10.7). CP 96
345 19.5, a non-peptide NK1 antagonist, displaces substance P with high affinity in
two human cell culture models and in guinea pig and rabbit membrane preparations.
In membrane preparations from mouse, rat, and chicken brains, to which substance
P binds with entirely comparable affinities, 19.5 shows IC50 values
that are 60–500 times higher (Table 19.4). It is known from site-directed point
mutations that the agonist substance P and the antagonist CP 96 345 bind to
different regions of the receptor (see ▶ Sect. 29.7).
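The factor of 60–500 quoted above can be retraced from the displacement values in Table 19.4. The sketch below is only an illustration: the split into a "sensitive" and an "insensitive" group follows the wording of the text, and the dictionary and function names are assumptions made for this example.

```python
# IC50 values (nM) for displacement of substance P by 19.5, from Table 19.4.
# The grouping mirrors the text: human, guinea pig, and rabbit preparations
# versus mouse, rat, and chicken brain preparations.
SENSITIVE = {
    "Human cell line U373": 0.40,
    "Human cell line IM9": 0.35,
    "Guinea pig brain": 0.32,
    "Guinea pig lung": 0.34,
    "Rabbit brain": 0.54,
}
INSENSITIVE = {
    "Mouse brain": 32.0,
    "Rat brain": 35.0,
    "Chicken brain": 156.0,
}


def fold_range(low_group: dict, high_group: dict) -> tuple:
    """Return the smallest and largest IC50 ratio between any member of
    high_group and any member of low_group."""
    smallest = min(high_group.values()) / max(low_group.values())
    largest = max(high_group.values()) / min(low_group.values())
    return smallest, largest


if __name__ == "__main__":
    lo, hi = fold_range(SENSITIVE, INSENSITIVE)
    print(f"IC50 of 19.5 is {lo:.0f}- to {hi:.0f}-fold higher in the insensitive group")
```

The smallest ratio (mouse brain against rabbit brain) is close to 60 and the largest (chicken brain against guinea pig brain) is close to 500, in line with the range given in the text.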
The differences between humans and individual animal species are not surprising
considering that the amino acid sequences of the receptor proteins usually
differ in multiple positions. The use of human proteins in molecular test systems
is just as critical for the relevance of the results as it is for the determination
of the 3D structures (▶ Chaps. 13, “Experimental Methods of Structure
Determination” and ▶ 14, “Three-Dimensional Structure of Biomolecules”). This
can be seen very clearly in the results for the aspartic protease renin (▶ Sect. 24.2).
The inhibitors remikiren 19.6 and aliskiren 19.7 were tested on renins from
different species. The renins of two primate species and humans were inhibited at
very low concentrations. On the other hand, the renins from the rat and the dog,
the two species that are most commonly used in cardiovascular pharmacology,
were inhibited only at conspicuously higher concentrations (Table 19.5). Remikiren
would indeed have been found in a classical test for blood-pressure-lowering
Table 19.4 Binding of substance P and displacement by the antagonist CP 96 345 19.5 (tested as
a racemate) on cells of different origins.
Structure: 19.5 CP 96 345
System                  Binding of substance P; IC50 in nM   Displacement of substance P by 19.5; IC50 in nM
Human cell line U373    0.13                                 0.40
Human cell line IM9     0.22                                 0.35
Guinea pig brain        0.07                                 0.32
Guinea pig lung         0.04                                 0.34
Rabbit brain            0.16                                 0.54
Mouse brain             0.19                                 32
Rat brain               0.20                                 35
Chicken brain           0.26                                 156
effects, but it would have been judged to be much too weak. A comparison of the
X-ray structures of mouse and human renin also shows
a conserved binding mode for the main chain of the peptide inhibitors, as is
common for other aspartic proteases. Subtle differences are found at the rim of the
binding pocket arising from sequence differences between the species.
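The renin data in Table 19.5 can also be used to ask how the relative potency of the two inhibitors shifts from species to species. The following sketch, with hypothetical names chosen for this illustration, divides the IC50 of remikiren by that of aliskiren for each species.

```python
# IC50 values (nM) from Table 19.5: species -> (remikiren 19.6, aliskiren 19.7)
RENIN_IC50_NM = {
    "Human": (0.8, 0.6),
    "Monkey": (1.0, 1.72),
    "Dog": (107.0, 7.0),
    "Rat": (3600.0, 80.0),
}


def potency_ratio(ic50_a: float, ic50_b: float) -> float:
    """IC50 of compound A divided by IC50 of compound B.
    A value above 1 means A is the weaker inhibitor in that species."""
    return ic50_a / ic50_b


def ratios_by_species(table: dict) -> dict:
    """Remikiren/aliskiren IC50 ratio for every species in the table."""
    return {
        species: potency_ratio(remikiren, aliskiren)
        for species, (remikiren, aliskiren) in table.items()
    }


if __name__ == "__main__":
    for species, ratio in ratios_by_species(RENIN_IC50_NM).items():
        print(f"{species:7s} remikiren/aliskiren IC50 ratio = {ratio:.2f}")
```

On human renin the two inhibitors are nearly equipotent (ratio about 1.3), whereas on rat renin remikiren appears 45 times weaker than aliskiren. An assay based on rat renin would therefore not only underestimate both compounds but also distort their ranking relative to each other.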
The amino acid sequences of the 5-HT1B and 5-HT1Dβ subtypes of the serotonin
receptors of humans and rats show more than 90% identity. If exchanges between
similar amino acids are also taken into account, a homology of 95% is obtained.
Despite these similarities, a series of active substances bind with very different
affinities to these two receptors. The difference is traceable to a single amino acid:
the exchange of threonine 355 for an asparagine (Fig. 19.11). With respect to
ligand affinity, the human receptor is converted to the rat receptor by this
mutation! After the exchange of this amino acid, the β-blockers propranolol and
pindolol bind with approximately three orders of magnitude higher affinity.
The affinities of many other ligands, on the other hand, are significantly reduced.
Table 19.5 Inhibition of the renins of humans and other animal species by remikiren 19.6 and
aliskiren 19.7.
Structures: 19.6 Remikiren, 19.7 Aliskiren
Renin from:   IC50 in nM, Remikiren   IC50 in nM, Aliskiren
Human         0.8                     0.6
Monkey        1.0                     1.72
Dog           107                     7
Rat           3,600                   80
This may indeed only be a weak indication, but it can be speculated that the two
β-blockers bind to the mutated 5-HT receptor as they do to the β-receptor.
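To give a feeling for what a shift of three orders of magnitude in affinity means energetically, the standard relationship between a ratio of binding constants and the difference in free energy of binding, ΔΔG = RT ln(ratio), can be evaluated. The sketch below is a rough illustration only; the temperature of 298.15 K and the function name are assumptions, not values from the binding study.

```python
import math

R_J_PER_MOL_K = 8.314462618  # molar gas constant in J/(mol K)


def delta_delta_g_kj(affinity_ratio: float, temperature_k: float = 298.15) -> float:
    """Free-energy difference in kJ/mol that corresponds to a given ratio of
    two binding constants: ddG = R * T * ln(ratio).
    The default temperature is an assumption made for this example."""
    return R_J_PER_MOL_K * temperature_k * math.log(affinity_ratio) / 1000.0


if __name__ == "__main__":
    for ratio in (10.0, 100.0, 1000.0):
        print(f"{ratio:6.0f}-fold  ->  {delta_delta_g_kj(ratio):5.1f} kJ/mol")
```

Under these assumptions, a 1,000-fold change in affinity corresponds to roughly 17 kJ/mol (about 4 kcal/mol).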
19.12 Toxicity and Adverse Effects
One of the most difficult chapters in preclinical research is the estimation of the
toxicity of a substance, above all the human toxicity, from data that were obtained
from other species. Such considerations must be made to be able to estimate the
potential danger of the substance before it is introduced to the clinic. Are there any
drugs without toxicity and without side effects? Paracelsus recognized in the
sixteenth century:
Everything is poison and nothing is without poison, it is the dose alone that makes a thing
non-poisonous.
Friedrich Schiller had his Fiesko say:
A desperate evil needs a bold medicine.
Fig. 19.11 Different serotonin receptor ligands and the β-blockers propranolol and pindolol show
very different binding affinities on very similar 5-HT receptors from rats and humans. The open
circles refer to the wild-type human receptor. They are irregularly distributed over the diagram
(correlation coefficient r = 0.27). If one amino acid in the human receptor is exchanged for the
corresponding amino acid in the rat receptor, the binding profile changes. With respect to the affinity of
the ligands, the human receptor becomes a rat receptor. The black-filled circles refer to this
Asn355 mutant (correlation coefficient r = 0.98). Axes: log 1/Ki (rat) and log 1/Ki (human).
Ligands shown: N,N′-Dipropyl-5-CT, Rauwolscine, 5-OMe-diMe-tryptamine, Methysergide,
Sumatriptan, (−)-Propranolol, Pindolol, Metergoline, 5-Carboxamido-tryptamine (5-CT),
5-Hydroxytryptamine.
And the pharmacologist Gustav Kuschinsky formulated:
Whenever it is proclaimed that a substance has no side effects, the urgent suspicion ensues
that there is also no main effect.
The determination of the acute toxicity in multiple animal species and of the
chronic toxicity in at least two animal species is routine before
entry into phase I clinical trials, that is, tolerability testing on healthy volunteers.
It is standard practice to select the species for the chronic toxicity investigations
according to which animal species is most similar to
humans in its pharmacokinetics and metabolism.
Cats and guinea pigs react extremely sensitively to cardiac glycosides. Therefore,
they were previously used as models for the effect on humans. Rats react considerably
less sensitively. The hallucinogen lysergic acid diethylamide (LSD 2.21,
▶ Sect. 2.5) shows decidedly different toxicity in different animal species. An
experiment to test the hallucinogenic effects of LSD on an elephant ended in
disaster. A hallucinogenic but non-toxic dose was desired. Although this dose had
been estimated carefully, the elephant died within minutes after 0.3 g of LSD
(corresponding to 0.06 mg/kg) was administered. Relative to the mouse, which is
relatively insensitive (Table 19.6), the elephant reacted about 1,000 times more
sensitively. This experiment was not repeated! The discoverer of LSD, Albert
Hofmann, took 0.25 mg of LSD in his first controlled self-experiment. At
about 0.0035 mg/kg, he was significantly below the dose that cost the elephant its
life. Despite this, it can be assumed that LSD is less toxic for humans than it is for
elephants. No direct fatality from LSD is known, only deaths that occurred as
a result of accidents or suicide while in the psychotic state.
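The dose figures quoted above can be checked with simple arithmetic. The following sketch back-calculates the body masses implied by the stated total and per-kilogram doses and compares the mouse LD50 range from Table 19.6 with the dose that killed the elephant; the function names are hypothetical, and the implied body masses are derived values, not data from the source.

```python
def implied_body_mass_kg(total_dose_mg: float, dose_mg_per_kg: float) -> float:
    """Body mass implied by a total dose and the matching per-kilogram dose."""
    return total_dose_mg / dose_mg_per_kg


def sensitivity_ratio(reference_dose_mg_per_kg: float, other_dose_mg_per_kg: float) -> float:
    """How many times lower the second per-kilogram dose is than the first."""
    return reference_dose_mg_per_kg / other_dose_mg_per_kg


if __name__ == "__main__":
    # Elephant: 0.3 g = 300 mg in total, stated as 0.06 mg/kg.
    print(f"Elephant: about {implied_body_mass_kg(300.0, 0.06):.0f} kg")
    # Self-experiment: 0.25 mg in total, stated as about 0.0035 mg/kg.
    print(f"Human: about {implied_body_mass_kg(0.25, 0.0035):.0f} kg")
    # Mouse LD50 of 50-60 mg/kg (Table 19.6) versus 0.06 mg/kg in the elephant.
    low = sensitivity_ratio(50.0, 0.06)
    high = sensitivity_ratio(60.0, 0.06)
    print(f"Mouse versus elephant: {low:.0f}- to {high:.0f}-fold")
```

The stated doses imply an elephant of about 5,000 kg and a person of about 71 kg, and the mouse LD50 range translates into a roughly 830- to 1,000-fold difference in sensitivity.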
The toxicities of poisons that end up in our environment are investigated very
thoroughly. Chlorinated dibenzodioxins and furans form during the uncontrolled
chemical decomposition of the correspondingly substituted chlorophenols. The
Seveso accident is attributed to such an incident. Toxic chlorinated dioxins and
furans also form in many combustion processes. Tetrachlorodibenzodioxin 19.8
(TCDD, “Seveso dioxin”) is one of the best-investigated substances
with regard to its toxicity. Even here, different species react differently (Table 19.7).
Three orders of magnitude difference is found between the two relatively closely
related species hamster and guinea pig. Accordingly, it is difficult to draw
conclusions about the toxicity in humans. If an extrapolation were made from
primates to humans, TCDD would be classified as relatively non-toxic. In connection
with humans, the definition of an acute LD50 is absolutely inappropriate.
Table 19.6 Acute toxicity of lysergic acid diethylamide (LSD, 2.21, ▶ Sect. 2.5,
▶ Fig. 2.8) in different species and in humans (LD50 = dose that was lethal
for 50% of the animals).
Species    Toxicity; LD50 (in mg/kg)
Mouse      50–60
Rat        16.5
Rabbit     0.3
Elephant   0.06
Human      0.003
To be able to exclude one fatality per one million people, an “LD0.00001” must be
determined or calculated. Because of its pronounced mutagenic effects, the
long-term damage stands in the foreground with TCDD. It is questionable in this
case whether an absolute no-effect level, that is, the lowest ineffective dose, can
be defined. The estimation of the potential danger of environmentally
relevant chemicals looks entirely different if considered relative to toxic natural
products, natural radioactivity, cosmic radiation, etc., or even when compared
to socially tolerated substances of abuse such as alcohol and nicotine. This
puts some things into perspective that are very contentiously discussed in public
forums.
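The statement that hamster and guinea pig differ by about three orders of magnitude in their sensitivity to TCDD can be retraced from the LD50 ranges in Table 19.7. Because the table gives ranges rather than single values, the sketch below summarizes each range by its geometric mean; that choice, like the names used, is an assumption made for this illustration.

```python
import math

# LD50 ranges as listed in Table 19.7 (the units cancel in the ratio).
LD50_RANGE = {
    "Hamster": (1150.0, 5000.0),
    "Guinea pig": (0.5, 2.5),
}


def geometric_mean(low: float, high: float) -> float:
    """Geometric mean of the two ends of a range."""
    return math.sqrt(low * high)


def orders_of_magnitude(range_a: tuple, range_b: tuple) -> float:
    """log10 of the ratio between the geometric means of two ranges."""
    return math.log10(geometric_mean(*range_a) / geometric_mean(*range_b))


if __name__ == "__main__":
    diff = orders_of_magnitude(LD50_RANGE["Hamster"], LD50_RANGE["Guinea pig"])
    print(f"Hamster versus guinea pig: {diff:.1f} orders of magnitude")
```

With this summary, the two species differ by a little more than three orders of magnitude; comparing the extreme ends of the ranges instead would give between about 2.7 and 4.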
A difficult problem arises when structure–activity
relationships from in vitro investigations are used to estimate the mutagenic and
carcinogenic potential. Such tests indeed afford valuable information, but it must be
checked carefully. In individual cases, they constitute proof in neither the positive nor the
negative sense.
Developing theoretical models for estimating toxicity and carcinogenicity that
have adequate reliability and predictive power has proven to be extremely difficult.
The mechanisms that are responsible for the activity are too diverse and multifaceted,
and the chemical structures and structure–activity relationships, which are
only valid for one substance class, are too different. Today, testing for toxic,
carcinogenic, and teratogenic adverse effects has reached a high standard.
Pharmaceutical catastrophes of earlier decades such as the following would be
almost impossible with today’s standards:
• Early childhood brain damage and death of many premature and mature newborns
by the sulfonamides in the late 1930s,
Table 19.7 Acute toxicity of tetrachlorodibenzodioxin 19.8 in different animal species.
Structure: 19.8 2,3,7,8-Tetrachlorodibenzodioxin
Species      Toxicity (LD50 in µg/kg)
Mouse        114–280
Rat          22–320
Hamster      1,150–5,000
Guinea pig   0.5–2.5
Mink         4
Rabbit       115–275
Dog          100–300
Monkey       70
Human        ?
• Over 100 fatalities in the USA because of the use of diethylene glycol as
a solvent for sulfanilamide (this incident led to the Federal Food, Drug, and
Cosmetic Act of 1938, which greatly extended the authority of the Food and
Drug Administration, FDA),
• The SMON (subacute myelo-optic neuropathy) illness of thousands of Japanese,
caused by the prolonged and too-frequent use of an antidiarrheal medicine,
• The severe birth defects of approximately 10,000 children worldwide that were
caused by thalidomide (Contergan®) in the late 1950s.
Nonetheless, criminal intrigue, the uncontrolled distribution of counterfeit drugs
by internet-based providers, or the unscrupulous pursuit of economic advantage
can still cause such catastrophes today. The melamine-contaminated baby formula
(melamine makes the protein content of inferior or diluted milk seem higher) in
China in September 2008, which sickened many thousands of infants and toddlers
and killed several, serves as an example.
Moreover, in addition to the markedly stricter testing guidelines for medicines
that exist today in most countries, there is a reporting system that registers and
investigates adverse drug effect incidents. The slightest suspicion of a causal
relationship results in anything from public announcement or warning all the way
to the withdrawal of the marketing license.
A complication for the estimation of the toxicity is the formation of toxic, and
particularly reactive metabolites, even in small amounts. As was already discussed
in ▶ Sects. 9.1 and 19.6, an ideal drug should contain predetermined cleavage and/
or conjugation sites in addition to finely tuned pharmacodynamics and pharmaco-
kinetics. The more these requirements are fulfilled, the lower the risk that the
substance will exert toxic effects.
Some toxicity studies suffer from the fact that the results extrapolated to humans
suggest a higher toxicity than is actually the case because of the unphysiologically
high doses that are used in the studies. On the other hand, even the most comprehensive
investigation cannot eliminate the risk of serious adverse effects occurring
in extraordinarily rare cases once the drug is used broadly. An adverse effect that
occurs at a frequency of 1:10,000 or less can remain undiscovered in even the most
careful preclinical and clinical trials.
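Why an adverse effect with a frequency of 1:10,000 can escape detection becomes clear from a simple probability estimate. The sketch below assumes that every treated patient independently has the same probability of showing the effect; the trial size of 3,000 patients and the function names are hypothetical and serve only as an illustration.

```python
import math


def prob_at_least_one(event_rate: float, n_patients: int) -> float:
    """Probability of observing the event at least once among n_patients,
    assuming independent patients with an identical event rate."""
    return 1.0 - (1.0 - event_rate) ** n_patients


def patients_needed(event_rate: float, confidence: float = 0.95) -> int:
    """Smallest number of patients for which the event is observed at least
    once with the given probability, under the same assumptions."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - event_rate))


if __name__ == "__main__":
    rate = 1.0 / 10_000
    print(f"P(at least one case among 3,000 patients) = {prob_at_least_one(rate, 3000):.2f}")
    print(f"Patients needed for a 95% chance of one case: {patients_needed(rate)}")
```

Under these assumptions, a study with 3,000 patients has only about a one-in-four chance of seeing a single case, and close to 30,000 patients would be needed to observe at least one case with 95% probability.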
Toxic side effects in humans are seen particularly after chronic pharmaceutical
misuse. The life-long consumption of large amounts of pain medication adds up to
kilogram quantities. In the case of phenacetin (▶ Sect. 2.1), this led to the
consequence that an effective and
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf
Drug design book.pdf

Drug design book.pdf

  • 1.
    1 3Reference Methodology, Concepts, andMode-of-Action Drug Design Gerhard Klebe
  • 2.
  • 4.
    Gerhard Klebe Drug Design Methodology,Concepts, and Mode-of-Action With 494 Figures and 44 Tables
  • 5.
    Gerhard Klebe Institute ofPharmaceutical Chemistry Philipps-University Marburg Marburg, Germany Translator Leila Telan D€ usseldorf, Germany ISBN 978-3-642-17906-8 ISBN 978-3-642-17907-5 (eBook) ISBN 978-3-642-17908-2 (print and electronic bundle) DOI 10.1007/978-3-642-17907-5 Springer Heidelberg New York Dordrecht London This work is based on the second edition of “Wirkstoffdesign”, by Gerhard Klebe, published by Spektrum Akademischer Verlag 2009, ISBN: 978-3-8274-2046-6 Library of Congress Control Number: 2013933987 # Springer-Verlag Berlin Heidelberg 2013 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’s location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. 
While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein. Printed on acid-free paper Springer is part of Springer Science+Business Media (www.springer.com)
  • 6.
    Preface The present handbookon drug design builds on the German version first written by Hans-Joachim Böhm, Hugo Kubinyi, and me in 1996. After 12 years of success on the market, the German version of this handbook was entirely rewritten and significantly extended, then by me as the sole author. The new edition particularly considers novel approaches in drug discovery and many successful examples reported in literature on structure-based drug design and mode-of-action analysis. This novel version appeared in 2009 on the German market. Several attempts were made to translate this book into English to make it available to a wider audience. This intention was driven by the fact that the author was repeatedly approached with the question as to why such a successful book is not available in the English language. An analysis of the textbook market made apparent that no similar compendium was (and still is) available covering the same field of interest. Finally, Springer agreed in the translation project, and Dr. Leila Telan, a gifted bilingual medicinal chemist and physician, was found willing to take the task of producing a first draft of a cover-to-cover translation of the German original. This version was corrected, and some chapters extended by the author. The book is meant for students of chemistry, pharmacy, biochemistry, biology, chemical biology, and medicine interested in the design of new active agents and the structural founda- tions of drug action. But it is also tailored to experts in drug industry who want to obtain a more comprehensive overview of various aspects of the drug discovery process. Such a book project would not have been possible without the help of many friends and colleagues. First of all, I want to express my sincere thanks to Dr. Leila Telan, D€ usseldorf, Germany, who produced the first version of this translation. Her version and the modifications of the author have been carefully proofread by many colleagues in the field. 
Their help is highly appreciated. Furthermore, I would like to acknowledge the help of Prof. Dr. Hugo Kubinyi, Heidelberg, Germany, who assisted in correcting the first version of the English translation. Particular thanks go to Dr. Simon Cottrell, Cambridge, England, and to Dr. Nathan Kilah, Hobat, Tasmania, Australia, for their excellent and very thorough proofreading of the different chapters. The project was ideally guided by Dr. Daniel Quinones and v
  • 7.
    Dr. Sylvia Blago,Springer, Heidelberg, Germany. The author is grateful to the publisher for their assistance and technical support in producing the electronic and printed version of this handbook. Marburg, Germany, May 2013 Gerhard Klebe vi Preface
Contents

Part I Fundamentals in Drug Research
  1 Drug Research: Yesterday, Today, and Tomorrow
  2 In the Beginning, There Was Serendipity
  3 Classical Drug Research
  4 Protein–Ligand Interactions as the Basis for Drug Action
  5 Optical Activity and Biological Effect

Part II The Search for the Lead Structure
  6 The Classical Search for Lead Structures
  7 Screening Technologies for Lead Structure Discovery
  8 Optimization of Lead Structures
  9 Designing Prodrugs
  10 Peptidomimetics

Part III Experimental and Theoretical Methods
  11 Combinatorics: Chemistry with Big Numbers
  12 Gene Technology in Drug Research
  13 Experimental Methods of Structure Determination
  14 Three-Dimensional Structure of Biomolecules
  15 Molecular Modeling
  16 Conformational Analysis

Part IV Structure–Activity Relationships and Design Approaches
  17 Pharmacophore Hypotheses and Molecular Comparisons
  18 Quantitative Structure–Activity Relationships
  19 From In Vitro to In Vivo: Optimization of ADME and Toxicology Properties
  20 Protein Modeling and Structure-Based Drug Design
  21 A Case Study: Structure-Based Inhibitor Design for tRNA-Guanine Transglycosylase

Part V Drugs and Drug Action: Successes of Structure-Based Design
  22 How Drugs Act: Concepts for Therapy
  23 Inhibitors of Hydrolases with an Acyl–Enzyme Intermediate
  24 Aspartic Protease Inhibitors
  25 Inhibitors of Hydrolyzing Metalloenzymes
  26 Transferase Inhibitors
  27 Oxidoreductase Inhibitors
  28 Agonists and Antagonists of Nuclear Receptors
  29 Agonists and Antagonists of Membrane-Bound Receptors
  30 Ligands for Channels, Pores, and Transporters
  31 Ligands for Surface Receptors
  32 Biologicals: Peptides, Proteins, Nucleotides, and Macrolides as Drugs

Appendix
Illustration Source References
Name Index
Subject Index
Introduction

Drug design is a science, a technology, and an art all in one. An invention is the result of a creative act, and a discovery is the detection of an already-existing reality. Design encompasses the two processes with emphasis on a targeted approach based on the available knowledge and technology. Furthermore, the creativity and intuition of the researcher play a decisive role.

Drugs are all substances that affect a system by inducing a particular effect. In the context of this book, drugs are substances that exhibit a biochemical or pharmacological effect, in most cases medications that achieve a therapeutic result in humans.

The idea of rational drug design is not new. Organic compounds were prepared more than a century ago with the goal of attaining new medicines. The sedatives chloral hydrate (1869) and urethane (1885), and the antipyretics phenacetin (1888) and acetylsalicylic acid (1897) are early examples of how compounds with favorable therapeutic properties can be made in a targeted way by starting from a working hypothesis. The fact that the hypotheses in all four cases were more or less incorrect (▶ Sects. 2.1, ▶ 2.2, and ▶ 3.1) simultaneously demonstrates one of the main problems of drug design. In the case of the artistic design of a poster or commodity, or, in the case of engineering, the design of an automobile, a computer, or a machine, the result is usually predictable. In contrast, the design of a drug is even today not completely foreseeable. The consequences of the smallest structural changes of a drug on its biological properties and target tissue are too multifaceted and at present too poorly understood.

Until modern times, scientists worked on the principle of trial and error to find new medicines. In doing so, they derived mostly empirical rules that have contributed to a knowledge base for rational drug design, which individual researchers have translated into practice more or less successfully. Today new technologies are available for drug research, for instance, combinatorial chemistry, gene technology, and automated screening methods with high throughput, protein crystallography and fragment screening, virtual screening, and the application of bio- and chemoinformatics.
In many cases the molecular mechanisms of the mode of action of medicines are fairly well understood, but in other cases we are at the threshold of comprehension. Many of these mechanisms will be discussed in this book. Progress in protein crystallography and NMR spectroscopy allows the determination of the three-dimensional structure of protein–ligand complexes on a routine basis. As is shown in many of the illustrations in this book (for a general explanation of “reading” these illustrations, see the appendix at the end of this book), these structures make a decisive contribution to the targeted design of drugs. 3D structures with up to atomic resolution are known for approximately 550,000 small molecules and more than 85,000 proteins and protein–ligand complexes, and the numbers are increasing exponentially. Methods for the prediction of the 3D structures of small molecules are now mature, and semiempirical and ab initio quantum chemical calculations on drugs are now routinely performed.

The sequencing of the human genome is complete, and the genomes of other organisms are reported nearly every week, including those of important human pathogens. The age of structural genomics has begun, and it is only a matter of time before the 3D structures of entire gene families are available. Given enough sequence homology, modeling programs can nowadays achieve an impressive reliability. In the meantime, the composition of entire genomes is being processed with structure-prediction programs. There are already interesting approaches for the de novo prediction of 3D protein structures, and the first correct 3D structural predictions have been successfully accomplished.

Structure-based and computer-aided design of new drugs is here to stay in practical drug research. Computer programs serve the search for, modeling of, and targeted design of new drugs. In countless cases these techniques have assisted the discovery and optimization of new drugs. On the other hand, a too-strict and one-sided focus on the computational results bears the danger of losing sight of the available knowledge of the relationship between the chemical structure and biological activity. Another danger is the limited consideration of an active agent only with respect to its interaction with one single target without considering the other essential requirements for a drug, for instance, the pharmacokinetic and toxicological properties. In the last decade, intensive research effort has gone into the compilation of empirical guidelines to predict bioavailability, toxicological profiles, and metabolic properties (ADME parameters). The ability to predict the metabolic profile for a given xenobiotic by the arsenal of cytochrome P450 enzymes or to predict for each individual patient the metabolic peculiarities is still a dream. Nonetheless, just such an individually adjusted therapy and dosing regime is within the realm of possibilities. It is also conceivable that in the foreseeable future, gene sequencing of each of us will be financially feasible and will require a manageable and justifiable amount of time and effort. This will open entirely new perspectives for drug research. Whether this pushes open the gate to individualized personal medicines will be a question of cost.

The aim of this book is to introduce the methods required for drug design, particularly those based on structural and mechanistic evidence. Using well-selected examples, the route to the discovery and development of new medicines is discussed and reflected upon in light of constantly changing conditions.
Drug research is a multidisciplinary field in which chemists, pharmacists, technologists, molecular biologists, biochemists, pharmacologists, toxicologists, and clinicians work together to pave the way for a substance to become a therapeutic. Because of this, the majority of drug development is done in an industrial setting. It is only there that the financial requirements and structural organization are in place to allow a successful cooperation of all disciplines that are necessary to channel the research in the required manner toward a common goal. The fundamentals and future-oriented innovations of drug research are, however, increasingly being established in academia. Interestingly, an increasing number of research activities at universities has recently been devoted to drug development for infectious diseases and for diseases that particularly afflict developing countries, which have been sorely neglected by the profit-oriented pharmaceutical industry of the industrialized world. This is even more alarming when we consider that our improved quality of life and prolonged life expectancy are attributable to, above all else, a victory over devastating infectious diseases. We can only hope that politicians recognize this situation in time and make the resources and organizational infrastructure available so that the academic research groups can step into the breach in an efficient and goal-oriented way.

The rising costs of research and development, an already high standard of health care in many indications, and distinctly increased safety awareness and the concomitant demanding standards of the regulatory authorities have caused the number of new chemical entities (NCE) to steadily decrease over the last decades from 70–100 per year from 1960 to 1969, to 60–70 from 1970 to 1979, to an average of 50 between 1980 and 1989, to 40–45 in the 1990s, and even less in the new millennium. Despite this, there have still been new developments, and distinct progress has been made in the therapy of, for example, psychiatric diseases, arterial hypertension, gastrointestinal ulcers, and leukemia in addition to the broadening of indications for older compounds. Of the blockbusters, a disproportionately large percentage of the drugs were found in the last years by using a rational approach.

The cost of developing and launching a new drug has increased continuously; to date, it lies between US $800 and $1,600 million. Only large pharmaceutical companies can still afford these costs, with the associated risk of failure in the last phases of clinical trials, or a misjudgment of the therapeutic potential of a new drug.

There is talk nowadays of a paradigm shift in pharmaceutical research. In research this refers to the use of new technologies; in the market place this refers to a concentration process of corporate mergers and acquisitions. The last decade brought about many such “mega-mergers.” Larger and larger sales figures are being achieved by fewer and fewer companies. In parallel to this, a very dynamic and hardly insignificant scene has developed of small- to medium-sized, highly flexible biotech companies. The areas of gene technology, combinatorial chemistry, substance profiling, and rational design are particularly well represented in numerous such companies. Larger companies try to outsource their riskier research concepts to these companies and contract their services for everything up to the development of clinical candidates. However, the success of this scene has led to the result that the “good” companies have been swallowed by the “big” companies. Many former employees of “big pharma” have established their own small companies with an innovative idea. If the idea was good and successful, after a few years these innovators find themselves once again incorporated into the organization of a “big pharma” company.

At the same time the prescribing practices in all areas of health care have changed. Formerly it was the physician alone, occasionally in consultation with a pharmacist, who was responsible for the pharmacological therapy of the patient. Today cost-cutting measures, “negative lists,” health insurance, the purchasing departments of hospitals and pharmacies, the ubiquitous Internet, and even public opinion influence therapies to an ever larger extent.

The drug market, with its US $600 billion, is an extremely attractive market. Furthermore, this market is characterized by dynamic growth, which is decidedly stronger than in other markets. The best-selling drug in 2005, Lipitor® (Sortis® in Europe; atorvastatin), achieved US $12.2 billion in annual sales. Only illegal narcotics like heroin and cocaine have higher sales figures.

Tailored medications: will the latest technologies really deliver on this promise? What makes drug research so difficult? To use a parable, it is something like playing against an almighty chess computer. The rules are known to both sides, but it is very difficult to comprehend the consequences of each individual move during a complicated middle game. A biological organism is an extremely complicated system. The effect of a drug on the system and the effect of the system on the drug are multifaceted. Every structural change made with the goal of optimizing one particular characteristic simultaneously changes the finely tuned equilibrium of the other characteristics of the drug.

The knowledge of the interplay between the chemical structure and the biological effect must be united with the newest technology and results of genetic research to purposefully develop new medicines. It is also necessary to define the range of applications and the limitations of new technologies. Theory and modeling cannot exist detached from experiment. The results of calculations depend strongly on the boundary parameters of the simulation. The results obtained for one system are only conditionally transferable to other systems. Only an experienced specialist is in a position to fully exploit the special potential of theoretical approaches. The claims that some software and venture capital companies make, that their results automatically lead to success, should be considered with some skepticism. This book should be helpful in these situations too, to separate the wheat from the chaff and to identify the application range of these methods as well as their limitations.

This book is about drug research and the mode of action of medicines. It is different from classical textbooks on pharmaceutical chemistry in its structure and goals. The principles, methods, and problems associated with the search for new medicines are the themes. Classes of drugs are not discussed, but rather the way that these drugs were discovered and some insights into the structural requirements for their action on a particular target protein. As the title suggests, the book is meant for students of chemistry, pharmacy, biochemistry, biology, and medicine who are interested in the art of designing new medicines and the structural fundamentals of how drugs act on their targets.
In the first section, after an introduction to the history of medicines and the concept of serendipity as an unpredictable but always very successful concept in drug research, examples from classical drug research will be presented. A discussion about the fundamentals of drug action, the ligand–receptor interaction, and the influence of the three-dimensional structure on the efficacy of a drug rounds out the section.

In the second section, the search for lead structures and their optimization and the use of prodrug strategies are introduced. New screening technologies but also the systematic modification of structures by using the concept of bioisosteres and a peptidomimetic approach are discussed.

In the third section, experimental and theoretical methods applied in drug research are described. Combinatorial chemistry has afforded access to a wide variety of test substances. Gene technology has produced the target proteins in their pure form, and has helped to characterize these proteins’ properties and function from the molecular level to the cellular assembly, all the way to the organism level. It has built a bridge between understanding the effects of a drug therapy on the complex microstructure of a cell and in systems biology of an organism. The spatial structures of proteins and protein–ligand complexes are accessible through NMR spectroscopy and X-ray crystallography. Their structural principles are becoming better understood and are increasingly allowing us access to the binding geometry of the drugs. The computer methods and molecular dynamics simulations of complex conformational analysis have also sharpened our understanding of targeted drug design.

The fourth section introduces design techniques such as pharmacophore and receptor modeling, and discusses the methods of, and uses for, quantitative structure–activity relationships (QSAR). Insights into the transport and distribution of drugs in biological systems are given, and different techniques for structure-based design are presented. A drug-design case study from the author’s research closes the section.

The fifth section of this book focuses on the core question of pharmacology: How do drugs actually work? Enzymes, receptors, channels, transporters, and surface proteins are divided into individual chapters and discussed as groups of target proteins. The spatial structure of the protein and modes of action are used to elucidate in detail why a drug works and why it must exhibit a particular geometry and structure to work. By way of example, the contributions of structure-based and computer-aided design to the discovery of new drugs are presented in these chapters, and other aspects are also shifted into the spotlight.

Because of the concept of this book, many important drugs are not considered or are only fleetingly mentioned. The same is true of receptor theory, pharmacokinetics and metabolism, the basics of gene technology, and statistical methods. The biochemical, molecular biological, and pharmacological fundamentals of the mode of action of drugs, which are important for the understanding of the theme of drug design, are only commented upon in outline form. Other disciplines that are critical for the development of an active substance to a medicine and application to patients, such as pharmaceutical formulations, toxicological testing, and clinical trials, are not themes that are covered in this book.

The selection of examples from therapeutic areas was made subjectively and for didactic reasons based on case studies and to bring other aspects of drug research to the foreground. A balanced presentation of the methods of drug design and their practical application was attempted.

The interested reader does not have to read the book in sequence. If the reader’s interest is purely on drugs and their mode of action, then they can also begin with ▶ Chap. 22. There are many cross references in the text to help the reader to find the passages in other parts of the book that are necessary for an exact comprehension of what is being discussed at any given point. The references and literature suggestions that follow cite particularly recommendable monographs and are ordered alphabetically; journals and series on the themes that are discussed in later chapters are not mentioned specifically again.

Literature

Monographs

Brunton L, Lazo J, Parker K (2005) Goodman & Gilman’s the pharmacological basis of therapeutics, 11th edn. McGraw-Hill, Europe
Ganellin CR, Roberts SM (eds) (1993) Medicinal chemistry. The role of organic chemistry in drug research, 2nd edn. Academic Press, London
King FD (ed) (2003) Medicinal chemistry: principles and practice, 2nd edn. The Royal Society of Chemistry, Cambridge
Krogsgaard-Larsen P, Bundgaard H (eds) (1991) A textbook of drug design and development. Harwood Academic Publishers, Chur, Switzerland
Lednicer D (ed) (1993) Chronicles of drug discovery, vol 3. American Chemical Society, Washington, DC and earlier volumes from this series
Lemke TL, Williams DA (2008) Foye’s principles of medicinal chemistry, 6th edn. Williams & Wilkins, Baltimore
Mannhold R, Kubinyi H, Folkers G (eds) Methods and principles in medicinal chemistry. Wiley-VCH, Weinheim, Series with Guest Editors
Maxwell RA, Eckhardt SB (1990) Drug discovery. A casebook and analysis. Humana Press, Clifton
Mutschler E, Derendorf H (1995) Drug action, basic principles and therapeutic aspects. CRC Press, Boca Raton/Ann Arbor/London/Tokyo
Silverman RB (2004) The organic chemistry of drug design and drug action, 2nd edn. Elsevier/Academic Press, Burlington
Wermuth CG, Koga N, König H, Metcalf BW (eds) (1992) Medicinal chemistry for the 21st century. Blackwell Scientific, Oxford

Journals and Series

Annual Reports in Medicinal Chemistry
Chemistry & Biology
ChemMedChem
Drug Discovery Today
Drug News and Perspectives
Journal of Computer-Aided Molecular Design
Journal of Medicinal Chemistry
Methods and Principles in Medicinal Chemistry
Nature
Nature Reviews Drug Discovery
Perspectives in Drug Discovery and Design
Pharmacochemistry Library
Progress in Drug Research
Quantitative Structure-Activity Relationships
Reviews in Computational Chemistry
Science
Scientific American
Trends in Pharmacological Sciences

Nowadays the Internet, discussion platforms, and the tremendously valuable tool of Wikipedia are available to everyone and provide access to an enormous source of information.
I Fundamentals in Drug Research

This colored copperplate engraving from arguably the most beautiful plant book, the Hortus Eystettensis by Basilius Besler, Eichstätt, 1613, shows the squill, Scilla alba (modern name: Urginea maritima L.). This plant was known to the ancient Egyptians, Greeks, and Romans as a remedy for many ailments, but especially dropsy (today: congestive heart failure). It was venerated faithfully as a general defense against harm. It was not until our century that the active components of squill, the glycosides scillaren and proscillaridin, were isolated in their pure form, and a derivative with improved bioavailability, meproscillarin (Clift®), was available for pharmaceutical therapy.
1 Drug Research: Yesterday, Today, and Tomorrow

G. Klebe, Drug Design, DOI 10.1007/978-3-642-17907-5_1, © Springer-Verlag Berlin Heidelberg 2013

The targeted route to medicines is an old dream of humanity. Even the alchemists sought after the Elixir, the Arcanum that was meant to heal all disease. It still has not been found today. On the contrary, drug therapy has become even more complicated as our knowledge of the different disease etiologies has become more complex.

Nonetheless, the success of drug research is impressive. For hundreds of years, alcohol, opium, and Solanaceae alkaloids (from thorn apples) were the only preparatory measures for surgery. Today general anesthesia, neuroleptanalgesia, and local anesthetics allow absolutely pain-free surgical and dental procedures to be carried out. Until this century, plagues and infectious diseases have killed more people than all wars. Today, thanks to hygiene, vaccines, chemotherapeutics, and antibiotics, these diseases have been suppressed, at least in industrialized countries. The dangerously increasing numbers of therapy-resistant bacterial and viral pathogens (e.g., tuberculosis) have presented new problems and make the development of new medications urgently necessary. The H2-receptor inhibitors and proton-pump inhibitors have drastically reduced the number of surgical procedures to treat gastric and duodenal ulcers. Combinations of these inhibitors with antibiotics have brought even more advances in that they allow a causal therapy (▶ Sect. 3.5).

Cardiovascular diseases, diabetes, and psychiatric diseases (diseases of the central nervous system, CNS) are treated mostly symptomatically, that is, the cause of the disease is not addressed, but rather the negative effects of the disease on the organism. Often the therapy is limited to slowing the progression of these diseases or increasing the quality of life. Synthetic corticosteroids have led to significant pain reduction and retardation of the pathological bone degeneration associated with chronic inflammatory diseases (e.g., rheumatoid and chronic polyarthritis). The spectrum of cancer therapy ranges from healing, particularly in combination with surgical and radiation therapy, all the way to complete failure of all therapeutic measures.

The history of drug research can be divided into several sequential phases:
• the beginning, when empirical methods were the only source of new medicines,
• targeted isolation of active compounds from plants,
• the beginning of a systematic search for new synthetic materials with biological effects and the introduction of animal models as surrogates for patients,
• the use of molecular and other in vitro test systems as precise models and as a replacement for animal experiments,
• the introduction of experimental and theoretical methods such as protein crystallography, molecular modeling, and quantitative structure–activity relationships for the targeted structure-based and computer-supported design of drugs, and
• the discoveries of new targets and the validation of their therapeutic value through genomic, transcriptomic, and proteomic analysis, knock-in and knock-out animal models, and gene silencing with siRNA.

Each preceding phase loses its importance with the arrival of the next phase. Interestingly, in modern drug research individual phases run in the opposite direction. That is, first a target structure is discovered in the sequenced genome of an organism and its function is modulated to validate it as a candidate for drug therapy. Then the structure-based and computer-aided design of an active substance is undertaken in close cooperation with multiple in vitro tests to clarify the activity and the activity spectrum. Next, the animal experiments substantiate the clinical relevance, and in the final step clinical trials confirm a test substance’s suitability as a medicine for patients.

1.1 It All Began with Traditional Medicines

The beginnings of drug therapy can be found in traditional medicines. The narcotic effect of the milk of the poppy, the use of autumn crocus (Colchicum autumnale) for gout, and the diuretic effect of squill (Urginea maritima) for dropsy (today: congestive heart failure) have been known since antiquity. The dried herbs and extracts from these and other plants have served as the most important source of medicines for more than 5,000 years. The oldest written records of these uses are from 3000 BC.
Around 1550 BC the ancient Egyptian Papyrus Ebers listed approximately 800 prescriptions, of which many contained additional rituals to invoke the help of the gods. The five-volume book De Materia Medica of Dioskurides (Greek physician, first century AD) is the most scientifically rigorous work of antiquity. It contains descriptions of 800 medicinal plants, 100 animal products, and 90 minerals. Its influence reached into the late Arabic medicine and the early modern age.

The most famous medicine of antiquity was undoubtedly Theriac. Its precursor, Mithridatum, served the King of Pontus, Mithridates VI (120–63 BC), as an antidote for poisonings of all kinds. Theriac can be traced to Andromachus, the private physician of the emperor Nero, and originally contained 64 ingredients. This preparation remained very widespread even into the eighteenth century. It was prepared in many variations with up to 100 ingredients. In some cities it was even prepared under state control to ensure that none of the ingredients were left out! Its use evolved into a panacea for all diseases. In addition, every imaginable wonder drug was in use; examples include earthworm oil, unicorn powder, gastric calculus stones, human cranium powder (Lat. Cranium, skull), mummy dust, and many more.

Traditional Chinese medicine was very advanced even in ancient times. A special feature of its formulations was, and is, that four different qualities are responsible for the effect. The chief (jun) is the carrier of the effect; the adjutant (chen) supports the effect or induces a different effect. The assistant (zuo) can also support the main effect or can serve to ameliorate side effects, and one or more messengers (shi) moderate the desired effect. The Chinese Pen-Ts’ao school (first and second century AD), whose goal it was to live for as long as possible without aging (!), recommended the following dosing regime: When treating a disease with a medicine, if a strong effect is desired, one should begin with a dose that is not larger than a grain of millet. If the disease is healed, no more medicine should be given. If the disease is not healed, the dose should be doubled. If that does not heal the disease, the dose should be increased tenfold. When the disease is healed, the therapy should always be discontinued.

The Chinese Materia Medica published by Li Shizhen in 1590 is made up of 52 volumes. It contains almost 1,900 medical principles, plants, insects, animals, and minerals incorporated into 10,000 detailed recipes for their preparation. The Chinese Pharmacopeia from 1990 contains only two volumes. One of those volumes contains 784 traditional medicines; the other contains 967 medications from “Western” medicine.

Paracelsus (born Theophrastus Bombastus von Hohenheim; 1493/1494–1541) made a great breakthrough for scientific medical research. He understood the human to be a “chemical laboratory” and held the ingredients of drugs themselves, the Quinta essentia, responsible for their healing effects. Despite this, up until the beginning of the nineteenth century all therapeutic principles were based on plant extracts, animal ingredients, or minerals; only in the rarest cases were pure organic compounds used. That changed fundamentally with the advent of organic chemistry. The great age of natural products from plants (for examples see 1.1–1.9, Fig. 1.1) and of the active substances that were derived from them had begun. Premature hopes that were invested in some of these substances around the turn of the previous century, for example in heroin (▶ Sect. 3.3) or cocaine (▶ Sect. 3.4), were very quickly squelched, but natural products from plants established the fundamentals for, and form an exceedingly large part of, our modern pharmacy. Natural products and their analogues and derivatives are also well represented among the best-selling drugs today.

[Fig. 1.1: chemical structures of morphine 1.1, caffeine 1.2, quinine 1.3, colchicine 1.4, cocaine 1.5, ephedrine 1.6, coniine 1.7, atropine (racemate) 1.8, and reserpine 1.9]

Fig. 1.1 Many important natural products were isolated in the nineteenth century, and a few were synthesized. Morphine 1.1 was isolated from opium by Friedrich Wilhelm Adam Sertürner in 1806, caffeine 1.2 was isolated from coffee, and quinine 1.3 was isolated from cinchona bark by Friedlieb Runge in 1819. Quinine was discovered independently by Pierre Joseph Pelletier and Joseph Bienaimé Caventou, who one year later isolated colchicine 1.4 from autumn crocus. Cocaine 1.5 was extracted from coca leaves by Albert Niemann in 1860, and ephedrine 1.6 was extracted from the Chinese plant Ma Huang (Ephedra vulgaris) by Nagayoshi Nagai. In 1886 the first alkaloid, coniine 1.7, which is found in hemlock, was synthesized by Albert Ladenburg; in 1901 atropine 1.8 from Deadly Nightshade was synthesized by Richard Willstätter. Reserpine 1.9, from Rauwolfia serpentina, was first prepared in the middle of the twentieth century, and its structure was elucidated.

1.2 Animal Experiments as a Starting Point for Drug Research

The wealth of experience gained by traditional medicine is based on many thousands of years of sometimes accidental, sometimes intentional observations of their therapeutic effects on humans. Planned investigations on animals were relatively rare. The biophysical experiment of Luigi Galvani, an anatomy professor in Bologna, which was first described in his book De viribus electricitatis in motu musculari in 1791, has become famous. In 1780 his students had already observed how frog thighs would twitch when the nerve was dissected and a static electricity generator was simultaneously in use; such devices were standard equipment in many laboratories at the time. He wanted to demonstrate in standardized experiments whether the twitching was also caused by thunderstorms. He hung the legs on an iron window grill with a copper hook, and they twitched simply upon contact with the grill. The voltage difference between the two metals was enough to stimulate the nerve, even without an electrical discharge.

The systematic investigation of the biological effects in animals of plant extracts, animal venoms, and synthetic substances began in the nineteenth century. In 1847 the first pharmacology department was founded at the Imperial University in Dorpat (today: Tartu, Estonia). The famous pharmacologist Sir James W. Black, who developed the first β-blocker (an antihypertensive, ▶ Sect. 29.3) at ICI and later took part in the development of the first H2 antagonists (see gastrointestinal ulcer medications, ▶ Sect. 3.5) at Smith, Kline & French, compared pharmacological testing to a prism: what pharmacologists see in their substances’ properties directly depends on the model that was used to test the substances. Just as a prism would, the models distort our vision in different ways. There is no such thing as a depressed rabbit or a schizophrenic rat. Even if there were such animals, they would not be able to share their subjective perceptions and emotions with us. Gene-modified animals (▶ Sect. 12.5), such as the Alzheimer mouse, are also approximations of reality that have been distorted through a different prism, to use Black’s analogy. This reality is often underestimated in industrial practice.
Scientists tend to optimize their experiments on a particular, isolated model. In doing so, many factors and characteristics that are essential for a medicine, for instance selectivity or bioavailability, are inadequately considered. There is no way out of this dilemma. We need simple in vitro models (Sect. 1.5) to be able to test large series of potentially active compounds, and we need the animal models to correlate the data and to make predictions about the therapeutic effects on humans. In the past, therapeutic progress was preferentially achieved when a new in vivo or in vitro pharmacological model was available for a new effect (see the H2 receptor antagonists, ▶ Sect. 3.5).

Typical mistakes in the selection of models and in the interpretation and comparison of experimental results arise from different routes of administration and from correlating results obtained in different animal species. It does not make sense to optimize the therapeutic range of a substance in one species and the toxicology in another. Further, comparing effects after a fixed dose without determining an effective dose also distorts the results, because very strong and very weak substances fall outside the measurement range. Measuring the effect strictly according to a fixed schedule is also questionable, because neither the latency period, that is, the time before an effect is seen, nor the time of maximum biological effect is recorded. In whole-animal models, auxiliary medications are usually applied, which can also influence the experimental results. Anesthetized animals often give entirely different results than conscious animals.
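The point about fixed-dose comparisons can be made concrete with a small numerical sketch. It is not taken from the text: it assumes a simple Hill-type dose–response curve, three hypothetical compounds with made-up ED50 values, and illustrative helper names (`hill_effect`, `estimate_ed50`). At a single fixed dose the strong and the weak compound sit at the two ends of the measurement range, whereas a dose series recovers a distinct effective dose for each.

```python
import math


def hill_effect(dose, ed50, e_max=100.0, slope=1.0):
    """Percent effect at a given dose under an assumed Hill-type model."""
    return e_max * dose**slope / (ed50**slope + dose**slope)


def estimate_ed50(doses, effects, e_max=100.0):
    """Interpolate, on a log-dose scale, the dose giving half-maximal effect.

    Returns None if the tested doses do not bracket the half-maximal effect.
    """
    half = e_max / 2.0
    pairs = list(zip(doses, effects))
    for (d_lo, e_lo), (d_hi, e_hi) in zip(pairs, pairs[1:]):
        if e_lo <= half <= e_hi:
            frac = (half - e_lo) / (e_hi - e_lo)
            log_dose = math.log10(d_lo) + frac * (math.log10(d_hi) - math.log10(d_lo))
            return 10**log_dose
    return None


# Hypothetical compounds; the ED50 values (arbitrary dose units) are invented.
compounds = {"strong": 0.01, "medium": 1.0, "weak": 100.0}

# Fixed-dose comparison: every compound is tested at the same single dose.
fixed_dose = 1.0
fixed_dose_effects = {
    name: hill_effect(fixed_dose, ed50) for name, ed50 in compounds.items()
}

# Dose series: each compound is tested over a range of doses instead.
dose_series = [10**k for k in range(-4, 5)]
ed50_estimates = {
    name: estimate_ed50(dose_series, [hill_effect(d, ed50) for d in dose_series])
    for name, ed50 in compounds.items()
}

for name in compounds:
    print(name, round(fixed_dose_effects[name], 1), ed50_estimates[name])
```

In this sketch the fixed dose yields effects of about 99 %, 50 %, and 1 %. With any realistic assay noise, the first value is indistinguishable from "maximal" and the last from "inactive", although the underlying potencies differ by four orders of magnitude.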
1.3 The Battle Against Infectious Disease

Plagues and infectious diseases, with malaria and tuberculosis at the top of the list, have killed more people over the ages than all of the wars in the history of humanity. Twenty-two million people died during the first wave of the 1918 influenza (“Spanish flu”). Up until the middle of the twentieth century, millions of people died every year of malaria, and unfortunately, today these numbers are shooting up again (▶ Sect. 3.2). Until the turn of the twentieth century, ipecac (Psychotria ipecacuanha) and cinchona (Cinchona officinalis L.) were the only therapeutic approaches to this disease. The impressive successes in the fight against plagues came in large part from the last 80 years of drug research. We have the sulfonamides (▶ Sect. 2.3) and their combinations with dihydrofolate reductase inhibitors (▶ Sect. 27.2), the antibiotics (▶ Sects. 2.4, ▶ 6.4, and ▶ 32.6), and the synthetic tuberculostatic medicines (▶ Sect. 6.5) to thank for this.

When Selman A. Waksman (1888–1973) received the Nobel Prize for the discovery of streptomycin (▶ Sect. 6.4), a little girl congratulated him with a bouquet of flowers. She was the first patient with meningeal tuberculosis to be cured with streptomycin. Today we can no longer appreciate the atmosphere in a tuberculosis hospital from our own experience, but only from Thomas Mann’s The Magic Mountain (German: Der Zauberberg). However, the infectious diseases, including tuberculosis, are on the advance again. In the past many antibiotics were used too broadly. This and the spread of resistant pathogens in hospitals have led to a situation in which many cases are treatable only with very specific antibiotics. If resistance develops to these antibiotics, all of our weapons are blunted. New viral infections are looming.
Before the advent of the immune disease AIDS (acquired immune deficiency syndrome) there were very few cases of pneumonia caused by the fungus Pneumocystis jirovecii (formerly Pneumocystis carinii); nowadays the numbers have increased tremendously. This type of pneumonia is the primary cause of death of AIDS patients and of immunosuppressed patients after organ transplantation. A great effort has been made to find drugs for AIDS and its complications. On the other hand, many widespread tropical diseases, for instance malaria and Chagas disease, have been inadequately researched, and expanding resistance to the currently available medications represents an increasing worldwide problem. Because these diseases are rampant in parts of the world where people lack the economic resources to finance chemotherapy, more and more pharmaceutical companies have withdrawn from these research areas for financial reasons. The chances of recovering the development costs from these populations are poor. Here global politics must establish some structure so that these people are able to benefit from the technological progress made by modern drug research. An example of this is the Bill and Melinda Gates Foundation, which is dedicated to the treatment and eradication of diseases around the entire world, but with particular emphasis on developing countries.

Improved hygiene has also helped to reduce the risk of infection, for instance with traumatic fever or Shigella dysentery (discussed in ▶ Chap. 21, “A Case Study: Structure-Based Inhibitor Design for tRNA-Guanine Transglycosylase”). Above all else, it was the vaccines that contributed to the eradication of many infectious diseases. Now as before, hopes rest on new and combined vaccines for the prevention of AIDS, malaria, and gastrointestinal ulcers, the latter of which we now know to be caused by the bacterium Helicobacter pylori (▶ Sect. 3.5).

1.4 Biological Concepts in Drug Research

Acetylcholine 1.10 (Fig. 1.2), which was synthesized in 1869 by Adolf von Baeyer, is a neurotransmitter, that is, a transfer agent for nerve impulses. In 1921 Otto Loewi, a pharmacologist, proved its biological effect in an elegant experiment. Two isolated frog hearts were perfused with the same solution. The vagal nerve of one of the hearts was stimulated, leading to a slowing of the heart rate, a so-called bradycardia. Shortly afterward, the second heart also began to beat more slowly, which was a clear indication of a humoral (Lat. humor, umor, fluid) signal transfer. Soon after that acetylcholine was recognized as the responsible “Vagusstoff”. Acetylcholine itself is not usable as a therapeutic because it is metabolized too quickly by acetylcholinesterases (▶ Sect. 23.7).

In 1901 Thomas Bell Aldrich (1861–1938) and Jokichi Takamine isolated the first human hormone, adrenaline 1.11 (Fig. 1.2). This hormone and its N-desmethyl derivative, noradrenaline 1.12 (Fig. 1.2), are produced in a central location, the adrenal glands, and are released under stress conditions into the entire system with the exceptions of the CNS and the placenta, which have their own barriers against most polar compounds. These substances cause different reactions in different parts of the organism, where they react with the relevant receptors.
The specificity is poor, and a plethora of pharmacodynamic effects result: pulse and blood pressure rise, and the organism is prepared for “flight”, which has been an exceedingly important function over the course of evolution.

Noradrenaline and adrenaline (also called norepinephrine and epinephrine, respectively) are also neurotransmitters (▶ Sect. 29.3), just like acetylcholine, the biogenic amines 1.13–1.15, the amino acids 1.16–1.19, and peptides, such as 1.20 and 1.21 (Fig. 1.2). Neurotransmitters are produced locally in the nerve cells, stored, and released upon stimulation of the nerve. After interaction with receptors on the neighboring nerve cell, they are quickly metabolized or taken up again by the same neuron that released them. Depending on the name of the neurotransmitter, one speaks of the adrenergic, cholinergic, dopaminergic, etc. systems. The effect that adrenaline evokes is referred to as adrenergic, and an antagonist to this system is called antiadrenergic. However, this nomenclature is not always strictly adhered to. It is common to see combinations of the name of the neurotransmitter with the term agonist or antagonist, or sometimes blocker instead of antagonist, for instance a dopamine agonist, a histamine antagonist, or a β-blocker for antagonists of β-adrenergic receptors. A plethora of drugs have arisen from the structural variations of neurotransmitters.
At the end of the 1920s the steroid hormones were isolated, and their structures were determined in short order (▶ Sect. 28.5). Altogether the discoveries of the mid-twentieth century heralded the “golden age” of drug research. The systematic variation of the principles responsible for biological activity and our increasing knowledge of the mode of action have led to the synthesis of enzyme inhibitors and of receptor agonists and antagonists, which together with natural product derivatives from plants make up the largest part of our modern pharmacy.

[Figure: structural formulas of compounds 1.10–1.21, with adrenaline 1.11 (R = CH3), noradrenaline 1.12 (R = H), Met-enkephalin 1.20 (Tyr-Gly-Gly-Phe-Met), and Leu-enkephalin 1.21 (Tyr-Gly-Gly-Phe-Leu)]

Fig. 1.2 The natural hormones and neurotransmitters acetylcholine 1.10, adrenaline 1.11, noradrenaline 1.12, dopamine 1.13, histamine 1.14, and serotonin 1.15, the excitatory amino acids glutamic acid 1.16 and aspartic acid 1.17, the inhibitory amino acids glycine 1.18 and γ-aminobutyric acid (GABA) 1.19, and several peptides, such as the enkephalins 1.20 and 1.21, substance P, and others serve as lead structures for drugs for a variety of cardiovascular and CNS diseases (see ▶ Chaps. 3, “Classical Drug Research”; ▶ 29, “Agonists and Antagonists of Membrane-Bound Receptors”; and ▶ 30, “Ligands for Channels, Pores, and Transporters”).
1.5 In Vitro Models and Molecular Test Systems

Around 40 years ago, we began to think about testing substances in simple in vitro models. With these models biological testing takes place in test tubes rather than animals. There are many compelling reasons to avoid animal experiments. They increasingly provoke public criticism and are time and cost intensive. In the beginning cell culture models were preferentially employed, for example tumor cell cultures for testing cytostatic therapies, or embryonic chicken heart cells for cardio-active compounds. Later these were joined by receptor-binding studies. The first molecular test models were enzyme-inhibitor assays in which the inhibitory activity of a molecule could be evaluated on one particular target protein in the absence of interfering side effects (▶ Chap. 7, “Screening Technologies for Lead Structure Discovery”). With the progress of gene technology methods (▶ Chap. 12, “Gene Technology in Drug Research”), not only is the preparation of the enzyme simplified, but receptor-binding studies can also be carried out on standardized materials. Today it is possible to achieve an exact evaluation of the entire activity spectrum of any substance on any enzyme, receptors of all types and subtypes, ion channels, and transporters. In industrial drug discovery this procedure has meanwhile become routine.

Before biological screening begins, the following questions have to be answered: what therapeutic goal should be achieved, and is this goal achievable? Therapeutic concepts are established based on the pathophysiology and the causes of its alteration. Regulatory interventions with drugs should re-establish the normal physiological conditions as closely as possible. In doing so, a distinct problem occurs. Nature works on two orthogonal principles: the specificity of the mode of action and a pronounced spatial separation of effects, the compartmentalization.
Adrenaline that is produced in the adrenal glands works on the entire body except for the brain. If it is released in the brain, it works only in the synapse between two nerve cells. As far as specificity goes, chemists can beat nature most of the time, but they fail by a wide margin when it comes to spatial separation.

Through the progress made in gene technology (▶ Chap. 12, “Gene Technology in Drug Research”) we can investigate active substances much more exactly than before; but by using isolated enzymes and binding studies we are a long way away from the reality of animal models, and even further away from humans. In analogy to the difference between an animal experiment and an isolated-organ experiment, a well-established correlation between the results obtained in cell culture and in vitro tests and the desired therapeutic effect is a prerequisite to successfully using the in vitro model. The quantitative relationship between different biological effects (▶ Chap. 18, “Quantitative Structure–Activity Relationships”) establishes the connection between animal models and humans.

One modern researcher stands out especially in the area of CNS-active compounds, but also in the areas of cardiovascular-active compounds and antihistamines. Paul Janssen (1926–2003) was the director of the company Janssen Pharmaceuticals in Beerse, Belgium. In the years after World War II, his company discovered over 70 new active substances, carried out the preclinical and clinical
development, and established them as therapies. In doing so, his company established itself as the most successful in pharmaceutical history. His recipe for success was not a secret. Paul Janssen was a master of structural variation, a Beethoven of drug discovery. The systematic combination of pharmacologically interesting structural fragments and the elegant evaluation of receptor-binding studies, in vitro models, and animal experiments were the foundation of his successes.

1.6 The Successful Therapy of Psychiatric Illness

Up until the middle of the last century psychiatric hospitals were purely custodial care facilities; they were almost indistinguishable from prisons in terms of the restriction of personal freedom. The discovery of neuroleptics, antidepressants, anticonvulsants, and sedatives revolutionized psychiatry. Typical examples of these classes of drugs are depicted in Fig. 1.3. With the repertoire of drugs that are available today, schizophrenia, chronic anxiety, and depression are predominantly handled in open-ward psychiatry. Many patients can be treated in an ambulatory setting.

In 1933 Manfred Sakel (1901–1957), who worked at the psychiatric university hospital in Vienna, noticed that when schizophrenics were given insulin to stimulate their appetites, they became calmer. Encouraged by this result, he increased the dose to the point of hypoglycemic coma, a form of deep unconsciousness induced by too little blood sugar. Insulin shock, pentetrazole, and electroshock became the standard treatment for psychotic illness over the next two decades, an impressive and frightening proof of the absence of therapeutic alternatives. This situation changed in the 1950s with the discovery of reserpine 1.9 (Fig. 1.1, Sect. 1.1), a herbal natural product. This substance exerts its effect by emptying the reserves of the neurotransmitters noradrenaline, serotonin, and dopamine in nerve cells.
Reserpine was the first substance to display a prominent neuroleptic effect, that is, it is sedating and calming, and it was the first compound used for psychotic illness for which the biological effect could be explained by a mode of action. In addition, reserpine was used as an antihypertensive medication. Because of its very broad and unspecific effect it is rarely used today for psychiatric illness or arterial hypertension.

The role of dopamine 1.13 (Fig. 1.2, Sect. 1.4) in the etiology of schizophrenia became clear with the discovery of chlorpromazine 1.22 (Fig. 1.3, ▶ Sects. 8.5 and ▶ 19.10), a substance that showed a favorable clinical effect. In contrast to the unspecific reserpine, chlorpromazine is a pure dopamine antagonist. The application of chlorpromazine and analogous tricyclic neuroleptics caused symptoms that occur in Parkinson’s disease. This was the first indication that an endogenous dopamine deficiency is the cause of that disease.

Chlordiazepoxide (Librium®, ▶ Sect. 2.7), the first tranquilizer of the group of benzodiazepines, was found by accident. Only one year after its introduction and for many years after that, the chemically closely related medication diazepam 1.23 (Valium®, Fig. 1.3) was the worldwide best-selling drug. The Rolling Stones
commemorated it in their multifaceted song “Mother’s Little Helper.” Many companies started generously funded synthesis programs, and chemists and pharmacologists applied their entire arsenal of methods. Their success justified their efforts. Substances with different modes of action resulted: further tranquilizers, sedatives, hypnotics, and even antagonists. Even today the benzodiazepines (▶ Sect. 30.5) are among the most popular and widespread medications.

The first antidepressant, iproniazid (▶ Sects. 6.7 and ▶ 27.8), was also an accidental discovery. It works by inhibiting the enzyme monoamine oxidase (▶ Sect. 27.8) and thereby the metabolism of the biogenic amines dopamine, serotonin, noradrenaline, and adrenaline. In addition to other severe side effects, the first unspecific representatives caused hypertensive crises, and when they were taken with certain foods a few fatalities occurred. Tyramine, a substance found in cheese, wine, and beer (hence the term “cheese effect”), was not adequately metabolized. This caused a life-threatening rise in noradrenaline, a hormone that raises blood pressure.

The antidepressant imipramine 1.24 (Fig. 1.3, ▶ Sect. 8.5) resulted from the synthesis of analogues of chlorpromazine. Interestingly, and despite its close structural relationship, it is not a neuroleptic but rather works in the opposite way. It blocks the transporters for noradrenaline and serotonin, and this prevents the reuptake of these neurotransmitters from the synaptic gap. Desipramine 1.25 and fluoxetine 1.26 are even more selective in that they inhibit only the noradrenaline or the serotonin transporter of nerve cells, respectively.

[Figure: structural formulas of chlorpromazine 1.22, diazepam 1.23, imipramine 1.24 (R = CH3), desipramine 1.25 (R = H), and fluoxetine 1.26]

Fig. 1.3 A revolution in the therapy of psychiatric illness was brought about by the discovery of potent neuroleptics such as chlorpromazine 1.22, tranquilizers such as diazepam 1.23, and antidepressants such as imipramine 1.24. For the first time, these compounds allowed a purposeful treatment of schizophrenia, chronic anxiety, and depression. Examples of newer antidepressants with specific modes of action on transport systems (▶ Sect. 4.6) for noradrenaline and serotonin are desipramine 1.25 and fluoxetine 1.26, respectively.

1.7 Modeling and Computer-Aided Design

An extremely capable tool is available for modeling the properties and reactions of molecules, and particularly their intermolecular interactions: the computer. In addition to processing complex numerical problems, it translates the results into color graphics, which greatly accommodates the human ability to grasp pictures faster and more easily than text or columns of numbers. That is not a surprise. Our brains process text sequentially, but pictures are comprehended in parallel. X-ray crystallography and multidimensional NMR spectroscopic techniques (▶ Chap. 13, “Experimental Methods of Structure Determination”) contribute to our understanding of molecules as much as quantum mechanical and force field calculations (▶ Chap. 15, “Molecular Modeling”).

Is molecular modeling an invention of modern times? Yes and no. Friedrich August Kekulé (1829–1896) supposedly derived his cyclic structure for benzene from a vision of a snake that circled upon itself and bit its own tail (incidentally, the snake Uroborus is an age-old alchemist symbol). This now-famous dream may, however, be traced to a memory of the book Constitutionsformeln der Organischen Chemie by the Austrian schoolteacher Joseph Loschmidt (1821–1895; Fig. 1.4). Loschmidt would surely take pleasure in contemplating today’s pictures of models, which are quite similar to his own.

More and more today we place the three-dimensional structure, the steric dimensions, and the electronic properties of molecules in the foreground. Advances in theoretical organic chemistry and X-ray crystallography have made this possible. The first structure-based design was carried out on hemoglobin, the red blood pigment, in the research group of Peter Goodford.
Hemoglobin’s affinity for oxygen is modulated by so-called allosteric effector molecules that bind in the core of the tetrameric protein. From the three-dimensional structure Goodford derived simple dialdehydes and their bisulfite addition products. These substances bind to hemoglobin in the predicted way and shift the oxygen-binding curve in the expected direction.

[Figure: structural formulas, not reproduced here]

Fig. 1.4 Loschmidt’s book Constitutionsformeln der Organischen Chemie (1861) contains structures that anticipate both the formulation of the benzene ring and the modern modeling structure. Kekulé must have known about this book because he disparaged it in a letter to Emil Erlenmeyer in January 1862, referring to it as “Confusionsformeln.” Loschmidt did not become famous for his book, but rather because he carried out an experiment in 1865 that determined the number of molecules in a mole to be 6.02 × 10²³, a constant that was later to be named after him.

The first drug developed by using a structure-based approach is the antihypertensive agent captopril, an angiotensin-converting enzyme (ACE) inhibitor (▶ Sect. 25.4). Although the lead structure came from a snake venom, the decisive breakthrough was made after modeling the binding site. For this, the binding site of carboxypeptidase, another zinc protease, was used because its three-dimensional structure was known at the time.

The road to a new drug is difficult and tedious. An overview of the interplay between the different methods and disciplines from a modern point of view is illustrated in the scheme in Fig. 1.5. In the last few years molecular modeling (▶ Chap. 15, “Molecular Modeling”) and particularly the modeling of ligand–receptor interactions (▶ Chap. 4, “Protein–Ligand Interactions as the Basis for Drug Action”) have gained importance. Although modeling is employed predominantly for the targeted structure modification of lead compounds, it is also suitable for the structure-based and computer-aided design of drugs (▶ Chap. 20, “Protein Modeling and Structure-Based Drug Design”) and lead structure discovery (▶ Sect. 7.6). Examples of these approaches are given in ▶ Chaps. 23, “Inhibitors of Hydrolases with an Acyl–Enzyme Intermediate”; ▶ 24, “Aspartic Protease Inhibitors”; ▶ 25, “Inhibitors of Hydrolyzing Metalloenzymes”; ▶ 26, “Transferase Inhibitors”; ▶ 27, “Oxidoreductase Inhibitors”; ▶ 28, “Agonists and Antagonists of Nuclear Receptors”; ▶ 29, “Agonists and Antagonists of Membrane-Bound Receptors”; ▶ 30, “Ligands for Channels, Pores, and Transporters”; ▶ 31, “Ligands for Surface Receptors”; ▶ 32, “Biologicals: Peptides, Proteins, Nucleotides, and Macrolides as Drugs”.

In addition to modeling and computer-aided design, structure–activity relationship analysis (▶ Chap. 18, “Quantitative Structure–Activity Relationships”) has contributed to the understanding of the correlation between the chemical structure of compounds and their biological effects. By using these methods, the influence of lipophilic, electronic, and steric factors on the variation of the biological activity, transport, and distribution of drugs in biological systems could be systematized for the first time on statistically significant foundations.
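The quantitative treatment of lipophilic, electronic, and steric factors mentioned in this section is commonly written as a linear free-energy relationship. The section itself does not give a formula; the Hansch-type form below and the coefficient names k_1 to k_5 are assumptions chosen here for illustration only.

```latex
\log\frac{1}{C} \;=\; -k_1\,(\log P)^2 \;+\; k_2\,\log P \;+\; k_3\,\sigma \;+\; k_4\,E_s \;+\; k_5
```

Here C is the molar concentration that produces a defined biological effect, log P is the logarithm of the octanol–water partition coefficient (lipophilic term), σ is the Hammett substituent constant (electronic term), and E_s is the Taft steric parameter (steric term). The coefficients are obtained by multiple linear regression over a series of analogues, which is what places such relationships on a statistical footing. The quadratic term in log P allows for an optimum lipophilicity for transport and distribution.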
1.8 The Results of Drug Research and the Drug Market

The development of different methods in drug research has already been described in the last section. Table 1.1 gives a short historical overview of the most prominent results.

[Figure: flow scheme. Routes to lead structures: identification of a biological target, proof of principle, molecular test system; literature, patents, competitor products (“me too” research); biological concept, clinical side effects; screening of natural products, synthetics, peptides, combinatorial chemistry. Design cycle: experimental design, synthetic design; synthesis; biological testing; structure–activity relationships, QSAR, molecular modeling; computer-aided design (protein crystallography, NMR, 3D database searches, de novo design). Outcome: candidate for further development; developmental substance; drug; formulation.]

Fig. 1.5 The way to a drug is long. The upper part of the figure shows routes to lead structures. The middle part describes the design cycle, which in practically all cases must be run through repeatedly. Each of these phases is described in detail in the following chapters. Iterative optimization yields candidates for further development, such as preclinical and toxicological studies. It is from these studies that the actual candidates are selected. Formulation, clinical trials, and registration then lead to a new medicine. The last phases are not presented in this book.
  • 34.
    Table 1.1 Importantmilestones in drug research Year Substance Indication/Mode of action 1806 Morphine Hypnotic 1875 Salicylic acid Anti-inflammatory 1884 Cocaine Stimulant, local anesthetic 1888 Phenacetin Analgetic and antipyretic 1889 Acetylsalicylic acid Analgetic and antipyretic 1903 Barbiturate Sedative 1909 Arsphenamin Anti-syphilitic 1921 Procaine Local anesthetic 1922 Insulin Antidiabetic 1928 Estrone Female sex hormone 1928 Penicillin Antibiotic 1935 Sulfamidochrysoidine Bacteriostatic 1944 Streptomycin Antibiotic 1945 Chloroquine Antimalarial 1952 Chlorpromazine Neuroleptic 1956 Tolbutamide Oral antidiabetic 1960 Chlordiazepoxide Tranquilizer 1962 Verapamil Calcium channel blocker 1963 Propranolol Antihypertensive (b-blocker) 1964 Furosemide Diuretic 1971 L-DOPA Parkinson’s disease 1973 Tamoxifen Breast cancer (estrogen receptor antagonist) 1975 Nifedipine Calcium channel blocker 1976 Cimetidine Gastrointestinal ulcer (H2 blocker) 1981 Captopril Antihypertensive (ACE inhibitor) 1981 Ranitidine Gastrointestinal ulcer (H2 blocker) 1983 Ciclosporin A Immunosuppressant 1984 Enalapril Antihypertensive (ACE inhibitor) 1985 Mefloquine Antimalarial 1986 Fluoxetine Antidepressant (5-HT-transport inhibitor) 1987 Artemisinin Antimalarial 1987 Lovastatin Cholesterol biosynthesis inhibitor 1988 Omeprazole Gastrointestinal ulcer (H+ /K+ -ATPase inhibitor) 1990 Ondansetron Antiemetic (5-HT3 blocker) 1991 Sumatriptan Migraine (5-HT1B,D agonist) 1993 Risperidone Antipsychotic (D2/5-HT2-blocker) 1994 Famciclovir Antiviral/herpes (DNA polymerase inhibitor) 1995 Losartan Arterial hypertension (ATII antagonist) 1995 Dorzolamide Glaucoma (carboanhydrase inhibitor) 1996 Saquinavir HIV protease inhibitor 1996 Ritonavir HIV protease inhibitor 1996 Indinavir HIV Protease inhibitor 1996 Nevirapine HIV reverse transcriptase inhibitor (continued) 1.8 The Results of Drug Research and the Drug Market 17
  • 35.
    The assessment ofthe efficacy and safety of a drug has reached an extraordi- narily high standard today. To some extent this development is a bystander in our goal of finding new medicines, but it is also a hindrance. Acetylsalicylic acid (Aspirin® ) is without any doubt a valuable drug. Today this compound would have great difficulty to pass clinical trials. Acetylsalicylic acid is an irreversible enzyme inhibitor, it has relatively weak efficacy, it causes gastric bleeding in high doses, and it has a very short biological half-life. Each of these problems would be a profound argument against its continued development today. It probably would have already failed in screening. In a risk–benefit analysis however, it is better than most of the alternatives. Where is the problem? It probably lies in the analytical– deterministic mindset that dominates science, and therefore also drug research. It is often overlooked that such an approach deals with a system as complicated and complex as a human, to whom we apply a drug therapy, cannot always be ade- quately addressed by all means. Despite public healthcare systems that constitute a barrier between the supplier and the consumer, the drug market, with worldwide sales of more than US$880 billion, has strong competition. 
Two forces affect this market: the state of science Table 1.1 (continued) Year Substance Indication/Mode of action 1997 Sibutramine Obesity (uptake inhibitor) 1997 Orlistat Obesity (lipase inhibitor) 1997 Tolcapon Parkinson’s disease (COMT inhibitor) 1998 Sildenafil Erectile dysfunction (PDE5 inhibitor) 1998 Montelukast Broncholytic (leukotriene receptor antagonist) 1999 Infliximab Antirheumatic (TNFa antagonist) 2000 Celecoxib Analgesic (COX-2 inhibitor) 2000 Verteporfin Macular degeneration (photodynamic therapy) 2001 Imatinib Acute myeloid leukemia (kinase inhibitor) 2002 Boscutan Arterial hypertension (endothelin-1 receptor antagonist) 2002 Aprepitant Antiemetic (neurokinin receptor antagonist) 2003 Enfuvirtid HIV fusion inhibitor (oligopeptide) 2004 Ximelagatran Coagulation inhibitor (thrombin inhibitor) 2004 Bortezomib Multiple myeloma (proteasome inhibitor) 2005 Bevacizumab Cytostatic (angiogenese inhibitor) 2006 Natalizumab Multiple sclerosis (monoclonal antibody; integrin inhibitor) 2006 Aliskiren Antihypertensive (renin inhibitor) 2007 Maraviroc HIV fusion inhibitor (CCR5 antagonist) 2007 Sitagliptin Type-II diabetes (DPPVI inhibitor) 2008 Raltegravir HIV integrase inhibitor 2009 Rivaroxaban Oral Anticoagulant (FXa inhibitor) 2010 Mifamurtide Drug against Osteosarcoma (bone cancer) 2011 Fingolimod Immunomodulating drug (multiple sclerosis treatment) 18 1 Drug Research: Yesterday, Today, and Tomorrow
and technology and the needs of patients. A few drugs command a large portion of sales. Constantly changing “hit lists” of the best-selling drugs can be found on the internet. Because of the mergers of established pharmaceutical companies in recent years, the market has contracted to fewer, bigger companies. It is frequently the case that a single drug can make or break a company. Often only two to three drugs make up more than 50% of a large company’s sales. A historical example is Glaxo. This company made its way from the midfield to the top with ranitidine. Astra experienced a similar boom with omeprazole. Today, after the merger with Zeneca, it is among the biggest companies in this field. Sankyo also had a single drug, lovastatin, that boosted sales enormously. With its drugs sildenafil (Viagra®) and atorvastatin (Sortis®/Lipitor®), Pfizer’s profits shot to unimaginable heights. In the last few years we have seen an increasing concentration of pharmaceutical companies, so that the market is making a transition to an oligopoly dominated by multinational corporations. Keep in mind that sales giants such as GlaxoSmithKline (GSK), Novartis, Sanofi-Aventis, Bayer HealthCare, Bristol-Myers Squibb, or AstraZeneca emerged only in the last 10 years through mergers. Companies such as Pfizer and Roche have grown significantly through acquisitions. The role research plays for pharmaceutical companies is apparent when one considers that typically 15–20% of turnover is invested in this area. It is certain that the concentration of the pharmaceutical market is not complete. We can only wait and see how the landscape continues to shift and adapt at an almost annual pace.

1.9 Controversial Drugs

Drugs remain in the focal point of public interest.
Whereas for decades it was the physician alone who prescribed medication, today it is the patient, frightened by the lay press or better informed through labeling or reputable literature, who wants to take control of, or at least share in, the decision making. The issues can be illustrated by one example. Psychotropic pharmaceuticals exert an impressive effect on personality and behavior. At least since the introduction of Valium® (diazepam) these drugs have been in the media spotlight. They are invaluable for the treatment of psychiatric illness. On the other hand, the danger of misuse and addiction is particularly high. Some of these drugs are even used as self-medication, without strict adherence to the indication guidelines. Fluoxetine 1.26 (Prozac®, Fig. 1.3, Sect. 1.6) was introduced in 1988 by Eli Lilly and brought unequivocal progress in the treatment of depression. On this one medication alone there are now more than ten popular science books with controversial content. Peter Kramer’s book Listening to Prozac takes an overall sympathetic tone, asserting that depressed patients feel better and more “in harmony” with their personality after treatment with fluoxetine. This book was on the New York Times bestseller list for more than 21 weeks. Peter Breggin’s book Talking Back to Prozac polemically criticized fluoxetine, the company Eli Lilly, and the U.S. Food and Drug Administration (FDA). The side effects, risks, and particularly the addictive potential were placed in the foreground. Both books contain correct
assertions, and both books lead to the wrong conclusions. Prozac® is a valuable medicine for the treatment of clinically manifest depression; for the treatment of mundane unhappiness or as a general stimulant, however, it is a drug with many risks. To make a risk–benefit analysis of a medication, it is important to consider not only the desired effect but also the severity of the illness and the objective and subjective side effects. In oncology one accepts even severe side effects for the possibility of improving the patient’s condition. If an end-stage cancer patient is refused an effective pain therapy because of the risk of addiction, then that must be seen as malpractice. On the other hand, many people handle highly potent medications recklessly. The misuse of antibiotics, the faith in the almighty power of tranquilizers and antidepressants, or the chronic use of analgesics and laxatives do more harm than good.

1.10 Synopsis

• Drug research can be divided into several sequential phases, starting with empirical observations of the uptake of natural products from food and progressing through the development of in vitro test systems and an increasing understanding of structures and modes of action to in vivo models and gene technology.
• It all started with traditional medicines. The first prescriptions date back to the ancient Egyptians and to traditional Chinese medicine.
• Paracelsus founded scientific medical research and understood humans to be a “chemical laboratory.” For the first time, the ingredients of drugs were held responsible for their healing effects.
• With the advent of organic chemistry, the first therapeutic principles based on pure organic compounds became available. The great age of natural products from plants and their active ingredients began.
• Systematic studies on animals began in the nineteenth century and can be seen as a starting point for drug research.
In vitro models are needed to test large series of potentially active compounds, but animal models are required to correlate the data and make predictions about the therapeutic effects in humans.
• Our present life expectancy would not be possible without the successful fight against infectious diseases. The broad application of antibiotics and the spread of resistant pathogens, however, have led to situations in which the best weapons against infectious diseases are becoming increasingly blunt. Research against widespread tropical diseases has been neglected, and the currently increasing resistance to available medications represents a worldwide problem.
• The elucidation of biological concepts, pathways, and regulatory cycles by endogenous compounds has strongly stimulated drug research. Many developed drugs have arisen from structural variations of neurotransmitters, hormones, steroids, or natural substrates.
• Systematic substance testing began with the establishment of in vitro models that replaced biological testing on animals by assays in test tubes. Gene technology has made it possible to prepare sufficient amounts of pure proteins for testing.
• The discovery of neuroleptics, antidepressants, anticonvulsants, and sedatives has revolutionized the treatment of psychiatric diseases.
• Molecular modeling and computer-aided design along with structural biology give access to rational considerations on drug action. The first structure-based design project was carried out on hemoglobin, and the first drug developed by using a structure-based approach was the antihypertensive captopril.
• The assessment of drug efficacy and safety has reached an extraordinarily high standard today. The worldwide drug market, with nearly a thousand billion US dollars in sales per year, is large and highly competitive. Only a few drugs command a large portion of the sales and determine the particular dynamics in the market; the current tendency is corporate contraction to fewer and bigger companies. Often a single drug can make or break a company.
• Drugs remain in the focal point of public interest. It is no longer the physician alone who influences the prescription of medication; multiple sources of information have an impact and inform the patient. A proper risk–benefit analysis of a medication, taking into consideration not only the desired therapeutic effect but also the severity of an illness, is needed.

Bibliography

General Literature

Barondes SH (1993) Molecules and mental illness. Scientific American Library. W. H. Freeman and Company, New York
Beddell CR (ed) (1992) The design of drugs to macromolecular targets. Wiley, Chichester
Fischer D, Breitenbach J (eds) (2003) Die Pharmaindustrie. Spektrum Akademischer Verlag, Heidelberg/Berlin
Friedrich C, Müller-Jahncke W-D (2005) Von der Frühen Neuzeit bis zur Gegenwart, vol 2. GOVI-Verlag, Eschborn
Higby G (ed) (1997) The inside story of medicine. A Symposium. Madison, WI
Herrmann EC, Franke R (eds) (1995) Computer-aided drug design in industrial research, Ernst Schering research foundation workshop 15. Springer, Berlin
Müller K (ed) (1995) De Novo Design, Persp. Drug Discov. Design, vol 3. Escom, Leiden
Müller-Jahncke W-D, Friedrich C (2005) Arzneimittelgeschichte. Wissenschaftliche Verlagsgesellschaft, Stuttgart
Perun TJ, Propst CL (eds) (1989) Computer-aided drug design. Methods and applications. Marcel Dekker, New York
Porter R, Teich M (eds) (1995) Drugs and narcotics in history. Cambridge
Restak RM (1994) Receptors. Bantam Books, New York
Schmitz R (1998) Geschichte der Pharmazie, vol 1. GOVI-Verlag, Eschborn
Verband Forschender Arzneimittelhersteller e.V. (2009) http://www.vfa.de/de/presse/statcharts/arzneimittelmarkt/. Accessed 22 Nov 2011
Werth B (1994) The Billion-Dollar Molecule. One Company’s Quest for the Perfect Drug. Touchstone, New York

Special Literature

Beddell CR, Goodford PJ, Norrington FE et al (1976) Compounds designed to fit a site of known structure in human hemoglobin. Br J Pharmac 57:201–209
Breggin PR, Breggin GR (1994) Talking back to prozac. St. Martin’s Press, New York
Kramer P (1993) Listening to prozac. Viking, New York
Mutschler E (1987) Arzneimittel – Erfolge, Misserfolge, Hoffnungen. Deutsche Apoth-Ztg 127:2025–2033
Newman DJ, Cragg GM (2007) Natural products as sources of new drugs over the last 25 years. J Nat Prod 70:461–477
Noe CR, Bader A (1993) Facts are better than dreams. Chem Brit 29:126–128, Kekulés and Loschmidts Formeln
2 In the Beginning, There Was Serendipity

“A lucky accident dropped the medicine into our hands”; this is how a publication by Arnold Cahn and Paul Hepp in the Centralblatt für Klinische Medizin of August 14, 1886, began. The history of drug research is punctuated by lucky accidents. As a general rule, detailed knowledge of biological systems was absent. So it is not surprising that the working hypotheses were often wrong and that the results obtained differed from the expectations. The accidental success receded into the background over time. Today happenstance as a strategy has been replaced by the arduous and ambitious goal of preparing drugs by a straightforward approach. The only exception is the shotgun-style testing of large and diverse chemical compound libraries, including microbial and plant extracts, that is done with the goal of finding new lead structures. In this case, serendipity is desired in order to find as large and diverse a palette of lead structures as possible (▶ Chaps. 6, “The Classical Search for Lead Structures” and ▶ 7, “Screening Technologies for Lead Structure Discovery”) with potential for further optimization (▶ Chaps. 8, “Optimization of Lead Structures” and ▶ 9, “Designing Prodrugs”).

2.1 Acetanilide Instead of Naphthalene: A New, Valuable Antipyretic

Back to Cahn and Hepp. What happened? There are several legends about this lucky accident. The most plausible version is that the antipyretic effect of naphthalene, which was widely available from coal tar, was being tested. The substance indeed showed fever-lowering qualities. The responsible substance, however, was not naphthalene but something entirely different: acetanilide 2.1 (Fig. 2.1). Further experiments confirmed the efficacy. Shortly thereafter, the company Kalle & Co. introduced it to the market under the name “Antifebrin.” Phenacetin 2.2 (Fig. 2.1) was subsequently developed by a targeted approach.
G. Klebe, Drug Design, DOI 10.1007/978-3-642-17907-5_2, © Springer-Verlag Berlin Heidelberg 2013

At the time, Bayer in Elberfeld had 30 t of p-nitrophenol, a side product from dye production, on its waste heap. The then 25-year-old Carl Duisberg, who later became the chairman of Bayer Farbenfabriken AG and who also took a leading
role in the foundation of I.G. Farbenindustrie in 1924, wanted to use it for the preparation of an acetanilide derivative, as it could easily be reduced to p-aminophenol. The known toxicity of phenol groups led to the design of p-ethoxyacetanilide 2.2 (phenacetin), which actually did have the desired qualities and served as an analgesic for headaches and as an antipyretic for a century. Unfortunately its metabolite 2.4, which still contains the ethoxy group, leads to the production of methemoglobin, an oxidized form of the red blood pigment that is incapable of carrying oxygen. Furthermore, chronic misuse, for instance taking kilogram quantities of phenacetin over a lifetime, leads to kidney damage. Paradoxically, the main metabolite of phenacetin, p-hydroxyacetanilide 2.5 (Fig. 2.1; acetaminophen in American English, paracetamol in British English), is actually responsible for the effect, and it is less toxic and better tolerated. In the USA alone, paracetamol achieved more than US$1.3 billion in annual sales. This is even more than for acetylsalicylic acid.

Fig. 2.1 By starting with the accidentally discovered acetanilide 2.1, Carl Duisberg planned the synthesis of phenacetin 2.2 from p-nitrophenol 2.3. In contrast to the toxic metabolite 2.4, the main metabolite, paracetamol (Amer. acetaminophen) 2.5, is well tolerated.

2.2 Anesthetics and Sedatives: Pure Accidental Discovery

In 1799 Humphry Davy (1778–1829) discovered the euphoric effect of nitrous oxide (N2O), which was appropriately named “laughing gas.” In 1844 the dentist Horace Wells (1815–1848) saw a traveling theater production of a “sniffing party” with N2O in which a participant suffered a flesh wound, apparently without pain. To test this effect, Wells had one of his own teeth extracted, also without pain. He then repeated the procedure on many people, with success. A public demonstration went
wrong though, and this drove him to suicide four years later. The same effect was observed in 1842 by Crawford W. Long (1815–1878) with ether, but he did not report it immediately. After administering ether, he was able to remove a tumor from the neck of a volunteer. William T. Morton (1819–1868) successfully carried out the first ether anesthesia in the same hospital as Wells. Starting in 1847, chloroform was used as an anesthetic. A few years later anesthesia became standard for surgical procedures, a real blessing for suffering humanity.

In 1868 Oskar Liebreich (1839–1908) wanted to develop a depot form of chloroform 2.6. Because chloral hydrate 2.7 can be cleaved with base in an aqueous milieu, he hoped that this could also happen in the body. Chloral hydrate is in fact a sedative, but this is because of its active metabolite, trichloroethanol 2.8 (Fig. 2.2), and not because it releases chloroform. In 1885 Oswald Schmiedeberg (1838–1921) tested urethane 2.9 (ethylcarbamate, Fig. 2.3) because he thought that it would release ethanol in the organism. Urethane itself is the active agent. Its optimization later led to isoamylcarbamate 2.10 (Hedonal®, 1899). Based on this, open and cyclic carbamates and ureas were investigated. In 1903 the first barbiturate sedative, barbital 2.11 (Veronal®, Fig. 2.3), resulted. In the decades that followed, a wealth of better-tolerated barbiturates with a broader pharmacokinetic spectrum was introduced.

Fig. 2.2 The anesthetic chloroform 2.6 is formed upon treatment of chloral hydrate 2.7 with base. This reaction does not work in vivo, however. The active metabolite of 2.7 is trichloroethanol 2.8.

Fig. 2.3 The hypothetical “prodrug” of ethanol, urethane 2.9 (R = CH2CH3), led to the development of isoamylcarbamate 2.10 (R = CH2CH2CH(CH3)2), which in turn led to the first barbiturate, barbital 2.11.

2.3 Fruitful Synergies: Dyes and Pharmaceuticals

Dyes and pharmaceuticals have stimulated each other. The first synthetic dye was the result of a failed drug synthesis. In 1856 August Wilhelm von Hofmann assigned the task of synthesizing quinine, an alkaloid used for treating malaria (▶ Sects. 1.1 and ▶ 3.2), to the then 17-year-old William Henry Perkin (1838–1907); by starting
with only the molecular formula, it was anticipated that the oxidation of an allyl-substituted toluidine would deliver the desired product. Now that the structural formula is known, we understand that this could not possibly have worked! Upon oxidation of aniline that was contaminated with o- and p-toluidine, Perkin isolated a dark precipitate. It contained a dye, mauveine 2.12 (Fig. 2.4), that colored silks a brilliant mauve. Other dyes were prepared in rapid succession. The development and later proliferation of the dye industry in England and Germany in the second half of the nineteenth century can be traced back to this accidental discovery.

Toward the end of the nineteenth century, increasing competition and a difficult economic situation in the dye market prompted, as a reaction, the expansion into industrial pharmaceutical research. In 1896 a pharmaceutical research laboratory was founded in the 33-year-old Bayer Farbenfabrik. At that time innumerable synthetic dyes were known; it is therefore not surprising that these substances were tested for pharmacological effects.

Of all people, wine adulterators played an important role in the discovery of the first synthetic laxative. To stop people from selling pomace wine (Tresterwein, so-called Nachwein) as natural wine (Naturwein), the dye phenolphthalein was added from 1900 as an easily detectable indicator. The Hungarian pharmacologist Zoltán von Vámossy (1868–1953) investigated the effects of this compound. Back then, the conventions of pharmacologists were still rather primitive. The intravenous application of 0.01–0.03 g to rabbits caused death “with loud shrieking, convulsions, and paralysis”. Vámossy then decided to feed 1–2 g to a rabbit and 5 g to a 4 kg lap dog. Because these oral doses were all well tolerated, Vámossy took 1.5 g of phenolphthalein himself, and a friend took 1.0 g. The effects were explosive: rumbling in the bowels, diarrhea, and for two additional days loose stools.
It was later established that 150–200 mg would have been a therapeutic dose.

Fig. 2.4 An unsuccessful quinine synthesis founded the dye industry. The structures of many organic compounds were still entirely unknown in the middle of the nineteenth century. The attempt to prepare quinine via a simple route (upper reaction: 2 C10H13N (allyltoluidine) + 3 [O] → C20H24N2O2 (quinine) + H2O) could not have worked. The oxidation of an impure aniline (lower reaction) gave mauveine 2.12 (R = H or o-, p-methyl) in 1856, which was used to dye silk a brilliant mauve color. It was the first synthetic dye!
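The remark that Perkin started "with only the molecular formula" can be made concrete with a small atom count. The following Python sketch is not part of the original text: the helper names `parse_formula` and `side_total` are illustrative, and the parser assumes simple formulas without brackets or charges. It checks the hypothesised equation from Fig. 2.4, 2 C10H13N + 3 [O] → C20H24N2O2 + H2O, and shows that the atoms balance even though the reaction could never deliver quinine; a balanced sum formula says nothing about how the atoms are connected.

```python
import re
from collections import Counter


def parse_formula(formula: str) -> Counter:
    """Count atoms in a simple molecular formula such as 'C10H13N'.

    Assumption: no brackets, charges, or hydrates; just element symbols
    followed by optional integer counts.
    """
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def side_total(terms) -> Counter:
    """Sum atom counts over (coefficient, formula) pairs for one side."""
    total = Counter()
    for coefficient, formula in terms:
        for element, n in parse_formula(formula).items():
            total[element] += coefficient * n
    return total


# Hypothesised route of Fig. 2.4: 2 C10H13N + 3 [O] -> C20H24N2O2 + H2O
reactants = [(2, "C10H13N"), (3, "O")]
products = [(1, "C20H24N2O2"), (1, "H2O")]

# True when every element occurs equally often on both sides
balanced = side_total(reactants) == side_total(products)

print(dict(side_total(reactants)))
print("balanced:", balanced)
```

Both sides come to C 20, H 26, N 2, O 3, so the equation looks perfectly reasonable on paper. The failure only becomes visible once the structural formulas are known, which is the point the text makes.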
An entire range of antibacterial and antiparasitic dyes is based on the work of Robert Koch (1843–1910). He showed that bacteria and parasites accumulate dyes specifically. Based on this, Paul Ehrlich (1854–1915) hoped to kill pathogens selectively with suitably chosen dyes. In 1891 he cured two mild cases of malaria by treating the patients with methylene blue. In the following years he tested hundreds of different dyes, and thousands more analogues were later synthesized in the laboratories of Bayer and Hoechst. In 1909 Paul Ehrlich pursued a rational design when he exchanged both nitrogen atoms of the -N=N- group of an azo dye for arsenic atoms. Arsphenamine 2.14 (Salvarsan®, Fig. 2.5) was the first effective compound to treat syphilis, and the first chemotherapeutic. It became an extraordinary economic success for the company Hoechst.

Fig. 2.5 The laxative effect of phenolphthalein 2.13 became apparent while testing it as an additive for cheap wines. The antisyphilis compound arsphenamine 2.14 (Salvarsan®, here shown as the monomer, dihydrochloride) is simply an azo dye in which the -N=N- group was exchanged for an -As=As- group.

Fig. 2.6 The red azo dye sulfamidochrysoidine 2.15 is effective only after cleavage to the colorless sulfanilamide 2.16, which is a bacterial antimetabolite of p-aminobenzoic acid 2.17.

The breakthrough with chemotherapeutics was made by the physician Gerhard Domagk (1895–1964). At the age of 31, he took over the newly formed department of experimental pathology at Bayer in Elberfeld. Azo dyes bearing sulfonamide groups had already been designed by the chemists Fritz Mietzsch and Josef Klarer, but they showed no in vitro activity; Domagk tested these substances in streptococci-infected mice. By using this model, he found the first active substances in 1932. Sulfamidochrysoidine 2.15 (Prontosil®, Fig. 2.6), a dark-red dye that could
treat even severe streptococci infections, resulted in 1935. The sulfonamides became world famous a year later when the son of the US president, Franklin D. Roosevelt, Jr., was treated with one to cure a severe sinus infection. But even here a false hypothesis led to success. It was not the azo dye itself, but rather its metabolite, sulfanilamide 2.16, that was effective. Sulfanilamide replaces p-aminobenzoic acid 2.17 (Fig. 2.6), which is needed for the bacterial synthesis of an enzymatic cofactor, dihydrofolic acid.

2.4 Fungi Kill Bacteria and Help with Syntheses

The discovery of the antibiotic effect of Penicillium notatum by Alexander Fleming (1881–1955) in 1928 is the most famous example of a serendipitous discovery. Fleming noticed that a spoiled staphylococcus culture had been contaminated with a fungus. In the area around the fungus, no bacteria could grow. Further investigations showed that this fungus could also curb other bacteria. Fleming called the still-unknown agent penicillin. It was not until 1940 that it was isolated and characterized by Ernst Boris Chain (1906–1979) and Howard Florey (1898–1968). In 1941 an English policeman was the first patient to be treated with penicillin. Despite a temporary improvement, and even though penicillin could be recovered from his urine, he died after a few days because no more penicillin was available for his continued therapy. The fungus Penicillium chrysogenum, which produces more penicillin than Penicillium notatum and is easier to cultivate, was isolated from a moldy melon in Illinois.

The tedious route to the structural elucidation of penicillin and the successful work to systematically vary its structure are scientific masterworks of the first order. Even more difficult problems had to be overcome to optimize its biotechnological mass production. Today modified penicillins 2.18 and cephalosporins 2.19 (Fig. 2.7) are available; they make up a broad range of antibiotics with outstanding bioavailability. The newer analogues have a broader spectrum of activity against many pathogens and are distinguished by a generally improved stability to the penicillin-degrading enzyme β-lactamase.

Fig. 2.7 Fleming’s accidental discovery of the antibiotic effects of a fungus has delivered a wide palette of penicillins 2.18 and cephalosporins 2.19, each with different R groups.

Fleming was a researcher to whom Pasteur’s thesis “chance favors the prepared mind” fully applies. One day in 1921, while working in his laboratory with a cold, he tried a rather headstrong experiment. He added a drop of his own nasal mucus to a bacterial culture and found a few days later that the bacteria had been
killed. This “experiment” led to the discovery of lysozyme, an enzyme that hydrolyzes the bacterial cell wall. Unfortunately it is unsuitable as a therapy because it does not attack most human pathogens.

Chance and a fungus played an important role in the industrial synthesis of corticosteroids. An important step in the synthesis is the introduction of an oxygen atom at a particular position in the steroid scaffold, position 11. In 1952 chemists at the Upjohn company were searching for a soil bacterium that could hydroxylate a steroid in this position. Just when they finally decided to set an agar plate on the windowsill of the laboratory, Rhizopus arrhizus landed exactly there. This fungus transforms progesterone (▶ Sect. 28.5) into 11α-hydroxyprogesterone. With its help the yield could be increased to 50%. The closely related fungus Rhizopus nigricans even afforded 90% of the desired product.

2.5 The Discovery of the Hallucinogenic Effect of LSD

In the 1930s Albert Hofmann (1906–2008) was working on the partial synthesis of ergoline alkaloids at Sandoz. In 1938 he wanted to find a way to transfer the respiratory and cardiovascular stimulatory effect of N,N-diethyl nicotinamide 2.20 onto this class of compounds. In analogy to 2.20, he prepared N,N-diethyl lysergamide 2.21 (Fig. 2.8) in the hope of retaining the stimulatory circulatory and respiratory effects. Apart from the observation that the experimental animals became agitated under anesthesia, the substances showed no particular effect. Therefore they were not pursued at first.

Fig. 2.8 N,N-Diethyl nicotinamide 2.20 is a centrally active derivative of nicotinic acid. Hofmann wanted to synthesize a general stimulant analogously by preparing the N,N-diethyl amide of lysergic acid. The result was the hallucinogen lysergic acid diethyl amide 2.21 (LSD).

Hofmann prepared the substances a second time five years later because he wanted to investigate them more thoroughly. During purification and recrystallization he reported feeling “a strange agitation combined with a slight dizziness.” At home he fell into “a not-unpleasant inebriated condition that was characterized by extremely animated fantasies . . . after about 2 hours, the condition went away.” Hofmann suspected a connection to the compounds he had prepared and conducted a self-experiment with 0.25 mg a few days later. That was the smallest dose with which he expected to see an effect. The outcome was dramatic: the experience was the same as the first time, but much more intense. He had a technician accompany him home on his bicycle. During the ride, his condition
took on a threatening form, and he fell into a severe crisis dominated by dizziness and anxiety. The world took on a grotesque form. Later it was determined that 0.02–0.1 mg is enough to cause hallucinations. The substance was temporarily marketed as Delysid® for use in psychotherapy and to treat anxiety and compulsive disorders.

2.6 The Synthetic Route Determines the Structure

Fig. 2.9 Ferdinand Dengel, a chemist at the former Knoll AG, wanted to prepare a cardiovascular therapeutic by alkylating a nitrile. To avoid a double substitution, he started with the sterically demanding isopropyl group. The result was the first calcium channel blocker, verapamil 2.22. The isopropyl group is the optimal alkyl group because it stabilizes the biologically active conformation. The synthetic route played an important role in the development of the second calcium channel blocker, nifedipine 2.23. In 1948, Friedrich Bosser at Bayer was given the task of finding new substances that dilate the coronary arteries. After years of work, in 1964 he turned to the easily prepared dihydropyridines, which surprisingly displayed the desired effects. In this case, the space-filling nitro group promotes the biologically active conformation (▶ Sect. 17.9).

The structure of the first calcium channel blocker, verapamil 2.22, was determined by its synthesis (Fig. 2.9). Verapamil counteracts the effects of β-adrenergic agonists, but it is not a β-blocker. It was only after its introduction to the market
that Albrecht Fleckenstein clarified its mode of action: it blocks the voltage-dependent inward flow of calcium ions through the calcium channels (▶ Sect. 30.1) in heart and endothelial cells. The hypotensive effect was initially seen as a side effect, but in the following years it became the most important reason for use.

The second group of therapeutically important calcium channel blockers, represented by nifedipine 2.23, was inspired by a synthetic principle. It was a reaction from 1882, the Hantzsch synthesis of dihydropyridines (Fig. 2.9). Remarkably, the pharmacological experiments on nifedipine had to be carried out in a darkened room because of its photosensitivity. It is all the more remarkable that it was developed into a medicine despite this characteristic.

2.7 Surprising Rearrangements Lead to Medicines

Leo Sternbach (1908–2005), a chemist at Hoffmann-La Roche, was involved in a program in the mid-1950s to find structurally novel tranquilizers. Sternbach remembered a synthetic program on pigments from a decade before in which N-oxide 2.24 (Fig. 2.10) had also been prepared. Its reaction with secondary amines delivered the expected products, which were pharmacologically entirely uninteresting. The work had practically ended in 1957, and the laboratory was being cleaned up when it was noticed that a crystalline base and its hydrochloride salt had precipitated from a solution. The substance was the product of a reaction between N-oxide 2.24 and methylamine, but it had never been tested because of other priorities. The subsequent pharmacological testing convincingly showed outstanding qualities. It was only later established that an unexpected ring rearrangement reaction had occurred to afford chlordiazepoxide 2.25 (Librium®, Fig. 2.10).

Fig. 2.10 Treatment of 2.24 with methylamine delivers the rearrangement product chlordiazepoxide 2.25 (Librium®) instead of the expected one. This first test compound became the first of the benzodiazepine class to be marketed.

There are other examples of this sort. In 1974 W. Berney was working on spirodihydronaphthalenes 2.26 (Fig. 2.11) with the goal of preparing CNS-active substances. Upon acid treatment, he obtained a compound that proved highly potent in vitro and in vivo against a series of human-pathogenic fungi in a routine broad screening at the Sandoz Research Institute in Vienna. In 1985 the substance was
introduced as naftifine 2.27, and later a more potent analogue, terbinafine 2.28 (Fig. 2.11), followed. Both substances showed a previously unknown mode of action. They damage the membrane of fungi by blocking ergosterol biosynthesis. This happens at a very early step, through inhibition of the enzyme squalene epoxidase.

Fig. 2.11 Instead of CNS activity, naftifine 2.27, prepared from spiro-compound 2.26, shows antimycotic activity. A comparison with the more potent terbinafine 2.28 shows that the phenyl group can advantageously be replaced with a tert-butylethynyl group.

2.8 A Long List of Accidents

The list of accidental discoveries, of which only a few are described here, could be extended indefinitely. A few more examples are briefly mentioned without chemical formulae.
• Pethidine (▶ Sect. 3.3), the first fully synthetic opiate analgesic, was synthesized in the 1930s as part of a research program on antispasmodics that started from atropine.
• The suitability of antihistamines for the prevention of motion sickness was discovered in Boston during the treatment of a skin rash. A patient reported that her motion sickness, which always occurred when riding a Boston street car, went away. The “clinical trial” was carried out in 1947 on hundreds of sailors on the transatlantic voyage of the USNS General Ballou.
• Haloperidol (▶ Sect. 3.3) was meant to be an analgesic; it turned out to be a neuroleptic.
• Imipramine is structurally very similar to the neuroleptic chlorpromazine (▶ Sects. 1.6 and ▶ 8.5). Nonetheless it has the opposite effect and is an antidepressant.
• Phenylbutazone was meant to be an additive used to dissolve the anti-inflammatory aminophenazone. The substance turned out to be an anti-inflammatory agent itself, as did its metabolite, oxyphenbutazone.
• An attempt to isolate the causative agent of bipolar disorder from the urine of patients afforded only uric acid. Because uric acid is poorly soluble, lithium urate was tested. This led to the discovery of the antidepressant effect of lithium salts.
• Clonidine was meant to be a local treatment for the runny nose that accompanies the common cold. Instead of the expected effect, a profound hypotensive effect was surprisingly found. Despite intensive structural variations, none of clonidine’s analogues has surpassed its potency.
• Levamisole was developed as a broad-spectrum anthelmintic (anti-worm agent). Instead, an immunomodulatory effect was accidentally found that now stands in the therapeutic foreground.
• Praziquantel was originally meant to be an antidepressant. Because of its high polarity, it cannot cross the blood–brain barrier. An outstanding suitability for the treatment of the tropical disease bilharziosis was found through broad biological testing.
• A chemist at Searle who was working on dipeptides licked his fingers while flipping through the pages of a book. The sweet taste that he noticed turned out to be caused by the artificial sweetener aspartame. Saccharin was also found in a very similar way. In the case of cyclamate, a smoker noticed a sweet taste to his cigarettes.
• Even today, when one would think that rational concepts dominate drug research, the lucky accident still helps to make “blockbusters.” In the pursuit of a phosphodiesterase inhibitor to hinder the degradation of cyclic guanosine monophosphate (cGMP), an improved treatment for angina pectoris was not found (▶ Sect. 25.8). Instead it became conspicuous that the male subjects in the clinical trial did not want to give up the substance. After the side effect of a stronger penile erection was recognized, the side effect became the main effect. The compound sildenafil was marketed for the treatment of erectile dysfunction as Viagra®, and developed into a billion-dollar product.
2.9 Where Would We Be Without Serendipity?
In the English-speaking world, a word is in use that is difficult to translate into other languages: serendipity. This term, as an expression of a lucky accident, was coined by Sir Horace Walpole in 1754. It is derived from a Persian fairytale in which three princes of Serendip (earlier Ceylon, today Sri Lanka) have accidental and unexpected luck and make interesting discoveries, entirely analogous to the many examples in this chapter. Serendipity has played an exceedingly important role in science in general, and especially in drug research. How would our modern supply of medicines look without all of these lucky accidents? By no means should an arbitrary approach be taken, nor should an accidental discovery be counted upon. On the contrary, chemists and pharmacologists have always developed concrete ideas as to how and why particular structural variations on a lead compound should be
pursued. Some of these hypotheses were correct, and others were false. What the successful researchers always had in common was that when a hypothesis failed, or an unexpected result was found, they recognized the potential consequences of the result, drew the correct conclusions, and did the right things. The following chapters will show numerous examples of successful targeted drug design in cases in which the correct working hypothesis was realized. The search for a new active substance is, however, not a process that can be pushed through by a purely technically oriented management. As a general rule, short-term planning and bureaucratic control have only negative consequences. On the other hand, the search for new medicines requires a concerted effort from many different groups of specialists, who must work together in a suitable organizational structure. The subsequent preclinical and clinical development of a newly found active substance is an extremely expensive and time-consuming process that must be carefully planned, carried out, and controlled. This requires other instruments than those used for drug discovery.
2.10 Synopsis
• The history of early drug research is full of lucky accidents. Many active principles of substances were discovered by serendipity, but mostly success can be attributed to an outstanding researcher with a “prepared mind” who observed important effects.
• Dyes and pharmaceuticals, both developed in the early stages of the emerging chemical industry, stimulated each other in very fruitful synergies.
• The discovery by Alexander Fleming of the first antibiotic principle, the penicillins, as a defense mechanism of a fungus against bacteria, is one of the most famous examples of a serendipitous discovery.
• The partial synthesis of ergoline alkaloids led to the discovery of the hallucinogenic effects of LSD.
In those days, researchers frequently conducted self-experiments to first test active principles in humans.
• Unexpected synthetic products, surprising structural rearrangements, and initially false working hypotheses produced new, pharmacologically interesting substances with surprising or outstanding qualities.
• Even today, when rational concepts and the understanding of mode of action dominate drug research, the lucky accident can still help to make “blockbusters,” as proven recently by the example of sildenafil (Viagra®).
Bibliography
Primary Literature
Ban TA (2006) The role of serendipity in drug discovery. Dialogues Clin Neurosci 8:335–344
Burger A (1983) A guide to the chemical basis of drug design. Wiley, New York
de Stevens G (1986) Serendipity and structured research in drug discovery. Fortschr Arzneimittelforsch 30:189–203
Kubinyi H (1999) Chance favors the prepared mind. From serendipity to rational drug design. J Receptor Signal Transd Res 19:15–39
Restak RM (1994) Receptors. Bantam Books, New York
Roberts RM (1989) Serendipity. Accidental discoveries in science. Wiley, New York
Sneader W (1990) Chronology of drug introductions. In: Hansch C, Sammes PG, Taylor JB (eds) Comprehensive medicinal chemistry, vol 1, Kennewell PD (ed). Pergamon Press, Oxford, pp 7–80
Secondary Literature
Cahn A, Hepp P (1886) Das Antifebrin, ein neues Fiebermittel. Centralblatt für Klinische Medizin 7:561–564
Hofmann A (1993) LSD – mein Sorgenkind, dtv/Klett-Cotta
Sternbach LH (1978) The Benzodiazepine story. Fortschr Arzneimittelforsch 22:229–266
Stütz A (1987) Allylamine derivatives – a new class of active substances in antifungal chemotherapy. Angew Chem Int Ed 26:320–328
von Vámossy Z (1900) Ist Phenolphthalein ein unschädliches Mittel zum Kenntlichmachen von Tresterweinen? Chemiker-Zeitung 24:679–680
3 Classical Drug Research
The hundred years of pharmaceutical research from 1880 to 1980 were punctuated by trial and error, but also by elegant ideas and their translation into therapeutically valuable principles. Many lead structures were found by accident (see ▶ Chap. 2, “In the Beginning, There Was Serendipity”); others came from traditional medicines or from biochemical concepts. In contrast to modern drug research, classical design was the result of rather limited knowledge of the pathophysiology and the cellular and molecular etiology of disease, and was restricted to animal experiments. Nonetheless, this phase, and particularly its last 50 years, was exceptionally successful. The targeted fight against infectious diseases and the successful treatment of many psychiatric and other important diseases can be attributed to this period in drug development. With this came a significant increase in quality of life and life expectancy. In the following sections, selected examples are used to demonstrate different aspects of classical pharmaceutical research.
3.1 Aspirin: A Never-Ending Story
The history of acetylsalicylic acid (ASA, Aspirin®) reflects the progress of pharmaceutical research like no other example. This is especially true for the elucidation of the mode of action, and the newly found targeted therapies that resulted. Willow bark extracts have been used since antiquity for the treatment of inflammation. When Napoleon marched across Europe between 1806 and 1813, the bark was even used as a substitute for cinchona bark (Sect. 3.2). Salicin 3.1, a glucoside of the o-hydroxybenzyl alcohol saligenin, is responsible for the effect. Upon hydrolysis and oxidation, the actual active compound, salicylic acid 3.2 (Fig. 3.1), is formed. In 1897 the then 29-year-old Bayer chemist Felix Hoffmann began a systematic search for derivatives of salicylic acid. His father, who suffered from severe rheumatoid arthritis, had asked him to.
High doses of salicylic acid caused unpleasant gastric irritation and vomiting. Hoffmann prepared simple derivatives of salicylic acid, and was successful within the year. On October 10, 1897 he synthesized acetylsalicylic acid 3.3 (ASA, Fig. 3.1) for the first time in a pure form.
G. Klebe, Drug Design, DOI 10.1007/978-3-642-17907-5_3, © Springer-Verlag Berlin Heidelberg 2013
It was a lucky strike. Although ASA has a very short half-life in plasma, it is analgesic, antipyretic, and anti-inflammatory in large measure. The clinical trial was carried out at the Diakonissenkrankenhaus in Halle an der Saale on 50 patients. On February 1, 1899 Bayer registered ASA as Aspirin® (A for acetyl, and spir from Spiraea, another plant that contains salicylic acid) as a trademark under the number 36 433. From then on it was sold as 1 g of powder in envelopes, and shortly thereafter as tablets. Detractors alleged that it was only developed in tablet form so that Bayer could emboss their famous Bayer cross onto it. Aspirin quickly gained a leading place in drug therapy. One hundred years after its market introduction, 40,000 t of ASA are produced and pressed into tablets every year, worldwide. At the end of 1994 the Bayer plant in Bitterfeld produced 400,000 Aspirin® tablets per hour, 3.5 billion per year. The importance that the trademark Aspirin had for Bayer became clear in 1994 when the company paid US$1 billion to take over the self-medication business from Sterling-Winthrop, which included the trademark rights for Aspirin, which had been lost in 1918.
The Spanish philosopher José Ortega y Gasset called the previous century the “age of Aspirin.” In his book, The Revolt of the Masses, he wrote:
The ordinary person lives today more easily, comfortably and safely than the most powerful of the past. Why should he care that he is not richer than others when the world is, and roads, trains, hotels, telegraphs, personal safety, and Aspirin® are at his disposal.
Jaroslav Hašek, Kurt Tucholsky, Giovanni Guareschi, Graham Greene, John Steinbeck, Agatha Christie, Truman Capote, Hans Helmut Kirst, and Edgar Wallace also wrote about Aspirin. The singer Enrico Caruso treated his headaches with only “German Aspirin,” out of principle. Even Franz Kafka and Thomas Mann raved about its outstanding effects in their letters.
In 1986, on an official visit to Germany, Queen Elizabeth II said:
German successes span the entire breadth of human life. From philosophy, music and literature, to the discovery of X-rays and the mass production of Aspirin®.
The compliment was wonderful, but one must also consider that all of these scientific discoveries are slightly more than 100 years old!
Fig. 3.1 Salicylic acid 3.2 is the oxidation and cleavage product of salicin 3.1, which is isolated from willow bark. Acetylsalicylic acid (ASA) 3.3 is not simply a prodrug of salicylic acid, but rather a drug with its own mode of action.
ASA was considered to
be a prodrug of salicylic acid and a drug of unknown mode of action until John Robert Vane (Nobel Prize 1982) and Sergio H. Ferreira discovered in 1971 that salicylic acid and other nonsteroidal anti-inflammatory drugs inhibit prostaglandin G/H synthase (cyclooxygenase, COX). COX, a ubiquitously present, membrane-bound enzyme, transforms arachidonic acid 3.4 via a cyclic endoperoxide into PGH2 3.5, which in turn is transformed into prostacyclin 3.6, thromboxane A2 3.7, and other prostaglandins. Large quantities of prostaglandins are produced in inflamed tissue, so that the inhibition of cyclooxygenase intervenes in the cause of the process itself (Fig. 3.2).
ASA is in fact a metabolic precursor of salicylic acid. In contrast to other anti-inflammatory drugs, including salicylic acid, however, it has an astonishing mode of action (▶ Sect. 27.9). It has been known for some time that ASA selectively acetylates the hydroxyl group of the amino acid serine 530 of cyclooxygenase. In 1995 the three-dimensional structure of the complex with a bromine analogue was solved for the first time. It shows that ASA, analogously to other COX inhibitors, docks near the arachidonic acid binding site (▶ Sect. 27.9). Therefore, despite its relatively weak binding, ASA is in an outstanding position to acetylate this serine.
Fig. 3.2 Arachidonic acid 3.4 undergoes an oxidative cyclization and a peroxidase reaction in the prostaglandin biosynthesis to give the primary product PGH2 3.5. Prostacyclin synthase then transforms PGH2 into prostacyclin 3.6, which protects the gastric mucosa, dilates blood vessels, and inhibits platelet (thrombocyte) aggregation. The platelet thromboxane synthase transforms PGH2 into thromboxane A2 3.7, which promotes aggregation. ASA irreversibly inhibits cyclooxygenase. At low ASA doses, the thromboxane A2 synthesis in the platelets is more strongly inhibited than the production of prostacyclin in the vascular walls.
Serine 530 is not involved in the catalytic mechanism, but the additional
volume of the acetyl group impedes arachidonic acid’s entrance to the binding site and therefore the synthesis of the prostaglandin precursors. A COX mutant that carries an alanine instead of a serine at position 530 is enzymatically fully active but is inhibited by all other anti-inflammatory compounds. This mutant is, as expected, only weakly inhibited by ASA.
Continued research on nonsteroidal anti-inflammatory drugs was stimulated by the discovery in 1991 of a second cyclooxygenase, COX-2. All anti-inflammatory drugs until then were unselective, or they exerted their effect overwhelmingly via COX-1 and only slightly via COX-2. The most important side effect of ASA and other anti-inflammatory drugs is the gastrointestinal damage that can occur at high doses; this results from the inhibition of the COX-1-dependent synthesis of prostacyclin 3.6, which protects the gastric mucosa. In contrast to the ubiquitously occurring COX-1, COX-2 is responsible for the fast synthesis of prostaglandins in inflamed tissue. It has been possible to bring many drugs to the market that are more than 1,000-fold more selective for COX-2 than for COX-1, for instance, 3.8 and 3.9 (Fig. 3.3 and ▶ Sect. 27.9).
But do not worry, Aspirin® will live forever. Its success is growing in another market. Even at low doses ASA inhibits the synthesis of thromboxane A2 3.7, which initiates the aggregation of platelets (thrombocytes). Because of its irreversible inhibition of cyclooxygenase, and the inability of platelets to synthesize new enzyme, a one-time contact with the substance is enough to suppress the synthesis for the lifetime of the thrombocyte, that is, for about a week. In tissues other than thrombocytes the enzyme is replaced. Therefore the physiological adversary of thromboxane, the aggregation-inhibiting prostacyclin that is produced in the walls of the vasculature, can be replenished (Fig. 3.2).
In conditions of increased coagulation tendency, ASA shifts the biosynthesis away from the “bad” thromboxane in the direction of the “good” prostacyclin. This effect is the basis for the therapeutic use of ASA in cases of thrombosis susceptibility, for instance, before and after a heart attack or stroke. Considering the now-known mechanism of the effect, the dose can be decreased tenfold! That reduces the risk of gastrointestinal bleeding as a possible side effect.
Fig. 3.3 Celecoxib 3.8 and valdecoxib 3.9 are specific inhibitors of the cyclooxygenase COX-2, which, rather than COX-1, is in particular responsible for the fast synthesis of prostaglandins in inflamed tissue.
Based on these observations, it is now recommended that ASA be taken
prophylactically before long-haul flights. The constricted sitting and lack of movement, coupled with the dry air and reduced pressure in the cabin, lead to dehydration and cause a “thickening” of the blood. The economy-class syndrome typically leads to jet legs and increases the risk of embolism and vein thromboses. Here ASA can offer a measure of protection. On the other hand, its use before surgical procedures is not recommended. No surgeon wants an increased bleeding risk for the patient as a result of diminished coagulation competence during a procedure.
Felix Hoffmann’s approach of using simple derivatization to improve the tolerability of a substance led to a new therapeutic principle 100 years ago, the value of which cannot be appreciated enough. The victory lap of ASA was, and is, unstoppable. A German/Austrian study on 13,300 patients showed that ASA therapy reduces the mortality of a heart attack by 17%, and the number of non-fatal repeat attacks by 30%. On October 9, 1985 the US FDA, a normally conservative organization, announced that the daily consumption of ASA can reduce the chances of a recurrent heart attack by 20%, and in some high-risk populations by even more than 50%. A further study on 22,000 physicians investigated the influence of regular ASA use on the chances of heart attack. Here, the physicians were not the experimenters but the patients. The study was prematurely ended when it was established that the control group had 18 lethal and 171 non-lethal heart attacks, whereas the ASA-treated group had 5 lethal and 99 non-lethal heart attacks: altogether a reduction of 50%. A study on 90,000 nurses showed the same protective effect in women. The risk of a first heart attack was reduced by 30%. This marked the introduction of ASA as a “preventive medicine.” A six-year study of 600,000 volunteers is worth an entry in the Guinness Book of World Records.
After the results were in, it appeared that ASA reduces the risk of lethal colon cancer by 40%. Even this effect has a plausible explanation. Malondialdehyde, a metabolite of prostaglandins, damages DNA. Mutations in the so-called tumor-suppressor gene TP53 occur particularly frequently in human colon tumors. This causes the cancer cells to lose the ability to regulate their growth, and they grow uncontrollably. It could also be entirely different. As a result of gastrointestinal bleeding, a possible side effect of ASA, the treated group was probably examined more frequently than the control group. It is entirely conceivable that the colon cancer was therefore found at an earlier stage in which it was more easily operable.
Since 1992 Aspirin® has been available as a chewable tablet. In this form it is buffered with calcium carbonate, the absorption is much faster, and the side effects are reduced. ASA has had an unbelievable career, particularly if one considers that it would never have had a chance of being approved under modern criteria. Its short plasma half-life, the irreversible protein inhibition, and the high doses would have met today’s exclusion criteria. A definitive end point in its hypothetical modern development would be the teratogenicity seen in rats. A pathological result in toxicity studies with this animal model will definitely lead to discontinuation, because who would dare to wager that a teratogenic effect occurs in rodents, but not in humans? Aspirin®: really a never-ending story.
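The figures quoted above for the study on physicians can be checked with a few lines of arithmetic. The following is only a minimal sketch: it assumes that the control and ASA-treated groups were of equal size (the text gives only the total of 22,000 participants), and the function name relative_risk_reduction is purely illustrative.

```python
def relative_risk_reduction(events_control, events_treated,
                            n_control=1.0, n_treated=1.0):
    """Return 1 - (risk in treated group / risk in control group).

    Assumption: with the default group sizes, both arms are treated as
    equally large, so the sizes cancel out of the ratio.
    """
    risk_control = events_control / n_control
    risk_treated = events_treated / n_treated
    return 1.0 - risk_treated / risk_control


# Heart-attack counts as quoted in the text for the physicians' study
control = {"lethal": 18, "non_lethal": 171}
treated = {"lethal": 5, "non_lethal": 99}

rrr_lethal = relative_risk_reduction(control["lethal"], treated["lethal"])
rrr_non_lethal = relative_risk_reduction(control["non_lethal"],
                                         treated["non_lethal"])
rrr_all = relative_risk_reduction(sum(control.values()),
                                  sum(treated.values()))

print(f"lethal:     {rrr_lethal:.0%}")      # lethal:     72%
print(f"non-lethal: {rrr_non_lethal:.0%}")  # non-lethal: 42%
print(f"all:        {rrr_all:.0%}")         # all:        45%
```

Under the equal-group-size assumption, the combined reduction comes out at about 45%, which is of the same order as the rounded figure of 50% given in the text; the reduction in lethal attacks alone is considerably larger.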
3.2 Malaria: Success and Failure
The therapy of malaria begins with the discovery of cinchona, around which there are numerous legends. The nicest and most frequently cited version is that of the fever-stricken Countess Cinchon, the wife of the Spanish viceroy in Lima, Peru, who was healed by the doctor Juan de Vega in 1638. On the advice of the town magistrate of Loja, Quinquina, the “bark of the barks” (therefore the confusing name “cinchona bark”), was brought in from 800 km away. The Countess was allegedly healed and from then on distributed the powder herself. In the older works, the cinchona bark was also called “Countess powder” or “Jesuit powder.” Perhaps it was also true that the Indians, who were forced into compulsory service in the silver mines by their Christian conquerors, chewed the bark to fight off shivering in the cold. The clever Jesuits took note of these observations, and thought that chewing the bark would also help with the shivering that comes from a malarial fever episode. Cinchona then came back to Europe with the Jesuits.
Malaria, the remittent fever, is a widespread tropical and sub-tropical disease. Because it is transmitted by the anopheles mosquito, it occurs particularly in wetlands. Even the city of Buenos Aires (Span. “good airs”) was badly hit by malaria (Ital. mala aria = “bad air”). Alexander the Great, the Gothic King Alarich, and the German Emperors Otto II and Heinrich IV died of it. Even Albrecht Dürer (1471–1528) apparently suffered from malaria. He sent his private physician a drawing of himself in which he was wearing only a loincloth. His right hand is over his spleen, with the additional text do der gelb Fleck ist vnd mit dem Finger drawff dewt, do ist mir we (there where the yellow spot is and where the finger points, is where it hurts). In Europe malaria was still widespread until the middle of the last century. In the north of Germany, the last epidemics were in the years 1896, 1918, and 1926.
Miasma, the emissions from the ground, swamps, and corpses, was long seen as the source of malaria and other epidemics. The Roman author Marcus Terentius Varro (116–27 BC) already suspected that small invisible organisms might be responsible. Toward the end of the nineteenth century, the anopheles mosquito was identified as the vector, and a plasmodium was recognized as the cause of malaria. Around 1930 about 700 million people were infected, and in 2003 the number was estimated to be 300–500 million. Up to 1.2 million people die every year, mostly children under the age of 5, and many others retain permanent damage. Psychiatric changes are also a consequence. The term “spleen” for eccentricity originally came from the enlarged spleen that malaria causes.
It should not go unmentioned that heterozygous (i.e., genetically mixed) carriers of sickle cell anemia are protected from malaria. This genetic form of anemia was the first disease for which the molecular cause could be identified (▶ Sect. 12.12). A single amino acid in the hemoglobin of those afflicted is mutated. This causes hemoglobin to aggregate, and the erythrocyte shrinks. The malaria parasite cannot adequately reproduce in such an erythrocyte. This partial protection from malaria has abetted the spread of sickle cell anemia in malaria-endemic areas, but not in other areas.
The active substance in the cinchona bark, the alkaloid quinine 3.10 (Fig. 3.4), was isolated in 1820. Aside from the positive therapeutic effects, it also had considerable side effects; nonetheless, up until a few years ago it was the most important antimalarial, particularly for the parenteral treatment of severe malaria. The first synthetic alternative, plasmoquine 3.11, became available in 1927, but it is seldom used due to its side effects. The later-developed, more potent analogues 3.12–3.14 show a clear structural relationship to the lead structure quinine (Fig. 3.4).
Fig. 3.4 Simple synthetic analogues with antimalarial effects were derived from quinine 3.10. Plasmoquine 3.11 still contains the methoxyquinoline ring of quinine, but it is in a different position. The later-developed analogues mepacrine 3.12 and chloroquine 3.13 show strong similarity to quinine. The newer derivatives mefloquine 3.14 and amodiaquine 3.15 are also structurally closely related to quinine.
It was only through the protection from malaria that the exploitation of the colonies was possible. The World Health Organization, WHO, initiated a global malaria-eradication program in 1955, mainly through the use of the insecticide dichlorodiphenyltrichloroethane 3.16 (DDT, Fig. 3.5). The success was overwhelming: the number of cases and fatalities was reduced to practically zero (Table 3.1). In 1953 it was estimated that five million lives had
been saved since 1942. In India alone the number of cases went from 75 million to 750,000, and the number of annual fatalities was reduced to 1,500. DDT has saved more lives than all antimalarial drugs put together! The acute toxicity of DDT is actually not a problem for mammals and humans. Unfortunately, it turned out that DDT decomposes extremely slowly in the environment, and it accumulates as it moves up the food chain, especially in birds and fish. It also accumulates in human fat and in breast milk. The chronic toxicity comes from long-term retention of one year or more, and that is a serious problem. The moving book, Silent Spring by Rachel Carson, was published in 1962.
Table 3.1 Number of malaria cases in different countries before and after the introduction of DDT 3.16 (Fig. 3.5). The numbers in parentheses are the years (Jukes TH (1974) Naturwiss 61:6–16)
Country       Cases before DDT (year)    Cases after DDT (year)
Italy         411,602 (1946)             37 (1969)
Spain         19,644 (1950)              28 (1969) a
Yugoslavia    169,545 (1937)             15 (1969) a
Bulgaria      144,631 (1946)             10 (1969) a
Romania       338,198 (1948)             4 (1969) a
Turkey        1,188,969 (1950)           2,173 (1969)
India         75 million per year        750,000 (1969)
Sri Lanka     2.8 million (1946)         110 (1961); 31 (1962); 17 (1963); 2.5 million (1968/1969) b
Taiwan        1 million (1945)           9 (1969)
Venezuela     817,115 (1943)             800 (1958)
Mauritius     46,395 (1948)              17 (1969)
a Imported cases
b After DDT spraying was discontinued in 1963
Fig. 3.5 The insecticide p,p′-dichlorodiphenyltrichloroethane 3.16 (DDT) saved more human lives than all of the antimalarials put together. The latest investigations show, though, that the antiandrogenic effect of the main metabolite p,p′-dichlorodiphenyldichloroethylene 3.17 (DDE) is possibly the main culprit responsible for reproductive disorders found in animals, including perhaps humans.
Despite warnings from experts, DDT spraying for mosquitoes was stopped in Sri Lanka in 1963, and the number of malaria cases raced to 2.4 million by
1968/1969. By then it was too late to use DDT again because the mosquitoes had become resistant, and this was certainly also partially due to the residual DDT that remained in the environment in the intervening years.
Further investigations showed that a DDT metabolite, dichlorodiphenyldichloroethylene 3.17 (DDE, Fig. 3.5), has surprisingly strong antiandrogenic effects, that is, it blocks the effects of male hormones. Therefore, DDE is responsible for the DDT-dependent reproductive and developmental disorders that are seen in some species, perhaps also in humans. It is remarkable that the effect of this metabolite was only discovered 50 years after DDT was introduced.
Not only did the mosquitoes become resistant to DDT; the parasite also became resistant to the drugs. For this reason, the history of the chemotherapeutic developments for malaria has been a rollercoaster ride of new promising compounds, and the more or less quick development and distribution of resistant parasites. Chloroquine 3.13 was prepared in 1934 in the Bayer laboratories, but was judged to be “too toxic”; it was “rediscovered” by the Americans and deployed as a malaria therapeutic par excellence. Efficacious, well tolerated, and above all else inexpensive to produce, it, along with the above-described mosquito extermination with DDT and landscaping measures, brought us within reach of a victory over malaria. But resistant parasites emerged almost simultaneously and independently from one another in the 1960s in different parts of Southeast Asia, Oceania, and South America. They possessed a mutated transport protein in the membrane of their digestive vacuole that recognizes chloroquine as a substrate. By using this protein they were able to expel chloroquine from its target. In the meantime, resistant parasites have spread throughout almost the entire geographic range of malaria. Chloroquine lost its once phenomenal status for the therapy of malaria tropica.
Since then, researchers have sought a malaria therapeutic with qualities similar to those of chloroquine, so far without success. The structurally related amodiaquine 3.15 (Fig. 3.4) is in fact effective against weakly chloroquine-resistant strains, but it is largely ineffective against highly resistant strains (especially in Southeast Asia). Moreover, upon long-term use as a prophylaxis, it carries the risk of irreversible liver damage or a life-threatening agranulocytosis. In the short term, it appeared that the antifolate combination of sulfadoxine/pyrimethamine 3.18/3.19 (Fansidar®) could replace chloroquine (Fig. 3.6), but the first resistance occurred much faster than with chloroquine. Starting from the point of origin in Southeast Asia, the resistance has spread over the entire world.
The wars of the last century also promoted the search for new antimalarial drugs. Tremendous effort was made at the Walter Reed Army Institute of Research in the USA. Over the course of 40 years, and particularly during WWII and the Vietnam War, more than 250,000 substances were tested for an antimalarial effect. Measured against the effort expended, the success was modest: the two aryl amino alcohols halofantrine 3.20 and mefloquine 3.14, and the 8-aminoquinoline tafenoquine 3.21, which still has not completed clinical trials, were the result of strenuous labor. After its introduction, halofantrine was withdrawn from the market because it caused lethal arrhythmias (▶ Sect. 30.3). In Southeast Asia the resistance to mefloquine developed so quickly that it can only
be used in combinations with artesunate 3.22. Because mefloquine has been used sparingly due to its price, most of the parasite strains are still sensitive to it. For this reason, mefloquine is today one of the most important malaria prophylactics for Western tourists.
Artesunate is a semisynthetic derivative of dihydroartemisinin 3.24, which is isolated from annual mugwort (Artemisia annua). Artemisinin’s very unusual endoperoxide structure is essential for its activity. Intense research is currently devoted to clarifying whether its mode of action is the iron(II)-catalyzed production of radicals, which then react with the immediate cell structures (iron-triggered cluster bomb), or a specific calcium pump inhibition. At any rate, these are the most potent medicines to fight malaria to date. Scientists consider it to be only a matter of time until resistance to artemisinin develops. Artemisinin-based combination therapy is the current recommendation of the WHO. Artemisinin derivatives are combined with whatever is available, even with substances to which massive resistance is already established. At the moment, the usual partner is the Chinese-developed aryl amino alcohol lumefantrine 3.23, which is usually still effective. The combinations of dihydroartemisinin/piperaquine 3.24/3.25 and artesunate/pyronaridine 3.22/3.26 (Fig. 3.6) are in advanced stages of clinical trials. Both combination partners were developed in China in the 1960s and 1980s, respectively. They belong to the same class as chloroquine, even though pyronaridine has an azaacridine instead of a quinoline scaffold. Resistance to both of these compounds is already widespread in Southeast Asia. The combination of dapsone/chlorproguanil (LapDap®) 3.27/3.28 was introduced only a few years ago, and both compounds are representatives of a long-used class: the antifolates. Even in this case, the majority of Southeast Asian strains are already resistant. True novelties in the mode of action are rare.
In 1997 a very expensive combination medication, atovaquone/proguanil 3.29/3.30 (Malarone®), was introduced that synergistically inhibits the mitochondrial respiratory chain. Fosmidomycin 3.31, an inhibitor of the parasite-specific mevalonate-independent isoprenoid synthesis pathway, is currently in clinical trials. Increased efforts are necessary to find new substances. Ideally, modes of action that have not yet been exploited should be pursued. Only in this way can we be armed and ready for the time when resistance to artemisinin spreads.

Fig. 3.6 [Structures shown: sulfadoxine 3.18, pyrimethamine 3.19, halofantrine 3.20, tafenoquine 3.21, artesunate 3.22, lumefantrine 3.23, dihydroartemisinin 3.24, piperaquine 3.25, pyronaridine 3.26, dapsone 3.27, chlorproguanil 3.28, atovaquone 3.29, proguanil 3.30, and fosmidomycin 3.31.] The latest research in antimalarials shows that many products can be used in combination. At first Fansidar®, a combination of sulfadoxine 3.18 and pyrimethamine 3.19, was the drug of choice. The rapid development of resistance has since made this once-promising treatment useless. At present, hopes rest on the artemisinin derivatives 3.22 and 3.24. A new beacon of hope is fosmidomycin 3.31, which has a novel mode of action in that it inhibits the mevalonate-independent biosynthetic route to isoprenoids.

3.3 Morphine Analogues: A Molecule Cut to Pieces

Research on the opiates has taught us how complex natural products can be systematically simplified, and how structurally abbreviated analogues can be prepared that have the identical effect, sometimes with even better specificity. It has also shown that there is sometimes no obvious solution to a specific problem: the separation of the analgesic and addictive qualities could not be achieved, or only inadequately.

The narcotic, analgesic, and euphoric effects of opium, which is isolated from poppies, have been known for at least 5,000 years. Opium was used for operations, but it is also a traditional drug of abuse. The importance of its abuse in the cultural history of humanity is illustrated, among other places, by the “Opium Wars” of the nineteenth century. In 1840 the Chinese wanted to stop the English from importing opium and burned 20,000 cases of it; this led to a 2-year-long war between the two countries.
In 1804/5 the pharmacy assistant Friedrich Wilhelm Adam Sertürner of the Hof-Apotheke in Paderborn isolated the sleep-inducing principle of opium. He named it morphium (later morphine) after Morpheus, the Greek god of dreams and son of Hypnos. Morphine addiction took on a whole new dimension after 1853 with the invention of the hypodermic needle and syringe by Charles G. Pravaz and Alexander Wood. As a result, morphine and heroin addiction spread widely; it is one of many examples in the history of humanity of the misuse of a beneficial discovery.

Morphine 3.32 (Fig. 3.7) is one of the few examples of a natural product that is still used today in its original form. It is among the most potent known analgesics. If it is administered at the correct dose and on the correct schedule, the danger of addiction is low. Physicians often overestimate the addictive potential, with the result that patients with severe pain are often inadequately treated with opiates.

Morphine is also a prime example of successful systematic structural variation toward simpler, more easily manufactured analogues as well as more selective activity. The first modified products were simple derivatives such as the methyl ether codeine 3.33, which is also found in poppies. Codeine is weaker than morphine, but it is bioavailable after oral administration. It has a pronounced antitussive effect and a low addictive potential. Unfortunately, the opposite is true for the potent, fast-acting diacetyl derivative heroin 3.34, which has enormous addictive potential. Today it seems ironic that at the end of the nineteenth century Heinrich Dreser, a senior pharmacologist at Bayer, wanted to discontinue the development of Aspirin® because of a suspected cardiotoxicity in favor of developing heroin as a well-tolerated and potent cough medicine (sic!), at least until he realized the mistake.
Of all the morphine derivatives, codeine and heroin are the most widespread: codeine in numerous combination preparations, and heroin in the drug scene. Some N-alkyl derivatives of morphine and close analogues, for instance, naloxone 3.35, are opiate antagonists, that is, they inhibit the effect of morphine (Fig. 3.7).

Fig. 3.7 [Structures shown: morphine 3.32 (R1 = R2 = H), codeine 3.33 (R1 = Me, R2 = H), heroin 3.34 (R1 = R2 = acetyl), and naloxone 3.35.] Morphine 3.32 and codeine 3.33 served as lead structures for heroin 3.34, which has better CNS bioavailability, and naloxone 3.35, a morphine antagonist.

The structural elucidation of morphine took more than 120 years, and its total synthesis, and ultimate structural proof, was completed in 1952 by Marshall Gates
and Gilg Tschudi. Morphine contains five rings: an aromatic benzene ring, two unsaturated six-membered rings, the nitrogen-containing piperidine ring, and an oxygen-containing five-membered ring. Systematic structural modifications had the goal of simplifying the structure, for example, by opening one or more rings or removing them altogether. In 1939, the potent analogue pethidine 3.36 (Fig. 3.8) became the first fully synthetic analgesic, although it was originally derived from the spasmolytic atropine 3.37. Despite this, it is recognizably a morphine analogue. In levomethadone 3.38 the piperidine ring of pethidine is opened, an oxygen atom of the ester group is removed, and another aromatic ring is added. There are thousands of other analogues, some of which have been introduced to therapy. Besides the deconstruction of morphine, the construction of additional rings has, surprisingly, led to analogues with greater potency, for example, etorphine 3.39 (Fig. 3.8).

Fig. 3.8 [Structures shown: pethidine 3.36, atropine 3.37, levomethadone 3.38, and etorphine 3.39.] The architecture of morphine was dissected in many ways. The highly potent pethidine 3.36 was the first fully synthetic opiate analgesic, but it was discovered in the 1930s in a search for spasmolytics by varying the structure of atropine 3.37. It is nevertheless recognizable that pethidine retains the benzene ring of morphine as well as its piperidine ring. Levomethadone 3.38 is derived from pethidine. The addition of another ring led to substances whose potency surpasses that of morphine by orders of magnitude. Etorphine 3.39 is 2,000–10,000 times more potent than morphine in animals. Since 1963 it has been used in African wildlife preserves to immobilize large animals such as elephants and rhinoceroses.

For a long time it was a complete mystery why our bodies would have dedicated receptors, the so-called opiate receptors, for the contents of poppy plants. The solution came with the discovery of the endogenous morphine-like peptides Met- and Leu-enkephalin (▶ Sect. 10.2), which are the natural ligands for these receptors. The discovery stimulated an intensive search for orally active peptides or peptidomimetics devoid of addictive potential. The result of the work was more
than sobering. Although orally active analogues were found, their addictive potential was identical to that of morphine and most morphine-derived analogues.

A few synthetic analogues have, in addition to agonistic activity, a weak antagonistic effect as well. The potential for these substances to be abused by addicts is lower than with the classical morphine analogues. Combination preparations of agonists and antagonists are also available. With appropriate use, the analgesic effect of the agonist dominates because it is present in excess. If the medicine is injected intravenously, the more strongly binding antagonist displaces the agonist, and the desired euphoric effect never sets in.

The work on improved selectivity was also successful. Today cough medicines and antidiarrheal medicines, for example, loperamide 3.40 (Fig. 3.9), are available that have no central morphine-like effects. Loperamide is able to pass through the blood–brain barrier but is immediately expelled by an active transporter. When these transporters are inhibited, for instance, upon coadministration with quinidine, loperamide also has classical opiate effects. Its structure unites elements of pethidine 3.36 and levomethadone 3.38.

In this section only a few representatives of the many thousands of structural modifications of morphine can be discussed. The approach of Paul Janssen should not remain unmentioned, though: he started with pethidine 3.36 with the goal of preparing a strong analgesic, but instead experienced an unexpected success in another area. The result was the neuroleptic haloperidol 3.41 (Fig. 3.9), a drug for the treatment of schizophrenia, the mode of action of which is mediated by an antagonistic effect at the dopamine D2 receptor (▶ Sect. 29.4).

3.4 Cocaine: Drug and Valuable Lead Structure

No other substance sparkles in so many ways as cocaine. In the introduction it was already mentioned that it is at the pinnacle of all illegal drugs.
Cocaine was also the chemical starting point for a wide palette of valuable local anesthetics and antiarrhythmics. We can thank the lead structure cocaine for local anesthesia, pain-free dentistry, and nerve-block anesthesia for smaller surgical procedures.

Fig. 3.9 [Structures shown: loperamide 3.40 and haloperidol 3.41.] Structural derivatives of morphine and its analogues have led to selective antidiarrheal agents, loperamide 3.40, for instance, as well as neuroleptics such as haloperidol 3.41.
The translation of the quite positive central effects of cocaine onto analogues devoid of addictive potential is still in progress. The example of morphine leads one to fear that this goal might not be attainable.

Coca leaves and cocaine 3.42 (Fig. 3.10) are among the oldest known drugs. Chewing dried coca leaves has a long tradition in Peru and Bolivia. In 1744 Garcilaso de la Vega wrote that coca “satisfies hunger, gives new energy to the tired and exhausted, and lets the unhappy forget their troubles”. The Scottish author Robert Louis Stevenson (Treasure Island) wrote in his novella The Strange Case of Dr. Jekyll and Mr. Hyde about a split in personality that a doctor undergoes under the influence of drugs; he wrote the first draft of this novella in only three days and nights while under the influence of cocaine. In 1863 the American chemist Angelo Mariani (1838–1914) patented a mixture of coca extract and wine as Vin Mariani. It made him a rich man. In 1886 the pharmacist John S. Pemberton developed a coca-containing stimulant and headache remedy that he named Coca Cola. He sold the rights in 1891 to a colleague, Asa G. Candler, who founded the Coca Cola Company one year later. Up until 1906 Coca Cola indeed contained a small amount of cocaine, but today it only contains the harmless stimulant caffeine.

At the turn of the last century, cocaine was already fashionable, particularly in artistic circles. The Viennese psychiatrist Sigmund Freud (1856–1939) experimented with cocaine intensively and rather uncritically. He considered it to be a wonder drug, took it himself regularly, and recommended it generously for use in therapy, for the treatment of stomach aches, and for depressed mood. Later, after massive criticism from his colleagues, he turned away from it.

Fig. 3.10 [Structures shown: cocaine 3.42, benzocaine 3.43, lidocaine 3.44, and mepivacaine 3.45.] The local anesthetic effect of cocaine 3.42 was recognized early on. The independently found lead structure benzocaine 3.43 and the basic moiety of cocaine were models for synthetic local anesthetics. The structural relationship is clearly recognizable in lidocaine 3.44, which also acts as an antiarrhythmic, and in mepivacaine 3.45.

Cocaine causes the release of dopamine from its transporter (see ▶ Sect. 30.7). Usually it is sniffed; occasionally it is injected intravenously, mixed into drinks, or taken orally. Sniffing delivers it quickly to the brain where it displaces dopamine
from the binding site of the transporter, and this causes increased dopamine release into the synaptic gap. The free base (crack), which is made by mixing the salt with sodium bicarbonate, is absorbed very quickly through the lungs when smoked and causes a euphoria that is even more pronounced than when the salt (coke, powder, snow) is sniffed. Because cocaine does not bind for long, the transporter is quickly reloaded with dopamine, and the same effect can be induced again after a little while. Other cocaine analogues that bind for longer do not allow the effect to be repeated for hours. Psychological dependence occurs very quickly, even after the first use in the case of crack cocaine. Physical withdrawal symptoms, as seen with heroin addicts, usually do not occur.

The credit for discovering the local anesthetic effect of cocaine does not go to Freud but rather to a friend of his, the ophthalmologist Carl Koller (1857–1944). Freud had planned to investigate this effect, but in 1884 he first wanted to make a quick visit to a friend of his, Martha Bernays, in New York. Koller picked up on Freud’s suggestion and carried out the decisive experiment on the eye in his absence. The synthetic benzoic acid esters and anilides that were initially used as local anesthetics were not derived from cocaine 3.42, but rather from p-aminobenzoic acid esters; benzocaine 3.43 was already in therapeutic use in 1902. A structural relationship to cocaine is, however, easily seen in modern local anesthetics such as lidocaine 3.44 and mepivacaine 3.45 (Fig. 3.10).

3.5 H2 Antagonists: Ulcer Therapy Without Surgery

The history of the treatment of gastroduodenal ulcers is long and instructive. Basic research clarified the important mechanisms without providing a new drug. The development of the therapy occurred in several phases. Again and again, the better was the enemy of the good. In the beginning the treatment consisted of antacids, and later anticholinergics. In severe cases only surgery helped.
The H2 antagonists made the breakthrough to purely pharmaceutical treatment. Now we are experiencing the triumphant advance of the proton-pump inhibitors, which are used in different combinations with antibiotics. Perhaps in the future this will be augmented or even replaced by a vaccine.

Gastric and duodenal ulcers are usually chronic illnesses and are widespread in the general population. Any damage to the mucosal membrane of the stomach leads to damage to the underlying cells by proteolytic enzymes and gastric acid. Acetylcholine 3.46, histamine 3.47, and gastrin, a mixture of peptides with 17 (little gastrin) and 34 (big gastrin) amino acids, stimulate the production of acid.

For decades the treatment of gastroduodenal ulcers was based on reducing the amount of acid, for instance, with sodium bicarbonate, calcium carbonate, magnesium salts, and aluminum oxide hydrate. Advanced ulcers had to be treated surgically. Anticholinergics, antagonists of the acetylcholine receptor, should in principle have been suitable for ulcer treatment; however, unspecific antagonists are out of the question because of their severe side effects. It was not until pirenzepine 3.48 (Fig. 3.11), a selective so-called M1 antagonist, was developed
that this class could be used in therapy. Here the undesirable side effects of unspecific anticholinergics are only apparent at relatively high doses.

The role of histamine in acid secretion was initially called into question because the classical antihistamines, later defined as H1 antihistamines, did not reduce acid secretion. These substances, for instance, diphenhydramine 3.49 (Fig. 3.11), antagonize histamine in the intestines, in the lungs, and in allergic reactions. Today a wide palette of different histamine antagonists is available for the treatment of allergic rhinitis (hay fever). The most important side effect, particularly with the older substances, is a more or less pronounced sedation. Histamine-induced gastric acid secretion, the effect on the heart, and uterine contractions are not inhibited by diphenhydramine and its analogues. It was first suspected in 1948 that there might be two different histamine receptors, H1 and H2. The H1 type is inhibited by diphenhydramine, but the H2 type, which is responsible for the above-mentioned effects, is not. Both belong to the family of G protein-coupled receptors (▶ Sect. 29.1). In the meantime two additional members of the family, the H3 and H4 receptors, have been discovered.

Fig. 3.11 [Structures shown: acetylcholine 3.46, histamine 3.47, pirenzepine 3.48, and diphenhydramine 3.49.] Acetylcholine 3.46 and histamine 3.47 stimulate acid production in the stomach. The acetylcholine receptor antagonist pirenzepine 3.48 was the first drug specifically for ulcer therapy. Classical H1 antihistamines such as diphenhydramine 3.49 cannot antagonize histamine in the stomach.

In 1964 James W. Black (1924–2010) at Smith Kline & French in England began to develop three models to test the inhibition of these other, H2-mediated effects of histamine. One was an in vivo model measuring gastric perfusion in anesthetized rats, and two were in vitro models evaluating the histamine-induced stimulation of a guinea pig heart and a rat uterus. James Black later not only received the Nobel Prize but was also knighted by Queen Elizabeth II, two rather unusual honors for an industrial pharmaceutical researcher.

Despite all the strategies that were available for the development of receptor antagonists, the search for an H2 antagonist was to no avail for years. The American management in Philadelphia became impatient and wanted to end the program. The first promising result came just in the nick of time. Because all lipophilic analogues
were ineffective, the more polar compounds that had been investigated earlier were reexamined. A compound that had already been synthesized in 1928 and determined to be ineffective, Nα-guanylhistamine 3.50 (Fig. 3.12), now proved to be a weak antagonist. The effect had been overlooked because 3.50 is actually a partial agonist and therefore shows a weak histamine-like effect. Within a few days the first lead structure with interesting activity, S-[2-(imidazol-4-yl)ethyl]isothiourea 3.51, was identified (Fig. 3.12). Extending the side chains of both of these compounds delivered partial agonists whose antagonistic effects were too weak. It was only in 1972, after abandoning the hypothesis that the basic nitrogen in the side chain was necessary for activity, that the researchers arrived, by chain elongation and N-methyl substitution of the thiourea, at the first clinically useful H2 antagonist, burimamide 3.52. Human trials confirmed the efficacy, but the bioavailability was poor.

The next milestone was achieved with the development of metiamide 3.53 (Fig. 3.12), which is 5–10 times more potent than burimamide and clinically demonstrated the desired ulcer-healing effect. In some patients, however, granulocytopenia occurred, a dangerous suppression of the white blood cells that cannot be tolerated. The medical need was great. It was not foreseeable whether the observed effect was a result of H2 antagonism. We have the company to thank for taking on the risk of further research.

The sulfur atom of the thiourea was suspect. An isosteric exchange for an oxygen atom delivered a less potent urea analogue. Exchange for an ═NH group led back to a guanidine, which was strongly basic but a potent antagonist nonetheless. Substitution of the imino group with an NO2 or a CN group led to less basic analogues, whose antagonistic potency was comparable to that of metiamide. The somewhat more active of the two analogues, cimetidine 3.54 (Fig.
3.12) was clinically tested. In November 1976 and in August 1977 it was introduced in England and the USA, respectively. By 1979 it was available in over 100 countries. Shortly thereafter, in 1983, cimetidine (Tagamet®) became the most-prescribed drug in many countries, and its sales reached about US $1 billion.

Fig. 3.12 [Structures shown: 3.50 (X = -NH-), 3.51 (X = -S-), burimamide 3.52 (R = H, X = -CH2-), metiamide 3.53 (R = CH3, X = -S-), and cimetidine 3.54.] Nα-Guanylhistamine 3.50 and S-[2-(imidazol-4-yl)ethyl]isothiourea 3.51 served as lead structures for the H2-type antihistamines. The first clinically tested H2 antagonists, burimamide 3.52 and metiamide 3.53, were unsuitable for therapy. Only the development of cimetidine 3.54 led to a breakthrough and an exceedingly successful therapy.
Such a successful drug makes other companies restless. There are many cases in the history of pharmaceutical research in which a major new concept was taken up by developments in other companies. Examples of this are the structurally entirely different calcium channel blockers verapamil and nifedipine (▶ Sect. 2.6) and the angiotensin-converting enzyme inhibitors captopril and enalapril (▶ Sect. 25.4). The same happened in the development of the H2 antagonists.

Ulcer therapy had been researched since 1960 at Allen & Hanburys, a subsidiary of Glaxo. One of the first lead structures, 3.55 (Fig. 3.13), an aminotetrazole with about the same potency as burimamide, was systematically varied without success. Their research management also wanted to stop the project in order to concentrate on the anticholinergics. The breakthrough came upon replacement of the tetrazole ring with a furan. This was not exactly an obvious idea because the previously synthesized compounds always had at least one nitrogen atom in the ring. The -CH2SCH2CH2- chain was taken over from metiamide 3.53, and a dimethylaminomethylene group was added to improve water solubility; the result was AH 18665 3.56 (Fig. 3.13). The chemists also synthesized a cyanoguanidine, AH 18801 3.57, that was comparable to cimetidine 3.54 in terms of potency. The substance’s characteristics were, however, unsatisfactory: the melting point was too low. The nitrovinyl analogue 3.58 was meant to remedy this. It was synthesized, and it was an oil! This was not seen as a prohibitive problem, because in compensation it was 10 times more potent than the cyanoguanidine 3.57 in the rat. Ranitidine 3.58 (Fig. 3.13) was developed as a drug and introduced in 1981 as Zantac® and Sostril®. Compared to cimetidine, ranitidine was 4–5 times more efficacious in humans and had the advantage of being more selective. In 1987 ranitidine overtook cimetidine.
In 1994, with US $4 billion in sales, it became the most economically successful drug in terms of annual sales up to that time. Within a few years, Glaxo was catapulted to the pinnacle of the world rankings of pharmaceutical corporations. Glaxo used this opportunity. The research of this company and its strategy in drug development are among “the finest” in the industry today. Through mergers with and acquisitions of competitors, Glaxo, or “GSK” as it is known today, has become one of the largest pharmaceutical corporations in the market.

In the meantime, an antitumor effect in colon, gastric, and renal cancer has been reported for cimetidine. Apparently it suppresses the tumor-mediated, interleukin-1-induced selectin activation (▶ Sect. 31.3).

It is understandable from the chemical structure that cimetidine has a high affinity for cytochrome P450 enzymes, particularly CYP 3A4 (▶ Sect. 27.6). As a consequence, interactions with other drugs that depend on CYP 3A4 for metabolism are common. The imidazole moiety in 3.54, at first seen as indispensable, blocks the catalytic iron center in the P450 enzymes. Ranitidine 3.58 carries a furan ring in the same position and lacks the P450 inhibition.

After cimetidine and ranitidine, very few other drugs have made their way to the market. Nizatidine 3.59 and famotidine 3.60 contain a thiazole ring as a heterocycle (Fig. 3.13). In 3.60, the electron-withdrawing group of the guanidine moiety is replaced by a sulfonamide group.
Fig. 3.13 [Structures shown: 3.55, AH 18665 3.56 (X = S), AH 18801 3.57 (X = N-CN), ranitidine 3.58 (X = CH-NO2), nizatidine 3.59, famotidine 3.60, and omeprazole 3.61.] The lead structures 3.55–3.57 were steps on the way to ranitidine 3.58, which in the 1980s was the economically most important drug. Nizatidine 3.59 and famotidine 3.60 represent newer developments. Omeprazole 3.61 is a proton pump inhibitor.

It is true even for the H2 blockers that good drugs are replaced by better ones. After being stimulated to secrete acid, the cells use an enzyme, the H+/K+-ATPase, to pump protons out of the cell in exchange for potassium at the cost of energy. If “the faucet is turned off” at this step, not only the histamine-induced acid production but also the acetylcholine- and gastrin-mediated acid production is stopped. Omeprazole 3.61 is a prodrug that, upon rearrangement, acts as an irreversible inhibitor of this proton pump (▶ Sect. 9.5). The effect of omeprazole therefore lasts longer, and the reduction in acid secretion is stronger than with the H2 antagonists. Gastric and duodenal ulcers heal more
quickly and reliably. These substances were also a great commercial success. At the end of the last century, Losec®, Antra® (both from Astra), and Prilosec® (Merck & Co., USA) had combined global sales of over US $6 billion despite the fact that they were introduced to the market much later than ranitidine. The enantiomerically pure form esomeprazole (Nexium®) even reached US $7 billion in sales in 2007.

That is not even the end of the story. Although in principle it had been known since 1983, the relevance of the bacterium Helicobacter pylori for the etiology of ulcers was first discussed in 1994 at a conference of the US National Institutes of Health (NIH). This bacterium infects a large portion of the population in childhood. Frequently it is spread within a family; a kiss can be enough to infect someone. It causes gastrointestinal damage in a portion of those infected, which can lead to an ulcer. It is now held responsible not only for ulcers but also for at least two different forms of gastric cancer. It survives assault by many antibacterial agents as well as the acidic milieu of the stomach. It has a urease that releases ammonia in its immediate vicinity, which in turn neutralizes the gastric acid.

The drugs of choice to treat such infections are combinations of H2 blockers, proton pump inhibitors, and antibiotics. H. pylori seems to develop antibiotic resistance quickly, though. Since the beginning of 1995 the first animal model has been available, a mouse with a sustained H. pylori infection; this should promote further research in this important area. A vaccine is currently in development. A portion of the vaccinated patients mounted enough of an immune response to defend themselves against the bacteria. For practical use, however, its reliability must be improved. Perhaps in the foreseeable future we will have an ulcer therapy that is completely different, for instance, a swallowed vaccine that delivers life-long protection.
The revolution is in sight: a one-time treatment without repeated gastroscopy. The patients will be delighted. Others will see this dramatic change in therapy with mixed emotions.

3.6 Synopsis

• Even though the period of classical drug research was strongly governed by trial and error, it was exceptionally successful. Many leads were found by accident or from traditional medicine, even though only limited knowledge of pathophysiology or molecular disease etiology was available.
• Acetylsalicylic acid, or Aspirin®, is one of our oldest but also most prototypical drugs. Originating from bark extracts and chemically modified to improve taste and tolerance, it achieves its actual potency and mode of action by irreversibly inhibiting cyclooxygenase.
• Since then two isoforms of cyclooxygenase have been characterized: one is constitutively present, and the other is induced in inflamed tissue. Acetylsalicylic acid inhibits both unselectively, giving rise to some undesirable side effects.
• Due to irreversible inhibition of COX in platelets, Aspirin exerts an influence on the ratio of synthesized thromboxane and prostacyclin, which has a depressing
effect on the blood’s coagulation tendency. As a consequence, Aspirin is recommended as “preventive medicine” to protect against thrombosis or to reduce mortality from heart attack.
• Malaria is a widespread tropical/subtropical disease transmitted by the Anopheles mosquito and caused by the Plasmodium parasite, which invades erythrocytes in humans. The disease had been nearly eradicated by fighting the mosquito with the insecticide DDT. One of the oldest active substances to hit the parasite is quinine, isolated from cinchona bark.
• After DDT spraying against the mosquitoes was stopped, malaria raged again. Increasing resistance of the parasite to known drugs occurred, and the development of new chemotherapeutics for malaria has been a rollercoaster ride of promising compounds and the emergence of resistant parasites.
• Morphine, isolated from poppies, is in use as the unchanged natural product, as a potent analgesic. When administered correctly, the risk of addiction is low. Its complex structure of five fused rings has been simplified and cut into pieces to give more easily accessible analogues with higher selectivity.
• Cocaine, the active ingredient in coca leaves, is one of our oldest drugs. Its euphoric effect arises from the displacement of dopamine from its transporter in the synaptic gap. The cocaine structure served as the lead structure for the development of local anesthetics.
• Ulcer therapy went through several phases of drug development, leading to active substances with increasingly efficient modes of action to reduce the production of gastric acid.
• Starting with antacids and rather unspecific anticholinergics, selective H2 antagonists were developed as a real breakthrough in the purely pharmaceutical treatment of ulcers. They act upon the H2 receptor, a member of the G protein-coupled receptors (GPCRs). A protein that pumps protons for acid release is stimulated through these receptors.
Proton-pump inhibitors such as omeprazole directly block the function of the proton-secreting H+/K+-ATPase that builds up the acidic milieu.
• The bacterium Helicobacter pylori causes gastrointestinal damage leading to ulcers. It can be eradicated by a combination of a proton-pump inhibitor with an antibiotic. A vaccine against the bacterium could deliver life-long protection.
    4 Protein–Ligand Interactions as the Basis for Drug Action
    To purposefully design an active substance, the following questions must first be answered: How does a drug act? How does Aspirin® relieve headaches? Why do β-blockers lower blood pressure? Where does a calcium channel blocker act? How does cocaine work? How do sulfonamides prevent the proliferation of bacterial pathogens? An active substance must bind to a very specific target molecule in the body to exert its pharmacological action. Usually this is a protein, but nucleic acids in the form of RNA and DNA can also be target structures for active molecules. An important prerequisite for binding is that the active substance has the correct size and shape to fit as well as possible into a cavity on the surface of the protein, the binding pocket. Furthermore, the surface properties of ligand and protein must match so that specific interactions can form. In 1894, Emil Fischer compared the exact fit of a substrate for the catalytic center of an enzyme to the picture of a lock and key. In 1913, Paul Ehrlich formulated the principle Corpora non agunt nisi fixata, which literally translated means “bodies do not act if they are not bound.” With this he wanted to express that drugs that are meant to kill bacteria or parasites must be “fixed,” that is, bound by certain structures. Both concepts form the starting point for rational drug research. In the broadest sense, they are valid even today. After being taken, a drug must arrive at its target tissue and interact with biological macromolecules there. Specific active substances have a high affinity for a binding site on these macromolecules and are adequately selective. It is only in this way that the desired biological effect can be deployed without extensive side effects. The most important terms that have to do with the modes of action of drugs are briefly defined in Table 4.1. These terms are described in ▶ Chaps. 
    23, “Inhibitors of Hydrolases with an Acyl–Enzyme Intermediate”; ▶ 24, “Aspartic Protease Inhibitors”; ▶ 25, “Inhibitors of Hydrolyzing Metalloenzymes”; ▶ 26, “Transferase Inhibitors”; ▶ 27, “Oxidoreductase Inhibitors”; ▶ 28, “Agonists and Antagonists of Nuclear Receptors”; ▶ 29, “Agonists and Antagonists of Membrane-Bound Receptors”; ▶ 30, “Ligands for Channels, Pores, and Transporters”; ▶ 31, “Ligands for Surface Receptors”; and ▶ 32, “Biologicals: Peptides, Proteins, Nucleotides, and Macrolides as Drugs” in detail with examples of target structures.
    G. Klebe, Drug Design, DOI 10.1007/978-3-642-17907-5_4, © Springer-Verlag Berlin Heidelberg 2013
  • 79.
    Drugs often act as inhibitors of enzymes or as agonists or antagonists on receptors. Enzyme inhibitors and receptor antagonists occupy a binding site and prevent the substrate or endogenous ligand from docking there. Agonists exhibit an additional quality, a so-called intrinsic effect. As a consequence, the receptor adopts a three-dimensional structure that triggers a response in a downstream process. Although ion channels, pores, and transport systems are also receptors in the broadest sense, they are considered as a separate group. Often the term “receptor” is used rather loosely as a general term for any biological macromolecule that interacts with a drug. Biomolecules frequently communicate with one another by recognizing each other and forming large common surface contacts. It is over these contacts that the primary attack and entry of viruses, bacteria, and parasites into the host cell take place. Many cells receive a signal via surface receptors upon binding a macromolecule. Even the rolling behavior of leukocytes in the vasculature is governed by such surface receptors. These systems are increasingly being tapped for drug therapy (▶ Chap. 
    31, “Ligands for Surface Receptors”) in that active macromolecular substances, so-called biologicals or biopharmaceuticals (▶ Chap. 32, “Biologicals: Peptides, Proteins, Nucleotides, and Macrolides as Drugs”), are more often finding application as therapeutics in our pharmaceutical arsenal.

    Table 4.1 Brief definitions of the most important terms
    | Term | Definition |
    |---|---|
    | Ligand | A (usually small) molecule that binds to a biological macromolecule |
    | Enzyme | An endogenous biocatalyst that can transform one or more substrates into one or more products |
    | Substrate | A ligand that is a starting material for an enzymatic reaction |
    | Inhibitor | A ligand that prevents the binding of a substrate either directly (competitive) or indirectly (allosteric), reversibly or irreversibly |
    | Receptor | A membrane-bound or soluble protein (or a protein complex) that initiates an effect after binding an agonist |
    | Agonist | A receptor ligand that exhibits an intrinsic effect, that is, it causes a receptor response |
    | Antagonist | A receptor ligand that either directly (competitive) or indirectly (allosteric) prevents the binding of an agonist |
    | Partial agonist | A weak agonist that has a high affinity for the binding site and in this manner also acts as an antagonist |
    | Inverse agonist | A ligand that stabilizes the inactive conformation of a receptor or ion channel |
    | Functional antagonist | A substance that prevents a receptor response by another mode of action |
    | Allosteric effector | A ligand that influences the function of a protein by causing a change in the 3D structure of the protein |
    | Ion channel | A pore in a protein that allows specific ions to flow in and out across the cell membrane along a concentration gradient. Opening and closing are triggered by binding a ligand or by a change in membrane potential |
    | Transporter | A protein that transports molecules or ions across the cell membrane against the concentration gradient by consuming energy |
    | Antimetabolite | A substance that interferes with the biosynthesis of a central metabolic product either as a false substrate or as an inhibitor |
  • 80.
    4.1 The Lock-and-Key Principle
    In the early 1890s, Emil Fischer investigated the cleavage, by different enzymes, of glucosides that differed only in the stereochemistry of the glycosidic carbon atom. He noticed that particular glucosides could only be cleaved with one group of enzymes. Other glucosides, on the other hand, could only be cleaved with another group of enzymes. He drew the correct conclusions from his observation and in 1894 formulated them in an article in the Berichte der Deutschen Chemischen Gesellschaft (Reports of the German Chemical Society):
    The limited effect of enzymes on the glucosides can also be explained by the assumption that a chemical process can be initiated only by those [enzymes] that have a similar geometric construction that approximates that of the molecule [substrates]. To use a picture, I want to say that enzymes and glucosides must fit together like a lock and key to be able to exert a chemical effect upon one another. This idea has gained plausibility and value for stereochemistry research after the phenomenon was transferred from the biological to the chemical field.
    In the same year he refined this picture:
    Apparently here the geometrical construction exerts such a large influence on the play of chemical affinities that the comparison of the two molecules undergoing an interaction seems to me to be comparable to a lock and key. If the fact that some yeasts can ferment a larger number of hexoses than others is to be explained, the picture can be completed by differentiating between master and special keys.
    Emil Fischer did not pursue this image any further, and later even complained that it is often quoted out of context. The configuration of the sugars interested him, that of the isomeric glucosides did not. 
    He expressed a rather distanced attitude to purely theoretical considerations. In 1912, he wrote in a letter, “I myself take not so much pleasure in theoretical things.” This is remarkably modest for a man who exerted such a great influence with his image of a lock and key! Emil Fischer would certainly have been pleased and proud if he had seen the results of the X-ray structural analysis of protein–ligand complexes, for instance, of retinol (vitamin A) bound to the retinol-binding protein, which is the transport protein for this molecule (Fig. 4.1). Many binding sites can discriminate with exceeding specificity between analogues that are chemically closely related. Not even the smallest mishap may occur in protein biosynthesis. Friedrich Cramer investigated more closely the recognition mechanism for the incorporation of the amino acids valine and isoleucine. These amino acids differ in their side chains only in that a methyl group is exchanged for an ethyl group. The smaller valine residue should easily fit into the “lock” for an isoleucine, though it might not bind as strongly.
  • 81.
    A clear distinction, which is absolutely necessary for an error-free protein synthesis, can only occur through repeated recognition. Indeed, that is the case. An energy-consuming, iterative, and “skeptical” auditing process reduces the error quotient to less than 1:200,000. Because of this harsh feedback and control process, even the correct binding partner is sometimes unsuccessful. Over 80% are rejected as being “dubious.” The result is a process with an accuracy of about 1:40,000. The retinol-binding protein is less selective. In this case, such extreme precision is apparently not necessary for flawless functioning. In addition to the “stretched” retinol isomer, the “folded” retinol isomer and chemically related substances also bind to the protein. Other proteins discriminate very little. Examples of less-selective proteins include digestive enzymes (▶ Sect. 23.3), metabolic enzymes (e.g., cytochromes; ▶ Sect. 27.6), and glycoprotein GP 170, which is responsible for the drug resistance of tumor cells (▶ Sect. 30.7). A bacterial transport protein, oligopeptide-binding protein A, can bind any peptide with two to five amino acids with approximately the same affinity; this represents an extreme case of “chemical promiscuity.” Linus Pauling extended the “lock-and-key” principle to the transition states of enzymatically catalyzed reactions. A flexible adaptation often occurs during the binding of the substrate. The transition state of the reaction binds more strongly to the enzyme than either the substrate or the product (▶ Sect. 22.3) and is stabilized by the functional groups of the binding site. The “lock-and-key” principle has been repeatedly challenged because of the mobility of the ligand in the binding site; but even in a high-security lock, the pins are mobile and play an essential role in the mechanism. In the 1950s, Daniel E. 
    Koshland proposed the theory of “induced fit,” which says that the ligand induces a conformational change in the protein by binding to it. The theory works under the assumption of a particular effect, for instance, the enzymatic cleavage of the substrate. This mechanism does not contradict the lock-and-key principle because, as previously stated, even a high-security lock has mobile parts. Small, induced adaptations play an essential role in the ligand–receptor complex. Even the relocation of entire protein domains has been observed. As a rule, the adaptability of the protein is related to its function. Proteins often have to be adequately flexible to fulfill their biological functions.

    Fig. 4.1 Like a key in a lock, vitamin A (retinol) fits into the binding pocket of its transport protein. The surface of the ligand is green. The protein residues in the direct vicinity of the binding pocket are visible. To improve the clarity, the back of the binding site and the residues in front of the binding site have been omitted.
  • 82.
    For the rational design of ligands, there are two fundamentally different starting points that differ in the informational content of the system. Either the exact three-dimensional structure of the binding site is known or it is unknown. In the first case, the lock is known, and the key “only” has to be cut (▶ Chap. 20, “Protein Modeling and Structure-Based Drug Design”). In the other case, the active and inactive analogues represent the fitting and ill-fitting keys. It is through the comparison of the keys and systematic variations that better-fitting keys can be designed (▶ Chap. 17, “Pharmacophore Hypotheses and Molecular Comparisons”). In the following sections, the binding of a low-molecular-weight drug (“ligand”) to a macromolecular receptor is examined more closely. These target structures for drugs can be outside or inside the cell, or they can be embedded in the cell membrane. Therefore we will briefly address the construction and function of the cell membrane before the protein–ligand interaction is brought to the foreground.
    4.2 The Essential Role of the Membrane
    The majority of biological processes in our body take place inside cells. These cells are surrounded by a membrane that protects the cellular content from “leaking”. The membrane also hinders undesirable xenobiotics from entering the cell and mediates the contacts between cells. Membranes are also found within the cell, where they form substructures (so-called compartments) and separate individual cellular components from one another. In mammalian cells, the outer membrane is made up of a lipid double layer, in which proteins and cholesterol molecules are embedded (Fig. 4.2). 
    All molecules can move relatively freely; it is therefore called a “fluid mosaic membrane.” Lipid membranes of this type function as barriers for polar substances and as permeable layers for non-polar molecules. The importance of membranes for the transport and distribution of drugs is presented in detail in ▶ Chap. 19, “From In Vitro to In Vivo: Optimization of ADME and Toxicology Properties”. Here, only the important function that the lipid membrane has for the activity of drug molecules is discussed. Membrane-embedded proteins belong to entirely different classes. Among them are the membrane-anchored and membrane-residing enzymes, the large class of G protein-coupled receptors (▶ Chap. 29, “Agonists and Antagonists of Membrane-Bound Receptors”), ion channels, pores and transporters (▶ Chap. 30, “Ligands for Channels, Pores, and Transporters”), and surface receptors (▶ Chap. 31, “Ligands for Surface Receptors”). Due to the phosphate and ethanolamine head groups, both of the outer layers of the lipid double layer are very polar. The alkyl chains are found on the inside, where the membrane is non-polar. Many drugs are also non-polar and accumulate here in higher concentration than in solution. Amphiphilic (soap-like) molecules, that is, substances that have both non-polar and polar character, arrange themselves in the membrane so that the non-polar portion is on the inside (Fig. 4.2).
  • 83.
    This orientation within the membrane plays a particularly important role when the polar group is a positively charged nitrogen atom that can form additional electrostatic interactions with the phosphate group of the lipids. This concept has since been confirmed experimentally with numerous independent methods. For many receptors it is accepted that the ligand binds at a site in the protein that is only accessible from the inner layer of the membrane (e.g., lipases, ▶ Sect. 23.7; or cyclooxygenases, ▶ Sect. 27.9). Therefore the enrichment and arrangement of an active molecule in the membrane play an important role for the optimal approach to the binding site. If the molecule, on the other hand, assumes an incorrect orientation, its docking is hindered.

    Fig. 4.2 Membranes from mammalian cells are constructed from a lipid double layer, in which proteins (yellow) and individual cholesterol molecules (black) are embedded. The individual lipid molecules (orange) point their polar groups to the exterior of the membrane, and their alkyl chains to the interior. Therefore polar drugs (light blue) accumulate on the outside of the membrane. Non-polar drugs (red) are enriched in the interior of the membrane. Amphiphilic drugs (violet) are oriented into the membrane according to their structure. Despite this, all of the molecules can move relatively freely. Therefore this is called a “fluid mosaic membrane”.
    4.3 The Binding Constant Ki Describes the Strength of Protein–Ligand Interactions
    The binding of a ligand to its target protein is measurable. The extent of the binding is characterized by the binding constant (Eq. 4.1). By definition, the dissociation constant Kd is the reciprocal of the association constant Ka. With enzymes, the so-called inhibition constant Ki is determined in a kinetic assay (▶ Sect. 7.2). At low substrate concentration, it corresponds to the inhibitor concentration that is necessary to reduce the rate of an enzyme reaction by one half. Although Ki is therefore not exactly defined as a dissociation constant, the two quantities are usually used interchangeably. In the following, the abbreviation Ki is used in the same sense as a dissociation constant, which indicates the strength of the interaction between protein and ligand. It is a thermodynamic equilibrium measure that indicates what portion of the ligand is bound to the protein, on average. The law of mass action can be expressed as:
  • 84.
    $K_i = \dfrac{[\text{ligand}]\,[\text{protein}]}{[\text{ligand--protein complex}]}$ (4.1)
    Ki has the dimensions of a concentration with the units of mol/L (M). The smaller the Ki value is, the more strongly the ligand binds to the protein. If the concentration of the ligand is significantly lower than Ki, only a very small portion of the protein molecules is occupied by ligand molecules, and a biological effect such as the inhibition of an enzyme cannot be observed. If the ligand concentration is equal to Ki, half of the available protein molecules are occupied by a ligand. The Gibbs free energy can be derived from the binding constant by a thermodynamic relationship (which is valid for equilibria under so-called standard conditions; Eq. 4.2):
    $\Delta G = RT \ln K_i$ (4.2)
    in which R is the gas constant, and T is the absolute temperature in Kelvin. A binding constant of Ki = $10^{-9}$ M = 1 nM, which is a respectable value for an active substance, corresponds to a Gibbs free energy of −53.4 kJ/mol at body temperature. A change in Ki of one order of magnitude means a change in the Gibbs free energy of 5.9 kJ/mol (or 1.4 kcal/mol). Frequently, instead of a Ki value, a so-called IC50 value is given. In contrast to the Ki value, the IC50 value depends on the concentration of the enzyme and the substrate. The obtained value is affected by the affinity of the substrate for the enzyme, as substrate and inhibitor compete for the same binding site. The IC50 value can be transformed into a Ki value by use of the Cheng-Prusoff equation. Experience has shown that, to a first approximation, the two values run parallel to one another, so that the more easily determined IC50 value is well suited to characterize a ligand in comparison to other compounds. Why is the Gibbs free energy used to describe the energetic relationships upon complex formation? In chemistry and biology, processes run in open systems under atmospheric pressure.
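As a minimal sketch of how Eqs. 4.1 and 4.2 translate into numbers, the following Python functions compute the Gibbs free energy from Ki and the fraction of protein occupied at a given ligand concentration. The function names, the body temperature of 310.15 K, and the assumption that the free ligand concentration is not noticeably depleted by binding are illustrative choices and not part of the text. The Cheng-Prusoff conversion is written in its commonly used form for a competitive inhibitor, Ki = IC50 / (1 + [S]/Km); the text names the equation but does not spell it out, so this form is an assumption about the intended case.

```python
import math

R = 8.314462618  # gas constant in J/(mol*K)


def gibbs_free_energy(ki_molar, temperature_k=310.15):
    """Gibbs free energy of binding in kJ/mol via Eq. 4.2 (dG = RT ln Ki)."""
    return R * temperature_k * math.log(ki_molar) / 1000.0


def fraction_bound(free_ligand_molar, ki_molar):
    """Fraction of protein occupied by ligand, rearranged from Eq. 4.1.

    Assumes free_ligand_molar is the free (unbound) ligand concentration.
    """
    return free_ligand_molar / (free_ligand_molar + ki_molar)


def cheng_prusoff_ki(ic50_molar, substrate_molar, km_molar):
    """Ki from IC50 for a competitive inhibitor (illustrative assumption)."""
    return ic50_molar / (1.0 + substrate_molar / km_molar)


# A 1 nM ligand at body temperature
print(round(gibbs_free_energy(1e-9), 1), "kJ/mol")
# Change in free energy per order of magnitude in Ki
print(round(gibbs_free_energy(1e-8) - gibbs_free_energy(1e-9), 1), "kJ/mol")
# Occupancy when the ligand concentration equals Ki
print(fraction_bound(1e-9, 1e-9))
```

The occupancy function makes the two limiting statements of the text explicit: well below Ki almost no protein is occupied, and at a concentration equal to Ki exactly half is occupied.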
    Because the volume of the environment is enormous, it can be assumed that the external pressure is unchanged even in processes in which production of gas occurs. Therefore these processes are considered to be under constant-pressure conditions. Nonetheless, a gas that was formed in the reaction must first find space among the surrounding particles in the air. Therefore work must be performed. This so-called pressure–volume work diminishes the maximum possible work to be achieved by the system (internal energy, ΔU). The energy diminished by the pressure–volume work is referred to as the enthalpy (ΔH). It is therefore the energy converted during a process, corrected by the portion of the pressure–volume work. The change in enthalpy is not the entire answer as to why a particular process, such as the formation of a protein–ligand complex, spontaneously occurs. If we take a hot and a cold chunk of metal and bring them into contact, everyone knows that the heat will flow from the hot metal to the cold one. The opposite cannot be observed, even though the energy content of the entire system would remain unchanged for this process. Why does energy spontaneously flow from a hot to a cold object and not the other way around?
  • 85.
    This has something to do with the tendency of all natural processes to distribute energy evenly. The metal atoms in a hot metal block vibrate very strongly around their resting positions. Therefore the piece of metal is hot. Some vibrational degrees of freedom are strongly activated. If the cold metal block is brought into contact with the hot metal, these vibrations are transmitted. In the end, the metal atoms in both blocks vibrate around their resting positions, but on average not as vigorously as the atoms in the hot block moved before. The sum of the energy content has remained constant; it is, however, now distributed over many more degrees of freedom. The system can be described as having gone into a more disordered state (many more atoms are now vibrating on average than in the beginning). This happens for all spontaneously occurring processes. The entropy, S, is used as a measure to describe the uniform distribution or random disorder. To correctly describe the process of the formation of a protein–ligand complex (Eq. 4.3), the enthalpy (ΔH) that is exchanged between the two binding partners is not enough; how the distribution over the degrees of freedom changes, and whether the system moves into a more disordered state, must also be considered. Therefore the term free energy (ΔG) has been introduced because it considers not only the energy balance of the process. It also considers the changes in entropy (TΔS) that reflect the spontaneous distribution of energy over the degrees of freedom of the system. Spontaneously occurring processes are characterized by a negative value for ΔG.
    $\Delta G = \Delta H - T\Delta S$ (4.3)
    As shown in Eq. 4.3, ΔG is composed of an enthalpic component ΔH and an entropic component TΔS. The entropic component is weighted with the temperature.
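Eq. 4.3 can be illustrated with a short numerical sketch. The thermodynamic values below are invented for illustration only (they are not measured data), and the function names are hypothetical. The example shows two binding profiles with the same ΔG, one dominated by enthalpy and one with a large entropic contribution, and how the same entropy change carries more weight at a higher temperature.

```python
def delta_g(delta_h, delta_s, temperature_k):
    """Gibbs free energy change in kJ/mol from Eq. 4.3 (dG = dH - T*dS).

    delta_h is in kJ/mol, delta_s in kJ/(mol*K), temperature_k in Kelvin.
    """
    return delta_h - temperature_k * delta_s


def is_spontaneous(delta_h, delta_s, temperature_k):
    """A process runs spontaneously when dG is negative."""
    return delta_g(delta_h, delta_s, temperature_k) < 0.0


T_BODY = 310.0  # K, illustrative body temperature

# Hypothetical enthalpy-dominated binder: dH = -50 kJ/mol, small dS
enthalpic = delta_g(-50.0, 0.010, T_BODY)
# Hypothetical binder with a large entropic share: dH = -22.1 kJ/mol, larger dS
entropic = delta_g(-22.1, 0.100, T_BODY)

print(round(enthalpic, 1), round(entropic, 1))
# The same entropy gain lowers dG further at a higher temperature
print(delta_g(-22.1, 0.100, 330.0) < entropic)
```

Both hypothetical ligands would show the same affinity in a binding assay, although the driving forces behind the binding differ.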
    It matters a great deal whether the entropy of a system is changed at low temperature, where all the particles are largely in an ordered state, or at high temperature, where the disorder is already very high. Because of the negative sign, an increase in the entropy causes a decrease in ΔG, and therefore an increase in the binding affinity.
    4.4 Important Types of Protein–Ligand Interactions
    Organic molecules can bind to proteins by forming chemical bonds between ligand and protein as well as through non-covalent interactions. For example, a chemically modified product of omeprazole reacts with its target protein and forms a covalent bond (▶ Sect. 9.5). In this section, we limit ourselves to ligands that bind to the protein by forming non-covalent interactions. For the following discussion, it is helpful to classify protein–ligand interactions into different categories. The different types of interactions are summarized in Fig. 4.3. Hydrogen bonds (H-bonds) are very frequently observed between protein and ligand. The proton-carrying partner in a biological system is usually an NH or OH group, which is termed the hydrogen-bond donor. The opposite group is an electronegative atom with a partial negative charge and is termed the hydrogen-bond acceptor.
  • 86.
    Examples of hydrogen-bond acceptors are oxygen and nitrogen atoms. Hydrogen bonds are predominantly electrostatic interactions. They achieve their extraordinary strength because the hydrogen atom of the donor group is bound to a strongly electronegative atom, whereby the electron density of the hydrogen atom is shifted to the neighboring atom. The sphere of influence of the hydrogen atom effectively becomes smaller. This, in turn, allows the acceptor to come closer to the proton than the sum of the van der Waals radii should actually allow. The electrostatic attraction between the partners therefore becomes larger. The geometry of an H-bond is shown in Fig. 4.4. A hydrogen bond is characterized by a pronounced distance and angle dependence. It is directional; its geometry is defined within narrow limits. It is often found that the charged groups of the ligand bind to the oppositely charged groups on the protein. Such ionic interactions (also known as salt bridges) are particularly strong when the two groups are separated by 2.7–3.0 Å. Frequently an ionic interaction overlaps with a hydrogen bond. This is called a charge-assisted hydrogen bond. We will see that in many protein–ligand complexes, the association is determined in large part by such ionic interactions. A few proteins contain metal ions as cofactors, for example, Zn2+ in metalloproteases (▶ Chap. 25, “Inhibitors of Hydrolyzing Metalloenzymes”). It is often the attractive interactions between the metal ion and the opposite charge in the ligand that make a decisive contribution to the affinity in these structures. Furthermore, there are a few groups that are particularly well suited to forming complexes with transition metals. Among these are the thiols RSH, hydroxamic acids RCONHOH, acid groups, and many nitrogen-containing heterocycles.

    Fig. 4.3 Frequently occurring protein–ligand interactions (shown between protein and ligand: hydrogen bonds, ionic interactions (salt bridges), metal complexation, hydrophobic interactions, and cation–π interactions). Important polar interactions are hydrogen bonds and ionic interactions. Metalloproteases contain zinc ions as a cofactor, the interaction of which with a ligand often yields important contributions to the binding affinity. Non-polar parts of the protein and ligand contribute hydrophobic interactions. Because of the particular electron distribution in aromatic rings, the interaction between unsaturated ring systems is particularly large.
  • 87.
    Whether the charge can increase the affinity contribution of hydrogen bonds depends strongly on the protonation state in which the involved functional groups are found. Drugs are usually weak acids or bases, that is, they contain so-called titratable groups (▶ Sect. 19.4). Whether these groups, for example, a carboxylic acid, an acidic sulfonamide, or a nitrogen-containing heterocycle, can release or accept a proton and transform into a charged state depends strongly on the pH. The same can happen with functional groups of the acidic or basic amino acid residues. Then these groups can form charge-assisted hydrogen bonds that provide a higher contribution to the binding affinity (Sect. 4.8). The pKa value is used to estimate whether a group is in the protonated or deprotonated state. It indicates at which pH value the two forms, which are in equilibrium with one another, are present in equal amounts. The situation might become even more complicated because the pKa value can be shifted by the local environment. In a hydrophobic environment, adopting a charged state is less favorable for acidic and basic groups, that is, a shift to less acidic or less basic character is the consequence. If an already-protonated, positively charged group in the ligand faces an amino acid of the protein with the same charge, its protonation becomes even more difficult to accomplish. The group therefore behaves as a weaker base. The opposite is the case when potentially positively charged basic groups bind in a protein environment with a negative charge. 
    Here, the charged state is even more easily formed, which corresponds to having stronger basic character. Entirely analogous considerations result for acidic groups, just with opposite signs. Here a positively charged protein environment shifts acidic groups toward higher acidity, and a negatively charged environment makes them behave less acidic. In this way the protein environment can induce a significant pKa shift of the titratable groups of the ligand. Uncharged H-bonds can become charge-supported contacts that significantly contribute to the binding affinity (▶ Sect. 21.9). With the help of electrostatic calculations an attempt can be made to estimate the pKa shift upon complex formation (▶ Sect. 15.4).

    Fig. 4.4 Geometry of a hydrogen bond. The atoms N, H, and O adopt an almost linear orientation to one another. The N···O distance is between 2.8 and 3.2 Å. The angle N–H···O is practically always larger than 150°. A large variability is observed for the C═O···H angle. It is typically between 100° and 180°.
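How strongly a pKa shift changes the charge state of a titratable group can be estimated with the standard Henderson–Hasselbalch relation, sketched below. The function names and the example pKa values (7.0 in solution, 9.0 in a negatively charged pocket) are hypothetical and chosen only to illustrate the trend described above; they do not refer to a specific ligand or protein.

```python
def fraction_protonated(pka, ph):
    """Fraction of a titratable group in its protonated form.

    Follows from the Henderson-Hasselbalch relation
    pH = pKa + log10([deprotonated] / [protonated]).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def fraction_charged(pka, ph, group):
    """Fraction of the group that carries a charge.

    A base is charged when protonated; an acid is charged when deprotonated.
    """
    protonated = fraction_protonated(pka, ph)
    if group == "base":
        return protonated
    if group == "acid":
        return 1.0 - protonated
    raise ValueError("group must be 'acid' or 'base'")


PH = 7.4  # illustrative physiological pH

# Hypothetical basic group: pKa 7.0 in water, shifted to 9.0 by a
# negatively charged protein environment (assumed values)
print(round(fraction_charged(7.0, PH, "base"), 2))
print(round(fraction_charged(9.0, PH, "base"), 2))
```

At a pH equal to the pKa the function returns one half, in line with the definition of the pKa value given above.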
  • 88.
    Hydrophobic interactions form through the close proximity between non-polar amino acid side chains of the protein and lipophilic groups on the ligand. Lipophilic groups are aliphatic or aromatic hydrocarbon groups and also halogen substituents (e.g., chlorine) and many heterocycles such as thiophene and furan (Fig. 4.5). All areas that cannot form H-bonds or other polar interactions count as lipophilic parts of the surface of a protein and ligand. In contrast to hydrogen bonds, hydrophobic interactions are not directional. It does not matter how the lipophilic groups are oriented relative to each other. The interactions between aromatic rings, for which there is indeed a preferred relative orientation, are an exception to this. It has been shown that for ligands with large lipophilic groups, hydrophobic interactions often afford a significant contribution to the binding affinity. The influence of direct attractive forces between the lipophilic groups is, however, small. The hydrophobic interactions are mainly caused by the displacement, or more correctly put, the liberation of water molecules from the lipophilic environment of the binding pocket. Moreover, the ligand with its lipophilic substituents leaves the bulk water phase in the vicinity of the protein. The solvent “cavity” in which the ligand was hosted in water collapses. This step is also coupled with changes in the free energy. The role of water molecules is discussed in Sect. 4.6. Yet another important interaction should be mentioned here. Quaternary amines evidently bind particularly well in binding pockets that are formed by the aromatic side chains of the protein. This contact is largely based on the polarization interaction between the positive charge and the π-electron system of the aromatic rings.
4.5 The Strength of Protein–Ligand Interactions

When evaluating the strength of protein–ligand interactions, it is reasonable to first consider the non-covalent interactions between isolated small molecules. Information about these is available from quantum mechanical calculations (▶ Sect. 15.5) as well as spectroscopic investigations. In this way, molecule pairs can be investigated experimentally in the gas phase. The association energies obtained for the molecules give an impression of the strength of the direct interactions. The influence of effects that originate from the liberation of the solvent water (desolvation) is, of course, missing in such experiments. Some of these data are summarized in Table 4.2.

Fig. 4.5 Typical lipophilic groups in ligands are aliphatic and aromatic hydrocarbons, halogen substituents, as well as non-polar heterocycles such as furan and thiophene. Shown are iso-pentyl, cyclohexyl, phenyl, chlorophenyl, furanyl, thiophenyl, and phenoxy groups.
The results show that electrostatic interactions are the dominating energetic factor. The interaction between a cation and an anion in a vacuum is more than 400 kJ/mol. This corresponds to the strength of a covalent bond! The binding of an ion pair in the gas phase is therefore enormous compared to the typical strength of the protein–ligand interactions in water that are summarized in Sect. 4.4. Two water molecules bind to each other with 22 kJ/mol. This interaction is also overwhelmingly electrostatic in nature: the large dipole moment is responsible for the strong binding. Interactions between small, non-polar molecules are much weaker. Two methane molecules bind to each other with about 2 kJ/mol. This is less than 10% of an H2O···H2O interaction. Correspondingly, methane boils at about 112 K, whereas water is a liquid at room temperature. The direct interactions between polar groups are therefore orders of magnitude stronger than those between non-polar groups.

4.6 Blame It All on Water!

The data presented in the previous section could suggest that protein–ligand interactions are mainly determined by H-bonds and ionic interactions. All the more astonishing is the fact that the acetate ion, CH3COO−, does not form a dimer with the guanidinium ion, H2NC(═NH2+)NH2, in water. Likewise, amides practically do not associate in water at all, even though hydrogen bonds often occur between two amide groups in protein structures. How can that be?

The answer is: water is to blame for everything! All biochemical reactions take place in water, and they occur at all only because of it. The binding of a ligand to a protein occurs in an aqueous environment. At first, the “empty” binding pocket of the protein is filled with water. A few water molecules form hydrogen bonds to the protein and are found in an energetically favorable orientation.
Other water molecules are in contact with lipophilic areas on the protein surface and cannot build a perfect hydrogen-bond network. The ligand is also solvated. When it diffuses into the binding pocket, it displaces the water molecules that are there and must additionally strip off its own solvation shell. At the same time, the “cave” in which the ligand was situated in the water phase collapses. Therefore, not only are direct interactions between protein and ligand formed; numerous H-bonds to water molecules are also broken.

Table 4.2 Experimental or quantum mechanically determined association energies in the gas phase

Dimer                 Binding energy (kJ/mol)
CH4···CH4               2
C6H6···C6H6            10
H2O···H2O              22
NH3···NH3              18
Na+···H2O              90
NH4+···CH3COO−        400
Na+···Cl−             400
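The comparisons drawn from Table 4.2 in Sect. 4.5 are simple ratios, and they can be retraced in a few lines. The sketch below only restates the tabulated values; the dictionary keys and the helper name `relative_to` are chosen here for illustration.

```python
# Gas-phase association energies from Table 4.2, in kJ/mol.
ASSOCIATION_ENERGY_KJ_MOL = {
    "CH4...CH4": 2,
    "C6H6...C6H6": 10,
    "H2O...H2O": 22,
    "NH3...NH3": 18,
    "Na+...H2O": 90,
    "NH4+...CH3COO-": 400,
    "Na+...Cl-": 400,
}

def relative_to(reference, table=ASSOCIATION_ENERGY_KJ_MOL):
    """Express every association energy as a multiple of the reference dimer."""
    reference_energy = table[reference]
    return {dimer: energy / reference_energy for dimer, energy in table.items()}

ratios = relative_to("H2O...H2O")
for dimer, ratio in ratios.items():
    print(f"{dimer:16s} {ratio:6.2f} x water dimer")
```

Expressed this way, the methane dimer comes out at less than a tenth of the water dimer, and the ion pairs at roughly eighteen times the water dimer, which is the spread the text refers to.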
We want to consider more closely the formation of a hydrogen bond as well as a lipophilic contact between the protein and ligand. Both processes are displayed in Fig. 4.6. How are H-bonds formed between protein and ligand? Let us assume that the polar groups of both partners are solvated. Then at least two water molecules must be displaced to form an H-bond between the two partners. The released water molecules can in turn form H-bonds with other water molecules. In this way, exactly as many new H-bonds are formed as are broken. The total number of H-bonds remains constant! The gain in free energy is determined by the relative strength of the different H-bonds as well as the entropic contribution, which is based on the change in the degree of order of the system (Sect. 4.7). The resulting total contribution to the free energy is difficult to predict quantitatively. If a ligand manages to form more hydrogen bonds to the protein than were possible with the solvent shell, very strong binding results. This is particularly the case if the groups forming polar H-bonds in the binding pocket of the protein are oriented in such a way that water molecules alone cannot satisfy all these interactions. The ligand can do so because it has an optimal arrangement of its donor and acceptor groups.

Fig. 4.6 The influence of water molecules on the strength of protein–ligand interactions. (a) Formation of a hydrogen bond between protein and ligand: water molecules must be displaced. These form hydrogen bonds to both protein and ligand prior to complex formation. The balance of hydrogen bonds, that is, the number of H-bonds before and after binding, remains unchanged. (b) Hydrophobic interactions: upon formation of hydrophobic contacts, water molecules are released from an environment that was unfavorable for them into the bulk water phase. The number of H-bonds increases.

The formation of a hydrophobic contact also leads to the release of water molecules that previously occupied the space in the binding pocket. Once released into the surrounding aqueous bulk phase, they form H-bonds with
each other (Fig. 4.6). Because previously H-bonds were possible neither to the protein nor to the ligand, the total number of H-bonds now increases. Moreover, the strength with which the water molecules were fixed in the binding pocket before their release is decisive. If they were strongly fixed, the newly gained translational degrees of freedom increase the disorder and therefore boost the entropy, which is thermodynamically favorable for the free energy ΔG. If the displaced water molecules were already severely disordered, their displacement causes very little entropy gain.

Newer findings have shown that the binding pocket need not always be uniformly packed with water molecules. Narrow hydrophobic pockets in particular are not perfectly solvated. This has consequences for the free energy balance during binding because it is just this displacement of water molecules that is decisive for the hydrophobic interactions.

4.7 Entropic Contributions to Protein–Ligand Interactions

In addition to the energetic contributions, the entropic component must also be considered in the evaluation of the strength of protein–ligand interactions. As described previously, the entropy S is a measure of the order of a system. It allows an estimate of how many degrees of freedom a particular amount of energy is distributed over. A degree of freedom can mean, for example, a particular vibration of the system or a rotation of individual groups around one another. A highly ordered system in which the energy is distributed over only a few degrees of freedom has little entropy; increasing the disorder increases the entropy and concomitantly decreases the free energy G.

At room or body temperature, proteins and ligands can move in all spatial directions. Furthermore, the water shell is, of course, also mobile; the water molecules diffuse back and forth.
A few of them are spatially fixed for a longer period of time because they are bound to the protein by several H-bonds. Such water molecules can be identified by X-ray crystallography of the protein. A spatial fixation of a molecule is entropically unfavorable. Other water molecules are freely mobile and are therefore not captured in an X-ray crystal structure. Such water molecules are in an entropically favorable state because their TΔS contribution is more positive than for a spatially fixed water molecule.

The hydrophobic protein–ligand interaction is, in many cases, of an entropic nature, above all when individual, previously fixed water molecules are displaced from the binding pocket and released into the surrounding bulk water. The entropic contribution to protein–ligand interactions is therefore based not on direct interactions but rather on how the number of degrees of freedom of the protein–ligand–water system changes upon ligand–protein binding. The more water molecules are released from the hydrophobic environment, the greater the contribution to the binding affinity. The number of released molecules is, in a first approximation, proportional to the size of the hydrophobic surface that is no longer accessible to water upon ligand binding, in other words, the “buried” surface. Therefore this surface contribution often serves as a benchmark for the estimation of the entropic portion.
In addition to the release of fixed water molecules, there is a further entropic contribution to the binding energy. The association of a ligand with the protein leads to a loss in translational and rotational degrees of freedom, and therefore to a loss in entropy. Before the association, the ligand and protein move freely and independently of one another. They each have three degrees of translational and three degrees of rotational freedom. After binding, the protein and ligand rotate and diffuse together, so that three degrees of translational and three degrees of rotational freedom are lost. Furthermore, a freely mobile, flexible ligand takes on different conformations (▶ Chap. 16, “Conformational Analysis”) and is therefore entropically favored. Once bound to the protein, the ligand is restricted in its conformational degrees of freedom to one or a few conformations that fit into the binding pocket of the protein. It finds itself in an entropically unfavorable state.

Different enthalpic and entropic binding contributions are summarized in Fig. 4.7. It is first assumed that the entropy TΔS contributes positively and the enthalpy ΔH contributes negatively to ΔG. If the negative enthalpic contribution over-compensates entropic losses, an overall negative ΔG results (cf. Eq. 4.3). In fact, such enthalpy-driven binding is very frequently observed, but there are also known cases, especially with large lipophilic ligands, in which the binding is entropy driven. This means that the ligand binding is enthalpically unfavorable, but the effect is over-compensated by the marked entropy increase, that is, ΔG is overall negative.

Fig. 4.7 Illustration of the thermodynamic contributions to the free energy ΔG. Before binding, the ligand can move freely; this gives rise to a certain translational and rotational entropy. Moreover, the ligand is usually flexible and adopts different conformations. Protein and ligand are solvated in that H-bonds to water molecules are formed. Some water molecules are in loose contact with the protein or the ligand without forming H-bonds. Translational and rotational degrees of freedom are lost upon binding. The concomitant loss in entropy is unfavorable for the binding. Furthermore, both the protein and the ligand must shed their water shells, which is also an unfavorable process for the binding. The binding of the ligand leads to the formation of direct interactions with the protein and it releases water molecules. Both of these are contributions that are favorable for the binding. H-bonds are indicated by dashed lines and hydrophobic interactions by dotted lines. Labeled in the figure are the receptor with bound H2O molecules, the ligand in solution with loosely associated H2O molecules and free rotation, and the ligand–receptor complex with H2O molecules that can move freely in solution.
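The bookkeeping behind these statements is the relation ΔG = ΔH − TΔS. As a minimal sketch, the following code adds the two terms and reports which of them is more favorable, using the values quoted in Fig. 4.13 (Sect. 4.10) for the thrombin inhibitors 4.6 and 4.7. The class name, the method `dominant_term`, and the rule “the more negative term dominates” are assumptions of this sketch; the rule is looser than the definition of entropy-driven binding given in the text, which requires an unfavorable enthalpy.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BindingProfile:
    """Thermodynamic signature of a ligand; all values in kJ/mol."""
    name: str
    delta_h: float          # enthalpic contribution, ΔH
    minus_t_delta_s: float  # entropic contribution, written as −TΔS

    @property
    def delta_g(self) -> float:
        # ΔG = ΔH − TΔS, with −TΔS stored as a single term
        return self.delta_h + self.minus_t_delta_s

    def dominant_term(self) -> str:
        # Illustrative rule only: the more negative term is called dominant.
        return "enthalpy" if self.delta_h < self.minus_t_delta_s else "entropy"

# Values as quoted in Fig. 4.13 for inhibitors 4.6 and 4.7.
profiles = [
    BindingProfile("4.6", delta_h=-13.6, minus_t_delta_s=-18.1),
    BindingProfile("4.7", delta_h=-40.6, minus_t_delta_s=-6.1),
]

for p in profiles:
    print(f"{p.name}: dG = {p.delta_g:.1f} kJ/mol, dominated by {p.dominant_term()}")
```

Adding the amidine group in 4.7 shifts the profile from a predominantly entropic to a predominantly enthalpic signature while the entropic term becomes smaller, which is the compensation discussed in Sect. 4.10.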
The entropy gain occurs, as mentioned, because of the release of fixed water molecules. This, however, is not the only entropy contribution that changes upon ligand binding. The protein changes too. For example, many side chains in proteins are distributed over multiple conformational states. Upon binding a ligand, this distribution can change. Depending on the overall balance, the entropy can increase or decrease through this change. The same is true for the rotation of side chains, especially methyl groups. If the rotational behavior changes, the total entropy of the ligand-binding process is influenced. The picture can be complicated further in that some areas of the protein transform into a more ordered state while others become less ordered. In this way the entropic contributions partially compensate one another.

It is often assumed that the changes in the entropic portion of the binding within a series of very similar ligands are the same. Such contributions could then be neglected in a relative comparison of ligands. Unfortunately, this simplified picture has proven to be a fallacy. Just such an example is introduced in Sect. 4.10.

4.8 What Is the Contribution of a Hydrogen Bond to the Strength of Protein–Ligand Interactions?

Naturally, in any discussion about protein–ligand interactions, the question arises as to how large the contribution of particular hydrogen bonds to the binding affinity actually is. The question can be answered experimentally when two protein–ligand complexes that differ by only one hydrogen bond are compared to one another. Such a comparison is possible, for example, by using protein mutants in which an amino acid that contributes an H-bond to the ligand is exchanged for another amino acid that cannot do this. Alan Fersht conducted an elegant experiment of this kind on the enzyme tyrosyl-tRNA synthetase in complex with the substrate tyrosyl adenylate (Fig. 4.8).
Numerous H-bonds are formed between the protein and substrate, for example, between the phenolic OH group of tyrosine 34 and the substrate. The mutant Tyr34 → Phe, in which tyrosine is replaced by a non-polar phenylalanine, was prepared, and the binding of the substrate to the mutant protein was tested. The binding was weakened by 2 kJ/mol. Other mutants were investigated analogously. The loss of a neutral H-bond led to a loss in binding affinity of between 2 and 6 kJ/mol. H-bonds in which one partner is charged are stronger. The mutation Tyr169 → Phe decreases the binding affinity by 15.6 kJ/mol.

Fidarestat 4.1 is a potent aldose reductase inhibitor (▶ Sect. 27.5). With its carboxamide group it forms a hydrogen bond to the NH function of the amide group of Leu300 (Fig. 4.9). If the leucine is exchanged for a proline, the possibility of forming the H-bond is lost because proline has no free NH group. This exchange means a loss in free energy of 7.8 kJ/mol. When the partitioning into enthalpy ΔH and entropy TΔS is measured by microcalorimetry, it can be seen that the H-bond loss is largely of an enthalpic nature (▶ Sect. 7.7). For comparison, the inhibitor sorbinil 4.2, in which the carboxamide group is missing, should be considered. Interestingly, the free energy of binding for the wild-type protein and the Leu300 → Pro mutant is practically identical. Because the group to form the
Fig. 4.8 Numerous intermolecular hydrogen bonds are formed in the complex between tyrosyl-tRNA synthetase and the substrate tyrosyl adenylate. The exchange of amino acid Tyr34 for Phe or Tyr169 for Phe leads to the situation that in each case the hydrogen bond can no longer be formed. This results in a loss of binding affinity.

Fig. 4.9 Fidarestat 4.1 (left) forms a hydrogen bond with its carboxamide group to the NH function of Leu300 (blue). By exchanging Leu for Pro (red), the H-bond can no longer be formed. This leads to a ΔΔG loss of 7.8 kJ/mol, which is paid for mostly by the enthalpy (ΔΔH: 6.9 kJ/mol). The carboxamide group is missing in sorbinil 4.2 (right). The exchange leucine → proline leaves the free energy of binding ΔΔG practically unchanged. Sorbinil, however, binds to the wild type (leucine, blue) enthalpically more favorably and entropically less favorably than to the proline mutant (red). An entrapped water molecule mediates an H-bond between sorbinil and Leu300. This brings an enthalpic advantage to the wild type of about 5 kJ/mol. At the same time, the entrapment of a water molecule is entropically disadvantageous for the wild type (−TΔΔS: about 6 kJ/mol) and compensates the enthalpic advantage. Values given in the figure for the exchange Leu300 → Pro300: fidarestat 4.1, ΔΔG: 7.8 kJ/mol, ΔΔH: 6.9 kJ/mol, −TΔΔS: 0.9 kJ/mol; sorbinil 4.2, ΔΔG: −0.8 kJ/mol, ΔΔH: 5.1 kJ/mol, −TΔΔS: −5.9 kJ/mol.
H-bond with the NH group of Leu300 is missing in sorbinil, the loss of the NH function in the protein is hardly noticeable. This explains the practically unchanged free energy of binding. Nonetheless, the sorbinil complexes with the wild-type protein and the mutant are different. The binding with the wild type is enthalpically more favorable, but it is entropically more expensive than with the mutant. The crystal structure indicates that a water molecule mediates an H-bond between the ether group of sorbinil and the NH function of Leu300 (Fig. 4.9). This yields an enthalpy gain of about 5 kJ/mol. At the same time, the uptake of water is entropically disfavored. This contribution of nearly 6 kJ/mol just compensates the enthalpic gain, so that there is practically no affinity gain in ΔG in the balance. The proline mutant cannot form a water-mediated contact to sorbinil because of the missing NH function. Therefore the enthalpic gain from the H-bond is lost. There is also no entropic loss from capturing a water molecule.

Fig. 4.10 A plot of the binding constants Ki (as −lg Ki) of 80 crystallographically investigated protein–ligand complexes against the number n of hydrogen bonds shows that Ki has no direct relationship to the number of hydrogen bonds that exist between protein and ligand.

The three-dimensional structures of a large number of protein–ligand complexes have been elucidated. Many of these complexes contain hydrogen bonds between the protein and ligand. The entire issue of the contribution of hydrogen bonds to the binding affinity becomes apparent in Fig. 4.10. Here the experimentally determined binding constants for 80 protein–ligand complexes are plotted against the number of hydrogen bonds. The measured binding constants spread over a considerable range for a given number of hydrogen bonds. The contribution of a single H-bond is therefore by no means constant, but rather it varies significantly. An H-bond can even reduce the binding affinity because of an unfavorable desolvation effect. If two ligands are compared that are only different
in the functional group that forms the H-bonds with the protein, the affinity can increase, remain the same, or even decrease.

An impressive example of the importance of hydrogen bonds is displayed by the inhibitors 4.3 of the metalloprotease thermolysin, which were synthesized in the research group of Paul Bartlett. There, a phosphonamide ─PO2NH─ was replaced by a phosphinate ─PO2CH2─ or a phosphonate ─PO2O─. The results of these exchanges are summarized in Table 4.3. Although the X-ray structure shows that the NH group forms an H-bond, it can nonetheless be replaced with a CH2 group without loss of binding affinity. This result is understandable if we consider the number of hydrogen bonds before and after ligand binding for the phosphonamide and for the phosphinate, as we did in Fig. 4.6. In both cases the number of H-bonds is unchanged. If the NH group is replaced by an oxygen atom, the binding affinity decreases by a factor of 1,000. In water, the oxygen atom that is in the place of the NH group can form a hydrogen bond to the bulk water. In the protein–ligand complex of the phosphonate ─PO2O─, the electronegative oxygen atom is found exactly opposite the oxygen of the carbonyl group of Ala113. Two acceptor groups are directly facing one another. A hydrogen bond cannot be formed here. The inventory of hydrogen bonds remains unbalanced. Furthermore, the two groups repel one another, which results in poorer binding.

Table 4.3 Binding constants Ki for the thermolysin inhibitors 4.3, which contain either a phosphonamide (X = ─NH─), a phosphonate (X = ─O─), or a phosphinate (X = ─CH2─) group. The phosphonamide group ─PO2NH─ complexes the zinc ion and simultaneously forms an H-bond with Ala113

            Binding constant Ki (µM)
R           X = ─NH─    X = ─O─    X = ─CH2─
OH          0.76        660        1.4
Gly-OH      0.27        230        0.3
Phe-OH      0.08         53        0.07
Ala-OH      0.02         13        0.02
Leu-OH      0.01          9        0.01

A comparable case is illustrated in Table 4.4. Here the binding affinities of three thrombin inhibitors 4.4 that were synthesized at Eli Lilly are compared with each other. The amine (X = ─NH─) can form an H-bond with Gly219 and binds the most strongly. The ether (X = ─O─) binds 5,000 times more weakly because of an electrostatic repulsion
between the ether oxygen atom and the carbonyl group of the protein. The aliphatic compound (X = ─CH2─) shows remarkably good binding: compared to X = ─NH─, it is reduced merely by a factor of eight (thrombin) and two (trypsin).

Table 4.4 Binding of 4.4 to the serine proteases thrombin and trypsin

            IC50 (µg/mL)
Enzyme      X = ─NH─    X = ─O─    X = ─CH2─
Thrombin    0.009       52         0.07
Trypsin     0.009       43         0.018

4.9 The Strength of Hydrophobic Protein–Ligand Interactions

We have seen that the direct attractive forces between lipophilic groups are considerably smaller than those between polar groups. Hydrophobic interactions are mainly based on the displacement of water molecules. It has been shown in many experiments that their contribution to the binding affinity is, as a first approximation, proportional to the size of the lipophilic surface that is buried upon ligand binding and therefore no longer accessible to water. Typically, the contribution is found to be approximately between 50 and 200 J/mol per Å² of lipophilic contact area. An example of this is retinol. It binds to the retinol-binding protein (Fig. 4.1) with a binding constant of 190 nM, exclusively through lipophilic contacts. This corresponds to a free energy of 39.8 kJ/mol. As a result of the binding, a lipophilic area of 250 Å² is buried. The contribution per Å² amounts to 39,800/250 = 159 J/(mol Å²).

Six HIV protease inhibitors (▶ Sect. 24.6) are listed in Fig. 4.11. During the course of a lead structure optimization, the hydrophobic surface of 4.5 was enlarged by adding hydrophobic groups. It could be confirmed crystallographically that the binding mode did not change. If the changes in the molecular volume in this series are plotted against the affinity, a linear relationship is obtained. The binding affinity increases by 65 J/(mol Å²).

In many cases, the hydrophobic interactions are a dominant contribution to the free energy of binding. In Fig. 4.12 the lipophilic surface area that is buried upon
complex formation of the same 80 protein–ligand complexes as in Fig. 4.10 is shown together with their experimentally determined binding constants. Here too, the values are scattered over a broad range.

Fig. 4.11 The scaffold of the HIV protease inhibitor 4.5 was enlarged during the course of a lead structure optimization by adding hydrophobic groups (X = H, Cl, CH3, CF3, Br, I) to the aromatic N-benzyl group. An unchanged binding mode was evidenced crystallographically. The additional molecular volume improved the binding affinity in a linear manner by about 65 J/(mol Å²).

Fig. 4.12 In analogy to Fig. 4.10, a plot of the binding constants Ki (as −lg Ki) of the 80 crystallographically investigated protein–ligand complexes against the buried hydrophobic surface area (in Å²) shows that there is no simple function for this measure either.

4.10 Binding and Mobility: Compensation of Enthalpy and Entropy

According to Eq. 4.3, enthalpy and entropy are in a close physical relationship, and their sum results in the free energy of binding. If the formation of protein–ligand
complexes is considered, the ΔG values of weakly binding millimolar complexes and strongly binding nanomolar complexes fall between ca. 35–55 kJ/mol. A lead structure optimization (▶ Chap. 8, “Optimization of Lead Structures”) usually covers an even smaller range. Typically, the binding constants are improved by 5–6 orders of magnitude, which corresponds to 25–30 kJ/mol. Upon exchanging functional groups in a lead structure, the enthalpy ΔH usually varies over a considerably broader range. If the change in ΔG is much smaller during the course of this replacement, then for purely mathematical reasons the changes in the enthalpy ΔH must be compensated by an opposite change in the entropy TΔS. Only in this way can the large variations in the two properties lead to the result that ΔG remains in a small window. An important question is derived from this: Is there a connection that causes the enthalpy and entropy, which are opponents, to partially compensate one another during the optimization? How can both measures nonetheless be optimized without their effects canceling one another out so that ΔG remains unchanged?

Entropic optimization aims at increasing the hydrophobic surface of a ligand that becomes buried upon binding. The intuitive rationale is that the enlarged ligand displaces an increasing number of water molecules upon binding. The design of a rigid ligand with correctly frozen conformational degrees of freedom usually leads to an improvement in the entropic binding contribution (▶ Sect. 24.6). To increase the enthalpic binding of a ligand to the protein, above all, additional polar interactions must be incorporated. This, however, as a rule comes at a price in that the additional polar groups must first release their solvation shell. This contribution to desolvation must be paid for.
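The conversions between binding constants and free energies used in Sects. 4.9 and 4.10 follow from ΔG = RT ln K, with K taken as a dissociation constant relative to a 1 M standard state. The sketch below retraces the retinol estimate from Sect. 4.9. The temperature of 310 K (body temperature) is an assumption, chosen because it approximately reproduces the quoted value; the function names are illustrative, and treating the quoted binding constant as a dissociation constant is likewise assumed.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # molar gas constant in kJ/(mol K)

def binding_free_energy(k_d_molar, temperature_k=310.0):
    """ΔG = RT ln(K_d / 1 M) in kJ/mol; negative for K_d below 1 M."""
    return R_KJ_PER_MOL_K * temperature_k * math.log(k_d_molar)

def fold_change_to_ddg(fold, temperature_k=310.0):
    """Free energy difference (kJ/mol) for a given ratio of binding constants."""
    return R_KJ_PER_MOL_K * temperature_k * math.log(fold)

def contribution_per_area(delta_g_kj_mol, buried_area_a2):
    """Magnitude of ΔG spread over the buried surface, in J/(mol Å²)."""
    return abs(delta_g_kj_mol) * 1000.0 / buried_area_a2

# Retinol: binding constant 190 nM, 250 Å² of buried lipophilic surface.
retinol_dg = binding_free_energy(190e-9)
retinol_per_area = contribution_per_area(retinol_dg, 250.0)
print(f"dG = {retinol_dg:.1f} kJ/mol, {retinol_per_area:.0f} J/(mol A^2)")
```

Under these assumptions the result lies close to the values quoted in the text (39.8 kJ/mol and 159 J/(mol Å²)); at 298 K the free energy would come out somewhat smaller in magnitude, so the assumed temperature matters for the last digit.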
If an amidine group is added to the para position of the unsubstituted phenyl group of the thrombin inhibitor 4.6, a significant improvement in the affinity is obtained in 4.7, which is accompanied by a strong enthalpic gain (Fig. 4.13). The inhibitor forms a salt bridge with its benzamidine group to an aspartate residue in thrombin. It is therefore strongly spatially fixed, which is entropically unfavorable. The inhibitor 4.6, which lacks the polar group, binds with a similar geometry. It cannot, however, form the salt bridge. The structure indicates an increased residual mobility of the inhibitor in the binding pocket, which is advantageous from an entropic point of view.

The two compounds 4.8 and 4.9 are also thrombin inhibitors. They differ in the size of the cycloalkyl group on the basic scaffold that fills a hydrophobic pocket of the protein. Both inhibitors have practically the same binding affinity for thrombin. However, the free energy of binding is partitioned very differently into the enthalpy and entropy components. The compound with the cyclopentyl substituent has an enthalpic advantage and an entropic disadvantage compared to the six-membered-ring derivative. Where does this surprising effect originate? The crystal structures of the two derivatives with thrombin show an important difference with regard to the cycloalkyl group. Whereas the five-membered ring is easily recognized in the electron density (▶ Sect. 13.5), practically no density at all is visible where the six-membered ring should be encountered. Such an observation in an X-ray structure indicates a high degree of disorder in a particular moiety of
the protein–ligand complex. This disorder can be of a purely static nature, whereby the six-membered ring is scattered over many orientations. Alternatively, it can also be the result of a much larger residual mobility in the protein-bound state than observed for the five-membered-ring derivative. Molecular dynamics simulations (▶ Sect. 15.7) confirmed this difference. In the case of the five-membered-ring compound, the cyclopentyl group remains in a hydrophobic pocket and from time to time undergoes a jump rotation. In doing so, the virtually planar ring jumps between two orientations and exchanges its upper and lower face. This practically does not change the placement of the ring in the pocket. Furthermore, compound 4.8 does not form a hydrogen bond to the carbonyl group of Gly216.

Fig. 4.13 Replacement of a phenyl group in 4.6 by a para-benzamidinophenyl group in 4.7 leads to a significant improvement in the affinity of this thrombin inhibitor, which is largely because of an enthalpic gain. This is because of the formation of a salt bridge to Asp189 (▶ Sect. 23.3). The homologous ligands 4.8 and 4.9 bind equally strongly to thrombin, but the binding affinity is divided into the enthalpic and entropic contributions entirely differently. Compound 4.9 has a significantly higher residual mobility in the binding pocket than 4.8, which results in an entropic advantage for this derivative, even though the poorer contacts to the protein cause an enthalpic disadvantage. Values given in the figure (the assignment of the last two rows to 4.8 and 4.9 follows the description in the text):

Compound    ΔG (kJ/mol)    ΔH (kJ/mol)    −TΔS (kJ/mol)
4.6         −31.7          −13.6          −18.1
4.7         −46.7          −40.6           −6.1
4.8         −35.4          −16.9          −18.5
4.9         −36.2          −10.5          −25.7

The six-membered-ring derivative 4.9 behaves entirely differently. Here the cyclohexyl group moves out of the binding pocket during the course of the simulation and returns after some time. At the same time, 4.9 forms
an intermittent hydrogen bond to Gly216. It is because of this that 4.9 maintains a large amount of residual mobility. This difference in the dynamic behavior of 4.8 and 4.9 explains the divergent thermodynamic profiles. The cyclopentyl derivative has an entropic disadvantage because it is largely fixed in the binding pocket. The unambiguous orientation is an advantage for the enthalpic interactions: the good and stable contacts to the protein ensure an increased contribution to the interaction energy. The situation is different for the six-membered-ring derivative. Its looser fixation in the binding pocket means a smaller loss in degrees of freedom upon complex formation. This causes an entropic advantage. Enthalpically, however, this behavior is disadvantageous. Because the ring temporarily leaves the binding pocket, interactions with the protein can only be formed with reduced strength.

What can be learned from this example? Even when ligands have a very similar structure, their binding behavior can be significantly different. Their residual mobility in the binding pocket can have decisive consequences for the thermodynamic binding contributions. Evidently, a mutual compensation of enthalpy and entropy leads to an unchanged free energy. This interplay of residual mobility in the binding pocket and quality of the formed interactions has, of course, consequences for the optimization process. Medicinal chemists like to think in terms of group contributions to binding affinity gained or lost when particular functional groups are exchanged. Statistical analyses of such group contributions have been carried out and can be applied as a rule of thumb to guide optimization strategies. The thinking is usually additive: how much is gained if a particular group is combined with another in a molecule that is to be optimized? One should be careful with these considerations. Small differences in the binding behavior cause such simple rules to fail.
Consider the optimization of the thrombin inhibitor 4.10 to 4.11 as an example (Fig. 4.14). Two changes are to be made. First, a hydrophobic substituent at the end of the molecular scaffold is enlarged from an n-propyl to a phenylethyl group, which significantly increases the hydrophobic molecular surface area. Second, an amino group introduced next to the hydrophobic group should form a hydrogen bond to Gly216. Together, the two changes from 4.10 to 4.11 improve the affinity by ΔΔG = −18.6 kJ/mol. The modifications can also be introduced sequentially via the intermediates 4.12 and 4.13. If the hydrophobic group is enlarged first, from 4.10 to 4.12, only a small amount of binding affinity (3.1 kJ/mol) is gained. If 4.12 is then optimized further to 4.11, a significant affinity gain of 15.5 kJ/mol is obtained. Does the amino group contribute so much affinity? The reverse route can also be taken: the amino group is introduced into 4.10 to give 4.13. This change improves the affinity by only ΔΔG = −9.6 kJ/mol. The final enlargement of the hydrophobic surface area from 4.13 to 4.11 yields another 9.0 kJ/mol of affinity gain. This example shows that simple additivity rules fail. As in the example of the five- and six-membered-ring derivatives 4.8 and 4.9, the balance of residual mobility, partial solvation of the binding pocket, and quality of the formed interactions exerts a decisive influence on the increase in affinity. The interplay of these
partially compensating enthalpic and entropic binding contributions is responsible for this complex picture.

4.11 Lessons for Drug Design

This chapter should not give the impression that a quantitative prediction about the strength of protein–ligand interactions is impossible. Despite the complex character of protein–ligand interactions, some simple rules should always be consulted first.

[Figure 4.14: thermodynamic cycle for the thrombin inhibitors 4.10 to 4.13. ΔG (kJ/mol): 4.10, −19.9; 4.12, −23.0; 4.13, −29.5; 4.11, −38.5. ΔΔG (kJ/mol): 4.10→4.12, −3.1; 4.12→4.11, −15.5; 4.10→4.13, −9.6; 4.13→4.11, −9.0; 4.10→4.11, −18.6.]
Fig. 4.14 Optimization of the thrombin inhibitor 4.10 to 4.11 increases affinity by ΔΔG = −18.6 kJ/mol. This is achieved by increasing the size of the hydrophobic side chain (red) from n-propyl to phenyl and attaching an amino group (blue). The changes can also be made stepwise. Increasing the hydrophobic surface to give 4.12 enhances affinity by only 3.1 kJ/mol; the major contribution of 15.5 kJ/mol is provided by the subsequently introduced amino group. Adding the amino group first to give 4.13 contributes 9.6 kJ/mol, and the subsequent enlargement of the hydrophobic substituent increases affinity by another 9.0 kJ/mol. The lack of additivity is explained by the complex interplay of residual mobility, desolvation, and strength of the formed enthalpic interactions.
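The non-additivity shown in Fig. 4.14 can be checked with a few lines of arithmetic. The following Python sketch recomputes the steps of the cycle from the ΔG values given in the figure and compares the observed total with the sum that strict additivity would predict. Only the four ΔG values come from the figure; the function and variable names are illustrative choices made here.

```python
# Gibbs free energies of binding from Fig. 4.14, in kJ/mol.
dG = {"4.10": -19.9, "4.12": -23.0, "4.13": -29.5, "4.11": -38.5}

def ddG(start, end):
    """Change in binding free energy on going from one ligand to another."""
    return round(dG[end] - dG[start], 1)

# Each modification applied on its own to the parent compound 4.10.
hydrophobic_alone = ddG("4.10", "4.12")   # enlarged hydrophobic group
amino_alone = ddG("4.10", "4.13")         # added amino group

# Strict additivity would predict the sum of the two single-step gains.
additive_prediction = round(hydrophobic_alone + amino_alone, 1)
observed = ddG("4.10", "4.11")

# A non-zero difference measures the cooperativity between the two groups.
cooperativity = round(observed - additive_prediction, 1)

print(f"hydrophobic group alone: {hydrophobic_alone} kJ/mol")
print(f"amino group alone:       {amino_alone} kJ/mol")
print(f"additive prediction:     {additive_prediction} kJ/mol")
print(f"observed 4.10 -> 4.11:   {observed} kJ/mol")
print(f"cooperativity:           {cooperativity} kJ/mol")
```

With the figure's values, the two groups together gain 5.9 kJ/mol more than the sum of their individual contributions, which is the cooperative effect the text attributes to the balance of mobility, desolvation, and interaction quality.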
• Many strong protein–ligand interactions are characterized by extensive lipophilic contacts. An increase in the lipophilic contact area between protein and ligand often improves the binding affinity. The search for unoccupied lipophilic pockets in the protein should therefore be the first step in the design and optimization of new ligands. Admittedly, this approach should not be taken too far, because a large increase in the total lipophilicity of a molecule progressively reduces its water solubility.
• An additional H-bond does not guarantee an increase in binding affinity. An H-bond contributes to the total inventory only if the participating groups interact more strongly in the protein–ligand complex than they do in bulk water. On the other hand, a buried polar atom that is left without an H-bond partner almost always leads to a loss in binding affinity. Ligand design must therefore ensure that polar atoms find binding partners if they are no longer accessible to water in the protein–ligand complex.
• Each ligand displaces water molecules upon protein binding. Some binding pockets are shaped so that they cannot be optimally solvated by water. In these cases, a ligand may be able to form more H-bonds to the protein than water can. The binding affinity of such ligands can be very high.
• Rigid ligands can bind more strongly than flexible ligands because they lose fewer internal degrees of freedom upon binding.
• Water can form strong H-bonds, but it is often not as good a ligand for transition metals as thiols, acids, hydroxamic acids, and related groups. Accordingly, a direct interaction with the metal ion is important for most proteins that contain a transition metal (▶ Chap. 25, “Inhibitors of Hydrolyzing Metalloenzymes”).
Generally, all protein–ligand interactions that cannot be replaced by water at all, or only very poorly, contribute strongly to the affinity. The relative contributions of enthalpy and entropy to the binding affinity ΔG, the actual property to be optimized in drug design, are important for the characterization of ligand binding. ΔG can be improved by optimizing the enthalpic or the entropic contribution, or ideally both in parallel. To do so, the different parameters of the protein–ligand interaction must be addressed (▶ Sect. 8.8). Whether enthalpically or entropically driven binding is advantageous for a particular drug remains an open question. The successful strategy will depend on whether the binding of the active substance must show adequate tolerance toward rapidly developing resistance mutations (▶ Sects. 24.5, ▶ 31.4, and ▶ 32.5), high target selectivity, or even a desired broad binding promiscuity toward multiple members of a protein family (▶ Sects. 25.6, ▶ 26.4, and ▶ 27.4).

4.12 Synopsis

• Emil Fischer introduced the “lock-and-key” principle to describe the interaction of a small-molecule substrate with a macromolecular receptor. More than 50 years later, Koshland extended this picture by induced-fit considerations
that allow both binding partners to change conformation and adapt to one another to interact optimally.
• Cells are surrounded by a lipid bilayer membrane with polar head groups on the exterior and hydrophobic alkyl chains in the interior. This membrane is a barrier for polar substances, but sufficiently lipophilic compounds can penetrate and even pass through the membrane.
• The strength of protein–ligand interactions is measured by the binding constant, which quantifies the stability of a protein–ligand complex as a dissociation constant according to the law of mass action for complex formation.
• The binding constant is logarithmically related to the Gibbs free energy of binding. The free energy is composed of an enthalpic and an entropic contribution. The enthalpic part summarizes all terms that relate to the interaction energy of the binding partners. The entropic part considers the order of the system and how its energy content is distributed over the degrees of freedom of the system.
• Protein–ligand complexes usually form through non-covalent interactions, predominantly through hydrogen bonds. The strength of hydrogen bonds depends strongly on the distribution of charges among the interacting functional groups. Whether a group is charged or not depends on its protonation state, which is defined by the pKa value of the titratable groups involved in the protein–ligand interactions.
• Depending on the local environment in a binding pocket, the pKa values of titratable groups can vary significantly and can thereby transform a normal H-bond into a much stronger charge-assisted H-bond.
• Hydrophobic interactions form through the close proximity of non-polar functional groups of the binding partners. As direct interactions, they are rather weak.
Nevertheless, they can make a significant contribution to binding affinity through the release of water molecules from either the lipophilic environment of the binding pocket or from the ligand surface next to a lipophilic surface patch.
• The strength of protein–ligand interactions is strongly influenced by the water environment. Both the protein binding pocket and the ligand are solvated before complex formation, and functional groups of protein and ligand form H-bonds to water molecules. The total balance of the hydrogen-bond inventory before and after complex formation matters for binding affinity considerations. A net affinity increase results only if the hydrogen bonds newly formed in the complex are more numerous and/or stronger than those previously formed to water.
• The release of water molecules from hydrophobic surface patches can increase affinity through both enthalpy and entropy. Release of fixed water molecules increases the degrees of freedom and boosts entropy. Displacement of highly disordered water molecules into the bulk water environment can contribute an enthalpic gain.
• Entropic contributions to binding arise from an increase in the degrees of freedom of the protein–ligand–water system and, as a first approximation, correlate with the size of the hydrophobic surface buried in the formed complex.
• Free energy variations are observed over a window of about 30–55 kJ/mol in protein–ligand complexes. Variations in enthalpy (ΔH) and entropy (TΔS) can
be much larger. This results from extensive enthalpy/entropy compensation. Entropically favored increases in the degrees of freedom, release of water molecules, or enhanced residual mobility are usually detrimental to improvements in the enthalpy that result from strong interactions.
• The pronounced interdependence of enthalpy and entropy, along with the interplay of dynamic and interaction-geometric phenomena, causes simple additive rules about functional group contributions to fail. Instead, pronounced cooperative effects are in operation.

Bibliography

General Literature
Andrews PR (1993) Drug-receptor interactions. In: Kubinyi H (ed) 3D-QSAR in drug design. Theory, methods and applications. Escom, Leiden, pp 13–40
Andrews PR, Craik DJ, Martin JL (1984) Functional group contributions to drug-receptor interactions. J Med Chem 27:1648–1657
Böhm HJ, Klebe G (1996) What can we learn from molecular recognition in protein-ligand complexes for the design of new drugs? Angew Chem Int Ed Engl 35:2588–2614
Böhm H-J, Schneider G (2003) Protein-ligand interactions. From molecular recognition to drug design. In: Mannhold R, Kubinyi H, Folkers G (eds) Methods and principles in medicinal chemistry. Wiley-VCH, Weinheim
Creighton TE (1992) Proteins: structures and molecular properties, 2nd edn. W.H. Freeman, New York
Gohlke H, Klebe G (2002) Approaches to the description and prediction of binding affinity of small-molecule ligands to macromolecular receptors. Angew Chem Int Ed Engl 41:2644–2676
Kuntz ID, Chen K, Sharp KA, Kollman PA (1999) The maximal affinity of ligands. Proc Natl Acad Sci USA 96:9997–10002

Special Literature
Ehrlich P (1913) Chemotherapeutics: scientific principles, methods and results. Lancet 182:445–451
Fersht AR, Shi JP, Knill-Jones J et al (1985) Hydrogen bonding and biological specificity analysed by protein engineering.
Nature 314:235–238
Gerlach C, Smolinski M et al (2007) Thermodynamic inhibition profile of a cyclopentyl- and a cyclohexyl derivative towards thrombin: the same, but for deviating reasons. Angew Chem Int Ed Engl 46:8511–8514
Lichtenthaler FW (1994) 100 Years “Schluessel-Schloss-Prinzip”: what made Emil Fischer use this analogy? Angew Chem Int Ed Engl 33:2364–2374
Mason RP, Rhodes DG, Herbette LG (1991) Reevaluating equilibrium and kinetic binding parameters for lipophilic drugs based on a structural model for drug interaction with biological membranes. J Med Chem 34:869–877
Morgan BP, Scholtz JM, Ballinger MD, Zipkin ID, Bartlett PA (1991) Differential binding energy: a detailed evaluation of the influence of hydrogen-bonding and hydrophobic groups on the inhibition of thermolysin by phosphorus-containing inhibitors. J Am Chem Soc 113:297–307
Petrova T, Steuber H et al (2005) Factorizing selectivity determinants of inhibitor binding toward aldose and aldehyde reductases: structural and thermodynamic properties of the aldose reductase mutant Leu300Pro-Fidarestat complex. J Med Chem 48:5659–5665
5 Optical Activity and Biological Effect

The three-dimensional shape of a molecule has a decisive influence on its biological activity. The configuration of a molecule is determined by the bonds between its atoms. The substances with an asymmetric center that are considered here are optically active and exist in two different forms. They are asymmetrically built and relate to one another like an image and its mirror image. They are called chiral. It is impossible to convert one form into the other without breaking and remaking bonds. Chirality is often unimportant to chemists because the image and mirror image behave exactly the same in a symmetrical environment. If they are brought into an asymmetrical environment, for instance the binding site of a protein, this is no longer true. The consequences of this for drug design and therapy are the topic of this chapter.

At the beginning of the nineteenth century, Jean-Baptiste Biot observed that some quartz crystals rotated the plane of linearly polarized light to the right, and others to the left. Macroscopically, this optical activity is imprinted in the asymmetric, handed (enantiomorphic) form of the crystals; they exist as left- and right-handed mirror-image forms. A little later, Biot found that not only crystals but also organic compounds like turpentine oil or sugar solutions rotated polarized light in a particular direction.

5.1 Louis Pasteur Sorts Crystals

The decisive experiment was carried out by the then 26-year-old Louis Pasteur in Paris in 1848. Several literature reports were inconsistent with his theory that an obvious relationship must exist between crystal forms and their optical properties. During a careful investigation of the sodium–ammonium salt of the optically inactive tartaric acid, he discovered that the crystals had different forms. They were either right- or left-handed and could be sorted by hand. The crystals of the enantiomers 5.1 and 5.2 (Fig.
5.1) gave solutions that rotated polarized light in opposite directions. This confirmed his suspicion. Before Pasteur could present his results to the Academy of Sciences, he had to repeat the experiment publicly (!) in Biot’s
G. Klebe, Drug Design, DOI 10.1007/978-3-642-17907-5_5, © Springer-Verlag Berlin Heidelberg 2013
presence at the Collège de France. He was lucky. His experiment succeeded only because his solutions were allowed to evaporate slowly at room temperature. Above the critical temperature of 28 °C, a stoichiometric 1:1 mixture of both enantiomeric forms, a racemate, would have crystallized homogeneously (Sect. 5.4).

A few years later, Pasteur made another important observation: mold contamination of a racemic tartaric acid solution caused optical activity to develop, because one enantiomer of tartaric acid is metabolized significantly faster than the other. With this, he had discovered two important methods to separate racemates into enantiomers. Whereas mechanical sorting is limited to very few examples, enzymatic kinetic resolution of enantiomers has found broad application (Sect. 5.4).

5.2 Structural Basis of Optical Activity

An explanation for optical isomerism became possible with the theory of the tetrahedral carbon atom, which was independently developed in 1874 by Jacobus

[Figure 5.1: structures of D-(−)-tartaric acid 5.1, L-(+)-tartaric acid 5.2, and meso-tartaric acid 5.3; the mirror plane between 5.1 and 5.2 and the inversion symmetry of 5.3 are marked.]
Fig. 5.1 Optical isomerism in tartaric acid. The enantiomers (−)-tartaric acid 5.1 (mp. 168–170 °C, [α]_D^20 = −12°) and (+)-tartaric acid 5.2 (mp. 168–170 °C, [α]_D^20 = +12°) cannot be superimposed upon each other either in the plane of the paper or in 3D space. They have only a twofold rotational axis (orange axes) that bisects the central C–C bond. Each mirror image rotates the plane of polarized light in the opposite direction to the other. In contrast, meso-tartaric acid 5.3 (mp. 140 °C) has an inversion center of symmetry (the purple center on the central C–C bond). Solutions of meso-tartaric acid have no optical activity because the contributions from the two stereogenic centers compensate for each other. Racemic tartaric acid (mp. 206 °C, no rotation) is a 1:1 mixture of both enantiomers of tartaric acid 5.1 and 5.2.
Such mixtures are optically inactive and are called racemates (Lat. racemus, the grape; tartaric acid is found in grapes and wine).
Henricus van’t Hoff and Joseph-Achille Le Bel. When a carbon atom carries four different substituents, an asymmetric or, as it is sometimes called, stereogenic center is produced. This property is not limited to carbon: nitrogen atoms (in ammonium salts) or silicon atoms with four different substituents, phosphorus atoms (for instance, in phosphonic or phosphoric acid esters), and even sulfur atoms in sulfoxides (with two different substituents, oxygen, and the lone electron pair) can also be asymmetric. The spatial arrangement in these compounds gives rise to two mirror-image isomers, each of which rotates polarized light to the same degree but in opposite directions. These forms are called enantiomers (formerly antipodes). With the exception of their optical activity, enantiomers are identical in all of their chemical and physicochemical properties, but only as long as they are in an achiral environment.

Compounds with two chiral centers that are configured as an image and mirror image within the same molecule do not exhibit optical activity macroscopically. meso-Tartaric acid 5.3 (Fig. 5.1), an inversion-symmetrical molecule, exists as a racemic mixture of chiral conformers. It can be regarded as an “internal” racemate because, in one energetically favored conformation, the molecule exhibits inversion symmetry: its left part can be converted into its right part by point reflection through the center of the central C–C bond.

Optical activity is also present in other forms of molecular asymmetry. One example is any regular or irregular tetrahedral arrangement of different substituents on a scaffold other than a single carbon atom. Another is found in compounds in which the rotation of two groups around a common bond is strongly hindered. An asymmetric arrangement results, giving rise to optically active rotational isomers, so-called atropisomers (Fig. 5.2).
The experimentally determined sign of rotation, (+) or (−) (previously called d or l), is used to characterize enantiomeric compounds. The spatial configuration of a stereogenic center in a molecule is described as D or L (Lat. dextro, levo). This notation is based on the Fischer convention and is related to the absolute

[Figure 5.2: structures of twistane 5.4, methaqualone 5.5, and the dibenzocycloheptadiene derivative 5.6.]
Fig. 5.2 Even molecules without stereogenic centers can form an image–mirror-image pair because of their spatial construction; an example is twistane 5.4. If rotation around the bonds is restricted, as in the case of the sedative methaqualone 5.5, the enantiomers are separable (so-called atropisomers). In non-planar fused ring systems like the dibenzocycloheptadiene derivative 5.6, the separability of the enantiomers depends on the barrier of inversion for the ring system.
configuration of D- and L-glyceraldehyde, 5.7 and 5.8 (Fig. 5.3). Most sugars, for instance glucose 5.9, can be traced back to D-glyceraldehyde 5.7, and the natural amino acids of proteins, for instance alanine 5.10, can be traced back to L-glyceraldehyde 5.8. For this reason, the D/L nomenclature is still frequently applied to sugars and amino acids today. The enantiomers of tartaric acid correspond to the D-(−) and L-(+) forms.

The Cahn–Ingold–Prelog rules allow an unambiguous stereochemical assignment (Fig. 5.4). By convention, the stereogenic center is oriented so that the substituent with the smallest atomic number (e.g., a hydrogen atom or a lone pair of electrons) is at the back. As an intuitive model, this substituent can be pictured as the column of a steering wheel; the other substituents then lie in the plane of the wheel. If these substituents are followed in descending order of atomic number and the sequence runs clockwise (to the right), the stereogenic center has the R configuration; the opposite direction gives the S configuration (from the Latin rectus and sinister). The only disadvantage of this nomenclature is that the descriptor of a stereocenter can change merely because of a difference in atomic number, valency, or oxidation state. The homologous L-amino acids serine and cysteine, which are stereochemically analogous and differ only in that an oxygen atom is exchanged for a sulfur atom, are classified as (S)-serine and (R)-cysteine.

If one stereogenic center is present in a molecule, there are two enantiomers. Each additional symmetry-independent stereocenter increases the number of

[Figure 5.3: Fischer projections and stereoprojections of D- and L-glyceraldehyde (5.7 and 5.8), together with D-glucose 5.9 and L-alanine 5.10.]
Fig.
5.3 The rotation (+ or −) and the Fischer assignment (D or L) are reported as part of the characterization of optically active compounds. To determine the Fischer assignment, the longest carbon chain is drawn vertically with the most highly oxidized carbon atom on top (e.g., 5.9). The standard is set by the asymmetric carbon (red) of the D- and L-glyceraldehyde pair (5.7 and 5.8). With sugars (e.g., glucose 5.9) or amino acids (e.g., alanine 5.10), the carbon that is marked with the arrow decides whether the molecule is D or L.
stereoisomers by a factor of 2. For n asymmetric centers, there are 2^n optical isomers. They occur as 2^(n−1) racemic mixtures because each isomer has a partner that behaves as its mirror image. Diastereomers cannot be superimposed onto each other by any translation and rotation in space or by generating a mirror image, because the chirality of the stereocenters differs relative to each other. As a result, they have different physicochemical and chemical properties. Each racemate in a diastereomeric mixture is present as a 1:1 mixture of enantiomers, but the relative portions of the racemates in the total composition can vary greatly. Labetalol 5.11 (Fig. 5.5) is such a diastereomeric mixture, consisting of two racemates, that is, two enantiomeric pairs. As a mixed antagonist, it affects the α-, β1-, and β2-adrenergic receptors (cf. ▶ Sect. 29.3). Because of the asymmetric architecture of biological macromolecules, the individual components of this mixture vary significantly in their quantitative and qualitative biological properties (Sect. 5.5, 5.7).

[Figure 5.4: Cahn–Ingold–Prelog rules, illustrated with (R)-glyceraldehyde 5.7 and (S)-glyceraldehyde 5.8.]
• Larger atomic numbers have priority over smaller ones (e.g., Br > Cl > F > O > N > C > H).
• Free electron pairs always have the lowest priority.
• Larger atomic masses have priority (e.g., for isotopes, D > H).
• If the atoms in the first sphere are identical (e.g., C), the next sphere is considered: C(C,C,C) > C(C,C,H) > C(C,H,H) > C(H,H,H).
• Multiple bonds are treated as multiple single bonds, e.g., aldehyde CHO = C(O,O,H) > CH2OH = C(O,H,H).
• If the substituents are chiral, R > S, and R,R > R,S and S,S > S,R.
• For differently configured double bonds, Z > E (Z = zusammen = together, E = entgegen = opposite).
Fig. 5.4 The R/S nomenclature that was proposed by R. S. Cahn, C. K. Ingold, and V.
Prelog is unambiguous. Priority rules for each of the four different substituents on the tetrahedral stereogenic center were established. The substituent with the lowest priority is placed at the back, and the sequence of the remaining substituents, taken in order of decreasing priority, determines the direction of rotation.
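The steering-wheel picture and the 2^n counting rule can both be sketched in a few lines of Python. The listing below is a simplified illustration, not a full Cahn–Ingold–Prelog implementation: it ranks substituents only by the attached atom and the next sphere, the bond vectors are idealized and hypothetical, and all function names are chosen here. It reproduces the (S)-serine/(R)-cysteine switch described in the text and pairs the stereoisomers of a molecule with n independent stereocenters into racemates (meso forms are not handled).

```python
from itertools import product

def cip_key(first_atom, neighbours):
    """Simplified CIP sort key: atomic number of the attached atom,
    then the atomic numbers of the next sphere in descending order."""
    return (first_atom, tuple(sorted(neighbours, reverse=True)))

def triple_product(a, b, c):
    """Signed volume (a x b) . c spanned by three bond vectors."""
    cx = (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])
    return cx[0]*c[0] + cx[1]*c[1] + cx[2]*c[2]

def descriptor(substituents):
    """substituents maps a name to (bond vector from the centre, CIP key).
    Returns 'R' or 'S' for a centre with four different substituents."""
    ranked = sorted(substituents.values(), key=lambda s: s[1], reverse=True)
    a, b, c = (vec for vec, _ in ranked[:3])  # lowest priority is left out
    # With the lowest-priority group at the back, a -> b -> c runs
    # clockwise (R) exactly when the signed volume is negative.
    return "R" if triple_product(a, b, c) < 0 else "S"

# Idealized, hypothetical bond vectors shared by L-serine and L-cysteine;
# the hydrogen atom points away from the viewer along -z.
UP, LEFT, RIGHT, BACK = (0, 1, 0.33), (-0.87, -0.5, 0.33), (0.87, -0.5, 0.33), (0, 0, -1)
H, C, N, O, S = 1, 6, 7, 8, 16

serine = {
    "NH2":   (UP,    cip_key(N, [H, H])),
    "COOH":  (LEFT,  cip_key(C, [O, O, O])),  # C=O counted as two C-O bonds
    "CH2OH": (RIGHT, cip_key(C, [O, H, H])),
    "H":     (BACK,  cip_key(H, [])),
}
cysteine = dict(serine)
del cysteine["CH2OH"]
cysteine["CH2SH"] = (RIGHT, cip_key(C, [S, H, H]))  # S outranks O

def stereoisomers(n):
    """All 2**n R/S combinations for n independent stereocentres."""
    return ["".join(p) for p in product("RS", repeat=n)]

def mirror(label):
    """Invert every centre to obtain the enantiomer."""
    return label.translate(str.maketrans("RS", "SR"))

def racemates(n):
    """Group the stereoisomers into their 2**(n-1) enantiomeric pairs."""
    return sorted({tuple(sorted((s, mirror(s)))) for s in stereoisomers(n)})

print(descriptor(serine), descriptor(cysteine))
print(racemates(2))
```

For serine and cysteine the spatial arrangement is identical; only the ranking of the side chain relative to the carboxyl group changes, which flips the descriptor. For n = 2, as in labetalol, the four isomers fall into the two pairs (R,R)/(S,S) and (R,S)/(S,R).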
5.3 The Isolation, Synthesis, and Biosynthesis of Enantiomers

Racemic acids and bases can often be separated by using other enantiomerically pure, optically active bases and acids, because the diastereomeric salts that are formed have different solubilities. The chemical reaction of racemic acids, amines, and alcohols with optically active alcohols or acids results in diastereomeric reaction products. Because of their different characteristics, it is possible to separate them and finally isolate the desired optically active product by chemical cleavage.

Syntheses that do not start with optically active starting materials, and that use no optically active auxiliaries, always lead to racemic mixtures, that is, an exact 50:50 mixture of both enantiomers. Access to optically active compounds can be obtained when synthetic reaction components are taken from the “chiral pool”. Here, all optically active natural products, their derivatives, and degradation products that are available in an optically pure form can be used as easily accessible synthetic building blocks. Syntheses with chiral catalysts are particularly elegant. In most cases, the optimization of the yield and of the enantiomeric purity, which is expressed as the ee value (ee = enantiomeric excess), requires considerable process development. The chromatographic separation of racemates on optically active solid supports is more appropriate for semipreparative or analytical purposes.

Enzymatic and biotechnological techniques have increasingly gained favor in recent years. Proteases, esterases, lipases, or hydantoinases react more or less selectively with distinctly different reaction rates, so that preferentially only one enantiomer of a racemic mixture is transformed to the product. The selectivity and yield of such a reaction can be optimized through the careful selection of the medium and other reaction conditions.
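The ee value mentioned above has a simple definition: the difference between the amounts of the two enantiomers divided by their sum. The following Python sketch computes it under that standard definition; the function name and the example amounts are illustrative and are not taken from the text.

```python
def enantiomeric_excess(amount_a, amount_b):
    """Enantiomeric excess as a fraction between 0 (racemate) and 1 (pure).

    amount_a and amount_b are the amounts of the two enantiomers in any
    common unit (moles, percent, peak areas).
    """
    total = amount_a + amount_b
    if amount_a < 0 or amount_b < 0 or total <= 0:
        raise ValueError("amounts must be non-negative and not both zero")
    return abs(amount_a - amount_b) / total

# Hypothetical compositions for illustration:
print(enantiomeric_excess(50, 50))   # racemate
print(enantiomeric_excess(95, 5))    # 95:5 mixture
print(enantiomeric_excess(1, 0))     # enantiomerically pure
```

A 95:5 mixture therefore corresponds to 90% ee, which shows why even a small amount of the minor enantiomer lowers the ee value noticeably.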
The production of optically pure ephedrine is an example of an industrial application of biotechnological synthesis that has been in use for decades. This phytopharmacon is found in combination preparations for the adjuvant therapy of

[Figure 5.5: labetalol 5.11 with its two stereogenic centers marked (*), and the four stereoisomers (R,R), (S,S), (R,S), and (S,R).]
Fig. 5.5 Because it has two different asymmetric centers, labetalol 5.11 is a diastereomeric mixture of four compounds with different activities at the same receptor. The antagonistic potencies of the (R,R)-, (R,S)-, (S,R)-, and (S,S)-isomers rank in the order S,R; S,S; R,R; R,S at the α1 receptor, R,R; R,S; S,S; S,R at the β1 receptor, and R,R; R,S; S,S; S,R at the β2 receptor.
rhinitis, bronchitis, and asthma. The synthetic intermediate 5.12 (Fig. 5.6) is obtained from a mixture of benzaldehyde, sugar, and yeast. It is then transformed to (1R,2S)-(−)-ephedrine 5.13, which is identical to the natural product at both of its stereogenic centers. The C1 epimer (1S,2S)-(+)-pseudoephedrine 5.14 is a diastereomer of ephedrine. Its optical rotation, melting point, and biological characteristics are different from those of ephedrine.

Innumerable other microbial syntheses deliver optically pure products with or without the use of achiral, racemic, or enantiomerically pure starting materials. The biotechnological syntheses of a variety of antibiotics, above all the penicillins and cephalosporins (▶ Sects. 2.4 and ▶ 23.7), are of particular economic importance. The biotechnological preparation of synthetic intermediates for chiral drugs is also gaining importance.

5.4 Lipases Separate Racemates

Because of their asymmetric architecture, lipases are well suited to separate racemates. This can happen either because one of the two enantiomers binds better as a substrate and reacts faster, or because the chemical reaction in the binding pocket of the protein proceeds with different efficiency. Lipases are often used for kinetic resolution because their architecture and their lipophilic surface allow them to retain their reactivity in organic solvents. They belong to a larger family of hydrolyzing enzymes (▶ Chap. 23, “Inhibitors of Hydrolases with an Acyl–Enzyme

[Figure 5.6: reaction scheme. Sugar is fermented by yeast to pyruvic acid; benzaldehyde and pyruvic acid give (R)-(−)-1-hydroxy-1-phenylacetone 5.12 (yeast); CH3NH2/H2/Pt then gives (1R,2S)-(−)-ephedrine 5.13; (1S,2S)-(+)-pseudoephedrine 5.14 is shown for comparison.]
Fig. 5.6 The biotechnological production of ephedrine is accomplished by the fermentation of sugar with baker’s yeast Saccharomyces cerevisiae to pyruvic acid. Pyruvic acid is coupled to benzaldehyde with decarboxylation to form (R)-(−)-1-hydroxy-1-phenylacetone 5.12.
Upon further chemical transformation, (1R,2S)-(−)-ephedrine 5.13 is obtained in optically pure form. (1S,2S)-(+)-pseudoephedrine 5.14 is a diastereomer of ephedrine. The configuration of one of the two chiral centers is different.
Intermediate”). A nucleophilic serine is present in the catalytic center that forms an acyl–enzyme complex upon hydrolysis of an amide or ester substrate. The protein is thereby itself converted to an ester through the OH group of the serine, the so-called acyl form (▶ Sect. 23.2). Such a complex can then react with another nucleophile, for instance an amine. The amine attacks the internal enzyme ester, the bond to the serine oxygen atom is broken, and a new amide bond is formed. If a racemic amine is offered, one of its two mirror-image forms reacts preferentially. In this way, the racemate is resolved.

How does the enzyme manage to distinguish between the two enantiomers of an amine? The reaction of (R)- and (S)-phenylethylamine 5.15 and 5.16 with the lipase from Candida antarctica was carefully investigated (Fig. 5.7). The energy barrier for the faster-reacting R form is lower than for the slower S form. A more exact evaluation of the kinetic parameters showed that this is due above all to an enthalpic advantage of the (R)-amine. The S form has an entropic advantage. Overall, the enthalpic component dominates, so that the free energy (ΔG) favors the R form (Fig. 5.7).

How is this discrimination to be understood? Structural transition-state analogues were synthesized. In place of the unstable tetrahedral carbon intermediate, a phosphorus atom was introduced (5.17 and 5.18, Fig. 5.8). This trick gives a stable compound that is very similar to the transition-state geometry at the carbon atom. These analogues were synthesized with both enantiomeric amines, and complexes with the lipase were prepared. Marco Bocola managed to determine crystal structures of both.

[Figure 5.7: free-energy profile along the reaction coordinate from the acyl–enzyme complex E–A via the transition states E–R and E–S to the products E+R and E+S, with the structures of (R)-amine 5.15 and (S)-amine 5.16; the differences ΔR–SG, ΔR–SH, and TΔR–SS are indicated. ΔR–SΔG = −19.4 ± 6 kJ/mol.]
Fig. 5.7 The reaction of (R)- and (S)-phenylethylamine, 5.15 and 5.16, with Candida antarctica lipase begins with the formation of an acyl–enzyme complex, E–A.
The faster-reacting R-amine 5.15 (red) forms a lower-energy transition state that leads to the free enzyme and the R-amide (E+R). Analogously, the S-amide (E+S) forms from the S-amine 5.16 via the higher-energy E–S transition state (blue). The difference in ΔG‡ is 19.4 kJ/mol and favors the R form. This ΔG‡ difference is based on a combined enthalpic and entropic contribution in which the R form is enthalpically favored and entropically disfavored. The S form is enthalpically disfavored but has an entropic advantage.
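To get a feeling for what a barrier difference of 19.4 kJ/mol means for the reaction rates, the difference can be converted into a rate ratio. The Python sketch below does this under two assumptions that are not stated in the text: the rates follow transition-state (Eyring) theory with identical prefactors for both enantiomers, and the temperature is 298.15 K. The function name and the default temperature are choices made here for illustration.

```python
import math

R_GAS = 8.314462618e-3  # gas constant in kJ/(mol K)

def rate_ratio(ddG_barrier_kJ, temperature_K=298.15):
    """Ratio k_fast / k_slow for two competing reactions whose activation
    free energies differ by ddG_barrier_kJ (a positive value in kJ/mol).

    Assumes Eyring kinetics with equal prefactors, so that
    k_fast / k_slow = exp(ddG / (R * T)).
    """
    return math.exp(ddG_barrier_kJ / (R_GAS * temperature_K))

# Barrier difference between the (S)- and (R)-amine from Fig. 5.7.
ratio = rate_ratio(19.4)
print(f"k_R / k_S is roughly {ratio:.0f} at 298.15 K (assumed temperature)")
```

Under these assumptions the (R)-amine reacts a few thousand times faster than the (S)-amine. Because the exponential is very sensitive to the barrier difference, the uncertainty quoted in the figure translates into a wide range for this ratio.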
  • 114.
    (Structural diagrams: (a) phosphorus analogue 5.18 and (b) phosphorus analogue 5.17, each bound to the catalytic Ser, with Trp104 and His224 labeled.)

    Fig. 5.8 Shown above is a phosphorus transition-state analogue 5.18 for the lipase with the (S)-amine (a). The crystal structure and simulations indicate that it is less rigidly fixed in the transition state and rarely adopts the geometry with an H-bond (purple) to histidine (on the lower edge of the binding pocket) that is necessary for the reaction to occur. The relevant complex with the transition-state analogue 5.17 of the faster-reacting (R)-amine is shown in (b). This substrate is highly restricted in the binding pocket. Its methyl group (above right) is embedded in a small niche in the binding pocket. This substrate exclusively adopts the geometry with the H-bond to histidine. This orientation is required for a successful substrate reaction. Therefore, the (R)-amine 5.15 reacts faster with the enzyme.
  • 115.
    Interestingly, the transition-state analogue of the faster-reacting R form fits into the binding pocket well (Fig. 5.8). The S form, on the other hand, showed great residual mobility in the catalytic center. Computer simulations and molecular dynamics with both forms confirmed the picture: whereas the R analogue had a well-defined and temporally stable geometry, which is ideal for the reaction, the S analogue is very mobile and rarely adopts an orientation that is productive for the catalytic reaction in the lipase. Therefore a successful reaction of this substrate occurs much less often.

    The R analogue, fixed in a vice-like clamp and waiting for its reaction, forms good enthalpic contacts with the enzyme. It takes on a form that is practically complementary to the enzyme pocket. This results in a large enthalpic advantage. The fixation has its entropic price, though. The methyl group on the stereogenic center embeds itself in a small niche in the binding pocket. The S analogue does not have this possibility because its methyl group is oriented in the mirrored direction. In this case, the anchor that can be embedded in the binding pocket is missing. The S analogue has a high mobility in the catalytic center and does not lose as many degrees of freedom compared to the situation before enzyme binding. Entropically this is advantageous. Enthalpically, however, the substrate loses a good interaction and the complementary fit is rarely achieved. In the end, the enthalpic component prevails so that the (R)-amine is transformed significantly faster. This is more than enough to ensure that, in practice, only the (R)-amide is formed in high yield.

    This lipase can also be immobilized onto a solid support and loaded into a glass column. After the acyl form is prepared on the column, a racemic mixture of the amine only needs to be poured onto the column. The (S)-amine and (R)-amide must then simply be collected in a flask.
If the solvent is well chosen, the amide crystallizes directly from the solution and can be mechanically separated.

    Interestingly, the enantiopreference of the kinetic resolution is lost with increasing temperature or enlargement of the enzyme pocket. An enlargement can be achieved by exchanging a tryptophan along the rim of the catalytic pocket for a histidine. The higher temperature or increased space in the binding pocket increases the mobility of both substrates in the lipase. The enthalpic advantage of the faster-reacting R-amine is lost. The entropic difference of both substrates levels out under these conditions.

    This example shows on a molecular level how a lipase achieves kinetic resolution. With knowledge of the energetic parameters and structural information, an attempt can be made to tailor lipases for other transformations. Because of the importance of such reactions, the targeted design of enzyme catalysts has developed into an ever more important theme for the synthesis of chiral building blocks in new drugs.

    5.5 Differences in the Activity of Enantiomers

    Flora and fauna stand out because of their symmetry. Consider the face, the arms and legs, the ribs, or an orchid flower. The exceptions, for instance a snail shell, are rare or occur, as in the case of the flounder, only under special evolutionary conditions. The inner organs of vertebrates are oriented partially paired and partially asymmetrically.
  • 116.
    On the molecular level, there is no corresponding symmetry: optically active building blocks prevail. All specific interaction partners of biologically active molecules are chiral. Enzymes and receptors are built of L-amino acids. Nucleic acids are built on a scaffold of D-ribose or D-deoxyribose building blocks. Most naturally occurring sugars have a D configuration. Important vitamins, hormones, and messengers exist in an optically homogeneous form. Accordingly, it is to be anticipated that enantiomers of an optically active ligand have different effects. This has been proven with many thousands of examples. Enantiomers most often show significant differences in their efficacy and the quality of their effect.

    According to the suggestion of Everhardus J. Ariëns, biologically active enantiomers are referred to as eutomers, and inactive enantiomers as distomers. The quotient of both affinities or effects is defined as the eudismic ratio, and the logarithm of this value is called the eudismic index. It should be considered that this value must be determined on extremely pure compounds. As little as 1% of the eutomer as an impurity in an entirely inactive distomer can simulate 1% relative activity in the distomer! The more the activities of the enantiomers in a racemic pair differ, the further the eudismic ratio deviates from 1. Examples of this are given in the compounds 5.20–5.22 (Fig. 5.9). A eudismic ratio of 500,000 was measured for a chloride ion transporter inhibitor. In this case the chemists pulled out all the stops for the purification of the less-effective enantiomer. Theoretically, a nanomolar-effective compound should give even higher values.

    A few naturally occurring peptide antibiotics contain D-amino acids. This affords them better metabolic stability. For the same reason, D-amino acids are incorporated into many synthetic peptide molecules. In the best cases, a stronger and longer-acting analogue is obtained.
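The definitions of the eudismic ratio and eudismic index, and the warning about eutomer impurities, can be made concrete with a few lines of arithmetic. This is a minimal sketch under stated assumptions: activities are expressed as potencies (larger means more active), the activity of a mixture is taken as the mole-fraction-weighted sum of the activities of its components, and all function names are hypothetical.

```python
import math


def eudismic_ratio(eutomer_potency, distomer_potency):
    """Quotient of the potencies (or affinities) of eutomer and distomer."""
    return eutomer_potency / distomer_potency


def eudismic_index(ratio):
    """Decadic logarithm of the eudismic ratio."""
    return math.log10(ratio)


def apparent_ratio(true_ratio, eutomer_fraction):
    """Eudismic ratio that would be measured if the distomer sample
    contains the given mole fraction of eutomer as an impurity.
    Assumes that activities add linearly with composition."""
    # Potency of the contaminated distomer relative to pure eutomer (= 1).
    contaminated = eutomer_fraction + (1.0 - eutomer_fraction) / true_ratio
    return 1.0 / contaminated


if __name__ == "__main__":
    true_ratio = 500_000  # value quoted in the text for a chloride transporter inhibitor
    for impurity in (0.0, 1e-6, 1e-4, 1e-2):
        measured = apparent_ratio(true_ratio, impurity)
        print(f"eutomer impurity {impurity:.0e}: measured ratio {measured:,.0f}, "
              f"index {eudismic_index(measured):.2f}")
```

Under these assumptions, 1% of eutomer in the distomer sample caps the measurable eudismic ratio at just under 100, however large the true ratio is. This is why a ratio of 500,000 can only be determined on an exceptionally pure distomer.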
Synthetic analogues of peptides with a retro–inverso configuration represent a special case. The direction of the peptide chain, or of a part of the peptide chain, is reversed in these cases, that is, compared to the original peptide, the amino and carboxyl groups of single amino acids are reversed. In order to maintain the relative configuration, D-amino acids or their analogues are used instead of L-amino acids. In this way it is possible to deceive some enzymes or receptors; they bind the natural peptide and the retro–inverso peptide in the same way. This is true for thiorphan 5.23 and its retro–inverso analogue 5.24 for two enzymes, but not for a third one (Fig. 5.10). As a general rule, retro–inverso peptides are metabolically more stable than their original peptide analogues.

    Enantiomers differ not only in the strength of their effects, but also in their quality. These differences can manifest as undesirable side effects of the antipode, for instance in the chiral barbiturate 5.25 (Fig. 5.11). The most severe drug side effects of the last 50 years were the embryonic malformations caused by the sleeping pill thalidomide 5.26 (Contergan®); these were caused by one of the two enantiomers (Fig. 5.11). In the 1950s, thalidomide was claimed to be the best-tolerated sleeping pill, with the fewest side effects. In 1957 it was introduced to the market and was available in pharmacies as an over-the-counter drug. There were no concerns about women taking these sleeping pills even in the first months of their pregnancies. In 1961 it was withdrawn from the market because of its teratogenic
  • 117.
    effects. If drug testing had been then what it is today, this catastrophe would certainly have been recognized earlier and probably largely avoided. It would not, however, have been prevented by the administration of only one enantiomer. Both enantiomers racemize in vitro, that is, one converts into the other even in a test tube.

    (Structures with eudismic ratios; asterisks mark the labeled centers. 5.19 Propranolol: β-blockade 100, membrane effect 1. 5.20 Methacholine: cholinergic effect 320. 5.21 Anticholinergic agent: ester group center 50–100, amino alcohol center 2–4. 5.22 Butaclamol, (+)-enantiomer: α1 receptor 73, D2 receptor 1250, 5-HT1 receptor 8, 5-HT2 receptor 73, muscarinic receptor 0.5.)

    Fig. 5.9 Enantiomers have different biological effects. The eudismic ratio of propranolol 5.19 is 100 for β-antagonism, and for unspecific membrane interaction it is, as expected, 1. Identical partial structures can have entirely different eudismic ratios; compare, for instance, the optical center of the alcohol moiety of the cholinergic compound methacholine 5.20 with the identical center in the anticholinergic compound 5.21. Compound 5.21 also proves that the eudismic ratios of different centers in a compound are independent of each other. The example of butaclamol 5.22 also shows that the same substance can have different eudismic ratios on different receptors.
  • 118.
    Accordingly, the effect was confirmed in vivo: administration of the supposedly safe enantiomer led to teratogenic effects in an animal model.

    The “other” enantiomer can also open new therapeutic opportunities. The enantiomer of a synthetic opiate, for instance propoxyphene 5.27 (Fig. 5.11), has weak analgesic and narcotic effects, but good cough-suppressing effects. Enantiomers can also influence each other in their effects, and even cancel one another out. In the case of the calcium channel ligand 5.28, one enantiomer is an agonist and the other is an antagonist.

    Between 1983 and 2002, 38% of all approved drugs were achiral, 39% were enantiomerically pure, and 23% were racemic or diastereomeric mixtures. Racemic mixtures of chiral drugs were much more easily accepted in earlier decades than they are today. This was certainly not caused by a stereophobia on the part of the chemical industry. It was more an expression of an inadequate understanding of stereospecificity and side effects, and perhaps also of economic considerations being in the foreground; kinetic resolution and enantiomerically pure syntheses are very expensive. The proportion of enantiomerically pure drugs in the marketplace is clearly growing (Fig. 5.12).

    In the 1970s, Ariëns was the first to decisively come out against the use of racemic mixtures in therapy. Racemates are, in his view, compounds with 50% impurity. The non-active or less-active enantiomer is seen as enantiomeric ballast. He used the diastereomeric mixture labetalol 5.11 (Fig. 5.5, Sect. 5.2) as a showcase

    Ki values in µM:
    Enzyme         5.23 Thiorphan    5.24 retro-Thiorphan
    NEP 24.11      0.0019            0.0023
    Thermolysin    1.8               2.3
    ACE            0.14              10

    Fig. 5.10 Thiorphan 5.23 inhibits the metabolism of enkephalins and contains a β-mercaptopropionic acid, the absolute configuration of which is analogous to L-phenylalanine.
Application of the retro–inverso concept gives aminothiol 5.24, the absolute configuration of which corresponds to D-phenylalanine. The identical binding mode to the zinc protease was determined for both thiorphan 5.23 and retro-thiorphan 5.24. Thermolysin and neutral endopeptidase 24.11 (NEP 24.11, previously referred to as enkephalinase) are inhibited by both compounds to the same extent. On the other hand, angiotensin-converting enzyme (ACE), another zinc protease, discriminates decidedly between these substances.
  • 119.
    example, which is not a “mixed α,β-antagonist” but rather a mixture of four different drugs. The effect of this “combination” is a result of the effects of each stereoisomer. In most cases Ariëns' criticism is fully justified. In the design and development of new drugs, it must be ensured that the biological activity is as specific as possible and that the side effects are minimal. Compound uniformity is usually easier to achieve for an enantiomer than for a racemate, which is a mixture of two substances, or even for a diastereomeric mixture.

    The choice of the correct enantiomer can even reduce or prevent undesirable side effects of metabolites. Selegiline 5.29, a monoamine oxidase inhibitor, is metabolized to the CNS-effective compounds methamphetamine 5.30 and amphetamine 5.31 (Fig. 5.13). The more-active enantiomer of 5.29 luckily forms the less-active enantiomers of these two metabolites! If the correct enantiomer of the racemate is used, the desired effect is increased and the undesired CNS side effects are reduced.

    There are also a few counterexamples. The (−)-enantiomer of the calcium channel blocker verapamil (▶ Sect. 2.6) is more effective than the (+)-enantiomer. The therapeutic spectrum of both enantiomers is practically identical. After oral application, the (−)-enantiomer is quickly metabolized. Therefore the (+)-enantiomer contributes substantially to the desired effect. In this case it would not be economical to try to separate the racemic mixture.

    (Structures: 5.25 N-methyl-5-phenyl-5-propylbarbiturate, 5.26 thalidomide, 5.27 propoxyphene, 5.28 Bay K 8644.)

    Fig. 5.11 Enantiomers also differ in their mode of action. The (R)-(−)-enantiomer of barbiturate 5.25 is a hypnotic agent, whereas the (S)-(+)-enantiomer causes seizures. In rats and mice only the (S)-(−)-enantiomer of thalidomide 5.26 (Contergan®) is teratogenic, that is, it causes embryopathies.
Thalidomide 5.26 racemizes in vitro as well as in rabbits. Therefore even the (R)-(+)-enantiomer is teratogenic in rabbits. Propoxyphene 5.27 is a potent analgesic, the effect of which depends on the (2S,3R)-(+)-enantiomer, dextropropoxyphene. The (2R,3S)-(−)-enantiomer is a cough suppressant. The (R)-(+)-enantiomer of Bay K 8644 5.28 is a weak calcium channel blocker. The (S)-(−)-enantiomer stabilizes calcium channels in the open form and is therefore an agonist, that is, a calcium channel opener.
  • 120.
    Ibuprofen 5.32, an anti-inflammatory drug of the arylpropionic acid class (Fig. 5.14 and ▶ Sect. 27.9), is a special case. The potencies of the enantiomers are very different in vitro. In vivo, however, the inactive (R)-(−)-enantiomer is converted to a large extent to the (S)-(+)-enantiomer. The reverse reaction does not take place. Therefore the racemate and the individual enantiomers are therapeutically equivalent, even at the same dose. Only the side-effect spectrum is different because the inversion of the (R)-(−)-enantiomer is not 100% complete.

    Sometimes the effort to produce a pure enantiomer is hardly justifiable. In such cases the effects and the side effects of both forms must be compared.

    Fig. 5.12 The proportion of achiral, enantiomerically pure, and racemic or diastereomeric drugs approved in the period from 1983 to 2003. In the meantime, the proportion of newly approved drugs has shifted decidedly in the direction of enantiomerically pure compounds.

    (Scheme: metabolism of 5.29 to 5.30, R = CH3, and 5.31, R = H.)

    Fig. 5.13 Upon metabolism of the monoamine oxidase inhibitor selegiline 5.29, which is used to treat Parkinson’s disease, the more potent (R)-(−)-enantiomer is converted to the (R)-enantiomers of methamphetamine 5.30 and amphetamine 5.31, which are the less CNS-active forms of these metabolites. (R)-(−)-Selegiline therefore has less severe side effects than the less-active (S)-(+)-selegiline, which is metabolized to the more strongly CNS-active stimulants.
  • 121.
    Depending on the result, in special cases the continued use of the racemate or the development of an achiral analogue can be considered. At any rate, today these data must be complete before the drug can receive approval.

    5.6 Image and Mirror Image: Why Is It Different for the Receptor?

    Enantiomers and diastereomers have different biological characteristics because the proteins to which they bind have a handedness. They occur naturally in only one form. The amino acids with their chiral centers and the secondary structural elements (▶ Sect. 14.2) with their helical rotational direction are responsible for these properties. If a protein is offered a left- or right-handed ligand, different binding modes are to be expected, just as two right hands come together to shake hands more easily than a right and a left hand can.

    Up to now only a few successful examples of the structure determination of protein–ligand complexes have been reported with the ligand bound in the left- as well as the right-handed form. This is only possible when both enantiomers have enough affinity for the target protein, that is, they both bind so strongly to the protein that an X-ray crystal structure can be determined.

    The R- and S-enantiomers of the compound BX5633 (5.33) inhibit the serine protease trypsin (▶ Sect. 23.3) equally well. They have a stereogenic center next to an acid group. The crystal structure determination explains this lack of discrimination. The inhibitor’s acid group is oriented outside of the binding pocket so that no specific interaction is to be expected (Fig. 5.15). A stereopreference cannot exist.

    Both enantiomers 5.34 and 5.35 bind to carbonic anhydrase II, a zinc hydrolase (▶ Sect. 25.7). There is a difference of a factor of 100 in their affinities. As the X-ray structure with both enantiomers shows, they have similar binding modes (Fig. 5.16). All properties that relate to the solvation of the ligands must be the same for both enantiomers.
(Structures: 5.32 (R)-(−)-form and 5.32 (S)-(+)-form; metabolic inversion from the R to the S form, no inversion in the reverse direction.)

    Fig. 5.14 The (R)-(−)-enantiomer of ibuprofen 5.32 undergoes a metabolic inversion of its stereocenter, and the (S)-(+)-enantiomer is formed. As a cyclooxygenase inhibitor in vitro, the (S)-(+)-form is more potent than the (R)-(−)-form. The less-active form is converted to the more-active enantiomer in vivo. Therefore both compounds exhibit equal anti-inflammatory properties in animal models.

    The difference in affinity is therefore only caused by differences in the binding mode. The sulfonamide groups of both enantiomeric ligands
  • 122.
    bind almost identically to the catalytic zinc. Further, the endocyclic SO2 group forms very similar hydrogen bonds to Gln92. The hydrophobic isobutyl side chains are in similar parts of the binding pocket. The six-membered ring, however, must adopt a highly strained conformation in the case of the more weakly binding enantiomer. The price for taking on this strained conformation is paid in the reduced binding affinity to the enzyme.

    The enantiomeric agonists 5.36 and 5.37 bind in the ligand-binding domain of the retinoic acid receptor with affinities that differ by a factor of 1,000 (▶ Sect. 28.2). The receptor itself adopts the same geometry (Fig. 5.17). The alcohol function in the middle of the molecule is at the stereogenic center. In both cases, the hydrogen bond to Met272 is formed. As a result, the neighboring amide must take on a deviating orientation in the binding pocket. On the “right” side, the tetralin moiety of both stereoisomers is in a similar place. On the “left” side, the benzoic acid moiety of both enantiomers forms a hydrogen-bond network with Arg278, Ser289, and Leu233. The fluorine-substituted benzene ring adopts orientations in the two complexes that are flipped by 180° relative to each other. These different orientations, together with the divergently oriented amide bond, are responsible for the large difference in the binding affinity of the mirror-image agonists.

    5.7 An Excursion in the World of Antipodes

    Experience has taught us that if an enantiomer crystallizes with a particular auxiliary base or acid, the other enantiomer will crystallize with the antipode of the auxiliary in the same way if identical reaction conditions are applied. Polypeptides composed of L-amino acids form right-handed helices, and polypeptides made of D-amino acids form left-handed helices.

    (Structure: 5.33, R,S.)

    Fig. 5.15 The (R)- (gray) and (S)-enantiomers (beige) of the inhibitor BX5633 5.33 bind with the same affinity to trypsin.
Because the protein adopts practically the same geometry with both inhibitors, only one structure is shown. The crystal structure shows that both have almost identical binding modes. The acid function on the stereogenic center points out of the binding pocket and into the surrounding aqueous medium. Therefore no stereochemical discrimination can take place.
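The affinity differences between enantiomers quoted in Sect. 5.6 (a factor of 100 for the carbonic anhydrase II inhibitors 5.34 and 5.35, a factor of 1,000 for the retinoic acid receptor agonists 5.36 and 5.37) can be translated into differences in the free energy of binding with ΔΔG = RT ln(ratio). This is only a minimal sketch, assuming that the affinities are equilibrium constants measured at 298.15 K; the temperature and the function name are illustrative assumptions.

```python
import math

R_GAS = 8.314462618e-3  # gas constant in kJ/(mol*K)


def ddg_from_affinity_ratio(affinity_ratio, temperature_k=298.15):
    """Difference in binding free energy (kJ/mol) that corresponds to a
    given ratio of binding constants of two ligands at one temperature."""
    if affinity_ratio <= 0:
        raise ValueError("affinity ratio must be positive")
    return R_GAS * temperature_k * math.log(affinity_ratio)


if __name__ == "__main__":
    # Ratios quoted in the text; 1 is included as the trypsin/BX5633 case.
    for label, ratio in (("trypsin, 5.33", 1),
                         ("carbonic anhydrase II, 5.34/5.35", 100),
                         ("retinoic acid receptor, 5.36/5.37", 1000)):
        print(f"{label}: ratio {ratio} -> {ddg_from_affinity_ratio(ratio):.1f} kJ/mol")
```

Under these assumptions, a 100-fold affinity difference corresponds to about 11 kJ/mol and a 1,000-fold difference to about 17 kJ/mol. Each factor of ten in affinity adds roughly 5.7 kJ/mol at room temperature.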
  • 123.
    Some naturally occurring peptides form ion channels in lipid layers. Their synthetic antipodes are also able to do this. The more interesting question is: how does the mirror image of an enzyme behave? In 1992 Stephen Kent and co-workers prepared HIV protease, a homodimer made up of 2 × 99 amino acids, entirely from D-amino acids. The naturally occurring protein was also prepared in parallel. The L-enzyme reacts only with L-peptide substrates and the D-enzyme reacts only with the all-D enantiomer. The same is true for chiral inhibitors of the HIV-1 protease. An achiral inhibitor, on the other hand, inhibits both enzymes in the same way.

    Rubredoxin, an electron-transport protein, was prepared as the D-protein for the sole purpose of mixing it with the naturally occurring L-protein to make the racemate! If the effort involved is considered, this is certainly an approach that takes some getting used to. The reward for the work was very high-quality crystals. The racemate crystallized in a centrosymmetric space group (▶ Sect. 13.2), which allowed a better resolution of the 3D structure than was possible with the natural, all-L enantiomer.

    (Structures: 5.34 and 5.35.)

    Fig. 5.16 The enantiomeric sulfonamides 5.34 (gray) and 5.35 (beige) bind in a similar way to the enzyme carbonic anhydrase. Because the protein adopts practically the same geometry with both inhibitors, only one structure is shown. The zinc ion in the catalytic center (purple sphere) is coordinated by the sulfonamide groups. The SO2 groups in the six-membered ring form a hydrogen bond to Gln92 (green). The hydrophobic isobutylamino moieties on the chiral centers project into a hydrophobic pocket and fill it to the same extent. In doing this, the six-membered ring must adopt a deviating conformation in both enantiomers. In one stereoisomer this conformation is much more strained than in the other, and causes a loss in binding affinity.
  • 124.
    (Structures: (R)-5.36 and (S)-5.37, each with the OH group oriented toward the sulfur of Met272.)

    Fig. 5.17 Both enantiomers of the agonists 5.36 (beige) and 5.37 (gray) bind the retinoic acid receptor with a 1,000-fold difference in affinity. Because the protein adopts practically the same geometry with both ligands, only one structure is shown. Both ligands form H-bonds with their OH groups to the sulfur in Met272. In doing so, the fluorine-substituted aromatic ring of the benzoic acid moiety on the left with its central amide bond has to adopt a deviating orientation. The tetrahydronaphthalene (tetralin) moiety, on the other hand, is positioned in the same way in both enantiomers.

    What does a visit to the mirror-image world look like? Achiral drugs would have an identical potency and mode of action. On the other hand, many enantiomerically pure drugs would be useless. We would have to watch out for chiral barbiturates such as 5.25. They would sooner cause a seizure than act as a sedative. In cases in which chiral antibiotics were used to treat bacterial infections, it would first have to be established whether the infecting bacteria came from the mirror-image world or the normal world. The administration of trimethoprim (▶ Sect. 37.2) and a sulfonamide (both achiral) would help at any rate.

    There would be tremendous problems with nutrition. Carbohydrate and protein metabolism would not work anymore, nor would the absorption of monomers from the gastrointestinal tract. We would not be able to recognize some plants by their smell. (S)-Carvone smells of caraway seeds, (R)-carvone smells of spearmint. Our beloved sugar would have lost its sweet taste, and fruit juices and
  • 125.
    lemonade would taste sour. Coffee, tea, and cola would retain their stimulatory effects because caffeine is achiral. Diet drinks would have to be sweetened with saccharin or cyclamate (both achiral) because aspartame is chiral.

    Let us return to the normal world! But first, let us have a quick glass of vodka. It could also be cognac, whisky, or a dry red wine. The taste would be the same as in the normal world, or would it not? Despite the many hundred flavor components of wine, the exchange of a single chiral center could have the consequence that a connoisseur might no longer recognize the chateau. The euphoric effects would be the same, though this would not be the case for the hard, optically active drugs such as heroin, cocaine, or LSD.

    5.8 Synopsis

    • Compounds with an asymmetric or chiral center give rise to enantiomers, two isomeric forms that relate to each other like an image and its mirror image and cannot be interconverted without breaking and reforming bonds.
    • Enantiomers exhibit the same properties as long as they are found in a non-chiral environment. If exposed to the asymmetric environment of a protein-binding site, they experience different interactions and thus produce distinct biological properties.
    • Chiral centers are mostly found at atoms carrying four different substituents, but an overall handed scaffold can also give rise to chirality. If n independent stereocenters are present, 2^n stereoisomers are produced, occurring as 2^(n−1) racemic mixtures (pairs of equally present enantiomers) that are diastereomeric to one another, as long as there is no internal inversion, mirror, or improper rotation symmetry present.
    • Chiral centers are named according to the Cahn–Ingold–Prelog priority rules, which rank the substituents in a unique sequence according to their atomic numbers. The substituent with the lowest priority has to be oriented to the back, and the sense of rotation of the remaining substituents, followed in order of decreasing priority, determines R or S.
    • Enantiomers can be separated by fractional crystallization after being converted into diastereomeric salts with appropriate chiral auxiliaries. Enzymes such as lipases, esterases, or proteases can also be used for resolution because they transform one enantiomer faster than the other for steric and kinetic reasons.
    • Most natural products are optically active and occur in just one form. Biologically active enantiomers are called eutomers, inactive ones distomers.
    • Biological activities of enantiomers and diastereomers can vary greatly in both strength and quality. Application of racemates has to be examined carefully for each individual case. Side effects, chemical stability, and deviating metabolism can have a decisive influence on the activity profile.
    • On the molecular level the affinity discrimination of enantiomers is explained by deviating binding modes in the binding pocket of the target protein, resulting in differences in the observed interaction pattern or in the strain of the adopted bound conformation.
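The counting rule from the synopsis (2^n stereoisomers forming 2^(n−1) enantiomeric pairs) can be checked by simple enumeration. The following is a minimal sketch that represents each stereoisomer only by its string of R/S descriptors. It assumes n independent stereocenters without internal symmetry, so meso forms are not handled, and the function names are hypothetical.

```python
from itertools import product


def stereoisomers(n_centers):
    """All R/S descriptor strings for n independent stereocenters."""
    return ["".join(combo) for combo in product("RS", repeat=n_centers)]


def mirror_image(descriptor):
    """Descriptor of the enantiomer: every center is inverted."""
    return descriptor.translate(str.maketrans("RS", "SR"))


def enantiomer_pairs(n_centers):
    """Group the stereoisomers into pairs of mirror images. Each pair
    corresponds to one possible racemate; members of different pairs
    are diastereomers of each other."""
    pairs = set()
    for descriptor in stereoisomers(n_centers):
        pairs.add(tuple(sorted((descriptor, mirror_image(descriptor)))))
    return sorted(pairs)


if __name__ == "__main__":
    # Two stereocenters, as in labetalol 5.11: four stereoisomers in two pairs.
    for n in (1, 2, 3):
        print(n, len(stereoisomers(n)), enantiomer_pairs(n))
```

For two stereocenters the enumeration gives the four forms RR, RS, SR, and SS, grouped into the pairs (RR, SS) and (RS, SR). This is the situation described for labetalol, which Ariëns regarded as a mixture of four different drugs.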
  • 126.
    Bibliography

    General Literature

    Ariëns EJ, Soudijn W, Timmermans PBMWM (1983) Stereochemistry and biological activity of drugs. Blackwell Scientific, Oxford
    Brown C (ed) (1990) Chirality in drug design and synthesis. Academic, London
    Caner H, Groner E, Levy L (2004) Trends in the development of chiral drugs. Drug Discov Today 9:105–110
    Eichelbaum M, Testa B, Somogyi A (2002) Handbook of experimental pharmacology, stereochemical aspects of drug action and disposition. Springer, Heidelberg
    Holmstedt B, Frank H, Testa B (1990) Chirality and biological activity. Alan R. Liss, New York
    Klebe G (2004) Differences in binding of stereoisomers to protein active sites. In: Pifat-Mrzljak G (ed) Supramolecular structure and function 8. Kluwer Academic/Plenum, New York, pp 31–53
    Smith DF (ed) (1989) CRC handbook of stereoisomers: therapeutic drugs. CRC Press, Boca Raton

    Special Literature

    Ariëns EJ (1984) Stereochemistry, a basis for sophisticated nonsense in pharmacokinetics and clinical pharmacology. Eur J Clin Pharmacol 26:663–668
    Ariëns EJ (1993) Nonchiral, homochiral and composite chiral drugs. Trends Pharmacol Sci 14:68–75
    Ariëns EJ et al (1976) Stereoselectivity and affinity in molecular pharmacology. Fortschr Arzneimittelforsch 20:101–142
    Bocola M, Stubbs MT, Sotriffer C, Hauer B, Friedrich T, Dittrich K, Klebe G (2003) Structural and energetic determinants for enantiopreferences in kinetic resolution of lipases. Protein Eng 16:319–322
    Greer J, Erickson JW, Baldwin JJ, Varney MD (1994) Application of the three-dimensional structures of protein target molecules in structure-based drug design. J Med Chem 37:1035–1054
    Jung G (1992) Proteins from the D-chiral world. Angew Chem Int Ed Engl 31:1457–1459
    Klaholz BP, Mitschler A, Belema M, Zusi C, Moras D (2000) Enantiomer discrimination illustrated by high-resolution crystal structures of the human nuclear receptor hRARγ. Proc Natl Acad Sci USA 97:6322–6327
    Mason S (1986) The origin of chirality in nature. Trends Pharmacol Sci 7:20–23, and other articles from the same author on pp 60–64, 112–116, 155–158, 200–205, 227–230 and 281–285
    Stinson SC (1994) Chiral drugs. Chem Eng News pp 38–72, and 9 Oct 1995, pp 44–74
    Stubbs MT, Huber R, Bode W (1995) Crystal structures of factor Xa-specific inhibitors in complex with trypsin: structural grounds for inhibition of factor Xa and selectivity against thrombin. FEBS Lett 375:103–107
  • 128.
    Part II The Search for the Lead Structure
  • 129.
    The starting point in the development of a new drug is the search for an appropriate lead structure for a target protein. Next, such a target structure within the genome or proteome must be validated as being relevant as a therapeutic principle. The production of the pure target structure is possible by using gene technology methods. After a high-throughput screening assay is established, thousands of test molecules can be evaluated for binding to the target protein. The X-ray crystal structure is solved and serves both the search for and the optimization of lead structures. Without techniques such as bio- and chemoinformatics, molecular modeling, and computational chemistry this type of search and optimization is unthinkable (announcement poster from the research group of the author on the occasion of a conference in 2003 in Rauischholzhausen, Marburg).
  • 130.
    The Classical Searchfor Lead Structures 6 The starting point in the search for a new drug is the lead structure. Such a substance has already a desirable biological effect, but some specific character- istics are still inadequate for its therapeutic use. The definition of the term “lead structure” also means that analogues can be prepared by targeted chemical varia- tions which produce compounds better than the lead structure in, for instance, their potency or selectivity. The goal is the optimization of all characteristics until a final substance is ready for therapeutic use. The largest part of our pharmacy originates directly or indirectly from natural products, that is, from plants, animals, or microbial sources, or from endogenous substances such as hormones and neurotransmitters. Only a few natural products have become drugs themselves. Examples include morphine, codeine, papaverin, digoxin, ephedrine, cilcosporin, and hirudin, the latter of which was isolated from leeches. Examples of endogenous drugs are the thyroid hormone T3, insulin, coag- ulation factor VIII, erythropoietin, and further proteins for substitution therapy. Most naturally occurring compounds serve as lead structures. They are chemically manipulated with the goal of optimizing their desirable characteristics and mini- mizing their side effects (▶ Chap. 8, “Optimization of Lead Structures”). Examples are found in the many natural products and endogenous receptor agonists that have been modified into selective agonists and antagonists (▶ Sects. 6.2, ▶ 6.3, ▶ 6.4, and ▶ 6.6). Drugs are also derived from enzyme substrates (▶ Sect. 6.6 and ▶ Chaps. 
23, “Inhibitors of Hydrolases with an Acyl–Enzyme Intermediate”; ▶ 24, “Aspartic Protease Inhibitors”; ▶ 25, “Inhibitors of Hydrolyzing Metalloenzymes”; ▶ 26, “Transferase Inhibitors”; ▶ 27, “Oxidoreductase Inhibitors”). These can be substrates of endogenous enzymes, for instance, enzymes that play a role in blood pressure regulation or inflammation, or substrates of enzymes from viruses, bacteria, or parasites whose metabolism is to be specifically shut down.

In the last 100 years, preparative organic chemistry has played a decisive role not only in the systematic variation of lead structures but also in lead structure discovery. The search for new active substances has delivered many drugs that have no structural relationship to endogenous examples. In other cases, the relationship between the biological effect and the mode of action was clarified long after their discovery.

G. Klebe, Drug Design, DOI 10.1007/978-3-642-17907-5_6, © Springer-Verlag Berlin Heidelberg 2013
6.1 How It Began: Hits by In Vivo Screening

The first example of discovering an active principle through testing dates from the eighteenth century and concerns the effects of digitalis. The Scottish physician William Withering, while working in England, was consulted by a patient who suffered from an extremely weak heart. After the doctor was unable to help him, the patient consulted a gypsy woman, who prescribed a herbal therapy. Impressed by the recovery of the patient, Withering sought out the woman and asked for the recipe. He received it in exchange for a handsome fee. The mixture contained an extract of the (poisonous) purple foxglove, Digitalis purpurea. The physician investigated the potency of different preparations of this plant by giving them to 163 patients! With this experiment, he established that the best formulation was made up of the dried, powdered leaves. Having observed that a toxic dose is quickly reached, he recommended that diluted preparations be administered in repeated doses until the desired effect was achieved. Even though digitalis is still used today for congestive heart failure, no one would recommend that Withering’s experimental technique be used to establish the therapeutic potential of a substance. This approach was neither ethical nor practical.

6.2 Lead Structures from Plants

The example of the previous section shows that nature has furnished plants with highly potent substances. A plethora of secondary metabolites, for example, alkaloids, terpenes, flavones, and glycosides, is also available. The constituents of about a hundred different plant species have found their way into human therapy, either directly or indirectly in the form of analogues. Traditional medicine uses about 5,000–10,000 of the several hundred thousand known species of the rich plant kingdom. Morphine, caffeine, quinine, cocaine, ephedrine, coniine, atropine, and reserpine were already mentioned in ▶ Sect.
1.1. Further plant-based pharmaceuticals that are used in therapy or that have served as lead structures for the development of medicines are compounds 6.1–6.7 (Fig. 6.1) and, in addition, emetine, pilocarpine, podophyllotoxin, and the vinca alkaloids vinblastine and vincristine.

Why do plants contain so many valuable therapeutic compounds? There is no human-related answer, because plants did not evolve so that they could become human medicines. The plants, however, had to respond to their environment, and they had to compete with other species. The decisive disadvantage of being a plant is that it cannot run away! That is not a disadvantage when it comes to reproduction. Bees take care of the first part, and aerodynamic seeds help with the rest. An effective protective mechanism against, for instance, fungal infection and pests such as caterpillars, sheep, and cattle served as a selection advantage for some plants. The substances that offer an advantage taste bitter or hot, or are toxic. They exert their effects by interacting with the enzymes or receptors
of the “enemy.” The stronger the effect, the better the protection. A successful principle of evolution is the development of defensive substances that do not kill, but cause an unpleasant experience for the predator, which in turn teaches the enemy to stay away.

Fig. 6.1 Natural products from plants that have been introduced to therapy or have served as lead structures include, in addition to the substances introduced in ▶ Sect. 1.1, tubocurarine (curare) 6.1, papaverine 6.2, digitoxin 6.3 (R = H), digoxin 6.4 (R = OH), and the related cardiac glycosides. Newer natural products from plants with great therapeutic potential include paclitaxel (Taxol®) 6.5 for tumor therapy, artemisinin 6.6 for malaria therapy (▶ Sect. 3.3), and the acetylcholinesterase inhibitor huperzine A 6.7 for the potential treatment of Alzheimer’s disease.

That is how butterflies survive that accumulate poisonous
plant-based substances in their bodies, and even those others that merely imitate the appearance of these butterflies. After the first experience with the poisonous species, birds give both species a wide berth.

Plant substances have already undergone a selection process on biologically relevant proteins; during the course of evolution they have “seen” receptors and binding sites. Furthermore, their biosynthesis takes place in the binding sites of proteins; that is, they carry functional groups that mediate affinity to a protein. Certainly, there are many plant substances that only coincidentally have a biological effect in humans. Morphine contains a basic nitrogen, a phenolic hydroxyl group, an ether bridge, and a hydrophobic domain: a medicinal chemist conceiving an active substance would also choose such a mixture of functional groups, although without the complicated ring structure.

The isolation of natural products from plants for lead discovery has been valued very differently over the last decades. Large pharmaceutical companies have repeatedly started ambitious programs to elucidate the mechanism of action of traditional medicines, only to abandon the area again in disappointment. The disappointments result from an unfavorable ratio of effort to reward. All too often only a toxin is isolated instead of a valuable lead structure, and all too often an already-known principle is found. Nonetheless, the search continues. Nature offers structural variation that the chemist can only dream of.

6.3 Lead Structures from Animal Venoms and Other Ingredients

In contrast to the plants, the evolution of animal venoms occurred with the objective of subduing prey or defending against an enemy. Many of these substances are proteins, peptides, and alkaloids. They function as potent poisons that can quickly paralyze or kill a victim.
Because of this, many active substances from animals are unsuitable for therapy, but others, for the exact same reason, are interesting lead structures. Animal products offer many surprises, as illustrated in the following two examples.

Despite its simple structure, epibatidine 6.8 (Fig. 6.2), which was isolated from the Ecuadorian poison dart frog Epipedobates tricolor, is a 100-fold more potent analgesic than morphine! It does not affect the opiate receptor; rather, it is an agonist at the nicotinic acetylcholine (nACh) receptor (▶ Sect. 30.4). That comes as no surprise when its structural similarity to nicotine 6.9 is considered. Epibatidine has a binding constant of 0.04 nM at the nACh receptor, which makes it 50-fold more potent than nicotine. Unfortunately, its analgesic effects are coupled with a pronounced reduction in body temperature (hypothermia).

Dolastatin-15 6.10 (Fig. 6.2) was isolated from the wedge sea hare, Dolabella auricularia, a marine snail. It is an interesting lead structure for antitumor compounds. Synthetic analogues of 6.10 cause the complete disappearance of tumors in some animal models. The diversity of marine animals in particular has historically been a rich source of new and interesting lead structures and modes of action.
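The affinity figures quoted above for epibatidine can be checked with a little arithmetic. The following minimal Python sketch back-calculates the binding constant that the text implies for nicotine from the stated 0.04 nM and the 50-fold ratio. The function name is hypothetical, and the sketch assumes that “binding constant” refers to a dissociation-type constant, so that a smaller value means tighter binding; the text does not say this explicitly.

```python
def implied_constant(reference_nM: float, fold_weaker: float) -> float:
    """Binding constant (nM) of a ligand that binds `fold_weaker` times more weakly.

    Assumption: dissociation-type constant, i.e. smaller value = tighter binding.
    """
    if reference_nM <= 0 or fold_weaker <= 0:
        raise ValueError("constants and fold factors must be positive")
    return reference_nM * fold_weaker


EPIBATIDINE_NM = 0.04    # binding constant stated in the text
FOLD_VS_NICOTINE = 50    # "50-fold" ratio stated in the text

# Value implied for nicotine under the assumption above (not given in the text).
nicotine_nM = implied_constant(EPIBATIDINE_NM, FOLD_VS_NICOTINE)
print(f"Implied nicotine binding constant: about {nicotine_nM:.1f} nM")
```

Under these assumptions the implied value for nicotine lies in the low nanomolar range; it is a consistency check on the quoted numbers, not a measured value.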
Other animal substances have gained importance in experimental pharmacology. Among them are the poison of the notorious fugu fish, tetrodotoxin 6.11, and the steroid alkaloid batrachotoxin 6.12 from the skin of the Colombian poison dart frog (Fig. 6.2). Whereas tetrodotoxin specifically blocks sodium channels, batrachotoxin stabilizes sodium channels in the open form.

Peptides from snake venom made a decisive contribution to the development of the antihypertensive angiotensin-converting enzyme inhibitors (▶ Sect. 25.4). Research in the area of thrombin inhibitors has in past years turned toward the active ingredient of leech saliva, hirudin. Aside from the direct use of hirudin, longer-acting derivatives, shorter peptides that bind only to the fibrinogen-binding site, and protein conjugates with other thrombin inhibitors have been derived from the structure.

Fig. 6.2 Epibatidine 6.8, a non-opiate analgesic that binds 50-fold more potently to the nicotinic acetylcholine receptor than nicotine 6.9, comes from a South American frog (▶ Sect. 30.4). Dolastatin-15 6.10, which was isolated from a marine snail, is an interesting lead structure for cancer therapeutics. The toxin of the fugu fish, tetrodotoxin 6.11, is not a lead structure but rather a sodium channel blocker for experimental (in vitro) use. The steroid alkaloid batrachotoxin 6.12 is the most potent animal venom known. The LD50 value in mice, that is, the dose necessary to kill 50% of the experimental animals within 24 h, is 200 ng/kg.
Animal and human proteins as well as polymeric carbohydrates are extraordinarily important for substitution therapies. Insulin (isolated from the pig pancreas) is at the top of the list, followed by aprotinin, a protease inhibitor (isolated from cattle lungs), digestive enzymes, and the coagulation inhibitor heparin. Now that insulin can be produced by gene technology, its isolation from animal organs has become less important. Other proteins, for example, the erythrocyte-stimulating hormone erythropoietin (▶ Sect. 29.8), human growth hormone, tissue plasminogen activator (tPA), urokinase, and factor VIII, are all manufactured by using gene technology nowadays (▶ Sect. 32.1). In this way, these proteins are available in practically unlimited quantities.

The protease ancrod, isolated from the venom of the Malayan pit viper Agkistrodon rhodostoma, cleaves fibrinogen, the precursor of fibrin, to a product that can no longer aggregate. Thus the viscosity and the coagulation ability of the blood are reduced (▶ Sect. 23.4). An elevated thrombosis risk can be significantly reduced through this mechanism. To isolate the active component of this venom, several hundred snakes have to be “milked” regularly.

6.4 Lead Structures from Microbial Organisms

When speaking of active substances from microorganisms, antibiotics must be mentioned first. The β-lactams penicillin and cephalosporin (▶ Sects. 2.4 and ▶ 23.7) are highlighted as particularly valuable lead structures. Aside from oral bioavailability, the goals of their optimization were broad-spectrum activity and metabolic stability. Tetracycline 6.13 (Fig. 6.3) was also intensively structurally modified. It attacks the ribosome during protein biosynthesis (▶ Sect. 32.6). Other microbial antibiotics, for instance, streptomycin 6.14, are used directly in therapy. The immunosuppressants ciclosporin A (▶ Sects. 4.7 and ▶ 10.1), FK 506, and rapamycin also originated from microorganisms.
Ciclosporin A is a convincing example of how difficult it is to predict the potential of a new therapeutic substance. Sandoz almost abandoned its development because of “lack of market potential.” This decision would have had fatal consequences because a large portion of the success of transplantation surgery today can be attributed to this substance. Instead, ciclosporin became one of the company’s best-selling products.

The fungus Claviceps purpurea, which grows on grain (ergot, Secale cornutum), contains toxic alkaloids. For hundreds of years, the consumption of bread that had been made from contaminated flour was the cause of severe poisonings. The structures of these alkaloids, for example, ergotamine 6.15 (Fig. 6.3), were in large part elucidated at Sandoz. Their systematic modification led to active substances for many indications, e.g., for inducing contractions during labor, migraine therapy, perfusion disorders, and arterial hypertension. Today they have little importance because of their limited therapeutic index. Another representative of this class is the hallucinogen lysergic acid diethylamide (▶ Sect. 2.5), which was discovered by accident.
Fig. 6.3 Penicillins, cephalosporins (▶ Sects. 2.4 and ▶ 23.7), and tetracycline 6.13 were important lead structures for even better antibiotics. In contrast, streptomycin 6.14 (R1 = −CH2OH, R2 = −NHCH3) is used in therapy itself. Ergotamine 6.15 is a typical representative of the ergot alkaloids, from which a plethora of different drugs have been derived. Likewise, asperlicin 6.16 is a structurally complex microbial natural product. The 10,000-fold more potent derivative devazepide 6.17 was derived from it.

Lovastatin and some analogues (▶ Sects. 9.2 and ▶ 27.3) are exceedingly important therapeutic substances that were isolated from microorganisms; they interfere with the biosynthesis of cholesterol.

Cholecystokinin (CCK) is a peptide hormone that acts at a G protein-coupled receptor (▶ Sect. 29.1). It induces multifaceted effects in the central nervous system and gastrointestinal tract. The non-peptide CCK antagonist asperlicin 6.16 (IC50 = 1.4 μM) originated from extracts of Aspergillus alliaceus. After intensive structural variation, the much simpler devazepide 6.17 (IC50 = 80 pM) was designed, which has more than
10,000-fold better affinity to the CCK receptor (Fig. 6.3). This antagonist is orally bioavailable and is an appetite stimulator.

The enzyme streptokinase for the dissolution of blood clots and bacterial collagenase for wound treatment are examples of therapeutically important proteins that were isolated from microorganisms.

6.5 Dyes and Intermediates Lead to New Drugs

In 1903, Paul Ehrlich investigated hundreds of dyes in mice that had been infected with trypanosomes. The result of this research was Nagana Red, the first drug against Trypanosoma brucei, the causative agent of cattle trypanosomiasis. Other dyes followed, as did colorless compounds that contained amide instead of azo groups. It was only in 1916, after Ehrlich’s death, that Bayer, having investigated more than a thousand analogues, produced its wonder drug suramin (Germanin®) 6.18 (Fig. 6.4).

The work in this area led to the discovery of the antibacterial sulfonamides in the 1930s (▶ Sect. 2.3). Thousands, if not tens of thousands, of analogues were synthesized and tested. Many were introduced to the market. Depending on the structure, they cover an extraordinarily broad spectrum of different pharmacokinetic characteristics.

No actual biological activity was expected from the synthetic intermediates. They were seen merely as starting material for the desired end product. Despite this, many intermediates were routinely tested for biological activity, and it was a good thing too!

Fig. 6.4 Bayer’s suramin 6.18, which is also known as E 205 or Germanin®, had strategic importance for the colonies. An English engineer who was suffering from African sleeping sickness (trypanosomiasis) and was near death despite aggressive treatment with diverse antimony and arsenic preparations was cured after a few injections of this substance.
The solvent for the preparation of the intravenous injection solution was rain water in the tropical clinical trials(!). After a short time, suramin was considered to be a “wonder drug.” Despite the fact that the structure was kept secret, French researchers worked out their own synthesis within a short time. Suramin is still used for the treatment of trypanosomiasis because it has good efficacy and a long-lasting effect.
Gerhard Domagk, the discoverer of sulfonamides (▶ Sect. 2.3), investigated just such a synthetic intermediate in addition to the many end-target substances and found a surprisingly good effect against tuberculosis. Structural optimization afforded thiacetazone 6.19 (Fig. 6.5), which unfortunately turned out to be hepatotoxic. In the search for a follow-up substance, Bayer started a concerted program with 5,000 compounds. In 1951 another synthetic intermediate showed surprisingly potent tuberculostatic activity. Isoniazid 6.20 (Fig. 6.5) was 15 times more active than the best antituberculosis antibiotic at the time, streptomycin 6.14 (Fig. 6.3). The discovery was in the air. Two other research groups, both in the USA, simultaneously and independently discovered the effect of this substance, which, upon enzymatic radical generation, irreversibly binds to the cofactor NADH of a fatty-acid-synthesizing enzyme of the tuberculosis bacillus. The hypothesis that metabolic cleavage yields isonicotinic acid 6.21, which in turn exerts its effect by acting as an antimetabolite of nicotinic acid 6.22 (Fig. 6.5), was evidently wrong.

Inhibitors of the enzyme dihydrofolate reductase, for instance, methotrexate 6.23 (Fig. 6.6), are used in the treatment of leukemia (▶ Sect. 27.2). During the investigation of analogues, a simple synthetic intermediate, mercaptopurine 6.24, was tested. It showed efficacy, but was too toxic. Further development delivered azathioprine 6.25, which releases mercaptopurine in the organism (Fig. 6.6). As an immunosuppressive, azathioprine was even better than the then-used corticosteroids (▶ Sect. 28.5). Until the introduction of ciclosporin (▶ Sect. 10.1) it was used in all organ transplantations. Another intermediate from this class, allopurinol 6.26 (Fig. 6.6), is a xanthine oxidase inhibitor. It is used for the treatment of gout.
Fig. 6.5 Thiacetazone 6.19 and isoniazid 6.20 (R = −NH−NH2) are tuberculostatics that originated as synthetic intermediates. Isoniazid penetrates the cell wall and irreversibly binds to the enzymatic cofactor NADH after radical generation. The originally accepted hypothesis that, upon metabolic degradation to isonicotinic acid 6.21 (R = −OH), it acts as an antimetabolite for nicotinic acid 6.22, proved to be incorrect.

6.6 Mimicry: How to Copy Endogenous Ligands

As of the middle of the nineteenth century, biological substances, enzyme substrates, neurotransmitters, and hormones were increasingly being used as
archetypes for new medicines. The directed design of drugs from these lead structures led to the “golden age” of pharmaceutical research (▶ Sect. 1.4).

Fig. 6.6 Simple synthetic intermediates to methotrexate 6.23 turned out to be new drugs. Mercaptopurine 6.24 and azathioprine 6.25 are immunosuppressants, and allopurinol 6.26 is used to treat gout.

Fig. 6.7 Examples of substrate, transition state, and groups that imitate the enzymatic transition state of an amide hydrolysis reaction. A few of the groups reversibly form covalent bonds to the serine in the catalytic pocket of a serine protease (see ▶ Sect. 23.2).

The principal approach is demonstrated here using the example of enzyme inhibitors. Enzymes catalyze chemical reactions by stabilizing the transition state of the reaction. In doing so, they decrease the activation energy, and the reaction can proceed at a lower temperature (▶ Sect. 22.3). This principle can be exploited particularly well for the optimization of enzyme inhibitors. Starting from knowledge of the reaction mechanism, groups are built into the substrate that are structurally analogous to the transition state (Fig. 6.7). They imitate it but do not lead to
a product. In this way, a substrate can be converted into a potent and selective inhibitor in a single step through an entirely purposeful chemical transformation. The correct inhibitor binding geometry improves the affinity by several orders of magnitude. The two natural products pentostatin 6.29 and nebularine 6.30 (Fig. 6.8) are inhibitors of the enzymatic transformation of adenosine 6.27 to inosine 6.28 and impressive examples of transition-state mimetics. The introduction of a hydroxyl group with the correct stereochemistry increased the affinity of the ligand to the enzyme by many orders of magnitude.

Never before was the search for new drugs as successful as it was in the two to three decades of the “golden age.” Subsequently the success rate fell. Research became more expensive and laborious. How can this be explained? Because of the success during this period, many indication areas achieved a very high standard of care. That makes it difficult for modern research to be as successful as before, even with the use of superior tools. Other reasons include higher requirements for efficacy and safety.

Fig. 6.8 Pentostatin 6.29 and nebularine 6.30 inhibit the enzymatic transformation of adenosine 6.27 to inosine 6.28 by adenosine deaminase. Pentostatin 6.29 binds 7 orders of magnitude more strongly than the substrate adenosine (Ki = 2.5 pM), and the active form of 6.30 binds even 10 orders of magnitude more strongly (Ki = 0.3 pM). The structures of pentostatin as well as the active form of nebularine correspond to the hypothetical transition state of the enzymatic reaction.
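To give a feeling for what “several orders of magnitude” in affinity means energetically, the following minimal Python sketch converts an inhibition constant into a standard binding free energy with the textbook relation ΔG° = RT ln(Ki / 1 M). It uses the Ki of 2.5 pM and the 7 orders of magnitude quoted for pentostatin in Fig. 6.8. The temperature of 298.15 K, the 1 M standard state, the treatment of Ki as a dissociation constant, and the function names are assumptions made for this illustration; none of them is specified in the text.

```python
import math

R = 8.314462618e-3   # gas constant in kJ/(mol*K)
T = 298.15           # assumed temperature in K (25 °C); not stated in the text


def binding_free_energy(ki_molar: float, temperature: float = T) -> float:
    """Standard binding free energy in kJ/mol: dG = R*T*ln(Ki / 1 M)."""
    if ki_molar <= 0:
        raise ValueError("Ki must be positive")
    return R * temperature * math.log(ki_molar)


def energy_per_orders(orders: float, temperature: float = T) -> float:
    """Free-energy gain in kJ/mol for an affinity gain of `orders` powers of ten."""
    return R * temperature * math.log(10.0) * orders


dg_pentostatin = binding_free_energy(2.5e-12)   # Ki = 2.5 pM (Fig. 6.8)
gain_7_orders = energy_per_orders(7)            # "7 orders of magnitude"

print(f"dG(pentostatin)       = {dg_pentostatin:.1f} kJ/mol")
print(f"7 orders of magnitude = {gain_7_orders:.1f} kJ/mol")
```

At 25 °C each factor of ten in affinity corresponds to roughly 5.7 kJ/mol, so under these assumptions a gain of seven orders of magnitude amounts to about 40 kJ/mol of additional binding free energy.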
6.7 Side Effects Indicate New Therapeutic Options

Many drugs came from the observation of side effects during clinical or practical use (see ▶ Sect. 2.8). The diuretic effects of mercury compounds were discovered purely by accident (▶ Sect. 30.9). In 1919 physicians in the First Medical University Hospital in Vienna were testing a new treatment for syphilis. In a 21-year-old woman, urine production was observed to increase from 200–500 mL a day to 1.2–2.0 L on the third day of treatment with the test substance. This result led to the development of the first effective diuretic (a medicine to increase urine production). Fortunately, we are no longer dependent on extremely toxic mercury compounds for the therapy of venereal disease or as diuretics!

In 1948 it was observed in vulcanization factories that the antioxidant disulfiram 6.31 (Fig. 6.9) caused workers to become intolerant of alcoholic drinks. This discovery led to the use of the substance for the treatment of chronic alcoholism.

Fig. 6.9 Tetraethylthiuram disulfide 6.31 or disulfiram, better known as Antabuse®, is an aldehyde dehydrogenase inhibitor. The accumulation of the toxic acetaldehyde leads to nausea. Iproniazid 6.32, a simple derivative of isoniazid 6.20 (Fig. 6.5), is a monoamine oxidase inhibitor (▶ Sect. 27.8). It acts as an antidepressant by prolonging the effects of the biogenic amines. The rat poison warfarin 6.34 is derived from dicoumarol 6.33. Even though the coagulation parameters must be closely monitored, it is still the standard of therapy for diseases that are coupled with a thrombosis risk, for example, heart attack or stroke.
Penicillamine 6.35 is a complexation agent for heavy metals; it is used, among other indications, for the treatment of Wilson’s disease, an inherited disease that leads to the accumulation of copper in the tissues. It was only later that its efficacy in chronic rheumatic diseases was discovered.
The metabolic intermediate of ethanol, acetaldehyde, is not metabolized any further. This leads to generalized poisoning symptoms such as nausea, palpitations, and cold sweats. The effect is, however, difficult to control. Alcohol consumption after treatment has occasionally been fatal.

A classic example of the discovery of an important indication by observing side effects can be found in the sulfonamides. The sulfonamide diuretics and the oral antidiabetics (▶ Sect. 30.2), drugs of choice to treat certain forms of diabetes, were found in this way (▶ Sect. 8.4).

Iproniazid 6.32 (Fig. 6.9) is a derivative of isoniazid 6.20 (Fig. 6.5). In 1957 a tuberculosis patient noticed a distinct brightening of mood, which led to its broad use for the treatment of chronic depression. The substance had to be withdrawn from the market a few years later due to severe side effects (▶ Sect. 27.8).

Sweet clover has been used in Europe to feed livestock for hundreds of years. During its introduction to the USA and Canada in the 1920s, it was initially stored inappropriately, with disastrous consequences. Massive bleeding and fatalities in the cattle were attributed to the spoiled sweet clover (i.e., hemorrhagic sweet clover disease). The active substance, dicoumarol 6.33 (Fig. 6.9), was introduced into therapy in 1942, but its effects were unreliable. The Wisconsin Alumni Research Foundation investigated 150 analogues and produced warfarin 6.34, which was sold as a rat poison. The name is derived from the foundation’s acronym WARF and the ending “arin” from coumarin. In 1951 an American soldier attempted suicide with a high dose of warfarin. Because he survived, a clinical trial was initiated. Despite the need for frequent and tight control of the coagulation values, treatment with warfarin is the standard therapy today after a heart attack or stroke.

Penicillamine 6.35 (Fig. 6.9) provides an example of an important indication extension. It was introduced for the treatment of Wilson’s disease, an inherited metabolic disease that leads to copper accumulation in tissue. Because 6.35 forms complexes well, it is also appropriate for the treatment of heavy-metal poisonings. It was only later, after its practical use, that its much larger importance as a basic therapy for rheumatic disease was recognized. The mechanism of action remains largely unclear.

6.8 From the Traditional Search to the Screening of Large Compound Libraries

The approaches that are described in the previous sections are still used in industrial pharmaceutical research today. Because of the enormous costs associated with the development of drugs, the search for original lead structures is an increasingly important goal. Large sums are paid for novel therapeutic approaches, test models, or 3D structures of target proteins. This information can lead to an advantage over the competition that takes time to realize, but that must be zealously defended and brought to fruition.

According to the principle of risk diversification and the maximal exploitation of all imaginable resources, pharmaceutical companies today subscribe to a strategy of broadly established screening of huge substance libraries of plant extracts,
microbial fermentations, and synthetically prepared compounds. The last category comes from in-house chemistry as well as purchased compounds and combinatorial substance libraries (▶ Chap. 11, “Combinatorics: Chemistry with Big Numbers”). Furthermore, a large part of the search for new lead structures nowadays takes place by computer methods.

The identification of therapeutically relevant target proteins plays an ever-increasing role in the discovery of new lead structures. The elucidation of the human genome (▶ Sect. 12.3) has delivered the sequences of all human proteins. By comparing the expression pattern between diseased and healthy cells, it is possible to recognize particular proteins as a cause or consequence of a given pathology (▶ Sect. 12.8). Should such a protein be detected, the next steps are clear. The therapeutic concept is tested on a genetically modified animal (▶ Sect. 12.5), or the gene is silenced (▶ Sect. 12.7), a molecular test system is established, and the 3D structure of the protein is elucidated. In parallel, all available techniques for lead structure search are employed. Because this process chain is being carried out with increasingly high throughput, the capacity for lead-structure searching must be constantly extended.

Many companies try to simultaneously develop chemically unrelated lead structures for the same indication. The elaborate animal models for preclinical profiling and the preparations for clinical testing require so much labor and expense that it seems hardly justifiable to start such a program with only one compound class. Risk minimization and distribution are required for the search as well as the development of a medicine. Techniques that are used for the detection of new lead structures are presented in the next chapter (▶ Chap. 7, “Screening Technologies for Lead Structure Discovery”).
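The comparison of expression patterns between diseased and healthy cells mentioned above can be pictured with a small Python sketch. It ranks proteins by their log2 fold change and keeps those that differ by at least a factor of two. Everything in it is an assumption made for illustration: the protein names and expression values are invented, the function names are hypothetical, and the pseudocount and threshold are arbitrary choices rather than the procedure described in ▶ Sect. 12.8.

```python
import math


def log2_fold_change(diseased: float, healthy: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of diseased to healthy expression; the pseudocount avoids division by zero."""
    return math.log2((diseased + pseudocount) / (healthy + pseudocount))


def flag_candidates(expression, threshold=1.0):
    """Return (protein, log2 fold change) pairs with |change| >= threshold, largest first."""
    changes = {name: log2_fold_change(d, h) for name, (d, h) in expression.items()}
    hits = [(name, change) for name, change in changes.items() if abs(change) >= threshold]
    return sorted(hits, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical expression levels in arbitrary units: (diseased, healthy).
EXAMPLE = {
    "protein_A": (799.0, 99.0),   # strongly up-regulated in diseased cells
    "protein_B": (104.0, 99.0),   # essentially unchanged
    "protein_C": (49.0, 199.0),   # down-regulated in diseased cells
}

for name, change in flag_candidates(EXAMPLE):
    print(f"{name}: log2 fold change = {change:+.2f}")
```

A protein flagged in this way is only a candidate. As the text notes, it may be a cause or merely a consequence of the pathology, which is why the therapeutic concept is subsequently tested in a genetically modified animal or by gene silencing.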
6.9 Synopsis

• Many active substances originate from natural products found in plants, animals, and microbial sources. Their mode of action has been copied as an active principle for the development of drugs.
• Endogenous substances such as hormones and neurotransmitters have also served as references for drug development.
• Only a few natural products became drugs themselves.
• Usually targeted chemical variations are required to optimize a lead for metabolic stability, half-life, or selectivity before it is ready for therapeutic use.
• Plants contain many valuable therapeutic compounds, which usually developed as an effective protective mechanism against all sorts of enemies.
• Nature offers a tremendous body of structural variation; however, ambitious programs to elucidate mechanisms of action of traditional medicines all too often only isolate toxins and discover already-known principles.
• Animals have developed venoms as offensive or defensive mechanisms, to subdue prey or to fend off enemies. They are mostly proteins, peptides, or alkaloids that either kill or paralyze a victim.
• Snake venoms served as references for the development of antihypertensive drugs; active principles that block blood clotting (e.g., from leeches or bats) were turned into active ingredients for anticoagulant drugs.
• Proteins for substitution therapy (such as insulin, erythropoietin, factor VII) are manufactured by gene technology.
• Microorganisms have provided leads for antibiotics (e.g., penicillins), which had to be optimized for oral availability, broad-spectrum activity, and metabolic stability.
• The immunosuppressant ciclosporin A, a cyclic peptide; ergotamine, a toxic alkaloid in ergot; lovastatin, an inhibitor of cholesterol biosynthesis; and streptokinase, which dissolves blood clots, are successful drugs originating from microorganisms.
• Dyes and many synthetic intermediates produced in the chemical industry were investigated for biological effects and provided important compound classes such as the sulfonamides.
• Small but essential structural changes of endogenous ligands transform enzyme substrates, neurotransmitters, and hormones into successful drugs.
• Many drugs originated from clinical observations of side effects during practical use, for instance, the antidiabetic effect of sulfonylureas from the observation of side effects of sulfonamides.
• To exploit all imaginable resources for the discovery of leads, huge substance libraries of plant extracts, microbial fermentations, and synthetically prepared compounds are screened today.
7 Screening Technologies for Lead Structure Discovery

G. Klebe, Drug Design, DOI 10.1007/978-3-642-17907-5_7, © Springer-Verlag Berlin Heidelberg 2013

In the last chapter, examples were presented of how lead structures can be discovered by purposeful searching, particularly by using examples from nature or compounds with known modes of action. Even if a large number of natural products and synthetic substances are available, it is not always easy to filter out the active molecules and to assess their value for a given indication. This requires a time- and cost-intensive sorting or screening of enormous substance libraries. “Screening” means the more or less specific biological testing of compounds. Although today molecular test systems and cell culture models are used almost exclusively, the cost of testing a compound is still between US $2 and US $5. Because typically millions of compounds are tested, a single screening campaign can easily cost several million dollars.

The screening process can be divided into three phases. First there is an automated primary screening, which is usually carried out by robots and encompasses libraries of millions of compounds. The first substances that show an interaction are identified as “hits”, which have to be validated by repeated testing. Next, a more detailed screening follows, in which the chemical space around the identified compounds is explored. The goal is to establish a structure–activity relationship (▶ Chap. 18, “Quantitative Structure–Activity Relationships”) and to improve the pharmacological and physicochemical properties (▶ Chap. 19, “From In Vitro to In Vivo: Optimization of ADME and Toxicology Properties”). Along the way, lead structures (so-called “leads”) are discovered. In the last phase, lead optimization takes place through detailed biological testing, through which a drug candidate is selected for clinical testing (▶ Chap. 8, “Optimization of Lead Structures”). How can we find, among the enormous number of test candidates, appropriate hits that have the potential to be developed into a medicine? The question is answered by screening for biological effects.

7.1 Screening for Biological Activity by HTS

The prerequisite for large-scale screening was the development of in vitro test systems as a surrogate for animal experiments. The first were carried out on
isolated enzymes and membrane homogenates for receptor-binding studies. Later, gene technology (▶ Sect. 12.6) made sufficient quantities of pure proteins available for the development of molecular test systems. This offered the advantage that homogeneous proteins, preferentially human proteins, could be tested. In the mid-1990s, automated test systems with an extremely high capacity (high-throughput screening, HTS) led to an enormous boom. The discovery of candidates for drug development is now attempted by using the entire methodological repertoire of biochemistry in a test tube. Meanwhile it is known how to reprogram cells and organisms so that the function of single genes is highlighted. The special trick with all of these test methods lies in translating the molecular effect into a macroscopically visible signal.

Despite the enormous effort that is associated with HTS, and the not-always-justifiable hit rate, HTS is here to stay in pharmaceutical research. Interesting lead structures are repeatedly found in this way (▶ Chaps. 23, “Inhibitors of Hydrolases with an Acyl–Enzyme Intermediate”; ▶ 24, “Aspartic Protease Inhibitors”; ▶ 25, “Inhibitors of Hydrolyzing Metalloenzymes”; ▶ 26, “Transferase Inhibitors”; ▶ 27, “Oxidoreductase Inhibitors”; ▶ 28, “Agonists and Antagonists of Nuclear Receptors”; ▶ 29, “Agonists and Antagonists of Membrane-Bound Receptors”; ▶ 30, “Ligands for Channels, Pores, and Transporters”; ▶ 31, “Ligands for Surface Receptors”; ▶ 32, “Biologicals: Peptides, Proteins, Nucleotides, and Macrolides as Drugs”). A weakness may be the limited diversity of synthetic substances compared with the structural complexity of plant and microbial metabolites. Another limitation of in vitro test systems is that neither the entire effect spectrum nor many other effects such as transport, distribution, metabolism, and excretion (▶ Chap. 19, “From In Vitro to In Vivo: Optimization of ADME and Toxicology Properties”) can be assessed.

The composition of suitable screening libraries is exceedingly critical. Frequently molecules and test candidates are used that were prepared during the course of other drug-development projects. As such, these molecules already have the size of a typical drug. Usually only modest, almost always micromolar binding to the test receptor is found. To improve the properties of such a hit, it must be structurally modified. As a general rule, this is accomplished by adding more chemical groups. This means that the molecular weight can quickly reach or exceed 500–600 Da, which is considered to be the upper threshold for good bioavailability (▶ Sect. 9.1). The optimization of such a screening hit therefore often means that its size must be reduced first, so that it can be increased again during a goal-oriented optimization. Yet the size reduction often comes with a loss in binding. Therefore the criterion “ligand efficiency” was introduced to judge a screening hit’s optimization potential. For this, the binding affinity of the hit is considered in relation to its number of non-hydrogen atoms. Small substances that bind well in relation to their size are seen as particularly promising candidates for an optimization program.
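The idea of ligand efficiency can be illustrated with a short calculation. The sketch below assumes one commonly used definition, namely the negative free energy of binding (estimated as ΔG = RT ln Kd) divided by the number of non-hydrogen atoms; the text above only states that affinity is related to atom count, so the exact formula, the temperature of 298.15 K, and the two example compounds are assumptions made for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ligand_efficiency(kd_molar: float, heavy_atoms: int,
                      temperature: float = 298.15) -> float:
    """Return -dG per non-hydrogen atom in kcal/mol (assumed definition)."""
    if kd_molar <= 0 or heavy_atoms <= 0:
        raise ValueError("Kd and heavy-atom count must be positive")
    # dG = RT ln(Kd); negative for dissociation constants below 1 M
    delta_g = R_KCAL * temperature * math.log(kd_molar)
    return -delta_g / heavy_atoms


# Hypothetical examples: a drug-sized micromolar hit and a small millimolar binder
screening_hit = ligand_efficiency(10e-6, heavy_atoms=30)
small_binder = ligand_efficiency(1e-3, heavy_atoms=12)

print(f"screening hit: {screening_hit:.2f} kcal/mol per atom")
print(f"small binder:  {small_binder:.2f} kcal/mol per atom")
```

In this made-up comparison the weakly binding small molecule uses its atoms more efficiently than the larger micromolar hit, which is the situation the criterion is meant to reveal.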
7.2 Color Change Demonstrates Activity

Important target proteins for drug development are proteases and esterases, which are enzymes that cleave peptide and ester bonds (▶ Chaps. 23, “Inhibitors of Hydrolases with an Acyl–Enzyme Intermediate”; ▶ 24, “Aspartic Protease Inhibitors”; ▶ 25, “Inhibitors of Hydrolyzing Metalloenzymes”). How can their enzymatic activity be visualized? One prepares synthetic substrates that are similar to the natural substrate. They carry, however, a para-nitroanilide or a para-nitrophenolate group coupled by a peptide or ester bond (Fig. 7.1). When the enzyme cleaves this substrate, yellow nitrophenolate or nitroanilide is released, and the absorption properties change measurably. This is observed spectroscopically. If, during screening, a compound acts as an inhibitor, the enzymatic cleavage of the synthetic substrate is more or less suppressed, and the yellow color is diminished. In this way the inhibitory potency of test substances can be determined (Fig. 7.1).

[Fig. 7.1: reaction schemes for the cleavage of a peptide substrate and of an ester substrate; inset: absorption spectrum (ε versus λ) with a maximum at 405 nm]

Fig. 7.1 A p-nitrophenolate or a p-nitroanilide group is added to the terminus of a natural protease or esterase substrate. The enzyme cleaves off the p-nitrophenolate or p-nitroanilide, which becomes visible as a yellow-colored, mesomerically stabilized anion (absorption maximum at 405 nm). If a competitive inhibitor is added along with the substrate to the enzyme, the cleavage reaction rate is suppressed depending on the binding strength. This is apparent from the more or less intense yellow color of the solution and can be quantitatively measured.
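How such a color readout might be converted into an inhibition value can be sketched as follows. The example assumes that the absorbance at 405 nm is recorded at several time points, that the initial rate is taken as the slope of a straight-line fit, and that the inhibition is expressed relative to an uninhibited control. The function names and all readings are hypothetical and serve only to illustrate the principle.

```python
def initial_rate(times_s, absorbances):
    """Least-squares slope of absorbance versus time (absorbance units per second)."""
    n = len(times_s)
    if n < 2 or n != len(absorbances):
        raise ValueError("need at least two matching time/absorbance readings")
    mean_t = sum(times_s) / n
    mean_a = sum(absorbances) / n
    num = sum((t - mean_t) * (a - mean_a) for t, a in zip(times_s, absorbances))
    den = sum((t - mean_t) ** 2 for t in times_s)
    return num / den


def percent_inhibition(rate_inhibited, rate_control):
    """Fraction of the control activity that is suppressed, in percent."""
    return 100.0 * (1.0 - rate_inhibited / rate_control)


# Hypothetical A405 readings taken every 30 s
times = [0, 30, 60, 90, 120]
control = [0.050, 0.110, 0.170, 0.230, 0.290]        # enzyme + substrate
with_compound = [0.050, 0.065, 0.080, 0.095, 0.110]  # enzyme + substrate + test compound

v_control = initial_rate(times, control)
v_inhibited = initial_rate(times, with_compound)
inhibition = percent_inhibition(v_inhibited, v_control)
print(f"inhibition: {inhibition:.0f} %")
```

Repeating the measurement at several concentrations of the test compound would give the data needed to estimate its inhibitory potency.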
A broad palette of chromophoric reactions has been developed that are suitable for the characterization of enzymatic activity. Many enzymes, for example, dehydrogenases, need NAD(P)H as a natural cofactor, which is oxidized to NAD(P)+ in the course of the reaction (▶ Sect. 27.1). Because the NAD(P)H starting material, in contrast to the product, absorbs at 340 nm, the progress of the enzymatic reaction can be followed at this wavelength. As a variation, two enzymatic reactions can be coupled to one another. This possibility is interesting when the substrate that is easily followed spectroscopically is produced in an upstream reaction. In this case the reaction of the enzyme of interest is not directly observed. Rather, the activity of interest is registered based on the consumption of the upstream reaction products in the subsequent enzyme reaction.

Although absorption spectroscopic assays are preferred for technical reasons, tests that are based on the reaction of radiolabeled compounds still play an important role. The activity of kinases is, for example, followed by using ³²P-labeled adenosine triphosphate. The terminal phosphate group of the labeled substrate is transferred by the kinase to the protein that is to be phosphorylated (▶ Sect. 26.3). The incorporation rate serves as a measure of the kinase activity. Receptor-binding studies are carried out with a known radioactively labeled ligand. The assay investigates to what extent test compounds can displace the radioactively labeled ligand from the receptor-binding site. This type of test does not necessarily represent a functional assay, though. Agonistic and antagonistic binding (▶ Chaps. 28, “Agonists and Antagonists of Nuclear Receptors” and ▶ 29, “Agonists and Antagonists of Membrane-Bound Receptors”) must still be distinguished.

7.3 Getting Faster and Faster: More and More Compounds by Using Less and Less Material

Antibodies play an important role in assay development.
The enormous specificity of antibody–antigen interactions can be exploited as a highly sensitive detection system (▶ Sect. 32.3). In classical immunoassays, either the release of a radioactively labeled substance is followed (radioimmunoassay, RIA), or an enzymatic reaction is triggered (enzyme-linked immunosorbent assay, ELISA). The latter technique has found a distinctly broader range of application, mostly because radioactivity is best avoided as a measured quantity. Because they only recognize a single molecular species, immunoassays are not only highly specific but also versatile.

Screening techniques are optimized to be automated and miniaturized. Driven by the desire for higher capacity, these tests are hardly ever carried out in 96-well (8 × 12) microtiter plates anymore. The wells of these plates hold a reaction volume of about 0.3 mL. Instead, 384-well (16 × 24) or even 1536-well (32 × 48) microtiter plates are used, the volumes of which are only a few microliters per well.

The aggregation behavior of hydrophobic test compounds poses a large problem. The aqueous buffer solutions that are used for these assays can cause these compounds to aggregate. This aggregation generates hydrophobic surfaces, onto which the proteins can adsorb. The concentration of free protein is reduced,
which can make it appear as though the protein is well inhibited. The addition of detergents can reverse this effect.

By using a sophisticated robot system, 100,000 assays a day can be carried out. This leads to an enormous flood of data to be evaluated. The reduced test volume has the advantage that much less material is consumed. Furthermore, the measurements can be carried out quickly. At the same time, sample manipulation has become ever more difficult. To appreciate the difficulty, one only has to consider the evaporation of such small amounts of solution, the enormously increasing logistics of handling so much data in parallel, the reproducibility of the results, and the sensitivity necessary to measure weak signals with certainty.

In order to improve this last aspect, ever more sensitive detection procedures are used. Fluorescence measuring techniques are particularly sensitive. In the simplest case, a fluorescing group such as coumarin (▶ Sect. 14.6) is incorporated in the place of para-nitroanilide. Protein–ligand binding can also be followed by fluorescence anisotropy (or polarization). A known ligand is coupled to a fluorophore and excited with polarized light. The emitted fluorescence is in this case also polarized. In the time that the excited molecule can freely diffuse in solution, the extent of the induced polarization decreases. Because a small molecule can diffuse much faster than a big one, its polarization signal decreases much faster than if it were bound to a large protein. This difference, which reflects the changed diffusion behavior of the ligand once it is bound to the large protein, can be measured. Even better sensitivity can be achieved with so-called FRET measuring techniques (fluorescence resonance energy transfer). A resonance energy transfer can occur between a donor and an acceptor fluorophore if the emission spectrum of the donor overlaps with the absorption spectrum of the acceptor and both are separated by no more than 50 Å.
If, for example, a phosphatase assay is desired, a phosphorylated peptide substrate is coupled with a covalently bound donor fluorophore. The substrate is added to the enzyme together with the test compound. Depending on how potent the inhibiting test compound is, the enzyme’s activity is reduced, and less substrate is cleaved. Then an antibody is added that binds to the phosphorylated substrate. The antibody is coupled to a fluorescence acceptor, the absorption maximum of which overlaps with the emission spectrum of the donor fluorophore. If a fair amount of phosphorylated substrate is still present, that is, the test compound is a potent inhibitor, the spatial proximity of the donor and acceptor leads to a strong FRET signal. This can be quantitatively measured.

In the meantime, progress in assay miniaturization allows the detection of single molecules. This is possible by using fluorescence correlation spectroscopy (FCS). A confocal laser microscope irradiates approximately a femtoliter of test solution. If a single fluorophore diffuses through the volume of interest, it causes a time-resolved fluctuation in the fluorescence signal. An exact analysis of these signals delivers information about the concentration and diffusion constants. The diffusion velocity, in turn, depends on whether the fluorescence-labeled substance is bound to a protein or not. If the proteins as well as the ligands are tagged with different markers, the association and dissociation can even be followed.
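The strong distance dependence that makes FRET useful as a proximity sensor can be illustrated numerically. The sketch below uses the standard Förster relation E = 1 / (1 + (r/R0)^6), which is not given in the text above and is added here as background. The Förster radius R0 of 50 Å is an assumed, illustrative value that depends on the dye pair in practice.

```python
def fret_efficiency(distance_angstrom: float, r0_angstrom: float = 50.0) -> float:
    """Energy-transfer efficiency from the Förster relation (R0 is an assumed value)."""
    if distance_angstrom < 0 or r0_angstrom <= 0:
        raise ValueError("distance must be non-negative and R0 positive")
    return 1.0 / (1.0 + (distance_angstrom / r0_angstrom) ** 6)


# Efficiency at a few donor-acceptor separations (in Å)
for r in (25.0, 50.0, 75.0, 100.0):
    print(f"r = {r:5.1f} Å  ->  E = {fret_efficiency(r):.3f}")
```

Under these assumptions the transfer efficiency drops from one half at r = R0 to below 2% at twice that distance, which is why a signal is only observed when donor and acceptor are brought close together, for instance by antibody binding.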
7.4 From Binding to Function: Testing in Entire Cells

The binding of a ligand to a protein says nothing about the concomitant function or change in function. Often it is easy to relate the observed inhibition in an enzyme assay to a function. The correlation is less obvious with receptors and ion channels (▶ Chaps. 28, “Agonists and Antagonists of Nuclear Receptors”; ▶ 29, “Agonists and Antagonists of Membrane-Bound Receptors”; ▶ 30, “Ligands for Channels, Pores, and Transporters”). If the biochemical pathways and cell cycle regulation are considered, it becomes even more complex to assign a function, even for enzymes. This correlation is not so easily reproduced in a test tube. Therefore, functional assays must also be developed that allow the response of an entire cell to ligand binding to be measured. It is possible to culture cells from many different tissues, which then allows the study of tissue-specific receptors.

Typically the activity of ion channels can be investigated by using binding tests or radioactive assays. The so-called patch–clamp technique allows the influence of a drug candidate to be characterized even better. An electrode is attached to the surface of a cell, and a voltage or current is applied. In this way the opening or closing of single channels can be registered, particularly when a test molecule is added. This technique certainly does not reach the throughput of the high-throughput techniques. It is better used to elucidate the function of hits from a prescreening. Fluorescence methods are more popular for the first step. As an example, Ca²⁺-channel function can be assessed by measuring an increase in intracellular calcium levels by using a dye that fluoresces in the presence of calcium ions.

Other tests employ the coupling to a reporter gene. Receptor stimulation initiates a signaling cascade that, for some receptors, leads to the transcription of gene products that are controlled by the relevant promoters (▶ Sect. 28.1).
If the sequence of the relevant gene is replaced with that of a reporter gene, such as that for β-galactosidase, luciferase, or green fluorescent protein (GFP), then these proteins are produced by the cell instead. This can subsequently be observed as an easily detectable signal (Fig. 7.2). For example, β-galactosidase cleaves X-gal and releases a blue dye, luciferase produces an ATP-dependent chemiluminescence, and the green fluorescent protein is detectable because of its own intrinsic fluorescence.

7.5 Back to Whole-Animal Models: Screening on Nematodes

Primary substance testing on animals as it was once carried out is ethically unjustifiable today. Furthermore, an animal model is not predictive for target-oriented optimization. Nevertheless it does have advantages. The reaction of an entire organism to a substance is immediately apparent, the bioavailability is directly measured, and side effects as well as synergistic effects are obvious straightaway. Back in 1963, Sydney Brenner recognized the complexity of molecular biology and emphasized the biochemical control of cellular development. He proposed that the roundworm (the nematode Caenorhabditis elegans) would be the simplest
multicellular organism to investigate. This nematode normally lives in soil and feeds on bacteria. It can also easily be cultured in microtiter plates and fed with Escherichia coli bacteria. It is a hermaphrodite, has a short lifespan, reproduces within 3 days, can be conserved in liquid nitrogen, and is transparent, and homologous genes have been found in humans for 60–80% of its genes. The roundworm genome has been sequenced, and we now understand how to manipulate it easily. Because the animal is transparent, any internal changes can be easily observed, for instance, when proteins are tagged with fluorescence markers. Its 959 somatic cells form many different organs, including a nervous system with 302 neurons.

Can substance testing be carried out in such a life form? The ethical threshold may be set lower in this case. But then, how predictive would any tests be? Can such an animal be used to predict mood changes, depression, or appetite and its relation to obesity? This is only possible if the causes of these diseases are known on the molecular level, for example, a defect caused by altered serotonin-mediated signaling. In such a situation the worm can serve as a model. A first step toward the discovery of a potential target is selective gene silencing. This is possible by using RNA interference (▶ Sect. 12.7). If the nematode is exposed to a substance library, it is possible to see a change in appearance or behavior. Is the life expectancy lengthened or shortened? These are indications that the compounds could interfere with the aging process or are toxic. If there are changes in muscle cells, the compound might perhaps be useful for neurodegenerative muscle disease. Aside from macroscopic changes in the body form, changes in the gene expression pattern can also be analyzed (▶ Sect. 12.9). Are mutations in proteins apparent?

Certainly the worm does not have the same metabolic pathways as we do. Even its disease models only partially represent the pathophysiology that is seen in human disease. Nonetheless the direct testing of compounds on the roundworm seems to afford a new perspective for screening substance libraries. As an alternative, the fruit fly (Drosophila melanogaster) or the zebra fish (Danio rerio) are also available as test organisms. They help to test the validity of a therapeutic approach early in a program.

[Fig. 7.2: schematic with the steps: preparation of the construct (promoter for gene A coupled to the GFP gene), cell penetration, test model, activation by active substances, registered signal]

Fig. 7.2 Genes are controlled by promoters. Promoter-initiated gene activation leads to the synthesis of the relevant protein. By using green fluorescent protein (GFP), an easily observed assay can be constructed based on this principle. For this, the gene promoter that is activated by agonist binding is coupled to the GFP gene. Activation of the promoter then delivers not the original gene product, but rather GFP. The presence of GFP is easily observed because of its fluorescence upon excitation with ultraviolet light.

7.6 In Silico Screening of Virtual Libraries

As described in the previous sections, experimental high-throughput screening (HTS) has been automated with great effort. When fed with compounds from combinatorial chemistry (▶ Chap. 11, “Combinatorics: Chemistry with Big Numbers”), several hundred thousand substances can be scanned by using HTS. At first it seemed that this would be the end of all rational structure-based techniques. In view of the enormous financial investment and the disappointingly low hit rate, the initial euphoria began to wane. Therefore, as an alternative, the technique of screening huge databases on the computer by fitting small molecules into a predefined binding pocket (docking, ▶ Sect. 20.8) was developed (virtual screening).

The unsatisfactory hit rate from HTS is attributed to the size, structural diversity, and poorly selected composition of the substance library with respect to the actual properties of the target protein. The recognition of false-positive and false-negative hits in biological systems causes large problems. Disappointing hit rates have been reported for the translation of initial hits into potential lead structures for lead optimization.
This is all the more reason to attempt to develop virtual screening techniques into a complementary and alternative method.

The prerequisites for the successful use of these techniques are entirely different from those of the technology-driven HTS: virtual screening can only reasonably be applied if the factors that are responsible for a putative drug to bind to its target protein are understood on the molecular level. The starting point for this is the spatial structure of the target protein, which is usually determined by NMR spectroscopy (▶ Chap. 13, “Experimental Methods of Structure Determination”) or X-ray structure analysis (Fig. 7.3). Increasingly, models can also be derived from structurally homologous proteins of known geometry (▶ Sect. 20.5). To successfully bind to a protein, the ligand must adopt a shape that is complementary to the binding pocket. Molecules are flexible and can change their shape through bond rotations that require very little energy (▶ Chap. 16, “Conformational Analysis”). In addition to the spatial fit in a suitable conformation, the functional groups of a potential ligand must find complementary functional groups in the binding pocket of the protein. Hydrogen bonds must be formed between ligand and protein, and hydrophobic molecular portions must find their
counterpart in the protein (▶ Chap. 4, “Protein–Ligand Interactions as the Basis for Drug Action”). For this, the protein binding pocket is analyzed to highlight the areas that are essential for binding. The binding pocket is systematically scanned for a particular atom type, for instance, a hydrogen-bond donor or acceptor. By using computer graphics, it is possible to see where functional groups attached to a candidate ligand might be optimally placed (▶ Sect. 17.10). The composite picture of all atom types placed in the binding pocket by this analysis reveals a spatial pattern of physicochemical properties that a ligand must meet to successfully bind to the protein (“hot spots”, ▶ Sects. 17.1 and 17.10). With these criteria in hand, a molecular database can be searched that is composed of already-synthesized compounds or compounds that have been virtually assembled on the computer.

[Fig. 7.3: workflow of a virtual screening campaign in panels a–h; one panel is labeled “Computer Screening”]

Fig. 7.3 The spatial structure of a protein is the starting point for virtual screening (a). The binding pocket is explored with a variety of different probe atoms, for instance, for hydrogen-bond acceptors or donors (b). Regions that are particularly favorable for such interacting groups are highlighted on the computer graphics. If the “hot spots” in these areas are summarized, a spatial pattern of properties that a potential ligand should have becomes apparent (c). This pattern is called a “pharmacophore” and serves as the search criterion for a database retrieval (d). Potential ligands from a large database are filtered and energetically evaluated by docking (e). The hits that are found are either commercially available or synthesized in the laboratory (f). Next, biological testing takes place (g), and if the binding is successful, the lead structure is crystallized with the protein. The subsequent structural determination (h) serves as a starting point for further design cycles.
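The filtering logic of steps (d) and (e) in Fig. 7.3 can be sketched as a small cascade. Everything in this example is hypothetical: the `Candidate` record, the feature labels, the weight cut-off of 500 Da, and the precomputed `docking_score` field, which merely stands in for the output of a real docking program. It is meant as a minimal illustration of successive filtering and not as a description of any particular software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    mol_weight: float        # molecular weight in Da
    features: frozenset      # pharmacophore feature labels found in the molecule
    docking_score: float     # placeholder score; more negative = better predicted binding


def screen(library, required_features, max_weight=500.0, top_n=2):
    """Apply increasingly demanding filters and return the best-ranked survivors."""
    # Step 1: cheap property filter (assumed weight cut-off)
    light_enough = [c for c in library if c.mol_weight <= max_weight]
    # Step 2: keep only molecules that offer every required pharmacophore feature
    matching = [c for c in light_enough if required_features <= c.features]
    # Step 3: rank the remaining molecules by their (placeholder) docking score
    ranked = sorted(matching, key=lambda c: c.docking_score)
    return ranked[:top_n]


# Hypothetical mini-library
library = [
    Candidate("cpd-1", 320.0, frozenset({"donor", "acceptor", "hydrophobic"}), -8.1),
    Candidate("cpd-2", 710.0, frozenset({"donor", "acceptor", "hydrophobic"}), -9.5),
    Candidate("cpd-3", 280.0, frozenset({"acceptor"}), -7.0),
    Candidate("cpd-4", 410.0, frozenset({"donor", "acceptor", "hydrophobic", "aromatic"}), -9.0),
    Candidate("cpd-5", 250.0, frozenset({"donor", "acceptor", "hydrophobic"}), -6.2),
]
required = frozenset({"donor", "acceptor", "hydrophobic"})

hits = screen(library, required)
print([c.name for c in hits])
```

In this toy library, cpd-2 is rejected because of its weight and cpd-3 because it lacks required features, so that only the three remaining molecules need to be ranked. In practice the expensive docking step would be run only on the molecules that survive the cheaper filters.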
If a hit from the group of virtually assembled compounds is found, the compound can subsequently be synthesized. The search is divided into multiple filtering steps that become increasingly stringent and sophisticated as the number of remaining candidates is successively reduced. With the help of fast docking programs (▶ Sect. 20.8), molecules are fitted into the binding pocket and a binding geometry is generated, from which the expected binding affinity can be estimated. This step is the decisive one, but unfortunately it is also the most difficult (▶ Sect. 20.9). In ▶ Chap. 21, “A Case Study: Structure-Based Inhibitor Design for tRNA-Guanine Transglycosylase”, examples are presented that were found by virtual screening.

The evaluation of the generated binding geometries is accomplished with sufficient accuracy in about 70% of cases nowadays. An improvement in predictive power requires that we understand the ligand–protein recognition process better (▶ Chap. 4, “Protein–Ligand Interactions as the Basis for Drug Action”). The role of water in the binding, the induced steric and dielectric adaptation, the plastic behavior and residual mobility of proteins and bound ligands, and the dynamic changes during complex formation are still poorly understood.

The composition of the databases themselves plays a decisive role in the search’s success. Enlarging the database alone is not enough. The enrichment of the compounds that could fulfill the requirements is crucial. Screening is often compared to the search for a needle in a haystack. When looking for such a needle, it is not helpful to simply double the size of the haystack! The haystack must be spiked with more promising needles.

To achieve this, all available knowledge about the structure, function, and dynamic behavior of the target protein must be used to define the database search. Comparisons between proteins and protein binding pockets, especially among members of the same protein family, can offer decisive information (▶ Sects. 20.3, 20.4, 20.5, 20.6). In principle, all of the data that are needed about the composition of a suitable compound library for a virtual screening are already intrinsically coded in the structure and geometric interaction properties of the binding pocket. It is only a question of applying them correctly. Another decisive criterion for a hit is an adequate pharmacokinetic profile so that satisfactory bioavailability can be achieved (▶ Chap. 19, “From In Vitro to In Vivo: Optimization of ADME and Toxicology Properties”).

7.7 Biophysics Supports Screening

Surface plasmon resonance techniques are being increasingly used to screen for new lead structures. For this a target molecule is anchored onto the gold-coated surface of a sensor chip. The underside of a glass carrier is irradiated with light (Fig. 7.4). Changes in the refractive index, which are measured under total internal reflection as a shift in the resonance angle, are a measure for the change in mass on the sensor surface. If a compound binds, the resulting change in mass on the gold surface can be registered. Because the technique is fast and time resolved, kinetic parameters such as the association or dissociation rate constants of the binding event can be measured in addition to the stoichiometry. One problem associated with screening
in microtiter plates is the huge amount of time that is needed to load the plate with compounds. One way around this bottleneck is to apply the entire compound library to a sensor chip in a microarray format by using spraying techniques. This means that now all of the low-molecular-weight ligands are anchored on the chip. If a test receptor protein is added to such a chip, a mass difference is detected where the protein binds. Because of the spatial resolution of the chip, it can easily be determined which library compound is responsible for the interaction. The disadvantage of the method is that the test compounds must be equipped with a chemical anchor that allows them to be immobilized on the chip surface. Surface plasmon resonance has meanwhile achieved a sensitivity that allows the detection of even very small test compounds with a mass as small as 100 Da. Therefore the approach can be reversed: now the protein is immobilized at the surface, and ligand binding from solution can be recorded.

[Fig. 7.4: schematic of the SPR setup (light source, polarized light, prism, sensor chip with gold film, flow channel, reflected light, detector); plot of intensity versus angle for states I and II; sensorgram (resonance signal versus time) with the association (kon) and dissociation (koff) phases of a test ligand binding to the target protein]

Fig. 7.4 The principle of surface plasmon resonance (SPR). The method registers changes in the refractive index on the surface of a sensor chip (green). The changes on the gold surface that are caused by the binding of the substrate molecule (yellow) to an anchored receptor (red) lead to a shift in the resonance angle of the reflected light (I and II). In this way, not only the binding affinity but also the kinetic association (kon) and dissociation (koff) parameters are measured.

In Sect. 7.1, the concept of “ligand efficiency” was introduced. To take this aspect into consideration, test libraries are being increasingly supplied with compounds that have a molecular weight of less than 250 Da. In the meantime the term chemical “fragment” has become popular for these search candidates. The term is a bit unfortunate because the molecules are actually “complete” small molecules and not, as the term might suggest, simply a “fragment”, that is, an additional building block to be attached to a lead structure.

Proteins denature when they are heated. A “melting temperature” is defined as the temperature at which the unfolding process (▶ Sect. 14.2) occurs. This temperature can be measured very sensitively with a thermal sensor. The binding of a ligand to a protein changes this melting point. As described in Sect. 7.3, fluorescence measurements are extremely sensitive indicators. The melting process can be registered because the unfolded protein interacts with a fluorescent dye, and the change in fluorescence signal can be detected. The temperature shift caused by ligand binding can be used as evidence as to whether a ligand is bound to a protein or not. It has also been possible to construct quantitative binding assays exploiting this effect. This very sensitive technique is also suitable to detect weakly binding fragments.

Mass spectrometry has developed significantly in the last decades. By applying very gentle bombardment conditions it is possible to detach single electrons from huge biomacromolecules, or even to generate negatively charged species. In the best case, it is possible to detect the investigated protein in its intact form as a singly charged ion. The charged particles are then accelerated between charged, parallel-oriented capacitor plates. The flight path of the charged particles can be bent by the application of a magnetic field. The flight path of a particular particle depends on its mass and charge. In this way it is possible to separate and detect particles based on their mass-to-charge ratio. This principle has been refined with the most sophisticated technology and an adept combination of electrical and magnetic fields so that it is now possible to detect mass differences of only a few daltons even among huge proteins. Clever experimental conditions allow a given situation in solution, for instance, a protein–ligand complex, to be carried across into the gas phase without decomposition. There it is ionized and detected in the mass spectrometer. With this technique an assay is at our disposal that can be used to detect the binding of very small ligands to proteins. It is even possible to cause the tailored decomposition of the complexes by varying the acceleration voltage.
By registering the voltage at which the decomposition occurs, the strength of the protein–ligand complex can be assessed. Because the decomposition occurs in the gas phase, information about the binding strength of such complexes in a water-free environ- ment is available. Ligands can also be “fished” with proteins. For this, a protein for which a ligand is sought, is exposed to an entire library of test compounds in aqueous solution. Whatever compounds from the library bind to the protein are captured. The protein is then separated with a microfilter, and the bound ligand is released in that the protein is chemically denatured. The solution with the released ligands is then processed, and a micro-HPLC separation is carried out. The chromatographically separated ligands are then subjected to a very sensitive analysis to determine which members of the original library were fished out by the protein. The binding process of a ligand to a protein represents a chemical reaction. As with all chemical reactions, a more or less pronounced heat of reaction can be observed. The process can either release (exothermic) or absorb heat (endothermic). This heat signal can be recorded to register the binding event of a ligand to a protein. A very sensitive calorimeter is required. When equipped with an electronically controlled compensatory heating, these devices can achieve astonishing sensitivity. As an example, such a device was built to study the activity of a butterfly that was being enticed with different pheromones. The heat that was generated with the stroke of the wing was detected as a signal by the calorimeter. 140 7 Screening Technologies for Lead Structure Discovery
A dissolved ligand can be titrated by dropwise injection into the solution of a target protein in such a calorimeter. Each drop results in a heat signal. With increasing saturation of the protein, the heat signal decreases, so that a curve can be generated from which the binding constant of the ligand can be deduced (Fig. 7.5). If all the signals are integrated over the entire titration, the total heat of reaction for the binding event is determined. With this, two different thermodynamic binding characteristics are measured. The free energy ΔG is determined from the equilibrium constant, and the enthalpy ΔH is given by the integrated heat signal (▶ Sect. 4.3). By using Eq. 4.3, the entropy of binding can be calculated. It is important that, in addition to the proof of ligand binding to the protein, the most relevant thermodynamic parameters ΔG, ΔH, and ΔS are accessible in one experiment at one temperature. Isothermal titration calorimetry is not a high-throughput method. It is better suited to the analysis and description of the binding process. Because of its importance, particularly with the optimization of ligands in mind, this method is considered again in ▶ Sect. 8.8.

[Figure 7.5: two panels; heat signal (μJ/s) versus time (min), and integrated heat (kJ/mol) versus molar ratio, annotated with stoichiometry, ΔG, and ΔH]
Fig. 7.5 In isothermal titration calorimetry, a solution of a ligand is added dropwise to a solution of a protein. The binding to the protein leads to an exothermic or an endothermic reaction. The heat that evolves upon the addition of each drop is the area under the corresponding signal peak. The total integral of all signal peaks is the binding enthalpy ΔH. With an increasing amount of ligand the protein becomes saturated, so that the intensity of the heat signal decreases. The binding constant (dissociation constant) can be derived from the shape of the curve, and the free energy ΔG can be obtained from the relationship ΔG = RT ln Kd. The stoichiometry of the reaction is obtained simultaneously. The entropy is calculated by using the equation ΔG = ΔH − TΔS.
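The two relationships quoted in Fig. 7.5 can be written out as a short calculation. The following Python sketch is only an illustration under stated assumptions: the function name, the temperature of 298.15 K, and the example values (Kd = 1 μM, ΔH = −50 kJ/mol) are hypothetical and do not come from a measurement described in this chapter. It derives ΔG from the dissociation constant via ΔG = RT ln Kd and then the entropic contribution from ΔG = ΔH − TΔS.

```python
import math

# Molar gas constant in kJ/(mol*K)
R = 8.314462618e-3


def binding_thermodynamics(kd_molar, delta_h, temperature=298.15):
    """Derive dG, -T*dS and dS from an ITC-type result.

    kd_molar    -- dissociation constant in mol/L (relative to a 1 M standard state)
    delta_h     -- binding enthalpy in kJ/mol (integrated heat signal)
    temperature -- absolute temperature in K (assumed value, not from the text)
    """
    delta_g = R * temperature * math.log(kd_molar)   # dG = RT ln Kd
    minus_t_delta_s = delta_g - delta_h              # from dG = dH - T*dS
    delta_s = -minus_t_delta_s / temperature         # in kJ/(mol*K)
    return {
        "dG": delta_g,
        "dH": delta_h,
        "-TdS": minus_t_delta_s,
        "dS": delta_s,
    }


# Hypothetical example: Kd = 1 micromolar, dH = -50 kJ/mol
result = binding_thermodynamics(kd_molar=1e-6, delta_h=-50.0)
for key, value in result.items():
    print(f"{key:>5}: {value:.3f}")
```

For these assumed inputs ΔG is approximately −34.2 kJ/mol and −TΔS approximately +15.8 kJ/mol, that is, the hypothetical binding event would be enthalpically driven and entropically unfavorable.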
7.8 Screening by Using Nuclear Magnetic Resonance

The method of NMR spectroscopy is presented in greater depth in ▶ Sect. 13.7. Here it suffices to say that it has to do with the orientation of the magnetic moments of the nuclei in a sample. By applying a carefully chosen, spatially and time-resolved sequence of electromagnetic fields, it is possible to specifically excite nuclei that are oriented within these magnetic fields. This can be carried out for one type of nucleus in a protein. If a solution of a test ligand or an entire mixture of ligands is added to such a solution, protein binding can occur, assuming that the ligands are suitable. According to their binding strength, they reside for a particular length of time on the magnetically saturated protein. In doing so the magnetic signal is transferred from the protein to the ligand. Upon dissociation, the changed magnetic characteristics can be spectroscopically detected because the relaxation time of the transferred magnetization is faster in the uncomplexed state. The solution is measured with and without the magnetized protein. Then the difference between the spectra is evaluated. Signals are recognizable only for ligands that had been bound to the protein in the interim and have therefore experienced magnetization transfer. The so-called saturation transfer difference (STD) spectrum can be used to screen for possible ligands (Fig. 7.6).

[Figure 7.6: scheme of the STD experiment; selective RF saturation of the protein, fast exchange of ligands, and subtraction of the two spectra]
Fig. 7.6 To determine the saturation transfer difference (STD) with NMR spectroscopy, a library of test ligands is added to a target protein (ellipse). Potential binders reside for a finite time span bound to the protein. If the nuclear spin of one type of nucleus in the protein is selectively saturated (red) by using a suitable resonance frequency (RF), the protein magnetization can be transferred (nuclear Overhauser effect, see ▶ Sect. 13.7) to a ligand that was bound in the meantime. These ligands become apparent in that their spectrum is altered even though they have already dissociated from the protein. If the difference between the spectra in the presence of the saturated and the unsaturated protein is displayed, it is possible to determine which ligands had been bound to the protein. Many variations and sophisticated experimental protocols have been developed for the principle of magnetization transfer.

Many different variations and elaborate experimental protocols have been developed for the above-described
principle of magnetization transfer. So-called reporter or spy ligands, which have an easily measured NMR signal, are also used. The resonance of fluorine atoms is particularly well suited. For this, a fluorine-containing reporter ligand that binds to the protein is needed; its binding should not be too strong, though. The ligand should be easily displaced from the protein by the test ligand. This release is detectable as a change in the fluorine NMR spectrum and in this way reveals the binding of the test ligand.

As is explained in detail in ▶ Sect. 13.7, the spatial structure of proteins can be determined by isotopic labeling and the measurement of mutually coupled NMR spectra. By such means, the site at which a test ligand binds to a protein can be accurately determined by evaluating the specific resonance shifts of the labeled protein. In the best case, it is even possible to see two ligands binding at once, or two different ligands binding at different, non-overlapping positions in the binding pocket. The research group of Steven Fesik at Abbott developed this method. It is known as “SAR by NMR” (SAR stands for structure–activity relationship) and is used for lead-structure identification and optimization. A nanomolar inhibitor of the matrix metalloproteinase stromelysin (▶ Sect. 25.6) was found with this method. First a potent head group was sought that could bind to the zinc ion in the catalytic center of this protease. Just such a molecule, acetohydroxamic acid 7.1, was found with an admittedly weak but specific binding of Kd = 17 mM (Fig. 7.7). After the discovery of this ligand, the binding site on zinc was saturated with this compound. Further NMR measurements concentrated on the search for a ligand suited to fill the neighboring S1′ binding pocket. For this, a small library of heteroarylphenyl and biphenyl derivatives was employed. 4-Cyano-4′-hydroxybiphenyl 7.2 was identified as a hit. On the right side of Fig. 7.7 both ligands are shown in the binding pocket. The evaluation of the structural data showed that the hydroxylated phenyl ring binds in proximity to the methyl group of the acetohydroxamic acid. Connecting the fragments was therefore the obvious next step. An ethylenoxy group was used as a bridge and was coupled to the cyanobiphenyl moiety. NMR spectroscopy confirmed this structural hypothesis, and an inhibitor, 7.3, with an affinity of 25 nM was produced.

7.9 Crystallographic Screening for Small Molecular Fragments

Crystal structure analysis delivers the most exact spatial position of a molecule in the binding pocket of a protein. Even the geometry of small, very weakly binding molecules is easily recognized. In structures that have a resolution better than 2–2.5 Å (▶ Sect. 13.5), water molecules are usually still recognizable as discrete density maxima. Often, they indicate sites in the binding pocket that can be equivalently occupied by polar functional groups of ligands (Fig. 7.8). In the early 1990s, Dagmar Ringe in the research group of Gregory Petsko intentionally exposed protein crystals to solvent molecules to allow the solvent to diffuse into the crystals (▶ Sect. 20.2). The solvent molecules can act as probes in that they populate binding regions of the protein pockets. As an example, the areas where
isopropanol, acetonitrile, or acetone are encountered in thermolysin, a zinc protease, are shown in Fig. 7.8. Even phenol, a small organic molecule, manages to diffuse into the binding pocket. Benzylsuccinic acid, a lead structure with a typical fragment size, binds to the zinc protease. Its binding position has been determined by crystallography. The phenyl ring of this molecule sits in the position that is also explored by phenol. One of the acid groups of the succinic acid is in the position that was indicated by the carbonyl carbon of acetone. The second acid group coordinates to the zinc ion and occupies positions where water molecules resided in the uncomplexed state (Fig. 7.8). There are many protein–ligand complexes in which small molecules from the crystallization solution or cryobuffer were adsorbed. These can be used as probes to map out a binding pocket. A creative scientist will directly exploit their positions for the design of new drug candidates.

[Figure 7.7: panels a–c show the structures of 7.1 (Kd = 17 mM), 7.2 (Kd = 0.02 mM), and 7.3 (IC50 = 25 nM) with the Zn2+ ion and the S1′ pocket of stromelysin; panels d and e show the binding pocket with His205, His211, Val163, and Zn2+]
Fig. 7.7 In the “SAR by NMR” method, ligands with weak affinity to a protein, in this case stromelysin, are sought from a large complex mixture. ¹⁵N-labeled protein is used and so-called ¹H–¹⁵N HSQC spectra are measured. If a ligand such as acetohydroxamic acid 7.1 becomes apparent through a shift in the resonance of specific amino acids that protrude into the binding pocket, the binding geometry can be deduced (a, d). Later the binding site is saturated with this ligand. Further NMR measurements are carried out to identify ligands for neighboring binding positions. These are revealed by the shift in the resonances of neighboring amino acids. That is how 4-cyano-4′-hydroxybiphenyl 7.2 was discovered (b, d). A chemical coupling of both hits 7.1 and 7.2 with a –CH2CH2O– linker produced 7.3, which is a nanomolar inhibitor of the protease stromelysin (c, e).

[Figure 7.8: panels a and b show the thermolysin binding pocket with Phe114, Asn112, Arg203, and Zn2+; probe molecules water, isopropanol, acetone, acetonitrile, and phenol; structure of benzylsuccinic acid]
Fig. 7.8 It was possible to soak small probe molecules (so-called “fragments”) into crystals of the protease thermolysin. (a) Superposition of multiple structures in which water (red spheres), isopropanol (C atoms are gray), acetone (C atoms are light blue), acetonitrile (C atoms are green), and phenol (C atoms are violet) had penetrated the crystals. They describe potential positions for functional groups of putative ligands. The structure of benzylsuccinic acid, a weakly binding inhibitor of thermolysin, is shown in (b). That molecule coordinates with one of its acid groups to the catalytic zinc ion (upper row). Both oxygen atoms of the acid group displace two water molecules that are present in the non-complexed structure. The other carboxylate group forms a salt bridge with the neighboring Arg203. The oxygen of an acetone molecule was found at almost the same position. The phenyl ring of benzylsuccinic acid occupies nearly the same position as the phenol molecule in the fragment structure. Benzylsuccinic acid can be used as a starting structure for further optimization.

From there, it was obvious to use crystal structure analysis as a method to screen small molecules or “fragments” (MW < 250 Da). Even today a crystal structure determination is fairly laborious. All the same, it can be largely automated so that a few hundred molecules can be processed. In addition, the tendency of small molecules to diffuse into mature protein crystals can also be used (so-called “soaking”; ▶ Sect. 13.9). If a “cocktail” of multiple test substances is used, the screening can be accelerated. A protein crystal can be exposed to up to 10 compounds at once. The cocktails are composed so that a mixture of different shapes (long and stretched, angular, spherical, etc.) is present. This makes it easier to distinguish the compounds later in the electron density (see ▶ Sect. 12.5). To optimize the effort-to-yield ratio of the crystallographic screening, a different screening method is often carried out first to pre-filter possible hits. Only compounds that have been identified as hits in the first screening are used in the subsequent crystallographic screening. However, only a few of the techniques described in the previous sections are really suitable for finding a small, weakly binding candidate in a fragment library. Frequently the candidates bind only in the millimolar range.

The hits from the crystallographic fragment screening can be further developed (▶ Sect. 20.7). One possibility is to probe the different regions of the binding pocket and then connect the pieces with a linker, analogously to what was described in Sect. 7.8 for the “SAR by NMR” method. In another, usually more successful variation, the fragment hits are chemically elaborated. For this approach additional moieties are added on the basis of the crystal structure. In this way the original hit, which serves as a seed, can be enlarged to bind more strongly to the protein.

7.10 Tethered Ligands Explore Protein Surfaces

Ligands bind with very poor affinity to flat pockets that are open to the surrounding solvent. It is therefore extremely difficult to detect their binding or to obtain a crystal structure with a ligand bound in such an area. James Wells and his colleagues at the Sunesis company in San Francisco developed the idea of tethering ligands for this type of binding. From a chemical point of view, this means that a reaction is carried out with the exposed thiol of a cysteine residue on the protein’s surface. Such a cysteine must be available in the native protein, or it is appropriately introduced by mutagenesis (▶ Sect. 12.2). Under suitable reaction conditions, the ligand is anchored with a disulfide bond, which is formed through the thiol group of the exposed cysteine (Fig. 7.9). Only those test candidates from the compound library that are able to form an interaction with the surface in the vicinity of the cysteine thiol group will react. In effect, they explore the surrounding region, react with the cysteine, and remain coupled to the surface by the disulfide bridge. Successfully formed complexes are then detected by mass spectrometry. James Wells and Robert Stroud chose thymidylate synthase as their
first test example. This enzyme plays an important role in the de novo synthesis of thymidine, an essential building block for DNA. Cells with a high division rate especially need this building block, so that inhibitors of this enzyme might serve as potent anti-infective or antitumor agents (▶ Sect. 27.2). Thymidylate synthase has a cysteine residue in position 146, in the vicinity of the catalytic site. From a library of 1,200 disulfides, compounds 7.4–7.7 proved to be binders whereas the very similar derivatives 7.8–7.11 were not selected (Fig. 7.10). Accordingly, the phenylsulfonamide together with the proline moiety seemed to be essential for binding. Next the disulfide anchor was removed, and the binding constant for N-tosyl-D-proline 7.12 was measured to be 1.1 mM (Fig. 7.11).

[Figure 7.9: reaction scheme; the thiol (SH) of a surface cysteine exchanges with disulfides R–S–S– from the library to give a protein-bound disulfide]
Fig. 7.9 The thiol group of the exposed cysteine is used as an anchor group for the formation of disulfide bonds with ligand candidates from a compound library. Only suitable ligands that are also able to interact with the surface region in the vicinity of the cysteine thiol react. A crystal structure was determined from just such a covalently linked complex (Fig. 7.12). After optimization of the initially discovered hit, the disulfide anchor can be discarded and a non-covalent inhibitor can be developed.

[Figure 7.10: chemical structures of the disulfides 7.4–7.11]
Fig. 7.10 From a library of 1,200 disulfides, the compounds on the left side, 7.4–7.7, proved to be binders, whereas the structurally similar derivatives 7.8–7.11 (right) did not bind to the protein.

To further test the concept, Cys146 was exchanged for a serine (Fig. 7.12). When no binding was apparent with this mutant, the neighboring His147 was mutated to a cysteine, but this mutant could not fish out the N-tosylproline moiety either. In contrast, the position-143 mutant, in which a leucine was exchanged for a cysteine, was successful (Fig. 7.12). The subsequently determined crystal structure showed that the N-tosylprolyl moiety was bound almost identically in both covalently anchored complexes, just as it is without an S–S anchor. This is convincing proof that the covalent coupling is not responsible for the binding geometry. In fact, the technique allows small, initially weakly binding ligands to be fished out of a large library. Starting from the original millimolar hit 7.12, the side chain of the natural cofactor methylenetetrahydrofolic acid 7.13 could be transferred to give 7.14, which was developed into the nanomolar inhibitor 7.15; the optimization thus took two steps (Fig. 7.11).

[Figure 7.11: chemical structures of 7.12–7.15; Ki values given in the figure: 1.1 mM, 24 μM, and 330 nM]
Fig. 7.11 By transferring a side chain from the natural cofactor methylenetetrahydrofolic acid 7.13, N-tosyl-D-proline, a millimolar inhibitor, could be transformed into a nanomolar inhibitor 7.15 in two steps.

[Figure 7.12: superimposed crystal structures showing Cys143, Cys146, His147, and the tethered ligands]
Fig. 7.12 Superposition of crystal structures of the enzyme thymidylate synthase with two tethered ligands, one bound to Cys143 (C atoms of ligand 7.4 are green) and the other to Cys146 (C atoms of ligand 7.4 are violet), both of which are N-tosyl-D-proline derivatives that are covalently anchored through S–S bridges. Upon cleavage of the disulfide anchor, the free N-tosyl-D-proline (C atoms are gray, 7.12) proved to be a ligand with an affinity of 1.1 mM. Its binding geometry is very similar to that of both covalently anchored derivatives.

The method of “tethering” can be applied fairly generally. It has been especially successful in the search for ligands that disrupt the formation of protein–protein surface contacts (▶ Sect. 10.6). A great advantage of the technique is that it is not necessary to develop an additional biochemical binding assay. Weakly binding ligands are covalently “tethered” and cannot be washed away as happens in the case of simple complex formation. Further, the covalently bound chemical probes allow the adaptive capacity of the surface region to be explored.

7.11 Synopsis

• Large substance libraries are screened for biological effects to filter out active molecules and assess their value for a given indication.
• Three phases are distinguished: a broad, automated initial screening for hits, a more detailed screening of chemical analogues around a hit to establish the first structure–activity relationship, and a lead optimization to find candidates for clinical testing.
• A prerequisite for high-throughput screening was the development of in vitro test systems using pure proteins produced by gene technology, along with the entire arsenal of biochemical methods in the test tube, so that the function of single gene products can be recorded.
• As a disadvantage, high-throughput screening does not assess the entire spectrum of effects and ignores aspects such as transport, distribution, metabolism, and excretion.
• Screening libraries are frequently assembled from molecules from other drug development projects; as such, they are rather inefficient with regard to their molecular size and their modest screening hit activity in the micromolar range.
• Small substances with high ligand efficiency and sufficient space for structural optimization are particularly promising.
• Enzymatic function and its inhibition can be recorded by the production of chromophoric reaction products.
• Radioactively labeled compounds or enzyme-linked immunosorbent assays are versatile techniques to record protein function on the molecular level.
• Progress in assay miniaturization calls for sophisticated robotic systems, ever-improving sensitivity of the read-out, including fluorescence measuring techniques, and reliable logistics to handle the enormous data flow.
• Aggregate formation of hydrophobic test compounds can exert significant influence on the assay read-out or even cause false-positive or false-negative hits.
• Testing in cell-based assays is performed to study changes in cellular or organism-related function beyond the pure binding of a test compound to a given protein target.
• Primary animal testing in vertebrates has been abolished today for ethical reasons, and it is being increasingly replaced by whole-animal screening using nematodes as the simplest multicellular organisms to record synergistic and side effects.
• As a complementary and alternative method, virtual computer screening has been developed to screen large compound libraries by docking ligand candidates into the known spatial structure of a target protein.
• Binding events are recorded by biophysical methods such as surface plasmon resonance, thermal stability shifts, mass spectrometry, or microcalorimetry. They are used to detect ligands as potential binders.
• NMR spectroscopy can be used to detect ligand binding by magnetization transfer. Multiple binders can be chemically linked to give more strongly binding ligands according to the SAR by NMR technique.
• Exposure of protein crystals to small molecular probes and fragments allows the structural characterization of the binding modes of weakly binding fragments as a versatile starting point for lead optimization.
• Small-molecule fragments tethered to a protein through covalent attachment to the exposed thiol group of a cysteine residue allow the exploration of the binding properties of flat, solvent-exposed surface depressions and serve as a starting point for developing antagonists that perturb the protein–protein interface in complex formation.
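The stromelysin example from Sect. 7.8 can be used to illustrate why linking two weakly binding fragments is attractive. The following Python sketch rests on a deliberately crude assumption that is not made in the text: the binding free energies of the two fragments are taken to add up in the linked molecule, and linker strain or entropic effects are ignored. The Kd values are those given for 7.1 and 7.2 (Fig. 7.7). The temperature of 298.15 K, the function names, and the heavy-atom counts (derived from the compound names) are assumptions of this sketch. Ligand efficiency is used in its common form, −ΔG divided by the number of non-hydrogen atoms.

```python
import math

R = 8.314462618e-3   # molar gas constant in kJ/(mol*K)
T = 298.15           # assumed temperature in K


def delta_g_from_kd(kd_molar):
    """Free energy of binding in kJ/mol from dG = RT ln Kd."""
    return R * T * math.log(kd_molar)


def kd_from_delta_g(delta_g):
    """Inverse relationship: Kd in mol/L from dG in kJ/mol."""
    return math.exp(delta_g / (R * T))


def ligand_efficiency(delta_g, heavy_atoms):
    """-dG per non-hydrogen atom, in kJ/mol per atom."""
    return -delta_g / heavy_atoms


# Values from the text and Fig. 7.7
kd_71 = 17e-3      # acetohydroxamic acid 7.1, Kd = 17 mM
kd_72 = 0.02e-3    # 4-cyano-4'-hydroxybiphenyl 7.2, Kd = 0.02 mM

dg_71 = delta_g_from_kd(kd_71)
dg_72 = delta_g_from_kd(kd_72)

# Assumption of this sketch: free energies of the two fragments are additive
additive_kd = kd_from_delta_g(dg_71 + dg_72)

# Heavy-atom counts derived from the names (assumption):
# CH3-CO-NH-OH has 5 non-hydrogen atoms, NC-C6H4-C6H4-OH has 15
le_71 = ligand_efficiency(dg_71, 5)
le_72 = ligand_efficiency(dg_72, 15)

print(f"dG(7.1) = {dg_71:.1f} kJ/mol, LE = {le_71:.2f} kJ/mol per atom")
print(f"dG(7.2) = {dg_72:.1f} kJ/mol, LE = {le_72:.2f} kJ/mol per atom")
print(f"Kd expected for simple additivity: {additive_kd * 1e9:.0f} nM")
```

Under this assumption the product of the two dissociation constants suggests an affinity of roughly 340 nM for the linked compound. The value reported for 7.3 is 25 nM, but it is an IC50 and therefore not directly comparable to a Kd.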
Bibliography

General Literature

Blundell TL, Jhoti H, Abell C (2002) High-throughput crystallography for lead discovery in drug design. Nat Rev Drug Discov 1:45–54
Hajduk PJ, Greer J (2007) A decade of fragment-based drug design: strategic advances and lessons learned. Nat Rev Drug Discov 6:211–219
Jahnke W, Erlanson DA (2006) Fragment-based approaches in drug discovery. In: Mannhold R, Kubinyi H, Folkers G (eds) Methods and principles in medicinal chemistry, vol 34. Wiley-VCH, Weinheim
Jones AK, Buckingham SD, Sattelle DB (2005) Chemistry-to-gene screens in Caenorhabditis elegans. Nat Rev Drug Discov 4:321–330
Klebe G (2006) Virtual ligand screening: strategies, perspectives and limitations. Drug Discov Today 11:580–592
Löfås S (2004) Optimizing the hit-to-lead process using SPR analysis. Assay Drug Dev Technol 2:407–415
Siegel MM (2002) Early discovery drug screening using mass spectrometry. Curr Topics Med Chem 2:13–33
Sotriffer C (2010) Virtual screening. In: Mannhold R, Kubinyi H, Folkers G (eds) Methods and principles in medicinal chemistry, vol 48. Wiley-VCH, Weinheim
Vogtherr M, Fiebig K (2003) NMR-based screening methods for lead discovery. In: Hillisch A, Hilgenfeld R (eds) Modern methods of drug discovery. Birkhäuser Verlag, Boston, pp S183–S120. ISBN 376436081X

Special Literature

Hajduk PJ, Sheppard G, Nettesheim DG, Olejniczak ET, Shuker SB, Meadows RP, Steinman DH, Carrera GM Jr, Marcotte PA, Severin J, Walter K, Smith H, Gubbins E, Simmer R, Holzman TF, Morgan DW, Davidsen SK, Summers JB, Fesik SW (1997) Discovery of potent nonpeptide inhibitors of stromelysin using SAR by NMR. J Am Chem Soc 119:5818–5827
Erlanson DA, Braisted AC, Raphael DR, Randal M, Stroud RM, Gordon EM, Wells JA (2000) Site-directed ligand discovery. Proc Natl Acad Sci USA 97:9367–9372
8 Optimization of Lead Structures

G. Klebe, Drug Design, DOI 10.1007/978-3-642-17907-5_8, © Springer-Verlag Berlin Heidelberg 2013

A lead structure is the starting point on the way to a drug. The potency, specificity, and duration of effect must be optimized, and the side effects and toxicity must be minimized in a usually elaborate, iterative process. Every change in the chemical structure modulates the 3D structure of the molecule, its physicochemical properties, and the activity spectrum. The isosteric replacement of atoms or groups, the introduction of hydrophobic building blocks, the dissection of rings or the restriction of flexible molecular portions into cyclic structures, and the optimization of the substitution pattern are all possibilities to purposefully modify a target structure.

Creativity and luck are always important prerequisites for success in pharmaceutical research. Nonetheless, there is a treasure chest of decades of accumulated experience that can be exceedingly supportive of the rational optimization process. Computer-aided methods can contribute their full capability in this field in particular. Several general considerations and approaches to lead optimization are presented in the sections of this chapter. A discussion of the structure-based and computer-aided optimization of lead structures is presented in ▶ Chaps. 17, “Pharmacophore Hypotheses and Molecular Comparisons” and ▶ 20, “Protein Modeling and Structure-Based Drug Design”; examples of its application to different therapeutic areas are presented in ▶ Chaps. 23, “Inhibitors of Hydrolases with an Acyl–Enzyme Intermediate”; ▶ 24, “Aspartic Protease Inhibitors”; ▶ 25, “Inhibitors of Hydrolyzing Metalloenzymes”; ▶ 26, “Transferase Inhibitors”; ▶ 27, “Oxidoreductase Inhibitors”; ▶ 28, “Agonists and Antagonists of Nuclear Receptors”; ▶ 29, “Agonists and Antagonists of Membrane-Bound Receptors”; ▶ 30, “Ligands for Channels, Pores, and Transporters”; ▶ 31, “Ligands for Surface Receptors”; ▶ 32, “Biologicals: Peptides, Proteins, Nucleotides, and Macrolides as Drugs”.

8.1 Strategies for Drug Optimization

The optimization of active substances follows a process that is best characterized by the words of the philosopher Sir Karl Popper:
    The truth isobjective and absolute. But we can never be sure that we have found it. Our knowledge is always an assumed knowledge. Our theories are hypotheses. We test for the truth in that we exclude what is false. (Objective Knowledge, 1972) Accordingly the optimization of a compound’s potency follows a working hypothesis, while an iterative process of trial and error refines the hypothesis. The assembled data about the relationship between chemical structure and biological activity serve the design of new structures. These are synthesized and tested, and a new working hypothesis is modified as appropriate. In negative cases, the hypothesis is discarded and a new one is formulated that fits more harmoniously with the biological data. The following qualities in the structure of the active substance are distinguished from one another: • The actual pharmacophore (Sects. 8.7 and ▶ 17.1) that is responsible for the specific binding and upon which only limited chemical modification can be carried out, • The additional groups (adhesion groups) that improve the affinity and biolog- ical activity, • Further groups that do not influence the binding but rather the lipophilicity of the molecule and with it the transport and distribution in biological systems (▶ Chap. 19, “From In Vitro to In Vivo: Optimization of ADME and Toxicology Properties”), • The groups that must be cleaved or modified in the organism to release the actual active form (▶ Chap. 9, “Designing Prodrugs”). The most important steps in the optimization of lead structures are the systematic changes in the shape and form, that is, the three-dimensional structure, and/or the physicochemical properties. 
Single steps along this route are:
• Changes in the lipophilicity and the electronic properties through the introduction or removal of hydrophobic or hydrophilic groups,
• Variations of substituents at aromatic or heteroaromatic rings,
• Introduction or elimination of heteroatoms in chains or rings,
• Changes in chain length of aliphatic groups or linkers,
• Introduction of space-filling substituents to stabilize a particular conformation,
• Changes in the ring size of alicyclic or heterocyclic rings,
• Incorporation of flexible partial structures in rings,
• Incorporation of branches or attachments to rings (rigidifying),
• Opening of rings,
• Elimination of chiral centers to simplify a structure,
• Addition of chiral centers to increase the selectivity, or
• Shifting the thermodynamic binding profile and the drug’s residence time at the target protein.
These processes are usually unidirectional in classical drug optimization, that is, the optimization takes place at one position of the molecule at a time, in one single direction. In the past, such unidirectional optimization has led to many disappointments because interdependent influences of the structural changes were neglected, or the optimal lipophilicity was exceeded. John Topliss developed
a scheme for the variation of aromatic substituents that allows the biological activity to be optimized in a minimum number of steps (Sect. 8.3). The application of experimental design, simultaneously changing multiple parts of a molecule, and the evaluation of the results by using quantitative structure–activity relationships (▶ Chap. 18, “Quantitative Structure–Activity Relationships”) usually allows a fast and effective optimization. In structure-based and computer-aided optimization, the 3D structure of the target protein and its complexes leads to directed structural variations of the active substances. Here again, the aspects of total lipophilicity and metabolism should not be neglected.

8.2 Isosteric Replacement of Atoms and Functional Groups

Isosteric replacement is the exchange of particular groups in a molecule for sterically and electronically related groups. If the biological effect is essentially maintained, the term bioisosteric replacement (Fig. 8.1) is used. In the simplest case a single atom is exchanged, for instance, a Cl (lipophilic, weakly electron withdrawing) is replaced by a Br (same characteristics as Cl) or methyl (lipophilic, weakly electron donating), or an –O– (polar, H-bond acceptor) is exchanged for an –NH– (polar, H-bond donor) or a –CH2– (lipophilic, unable to form H-bonds). Furthermore, bioisosteric replacement also means the exchange of entire groups.

Fig. 8.1 A few possibilities for the isosteric replacement of atoms and/or groups.
Substituents: –F, –Cl, –Br, –CF3, –NO2; methyl, ethyl, isopropyl, cyclopropyl, tert-butyl; –OH, –SH, –NH2, –OMe, –N(Me)2
Bridging groups: –CH2–, –NH–, –O–; –COCH2–, –CONH–, –COO–; C=O, C=S, C=NH, C=NOH, C=NOAlkyl
Atoms and groups in rings: –CH=, –N=; –CH2–, –NH–, –O–, –S–; –CH2CH2–, –CH2–O–; –CH=CH–, –CH=N–
Larger groups: –NHCOCH3, –SO2CH3; –COOH, –CONHOH, –SO2NH2, and heterocyclic replacements (ring structures not reproduced here)
For example, –COOH, an H-bond acceptor and donor, can be replaced with other groups that have the same or modified properties, for instance, with the similarly acidic tetrazole. Another example can be found in the exchange of a phenyl ring for a thiophene or a furan building block (Fig. 8.1). The potential of isosteric replacement is illustrated in the exchange of all three iodine atoms of triiodothyronine T3 8.1 for alkyl groups to give 3,5-dimethyl-3′-isopropylthyronine 8.2, which in turn retains impressive affinity and agonistic activity on the thyroid hormone receptor. In contrast to triiodothyronine, which is both iodinated and metabolized by a deiodinase, the alkyl groups of 8.2 are no longer metabolically cleavable.

Bioisosteric replacement was and is one of the most important strategies in pharmaceutical research. Nonetheless, surprises sometimes occur. The replacement of an ester group with an amide group in the local anesthetics (▶ Sect. 3.4) improved the metabolic stability, as expected. In the case of acetylsalicylic acid 8.3 (Fig. 8.2), this exchange cannot be made. An analogous exchange of the –COO– group for a –CONH– group results in a complete loss of activity because the amide can no longer acylate the cyclooxygenase enzyme (▶ Sect. 27.9). In the case of p-aminobenzoic acid (R = –COOH, Fig. 8.2), the exchange of the carboxyl group for a sulfonamide group gives sulfanilamide 8.4 (R = –SO2NH2), which is an antimetabolite of p-aminobenzoic acid (▶ Sect. 2.3).

[Figure 8.2: structures of triiodothyronine (T3) 8.1, the alkyl analogue 8.2, acetylsalicylic acid 8.3, and 8.4 with R = –COOH or –SO2NH2]
Fig. 8.2 Isosteric replacement with retention, loss, and reversal of the biological activity. All three iodine atoms of the thyroid hormone triiodothyronine 8.1 can be replaced with alkyl groups, and compound 8.2 is still active. In the case of acetylsalicylic acid 8.3, the exchange of the –OCOCH3 group for an –NHCOCH3 group led to the loss of the acylating ability and therefore a nearly complete loss of the biological activity. The antimetabolite sulfanilamide 8.4 (R = SO2NH2) is derived from p-aminobenzoic acid 8.4 (R = COOH), which is a critical intermediate in the bacterial dihydrofolate synthesis; 8.4 (R = SO2NH2) is the result of the exchange of a carboxyl group for an isosteric sulfonamide group.
A lead structure is rarely studied exclusively by one research group. Other companies adopt successful examples, at the very latest after the economic success of a new medicine. The goal of this so-called “me-too” research is to modify the competitor’s lead structure to arrive at patent-free analogues that are more efficacious, more selective, or better tolerated. It must be accepted that even this form of competition has led to the therapeutically most valuable compounds in many therapeutic areas. On the one hand, a great deal of duplicate work has been performed; on the other hand, new analogues with improved properties have been produced and introduced to therapy, and these turned out to be successful in the long run. Penicillins of the third and fourth generation with broad-spectrum activity and metabolic stability, β-blockers with improved selectivity, and many other specific drugs would simply not exist if it were not for the much-disparaged “me-too” research.

8.3 Systematic Variation of Aromatic Substituents

The goal of lead structure optimization has an impact on the planning of the relevant experimental series. If the biological consequences of structural changes are to be evaluated with minimal effort, careful design must precede the synthesis of the substances. Here an almost unsolvable problem emerges in that, as a general rule, the exchange of a substituent or group leads to complex changes in multiple properties. The exchange of an ethyl group for a methyl group changes only the lipophilicity and size of the substituent. If a methyl group is exchanged for a chlorine atom, the polarizability, the electronic properties, and moreover the metabolism are altered. Other substituents could additionally change the H-bond donor and acceptor properties as well as the ionization and dissociation.
In 1971, Paul Craig proposed the use of a simple diagram for the structural variation of aromatic substituents, in which important characteristics of these substituents, for instance, lipophilicity and electronic properties, are plotted against each other. The selection of substituents from different quadrants of this diagram allows an evaluation of different combinations of properties. The concept can be extended to multiple dimensions, possibly with the aid of mathematical and statistical methods.

In 1972, John Topliss made a suggestion that went further and that would be called an evolutionary strategy today. One substituent at a time (e.g., hydrogen for chlorine) is exchanged in the optimization of the substitution pattern of an aromatic compound. The next compound is planned based on which of the first two compounds demonstrated the better effect. If the new substituent improves the effect, a new substituent is chosen that has the same physicochemical properties to a greater degree, or more of these substituents are added. If the new substituent impairs the biological activity, then a substituent is chosen that has the opposite physicochemical properties. If two different substituents produce the same effect, it should be evaluated whether changes in the physicochemical properties influence the activity in the opposite direction. Despite its elegance, this strategy often fails for the mundane reason that such a stepwise approach is too time consuming.
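The quadrant selection that Craig proposed, and the stepwise rule attributed to Topliss above, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the π (lipophilicity) and σ (electronic) constants are approximate, commonly tabulated values for para substituents and are not taken from this chapter, and the function names are hypothetical.

```python
# Approximate para-substituent constants (illustrative, commonly tabulated values,
# NOT taken from this chapter): (pi = Hansch lipophilicity, sigma = Hammett sigma-para).
SUBSTITUENTS = {
    "H":    (0.00, 0.00),
    "CH3":  (0.56, -0.17),
    "Cl":   (0.71, 0.23),
    "CF3":  (0.88, 0.54),
    "OCH3": (-0.02, -0.27),
    "OH":   (-0.67, -0.37),
    "NH2":  (-1.23, -0.66),
    "NO2":  (-0.28, 0.78),
    "CN":   (-0.57, 0.66),
}


def craig_quadrant(pi, sigma):
    """Return the Craig-plot quadrant label for one substituent."""
    lipo = "+pi" if pi > 0 else "-pi"
    elec = "+sigma" if sigma > 0 else "-sigma"
    return f"{lipo}/{elec}"


def group_by_quadrant(table):
    """Group substituents by quadrant; the unsubstituted reference (0, 0) is skipped."""
    quadrants = {}
    for name, (pi, sigma) in table.items():
        if pi == 0 and sigma == 0:
            continue
        quadrants.setdefault(craig_quadrant(pi, sigma), []).append(name)
    return quadrants


def most_extreme_per_quadrant(table):
    """Pick, per quadrant, the substituent farthest from the origin (|pi| + |sigma|)."""
    picks = {}
    for quadrant, names in group_by_quadrant(table).items():
        picks[quadrant] = max(
            names, key=lambda n: abs(table[n][0]) + abs(table[n][1])
        )
    return picks


def topliss_next_step(new_activity, old_activity):
    """Direction of the next change, following the stepwise rule described in the text."""
    if new_activity > old_activity:
        return "same properties, stronger"
    if new_activity < old_activity:
        return "opposite properties"
    return "test opposite direction"


if __name__ == "__main__":
    for quadrant, names in group_by_quadrant(SUBSTITUENTS).items():
        print(quadrant, names)
    print(most_extreme_per_quadrant(SUBSTITUENTS))
```

Choosing one substituent from each quadrant covers all four combinations of lipophilic/hydrophilic and electron-withdrawing/electron-donating character with only four syntheses, which is the point of the diagram. The "most extreme" criterion used here is an arbitrary choice made for the sketch; synthetic accessibility would normally weigh in as well.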
As a consequence of the work of Craig and Topliss, further design methods were developed. None of these methods should be taken too literally. Synthetic planning must be oriented toward both the accessibility of the compounds and the largest possible structural variation, that is, a diversity of physicochemical properties and 3D structure. Since the introduction of combinatorial chemistry (▶ Chap. 11, “Combinatorics: Chemistry with Big Numbers”), the rational design of diverse substance libraries has taken on entirely new possibilities and perspectives.

8.4 Optimizing the Activity and Selectivity Profile

The structural variation of a lead structure influences not only the potency but also the activity spectrum. That can be thoroughly advantageous, but it also brings with it the risk that the selectivity deteriorates. A simple rule of thumb is that enlarging the molecule, introducing optically active centers, and rigidification improve the selectivity, assuming that the activity is not entirely lost. On the other hand, removing a chiral center, establishing more flexibility, or reducing the size of the molecule usually results in unspecific and weaker activity.

Because of the sequencing of the human genome, the gene family to which a target protein belongs is known, as is the number of members of the gene family. By using gene technology it is possible to construct single-isoform test systems (assays). As a result, pharmaceutical research today is in a position to establish a predictive selectivity profile. This has stimulated efforts to develop selective drugs. An interesting corollary to these efforts is the fact that, as statistics show, the molecular weight of drugs has increased in recent years, a confirmation of the above-mentioned rule of thumb.

For drugs that are meant to act on neuroreceptors in the brain, the polarity is critical to whether they can cross the blood–brain barrier.
Polar compounds are unable to do this and act only in the periphery, for instance, on the circulatory system. Examples of this are adrenaline 8.5 and dopamine 8.6 (Fig. 8.3). The stepwise removal or masking of polar groups brings the central effects into the foreground. Ephedrine 8.7 acts in the brain and in the periphery; it is centrally stimulating and raises the blood pressure. Amphetamine 8.8 (“speed”) and the intoxicant MDMA 8.9 (the designer drug “ecstasy”) are weak bases. Their relatively nonpolar neutral forms easily overcome the blood–brain barrier and their CNS effects dominate (Fig. 8.3).

[Figure 8.3: structures grouped as polar molecules (adrenaline 8.5; dopamine 8.6, R = H; L-DOPA 8.10, R = COOH), intermediate polarity (ephedrine 8.7), and nonpolar molecules (amphetamine 8.8; MDMA 8.9)]
Fig. 8.3 The polar compounds adrenaline 8.5 and dopamine 8.6 are cardiovascularly active in the periphery after intravenous administration. Ephedrine 8.7 is more lipophilic and therefore shows both peripheral and central effects. The more nonpolar compound amphetamine 8.8 (“speed”) has an overwhelmingly stimulatory effect in the CNS. 3,4-Methylenedioxymethamphetamine 8.9 (MDMA; “ecstasy”) is hallucinogenic. Polar groups are red and neutral or lipophilic groups are blue.

There are exceptions even here. L-DOPA 8.10 (Fig. 8.3) is an extremely polar amino acid. It could never cross the blood–brain barrier by passive diffusion alone. Instead it is recognized by an amino acid transporter and actively transported across the membrane and into the brain. This simultaneously solves the problem of bringing dopamine 8.6, which is used to treat Parkinson’s disease, into the brain because L-DOPA is decarboxylated to dopamine there (▶ Sects. 9.4 and ▶ 27.8).

The decisive influence that even the smallest changes in the structure can have is seen in the effect spectrum of the hormones and neurotransmitters noradrenaline and adrenaline and their synthetic analogues. Whereas noradrenaline 8.11 (Fig. 8.4) affects the α-adrenergic receptors, its N-methyl derivative adrenaline 8.5 (Fig. 8.3) acts on α and β receptors as a mixed α/β agonist. This difference was used to enlarge the N-alkyl group to arrive at the specific β-agonist isoprenaline 8.12 (Fig. 8.4). Further differentiation of the effects could be achieved within the class of β-adrenergic substances. Dobutamine 8.13 is missing the alcoholic hydroxyl group of adrenaline. Despite its structural relationship to dopamine 8.6 (Fig. 8.3), it is a β1 agonist with cardioselective effects. Specific β2 agonists, for instance, salbutamol 8.14 and clenbuterol 8.15 (Fig. 8.4), are used to treat asthma because they are bronchodilators without the cardio-stimulatory effects of the unspecific β agonists (▶ Sect. 29.3).

[Figure 8.4: noradrenaline 8.11 (R = H), predominantly α-mimetic; adrenaline 8.5 (R = CH3), α- and β-mimetic; isoprenaline 8.12 (R = –CH(CH3)2), β-mimetic; dobutamine 8.13, β1-mimetic; salbutamol 8.14, β2-mimetic; clenbuterol 8.15, β2-mimetic]
Fig. 8.4 Noradrenaline 8.11, adrenaline 8.5, and isoprenaline 8.12 act to different extents on the α and β receptors. Selective β1 and β2 agonists, for instance, 8.13, 8.14, and 8.15, act specifically as cardiac stimulants or bronchodilators.

The sulfonamides are a prime example for the targeted optimization of lead structures in different therapeutic indications. The diuretics as well as the hypoglycemics (antidiabetics) emerged from the first antibacterial examples. It had already been noticed in 1940 that sulfanilamide (▶ Sect. 2.3) inhibits the enzyme carbonic anhydrase and therefore should lead to increased urine production (▶ Sect. 25.7). Among other substances, hydrochlorothiazide 8.16, furosemide 8.17 (Fig. 8.5), and structurally related compounds gained therapeutic importance.

[Figure 8.5: structures of hydrochlorothiazide 8.16, furosemide 8.17, carbutamide 8.18 (R = NH2), tolbutamide 8.19 (R = CH3), and glibenclamide 8.20]
Fig. 8.5 The sulfonamides hydrochlorothiazide 8.16, furosemide 8.17, and related diuretics are different from most antibacterial analogues because of the unsubstituted sulfonamide group. Carbutamide 8.18 and tolbutamide 8.19 were the first unspecific sulfonamides with hypoglycemic effects that were later replaced with specific hypoglycemics of the glibenclamide-type 8.20.

In the early 1940s, the hypoglycemic effects of a few sulfonamides were clinically observed. The antibacterial and simultaneously hypoglycemic carbutamide 8.18 was introduced to therapy in 1955; the lipophilic and therefore more bioavailable tolbutamide 8.19
was introduced later. Systematic structural variation finally led to glibenclamide 8.20 (Fig. 8.5 and ▶ Sect. 30.2), which is much more potent and specific.

8.5 From Agonists to Antagonists

There is no general recipe for the transformation of an agonist into an antagonist. An example of this is found in the tedious route from the agonist histamine to the H2 antagonists, as is described in detail in ▶ Sect. 3.5. There are, however, recognized principles that have proven to be of value. For example, the exchange of polar for nonpolar substituents or the introduction of large groups such as additional aromatic rings changes some receptor agonists to antagonists. The exchange of both phenolic hydroxyl groups in isoprenaline 8.12 for two chlorine atoms (DCI, 8.21) or additional aromatic rings (pronethalol, 8.22) delivered the first β-adrenergic antagonists, the β-blockers. The introduction of an oxygen atom in the side chain and further structural optimization afforded the first β1-selective antagonists, for example, practolol 8.23 and metoprolol 8.24. The β1-selective partial agonist xamoterol 8.25 is a blocker as well as an agonist (Fig. 8.6). It occupies β1 receptors and displays a moderately stimulating effect. By occupying the receptor, it protects it from an excessive response upon elevated adrenaline release, for instance, from exercise or stress.

[Figure 8.6: structures of DCI 8.21, pronethalol 8.22, practolol 8.23 (R = –NHCOCH3), metoprolol 8.24 (R = –CH2CH2OMe), and xamoterol 8.25]
Fig. 8.6 3,4-Dichloroisoprenaline 8.21 (DCI) and pronethalol 8.22, the first unspecific β-blockers, were derived from isoprenaline 8.12. Practolol 8.23 and metoprolol 8.24 are specific β1 antagonists. Xamoterol 8.25 is a partial β1 agonist, a combined agonist and antagonist.

Analogously, the exchange of the imidazole ring of histamine 8.26 for large hydrophobic groups led to the first H1 antagonists, for instance, diphenhydramine 8.27 (Fig. 8.7). Sedation is the most troublesome side effect of the classic H1 antagonists, which are used to treat allergies. The non-sedating terfenadine
8.28 (R = CH3) can cross the blood–brain barrier because of its high lipophilicity, but is immediately expelled by a transporter. Because of its cardiotoxicity, terfenadine has since been withdrawn from the market and replaced by its active metabolite fexofenadine 8.28 (R = COOH). The sedating side effects of antihistamines also led to neuroleptics and antidepressants (▶ Sect. 1.6). Here, however, the limits of rational drug optimization are apparent. Promethazine 8.29 is an antihistamine with antiallergic action and sedating side effects. The neuroleptic chlorpromazine 8.30 is a central depressant and therefore an antipsychotic; imipramine 8.31, which has an extraordinarily similar structure, acts, on the other hand, as a stimulant and is an antidepressant (Fig. 8.8). All three substances have different mechanisms of action. The introduction of additional aromatic rings to other receptor agonists, for instance, to the neurotransmitters acetylcholine and dopamine, has led to antagonists (Fig. 8.9).

8.6 Optimizing Bioavailability and Duration of Effect

The absorption of the majority of pharmaceuticals depends only on their lipophilicity. The more polar the drug, the more poorly it can penetrate the lipid membrane, and the lower the absorption (▶ Sect. 19.6). Increasing the lipophilicity improves the absorption (▶ Sect. 19.6). Extremely lipophilic compounds, however, are insoluble in water, and their absorption is too slow. Lipophilic acids and bases offer advantages here if their pKa value is not too far away from the neutral point, pH 7. In their ionized form they are highly water soluble, while in their neutral form, with which they are in equilibrium, they are lipophilic and membrane penetrable.

[Figure 8.7: histamine 8.26, H agonist; diphenhydramine 8.27, nonpolar H1 antagonist (sedating); terfenadine 8.28 (R = CH3), polar H1 antagonist (non-sedating); fexofenadine, active metabolite (R = –COOH)]
Fig. 8.7 By starting with histamine 8.26 and introducing large hydrophobic groups, the H1 antagonists, for instance, diphenhydramine 8.27, were obtained. The non-sedating terfenadine 8.28 (R = CH3) crosses the blood–brain barrier but is immediately expelled by a transporter. The active metabolite fexofenadine (R = COOH) is now on the market.
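The remark in Sect. 8.6 that lipophilic acids and bases exist in an equilibrium between an ionized, water-soluble form and a neutral, membrane-penetrable form can be made quantitative with the Henderson–Hasselbalch relationship. The following Python sketch computes the neutral fraction at a given pH; the pKa values in the example are hypothetical illustrations and do not refer to any compound in this chapter.

```python
def neutral_fraction(pka, ph, kind):
    """Fraction of molecules present in the neutral (uncharged) form.

    Henderson-Hasselbalch for a single ionizable group:
      acid  HA  <-> A-  + H+ : neutral fraction = 1 / (1 + 10**(pH - pKa))
      base  BH+ <-> B   + H+ : neutral fraction = 1 / (1 + 10**(pKa - pH))
    """
    if kind == "acid":
        return 1.0 / (1.0 + 10.0 ** (ph - pka))
    if kind == "base":
        return 1.0 / (1.0 + 10.0 ** (pka - ph))
    raise ValueError("kind must be 'acid' or 'base'")


if __name__ == "__main__":
    # Hypothetical pKa values, chosen only to show the trend.
    examples = [
        ("acid, pKa 4.0", 4.0, "acid"),
        ("acid, pKa 6.5", 6.5, "acid"),
        ("base, pKa 7.5", 7.5, "base"),
        ("base, pKa 10.0", 10.0, "base"),
    ]
    for label, pka, kind in examples:
        fraction = neutral_fraction(pka, 7.0, kind)
        print(f"{label}: {100 * fraction:.2f} % neutral at pH 7.0")
```

A group whose pKa equals the pH is half ionized and half neutral; every pKa unit farther from the pH shifts the ratio by a factor of ten. This is why a pKa near 7 gives both appreciable solubility and an appreciable membrane-permeable fraction, whereas a pKa three units away leaves only about 0.1 % of one of the two forms.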
These correlations are discussed in detail in ▶ Sect. 19.5. The molecular size influences the bioavailability insofar as substances with a molecular weight above 500–600 Da are captured by the liver on the sole grounds of their molecular size and are quickly excreted with the bile. Aside from this, there are substances that penetrate the membrane regardless of their polarity. These are taken up into the cell or are eliminated from the cell by transporters (▶ Sect. 30.7). Among these are structural analogues of amino acids and nucleosides.

Classical strategies to extend the duration of action are the conversion of free hydroxyl groups to ethers (see ▶ Sect. 9.2), the replacement of esters with amides, and the replacement of metabolically labile amide groups with isosteres. In a few cases, such structural changes are associated with a reduction in potency, which is more than compensated for by a longer duration of action. In the case of peptides, the replacement of L-amino acids with D-amino acids, the inversion of amide groups, and the replacement of larger structural elements with peptidomimetic groups (▶ Sect. 10.4) have all proven successful.

[Figure 8.8: structures of promethazine 8.29 (H1 antagonist), chlorpromazine 8.30 (neuroleptic), and imipramine 8.31 (antidepressant)]
Fig. 8.8 Closely related structures of active substances can have very different qualitative activity. Chlorpromazine 8.30, a dopamine antagonist with neuroleptic activity, and imipramine 8.31, a dopamine transporter inhibitor with antidepressant activity, are both derived from promethazine 8.29, an H1 antagonist with antiallergic activity.

[Figure 8.9: histamine 8.26 (positively charged form at pH = 7) and the pharmacophore derived from it, with the points A, D, and P]
Fig. 8.9 The active substance histamine 8.26 and pharmacophores that are attributed to it (A acceptor, D donor, P positively charged group).

The metabolism of aliphatic amino groups can be suppressed with alkyl substitution or branching at the α carbon. Secondary alcohols can be converted to the more bioavailable tertiary alcohols by introducing an ethynyl group at the same carbon atom (▶ Sect. 28.5). The introduction of an isosteric fluorine atom in the para position as a replacement for hydrogen prevents hydroxylation in this position. If steric considerations do not play a role, the para position can also be blocked
with a larger group, such as a chlorine atom or a methoxy group. In the hydroxylated 3- and 4-positions of the neurotransmitters dopamine, adrenaline, and noradrenaline, the conversion to the monohydroxylated analogues, to 3,5-dihydroxy compounds, or to the NH-isosteric indole group (Fig. 8.1, Sect. 8.2) led to metabolically more stable and therefore longer-acting compounds.

8.7 Variations of the Spatial Pharmacophore

Rational design is characterized by the fact that the common features of all active compounds, and the differences from less potent or inactive analogues, can be derived from the structure of the pharmacophore. A pharmacophore (Sect. 8.9) is defined as a special arrangement of particular functionalities that are common to more than one drug and form the basis of the biological activity (▶ Sect. 17.1).

During the course of rational optimization, the molecular scaffold and the substituents at a pharmacophore are changed to maintain the principal function while arriving at higher potency or better selectivity. Many computer methods have been developed to generate ideas for the spatial isomorphic replacement of ligand scaffolds. By considering the conformational aspects of the molecules (▶ Chap. 16, “Conformational Analysis”), they scan databases to find possible candidates that, despite a different parent scaffold, can place the side chains and interacting groups in the same spatial orientation. Examples of such approaches are presented in ▶ Sect. 10.8 and ▶ Chap. 17, “Pharmacophore Hypotheses and Molecular Comparisons”.

An indirect approach using the protein structure has also been tried. For this, the spatial structure of the protein–ligand complex is the starting point from which a part of the binding pocket is cut out, and new building blocks for the ligand are sought. Subsequently the form and interaction properties of the cut-out pocket are compared with a database of all known protein–ligand complexes (▶ Sect. 20.4).
If a subpocket is discovered that has similarities to the sought-after pocket, then ligands that bind there provide an interesting design hypothesis. The structure of the building blocks that occupy the newly discovered pocket can generate ideas for isosteric structural elements in a modified ligand.

A different strategy that also considers the pharmacophore can be successful. In this approach the pharmacophore is retained and only those groups are modified that affect the pharmacokinetic properties, that is, the transport, distribution, metabolism, and excretion of a molecule. An efficient and pragmatic strategy is important. For this, it is essential that not too many changes are made at the same time, and the changes should not be too one-sided. With little synthetic effort, a broad spectrum of physicochemical properties and spatial arrangements should be covered.

In the meantime it has been established that binding to human plasma proteins such as serum albumin and the acidic α1-glycoprotein is of decisive importance for the transport and pharmacokinetic properties of a drug. Therefore binding to these proteins is considered even in the early phase of drug development (▶ Chap. 19,
“From In Vitro to In Vivo: Optimization of ADME and Toxicology Properties”). On the other hand, binding to the hERG ion channel (a so-called “antitarget”) is avoided because blocking this channel can lead to arrhythmias (▶ Sect. 30.3).

Drug metabolism is in itself a very important topic and must be considered in the early phases of development. The cytochrome P450 enzymes are responsible for the vast majority of chemical transformations that occur on xenobiotics (▶ Sect. 27.6). To be able to predict the behavior of drug candidates at this stage of the development process, the expected interactions with these metabolic enzymes are evaluated in an early phase of optimization. The expression of P450 enzymes can also be induced by xenobiotics. The trigger for this could be the binding to a transcription factor like the PXR receptor (▶ Sect. 28.7). Drug candidates binding to this transcription factor can be evaluated early in their development to avoid this undesirable enhanced metabolism.

8.8 Optimizing Affinity, Enthalpy, and Entropy of Binding and Binding Kinetics

Generally, the binding affinity to a target protein is primarily improved during the course of optimization. If multiple candidates are available, the ligand efficiency (▶ Sect. 7.1) in addition to the chemical accessibility leads the way. Small, potent lead structures offer legitimate hope that they can be well optimized. Very small compounds that have nanomolar affinity, despite their low molecular weight, can be problematic. Most of the time an optimal interaction pattern is already established. It is then almost impossible to transfer this pattern to another molecular scaffold. Medicinal chemists have established a set of rules based on experience (▶ Sect. 4.10). According to these rules it is possible to judge how much a particular group, if correctly placed, can contribute to the binding affinity. It was shown in ▶ Sect.
4.10 that the affinity is a combination of the enthalpic and entropic contributions. Usually one begins with a lead structure that has a binding affinity in the micromolar range. Expressed as the Gibbs free energy ΔG, this is usually about −30 kJ/mol. An increase in the binding affinity of 4–5 orders of magnitude corresponds to an improvement in ΔG of 20–30 kJ/mol.

Where should the screw be turned to optimize a lead structure? Does it make more sense to improve the binding enthalpy, or is one better advised to improve the binding entropy? Given the enthalpy/entropy compensation described in ▶ Sect. 4.10, is it even possible to attempt optimization of both values independently? The prerequisite for using such a concept in the optimization is the determination of both values of a lead structure. Does this help in the choice of the right candidate for optimization? In the case that the thermodynamic binding profiles of multiple alternative lead candidates are known, should enthalpically or entropically driven binders be chosen for optimization? It is very interesting to compare the thermodynamic signatures of multiple generations of marketed products. The binding profiles for HIV protease inhibitors (▶ Sect. 24.3) and HMG-CoA
inhibitors (▶ Sect. 27.3) are displayed in Fig. 8.10. Notably, it has been possible to shift the profile from initially strongly entropically driven binders to enthalpically driven ones. This observation suggests that it is initially simpler to optimize a substance’s entropic binding contribution than its enthalpic contribution. Most of the time this can be seen with the first lead structure, for which an enlargement of the hydrophobic surface area leads to better binding. The affinity that is gained is explained by the displacement of ordered water molecules (▶ Sect. 4.6). Such contributions are assumed to be entropically favorable.

[Figure 8.10: bar charts of ΔG, ΔH, and −TΔS (kcal/mol) for the HIV protease inhibitors indinavir, saquinavir, nelfinavir, ritonavir, amprenavir, lopinavir, atazanavir, tipranavir, and darunavir (upper) and for the statins fluvastatin, pravastatin, cerivastatin, atorvastatin, and rosuvastatin (lower)]
Fig. 8.10 Between 1995 and 2006, the profile of multiple development generations of HIV protease inhibitors (upper, for formulae see ▶ Fig. 24.15) and statins as HMG-CoA inhibitors (lower, for formulae see ▶ Fig. 27.13) could be optimized for their thermodynamic signatures, that is, the extent to which they are driven by entropy or enthalpy. The free energy ΔG is shown in red, the enthalpy ΔH in blue, and the entropic contribution −TΔS in green. The more negative the column becomes, the stronger the binding affinity and the more the profile is determined by enthalpy or entropy. The initially developed compounds such as indinavir, saquinavir, nelfinavir, and pravastatin were entropic binders; in contrast, the newer derivatives such as darunavir or rosuvastatin have an improved enthalpic profile.

A strategy of introducing rigid rings can also be pursued. In doing so, the compound loses degrees of freedom. If the geometry of the bound state is correctly frozen, the binding is improved for entropic reasons. An example of this is the
binding of the largely rigid thrombin inhibitor 8.32, which binds in an almost exclusively entropically driven manner to the protein (Fig. 8.11). In contrast, the decidedly more flexible ligand 8.33 displays a large enthalpic binding contribution. Compound 8.32 represents the result of an optimization that led to a substance with single-digit nanomolar binding and an optimal shape complementarity for the binding pocket of thrombin.

It seems that generally applicable concepts exist for entropy-driven optimization. If one can “always win entropically,” then for theoretical reasons enthalpically favored lead structures should be preferred as a starting point for optimization.

However, caution is called for here. Why a ligand has a particular thermodynamic profile must be clarified. The inhibitors 8.34 and 8.35 were discovered in a virtual screening as aldose reductase inhibitors (Fig. 8.12). The chemical structures of both ligands are very similar. Nevertheless one is an enthalpically driven binder, and the other is an entropically driven binder. The crystal structures of both ligands with the protein revealed the reason: the enthalpically preferred inhibitor 8.34 entraps a water molecule, which mediates binding between the ligand and the protein, whereas the other one does not. The incorporation of a water molecule is entropically disfavored, and therefore the profile appears to be that of an enthalpic binder.

A resistance profile for inhibitors against mutants of the viral HIV protease was investigated in the research group of Ernesto Freire at The Johns Hopkins University in Baltimore (▶ Sect. 24.5). Interestingly, the result was that resistance to the entropically favored inhibitors could be developed much faster than to inhibitors with enthalpic advantages. This observation indicates that it is worthwhile to concentrate on enthalpically favored binders in cases in which resistance can be expected to develop.
Fig. 8.11 The rigid thrombin inhibitor 8.32 only has a small number of rotatable bonds. It has an optimal shape complementarity to the binding pocket of thrombin. Its binding is, for the most part, entropically driven. On the other hand, the considerably more flexible ligand 8.33 has a higher enthalpic binding contribution. Thermodynamic data shown in the figure: ΔG: −42.3 kJ/mol, ΔH: −6.2 kJ/mol, −TΔS: −36.1 kJ/mol (entropically driven profile); ΔG: −49.2 kJ/mol, ΔH: −48.5 kJ/mol, −TΔS: −0.7 kJ/mol (enthalpically driven profile).

In the investigated example the enthalpically driven binder 8.33 had a less-rigid scaffold (Fig. 8.11). This allows it to more easily elude changes that are caused by mutations. It is much more difficult for rigid ligands that bind for entropic reasons to adapt to such steric modifications. On the other hand, entropic binders can also have an advantage in escaping resistance. If a ligand is entropically favored because it adopts multiple binding modes, and even exhibits large residual mobility in the binding pocket when bound, this can prove to be beneficial! If the protein tries to change the shape of its binding pocket through resistance mutations to this inhibitor, an incorporated ligand that is able to adopt multiple binding modes is left with alternative orientations, which, despite the mutation, still offer good binding.

If it is clear that a lead structure is an enthalpically driven binder, and superimposed effects such as the entrapment of water molecules have not distorted the profile, how is the binding of an enthalpically driven binder optimized? Let us remember the consideration in ▶ Sects. 4.5 and ▶ 4.8: hydrogen bonds, electrostatic interactions, and van der Waals contacts determine the binding enthalpy. However, a change in such an interaction property of a molecule is often coupled with a compensation of enthalpy and entropy. The result is that ΔG and the binding affinity do not change at all! The optimization process can be compared to the act of getting around the inherent enthalpy/entropy compensation. Enthalpically favorable hydrogen bonds should have an optimal geometry and should not induce severe structural changes in the protein environment. Otherwise this can lead to an entropic compensation by causing a shift in the dynamic degrees of freedom. It seems to be more favorable to strengthen the hydrogen bonds in structurally rigid regions of the binding pocket. There, enthalpy is better gained because the compensatory shift in dynamic parameters is less likely.
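The bookkeeping behind these thermodynamic profiles can be made explicit with a short sketch. It uses the relation ΔG = ΔH − TΔS and the values printed in Figs. 8.11 and 8.12. The assignment of the two Fig. 8.11 data sets to 8.32 and 8.33 is an assumption that follows the caption's description (8.32 entropically driven, 8.33 enthalpically driven), and the 5 kJ/mol compensation step is a purely hypothetical illustration, not a measured result.

```python
# Thermodynamic binding profiles in kJ/mol as (dH, -T*dS) pairs.
# Values are those printed in Figs. 8.11 and 8.12; the mapping of the
# Fig. 8.11 data sets to 8.32 / 8.33 is assumed from the caption.
PROFILES = {
    "8.32": (-6.2, -36.1),
    "8.33": (-48.5, -0.7),
    "8.34": (-25.6, -9.8),
    "8.35": (-8.7, -22.6),
}


def free_energy(dh, minus_tds):
    """Gibbs free energy of binding: dG = dH + (-T*dS)."""
    return dh + minus_tds


def driving_force(dh, minus_tds):
    """Name the larger favorable (more negative) contribution."""
    return "enthalpic" if dh < minus_tds else "entropic"


def compensated(dh, minus_tds, gain=5.0):
    """Hypothetical modification: enthalpy improves by `gain`,
    entropy worsens by the same amount, so dG stays constant."""
    return dh - gain, minus_tds + gain


for name, (dh, minus_tds) in PROFILES.items():
    dg = free_energy(dh, minus_tds)
    print(f"{name}: dG = {dg:6.1f} kJ/mol ({driving_force(dh, minus_tds)} binder)")
```

In this sketch the four free energies reproduce the ΔG values given in the figures, and applying `compensated` to any profile leaves ΔG unchanged, which is the enthalpy/entropy compensation the text warns about.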
Fig. 8.12 Compounds 8.34 and 8.35 were discovered in a virtual screening as lead structures for the inhibition of aldose reductase. Although they are structurally similar, 8.34 is a stronger enthalpic binder and 8.35 is an entropic binder. The subsequent crystal structure analysis of the complex with the reductase showed that 8.34 traps a water molecule upon binding, whereas this was not observed with 8.35. Because the entrapment of a water molecule is entropically unfavorable, the binding of 8.34 is enthalpically preferred. Thermodynamic data shown in the figure: 8.34: ΔG: −35.4 kJ/mol, ΔH: −25.6 kJ/mol, −TΔS: −9.8 kJ/mol; 8.35: ΔG: −31.3 kJ/mol, ΔH: −8.7 kJ/mol, −TΔS: −22.6 kJ/mol.

Introduced hydrogen bonds should also not reduce the degree of desolvation of a bound ligand in that they induce small structural changes in the binding geometry of hydrophobic groups that become stronger when exposed to the surrounding solvent environment. It is also important that the local water structure in the binding pocket remains unchanged.

Another essential question has to do with the optimal interaction kinetics that a ligand should have. Surface plasmon resonance was introduced in ▶ Sect. 7.7. The question of whether a ligand binds quickly or slowly to a protein and with what rate it is released again can be determined with this method. Ideally, how long should a ligand stay bound to a protein, what is the optimal residence time? The binding affinity is determined by the relative ratio of the association rate (kon) and the dissociation rate (koff). It has been shown that structurally similar ligands can have entirely different kinetic profiles. Which profile is optimal? A loss in affinity can manifest itself as an increased dissociation rate, or a slower association rate, as well as a combination of both effects. It was shown in the research group of Helena Danielson in Uppsala that different binding profiles of therapeutically used HIV protease inhibitors correlate with the development of resistance to mutants of the protease. They also demonstrated that resistance forms more rapidly against drugs that have a higher dissociation rate. This is a decisive criterion to direct drug optimization in the correct direction. Certainly the kinetic binding profile must be granted a greater priority in the future. Therefore, a more comprehensive correlation between the structure and the binding is necessary so that this knowledge can be used for targeted design. Until now, what differentiates a “fast” or “slow” binder has only been understood in a very few cases. These are parameters that have to do with the induced-fit adaptations of the protein. It can also involve the ease with which the desolvation of the previously uncomplexed binding pockets takes place or with the kinetics with which a ligand in the solvated state sheds its own water shell.
More attention must be paid to these protein and ligand-based properties.

8.9 Synopsis

• A lead structure is only the starting point on the way to a drug; potency, specificity, and duration of action have to be optimized concurrently to minimize side effects and toxicity.
• The structure of an active substance is determined by its pharmacophore, which is responsible for target binding. Its adhesion groups enhance potency and biological activity, its lipophilicity is responsible for transport and distribution, and groups to be cleaved or modified release the active form.
• Multiple concepts to modify the chemical structure of a lead can be planned; however, optimization is multifactorial due to highly correlated influences of the attempted changes.
• Bioisosteric functional group replacement attempts the exchange of groups on a given skeleton for sterically and electronically related groups that maintain activity but improve other drug properties.
• Me-too research follows the goal of modifying the competitor’s lead structures to arrive at patent-free analogues with improved properties.
• Assuming unchanged activity, enlarging a molecule, adding chiral centers, and rigidification usually improve selectivity, whereas removing chiral centers, allowing more flexibility, and reducing the size make a drug less selective.
• The activity spectrum of a substance can be tailored even by the smallest structural changes that modulate affinity, transportation, distribution, or metabolism. Therefore a particular compound class can show activity in quite different therapeutic indications.
• Transforming agonists to antagonists does not follow clear-cut rules; however, increasing the size and the attachment of hydrophobic groups such as aromatic rings often shift the profile.
• The more polar a drug, the more poorly it can penetrate lipid membranes, and the lower is the absorption. On the other hand, special transporters can assist penetration.
• Extension of the duration of action is mostly achieved by replacement of metabolically labile groups with more stable isosteres, the introduction of more branching groups, blockage of metabolically labile positions at aromatic rings by F or Cl, or by exchanging L- for D-amino acids concurrently with the inversion of amide groups.
• Molecular databases can be screened to detect other scaffolds or substitution patterns that represent a given pharmacophore in an alternative fashion.
• In the early phase of drug development undesired binding to plasma proteins, antitargets such as the hERG ion channel or preferred binding, inhibition, or activation of transcription factors or metabolizing cytochrome P450 enzymes are examined and possibly avoided.
• Proper adjustment of the thermodynamic binding profile can be essential for the optimization of binding affinity and to endow a drug with the required target-specific properties. Similarly the interaction kinetics determining binding on and off rates or residence times are of decisive importance to develop drugs with, for example, an optimal resistance profile.
Bibliography

General Literature

Sneader W (1985) Drug discovery: the evolution of modern medicines. Wiley, New York
Taylor JB, Triggle DJ (eds) (2007) Comprehensive medicinal chemistry II. Elsevier, Oxford
Wermuth CG (ed) (2008) The practice of medicinal chemistry, 3rd edn. Elsevier-Academic, New York

Special Literature

Burger A (1991) Isosterism and bioisosterism in drug design. Fortschr Arzneimittelforsch 37:287–371
Copeland RA, Pompliano DL, Meek TD (2006) Drug–target residence time and its implications for lead optimization. Nat Rev Drug Discov 5:730–740
Fokkens J, Klebe G (2006) A simple protocol to estimate protein binding affinity differences for enantiomers without prior resolution of racemates. Angew Chem Int Ed Engl 45:985–989
Hansch C (1974) Bioisosterism. Intra-Science Chem Rept 8:17–25
Lipinski CA (1986) Bioisosterism in drug design. Ann Rep Med Chem 21:283–291
Ohtaka H, Freire E (2005) Adaptive inhibitors of the HIV-1 protease. Prog Biophys Mol Biol 88:193–208
Shuman CF, Markgren P-O, Hämäläinen M, Danielson UH (2003) Elucidation of HIV-1 protease resistance by characterization of interaction kinetics between inhibitors and enzyme variants. Antiviral Res 58:235–242
Steuber H, Heine A, Klebe G (2007) Structural and thermodynamic study on aldose reductase: nitro-substituted inhibitors with strong enthalpic binding contribution. J Mol Biol 368:618–638
Thornber CW (1979) Isosterism and molecular modification in drug design. Chem Soc Rev 8:563–580
9 Designing Prodrugs

After the optimization of a lead structure there are still problems. Many substances lack important characteristics that are required for therapy in humans, for instance, adequate bioavailability, duration of action and metabolic stability, the ability to penetrate the blood–brain barrier, selectivity, or good tolerability. Often it proves impossible to address or improve these properties through structural variation. A solution to this problem can be found through special preparations, for instance to be used for poorly water-soluble substances, or via a derivatization to a prodrug. This term refers to a non-active or poorly active precursor or derivative of an active molecule. In the organism this form is converted to the actual active substance. In most cases, this is achieved by enzymatic reactions; in a few cases it happens by spontaneous chemical decomposition. Aside from this, the metabolites of some drugs also show favorable therapeutic properties. In some cases this has led to new and improved drugs; in other cases the original substance was retained as a prodrug.

9.1 Foundations of Drug Metabolism

Multiple factors have crucial importance for the absorption, bioavailability, and duration of action of an active substance. The most important are the solubility and lipophilicity of the drug, which are nearly equal in importance, followed by the molecular size and the metabolic stability. The terms absorption and bioavailability have very different meanings. Absorption refers to the amount of active substance that is taken up by the entire gastrointestinal tract. The bioavailability refers to just the portion of the active substance that is available in the circulation after the first pass through the liver. After oral administration, the metabolism of the substance by enzymes begins. Ester and amide bonds are hydrolyzed, often already in the stomach and intestines, or by passage through the stomach and intestinal wall.
The entire blood volume that flows through the intestines goes first to the liver via the portal vein (Fig. 9.1). This passage is called “first pass”. Because of its rich spectrum of hydrolyzing, oxidizing, reducing, and conjugating enzymes, the liver is the main site of drug degradation, that is, metabolism. A drug can have poor bioavailability despite good absorption because of fast and pronounced metabolism in the liver. For many substances, the first pass is already ‘the end of the road’. They are well absorbed, but are immediately metabolized or excreted in the bile. The “first-pass effect” refers to cases of successful and extensive metabolism in the very first passage. Lipophilic active substances and those with a molecular weight of more than 500–600 Daltons (Da) are susceptible to particularly intense first-pass effects. Of course, blood flows continuously through the liver, and metabolism carries on. The substances are no longer in the blood stream at as high a concentration as they were before the first liver passage because they have been distributed to the tissue.

In general, the hydrolytic cleavage of ester or amide groups leads to highly water-soluble metabolites that can be excreted by the kidneys. Conjugation, that is, the coupling of the substance with native polar substances, for instance, with sulfate groups, the amino acid glycine, or the glucose oxidation product glucuronic acid, leads to easily excreted products. In humans, conjugation has great importance. It is more critical if the substance has neither easily degradable functional groups nor conjugation positions. Nonetheless humans have enzymes that can metabolize xenobiotics. Among these, the cytochrome P450 isoenzymes are particularly important because they are able to chemically change a molecule oxidatively at various positions. Usually this leads to better water solubility and therefore better-excretable substances. Because these enzymes cannot predict what properties the metabolites of these biotransformations will possess, it can occasionally happen that toxic compounds ensue that have mutagenic or carcinogenic properties (▶ Sect. 27.6).
Fig. 9.1 Schematic sketch of the “lifecycle” of a drug after oral administration. The drug is already metabolized during the passage through the stomach or intestinal wall, and above all, at the first pass through the liver. Lipophilic drugs and substances with a molecular weight of more than 500–600 Da are excreted with the bile. Polar substances and conjugated and/or metabolic products (metabolites) are excreted by the kidneys.

Evolution has had time over millions of years to hone the degradation and excretion of foreign substances. For many compounds however, the system fails. Instead of detoxifying, the opposite happens, a “poisoning”. The carcinogenic effect of polycyclic hydrocarbons is attributed to an oxidative assault, just as is the bone marrow damage and blood disease that is caused by benzene 9.1. The simplest alkyl homologue of benzene, toluene 9.2, is less toxic for this reason alone because it can be oxidized to benzoic acid 9.3, which, after conjugation with the amino acid glycine, can be excreted as hippuric acid 9.4 (Fig. 9.2). There are even more conjugation possibilities available for the benzoic acid intermediate.

One can speculate as to why no multienzyme complexes have evolved to immediately convert toxic intermediates into polar, nontoxic metabolites. In any case, it is an almost unsolvable problem because the properties of the metabolites would have to be predicted for each xenobiotic. A modification that leads to improved water solubility in one compound can cause a mutagenic effect in another. For their own protection humans have, in fact, mechanisms for trapping reactive metabolites. Here glutathione and glutathione transferase must be mentioned because they can detoxify electrophiles particularly well (▶ Sect. 27.7).

Perhaps toxic or carcinogenic effects were not a particularly decisive theme for evolution until now. Tumors play a secondary role for most animals because of their short lifespan. Up until just a few generations ago, war and infectious diseases were the primary causes of death in humans. It has only been in recent times that the average life expectancy increased. In the sense of evolution, aging individuals play only a secondary role. Once reproduction is complete, the parents are only necessary for the care of their young until early adulthood. One only needs to think of female spiders that consider their mates to be nothing more than their next prey immediately after copulation!

From the above-described examples of toxic chemicals, the wrong conclusion should not be drawn that only human-made substances can cause cancer. That is true for a few natural products as well, for instance, aflatoxins.
These microbial secondary metabolites, which form in spoiled nuts and other foodstuffs, are potent carcinogens. Certain alkaloids, for example, from the Spurge family (Euphorbiaceae) are also strongly cancer-promoting substances; they are so-called tumor promoters. The principle of nil nocere (Lat. do not harm) is strictly applied to medicines, and only slowly have these standards been applied to other materials in our environment. For the testing and development of active compounds, this means that particularly rigorous tests for carcinogenic, mutagenic, and teratogenic effects must be conducted. The well-founded suspicion alone that a compound or one of its possible metabolites displays such effects leads to the consequence that the compound is not further developed.

Fig. 9.2 The oxidation of benzene 9.1 leads to a reactive and toxic intermediate. In contrast, the oxidation of toluene 9.2 affords benzoic acid 9.3, which can be excreted by the kidney as its nontoxic glycine conjugate 9.4.

9.2 Esters Are Ideal Prodrugs

Establishing satisfactory water solubility in substances that are simultaneously suitable for passive transport across membranes is a special challenge in pharmaceutical optimization. Nowadays attention is paid to the correct balance of these parameters already in the early phase of development (▶ Chap. 19, “From In Vitro to In Vivo: Optimization of ADME and Toxicology Properties”). If it is not possible to achieve this optimum with the actual active substance, esters are often produced as suitable prodrugs. Esters are easily cleaved by ubiquitously occurring esterases. The improved lipophilicity helps with the passive transport through diffusion over membrane barriers, as found in the intestines and above all else the blood–brain barrier.

One prodrug that has sadly achieved infamy is heroin 9.5 (Fig. 9.3), the diacetyl ester of morphine (▶ Sect. 3.3). Because of its markedly increased lipophilicity, heroin penetrates the blood–brain barrier quickly. The pharmacologist Heinrich Dreser, who tested acetylsalicylic acid at Bayer, introduced heroin to therapy in 1898 as a pain and cough medicine because of its minimal respiratory depression. But heroin belongs to the substances with the highest addictive potential. Its abuse is an enormous social problem in many countries. It is used therapeutically in exceptional cases, for instance, for pain therapy in cancer patients, particularly those who have exhausted other therapeutic options.

Many other prodrugs are also esters. The transformation from an acid or alcohol group to an ester usually leads to a better-absorbable product.
The formerly used antilipidemic clofibrate 9.6 (▶ Sect. 28.6) is just such an example of a bioavailable ester of a biologically active free acid 9.7. The angiotensin-converting enzyme inhibitor enalapril 9.8 (▶ Sect. 25.4) and its analogues are also prodrugs. The free acid 9.9 is not absorbed, but it is the active form in vitro (Fig. 9.3). The diester is chemically unstable and quickly forms the inactive diketopiperazine 9.10. It is essential that only one of the acid groups is esterified to prevent the formation of this side product. The monoester 9.8 is “interpreted” as a dipeptide and is transported over the cell membrane by an oligopeptide transporter (▶ Sect. 30.7). The β-lactam antibiotics (▶ Sect. 23.7) are also taken up by this transporter.

Hydroxymethylglutaryl-coenzyme A 9.11 (HMG-CoA) is enzymatically reduced to mevalonic acid 9.12 in the biosynthesis of cholesterol (Fig. 9.4). The antilipidemic lovastatin 9.13 (▶ Sect. 27.3) prevents this reaction by inhibiting HMG-CoA reductase. It contains a lactone ring, which is transformed to its active form 9.14 by hydrolysis. This form is structurally very similar to the product of the enzymatic reaction, mevalonic acid 9.12.
Fig. 9.3 Heroin 9.5, the diacetyl derivative of morphine, acts reliably and quickly, “heroically.” Like morphine, it is slowly and inefficiently absorbed, but after intravenous application it crosses the blood–brain barrier 100 times faster than morphine. There, the ester is converted by the enzyme pseudocholinesterase to morphine, which can no longer leave the brain because of its higher polarity. The cholesterol-lowering drug clofibrate 9.6 is a prodrug of the actual active compound, the free acid 9.7. The antihypertensive enalapril 9.8 is also a prodrug of the active compound 9.9. Here the high lipophilicity is not responsible for the absorption; rather, it is actively transported by binding to a dipeptide transporter. The diester of enalapril is unsuitable as a drug because it spontaneously forms the inactive diketopiperazine 9.10.

Fig. 9.4 The enzymatic reduction of hydroxymethylglutaryl-coenzyme A 9.11 (HMG-CoA) to mevalonic acid 9.12 is inhibited by the lactone-ring-opened active metabolite 9.14 of lovastatin 9.13 (▶ Sect. 27.3).

Other ester prodrugs were developed for depot formulations to achieve a longer duration of action after subcutaneous or intramuscular administration. The phenolic hydroxyl group of bambuterol 9.15 is masked as a carbamate. Terbutaline 9.16 (Fig. 9.5) is formed from this prodrug after hydrolysis by unspecific cholinesterases (▶ Sect. 23.7). By using this prodrug strategy it was possible to make a long-acting bronchospasmolytic that only needs to be administered once daily in contrast to the actual active substance, which must be administered three times daily.

Occasionally, a prodrug can be used to improve the taste, for instance, in the case of the extremely bitter chloramphenicol 9.17. By converting it to the palmitate 9.18 (Fig. 9.5) the water solubility is strongly reduced, but the substance no longer tastes bitter. The concomitant reduction in the absorption is of no consequence. The substance is hydrolyzed to the highly soluble and easily absorbed chloramphenicol in the duodenum by the pancreatic lipase enzymes.

The glucoside salicin (▶ Sect. 3.1) represents a true prodrug that after hydrolysis and oxidation is converted to the anti-inflammatory salicylic acid. In contrast, acetylsalicylic acid (ASA) is a mixed type. It has its own activity through the irreversible inhibition of cyclooxygenase, above all as a coagulation-inhibiting substance. On the other hand, ASA has prodrug character because the metabolic release of salicylic acid contributes a small part to the anti-inflammatory effect (▶ Sect. 27.9). Furthermore, ASA is less irritating to the mucous membranes and tastes less unpleasant than salicylic acid. For a drug with a molecular weight of 180 Da, this combination of favorable characteristics in one structure is a proud achievement.

Esterification can also help with inadequate water solubility of an active substance. For this, esters with phosphoric acid or hemiesters with dicarboxylic acids such as succinic acid are formed. The added groups carry a charge and increase the water solubility of the active substance. In the organism, the esters are easily hydrolyzed again. The anticonvulsive compound phenytoin could be converted to a more-hydrophilic phosphate prodrug 9.19 (Fig. 9.5), which is easily hydrolyzed by phosphatases (▶ Sect. 26.8). If a terminal sulfonamide group, as found in the prodrugs of celecoxib (9.20, 9.21, Fig. 9.5), is acylated, water-soluble salts are more easily formed. The acyl group is also easily hydrolyzed in the intestines. Esterification with polyethylene glycol (PEG) can also be used to enhance solubility. This very water-soluble polymer has been coupled through an ester group to the natural product paclitaxel (▶ Sect. 6.2, ▶ 6.5). As PEG-paclitaxel, this compound can be used as an intravenous chemotherapeutic.

9.3 Chemically Well Wrapped: Multiple Prodrug Strategies

The antibacterial sulfonamide, sulfamidochrysoidine (▶ Sect. 2.3), is a prodrug. It is only after cleavage of the azo bond that the metabolic product, sulfanilamide, acts as an antimetabolite of p-aminobenzoic acid, which is critical for microorganisms.
Fig. 9.5 Bambuterol 9.15 is a carbamate-masked prodrug of the bronchospasmolytic terbutaline 9.16. It is transformed to the active compound slowly, by hydrolysis. The prodrug 9.18 of chloramphenicol 9.17 masks only its extremely bitter taste. Phenytoin can be converted to a phosphoric acid ester 9.19 (fosphenytoin), which is significantly more water-soluble. The cyclooxygenase inhibitor celecoxib can be converted to prodrugs (9.20, R = methyl; 9.21, R = ethyl) by adding acyl groups; these have much-improved water solubility. The antimalarial cycloguanil 9.23 is formed by a metabolic cyclization of the inactive precursor proguanil 9.22. The anti-inflammatory sulindac 9.24 has 100 times better water solubility than its actual active form, the sulfide 9.25. In addition to this reversible enzymatic reduction an irreversible enzymatic oxidation to a biologically inactive sulfone also occurs.
Additional prodrugs are proguanil 9.22, which is converted to cycloguanil 9.23 (▶ Sect. 27.2), or the anti-inflammatory sulindac 9.24, which is metabolically converted to the active sulfide 9.25 (Fig. 9.5).

Amidines are used as building blocks in thrombin inhibitors and antagonists of the integrin receptor αIIbβ3 (▶ Sect. 31.2). These strongly basic groups are detrimental for good bioavailability. Through oxidation to the corresponding amidoximes, a less-basic group is formed that is not protonated under physiological conditions. Reductases, which are present in the liver, kidney, lung, or brain, release the original amidine structure. This concept, together with the esterification of the terminal acid function, was applied in a double-prodrug strategy for the thrombin inhibitor ximelagatran 9.26 and the receptor antagonist sibrafiban 9.27 (Fig. 9.6).

The bombing of an allied ship that was docked in an Italian harbor in 1943 with 100 t of mustard gas 9.28 (bis-β-chloroethyl sulfide, Fig. 9.7) led to the observation that many of those who were poisoned experienced a severe reduction in their white blood cell counts. This severe toxicity for cells that quickly divide could be used for killing tumor cells. The cytotoxic effect arises from multiple alkylations of DNA. Consequently, replication and subsequent cell division are affected. A purposeful search for analogues of mustard gas with less toxicity led via the N-analogue 9.29 to the aromatic-substituted derivative 9.30, which still had inadequate tolerability and tumor specificity. Tumor cells are especially rich in phosphatases. Because of this, H. Arnold at the German company Chemie Grünenthal reasoned that phosphoric acid derivatives of nitrogen mustard (N-lost) might be suitable for a tumor-specific therapy. The most interesting compound was cyclophosphamide 9.31, a substance that can cause the complete disappearance of tumors in animal experiments.
The originally assumed mechanism is not correct because the substance is inactive in vitro in cell cultures of tumors. The metabolic activation occurs outside of the tumor in the liver through oxidation (Fig. 9.7).

Fig. 9.6 Ximelagatran 9.26 and sibrafiban 9.27 were developed to improve oral bioavailability, and contain both an uncharged amidoxime group and an ester function as a double prodrug.
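Why an amidoxime is uncharged under physiological conditions while an amidine is not can be illustrated with the Henderson–Hasselbalch relation. The following is a minimal sketch: the pKa values (about 11.5 for the amidinium group and about 5 for the protonated amidoxime) are approximate, illustrative assumptions that do not come from the text, and pH 7.4 is taken as the physiological value.

```python
def fraction_protonated(pka, ph=7.4):
    """Fraction of a base present in its protonated (charged) form.

    Henderson-Hasselbalch for a base B with conjugate acid BH+:
    [BH+] / ([B] + [BH+]) = 1 / (1 + 10**(pH - pKa))
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Approximate, assumed pKa values of the conjugate acids (illustrative only).
ASSUMED_PKA = {"amidine": 11.5, "amidoxime": 5.0}

for group, pka in ASSUMED_PKA.items():
    charged = fraction_protonated(pka) * 100.0
    print(f"{group}: {charged:.2f} % protonated at pH 7.4")
```

Under these assumptions the amidine is almost completely charged at pH 7.4, whereas well under 1 % of the amidoxime is protonated, which is consistent with the better membrane passage that the double-prodrug strategy relies on.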
    In the caseof the cancer therapeutic 5-fluorouracil 9.33, the activation occurs through tumor-specific enzymes. The triple-prodrug capecitabin 9.34 is initially activated to 9.35 by a carboxylesterase in the liver (Fig. 9.8). Then cytidine deaminase cleaves an amino group to give 9.36 in the liver as well as in the tumor. Lastly thymidine phosphorylase releases the active substance 9.33 in the tumor cell. There, the compound unleashes its effect by blocking thymidylate synthase, an enzyme that plays an important role in the thymine biosynthesis (▶ Sect. 27.2) in that it delivers building blocks for DNA synthesis. Because cancer cells divide more quickly than healthy cells, they are more dependent on the activity of thymidylate synthase. 9.4 L-DOPA Therapy: A Clever Prodrug Concept The neurotransmitters dopamine and acetylcholine fulfill different tasks in partic- ular parts of the central nervous system. Parkinson’s disease, also called the “shaking palsy,” is a result of the degeneration of dopamine-producing cells in the Substantia nigra in the midbrain. The ensuing disproportion between the S Cl Cl N Cl Cl R O P N Cl O 9.28 Mustard gas 9.29 9.30 N-Aryl-analog, R = Aryl N H Cl 9.31 Cyclophosphamide Metabolic activation in the liver O P N Cl Cl O N H HO Cl HO O O O Cl O N Cl P O H2N H2N P N Cl O 9.32 Active form Acrolein + N-analog, R = CH3 Fig. 9.7 The cytostatic N-methyl and N-aryl compounds 9.29 and 9.30 are derived from mustard gas 9.28. The first step in the activation of the prodrug cyclophosphamide 9.31 is a metabolic hydroxylation of the carbon next to the nitrogen atom. The biologically active agent 9.32 and the toxic side product acrolein come from a labile intermediate that is formed by enzymatic degrada- tion and spontaneous decomposition. 9.4 L-DOPA THERAPY: A CLEVER PRODRUG CONCEPT 181
  • 199.
dopaminergic and cholinergic nerve impulses leads to episodic chronic movement disorders such as rigidity, tremor, shaking, and an inability to move normally. Similar side effects are caused by substances that block the dopamine receptors, for instance, the tricyclic neuroleptics (▶ Sect. 1.6). Intravenous administration of dopamine 9.37 (Fig. 9.9) does not lead to the desired effect because the substance cannot penetrate the blood–brain barrier. Because of its purely peripheral effect, undesirable side effects on the heart and circulation are observed, for example, an increase in heart rate and blood pressure. The desired equilibrium in the brain can also be established by suppressing the cholinergic system. This route is taken by giving anticholinergics, that is, antagonists of the cholinergic receptors.

Fig. 9.8 The triple prodrug capecitabine 9.34 is activated to 9.35 by a carboxylesterase in the liver, then transformed into 9.36 by a cytidine deaminase in the liver and in the tumor, and finally a thymidine phosphorylase in the tumor produces the cancer therapeutic 5-fluorouracil 9.33.

The administration of the amino acid L-DOPA 9.38 (Fig. 9.9) is a more elegant possibility for dopamine substitution. This metabolic precursor of dopamine is an orally bioavailable, CNS-effective medicine. It is even more polar than dopamine and can neither be absorbed from the gastrointestinal tract nor cross the blood–brain barrier by passive diffusion alone. Because it is an amino acid, it uses an amino acid transporter (▶ Sect. 30.7). With this, the first goal, CNS activity, is achieved. Oral L-DOPA administration, however, still causes too many side effects in the peripheral nervous system. Furthermore, L-DOPA is very short acting because dopamine is quickly metabolized in the brain. Therefore, one must try to prevent the metabolism of the substance while simultaneously reducing its concentration in the periphery. The combination of
L-DOPA with the peripheral decarboxylase inhibitor benserazide 9.39 and the CNS-effective monoamine oxidase inhibitor selegiline 9.40 (▶ Sect. 27.8) largely solves this problem. The peripheral side effects are reduced and the CNS effects are extended (Fig. 9.9). Despite this tour de force of drug design, which has led to significant therapeutic progress, the metabolically produced dopamine still acts in too many places. Aside from the residual peripheral side effects, sudden changes between excessive movement, normal movement, and rigidity, as well as insomnia, agitation, and hallucinations, are all manifestations of the generalized CNS activity. It has been speculated in conjunction with this observation whether, in addition to endogenous and genetic factors, environmental factors, for example, the metabolic transformation of structurally analogous foreign substances, might be responsible for triggering Parkinson’s disease.

9.5 Drug Targeting, Trojan Horses, and Pro-prodrugs

The design of active substances that exert their effect only in, or overwhelmingly in, one particular organ is called drug targeting. Aside from general principles, for example, an optimal lipophilicity as a prerequisite for crossing the blood–brain barrier, specific metabolic transformations are used. The Parkinson’s disease drug L-DOPA, which was introduced in the previous section, is such a prodrug. The anticonvulsive medicine progabide 9.41 is a double prodrug because both functional groups of the neurotransmitter are masked. After crossing the blood–brain barrier and release of the amino and carboxyl groups, the actual active compound, γ-aminobutyric acid (GABA, Fig. 9.10), is formed.

Fig. 9.9 Because dopamine 9.37 cannot enter the central nervous system, the metabolic precursor L-DOPA 9.38 is used. To reduce the cardiovascular effects of dopamine, L-DOPA is combined with the peripherally active decarboxylase inhibitor benserazide 9.39 (racemate). The administration of a monoamine oxidase inhibitor, for example, selegiline 9.40, prevents the fast degradation of dopamine.
The ability of the blood–brain barrier to exclude polar substances can also be used as a prodrug concept. For this, an active compound with a metabolically labile group can be coupled to a dihydropyridine. The neutral conjugate 9.43 can cross the blood–brain barrier. Oxidation leads to a permanently charged compound 9.44, which can no longer leave the brain. Upon metabolic cleavage, the free active compound is released in situ (Fig. 9.11). If oxidation takes place in the periphery, the highly water-soluble conjugate is excreted before the actual active substance is released. As attractive as this principle seems, it has not yet found its way into therapy.

Fig. 9.10 Because it is a lipophilic neutral molecule, progabide 9.41 can cross the blood–brain barrier. It is transformed into the neurotransmitter γ-aminobutyric acid (GABA) 9.42 upon metabolic release of the amino and carboxyl groups.

Fig. 9.11 Drug targeting in the brain is accomplished with a drug–dihydropyridine conjugate 9.43. This neutral, lipophilic substance can easily enter the central nervous system. Metabolic oxidation leads to a permanently charged, polar pyridinium compound 9.44, which cannot cross the blood–brain barrier. The active compound is released in the brain by metabolic cleavage, and the polar conjugate is quickly excreted from the periphery.
Several analogues of nucleoside bases and nucleosides are Trojan horses. The anti-herpes medicine aciclovir 9.45 enters the cell in its inactive form. The first monophosphorylation occurs only in virus-infected cells by a virus-specific thymidine kinase. Cellular kinases then form the triphosphate, the actual active substance. Because of this, aciclovir acts as a targeted antiviral. The compound is, however, poorly absorbed. The more suitable valaciclovir 9.46 (Fig. 9.12) can be regarded as a pro-prodrug. In the organism it is initially hydrolyzed to aciclovir and then transformed into the active form by the viral enzyme. Valaciclovir is more lipophilic than aciclovir, but despite this it is more soluble in water and approximately 55% bioavailable.

Omeprazole 9.47 is the prodrug of an irreversible inhibitor of the H+/K+-ATPase, the so-called proton pump. Only under strongly acidic conditions, in the acid-producing cells of the stomach, is it transformed into the sulfenic acid 9.48, which is in equilibrium with the cyclic sulfenamide 9.49 (Fig. 9.13). The latter reacts irreversibly with an SH group of the enzyme to form a disulfide. Omeprazole is more effective than the H2 antagonists (▶ Sect. 3.5) because it blocks not only the histamine-induced acid secretion but all forms of acid secretion.

The different metabolic activity in different tissues can be used to achieve a selective effect in one specific organ. In principle, adrenaline (▶ Sect. 1.4) as well as some β-blockers are suitable for the treatment of glaucoma because they can normalize elevated intraocular pressure. However, they have substantial undesirable side effects on heart function and circulation. This can be avoided by the administration of prodrugs that are metabolized more quickly in the eye, or only in the eye, for example, a particularly robust ester 9.50 of adrenaline 9.51, or a ketone–oxime ether 9.52 of timolol 9.53 (Fig. 9.14).

Fig. 9.12 Aciclovir 9.45 is a Trojan horse. An enzymatic phosphorylation of its hydroxyl group by a viral kinase affords its monophosphorylated form in virus-infected cells only, which is then transformed to the triphosphate derivative by the cellular kinases. Valaciclovir 9.46 is a pro-prodrug because it is first transformed to aciclovir by hydrolysis and subsequently activated.

The area of drug targeting has developed into an exciting field in recent years. Aside from the above-described prodrugs that release active compounds in the target area, the concept of antibody-coupled drugs has been pursued, especially for the development of novel cancer therapeutics. Another approach is the coupling of drugs to a cell-specific recognition sequence. The goal of this work is to trick the membrane transporters of very specific cells so that the drug conjugate gains entry. Tumor therapeutics derived from nitrogen mustard (N-lost) were introduced in Sect. 9.3. These cytotoxic alkylating compounds, however, are very reactive and should only be activated in the desired target tissue. For this, the
Fig. 9.13 In the presence of acids, omeprazole 9.47 is rearranged to a sulfenic acid 9.48, which is in equilibrium with a cyclic sulfenamide 9.49. This reacts irreversibly with an SH group on the H+/K+-ATPase, the so-called proton pump.

Fig. 9.14 The metabolic peculiarities of the eye are exploited for drug targeting in glaucoma therapy. After penetrating the cornea, the bis-pivaloyl ester dipivefrin 9.50 (R = COC(CH3)3) of adrenaline 9.51 (R = H) is hydrolyzed 20 times faster than it is in the periphery. The oxime ether 9.52 (X = N-OCH3) of timolol is metabolized through the ketone (X = O) to the active form, timolol 9.53 (X = H, OH), only in the eye.
following strategies were developed. The aromatic nitrogen mustard (N-lost) derivative 9.55 (Fig. 9.15) is released from the prodrug 9.54 by specific peptide cleavage with carboxypeptidase G2, an enzyme that only exists in bacteria. This enzyme was coupled to a monoclonal antibody (▶ Sect. 32.3) that specifically recognizes human colorectal cancer cells. With this, the enzyme that “arms” the cancer drug is brought into the immediate vicinity of the cancer cell. In the future, this antibody-guided enzyme-activated prodrug therapy could make cancer therapy more tolerable and less toxic by releasing the active substance locally and in a distinctly more targeted way.

Fig. 9.15 The highly reactive cancer therapeutic derivative 9.55 is released from prodrug 9.54, which is activated by a specific carboxypeptidase. The carboxypeptidase is bound to an antibody that is targeted to the cancer cell.

9.6 Synopsis

• If it is impossible to achieve sufficient bioavailability, duration of action, membrane penetration, or metabolic stability by chemical modifications, a prodrug can be developed, that is, a non- or poorly active precursor or derivative that is converted in the organism to its active form.
• After absorption, a drug is transported to the liver and exposed to degrading enzymes that make it more water-soluble for excretion. The amount of the drug that survives this first liver pass is referred to as the bioavailable portion and can be distributed in the organism.
• Esters are often used as prodrugs to mask polar acid groups; they are cleaved by ubiquitously present esterases.
• A large variety of chemical modifications have been applied to modulate the physicochemical properties of drug molecules; however, they require special enzymes in the targeted cells or organs for metabolic activation.
• L-DOPA, an amino acid analogue of dopamine, is delivered to the brain via an amino acid transporter and rapidly decarboxylated. To avoid side effects in the periphery, a combination with polar decarboxylase inhibitors is advisable.
• Drug targeting to particular organs or cells exploits specific metabolic transformations only present in these compartments of the body.
• Antibody-coupled drugs are specifically delivered to those compartments or organs that present the antibody-specific recognition site on the surface of disease-related cells. To trick membrane transporters, drugs can be coupled to cell-specific recognition sequences and thus gain entry to the cells.

Bibliography

General Literature

Balant LP, Doelker E (1995) Metabolic considerations in prodrug design. In: Wolff ME (ed) Burger’s medicinal chemistry, vol I, 5th edn. Wiley, New York, pp 949–982
Bodor N (1987) Prodrugs and site-specific chemical delivery systems. Annu Rep Med Chem 22:303–313
Bundgaard H (ed) (1985) Design of prodrugs. Elsevier, Amsterdam
Bundgaard H (1991) Design and application of prodrugs. In: Krogsgaard-Larsen P, Bundgaard H (eds) A textbook of drug design and development. Harwood Academic, Chur, pp 113–191
Ettmayer P, Amidon GL, Clement B, Testa B (2004) Lessons learned from marketed and investigational prodrugs. J Med Chem 47:2394–2404
Gibson GG (1994) Introduction to drug metabolism. Blackie, London
Rautio J (2012) Prodrugs and targeted delivery: towards better ADME properties. In: Mannhold R, Kubinyi H, Folkers G (eds) Methods and principles in medicinal chemistry, vol 47. Wiley-VCH, Weinheim
Silverman RB (2004) The organic chemistry of drug design and drug action, 2nd edn. Elsevier Academic, Oxford, Chapter 7, Drug metabolism, and Chapter 8, Prodrugs and drug delivery systems
Stella VJ, Borchardt RT, Hageman MJ, Oliyai R, Maag H, Tilley JW (eds) (2007) Prodrugs: challenges and rewards, vol 2. Springer, New York
Testa B (2007) Prodrug and soft drug design. In: Taylor JB, Triggle DJ (eds) Comprehensive medicinal chemistry II, vol 5. Elsevier, Oxford, pp 1009–1041
Testa B, Mayer JM (2003) Hydrolysis in drug and prodrug metabolism – chemistry, biochemistry and enzymology. Wiley-VHCA, Zürich

Special Literature

Bodor N, Buchwald P (2005) Ophthalmic drug design based on the metabolic activity of the eye: soft drugs and chemical delivery systems. AAPS J 7:E820–E833
Brewster ME, Pop E, Bodor N (1993) Chemical approaches to brain-targeting of biologically active compounds. In: Kozikowski AP (ed) Drug design for neuroscience. Raven, New York
Napier MP, Sharma SK et al (2000) Antibody-directed enzyme prodrug therapy: efficacy and mechanism of action in colorectal carcinoma. Clin Cancer Res 6:765–772
10 Peptidomimetics

G. Klebe, Drug Design, DOI 10.1007/978-3-642-17907-5_10, © Springer-Verlag Berlin Heidelberg 2013

Peptides are open-chain polymers made up of amino acids (Fig. 10.1). The main chain is constructed of alternating amide groups –CONH– and aliphatic carbon atoms, which are labeled Cα. The side chains branch from the main chain at the Cα atom. The amide group is barely flexible (▶ Sect. 14.1). In contrast, a rotation around the Cα–Cβ bond is possible. The side chains are flexible as well. Because of this, each amino acid can take on multiple conformations. As a consequence, peptides are very flexible molecules with many rotatable bonds and a multitude of possibilities to adopt different spatial arrangements. Formally, there is no difference between the construction of peptides and proteins. Nonetheless, oligomers of amino acids up to a size of 30–50 monomer building blocks are called peptides, and the term protein is preferred for any members of this substance class that are above this limit.

10.1 The Therapeutic Relevance of Peptides

Peptides are responsible for numerous biological functions in humans as enzyme substrates and hormones. A few important examples are summarized in Table 10.1. Accordingly, peptides are interesting for therapeutic purposes, and in fact, several important drugs are peptides (Fig. 10.2). The use of peptides as drugs is significantly limited by several factors:

• Peptides are poorly absorbed after oral administration; this is mostly because of their high molecular weight and pronounced polarity.
• Peptides are easily degraded by proteases in the gastrointestinal tract and are therefore metabolically unstable.
• The body is able to excrete peptides very quickly via the liver and kidneys.

Because peptides accomplish so many biological functions in our bodies, there is tremendous interest in finding active substances that do not have the above-mentioned detrimental properties, but that bind to the same receptors analogously to peptides or block enzymes that transform peptide substrates. A stepwise approach is taken in the search for such compounds. Peptide structures
Fig. 10.1 The pentapeptide Leu-enkephalin (Tyr-Gly-Gly-Phe-Leu) as an example of a peptide structure. The left side with the free NH2 group is the N terminus, and the other is the C terminus. Each amino acid contributes three atoms to the peptide chain. Nature almost exclusively uses the 20 natural (proteinogenic) L-amino acids for the construction of peptides (see Appendix 1). Depending on the functional groups in the side chains, the distinction is made between hydrophilic acidic and basic amino acids and those with hydrophobic aliphatic and aromatic side chains. The amino acids are abbreviated with three-letter codes. A one-letter code is also used. The definition of the torsion angles ω, φ, ψ, and χ is shown on the example of the amino acid phenylalanine. The angle ω is practically always close to 180°. The spatial course of the peptide backbone is determined by the φ and ψ angles (see ▶ Sect. 14.2). The first atom in the side chain is called the Cβ atom, and the next is given the index γ.

Table 10.1 Several important peptide hormones.
Peptide | Function
Leu-enkephalin, Met-enkephalin | Opiate receptor ligands, analgesics
Fibrinogen | Platelet aggregation
Angiotensin II | Blood pressure increase
Endothelin | Blood pressure increase (among other actions)
Neuropeptide Y | Blood pressure increase (among other actions)
Substance P | Bronchoconstriction and pain mediation
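Each of the torsion angles ω, φ, ψ, and χ named in Fig. 10.1 is a dihedral angle defined by four consecutive atoms. The following is a minimal sketch of how such an angle could be computed from atomic coordinates. The function name `dihedral_deg` and the example coordinates are made up for illustration and are not taken from a real peptide structure; the sign convention assumed here is the usual one in which a cis arrangement is 0° and a trans arrangement is 180°.

```python
import math

def _sub(a, b):
    """Vector a - b for 3D tuples."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    """Dot product of two 3D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def dihedral_deg(p1, p2, p3, p4):
    """Signed dihedral angle in degrees about the p2-p3 bond.

    p1..p4 are (x, y, z) positions of four consecutive atoms,
    e.g. C(i-1), N, C-alpha, C for the angle phi of residue i.
    """
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b3 = _sub(p4, p3)
    n1 = _cross(b1, b2)          # normal of the plane p1-p2-p3
    n2 = _cross(b2, b3)          # normal of the plane p2-p3-p4
    b2_len = math.sqrt(_dot(b2, b2))
    y = b2_len * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Hypothetical coordinates: four atoms in one plane, trans arrangement,
# comparable to the omega angle of a peptide bond.
print(dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))
```

For the planar trans arrangement in the example the function returns 180°, the value that the caption gives for ω; moving the fourth point to the same side as the first gives 0° (cis).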
are replaced with isosteric building blocks so that the molecular recognition properties of the peptide remain, but the undesirable characteristics are reduced. Such peptidomimetics should have the following qualities:

• Few or no cleavable amide bonds to improve metabolic stability.
• Reduced molecular weight to improve oral bioavailability.
• The same spatial orientation of groups responsible for strong binding to the receptor or enzyme as in the peptide.

Bacteria are the true masters of constructing peptide structures that frequently achieve the desired metabolic stability. They incorporate amino acids that do not belong to the typical 20 residues that are usually used for the construction of proteins. Stereochemically inverted amino acids are also employed, and many of these structures have a cyclic architecture. Bacteria have even evolved a dedicated synthesis machinery for this: nonribosomal peptide synthesis (▶ Sect. 32.6). This system of modular, coupled enzymes works like an assembly line. Depending on the desired product, different enzymatic functional units are lined up, one after the other, to successively assemble the amino acids and cyclize the product in the final step. The exchange of an enzymatic synthesis unit causes other amino acids to be incorporated into the otherwise unchanged peptide. Even ester bonds can be constructed with a very similar multienzyme complex. Many lead structures, all the way to complete drugs, can be derived from these originally bacterial peptides, such as ciclosporin in Fig. 10.2, which is a most important immunosuppressant. A large number of macrolide antibiotics (▶ Sect. 32.6) are also synthesized in this way. Recently, a so-called chemoenzymatic synthetic strategy has been developed for the construction of such macrolides.

Fig. 10.2 Peptides as drugs. Oxytocin (H-Cys-Tyr-Ile-Gln-Asn-Cys-Pro-Leu-Gly-NH2) is used to induce and strengthen contractions during labor. The immunosuppressive ciclosporin prevents organ rejection after transplantation. Leuprolide (pGlu-His-Trp-Ser-Tyr-D-Leu-Leu-Arg-Pro-NHEt; pGlu = pyroglutamate) is an analogue of LHRH (luteinizing hormone releasing hormone), one of the hypothalamic hormones that, via LH (luteinizing hormone), controls the synthesis of male and female sexual hormones. Leuprolide is used to treat advanced-stage prostate cancer.

As discussed in ▶ Sect. 11.6, linear oligopeptides can easily be synthesized by using the Merrifield synthesis. Non-natural amino acids
with L and D configurations can also be used to generate high combinatorial diversity. It is very difficult to cyclize these linear oligopeptides to the desired macrocycle by using chemical-synthetic methods. Here the nonribosomal peptide synthesis machinery is of service. The synthetically prepared peptides are funneled into the enzymatic process chain, and the cyclization domain from the bacteria catalyzes the ring closure of the peptide: a perfect symbiosis between synthetic chemistry and enzyme biology!

10.2 Designing Peptidomimetics

At the beginning of the 1980s, there was only one generally accepted example of a low-molecular-weight active substance that takes over the function of an endogenous peptide: the opiates. It is assumed that morphine 10.1 is a mimetic of the endogenous peptide β-endorphin 10.2 (Fig. 10.3). A comparison of both structures makes it immediately clear that morphine cannot possibly simulate all of the functional groups of the peptide. Obviously not all are necessary for the biological activity. This supports the suspicion that other peptides also bind to receptors with only a few functional groups. If this hypothesis is true, it should be possible to identify the essential functional groups and find a small organic molecule that has the necessary functional groups in the correct relative orientation.

The starting point for the design of peptidomimetics is the identification of the biologically active peptide, the function of which is to be imitated. In the first step, single amino acids are removed to determine whether a portion of the peptide retains sufficient activity. Next, the importance of the individual side chains is investigated. In a so-called alanine scan (Sect. 10.7), each amino acid is successively replaced with alanine. A severe loss of activity is an indication that the removed side chain is important. Up to this point, only peptides made up of the 20 natural amino acids have been investigated.
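The alanine scan described above is easy to enumerate on paper or in code. As a minimal sketch, the hypothetical helper `alanine_scan` below lists the single-substitution variants of a peptide written in the hyphenated three-letter notation used in this chapter. It only generates the sequences to be synthesized and tested; it says nothing about their activity, and it assumes plain residue names (a token such as "D-Leu" would need separate parsing).

```python
def alanine_scan(sequence):
    """Return (position, original_residue, variant) for each single Ala substitution.

    `sequence` uses hyphenated three-letter codes, e.g. "Tyr-Gly-Gly-Phe-Leu".
    Positions are counted from 1, starting at the N terminus.
    """
    residues = sequence.split("-")
    variants = []
    for i, res in enumerate(residues):
        if res == "Ala":
            continue  # position is already alanine, so there is nothing to test
        mutated = residues[:i] + ["Ala"] + residues[i + 1:]
        variants.append((i + 1, res, "-".join(mutated)))
    return variants

# Leu-enkephalin from Fig. 10.1 as the example peptide
for pos, res, variant in alanine_scan("Tyr-Gly-Gly-Phe-Leu"):
    print(f"{res}{pos} -> Ala: {variant}")
```

Each printed variant corresponds to one peptide whose measured activity would be compared with that of the parent peptide.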
Fig. 10.3 Morphine 10.1 is a peptidomimetic for the endogenous peptide β-endorphin 10.2 (Tyr-Gly-Gly-Phe-Met-Thr-Ser-Glu-Lys-Ser-Gln-Thr-Pro-Leu-Val-Thr-Leu-Phe-Lys-Asn-Ala-Ile-Ile-Lys-Asn-Ala-Tyr-Lys-Lys-Gly-Glu) and the enkephalins (▶ Sect. 1.4). It binds as an agonist to the opiate receptor.

In the next step, structural elements are introduced that do not occur in the 20 proteinogenic amino acids. In principle, the following are possibilities for peptide structure modification:

• The use of D- instead of L-amino acids.
• Modifications of the side chain of amino acids.
• Changes on the peptide main chain.
• Cyclization to stabilize the conformation.
• The use of templates that enforce a particular secondary structure, or that allow the attachment of side chains in a defined spatial orientation.

10.3 First Step to Variation: Modifying Side Chains

An improvement in a peptide’s binding properties can often be achieved by using other side chains. For instance, in Fig. 10.4 a few analogues of the amino acid phenylalanine are shown that could be used as possible replacements.

Fig. 10.4 Sterically demanding, conformationally fixed, or metabolically stable analogues of the amino acid phenylalanine; the structural enhancements are indicated in red.

An increase in the binding affinity can be achieved if nonproteinogenic amino acids fill the
binding pocket more completely. Rigid analogues lead to improved binding if the biologically active conformation, the one that is adopted in or at the receptor site, is immobilized.

The introduction of nonproteinogenic amino acids can increase the metabolic stability. The hydroxylation of aromatic side chains can be suppressed by using a substituent, for example, fluorine or a methoxy group, in the para position. Stability to cleavage by the digestive enzyme chymotrypsin can be improved by adding substituents to the Cβ atom because the modified side chain no longer fits into the active site of this protease. A peptide’s proteolytic stability can also be improved by exchanging L- for D-amino acids. As described above, bacteria have already recognized this trick. Distributing D-amino acids randomly in the peptide can furnish active substances with astonishing metabolic stability.

10.4 A More Courageous Step: Modifying the Main Chain

An important step in the design of peptidomimetics is the replacement of amide bonds in the main chain. A few commonly used groups are summarized in Fig. 10.5.

Fig. 10.5 Different functional groups that can serve as a replacement for amide bonds in peptidomimetics: N-methyl amide, ketomethylene, hydroxyethylene, (E)-ethylene, carba, ether, reduced amide, retro-inverso, and phosphonamides, phosphonates, and phosphinates (X = –NH–, –O–, –CH2–).

It can be difficult or even impossible to find replacements for amide groups that form hydrogen bonds to the protein with the C=O as well as the NH group without decidedly reducing the binding affinity. If the amides only bridge functional groups to one another and do not form hydrogen bonds to the protein,
then a large palette of different replacement groups is available. Substitution at the amide nitrogen atom leads to metabolic stabilization because proteases can hardly cleave N-methylated amide bonds. If the N-methylation of a main-chain amide group leads to a loss in affinity, several different explanations come into question. One is that the N-methylated compound can no longer donate a hydrogen bond, so that an essential H-bond involving the NH group is lost. Further, an undesired conformational change might have occurred as a result of the additional methyl group, or the methyl group might sterically block binding to the protein. On the other hand, an improvement in binding as a result of N-methylation indicates that the biologically active conformation is stabilized.

At room temperature, an amide bond is practically exclusively in the trans geometry. Therefore it can also be substituted with an ester bond, which takes on the same geometry. In doing so, however, the hydrogen-bond-donating properties of the amide are lost. An N-methyl substitution improves the stability of the 180°-rotated conformation of the amide. In the case of proline, the only proteinogenic amino acid with an N-alkyl substitution, both the cis and trans amide configurations can be found. A 1,5-disubstituted tetrazole can replace the cis orientation of a proline. In addition, trans-configured double bonds imitate the geometry of an amide bond well. The polar characteristics, however, are lost. To a certain extent, this can be compensated if the double bond is substituted with fluorine. The reduction of an amide or an isosteric ester bond means the loss of the carbonyl group and leads to increased flexibility. If the carbonyl group is exchanged for an –S=O, –SO2, or –PO2 group, the H-bond-accepting characteristics are amplified; however, a geometry change comes with the bargain. The exchange of an amide for a thioamide results in a weakening of the H-bond-accepting properties and can serve as a test of the possible importance of H-bonds to carbonyl groups in the peptide backbone. Nonetheless, a measure of caution is warranted because the desolvation of a thiocarbonyl group is less difficult than that of a carbonyl group. This contribution is superimposed on the observed affinity and can mask the effect of the loss of the H-bond. The retro-inverso exchange of an amide bond can lead to a marked improvement in the proteolytic stability without losing the binding qualities (▶ Sect. 5.5).

An entirely different concept is the incorporation of β-amino acids (▶ Sect. 31.7). In contrast to the proteinogenic α-amino acids, these residues have four chain members per monomer unit. The amide bonds are separated by two aliphatic carbon atoms. Peptides that are made from these amino acids also show secondary structural characteristics (Sects. 10.5 and ▶ 14.2). They have already been successfully incorporated into naturally occurring peptides as mimetics and can simulate peptide–protein interactions. Because of the altered sequence of amide bonds, they are stable to proteolytic degradation.

If the cleavable bond of a protease substrate is replaced with an isosteric, non-cleavable group, a substrate can be converted to an inhibitor (▶ Sect. 6.6). If the newly introduced group forms particularly favorable interactions with the active site of an enzyme, an exceedingly potent enzyme inhibitor can result. An example is found in the ketomethylene group in serine and cysteine protease inhibitors as a
possible replacement for the amide bond that is destined for cleavage (▶ Chap. 23, “Inhibitors of Hydrolases with an Acyl–Enzyme Intermediate”). The hydroxyethylene group is especially suitable for aspartic protease inhibitors (▶ Chap. 24, “Aspartic Protease Inhibitors”). Phosphonamides, phosphonates, and phosphinates are often strong inhibitors of metalloproteases (▶ Chap. 25, “Inhibitors of Hydrolyzing Metalloenzymes”).

10.5 Rigidifying the Backbone by Fixing Conformations

An important aspect of the design of peptidomimetics is the peptide conformation. Peptides are flexible molecules and can take on different conformations. It is known, however, that certain conformations are preferably adopted in proteins and in some peptides. Among these are the two most important secondary structural elements: the α helix and the β sheet (▶ Sect. 14.2). Furthermore, there are loops and turns at the ends of these secondary structural elements that also adopt preferred patterns, particularly the β turn (Fig. 10.6). A β turn is formed when a hydrogen bond exists between the carbonyl group of the amino acid i and the NH group of the amino acid i + 3. Such hydrogen bonds can only form for certain combinations of the torsion angles φ and ψ, which are determined by the amino acids in the i + 1 and i + 2 positions (▶ Sect. 14.2). β turns are especially interesting because many peptides bind to proteins in a β-turn conformation.

Let us assume that the main chain of the peptide only serves to position the side chains so that an optimal receptor interaction can occur. Then it should be possible to replace the peptide chain with an entirely different scaffold, upon which functional groups are attached that adopt the same spatial orientation as the amino acid side chains. If a β-turn-configured peptide binds to a receptor, then a rigid analogue that “freezes” the β-turn conformation should lead to improved binding.

Fig. 10.6 A β turn is a peptide conformation in which a hydrogen bond is formed between the amino acids i and i + 3. Particular ranges for the values of the torsion angles φ(i+1), ψ(i+1), φ(i+2), and ψ(i+2) are characteristic for the β turn.

The simplest way to fix a β turn is the incorporation of the necessary sequence in a small cyclic peptide. It is known from experimental structure determination that cyclic penta- and hexapeptides almost always contain a β turn. The conformation of these peptides was investigated at length in the research group of Horst Kessler at the Universities of Frankfurt and Munich. It could be shown that the position of a β turn
in a sequence can be controlled. Proline as well as D-amino acids prefer the i + 1 position in these loops. The introduction of D-amino acids favors the formation of a β turn over other possible conformations.

A β turn can also be forced by a non-peptide template. Numerous β-turn mimetics have been proposed for this (Fig. 10.7). Some of the structures serve as templates on which two peptide chains can be forced into an antiparallel orientation. However, substitution by introduction of the R2 and R3 side chains is synthetically difficult. Benzodiazepines are interesting scaffolds onto which all four side chains R1–R4 can be coupled. Other peptide conformations can also be fixed by the introduction of rigid groups. A few examples of conformation-stabilizing ring systems are displayed in Fig. 10.8. An especially convincing example of a scaffold mimic is the design of a thyrotropin-releasing hormone (TRH) mimetic by P. N. Olson and co-workers.

Fig. 10.7 Typical β-turn mimics. The amino acids are added onto the template at the colored positions.

Fig. 10.8 The illustrated rings replace one or two amino acids and force a particular conformation.
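The β-turn definition used in this section (a hydrogen bond from the carbonyl group of residue i to the NH group of residue i + 3, with proline and D-amino acids preferring position i + 1) can be turned into a simple bookkeeping aid. The sketch below is only that: the function `beta_turn_windows`, the "D-" prefix notation, and the example sequence are assumptions made for illustration, and the flag it sets is a reading aid, not a prediction that a turn actually forms.

```python
def beta_turn_windows(residues):
    """List every four-residue window i..i+3 of a linear peptide.

    `residues` is a list of residue names such as ["Gly", "Pro", "D-Ala"];
    a "D-" prefix marks a D-amino acid (notation assumed for this sketch).
    """
    windows = []
    for i in range(len(residues) - 3):
        window = residues[i:i + 4]
        at_i_plus_1 = window[1]
        windows.append({
            "i": i + 1,                  # 1-based index of residue i
            "hbond": (i + 1, i + 4),     # C=O of residue i to N-H of residue i+3
            "residues": window,
            # True if Pro or a D-amino acid occupies the i+1 position
            "pro_or_d_at_i_plus_1": at_i_plus_1 == "Pro"
                                    or at_i_plus_1.startswith("D-"),
        })
    return windows

# Hypothetical heptapeptide, not taken from the text
peptide = ["Tyr", "Pro", "Gly", "Phe", "D-Ala", "Leu", "Val"]
for w in beta_turn_windows(peptide):
    print(w["i"], "-".join(w["residues"]), w["hbond"], w["pro_or_d_at_i_plus_1"])
```

Whether a flagged window really adopts a β turn still depends on the φ and ψ angles at positions i + 1 and i + 2, which have to come from an experimental or modeled structure.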
  • 215.
TRH is the tripeptide pGlu–His–Pro–NH2 (10.3). The approach is shown in Fig. 10.9. After deducing a pharmacophore hypothesis, a rigid scaffold molecule was sought onto which the side chains could be appended in the correct relative orientation. Cyclohexane was chosen as the scaffold. Compound 10.4 is a potent TRH receptor ligand. The substance acts as an agonist and elicits the same effects as TRH. An improvement in cognitive function was seen in animal experiments after the administration of 10.4.

Fig. 10.9 Starting from the structure of the tripeptide TRH 10.3 and a hypothesis for the functional groups that are essential for binding, the non-peptidic molecule 10.4 was designed, which also binds to the TRH receptor.

10.6 Peptidomimetics to Interfere with Protein–Protein Interactions

Proteins communicate with one another and transmit information and signals by forming mutual complexes through commonly shared surfaces. The area of the shared contact surface is usually larger than a few thousand square Ångströms (Å²). This is a large value when compared to the surface that a small organic molecule of typical drug size occupies upon binding. Furthermore, the contact area between two proteins is, as a general rule, not very jagged. It hardly resembles the deep binding pockets in enzymes that can host small ligands. Nevertheless, it would open entirely new perspectives for drug therapy if such protein–protein contact surfaces could be blocked with low-molecular-weight compounds. At first glance, this task seems almost impossible. How can a small molecule bind to a flat, barely structured
protein surface with an interaction that is strong enough not to be “washed away” when the protein–protein contact forms? Furthermore, there is the problem that amino acid residues on the convex surface of a protein generally have much more space to adapt their conformation flexibly. A statistical analysis of the amino acid composition across the contact surfaces in protein complexes showed a preference for aromatic residues, aspartate, arginine, and the aliphatic residues proline and isoleucine. The selective exchange of amino acids in the contact surface also showed that a few protruding residues dominate the interaction (so-called “hot spots,” ▶ Sect. 17.10).

The search for possible binding sites of a small molecule that can compete with the formation of the protein–protein interface starts with a detailed analysis of the geometric complementarity of the contacting surfaces. Are there clustered areas with charged residues, or does a structural element such as a β turn or a helix penetrate a little more deeply into the opposite contact surface? Next, the peptide sequence that corresponds to the contact surface is synthesized. These can be portions that preferably adopt a helical structure or that can be fixed in a turn pattern, for example as a cyclopeptide. If an active peptide is found, it must be structurally characterized in complex with the opposite contact surface.

Fig. 10.10 The NMR spectroscopic structure of the BCL-XL protein with the α-helical, 16-residue peptide fragment from the BAK protein (orange). The peptide binds in a deep groove with the amino acids Ile85, Ile81, Leu78, and Val74 (from left to right, side chains are in light blue). The surface of the BCL-XL protein is shown in white; the hydrophobic amino acids of the peptide all protrude into the cleft, and their contact surface is indicated by the light-blue net.

The complex of the BCL-XL (B-cell lymphoma) protein with a 16-residue peptide that was cut from the BAK protein is shown in Fig. 10.10. BCL-XL belongs to the proteins that prevent programmed cell death (apoptosis). Its function is regulated by binding to pro- and antiapoptotic factors such as BAK. Inhibitors of this contact formation might therefore deliver potential drugs for anticancer therapy. The binding of the helical peptide takes place in a stretched-out groove. Small molecules have been discovered that fill this crevice (Fig. 10.11). The group
of Andrew Hamilton at Yale University has been searching for a basic scaffold that can imitate the characteristics of a helix and simultaneously hold the side chains on one side. Terphenyl derivatives 10.5–10.7 were found that can arrange the side chains in a staggered conformation analogous to a helix. An alanine scan along the BAK peptide showed that four hydrophobic residues (Val74, Leu78, Ile81, and Ile85) are essential for binding. In addition, Asp83 forms a salt bridge to BCL-XL. The terphenyl scaffold was therefore furnished with an acidic group at the end and decorated with alkyl and aryl residues in the ortho positions. Compound 10.6 binds to the BCL-XL protein with an affinity of 114 nM.

A different approach was taken at Abbott. Small molecules that interact with the BCL-XL protein were sought by NMR spectroscopy (▶ Sect. 7.8). The millimolar inhibitors para-fluorobiphenylcarboxylic acid 10.8 (Fig. 10.11) and 1-hydroxytetraline 10.9 were discovered. Both bind to distinct but neighboring positions. Fragment 10.8 replaces Asp83 and Leu78 of the binding domain of the BAK peptide, and 10.9 occupies the Ile85 position. From the two discovered fragments, the scientists at Abbott developed compound 10.10, which had two-digit nanomolar affinity for the protein. Further optimization led to 10.11 (ABT-737), a highly potent antagonist that blocks the entire family of antiapoptotic BCL-2 proteins. The synergistic effect of ABT-737 together with radiation and chemotherapy was demonstrated in animal experiments.

An analogous case was studied with the MDM2 protein at Roche. MDM2 is overexpressed in many tumors. It binds to the tumor-suppressor protein p53, which protects cells from converting to a malignant state. p53 is therefore the protein that is most often inactivated during carcinogenesis. Inhibition of complex formation between the overexpressed MDM2 protein and p53 could thus represent an approach to a possible cancer therapy.
Here too, an α-helical p53 peptide stretch binds to a hydrophobic groove on the MDM2 protein. A cis-imidazoline with an affinity of 100–300 nM was found in screening. A co-crystal structure was obtained with 10.12 (Fig. 10.11). The imidazoline scaffold imitates one side of an α helix of the peptide from the p53 protein. The two p-bromophenyl rings replace a Trp and a Leu. The ethyl ether group on the third aromatic ring points into the pocket that is filled by a phenylalanine in the peptide. The MDM2 protein is blocked through this competitive binding, and the level of free p53 increases. As a result, the p53 pathway in cancer cells is activated, and the cell cycle comes to a complete stop. The cell may go into programmed cell death. Tumor growth inhibition has already been demonstrated in animal models.

Another large class of proteins that is controlled by contacts with other proteins is the integrins. Numerous low-molecular-weight inhibitors have been discovered for this class. An example of the successful design of antagonists starting from cyclic peptides is presented in ▶ Sect. 31.2.

Fig. 10.11 Different inhibitors of protein–protein contacts that imitate the α-helical structural building blocks in the contact surface. The terphenyl derivatives 10.5–10.7 bind to the BCL-XL protein in a pronounced crevice and block the binding site of a helix. The small fragments 10.8 and 10.9, which led to the development of inhibitors 10.10 and 10.11, were discovered in the same area in an NMR spectroscopic screening. Compound 10.12 is a different helix mimetic that prevents the interaction between the MDM2 and p53 proteins.
Affinities labeled in the figure: terphenyls 10.5–10.7, Kd = 114 nM, 1.89 μM, and 2.70 μM; fragment 10.8, Kd = 0.3 mM; fragment 10.9, Kd = 4.3 mM; 10.10, Ki = 36 nM; 10.11 (ABT-737), Ki = 1 nM.

Many G protein-coupled receptors (▶ Sect. 29.1) are controlled by endogenous peptides or proteins. For this, the peptide or protein binds to the receptor. The replacement of the peptide sequences with an organic molecule that imitates the binding of the natural ligand has also been attempted. An example of the design of such an active compound is given in ▶ Sects. 29.5 and 29.6. Although successful, the design concept that was followed
was wrong: the active peptide and the derived synthetic mimic do not bind in an overlapping binding region of the receptor.

10.7 Tracing Selective NK Receptor Antagonists by Ala Scan

Tachykinins are neuropeptides that all contain the same lipophilic C terminus: –Phe–X–Gly–Leu–Met–NH2. A well-investigated representative of the tachykinins is substance P, Arg–Pro–Lys–Pro–Gln–Gln–Phe–Phe–Gly–Leu–Met–NH2 (10.13, Table 10.2). Tachykinins bind to at least three different tachykinin receptors, the NK1, NK2, and NK3 receptors. All three belong to the class of G protein-coupled receptors (▶ Sect. 29.1). They mediate a variety of biological effects, for example, bronchoconstriction or pain transmission. Consequently, a receptor antagonist could be helpful for the treatment of asthma as well as for fighting pain.

Table 10.2 The rational design of NK2 receptor ligands.

  No.    Structure                                          Ki (nM)
Substance P
  10.13  Arg-Pro-Lys-Pro-Gln-Gln-Phe-Phe-Gly-Leu-Met-NH2    295
Minimal fragment
  10.14  Leu-Gln-Met-Trp-Phe-Gly-NH2                        11.7
Ala scan
  10.15  Ala-Gln-Met-Trp-Phe-Gly-NH2                        40
  10.16  Leu-Ala-Met-Trp-Phe-Gly-NH2                        138
  10.17  Leu-Gln-Ala-Trp-Phe-Gly-NH2                        156
  10.18  Leu-Gln-Met-Ala-Phe-Gly-NH2                        10,000
  10.19  Leu-Gln-Met-Trp-Ala-Gly-NH2                        8,300
  10.20  Leu-Gln-Met-Trp-Phe-Ala-NH2                        28
  10.21  Leu-Gln-Met-Trp-Phe-NH2                            200
Dipeptide
  10.22  Z-Trp-Phe-NH2                                      2,700
Immobilization of the biologically active conformation
  10.23  Z-Trp-(R,S)-(α-Me)Phe-NH2                          327
N-Terminal optimization
  10.24  (2,3-di-OCH3)C6H3CH2OCO-Trp-(R,S)-(α-Me)Phe-NH2    37.6
Stereochemical optimization
  10.25  (2,3-di-OCH3)C6H3CH2OCO-Trp-(R)-(α-Me)Phe-NH2      10,000
  10.26  (2,3-di-OCH3)C6H3CH2OCO-Trp-(S)-(α-Me)Phe-NH2      17.2
Addition of amino acid
  10.27  (2,3-di-OCH3)C6H3CH2OCO-Trp-(S)-(α-Me)Phe-Gly-NH2  1.4

The study that was carried out on the development of an NK2 receptor antagonist at Parke–Davis in Cambridge is a classic example of the conversion of a peptide to a peptidomimetic (Table 10.2 and Fig. 10.12). A compound was sought that binds to the same receptor as substance P. The starting point of the work was a hexapeptide known from the literature, Leu–Gln–Met–Trp–Phe–Gly–NH2 (10.14), which binds to the NK2 receptor with an affinity of 11.7 nM. In the first step, each amino acid was systematically exchanged for alanine (10.15–10.20). In a few cases the
replacement with alanine resulted in only a weak decrease in the binding affinity. As an example, the N-terminal leucine could be replaced with an alanine (10.15). The conclusion was that the Leu side chain can only be of secondary importance for receptor binding. The compounds in which tryptophan or phenylalanine was replaced with alanine (10.18 and 10.19), however, showed very little affinity for the NK2 receptor. This was the “smoking gun” that these two amino acids are essential for binding. The removal of the C-terminal amino acid glycine (10.21) decreased the affinity by a factor of 7. Obviously this amino acid also has some importance for receptor binding. The testing of several N-terminally protected dipeptides led to Z–Trp–Phe–NH2 (10.22, Ki = 2700 nM) as a lead structure for further work. With this, the first stage of the project was accomplished.

In the next stage, additional methyl groups were introduced at different positions of the molecule. This limited the number of possible conformations. A decrease in binding affinity was observed for many of the investigated compounds with conformational restriction. A methyl group on the Cα atom of phenylalanine, however, increased the binding affinity by a factor of 8 (10.23, Ki = 327 nM). A possible explanation for this finding is that the conformation that is adopted in the receptor is stabilized by the additional methyl group. Then the N-terminal part of the molecule was varied. The replacement of the terminal phenyl ring with a 2,3-dimethoxyphenyl group further increased the binding affinity by a factor of 10 (10.24, Ki = 37.6 nM). This value refers to the compound with racemic α-methylphenylalanine.

Fig. 10.12 Important intermediates on the way to the NK2 receptor antagonist 10.27: 10.22 (R = H, Ki = 2700 nM), 10.23 (R = CH3, Ki = 327 nM), 10.26 (R = H, Ki = 17.2 nM), and 10.27 (R = CH2CONH2, Ki = 1.4 nM).

The enantiomerically pure compound 10.26 with this building block in the
S configuration binds with a Ki of 17.2 nM. The reintroduction of the C-terminal glycine finally led to the highly potent compound 10.27 (Ki = 1.4 nM).

Independent of the work at Parke–Davis, lead structure 10.28 was optimized to the NK1-specific receptor antagonists 10.32 and 10.33 at Merck Sharp & Dohme (MSD). Although 10.28–10.32 were only effective in vitro, 10.33 is also active in vivo because of its higher metabolic stability (Fig. 10.13). MSD was finally successful with the structurally related aprepitant 10.34. The compound was introduced as a medicine to prevent acute emesis (vomiting) during highly nausea-inducing chemotherapy.

Fig. 10.13 The optimization of lead structure 10.28, which was found by screening, to the selective NK1 receptor antagonists 10.32 and 10.33. In contrast to the metabolically labile benzyl esters 10.28–10.32, ketone 10.33 is also active in animal experiments. The first NK1 receptor antagonist, aprepitant 10.34, was successfully brought to the market by MSD for the prevention of acute emesis.
Substituents and IC50 values labeled in the figure: 10.28 (R = Et, X = H), 10.29 (R = H, X = H), and 10.30 (R = H, X = 3,5-di-CH3) with IC50 values of 1533, 10000, and 3800 nM; 10.31 (R = Ac, X = 3,5-di-CH3), IC50 = 67 nM; 10.32 (R = Ac, X = 3,5-di-CF3), IC50 = 1.6 nM; 10.33, IC50 = 3 nM.

10.8 CAVEAT: Idea Generator for the Design of Peptidomimetics

In the previous sections it was often highlighted that the side chains of the amino acids are responsible for the binding to receptors. Usually the main chain merely plays the role of a scaffold that serves to bring the side chains into the necessary spatial alignment for binding. Accordingly, a rigid, non-peptidic scaffold onto which the side chains can be attached in the same spatial orientation should be suitable for designing molecules with properties similar to those of peptides. This idea was embedded in a computer program in the group of Paul Bartlett at the University of California, Berkeley. The program CAVEAT allows the search for rigid molecules that
imitate a particular segment of a peptide scaffold. For this, the bonds on the peptide backbone are described with vectors (Fig. 10.14). As a prerequisite, the 3D structure of the peptide for which a peptidomimetic is sought must be known. The orientation of the side chains is determined by the Cα–Cβ bond vectors. The relative orientation of, for instance, three amino acid side chains is given by the positions of the relevant Cα–Cβ bond vectors. With this spatial pattern of vectors, a 3D database of molecular scaffolds is searched for entries that contain three substitutable bonds oriented analogously to the three Cα–Cβ vectors. The result is a list of rigid, usually cyclic molecular scaffolds, the free positions of which can be coupled to the amino acid side chains.

Fig. 10.14 The principles of a 3D search for scaffold mimics with the CAVEAT program. First, the relative orientation of the biologically active side chains in the peptide lead structure is defined by the Cα–Cβ vectors. In this example the three essential amino acids Trp, Arg, and Tyr are taken. The three vectors A, B, and C are the essential information used to search the 3D database for rigid scaffold structures that bear substitutable bonds in the same relative orientation. A list of cyclic structures that represent possible templates for peptidomimetics is the result.

10.9 Design of Peptidomimetics: Quo Vadis?

In this chapter the systematic approach to the design of peptidomimetics has been described. The approaches have proven themselves in many cases and have led to many attractive drugs. Nevertheless, there are also difficulties. The first problem is the stepwise approach. A peptide is systematically modified, and the synthesized structures serve only to identify the essential functional groups. The synthesis of the many resultant derivatives, that is, practically all in which an amide group was
replaced by one of the structures in Fig. 10.4, is laborious. Furthermore, these compounds only serve as tools because most modified peptides have high molecular weights, and this can result in poor oral bioavailability. In the past, many new nonpeptidic active substances, especially receptor antagonists, were found in high-throughput screening, and these could frequently be developed into clinical candidates in relatively little time. These successes have pushed rational concepts for the development of peptidomimetics, which were once in the foreground, somewhat into the background. Despite this, the design of peptidomimetics remains an important research area in drug design. The terphenyl-based helix mimetics serve as an example of this. Many enzyme inhibitors that are introduced in ▶ Chaps. 23, “Inhibitors of Hydrolases with an Acyl–Enzyme Intermediate”; 24, “Aspartic Protease Inhibitors”; and 25, “Inhibitors of Hydrolyzing Metalloenzymes” continue to have peptidic character. Here the peptide substrate is clearly the “gold standard” for the design of a mimetic. As always, the peptidomimetic concept plays an important role in lead structure optimization.

10.10 Synopsis

• Peptides are open-chain polymeric molecules made up of amino acids that are mutually linked by amide bonds. Side chains branch from the main chain at the Cα atoms and show a high degree of flexibility. If such a polymer contains up to 30–50 amino acids, it is called a peptide; beyond this limit, it is called a protein.
• Peptides are responsible for many biological functions; their applicability as drugs is limited due to size, polarity, and poor proteolytic stability.
• Due to their multiple functions, peptides can be mimicked by smaller, similarly binding, and metabolically stable peptidomimetics.
• Peptidomimetic design starts with the identification of the minimal peptide sequence responsible for a biological effect, followed by successive replacement of each amino acid in the chain with alanine to detect the side chains responsible for activity. Finally, individual amino acids are replaced by non-proteinogenic ones or similar chemical building blocks.
• Multiple surrogates for amino acid side chains have been developed and can be tested to reveal better binding and more stable peptidomimetics. If not involved in direct binding, main-chain amide bonds can be replaced by a large variety of substitutes that achieve a similar geometry.
• Peptides are flexible and adopt multiple conformations. If a particular fold is adopted to correctly orient interacting side chains, the peptide backbone can be replaced by an entirely different scaffold that correctly positions the essential interacting groups.
• Peptides fold upon themselves through particular turn patterns. These turns stabilize a required conformation and can be chemically replaced by rigid structural surrogates that freeze a given turn conformation.
• Proteins communicate with each other through the formation of large, mutually shared surface patches. Small molecules designed to bind to such flat surfaces can antagonize complex formation and interfere with protein–protein communication.
• Design of small molecules to block protein–protein interfaces exploits depressions on the surface that accommodate spatial patterns such as turns or helical portions of the penetrating contact surface of the partner protein.
• Peptides bind to receptors mostly via side chains, and the backbone provides the scaffold for their attachment. Computer programs can be used to screen structural databases to retrieve alternative scaffolds that are able to orient substituents in very similar fashion.

Bibliography

General Literature

Ahn J-M, Boyle NA, MacDonald MT, Janda KD (2002) Peptidomimetics and peptide backbone modifications. Mini Rev Med Chem 2:463–473
Gante J (1994) Peptidomimetics: tailored enzyme inhibitors. Angew Chem Int Ed Engl 33:1699–1701
Giannis A, Kolter T (1993) Peptidomimetics for receptor ligands: discovery, development, and medical perspectives. Angew Chem Int Ed Engl 32:1244–1267
Hirschmann R (1991) Medicinal chemistry in the golden age of biology: lessons from steroid and peptide research. Angew Chem Int Ed Engl 30:1278–1301
Marahiel MA (2009) Working outside the protein-synthesis rules: insights into non-ribosomal peptide synthesis. J Pept Sci 15:799–807

Special Literature

Howson W (1995) Rational design of tachykinin receptor antagonists. Drug News Perspect 8:97–103
Lauri G, Bartlett PA (1994) CAVEAT: a program to facilitate the design of organic molecules. J Comput Aided Mol Des 8:51–66
Lelais G, Seebach D (2004) β2-Amino acids: synthesis, occurrence in natural products, and components of β-peptides. Biopolymers 76:206–243
MacLeod AM, Merchant KJ, Cascieri MA et al (1993) N-Acyl-L-tryptophan benzyl esters: potent substance P receptor antagonists. J Med Chem 36:2044–2045
Merchant KJ, Lewis RT, MacLeod AM (1994) Synthesis of homochiral ketones derived from L-tryptophan: potent substance P receptor antagonists. Tetrahedron Lett 35:4205–4208
Olson GL, Bolin DR, Bonner MP et al (1993) Concepts and progress in the development of peptide mimetics. J Med Chem 36:3039–3049
Part III Experimental and Theoretical Methods
A crystal is the prerequisite for the 3D-structure determination of a protein by X-ray crystallography (▶ Chap. 13). The figure shows crystals of a complex of protein kinase A that were used to elucidate the reaction mechanism of this class of enzymes (▶ Chap. 26). (Reprinted with the kind permission of Dr. Dirk Bossenmeyer, Deutsches Krebsforschungszentrum, Heidelberg, Germany.)
11 Combinatorics: Chemistry with Big Numbers

G. Klebe, Drug Design, DOI 10.1007/978-3-642-17907-5_11, © Springer-Verlag Berlin Heidelberg 2013

The search for new lead structures and the optimization of their activity profile by systematic modification are among the most time- and cost-demanding steps in drug research. The optimization of a small organic molecule can serve as an example. Even if the number of different groups per position is limited to relatively few, several million structures are possible, as shown for the multisubstituted tetrahydroisoquinoline carboxylic acid amide 11.1 (Fig. 11.1). The combinatorial explosion of all imaginable substitution possibilities can no longer be handled with classical chemical techniques. The diversity increases even more when the different stereoisomers are considered. The number is then already considerably larger than the number of all of the compounds referenced in Chemical Abstracts (33 million) or in Beilstein (10 million compounds).

In the days when substances were tested on whole animals or in complex pharmacological in vitro models, the biological tests were the rate-determining step. The introduction of molecular test models, for example, enzyme or receptor-binding tests, and extensive automation of screening have fundamentally changed this situation. Testing many thousands of compounds per day is technically unproblematic (▶ Sect. 7.3). To use the capacity of these methods to their fullest extent, the synthesis of thousands or even tens or hundreds of thousands of different molecules is desirable. The strategy can then shift either to automated parallel synthesis to cover a large number of single compounds or to the simultaneous production of compound mixtures by using combinatorial chemistry.

11.1 How Nature Produces Chemical Multiplicity

Nature has shown a way to achieve combinatorial diversity with the nucleic acids and with proteins. A 600-base-pair DNA sequence codes for a protein with 200 amino acids. From the “pool” of four nucleotides that code for the 20 proteinogenic amino acids in triplet sequences, 4^600 (a number with about 360 digits!) different
DNA sequences are possible. This translates to 20^200 (a number with about 260 digits!) different amino acid sequences for the resulting protein. Short peptides with enormous structural variety can be constructed with just the 20 proteinogenic amino acids. If, instead of the amino acids A, a manageable number of modified amino acids M is used, the number of possible analogues increases even more (Table 11.1).

Peptides play an important role in biological systems. They are found as protein ligands in the free form or as simple derivatives. Peptide sequences exposed on the surface of a protein determine the recognition properties of the protein at a receptor. Nature exhausts the full combinatorial diversity of the variable sequences on the surface regions (epitopes) of proteins for their selective recognition. This principle of Nature can be adopted to generate huge compound libraries with highly variable composition.

Fig. 11.1 The tetrahydroisoquinoline carboxylic acid amide 11.1 is to be substituted in 10 positions. The groups in these positions encompass a total of 68 building blocks (R1–R10 = 5, 10, 10, 4, 5, 5, 5, 2, 2, 20 groups). Twenty million compounds can be constructed in this way. If the structural diversity that results from the two stereocenters (*) is considered, this number increases again by a factor of 4.

Table 11.1 Four hundred dipeptides, 8,000 tripeptides, 160,000 tetrapeptides, and 64 million hexapeptides can be generated from the 20 natural amino acids, A. If the palette is expanded to 100 modified, non-natural amino acids, M, the combinatorial diversity increases dramatically.

Compounds                              Number
Natural amino acids, A                 20
Dipeptides, A–A                        400
Tripeptides, A–A–A                     8,000
Tetrapeptides, A–A–A–A                 160,000
Hexapeptides, A–A–A–A–A–A              64,000,000
Modified amino acids, M                100 (for example)
Modified hexapeptides, M–M–M–M–M–M     1,000,000,000,000
Number of known compounds              33,000,000
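The numbers quoted in this section are easy to check. The following is a minimal Python sketch that is not part of the original text; the function and variable names are illustrative only. It multiplies the group counts given for Fig. 11.1, reproduces the peptide counts of Table 11.1, and counts the decimal digits of 4^600 and 20^200 with exact integer arithmetic.

```python
from math import prod

# Number of alternative groups at positions R1-R10 of scaffold 11.1 (Fig. 11.1).
GROUPS_PER_POSITION = [5, 10, 10, 4, 5, 5, 5, 2, 2, 20]


def library_size(groups_per_position, stereocenters=0):
    """Count products when every position is varied independently.

    Each stereocenter doubles the count (two configurations each).
    """
    return prod(groups_per_position) * 2 ** stereocenters


def peptide_count(alphabet_size, length):
    """Number of linear sequences of a given length from an alphabet."""
    return alphabet_size ** length


def decimal_digits(n):
    """Number of decimal digits of a positive integer."""
    return len(str(n))


if __name__ == "__main__":
    print(sum(GROUPS_PER_POSITION))               # building blocks in total
    print(library_size(GROUPS_PER_POSITION))      # without stereoisomers
    print(library_size(GROUPS_PER_POSITION, 2))   # with two stereocenters
    for length in (2, 3, 4, 6):
        print(length, peptide_count(20, length))  # natural amino acids
    print(peptide_count(100, 6))                  # modified hexapeptides
    print(decimal_digits(4 ** 600), decimal_digits(20 ** 200))
```

Evaluated exactly, 4^600 has 362 digits and 20^200 has 261 digits, so the figures of 360 and 260 digits given in the text are rounded values.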
11.2 Protein Biosynthesis as a Tool to Build Compound Libraries

How can the biochemical synthesis machinery be used as a vehicle to generate a multiplicity of peptide sequences? It is possible to connect short sequences to a carrier protein so that they are exposed on the surface and can interact with the target protein in a molecular test system. The test system is constructed in such a way that the binding to the target protein is monitored with an easily registered signal, for instance, a fluorescence signal or a colorimetric reaction (▶ Sect. 7.2).

To exploit protein synthesis for the construction of such a library, information about the randomly constructed peptide must be translated into the “genetic make-up” of the DNA molecule. This DNA codes for the sequence of the protein on the surface of which the library will be presented. Randomly assembled double-stranded DNA sequences must be introduced at the correct position of the DNA. After production of a large number of identical copies (clones), the gene can be expressed. A large population of proteins is produced that carry the randomly composed peptide sequence in a particular position, usually at the beginning or the end of the polymer chain. These proteins are then investigated in a test system. The distribution of the 20 proteinogenic amino acids over the variable sequence section is not entirely homogeneous. That is because some amino acids are coded by a single triplet (codon), and others are represented by up to six different codons (▶ Sect. 32.7). Because of this, biased libraries are inevitably formed.

The bacteriophage M13 is an extremely popular expression system. M13 is a virus that infects Escherichia coli strains well. The virus carries six proteins on its coat. Two of these coat proteins allow randomly assembled protein sections to be added to their ends. A library of 20 million modified 15-residue peptides was constructed with this M13 system.
Their binding to the protein streptavidin was tested. Fifty-eight candidates were identified as binding partners. They all had the –His–Pro–Gln– segment in common. A crystal structure of streptavidin in complex with this oligopeptide was successfully determined. The –His–Pro–Gln– segment of the peptide occupies the binding pocket that is normally used by biotin. This proves that selectively binding peptide sequences can be found with this method.

The biochemical approach to generating and presenting compound libraries has the overwhelming advantage that the high-capacity protein biosynthesis is exploited. Furthermore, the sophisticated protein and DNA synthesis techniques and analytical methods that have been developed for such substances (Sect. 11.7) can be used to characterize screening hits. But it also has disadvantages. The molecular diversity is limited to the 20 proteinogenic L-amino acids, and only peptides result as lead structures. Often these represent the starting point for the development of an active substance. However, we want to get away from the metabolically unstable, poorly bioavailable peptides. Therefore, structures based on classic organic molecular scaffolds are sought. At the very least, peptidomimetics or peptides with metabolically stable non-natural amino acids are desired. Unfortunately, the step away from peptides toward alternative scaffolds, with retention of the biological activity, is not trivial (▶ Chap. 10, “Peptidomimetics”).
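The codon bias mentioned earlier in this section can be made concrete. The sketch below is an illustration that is not part of the original text. It assumes the standard genetic code and a fully randomized DNA insert in which all triplets are equally likely (practical library designs often restrict the third base, which is not modeled here), and the function names are hypothetical. It counts how many codons encode each amino acid and derives the expected amino acid frequency at one randomized position.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_degeneracy():
    """Return the number of codons per amino acid (stop codons excluded)."""
    return Counter(aa for aa in CODON_TABLE.values() if aa != "*")


def expected_frequencies():
    """Expected amino acid frequencies at one fully randomized codon.

    Assumes all 61 sense codons are equally likely and that clones with
    a stop codon at this position are discarded.
    """
    counts = codon_degeneracy()
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


if __name__ == "__main__":
    counts = codon_degeneracy()
    freqs = expected_frequencies()
    # Most frequently encoded amino acids first.
    for aa, n in sorted(counts.items(), key=lambda item: (-item[1], item[0])):
        print(aa, n, round(freqs[aa], 3))
```

Under these assumptions, Leu, Ser, and Arg (six codons each) are six times more likely to appear at a randomized position than Met or Trp (one codon each), which is the origin of the bias in such libraries.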
    11.3 Organic Chemistryfrom a Different Angle: Random- Guided Synthesis of Compound Mixtures Organic preparative methods were devised as an alternative to the biological approaches to generate compound libraries. Simple access to a compound library is gained by starting with reactive molecular building blocks, such as oligofunctional acid chlorides (11.2–11.4, Fig. 11.2). These components are simultaneously reacted with numerous reagents, for example, amines or amino acids. A mixture of many products is formed in an uncontrolled manner. Contrary to the general academic opinion that organic reactions should only deliver homogenous products, in this case as much product diversity as possible is desired. The advantage of this method is that it is easily carried out, and automation is readily realized. This synthesis Cl O O Cl Cl Cl O O O Cl O O Cl O O Cl 11.2 11.3 Cl O O Cl Cl Cl O 11.4 O O AA1 AA2 AA4 O AA3 O O O O Ile O O Ile Lys Lys O O Pro Pro O O Val O O Val 11.6 11.5 Fig. 11.2 The oligofunctional acid chlorides of the central building blocks cubane 11.2, xanthene 11.3, and benzene 11.4 are treated with protected amino acids. A xanthene-containing library inhibits the digestive enzyme trypsin. The active component of the library was deconvoluted and characterized by targeted resynthesis. In the end the isomers 11.5 and 11.6 remained as the most potent compounds. The derivative 11.5 inhibits trypsin with a Ki of 9.4 mM. 214 11 Combinatorics: Chemistry with Big Numbers
strategy also has disadvantages. The coupling partners have different reactivities. As a result, the products are not evenly distributed. The transformation of a particular functional group on the central building block can depend upon which components the central molecule has already reacted with, and how this influences the other functional groups.

The library generated in this way is then tested. If binding to the target protein is found, the active substance in the mixture must be characterized, a task that is not particularly simple. On the one hand, sophisticated analytical techniques such as liquid chromatography coupled with NMR spectroscopy and mass spectrometry can be used. On the other hand, an attempt can be made to “deconvolute” the library. For this, a targeted resynthesis of the library is carried out in which a partial library is prepared by using a defined selection of building blocks. This smaller library is then tested and the composition of the active mixture is determined. This strategy must be followed back to the level of single, defined reaction products.

11.4 What Is Contained in Chemical Space?

At this point the fundamental question must be asked: how many organic molecules are possible in principle, from which medicinal chemists can create their candidates? What is the content of this, at first virtual, chemical space? Much has been speculated about this question. Numbers between 10²⁰ and 10²⁰⁰ possible molecules have been named. The latter claim encompasses so many molecules that the entire mass of the universe would not be enough to synthesize even one molecule of every compound! We owe a concrete idea of how this chemical space is occupied in principle to Tobias Fink and Jean-Louis Reymond of the University of Bern. Beginning with mathematical graphs that describe simple hydrocarbon scaffolds, molecules with up to 11 C, N, O, and F atoms were generated on the computer.
Heteroatoms and unsaturated bonds were scattered throughout the generated molecular graphs in a combinatorial fashion. Different filters that consider the chemical stability of the functional groups, the strain of the ring systems, and the formation of tautomers produced a database of 26.4 million structures. If all possible stereoisomers are generated, an average of 4.2 isomers per entry is formed. The database finally encompassed 110 million molecules. It is interesting to see that the number of entries increases exponentially with the square of the number of atoms. As a consequence, 90% of the database is already composed of molecules with 11 non-hydrogen atoms. If the number of molecules that can be generated with 25 non-hydrogen atoms is estimated, the result is 10²⁷ imaginable products. Twenty-five atoms represent approximately the average size of a typical drug molecule.

It is worthwhile, however, to look more closely at the database entries with 11 non-hydrogen atoms. The average molecular mass in this database is 153 ± 7 Da. Molecules of this size fall into the range of typical fragments or “lead-like” molecules (▶ Sect. 7.9). Exclusion criteria were proposed that emphasize promising candidates for drug development. The so-called “rule of three” is modeled on the “rule of five,” which was established by Chris Lipinski at Pfizer (▶ Sect. 19.7).
If the database is filtered with these rules, approximately half of the entries remain. Of these, ca. 15% are acyclic compounds, and about 43% contain one ring. It is very enlightening to see that only about 55% of the ring systems in the virtual database have been described in Chemical Abstracts or Beilstein. Comparison with a data collection of already-synthesized molecules of the same size makes clear where the chemical space has been only sketchily explored. It seems that very big gaps still exist! Over 99.8% of the entries in the virtual database are waiting to be synthesized. A comparison of the physicochemical properties of the molecules in both databases suggests that very broad areas still remain unexplored. If the chemical space is limited to compounds with 7, 8, or 9 atoms, it seems to be well covered with already prepared molecules. Approximately two-thirds of the molecules with 10 or 11 atoms in the virtual database are chiral. In this group particularly, there are many candidates that meet the “lead-like” criteria. This is a real challenge for synthetic chemists. Chiral fused carbo- and heterocycles are difficult to make. Nevertheless, Nature has led the way: many biologically active natural products contain just these building blocks.

11.5 Compound Libraries on Solid Support: Complete Yield and Easy Purification

An interesting variation on classical chemistry in solution is found in the synthesis of compound libraries on solid supports. Organic polymers, usually cross-linked polystyrenes, are used as carriers. This material is chemically modified so that it carries numerous reactive functional groups of a particular sort, for example, chloromethyl, carboxylate, or amino groups. Through these groups the reaction product remains covalently attached to the insoluble polymer during the synthetic steps.
Stepwise growth of the product is accomplished by coupling with appropriately protected building blocks (e.g., amino acids) and subsequent cleavage of these protecting groups. A large excess of reagents drives fast and nearly complete transformations. Unreacted starting materials can be removed by simple washing. After assembly of the target molecule, all protecting groups are removed. At the end of the synthesis, the product is either tested directly on the support or it is cleaved and its biological activity is tested in solution (Sect. 11.7). The technique can be easily automated.

At the beginning of the 1960s, Robert Bruce Merrifield developed solid-phase synthesis for peptides and small proteins (Fig. 11.3). At the beginning of the 1980s the idea of using synthetic combinatorial principles for peptide synthesis emerged for the first time. H. Mario Geysen devised a multipin synthesis of peptides. By using a conventional Merrifield solid-phase synthesis, 96 different peptides or defined peptide mixtures were prepared in an 8 × 12 format on polymer pins. This concept was so revolutionary that the originally submitted manuscript was rejected for publication in 1984. The referees were too severely restricted by their traditional thinking. For Geysen, absolute control of stoichiometry and yield was less important than the creation of combinatorial diversity with minimal effort.
[Reaction scheme of Fig. 11.3; reagents shown: ClCH2OCH3 + SnCl4 or ZnCl2 + HCHO; Boc-NHCHR1COO−; HCl/CH3COOH; Boc-NHCHR2COOH/DCCI/DMF; cleavage with strong acid, HBr/CF3COOH]

Fig. 11.3 The Merrifield peptide synthesis is assembled on a polymeric resin that is functionalized in an appropriate way. The first N-terminally protected amino acid is coupled to the chloromethyl group (Boc = tert-butoxycarbonyl protecting group). Then the amino group is released and coupled with a second amino acid that is activated with dicyclohexylcarbodiimide (DCCI). The N terminus of the resulting dipeptide can be deprotected and elongated. It can also be cleaved from the resin under strongly acidic conditions as a peptide.
In this way, thousands of different peptides could be prepared weekly. Entire libraries of compounds could be prepared and tested. The new methods were originally used for “epitope mapping,” that is, the structural probing of the surface of a protein with different antibodies (▶ Sect. 32.1). This technique allows the recognition of areas in a polypeptide chain that are exposed on the surface of a protein. Later it served the search for optimal sequences of protease substrates (▶ Sect. 14.6) and the synthesis of biologically active peptides. In addition to the multipin method, other high-efficiency methods have been established, for instance, the teabag method. Support beads are filled into teabags and dipped into solutions of the protected amino acids with which their peptide sequence is to be elongated.

11.6 Compound Libraries on Solid Support Require Contrived Synthetic Strategies

An especially sophisticated synthetic strategy is needed for the construction of compound libraries. Hexapeptides are considered as an example. In principle, all 20 proteinogenic amino acids could be used and 20⁶ = 64 million hexapeptides prepared and individually tested, an impossible undertaking. Therefore, intelligent strategies are needed to quickly identify the biologically active sequences. As a consequence, an attempt is made to group the 64 million peptides into partial libraries that contain constant amino acids in fixed positions. For example, all 400 partial libraries are prepared for all possible hexapeptides of the form XXABXX (A, B = predefined amino acids; X = a mixture of all natural amino acids). After testing these 400 mixtures, the most biologically active mixture is the starting point for the second round of synthesis. Another 400 libraries are generated, this time with the form XA(Aa1)(Aa2)BX. Aa1 and Aa2 are the amino acids from the most active mixture from the first testing. These too are fed into the testing.
The “best” amino acids for positions 2 and 5 are found. This strategy is pursued step by step until the most active sequence is identified.

In a simpler procedure the amino acids are varied in one position at a time. By starting with 20 libraries AXXXXX, the most active amino acid in the first position is determined. The starting point for the next synthetic cycle is the most active mixture Aa1XXXXX. By varying the adjacent position, the second amino acid is ascertained. This is repeated, and 6 × 20 = 120 hexapeptide libraries are prepared in the form AXXXXX, Aa1AXXXX, . . . Aa1Aa2Aa3Aa4Aa5A until the “best” amino acids in all positions are determined.

Another method allows the targeted construction of a library in few working steps. The conceptual design of the synthesis ensures that a defined compound is produced on each polymeric support bead. This is achieved by using the so-called “split-and-combine” technique (Fig. 11.4). For example, it is possible to synthesize all 8,000 possible tripeptides from the 20 proteinogenic amino acids in only 60 reaction steps. They are produced as 20 mixtures of 400 substances each. In the end, one defined compound is located on each polymer bead. The individual beads are available as a batch that is easily separated mechanically and individually tested.
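The library sizes quoted in this section follow from simple powers and products of the number of building blocks. The short Python sketch below recomputes them. The function name and the dictionary keys are illustrative choices, and the total for the pairwise strategy assumes exactly three rounds (positions 3/4, then 2/5, then 1/6), which the text implies but does not state as a number.

```python
def library_counts(n_amino_acids: int = 20) -> dict:
    """Recompute the library sizes quoted for the hexapeptide strategies."""
    n = n_amino_acids
    return {
        # every hexapeptide prepared and tested individually
        "all_hexapeptides": n ** 6,
        # XXABXX: one mixture for each defined pair (A, B)
        "mixtures_per_pairwise_round": n ** 2,
        # peptides contained in one XXABXX mixture (four mixed positions)
        "peptides_per_xxabxx_mixture": n ** 4,
        # ASSUMPTION: three rounds fix positions (3,4), (2,5), and (1,6)
        "mixtures_pairwise_total": 3 * n ** 2,
        # one position at a time: AXXXXX, Aa1AXXXX, ...
        "mixtures_positional_total": 6 * n,
        # split-and-combine tripeptide example
        "all_tripeptides": n ** 3,
        "split_and_combine_couplings": 3 * n,
        "tripeptides_per_final_vessel": n ** 2,
    }


counts = library_counts()
for name, value in counts.items():
    print(f"{name:32s} {value:>12,d}")
```

The comparison makes the point of the section explicit: a few hundred to about a thousand mixtures, or 60 coupling reactions in the split-and-combine case, stand in for millions of individual syntheses.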
Fig. 11.4 The construction of a compound library according to the split-and-combine technique starts with a certain amount of resin beads. These are evenly distributed among n reaction vessels. Only three are considered here for the sake of simplicity. In the first flask reagent A (e.g., amino acid A) is coupled to the resin. Reagents B and C are analogously added to flasks 2 and 3. After this first reaction step, the resin, which is now loaded with an amino acid, is combined and mixed. It is again distributed among three (or more) reaction flasks. In the next step a dipeptide is constructed. To solve the problem of different reaction rates between the different amino acids A, B, and C, only one soluble reaction partner is added in excess to the mixture of solid-phase-bound starting materials. In the case of a peptide synthesis, amino acid A is added to flask 1, B to flask 2, and C to flask 3. The resin is combined and mixed thoroughly. By now all nine possible dipeptides are on the beads. After separating the beads again, the third step follows. If the peptide chain is to be extended by another amino acid, amino acid A is added to flask 1, B to flask 2, and C to flask 3. Now all 27 imaginable tripeptide sequences are on the resin after three parallel reaction steps. A clearly identifiable compound is found on each resin bead. The library can be tested directly on the polymer or it can be tested in solution after cleavage from the support.
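The split-and-combine procedure of Fig. 11.4 can be mimicked with a small simulation. The following Python sketch is an illustration only: beads are represented as strings, the function name and the bead count of 2,700 are arbitrary assumptions, and every coupling is assumed to go to completion.

```python
import random
from collections import Counter


def split_and_combine(building_blocks, n_steps, n_beads, seed=0):
    """Simulate split-and-combine; each bead carries one growing sequence."""
    rng = random.Random(seed)
    beads = [""] * n_beads          # empty resin beads
    n_vessels = len(building_blocks)
    for _ in range(n_steps):
        rng.shuffle(beads)          # combine all beads and mix thoroughly
        extended = []
        for i, block in enumerate(building_blocks):
            portion = beads[i::n_vessels]   # split evenly among the vessels
            # one building block per vessel is coupled to every bead in it
            extended.extend(sequence + block for sequence in portion)
        beads = extended
    return beads


beads = split_and_combine("ABC", n_steps=3, n_beads=2700)
per_sequence = Counter(beads)
print(len(per_sequence), "different tripeptides on", len(beads), "beads")
print("coupling reactions carried out:", 3 * len("ABC"))
```

Because a bead only ever sits in one vessel per step, it accumulates exactly one sequence, which is the one-bead-one-compound property the caption describes. With 20 building blocks instead of three, the same loop yields the 8,000 tripeptides from 60 couplings mentioned in the text.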
11.7 Which Compound in the Solid Support Combinatorial Library Is Biologically Active?

The libraries that were generated on the solid support are biologically tested. This can be done directly on the polymer-immobilized compounds. As with testing the libraries from bacteriophages, there is a danger that the support material influences the test, for example, through steric hindrance or unspecific interactions. Furthermore, it is important that the test protein is in a soluble form. Membrane-bound receptors therefore elude testing.

Alternatively, the compound library can be cleaved from the resin. For this, the coupling between the resin and the library component must be made by using an appropriate “linker,” which allows the library to be selectively released. This linker is cleaved off, for instance, at a low pH or photochemically with UV light. It must not interfere, however, with the synthetic assembly of the library, and must not be cleaved during the synthesis. The final cleavage from the resin must not destroy the products. Testing the cleaved products certainly correlates better with physiological conditions. Spreading the cleaved compounds onto a large area or embedding them in a gel achieves a spatial separation so that compounds interacting with the test protein occur in locally high concentrations. This way, the binding to insoluble proteins (e.g., membrane-bound receptors) can be tested. However, the advantage of the mechanical manipulation of a polymer-bound compound library is lost upon release.

If biological activity is found in the test, it remains to be determined which compound from the library is responsible. If the library is precisely defined through the synthetic program, then it is known which compounds were tested. Active components are narrowed down by deconvolution and the resynthesis of partial libraries. Only one defined compound is produced on each resin bead with the one-bead-one-compound technique.
It is not known, however, which one. It is only after activity is found that the compound characterization is attempted. There are many ways to do this: the library can be tested on the resin, the relevant resin beads separated, and the compounds on them analyzed. If the library consists of peptides or oligonucleotides, peptide sequencing by Edman degradation is carried out (this works even with 0.1 picomole!), or the polymerase chain reaction (▶ Sect. 12.1) allows amplification and enrichment of oligonucleotides.

Even more elaborate techniques are also used. During synthesis, the library is allowed to “grow” on multiple different linkers. The single library compounds can be released from these linkers under different conditions (e.g., different pH values, or photochemically at different wavelengths). First the compound is cleaved from the first linker to carry out testing. The cleavage from the second linker is performed after mechanical separation of the desired resin beads. This method serves to practically “label” the resin beads. The technique is therefore an elegant variation on library testing in a detached state. The different linker-bound compounds on the resin bead need not be identical. Therefore, a test library of peptides can be combined on the resin bead with oligonucleotides, which are used as labels. Halogenated aromatics were also proposed as labels because they can be easily identified by mass
spectrometry even in the smallest quantities. The labels can even be encoded based on their sequence or the number of monomer building blocks with an appropriate binary code.

The techniques of labeling the resin bead require considerable synthetic effort for the library preparation. The transformation steps for the assembly of the library and the labeling must not disturb one another. Even the final reading of the labels can require multiple working steps. The alternative route using the programmed synthesis concept with deconvolution and resynthesis also means increased effort for the repeated construction of the library components. However, the same working steps are always used; they are just carried out with different reagent compositions. With respect to automation, this is certainly an advantage.

11.8 Combinatorial Libraries with Large Diversity: A Challenge for Synthetic Chemistry

Another aspect speaks in favor of the last-mentioned concept. By now a large number of organic reactions have been transferred to solid-phase synthesis. For each solid-phase synthesis, a special strategy, a specific linker, and a suitable cleavage method must be developed. Each single synthetic step must be compatible with the protecting groups, the polymer support, and the linker. In return, a far greater dimension of chemical diversity becomes available than is possible with peptides and nucleotides.

Careful design of the target molecules to be synthesized is indispensable for combinatorial chemistry. Limitations arise from the accessibility, that is, the development of an appropriate synthetic scheme, and furthermore from the desired structural diversity of the resulting library. Computer methods help to find a “reasonable” selection of synthetic components. How is the optimal composition obtained? This depends strongly on what the constructed library is to be tested for. A library can be developed for general-purpose screening.
It should then be “optimally diverse.” Its composition is outlined according to generally accepted criteria such as molecular weight, total lipophilicity, an even distribution of H-bond donors and acceptors, as well as the size of the hydrophobic surface area. These characteristics are important for the similarity or diversity of active compounds in the library (▶ Chap. 17, “Pharmacophore Hypotheses and Molecular Comparisons”). The desired library diversity can also be considered in relation to the biological properties of a receptor (target oriented). Criteria that make a molecule “similar” or “diverse” for one receptor are not necessarily the same for another receptor (▶ Sect. 17.7). In view of the broad palette of proteins for which combinatorial libraries should be tested, there is no absolute measure of diversity.

Combinatorial chemistry therefore plays an important role in the establishment of structure–activity relationships for a target protein. For this, the chemical variation in different positions of a suitable, discovered lead structure must be carried out very quickly. The design and synthesis of such targeted compound libraries quickly opens the way.
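One way in which a computer can assemble an “optimally diverse” selection from descriptors such as molecular weight, lipophilicity, and H-bond donor and acceptor counts is a greedy max-min pick. This technique is not described in the text and is named here only as a commonly used illustration. In the Python sketch below, the function name, the candidate names, and all descriptor values are hypothetical.

```python
import math


def maxmin_pick(descriptors, n_pick):
    """Greedily pick n_pick entries that are as far apart as possible."""
    names = list(descriptors)
    columns = list(zip(*descriptors.values()))
    low = [min(col) for col in columns]
    # avoid division by zero for descriptors that never vary
    span = [(max(col) - min(col)) or 1.0 for col in columns]
    # scale every descriptor to the range 0..1 so that none dominates
    scaled = {
        name: [(v - lo) / s for v, lo, s in zip(values, low, span)]
        for name, values in descriptors.items()
    }
    picked = [names[0]]             # arbitrary start: the first candidate
    while len(picked) < n_pick:
        remaining = [n for n in names if n not in picked]
        # take the candidate whose nearest picked neighbor is farthest away
        best = max(
            remaining,
            key=lambda n: min(math.dist(scaled[n], scaled[p]) for p in picked),
        )
        picked.append(best)
    return picked


# HYPOTHETICAL building blocks: (molecular weight, logP, H-donors, H-acceptors)
candidates = {
    "amine_1": (101.0, 0.4, 2, 1),
    "amine_2": (115.0, 0.9, 2, 1),
    "amine_3": (187.0, 2.8, 1, 2),
    "amine_4": (149.0, -1.1, 3, 4),
    "amine_5": (121.0, 1.1, 2, 1),
}
print(maxmin_pick(candidates, 3))
```

As the section stresses, such a selection is diverse only with respect to the chosen descriptors; for a target-oriented library a different descriptor set, and therefore a different pick, would be appropriate.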
11.9 Nanomolar Ligands for G Protein-Coupled Receptors

Chemists at the company Chiron synthesized a library of trimeric N-substituted oligoglycines (peptoids) by using the split-and-combine method (Fig. 11.5). In their design of the nitrogen substituents the scientists had G protein-coupled receptors in mind. These receptors are the targets of many neurotransmitters and hormones. In the construction of their peptoids they combined at least one aromatic group and a side chain with an H-bond donor in the form of a hydroxyl group (Fig. 11.5, Groups A and O). Furthermore, a basic nitrogen atom is present in the molecules with X = H. These groups match those also found in neurotransmitters and hormones. For the remaining third substituent (Group D), they chose as diverse a composition as possible. A peptoid library with ca. 5,000 di- and tripeptoids was prepared from these groups.

Different mixtures were tested on the adrenergic receptors. The H-ODA-NH2 partial library was identified as the most active. It served as a starting point for the stepwise deconvolution of the library. Partial libraries were resynthesized, first by keeping the hydroxy side chain O constant, then the members of the diverse group D, and finally the aromatic substituent A. In the end, 11.7 remained as a nanomolar ligand (Fig. 11.6). The same peptoid library was tested on another GPCR, the opiate receptor. In this case the most active partial library H-ADO-NH2 was found in the first step. The relevant deconvolution through resynthesis delivered 11.8 as a nanomolar ligand. The molecule has a p-hydroxyphenylethyl moiety and a diphenylmethane group at the two ends of the tripeptoid. It is known from detailed studies on Met-enkephalin 11.9 that the amino acids tyrosine and phenylalanine are essential for the activity. There are analogous groups for both moieties on the tripeptoid (Fig. 11.6).
11.10 More Potent than Captopril: A Hit from a Combinatorial Library of Substituted Pyrrolidines

The Affymax company prepared a library of ca. 500 differently substituted pyrrolidines by 1,3-dipolar cycloaddition. In the first step, the resin was loaded with protected amino acids (Gly, Ala, Leu, and Phe; Fig. 11.7). Then the transformation to an imine was made with four different aromatic aldehydes. Cycloaddition with five different alkenes led to five-membered-ring heterocycles. In the last step, the pyrrolidines were N-substituted with three different thiols. This last step was done in view of testing these ligands on the angiotensin-converting enzyme (ACE, ▶ Sect. 25.4). Inhibitors of this enzyme contain a functionalized proline residue at their C terminus. The iterative deconvolution of the library afforded 11.10 as a potent ACE inhibitor (Fig. 11.7; Ki = 160 pM). It is a distinctly stronger binder than the marketed product captopril and is among the most potent thiol-containing ACE inhibitors.
Fig. 11.5 Peptoids are oligoglycines that are substituted at nitrogen. A library of di- and tripeptoids was constructed according to the split-and-combine technique. Three X groups were added to the N terminus. Three groups O with a hydroxy function, 4 groups A with an aromatic ring, and 17 groups D with diverse groups were used as nitrogen side chains. Eighteen mixtures (6 permutations of A, O, and D with 3 end groups) gave ca. 5,000 di- and tripeptoids. The H-ODA-NH2 library showed activity on the α-adrenergic receptor. First, the hydroxy groups O were deconvoluted. The compounds with p-hydroxyphenethyl groups were the most active ones. In the next synthesis round, 17 partial libraries were composed with this O group held constant, and defined groups were used from the diverse D group. Compounds with a diphenyl or diphenyl ether group were particularly active. With these groups in the D position, the work was continued. The division of the aromatic side chains A in the last position led to eight individual compounds.
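The numbers in the caption of Fig. 11.5 can be retraced with a few lines of Python. This is a rough sketch under stated assumptions: only the tripeptoids are enumerated, because the caption does not spell out how the dipeptoids were composed, and the three partial libraries assumed for the first deconvolution round (one per O group) are an inference from the group count.

```python
from itertools import permutations

GROUP_SIZES = {"O": 3, "A": 4, "D": 17}   # side-chain groups from Fig. 11.5
N_END_GROUPS = 3                          # X groups at the N terminus

# six possible orders of the three substituent classes, e.g. ODA or ADO
orders = ["".join(p) for p in permutations("ODA")]
n_mixtures = len(orders) * N_END_GROUPS

# each mixture holds every combination of one O, one A, and one D group
tripeptoids_per_mixture = (
    GROUP_SIZES["O"] * GROUP_SIZES["A"] * GROUP_SIZES["D"]
)
total_tripeptoids = n_mixtures * tripeptoids_per_mixture

# deconvolution of the H-ODA-NH2 library as described in the caption;
# ASSUMPTION: the first round comprises one partial library per O group
round_o = GROUP_SIZES["O"]
round_d = GROUP_SIZES["D"]
final_compounds = GROUP_SIZES["A"] * 2    # 4 A groups x 2 retained D groups

print("mixtures:", n_mixtures)
print("tripeptoids per mixture:", tripeptoids_per_mixture)
print("tripeptoids in total:", total_tripeptoids)
print("samples prepared during deconvolution:",
      round_o + round_d + final_compounds)
```

The tripeptoids alone account for 3,672 of the ca. 5,000 library members; the remainder would be the dipeptoids. Under these assumptions, fewer than thirty resynthesized samples suffice to trace the activity of one mixture back to eight individual compounds.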
11.11 Parallel or Combinatorial, in Solution or on Solid Support?

Combinatorial chemistry on solid support has enabled the automated synthesis of numerous molecules, but it also faces problems. The difficulties associated with testing on resins or the deconvolution and resynthesis of libraries have already been mentioned.

Fig. 11.6 The derivative 11.7 is the most potent compound from the H-ODA-NH2 library with a Ki = 5 nM on the α-adrenergic receptor. Testing on the opiate receptor gave compound 11.8 as the candidate with highest affinity (Ki = 6 nM) from the H-ADO-NH2 library after deconvolution. Met-enkephalin 11.9 is a potent opiate receptor ligand. The correspondence between the p-hydroxyphenyl group in 11.8 and the tyrosine side chain in 11.9, and between a phenyl portion of the diphenylmethane group of 11.8 and the benzyl group of phenylalanine in 11.9, is obvious. Tyr and Phe are essential for the activity of Met-enkephalin.

Labeling is an elegant but laborious alternative. Another way to
avoid deconvolution of a library but which still uses the advantages of combinatorial chemistry is parallel synthesis in spatially separated reaction vessels. It remains clear along the entire reaction sequence which reactant and product is in each vessel. A laborious deconvolution is omitted. At first this strategy seems to be impractical. How should a thousand reaction components be reasonably transformed in a thousand reaction flasks?

Fig. 11.7 The amino acids Aa = Gly, Ala, Leu, or Phe are coupled to the support resin (a). Next, they are transformed to imines with four different aromatic aldehydes (Ar-CHO; b), which react with alkenes under 1,3-dipolar cycloaddition conditions to give pyrrolidines (c). In the last step the free NH proton on the heterocycle is treated with different thiol compounds (thio-COCl; d). The library, built up with the help of the split-and-combine technique, is cleaved from the polymer with release of an acid function. Its ability to inhibit the angiotensin-converting enzyme was tested. By resynthesis and renewed testing, the library was deconvoluted to the active compound. In doing so, compound 11.10 was identified as a high-affinity inhibitor.

For this purpose, the reaction flasks
should not be thought of in the classical organic chemistry sense. Rather, miniaturized reaction “automats” are developed in which all reaction steps are carried out in parallel. Alternatively, methods have been developed in which the resin beads are filled into many small reaction capsules (e.g., called teabags or “Kans™”). These are permeable to the solution phase so that dissolved compounds can pass, but the beads are mechanically enclosed. Each capsule is fitted with a label that can be read with a radio transmitter. All of the capsules are then placed in a classical round-bottomed flask and the usual chemistry is carried out. The capsules can be mechanically separated and brought into contact with different reagents. The registration system with the radio transmitter keeps track of which reaction sequence is performed on which capsule. In this way, exactly one molecule per reaction capsule can be prepared by combinatorial principles, practically as in parallel synthesis. The single compounds are then available for testing.

Synthesis on a solid support material has disadvantages compared to chemistry in the solution phase. Usually transformations are slower, and the analysis needed to follow the reactions is considerably more laborious to carry out. Coupling to the solid support requires a suitable linker. Such an anchoring group should be removed from the library before testing as tracelessly as possible. Above all else, upon removal of the linker (“traceless linkers”) no functional groups should remain in the library that might unintentionally be part of the pharmacophore. The chemistry to attach and remove the linker must be compatible with all of the other reactions in the synthesis of the solid-supported library. This can lead to limitations in the usable chemistry.

In preparative chemistry, molecules are preferably constructed by using a convergent synthesis strategy.
For this, an approach is developed in which the components of the final product are prepared in separate steps, each in parallel. In the subsequent reaction steps, the previously prepared components are brought together and coupled with one another to produce the final product. Such a strategy is more efficient and leads to a higher yield than a linear synthetic route. A convergent strategy, however, cannot be carried out by sequential construction on a resin.

Therefore, the tables have been turned for some syntheses. It is not the prepared libraries that are bound to the solid support, but rather the reagents with which they are treated. The advantages of carrying out reactions on the solid support are retained: good mechanical separation of reaction components, simple work with large excesses of reagent, and automated reactions. An additional advantage is that it is now possible to carry out convergent syntheses. Even toxic reagents can be used, as their separation is ensured by their firm adhesion to a solid support. The usual analytical methods that are typically applied for the solution phase can also be used. Some reactions, especially ring-closure reactions or condensations, are in competition with intermolecular transformations. To avoid these, highly diluted solution conditions are used. If a solid-supported reactant is used, the local concentration of the reactant is reduced because it is fixed to the solid support and spatially separated. Reactions that proceed via a trapped reaction product can be simplified if the trapping reagent is coupled to a solid support. Mechanical filtering is enough to separate the trapped components. Similarly, the products can be
separated and purified by trapping them on a solid support. Acids and bases can be separated for purification by treatment with an immobilized amine or a sulfonic acid. By now, metal-complexing groups or hydrophobic adhesion groups are also used for the purification of combinatorially produced compound libraries.

How will combinatorial chemistry develop further? The miniaturization of reaction vessels and synthetic automats seems to be a promising prospect. The “lab-on-a-chip” concept is already intensively used for bioanalytical methods. Small reaction volumes, integrated separation columns, miniaturized valves, and pumps that are controlled by piezo elements are integrated on small chip cards. We can only wait and see whether such serial reaction automats are the laboratories of the future.

11.12 The Protein Finds Its Own Optimal Ligand: Click Chemistry and Dynamic Combinatorial Chemistry

Could a protein simply produce its own best inhibitor? With the ideal geometry, the inhibitor should be able to form the optimal interactions directly in the binding pocket of the target enzyme. Which chemical reactions might be best suited for such a concept? It would have to be a reaction that can be conducted in aqueous medium, is reliably enthalpically driven, is fast, and gives complete turnover. Reactions of this kind, named “click chemistry,” have been investigated in detail in the group of Barry Sharpless in La Jolla, California, in recent years. Cycloadditions of unsaturated compounds (1,3-dipolar cycloadditions, Diels–Alder reactions); nucleophilic substitutions, particularly ring-opening reactions; non-aldol-like carbonyl reactions; and additions to C─C multiple bonds fulfill these requirements. These can be applied by using combinatorial principles. The 1,3-dipolar cycloaddition (Huisgen reaction) is particularly well suited to building five-membered triazole and tetrazole heterocycles (Fig. 11.8).
1,4-Disubstituted 1,2,3-triazoles can be produced regiospecifically by the reaction of an azide and an alkyne in the presence of Cu(I) salts at room temperature. 1,5-Disubstituted triazoles are formed when copper ions are excluded and other metals such as ruthenium are used instead. The reaction runs in a broad pH range between 4 and 12. The reaction type can be extended to tetrazoles. For this, nitriles are needed as dipolarophile reaction partners in the presence of zinc ions.

The research group of Jean-Marie Lehn in Strasbourg chose another way. They developed “dynamic combinatorial chemistry,” the spontaneous construction of molecules from suitable starting materials by reversible reactions (Fig. 11.9). All imaginable combinatorial products form from a mixture of different building blocks, and a dynamic exchange equilibrium is established between them. The target receptor (e.g., a protein) is then added to such an equilibrium system. The mixture components with the best protein-binding characteristics have an advantage, as the protein captures the best binders and thereby shifts the equilibrium. This leads to a self-reinforcing selection of the ligands that fit best into the binding pocket. In this way the added protein practically seeks its own best inhibitor.
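The equilibrium shift that underlies dynamic combinatorial chemistry can be illustrated with a deliberately small numerical sketch. It assumes a hypothetical library of only two interconverting members, A and B, of which only B binds the receptor; the function name, the equilibrium constant `k_eq`, the dissociation constant `kd`, and all concentrations are illustrative assumptions and are not taken from the text.

```python
def fraction_of_b(k_eq, kd, library_total, receptor_total):
    """Share of library material present as B (free plus receptor-bound).

    Toy model (assumption): A <-> B with k_eq = [B]/[A]; only B binds the
    receptor, with dissociation constant kd. All values share one unit.
    """
    def library_balance(a_free):
        b_free = k_eq * a_free
        # Receptor-B complex from the receptor mass balance
        bound = receptor_total * b_free / (kd + b_free)
        return a_free + b_free + bound

    # Bisection on free A; the mass balance rises monotonically with a_free
    low, high = 0.0, library_total
    for _ in range(200):
        mid = (low + high) / 2.0
        if library_balance(mid) > library_total:
            high = mid
        else:
            low = mid
    a_free = (low + high) / 2.0
    return 1.0 - a_free / library_total


without_receptor = fraction_of_b(1.0, 1e-3, 1.0, 0.0)
with_receptor = fraction_of_b(1.0, 1e-3, 1.0, 0.5)
print(f"share of B without receptor: {without_receptor:.2f}")
print(f"share of B with receptor:    {with_receptor:.2f}")
```

In this sketch the receptor removes B from the exchange equilibrium, so more A is converted into B than in the receptor-free mixture. Real dynamic libraries contain many members and competing equilibria, which this model leaves out.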
Even click chemistry can be directed toward such a self-selecting synthetic process. Acetylcholinesterase (AChE, ▶ Sect. 23.7) was added as a target protein to a mixture of azides and alkynes as potential starting materials for the Huisgen reaction. From the multiplicity of imaginable reaction products, a femtomolar inhibitor was selected! When the azide is decorated at one end with a phenylphenanthridine group and the alkyne with a tacrine head group suited for the shallow entrance, the two react in the middle of the hose-shaped binding pocket to afford a triazole (Fig. 11.10). Very few products form. They are predetermined by the possible positions of the starting compounds. Crystal structures could be determined for two potent products. The newly produced triazole ring forms an H-bond that is mediated by a water molecule with Ser203 in the catalytic center of the protein. The triazole ring does not form preferentially as the entropically favored product of a simple linker; rather, the polar interaction with Ser203 appears to be the driving force. In a similar way, carbonic anhydrase II (▶ Sect. 25.7) was used as a target protein for the selection of suitable reactants in a Huisgen reaction. In this case the alkyne component was initially coupled to the catalytic zinc ion via a benzylsulfonamide anchor. Subsequently, structurally fitting azide components could be brought to react in the funnel-shaped binding pocket. Nanomolar inhibitors were produced. Analogously, success has been achieved with HIV protease (▶ Sect. 24.3) and ACh-binding protein (▶ Sect. 30.5). Goal-oriented combinatorial libraries are used as starting materials for these reactions. Time will tell what significance this in situ inhibitor synthesis will gain for practical drug research.

Fig. 11.8 The 1,3-dipolar cycloaddition (Huisgen reaction) is a typical click chemistry reaction and leads to five-membered triazole and tetrazole heterocycles. In the presence of Cu(I) salts, azides and alkynes react regiospecifically at room temperature to form 1,4-disubstituted 1,2,3-triazoles; in the absence of copper but with ruthenium ions, 1,5-disubstituted products are formed. If a nitrile is used instead of the alkyne component, and the reaction is catalyzed by zinc ions, 1,5-disubstituted tetrazoles are obtained as products.

Fig. 11.9 In dynamic combinatorial chemistry, a mixture of different library components is furnished that interact under equilibrium conditions (figure panels: library generation; dynamic exchange of library components; selection of the best binder through the receptor). Numerous products can form in the equilibria. They represent potential “keys” that can fit in the “lock” of the target protein. The added receptor protein binds the best-fitting ligands from the compound mixture. Because protein binding removes this product from the equilibrium, the equilibrium shifts in the direction of its increased formation (according to O. Ramström and J.-M. Lehn).

11.13 Synopsis

• As a consequence of the tremendous acceleration of automated compound screening for biological activity, the number of compounds required for testing increased significantly and stimulated the development of automated parallel synthesis and combinatorial chemistry.
• Nature produces an enormous chemical multiplicity by combining either amino acids or nucleotides to give polymers that fold into 3D arrangements.
• The chemical space of organic molecules that have up to 25 non-hydrogen atoms and satisfy the requirements of drug-likeness has been estimated to host about 10^27 imaginable candidates.
• Chemical reactions on solid support, usually organic polymer resins such as cross-linked polystyrene, follow a stepwise synthesis strategy to build up molecules sequentially on the solid phase. Complete yields and easy purification can be achieved, and product release from the solid phase is accomplished as the final step.
Fig. 11.10 The library produced from alkynes bearing an AChE-suitable tacrine side chain and AChE custom-made phenylphenanthridine-substituted azides. In the presence of acetylcholinesterase (AChE) the products 11.11 (green) and 11.12 (gray) are formed, which proved to be potent enzyme inhibitors. They differ in the topology on the five-membered ring. Crystal structure determinations were accomplished with both inhibitors. The surface around the protein is shown with the bound ligand 11.12. Both ligands occupy the tube-shaped binding pocket of AChE. Compound 11.11 binds via a water molecule (red sphere) to the hydroxy function of Ser203.
• Sophisticated synthetic strategies have been developed to generate multiple products on the solid support from reagent mixtures in a limited number of reaction steps. Elaborate protocols, some of which rely on chemical labeling techniques, have been established to keep track of product formation.
• The biological activity testing of compound libraries generated by combinatorial chemistry on a solid support requires sophisticated protocols to detach and deconvolute the library.
• The design and selection of building blocks used for library synthesis are purpose-oriented and consider the properties of the target(s) against which the library is subsequently screened.
• Multiple protocols have been developed for combinatorial chemistry and parallel synthesis. Either the library substrates are immobilized on the solid phase, or the reagents are immobilized and the library is developed in the solution phase.
• The target protein can be added to a mixture of reagents in click chemistry and dynamic combinatorial chemistry. From a large variety of possible reaction products, the protein binding pocket selects the best binder as a potent inhibitor or antagonist of the target protein.

Bibliography

General Literature

Balkenhohl F, von dem Bussche-Hünnefeld C, Lansky A, Zechel C (1996) Combinatorial synthesis of small organic molecules. Angew Chem Int Ed Engl 35:2288–2337
Bannwarth W, Hinzen B (2006) Combinatorial chemistry. From theory to application. In: Mannhold R, Kubinyi H (eds) Methods and principles in medicinal chemistry, vol 26. Wiley-VCH, Weinheim
Baum RM (1994) Combinatorial approaches provide fresh leads for medicinal chemistry. Chem Eng News 72:20–26
Beck-Sickinger AG, Weber P (2002) Combinatorial strategies in biology and chemistry. Wiley, Weinheim
Bunin BA (1998) The combinatorial index. Academic, San Diego
Gallop MA, Barrett RW, Dower WJ, Fodor SPA, Gordon EM (1994) Applications of combinatorial technologies to drug discovery. 1. Background and peptide combinatorial libraries. J Med Chem 37:1233–1251
Gordon EM, Barrett RW, Dower WJ, Fodor SPA, Gallop MA (1994) Applications of combinatorial technologies to drug discovery. 2. Combinatorial organic synthesis, library screening strategies, and future directions. J Med Chem 37:1385–1401
Jung G (1999) Combinatorial chemistry. Wiley-VCH, Weinheim
Jung G, Beck-Sickinger AG (1992) Multiple peptide synthesis methods and their applications. New synthetic methods. Angew Chem Int Ed Engl 31:367–383
Kay BK (1994) Biologically displayed random peptides as reagents in mapping protein–protein interactions. Persp Drug Discov Design 2:251–268
Kolb HC, Finn MG, Sharpless KB (2001) Click chemistry: diverse chemical function from a few good reactions. Angew Chem Int Ed Engl 40:2004–2021
Ley SV, Baxendale IR (2002) New tools and concepts for modern organic synthesis. Nat Rev Drug Discov 1:573–586
Madden D, Krchnak V, Lebl M (1994) Synthetic combinatorial libraries: views on techniques and their application. Persp Drug Discov Design 2:269–285
Moos WH, Green GD, Pavia MR (1993) Recent advances in the generation of molecular diversity. Annu Rep Med Chem 28:315–324
Nicolaou KC, Hanko R, Hartwig W (2002) Handbook of combinatorial chemistry. Drugs, catalysts, materials. Wiley-VCH, Weinheim
Ramström O, Lehn J-M (2002) Drug discovery by dynamic combinatorial libraries. Nat Rev Drug Discov 1:27–36
Seneci P (2000) Solid-phase synthesis and combinatorial technologies. Wiley-Interscience, New York

Special Literature

Bourne Y, Kolb HC, Radic Z, Sharpless KB, Taylor P, Marchot P (2004) Freeze-frame inhibitor captures acetylcholinesterase in a unique conformation. Proc Natl Acad Sci USA 101:1449–1454
Carell T, Wintner EA, Sutherland AJ, Rebek J, Dunayevskiy YM, Vouros P (1995) New promise in combinatorial chemistry: synthesis, characterization, and screening of small-molecule libraries in solution. Chem Biol 2:171–183
Dooley CT, Chung NN, Schiller PW, Houghton RA (1993) Acetalins: opioid receptor antagonists determined through the use of synthetic peptide combinatorial libraries. Proc Natl Acad Sci USA 90:10811–10815
Fink T, Reymond J-L (2007) Virtual exploration of the chemical universe up to 11 atoms of C, N, O, F: assembly of 26.4 million structures (110.9 million stereoisomers) and analysis for new ring systems, stereochemistry, physicochemical properties, compound classes, and drug discovery. J Chem Inf Model 47:342–353
Geysen HM, Meloen R, Barteling S (1984) Use of peptide synthesis to probe viral antigens for epitopes to a resolution of a single amino acid. Proc Natl Acad Sci USA 81:3998–4002
Murphy MM, Schullek JR, Gordon EM, Gallop MA (1995) Combinatorial organic synthesis of highly functionalized pyrrolidines: identification of a potent angiotensin converting enzyme inhibitor from a mercaptoacyl proline library. J Am Chem Soc 117:7029–7030
Zuckermann RN, Martin EJ, Spellmeyer DC et al (1994) Discovery of nanomolar ligands for 7-transmembrane G-protein-coupled receptors from a diverse N-(substituted)glycine peptoid library. J Med Chem 37:2678–2685
12 Gene Technology in Drug Research

Engineers and writers have predicted many developments in science and technology. In addition to other sophisticated machines, Leonardo da Vinci described the principle of the helicopter. In the early 1820s, Charles Babbage designed an automatic calculator long ahead of its time. Over 160 years later, this mechanical precursor of a programmable computer was in fact built, and it worked! Jules Verne described submarines and a journey to the moon, and Hans Dominik described obtaining energy by splitting the atom. All of these visions have become reality. Only a single application was foreseen for gene technology, the most seminal invention of our time: the cloning of genetically identical individuals in Aldous Huxley’s Brave New World. It remains a hope that researchers will respect ethical boundaries and, despite the feasibility, never actually use Huxley’s idea.

With the methods of gene technology it is possible to bring new genes into the cell, multiply them, and exchange or remove them. If they are removed or changed, the cell can no longer produce the original protein derived from that gene. By introducing a new gene and using a clever choice of method, the cell manufactures a foreign product, either a purposefully modified protein or an entirely new one.

For many diseases, the molecular cause is known to be the absence of a protein, or a genetically caused mutation in a protein. Only a few generally known examples are mentioned here:
• Diabetes as a result of insulin deficiency,
• Particular hereditary forms of cancer (e.g., familial colon cancer, malignant melanoma),
• Huntington’s chorea, a chronic form of brain atrophy,
• Sickle cell anemia, a genetic disease producing malformed red blood cells (Sect. 12.14), and
• Bleeding disorders that are caused by the absence of particular coagulation factors (see Sect. 12.14).
The possibility of purposefully producing arbitrary proteins has yielded the following main applications of gene technology:
• The identification of genes and proteins that could play a role in the treatment of a disease,
• The development of animal models to test a therapeutic principle,
• The production of proteins for therapies in which a particular protein is missing,
• The manufacture of monoclonal antibodies and vaccines,
• The manufacture of proteins for molecular test systems, and the determination of the 3D structures of enzymes and other soluble proteins,
• The generation of proteins in which one or more amino acids have been exchanged by targeted mutagenesis, for the elucidation of the mode of action of enzymes and for the characterization of receptor binding sites,
• Somatic individual gene therapy for specific patients.

Other application possibilities, for example, manipulation of the human germline, or genetic changes in crops to achieve herbicide resistance or to prolong the shelf life of fruits, are only briefly mentioned here.

G. Klebe, Drug Design, DOI 10.1007/978-3-642-17907-5_12, © Springer-Verlag Berlin Heidelberg 2013

12.1 The History and Basics of Gene Technology

The foundations of gene technology were first established in the middle of the twentieth century. The starting shot was fired in 1953, when James Watson and Francis Crick elucidated the three-dimensional structure of the hereditary substance of all living things, deoxyribonucleic acid (DNA). The structure immediately gave indications about the mechanism of hereditary transfer and about the genetic code for the biosynthesis of proteins. A few years later, Werner Arber found restriction enzymes, which attack a very specific position on the double helix and cleave the DNA sequence-specifically. What was initially seen as a curiosity proved to be an exceedingly important discovery for gene technology. With these enzymes it is possible to cleave DNA selectively and to introduce new fragments. Next, the merging of new information with the original DNA, the recombination of the genetic constitution, is accomplished with ligases from special viruses called bacteriophages. The techniques for DNA sequencing have also made decisive progress.
Soon afterward, the amino acid sequence of a protein was no longer determined directly, but rather deduced from the analysis of the corresponding DNA. Today, sequencing takes the detour over the cDNA, which is complementary to the RNA (Sect. 12.6).

In 1973 Stanley Cohen and Herbert Boyer managed to recombine the genome of a bacterium for the first time (Fig. 12.1). Then things happened in quick succession: two years later the bacterial strain Escherichia coli K12, which is still used today, was developed. A part of its genetic constitution is missing so that it is only viable under laboratory conditions. This bacterium can be genetically manipulated at will without the worry that it could be injurious. The British scientists H. Williams-Smith and E. S. Anderson carried out self-experiments independently of one another in which they orally ingested Escherichia coli K12. They proved that these bacteria survive in the GI tract for only a short period of time, and that the K12 gene, which confers antibiotic resistance for the selection of transformed cells, cannot be transferred to the normal Escherichia coli that is found in the intestinal flora. Experts discussed the possible dangers of gene experiments at a conference in Asilomar, California, and defined different risk and safety classes. In 1976 the company Genentech was established. Its founder, Herbert Boyer, had to borrow US $500 as start-up capital! When the company was first traded on the stock market in 1980, he became a millionaire within a few minutes because of the value of his stock. As early as 1982 Genentech introduced the first medication manufactured by gene technology to the market: human insulin (Humulin®).

In 1983 Kary Mullis made a decisive contribution to gene technology by developing the polymerase chain reaction (PCR) while he was working at the California company Cetus, which was founded in 1971. Heating melts double-stranded DNA into its single strands. Then the four DNA nucleotides are added, as are two short single-stranded DNA pieces, the so-called primers, that are complementary to the regions at the beginning of the DNA strands. A polymerase can then be used to synthesize new DNA in a test tube. This means that, starting from the primers, a new double strand is formed (Fig. 12.2). A heat-stable DNA polymerase from the bacterium Thermus aquaticus, which is endemic to the hot springs in Yellowstone National Park, is used for the DNA synthesis. Each repetition of this step doubles the amount of DNA. Within a few hours, billions or even trillions of DNA molecules can be manufactured from a single starting molecule. This amount is enough for sequencing the relevant DNA segment.

Fig. 12.1 The principle of gene technological recombination of hereditary information (figure steps: cutting the plasmid into a linear, double-stranded DNA; target DNA is added; ligases fuse the vector and target DNA; the plasmid is introduced into the cell). Bacteria often contain additional genetic material in addition to their “chromosome” in the form of ring-shaped plasmids; these are used in gene technology as vectors to introduce foreign genes. Plasmids are removed from the cell and sequence-specifically cut with so-called restriction enzymes, which come from bacteria. The target DNA that carries the desired genes, which was typically also treated with the same restriction enzyme, is bound in vitro to the overlapping single-stranded DNA ends. The DNA ends are coupled with the enzyme DNA ligase, and the modified, recombinant plasmid is brought into the bacterial cell. In addition to the DNA segment that is necessary for replication, plasmid vectors that are used in gene technology carry information that allows the recognition and selection of the transformed cells (usually an antibiotic-resistance gene). In the presence of the selecting agent, only plasmid-containing cells grow.
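The cut-and-ligate steps described for Fig. 12.1 can be sketched on plain strings. The sketch uses the well-known recognition sequence of the restriction enzyme EcoRI (GAATTC, cut after the G); the plasmid and source sequences are invented for illustration, only the top strand is modeled, and the orientation of the insert is ignored. The function names are hypothetical helpers, not part of any library.

```python
SITE = "GAATTC"   # EcoRI recognition sequence
CUT_OFFSET = 1    # EcoRI cuts the top strand between G and AATTC


def linearize_plasmid(plasmid):
    """Open a circular plasmid (top strand, 5'->3') at its first EcoRI site."""
    doubled = plasmid + plasmid          # lets the search cross the origin
    pos = doubled.find(SITE)
    if pos == -1:
        raise ValueError("no EcoRI site found")
    cut = (pos + CUT_OFFSET) % len(plasmid)
    return plasmid[cut:] + plasmid[:cut]  # starts with AATTC, ends with G


def digest_linear(dna):
    """Cut linear DNA at every EcoRI site and return the fragments."""
    fragments, start = [], 0
    pos = dna.find(SITE)
    while pos != -1:
        fragments.append(dna[start:pos + CUT_OFFSET])
        start = pos + CUT_OFFSET
        pos = dna.find(SITE, pos + 1)
    fragments.append(dna[start:])
    return fragments


def insert_target(plasmid, source_dna):
    """Ligate the fragment flanked by two EcoRI sites into the opened plasmid."""
    vector = linearize_plasmid(plasmid)
    fragments = digest_linear(source_dna)
    if len(fragments) != 3:
        raise ValueError("expected a target flanked by exactly two sites")
    target = fragments[1]
    # Read as a circle, both junctions regenerate GAATTC
    return vector + target


plasmid = "TTGACGAATTCCATGA"            # invented circular vector
source = "CCGAATTCATGGCTGAATTCGG"       # invented DNA carrying the target
recombinant = insert_target(plasmid, source)
print(recombinant)
```

Because vector and target were cut with the same enzyme, their ends are compatible, which is why the recognition site reappears at both junctions of the recombinant plasmid.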
PCR methods are applied diversely. The entire genetic information of an individual can be derived from a single DNA molecule. In medical diagnostics, this serves to detect genetic disorders, cancer, infectious diseases, and risk factors. PCR methods are also used to establish a genetic fingerprint in paternity tests and in forensic science.

Fig. 12.2 Polymerase chain reaction allows unlimited identical copies of a DNA molecule to be manufactured (figure steps: heating the double-stranded DNA molecule at 95 °C gives two single strands; two primers, excess nucleotides, and heat-stable polymerase are added; arrows show the direction of the DNA synthesis, giving DNA + copy; repeating the PCR cycle gives four single strands and DNA + 3 copies; multiple repetitions of the PCR cycle follow). For this the DNA is heated to separate the double-stranded DNA into complementary single strands. Synthetic oligonucleotides with approximately 20 bases, so-called primers, which are complementary to these DNA strands hybridize with the corresponding strand. Each primer must bind to one end of the two DNA strands. The primers set the boundaries of the amplified DNA. Furthermore, an excess of primer must be used because in each cycle one primer pair is needed for each DNA double strand; the primers are consumed and are not regenerated in later cycles. The primers are necessary to initiate the new synthesis of DNA in the presence of the DNA polymerase and an excess of the four different nucleotides. On the two strands the synthesis runs in opposite directions (dashed arrow) because of the antiparallel course of the DNA strands and the specificity of the polymerase. The newly synthesized DNA segment can be a few hundred to thousands of base pairs long. The result is two identical double-stranded DNA molecules. After heating, single strands are obtained and the above-described procedure is repeated. Because the DNA polymerase is heat stable, it does not need to be repeatedly added. Each repeat of the above-described steps doubles the number of DNA molecules, which therefore grows exponentially. Ten cycles lead to about 1,000 DNA molecules, 20 to a million, and 30 already to a billion. In this way, a single DNA molecule can be multiplied into a quantity that is biochemically analyzable.
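The cycle numbers quoted in the caption follow directly from repeated doubling. The short sketch below reproduces this arithmetic; the `efficiency` parameter is an added assumption (1.0 means perfect doubling in every cycle, which real reactions only approximate), and the function name is illustrative.

```python
def pcr_copies(cycles, start_copies=1, efficiency=1.0):
    """Number of DNA molecules after a given number of PCR cycles.

    efficiency = 1.0 corresponds to ideal doubling per cycle (assumption).
    """
    return start_copies * (1.0 + efficiency) ** cycles


for n in (10, 20, 30):
    print(f"{n} cycles: about {pcr_copies(n):.3g} molecules")
```

With ideal doubling, the copy number after n cycles is 2^n, which gives roughly a thousand, a million, and a billion molecules after 10, 20, and 30 cycles.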
New genetic information can be brought not only into bacterial cells, but also into yeasts, virus-infected insect cells, and even mammalian cells. As a first approximation, though, the more complex the organism is, from bacteria all the way to mammalian cells, the more difficult it is to produce proteins in its cells. On the other hand, insect and mammalian cells have the advantage that they produce not only small proteins but also more complex ones (e.g., glycosylated proteins) in a functional form. In many cases, one therefore has to rely on such organisms.

12.2 Gene Technology: A Key Technology in Drug Design

The 1970s and 1980s were the grand age of receptor-binding tests with membrane preparations. Radioactively labeled ligands were used to determine the specific binding of new substances. The most important receptors for hormones and neurotransmitters were known and, in some cases, the difference between pre- and postsynaptic receptors as well. The different subtypes and their amino acid sequences were not known. Correspondingly, the results of the investigations were inaccurate.

Gene-technology methods allow the preparation of homogeneous recombinant proteins in practically unlimited quantities. They play an important role at the very first step of drug design: the identification of a target protein. Progress with the methodology led to the discovery of new receptors with partially unknown function or specificity. The next step is the testing of the therapeutic concept on genetically altered animals. Another important contribution is the preparation of proteins for molecular test systems and the isolation of adequate material for the elucidation of the 3D protein structure (▶ Chap. 13, “Experimental Methods of Structure Determination”). With perhaps the exception of a very few proteins that can be isolated from blood or other natural sources, the production of large quantities of protein is dependent on gene technology.
Nowadays proteins are purified from animals or human blood only reluctantly. The risk of transmitting viruses or infections is deemed to be too high.

Gene technology offers the possibility to selectively produce structural variants of proteins. The generation of point mutations (site-directed mutagenesis) allows particular properties of proteins to be improved, and the binding and catalytic properties of enzymes to be purposefully changed. Membrane-bound receptors can be probed position by position to establish which amino acids are responsible for the maintenance of the 3D structure or the adoption of a particular conformation, or are of critical importance for the binding of a ligand. Three-dimensional structural models of receptors can be generated in this way, or their relevance can be appraised.

In many cases, it has also proven worthwhile to introduce point mutations that change the surface properties of proteins and help to elucidate their 3D structure. Sometimes the charge on individual amino acids must be changed for the sake of protein crystallization. In the case of proteins in which a part of the sequence is anchored in the membrane, the membrane anchor, which would impede crystallization, is removed before the experiment. With soluble receptors it has proven worthwhile to remove individual domains, crystallize them, and determine their structure. Of course such modified proteins must still fulfill their particular functions, that is, ligand binding or DNA docking. Once the difficult crystallization step is accomplished, the actual structural elucidation is nowadays usually only a matter of a few weeks (▶ Chap. 13, “Experimental Methods of Structure Determination”).

Considering the benefits to humanity that come from all of this progress, the question inevitably arises: why are such broad segments of society so afraid of gene technology? It takes a little effort to understand these prejudices. With the use of gene technology, almost everything that is theoretically imaginable is possible in the field of genetics. The trust that people have in science is, however, not as unshakable as it was before the atom bomb. Now, when significantly more chances than risks are at hand, the sins of our forefathers have come back to haunt us. Scientists have all too often underestimated possible risks in the past and put their ethical concerns on hold. Scientists have still not managed to assuage public fears. We must take these fears seriously and build new trust by behaving responsibly.

12.3 Genome Projects Decipher Biological Constructions

The entire human genome is organized on 23 pairs of chromosomes. In 1990 the Human Genome Organization (HUGO), equipped with a budget of US $3 billion, started with the then exceedingly ambitious goal of sequencing the entire human genetic code from DNA within 15 years. By the end of 1993, the first annotated genomic maps became available, which were later refined.
By 2001 the work was so advanced that the entire genome was published in Science and Nature in parallel by two consortia. The two competing consortia followed different strategies. The publicly funded international consortium chose an approach of setting progressively narrowing parameters, stepwise digestion of the genome, and systematic elucidation of the sequences for the complete genomic analysis. In humans, this meant that in addition to the 5% of DNA that corresponds to genes, the other 95% was sequenced as well. Because its function was unknown, this DNA was classified with the somewhat derogatory term “junk DNA.” Today it is known that these areas take on important tasks in the regulation of gene expression (Sect. 12.7). The second strategy, which was pursued by the privately financed consortium, made use of the so-called shotgun method. For this, a longer DNA strand was amplified and then cleaved into many arbitrary small segments. After these segments were sequenced, the sequences were reassembled into the original long DNA strand by using a powerful computer program. This can only work, of course, when the sequences of the cleaved segments display adequate overlap. This technique proved to be significantly faster than the usual systematic sequencing methods. Above all, it benefited from the development of faster and faster sequencing machines and powerful bioinformatic programs. In the end it was no disadvantage that, because of the high redundancy of the method, the genome had to be sequenced multiple times with the shotgun method. Interestingly, the shotgun method was also used at the end by the international consortium that followed the systematic approach to elucidate local sequence areas. Because the initial intent of the private enterprise was to patent the sequenced genome, the competition between the two initiatives was great. In March 2000, the American President Bill Clinton declared the human genome to be not patentable, and spoke for its use by everyone for the common good.

How did it come about that a competing private initiative started to sequence the genome? In spring 1995 Craig Venter and his group identified the entire genome of the bacterium Haemophilus influenzae by using the shotgun method. The enormous amount of 1,830,121 base pairs that code for 1,749 genes was sequenced. The complete genomes of individual viruses were already known, but this was the first decoding of the genetic information of a self-contained creature. The subsequent decoding of the sequence of 580,067 base pairs of the Mycoplasma genitalium genome by Venter’s wife, Claire Fraser, took only four months. Venter and his group applied the shotgun method to the entire genome, the so-called “whole-genome shotgun sequencing.” The statistical approach that was followed by Venter initially seemed so unusual and utopian that his application for a research grant from the American National Institutes of Health (NIH) was rejected. This brought about the founding of The Institute for Genomic Research (TIGR) and the Celera Genomics company. There, Venter could pursue his research according to his ideas and plans. Finally, the success proved the feasibility of the proposed strategy.
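The role of overlapping fragments in the shotgun method can be illustrated with a minimal greedy assembly sketch. It is not the program used by either consortium; the reads, the minimum overlap length, and the function names are assumptions chosen for illustration, and the sketch ignores sequencing errors, repeats, and reverse-complement reads.

```python
def overlap(left, right, min_len=3):
    """Length of the longest suffix of `left` that is a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # no adequate overlap left, fragments stay separate
        merged = reads[best_i] + reads[best_j][best_size:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Invented reads covering an invented 21-base sequence
reads = ["TTAGCCTAGGA", "ATGGCGTACG", "GTACGTTAGC"]
print(greedy_assemble(reads))
```

If the reads do not overlap sufficiently, the loop stops early and returns several unconnected pieces, which mirrors the remark above that the method only works when the segments display adequate overlap.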
Whose genome was actually sequenced? In both initiatives the DNA of multiple individuals was mixed and the individual differences were purposefully averaged out. In this way the “consensus sequence” of the human genome was determined. But it did not stop with the human genome. The genomes of baker’s yeast Saccharomyces cerevisiae, the common thale cress Arabidopsis thaliana, the rice plant Oryza sativa, the nematode Caenorhabditis elegans, the fruit fly Drosophila melanogaster, the chimpanzee Pan troglodytes, the mouse Mus musculus, and many other organisms (Table 12.1) have been completely elucidated. In the meantime new ones emerge weekly. This raises new questions: how should this plethora of information be managed? How can the genetic information be translated into useful knowledge? The field of bioinformatics has been challenged. Computer programs for the intelligent comparison of sequences and the analysis of metabolic pathways and signaling cascades already exist. New initiatives were founded that have the goal of determining the spatial structure of all or at least many sequences. The structural space of all real, naturally occurring proteins is filling slowly. The crystal structures of all representatives of some protein families of the human genome have been determined. Therefore it is only a question of time until we can lay spatial blueprints next to the catalogue of all sequences in our genome.
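The idea of a consensus sequence can be shown with a simple majority vote per position. This is only a sketch under strong assumptions: the sequences are invented, already aligned, and of equal length, whereas the genome projects relied on far more elaborate alignment and statistics. The function name is hypothetical.

```python
from collections import Counter


def consensus(aligned_sequences):
    """Majority base at each position of pre-aligned, equal-length sequences."""
    if len({len(seq) for seq in aligned_sequences}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    return "".join(
        Counter(column).most_common(1)[0][0]
        for column in zip(*aligned_sequences)
    )


# Three invented individuals that differ at single positions
individuals = ["ATGCCGTA", "ATGCTGTA", "ATGACGTA"]
print(consensus(individuals))
```

Each individual variant is outvoted at its position, so the result represents none of the three donors exactly, which is the sense in which individual differences are averaged out.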
12.4 What Is Contained in the Biological Space of the Human Proteome?

After the human genome was sequenced, the exciting question arose as to which gene products all of these DNA sequences code for. Initially it must be remarked that the genome is not static; it is constantly changing. It is only in this way that the genetic variations can occur that make up the diversity of all creatures. In the course of evolution, the genetic constitution has expanded. Simple single-celled organisms without cell nuclei (prokaryotes) have a circular genome that contains only coding genes. Single-celled organisms with a cell nucleus (eukaryotes), such as yeast, have a larger genome, of which about 20% represents coding genes. Multicellular organisms such as humans have a genome that is 200 times larger than that of yeast (Table 12.1). The number of coding genes, however, is not correspondingly larger. There are even organisms such as the amoeba that have a genome that is 200 times larger than that of humans. Even the minuscule water flea numerically overshadows us with its 31,000 genes. So the alleged masterpiece of creation does not necessarily also have the largest genome. Obviously only a small number of additional DNA sequences that in fact code for additional gene products have accrued during the course of evolution. Many genes in higher organisms are similar to those in simpler species.

If the number of coding genes has hardly grown from the single-celled organisms to humans, and even the gene products that are coded for are similar, what is the explanation for the massive increase in complexity of the genome in more highly developed organisms? The answer lies not in the diversity of the needed gene products, but much more in the finely tuned regulation of gene expression (Sect. 12.13). In higher organisms, it is of decisive importance where and at what time particular genes are expressed and gene products are synthesized.
Table 12.1 Examples of the sequenced genomes of different organisms

Organism                                        Genome size (a)    Genes
HI virus                                        9.2 × 10^3 (b)     9
Phage λ                                         4.85 × 10^4        70
Intestinal bacterium, Escherichia coli          4.6 × 10^6         4,800
Baker’s yeast, Saccharomyces cerevisiae         2 × 10^7           6,275
Roundworm, Caenorhabditis elegans               8 × 10^7           19,000
Thale cress, Arabidopsis thaliana               1 × 10^8           25,500
Fruit fly, Drosophila melanogaster              2 × 10^8           13,600
Green pufferfish, Tetraodon nigroviridis        3.85 × 10^8
Human, Homo sapiens                             3.2 × 10^9         25,000
Common newt, Triturus vulgaris                  2.5 × 10^10
Ethiopian lungfish, Protopterus aethiopicus     1.3 × 10^10
Amoeba, Amoeba dubia                            6.70 × 10^10

(a) Number of base pairs
(b) Single-stranded RNA

The 95% of
human DNA that does not code for proteins contains numerous sequences and signals that control this regulation. Therefore the total number of genes in more highly developed creatures does not seem to increase; rather, the gene density decreases. On average, 12 genes per one million base pairs are found in the human genome, whereas this number is 118 in the fruit fly, 197 in the roundworm, and 221 in the thale cress. Furthermore, the genes are widely scattered across the human genome. It seems that it is not the number of genes but rather how they are used and how their activation is regulated that is decisive for the developmental state of the organism. It must also be considered that multicellular organisms need a great deal of cell differentiation into different organs. These processes must be reliably regulated and controlled.

Moreover, higher organisms achieve a much larger diversity in their protein composition by so-called alternative splicing. Posttranslational modification after the biosynthesis also plays a role. Both are observed to a much smaller extent in, for instance, prokaryotes. The splicing process cuts the segments that do not code for protein out of the RNA copy obtained when DNA is transcribed into RNA. During alternative splicing, it is decided what is cut out and what is translated. In this way, one DNA sequence can code for multiple different proteins.

To date, the largest genome found in a single-celled pathogen belongs to the protozoan Trichomonas vaginalis. It consists of 160 million base pairs. This pathogen is usually transmitted between humans by sexual intercourse and causes infections of the urogenital tract. Its enormous genome takes up a disproportionately large volume in the cell. This could create an advantage for the pathogen because its large surface area adheres better to the vaginal mucosa. Furthermore, the immune system has trouble attacking and destroying such an oversized parasite.
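The statement above that one DNA sequence can code for multiple proteins through alternative splicing can be made concrete with a small sketch. It is only a toy model: the exon names, the sequences, and the assumption that every optional ("cassette") exon is kept or skipped independently are hypothetical and are not taken from any real gene.

```python
from itertools import product


def splice_isoforms(exons, optional):
    """Enumerate transcripts obtained by keeping or skipping optional exons.

    exons:    ordered list of (name, sequence) pairs
    optional: set of exon names that may be skipped (cassette exons)
    """
    optional_names = [name for name, _ in exons if name in optional]
    isoforms = {}
    # Every optional exon is independently kept (True) or skipped (False).
    for choice in product([True, False], repeat=len(optional_names)):
        keep = dict(zip(optional_names, choice))
        kept = [(n, s) for n, s in exons if keep.get(n, True)]
        label = "-".join(n for n, _ in kept)
        isoforms[label] = "".join(s for _, s in kept)
    return isoforms


# Hypothetical toy gene: four exons, two of which are cassette exons.
toy_gene = [("E1", "AUGGCU"), ("E2", "GAAUUC"), ("E3", "CCGGGA"), ("E4", "UAA")]
variants = splice_isoforms(toy_gene, optional={"E2", "E3"})

for label, rna in variants.items():
    print(label, rna)
```

With n independently skippable exons, this simple model yields 2^n transcripts from a single gene, which illustrates why the diversity of the proteome can exceed the number of genes. Real splicing is regulated and far from fully combinatorial.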
The genome of the soil bacterium Sorangium cellulosum, with 13 million bases and 10,000 genes, is four times as large as the average genome of other bacteria. This might have something to do with the fact that this soil bacterium is able to carry out special tasks that make its therapeutic use interesting. It is a versatile producer of complex natural products such as the epothilones, which are potent chemotherapeutics with great potential in the treatment of cancer.

According to an analysis carried out in 2007, the human genome encompasses 3.25 billion bases. It contains around 25,000 genes, a few thousand of which are recognized as RNA genes (even today the exact number cannot be stated because only 92% has been fully sequenced). The earlier textbook view that each gene encodes a protein must be expanded. It must not be overlooked that our genome contains many thousands of genes for non-coding RNA segments. The resulting RNA molecules accomplish important functions in our bodies. The large group of tRNAs, which serve as adapter molecules for reading the base triplets and translating them into the correct amino acid sequence, deserves special mention. Furthermore, it has been shown that the ribosome itself, the molecular machinery for protein synthesis, consists largely of RNA. The spliceosome, the complex machinery for the removal of non-coding segments, contains RNA molecules, the so-called snRNAs.
There are even more small RNA molecules (snoRNAs) that are responsible for the processing and modification of other RNA molecules. It is now known that over 21,500 genes in our genome are translated into proteins. It is not known, however, what functions all of these proteins fulfill. Bioinformatics has contributed a great deal to the classification of their biochemical function, that is, whether the protein is an enzyme (e.g., a protease, kinase, or oxidoreductase) or whether it is a receptor, ion channel, or transporter. The function of a new sequence, or the protein class to which it belongs, can be discovered by sequence comparisons with already annotated proteins. Often a significant similarity can be recognized by making so-called multiple sequence comparisons within a protein family. Information about the spatial architecture and folding (▶ Sect. 14.2) can also be inferred from such relationships because the spatial geometry of proteins has been much more strongly conserved than the sequence of the folded protein chain. Often individual motifs or characteristic sequence segments disclose a particular biochemical function of a protein. Comparing protein sequences across the genomes of other species has proven to be another tool in this detective work of functional annotation.

The assignment of a biochemical function to a protein sequence affords a glimpse into its molecular function. It shows whether, for example, the protein cleaves a peptide sequence as a catalyst, carries out a metabolic reduction, or transduces a signal into the cell as a receptor. What this regulation and control mean for the organism remains to be resolved. Whether a particular protein causes a disease by either a defective function or by dysregulation is just as unclear. The correction of such a defect could lead to a successful pharmaceutical therapy.
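The idea that a characteristic sequence motif can hint at the biochemical function of a protein can be sketched as a simple pattern scan. The pattern below is a simplified, C2H2-like zinc finger signature chosen purely for illustration, and both protein sequences are hypothetical; real annotation pipelines rely on curated motif databases and statistical profiles rather than a single regular expression.

```python
import re

# Simplified C2H2-like zinc finger signature (an assumption for illustration):
# Cys, 2-4 residues, Cys, 12 residues, His, 3-5 residues, His.
C2H2_LIKE = re.compile(r"C.{2,4}C.{12}H.{3,5}H")


def find_motifs(sequence, pattern=C2H2_LIKE):
    """Return (start, end, matched_segment) for each non-overlapping match."""
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(sequence)]


# Hypothetical protein sequences in one-letter code.
candidates = {
    "protein_A": "MSTKPYKCPECGKSFSQKSNLQKHQRTHTGEK",
    "protein_B": "MKVLAAGIVGLLLAQPSAWAEEK",
}

annotation = {name: find_motifs(seq) for name, seq in candidates.items()}
for name, hits in annotation.items():
    print(name, hits if hits else "no motif found")
```

A hit of this kind is only a first hint of a possible function. As the text notes, comparisons within a protein family and across species are needed to support an annotation.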
In the Science publication from the Venter group in 2001, it was assumed that the genome coded for more than 26,500 proteins. At that time, a definitive function could not be assigned to 40% of the sequences. Of the remaining part, about 10% were found to be enzymes. Another 12% proved to be involved in signal transduction, and 13.5% are nucleic acid binding proteins. The large remaining group was scattered across many different functions such as proteins of the cytoskeleton, surface receptors, ion channels, transporters, extracellular matrix proteins, immune system proteins, or chaperones. Seven years later this picture could be refined. The largest protein family, with more than 7,000 members, contains the zinc finger domain (▶ Sect. 28.2). These proteins assume an important role in transcribing sequence segments of the DNA into RNA. Most zinc finger proteins belong to the group of transcription factors. Another large protein family contains the immunoglobulin domains. These domains (▶ Sect. 32.1), which are constructed from β-pleated sheets, occur in antibodies. A few protein families are listed in Table 12.2 and are presented in more detail in ▶ Chaps. 23, “Inhibitors of Hydrolases with an Acyl–Enzyme Intermediate”; ▶ 24, “Aspartic Protease Inhibitors”; ▶ 25, “Inhibitors of Hydrolyzing Metalloenzymes”; ▶ 26, “Transferase Inhibitors”; ▶ 27, “Oxidoreductase Inhibitors”; ▶ 28, “Agonists and Antagonists of Nuclear Receptors”; ▶ 29, “Agonists and Antagonists of Membrane-Bound Receptors”; ▶ 31, “Ligands for Surface Receptors”; and ▶ 32, “Biologicals: Peptides, Proteins,
Nucleotides, and Macrolides as Drugs” of this book.

The compilation of which protein families are frequently associated with which diseases (Fig. 12.3) is interesting. This list is led by the protein kinases (▶ Chap. 26, “Transferase Inhibitors”). It is therefore not surprising that current basic research in the pharmaceutical industry concentrates intensively on the control and inhibition of protein kinases. The cadherins follow this group. These proteins are important for the stabilization of cell–cell contacts. They play a role in embryonic morphogenesis and signal transduction, and they intervene in the construction of the cytoskeleton in cells. The G protein-coupled receptors, ion channels, trypsin-like serine proteases, and Ras proteins also belong to this list of proteins that are potentially associated with disease, especially when genetically altered.

Finally, how the human genome differs from those of other eukaryotes should also be considered. Of the more than 2,200 protein families that have been discovered in organisms with a cell nucleus, over 1,000 are missing from the human genome. Most of these families assume specific tasks in the relevant organisms or can be explained phylogenetically. Among these are, for example, venoms such as those found in snakes, scorpions, or insects. Plants contain proteins that assume a very specific function for the plant, for example nutrient storage in the seeds or defense against disease. As a rule, the proteins that are absent in humans assume biochemical functions that are irrelevant for our organism, or they assume a highly specific task in the lower eukaryotes.
Table 12.2 Examples of protein families in the human genome and the number of their members

Protein superfamily                                           Number
Zinc finger (C2H2 and C2HC)                                   7,707
Protein kinase-like                                           876
G Protein-coupled receptor-like                               784
α/β-Hydrolases                                                151
Cysteine proteases                                            164
Trypsin-like serine proteases                                 155
Metalloprotease (“Zincins”), catalytic domains                132
FAD/NAD(P)-binding domains                                    79
Cytochrome P450                                               79
Integrin α, N-terminal domains                                51
Cytokines                                                     52
Cyclic nucleotide phosphodiesterase, catalytic domains        50
Caspase-like                                                  39
Carbonic anhydrases                                           23
Aquaporin-like                                                20
Integrin domains                                              18
Aspartic proteases                                            16
ClC-chloride channel                                          16
Subtilisin-like                                               14

Source: http://hodgkin.mbu.iisc.ernet.in/human/
12.5 Knock In, Knock Out: Validation of Therapeutic Concepts

Molecular biology delivers a plethora of information about how diseases develop and how their course can be influenced. The long route of searching for and developing a new drug is based on this. In the end it might turn out that the result, even though it was so well planned, did not lead to the desired clinical success. It is therefore important to have an animal model available with which the therapeutic concept can be validated early on. Classical test models are often not available because the corresponding disease does not occur in animals.

Since the 1980s, transgenic animals have been increasingly used in pharmacological research. These are animals in which a particular gene is fully or partially turned off or is replaced by a human gene. An animal in which the gene is completely turned off corresponds to an animal in which the relevant protein is absent or non-functional. A heterozygous animal, which has inherited the intact gene from only one parent, corresponds to an animal in which the protein is partially blocked. If the gene for an enzyme or receptor is affected, the influence of an inhibitor or an antagonist can be simulated. The development and progress of a disease, or the influence of protein inhibition on a disease, can be observed in such an animal. In this way some assurance about the relevance of a therapeutic concept is established before an exceedingly long research and development process begins.

[Figure 12.3: bar chart; axis: Frequency (0 to 300); families shown: Protein kinases, Cadherins, GPCR, Fibronectin III, Homeobox, Spectrin, MHC I, Ion transport, Myosin, RRM, Trypsin-like, Laminin EGF, Ras, SH2]

Fig. 12.3 The composition of protein families that are particularly often associated with human diseases (GPCR: G-protein-coupled receptor; fibronectin: extracellular glycoproteins in tissue construction; homeobox: proteins that influence morphogenetic development; spectrin: cytoskeletal proteins; MHC I: major histocompatibility complex proteins that are involved in immune-recognition processes; myosin: motor protein in muscle control; RRM: RNA-recognition motif, transcription factors; trypsin-like: serine proteases; laminin EGF: a growth factor in the extracellular matrix; Ras: oncoprotein in tumorigenesis; SH2: protein domains in the phosphorylation signal cascade).

The increased
production of a particular protein can be induced by multiplying a gene. If the absence of a gene causes the overexpression of another gene, this will also become apparent. The gene product that is then produced in increased quantities can take over the function of the missing product. In such a case the planned therapeutic principle would only work if the function of the other gene product were also blocked. This question plays an important role in the inhibition of kinases (▶ Sect. 26.2).

In the so-called knock-out method, a very specific gene is turned off. The technique was developed in 1987 by Mario Capecchi at the University of Utah. The sequence of the gene to be turned off must be known. A structurally homologous gene that is not functional, for example because of the insertion of a stop signal, is generated. This gene is introduced into an animal, where it replaces the intact gene at exactly the same position. The process is called homologous recombination or gene targeting. Mice are particularly well suited because the technology to manipulate their embryonic stem cells is especially advanced. A foreign gene, for example a human gene, can also be introduced. Mice are also well suited for this because their genome and the human genome are surprisingly similar.

To generate a transgenic mouse, female mice are treated so that they produce a large number of egg cells. After fertilization, stem cells are extracted from the embryos at a very early stage, the blastocyst stage. They are cultured in vitro, and the desired gene is injected into the cells. This procedure works only in low yield. A technique was therefore developed with which transfected cells can be distinguished from non-transfected cells. For this, the gene that is to be transferred is coupled beforehand with a gene that confers resistance to the cell toxin neomycin. When the cells are treated with neomycin, only the transformed cells survive.
The transformed cells are combined with blastocysts of other mice, and the altered embryos are carried to term by surrogate mothers. The offspring of the surrogate mothers are chimeric, that is, they carry the genetic information from the donor as well as the acceptor mice. Mice with differently colored fur are chosen so that the transformed mice can be easily recognized by their spotted fur. Another method is to inject foreign DNA directly into an early embryonic stage. Disadvantages of the random incorporation of a gene are the possibility of destroying another gene, a lack of expression of the new gene, or multiple incorporations. Animals from the first litter are bred to generate both genetically mixed, heterozygous animals and genetically homogeneous, homozygous animals. Particularly sophisticated techniques even allow the new genes to be selectively turned on and off.

In this way transgenic animals are generated in which hereditary diseases, for instance, cystic fibrosis, Crohn’s disease, phenylketonuria, and others, can be studied. Relevant animal models now also exist for diseases that have different or multiple causes such as cancer, diabetes, rheumatoid arthritis, and Alzheimer’s disease. Since 1988, when the American Patent Office first granted a patent for a transgenically altered mouse, a controversy has raged as to whether a living creature can be patented at all. In the meantime there is a whole series of
patents for genetically engineered animals, including European and German patents, and the conflict about whether these patents are ethically or legally valid continues.

12.6 Recombinant Proteins for Molecular Test Systems

Early on, pure or enriched enzymes were available for in vitro tests, but only in those cases in which the material was easily obtained, for instance, human thrombin from blood. In other cases, animal material had to be used, with all the associated risks regarding its relevance for rational design (see ▶ Sect. 19.11). There are many proteins that cannot be isolated in adequate amounts or in a homogeneous form. Today, the sequence determination and the production of such proteins are simple. The unbelievably small amount of a few picomoles (1 pmol = 10^-12 mol) is enough to determine the primary structure of a short sequence. From the amino acid sequence determined in this way, the corresponding gene sequence can be reconstructed by back-translation with the genetic code. In doing so, it must be considered that multiple base triplets can stand for a particular amino acid (the so-called degenerate code, ▶ Sect. 32.7). A group of single-stranded oligonucleotides is therefore synthesized that could theoretically cover all coding possibilities for the original peptide segment. These molecules can be used to find a complementary sequence in a cDNA library. cDNA (complementary DNA) is the DNA complementary to the mRNA (messenger RNA). It is obtained from the mRNA, which contains only the sequence that is needed for the biosynthesis of proteins, by reverse transcription with a reverse transcriptase (▶ Sect. 32.5). Finally the gene is produced in larger quantities by using PCR techniques, and the amino acid sequence is determined via its base sequence simply because oligonucleotides are much easier to sequence. Next, the gene is brought into cells that are allowed to reproduce. In a few cases there can be difficulties with this step.
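The degeneracy of the genetic code mentioned above determines how many oligonucleotides are needed to cover a peptide segment. The following sketch uses the standard genetic code to count and enumerate the DNA sequences that could encode a short peptide. The function names and the example peptides are hypothetical and serve only as illustration; the sketch lists coding-strand sequences and does not model probe design.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid (one-letter code) to all DNA codons that encode it.
CODONS = {}
for (b1, b2, b3), aa in zip(product(BASES, repeat=3), AMINO_ACIDS):
    CODONS.setdefault(aa, []).append(b1 + b2 + b3)


def pool_size(peptide):
    """Number of distinct DNA sequences that encode the given peptide."""
    size = 1
    for aa in peptide:
        size *= len(CODONS[aa])
    return size


def degenerate_oligos(peptide):
    """Enumerate every DNA sequence encoding the peptide (short peptides only)."""
    return ["".join(codons) for codons in product(*(CODONS[aa] for aa in peptide))]


print(pool_size("MKW"), degenerate_oligos("MKW"))
print(pool_size("LSR"))  # leucine, serine, arginine: six codons each
```

Because the pool size is the product of the codon counts, peptide segments rich in methionine and tryptophan (one codon each) require far fewer oligonucleotides than segments rich in leucine, serine, or arginine (six codons each).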
In bacteria, such as the intestinal bacterium Escherichia coli, or in yeast cells, only soluble proteins can be produced. Some proteins accumulate in inclusion bodies. They must be extracted, dissolved, and refolded under specific conditions. The gene segment for a small protein is often coupled with the information for another protein, and both are then expressed together. The large protein conjugate that forms in the cell is better protected from metabolic degradation than small proteins. During workup, the non-essential part is cleaved from the protein conjugate. Problems can arise if the protein does not fold correctly, or if multiple chains (as in insulin) must be linked by disulfide bridges. Larger proteins that must be furnished with sugar groups to accomplish their function (glycosylation) must be produced in cells from higher organisms, for example in mammalian cells. The manufacture of complex proteins in insect cells has become particularly attractive. These cells are infected with a so-called baculovirus, into whose genome the desired information has been incorporated. The virus codes for the protein, and the insect cells provide the production and subsequent glycosylation machinery. Not only enzymes, but also receptors, ion channels, and entire signaling cascades can be produced in cells in this way.
12.7 Silencing Genes by RNA Interference

Section 12.5 described how intervening in the germline yields genetically altered organisms in which a particular gene, and therefore its gene product, is absent, defective, or replaced. The function of particular genes for an organism can be studied in this way. The consequences for the organism of, for example, blocking a particular gene product become apparent before a potent active substance is developed. In the late 1990s, another technique was discovered that allows genes to be silenced without intervening in the genetic material of an organism. This work was carried out by Andrew Fire and Craig Mello. For their accomplishments, which are only gradually being validated, they were awarded the Nobel Prize in 2006.

Genes are archived in DNA. For gene expression the coding part of the genome is transcribed into mRNA. Based on this copied information, the ribosome translates the base sequence into a peptide sequence. In the early 1980s the idea emerged to trap the transcribed information on the single-stranded mRNA by adding an inversely arranged complementary RNA strand, the so-called antisense strand. The two strands can hybridize, that is, they can bind to form a matching double strand. Double-stranded RNA is the result. This antisense principle (▶ Sect. 32.4) did not deliver the hoped-for breakthrough. Genes were only partially or weakly suppressed; moreover, even the addition of a normal (sense) RNA strand could achieve suppression. Fire and Mello suspected that neither the normal nor the antisense strand caused the gene blockade, but rather the double-stranded form that had been added inadvertently as an impurity. Renewed experiments confirmed the assumption. Interestingly, even small amounts of double-stranded RNA are enough to take many mRNA molecules out of action. When antisense strands are used, on the other hand, stoichiometric amounts are necessary.
This also shows that short, ca. 20-nucleotide-long double-stranded RNA fragments are enough to silence an entire mRNA sequence. Fire and Mello named the phenomenon RNA interference.

What had happened? An enzyme named Dicer cleaves the double-stranded RNA into 21–23-base-long pieces that then cause the blockade. For this, the double-stranded RNA pieces are incorporated into an enzyme complex called RISC (RNA-induced silencing complex) and separated into single strands. One strand leaves the complex while the other remains there to act as a template to capture mRNA molecules. The sequence of the retained strand allows RISC to recognize all mRNAs with a complementary base sequence and to cleave them one after another. Finally, they are digested by enzymes in the cytoplasm. The cell thus selectively eliminates only the mRNAs that contain the sequence pattern complementary to the short RNA strands in RISC. In practice, this gene blockade has proven to be simpler and more reliable than the antisense technique. RNA interference even allows newly discovered genes to be systematically blocked so that conclusions can be drawn about the resulting consequences for the organism. RNA interference serves not only analytical purposes. There are already biotech companies that want to turn off disease-causing genes with small RNA fragments.
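The recognition step described above, in which the strand retained in RISC captures mRNAs with a complementary base sequence, can be sketched as a string search. This is a deliberately simplified model that demands perfect complementarity over the whole guide strand; the sequences are hypothetical, and real target recognition tolerates some mismatches.

```python
# Watson-Crick pairing for RNA bases.
COMPLEMENT = str.maketrans("AUGC", "UACG")


def reverse_complement(rna):
    """Return the strand that pairs antiparallel with the given RNA."""
    return rna.translate(COMPLEMENT)[::-1]


def target_sites(guide, mrna):
    """0-based positions where the mRNA is perfectly complementary to the guide."""
    site = reverse_complement(guide)
    positions = []
    start = mrna.find(site)
    while start != -1:
        positions.append(start)
        start = mrna.find(site, start + 1)
    return positions


# Hypothetical 21-nucleotide target segment embedded in a short mRNA.
target = "GCUAGCUAGGAUCCAUGCAAU"
guide = reverse_complement(target)        # strand retained in the complex
mrna_with_site = "AUGGC" + target + "CCUUAG"
mrna_without_site = "AUGAAACCCGGGUUUAAACCCGGGUAA"

print(target_sites(guide, mrna_with_site))
print(target_sites(guide, mrna_without_site))
```

Only the first mRNA carries the complementary pattern and would be marked for cleavage in this model, which mirrors the selectivity described in the text.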
There is another, bigger problem though: how is a 22-base RNA molecule to be transported into the cell to the place where it should act? Strongly charged molecules cannot cross the cell membrane. A special delivery system is needed for this task. Intensive research is taking place on the development of such systems, but the problem is a long way from being solved. A reliable and highly efficient system that can selectively transfer such polar and nuclease-sensitive molecules into the cell interior is likely to open totally new and presently unforeseeable perspectives for the therapy of disease.

The goal is to construct delivery systems that can pack the fragile and polar freight of RNA molecules and dock onto the cell. There, the coat of these carriers must merge with the cell membrane or selectively penetrate it to arrive in the interior of the cell. One concept pursues the packaging and compartmentalization of RNA in polymers such as polyethylenimine. The positive charges on the polymer backbone can bind and encapsulate a negatively charged polymer such as RNA or DNA. Other systems try to make the RNA or DNA molecules bioavailable for the cell by encapsulating them in a membrane-like coat. This packaging in liposomes leads to a selective adhesion of the artificial vesicle to the membrane of the target cell, and the liposome then fuses with the target cell in an endocytosis-like process.

A further problem is the danger that small, silencing RNA molecules (siRNAs) could cause an immune response. A solution to this dilemma is the chemical modification of the siRNAs. The RNA molecules are modified so that they can still hybridize optimally to the addressed segment in the mRNA, but have better properties in terms of transport, immunogenicity, and stability. For this, the OH groups of the ribose building blocks of the nucleotides have been exchanged for fluorine, methoxy, or hydrogen.
siRNA research is certainly still in its infancy. The potential of the method seems impressive, as it uses principles that Nature applies for gene regulation. As previously described, our genome contains genes that code for microRNAs and that show sequence complementarity over long stretches. Structurally, they exist as double strands. They are cut to size by the Dicer protein and can serve to interfere with RNA, which provides an alternative means of gene regulation. For a broad therapeutic application of externally administered RNA fragments, important prerequisites certainly remain to be fulfilled, such as transport into the host cell and the prevention of an immune response. Currently the technology serves the construction of model organisms to study the consequences of turning off genes. Nonetheless, the validation of the method for use under in vivo conditions has long since begun.

12.8 Proteomics and Metabolomics

The approaches that were described in Sects. 12.5 and 12.7 pursue the goal of turning off a disease-causing gene, or a gene that plays a role in a disease. But how is it to be recognized whether a particular gene or gene product is involved in a disease process at all? Decisive indicators to answer this question can be extracted
from the protein composition in the cell. This composition changes dynamically. It is termed the proteome and reflects the totality of all proteins in a cell, or indeed in the entire organism, at a given time under exactly defined conditions. If we concentrate on the protein pattern of a cell from a particular organ, important variables are the metabolic state, the developmental stage of the organism, the time point in the cell cycle, or the surrounding temperature. Disease processes and pharmaceutical therapy also change this pattern. In the genome, all theoretically expressible proteins are coded as static hereditary information. In contrast, the proteome reflects the protein composition at a particular time point. The difference between a butterfly in its caterpillar and adult phases serves as an impressive example of the difference between the genome and the proteome. The genome is the same for both, but the proteome is significantly changed, which is expressed in the form of a completely different phenotype. With regard to disease processes and their pharmaceutical therapy, the proteome can be used to compare the state of cells that are healthy, diseased, or under the influence of a drug therapy.

Initially this seems like an extremely complex, barely solvable task. A cell contains thousands of proteins, many of which are modified after their expression. For example, the first amino acids in a sequence are cleaved off (▶ Sect. 25.9), phosphate groups are transferred (▶ Sect. 26.3), sugar building blocks are attached, disulfide bridges are formed, prosthetic groups are added, and ubiquitin or prenyl groups are attached (▶ Sect. 26.10). In addition, alternative RNA splicing occurs, which serves as a mechanism of gene regulation and further increases the diversity of the proteome on the basis of a comparatively small number of genes.
All of this dramatically increases the diversity of the protein composition, probably by a factor of 5–10 compared to the genome composition. Nevertheless, a sophisticated analytical method has been developed with which it is possible to analyze the proteome of a cell at a particular time point. First a cell must be denatured in such a way that all modifying processes are abruptly stopped so that conclusions can be drawn about the cell contents. The cell lysate is then subjected to separation. Proteins contain many acidic and basic amino acids, so that an exactly defined pH value exists for each protein at which protonation and deprotonation balance and the protein appears overall electrically neutral. This pH value is specific for each protein and depends on the amino acid composition (isoelectric point). The protein mixture is applied to a solid support (a polyacrylamide gel) as would typically be used for chromatography purposes. Then a voltage is applied. If the proteins carry a charge, they migrate over this solid support in the direction of the oppositely charged pole. The gel is constructed in such a way that a continuous pH gradient is established from one end to the other. At some point in their migration over the gel, the applied proteins therefore reach a position where they appear to be uncharged overall. Once this position is reached on the solid support, the proteins no longer migrate. With this so-called isoelectric focusing, proteins are separated according to their isoelectric point. All proteins with the same isoelectric point migrate the same distance and occur as a mixture. Then the chromatography plate is turned by 90°, and the proteins are separated again, now according to a different principle. For this, the
proteins are thermally denatured and their charges are masked with sodium dodecyl sulfate, a strongly charged anionic surfactant, so that all are virtually equally charged on the exterior (SDS-PAGE). The denatured proteins migrate again upon application of an electrical field. Now, however, the migration speed depends on the mass of the proteins. Because the direction of this migration is perpendicular to the first, isoelectric separation, the originally applied proteome is in the end broadly distributed and well separated on the solid support. By using this 2D electrophoresis, it is possible to separate many thousands of proteins.

The quantity and sequence of the separated proteins must then be characterized. Many different staining and fluorescence techniques have been developed for the quantitative analysis. They allow quantitative determinations, especially in comparison to the proteomes of analogous cells that are in a different state. This is how the protein composition in a diseased and a healthy state is compared quantitatively. How the proteome changes under the influence of a drug (Fig. 12.4) can also be determined.

But how does one recognize what is hidden in each individual protein spot on such a 2D gel? For this the proteins are extracted from the plate and digested with trypsin. This protease (▶ Sect. 23.3) cleaves the denatured proteins into small peptide fragments, which are finally analyzed by mass spectrometry. Sophisticated technologies, together with computer analyses of precalculated fragmentation patterns of proteins, allow the proteins to be reconstructed and characterized with regard to their sequence. Proteins in the proteome that have been either up- or down-regulated due to a disease process can be detected in this way. Whether the altered expression pattern causes the pathological state or is a consequence of it is, however, a question that remains to be answered by independent experiments.
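The identification step described above, digestion with trypsin followed by mass spectrometry and comparison with precalculated fragment patterns, can be sketched as an in silico digest. The cleavage rule used here (after lysine or arginine, but not before proline) is a common simplification and is an assumption of this sketch; the residue masses are standard monoisotopic values rounded to five decimals, and the protein sequence is hypothetical. The sketch requires Python 3.7 or later, because it relies on `re.split` splitting at empty matches.

```python
import re

# Monoisotopic residue masses in Da (standard values, rounded).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini


def tryptic_digest(protein):
    """Cleave after K or R unless the next residue is P (simplified rule)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified, uncharged peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


# Hypothetical protein sequence; the sorted masses form its "fingerprint".
protein = "MKWVTFISRGVFRPDTHKSEIAHR"
fingerprint = sorted(
    (round(peptide_mass(p), 4), p) for p in tryptic_digest(protein)
)
for mass, peptide in fingerprint:
    print(f"{mass:10.4f}  {peptide}")
```

In a database search, a list of measured peptide masses would be compared against such precalculated lists for every known protein sequence. Posttranslational modifications and missed cleavages, which this sketch ignores, complicate the comparison in practice.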
As described, the proteome of a cell can change upon therapy with a drug. What are the interaction partners for a given drug?

[Figure 12.4: three schematic 2D gels, panels (a), (b), and (c), each with protein spots numbered 1 to 10]

Fig. 12.4 2D-Gel electrophoresis for cellular proteome analysis: (a) Proteome of a normal cell. (b) Proteome of a pathologically altered cell. (c) Proteome of a pathologically altered cell after treatment with a drug. Changes in the protein concentration are indicated by red circles. Above all, the proteins at positions 3, 6, and 7 are clearly up-regulated in the diseased state. A few of the pathological changes are corrected by the drug therapy, but new changes in the proteome (e.g., 2, 8, and 10) might be induced by side effects (Figure taken from Lottspeich F (1999) Angew Chem Int Ed Engl 38:2476–2492).

Are the induced effects always the
same if drugs from the same compound class are used? The properties of three kinase inhibitors that were developed for the treatment of chronic myeloid leukemia (▶ Sect. 26.5) were investigated in detail in the research group of Giulio Superti-Furga in Vienna, Austria. For this, each drug first had to be equipped with a chemically inert anchor group. It is quite a challenge to find the correct position to place an anchor on such an active substance so that the mode of action is not significantly perturbed. As a rule, multiple positions along the molecular scaffold must be tried for this purpose. The drug is then irreversibly and covalently coupled via the attached anchor group to a chromatography column. Once the column is equipped with these “baits,” the proteome from the lysate of a cell is added to it. Proteins that have affinity to the immobilized drug stick to the column. Finally, the binding partners that were captured in this pull-down experiment must be released from the column, separated, and characterized analogously to the above-described technique. The composition of all proteins that have affinity to the active substance is obtained. It is difficult to draw quantitative conclusions about the affinity of the binding partners at first, above all because the protein quantities and their composition in the lysate are highly variable. It is possible, however, to construct a profile for each active substance according to its protein interaction partners. This led to the surprising result that even drugs that belong to the same or similar substance classes and were developed for the same therapeutic indication can display significantly different interaction profiles in the cell. This is an impressive observation, the evaluation and application of which will require great research effort.
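One simple way to compare such qualitative interaction profiles is to treat each profile as a set of captured proteins and to measure the overlap between two sets. The Python sketch below uses the Jaccard index for this purpose. It is an illustrative assumption, not the evaluation used in the study described above, and the drug and protein names are placeholders.

```python
def jaccard(profile_a, profile_b):
    """Overlap of two interaction profiles: shared partners / all partners."""
    union = profile_a | profile_b
    if not union:
        return 1.0  # two empty profiles are treated as identical
    return len(profile_a & profile_b) / len(union)


# Placeholder profiles: proteins retained on the column for each immobilized drug
profiles = {
    "drug_1": {"P1", "P2", "P3"},
    "drug_2": {"P2", "P3", "P4"},
}
print(jaccard(profiles["drug_1"], profiles["drug_2"]))
```

Because the pull-down yields no reliable affinities, a set-based measure that ignores quantities is a natural first comparison. A value well below 1 for two drugs of the same class would reflect the differing profiles mentioned in the text.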
We will see in the next section that the different efficacy, therapeutic deviations, and variable side-effect profiles in patients can be explained by this.

Proteome analytical techniques (proteomics) can also be used in clinical diagnostics. Without exactly resolving the analyte, significant changes can be recognized in the form of a mass fingerprint. Tumor diseases are revealed by changes in their protein composition. These can be recognized at a very early point, which should hopefully still allow a curative treatment of the tumor.

Another technique that is analogous to proteomics is the analysis of the metabolites that are produced in an organism. The term metabolome comprises all metabolites (e.g., metabolic degradation products) that are present at a specific time point. The techniques of metabolomics try to quantify the metabolite composition and to draw conclusions about the condition of the cell based on this information. This is particularly valid when the cell is exposed to foreign substances. If the metabolite profile at a particular time point is studied, especially in pathophysiologically or genetically changed conditions, the term metabonomics is used. The goal of this technique is to draw conclusions about the molecular composition in cells from body fluids such as urine, serum, or cerebrospinal fluid. This can lead to an improved and more sophisticated diagnostic procedure, and therefore to an easier early detection of diseases. These techniques also serve to characterize proteins for drug therapy or to analyze the broader influence of an event in a cell that is being treated with a drug. The hope remains that these techniques will allow for a better understanding of the total effects of the use of pharmaceuticals, and will finally achieve a higher safety standard for therapy.
12.9 Expression Patterns on a Chip: Microarray Technology

Thousands of molecules are found in the analysis of the genome, transcriptome, proteome, or metabolome, and their occurrence must be characterized. This flood of data requires immense measurement capacity. Therefore, the development of microarray technology was initiated in the late 1980s. Thousands of molecules that are to be analyzed in parallel in an automated fashion are attached to a support that is only a few centimeters large and is made of glass, silicon, gold, or nylon (Fig. 12.5). Only very small quantities of the biomolecules are needed. In the meantime, this technique has achieved a maturity that allows its use in routine analytical procedures. In addition to the appropriate preparation of the surface, it is the art of reliable and standardized immobilization of the molecules needed for the precise analysis that guarantees the success of the method. In addition to proteins and protein domains, antibodies, antigens, and especially DNA, oligonucleotides, and RNA can be immobilized. Proteins are often anchored by co-expressing the protein of interest coupled to an anchoring protein such as streptavidin, as a so-called fusion protein. The streptavidin anchor is attached to the surface via biotin. Chemistry with thiol groups is also used: the coupling to the surface, which was previously equipped with appropriate reactive groups, is accomplished with disulfide bridges. Other strategies use amino groups, for example of lysine, which are then coupled to a reactive aldehyde group on the solid support material.

To test the composition of an analyte, a soluble mixture is added to a premanufactured chip. If binding partners are found in this process, the components from the analyte solution remain adhered to the surface. Such binding must be simple to detect on the chip in a spatially resolved manner. Initially, stain and fluorescence were the method of choice (Fig. 12.5).
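A spatially resolved fluorescence readout can be pictured as a per-spot decision. The Python sketch below assumes a two-color experiment in which one sample carries a green label and the other a red label, and it classifies each spot by which channels exceed a background level. The threshold, the intensity values, and the function name are hypothetical; real analyses normalize the channels and work with intensity ratios rather than a single cut-off.

```python
def classify_spot(green, red, background=100.0):
    """Classify one microarray spot from its two channel intensities.

    `background` is an assumed, purely illustrative detection threshold.
    """
    green_on = green > background
    red_on = red > background
    if green_on and red_on:
        return "yellow"  # both labeled samples bound at this spot
    if green_on:
        return "green"   # only the green-labeled sample bound
    if red_on:
        return "red"     # only the red-labeled sample bound
    return "dark"        # neither sample bound


# Invented intensities for four spots: (green, red)
spots = [(500.0, 450.0), (500.0, 20.0), (10.0, 300.0), (10.0, 20.0)]
print([classify_spot(g, r) for g, r in spots])
```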
Fluorescence stains, for example, green and red stains, are used for this because they can be excited and detected easily and in a spatially resolved way. If mixed signals resulting from a simultaneous red and green fluorescence occur, a yellow signal is obtained. In the meantime, surface plasmon resonance (▶ Sect. 7.7) has achieved greater significance as an alternative technique for the detection of binding. Moreover, techniques that function similarly to ELISA methods are also used (▶ Sect. 7.3).

Frequently, microarrays are used to analyze the expression pattern of biological systems. For this, the transcriptome of a cell is investigated under different conditions, for example in a diseased and healthy state. The first molecules to be successfully anchored onto chips were single-stranded DNA oligonucleotides. To study the coding mRNA of a cell in a particular state, these molecules are transcribed into a complementary DNA segment, the so-called cDNA, by using reverse transcriptase (Fig. 12.5). These cDNA molecules, or the fragmented sections of cDNA that are obtained, are immobilized on a chip and separated into single strands. The cell lysate with the single-stranded mRNA (transcriptome analysis), or the cDNA that was prepared from it, is added to such a chip, and the complementary mRNA strand hybridizes with the oligonucleotide fragments that are anchored there. It is important in this process that the samples to be analyzed are
equipped with different fluorescence dyes according to their origin. For example, the mRNA from a healthy cell is labeled green and that from a diseased cell is labeled red. After the hybridization on the chip, there will be areas that fluoresce green, red, or yellow upon excitation, and others that remain without fluorescence. Areas that glow yellow under the fluorescent light indicate that mRNA molecules from healthy as well as diseased tissue have been bound. Evidently the mRNA that binds there is available equally in the diseased and healthy states. Areas where no fluorescence is seen indicate that neither healthy nor diseased cells produced mRNA that bound there.

Fig. 12.5 Manufacture and testing of an expression pattern with microarray technology. Individual gene segments from an organism are cut out and amplified by using PCR (above left). Next they are immobilized on a microchip support as single-stranded oligonucleotides (below left). In addition to the isolated and amplified DNA, synthetically manufactured DNA building blocks or cDNA molecules, which are obtained from reverse transcription, can also be brought onto the support. One sort of bait molecule is at each point on the support. RNA molecules are isolated from the cells of healthy (green) and diseased tissue (red), translated into mRNA, and reverse-transcribed into cDNA. The cDNA is provided with a colored fluorescence marker. Then the test molecule is added in a single-stranded form to the microarray plate, and if it is complementary, a hybridization (below middle) results. Finally the binding is analyzed under fluorescent light (below right). Yellow areas indicate that mRNA molecules from the healthy as well as the diseased cells have bound. The mRNA that binds there is expressed in healthy as well as diseased states. Areas that remain dark indicate that the mRNA is up-regulated neither in a healthy nor in a diseased state. Areas that are either only green or only red fluorescent indicate a difference in the expression pattern between cells from healthy and diseased tissue.

Areas that fluoresce either green or red are interesting
because they indicate differences in the expression pattern between healthy and diseased cells. In this way, gene products can be discovered that are involved in a disease process. If a misregulation is present, an attempt can be made to correct this state with a pharmaceutical therapy.

12.10 SNPs and Polymorphism: What Makes Us Different

What makes a single organism of a particular species different and leads to the enriching diversity of a population? We speak of the human genome, but many interesting deviations must be present so that we all look different and have different features. Polymorphisms, that is, variations in the composition of the genome, cause the observed diversity and form the different phenotypes of a species. The most obvious phenotypic difference is the division into male and female individuals. Of course this is not the only difference that we recognize for the human species. Many sequence variations occur within a population at the genome level. If they occur in more than 1% of the population, they are referred to as different alleles; otherwise they are attributed to mutations that have not yet become established by evolution. Genetic polymorphisms are, for instance, observed as insertions or deletions in which at least one nucleotide has been either incorporated or lost. However, single nucleotide exchanges are the most common sequence variation. Here the term SNPs (spoken “snips”) is used, which is an abbreviation of single nucleotide polymorphisms. Compared to the entire genome, polymorphisms encompass only a very small portion. They are estimated to be about 0.1% of the entire genome, so about three million bases. Of these, SNPs make up the overwhelming portion with a share of about 90%. Therefore the largest part of our genome is identical over the entire human species, even though enormous diversity in the phenotype is observed between us.
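Whether a single nucleotide exchange in a protein-coding triplet alters the encoded amino acid can be checked against the standard genetic code. The following Python sketch is only an illustration: it builds the standard codon table for DNA triplets and labels an exchange as synonymous (same amino acid), nonsynonymous (different amino acid), or nonsense (a stop signal, written `*`). The function name and the example triplets are chosen freely for this sketch.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each triplet position
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_exchange(codon, position, new_base):
    """Classify a single-base exchange at `position` (0, 1, or 2) of a codon."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    old_aa, new_aa = CODON_TABLE[codon], CODON_TABLE[mutated]
    if old_aa == new_aa:
        kind = "synonymous"
    elif new_aa == "*":
        kind = "nonsense"
    else:
        kind = "nonsynonymous"
    return kind, old_aa, new_aa


print(classify_exchange("GAG", 2, "A"))  # third-position exchange
print(classify_exchange("GAG", 1, "T"))  # second-position exchange
```

The table shows why many exchanges stay silent: 64 triplets encode only 20 amino acids plus stop signals, so several triplets necessarily share a meaning.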
Within the SNPs, coding and non-coding changes are differentiated according to whether the observed exchanges are translated into proteins or not. In the coding regions of the genome, the exchange of a single nucleotide can lead to an altered protein sequence. In ▶ Sect. 32.7 the translation of a base triplet into a protein sequence is introduced. If a base in a coding triplet is changed, the triplet is either translated into the same amino acid or leads to the incorporation of a different amino acid. This is related to the fact that sometimes multiple triplets code for the same amino acid. The incorporation of a different amino acid into a protein can change its properties. For example, the amino acid composition of a glycosyltransferase is decisive for the blood group that we have. An example is introduced in ▶ Sect. 29.7 of how an altered incorporation of a few amino acids in a G protein-coupled receptor can exert an influence on our sense of smell. Humans carry different alleles that determine their ability to smell different intensities and qualities. However, not only SNPs in coding regions lead to differences in our species. SNPs in noncoding segments of the genome can lead to changes in gene regulation.

In the context of drug research and therapy, SNPs can also be relevant where they have no immediate effect on the phenotype. It is assumed that some SNPs confer
susceptibility to diseases or influence the cellular response to a drug. It must be considered at this point that SNPs can also occur in the region of the binding site of a drug molecule, which is not necessarily identical with that of the natural substrate. They then exert a direct influence on the affinity and the binding profile of the active substance. As a result, an active substance can exert a stronger or weaker inhibition of protein function in patients with a given SNP than it would in patients in whom this SNP is not present.

12.11 The Personal Genome: Access to an Individual Therapy?

Genome sequencing and the analysis of SNPs and polymorphisms have impressively uncovered the source of disease predisposition, and why drugs have attenuated tolerability and different side-effect profiles. They have offered an explanation for why undesirably high variations in the efficacy of drugs can occur in different patients. All the more reason to ask whether the sequencing of the individual genome of each person would provide options for a tailored, personalized therapy. It is in no way a utopian idea that in a few years the full sequencing of each individual person will be possible at manageable prices and within an acceptable time frame. It has long been known in medicine that the blood groups of donor and recipient must match for blood transfusions. A genome analysis would make the search for a matching donor organ easier for transplantations. A particularly high density of SNPs has been discovered in the genome in regions coding for proteins that present antigens on their surface to stimulate an immune response (▶ Sects. 31.7 and ▶ 32.2). An SNP analysis of each individual could indicate the probability of developing a particular disease. Here, early detection of this risk and a possible lifestyle modification could be better than any therapy. Already today high-resolution DNA chips (Sect. 
12.9) allow the simultaneous determination of more than 500,000 genetic SNP markers. Discovered SNPs can indicate an elevated disposition for, for instance, the development of Alzheimer’s disease in old age. A simple screening of the individual DNA sequence would allow a predisposition for a particular disease pattern to be recognized. Craig Venter, who determined the human genome in his company by the shotgun method, had his own genome analyzed and published. From the gene analyses of these data, a tendency for obesity and cardiovascular disease was identified. His own father died of a heart attack at 59 years of age. Based on this analysis, Venter decided to take a lipid-lowering agent from the statin class preventatively. A doctor could simply read from a personal genome whether the patient displays an SNP pattern that would lead one to expect an intolerance for a particular drug therapy. Moreover, the doctor could see which metabolizer category (▶ Sect. 27.7) the patient belongs to. This could reduce intolerance upon simultaneous treatment with multiple drugs, and would allow a safe adjustment of individual dosing. It can also help to choose the right drug for a therapy, particularly if multiple drugs are available for one indication.
The dream of developing “personalized medicines” for individual therapy will be difficult to realize for cost reasons. Just the addition of one more methyl group in a drug requires a full toxicological and pharmacological testing program to achieve approval. It would devour millions in development costs. As always, the determination of the individual genome and the elucidation of all imaginable predispositions for possible diseases also has its downside. In the hands of the treating physician, this information is a blessing. But what would a future employer read from these data about the prospect of hiring an employee? Insurance companies could accept only risk-free clients based on their genomic data. It is a chilling idea that the individual genomic composition would decide an insurance premium!

With all the attention paid to our genetic differences and their imaginable consequences for drug therapy, it must not be forgotten that our gastrointestinal tract is home to millions of microorganisms. This flora exerts a decisive influence on our wellbeing, our health stability, our metabolism, and also on our response to drug therapy. The individual gastrointestinal flora begins to build up at birth, and is influenced in critical measure by the mother. It varies considerably with lifestyle, food culture, and exposure to the regional microorganism landscape. In India, China, or Europe a different microbe culture is found than in, for instance, America. Interestingly, it changes if a person moves from one continent to another. Other microorganisms cause a different configuration of secondary metabolites and contribute to a displaced health equilibrium. Presumably these differences between individuals are just as important as the genetic diversity that makes us different.

12.12 When Genetic Difference Becomes Disease

Genetic diseases have a molecular origin. A gene is altered (allele), sometimes the two genes originating from both parents.
Each of us carries a large number of such altered genes, which are a result of arbitrary base exchanges: the SNPs. The principle of evolution is based on these random mutations. If a mutation improves the adaptability of an individual to the environment, the chances of survival and reproduction increase. Those genes are then reproduced with increased probability. So-called horizontal gene transfer exerts an accelerating effect on evolution in asexually reproducing species. There, entire DNA fragments are exchanged between individuals or even species. In sexual reproduction, crossover plays an important role in this sense. In this case, neighboring gene sequences of both parents arbitrarily cross over and make new couplings. Without mutations and crossover, all species would remain absolutely constant.

As a mechanism of evolution, many errors are produced in individual cases. Some of these errors are the cause of genetic disease. In sickle cell anemia a single amino acid is exchanged in hemoglobin, which gives blood its red color: a glutamic acid in position 6 of the β chain of hemoglobin A (HbA) is replaced by a valine. The altered hemoglobin aggregates: it “sticks” together in the red blood cells. The cells collapse and take on a characteristic sickle form. Homozygous carriers, that is, individuals in whom the “sick” gene is inherited from the father and the mother, are not able to survive.
Heterozygous carriers who carry one “sick” and one “healthy” gene produce normal and altered hemoglobin alongside one another. These people indeed have a shorter life expectancy, but usually achieve reproductive maturity. In areas in which malaria is endemic, there is a selection pressure in favor of the genetic disease. Heterozygous carriers of sickle cell anemia are more resistant to malaria than healthy people (▶ Sect. 3.2). Here we are witnesses to Nature’s great experiment. How will it end? People intervene, too. If malaria is successfully treated, wild-type HbA carriers are no longer disadvantaged, and the evolutionary advantage of sickle cell anemia and the consequent selection pressure in the direction of this disease disappear. This genetic disease could become “extinct” after a few generations. On the other hand, if sickle cell anemia is treated either conventionally or by gene therapy, then these people would have entirely normal, “healthy” red cells. The malaria pathogen could reproduce well in them again. The protection from this disease would disappear, and the susceptibility of these people to malaria would rise to a normal risk level.

In addition to sickle cell anemia, around four thousand other diseases and their molecular causes are known. Some, for example cystic fibrosis, phenylketonuria, and inherited coagulopathies, occur relatively frequently. Many others are rare and have sometimes been described only once. In recent years a multifactorial genetic cause has been established for an increasing number of diseases, for example for diabetes, rheumatoid arthritis, some cancers, asthma, and Alzheimer’s disease. The occurrence of these diseases is brought about by the coincidence of multiple genetic alterations, or is at least fostered by them.

The mechanisms of evolution are also responsible for the development of resistances (▶ Sect. 4.8).
Here, the selection pressure is exerted by a drug or an insecticide (e.g., to exterminate malaria-carrying mosquitoes). Because of their rapid reproduction, bacteria and viruses adapt quickly to a “hostile” environment. The true masters are the retroviruses, which can develop resistance particularly quickly because of their high mutation rates, and can therefore annihilate the success of a drug with one stroke (▶ Sect. 24.5).

12.13 Epigenetics: Lifestyle and Environment Influence Gene Activity as a Pen Would Make a Mark in the Book of Life

For the development of an organism, it is not only the kind of hereditary information stored in the DNA that can be translated into gene products that is critical; it is just as important that particular genes are only read in particular cells at particular times. Even social factors and the environment influence the genes and change their behavior. Scientists observed the following example with zebra finches. If a male zebra finch hears the song of another male, the gene EGR-1 is read more strongly. The unknown song of a potential rival leads to a much stronger activity of EGR-1 than background bird song that the finch has already heard. EGR-1 is itself a key gene in gene regulation, so that a change in the social surroundings of the finch leads to many shifts in the protein expression pattern of
the bird. This response helps the bird to adapt to the new situation because the intrusion of a potential competitor into his own territory can be of essential importance to him.

Pluripotent embryonal stem cells can differentiate into very different cell types. For example, liver, brain, and muscle cells have the same chromosome set, yet they are fundamentally different in their function. Many different phenotypes arise from the identical genotype. This is true for the different cell types of an organism at the same time as well as for different time-staged developmental steps in an organism. Research on twins has produced remarkable results in this regard. Comparative studies on identical twins, who are genetically identical, show that with increasing age, and above all with different lifestyles, progressively larger differences in the phenotype occur. There must therefore be mechanisms that lead to changes in the phenotype that are passed along without changes in the genotype. They regulate the transcription process and pass along this property to daughter cells. This process is summarized under the term epigenetics. It creates an additional level of information that regulates the reading of the genes from the DNA. The surroundings exert their effect on the genes through the epigenome. Upbringing, childhood experiences, the effects of chemicals or intoxicants, and stress are all epigenetic regulatory influences by which gene activity is temporarily or even permanently changed.

As the following example of the Agouti mice shows, such information can even be passed along to subsequent generations. Normally, these rodents are small, brown, thin, and very agile animals. Their genome contains the so-called Agouti gene, which after activation causes the animals to become ill: their coat turns yellow, and they become ravenous and fat.
The offspring of these ill mice are colored in exactly the same way and are just as frail as their parents. The American molecular biologist Randy Jirtle at Duke University in Durham, NC, fed pregnant Agouti females a special diet that was rich in dietary supplements such as vitamin B12, folic acid, choline, and betaine. As a result, the majority of the offspring of these females were brown, thin, and in the best of health. The Agouti gene was turned off by the enriched diet, without requiring any changes to the genome sequence of the rodents.

On the molecular level, it is in particular methylation and acetylation that transmit the additional epigenetic information. In contrast to genetic changes that cause mutations in the translated gene products, epigenetic changes have a strong dynamic component and are, above all, reversible. In the stretched-out state, there are more than two meters of DNA in the cell; this is wound into a highly compact form onto small basic proteins: the histones. Lined up like pearls along a string, they collectively make up the chromatin, which forms the chromosomes in its maximally packed form. Histones are the most strongly conserved proteins in existence; for example, the 102-residue histone protein H4 from the pea and from the cow differ in only two positions.

As one option, epigenetic changes modify the DNA itself: methyl groups are transferred to cytosine by methyltransferases (see ▶ Sect. 26.9) to give 5-methylcytosine. The base pairing with guanine in the DNA is not affected by
this modification, and the genetic code remains unchanged. If a methylation occurs in a promoter region of the DNA, this leads to a silencing of the corresponding gene. The methylation makes the DNA inaccessible to the reading apparatus, which is somewhat similar to password-protected computer data. If the promoters in these gene segments are demethylated again by demethylases, the translation into the corresponding protein is possible once more.

As a second epigenetic change, histone proteins can be modified. Methyl, acetyl, and phosphate groups can be enzymatically transferred to lysine and arginine residues of these basic proteins with, for example, histone acetyltransferases (HATs). The added acetyl groups neutralize the positive charge on the Lys and Arg residues (the so-called “histone tails”). They can no longer interact as efficiently with the negatively charged phosphate groups of DNA. Added phosphate groups have an even more repulsive effect. These changes lead to less densely packed chromatin, which makes the reading of the DNA in particular regions easier. Transcription and gene expression are regulated in this way. Conversely, the cleavage of acetyl groups by histone deacetylases (HDACs) or the methylation of the Lys and Arg residues of the histone causes the packing density of the chromatin to increase, and this diminishes the probability that the DNA is read in the affected areas.

Misregulation of the described enzymes is associated with the development of diverse cancers. Because epigenetic processes are fundamentally reversible, there is a chance that a drug therapy could intervene in the misregulated function of these transferases. For this reason, intensive research efforts are underway for inhibitors of different methyltransferases and histone deacetylases, the latter of which are mechanistically comparable to metalloproteinases (▶ Chap. 25, “Inhibitors of Hydrolyzing Metalloenzymes”).
The hope remains that these inhibitors can suppress disease-causing epigenetic changes and become potent drugs for cancer therapy in humans.

12.14 The Scope and Limitations of Gene Therapy

In September 1990 the 4-year-old Ashanti DeSilva was the first patient to be treated with a gene therapy. The alleles of both parents for the enzyme adenosine deaminase were defective. Because this enzyme is critical for the function of the immune system, the little girl suffered from severe immune insufficiency that could no longer be treated classically. As a therapy, the white cells of the patient were repeatedly infected with a virus that carried the correct information for the missing enzyme. The patient, who previously was hospitalized and in constant danger of infection, has developed into a person with entirely normal health.

The term gene therapy refers to any technology with which a gene is introduced into a cell of a patient to replace a defective or missing gene. In principle it is very simple. Viruses demonstrate it for us daily: they bring their own genetic information into a foreign cell and use it to code for a few key enzymes that are necessary for their own reproduction. For the rest they use the biosynthesis machinery of the infected cell. The retroviruses, the genetic information of which is coded in RNA,
translate this information into DNA and integrate it into the host’s DNA. In gene therapy, a nucleic acid segment that codes for the protein to be substituted in the patient is inserted into the genome of a virus. The construct, which is what these modified viral genes are called, is surrounded by the virus capsid and is introduced into the cells of the patient. This can take place either outside of the body, that is, in bone marrow or white blood cells that have already been aspirated, or within the body, such as by injection into tumor tissue or into a particular organ. Adenoviruses, herpes viruses, and retroviruses are all well suited as carriers of the genes because these viruses incorporate their own genetic information into mammalian DNA. Although retroviruses only transfer their genes during cell division, adenoviruses can cause non-dividing cells to incorporate and use foreign genetic information. Plasmids, DNA with liposomes, and pure DNA constructs are also being experimented with. The rate of transfer of the new information into cellular DNA is significantly higher here than for the viruses.

In the meantime over 1,000 gene therapy clinical studies are underway, most in the USA and overwhelmingly for tumor therapy. Cancer is indeed not a hereditary disease, but the genetic information that is inherited from cell to cell creates a “local” genetic disease. Oncogenes are a large group of genes whose protein products are responsible for the occurrence of cancer. Tumor-suppressor genes code for proteins that interfere in the cell cycle and stop the division of cells. The quickly increasing knowledge of the molecular structure of these proteins has afforded many approaches for the gene therapy of tumors.

Other diseases can also be approached with gene therapy. The standard therapy for cardiovascular diseases that are characterized by an excessive growth of endothelial cells and consequent narrowing of the blood vessels is widening with a balloon catheter.
That helps, but only temporarily. After a few months the cells proliferate anew and the blood flow in the downstream areas decreases threateningly. Here a gene therapy could be employed. Adenoviruses can be released locally during the balloon catheter treatment. These carry the genetic information for a protein that inhibits cell division, the so-called retinoblastoma protein. The cells can then no longer proliferate.

AIDS patients die from infections because their immune systems are damaged. The so-called T cells die. Bone marrow transplantation is a possible therapy. For this it is decisive that the immunological properties of the donor and patient are as close as possible. Many people are eliminated as possible donors, not to mention animals. Or are they suited? A new approach for bone marrow transplantation and perhaps even organ transplantation is the humanization of animals. For this, immature human bone marrow cells, stem cells, are transplanted into an animal, for example, a baboon. The rejection of the foreign cells is prevented by treatment with immunosuppressants. It is not the human recipient who bears the risk of an immune reaction, but rather the animal donor. After the proliferation of the human cells in the animal, the cells can be safely transplanted into the human “pro-donor.”

Will gene therapy replace classical drug therapy? The answer is absolutely certain: no. The technique is very laborious and each patient needs an individually adapted therapy. Moreover, the results to date have been a bit disappointing and
sometimes devastating. Fatalities have been observed in the gene therapy of pediatric leukemias. Gene therapy will conquer a place in the therapy of special diseases because it is a curative and not a symptomatic therapy. With increasing experience and better appraisal of the possible risks, interventions into the human genome will become acceptable for such diseases because they would make it possible to eliminate the genetic disease for the individual and his or her offspring once and for all and eradicate it from the world.

Gene technology not only solves problems, it creates new ones too. The technical barrier to the creation of a Homo perfectus is as low as it has ever been in the history of humanity. The door to possible misuse has been opened wide. We can only hope that ethics and common sense prevent this from happening. Draconian legal regulations damage the beneficial use of gene technology more than they contribute to the prevention of misuse. Those in positions of responsibility have recognized this and have established a framework in which gene technology can develop further for the good of humanity.

12.15 Synopsis

• Gene technology has developed into a key technology in modern drug research because it allows the production of pure proteins, enables targeted mutagenesis to elucidate functional and mechanistic properties of proteins or to confirm or disprove binding modes, produces animal models by knocking particular genes in or out, allows genes to be activated or silenced, and makes somatic individual gene therapy possible.
• The elucidation of the genetic code, the recombinant production of genes and gene products, and the polymerase chain reaction were milestones in the establishment of gene technology.
• Sequencing of the human genome revealed the constitution of our genes, the number of gene products, and many functional insights. Meanwhile hundreds of genomes of other species have been sequenced, and the genome analysis of individuals is on the horizon.
• The human genome contains about 25,000 genes, of which about 22,000–23,000 are translated into proteins. Some sequence segments are transcribed into non-coding RNAs that accomplish important functions in the organism (e.g., in the ribosome or spliceosome). About 95% of the genome contains numerous sequences and signals that control the regulation of the genome. A functional classification of the gene products has been accomplished for a significant portion of the genome.
• To study the relevance of blocking the function of a gene product, that is, a protein in a disease situation, a particular gene can be knocked out in an animal model, mostly in mice. Genes can also be knocked in. Such turning on and off of genes is of utmost importance in drug research because it provides decisive information about the relevance of a planned therapeutic intervention.
• In vitro models for drug screening could only be developed once proteins could be produced in pure form and high yield. Various expression systems from
bacterial up to mammalian cells can be used for the production of foreign proteins, which are brought into cells via the corresponding coding DNA.
• Genes can be silenced by RNA interference. To this end, small amounts of double-stranded RNA, usually produced by the enzyme Dicer, are incorporated into the enzyme complex RISC. RISC uses one strand of the RNA duplex segments as a template to capture mRNA molecules with a complementary sequence and cleaves them sequentially. In this way, mRNAs with particular sequences are eliminated.
• To copy this principle for therapy, one needs RNA molecules of about 22 bases that have to be transported across the membrane into cells, a difficult task with fragile and highly polar species. Furthermore, these molecules can cause an unwanted immune response. Chemical modifications of the RNA molecules are aimed at improving their transport, immunogenicity, and stability properties.
• The proteome reflects the totality of all proteins in a cell at a given time under precisely defined conditions. Its composition changes dynamically and differs between healthy and diseased states or under the influence of therapeutic treatments.
• The proteome can be analyzed at any given time by 2D gel electrophoresis, which combines a separation by isoelectric focusing with SDS-PAGE analysis. Differences in the expression patterns indicate the involvement of proteins in a disease situation. Back regulation under drug administration can indicate a possible therapeutic strategy.
• Pull-down experiments with drug molecules immobilized on a chromatographic solid support allow trapping of proteins that interact with the studied drug molecules. Interaction profiles for drug molecules in the cell can be determined.
• Biomolecules can be immobilized on microarray chips. In particular RNA, DNA, and oligonucleotides thereof are anchored on these chips to extract the complementary RNA or DNA sequences from large mixtures.
By appropriate fluorescence labeling of the anchored bait sequences and the target sequences to be “fished,” binding can easily be detected in an automated fashion. With this, expression patterns of cells can be studied.
• Polymorphisms, particularly single nucleotide polymorphisms (SNPs), are variations in the composition of the genome of a species. These changes make individuals different, and some SNPs confer susceptibility or resistance to diseases or influence the cellular response to a drug.
• Differences in the individual genomes might be the key to a tailored and personalized drug therapy and can allow a susceptibility to a particular disease pattern to be recognized. Intolerance to a given drug therapy could become apparent, or an individual could be assigned to one of the different metabolizer classes.
• Genetic differences can be a reason for the development of diseases. In some cases, they are caused by single amino acid exchanges in one gene product (e.g., sickle cell anemia); in other cases multifactorial genetic causes are responsible for the disease development.
• Epigenetics regulates the transcription process not by altering the genetic sequence of DNA but by regulating the reading of the genes from the DNA. Lifestyle, experience, and environment exert their effect on the genes through the epigenome.
• Methylations and acetylations transmit additional epigenetic information in a reversible manner. Either the bases of DNA are directly methylated or the packing density of stored DNA on the histone proteins is altered, making it more or less accessible to the reading apparatus. The latter process modifies the charges of positively charged Lys and Arg residues involved in packing via the transfer of acetyl groups.
• Gene therapy tries to replace a defective or missing gene in the cells of a patient. This would make it possible to eliminate the genetic disease for the individual and his or her offspring. A nucleic acid segment that codes for the protein to be substituted in the patient is inserted into the genome via viral carriers. Gene therapy opens opportunities in special disease situations but it also has its risks.
13 Experimental Methods of Structure Determination

In this chapter we turn to the experimental methods used to determine the structures of ligands and proteins. Two techniques in particular deliver information about the three-dimensional structure of molecules ranging from small organic compounds all the way to proteins: crystal structure analysis and high-resolution NMR spectroscopy. The first technique is the older method. It goes back to an experiment of Max von Laue in 1912. It was just 17 years earlier that Wilhelm Röntgen had discovered an electromagnetic radiation, which was later named X-rays, or “Roentgen rays” in German in his honor. Together with his collaborators Walter Friedrich and Paul Knipping, Laue was able to demonstrate the wave nature of X-rays with a copper sulfate crystal. At the same time they proved the lattice structure of crystals. Only one year later William Lawrence Bragg and his father William Henry Bragg reaped the rewards of these experiments. They determined the crystal structure of sodium chloride. The technique has grown over the years. Today the structures of proteins with 4,000 amino acids have been determined. In recent years electron microscopy has also proven to be a very powerful tool for the structure elucidation of membrane-bound proteins and viruses.

NMR spectroscopy is likewise a relatively young technique. In 1945 the research groups of Felix Bloch and Edward Purcell in the USA observed the resonance absorption of hydrogen atom nuclei in a magnetic field for the first time. From this experiment, the technique has grown, mostly due to progress in instrumentation, to the extent that the structure determination of proteins with more than 800 amino acids has been accomplished. For this purpose, however, the protein must be extensively labeled with different isotopes.
G. Klebe, Drug Design, DOI 10.1007/978-3-642-17907-5_13, © Springer-Verlag Berlin Heidelberg 2013

13.1 Crystals: Aesthetic on the Outside, Periodic on the Inside

The term “crystal” causes one to immediately think of well-formed minerals or sparkling gemstones with a magnificent cut. The association of crystals with the structures of the molecules that determine our lives only occurs to us as a second thought. The crystal is typically associated with “dead” material. When Jack Dunitz took over his chair as professor of organic chemistry at the ETH in Zurich at the end of the 1950s, the famous natural product chemist Leopold Ruzicka dismissively told him that crystals are a “chemical graveyard.” Nonetheless, Dunitz and his research group showed over many years that a crystal in no way belongs in a “graveyard,” but rather is the key to understanding the structure, dynamics, and reactivity of molecules.

If a mineral is considered, the regular construction of the single crystals stands out. Even organic materials have the ability to form shapely crystals. One need only think of the fascinating crystals of rock candy. Is this external regularity a representation of the inner structure?

Before this question is answered, the way that crystals are obtained should be clarified. A mineralogist has it easy. Nature has already provided well-formed crystals over thousands or millions of years. Organic molecules and proteins rarely occur in Nature in a crystalline state. Conditions must be found under which they crystallize. In general, crystals are grown from a solution. For simple organic substances this can also be accomplished from the liquid (molten) material or by sublimation. Both crystallization methods are known from water: when a lake freezes to ice, or from beautiful crystals of frost.

For crystallization from solution a solvent is sought in which the compound is adequately soluble. By changing the conditions, the saturation point of the solution is exceeded. If this occurs slowly, small crystal nuclei form that can grow into large crystals. As a rule the solubility of the compound decreases with falling temperature, so the saturation point of the solution can be exceeded by changing the temperature. The solution can also be concentrated, that is, some of the solvent is removed. Another possibility is the addition of a second solvent in which the compound is less soluble. If the ratio of the two solvents is correctly chosen, the saturation point can be slowly approached.
For compounds with acidic or basic groups, pH conditions can be found under which the compound exists as a salt. Because of strong ionic interactions, salts often form better crystals. Compounds can also be “salted out.” For this, a salt, for example, sodium chloride, is added to an aqueous solution of the compound. The salt “uses up” the water molecules as it goes into solution: its ions become surrounded by a solvation sphere of water molecules. In doing so, water is withdrawn from the organic compound, which is itself surrounded by a sphere of solvent water molecules. The saturation point of the compound is exceeded, and crystallization begins.

Proteins are complex entities that, as a general rule, are only soluble in water. Because of their amino acid composition, they carry charged ionic groups on their surfaces. Even with proteins it holds true that conditions must be found under which they associate in a periodic array. This is accomplished by slowly changing the amount of water in which the protein is dissolved. This can work in both directions. Hydrophobic proteins begin to aggregate when the amount of water increases. Proteins that have stronger polar groups on their surfaces aggregate when the water molecules are removed from their surfaces. The pH value, the choice of a suitable salt for salting out, and the temperature are the conditions that must be optimized. In addition to salts, surface-active substances (detergents) can also influence the solvent shell and support the crystallization. Despite this, crystallization is a kind of fine art. The search for suitable conditions
requires creativity and diligence. Today, however, the crystallization methods are so elaborate that the tedious work of setting up thousands of different test conditions is carried out by robots. Sometimes considerable effort is invested in structure determination. In 1995, the crystallization and structure determination of HIV integrase, one of the key enzymes in the replication cycle of the virus, was accomplished only after the 40th point mutation of the original protein. This point mutation was made with the goal of changing the surface properties of the protein so that an orderly aggregation to a crystal could occur.

Let us return to the original question of whether the orderly outward appearance of a crystal is a reflection of its internal construction. Chemically, a crystal is homogeneously composed. The organic molecule or the protein represents the basic building block. It is only when these building blocks are spatially neatly organized that a periodic array occurs that optimally fills the space. In daily life, many solutions to such packing problems are easily seen, for example, sugar cubes that only fit into the box if they are layered in the right direction, or paving stones that must be neatly laid in a periodic fashion to completely cover the path without gaps (Fig. 13.1). A single paving stone, when correctly fitted to the next, represents a repeating unit in the lattice. A crystallographer refers to this unit as the elementary cell or unit cell, and describes the orderly placement of one unit next to another as periodic translation. In the simplest organic crystal structure, the elementary cell contains one molecule (Fig. 13.2).

Fig. 13.1 Paving stones cover a surface without leaving holes (a). This is only possible if they are derived from a particular basic geometric pattern, for instance a parallelogram, rectangle, square, triangle, or hexagon. This basic pattern can be modulated by complementary bulges and recesses. A path cannot be covered without holes if regular pentagons or octagons are used. If an octagonal stone is combined with a square stone, however, the surface is completely covered. If a square stone is cut along its two diagonals, four triangles result. Adding four such pieces completes an octagon to a square (b).

Fig. 13.2 In the simplest case, the molecular packing is generated purely by shifting the molecule in all three spatial directions. The resulting unit, the elementary cell, is an irregularly angled body, a parallelepiped (above right, violet). If a point near the molecule is picked out and the corresponding points of all molecules in the crystal packing are connected, a three-dimensional lattice results.

13.2 Just Like Wallpaper: Symmetries Govern Crystal Packings

The contents of an elementary cell can also be more complexly composed, for example, like a wallpaper pattern. A basic motif is repeated so that it fills the surface area. Crystallographers call the basic motif the asymmetric unit. In Fig. 13.3 this motif is a flower branch. Not all of the motifs can be generated simply by shifting the branch, some must be additionally reflected. A pair of image and mirror-image branches represents the elementary cell. The surface can now be filled with this building block by simply shifting it.

Fig. 13.3 An area can be covered not only by purely shifting an object, the asymmetric unit. Additional symmetry operations such as reflection and rotation can also be used. In this way multiple copies of the object are generated. In the case presented, the flower branch along with its mirror image makes up the unit (the elementary cell is outlined in red) that can be used to cover the surface simply by shifting it regularly.

In addition to reflecting, basic motifs can also be rotated. By using reflections and rotations, both so-called symmetry operations, the contents of the elementary cell are generated from the asymmetric unit. This cell is stacked on itself in all three spatial directions to form an ordered crystal lattice. Even as a three-dimensional entity, the elementary cell must take on a particular
form to completely fill all of the space. If the basic types of elementary cells are combined with all of the possible symmetry operations, 230 possibilities result for the basic motif to fill the space. The crystallographer calls them the 230 space groups. For chiral molecules, and proteins belong to this group, mirror reflection does not occur. Therefore proteins only crystallize in 65 space groups.

13.3 Crystal Lattices Diffract X-Rays

Max von Laue used crystals to prove the wave nature of X-rays (Roentgen rays) by diffracting them. For illustration, we shall consider a water wave. When a drop of rain strikes a puddle, circular waves form that propagate from the center outward. The drop generates a so-called elementary wave upon impact. If two drops that are separated by a particular distance simultaneously strike the water’s surface, circular waves propagate outward from both points of impact. This experiment is easier to observe if the water’s surface is constantly being “excited,” for instance, by a constantly dripping tap. The circular, outwardly spreading wave fronts meet each other at some point. What happens? A lamellar pattern forms: parts of the water’s surface remain at rest and other parts seem to move vigorously (Fig. 13.4). In cross section the water surface moves sinusoidally (Fig. 13.5). How do two waves behave that collide and superimpose with one another? If a wave peak meets another wave peak, or a wave trough meets another wave trough, the wave is amplified. If, on the other hand, a wave peak meets a trough, they cancel one another out. The water surface remains calm. The lamellar pattern of moving and still water surface between the outwardly moving waves is caused by this superimposition. It is called interference. The band density depends on the distance between the points of impact of the drops.
The ensuing interference pattern therefore contains information about the relative position of the points from which the elementary waves were generated.

Fig. 13.4 Two raindrops strike the surface of the water and form circular, outwardly moving water waves. These superimpose on one another to give a banded interference pattern. There are areas along these bands where the water surface is quiet. In other areas it moves all the more strongly.
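The amplification and cancellation of two superimposed waves can be tried out numerically. The following Python sketch is an added illustration, not part of the raindrop experiment itself: it assumes two sine waves of equal amplitude and wavelength, and the function name peak_of_superposition as well as the number of sampling points are arbitrary choices. It adds the two waves over one wavelength and reports the largest displacement found for a given phase difference.

```python
import math

def peak_of_superposition(amplitude, phase_shift, samples=1000):
    """Largest displacement of the sum of two equal sine waves.

    phase_shift is the phase difference in radians; the sum is sampled
    at `samples` evenly spaced points over one full wavelength.
    """
    peak = 0.0
    for i in range(samples):
        x = 2.0 * math.pi * i / samples
        total = amplitude * math.sin(x) + amplitude * math.sin(x + phase_shift)
        peak = max(peak, abs(total))
    return peak

# Same phase, a quarter wavelength apart, and half a wavelength apart.
for label, shift in [("in phase", 0.0),
                     ("quarter wavelength", math.pi / 2),
                     ("half wavelength", math.pi)]:
    print(label, peak_of_superposition(1.0, shift))
```

For two equal sine waves the result follows the closed form 2A·|cos(φ/2)|, where A is the single-wave amplitude and φ the phase difference: the amplitude doubles for waves in phase, vanishes for a shift of half a wavelength, and lies in between for any other shift.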
Fig. 13.5 The waves run in a sinusoidal manner in cross section. The distance between two wave peaks is called the wavelength. The height of the water wave at the summit is called the amplitude. The position at which the wave crosses the resting position determines the phase. (a) If two wave trains with the same phase meet, they add to one another and the amplitude doubles. This is the situation at the places in Fig. 13.4 where the water’s surface moves more strongly. (b) If there is a phase difference of exactly one half of a wavelength, the wave peaks meet the troughs. Both waves cancel one another out. This represents the parts of Fig. 13.4 where the water surface is very still. (c) Any other phase shift produces a superimposed wave, the amplitude of which is somewhere between the extremes in (a) and (b).

If parallel water waves (e.g., a wave front at the coast) collide with a barrier that has a small opening (e.g., a harbor entrance), semicircular waves spread outward behind it. If this barrier has two neighboring openings (double slit), a semicircular wave develops behind each opening. The same picture as with the two raindrops is achieved (Fig. 13.4). The waves interfere with one another behind the double-slit barrier, and a diffraction pattern forms. The density of this pattern, that is, the progression of the bands, depends on the geometry of the double slit.

Formally, the diffraction process at the crystal lattice is analogous. The same principles are valid, but the superimposition is more complex. A very simple lattice shall be considered that only has one type of atom. An X-ray beam runs as a parallel wave toward this crystal. It collides with an array of atoms and initiates an interaction that is comparable to that between the raindrop and the puddle. Each atom generates a spherical wave because of the interaction between the atom’s electrons and the X-ray. The circular wave on the water’s surface therefore corresponds to the spherical wave in space. The spreading spherical waves superimpose on one another and form a wave that leaves the crystal in a changed direction (Fig. 13.6). Formally seen, the incoming and outgoing waves have an angular relationship to one another that is equivalent to the reflection of the wave in a plane perpendicular to the
considered atom row. Therefore, the diffraction of the three-dimensional crystal lattice can be treated formally as a reflection at a plane in the lattice. Many sets of such parallel lattice planes can be laid through a crystal, differing in their separation from one another and in their occupation density with atoms (Fig. 13.7). The reflected waves contain the information about the geometry (distance) and the relative occupancy (scattering power) of these planes. To record the diffraction properties of a crystal, each set of parallel planes of the crystal must be oriented in the X-ray beam so that a reflection is possible. This laborious work is taken over by a computer-controlled diffractometer.

13.4 Crystal Structure Analysis: Evaluating the Spatial Arrangement and Intensity of Diffraction Patterns

To demonstrate that different lattices indeed generate different diffraction patterns, a simple experiment should be considered. For this purpose a laser pointer and different pinhole masks are needed. The pinhole masks can easily be made. A black and white printout of a periodic arrangement such as that shown in Fig. 13.8 can be reduced in size and transferred to high-resolution photographic film. This homemade aperture represents a two-dimensional periodic lattice. The laser beam is diffracted by the pinhole mask and generates the diffraction pattern on a screen that is shown in Fig. 13.8.

Fig. 13.6 If a wave front (blue) in one plane meets a row of atoms (black points on the dotted lines), each atom in this row becomes the starting point for a circular wave. This is analogous to the waves created when a raindrop hits the surface of a puddle. The circular waves that formed from the back row of atoms superimpose upon one another just as in the case of the water waves (Fig. 13.4). For the indicated direction of the incoming wave, all circular waves are generated with the same phase (a). As a result of this superimposition, a new wave front forms (red) that leaves the crystal in an altered direction. Relative to the direction of the incoming wave, it has an angle that formally corresponds to a reflection of the incoming wave front at the atom row that is marked with the green line. If a different incoming direction is chosen, the circular waves are not generated from the same place (b), that is, there is a phase difference between them. Their superimposition does not lead to a new wave front.
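The treatment of diffraction as a reflection at sets of parallel lattice planes can be made quantitative with Bragg’s law, nλ = 2d sin θ, which relates the plane spacing d, the wavelength λ, and the glancing angle θ. The law itself is standard, but it is not spelled out in the passage above, so the following Python sketch should be read as an added illustration. The function names, the number of planes, and the numerical values (a plane spacing of 2.82 Å and a wavelength of 1.54 Å) are assumptions chosen for the example. The sketch adds up the waves reflected from many planes, each delayed by the path difference 2d sin θ relative to its neighbor, and compares the summed amplitude at the Bragg angle with that at a slightly different angle.

```python
import cmath
import math

def bragg_angle(d, wavelength, order=1):
    """Glancing angle in degrees that satisfies n*lambda = 2*d*sin(theta).

    Returns None if no reflection of this order is possible.
    """
    s = order * wavelength / (2.0 * d)
    if s > 1.0:
        return None
    return math.degrees(math.asin(s))

def reflected_amplitude(d, wavelength, theta_deg, n_planes=200):
    """Amplitude of the sum of unit waves reflected from n_planes planes.

    Neighboring planes differ in path length by 2*d*sin(theta), which
    translates into a constant phase step between successive waves.
    """
    path_difference = 2.0 * d * math.sin(math.radians(theta_deg))
    phase_step = 2.0 * math.pi * path_difference / wavelength
    total = sum(cmath.exp(1j * k * phase_step) for k in range(n_planes))
    return abs(total)

d_spacing = 2.82     # assumed plane spacing in angstroms (illustrative)
wavelength = 1.54    # assumed X-ray wavelength in angstroms (illustrative)

theta = bragg_angle(d_spacing, wavelength)
print("Bragg angle (degrees):", theta)
print("amplitude at Bragg angle:", reflected_amplitude(d_spacing, wavelength, theta))
print("amplitude 2 degrees off:", reflected_amplitude(d_spacing, wavelength, theta + 2.0))
```

At the Bragg angle all partial waves are in phase and their amplitudes add up. A small deviation from this angle introduces phase differences that make the partial waves largely cancel, which is why each set of planes has to be brought into exactly the right orientation in the beam.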
Fig. 13.7 A set of parallel planes can be laid through the atoms of a crystal lattice (a, b, c). Their relative distance from one another and their atomic occupation density vary. Each set can give rise to “reflections” in an X-ray diffraction experiment. For this the crystal must be brought into the correct orientation relative to the incoming beam each time. The X-ray counter is positioned so that it captures the outgoing X-ray beam. It is from this geometry that the spatial orientation of the set of planes in the crystal is determined. The occupation density with atoms decides how “well” a particular set of planes reflects. This information is contained in the intensity (amplitude) of the outgoing wave. (d) Different types of atoms in a molecular crystal have different spatial relationships to one another. A parallel set of planes can be placed through each atom in the molecule (here a three-atom molecule). The amplitude of the outgoing beam results from the superimposition of the wave trains that are reflected at these planes.

Fig. 13.8 A perforated mask can be used for a diffraction experiment with a laser pointer. For this the displayed hole patterns (above) must be brought to a size comparable to the wavelength of the laser light. The diffraction patterns below were generated from the masks. The holes in the two left masks are all the same size, which is comparable to having only one type of atom. The hole pattern changes from wide-meshed squares to an oblique arrangement. The diffraction patterns reflect the symmetry and distance of the holes to one another. In the two masks on the right, the distance between the repeating units is identical to that in the first mask. The composition of the motif in the repeating unit, however, varies. It is made up of multiple holes and can be compared to the different atoms in a molecule. The distance between the diffracted light reflections (lower row) is identical for the first, third, and fourth masks. The intensity of the diffracted radiation, however, varies from reflection to reflection. It contains the information about the composition and the geometry of the original motif.

In the first two masks the distance and symmetry of the pinholes are changed. In the third and fourth masks the repeating motif of three or five differently sized holes represents a molecule that has two types of atoms. These motifs produce a periodic lattice when lined up next to each other. They have the same dimension as is found in the first image on the left. If the diffraction pictures are compared, the distribution of the intensity of the light points is different. That is
what contains the information about the construction of the motif that generated the lattice. It is just this information that is used to determine the crystal structure. The reflections, that is, the intensities of the individual light points in the diffraction pattern, contain the information about the form of the molecule. There is a mathematical technique, the Fourier transform, which can be used to translate the diffraction pattern back to the generating motif. A Fourier transform is the superimposition of many sine and cosine functions. The intensity of each diffraction reflection determines the contribution of the corresponding function, as does its phase. The importance of these aspects was already underscored in the interference of the waves (Fig. 13.5).

Unfortunately, just this information about the relative phasing is lost in the diffraction experiment. The diffractometer only registers the intensity of the reflections. The missing information is referred to as the phase problem of crystal structure determination. It must be reconstructed for the individual reflections by computational methods and by using appropriate measuring conditions. Frequently, large electron-rich elements (e.g., heavy-metal ions) are embedded in the protein (e.g., by coordination to histidine or cysteine). These heavy atoms dominate the diffraction pattern, and in doing so, they betray their position in the crystal lattice. Another method takes advantage of so-called anomalous scattering. This effect is based on the interaction of X-rays with the electrons of heavy atoms in particular. It leads to the situation that a spherical wave that is propagating toward such an atom is scattered with a phase shift. Simply stated, it is returned with a delay. The effect is dependent on the wavelength and can be exploited to determine the phasing.
The crystal is measured at a synchrotron (a particle accelerator that also produces electromagnetic radiation over a broad wavelength range, including X-rays), and the diffraction experiment is carried out at several different wavelengths. Anomalous scattering requires that a heavy atom is contained in the protein structure. This is already the case for metalloproteins. Often another approach is taken. Proteins that are produced in a special expression system (▶ Sect. 12.6) can be generated with selenomethionine instead of methionine. The heavier selenium serves as an anomalous scatterer in the diffraction experiment. For small molecules, there are methods that allow a straightforward reconstruction of the phase information from the intensity distribution, the so-called “direct methods.” The development of such methods for protein structure determination is ongoing. Often an already-solved, related protein structure can be used as a starting model for a structure determination (molecular replacement method). The model is translated and rotated in the unit cell by computer until it yields a calculated diffraction pattern that matches the diffraction pattern of the unknown protein. The phasing obtained with this method at the beginning of the structural analysis is only approximate. Altogether, the regeneration of the phase information is not trivial. As late as the 1960s, phasing calculations kept a scientist busy for several years. Methodological progress and the increased performance of computers now allow this to be accomplished in a few minutes. Even today, however, this step can still be very challenging for proteins. It is becoming apparent, though, that the structure determination of medium-sized proteins is becoming routine. Historically, the time span from crystallization to structure determination could be quite long.
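The phase problem described above can be made tangible with a small numerical experiment. The following Python sketch is an illustration only, not a crystallographic program: it assumes a hypothetical one-dimensional “unit cell” sampled at 16 grid points, and the function names `structure_factors` and `fourier_synthesis` are chosen here for illustration. It computes the Fourier coefficients of the density, keeps only their amplitudes (as a detector would), and compares a Fourier synthesis that uses the true phases with one in which all phases are set to zero.

```python
import cmath
import math


def structure_factors(density):
    """Discrete Fourier transform of a 1D density: one complex F_h per reflection h."""
    n = len(density)
    return [
        sum(rho * cmath.exp(-2j * math.pi * h * x / n) for x, rho in enumerate(density))
        for h in range(n)
    ]


def fourier_synthesis(amplitudes, phases):
    """Rebuild the density from amplitudes |F_h| and phases (in radians)."""
    n = len(amplitudes)
    return [
        sum(
            a * cmath.exp(1j * (p + 2 * math.pi * h * x / n))
            for h, (a, p) in enumerate(zip(amplitudes, phases))
        ).real / n
        for x in range(n)
    ]


# Hypothetical one-dimensional "unit cell" with a heavy and a lighter atom.
density = [0.0] * 16
density[3] = 8.0    # heavy atom
density[10] = 6.0   # lighter atom

F = structure_factors(density)
amplitudes = [abs(f) for f in F]            # measurable: square root of the intensity
true_phases = [cmath.phase(f) for f in F]   # lost in the diffraction experiment

with_phases = fourier_synthesis(amplitudes, true_phases)
without_phases = fourier_synthesis(amplitudes, [0.0] * len(F))

error_with = max(abs(a - b) for a, b in zip(with_phases, density))
error_without = max(abs(a - b) for a, b in zip(without_phases, density))
print(f"max deviation with true phases: {error_with:.2e}")
print(f"max deviation with zero phases: {error_without:.2f}")
```

With the true phases, the density is recovered to within rounding error. The amplitudes alone yield a map that does not reproduce the atomic positions, which is why the phases have to be regenerated by the methods described in the text.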
Urease is certainly a curiosity. It was the first protein to be successfully crystallized. James B. Sumner accomplished this back in 1926. Its 3D structure, however, was first elucidated in 1995, that is, almost 70 years later!

13.5 Diffraction Power and Resolution Determine the Accuracy of a Crystal Structure

A picture of the contents of the unit cell is the result of the Fourier transform. It is portrayed in terms of the electron density in space (Fig. 13.9). The detail with which the electron density can be determined depends on the spatial resolution with which the diffraction pattern was measured. In terms of the Fourier transform, this is a question of the number of different wave fronts that were superimposed upon one another with the correct amplitude and phase. It can be seen in the diffraction pattern created with the laser beam (Fig. 13.8) that the intensity clearly weakens toward the edges.

Fig. 13.9 View of a crystal structure of aldose reductase (▶ Sect. 27.4). The electron density (the so-called 2Fo–Fc density at the 1σ level) is displayed as a blue mesh at the predefined contour level around a tryptophan residue. In (a) the diffraction data were obtained at a resolution of 4 Å, and a Fourier transform was used to calculate the electron density. The resolution increases from (a) 4 Å to (b) 3 Å, to (c) 2 Å, and to (d) 0.66 Å. The resolution in the last-shown contour density is so high that hydrogen atoms can be recognized as single density peaks in the difference density map (Fo–Fc difference density at the 2σ level; positive is yellow, negative is violet). The electron density is so clearly structured at 2 Å that it is simple to fit the indole building block in place. At 4-Å resolution this assignment is problematic and can easily lead to errors.

The extent to which the diffraction pattern is
perceivable at the edges limits the accuracy with which the generating motif can be spatially resolved. For small organic molecules, a resolution at which the atoms are visible as distinct maxima in the electron density is easily achieved. If the crystal’s quality is diminished by lattice defects or disorder, the resolution is poorer. The resolution of protein crystals is usually between 1.5 and 3 Å. In the best case, a resolution is achieved that is of the order of magnitude of a bond length. The upper limit falls into the range of the cross section of a benzene ring. Resolutions better than 1 Å, that is, numerically smaller values, have nevertheless been achieved (Fig. 13.9). In those cases many details are recognizable, such as single hydrogen atoms or multiple arrangements of side chains. At higher resolution the electron density maxima are directly assigned to the atoms in the molecule (Fig. 13.10). In the beginning this assignment is crude because the phases used in the Fourier transform are only approximate. The positions of the detected maxima must still be optimized. This is referred to as “refinement of the structure.” For this, the experimentally observed diffraction pattern is compared with the diffraction pattern that is calculated from the atomic positions of the preliminary model. If the measurement is very accurate, the density of a “pseudomolecule” with spherical atoms can be subtracted from the observed electron density at the end of the structure determination. What remains is the electron distribution of the bonds between the atoms in the molecule (Fig. 13.10). This is, however, only possible with very high-resolution measurements. At lower resolution, as is the case in moderately resolved protein structure determinations, a direct assignment of the atoms of the protein to the electron density maxima cannot be made (Fig. 13.11). More commonly the course of the chains is fitted to the electron density.
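The influence of resolution on the electron density discussed above can be sketched with a truncated Fourier synthesis. The example below is a simplified one-dimensional illustration under stated assumptions, not a description of real crystallographic software: the unit-cell length `CELL`, the atom positions `ATOM_X`, the Gaussian width `SIGMA`, and the function names are all hypothetical. Two Gaussian “atoms” 1.4 Å apart (about a bond length) are reconstructed from reflections up to a maximum index `h_max`, which corresponds to a resolution of `CELL / h_max`.

```python
import math

CELL = 10.0     # assumed unit-cell length in Å
ATOM_X = 0.7    # two atoms at +0.7 Å and -0.7 Å, i.e., 1.4 Å apart
SIGMA = 0.25    # assumed Gaussian width of each atom in Å


def structure_factor(h):
    """F_h of the centrosymmetric two-atom motif (real-valued in this special case)."""
    form = math.exp(-2.0 * (math.pi * SIGMA * h / CELL) ** 2)
    return 2.0 * form * math.cos(2.0 * math.pi * h * ATOM_X / CELL)


def density(x, h_max):
    """Fourier synthesis at position x using only reflections with |h| <= h_max."""
    total = structure_factor(0)
    for h in range(1, h_max + 1):
        total += 2.0 * structure_factor(h) * math.cos(2.0 * math.pi * h * x / CELL)
    return total / CELL


for h_max in (3, 10):
    midpoint = density(0.0, h_max)
    at_atom = density(ATOM_X, h_max)
    # Crude criterion: atoms count as resolved if the density at an atom
    # position exceeds the density midway between the two atoms.
    verdict = "separate maxima" if at_atom > midpoint else "merged into one maximum"
    print(f"resolution {CELL / h_max:.1f} Å: {verdict}")
```

At roughly 3 Å resolution the two atoms merge into a single maximum, whereas at 1 Å they appear as separate peaks. This mirrors the statement that atoms can only be assigned directly to density maxima at high resolution.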
Because proteins are constructed from 20 different amino acids that prefer to adopt typical geometries, the interpretation of the electron density is simplified (Fig. 13.11). As with low-molecular-weight structures, the model is iteratively refined and the structural data are improved. Electrons scatter X-rays. Therefore, the number of electrons around an atom determines how well it is detected in the resulting density. Hydrogen atoms have only one electron in their shell. As a consequence, they are often not located or are located with poor accuracy in the electron density. Hydrogen atoms can be recognized as densities in the structure determination of small molecules, but in protein structures this is only possible if the resolution is better than 1 Å. This is unproblematic as long as it only concerns hydrogen atoms at spatially fixed positions on a rigid molecular scaffold, for instance, hydrogen atoms on phenyl rings. It is more difficult if the hydrogen atom is on a conformationally flexible group or on a group that can be protonated or deprotonated. It is valuable to know whether a carboxyl group is ionized or exists as the free acid, and in which direction the hydrogen atom is oriented. This information can only be gleaned indirectly from the protein structure through an exact analysis of the spatial orientation of the surrounding hydrogen-bonding partners. The accuracy of the structure determination depends on the resolution of the data that were obtained from a crystal. Even if the structure of the protein is displayed on the computer screen like that of an organic molecule, its geometry is much less accurately determined. The error margins in small-molecule determinations are approximately 0.01 Å for bond lengths, 0.1° for bond angles, and 1–2° for
Fig. 13.10 Crystals with an edge length of 0.1–0.3 mm are needed for the structure determination of small organic molecules. (a) A diffraction pattern is obtained in an X-ray beam (compare Fig. 13.8) that is displayed on a photographic plate or (b) is registered with a diffractometer counter. The molecule that generated this diffraction pattern, which is periodically arranged in the crystal, is back-calculated from the reflections. (c) A Fourier transform is carried out with approximate phasing, and a map of the electron density in space is obtained that is contoured according to its height. The maxima are assigned to the atoms in the molecule (here oxalic acid). (d) The spatial blurring of the electron density is associated with thermal motion of the atoms. It is displayed with ellipsoids that represent the 50% probability of the occupancy of each atom. (e) Crystals that scatter well allow the determination of the electron density in the bonds between atoms. (f) The application of symmetry operations generates the molecular packing in the crystal lattice. It delivers information about noncovalent interactions between molecules.
Fig. 13.11 (a) The diffraction pattern of a protein crystal shows clearly more reflections than that of a small molecule. Because protein crystals are made up of larger molecules, their unit cells comprise a bigger volume and exhibit more lattice planes and therefore more reflections. However, due to the high solvent content and the inherent flexibility of the more complicated macromolecules, the crystals diffract more poorly, and the data are registered to a lower resolution. (b) The enormous flood of data is registered with an area detector on a diffractometer. This allows the simultaneous registration of many diffracted intensities. (c) A Fourier transform performed with phases from the first model delivers the distribution of the electron density in space (blue mesh). Because no atomic centers are resolved in this density, the trace of the protein chain (here a segment from a β sheet of tumor necrosis factor, TNF) is fitted to the electron density distribution. (d) As with small molecules, the obtained model is refined until all of the atoms of the protein fit optimally into the density. (e) The thermal motion is shown color-coded over the entire molecule. Color changes from blue to yellow to red show the transition from mild to severe movement. (f) Symmetry operations generate the molecular packing in the crystal lattice. There are “empty” areas that are occupied by numerous water molecules. Because of their strong thermal motion and the disorder that it causes, these water molecules are not found in the electron density map.
dihedral angles (▶ Chap. 16, “Conformational Analysis”). For protein structures, significantly larger errors must be assumed, and they are difficult to quantify. They depend on how the structure was refined. The electron density does not allow individual atoms to be resolved. Therefore, amino acids are placed in the electron density with idealized bond lengths and angles. Their geometry is kept at the predefined knowledge-based values during the subsequent refinement. The assignment of atom types for the placement of the side chains is partially based on assumptions. Knowledge-based values are used, or attempts are made to keep the hydrogen-bonding network consistent. These aspects are to be considered when judging the accuracy of a protein structure.

The result of the crystal structure determination is a spatially and time-averaged picture of one “mean” molecule that represents the whole crystal. Often it is discovered that the electron density in some areas indicates only a reduced occupancy of a side chain or a part of a bound ligand. Furthermore, alternative orientations (conformations) are recognizable. Sometimes the electron density for entire areas is missing. This is indicative of “disorder” and argues for a distribution over multiple orientations in the crystal. This disorder can be dynamic, that is, the relevant groups jump back and forth between two or more orientations. Or the disorder is static, which means that several orientations are present side by side in the crystal. Because the structure is an averaged picture, these arrangements are scattered throughout the crystal with different orientations. If a part of the molecule is entirely disordered, that is, scattered over numerous orientations, its electron density is usually not visible. Today, in order to reduce radiation damage, structures are measured at 100 K in a stream of cold nitrogen gas.
At this temperature many movements in the crystal are frozen, and static disorder can be observed. Despite this, it has been shown that the determined structure corresponds well to the situation at room or body temperature. These conclusions can be drawn by comparing the results with analogous determinations by NMR spectroscopy (Sect. 13.7) and with molecular dynamics simulations (▶ Sect. 15.8).

13.6 Electron Microscopy: Using Two-Dimensional Crystals to Trace Membrane Proteins

Cryoelectron microscopy represents an ideal complement to X-ray structure determination because it makes the structure of very large membrane-bound proteins accessible. Electrons are used as the radiation source. They penetrate the crystalline sample only slightly and are absorbed more strongly than X-rays. Molecules also scatter electrons much more strongly than X-rays. Therefore, much smaller crystals can be used. Even crystals that are razor-blade thin and are made up of only a few molecular layers are sufficient. Single molecules can even be imaged, but their molecular mass must exceed several million daltons. Smaller molecular masses make periodically organized arrays of multiple molecules necessary. In the meantime, membrane protein crystals have been successfully grown in two-dimensional periodic molecular orientation. The attempt to grow crystals of such proteins that
are large enough for an X-ray structure analysis has only worked a few times and requires very special additives for the crystallization. More recently, the crystallization of membrane proteins has been successful in lipidic cubic phases. Sophisticated mixtures of lipid, water, and protein can form structured three-dimensional lipidic arrays that are pervaded by water channels. Protein molecules diffuse into this structured yet flexible matrix, which facilitates crystal nucleation and growth.

In addition to allowing work with more readily obtained crystals, electron radiation has another advantage over X-rays. It can be used for a diffraction experiment as well as for the direct visualization of an object. Microscopic visualization is unfortunately not possible with X-rays because a convergent lens cannot be built for X-rays. It is possible with electrons because they can be focused with magnetic fields. Why not use an electron microscope to visualize molecules in general? Despite the reduced radiation, electrons still damage the samples considerably. Furthermore, the crystals that are used are about a millionth the size of a sample used for X-ray structure analysis. The data for an X-ray structure can be collected on one single crystal. In contrast, several hundred to a thousand tiny crystals, often only 5 μm in size, are needed for electron microscopy. They are shock-frozen under high vacuum and directly exposed to the electron beam. Proteins can only withstand these conditions after special preparation. A very low radiation dose is used. Because of this, the images are very noisy and must be averaged over many observations. To obtain detailed resolution in the direction perpendicular to the crystal’s plane, the crystal must be measured in many orientations. Fine structural details are lost in doing this.
The patterns in the electron diffraction diagram, analogous to those that would be obtained in an X-ray experiment, can be corrected by computational methods. With the help of the Fourier transform, an electron density map of the molecule is obtained. Its interpretation and refinement are accomplished in the same way as for the X-ray experiment. The phasing that is necessary for the transform can be determined from the images in electron microscopy. The technique is relatively young, and the methods are developing further. There is more work to be done. Structure determination still takes several years, and only a few laboratories have adequately powerful microscopes. Nonetheless, the knowledge that we have today about the structure of membrane-bound receptors is often based on results achieved with this method (▶ Chap. 30, “Ligands for Channels, Pores, and Transporters”).

13.7 Structures in Solution: The Resonance Experiment in NMR Spectroscopy

Many atomic nuclei have an angular momentum, or spin. The nuclei with a nuclear spin that occur in biological systems are the hydrogen isotope ¹H, the carbon isotope ¹³C, the nitrogen isotope ¹⁵N, the fluorine isotope ¹⁹F, and the phosphorus
isotope ³¹P. Just as a top would, these nuclei rotate about their axes. As long as no magnetic field is applied, the tops orient in all possible spatial directions. In a magnetic field they are forced into alignment (Fig. 13.12).

Fig. 13.12 Atomic nuclei with an angular momentum behave like a spinning top. In the absence of an external magnetic field, they orient randomly in all possible directions (a). Upon application of a magnetic field, they orient their rotation axes parallel or antiparallel to the direction of the field (b). The precession movement describes an arc around the applied field direction. The two orientations, parallel or antiparallel with respect to the direction of the field, are energetically different. Because of this, there is a small difference in occupancy between the two states. By applying an electromagnetic field with a frequency that corresponds to the rotational speed of the top’s axis, the occupancy can be inverted. This resonance absorption, the exact frequency of which depends on the type of nucleus and its immediate chemical environment, is registered with a spectrometer.

If a toy top is spun, it moves in the gravitational field. This field has one preferred direction. If the rotation axis of the top and the direction of the gravitational field, which points toward the center of the Earth, are not exactly aligned, the top wobbles. The end of the rotation axis performs a circular movement, an arc, with a very precise rotational speed. This speed depends on the mass and geometry of the top. In physics this movement is known as precession. Atomic nuclei with a spin behave in a very similar way. In contrast to the macroscopic top, they obey the laws of quantum mechanics. This means that the axes of their precession movement can only adopt very specific angles with respect to the direction of the applied field. The result for the ¹H, ¹³C, ¹⁵N, ¹⁹F, and ³¹P nuclei is that the rotation axis of the precession arc can only be parallel or antiparallel to the direction of the field. The orientation in the direction of the field is energetically somewhat more favorable than the orientation antiparallel to
the direction of the field. Statistically, therefore, more nuclear spins in the sample will align with the direction of the field. If an additional oscillating magnetic field is superimposed on the outer magnetic field, and its frequency corresponds to the precession frequency of the nuclear spin, the occupancy of “parallel” and “antiparallel” spinning nuclei can be inverted, and a resonance absorption of the sample can be registered. After a particular time span, the original situation is restored (relaxation).

The rotational speed of the top’s axis in the precession movement is characteristic for each type of nucleus. It depends further on the composition of the chemical environment in which the nucleus resides. A carbon atom in a phenyl ring has a different resonance frequency than one in an aliphatic chain. The position of the resonance absorption relative to a standard reference is called the chemical shift. Furthermore, the individual nuclei can sense the spin orientation of neighboring nuclei. An alignment in the same direction as a neighboring nucleus is energetically different from an antiparallel orientation. This influence also modulates the precession frequency of the observed nucleus. The information about the orientation, or magnetic state, of nuclei in the vicinity can be transmitted over several bonds. This transfer can even occur through space without any direct covalent connection.

To measure an NMR (nuclear magnetic resonance) spectrum, a solution of the substance is placed in a strong magnetic field. In addition, a variable electromagnetic field is applied to the sample. The frequencies at which the nuclei in the sample are in resonance, that is, at which they flip from parallel to antiparallel, are recorded. The resulting spectrum discloses information about the composition and the chemical environment of the studied nuclei.
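The quantities introduced above (a nucleus-specific precession frequency, a small occupancy difference between the two spin states, and the chemical shift relative to a reference) can be put into numbers with a short sketch. The gyromagnetic ratios below are approximate literature values and are not taken from this chapter; the function names, the field strength of 14.1 T, and the temperature of 298 K are assumptions made for illustration. The occupancy difference follows from a Boltzmann distribution over the two states of a spin-1/2 nucleus.

```python
import math

PLANCK = 6.62607015e-34    # Planck constant in J s
BOLTZMANN = 1.380649e-23   # Boltzmann constant in J/K

# Approximate gamma / (2 * pi) in MHz per tesla (literature values, not from the text).
GAMMA_MHZ_PER_TESLA = {
    "1H": 42.577, "13C": 10.708, "15N": -4.316, "19F": 40.078, "31P": 17.235,
}


def larmor_frequency_mhz(nucleus, field_tesla):
    """Precession (resonance) frequency of a nucleus in a given magnetic field."""
    return abs(GAMMA_MHZ_PER_TESLA[nucleus]) * field_tesla


def population_excess(nucleus, field_tesla, temperature_k=298.0):
    """Fractional excess of spins in the lower-energy state (spin-1/2, Boltzmann)."""
    delta_e = PLANCK * larmor_frequency_mhz(nucleus, field_tesla) * 1e6
    return math.tanh(delta_e / (2.0 * BOLTZMANN * temperature_k))


def chemical_shift_ppm(frequency_hz, reference_hz):
    """Position of a resonance relative to a reference, in parts per million."""
    return (frequency_hz - reference_hz) / reference_hz * 1e6


FIELD = 14.1  # tesla, an assumed spectrometer field
for nucleus in GAMMA_MHZ_PER_TESLA:
    freq = larmor_frequency_mhz(nucleus, FIELD)
    excess = population_excess(nucleus, FIELD)
    print(f"{nucleus:>3}: {freq:7.1f} MHz, population excess {excess:.1e}")
```

Even for ¹H in a strong field, the excess of spins in the favorable orientation is only on the order of a few in 100,000, which illustrates why the text speaks of a small difference in occupancy.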
The spectrum also contains information about the spatial structure of the molecules under investigation. Based on the work of Richard Ernst, multidimensional NMR techniques have been developed over the last 30 years. By using suitable measuring conditions and selectively irradiating electromagnetic fields, information about the mutual influence of the resonance frequencies of individual nuclei is separated and analyzed. This mutually induced information transfer about the magnetic state of neighboring nuclei becomes apparent in multidimensional spectra, in which it is registered as cross peaks. Only the hydrogen isotope ¹H occurs in nearly 100% natural abundance. Therefore, for statistical reasons, it can be assumed that two adjacent hydrogen atoms in a molecule are always both ¹H nuclei. In contrast, the ¹³C and ¹⁵N isotopes are scarce. As a result, statistically they are only very rarely found in the direct vicinity of one another. Data on the mutual influence of the magnetization of these nuclei are, however, required for the spectra. Therefore, it is necessary to enrich the proteins with the appropriate isotopes. For this, bacteria are fed with isotopically labeled substrates such as glucose or ammonium chloride and then produce proteins that are isotopically enriched. For the structural investigation of very large proteins, it is even necessary to produce deuterated proteins. Today, by using numerous spectroscopic techniques, spectra of proteins with more than 800 amino
acids have been successfully interpreted. The following questions can be addressed by NMR analysis:
• Which atomic nuclei occur in which chemical environment?
• What is in the immediate, covalently connected neighborhood of these nuclei? Information about the spatial orientation of atoms in the vicinity is also contained within these spectral parameters.
• Which geometric relationships exist between different segments of the polypeptide chain? This results from information transfer about the magnetic states of nuclei that are not directly connected by covalent bonds.

13.8 From Spectra to Structure: Distance Maps Evolve into Spatial Geometries

This last-mentioned observation, which results from the nuclear Overhauser effect (NOE), yields intramolecular distances between spatially neighboring but not directly covalently bound atoms. The entire connectivity, that is, the list of all covalent bonds within a molecule, and a list of the recorded intramolecular noncovalent distances are used to generate the structure of the molecule (Fig. 13.13). For this purpose, so-called distance geometry calculations are used to create the spatial coordinates of the atoms.

In complex molecules, multiple equally good structural models often fulfill the experimentally determined distance conditions. If the spectral parameters for a section of the structure are too sparsely distributed or involve distances that are too large, it is very difficult to achieve a unique spatial configuration of the atoms. Therefore, the generation of a structural model is coupled with molecular dynamics simulations (▶ Sect. 15.7). These calculations deliver molecular geometries that represent energetically favorable 3D structures consistent with the spectral parameters. In areas with few spectral conditions, multiple slightly divergent models are obtained. Therefore, NMR spectroscopists always propose a bundle of structural solutions (Fig. 13.14).
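The step from a list of distances to spatial coordinates can be illustrated with the classic metric-matrix embedding, one simple variant of distance geometry. The sketch below is not the procedure used in practice for NMR structures: it assumes a complete and exact distance matrix for five hypothetical atoms, whereas NOE data are sparse and only give approximate distance ranges. All function names are chosen here for illustration, and the eigenvectors are obtained by a plain power iteration to keep the example free of external libraries.

```python
import math
import random


def distance_matrix(coords):
    """All pairwise distances between the given points."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def metric_matrix(d):
    """Double-centered matrix G = -1/2 * J * D^2 * J of scalar products."""
    n = len(d)
    sq = [[d[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in sq]
    total_mean = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
             for j in range(n)] for i in range(n)]


def top_eigenpairs(matrix, k, iterations=500, seed=0):
    """Largest k eigenpairs of a symmetric matrix via power iteration and deflation."""
    rng = random.Random(seed)
    n = len(matrix)
    m = [row[:] for row in matrix]
    pairs = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iterations):
            w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        for i in range(n):          # deflation: remove the component just found
            for j in range(n):
                m[i][j] -= lam * v[i] * v[j]
    return pairs


def embed(d, dim=3):
    """Coordinates (up to rotation, translation, and mirroring) from distances."""
    pairs = top_eigenpairs(metric_matrix(d), dim)
    return [tuple(math.sqrt(max(lam, 0.0)) * v[i] for lam, v in pairs)
            for i in range(len(d))]


# Hypothetical, non-planar arrangement of five atoms (coordinates in Å).
true_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.5, 3.4, 0.0),
               (9.0, 4.0, 1.5), (10.5, 1.0, 3.0)]
target = distance_matrix(true_coords)
model = embed(target)
rebuilt = distance_matrix(model)
max_dev = max(abs(rebuilt[i][j] - target[i][j]) for i in range(5) for j in range(5))
print(f"largest distance deviation of the model: {max_dev:.2e} Å")
```

The embedded model reproduces all input distances, but it is only defined up to rotation, translation, and mirroring, because distances alone cannot distinguish a structure from its mirror image. With sparse or imprecise distance data, as described in the text, many models satisfy the conditions equally well.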
Attempts are often made to compare the quality of X-ray and NMR structures. The two methods measure different properties, and the structures are derived from different measured variables. This must be considered when making a direct comparison. The accuracy of an NMR structure varies with the density and frequency of spectral distance constraints, while that of an X-ray structure depends mainly on the resolution of the diffraction experiment.

13.9 How Relevant Are Structures in a Crystal or NMR Tube to a Biological System?

The structure determination techniques discussed investigate molecules in a crystal assembly or in solution in an NMR tube. Are these conditions at all relevant for the
biological conditions in an organism? Small flexible molecules change their geometry depending on the environment. They will adopt a different shape in a crystal, in solution, or in the binding pocket of a protein. Therefore, the question can be asked whether the data from a small-molecule crystal structure are suitable to deliver information about the molecular geometry in a binding pocket. From the numerous known crystal structures, by now more than 500,000, some general principles about the molecular architecture of organic compounds can be deduced. All of the published crystal structures are electronically archived at the Cambridge Crystallographic Data Centre in England. They can be retrieved and compared with one another.

Fig. 13.13 A multidimensional NMR spectrum contains information about the spatial vicinity of atomic nuclei in a molecule (here, the trypsin inhibitor from bovine pancreas). It is expressed in so-called cross peaks. Information can be extracted about the distance between noncovalently bound atoms in a molecule. The individual signals of the spectrum are assigned to atoms in the molecule (e.g., A and B). The positions of these atoms in the polypeptide chain are known from the sequence of the protein (above left). The intensity of the cross peak indicates the spatial distance between nuclei A and B in the folded polypeptide chain (above right). Just as was done for A and B, the many other cross peaks are evaluated and translated into distance conditions. [Figure: two-dimensional spectrum with both axes in ppm (0.0–10.0); cross peaks between nuclei A and B are marked.]

It will be shown in ▶ Chaps. 14, “Three-Dimensional Structure of
Biomolecules” and ▶ 16, “Conformational Analysis” that valuable information about possible molecular and interaction geometries is available through a statistical evaluation of these data, which also provides insights relevant for the conditions in a protein binding pocket.

Nevertheless, are the structures in a protein crystal too remote from the conditions in a biological system, much more so than, for instance, the state in solution? A good many structure determinations that were carried out in parallel in solution and in the crystal are available. Experience has shown that the agreement is usually very good. Deviations are found predominantly on the surface of proteins. There, the amino acid side chains form interactions with the environment. Therefore, these deviations are not surprising. The crystal packing of tumor necrosis factor (TNF) is presented in Fig. 13.11. Large holes are conspicuous in the crystal packing. These areas are filled with water molecules that are so loosely incorporated into the crystal that they can move freely to a large extent. Therefore, they cannot be located in the electron density.

Fig. 13.14 The accuracy of an NMR structure depends on the density of the experimentally determined atomic distances. These come from experiments that deliver information about the exchange of the magnetic state between spatially adjacent, but not directly connected atoms (the so-called nuclear Overhauser effect, NOE). With the connectivity list and the NOE conditions, multiple structural models are generated. These models represent the low-energy geometries that agree with the spectral parameters. In the left part of the figure (a), the experimentally measured NOEs (black dashed lines) are distributed over the 3D structure of a domain of the guanine nucleotide exchange factor. For the sake of clarity, only the long-range NOEs are shown. Most of the amino acid side chains are also omitted; many of these NOEs therefore point to positions of atoms that are not shown. In areas in which very few distances could be determined (e.g., in the green loop areas or at the termini), the model is ambiguously defined. Multiple models are consistent with the experimental data (b), and the main chain of the protein fans out. In areas where a large number of NOE conditions are found (e.g., the helices and the central β strand), the structural models diverge only slightly from one another.

Channels filled with water in protein
crystals can account for up to 70% of the crystal’s mass! Therefore, the crystal can also be considered a highly concentrated, ordered solution. NMR measurements also require high concentrations. These are considerably higher than in biological systems, but are still 10–100 times lower than in protein crystals.

The high water content of protein crystals makes it possible for small molecules to diffuse into the crystals. In the water channels, they move as they would in an aqueous solution. In favorable cases, the binding pocket of the protein is directly accessible from one of these channels. By placing the protein crystal directly in a solution of the active substance (soaking), the latter can penetrate the crystal through the channels, diffuse into the binding pockets, and bind there. Then a new diffraction experiment is carried out with the loaded crystal. The reflections are measured, and, based on the known structure of the protein, the electron density map is generated. The density of the uncomplexed protein is subtracted from that map. The difference density of the incorporated ligand remains. This information is of essential importance for understanding the interactions between small molecules and proteins.

The question of whether the experimental structure is really relevant for the biological conditions has still not been answered. Crystalline hemoglobin is able to reversibly take up and release oxygen. It could be shown with crystals of purine nucleoside phosphorylase (PNP) that the enzyme is still catalytically active in the crystal (Fig. 13.15). Using the enzyme Cyp3, a peptidylproline isomerase, as an example, the research group of Malcolm Walkinshaw at the University of Edinburgh could even show that there is quantitative agreement between the crystalline and solution states. Different concentrations of an inhibiting prolyldipeptide were allowed to diffuse into the crystal.
Afterward, the occupancy of this inhibitor resulting from the differently concentrated soaking solutions was determined in a crystallographic experiment. The binding constants were then derived from these occupancy data. They agreed quantitatively with the inhibition constants that were determined in a functional assay in solution. Diffraction data can be collected very quickly with even more intense, so-called white X-rays from a synchrotron source (the Laue technique). With this experiment, it was possible to observe stable intermediates of enzyme reactions. Structural changes in two-dimensional crystals of the acetylcholine receptor (▶ Sect. 30.4) could be observed by electron microscopy after loading with the natural ligand. These and other experiments have shown that proteins exist in a crystal lattice in a form that must be, at the very least, very similar to the biologically active form.

13.10 Synopsis

• The most powerful methods to determine the spatial structure of molecules are X-ray crystallography and NMR spectroscopy. The former requires the biomolecules to be arranged in periodic arrays in a crystal, and the latter studies them in solution, usually in an isotopically labeled form.
Fig. 13.15 The enzyme purine nucleoside phosphorylase (PNP) transforms guanosine and phosphate to guanine and ribose-1-phosphate. [Plot: reaction rate versus time, with alternating phases labeled “crystal soaked” and “crystal removed”.] If a protein crystal is placed in a solution of the substrate, the reaction begins. On its own, this could also be caused by partial dissolution of the enzyme crystal. However, if the crystal is removed from the solution, the reaction stops, and if the crystal is brought back into the solution, the reaction carries on. This experiment demonstrates that even crystalline enzymes are catalytically active. Therefore, a geometry must be present in the crystal that corresponds to the biologically active form.
• Crystals need special conditions to grow from saturated solutions. In a crystal, the molecules are arranged in periodic arrays and pack through translational symmetry in three dimensions. In addition to the pure translation of the basic motif, usually one molecule that represents the asymmetric unit, symmetry operations such as mirror reflection, two-, three-, four-, and sixfold rotation, or inversion can be applied.
• Crystal lattices diffract X-rays, and the diffraction experiment can be understood as a three-dimensional interference of elementary spherical waves generated at the positions of the atoms in the lattice. The diffraction phenomenon at a 3D lattice can be treated formally as multiple reflections at crystal planes in the lattice.
• Because the relative phases of the generated elementary spherical waves, superimposed in the various reflections, are not accessible by experiment, they
must be regenerated by sophisticated phasing methods. Only then can a Fourier transform be calculated from the measured reflections that represents the spatial distribution of the electron density in the crystal. A model of the crystallized molecules is fitted to this electron density.
• The diffraction power and resolution of the crystals determine the accuracy of the resulting structure. For proteins, a resolution of 1.5–3 Å is usually achieved. At 1.5 Å, molecular building blocks such as phenyl rings are well resolved, and individual water molecules are visible. At 3 Å, only the overall topology is determined, and the water molecules usually cannot be assigned.
• The crystal structure is an average structure over space and time. Elevated B-factors give an estimate of the residual mobility of individual portions of a molecule.
• Cryoelectron microscopy is an alternative method for structure determination by diffraction experiments, in particular for membrane-bound proteins. Data are collected from thousands of tiny, razor-thin crystals.
• NMR spectroscopy records the resonance of magnetic nuclei such as ¹H, ¹³C, or ¹⁵N oriented in a strong magnetic field. The transition between parallel and antiparallel orientation of the nuclear spins can be induced by additional fields. Because the frequency at which these transitions take place depends on the chemical environment in a molecule, the spectral parameters contain information about the 3D structure of the molecules in solution.
• The multitude of recorded spectral parameters can be transformed into distance maps. These can be translated into the spatial structure of the protein by using a distance geometry approach coupled with molecular dynamics simulations.
• It has been shown in many cases that the NMR structure of a protein in solution and the X-ray structure in a crystal largely coincide with one another. Differences are observed for the surface-exposed residues.
• Protein crystals contain up to 70% water and exhibit large water channels that pass through the crystal. Small molecules can diffuse through these channels and reach binding sites, provided the sites open onto one of the channels. The binding modes of small-molecule ligands can be easily determined by using soaking techniques.
• The significance of the architecture of proteins determined in a crystalline environment for biologically relevant conditions has been demonstrated. For example, enzyme reactions also take place when the protein is arranged in a crystalline state.
Bibliography
General Literature
Blundell TL, Johnson LN (1976) Protein crystallography. Academic, London
Drenth J (1994) Principles of protein X-ray crystallography. Springer, Berlin
Dunitz JD (1979) X-ray analysis and the structure of organic molecules. Cornell University Press, Ithaca
Friebolin H (2010) Basic one- and two-dimensional NMR spectroscopy. Wiley-VCH, Weinheim
Glusker JP, Trueblood KN (1985) Crystal structure analysis, a primer, 2nd edn. Oxford University Press, New York
Glusker JP, Lewis M, Rossi M (1994) Crystal structure analysis for chemists and biologists. VCH Publishers, New York
Pellecchia M, Bertini I, Cowburn D et al (2008) Perspectives on NMR in drug discovery: a technique comes of age. Nature Rev Drug Discov 7:738–745
Wüthrich K (1986) NMR of proteins and nucleic acids. Wiley, New York
Special Literature
DeRosier DJ (1993) Turn-of-the-century electron microscopy. Curr Biol 3:690–692
Wear MA, Kan D, Rabu A, Walkinshaw MD (2007) Experimental determination of van der Waals energies in a biological system. Angew Chem Int Ed 46:6453–6456
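A worked illustration of the soaking experiment discussed in this chapter, in which binding constants were derived from crystallographic occupancies: the following is a minimal sketch only. It assumes a simple single-site binding model, occupancy = c / (c + Kd), which the text does not state explicitly, and it uses synthetic occupancies generated from an arbitrarily chosen Kd rather than measured data. The function names `occupancy` and `estimate_kd` are hypothetical.

```python
def occupancy(conc, kd):
    """Fractional occupancy of a single binding site (assumed model)."""
    return conc / (conc + kd)


def estimate_kd(concs, occs):
    """Least-squares Kd from the linearized form 1/occ - 1 = Kd * (1/conc).

    The line is forced through the origin, so the slope is
    sum(x*y) / sum(x*x) with x = 1/conc and y = 1/occ - 1.
    """
    xs = [1.0 / c for c in concs]
    ys = [1.0 / o - 1.0 for o in occs]
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Hypothetical soaking concentrations (arbitrary units) and synthetic
# occupancies computed from an assumed Kd of 2.0; not experimental values.
concs = [0.5, 1.0, 2.0, 5.0, 10.0]
occs = [occupancy(c, 2.0) for c in concs]

kd_fit = estimate_kd(concs, occs)
print(f"fitted Kd = {kd_fit:.3f}")
```

With real occupancies, which carry experimental error, a nonlinear fit of the untransformed model would usually be preferable to this linearization, because taking reciprocals amplifies the error of small occupancies.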
14 Three-Dimensional Structure of Biomolecules
In drug design, the focus is on the ligand, which is generally a small organic molecule with a molecular weight below 500 Da. It interacts with a macromolecular receptor and influences the receptor’s characteristics. Conversely, the surrounding receptor can also determine the properties of the bound active ligand. Selective interference in these interactions requires an understanding not only of the ligand but also of the receptor. Now that the methods for the structure determination of biomolecules have been introduced in the last chapter, we want to take a look at what can be learned about the construction principles and characteristics of these molecules. Proteins are made up of 20 basic building blocks, the amino acids (see Appendix 1). A dipeptide is formed by coupling two amino acids through an amide bond. Larger peptides and proteins are formed by the addition of further amide bonds.
14.1 The Amide Bond: Backbone of Proteins
The simplest molecule with an amide bond is formamide 14.1. Its structure is shown in Fig. 14.1. The amide linkage occurs many hundreds of times in proteins, for instance, over 50,000 times in the shell of the rhinovirus. The lengths of the bonds from the carbon atom to the oxygen and nitrogen atoms can be obtained from the crystal structure of formamide. The microwave spectrum of gas-phase formamide also affords bond lengths, but different values are obtained. In the gas phase, formamide is “isolated,” that is, it does not “perceive” any neighbors in its immediate vicinity. The C═O double bond is shorter, and the C–N single bond is longer than in crystalline formamide. In the crystal assembly, the individual formamide molecules are not “alone.” They are connected to neighboring molecules by hydrogen bonds. A hydrogen bond is a non-covalent interaction.
G. Klebe, Drug Design, DOI 10.1007/978-3-642-17907-5_14, © Springer-Verlag Berlin Heidelberg 2013
It couples a functional group carrying a hydrogen atom (e.g., NH or OH) with an electronegative heteroatom (e.g., N, O; ▶ Sect. 4.2). Obviously, incorporating a molecule into a network of
hydrogen bonds causes a change in its geometry. The electron density between the atoms is shifted so that the C═O double bond becomes longer and consequently weaker. Simultaneously, the C–N single bond becomes shorter and stronger. Twisting the molecule around this bond away from planarity is therefore made more difficult.
The amide bond is the fundamental building block of proteins. Every third bond in the polymer chain is an amide bond. As we have seen in formamide, amide bonds have a planar geometry, that is, a plane can be defined through their atoms. The folding of the polymer chain and the resulting spatial construction of the protein are determined by the torsion angles between the planes of successive amide bonds (Fig. 14.2). The rigidity and planarity of the amide bond are decisive for the stability of the spatially folded protein. In proteins, the amide bonds occur almost exclusively in the trans configuration. Only the rotation of the amide planes relative to one another remains as a degree of freedom for the polymer chain. These torsions (▶ Chap. 16, “Conformational Analysis”) occur around the bonds that connect the amide planes to the Cα carbon atoms. As the bond-length comparison between gaseous and crystalline formamide showed, the decisive additional stiffening of the amide bond is caused by its incorporation into a hydrogen-bonding network.
Bond length in Å    C═O     C–N
Crystal assembly    1.241   1.318
Gas phase           1.219   1.352
Fig. 14.1 Formamide 14.1 is the smallest molecule that has an amide group. Its molecular structure is shown in the lower part. Because of thermal motion in the solid state, the molecule carries out vibrational movements. Its electron density is therefore distributed over a larger area. This is described by ellipsoids that enclose 50% of the probability of finding the atom. Two hydrogen bonds are formed between the carbonyl group and the amide group of a neighboring molecule in the crystal packing. An extended H-bond network stabilizes the crystal structure and polarizes the amide group. The bond lengths (in Å) are different in the crystal assembly and in the gas phase (upper part).
Fig. 14.2 The spatial course of a polypeptide chain is determined by the relative orientation of the planar peptide bonds (above). The twist of these planes against one another is measured by the two torsion (dihedral) angles φ and ψ. These cannot assume arbitrary values around the bond axes, but rather are limited to a few combinations of value ranges. In the diagram below, a so-called Ramachandran plot, the values of both angles are plotted for the residues along the peptide chain (φ on the horizontal axis and ψ on the vertical axis, each from −180° to 180°). The angle combinations for an α helix are found in the middle left (Fig. 14.3), and those for a β-pleated sheet in the top left (Fig. 14.4).
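The torsion angles plotted in a Ramachandran diagram can be computed directly from atomic coordinates. The following minimal sketch shows one common way to obtain a signed dihedral angle from four points and to sort a (φ, ψ) pair into a rough region of the plot. The function names and, in particular, the rectangular angle windows in `classify` are illustrative assumptions and not values taken from the text; real Ramachandran regions have irregular outlines.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond.

    For phi, pass C(i-1), N(i), CA(i), C(i);
    for psi, pass N(i), CA(i), C(i), N(i+1).
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / length for x in b1)
    # Components of the two outer bonds perpendicular to the central bond
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def classify(phi, psi):
    """Very rough region assignment; the windows are illustrative assumptions."""
    if -100 <= phi <= -30 and -80 <= psi <= -10:
        return "alpha helix"
    if -180 <= phi <= -45 and 90 <= psi <= 180:
        return "beta strand"
    return "other"


# Four hypothetical points with the outer bonds at a right angle
angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(round(angle, 1), classify(-60.0, -45.0), classify(-120.0, 130.0))
```

Applied to every residue of a structure, `dihedral` yields the point cloud of a Ramachandran plot; the box test in `classify` only serves to show where the helix and sheet regions named in the caption roughly lie.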
14.2 Proteins Fold in Space to Form α Helices and β Strands
The two dihedral angles around the Cα carbon atom are conventionally named φ and ψ, and they usually take on value pairs from two ranges. These ranges correspond to a helical or a sheet-like course of the polymer chain (Fig. 14.2). In an α helix with a right-handed turn, all CO groups point in one direction along the helix axis, and all NH groups point in the opposite direction (Fig. 14.3). They form a network of H-bonds among themselves. Each amino acid in the helix is hydrogen-bonded to the amino acid four positions further along the sequence.
Fig. 14.3 The α helix is a commonly found secondary structure. The polypeptide chain forms a right-handed spiral with a pitch of 5.4 Å and 3.6 amino acids per turn (a). All carbonyl groups (oxygen is red) are oriented parallel to the helix axis in the same direction. The NH functionalities (nitrogen is blue, hydrogen is light blue) are oriented in the opposite direction. The groups form a pronounced hydrogen-bond network (violet dashed line) between themselves (b). The side chains (R) on the Cα atoms are on the outside, pointing away from the helix axis. This forms a typical furrow pattern that runs in a spiral over the surface. This “ridge and groove” pattern determines the mutual packing of α helices in proteins.
This unidirectional orientation of the polar groups of the
amide bonds in an α helix has consequences for the electrostatic characteristics (▶ Sect. 15.4). Whereas a helix is made up of amino acids from a single segment of the peptide, amino acids from at least two sequence sections must come together to form a β-pleated sheet. The two strands can be bonded to each other in either a parallel or an antiparallel orientation relative to the direction of the polymer chain (Fig. 14.4). The H-bond network follows a different pattern in each of the two orientations.
Fig. 14.4 A second important secondary structure, the β-pleated sheet, is composed of multiple sections of the polymer chain, the β strands, which exist in a stretched conformation (top; the panels show the antiparallel and the parallel arrangement, and a distance of 7 Å is marked). The strands can run parallel or antiparallel. They are crosslinked to each other via hydrogen bonds (violet). The sheet-like structure displays a zigzag wrinkle and is called a β-pleated sheet. The side chains (R) of the amino acids point away from the pleated sheet, alternating above and below it.
The side
chains alternate above and below the pleated sheet. The entire strand is slightly twisted upon itself. Because of this, a pleated sheet of multiple strands has a twist to it when viewed from the side (Fig. 14.5).
Fig. 14.5 Within a β-pleated sheet of multiple strands, here shown with a parallel orientation, a right-handed twist occurs. For simplification, each β strand is represented by an arrow. The twist can be seen from the internal rotation of the arrows. The pleated sheet is shown here in two perpendicular views.
Aside from these two common secondary structures, other typical combinations of torsion angles occur. A polymer chain that folds to a globular structure in space must reverse its direction. This is achieved in the so-called turn or loop regions. Turns can be classified according to the number of amino acids involved and the type of interaction that closes the turn. Three groups can be distinguished (Fig. 14.6): normal turns that form a C═O···H–N hydrogen bond in the direction of the polymer chain, inverse turns with hydrogen bonds in the reversed orientation, and open turns that are held together by van der Waals and polar interactions. A total of 158 turn classes were summarized in a recent evaluation by Oliver Koch.
What force drives the organization of a protein? Amino acids possess hydrophilic and hydrophobic side chains. Hydrophobic groups avoid aqueous environments (▶ Sect. 4.2). During the folding of the polymer chain in an aqueous medium, the hydrophobic amino acids aggregate to minimize their combined hydrophobic surface. That is why the hydrophobic amino acids are predominantly found in the interior of a folded protein. The polar groups of the amide bonds of the main chain are saturated by hydrogen bonds in the secondary structure. The side chains of polar amino acids are only found in the interior of a protein if they can form a polar interaction with another amino acid in the vicinity. Otherwise they orient themselves toward the outside of the protein and protrude into the surrounding water. Proteins can also span a cell membrane. In those areas where they are in contact with the membrane, they have a large, cohesive hydrophobic surface (Sect. 14.7). If the packing density in the interior of the protein is
considered, it is on the same scale as is found in crystals of small organic molecules. The interactions that determine the molecular packing are identical in both cases.
14.3 From Secondary Structure Via Motifs and Domains to Tertiary and Quaternary Structure
Proteins organize their secondary structural segments in motifs. For example, the sequence of an α helix, a β strand, and another α helix makes up one motif. Multiple motifs fold into domains to yield the tertiary structure of a protein. Domains can be built predominantly from helices, from pleated sheets, or from a combination of both building blocks. Often the domain has a particular function. Many proteins are made up of a single domain. Complex proteins can be built from multiple domains. If multiple separate polymer chains form a complex assembly (e.g., as in hemoglobin), this is referred to as quaternary structure.
Despite the enormous diversity that can be achieved by combining the 20 amino acids into sequences, there seems to be a rather limited number of folding possibilities for the domains. How many folding patterns exist in total is a matter of speculation. Among all the crystal structures known today, 1,150 different folding patterns have been found. Because no new examples have been found in recent years despite intense efforts, it can be assumed that there are perhaps 1,200 stable patterns. This number is essentially based on data from globular enzymes and transport proteins. Approximately 30% belong to one of the classes shown in Fig. 14.7. To date, perhaps only 100 structures are known from the group of membrane-bound proteins. On the basis of these examples, it seems difficult to estimate how many additional folding classes may be found in membrane proteins.
Drug design concentrates on the interaction of a ligand with a protein.
Fig. 14.6 The polymer chain of a globular protein reverses its direction in the loop or turn area. Numerous turn patterns have been found. They are made up of 2–6 amino acids. Normal turns (left) form a C═O···H–N hydrogen bond (violet) in the direction of the polymer chain. This hydrogen bond has the reverse orientation in inverse turns (middle). Another group of open turns (right) is held together by van der Waals contacts and polar interactions. [Figure labels: COi–NHi+n, Cαi–Cαi+n, NHi–COi+n; 3–6, 2–6, and 4–6 amino acids.]
Fig. 14.7 The course of the polypeptide chains is symbolized with spirals for α helices, with arrows for β-pleated sheets, and with threads for the different turn segments. Approximately 30% of the structurally known proteins can be assigned to one of the nine folding classes shown. The first folding pattern (bottom left) is a “TIM barrel,” and the one above is an open pleated-sheet structure.
Therefore the structural considerations of chemists are usually limited to the amino acid
groups that protrude into the binding pocket. The folding pattern in the vicinity of the binding pocket, however, exerts an influence on the properties that are found there. For example, a helix that points toward the binding pocket decisively determines the local electrostatic potential. This, too, can be exploited for the design of selective ligands that bind only to proteins of a particular folding class.
Despite progress in the techniques of structure determination, the structure analysis of an important protein may fail, whereas the structure of, for example, a related protein can be solved. A model of the desired protein can be built on this basis (▶ Sect. 20.5). Information about the construction and folding principles of proteins is needed for this purpose. It allows an understanding of which parts of the protein stabilize the scaffold, which parts determine function, and which parts make up the differences between homologues.
An in-depth discussion of these principles would go too far here. As an example, the folding pattern of the β barrel will be examined. A stretched-out sheet of multiple β strands has an internal twist (cf. Fig. 14.5). If, for example, eight such strands are lined up next to one another, a cylinder is formed. This barrel-like folding pattern of eight or more strands is often observed. Several variations of this folding pattern are displayed in Fig. 14.8, which show how, and according to which principles, a polypeptide can fold in space. In the examples in Fig. 14.8, a loop acts as the connecting element between the pleated-sheet strands of the β barrel. α Helices can also serve as connecting elements (Fig. 14.7). A barrel-like structure forms in the core, and the bridging α helices align on its surface. This folding pattern was first discovered in triosephosphate isomerase. It is therefore called a TIM barrel (Fig. 14.7).
Another important folding class that is made up of α-helical and β-pleated sheet segments is the open-sheet structure (Fig. 14.7). In this class, the pleated sheet does not close to a cylinder but rather remains open. Helices group above and below the sheet.
14.4 Are the Fold Structure and Biological Function of Proteins Correlated?
How is the structure of a protein coupled to its function? Do all proteases, for example, display the same folding pattern? A large number of enzymes that have distinctly different functions belong to the TIM-barrel type or the open-sheet structure. There are many oxidases, isomerases, kinases, aldolases, synthases, dehydrogenases, and proteases that can be assigned to these two classes. Here, Nature started from a common origin and developed divergently. Consequently, the function of a protein is not necessarily coupled to a particular folding pattern. If the construction of the enzymes is analyzed further, it turns out that the catalytic sites of the proteins of a folding class are at the same position. In the TIM-barrel structure, the site is found at the C-terminal end of the barrel. In the open-sheet structure, it is found at the topological switch point where the connecting helices change from the upper to the lower side of the sheet (Fig. 14.9).
Fig. 14.8 The folding pattern of different β-barrel structures can be thought of as a polymer chain with eight separate β strands (arrows). These are separated by loop areas. (a) An up-and-down barrel forms when the folding of the polymer chain of eight β strands follows a zigzag pattern. The antiparallel sections form hydrogen bonds between themselves and close up to form a cylinder. (b) The four-β-strand polypeptide chains lie next to one another so that the first chain interacts with the fourth, and the second interacts with the fifth. Then the double strand folds, and the first pair comes to lie next to the second. Because the course of the polymer chain is reminiscent of the engravings on Greek vases, the pattern is called a Greek key. Two such patterns can come together in a cylinder-like orientation and form a Greek-key barrel. (c) Another folding pattern is formed from a double strand that is placed together with an internal twist. The double strand wraps itself into a cylinder-like structure that is called a jelly roll.
The function-determining amino acids occur in the loop area between neighboring pleated sheets and helices. Why would Nature follow this principle of separating the folding structure from the function? The amino acids that enable the stable folding of a domain are separated from those that provide a specific function. This approach is a very efficient evolutionary strategy. Two aspects were optimized simultaneously:
• The stability of the protein scaffold in special folding patterns
• The layout of the amino acid sequence to serve a special function.
Spatially separating the function-carrying groups and placing them in the structurally less-committed loop areas allowed the two tasks to be optimized in parallel. Exchanging a single amino acid in a secondary structure element could destabilize the entire folding pattern and stop the folding. This is avoided if the amino acid sequence that is to be functionally optimized is placed on a stable scaffold that is not affected by the optimization.
Fig. 14.9 The folding-pattern-determining and the function-carrying amino acids are found in different regions of proteins. (a) In a TIM-barrel-type structure (α helices: red cylinders, β strands: light-blue arrows), the catalytic site (yellow spheres), which binds and transforms substrates, lies at the end of the barrel where one would expect to find a lid. The loops of the polymer chain that surround this “lid” (gray and green threads) carry the function-determining amino acids. (b) In the open-pleated-sheet structure, the function-determining amino acids in the loop area occur where the attached helices change from the top to the bottom of the pleated sheet.
The immunoglobulins are a protein class that implements this principle to perfection. As antibodies they recognize and bind to xenobiotics, the antigens. To remove an antigen, immunoglobulins with highly specific binding pockets and high affinity must be available within a few days. The recognized substances could be anything from small organic molecules to large proteins. It is estimated that about 10^12 different variable sequences are formed, even though there are only about 25,000 human genes. Immune-system cells solve the difficult task of achieving such high diversity by combining different variable gene segments and by extensive amino acid exchange in these segments during lymphocyte maturation. In this way, variable loop areas are formed that are set upon a stable scaffold of
barrel-like pleated sheet structures (Fig. 14.10). The therapeutic value of such biomolecules (so-called biologicals) has been recognized. Many humanized antibodies are in development as therapeutics (▶ Sect. 32.3).
Fig. 14.10 The immunoglobulins form a highly specific binding pocket in which they recognize antigens, which are exogenous substances. The enormous structural variety of these binding pockets is achieved by variations in the amino acids in the loop areas. The immunoglobulins have a Y-like form that is divided into a trunk (constant Fc domain) and two identical Fab branches (a). The course of the polymer chain in these branches corresponds to the barrel type. The antigen-binding site is indicated by an arrow. Picture (b) is an enlargement of the circled branch in (a). Loops are found at the right end (colored) that are responsible for the recognition of exogenous substances. They grasp the antigen (here dark red) like the fingers of two hands.
14.5 Proteases Recognize and Cleave Substrates in Well-Tailored Pockets
Proteases cleave polypeptide chains during enzymatic degradation or upon the release of an active protein or peptide from an inactive precursor form. For this, the enzymes possess a catalytic site in which the cleavage takes place (Sect. 14.6 and ▶ Chaps. 23, “Inhibitors of Hydrolases with an Acyl–Enzyme Intermediate”; ▶ 24, “Aspartic Protease Inhibitors”; and ▶ 25, “Inhibitors of Hydrolyzing Metalloenzymes”). To recognize a particular substrate specifically, they have multiple binding pockets on their surface. These are structurally complementary to the side chains of the substrate that are arranged around the catalytic site. In 1967 Israel Schechter and Arieh Berger proposed a system of nomenclature to describe these pockets (Fig. 14.11). The positions of the amino acids of the peptide substrate are designated P3, P2, P1, P1′, P2′, P3′, and so forth. Reading from the N terminus, the position P1 is immediately before and the position P1′ is immediately after the cleavage site. The binding pocket of the enzyme for the side chain of the amino acid P1 is called S1, and the same goes for the other side chains. This very useful nomenclature is initially purely formal. Transferring these labels to a particular enzyme does not mean that the named binding pocket really exists. Two binding pockets can appear as one large binding pocket
in the 3D structure. The S3 and S4 binding pockets in the serine protease thrombin are really only one large pocket (▶ Sect. 23.3). It can also happen that a substrate amino acid has no complementary binding pocket in the enzyme. It then protrudes into the water.
14.6 From Substrate to Inhibitor: Screening of Substrate Libraries
Peptides are easily synthesized with enormous diversity (▶ Sect. 11.5). If the peptide is attached to a probe that changes its color or fluorescence upon release (▶ Sect. 7.2), the labeled peptide can be used to ascertain the substrate profile of a protease. For this purpose, a large library (▶ Sect. 11.1) of these peptides is offered to the protease, and the members that are cleaved well are identified. Figure 14.12 shows the amino acid compositions of labeled tetrapeptides that are preferentially cleaved by the proteases trypsin, factor Xa, plasmin, and chymotrypsin. Peptides with basic groups such as arginine or lysine are preferentially cleaved by trypsin, plasmin, and factor Xa. Factor Xa converts almost exclusively peptides with arginine in the P1 position. Chymotrypsin behaves entirely differently. It prefers aromatic amino acids such as tyrosine, phenylalanine, and tryptophan in the P1 position. The selectivity at the positions P2 to P4 is not nearly as pronounced. Trypsin transforms tetrapeptides that have bulky or branched groups at P2, such as Phe, Tyr, Trp, Ile, or Val, much more poorly if an arginine is at the P1 position. Basic groups are also less preferred. Trypsin shows virtually no selectivity at the P3 and P4 positions. Factor Xa has a particular preference for the small glycine at position P2, but hardly any difference at all is seen for the groups in the
Fig. 14.11 The side chains of a peptide substrate and the binding pockets that they belong to are classified on the N-terminal side of the peptide as P3, P2, P1 ... or S3, S2, S1 ... (left); on the C-terminal side they are classified as P1′, P2′, P3′ ... or S1′, S2′, S3′ ... (right).
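The Schechter and Berger nomenclature and the P1 preferences described in the text can be made concrete with a short sketch. It labels the residues around a chosen scissile bond and lists the bonds that a protease would be expected to cleave if only the P1 residue mattered. This is a deliberate simplification: the influence of P2 to P4 and of the primed positions is ignored. The P1 sets are taken from the text; the function names and the example peptide are hypothetical.

```python
# P1 preferences as stated in the text (one-letter amino acid codes)
P1_PREFERENCE = {
    "trypsin": {"R", "K"},
    "factor Xa": {"R"},
    "plasmin": {"K"},
    "chymotrypsin": {"F", "Y", "W"},
}


def schechter_berger_labels(peptide, n_before):
    """Label residues around the scissile bond that follows the first
    n_before residues (counted from the N terminus)."""
    if not 0 < n_before < len(peptide):
        raise ValueError("the scissile bond must lie inside the peptide")
    labels = {}
    for i, residue in enumerate(peptide):
        if i < n_before:
            labels[f"P{n_before - i}"] = residue      # P1 is next to the bond
        else:
            labels[f"P{i - n_before + 1}'"] = residue  # P1' follows the bond
    return labels


def pocket_for(label):
    """Formal name of the enzyme pocket for a substrate position."""
    return "S" + label[1:]


def predicted_cleavage_sites(peptide, protease):
    """Positions n_before whose P1 residue matches the protease preference."""
    preferred = P1_PREFERENCE[protease]
    return [i + 1 for i, res in enumerate(peptide[:-1]) if res in preferred]


peptide = "GAKFGRWL"  # hypothetical sequence, for illustration only
for protease in P1_PREFERENCE:
    print(protease, predicted_cleavage_sites(peptide, protease))
print(schechter_berger_labels(peptide, 6))
```

As the text points out, the pocket names returned by `pocket_for` are purely formal: a label such as S3 does not guarantee that a separate pocket exists in a given enzyme.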
Fig. 14.12 A tetrapeptide library with the positions P2 to P4 held constant was varied at position P1 with 19 amino acids (one-letter notation; n = norleucine). It is cleaved by trypsin after arginine and lysine, by factor Xa after arginine, and by plasmin after lysine (a). If arginine is held constant in position P1 and the remaining three positions are varied, trypsin shows practically no selectivity for the amino acids at P2, P3, and P4. On the other hand, factor Xa prefers a glycine in position P2 (b). [Figure panels: (a) trypsin, factor Xa, plasmin, chymotrypsin; (b) positions P4, P3, and P2 for trypsin and factor Xa.]
P3 position for this enzyme. On the other hand, different groups in the P4 position are more strongly selected. The substrate-binding profile helps to expose the selectivity characteristics of enzymes. It reflects the complementary properties of the binding pocket and helps to inspire first ideas for the design of conceivable inhibitors.
This concept was applied to cysteine proteases in the research group of Jonathan Ellman at the University of California at Berkeley. Substrate molecules were synthesized that carried a fluorescence marker at the end of an amide bond that was to be cleaved. Different organic building blocks were placed on the other side. If such a substrate molecule is cleaved by the protease, the organic part must be bound in the binding pocket of the enzyme. Therefore, the transformation indicates the binding of a test molecule. The method is very well suited for screening. A hit that is discovered in this way can easily be chemically transformed from a substrate molecule to an inhibitor. If the cleaved amide bond is replaced with, for instance, an aldehyde function, a cysteine-protease inhibitor (▶ Sect. 23.9) can be developed that has very little in common with the peptide substrate.
14.7 When Crystals Learn to Walk: From Static Crystal Structures to Dynamics and Reactivity
What kind of information about the dynamics and reactivity of molecules can be extracted from a crystal structure? Molecular vibrations are visible even in the solid state. This is reflected in the blurriness of the electron density. If a molecule takes part in a reaction, bonds are broken and new ones are formed. The formation and cleavage of amide bonds is a central task in biochemical processes. The molecule 14.2 contains an amide and an ester group (Fig. 14.13). If a crystal of this compound is exposed to thermal energy, a reaction takes place in the solid state to form 14.3.
The molecule is in a geometry in the incipient crystal structure that is conducive for entry into the reaction pathway. Having information about changes in the geometric orientation of functional groups in the chemical reaction is decisive for understanding the concomitant structural changes that occur. This knowledge is a prerequisite for the design of transition-state-analogue inhibitors (▶ Sects. 6.6 and ▶ 22.3). In view of the for- mation or cleavage of an amide bond, the question is posed: from which direction does the amino group attack the carbonyl carbon in the course of the nucleophilic addition to form a new bond? In the early 1970s Hans-Beat B€ urgi and Jack Dunitz began to extract information about the geometric changes along such reaction steps from crystal structures. Before there were movies and television, people developed creative ideas to bring pictures to movement, for example, with flip-books (Fig. 14.14). These impart the impression of the dynamic sequence of a story. Let us imagine that because of frequent use, the pages of the little book have fallen apart and are now in disarray. You must bring them into the correct order again. Ordering criteria are needed in this case. A similar 14.7 When Crystals Learn to Walk 305
task is posed for the organization of structural data to describe a reaction. Crystal structures in which an amino group is in the vicinity of a carbonyl group, as in the structure of 14.2, are retrieved from databases of known crystal structures (▶ Sect. 13.9). Finally they are brought into a logical order (Fig. 14.15).

The systematic comparison of crystal structure data affords a first understanding of structural molecular properties, for instance, about the preferred conformation (▶ Sect. 16.4). The geometry of non-covalent interactions can also be evaluated this way. The side chain of the amino acid histidine contains an imidazole ring with its two nitrogen atoms. In the neutral state one of these nitrogen atoms is a hydrogen-bond acceptor, and the other is a donor. There are hundreds of molecules with an imidazole ring in the database of low-molecular-weight crystal structures. In these structures the imidazole ring does, in fact, form acceptor and donor interactions, usually with neighboring molecules. All these structures are superimposed upon one another based on their common imidazole ring (Fig. 14.16). The superposition shows in which spatial directions the hydrogen-bonding partners of the imidazole nitrogen atoms are found. The task of estimating the possible interaction positions in the binding site of the protein for the functional groups of a ligand is undertaken in the course of de novo drug design (▶ Chap. 20, “Protein Modeling and Structure-Based Drug Design”). Further, this information is needed for comparing the binding properties of molecules (▶ Chap. 17, “Pharmacophore Hypotheses and Molecular Comparisons”) or for the exploration of binding pockets for their preferred ligand-binding sites (hot spots).

Fig. 14.13 If thermal energy is applied to a crystal of 14.2, the carbonyl group of the ester function reacts with the amide NH2 group and an imide bond is formed between N(1) and C(8) to give 14.3 (a). A vibrational motion (b) must be assumed that leads into the reaction. Simultaneously the ester bond between C(8) and O(2) is cleaved during the reaction steps.
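The flip-book idea of ordering crystal structures along a reaction coordinate can be illustrated with a small sketch. It assumes that each database hit has been reduced to five atom positions (the attacking nitrogen, the carbonyl carbon, and the three atoms bonded to that carbon); the coordinates below are invented for illustration and are not taken from any database entry. The fragments are sorted by their N...C distance, and for each one the displacement of the carbon out of the plane of its three neighbors is reported.

```python
import math

def sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def approach_geometry(n, c, o, r1, r2):
    """Return (N...C distance, displacement of C out of the O/R1/R2 plane).

    The displacement is counted as positive when the carbon atom moves
    toward the approaching nitrogen atom.
    """
    normal = cross(sub(r1, o), sub(r2, o))
    length = math.sqrt(dot(normal, normal))
    normal = tuple(x / length for x in normal)
    if dot(normal, sub(n, o)) < 0:  # orient the plane normal toward N
        normal = tuple(-x for x in normal)
    distance = math.sqrt(dot(sub(n, c), sub(n, c)))
    return distance, dot(normal, sub(c, o))

# Invented fragments (name, N, C, O, R1, R2), deliberately out of order.
O, R1, R2 = (1.2, 0.0, 0.0), (-0.6, 1.04, 0.0), (-0.6, -1.04, 0.0)
fragments = [
    ("B", (-0.60, 0.0, 2.10), (0.0, 0.0, 0.10), O, R1, R2),
    ("C", (-0.45, 0.0, 1.75), (0.0, 0.0, 0.30), O, R1, R2),
    ("A", (-0.80, 0.0, 2.80), (0.0, 0.0, 0.00), O, R1, R2),
]

frames = [(name, *approach_geometry(n, c, o, r1, r2))
          for name, n, c, o, r1, r2 in fragments]
# Ordering criterion: long contacts first, nearly bonded fragments last.
frames.sort(key=lambda frame: frame[1], reverse=True)

for name, distance, displacement in frames:
    print(f"{name}: d(N...C) = {distance:.2f} A, out of plane = {displacement:.2f} A")
```

In this toy data set the out-of-plane displacement grows as the N...C distance shrinks, which is the kind of correlation the ordered structures are meant to reveal. The choice of the N...C distance as the only ordering criterion is a simplification made for this sketch.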
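Superimposing many structures on a common ring is equivalent to expressing every contact atom in a coordinate system that is fixed to the ring. The following sketch takes that route. The function names and all coordinates are hypothetical, and a real analysis would use a least-squares fit over all ring atoms rather than the three atoms used here.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(a):
    length = math.sqrt(dot(a, a))
    return tuple(x / length for x in a)

def ring_frame(origin, neighbor_a, neighbor_b):
    """Orthonormal axes fixed to the ring: x points from the nitrogen
    (origin) to one ring neighbor, z is the ring normal, y completes
    the right-handed system."""
    x = unit(sub(neighbor_a, origin))
    z = unit(cross(x, sub(neighbor_b, origin)))
    y = cross(z, x)
    return x, y, z

def contact_in_ring_frame(contact, origin, neighbor_a, neighbor_b):
    """Coordinates of a contact atom relative to the ring-fixed axes."""
    axes = ring_frame(origin, neighbor_a, neighbor_b)
    d = sub(contact, origin)
    return tuple(dot(d, axis) for axis in axes)

def moved(p):
    """Rotate by 90 degrees about z and translate: mimics the same
    fragment found in a different orientation in another crystal."""
    x, y, z = p
    return (-y + 5.0, x - 2.0, z + 3.0)

# Invented positions: ring nitrogen, its two ring neighbors, H-bond partner.
ring_n, ring_ca, ring_cb = (0.0, 0.0, 0.0), (1.1, 0.8, 0.0), (-1.1, 0.8, 0.0)
partner = (0.0, -2.9, 0.3)

first = contact_in_ring_frame(partner, ring_n, ring_ca, ring_cb)
second = contact_in_ring_frame(moved(partner), moved(ring_n),
                               moved(ring_ca), moved(ring_cb))
print(first)
print(second)
```

Both calls give the same ring-fixed coordinates for the partner atom, so contacts collected from differently oriented crystal structures end up in one common scatter plot around the ring.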
The database IsoStar, assembled at the Cambridge Crystallographic Data Centre, makes numerous such contact geometries and spatial distributions available.

Fig. 14.15 The formation or cleavage of an amide bond occurs by nucleophilic addition. A nucleophile, for instance, an oxygen or nitrogen atom, approaches the planar carbonyl carbon atom. During the reaction the carbon atom rises out of the plane of its three neighbors and adopts a tetrahedral configuration. Examples were sought from low-molecular-weight crystal structures in which a nitrogen atom approaches a carbonyl group at a distance between that of a single bond and that of a van der Waals contact in the crystal packing. By superimposing these data it becomes recognizable that the nucleophilic nitrogen approaches the carbonyl group obliquely from behind. With this approach the carbon migrates out of the plane in the direction of the nucleophile. The geometry of this reaction step also determines the structural composition of the catalytic center of a variety of hydrolases (▶ Sect. 22.3).

14.8 Solutions to the Same Problem: Serine Proteases with Differing Folds Have Identical Function

It was shown in Sect. 14.4 that the amino acids that determine the folding and those that determine the function of a protein occur in separate parts of the structure. For enzymes with the same function, Nature has arrived at the same solution even with different folds. The function and therapeutic relevance of serine proteases will be discussed in more detail in ▶ Chap. 23, “Inhibitors of Hydrolases with an Acyl–Enzyme Intermediate”. A unit of three amino acids, the so-called catalytic triad, plays a key role in accelerating the hydrolysis of amide bonds by these enzymes. The two amino acids serine and histidine, and an acidic amino acid, such as aspartic or glutamic acid, are found in a characteristic spatial orientation. They are defined by the narrow
boundaries that are established by the reaction geometry required for a nucleophilic addition (Sects. 14.6 and ▶ 23.2). Their composition is ideally suited for the cleavage of amide bonds. The enzyme trypsin is constructed from two barrel-like subunits (Fig. 14.17a). The catalytic site is located at the interface of these two subunits. Subtilisin is another serine protease that belongs to the class of open-pleated-sheet structures. The catalytic triad occurs in a loop area at the edge of the pleated sheet (Fig. 14.17b). If the amino acids that are involved in the catalysis are extracted from the two proteins and superimposed in space, the identical geometry of the triad becomes obvious. In addition to the mentioned enzymes, this catalytic triad is also encountered in lipases and esterases (▶ Sect. 23.7), which also cleave peptide or ester bonds. Although they display divergent scaffold folding, the geometric orientation of their triads is once again identical.

Fig. 14.16 The crystal packing of low-molecular-weight compounds affords an overview of possible interaction geometries of hydrogen-bond donors (left) and acceptors (right) around the nitrogen atoms of an imidazole ring. For this purpose, all structures with an imidazole ring were retrieved in which at least one of the two nitrogen atoms participates in a hydrogen bond. The superposition of the structures shows where the positions of the interacting partners can be expected.

14.9 DNA as a Target Structure of Drugs

Our genetic information is encoded in the DNA molecule. It is a thread-like molecule approximately 20 Å in diameter and reaches a length of up to 2 m in the extended form. It is constructed as a double helix (Fig. 14.18). On the outside, polymer chains of sugar and phosphate building blocks wind like banisters around the base pairs. The bases form a complementary pair on each step. The two bases of a pair are linked to one another by a hydrogen-bond pattern. In doing so, a purine base (adenine A or guanine G) always interacts with
a pyrimidine base (cytosine C or thymine T; in the related RNA molecule, thymine is replaced by uracil U; Figs. 14.18 and 14.19). The spiral staircase that is formed has a pitch of 34 Å and completes a full turn after ten steps. The two mutually wound polymer strands form two grooves of different sizes on their surfaces (Fig. 14.18). If the DNA is examined from the side along the steps at the major and minor groove, the characteristics of the base pairs become visible. There are three functionalities in the minor groove that determine the interaction with other molecules. In the major groove there are four. Interestingly, the pattern that is read in the major groove is unambiguous because of the properties that each base pair exposes on its step. In the minor groove, it is only possible to distinguish an AT or TA pair from a GC or CG pair (Fig. 14.19).

Fig. 14.17 Trypsin (a, red) and subtilisin (b, green) are serine proteases. They have the same catalytic triad of serine, histidine, and aspartic acid. These function-determining amino acids are, however, placed upon entirely different folding patterns. In the upper right picture, the chain traces of both proteins are superimposed upon one another (c). Despite this, the side chains of the amino acids of the catalytic triad are in the same spatial position (d). The courses of the polymer chains are shown as colored ribbons together with the spatial orientation of the side chains of the three catalytic amino acids.

The base pairs on each three neighboring steps code for an amino acid (▶ Sect. 32.6). To read this information unambiguously from the DNA, proteins that regulate gene expression (so-called transcription factors) read the information
from the major groove, from the side (cf. ▶ Sect. 28.2). Only there is it possible to read the prescribed code (AT, TA, GC, CG) unambiguously. Due to the many outwardly oriented phosphate groups, the DNA molecule is highly charged. This charge is neutralized by the formation of ion pairs, mostly with magnesium.

Fig. 14.18 The DNA molecule is built of single stair steps. A base pair forms each step. The sugar phosphate chain suspends the steps like a double banister. It forms a major and a minor groove on the surface. (a) A segment of DNA with 14 base pairs, (b) a schematic representation with the sugar phosphate backbone as a gray arrow, thymine (light blue), adenine (red), cytosine (violet), and guanine (light green). (c) A model of a DNA surface in which the size difference between the minor and major grooves is emphasized. The individual bases are colored according to their interaction properties (blue: H-bond donor, red: H-bond acceptor, gray: hydrophobic contact).

Because of its important role in the transmission of genetic information, several important drugs act on DNA. Two examples are briefly mentioned here. Cisplatin 14.4 is a reactive metal complex that can react with the nitrogen atoms of two nucleobases on two adjacent steps of the DNA by exchanging both chlorine substituents (Fig. 14.20). This crosslinking distorts the DNA in such a way that the sequence information is no longer readable. Cisplatin and analogous derivatives such as carboplatin are used in cancer therapy as potent chemotherapeutics. Daunorubicin 14.5 is a representative with a somewhat different mode of action, but it also prevents the reading of the DNA base pairs. By slightly spreading the DNA along the chain, the planar part of 14.5 slips largely between two adjacent base pairs and causes a structural distortion of the DNA (intercalation). This intravenously administered cytostatic is used in combination regimens for the treatment of acute leukemias. Many natural products also use this so-called intercalation mechanism for
their antibacterial activity spectrum. Other pharmaceutical research approaches try to use segments of DNA themselves for therapy. Such modified-oligonucleotide therapeutics are discussed in ▶ Sect. 32.4.

Fig. 14.19 The DNA base pairs of cytosine (C) with guanine (G) and thymine (T) with adenine (A) on the individual steps are formed by complementary hydrogen bonds. Each base carries a sugar phosphate group that is coupled with the polymer chain. This affords a double-helical construction with a minor (green) and major (yellow) groove (cf. Fig. 14.18). If viewed parallel to the steps, four groups can be seen in the major groove that act as hydrogen-bond donors (blue) or acceptors (red), or that have hydrophobic properties (gray). Three such groups are aligned in the minor groove. If an attempt is made to read the interaction pattern from this side, a GC pair cannot be distinguished from a CG pair, nor an AT pair from a TA pair. Here the orientation of the interaction pattern cannot be distinguished. In the major groove, on the other hand, the pattern of exposed interactions is unambiguous. Therefore, proteins read the information on the DNA from the major groove.

14.10 Synopsis

• Every third bond in the polymer chain of a protein is an amide bond. It is the fundamental building block in the protein backbone, and the mutual spatial arrangement of the sequential planar amide bonds determines the overall architecture of a protein.
• Typical arrangements involving the amide NH and C═O groups in hydrogen bonds lead to α-helical and β-pleated-sheet structures. Reversal of the polymer chain in space is achieved in turns that can adopt a variety of distinct geometries.
• Helices, sheets, and turns, the secondary structure elements, assemble into motifs and domains to form the tertiary and quaternary structure of proteins.
• The function of a protein is not necessarily coupled to a particular folding pattern; however, the catalytic and ligand-binding sites within a folding class are found at the same position.
• Nature separates fold-stabilizing residues from function-carrying amino acids to keep the two optimization problems apart.
• Proteases recognize peptide sequences specifically via binding in well-tailored pockets on both sides of the cleavage site.
• Peptide libraries with an attached photometric or fluorescent label that can be cleaved off by the protease reaction help to elucidate the substrate profile of different proteases.

Fig. 14.20 Crystal structure of an oligomeric DNA segment after a reaction with cisplatin 14.4 (a) or intercalation with daunorubicin 14.5 (b). In both cases the DNA molecule is severely distorted and the genetic information on the DNA cannot be read for cell division. Cisplatin reacts with the nitrogen atoms of two nucleobases (here guanine) of the DNA on neighboring steps with substitution of both chlorine atoms. With its planar tetracyclic ring system, daunorubicin intercalates between two neighboring base pairs by spreading the DNA along the helix axis. The compound’s amino sugar is accommodated in the minor groove of the DNA.
• Structural arrangements of molecular portions found in multiple crystal structures can be arranged in a sequential order to provide an idea of a dynamic process.
• The spatial arrangement of amino acid residues performing a particular chemical transformation is highly conserved and can be found with similar geometry on protein architectures that are constructed from different folds.
• The DNA molecule encodes our inheritance and forms a double helix of two banister-like sugar-phosphate polymer chains wrapping around complementary pairs of bases on successive steps. Through the H-bonding pattern of the central bases each single DNA strand is complementary to the second strand. A minor and a major groove are formed between the sugar-phosphate banisters. A unique reading of the coding base pairs can be accomplished from the major groove only.

Bibliography

General Literature

Branden C, Tooze J (1999) Introduction to protein structure, 2nd edn. Garland, New York
Bürgi HB, Dunitz JD (1994) Structure correlation, vol 1. VCH, Weinheim
Jeffrey GA, Saenger W (1991) Hydrogen bonding in biological structures. Springer, Berlin
Schulz GE, Schirmer RH (1978) Principles of protein structure. Springer, New York

Special Literature

Allen FH, Kennard O, Taylor R (1983) Systematic analysis of structural data as a research technique in organic chemistry. Acc Chem Res 16:146–153
CSD Database: www.ccdc.cam.ac.uk/products/csd/
Klebe G (1994) The use of composite crystal-field environments in molecular recognition and the de novo design of protein ligands. J Mol Biol 237:212–235
Koch O, Klebe G (2008) Turns revisited: a uniform and comprehensive classification of normal, open, and reverse turn families minimizing unassigned random chain portions. Proteins: Struct Funct Bioinform 74:353–367
Lario PI, Vrielink A (2003) Atomic resolution density maps reveal secondary structure dependent differences in electronic distribution. J Am Chem Soc 125:12787–12794
Orengo CA, Jones DT, Thornton JM (1994) Protein superfamilies and domain superfolds. Nature 372:631–634
PDB Database: http://www.rcsb.org/pdb/home/home.do
Vyas K, Manohar H, Venkatesan K (1990) Thermally induced O to N acyl migration in salicylamides. Thermal motion analysis of the reactants. J Phys Chem 94:6069–6073
Wood VJL, Patterson AW et al (2005) Substrate activity screening: a fragment-based method for the rapid identification of nonpeptidic protease inhibitors. J Am Chem Soc 127:15521–15527
15 Molecular Modeling

Molecules are most commonly communicated in chemistry as two-dimensional molecular representations. This formalism is tried and true and has proven to be enormously fruitful. The ability of a chemist to quickly comprehend and intellectually process structures should not be underestimated. The notation nonetheless has its limitations. In particular, the three-dimensional shape of a molecule is not directly apparent from the chemical formula. The geometry, however, is of great importance for the physical, chemical, and biological properties of drugs and consequently for drug design as well. Therefore structure determination (▶ Chap. 13, “Experimental Methods of Structure Determination”) is granted special importance. Whenever possible, the experimentally determined 3D structure of the active substance and the target protein is consulted to explain the structure–activity relationship. Often, however, these structures are not available. In these cases, the interpretation of the experimental results must rely on generated structural models.

15.1 3D Structural Models as Well-Established Tools in Chemistry

Three-dimensional structure models have been used since the days of Jacobus H. van’t Hoff and Joseph Le Bel. Emil Fischer reported in his book Aus meinem Leben about a vacation in Italy:

In the previous winter 1890/91 I was busy with the task of clarifying the configuration of sugar, without entirely achieving my goal. Then the thought came to me in Bordighera that the decision about the configuration of pentose has to do with its relation to trioxyglutaric acid. Unfortunately for lack of a model I could not tell to what extent such acids are possible according to theory and I therefore posed the question to Baeyer. He picks up such things with great enthusiasm, and directly constructed carbon atoms from balls of bread and toothpicks. But after many attempts he gave the cause up, ostensibly because it was too hard. Later in Würzburg after considering good models at length, I managed to find the conclusive solution.

G. Klebe, Drug Design, DOI 10.1007/978-3-642-17907-5_15, © Springer-Verlag Berlin Heidelberg 2013
Linus Pauling was the first to propose the α-helix as a secondary structure in proteins.

The key to Linus’s success was his reliance on the simple laws of structural chemistry. The α-helix had not been found by only staring at X-ray pictures. The essential trick, instead, was to ask which atoms like to sit next to each other. In place of pencil and paper, the main working tools for this work were a set of molecular models superficially resembling the toys of pre-school children.

With these sentences the Nobel prize winner James Watson described the approach of Pauling in his book The Double Helix. Pauling’s success was also based upon well-founded proficiency in theoretical chemistry. That is how Pauling knew that an amide bond is stiff and flat, whereas his rivals, William Bragg, Max Perutz, and John Kendrew, mistakenly assumed it to be flexible. James Watson and Francis Crick went the same way as Pauling in the search for the DNA structure:

We could thus see no reason why we should not solve the DNA problem in the same way [as Pauling]. All we had to do was build a set of molecular models and begin to play – with luck the structure would be a helix.

Working with molecular models cannot have been pure pleasure back then. In one place in the book, for example, Watson writes:

Our first minutes with the models, though, were not joyous. Even though only about fifteen atoms were involved, they kept falling out of the awkward pincers set up to hold them the correct distance apart.

Later other problems were mentioned:

No serious models were built, however, for several days. Not only did we lack the purine and pyrimidine components, but we had never had the shop put together any phosphorus atoms. Our machinist needed at least three days merely to turn out the simplest phosphorus atoms. . .

Against this background the achievement of Watson and Crick seems even more impressive. They were awarded the Nobel Prize in 1962 for the elucidation of the double-helix structure of DNA. This example should underscore the importance of models in science. To end with a word from Francis Crick: “A good model is worth its weight in gold.”

15.2 Strategies in Molecular Modeling

In contrast to the 1950s and 1960s, computers are available today with impressive graphical performance and high computing speed. Accordingly, programs are available for working with molecular models. The new field of molecular modeling has been established. This term encompasses the display and manipulation of realistic three-dimensional molecular structures along with the calculation of their physicochemical properties. The most important methods that are employed in the context of molecular modeling are summarized in Table 15.1.
Table 15.1 Overview of the most important molecular modeling approaches in pharmaceutical research

Technique | Objective
Interactive computer graphics | Display of 3D structures
Modeling small molecules | 3D structure generation (CONCORD, CORINA); molecular mechanics (force fields); molecular dynamics; quantum mechanical techniques; conformational analysis; calculation of physicochemical properties
Comparing molecules | Superimposition of molecules according to their similarity; volume comparisons; 3D-QSAR (e.g., CoMFA methods)
Protein modeling | Sequence comparisons; protein homology modeling; protein-folding simulations
Modeling of protein–ligand interactions | Binding constant calculations; ligand docking
Ligand design | Searches in 3D databases; structure-based ligand design; de novo design; virtual screening

In principle, molecular modeling can be approached from two sides. One possibility is to extrapolate the geometry and physicochemical properties of the investigated structure from known experimental data. In the other approach an attempt is made to obtain as accurate a computed prediction as possible by starting from first principles. Quantum chemical methods and force-field calculations belong to this strategy. In practice both approaches are used in parallel and are increasingly coupled to one another. When relevant experimentally determined structures are available, it would be unwise not to use them for the model construction. On the other hand, the quantum chemical and molecular mechanical approaches are broadly applicable and deliver reliable results.

The construction of a structural model is achieved in three steps:
• Generation of a starting model
• Optimization and analysis
• Work with the model.

It is advisable to stay as close to experimental structures as possible when generating the starting model. For this, the crystal structure of an active substance can be consulted. The Cambridge Crystallographic Database, in which experimentally determined structures of small molecules are stored, is searched,
and the geometries of the hits that most closely resemble the query molecule are used. In the next step the molecule is optimized by a force-field calculation. There are also standard programs for the generation of starting models that translate a 2D structure formula into a 3D spatial structure according to the principle of a molecular model kit. These “electronic molecule-construction kits” have lists of bond lengths and angles as well as preferred fragment geometries stored, and build molecules according to a sophisticated system of rules. In fractions of a second they determine the 3D spatial structure for the 2D structural formula. The program CONCORD from Robert Pearlman in Austin, Texas, and CORINA from Johann Gasteiger and Jens Sadowski at the University of Erlangen are among the most important. Both programs are used to generate 3D structures of small molecules. The 3D structure of a protein, however, cannot be built with these programs. More sophisticated techniques are necessary for proteins (▶ Sect. 20.1).

15.3 Knowledge-Based Approaches

Perhaps the most often used technique for molecular modeling is the so-called knowledge-based approach. Here an attempt is made to exploit the enormous accumulated knowledge from experimentally determined molecular structures, crystal packings, protein structures, protein sequences, structure–activity relationships from protein–ligand complexes, etc., to efficiently solve the problem at hand. Basically nothing more is done here than to imitate with a computer program the approach that a conscientious scientist would take. Initially as much experimental data as possible is collected and analyzed. Important information sources are the Cambridge Crystallographic Database with over 500,000 crystal structures of small molecules as well as the Protein Data Bank (PDB) with more than 80,000 protein and DNA structures. Physicochemical properties are also available in databases. The Beilstein database, with almost 10 million chemical structures, contains, for example, pKa values for more than 20,000 compounds. The challenge lies in extracting the data needed for the question at hand from the enormous amount of electronically available information. Furthermore, it must be considered that the data come from different sources and could be partially erroneous. The largest growth in electronically available data recently has occurred in the area of DNA sequences. Hundreds of genomes have been sequenced, and new ones are added weekly. The nearly endless number of sequences can only be handled with intelligent search protocols. Knowledge-based approaches play a central role in this area and in the modeling of protein structures.

15.4 Force-Field Methods

Force-field methods, also known as molecular mechanics, are empirical techniques for the calculation of molecular geometries. The goal of a force-field
calculation is the determination of an energetically favorable three-dimensional structure of a molecule, or of a complex of several molecules. The forces that act between the atoms are described in an analytical form with the appropriate parameters. Covalent and non-covalent forces are considered. The central idea of molecular mechanics is the assumption that the bond lengths and angles in molecules adopt values that are close to standard values. Steric interactions, that is, the repulsion of two atoms that are not directly connected to one another, can lead to the situation that some bond lengths and angles cannot adopt their ideal values. These repulsive interactions are also called van der Waals interactions. In 1946 it was proposed for the first time that three terms, van der Waals interaction, bond stretching, and angle deformation, should be enough to calculate the structure and energy of molecules. However, at that time the execution of such calculations was extremely difficult. It was only after the availability of computers increased that molecular mechanics calculations gained importance.

In addition to the three originally proposed terms, a typical force field that is used today contains at least one additional contribution that considers rotations around the dihedral angles (Fig. 15.1). Furthermore, many force fields use terms for electrostatic interactions. For this, a partial charge must be assigned to each atom. The sum of these charges results in the formal charge of the entire molecule. In most cases, this is set to zero. Coulomb’s law is used to describe the forces that occur between charges. According to this law, the force between two charges is proportional to the product of the charges and inversely proportional to the square of the distance between them; the corresponding potential energy is inversely proportional to the distance. The assignment of the charges and the correct choice of the dielectric constant ε are critical for the correct treatment of electrostatic energy contributions. The dielectric constant appears in the denominator of Coulomb’s law and can adopt values between ε = 1 for vacuum and ε = 80 for water. Electrostatic interactions are therefore strongly damped in water, whereas in a vacuum they reach much further. The choice of the correct dielectric constant for force-field calculations in proteins is very difficult. Many values between ε = 4 and ε = 20 have been tried. The constant is sometimes assumed to depend on the environment so that larger values are chosen next to the surface than for the protein interior.

The van der Waals interactions are described by the Lennard–Jones potential. This interaction has an attractive term that falls off with $1/r^{6}$ and a repulsive term that falls off with $1/r^{12}$ (Fig. 15.1). The combination of these terms results in a potential curve that rises very steeply close to the atoms and approaches zero as the distance becomes larger. In between it passes through a potential energy minimum (▶ Fig. 18.5). In addition to the $A/r^{12} - C/r^{6}$ form, other distance dependencies, such as different power laws or exponential functions, are also used in force fields.

A force field is derived by calibration against experimental data and the results of high-level quantum mechanical calculations. For this the 3D structures of small molecules as well as force constants derived from infrared and Raman spectroscopy are used. It is clear that different parameters must be used for a single bond between two carbon atoms than for a double bond. Therefore multiple different atom types per element are used in a force field. The crystal packing of small organic molecules can be consulted for nonbonding interactions. Amino acids and
many functional groups of active compounds can occur in a protonated or deprotonated state depending on the prevailing pH conditions (so-called titratable groups). The strength of the interactions strongly depends on the charge state of the involved functional groups.

$$E = E_{\text{bond length}} + E_{\text{bond angle}} + E_{\text{torsion}} + E_{\text{non-covalent}}$$

$$E = \sum_{\text{bond lengths}} \tfrac{1}{2} K_b (b - b_0)^2 + \sum_{\text{bond angles}} \tfrac{1}{2} K_\Theta (\Theta - \Theta_0)^2 + \sum_{\text{torsion angles}} \tfrac{1}{2} K_\Phi \left[ 1 + \cos(n\Phi - \delta) \right] + \sum_{\text{nonbonding atom pairs}} \left( A_{ij} r_{ij}^{-12} - C_{ij} r_{ij}^{-6} + \frac{q_i q_j}{\varepsilon r_{ij}} \right)$$

Fig. 15.1 E is the total energy of a molecule or a complex of several molecules. It is composed of various contributions. The first term describes the energy change upon stretching or compressing a chemical bond. In the example at hand, a so-called harmonic potential is used with the force constant $K_b$ and the equilibrium bond length $b_0$ as parameters. The energy as a function of the bond angle $\Theta$ is described in the second term. Here too, a harmonic potential is used, with the force constant $K_\Theta$ and the equilibrium angle $\Theta_0$. The third contribution describes the change in the energy upon changing the dihedral angle, and the last term stands for non-covalent interactions. The sum of three terms is used for this last contribution. The first term $A_{ij}/r_{ij}^{12}$ is always positive and rises quickly with decreasing distance. It describes the repulsion between atoms that come too close together. The contribution from $C_{ij}/r_{ij}^{6}$ is always negative and approaches zero with increasing distance $r_{ij}$, though not as fast as the repulsive term. It describes attractive interactions, which are also called dispersion interactions. Other attractive interactions exist between polar molecules that are also proportional to $1/r_{ij}^{6}$ (for a description of the potentials see ▶ Sect. 18.12, ▶ Fig. 18.5). The last term $q_i q_j/\varepsilon r_{ij}$ describes the electrostatic interactions according to Coulomb’s law, based on a point charge model. The dielectric constant is $\varepsilon$. The non-covalent contribution to the total energy, without the electrostatic term, is called van der Waals energy.

The acidity or basicity of a given functional group is
  • 338.
    determined by itspKa value. This indicates how easily a group accepts or releases a proton. This property, in turn, depends heavily upon the partial charge that the group carries and what other charges are in the immediate vicinity of the group. Thus, the pKa value shifts if a functional group comes into an altered environment. For example, carboxylic acids become more acidic when they are brought near a positive charge. Their acidic nature changes, on the other hand, if a partially negatively charged group is nearby. This effect must be considered in a reliable force-field calculation. An attempt can be made to predict the protonation state in protein–ligand complexes with such calculations. For this, the contribution to the energy content of the complex is determined by evaluating all possible combina- tions of states of titratable groups. In this way the shift in the pKa values of functional groups can be estimated. The importance of water as a binding partner in the formation of protein–ligand complexes was emphasized in ▶ Chap. 4, “Protein–Ligand Interactions as the Basis for Drug Action”. Complex formation causes a change in the solvation conditions for the involved molecules. This must be considered in the force-field calculations. For this, a force-field is combined with estimations for the contribution from solvation. Newer methods such as the MM-PBSA or MM-GBSA methods try to sum up these contributions over the local environment in a surface-dependant way. The choice of a relevant starting geometry is important for any force-field calculation. A force-field calculation leads to an energy minimization. By starting from an energetically unfavorable geometry, the force field drives “downhill” to the next local minimum on the multidimensional energy surface (▶ Sect. 16.2). If one starts with two different geometries, the resultant minimized structure can also be different. 
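The dependence of a minimization on its starting geometry can be illustrated with a minimal sketch. It assumes a purely hypothetical one-dimensional "energy surface" with two minima (the function `energy` and its coefficients are invented for illustration and are not a force-field) and uses plain steepest descent, which is only one of several possible minimizers.

```python
def energy(x):
    """Hypothetical one-dimensional energy surface with two minima."""
    return x**4 - 2.0 * x**2 + 0.3 * x


def gradient(x):
    """Analytical derivative dE/dx of the model surface."""
    return 4.0 * x**3 - 4.0 * x + 0.3


def minimize(x, step=0.01, tol=1e-8, max_iter=100_000):
    """Steepest descent: walk downhill until the gradient vanishes."""
    for _ in range(max_iter):
        g = gradient(x)
        if abs(g) < tol:
            break
        x -= step * g
    return x


# Two different starting "geometries" end in two different local minima.
left = minimize(-1.5)
right = minimize(1.5)
print(f"start -1.5 -> x = {left:.3f}, E = {energy(left):.3f}")
print(f"start +1.5 -> x = {right:.3f}, E = {energy(right):.3f}")
```

Each run only finds the minimum of the valley in which it started. Neither run "knows" about the other valley, which is why several starting points are needed to locate the lower of the two minima.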
Many molecules, and especially protein–ligand complexes, can adopt numerous energetically favorable conformations. It is therefore recommended that multiple force-field calculations be performed, starting from different geometries.

15.5 Quantum Chemical Methods

In quantum mechanical approaches, the electronic structure of molecules is calculated by using the Schrödinger equation. Its mathematically closed solution is, however, only possible for simple cases such as the hydrogen atom or the hydrogen molecular ion, H2+. For molecules with multiple electrons, approximate methods must be used for the solution of the quantum mechanical “many-body problem.” The most commonly used approximation is the so-called Hartree–Fock method. Here, the many-body problem is reduced to multiple single-body problems. The sum of the electron–electron interactions within a molecule is replaced with an effective field that can be iteratively refined. It is from this that the commonly used name, SCF (self-consistent field), is derived. Each electron in this model “sees”, in addition to the potential of the nuclei, the averaged potential of the remaining electrons. The state of each electron in a molecule is described by a single-particle
function, the so-called atomic orbital (AO) or, in a molecule, molecular orbital (MO). The wave function of the entire molecule is constructed as the antisymmetric product of these orbitals. The Hartree–Fock equation is obtained on the condition that optimally chosen orbitals lead to minimal energy. The main deficiency of the Hartree–Fock approach, namely the neglect of electron correlation, can be corrected with more elaborate methods, although the calculation time then increases severely.

Quantum mechanical ab initio calculations allow the calculation of the molecular structure and electron density distribution as well as molecular properties without the assumptions that are necessary for force-field calculations. In many cases it is difficult to predict the hybridization state of the atoms a priori. In the case of amines and sulfonamides, it is often impossible to predict whether the atoms that are bound to nitrogen are in the same plane or whether nitrogen is in a pyramidal environment. In a force-field calculation one must specify from the very beginning what atom type is to be assigned to which atom (i.e., for the above case, whether it should be a planar or a pyramidal nitrogen atom). If the wrong atom type is chosen, the resulting structure is, of course, meaningless. Quantum mechanical calculations require no such assumptions.

The majority of currently applied force-fields use a point-charge model to describe the electrostatic interactions. One possibility to derive the atomic charges is to calculate the electrostatic potential of a small molecule that contains the group in question by using quantum-mechanical methods. Subsequently, a set of partial charges is assigned to the various nuclei so that the quantum mechanically calculated potential is reproduced as accurately as possible. These charges can then be transferred to force-field calculations to be used in a large system.
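The fitting of point charges to an electrostatic potential can be sketched as a small least-squares problem. The example below is a toy under stated assumptions: a hypothetical neutral diatomic fragment (`atom_a`, `atom_b`), an invented grid of sampling points, units in which Coulomb's constant equals 1, and a "reference" potential that is generated from a known charge instead of coming from a quantum-mechanical calculation.

```python
import math


def inverse_distance(p, q):
    """1/|p - q| for two points given as (x, y, z) tuples."""
    return 1.0 / math.dist(p, q)


# Hypothetical diatomic fragment (positions in arbitrary length units).
atom_a = (0.0, 0.0, 0.0)
atom_b = (1.2, 0.0, 0.0)

# Sampling points in a plane above the fragment (invented for illustration).
grid = [(x * 0.5, y * 0.5, 2.0) for x in range(-4, 8) for y in range(-4, 5)]


def model_potential(q, point):
    """Potential of charges +q on atom_a and -q on atom_b (neutral fragment)."""
    return q * (inverse_distance(point, atom_a) - inverse_distance(point, atom_b))


# Stand-in for the quantum mechanically calculated potential: generated from
# a known charge of 0.4 so that the result of the fit can be checked.
reference = [model_potential(0.4, p) for p in grid]

# Linear least squares with one parameter: q = sum(V_k * a_k) / sum(a_k^2),
# where a_k is the potential of a unit charge pair at grid point k.
basis = [model_potential(1.0, p) for p in grid]
q_fit = sum(v * a for v, a in zip(reference, basis)) / sum(a * a for a in basis)
print(f"fitted partial charge: {q_fit:.4f}")
```

With more than two atoms the same idea leads to a set of linear equations, usually with the total charge of the molecule imposed as a constraint.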
A further important application of quantum-mechanical calculations in drug design is the calculation of conformational energies of small molecules to calibrate force-fields. The force-fields that have been developed for proteins and peptides are based on conformational energies that have been quantum-mechanically calculated for small peptides.

In contrast to force-field methods, quantum-mechanical techniques are able to consider the polarization of the electron density caused by the influence of neighboring groups. For example, the amide bond dipoles in an α-helix are all oriented in the same direction so that they sum up to a significant total dipole moment. As a consequence, such large accumulated dipoles can polarize other groups that are located at the end of the helix. Such induced dipoles are only incompletely described by force-field methods. For quantum-mechanical methods, this is not a problem. A further important application area is chemical reactions, for which force-fields are hardly parameterized at all, with the exception of a few special cases. Here quantum mechanical methods are the only possibility for a theoretical description.

Quantum-mechanical methods are considerably more elaborate than force-field methods. The most accurate methods, which also consume the most calculation time, are the so-called ab initio methods. These techniques reach their limits, however, with very large systems. Therefore other, less computationally demanding methods were developed. In these so-called semiempirical methods, certain integrals, the
determination of which represents the rate-determining step in ab initio methods, are replaced with adequate approximations that are quickly calculated. The drastically reduced calculation time that results, although accompanied by reduced accuracy, allows the routine application of semiempirical calculations to active molecules and proteins. Density functional theory represents another, faster ab initio technique. With this method, the position-dependent electron density distribution is calculated in the ground state for a many-body system; the complete solution to the Schrödinger equation for a many-body system is avoided. All of the interesting properties are then derived from the electron density.

Techniques have been developed for large protein–ligand systems that treat the interesting areas, for example, the binding site or the catalytic reaction center, quantum mechanically. The surrounding areas are approximated with a faster force-field method (QM/MM methods).

15.6 Computing Molecular Properties

The result of a molecular mechanics or quantum chemical calculation is at first a set of atomic coordinates that define the three-dimensional shape of the molecule. What can be done with this? An important application of the calculations is the determination of conformational energies: this is the relative energy of one molecular conformation in comparison to another (▶ Sect. 16.1). Two further molecular properties can be calculated: the form and size of a molecule along with its electronic characteristics.

All of the currently used graphics programs have multiple options for the display of the spatial structure of molecules. The most important are summarized in Fig. 15.2. The most often used representation is a line or stick representation (Dreiding models); sometimes atoms are displayed as little spheres.
As a general rule, color-coding is used to denote the atoms: nitrogen is blue, oxygen is red, sulfur is yellow, fluorine is turquoise, chlorine is green, bromine is brown, and iodine is violet. Hydrogen atoms are shown in white, but usually they are omitted for the sake of clarity. Carbon atoms are generally shown in black or gray. In the majority of figures in this book, carbon atoms that belong to the protein are shown in orange, and carbon atoms that belong to the ligand are shown in gray.

Another display option is the space-filling model, with which van der Waals surfaces are shown. For this representation each atom is shown as a sphere, the size of which corresponds to the van der Waals radius. Values for these radii come from crystal packing or from very exact ab initio calculations. Such representations are also known as CPK models (named after the scientists Corey, Pauling, and Koltun).

Furthermore, there are other options for displaying surfaces (Fig. 15.3). The solvent-accessible surface has proven particularly valuable for proteins. The most-used protein-display form in this book is transparent or opaque white surfaces. The van der Waals surface in Fig. 15.3a gives the impression that a crack is present at the position that is marked with the arrow. This crevice, however, is so narrow that no other atom fits inside. Therefore the solvent-accessible surface (Fig. 15.3b)
is less misleading. It is generated by rolling a sphere with a radius of 1.4 Å, which corresponds to the size of a water molecule, over the surface of the molecule. This surface appears much smoother. Depressions that are still present mean that small molecules, at least a water molecule, can really fit in there. The Lee–Richards surface is less frequently used but very helpful. It is chosen so that ligand atoms that come into contact with the examined surface lie directly on this surface (Fig. 15.3c).

The surface can be colored too. For example, a color can be assigned to each atom type, and then the color of the closest atom can be used for the surface. A representation in which the molecule’s surface is colored according to other properties, for example, electrostatic or hydrophobic potential, is very instructive.

Fig. 15.2 Different computer graphics representations of dopamine (▶ Sect. 1.4, Formula 1.13). Carbon atoms are colored gray, hydrogen atoms are white, nitrogen atoms are blue, and oxygen atoms are red. (a) Dreiding models. (b) Ball-and-stick models. (c) Space-filling models (CPK representation). (d) Solvent-accessible surface. (e) Electrostatic potential projected on the surface (positively charged areas are blue, negatively charged areas are red). (f) Highest-occupied molecular orbitals (HOMO), calculated for the uncharged dopamine molecule. The blue or red areas of the wave function indicate a different sign.
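How a rolling probe turns into a number can be sketched with a simple point-sampling scheme of the kind commonly attributed to Shrake and Rupley. The sketch estimates the area of the surface traced by the center of a 1.4 Å probe, that is, every van der Waals sphere is enlarged by the probe radius; as far as the description in the text suggests, this is the type of surface on which the centers of contacting atoms lie, not the smoother contact surface. The atom coordinates, the radius of 1.7 Å, and the number of sampling points are assumptions chosen for illustration.

```python
import math


def sphere_points(n):
    """Roughly uniform points on the unit sphere (golden-spiral layout)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - (2.0 * i + 1.0) / n
        radius = math.sqrt(1.0 - z * z)
        phi = i * golden_angle
        points.append((radius * math.cos(phi), radius * math.sin(phi), z))
    return points


def accessible_area(atoms, probe=1.4, n_points=500):
    """Area in Å^2 of the surface traced by the probe center.

    atoms: list of ((x, y, z), van_der_waals_radius) with lengths in Å.
    """
    unit = sphere_points(n_points)
    total = 0.0
    for i, (center, radius) in enumerate(atoms):
        r_ext = radius + probe
        exposed = 0
        for ux, uy, uz in unit:
            p = (center[0] + r_ext * ux, center[1] + r_ext * uy, center[2] + r_ext * uz)
            # A test point is buried if it lies inside any other enlarged sphere.
            buried = any(
                math.dist(p, other) < other_radius + probe
                for j, (other, other_radius) in enumerate(atoms)
                if j != i
            )
            if not buried:
                exposed += 1
        total += 4.0 * math.pi * r_ext**2 * exposed / n_points
    return total


# One isolated atom versus two overlapping atoms (hypothetical coordinates).
single = accessible_area([((0.0, 0.0, 0.0), 1.7)])
pair = accessible_area([((0.0, 0.0, 0.0), 1.7), ((1.5, 0.0, 0.0), 1.7)])
print(f"single atom: {single:.1f} Å^2, bonded pair: {pair:.1f} Å^2")
```

The pair exposes less than twice the area of a single atom because each enlarged sphere buries part of the other. The loop over all atom pairs is the simplest possible choice; practical implementations restrict the inner test to neighboring atoms.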
Fig. 15.3 Definitions of molecular surfaces. (a) van der Waals surface. The arrow marks a place where a crevice is found, but it is too small to accommodate a water molecule. (b) Solvent-accessible surface. (c) Lee–Richards surface.

15.7 Molecular Dynamics: Simulation of Molecular Motion

None of the processes that are of interest to us take place at 0 K, but rather at body temperature, which is approximately 310 K. It is therefore clear that not only the potential energy but also the kinetic energy must be considered. Molecules move at room temperature. They diffuse and change their shape by adopting different conformations. The flexibility and adaptability of both partners play a big role in protein–ligand interactions. A prerequisite for protein binding is that the ligand can take on a conformation that corresponds to the shape of the binding pocket. On the other hand, the protein is flexible to a certain extent. For example, side chains on the surface can adopt different conformations or entire domains can move relative to one another. The mutual adaptation of protein and ligand shapes plays an important role in the formation of protein–ligand complexes in particular.

The molecular dynamics (MD) simulation is a theoretical method to describe these effects. In molecular dynamics simulations the movement of atoms and molecules is followed under the influence of the chosen force-field. It is assumed in these calculations that the interactions between particles obey the laws of classical mechanics. For this, the Newtonian equations of motion are solved stepwise for all particles simultaneously. Usually it is assumed that the force between two particles is not influenced by other particles.

In practical applications, a starting geometry is generated first (Fig. 15.4). If an experimentally determined structure, for instance, the crystal structure of a protein–ligand complex, is available, then that is the starting point. To take the surrounding water shell into consideration, the complex is dipped into a “water bath,” that is, a large number of water molecules enclose it. Further, an adequate number of ions is added to keep the whole system in an electrically neutral state. To prevent boundary effects on the “walls,” a trick called “periodic boundary conditions” is used on the water bath. If the simulated protein complex approaches such a wall and is about to leave the water bath, the process is handled on the computer as though the complex had entered again from the opposite side. Formally, the boundaries of the water bath are eliminated.

At the beginning of the actual simulation each atom is assigned a random starting velocity with an arbitrary orientation. The velocities are chosen so that on average they correspond to the desired temperature (Boltzmann distribution). Then all forces from all surrounding atoms acting on a particular atom are calculated. At set time intervals the next position is calculated with the Newtonian equations of motion, and so forth. The step width is typically a femtosecond (1 fs = $10^{-15}$ s). This small step width is necessary because there are many extremely fast processes that occur on the molecular level. The development of the movement is followed for multiple nanoseconds and is recorded as a trajectory.
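The stepwise solution of the equations of motion can be sketched for a single coordinate. The example uses the velocity Verlet scheme, which is one widely used integrator and not necessarily the one behind the simulations discussed here. The harmonic force, the reduced units, and the helper `wrap` for a periodic box are assumptions made for illustration.

```python
import math


def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate Newton's equation of motion for one coordinate."""
    trajectory = [x]
    f = force(x)
    for _ in range(n_steps):
        v += 0.5 * dt * f / mass   # half-step velocity update with the old force
        x += dt * v                # new position
        f = force(x)               # force at the new position
        v += 0.5 * dt * f / mass   # complete the velocity update
        trajectory.append(x)       # "save coordinates"
    return trajectory, v


def wrap(x, box_length):
    """Periodic boundary: a particle leaving on one side re-enters on the other."""
    return x % box_length


# Harmonic "bond" in reduced units: force constant, mass, and time step.
k, mass, dt = 1.0, 1.0, 0.01
trajectory, v_end = velocity_verlet(1.0, 0.0, lambda x: -k * x, mass, dt, 1000)

# The total energy should stay nearly constant along the trajectory.
e_start = 0.5 * k * 1.0**2
e_end = 0.5 * k * trajectory[-1] ** 2 + 0.5 * mass * v_end**2
print(f"energy drift after {len(trajectory) - 1} steps: {abs(e_end - e_start):.2e}")
```

In a real simulation the same loop runs over the coordinates of all atoms, and the force call is by far the most expensive part. With a step of 1 fs, a trajectory of 10 ns corresponds to $10^{7}$ passes through this loop.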
Ten nanoseconds are enough to follow the movement of side chains and sometimes even of protein domains. This is not enough, however, to describe the diffusion of an active compound into the binding pocket. For this, longer simulation times are necessary. The folding of a protein is also difficult to simulate with this technique. The time necessary for protein folding is, on the actual time scale, between 20 ms and 1 h. The calculation of one time step (1 fs) still requires seconds of processing time on even the fastest computers. Nonetheless, new algorithms and computers with more specialized architectures are being developed that will make such simulations possible in the foreseeable future.

Another application of MD simulations, the calculation of binding affinity, should be mentioned here. In principle the free energy ΔG for a given system can be calculated. From the point of view of statistical thermodynamics the so-called partition function (German: Zustandssumme) is determined for this, in which the energetic contributions of all possible configurations of a system are considered. The entropic component of the system is automatically included by determining the distribution and relative population of the many states. Differences in the free energy of binding of different ligands are of particular interest in the context of protein–ligand interactions. Experience has shown, however, that only differences in the binding free energy between two similar ligands can be reliably calculated.

In modern applications (e.g., for screening purposes, ▶ Sect. 7.4), particularly large amounts of data are evaluated. Therefore, the effort associated with MD
calculations to estimate the binding affinities can hardly be afforded. Furthermore, many simple empirical methods allow an affinity estimation of similar quality. Therefore these faster methods are more readily used.

Fig. 15.4 Schematic course of a molecular dynamics simulation. The flowchart comprises the following steps: generate start coordinates; choose starting velocities; calculate forces (pair approximation); calculate velocities and new coordinates; save coordinates; if another step is required, the cycle is repeated, otherwise the simulation ends. The starting geometry is either an experimentally determined structure or a geometry that was optimized with force-fields. Each atom is assigned an appropriate starting velocity. Then the equations of motion are solved stepwise with these starting conditions, and the coordinates are periodically saved.

15.8 Dynamics of a Flexible Protein in Water

The most important application of molecular dynamics simulations is undoubtedly the ability to follow the motion of one or more molecules in solution. For example, it can be investigated which parts of a protein’s binding pocket or of a ligand are rigid upon protein–ligand complex formation and which are flexible. The enzyme aldose reductase has proven to be a very flexible protein. It is capable of adapting its binding pocket to the shape of a complexed ligand in versatile ways. This property is related to the biological function of this protein. It reduces a very broad palette of aldehyde substrates. Its exact function and role as a target structure for a drug therapy are discussed in ▶ Sect. 27.5. Highly flexible and adaptive proteins pose a special challenge to drug design. From the many crystal structure determinations it became apparent that there are several parent conformations for aldose reductase that are most likely in a dynamic equilibrium with one another. A binding ligand picks out a conformation from this equilibrium that fits, and the conformation becomes stabilized upon binding. These considerations are applied to GPCRs in particular, which are introduced in ▶ Chap. 29, “Agonists and Antagonists of Membrane-Bound Receptors”.

Matthias Zentgraf carried out extensive molecular dynamics simulations on aldose reductase. The profile that resulted was consistent with the crystallographic structure determinations. Amino acids that are repeatedly found with modified geometries in many protein–ligand complexes were shown to be very flexible in MD simulations as well. If the trajectory of such simulations is evaluated, it is apparent that the protein flips between the above-mentioned parent conformations. Additionally, many geometries occur that have only small but structurally critical variations from these parent conformations. Small areas in the binding pocket are thus opened that are able to accommodate, for example, an additional methyl group or a phenyl ring on a ligand. Such information can be directly used for the design of new inhibitors.

To provide an overview of the flexibility of a protein, the variation of the atom positions is calculated from one simulation state to the next along a trajectory. Just as with photographic film, these momentary pictures of complexes are called “snapshots.” Above all, it becomes apparent whether a protein fluctuates for a particular time in one conformation before it flips into another geometry. Subsequently, it can either return to the original geometry or flip into another basis geometry. Such an orientation map is shown in Fig. 15.5. From this map, it can be seen that the protein spends time in multiple parent conformations. If representative snapshots from these clusters of basis conformations are superimposed upon one another, a very good picture is obtained of which groups in the binding pocket show enhanced flexibility. In the example at hand, the side chains from two neighboring phenylalanines (Phe121 and Phe122, Fig. 15.6) are particularly implicated.
These can swing out of the way to open a new, previously closed cavity in the binding pocket. In the context of drug design, such information can be translated into the design of new inhibitors that can occupy new binding pockets. In this way an improved affinity or selectivity for the target protein can be achieved. A ligand is shown in Fig. 15.7 that has been furnished with an additional benzyl group (red) that optimally fills the newly opened cavity in the snapshot (in light blue) in Fig. 15.6.

15.9 Model and Simulation: Where Are the Differences?

To conclude this chapter, the terms “model” and “simulation” should be briefly compared and contrasted. Molecular models are used to approach questions that are experimentally difficult or impossible to address. What different conformations can a molecule adopt? This question is currently difficult to answer experimentally. Does a possible drug candidate fit into a protein’s binding pocket? Even this question is only answerable with laborious experiments. The use of models is an elementary component of every scientific discipline. Models have always played a central role in chemistry. It is shown in ▶ Chaps. 23, “Inhibitors of Hydrolases with an Acyl–Enzyme Intermediate”; ▶ 24, “Aspartic Protease Inhibitors”;
▶ 25, “Inhibitors of Hydrolyzing Metalloenzymes”; ▶ 26, “Transferase Inhibitors”; ▶ 27, “Oxidoreductase Inhibitors”; ▶ 28, “Agonists and Antagonists of Nuclear Receptors”; ▶ 29, “Agonists and Antagonists of Membrane-Bound Receptors”; ▶ 30, “Ligands for Channels, Pores, and Transporters”; ▶ 31, “Ligands for Surface Receptors”; and ▶ 32, “Biologicals: Peptides, Proteins, Nucleotides, and Macrolides as Drugs” how models, built on the basis of crystal structures of protein–ligand complexes, afford important contributions to drug design, especially in the preselection of possible molecular candidates for synthesis.

The term “simulation” describes the calculations with models. Multiple options or variable combinations can be quickly evaluated on the computer for a given mathematical model. Such investigations can contribute considerably to a better understanding of the system. Next to theory and experiment, computer simulations have been called the third pillar of exact science.

Fig. 15.5 2D RMS diagram (both axes: number of snapshots, 0 to 600; color scale: rmsd from 0 to 1.8 Å). The development with time of the spatial deviations between the snapshots along the simulation trajectory is visualized on this map. Large deviations are color-coded red, medium-sized deviations green, and small deviations blue. Square areas delineated in green are recognizable along the main diagonal. There the complex spends time near a parent conformation. The transition to the next square represents a flip to a new geometry. If sectors outside the main diagonal are colored increasingly red, the geometry deviates strongly from the previously adopted conformation. If an area outside the diagonal is reached that is green, the newly adopted geometry is not very different from a state that the system had reached before. With such a map it is possible to see which of the many parent conformations a complex swings between.
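The numbers behind such a two-dimensional map can be sketched as a matrix of pairwise root-mean-square deviations between snapshots. The toy trajectory below (two atoms, four snapshots) is invented for illustration. The sketch also assumes that the snapshots are already expressed in a common frame; in practice the structures are usually superimposed first, a step that is omitted here.

```python
import math


def rmsd(a, b):
    """Root-mean-square deviation between two equally long coordinate lists."""
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(squared / len(a))


def rmsd_matrix(snapshots):
    """Symmetric matrix of pairwise deviations; the diagonal is zero."""
    n = len(snapshots)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = rmsd(snapshots[i], snapshots[j])
    return matrix


def shifted(conformation, dx):
    """Copy of a conformation with every atom displaced by dx along x."""
    return [(x + dx, y, z) for x, y, z in conformation]


# Hypothetical toy trajectory: two "parent conformations", each visited twice.
conf_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
conf_b = [(0.0, 0.0, 0.0), (0.0, 1.5, 0.0)]
trajectory = [conf_a, shifted(conf_a, 0.1), conf_b, shifted(conf_b, 0.1)]

matrix = rmsd_matrix(trajectory)
for row in matrix:
    print("  ".join(f"{value:4.2f}" for value in row))
```

Small values cluster in square blocks along the diagonal, where consecutive snapshots belong to the same conformation, and the off-diagonal blocks carry the large deviations between the two conformations.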
Beware of too-high expectations in the area of drug design! It should not be overlooked that a reasonable simulation requires that the underlying model is accurate and its limitations are well understood. This prerequisite is indeed well met in many areas of engineering science, so that simulation plays an important role in the design of automobiles or computer chips. Unfortunately, in chemistry things are more complicated. The currently available molecular models allow the assembly and ranking of compounds that are to be synthesized. They can also be used to design ligands with improved binding properties. Nonetheless, the present models are often not exact enough to allow detailed simulations of protein–ligand complexes with sufficient accuracy to determine a binding energy. In view of the importance of this field, this can only mean that more effort must be devoted to the collection of experimental data for the development of improved models.

Fig. 15.6 Representative snapshots were taken from the different square areas along the main diagonal in Fig. 15.5 and superimposed upon one another. It can be seen that, above all, the side chains of the phenylalanines Phe121 and Phe122 can undergo large movements in the binding pocket. In doing so, they can also adopt conformations (e.g., the light-blue geometry) that open a new hydrophobic cavity in the binding pocket.
Fig. 15.7 Conformations occur along the trajectory of the protein that open a new hydrophobic pocket when the side chain of a phenylalanine swings away (Fig. 15.6, i.e., light-blue geometry). This pocket can be occupied by a ligand. For this, a benzyl group was added to the scaffold of the shown benzodiazepine-like inhibitor, which can occupy the opened pocket during the simulation.

15.10 Synopsis

• Models have been and still are used in chemistry in general, but in particular in modern drug design. Computer graphics is a versatile tool to display structures and models along with various properties assigned and/or geometrically superimposed onto these molecules.
• Structures can be calculated by starting from first principles and by trying to follow the physics as closely as possible. This is done with quantum mechanical calculations. Because these methods easily become elaborate and computationally intractable, empirical approaches are an alternative. They are based on much simpler physics, normally classical mechanics, and treat molecules as a set of point charges in space interconnected by springs following harmonic potentials.
• Empirical approaches can only be used if enough experimental data are available to parameterize and calibrate the empirical concepts. Therefore large databases assembling knowledge about molecular properties have been developed.
• Molecular mechanics calculations of the geometry of molecules are based on empirical force-fields. They comprise multiple energy terms that describe mutual interactions in the molecule either through bonds or through space. Particular potentials are used to describe the torsional barrier to rotations around single bonds. Furthermore, nonbonded interactions are handled by special potentials.
• The accuracy and required computational capacity of quantum chemical approaches depend on the sophistication of the basis sets of atomic or molecular orbitals used for the calculations. Parameterization of some parts of the calculations with empirical data can significantly reduce the computational requirements. Density functional theory is a faster approach and works with electron density distributions instead of orbitals. Combinations of quantum chemical methods and force-field approaches have been developed to handle large systems such as protein–ligand complexes.
• Properties such as charges can be displayed on the surface of molecules. Different types of surfaces have been defined such as the van der Waals surface or the solvent-accessible surface.
• Molecular dynamics simulations are normally based on potentials derived from empirical force-fields. They consider the properties of a molecule under dynamic conditions by solving Newtonian equations of motion. As a result, the motion of a molecule can be evaluated with time by analyzing the so-called molecular trajectory.
• Molecular dynamics simulations can be used to study the flexibility of a protein next to its ligand-binding site. Such simulations can show multiple conformations of the protein that are competent to accommodate different ligands.
• Computer simulations allow the possible properties of molecules under different test conditions to be enumerated. They help to interpret results from experiments or help to predict properties of molecules to better plan the next experiments.
16 Conformational Analysis

Assembling a molecule with a modelling kit already makes it clear that rotations around single bonds can be easily carried out. The molecule takes on a different shape or, as chemists say, is transformed into a different conformation. In a real molecule, rotations around these bonds are not fully free. They are subject to a potential, and during the rotation the molecule adopts particular, energetically favorable arrangements. n-Butane represents the simplest case (Fig. 16.1). The central torsion or dihedral angle determines the orientation of the two bonds to the methyl groups relative to one another. If n-butane is rotated out of the arrangement in which the two bonds to the methyl groups are in a 180° orientation (trans), the methyl group at the “front” carbon and a hydrogen atom at the “back” carbon will directly coincide with each other at rotation angles of 120° and 240°, an arrangement called “eclipsed”. In this geometry they come closer to one another; therefore this arrangement is unfavorable for steric reasons. At rotation angles of 60° and 300° the groups are again in a staggered geometry, which is an energetically more favorable situation. This arrangement is somewhat less favorable than the staggered trans orientation because of the spatial vicinity of the methyl groups, which are now said to be “gauche” to one another. Finally, along the rotation path an orientation is adopted at 0° and 360° in which both methyl groups are exactly behind one another. This is an even less favorable orientation.

16.1 Many Rotatable Bonds Create Large Conformational Multiplicity

Multiple energy maxima and minima can be passed through during the course of a full rotation through 360°, depending on which atoms and groups are attached to the rotatable bond. They are at different energy levels relative to one another. The lowest minimum is called the global minimum, and the energetically higher minima are called local minima.
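A scan over one torsion angle shows how local and global minima are identified. The sketch assumes a three-term cosine series as a stand-in for the torsion potential of n-butane. Its coefficients `V1`, `V2`, and `V3` are not taken from any published force-field; they were chosen so that the curve passes through 25.5, 3.8, and 14.6 kJ/mol at 0°, 60°, and 120°, which assumes that the three energy labels of Fig. 16.1 belong to these angles.

```python
import math

# Assumed cosine-series coefficients in kJ/mol (hypothetical, see text above).
V1, V2, V3 = 9.8, -4.7333, 15.7


def torsion_energy(angle_deg):
    """Model energy relative to the trans arrangement at 180 degrees."""
    phi = math.radians(angle_deg)
    return (
        0.5 * V1 * (1.0 + math.cos(phi))
        + 0.5 * V2 * (1.0 - math.cos(2.0 * phi))
        + 0.5 * V3 * (1.0 + math.cos(3.0 * phi))
    )


def scan_minima(step_deg=1):
    """Rotate in fixed steps and keep every angle lower than both neighbours."""
    angles = list(range(0, 360, step_deg))
    energies = [torsion_energy(a) for a in angles]
    n = len(angles)
    minima = []
    for i in range(n):
        # Indices wrap around because 360 degrees is the same as 0 degrees.
        if energies[i] < energies[i - 1] and energies[i] < energies[(i + 1) % n]:
            minima.append((angles[i], energies[i]))
    return minima


minima = scan_minima()
global_minimum = min(minima, key=lambda m: m[1])
for angle, value in minima:
    label = "global" if (angle, value) == global_minimum else "local"
    print(f"{angle:3d} deg  {value:5.2f} kJ/mol  ({label} minimum)")
```

The scan finds the trans arrangement as the global minimum and two gauche arrangements as local minima. In this model the gauche minima lie slightly beyond 60° and 300° because the curve is still descending at exactly 60°.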
Knowledge about these minima is important because molecules adopt geometries that correspond to such energy minima. Calculations are necessary to find these minima. A possible method is in the systematic rotation of all rotatable bonds, for instance in 10 steps. At each step the energy of the G. Klebe, Drug Design, DOI 10.1007/978-3-642-17907-5_16, # Springer-Verlag Berlin Heidelberg 2013 335
  • 353.
molecule is calculated by using a force field. All detected minima correspond to possible conformations of the molecule.

Most drug-like molecules have many single bonds and therefore more than one rotatable bond. Each of these bonds can adopt multiple torsion-angle values, and these values have to be combined for all rotatable bonds in the molecule. The number of possible combinations increases multiplicatively. The molecule n-hexane has three rotatable bonds. If, analogous to n-butane, three local minima are assumed for each rotatable bond (60°, 180°, and 300°), we can expect 3 × 3 × 3 = 27 minima. To perform a systematic search for these minima in 10° steps, however, the evaluation of 36 × 36 × 36 = 46,656 positions would be necessary. In principle, the energy must be calculated for each of these positions. Not all angle combinations will, however, lead to reasonable geometries. It can happen that the molecule folds back upon itself and parts of it mutually overlap. Such collisions can be recognized by computer programs, and the geometry is discarded from consideration. It is also easy to imagine that, with an increasing number of rotatable bonds, the number of local minima and of geometries to be generated in a systematic search increases dramatically.

[Figure 16.1: energy (kJ/mol) plotted against the torsion angle τ [°] from 0° to 360°; the gauche, trans, and gauche minima are marked, and energy differences of 3.8, 14.6, and 25.5 kJ/mol are indicated.]

Fig. 16.1 Butane, CH3CH2CH2CH3, is made up of a linear chain of carbon atoms. If the terminal methyl groups cover one another after rotation around the central C-C bond, the torsion angle about the central bond is 0°. At a 60° angle the “back” methyl group is halfway between the “front” methyl group and a hydrogen atom. This situation is called a “gauche” orientation. At 120° a methyl group and a hydrogen atom are eclipsed. At 180° the terminal methyl groups are exactly opposite one another. Here the energetically most favorable situation, the trans orientation, is achieved. From here on, the course of the rotation is mirror symmetrical and ends in the starting position at 360°. The orientations at 120° and 240° are energetically less favorable than the 180° orientation by 14.6 kJ/mol. The gauche orientations at 60° and 300° lie 3.8 kJ/mol above the trans orientation. The orientations at 0° and 360°, in which the two methyl groups eclipse one another, are the least favorable ones and are 25.5 kJ/mol higher in energy. If a minimization method is applied that can only run “downhill,” the three minima on the potential curve can be reached by starting at the 110°, 130°, and 250° points.
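The counting argument for n-hexane in Sect. 16.1 can be reproduced in a few lines. The following Python sketch is only an illustration under stated assumptions: the function names are hypothetical, and the three minima per bond (60°, 180°, 300°) are the idealized values assumed in the text, not computed ones.

```python
from itertools import product

def grid_angles(step_deg):
    """Torsion angles visited when one bond is rotated in fixed steps."""
    return list(range(0, 360, step_deg))

def count_grid_points(n_bonds, step_deg):
    """Number of geometries in a full systematic search over n_bonds bonds."""
    return len(grid_angles(step_deg)) ** n_bonds

def enumerate_minima(n_bonds, minima_per_bond=(60, 180, 300)):
    """All combinations of the assumed per-bond minima (idealized values)."""
    return list(product(minima_per_bond, repeat=n_bonds))

if __name__ == "__main__":
    # n-hexane: three rotatable bonds
    print(len(enumerate_minima(3)))   # expected minima if each bond has three
    print(count_grid_points(3, 10))   # grid points for a 10 degree step
```

Because the grid size is an exponential in the number of rotatable bonds, adding a single bond at a 10° step multiplies the number of geometries by 36.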
16.2 Conformations Are the Local Energy Minima of a Molecule

It was shown in ▶ Chap. 15, “Molecular Modeling” that the energy and geometry of a molecule can be calculated with the help of a force field or a quantum mechanical method. In this way, every combination of angle values about the rotatable bonds in a molecule that corresponds to an energetically favorable state can be found. The mathematical method that is used to search for such a minimum geometry can only move downhill on the potential energy surface (▶ Sect. 15.5). To illustrate this, the potential of n-butane is considered again (Fig. 16.1). If an angle of 130° is used as the starting value, the minimization ends in the trans geometry. If the minimization is started at 110°, which is only 20° away, it leads to a gauche orientation. In this way, two of the three possibilities are detected. The third minimum, which mirrors the gauche conformation, is reached if the minimization is started at 350°. All three conformations are thus found for the simplest possible case.

How are complex molecules to be approached? In principle, in exactly the same way. Because it is not known which torsion angles of the individual single bonds give access to potential minima, that is, stable conformations, the minimization must be started from numerous angles for each of the single bonds. From these values the minimization always goes “downhill”. The minima on the potential surface are found in this way. The art is to define efficiently the starting points from which a given geometry is minimized. This is a very laborious task, particularly with large molecules. It is akin to a hiker in the mountains searching for the deepest valley.

Adenosine monophosphate 16.1 serves as an example (Fig. 16.2). The analysis concentrates on the five-membered ribose ring, the bond to the nitrogen atom in the adenine, and the three bonds of the sugar phosphate side chain. What conformations can this molecule adopt?
Rotations are performed about the open-chain bonds in 10° steps. In the systematic search for the ribose ring, only those orientations are considered that allow the ring to close. To get a rough overview of the hypothetically obtained geometries, the distance between the center of the adenine scaffold and the phosphorus atom is measured in each generated geometry. This distance falls between 4.5 and 9.3 Å for the more than 300,000 generated geometries. To estimate the energy content of the molecule in an arbitrary geometry, its van der Waals energy (▶ Chap. 15, “Molecular Modeling”) is calculated. Such a calculation is quickly accomplished. The energies of the 300,000 geometries are between 0 and 64 kJ/mol. The structures generated in this way are not yet in local potential minima. To achieve this, each starting geometry must be minimized (cf. the potential energy curve of n-butane in Fig. 16.1). The conformations subsequently obtained are compared to determine whether the same local minima have been reached by starting from different points. This is a rather laborious endeavor for 300,000 starting geometries! It is akin to letting our hiker walk downhill from every grid square of the map to find the deepest valley. Hopefully he is granted great longevity so that he lives long enough to see the results of the search! Can this search be structured more effectively?
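The procedure of Sect. 16.2 (minimize from many starting points, then check which starts end in the same minimum) can be sketched for the one-dimensional butane case. This is a minimal illustration, not the method of any particular program. The cosine-series coefficients are an assumption: they were fitted here to the four energies quoted for Fig. 16.1, so the gauche minimum of this model lies a few degrees above 60°. Function names, step size, and tolerances are likewise hypothetical.

```python
import math

# Cosine-series coefficients in kJ/mol. ASSUMPTION: fitted for this sketch to
# the energies quoted for n-butane (0 at 180 deg, 3.8 at 60 deg, 14.6 at
# 120 deg, 25.5 at 0 deg); they are not taken from a published force field.
C0, C1, C2, C3 = 10.3833, 4.9, 2.3667, 7.85

def energy(tau_deg):
    """Model torsion energy in kJ/mol."""
    t = math.radians(tau_deg)
    return C0 + C1 * math.cos(t) + C2 * math.cos(2 * t) + C3 * math.cos(3 * t)

def gradient(tau_deg):
    """Derivative of the model energy in kJ/mol per degree."""
    t = math.radians(tau_deg)
    d_dt = (-C1 * math.sin(t) - 2 * C2 * math.sin(2 * t)
            - 3 * C3 * math.sin(3 * t))
    return d_dt * math.pi / 180.0

def minimize_downhill(start_deg, step=20.0, tol=1e-6, max_iter=10000):
    """Steepest descent: follow the negative gradient until it vanishes."""
    tau = float(start_deg)
    for _ in range(max_iter):
        g = gradient(tau)
        if abs(g) < tol:
            break
        tau -= step * g
    return tau % 360.0

def unique_minima(start_angles, resolution=5.0):
    """Group starting angles by the minimum they reach (binned to resolution)."""
    found = {}
    for start in start_angles:
        key = int(round(minimize_downhill(start) / resolution) * resolution)
        found.setdefault(key, []).append(start)
    return found

if __name__ == "__main__":
    for start in (110, 130, 350):
        print(start, round(minimize_downhill(start), 1))
    # Starts are offset from 0, 120, and 240 deg: a start placed exactly on a
    # maximum has zero gradient and would not move at all.
    print(sorted(unique_minima(range(5, 360, 10))))
```

The last call shows the redundancy discussed in the text: 36 starting angles collapse onto only three distinct minima.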
16.3 How to Scan Conformational Space Efficiently?

Sometimes rolling the dice is better than systematic probing! The hiker could choose random places in the mountains from which to descend into the next valley. With a little luck he will find the deepest valley with significantly less effort. Such Monte Carlo methods are very popular in conformational analysis. Here, the starting angles for the conformational search are chosen purely at random.

Molecular dynamics serves as another approach. The hiker would have to climb into an airplane that flies at high speed between the mountains and changes its direction at each obstacle. After set time intervals, the hiker jumps from the airplane and, upon landing, hikes down to the bottom of the valley. The higher the airplane flies, the fewer mountain peaks are encountered and the faster the mountains can be crisscrossed. In the course of molecular dynamics a molecular trajectory (▶ Sect. 15.8) is followed, and the geometry is saved at predefined time intervals. These geometries serve as starting points for energy minimizations in a conformational analysis. By increasing the temperature (i.e., flying higher), a larger area of conformational space can be searched in a shorter period of time.

Fig. 16.2 Adenosine monophosphate 16.1 exhibits the conformationally flexible ribose ring and four open-chain torsion angles, τ1–τ4. Rotations are performed around these torsion angles during the conformational analysis. To get a rough description of the attained geometry, the distance between the phosphorus atom in the side chain and the center of the adenine scaffold is measured.

16.4 Is It Necessary to Search the Entire Conformational Space?

Until now molecules have been considered in an isolated state. How does their flexibility change when they are brought into an environment like the binding pocket of a protein? In principle, nothing changes in their conformational flexibility. It could be, however, that minima are found at different positions and with different relative energies because of electrostatic and steric interactions in the binding pocket. This raises the question of whether the torsion angles in all areas must be searched for
a ligand that is in a binding pocket. If energy minima occur preferentially at particular torsion angles, it is reasonable to limit the search to these angles. The hiker could, for example, get the impression that villages are predominantly found in valleys and hardly ever on peaks or slopes. All of the villages would therefore be worthwhile starting points for his minimum search.

Ligands in the binding pocket of a protein are under the influence of directional interactions from the amino acids that are located there. Similar conditions are found for molecules in a crystal lattice. There, the environment is built of identical copies of neighboring molecules (▶ Chap. 13, “Experimental Methods of Structure Determination”). These undergo directional interactions with the molecule, analogously to the amino acids in the binding pocket. Interestingly, the molecular packing density in the interior of a protein is similar to that of organic molecules in a crystal lattice. As was already mentioned in ▶ Sect. 13.9, the crystal structures of numerous organic molecules are known and stored in a database.

Experience has unfortunately shown that the conformation of a flexible molecule in a crystal structure is often not identical, or even similar, to the geometry of the molecule in the binding pocket of a protein. The same is true for conformations that have been found in solution. The receptor-bound conformation of a molecule cannot be unambiguously derived from its small-molecule crystal structure or from its structure in solution. Nonetheless, much can be learned from crystal structures. For this purpose, it is not the entire molecule that should be considered, but rather individual torsion angles. The potential energy for the central torsion angle of n-butane is shown in Fig. 16.1. If the angles for multiple C-CH2-CH2-C fragments are extracted from a database of small-molecule crystal structures, they cluster overwhelmingly in areas where the potential energy curve shows local minima.
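The statistical evaluation described above amounts to binning observed torsion angles and keeping the well-populated bins. The sketch below illustrates the idea only. The angle list is invented sample data, not values extracted from any crystal-structure database, and the function names, the 30° bin width, and the 10 % threshold are assumptions chosen for the example.

```python
from collections import Counter

# HYPOTHETICAL sample of C-CH2-CH2-C torsion angles in degrees. These numbers
# are made up for illustration and do not come from a real database query.
SAMPLE_ANGLES = [178, 182, 175, 185, 179, 181, 177, 183, 180, 176, 181, 174,
                 62, 58, 65, 61, 298, 302, 295, 299]

def bin_center(angle_deg, bin_width=30):
    """Center of the bin (a multiple of bin_width) that contains the angle."""
    shifted = (angle_deg % 360) + bin_width / 2
    return int(shifted // bin_width) * bin_width % 360

def torsion_histogram(angles_deg, bin_width=30):
    """Relative frequency in percent for every bin center from 0 to 330."""
    counts = Counter(bin_center(a, bin_width) for a in angles_deg)
    total = len(angles_deg)
    return {center: 100.0 * counts.get(center, 0) / total
            for center in range(0, 360, bin_width)}

def preferred_bins(histogram, min_percent=10.0):
    """Bin centers whose relative frequency reaches the chosen threshold."""
    return sorted(c for c, pct in histogram.items() if pct >= min_percent)

if __name__ == "__main__":
    hist = torsion_histogram(SAMPLE_ANGLES)
    print(preferred_bins(hist))
```

Centering the bins on multiples of 30° keeps a cluster around 60° or 180° in a single bin instead of splitting it across a bin edge.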
Adenosine monophosphate 16.1 has four open-chain torsion angles τ1–τ4 (Fig. 16.2). The bond between the ribose ring and the adenine scaffold forms the torsion angle τ4. A further fragment is the phosphate group with the oxygen and the attached carbon in the chain (τ3). This fragment occurs in the database in a large variety of different structures. A representative picture can be expected because, when enough crystal structures are considered, the fragment occurs in very many different environments. The results of such searches for the four torsion angles τ1–τ4 are shown in Fig. 16.4 as frequency distributions, so-called histograms. Experience has shown that clearly preferred values occur for many torsion angles. That is the case here for τ1, τ2, and τ3.

The question can be raised as to why this statistical evaluation is not better performed on ligands from crystallographically studied protein–ligand complexes. Unfortunately, the diversity of these data is still limited, and the data are usually not accurate enough for the desired evaluation. Nevertheless, comparative studies have shown that the same torsion angles are preferentially found in protein–ligand complexes and small-molecule crystal structures (Fig. 16.3).

The experience that torsion angles prefer particular values can be used for the conformational search. The angle τ4 between the ribose ring and the adenine scaffold shows a broad distribution over many possible values (Fig. 16.4).
Unfortunately, the search cannot be narrowed here. The situation looks better for the other angles τ1–τ3. There, only specific values occur. If the systematic search is limited to these areas, and a search in 10° steps is carried out around the average value, it is only necessary to generate 6,340 geometries. With 5.9–9.3 Å, almost the same range of distances between phosphorus and adenine is covered as in the unrestricted search. If a van der Waals energy calculation is carried out on these geometries, values between 0 and 16.3 kJ/mol are obtained. In contrast to the results from Sect. 16.2, all the geometries that correspond to the energetically unfavorable areas are discarded.

How can it be confirmed that this restricted search also covers the part of conformational space that includes the receptor-bound conformations? Adenosine monophosphate 16.1 often occurs as a substructure of cofactors in protein complexes, so there is enough information about receptor-bound conformations for this particular example. They come from crystal structures of proteins with these bound cofactors. The distance range of 5.9–9.2 Å between the adenine scaffold and the phosphorus in the receptor-bound structures covers the same range that was detected in the enhanced systematic search. It can therefore be assumed that enough geometries were generated to populate the local minima of the bound state of adenosine monophosphate satisfactorily. Reflecting back on the initial butane example (Fig. 16.1), this means that the starting points were distributed well enough for all minima to be reached.

16.5 The Difficulty in Finding Local Minima Corresponding to the Receptor-Bound State

As already described, the local minima in a systematic conformational search are obtained by subjecting all of the generated geometries to a force-field optimization. There can be problems with this approach.
Fig. 16.3 A distribution of torsion-angle values with clusters at 60°, 180°, and 300° is derived from a database of small-molecule crystal structures for the C-CH2-CH2-C fragment. Most values are found at 180°. The relative frequency in percent is plotted against the torsion angle τ between 0° and 360°. The maxima of the distribution are at the points where the potential curve of n-butane (Fig. 16.1) shows its energy minima.

To explain this, a different molecule, citric acid 16.2, is considered in the binding pocket of citrate synthase. Seven
hydrogen bonds are formed by its three carboxylate groups and the hydroxyl group to three histidine and two arginine residues of the protein (Fig. 16.5). If the free citrate molecule, which is not bound to the protein, is considered and its geometry is minimized in the isolated state, it takes on a conformation with internally saturated hydrogen bonds (▶ Sect. 15.5). Of course, the minimization can be started from a different geometry, but in all cases conformations with intramolecular hydrogen bonds result. Such hydrogen bonds rarely occur in the protein-bound state. Therefore, the conformation that was obtained after minimization in the isolated state has no relevance for the conditions in the protein.

Fig. 16.4 The frequency distributions of the torsion angles of the open-chain bonds τ1–τ4 of adenosine monophosphate 16.1 as found in the crystal structures of small organic molecules. Each histogram plots the frequency [%] against the torsion angle τ [°]. The torsion-angle histograms are constructed for fragments that are representative of the corresponding portions of the test molecule. There are clearly preferred values for the angles τ1–τ3, but a broad distribution over all possible angles is found for τ4. This knowledge is used in the conformational analysis and limits the search for τ1–τ3 to the preferred value ranges.

As a general rule, ligands rarely bind to proteins in a conformation exhibiting intramolecular hydrogen bonds. The H-bond-forming groups are generally involved in interactions with the protein. To circumvent the problem of intramolecular H-bond formation, the minimization of the generated starting structures can be omitted, and all geometries from the systematic search can be used for further comparison (▶ Chap. 17, “Pharmacophore Hypotheses and Molecular Comparisons”). Then, however, very many geometries must be examined. This would severely limit the scope of such comparisons for
computational reasons. Furthermore, results generated in this way would likely describe rather distorted geometries. Alternatively, the force-field terms responsible for the formation of intramolecular H-bonds could be neglected. But how reliable would such a reduced force field be? An attempt can be made to summarize the geometries that were generated in a systematic search so that groups with similar conformations are described by one representative member.

16.6 An Effective Search for Relevant Conformations by Using a Knowledge-Based Approach

A knowledge-based approach first analyzes the experimentally determined conformations and generates for new molecules only those conformations that are consistent with this experimental knowledge base. In this way, many geometries are never generated in the first place. The example of adenosine monophosphate 16.1 is invoked once again. The approach recognizes a flexible five-membered ring and four open-chain rotatable bonds. Energetically favorable conformations of the ring are chosen from a database. This database contains many different ring systems as they are found, for example, in crystal structures of organic molecules. In the case at hand, the approach suggests the five energetically most favorable ring conformations, two of which are in fact found in the protein-bound cofactors. For the open-chain part of the molecule the method is guided by the above-mentioned frequency distributions of the dihedral angles (Fig. 16.4). The starting geometries are only generated in areas in which these distributions show significant frequencies. The distribution is still rather crude. In a final step, the generated geometries are optimized by readjusting the torsion angles. Clashes between atoms that are not covalently bound to one another are avoided. At the same time the adjusted dihedral angles are kept as close as possible to the preferred values. This approach gets by with relatively few conformations.
They are rather evenly distributed in the part of conformational space that is relevant for receptor-bound conformations (Fig. 16.6).

Fig. 16.5 Interactions between citric acid 16.2 and the enzyme citrate synthase. The molecule is bound by seven hydrogen bonds to three histidine residues (His 238, His 274, and His 320) and two arginine residues (Arg 329 and Arg 401).
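The first step of the knowledge-based search in Sect. 16.6, generating starting angles only where the histograms are populated, can be sketched as a restricted grid. This is a simplified illustration and not the published algorithm. The preference windows below are hypothetical placeholders (they are not the values read from Fig. 16.4), and the ring conformations and the final torsion readjustment step are left out.

```python
import math

def window_angles(center_deg, half_width_deg, step_deg=10):
    """Angles sampled in step_deg increments around one preferred value."""
    n = half_width_deg // step_deg
    return [(center_deg + k * step_deg) % 360 for k in range(-n, n + 1)]

def torsion_angle_sets(preferences, step_deg=10):
    """One sorted list of angles per torsion.

    preferences holds, for each torsion, a list of (center, half_width)
    windows. An empty list means that no preference is known, so the full
    circle is scanned for that torsion.
    """
    angle_sets = []
    for windows in preferences:
        if windows:
            angles = sorted({a for center, half_width in windows
                             for a in window_angles(center, half_width, step_deg)})
        else:
            angles = list(range(0, 360, step_deg))
        angle_sets.append(angles)
    return angle_sets

def count_geometries(angle_sets):
    """Number of torsion-angle combinations in the (restricted) grid."""
    return math.prod(len(angles) for angles in angle_sets)

# HYPOTHETICAL windows for four torsions, chosen only for this example.
HYPOTHETICAL_PREFERENCES = [
    [(60, 20), (180, 20), (300, 20)],  # three narrow clusters
    [(180, 30)],                       # one broader cluster
    [(180, 20)],                       # one narrow cluster
    [],                                # broad distribution: scan everything
]

if __name__ == "__main__":
    restricted = torsion_angle_sets(HYPOTHETICAL_PREFERENCES)
    full = torsion_angle_sets([[], [], [], []])
    print(count_geometries(restricted), "instead of", count_geometries(full))
```

The individual starting geometries can then be enumerated with `itertools.product(*restricted)`. Even with one torsion left unrestricted, the restricted grid in this toy setup is smaller than the full grid by about two orders of magnitude.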
16.7 What Is the Outcome of a Conformational Search?

Many drug-like molecules are flexible. They can adopt markedly different conformations depending on the surrounding environment. Usually the receptor-bound geometry is not the energetically most favorable conformation found for the isolated state, but it does fall in an energetically favorable area. For the conformational analysis, this means that it is not necessarily the deepest minimum that is sought. Rather, it should be the “relevant” minimum that corresponds to the bound state. There is only a chance of finding it when the criteria for the search are known. Finding the energetically most favorable conformation is no more and no less difficult than finding the one that best “fits” the binding site.

An important tool in the search for novel lead structures is the docking of candidate molecules into the binding pocket of a given protein. Programs that use this approach must be able to handle the conformation problem. Meanwhile, a large variety of methods have been developed that allow efficient docking searches on computer clusters, particularly for molecules of drug-like size.

Fig. 16.6 Eighty-one conformers (upper part) from experimentally determined protein–ligand complexes are superimposed upon one another to illustrate the regions of space that adenosine monophosphate 16.1 can occupy in a protein-bound state. The ribose ring, for which two ring conformations occur, is located in the center. The possible orientations of the adenine ring are shown at the top, and the conformations of the flexible phosphate chain at the bottom. Similar coverage of the conformational space is achieved with a manageable number of 14 conformations (lower part), which were generated by a knowledge-based approach.
16.8 Synopsis

• Drug-like molecules exhibit multiple rotatable bonds. Rotations around these bonds drive the molecules into different conformations that correspond to local minima on the energy surface of the molecule.
• The receptor-bound conformation of a drug-like molecule is the starting point for any drug-design considerations. Therefore, many methods have been developed to perform conformational analyses. Systematic searches by incremental rotations about each single-bond torsion angle produce a huge number of geometries that need to be optimized to the local minima on the energy surface.
• The conformation of a drug-like molecule frequently changes with the environment. Usually the conformation in the protein-bound state differs from that in solution, in the gas phase, or in the small-molecule crystal structure.
• Considering torsional fragments in small molecules and analyzing them across databases of crystal structures by statistical means reveals clear-cut torsional preferences for many examples. Such knowledge can be exploited to perform a conformational search more efficiently. Not all values around a rotatable bond have to be tested, and the search can be limited to the ranges that are known to be preferred.
• A further obstacle in the search for the protein-bound conformation of a drug-like molecule is that the molecule will interact with its environment. This environment, the protein’s binding pocket, is often polar and will involve the bound ligand in multiple hydrogen bonds.
• Using a knowledge base on torsional preferences of small organic molecules can significantly enhance the conformational search, particularly during docking, in molecular comparisons, or in database searches based on predefined pharmacophores.
IV Structure–Activity Relationships and Design Approaches

Today drug design is supported by numerous computational approaches that, like the pieces of a puzzle, all contribute to the path from a first design hypothesis to a clinical candidate. (Announcement poster from the research group of the author on the occasion of a conference in 2005 in Rauischholzhausen, Marburg, Germany.)
17 Pharmacophore Hypotheses and Molecular Comparisons

G. Klebe, Drug Design, DOI 10.1007/978-3-642-17907-5_17, © Springer-Verlag Berlin Heidelberg 2013

Emil Fischer’s lock-and-key principle (▶ Sect. 4.1) illustrates the specific interaction of an active compound with its receptor. With a key, it is the grooves on the blade that interact with the wards in the keyway to open the lock. With active substances, it is a particular part of the molecule that interacts with the amino acids in the binding pocket. Similar molecules are frequently compared in drug design to derive ideas for new structures. In this chapter, the criteria that make such comparisons possible are compiled. Furthermore, these criteria can be used to search databases for alternative molecules that can bind to the protein in the same way.

17.1 The Pharmacophore Anchors a Drug Molecule in the Binding Pocket

The structure of the binding pocket determines which functional groups are necessary for the ligand to bind. The spatial orientation of these functional groups in ligands is referred to as the pharmacophore (▶ Sect. 8.7, Fig. 8.9). Because of its importance for drug design and model hypotheses in medicinal chemistry, an official IUPAC definition has been established by Camille G. Wermuth (Table 17.1). The interacting groups that a ligand must possess to be able to interact successfully with a protein define the pharmacophore in space, independently of the particular molecular scaffold to which they are attached. Hydrogen-bond-forming groups and hydrophobic parts are considered for this. A more detailed examination also differentiates between positively and negatively charged groups in a molecule. When derived from a set of similarly binding ligands, this generalized description is referred to as the ligand-based pharmacophore. On the other hand, the protein structure can be the starting point. For this, an analysis is made as to which amino acid functional groups are in the binding pocket. They define the properties that a ligand must have to bind to them. In this sense, the protein structure determines how the pharmacophore of a ligand must be shaped to be able to bind successfully to the protein. This description is referred to as the
protein-based pharmacophore. In contrast to the lock-and-key picture, ligands and proteins are flexible. In ligands, the functional groups of the pharmacophore must be oriented in the direction of the corresponding counter groups in the protein. Therefore, detailed knowledge about the conformational properties of the ligand is essential. Only then can it be predicted whether a ligand can potentially adopt a geometry that satisfies the interactions with the protein. On the receptor side, the geometry of the binding pocket can adapt to the shape of the ligand, similar to how a glove fits the hand of its wearer (induced fit, ▶ Sect. 4.1). Binding pockets are indeed found in the interior or in buried grooves on the surface of proteins, and it is there that the small but decisive conformational changes of the protein occur. An example of the adaptability of a protein is presented in ▶ Sect. 15.8. An attempt is made to describe these induced-fit adaptations by using molecular dynamics simulations.

17.2 Structural Superposition of Drug Molecules

For the moment, we want to limit ourselves to a case with an unknown receptor structure. All of the effects that ligand binding induces in the protein are therefore neglected. An example should clarify this. The fruit of the shrub Anamirta cocculus, the fishberry, contains the terpene picrotoxinin 17.1, which causes convulsions. This compound affects chloride channels (▶ Sect. 30.5). Because of its central stimulatory effect, it was used in the past as an antidote to sleeping-pill overdoses. Due to its high toxicity, it has no therapeutic importance today. The structure of picrotoxinin was determined by crystallography (Fig. 17.1). Synthetic modifications of the cyclic core structure have led to active and inactive derivatives (Fig. 17.2).
Table 17.1 Official IUPAC definition of a pharmacophore by Wermuth CG et al (1998) Pure Appl Chem 70:1129–1143

A pharmacophore is the ensemble of steric and electronic features that is necessary to ensure the optimal supramolecular interactions with a specific biological target structure and to trigger (or to block) its biological response.
A pharmacophore does not represent a real molecule or a real association of functional groups, but a purely abstract concept that accounts for the common molecular interaction capacities of a group of compounds toward their target structure.
A pharmacophore can be considered as the largest common denominator shared by a set of active molecules.
This definition discards a misuse often found in the medicinal chemistry literature, which consists of naming as pharmacophores simple chemical functionalities such as guanidines, sulfonamides, or dihydroimidazoles (formerly imidazolines), or typical structural skeletons such as flavones, phenothiazines, prostaglandins, or steroids.
A pharmacophore is defined by pharmacophoric descriptors, including H-bonding, hydrophobic, and electrostatic interaction sites, defined by atoms, ring centers, and virtual points.

The spatial structure of the individual derivatives can be constructed on a computer from the crystal structure of the parent compound, and the structures can be superimposed upon one another to recognize structural differences. The parts of the molecule that are seen as equivalent in a ligand-based pharmacophore model are
placed upon one another for this superimposition. The superposition of all active and inactive derivatives, along with the combined volumes of both classes, is shown in Fig. 17.3. The difference between the two volumes is computed. It describes those areas in space that are occupied only by the inactive molecules.

17.3 Logical Operations with Molecular Volumes

What information can be extracted from such comparative volumes? It is assumed as a working hypothesis that a molecule can only be bound when its size does not exceed the maximum available space. How large is the maximum available space? To get an idea of this, the combined volume of all active derivatives is considered and compared with the combined volume of all inactive derivatives. A possible explanation for the lack of activity of a molecule could then be that the area in the binding pocket that the molecule would occupy is already taken by the protein.

Volume comparisons between active and inactive derivatives deliver information about the possible shape of the receptor pocket. Such comparisons can be very helpful in drug design. If the “forbidden” volume for a compound class has been found, it can be checked before synthesis whether a new compound really leaves the “forbidden” area unoccupied.

Fig. 17.1 Picrotoxinin 17.1 is responsible for the centrally stimulating effect of extracts of fishberries. Its structure and spatial architecture were established by X-ray structure analysis.
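The logical volume operations of Sect. 17.3 can be illustrated with sets of grid cells: the union of all superimposed active molecules, the union of all inactive ones, and their difference as the “forbidden” volume. The sketch below is a toy model under loud assumptions: molecules are given as already superimposed lists of atomic spheres (x, y, z, radius in Å), the coordinates in the usage example are invented, and the function names and the 0.5 Å grid spacing are hypothetical choices.

```python
import math
from itertools import product

def voxelize(atoms, spacing=0.5):
    """Grid cells (i, j, k) whose centers lie inside any atomic sphere."""
    cells = set()
    for x, y, z, r in atoms:
        lo = [math.floor((c - r) / spacing) for c in (x, y, z)]
        hi = [math.ceil((c + r) / spacing) for c in (x, y, z)]
        for i, j, k in product(*(range(a, b + 1) for a, b in zip(lo, hi))):
            dx, dy, dz = i * spacing - x, j * spacing - y, k * spacing - z
            if dx * dx + dy * dy + dz * dz <= r * r:
                cells.add((i, j, k))
    return cells

def union_volume(molecules, spacing=0.5):
    """Combined volume of several superimposed molecules."""
    cells = set()
    for atoms in molecules:
        cells |= voxelize(atoms, spacing)
    return cells

def forbidden_volume(actives, inactives, spacing=0.5):
    """Cells occupied only by inactive molecules (set difference)."""
    return union_volume(inactives, spacing) - union_volume(actives, spacing)

def enters_forbidden(candidate, forbidden, spacing=0.5):
    """True if a candidate molecule occupies any forbidden cell."""
    return bool(voxelize(candidate, spacing) & forbidden)

if __name__ == "__main__":
    # INVENTED toy coordinates, not real picrotoxinin derivatives.
    actives = [[(0.0, 0.0, 0.0, 1.5)], [(1.0, 0.0, 0.0, 1.5)]]
    inactives = [[(0.0, 0.0, 0.0, 1.5), (3.0, 0.0, 0.0, 1.5)]]
    forbidden = forbidden_volume(actives, inactives)
    print(enters_forbidden([(0.5, 0.0, 0.0, 1.0)], forbidden))
    print(enters_forbidden([(3.0, 0.0, 0.0, 1.0)], forbidden))
```

Representing volumes as sets makes the operations named in the text direct: `|` forms the united volume, `-` the difference, and `&` tests whether a planned compound reaches into the forbidden region before it is synthesized.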
Fig. 17.2 By starting with picrotoxinin 17.1, active and inactive derivatives were synthesized.

Because of the rigidity of the molecule, it is simple to superimpose picrotoxinin analogues on one another. For flexible molecules, however, it cannot be expected that the transition from a 2D molecular representation to a 3D structure (▶ Chaps. 15, “Molecular Modeling” and ▶ 16, “Conformational Analysis”)
delivers molecules in conformations in which all of the functional groups of the pharmacophore are already analogously placed in space. Therefore, there are two problems to solve:
• The groups that correspond to one another in different molecules and define the pharmacophore must be determined.
• Techniques are needed that bring the molecules into conformations in which the equivalent groups of the pharmacophores are analogously oriented in space.

17.4 The Pharmacophore Is Modified by Conformational Transitions

To resolve the first problem, the role of the functional groups of the active substance that form the contact with the receptor must be considered. They must form hydrogen-bonding and hydrophobic interactions with the protein. In this context, similarity of the functional groups means that they can form analogous interactions with the protein. To define a pharmacophore in space, at least three interacting groups are needed. This is immediately clear if one considers how many fingers on a hand are needed to hold an irregularly shaped object (e.g., a potato) in space. With only two fingers, the object can still rotate about an axis. In contrast, if three anchor points are taken, its position is fixed in space. Practical experience with a compound class is often helpful when assigning pharmacophoric groups. For example, inhibitors of the angiotensin-converting enzyme (Fig. 17.4 and ▶ Sect. 25.5) need a terminal carboxylate group, a carbonyl group, and a group that coordinates to the catalytic zinc ion.

Fig. 17.3 Superposition of the spatial structure of active (yellow) and inactive (blue) derivatives of picrotoxinin. The united volumes around the active derivatives are shown by the red mesh. The total volume around all inactive derivatives is shown in blue. The difference between the two volumes is formed. The remaining volume (green) shows areas that are occupied only by inactive derivatives. An explanation for the lack of activity of these derivatives can be that they try to occupy volume areas that are already occupied by the receptor protein. This spatial clash does not occur with the active derivatives.
Fig. 17.4 Inhibitors of the angiotensin-converting enzyme. A pharmacophore that consists of a terminal carboxylate group, a carbonyl group, and a group that coordinates to the catalytic zinc is necessary for binding to the enzyme. The latter function is assumed by a thiol, a phosphoric, phosphonic, or carboxylic acid. The individual derivatives possess conformational flexibility in different areas.
How can it be determined whether a common orientation exists for the assumed equivalent groups in different molecules? In one computational method, these groups are assigned “virtual” springs that are coupled to one another. The spatial overlap is reinforced by pulling these springs together. To avoid arriving at an entirely distorted molecular geometry, a force-field is simultaneously taken into consideration for each molecule (▶ Chap. 15, “Molecular Modeling”). The steroid 17.2 and three different inhibitors 17.3–17.5 (Fig. 17.5) are considered as an example. They are ligands of an enzyme in ergosterol biosynthesis. Spring forces are applied between the marked atoms with the same numbers. The minimization of these forces along with the individual force-fields of the four molecules leads to the superposition that is shown in Fig. 17.5.

Fig. 17.5 “Virtual” springs are coupled to the atoms that are marked with numbers around the steroid 17.2 and the three derivatives 17.3–17.5. The structural superposition (bottom) that is shown is determined by the force of these springs and the simultaneous consideration of molecular force-fields.
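The interplay between “virtual” springs and a molecular force-field can be sketched with a deliberately reduced model. The following example is an illustration under strong assumptions, not the method used for Fig. 17.5: the “force-field” consists only of harmonic bond-length terms, the reference molecule is held fixed, and plain gradient descent is used as the minimizer. All names, force constants, and coordinates are hypothetical.

```python
import math

def energy_and_gradient(coords, springs, bonds, k_spring=1.0, k_bond=10.0):
    """Spring energy plus a bond-stretch-only 'force-field' for one mobile molecule.

    coords : list of [x, y, z] for the mobile atoms
    springs: list of (atom_index, target_xyz); target = paired atom of the reference
    bonds  : list of (i, j, ideal_length)
    """
    grad = [[0.0, 0.0, 0.0] for _ in coords]
    energy = 0.0
    for idx, target in springs:               # virtual springs pull paired atoms together
        for d in range(3):
            diff = coords[idx][d] - target[d]
            energy += 0.5 * k_spring * diff * diff
            grad[idx][d] += k_spring * diff
    for i, j, d0 in bonds:                    # bond terms resist distortion
        vec = [coords[i][d] - coords[j][d] for d in range(3)]
        dist = math.sqrt(sum(v * v for v in vec))
        energy += 0.5 * k_bond * (dist - d0) ** 2
        for d in range(3):
            g = k_bond * (dist - d0) * vec[d] / dist
            grad[i][d] += g
            grad[j][d] -= g
    return energy, grad

def minimize(coords, springs, bonds, step=0.02, n_steps=5000):
    """Plain gradient descent; the solution reached depends on the start geometry."""
    coords = [list(c) for c in coords]
    for _ in range(n_steps):
        _, grad = energy_and_gradient(coords, springs, bonds)
        for atom, g in zip(coords, grad):
            for d in range(3):
                atom[d] -= step * g[d]
    return coords

# Hypothetical three-atom chain; atoms 0 and 2 are paired with reference positions.
mobile = [(4.0, 1.0, 0.0), (5.5, 1.0, 0.0), (7.0, 1.0, 0.0)]
bonds = [(0, 1, 1.5), (1, 2, 1.5)]
springs = [(0, (0.0, 0.0, 0.0)), (2, (2.5, 0.0, 0.0))]

e_start, _ = energy_and_gradient(mobile, springs, bonds)
final = minimize(mobile, springs, bonds)
e_final, _ = energy_and_gradient(final, springs, bonds)
```

In this toy case the chain bends so that both paired atoms reach their targets while the bond lengths stay close to their ideal values. A realistic calculation would use a complete force-field with angle, torsion, and non-bonded terms and would let all molecules move.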
Unfortunately, the resulting solution depends on the starting conditions. If the molecules are differently oriented in space at the beginning of the calculation, or if they start from different conformations, different superpositions can result. At first glance, this argument perhaps appears somewhat implausible. It should be kept in mind, however, that the molecules are considered not only under the influence of the “virtual” spring forces but also under their own force fields. The multiple-minima problem of molecular force-field calculations was already mentioned in ▶ Chap. 16, “Conformational Analysis”. It plays an important role here too. The hiker from the last chapter should help to explain this problem. He stands on a mountain peak and wants to descend into the deepest valley possible. At the same time, he feels an “additional force” because he is very thirsty: he wants to meet his friends in a pub. The friends are coming from different peaks in the mountains. He sees a pub in every valley. But which one should he choose? For a common meeting point he would also accept a less deep valley. At the beginning of his hike he looks for the steepest descent to come down quickly. After a while, the other valleys fall from view. If he arrives at a different pub in the end, he no longer has the energy to look for another one. If he had started from a different mountain top, he might have reached a comparably deep valley but would have found the pub of his choice and met his friends at the same time. The problem with the choice of starting conditions for molecular comparisons with “virtual” spring forces is similar.

How can it be checked whether the best possible solution was found? Here only an experiment can help. For this it is necessary to synthesize molecules that are conformationally rigid in particular parts because of the incorporation of rings. They confer a fixed spatial arrangement to the pharmacophore. If they also possess activity, their rigidified geometry indicates the correct pharmacophore (see Sect. 17.9).

17.5 Systematic Conformational Search and Pharmacophore Hypothesis: The “Active-Analogue Approach”

In the last chapter conformational analysis was the central topic. Could the techniques described there, for example, the systematic rotation around particular bonds, be used in the search for the pharmacophore? Garland Marshall developed such a technique, called the active-analogue approach, at the end of the 1970s. First, a pharmacophore must be assigned to all molecules in a data set. Then the equivalency of groups must be defined, that is, which groups are equivalent to which other groups. Then a systematic conformational search is carried out for the first compound in the data set. During the search, the distances between the functional groups of the pharmacophore are determined for each geometry. These distances are saved. Because molecules cannot take on any arbitrary geometry, the distances will fall into particular intervals. An analogous approach is taken for the second molecule in the set. In principle, only the distance ranges of the first molecule must be searched. It could be that all of the distances found with the second molecule were already found with the first. It could also be that particular ranges are excluded, and the “allowed” distance ranges are therefore limited. All of the molecules in the data set are analyzed in this way.
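The bookkeeping behind this procedure can be sketched as follows. The example assumes that the conformational search has already been carried out and that, for every conformer, the coordinates of the pharmacophoric groups are listed in the agreed equivalence order. The binning of distances, the function names, and the toy coordinates are assumptions made for illustration; they do not reproduce Marshall's original implementation.

```python
import math
from itertools import combinations

def distance_key(points, bin_width=0.5):
    """Binned distances between all pairs of pharmacophoric points.

    points: coordinates of the pharmacophoric groups, always given in the
    same equivalence order for every molecule.
    """
    return tuple(
        int(math.dist(p, q) // bin_width)
        for p, q in combinations(points, 2)
    )

def allowed_patterns(conformers, bin_width=0.5):
    """All distance patterns that one molecule can adopt."""
    return {distance_key(conf, bin_width) for conf in conformers}

def common_patterns(data_set, bin_width=0.5):
    """Distance patterns accessible to every molecule in the data set.

    Molecules with the fewest patterns (the most rigid ones) are processed
    first, so the candidate list is small from the beginning.
    """
    pattern_sets = sorted(
        (allowed_patterns(confs, bin_width) for confs in data_set), key=len
    )
    common = set(pattern_sets[0])
    for patterns in pattern_sets[1:]:
        common &= patterns
    return common

# Hypothetical toy data: three pharmacophoric points per conformer.
rigid = [[(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]]
flexible = [
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0)],   # extended
    [(0.0, 0.0, 0.0), (3.1, 0.0, 0.0), (0.0, 4.2, 0.0)],   # folded
]
shared = common_patterns([flexible, rigid])
```

Only the folded conformer of the flexible molecule reproduces the distance pattern of the rigid one, so a single pattern survives. Fixed bins are a simplification: distances close to a bin boundary can be separated artificially, which a tolerance-based comparison would avoid.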
If the conformational flexibility of the molecule is limited in one part of the scaffold, there is a chance that the functional groups of the pharmacophore remain in only one or a few different spatial patterns. The possible binding geometries of the pharmacophoric groups of the ligand are derived from this. Afterward, a geometry optimization can be carried out. The “virtual-springs” approach is now ideally suited for this because the systematic search has already approximated the final solution rather closely.

It is easy to imagine that the order in which the molecules are investigated is decisive for the efficiency of the technique. Ideally, the most rigid molecule from the data set is the first to be studied. With a little luck, this excludes a large part of the possible conformational space. The resulting list of possible distances will remain small. By consistently using such limitations, Garland Marshall and his research group were able in 1987 to propose a model for the receptor-bound conformation of the ACE inhibitors shown in Fig. 17.4. What could be more rewarding than to be able, years later, to validate the model personally and to find that it proved correct within an astonishingly small error margin! The validation has since been achieved because the crystal structures of the enzyme with bound inhibitors from this data set were solved (see ▶ Sect. 25.5).

17.6 Molecular Recognition Properties and the Similarity of Molecules

It is fair to ask whether the properties of the molecules were really represented appropriately in the comparisons attempted in the previous sections. Deciding which functional groups belong to the individual “teeth” of a pharmacophore is not easy. Analogous functional groups must be oriented in a similar spatial direction in all molecules. In the case of the ACE inhibitors (Fig. 17.4), conflict occurs already during the assignment of the functional groups. Some analogues carry two carboxylate groups, which must be unambiguously assigned to the pharmacophore prior to comparison with other inhibitors.

The binding of low molecular weight ligands to a protein is a mutual, targeted recognition process. Both partners must fit together so that a strong interaction can be formed. Parts of the ligand that have complementary recognition properties determine the binding to the receptor. The term “recognition properties” refers to all qualities that contribute to the specific interaction between molecules. Until now, only properties and similarities have been considered that could be directly read from the molecular scaffold. But is that sufficient? How would the world look if we recognized ourselves only by our “scaffolds,” that is, only by the skeletons? Male and female could not even be differentiated straightaway on these grounds! All of the allure of interpersonal relationships that function over personal appearance and charisma would be lost. Until now, molecules have been considered on the grounds of their “skeleton”. Why should ligand–receptor interactions be described at this level? Molecules, too, recognize one another by the properties of their shapes and of the surfaces that they expose to their immediate vicinity to form contacts. The following example should clarify this point. Methotrexate 17.6 (MTX) and dihydrofolate
17.7 (DHF) bind to the enzyme dihydrofolate reductase (Fig. 17.6 and ▶ Sect. 27.2). The side chains of both molecules are nearly identical, but the heterocycles are different. It is known from NMR spectroscopic investigations that the protonated form of MTX binds to the protein. When considering the chemical formulae, it is tempting to overlay the two heterocycles directly upon one another. Good scaffold equivalence is achieved, and the heteroatoms in both molecules fall on top of one another. The receptor, however, does not care about the apparent equivalence of molecular skeletons. The interaction with the molecular surface is much more important. Polar molecules such as MTX or DHF are bound to the protein through hydrogen bonds. The arrows in Fig. 17.6 characterize the H-bond donor and acceptor groups. The arrows point toward the molecule when an acceptor property is exposed, and away from it in the case of donor groups. At the start, the molecules are oriented in space so that they correspond in terms of a direct atom–atom matching. For the moment, the basic molecular skeleton is ignored, and only the distribution of H-bond donor and acceptor groups is considered. The equivalence achieved is not very convincing. Another variant is therefore taken into consideration in which the heterocycle of DHF is flipped over about the bond between the heterocycle and the side chain. The spatial overlap of both molecules is no longer optimal, but the pattern of exposed donor and acceptor groups for both molecules shows much better agreement (Fig. 17.6). Transformed into this other conformation, the molecule has entirely different molecular recognition properties. In cases such as this one, the difference can hardly be read from chemical formulae, even by a trained eye.

Models are nice, but are they also correct? Here only an experiment can provide an answer. Luckily, in the present case, crystal structures are available for both ligands in complex with DHFR. The observed binding geometries are shown in Fig. 17.7. One aspartate, two carbonyl groups in the main chain, and two water molecules are responsible for recognition in the binding pocket. The water molecules mediate the H-bonds between ligand and protein. The experimentally determined binding geometries show that the ideas about the similarity of the hydrogen-bond properties led to the correct conclusions. The orientation of the two ligands in the binding pocket, surprising and seemingly “non-equivalent” at first glance, is easily explained. The properties that are responsible for the mutual recognition process must be compared with one another. Only these count in the comparison! It is notable that this experimental confirmation of the above-described ideas came eight years after the working hypothesis was proposed. This is a nice example of what a model hypothesis can achieve.

Other properties, apart from hydrogen bonds, can serve as additional criteria to define similarities in the molecular-recognition process. The electrostatic potential (▶ Chap. 15, “Molecular Modeling”) computed for the heterocyclic ring systems of DHF and MTX (Fig. 17.7) suggests very similar conclusions. In addition to the previously mentioned H-bonding properties and electrostatic potential, steric space filling and the distribution of hydrophobic properties on the surface of both ligands play an important role. When molecules are superimposed to predict their putative geometries in the binding pocket, their conformational flexibility must also be considered.
17.7 Automated Molecular Comparisons and Superpositioning Based on Recognition Properties

Is it possible to consider all of the properties that were mentioned in the last section in a method to superimpose molecules for a relative comparison? For this, a measure of similarity for all properties must be calculated. This measure must be related to a spatial distance function. Subsequently, the spatial superposition can be optimized so that the maximum similarity of the chosen properties is sought. The program SEAL from Simon Kearsley and Graham Smith determines the spatial similarity of different properties distributed over the molecular scaffold. It simultaneously ranks the similarity with respect to the overlap volume of the molecules that is determined during superposition. In this way the superposition of MTX and DHF is predicted correctly, in agreement with experiment. The conformational flexibility is also considered in this analysis.

Fig. 17.6 Methotrexate 17.6 and dihydrofolate 17.7 are ligands of dihydrofolate reductase. The side chain R (see ▶ Sect. 27.2, Fig. 27.9) is identical for both except for a methyl group on the nitrogen atom. The heterocycles are different. (a) Intuitively, superposition of both heterocycles directly upon one another appears reasonable when comparing the structures. The heteroatoms match one another pairwise. (b) Arrows are distributed around the molecules to compare the hydrogen-bonding properties. They point toward the molecule when an acceptor is present and away from it for donor groups. If the molecular skeletons are masked out and attention is concentrated on the distribution of H-bond donor and acceptor groups, the atom–atom overlap obtained via the direct superposition of the rings shows rather unconvincing equivalence. (c) If instead the heterocycle in 17.7 is flipped about the bond between the heterocycle and the side chain R, the pattern of donor and acceptor groups that is obtained exhibits convincing equivalence.
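How a similarity measure can be tied to a spatial distance function may be illustrated with a Gaussian-weighted pairwise score. The sketch below is written in the spirit of such methods only; the exact functional form and the property weights used in SEAL are not given in the text and are not reproduced here. The property values (+1 for a donor, −1 for an acceptor), the parameter `alpha`, and the coordinates are hypothetical.

```python
import math

def gaussian_similarity(mol_a, mol_b, alpha=0.5):
    """Sum over all atom pairs of property product times a Gaussian of the distance.

    mol_a, mol_b: lists of ((x, y, z), property_value).
    Equal properties close in space raise the score; opposite properties
    close in space lower it.
    """
    score = 0.0
    for pos_a, prop_a in mol_a:
        for pos_b, prop_b in mol_b:
            r2 = sum((a - b) ** 2 for a, b in zip(pos_a, pos_b))
            score += prop_a * prop_b * math.exp(-alpha * r2)
    return score

# Hypothetical toy example: +1 marks an H-bond donor, -1 an acceptor.
reference = [((0.0, 0.0, 0.0), +1.0), ((2.0, 0.0, 0.0), -1.0)]

# Direct atom-on-atom overlay: donor lies on acceptor and vice versa.
direct = [((0.0, 0.0, 0.0), -1.0), ((2.0, 0.0, 0.0), +1.0)]

# Flipped overlay of the same molecule: donor lies on donor, acceptor on acceptor.
flipped = [((2.0, 0.0, 0.0), -1.0), ((0.0, 0.0, 0.0), +1.0)]

score_direct = gaussian_similarity(reference, direct)
score_flipped = gaussian_similarity(reference, flipped)
```

In this toy case the flipped placement scores higher than the direct atom-on-atom overlay, which mirrors the argument made for MTX and DHF. A superposition program would additionally optimize rotation, translation, and conformation to maximize such a score.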
To take conformational flexibility into account, precalculated conformers can be taken and compared successively with one another. This is realized in the program ROCS from Anthony Nicholls at OpenEye. A different approach was taken by Christian Lemmen at GMD in St. Augustin, Germany, in the program FlexS. First, a reference ligand is depicted through a series of property-bound Gaussian functions. The molecule is thus described as a density distribution of pharmacophore properties in space. Then the molecule that is to be superimposed on the reference ligand is broken down into fragments. A central base fragment is laid upon the reference in such a way that its description with Gaussian functions overlaps with that of the reference as well as possible. Then the other fragments are attached to the base fragment until the complete ligand is reconstructed. During this attachment, care is taken to fit the fragments just as well to the Gaussian functions. At the same time the conformational flexibility of the ligand is considered.

Fig. 17.7 Experimentally determined binding geometries of methotrexate (green carbon atoms) and dihydrofolate (gray carbon atoms) in dihydrofolate reductase. The heterocycles of the ligands are bound through H-bonds to the carboxylate or carbonyl group of an amino acid that is oriented into the binding pocket. Two water molecules (red spheres) mediate additional H-bonds between the ligands and the protein. The difference in the binding mode that is discussed in Fig. 17.6 is clearly recognized. On the right-hand side the electrostatic potentials around methotrexate (top) and dihydrofolate are shown. The molecules are shown in the spatial orientation that was determined by crystal structure analysis. Considered qualitatively, the electrostatic potentials of both molecules in this orientation have a very similar form.

One complication occurs during the similarity analysis of the molecules in this method. Assuming that the relevant properties defining the similarity were found at all, the question arises as to what is accepted as “sufficient” similarity to induce
a comparable effect on the receptor. There is a toy with which children try to push differently shaped pieces through preformed holes into a box, a so-called “shape sorter.” For each block form, be it a cube, cuboid, round cylinder, or elliptical cylinder, there is one preformed hole that it fits. In similarity considerations there is a tendency to group cube and cuboid, or round and elliptical cylinder, into related categories because of their similar form. If an attempt is made to push these parts through the holes of the shape sorter, it is easily discovered that the cuboid will fit not only through the square hole but also, with a bit of force, through the hole for the elliptical cylinder. The cube is only slightly too big to fit, in addition to the square hole, through the hole for the circular cylinder as well. Are the cuboid and the elliptical cylinder, or the cube and the circular cylinder, therefore not more similar to one another? The measure of similarity that is to be used for a molecule is calibrated with respect to the receptor to which the molecule should fit. It is therefore always a relative measure! Thiorphan and retro-thiorphan (▶ Sect. 5.5, formulae 5.23 and 5.24) differ only in the orientation of the amide bond. They bind with almost identical affinity to the zinc proteases thermolysin and NEP 24.11. With respect to these enzymes, one would therefore classify them as very similar. The zinc protease ACE, however, binds thiorphan at least 100 times more strongly than retro-thiorphan (▶ Sect. 5.5, Fig. 5.10). Relative to this enzyme, both substances must be called dissimilar. Another extreme is seen in the oligopeptide-binding protein A (▶ Sect. 4.1). It binds every tri- to pentapeptide comprising a central Lys–Xxx–Lys moiety with almost equal affinity. In principle, only information about the shape of the binding site is needed for a similarity analysis. Only then can the requirements be adequately defined. However, the structure of the receptor is still not known in many drug-design projects. Here there is no choice: it is only through hypothesis and its experimental testing in gradual steps that the structural requirements of the receptor can be approximated.

17.8 Rigid Analogues Trace the Biologically Active Conformation

The concepts in ▶ Chap. 16, “Conformational Analysis” showed that an enormously large number of conformers can easily be generated for many drug-like molecules. If comparison of all conformers is desired, the undertaking quickly becomes computationally very intensive. When is there a chance of getting an idea of the bound conformations? Either one compound in the data set is highly rigid and constrains the putative arrangements of the pharmacophore in space, or the considered molecules are rigid in different areas of their molecular scaffold. In Fig. 17.8 the structural superposition of the steroid 17.2 with the above-described inhibitors 17.3–17.5 is shown. This result was obtained from a similarity analysis with multiple conformers. The achieved result is very similar to the calculation with the “virtual” spring forces. It has, however, a decisive advantage: no preconceived definitions of equivalent centers are necessary, between which the spring forces
are applied. These equivalences arise automatically through a similarity comparison of the properties that are distributed over the molecules.

17.9 If Rigid Analogues are Lacking: Model Compounds Elucidate the Active Conformation

In the last example a largely rigid reference compound was available. How should one proceed when no such reference compound is known? Only experiment can help here. Rigidified analogues must be synthesized and tested for biological activity. If they still exhibit affinity to the receptor, it can be assumed that the active conformation was frozen. An example should demonstrate how the receptor-bound conformation can be probed by synthesizing rigid model compounds. The calcium channel blocker nifedipine 17.8 (▶ Sect. 2.5) contains multiple rotatable bonds (Fig. 17.9). It can therefore adopt numerous conformations. Which orientation does the phenyl ring, for instance, take relative to the dihydropyridine ring? This question was very elegantly clarified by Wolfgang Seidel at Bayer through the synthesis and crystal structure determination of cyclized derivatives 17.9. An additional lactone ring changes the biological activity of the derivative depending on the ring size. In compounds with a six-membered lactone the phenyl and dihydropyridine rings lie virtually in the same plane. Conversely, the phenyl ring stands perpendicular to the dihydropyridine ring in the derivative with the twelve-membered ring. The affinity of this compound is about five orders of magnitude higher than that of the derivative with the six-membered lactone. Therefore it must be assumed that nifedipine exerts its effect in a conformation in which the phenyl and dihydropyridine rings are perpendicular to one another. Once this question has been answered, more compounds can be designed. A relevant superposition that corresponds to the conditions in the protein’s binding pocket becomes possible. Such superpositions have gained decisive importance in the context of 3D structure–activity relationships. An example is shown in ▶ Sect. 29.4 of how the structural fixation of the biologically active conformation of a ligand can support the design process.

Fig. 17.8 Superposition of the steroid 17.2 and three inhibitors 17.3–17.5 according to a spatial comparison of their molecular properties. In contrast to methods with “virtual” spring forces, this method does not require a predefined equivalence of molecular groups. It is automatically generated by the similarity comparison of many different conformations.
Fig. 17.9 The calcium channel blocker nifedipine 17.8 contains multiple rotatable bonds. The phenyl ring can lie in the plane of the dihydropyridine ring, or the two rings can be oriented perpendicular to one another. To distinguish between these possibilities, lactones 17.9 with different ring sizes were synthesized and their crystal structures were determined. The phenyl ring lies almost parallel to the dihydropyridine ring (α ≈ 0°) in the compound with the six-membered-ring lactone (orange). Upon increasing the ring size, the angle between the two rings grows so that a perpendicular orientation (α ≈ 80°) is achieved in the twelve-membered-ring derivative (green). The biological activity increases from virtually inactive for the six-membered ring to almost five orders of magnitude higher for the twelve-membered-ring derivative. The bioactive conformation of nifedipine (gray) therefore requires a perpendicular orientation of the two rings.
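To put the affinity difference of about five orders of magnitude into energetic terms, the ratio of the Ki values can be converted into a difference in the free energy of binding via ΔΔG = RT ln(Ki,weak / Ki,strong). The short calculation below assumes that the Ki values behave as dissociation constants and that T = 298.15 K; the function name is chosen for illustration only.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)

def delta_delta_g(ki_ratio, temperature=298.15):
    """Free-energy difference (kJ/mol) that corresponds to a ratio of Ki values."""
    return R_KJ * temperature * math.log(ki_ratio)

# Five orders of magnitude between the six- and twelve-membered lactone derivatives
ddg_five_orders = delta_delta_g(1.0e5)
# One order of magnitude, for comparison
ddg_one_order = delta_delta_g(10.0)
```

Under these assumptions, each order of magnitude in Ki corresponds to roughly 5.7 kJ/mol, so five orders of magnitude amount to about 28.5 kJ/mol.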
17.10 The Protein Defines the Pharmacophore: “Hot Spot” Analysis of the Binding Pocket

It was described in Sect. 17.1 that a pharmacophore can also be derived from the protein structure. The computer program GRID from Peter Goodford is a tool that is often used for this purpose. It calculates favorable positions for functional groups of a putative ligand in the protein’s binding pocket. These could be, for instance, a carboxylate group, a hydroxyl group, or an aliphatic carbon atom. The potential function implemented in GRID has been calibrated on numerous functional groups from crystal structures of organic molecules. The result of a GRID calculation is a set of interaction energies assigned to the intersections of a regularly spaced grid that is inscribed into the binding pocket. The energies are graphically displayed, for instance, by contouring the spatial area at which the interaction energy reaches or exceeds a certain predefined threshold. They indicate hot spots for the placement of functional groups of a potential ligand. The areas in which the interactions with an aromatic carbon atom or a hydroxyl oxygen atom are favorable are shown for the enzyme thermolysin in Fig. 17.10. Such calculations are carried out with a set of different probes, for instance, a water molecule, an aromatic carbon, a hydrogen-bond acceptor or donor, or a positively or negatively charged group. The results provide valuable information about the shape and electrostatic properties of the binding pocket.

Another way of analyzing protein structures is based on the idea that the physical nature of non-bonding interactions in protein–ligand complexes and in the crystal packing of small organic molecules is identical. The latter are particularly interesting for this purpose because the crystal structures of small organic molecules are regularly determined with great precision. There are over 500,000 crystal structures stored in the Cambridge Database (▶ Sect. 13.9).
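The grid-based hot-spot calculation described earlier in this section can be sketched in a few lines. The example is a strongly simplified stand-in and does not reproduce GRID's calibrated potential function: it assumes a generic Lennard-Jones term and a Coulomb term with a distance-dependent dielectric (ε = 4r), identical Lennard-Jones parameters for all atoms, and a single charged protein atom as the “pocket”. All names and parameter values are hypothetical. Favorable energies are negative, so grid points are kept when the energy is at or below the threshold.

```python
import math
from itertools import product

COULOMB = 332.0  # kcal*angstrom/(mol*e^2), conversion factor for point charges

def probe_energy(point, protein_atoms, probe_charge, epsilon=0.2, r_min=3.5):
    """Interaction energy (kcal/mol) of a spherical probe with all protein atoms.

    protein_atoms: iterable of (x, y, z, partial_charge).
    Assumed generic terms: 12-6 Lennard-Jones plus Coulomb with dielectric 4r.
    """
    energy = 0.0
    for x, y, z, charge in protein_atoms:
        r = max(math.dist(point, (x, y, z)), 0.5)   # avoid the singularity at r = 0
        ratio6 = (r_min / r) ** 6
        energy += epsilon * (ratio6 * ratio6 - 2.0 * ratio6)
        energy += COULOMB * probe_charge * charge / (4.0 * r * r)
    return energy

def hot_spots(protein_atoms, lower, upper, spacing, probe_charge, threshold):
    """Grid points whose probe energy is at or below the (negative) threshold."""
    axes = []
    for lo, hi in zip(lower, upper):
        n = int(round((hi - lo) / spacing))
        axes.append([lo + i * spacing for i in range(n + 1)])
    spots = []
    for point in product(*axes):
        energy = probe_energy(point, protein_atoms, probe_charge)
        if energy <= threshold:
            spots.append((point, energy))
    return spots

# Hypothetical pocket: a single negatively charged atom (e.g., a carboxylate oxygen).
pocket = [(0.0, 0.0, 0.0, -1.0)]
spots = hot_spots(pocket, (-5.0, -5.0, -5.0), (5.0, 5.0, 5.0),
                  spacing=1.0, probe_charge=1.0, threshold=-5.0)
spot_points = {point for point, _ in spots}
```

For a positively charged probe, the selected grid points form a shell around the charged atom; points that overlap with the atom are rejected by the repulsive term, and distant points fall short of the threshold. Repeating the calculation with different probes yields one hot-spot map per probe type.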
This collection is ideal to obtain relevant and reliable data via a statistical analysis for ligand-design purposes (▶ Sect. 14.7). Let us assume that there is a carboxylate group –COO⁻ on the protein that protrudes into the binding pocket. Where must a partner group be positioned to form a favorable interaction? To answer this question, the Cambridge Database was first searched for compounds with carboxylate groups. Then, for each of the retrieved groups, the position of the counter group that forms an H-bond to the carboxylate was saved. Finally, all of the H-bonds found were superimposed by placing the carboxylate groups of all examples exactly on top of one another. The distribution of H-bond-donor groups (Fig. 17.11) offers a valuable picture of the allowed area of the H-bond geometry. Subsequently, such a distribution can be superimposed onto the protein structure by matching it with the carboxylate group of the protein. Areas in which the distribution overlaps with other atoms of the protein are discarded. In this way the energetically most favorable areas for a counter group in the binding pocket are found. In Fig. 17.12 these distributions are compared with a protein–ligand complex. As expected, the hydrogen-bond geometries found in the complex coincide nicely with the range that was found in the crystal packings of organic molecules. A system of rules for non-bonding interactions in protein–ligand complexes was obtained from the statistical evaluations of all groups that are found in proteins. These rules are compiled at
the Cambridge Crystallographic Data Centre in the IsoStar database. Once superimposed with the protein, they can be contoured to map hot spots of binding with the program SuperStar.

Fig. 17.10 An analysis of the binding pocket of thermolysin. Areas of favorable interactions were calculated for an aromatic carbon probe (white) and a hydroxyl oxygen atom (red). Also shown are the fragments mentioned in Fig. 7.8 (benzylsuccinic acid, acetone, water, acetonitrile, isopropanol, and phenol), whose positions could be determined by allowing the probe molecules to diffuse into the protein crystals. The calculated hot spots correspond well with the positions that were crystallographically determined with molecular probes.

Knowledge-based potentials represent another approach for the display of a protein-based pharmacophore. For this, the contact geometries in protein–ligand complexes are evaluated. A histogram is compiled that shows how often a particular contact occurs between a group of a ligand and an amino acid in the protein. If such a statistical frequency distribution is related to a mean reference
state, an energy function can be calculated from it. In this function it is assumed that contacts that occur more frequently than in the average distribution are energetically favorable. If they occur rarely, they are classified as unfavorable. These statistical potentials have been integrated into the scoring function DrugScore. They can also be used to analyze binding pockets and help to indicate hot spots of ligand binding.

Fig. 17.11 Hydrogen-bonding geometries (carbon is green, oxygen is red, and hydrogen is white) around a carboxylate group (a), ester group (b), carbonyl group (c), and ether group (d). Structures with these central groups that form hydrogen bonds with OH donor groups were extracted from the Cambridge Database. These examples were superimposed based on the geometry of the central group. It is obvious that there is considerable variability in the interaction geometry, but also that preferred orientations are to be found. It is also shown that, for instance, the interaction pattern around an ester group (b) is not simply a superimposition of the distribution around a carbonyl group (c) and an ether group (d).

The MCSS method was developed in the group of Martin Karplus. In this method, several thousand randomly placed probe molecules such as acetone, water, methanol, or benzene are put into a binding pocket. A computer simulation is started with
which the individual probe molecules are moved into optimal positions. They are driven by a calculation based on the underlying force field. The probe molecules experience the interaction with the protein, but they do not “see” one another. At the end of the calculation a frequency distribution for the probe molecules is obtained. Evaluating this distribution highlights the hot spots for an interaction with the protein. If the hot spots obtained in this way are compiled into a composite picture, a protein-based pharmacophore is obtained.

Fig. 17.12 The distributions of H-bond-donor groups (carbon is white, oxygen is red, and nitrogen is blue) around a carboxylate group or a carbonyl group are superimposed with the 3D structure of the complex of methotrexate with dihydrofolate reductase (Fig. 17.7). The distributions are imposed onto the acid group of Asp26 and the carbonyl groups of Leu4 and Ala97. The hydrogen bonds formed between protein and ligand coincide geometrically with the ranges often found in the crystal structures of small organic molecules. (Residue labels in the figure: Ala97, Leu4, Asp26, Tyr155.)

17.11 The Search for Pharmacophore Patterns in Databases Generates Ideas for Novel Lead Compounds

A pharmacophore can be used to search a database for promising candidates that can be accommodated in a protein’s binding pocket. The reference pharmacophore can either be derived from a set of superimposed ligands, or a reference protein can define its properties. How such a database search is carried out and what is discovered in the process depends on how much information is stored in the database itself. If only 2D structures are collected, all examples can be retrieved that possess a particular functional group or substructure. Based on the topology, different criteria are defined to determine the degree of similarity between molecules. If the pharmacophore is defined only very generally, for instance, as an aromatic compound with an acid group and a basic
nitrogen atom, then numerous hits will be found. What matters, however, is the relative spatial distance between these groups, and such information is not taken into account in searching a 2D database. Matthias Rarey and Scott Dixon developed the Feature Trees method, which can screen large databases according to topological criteria. However, the connectivities of the chemical formulae are not compared. Rather, the database entries are initially classified by the topological sequence of particular characteristics, for instance, the presence of an H-bond-donor group or a hydrophobic cyclic molecular portion. Such a method can very quickly compare molecules and find candidates that have pharmacophore properties in a comparable topological sequence.

Databases that contain 3D molecular geometries allow a search for the spatial pattern of the pharmacophore. For example, the Cambridge Database of crystal structures of small organic molecules (▶ Sect. 13.9) can be used for such a search. Molecules are found with experimental geometries that satisfy the pharmacophore. In the search for ligands for HIV protease (▶ Sect. 24.3) a pharmacophore pattern was derived from the known crystal structure of the enzyme, and the Cambridge Database was searched for molecules that match this pattern. The result of this search is presented in detail in ▶ Sect. 24.4 (Fig. 24.16). It gave the researchers at Dupont–Merck the first ideas that led to the development of an entirely new class of non-peptidic HIV-protease inhibitors. These days, databases containing 3D structures of molecules generated from 2D structural formulae are commonly used alongside experimental structural databases. In other approaches, the molecule’s spatial structure is generated on the fly during the search (▶ Sect. 15.2). Here, as with most entries in the Cambridge Database, each molecule is present in only one conformation.
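Checking a single stored conformation against a 3D pharmacophore essentially comes down to comparing distances between feature points. The following Python fragment is only a minimal sketch of that idea, not the algorithm of any of the programs named in this chapter. The feature names, the query distances, the tolerance, and the function `matches` are hypothetical and chosen purely for illustration.

```python
from itertools import product
from math import dist

# Hypothetical three-point query: target distances in angstroms
# between pairs of pharmacophoric feature types (assumed values).
QUERY = {
    ("aromatic", "acid"): 6.0,
    ("aromatic", "base"): 5.0,
    ("acid", "base"): 7.5,
}


def matches(features, query=QUERY, tol=1.0):
    """Return True if one conformer satisfies every query distance.

    `features` is a list of (feature_type, (x, y, z)) tuples for a single
    conformation; `tol` is the accepted deviation per distance in angstroms.
    """
    types = sorted({t for pair in query for t in pair})
    # All positions in the molecule that carry each required feature type.
    candidates = [[xyz for ft, xyz in features if ft == t] for t in types]
    # Try every way of picking one position per feature type.
    for combo in product(*candidates):
        placed = dict(zip(types, combo))
        if all(abs(dist(placed[a], placed[b]) - d) <= tol
               for (a, b), d in query.items()):
            return True
    return False


# Two made-up conformers of the same molecule (coordinates in angstroms).
extended = [("aromatic", (0.0, 0.0, 0.0)),
            ("acid", (6.0, 0.0, 0.0)),
            ("base", (0.0, 5.0, 0.0))]
folded = [("aromatic", (0.0, 0.0, 0.0)),
          ("acid", (3.0, 0.0, 0.0)),
          ("base", (0.0, 5.0, 0.0))]

print(matches(extended), matches(folded))
```

In this toy case only the extended geometry satisfies the query, which illustrates why a database holding one conformation per molecule can miss flexible molecules that would fit in another conformation.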
Molecules can, however, adopt many different conformations (▶ Chap. 16, “Conformational Analysis”). It is therefore the exception rather than the rule that a flexible molecule is stored in the “right” conformation required for the search, and conformational flexibility must be considered during the search. An elaborate search, for example, the active-analogue approach, would demand too much computational time. Therefore fast algorithms have been developed to determine whether particular pharmacophoric groups on the molecules could fall within predefined distances. It is enough to estimate the minimum and maximum achievable distances. This concept has been realized, for example, in the program UNITY from the company Tripos. Alternatively, one can start from a database holding multiple precalculated conformers. Here it is critical that the stored conformers are distributed as representatively as possible throughout the conformational space (▶ Sect. 16.6). The individual conformers are then checked to see whether they fit the defined pharmacophore. This concept is followed by the program Catalyst from the company Accelrys.

It is not to be expected that such database searches directly deliver candidates for clinical trials. As an idea generator, however, they can guide the drug researcher to novel lead structures and can drive synthetic plans down entirely different pathways. Today database searches are carried out on a large scale during the course of virtual screenings (▶ Sect. 7.6). For this, proprietary compound libraries are screened, or collections of commercially available compounds are searched.
John Irwin and Brian Shoichet at UCSF in San Francisco have taken the initiative with the database ZINC, which collects currently commercially available compounds and makes the collection available for database searches. Preset filters help to sieve out the desired subsets for the search at hand from the millions of compounds in the database. As a major advantage, the hits found can be purchased and experimentally tested in an assay. Many candidates for new lead structures have already been discovered by using this “lead discovery by shopping” strategy (see ▶ Sect. 21.7).

17.12 Synopsis

• The structure of the binding pocket determines which functional groups are necessary on the ligand side for successful protein binding. Either the ligand or the protein structure can be used as the starting point from which a pharmacophore is derived.
• The superposition of active and inactive small-molecule ligands from a series of related compounds upon one another can be used to define the allowed and forbidden areas in a hypothetical binding pocket. Logical operations on the volume differences guide the design of optimized ligands.
• Flexible molecules that can adopt different conformations present a special challenge in superpositions. The molecules must be energy-minimized as part of the superposition procedure or, alternatively, multiple conformations must be evaluated.
• Alternatively, a set of molecules can be superimposed by assigning pharmacophoric groups, and through systematic rotations about all open-chain single bonds a common alignment is found in the active-analogue approach.
• Care must be taken not to be deceived by molecules that look similar with respect to their chemical formulae. It is the interacting functional groups that are important for molecular recognition at the binding pocket, not the scaffold itself. The role of water in the binding must not be underestimated.
• Molecular recognition properties can also be used to superimpose molecules onto one another.
• The synthesis of a structurally rigid analogue (or analogues) can help to define and validate the pharmacophore assignment and the determination of the biologically active conformation.
• Binding “hot spots” can be found by mapping the binding pocket of the protein with small molecules and probes with different properties. These give some idea as to what sort of molecule might bind successfully to the target protein.
• The Cambridge Database of crystal structures provides valuable insights into preferred interaction geometries and motifs. Such information is of high relevance for protein–ligand complexes because the forces that are responsible for crystal packing are the same as for non-bonding interactions between active substances and proteins.
• A variety of databases are available that can be screened by using a 3D pharmacophore as a search query. Usually, commercially available compounds are screened first. If they are predicted to be active on a certain protein of interest, they can be purchased and tested, and will hopefully provide a starting point for lead discovery.

Bibliography

General Literature

Klebe G (1993) Structural alignment of molecules. In: Kubinyi H (ed) 3D-QSAR in drug design, Theory, methods and application. ESCOM, Leiden, pp 173–199
Langer T, Hoffmann RD (2006) Methods and principles in medicinal chemistry. In: Mannhold R, Kubinyi H, Folkers G (eds) Pharmacophores and pharmacophore searches, vol 32. Wiley-VCH, Weinheim
Marshall GR (1989) Computer-aided drug design. In: Richards WG (ed) Computer-aided molecular design. IBC Technical Services, London, pp 91–104

Special Literature

Bolin JT, Filman DJ, Matthews DA, Hamlin RC, Kraut J (1982) Crystal structure of Escherichia coli and Lactobacillus casei dihydrofolate reductase refined at 1.7 Å resolution. J Biol Chem 257(13):13650–13662
Kearsley SK, Smith GM (1990) An alternative method for the alignment of molecular structures: maximizing electrostatic and steric overlap. Tetrahedron Comput Methodol 3:615–633
Klebe G, Mietzner T, Weber F (1995) Different approaches toward an automatic structural alignment of drug molecules: applications to sterol mimics, thrombin and thermolysin inhibitors. J Comput-Aided Mol Des 8:751–778
Klunk WE, Kalman BL, Ferrendelli JA, Covey DF (1983) Computer-assisted modeling of the picrotoxinin and γ-butyrolactone receptor site. Mol Pharmacol 23:511–518
Kuster DJ, Marshall GR (2005) Validated ligand mapping of ACE active site. J Comput-Aided Mol Des 19:609–615
Mackay MF, Sadek M (1983) The crystal and molecular structure of picrotoxinin. Aust J Chem 36:2111–2117
Marshall GR, Barry CD, Bossard HE, Dammkoehler RA, Dunn DA (1979) The conformational parameter in drug design: the active analog approach. In: Olson EC, Christoffersen RE (eds) Computer-assisted drug design, vol 112, ACS symposium series. American Chemical Society, Washington, DC, pp 205–226
Martin YC (1992) 3D database searching in drug design. J Med Chem 35:2145–2154
Mayer D, Naylor CB, Motoc I, Marshall GR (1987) A unique geometry of the active site of angiotensin-converting enzyme consistent with structure-activity studies. J Comput-Aided Mol Des 1:3–16
Seidel W, Meyer H, Born L, Kazda S, Dompert W (1984) Rigid calcium antagonists of the Nifedipine-type: geometric requirements for the dihydropyridine receptor. In: Seydel JK (ed) QSAR as strategies in the design of bioactive compounds. VCH, Weinheim, pp 366–369
18 Quantitative Structure–Activity Relationships

Quantitative structure–activity relationships, QSAR (usually pronounced [ˈkyü: sar]), attempt to describe and quantify the correlation between chemical structure and biological activity. The investigated substances should come from a chemically uniform series and must interact with the same biological target. They should also display the same mode of action. For example, structurally analogous inhibitors of a particular protein can be compared among themselves, but not different blood-pressure-lowering drugs that have diverse modes of action on different target proteins. The correlation of biological activity with physicochemical properties always refers to relative potency in a test model, not to qualitatively different effects.

The foundation of quantitative correlations between chemical structure and biological effect is the entirely reasonable assumption that differences in the physicochemical properties are responsible for the relative potency of the interactions of the drug with biological macromolecules. It is assumed in the first approximation that these contribute additively to the affinity of an active substance for its receptor. The concept of describing the biological activity of substances with mathematical models is derived from this approach.

For the system under investigation, it can be assumed that the simpler it is, the more likely it will be that a quantitative structure–activity relationship can be derived. To a certain extent this is valid for in vitro systems, such as the inhibition of an enzyme or the binding to a receptor, where the assay records only the binding of a compound to a protein. The more complex the system is, for example, central nervous system effects in an animal after oral administration, the more different processes must be considered.
In this case the absorption, distribution, blood–brain barrier penetration, further transport to the target tissue, metabolism, and elimination overlap with one another and with the actual effect on the receptor. In principle, an individual structure–activity relationship is required for each of these events. Establishing valid and relevant models for each of these steps requires corresponding test systems that examine the different steps separately. In favorable cases it might be possible to characterize a complex multistep process by one single equation. This is only feasible if one step, for instance, the penetration through the blood–brain barrier, dominates the entire structure–activity relationship.

G. Klebe, Drug Design, DOI 10.1007/978-3-642-17907-5_18, © Springer-Verlag Berlin Heidelberg 2013
18.1 Structure–Activity Relationships of Alkaloids

The South American dart poison tubocurare (▶ Sect. 7.1) was the first therapeutic principle for which the exact mode of action was elucidated. In 1852, Claude Bernard recognized that this quaternary alkaloid causes muscle paralysis, but that the nerve as well as the muscle remain independently excitable. Curare must therefore act on the coupling between nerve and muscle. The Scottish pharmacologists Alexander Crum-Brown and Thomas Fraser occupied themselves somewhat more exhaustively with the question of whether the quaternization of the nitrogen atom of different alkaloids (Fig. 18.1) has an influence on their biological effects. In 1868, from the entirely different effects observed before and after the transformation of the alkaloids, they formulated a general equation to describe structure–activity relationships (Eq. 18.1).

$$\Phi = f(C) \tag{18.1}$$

This equation is ingeniously simple, but it says only that Φ (Greek letter Phi), the biological activity, is a function of C, the chemical structure. At that time, the tetrahedral structure of the carbon atom had not been clarified, and the constitution of many organic compounds, above all complex natural products, was entirely unknown.

Fig. 18.1 The protonation of a tertiary amine depends on the pH value of the medium (left). On the other hand, the quaternization of a nitrogen atom, e.g., with CH3I, leads to a permanently positively charged compound (right). (Labels in the figure: positively charged form; neutral alkaloid; quaternary, perpetually charged form.)

18.2 From Richet, Meyer, and Overton to Hammett and Hansch

In 1893, Charles Richet published an investigation on the toxicity of organic compounds. From the comparison of the water solubility of ethanol, diethyl ether, urethane, paraldehyde, amyl alcohol, and absinthe extract(!) to the lethal dose in the dog, he concluded plus ils sont solubles, moins ils sont toxiques, that is, the better the solubility, the less the toxicity. This was the first evidence of a linear inverse relationship between water solubility and biological activity.
Around the turn of the twentieth century, the pharmacologist Hans Horst Meyer and the botanist Charles Ernest Overton independently founded the lipid theory of anesthesia, which unifies three important statements:

• All chemically unreactive substances that are lipophilic and can be distributed in biological systems have anesthetic effects.
• The biological effect occurs in nerve cells because fat plays an important role in their function.
• The relative potency of anesthetics depends on their partition coefficient (▶ Sect. 19.2) in a mixture of fat and water.

The work of Crum-Brown, Fraser, and Richet, or the contribution of Meyer and Overton can be seen as the origin of quantitative structure–activity relationships. In fact, after the formulation of the anesthesia theory, numerous other linear, and later non-linear, dependencies on the lipophilicity, the “fat affinity” of active substances, were found. But all of these activities were relatively unspecific “membrane” effects.

In the middle of the 1930s Louis P. Hammett formulated a relationship between the electronic properties of the substituents and the reactivity of aromatic compounds. Accordingly, the relative contribution of electron-withdrawing and electron-donating substituents to the electron density of the aromatic ring is always constant. These contributions are described by the electronic parameter of the substituent, the Hammett constant, σ. Electron-accepting substituents with positive σ values are, among others, the nitro group, the cyano group, and the halogens. Electron-donating substituents with negative σ values are hydroxyl and amino groups, the methoxy group, and alkyl substituents. Acceptor substituents enhance the acidity of benzoic acids and phenols, they reduce the basicity of anilines, and they accelerate the basic hydrolysis of benzoic esters. Electron-donating substituents exert an opposite influence.
However, an individual reaction constant ρ must be applied for each reaction type of aromatic compounds. By using Eq. 18.2, later generally called the Hammett equation, the equilibrium constant K for an arbitrary reaction can be calculated from ρ and σ. R–X and R–H represent the relevant aromatic compounds substituted with the group X, or unsubstituted, respectively.

$$\rho\sigma = \log K_{\mathrm{RX}} - \log K_{\mathrm{RH}} \tag{18.2}$$

Acceptor and donor substituents influence the electron density on the heteroatoms and reduce or increase the ability to form hydrogen bonds. This, among other things, explains the electronic influence of aromatic substituents on the biological activity of drug molecules. The Hammett equation was therefore seen as a challenge to pharmaceutical chemists and biologists to derive quantitative structure–activity relationships from this concept. Many groups have made efforts to find relationships between biological activity and the Hammett constants σ, or to derive σ- and/or ρ-analogous substituent and test parameters for biological systems. Despite individually interesting results, no generally valid concept could be established.
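As a worked illustration of Eq. 18.2, the acidity of substituted benzoic acids can be estimated from the parent compound. The sketch below is not taken from the text: the σ values and the pKa of benzoic acid are commonly tabulated numbers quoted here from memory and should be treated as illustrative assumptions, as should the function names. For the ionization of benzoic acids in water, ρ is 1 by definition of the σ scale.

```python
# Hammett sigma constants (commonly tabulated values; illustrative assumptions).
SIGMA = {
    "H": 0.00,
    "p-NO2": 0.78,
    "m-NO2": 0.71,
    "p-Cl": 0.23,
    "p-CH3": -0.17,
    "p-OCH3": -0.27,
}

RHO_BENZOIC_ACID = 1.00  # reference reaction of the sigma scale
PKA_BENZOIC_ACID = 4.20  # approximate value for the unsubstituted acid (assumption)


def log_k_substituted(log_k_parent, rho, sigma):
    """Eq. 18.2 rearranged: log K_RX = log K_RH + rho * sigma."""
    return log_k_parent + rho * sigma


def predicted_pka(substituent):
    """Estimate the pKa of a substituted benzoic acid (pKa = -log Ka)."""
    log_ka = log_k_substituted(-PKA_BENZOIC_ACID, RHO_BENZOIC_ACID,
                               SIGMA[substituent])
    return -log_ka


for name in ("p-NO2", "H", "p-OCH3"):
    print(f"{name:7s} pKa ~ {predicted_pka(name):.2f}")
```

The acceptor-substituted acid comes out more acidic and the donor-substituted acid less acidic than the parent compound, in line with the qualitative statement above. For another reaction type, only `rho` would change.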
It was Corwin Hansch and Toshio Fujita who in 1964 published a work that established the fundamentals for quantitative structure–activity relationships. In this, they describe:

• The definition of a lipophilicity parameter π, analogous to the electronic term σ in the Hammett equation.
• The combination of different parameters in a model.
• The formulation of a parabolic model for the description of non-linear lipophilicity–activity relationships.

18.3 The Determination and Calculation of Lipophilicity

Corwin Hansch had previously investigated the structure–activity relationship of phenoxyacetic acids, which show growth-stimulatory effects in plants. In addition to their biological activity, he was particularly interested in their lipophilicity, which can be measured by the partition coefficient in an octanol/water system (▶ Sect. 19.1). It occurred to him while analyzing the data that the lipophilicity is an additive molecular parameter. The logarithm of the octanol/water partition coefficient P is given by the sum of the group contributions of the individual parts of the molecule. Hansch defined a lipophilicity parameter π (Eq. 18.3), analogously to the Hammett equation. R–X and R–H have the same meaning here as in Eq. 18.2. There is no reaction-specific ρ term in Eq. 18.3 because the π value is based on a single distribution system: n-octanol and water.

$$\pi = \log P_{\mathrm{RX}} - \log P_{\mathrm{RH}} \tag{18.3}$$

n-Octanol was chosen for theoretical and practical reasons. It has a long aliphatic chain and a hydroxyl group that is an H-bond donor as well as an acceptor. Its structure therefore resembles the membrane lipids to some extent. It dissolves a large number of organic compounds, and it has a low vapor pressure but can nonetheless be easily removed. Its UV transparency over an extremely wide range is particularly advantageous.

With the help of the lipophilicity parameter π, the log P values of new compounds, and therefore their lipophilicity, can be calculated.
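The additivity expressed by Eq. 18.3 can be sketched in a few lines. The numbers below are not from the text: the log P of benzene and the aromatic π values are commonly tabulated figures quoted from memory and serve only as illustrative assumptions, and `estimate_log_p` is a hypothetical helper.

```python
# log P (octanol/water) of the parent scaffold benzene (assumed literature value).
LOG_P_BENZENE = 2.13

# Aromatic substituent constants pi (commonly tabulated; illustrative assumptions).
PI = {
    "H": 0.00,
    "F": 0.14,
    "Cl": 0.71,
    "Br": 0.86,
    "CH3": 0.56,
    "OH": -0.67,
    "NO2": -0.28,
}


def estimate_log_p(log_p_scaffold, substituents, pi=PI):
    """Eq. 18.3 rearranged and summed: log P_RX = log P_RH + sum of pi values."""
    return log_p_scaffold + sum(pi[s] for s in substituents)


print(f"chlorobenzene  log P ~ {estimate_log_p(LOG_P_BENZENE, ['Cl']):.2f}")
print(f"chlorotoluene  log P ~ {estimate_log_p(LOG_P_BENZENE, ['Cl', 'CH3']):.2f}")
print(f"nitrophenol    log P ~ {estimate_log_p(LOG_P_BENZENE, ['NO2', 'OH']):.2f}")
```

Such a simple sum ignores interactions between neighboring substituents, so estimates for polysubstituted or strongly interacting groups should be read with caution.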
For such a calculation, the lipophilicity of the basic scaffold and the π values of the substituents must be known. In this way the biological activity can be correlated without the tedious experimental measurement of each individual partition coefficient. In addition to the π values of all important substituents, a very large number of experimentally determined octanol/water partition coefficients are available in the literature.

18.4 Lipophilicity and Biological Activity

Lipophilicity has an overwhelming role in describing the dependence of biological effects on chemical structure and therefore accounts for many quantitative
structure–activity relationships. This is easily understood because biological systems consist of aqueous phases that are separated by lipid membranes. The transport and the distribution of small molecules in such systems must therefore depend on the lipophilicity. For polar substances the lipid membrane represents a barrier that they cannot surmount. Only substances with moderate lipophilicity have a good chance to “migrate” into the aqueous as well as the lipid phases to arrive in adequate concentrations in the target tissue (▶ Chap. 19, “From In Vitro to In Vivo: Optimization of ADME and Toxicology Properties”).

Although soluble proteins carry overwhelmingly polar amino acid residues on their surfaces, the more-or-less buried binding site for ligands is constructed from polar and non-polar areas. The hydrophobic parts of the ligand bind to the hydrophobic parts of the pocket. The size of these hydrophobic surface areas is always limited. The size and form of the lipophilic portion of the ligand must fit the hydrophobic surfaces in the binding pocket. Because the natural ligands that are normally bound in these pockets have adequate water solubility themselves, the lipophilic areas in the binding pockets are of limited size. This fact is another reason for the complex, generally non-linear lipophilicity–activity relationships.

Many linear and non-linear lipophilicity–activity relationships describe relatively unspecific biological effects such as anesthetic, bactericidal, fungicidal, and hemolytic effects. They shall not be further discussed here. Other relationships describe the transport and distribution in a biological system. Such structure–activity relationships are discussed in ▶ Chap. 19, “From In Vitro to In Vivo: Optimization of ADME and Toxicology Properties”.
18.5 The Hansch Analysis and the Free–Wilson Model

In 1964 Corwin Hansch and Toshio Fujita derived, more intuitively than theoretically, a mathematical model that can quantitatively describe structure–activity relationships, the Hansch analysis (Eq. 18.4).

$$\log\frac{1}{C} = k_1(\log P)^2 + k_2\log P + k_3\sigma + \dots + k \tag{18.4}$$

In Eq. 18.4, C is a molar concentration that induces a particular biological effect. When related to a series of substances, it is the equieffective molar dose. Log P is the logarithm of the octanol/water partition coefficient P, and σ is the Hammett constant. The square of the log P term allows the quantitative description of non-linear lipophilicity–activity relationships. This term is omitted when the dependence is linear. Other terms such as polarizability and steric parameters can additionally occur. The coefficients k1, k2, … and k are determined by regression analysis. The Hansch analysis therefore establishes a hypothetical model for quantitative relationships between biological activity and physicochemical parameters.

Biological data carry experimental error, and the same is true of physicochemical properties. Nevertheless, the latter are usually more reliable than the biological data. The result of a calculation is judged by the squared differences
between the measured biological data and the values that were calculated from the model. The sum of these squared differences must be as small as possible over all of the investigated compounds. It represents an important criterion for judging the quality of a model, or for comparing different models with one another.

Table 18.1 The biological activity of meta- and para-substituted phenethylamines 18.1 (i.v. application in the rat; C in mol/kg rat)

meta (X)   para (Y)   log 1/C
H          H          7.46
H          F          8.16
H          Cl         8.68
H          Br         8.89
H          I          9.25
H          Me         9.30
F          H          7.52
Cl         H          8.16
Br         H          8.30
I          H          8.40
Me         H          8.46
Cl         F          8.19
Br         F          8.57
Me         F          8.82
Cl         Cl         8.89
Br         Cl         8.92
Me         Cl         8.96
Cl         Br         9.00
Br         Br         9.35
Me         Br         9.22
Me         Me         9.30
Br         Me         9.52

The quantitative structure–activity relationship of the antiadrenergic effect of N,N-dimethyl-β-bromophenethylamines 18.1 (Table 18.1) is considered as an example. Depending on their structure, these compounds more or less reverse the agonistic effect of an adrenaline dose. The value C is the dose of an antagonist that blocks the adrenaline effect by 50%. The data can be described with the Hansch model, which is illustrated in Fig. 18.2. First, the derived equation makes it possible to describe the entire data set with a mathematical model. A carbocation is formed upon cleavage of bromine,
and the substances bind irreversibly to the adrenergic receptor. Accordingly, the σ+ term is found in the Hansch equation (Fig. 18.2), which describes such reaction types particularly well. Lipophilic substituents increase the biological activity (positive π term) and electron-withdrawing substituents decrease it (negative σ+ term). Therefore lipophilic electron-donating substituents, for example, large alkyl substituents, should be optimal for the activity. Second, within certain limits, the effect of further compounds can be predicted. Interpolations, that is, conclusions that are drawn based upon very similar substituents, are more reliable than extrapolations, which are predictions made outside of the parameter space, for instance, for considerably more lipophilic, more polar, or larger substituents.

As a first approximation, it can be said of the statistical parameters r, s, and F (Fig. 18.2) that the correlation coefficient r should have values that are close to 1.00, the standard deviation, s, should be as small as possible, and the F value should be as large as possible. The better these criteria are fulfilled, the better the quantitative model will be, in other words, the better the experimental and calculated values agree with one another.

Also in 1964 and independently of Hansch and Fujita, S. R. Free and J. W. Wilson developed an entirely different model for structure–activity analysis. Because the original approach is confusingly formulated and awkward to use, here only a variant shall be discussed that was later proposed by Fujita and T. Ban, the Free–Wilson analysis.
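Before turning to that model, the statistical criteria r, s, and F mentioned above can be made concrete. The sketch below uses the standard textbook definitions for a least-squares model with a constant term; that the values in Fig. 18.2 were computed in exactly this way is an assumption, and the function names are hypothetical.

```python
from math import sqrt


def regression_stats(observed, calculated, n_params):
    """Return (r, s, F) for a least-squares model.

    `n_params` is the number of fitted descriptor coefficients,
    not counting the constant term.
    """
    n = len(observed)
    mean = sum(observed) / n
    ss_res = sum((o - c) ** 2 for o, c in zip(observed, calculated))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    dof = n - n_params - 1            # residual degrees of freedom
    r2 = 1.0 - ss_res / ss_tot        # explained fraction of the variance
    s = sqrt(ss_res / dof)            # standard deviation of the residuals
    f = (r2 / n_params) / ((1.0 - r2) / dof)
    return sqrt(r2), s, f


def f_from_r(r, n, n_params):
    """F value implied by a correlation coefficient r for n compounds."""
    r2 = r * r
    return r2 * (n - n_params - 1) / (n_params * (1.0 - r2))


# Made-up observed and calculated activities for a one-descriptor model.
r, s, f = regression_stats([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.1, 3.9], 1)
print(f"r = {r:.3f}, s = {s:.3f}, F = {f:.0f}")

# Consistency check with the values quoted for the phenethylamine example
# (n = 22 compounds, two descriptors, r = 0.945).
print(f"F implied by r: {f_from_r(0.945, 22, 2):.1f}")
```

With the rounded r = 0.945 the implied F value is about 79, which agrees with the reported F = 78.6 within the rounding of r.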
Fig. 18.2 A QSAR equation delivers individual parameters for a quantitative model for the prediction of biological activity, in this case for substituted N,N-dimethyl-β-bromophenethylamines (Table 18.1).

$$\log\frac{1}{C} = 1.15\,(\pm 0.2)\,\pi - 1.46\,(\pm 0.4)\,\sigma^{+} + 7.82\,(\pm 0.2)$$

(n = 22; r = 0.945; s = 0.196; F = 78.6)

• C: molar concentration that evokes a particular biological effect; the logarithm of the reciprocal value gives the correct scaling
• 1.15 and −1.46: regression coefficients of the lipophilicity parameter π and the electronic parameter σ+; 7.82: constant term
• ± values: 95% confidence intervals for the coefficients and the constant
• n: number of compounds
• r: correlation coefficient, a measure of the quality of the model
• s: standard deviation, a measure of the absolute quality of the model
• F: Fisher value, a measure of the significance; it is often not reported

The Free–Wilson analysis assumes that within a set of chemically related
substances, a reference compound, usually the unsubstituted starting compound, makes per se a specific contribution μ to the biological effect. Each substituent on this scaffold delivers an “additive and constitutive” contribution a_i to the biological activity (Fig. 18.3). Additive, because structural variation in other positions of the molecule is not considered, and constitutive, because it does matter at which position of the molecule the specific structural change is undertaken. Despite these relatively simple assumptions, the Free–Wilson analysis delivers good quantitative models for many structure–activity relationships. In contrast to the Hansch analysis, which compares properties, the Free–Wilson analysis is a real “structure–activity analysis,” because the parameter that codes for the structural information (1 for present, 0 for absent) is correlated with the biological effects. It is easily carried out, but the structures and the biological data must be known. Unfortunately, the Free–Wilson analysis also has disadvantages:
• The structural variation must be present on at least two different substitution sites, because otherwise there will not be enough degrees of freedom to use statistical methods.
• The usually large number of variables diminishes the predictive value and reliability of the analyses.
• Predictions are only possible for combinations of substituents that have already been considered in the analysis, and not for new substituents.

If the Free–Wilson analysis is applied to the above-mentioned antiadrenergic phenethylamine example, the values in Table 18.2 are obtained for the scaffold and the substituent contributions.

Fig. 18.3 The Free–Wilson analysis uses the additive nature of the group contributions to describe the biological activity. Accordingly, the biological activity in the displayed equation, log 1/C = Σ a_i + μ, is made up of the activity of the basic scaffold, μ, and the constant group contributions a_i (a_1, a_2, ..., a_n) of the substituents X_i (X_1, X_2, ..., X_n).

Table 18.2 Free–Wilson group contributions for phenethylamines
Position   H      F      Cl     Br     I      Me
meta       0.00   0.30   0.21   0.43   0.58   0.45
para       0.00   0.34   0.77   1.02   1.43   1.26
μ = 7.82 (n = 22; r = 0.97; s = 0.19) (a)
(a) For an explanation of these values see Fig. 18.2

Even after a quick glance, an increase in the values from
F to Cl and Br to I, that is, the influence of the lipophilicity, is obvious. Despite having almost the same lipophilicity, the methyl and chloro substituents make different contributions. This is explained by their different electronic properties. Differences in the electronic influence between the meta and para positions can also be followed. Therefore the Free–Wilson analysis indeed has advantages for the analysis of substituent effects.

18.6 Structure–Activity Relationships of Molecules in Space

As was shown in the previous section, an attempt is made to describe structure–activity relationships by means of substance-specific parameters. These parameters, for example, volume, polarizability, or lipophilicity, are properties that are calculated or measured for the entire molecule or for specific groups of substituents. The 3D structure of the molecules is considered only to a limited extent by these descriptors. Therefore, in the context of increasing knowledge of the spatial structure of protein–ligand complexes, the QSAR methods focus on parameters that can be derived from the 3D structure. As a general rule the goal of these approaches is to calculate binding affinity. The techniques can also be applied for the description of other biological properties such as the bioavailability or the metabolic reactivity (▶ Chap. 19, “From In Vitro to In Vivo: Optimization of ADME and Toxicology Properties”). To distinguish them from the above-described classical QSAR techniques, these are referred to as 3D-QSAR methods.

Ideally, parameters are desired that can be read directly from the 3D structure of an active substance and that can be used to draw conclusions about its binding affinity. The interplay between these parameters and the activity is, however, very complex and even today is by no means fully understood. Furthermore there are still many other biological systems to which one would like to apply 3D-QSAR methods, but the structures of the relevant target proteins are unknown.
Many pharmacologically relevant receptors are membrane bound, and their structure determination has proven to be extremely difficult. The knowledge of their structures is, however, a prerequisite for a reasonable estimation of the binding affinity of a ligand from the geometry of the formed complex (▶ Chap. 4, “Protein–Ligand Interactions as the Basis for Drug Action”). As a consequence, no attempt is made to calculate absolute binding affinities from these incomplete data; instead, the focus is on relative affinity differences between the active substances in a data set. The gradual changes in the substance-specific parameters are then correlated with the biological data.

18.7 Structural Alignment as a Prerequisite for the Relative Comparison of Molecules

Assumptions about the spatial structure of molecules are already considered in classical QSAR techniques. Different positions of substituents, for example, in the meta or para position of an aromatic ring, are often described by individual
parameters. In this form they enter the Hansch equation as well as the Free–Wilson analysis (Sect. 18.5). Moreover, indicator variables for different configurations of substituents, for example, the configuration of stereoisomers, are defined in classical QSAR models. An analogous orientation of the molecule in a hypothetical binding pocket is assumed for the use of these parameters. For example, it is assumed that all ortho substituents are oriented toward the “same side” in a series of ortho-substituted derivatives. Structure–activity relationships that correlate the biological activity with properties of the 3D structure require, as a prerequisite, a spatial superposition of the active substances. This superposition should approximate the relative orientation in the binding pocket as accurately as possible. A technique was discussed in ▶ Chap. 17, “Pharmacophore Hypotheses and Molecular Comparisons” that can be used for the calculation of these spatial superpositions.

18.8 Binding Affinities as Compound Properties

Which substance-specific characteristics can be used to correlate the properties of the 3D structure with the binding affinity? As was discussed in ▶ Chap. 4, “Protein–Ligand Interactions as the Basis for Drug Action,” the binding affinity is composed of enthalpic and entropic components. The first contribution comprises everything that depends on the direct energetic interaction. These interactions are predominantly of a steric (van der Waals potentials, ▶ Sect. 15.4) or electrostatic (Coulomb potentials) nature. The second contribution concerns the degree of ordering and the distribution of the energy over the different degrees of freedom of the studied system. The ligands as well as the binding pockets of a protein are solvated by water molecules in the uncomplexed state. Upon complex formation, the enthalpic interactions with these water molecules are lost. They are replaced by direct interactions between ligand and protein.
Because only the relative differences between molecules of a data set are of interest, any effects that are the same for all derivatives are ignored. Among these effects are practically all influences that affect the protein. This omission is certainly a rough simplification because the protein changes its solvation state upon ligand binding. Water molecules are displaced from the binding site. Ligand-induced adaptations of side chains in the binding pocket or changes in the rotational degrees of freedom of methyl groups and side chains (▶ Sect. 4.10) are imaginable. These effects are either not considered or are accepted as being the same for all molecules in the data set. Presumably this assumption is valid for many cases. Nonetheless, many new investigations clearly show that changes affecting the protein or the dynamics of the ligand are often not constant within a series of compounds. In such cases the methods will fail.

To begin with, only the steric and electrostatic interactions of an active substance in the binding pocket should be taken into consideration. How can these properties be compared for a series of ligands? A first approach to this were the hypothetical interaction models of Hans-Dieter Höltje and Lemont B. Kier. The decisive prerequisite of these models was the choice of and spatial positioning
for amino acid side chains around the ligands. These assumptions can be dropped once the molecules are embedded in a lattice and can be explored with an interaction probe. Richard Cramer and M. Milne proposed such a model in 1978. It took another 10 years until the generally applicable CoMFA method (Comparative Molecular Field Analysis) was established. Despite many theoretical and practical deficiencies in its application, the method was quickly accepted. Today it is applied in many different variations.

Before such an analysis can be practically carried out, a few basic considerations should be made. Do steric and electrostatic interactions cover all contributions to ligand binding that are needed for a correct relative ranking of binding affinity? As already mentioned, the binding affinity is composed of enthalpic and entropic contributions. A sampling of the properties via probes to map interactions certainly affords a measure for how well a molecule can undergo energetically favorable interactions. How well are the entropic contributions considered? A considerable portion is made up of solvation and desolvation processes (▶ Sect. 4.6). These processes change the local water structure around the ligand and in the binding pocket. The water structure in the immediate vicinity of the hydrophobic surfaces of the ligand is more ordered in the solvated state than it is in bulk water. The transition of such ligands out of the bulk water into the protein’s binding pocket immediately causes a certain number of water molecules to adopt a less-ordered state. This increases the entropy of the system and promotes spontaneity in the binding process. The number of water molecules that are involved in this process depends on the size of the hydrophobic surface of the ligand. Furthermore the displacement of the water molecules from the binding pocket upon ligand binding increases the disorder of the examined system and also increases its entropy.
In the above-mentioned approximation it is assumed that this water-related effect is the same for all molecules in the data set. Therefore, it is not considered in a relative comparison. Additionally, a molecule can move “freely” in an aqueous solution and adopt different conformations. In the binding pocket, however, it is fixed predominantly in one particular conformation. Rotational, translational, and conformational degrees of freedom are lost, and the system loses entropy. All of these influences are to be taken into consideration for the correct treatment of affinities.

18.9 How Is a CoMFA Analysis Performed?

The most important and most often used method for 3D structure–activity analysis is the CoMFA method. The execution of a CoMFA study first requires the choice of a data set of suitable compounds. This data set should encompass around 50–100 compounds with related overall geometry. It should also be ensured that all substances bind to the same protein at the same site, and that a binding affinity is known for all of them. The ligands must possess a certain diversity with regard to their structural variation. Their binding affinities should scatter over at least three orders of magnitude. Conformations are generated for all of the molecules
(▶ Chap. 16, “Conformational Analysis”) and are superimposed by using one of the techniques discussed in ▶ Chap. 17, “Pharmacophore Hypotheses and Molecular Comparisons”. As a general rule, the spatial structure of the protein, if known, is taken, and the ligands are mutually aligned in the binding pocket. Finally the superimposed molecules are embedded in a lattice (Fig. 18.4) that encloses them by a broad margin. The grid points of the lattice should be spaced 1 or 2 Å apart. A probe, that is, an atom with the properties of hydrogen, carbon, or oxygen, or a particle with a formal charge, is placed at each of the grid points. The interaction energies are calculated between this probe and each molecule in the data set. The collective interaction contributions on the grid are referred to as the interaction field of the molecule. This also gave rise to the name of the method. Finally the fields of the molecules in the data set are compared with one another. If the box size is 10–20 Å, and a grid spacing of 1–2 Å is applied, there are many thousands of field values per molecule of the data set to be handled. This huge amount of data means that the evaluation of the fields can be computationally very intensive.

18.10 Molecular Fields as Criteria of a Comparative Analysis

Steric and electrostatic interactions are described in force fields (▶ Sect. 15.4) by a Lennard-Jones or a Coulomb potential (Fig. 18.5). If the distance between a probe and an atom of the molecule approaches zero, the magnitudes of the Lennard-Jones and Coulomb potentials increase toward infinity. With like-charged particles the Coulomb potential approaches positive infinity; with oppositely charged particles it approaches negative infinity. As a result, extremely large field contributions arise at the grid points that fall near the surface or lie inside a molecule. They must be avoided in a CoMFA analysis. Therefore the field contributions above and below a particular threshold are set to a predefined cut-off value.
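The truncation of field values described above can be illustrated with a minimal sketch. It is not the implementation of any CoMFA program: the 12-6 form of the Lennard-Jones potential, the well depth `epsilon`, the minimum distance `r_min`, the conversion factor `k`, the partial charge of -0.4, and the cut-off of 30 kcal/mol are all assumptions chosen only for illustration.

```python
def lennard_jones(r, epsilon=0.1, r_min=3.4):
    """12-6 potential in kcal/mol; epsilon and r_min (in Å) are illustrative values."""
    ratio = r_min / r
    return epsilon * (ratio ** 12 - 2.0 * ratio ** 6)


def coulomb(r, q_probe, q_atom, k=332.06):
    """Point-charge Coulomb energy in kcal/mol; k converts e^2/Å to kcal/mol."""
    return k * q_probe * q_atom / r


def truncate(energy, cutoff=30.0):
    """Clamp a field value to the interval [-cutoff, +cutoff] (assumed cut-off)."""
    return max(-cutoff, min(cutoff, energy))


# Probe with charge +1 approaching a single atom with a hypothetical charge of -0.4.
for r in (8.0, 3.4, 2.0, 1.0):
    steric = truncate(lennard_jones(r))
    electrostatic = truncate(coulomb(r, q_probe=1.0, q_atom=-0.4))
    print(f"r = {r:3.1f} Å   steric = {steric:8.3f}   electrostatic = {electrostatic:8.3f}")
```

In this sketch the steric value is already clamped at 2 Å, whereas at the minimum distance of 3.4 Å it is still only slightly negative, which reflects how steeply the Lennard-Jones potential rises close to an atom.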
According to these procedures, a Lennard-Jones or a Coulomb potential can be calculated. Aliphatic carbon atoms, for example, can be used as probes. These probes are given a positive or negative charge to study the electrostatic properties of the molecules. The program GRID of Peter Goodford was introduced in ▶ Sect. 17.10. Molecular fields can be calculated with this program for numerous probes that describe different functional groups. For each predefined probe there are areas in space at which favorable or unfavorable interactions between the probe and the examined molecule are to be expected.

Moreover, other fields can also be defined aside from fields that probe the steric and electrostatic properties of molecules. As discussed in ▶ Sect. 18.8, the hydrophobic surface of a molecule represents a measure for the entropic contribution, particularly upon transfer from the bulk water phase. Molecular fields were developed in the group of Donald Abraham that allow the hydrophobic properties of molecules to be explored (program HINT). These are calculated by using a very similar distance-dependent function. The resulting molecular field describes the lipophilicity distribution on the surface of a molecule.
Fig. 18.4 A grid is generated for the calculation of molecular fields that broadly encompasses a molecule. The grid points are color-coded with increasing distance from the ligand (red, yellow, green, blue, gray). The contributions from the chosen fields are calculated at the points of the lattice, which have a grid spacing of 1–2 Å. The field contributions at each point in the grid (S1, S2, ... Sn, E1, E2, ... En) are written into a table with the columns Comp., –lg(Ki), S1 ... Sn, and E1 ... En. The analysis is carried out for all molecules in the data set. The binding affinities are incorporated into the table as, for instance, –lg(Ki). The field contributions are weighted with appropriate coefficients (a, b, ... z) and, by using a special statistical method, the PLS analysis, they are related to the affinity. A model is obtained in the form of an equation that indicates at which grid points and with what weight the different field contributions explain the biological activity:

–lg(Ki) = y + a S1 + b S2 + c S3 + ... + h Sn + k E1 + m E2 + n E3 + ... + z En
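To give a feeling for the size of the table described in Fig. 18.4, the following sketch counts the grid points and the resulting descriptor columns. It assumes that the box size refers to the edge length of a cubic box and that exactly one steric and one electrostatic field are evaluated; the function names are hypothetical.

```python
import itertools


def grid_axis(lower, upper, spacing):
    """Coordinates from lower to upper (both inclusive) in steps of spacing."""
    count = int(round((upper - lower) / spacing)) + 1
    return [lower + i * spacing for i in range(count)]


def build_grid(box_min, box_max, spacing):
    """All grid points of a rectangular box as (x, y, z) tuples."""
    axes = [grid_axis(lo, hi, spacing) for lo, hi in zip(box_min, box_max)]
    return list(itertools.product(*axes))


for spacing in (2.0, 1.0):
    grid = build_grid((0.0, 0.0, 0.0), (20.0, 20.0, 20.0), spacing)
    n_points = len(grid)
    # One steric column (S1..Sn) and one electrostatic column (E1..En) per grid point.
    n_columns = 2 * n_points
    print(f"spacing {spacing} Å: {n_points} grid points, {n_columns} field columns per molecule")
```

For a 20 Å box the sketch yields 11 x 11 x 11 points at 2 Å spacing and 21 x 21 x 21 points at 1 Å spacing, so each row of the table holds far more columns than there are compounds in a typical data set. This is the reason why an ordinary regression cannot be used and the PLS analysis is applied instead.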
Fig. 18.5 The Lennard-Jones potential (green) is a model for describing the intermolecular interactions of two atoms without considering their charge. The energy E(r) is plotted against the distance r. Negative potential values correspond to mutual attraction; positive values correspond to a repulsion of the particles. If the distance between the particles becomes infinite, the potential approaches zero. Upon approach it goes through a shallow minimum due to mutual polarization. At even shorter distances it rises very steeply toward positive infinity because of atom–atom repulsions. The Coulomb potential (blue) considers only electrostatic interactions between charges that formally reside as point charges on the atomic nuclei. For like-charged particles it also approaches infinity as the distance goes to zero. For oppositely charged atoms, negatively infinite values result. The hyperbolic form of the Coulomb potential is considerably less steep, so that the particles can still “feel” one another at larger distances. Cut-off values are set for the potentials in a CoMFA analysis. A Gaussian function, which takes the course of a bell-shaped curve (here only the right half of the “bell” is shown), describes the distance dependence of the interaction potential between the particles in the context of the CoMSIA model. As the distance between the particles goes to zero, the curve reaches its maximum value, which remains finite.

18.11 3D-QSAR: Correlation of Molecular Fields with Biological Properties

Let us assume that multiple molecular fields for each molecule in a data set have been calculated, and a correlation of their differences with the binding affinity is attempted. How are these differences expressed? For this we want to consider three hypothetical examples of substituted phenyl derivatives.
• First, all of the substituents on the phenyl ring in a compound series should be varied so that increasingly large field contributions result in the vicinity of the substituent when being scanned with a positively charged probe. If the binding affinities increase in the same way as the field contributions become larger, this will be reflected in the quantitative analysis. It means that derivatives with increasingly positively charged groups in this molecular region are more potent substances.
• A second example is set up a little differently. Now the phenyl ring substituents are given positive or negative partial charges. Their variation has no influence on the potency of the substances. The quantitative analysis shows that the changes in the electrostatic field contributions have no correlation with the biological activity. A possible explanation might be that this effect and another property, for example, the size of the substituents, mutually cancel their influences. It could also be that the biological activity is influenced through other qualities of the substituents, for instance, their hydrophobic character.
• In the third case, the electrostatic properties of the substituents that are important for binding to the receptor should be hardly varied at all at the examined position. There might be different substituents present; however, they all have comparable partial charges. The model that analyzes the field contributions in the vicinity of these groups does not recognize differences and therefore also does not correlate with the binding affinity. It can indeed be that a class of substituents at a particular position on a molecular scaffold is actually very important for binding but nonetheless remains insignificant in the analysis. This has to do with the fact that a QSAR analysis only performs a relative comparison within a data set.

These examples are still easily manageable. The question can be posed whether a tedious correlation method with the “detour” via molecular fields is really needed. The situation is more complicated in practice, above all if molecules with different scaffolds are considered. The substituents do not fall exactly on top of one another in the molecular superposition. Their contribution must be described as a field in space, and only as such can it be evaluated. At any rate, these examples underscore the importance of careful planning of the analysis.
The structures in the data set must be chosen so that they have the largest possible variation of substituents and their properties.

18.12 Graphical Interpretation of the Results of a Comparative Molecular Field Analysis

If the full complexity of the field contributions is considered in terms of a multidimensional matrix, a straightforward regression analysis cannot be applied to extract the relationship between the field variables and, for example, the binding affinity. PLS analysis (partial least squares) is a statistical method that extracts relevant and explanatory factors, so-called PLS vectors, out of the large quantities of data. In CoMFA analysis these vectors describe the areas of the fields that correlate best with the experimentally determined affinity. The result is an equation that is analogous to the results of the classical QSAR methods. It shows to what extent particular grid points in the individual fields contribute to the binding affinities. Because so many field points have to be evaluated in the analysis, the statistical significance of the derived results must be strictly monitored. This significance is checked by a particular test: the cross-validation.
One or more compounds are randomly extracted from the data set. A model is constructed with the remaining derivatives and the affinities of the removed compounds are predicted with this model. The removal of compounds is repeated several times, in the simplest case until every substance has been removed once. The quality of the prediction represents a measure for the reliability and significance of the model. The achieved result is expressed with the q² value, which is calculated from the squared deviations between the predicted and the measured values. It takes on values up to +1. A value of +1 indicates that a perfect model was achieved. All predictions exactly agree with the measured binding affinities. There is no deviation. A value of q² = 0 indicates that the predictions of the model are no better than no model at all; it is just as good as the average of all affinities. If q² takes on negative values, the model is worse than the average, that is, worse than no model. A model is therefore only to be trusted when the q² value lies above 0.4–0.5.

Another step must be performed to check the predictive value of a trained model. For this, a test data set of molecules is needed that are similar to the molecules in the training data set, but that were not used for the training. The binding affinities are predicted for these molecules. It is only if the correlation coefficient for this set is of similar size to that of the training set that the model possesses adequate predictive power.

The derived model can be used to estimate the affinity of new compounds that have not yet been synthesized. The conformations of these compounds are calculated and superimposed on the other structures. They must fall within the grid that was defined in the training set. Next their field contributions are calculated.
By using the correlation derived by CoMFA for the training set, the binding affinity of the new compounds can be computed from their field contributions at the grid points that the model has identified as predictive.

CoMFA techniques establish a correlation between activity data and molecular properties. From the relative comparison within a training set, a model can be derived that also covers the properties of new molecules. Relevant predictions are only to be expected when the structural variations in the new molecule remain within the scope of the model. In other words, the model cannot make predictions about the influence of substituents that occur in areas in which there were no structural variations in the training set. CoMFA models interpolate between field contributions from molecules. An extrapolation to areas that were not covered by the data set is not possible.

The results of a CoMFA analysis can be graphically evaluated. From the model it is known at which grid points field contributions are obtained that contribute significantly to explain the binding affinity. These contributions can be contoured for the different fields according to their importance. They indicate volume areas around the molecules in which changes in the field contributions run parallel or opposite to the affinity changes in the data set. These contour maps significantly support the design of new active substances (Sect. 18.14). They indicate the position at which the properties of a lead structure have to be varied so that an increase in affinity can be achieved.
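The cross-validation described in this section can be sketched for the simplest case in which every compound is left out exactly once. The sketch deliberately swaps the PLS analysis for an ordinary least-squares fit on a single descriptor so that it stays short; the formula q² = 1 − PRESS/SS, with SS taken around the mean of all affinities, is the common definition and is assumed here. The descriptor values and affinities are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def q2_leave_one_out(xs, ys):
    """q2 = 1 - PRESS / SS from leave-one-out predictions."""
    mean_y = sum(ys) / len(ys)
    press = 0.0
    for i in range(len(xs)):
        # Build the model without compound i, then predict compound i.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    ss = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss


# Hypothetical descriptor values and affinities (-lg Ki), not taken from the text.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
affinity = [4.2, 5.1, 5.8, 7.1, 7.9, 9.0]
print(f"q2 = {q2_leave_one_out(descriptor, affinity):.3f}")
```

A data set in which the descriptor is unrelated to the affinity gives a negative q² in this sketch, which corresponds to the statement above that such a model is worse than simply using the average of all affinities.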
18.13 Scope, Limitations, and Possible Expansions of the CoMFA Analysis

Usually only steric and electrostatic field contributions are evaluated in CoMFA analyses. A hydrophobic field can quantify the size of the hydrophobic surfaces and therefore partially considers the entropic contribution to affinity. Because CoMFA evaluations yield relevant models without the explicit use of hydrophobic fields, these field contributions must be at least partially contained in the Lennard-Jones and Coulomb fields. The lipophilicity of a molecule increases upon enlarging an uncharged, sterically demanding group, for instance, from methyl to butyl. Here the changes in the steric field contributions can correctly reflect the lipophilic surface. A correlation with electrostatic properties is also imaginable. Hydrophobic molecular portions carry, as a general rule, only minor partial charges. Positively or negatively charged groups represent hydrophilic regions. In this way the lipophilic and hydrophilic surface regions can be quantified via differences in the charge.

The deviation that is not explained by a CoMFA model comprises, apart from experimental errors, all inadequately described binding contributions. These include structural adaptations of the protein that are not identical for all compounds in the data set. Entropic contributions that come from the conformational fixation of the active substance in the binding pocket or the residual mobility of the ligand in the binding pocket are also not considered in any of the fields.

In addition to these inadequacies, the fields themselves cause a few problems. Due to the behavior of their mathematical functions, very large positive or negative values are reached at the surface or in the interior of the molecule (Fig. 18.5). Because the Lennard-Jones potential increases faster upon approaching the atoms than the Coulomb potential does, the two reach their arbitrarily set cut-off values (Sect. 18.10) at different distances from the molecule.
Within a distance of 2 Å, which is the commonly chosen grid spacing, the extremely steep Lennard-Jones potential can change from practically zero to the cut-off value. These discontinuities and the neglected areas near the surface can cause significant problems for the interpretation. Furthermore, they often cause fragmented contour maps in the individual fields that are difficult to interpret.

The deficits in these fields have stimulated the search for other solutions. In one method the similarity of molecules is investigated by use of their steric and physicochemical properties in space and correlated to the binding affinity (CoMSIA method; Comparative Molecular Similarity Indices Analysis). The molecules are superimposed just as they are in the CoMFA method. Then their relative similarity is determined through their relationship to a probe, a carbon atom for instance: each molecule is sampled with the probe at the intersections of a surrounding grid. The measure of similarity between the probe and the molecule is defined in a distance-dependent way. A Gaussian function (Fig. 18.5) is chosen for this purpose. In contrast to the hyperbolic form of the above-described potentials, the Gaussian bell-shaped curve approaches finite values instead of infinity at decreasing distances. Cut-off values need not be set. A similarity is determined at all grid points for many different properties. The prerequisite is that the properties must be described by atom-based values, for example, partial
charges or atomic volumes. The same distance dependency is used for all properties. Property-specific similarity fields are obtained. These are correlated with the binding affinity. The interpretation of the field contributions is achieved analogously to the CoMFA method. The advantage of this method lies, above all, in the interpretability of the resulting contour maps. If a particular property in an area of the superimposed molecules correlates significantly with binding affinity, this area is highlighted. In contrast, the CoMFA method contours areas outside of the molecules, where a property reveals changes in the field contributions that affect the affinity positively or negatively. The setting of cut-off values, however, masks entire areas of these field contributions near the surface (Fig. 18.5).

3D-QSAR analyses were first meant to establish structure–activity relationships in cases when the target protein’s structure was unavailable as a reference. Nowadays, more and more crystal structures of target proteins are becoming available, so the technique is increasingly used for cases in which this reference is actually known. The protein structure then serves to generate a reasonable and relevant superposition of the substances to be compared in their biologically active conformations. It seems all the more paradoxical to use the information about the surrounding protein environment only to superimpose the molecules and then to relinquish this valuable data in the comparative field analysis. Methods have been developed that consider this information. The group of Rebecca Wade at EMBL in Heidelberg has developed the COMBINE method. For this, a set of modeled protein–ligand complexes is used to calculate a data table. It contains the interaction energies between individual ligand atoms in the test molecules of the data set and the amino acid residues and water molecules in the surrounding protein.
The interpretation of this enormous data table is achieved by using a technique that is similar to the CoMFA method. The graphical interpretation of the correlation model obtained by COMBINE indicates which regions of the protein account for decisive contributions to explain the affinity differences in the ligand data set. These are very valuable details, but they only help a little for the design of better molecules that achieve higher affinity.

Holger Gohlke in Marburg developed the variation AFMoC (Adaptation of Fields for Molecular Comparison), with which it is possible to transfer information about the protein environment into the field-based model. The advantage of the intuitive interpretation of the field contributions with regard to the structural optimization of the ligands is not lost. For this, values are generated on a CoMFA-like grid by placing atomic probes at each grid point and evaluating them with the empirical scoring function DrugScore (▶ Sect. 17.10). The resulting values reflect the protein environment, and the grid has thus been “prepolarized.” By using a docking and superposition technique, the ligands of the training set are then placed onto this grid. Only when an atom of the ligand falls upon an area of the grid for which the protein environment has predicted this atom type as advantageous is the field contribution enhanced. In other cases the interaction contribution on the grid is reduced. In this way a data table is generated for the entire training set analogously to a CoMFA method. This table is evaluated accordingly and affords a QSAR equation. The individual contributions can be shown on a grid. They indicate where particular atom types increase or reduce affinity.
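Returning briefly to the Gaussian distance dependence that the CoMSIA method described earlier in this section uses in place of the Lennard-Jones and Coulomb potentials, the following minimal sketch illustrates why no cut-off values are needed. The functional form is a simplified assumption (a sum of atomic property values weighted by exp(−α r²), without any sign convention), and the attenuation factor `alpha = 0.3` as well as the atom data are chosen only for illustration.

```python
import math


def similarity_index(grid_point, atoms, probe_value=1.0, alpha=0.3):
    """Gaussian-weighted sum of atomic property values at one grid point.

    atoms is a list of (position, property_value) pairs; positions are (x, y, z) in Å.
    """
    total = 0.0
    for position, atom_value in atoms:
        r_squared = sum((g - p) ** 2 for g, p in zip(grid_point, position))
        total += probe_value * atom_value * math.exp(-alpha * r_squared)
    return total


# One hypothetical atom at the origin with a property value of 1.0.
atoms = [((0.0, 0.0, 0.0), 1.0)]
for distance in (4.0, 2.0, 1.0, 0.0):
    value = similarity_index((distance, 0.0, 0.0), atoms)
    print(f"r = {distance:3.1f} Å   similarity index = {value:.4f}")
```

Even for a grid point that coincides with the atom, the value stays finite and equals the product of the probe and atom property values, so grid points near the surface and inside the molecule can be kept in the analysis.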
A similar field analysis is also used for the correlation and prediction of selectivity differences between ligands. Many enzymes occur as isoforms. They therefore have similarities in their binding pockets. As a consequence ligands show graduated affinities or “selectivity profiles” to these isoforms. If a ligand is to be optimized to improve selectivity, the positions at which a change in a property results in an improved profile must be known. A 3D-QSAR model is constructed for each isoenzyme. Either the difference in the affinity values can be calculated and used for the model as values to be predicted, or alternatively, two correlation models can be constructed and at each grid point the field contributions are subtracted from one another. The models that are obtained with both approaches can be graphically interpreted. Contour diagrams show where and how the molecules are to be changed to improve their selectivity with regard to the one or other isoenzyme.

18.14 A Glimpse Behind the Scenes: Comparative Molecular Field Analysis of Carbonic Anhydrase Inhibitors

Today comparative field analyses belong to the standard repertoire in drug research. As an example, the binding of inhibitors to carbonic anhydrase I and II shall be examined. The biological function of this enzyme is described in detail in ▶ Sect. 25.7. The sequence identity of the isoforms is 60%. The ligands in the training data set are derived from the parent structures shown in Fig. 18.6. First, a superposition model is generated by docking the ligands into the protein (Fig. 18.7). The enzyme’s funnel-shaped binding pocket is occupied by ligands in a large variety of ways. A good correlation model is obtained with the three methods, CoMFA, CoMSIA, and AFMoC. The models also achieve a convincing predictive power on a test data set that was independent from the training set.
Fig. 18.6 The scaffolds of inhibitors that were used in different field analyses to establish affinity models (pKi[CAII]) and selectivity models (pKi[CAII] – pKi[CAI] = ΔpKi[CAII – CAI]) describing the inhibition of the carbonic anhydrases CAI and CAII. The scaffolds shown are thiadiazolsulfonamide, thienothiopyransulfonamide, benzothiazolsulfonamide, phenylsulfonamide, hydroxamate, and hydroxysulfonamide. Different substituents were varied at the positions marked R1 and R2.
The contours for the acceptor properties with regard to the inhibition of carbonic anhydrase II are shown in Fig. 18.8. Molecules in the data set that exhibit an acceptor function in the areas marked in red have lower potency. On the other hand, an acceptor function in the blue area improves potency. Compound 18.2, which has both acceptor functions of an SO2 group oriented in the detrimental red area, is a weak CAII inhibitor. Moreover, its NH group is in the blue region, which should be occupied by an acceptor. Compound 18.3, which is about four orders of magnitude more potent, leaves the area that was occupied by an oxygen atom in 18.2 empty, and orients its thiadiazole ring in the direction of the desirable acceptor function. It achieves considerably better inhibition of the target enzyme.

Fig. 18.7 The superposition of inhibitors from the data set in the funnel-shaped binding pocket of CAII; the zinc ion is shown as the blue-gray sphere, carbon is light-yellow, oxygen is red, nitrogen is blue, sulfur is orange, and hydrogen is white.

Just as for the acceptor properties, contour maps can be generated for steric, electrostatic, hydrophobic, and hydrogen-bond-donor properties. Their evaluation helps to make evident where particular properties improve or lower the binding affinity. Such correlation analyses help the synthetic chemist to plan the optimization of lead structures in a tailored way.

Fig. 18.8 Contour map for the description of the binding contributions of H-bond acceptor properties, shown with compounds 18.2 (CAII pKi = 4.7) and 18.3 (CAII pKi = 8.7). Inhibitors that occupy the red contour areas with H-bond acceptor groups do not inhibit CAII well; the occupancy of the blue areas with acceptor groups, however, leads to higher affinity. Both oxygen atoms of the sulfonamide group of 18.2 occupy the red-contoured area, which is unfavorable for acceptor properties. On the other hand, 18.3 leaves these areas unoccupied and places its basic nitrogen in the vicinity of the blue-contoured region, which is favorable for occupancy by acceptor groups. This explains the markedly better inhibition of CAII by 18.3.

Contour maps for steric properties that cause a selectivity difference between CAI and CAII are shown in Fig. 18.9. Occupancy of the green areas with an inhibitor improves the selectivity for CAI. On the other hand, spatially filling the yellow-colored regions improves the selectivity for CAII. Compound 18.4 binds unselectively with the same affinity to both isoforms, but 18.5 can clearly discriminate between the two. The model shown is derived purely from the correlation of ligand binding data. The relative alignment of the molecules in the data set, however, is accomplished in the binding pocket of the protein. Therefore the protein environment around this binding pocket should be examined more closely to see whether the derived contours are reasonable. If the amino acid replacements between the two isoforms are compared, it is apparent that CAI has two large residues, Phe91 and Leu131, that constrain the lower left portion of the binding pocket. The inhibitors have less room in CAI than they do in CAII. In fact, the comparative field analysis generates a yellow contour in this region (near position 91), the occupancy of which should be favorable for the inhibition of CAII. CAII also makes a large amount of space available for inhibitors next to position 204, which is occupied by the less bulky Leu204 instead of Tyr204. A yellow contour is seen that indicates a favorable occupancy of this area. Inhibitor 18.5, which is considerably more potent on CAII, orients its pentafluorophenyl group exactly in this region (Fig. 18.9, right). In the vicinity of position 131 (Leu131/Phe131) a yellow and a green area occur directly next to one another but spatially separated, the occupancy of which is favorable for either CAII or CAI inhibitors, respectively. Compound 18.4, which can hardly distinguish between the two isoforms, occupies the upper edge of both areas equally well. Moreover, it leaves virtually all regions unoccupied that should lead to a better inhibition of either CAI or CAII for steric reasons. It is therefore evident why this compound shows no particular selectivity.

Fig. 18.9 Steric selectivity contours, shown with compounds 18.4 (CAI: pKi = 8.15; CAII: pKi = 8.10) and 18.5 (CAI: pKi = 6.70; CAII: pKi = 9.40). The residues labeled in the figure are His200, Tyr204, Leu131, Phe91, Thr200, Leu204, Phe131, and Ile91. The selectivity can be improved with regard to CAII inhibition by sterically filling the yellow-contoured area. Filling the green area with a sterically demanding group causes an increase in selectivity with regard to CAI (top left). Compound 18.4 occupies virtually no area that is particularly selectivity discriminating; the compound is not isoenzyme specific (top left and top right). On the other hand, 18.5 occupies a yellow-contoured area neighboring position 204, which causes a selectivity enhancement for CAII. Compound 18.5 inhibits CAII decidedly more potently than CAI.
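The selectivity measure used in these models, ΔpKi = pKi[CAII] – pKi[CAI], can be made concrete with the values quoted for compounds 18.4 and 18.5. The short sketch below only restates that arithmetic; the function names are illustrative and not taken from any program mentioned in the text.

```python
# pKi values quoted for compounds 18.4 and 18.5 (Fig. 18.9).
pki = {
    "18.4": {"CAI": 8.15, "CAII": 8.10},
    "18.5": {"CAI": 6.70, "CAII": 9.40},
}


def delta_pki(values):
    """Selectivity as pKi(CAII) - pKi(CAI); positive values favor CAII."""
    return values["CAII"] - values["CAI"]


def fold_selectivity(values):
    """Ratio Ki(CAI) / Ki(CAII) implied by the pKi difference."""
    return 10 ** delta_pki(values)


for name, values in pki.items():
    print(name, round(delta_pki(values), 2), round(fold_selectivity(values), 1))
```

Because pKi is the negative decadic logarithm of Ki, a difference of 2.7 log units for 18.5 corresponds to roughly 500-fold tighter binding to CAII, whereas the difference of –0.05 for 18.4 means practically equal affinity for both isoforms.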
Finally, the binding of the well-discriminating compound 18.6 should be considered (Fig. 18.10). The evaluation of the acceptor properties of the ligands in the training data set shows that the occupancy of the red regions with H-bond-acceptor groups shifts the selectivity to the benefit of CAII. Filling the blue contours with this property achieves an increase in potency regarding CAI. Compound 18.6 places the oxygen atoms of its endocyclic SO2 group in the vicinity of the red CAII-selective areas. Furthermore, a glutamine occupies the neighboring position 92 in both CAI and CAII. This amino acid can donate an H-bond to an acceptor group of the inhibitor via the NH2 group of its carboxamide group. However, only CAII allows these structural conditions. Gln92 neighbors Asn69 and Glu58 in CAI. The carboxamide group of Gln92 forms a continuous H-bond network with these residues and with His94. Therefore the NH group is no longer available for interactions with a bound inhibitor. This is expressed in the poorer binding affinity of inhibitors that place an acceptor function at this position, as 18.6 does. The situation is entirely different in CAII. The neighboring groups of Glu69 and Arg58 form an internal salt bridge with each other. Therefore they are not available as H-bond partners for Gln92. The carboxamide group of Gln92 involves His94 in an H-bond via its carboxamide CO group, and its NH2 group is now available as a donor functionality to interact with an acceptor group of a bound ligand. This results in considerably enhanced binding to CAII and is expressed as a selectivity advantage.

Fig. 18.10 Compound 18.6 (CAI: pKi = 4.30; CAII: pKi = 8.05) inhibits CAII significantly more potently than CAI. Its sulfone oxygen atom lies near one red-contoured area, the filling of which causes an increase in the selectivity for CAII binding. Interestingly, Gln92 is found in this region in both isoforms. However, it is only in CAII that this group is available to form an H-bond with the inhibitor that will contribute to binding affinity. The comparable residue in CAI is involved in a network of H-bonds to neighboring amino acids. Therefore it is not available as a binding partner, and a decrease in the affinity for CAI is the consequence.

Alexander Hillebrecht at the University of Marburg has performed yet another evaluation of the data set of carbonic anhydrase inhibitors that underscores the difference between 3D, 2D, and 1D QSAR analyses. First, 32 so-called one-dimensional descriptors were calculated with the MOE program for all molecules in the data set. These are surface-based descriptors that describe the lipophilicity (log P), the molar refraction (and therefore the polarizability), and the partial charges distributed over the molecules. These 32 descriptors were correlated with the binding affinity to CAII or with the selectivity difference between CAI and CAII to establish a QSAR model. In another model the connectivities in the chemical formulae (so-called molecular graphs) were used as descriptors. For this a topological connectivity tree of all bonds in a molecular formula was generated, and by “walking” along the bond connections it was counted how often a particular connectivity, for instance, an N–S–C–C–N or C–N–C–C–C sequence, occurs (so-called MACCS keys). In all, the frequency of 166 different connectivity fragments was evaluated. Such descriptors code indirectly for the molecular composition of the individual inhibitors in the data set, as was introduced above in the Free–Wilson analysis (Sect. 18.5). These topological 2D descriptors were then related to the binding affinity or selectivity data as described above.

Good correlation models can be derived using 1D as well as 2D descriptors. The models based on the 1D descriptors, however, proved not to be predictive. If an attempt was made to predict a molecule that was not in the data set, the model failed. The topological descriptors obtain better results. They possess a certain degree of predictive power, but they perform less well than the above-described 3D descriptors in the comparative field analysis. This comparison makes evident that increasing the complexity of the model and the structural validity of the descriptors increases the predictive power with regard to the binding properties of new molecules that were not part of the training data set. It is especially this predictive power, together with the straightforward translation of the obtained correlation model into the design of new chemical structures or the modification of existing ones during optimization, that makes QSAR models valuable for drug design.
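The idea of “walking” along bond connections and counting how often a given atom sequence occurs can be sketched in a few lines. This is a minimal illustration of linear path counting on a hand-written toy graph; it does not reproduce the actual 166-key definition or the program used in the study, and the function name and input format are assumptions.

```python
from collections import Counter


def count_paths(atoms, bonds, length):
    """Count element sequences along simple paths of `length` atoms (length >= 2).

    atoms: dict mapping atom index -> element symbol
    bonds: iterable of (i, j) index pairs (undirected)
    A sequence and its reverse are treated as the same fragment.
    """
    neighbors = {i: set() for i in atoms}
    for i, j in bonds:
        neighbors[i].add(j)
        neighbors[j].add(i)

    counts = Counter()

    def walk(path):
        if len(path) == length:
            # Every path is found from both ends; keep it only once.
            if path[0] < path[-1]:
                seq = tuple(atoms[k] for k in path)
                counts["-".join(min(seq, seq[::-1]))] += 1
            return
        for nxt in neighbors[path[-1]]:
            if nxt not in path:  # simple paths only, no atom visited twice
                walk(path + [nxt])

    for start in atoms:
        walk([start])
    return counts


# Toy fragment N-S-C-C-N (hypothetical, heavy atoms only).
toy_atoms = {0: "N", 1: "S", 2: "C", 3: "C", 4: "N"}
toy_bonds = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(count_paths(toy_atoms, toy_bonds, 5))
print(count_paths(toy_atoms, toy_bonds, 3))
```

Each fragment is stored in one canonical direction (the lexicographically smaller of the sequence and its reverse), so the N–S–C–C–N walk appears under the key `N-C-C-S-N`. The resulting counts per molecule form one row of the 2D descriptor table that is then correlated with affinity or selectivity.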
18.15 Synopsis

• The concept of quantitative structure–activity relationships is not new. It was first described qualitatively in the nineteenth century, and later more quantitatively by Hansch and Fujita. It is an attempt to describe structure–activity relationships with mathematical models.
• Across a series of structurally closely related test compounds, the equieffective dose that induces a particular biological effect is related, through a linear or quadratic dependence, to the logarithm of the octanol/water partition coefficient and to the Hammett constant, which describes the electronic properties of substituents at a given scaffold. A mathematical correlation model is computed by regression analysis.
• 3D QSAR methods have been developed to consider and correlate the spatial structure of active substances beyond molecular topology.
• The mutually aligned test molecules are embedded in a regularly spaced lattice and their properties are explored with an interaction probe. This is placed systematically at all grid points, and a molecular interaction field is computed around the aligned molecules by using a distance-dependent property potential.
• Usually, Lennard-Jones and Coulomb potentials are evaluated, and the generated data table for all molecules of the training data set is correlated by a partial least-squares technique.
• The derived CoMFA correlation model can be used to predict the biological properties of novel ligands not included in the training data set. Strict criteria to monitor the statistical significance of the derived correlations must be defined.
• Other property fields beyond Lennard-Jones and Coulomb potentials with mathematically different functional forms can be applied. With respect to the prediction of binding affinity, it has to be kept in mind that this property comprises an entropic contribution that is particularly difficult to reflect in property fields.
• QSAR analysis only performs a relative comparison of molecules with regard to the considered biological property. Any dependence on a particular descriptor across a compound series can only be expected if the property related to this descriptor is varied in the series. QSAR methods only interpolate and never extrapolate beyond the scope of molecular properties reflected by the training set.
• Comparative molecular field analyses can be evaluated graphically. Results are displayed as contours around the molecules and indicate where the change of a particular property runs either parallel or opposite to the changes in the biological property in the data set.
• The graphical information can be directly translated into the design of modified molecules and thus supports the medicinal chemist in optimizing a given lead structure in a systematic fashion.
Bibliography

General Literature

Hansch C, Leo A (1995) Exploring QSAR. Fundamentals and applications in chemistry and biology, vol 2. American Chemical Society, Washington, DC
Kubinyi H (1993a) QSAR: Hansch analysis and related approaches. VCH, Weinheim
Kubinyi H (ed) (1993b) 3D-QSAR in drug design: theory, methods, and applications. ESCOM, Leiden
Kubinyi H, Folkers G, Martin YC (1998) 3D QSAR in drug design, vol 2 and 3. Kluwer/ESCOM, Dordrecht/Boston/London
Ramsden CA (1990) Quantitative drug design. In: Hansch C, Sammes PG, Taylor JB (eds) Comprehensive medicinal chemistry, vol 4. Pergamon Press, Oxford
van de Waterbeemd H (1995a) Chemometric methods in molecular design. VCH, Weinheim
van de Waterbeemd H (1995b) Advanced computer-assisted techniques in drug discovery. VCH, Weinheim

Special Literature

Blaney JM, Hansch C, Silipo C, Vittoria A (1984) Structure–activity relationships of dihydrofolate reductase inhibitors. Chem Rev 84:333–407
Cramer RD, Patterson DE, Bunce JD (1988) Comparative molecular field analysis (CoMFA). 1. Effect of shape on binding of steroids to carrier proteins. J Am Chem Soc 110:5959–5967
DePriest SA, Mayer D, Naylor CB, Marshall GR (1993) 3D-QSAR of angiotensin-converting enzyme and thermolysin inhibitors: a comparison of CoMFA models based on deduced and experimentally determined active site geometries. J Am Chem Soc 115:5372–5384
Gohlke H, Klebe G (2002) DrugScore meets CoMFA: adaptation of fields for molecular comparison (AFMoC) or how to tailor knowledge-based pair-potentials to a particular protein. J Med Chem 45:4153–4170
Goodford PJ (1985) A computational procedure for determining energetically favorable binding sites on biologically important macromolecules. J Med Chem 28:849–857
Hansch C, Klein TE (1991) Quantitative structure–activity relationships and molecular graphics in evaluation of enzyme–ligand interactions. Methods Enzymol 202:512–543
Hillebrecht A, Klebe G (2008) The use of 3D QSAR models for database screening: a feasibility study. J Chem Inf Model 48:384–396
Hillebrecht A, Supuran CT, Klebe G (2006) Integrated approach using protein and ligand information to analyze affinity and selectivity determining features of carbonic anhydrase isozymes. ChemMedChem 1:839–853
Kellogg GE, Abraham DJ (1992) Key, lock and locksmith: complementary hydropathic map predictions of drug structure from a known receptor-receptor structure from known drugs. J Mol Graph 10:212–217
Klebe G, Abraham U, Mietzner T (1994) Molecular similarity indices in a comparative analysis (CoMSIA) of drug molecules to correlate and predict their biological activity. J Med Chem 37:4130–4146
Ortiz AR, Pisabarro MT, Gago F, Wade RC (1995) Prediction of drug binding affinities by comparative binding energy analysis. J Med Chem 38:2681–2691
Unger SH, Hansch C (1973) On model building in structure–activity relationships. A reexamination of adrenergic blocking activity of β-halo-β-arylalkylamines. J Med Chem 16:745–749
Weber A, Böhm M, Supuran CT, Scozzafava A, Sotriffer CA, Klebe G (2006) 3D QSAR selectivity analyses of carbonic anhydrase inhibitors: insights for the design of isozyme selective inhibitors. J Chem Inf Model 46:2737–2760
19 From In Vitro to In Vivo: Optimization of ADME and Toxicology Properties

G. Klebe, Drug Design, DOI 10.1007/978-3-642-17907-5_19, © Springer-Verlag Berlin Heidelberg 2013

The interaction between a substance and the binding site of a therapeutically relevant biological macromolecule is the decisive prerequisite for suitability as a drug. Another, no less important, prerequisite is the ability of the substance to get from the site of application, through an often rather tortuous path, to the target tissue. For this to occur, the substance must penetrate aqueous phases and lipid membranes. According to its water and lipid solubility, it will arrive in different compartments of the biological system. It is also changed by metabolic enzymes. After conjugation or degradation it is finally eliminated via the kidney, the bile, and/or the intestines (▶ Sect. 9.1).

In contrast to the biological activity of a drug, which is called pharmacodynamics, the sum of all processes that affect the absorption, distribution, metabolism, and excretion, the so-called ADME parameters, is covered by the term pharmacokinetics. Roughly simplified, pharmacodynamics can be thought of as “the effect of the substance on the organism” and pharmacokinetics as “the effect of the organism on the substance.” In recent years this clear separation of definitions has begun to disappear. The term pharmacodynamics has expanded more and more to include processes of pharmacokinetics. Above all, this has to do with the increasing knowledge that transporters or enzyme systems are responsible for properties such as absorption, distribution, or metabolism. More and more structures are being solved for these proteins, and structure–activity relationships have been established (▶ Sects. 27.6 and 30.7).

The pharmacokinetics of an arbitrary biological system and the dependence of the absorption, distribution, and excretion processes on time are described with mathematical models. The pharmacokinetics of every pharmaceutical is scrupulously investigated and a dosing scheme is determined before entry into clinical trials and especially during clinical phases I and II, which evaluate the tolerability and efficacy in humans. The isolation and structural elucidation of metabolic products in humans help to find the animal model that is most similar to humans in its metabolic properties. These species are then used for toxicology studies, which are chosen to investigate possible teratogenic effects, and for long-term studies to investigate possible carcinogenic effects. In parallel, individual metabolites of a pharmaceutical are investigated for their toxic side effects.

In the context of the rational design of new active substances, a substantial problem arises from the pharmacokinetic parameters and the toxicity: because of the enormous experimental effort and the high costs, these investigations are carried out for very few compounds, and only for those that are intended for clinical development. This approach comes with a serious danger: poor pharmacokinetic properties are often not recognized until very late development stages, after considerable sums have already been invested in the development of a new pharmaceutical. In the middle of the 1990s a study emerged that tellingly showed that numerous development campaigns failed because of unsatisfactory pharmacokinetics and intolerable toxicity. For these reasons, an intensified search for in vitro models to predict ADME-tox properties has taken place in the last 15 years. In these models it is not the pharmacokinetics of individual substances that is investigated in detail, but rather the dependence of different pharmacokinetic parameters on the properties of many different substances. This allows a better comprehension of the interrelationship between chemical structure and pharmacokinetics. At the same time, it leads to the derivation of general rules and numerous computer models that are today applied early on in the design of new drugs.

19.1 Rate Constants of Compound Transport

The distribution of a substance between phases of different lipophilicities is measured as the partition coefficient P (▶ Sect. 18.3). This definition is valid for systems at equilibrium. The distribution between the water and octanol phases is considered as a model system. The ratio of the concentrations of the non-ionized form of an investigated compound in the two phases is considered.
In addition, the pH value is adjusted during the measurement so that the investigated compound occurs overwhelmingly in its non-ionized form. As a general rule, log P, the logarithm of this value, is used.

$$\log P_{\text{(octanol/water)}} = \log \frac{\text{concentration (dissolved compound)}_{\text{octanol}}}{\text{concentration (dissolved compound)}_{\text{non-ionized in water}}}$$

Biological systems are open systems that are kinetically controlled. They can be found temporarily in a dynamic equilibrium. This condition can be compared to a chromatographic process in which a substance is in constant exchange between the solid support and the mobile phase. Locally, equilibria occur that are disrupted by the continuous progression of the mobile phase. In contrast to the relatively simple conditions in chromatography, there is a plethora of different phases in biological systems. A drug is distributed throughout all of these phases. Furthermore, metabolic processes run in parallel and lead to different metabolites.
To analyze these dynamic equilibria, the rate constants of substance transport from the aqueous phases into the lipid phases and in the reverse direction must be known. It is astonishing that such fundamental experimental investigations on organic substances were first carried out only in the mid-1970s by Bernard Lippold, and later also by Han van de Waterbeemd. Lippold used a three-phase system: water/n-octanol/water (Fig. 19.1). After the addition of the substance to one of the two aqueous phases, the time dependence of the substance concentration in the different phases was measured. From this the rate constant k1 for the transport from the water into the octanol phase and the rate constant k2 in the opposite direction were calculated. In addition to the relationship with the partition coefficient P, which is described in Eq. 19.1, a very simple correlation has been shown between k1 and k2 (Eq. 19.2); β and c are constants that depend on the system and not on the structures of the substances.

$$P = \frac{k_1}{k_2} \tag{19.1}$$

$$k_2 = -\beta k_1 + c \tag{19.2}$$

The dependence of the rate constants k1 and k2 on the partition coefficient P results from the combination of both equations (Eqs. 19.3 and 19.4).

$$\log k_1 = \log P - \log(\beta P + 1) + \text{constant} \tag{19.3}$$

$$\log k_2 = -\log(\beta P + 1) + \text{constant} \tag{19.4}$$

Fig. 19.1 Three-compartment system (aqueous phase A, octanol phase B, aqueous phase C) for the determination of the rate constants k1 and k2. At the beginning of the experiment the substance is dissolved in aqueous phase A. Next the substance concentration is measured in phases A, B, and C after different times until an equilibrium is established between the individual phases.

The experimental k values for 20 different sulfonamides and 15 further substances that were determined by Han van de Waterbeemd are shown in Fig. 19.2. Among the latter are neutral, acidic, basic, and even quaternary charged compounds with very different molecular weights. The characteristic course of the curve shows that, for relatively polar substances, the rate constant k1 for the transfer from the aqueous phase into the organic phase depends on the partition coefficient P. It is thermodynamically controlled, that is, it increases with increasing lipophilicity. A point is reached, however, at which the diffusion of the substance limits k1 to its maximally achievable value. More lipophilic substances cannot simply penetrate the organic phase faster. The same holds analogously for the opposite direction, in which the diffusion from the organic phase into the aqueous phase is described by k2. The chemical structure plays a role in both cases in that it determines the value of the partition coefficient P. Because the rate constants are limited by diffusion, there must be an apparent dependence on the molecular size in this area. According to Fick’s law of diffusion, the diffusion rate should depend on the radius of the particle, that is, as a first approximation, on the cube root of its volume. Because of the relatively low variability of the molecular size of organic drugs and their conformational flexibility, this effect is probably lost in the noise level of experimental error.

Moreover, it must not be forgotten that the octanol/water system discussed here is very simple and only roughly approximates the complex structural relationships of real membrane systems. Therefore more relevant models for collecting experimental distribution data, such as the so-called PAMPA or Caco-2 models, are increasingly being used today (Sect. 19.6). Here more complex correlations are indicated. Obviously, how a compound is distributed and structurally oriented in the vicinity of membrane structures is important. These properties simultaneously influence how the penetration, and therefore the distribution, is to be described.

Fig. 19.2 Experimentally determined rate constants k1 and k2 (plotted as log k versus log P) for the transport of 20 sulfonamides and 15 further chemically different substances with molecular weights between 100 and 500 Da. The curves and correlation coefficients (log k1: r = 0.997; log k2: r = 0.998) correspond to the fitting of the data with Eqs. 19.3 and 19.4.
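The way Eqs. 19.3 and 19.4 follow from Eqs. 19.1 and 19.2 can be checked numerically. The sketch below assumes that Eq. 19.2 is read with a negative slope, k2 = −βk1 + c, which is the reading consistent with Eqs. 19.3 and 19.4; the values chosen for β and c are illustrative and are not fitted constants from the experiments.

```python
import math


def rate_constants(P, beta, c):
    """Return (k1, k2) from P = k1/k2 (Eq. 19.1) and k2 = -beta*k1 + c (Eq. 19.2).

    Substituting k1 = P*k2 into Eq. 19.2 gives k2 = c / (beta*P + 1).
    """
    k2 = c / (beta * P + 1.0)
    k1 = P * k2
    return k1, k2


beta, c = 1.0, 1e-3  # illustrative values, not experimental constants

for log_p in (-3, -1, 0, 1, 3):
    P = 10.0 ** log_p
    k1, k2 = rate_constants(P, beta, c)
    # log k1 = log P - log(beta*P + 1) + log c   (Eq. 19.3)
    # log k2 =        - log(beta*P + 1) + log c   (Eq. 19.4)
    print(log_p, round(math.log10(k1), 2), round(math.log10(k2), 2))
```

For small P the value of k1 rises in proportion to P while k2 stays at its plateau c; for large P the value of k1 levels off at c/β while k2 falls in proportion to 1/P. This reproduces the two mirror-image curves described for Fig. 19.2.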
19.2 Absorption of Organic Molecules: Model and Experimental Data

The rate constant k for the penetration from the aqueous phase through a lipid membrane is described by another equation, Eq. 19.5. Here, too, the rate constants k1 and k2 describe the entry into the organic phase and the transport in the opposite direction, respectively.

$$\log k = \log k_1 + \log k_2 + \text{constant} \tag{19.5}$$

In the first approximation, this equation should also describe transport processes in multicompartment systems. Model calculations on arbitrary, complex systems show that this is indeed the case. They confirm that there is a bilinear dependence of the transport in different phases on the total lipophilicity of a substance. For multiple groups of drugs, for example, barbiturates, this was demonstrated experimentally in simple in vitro model systems (Fig. 19.3, bottom). Upon penetration through an organic membrane, the log k values first increase linearly, which corresponds to the increase of k1 at constant k2. After passing through a maximum, they decrease, with a constant k1 value and a decreasing k2 value. This dependence was quantitatively summarized by Hugo Kubinyi in the so-called bilinear model (Eq. 19.6); a, b, β, and c are constants that are determined by nonlinear regression analysis.

$$\log k = a \log P - b \log(\beta P + 1) + c \tag{19.6}$$

Entirely analogous dependencies are observed for the absorption of compounds, for example, from the stomach or the intestines (Fig. 19.3, middle). Active substances that are meant to be orally available should be neither very polar nor very non-polar. Substances with intermediate lipophilicity can cross the blood–placenta barrier more easily than very polar or very non-polar compounds (Fig. 19.3, top). A nonlinear dependence on the lipophilicity is particularly pronounced for substance penetration through the blood–brain barrier (Fig. 19.4). The optimum for this barrier is in the range of log P = 1.5–2.5.
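The position of the maximum of the bilinear model follows directly from Eq. 19.6: setting the derivative of log k with respect to log P to zero gives βP = a/(b − a). The following sketch evaluates the model and this optimum; the parameter values are chosen purely for illustration and are not regression results from the text.

```python
import math


def bilinear_log_k(log_p, a, b, beta, c):
    """Bilinear model, Eq. 19.6: log k = a*log P - b*log(beta*P + 1) + c."""
    P = 10.0 ** log_p
    return a * log_p - b * math.log10(beta * P + 1.0) + c


def optimum_log_p(a, b, beta):
    """log P at the maximum of Eq. 19.6; a maximum exists only for b > a > 0.

    d(log k)/d(log P) = a - b*beta*P/(beta*P + 1) = 0
    =>  P_opt = a / (beta * (b - a))
    """
    if not b > a > 0:
        raise ValueError("a maximum requires b > a > 0")
    return math.log10(a / (beta * (b - a)))


# Illustrative parameters (assumed, not fitted): slope +1 on the left branch,
# slope a - b = -1 on the right branch.
a, b, beta, c = 1.0, 2.0, 0.01, 0.0
log_p_opt = optimum_log_p(a, b, beta)
print(round(log_p_opt, 2), round(bilinear_log_k(log_p_opt, a, b, beta, c), 2))
```

With these assumed parameters the ascending branch has slope a, the descending branch has slope a − b, and the maximum lies where the two regimes meet. Setting a = 1 and b = 2 corresponds to the sum log k1 + log k2 of Eqs. 19.3 and 19.4.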
For CNS-active substances, an optimal lipophilicity around log P ¼ 2 should be aimed for in order to facilitate penetration across the blood–brain barrier. 19.3 The Role of Hydrogen Bonds The simple concept about the dependence of absorption on the octanol/ water partition coefficients that was outlined above, has been questioned in the last few years. Octanol is indeed a relevant model for lipid membranes in many respects (▶ Sect. 4.2), but it can only incompletely model the influence of hydrogen bonding. Upon establishing equilibrium in the octanol/water system, 19.3 The Role of Hydrogen Bonds 401
  • 419.
    the organic phasecontains considerable amounts of water so that the molar ratio of octanol/water ¼ 4:1. Substances with polar, solvated groups therefore do not need to fully release their water solvation shell upon entry into the octanol phase. Entering into a biological membrane is obviously different. Aside from the depen- dence on lipophilicity, even worse membrane penetration is observed for sub- stances that can form an increasing number of hydrogen bonds. Similarly, a ligand must release its water shell before it can be accommodated in the binding site of a protein. The system water/cyclohexane is more suitable for the description of such processes. Because of the non-polar character of this hydrocarbon, upon transition from water into cyclohexane the molecule cannot take its water shell with it. Many years ago P. Seiler derived an increment IH (Eq. 19.7) from the differences in the partition coefficients in cyclohexane/water (loss of water shell) and octanol / water −3 −5 −4 −3 −2 −1 0 1 −2 −1 0 1 2 3 4 log P Penetration through an organic membrane Intestinal resorption Blood-placenta penetration Gastric resorption log K 5 Fig. 19.3 The rate constant k for the transport of drugs depends nonlinearly on lipophilicity. This is valid for simple in vitro models as well as for biological systems. The bottom curve describes the log k values of the transport of barbiturates in an in vitro absorption model from an aqueous phase, through an organic membrane into another aqueous phase. Both curves in the middle (gray points) describe the dependence of the absorption rate constants k on the lipophilicity for the absorption of homologous carbamates from the stomach (gastric absorption) or the gut (intestinal absorption) of rats. The top curve was determined for the entry of different drugs into the placenta from the circulation. 
In all cases an increase in log k dependent upon log P is seen, until a more-or-less- pronounced maximum for substances with moderate lipophilicity. For very non-polar substances, this curve falls, and in rare cases a plateau is reached. The curves for gastric and intestinal absorption and for the penetration into the placenta run flatter than the curve for the in vitro transport of barbiturates (below), because here no lipid barrier is present. 402 19 From In Vitro to In Vivo: Optimization of ADME and Toxicology Properties
  • 420.
(no loss of water shell) for different functional groups. These I_H values characterize the tendency of groups to form hydrogen bonds.

\log P_{\mathrm{cyclohexane}} + \sum I_H = 1.00 \, \log P_{\mathrm{octanol}} + 0.16 \qquad (19.7)

The concept of Seiler remained largely ignored. In 1988 Robin Ganellin and co-workers described the CNS bioavailability of different substances, that is, their ability to cross the blood–brain barrier, as a linear function of a Δlog P value. This Δlog P value is the difference between the log P values in the systems cyclohexane/water and octanol/water. To a first approximation, the bioavailability of peptides also runs parallel to the Δlog P value, or to the number of groups that can potentially participate in hydrogen bonds. The methylation of all NH groups of a peptide scaffold can, in fact, deliver substances with good bioavailability. The prerequisites for good membrane penetration are similar to those for high affinity at the binding site (▶ Chap. 4, “Protein–Ligand Interactions as the Basis for Drug Action”). There too, the requirement to release relatively strongly bound water molecules can have a detrimental influence on binding affinity.

Several other distribution systems, for instance, heptane/ethylene glycol, have been proposed as alternatives to the octanol/water or cyclohexane/water systems for simulating penetration through a lipid membrane. But even these systems cannot correctly reflect the architecture of membranes with an interior lipophilic zone and a polar, negatively charged outer rim.

[Figure 19.4: log 1/C plotted against log P for MeOH, EtOH, AmOH, and DecOH]

Fig. 19.4 The neurotoxicity (C = molar dose that induces a particular toxic effect) of homologous primary alcohols in the rat is a measure of their ability to cross the blood–brain barrier. Polar substances remain overwhelmingly in the blood circulation. In contrast, substances with moderate lipophilicity reach the central nervous system easily. Accordingly, neither methanol (MeOH) nor ethanol (EtOH) shows a pronounced neurotoxicity. The high general toxicity of methanol (blindness) is not caused by methanol itself but by its severely toxic metabolic products formaldehyde and formic acid (acidosis). Short-chained alcohols such as amyl alcohol (AmOH) are considerably more neurotoxic. The highly lipophilic decanol (DecOH) shows low toxicity.

Another option
is the determination of the membrane/water partition coefficient, which is, however, experimentally rather laborious. For this, artificial membranes or liposomes are used as models.

19.4 Distribution Equilibria of Acids and Bases

Many drugs are acids (HA) or bases (B). They exist in two forms through dissociation (Eq. 19.8) or protonation (Eq. 19.9); one is usually a non-polar neutral form and the other is a polar ionic form. The partition coefficients of the ionic species are generally three to five orders of magnitude lower than those of the corresponding neutral molecules.

\mathrm{HA} + \mathrm{H_2O} \rightleftharpoons \mathrm{A^-} + \mathrm{H_3O^+} \qquad (19.8)

\mathrm{B} + \mathrm{H_3O^+} \rightleftharpoons \mathrm{BH^+} + \mathrm{H_2O} \qquad (19.9)

The distribution equilibrium of an acid and its anion in a two-phase system depends on the pKa value and the pH value of the aqueous phase, as well as on the partition coefficients Pu and Pi of the substance (Fig. 19.5). All components in each phase must be in equilibrium with one another for the total system to reach equilibrium. The dependence of the partition coefficient P on the pH value, the pH–partition profile, usually takes on a sigmoidal (i.e., S-shaped) course. Plateaus are observed for the uncharged neutral form and at pH values at which so little of the neutral form exists that solely the transfer of the charged species into the organic phase determines the measured partition coefficient (Fig. 19.6). The charged species goes into the organic phase with a counterion as an ion pair. Either the corresponding ion of the salt or the excess of ions in the aqueous buffer comes into play as the counterion. The partition coefficient of the ion pair depends decisively on the lipophilicity of the counterion.

[Figure 19.5: octanol phase and aqueous buffer; HA and A− in both phases, linked by Pu, Pi, and Ka]

Fig. 19.5 Two-phase system with partition and dissociation equilibria for an acid HA (Eq. 19.8). Ka is the dissociation constant; Pu and Pi are the partition coefficients of the non-dissociated and ionic forms, that is, of the neutral and charged species, respectively. Because there is usually a difference of several orders of magnitude between the Pu and Pi values, in many cases the Pi value can be neglected. This considerably simplifies the corresponding mathematical models.

The tetrabutylammonium salt of salicylic acid
has an only slightly lower partition coefficient than the neutral form of salicylic acid. In contrast, the sodium salt of salicylic acid has absolutely no tendency to cross over into the organic phase. Amino acids and other mixed acidic and basic compounds afford pH–partition profiles with a maximum between the pKa values of the two ionizable groups (Fig. 19.6), that is, where the zwitterionic form is present. Knowledge of the log P value of the neutral form and the pKa value allows the partition coefficient of a substance to be calculated at neutral pH. These principles allow the absorption and distribution properties of new substances to be estimated. Of course, these considerations are only valid for drugs for which no transporter exists that facilitates their membrane penetration (▶ Sects. 22.7 and ▶ 30.7).

Because of their importance, pKa values are today routinely measured by potentiometric titration in pharmaceutical research. However, it is often overlooked that pKa values of acids and bases are defined only for aqueous solutions. The addition of an organic solvent, which changes the dielectric constant, shifts this value (▶ Sect. 4.4). This applies even more to the binding site of a protein or the interior of a membrane. In individual cases, experimental values have been determined by NMR spectroscopy and isothermal titration calorimetry.

[Figure 19.6: log P plotted against pH (0, 7, 14); curves for an acid AH and its anion A−, a base B and its protonated form BH+, a dibasic acid, an acid/ion pair (A− × NR4+), and an amino acid (+H3NCH(R)COOH, +H3NCH(R)COO−, H2NCH(R)COO−)]

Fig. 19.6 The pH dependence of the distribution equilibrium of acids and bases, the so-called pH distribution profile, follows simple rules. Typically when an acid or a base is present, sigmoidal, that is, S-shaped, curves are observed. For a dibasic acid, for example, oxalic acid, the decrease in the partition coefficient continues with increasing pH values. In the presence of lipophilic counterions, for example, in the tetrabutylammonium salt of salicylic acid, the ion pair displays a very high partition coefficient. Amino acids with neutral side chains carry one basic amino group and one acidic carboxyl group. Accordingly, they go through a maximum in their partition coefficient at the neutral point. Here the majority of the substance indeed exists as a zwitterion; nevertheless, a larger part is in the neutral form than at lower or higher pH values.
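The sigmoidal pH–partition profile described in this section can be sketched numerically. The following is a minimal illustration, not a method taken from the text: it assumes a single ionizable group, ideal behavior, and the Henderson–Hasselbalch relation for the ratio of charged to neutral species in the aqueous phase. The function names and the example values (pKa 4.0, log Pu 2.0, log Pi −2.0) are hypothetical.

```python
import math


def _apparent_log_p(ionized_per_neutral, log_pu, log_pi=None):
    """log of (total conc. in organic phase) / (total conc. in aqueous phase)."""
    pu = 10.0 ** log_pu                              # neutral species
    pi = 0.0 if log_pi is None else 10.0 ** log_pi   # ionic species (ion pair)
    return math.log10((pu + pi * ionized_per_neutral) / (1.0 + ionized_per_neutral))


def log_d_acid(ph, pka, log_pu, log_pi=None):
    """Apparent log partition coefficient of a monoprotic acid HA at a given pH.

    Passing log_pi=None neglects the partitioning of the anion entirely.
    """
    return _apparent_log_p(10.0 ** (ph - pka), log_pu, log_pi)  # [A-]/[HA]


def log_d_base(ph, pka, log_pu, log_pi=None):
    """Same for a base B; pka refers to the conjugate acid BH+."""
    return _apparent_log_p(10.0 ** (pka - ph), log_pu, log_pi)  # [BH+]/[B]


if __name__ == "__main__":
    # Hypothetical acid: the ion pair partitions four orders of magnitude
    # less well than the neutral form (within the range quoted in the text).
    for ph in range(0, 15, 2):
        print(ph, round(log_d_acid(ph, 4.0, 2.0, -2.0), 2))
```

With a nonzero Pi the profile levels off at log Pu at low pH and at log Pi at high pH, which corresponds to the two plateaus described above; with Pi neglected, the curve keeps falling by one log unit per pH unit.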
19.5 Absorption Profiles of Acids and Bases

The absorption of an active substance, for example, out of the intestines into blood, should depend on the pH of the surrounding medium and the pKa of the substance, just as the distribution between an aqueous buffer system and an organic phase does. Absorption should therefore follow profiles very similar to those of distribution. In the 1950s, Brodie, Hogben, and Schanker formulated the pH–partition theory to this effect. It says that the dependence of absorption on the pH value, the pH–absorption profile, is identical to the pH–partition profile (Sect. 19.4). This theory was confirmed by, among other things, the investigation of the rate constants of absorption of a few acids and phenols from the colon of the rat at pH 6.8. The neutral forms of the strong acids 5-nitrosalicylic acid (pKa = 2.3), salicylic acid (pKa = 3.0), m-nitrobenzoic acid (pKa = 3.4), and benzoic acid (pKa = 4.2) display comparable lipophilicity, with log P values between 1.8 and 2.3. Under the experimental conditions near neutral pH, they are largely dissociated: less than 0.1% is in the neutral form (about 0.25% in the case of benzoic acid). Therefore they are absorbed distinctly more slowly than the comparably lipophilic, weakly acidic phenols p-hydroxypropiophenone (pKa = 7.8) and m-nitrophenol (pKa = 8.2), which are more than 90% in their neutral form at pH 6.8.

Neutral forms can diffuse through membranes; charged forms are readily soluble in water. An equilibrium is quickly established between the two forms in an aqueous medium and also at the phase boundaries. If the pKa value of a substance is not more than 2–3 units from the neutral value of pH 7, the neutral form is present in the aqueous phase at the entirely adequate concentration of about 0.1–1%. The neutral form penetrates into the membrane. In the aqueous phase it is immediately regenerated by the dissociation equilibrium.
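The neutral fractions quoted for the rat colon experiment can be estimated from the pKa values given above. The sketch below assumes ideal Henderson–Hasselbalch behavior of a single acidic group; the function and variable names are illustrative and do not come from the text.

```python
def neutral_fraction_acid(ph, pka):
    """Fraction of a monoprotic acid present as neutral HA at the given pH."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# pKa values as listed in the text; pH of the experiment is 6.8
PKA = {
    "5-nitrosalicylic acid": 2.3,
    "salicylic acid": 3.0,
    "m-nitrobenzoic acid": 3.4,
    "benzoic acid": 4.2,
    "p-hydroxypropiophenone": 7.8,
    "m-nitrophenol": 8.2,
}

if __name__ == "__main__":
    for name, pka in PKA.items():
        percent = 100.0 * neutral_fraction_acid(6.8, pka)
        print(f"{name:24s} pKa {pka:3.1f}  neutral form: {percent:8.4f} %")
```

The same function also illustrates the 2–3 unit rule of thumb: an acid whose pKa lies two units below the pH is about 1% neutral, and one three units below is about 0.1% neutral.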
In a biological system the distribution of such substances is accomplished quickly and effectively (Fig. 19.7), and indeed the better, the closer the pKa value is to the neutral pH 7. This also explains why so many drugs are organic acids or bases. Because the pH values in the stomach and intestines differ strongly, conditions are right somewhere along the gastrointestinal tract for a neutral substance, an acid, or a base to be well absorbed. If the pKa values are too far from the physiological pH values, as with amidines or guanidines with their extremely high pKa values, absorption can become problematic. This is also true for zwitterionic compounds, for example, amino acids, and for compounds with multiple acidic or basic groups in the molecule. Because of the large volume available for distribution, diffusion occurs overwhelmingly from the gastrointestinal tract into blood or tissue and only to a negligible extent in the opposite direction (Fig. 19.7).

Outside the range in which the compound exists as a neutral molecule, the absorption of strongly acidic compounds runs, to a first approximation, parallel to the difference pH − pKa, and for bases to the difference pKa − pH. There are exceptions to this approximation. Highly lipophilic compounds require a more detailed description of the pH–absorption profile. The neutral forms of these substances enter the lipid phase as soon as they come near the membranes.

[Figure 19.7: panels (a) neutral substance, (b) acid, pKa = 4, (c) weak base, pKa = 5, (d) strong base, pKa = 9; compartments: stomach (pH = 1), intestines (pH = 6–8), blood circulation (pH = 7.4)]

Fig. 19.7 (a) A moderately polar neutral substance N is absorbed very well from the stomach as well as from the intestines. It is quickly distributed in the circulation so that back-transport does not play a notable role. (b) An organic acid HA (pKa = 4) is absorbed well from the stomach, as long as it is not too polar, because it exists there overwhelmingly in the neutral form. The absorption is facilitated by the fact that the free acid is in considerably lower concentration in the blood than in the stomach. The formation of an anion shifts the concentration gradient in this direction. The absorption is slower from the gut because there the equilibrium lies overwhelmingly on the side of the ionized form. (c) A weak base (pKa = 5) is absorbed relatively poorly from the stomach because it exists overwhelmingly in its polar, protonated form. It is well absorbed in the intestines because it exists in its neutral form there. (d) A strong base with a pKa = 9 cannot be absorbed from the stomach. In the intestines the equilibrium indeed lies heavily on the side of the protonated form, but the non-polar form is supplied in adequate quantities. Therefore the substance can be absorbed. When a substance reaches a pKa value of 11, the concentration of the neutral, bioavailable form is too low for good absorption.

The neutral molecule is being constantly removed from the dissociation equilibrium, which is established in the
aqueous phase. However, it is very quickly replenished by this equilibrium. On balance, a continuous transport of substance from the aqueous phase into the membrane is achieved. The small amount of uncharged neutral form is the door through which the entire process takes place. The rate of the transition into the lipid layer does not depend on the (often very low) concentration of the neutral form, but rather on:
• The total concentration of the compound,
• The rate constants of the dissociation equilibrium,
• The diffusion constant of the compound.

Accordingly, a shift in the pH–absorption profile relative to the pH–partition profile is observed in biological systems for lipophilic acids and bases, which is referred to as the pH shift. It always occurs in the direction of the neutral point, that is, to higher pH values for acids and to lower pH values for bases (Fig. 19.8). The greater the lipophilicity of an acid or a base, the larger the observed shift in the absorption profile. To judge how well a substance is absorbed, the log P value and the pKa values must not be considered separately. Their interplay is decisive. For the design of new drugs, this means that a substance with unfavorable partition behavior, that is, with too high or too low a pKa value, can be beneficially modified in the desired direction by increasing its lipophilicity.

[Figure 19.8: amount of an acid, AH, distributed or absorbed, plotted against pH value; pH–absorption diagram and pH–distribution diagram, offset by ΔpH = pH shift]

Fig. 19.8 The dependence of the absorption of lipophilic acids on the pH value, the absorption profile (red curve), deviates decidedly from the pH distribution curve (black curve, see Fig. 19.6). Whereas the pH-distribution profile is valid for an equilibrium system, a steady-state equilibrium is established during absorption. Even at relatively high pH values, that is, when only small concentrations of the neutral species are present, a fast absorption of these few molecules is achieved. Because of the high anion concentrations and the continuous adjustment of the dissociation equilibrium, a minimally necessary concentration of the neutral species is maintained. The shift in the pH-absorption profile is referred to as a pH shift. Analogous shifts are observed in the opposite direction for lipophilic bases.

To describe the pH dependency of the distribution equilibrium, a distribution coefficient D was introduced as a supplement to the partition
coefficient P. For this, the ratio of the sums of the concentrations of the ionized and non-ionized forms of an investigated compound in the two phases is considered. For the measurement, the pH value is fixed with a buffer solution so that the addition of the investigated compound does not shift the pH. Usually log D, the logarithm of the distribution coefficient, is used.

19.6 What Is the Lipophilicity Optimum of Drugs?

Lipophilicity plays an important role in the appraisal of the therapeutic suitability of a pharmaceutical. This holds for absorption, distribution, and metabolism as well as excretion. With the exception of substances that are taken up via a transporter, absorption is usually better the more lipophilic the compounds are. This advantage is limited by the solubility in aqueous phases, which decreases severely as the lipophilicity increases. The solvation enthalpy and the rate at which the solid of an active substance dissolves in the gastrointestinal tract are also decisive for the bioavailability. These factors depend on the intermolecular interactions in the crystalline solid and can vary considerably from one polymorphic crystal modification to another. Therefore, correlations to predict bioavailability use the melting point as an additional parameter apart from the lipophilicity and the solubility. In addition to the solubility, the kinetics of dissolution are important for galenic formulations, that is, the final drug preparation. They determine the amount of substance that goes into solution during the gastrointestinal passage.
This amount can be increased by different measures such as:
• Increasing the surface area by grinding the crystals into minuscule particles (micronization),
• Growing a modified crystal with better solubility properties,
• Crystallization under special conditions to afford a more uniform (usually smaller) size, or crystals with lattice defects,
• Changing the salt form,
• Adding solubility-mediating additives,
• Embedding the drug as an amorphous solid solution in readily soluble polymers.

Because of its importance, techniques to measure solubility on a high-throughput scale have been established in recent years. Cell cultures are also increasingly used as in vitro models to record substance absorption. A thin layer of cells from human colon carcinomas (so-called Caco-2, HT29, or MFCH cell lines) is grown in a two-chamber system. The transport of active substance can be followed from both sides, that is, from either the apical or the basolateral side. Because these cells also express transporters, the involvement of specific transport mechanisms can also be studied. These models are less suitable for studying the possible consequences of substance metabolism because the metabolizing enzymes (▶ Sect. 27.6) are only expressed in diminished quantities by these cells. Relevant in vitro test models have also been developed to study blood–brain barrier penetration. These models are experimentally relatively laborious, and the
results often can only be compared within a series of structurally related substances. Assay systems with artificial membranes (PAMPA, from parallel artificial membrane-permeability assay) can be constructed that allow high-throughput screening. Moreover, the penetration behavior in liposomes can be evaluated by surface plasmon resonance.

When the absorption of different substances is determined experimentally, results obtained from saturated solutions of the substances should not be compared with results from solutions with concentrations well below the saturation limit. In the first case, the lower solubility of lipophilic compounds means that their saturated solutions contain only low concentrations, which gives the false impression of poorer absorption. In the second case, with comparable concentrations for all test compounds, improved or good absorption is also found for the lipophilic substances. A comparison across such different experimental conditions will lead to incorrect conclusions.

Further confusion occurs when the terms absorption and bioavailability are incorrectly applied (▶ Sect. 9.1). The absorption of a substance can be excellent, but the bioavailability is nonetheless poor. Lipophilic compounds and substances with a molecular weight of more than 500–600 Da are often well absorbed, but suffer from very fast biliary (via the bile) elimination. This usually happens during the first liver passage (first pass effect, ▶ Sect. 9.1) directly after absorption from the intestines. To achieve good bioavailability, the lipophilicity must not be too high.

The excretion path also depends on the lipophilicity. In general, extremely lipophilic substances are more quickly metabolized, but are also toxicologically worrisome. Hydrophilic substances and polar metabolites, including those formed by conjugation with polar groups, are excreted via the kidneys. The excretion of lipophilic substances is usually accomplished hepatically, and subsequently via the intestines.
Such substances often undergo oxidative metabolism, with the concomitant possibility of toxic metabolites being produced.

Substances that interact with membrane-bound receptors or ion channels can often access their targets more easily if they are enriched in the surrounding membrane. For this, the substances should be lipophilic, or should carry a large lipophilic group with which they can be anchored in the membrane (▶ Sect. 4.2, ▶ Fig. 4.2).

19.7 Computer Models and Rules to Predict ADME Parameters

Aside from the set-up of suitable test systems to systematically record parameters that determine the pharmacokinetic properties, major effort has been spent to establish rules and computer models to predict favorable ADME properties. First of all, the rule of five must be mentioned, which was developed by Chris Lipinski at Pfizer. Accordingly, an active substance should not violate more than two of the criteria of the rule of five listed in Table 19.1. These simple rules were derived from experience and are almost exclusively used to preselect compounds for screening. Tudor Oprea refined these rules further and extended them to cover the occurrence of particular structural building blocks such as, for instance, the maximum number
of rings of a particular size. Programs such as CLOGP, ACD/pKa, and Pallas/pKa have been developed to estimate lipophilicity and pKa values. To predict solubility, attempts are made to calculate solvation enthalpies. Permeability, absorption, and bioavailability predictions are based on empirical correlation models. For this, experimental observations are related to the chemical structure of the investigated molecules. The applied methods are derived from the QSAR models presented in ▶ Chap. 18, “Quantitative Structure–Activity Relationships.” The properties to be predicted are described by models based on intuitively selected descriptors. Usually molecular parameters are used that are frequently derived from the molecular surface and are assumed to be decisive for the properties under consideration. In addition to routine regression analyses, more recent mathematical models such as neural networks, nearest-neighbor classifiers, decision trees, or machine-learning techniques such as support vector machines are applied.

In addition to the easily evaluated rule of five, the following criteria should also be considered for rational design: substances that are meant to act in the periphery, for instance, cardiovascular drugs, should be relatively polar. Of course, a certain minimal lipophilicity is necessary for their absorption. Because of the risk of central side effects, or the generation of toxic metabolites, this minimum should not be exceeded by too much. Here the following motto is valid: better to be a little less potent than have all the other problems! A good therapeutic window is much more valuable for therapy than picomolar affinity to a protein. Substances that act upon membrane-bound proteins and substances that act in the central nervous system should have a moderate to high log P value of 1.
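The rule of five mentioned above (Table 19.1) lends itself to a simple filter. The sketch below is only an illustration under stated assumptions: the descriptor values (molecular weight, log P, donor and acceptor counts) are supplied by the caller rather than computed from a structure, the class and function names are hypothetical, and the number of tolerated violations is deliberately left to the caller.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors of one compound (values supplied by the user)."""
    mol_weight: float   # in Da
    log_p: float        # octanol/water partition coefficient
    h_donors: int       # number of H-bond donor groups
    h_acceptors: int    # number of H-bond acceptor groups


def rule_of_five_violations(d):
    """Return the names of the rule-of-five criteria that the compound violates."""
    checks = {
        "molecular weight > 500 Da": d.mol_weight > 500.0,
        "log P > 5": d.log_p > 5.0,
        "more than 5 H-bond donors": d.h_donors > 5,
        "more than 10 H-bond acceptors": d.h_acceptors > 10,
    }
    return [name for name, violated in checks.items() if violated]


if __name__ == "__main__":
    # Hypothetical descriptor sets, not measured data
    small = Descriptors(mol_weight=350.0, log_p=2.5, h_donors=2, h_acceptors=5)
    large = Descriptors(mol_weight=650.0, log_p=6.2, h_donors=6, h_acceptors=12)
    for label, d in (("small", small), ("large", large)):
        print(label, rule_of_five_violations(d))
```

Returning the list of violated criteria, rather than a single pass/fail flag, keeps the preselection threshold explicit and makes it easy to see why a compound was flagged.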
To avoid the development of toxic metabolites, the incorporation of the following is advisable:
• Easily conjugated groups, for example, hydroxyl, amino, or carboxyl groups,
• Predetermined metabolic cleavage points such as ester or amide bonds,
• Oxidizable groups that lead to nontoxic and easily excretable metabolites, for example, methyl groups.

Of course, this strategy should not be taken too far; otherwise the substances are excreted too quickly. The biological half-life is then reduced to a value that makes therapeutic administration in humans impossible. The structural consideration of properties that lead to optimal bioavailability, adequate biological half-life, and non-toxic metabolites represents a problem in the search for new active substances. Structure-based design of active substances initially concentrates on the fitting of a ligand to its binding site. Often, aspects that have to do with pharmacokinetics and metabolism are not adequately considered in this phase.

Table 19.1 Criteria for the rule of five.
  Molecular weight ≤ 500 Da
  Partition coefficient log P ≤ 5
  No more than 5 H-bond donor groups
  No more than 10 H-bond acceptor groups

Disappointments at the end of a successful optimization in the preclinical phase, or at the very latest in the clinic, punish such a one-sided
approach. Because the spatial structures of transporters, channels, and metabolic enzymes are increasingly becoming available, structure-based design can be used to test for cross-reactivity of proposed or developed ligands with these target structures. Binding to the potassium-ion-transporting hERG ion channels leads to their blockage. A consequence could be life-threatening cardiac arrhythmias (▶ Sect. 30.3). For this reason, QSAR models were developed that can examine molecules for possible hERG channel binding. Methods for the direct docking of ligands into structural models of the channel have also been developed. Another system that was recently structurally characterized is the membrane-bound glycoprotein GP170. It is a transporter that can expel drugs from the cell (▶ Sect. 30.8). It is desirable to avoid interactions with this protein as much as possible. Another large family of enzymes worthy of attention is that of the cytochrome P450 metabolic enzymes (▶ Sect. 27.6). Here an attempt is made to estimate how drugs interact with these proteins and how they are metabolized. A wide field is opened here for structure-based design.

19.8 From In Vitro to In Vivo Activity

Active substances are initially investigated in simple in vitro test models, for instance, with respect to enzyme inhibition or receptor binding or in cell cultures, and later in organs and animal models. As a general rule, the simplest model is chosen for which the results are predictive of the effect that can be expected in an animal or in humans. For this it is necessary to derive quantitative relationships between the different test models, so-called activity–activity relationships. These describe the relationship between biological activities, for instance, between in vitro and in vivo data. In the best case, they even allow the extrapolation of binding affinities from an inhibition assay to the therapeutic effect in humans.
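An activity–activity relationship of this kind is often expressed as a linear correlation between the logarithms of two activity measures. The following sketch fits such a line by ordinary least squares and reports the correlation coefficient r. The data points are invented purely for illustration and do not come from any experiment; the function name is hypothetical.

```python
import math


def linear_fit(x, y):
    """Least-squares line y = slope * x + intercept and Pearson correlation r."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long series with at least two points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


if __name__ == "__main__":
    # HYPOTHETICAL values: log Ki from an in vitro binding assay and
    # log of an in vivo dose for four imaginary compounds.
    log_ki = [-3.0, -2.1, -1.2, -0.4]
    log_dose = [-1.8, -1.1, -0.1, 0.6]
    slope, intercept, r = linear_fit(log_ki, log_dose)
    print(f"log dose = {slope:.2f} * log Ki + {intercept:.2f}  (r = {r:.2f})")
```

A high r for one in vitro assay and a low r for another, measured on the same set of compounds, is the kind of evidence used to decide which simple test model is predictive of the in vivo effect.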
The confirmation of a correlation between a simple test model and a therapeutic effect is often more important than the derivation of a structure–activity relationship. Once the relevant quantitative relationship has been found, inexpensive and quickly performed tests can be used instead of laborious animal experiments. The number of animal tests is reduced considerably in this way. But that is not the only advantage. The use of automated molecular testing systems allows the profile of active substances to be reliably characterized.

19.9 Natural Ligands Are Often Unspecific

Prior to the biological testing of an active substance, the following questions must be clarified. What therapeutic goal should be achieved? How is this goal to be realized? Therapeutic concepts are derived from the pathophysiology of the disease mechanism. Regulatory intervention with drugs should restore the original physiological condition as far as possible. Problems can occur in the process: to imitate natural ligands of enzymes and receptors, the active substance must demonstrate adequate specificity and must selectively reach the target site.
Nature works with two orthogonal principles with respect to endogenous substances: the specificity of the effect and a usually very pronounced spatial compartmentalization. Hormones act overwhelmingly systemically, that is, they are released at one site in the organism and transported through the circulation to another, entirely different site. There they exert their action. Other substances, for instance, neurotransmitters, act strictly locally. In the picture of the lock and key (▶ Sect. 4.1), Nature prefers to have a master key that can act on different locks. It acts only at the site of its production and is removed as soon as it has fulfilled its tasks. The neurotransmitters are synthesized in nerve cells, stored, and released into the synaptic gap upon stimulation of the cell (▶ Sect. 22.5). There they bind to specific receptors and stimulate the neighboring nerve cell. The effect quickly subsides after reuptake into the cell or after decomposition, for instance, by monoamine oxidases (amines), esterases (acetylcholine), or peptidases.

The efficiency of Nature is documented especially impressively in the variety of ways in which small molecules, for example, adrenaline and noradrenaline (▶ Sect. 1.4), can be used as hormones as well as neurotransmitters. A plethora of different receptors and receptor subtypes is available for these substances, so that entirely different effects can be induced with the same molecule. The recoding of the amino acid sequence of a particular receptor, and therefore an alteration in its binding site, is relatively easy to accomplish at the gene level. The evolution of complex biosynthetic pathways for non-peptide ligands, which often proceed over multiple enzyme-catalyzed steps, is considerably more intricate. Accordingly, almost all neurotransmitters and many hormones are derived in a simple way from central intermediates of metabolism, for instance, from amino acids.
On the other hand, the steroid hormones (▶ Sect. 28.3) prove that Nature can achieve very different effects with a set of chemically similar structures and evolutionarily and structurally related receptors, as with the estrogens, gestagens, androgens, glucocorticoid steroids, and mineralocorticoid steroids.

Frequently, the spatial distribution of the biosynthesis or release of a receptor ligand, or the distribution of membrane-bound receptors or enzymes, plays a decisive role for the specificity of an effect. Different effects are achieved by the same ligand through locally restricted substance release or through the presence of different receptors. In doing so, there is not only a differentiation between particular organs or areas, but also between individual cells and cell compartments. For example, the dopamine concentration has been determined in different regions of the rat brain. Whereas in some regions, for example, the caudate nucleus (Lat.: Nucleus caudatus), an important synaptic site for the motor system and the olfactory system, concentrations of up to 100 ng dopamine per mg protein are reached, most of the other areas of the brain contain only between 0.2 and 10 ng/mg. Even in the Substantia nigra of the mesencephalon, the dopamine level is only 5–6 ng/mg. The degeneration of dopaminergic neurons in this area leads to Parkinson’s disease in humans. It is known from labeling experiments that the distribution and population density of receptor subtypes in diverse areas of the brain and other tissues can be very different.
19.10 Specificity and Selectivity of Drug Interactions

How specifically should a drug act? There is no absolute answer to this question. Because active substances are almost always administered orally or intravenously, they act systemically, that is, on the entire organism. The lack of limitation to a particular organ or a particular compartment must be compensated for with a higher specificity. At any rate, the drug must act as specifically as necessary to achieve a successful therapy with tolerable side effects.

In the case of enzyme inhibitors, substances are preferred that act so specifically that only one particular enzyme is inhibited. Unspecific inhibitors that simultaneously inhibit multiple serine or metalloproteases would wreak havoc in an organism. A thrombin inhibitor, which should reduce an increased thrombosis risk, must not act simultaneously as an inhibitor of the closely related plasmin, which brings about fibrinolysis, that is, the dissolution of blood clots that have already formed. The situation with kinase inhibitors (▶ Sect. 26.3) is a bit different. Because of the similarity among kinases, one member of the family can take over the task of another, related kinase that has been blocked. In doing so it reduces the therapeutic effect to nothing. Here, a broad-spectrum kinase inhibitor that can simultaneously shut off an entire protein family might be desirable. A broad-spectrum action that inhibits multiple isoenzymes of a parasite equally well can also be beneficial for antibacterial or antiparasitic compounds (e.g., plasmepsins, ▶ Sect. 24.7).

Receptor agonists and antagonists should also display a high selectivity. β-Agonists that are used to treat asthma (▶ Sect. 29.3) must be β2-specific so that they do not induce an undesirable increase in the heart rate or blood pressure. Often the necessary effect cannot be achieved with only one drug.
The simultaneous use of multiple drugs is often indicated for the treatment of arterial hypertension (▶ Sect. 22.10). More complex, multifactor-induced disease processes must be treated by addressing multiple mechanisms. Because of the low dosing of the different components, the unspecific side effects of the individual components fade into the background.

Specificity is critical for the effect of CNS-acting drugs. Progress in gene technology has provided us with an explosion of knowledge about receptors, but also with a dilemma. We know the exact receptor profile of established substances. We know what specificity must be achieved to imitate a particular type of action. But in many cases, we do not know which profile should be present to achieve a better therapeutic effect. An example should clarify this point. Neuroleptics and many antidepressants (▶ Sect. 1.6) act on neuroreceptors. The classic neuroleptics chlorpromazine 19.1 and haloperidol 19.2 (Fig. 19.9), which are used in the treatment of schizophrenia, are relatively unspecific dopamine receptor antagonists (Table 19.2). The mixed-type neuroleptic/antidepressant sulpiride 19.3 acts on the D2 and D3 receptors simultaneously. All of these substances cause side effects on the musculoskeletal system like those observed in Parkinson’s disease (Sect. 19.9), which is caused by a dopamine deficiency. Because of their mode of action, it was assumed that the side effects of the neuroleptics were inevitable consequences of antagonism of the dopamine receptors. Then an atypical neuroleptic, clozapine 19.4,
    came along (Fig. 19.9). It does not have the described side effects. Today we know that clozapine, in contrast to the other neuroleptics, acts much more potently on the D4 receptor than on the D2 and D3 receptors (Table 19.2). The concentration at which clozapine acts on the D4 receptor, which was also measured in the cerebrospinal fluid of treated patients, is sufficient for clozapine to bind to particular serotonin and muscarinic receptors as well, in part with even higher affinity. Because of this, it could also be that the antagonistic effects of clozapine on these receptors are responsible for the atypical effects.

    Table 19.2 The natural neurotransmitter dopamine binds with higher affinity to dopamine receptors of the D1-type. The classic neuroleptics chlorpromazine 19.1, haloperidol 19.2, and (S)-sulpiride 19.3 differ from clozapine 19.4 (Fig. 19.9) in one point: they have no comparable selectivity for the D4 receptor.

    Binding to the dopamine receptors, Ki in nM

    Substance               D1-type               D2-type
                            D1        D5          D2      D3      D4
    Dopamine                0.9       0.9         7       4       30
    Chlorpromazine 19.1     30        130         3       4       35
    Haloperidol 19.2        80        100         1.2     7       2.3
    (S)-Sulpiride 19.3      45,000    77,000      25      13      1,000
    Clozapine 19.4          170       30          230     170     21

    [Chemical structures: 19.1 Chlorpromazine, 19.2 Haloperidol, 19.3 Sulpiride, 19.4 Clozapine]

    Fig. 19.9 Chlorpromazine 19.1, haloperidol 19.2, and sulpiride 19.3 are neuroleptics with typical side effects that are associated with dopamine antagonists. Clozapine 19.4 differs from these substances in its binding profile on the dopamine receptors (Table 19.2) as well as in its side effects.
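The D4 preference of clozapine described above can be read directly from Table 19.2 as a ratio of Ki values. The following is a minimal sketch in Python, not part of the original text; the dictionary layout and the function names are illustrative assumptions, and only the Ki values themselves come from Table 19.2.

```python
# Ki values in nM, transcribed from Table 19.2 (lower Ki = higher affinity).
# The data layout and helper names below are illustrative assumptions.
KI_NM = {
    "chlorpromazine": {"D1": 30, "D5": 130, "D2": 3, "D3": 4, "D4": 35},
    "haloperidol": {"D1": 80, "D5": 100, "D2": 1.2, "D3": 7, "D4": 2.3},
    "(S)-sulpiride": {"D1": 45_000, "D5": 77_000, "D2": 25, "D3": 13, "D4": 1_000},
    "clozapine": {"D1": 170, "D5": 30, "D2": 230, "D3": 170, "D4": 21},
}


def fold_preference_for_d4(ki_by_subtype, reference):
    """Return Ki(reference) / Ki(D4); values above 1 mean D4 is bound more tightly."""
    return ki_by_subtype[reference] / ki_by_subtype["D4"]


def d4_preference_table(data=KI_NM, references=("D2", "D3")):
    """Compute the D4 preference of every substance against each reference subtype."""
    return {
        drug: {ref: fold_preference_for_d4(ki, ref) for ref in references}
        for drug, ki in data.items()
    }


for drug, ratios in d4_preference_table().items():
    print(f"{drug:15s} D2/D4 = {ratios['D2']:6.2f}   D3/D4 = {ratios['D3']:6.2f}")
```

With these numbers, clozapine is the only substance in the table whose D2/D4 ratio exceeds 1 (roughly 11-fold, and roughly 8-fold for D3/D4), which is the quantitative content of the statement that it prefers the D4 receptor.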
    Many drugs are classified as “dirty drugs” because of their multifaceted action on many, totally different receptors. From the pharmacologists’ point of view, such a characterization is appropriate. A general statement about the therapeutic value cannot be derived from it. It may well be that many dirty drugs are optimal for therapy because of their balanced action on multiple receptors. Recently, these compounds have been termed “rich in pharmacology,” and they define a “polypharmacology.” The suitability or unsuitability of a drug is decided only in clinical testing and later by the experience from broad application in patients.

    The differences between enzymes and receptors in different species also offer a chance to achieve a therapeutically desired selectivity. Species differences play a role if an undesired organism is to be killed, that is, with antibiotics, antimycotics, antivirals, and antiparasitic drugs. To avoid side effects in humans, the metabolic pathways of the bacteria, fungi, viruses, or parasites are purposefully attacked either by adequate selectivity or by selecting a point of action that is not present in higher organisms (see ▶ Sects. 23.7, ▶ 24.3, ▶ 27.2, or ▶ 30.8).

    19.11 Of Mice and Men: The Value of Animal Models

    Quantitative activity–activity relationships serve to draw conclusions about humans from animals, but are also valuable for comparing different biological models with one another. From the plethora of examples that are described in the literature, only a few typical relationships will be mentioned here.

    [Plot: log (average clinical dose, in µmol/kg) and log (Ki of receptor binding, in µM) for clinically used neuroleptics; data series: haloperidol (r = 0.87) and dopamine (r = 0.27)]

    Fig. 19.10 The agonist dopamine preferably binds to the D1-type of dopamine receptors (Table 19.2). It was clear very early, however, from binding studies on membrane homogenates that the potency of clinically used neuroleptics correlated with the displacement of haloperidol (r = 0.87) rather than with dopamine binding (r = 0.27).

    Even before the characterization of the different dopamine receptors (Sect. 19.10, Table 19.2), 25 clinically used neuroleptics were investigated to
    unravel correlations between the results of in vitro models, animal experiments, and the potency of these substances in humans. Two radioactively labeled ligands, dopamine and haloperidol 19.2 (Sect. 19.10, Fig. 19.9), one of which prefers the D1-type and the other the D2-type dopamine receptor, were used to characterize binding. It was demonstrated that the average clinical dose correlated significantly with the displacement of the D2-type ligand haloperidol 19.2. Significantly higher concentrations were needed to displace the D1-type ligand dopamine. A correlation with these data is virtually non-existent. Not only the clinical efficacy, but also the data from animal models that are used to test for neuroleptic effects correlate better with the displacement of haloperidol than with that of dopamine (Table 19.3). In hindsight, the results suffer from a lack of ligand specificity for a single receptor, and the preparations are affected by receptor heterogeneity because the presence of the different receptor subtypes was not standardized in the calf brain homogenates that were used. All substances were investigated with dirty ligands in dirty test models. The profile of active substances can only now be unambiguously assigned by using uniform receptor subtypes, which are produced by gene technology (see Table 19.2).

    There are many cases in which the relationship between different test models depends strongly on the species used. Investigations on isolated arteries and veins from the lungs of rabbits, sheep, pigs, and humans show that the vascular preparations from rabbits and humans react to noradrenaline in a comparable way. Sheep and pig arteries are significantly less sensitive. Isolated pig veins cannot be stimulated at all at comparable doses of noradrenaline. The experimental results are even more inhomogeneous and difficult to interpret upon stimulation with acetylcholine.
    It must not be forgotten that the metabolism in humans and in animal species is also different and exerts an influence on the test results.

    Table 19.3 Correlation of the clinical efficacy (Fig. 19.10) of 25 different neuroleptics and their potency in different animal models that are typically used for the evaluation of neuroleptic effects with the displacement of dopamine or haloperidol 19.2. The clinical data and the results of the animal models correlate conspicuously better with the displacement of the D2-type ligand haloperidol than with the displacement of the D1-type ligand dopamine (r = correlation coefficient).

    Model                                                        Correlation with        Correlation with
                                                                 dopamine                haloperidol
                                                                 displacement (r)        displacement (r)
    Mean clinical dose in humans                                 0.27                    0.87
    Inhibition of the stereotypical behavior after
      application of apomorphine (rat)                           0.46                    0.94
    Inhibition of the stereotypical behavior after
      application of amphetamine (rat)                           0.41                    0.92
    Protection from apomorphine-induced emesis (dog)             0.22                    0.93

    Tachykinins are short peptides that trigger a wealth of physiological and pathological processes. Their central role in pain and asthma is certain. They act via the NK1, NK2, and NK3 receptor subtypes, which also bind specifically to the three
    peptide agonists substance P, neurokinin A, and neurokinin B (▶ Sect. 10.7). CP 96 345 19.5, a non-peptide NK1 antagonist, displaces substance P with high affinity in two human cell culture models and in guinea pig and rabbit membrane preparations. In membrane preparations from mouse, rat, and chicken brains, to which substance P binds with entirely comparable affinities, 19.5 shows IC50 values that are 60–500 times higher (Table 19.4). It is known from sequence-specific point mutations that the agonist substance P and the antagonist CP 96 345 bind to different regions of the receptor (see ▶ Sect. 29.7).

    Table 19.4 Binding of substance P and displacement by the antagonist CP 96 345 19.5 (tested as a racemate) on cells of different origins.

    [Chemical structure: 19.5 CP 96 345]

    System                   Binding of substance P;    Displacement of substance P
                             IC50 in nM                 by 19.5; IC50 in nM
    Human cell line U373     0.13                       0.40
    Human cell line IM9      0.22                       0.35
    Guinea pig brain         0.07                       0.32
    Guinea pig lung          0.04                       0.34
    Rabbit brain             0.16                       0.54
    Mouse brain              0.19                       32
    Rat brain                0.20                       35
    Chicken brain            0.26                       156

    The differences between humans and individual animal species are not surprising considering that the amino acid sequence of the receptor proteins usually differs in multiple positions. The use of human proteins in molecular test systems is just as critical for the relevance of the achieved results as it is for the determination of the 3D structures (▶ Chaps. 13, “Experimental Methods of Structure Determination” and ▶ 14, “Three-Dimensional Structure of Biomolecules”). This can be seen very clearly in the results for the aspartic protease renin (▶ Sect. 24.2). The inhibitors remikiren 19.6 and aliskiren 19.7 were tested on renins from different species. The renins of two primate species and of humans were inhibited at very low concentrations. On the other hand, the renins from the rat and the dog, the two species that are most commonly used in cardiovascular pharmacology, were inhibited only at conspicuously higher concentrations (Table 19.5). Remikiren would indeed have been found in a classical test for blood-pressure-lowering
    effects, but it would have been judged to be much too weak. A comparison of the X-ray structures of the renins from mouse and human also shows a conserved binding mode in the main chain of the peptide inhibitors that is common to other aspartic proteases. Subtle differences are found at the rim of the binding pocket, arising from sequence differences between the species.

    The amino acid sequences of the 5-HT1B and 5-HT1Dβ subtypes of the serotonin receptors of humans and rats show more than 90% identity. If the relationships between the individual amino acids are considered, a homology of 95% is obtained. Despite these similarities, a series of active substances bind with very different affinities to these two receptors. The difference is traceable to a single amino acid: the exchange of threonine 355 for an asparagine (Fig. 19.11). The human receptor is, from the point of view of the affinity, converted to the rat receptor by this mutation! After the exchange of this amino acid, the β-blockers propranolol and pindolol bind with approximately three orders of magnitude higher affinity. The affinities of many other ligands, on the other hand, are significantly reduced.

    Table 19.5 Inhibition of the renins of humans and other animal species by remikiren 19.6 and aliskiren 19.7.

    [Chemical structures: 19.6 Remikiren, 19.7 Aliskiren]

    Renin from:    IC50 in nM, Remikiren    IC50 in nM, Aliskiren
    Human          0.8                      0.6
    Monkey         1.0                      1.72
    Dog            107                      7
    Rat            3,600                    80
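The species dependence in Table 19.5 is easier to judge when each IC50 is expressed relative to the human enzyme. The short Python sketch below does this; it is an illustration added here rather than part of the original analysis, and the names `IC50_NM`, `fold_vs_human`, and `log_units_vs_human` are assumptions chosen for this example. Only the IC50 values come from Table 19.5.

```python
import math

# IC50 values in nM, transcribed from Table 19.5.
# The data layout and helper names are illustrative assumptions.
IC50_NM = {
    "remikiren": {"human": 0.8, "monkey": 1.0, "dog": 107, "rat": 3600},
    "aliskiren": {"human": 0.6, "monkey": 1.72, "dog": 7, "rat": 80},
}


def fold_vs_human(inhibitor, species, data=IC50_NM):
    """How many times higher the IC50 is for this species than for human renin."""
    return data[inhibitor][species] / data[inhibitor]["human"]


def log_units_vs_human(inhibitor, species, data=IC50_NM):
    """The same ratio expressed in orders of magnitude (log10 units)."""
    return math.log10(fold_vs_human(inhibitor, species, data))


for inhibitor, by_species in IC50_NM.items():
    for species in by_species:
        print(
            f"{inhibitor:10s} {species:7s} "
            f"{fold_vs_human(inhibitor, species):8.1f}-fold "
            f"({log_units_vs_human(inhibitor, species):+.2f} log units)"
        )
```

For remikiren, the rat enzyme comes out about 4,500-fold (more than three and a half orders of magnitude) less sensitive than human renin, and the dog enzyme more than 100-fold less sensitive, which is consistent with the remark that a classical rat or dog blood-pressure test would have rated the compound as much too weak.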
    This may indeed be only a weak indication, but it can be speculated that the two β-blockers bind to the mutated 5-HT receptor as they do to the β-receptor.

    [Plot: log 1/Ki (rat) and log 1/Ki (human) for the ligands N,N′-dipropyl-5-CT, rauwolscine, 5-OMe-diMe-tryptamine, methysergide, sumatriptan, (−)-propranolol, pindolol, metergoline, 5-carboxamidotryptamine (5-CT), and 5-hydroxytryptamine]

    Fig. 19.11 Different serotonin receptor ligands and the β-blockers propranolol and pindolol show very different binding affinities on very similar 5-HT receptors from rats and humans. The open circles refer to the wild-type human receptor. They are irregularly distributed over the diagram (correlation coefficient r = 0.27). If one amino acid in the human receptor is exchanged for the corresponding amino acid in the rat receptor, the binding profile changes. With respect to the affinity of the ligands, the human receptor becomes a rat receptor. The black-filled circles refer to this Asn355 mutant (correlation coefficient r = 0.98).

    19.12 Toxicity and Adverse Effects

    One of the most difficult chapters in preclinical research is the estimation of the toxicity of a substance, above all the human toxicity, from data that were obtained in other species. Such considerations must be made to be able to estimate the potential danger of the substance before it is introduced to the clinic.

    Are there any drugs without toxicity and without side effects? Paracelsus recognized in the sixteenth century:

    Everything is poison and nothing is without poison, it is the dose alone that makes a thing non-poisonous.

    Friedrich Schiller had his Fiesko say:

    A desperate evil needs a bold medicine.
    And the pharmacologist Gustav Kuschinski formulated:

    Whenever it is proclaimed that a substance has no side effects, the urgent suspicion ensues that there is also no main effect.

    The determination of the acute toxicity in multiple animal species and of the chronic toxicity in at least two animal species is routine before entry into phase I clinical trials, that is, tolerability testing on healthy volunteers. It is considered standard that the species for the chronic toxicity investigations should be selected according to which animal species is most similar to humans in its pharmacokinetics and metabolism.

    Cats and guinea pigs react extremely sensitively to cardiac glycosides. Therefore, they were previously used as models for the effect on humans. Rats react considerably less sensitively. The hallucinogen lysergic acid diethylamide (LSD 2.21, ▶ Sect. 2.5) shows decidedly different toxicity in different animal species. An experiment to test the hallucinogenic effects of LSD on an elephant led to a disaster. A hallucinogenic, but non-toxic dose was desired. Despite a careful estimation of this dose, the elephant died within minutes after 0.3 g of LSD (corresponding to 0.06 mg/kg) was administered. Relative to the mouse, which is relatively insensitive (Table 19.6), the elephant reacted at least 1,000 times more sensitively. This experiment was not repeated! The discoverer of LSD, Albert Hofmann, took 0.25 mg of LSD in his first controlled self-experiment. With about 0.0035 mg/kg, he was significantly below the dose that cost the elephant its life. Despite this, it can be assumed that LSD is less toxic for humans than it is for elephants. Direct fatality through LSD is not known, only mortality that occurs as a result of accidents or suicide while in the psychotic state.

    The toxicities of poisons that end up in our environment are investigated very thoroughly.
    Chlorinated dibenzodioxins and furans form during the uncontrolled chemical decomposition of the correspondingly substituted chlorophenols. The Seveso accident is attributed to such an incident. Toxic chlorinated dioxins and furans also occur during many burning processes. Tetrachlorodibenzodioxin 19.8 (TCDD, “Seveso dioxin”) is one of the best-investigated substances with regard to its toxicity. Even here, different species react differently (Table 19.7). A difference of three orders of magnitude is found between the two relatively closely related species hamster and guinea pig. Accordingly, it is difficult to draw conclusions about the toxicity in humans. If an extrapolation is made from primates to humans, TCDD would be classified as relatively non-toxic. In connection with humans, the definition of an acute LD50 is absolutely inappropriate.

    Table 19.6 Acute toxicity of lysergic acid diethylamide (LSD, 2.21, ▶ Sect. 2.5, ▶ Fig. 2.8) in different species and in humans (LD50 = dose that was lethal for 50% of the animals).

    Species     Toxicity; LD50 (in mg/kg)
    Mouse       50–60
    Rat         16.5
    Rabbit      0.3
    Elephant    0.06
    Human       0.003
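The per-kilogram doses quoted for LSD follow from simple arithmetic on the administered amounts. The sketch below, written in Python purely as an illustration, back-calculates the body masses implied by the figures in the text and compares the elephant dose with the mouse LD50 range from Table 19.6. The function names are assumptions, and the body masses are derived values, not data given in the source.

```python
import math


def implied_body_mass_kg(total_dose_mg, dose_mg_per_kg):
    """Body mass that makes a total dose equal to the stated dose per kilogram."""
    return total_dose_mg / dose_mg_per_kg


def sensitivity_ratio(reference_mg_per_kg, dose_mg_per_kg):
    """How many times lower a dose is than a reference dose, both in mg/kg."""
    return reference_mg_per_kg / dose_mg_per_kg


# Elephant: 0.3 g = 300 mg in total, stated as 0.06 mg/kg in the text.
elephant_mass = implied_body_mass_kg(300.0, 0.06)

# Self-experiment: 0.25 mg in total, stated as about 0.0035 mg/kg in the text.
hofmann_mass = implied_body_mass_kg(0.25, 0.0035)

# Mouse LD50 range (50-60 mg/kg, Table 19.6) relative to the lethal elephant dose.
mouse_vs_elephant = [sensitivity_ratio(ld50, 0.06) for ld50 in (50.0, 60.0)]

print(f"implied elephant mass: {elephant_mass:.0f} kg")
print(f"implied human mass:    {hofmann_mass:.0f} kg")
print(f"mouse LD50 / elephant dose: {mouse_vs_elephant[0]:.0f} to {mouse_vs_elephant[1]:.0f}")
```

Under these assumptions, the stated doses correspond to an animal of about 5,000 kg and a person of about 71 kg, and the lethal elephant dose lies roughly 830 to 1,000 times below the mouse LD50 range, so the factor of 1,000 quoted in the text corresponds to the upper end of that range.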
    To be able to exclude one fatality per one million people, an “LD0.00001” must be determined or calculated. Because of its pronounced mutagenic effects, the long-term damage stands in the foreground with TCDD. It is questionable in this case whether an absolute no-effect level, that is, the lowest ineffective dose, can be defined. The estimation of the potential danger of environmentally relevant chemicals looks entirely different if it is considered relative to toxic natural products, natural radioactivity, cosmic radiation, etc., or even when compared to socially tolerated substances of abuse such as alcohol and nicotine. This puts some things into perspective that are very contentiously discussed in public forums.

    Table 19.7 Acute toxicity of tetrachlorodibenzodioxin 19.8 in different animal species.

    [Chemical structure: 19.8 2,3,7,8-Tetrachlorodibenzodioxin]

    Species       Toxicity (LD50 in µg/kg)
    Mouse         114–280
    Rat           22–320
    Hamster       1,150–5,000
    Guinea pig    0.5–2.5
    Mink          4
    Rabbit        115–275
    Dog           100–300
    Monkey        70
    Human         ?

    A difficult problem must be mentioned when discussing structure–activity relationships from in vitro investigations that are used to estimate the mutagenic and carcinogenic potential. Such tests indeed afford valuable information that must be carefully checked. In individual cases, however, they constitute proof in neither the positive nor the negative sense.

    Developing theoretical models for the estimation of toxicity and carcinogenicity that have adequate reliability and predictive power has proven to be extremely difficult. The mechanisms that are responsible for the activity are too diverse and multifaceted, and the chemical structures and structure–activity relationships, which are only valid for one substance class, are too different.

    Today, testing for toxic, carcinogenic, and teratogenic adverse effects has reached a high standard. The pharmaceutical catastrophes of earlier decades such as the following would be almost impossible with today’s standards:

    • Early childhood brain damage and death of many premature and mature newborns by the sulfonamides in the late 1930s,
    • Over 100 fatalities in the USA because of the use of diethylene glycol as a solvent for sulfanilamide (this incident led to the foundation of the Food and Drug Administration, FDA),
    • The SMON (subacute myelo-optic neuropathy) illness of thousands of Japanese, caused by the prolonged and too-frequent use of an antidiarrheal medicine,
    • The severe birth defects of approximately 10,000 children worldwide that were caused by thalidomide (Contergan®) in the late 1950s.

    Nonetheless, criminal intrigue, the uncontrolled distribution of faked drugs by internet-based providers, or the unscrupulous pursuit of economic advantage can still cause such catastrophes today. The melamine-contaminated baby formula in China in September 2008 (melamine makes the protein content of inferior or diluted milk seem higher), from which many thousands of toddlers and babies were sickened and several even died, serves as an example.

    Moreover, in addition to the markedly stricter testing guidelines for medicines that exist today in most countries, there is a reporting system that registers and investigates adverse drug effect incidents. The slightest suspicion of a causal relationship results in anything from a public announcement or warning all the way to the withdrawal of the marketing license.

    A complication for the estimation of the toxicity is the formation of toxic, and particularly reactive, metabolites, even in small amounts. As was already discussed in ▶ Sects. 9.1 and 19.6, an ideal drug should contain predetermined cleavage and/or conjugation sites in addition to finely tuned pharmacodynamics and pharmacokinetics. The more these requirements are fulfilled, the lower the risk that the substance will exert toxic effects.

    Some toxicity studies suffer from the fact that the results extrapolated to humans suggest a higher toxicity than is actually the case because of the unphysiologically high doses that are used in the studies.
    On the other hand, even the most comprehensive investigation cannot eliminate the risk of serious adverse effects occurring in extraordinarily rare cases once the drug is used broadly. An adverse effect ratio of 1:10,000 or less can remain undiscovered in even the most careful preclinical and clinical trial.

    Toxic side effects in humans are seen particularly after chronic pharmaceutical misuse. The life-long consumption of large amounts of pain medication adds up to kilogram amounts. In the case of phenacetin (▶ Sect. 2.1), this led to the consequence that an effective and