A presentation on how deeply bioinformatics is involved in the medical field. It was presented at an undergraduate seminar at the University of Colombo in 2007.
2. The case
45-year-old Joseph has severe pain in his abdominal region.
Doctors are trying to diagnose the cause.
3. Diseases and Diagnosis
• Diagnosing and curing diseases has always been, and will continue to be, an art.
• Diagnosis and therapy involve many aspects: biochemical, pharmacological, physiological, etc.
4. Molecular Biology
• Recent developments in science have gone beyond the traditional aspects of diagnosis and cure and introduced a new one, the molecular aspect, which rests on the molecular biology behind medicine.
5. Central Dogma of Molecular Biology
17/01/2020 Bioinformatics and Drugs 6
[Diagram: central dogma, with the labels Transcription, Splicing, Translation, Post-translational modifications, Proteins (primary, secondary, tertiary and quaternary structure) and Points of Diagnosis]
6. Molecular Circuitry
• All cellular organisms can be viewed as complex networks of molecular interactions that fuel the processes of life.
• Taken together, these processes can be termed "Molecular Circuitry".
• The molecular circuitry has intended modes of
operation leading to healthy states of the
organism and aberrant modes of operation
leading to diseased states of the organism.
7. Molecular circuitry cont…
• Molecular circuitry is not only restricted to the central dogma.
• It involves all the processes within and between cells.
• For example
• This may be an overly simplified diagram, because thousands of molecules and processes are involved in the metabolic pathways that carry out metabolic functions.
[Diagram: Metabolism divides into Catabolism (breakdown of molecules) and Anabolism (synthesis of molecules)]
8. Molecular Circuitry
• In molecular medicine, the question in diagnosis is to find the molecular basis of the disease.
• Find out what has gone wrong in the molecular circuit.
• The abnormality of the circuit might be an internal problem or due to an external stimulus.
• The goal of therapy is to guide the biochemical circuitry back to its healthy state.
9. The case
• Doctors have diagnosed a tumour in the
abdominal region as the reason for Joseph’s
pain.
• The reason for the tumour growth has yet to be diagnosed.
• It might be a fault in the genome or due to
infections from retroviruses.
10. Bioinformatics in Medicine
• Traditionally, medical diagnosis relied on chemical tests for diseases.
• Diagnosis has taken a new approach since the introduction of Genomics and Bioinformatics.
[Diagram: Genomics + Bioinformatics → Molecular Perspective of Diseases]
11. Bioinformatics
• Bioinformatics is formally defined as the creation and advancement of algorithms, computational and statistical techniques, and theory to solve formal and practical problems arising from the management and analysis of biological data.
13. Bioinformatics
• Bioinformatics gained momentum only after the beginning of genomics.
• The enormous amount of data resulting from genomics gave bioinformatics the launching pad to reach the importance it enjoys today.
• The widespread use of computers, once they became cheap and powerful, is another reason for the advancement of bioinformatics.
14. The case
• Diagnosis of tumour-related diseases most often leads to the discovery of cancer; the next most common cause is pathogens.
• First, the doctors have to run tests to confirm or reject the suspicion that the tumour is cancerous and, if it is, to determine its stage.
15. Cancer basics
• In a broad sense, the onset of cancer is due to mutations in certain types of genes:
– Proto-oncogenes, e.g. ras
– Tumour suppressor genes, e.g. p53, APC
• The p53 gene is found in all cells and is important for cell cycle control. A large proportion of cancers involve mutations in the p53 gene.
16. Diagnosis of Cancer
• The diagnosis of cancer starts with testing for
tumor markers.
• Bioinformatics is involved in DNA sequencing
as well as in microarray analysis.
• Markers: DNA, RNA, protein
• Tests: hybridization, protein assays, DNA sequencing, microarray examination
17. DNA Sequencing
[Diagram: Sanger's chain termination method + Fluorescence labelling + Advancement of computers → Automated DNA sequencing]
18. Automated Sequencing
• The Automated sequencer follows a simple
concept.
• Reaction: fluorescently labelled Sanger's chain termination
• Separation: capillary electrophoresis
• Detection: the detector records the wavelength and intensity of the fluorescence
19. Automatic sequencing
• The fluorescence readings are output as
fluorescent peak trace chromatograms.
• Commonly known as sequence traces, they give the colour and intensity of each peak.
• The trace is also translated automatically into the corresponding nucleotide sequence.
20. Application of Bioinformatics
• Detecting and translating the peaks in the sequence traces needs special hardware and software built into the sequencers.
• Special software is needed to view the trace on a computer.
• Recent programs like "Fast Chromatogram Viewer" make reading the trace fast and automatically trim the low-quality ends.
• Data: wavelength, intensity
• Processing: base calling, assembly, finishing
• Information: sequence, quality values
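The step from data (intensities) to information (sequence and quality values) can be sketched with a toy base caller. This is only an illustration under stated assumptions, not the algorithm a real sequencer or Phred uses: each peak is assumed to be already reduced to four dye-channel intensities, and the name `call_bases` and the "purity" score are hypothetical.

```python
def call_bases(peaks):
    """Call one base per peak from four dye-channel intensities.

    `peaks` is a list of dicts mapping 'A', 'C', 'G', 'T' to an intensity.
    Returns (sequence, purities). Purity is the share of the total signal
    carried by the called base, a crude stand-in for a quality value.
    """
    sequence = []
    purities = []
    for peak in peaks:
        total = sum(peak.values())
        if total == 0:
            # No signal at all: the base is unknown.
            sequence.append("N")
            purities.append(0.0)
            continue
        base = max(peak, key=peak.get)  # strongest channel wins
        sequence.append(base)
        purities.append(peak[base] / total)
    return "".join(sequence), purities


if __name__ == "__main__":
    # Hypothetical intensities, for illustration only.
    trace = [
        {"A": 90, "C": 5, "G": 3, "T": 2},
        {"A": 1, "C": 2, "G": 95, "T": 2},
    ]
    print(call_bases(trace))
```

A clean peak gives a purity close to 1, while overlapping peaks give a lower value, which is the kind of position a viewer would flag or trim.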
21. Application of Bioinformatics
• The same genes are selected and sequenced in both the tumour and the normal tissue samples.
• The sequence obtained from the tumour tissue is compared with the sequence obtained from normal tissue.
• The mutations in the gene can be found by sequence alignment.
• Single nucleotide polymorphisms (SNPs) can also be identified from sequencing.
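Finding mutations by sequence alignment can be sketched as follows. This is a minimal example, assuming a plain Needleman-Wunsch global alignment with illustrative scores (match +1, mismatch -1, gap -2); the function names are hypothetical and the sequences in the usage lines are made up.

```python
def _substitution(x, y, match, mismatch):
    return match if x == y else mismatch


def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + _substitution(a[i - 1], b[j - 1], match, mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and score[i][j] ==
                score[i - 1][j - 1] + _substitution(a[i - 1], b[j - 1], match, mismatch)):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def find_differences(normal, tumour):
    """Return (position in normal, normal base, tumour base) per difference.

    Positions are 1-based; '-' marks a base missing from one sequence.
    """
    aligned_n, aligned_t = global_align(normal, tumour)
    differences = []
    pos = 0
    for n, t in zip(aligned_n, aligned_t):
        if n != "-":
            pos += 1
        if n != t:
            differences.append((pos, n, t))
    return differences


if __name__ == "__main__":
    print(find_differences("ACGTACGT", "ACGTTCGT"))
```

In practice an established alignment tool would be used; the sketch only shows why aligning first matters, since an insertion or deletion would otherwise shift every later position.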
22. SNP
• SNPs are becoming the practical marker of choice for many studies, e.g. association studies.
• There are a number of reasons for this preference.
– They are the most abundant polymorphism in the human genome
– They are co-dominant and diallelic
– Single nucleotide substitutions, a type of SNP, are the most frequent disease-causing mutations
– They are found in coding and non-coding regions
• In silico SNP discovery has become a necessity.
23. SNP Discovery
• Direct SNP discovery methods use sequence
alignment from multiple individuals to identify
high quality mismatches.
• These programs calculate a statistical
likelihood of whether a site is actually a
polymorphism or a sequencing error.
• The statistical likelihood is generated using
alignment depth, sequence context and read
quality at the sites.
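The idea of weighing a polymorphism against a sequencing error can be illustrated with a toy calculation. It is a sketch under stated assumptions, not the model used by any particular SNP discovery program: it uses only alignment depth and read quality (sequence context is left out), it assumes the standard Phred relation between quality and error probability, and it compares just two hypotheses, "all reads come from the reference base" and "reference and alternative alleles are equally frequent". The function names are hypothetical.

```python
import math


def phred_to_error(q):
    """Error probability for a Phred quality value, capped at 0.75.

    0.75 is the error rate of a random guess among four bases, so the cap
    also keeps the logarithms below finite for very low qualities.
    """
    return min(10 ** (-q / 10), 0.75)


def snp_log_odds(ref_base, alt_base, calls):
    """log10 of P(calls | polymorphism) / P(calls | sequencing error).

    `calls` holds one (base, phred_quality) pair per read covering the site.
    Positive values favour a real polymorphism, negative values an error.
    """
    log_snp = 0.0
    log_err = 0.0
    for base, q in calls:
        e = phred_to_error(q)
        p_correct = 1 - e   # read shows the true base
        p_wrong = e / 3     # read shows one specific wrong base
        if base == ref_base:
            p_err_model = p_correct
            p_snp_model = 0.5 * p_correct + 0.5 * p_wrong
        elif base == alt_base:
            p_err_model = p_wrong
            p_snp_model = 0.5 * p_wrong + 0.5 * p_correct
        else:
            # A third base is an error under both hypotheses.
            p_err_model = p_wrong
            p_snp_model = p_wrong
        log_snp += math.log10(p_snp_model)
        log_err += math.log10(p_err_model)
    return log_snp - log_err


if __name__ == "__main__":
    balanced = [("A", 30)] * 5 + [("G", 30)] * 5
    print(snp_log_odds("A", "G", balanced))
```

Many high-quality mismatches push the score up, while a single low-quality mismatch among many matching reads pushes it down, which is the behaviour the bullets above describe.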
24. The Case
• After the sequence analysis, the doctors have found a mutation in one of the genes.
• Generally, cancer onset begins only when several mutations accumulate.
• The doctors have to discover which genes are mutated and are the reason for this faulty mode of the molecular circuit.
• DNA microarray technology is one of the efficient ways to discover this.
25. DNA microarray/DNA chip
This method analyses the expression levels of hundreds or thousands of genes at once on a chip about the size of a small stamp.
26. DNA microarray/DNA chip
• Around 6000 genes are spotted onto a 2x2 cm
glass slide by a special machine.
• Spots can be cDNA from mRNA or partial gene
fragments.
• A fluorescence microscope scans the red and green intensities of each spot, and the values are stored in a computer under the position of the corresponding gene.
27. DNA chip analysis
• Scanning and visualizing the spots is not feasible without computers.
• A computer calculates the relative intensities of red and green fluorescence, which give the relative expression level of each gene.
• Capturing the fluorescence, storing the intensity values and calculating the relative intensities all require bioinformatics.
• Co-regulated genes can also be found using microarray analysis.
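The relative-intensity calculation can be sketched as a log2 ratio per spot. This is a minimal example with assumptions that are not in the slides: the red channel is taken to be the tumour sample and the green channel the normal sample, a two-fold change is used as the cut-off, and the gene names and intensities in the usage lines are invented.

```python
import math


def log2_ratios(spots, floor=1.0):
    """Map each gene to log2(red / green).

    `spots` maps a gene name to a (red, green) intensity pair. Intensities
    below `floor` are raised to it so that empty spots do not divide by zero.
    """
    return {
        gene: math.log2(max(red, floor) / max(green, floor))
        for gene, (red, green) in spots.items()
    }


def classify(ratios, threshold=1.0):
    """Label each gene 'up', 'down' or 'unchanged' (threshold 1.0 = two-fold)."""
    labels = {}
    for gene, ratio in ratios.items():
        if ratio >= threshold:
            labels[gene] = "up"
        elif ratio <= -threshold:
            labels[gene] = "down"
        else:
            labels[gene] = "unchanged"
    return labels


if __name__ == "__main__":
    # Hypothetical spots: gene -> (red, green).
    spots = {"geneA": (800, 200), "geneB": (100, 400), "geneC": (300, 300)}
    ratios = log2_ratios(spots)
    print(ratios)
    print(classify(ratios))
```

The logarithm makes over- and under-expression symmetric: a four-fold increase and a four-fold decrease become +2 and -2.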
28. The Case
• After the DNA microarray analysis, the doctors have found that only one gene is mutated.
• A single mutated gene is not enough for the onset of cancer; the doctors can only say that Joseph is susceptible to cancer and that the tumour is not malignant.
• The next step is to look for any pathogens that cause tumours or tumour-like symptoms.
29. Diagnosis of infections
• Traditional medicine has relied on the patient's history to diagnose pathogenic infections.
• Several microbiological laboratory methods, such as cell cultures and serological methods, have been used to identify the pathogens causing the disease.
• Even though the molecular methods are on the expensive side, they are accurate and less time-consuming for identification.
30. Molecular methods
• Blotting does not necessarily need bioinformatics, but the other two methods rely on it.
[Diagram: Molecular diagnosis branches into Blotting (Northern, Southern, Western), DNA sequence analysis and PCR based methods]
31. Genome sequencing
• Various pathogens are now identified using DNA sequence analysis and PCR based methods.
• Some recent pandemics and epidemics are re-emergences of the same pathogen.
• Genome sequencing is an effective tool to identify the strain of a pathogen.
32. Genome sequencing
• The genome of an organism comprises its entire complement of DNA (in complex organisms, sperm and egg cells carry only half of the set).
• The genome of an organism cannot be sequenced directly with an automatic sequencer because of its size.
• The sequencing machine can only sequence DNA fragments of up to about 1000 bp.
• Genomes are usually much larger than that
– E.g. E. coli about 4.6 Mb, human genome about 3 Gb
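The gap between read length and genome size can be made concrete with a short calculation. It is a back-of-the-envelope sketch, assuming the common relation reads = coverage × genome size / read length; the function name is hypothetical, and the eight-fold coverage in the usage line is only an illustrative choice.

```python
import math


def reads_needed(genome_size_bp, read_length_bp=1000, coverage=1.0):
    """Number of reads needed to cover a genome `coverage` times over."""
    return math.ceil(coverage * genome_size_bp / read_length_bp)


if __name__ == "__main__":
    human = 3_000_000_000  # about 3 Gb, as on the slide
    print(reads_needed(human))              # a single pass, no overlaps
    print(reads_needed(human, coverage=8))  # with redundancy for assembly
```

Even a single pass over a 3 Gb genome needs millions of 1000 bp reads, and assembly needs overlapping reads, so the real number is several times higher.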
33. Genome Construction
• Because the sequencer cannot read a whole genome, the genome sequence has to be constructed from the small sequence reads the sequencer produces.
• The sequenced fragments have to be assembled into longer fragments to obtain the whole genome sequence.
35. DNA Sequence assembly
• In both these methods the short fragment
sequences have to be assembled based on the
overlapping regions of the fragments.
• The problem is that some bases in a read (a sequence trace) are less accurate than others.
• To construct the sequence accurately, contig assembly programs take as input the sequence of each read as well as a quality value for each base in the read.
• Base calling programs like Phred prepare the reads for assembly by translating the digital signals into reads and per-base quality values.
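How quality values feed into assembly can be sketched with the standard Phred relation, Q = -10 · log10(P), where P is the probability that the base call is wrong. The trimming routine below is a simplified illustration, not Phred's own trimming algorithm; the cut-off of Q20 and the function names are assumptions.

```python
def error_probability(q):
    """Probability that a base with Phred quality `q` is miscalled."""
    return 10 ** (-q / 10)


def trim_read(sequence, qualities, min_q=20):
    """Clip low-quality bases from both ends of a read.

    Bases are removed from each end until the first base with quality
    at least `min_q` is reached. Returns (sequence, qualities).
    """
    start = 0
    end = len(sequence)
    while start < end and qualities[start] < min_q:
        start += 1
    while end > start and qualities[end - 1] < min_q:
        end -= 1
    return sequence[start:end], qualities[start:end]


if __name__ == "__main__":
    # Hypothetical read with poor ends, as is typical for sequence traces.
    seq = "ACGTACGT"
    quals = [5, 8, 30, 35, 40, 30, 10, 4]
    print(trim_read(seq, quals))
    print(error_probability(20))  # about 1 error in 100 calls
```

A quality of 20 corresponds to roughly one error in 100 calls and 30 to one in 1000, so clipping the ends removes the bases most likely to create false overlaps.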
36. Assembling sequences
Phase 1
• Pairs of similar reads are identified
• Poor end regions of each read are clipped
• Overlaps between reads are computed
• False overlaps are identified and removed
Phase 2
• Reads are joined to form contigs
• Constraints are used to make corrections to contigs
Phase 3
• A multiple sequence alignment of reads is constructed
• A consensus sequence, along with a quality value for each base, is computed for contigs
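The core of phases 1 and 2, computing overlaps and joining reads into contigs, can be sketched with a greedy merge. This is a deliberately simplified illustration: it assumes error-free reads from one strand, requires exact overlaps, and ignores the quality values, constraints and consensus step described above. The function names and the example reads are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap.

    Returns the list of contigs left when no overlap of at least
    `min_len` bases remains.
    """
    contigs = list(reads)
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    length = overlap(a, b, min_len)
                    if length > best_len:
                        best_len, best_i, best_j = length, i, j
        if best_len == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    reads = ["ATGGCGT", "GCGTACC", "TACCTTA"]
    print(greedy_assemble(reads))
```

Real assemblers tolerate mismatches in the overlaps and use the per-base quality values to decide which read to trust, which is why the base calling step matters.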
37. Applications of a Genome Sequence
• The genome sequence of a pathogen can be
used to find unique sequences that can be
identified using blotting or DNA microarray
analysis.
• The genome can be analysed to find genes encoding proteins that are unique to the organism and can be inhibited to kill the pathogen.
38. The case
• The doctors have finally found that the swelling is not actually a tumour but scar tissue.
• When a TB infection spreads to parts of the body other than the lungs, the body tries to form scar tissue by fibrosis to isolate the bacteria from the rest of the tissues.
• Since the scar tissue is not calcified, it is indistinguishable from a tumour.
• Doctors are currently treating Joseph with a course of antibiotics.
• Modern-day drugs are designed by in silico methods with the use of bioinformatics rather than found by trial and error.
39. Drug Designing
• Drug designing is the most important process in medicine in which bioinformatics is involved.
• The abnormal mode of the molecular circuitry produces either an undesirable protein or a necessary protein at an unwanted level.
• This protein has to be identified, and blocked if it is unnecessary or present at high levels.
40. Drug Targets
• Proteins become drug targets when they can
be modified by external stimuli.
• Drug targets can be identified by several
methods including protein assays, PCR based
methods, immunoassays, etc.
• Drug Targets can be
– Enzymes
– Receptors
– Ion channels
• Targets can be from the Host or the pathogen.
41. Target Structure
• Once the target has been identified, the 3D structure of the protein has to be constructed in order to obtain its conformation and identify its active site.
• The active site is where drugs bind, and it is the most important part of the target to be analysed by bioinformatics.
• The binding will mostly inhibit the protein, but sometimes it may enhance its activity.
42. Structure
• Protein structures can be determined by
laboratory methods like NMR and X-ray
crystallography.
• These methods are expensive and time
consuming.
• Computational structure prediction has
become an alternative to these methods.
• Structure is necessary to find ways to inhibit
or inactivate the proteins.
43. Structure prediction
• Ab initio: ab initio methods use only sequence data and the physics of molecular dynamics to predict the structure of the protein.
• Comparative modelling: comparative modelling uses a template protein with a similar amino acid sequence to predict the structure.
44. Importance of Structure
• The structure is the most important aspect of the protein, because even a slight misconformation can cause the protein to lose its function.
• Whether the structure is obtained by experimental or in silico methods, it has to be saved in a computer-readable format so that it can be visualized and drugs can be designed for it.
46. Bioinformatics applied
• Lead compound development is one of the key elements in drug designing.
• A lead compound is any substance that shows the biological activity needed.
• High Throughput Screening (HTS) is defined as the automatic testing of potential drug candidates from a library of compounds (large libraries, typically >200 000).
• HTS has not been very successful because of problems like false positives, non-specific binding, etc.
47. Virtual HTS
• Virtual HTS was introduced as an alternative and complementary approach to experimental HTS.
• Virtual HTS, or virtual screening, comprises a variety of computational tools to select potentially active and bio-available compounds from libraries (databases).
• These tools range from simple filter systems to
complex molecular docking techniques.
48. Virtual HTS
• Like experimental HTS, virtual HTS needs a library of compounds to screen.
• These are available as databases of various compounds
o e.g. World drug index, Available chemicals directory
• Combinatorial libraries can also be used for screening purposes.
• These are libraries that give fast synthetic access to a vast number of compounds built from a limited set of compounds (building blocks).
• Computers make it possible to generate, handle and record the members of combinatorial libraries.
49. Steps in Virtual Screening
• Fast filter criteria: molecular weight
• Ligand similarity based: topological descriptors
• Structure based: fast docking methods
50. Fast Filter Criteria
• Apply fast filter criteria to eliminate compounds with undesirable physicochemical properties.
• To become developed drugs, candidates have to meet five requirements.
o Absorption, Distribution, Metabolism, Excretion, Toxicity (ADMET)
o Most drugs fail during later stages because they don't satisfy these requirements.
• Criteria that filter candidates for acceptable ADMET parameters are applied in the early stages.
o E.g. molecular weight thresholds, total number of donor and acceptor groups.
• The filtering selects about 1000-5000 compounds out of hundreds of thousands of molecules.
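A fast filter of this kind can be sketched in a few lines. The thresholds below are an assumption borrowed from Lipinski's rule of five (molecular weight at most 500 Da, at most 5 hydrogen-bond donors, at most 10 acceptors), which the slides do not name; the `Compound` class, the function names and the library entries are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    mol_weight: float  # in daltons
    h_donors: int      # hydrogen-bond donor groups
    h_acceptors: int   # hydrogen-bond acceptor groups


def passes_fast_filter(compound, max_weight=500.0, max_donors=5, max_acceptors=10):
    """True if the compound stays within all three thresholds."""
    return (
        compound.mol_weight <= max_weight
        and compound.h_donors <= max_donors
        and compound.h_acceptors <= max_acceptors
    )


def fast_filter(library, **thresholds):
    """Keep only the compounds that pass the fast filter."""
    return [c for c in library if passes_fast_filter(c, **thresholds)]


if __name__ == "__main__":
    # Invented entries, for illustration only.
    library = [
        Compound("cmpd-1", 320.4, 2, 5),
        Compound("cmpd-2", 812.9, 4, 9),   # too heavy
        Compound("cmpd-3", 410.0, 7, 6),   # too many donors
    ]
    print([c.name for c in fast_filter(library)])
```

Because each check is a simple comparison, the filter can run over hundreds of thousands of compounds before any expensive similarity or docking calculation is attempted.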
52. Ligand-similarity based virtual screening
• Compounds that are biologically similar to a molecule known to bind the drug target are selected from the database.
• This method may be applied to proteins that are hard to crystallize, so that no structure data is available but data about the binding molecule is, e.g. membrane-bound receptors.
• It may also be applied as a pre-screening procedure before structure based screening, because 3D conformational analysis has high computational requirements.
• The similarity search can be done using 2D or 3D descriptors.
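A 2D similarity search can be sketched with the Tanimoto coefficient, a widely used measure for comparing molecular fingerprints. This is a minimal example, assuming each compound is already described by a set of feature identifiers; how those descriptors are computed is outside the sketch, and the function names, the cut-off of 0.7 and the fingerprints are illustrative assumptions.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two feature sets: |a ∩ b| / |a ∪ b|."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return len(a & b) / union


def similarity_screen(query, library, threshold=0.7):
    """Return (name, similarity) for library compounds similar to `query`.

    `library` maps compound names to feature sets. Results are sorted
    with the most similar compound first.
    """
    hits = []
    for name, features in library.items():
        score = tanimoto(query, features)
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    # Invented fingerprints: each integer stands for one structural feature.
    known_binder = {1, 2, 3, 4, 5}
    library = {
        "cmpd-1": {1, 2, 3, 4, 5, 6},
        "cmpd-2": {1, 2, 9, 10},
        "cmpd-3": {1, 2, 3, 4},
    }
    print(similarity_screen(known_binder, library))
```

Only the known binder is needed as input, which is why this approach works for targets without a solved structure.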
54. Structure based virtual screening
• The geometry of the binding/active site is
used to screen for molecules based on their fit
into the binding pocket.
• Structure based screening is done when a 3D
structure of a protein is available.
• The Docking algorithms try to fit the ligand
into the binding pocket and obtain a score for
the fit to be compared with the other ligands
during screening.
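Scoring a fit and comparing it across ligands can be illustrated with a toy geometric score. This is far simpler than a real docking algorithm, which also searches over ligand positions and conformations and models the chemistry of the interaction; here each ligand pose is fixed, atoms are plain 3D points, and the distance cut-offs, penalty and function names are assumptions made for the sketch.

```python
import math


def score_pose(ligand_atoms, pocket_atoms, contact=4.0, clash=2.0, clash_penalty=3.0):
    """Score one fixed ligand pose against a binding pocket.

    Each ligand atom earns +1 if its nearest pocket atom lies within the
    contact distance, and loses `clash_penalty` if it is closer than the
    clash distance. Coordinates are (x, y, z) tuples in ångströms.
    """
    score = 0.0
    for atom in ligand_atoms:
        nearest = min(math.dist(atom, p) for p in pocket_atoms)
        if nearest < clash:
            score -= clash_penalty
        elif nearest <= contact:
            score += 1.0
    return score


def rank_ligands(poses, pocket_atoms):
    """Return ligand names sorted from best to worst score."""
    scores = {name: score_pose(atoms, pocket_atoms) for name, atoms in poses.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Invented coordinates, for illustration only.
    pocket = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
    poses = {
        "good-fit": [(0.0, 3.0, 0.0), (4.0, 3.0, 0.0)],
        "clashing": [(0.0, 1.0, 0.0), (4.0, 3.0, 0.0)],
        "too-far": [(0.0, 10.0, 0.0)],
    }
    print(rank_ligands(poses, pocket))
```

The ranking, rather than the absolute score, is what the screening step uses: the best-scoring ligands move on to further analysis.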
55. Other application areas in drug designing
• Improvement of Lead compound
– QSAR methods
• Pharmacokinetic studies
• Metabolism simulation
• Manufacturing modelling
• Process modelling
• Toxicology prediction
• Product stability modelling
• Disease simulation
56. The Case
• After six months of continuous treatment for TB, Joseph is finally cured of the infection.
• This is thanks to modern-day diagnosis as well as drugs designed using bioinformatics.
• Timely diagnosis saved a lot of treatment time and stopped the progress of the disease.
• The potency of the drug was increased and its toxic side effects were decreased.
57. Future: A Dream Diagnosis
A patient walks into a hospital with an abnormality. The doctor asks him to give a blood sample. While the patient waits, the blood sample is analysed and the whole genome is sequenced within minutes. While the patient's history is recorded, the blood report is prepared and delivered to the doctor. The disease is diagnosed from the history and the report. Designer drugs that are effective for this particular patient are then designed according to the genome sequence and delivered.