Target Identification Using Systems Biology Approach

Target Identification
How? & Why?
Girinath G. Pillai, PhD
Nyro Research Foundation
Zastra Innovations
www.zastrain.com & www.nyroindia.org
pillai@nyroindia.org

What to expect?
➤ Target Identification
➤ Disease Interaction
➤ Network Mapping
➤ Target Prediction
➤ Protein Target Databases
11-11-2018 19:39 Girinath, Nyro Research Foundation 2018 Slide 2 of 58

A grand challenge for all of us is
how to best incorporate existing knowledge….
Martha S. Head, GSK, in 2009

Systems Biology Approach
➤ Computational modelling of molecular systems and integrative
interpretation of larger genomic datasets.
➤ Reliable methodology for target identification.
➤ Target genes are identified by understanding their relations
with associated interacting partners, diseases and
pathways.

Computational Approach - Mining
➤ Systems Biology Approach: genes, interacting partners, diseases
➤ Prediction of hub genes
➤ Active site prediction: literature review, CASTp, ScanProsite

Systems Biology Approach (contd…)
➤ Identification of Target Genes (TG) associated with the candidate
drug molecules.
➤ STITCH database - Integrates information about interactions from
metabolic pathways, crystal structures, binding experiments and
drug target relationships.
➤ Input – Drug candidate molecules (.smiles format)
➤ Available at - http://stitch.embl.de/

SMILES
➤ Simplified Molecular-Input Line-Entry System (SMILES)
is a specification in form of a line notation for describing the structure
of chemical species using short ASCII strings
➤ SMILES specification was initiated by David Weininger in the 1980s
➤ .smi is the file format for SMILES
➤ SMILES form depends on the choices:
➤ of the bonds chosen to break cycles,
➤ of the starting atom used for the depth-first traversal, and
➤ of the order in which branches are listed when encountered.
➤ There are different variants of SMILES
➤ Example: OC(=O)C1CCCN1 (proline)
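Because the string depends on where the traversal starts and how rings and branches are written, one molecule can have several valid SMILES strings. The Python sketch below counts heavy atoms to show that two different strings for proline describe the same composition. It assumes only organic-subset atoms and simple bracket atoms; the regular expression and the helper name heavy_atom_counts are illustrative and not part of any SMILES toolkit. Equal composition is necessary but not sufficient for identity, so a cheminformatics toolkit would still be needed for true canonicalisation.

```python
import re
from collections import Counter

# Illustrative pattern (an assumption, not a full SMILES grammar):
# bracket atoms such as [C@@H] or [nH], then the two-letter halogens,
# then single-letter organic-subset atoms (aromatic atoms are lowercase).
ATOM = re.compile(
    r"\[\d*([A-Z][a-z]?|[a-z]{1,2})[^\]]*\]|Cl|Br|[BCNOPSFI]|[bcnops]"
)


def heavy_atom_counts(smiles):
    """Count atoms by element symbol; implicit hydrogens are ignored."""
    counts = Counter()
    for match in ATOM.finditer(smiles):
        symbol = match.group(1) or match.group(0)
        counts[symbol.capitalize()] += 1
    return counts


# Two traversals of the same molecule (proline).
from_acid = heavy_atom_counts("OC(=O)C1CCCN1")    # starts at the acid oxygen
from_ring = heavy_atom_counts("C1CC(NC1)C(=O)O")  # starts inside the ring

print(from_acid == from_ring)
print(dict(heavy_atom_counts("C1=CC=C2C(=C1)C(=C3C(=O)C4=CC=CC=C4N3)C(=O)N2")))
```

The second print applies the same count to the indirubin string used later in the deck.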

STITCH Database
The STITCH database
currently covers 9'643'763
proteins from 2'031
organisms.
Escherichia coli K12 MG1655
STITCH is a database of known and predicted interactions
between chemicals and proteins. The interactions include
direct (physical) and indirect (functional) associations; they
stem from computational prediction, from knowledge
transfer between organisms, and from interactions
aggregated from other (primary) databases.

STITCH – Input Data

STITCH – Chemical Selection

STITCH - Network

STITCH - Enrichments

STITCH – Function Partners

Step 1
Compounds in Dataset (.smiles)
➤ Indirubin: C1=CC=C2C(=C1)C(=C3C(=O)C4=CC=CC=C4N3)C(=O)N2
➤ Stigmasterol: CCC(C=CC(C)C1CCC2C1(CCC3C2CC=C4C3(CCC(C4)O)C)C)C(C)C
➤ Sitosterol: CCC(CCC(C)C1CCC2C1(CCC3C2CC=C4C3(CCC(C4)O)C)C)C(C)C
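As a sketch of how the Step 1 dataset could be stored, the Python below writes the three compounds to a .smi file and reads them back. It assumes the common convention of one record per line, with the SMILES string first, then whitespace, then the compound name. The file name compounds.smi and both helper functions are illustrative.

```python
import os
import tempfile

COMPOUNDS = {
    "Indirubin": "C1=CC=C2C(=C1)C(=C3C(=O)C4=CC=CC=C4N3)C(=O)N2",
    "Stigmasterol": "CCC(C=CC(C)C1CCC2C1(CCC3C2CC=C4C3(CCC(C4)O)C)C)C(C)C",
    "Sitosterol": "CCC(CCC(C)C1CCC2C1(CCC3C2CC=C4C3(CCC(C4)O)C)C)C(C)C",
}


def write_smi(path, compounds):
    """Write one 'SMILES name' record per line (assumed layout)."""
    with open(path, "w", encoding="utf-8") as handle:
        for name, smiles in compounds.items():
            handle.write(f"{smiles} {name}\n")


def read_smi(path):
    """Read records back into a {name: SMILES} dictionary."""
    records = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            parts = line.split(None, 1)  # SMILES never contains whitespace
            if not parts:
                continue  # skip blank lines
            smiles = parts[0]
            name = parts[1].strip() if len(parts) > 1 else smiles
            records[name] = smiles
    return records


path = os.path.join(tempfile.mkdtemp(), "compounds.smi")
write_smi(path, COMPOUNDS)
loaded = read_smi(path)
print(sorted(loaded))
```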

STRING Database
➤ Identification of Interacting partners (IP) associated with Target Genes.
➤ STRING Database – Database of known and predicted protein-protein
interactions.
➤ Uses information derived from five sources: genomic context
predictions, high-throughput experiments, co-expression, automated text mining,
and previous knowledge from databases.
➤ Available at – www.string-db.org
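Interacting partners can also be requested programmatically instead of through the web form. The Python sketch below only builds the request URL and does not contact the server. It assumes the STRING REST pattern https://string-db.org/api/<format>/<method>, the interaction_partners method, and the identifiers, species and limit parameters; these should be checked against the current STRING API documentation. The helper name partners_url is illustrative. Species 9606 is the NCBI taxonomy ID for human, and STRING expects gene symbols such as IL6 rather than the IL-6 spelling used on the slides.

```python
from urllib.parse import urlencode

STRING_API = "https://string-db.org/api"  # assumed base URL


def partners_url(genes, species=9606, limit=10, fmt="tsv"):
    """Build an interaction_partners request URL for one or more genes."""
    params = urlencode({
        # Assumption: multiple identifiers are separated by a carriage
        # return, which urlencode writes as %0D.
        "identifiers": "\r".join(genes),
        "species": species,
        "limit": limit,
    })
    return f"{STRING_API}/{fmt}/interaction_partners?{params}"


url = partners_url(["IL6"])
print(url)
```

The returned text is tab-separated, so it can be pasted into the TG – IP table after checking the gene names.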

STRING
The STRING database currently
covers 9'643'763 proteins from
2'031 organisms.
STRING is a database of known and predicted protein-
protein interactions. The interactions include direct
(physical) and indirect (functional) associations; they
stem from computational prediction, from knowledge
transfer between organisms, and from interactions
aggregated from other (primary) databases.

STRING - Input

STRING - Selection

STRING - Network

STRING – Function Partners

STRING - Enrichments

STRING - Partners

STRING – Cooccurrence and Coexpression

Disease Association
➤ Identification of Diseases associated with Target Genes.
➤ DisGeNET Database – Comprehensive database integrating information on
human disease-associated genes and variants.
➤ Input – Target genes associated with biological function.
➤ Available at – www.disgenet.org/

DisGeNET
➤ 561,119 gene-disease associations (GDAs), between 17,074 genes
and 20,370 diseases, disorders, traits, and clinical or abnormal
human phenotypes
➤ 135,588 variant-disease associations (VDAs), between 83,002 SNPs
and 9,169 diseases and phenotypes
➤ The data in DisGeNET is organized according to type and level of
curation:
➤ CURATED: GDAs from UniProt, PsyGeNET, ClinVar, Orphanet, the GWAS
Catalog, CTD (human data), and Human Phenotype Ontology
➤ ANIMAL MODELS: GDAs from RGD, MGD, and CTD (mouse and rat data)
➤ ALL: GDAs from previous sources and from GAD, LHGDN and BeFree

DisGeNET – Input, Data, Results

Network Construction
➤ Cytoscape is an open source software platform for visualizing molecular
interaction networks and biological pathways and integrating these
networks with annotations, gene expression profiles and other state
data. Although Cytoscape was originally designed for biological research,
now it is a general platform for complex network analysis and
visualization.
➤ Network construction for:
➤ 1. Drug candidate molecules – Target genes
➤ 2. Target genes – Interacting partners
➤ 3. Target genes – Diseases

Data Input - 1
➤ Bioactive compounds – TG
➤ Manually create a curated table of chemical compounds and their
target genes

Compound                          Target gene
KAPPA CARRAGEENAN                 TNFα
KAPPA CARRAGEENAN                 IL-6
KAPPA CARRAGEENAN                 GAPDH
INDIRUBIN                         YBX1
INDIRUBIN                         EGFP
CHOLESTA-5,22-TRANS-DIEN-3β-OL    SREBP
CHOLESTA-5,22-TRANS-DIEN-3β-OL    LXR
STIGMASTEROL                      FXR
STIGMASTEROL                      BSEP
STIGMASTEROL                      SHP
6-BROMOINDIRUBIN-3-METHOXIME      GSK3A
6-BROMOINDIRUBIN-3-METHOXIME      GSK3B
6-BROMOINDIRUBIN-3-METHOXIME      ATAD5
6-BROMOINDIRUBIN-3-METHOXIME      KPNB1
γ SITOSTEROL                      BSS
γ SITOSTEROL                      BSG
Data Input 2
➤ TG – IP
➤ Manually create a curated table covering the chemical compounds'
target genes and their Interacting Partners
➤ The chemical compound names and
target gene names should match those in the
Input 1 table.

Target gene    Interacting partner
NPC2           NPC1
IL-6           GPD2
IL-6           MGLL
IL-6           AQP9
IL-6           AGPAT76
IL-6           AQP7
IL-6           AQP3
IL-6           AGPA79
GK             GPAM
GK             AKR1B1
GK             AKR1A1
ETFA           ETFB
ETFA           ETFDH
ETFA           ACADM
ETFA           HADHA
ETFA           CYCS
ELOVL5         FADS2
ELOVL5         HSD17B12
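The requirement that names match across tables can be checked before the tables are imported. The Python sketch below assumes each table is a list of (source, target) pairs. The normalisation rule (trim whitespace, upper-case) and the function unmatched_sources are illustrative, and the second table contains one deliberately mistyped row to show what gets reported.

```python
def normalise(name):
    """Illustrative rule: ignore surrounding whitespace and letter case."""
    return name.strip().upper()


def unmatched_sources(tg_ip, compound_tg):
    """Return target genes in the TG – IP table that are absent from Input 1."""
    known = {normalise(gene) for _, gene in compound_tg}
    return sorted({src for src, _ in tg_ip if normalise(src) not in known})


compound_tg = [("KAPPA CARRAGEENAN", "IL-6"), ("STIGMASTEROL", "SHP")]
tg_ip = [
    ("IL-6", "AQP9"),
    ("il-6 ", "AQP7"),   # differs only in case and spacing, so it matches
    ("1L-6", "AQP3"),    # deliberate typo: digit 1 instead of letter I
]

print(unmatched_sources(tg_ip, compound_tg))  # prints ['1L-6']
```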

Data Input 3
➤ TG – Disease
➤ Manually create a curated table covering the
chemical compounds' target genes,
their Interacting Partners and
disease associations
➤ The chemical compound names and
target gene names should match
those in the Input 1 and 2 tables.

Target gene    Disease
NPC2           Mental Retardation
NPC2           Carcinogenesis
NPC2           Dull intelligence
NPC2           Intellectual Disability
NPC2           Low intelligence
NPC2           Poor school performance
NPC2           Mental deficiency
NPC2           Seizures
NPC2           Dysfunction disorders
NPC2           Cardiovascular disease
IL-6           Cardiovascular disease
GK             Liver carcinoma
GK             Lymphoma
GK             Leukemia
GK             Malignant neoplasm of prostate
GK             Prostate carcinoma
GK             Melanoma
SHP            DiGeorge Syndrome
SHP            Schizophrenia

Cytoscape - Final Network
➤ Merge tables TG and TG_IP
➤ Merge tables TG, IP and DC
➤ The network emphasizes interactions between bioactive
compounds from Eclipta alba, their TG, IP and diseases.
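The merge itself happens inside Cytoscape, but the idea can be sketched in Python: the three edge tables are combined into one undirected network in which every node also carries a type. The function merge_tables, the type labels and the rule that a node keeps the first type it is given are all illustrative assumptions.

```python
from collections import defaultdict


def merge_tables(compound_tg, tg_ip, tg_disease):
    """Combine three (source, target) tables into node types and adjacency."""
    node_type = {}
    adjacency = defaultdict(set)
    layers = [
        (compound_tg, "compound", "target gene"),
        (tg_ip, "target gene", "interacting partner"),
        (tg_disease, "target gene", "disease"),
    ]
    for table, source_type, target_type in layers:
        for source, target in table:
            # setdefault keeps the first type assigned to a node.
            node_type.setdefault(source, source_type)
            node_type.setdefault(target, target_type)
            adjacency[source].add(target)
            adjacency[target].add(source)
    return node_type, adjacency


node_type, adjacency = merge_tables(
    [("KAPPA CARRAGEENAN", "IL-6")],
    [("IL-6", "AQP9"), ("IL-6", "AQP7")],
    [("IL-6", "Cardiovascular disease")],
)
print(node_type["IL-6"], sorted(adjacency["IL-6"]))
```

Because IL-6 appears in all three tables, it ends up connected to the compound, both partners and the disease, which is the property the later hub analysis relies on.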

Network Topology Parameters
➤ Top target genes ranked by Betweenness, Closeness and Degree

Measure        Rank    Name    Score
Betweenness    1       IL-6    3740
Betweenness    2       GK      2517
Closeness      1       IL-6    33.567
Closeness      2       IL-8    29.8
Degree         1       IL-6    11
Degree         2       IL-8    10
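To make the three measures concrete, the Python sketch below computes them on a toy star-shaped graph. It uses textbook definitions: degree as the number of neighbours, closeness as reachable nodes divided by the sum of shortest-path lengths, and unnormalised betweenness via the Brandes algorithm. The tool that produced the scores on the slide may define or scale these measures differently, so the numbers are not comparable. All function names are illustrative.

```python
from collections import deque


def degree(adj):
    return {node: len(neighbours) for node, neighbours in adj.items()}


def closeness(adj):
    """(number of reachable nodes) / (sum of shortest-path lengths)."""
    scores = {}
    for start in adj:
        dist = {start: 0}
        queue = deque([start])
        while queue:  # breadth-first search from start
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        total = sum(dist.values())
        scores[start] = (len(dist) - 1) / total if total else 0.0
    return scores


def betweenness(adj):
    """Unnormalised betweenness for an undirected, unweighted graph."""
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack = []
        preds = {n: [] for n in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies, farthest nodes first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {n: value / 2 for n, value in score.items()}


# Toy graph: IL-6 linked to three partners (every node must be a key).
adj = {
    "IL-6": {"AQP3", "AQP7", "AQP9"},
    "AQP3": {"IL-6"},
    "AQP7": {"IL-6"},
    "AQP9": {"IL-6"},
}
print(degree(adj)["IL-6"], closeness(adj)["IL-6"], betweenness(adj)["IL-6"])
```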

Prediction of Target Genes
➤ Network topology parameters such as degree, betweenness and closeness
centrality help identify closely associated target proteins.
➤ Genes that rank at the top for all network topology parameters are
considered hub genes (target genes).
➤ IL-6 is found to be the hub gene for compounds derived from Eclipta
alba, and it is further associated with cardiovascular diseases.
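The hub-gene rule amounts to intersecting the top-ranked genes from each measure. The Python sketch below uses the top two names from each ranking reported in the deck; the function hub_genes and the cut-off top_n are illustrative.

```python
def hub_genes(rankings, top_n):
    """Return genes that appear in the top_n of every ranking."""
    top_sets = [set(names[:top_n]) for names in rankings.values()]
    return set.intersection(*top_sets)


# Top-ranked genes per measure, as listed on the slides.
rankings = {
    "betweenness": ["IL-6", "GK"],
    "closeness": ["IL-6", "IL-8"],
    "degree": ["IL-6", "IL-8"],
}

print(hub_genes(rankings, top_n=2))  # prints {'IL-6'}
```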

IL-6 – Cardiovascular Diseases

Target Prediction
➤ Target prediction from literature
survey.
➤ Binding site identification for co-crystallized
ligands.
➤ Homology modelling and ab initio
modelling
➤ Prediction of bioavailability by
BLASTP analysis.
[Flowchart: Target prediction. Decision: PDB structure available? Nodes: Bioassay & Mutation; binding site (B.S) from CASTp; B.S from ScanProsite; B.S from literature; In silico Modelling; Bioavailability prediction; Similarity ensemble search, HitPick server. Branches are labelled Yes / No.]

Active Site, Binding Site Predictions
➤ Site prediction
➤ 1. Literature reviews
➤ 2. CASTp
➤ 3. ScanProsite

Predictions - Outcome
➤ Predicted active site (Literature review) – Leu101, Gln102, Asn103, Arg104,
Glu109, Gln111, Ala114, Ser118, Lys120, Leu122, Phe125, Leu126.
➤ Predicted active site (CASTp) – Leu101, Gln102, Asn103, Arg104, Glu109,
Gln111, Ala114, Ser118, Lys120, Leu122, Phe125, Leu126.
➤ Predicted functional site (ScanProsite) – Amino acids spanning
residues 101–126.
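The agreement between the three predictions can be checked with simple set operations. The Python sketch below parses the residue lists reported above, takes the residues common to the literature and CASTp predictions, and tests whether they fall inside the 101–126 span from ScanProsite. The helper residue_numbers and its three-letter-code pattern are illustrative.

```python
import re


def residue_numbers(text):
    """Extract residue positions from labels such as 'Leu101'."""
    return {int(num) for num in re.findall(r"[A-Z][a-z]{2}(\d+)", text)}


literature = ("Leu101, Gln102, Asn103, Arg104, Glu109, Gln111, "
              "Ala114, Ser118, Lys120, Leu122, Phe125, Leu126")
castp = ("Leu101, Gln102, Asn103, Arg104, Glu109, Gln111, "
         "Ala114, Ser118, Lys120, Leu122, Phe125, Leu126")
scanprosite_span = range(101, 127)  # residues 101 to 126 inclusive

consensus = residue_numbers(literature) & residue_numbers(castp)
print(sorted(consensus))
print(all(pos in scanprosite_span for pos in consensus))
```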

RCSB

Protein Model Portal

PDB Sum

PDB Submissions

CATH

ModBase

Catalytic Site Atlas

Nyro Research Foundation – www.nyroindia.org
➤ Internships and Project training with real-time projects
➤ Programming languages, environments and high-
performance computing systems
➤ Third-party software tools
➤ Bootcamps, summer camps, annual conference with
tutorials
➤ Hosts monthly web meetings
➤ Shares information resources, including a blog and other
learning materials
➤ Collaborative research projects
➤ Computing Consultants
➤ 2 Interns per year

Zastra Innovations
➤ Scientific software providers/training
➤ Computational Biology
➤ Computational Chemistry
➤ Materials Science
➤ Nanotechnology
➤ BioStatistics
➤ Dosage Tolerance/Curve Fitting
➤ Medicinal Chemistry / Cheminformatics
➤ Collaborative research projects
➤ Computing Consultants
➤ 2 Interns per year

Agree / Disagree?
➤ Participants/Delegates
➤ My mentors, colleagues and students.
➤ Organizers of the seminar
➤ Email : pillai@nyroindia.org
➤ Phone: 94483 67493
➤ Web : www.nyroindia.org
➤ Inviting programmers who can
code QSAR models and web portals
