Target Identification with relationship between Genes, Interacting Partners and Disease Associations. Protein Target Prediction and Binding Site predictions
Target Identification Using Systems Biology Approach
1. Target Identification
How? & Why?
Girinath G. Pillai, PhD
Nyro Research Foundation
Zastra Innovations
www.zastrain.com & www.nyroindia.org
pillai@nyroindia.org
2. What to expect?
➤ Target Identification
➤ Disease Interaction
➤ Network Mapping
➤ Target Prediction
➤ Protein Target Databases
11-11-2018 19:39 Girinath, Nyro Research Foundation 2018 Slide 2 of 58
3. A grand challenge for all of us is
how to best incorporate existing knowledge….
Martha S. Head, GSK, in 2009
4. Systems Biology Approach
➤ Computational modelling of molecular systems and integrative
interpretation of larger genomic datasets.
➤ Reliable methodology for target identification.
➤ Target genes are identified by understanding their relationships
with associated interacting partners, diseases and
pathways.
5. Computational Approach - Mining
[Workflow diagram: Genes, interacting partners, diseases → Systems biology approach → Prediction of hub genes → Active site prediction (literature review, CASTp, ScanProsite)]
6. Systems Biology Approach (contd…)
➤ Identification of Target Genes (TG) associated with the candidate
drug molecules.
➤ STITCH database - Integrates information about interactions from
metabolic pathways, crystal structures, binding experiments and
drug target relationships.
➤ Input – Drug candidate molecules (.smiles format)
➤ Available at - http://stitch.embl.de/
7. SMILES
➤ Simplified Molecular-Input Line-Entry System (SMILES)
is a specification in the form of a line notation for describing the structure
of chemical species using short ASCII strings
➤ SMILES specification was initiated by David Weininger in the 1980s
➤ .smi is the file format for SMILES
➤ SMILES form depends on the choices:
➤ of the bonds chosen to break cycles,
➤ of the starting atom used for the depth-first traversal, and
➤ of the order in which branches are listed when encountered.
➤ There are different variants of SMILES
Example SMILES: OC(=O)C1CCCN1
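Because one molecule can be written as several valid SMILES strings, a quick sanity check is to compare heavy-atom composition. The sketch below is a deliberately limited illustration: it handles only unbracketed organic-subset atoms, the helper name `heavy_atom_counts` is hypothetical, and the second SMILES string is an alternative form written for this example. Matching counts do not prove two strings describe the same molecule; a cheminformatics toolkit would be needed for real canonicalization.

```python
import re
from collections import Counter

# Organic-subset atoms only; bracket atoms such as [Na+] are NOT handled.
ATOM_PATTERN = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")


def heavy_atom_counts(smiles: str) -> Counter:
    """Count heavy atoms in a simple SMILES string (hypothetical helper)."""
    if "[" in smiles:
        raise ValueError("bracket atoms are outside the scope of this sketch")
    # Aromatic atoms are lowercase in SMILES; capitalize to merge with aliphatic ones.
    return Counter(symbol.capitalize() for symbol in ATOM_PATTERN.findall(smiles))


slide_form = "OC(=O)C1CCCN1"   # SMILES shown on the slide
other_form = "C1CCNC1C(=O)O"   # same molecule, different starting atom (assumed example)

print(heavy_atom_counts(slide_form))
print(heavy_atom_counts(slide_form) == heavy_atom_counts(other_form))
```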
8. STITCH Database
The STITCH database currently covers 9,643,763 proteins from 2,031 organisms.
STITCH is a database of known and predicted interactions
between chemicals and proteins. The interactions include
direct (physical) and indirect (functional) associations; they
stem from computational prediction, from knowledge
transfer between organisms, and from interactions
aggregated from other (primary) databases.
9. STITCH – Input Data
10. STITCH – Chemical Selection
13. STITCH – Function Partners
14. Step 1
Compounds in Dataset (.smiles)
➤ Indirubin
➤ C1=CC=C2C(=C1)C(=C3C(=O)C4=CC=CC=C4N3)C(=O)N2
➤ Stigmasterol
➤ CCC(C=CC(C)C1CCC2C1(CCC3C2CC=C4C3(CCC(C4)O)C)C)C(C)C
➤ Sitosterol
➤ CCC(CCC(C)C1CCC2C1(CCC3C2CC=C4C3(CCC(C4)O)C)C)C(C)C
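The compound list can be turned into a SMILES file for upload. This is a minimal sketch assuming the common convention of one record per line, with the SMILES string followed by whitespace and a name; the function name `to_smi_text` and the file name `dataset.smi` are illustrative choices, not part of the original workflow.

```python
# Names and SMILES strings copied from the slide.
compounds = {
    "Indirubin": "C1=CC=C2C(=C1)C(=C3C(=O)C4=CC=CC=C4N3)C(=O)N2",
    "Stigmasterol": "CCC(C=CC(C)C1CCC2C1(CCC3C2CC=C4C3(CCC(C4)O)C)C)C(C)C",
    "Sitosterol": "CCC(CCC(C)C1CCC2C1(CCC3C2CC=C4C3(CCC(C4)O)C)C)C(C)C",
}


def to_smi_text(named_smiles: dict) -> str:
    """Format {name: SMILES} as 'SMILES<TAB>name' lines (assumed .smi layout)."""
    lines = [f"{smiles}\t{name}" for name, smiles in named_smiles.items()]
    return "\n".join(lines) + "\n"


smi_text = to_smi_text(compounds)
print(smi_text, end="")
# To save the file (hypothetical name):
# pathlib.Path("dataset.smi").write_text(smi_text, encoding="utf-8")
```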
15. STRING Database
➤ Identification of Interacting partners (IP) associated with Target Genes.
➤ STRING Database – Database of known and predicted protein-protein
interactions.
➤ Uses information derived from five sources: genomic context
predictions, high-throughput experiments, co-expression, automated text mining,
and previous knowledge from databases.
➤ Available at – www.string-db.org
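Interacting partners can also be retrieved programmatically rather than through the web page. The sketch below only builds a request URL and does not send it. It assumes STRING's public REST layout (`/api/<format>/interaction_partners` with `identifiers`, `species` and `limit` parameters); check the current STRING API documentation before relying on it. The function name and the gene symbols are illustrative.

```python
from urllib.parse import urlencode

STRING_API = "https://string-db.org/api"  # assumed base URL of the public REST API


def interaction_partners_url(genes, species=9606, limit=10, output_format="tsv"):
    """Build (but do not send) a STRING 'interaction_partners' request URL."""
    params = {
        # Multiple identifiers are joined with a carriage return (encoded as %0D).
        "identifiers": "\r".join(genes),
        "species": species,  # NCBI taxonomy ID; 9606 is Homo sapiens
        "limit": limit,      # maximum partners returned per query gene
    }
    return f"{STRING_API}/{output_format}/interaction_partners?{urlencode(params)}"


url = interaction_partners_url(["IL6", "GK"])
print(url)
```

The response could then be fetched with `urllib.request.urlopen(url)` and parsed as tab-separated text to fill the TG – IP table.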
16. STRING
The STRING database currently covers 9,643,763 proteins from 2,031 organisms.
STRING is a database of known and predicted protein-protein
interactions. The interactions include direct
(physical) and indirect (functional) associations; they
stem from computational prediction, from knowledge
transfer between organisms, and from interactions
aggregated from other (primary) databases.
23. STRING – Cooccurrence and Coexpression
24. Disease Association
➤ Identification of Diseases associated with Target Genes.
➤ DisGeNET Database – Comprehensive database integrating information on
human disease-associated genes and variants.
➤ Input – Target genes associated with biological function.
➤ Available at – www.disgenet.org/
25. DisGeNET
➤ 561,119 gene-disease associations (GDAs), between 17,074 genes
and 20,370 diseases, disorders, traits, and clinical or abnormal
human phenotypes
➤ 135,588 variant-disease associations (VDAs), between 83,002 SNPs
and 9,169 diseases and phenotypes
➤ The data in DisGeNET is organized according to type and level of
curation:
➤ CURATED: GDAs from UniProt, PsyGeNET, ClinVar, Orphanet, the GWAS
Catalog, CTD (human data), and Human Phenotype Ontology
➤ ANIMAL MODELS: GDAs from RGD, MGD, and CTD (mouse and rat data)
➤ ALL: GDAs from previous sources and from GAD, LHGDN and BeFree
26. DisGeNET – Input, Data, Results
27. Network Construction
➤ Cytoscape is an open source software platform for visualizing molecular
interaction networks and biological pathways and integrating these
networks with annotations, gene expression profiles and other state
data. Although Cytoscape was originally designed for biological research,
now it is a general platform for complex network analysis and
visualization.
➤ Network construction for
➤ 1. Drug candidate molecules – Target genes
➤ 2. Target genes – Interacting partners
➤ 3. Target genes – Diseases
28. Data Input - 1
➤ Bioactive compounds – TG
➤ Manually create a curated table
of chemical compounds and their
target genes
Compound                          Target gene
KAPPA CARRAGEENAN                 TNFα
KAPPA CARRAGEENAN                 IL-6
KAPPA CARRAGEENAN                 GAPDH
INDIRUBIN                         YBX1
INDIRUBIN                         EGFP
CHOLESTA-5,22-TRANS-DIEN-3β-OL    SREBP
CHOLESTA-5,22-TRANS-DIEN-3β-OL    LXR
STIGMASTEROL                      FXR
STIGMASTEROL                      BSEP
STIGMASTEROL                      SHP
6-BROMOINDIRUBIN-3-METHOXIME      GSK3A
6-BROMOINDIRUBIN-3-METHOXIME      GSK3B
6-BROMOINDIRUBIN-3-METHOXIME      ATAD5
6-BROMOINDIRUBIN-3-METHOXIME      KPNB1
γ SITOSTEROL                      BSS
γ SITOSTEROL                      BSG
29. Data Input 2
➤ TG – IP
➤ Manually create a curated table listing
the chemical compounds, target genes and
interacting partners
➤ The chemical compound names and
target gene names must match those in
the Input 1 table.
Target gene    Interacting partner
NPC2           NPC1
IL-6           GPD2
IL-6           MGLL
IL-6           AQP9
IL-6           AGPAT76
IL-6           AQP7
IL-6           AQP3
IL-6           AGPA79
GK             GPAM
GK             AKR1B1
GK             AKR1A1
ETFA           ETFB
ETFA           ETFDH
ETFA           ACADM
ETFA           HADHA
ETFA           CYCS
ELOVL5         FADS2
ELOVL5         HSD17B12
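Since the tables are typed by hand, the requirement that names match across tables is easy to violate. The sketch below flags target genes in a TG – IP table that do not appear in the compound – TG table. The function name is hypothetical and the rows are a small illustrative subset, with one deliberate typing slip ("1L-6") added to show what the check catches.

```python
def unmatched_genes(input1_rows, input2_rows):
    """Return genes in the TG-IP table that are missing from the compound-TG table."""
    known_targets = {gene for _compound, gene in input1_rows}
    listed_targets = {gene for gene, _partner in input2_rows}
    return sorted(listed_targets - known_targets)


# (compound, target gene) rows, as in Data Input 1
input1 = [("KAPPA CARRAGEENAN", "IL-6"), ("KAPPA CARRAGEENAN", "GAPDH")]
# (target gene, interacting partner) rows, as in Data Input 2
input2 = [("IL-6", "AQP9"), ("1L-6", "AQP3")]  # second row mimics a typing slip

print(unmatched_genes(input1, input2))  # ['1L-6']
```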
30. Data Input 3
➤ TG – Disease
➤ Manually create a curated table listing
the chemical compounds, target genes,
interacting partners and
disease associations
➤ The chemical compound names and
target gene names must match those
in the Input 1 and Input 2 tables.
Target gene    Disease
NPC2           Mental Retardation
NPC2           Carcinogenesis
NPC2           Dull intelligence
NPC2           Intellectual Disability
NPC2           Low intelligence
NPC2           Poor school performance
NPC2           Mental deficiency
NPC2           Seizures
NPC2           Dysfunction disorders
NPC2           Cardiovascular disease
IL-6           Cardiovascular disease
GK             Liver carcinoma
GK             Lymphoma
GK             Leukemia
GK             Malignant neoplasm of prostate
GK             Prostate carcinoma
GK             Melanoma
SHP            DiGeorge Syndrome
SHP            Schizophrenia
31. Cytoscape - Final Network
➤ Merge the TG and TG_IP tables
➤ Merge the TG, IP and DC tables
➤ The network emphasizes interactions between bioactive
compounds from Eclipta alba, their TG, IP and diseases.
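One way to merge the tables for import is to write them as a single Cytoscape SIF (simple interaction format) file, where each line holds a source node, an interaction type and a target node separated by tabs. The sketch below uses a few rows from the input tables; the interaction labels and the function name `to_sif` are assumptions made for illustration.

```python
def to_sif(edges_by_type):
    """Merge several edge tables into tab-delimited SIF lines."""
    lines = []
    for interaction, edges in edges_by_type.items():
        for source, target in edges:
            lines.append(f"{source}\t{interaction}\t{target}")
    return "\n".join(lines)


# A few rows per table; the interaction labels are hypothetical.
tables = {
    "compound-target": [("INDIRUBIN", "YBX1"), ("STIGMASTEROL", "SHP")],
    "target-partner": [("IL-6", "AQP9"), ("GK", "GPAM")],
    "target-disease": [("IL-6", "Cardiovascular disease"), ("SHP", "Schizophrenia")],
}

sif_text = to_sif(tables)
print(sif_text)
```

Tabs are used as the delimiter because some node names, such as disease names, contain spaces.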
32. Network Topology Parameters
➤ Top target genes ranked based on Betweenness
➤ Top target genes ranked based on Closeness
➤ Top target genes ranked based on Degree
Rank  Name  Score
1     IL-6  3740
2     GK    2517

Rank  Name  Score
1     IL-6  33.567
2     IL-8  29.8

Rank  Name  Score
1     IL-6  11
2     IL-8  10
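For readers who want to see what these parameters measure, the following is a minimal standard-library sketch of degree, closeness and betweenness on an undirected, unweighted graph. It is not the implementation used to produce the scores on the slide, and Cytoscape plugins may define or scale the measures differently, so the numbers will not be comparable. The edges are a small subset of the TG – IP table.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def degree(graph):
    return {node: len(neighbours) for node, neighbours in graph.items()}


def closeness(graph):
    """Reachable nodes divided by the sum of shortest-path distances."""
    scores = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            v = queue.popleft()
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores


def betweenness(graph):
    """Unnormalized shortest-path betweenness (Brandes' algorithm)."""
    scores = dict.fromkeys(graph, 0.0)
    for s in graph:
        stack, preds = [], defaultdict(list)
        sigma = dict.fromkeys(graph, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                scores[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {node: score / 2 for node, score in scores.items()}


edges = [("IL-6", "AQP9"), ("IL-6", "AQP7"), ("IL-6", "AQP3"),
         ("GK", "GPAM"), ("GK", "AKR1B1")]
graph = build_graph(edges)
for name, scores in [("degree", degree(graph)),
                     ("closeness", closeness(graph)),
                     ("betweenness", betweenness(graph))]:
    top = sorted(scores, key=scores.get, reverse=True)[:2]
    print(name, [(gene, round(scores[gene], 3)) for gene in top])
```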
33. Prediction of Target Genes
➤ Network topology parameters such as degree, betweenness and closeness
centrality help identify closely associated target proteins.
➤ Genes that rank highly on all network topology parameters are considered
hub genes (target genes).
➤ IL-6 is found to be the hub gene for compounds derived from Eclipta
alba, and it is further associated with cardiovascular diseases.
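The hub-gene rule amounts to intersecting the top-ranked genes from each ranking. The sketch below assumes the three score tables on the Network Topology Parameters slide follow the order of its bullets (betweenness, closeness, degree); that pairing, the `top_n` cut-off and the function name are assumptions.

```python
def hub_genes(rankings, top_n=2):
    """Return the genes found in the top `top_n` of every ranking."""
    top_sets = [set(names[:top_n]) for names in rankings.values()]
    return set.intersection(*top_sets)


# Gene names in rank order, copied from the three score tables.
rankings = {
    "betweenness": ["IL-6", "GK"],
    "closeness": ["IL-6", "IL-8"],
    "degree": ["IL-6", "IL-8"],
}

print(hub_genes(rankings))  # {'IL-6'}
```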
35. Target Prediction
➤ Target prediction from literature
survey.
➤ Binding site identification for
co-crystallized ligands.
➤ Homology modelling and ab initio
modelling
➤ Prediction of bioavailability by
BLASTP analysis.
[Flowchart: Target prediction. Decision: PDB structure available? (Yes/No). Nodes: Bioassay & Mutation; binding site (B.S) from CASTp; in silico modelling; bioavailability prediction; similarity ensemble search, HitPick server; B.S from ScanProsite; B.S from literature.]
36. Active Site, Binding Site Predictions
➤ Site prediction
➤ 1. Literature reviews
➤ 2. CASTp
➤ 3. ScanProsite
37. Predictions - Outcome
➤ Predicted active site (Literature review) – Leu101, Gln102, Asn103, Arg104,
Glu109, Gln111, Ala114, Ser118, Lys120, Phe125, Leu122, Leu126.
➤ Predicted active site (CASTp) – Leu101, Gln102, Asn103, Arg104, Glu109,
Gln111, Ala114, Ser118, Lys120, Phe125, Leu122, Leu126.
➤ Predicted functional site (ScanProsite) – Amino acids spanning
residues 101-126.
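Agreement between the three predictions can be checked mechanically. The sketch below parses the residue lists above, intersects the literature and CASTp sets, and tests whether the shared residues fall inside the ScanProsite range. The helper name `parse_residues` is hypothetical and the parser assumes three-letter residue codes followed by a position number.

```python
import re


def parse_residues(text):
    """Parse 'Leu101, Gln102' style text into a set of (name, position) pairs."""
    return {(name, int(pos)) for name, pos in re.findall(r"([A-Z][a-z]{2})(\d+)", text)}


literature = parse_residues(
    "Leu101, Gln102, Asn103, Arg104, Glu109, Gln111, "
    "Ala114, Ser118, Lys120, Phe125, Leu122, Leu126"
)
castp = parse_residues(
    "Leu101, Gln102, Asn103, Arg104, Glu109, Gln111, "
    "Ala114, Ser118, Lys120, Phe125, Leu122, Leu126"
)
scanprosite_range = range(101, 127)  # residues 101-126 inclusive

shared = literature & castp
positions = sorted(pos for _name, pos in shared)
print(len(shared), positions[0], positions[-1])         # 12 101 126
print(all(pos in scanprosite_range for pos in positions))  # True
```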
46. Nyro Research Foundation – www.nyroindia.org
➤ Internships and Project training with real-time projects
➤ Programming languages, environments and high-
performance computing systems
➤ Third-party software tools
➤ Bootcamps, summer camps, annual conference with
tutorials
➤ Hosts monthly web meetings
➤ Shares information resources, including a blog and other
learning materials
➤ Collaborative research projects
➤ Computing Consultants
➤ 2 Interns per year
47. Zastra Innovations
➤ Scientific software providers/training
➤ Computational Biology
➤ Computational Chemistry
➤ Materials Science
➤ Nanotechnology
➤ BioStatistics
➤ Dosage Tolerance/Curve Fitting
➤ Medicinal Chemistry / Cheminformatics
➤ Collaborative research projects
➤ Computing Consultants
➤ 2 Interns per year
48. Agree / Disagree?
➤ Participants/Delegates
➤ My mentors, colleagues and students.
➤ Organizers of the seminar
➤ Email : pillai@nyroindia.org
➤ Phone: 94483 67493
➤ Web : www.nyroindia.org
➤ Inviting Programmers who can
code QSAR models and web portal