2. Recent Projects
! Druggability prediction
! 3D structure
! Protein Sequence
! Predict a protein’s druggability based on its position in the
protein-protein interaction network
! Drug Resistance
! Therapeutic opportunities
! Identification of new gene targets for cancer
! Are they Druggable?
! Candidate Compounds
! Compounds more likely to be a hit for a bioassay
3. Drug Discovery Process
Early-stage:
Discovery, Optimisation, ADMET, Clinical Trials, Paperwork
• Target Evaluation
• Compound Screening
• Computational Chemistry
• Structure-based Drug Design
• Absorption, Distribution, Metabolism, Excretion, Toxicity
• Patient Stratification
• Protocol
• Drug Approval
4. Biology 101
! There is a many-to-many relationship between genes and proteins
! A Protein is a large molecule; a Drug is a small molecule
! Gene Expression data
! The amount of gene product (e.g. mRNA) made from a gene. Epigenetics.
! highly / lowly / over- / under-expressed – fold change
! Warning: Platforms and preprocessing
! Gene Copy Number
! Loss / gain of a gene
! On one strand or 2?
! There are only approx. 400 genetic targets of approved
pharmaceuticals
! Only from a handful of Protein Families
! Desperate need for diversity
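The fold-change bullet above can be made concrete with a small sketch. It assumes expression is summarised as one number per gene in tumour and in normal tissue; the gene names, values, pseudocount and 2-fold threshold are all illustrative assumptions, not values from the talk.

```python
import math

def log2_fold_change(tumour, normal, pseudocount=1.0):
    """log2 ratio of tumour to normal expression; the pseudocount avoids log(0)."""
    return math.log2((tumour + pseudocount) / (normal + pseudocount))

def call_expression(lfc, threshold=1.0):
    """Label a gene 'over', 'under' or 'unchanged'; threshold=1.0 means 2-fold."""
    if lfc >= threshold:
        return "over"
    if lfc <= -threshold:
        return "under"
    return "unchanged"

# Hypothetical (tumour, normal) expression values for three made-up genes.
samples = {"GENE_A": (79.0, 19.0), "GENE_B": (4.0, 39.0), "GENE_C": (10.0, 11.0)}
calls = {gene: call_expression(log2_fold_change(t, n))
         for gene, (t, n) in samples.items()}
```

Working on the log2 scale makes over- and under-expression symmetric around zero. As the platform warning above suggests, values from different platforms or preprocessing pipelines should not be compared this way without normalisation.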
6. Target Identification
! Prediction of disease-associated genes
! patient level
! gene / protein level
! network
! Prediction of mechanisms of disease
! Epigenetic targets – meta-targets
! Prediction of protein function – from sequence / structure / network
! multi-class; multi-label
! Prediction of 3D structure
! Prediction of protein binding
! New immune targets
7. Druggability Prediction
! Drugs – FDA approved, ~350 – very strict: known therapeutic benefit
! DrugBank – loose: binds, but no known therapeutic benefit
! Tractable or Druggable
! Rule of 5 compliant
! Precedence-based
- Druggable families / Homology
- Ligand-based scoring
- UniProt, bioassays – EBI and PubChem BioAssay
- Statistical analysis
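As a rough illustration of the "Rule of 5 compliant" criterion, the sketch below counts violations of Lipinski's four thresholds (molecular weight over 500 Da, logP over 5, more than 5 hydrogen-bond donors, more than 10 hydrogen-bond acceptors). The `Compound` class, the descriptor values and the "at most one violation" cut-off are assumptions for illustration; in practice the descriptors would come from a cheminformatics toolkit.

```python
from dataclasses import dataclass

@dataclass
class Compound:
    name: str
    mol_weight: float        # Daltons
    logp: float              # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int

def rule_of_five_violations(c):
    """Count how many of Lipinski's four thresholds the compound exceeds."""
    checks = [
        c.mol_weight > 500,
        c.logp > 5,
        c.h_bond_donors > 5,
        c.h_bond_acceptors > 10,
    ]
    return sum(checks)

def is_rule_of_five_compliant(c, max_violations=1):
    """Assumed convention: tolerate at most one violation."""
    return rule_of_five_violations(c) <= max_violations

# Hypothetical compounds with made-up descriptor values.
small = Compound("compound_x", 180.2, 1.2, 1, 4)
large = Compound("compound_y", 1200.0, 3.0, 5, 12)
results = {c.name: is_rule_of_five_compliant(c) for c in (small, large)}
```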
8. Druggability Prediction
! Sequence Analysis
- Amino Acid motifs and composition
- Physicochemical descriptors
- Practically unlimited number of descriptors – very wide data set
- Supervised classification
! FASTA – all human sequences can be downloaded from UniProt
>seq0
FQTWEEFSRAAEKLYLADPMKVRVVLKYRHVDGNLCIKVTD
! R protr; R Bioconductor
! species,mhc,peptide_length,A,R,N,D,C,E,Q,G,H,I,L,K,M,F,P,S,T,W,Y,V,scl1.lag1,scl2.lag1,scl1.lag2,scl2.lag2,scl1.2.lag1,scl2.1.lag1,scl1.2.lag2,scl2.1.lag2,AA,RA,NA,DA,CA,EA,QA,GA,HA,IA,LA,KA,MA,FA,PA,SA,TA,WA,YA,VA,AR,RR,NR,DR,CR ..... ,Schneider.Xr.K,Schneider.Xr.M,Schneider.Xr.F, Grantham.Xr.A,Grantham.Xr.R,
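To show how a FASTA record becomes a row of numerical features, here is a minimal Python sketch of the two simplest descriptor families in the column list: amino acid composition (columns A to V) and dipeptide composition (columns AA, RA, ...). It is not the protr implementation, and it does not cover the lag-based or Schneider/Grantham descriptors; function names are illustrative.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ARNDCEQGHILKMFPSTWYV"  # order used in the slide's column header

def read_fasta(text):
    """Parse FASTA text into {identifier: sequence}."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records

def amino_acid_composition(seq):
    """Fraction of each of the 20 standard amino acids (20 features)."""
    counts = Counter(seq)
    return {aa: counts[aa] / len(seq) for aa in AMINO_ACIDS}

def dipeptide_composition(seq):
    """Fraction of each adjacent amino acid pair (400 features)."""
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = len(seq) - 1
    return {a + b: pairs[a + b] / total
            for a, b in product(AMINO_ACIDS, repeat=2)}

fasta = ">seq0\nFQTWEEFSRAAEKLYLADPMKVRVVLKYRHVDGNLCIKVTD\n"
seq = read_fasta(fasta)["seq0"]
features = {**amino_acid_composition(seq), **dipeptide_composition(seq)}
```

Even these two families give 420 columns for a 41-residue sequence, which is why the resulting data set is so wide.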
9. Druggability Prediction
! 3D structure
- Pockets, surface area
- Ligand interaction fingerprints
- Supervised classification
11. Druggability Prediction
! Interaction Network
! Many use cases
! Data from EBI and Y2H
! List of binary interactions
! Be careful 1: Data is inherently biased
! Be careful 2: Complex interactions
! R igraph; Gephi for visualisation
! Topological properties
! Community analysis
! Subgraph analysis
! Statistical analysis, network analysis and supervised
classification
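The topological properties mentioned above can be derived directly from the list of binary interactions. The sketch below uses only the Python standard library and computes two common per-protein features, degree and local clustering coefficient. The protein identifiers and edges are made up, and the choice of these two features is an assumption; the talk does not say which properties were used.

```python
from collections import defaultdict

def build_graph(edges):
    """Build an undirected adjacency map from a list of binary interactions."""
    graph = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            graph[a].add(b)
            graph[b].add(a)
    return graph

def degree(graph, node):
    """Number of distinct interaction partners."""
    return len(graph[node])

def clustering_coefficient(graph, node):
    """Fraction of a node's neighbour pairs that also interact with each other."""
    neighbours = list(graph[node])
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbours[j] in graph[neighbours[i]]
    )
    return 2.0 * links / (k * (k - 1))

# Hypothetical binary interactions between made-up proteins.
edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P1", "P4"), ("P4", "P5")]
graph = build_graph(edges)
features = {p: (degree(graph, p), clustering_coefficient(graph, p))
            for p in sorted(graph)}
```

Features like these can then be fed to a supervised classifier. Because well-studied proteins have more recorded interactions, degree in particular inherits the data bias noted above.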
15. Compound Bioactivity
! Brute force mass screening
! 1000s compounds screened in batches
! Primary Assays; Secondary / confirmatory assays
! Can be binary classification or regression
! The IC50 is the concentration of a compound needed to inhibit a biological process by 50%; a lower IC50 means a more potent compound.
! Active / inactive : IC50 threshold
! Goal is also to identify diverse compound structures
! Scaffold Hopping
! Compounds are converted to numerical descriptors, much as protein sequences are
! Pharmacophore fingerprints
! https://www.chemaxon.com/free-software/
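The two framings of bioactivity prediction mentioned above can be sketched as follows: a regression target (pIC50, the negative log10 of the IC50 in molar) and a binary label from an IC50 threshold. The 1000 nM threshold and the compound identifiers and values are assumptions for illustration only; real thresholds are assay-specific.

```python
import math

def pic50(ic50_nm):
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in molar)."""
    return -math.log10(ic50_nm * 1e-9)

def label_activity(ic50_nm, threshold_nm=1000.0):
    """Binary label: 'active' if the IC50 is at or below the threshold."""
    return "active" if ic50_nm <= threshold_nm else "inactive"

# Hypothetical IC50 measurements in nM for made-up compounds.
measurements = {"cpd_1": 12.0, "cpd_2": 850.0, "cpd_3": 25000.0}

regression_targets = {c: pic50(v) for c, v in measurements.items()}
class_labels = {c: label_activity(v) for c, v in measurements.items()}
```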
17. Compound ADMET
! Many use cases
! ADMET of hits
! Absorption
! Distribution
! Metabolism
! Excretion
! Toxicity
! Mutagenicity
! Protein binding
18. General Resources
! EBI European Bioinformatics Institute / PubChem
! API
! Integrates several downloadable Data Sources (expression, Copy
Number, Bioassays, network, disease-specific)
! Baseline data (Normal not diseased)
! Protein Data Bank – 3D Structures
! DrugBank
! Cancer – The Cancer Genome Atlas (TCGA) and International
Cancer Genome Consortium (ICGC)
! Coding Tools – R Bioconductor, BioPerl, Biopython
! https://docs.chemaxon.com/display/docs/Documentation
19. General Resources
! canSAR database
! Integration of biological, pharmacological, chemical, structural
biology and protein network data
20. Beware 101
! Non-standard Gene names
! Some experiments report genes, others report proteins
! We need new Drug Targets, different from established ones.
! Keep in mind when analysing results
! Cancer is difficult
! Drug resistance
! Data lags behind the science
! Tumour Heterogeneity
! Wide data = random patterns
! Different expression / sequencing platforms
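The "wide data = random patterns" warning can be demonstrated with a small simulation. The sketch below generates purely random features for a small number of samples and reports the strongest correlation with an equally random response. The sample and feature counts and the seed are arbitrary assumptions.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)            # fixed seed so the run is repeatable
n_samples, n_features = 20, 2000  # few patients, many measured variables

# Random "response" and random "features": there is no real signal anywhere.
response = [rng.random() for _ in range(n_samples)]
strongest = max(
    abs(pearson([rng.random() for _ in range(n_samples)], response))
    for _ in range(n_features)
)
```

With thousands of columns and only a handful of rows, `strongest` comes out surprisingly high even though every feature is noise. This is why apparent hits in wide biological data need correction for multiple testing and validation on independent data.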
21. Therapeutic Opportunities
! Only approximately 350–400 protein targets
! DNA damage response (DDR) is essential for maintaining
the genomic integrity of the cell
! Currently targeted by chemotherapy and radiation. Goal is for
small molecule targeting
! TCGA Patient Analysis: Expression, Copy Number Variation
and Mutation data.
! 15 cancer disease types
! Telegraph March 2015
! New drugs to tackle cancer cell weak spots could end
'scattergun' chemotherapy
Laurence H. Pearl, Amanda C. Schierz, Simon E. Ward, Bissan Al-Lazikani, Frances M. G.
Pearl. Therapeutic opportunities within the DNA damage response. Nature Reviews Cancer
22. Therapeutic Opportunities
! Statistical analysis of DDR deregulation in patients compared
to a random set of genes
! Druggability prediction of deregulated DDR genes
! Synthetic Lethality analysis of Yeast DDR orthologues
! Two genes are synthetic lethal if mutation of either gene alone is tolerated
but mutation of both leads to cell death. Targeting a gene that is
synthetic lethal to a cancer-relevant mutation should, in theory,
kill only cancer cells.
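The synthetic lethality definition above translates into a simple filter over screen results. The sketch below assumes viability is recorded as a boolean for each single and double mutant; the gene names and outcomes are hypothetical, not data from the yeast analysis.

```python
def is_synthetic_lethal(viable_a, viable_b, viable_ab):
    """True when each single mutant is viable but the double mutant is not."""
    return viable_a and viable_b and not viable_ab

# Hypothetical screen results: is the cell viable with this gene mutated?
single_viable = {"geneA": True, "geneB": True, "geneC": True, "geneD": False}

# Hypothetical double-mutant viability for a few gene pairs.
double_viable = {
    ("geneA", "geneB"): False,  # both singles viable, double dies
    ("geneA", "geneC"): True,   # double mutant survives
    ("geneB", "geneD"): False,  # geneD is lethal on its own, so not synthetic
}

synthetic_lethal_pairs = [
    pair
    for pair, viable in double_viable.items()
    if is_synthetic_lethal(single_viable[pair[0]], single_viable[pair[1]], viable)
]
```

The check on the single mutants matters: a pair involving a gene that is lethal on its own is excluded, because the double-mutant death is then not evidence of an interaction between the two genes.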