Intelligent Systems for Cancer Genomics (AIS305) - AWS re:Invent 2018

© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AI Summit
Intelligent Systems for Cancer Genomics
Mona Singh, Princeton University

Cancer in the United States
1 in 3 lifetime risk of developing cancer
1 in 5 lifetime risk of dying from cancer
Source: American Cancer Society

Cancer Cells Acquire Mutations
tumor cells: tumor biopsy
normal cells: individual’s (normal) blood sample
mutations in cells
Healthy cells: ATTCGTCATGCGTGACGAGATCGAGCTAGCGCGAAATCGAGCGATC...
Cancer cells: ATTCGTCATGCGTGACGAGATCGAGCTAGCGCGAAATCGAGCGATC...
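
The comparison this slide describes (tumor DNA against the same individual's normal DNA) can be illustrated with a minimal sketch. The function name and the toy sequences below are hypothetical and not from the talk. The sketch assumes the two sequences are already aligned and of equal length, which real variant calling does not get for free.

```python
def somatic_differences(normal, tumor):
    """List (position, normal_base, tumor_base) for every mismatch
    between two aligned, equal-length sequences."""
    if len(normal) != len(tumor):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, n, t) for i, (n, t) in enumerate(zip(normal, tumor)) if n != t]

# Hypothetical toy sequences (assumption): the tumor copy differs at one base.
normal = "ATTCGTCATGCG"
tumor = "ATTCGTCTTGCG"
diffs = somatic_differences(normal, tumor)
print(diffs)
```

Positions are 0-based. A real pipeline would work from sequencing reads with base qualities rather than from two clean strings.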

Personalized Cancer Treatments

Cancer Genome Landscapes
32 cancer subtypes
11,315 patient samples
• 22 cancer subtypes
• 20,487 patient samples
• Many mutations per cancer genome
• Only a few mutations within an individual “drive” the cancer
Mutations per 1 million DNA bases

❶ Discover “causal” cancer
driver mutations and genes
❷ Predict drug response

Uncovering Cancer Genes in the Context of Other Information
• Population Genomic Data: 8,608,691 varying sites across 60,706 individuals
• Probabilistic Sequence Patterns: 16,230 “domains” (e.g., WW, bZIP, Bromo, ZF) cover 88% of human proteins
• Protein Structures: >127,000 PDB structures
• Biological Networks: ~300,000 interactions

Proteins Function Through Interactions
protein 1, protein 2, protein 3, ..., protein 20K
Interaction types: protein−protein, protein−DNA, protein−RNA, protein−ion, protein−small molecule

Specific Mutations Alter Function
Proteins: 1D representation, “read” of the genome
MISILRRGLLVLLAAFPLLALAVQTPHEVVQS
TTNELLGDLKANKEQYKSNPNAFYDSLNRILG
PVVDADGISRSIMTVKYSRKATPEQMQRFQEN
FKRSLMQFYGNALLEYNNQGITVDPAKADDGK
RASVGMKVTGNNGAVYPVQYTLENIGGEWKVR
NVIVNGINIGKLFRDQFADAMQRNGNDLDKTI
DGWAGEVAKAKQAADNSPEKSVKLEHHHHHH
Proteins function & interact in 3D

Hypothesis:
Cancer cells malfunction due to mutations that change interactions & networks

Goal:
Identify cancer genes by finding those with an enriched number of mutations in interaction sites across tumor samples

22,712 total genes in human
13,923 (61%) genes with computationally inferred interaction site info
2,871 (13%) genes with structural knowledge of any interaction sites
Partial, per-position interaction potential from 0 to 1, e.g. along MISILRRGLLVLLAAFPLLALAVQTPHEVVQSTTNELLGDLKANKE

Uncovering Significantly Mutated Binding Sites
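Protein from N-terminus (N) to C-terminus (C): regions with modeled interactions, regions with no known interactions
Somatic mutations along the protein
Per-position binding potentials (0 to 1)
Xi = sum of binding potentials where mutations land
Analytically compute mean and variance of Xi
Z-score: Zi = (Xi - mean(Xi)) / sqrt(variance(Xi))
~7X speedup per shuffle
Typically >1,000 shuffles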
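
A minimal sketch of the idea on this slide, under a stated assumption: the null model places each mutation independently and uniformly at random along the protein. The talk does not spell out its null model, so this is an illustration and not the PertInInt implementation. Under that assumption the mean and variance of the summed binding potential have closed forms, so the Z-score needs no shuffling. All function and variable names are hypothetical.

```python
import math
import random

def analytic_z(weights, mutated_positions):
    """Z-score of X = sum of per-position weights at the mutated positions.

    Assumed null: each of the m mutations lands uniformly at random,
    independently, so E[X] = m * mean(w) and Var[X] = m * var(w).
    """
    length = len(weights)
    m = len(mutated_positions)
    mu = sum(weights) / length
    var = sum((w - mu) ** 2 for w in weights) / length  # variance of one draw
    x = sum(weights[p] for p in mutated_positions)
    return (x - m * mu) / math.sqrt(m * var)

def shuffled_z(weights, mutated_positions, n_shuffles=2000, seed=0):
    """Same Z-score, with the null mean and variance estimated by
    repeatedly re-placing the mutations at random."""
    rng = random.Random(seed)
    length, m = len(weights), len(mutated_positions)
    x = sum(weights[p] for p in mutated_positions)
    null = [sum(weights[rng.randrange(length)] for _ in range(m))
            for _ in range(n_shuffles)]
    mean = sum(null) / n_shuffles
    var = sum((v - mean) ** 2 for v in null) / n_shuffles
    return (x - mean) / math.sqrt(var)

# Toy protein of 30 positions with one small binding site (assumed values).
weights = [0.0] * 10 + [0.9, 1.0, 0.8] + [0.0] * 17
mutations = [10, 11, 11, 12, 3]  # 0-based positions; four hit the site
z_exact = analytic_z(weights, mutations)
z_sim = shuffled_z(weights, mutations)
print(z_exact, z_sim)
```

The two values should agree up to sampling noise. The analytic version replaces the thousands of random placements with two sums over the protein.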

Uncovering Significantly Mutated Interaction Sites
How to consider information together?

PertInInt: Integrative Approach to Uncover Cancer Genes
Standardized multivariate Gaussian (axes: -1, 0, +1)
Analytically compute covariance matrix
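
One way to picture the role of the covariance matrix is the sketch below. It is an illustration under assumptions, not the PertInInt method itself. It assumes several per-position weight tracks scored on the same mutations, uniform independent placement of mutations as the null, and an equal-weight sum of the per-track Z-scores. Under that null the correlation between two Z-scores equals the correlation between the two tracks, so the variance of the sum is known without shuffling. Names and toy values are hypothetical.

```python
import math

def combined_z(tracks, mutated_positions):
    """Return (per-track Z-scores, one combined Z-score).

    tracks: equal-length lists of per-position weights.
    The combined score is sum(Z) / sqrt(sum of all track correlations),
    which has unit variance under the assumed null.
    """
    length = len(tracks[0])
    m = len(mutated_positions)
    centered = []
    for track in tracks:
        mu = sum(track) / length
        centered.append([w - mu for w in track])
    # Covariance between tracks for a single uniformly placed mutation.
    cov = [[sum(a[i] * b[i] for i in range(length)) / length for b in centered]
           for a in centered]
    z = [sum(c[p] for p in mutated_positions) / math.sqrt(m * cov[k][k])
         for k, c in enumerate(centered)]
    n = len(tracks)
    corr = [[cov[j][k] / math.sqrt(cov[j][j] * cov[k][k]) for k in range(n)]
            for j in range(n)]
    total_var = sum(sum(row) for row in corr)  # Var of the summed Z-scores
    return z, sum(z) / math.sqrt(total_var)

# Two toy tracks over a 30-position protein (assumed values).
interaction = [0.0] * 10 + [0.9, 1.0, 0.8] + [0.0] * 17
conservation = [0.2] * 8 + [0.7] * 8 + [0.3] * 14
mutations = [10, 11, 11, 12, 3]
z_tracks, z_all = combined_z([interaction, conservation], mutations)
print(z_tracks, z_all)
```

Ignoring the correlation would overstate significance when tracks overlap. In the extreme case of two identical tracks, the combined score here equals the single-track score and does not double-count.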

PertInInt Identifies Cancer-Relevant Genes
[Figure: enrichment of genes in the Cancer Gene Census (y-axis, 0 to 30) versus gene rank (x-axis, 1 to 200); curves: Frequency based, Conservation, Domain, Interaction, All]
~10 minutes to process 10,000+ tumor samples
(2.4-2.7 GHz processor, <4 GB RAM)

PertInInt In Summary
[Illustrations: protein domains (ZF ZF ZF); cross-species sequence alignment (H. sap., P. tro., G. gor., P. mar.); DNA sequences]
• Perturbed interactions predictive for cancer genes
• Integrative framework identifies cancer-relevant genes
– Novel and distinct mutational avenues for driver genes
• Alternate way to prioritize mutations in an individual’s cancer

Tumor Growth in the Presence of Drug “X”
Illustration obtained from Verschoor et al. 2013
Compound is more effective
Compound is less effective

Drug Effectiveness Varies Across Tumor Cells
Source: Genomics of Drug Sensitivity in Cancer (GDSC)
Activity of 250 compounds on 960 cell lines
~160K drug-cell pairings

Cancer Cells Are Heterogeneous
cells
~20K genes

Our Data
Activity of drugs on diverse cell lines
Gene expression measurements on untreated cells
Chemical structure of drugs
Goal: Predict activity of drugs on a tumor using gene expression profiles and drug features
New tumors, new drugs!

Genomics Data Has Modular Structure
Illustration obtained from https://rgd.mcw.edu/rgdweb/pathway/pathwayRecord.html?acc_id=PW:0000
Can we use this modular information
to aid in our predictions?

Solution
Use modular knowledge of cellular function
Starting feature space: 960 cell lines x 20K features
(Resistant/Sensitive)

Approach: Autoencoders
Neural network approach to obtain a reduced feature space using
a guided modular genomics approach.
Use gene set autoencoded features for prediction
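
The idea of gene-set-guided autoencoding can be sketched as follows: train one small autoencoder per gene set and concatenate the bottleneck activations into a reduced feature vector per cell line. This is a sketch under assumptions. The network size, the training loop, the gene sets, and the random toy data are all hypothetical, and the talk does not specify its architecture. NumPy is used to keep the example self-contained.

```python
import numpy as np

def train_autoencoder(X, n_hidden, n_epochs=500, lr=0.05, seed=0):
    """Fit a tanh-encoder / linear-decoder autoencoder by gradient descent.

    Returns an `encode` function and the per-epoch reconstruction losses.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    W1 = rng.normal(0.0, 0.1, (n_features, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, n_features))
    b2 = np.zeros(n_features)
    losses = []
    for _ in range(n_epochs):
        H = np.tanh(X @ W1 + b1)              # bottleneck activations
        err = (H @ W2 + b2) - X               # reconstruction error
        losses.append(float(np.mean(err ** 2)))
        g_out = 2.0 * err / err.size          # gradient of the mean squared error
        g_hid = (g_out @ W2.T) * (1.0 - H ** 2)
        W2 -= lr * (H.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (X.T @ g_hid)
        b1 -= lr * g_hid.sum(axis=0)
    return (lambda X_new: np.tanh(X_new @ W1 + b1)), losses

def gene_set_features(expr, gene_sets, n_hidden=2):
    """Encode each gene set's columns separately and concatenate the codes."""
    blocks = []
    for columns in gene_sets.values():
        X = expr[:, columns]
        X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)  # standardize genes
        encode, _ = train_autoencoder(X, n_hidden)
        blocks.append(encode(X))
    return np.hstack(blocks)

# Toy data (assumption): 30 cell lines x 12 genes, two made-up gene sets.
rng = np.random.default_rng(1)
expr = rng.normal(size=(30, 12))
gene_sets = {"set_A": [0, 1, 2, 3, 4], "set_B": [5, 6, 7, 8, 9, 10, 11]}
features = gene_set_features(expr, gene_sets)
_, losses = train_autoencoder(expr[:, gene_sets["set_A"]], n_hidden=2)
print(features.shape, losses[0], losses[-1])
```

Here 12 genes are reduced to 4 features (2 per gene set). At the scale on the slides, the same pattern would reduce roughly 20K expression values per cell line to a much smaller, pathway-structured input.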

Merging Multiple Genomic Sources
Mutation within known cancer genes (CGCs)
Reduced set of gene expression values

The Other Half: Structure of Drugs
Features: 2D structural descriptors (chemical
subgroups) and physical features (e.g., size, charge)
Starting space: 250 drug compounds

Chemical Features
881 substructure descriptions as a vector of binary features

Physical Features
1444 PaDEL physicochemical features from SMILES strings.
Molecular free energy, volume, topology.
Apply autoencoders to reduce feature space (90 features)

Combine Input to Model
165K Cell-Drug Pairs
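
A minimal sketch of how cell-drug pairs could be assembled into model inputs: each measured pair becomes one row made of the cell line's reduced genomic features followed by the drug's reduced structural features. The names, feature values, and 0/1 coding of Resistant/Sensitive are assumptions for illustration only.

```python
def build_pair_matrix(cell_features, drug_features, responses):
    """Build one input row per measured (cell, drug) pair.

    cell_features: dict cell name -> list of floats
    drug_features: dict drug name -> list of floats
    responses:     dict (cell, drug) -> label (0 = resistant, 1 = sensitive)
    Returns (rows, labels, keys), all in the same sorted pair order.
    """
    rows, labels, keys = [], [], []
    for (cell, drug), label in sorted(responses.items()):
        rows.append(cell_features[cell] + drug_features[drug])
        labels.append(label)
        keys.append((cell, drug))
    return rows, labels, keys

# Toy inputs (assumption): 2 cell lines, 2 drugs, 3 measured pairings.
cell_features = {"cell_1": [0.1, 0.5, -0.2], "cell_2": [0.7, -0.3, 0.0]}
drug_features = {"drug_X": [1.0, 0.0], "drug_Y": [0.0, 1.0]}
responses = {("cell_1", "drug_X"): 1, ("cell_1", "drug_Y"): 0,
             ("cell_2", "drug_X"): 0}
rows, labels, keys = build_pair_matrix(cell_features, drug_features, responses)
print(len(rows), len(rows[0]), keys)
```

Only measured pairings produce rows, so the number of rows can be smaller than the number of cell lines times the number of drugs.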

Deep Neural Network on Combined Data

Cross-Validation Testing: Leave All Samples Out Per Drug
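
The split named in this slide title can be sketched as a grouped cross-validation: for each drug, every pair involving that drug is held out for testing, so the model is evaluated on a drug it never saw during training. The function and the toy pairs below are hypothetical; the talk's exact protocol may differ.

```python
def leave_one_drug_out(pairs):
    """Yield (held_out_drug, train_indices, test_indices) for each drug.

    pairs: list of (cell, drug) tuples, one per row of the input matrix.
    """
    drugs = sorted({drug for _, drug in pairs})
    for held_out in drugs:
        test = [i for i, (_, d) in enumerate(pairs) if d == held_out]
        train = [i for i, (_, d) in enumerate(pairs) if d != held_out]
        yield held_out, train, test

# Toy pairs (assumption): 3 drugs measured on a few cell lines.
pairs = [("cell_1", "drug_X"), ("cell_2", "drug_X"), ("cell_1", "drug_Y"),
         ("cell_3", "drug_Y"), ("cell_2", "drug_Z")]
folds = list(leave_one_drug_out(pairs))
for drug, train, test in folds:
    print(drug, train, test)
```

This is stricter than holding out individual pairs at random, because no response to the test drug can leak into training.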

Comparison to Previous Leave-One-Out Per Drug

Summary
• Biologically-guided deep net approach to predict response to drugs
• By training model across drugs and tumors, can make predictions for new drugs & tumors
• Ultimate goals:
– Personalized oncology
– In silico drug development

Thank you!
Mona Singh
mona@cs.princeton.edu
Shilpa Kobren, Jose Zamalloa
