3. ● Why?
● How?
● What are the results?
● Conclusions
● Future work
Road map
4. ● Need
○ The large number of deaths caused by cancer each year
○ The diversity of cancer and anti-cancer drugs makes it difficult to
provide customized therapy for cancer patients
○ Accurate prediction of drug response is important
● Possibility
○ A large amount of accessible genomic data of cancer cell lines and
molecular data of anti-cancer drugs
○ Machine learning methods that can help extract useful features
from these data & facilitate the analysis
Motivation
5. ● Problem
○ Labeled data are hard to obtain
○ How to learn from unlabeled data?
● Variational auto-encoder (VAE)
○ Encoder: learns the distribution of latent vectors of input data
○ Decoder: reconstructs input data with latent vectors sampled
from this distribution
○ Robust unsupervised learning method
Motivation
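The encoder/decoder description above can be illustrated with the two pieces that make a VAE trainable: drawing a latent vector from the learned distribution, and measuring how far that distribution is from a standard normal prior. This is a minimal sketch using only the Python standard library. It assumes a diagonal Gaussian latent distribution, and the `mu` and `log_var` values are hypothetical encoder outputs, not values from the paper.

```python
import math
import random


def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


rng = random.Random(0)
mu = [0.5, -1.0]        # hypothetical encoder output: latent means
log_var = [0.0, -2.0]   # hypothetical encoder output: log-variances

z = sample_latent(mu, log_var, rng)      # latent vector passed to the decoder
kl = kl_to_standard_normal(mu, log_var)  # regularization term of the VAE loss
```

In a full model the KL term is added to a reconstruction error computed from the decoder output; no labels are needed for either term, which is why the method suits unlabeled data.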
7. ● VAE model for gene expression data (GeneVAE)
● Junction Tree VAE for anti-cancer drug molecular data (JTVAE)
● Multi-Layer Perceptron (MLP) model to produce final prediction
● Baseline model: Support Vector Regression (SVR)
Methods in our paper
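One way the pieces listed above could fit together is sketched below: the latent vector from the gene expression VAE and the latent vector from the drug VAE are joined and passed through an MLP that outputs a single response value. Concatenating the two vectors, the single hidden layer, the ReLU activation, and all sizes and weights are assumptions made for illustration; the slides only state that an MLP produces the final prediction.

```python
import random


def dense(x, weights, bias):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def mlp_predict(gene_latent, drug_latent, hidden, output):
    """Concatenate both latent vectors, apply a ReLU hidden layer, then a linear output."""
    x = gene_latent + drug_latent                      # list concatenation
    h = [max(0.0, a) for a in dense(x, *hidden)]       # ReLU activation
    return dense(h, *output)[0]                        # scalar drug response


def random_layer(rng, n_out, n_in):
    """Untrained toy weights; a real model would learn these from GDSC responses."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(1)
gene_latent = [0.3, -1.2, 0.8]   # hypothetical GeneVAE latent vector
drug_latent = [1.1, 0.0]         # hypothetical JTVAE latent vector

hidden = random_layer(rng, 4, len(gene_latent) + len(drug_latent))
output = random_layer(rng, 1, 4)
prediction = mlp_predict(gene_latent, drug_latent, hidden, output)
```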
8. GeneVAE
● Encoder & Decoder: two-layer fully connected neural network (each)
● Normalized input gene expression data
● Latent vectors sampled from a Gaussian distribution in latent space
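The encoder side of this slide can be sketched in plain Python as follows. The sketch makes several assumptions not stated in the slides: normalization is done as a per-gene z-score, "two-layer" is read as one hidden ReLU layer followed by an output layer with separate heads for the mean and log-variance, and the sizes, weights, and expression values are toy placeholders.

```python
import random
import statistics


def zscore_columns(rows):
    """Standardize each gene (column) to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant genes
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def linear(x, weights, bias):
    """Fully connected layer: weights has one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def init_layer(rng, n_out, n_in):
    """Small random weights and zero biases (untrained)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def encode(x, params):
    """Layer 1: hidden ReLU layer. Layer 2: linear heads for mean and log-variance."""
    (w1, b1), (w_mu, b_mu), (w_lv, b_lv) = params
    h = [max(0.0, a) for a in linear(x, w1, b1)]
    return linear(h, w_mu, b_mu), linear(h, w_lv, b_lv)


rng = random.Random(0)
n_genes, n_hidden, n_latent = 4, 3, 2   # toy sizes, not the paper's
params = (init_layer(rng, n_hidden, n_genes),
          init_layer(rng, n_latent, n_hidden),
          init_layer(rng, n_latent, n_hidden))

expression = [[5.1, 0.2, 3.3, 7.0],     # made-up values: rows are cell lines
              [4.9, 0.4, 2.9, 6.5],
              [6.0, 0.1, 3.8, 7.4]]
normalized = zscore_columns(expression)
mu, log_var = encode(normalized[0], params)   # parameters of the latent Gaussian
```

The decoder would mirror this structure, mapping a sampled latent vector back to the gene expression dimension.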
13. ● Data
○ Gene expression data: Cancer Cell Line Encyclopedia (CCLE) data set
○ Important cancer-related genes: Cancer Gene Census (CGC)
data set
○ Organic compound molecular structure data: ZINC data set
○ Anti-cancer drug molecular structure data: PubChem data set
○ Drug response data: Genomics of Drug Sensitivity in Cancer (GDSC)
data set
Experiments
14. ● Gene expression data in CCLE dataset
● Filtered by CGC dataset
● Gene expression data filtered by CGC dataset
Experiments
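The filtering step on this slide amounts to keeping only the expression columns whose gene appears in the CGC list. A minimal sketch follows; the function name `filter_genes`, the gene names, the cell line identifiers, and the values are hypothetical placeholders, and the `cgc_genes` set is illustrative rather than the actual census.

```python
def filter_genes(expression, gene_names, keep):
    """Keep only the columns whose gene name is in `keep`, preserving order."""
    idx = [i for i, gene in enumerate(gene_names) if gene in keep]
    kept_names = [gene_names[i] for i in idx]
    kept_rows = {cell: [values[i] for i in idx]
                 for cell, values in expression.items()}
    return kept_names, kept_rows


gene_names = ["TP53", "GENE_X", "BRCA1", "GENE_Y"]   # column order of the raw matrix
cgc_genes = {"TP53", "BRCA1", "KRAS"}                # illustrative gene list only

expression = {                                        # made-up expression values
    "CELL_A": [7.1, 2.0, 5.4, 9.9],
    "CELL_B": [6.8, 1.7, 4.9, 9.5],
}

kept_names, kept_rows = filter_genes(expression, gene_names, cgc_genes)
```

Genes in the list that are absent from the expression matrix (here `KRAS`) are simply ignored.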
15. ● Evaluation: Pearson Correlation (R2 score) & Root Mean Square Error
(RMSE)
● Test specifically on breast cancer
● Model comparison
○ CGC+SVR
○ RAW+SVR
○ CGC+MLP
○ RAW+VAE+MLP
○ CGC+VAE+MLP
Experiments
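For reference, the evaluation quantities named on this slide can be computed as below. Note that the Pearson correlation coefficient and the R2 score (coefficient of determination) are different quantities in general; the slides do not say which formula produced the reported numbers, so all three are shown. The sample values are made up.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted responses."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


y_true = [1.0, 2.0, 3.0, 4.0]   # made-up observed drug responses
y_pred = [1.1, 1.9, 3.2, 3.8]   # made-up model predictions

scores = {
    "rmse": rmse(y_true, y_pred),
    "r2": r2_score(y_true, y_pred),
    "pearson_r": pearson_r(y_true, y_pred),
}
```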
16. ● R2 score: 0.678
● RMSE: 1.489
CGC+SVR
● R2 score: 0.700
● RMSE: 1.439
CGC+VAE+SVR
breast cancer (baseline)
20. ● High-Pearson-Correlation Prediction
○ 0.830 R2 score on anti-breast cancer drug prediction (our model)
○ 0.845 R2 score on pan cancer drug prediction (our model)
○ 0.843 R2 score, a high score from similar research (CDRscan)
● Data encoded by VAE retains the features of input data.
● Accurate prediction on drugs with similar latent vectors and functional
groups
Conclusions
21. ● Model Improvement
○ Attention-based model
○ Sequence generation model based on Graph Neural Network (GNN)
and Graph Attention Network (GAT)
○ Hyperparameter tuning (e.g., Batch Norm layers)
● Data Improvement
○ Add TCGA cell line data set (comparability & credibility)
○ Better gene subset selection (Protein-protein network propagation)
● Drug response prediction toolkit
Future work