Team c final slides

Team C
Variational Autoencoder for Anti-Cancer Drug
Response Prediction

Steven, Yuan, Varus, Dexin

● Why?
● How?
● What are the results?
● Conclusions
● Future work
Road map

● Need
○ Cancer causes a large number of deaths each year
○ The diversity of cancer and anti-cancer drugs makes it difficult to
provide customized therapy for cancer patients
○ Accurate prediction of drug response is important
● Possibility
○ A large amount of accessible genomic data of cancer cell lines and
molecular data of anti-cancer drugs
○ Machine learning methods that can help extract useful features
from these data & facilitate the analysis
Motivation

● Problem
○ Labeled data is hard to obtain
○ How to learn from unlabeled data?
● Variational auto-encoder (VAE)
○ Encoder: learns the distribution of latent vectors of the input data
○ Decoder: reconstructs input data with latent vectors sampled
from this distribution
○ Robust unsupervised learning method
Motivation
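
The encoder/decoder description above can be sketched numerically. This is a minimal illustration, not the team's implementation: it assumes a diagonal Gaussian latent distribution, a standard normal prior, and squared-error reconstruction; the function names are hypothetical.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, sigma^2) || N(0, I)), summed over latent dimensions."""
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=-1)

def vae_loss(x, x_recon, mu, log_var):
    """Batch mean of squared reconstruction error plus the KL term."""
    recon = np.sum((x - x_recon) ** 2, axis=-1)
    return float(np.mean(recon + kl_to_standard_normal(mu, log_var)))

rng = np.random.default_rng(0)
mu = np.zeros((3, 4))        # toy encoder output: latent means
log_var = np.zeros((3, 4))   # toy encoder output: log-variances
z = reparameterize(mu, log_var, rng)
```

No labels appear anywhere in the loss, which is why the method suits unlabeled data.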

Related Models
CDRscan (neural network), PaccMann (attention-based)

● VAE model for gene expression data (geneVAE)
● Junction Tree VAE for anti-cancer drug molecular data (JTVAE)
● Multi-Layer Perceptron (MLP) model to produce final prediction
● Baseline model: Support Vector Regression (SVR)
Methods in our paper
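
A rough sketch of how the pieces listed above might fit together. The slides do not state how the two latent vectors are combined, so concatenation is an assumption here, as are the layer sizes, the random weights, and the name predict_response.

```python
import numpy as np

rng = np.random.default_rng(0)

GENE_LATENT, DRUG_LATENT, HIDDEN = 4, 3, 8  # illustrative sizes only

# Randomly initialised MLP weights; a real model would learn these.
w1 = rng.normal(0.0, 0.1, (GENE_LATENT + DRUG_LATENT, HIDDEN))
b1 = np.zeros(HIDDEN)
w2 = rng.normal(0.0, 0.1, (HIDDEN, 1))
b2 = np.zeros(1)

def predict_response(gene_z, drug_z):
    """Concatenate the two latent vectors (assumed) and apply a one-hidden-layer MLP."""
    features = np.concatenate([gene_z, drug_z], axis=-1)
    hidden = np.maximum(0.0, features @ w1 + b1)  # ReLU
    return (hidden @ w2 + b2).squeeze(-1)

gene_z = rng.standard_normal((5, GENE_LATENT))  # stand-in for geneVAE output
drug_z = rng.standard_normal((5, DRUG_LATENT))  # stand-in for JTVAE output
pred = predict_response(gene_z, drug_z)         # one predicted response per pair
```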

GeneVAE
● Encoder & Decoder: two-layer fully connected neural network (each)
● Normalized input gene expression data
● Latent vectors sampled from a Gaussian distribution in latent space
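
One possible reading of this architecture as a forward pass. The layer widths, ReLU activation, z-score normalization, and random weights are assumptions for illustration; the slides give none of these details.

```python
import numpy as np

rng = np.random.default_rng(0)

def zscore(x):
    """Normalize each gene (column) to zero mean and unit variance."""
    return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)

def dense(in_dim, out_dim):
    """Random weights and zero bias for one fully connected layer."""
    return rng.normal(0.0, 0.1, (in_dim, out_dim)), np.zeros(out_dim)

N_GENES, HIDDEN, LATENT = 20, 8, 4  # illustrative sizes only

enc_hidden = dense(N_GENES, HIDDEN)
enc_mu = dense(HIDDEN, LATENT)        # second encoder layer: latent mean
enc_log_var = dense(HIDDEN, LATENT)   # second encoder layer: latent log-variance
dec_hidden = dense(LATENT, HIDDEN)
dec_out = dense(HIDDEN, N_GENES)

def encode(x):
    h = np.maximum(0.0, x @ enc_hidden[0] + enc_hidden[1])
    return h @ enc_mu[0] + enc_mu[1], h @ enc_log_var[0] + enc_log_var[1]

def sample(mu, log_var):
    """Draw a latent vector from the Gaussian described by the encoder."""
    return mu + np.exp(0.5 * log_var) * rng.standard_normal(mu.shape)

def decode(z):
    h = np.maximum(0.0, z @ dec_hidden[0] + dec_hidden[1])
    return h @ dec_out[0] + dec_out[1]

expression = rng.normal(5.0, 2.0, (6, N_GENES))  # toy data: 6 cell lines
x = zscore(expression)
mu, log_var = encode(x)
z = sample(mu, log_var)
x_recon = decode(z)
```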

GeneVAE
[Figure: t-SNE result before VAE; t-SNE result after VAE]

JTVAE
● Graph VAE
○ Fine-grained connectivity
information
○ node-by-node generation
● Tree VAE
○ Tree structure
○ node by functional group

JTVAE
● Similar drugs have latent vectors that are close to each other in Euclidean distance
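
The closeness claim can be checked with a nearest-neighbor lookup over latent vectors. A small sketch with made-up two-dimensional vectors and hypothetical drug names; real JTVAE latents would have many more dimensions.

```python
import math

def nearest_neighbors(latents, query, k=2):
    """Return the k drug names whose latent vectors are closest to the query's."""
    q = latents[query]
    others = [(math.dist(q, vec), name)
              for name, vec in latents.items() if name != query]
    return [name for _, name in sorted(others)[:k]]

# Toy latent vectors (hypothetical values, not model output).
latents = {
    "drug_a": [0.0, 0.0],
    "drug_b": [0.1, 0.0],
    "drug_c": [3.0, 4.0],
    "drug_d": [0.0, 0.3],
}

print(nearest_neighbors(latents, "drug_a"))  # ['drug_b', 'drug_d']
```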

● Data
○ Gene expression data: Cancer Cell Line Encyclopedia (CCLE) data set
○ Cancer-related genes: Cancer Gene Census (CGC) data set
○ Organic compound molecular structure data: ZINC data set
○ Anti-cancer drug molecular structure data: PubChem data set
○ Drug response data: Genomics of Drug Sensitivity in Cancer (GDSC)
data set
Experiments

[Figure: gene expression data in the CCLE dataset, filtered by the CGC dataset]
Experiments
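
The filtering step amounts to keeping only the expression columns whose gene appears in the census list. A minimal sketch with toy values; the cell line names, the expression numbers, and the non-census gene GENE_X are made up, and the real CCLE and CGC file formats are not shown.

```python
def filter_genes(expression, keep):
    """Keep only the genes in `keep` for every cell line."""
    return {
        cell_line: {gene: value for gene, value in genes.items() if gene in keep}
        for cell_line, genes in expression.items()
    }

# Toy expression table: cell line -> gene -> expression value.
expression = {
    "cell_line_1": {"TP53": 7.2, "BRCA1": 4.1, "GENE_X": 0.3},
    "cell_line_2": {"TP53": 6.8, "BRCA1": 5.0, "GENE_X": 0.9},
}
census_genes = {"TP53", "BRCA1"}  # stand-in for the CGC gene list

filtered = filter_genes(expression, census_genes)
```

The "RAW" and "CGC" model variants then differ only in whether this filter is applied before encoding.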

● Evaluation: Pearson correlation (R2 score) and root mean square error (RMSE)
● Test specifically on breast cancer
● Model comparison
○ CGC+SVR
○ RAW+SVR
○ CGC+MLP
○ RAW+VAE+MLP
○ CGC+VAE+MLP
Experiments
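
For reference, the evaluation metrics can be computed as below. The slide names Pearson correlation and R2 score together; they are different quantities, so the sketch computes both. The slides do not say which library or formula the team used, and the numbers here are toy values.

```python
import math

def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between targets and predictions."""
    n = len(y_true)
    mean_t, mean_p = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean square error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [2.0, 4.0, 6.0, 8.0]  # perfectly correlated but badly scaled
```

On this toy input the Pearson correlation is 1 while the coefficient of determination is negative, so a score should always be reported with the exact metric it refers to.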

● CGC+SVR
○ R2 score: 0.678
○ RMSE: 1.489
● CGC+VAE+SVR
○ R2 score: 0.700
○ RMSE: 1.439
breast cancer (baseline)

● R2 score: 0.822 (on average)
● RMSE: 1.133 (on average)
CGC+MLP
RAW+VAE+MLP
breast cancer

[Figure panels: CGC+VAE+MLP (BRCA); CGC+VAE+MLP (PAN)]
breast cancer vs. pan-cancer

● High-Pearson-correlation prediction
○ 0.830 R2 score on breast cancer drug response prediction (our model)
○ 0.845 R2 score on pan-cancer drug response prediction (our model)
○ 0.843 R2 score: top score reported in similar research (CDRscan)
● Data encoded by VAE retains the features of input data.
● Accurate prediction on drugs with similar latent vectors and functional
groups
Conclusions

● Model Improvement
○ Attention-based model
○ Sequence generation model based on Graph Neural Network (GNN)
and Graph Attention Network (GAT)
○ Hyperparameter tuning (e.g., Batch Norm layers)
● Data Improvement
○ Add TCGA cell line data set (comparability & credibility)
○ Better gene subset selection (Protein-protein network propagation)
● Drug response prediction toolkit
Future work

Steven, Yuan, Varus, Dexin
Thank you all!
