
Deep learning based drug protein interaction


Predicting drug-target interactions (DTI) is an essential part of the drug discovery process, which is an expensive process in terms of time and cost. Therefore, reducing DTI cost could lead to reduced healthcare costs for a patient. In addition, a precisely learned molecule representation in a DTI model could contribute to developing personalized medicine, which will help many patient cohorts. In this paper, we propose a new molecule representation based on the self-attention mechanism, and a new DTI model using our molecule representation. The experiments show that our DTI model outperforms the state of the art by up to 4.9% points in terms of area under the precision-recall curve. Moreover, a study using the DrugBank database proves that our model effectively lists all known drugs targeting a specific cancer biomarker in the top-30 candidate list.



  1. 1. Aug/26/2019 Deep Learning based Drug Discovery Bonggun Shin 1
  2. 2. Aug/26/2019 /50 Outline • Problem definition • Drug discovery process • Drug target interaction (DTI) • Background • Sequence data in DTI • Recent trends in word embeddings • Previous SOTA in DTI • Molecule transformer !2
  3. 3. Aug/26/2019 /50 Drug Discovery Process Target Identification Molecule Discovery Molecule Optimization Clinical Test FDA Approval Repurposing Generating • Green - Physical or computer based (in-silico) experiments • Yellow - Animal and human experiments !3
  4. 4. Aug/26/2019 /50 Drug Repurposing • Safe - already approved drugs • Cheap - no need to come up with a new molecule Allarakhia, Minna. "Open-source approaches for the repurposing of existing or failed candidate drugs: learning from and applying the lessons across diseases." Drug design, development and therapy 7 (2013): 753 !4
  5. 5. Aug/26/2019 /50 Drug Target Interaction • Input: • Drug - molecule • Target - protein (biomarker) • Output: Interaction (affinity score) • Example: the EGFR protein (a cancer biomarker) has a high affinity score with Lapatinib (an anti-cancer drug) • If other, non anti-cancer drugs have high affinity scores with EGFR, they can be candidates for new anti-cancer drugs !5
  6. 6. Aug/26/2019 /50 Inputs of DTI • Sequence • Molecule (SMILES format) • Lapatinib: "CS(=O)(=O)CCNCC1=CC=C(O1)C2…" • protein (FASTA format) • EGFR: "MRPSGTAGAALLALLAALCPASRALE…" !6
  7. 7. Aug/26/2019 /50 Sequence Representation • Sequence: SMILES, FASTA, and text • Vector representation • One hot vector • (word/character) Embedding - more information • Once represented as a vector, we can apply many deep learning methods !7
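A minimal sketch (not part of the original slides) of the two vector representations mentioned on this slide, assuming Python/NumPy and a toy character vocabulary; the SMILES string and sizes are illustrative.

```python
# One-hot encoding vs. a learned character embedding for a SMILES string.
import numpy as np

smiles = "CN=C=O"                        # methyl isocyanate (illustrative example)
vocab = sorted(set(smiles))              # toy character vocabulary, e.g. ['=', 'C', 'N', 'O']
char2idx = {c: i for i, c in enumerate(vocab)}
ids = np.array([char2idx[c] for c in smiles])

# One-hot: each character becomes a sparse |V|-dimensional vector
one_hot = np.eye(len(vocab))[ids]        # shape (len(smiles), |V|)

# Embedding: each character indexes a row of a trainable dense matrix (more information per dim)
emb_dim = 8
embedding_table = np.random.randn(len(vocab), emb_dim) * 0.01
embedded = embedding_table[ids]          # shape (len(smiles), emb_dim)

print(one_hot.shape, embedded.shape)     # (6, 4) (6, 8)
```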
  8. 8. Aug/26/2019 /50 Recent trends in word embeddings • Local contextual embeddings: Word2vec [1] • RNN based contextual embeddings: ELMO [2] • Attention (w/o RNN) based contextual embeddings: Transformer [3] • The (current) final boss: BERT [4] 
 (Transformer+Masked LM) [1] Mikolov, Tomas, et al. "Distributed representations of words and phrases and their compositionality." NIPS 2013. [2] Peters, Matthew E., et al. "Deep contextualized word representations." NAACL (2018). [3] Vaswani, Ashish, et al. "Attention is all you need." Advances in Neural Information Processing Systems. 2017. [4] Devlin, Jacob, et al. "Bert: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018).!8
  9. 9. Aug/26/2019 /50 Word2Vec • Word representation: Word -> Vector • W2V: uses local context words when calculating a representation vector for a target word • EX) When inferring the red word, the 4 context words (blue) are used !9
  10. 10. Aug/26/2019 /50 Word2Vec • How to train word2vec • Example sentence: 
 I go to Emory University located in Atlanta. • Input - context words: "I", "go", "Emory", "University" • Output - target word: "to" (see the sketch below) !10
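A minimal CBOW-style word2vec training step, mirroring the slide's example; PyTorch and the toy vocabulary are assumptions for illustration.

```python
# One CBOW step: average the context embeddings and predict the target word "to".
import torch
import torch.nn as nn

sentence = "I go to Emory University located in Atlanta .".split()
vocab = {w: i for i, w in enumerate(sorted(set(sentence)))}

emb_dim = 16
embed = nn.Embedding(len(vocab), emb_dim)        # input (context) embeddings
out = nn.Linear(emb_dim, len(vocab))             # output (target) projection

context = torch.tensor([vocab[w] for w in ["I", "go", "Emory", "University"]])
target = torch.tensor([vocab["to"]])

hidden = embed(context).mean(dim=0, keepdim=True)        # (1, emb_dim) averaged context
loss = nn.functional.cross_entropy(out(hidden), target)  # predict the center word
loss.backward()                                          # gradients update both matrices
print(float(loss))
```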
  11. 11. Aug/26/2019 /50 ELMO • Concatenation of independently trained left-to-right and right-to-left LSTMs • It considers all words in a sentence to represent a word • Long sequence -> information vanishing problem !11
  12. 12. Aug/26/2019 /50 How to train ELMO • For simplicity, assume word-level embeddings (the actual model uses character-level embeddings) • Same example sentence as word2vec: 
 I go to Emory University located in Atlanta. • Input • left-to-right model: "I", "go" • right-to-left model: "Atlanta", "in", "located", "University", "Emory" • Output • "to" (see the sketch below) !12
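A minimal sketch of ELMo-style training under the slide's word-level simplification: two independently trained language models (left-to-right and right-to-left LSTMs) whose hidden states are concatenated to represent each word. PyTorch and the toy sizes are assumptions.

```python
import torch
import torch.nn as nn

sentence = "I go to Emory University located in Atlanta .".split()
vocab = {w: i for i, w in enumerate(sorted(set(sentence)))}
ids = torch.tensor([[vocab[w] for w in sentence]])       # (1, seq_len)

emb_dim, hid = 16, 32
embed = nn.Embedding(len(vocab), emb_dim)
fwd_lstm = nn.LSTM(emb_dim, hid, batch_first=True)       # left-to-right LM
bwd_lstm = nn.LSTM(emb_dim, hid, batch_first=True)       # right-to-left LM (fed reversed input)
proj = nn.Linear(hid, len(vocab))

x = embed(ids)
h_fwd, _ = fwd_lstm(x)                                   # state at t predicts token t+1
h_bwd, _ = bwd_lstm(torch.flip(x, dims=[1]))             # state at t predicts the previous token
h_bwd = torch.flip(h_bwd, dims=[1])

# Forward-LM loss only, for brevity ("I go" -> "to", ...); the backward loss is analogous
logits = proj(h_fwd[:, :-1])
loss = nn.functional.cross_entropy(logits.reshape(-1, len(vocab)), ids[:, 1:].reshape(-1))

# The representation of each word is (roughly) the concatenation of both directions
elmo_repr = torch.cat([h_fwd, h_bwd], dim=-1)            # (1, seq_len, 2*hid)
print(float(loss), elmo_repr.shape)
```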
  13. 13. Aug/26/2019 /50 Transformer • Calculating a vector for a word using all words in the sentence • Attention is all you need! • replaces Embedding+RNN with the transformer (self-attention) !13
  14. 14. Aug/26/2019 /50 Transformer • Model for machine translation • Trained (sub) model can be used as word representation model * All transformer figures in this slide are from http://jalammar.github.io/illustrated-transformer/ !14
  15. 15. Aug/26/2019 /50 Encoder-Decoder !15
  16. 16. Aug/26/2019 /50 Encoder • Encoders can be stacked on top of each other (sequence length is preserved) • Input words are transformed into randomly initialized vectors, x_i • Each encoder consists of two parts: self-attention and a feed-forward network !16
  17. 17. Aug/26/2019 /50 Self-Attention High level explanation • The vector for the token "it_" can be calculated as a weighted sum (Attention) of all tokens in the same sentence (Self). !17
  18. 18. Aug/26/2019 /50 Weighted Sum • Get three helper vectors • For a given token, calculate the scores of all other tokens • Normalize those scores to get weights • Weighted sum !18
  19. 19. Aug/26/2019 /50 Three (Helper) Vectors • Query, Key, and Value vectors are used when calculating hidden representations • These helper vectors are just projections of the input using the trainable parameters Wq, Wk, and Wv !19
  20. 20. Aug/26/2019 /50 Scoring • Calculate scores for each word with respect to the token "Thinking" using (query, key) • For example: "Thinking": 112, "Machines": 96 • Repeat this for all other tokens • The value vectors will be used in the next step !20
  21. 21. Aug/26/2019 /50 Self-Attention • Divide by 8 • The square root of the dimension of the key vectors (the paper used dim=64) • Softmax: normalize the scores to sum to one • The hidden representation is a weighted sum of the value vectors (see the sketch below) !21
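A minimal NumPy sketch of the single-head scaled dot-product self-attention described on slides 18-21; the sequence length and dimensions are illustrative assumptions matching the "Thinking Machines" example.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

seq_len, d_model, d_k = 2, 512, 64               # e.g. ["Thinking", "Machines"]
X = np.random.randn(seq_len, d_model)            # input embeddings x_i

# Three helper projections from the trainable parameters Wq, Wk, Wv
Wq, Wk, Wv = (np.random.randn(d_model, d_k) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv

scores = Q @ K.T                                 # score of every token w.r.t. every token
weights = softmax(scores / np.sqrt(d_k))         # divide by 8 = sqrt(64), then normalize
Z = weights @ V                                  # weighted sum of the value vectors

print(weights.shape, Z.shape)                    # (2, 2) (2, 64)
```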
  22. 22. Aug/26/2019 /50 Multi-Heads • Multiple filters in a CNN, multiple heads in the Transformer • 8 heads: 8 sets of trainable params Wq, Wk, and Wv, and 8 sets of z1 and z2 • Different heads are expected to learn different aspects !22
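A short sketch of the multi-head version using PyTorch's built-in module (an illustration, not the slide's own code); the original Transformer uses 8 heads of 64 dimensions each.

```python
import torch
import torch.nn as nn

d_model, n_heads, seq_len = 512, 8, 2
mha = nn.MultiheadAttention(embed_dim=d_model, num_heads=n_heads, batch_first=True)

x = torch.randn(1, seq_len, d_model)             # (batch, seq_len, d_model)
z, attn_weights = mha(x, x, x)                   # query = key = value = x  (self-attention)
print(z.shape, attn_weights.shape)               # torch.Size([1, 2, 512]) torch.Size([1, 2, 2])
```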
  23. 23. Aug/26/2019 /50 FeedForward This is the output of one encoder layer (R) !23
  24. 24. Aug/26/2019 /50 Positional Encoding • Why PE? - Need to distinguish "I am a student" vs "am student I a" • Special patterns representing the order of words (see the sketch below) !24
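A minimal sketch of the sinusoidal positional encoding from "Attention is All You Need", which gives each position a distinct pattern; NumPy and the sizes are assumptions for illustration.

```python
import numpy as np

def positional_encoding(max_len, d_model):
    pos = np.arange(max_len)[:, None]                    # (max_len, 1)
    i = np.arange(d_model // 2)[None, :]                 # (1, d_model/2)
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)                         # even dimensions: sine
    pe[:, 1::2] = np.cos(angles)                         # odd dimensions: cosine
    return pe

pe = positional_encoding(max_len=100, d_model=512)
# The encoding is simply added to the input embeddings: x = embedding + pe[:seq_len]
print(pe.shape)                                          # (100, 512)
```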
  25. 25. Aug/26/2019 /50 BERT • Google AI Language Team • 10 months ago • 1100+ citations • SOTA on eleven natural language processing tasks • Outperforms humans on the SQuAD task • Transformer + new language model tasks !25
  26. 26. Aug/26/2019 /50 Overview • Based on the Transformer • Two new tasks • Modified input representation (Figure: stacked Transformer layers with MaskedLM and IsNext prediction heads) !26
  27. 27. Aug/26/2019 /50 Input Representation • Segment Embedding (0: first sentence, 1: second sentence) • Position Embedding (same as the Transformer) • [CLS] is used for classification tasks; [SEP] is the sentence separator * Adopted from the BERT paper !27
  28. 28. Aug/26/2019 /50 Masked LM Task • Given some "masked tokens" in a sentence, the task is to predict the original tokens • Original: I am a student • Input: I [MASK] a student, location (1) • Output: am • Downsides • The [MASK] token is never seen during fine-tuning -> mitigated with 80% [MASK], 10% random token, 10% no change • Only 15% of tokens are masked -> learning a general language model is slow • 4 days on a TPU (v2-128) (worth about $10,000 in Google Cloud credits) (see the masking sketch below) !28
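A minimal sketch of the 80/10/10 masking procedure described above, assuming word-level tokens and a toy vocabulary for illustration.

```python
# Choose ~15% of tokens; of those, 80% become [MASK], 10% a random token, 10% stay unchanged.
import random

def mask_tokens(tokens, vocab, mask_prob=0.15):
    inputs, labels = list(tokens), []
    for i, tok in enumerate(tokens):
        if random.random() < mask_prob:
            labels.append((i, tok))                  # the original token is the prediction target
            r = random.random()
            if r < 0.8:
                inputs[i] = "[MASK]"
            elif r < 0.9:
                inputs[i] = random.choice(vocab)     # random replacement
            # else: keep the token unchanged
    return inputs, labels

vocab = "I am a student you we he go to".split()
print(mask_tokens("I am a student".split(), vocab))
# e.g. (['I', '[MASK]', 'a', 'student'], [(1, 'am')])
```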
  29. 29. Aug/26/2019 /50 IsNext Task • Input consists of two sentences • Original: I am a student / I go to Emory • Input: [CLS] I [MASK] a student [SEP] I go [MASK] Emory [SEP] (masked token locations: "2", "8") • Output of MaskedLM: 
 "am", "to" • Output of IsNext: 
 1 (the 2nd sentence is the next sentence of the 1st) !29
  30. 30. Aug/26/2019 /50 Model Config • Base • Hidden-dim: 768 • A-Head: 12 • Layer: 12 • 110M parameters • Large • Hidden-dim: 1024 • A-Head: 16 • Layer: 24 • 340M parameters !30
  31. 31. Aug/26/2019 /50 Finetuning (1/4) Sentence Pair Classification • MRPC • One sentence is a paraphrase of the other • The task is to predict whether the two given sentences are semantically equivalent • X: ("I go to Emory", 
 "I am an Emory student") • Y: yes (equivalent) * From the BERT paper !31
  32. 32. Aug/26/2019 /50 Finetuning (2/4) Single Sentence Classification • SST • Movie review • X: This movie is fun • Y: positive * From the BERT paper !32
  33. 33. Aug/26/2019 /50 Finetuning (3/4) Question Answering • SQuAD • Given question and paragraph pairs, the task is to select a word/phrases that could answer the given question • X: (Q: "Where is Emory?", P: "Emory University is a private research university in the Druid Hills neighborhood of the city of Atlanta, Georgia, United States") • Y: (Druid, States) * From the BERT paper !33
  34. 34. Aug/26/2019 /50 Finetuning (4/4) Single Sentence Tagging (NER) • Named Entity Recognition (NER) • Named entity? • Organization (Emory) • People (Bill Gates) • Location (Atlanta) … • Given a sentence, the task is to tag each word that indicates a named entity • X: "Dr. Xiong is a professor at Emory" • Y: B-PER I-PER O O O O B-ORG * From the BERT paper !34
  35. 35. Aug/26/2019 /50 DeepDTA • DeepDTA: Deep drug-target affinity • Previous SOTA in DTI • Bioinformatics (IF=5.481) • Task: predicting affinity scores • One-hot embedding + CNN + Dense (Figure: Drug Vector -> CNN, Target Vector -> CNN, concatenated -> FFNN -> Regression) !35
  36. 36. Aug/26/2019 /50 Convolution Operation (Figure: a 1-D convolution filter is applied to the first window of the Lapatinib SMILES string CS(=O)(=O)CCNCC1=CC=C(O1)C2=CC3=C(C=C2)N=CN=C3NC4=CC(=C(C=C4)OCC5=CC(=CC=C5)F)Cl, producing one output value) !36
  37. 37. Aug/26/2019 /50 Convolution Operation (Figure: the filter slides to the next window and produces the next output value) !37
  38. 38. Aug/26/2019 /50 Convolution Operation (Figure: sliding continues, one output value per window) !38
  39. 39. Aug/26/2019 /50 Convolution Operation (Figure: the filter is applied across the entire drug SMILES sequence) !39
  40. 40. Aug/26/2019 /50 Convolution Operation (Figure: the same operation is applied to the protein FASTA sequence MRPSGTAGAALLALLAALCPASRALEEKKVCQGTSNKLTQLGTFEDHFLSLQRMFNNCEV…APQSSEFIGA; the resulting drug and target vectors feed the CNN + FFNN regression model, sketched below) !40
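A rough sketch of a DeepDTA-style architecture, assuming PyTorch and toy hyper-parameters (the actual paper's filter sizes and layer counts differ): character embeddings, a 1-D CNN tower per sequence, concatenation, and a feed-forward regressor for the affinity score.

```python
import torch
import torch.nn as nn

class DeepDTALike(nn.Module):
    def __init__(self, drug_vocab=64, prot_vocab=26, emb=128, filters=32):
        super().__init__()
        self.drug_emb = nn.Embedding(drug_vocab, emb)
        self.prot_emb = nn.Embedding(prot_vocab, emb)
        self.drug_cnn = nn.Sequential(nn.Conv1d(emb, filters, kernel_size=4), nn.ReLU(),
                                      nn.AdaptiveMaxPool1d(1))
        self.prot_cnn = nn.Sequential(nn.Conv1d(emb, filters, kernel_size=8), nn.ReLU(),
                                      nn.AdaptiveMaxPool1d(1))
        self.ffnn = nn.Sequential(nn.Linear(2 * filters, 512), nn.ReLU(),
                                  nn.Linear(512, 1))        # affinity score (regression)

    def forward(self, drug_ids, prot_ids):
        d = self.drug_cnn(self.drug_emb(drug_ids).transpose(1, 2)).squeeze(-1)
        p = self.prot_cnn(self.prot_emb(prot_ids).transpose(1, 2)).squeeze(-1)
        return self.ffnn(torch.cat([d, p], dim=-1))

model = DeepDTALike()
drug = torch.randint(0, 64, (2, 100))     # batch of encoded SMILES characters
prot = torch.randint(0, 26, (2, 1000))    # batch of encoded FASTA characters
print(model(drug, prot).shape)            # torch.Size([2, 1])
```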
  41. 41. Aug/26/2019 /50 Limitations of CNN • F and Cl will be convolved together, but they are actually far apart • Cl will never be convolved with N, even though they are closer than F • Local context (CNN) -> global context (self-attention) CS(=O)(=O)CCNCC1=CC=C(O1)C2=CC3=C(C=C2)N=CN=C3NC4=CC(=C(C=C4)OCC5=CC(=CC=C5)F)Cl !41
  42. 42. Aug/26/2019 /50 Molecule Transformer • BERT based sequence representation • Pre-training: only masked LM • Special tokens - [CLS] [MASK] [SEP] vs [REP] [MASK] [BEGIN] [END] !42
  43. 43. Aug/26/2019 /50 Special Tokens • [REP]: same as [CLS] • [BEGIN]/[END]: mark the true beginning/end of a sequence; long sequences (>100 tokens) are truncated and lose these markers • length<100: [REP] [BEGIN] C N = C = O [END] • length>100: [REP] C C = ( N == C) Cl … O = C (see the sketch below) !43
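A hypothetical sketch of the special-token handling described on this slide; the character-level tokenization, the helper name `encode_smiles`, and the simple head-truncation strategy are illustrative assumptions, not the paper's exact implementation.

```python
def encode_smiles(smiles, max_len=100):
    tokens = list(smiles)                         # character-level tokenization (simplification)
    if len(tokens) < max_len:
        # short sequence: keep the [BEGIN]/[END] markers
        return ["[REP]", "[BEGIN]"] + tokens + ["[END]"]
    # long sequence: truncate, so the [BEGIN]/[END] markers are dropped
    return ["[REP]"] + tokens[:max_len]

print(encode_smiles("CN=C=O"))
# ['[REP]', '[BEGIN]', 'C', 'N', '=', 'C', '=', 'O', '[END]']
```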
  44. 44. Aug/26/2019 /50 Pre-train • PubChem database • 97,092,853 molecules • Parameters: 8 layers, 8 heads, and hidden vector size 128 • On an 8-core TPU machine, the pre-training took about 58 hours • Masked LM task result: 0.9727 
 (BERT was 0.9855) • Masking (MT-DTI): 15% of SMILES tokens are chosen at random per molecule sequence; each chosen token is replaced with [MASK] with probability 0.8, and the other 20% of the time it is replaced with a random SMILES token or kept unchanged, with equal probability; the target label is the chosen token together with its index • Example for methyl isocyanate (CN=C=O): input: [REP] [BEGIN] C N = [MASK] = O [END], label: (C, 5) • Fine-tuning: the weights of the pre-trained Transformers are used to initialize the Molecule Transformers in the proposed MT-DTI model !44
  45. 45. Aug/26/2019 /50 Fine-tuning • The protein branch uses a CNN without pre-training - the number of distinct proteins is small !45
  46. 46. Aug/26/2019 /50 Evaluation Metrics • C-Index: the probability that two random samples are correctly ordered • MSE (mean squared error) • r_m^2, a metric used in QSAR [1]: r_m^2 = r^2 (1 - sqrt(r^2 - r_0^2)), where r^2 and r_0^2 are the squared correlation coefficients with and without intercept, respectively; an acceptable model has a value greater than 0.5 • AUPR - Area under the precision-recall curve (Figure: Concordance Index (C-Index) and QSAR metric formulas; a sketch of C-Index and MSE follows below) !46 [1] Partha Pratim Roy, Somnath Paul, Indrani Mitra, and Kunal Roy. On two novel parameters for validation of predictive QSAR models. Molecules, 14(5):1660–1701, 2009.
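A minimal NumPy sketch of two of the metrics (MSE and the pairwise concordance index); the toy values are illustrative only.

```python
import numpy as np

def mse(y_true, y_pred):
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

def c_index(y_true, y_pred):
    """Probability that two randomly drawn samples are correctly ordered by the prediction."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    num, den = 0.0, 0
    for i in range(len(y_true)):
        for j in range(len(y_true)):
            if y_true[i] > y_true[j]:                 # only comparable (ordered) pairs count
                den += 1
                if y_pred[i] > y_pred[j]:
                    num += 1.0
                elif y_pred[i] == y_pred[j]:
                    num += 0.5                        # ties get half credit
    return num / den

y_true = [5.0, 7.2, 6.1, 8.4]   # toy affinity scores
y_pred = [5.5, 7.0, 6.5, 8.0]
print(mse(y_true, y_pred), c_index(y_true, y_pred))
```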
  47. 47. Aug/26/2019 /50 Result • Five-fold CV • MT-DTI outperforms all the other methods on all four metrics • MT-DTI w/o FT • Outperforms the similarity-based methods • Performs better than DeepDTA on some metrics !47
  48. 48. Aug/26/2019 /50 Case Study Design • Goal: to find drugs (among FDA-approved drugs) targeting a specific protein, the epidermal growth factor receptor (EGFR) • FDA-approved drugs: 1,794 molecules in the DrugBank database • EGFR: a well-known gene related to many cancer types • Method: infer scores between EGFR and the 1,794 selected drugs and sort in descending order (see the sketch below) • Expected result: actual EGFR-targeting drugs will be highly ranked !48
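A hypothetical sketch of the case-study procedure: score every FDA-approved drug against EGFR with a trained DTI model and rank by predicted affinity. The names `model.predict`, `egfr_fasta`, and `drugbank_smiles` are illustrative placeholders, not the authors' actual API.

```python
def rank_drugs_for_target(model, target_fasta, drug_smiles_dict, top_k=30):
    # Predicted affinity (e.g. a KIBA score) for every (drug, target) pair
    scores = {name: model.predict(smiles, target_fasta)
              for name, smiles in drug_smiles_dict.items()}
    # Sort in descending order and keep the top-k candidate list
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]

# top30 = rank_drugs_for_target(mt_dti_model, egfr_fasta, drugbank_smiles)
```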
  49. 49. Aug/26/2019 /50 Case Study Result • All existing EGFR drugs (8 out of 1,794 drugs) are listed in the top 30 • A KIBA score > 12.1 indicates binding with the target • Other, non-EGFR drugs ranked highly might be new anti-cancer drug candidates !49
  50. 50. Aug/26/2019 /50 Discussion • Summary • Pre-train a self-attention network with 97M molecules • Fine-tune the self-attention network for DTI prediction • Results • A new SOTA for DTI • Promising drug candidates targeting a specific protein • Published at MLHC’19 (JMLR) • Future directions • Molecule generation • Molecule optimization !50
