
Biomedical Word Sense Disambiguation presentation


  1. Biomedical Word Sense Disambiguation with Neural Word and Concept Embedding. AKM Sabbir. Advisor: Dr. Ramakanth Kavuluru. Department of Computer Science, University of Kentucky. Oct 7, 2016
  2. Outline ➤ Introduction • Application of Word Sense Disambiguation (WSD) • Motivation • Related Methods to Solve WSD • Our Method • Word Vectors • Tools Used • Our Method in Detail • Experiment and Analysis • Conclusion
  3. Introduction • WSD is the task of detecting and assigning the correct sense of an ambiguous word in context – the air in the center of the vortex of a cyclone is generally very cold (temperature sense) – I could not come to the office last week because I had a cold (illness sense) • Retrieving information from machines is not an easy task • Many Natural Language Processing (NLP) tasks require WSD
  4. Outline • Introduction ➤ Application of Word Sense Disambiguation (WSD) • Motivation • Related Methods to Solve WSD • Our Method • Word Vectors • Tools Used • Our Method in Detail • Experiment and Analysis • Conclusion
  5. Application • Text-to-Speech Conversion – bass can be pronounced /beɪs/ (the instrument) or /bæs/ (the fish) • Machine Translation – the French word grille can be translated as gate or bar • Information Retrieval • Named Entity Recognition • Document Summary Generation
  6. Outline • Introduction • Application of Word Sense Disambiguation (WSD) ➤ Motivation • Related Methods to Solve WSD • Our Method • Word Vectors • Tools Used • Our Method in Detail • Experiment and Analysis • Conclusion
  7. Motivation • Generalized WSD is a difficult problem • Solving it for each domain separately is more tractable • The biomedical domain contains a large number of ambiguous words • Applications: medical report summary generation, drug side-effect prediction
  8. Outline • Introduction • Application of Word Sense Disambiguation (WSD) • Motivation ➤ Related Methods to Solve WSD • Our Method • Word Vectors • Tools Used • Our Method in Detail • Experiment and Analysis • Conclusion
  9. Related Methods • Supervised Methods – Support Vector Machines, Convolutional Neural Nets • Unsupervised Methods – clustering, generative models – e.g., for a vocabulary of four words w1, w2, w3, w4, a co-occurrence matrix can be built [Table: rows and columns indexed by w1 … w4; the row for w1 begins 1/5, 2/5, 0, …] • Knowledge-Based Methods – WordNet, UMLS (Unified Medical Language System)
  10. Outline • Introduction • Application of Word Sense Disambiguation (WSD) • Motivation • Related Methods to Solve WSD ➤ Our Method • Word Vectors • Tools Used • Our Method in Detail • Experiment and Analysis • Conclusion
  11. Our Method • We build a semi-supervised model • The model uses concept/sense/CUI vectors just like word vectors (more later) • MetaMap is a knowledge-based NER tool; its decisions are used to generate concept vectors • The model also uses P(w|c), where c is a concept or sense, estimated using other knowledge-based approaches • Word vectors are generated from an unstructured data source
  12. Outline • Introduction • Application of Word Sense Disambiguation (WSD) • Motivation • Related Methods to Solve WSD • Our Method ➤ Word Vectors • Tools Used • Our Method in Detail • Experiment and Analysis • Conclusion
  13. What is a Word Vector? • Distributed representation of words • The representation of a word is spread across all dimensions of the vector • The idea differs from other representations whose length equals the vocabulary size; here we choose a small dimension, say d = 200, and generate dense vectors • Each element of the vector contributes to the definition of many different words • Example dense vectors: King = [0.07, 0.05, 0.8, 0.002, 0.1, 0.3], Queen = [0.7, 0.05, 0.67, 0.002, 0.2, 0.3]
  14. What is a Word Vector? (contd.) • It is a numerical way of representing words • Each dimension captures some semantic and syntactic information related to the word • Using the same idea, we can generate concept/sense/CUI vectors
  15. Why Do Word Vectors Work? • Learned word vectors capture the syntactic and semantic information that exists in text – vector("king") – vector("man") + vector("woman") ≈ vector("queen") [Fig 5: the resultant queen vector and other vectors [5]]
  16. Outline • Introduction • Application of Word Sense Disambiguation (WSD) • Motivation • Related Methods to Solve WSD • Our Method • Word Vectors ➤ Tools Used • Our Method in Detail • Experiment and Analysis • Conclusion
  17. Required Tools • Language model (Gensim word2vec; see the Conclusion slide)
  18. Required Tools Contd. • MetaMap – Step 1, Parsing: text is parsed into noun phrases using the Xerox POS tagger to perform syntactic analysis [4] – Step 2, Variant Generation: variants for each input phrase are generated using the SPECIALIST lexicon and a supplementary database of synonyms – Step 3, Candidate Retrieval: the candidate sets retrieved from the UMLS Metathesaurus contain at least one of the variants generated in Step 2 – Step 4, Candidate Evaluation [Fig 2: variants for the word "ocular"; Fig 3: evaluated candidates for "ocular complication"]
  19. Outline • Introduction • Application of Word Sense Disambiguation (WSD) • Motivation • Related Methods to Solve WSD • Our Method • Word Vectors • Tools Used ➤ Our Method in Detail • Experiment and Analysis • Conclusion
  20. Our Method in Detail • Text preprocessing – remove English stop words – NLTK word tokenization – keep words with frequency greater than five – lowercase everything • Word contexts are ten words long • Generated word vectors are 300-dimensional (a preprocessing sketch follows below)
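A minimal Python sketch of these preprocessing steps with NLTK; the function name and the alphabetic-token filter are illustrative, and the punkt and stopwords data must be downloaded once via nltk.download.

```python
from collections import Counter

from nltk.corpus import stopwords          # requires nltk.download("stopwords")
from nltk.tokenize import word_tokenize    # requires nltk.download("punkt")

STOP_WORDS = set(stopwords.words("english"))

def preprocess(documents, min_freq=5):
    """Lowercase, tokenize, drop stop words, and drop rare tokens."""
    tokenized = [
        [t.lower() for t in word_tokenize(doc)
         if t.isalpha() and t.lower() not in STOP_WORDS]
        for doc in documents
    ]
    counts = Counter(t for doc in tokenized for t in doc)
    # Keep only tokens whose corpus frequency is greater than five.
    return [[t for t in doc if counts[t] > min_freq] for doc in tokenized]
```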
  21. Generating Word and Concept Vectors • Generate word and concept vectors • 20 million citations from PubMed for training word vectors • Randomly chose 5 million citations • Retrieved 7.1 million sentences containing the target ambiguous words • Each sentence is 16-17 words long • The combined sentences are used to generate bigrams • Each bigram is fed into MetaMap with the WSD option turned on • Each bigram is replaced with its corresponding concept • The data is then fed to the language model to generate concept vectors (see the sketch below)
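Vector training itself can be done with Gensim's word2vec, listed among the tools on the Conclusion slide. A hedged sketch, assuming `sentences` holds token lists in which MetaMap-detected bigrams have already been replaced by their CUIs, so words and concepts share one embedding space; the CUI shown is illustrative, and parameter names follow Gensim 4 (older versions use `size` instead of `vector_size`):

```python
from gensim.models import Word2Vec

# `sentences` mixes surface words and CUI tokens, e.g.
# [["patient", "C0009443", "symptoms"], ...]
model = Word2Vec(
    sentences,
    vector_size=300,  # 300-dimensional vectors, per the preprocessing slide
    window=10,        # ten-word context window
    min_count=5,      # frequency threshold of five
    workers=8,
)

word_vec = model.wv["patient"]       # word vector
concept_vec = model.wv["C0009443"]   # concept (CUI) vector
```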
  22. Estimate P(D|c) [Yepes et al.] • Following Jimeno-Yepes and Berlanga [3], a Markov chain model is used to calculate P(D|c) • To get P(D|c), we need to calculate P(w|c): $P(w_j \mid c_i) = \frac{\mathrm{count}(w_j, c_i)}{\sum_{w_j \in \mathrm{lex}(c_i)} \mathrm{count}(w_j, c_i)}$
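As a concrete illustration, the maximum-likelihood estimate above can be computed from (concept, word) co-occurrence counts; the nested-dict layout is an assumption, not part of the original model:

```python
from collections import defaultdict

def estimate_p_w_given_c(counts):
    """counts[c][w] holds count(w, c) over the lexicon lex(c).
    Returns p[c][w] = count(w, c) / sum over lex(c) of count(w', c)."""
    p = defaultdict(dict)
    for c, word_counts in counts.items():
        total = sum(word_counts.values())
        for w, n in word_counts.items():
            p[c][w] = n / total
    return p
```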
  23. Biomedical MSH WSD • A dataset with 203 ambiguous words • 424 unique concept identifiers (senses) • 38,495 test context instances, with an average of roughly 200 test instances per ambiguous word • Goal: correctly identify the sense for each test instance
  24. Model I: Cosine Similarity $f_c(T, w, C(w)) = \operatorname{argmax}_{c \in C(w)} \cos(T_{avg}, c)$ • w is the ambiguous word • T is the test instance context containing the ambiguous word w • C(w) is the set of concepts (senses) that w can assume • $T_{avg}$ denotes the average of the word vectors in T
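A minimal sketch of Model I, assuming `wv` maps both words and CUIs to numpy vectors (for instance, a trained Gensim KeyedVectors object); the function names are illustrative:

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def model_one(context_tokens, candidate_cuis, wv):
    # T_avg: average of the context word vectors present in the vocabulary.
    t_avg = np.mean([wv[t] for t in context_tokens if t in wv], axis=0)
    # Pick the candidate concept whose vector is most cosine-similar to T_avg.
    return max(candidate_cuis, key=lambda c: cosine(t_avg, wv[c]))
```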
  25. Model II: Projection Magnitude $f_p(T, w, C(w)) = \operatorname{argmax}_{c \in C(w)} \rho(\cos(T_{avg}, c)) \cdot \frac{\lVert \Pr(T_{avg}, c) \rVert}{\lVert c \rVert}$ • Take the projection of $T_{avg}$ along the concept vector c and consider its Euclidean norm, normalized by $\lVert c \rVert$; $\rho(\cdot)$ retains the sign of the cosine
  26. Model III $f_{c,p}(T, w, C(w)) = \operatorname{argmax}_{c \in C(w)} \cos(T_{avg}, c) \cdot \frac{\lVert \Pr(T_{avg}, c) \rVert}{\lVert c \rVert}$ • Combines both the angular component (Model I) and the magnitude component (Model II)
  27. Model IV $f(T, w, C(w)) = \operatorname{argmax}_{c \in C(w)} \left[ \cos(T_{avg}, c) \cdot \frac{\lVert \Pr(T_{avg}, c) \rVert}{\lVert c \rVert} + P(T \mid c) \right]$ • Adds the knowledge-based term $P(T \mid c)$ from [3] to Model III (a scoring sketch for Models II-IV follows below)
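A sketch of the Model II-IV scores under the formulas as reconstructed above. Since the projection of T_avg onto c has length |T_avg · c| / ‖c‖, the normalized term ‖Pr(T_avg, c)‖ / ‖c‖ reduces to |T_avg · c| / ‖c‖²; the `p_t_given_c` argument is assumed to come from the knowledge-based estimate of [3]:

```python
import numpy as np

def proj_ratio(t_avg, c_vec):
    # ||projection of t_avg onto c_vec|| divided by ||c_vec||.
    return abs(np.dot(t_avg, c_vec)) / (np.linalg.norm(c_vec) ** 2)

def model_three_score(t_avg, c_vec):
    # Model III: angular component times magnitude component.
    cos = np.dot(t_avg, c_vec) / (np.linalg.norm(t_avg) * np.linalg.norm(c_vec))
    return cos * proj_ratio(t_avg, c_vec)

def model_four_score(t_avg, c_vec, p_t_given_c):
    # Model IV: Model III plus the knowledge-based term P(T|c).
    return model_three_score(t_avg, c_vec) + p_t_given_c
```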
  28. Model V: k-NN • We now have multiple ways to resolve the sense of an ambiguous term • Built a distantly supervised dataset by collecting data from biomedical citations • For each ambiguous word there are on average 40,000 sentences • Resolved the sense of each sentence using Model IV
  29. k-NN in Pseudocode
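The pseudocode on this slide did not survive extraction; below is a minimal re-creation of the k-NN rule defined on the next slide, assuming each training entry carries the averaged context vector of a distantly labeled sentence together with its Model IV sense:

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def knn_sense(t_avg, training, k=3500):
    """training: list of (d_avg, sense) pairs; d_avg is the averaged
    context vector of a distantly labeled training sentence."""
    # Rank all training instances by cosine similarity to the test context.
    sims = sorted(
        ((cosine(t_avg, d_avg), sense) for d_avg, sense in training),
        key=lambda pair: pair[0],
        reverse=True,
    )
    # Sum the similarities of the k nearest neighbours per candidate sense.
    scores = {}
    for sim, sense in sims[:k]:
        scores[sense] = scores.get(sense, 0.0) + sim
    return max(scores, key=scores.get)
```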
  30. k-NN contd. $f_{k\text{-}NN}(T, w, C(w)) = \operatorname{argmax}_{c \in C(w)} \sum_{(D, w, c) \in R_k(D_w)} \cos(T_{avg}, D_{avg})$, where $R_k(D_w)$ is the set of the k training instances nearest to the test context and $D_{avg}$ is a training instance's averaged context vector. [Figure: cosine similarity between the test instance and each training instance yields per-instance scores for candidate senses, e.g. (c_1, 0.7), (c_1, 0.9), (c_2, 0.1), (c_1, 0.03), (c_2, 0.02), …]
  31. k-NN Accuracy Graph
  32. Distant Supervision with CNNs • Used the refined assignment of CUIs to sentences as the training set • Used the MSH WSD data as the test set • Trained 203 Convolutional Neural Nets, one per ambiguous word • Each with one convolutional layer and one hidden layer • Used 900 filters of 3 different sizes • Used the held-out test cases for evaluation (an architecture sketch follows below)
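A hedged Keras sketch of one such per-word CNN (the original work used Theano; the specific filter sizes, hidden-layer width, and sequence length below are assumptions, with only the 900-filter total, single convolutional layer, and single hidden layer taken from the slide):

```python
from tensorflow.keras import Model, layers

def build_wsd_cnn(seq_len=17, vocab_size=50000, embed_dim=300, num_senses=2):
    inp = layers.Input(shape=(seq_len,))
    emb = layers.Embedding(vocab_size, embed_dim)(inp)
    # One convolutional layer: 900 filters split across 3 filter sizes.
    pooled = []
    for size in (3, 4, 5):                 # assumed sizes; 3 x 300 = 900 filters
        conv = layers.Conv1D(300, size, activation="relu")(emb)
        pooled.append(layers.GlobalMaxPooling1D()(conv))
    hidden = layers.Dense(100, activation="relu")(layers.Concatenate()(pooled))
    out = layers.Dense(num_senses, activation="softmax")(hidden)
    model = Model(inp, out)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    return model
```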
  33. Distant Supervision Using CNN
  34. Ensembling of CNNs • Five CNNs are trained and tested for each ambiguous word • Average their outputs and take the best-scoring sense • Tends to improve results at the cost of extra computation (see the sketch below)
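The averaging step itself is simple; a sketch assuming the five trained Keras models from the previous slide:

```python
import numpy as np

def ensemble_predict(models, x):
    # Average the softmax outputs of the five CNNs, then take the argmax sense.
    probs = np.mean([m.predict(x, verbose=0) for m in models], axis=0)
    return probs.argmax(axis=-1)
```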
  35. Outline • Introduction • Application of Word Sense Disambiguation (WSD) • Motivation • Related Methods to Solve WSD • Our Method • Word Vectors • Tools Used • Our Method in Detail ➤ Experiment and Analysis • Conclusion
  36. Results and Analysis • Jimeno-Yepes and Berlanga [3]: 89.10% • Cosine similarity (Model I, $f_c$): 85.54% • Projection length proportion (Model II, $f_p$): 88.68% • Combining Models I and II (Model III, $f_{c,p}$): 89.26% • Combining Models I, II and [3] (Model IV): 92.24% • Convolutional Neural Net: 86.17% • Ensembled CNNs: 87.78% • k-NN with k = 3500 ($f_{k\text{-}NN}$): 94.34%
  37. Outline • Introduction • Application of Word Sense Disambiguation (WSD) • Motivation • Related Methods to Solve WSD • Our Method • Word Vectors • Tools Used • Our Method in Detail • Experiment and Analysis ➤ Conclusion
  38. Conclusion • The developed model is highly accurate, beating the previous best • It is unsupervised: no hand-labeled information is required • It is scalable, though the resulting accuracy level is uncertain – by increasing the number of training sentences and the sentence context, more information may be extractable • Graph-based algorithms need to be explored • Tools used: HPC, Theano, NLTK, Gensim word2vec
  39. Questions
  40. References 1. Eneko Agirre and Philip Edmonds. Word Sense Disambiguation: Algorithms and Applications, volume 33. Springer Science & Business Media, 2007. 2. Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Janvin. A neural probabilistic language model. The Journal of Machine Learning Research, 3:1137-1155, 2003. 3. Antonio Jimeno Yepes and Rafael Berlanga. Knowledge based word-concept model estimation and refinement for biomedical text mining. Journal of Biomedical Informatics, 53:300-307, 2015. 4. Alan R. Aronson. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. Proceedings of the AMIA Symposium. American Medical Informatics Association, 2001. 5. https://blog.acolyer.org/2016/04/21/the-amazing-power-of-word-vectors/
  41. References (contd.) 6. Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097-1105, 2012.
