Camilo Thorne, Stefano Faralli and Heiner Stuckenschmidt | Entity Linking for Clinical Text Annotation and Disambiguation

http://2016.semantics.cc/camilo-thorne

  1. Cross-Evaluation of Entity Linking and Disambiguation Systems for Clinical Text Annotation
     Camilo Thorne, Stefano Faralli, Heiner Stuckenschmidt
     Data and Web Science (DWS) Group, Universität Mannheim, Germany
     {camilo,stefano,heiner}@informatik.uni-mannheim.de
     SEMANTiCS 2016
     C. Thorne et al. (UMa), EL for Clinical Text, Leipzig, 14.09.2016
  3. Motivation
     Low dose pramipexole is neuroprotective in the MPTP mouse model of Parkinson’s disease (*)
     Problems:
       1. identify entities (nouns, noun phrases) within a text;
       2. resolve the meaning of those entities in context by linking them to a sense repository;
       3. resolve the meaning of both domain-specific and generic terms.
     Question: Are there annotation services capable of both?
  5. Annotators
     • MetaMap (Aronson and Lang, 2010): clinical domain; sense repository: UMLS; REST service; multilingual; sense: CUI
     • BabelFly (Moro et al., 2014): general domain; sense repository: BabelNet; REST service; multilingual; sense: babelsynset
     • TagMe (Ferragina and Scaiella, 2010): general domain; “sense” repository: Wikipedia; REST service; English/Italian; “sense”: Wiki page
     • WordNet (Lesk) (custom): general domain; sense repository: WordNet 3.0; baseline; English; sense: synset
     Problem: the sense repositories are not aligned a priori.
     Solution: use linked data in the form of DBpedia (Bizer et al., 2009) as a pivot (partial mappings).
     Note: UMLS can be mapped to DBpedia via Medline and the LinkedLifeData initiative (Momtchev et al., 2009).
  6. Annotations (Overview)
     Use DBpedia as pivot:

     Source           Sense                Sense ID               DBpedia URI
     Clinical (Gold)  pramipexol           C0074710               http://dbpedia.org/resource/Pramipexole
                      Parkinson disease    C0030567               http://dbpedia.org/resource/Parkinson_disease
     MetaMap          pramipexol           C0074710               http://dbpedia.org/resource/Pramipexole
                      Parkinson disease    C0030567               http://dbpedia.org/resource/Parkinson_disease
     BabelFly         ATC code N04BC05     bn:03124207n           http://dbpedia.org/resource/Pramipexole
     TagMe            pramipexole          https://goo.gl/twrSVu  http://dbpedia.org/resource/Pramipexole
                      Parkinson’s disease  https://goo.gl/Xke6W3  http://dbpedia.org/resource/Parkinson’s_disease

     Annotations for example (*).
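
The pivot idea on the overview slide can be sketched as a pair of lookups: each annotator's native sense ID is translated to a DBpedia URI, and two annotations agree when their URIs coincide. This is only an illustration, not the authors' implementation. The dictionary names and the helper functions `to_pivot` and `agree` are hypothetical, and the two mapping entries are copied from the slide's pramipexole row.

```python
# Hypothetical lookup tables: native sense ID -> DBpedia URI.
# The two entries are taken from the overview slide; real tables would be
# built from the (partial) alignments between each repository and DBpedia.
UMLS_TO_DBPEDIA = {
    "C0074710": "http://dbpedia.org/resource/Pramipexole",
}
BABELNET_TO_DBPEDIA = {
    "bn:03124207n": "http://dbpedia.org/resource/Pramipexole",
}


def to_pivot(sense_id, mapping):
    """Translate a native sense ID to a DBpedia URI.

    Returns None when the mapping has no entry, since the alignments
    are only partial.
    """
    return mapping.get(sense_id)


def agree(gold_cui, system_id, system_mapping):
    """True if the gold CUI and the system sense map to the same DBpedia URI."""
    gold_uri = to_pivot(gold_cui, UMLS_TO_DBPEDIA)
    system_uri = to_pivot(system_id, system_mapping)
    return gold_uri is not None and gold_uri == system_uri


if __name__ == "__main__":
    # The BabelNet synset and the gold CUI meet at the same DBpedia resource.
    print(agree("C0074710", "bn:03124207n", BABELNET_TO_DBPEDIA))
    # An ID with no mapping (made up for illustration) cannot be compared.
    print(agree("C0074710", "bn:00000000n", BABELNET_TO_DBPEDIA))
```

Returning `None` for unmapped IDs keeps the partial nature of the mappings explicit, so an unmapped sense is never silently counted as a match.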
  7. SemRep Corpus (Kilicoglu et al., 2011)
     Experiments were run over the SemRep corpus, a small annotated clinical corpus:
     • 428 clinical excerpts (MedLine/PubMed)
     • 13,948 word tokens
     • 856 UMLS-annotated clinical terms
     For each sentence, two noun phrases were annotated by clinicians with their corresponding UMLS CUI.
     606 terms can be associated with a corresponding DBpedia URI.
     Example (*) is taken from SemRep.
  8. Annotation Statistics
     # of CUIs in corpus (total) = 856

     Source     # of DBpedia URIs   # of resolved URIs
     Corpus     606                 404
     MetaMap    343                 242
     BabelFly   432                 269
     TagMe      469                 320
     WordNet    182                 97
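
As a small worked example, the counts on the statistics slide can be turned into the share of DBpedia URIs that were resolved for each source. The counts are copied from the slide. The ratio itself is a derived figure that the slides do not report, and the names `COUNTS` and `resolved_share` are hypothetical.

```python
# (DBpedia URIs, resolved URIs) per source, copied from the statistics slide.
COUNTS = {
    "Corpus": (606, 404),
    "MetaMap": (343, 242),
    "BabelFly": (432, 269),
    "TagMe": (469, 320),
    "WordNet": (182, 97),
}


def resolved_share(total_uris, resolved_uris):
    """Fraction of DBpedia URIs that were resolved (0.0 if there are none)."""
    return resolved_uris / total_uris if total_uris else 0.0


if __name__ == "__main__":
    for source, (total, resolved) in COUNTS.items():
        print(f"{source:<9} {resolved_share(total, resolved):.3f}")
```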
  10. Cross-Evaluation
      $\mathrm{Pre} = \frac{\#\text{correct senses}}{\#\text{returned senses}}$, $\mathrm{Rec} = \frac{\#\text{correct senses}}{\#\text{corpus senses}}$, $F_1 = \frac{2 \cdot \mathrm{Pre} \cdot \mathrm{Rec}}{\mathrm{Pre} + \mathrm{Rec}}$
      [Figure: two bar charts of performance (Pre, Rec, F-1; scale 0.0 to 1.0) for MetaMap, BabelFly, TagMe and WordNet; one panel for unresolved URIs, one for resolved URIs]
      Conclusion: When URIs are resolved via sameAs links, generic EL systems such as TagMe and BabelFly match domain-specific annotators like MetaMap.
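
The three scores defined on the Cross-Evaluation slide can be computed as in the following minimal sketch. It assumes that annotations are stored as dictionaries from a mention identifier to a DBpedia URI, which is a representation chosen here for illustration. The function name, the mention keys and the toy URIs are all hypothetical.

```python
def precision_recall_f1(returned, gold):
    """Compute Pre, Rec and F1 as defined on the slide.

    returned: dict mention -> URI proposed by an annotator (assumed layout)
    gold:     dict mention -> URI in the corpus gold standard (assumed layout)
    """
    # A returned sense is correct when it equals the gold sense of the mention.
    correct = sum(1 for mention, uri in returned.items() if gold.get(mention) == uri)
    pre = correct / len(returned) if returned else 0.0
    rec = correct / len(gold) if gold else 0.0
    f1 = 2 * pre * rec / (pre + rec) if (pre + rec) > 0 else 0.0
    return pre, rec, f1


if __name__ == "__main__":
    # Toy data, made up for illustration only.
    gold = {"m1": "uri:A", "m2": "uri:B", "m3": "uri:C"}
    returned = {"m1": "uri:A", "m2": "uri:X"}
    pre, rec, f1 = precision_recall_f1(returned, gold)
    print(f"Pre={pre:.2f} Rec={rec:.2f} F1={f1:.2f}")
```

In the toy data one of two returned senses is correct and the gold standard has three senses, so precision is 1/2, recall is 1/3, and their harmonic mean is 0.4.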
  11. Semantic Relatedness Measures
      $$\mathrm{syn}(s, s') = \frac{|\{(w, w') \in g(s) \times g(s') \mid \mathrm{wn}_{>0.2}(w, w')\}|}{|g(s)| + |g(s')|}$$
      $$\mathrm{syn}^{+}(s, s') = \frac{|\{(w, w') \in g(s) \times g(s') \mid \mathrm{wn}_{>0}(w, w')\}|}{|g(s)| + |g(s')|}$$
      $$\mathrm{dsyn}(s, s') = \frac{|\{(w, w') \in g(s) \times g(s') \mid \mathrm{dn}_{>0.2}(w, w')\}|}{|g(s)| + |g(s')|}$$
      $$\mathrm{dsyn}^{+}(s, s') = \frac{|\{(w, w') \in g(s) \times g(s') \mid \mathrm{dn}_{>0}(w, w')\}|}{|g(s)| + |g(s')|}$$
      We measured:
        1. WordNet similarity (low coverage, but better accuracy) under two “synonymy” thresholds (“strict” > 0.2, “loose” > 0);
        2. word embedding relatedness (standard Wikipedia-trained word2vec word space models) under two “synonymy” thresholds (“strict” > 0.2, “loose” > 0).
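
The four measures on the Semantic Relatedness slide share one shape: count the word pairs across two glosses whose word-level score exceeds a threshold, then divide by the combined gloss size. A minimal sketch follows, assuming that g(s) is the set of words in the gloss of sense s and that the numerator is the number of qualifying pairs. The word-level scorer is passed in as a function, and `toy_sim` with its score table is a made-up stand-in for the WordNet or word2vec scores, not a real API.

```python
from itertools import product


def relatedness(gloss_a, gloss_b, word_sim, threshold):
    """Count cross-gloss word pairs scoring above threshold, normalised by
    the combined gloss size (the common shape of syn, syn+, dsyn, dsyn+)."""
    pairs = sum(1 for w, v in product(gloss_a, gloss_b) if word_sim(w, v) > threshold)
    size = len(gloss_a) + len(gloss_b)
    return pairs / size if size else 0.0


# Hypothetical word-level scores, standing in for a WordNet or embedding scorer.
TOY_SCORES = {("drug", "medicine"): 0.9, ("dopamine", "receptor"): 0.15}


def toy_sim(w, v):
    """Symmetric lookup in the toy table; unknown pairs score 0.0."""
    return TOY_SCORES.get((w, v), TOY_SCORES.get((v, w), 0.0))


if __name__ == "__main__":
    gloss_a = {"drug", "dopamine"}
    gloss_b = {"medicine", "receptor", "disease"}
    print(relatedness(gloss_a, gloss_b, toy_sim, 0.2))  # "strict" threshold
    print(relatedness(gloss_a, gloss_b, toy_sim, 0.0))  # "loose" threshold
```

With the toy scores, one pair passes the strict threshold and two pass the loose one, giving 1/5 and 2/5 for glosses of two and three words.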
  13. Annotation Relatedness
      [Figure: bar chart of average coverage (scale 0.0 to 0.5) under sy, sy+, dsy and dsy+ for MetaMap, BabelFly, TagMe and WordNet]

      Annotations              Avg. len. (sent.)
      Corpus sense glosses     66.41 words
      BabelFly sense glosses   199.43 words
      TagMe sense glosses      325.51 words
      MetaMap sense glosses    191.76 words
      WordNet sense glosses    50.50 words

      Test             Null hyp.   p-value
      Kruskal-Wallis   identical   0.897

      Conclusion: No significant differences w.r.t. semantic relatedness.
  14. Summing up...
      • We have cross-evaluated generic WSD and linking systems (BabelFly, TagMe) against a domain-specific annotator (MetaMap).
      • Generic WSD and linking systems show competitive results over the SemRep gold standard.
      • In particular, their greater coverage yields improvements in F1-score (TagMe outperforms MetaMap in F1-score, but by a small margin).
      • In the future we plan to investigate whether domain adaptation yields better results and improves linking.
  15. Thank You!
  16. References I
      • Aronson, A. R. and Lang, F.-M. (2010). An overview of MetaMap: Historical perspective and recent advances. Journal of the American Medical Informatics Association, 17(3):229–236.
      • Bizer, C., Lehmann, J., Kobilarov, G., Auer, S., Becker, C., Cyganiak, R., and Hellmann, S. (2009). DBpedia - A crystallization point for the web of data. Journal of Web Semantics, 7(3):154–165.
      • Ferragina, P. and Scaiella, U. (2010). TAGME: On-the-fly annotation of short text fragments (by Wikipedia entities). In Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM 2010).
      • Kilicoglu, H., Rosemblat, G., Fiszman, M., and Rindflesch, T. C. (2011). Constructing a semantic predication gold standard from the biomedical literature. BMC Bioinformatics, 12(486).
  17. References II
      • Momtchev, V., Peychev, D., Primov, T., and Georgiev, G. (2009). Expanding the pathway and interaction knowledge in linked life data. In Proceedings of the 2009 International Semantic Web Challenge.
      • Moro, A., Raganato, A., and Navigli, R. (2014). Entity linking meets word sense disambiguation: A unified approach. Transactions of the Association for Computational Linguistics, 2:231–244.
