Entity Typing Using Distributional Semantics and DBpedia

Presentation given at NLP&DBpedia workshop on 18 October 2016. The presentation accompanies the work described in: https://nlpdbpedia2016.files.wordpress.com/2016/09/nlpdbpedia2016_paper_9.pdf

  1. 1. Entity Typing Using Distributional Semantics and DBpedia Marieke van Erp and Piek Vossen
  2. 2. Conclusions • Fine-grained entity typing is necessary for semantic queries over text • Search space for word2vec is large, topics help • Combining Distributional Semantics with DBpedia can help overcome NIL and Dark Entities https://github.com/MvanErp/entity-typing/
  3. 3. Dark entities: little or no information available in KB https://github.com/MvanErp/entity-typing/
  4. 4. Dark entities: little or no information available in KB https://github.com/MvanErp/entity-typing/
  5. 5. Distributional Semantics • Similar concepts (denoted by words) occur in similar contexts • Word2Vec (Mikolov et al., 2013) explores this notion in a popular implementation [Figure: example words: Sushi, Teriyaki, Udon, Okonomiyaki, Soba, Sashimi, Kimono, Yukata, Nemaki, KFC, Steak, Hamburger, McDonald’s, Jeans, T-shirt, Skirt]
  6. 6. Research Question: • Can we predict the type of the concept ‘Sushi’ by modelling it in a distributional semantics space and comparing its vector to the vectors of concepts for which we do know the type? [Figure: same example words as on the previous slide]
  7. 7. Setup • 7 Named Entity Linking Benchmark datasets (AIDA-YAGO, 2014 NEEL, 2015 NEEL, OKE2015, RSS500, WES2015, Wikinews) • 3 Word2Vec models: GoogleNews, English Wikipedia, Reuters RCV1* • Compare all entities within datasets to each other and return highest ranking type (as taken from DBpedia) * AIDA-YAGO is part of Reuters RCV1 https://github.com/MvanErp/entity-typing/
  8. 8. Initial results • Not so great? https://github.com/MvanErp/entity-typing/
  9. 9. Initial results (some footnotes) • Ranking approach favours fine-grained entity types • The Word2Vec corpus matters! The 2014 and 2015 NEEL datasets are derived from tweets, which typically have low coverage when querying models trained on news • Smaller datasets (Wikinews, WES2015, OKE2015) do better? https://github.com/MvanErp/entity-typing/
  10. 10. Let’s zoom in on topics • Initially, all entities within a benchmark dataset were compared to all other entities. • What if we only compare entities from sports documents to other entities from sports documents? [Figure: six panels of AIDA-YAGO results per topic, for coarse-grained categories (1 to 22) and fine-grained categories (1 to 69), each with the GoogleNews, RCV1, and Wikipedia models] https://github.com/MvanErp/entity-typing/
  11. 11. Conclusions and Future Work • Difficult task, but topics help • Ranking needs to be improved • Multi-class classification (KFC: food & organisation, Arnold Schwarzenegger: Actor & Politician) • What else can we discover beyond type? https://github.com/MvanErp/entity-typing/
  12. 12. Thank you! https://github.com/MvanErp/entity-typing/
  13. 13. This research was made possible by the CLARIAH-CORE project financed by NWO. http://www.clariah.nl
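
The comparison described in slides 6, 7 and 10 can be illustrated with a minimal sketch. It is not the authors' implementation (see the GitHub repository for that). It assumes each entity already has a vector, and it assigns an untyped entity the type of its most similar typed neighbour, optionally limited to entities from the same topic. The function names, the three-dimensional toy vectors, the type labels and the topic labels are all hypothetical stand-ins for real Word2Vec vectors, DBpedia types and document topics.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict_type(target, vectors, known_types, topics=None, topic=None):
    """Return the type of the typed entity most similar to `target`.

    If `topic` is given, only entities whose topic label (from the
    hypothetical `topics` mapping) matches are considered.
    Returns None when no candidate is available.
    """
    best_type, best_sim = None, -2.0  # cosine is never below -1
    for name, entity_type in known_types.items():
        if name == target or name not in vectors:
            continue
        if topic is not None and (topics or {}).get(name) != topic:
            continue
        sim = cosine(vectors[target], vectors[name])
        if sim > best_sim:
            best_type, best_sim = entity_type, sim
    return best_type


# Toy data, invented for illustration only.
vectors = {
    "Sushi": [0.9, 0.1, 0.0],
    "Udon": [0.8, 0.2, 0.1],
    "Kimono": [0.1, 0.9, 0.1],
    "KFC": [0.6, 0.0, 0.7],
}
known_types = {"Udon": "Food", "Kimono": "Clothing", "KFC": "Company"}
topics = {"Udon": "food", "Kimono": "fashion", "KFC": "business"}

# Compare against all typed entities.
print(predict_type("Sushi", vectors, known_types))
# Compare only against typed entities from one topic.
print(predict_type("Sushi", vectors, known_types, topics, topic="business"))
```

Restricting the candidates by topic shrinks the search space, which is the idea behind slide 10. With real data the vectors would come from one of the three Word2Vec models and the types from DBpedia.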