
Entity Search on Virtual Documents Created with Graph Embeddings

Entity Search is a search paradigm that aims to retrieve entities and all the information related to them. In recent years this topic has become increasingly important, since 40% of the queries submitted by users nowadays mention specific entities.
This talk gives an overview of the state-of-the-art methods used for entity retrieval and then describes the new approach Anna implemented and proposed in her master's thesis. The novelty of this work lies in exploiting two machine learning techniques: neural networks and clustering.

  1. London Information Retrieval Meetup, Latest Updates, 11th February 2020. Entity Search on Virtual Documents Created with Graph Embeddings. Anna Ruggero, R&D Software Engineer.
  2. Sease Search Services: open source enthusiasts, Apache Lucene/Solr experts, community contributors, active researchers. Hot trends: Learning To Rank, Document Similarity, Search Quality Evaluation, Relevancy Tuning.
  3. Who I am: R&D Search Software Engineer; Master's Degree in Computer Science Engineering; Big Data, Information Retrieval; organist, music lover.
  4. Overview: Background, Solution Design, Evaluation, Discussion and Conclusion.
  6. Entity Search. Entity: a uniquely identifiable object or thing characterized by its name(s), type(s) and relationships to other entities.
  7. Entity Search (example entity): <dbr:Michael_Schumacher>, name: Michael Schumacher, type: Racing Driver, birth date: 1969-01-03.
  8. Entity Search. What is entity search? It is the search paradigm of organizing and accessing information centered around entities and their attributes and relationships. Focus: the task of ad-hoc entity retrieval, which aims to answer information needs related to a particular entity, expressed in unconstrained natural language and resolved using a collection of structured data.
  9. Entities Representation: how can we represent and manage this type of information?
  10. RDF
  11. State of the Art (diagram: Entity, Text Retrieval, SPARQL). Example RDF triples for an entity: <Michael_Schumacher> <name> <Michael>; <Michael_Schumacher> <type> <RacingDriver>; <Michael_Schumacher> <BirthDate> <1969-01-03>.
  12. State of the Art, document creation. Text-based approach: triples are concatenated in a Bag-Of-Words representation. Structured-based approach: a fielded representation, with a field associated with each predicate or class of predicates.
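To make the two document-creation styles on slide 12 concrete, here is a minimal Python sketch that turns the triples of one entity into a bag-of-words document and into a fielded document. The function names and the `(subject, predicate, object)` tuple layout are illustrative assumptions, and the slides do not say whether predicate names are kept in the bag of words (this sketch keeps them).

```python
from collections import defaultdict

Triple = tuple[str, str, str]  # (subject, predicate, object)


def text_document(triples: list[Triple], subject: str) -> str:
    """Text-based approach: concatenate predicates and objects of the subject into one bag of words."""
    words: list[str] = []
    for s, p, o in triples:
        if s == subject:
            words.extend([p, o])
    return " ".join(words)


def fielded_document(triples: list[Triple], subject: str) -> dict[str, list[str]]:
    """Structured approach: one field per predicate, holding that predicate's objects."""
    fields: dict[str, list[str]] = defaultdict(list)
    for s, p, o in triples:
        if s == subject:
            fields[p].append(o)
    return dict(fields)


triples = [
    ("Michael_Schumacher", "name", "Michael Schumacher"),
    ("Michael_Schumacher", "type", "RacingDriver"),
    ("Michael_Schumacher", "birthDate", "1969-01-03"),
]
print(text_document(triples, "Michael_Schumacher"))
# name Michael Schumacher type RacingDriver birthDate 1969-01-03
print(fielded_document(triples, "Michael_Schumacher"))
```

A real fielded index would usually also group predicates into a few field classes rather than keeping one field per predicate.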
  13. What's new: compared with the state of the art, our approach aims to create documents containing entities that are described also by the context in which they are located (diagram example: Tree, Hyde Park, Squirrel, Nut).
  14. Graph Embeddings. Graph embedding: a method that produces a numerical vector representation of the nodes and edges of a graph, encoding the graph's topology and structure as vectors or sets of vectors. Graph embeddings obtain these representations through a neural network.
  15. Graph Embeddings: the Word2Vec skip-gram model.
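The skip-gram model learns an embedding for each token by predicting the tokens around it within a fixed window. The sketch below only generates the (center, context) training pairs for one sequence. It is a simplification that leaves out word2vec's neural training, negative sampling and dynamic window shrinking, and `skipgram_pairs` is a hypothetical helper.

```python
def skipgram_pairs(sequence: list[str], window: int) -> list[tuple[str, str]]:
    """Return the (center, context) pairs a skip-gram model is trained on."""
    pairs = []
    for i, center in enumerate(sequence):
        lo = max(0, i - window)
        hi = min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a token is not its own context
                pairs.append((center, sequence[j]))
    return pairs


walk = ["Tree", "Hyde_Park", "Squirrel", "Nut"]
print(skipgram_pairs(walk, window=1))
# [('Tree', 'Hyde_Park'), ('Hyde_Park', 'Tree'), ('Hyde_Park', 'Squirrel'),
#  ('Squirrel', 'Hyde_Park'), ('Squirrel', 'Nut'), ('Nut', 'Squirrel')]
```

In Node2Vec the "sentences" are random walks over the graph, so nodes that often co-occur in walks end up with similar vectors.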
  16. Graph Embeddings: Node2Vec builds on word2vec and additionally aims to capture relationships such as homophily and structural roles, generating its random walks with a strategy that interpolates between BFS and DFS.
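Node2Vec's BFS/DFS trade-off comes from a second-order biased random walk controlled by the parameters p and q. A minimal sketch, assuming an unweighted, undirected graph stored as adjacency sets (for instance RDF triples with predicates dropped, which is an assumption and not something the slides state). `biased_walk` and `generate_walks` are hypothetical names.

```python
import random

Graph = dict[str, set[str]]  # undirected graph: node -> set of neighbours


def biased_walk(graph: Graph, start: str, length: int, p: float, q: float,
                rng: random.Random) -> list[str]:
    """One node2vec-style walk of up to `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbours = sorted(graph[current])  # sorted so a fixed seed is reproducible
        if not neighbours:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            # The first step has no previous node, so it is uniform.
            walk.append(rng.choice(neighbours))
            continue
        previous = walk[-2]
        weights = []
        for x in neighbours:
            if x == previous:
                weights.append(1.0 / p)  # step back to the previous node
            elif x in graph[previous]:
                weights.append(1.0)      # stay near the previous node (BFS-like)
            else:
                weights.append(1.0 / q)  # move away from it (DFS-like when q < 1)
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk


def generate_walks(graph: Graph, num_walks: int, walk_length: int,
                   p: float, q: float, seed: int = 0) -> list[list[str]]:
    """Start `num_walks` walks from every node, shuffling the start order each round."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(biased_walk(graph, node, walk_length, p, q, rng))
    return walks


# Toy graph inspired by the Hyde Park example on slide 13.
toy = {
    "Hyde_Park": {"Tree", "Squirrel"},
    "Tree": {"Hyde_Park", "Squirrel"},
    "Squirrel": {"Hyde_Park", "Tree", "Nut"},
    "Nut": {"Squirrel"},
}
walks = generate_walks(toy, num_walks=2, walk_length=5, p=1.0, q=0.5)
print(walks[0])
```

The resulting walks play the role of sentences for the skip-gram model, with nodes as words.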
  18. Solution Design (pipeline diagram): RDF triples → Entities representation (Node2Vec) → entity embeddings → Clustering → embeddings clusters → Documents creation → documents → Ranking system → ranking list of documents → Entity retrieval → ranking list of entities.
  20. DBpedia
  21. Experimental Setup
  22. Entities Representation: we obtain entity embeddings by applying Node2Vec with these parameters: embedding dimension, walk length, number of walks, p, q, workers, window size.
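The parameters listed on slide 22 can be read as a small configuration object. The sketch below is an assumption about how they relate, not the thesis code: `Node2VecConfig` is a hypothetical name, and the values in the usage line are illustrative, not the ones used in the experiments. `corpus_size` shows the arithmetic linking number of walks, walk length and graph size to the amount of training data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node2VecConfig:
    dimensions: int   # length of each entity embedding vector
    walk_length: int  # maximum number of nodes in one random walk
    num_walks: int    # walks started from every node
    p: float          # return parameter: a larger p makes stepping back less likely
    q: float          # in-out parameter: q < 1 favours DFS-like, q > 1 BFS-like walks
    workers: int      # parallel workers used for walk generation / training
    window: int       # skip-gram context window applied over each walk

    def corpus_size(self, num_nodes: int) -> tuple[int, int]:
        """(number of walks, upper bound on tokens) for a graph with num_nodes nodes."""
        walks = self.num_walks * num_nodes
        return walks, walks * self.walk_length  # walks may end early at dead ends


config = Node2VecConfig(dimensions=128, walk_length=80, num_walks=10,
                        p=1.0, q=1.0, workers=4, window=10)
print(config.corpus_size(1_000))  # (10000, 800000)
```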
  24. Clustering. Idea: create documents that contain several related, similar entities. We run K-Means on the entity embeddings. Pros: easy to initialize, obtains good results, highly scalable.
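A minimal K-Means sketch over entity embeddings, using Lloyd's algorithm with random initial centroids drawn from the data. The slides do not say which K-Means implementation, initialization or number of clusters was used, so all of those, and the name `kmeans`, are assumptions; the 2-D vectors stand in for real Node2Vec embeddings.

```python
import random

Vector = list[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(vectors: list[Vector]) -> Vector:
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def kmeans(points: list[Vector], k: int, iterations: int = 100,
           seed: int = 0) -> tuple[list[int], list[Vector]]:
    """Lloyd's algorithm: returns (cluster label per point, final centroids)."""
    rng = random.Random(seed)
    centroids = [list(c) for c in rng.sample(points, k)]
    labels: list[int] = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: squared_distance(point, centroids[c]))
                      for point in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = centroid(members)
    return labels, centroids


embeddings = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
labels, _ = kmeans(embeddings, k=2)
print(labels)
```

Each resulting label identifies one cluster, and hence one future virtual document.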
  26. Document Creation: we associate a document with each cluster.
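One way to read "a document for each cluster" is to concatenate the textual description of every entity in the cluster. The sketch assumes each entity already has a text (for example its bag of words built from its triples); `build_cluster_documents` and the toy texts are hypothetical.

```python
from collections import defaultdict


def build_cluster_documents(entity_ids: list[str], labels: list[int],
                            entity_text: dict[str, str]
                            ) -> tuple[dict[int, str], dict[int, list[str]]]:
    """Return (cluster label -> virtual document text, cluster label -> member entities)."""
    members: dict[int, list[str]] = defaultdict(list)
    for entity, label in zip(entity_ids, labels):
        members[label].append(entity)
    documents = {label: " ".join(entity_text[e] for e in ents)
                 for label, ents in members.items()}
    return documents, dict(members)


entities = ["Hyde_Park", "Tree", "Michael_Schumacher"]
labels = [0, 0, 1]
texts = {
    "Hyde_Park": "Hyde Park royal park London",
    "Tree": "tree plant park",
    "Michael_Schumacher": "Michael Schumacher racing driver",
}
documents, members = build_cluster_documents(entities, labels, texts)
print(documents[0])  # Hyde Park royal park London tree plant park
```

The member map is kept alongside the documents so that a ranked cluster can later be expanded back into its entities.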
  28. Ranking System: we apply the classic text retrieval technique to our virtual documents to obtain a ranked list of clusters/documents, using the BM25 model.
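A compact, self-contained BM25 sketch for ranking the cluster documents against a query. It uses the common defaults k1 = 1.2 and b = 0.75 and the idf form log(1 + (N - df + 0.5) / (df + 0.5)), the variant used by Lucene's `BM25Similarity`. Lowercase whitespace tokenization is a simplifying assumption, and `bm25_rank` is a hypothetical name.

```python
import math
from collections import Counter


def bm25_rank(query: str, documents: dict[str, str],
              k1: float = 1.2, b: float = 0.75) -> list[tuple[str, float]]:
    """Score every document for the query with Okapi BM25; best first."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in documents.items()}
    n_docs = len(tokenized)
    avgdl = sum(len(tokens) for tokens in tokenized.values()) / n_docs
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))  # document frequency: count each term once per document
    scores = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = 1 - b + b * len(tokens) / avgdl
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


clusters = {
    "cluster_0": "chef television show food network chef",
    "cluster_1": "racing driver formula one",
}
print(bm25_rank("chef show", clusters))
```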
  30. Entity Retrieval: the user wants a list of entities, not a list of clusters. We implement two sets of systems that build a ranked list of entities starting from the ranked list of clusters/documents: combination systems (basic approach) and fusion systems (basic approach + state of the art).
  31. Combination System
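The slides describe combination systems only as the "basic approach" for turning ranked clusters into ranked entities. A minimal sketch of one such scheme, assuming entities simply inherit the rank of their cluster and the list is cut at a chosen size (slide 41 stresses that this size matters). `combine` is hypothetical, and the real system may order entities within a cluster differently.

```python
def combine(ranked_clusters: list[str], cluster_members: dict[str, list[str]],
            max_entities: int) -> list[str]:
    """Expand a ranked list of clusters into a ranked list of at most max_entities entities."""
    ranking: list[str] = []
    for cluster_id in ranked_clusters:
        for entity in cluster_members.get(cluster_id, []):
            if len(ranking) == max_entities:
                return ranking
            ranking.append(entity)  # entities keep their cluster's rank and member order
    return ranking


members = {"c0": ["Chef_A", "Chef_B"], "c1": ["Hyde_Park"]}
print(combine(["c0", "c1"], members, max_entities=2))  # ['Chef_A', 'Chef_B']
```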
  32. Fusion System
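Fusion systems merge the cluster-based entity ranking with a state-of-the-art ranking. The slides do not name the fusion method, so the sketch below uses reciprocal rank fusion (RRF) as a stand-in, with the commonly used constant k = 60; `reciprocal_rank_fusion` and the input lists are illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several rankings: each entity scores the sum of 1 / (k + rank) over the lists."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, entity in enumerate(ranking, start=1):
            scores[entity] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by entity id for a deterministic order.
    return sorted(scores, key=lambda entity: (-scores[entity], entity))


cluster_based = ["x", "y", "z", "a"]  # e.g. output of a combination system
bm25_entities = ["a", "x", "q"]       # e.g. BM25 over per-entity documents
print(reciprocal_rank_fusion([cluster_based, bm25_entities]))
# ['x', 'a', 'y', 'q', 'z']
```

Entities found by both systems rise to the top, while entities found by only one system are still kept, which matches the goal of exploiting the strengths of both.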
  34. Evaluation. Quantitative evaluation: general considerations, averaged measures. Qualitative (i.e. topic-based) evaluation: specific considerations; we look for correlations between topics and system effectiveness.
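The slides do not list which averaged measures were computed. As an illustration of a topic-averaged measure, here is nDCG@k with graded relevance judgments and linear gains, averaged over topics; the choice of nDCG, k = 10 and the helper names are assumptions. Topics without a run score 0.

```python
import math


def dcg(gains: list[float]) -> float:
    """Discounted cumulative gain: the gain at rank r is divided by log2(r + 1)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg_at_k(ranking: list[str], qrels: dict[str, int], k: int) -> float:
    """ranking: entity ids best first; qrels: entity id -> graded relevance (missing = 0)."""
    gains = [qrels.get(entity, 0) for entity in ranking[:k]]
    ideal_dcg = dcg(sorted(qrels.values(), reverse=True)[:k])
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


def mean_ndcg(runs: dict[str, list[str]], qrels_by_topic: dict[str, dict[str, int]],
              k: int = 10) -> float:
    """Average nDCG@k over all judged topics."""
    values = [ndcg_at_k(runs.get(topic, []), qrels, k)
              for topic, qrels in qrels_by_topic.items()]
    return sum(values) / len(values)


qrels = {"t1": {"a": 2, "b": 1}, "t2": {"c": 1}}
print(mean_ndcg({"t1": ["a", "b"]}, qrels))  # 0.5
```

Looking at the per-topic values from `ndcg_at_k` instead of their mean is the starting point for the topic-based analysis.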
  35. Quantitative Evaluation
  39. Qualitative Evaluation. The topic is: "Chefs that have a TV show in Food Network." We retrieve many relevant entities in top positions, and we retrieve entities that BM25 does not find.
  41. Discussion and Conclusion. In fusion systems we exploit the positive aspects of both methods, cluster-based and classic. The cluster construction process and the choice of the number of entities to insert into the final list are fundamental to obtaining good retrieval performance. Performance is penalized in the evaluation phase because of the way the test collection is built. Our approach turns out to be promising because it succeeds in finding new relevant entities with respect to the state of the art.
  42. Thank you for your attention
