Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Exploiting Entity Linking in Queries For Entity Retrieval

676 views

Published on

Slides for the ICTIR 2016 paper: "Exploiting Entity Linking in Queries For Entity Retrieval"

The premise of entity retrieval is to better answer search queries by returning specific entities instead of documents. Many queries mention particular entities; recognizing and linking them to the corresponding entry in a knowledge base is known as the task of entity
linking in queries. In this paper we make a first attempt at bringing together these two, i.e., leveraging entity annotations of queries in the entity retrieval model. We introduce a new probabilistic component and show how it can be applied on top of any term based entity retrieval model that can be emulated in the Markov Random Field framework, including language models, sequential dependence models, as well as their fielded variations. Using a standard entity retrieval test collection, we show that our extension brings consistent improvements over all baseline methods, including the current state-of-the-art. We further show that our extension is robust against parameter settings.

Published in: Science
  • Be the first to comment

Exploiting Entity Linking in Queries For Entity Retrieval

  1. 1. Exploiting Entity Linking in Queries for Entity Retrieval Faegheh Hasibi, Krisztian Balog, Svein Erik Bratsberg Newark, DE, USA, September 2016
  2. 2. Exploiting Entity Linking in Queries for Entity Retrieval Faegheh Hasibi, Krisztian Balog, Svein Erik Bratsberg Newark, DE, USA, September 2016
  3. 3. Entity
  4. 4. Entity <rdfs:label> <dbo:abstract> <dbo:birthPlace> <dbo:birthDate> <dbo:deathPlace> <dbo:deathDate> <dbo:child> <dbp:spouse> <dbp:shortDescription> <dbo:wikiPageWikiLink> Ann_Dunham Ann Dunham Stanley Ann Dunham (November 29, 1942 – November 7, 1995) was the mother of Barack Obama, the 44th President of the United States, and an American anthropologis [..] <Honolulu> <Hawaii> <United_States> 1942-11-29 <Honolulu> <United_States> 1995-11-07 <Barack_Obama> <Maya_Soetoro-Ng> <Lolo_Soetoro> <Barack_Obama,_Sr.> Anthropologist; mother of Barack Obama <Barack_Obama> <President_of_the_United_States> <United_States> [..]
  5. 5. Conventional entity retrieval <Barack_Obama> Annotations: barack obama parents Entity-based representation ˆDˆD Term-based representation DD term-based matching entity-based matching entity linking <dbo:birthPlace>: [<Honolulu>, <Hawaii> ] <dbo:child>: <Barack_Obama> <dbo:wikiPageWikiLink>: [ <United_States>, <Family_of_Barack_Obama>, …] Query terms: <rdfs:label>: Ann Dunham <dbo:abstract>: Stanley Ann Dunham the mother Barack Obama, was an American anthropologist who … <dbo:birthPlace>: Honolulu Hawaii … <dbo:child>: Barack Obama <dbo:wikiPageWikiLink>: United States Family Barack Obama Term-based representation barack obama parents Conventional entity retrieval models compare the term-based representation of entity against the query terms. Query
  6. 6. Our proposal barack obama parents Incorporating entity linking into the retrieval model entity linking
  7. 7. Our proposal <Barack_Obama> Annotations: barack obama parents Entity-based representation ˆDˆD Term-based representation DD term-based matching entity-based matching entity linking <dbo:birthPlace>: [<Honolulu>, <Hawaii> ] <dbo:child>: <Barack_Obama> <dbo:wikiPageWikiLink>: [ <United_States>, <Family_of_Barack_Obama>, …] Query terms: <rdfs:label>: Ann Dunham <dbo:abstract>: Stanley Ann Dunham the mother Barack Obama, was an American anthropologist who … <dbo:birthPlace>: Honolulu Hawaii … <dbo:child>: Barack Obama <dbo:wikiPageWikiLink>: United States Family Barack Obama Term-based representation barack obama parents <Barack_Obama> Annotations: barack obama parents Entity-based representation ˆDˆD Term-based representation DD term-based matching entity-based matching entity linking <dbo:birthPlace>: [<Honolulu>, <Hawaii> ] <dbo:child>: <Barack_Obama> <dbo:wikiPageWikiLink>: [ <United_States>, <Family_of_Barack_Obama>, …] Query terms: <rdfs:label>: Ann Dunham <dbo:abstract>: Stanley Ann Dunham the mother Barack Obama, was an American anthropologist who … <dbo:birthPlace>: Honolulu Hawaii … <dbo:child>: Barack Obama <dbo:wikiPageWikiLink>: United States Family Barack Obama Entity-based representation <Barack_Obama> entity linking
  8. 8. Approach Entity Linking integrated Retrieval (ELR)
  9. 9. ELR approach Based on Markov Random Fields (MRF) D. Metzler and W. B. Croft. A Markov Random Field model for term dependencies. In Proc. of SIGIR, pages 472–479, 2005. q1q1 q2q2 q3q3 DD e1e1 barack obama parents BARACK OBAMAbarack obama parents <Barack_Obama>
  10. 10. ELR approach Based on Markov Random Fields (MRF) 9) 0) 1) to al. on te k- F- ument and a term node, (ii) 3-cliques consisting of the document and two term nodes, and (iii) 2-cliques consisting of edges between the document and an entity node. The potential functions for the first two types are identical to the SDM model (Eqs. (3) and 4). We define the potential function for the third clique type as: E(e, D; ⇤) = exp[ EfE(e, D)], where E is a free parameter and fE(e, D) is the feature function for the entity e and document D (to be defined in §4.2). By substi- tuting all feature functions into Eq. (2), the MRF ranking function becomes: P(D|Q) rank = X qi2Q T fT (qi, D)+ X qi,qi+12Q OfO(qi, qi+1, D)+ X qi,qi+12Q U fU (qi, qi+1, D)+ X e2E(Q) EfE(e, D). This model introduces an additional parameter for weighing the im- portance of entity annotations, E, on top of the three parameters ( {T,O,U}) from the SDM model (cf. §3.2). There is a crucial dif- ference between entity-based and term-based matches with regards 9) 0) 1) to al. on te k- F- ument and a term node, (ii) 3-cliques consisting of the document and two term nodes, and (iii) 2-cliques consisting of edges between the document and an entity node. The potential functions for the first two types are identical to the SDM model (Eqs. (3) and 4). We define the potential function for the third clique type as: E(e, D; ⇤) = exp[ EfE(e, D)], where E is a free parameter and fE(e, D) is the feature function for the entity e and document D (to be defined in §4.2). By substi- tuting all feature functions into Eq. (2), the MRF ranking function becomes: P(D|Q) rank = X qi2Q T fT (qi, D)+ X qi,qi+12Q OfO(qi, qi+1, D)+ X qi,qi+12Q U fU (qi, qi+1, D)+ X e2E(Q) EfE(e, D). This model introduces an additional parameter for weighing the im- portance of entity annotations, E, on top of the three parameters ( {T,O,U}) from the SDM model (cf. §3.2). There is a crucial dif- ference between entity-based and term-based matches with regards Terms Ordered bigrams Unordered bigrams Entities The framework encompass various retrieval models; e.g., LM, MLM, PRMS, SDM, and FSDM
  11. 11. Challenge 1 • Number of linked entities is not proportional to query length Affects setting of parameter • Entity linking involves some uncertainty Confidence scores should be integrated into the retrieval model all cars that are produced in Germany Germany (score: 0.7) Cars_(film) (score: 0.05)
  12. 12. Normalization The final model takes the normalized form: parents BARACK OBAMA ntation of the ELR model for the ts”. Here, all the terms are sequen- se "barack obama" is linked to the hile #uwN(qi, qi+1) counts the co- window of N words (where N is set meter µ is the Dirichlet prior, which ument length in the collection. SDM is basically the weighted sum ained from three sources: (i) query y bigrams, and (iii) unordered match ch in §4 employs the same sequential DM does. ial Dependence Model pendence Model (FSDM) [46] ex- ort structured document retrieval. In ocument language model of feature )) with those of the Mixture of Lan- Given a fielded representation of a hors, metadata, etc. in the context of M computes a language model prob- tion, that ELR is applicable to a wide range of retrieval prob where documents, or document-based representations of ob are to be ranked, and entity annotations are available to be l aged in the matching of documents and queries. Our main foc this paper, however, is limited to entity retrieval; entity annota are an integral part of the representation here, cf. Figure 1. is unlike to traditional document retrieval, where documents w need to be annotated by an automated process that is prone t rors). To show the generic nature of our approach, and also fo sake of notational consistency with the previous section, we refer to documents throughout this section. We detail how documents are constructed for our particular task, entity retr in §4.3. Our interest in this work lies in incorporating entity annota and not in creating them. Therefore, entity annotations of the q are assumed to have been generated by an external entity lin process, which we treat much like a black box. Formally, giv input query Q = q1...qn, the set of linked entities is denote E(Q) = {e1, ..., em}. We do not impose any restrictions on annotations, i.e., they may be overlapping and a given query might be linked to multiple entities. It might also be that E is an empty set. Further, we assume that annotations have c dence scores associated with them. For each entity e 2 E let s(e) denote the confidence score of e, with the constraiP e2E(Q) s(e) = 1. The graph underlying our model consists of document, term entity linking score
  13. 13. Challenge 2 Term-based matching is modeled based on term frequencies: tends the SDM model to support structured document retrieval. In essence, FSDM replaces the document language model of feature functions (Eqs. (6), (7), and (8)) with those of the Mixture of Lan- guage Models (MLM) [37]. Given a fielded representation of a document (e.g., title, body, anchors, metadata, etc. in the context of web document retrieval), MLM computes a language model prob- ability for each field and then takes a linear combination of these field-level models. Hence, FSDM assumes that a separate language model is built for each document field and then computes the fea- ture functions based on fields f 2 F, with F being the universe of fields. For individual terms, the feature function becomes: fT (qi, D) = log X f wT f tfqi,Df + µf cfqi,f |Cf | |Df | + µf , (9) while ordered and unordered bigrams are estimated as: fO(qi, qi+1, D) = log X f wO f tf#1(qi,qi+1),Df + µf cf#1(qi,qi+1),f |Cf | |Df | + µf (10) fU (qi, qi+1, D) = log X f wU f tf#uwN(qi,qi+1),Df + µf cf#uwN(qi,qi+1),f |Cf | |Df | + µf . might be l is an empt dence scor let s(e) deP e2E(Q) s The grap entity node are sequen tities are in on this ass types of cli ument and and two ter the docum first two ty define the p where E for the enti tuting all f becomes: P But entities have different behavior than terms … Entity-based representations are sparse Single presence of an entity in a document is considered as a match
  14. 14. Entity-based matching • The modeling considers presence/absence of entities in the entity representation presence/absence of entity in document field y that dence by the model. s as a of en- para- takes tity BARACK OBAMA (via the relationship <dbo:child>). This entity-based representation differs from the traditional term-based representation in at least two important ways. Firstly, each entity appears at most once in each document field. Secondly, if an entity appears in a field, then it should be considered a match, irrespec- tive of what other entities may appear in that field. Consider again the example in Figure 1, where the field <dbo:birthplace> has multiple values, HONOLULU and HAWAII. Then, if either of these entities is linked in the query, that should account for a per- fect match against this particular field, irrespective of how many other locations are present in that field. Motivated by these obser- vations, we define the feature function fE as: fE(e, D) = log X f2F wE f h (1 ↵)tf{0,1}(e, ˆDf ) + ↵ dfe,f dff i , (13) where the linear interpolation implements the Jelinek-Mercer smooth- ing method, with ↵ set to 0.1, and tf{0,1}(e, ˆDf ) indicates whether the entity e is present in the document field ˆDf or not. For the back- ground model, we employ the notion of document frequency as fol- lows: dfe,f = |{ ˆD|e 2 ˆDf }| is the total number of documents that contain the entity e in field f and df(f) = |{ ˆD| ˆDf 6= ;}| is the {T,O,U} of the summations (cf. Eq. (5)) and can be trained without having to worry about query length normalization. The parameter E, how- ever, cannot be treated the same manner for two reasons. Firstly, the number of annotated entities for each query varies and it is inde- pendent of the length of the query. For example, a long natural lan- guage query might be annotated with a single entity, while shorter queries are often linked to several entities, due to their ambiguity. Secondly, we need to deal with varying levels of uncertainty that is involved with entity annotations of the query. The confidence scores associated with the annotations, which are generated by the entity linking process, should be integrated into the retrieval model. To address the above issues, we re-write the parameters as a parameterized function over each clique and define them as: T (qi) = T 1 |Q| , O(qi, qi+1) = O 1 |Q| 1 , U (qi, qi+1) = U 1 |Q| 1 , E(e) = Es(e), where |Q| is the query length and s(e) is the confidence score of en- tity e obtained from the entity linking step. Considering this para- metric form of the parameters, our final ranking function takes the following form: P(D|Q) rank = T X qi2Q 1 |Q| fT (qi, D)+ O X qi,qi+12Q 1 |Q| 1 fO(qi, qi+1, D)+ U X qi,qi+12Q 1 |Q| 1 fU (qi, qi+1, D)+ D an entity-based representation ˆD is obtained ument terms and considering only entities. In work, the entity represented by document D sta tionships with a number of other entities, as spec edge base. The various relationships are model document. Consider the example in Figure 1, wh represents the entity ANN DUNHAM, who is bein tity BARACK OBAMA (via the relationship <dbo entity-based representation differs from the trad representation in at least two important ways. F appears at most once in each document field. Sec appears in a field, then it should be considered tive of what other entities may appear in that fiel the example in Figure 1, where the field <dbo has multiple values, HONOLULU and HAWAII. these entities is linked in the query, that should fect match against this particular field, irrespec other locations are present in that field. Motivat vations, we define the feature function fE as: fE(e, D) = log X f2F wE f h (1 ↵)tf{0,1}(e, ˆDf ) where the linear interpolation implements the Jeli ing method, with ↵ set to 0.1, and tf{0,1}(e, ˆDf ) the entity e is present in the document field ˆDf or ground model, we employ the notion of documen lows: dfe,f = |{ ˆD|e 2 ˆDf }| is the total number contain the entity e in field f and df(f) = |{ ˆD number of documents with a non-empty field f. All of the feature functions fT , fO, fU , and f rameters wf , which control the field weights. Z set these type of parameters using a learning algo to a large number of parameters to be trained (t he query. Therefore, in SDM, {T,O,U} are taken out tions (cf. Eq. (5)) and can be trained without having to query length normalization. The parameter E, how- be treated the same manner for two reasons. Firstly, annotated entities for each query varies and it is inde- e length of the query. For example, a long natural lan- might be annotated with a single entity, while shorter ten linked to several entities, due to their ambiguity. need to deal with varying levels of uncertainty that ith entity annotations of the query. The confidence ated with the annotations, which are generated by the process, should be integrated into the retrieval model. the above issues, we re-write the parameters as a d function over each clique and define them as: T (qi) = T 1 |Q| , O(qi, qi+1) = O 1 |Q| 1 , U (qi, qi+1) = U 1 |Q| 1 , E(e) = Es(e), he query length and s(e) is the confidence score of en- d from the entity linking step. Considering this para- of the parameters, our final ranking function takes form: rank = T X qi2Q 1 |Q| fT (qi, D)+ O X qi,qi+12Q 1 |Q| 1 fO(qi, qi+1, D)+ U X qi,qi+12Q 1 |Q| 1 fU (qi, qi+1, D)+ an entity-based representation of documents. For each document D an entity-based representation ˆD is obtained by ignoring doc- ument terms and considering only entities. In the context of our work, the entity represented by document D stands in typed rela- tionships with a number of other entities, as specified in the knowl- edge base. The various relationships are modeled as fields in the document. Consider the example in Figure 1, where the document represents the entity ANN DUNHAM, who is being linked to the en- tity BARACK OBAMA (via the relationship <dbo:child>). This entity-based representation differs from the traditional term-based representation in at least two important ways. Firstly, each entity appears at most once in each document field. Secondly, if an entity appears in a field, then it should be considered a match, irrespec- tive of what other entities may appear in that field. Consider again the example in Figure 1, where the field <dbo:birthplace> has multiple values, HONOLULU and HAWAII. Then, if either of these entities is linked in the query, that should account for a per- fect match against this particular field, irrespective of how many other locations are present in that field. Motivated by these obser- vations, we define the feature function fE as: fE(e, D) = log X f2F wE f h (1 ↵)tf{0,1}(e, ˆDf ) + ↵ dfe,f dff i , (13) where the linear interpolation implements the Jelinek-Mercer smooth- ing method, with ↵ set to 0.1, and tf{0,1}(e, ˆDf ) indicates whether the entity e is present in the document field ˆDf or not. For the back- ground model, we employ the notion of document frequency as fol- lows: dfe,f = |{ ˆD|e 2 ˆDf }| is the total number of documents that contain the entity e in field f and df(f) = |{ ˆD| ˆDf 6= ;}| is the number of documents with a non-empty field f. All of the feature functions fT , fO, fU , and fE involve free pa- rameters wf , which control the field weights. Zhiltsov et al. [46] set these type of parameters using a learning algorithm, which leads to a large number of parameters to be trained (the number of fea- field weight (PRMS probability)
  15. 15. Results
  16. 16. Overall performance • Experiments on DBpedia-entity collection Contains nearly 500 heterogeneous queries Model SemSearch ES ListSearch INEX-LD MAP P@10 MAP P@10 MAP P@10 LM .2485 .2008 .1529 .1939 .1129 .2210 LM + ELR .2531M (+1.8%) .2008 .1688N (+10.4%) .2096 .1244N (+10.2%) .2330 PRMS .3517 .2685 .1722 .2270 .1184 .2240 PRMS + ELR .3573 (+1.6%) .2700 .1956N (+14.1%) .2417 .1303N (+10%) .2330 SDM .2669 .2108 .1553 .1948 .1167 .2250 SDM + ELR .2641 (-1%) .2115 .1689M (+8.8%) .2174N .1264N (+8.3%) .2340 FSDM* .3563 .2692 .1777 .2165 .1261 .2290 FSDM* + ELR .3581 (+.5%) .2677 .1973M (+11%) .2391N .1332 (+5.6%) .2330 Table 4: Results of ELR approach on different query types. Significance is tested against the line a show the relative improvements, in terms of MAP. Model MAP P@10 LM 0.1588 0.1664 MLM-tc 0.1821 0.1786 MLM-all 0.1940 0.1965 PRMS 0.1936 0.1977 SDM 0.1719 0.1707 FSDM* 0.2030 0.1973 LM + ELR 0.1693N (+6.6%) 0.1757N (+5.6%) MLM-tc + ELR 0.1937N (+6.4%) 0.1895N (+6.1%) MLM-all + ELR 0.2082N (+7.3%) 0.2054M (+4.5%) PRMS + ELR 0.2078N (+7.3%) 0.2085N (+5.5%) SDM + ELR 0.1794N (+4.4%) 0.1812N (+6.1%) FSDM* + ELR 0.2159N (+6.3%) 0.2078N (+5.3%) Table 3: Retrieval results for baseline models (top) and with ELR applied on top of them (bottom). Significance is tested against the corresponding baseline model. Best scores are in boldface. shows the results we get by a lines. We observe consistent i relative ranking of models rem tc < MLM-all < PRMS < FS proved by 4.4–7.3% in terms of P@10. All improvements a these results, we answer our fi tity annotations of the query performance. For the analysis that follow of these models: LM, PRMS, enables us to make a meaning two dimensions: (i) single vs PRMS and FSDM*), and (ii) (LM and PRMS vs. SDM and 6.3 Breakdown by q How are the different quer Intuitively, we expect ELR to “Airports in Germany,” “the first 13 yword queries, involving a mixture of , and attributes (e.g., “Eiffel,” “viet- roman architecture in paris”). guage queries (e.g., “which country fy come from,” “give me all female tatistics on these query subsets. in the term-based representation is a wo of the evaluated models (single- mined retrieval performance against a = 10i , i = 0, 1, 2, 3), see Figure 3. y frequency. It is clear from the figure ined when the top 10 most frequent setting in all our experiments, unless ency, we also used the same setting, y-based representation. ng parameter settings used in our exper- in language models is set to the av- across the collection. The unordered d (11) is chosen to be 8, as suggested parameters involved in SDM, FSDM, y the Coordinate Ascent (CA) algo- ize Mean Average Precision (MAP). mization technique, which iteratively while holding all other parameters CA implementation provided in the the number of random restarts to 3. the parameters of SDM, FSDM*, sing 5-fold cross validation for each ely. We note that Zhiltsov et al. [46] meters (Eq. (5)-(11)) for the FSDM tity representation from [46] (with 10 default search settings, then this set is re-ranked with the specific retrieval model (using an in-house implementation). Evaluation scores are reported on the top 100 results. To perform cross-valida- tion, we randomly create train and test folds from the initial result set, and use the same folds throughout all the experiments. To mea- sure statistical significance we employ a two-tailed paired t-test and denote differences at the 0.01 and 0.05 levels using the N and M sym- bols, respectively. 5.4 Entity linking Entity linking is a key component of the ELR approach. For the purpose of reproducibility, all the entity annotations in this work are obtained using an open source entity linker, TAGME [19], ac- cessed through its RESTful API.1 TAGME is one of the best per- forming entity linkers for short queries [12, 14]. As suggested in the API documentation, we use the default threshold 0.1 in our ex- periments; we analyze the effect of the threshold parameter in §6.5. 6. RESULTS AND ANALYSIS We begin by enumerating our research questions, then present a series of experiments conducted to answer them. 6.1 Research questions We address the following research questions: • RQ1: Can entity retrieval performance be improved by in- corporating entity annotations of the query? (§6.2) • RQ2: How are the different query subsets impacted by ELR? (§6.3) • RQ3: How robust is our method with respect to parameter settings? (§6.4) • RQ4: What is the impact of the entity linking component on end-to-end performance? (§6.5) Retrieval results for the baseline models (top) and with ELR applied on top of them (bottom). Significance is tested against the corresponding baseline model. 6.2 Overall performance
  17. 17. nokia e73 What is the longest river? Vietnam war facts Airlines that currently use Boeing 747 planes EU countries University of Texas at Austin Named entity queries: Complex queries (list, type, and NLP queries): Breakdown by query sets Queries are of two main categories:
  18. 18. Breakdown by query sets Model SemSearch ES ListSearch INEX-LD QALD-2 MAP P@10 MAP P@10 MAP P@10 MAP P@10 LM .2485 .2008 .1529 .1939 .1129 .2210 .1132 .0729 LM + ELR .2531M (+1.8%) .2008 .1688N (+10.4%) .2096 .1244N (+10.2%) .2330 .1241N (+9.6%) .0836N PRMS .3517 .2685 .1722 .2270 .1184 .2240 .1180 .0893 PRMS + ELR .3573 (+1.6%) .2700 .1956N (+14.1%) .2417 .1303N (+10%) .2330 .1343N (+13.8%) .1064N SDM .2669 .2108 .1553 .1948 .1167 .2250 .1369 .0750 SDM + ELR .2641 (-1%) .2115 .1689M (+8.8%) .2174N .1264N (+8.3%) .2340 .1472M (+7.5%) .0857 FSDM* .3563 .2692 .1777 .2165 .1261 .2290 .1364 .0921 FSDM* + ELR .3581 (+.5%) .2677 .1973M (+11%) .2391N .1332 (+5.6%) .2330 .1583M (+16%) .1086N Table 4: Results of ELR approach on different query types. Significance is tested against the line above; the numbers in parentheses show the relative improvements, in terms of MAP. Model MAP P@10 LM 0.1588 0.1664 MLM-tc 0.1821 0.1786 MLM-all 0.1940 0.1965 PRMS 0.1936 0.1977 SDM 0.1719 0.1707 shows the results we get by applying ELR on top of these base- lines. We observe consistent improvements over all baselines; the relative ranking of models remains the same (LM < SDM < MLM- tc < MLM-all < PRMS < FSDM*), but their performance is im- proved by 4.4–7.3% in terms of MAP, and by 4.5–6.1% in terms of P@10. All improvements are statistically significant. Based on these results, we answer our first research question positively: en- ELR especially benefits complex heterogenous queries.
  19. 19. Analysis
  20. 20. Parameters Value of parameters (average of trained values across folds) Different query types are impacted differently by the ELR method.
  21. 21. Impact of parameters Default parameters suggested based on trained values No training is performed (b) PRMS+ELR (c) SDM+ELR (d) FSDM*+ELR T : unigrams, O: ordered bigrams, U : unordered bigrams, E: entities) in our experiments, d using Coordinate Ascent). is attributed to entity-based O,U,E} parameters for each are obtained by averaging olds of the cross-validation d across all retrieval meth- re assigned the highest E r but still sizable portion of S it bears little importance. nclude that different query ELR method. The results prove complex (ListSearch us (INEX-LD) query sets, the other hand, short key- eit often ambiguous, entity f parameters (RQ3)? To configurations: (i) default Model Default params. Trained params. MAP P@10 MAP P@10 LM 0.1588 0.1664 LM + ELR 0.1668M 0.1724 0.1693N 0.1757N PRMS 0.1936 0.1977 PRMS + ELR 0.2028N 0.2035 0.2078N 0.2085N SDM 0.1672 0.1685 0.1719 0.1707 SDM + ELR 0.1721 0.1722 0.1794N 0.1812N FSDM 0.1969 0.1973 0.2030 0.1973 FSDM + ELR 0.2043N 0.1996 0.2159N 0.2078N Table 5: Comparison of default vs. trained parameters over all queries. Significance is tested against the line above. threshold value. Figure 5 reports the results for the best performing model, FSDM* + ELR, for both trained and default parameters. Apart from the small fluctuations in the 0.4–0.6 range, retrieval per- • ELR can bring improvements, even with default parameter values. • ELR is robust against parameter settings.
  22. 22. Impact of entity linking 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 Entity linking threshold 0.195 0.200 0.205 0.210 0.215 0.220 0A3 7rDined pDrDPeters DefDult pDrDPeters Entity linking systems involve a threshold parameter ELR improves performance, even considering entities with low confidence
  23. 23. Summary • ELR complements any term-based retrieval model that can be emulated in the MRF framework • ELR brings consistent improvements over all baselines (especially for complex queries) • ELR is robust against parameter settings parameters Entity linking threshold
  24. 24. Thanks! Questions? The code and runs are publicly available: http://bit.ly/ictir2016-elr

×