“30 are better than one” Query-Driven Hypothesis Generation for  Answering Queries over NLP Graphs   Chris Welty, Ken Bark...
Approach         the NLP process is not a one-shot dealto decrease the cost of maintaining critical system DBs the query p...
NLP Stack•  Contains NER, CoRef, RelEx, entity disambiguation  •  RelEx: SVM learner with output score: probabilities/    ...
NLP Graph                      citizenOf        Person                         Country                        citizenOf   ...
NLP Graph                    citizenOfPerson                                  Country         Mr.	  X	             India	 ...
NLP Graph                          Mr. X                                    rdf:type	     rdf:type	                       ...
Relation Extraction by RelEx•    RelEx: a set of SVM binary classifiers, one per relation•    for each sentence in the corp...
RelEx Secondary Graph                           Mr. X                                    rdf:type	     rdf:type	          ...
Primary vs. Secondary                 P          R               FPrimary @ 0.1   0.19      0.39            0.26Primary @ ...
Conjunctive Queriesfind all terrorist organizations that were agents of bombings               in Lebanon on October 23, 19...
Problem with Conjunctive Queries     n•  [Π Recall(Rk) ] x Recallcoref    k=1•  Recall for n term query O(Recalln)•  for c...
Hypothesis Generation•  For queries of size N   –  For each term       •  relax the query by removing the term H       •  ...
mric:bombing	                         mric:TerroristOrganiza=on	                                                          ...
This subgraph matchesfind all bombings by                                          the relaxed query	  terrorist orgs in Le...
Hypothesis Validation•  Once generated, a hypothesis must be validated   –  gather evidence that it is true   –  the proba...
Secondary Graph for Validation•  Hypotheses can be validated by looking for the   tuple in the secondary graph   •  a tupl...
Experiments•  3 for dev, 3 for test•  each experiment compares query results from   only PG to query results using the PG+...
0-threshold primary graph    with & without secondary graph                          secondary graph: all@0for a given PG ...
.1-threshold primary graph      with & without secondary graph                            secondary graph: all@0best perfo...
.2-threshold primary graph   with & without secondary graph                        secondary graph: all@0best performance ...
Performance                          Text the difference at the chosen threshold on the test setsignificantly outperforms th...
Conclusions•  the secondary graph can be exploited for getting answers•  the probability that a relation is true between t...
Upcoming SlideShare
Loading in …5
×

Query-Driven Hypothesis Generation for Answering Queries over NLP Graphs, by Chris Welty, Ken Barker, Lora Aroyo, Shilpa Arora

1,171 views
1,006 views

Published on

This paper has been presented at the ISWC2012.

It has become common to use RDF to store the results of Natural Language
Processing (NLP) as a graph of the entities mentioned in the text with the
relationships mentioned in the text as links between them. These NLP graphs
can be measured with Precision and Recall against a ground truth graph representing
what the documents actually say. When asking conjunctive queries on
NLP graphs, the Recall of the query is expected to be roughly the product of the
Recall of the relations in each conjunct. Since Recall is typically less than one,
conjunctive query Recall on NLP graphs degrades geometrically with the number
of conjuncts. We present an approach to address this Recall problem by hypothesizing
links in the graph that would improve query Recall, and then attempting
to find more evidence to support them. Using this approach, we confirm that in
the context of answering queries over NLP graphs, we can use lower confidence
results from NLP components if they complete a query result.

Published in: Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
1,171
On SlideShare
0
From Embeds
0
Number of Embeds
7
Actions
Shares
0
Downloads
7
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Query-Driven Hypothesis Generation for Answering Queries over NLP Graphs, by Chris Welty, Ken Barker, Lora Aroyo, Shilpa Arora

  1. 1. “30 are better than one” Query-Driven Hypothesis Generation for Answering Queries over NLP Graphs Chris Welty, Ken Barker, Lora Aroyo, Shilpa Arora Tex Answering Conjunctive SPARQL Queries over NLP Graphs
  2. 2. Approach the NLP process is not a one-shot dealto decrease the cost of maintaining critical system DBs the query provides context without changing the LSW can we replace the human for what the user is seeking andcan we build a machine re-interpret the text thus an opportunity to reader for this NLP NLP Stack Graphs query re-interpret Answering Conjunctive SPARQL Queries over NLP Graphs
  3. 3. NLP Stack•  Contains NER, CoRef, RelEx, entity disambiguation •  RelEx: SVM learner with output score: probabilities/ confidences for each known relation that the sentence expresses it between each pair of mentions•  Run over target corpus producing NLP graph •  nodes are entities (clusters of mentions produced by coref) •  edges are type statements between entities and classes in the ontology, or relations detected between mentions of these entities in the corpus Answering Conjunctive SPARQL Queries over NLP Graphs
  4. 4. NLP Graph citizenOf Person Country citizenOf …  Mr.  X  of  India  …   coref…  in  places  like  India,  Iraq,  …   citizenOf GPE Country subPlace
  5. 5. NLP Graph citizenOfPerson Country Mr.  X   India   coref India   Iraq   GPE Country subPlace
  6. 6. NLP Graph Mr. X rdf:type rdf:type citizenOf India Country IndiaPerson GPE rdf:type subPlaceOf rdf:type Iraq rdf:subClassOf
  7. 7. Relation Extraction by RelEx•  RelEx: a set of SVM binary classifiers, one per relation•  for each sentence in the corpus, •  for each pair of mentions in that sentence, •  for each known relation •  produce a probability that that pair is related by the relation•  NLP graphs are generated by selecting relations from RelEx output in two ways: •  Primary: takes only the top scoring relation between any mention pair above a confidence threshold •  Secondary: takes all relations between all mention pairs above a threshold Answering Conjunctive SPARQL Queries over NLP Graphs
  8. 8. RelEx Secondary Graph Mr. X rdf:type rdf:type causes locatedIn subPlaceOf citizenOf India Country IndiaPerson GPE rdf:type subPlaceOf rdf:type Iraq rdf:subClassOf
  9. 9. Primary vs. Secondary P R FPrimary @ 0.1 0.19 0.39 0.26Primary @ 0.2 0.29 0.33 0.30Secondary @ 0 0.01 0.95 0.02 Recall of max-F configuration
  10. 10. Conjunctive Queriesfind all terrorist organizations that were agents of bombings in Lebanon on October 23, 1983:   SELECT  ?t   WHERE  {   ?t  rdf:type  mric:TerroristOrganization  .   ?b  rdf:type  mric:Bombing  .  R  =  .65   ?b  mric:mediatingAgent  ?t  .  R  =  .09   ?b  mric:eventLocation  mric:Lebanon  .  R  =  .97   ?b  mric:eventDate  "1983-­‐10-­‐23"  .      }  R  =  .057    
  11. 11. Problem with Conjunctive Queries n•  [Π Recall(Rk) ] x Recallcoref k=1•  Recall for n term query O(Recalln)•  for complex queries Recall becomes dominating factor•  in our experiments: query recall <.1 for n>3•  To get any particular correct answer, all NLP components had to get it right
  12. 12. Hypothesis Generation•  For queries of size N –  For each term •  relax the query by removing the term H •  for each solution –  bind the variables in H from the solution forming a hypothesis –  If no solutions for size N-1 are found, then try for N-2•  appropriate for queries that are almost answerable, e.g. missing one of the terms•  biased towards generating more answers to queries, e.g. perform poorly on queries for which the corpus does not contain the answer
  13. 13. mric:bombing   mric:TerroristOrganiza=on   rdf:type rdf:typeSELECT  ?t   t   mric:mediatingAgent b  WHERE  {   ?t  rdf:type  mric:TerroristOrganization  .   ?b  rdf:type  mric:Bombing  .   mric:eventLocation ?b  mric:mediatingAgent  ?t  .   mric:eventDate ?b  mric:eventLocation  mric:Lebanon  .   ?b  mric:eventDate  "1983-­‐10-­‐23"  .      }   mric:Lebanon   1983-­‐10-­‐23   find all bombings by terrorist orgs in Lebanon (hypothesize that the bombings were on 1983-10-23)
  14. 14. This subgraph matchesfind all bombings by the relaxed query  terrorist orgs in Lebanon mric:org-­‐16   mric:event-­‐3   mric:eventDate 1983-­‐10-­‐23   hypothesize that event-3 was on 1983-10-23  
  15. 15. Hypothesis Validation•  Once generated, a hypothesis must be validated –  gather evidence that it is true –  the probability of a triple being true increases•  We utilize a stack of hypothesis checkers that provide –  confidence whether a hypothesis holds –  provenance: a pointer to a span of text that supports it•  Can be used to bind complex computational tasks –  e.g. formal reasoning/choosing between low-confidence extractions –  such tasks are made more tractable by using hypotheses as goals, e.g. a reasoner may be used effectively by constraining to only a part of the graph connected to a hypothesis
  16. 16. Secondary Graph for Validation•  Hypotheses can be validated by looking for the tuple in the secondary graph •  a tuple will appear in SG if the subject and object entities occur in the same sentence somewhere in the corpus•  With precision at .02, it is important to find a productive threshold for accepting hypotheses •  we conducted several experiments to find this threshold
  17. 17. Experiments•  3 for dev, 3 for test•  each experiment compares query results from only PG to query results using the PG+SG for hypothesis validation•  the three experiments compare performance at different primary graph thresholds
  18. 18. 0-threshold primary graph with & without secondary graph secondary graph: all@0for a given PG threshold we vary the SG threshold for validated hypotheses (x-axis)
  19. 19. .1-threshold primary graph with & without secondary graph secondary graph: all@0best performance point (.01 SG threshold)red line indicates the PG threshold - the PG-only flattens below this threshold as expected
  20. 20. .2-threshold primary graph with & without secondary graph secondary graph: all@0best performance point (.01 SG threshold) if a triple in the SG completes a query that is mostly answered by the PG it is very likely to be true the best performing configuration for dev is .2 threshold PG with SG hypotheses validated at .01 threshold
  21. 21. Performance Text the difference at the chosen threshold on the test setsignificantly outperforms the baseline on the same set
  22. 22. Conclusions•  the secondary graph can be exploited for getting answers•  the probability that a relation is true between two entities increases significantly when that relation completes a query answer that is partially satisfied in the primary graph•  able to target discarded interpretations when they will meet some user need•  the NLP process is not a one-shot deal, the query provides context for what the user is seeking and thus an opportunity to re-interpret the text

×