Textual Entailment. ACL 2007 Tutorial. Dan Roth, University of Illinois, Urbana-Champaign, USA; Ido Dagan, Bar Ilan University, Israel; Fabio Massimo Zanzotto, University of Rome, Italy.
Outline. Motivation and Task Definition; A Skeletal Review of Textual Entailment Systems; Knowledge Acquisition Methods; Applications of Textual Entailment; A Textual Entailment View of Applied Semantics.
I. Motivation and Task Definition
Motivation. Text applications require semantic inference. A common framework for applied semantics is needed, but still missing. Textual entailment may provide such a framework.
Desiderata for a Modeling Framework. A framework for a target level of language processing should provide: a generic (feasible) module for applications, and a unified (agreeable) paradigm for investigating language phenomena. Most semantics research is scattered: WSD, NER, SRL, lexical semantic relations… (e.g. vs. syntax), with interpretation as the dominating approach.
Natural Language and Meaning. [Diagram: the mapping between language and meaning, complicated by ambiguity and variability.]
Variability of Semantic Expression. Model variability as relations between text expressions: equivalence: text1 ↔ text2 (paraphrasing); entailment: text1 → text2 (the general case). Example expressions: "Dow ends up"; "Dow climbs 255"; "The Dow Jones Industrial Average closed up 255"; "Stock market hits a record high"; "Dow gains 255 points".
Typical Application Inference: Entailment. QA example: the question "Who bought Overture?" yields the expected answer form "X bought Overture"; the text "Overture's acquisition by Yahoo" entails the hypothesized answer "Yahoo bought Overture". Similar for IE: X acquire Y. Similar for "semantic" IR: t: "Overture was bought for …". Multi-document summarization: identify redundant information. MT evaluation (and recent ideas for MT). Educational applications.
KRAQ'05 Workshop — Knowledge and Reasoning for Answering Questions (IJCAI-05). From the CFP — reasoning aspects: information fusion, search criteria expansion models, summarization and intensional answers, reasoning under uncertainty or with incomplete knowledge; knowledge representation and integration: levels of knowledge involved (e.g. ontologies, domain knowledge), knowledge extraction models and techniques to optimize response accuracy. … but similar needs hold for other applications — can entailment provide a common empirical framework?
Classical Entailment Definition. Chierchia & McConnell-Ginet (2001): a text t entails a hypothesis h if h is true in every circumstance (possible world) in which t is true. Strict entailment — doesn't account for some uncertainty allowed in applications.
"Almost Certain" Entailments. t: "The technological triumph known as GPS … was incubated in the mind of Ivan Getting." h: "Ivan Getting invented the GPS."
Applied Textual Entailment. A directional relation between two text fragments, Text (t) and Hypothesis (h): t entails h (t → h) if humans reading t will infer that h is most likely true. Operational (applied) definition: a human gold standard, as in NLP applications; common background knowledge is assumed, which is indeed expected from applications.
Probabilistic Interpretation. Definition: t probabilistically entails h if P(h is true | t) > P(h is true), i.e., t increases the likelihood of h being true (equivalently, positive PMI: t provides information on h's truth). P(h is true | t) is the entailment confidence — the relevant entailment score for applications; in practice, "most likely" entailment is expected.
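A minimal sketch of this decision rule in Python; the probability estimates are assumed to come from some external model, and the names and numbers are illustrative only:

```python
def probabilistically_entails(p_h_given_t: float, p_h: float) -> bool:
    """t probabilistically entails h iff P(h is true | t) > P(h is true)."""
    return p_h_given_t > p_h

# Toy example: the prior that "Yahoo bought Overture" is true is low, but
# given t = "Overture's acquisition by Yahoo" it is near-certain; the
# conditional probability itself serves as the entailment confidence.
confidence = 0.95          # assumed estimate of P(h | t)
assert probabilistically_entails(confidence, 0.01)
```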
The Role of Knowledge. For textual entailment to hold we require: text AND knowledge → h, but the knowledge alone should not entail h. Systems are not supposed to validate h's truth regardless of t (e.g. by searching for h on the web).
PASCAL Recognizing Textual Entailment (RTE) Challenges. EU FP-6 funded PASCAL Network of Excellence, 2004-7: Bar-Ilan University; ITC-irst and CELCT, Trento; MITRE; Microsoft Research.
Generic Dataset by Application Use. 7 application settings in RTE-1, 4 in RTE-2/3: QA; IE; "semantic" IR; comparable documents / multi-document summarization; MT evaluation; reading comprehension; paraphrase acquisition. Most data created from actual application output. RTE-2/3: 800 examples in the development and test sets; 50-50% YES/NO split.
RTE Examples.
  1. TEXT: "Regan attended a ceremony in Washington to commemorate the landings in Normandy." HYPOTHESIS: "Washington is located in Normandy." TASK: IE. ENTAILMENT: False.
  2. TEXT: "Google files for its long awaited IPO." HYPOTHESIS: "Google goes public." TASK: IR. ENTAILMENT: True.
  3. TEXT: "…: a shootout at the Guadalajara airport in May, 1993, that killed Cardinal Juan Jesus Posadas Ocampo and six others." HYPOTHESIS: "Cardinal Juan Jesus Posadas Ocampo died in 1993." TASK: QA. ENTAILMENT: True.
  4. TEXT: "The SPD got just 21.5% of the vote in the European Parliament elections, while the conservative opposition parties polled 44.5%." HYPOTHESIS: "The SPD is defeated by the opposition parties." TASK: IE. ENTAILMENT: True.
Participation and Impact. Very successful challenges, worldwide: RTE-1 — 17 groups; RTE-2 — 23 groups (~150 downloads); RTE-3 — 25 groups, with a joint workshop at ACL-07. High interest in the research community: papers, conference sessions and areas, PhDs, influence on funded projects; a Textual Entailment special issue at JNLE; this ACL-07 tutorial.
Methods and Approaches (RTE-2). Measure the similarity match between t and h (coverage of h by t): lexical overlap (unigram, n-gram, subsequence); lexical substitution (WordNet, statistical); syntactic matching/transformations; lexical-syntactic variations ("paraphrases"); semantic role labeling and matching; global similarity parameters (e.g. negation, modality). Also: cross-pair similarity; detecting mismatch (for non-entailment); interpretation to a logic representation plus logic inference.
Dominant Approach: Supervised Learning. Features model similarity and mismatch; a classifier determines the relative weights of the information sources; train on the development set and auxiliary t-h corpora. Pipeline: (t, h) → similarity features (lexical, n-gram, syntactic, semantic, global) → feature vector → classifier → YES/NO. (A minimal sketch of this pipeline follows.)
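A minimal sketch of the supervised pipeline, assuming scikit-learn is available; the three similarity/mismatch features below are illustrative stand-ins for the richer lexical, syntactic, and semantic feature sets used by actual RTE systems:

```python
from sklearn.linear_model import LogisticRegression

def features(t: str, h: str) -> list:
    t_w, h_w = set(t.lower().split()), set(h.lower().split())
    coverage = len(t_w & h_w) / max(len(h_w), 1)       # coverage of h by t
    length_ratio = len(h_w) / max(len(t_w), 1)
    neg_mismatch = float(("not" in t_w) != ("not" in h_w))
    return [coverage, length_ratio, neg_mismatch]

def train(pairs):
    """pairs: iterable of (t, h, label) from the development set."""
    X = [features(t, h) for t, h, _ in pairs]
    y = [label for _, _, label in pairs]
    return LogisticRegression().fit(X, y)

# Usage: clf = train(dev_pairs); clf.predict([features(t, h)]) -> YES/NO
```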
RTE-2 Results. Accuracy average: 60%; median: 59%.
  First Author (Group)       Accuracy      Average Precision
  Hickl (LCC)                75.4%         80.8%
  Tatu (LCC)                 73.8%         71.3%
  Zanzotto (Milan & Rome)    63.9%         64.4%
  Adams (Dallas)             62.6%         62.8%
  Bos (Rome & Leeds)         61.6%         66.9%
  11 groups                  58.1%-60.5%
  7 groups                   52.9%-55.6%
Analysis. For the first time, methods that carry some deeper analysis seemed (?) to outperform shallow lexical methods. Cf. Kevin Knight's invited talk at EACL-06, titled "Isn't Linguistic Structure Important, Asked the Engineer". Still, most systems that do utilize deep analysis did not score significantly better than the lexical baseline.
Why? System reports point at: lack of knowledge (syntactic transformation rules, paraphrases, lexical relations, etc.) and lack of training data. It seems that the systems that coped better with these issues performed best: Hickl et al. — acquisition of large entailment corpora for training; Tatu et al. — large knowledge bases (linguistic and world knowledge).
Some Suggested Research Directions. Knowledge acquisition: unsupervised acquisition of linguistic and world knowledge from general corpora and the web; acquiring larger entailment corpora; manual resources and knowledge engineering. Inference: a principled framework for inference and fusion of information levels; are we happy with bags of features?
Complementary Evaluation Modes. "Seek" mode: input — h and a corpus; output — all entailing t's in the corpus; captures information-seeking needs, but requires post-run annotation (TREC-style). Entailment subtask evaluations: lexical, lexical-syntactic, logical, alignment… Contribution to various applications: QA — Harabagiu & Hickl, ACL-06; RE — Romano et al., EACL-06.
II. A Skeletal Review of Textual Entailment Systems
Textual Entailment (example). "Eyeing the huge market potential, currently led by Google, Yahoo took over search company Overture Services Inc. last year" entails "Yahoo acquired Overture". Related statements: "Overture is a search company"; "Google is a search company"; …; "Google owns Overture". Components involved: phrasal verb paraphrasing; entity matching; semantic role labeling; alignment; integration. How?
A General Strategy for Textual Entailment. Given sentences T and H: re-represent each at several levels (lexical, syntactic, semantic), drawing on a knowledge base of semantic, structural and pragmatic transformations/rules; find the set of transformations/features of the new representation (or use these to create a cost function) that allows embedding of H in T; then decide.
Details of the Entailment Strategy. Preprocessing: multiple levels of lexical pre-processing; syntactic parsing; shallow semantic parsing; annotating semantic phenomena. Representation: bag of words and n-grams through tree/graph-based representations; logical representations. Knowledge sources: syntactic mapping rules; lexical resources; modules for specific semantic phenomena; RTE-specific knowledge sources; additional corpora/web resources. Control strategy and decision making: single-pass vs. iterative processing; strict vs. parameter-based. Justification: what can be said about the decision?
The Case of Shallow Lexical Approaches. Preprocessing: identify stop words. Representation: bag of words. Knowledge sources: shallow lexical resources, typically WordNet. Control strategy and decision making: single pass; compute similarity and apply a threshold tuned on a development set (possibly per task). Justification: it works.
Shallow Lexical Approaches (Example). Lexical/word-based semantic overlap: a score based on matching each word in H with some word in T. The word similarity measure may use WordNet; it may take account of subsequences and word order; a threshold on the maximum word-based match score is "learned". Hyp: "The Cassini spacecraft has reached Titan." Texts: "The Cassini spacecraft has taken images that show rivers on Saturn's moon Titan."; "NASA's Cassini-Huygens spacecraft traveled to Saturn in 2006."; "The Cassini spacecraft arrived at Titan in July, 2006." Clearly this may not appeal to what we think of as understanding, and it is easy to generate cases for which it does not work well; however, it works (surprisingly) well with respect to current evaluation metrics (data sets?).
An Algorithm: Local Lexical Matching. For each word in Hypothesis and Text: if the word matches a stopword, remove it; if no words are left in Hypothesis or Text, return 0. Set numberMatched = 0; for each word W_H in Hypothesis and each word W_T in Text, compute HYP_LEMMAS = Lemmatize(W_H) and TEXT_LEMMAS = Lemmatize(W_T) using WordNet; if any term in HYP_LEMMAS matches any term in TEXT_LEMMAS under LexicalCompare(), increment numberMatched. Return numberMatched / |HYP_LEMMAS|.
An Algorithm: Local Lexical Matching (Cont.). LexicalCompare(): return true if LEMMA_H == LEMMA_T; or HypernymDistanceFromTo(textWord, hypothesisWord) <= 3; or MeronymDistanceFromTo(textWord, hypothesisWord) <= 3; or MemberOfDistanceFromTo(textWord, hypothesisWord) <= 3; or SynonymOf(textWord, hypothesisWord). Notes: LexicalCompare is asymmetric and makes use of a single relation type; additional differences could be attributed to the stop word list (e.g., including auxiliary verbs); straightforward improvements such as bi-grams do not help; more sophisticated lexical knowledge (entities; time) should help. LLM performance — RTE-2: dev 63.00, test 60.50; RTE-3: dev 67.50, test 65.63. (A runnable approximation follows.)
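A runnable approximation of LLM, assuming NLTK with the WordNet data downloaded (nltk.download('wordnet')). The meronym/member-of checks of the original are collapsed into a synonym/hypernym test and the stopword list is a toy one, so scores will differ from the reported LLM numbers:

```python
from nltk.corpus import wordnet as wn
from nltk.stem import WordNetLemmatizer

STOP = {"the", "a", "an", "of", "in", "to", "is", "was", "has", "have"}
lemmatize = WordNetLemmatizer().lemmatize

def lexical_compare(w_t: str, w_h: str) -> bool:
    """Asymmetric: does the text word entail the hypothesis word?"""
    if lemmatize(w_t) == lemmatize(w_h):
        return True
    h_synsets = wn.synsets(w_h)
    for s_t in wn.synsets(w_t):
        if s_t in h_synsets:                  # shared synset: synonyms
            return True
        frontier = {s_t}                      # hypernym chain of length <= 3
        for _ in range(3):
            frontier = {h for s in frontier for h in s.hypernyms()}
            if any(s in h_synsets for s in frontier):
                return True
    return False

def llm_score(text: str, hyp: str) -> float:
    t_words = [w for w in text.lower().split() if w not in STOP]
    h_words = [w for w in hyp.lower().split() if w not in STOP]
    if not t_words or not h_words:
        return 0.0
    matched = sum(any(lexical_compare(w_t, w_h) for w_t in t_words)
                  for w_h in h_words)
    return matched / len(h_words)
```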
Details of the Entailment Strategy (recap): preprocessing; representation; knowledge sources; control strategy and decision making; justification.
Preprocessing. Syntactic processing: syntactic parsing (Collins; Charniak; CCG); dependency parsing (plus types). Lexical processing: tokenization and lemmatization; phrasal verbs; idiom processing; named entities plus normalization; date/time arguments plus normalization (often used only during decision making). Semantic processing (only a few systems; often used only during decision making): semantic role labeling; nominalization; modality/polarity/factivity; co-reference.
Details of the Entailment Strategy (recap): preprocessing; representation; knowledge sources; control strategy and decision making; justification.
Basic Representations. Between raw text and a meaning representation lie the representation levels on which inference for textual entailment operates: local lexical; syntactic parse; semantic representation; logical forms. Most approaches augment the basic structure defined by the processing level with additional annotation and make use of a tree/graph/frame-based system.
Basic Representations (Syntax). Hyp: "The Cassini spacecraft has reached Titan." [Figure: the local lexical vs. syntactic parse representations of the sentence.]
Basic Representations (Shallow Semantics: Pred-Arg) [Roth & Sammons '07]. T: "The government purchase of the Roanoke building, a former prison, took place in 1902." H: "The Roanoke building, which was a former prison, was bought by the government in 1902." [Figure: predicate-argument structures for T (take place; purchase) and H (buy; be), with ARG_0/ARG_1/ARG_2 arguments and the temporal adjunct "in 1902".]
Basic Representations (Logical Representation) [Bos & Markert]. The semantic representation language is a first-order fragment of the language used in Discourse Representation Theory (DRS), conveying argument structure with a neo-Davidsonian analysis and including the recursive DRS structure to cover negation, disjunction, and implication.
Representing Knowledge Sources. Rather straightforward in the logical framework. Tree/graph-based representations may also use rule-based transformations to encode different kinds of knowledge, sometimes represented as generic or knowledge-based tree transformations.
Representing Knowledge Sources (cont.). In general there is a mix of procedural and rule-based encodings of knowledge sources: either by hanging more information on the parse tree or predicate-argument representation [example from LCC's system], or by different frame-based annotation systems for encoding information, which are processed procedurally.
Details of the Entailment Strategy (recap): preprocessing; representation; knowledge sources; control strategy and decision making; justification.
Knowledge Sources. The knowledge sources available to the system are the most significant component of supporting TE. Different systems draw the line between preprocessing capabilities and knowledge resources differently, and the way resources are handled also differs across approaches.
Enriching Preprocessing. In addition to syntactic parsing, several approaches enrich the representation with various linguistic resources: POS tagging; stemming; predicate-argument representation (verb predicates and nominalizations); entity annotation (stand-alone NERs with a variable number of classes); acronym handling and entity normalization (mapping mentions of the same entity expressed in different ways to a single ID); co-reference resolution; identification and normalization of dates, times and numeric values; identification of semantic relations (complex nominals, genitives, adjectival phrases, and adjectival clauses); event identification and frame construction.
Lexical Resources. Recognizing that a word or phrase in S entails a word or phrase in H is essential in determining textual entailment. WordNet is the most commonly used resource: in most cases a WordNet-based similarity measure between words is used (typically a symmetric relation); lexical chains over WordNet are used, in some cases taking care to disallow chains of specific relations; Extended WordNet is used to make use of entities; the derivation relation links verbs with their corresponding nominalized nouns.
Lexical Resources (Cont.). Lexical paraphrasing rules: a number of efforts to acquire relational paraphrase rules are under way, and several systems make use of resources such as DIRT and TEASE. Some systems seem to have acquired paraphrase rules that appear in the RTE corpus: person killed → claimed one life; hand reins over to → give starting job to; same-sex marriage → gay nuptials; cast ballots in the election → vote; dominant firm → monopoly power; death toll → kill; try to kill → attack; lost their lives → were killed; left people dead → people were killed.
Semantic Phenomena. A large number of semantic phenomena have been identified as significant to textual entailment, and many of them are handled (in a restricted way) by some systems; very little per-phenomenon quantification has been done, if at all. Semantic implications of interpreting syntactic structures [Braz et al. '05; Bar-Haim et al. '07]. Conjunctions: "Jake and Jill ran up the hill" → "Jake ran up the hill"; but "Jake and Jill met on the hill" does not yield *"Jake met on the hill". Clausal modifiers: "But celebrations were muted as many Iranians observed a Shi'ite mourning month." → "Many Iranians observed a Shi'ite mourning month." Semantic role labeling handles this phenomenon automatically.
Semantic Phenomena (Cont.). Relative clauses: "The assailants fired six bullets at the car, which carried Vladimir Skobtsov." → "The car carried Vladimir Skobtsov." (handled automatically by semantic role labeling). Appositives: "Frank Robinson, a one-time manager of the Indians, has the distinction for the NL." → "Frank Robinson is a one-time manager of the Indians." Passive: "We have been approached by the investment banker." → "The investment banker approached us." (handled automatically by semantic role labeling). Genitive modifier: "Malaysia's crude palm oil output is estimated to have risen." → "The crude palm oil output of Malaysia is estimated to have risen."
Logical Structure. Factivity: uncovering the context in which a verb phrase is embedded — "The terrorists tried to enter the building." does not entail "The terrorists entered the building." Polarity: negative markers or negation-denoting verbs (e.g. deny, refuse, fail) — "The terrorists failed to enter the building." does not entail "The terrorists entered the building." Modality/negation: dealing with modal auxiliary verbs (can, must, should) that modify verb meanings, and identifying the scope of negation. Superlatives/comparatives/monotonicity: inflecting adjectives or adverbs. Quantifiers, determiners and articles.
Some Examples [Braz et al., IJCAI workshop '05; PARC corpus]. T: "Legally, John could drive." H: "John drove." S: "Bush said that Khan sold centrifuges to North Korea." H: "Centrifuges were sold to North Korea." S: "No US congressman visited Iraq until the war." H: "Some US congressmen visited Iraq before the war." S: "The room was full of women." H: "The room was full of intelligent women." S: "The New York Times reported that Hanssen sold FBI secrets to the Russians and could face the death penalty." H: "Hanssen sold FBI secrets to the Russians." S: "All soldiers were killed in the ambush." H: "Many soldiers were killed in the ambush."
Details of the Entailment Strategy (recap): preprocessing; representation; knowledge sources; control strategy and decision making; justification.
Control Strategy and Decision Making. Single iteration: strict logical approaches are, in principle, a single-stage computation — the pair is processed and transformed into logic form, and existing theorem provers act on the pair along with the KB. Multiple iterations: graph-based algorithms are typically iterative; following [Punyakanok et al. '04], transformations are applied and an entailment test is run after each one. Transformations can be chained, but sometimes the order makes a difference; the algorithm can be greedy or more exhaustive, searching for the best path found [Braz et al. '05; Bar-Haim et al. '07].
Transformation Walkthrough [Braz et al. '05]. T: "The government purchase of the Roanoke building, a former prison, took place in 1902." H: "The Roanoke building, which was a former prison, was bought by the government in 1902." Does H follow from T?
Transformation Walkthrough (1). Annotate T and H with predicate-argument structure. [Figure: predicate-argument structures for T and H, as in the shallow-semantics slide above.]
Transformation Walkthrough (2). Phrasal Verb Rewriter: "The government purchase of the Roanoke building, a former prison, took place in 1902." becomes "The government purchase of the Roanoke building, a former prison, occurred in 1902."
Transformation Walkthrough (3). Nominalization Promoter: "The government purchase of the Roanoke building, a former prison, occurred in 1902." becomes "The government purchase the Roanoke building in 1902." Note: this depends on the earlier transformation — order is important!
Transformation Walkthrough (4). Apposition Rewriter: from "the Roanoke building, a former prison" derive "The Roanoke building be a former prison."
Transformation Walkthrough (5). WordNet: map "purchase" to "buy"; the transformed T now matches H's predicate-argument structure.
Characteristics. Multiple paths ⇒ an optimization problem: shortest or highest-confidence path through the transformations; order is important, so different orderings may need to be explored; module dependencies are "local" — module B does not need access to module A's KB or inference, only its output. If the outcome is "true", the (optimal) set of transformations and local comparisons forms a proof. (A toy sketch of the greedy variant follows.)
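A toy sketch of the greedy iterative control strategy. Real systems operate over trees or predicate-argument graphs; here T and H are reduced to token sets and rewrite modules to functions over them, so this only illustrates the control flow, not any particular system:

```python
def coverage(t_tokens: set, h_tokens: set) -> float:
    return len(t_tokens & h_tokens) / max(len(h_tokens), 1)

def greedy_entailment(t_tokens: set, h_tokens: set, rules, max_steps=10) -> bool:
    """rules: functions mapping a token set to a rewritten token set."""
    for _ in range(max_steps):
        if h_tokens <= t_tokens:          # H embedded in (subsumed by) T
            return True
        candidates = [r(t_tokens) for r in rules]
        candidates = [c for c in candidates if c != t_tokens]
        if not candidates:
            return False
        # greedily pick the rewrite that brings T closest to H
        t_tokens = max(candidates, key=lambda c: coverage(c, h_tokens))
    return False
```

Replacing the greedy `max` with a search over orderings recovers the shortest/highest-confidence-path variant described above.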
Summary: Control Strategy and Decision Making. Despite the appeal of strict logical approaches, as of today they do not work well enough. Bos & Markert: the strict logical approach falls significantly behind good LLMs with multiple levels of lexical pre-processing; only the incorporation of rather shallow features into the evaluation saves this approach. Braz et al.: the strict graph-based representation does not do as well as LLM. Tatu et al.: results show that the strict logical approach is inferior to LLMs, but putting them together produces some gain. Using machine learning methods to combine systems and multiple features has been found very useful.
Hybrid/Ensemble Approaches. Bos et al.: use a theorem prover and a model builder; expand models of…
Justification. For most approaches, justification is given only by the data preprocessed; emp…
R — a knowledge representation language, with a well defined syntax and semantics or a domain…
The proof theory is weak; it will show r_s ⊑ r_t only when they are relatively similar syntactically.…
A rewrite rule (l, r) is a pair of expressions in R such that l ⊑ r. Given a representatio…
The claim suggests an algorithm for generating alternative (equivalent) representations and for semantic entailmen…
The problem of determining non-entailment is harder, mostly due to its structure. Most approach…
What are we missing? It is completely clear that the key missing resource is knowledge. Be…
Textual Entailment as a Classification Task
RTE as a classification task. RTE is a classification task: given a pair, we need to decide if T…
Defining the feature space. How do we define the feature space? Possible features…
Distance Features. Possible features: number of words in common; longest common subsequence; …
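A sketch of two typical distance features over a (T, H) pair, mirroring the bullets above (word overlap and longest common subsequence); any further features are left out for brevity:

```python
def lcs_len(a: list, b: list) -> int:
    # classic dynamic program for the longest common subsequence
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[-1][-1]

def distance_features(t: str, h: str) -> dict:
    tw, hw = t.lower().split(), h.lower().split()
    return {
        "word_overlap": len(set(tw) & set(hw)) / max(len(set(hw)), 1),
        "lcs_ratio": lcs_len(tw, hw) / max(len(hw), 1),
    }
```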
Entailment Triggers. Possible features from (de Marneffe et al., 2006): …
Pair Features. Possible features: bag-of-word spaces of T and H; syn…
Pair Features: what can we learn? From the bag-of-word spaces of T and H we can learn…
ML Methods in the Possible Feature Spaces. [Table relating possible features to sentence representations: bag-of-words, s…]
Effectively Using the Pair Feature Space. Roadmap: motivation — why it is important even if…
Observing the Distance Feature Space (Zanzotto, Moschitti, 2006). [Plot: % common syntactic dependencies vs. % common wor…]
What can happen in the pair feature space? (Zanzotto, Moschitti, 2006). T1/H1: "At the end of the year, all s…"
Observations. Some examples are difficult to exploit in the distance feature space… We need…
Target. How do we build it: using a syntactic interpretation of sentences; …
Observing the Syntactic Pair Feature Space (Zanzotto, Moschitti, 2006). Can we use syntactic tree similarity? Not only! We w…
Exploiting Rewrite Rules. To capture the textual entailment recognition rule (rewrite rule or inference rule)…
Exploiting Rewrite Rules (Zanzotto, Moschitti, 2006) — stepwise illustration. Intra-pair operations: finding anchors; na… Cross-pair operations: matching placeholders across pairs; …
Exploiting Rewrite Rules. The initial example: is sim(H1, H3) > sim(H2, H3)? (Zanzotto, M…)
Defining the Cross-pair Similarity. The cross-pair similarity is based on the distance between syntactic trees with … (A toy sketch follows.)
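A toy sketch of the cross-pair similarity idea: maximize a tree similarity over correspondences between the two pairs' anchor placeholders. The "tree kernel" here is just shared-production counting over tuple-encoded trees, standing in for the syntactic tree kernel of the actual method; it assumes the first pair has no more placeholders than the second:

```python
from itertools import permutations

def productions(tree):
    if isinstance(tree, str):
        return []
    label, children = tree[0], tree[1:]
    rule = (label, tuple(c if isinstance(c, str) else c[0] for c in children))
    return [rule] + [p for c in children for p in productions(c)]

def tree_sim(t1, t2):
    p1, p2 = productions(t1), productions(t2)
    return sum(min(p1.count(r), p2.count(r)) for r in set(p1))

def rename(tree, mapping):
    if isinstance(tree, str):
        return mapping.get(tree, tree)
    return (tree[0],) + tuple(rename(c, mapping) for c in tree[1:])

def cross_pair_sim(pair1, pair2, ph1, ph2):
    """pair = (T_tree, H_tree); ph = placeholder leaf names; |ph1| <= |ph2|."""
    best = 0.0
    for perm in permutations(ph2, len(ph1)):
        m = dict(zip(ph1, perm))
        t1, h1 = (rename(x, m) for x in pair1)
        best = max(best, tree_sim(t1, pair2[0]) + tree_sim(h1, pair2[1]))
    return best
```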
Refining Cross-pair Similarity. Controlling complexity: we reduced the size of the set of anc…
BREAK (30 min)
III. Knowledge Acquisition Methods
Knowledge Acquisition for TE. What kind of knowledge do we need? Explicit knowledge (structured know…
Acquisition of Explicit Knowledge
Acquisition of Explicit Knowledge. The questions we need to answer: what? Wh…
Acquisition of Explicit Knowledge: what? Types of knowledge: symmetric (co-…
Acquisition of Explicit Knowledge: using what? Underlying hypotheses: Harris' Distributional Hy…
Distributional Hypothesis (DH). [Diagram: words/forms vs. the context (feature) space.] sim_w(w1, w2) ≈ sim_ctx(C(w1), C(w2))…
Point-wise Assertion Patterns (PAP). w1 is in a relation r with w2 if the context patterns_r(w1, w…
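A sketch of the point-wise-assertion-pattern idea: instantiate relation-specific context patterns for a candidate word pair and count corpus hits. The patterns below are illustrative part-of cues, and the "corpus" is just a list of sentences (in practice, web counts were used):

```python
import re

PART_OF_PATTERNS = ["{x} is part of {y}", "{y} is composed of {x}"]

def pattern_hits(x: str, y: str, corpus: list) -> int:
    hits = 0
    for p in PART_OF_PATTERNS:
        rx = re.compile(re.escape(p.format(x=x, y=y)), re.IGNORECASE)
        hits += sum(1 for sent in corpus if rx.search(sent))
    return hits

# pattern_hits("oxygen", "water", corpus) > 0 suggests part-of(oxygen, water)
```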
DH and PAP Cooperate. [Diagram: words/forms vs. the context (feature) space.] w1 = constitute, w2 = compose; C(w1), C(w2)…
Knowledge Acquisition: where methods differ. On the "word" side — target equivalence classes: conc…
KA4TE: a first classification of some methods. [Table: types of knowledge × underlying hypothesis (Distributional Hypothes…)]
Noun Entailment Relation. Type of knowledge: directional relations. Underlying hypothesis: distr…
Verb Entailment Relations. Type of knowledge: oriented relations. Underlying hypothesis: point-w…
Verb Entailment Relations. Understanding the idea — selectional restriction: f…
Verb Entailment Relations. Understanding the idea — given the expression "player w…
Knowledge Acquisition for TE: how? The algorithmic nature of a DH+PAP method: direct…
Direct Algorithm. [Diagram: words/forms vs. the context (feature) space.] sim(w1, w2) ≈ sim(C(w1), C(w2)); w1 = cat, …
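A minimal direct-algorithm sketch under the Distributional Hypothesis: represent each word by a bag of co-occurring context words and compare the bags with cosine similarity. The window size and the unweighted counts are simplifications:

```python
from collections import Counter
from math import sqrt

def context_vector(word: str, corpus: list, window: int = 3) -> Counter:
    vec = Counter()
    for sent in corpus:
        toks = sent.lower().split()
        for i, tok in enumerate(toks):
            if tok == word:
                vec.update(toks[max(0, i - window):i] + toks[i + 1:i + 1 + window])
    return vec

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# sim(w1, w2) ~ cosine(context_vector(w1, corpus), context_vector(w2, corpus))
```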
Indirect Algorithm. Given an equivalence class W, select relevant contexts and represent them in the fe…
Iterative Algorithm. For each word w_i in the equivalence class W, retrieve the C(w_i) contexts and r…
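A sketch of the iterative (bootstrapping) scheme this family of methods shares: alternate between harvesting scored patterns for the current instance set and harvesting new instances for the best patterns. The extraction functions are passed in as stubs; real systems add reliability measures and use the web:

```python
def bootstrap(seeds: set, corpus: list, extract_patterns, extract_instances,
              rounds: int = 3, top_k: int = 5):
    """extract_patterns(instances, corpus) -> iterable of (pattern, score);
    extract_instances(patterns, corpus) -> set of new instances."""
    instances, patterns = set(seeds), set()
    for _ in range(rounds):
        ranked = sorted(extract_patterns(instances, corpus),
                        key=lambda ps: ps[1], reverse=True)
        patterns |= {p for p, _ in ranked[:top_k]}
        instances |= extract_instances(patterns, corpus)
    return instances, patterns
```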
Knowledge Acquisition using DH and PAP. Direct algorithms: concepts from text via clustering (…
TEASE. Type: iterative algorithm. On the "word" side — target equivalence classes: …
TEASE. [Pipeline diagram over the web and a lexicon.] Input template: X subj-accuse-obj Y. Sample corpus for the input template: "Paula Jones acc…"
TEASE. Innovations with respect to research before 2004: the first direct algorithm for extracting rules…
Espresso. Type: iterative algorithm. On the "word" side — target equivalence class…
Espresso. [Walkthrough: seed instances (leader, panel), (city, region), (oxygen, water); harvested patterns such as "Y is composed by X", "X, Y", "Y is part of Y", with reliability scores (1.0, …).]
Espresso. Innovations with respect to research before 2006: a measure to determine specific vs. generic…
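A sketch of the reliability idea in Espresso (Pantel & Pennacchiotti, 2006): a pattern is reliable to the extent that it has high PMI with reliable instances, and symmetrically for instances. The paper's exact normalization is only approximated here:

```python
def pattern_reliability(p, instances, r_inst, pmi, max_pmi: float) -> float:
    """r_inst: {instance: reliability in [0, 1]}; pmi(i, p) -> PMI value."""
    if not instances or max_pmi == 0:
        return 0.0
    return sum(pmi(i, p) / max_pmi * r_inst[i] for i in instances) / len(instances)
```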
Acquisition of Implicit Knowledge
Acquisition of Implicit Knowledge. The questions we need to answer: what? Wh…
Acquisition of Implicit Knowledge: what? Types of knowledge: symmetric (nea…
Acquisition of Implicit Knowledge: using what? Underlying hypothesis: structural and content si…
A first classification of some methods. [Table: types of knowledge × underlying hypothesis (structural and content similar…)]
Entailment relations among sentences. Type of knowledge: directional relations (entailment). Unde…
Entailment relations among sentences. Examples from the web: "New York Plan for DNA Data in Most Crimes" / "Eliot Spi…"
Tricky Not-Entailment relations among sentences. Type of knowledge: directional relations (tricky not-entailment)…
Tricky Not-Entailment relations among sentences. Examples from (Hickl et al., 2006): "One player losing a close…"
Examples: "He used a Phillips head to tighten the screw." vs. "The bank owner tightened security after a spa…"
Context Sensitive Paraphrasing. Can "speak" replace "command"? "The general commanded his troops…"
Context Sensitive Paraphrasing. Need to know when one word can paraphrase another, not just if. …
Related Work. Paraphrase generation: given a sentence or phrase, generate paraphrases of th…
Context Sensitive Paraphrasing [Connor & Roth '07]. Use a single global binary classifier: f(…
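A sketch of the single-global-classifier idea in [Connor & Roth '07]: one binary function f(S, v, u) decides whether u can replace v in the specific sentence S. The two context-window features below are toy placeholders for the real contextual features, and scikit-learn is an assumed dependency:

```python
from sklearn.svm import LinearSVC

def cs_features(sentence: str, v: str, u: str) -> list:
    toks = sentence.lower().split()
    i = toks.index(v.lower()) if v.lower() in toks else 0
    window = set(toks[max(0, i - 3):i] + toks[i + 1:i + 4])  # context of v
    return [len(window), float(u.lower() in window)]          # toy features

def train_global(examples):
    """examples: [(S, v, u, label)], label 1 iff u can replace v in S."""
    X = [cs_features(s, v, u) for s, v, u, _ in examples]
    y = [lab for _, _, _, lab in examples]
    return LinearSVC().fit(X, y)
```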
IV. Applications of Textual Entailment
Relation Extraction (Romano et al., EACL-06). Identify different ways of expressing a target relation…
Proposed Approach. Input template: X prevent Y. Entailment rule acquisition → templates: X prevention for Y; X treat Y; X…
Dataset. Bunescu 2005: recognizing interactions between annotated protein pairs…
Manual Analysis — Results. 93% of interacting protein pairs can be identified with lexical-syntactic templates…
TEASE Output for "X interact with Y". A sample of correct templates learned: X binding to Y; X bind to Y; X Y interact…
Iterative — taking the top 5 ranked templates as input. Morph — recognizing morphological derivat…
Performance vs. Supervised Approaches. Supervised: 180 training abstracts…
Textual Entailment for Question Answering. Sanda Harabagiu and Andrew Hickl (ACL-06): Methods for Using Textual E…
Integrated three methods. Test entailment between question and final answer — filter and re-rank by entailment sco…
Answer Validation Exercise @ CLEF 2006-7. Peñas et al., Journal of Logic and Computation (to appear)…
V. A Textual Entailment View of Applied Semantics
Classical Approach = Interpretation. [Diagram: a stipulated meaning representation (by scholar) mapped from language (by nature); variability…]
Textual Entailment = Text Mapping. [Diagram: assumed meaning (by humans); language (by nature); variability.]
General Case — Inference. [Diagram: meaning representation vs. language; inference via interpretation vs. textual entailment.] Entailm…
Some perspectives. Issues with semantic interpretation: hard to agree on a representation lang…
Entailment as an Applied Semantics Framework. The new view: formulate (all?) semantic problems as entailment task…
Some Classical Entailment Problems. Monotonicity — traditionally approached via entailment: gi…
Revised Definition of an Old Problem: Sense Ambiguity. Classical task definition — interpretation: Word Sense Di…
Synonym Substitution. Source = "record", target = "disc". "This is anyway a stunning disc, thanks to…"
Unsupervised Direct: kNN-ranking. Test example score: average cosine similarity of the target example with its k most…
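A sketch of the kNN-ranking score: a target occurrence is scored by its average cosine similarity to its k most similar occurrences of the source word. Context vectors and a cosine function such as the one in the direct-algorithm sketch above are assumed:

```python
def knn_score(target_vec, source_vecs, cosine, k: int = 10) -> float:
    sims = sorted((cosine(target_vec, v) for v in source_vecs), reverse=True)
    top = sims[:k]
    return sum(top) / len(top) if top else 0.0
```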
Results (for synonyms): Ranking. kNN improves precision by 8-18% at recall up to 25%.
Other Modified and New Problems. Lexical entailment vs. classical lexical semantic relationships…
The importance of analyzing entailment examples. Few systematic manual data analysis works have been reported…
Unified Evaluation Framework. Defining semantic problems as entailment problems facilitates a unified evaluation sche…
Summary: Textual Entailment as Goal. The essence of the textual entailment paradigm: base ap…
Textual Entailment ≈ Human Reading Comprehension. From a children's English learning book (Sela and Greenberg): …
Cautious Optimism: Approaching the Desiderata? Generic (feasible) module for applications…
Lexical Entailment for Applications. Sense equivalence. T1: "IKEA announced a new comfort chair"; Q…
Meeting the knowledge challenge — by a coordinated effort? A vast amount of "entailment knowledge" is needed…
Speaker notes:
  • A collection of different problems of ambiguity resolution — from text correction ("Sorry, it was too tempting to use this one…"), word sense disambiguation, and part-of-speech tagging to decisions that span a sentence. All of these are essentially the same classification problem, and with progress in learning theory and NLP we have pretty reliable solutions to them today.
  • Scattered – but no systematic coverage of the space; SEMEVAL – 10 different tasks (19 overall)
  • Text understanding in a nutshell; levels of representation.
  • First step towards a broad semantic model of language variation
  • Related notions (implicature, pre-supposition) encompass likely inference
  • The intention is to have a high value for the conditional probability, which also serves as a confidence value; this assumes the prior P(h) < 1.
  • Regina warned me not to talk about Textual Entailment — so I will just use it as an example, and argue that some aspects of this work, at least the way I am looking at this problem, also fall into the general "inference over classifiers".
  • Interpretation approach: supposedly easier to infer entailment at the meaning representation level. Entailment is the only way to test understanding, as we do not have access to actual meaning: how do I know that you understand (by phone)? A version of the Turing test. (Entailment direction is ignored here.)
  • Co-hyponyms; two methods.
  • State the name of the method — TEASE — and the meaning of its acronym. Emphasize that the ASE part solves the supervision problem of previous web-based methods. Finish by restating the two parts of the TEASE method.
  • Mentions the Hickl & Harabagiu ACL-2006 result for QA.
  • Supposedly, language maps into meaning; but it is not really meaning — it is stipulated meaning representations, and we know this has been a dangerous business. Maybe it is not the most suitable framework; maybe we'd like to stay away from the red area…
  • Emphasize: no external information (sense repository, definitions, hierarchy). Potential for improvement: similarity assessment.
  • The common method for testing human reading comprehension is testing entailment capability — either directly or via QA. The difficulty: variability between question and text. Knowledge needed: Florida is in the US.
  • Mention similarity to MT
  • ACL Tutorial on Textual Entailment

    1. 1. Textual Entailment Dan Roth, University of Illinois, Urbana-Champaign USA ACL -2007 Ido Dagan Bar Ilan University Israel Fabio Massimo Zanzotto University of Rome Italy
    2. 2. <ul><li>Motivation and Task Definition </li></ul><ul><li>A Skeletal review of Textual Entailment Systems </li></ul><ul><li>Knowledge Acquisition Methods </li></ul><ul><li>Applications of Textual Entailment </li></ul><ul><li>A Textual Entailment view of Applied Semantics </li></ul>Outline Page
    3. 3. I. Motivation and Task Definition Page
    4. 4. Motivation <ul><li>Text applications require semantic inference </li></ul><ul><li>A common framework for applied semantics is needed, but still missing </li></ul><ul><li>Textual entailment may provide such framework </li></ul>Page
    5. 5. Desiderata for Modeling Framework <ul><li>A framework for a target level of language processing should provide: </li></ul><ul><ul><li>Generic (feasible) module for applications </li></ul></ul><ul><ul><li>Unified (agreeable) paradigm for investigating language phenomena </li></ul></ul><ul><li>Most semantics research is scattered </li></ul><ul><ul><li>WSD, NER, SRL, lexical semantics relations… (e.g. vs. syntax) </li></ul></ul><ul><ul><li>Dominating approach - interpretation </li></ul></ul>Page
    6. 6. Natural Language and Meaning Page Meaning Language Ambiguity Variability
    7. 7. Variability of Semantic Expression <ul><li>Model variability as relations between text expressions: </li></ul><ul><li>Equivalence: text1  text2 (paraphrasing) </li></ul><ul><li>Entailment: text1  text2 the general case </li></ul>Page Dow ends up Dow climbs 255 The Dow Jones Industrial Average closed up 255 Stock market hits a record high Dow gains 255 points
    8. 8. Typical Application Inference: Entailment <ul><li>Similar for IE: X acquire Y </li></ul><ul><li>Similar for “semantic” IR: t: Overture was bought for … </li></ul><ul><li>Summarization (multi-document) – identify redundant info </li></ul><ul><li>MT evaluation (and recent ideas for MT) </li></ul><ul><li>Educational applications </li></ul>Page Overture ’s acquisition by Yahoo Yahoo bought Overture Question Expected answer form Who bought Overture ? >> X bought Overture text hypothesized answer entails
    9. 9. KRAQ'05 Workshop - KNOWLEDGE and REASONING for ANSWERING QUESTIONS (IJCAI-05) <ul><ul><li>CFP: </li></ul></ul><ul><ul><li>Reasoning aspects:     * information fusion,     * search criteria expansion models     * summarization and intensional answers,     * reasoning under uncertainty or with incomplete knowledge, </li></ul></ul><ul><ul><li>Knowledge representation and integration:     * levels of knowledge involved (e.g. ontologies, domain knowledge),     * knowledge extraction models and techniques to optimize response accuracy … but similar needs for other applications – can entailment provide a common empirical framework? </li></ul></ul>Page
    10. 10. Classical Entailment Definition <ul><li>Chierchia & McConnell-Ginet (2001): A text t entails a hypothesis h if h is true in every circumstance (possible world) in which t is true </li></ul><ul><li>Strict entailment - doesn't account for some uncertainty allowed in applications </li></ul>Page
    11. 11. “ Almost certain” Entailments <ul><li>t: The technological triumph known as GPS … was incubated in the mind of Ivan Getting. </li></ul><ul><li>h: Ivan Getting invented the GPS. </li></ul>Page
    12. 12. Applied Textual Entailment <ul><li>A directional relation between two text fragments: Text (t) and Hypothesis (h): </li></ul>Page <ul><li>Operational (applied) definition: </li></ul><ul><ul><li>Human gold standard - as in NLP applications </li></ul></ul><ul><ul><li>Assuming common background knowledge – which is indeed expected from applications </li></ul></ul>t entails h (t  h) if humans reading t will infer that h is most likely true
    13. 13. Probabilistic Interpretation <ul><li>Definition : </li></ul><ul><li>t probabilistically entails h if: </li></ul><ul><ul><li>P( h is true | t ) > P( h is true ) </li></ul></ul><ul><ul><ul><li>t increases the likelihood of h being true </li></ul></ul></ul><ul><ul><ul><li>≡ Positive PMI – t provides information on h ’ s truth </li></ul></ul></ul><ul><li>P( h is true | t ): entailment confidence </li></ul><ul><ul><li>The relevant entailment score for applications </li></ul></ul><ul><ul><li>In practice: “most likely” entailment expected </li></ul></ul>Page
    14. 14. The Role of Knowledge <ul><li>For textual entailment to hold we require: </li></ul><ul><ul><li>text AND knowledge  h </li></ul></ul><ul><ul><li>but </li></ul></ul><ul><ul><li>knowledge should not entail h alone </li></ul></ul><ul><li>Systems are not supposed to validate h ’s truth regardless of t (e.g. by searching h on the web) </li></ul>Page
    15. 15. Page PASCAL Recognizing Textual Entailment (RTE) Challenges EU FP-6 Funded PASCAL Network of Excellence 2004-7 Bar-Ilan University ITC-irst and CELCT, Trento MITRE Microsoft Research
    16. 16. Generic Dataset by Application Use <ul><li>7 application settings in RTE-1, 4 in RTE-2/3 </li></ul><ul><ul><li>QA </li></ul></ul><ul><ul><li>IE </li></ul></ul><ul><ul><li>“Semantic” IR </li></ul></ul><ul><ul><li>Comparable documents / multi-doc summarization </li></ul></ul><ul><ul><li>MT evaluation </li></ul></ul><ul><ul><li>Reading comprehension </li></ul></ul><ul><ul><li>Paraphrase acquisition </li></ul></ul><ul><li>Most data created from actual applications output </li></ul><ul><li>RTE-2/3: 800 examples in development and test sets </li></ul><ul><li>50-50% YES/NO split </li></ul>Page
    17. 17. RTE Examples Page TEXT HYPOTHESIS TASK ENTAIL-MENT 1 Regan attended a ceremony in Washington to commemorate the landings in Normandy. Washington is located in Normandy. IE False 2 Google files for its long awaited IPO. Google goes public. IR True 3 … : a shootout at the Guadalajara airport in May, 1993, that killed Cardinal Juan Jesus Posadas Ocampo and six others. Cardinal Juan Jesus Posadas Ocampo died in 1993. QA True 4 The SPD got just 21.5% of the vote in the European Parliament elections, while the conservative opposition parties polled 44.5%. The SPD is defeated by the opposition parties. IE True
    18. 18. Participation and Impact <ul><li>Very successful challenges, world wide: </li></ul><ul><ul><li>RTE-1 – 17 groups </li></ul></ul><ul><ul><li>RTE-2 – 23 groups </li></ul></ul><ul><ul><ul><li>~150 downloads </li></ul></ul></ul><ul><ul><li>RTE-3 – 25 groups </li></ul></ul><ul><ul><ul><li>Joint workshop at ACL-07 </li></ul></ul></ul><ul><li>High interest in the research community </li></ul><ul><ul><li>Papers, conference sessions and areas, PhD’s, influence on funded projects </li></ul></ul><ul><ul><li>Textual Entailment special issue at JNLE </li></ul></ul><ul><ul><li>ACL-07 tutorial </li></ul></ul>Page
    19. 19. Methods and Approaches (RTE-2) <ul><li>Measure similarity match between t and h ( coverage of h by t) : </li></ul><ul><ul><li>Lexical overlap (unigram, N-gram, subsequence) </li></ul></ul><ul><ul><li>Lexical substitution (WordNet, statistical) </li></ul></ul><ul><ul><li>Syntactic matching/transformations </li></ul></ul><ul><ul><li>Lexical-syntactic variations (“paraphrases”) </li></ul></ul><ul><ul><li>Semantic role labeling and matching </li></ul></ul><ul><ul><li>Global similarity parameters (e.g. negation, modality) </li></ul></ul><ul><li>Cross-pair similarity </li></ul><ul><li>Detect mismatch (for non-entailment) </li></ul><ul><li>Interpretation to logic representation + logic inference </li></ul>Page
    20. 20. Dominant approach: Supervised Learning <ul><li>Features model similarity and mismatch </li></ul><ul><li>Classifier determines relative weights of information sources </li></ul><ul><li>Train on development set and auxiliary t-h corpora </li></ul>Page t,h Similarity Features: Lexical, n-gram,syntactic semantic, global Feature vector Classifier YES NO
    21. 21. RTE-2 Results Page Average: 60% Median: 59% Average Precision Accuracy First Author (Group) 80.8% 75.4% Hickl (LCC) 71.3% 73.8% Tatu (LCC) 64.4% 63.9% Zanzotto (Milan & Rome) 62.8% 62.6% Adams (Dallas) 66.9% 61.6% Bos (Rome & Leeds) 58.1%-60.5% 11 groups 52.9%-55.6% 7 groups
    22. 22. Analysis <ul><li>For the first time: methods that carry some deeper analysis seemed (?) to outperform shallow lexical methods </li></ul>Page  Cf. Kevin Knight’s invited talk at EACL-06, titled: Isn’t linguistic Structure Important, Asked the Engineer <ul><li>Still, most systems, which do utilize deep analysis, did not score significantly better than the lexical baseline </li></ul>
    23. 23. Why? <ul><li>System reports point at: </li></ul><ul><ul><li>Lack of knowledge (syntactic transformation rules, paraphrases, lexical relations, etc.) </li></ul></ul><ul><ul><li>Lack of training data </li></ul></ul><ul><li>It seems that systems that coped better with these issues performed best: </li></ul><ul><ul><li>Hickl et al. - acquisition of large entailment corpora for training </li></ul></ul><ul><ul><li>Tatu et al. – large knowledge bases (linguistic and world knowledge) </li></ul></ul>Page
    24. 24. Some suggested research directions <ul><li>Knowledge acquisition </li></ul><ul><ul><li>Unsupervised acquisition of linguistic and world knowledge from general corpora and web </li></ul></ul><ul><ul><li>Acquiring larger entailment corpora </li></ul></ul><ul><ul><li>Manual resources and knowledge engineering </li></ul></ul><ul><li>Inference </li></ul><ul><ul><li>Principled framework for inference and fusion of information levels </li></ul></ul><ul><ul><li>Are we happy with bags of features? </li></ul></ul>Page
    25. 25. Complementary Evaluation Modes <ul><li>“ Seek” mode: </li></ul><ul><ul><li>Input: h and corpus </li></ul></ul><ul><ul><li>Output: all entailing t ’s in corpus </li></ul></ul><ul><ul><li>Captures information seeking needs, but requires post-run annotation (TREC-style) </li></ul></ul><ul><li>Entailment subtasks evaluations </li></ul><ul><ul><li>Lexical, lexical-syntactic, logical, alignment… </li></ul></ul><ul><li>Contribution to various applications </li></ul><ul><ul><li>QA – Harabagiu & Hickl, ACL-06; RE – Romano et al., EACL-06 </li></ul></ul>Page
    26. 26. II. A Skeletal review of Textual Entailment Systems Page
    27. 27. Textual Entailment Page Eyeing the huge market potential, currently led by Google, Yahoo took over search company Overture Services Inc. last year Yahoo acquired Overture Entails Subsumed by  Overture is a search company Google is a search company ……… . Google owns Overture Phrasal verb paraphrasing Entity matching Semantic Role Labeling Alignment Integration How?
    28. 28. A general Strategy for Textual Entailment Page Given a sentence T Decision Find the set of Transformations/Features of the new representation (or: use these to create a cost function) that allows embedding of H in T. Given a sentence H  e Re-represent T Lexical Syntactic Semantic Knowledge Base semantic; structural & pragmatic Transformations/rules Re-represent T Re-represent T Re-represent H Lexical Syntactic Semantic Re-represent T Re-represent T Re-represent T Re-represent T Re-represent T Representation
    29. 29. Details of The Entailment Strategy <ul><li>Preprocessing </li></ul><ul><ul><li>Multiple levels of lexical pre-processing </li></ul></ul><ul><ul><li>Syntactic Parsing </li></ul></ul><ul><ul><li>Shallow semantic parsing </li></ul></ul><ul><ul><li>Annotating semantic phenomena </li></ul></ul><ul><li>Representation </li></ul><ul><ul><li>Bag of words, n-grams through tree/graphs based representation </li></ul></ul><ul><ul><li>Logical representations </li></ul></ul>Page <ul><li>Knowledge Sources </li></ul><ul><ul><li>Syntactic mapping rules </li></ul></ul><ul><ul><li>Lexical resources </li></ul></ul><ul><ul><li>Semantic Phenomena specific modules </li></ul></ul><ul><ul><li>RTE specific knowledge sources </li></ul></ul><ul><ul><li>Additional Corpora/Web resources </li></ul></ul><ul><li>Control Strategy & Decision Making </li></ul><ul><ul><li>Single pass/iterative processing </li></ul></ul><ul><ul><li>Strict vs. Parameter based </li></ul></ul><ul><li>Justification </li></ul><ul><ul><li>What can be said about the decision? </li></ul></ul>
    30. 30. The Case of Shallow Lexical Approaches <ul><li>Preprocessing </li></ul><ul><ul><li>Identify Stop Words </li></ul></ul><ul><li>Representation </li></ul><ul><ul><li>Bag of words </li></ul></ul>Page <ul><li>Knowledge Sources </li></ul><ul><ul><li>Shallow Lexical resources – typically Wordnet </li></ul></ul><ul><li>Control Strategy & Decision Making </li></ul><ul><ul><li>Single pass </li></ul></ul><ul><ul><li>Compute Similarity; use threshold tuned on a development set (could be per task) </li></ul></ul><ul><li>Justification </li></ul><ul><ul><li>It works </li></ul></ul>
    31. 31. Shallow Lexical Approaches (Example) <ul><li>Lexical/word-based semantic overlap: score based on matching each word in H with some word in T </li></ul><ul><ul><li>Word similarity measure: may use WordNet </li></ul></ul><ul><ul><li>May take account of subsequences, word order </li></ul></ul><ul><ul><li>‘ Learn’ threshold on maximum word-based match score </li></ul></ul>Page Text: The Cassini spacecraft has taken images that show rivers on Saturn’s moon Titan. Hyp: The Cassini spacecraft has reached Titan. Text: NASA’s Cassini-Huygens spacecraft traveled to Saturn in 2006. Text: The Cassini spacecraft arrived at Titan in July, 2006. Clearly, this may not appeal to what we think as understanding , and it is easy to generate cases for which this does not work well. However, it works (surprisingly) well with respect to current evaluation metrics (data sets?)
    32. 32. An Algorithm: L ocal L excial M atching <ul><li>For each word in Hypothesis, Text </li></ul><ul><ul><li>if word matches stopword – remove word </li></ul></ul><ul><ul><li>if no words left in Hypothesis or Text return 0 </li></ul></ul><ul><li>numberMatched = 0; </li></ul><ul><ul><li>for each word W_H in Hypothesis </li></ul></ul><ul><ul><li>for each word W_T in Text </li></ul></ul><ul><ul><li>HYP_LEMMAS = Lemmatize(W_H); </li></ul></ul><ul><ul><li>TEXT_LEMMAS = Lemmatize(W_T); </li></ul></ul><ul><ul><ul><li>Use Wordnet’s </li></ul></ul></ul><ul><li>if any term in HYP_LEMMAS matches any term in TEXT_LEMMAS </li></ul><ul><ul><li>using LexicalCompare() </li></ul></ul><ul><li>numberMatched++; </li></ul><ul><li>Return: numberMatched/|HYP_Lemmas| </li></ul>Page
    33. 33. An Algorithm: L ocal L exical M atching (Cont.) <ul><li>LexicalCompare() </li></ul><ul><ul><li>if(LEMMA_H == LEMMA_T) </li></ul></ul><ul><ul><ul><li>return TRUE; </li></ul></ul></ul><ul><ul><li>if( HypernymDistance FromTo(textWord, hypothesisWord) <= 3) </li></ul></ul><ul><ul><ul><li>return TRUE; </li></ul></ul></ul><ul><ul><li>if( MeronymyDistance FromTo(textWord, hypothesisWord) <= 3) </li></ul></ul><ul><ul><ul><li>returnTRUE; </li></ul></ul></ul><ul><ul><li>if( MemberOfDistance FromTo(textWord, hypothesisWord) <= 3) </li></ul></ul><ul><ul><ul><li>return TRUE: </li></ul></ul></ul><ul><ul><li>if( SynonymOf (textWord, hypothesisWord) </li></ul></ul><ul><ul><ul><li>return TRUE; </li></ul></ul></ul><ul><li>Notes: </li></ul><ul><ul><li>LexicalCompare is Asymmetric & makes use of single relation type </li></ul></ul><ul><ul><li>Additional differences could be attributed to stop word list (e.g, including aux verbs) </li></ul></ul><ul><ul><li>Straightforward improvements such as bi-grams do not help. </li></ul></ul><ul><ul><li>More sophisticated lexical knowledge (entities; time) should help. </li></ul></ul>Page LLM Performance: RTE2: Dev: 63.00 Test: 60.50 RTE 3: Dev: 67.50 Test: 65.63
    34. 34. Details of The Entailment Strategy (Again) <ul><li>Preprocessing </li></ul><ul><ul><li>Multiple levels of lexical pre-processing </li></ul></ul><ul><ul><li>Syntactic Parsing </li></ul></ul><ul><ul><li>Shallow semantic parsing </li></ul></ul><ul><ul><li>Annotating semantic phenomena </li></ul></ul><ul><li>Representation </li></ul><ul><ul><li>Bag of words, n-grams through tree/graphs based representation </li></ul></ul><ul><ul><li>Logical representations </li></ul></ul>Page <ul><li>Knowledge Sources </li></ul><ul><ul><li>Syntactic mapping rules </li></ul></ul><ul><ul><li>Lexical resources </li></ul></ul><ul><ul><li>Semantic Phenomena specific modules </li></ul></ul><ul><ul><li>RTE specific knowledge sources </li></ul></ul><ul><ul><li>Additional Corpora/Web resources </li></ul></ul><ul><li>Control Strategy & Decision Making </li></ul><ul><ul><li>Single pass/iterative processing </li></ul></ul><ul><ul><li>Strict vs. Parameter based </li></ul></ul><ul><li>Justification </li></ul><ul><ul><li>What can be said about the decision? </li></ul></ul>
    35. 35. Preprocessing <ul><li>Syntactic Processing: </li></ul><ul><ul><li>Syntactic Parsing (Collins; Charniak; CCG) </li></ul></ul><ul><ul><li>Dependency Parsing (+types) </li></ul></ul><ul><li>Lexical Processing </li></ul><ul><ul><li>Tokenization; lemmatization </li></ul></ul><ul><ul><li>For each word in Hypothesis, Text </li></ul></ul><ul><ul><li>Phrasal verbs </li></ul></ul><ul><ul><li>Idiom processing </li></ul></ul><ul><ul><li>Named Entities + Normalization </li></ul></ul><ul><ul><li>Date/Time arguments + Normalization </li></ul></ul><ul><li>Semantic Processing </li></ul><ul><ul><li>Semantic Role Labeling </li></ul></ul><ul><ul><li>Nominalization </li></ul></ul><ul><ul><li>Modality/Polarity/Factive </li></ul></ul><ul><ul><li>Co-reference </li></ul></ul>Page } often used only during decision making } often used only during decision making Only a few systems
    36. 36. Details of The Entailment Strategy (Again) <ul><li>Preprocessing </li></ul><ul><ul><li>Multiple levels of lexical pre-processing </li></ul></ul><ul><ul><li>Syntactic Parsing </li></ul></ul><ul><ul><li>Shallow semantic parsing </li></ul></ul><ul><ul><li>Annotating semantic phenomena </li></ul></ul><ul><li>Representation </li></ul><ul><ul><li>Bag of words, n-grams through tree/graphs based representation </li></ul></ul><ul><ul><li>Logical representations </li></ul></ul>Page <ul><li>Knowledge Sources </li></ul><ul><ul><li>Syntactic mapping rules </li></ul></ul><ul><ul><li>Lexical resources </li></ul></ul><ul><ul><li>Semantic Phenomena specific modules </li></ul></ul><ul><ul><li>RTE specific knowledge sources </li></ul></ul><ul><ul><li>Additional Corpora/Web resources </li></ul></ul><ul><li>Control Strategy & Decision Making </li></ul><ul><ul><li>Single pass/iterative processing </li></ul></ul><ul><ul><li>Strict vs. Parameter based </li></ul></ul><ul><li>Justification </li></ul><ul><ul><li>What can be said about the decision? </li></ul></ul>
    37. 37. Basic Representations Page Meaning Representation Raw Text Inference Representation Textual Entailment Local Lexical Syntactic Parse Semantic Representation Logical Forms <ul><li>Most approaches augment the basic structure defined by the processing level with additional annotation and make use of a tree/graph/frame-based system. </li></ul>
    38. 38. Basic Representations (Syntax) Page Local Lexical Syntactic Parse Hyp: The Cassini spacecraft has reached Titan.
    39. 39. Basic Representations (Shallow Semantics: Pred-Arg ) <ul><li>T: The government purchase of the Roanoke building, a former prison, took place in 1902. </li></ul><ul><li>H: The Roanoke building, which was a former prison, was bought by the government in 1902. </li></ul>Page The govt. purchase… prison take place in 1902 The government buy The Roanoke … prison The Roanoke building be a former prison purchase The Roanoke building In 1902 Roth&Sammons’07 ARG_0 ARG_1 ARG_2 PRED ARG_0 ARG_1 PRED ARG_1 ARG_2 PRED ARG_1 PRED AM_TMP
    40. 40. Basic Representations (Logical Representation) Page [Bos & Markert] The semantic representation language is a first-order fragment a language used in Discourse Representation Theory (DRS), conveying argument structure with a neo-Davidsonian analysis and Including the recursive DRS structure to cover negation, disjunction, and implication.
    41. 41. Representing Knowledge Sources <ul><li>Rather straight forward in the Logical Framework: </li></ul>Page <ul><li>Tree/Graph base representation may also use rule based transformations to encode different kinds of knowledge , sometimes represented as generic or knowledge based tree transformations. </li></ul>
    42. 42. Representing Knowledge Sources (cont.) Page <ul><li>In general, there is a mix of procedural and rule based encodings of knowledge sources </li></ul><ul><ul><li>Done by hanging more information on parse tree or predicate argument representation [Example from LCC’s system] </li></ul></ul><ul><ul><li>Or different frame-based annotation systems for encoding information, that are processed procedurally. </li></ul></ul>
    43. 43. Details of The Entailment Strategy (Again) <ul><li>Preprocessing </li></ul><ul><ul><li>Multiple levels of lexical pre-processing </li></ul></ul><ul><ul><li>Syntactic Parsing </li></ul></ul><ul><ul><li>Shallow semantic parsing </li></ul></ul><ul><ul><li>Annotating semantic phenomena </li></ul></ul><ul><li>Representation </li></ul><ul><ul><li>Bag of words, n-grams through tree/graphs based representation </li></ul></ul><ul><ul><li>Logical representations </li></ul></ul>Page <ul><li>Knowledge Sources </li></ul><ul><ul><li>Syntactic mapping rules </li></ul></ul><ul><ul><li>Lexical resources </li></ul></ul><ul><ul><li>Semantic Phenomena specific modules </li></ul></ul><ul><ul><li>RTE specific knowledge sources </li></ul></ul><ul><ul><li>Additional Corpora/Web resources </li></ul></ul><ul><li>Control Strategy & Decision Making </li></ul><ul><ul><li>Single pass/iterative processing </li></ul></ul><ul><ul><li>Strict vs. Parameter based </li></ul></ul><ul><li>Justification </li></ul><ul><ul><li>What can be said about the decision? </li></ul></ul>
    44. 44. Knowledge Sources <ul><li>The knowledge sources available to the system are the most significant component of supporting TE. </li></ul><ul><li>Different systems draw differently the line between preprocessing capabilities and knowledge resources. </li></ul><ul><li>The way resources are handled is also different across different approaches. </li></ul>Page
    45. 45. Enriching Preprocessing <ul><li>In addition to syntactic parsing several approaches enrich the representation with various linguistics resources </li></ul><ul><ul><li>Pos tagging </li></ul></ul><ul><ul><li>Stemming </li></ul></ul><ul><ul><li>Predicate argument representation : verb predicates and nominalization </li></ul></ul><ul><ul><li>Entity Annotation : Stand alone NERs with a variable number of classes </li></ul></ul><ul><ul><li>Acronym handling and Entity Normalization : mapping mentions of the same entity mentioned in different ways to a single ID. </li></ul></ul><ul><ul><li>Co-reference resolution </li></ul></ul><ul><ul><li>Dates, times and numeric values; identification and normalization. </li></ul></ul><ul><ul><li>Identification of semantic relations: complex nominals, genitives, adjectival phrases, and adjectival clauses. </li></ul></ul><ul><ul><li>Event identification and frame construction. </li></ul></ul>Page
    46. 46. Lexical Resources <ul><li>Recognizing that a word or a phrase in S entails a word or a phrase in H is essential in determining Textual Entailment. </li></ul><ul><li>Wordnet is the most commonly used resoruce </li></ul><ul><ul><li>In most cases, a Wordnet based similarity measure between words is used. This is typically a symmetric relation. </li></ul></ul><ul><ul><li>Lexical chains over Wordnet are used; in some cases, care is taken to disallow some chains of specific relations. </li></ul></ul><ul><ul><li>Extended Wordnet is being used to make use of Entities </li></ul></ul><ul><ul><li>Derivation relation which links verbs with their corresponding nominalized nouns. </li></ul></ul>Page
    47. 47. Lexical Resources (Cont.) <ul><li>Lexical Paraphrasing Rules </li></ul><ul><ul><li>A number of efforts to acquire relational paraphrase rules are under way, and several systems are making use of resources such as DIRT and TEASE. </li></ul></ul><ul><ul><li>Some systems seems to have acquired paraphrase rules that are in the RTE corpus </li></ul></ul><ul><ul><ul><li>person killed --> claimed one life </li></ul></ul></ul><ul><ul><ul><li>hand reins over to --> give starting job to </li></ul></ul></ul><ul><ul><ul><li>same-sex marriage --> gay nuptials </li></ul></ul></ul><ul><ul><ul><li>cast ballots in the election -> vote </li></ul></ul></ul><ul><ul><ul><li>dominant firm --> monopoly power </li></ul></ul></ul><ul><ul><ul><li>death toll --> kill </li></ul></ul></ul><ul><ul><ul><li>try to kill --> attack </li></ul></ul></ul><ul><ul><ul><li>lost their lives --> were killed </li></ul></ul></ul><ul><ul><ul><li>left people dead --> people were killed </li></ul></ul></ul>Page
    48. 48. Semantic Phenomena <ul><li>A large number of semantic phenomena have been identified as significant to Textual Entailment. </li></ul><ul><li>A large number of them are being handled (in a restricted way) by some of the systems. Very little quantification per-phenomena has been done, if at all. </li></ul><ul><li>Semantic implications of interpreting syntactic structures [Braz et. al’05; Bar-Haim et. al. ’07] </li></ul><ul><li>Conjunctions </li></ul><ul><ul><li>Jake and Jill ran up the hill Jake ran up the hill </li></ul></ul><ul><ul><li>Jake and Jill met on the hill *Jake met on the hill </li></ul></ul><ul><li>Clausal modifiers </li></ul><ul><ul><li>But celebrations were muted as many Iranians observed a Shi'ite mourning month. </li></ul></ul><ul><ul><li>Many Iranians observed a Shi'ite mourning month. </li></ul></ul><ul><ul><li>Semantic Role Labeling handles this phenomena automatically </li></ul></ul>Page
    49. 49. Semantic Phenomena (Cont.) <ul><li>Relative clauses </li></ul><ul><ul><li>The assailants fired six bullets at the car, which carried Vladimir Skobtsov. </li></ul></ul><ul><ul><li>The car carried Vladimir Skobtsov. </li></ul></ul><ul><ul><li>Semantic Role Labeling handles this phenomena automatically </li></ul></ul><ul><li>Appositives </li></ul><ul><ul><li>Frank Robinson, a one-time manager of the Indians, has the distinction for the NL. </li></ul></ul><ul><ul><li>Frank Robinson is a one-time manager of the Indians. </li></ul></ul><ul><li>Passive </li></ul><ul><ul><li>We have been approached by the investment banker. </li></ul></ul><ul><ul><li>The investment banker approached us. </li></ul></ul><ul><ul><li>Semantic Role Labeling handles this phenomena automatically </li></ul></ul><ul><li>Genitive modifier </li></ul><ul><ul><li>Malaysia's crude palm oil output is estimated to have risen.. </li></ul></ul><ul><ul><li>The crude palm oil output of Malasia is estimated to have risen . </li></ul></ul>Page
    50. 50. Logical Structure <ul><li>Factivity : Uncovering the context in which a verb phrase is embedded </li></ul><ul><ul><li>The terrorists tried to enter the building. </li></ul></ul><ul><ul><li>The terrorists entered the building. </li></ul></ul><ul><li>Polarity negative markers or a negation-denoting verb (e.g. deny , refuse, fail ) </li></ul><ul><ul><li>The terrorists failed to enter the building. </li></ul></ul><ul><ul><li>The terrorists entered the building. </li></ul></ul><ul><li>Modality/Negation Dealing with modal auxiliary verbs (can, must, should), that modify verbs’ meanings and with the identification of the scope of negation. </li></ul><ul><li>Superlatives/Comperatives/Monotonicity: inflecting adjectives or adverbs. </li></ul><ul><li>Quantifiers, determiners and articles </li></ul>Page
    51. 51. Some Examples [ Braz et. al. IJCAI workshop’05;PARC Corpus] <ul><li>T : Legally, John could drive. </li></ul><ul><li>H : John drove. </li></ul><ul><li>. </li></ul><ul><li>S: Bush said that Khan sold centrifuges to North Korea. </li></ul><ul><li>H: Centrifuges were sold to North Korea. </li></ul><ul><li>. </li></ul><ul><li>S: No US congressman visited Iraq until the war. </li></ul><ul><li>H: Some US congressmen visited Iraq before the war. </li></ul><ul><li>S: The room was full of women. </li></ul><ul><li>H: The room was full of intelligent women. </li></ul><ul><li>S: The New York Times reported that Hanssen sold FBI secrets to the Russians and could face the death penalty. </li></ul><ul><li>H: Hanssen sold FBI secrets to the Russians. </li></ul><ul><li>S: All soldiers were killed in the ambush. </li></ul><ul><li>H: Many soldiers were killed in the ambush. </li></ul>Page
    52. 52. Details of The Entailment Strategy (Again) <ul><li>Preprocessing </li></ul><ul><ul><li>Multiple levels of lexical pre-processing </li></ul></ul><ul><ul><li>Syntactic Parsing </li></ul></ul><ul><ul><li>Shallow semantic parsing </li></ul></ul><ul><ul><li>Annotating semantic phenomena </li></ul></ul><ul><li>Representation </li></ul><ul><ul><li>Bag of words, n-grams through tree/graphs based representation </li></ul></ul><ul><ul><li>Logical representations </li></ul></ul>Page <ul><li>Knowledge Sources </li></ul><ul><ul><li>Syntactic mapping rules </li></ul></ul><ul><ul><li>Lexical resources </li></ul></ul><ul><ul><li>Semantic Phenomena specific modules </li></ul></ul><ul><ul><li>RTE specific knowledge sources </li></ul></ul><ul><ul><li>Additional Corpora/Web resources </li></ul></ul><ul><li>Control Strategy & Decision Making </li></ul><ul><ul><li>Single pass/iterative processing </li></ul></ul><ul><ul><li>Strict vs. Parameter based </li></ul></ul><ul><li>Justification </li></ul><ul><ul><li>What can be said about the decision? </li></ul></ul>
    53. 53. Control Strategy and Decision Making <ul><li>Single Iteration </li></ul><ul><ul><li>Strict Logical approaches are, in principle, a single stage computation. </li></ul></ul><ul><ul><li>The pair is processed and transform into the logic form. </li></ul></ul><ul><ul><li>Existing Theorem Provers act on the pair along with the KB. </li></ul></ul><ul><li>Multiple iterations </li></ul><ul><ul><li>Graph based algorithms are typically iterative. </li></ul></ul><ul><ul><li>Following [Punyakanok et. al ’04] transformations are applied and entailment test is done after each transformation is applied. </li></ul></ul><ul><ul><li>Transformation can be chained, but sometimes the order makes a difference. The algorithm can be a greedy algorithm or can be more exhaustive, and search for the best path found [Braz et. al’05;Bar-Haim et.al 07] </li></ul></ul>Page
    54. 54. Transformation Walkthrough [Braz et. al’05] <ul><li>T: The government purchase of the Roanoke building, a former prison, took place in 1902. </li></ul><ul><li>H: The Roanoke building, which was a former prison, was bought by the government in 1902. </li></ul>Page Does ‘H’ follow from ‘T’?
55. 55. Transformation Walkthrough (1) <ul><li>T: The government purchase of the Roanoke building, a former prison, took place in 1902. </li></ul><ul><li>H: The Roanoke building, which was a former prison, was bought by the government in 1902. </li></ul>Page [Diagram: predicate-argument graphs. T: take-place(ARG_1: the govt. purchase…prison, AM_TMP: in 1902), with the embedded predicate purchase(ARG_1: the Roanoke building). H: buy(ARG_0: the government, ARG_1: the Roanoke…prison, AM_TMP: in 1902) and be(ARG_1: the Roanoke building, ARG_2: a former prison).]
56. 56. Transformation Walkthrough (2) <ul><li>T: The government purchase of the Roanoke building, a former prison, took place in 1902. </li></ul><ul><li>The government purchase of the Roanoke building, a former prison, occurred in 1902. </li></ul><ul><li>H: The Roanoke building, which was a former prison, was bought by the government. </li></ul>Page [Diagram: the Phrasal Verb Rewriter maps “took place” to “occurred” in the predicate-argument graph: occur(the govt. purchase…prison, in 1902).]
57. 57. Transformation Walkthrough (3) <ul><li>T: The government purchase of the Roanoke building, a former prison, occurred in 1902. </li></ul><ul><li>The government purchased the Roanoke building in 1902. </li></ul><ul><li>H: The Roanoke building, which was a former prison, was bought by the government in 1902. </li></ul>Page [Diagram: the Nominalization Promoter turns the noun “purchase” into a verb predicate: purchase(ARG_0: the government, ARG_1: the Roanoke building, a former prison, AM_TMP: in 1902). NOTE: depends on the earlier transformation; order is important!]
58. 58. Transformation Walkthrough (4) <ul><li>T: The government purchase of the Roanoke building, a former prison, occurred in 1902. </li></ul><ul><li>The Roanoke building is a former prison. </li></ul><ul><li>H: The Roanoke building, which was a former prison, was bought by the government in 1902. </li></ul>Page [Diagram: the Apposition Rewriter extracts be(ARG_1: the Roanoke building, ARG_2: a former prison).]
59. 59. Transformation Walkthrough (5) <ul><li>T: The government purchase of the Roanoke building, a former prison, took place in 1902. </li></ul><ul><li>H: The Roanoke building, which was a former prison, was bought by the government in 1902. </li></ul>Page [Diagram: WordNet maps “purchase” to “buy”, aligning the transformed predicate-argument graph of T with that of H: buy/purchase(ARG_0: the government, ARG_1: the Roanoke…prison, AM_TMP: in 1902); be(ARG_1: the Roanoke building, ARG_2: a former prison).]
    60. 60. Characteristics <ul><li>Multiple paths => optimization problem </li></ul><ul><ul><li>Shortest or highest-confidence path through transformations </li></ul></ul><ul><ul><li>Order is important; may need to explore different orderings </li></ul></ul><ul><ul><li>Module dependencies are ‘local’; module B does not need access to module A’s KB/inference, only its output </li></ul></ul><ul><li>If outcome is “true”, the (optimal) set of transformations and local comparisons form a proof </li></ul>Page
61. 61. Summary: Control Strategy and Decision Making <ul><li>Despite the appeal of the strict logical approaches, as of today they do not work well enough. </li></ul><ul><ul><li>Bos & Markert: </li></ul></ul><ul><ul><ul><li>The strict logical approach falls significantly behind good LLMs (lexical-level matchers) built on multiple levels of lexical pre-processing. </li></ul></ul></ul><ul><ul><ul><li>Only the incorporation of rather shallow features saves this approach in the evaluation. </li></ul></ul></ul><ul><ul><li>Braz et al.: </li></ul></ul><ul><ul><ul><li>The strict graph-based representation does not do as well as LLM. </li></ul></ul></ul><ul><ul><li>Tatu et al.: </li></ul></ul><ul><ul><ul><li>Results show that the strict logical approach is inferior to LLMs, but putting them together produces some gain. </li></ul></ul></ul><ul><li>Using machine learning methods as a way to combine systems and multiple features has been found very useful. </li></ul>Page
62. 62. Hybrid/Ensemble Approaches <ul><li>Bos et al.: use a theorem prover and a model builder </li></ul><ul><ul><li>Expand models of T, H using the model builder; check the sizes of the models </li></ul></ul><ul><ul><li>Test consistency of T, H with background knowledge </li></ul></ul><ul><ul><li>Try to prove entailment with and without background knowledge </li></ul></ul><ul><li>Tatu et al. (2006) use an ensemble approach: </li></ul><ul><ul><li>Create two logical systems, one lexical alignment system </li></ul></ul><ul><ul><li>Combine system scores using coefficients found via search (train on annotated data) </li></ul></ul><ul><ul><li>Modify coefficients for different tasks </li></ul></ul><ul><li>Zanzotto et al. (2006) try to learn from comparison of the structures of T, H for ‘true’ vs. ‘false’ entailment pairs </li></ul><ul><ul><li>Use lexical and syntactic annotation to characterize the match between T, H for successful and unsuccessful entailment pairs </li></ul></ul><ul><ul><li>Train a kernel/SVM classifier to distinguish between match graphs </li></ul></ul>Page
63. 63. Justification <ul><li>For most approaches, justification is given only by the data: </li></ul><ul><ul><li>empirical evaluation </li></ul></ul><ul><li>Logical approaches </li></ul><ul><ul><li>There is a proof-theoretic justification, </li></ul></ul><ul><ul><li>modulo the power of the resources and the ability to map a sentence to a logical form. </li></ul></ul><ul><li>Graph/tree-based approaches </li></ul><ul><ul><li>There is a model-theoretic justification. </li></ul></ul><ul><ul><li>The approach is sound, but not complete, modulo the availability of resources. </li></ul></ul>Page
64. 64. <ul><li>R - a knowledge representation language, with a well-defined </li></ul><ul><li>syntax and semantics over a domain D. </li></ul><ul><li>For text snippets s, t: </li></ul><ul><ul><li>r_s, r_t - their representations in R. </li></ul></ul><ul><ul><li>M(r_s), M(r_t) - their model-theoretic representations </li></ul></ul><ul><li>There is a well-defined notion of subsumption in R, defined model-theoretically: </li></ul><ul><li>for u, v ∈ R: u is subsumed by v when M(u) ⊆ M(v) </li></ul><ul><li>Not an algorithm; we need a proof theory. </li></ul>Justifying Graph Based Approaches [Braz et al. ’05] Page
65. 65. <ul><li>The proof theory is weak; it will show r_s ⊆ r_t only when they are relatively similar syntactically. </li></ul><ul><li>r ∈ R is faithful to s if M(r_s) = M(r) </li></ul><ul><li>Definition: Let s, t be text snippets with representations r_s, r_t ∈ R. </li></ul><ul><li>We say that s semantically entails t if there is a representation r ∈ R that is faithful to s, for which we can prove that r ⊆ r_t </li></ul><ul><li>Given r_s we need to generate many equivalent representations r′_s and test r′_s ⊆ r_t </li></ul>Defining Semantic Entailment (2) Page Cannot be done exhaustively. How to generate alternative representations?
66. 66. <ul><li>A rewrite rule (l, r) is a pair of expressions in R such that l ⊆ r. </li></ul><ul><li>Given a representation r_s of s and a rule (l, r) for which r_s ⊆ l, the augmentation of r_s via (l, r) is r′_s = r_s ∧ r. </li></ul><ul><li>Claim: r′_s is faithful to s. </li></ul><ul><li>Proof: In general, since r′_s = r_s ∧ r, then M(r′_s) = M(r_s) ∩ M(r). However, since r_s ⊆ l ⊆ r, then M(r_s) ⊆ M(r). </li></ul><ul><li>Consequently: M(r′_s) = M(r_s), </li></ul><ul><li>and the augmented representation is faithful to s. </li></ul>Defining Semantic Entailment (3) Page [Diagram: r_s ⊆ l ⊆ r, giving r′_s = r_s ∧ r]
67. 67. <ul><li>The claim suggests an algorithm for generating alternative (equivalent) representations and for semantic entailment. </li></ul><ul><li>The resulting algorithm is sound, but it is not complete. </li></ul><ul><li>Completeness depends on the quality of the KB of rules. </li></ul><ul><li>The power of this algorithm is in the rules KB. </li></ul><ul><li>l and r might be very different syntactically, but by satisfying model-theoretic subsumption they provide expressivity to the re-representation in a way that facilitates the overall subsumption. </li></ul>Comments Page
68. 68. <ul><li>The problem of determining non-entailment is harder, mostly due to its structure. </li></ul><ul><li>Most approaches determine non-entailment heuristically. </li></ul><ul><ul><li>Set a threshold for a cost function; if the pair does not meet it, say ‘no’. </li></ul></ul><ul><ul><li>Several approaches have identified specific features that hint at non-entailment. </li></ul></ul><ul><li>A model-theoretic approach to non-entailment has also been developed, although its effectiveness isn’t clear yet. </li></ul>Non-Entailment Page
69. 69. What are we missing? <ul><li>It is completely clear that the key missing resource is knowledge. </li></ul><ul><ul><li>Better resources translate immediately into better results. </li></ul></ul><ul><ul><li>At this point existing resources seem to be lacking in coverage and accuracy. </li></ul></ul><ul><ul><li>Not enough high-quality public resources; no quantification. </li></ul></ul><ul><li>Some examples </li></ul><ul><ul><li>Lexical knowledge: some cases are difficult to acquire systematically. </li></ul></ul><ul><ul><ul><li>A bought Y → A has/owns Y </li></ul></ul></ul><ul><ul><ul><li>Many of the current lexical resources are very noisy. </li></ul></ul></ul><ul><ul><li>Numbers, quantitative reasoning </li></ul></ul><ul><ul><li>Time and date; temporal reasoning </li></ul></ul><ul><ul><li>Robust event-based reasoning and information integration </li></ul></ul>Page
    70. 70. Textual Entailment as a Classification Task Page
71. 71. RTE as a classification task <ul><li>RTE is a classification task: </li></ul><ul><ul><li>Given a pair, we need to decide whether T implies H or T does not imply H </li></ul></ul><ul><li>We can learn a classifier from annotated examples </li></ul><ul><li>What we need: </li></ul><ul><li>A learning algorithm </li></ul><ul><li>A suitable feature space </li></ul>Page Page
72. 72. Defining the feature space <ul><li>How do we define the feature space? </li></ul><ul><li>Possible features </li></ul><ul><ul><li>“Distance features” - features of “some” distance between T and H </li></ul></ul><ul><ul><li>“Entailment trigger features” </li></ul></ul><ul><ul><li>“Pair features” - the content of the T-H pair is represented </li></ul></ul><ul><li>Possible representations of the sentences </li></ul><ul><ul><li>Bag-of-words (possibly with n-grams) </li></ul></ul><ul><ul><li>Syntactic representation </li></ul></ul><ul><ul><li>Semantic representation </li></ul></ul>Page Page T1: “At the end of the year, all solid companies pay dividends.” H1: “At the end of the year, all solid insurance companies pay dividends.” T1 ⇒ H1
73. 73. Distance Features <ul><li>Possible features </li></ul><ul><ul><li>Number of words in common </li></ul></ul><ul><ul><li>Longest common subsequence </li></ul></ul><ul><ul><li>Longest common syntactic subtree </li></ul></ul><ul><ul><li>… </li></ul></ul>Page Page T: “At the end of the year, all solid companies pay dividends.” H: “At the end of the year, all solid insurance companies pay dividends.” T ⇒ H
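Two of these distance features are easy to compute directly; a minimal sketch in plain Python (no external libraries):

```python
# Word-overlap and longest-common-subsequence distance features (sketch).
def word_overlap(t_tokens, h_tokens):
    """Fraction of hypothesis words covered by the text."""
    t_set, h_set = set(t_tokens), set(h_tokens)
    return len(t_set & h_set) / max(len(h_set), 1)

def longest_common_subsequence(t_tokens, h_tokens):
    """Length of the longest common (not necessarily contiguous) subsequence."""
    m, n = len(t_tokens), len(h_tokens)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if t_tokens[i] == h_tokens[j]
                                else max(dp[i][j + 1], dp[i + 1][j]))
    return dp[m][n]

t = "At the end of the year , all solid companies pay dividends".split()
h = "At the end of the year , all solid insurance companies pay dividends".split()
features = [word_overlap(t, h), longest_common_subsequence(t, h) / len(h)]
```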
74. 74. Entailment Triggers <ul><li>Possible features </li></ul><ul><ul><li>from (de Marneffe et al., 2006) </li></ul></ul><ul><ul><li>Polarity features </li></ul></ul><ul><ul><ul><li>presence/absence of negative-polarity contexts (not, no or few, without) </li></ul></ul></ul><ul><ul><ul><ul><li>“Oil prices surged” ⇏ “Oil prices didn’t grow” </li></ul></ul></ul></ul><ul><ul><li>Antonymy features </li></ul></ul><ul><ul><ul><li>presence/absence of antonymous words in T and H </li></ul></ul></ul><ul><ul><ul><ul><li>“Oil prices are surging” ⇏ “Oil prices are falling down” </li></ul></ul></ul></ul><ul><ul><li>Adjunct features </li></ul></ul><ul><ul><ul><li>dropping/adding of a syntactic adjunct when moving from T to H </li></ul></ul></ul><ul><ul><ul><ul><li>“all solid companies pay dividends” ⇏ “all solid companies pay cash dividends” </li></ul></ul></ul></ul><ul><ul><li>… </li></ul></ul>Page Page
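A hedged sketch of one such trigger feature, polarity mismatch; the marker list is illustrative, not the one used by de Marneffe et al.:

```python
# Polarity-mismatch trigger feature (sketch; illustrative marker list).
NEGATION_MARKERS = {"not", "n't", "no", "never", "few", "without", "deny",
                    "refuse", "fail"}

def polarity_mismatch(t_tokens, h_tokens):
    """1 if exactly one of T, H contains a negative-polarity marker."""
    t_neg = any(w.lower() in NEGATION_MARKERS for w in t_tokens)
    h_neg = any(w.lower() in NEGATION_MARKERS for w in h_tokens)
    return int(t_neg != h_neg)

assert polarity_mismatch("Oil prices surged".split(),
                         "Oil prices did n't grow".split()) == 1
```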
75. 75. Pair Features <ul><li>Possible features </li></ul><ul><ul><li>Bag-of-word spaces of T and H </li></ul></ul><ul><ul><li>Syntactic spaces of T and H </li></ul></ul>Page Page end_T year_T solid_T companies_T pay_T dividends_T … end_H year_H solid_H companies_H pay_H dividends_H … insurance_H T: “At the end of the year, all solid companies pay dividends.” H: “At the end of the year, all solid insurance companies pay dividends.” T ⇒ H
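A minimal sketch of the bag-of-word pair space: each word is suffixed by its origin (T or H), so the classifier sees the content of the pair rather than a distance between its two sides:

```python
# Bag-of-word pair feature space (sketch).
def pair_features(t_tokens, h_tokens):
    feats = {}
    for w in t_tokens:
        feats[w.lower() + "_T"] = 1   # word seen on the text side
    for w in h_tokens:
        feats[w.lower() + "_H"] = 1   # word seen on the hypothesis side
    return feats

# e.g. {"end_T": 1, ..., "insurance_H": 1, ...} for the example pair above
```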
76. 76. Pair Features: what can we learn? <ul><li>Bag-of-word spaces of T and H </li></ul><ul><ul><li>We can learn: </li></ul></ul><ul><ul><ul><li>T implies H when T contains “end”… </li></ul></ul></ul><ul><ul><ul><li>T does not imply H when H contains “end”… </li></ul></ul></ul>Page Page end_T year_T solid_T companies_T pay_T dividends_T … end_H year_H solid_H companies_H pay_H dividends_H … insurance_H It seems to be totally irrelevant!!!
77. 77. ML Methods in the possible feature spaces Page Page A features × representation grid (distance / entailment-trigger / pair features vs. bag-of-words / syntactic / semantic representations) locates the RTE-2 systems, e.g.: (Hickl et al., 2006), (Zanzotto&Moschitti, 2006), (Bos&Markert, 2006), (Inkpen et al., 2006), (Kozareva&Montoyo, 2006), (de Marneffe et al., 2006), (Herrera et al., 2006), (Rodney et al., 2006).
78. 78. Effectively using the Pair Feature Space <ul><li>Roadmap </li></ul><ul><li>Motivation: why the pair space is important even if it seems not to be </li></ul><ul><li>Understanding the model with an example </li></ul><ul><ul><li>Challenges </li></ul></ul><ul><ul><li>A simple example </li></ul></ul><ul><li>Defining the cross-pair similarity </li></ul>Page Page (Zanzotto, Moschitti, 2006)
79. 79. Observing the Distance Feature Space… Page Page (Zanzotto, Moschitti, 2006) [Plot: % common words vs. % common syntactic dependencies] In a distance feature space, the two pairs below are very likely the same point: T1: “At the end of the year, all solid companies pay dividends.” H1: “At the end of the year, all solid insurance companies pay dividends.” (T1 ⇒ H1) T1: “At the end of the year, all solid companies pay dividends.” H2: “At the end of the year, all solid companies pay cash dividends.” (T1 ⇏ H2)
80. 80. What can happen in the pair feature space? Page Page (Zanzotto, Moschitti, 2006) T1: “At the end of the year, all solid companies pay dividends.” H1: “At the end of the year, all solid insurance companies pay dividends.” (T1 ⇒ H1) T1: “At the end of the year, all solid companies pay dividends.” H2: “At the end of the year, all solid companies pay cash dividends.” (T1 ⇏ H2) T3: “All wild animals eat plants that have scientifically proven medicinal properties.” H3: “All wild mountain animals eat plants that have scientifically proven medicinal properties.” (T3 ⇒ H3) [Diagram: the cross-pair similarities S1, S2 of the new pair to the two training pairs differ]
81. 81. Observations <ul><li>Some examples are difficult to exploit in the distance feature space… </li></ul><ul><li>We need a space that considers both the content and the structure of textual entailment examples </li></ul><ul><li>Let us explore: </li></ul><ul><li>the pair space! </li></ul><ul><li>… using the Kernel Trick : define the space by defining the distance K(P1, P2) instead of defining the features </li></ul>Page Page K((T1, H1), (T1, H2))
82. 82. Target <ul><li>How do we build it: </li></ul><ul><ul><li>Using a syntactic interpretation of sentences </li></ul></ul><ul><ul><li>Using a similarity among trees K_T(T’,T’’): this similarity counts the number of subtrees in common between T’ and T’’ </li></ul></ul><ul><ul><li>This is a syntactic pair feature space </li></ul></ul><ul><ul><li>Question: do we need something more? </li></ul></ul>Page Page (Zanzotto, Moschitti, 2006) <ul><li>Cross-pair similarity </li></ul><ul><ul><li>K_S((T’,H’),(T’’,H’’)) = K_T(T’,T’’) + K_T(H’,H’’) </li></ul></ul>
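A minimal sketch of this purely syntactic cross-pair similarity. The production-counting K_T below is a crude stand-in for the convolution tree kernel assumed in the paper; trees are represented here as nested tuples:

```python
# Syntactic cross-pair similarity (sketch; toy tree similarity).
def productions(tree, out=None):
    """Collect (parent, children-labels) productions of a nested-tuple tree."""
    if out is None:
        out = []
    label, children = tree[0], tree[1:]
    if children and isinstance(children[0], tuple):  # internal node
        out.append((label, tuple(c[0] for c in children)))
        for c in children:
            productions(c, out)
    return out

def K_T(t1, t2):
    """Toy tree similarity: multiset overlap of grammar productions."""
    p1, p2 = productions(t1), productions(t2)
    return sum(min(p1.count(p), p2.count(p)) for p in set(p1))

def K_S(pair1, pair2):
    """K_S((T',H'),(T'',H'')) = K_T(T',T'') + K_T(H',H'')."""
    (t1, h1), (t2, h2) = pair1, pair2
    return K_T(t1, t2) + K_T(h1, h2)
```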
    83. 83. Observing the syntactic pair feature space <ul><li>Can we use syntactic tree similarity? </li></ul>Page Page (Zanzotto, Moschitti, 2006)
    85. 85. Observing the syntactic pair feature space <ul><li>Can we use syntactic tree similarity? Not only! </li></ul>Page Page (Zanzotto, Moschitti, 2006)
86. 86. Observing the syntactic pair feature space <ul><li>Can we use syntactic tree similarity? Not only! </li></ul><ul><li>We want to use/exploit also the implied rewrite rule </li></ul>Page Page (Zanzotto, Moschitti, 2006) [Diagram: the anchors a, b, c, d of the first pair aligned with the anchors a, b, c, d of the second pair]
87. 87. Exploiting Rewrite Rules <ul><li>To capture the textual entailment recognition rule ( rewrite rule or inference rule ), the cross-pair similarity measure should consider: </li></ul><ul><ul><li>the structural/syntactic similarity between, respectively, texts and hypotheses </li></ul></ul><ul><ul><li>the similarity among the intra-pair relations between constituents </li></ul></ul>Page Page How can we reduce the problem to a tree similarity computation? (Zanzotto, Moschitti, 2006)
    88. 88. Exploiting Rewrite Rules Page Page (Zanzotto, Moschitti, 2006)
    89. 89. Exploiting Rewrite Rules Page Page Intra-pair operations (Zanzotto, Moschitti, 2006)
    90. 90. Exploiting Rewrite Rules Page Page Intra-pair operations  Finding anchors (Zanzotto, Moschitti, 2006)
    91. 91. Exploiting Rewrite Rules Page Page <ul><li>Intra-pair operations </li></ul><ul><li>Finding anchors </li></ul><ul><li>Naming anchors with placeholders </li></ul>(Zanzotto, Moschitti, 2006)
    92. 92. Exploiting Rewrite Rules Page Page <ul><li>Intra-pair operations </li></ul><ul><li>Finding anchors </li></ul><ul><li>Naming anchors with placeholders </li></ul><ul><li>Propagating placeholders </li></ul>(Zanzotto, Moschitti, 2006)
    93. 93. Exploiting Rewrite Rules Page Page <ul><li>Intra-pair operations </li></ul><ul><li>Finding anchors </li></ul><ul><li>Naming anchors with placeholders </li></ul><ul><li>Propagating placeholders </li></ul>Cross-pair operations (Zanzotto, Moschitti, 2006)
    94. 94. Exploiting Rewrite Rules Page Page <ul><li>Cross-pair operations </li></ul><ul><li>Matching placeholders across pairs </li></ul><ul><li>Intra-pair operations </li></ul><ul><li>Finding anchors </li></ul><ul><li>Naming anchors with placeholders </li></ul><ul><li>Propagating placeholders </li></ul>(Zanzotto, Moschitti, 2006)
    95. 95. Exploiting Rewrite Rules Page Page <ul><li>Cross-pair operations </li></ul><ul><li>Matching placeholders across pairs </li></ul><ul><li>Renaming placeholders </li></ul><ul><li>Intra-pair operations </li></ul><ul><li>Finding anchors </li></ul><ul><li>Naming anchors with placeholders </li></ul><ul><li>Propagating placeholders </li></ul>
    97. 97. Exploiting Rewrite Rules Page Page <ul><li>Intra-pair operations </li></ul><ul><li>Finding anchors </li></ul><ul><li>Naming anchors with placeholders </li></ul><ul><li>Propagating placeholders </li></ul><ul><li>Cross-pair operations </li></ul><ul><li>Matching placeholders across pairs </li></ul><ul><li>Renaming placeholders </li></ul><ul><li>Calculating the similarity between syntactic trees with co-indexed leaves </li></ul>(Zanzotto, Moschitti, 2006)
    98. 98. Exploiting Rewrite Rules <ul><li>The initial example: sim(H 1 ,H 3 ) > sim(H 2 ,H 3 )? </li></ul>Page Page (Zanzotto, Moschitti, 2006)
99. 99. Defining the Cross-pair similarity <ul><li>The cross-pair similarity is based on the distance between syntactic trees with co-indexed leaves: </li></ul><ul><li>K_S((T’,H’),(T’’,H’’)) = max_{c ∈ C} [ K_T(t(T’,c), t(T’’,i)) + K_T(t(H’,c), t(H’’,i)) ] </li></ul><ul><li>where </li></ul><ul><ul><li>C is the set of all the correspondences between the anchors of (T’,H’) and (T’’,H’’) </li></ul></ul><ul><ul><li>t(S, c) returns the parse tree of the hypothesis (text) S where its placeholders are replaced by means of the substitution c </li></ul></ul><ul><ul><li>i is the identity substitution </li></ul></ul><ul><ul><li>K_T(t1, t2) is a function that measures the similarity between the two trees t1 and t2. </li></ul></ul>Page Page (Zanzotto, Moschitti, 2006)
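A hedged sketch of the maximization above. `rename` is a hypothetical stand-in for t(S, c), and the exhaustive enumeration of correspondences is exponential; a real implementation restricts the candidate set (as the next slide's refinements do):

```python
# Co-indexed cross-pair similarity (sketch): maximize over placeholder
# correspondences c between the two anchor sets.
from itertools import permutations

def K_S_coindexed(pair1, pair2, anchors1, anchors2, K_T, rename):
    """pair* = (tree_T, tree_H); rename(tree, c) substitutes placeholders."""
    (t1, h1), (t2, h2) = pair1, pair2
    best = 0.0
    # C: all one-to-one correspondences (assumes len(anchors1) <= len(anchors2))
    for perm in permutations(anchors2, len(anchors1)):
        c = dict(zip(anchors1, perm))
        score = K_T(rename(t1, c), t2) + K_T(rename(h1, c), h2)
        best = max(best, score)
    return best
```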
    100. 100. Defining the Cross-pair similarity Page Page
101. 101. Refining Cross-pair Similarity <ul><li>Controlling complexity </li></ul><ul><ul><li>We reduce the size of the set of anchors using the notion of chunk </li></ul></ul><ul><li>Reducing the computational cost </li></ul><ul><ul><li>Many subtree computations are repeated during the computation of K_T(t1, t2). This can be exploited in a better dynamic programming algorithm (Moschitti&Zanzotto, 2007) </li></ul></ul><ul><li>Focusing on the information within a pair that is relevant for the entailment: </li></ul><ul><ul><li>Text trees are pruned according to where anchors attach </li></ul></ul>Page Page (Zanzotto, Moschitti, 2006)
    102. 102. BREAK (30 min) Page
    103. 103. III. Knowledge Acquisition Methods Page
104. 104. Knowledge Acquisition for TE <ul><li>What kind of knowledge do we need? </li></ul><ul><li>Explicit knowledge (structured knowledge bases) </li></ul><ul><ul><li>Relations among words (or concepts) </li></ul></ul><ul><ul><ul><li>Symmetric: synonymy, co-hyponymy </li></ul></ul></ul><ul><ul><ul><li>Directional: hyponymy, part of, … </li></ul></ul></ul><ul><ul><li>Relations among sentence prototypes </li></ul></ul><ul><ul><ul><li>Symmetric: paraphrasing </li></ul></ul></ul><ul><ul><ul><li>Directional: inference rules / rewrite rules </li></ul></ul></ul><ul><li>Implicit knowledge </li></ul><ul><ul><li>Relations among sentences </li></ul></ul><ul><ul><ul><li>Symmetric: paraphrasing examples </li></ul></ul></ul><ul><ul><ul><li>Directional: entailment examples </li></ul></ul></ul>Page Page
    105. 105. Acquisition of Explicit Knowledge Page Page
106. 106. Acquisition of Explicit Knowledge <ul><li>The questions we need to answer </li></ul><ul><li>What? </li></ul><ul><ul><li>What do we want to learn? Which resources do we need? </li></ul></ul><ul><li>Using what? </li></ul><ul><ul><li>Which principles do we have? </li></ul></ul><ul><li>How? </li></ul><ul><ul><li>How do we organize the “knowledge acquisition” algorithm? </li></ul></ul>Page Page
107. 107. Acquisition of Explicit Knowledge: what? <ul><li>Types of knowledge </li></ul><ul><li>Symmetric </li></ul><ul><ul><li>Co-hyponymy </li></ul></ul><ul><ul><ul><li>Between words: cat ↔ dog </li></ul></ul></ul><ul><ul><li>Synonymy </li></ul></ul><ul><ul><ul><li>Between words: buy ↔ acquire </li></ul></ul></ul><ul><ul><ul><li>Sentence prototypes (paraphrasing): X bought Y ↔ X acquired Z% of Y’s shares </li></ul></ul></ul><ul><li>Directional semantic relations </li></ul><ul><ul><li>Words: cat → animal, buy → own, wheel part_of car </li></ul></ul><ul><ul><li>Sentence prototypes: X acquired Z% of Y’s shares → X owns Y </li></ul></ul>Page Page
108. 108. Acquisition of Explicit Knowledge: using what? <ul><li>Underlying hypotheses </li></ul><ul><li>Harris’ Distributional Hypothesis (DH) (Harris, 1964) </li></ul><ul><ul><li>“Words that tend to occur in the same contexts tend to have similar meanings.” </li></ul></ul><ul><li>Robison’s Point-wise Assertion Patterns (PAP) (Robison, 1970) </li></ul><ul><ul><li>“It is possible to extract relevant semantic relations with some patterns.” </li></ul></ul>Page Page sim(w1, w2) ≈ sim(C(w1), C(w2)) w1 is in a relation r with w2 if their context matches pattern_r(w1, w2)
109. 109. Distributional Hypothesis (DH) Page Page [Diagram: words/forms mapped into a context (feature) space] sim_w(w1, w2) ≈ sim_ctx(C(w1), C(w2)) w1 = constitute, w2 = compose; C(w1), C(w2) drawn from the corpus, the source of contexts: … sun is constituted of hydrogen … / … The Sun is composed of hydrogen …
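A minimal sketch of the DH in code: context vectors counted from a tokenized corpus and compared by cosine similarity:

```python
# Distributional Hypothesis in miniature (sketch).
import math
from collections import Counter

def context_vector(word, corpus_sentences, window=2):
    """Count words within +/-window of `word`; sentences are token lists."""
    ctx = Counter()
    for sent in corpus_sentences:
        for i, w in enumerate(sent):
            if w == word:
                ctx.update(sent[max(0, i - window):i] + sent[i + 1:i + 1 + window])
    return ctx

def cosine(c1, c2):
    dot = sum(c1[k] * c2[k] for k in c1)
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

# sim_w("constitute", "compose") is approximated by
# cosine(context_vector("constitute", corpus), context_vector("compose", corpus))
```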
110. 110. Point-wise Assertion Patterns (PAP) Page Page w1 is in a relation r with w2 if the context matches patterns_r(w1, w2). Relation: w1 part_of w2. Patterns: “w1 is constituted of w2”, “w1 is composed of w2”. Corpus, the source of contexts: … sun is constituted of hydrogen … / … The Sun is composed of hydrogen … → part_of(sun, hydrogen). A statistical indicator S_corpus(w1, w2) selects correct vs. incorrect relations among words.
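A minimal PAP sketch: the slide's two part_of patterns vote for word pairs, and the counts play the role of the statistical indicator S_corpus(w1, w2):

```python
# Point-wise assertion patterns in miniature (sketch).
import re
from collections import Counter

PART_OF_PATTERNS = [r"(\w+) is constituted of (\w+)",
                    r"(\w+) is composed of (\w+)"]

def extract_part_of(corpus_text):
    """Count pattern matches; the counts act as S_corpus(w1, w2)."""
    pairs = Counter()
    for pattern in PART_OF_PATTERNS:
        for w1, w2 in re.findall(pattern, corpus_text, flags=re.IGNORECASE):
            pairs[(w1.lower(), w2.lower())] += 1
    return pairs

extract_part_of("The Sun is composed of hydrogen.")  # {('sun','hydrogen'): 1}
```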
    111. 111. DH and PAP cooperate Page Page Words or Forms Context (Feature) Space w 1 = constitute w 2 = compose C(w 1 ) C(w 2 ) Corpus: source of contexts … sun is constituted of hydrogen … … The Sun is composed of hydrogen … Distributional Hypothesis Point-wise assertion Patterns
112. 112. Knowledge Acquisition: where do methods differ? <ul><li>On the “word” side </li></ul><ul><li>Target equivalence classes: concepts or relations </li></ul><ul><li>Target forms: words or expressions </li></ul><ul><li>On the “context” side </li></ul><ul><li>Feature space </li></ul><ul><li>Similarity function </li></ul>Page Page [Diagram: words/forms (w1 = cat, w2 = dog) mapped to their contexts C(w1), C(w2) in the feature space]
113. 113. KA4TE: a first classification of some methods Page Page A 2×2 grid (underlying hypothesis × type of knowledge): <ul><li>Distributional Hypothesis, symmetric: Concept Learning (Lin&Pantel, 2001a); Inference Rules / DIRT (Lin&Pantel, 2001b) </li></ul><ul><li>Distributional Hypothesis, directional: Noun Entailment (Geffet&Dagan, 2005) </li></ul><ul><li>Point-wise assertion patterns, symmetric: Relation Pattern Learning / ESPRESSO (Pantel&Pennacchiotti, 2006) </li></ul><ul><li>Point-wise assertion patterns, directional: ISA patterns (Hearst, 1992); Verb Entailment (Zanzotto et al., 2006); TEASE (Szpektor et al., 2004) </li></ul>
114. 114. Noun Entailment Relation <ul><li>Type of knowledge: directional relations </li></ul><ul><li>Underlying hypothesis: distributional hypothesis </li></ul><ul><li>Main idea: the distributional inclusion hypothesis </li></ul>Page Page (Geffet&Dagan, 2006) w1 → w2 if all the prominent features of w1 occur with w2 in a sufficiently large corpus [Diagram: in the context space, the prominent features I(C(w1)) are included in I(C(w2))]
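A hedged sketch of the distributional inclusion test; top_k and min_ratio are illustrative parameters, not the paper's:

```python
# Distributional inclusion hypothesis test (sketch).
def distributional_inclusion(feats_w1, feats_w2, top_k=50, min_ratio=0.9):
    """feats_* map context features to weights (e.g. association scores).
    Propose w1 -> w2 if (almost) all prominent features of w1 occur with w2."""
    prominent = sorted(feats_w1, key=feats_w1.get, reverse=True)[:top_k]
    covered = sum(1 for f in prominent if f in feats_w2)
    return covered / max(len(prominent), 1) >= min_ratio
```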
115. 115. Verb Entailment Relations <ul><li>Type of knowledge: directional relations </li></ul><ul><li>Underlying hypothesis: point-wise assertion patterns </li></ul><ul><li>Main idea: </li></ul>Page Page (Zanzotto, Pennacchiotti, Pazienza, 2006) relation: v1 → v2; pattern: “agentive_nominalization(v2) v1”; statistical indicator S(v1, v2): point-wise mutual information. win → play? player wins!
116. 116. Verb Entailment Relations <ul><li>Understanding the idea </li></ul><ul><li>Selectional restriction </li></ul><ul><ul><li>fly(x) → has_wings(x) </li></ul></ul><ul><ul><li>in general: </li></ul></ul><ul><ul><li>v(x) → c(x) (if x is the subject of v, then x has the property c) </li></ul></ul><ul><li>Agentive nominalization </li></ul><ul><ul><li>“an agentive noun is the doer or the performer of an action v’” </li></ul></ul><ul><ul><li>“X is a player” may be read as play(x) </li></ul></ul><ul><ul><li>c(x) is clearly v’(x) if the property c is derived from v’ by an agentive nominalization </li></ul></ul>Page Page (Zanzotto, Pennacchiotti, Pazienza, 2006) Skipped
117. 117. Verb Entailment Relations <ul><li>Understanding the idea </li></ul><ul><li>Given the expression </li></ul><ul><li>player wins </li></ul><ul><li>Seen as a selectional restriction: </li></ul><ul><li>win(x) → play(x) </li></ul><ul><li>Seen as a selectional preference: </li></ul><ul><li>P(play(x)|win(x)) > P(play(x)) </li></ul>Page Page Skipped
    118. 118. Knowledge Acquisition for TE: How? <ul><li>The algorithmic nature of a DH+PAP method </li></ul><ul><li>Direct </li></ul><ul><ul><li>Starting point: target words </li></ul></ul><ul><li>Indirect </li></ul><ul><ul><li>Starting point: context feature space </li></ul></ul><ul><li>Iterative </li></ul><ul><ul><li>Interplay between the context feature space and the target words </li></ul></ul>Page Page
119. 119. Direct Algorithm Page Page <ul><li>Select target words w_i from the corpus or from a dictionary </li></ul><ul><li>Retrieve the contexts of each w_i and represent them in the feature space as C(w_i) </li></ul><ul><li>For each pair (w_i, w_j): </li></ul><ul><ul><li>Compute the similarity sim(C(w_i), C(w_j)) in the context space </li></ul></ul><ul><ul><li>If sim(w_i, w_j) = sim(C(w_i), C(w_j)) > τ, </li></ul></ul><ul><ul><li>w_i and w_j belong to the same equivalence class W </li></ul></ul>[Diagram: sim(w1, w2) ≈ sim(C(w1), C(w2)), or via centroids sim(I(C(w1)), I(C(w2)))]
120. 120. Indirect Algorithm Page Page <ul><li>Given an equivalence class W, select relevant contexts and represent them in the feature space </li></ul><ul><li>Retrieve target words (w_1, …, w_n) that appear in these contexts. These are likely to be words in the equivalence class W </li></ul><ul><li>Eventually, for each w_i, retrieve C(w_i) from the corpus </li></ul><ul><li>Compute the centroid I(C(W)) </li></ul><ul><li>For each w_i, </li></ul><ul><li>if sim(I(C(W)), w_i) < t, eliminate w_i from W. </li></ul>[Diagram: sim(w1, w2) ≈ sim(C(w1), C(w2))]
121. 121. Iterative Algorithm Page Page <ul><li>For each word w_i in the equivalence class W, retrieve its contexts C(w_i) and represent them in the feature space </li></ul><ul><li>Extract words w_j that have contexts similar to C(w_i) </li></ul><ul><li>Extract the contexts C(w_j) of these new words </li></ul><ul><li>For each new word w_j, if sim(C(W), w_j) > τ, put w_j in W; a sketch follows below. </li></ul>[Diagram: sim(w1, w2) ≈ sim(C(w1), C(w2))]
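A hedged sketch of this iterative (bootstrapping) scheme; get_contexts, words_in_contexts, and similarity are placeholders for corpus access and the chosen similarity function:

```python
# Iterative acquisition scheme (sketch): alternate between the contexts of
# known class members and new candidate words.
def iterative_acquisition(seed_words, get_contexts, words_in_contexts,
                          similarity, threshold, rounds=3):
    W = set(seed_words)
    for _ in range(rounds):
        contexts = set()
        for w in W:
            contexts |= get_contexts(w)           # C(w_i), assumed to be a set
        for cand in words_in_contexts(contexts):  # candidate words w_j
            if cand not in W and similarity(W, cand) > threshold:
                W.add(cand)
    return W
```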
122. 122. Knowledge Acquisition using DH and PAP <ul><li>Direct algorithms </li></ul><ul><ul><li>Concepts from text via clustering (Lin&Pantel, 2001) </li></ul></ul><ul><ul><li>Inference rules - aka DIRT (Lin&Pantel, 2001) </li></ul></ul><ul><ul><li>… </li></ul></ul><ul><li>Indirect algorithms </li></ul><ul><ul><li>Hearst’s ISA patterns (Hearst, 1992) </li></ul></ul><ul><ul><li>Question Answering patterns (Ravichandran&Hovy, 2002) </li></ul></ul><ul><ul><li>… </li></ul></ul><ul><li>Iterative algorithms </li></ul><ul><ul><li>Entailment rules from the Web - aka TEASE (Szpektor et al., 2004) </li></ul></ul><ul><ul><li>Espresso (Pantel&Pennacchiotti, 2006) </li></ul></ul><ul><ul><li>… </li></ul></ul>Page Page
123. 123. TEASE <ul><li>Type: iterative algorithm </li></ul><ul><li>On the “word” side </li></ul><ul><li>Target equivalence classes: fine-grained relations </li></ul><ul><li>Target forms: verb with arguments </li></ul><ul><li>On the “context” side </li></ul><ul><li>Feature space </li></ul><ul><li>Innovations with respect to research before 2004 </li></ul><ul><li>First direct algorithm for extracting rules </li></ul>Page Page prevent(X,Y) X_{filler}:mi?, Y_{filler}:mi? (Szpektor et al., 2004) [Diagram: dependency tree of “X finally call Y indictable” with subj/obj/mod edges]
124. 124. TEASE Page Page WEB Lexicon Input template: X subj-accuse-obj Y. Sample corpus for input template: Paula Jones accused Clinton … BBC accused Blair … Sanhedrin accused St. Paul … Anchor sets: { Paula Jones subj; Clinton obj }, { Sanhedrin subj; St. Paul obj }, … Sample corpus for anchor sets: Paula Jones called Clinton indictable … St. Paul defended before the Sanhedrin … Templates: X call Y indictable; Y defend before X; … TEASE iterates between Anchor Set Extraction (ASE) and Template Extraction (TE). (Szpektor et al., 2004) Skipped
125. 125. TEASE <ul><li>Innovations with respect to research before 2004 </li></ul><ul><li>First direct algorithm for extracting rules </li></ul><ul><li>A feature selection step assesses the most informative features </li></ul><ul><li>Extracted forms are clustered to obtain the most general sentence prototype for a given set of equivalent forms </li></ul>Page Page (Szpektor et al., 2004) Skipped [Diagram: the feature sets of S1 (call{1} indictable{1} subj{1} obj{1} mod{1} X{1} Y{1} harassment{1} for{1}) and S2 (call{2} indictable{2} subj{2} obj{2} mod{2} X{2} Y{2} finally{2} mod{2}) merge into the shared prototype call{1,2} indictable{1,2} subj{1,2} obj{1,2} mod{1,2} X{1,2} Y{1,2}]
126. 126. Espresso <ul><li>Type: iterative algorithm </li></ul><ul><li>On the “word” side </li></ul><ul><li>Target equivalence classes: relations </li></ul><ul><li>Target forms: expressions, sequences of tokens </li></ul><ul><li>Innovations with respect to research before 2006 </li></ul><ul><li>A measure to determine specific vs. general patterns (a ranking of the equivalent forms) </li></ul>Page Page Y is composed by X, Y is made of X → compose(X,Y) (Pantel&Pennacchiotti, 2006)
127. 127. Espresso Page Page [Diagram: the iterative loop between instances and patterns. Seed instances: (leader, panel), (city, region), (oxygen, water) → patterns ranked by reliability: 1.0 “Y is composed by X”; 0.8 “Y is part of X”; 0.2 “X , Y” → new instances ranked by reliability: 1.0 (tree, land); 0.9 (atom, molecule); 0.7 (leader, panel); 0.6 (range of information, FBI report); 0.6 (artifact, exhibit); 0.2 (oxygen, hydrogen)] (Pantel&Pennacchiotti, 2006) Skipped
128. 128. Espresso <ul><li>Innovations with respect to research before 2006 </li></ul><ul><li>A measure to determine specific vs. general patterns (a ranking of the equivalent forms) </li></ul><ul><li>Both pattern and instance selection are performed </li></ul><ul><li>Different use of general and specific patterns in the iterative algorithm </li></ul>Page Page (Pantel&Pennacchiotti, 2006) 1.0 Y is composed by X; 0.8 Y is part of X; 0.2 X , Y Skipped
    129. 129. Acquisition of Implicit Knowledge Page Page
130. 130. Acquisition of Implicit Knowledge <ul><li>The questions we need to answer </li></ul><ul><li>What? </li></ul><ul><ul><li>What do we want to learn? Which resources do we need? </li></ul></ul><ul><li>Using what? </li></ul><ul><ul><li>Which principles do we have? </li></ul></ul>Page Page
131. 131. Acquisition of Implicit Knowledge: what? <ul><li>Types of knowledge </li></ul><ul><li>Symmetric </li></ul><ul><ul><li>Near-synonymy between sentences </li></ul></ul><ul><ul><ul><li>Acme Inc. bought Goofy ltd. ↔ Acme Inc. acquired 11% of Goofy ltd.’s shares </li></ul></ul></ul><ul><li>Directional semantic relations </li></ul><ul><ul><li>Entailment between sentences </li></ul></ul><ul><ul><li>Acme Inc. acquired 11% of Goofy ltd.’s shares → Acme Inc. owns Goofy ltd. </li></ul></ul><ul><ul><li>NOTE: TRICKY NON-ENTAILMENT PAIRS ARE ALSO RELEVANT </li></ul></ul>Page Page
    132. 132. Acquisition of Implicit Knowledge : Using what? <ul><li>Underlying hypothesis </li></ul><ul><li>Structural and content similarity </li></ul><ul><ul><li>“ Sentences are similar if they share enough content” </li></ul></ul><ul><li>A revised Point-wise Assertion Patterns </li></ul><ul><ul><li>“ Some patterns of sentences reveal relations among sentences” </li></ul></ul>Page Page sim( s 1 , s 2 ) according to relations from s 1 and s 2
133. 133. A first classification of some methods Page Page A grid (underlying hypothesis × type of knowledge): <ul><li>Structural and content similarity, symmetric: Paraphrase Corpus (Dolan&Quirk, 2004) </li></ul><ul><li>Revised point-wise assertion patterns, directional (entails): Relations among sentences (Burger&Ferro, 2005) </li></ul><ul><li>Revised point-wise assertion patterns, directional (not entails): Relations among sentences (Hickl et al., 2006) </li></ul>
134. 134. Entailment relations among sentences <ul><li>Type of knowledge: directional relations (entailment) </li></ul><ul><li>Underlying hypothesis: revised point-wise assertion patterns </li></ul><ul><li>Main idea: in headline news items, the first sentence/paragraph generally entails the title </li></ul>Page Page (Burger&Ferro, 2005) relation: s2 → s1; pattern: “News item: Title(s1), First_Sentence(s2)”. This pattern works on the structure of the text.
    135. 135. Entailment relations among sentences Page Page examples from the web New York Plan for DNA Data in Most Crimes Eliot Spitzer is proposing a major expansion of New York’s database of DNA samples to include people convicted of most crimes, while making it easier for prisoners to use DNA to try to establish their innocence. … Title Body Chrysler Group to Be Sold for $7.4 Billion DaimlerChrysler confirmed today that it would sell a controlling interest in its struggling Chrysler Group to Cerberus Capital Management of New York, a private equity firm that specializes in restructuring troubled companies. … Title Body
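A minimal sketch of harvesting (T, H) pairs under this hypothesis; the news-item structure and the naive sentence split are assumptions for illustration:

```python
# Harvesting entailment pairs from headline news (sketch, cf. Burger&Ferro).
def harvest_entailment_pairs(news_items):
    """news_items: dicts with "title" and "body"; returns (T, H) pairs."""
    pairs = []
    for item in news_items:
        # crude first-sentence split; a real system would use a sentence splitter
        first_sentence = item["body"].split(". ")[0]
        pairs.append((first_sentence, item["title"]))  # T (body) entails H (title)
    return pairs
```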
136. 136. Tricky Non-Entailment relations among sentences <ul><li>Type of knowledge: directional relations (tricky non-entailment) </li></ul><ul><li>Underlying hypothesis: revised point-wise assertion patterns </li></ul><ul><li>Main idea: </li></ul><ul><ul><li>in a text, sentences containing the same named entity generally do not entail each other </li></ul></ul><ul><ul><li>sentences connected by “on the contrary”, “but”, … do not entail each other </li></ul></ul>Page Page (Hickl et al., 2006) relation: s1 ⇏ s2; patterns: s1 and s2 are in the same text and share at least a named entity; “s1. On the contrary, s2”
137. 137. Tricky Non-Entailment relations among sentences Page Page examples from (Hickl et al., 2006) T: One player losing a close friend is Japanese pitcher Hideki Irabu, who was befriended by Wells during spring training last year. H: Irabu said he would take Wells out to dinner when the Yankees visit Toronto. T: According to the professor, present methods of cleaning up oil slicks are extremely costly and are never completely efficient. H: In contrast, he stressed, Clean Mag has a 100 percent pollution retrieval rate, is low cost and can be recycled.
138. 138. <ul><li>He used a Phillips head to tighten the screw. </li></ul><ul><li>The bank owner tightened security after a spate of local crimes. </li></ul><ul><li>The Federal Reserve will aggressively tighten monetary policy. </li></ul>Context Sensitive Paraphrasing Page [Diagram: candidate substitutions for “tighten”, only some of which are valid in each context: Loosen, Strengthen, Step up, Toughen, Improve, Fasten, Impose, Intensify, Ease, Beef up, Simplify, Curb, Reduce]
    139. 139. Context Sensitive Paraphrasing <ul><li>Can speak replace command ? </li></ul><ul><li>The general commanded his troops. </li></ul><ul><li>The general spoke to his troops. </li></ul><ul><li>The soloist commanded attention. </li></ul><ul><li>The soloist spoke to attention. </li></ul>
140. 140. Context Sensitive Paraphrasing <ul><li>We need to know when one word can paraphrase another, not just if . </li></ul><ul><li>Given a word v and its context in sentence S, and another word u: </li></ul><ul><ul><li>Can u replace v in S and have S keep the same or an entailed meaning? </li></ul></ul><ul><ul><li>Is the new sentence S’, where u has replaced v, entailed by the previous sentence S? </li></ul></ul><ul><li>The general commanded [V] his troops. [speak = U] </li></ul><ul><li>The general spoke to his troops. YES </li></ul><ul><li>The soloist commanded [V] attention. [speak = U] </li></ul><ul><li>The soloist spoke to attention. NO </li></ul>
141. 141. Related Work <ul><li>Paraphrase generation: </li></ul><ul><ul><li>Given a sentence or phrase, generate paraphrases of that phrase which have the same or an entailed meaning in some context. [DIRT; TEASE] </li></ul></ul><ul><li>A sense disambiguation task - without naming the sense </li></ul><ul><ul><li>Dagan et al. ’06 </li></ul></ul><ul><ul><li>Kauchak & Barzilay (in the context of improving MT evaluation) </li></ul></ul><ul><ul><li>SemEval word substitution task; Pantel et al. ’06 </li></ul></ul><ul><li>In these cases, this was done by learning (in a supervised way) a single classifier per word u </li></ul>
142. 142. Context Sensitive Paraphrasing [Connor&Roth ’07] <ul><li>Use a single global binary classifier </li></ul><ul><ul><li>f(S, v, u) → {0, 1} </li></ul></ul><ul><li>Unsupervised, bootstrapped learning approach </li></ul><ul><li>Key: the use of a very large amount of unlabeled data to derive a reliable supervision signal that is then used to train a supervised learning algorithm. </li></ul><ul><li>Features are the amount of overlap between the contexts u and v have both been seen with; see the sketch below </li></ul><ul><li>Include context sensitivity by restricting to contexts similar to S </li></ul><ul><ul><li>Are both u and v seen in contexts similar to the local context S? </li></ul></ul><ul><li>This allows running the classifier on previously unseen pairs (u, v) </li></ul>
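A hedged sketch of the kind of context-overlap features such a classifier can use; all helper names here are illustrative, not from the paper:

```python
# Context-overlap features for f(S, v, u) (sketch).
def overlap_features(u_contexts, v_contexts, local_context, sim, k=100):
    """u_contexts, v_contexts: token sequences each word was seen with;
    sim: any context similarity; local_context: the context of v in S."""
    # restrict to the stored contexts most similar to the sentence at hand
    u_near = sorted(u_contexts, key=lambda c: sim(c, local_context),
                    reverse=True)[:k]
    v_near = sorted(v_contexts, key=lambda c: sim(c, local_context),
                    reverse=True)[:k]
    shared_near = set(map(tuple, u_near)) & set(map(tuple, v_near))
    shared_all = (set(map(tuple, u_contexts)) & set(map(tuple, v_contexts)))
    return {"global_overlap": len(shared_all),   # context-insensitive signal
            "local_overlap": len(shared_near)}   # context-sensitive signal
```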
    143. 143. IV. Applications of Textual Entailment Page
    144. 144. Relation Extraction (Romano et al. EACL-06) <ul><li>Identify different ways of expressing a target relation </li></ul><ul><ul><li>Examples: Management Succession, Birth - Death, Mergers and Acquisitions, Protein Interaction </li></ul></ul><ul><li>Traditionally performed in a supervised manner </li></ul><ul><ul><li>Requires dozens-hundreds examples per relation </li></ul></ul><ul><ul><li>Examples should cover broad semantic variability </li></ul></ul><ul><li>Costly - Feasible??? </li></ul><ul><li>Little work on unsupervised approaches </li></ul>Page
145. 145. Proposed Approach Page [Pipeline: Input template “X prevent Y” → Entailment Rule Acquisition (TEASE) → templates “X prevention for Y”, “X treat Y”, “X reduce Y” → Syntactic Matcher (transformation rules) → relation instances, e.g. <sunscreen, sunburns>]
    146. 146. Dataset <ul><li>Bunescu 2005 </li></ul><ul><li>Recognizing interactions between annotated proteins pairs </li></ul><ul><ul><li>200 Medline abstracts </li></ul></ul><ul><li>Input template : X interact with Y </li></ul>Page
147. 147. Manual Analysis - Results <ul><li>93% of interacting protein pairs can be identified with lexical-syntactic templates </li></ul>Page Frequency of syntactic phenomena (%): transparent head 34, apposition 24, conjunction 24, set 13, relative clause 8, co-reference 7, coordination 7, passive form 2. Number of templates vs. recall (within the 93%): 2 templates → 10%, 4 → 20%, 6 → 30%, 11 → 40%, 21 → 50%, 39 → 60%, 73 → 70%, 107 → 80%, 141 → 90%, 175 → 100%.
    148. 148. TEASE Output for X interact with Y Page A sample of correct templates learned: X binding to Y X bind to Y X Y interaction X activate Y X attach to Y X stimulate Y X interaction with Y X couple to Y X trap Y interaction between X and Y X recruit Y X become trapped in Y X associate with Y X Y complex X be linked to Y X recognize Y X target Y X block Y
    149. 149. <ul><li>Iterative - taking the top 5 ranked templates as input </li></ul><ul><li>Morph - recognizing morphological derivations (cf. semantic role labeling vs. matching) </li></ul>TEASE Potential Recall on Training Set Page Recall Experiment 39% input 49% input + iterative 63% input + iterative + morph
    150. 150. Performance vs. Supervised Approaches Page Supervised: 180 training abstracts
    151. 151. Textual Entailment for Question Answering <ul><li>Sanda Harabagiu and Andrew Hickl (ACL-06) : Methods for Using Textual Entailment in Open-Domain Question Answering </li></ul><ul><li>Typical QA architecture – 3 stages: </li></ul><ul><ul><li>Question processing </li></ul></ul><ul><ul><li>Passage retrieval </li></ul></ul><ul><ul><li>Answer processing </li></ul></ul><ul><li>Incorporated their RTE-2 entailment system at stages 2&3, for filtering and re-ranking </li></ul>Page
    152. 152. Integrated three methods <ul><li>Test entailment between question and final answer – filter and re-rank by entailment score </li></ul><ul><li>Test entailment between question and candidate retrieved passage – combine entailment score in passage ranking </li></ul><ul><li>Test entailment between question and Automatically Generated Questions (AGQ) created from candidate paragraph </li></ul><ul><ul><li>Utilizes earlier method for generating Q-A pairs from paragraph </li></ul></ul><ul><ul><li>Correct answer should match that of an entailed AGQ </li></ul></ul><ul><li>TE is relatively easy to integrate at different stages </li></ul><ul><li>Results: 20% accuracy increase </li></ul>Page
    153. 153. Answer Validation Exercise @ CLEF 2006-7 <ul><li>Peñas et al., Journal of Logic and Computation (to appear) </li></ul><ul><li>Allow textual entailment systems to validate (and prioritize) the answers of QA systems participating at CLEF </li></ul><ul><li>AVE participants receive: </li></ul><ul><ul><li>question and answer – need to generate full hypothesis </li></ul></ul><ul><ul><li>supporting passage – should entail the answer hypothesis </li></ul></ul><ul><li>Methodologically: Enables to measure TE systems contribution to QA performance, across many QA systems </li></ul><ul><ul><li>TE developers do not need to have full-blown QA system </li></ul></ul>Page
    154. 154. V. A Textual Entailment view of Applied Semantics Page
    155. 155. Classical Approach = Interpretation Page Stipulated Meaning Representation (by scholar) Language (by nature) Variability <ul><li>Logical forms, word senses, semantic roles, named entity types, … - scattered interpretation tasks </li></ul><ul><li>Feasible/suitable framework for applied semantics? </li></ul>
    156. 156. Textual Entailment = Text Mapping Page Assumed Meaning (by humans) Language (by nature) Variability
    157. 157. General Case – Inference Page Meaning Representation Language Inference Interpretation Textual Entailment <ul><li>Entailment mapping is the actual applied goal - but also a touchstone for understanding! </li></ul><ul><li>Interpretation becomes possible means </li></ul><ul><ul><li>Varying representation levels may be investigated </li></ul></ul>
    158. 158. Some perspectives <ul><li>Issues with semantic interpretation </li></ul><ul><ul><li>Hard to agree on a representation language </li></ul></ul><ul><ul><li>Costly to annotate semantic representations for training </li></ul></ul><ul><ul><li>Difficult to obtain - is it more difficult than needed? </li></ul></ul><ul><li>Textual entailment refers to texts </li></ul><ul><ul><li>Texts are theory neutral </li></ul></ul><ul><ul><li>Amenable for unsupervised learning </li></ul></ul><ul><ul><li>“Proof is in the pudding” test </li></ul></ul>Page
    159. 159. Entailment as an Applied Semantics Framework <ul><li>The new view: formulate (all?) semantic problems as entailment tasks </li></ul><ul><ul><li>Some semantic problems are traditionally investigated as entailment tasks </li></ul></ul><ul><ul><li>But also… </li></ul></ul><ul><ul><li>Revised definitions of old problems </li></ul></ul><ul><ul><li>Exposing many new ones </li></ul></ul>Page
160. 160. Some Classical Entailment Problems <ul><li>Monotonicity - traditionally approached via entailment </li></ul><ul><ul><li>Given that: dog → animal </li></ul></ul><ul><ul><ul><li>Upward monotone: Some dogs are nice ⇒ Some animals are nice </li></ul></ul></ul><ul><ul><ul><li>Downward monotone: No animals are nice ⇒ No dogs are nice </li></ul></ul></ul><ul><ul><li>Some formal approaches - via interpretation to logical form </li></ul></ul><ul><ul><li>Natural logic - avoids interpretation to FOL (cf. Stanford @ RTE-3) </li></ul></ul><ul><li>Noun compound relation identification </li></ul><ul><ul><li>a novel by Tolstoy ⇒ Tolstoy wrote a novel </li></ul></ul><ul><ul><li>Practically an entailment task, when relations are represented lexically (rather than as interpreted semantic notions) </li></ul></ul>Page
    161. 161. Revised definition of an Old Problem: Sense Ambiguity <ul><li>Classical task definition - interpretation: Word Sense Disambiguation </li></ul><ul><li>What is the RIGHT set of senses? </li></ul><ul><ul><li>Any concrete set is problematic/subjective </li></ul></ul><ul><ul><li>… but WSD forces you to choose one </li></ul></ul><ul><li>A lexical entailment perspective: </li></ul><ul><ul><li>Instead of identifying an explicitly stipulated sense of a word occurrence ... </li></ul></ul><ul><ul><li>identify whether a word occurrence (i.e. its implicit sense) entails another word occurrence, in context </li></ul></ul><ul><ul><li>Dagan et al. ( ACL-2006) </li></ul></ul>Page
162. 162. Synonym Substitution <ul><li>Source = record Target = disc </li></ul><ul><li>This is anyway a stunning disc , thanks to the playing of the Moscow Virtuosi with Spivakov. (positive) </li></ul><ul><li>He said computer networks would not be affected and copies of information should be made on floppy discs . (negative) </li></ul><ul><li>Before the dead soldier was placed in the ditch his personal possessions were removed, leaving one disc on the body for identification purposes. (negative) </li></ul>Page
163. 163. Unsupervised Direct: kNN-ranking <ul><li>Test example score: average cosine similarity of the target example with its k most similar (unlabeled) instances of the source word </li></ul><ul><li>Rationale: </li></ul><ul><ul><li>positive examples of the target will be similar to some source occurrence (of the corresponding sense) </li></ul></ul><ul><ul><li>negative target examples won’t be similar to source examples </li></ul></ul><ul><li>Rank test examples by score; a sketch follows below </li></ul><ul><ul><li>A classification slant on language modeling </li></ul></ul>Page
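A minimal sketch of the kNN scoring step; `cosine` stands for any vector similarity over the chosen context representations:

```python
# kNN scoring for unsupervised sense-matching (sketch).
import heapq

def knn_score(target_vec, source_vecs, cosine, k=5):
    """Average similarity of a target occurrence to its k nearest
    (unlabeled) occurrences of the source word."""
    sims = (cosine(target_vec, s) for s in source_vecs)
    top = heapq.nlargest(k, sims)
    return sum(top) / len(top) if top else 0.0

# rank test examples: a higher score suggests the same (entailing) sense
# ranked = sorted(test_vecs, reverse=True,
#                 key=lambda v: knn_score(v, source_vecs, cosine))
```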
    164. 164. Results (for synonyms): Ranking Page <ul><li>kNN improves 8-18% precision up to 25% recall </li></ul>
165. 165. Other Modified and New Problems <ul><li>Lexical entailment vs. classical lexical semantic relationships </li></ul><ul><ul><li>synonym ⇔ synonym </li></ul></ul><ul><ul><li>hyponym ⇒ hypernym (but much beyond WN - e.g. “medical technology”) </li></ul></ul><ul><ul><li>meronym ⇐?⇒ holonym - depending on meronym type, and context </li></ul></ul><ul><ul><ul><li>boil on elbow ⇒ boil on arm vs. government voted ⇒ minister voted </li></ul></ul></ul><ul><li>Named entity classification - by any textual type </li></ul><ul><ul><li>Which pickup trucks are produced by Mitsubishi? Magnum ⇒ pickup truck </li></ul></ul><ul><li>Argument mapping for nominalizations (derivations) </li></ul><ul><ul><li>X’s acquisition of Y ⇒ X acquired Y </li></ul></ul><ul><ul><li>X’s acquisition by Y ⇒ Y acquired X </li></ul></ul><ul><li>Transparent head </li></ul><ul><ul><li>sell to an IBM division ⇒ sell to IBM </li></ul></ul><ul><ul><li>sell to an IBM competitor ⇏ sell to IBM </li></ul></ul><ul><li>… </li></ul>Page
166. 166. The importance of analyzing entailment examples <ul><li>Few systematic manual data analysis works have been reported: </li></ul><ul><ul><li>Vanderwende et al. at the RTE-1 workshop </li></ul></ul><ul><ul><li>Bar-Haim et al. at the ACL-05 EMSEE workshop </li></ul></ul><ul><ul><li>Within Romano et al. at EACL-06 </li></ul></ul><ul><ul><li>Xerox PARC dataset; Braz et al. at the IJCAI workshop ’05 </li></ul></ul><ul><li>They contribute a lot to understanding and defining entailment phenomena and sub-problems </li></ul><ul><li>Should be done (and reported) much more… </li></ul>Page
    167. 167. Unified Evaluation Framework <ul><li>Defining semantic problems as entailment problems facilitates unified evaluation schemes (vs. current state) </li></ul><ul><li>Possible evaluation schemes: </li></ul><ul><ul><li>Evaluate on the general TE task, while creating corpora which focus on target sub-tasks </li></ul></ul><ul><ul><ul><li>E.g. a TE dataset with many sense-matching instances </li></ul></ul></ul><ul><ul><ul><li>Measure impact of sense-matching algorithms on TE performance </li></ul></ul></ul><ul><ul><li>Define TE-oriented subtasks, and evaluate directly on sub-task </li></ul></ul><ul><ul><ul><li>E.g. a test collection manually annotated for sense-matching </li></ul></ul></ul><ul><ul><ul><li>Advantages: isolate sub-problem; researchers can investigate individual problems without needing a full-blown TE system (cf. QA research) </li></ul></ul></ul><ul><ul><ul><li>Such datasets may be derived from datasets of type (1) </li></ul></ul></ul><ul><li>Facilitates common inference goal across semantic problems </li></ul>Page
    168. 168. Summary: Textual Entailment as Goal <ul><li>The essence of the textual entailment paradigm: </li></ul><ul><ul><li>Base applied semantic inference on entailment “engines” and KBs </li></ul></ul><ul><ul><li>Formulate various semantic problems as entailment sub-tasks </li></ul></ul><ul><li>Interpretation and “mapping” methods may compete/complement </li></ul><ul><ul><li>at various levels of representations </li></ul></ul><ul><li>Open question: which inferences </li></ul><ul><ul><li>can be represented at “language” level? </li></ul></ul><ul><ul><li>require logical or specialized representation and inference? (temporal, spatial, mathematical, …) </li></ul></ul>Page
    169. 169. Textual Entailment ≈ Human Reading Comprehension <ul><li>From a children’s English learning book (Sela and Greenberg): </li></ul><ul><li>Reference Text: “…The Bermuda Triangle lies in the Atlantic Ocean, off the coast of Florida . …” </li></ul><ul><li>Hypothesis (True/False?): The Bermuda Triangle is near the United States </li></ul>Page ???
    170. 170. Cautious Optimism: Approaching the Desiderata? <ul><ul><li>Generic (feasible) module for applications </li></ul></ul><ul><ul><li>Unified (agreeable) paradigm for investigating language phenomena </li></ul></ul>Page Thank you!
    171. 171. Lexical Entailment for Applications <ul><li>Sense equivalence </li></ul>Page T1 : IKEA announced a new comfort chair Q : announcement of new models of chairs T2 : MIT announced a new CS chair position T1 : IKEA announced a new comfort chair Q : announcement of new models of furniture T2 : MIT announced a new CS chair position <ul><li>Sense entailment </li></ul>
    172. 172. Meeting the knowledge challenge – by a coordinated effort? <ul><li>A vast amount of “entailment knowledge” needed </li></ul><ul><li>Speculation: can we have a joint community effort for knowledge acquisition? </li></ul><ul><ul><li>Uniform representations (yet to be defined) </li></ul></ul><ul><ul><li>Mostly automatic acquisition (millions of “rules”) </li></ul></ul><ul><ul><li>Human Genome Project analogy </li></ul></ul><ul><li>Teaser: RTE-3 Resources Pool at ACLWiki (set up by Patrick Pantel) </li></ul>Page