Text Processing for Procedural Question Answering

Material of the Natural Language Processing (NLP) Workshop with STIC-Asia representatives and the Nepal team.
August 30-31, 2007.
Institution: Institut de Recherche en Informatique de Toulouse (IRIT)
Patan Dhoka, Lalitpur, Nepal.


  1. Text Processing for Procedural Question Answering. Ongoing work for the TextCoop project, ILPL group. Presentation by Estelle Delpech.
  2. Text Processing for Procedural Question Answering
       I. INTRODUCTION: GLOBAL ARCHITECTURE
       II. CLUES TO IDENTIFY TITLES / INSTRUCTIONAL COMPOUNDS
       III. THE WHOLE PROCESS
       IV. MAIN ISSUES
       V. DEMO
  3. I. INTRODUCTION: GLOBAL ARCHITECTURE
  4. A global architecture (Surdeau & Pasca). Diagram labels: How to…?, Goal, Task, TEXT PROCESSING.
  5. TEXT PROCESSING for Procedural QA: identification of task structure
       Pipeline (diagram): .html → PRE-PROCESSING → SEGMENTER → TEXT GRAMMAR → TASK DATABASE
       • PRE-PROCESSING: HTML cleaning, MS tagging
       • SEGMENTER: identification of terminal symbols
       • TEXT GRAMMAR: X-bar analysis of task structure
       Diagram labels: spec, G', Pre-requisite, Goal, Title, complement, Instructional Compound
  6. II. CORPUS OBSERVATION: WHAT CLUES TO IDENTIFY INSTRUCTIONAL COMPOUNDS? TITLES?
  7. 1. Clues for Instructional Compounds Identification
       • Definition: kernel instructions linked to various clauses by rhetorical or logical relations.
       • Identification in two steps:
           - Detect the presence of instructions: expression of obligation
           - Find instructional compound boundaries, e.g. connectors…
       Example from the corpus:
           Fixing the first wall plate (or shelf bracket)
           We are going to mark the first wall plate (or bracket) for drilling. First, position the face plate so one screw lines up with the mark on the wall you made in the last step and place the level on top of the face plate to ensure it is level. Second, you should mark the wall in the next screw hole, again by turning the screw until it bites into the wall (see fig 1.3). It is advised that you mark any remaining screw holes while keeping the wall plate firmly in position. Now you have to choose a suitable drill bit (masonry or the right type for the surface). It should be the same width as the wall plug to be used. Get to hand one of the wall plugs, and place it against the tip of the drill bit (see fig 1.4). Finally, place a piece of masking tape on the drill bit to use as a guide, this will ensure you don't drill too deep.
  8. 1. Clues for Instructional Compounds Identification
       • Presence of instructions: morpho-lexical patterns
           - shall Adv* base-form verb (e.g. "You should pre-heat the oven")
           - have to Adv* base-form verb (e.g. "You have to pre-heat the oven")
           - ## Op? Adv* base-form verb (e.g. "Do not pre-heat the oven")
           - it be Adv* (necessary|compulsory) that (e.g. "It is better that you pre-heat the oven")
       • Compound boundaries:
           - Morpho-lexical patterns:
               ## to Adv* base-form verb .* ,
               (##|Conj) (if|then|after)
             Examples:
               [To cook the cake, pre-heat the oven] [and then start peeling …
               [If you want to cook the cake, pre-heat the oven.] [If you don't want to cook …
           - HTML tags (typo-disposition): <p> </p>, <li> </li>
             Example: <li>[Pre-heat the oven …]</li>
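The obligation patterns on this slide can be approximated with plain regular expressions. The following is a minimal sketch, not the project's implementation: the slides rely on morphosyntactic tags, whereas here a tiny hand-written verb lexicon (`BASE_VERB`) and an "-ly" adverb heuristic (`ADVERBS`) are assumed stand-ins, and the pattern names are hypothetical. The word "better" is included only because it appears in the slide's own example.

```python
import re

# Assumption: a tiny verb lexicon and an "-ly" heuristic stand in for the
# base-form-verb and adverb tags that a morphosyntactic tagger would supply.
BASE_VERB = r"(?:pre-heat|position|place|mark|choose|get|start|cook|drill)"
ADVERBS = r"(?:\w+ly\s+)*"

INSTRUCTION_PATTERNS = {
    # shall/should Adv* base-form verb
    "modal": re.compile(rf"\b(?:shall|should)\s+{ADVERBS}{BASE_VERB}\b", re.I),
    # have to Adv* base-form verb
    "have_to": re.compile(rf"\bhave\s+to\s+{ADVERBS}{BASE_VERB}\b", re.I),
    # sentence start, optional "do not", Adv*, base-form verb
    "imperative": re.compile(rf"^(?:do\s+not\s+)?{ADVERBS}{BASE_VERB}\b", re.I),
    # it be Adv* (necessary|compulsory|better) that
    "impersonal": re.compile(
        rf"\bit\s+(?:is|was)\s+{ADVERBS}(?:necessary|compulsory|better)\s+that\b",
        re.I,
    ),
}


def instruction_types(sentence):
    """Return the names of the obligation patterns found in one sentence."""
    sentence = sentence.strip()
    return [name for name, rx in INSTRUCTION_PATTERNS.items() if rx.search(sentence)]
```

Calling `instruction_types` on each sentence of a chunk gives both the number of instructions and their types; a sentence that matches no pattern yields an empty list.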
  9. 2. Titles identification: about the HTML encoding of titles
       • The <hn> tag cannot be used as a single clue for title identification
       • HTML encoding is free, and the code can be underspecified (CSS)
       • Corpus observation:
           - 80% of titles are encoded with <b>
           - 57% of <b> tags encode titles
           - 64% of <h> tags encode titles
           - the coding varies from one web site to another
       • We had to find some other clues…
  10. 2. Clues for Title Identification
       • Some helpful visual clues:
           - Short sequence of words
           - Emphasized
           - Spaced from the rest of the text
       Figure annotations: "emphasized, not a title"; "not short"
  11. 2. Clues for Title Identification
       • Linguistic clues:
           - Rarely contains a tensed verb
           - Can be a single question
       • Textual environment clues:
           - Occurs between two paragraphs of text
           - Occurs between a title and a paragraph of text
       • No single clue, but a bundle of clues
  12. III. THE WHOLE PROCESS
       • PRE-PROCESSING: HTML cleaning, MS tagging
       • SEGMENTER: identification of terminal symbols (Title, Instructional Compound)
  13. 1. HTML Cleaning module
       Raw HTML code → HTML Cleaning
       • Main typo-dispositional information:
           - Text chunk tags: <p> <div> <ol> <ul>
           - Subdivision tags: <br> <li>
           - Emphasis tags: <h> <b> <u> <i>
       • The output of the HTML Cleaning module is:
           - a list of text chunks, corresponding more or less to paragraph breaks
           - their corresponding typo-dispositional structure
       Example tag sequence from the figure: <p> <b> <p> <li> <li> <p> <b> <p> <b> <br> <br> <p> <b> <b> <br>
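A cleaning step of this kind can be sketched with Python's standard `html.parser`. This is an illustrative sketch only: the class name `ChunkCollector`, the function `clean_html`, and the dictionary layout of a chunk are assumptions, as is the choice to let a heading tag both open and close a chunk (the slide lists `<h>` only as an emphasis tag).

```python
from html.parser import HTMLParser

CHUNK_TAGS = {"p", "div", "ol", "ul"}          # open/close a text chunk
SUBDIVISION_TAGS = {"br", "li"}                # subdivide a chunk
HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
EMPHASIS_TAGS = {"b", "u", "i"} | HEADING_TAGS


class ChunkCollector(HTMLParser):
    """Collect text chunks together with the layout tags seen inside them."""

    def __init__(self):
        super().__init__()
        self.chunks = []   # finished chunks: {"text": str, "tags": [str, ...]}
        self._text = []
        self._tags = []

    def _flush(self):
        # Normalise whitespace; keep the chunk only if it contains text.
        text = " ".join("".join(self._text).split())
        if text:
            self.chunks.append({"text": text, "tags": self._tags})
        self._text, self._tags = [], []

    def handle_starttag(self, tag, attrs):
        # Assumption: headings delimit chunks in addition to marking emphasis.
        if tag in CHUNK_TAGS or tag in HEADING_TAGS:
            self._flush()
        if tag in SUBDIVISION_TAGS:
            self._text.append(" ")   # keep words on both sides apart
        if tag in CHUNK_TAGS or tag in SUBDIVISION_TAGS or tag in EMPHASIS_TAGS:
            self._tags.append(tag)

    def handle_endtag(self, tag):
        if tag in CHUNK_TAGS or tag in HEADING_TAGS:
            self._flush()

    def handle_data(self, data):
        self._text.append(data)


def clean_html(html):
    """Return the list of text chunks with their typo-dispositional tags."""
    collector = ChunkCollector()
    collector.feed(html)
    collector.close()
    collector._flush()   # emit any trailing text that follows the last tag
    return collector.chunks
```

Each chunk keeps its tags in document order, so later steps can test for emphasis (`b`, `u`, `i`, headings) or subdivision (`br`, `li`) without re-reading the HTML.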
  14. 2. Clues Collection module
       Diagram labels: STRUCTURE, TEXT, MS Tagging (TreeTagger), TAGS, Clues collection, CLUES
       • The output of the Clues Collection module is the list of text chunks with:
           - their corresponding typo-dispositional structure
           - text with tagged instructions, goals, connectors
           - linguistic information
       • Clues collected:
           - Nb of instructions
           - Instruction types
           - Nb of goals
           - Nb of words
           - Nb of sentences
           - Nb of questions
           - Nb of tensed verbs
       • This information is used for:
           - Titles identification
           - Instructional compounds identification
  15. 3. Processing each chunk: text or title?
       • Identification of unambiguous titles:
           - Short chunk
           - spaced from the rest of the text
           - with emphasis
           - a single question
       • Identification of unambiguous paragraphs of text:
           - Long chunk
           - No emphasis
           - Subdivided
           - more than 1 instruction
           - presence of tensed verbs
       Figure (TEXT CHUNKS / TYPE): each chunk's type starts as "unknown" and is then labelled title, text, or ambiguous.
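The slide lists the clues but not how they are combined, so the following is only one possible reading, written as a small rule cascade. The `Clues` record, the `SHORT_MAX_WORDS` threshold, and the order of the tests are all assumptions made for illustration.

```python
from dataclasses import dataclass

SHORT_MAX_WORDS = 10   # hypothetical threshold for a "short" chunk


@dataclass
class Clues:
    """Clues collected for one text chunk (field names are illustrative)."""
    n_words: int
    emphasized: bool = False
    subdivided: bool = False
    n_instructions: int = 0
    n_tensed_verbs: int = 0
    is_single_question: bool = False


def classify(clues):
    """Label a chunk 'title', 'text', or 'ambiguous' from its clues."""
    short = clues.n_words <= SHORT_MAX_WORDS
    # A short chunk made of a single question is taken as a title.
    if short and clues.is_single_question:
        return "title"
    # Long, subdivided, or multi-instruction chunks are paragraphs of text.
    if not short or clues.subdivided or clues.n_instructions > 1:
        return "text"
    # Short, emphasized, with no instruction and no tensed verb: a title.
    if clues.emphasized and clues.n_instructions == 0 and clues.n_tensed_verbs == 0:
        return "title"
    # Everything else (e.g. short chunks without emphasis, or
    # instruction-like short chunks) is left for the later pass.
    return "ambiguous"
```

Chunks labelled "ambiguous" here are deliberately left undecided, so that a later pass can use their neighbours to settle them.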
  16. 3. Ambiguous chunks: text or title?
       • Short chunks with no emphasis
       • Instruction-like short chunks
       • Use of textual environment clues:
           1. Identify unambiguous titles / paragraphs of text
           2. Disambiguate the remaining chunks
  17. 3. Ambiguous chunks: text or title?
       • Disambiguation using textual environment clues:
           - a series of ambiguous paragraphs becomes text
           - an ambiguous paragraph between two paragraphs of text becomes a title
       Figure (TEXT CHUNKS, before): title, text, text, ambiguous, ambiguous, title, ambiguous, text, ambiguous, text
       Figure (TEXT CHUNKS, after): title, text, text, text, text, title, title, text, title, text
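The two environment rules can be sketched as a single pass over the chunk labels. This is a hedged sketch: the function name `disambiguate` is hypothetical, and the fallback for cases the slide does not cover (a lone ambiguous chunk that is not followed by text) is an assumption, here resolved as text.

```python
def disambiguate(labels):
    """Resolve 'ambiguous' labels from their neighbours.

    Rules taken from the slide:
      - a run of two or more ambiguous chunks becomes text;
      - a single ambiguous chunk followed by a paragraph of text becomes a title.
    Assumption: any other single ambiguous chunk falls back to text.
    """
    out = list(labels)
    i = 0
    while i < len(out):
        if out[i] != "ambiguous":
            i += 1
            continue
        # Find the end of the maximal run of ambiguous chunks starting at i.
        j = i
        while j < len(out) and out[j] == "ambiguous":
            j += 1
        run_length = j - i
        next_label = out[j] if j < len(out) else None
        if run_length == 1 and next_label == "text":
            out[i] = "title"
        else:
            out[i:j] = ["text"] * run_length
        i = j
    return out
```

Applied to the label sequence title, text, text, ambiguous, ambiguous, title, ambiguous, text, ambiguous, text (as far as the slide's figure can be read from the transcript), the pair of ambiguous chunks becomes text and the two isolated ones become titles.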
  18. OUTPUT EXAMPLE (figure): MAIN GOAL / MAIN TASK, followed by goal / task pairs
  19. IV. Main issues: noise in web pages
       • "Noise" in web pages (advertisements, lists of links, navigation help...) interferes with compound/title identification
       • These features are typical of a title or an instruction:
           - short sequence
           - emphasis
           - linguistic form: base-form verb at the beginning of a sentence
       • But the chunk shown is a list of links!
       Figure annotations: titles, instruction, titles
  20. IV. Main issues: refining goal/titles identification
       • Only sub-goal / sub-task relations are identified
       • What about the task / sub-task(s) hierarchy?
       • What about the head title / main goal?
           - the head title is not always the 1st identified title (noise)
           - sometimes there is no head title
       • What if the action is implicit?
           - ex: "the room and the bed"
           - implicit: "how to clean the room and the bed"
       • Some ideas:
           - choose a title that has vocabulary in common with the instructions
           - identify action verbs in relation with the nouns of the title
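The first idea, choosing the title that shares vocabulary with the instructions, can be illustrated with a simple word-overlap score. This is a sketch of the idea only, not a method described in the slides: the stopword list, the tokenisation, and the function names are assumptions.

```python
import re

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "on",
             "how", "your", "you", "it", "is", "so", "up", "with"}


def content_words(text):
    """Lower-cased alphabetic tokens of the text, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def pick_head_title(titles, instructions):
    """Return the candidate title sharing the most words with the instructions.

    Ties go to the earliest candidate; `titles` must not be empty.
    """
    vocabulary = set()
    for instruction in instructions:
        vocabulary |= content_words(instruction)
    return max(titles, key=lambda title: len(content_words(title) & vocabulary))
```

With candidates such as "Related links", "Fixing the first wall plate" and "Contact us", and instructions about positioning a plate and marking a wall, only the second title shares words ("wall", "plate") with the instructions, so noise titles that come first on the page are skipped.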
  21. V. DEMO