Text Processing for Procedural
Question Answering
Ongoing work for the TextCoop project
ILPL group, presentation by Estelle Delpech
Text Processing for Procedural
Question Answering
I. INTRODUCTION: GLOBAL ARCHITECTURE
II. CLUES TO IDENTIFY TITLES / INSTRUCTIONAL COMPOUNDS
III. THE WHOLE PROCESS
IV. MAIN ISSUES
V. DEMO
I. INTRODUCTION: GLOBAL ARCHITECTURE
A global architecture (Surdeau & Pasca)
[Diagram labels: "How to…?" question; Goal; Task; TEXT PROCESSING]
TEXT PROCESSING for Procedural QA: identification of task structure

.html → PRE-PROCESSING → SEGMENTER → TEXT GRAMMAR → TASK DATABASE

PRE-PROCESSING: HTML cleaning, MS tagging
SEGMENTER: identification of terminal symbols (Title, Instructional Compound)
TEXT GRAMMAR: Xbar analysis of task structure

[Tree diagram labels: spec, G', Pre-requisite, Goal, Title, complement, Instructional Compound]
II. CORPUS OBSERVATION:
WHAT CLUES IDENTIFY
- INSTRUCTIONAL COMPOUNDS?
- TITLES?
1. Clues for Instructional Compounds
Identification


Definition: kernel instructions linked to various clauses by rhetorical
or logical relations.

Identification in two steps:

Detect the presence of instructions: expressions of obligation
Find instructional compound boundaries, e.g. connectors…

Fixing the first wall plate (or shelf bracket)
We are going to mark the first wall plate (or bracket) for drilling.
First, position the face plate so one screw lines up with the mark on the wall you made
in the last step and place the level on top of the face plate to ensure it is level.
Second, you should mark the wall in the next screw hole, again by turning the screw
until it bites into the wall (see fig 1.3).
It is advised that you mark any remaining screw holes while keeping the wall plate
firmly in position.
Now you have to choose a suitable drill bit (masonry or the right type for the
surface). It should be the same width as the wall plug to be used.
Get to hand one of the wall plugs, and place it against the tip of the drill bit (see fig
1.4).
Finally, place a piece of masking tape on the drill bit to use as a guide, this will ensure
you don't drill too deep.
1. Clues for Instructional Compounds
Identification


Presence of instructions:

Morpho-lexical patterns (each pattern is followed by an example)

shall Adv* base form verb
    You should pre-heat the oven
have to Adv* base form verb
    You have to pre-heat the oven
## Op? Adv* base form verb
    Do not pre-heat the oven
it be Adv* (necessary|compulsory) that
    It is better that you pre-heat the oven

Compound boundaries:

Morpho-lexical patterns

## to Adv* base form verb .* ,
    [To cook the cake, pre-heat the oven]
(##|Conj) (if|then|after)
    [and then start peeling …
    [If you want to cook the cake, pre-heat the oven.] [If you don’t want to cook …

HTML tags (typo-disposition): <p> </p> <li> </li>
    <li> [ Pre-heat the oven … ]</li>
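
The patterns above can be approximated with plain regular expressions. The sketch below is only a lexical stand-in, not the project's implementation: it assumes that `##` marks a sentence start, it replaces `Adv*` with a crude run of "-ly" words, and it cannot check "base form verb" because it has no POS tags (so bare imperatives such as "Pre-heat the oven" are not detected). The names `has_instruction` and `split_compounds` are hypothetical.

```python
import re

# Crude stand-in for Adv*: any run of words ending in "-ly" (assumption).
ADV = r"(?:\s+\w+ly)*"

# Lexical approximations of the obligation patterns listed on the slide.
INSTRUCTION_PATTERNS = [
    re.compile(r"\b(?:shall|should)\b" + ADV + r"\s+\w+", re.I),
    re.compile(r"\bha(?:ve|s)\s+to\b" + ADV + r"\s+\w+", re.I),
    re.compile(r"^\s*do\s+not\b" + ADV + r"\s+\w+", re.I),
    re.compile(r"\bit\s+is\b" + ADV + r"\s+(?:necessary|compulsory|better)\s+that\b", re.I),
]

# Approximation of "(##|Conj) (if|then|after)": a connector opens a new
# compound when it starts a sentence or follows a conjunction.
BOUNDARY = re.compile(
    r"(?<=[.!?])\s+(?=(?:if|then|after)\b)"
    r"|\s+(?=(?:and|or|but)\s+(?:if|then|after)\b)",
    re.I,
)


def has_instruction(sentence):
    """Return True if any obligation pattern matches the sentence."""
    return any(p.search(sentence) for p in INSTRUCTION_PATTERNS)


def split_compounds(text):
    """Split text into candidate compounds at connector boundaries."""
    return [part for part in BOUNDARY.split(text.strip()) if part]


if __name__ == "__main__":
    print(has_instruction("You should pre-heat the oven"))
    print(split_compounds(
        "To cook the cake, pre-heat the oven and then start peeling the apples. "
        "If you want a crust, add sugar."
    ))
```

A tagger-based version would test the POS tag of the verb instead of accepting any following word, which is what the "base form verb" element of the patterns calls for.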
2. Titles identification:
About the HTML encoding of titles

The <hn> tag cannot be used as the single clue for
title identification

HTML encoding is free, and the code can be
underspecified (CSS)

Corpus observation:

80 % of titles are encoded with <b>
57 % of <b> tags encode titles
64 % of <h> tags encode titles
the coding varies from one web site to another

We had to find some other clues …
2. Clues for Title Identification


Some helpful visual clues:

Short sequence of words
Emphasized
Spaced from the rest of the text

[Figure annotations: not emphasized; not short; not a title]
2. Clues for Title Identification


Linguistic clues:

Rarely contains a tensed verb
Can be a single question

Textual environment clues:

Occurs between two paragraphs of text
Occurs between a title and a paragraph of text

No single clue, but a bundle of clues
III. THE WHOLE PROCESS




PRE-PROCESSING: HTML cleaning, MS tagging
SEGMENTER: identification of terminal symbols (Title, Instructional Compound)
1. HTML Cleaning module
Raw HTML code → HTML Cleaning

The output of the HTML Cleaning module is:

a list of text chunks, corresponding more or less to paragraph breaks
their corresponding typo-dispositional structure

Main typo-dispositional information:

Text chunk tags: <p> <div> <ol> <ul>
Subdivision tags: <br> <li>
Emphasis tags: <h> <b> <u> <i>

[Diagram: text chunks annotated with the tags found in each of them]
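
As an illustration of this module, here is a minimal sketch built on Python's standard `html.parser`. It is an assumption about how such cleaning could be done, not the project's code: chunk tags (`<p>`, `<div>`, `<ol>`, `<ul>`) close the current chunk, while subdivision and emphasis tags are recorded as that chunk's typo-dispositional structure. The names `ChunkExtractor` and `clean_html` are hypothetical, and `<h>` is read as `<h1>` to `<h6>`.

```python
from html.parser import HTMLParser

CHUNK_TAGS = {"p", "div", "ol", "ul"}        # open or close a text chunk
SUBDIVISION_TAGS = {"br", "li"}              # subdivide a chunk
EMPHASIS_TAGS = {"b", "u", "i", "h1", "h2", "h3", "h4", "h5", "h6"}


class ChunkExtractor(HTMLParser):
    """Collect text chunks together with the tags seen inside each chunk."""

    def __init__(self):
        super().__init__()
        self.chunks = []   # list of {"text": str, "tags": [str, ...]}
        self._text = []
        self._tags = []

    def flush(self):
        # Normalise whitespace and store the chunk if it contains any text.
        text = " ".join("".join(self._text).split())
        if text:
            self.chunks.append({"text": text, "tags": self._tags})
        self._text, self._tags = [], []

    def handle_starttag(self, tag, attrs):
        if tag in CHUNK_TAGS:
            self.flush()
        elif tag in SUBDIVISION_TAGS:
            self._tags.append(tag)
            self._text.append(" ")   # keep subdivided items apart
        elif tag in EMPHASIS_TAGS:
            self._tags.append(tag)

    def handle_endtag(self, tag):
        if tag in CHUNK_TAGS:
            self.flush()

    def handle_data(self, data):
        self._text.append(data)


def clean_html(html):
    """Return the list of text chunks found in an HTML string."""
    parser = ChunkExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()
    return parser.chunks


if __name__ == "__main__":
    page = ("<p><b>Fixing the wall plate</b></p>"
            "<ul><li>Mark the wall.</li><li>Drill the hole.</li></ul>")
    for chunk in clean_html(page):
        print(chunk)
```

Keeping the tags as a flat list per chunk is enough for the later clues (emphasized or not, subdivided or not); a fuller version might also record how much of the chunk each emphasis tag covers.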
2. Clues Collection module
[Diagram labels: STRUCTURE, TEXT, MS Tagging (TreeTagger), TAGS, CLUES]

The output of the Clues Collection module is:

the list of text chunks with:
their corresponding typo-dispositional structure
text with tagged instructions, goals, connectors
linguistic information

Clues:
Nb of instructions
Instruction types
Nb of goals
Nb of words
Nb of sentences
Nb of questions
Nb of tensed verbs

This information is used for:
Titles identification
Instructional compounds identification
3. Processing each chunk: text or title?

All text chunks start with TYPE unknown.

Identification of unambiguous titles:
Short chunk
spaced from the rest of the text
with emphasis
a single question

Identification of unambiguous paragraphs of text:
Long chunk
No emphasis
Subdivided
More than 1 instruction
Presence of tensed verbs

[Diagram: chunk types change from unknown to title, text or ambiguous]
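
The two identification passes can be sketched as one three-way decision. The slide lists the clues but not how they combine, so the combination below is an assumption: a single question is a title; a long, subdivided, multi-instruction or tensed chunk is text; a short, emphasized, spaced chunk with no instruction is a title; everything else stays ambiguous. The `Chunk` record, the `classify` function and the 10-word threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    n_words: int
    n_sentences: int = 0
    n_questions: int = 0
    n_tensed_verbs: int = 0
    n_instructions: int = 0
    emphasized: bool = False
    spaced: bool = False
    subdivided: bool = False


def classify(c, max_title_words=10):
    """Label a chunk 'title', 'text' or 'ambiguous' (assumed rule order)."""
    short = c.n_words <= max_title_words
    # A chunk made of one question is taken as an unambiguous title.
    if c.n_sentences == 1 and c.n_questions == 1:
        return "title"
    # Any strong paragraph clue makes the chunk unambiguous text.
    if (not short) or c.subdivided or c.n_instructions > 1 or c.n_tensed_verbs > 0:
        return "text"
    # Short, emphasized, spaced and not instruction-like: unambiguous title.
    if c.emphasized and c.spaced and c.n_instructions == 0:
        return "title"
    # Short chunks without emphasis, or instruction-like short chunks.
    return "ambiguous"


if __name__ == "__main__":
    print(classify(Chunk(n_words=6, emphasized=True, spaced=True)))
    print(classify(Chunk(n_words=40, n_tensed_verbs=3, n_instructions=2)))
    print(classify(Chunk(n_words=4)))
```

Checking the question rule first matters because a question such as "How do you fix a shelf?" contains a tensed verb and would otherwise be labelled text.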
3. Ambiguous chunks: text or title?

Short chunks with no emphasis
Instruction-like short chunks

Use of textual environment clues:
1. Identify unambiguous titles / paragraphs of text
2. Disambiguate the remaining chunks
3. Ambiguous chunks: text or title?

Disambiguation using textual environment clues:

a series of ambiguous paragraphs becomes text
an ambiguous paragraph between two paragraphs of text becomes a title

[Diagram: the sequence of chunk types before and after disambiguation]
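
The two environment rules can be sketched as a single pass over the sequence of chunk labels. This is an illustrative reading, not the project's code: the function name `disambiguate` is hypothetical, and any case the two rules do not cover (for example a lone ambiguous chunk next to a title) is deliberately left ambiguous.

```python
def disambiguate(labels):
    """Resolve 'ambiguous' labels using the neighbouring chunk labels."""
    out = list(labels)
    n = len(out)
    i = 0
    while i < n:
        if out[i] != "ambiguous":
            i += 1
            continue
        # Find the end of the current run of ambiguous chunks.
        j = i
        while j < n and out[j] == "ambiguous":
            j += 1
        run = j - i
        prev_label = out[i - 1] if i > 0 else None
        next_label = out[j] if j < n else None
        if run > 1:
            # A series of ambiguous paragraphs becomes text.
            out[i:j] = ["text"] * run
        elif prev_label == "text" and next_label == "text":
            # A lone ambiguous paragraph between two text paragraphs
            # becomes a title.
            out[i] = "title"
        i = j
    return out


if __name__ == "__main__":
    before = ["title", "text", "ambiguous", "text",
              "ambiguous", "ambiguous", "title"]
    print(disambiguate(before))
```

Working on maximal runs keeps the two rules from interfering: a chunk inside a series is never tested against the "between two paragraphs of text" rule.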
OUTPUT EXAMPLE

[Screenshot labels: MAIN GOAL and MAIN TASK, followed by three goal / task pairs]
IV. Main issues: noise in web pages

The "noise" of web pages (advertisements, lists of links, navigation help...)
interferes with compound / title identification:

short sequence
emphasis
linguistic form: a base form verb at the beginning of a sentence,
typical of a title or an instruction

[Screenshot: items labelled as titles and as an instruction, but it is a list of links!]
IV. Main issues: refining goal / title
identification

only sub-goal / sub-task relations are identified

what about the task / sub-task(s) hierarchy?

what about the head title / main goal?
the head title is not always the 1st identified title (noise)
sometimes there is no head title

what if the action is implicit?
ex: the room and the bed
implicit: how to clean the room and the bed

some ideas:
choose a title that has vocabulary in common with the instructions
identify action verbs related to the nouns of the title
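
The first idea can be illustrated with a simple word-overlap score. This is a sketch under stated assumptions, not a tested method: words are compared as lowercase strings with no stemming, the stopword list is a small hypothetical one, and ties go to the earliest title. The function names are also hypothetical.

```python
import re

# Small illustrative stopword list (assumption, not from the slides).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on",
             "for", "with", "your", "you", "it", "is"}


def content_words(text):
    """Return the set of lowercase non-stopword words in a string."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def pick_head_title(titles, instructions):
    """Pick the candidate title sharing the most words with the instructions."""
    instruction_vocab = set().union(*(content_words(s) for s in instructions))
    return max(titles, key=lambda t: len(content_words(t) & instruction_vocab))


if __name__ == "__main__":
    titles = ["Related links", "Fixing the first wall plate", "Contact us"]
    instructions = [
        "Position the face plate so one screw lines up with the mark on the wall.",
        "Mark the wall in the next screw hole.",
    ]
    print(pick_head_title(titles, instructions))
```

Because noise titles such as link lists rarely share vocabulary with the instructions, even this crude score can separate them from the real head title; it fails when the head title uses different word forms ("fix" versus "fixing"), which is where lemmatised tagger output would help.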
V. DEMO

More Related Content

Similar to Text Processing for Procedural Question Answering

Writing Process
Writing ProcessWriting Process
Writing Process
Christine Rose
 
Presentation 5th
Presentation 5thPresentation 5th
Presentation 5th
Connex
 
Paragraph Structure
Paragraph StructureParagraph Structure
Paragraph Structure
Vasha Rambaran
 
Essay an Overview
Essay an OverviewEssay an Overview
Essay an Overview
Edi Brata
 
Effective Paragraphs
Effective ParagraphsEffective Paragraphs
Effective Paragraphs
Joey Valdriz
 
Media Assignment Feedback 2013
Media Assignment Feedback 2013Media Assignment Feedback 2013
Media Assignment Feedback 2013
Amanda Simmons
 
Composition i week 2
Composition i week 2Composition i week 2
Composition i week 2
charly2011
 
Pointers andmemory
Pointers andmemoryPointers andmemory
Pointers andmemory
Ashok Kumar
 
Business Writing Style Guide Your Writing Companion
Business Writing Style Guide Your Writing CompanionBusiness Writing Style Guide Your Writing Companion
Business Writing Style Guide Your Writing Companion
englishwriting
 
Coding standard
Coding standardCoding standard
Coding standard
Shwetketu Rastogi
 
My ap boot camp
My ap boot campMy ap boot camp
My ap boot camp
Wendy Scruggs
 
Python breakdown-workbook
Python breakdown-workbookPython breakdown-workbook
Python breakdown-workbook
HARUN PEHLIVAN
 
Outlining Process
Outlining ProcessOutlining Process
Outlining Process
Claudia Cárdenas
 
Pointers In C
Pointers In CPointers In C
Pointers In C
Sriram Raj
 
Pointers In C
Pointers In CPointers In C
Pointers In C
Sriram Raj
 
Pointers andmemory
Pointers andmemoryPointers andmemory
Pointers andmemory
meinutopia
 
Pointers andmemory
Pointers andmemoryPointers andmemory
Pointers andmemory
ariffast
 
Ir 03
Ir   03Ir   03
TDD Walkthrough - Encryption
TDD Walkthrough - EncryptionTDD Walkthrough - Encryption
TDD Walkthrough - Encryption
PeterKha2
 
Informative Speech Outline TemplateSpeech TitleNameThe.docx
Informative Speech Outline TemplateSpeech TitleNameThe.docxInformative Speech Outline TemplateSpeech TitleNameThe.docx
Informative Speech Outline TemplateSpeech TitleNameThe.docx
LaticiaGrissomzz
 

Similar to Text Processing for Procedural Question Answering (20)

Writing Process
Writing ProcessWriting Process
Writing Process
 
Presentation 5th
Presentation 5thPresentation 5th
Presentation 5th
 
Paragraph Structure
Paragraph StructureParagraph Structure
Paragraph Structure
 
Essay an Overview
Essay an OverviewEssay an Overview
Essay an Overview
 
Effective Paragraphs
Effective ParagraphsEffective Paragraphs
Effective Paragraphs
 
Media Assignment Feedback 2013
Media Assignment Feedback 2013Media Assignment Feedback 2013
Media Assignment Feedback 2013
 
Composition i week 2
Composition i week 2Composition i week 2
Composition i week 2
 
Pointers andmemory
Pointers andmemoryPointers andmemory
Pointers andmemory
 
Business Writing Style Guide Your Writing Companion
Business Writing Style Guide Your Writing CompanionBusiness Writing Style Guide Your Writing Companion
Business Writing Style Guide Your Writing Companion
 
Coding standard
Coding standardCoding standard
Coding standard
 
My ap boot camp
My ap boot campMy ap boot camp
My ap boot camp
 
Python breakdown-workbook
Python breakdown-workbookPython breakdown-workbook
Python breakdown-workbook
 
Outlining Process
Outlining ProcessOutlining Process
Outlining Process
 
Pointers In C
Pointers In CPointers In C
Pointers In C
 
Pointers In C
Pointers In CPointers In C
Pointers In C
 
Pointers andmemory
Pointers andmemoryPointers andmemory
Pointers andmemory
 
Pointers andmemory
Pointers andmemoryPointers andmemory
Pointers andmemory
 
Ir 03
Ir   03Ir   03
Ir 03
 
TDD Walkthrough - Encryption
TDD Walkthrough - EncryptionTDD Walkthrough - Encryption
TDD Walkthrough - Encryption
 
Informative Speech Outline TemplateSpeech TitleNameThe.docx
Informative Speech Outline TemplateSpeech TitleNameThe.docxInformative Speech Outline TemplateSpeech TitleNameThe.docx
Informative Speech Outline TemplateSpeech TitleNameThe.docx
 

More from Estelle Delpech

Génération automatique de texte
Génération automatique de texteGénération automatique de texte
Génération automatique de texte
Estelle Delpech
 
Identification de compatibilités entre tages descriptifs de lieux
Identification de compatibilités entre tages descriptifs de lieuxIdentification de compatibilités entre tages descriptifs de lieux
Identification de compatibilités entre tages descriptifs de lieux
Estelle Delpech
 
Usage du TAL dans des applications industrielles : gestion des contenus multi...
Usage du TAL dans des applications industrielles : gestion des contenus multi...Usage du TAL dans des applications industrielles : gestion des contenus multi...
Usage du TAL dans des applications industrielles : gestion des contenus multi...
Estelle Delpech
 
Nomao: data analysis for personalized local search
Nomao: data analysis for personalized local searchNomao: data analysis for personalized local search
Nomao: data analysis for personalized local search
Estelle Delpech
 
Nomao: carnet de bonnes adresses (entre amis)
Nomao: carnet de bonnes adresses (entre amis)Nomao: carnet de bonnes adresses (entre amis)
Nomao: carnet de bonnes adresses (entre amis)
Estelle Delpech
 
Nomao: local search and recommendation engine
Nomao: local search and recommendation engineNomao: local search and recommendation engine
Nomao: local search and recommendation engine
Estelle Delpech
 
Extraction of domain-specific bilingual lexicon from comparable corpora: comp...
Extraction of domain-specific bilingual lexicon from comparable corpora: comp...Extraction of domain-specific bilingual lexicon from comparable corpora: comp...
Extraction of domain-specific bilingual lexicon from comparable corpora: comp...
Estelle Delpech
 
Identification of Fertile Translations in Comparable Corpora: a Morpho-Compos...
Identification of Fertile Translations in Comparable Corpora: a Morpho-Compos...Identification of Fertile Translations in Comparable Corpora: a Morpho-Compos...
Identification of Fertile Translations in Comparable Corpora: a Morpho-Compos...
Estelle Delpech
 
Évaluation applicative des terminologies destinées à la traduction spécialisée
Évaluation applicative des terminologies destinées à la traduction spécialiséeÉvaluation applicative des terminologies destinées à la traduction spécialisée
Évaluation applicative des terminologies destinées à la traduction spécialisée
Estelle Delpech
 
Dealing with Lexicon Acquired from Comparable Corpora: post-edition and exchange
Dealing with Lexicon Acquired from Comparable Corpora: post-edition and exchangeDealing with Lexicon Acquired from Comparable Corpora: post-edition and exchange
Dealing with Lexicon Acquired from Comparable Corpora: post-edition and exchange
Estelle Delpech
 
R&D Lingua et Machina
R&D Lingua et MachinaR&D Lingua et Machina
R&D Lingua et Machina
Estelle Delpech
 
Bilingual terminology mining
Bilingual terminology miningBilingual terminology mining
Bilingual terminology mining
Estelle Delpech
 
Robust rule-based parsing
Robust rule-based parsingRobust rule-based parsing
Robust rule-based parsing
Estelle Delpech
 
Experimenting the TextTiling Algorithm
Experimenting the TextTiling AlgorithmExperimenting the TextTiling Algorithm
Experimenting the TextTiling Algorithm
Estelle Delpech
 

More from Estelle Delpech (14)

Génération automatique de texte
Génération automatique de texteGénération automatique de texte
Génération automatique de texte
 
Identification de compatibilités entre tages descriptifs de lieux
Identification de compatibilités entre tages descriptifs de lieuxIdentification de compatibilités entre tages descriptifs de lieux
Identification de compatibilités entre tages descriptifs de lieux
 
Usage du TAL dans des applications industrielles : gestion des contenus multi...
Usage du TAL dans des applications industrielles : gestion des contenus multi...Usage du TAL dans des applications industrielles : gestion des contenus multi...
Usage du TAL dans des applications industrielles : gestion des contenus multi...
 
Nomao: data analysis for personalized local search
Nomao: data analysis for personalized local searchNomao: data analysis for personalized local search
Nomao: data analysis for personalized local search
 
Nomao: carnet de bonnes adresses (entre amis)
Nomao: carnet de bonnes adresses (entre amis)Nomao: carnet de bonnes adresses (entre amis)
Nomao: carnet de bonnes adresses (entre amis)
 
Nomao: local search and recommendation engine
Nomao: local search and recommendation engineNomao: local search and recommendation engine
Nomao: local search and recommendation engine
 
Extraction of domain-specific bilingual lexicon from comparable corpora: comp...
Extraction of domain-specific bilingual lexicon from comparable corpora: comp...Extraction of domain-specific bilingual lexicon from comparable corpora: comp...
Extraction of domain-specific bilingual lexicon from comparable corpora: comp...
 
Identification of Fertile Translations in Comparable Corpora: a Morpho-Compos...
Identification of Fertile Translations in Comparable Corpora: a Morpho-Compos...Identification of Fertile Translations in Comparable Corpora: a Morpho-Compos...
Identification of Fertile Translations in Comparable Corpora: a Morpho-Compos...
 
Évaluation applicative des terminologies destinées à la traduction spécialisée
Évaluation applicative des terminologies destinées à la traduction spécialiséeÉvaluation applicative des terminologies destinées à la traduction spécialisée
Évaluation applicative des terminologies destinées à la traduction spécialisée
 
Dealing with Lexicon Acquired from Comparable Corpora: post-edition and exchange
Dealing with Lexicon Acquired from Comparable Corpora: post-edition and exchangeDealing with Lexicon Acquired from Comparable Corpora: post-edition and exchange
Dealing with Lexicon Acquired from Comparable Corpora: post-edition and exchange
 
R&D Lingua et Machina
R&D Lingua et MachinaR&D Lingua et Machina
R&D Lingua et Machina
 
Bilingual terminology mining
Bilingual terminology miningBilingual terminology mining
Bilingual terminology mining
 
Robust rule-based parsing
Robust rule-based parsingRobust rule-based parsing
Robust rule-based parsing
 
Experimenting the TextTiling Algorithm
Experimenting the TextTiling AlgorithmExperimenting the TextTiling Algorithm
Experimenting the TextTiling Algorithm
 

Recently uploaded

Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
AstuteBusiness
 
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin..."$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
Fwdays
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
Alex Pruden
 
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
inQuba Webinar Mastering Customer Journey Management with Dr Graham HillinQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
LizaNolte
 
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving
 
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
DanBrown980551
 
Demystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through StorytellingDemystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through Storytelling
Enterprise Knowledge
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
Fwdays
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
Ivo Velitchkov
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
Christine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptxChristine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptx
christinelarrosa
 
High performance Serverless Java on AWS- GoTo Amsterdam 2024
High performance Serverless Java on AWS- GoTo Amsterdam 2024High performance Serverless Java on AWS- GoTo Amsterdam 2024
High performance Serverless Java on AWS- GoTo Amsterdam 2024
Vadym Kazulkin
 
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Pitangent Analytics & Technology Solutions Pvt. Ltd
 
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
Javier Junquera
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
saastr
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
Antonios Katsarakis
 
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
saastr
 

Recently uploaded (20)

Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
 
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin..."$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
 
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
inQuba Webinar Mastering Customer Journey Management with Dr Graham HillinQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
 
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
 
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
 
Demystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through StorytellingDemystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through Storytelling
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
Christine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptxChristine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptx
 
High performance Serverless Java on AWS- GoTo Amsterdam 2024
High performance Serverless Java on AWS- GoTo Amsterdam 2024High performance Serverless Java on AWS- GoTo Amsterdam 2024
High performance Serverless Java on AWS- GoTo Amsterdam 2024
 
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
 
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
 
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
 

Text Processing for Procedural Question Answering

  • 1. Text Processing for Procedural Question Answering Undergoing work for TextCoop project ILPL group, presentation by Estelle Delpech
  • 2. Text Processing for Procedural Question Answering I. INTRODUCTION : GLOBAL ARCHITECTURE II. CLUES TO IDENTIFY TITLES/ INSTRUCTIONNAL COMPOUNDS III. THE WHOLE PROCESS IV. MAIN ISSUES V. DEMO
  • 3. I. INTRODUCTION : GLOBAL ARCHITECTURE
  • 4. A global Architecture (Surdeau & Pasca) How to…? Goal Task TEXT PROCESSING
  • 5. TEXT PROCESSING for Procedural QA : Identification of task structure .html PRE-PROCESSING SEGMENTER TEXT GRAMMAR TASK     HTML cleaning MS tagging Identification of terminal symbols Xbar analysis of task structure DATABASE spec G’ Pre-requisite Goal Title complemen t Instructional Compound
  • 6. II . CORPUS OBSERVATION : WHAT CLUES TO IDENTIFY -INSTRUCTIONNAL COMPOUNDS ? -TITLES ?
  • 7. 1. Clues for Instructional Compounds Identification
      Definition: kernel instructions linked to various clauses by rhetorical or logical relations.
      Identification in two steps:
        Detect presence of instructions: expression of obligation
        Find instructional compound boundaries, e.g. connectors…
      Example:
        Fixing the first wall plate (or shelf bracket)
        We are going to mark the first wall plate (or bracket) for drilling. First, position the face plate so one screw lines up with the mark on the wall you made in the last step and place the level on top of the face plate to ensure it is level. Second, you should mark the wall in the next screw hole, again by turning the screw until it bites into the wall (see fig 1.3). It is advised that you mark any remaining screw holes while keeping the wall plate firmly in position. Now you have to choose a suitable drill bit (masonry or the right type for the surface). It should be the same width as the wall plug to be used. Get to hand one of the wall plugs, and place it against the tip of the drill bit (see fig 1.4). Finally, place a piece of masking tape on the drill bit to use as a guide, this will ensure you don't drill too deep.
  • 8. 1. Clues for Instructional Compounds Identification
      Presence of instructions:
        Morpho-lexical patterns
          shall Adv* base form verb : You should pre-heat the oven
          Have to Adv* base form verb : You have to pre-heat the oven
          ## Op? adv* base form verb : Do not pre-heat the oven
          it be adv* (necessary|compulsory) that : It is better that you pre-heat the oven
      Compound boundaries:
        Morpho-lexical patterns
          ## to Adv* base form verb .* , : [To cook the cake, pre-heat the oven]
          (##|Conj) (if|then|after) : [and then start peeling … ; [If you want to cook the cake, pre-heat the oven.] [If you don’t want to cook …
        HTML tags (typo-disposition)
          <p> </p> <li> </li> : <li> [ Pre-heat the oven … ]</li>
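As a rough illustration of the instruction-presence patterns on this slide, the sketch below approximates them with plain regular expressions in Python. It works under stated assumptions: the slide's patterns run over morpho-syntactically tagged text, whereas here "base form verb" is approximated by "any word", adverbs by words ending in -ly, and the imperative pattern relies on a small hypothetical verb list (`IMPERATIVE_VERBS`). The names `INSTRUCTION_PATTERNS` and `instruction_types` are illustrative and not part of the presented system.

```python
import re

# Assumption: adverbs are approximated by words ending in "-ly".
ADV = r"(?:\w+ly\s+)*"

# Hypothetical mini-lexicon standing in for the tagger's "base form verb" tag.
IMPERATIVE_VERBS = ["pre-heat", "place", "position", "mark", "choose", "get"]
_VERBS = "|".join(re.escape(v) for v in IMPERATIVE_VERBS)

INSTRUCTION_PATTERNS = {
    # shall Adv* base form verb
    "modal": re.compile(rf"\b(?:should|shall)\s+{ADV}\w+", re.I),
    # have to Adv* base form verb
    "have_to": re.compile(rf"\b(?:have|has)\s+to\s+{ADV}\w+", re.I),
    # ## Op? Adv* base form verb (sentence-initial imperative, optionally negated)
    "imperative": re.compile(rf"^\s*(?:do\s+not\s+|don't\s+)?{ADV}(?:{_VERBS})\b", re.I),
    # it be Adv* (necessary|compulsory) that
    "impersonal": re.compile(rf"\bit\s+is\s+{ADV}(?:necessary|compulsory)\s+that\b", re.I),
}


def instruction_types(sentence):
    """Return the names of the patterns that match the sentence."""
    return [name for name, pattern in INSTRUCTION_PATTERNS.items()
            if pattern.search(sentence)]
```

For example, `instruction_types("Do not pre-heat the oven")` reports the imperative pattern, while a descriptive sentence such as "The oven is hot" matches nothing. A real implementation would match against tagger output rather than raw strings.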
  • 9. 2. Titles identification: about the HTML encoding of titles
      The <hn> tag cannot be used as a single clue for title identification
      HTML encoding is free, and the code can be underspecified (CSS)
      Corpus observation:
        80% of titles are encoded with <b>
        57% of <b> tags encode titles
        64% of <h> tags encode titles
        the coding varies from one web site to another
      We had to find some other clues…
  • 10. 2. Clues for Title Identification
      Some helpful visual clues:
        Short sequence of words
        Emphasized
        Spaced from the rest of the text
      Figure annotations: "emphasized, not a title"; "not short"
  • 11. 2. Clues for Title Identification
      Linguistic clues:
        Rarely contains a tensed verb
        Can be a single question
      Textual environment clues:
        Occurs between two paragraphs of text
        Occurs between a title and a paragraph of text
      No single clue, but a bundle of clues
  • 12. III. THE WHOLE PROCESS
      PRE-PROCESSING: HTML cleaning, MS tagging
      SEGMENTER: identification of terminal symbols (Title, Instructional Compound)
  • 13. 1. HTML Cleaning module
      Input: raw HTML code
      The output of the HTML Cleaning module is:
        a list of text chunks, corresponding more or less to paragraph breaks
        their corresponding typo-dispositional structure
      Main typo-dispositional information:
        Text chunk tags: <p> <div> <ol> <ul>
        Subdivision tags: <br> <li>
        Emphasis tags: <h> <b> <u> <i>
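The cleaning step described on this slide could be sketched as follows, using only Python's standard `html.parser`. This is a minimal sketch under assumptions, not the TextCoop implementation: the class name `ChunkCollector`, the function `clean_html`, and the dictionary layout of a chunk are all hypothetical, and a chunk is simply closed whenever one of the slide's text chunk tags opens or closes.

```python
from html.parser import HTMLParser

# Tag families taken from the slide.
CHUNK_TAGS = {"p", "div", "ol", "ul"}
SUBDIVISION_TAGS = {"br", "li"}
EMPHASIS_TAGS = {"b", "u", "i", "h1", "h2", "h3", "h4", "h5", "h6"}


class ChunkCollector(HTMLParser):
    """Collect text chunks together with the tags seen inside each chunk."""

    def __init__(self):
        super().__init__()
        self.chunks = []   # list of {"text": str, "tags": [str, ...]}
        self._text = []
        self._tags = []

    def flush(self):
        # Normalise whitespace and store the chunk if it contains any text.
        text = " ".join("".join(self._text).split())
        if text:
            self.chunks.append({"text": text, "tags": self._tags})
        self._text, self._tags = [], []

    def handle_starttag(self, tag, attrs):
        if tag in CHUNK_TAGS:
            self.flush()
            self._tags.append(tag)
        elif tag in SUBDIVISION_TAGS:
            self._tags.append(tag)
            self._text.append(" ")  # keep subdivided parts apart
        elif tag in EMPHASIS_TAGS:
            self._tags.append(tag)

    def handle_endtag(self, tag):
        if tag in CHUNK_TAGS:
            self.flush()

    def handle_data(self, data):
        self._text.append(data)


def clean_html(html):
    """Return the list of text chunks with their typo-dispositional tags."""
    parser = ChunkCollector()
    parser.feed(html)
    parser.close()
    parser.flush()  # emit any trailing text outside a chunk tag
    return parser.chunks
```

Each returned chunk keeps the order in which tags were met, so a later stage can tell an emphasized chunk (`b`, `h2`…) from a subdivided one (`br`, `li`).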
  • 14. 2. Clues Collection module
      Input: text (MS tagging with TreeTagger) and typo-dispositional structure (tags)
      The output of the Clues collection module is the list of text chunks with:
        their corresponding typo-dispositional structure
        text with tagged instructions, goals, connectors
        linguistic information:
          Nb of instructions
          Instruction types
          Nb of goals
          Nb of words
          Nb of sentences
          Nb of questions
          Nb of tensed verbs
      This information is used for:
        Titles identification
        Instructional compounds identification
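Some of the clues listed on this slide can be computed from surface text alone, as the Python sketch below suggests. It is a simplification under assumptions: the counts that need morpho-syntactic tags (instructions, goals, tensed verbs) are left out because they depend on the tagger output, sentences are split naively on end punctuation, and the function name `collect_clues` and its dictionary keys are hypothetical.

```python
import re

EMPHASIS_TAGS = {"b", "u", "i", "h1", "h2", "h3", "h4", "h5", "h6"}
SUBDIVISION_TAGS = {"br", "li"}


def collect_clues(text, tags):
    """Compute surface clues for one text chunk and its list of HTML tags."""
    # Naive sentence split: break after ".", "!" or "?" followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return {
        "nb_words": len(text.split()),
        "nb_sentences": len(sentences),
        "nb_questions": sum(s.endswith("?") for s in sentences),
        "emphasized": any(t in EMPHASIS_TAGS for t in tags),
        "subdivided": any(t in SUBDIVISION_TAGS for t in tags),
    }
```

A chunk such as "How do I fix a shelf?" inside `<p><b>` would then be described as a short, emphasized, single-question chunk.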
  • 15. 3. Processing each chunk: text or title?
      Identification of unambiguous titles:
        Short chunk
        Spaced from the rest of the text
        With emphasis
        A single question
      Identification of unambiguous paragraphs of text:
        Long chunk
        No emphasis
        Subdivided
        More than 1 instruction
        Presence of tensed verbs
      Figure: the type of each text chunk goes from "unknown" to "title", "text" or "ambiguous"
  • 16. 3. Ambiguous chunks: text or title?
      Short chunks with no emphasis
      Instruction-like short chunks
      Use of textual environment clues:
        1. Identify unambiguous titles/paragraphs of text
        2. Disambiguate the remaining chunks
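The first pass (unambiguous titles, unambiguous text, everything else ambiguous) might look like the Python sketch below. The slides list the clues but not how they are combined or weighted, so the rule order, the `short_max_words` threshold of 8 words, the function name `classify_chunk` and the clue dictionary keys are all assumptions made for illustration.

```python
def classify_chunk(clues, short_max_words=8):
    """Label one chunk as 'title', 'text' or 'ambiguous' from its clues.

    `clues` is a dict with keys: nb_words, nb_sentences, nb_questions,
    nb_instructions, emphasized, subdivided (hypothetical layout).
    """
    short = clues["nb_words"] <= short_max_words

    # Unambiguous text: long, subdivided, or carrying several instructions.
    if not short or clues["subdivided"] or clues["nb_instructions"] > 1:
        return "text"

    # Unambiguous title: short, emphasized or a single question,
    # and not instruction-like.
    single_question = clues["nb_sentences"] == 1 and clues["nb_questions"] == 1
    if (clues["emphasized"] or single_question) and clues["nb_instructions"] == 0:
        return "title"

    # Short chunks with no emphasis, or instruction-like short chunks.
    return "ambiguous"
```

Chunks labelled `"ambiguous"` here are the ones left for the second step, which looks at the neighbouring chunks.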
  • 17. 3. Ambiguous chunks: text or title?
      Disambiguation using textual environment clues:
        a series of ambiguous paragraphs becomes text
        an ambiguous paragraph between two paragraphs of text becomes a title
      Figure: labels of the text chunks before and after disambiguation
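The two environment rules on this slide can be sketched in Python over a list of chunk labels. This is one possible reading, with the function name `disambiguate` chosen for illustration; the slide does not say what happens to a lone ambiguous chunk that is not between two text paragraphs, so this sketch simply leaves it ambiguous.

```python
def disambiguate(labels):
    """Resolve 'ambiguous' labels using the neighbouring chunk labels."""
    result = list(labels)
    i = 0
    while i < len(result):
        if result[i] != "ambiguous":
            i += 1
            continue
        # Find the end of the current run of ambiguous chunks.
        j = i
        while j < len(result) and result[j] == "ambiguous":
            j += 1
        before = result[i - 1] if i > 0 else None
        after = result[j] if j < len(result) else None
        if j - i > 1:
            # A series of ambiguous paragraphs becomes text.
            result[i:j] = ["text"] * (j - i)
        elif before == "text" and after == "text":
            # A single ambiguous paragraph between two text paragraphs
            # becomes a title.
            result[i] = "title"
        i = j
    return result
```

Because runs of ambiguous chunks are always delimited by already-resolved chunks, the order in which the runs are processed does not change the outcome.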
  • 18. OUTPUT EXAMPLE
      Figure: a MAIN GOAL with its MAIN TASK, followed by a sequence of goal / task pairs
  • 19. IV. Main issues: noise in web pages
      « Noise » of web pages (advertisements, lists of links, navigation help...) interferes with compounds/title identification:
        short sequence
        emphasis
        linguistic form: base form verb at the beginning of a sentence
      Typical of a title or an instruction, but it is a list of links!
      Figure: list-of-links items wrongly labelled "titles" and "instruction"
  • 20. IV. Main issues: refining goal/titles identification
      Only sub-goals / sub-tasks relations are identified
        what about the hierarchy task/sub-task(s)?
        what about the head title / main goal?
          the head title is not always the 1st identified title (noise)
          sometimes there is no head title
      What if the action is implicit?
        ex: "the room and the bed"
        implicit: how to clean the room and the bed
      Some ideas:
        choose a title that has vocabulary in common with instructions
        identify action verbs in relation with the nouns of the title
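The first idea, choosing a title that shares vocabulary with the instructions, could be prototyped as below. This is only a sketch of that idea, not a result from the presentation: the function names, the tiny `STOPWORDS` set and the scoring by raw word overlap (no stemming or lemmatisation) are assumptions.

```python
import re

# Hypothetical, deliberately small stop-word list.
STOPWORDS = {"a", "an", "the", "to", "of", "in", "on", "and", "or", "for", "how", "it", "is"}


def content_words(text):
    """Lower-cased alphabetic words of the text, minus stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def pick_head_title(titles, instructions):
    """Return the title sharing the most content words with the instructions.

    Ties go to the earliest title; returns None when there are no titles.
    """
    instruction_vocab = set()
    for sentence in instructions:
        instruction_vocab |= content_words(sentence)
    return max(titles,
               key=lambda t: len(content_words(t) & instruction_vocab),
               default=None)
```

With candidate titles such as "Related links" and "Fixing the first wall plate", and instructions that mention the wall and the plate, the second title is selected, which is one way to skip noise titles coming from navigation blocks.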